This code is taken almost verbatim from

https://github.com/marcandrysco/Errol.git

with only minor changes to make it suitable for OpenCL use
(dependencies on libc were removed). Unfortunately, it still
relies on certain functions found only in compiler-rt, so it
is not usable with every Clang build.

The paper describing the algorithm can be found at

https://cseweb.ucsd.edu/~mandrysc/pub/dtoa.pdf

