Here are some sample floating-point representations:
0 0x00000000
1.0 0x3f800000
0.5 0x3f000000
3 0x40400000
+inf 0x7f800000
-inf 0xff800000
+NaN 0x7fc00000 or 0x7ff00000
In general: number = (sign ? -1 : 1) * 2^(exponent) * 1.(mantissa bits), where the exponent is the stored exponent field minus the bias (127 for single precision).
As a programmer, it is important to know certain characteristics of your FP representation. These are listed below, with example values for both single- and double-precision IEEE floating point numbers:
| Property | Value for float | Value for double |
| --- | --- | --- |
| Largest representable number | 3.402823466e+38 | 1.7976931348623157e+308 |
| Smallest number without losing precision | 1.175494351e-38 | 2.2250738585072014e-308 |
| Smallest representable number(*) | 1.401298464e-45 | 5e-324 |
| Mantissa bits | 23 | 52 |
| Exponent bits | 8 | 11 |
| Epsilon(**) | 1.1920929e-7 | 2.220446049250313e-16 |
Note that all numbers in the text of this article assume single-precision floats; doubles are included above for comparison and reference purposes.
(*)
Just to make life interesting, here we have yet another special case. It turns out that if you set the exponent bits to zero, you can represent numbers other than zero by setting mantissa bits. As long as we have an implied leading 1, the smallest number we can get is clearly 2^-126, so to get these lower values we make an exception. The “1.m” interpretation disappears, and the number’s magnitude is determined only by bit positions; if you shift the mantissa to the right, the apparent exponent will change (try it!). It may help clarify matters to point out that 1.401298464e-45 = 2^(-126-23), in other words the smallest exponent minus the number of mantissa bits.
However, as I have implied in the above table, when using these extra-small numbers you sacrifice precision. When there is no implied 1, all bits to the left of the lowest set bit are leading zeros, which add no information to a number (as you know, you can write zeros to the left of any number all day long if you want). Therefore the absolute smallest representable number (1.401298464e-45, with only the lowest bit of the FP word set) has an appalling mere single bit of precision!
(**)
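Epsilon is the gap between 1 and the next larger representable number, so 1 + epsilon is the smallest value greater than 1. It is the place value of the least significant mantissa bit when the exponent is zero (i.e., when the exponent field stores 0x7f).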