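The largest (finite) number that can be stored in a floating-point type is FLT_MAX or DBL_MAX, for float and double respectively.
However, this does not mean that the type can precisely represent every smaller number or integer (in fact, it is not even close).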
First you need to understand that not all bits of a floating-point number are "equal". A floating-point number has an exponent (8 bits in an IEEE-754 standard float, 11 bits in a double) and a mantissa (23 and 52 bits in float and double, respectively). The number is obtained by multiplying the mantissa (which has an implied leading 1-bit and binary point) by 2^exponent (after normalizing the exponent; its binary value is not used directly). There is also a separate sign bit, so the following applies to negative numbers as well.
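As the exponent changes, so does the distance between consecutive values of the mantissa: the larger the exponent, the farther apart consecutive representable floating-point values are. Thus you may be able to store one number of a given magnitude precisely, but not the "next" number. One should also keep in mind that some seemingly simple fractions cannot be represented precisely with any number of binary digits (e.g., 1/10, one tenth, is an infinitely repeating binary sequence, just as 1/3, one third, is in decimal).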
When it comes to integers, you can precisely represent every integer up to 2^(mantissa_bits + 1) in magnitude. Thus an IEEE-754 float can represent all integers up to 2^24 and a double up to 2^53 (in the last half of these ranges the consecutive floating-point values are exactly one integer apart, since the entire mantissa is used for the integer part only). There are individual larger integers that can be represented, but they are spaced more than one integer apart, i.e., you can represent some integers greater than 2^(mantissa_bits + 1), but every integer only up to that magnitude.
For example:
/* Needs <math.h> (powf, pow) and <stdio.h> (printf). */
float f = powf(2.0f, 24.0f);          /* 2^24 */
float f1 = f + 1.0f, f2 = f + 2.0f;
double d = pow(2.0, 53.0);            /* 2^53 */
double d1 = d + 1.0, d2 = d + 2.0;
(void) printf("2**24 float = %.0f, +1 = %.0f, +2 = %.0f\n", f, f1, f2);
(void) printf("2**53 double = %.0f, +1 = %.0f, +2 = %.0f\n", d, d1, d2);
Outputs:
2**24 float = 16777216, +1 = 16777216, +2 = 16777218
2**53 double = 9007199254740992, +1 = 9007199254740992, +2 = 9007199254740994
As you can see, adding 1 to 2^(mantissa_bits + 1) makes no difference since the result is not representable, but adding 2 does produce the correct answer (as it happens, at this magnitude the representable numbers are two integers apart since the multiplier has doubled).
TL;DR An IEEE-754 float can precisely represent all integers up to 2^24 and a double up to 2^53, but only some integers of greater magnitude (the spacing of representable values depends on the magnitude).