Fast method to round a double to a 32-bit int, explained

2024-04-09

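While reading the source code of Lua http://en.wikipedia.org/wiki/Lua_%28programming_language%29, I noticed that Lua uses a macro to round a double value to a 32-bit int value. The macro is defined in the llimits.h header file http://www.lua.org/source/5.2/llimits.h.html and reads as follows: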

union i_cast {double d; int i[2];};
#define double2int(i, d, t) \
    {volatile union i_cast u; u.d = (d) + 6755399441055744.0; \
    (i) = (t)u.i[ENDIANLOC];}

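Here ENDIANLOC is defined according to endianness http://en.wikipedia.org/wiki/Endianness: 0 for little-endian and 1 for big-endian architectures; Lua handles endianness carefully. The t argument is substituted with an integer type such as int or unsigned int.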
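To see where ENDIANLOC comes from, here is a minimal sketch (not Lua's code) that computes it at run time instead of configuring it. The names endianloc and double2int_union are hypothetical, int32_t replaces int to pin the width, and the sketch assumes IEEE 754 doubles whose two 32-bit halves are stored in the machine's ordinary byte order (true on mainstream platforms), with double arithmetic not carried out in extended precision.

```c
#include <stdint.h>

/* Same shape as Lua's union; int32_t instead of int is an assumption made here. */
union i_cast { double d; int32_t i[2]; };

/* Hypothetical helper: find which half of a double holds the low 32 mantissa bits.
   The magic number plus 1 has the bit pattern 0x4338000000000001, so the half
   that reads back as 1 is the low half. */
int endianloc(void)
{
    volatile union i_cast u;
    u.d = 6755399441055744.0 + 1.0;
    return (u.i[0] == 1) ? 0 : 1;   /* 0: little-endian, 1: big-endian */
}

/* Lua's technique as a function, with the index computed instead of configured. */
int32_t double2int_union(double d)
{
    volatile union i_cast u;
    u.d = d + 6755399441055744.0;   /* rounds d to an integer held in the low mantissa bits */
    return u.i[endianloc()];        /* read the half that holds those bits */
}
```

Reading a union member other than the one last written is permitted in C and reinterprets the bytes, which is why Lua can use a union here.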

I did some research and found a simpler form of the macro that uses the same technique:

/* reads the first int of t: the low 32 bits only on little-endian machines */
#define double2int(i, d) \
    {double t = ((d) + 6755399441055744.0); i = *((int *)(&t));}

Or, in C++ style:

inline int double2int(double d)
{
    d += 6755399441055744.0;
    return reinterpret_cast<int&>(d);  // first 4 bytes of d: the low 32 bits only on little-endian
}

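This trick works on any machine that uses IEEE 754 https://en.wikipedia.org/wiki/IEEE_floating_point (which means almost every machine today). It works for both positive and negative numbers, and rounding follows banker's rule https://en.wikipedia.org/wiki/Rounding#Round_half_to_even. (This is not surprising, since it follows IEEE 754, whose default rounding mode is round half to even.)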
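A minimal sketch of the same technique that sidesteps both the endianness question and the pointer-cast aliasing of the simpler macros: memcpy copies the sum into a uint64_t, and the low 32 bits of that integer are the low 32 bits of the mantissa regardless of byte order. The name double2int_memcpy is hypothetical; the sketch assumes IEEE 754 doubles, double-precision evaluation (for example SSE2 rather than x87 extended precision) and the default round-to-nearest mode. The final uint32_t to int32_t conversion relies on the usual two's complement wraparound, which C before C23 leaves implementation-defined.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name; same magic number as the macros above (1.5 * 2^52). */
int32_t double2int_memcpy(double d)
{
    uint64_t bits;
    d += 6755399441055744.0;          /* the FPU rounds d to an integer (ties to even) */
    memcpy(&bits, &d, sizeof bits);  /* well-defined way to read the bit pattern */
    /* Low 32 bits of the pattern = low 32 bits of the mantissa, on any byte order. */
    return (int32_t)(uint32_t)bits;
}
```

Half-way inputs 0.5, 1.5, 2.5 and -1.5 come out as 0, 2, 2 and -2, the even neighbours, which is the banker's rounding described above. C99's lrint gives the same results under the default rounding mode.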

I wrote a small program to test it:

#include <stdio.h>
/* plus one of the double2int macros above */

int main()
{
    double d = -12345678.9;
    int i;
    double2int(i, d)
    printf("%d\n", i);
    return 0;
}

It outputs -12345679, as expected.

I would like to understand how this tricky macro works in detail. The magic number 6755399441055744.0 is actually 2^51 + 2^52, or 1.5 × 2^52, and 1.5 in binary can be represented as 1.1. When any 32-bit integer is added to this magic number...

Well, I got lost from here. How does this trick work?

Update

  1. As @Mysticial points out, this method does not limit itself to a 32-bit int; it can also be extended to a 64-bit int as long as the number's magnitude stays below 2^51, so that the sum stays between 2^52 and 2^53. (The macro needs some modification, though.)

  2. Some sources say this method cannot be used with Direct3D http://en.wikipedia.org/wiki/Microsoft_Direct3D.

  3. When using the Microsoft x86 assembler, there is an even faster macro written in assembly (the following is also taken from the Lua source):

     #define double2int(i,n)  __asm {__asm fld n   __asm fistp i}
    
  4. There is a similar magic number for single-precision numbers: 1.5 × 2^23.
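For the single-precision magic number, a minimal sketch, assuming IEEE 754 floats and float arithmetic done in single precision. A float has only 32 bits in total, so there is no "lowest 32 bits of the mantissa" to take; this hypothetical float2int instead subtracts the magic number's own bit pattern, 0x4B400000 (biased exponent 150, top mantissa bit set). It is only valid for |f| < 2^22.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical single-precision counterpart: 1.5 * 2^23 = 12582912.0f. */
int32_t float2int(float f)
{
    int32_t bits;
    f += 12582912.0f;                /* move f into [2^23, 2^24), where the spacing is 1 */
    memcpy(&bits, &f, sizeof bits);  /* pattern is now 0x4B400000 + rounded value */
    return bits - 0x4B400000;        /* remove the magic number's own pattern */
}
```

For example, 3.7f, -3.7f and 2.5f give 4, -4 and 2.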


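A value of type double is laid out as follows (IEEE 754 binary64, from most to least significant bit: 1 sign bit, 11 exponent bits, 52 mantissa bits):

This can be viewed as two 32-bit integers; the int taken in all versions of your code (assuming a 32-bit int) is the low-order one, which holds the lowest 32 bits of the mantissa. So all you end up doing is taking the lowest 32 bits of the mantissa.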
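The split described above can be checked directly. This is a sketch with a hypothetical split_double helper that copies the bits out with memcpy (assuming IEEE 754 doubles) and extracts the fields. For the magic number plus 7 it should report exponent 0x433 (52 + 1023), mantissa 0x8000000000007, a high word of 0x43380000 and a low word of 7.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the IEEE 754 fields and the two 32-bit halves. */
typedef struct {
    unsigned sign;       /* 1 bit */
    unsigned exponent;   /* 11 bits, biased by 1023 */
    uint64_t mantissa;   /* 52 stored bits, without the implicit leading 1 */
    uint32_t high_word;  /* sign, exponent and top 20 mantissa bits */
    uint32_t low_word;   /* lowest 32 mantissa bits: what the macros read */
} double_fields;

double_fields split_double(double d)
{
    uint64_t bits;
    double_fields f;
    memcpy(&bits, &d, sizeof bits);
    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FF);
    f.mantissa = bits & UINT64_C(0xFFFFFFFFFFFFF);  /* 13 hex digits = 52 bits */
    f.high_word = (uint32_t)(bits >> 32);
    f.low_word = (uint32_t)bits;
    return f;
}
```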


Now, to the magic number: as you correctly stated, 6755399441055744 is 2^51 + 2^52. Adding such a number forces the double into the "sweet range" between 2^52 and 2^53, which, as explained by Wikipedia https://en.wikipedia.org/wiki/Double_precision_floating-point_format#IEEE_754_double-precision_binary_floating-point_format:_binary64, has an interesting property:

Between 2^52 = 4,503,599,627,370,496 and 2^53 = 9,007,199,254,740,992, the representable numbers are exactly the integers.

This is because the mantissa is 52 bits wide.
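The "exactly the integers" property can be observed by measuring the gap to the next double. The sketch below uses a hypothetical ulp_above helper, valid only for finite positive inputs and assuming IEEE 754 doubles, which steps the bit pattern up by one. The gap is 0.5 at 2^51, 1 anywhere in [2^52, 2^53) and 2 at 2^53, which is why adding the magic number rounds away every fractional part.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: distance from d to the next larger double.
   For finite positive d, adding 1 to the bit pattern gives the next double. */
double ulp_above(double d)
{
    uint64_t bits;
    double next;
    memcpy(&bits, &d, sizeof bits);
    bits += 1;                          /* next representable positive double */
    memcpy(&next, &bits, sizeof next);
    return next - d;                    /* exact: the two values are adjacent */
}
```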

The other interesting fact about adding 2^51 + 2^52 is that it affects the mantissa only in the two highest bits, which are discarded anyway, since we are taking only its lowest 32 bits.
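Because the magic number only contributes the top stored mantissa bit and the implicit 2^52, the whole bit pattern of magic + n is simply the magic number's pattern, 0x4338000000000000, plus n, for any integer n with |n| < 2^51. That suggests the 64-bit extension mentioned in the update. A minimal sketch, with a hypothetical name double2int64 and the same assumptions (IEEE 754 doubles, double-precision evaluation, round to nearest):

```c
#include <stdint.h>
#include <string.h>

/* Bit pattern of the magic number 1.5 * 2^52: exponent 0x433, top mantissa bit set. */
#define MAGIC_BITS INT64_C(0x4338000000000000)

/* Hypothetical 64-bit variant, valid for |d| < 2^51. */
int64_t double2int64(double d)
{
    uint64_t bits;
    d += 6755399441055744.0;         /* stays in [2^52, 2^53), same exponent as the magic */
    memcpy(&bits, &d, sizeof bits);
    /* The sign bit is 0, so bits fits in int64_t and the subtraction cannot overflow. */
    return (int64_t)bits - MAGIC_BITS;
}
```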


Last but not least: the sign.

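IEEE 754 floating point uses a sign-and-magnitude representation, while integers on "normal" machines use 2's complement arithmetic. How is this handled here?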

We talked only about positive integers; now suppose we are dealing with a negative number in the range representable by a 32-bit int, so its absolute value is at most 2^31; call it −a. Such a number is obviously made positive by adding the magic number, and the resulting value is 2^52 + 2^51 + (−a).

Now, what do we get if we interpret the mantissa in 2's complement representation? It must be the result of the 2's complement sum of (2^52 + 2^51) and (−a). Again, the first term affects only the upper two bits; what remains in bits 0–50 is the 2's complement representation of (−a) (again, minus the upper two bits).

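Since reducing a 2's complement number to a smaller width is done simply by dropping the extra bits on the left, taking the lower 32 bits correctly gives (−a) in 32-bit 2's complement arithmetic.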
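The negative case can be checked the same way. This sketch uses a hypothetical helper low_word_of_magic_minus (assuming IEEE 754 doubles) that builds 2^52 + 2^51 - a and returns the low 32 bits of its pattern. They come out as the 32-bit two's complement encoding of -a: 0xFFFFFFFB for a = 5, and 0x80000000 (INT32_MIN) for a = 2^31.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: low 32 bits of the pattern of (magic - a), for 0 <= a <= 2^31. */
uint32_t low_word_of_magic_minus(uint32_t a)
{
    /* 2^52 + 2^51 - a is an integer in [2^52, 2^53), so the subtraction is exact. */
    double d = 6755399441055744.0 - (double)a;
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (uint32_t)bits;  /* drop everything above bit 31 */
}
```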
