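TL;DR: it looks like POWER7 is the minimum requirement for 64-bit element size with AltiVec. This is part of VSX (Vector Scalar Extension) https://en.wikipedia.org/wiki/AltiVec#VSX_(Vector_Scalar_Extension), which Wikipedia confirms first appeared in POWER7.
gcc most likely knows what it's doing, and enables vector intrinsics with 64-bit element size at the lowest -mcpu= setting that supports them.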
#include <altivec.h>
auto vec32(void) { // compiles with your options: Power4
return vec_splats((int) 1);
}
// gcc error: use of 'long long' in AltiVec types is invalid without -mvsx
vector long long vec64(void) {
return vec_splats((long long) 1);
}
(With auto instead of vector long long as the return type, the second function compiles to returning the value in two 64-bit integer registers.)
Adding -mvsx lets the second function compile. Using -mcpu=power7 also works, but -mcpu=power6 doesn't.
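If the same source has to build both with and without VSX, one option is to guard the 64-bit-element code on the __VSX__ macro, which GCC predefines when -mvsx is in effect (either directly or implied by -mcpu=power7). The following is only a minimal sketch: splat64 is a hypothetical helper name, and the scalar fallback exists just so the file still compiles on targets without VSX.

```cpp
#include <cstring>

#if defined(__VSX__)
#include <altivec.h>
// altivec.h may define these as macros; undefining them is the usual
// way to keep ordinary C++ code working after the include.
#undef vector
#undef pixel
#undef bool
#endif

// Hypothetical helper: write two copies of x to out[0] and out[1].
void splat64(long long x, long long out[2]) {
#if defined(__VSX__)
    // 64-bit elements need VSX, so this path is only compiled with -mvsx
    // (or an -mcpu= setting that implies it).
    __vector long long v = vec_splats(x);
    std::memcpy(out, &v, sizeof v);  // copy both 64-bit lanes out
#else
    // Fallback without VSX: no 64-bit vector elements, use plain scalars.
    out[0] = x;
    out[1] = x;
#endif
}
```

Both paths produce the same values, so callers do not need to know which one was compiled.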
Source + asm on Godbolt (PowerPC64 gcc6.3): https://gcc.godbolt.org/#z:OYLghAFBqd5TKALEBjA9gEwKYFFMCWALugE4A0BIEAViAIzkDO6ArqatiAOQCkATAGYCAO1QAbVjgDUvQQGEAhuKIEAbtlQA6JHNy8ADAEFDRjahKlpTAsBHZM00UWnmA%2BgAdJTN01TLFKwhzS2tbe0dnaQAPcicRFz8A0gBKU14AdgAhU2k86QB6ApCyMLsHeJcAI38mFzkAEVdNXy9FIl9Bfgh6NMEsvKLpTv5pRRFHFml0EWmRcQBPaQB3MgBrFeIkaQwROvGiJnTjfMKC6Xlmi1KiBY9sMYnpGsU66WxxbABbbASwgC8Hjh7hMmHNpEQkA9bvdpgAzCFQsakYC5fJDDBfDwET5gkhjayWbAFUgfdCKRzjRxqJheIjLOKQ7CksDcMEidBhETAT7xOqkVgWAgzaRwm5IAhHE75EpWGzlSJ/F5vRpXVridpMCBJDWpOQ5YxovLRWSCJruCmOCCxZ61Ih9A1GU6kojsWbRfXpDINY5mTSheURSpqrysHw6wJuXZvYL%2B0qBipRaJpQ3ZI1nWVlINRZX1M1q2kajpMLo9B3pzMJxXVO2m80tQuasv6wbncTEZnKaT0aRIV4Qu4PZzpoaNw7SbX%2BXUpHboLE47Bg7BwuEEVAEX5ERabSGzvZEA5MRlIpiKH5jPEnr7KcQAWmjB4St5m240pBsM3TJtVFswVptubltKeQum6MSeqmPqGsYRSZuIMzANI8Hcui5zAKgqDWIoCyXpKThgqIajKAQjjLFsbAuLeXw0jaywSqg2yoB4rCNB46DLMyGTvCIihVLipiKKwJCmOYABsAAswToCRM6ZI6zrYK6pCzO4Y5ahAyGIZpM69C2ZzSKBykEbMozONgwDMgZ5lHIIABiXpQSYxiCfi5ilmo0mYLJabAVZSkqQ2bSHFAzg6UBJjetwKTkOIPAAKzcOQIg8AYiXoDw8gCDk/ADCw7CcLIQj0IlRApVF0VrCAcUGDFPDiYlXwgBkBhaEIAAcBgAJxxe1XU9YISVleQ6XcIlTAgDVpXcKl0VwLAKAUcxRCUNQmLYp8pAMOQDjEGQDDRbF3AJYN01pTwAiCD2ZG7gACuxzI3ZcEnSKJWiCFoqXkFNM3RVCFLMtQB11Q1ID8J1WgZKJ9CdQY4mQ9DsOQ6JJ2fSNY0TV9ZUpBVIDie9BgZOJbUZPQ/D0G11VxV0tXcIIiXJadw08CVWPkHNyBoHO63MitEBrQum0eB4qASZ8wCiYINWriozLjRAVRDVUoiBAsPDFeQmI/AkADy8yq4zODXtynxDYQpJCho42M9g0SaEJXDcOrZmHZ9RCkAQjWO1FMUEFU42QNF6AeKoMxW7e2uCNIVFMSxZpsRxpDiVH14qOomjJ6SdhnouY1sBwXC9DTx0M6jPB3Qnj0vUnr34xO%2BC7VYF2MBcXMC4Vl0pCzp3Y%2BQlX8BkrUGGTw8k0TokSQNh31SjZ2jcwGPfd77MQCg/Mbbza8A0LIviWLEtSziRCy9QCuM0rPGkPr6ua5uuuLKb3zjDyDufWb/pp1bn023bR9q4lztDTdh7P%2BB1fb%2BwgIHYOwo9g8HDpHaOzFWL3VIFxKiyhVDmAzuZHiPwmC53ygXIGR16ZDRGuXB6T1q5vQ%2BnXQgoQm5xHkK3Da7d%2BCd0xt3HGYMIZCAMATOK9A%2BEGA6uJGm08S6z3RpNLGRD%2BAkMZmjDhP1yBvg/MlcSQA
# with auto without VSX:
vec64(): # -O3 -mcpu=power4 -maltivec -mregnames
li %r4,1
li %r3,1
blr
vec64(): # -O3 -mcpu=power7 -maltivec -mregnames
.LCF2:
0: addis 2,12,.TOC.-.LCF2@ha
addi 2,2,.TOC.-.LCF2@l
addis %r9,%r2,.LC0@toc@ha
addi %r9,%r9,.LC0@toc@l # PC-relative addressing for static constant, I think.
lxvd2x %vs34,0,%r9 # vector load?
xxpermdi %vs34,%vs34,%vs34,2
blr
.LC0: # in .rodata
.quad 1
.quad 1
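BTW, vec_splats (splat scalar) with a constant compiles to a single instruction. But with a runtime variable (e.g. a function arg), it compiles to an integer store / vector load / vector splat (like the vec_splat intrinsic). Apparently there isn't an int->vec instruction.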
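To compare the two cases described above, a pair of small functions like the following could be pasted into Godbolt. This is only a sketch: Lanes4, splat_constant and splat_runtime are hypothetical names, the std::memcpy into a plain struct is there just to make the lanes observable (so it adds a vector store to the asm), and the non-AltiVec branch is a scalar stand-in for other targets.

```cpp
#include <cstring>

#if defined(__ALTIVEC__)
#include <altivec.h>
// Keep ordinary C++ keywords usable after including altivec.h.
#undef vector
#undef pixel
#undef bool
#endif

// Hypothetical plain-struct view of a vector of four 32-bit ints.
struct Lanes4 { int lane[4]; };

// Constant operand: the compiler knows the value at compile time.
Lanes4 splat_constant() {
    Lanes4 r;
#if defined(__ALTIVEC__)
    __vector signed int v = vec_splats((int) 1);
    std::memcpy(&r, &v, sizeof r);
#else
    for (int i = 0; i < 4; ++i) r.lane[i] = 1;
#endif
    return r;
}

// Runtime operand: the value arrives in an integer register and
// has to be moved into a vector register before it can be splatted.
Lanes4 splat_runtime(int x) {
    Lanes4 r;
#if defined(__ALTIVEC__)
    __vector signed int v = vec_splats(x);
    std::memcpy(&r, &v, sizeof r);
#else
    for (int i = 0; i < 4; ++i) r.lane[i] = x;
#endif
    return r;
}
```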
The vec_splat_s32 and related intrinsics only accept a small (5-bit) constant, so they only compile in cases where the compiler can use the corresponding splat-immediate instruction.
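As a rough illustration of that limit, the sketch below checks whether a value fits a signed 5-bit immediate (-16 to 15, which is what the splat-immediate instructions such as vspltisw encode) and uses vec_splat_s32 only with a literal. The names fits_splat_immediate and splat_five are hypothetical; the intrinsic itself must still be given a literal constant, so the check is documentation rather than a way to pass runtime values.

```cpp
#if defined(__ALTIVEC__)
#include <altivec.h>
// Keep ordinary C++ keywords usable after including altivec.h.
#undef vector
#undef pixel
#undef bool
#endif

// Hypothetical helper: true if value fits a signed 5-bit immediate,
// i.e. the range -16..15.
constexpr bool fits_splat_immediate(int value) {
    return value >= -16 && value <= 15;
}

static_assert(fits_splat_immediate(5), "5 fits a signed 5-bit immediate");
static_assert(!fits_splat_immediate(16), "16 does not fit");

#if defined(__ALTIVEC__)
// Literal in range: the compiler can use a splat-immediate instruction.
__vector signed int splat_five() {
    return vec_splat_s32(5);
}
#endif
```

For values outside that range, or values only known at run time, vec_splats is the intrinsic to reach for instead.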
This Intel SSE to PowerPC AltiVec migration page https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/W51a7ffcf4dfd_4b40_9d82_446ebc23c550/page/Intel%20SSE%20to%20PowerPC%20AltiVec%20migration looks mostly good, but gets this wrong: it claims that vec_splats splats a signed byte.