以下实验表明微指令在加载完成之前的某个时刻被释放。虽然这不是您问题的完整答案,但它可能会提供一些有趣的见解。
在 Skylake 上,有一个 33 个入口的装载预约站(参见https://stackoverflow.com/a/58575898/10461973 https://stackoverflow.com/a/58575898/10461973)。对于下面的实验使用的Coffee Lake i7-8700K也应该是这样。
我们假设R14
包含有效的内存地址。
clflush [R14]
clflush [R14+512]
mfence
# start measuring cycles
mov RAX, [R14]
mov RAX, [R14]
...
mov RAX, [R14]
mov RBX, [R14+512]
# stop measuring cycles
mov RAX, [R14]
展开 35 次。在此系统上,从内存加载至少需要大约 280 个周期。如果加载微指令停留在33个条目的保留站直到完成,则最后一次加载只能在超过280个周期后才开始,并且还需要约280个周期。然而,该实验的总测量时间仅为约 340 个周期。这表明加载微指令在完成之前的某个时间离开了 RS。
相反,以下实验显示了大多数 uop 被迫保留在预留中直到第一次加载完成的情况:
mov RAX, R14
mov [RAX], RAX
clflush [R14]
clflush [R14+512]
mfence
# start measuring cycles
mov RAX, [RAX]
mov RAX, [RAX]
...
mov RAX, [RAX]
mov RBX, [R14+512]
# stop measuring cycles
前 35 个负载现在相互依赖。该实验的测量时间约为 600 个周期。
实验是在除一个核心之外的所有核心都被禁用的情况下进行的,并且 CPU 调速器设置为性能(cpupower frequency-set --governor performance
).
这里有纳米工作台 https://github.com/andreas-abel/nanoBench我使用的命令:
./nanoBench.sh -unroll 1 -basic -asm_init "clflush [R14]; clflush [R14+512]; mfence" -asm "mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RBX, [R14+512]"
./nanoBench.sh -unroll 1 -basic -asm_init "mov RAX, R14; mov [RAX], RAX; clflush [R14]; clflush [R14+512]; mfence" -asm "mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RBX, [R14+512]"