Are load uops freed from the RS at dispatch, at completion, or at some other time?

2024-01-05

On modern Intel¹ x86, are load uops freed from the RS (Reservation Station) at the point they dispatch², when they complete³, or somewhere in between⁴?


¹ I am also interested in AMD Zen and its successors, so feel free to include that too, but to keep the question manageable I limit it to Intel. Also, AMD seems to have a somewhat different load pipeline from Intel's, which may make investigating this on AMD a separate task.

² Dispatch here means leaving the RS for execution.

³ Complete here means the load data has returned and is ready to satisfy dependent uops.

⁴ Or even somewhere outside the time range defined by these two events, which seems unlikely but possible.


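The following experiment indicates that load uops are freed at some point before the load completes. While this is not a complete answer to your question, it may provide some interesting insights.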

On Skylake, there is a 33-entry load reservation station (see https://stackoverflow.com/a/58575898/10461973). The same should hold for the Coffee Lake i7-8700K used for the experiments below.

We assume that R14 contains a valid memory address.

clflush [R14]
clflush [R14+512]
mfence

# start measuring cycles

mov RAX, [R14]
mov RAX, [R14]
...
mov RAX, [R14]

mov RBX, [R14+512]

# stop measuring cycles

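mov RAX, [R14] is unrolled 35 times. On this system, a load from memory takes at least about 280 cycles. There are 36 loads in total but only 33 RS entries, so if load uops stayed in the reservation station until they complete, the last load could only enter the RS and start after more than 280 cycles. Since it accesses a different cache line ([R14+512]), it would then need roughly another 280 cycles. However, the total measured time for this experiment is only about 340 cycles. This indicates that load uops leave the RS at some point before they complete.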
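To make the argument above concrete, here is a toy cycle-level model in Python. It is a sketch under stated assumptions, not a description of real Intel hardware: RS_SIZE = 33 and MISS_LATENCY = 280 come from the text, while HIT_LATENCY, ISSUE_WIDTH and LOAD_PORTS are assumed values, and the names Load and simulate are hypothetical. It compares two policies: an RS entry is released when the load dispatches, or only when its data returns.

```python
"""Toy model of reservation-station (RS) occupancy for the first experiment.

RS_SIZE and MISS_LATENCY are the figures quoted in the text; HIT_LATENCY,
ISSUE_WIDTH and LOAD_PORTS are rough assumptions, not measured values.
"""
from dataclasses import dataclass
from typing import Optional

RS_SIZE = 33        # RS entries available to loads (figure quoted in the text)
MISS_LATENCY = 280  # cycles for a load that misses all caches (from the text)
HIT_LATENCY = 5     # assumed L1d load-use latency
ISSUE_WIDTH = 4     # assumed uops issued into the RS per cycle
LOAD_PORTS = 2      # assumed loads dispatched per cycle


@dataclass
class Load:
    line: int                  # index of the 64-byte cache line accessed
    dep: Optional[int] = None  # index of the load that produces the address


def simulate(loads, free_at_completion):
    """Return the cycle at which the last load's data becomes available."""
    n = len(loads)
    issued = [None] * n      # cycle the load entered the RS
    dispatched = [None] * n  # cycle the load left the RS for a load port
    completed = [None] * n   # cycle the load's data is available
    line_ready = {}          # cache line -> cycle it arrives from memory
    next_issue = 0
    t = 0
    while None in dispatched:
        # Dispatch: oldest ready loads first, limited by the number of load ports.
        ports = LOAD_PORTS
        for i in range(next_issue):
            if ports == 0:
                break
            if dispatched[i] is not None or issued[i] >= t:
                continue
            dep = loads[i].dep
            if dep is not None and (completed[dep] is None or completed[dep] > t):
                continue  # address not ready yet
            dispatched[i] = t
            ports -= 1
            # The first access to a line starts a miss; later ones wait for it.
            line_ready.setdefault(loads[i].line, t + MISS_LATENCY)
            completed[i] = max(line_ready[loads[i].line], t + HIT_LATENCY)

        # An entry is held until dispatch, or until completion if requested.
        def holds_entry(i):
            if free_at_completion:
                return completed[i] is None or completed[i] > t
            return dispatched[i] is None

        occupancy = sum(1 for i in range(next_issue) if holds_entry(i))

        # Issue new loads in program order while RS entries are free.
        budget = ISSUE_WIDTH
        while budget and next_issue < n and occupancy < RS_SIZE:
            issued[next_issue] = t
            next_issue += 1
            occupancy += 1
            budget -= 1
        t += 1
    return max(completed)


# Experiment 1: 35 independent loads from [R14], then one load from [R14+512]
# (512 bytes = 8 cache lines further).
experiment1 = [Load(line=0) for _ in range(35)] + [Load(line=8)]

for policy in (False, True):
    name = "free at completion" if policy else "free at dispatch"
    print(f"{name}: ~{simulate(experiment1, policy)} cycles")
```

Under these assumptions the model gives roughly 300 cycles when entries are released at dispatch and roughly 560 cycles when they are held until completion; only the first is close to the measured ~340 cycles. The model ignores limits on outstanding misses, the ROB and measurement overhead, so only the gap between the two policies is meaningful.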

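In contrast, the following experiment shows a case where most uops are forced to stay in the reservation station until the first load completes: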

mov RAX, R14
mov [RAX], RAX
clflush [R14]
clflush [R14+512]
mfence

# start measuring cycles

mov RAX, [RAX]
mov RAX, [RAX]
...
mov RAX, [RAX]

mov RBX, [R14+512]

# stop measuring cycles

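The first 35 loads now depend on each other: each load's address is the value returned by the previous load. The measured time for this experiment is about 600 cycles.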
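A back-of-the-envelope estimate for this second experiment, assuming the 33-entry RS from above and an L1d hit latency of about 5 cycles (the 5 is an assumption, not a measured value). All chained loads access the line at R14, so only the first one misses. Loads 2 to 35 of the chain cannot dispatch before the first load returns, and these 34 loads already exceed the 33 entries, so the final load from [R14+512] can only enter the RS, and start its own miss, after about one memory latency:

```latex
\begin{aligned}
t_{\text{chain}} &\approx t_{\text{miss}} + 34\,t_{\text{L1}} \approx 280 + 34 \cdot 5 = 450 \text{ cycles} \\
T_2 &\gtrsim t_{\text{miss}} + t_{\text{miss}} \approx 280 + 280 = 560 \text{ cycles}
\end{aligned}
```

The chain finishes before the second miss does, so the bound is set by the two back-to-back misses, which is consistent with the measured ~600 cycles.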

The experiments were run with all cores but one disabled and the CPU frequency governor set to performance (cpupower frequency-set --governor performance).

Here are the nanoBench (https://github.com/andreas-abel/nanoBench) commands I used:

./nanoBench.sh -unroll 1 -basic -asm_init "clflush [R14]; clflush [R14+512]; mfence" -asm "mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RBX, [R14+512]"

./nanoBench.sh -unroll 1 -basic -asm_init "mov RAX, R14; mov [RAX], RAX; clflush [R14]; clflush [R14+512]; mfence" -asm "mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RBX, [R14+512]"
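Typing 35 copies of the load by hand is error-prone. A small Python sketch can build the same two command lines; it uses only the nanoBench flags already shown above (-unroll, -basic, -asm_init, -asm), and N_LOADS and nanobench_cmd are hypothetical names chosen for this example.

```python
"""Build the two nanoBench command lines instead of typing 35 copies by hand."""
import shlex

N_LOADS = 35  # number of unrolled loads in each experiment


def nanobench_cmd(init, body):
    """Join instruction lists with '; ' and quote them for the shell."""
    return " ".join([
        "./nanoBench.sh", "-unroll", "1", "-basic",
        "-asm_init", shlex.quote("; ".join(init)),
        "-asm", shlex.quote("; ".join(body)),
    ])


flush = ["clflush [R14]", "clflush [R14+512]", "mfence"]
final_load = "mov RBX, [R14+512]"

# Experiment 1: independent loads from the same (flushed) address.
independent = nanobench_cmd(flush, ["mov RAX, [R14]"] * N_LOADS + [final_load])

# Experiment 2: [R14] holds its own address, so each load feeds the next one.
chained = nanobench_cmd(["mov RAX, R14", "mov [RAX], RAX"] + flush,
                        ["mov RAX, [RAX]"] * N_LOADS + [final_load])

print(independent)
print(chained)
```

shlex.quote wraps each instruction list in single quotes instead of the double quotes used above; the shell passes the same arguments either way.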
