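The purpose of MESI on x86 is the same as on pretty much any multi-core/multi-CPU system: to enforce cache coherency. There is no "partial coherency" in the cache coherency part of the equation on x86: the caches are fully coherent. The possible re-orderings, then, are a consequence of both the coherent caching system and its interaction with core-local components such as the load/store subsystem (especially store buffers) and other out-of-order machinery.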
The result of that interaction is the architected strong memory model that x86 provides, with only limited re-ordering. Without coherent caches, you couldn't reasonably implement this model at all, or almost any model other than a completely weak one [1].
Your question seems to embed the assumption that there are only two possible states: "coherent" and "everything else". It also mixes up cache coherency (which mostly deals with the caches specifically, and is mostly a hidden detail) with the memory consistency model, which is architecturally defined and must be implemented by every chip of that architecture [2]. Wikipedia explains (https://en.wikipedia.org/wiki/Cache_coherence) that one difference between cache coherency and memory consistency is that the rules for the former apply to only one location at a time, whereas consistency rules apply across locations. In practice, the more important distinction is that the memory consistency model is the only one of the two that is architecturally documented.
Briefly, Intel (and AMD likewise) define a specific memory consistency model, x86-TSO [3] (https://www.cl.cam.ac.uk/~pes20/weakmemory/cacm.pdf), which is relatively strong as far as memory models go, but is still weaker than sequential consistency (https://en.wikipedia.org/wiki/Sequential_consistency). The primary behaviors weakened compared to sequential consistency are:
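- Later loads can pass earlier stores.
- Stores can be seen in an order different from the total store order, but only by a core that performed one of those stores.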
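The first weakening is usually illustrated with the "store buffering" litmus test. The sketch below is one way to write it in C++; the function name `count_both_zero` and the iteration count are made up for illustration. Each thread stores to one variable and then loads the other. Under sequential consistency at least one thread must observe the other's store, so both loads returning 0 is impossible. With relaxed atomics that outcome is allowed, both because the compiler may reorder the accesses and because on x86 each store can still be sitting in its core's store buffer when the following load executes. Whether you actually observe it on a given run depends on timing.

```cpp
#include <atomic>
#include <thread>

// Runs the store-buffering litmus test `iterations` times using memory order
// `Order` for every access, and returns how often BOTH threads read 0.
// (Hypothetical helper, not taken from any library.)
template <std::memory_order Order>
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, Order);   // earlier store
            r1 = y.load(Order);  // later load, may pass the store if Order is relaxed
        });
        std::thread t2([&] {
            y.store(1, Order);
            r2 = x.load(Order);
        });
        t1.join();  // join() makes r1 and r2 safely visible to this thread
        t2.join();

        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling `count_both_zero<std::memory_order_seq_cst>(n)` must return 0 for any `n`, because the language forbids the both-zero outcome for sequentially consistent atomics. Calling `count_both_zero<std::memory_order_relaxed>(n)` may return a non-zero count, although starting fresh threads for every iteration makes the race window small, so seeing 0 there does not prove the re-ordering cannot happen.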
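In order to implement this memory model, the various parts must play by the rules to achieve it. On all recent x86, this means ordered load and store buffers, which avoid disallowed re-orderings. The use of a store buffer results in the two re-orderings mentioned above: if these were not allowed, the implementation would be very constrained and probably much slower. In practice it also means fully coherent data caches, since many of the guarantees (e.g., no load-load re-ordering) would be very difficult to implement without them.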
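To wrap it all up:
- Memory consistency is different from cache coherency: the former is what is documented and forms part of the programming model.
- In practice, x86 implementations have fully coherent caches, which helps them implement the x86-TSO memory model, which is fairly strong but weaker than sequential consistency.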
- Finally, perhaps the answer you were looking for, in different words: a memory model weaker than sequential consistency is still very useful since you can program against it, and in the cases where you need sequential consistency for some particular operation(s), you insert the right memory barriers [4].
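- If you program against a language-provided memory model, such as Java's (https://en.wikipedia.org/wiki/Java_memory_model) or C++11's (http://en.cppreference.com/w/cpp/language/memory_model), you don't need to worry about the hardware specifics, only about the language memory model: the compiler inserts the barriers required to map the language memory model semantics onto the hardware's. The stronger the hardware model, the fewer barriers are required.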
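As a minimal sketch of programming against the language model rather than the hardware, here is the classic "message passing" pattern in C++11 atomics. The function name `publish_and_read` is hypothetical. The code only states what it needs (release on the publishing store, acquire on the consuming load), and the compiler maps that to the target. On x86, compilers typically emit plain `mov` instructions for both, because x86-TSO already forbids store-store and load-load re-ordering. On a weaker architecture the same source would need barriers or special load/store instructions.

```cpp
#include <atomic>
#include <thread>

// Publishes `value` from a producer thread through a flag and returns what
// the consumer thread read. (Hypothetical helper for illustration.)
int publish_and_read(int value) {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the data
    int seen = -1;

    std::thread producer([&] {
        payload = value;                               // (1) write the data
        ready.store(true, std::memory_order_release);  // (2) publish it
    });
    std::thread consumer([&] {
        // (3) spin until the flag is set; acquire pairs with the release store
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // (4) guaranteed to observe the write from (1)
    });

    producer.join();
    consumer.join();
    return seen;
}
```

For example, `publish_and_read(42)` returns 42 on every conforming implementation. If the store in (2) were `std::memory_order_seq_cst` instead, an x86 compiler would have to use a locked instruction such as `xchg`, or a `mov` followed by a fence, which is the extra cost you pay only where you actually ask for sequential consistency.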
[1] If your memory model were completely weak, i.e., not really placing any restrictions on cross-core re-ordering, I suppose you could implement it directly and cheaply on a non-cache-coherent system for normal operations, but then memory barriers potentially become very expensive, since they would need to flush a potentially large part of the local private cache.
[2] Various chips may implement it differently internally, and in particular some chips may implement stronger semantics than the model (i.e., some allowed re-orderings can never be observed), but absent bugs none will implement a weaker one.
[3] This is the name given to it in that paper, which I use because Intel themselves don't give it a name. The paper also provides a more formal definition than Intel, who describe the model less formally as a series of litmus tests.
[4] In practice on x86 you usually use locked instructions (using the lock prefix) rather than separate barriers, although standalone barriers also exist. Here I'll just use the term barriers to refer to both standalone barriers and the barrier semantics embedded in locked instructions.