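The short answer is "it's complicated, but accesses can definitely occur in parallel in some situations".
I think your question is a bit too black and white: you are probably looking for an answer like "yes, multiple devices can access memory at the same time" or "no, they can't". In reality, you would first need to describe a specific hardware configuration, including some low-level implementation details and optimization features, to get an exact answer. You would also need to define precisely what you mean by "the same time".
In general, a good first-order approximation is that hardware makes it appear that all devices can access memory more or less simultaneously, possibly with increased latency and reduced bandwidth due to contention. At a very fine-grained timing level, an access by one device may or may not delay an access by another device, depending on many factors. It is extremely unlikely that you need this information to implement software correctly, and fairly unlikely that you need the details even to maximize performance.
That said, if you really do need the details, read on: I can offer some general observations about a somewhat idealized piece of laptop/desktop/server-scale hardware.
As Matthias mentioned, you first have to consider caching. Caching means that any read or write subject to caching (which includes nearly all CPU requests and many other types of requests) may not touch memory at all, so in that sense many cores can "access" memory (or at least the cached image of it) simultaneously.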
If you then consider requests that miss in all cache levels, you need to know about the configuration of the memory subsystem. In general, a RAM chip can only do "one thing" at a time (i.e., commands [1] such as read and write apply to the entire module), and that usually extends to DRAM modules composed of several chips and also to a series of DRAMs connected via a bus to a single memory controller.
So you can say that, electrically speaking, the combination of one memory controller and its attached RAM is likely to be doing only one thing at once. That thing is usually something like reading bytes out of a physically contiguous span of memory, but that one operation can actually help serve several requests from different devices at once: even though each device sends separate requests to the controller, good implementations will coalesce requests to the same or nearby [2] area of memory.
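Furthermore, even the CPU may have this kind of ability: when a new request arrives, it can (or must) notice that an existing request is already in flight for an overlapping region, and tie the new request to the old one.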
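As a rough illustration of the coalescing idea, here is a toy model in Python. It is only a sketch under stated assumptions: the 64-byte block size, the device names, and the `coalesce` helper are all hypothetical, and real controllers make these decisions in hardware with far more state than this.

```python
from collections import defaultdict

# Assumed transfer granularity in bytes (hypothetical; real hardware varies).
BLOCK_BYTES = 64


def coalesce(requests, block_bytes=BLOCK_BYTES):
    """Group (device, address) requests by the aligned block they fall in.

    Each key of the result stands for one physical access; its value lists
    the devices whose requests that single access would satisfy.
    """
    blocks = defaultdict(list)
    for device, address in requests:
        block_start = address // block_bytes * block_bytes
        blocks[block_start].append(device)
    return dict(blocks)


# Four requests from four devices, three of which hit the same block.
requests = [("cpu0", 0x1000), ("gpu", 0x1008), ("nic", 0x2000), ("cpu1", 0x1030)]
merged = coalesce(requests)

for block_start, devices in merged.items():
    print(f"block {block_start:#x} serves {devices}")
```

In this toy model the four requests collapse into two accesses, which is the sense in which one electrical operation can serve several devices at once.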
Still, you can say that a single memory controller will usually be serving the request of one device at a time, absent unusual opportunities to combine requests. The requests themselves typically take on the order of nanoseconds, so many separate requests can be served in a small unit of time, which means this "exclusiveness" is fine-grained and not generally noticeable [3].
Above I was careful to limit the discussion to a single memory controller. When you have multiple memory controllers [4], you can definitely have multiple devices accessing memory simultaneously, even at the RAM level. Each controller is essentially independent, so if the requests from two devices map to different controllers (different NUMA regions), they can proceed in parallel.
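To make the multi-controller point concrete, here is another small Python sketch. It assumes, purely for illustration, that each controller owns contiguous 1 GiB regions of the physical address space and serves one request per time slot; `REGION_BYTES`, `controller_for`, and `slots_needed` are hypothetical names, and real address-to-controller mappings (interleaving, NUMA layout) are platform-specific.

```python
from collections import Counter

# Assumed: controllers own contiguous 1 GiB regions (hypothetical mapping).
REGION_BYTES = 1 << 30


def controller_for(address, num_controllers, region_bytes=REGION_BYTES):
    """Return the index of the controller that owns this physical address."""
    return (address // region_bytes) % num_controllers


def slots_needed(addresses, num_controllers):
    """Time slots needed if each controller serves one request per slot.

    Controllers work in parallel, so the busiest controller sets the total.
    """
    load = Counter(controller_for(a, num_controllers) for a in addresses)
    return max(load.values(), default=0)


# Two requests in the first 1 GiB region, two in the second.
addresses = [0x0000_1000, 0x4000_1000, 0x0000_2000, 0x4000_2000]

print("one controller: ", slots_needed(addresses, 1), "slots")
print("two controllers:", slots_needed(addresses, 2), "slots")
```

With one controller all four requests are serialized; with two, requests that land on different controllers overlap, so the same work finishes in half as many slots in this model.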
So that's the long answer.
[1] In fact, the command stream is lower level and more complex than things like "read" or "write", and involves concepts such as opening a memory page, streaming bytes from it, etc. "What Every Programmer Should Know About Memory" serves as an excellent intro to the topic.
[2] For example, imagine two requests for adjacent bytes in memory: the controller may be able to combine them into a single request if they fit within the bus width.
[3] Of course, if several devices are competing for memory, the overall impact may be very noticeable: a reduction in per-device bandwidth and an increase in latency. What I mean is that the sharing is fine-grained enough that you generally can't tell the difference between finely sliced exclusive access and some hypothetical device that makes simultaneous progress on every request in each period.
[4] The most common configuration on modern hardware is one memory controller per socket, so on a 2P system you'd usually have two controllers, although other ratios (both higher and lower) are certainly possible.