Barrier Capabilities of Intel LOCK-Prefixed Instructions

2023-11-01

Besides making a single operation atomic, an Intel LOCK-prefixed instruction also provides visibility and ordering guarantees.

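For the single-operation atomicity and the visibility of LOCK-prefixed instructions, see the two articles linked below. In essence, both come down to locking the bus or locking the cache line, combined with the cache coherence protocol.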

  • The Intel LOCK prefix instruction: https://blog.csdn.net/reliveIT/article/details/90038750
  • Memory barrier implementation in HotSpot on x86: https://blog.csdn.net/reliveIT/article/details/121945327
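The toy model below makes the single-operation atomicity concrete. It is my own sketch, not taken from the linked articles. Two threads each increment a shared counter once, and every interleaving of their steps is enumerated.

The model rests on two assumptions:
  • A plain memory `add` behaves, as far as other processors are concerned, like a separate read followed by a write.
  • `lock add` is one indivisible read-modify-write.

The names `PLAIN_INC`, `LOCKED_INC` and `final_counts` are hypothetical.

```python
from itertools import combinations


def interleavings(len_a, len_b):
    """Yield every merge of two instruction streams as a list of thread ids (0 or 1)."""
    total = len_a + len_b
    for slots in combinations(range(total), len_a):
        chosen = set(slots)
        yield [0 if i in chosen else 1 for i in range(total)]


def final_counts(program):
    """Run `program` on two threads, starting from counter == 0, under every
    interleaving and return the set of final counter values."""
    results = set()
    for order in interleavings(len(program), len(program)):
        counter = 0
        reg = [0, 0]  # private register of each thread
        pc = [0, 0]   # index of each thread's next instruction
        for t in order:
            op = program[pc[t]]
            pc[t] += 1
            if op == "load":
                reg[t] = counter
            elif op == "store_inc":
                counter = reg[t] + 1
            elif op == "locked_inc":
                counter += 1  # read-modify-write performed as one indivisible step
        results.add(counter)
    return results


# Hypothetical encodings: plain `add` = separate read and write; `lock add` = one atomic step.
PLAIN_INC = ["load", "store_inc"]
LOCKED_INC = ["locked_inc"]

print(sorted(final_counts(PLAIN_INC)))   # [1, 2]  (one update can be lost)
print(sorted(final_counts(LOCKED_INC)))  # [2]
```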

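Note in particular that the ordering guarantee of a LOCK-prefixed instruction only forbids hardware (processor) reordering. It does not stop the compiler from reordering.

Compiler reordering is prevented by two mechanisms:
  • C++ volatile, which keeps volatile accesses in order relative to each other.
  • Inline assembly whose clobber list includes "memory".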

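I intend to find time to write a post titled "On volatile" (《聊聊volatile》). It would explain how the HotSpot VM on x86 implements the memory semantics that the JSR-133 Java Memory Model defines for Java volatile. That means showing how it prevents both compiler reordering and processor reordering, including reordering between volatile and non-volatile variables.

Knowing how something works and writing it up clearly for others are two different things, and the latter is tedious and time-consuming. So I am taking a divide-and-conquer approach and will pull the pieces together when I find the time.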

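In the Intel CPU manual (the Intel 64 and IA-32 Architectures Software Developer's Manual), the material on how LOCK-prefixed instructions prevent processor reordering is concentrated in Volume 3. This post collects and organizes it for later reference.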

11.10  STORE BUFFER

Intel 64 and IA-32 processors temporarily store each write (store) to memory in a store buffer. The store buffer improves processor performance by allowing the processor to continue executing instructions without having to wait until a write to memory and/or to a cache is complete. It also allows writes to be delayed for more efficient use of memory-access bus cycles.

In general, the existence of the store buffer is transparent to software, even in systems that use multiple processors. The processor ensures that write operations are always carried out in program order. It also insures that the contents of the store buffer are always drained to memory in the following situations:

  • When an exception or interrupt is generated.
  • (P6 and more recent processor families only) When a serializing instruction is executed.
  • When an I/O instruction is executed.
  • When a LOCK operation is performed.
  • (P6 and more recent processor families only) When a BINIT operation is performed.
  • (Pentium III, and more recent processor families only) When using an SFENCE instruction to order stores.
  • (Pentium 4 and more recent processor families only) When using an MFENCE instruction to order stores.

The discussion of write ordering in Section 8.2, “Memory Ordering,” gives a detailed description of the operation of the store buffer.
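The store-buffer behavior described above can be sketched as a toy model. It rests on these simplifying assumptions:
  • Each processor has one FIFO buffer.
  • Memory is a Python dict.
  • Only the LOCK and MFENCE drain conditions are modeled. Exceptions, I/O and serializing instructions are left out.

The class and method names are hypothetical.

```python
from collections import deque


class Processor:
    """Toy logical processor with a FIFO store buffer in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory         # shared dict: address -> value (missing means 0)
        self.store_buffer = deque()  # pending (address, value) pairs, oldest first

    def store(self, addr, value):
        # The write is only buffered; other processors cannot see it yet.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Store-buffer forwarding: the newest pending write to this address wins.
        for pending_addr, value in reversed(self.store_buffer):
            if pending_addr == addr:
                return value
        return self.memory.get(addr, 0)

    def drain(self):
        # Pending writes reach memory in program order.
        while self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value

    def mfence(self):
        self.drain()

    def locked_add(self, addr, delta):
        # A LOCK operation first drains the store buffer, then performs an
        # indivisible read-modify-write on memory and returns the old value.
        self.drain()
        old = self.memory.get(addr, 0)
        self.memory[addr] = old + delta
        return old


mem = {}
p0, p1 = Processor(mem), Processor(mem)
p0.store("x", 1)
print(p0.load("x"), p1.load("x"))  # 1 0: p0 sees its own buffered write, p1 does not
p0.locked_add("count", 1)
print(p1.load("x"))                # 1: the locked add drained p0's store buffer
```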

8.2  MEMORY ORDERING

The term memory ordering refers to the order in which the processor issues reads (loads) and writes (stores) through the system bus to system memory. The Intel 64 and IA-32 architectures support several memory-ordering models depending on the implementation of the architecture. For example, the Intel386 processor enforces program ordering (generally referred to as strong ordering), where reads and writes are issued on the system bus in the order they occur in the instruction stream under all circumstances. 

To allow performance optimization of instruction execution, the IA-32 architecture allows departures from strong-ordering model called processor ordering in Pentium 4, Intel Xeon, and P6 family processors. These processor-ordering variations (called here the memory-ordering model) allow performance enhancing operations such as allowing reads to go ahead of buffered writes. The goal of any of these variations is to increase instruction execution speeds, while maintaining memory coherency, even in multiple-processor system.

8.2.1  Memory Ordering in the Intel® Pentium® and Intel486™ Processors

The Pentium and Intel486 processors follow the processor-ordered memory model; however, they operate as strongly-ordered processors under most circumstances. Reads and writes always appear in programmed order at the system bus—except for the following situation where processor ordering is exhibited. Read misses are permitted to go ahead of buffered writes on the system bus when all the buffered writes are cache hits and, therefore, are not directed to the same address being accessed by the read miss. 

In the case of I/O operations, both reads and writes always appear in programmed order.

Software intended to operate correctly in processor-ordered processors (such as the Pentium 4, Intel Xeon, and P6 family processors) should not depend on the relatively strong ordering of the Pentium or Intel486 processors. Instead, it should ensure that accesses to shared variables that are intended to control concurrent execution among processors are explicitly required to obey program ordering through the use of appropriate locking or serializing operations (see Section 8.2.5, “Strengthening or Weakening the Memory-Ordering Model”).

8.2.2  Memory Ordering in P6 and More Recent Processor Families

The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium 4, and P6 family processors also use a processor-ordered memory-ordering model that can be further defined as “write ordered with store-buffer forwarding.” This model can be characterized as follows. 

In a single-processor system for memory regions defined as write-back cacheable, the memory-ordering model respects the following principles (Note the memory-ordering principles for single-processor and multiple-processor systems are written from the perspective of software executing on the processor, where the term “processor” refers to a logical processor. For example, a physical processor supporting multiple cores and/or HyperThreading Technology is treated as a multi-processor systems.):

  • Reads are not reordered with other reads.
  • Writes are not reordered with older reads.
  • Writes to memory are not reordered with other writes, with the following exceptions:
    • writes executed with the CLFLUSH instruction;
    • streaming stores (writes) executed with the non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD); 
    • and string operations (see Section 8.2.4.1).
  • Reads may be reordered with older writes to different locations but not with older writes to the same location. 
  • Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions.
  • Reads cannot pass earlier LFENCE and MFENCE instructions.
  • Writes cannot pass earlier LFENCE, SFENCE, and MFENCE instructions.
  • LFENCE instructions cannot pass earlier reads.
  • SFENCE instructions cannot pass earlier writes.
  • MFENCE instructions cannot pass earlier reads or writes.
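The single-processor principles for ordinary loads, stores and locked instructions can be condensed into a small predicate. This is my own encoding of the list above, restricted to WB memory. Fences, non-temporal stores, CLFLUSH and string operations are deliberately out of scope, and `may_pass` is a hypothetical name.

```python
# Operation kinds modeled in this sketch (ordinary WB-memory accesses only).
KINDS = ("load", "store", "locked")


def may_pass(later, earlier, same_address=False):
    """Return True if `later` may be performed before an `earlier` operation
    of the same logical processor, per the principles listed above."""
    if later not in KINDS or earlier not in KINDS:
        raise ValueError(f"unsupported operation kind: {later!r} / {earlier!r}")
    # Reads or writes cannot be reordered with locked instructions, in either direction.
    if "locked" in (later, earlier):
        return False
    # The one relaxation: a read may pass an older write to a different location.
    if later == "load" and earlier == "store":
        return not same_address
    # load after load, store after load, store after store: program order is kept.
    return False


if __name__ == "__main__":
    for earlier in KINDS:
        for later in KINDS:
            print(f"{later:>6} may pass earlier {earlier:>6}: {may_pass(later, earlier)}")
```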

In a multiple-processor system, the following ordering principles apply:

  • Individual processors use the same ordering principles as in a single-processor system.
  • Writes by a single processor are observed in the same order by all processors.
  • Writes from an individual processor are NOT ordered with respect to the writes from other processors.
  • Memory ordering obeys causality (memory ordering respects transitive visibility).
  • Any two stores are seen in a consistent order by processors other than those performing the stores
  • Locked instructions have a total order.

8.2.3.2   Neither Loads Nor Stores Are Reordered with Like Operations

The Intel-64 memory-ordering model allows neither loads nor stores to be reordered with the same kind of operation. That is, it ensures that loads are seen in program order and that stores are seen in program order. 

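Note: for two shared variables with no dependency between them, x86 does not reorder a read with an earlier read, or a write with an earlier write.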

8.2.3.3   Stores Are Not Reordered With Earlier Loads

The Intel-64 memory-ordering model ensures that a store by a processor may not occur before a previous load by the same processor. 

Note: for two shared variables with no dependency between them, x86 does not reorder a write with an earlier read.

8.2.3.4   Loads May Be Reordered with Earlier Stores to Different Locations

The Intel-64 memory-ordering model allows a load to be reordered with an earlier store to a different location. However, loads are not reordered with stores to the same location.

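Note: for two shared variables with no dependency between them, x86 does allow a read to be reordered with an earlier write (store-load reordering).

Apart from the exceptions listed in 8.2.2 (CLFLUSH, non-temporal stores and string operations), this is the only reordering x86 permits for ordinary write-back memory accesses. It is exactly the reordering that a LOCK-prefixed instruction (or MFENCE) is used to forbid.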
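The notes on 8.2.3.2 through 8.2.3.4 can be checked with the classic litmus tests. The sketch below enumerates every execution of a toy store-buffer model with these rules:
  • Each thread runs in program order.
  • Stores wait in a per-thread FIFO store buffer and drain to memory at arbitrary times.
  • A load first looks in its own buffer (store-buffer forwarding).
  • The `("lock",)` step stands for a locked no-op such as `lock addl $0,(%rsp)`. It is assumed to complete only once its own store buffer is empty, per 11.10 and 8.2.3.9.

This is a sketch for intuition, not a complete formal model of x86. SB, MP and LB are the conventional litmus-test names, but the instruction encoding is made up.

```python
def outcomes(threads):
    """Return every possible tuple of final register values (ordered by register
    name) for a litmus test under a toy TSO-style store-buffer model."""
    results = set()

    def step(pcs, bufs, mem, regs):
        finished = True
        for t, prog in enumerate(threads):
            if bufs[t]:
                # The oldest buffered store of thread t becomes globally visible.
                finished = False
                (addr, val), rest = bufs[t][0], bufs[t][1:]
                step(pcs, bufs[:t] + (rest,) + bufs[t + 1:], {**mem, addr: val}, regs)
            if pcs[t] < len(prog):
                finished = False
                op = prog[pcs[t]]
                nxt = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if op[0] == "st":
                    new_buf = bufs[t] + ((op[1], op[2]),)
                    step(nxt, bufs[:t] + (new_buf,) + bufs[t + 1:], mem, regs)
                elif op[0] == "ld":
                    forwarded = [v for a, v in bufs[t] if a == op[1]]
                    val = forwarded[-1] if forwarded else mem.get(op[1], 0)
                    step(nxt, bufs, mem, {**regs, op[2]: val})
                elif op[0] == "lock" and not bufs[t]:
                    # A locked instruction cannot complete until the store buffer is empty.
                    step(nxt, bufs, mem, regs)
        if finished:
            results.add(tuple(regs[r] for r in sorted(regs)))

    step((0,) * len(threads), ((),) * len(threads), {}, {})
    return results


SB = [[("st", "x", 1), ("ld", "y", "r0")],
      [("st", "y", 1), ("ld", "x", "r1")]]
SB_LOCK = [[("st", "x", 1), ("lock",), ("ld", "y", "r0")],
           [("st", "y", 1), ("lock",), ("ld", "x", "r1")]]
MP = [[("st", "x", 1), ("st", "y", 1)],
      [("ld", "y", "r0"), ("ld", "x", "r1")]]
LB = [[("ld", "x", "r0"), ("st", "y", 1)],
      [("ld", "y", "r1"), ("st", "x", 1)]]

print(sorted(outcomes(SB)))       # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sorted(outcomes(SB_LOCK)))  # [(0, 1), (1, 0), (1, 1)]
print(sorted(outcomes(MP)))       # [(0, 0), (0, 1), (1, 1)]
print(sorted(outcomes(LB)))       # [(0, 0), (0, 1), (1, 0)]
```

The results line up with the notes:
  • SB (store buffering) is the only test that admits r0 == r1 == 0. That outcome is the store-load reordering from the note above, and inserting the locked instruction removes it.
  • MP never yields (1, 0), because stores drain in order and loads are not reordered with loads.
  • LB never yields (1, 1), because a store cannot become visible before an earlier load.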

8.2.3.6   Stores Are Transitively Visible

The memory-ordering model ensures transitive visibility of stores; stores that are causally related appear to all processors to occur in an order consistent with the causality relation. 

8.2.3.8   Locked Instructions Have a Total Order

The memory-ordering model ensures that all processors agree on a single execution order of all locked instructions, including those that are larger than 8 bytes or are not naturally aligned. 

8.2.3.9   Loads and Stores Are Not Reordered with Locked Instructions

The memory-ordering model prevents loads and stores from being reordered with locked instructions that execute earlier or later. The examples in this section illustrate only cases in which a locked instruction is executed before a load or a store. The reader should note that reordering is prevented also if the locked instruction is executed after a load or a store.

22.34  STORE BUFFERS AND MEMORY ORDERING

The Pentium 4, Intel Xeon, and P6 family processors provide a store buffer for temporary storage of writes (stores) to memory (see Section 11.10, “Store Buffer”). Writes stored in the store buffer(s) are always written to memory in program order, with the exception of “fast string” store operations (see Section 8.2.4, “Fast-String Operation and Out-of-Order Stores”).

The Pentium processor has two store buffers, one corresponding to each of the pipelines. Writes in these buffers are always written to memory in the order they were generated by the processor core.

It should be noted that only memory writes are buffered and I/O writes are not. The Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors do not synchronize the completion of memory writes on the bus and instruction execution after a write. An I/O, locked, or serializing instruction needs to be executed to synchronize writes with the next instruction (see Section 8.3, “Serializing Instructions”).

The Pentium 4, Intel Xeon, and P6 family processors use processor ordering to maintain consistency in the order that data is read (loaded) and written (stored) in a program and the order the processor actually carries out the reads and writes. With this type of ordering, reads can be carried out speculatively and in any order, reads can pass buffered writes, and writes to memory are always carried out in program order. (See Section 8.2, “Memory Ordering,” for more information about processor ordering.) The Pentium III processor introduced a new instruction to serialize writes and make them globally visible. Memory ordering issues can arise between a producer and a consumer of data. The SFENCE instruction provides a performance-efficient way of ensuring ordering between routines that produce weakly-ordered results and routines that consume this data.

No re-ordering of reads occurs on the Pentium processor, except under the condition noted in Section 8.2.1, “Memory Ordering in the Intel® Pentium® and Intel486™ Processors,” and in the following paragraph describing the Intel486 processor. 

Specifically, the store buffers are flushed before the IN instruction is executed. No reads (as a result of cache miss) are reordered around previously generated writes sitting in the store buffers. The implication of this is that the store buffers will be flushed or emptied before a subsequent bus cycle is run on the external bus.

On both the Intel486 and Pentium processors, under certain conditions, a memory read will go onto the external bus before the pending memory writes in the buffer even though the writes occurred earlier in the program execution. A memory read will only be reordered in front of all writes pending in the buffers if all writes pending in the buffers are cache hits and the read is a cache miss. Under these conditions, the Intel486 and Pentium processors will not read from an external memory location that needs to be updated by one of the pending writes. 

During a locked bus cycle, the Intel486 processor will always access external memory, it will never look for the location in the on-chip cache. All data pending in the Intel486 processor's store buffers will be written to memory before a locked cycle is allowed to proceed to the external bus. Thus, the locked bus cycle can be used for eliminating the possibility of reordering read cycles on the Intel486 processor. The Pentium processor does check its cache on a read-modify-write access and, if the cache line has been modified, writes the contents back to memory before locking the bus. The P6 family processors write to their cache on a read-modify-write operation (if the access does not split across a cache line) and does not write back to system memory. If the access does split across a cache line, it locks the bus and accesses system memory.
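According to the paragraph above, P6 family processors can complete a locked read-modify-write in their cache, but fall back to locking the bus when the access splits across a cache line. The check that decides between the two cases can be sketched with a hypothetical helper.

The line size is a parameter, because it depends on the processor. The default of 64 bytes is an assumption typical of current Intel cores.

```python
def splits_cache_line(addr, size, line_size=64):
    """Return True if an access of `size` bytes starting at byte address `addr`
    touches more than one cache line of `line_size` bytes."""
    if size <= 0:
        raise ValueError("size must be positive")
    first_line = addr // line_size
    last_line = (addr + size - 1) // line_size
    return first_line != last_line


print(splits_cache_line(0x1038, 8))  # False: bytes 0x1038..0x103F stay in one line
print(splits_cache_line(0x103C, 8))  # True: bytes 0x103C..0x1043 cross into the next line
```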

I/O reads are never reordered in front of buffered memory writes on an IA-32 processor. This ensures an update of all memory locations before reading the status from an I/O device.

22.35  BUS LOCKING

The Intel 286 processor performs the bus locking differently than the Intel P6 family, Pentium, Intel486, and Intel386 processors. Programs that use forms of memory locking specific to the Intel 286 processor may not run properly when run on later processors.

A locked instruction is guaranteed to lock only the area of memory defined by the destination operand, but may lock a larger memory area. For example, typical 8086 and Intel 286 configurations lock the entire physical memory space. Programmers should not depend on this.

On the Intel 286 processor, the LOCK prefix is sensitive to IOPL. If the CPL is greater than the IOPL, a general-protection exception (#GP) is generated. On the Intel386 DX, Intel486, and Pentium, and P6 family processors, no check against IOPL is performed.

The Pentium processor automatically asserts the LOCK# signal when acknowledging external interrupts. After signaling an interrupt request, an external interrupt controller may use the data bus to send the interrupt vector to the processor. After receiving the interrupt request signal, the processor asserts LOCK# to insure that no other data appears on the data bus until the interrupt vector is received. This bus locking does not occur on the P6 family processors.
