A detailed explanation of memory barriers in the Linux kernel
Preface
I recently read a discussion of sequential consistency and cache coherence, which gave me a clearer understanding of the difference and the connection between these two concepts. The Linux kernel contains many synchronization and barrier mechanisms, which I would like to summarize here.
Cache coherence
I used to think that many mechanisms in Linux existed to ensure cache coherence, but in fact cache coherence is mostly achieved by hardware. Only lock-prefixed instructions have anything to do with the cache (this is not strictly true, but as things stand it holds in most cases). Most of the time, what we actually want to guarantee is sequential consistency.
Cache coherence means the following: in a multi-processor system, each CPU has its own L1 cache, and the same piece of memory may be cached in the L1 caches of different CPUs. When one CPU changes its cached copy, it must be guaranteed that the other CPUs read the latest content when they access that data. But don't worry, this complex work is done entirely by the hardware: by implementing the MESI protocol, the hardware handles cache coherence easily, and even simultaneous writes from multiple CPUs cause no problem. Whether the data is in its own cache, in another CPU's cache, or in memory, a CPU can always read the latest value. That is what cache coherence provides.
Sequential Consistency
Sequential consistency is a completely different concept from cache coherence, although both are products of processor evolution. Compiler technology keeps advancing, and a compiler may reorder certain operations to optimize your code. Processors have long featured multi-issue and out-of-order execution. The result is that the order in which instructions actually execute can differ slightly from the order in which the code was written. On a single processor this does not matter: the compiler and processor reorder things only as long as the code itself cannot observe the difference, and nobody else is looking. On multiprocessors it is a different story. The order in which instructions complete on one processor can greatly affect code running on other processors. Hence the concept of sequential consistency, which guarantees that the execution order of a thread on one processor looks the same from the perspective of threads on other processors. This problem cannot be solved by the processor or the compiler alone; it requires software intervention.
Memory barrier
The software intervention is very simple: insert a memory barrier. Unfortunately, the term "memory barrier" was coined by processor designers, which makes it hard for us to understand. It tempts us to think of cache coherence, and even to wonder whether a barrier is needed so that other CPUs can see our modified cache lines. That is wrong. A memory barrier, from the processor's perspective, serializes read and write operations; from the software's perspective, it solves the sequential-consistency problem. The compiler wants to reorder your code, and the processor wants to execute it out of order? Inserting a memory barrier tells the compiler that the instructions before and after the barrier may not be exchanged, and tells the processor that the instructions after the barrier may only start once the instructions before it have executed. Of course, a memory barrier can stop the compiler from messing around, but the processor still has leeway: with multi-issue, out-of-order execution, and in-order completion, the barrier only needs to guarantee that the read and write operations of earlier instructions complete before the read and write operations of later instructions. Hence there are three kinds of memory barriers: read barriers, write barriers, and full read-write barriers. For example, x86 used to guarantee that writes complete in order, so no write barrier was needed; some IA-32 processors, however, now complete writes out of order, so write barriers are needed there too.
In fact, in addition to the dedicated read/write barrier instructions, many instructions execute with barrier semantics as a side effect, such as instructions with a lock prefix. Before the dedicated barrier instructions appeared, Linux relied on lock to get by.
Where to insert read and write barriers depends on the needs of the software. Barriers cannot achieve full sequential consistency, but threads on other processors are not staring at your execution order all the time; as long as things look sequentially consistent whenever they do look, your code will run into no unexpected situations. An unexpected situation would be, for example: your thread assigns to variable a first and then to variable b, yet a thread running on another processor looks over and finds that b has been assigned while a has not. (Note that this inconsistency is not caused by cache incoherence, but by the order in which the processor's write operations complete.) In that case, a write barrier must be inserted between the assignment to a and the assignment to b.
Inter-processor synchronization
With SMP, threads run on multiple processors at the same time, and wherever there are threads, there are communication and synchronization requirements. Fortunately, an SMP system uses shared memory: all processors see the same memory contents, and although each has an independent L1 cache, coherence is still handled by the hardware. If threads on different processors want to access the same data, they need critical sections and synchronization. What do we synchronize with? On the old UP systems, we relied on semaphores at the top, and on disabling interrupts and read-modify-write instructions at the bottom. On SMP systems, disabling interrupts has been demoted: it is still needed to synchronize threads on the same processor, but it alone is no longer enough. Read-modify-write instructions? Not enough either: between the read part of your instruction completing and the write part being carried out, another processor may perform a read or write of its own. The cache coherence protocol is sophisticated, but not sophisticated enough to tell which instruction issued a given read. So x86 invented instructions with a lock prefix. When such an instruction executes, all cache lines containing the addresses it reads or writes are invalidated, and the memory bus is locked. If other processors then want to read or write the same address, or any address on the same cache line, they can do so neither from their cache (the relevant line has been invalidated) nor over the memory bus (which is locked), finally achieving the goal of atomic execution.
Of course, starting with the P6 processor, if the address accessed by a lock-prefixed instruction is already in the cache, the memory bus does not need to be locked and the atomic operation can still complete (I suspect this is because multi-processor parts gained an internal shared L2 cache).
Because the memory bus is locked, any outstanding reads and writes must complete before a lock-prefixed instruction executes, so such an instruction also functions as a memory barrier.
Nowadays, synchronization between threads on multiple processors uses spin locks at the top and lock-prefixed read-modify-write instructions at the bottom. Of course, real synchronization also involves disabling the processor's task scheduling, disabling interrupts, and wrapping a semaphore around the outside. The spin-lock implementation in Linux has gone through four generations of development and has become more efficient and powerful.
Implementation of memory barriers
```c
#ifdef CONFIG_SMP
#define smp_mb()  mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
#else
#define smp_mb()  barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
#endif
```
CONFIG_SMP enables multi-processor support. On a UP (uniprocessor) system, the smp_* barriers simply compile down to barrier().
```c
#define barrier() asm volatile("" : : : "memory")
```
The effect of barrier() is to tell the compiler that the values of variables in memory may have changed: any copies previously held in registers are invalid, and the variables must be re-read from memory on the next access. This is sufficient for all the memory barriers needed on UP.
```c
#ifdef CONFIG_X86_32
/*
 * Some non-Intel clones support out of order store. wmb() ceases to be a
 * nop for these.
 */
#define mb()  alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2)
#define rmb() alternative("lock; addl $0,0(%%esp)", "lfence", X86_FEATURE_XMM2)
#define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM)
#else
#define mb()  asm volatile("mfence" ::: "memory")
#define rmb() asm volatile("lfence" ::: "memory")
#define wmb() asm volatile("sfence" ::: "memory")
#endif
```
On an SMP system, the memory barriers translate into the corresponding mb(), rmb() and wmb(). CONFIG_X86_32 here means this is a 32-bit x86 system; otherwise it is a 64-bit x86 system. Current Linux kernels merge 32-bit and 64-bit x86 into the same x86 directory, hence the need for this configuration option.
As you can see, 64-bit x86 is guaranteed to have the mfence, lfence and sfence instructions, while 32-bit x86 is not. The kernel must therefore further check whether the CPU supports these three newer instructions, and fall back to a lock-prefixed instruction to implement the memory barrier if it does not.
The SFENCE, LFENCE and MFENCE instructions provide an efficient way to enforce the ordering of memory reads and writes between a program that produces weakly-ordered data and a program that consumes it. SFENCE serializes the write operations issued before it, without affecting reads: writes before an SFENCE must complete before writes after it. LFENCE serializes the read operations issued before it, without affecting writes: reads before an LFENCE must complete before reads after it. MFENCE serializes both: all reads and writes before an MFENCE must complete before any read or write after it.
As for lock-prefixed memory operations: before locking the memory bus, they wait for all earlier reads and writes to complete, so they are functionally equivalent to mfence, though of course less efficient.
Frankly, writing low-level code these days is not easy: you have to watch out for SMP issues, for CPU out-of-order reads and writes, for cache effects, for device DMA, and so on.
Implementation of inter-processor synchronization
The spin-lock implementation used for inter-processor synchronization has already been covered in a dedicated article.