In concurrent programs, programmers pay special attention to data synchronization between processes or threads. In particular, when multiple threads modify the same variable at the same time, reliable synchronization or equivalent measures must be taken to ensure the data is modified correctly. An important principle here is: do not make assumptions about the order in which instructions execute. You cannot predict how instructions from different threads will be interleaved.
In a single-threaded program, however, we usually assume that instructions execute sequentially; otherwise it is easy to imagine how badly the program could misbehave. The ideal model is that instructions execute in a single, well-defined order: the order in which they are written in the code, regardless of the processor or other factors. This is called the sequential consistency model, and it reflects the classic von Neumann view of execution. The assumption is reasonable in itself and rarely appears to be violated in practice, yet no modern multiprocessor architecture actually adopts this model, simply because it is too inefficient. Both compiler optimization and CPU pipelining involve instruction reordering.
Compile-time reordering
A typical compile-time reordering adjusts the order of instructions, without changing the program's semantics, to reduce the number of register loads and stores and to reuse values already held in registers as much as possible.
Suppose the first instruction computes a value, assigns it to variable A, and keeps it in a register; the second instruction is unrelated to A but needs a register (assume it would take the register holding A); and the third instruction uses the value of A and is independent of the second instruction. Under the sequential consistency model, A is placed in the register after the first instruction, is evicted when the second instruction executes, and must be loaded into a register again for the third instruction, even though its value has not changed in the meantime. The compiler will usually swap the second and third instructions so that A is still in the register when the first instruction finishes and its value can be read directly from it, avoiding the cost of a redundant reload.
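To make the scenario concrete, here is a minimal Java sketch of the three statements described above. The class, method, and variable names are illustrative only; the actual reordering happens inside the compiler or JIT, not in the source code.

```java
public class RegisterReuseSketch {
    // Illustrative only: mirrors the three instructions described above.
    static int demo(int x, int y, int p, int q) {
        int a = x + y;   // (1) compute A; its value is held in a register
        int b = p * q;   // (2) unrelated work that may claim A's register
        int c = a + 1;   // (3) uses A again, independent of (2)
        // A compiler may effectively emit (1), (3), (2): A's register value is
        // consumed immediately, avoiding a spill and a reload before (3).
        return b + c;
    }
}
```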
The significance of reordering for the pipeline
Almost all modern CPUs use pipelining to speed up instruction processing. An instruction generally needs several CPU clock cycles to complete, and by executing stages in parallel through the pipeline, several instructions can be in flight during the same clock cycle. Roughly speaking, each instruction is split into stages such as fetch, decode, addressing, and execution, each handled by a different hardware component. Within the execution unit (EU), the functional units are further divided into adders, multipliers, load units, store units, and so on, allowing different kinds of computation to proceed in parallel as well.
The pipelined architecture means that instructions are executed in parallel rather than strictly one after another as in the sequential model. Reordering helps keep the pipeline fully utilized, which in turn enables superscalar execution.
Preserving sequential semantics
Although instructions are not necessarily executed in the order we wrote them, in a single-threaded environment the final effect of execution must be the same as if the instructions had run sequentially; otherwise the optimization would be meaningless.
This principle is normally respected whether instruction reordering happens at compile time or at run time.
Reordering in the Java Memory Model
In the Java Memory Model (JMM), reordering is a very important topic, especially for concurrent programming. The JMM guarantees ordering semantics through the happens-before rules: if a thread executing operation B must observe the result of a thread executing operation A, then A and B must have a happens-before relationship. Otherwise, the JVM is free to reorder them to improve performance.
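A minimal sketch of the classic pattern (class and field names are hypothetical): without a happens-before edge between the writer and the reader, the JVM may reorder the two writes or keep the flag cached, so the reader can legally print 0 or spin forever.

```java
public class ReorderingHazard {
    static int data = 0;
    static boolean ready = false;   // plain field: no happens-before edge

    static void writer() {          // runs in thread A
        data = 42;
        ready = true;               // may be reordered before the write to data
    }

    static void reader() {          // runs in thread B
        while (!ready) { }          // may never observe the update, or...
        System.out.println(data);   // ...may legally print 0 instead of 42
    }
}
```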
The volatile keyword guarantees the visibility of a variable, because reads and writes of a volatile go to main memory, which is shared by all threads. The price is performance: registers and caches cannot be used for these accesses, since they are not shared between threads, cannot guarantee visibility, and could lead to dirty reads.
volatile also prevents reordering locally: instructions that operate on a volatile variable are not reordered around it, because such reordering could break visibility.
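Continuing the sketch above, declaring the flag volatile forbids the problematic reordering and makes the writes visible to the reader; again, the names are illustrative.

```java
public class VolatileFlag {
    static int data = 0;
    static volatile boolean ready = false; // volatile write/read form a happens-before edge

    static void writer() {
        data = 42;                // cannot be reordered after the volatile write below
        ready = true;             // volatile write: published to main memory
    }

    static void reader() {
        while (!ready) { }        // volatile read: always sees the latest write
        System.out.println(data); // guaranteed to print 42 once ready is true
    }
}
```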
Regarding visibility, locks (both explicit locks and object monitors) and reads and writes of atomic variables can also guarantee the visibility of variables, though the mechanisms differ slightly. For example, a synchronized lock ensures that data is re-read from main memory (refreshing the cache) when the lock is acquired and written back to main memory when the lock is released, so that the data becomes visible to other threads, whereas a volatile variable simply reads from and writes to main memory directly.
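As an illustration of the alternatives mentioned above, the following sketch (class and field names are hypothetical) shows visibility obtained through a synchronized block and through an atomic variable from java.util.concurrent.atomic.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class VisibilityAlternatives {
    private final Object lock = new Object();
    private int guarded = 0;                           // protected by the monitor
    private final AtomicInteger counter = new AtomicInteger();

    void withLock() {
        synchronized (lock) {      // acquiring the monitor re-reads shared state
            guarded++;
        }                          // releasing it publishes the write to other threads
    }

    void withAtomic() {
        counter.incrementAndGet(); // atomic read-modify-write with volatile semantics
    }
}
```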