Home >Java >javaTutorial >Concurrent programming knowledge points for Java thread learning
This article brings you relevant knowledge about java, which mainly organizes issues related to concurrent programming, including the Java memory model, detailed explanation of volatile, and the implementation principle of synchronized, etc. Let’s take a look at it together, I hope it will be helpful to everyone.
Recommended study: "java Video Tutorial"
Java Memory The model is Java Memory Model, or JMM for short. JMM defines how the Java Virtual Machine (JVM) works in computer memory (RAM). JVM is the entire computer virtual model, so JMM is affiliated with JVM. The Java1.5 version has refactored it, and the current Java still uses the Java1.5 version. The problems encountered by Jmm are similar to those encountered in modern computers.
Concurrency problems in physical computers. The concurrency problems encountered by physical machines have many similarities with the situations in virtual machines. The concurrency handling scheme of physical machines also has considerable reference significance for the implementation of virtual machines.
Based on "Jeff Dean's Report at Google All-Engineering Conference" we can see that
When the computer does some of our usual basic operations, the response time required is Different.
The following cases are for illustration only and do not represent the real situation.
If 1M of int type data is read from the memory and accumulated by the CPU, how long will it take?
Do a simple calculation. For 1M data, the int type in Java is 32 bits and 4 bytes. There are a total of 1024*1024/4 = 262144 integers. The CPU calculation time is: 262144 0.6 = 157286 Nanoseconds, and we know that it takes 250,000 nanoseconds to read 1M data from memory. Although there is a gap between the two (of course, this gap is not small, one hundred thousand nanoseconds is enough time for the CPU to execute nearly two hundred thousand instructions), but it is still On an order of magnitude. However, without any caching mechanism, it means that each number needs to be read from the memory. In this case, it takes 100 nanoseconds for the CPU to read the memory once, and 262144 integers are read from the memory to the CPU plus the calculation time. It takes 262144100 250000 = 26 464 400 nanoseconds, which is a difference in order of magnitude.
And in reality, most computing tasks cannot be completed by just "computing" by the processor. The processor must at least interact with the memory, such as reading computing data, storing computing results, etc. This I/O operations are basically impossible to eliminate (cannot rely on registers alone to complete all computing tasks). The speeds of the CPU and memory in early computers were almost the same, but in modern computers, the instruction speed of the CPU far exceeds the access speed of the memory. Since there is a gap of several orders of magnitude between the computer's storage device and the computing speed of the processor, modern computers Computer systems have to add a layer of cache (Cache) with a read and write speed as close as possible to the processor's operation speed to serve as a buffer between the memory and the processor: copy the data needed for the operation into the cache so that the operation can Proceed quickly, and when the operation is completed, it is synchronized back to the memory from the cache, so that the processor does not have to wait for slow memory reads and writes.
In a computer system, the register is the L0 level cache, followed by L1, L2, and L3 (followed by memory, local disk, remote storage). The cache storage space further up is smaller, the speed is faster, and the cost is higher; the storage space further down is larger, the speed is slower, and the cost is lower. From top to bottom, each layer can be regarded as the cache of the next layer, that is: the L0 register is the cache of the L1 first-level cache, L1 is the cache of the L2, and so on; the data of each layer comes from The layer below it, so the data of each layer is a subset of the data of the next layer.
## On modern CPUs, generally speaking, L0, L1, L2, and L3 are integrated inside the CPU, and L1 is also divided into a first-level data cache (Data Cache, D -Cache, L1d) and the first-level instruction cache (Instruction Cache, I-Cache, L1i), which are used to store data and execute instruction decoding of data respectively. Each core has an independent computing processing unit, controller, register, L1, and L2 cache, and then multiple cores of a CPU share the last layer of CPU cache L3.From an abstract point of view, JMM defines the abstract relationship between threads and main memory: shared variables between threads are stored in main memory (Main Memory ), each thread has a private local memory (Local Memory), which stores a copy of the shared variables that the thread can read/write. Local memory is an abstract concept of JMM and does not really exist. It covers caches, write buffers, registers, and other hardware and compiler optimizations.
##2.1 Visibility Visibility means that when multiple threads access the same variable, a thread If the value of this variable is modified, other threads can immediately see the modified value. Since all operations on variables by threads must be performed in the working memory and cannot directly read and write variables in the main memory, then for the shared variables V, they are first in their own working memory and then synchronized to the main memory. . However, it will not be flushed to the main memory in time, but there will be a certain time difference. Obviously, at this time, thread A's operation on variable V is no longer visible to thread B.
To solve the problem of shared object visibility, we can use the volatile keyword or lock.
Atomicity: That is, one operation or multiple operations, either all are executed and the execution process will not be interrupted by any factors, or None are implemented. We all know that CPU resources are allocated in units of threads and are called in a time-sharing manner. The operating system allows a process to execute for a short period of time, such as 50 milliseconds. After 50 milliseconds, the operating system will reselect a process. process to execute (we call it "task switching"), this 50 milliseconds is called the "time slice". Most tasks are switched after the time segment ends.
So why does thread switching cause bugs? Because the operating system performs task switching, it can occur after any CPU instruction is executed! Note that it is a CPU instruction, CPU instruction, CPU instruction, not a statement in a high-level language. For example, count is just one sentence in Java, but in high-level languages a statement often requires multiple CPU instructions to complete. In fact, count contains at least three CPU instructions!
read/write of a volatile variable as using the same lock Synchronizing these single
read/write operations
public class Volati { // 使用volatile 声明一个64位的long型变量 volatile long i = 0L;// 单个volatile 变量的读 public long getI() { return i; }// 单个volatile 变量的写 public void setI(long i) { this.i = i; }// 复合(多个)volatile 变量的 读/写 public void iCount(){ i ++; }}
can be seen as the following code:So the volatile variable itself has the following Features:public class VolaLikeSyn { // 使用 long 型变量 long i = 0L; public synchronized long getI() { return i; }// 对单个的普通变量的读用同一个锁同步 public synchronized void setI(long i) { this.i = i; }// 普通方法调用 public void iCount(){ long temp = getI(); // 调用已同步的读方法 temp = temp + 1L; // 普通写操作 setI(temp); // 调用已同步的写方法 }}
For synchronized blocks, the MonitorEnter instruction is inserted at the beginning of the synchronized code block, while the monitorExit instruction is inserted at the end of the method and the exception. The JVM guarantees that each MonitorEnter must have a corresponding MonitorExit. In general, when the code executes this instruction, it will try to obtain ownership of the object Monitor, that is, try to obtain the lock of the object:
The lock used by synchronized is stored in the Java object header. The object header of the Java object consists of two parts: mark word and klass pointer:
#The lock information exists in the mark word of the object. The default data in MarkWord is to store the HashCode and other information of the object.
But it will change as the operation of the object changes. Different lock states correspond to different record storage methods
Comparing the above picture, we found that there are four lock states, no lock state, biased lock state, lightweight lock state and heavyweight lock state , it will gradually escalate with the competitive situation. Locks can be upgraded but not downgraded in order to improve the efficiency of acquiring and releasing locks.
Introducing background: In most cases, the lock not only does not have multi-thread competition, but also is always acquired multiple times by the same thread. In order to allow threads The cost of acquiring locks is lower and biased locks are introduced to reduce unnecessary CAS operations.
is biased towards the lock, as the name suggests, it will bias to the first visit to the thread. If the synchronous lock is only accessed by a thread during the operation, there is no multi -threaded dispute. , Reduce some locking/unlocking CAS operations (such as some CAS operations waiting for the queue). In this case, a bias lock will be added to the thread. If other threads preempt the lock during operation, the thread holding the biased lock will be suspended, and the JVM will eliminate the biased lock on it and restore the lock to a standard lightweight lock. It further improves the running performance of the program by eliminating synchronization primitives when there is no competition for resources.
## Step 1. Check whether the bias lock logo in Mark Word is set If it is 1, whether the lock flag is 01, confirm that it is in the deflectable state.Look at the picture below to understand the process of obtaining the bias lock:
Step 2. If it is in the biasable state, test whether the thread ID points to the current thread. If so, go to step 5, otherwise go to step 3.
Step 3. If the thread ID does not point to the current thread, compete for the lock through CAS operation. If the competition succeeds, set the thread ID in Mark Word to the current thread ID, and then execute 5; if the competition fails, execute 4.
Step 4. If CAS fails to acquire the bias lock, it means there is competition. When reaching the global safe point (safepoint), the thread that obtains the bias lock is suspended, the bias lock is upgraded to a lightweight lock, and then the thread blocked at the safe point continues to execute the synchronization code. (Revoking the bias lock will cause stop the word)
Step 5. Execute the synchronization code.
Bias lock release:
The cancellation of the biased lock is mentioned in the fourth step above. The bias lock will only release the bias lock when other threads try to compete for the bias lock. The thread holding the bias lock will not take the initiative to release the bias lock. To cancel the biased lock, you need to wait for the global safety point (no bytecode is being executed at this point in time). It will first pause the thread that owns the biased lock, determine whether the lock object is in a locked state, and then restore the biased lock to the previous state after canceling the biased lock. The status of lock (flag bit is "01") or lightweight lock (flag bit is "00").
Applicable scenarios for biased locks:
There is always only one thread executing the synchronization block. Before it finishes executing and releases the lock, no other thread executes the synchronization block. It is used when there is no competition for the lock. Once there is competition, it will be upgraded to a lightweight lock. When upgrading to a lightweight lock, the biased lock needs to be revoked. Revoking the biased lock will cause the stop the word operation;
in When there is lock competition, the biased lock will do a lot of extra operations. Especially when canceling the biased lock, it will lead to a safe point. The safe point will cause stw and lead to performance degradation. In this case, it should be disabled.
jvm Turn on/off bias lock
Turn on bias lock: -XX: UseBiasedLocking -XX:BiasedLockingStartupDelay=0 Turn off bias lock: -XX:-UseBiasedLocking
The lightweight lock is upgraded from the biased lock. The biased lock runs when one thread enters the synchronization block. When the second thread joins the lock contention, the biased lock The lock will be upgraded to a lightweight lock;
Lightweight lock locking process:
The principle of spin lock is very simple. If the thread holding the lock can release the lock resource in a short time, then those waiting to compete The threads holding the lock do not need to switch between the kernel mode and the user mode to enter the blocked and suspended state. They only need to wait (spin) and acquire the lock immediately after the thread holding the lock releases the lock. In this way, Avoid the cost of switching between user threads and kernels.
But thread spinning needs to consume the CPU. To put it bluntly, it means that the CPU is doing useless work. The thread cannot always occupy the CPU and spin to do useless work, so you need to set a maximum spin waiting time.
If the execution time of the thread holding the lock exceeds the maximum spin waiting time and the lock is not released, other threads competing for the lock will still not be able to obtain the lock within the maximum waiting time. At this time, the contention thread will Stop spinning and enter blocking state.
Spin locks reduce thread blocking as much as possible. This is a code block that does not compete fiercely for locks and occupies a very short lock time. In terms of performance, the performance is greatly improved, because the consumption of spin will be less than the consumption of thread blocking and suspending operations.
But if the competition for the lock is fierce, or the thread holding the lock needs to occupy the lock for a long time to execute the synchronization block, it is not suitable to use the spin lock at this time, because the spin lock always occupies the CPU before acquiring the lock. It is useless work and occupying a pit. The consumption of thread spinning is greater than the consumption of thread blocking and suspending operations. Other threads that need cup cannot obtain the CPU, resulting in a waste of CPU.
The purpose of the spin lock is to occupy the CPU resources without releasing them, and wait until the lock is acquired to process it immediately. But how to choose the execution time of spin? If the spin execution time is too long, a large number of threads will be in the spin state and occupy CPU resources, which will affect the performance of the overall system. So the number of spins is important.
can be selected by JDK1.5 can be set to 10 times by default in jdk1.5. In 1.6, adaptive spin locks were introduced. Adaptive spin locks mean that the spin time is no longer fixed, but is determined by the previous spin lock. It is determined by the spin time on the same lock and the status of the lock owner. It is basically considered that the time of context switching of a thread is the best time.
In JDK1.6-XX: UseSpinning turns on the spin lock; after JDK1.7, this parameter is removed and controlled by jvm;
Recommended study: "java video tutorial"
The above is the detailed content of Concurrent programming knowledge points for Java thread learning. For more information, please follow other related articles on the PHP Chinese website!