Home >Java >JavaBase >Introduction to Java CAS principle analysis

Introduction to Java CAS principle analysis

coldplay.xixi
coldplay.xixiforward
2020-12-24 17:37:002592browse

java basic tutorialColumn introduction and analysis Java CAS

Introduction to Java CAS principle analysis

##Recommendation (free): java basic tutorial

1. Introduction

CAS stands for compare and swap. A mechanism for implementing synchronization functionality in a multi-threaded environment. A CAS operation contains three operands -- a memory location, an expected value, and a new value. The implementation logic of CAS is to compare the value at the memory location with the expected value. If they are equal, replace the value at the memory location with the new value. If not equal, no operation is performed.

In Java, Java does not directly implement CAS. CAS-related implementations are implemented in the form of C inline assembly. Java code needs to be called through JNI. I will analyze the implementation details in Chapter 3.

As mentioned earlier, the process of CAS operation is not difficult. But the above explanation is not enough. Next, I will introduce some other background knowledge. Only with this background knowledge can we better understand the subsequent content.

2. Background introduction

We all know that the CPU transmits data through the bus and memory. In the multi-core era, multiple cores communicate with memory and other hardware through the same bus. As shown below:

Introduction to Java CAS principle analysis

Picture source: "In-depth Understanding of Computer Systems"

The above picture is a relatively simple computer structure diagram. Although simple, it is sufficient to explain question. In the diagram above, the CPU communicates with the memory via the bus marked by the two blue arrows. Let’s consider a question. If multiple cores of the CPU operate on the same memory at the same time, what kind of errors will occur if it is not controlled? Here is a brief explanation, assuming that core 1 writes 64-bit data to the memory via a 32-bit bandwidth bus, core 1 needs to write twice to complete the entire operation. If after core 1 writes 32-bit data for the first time, core 2 reads 64-bit data from the memory location written by core 1. Since core 1 has not completely written all 64-bit data into the memory, core 2 begins to read data from this memory location, so the read data must be chaotic.

But there is actually no need to worry about this issue. Through the Intel Developer Manual, we can learn that starting with Pentium processors, Intel processors will ensure atomic reading and writing of quadwords aligned on 64-bit boundaries.

Based on the above description, we can conclude that Intel processors can ensure that single-access memory-aligned instructions are executed atomically. But what if it is an instruction to access memory twice? The answer is no guarantee. For example, the increment instruction

inc dword ptr [...] is equivalent to DEST = DEST 1. This instruction contains three operationsRead->Modify->Write, involving two memory accesses. Consider a situation where a value of 1 is stored at a specified location in memory. Now both CPU cores execute the instruction at the same time. The process of alternate execution of the two cores is as follows:

    Core 1 reads the value 1 from the specified location in the memory and loads it into the register
  1. Core 2 reads from the specified location in the memory Value 1 and load it into the register
  2. Core 1 Decrement the value in the register by 1
  3. Core 2 Decrement the value in the register by 1
  4. Core 1 Write the modified value Back to memory
  5. Core 2 Write the modified value back to memory
After executing the above process, the final value in the memory is 2, and what we expect is 3, this is what happens Problem. To deal with this problem, it is necessary to prevent two or more cores from operating the same memory area at the same time. So how to avoid it? This introduces the protagonist of this article - the lock prefix. For a detailed description of this instruction, please refer to the Intel Developer Manual Volume 2 Instruction Set Reference, Chapter 3 Instruction Set Reference A-L. I quote a section of it here, as follows:

LOCK—Assert LOCK# Signal Prefix
Causes the processor's LOCK# signal to be asserted during execution of the accompanying instruction (
turns the instruction into an atomic instruction). In a multiprocessor environment, the LOCK# signal ensures that the processor has exclusive use of any shared memory while the signal is asserted.
The key points described above have been used It is highlighted in bold that in a multi-processor environment, the LOCK# signal can ensure that the processor has exclusive use of some shared memory. lock can be added before the following instructions:

ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCH8B, CMPXCHG16B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG.

By adding the lock prefix before the inc instruction, the instruction can be made atomic. When multiple cores execute the same inc instruction at the same time, they will do so in a serial manner, thus avoiding the situation mentioned above. So there is another question here. How does the lock prefix ensure that the core exclusively occupies a certain memory area? The answer is as follows:

In Intel processors, there are two ways to ensure that a certain core of the processor occupies a certain memory area exclusively. The first way is to lock the bus and let a certain core use the bus exclusively, but this is too expensive. After the bus is locked, other cores cannot access the memory, which may cause other cores to stop working for a short time. The second way is to lock the cache, if some memory data is cached in the processor cache. The LOCK# signal issued by the processor does not lock the bus, but locks the memory area corresponding to the cache line. Other processors cannot perform related operations on this memory area while this memory area is locked. Compared with locking the bus, the cost of locking the cache is obviously smaller. Regarding bus locks and cache locks, for a more detailed description, please refer to the Intel Developer’s Manual Volume 3 Software Developer’s Manual, Chapter 8 Multiple-Processor Management.

3. Source code analysis

With the above background knowledge, now we can read the source code of CAS leisurely. The content of this chapter will analyze the compareAndSet method in the atomic class AtomicInteger under the java.util.concurrent.atomic package. The relevant analysis is as follows:

public class AtomicInteger extends Number implements java.io.Serializable {

    // setup to use Unsafe.compareAndSwapInt for updates
    private static final Unsafe unsafe = Unsafe.getUnsafe();
    private static final long valueOffset;

    static {
        try {
            // 计算变量 value 在类对象中的偏移
            valueOffset = unsafe.objectFieldOffset
                (AtomicInteger.class.getDeclaredField("value"));
        } catch (Exception ex) { throw new Error(ex); }
    }

    private volatile int value;
    
    public final boolean compareAndSet(int expect, int update) {
        /*
         * compareAndSet 实际上只是一个壳子,主要的逻辑封装在 Unsafe 的 
         * compareAndSwapInt 方法中
         */
        return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
    }
    
    // ......
}

public final class Unsafe {
    // compareAndSwapInt 是 native 类型的方法,继续往下看
    public final native boolean compareAndSwapInt(Object o, long offset,
                                                  int expected,
                                                  int x);
    // ......
}
// unsafe.cpp
/*
 * 这个看起来好像不像一个函数,不过不用担心,不是重点。UNSAFE_ENTRY 和 UNSAFE_END 都是宏,
 * 在预编译期间会被替换成真正的代码。下面的 jboolean、jlong 和 jint 等是一些类型定义(typedef):
 * 
 * jni.h
 *     typedef unsigned char   jboolean;
 *     typedef unsigned short  jchar;
 *     typedef short           jshort;
 *     typedef float           jfloat;
 *     typedef double          jdouble;
 * 
 * jni_md.h
 *     typedef int jint;
 *     #ifdef _LP64 // 64-bit
 *     typedef long jlong;
 *     #else
 *     typedef long long jlong;
 *     #endif
 *     typedef signed char jbyte;
 */
UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapInt(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jint e, jint x))
  UnsafeWrapper("Unsafe_CompareAndSwapInt");
  oop p = JNIHandles::resolve(obj);
  // 根据偏移量,计算 value 的地址。这里的 offset 就是 AtomaicInteger 中的 valueOffset
  jint* addr = (jint *) index_oop_from_field_offset_long(p, offset);
  // 调用 Atomic 中的函数 cmpxchg,该函数声明于 Atomic.hpp 中
  return (jint)(Atomic::cmpxchg(x, addr, e)) == e;
UNSAFE_END

// atomic.cpp
unsigned Atomic::cmpxchg(unsigned int exchange_value,
                         volatile unsigned int* dest, unsigned int compare_value) {
  assert(sizeof(unsigned int) == sizeof(jint), "more work to do");
  /*
   * 根据操作系统类型调用不同平台下的重载函数,这个在预编译期间编译器会决定调用哪个平台下的重载
   * 函数。相关的预编译逻辑如下:
   * 
   * atomic.inline.hpp:
   *    #include "runtime/atomic.hpp"
   *    
   *    // Linux
   *    #ifdef TARGET_OS_ARCH_linux_x86
   *    # include "atomic_linux_x86.inline.hpp"
   *    #endif
   *   
   *    // 省略部分代码
   *    
   *    // Windows
   *    #ifdef TARGET_OS_ARCH_windows_x86
   *    # include "atomic_windows_x86.inline.hpp"
   *    #endif
   *    
   *    // BSD
   *    #ifdef TARGET_OS_ARCH_bsd_x86
   *    # include "atomic_bsd_x86.inline.hpp"
   *    #endif
   * 
   * 接下来分析 atomic_windows_x86.inline.hpp 中的 cmpxchg 函数实现
   */
  return (unsigned int)Atomic::cmpxchg((jint)exchange_value, (volatile jint*)dest,
                                       (jint)compare_value);
}

The above analysis seems to be more, but the main process is not complicated. . If you don't get hung up on the details of the code, it's relatively easy to understand. Next, I will analyze the Atomic::cmpxchg function under the Windows platform. Read on.

// atomic_windows_x86.inline.hpp
#define LOCK_IF_MP(mp) __asm cmp mp, 0  \
                       __asm je L0      \
                       __asm _emit 0xF0 \
                       __asm L0:
              
inline jint Atomic::cmpxchg (jint exchange_value, volatile jint* dest, jint compare_value) {
  // alternative for InterlockedCompareExchange
  int mp = os::is_MP();
  __asm {
    mov edx, dest
    mov ecx, exchange_value
    mov eax, compare_value
    LOCK_IF_MP(mp)
    cmpxchg dword ptr [edx], ecx
  }
}

The above code consists of the LOCK_IF_MP precompiled identifier and the cmpxchg function. To see it a little clearer, let's replace LOCK_IF_MP in the cmpxchg function with the actual content. As follows:

inline jint Atomic::cmpxchg (jint exchange_value, volatile jint* dest, jint compare_value) {
  // 判断是否是多核 CPU
  int mp = os::is_MP();
  __asm {
    // 将参数值放入寄存器中
    mov edx, dest    // 注意: dest 是指针类型,这里是把内存地址存入 edx 寄存器中
    mov ecx, exchange_value
    mov eax, compare_value
    
    // LOCK_IF_MP
    cmp mp, 0
    /*
     * 如果 mp = 0,表明是线程运行在单核 CPU 环境下。此时 je 会跳转到 L0 标记处,
     * 也就是越过 _emit 0xF0 指令,直接执行 cmpxchg 指令。也就是不在下面的 cmpxchg 指令
     * 前加 lock 前缀。
     */
    je L0
    /*
     * 0xF0 是 lock 前缀的机器码,这里没有使用 lock,而是直接使用了机器码的形式。至于这样做的
     * 原因可以参考知乎的一个回答:
     *     https://www.zhihu.com/question/50878124/answer/123099923
     */ 
    _emit 0xF0
L0:
    /*
     * 比较并交换。简单解释一下下面这条指令,熟悉汇编的朋友可以略过下面的解释:
     *   cmpxchg: 即“比较并交换”指令
     *   dword: 全称是 double word,在 x86/x64 体系中,一个 
     *          word = 2 byte,dword = 4 byte = 32 bit
     *   ptr: 全称是 pointer,与前面的 dword 连起来使用,表明访问的内存单元是一个双字单元
     *   [edx]: [...] 表示一个内存单元,edx 是寄存器,dest 指针值存放在 edx 中。
     *          那么 [edx] 表示内存地址为 dest 的内存单元
     *          
     * 这一条指令的意思就是,将 eax 寄存器中的值(compare_value)与 [edx] 双字内存单元中的值
     * 进行对比,如果相同,则将 ecx 寄存器中的值(exchange_value)存入 [edx] 内存单元中。
     */
    cmpxchg dword ptr [edx], ecx
  }
}

The implementation process of CAS is finished here. The implementation of CAS is inseparable from the support of the processor. There are so many codes above, but the core code is actually a cmpxchg instruction with lock prefix, that is, lock cmpxchg dword ptr [edx], ecx.

4. ABA problem

When talking about CAS, we basically have to talk about the ABA problem of CAS. CAS consists of three steps, namely "read->compare->writeback". Consider a situation where thread 1 and thread 2 execute CAS logic at the same time. The execution sequence of the two threads is as follows:

  1. Time 1: Thread 1 performs a read operation and obtains the original value A, and then the thread Switched away
  2. Time 2: Thread 2 completes the CAS operation and changes the original value from A to B
  3. Time 3: Thread 2 performs the CAS operation again and changes the original value from B to A
  4. Moment 4: Thread 1 resumes running, compares the comparison value (compareValue) with the original value (oldValue), and finds that the two values ​​are equal. Then write the new value (newValue) into the memory to complete the CAS operation

As in the above process, thread 1 does not know that the original value has been modified, and it seems that there is no change, so it The process will continue to execute. For ABA problems, the usual solution is to set a version number for each CAS operation. The java.util.concurrent.atomic package provides an atomic class AtomicStampedReference that can handle ABA issues. The specific implementation will not be analyzed here. Interested friends can check it out for themselves.

5. Summary

Writing this, this article is finally coming to an end. Although the principle of CAS itself, including its implementation, is not difficult, it is really not easy to write. This involves some low-level knowledge. Although I can understand it, it is still a bit difficult to understand it. Due to my lack of underlying knowledge, some of the above analysis will inevitably be wrong. So if there is an error, please feel free to comment. Of course, it is best to explain why it is wrong. Thank you.

Okay, that’s it for this article. Thanks for reading and bye.

Appendix

The paths to several files used in the previous source code analysis section are posted here. It will help everyone index, as follows:

File name Path
Unsafe.java openjdk/jdk/src/share/classes/sun/misc/Unsafe.java
unsafe.cpp openjdk/ hotspot/src/share/vm/prims/unsafe.cpp
atomic.cpp openjdk/hotspot/src/share/vm/runtime/atomic.cpp
atomic_windows_x86.inline.hpp openjdk/hotspot/src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp

The above is the detailed content of Introduction to Java CAS principle analysis. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:segmentfault.com. If there is any infringement, please contact admin@php.cn delete