How are Atomic Floating-Point and Vector Operations Handled on x86_64 Architectures?

Mary-Kate Olsen
2024-12-07 06:08:15

Atomic Floating Point Operations on x86_64

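C++11 std::atomic<double> supports atomic load, store, exchange, and compare-exchange, and it is lock-free on typical x86_64 implementations. It has no native atomic arithmetic, though: fetch_add for floating-point types was only added in C++20, and x86_64 has no lock-prefixed floating-point instruction. Atomic floating-point arithmetic is therefore typically implemented as a compare-and-swap (CAS) loop built on the lock cmpxchg instruction.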
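A CAS loop of this kind can be sketched in portable C++11 as follows. The helper name fetch_add_double is hypothetical and chosen for illustration only; it is not taken from any compiler's library, and the memory orderings are one reasonable choice rather than the only one.

```cpp
#include <atomic>

// Hypothetical helper (not from the article): atomically adds `delta` to
// `target` and returns the value that was stored before the addition.
inline double fetch_add_double(std::atomic<double>& target, double delta) {
    double expected = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak overwrites `expected` with the value
    // currently stored, so each retry recomputes the sum from fresh data.
    while (!target.compare_exchange_weak(expected, expected + delta,
                                         std::memory_order_seq_cst,
                                         std::memory_order_relaxed)) {
        // Retry: another thread changed the value, or the CAS failed spuriously.
    }
    return expected;
}
```

The comparison inside compare_exchange_weak is done on the bit pattern of the double, not with floating-point ==, so the loop still makes progress when the stored value is NaN or a signed zero.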

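x86_64 has no atomic read-modify-write instructions for vectors. For plain loads and stores, the long-standing architectural guarantee only covers naturally aligned accesses of up to 8 bytes. Intel and AMD have more recently documented that aligned 16-byte SSE/AVX loads and stores are atomic on processors that support AVX. 256-bit accesses carry no such guarantee, and unaligned vector accesses have no atomicity guarantee at all.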

Assembly-Level Support for Double and Vector Operations

At the assembly level, x86_64 offers the following for doubles and vectors:

  • Doubles: Aligned 8-byte loads and stores with movsd or movq are atomic. Arithmetic instructions such as addsd, subsd, and mulsd are not: they have no memory-destination form and cannot take a lock prefix, so an atomic add, subtract, or multiply needs a CAS loop.
  • Vectors: Aligned 128-bit loads and stores (e.g. movaps, movdqa) are documented as atomic on processors with AVX support. 256-bit loads and stores are not guaranteed to be atomic as a whole, and unaligned vector operations have no hardware guarantee of atomicity.
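Plain atomic loads and stores of doubles can be expressed in C++ without any CAS, as in the minimal sketch below. The helper names (publish_sample, read_sample, sum_samples) are hypothetical, and the assumption that relaxed accesses compile to ordinary 8-byte moves describes typical x86_64 compilers rather than a guarantee of the language.

```cpp
#include <atomic>
#include <cstddef>

// Assumption: on typical x86_64 ABIs atomic<double> is 8 bytes and naturally
// aligned, which is the case the hardware guarantee covers.
static_assert(sizeof(std::atomic<double>) == 8,
              "this sketch expects an 8-byte atomic<double>");

// Hypothetical helper: store one element; no lock prefix or CAS is needed
// for a relaxed store of an aligned 8-byte value.
inline void publish_sample(std::atomic<double>* slots, std::size_t i, double v) {
    slots[i].store(v, std::memory_order_relaxed);
}

// Hypothetical helper: load one element without tearing.
inline double read_sample(const std::atomic<double>* slots, std::size_t i) {
    return slots[i].load(std::memory_order_relaxed);
}

// Each element is read atomically, but the sum is NOT a consistent snapshot
// of the whole array: other threads may write between the individual loads.
inline double sum_samples(const std::atomic<double>* slots, std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        total += read_sample(slots, i);
    }
    return total;
}
```

Per-element atomicity is the most this approach provides; anything that must observe several elements together needs a lock or another higher-level protocol.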

MSVC 2017 Implementation of Lock-Free atomic<double>

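MSVC 2017 implements lock-free atomic<double> by moving the value through 64-bit integer registers. For example, the load operation is a plain 64-bit move rather than a CAS:

Load: mov rax, QWORD PTR [src_addr]  // plain 64-bit load, atomic when 8-byte aligned

The add operation uses a CAS loop around:

CAS: lock cmpxchg QWORD PTR [dst_addr], rcx  // 64-bit CAS; expected value in rax, new value in rcx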

Atomic RMW (Read-Modify-Write) Operations

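Integer read-modify-write (RMW) operations such as fetch_add map to a single locked instruction (e.g. lock xadd), but x86_64 has no equivalent for floating-point operands, so fetch_add on a double requires a CAS loop. lock cmpxchg handles operands of up to 8 bytes, which covers a double. 16-byte objects need cmpxchg16b, which compares memory against rdx:rax and stores rcx:rbx on a match:

CAS: lock cmpxchg16b XMMWORD PTR [dst_addr]  // 128-bit CAS; expected value in rdx:rax, new value in rcx:rbx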

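While CAS loops provide atomic RMW functionality, they are more expensive than atomic loads and stores: every attempt is a locked instruction, and under contention the loop may have to retry several times.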
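The same CAS-loop pattern extends to any operation the hardware lacks, such as an atomic multiply or maximum. The sketch below uses hypothetical helper names (atomic_update, atomic_multiply, atomic_max) and default sequentially consistent ordering; it is meant as an illustration of the technique, not as a tuned implementation.

```cpp
#include <atomic>

// Hypothetical helper: atomically replaces the stored value v with op(v)
// and returns the new value. `op` may run more than once if the CAS has
// to be retried, so it should be free of side effects.
template <typename Op>
double atomic_update(std::atomic<double>& target, Op op) {
    double expected = target.load(std::memory_order_relaxed);
    double desired = op(expected);
    while (!target.compare_exchange_weak(expected, desired)) {
        // `expected` now holds the value another thread stored; recompute.
        desired = op(expected);
    }
    return desired;
}

// Atomic multiply built on the generic loop.
inline double atomic_multiply(std::atomic<double>& target, double factor) {
    return atomic_update(target, [factor](double v) { return v * factor; });
}

// Atomic maximum built on the generic loop.
inline double atomic_max(std::atomic<double>& target, double candidate) {
    return atomic_update(target, [candidate](double v) {
        return v < candidate ? candidate : v;
    });
}
```

Because the callback is re-run on every failed attempt, the cost of an update grows with contention, which is the overhead described above.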

Additional Notes

  • Some non-x86 hardware supports atomic add operations for float/double types.
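  • Intel's Transactional Synchronization Extensions (TSX) can, in principle, make a group of FP or SIMD operations atomic by running them inside a hardware transaction, but TSX is disabled or absent on many processors.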
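  • Compilers often generate inefficient code for atomic<double>, for example by bouncing values through integer registers instead of loading directly into an XMM register, but improvements are being made.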
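  • In practice, aligned vector loads and stores on shared arrays of aligned doubles should not tear within an individual 8-byte element, although the vector as a whole is not guaranteed to be atomic. Operations on unaligned vectors may involve tearing.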
  • It is possible to implement atomic operations on 16-byte objects using cmpxchg16b, but performance will be poor, since even a pure load then has to be done with a locked cmpxchg16b.
