
Does x86_64 Support Atomic Operations on Doubles and SSE/AVX Vectors?

Mary-Kate Olsen
2024-11-27 06:58:10

Atomic Floating-Point Operations and SSE/AVX Vector Load/Store on x86_64

Although C++11 supports lock-free std::atomic<double>, it does not currently support lock-free atomic AVX/SSE vectors, because that depends on what the CPU guarantees. However, the question arises: does x86_64 provide assembly-level support for atomic operations on doubles or vectors?

Atomic Operations on x86_64

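x86_64 supports the following atomic operations on doubles without taking a lock. Only loads and stores map to a single instruction; there is no atomic floating-point read-modify-write instruction, so the arithmetic operations are built from a retry loop around lock cmpxchg on the 64-bit bit pattern:

  • Load (a naturally aligned 8-byte load is atomic)
  • Store (a naturally aligned 8-byte store is atomic)
  • Add (compare-and-swap retry loop)
  • Subtract (compare-and-swap retry loop)
  • Multiply (compare-and-swap retry loop)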
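
As a minimal sketch of how the arithmetic operations can be built on top of the atomic load and compare-and-swap, the following uses only standard C++11. The helper names atomic_update, atomic_add and atomic_mul are hypothetical and chosen for this illustration; on x86_64, compilers typically turn the loop into a lock cmpxchg retry loop.

```cpp
#include <atomic>

// Hypothetical helper: atomically replace `target` with op(old value).
// Returns the value that was stored before the update.
template <typename Op>
double atomic_update(std::atomic<double>& target, Op op) {
    double expected = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak refreshes `expected` with the
    // current contents, so the loop recomputes op(expected) and retries.
    while (!target.compare_exchange_weak(expected, op(expected),
                                         std::memory_order_acq_rel,
                                         std::memory_order_relaxed)) {
    }
    return expected;
}

// Atomic add built from the retry loop above.
double atomic_add(std::atomic<double>& target, double x) {
    return atomic_update(target, [x](double v) { return v + x; });
}

// Atomic multiply built from the same retry loop.
double atomic_mul(std::atomic<double>& target, double x) {
    return atomic_update(target, [x](double v) { return v * x; });
}
```

The comparison inside compare_exchange_weak is done on the object representation rather than with floating-point equality, so the loop still makes progress when the stored value is NaN.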

Atomic Vector Operations on x86_64

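Unfortunately, x86_64 has traditionally given no architectural guarantee that 128b or 256b vector loads or stores are atomic across the cache coherency system. However, if the shared data is an array of doubles that only needs per-element atomicity, you can safely use aligned vector loads and stores on it: a vector access may tear between elements, but never within a single 8-byte double.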

If atomic 16B loads are required, your only option is to use lock cmpxchg16b with desired=expected. If it succeeds, it replaces the existing value with itself. If it fails, the current contents of memory are returned in the expected registers. Either way, you end up holding the value that was in memory. Note that this "load" always performs a write cycle, so it faults on read-only memory; use caution when passing pointers to functions that perform this operation.
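
The desired=expected trick can be sketched with GNU-style inline assembly. This assumes GCC or Clang targeting x86_64; the type U128 and the functions cas16 and load16 are hypothetical names introduced for this illustration, not part of any library.

```cpp
#include <cstdint>

// Hypothetical 16-byte value, viewed as two 64-bit halves.
// cmpxchg16b requires its memory operand to be 16-byte aligned.
struct alignas(16) U128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Thin wrapper around `lock cmpxchg16b` (GNU inline asm, x86_64 only).
// If *ptr equals `expected`, stores `desired` and returns true.
// Otherwise copies the current contents of *ptr into `expected`
// and returns false.
inline bool cas16(U128* ptr, U128& expected, U128 desired) {
    bool ok;
    asm volatile("lock cmpxchg16b %[mem]\n\t"
                 "setz %[ok]"
                 : [ok] "=q"(ok), [mem] "+m"(*ptr),
                   "+a"(expected.lo), "+d"(expected.hi)
                 : "b"(desired.lo), "c"(desired.hi)
                 : "cc", "memory");
    return ok;
}

// Atomic 16-byte "load": compare-exchange with desired == expected.
// The pointer is non-const on purpose: the instruction always writes,
// so it faults on read-only memory.
inline U128 load16(U128* ptr) {
    U128 guess{0, 0};         // arbitrary guess at the contents
    cas16(ptr, guess, guess); // success: rewrites the same value
                              // failure: guess now holds the contents
    return guess;
}
```

Whether the guess matches or not, load16 returns the 16 bytes that were in memory at the instant the locked instruction executed.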

Atomic 16B stores and RMW can both use lock cmpxchg16b in the obvious way: retry until the compare-exchange succeeds. This makes pure stores much more expensive than regular vector stores, especially when cmpxchg16b has to retry several times. Atomic RMW is expensive in any case, so the extra cost matters less there.
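
A self-contained sketch of the retry loop for 16B stores and RMW follows, again assuming GCC or Clang on x86_64. U128, cas16, update16 and store16 are hypothetical names used only for this illustration.

```cpp
#include <cstdint>

// Hypothetical 16-byte value; cmpxchg16b needs 16-byte alignment.
struct alignas(16) U128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Wrapper around `lock cmpxchg16b` (GNU inline asm, x86_64 only).
// Returns true if `desired` was stored; otherwise `expected` is
// refreshed with the current contents of *ptr.
inline bool cas16(U128* ptr, U128& expected, U128 desired) {
    bool ok;
    asm volatile("lock cmpxchg16b %[mem]\n\t"
                 "setz %[ok]"
                 : [ok] "=q"(ok), [mem] "+m"(*ptr),
                   "+a"(expected.lo), "+d"(expected.hi)
                 : "b"(desired.lo), "c"(desired.hi)
                 : "cc", "memory");
    return ok;
}

// Atomic 16-byte RMW: replace *ptr with op(old) and return old.
template <typename Op>
U128 update16(U128* ptr, Op op) {
    // Start from an arbitrary guess instead of a plain (racy) read.
    // A wrong guess costs one failed CAS, which fetches the real value.
    U128 expected{0, 0};
    while (!cas16(ptr, expected, op(expected))) {
    }
    return expected;
}

// Atomic 16-byte store: an RMW whose new value ignores the old one.
inline void store16(U128* ptr, U128 value) {
    update16(ptr, [value](U128) { return value; });
}
```

Starting from a guess means an uncontended store usually executes cmpxchg16b twice, which illustrates why a pure store done this way costs far more than a regular vector store.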

Limitations of Atomic Vector Operations

  • atomic<__m128d> would be slow even for read-only or write-only operations due to the use of cmpxchg16b.
  • atomic<__m256d> cannot be lock-free.
  • alignas(64) atomic<double> shared_buffer[1024]; would in theory still allow auto-vectorization of code that reads or writes it, but compilers do not generate efficient asm for this.
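
To make the last point concrete, here is a minimal sketch of the kind of code in question, using relaxed per-element accesses on the shared buffer. The functions fill_buffer and sum_buffer are hypothetical names for this illustration. Each element access is individually atomic, so a vectorized version would be valid in principle, but compilers generally keep such loops scalar.

```cpp
#include <atomic>
#include <cstddef>

// Shared buffer from the list above, aligned to a cache line.
alignas(64) std::atomic<double> shared_buffer[1024];

// Writes every element with a relaxed atomic store.
// Only per-element atomicity is required here.
void fill_buffer(double value) {
    for (auto& slot : shared_buffer) {
        slot.store(value, std::memory_order_relaxed);
    }
}

// Reads every element with a relaxed atomic load and sums them.
double sum_buffer() {
    double total = 0.0;
    for (std::size_t i = 0; i < 1024; ++i) {
        total += shared_buffer[i].load(std::memory_order_relaxed);
    }
    return total;
}
```

Inspecting the generated assembly for these loops is the simplest way to check what a particular compiler and flag combination actually emits.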

Atomically Reading and Updating 16B Objects

You can atomically update a 16B object while reading its 8B halves separately. However, compilers do not provide a clean way to express this, and you cannot rely on cmpxchg16b being inlined: depending on the compiler version and flags, a 16-byte atomic operation may compile to a library call instead.
