Is reinterpret_casting Between a Hardware SIMD Vector Pointer and the Corresponding Type Undefined Behavior?
In C++, is it permissible to reinterpret_cast a float* to a __m256* and access float objects through that different pointer type?
The following code example illustrates this:
#include <immintrin.h>
#include <cstddef>

constexpr std::size_t _m256_float_step_sz = sizeof(__m256) / sizeof(float);
alignas(__m256) float stack_store[100 * _m256_float_step_sz]{};

__m256& hwvec1 = *reinterpret_cast<__m256*>(&stack_store[0 * _m256_float_step_sz]);

using arr_t = float[_m256_float_step_sz];
arr_t& arr1 = *reinterpret_cast<float(*)[_m256_float_step_sz]>(&hwvec1);
Do hwvec1 and arr1 have undefined behavior? Do they violate the strict aliasing rules ([basic.lval]/11)? Or is the only defined way the one using the intrinsics:
__m256 hwvec2 = _mm256_load_ps(&stack_store[0 * _m256_float_step_sz]);
_mm256_store_ps(&stack_store[1 * _m256_float_step_sz], hwvec2);
Answer:
ISO C++ doesn't define __m256, so we need to look at what does define its behaviour on the implementations that support it. Intel's intrinsics define vector pointers like __m256* as being allowed to alias anything else, the same way ISO C++ defines char* as being allowed to alias anything. (But not vice versa: it's UB, and breaks in practice, to point an int* at a __m256i object and dereference it.)
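A minimal sketch of those two directions (function and variable names are ours; assumes a compiler supporting Intel intrinsics, e.g. built with -mavx):

#include <immintrin.h>

void aliasing_directions() {
    alignas(32) float f[8] = {};
    // Allowed: a __m256* may point at float objects, like char* may alias anything.
    __m256 v = *reinterpret_cast<const __m256*>(f);    // aligned deref: OK

    alignas(32) int i[8] = {};
    __m256i vi = *reinterpret_cast<const __m256i*>(i); // also OK

    // The reverse direction is NOT blessed: dereferencing a plain int*
    // that points at a __m256i object violates strict aliasing.
    // int x = *reinterpret_cast<const int*>(&vi);     // UB: don't do this
    (void)v; (void)vi;
}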
So yes, it's safe to dereference a __m256* instead of using an _mm256_load_ps() aligned-load intrinsic. But especially for float/double, it's often easier to use the intrinsics because they take care of the casting from float*, too. For integers, the AVX-512 load/store intrinsics are defined as taking void*, but AVX2 and earlier need a cast like _mm256_loadu_si256((const __m256i*)&arr[i]), which is clunky API design and clutters up code that uses it.
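For illustration, a hedged sketch of those styles side by side (function name is ours; the caller is assumed to pass buffers large enough for the vector width):

#include <immintrin.h>

void load_store_styles(float* fp, int* ip) {
    // float/double intrinsics take float*/double* directly: no cast needed.
    __m256 vf = _mm256_loadu_ps(fp);
    _mm256_storeu_ps(fp, vf);

    // AVX2 and earlier integer load/store intrinsics take __m256i*,
    // so code has to cast; clunky, but well-defined.
    __m256i vi = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(ip));
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(ip), vi);

#ifdef __AVX512F__
    // AVX-512 load/store intrinsics take void*, so no cast is needed.
    __m512i wi = _mm512_loadu_si512(ip);
    _mm512_storeu_si512(ip, wi);
#endif
}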
A few non-AVX-512 intrinsics have also been added that take void*, giving alignment- and aliasing-safe movd/movq loads and stores, such as _mm_loadu_si32(void const*). Previously, I think Intel assumed you'd use _mm_cvtsi32_si128, which required loading an int safely yourself, which meant using memcpy to avoid UB (at least on compilers other than classic ICC and MSVC, which allow unaligned int* and don't enforce strict aliasing).
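A sketch of the two approaches, the older memcpy + _mm_cvtsi32_si128 route and the newer _mm_loadu_si32 (which needs a compiler recent enough to provide it; function names are ours):

#include <immintrin.h>
#include <cstring>

// Load 4 bytes from a possibly misaligned, possibly differently-typed buffer
// into the low element of a vector, without UB.
__m128i load_low32_old(const char* p) {
    int tmp;
    std::memcpy(&tmp, p, sizeof(tmp));   // aliasing- and alignment-safe scalar load
    return _mm_cvtsi32_si128(tmp);       // then move the int into a vector register
}

__m128i load_low32_new(const char* p) {
    return _mm_loadu_si32(p);            // takes void const*: safe in one step
}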
This might have been around the time Intel started looking at migrating to LLVM for ICX/ICPX / oneAPI and realized what a mess it was to deal with narrow loads on compilers that enforce strict aliasing.