From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 53854385803D; Mon, 28 Mar 2022 07:23:10 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 53854385803D From: "peter at cordes dot ca" To: gcc-bugs@gcc.gnu.org Subject: [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd Date: Mon, 28 Mar 2022 07:23:09 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: sanitizer X-Bugzilla-Version: 6.3.0 X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: peter at cordes dot ca X-Bugzilla-Status: NEW X-Bugzilla-Resolution: X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: gcc-bugs@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-bugs mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Mar 2022 07:23:10 -0000 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D84508 --- Comment #17 from Peter Cordes --- (In reply to Andrew Pinski from comment #16) > >According to Intel ( > > https://software.intel.com/sites/landingpage/IntrinsicsGuide), there ar= e no > > alignment requirements for _mm_load_sd, _mm_store_sd and _mm_loaddup_pd= . For > > example, from _mm_load_sd: >=20 > I disagree with saying there is no alignment requirement. >=20 > The alignment requirement comes from the type of the argument (double > const*). [...] > Pointers themselves have an alignment requirement not just at the time of > the load/store of them. The intrinsics are badly designed to take pointer args with types other than void*, despite how they're expected to work. This is something we just nee= d to accept. Starting with AVX-512, any new intrinsics take void*, but they hav= en't redefined the old ones. _mm_loadu_si128 takes a __m128i*, same as _mm_load_si128. alignof(__m128i)= =3D=3D 16, so _mm_loadu_si128 must not simply dereference it, that's what _mm_load_si128 does. Intel's intrinsics API requires you to do unaligned 16-byte loads by creati= ng a misaligned pointer and passing it to a loadu intrinsic. (This in turn requ= ires that implementations supporting these intrinsics define the behaviour of creating such a pointer without deref; in ISO C that alone would be UB.) This additional unaligned-pointer behaviour that implementations must define (at least for __m128i* and float/double*) is something I wrote about in an = SO answer: https://stackoverflow.com/questions/52112605/is-reinterpret-casting-between= -hardware-simd-vector-pointer-and-the-correspond _mm_loadu_ps (like _mm_load_ps) takes a float*, but its entire purpose it to not require alignment. _mm512_loadu_ps takes a void* arg, so we can infer that earlier FP load intrinsics really are intended to work on data with any alignment, not just with the alignment of a float. They're unlike a normal deref of a float* in aliasing rules, although that's separate from creating a misaligned float* in code outside the intrinsic. A hypothetical low-performance portable emulation of intrinsics that ended up dereferencing that float* arg directly would be broken for strict-aliasing = as well. The requirement to define the behaviour of having a misaligned float* can be blamed on Intel in 1995 (when SSE1 was new). Later extensions like AVX _mm256_loadu_ps just followed the same pattern of taking float* until they finally used void* for intrinsics introduced with or after AVX-512. The introduction of _mm_loadu_si32 and si16 is another step in the right direction, recognizing that _mm_cvtsi32_si128( *int_ptr ) isn't strict-alia= sing safe. When those were new, it might have been around the time Intel started exploring replacing ICC with the LLVM-based ICX. Anyway, the requirement to support misaligned vector and float/double point= ers implies that _mm_load_ss/sd taking float*/double* doesn't imply alignof(flo= at) or alignof(double). > So either the intrinsics definition needs to be changed to be > correct or GCC is correct. That's an option; I'd love it if all the load/store intrinsics were changed across all compilers to take void*. It's ugly and a pain to type=20=20 _mm_loadu_si128( (const __m128i*)ptr ) as well as creating cognitive dissonance because alignof(__m128i) =3D=3D 16. I'm not sure if it could break anything to change the intrinsics to take vo= id* even for older ones; possibly only C++ overload resolution for insane code = that defines a _mm_loadu_ps( other_type * ) and relies on float* args picking the intrinsic. If we changed just GCC, without getting buy-in from other compilers, taking void* would let people's code compile on GCC without casts from stuff like int*, when it wouldn't compile on other compilers. That could be considered a bad thing if people test their code with GCC and= are surprised to get reports of failure from people using compilers that follow Intel's documentation for the intrinsic function arg types.=20 (https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html).= It would basically be a case of being overly permissive for the feature / API = that people are trying to write portable code against.=