From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 53854385803D; Mon, 28 Mar 2022 07:23:10 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 53854385803D
From: "peter at cordes dot ca" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd
Date: Mon, 28 Mar 2022 07:23:09 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: sanitizer
X-Bugzilla-Version: 6.3.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: peter at cordes dot ca
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-84508-4-aTLQieqa2k@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-84508-4@http.gcc.gnu.org/bugzilla/>
References: <bug-84508-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Mon, 28 Mar 2022 07:23:10 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D84508
--- Comment #17 from Peter Cordes <peter at cordes dot ca> ---
(In reply to Andrew Pinski from comment #16)
> >According to Intel (
> > https://software.intel.com/sites/landingpage/IntrinsicsGuide), there ar=
e no
> > alignment requirements for _mm_load_sd, _mm_store_sd and _mm_loaddup_pd=
. For
> > example, from _mm_load_sd:
>=20
> I disagree with saying there is no alignment requirement.
>=20
> The alignment requirement comes from the type of the argument (double
> const*). [...]
> Pointers themselves have an alignment requirement not just at the time of
> the load/store of them.

The intrinsics are badly designed to take pointer args with types other than
void*, despite how they're expected to work.  This is something we just nee=
d to
accept.  Starting with AVX-512, any new intrinsics take void*, but they hav=
en't
redefined the old ones.

_mm_loadu_si128 takes a __m128i*, same as _mm_load_si128.  alignof(__m128i)=
 =3D=3D
16, so _mm_loadu_si128 must not simply dereference it, that's what
_mm_load_si128 does.

Intel's intrinsics API requires you to do unaligned 16-byte loads by creati=
ng a
misaligned pointer and passing it to a loadu intrinsic.  (This in turn requ=
ires
that implementations supporting these intrinsics define the behaviour of
creating such a pointer without deref; in ISO C that alone would be UB.)

This additional unaligned-pointer behaviour that implementations must define
(at least for __m128i* and float/double*) is something I wrote about in an =
SO
answer:
https://stackoverflow.com/questions/52112605/is-reinterpret-casting-between=
-hardware-simd-vector-pointer-and-the-correspond


_mm_loadu_ps (like _mm_load_ps) takes a float*, but its entire purpose it to
not require alignment.

_mm512_loadu_ps takes a void* arg, so we can infer that earlier FP load
intrinsics really are intended to work on data with any alignment, not just
with the alignment of a float.

They're unlike a normal deref of a float* in aliasing rules, although that's
separate from creating a misaligned float* in code outside the intrinsic.  A
hypothetical low-performance portable emulation of intrinsics that ended up
dereferencing that float* arg directly would be broken for strict-aliasing =
as
well.

The requirement to define the behaviour of having a misaligned float* can be
blamed on Intel in 1995 (when SSE1 was new). Later extensions like AVX
_mm256_loadu_ps just followed the same pattern of taking float* until they
finally used void* for intrinsics introduced with or after AVX-512.

The introduction of _mm_loadu_si32 and si16 is another step in the right
direction, recognizing that _mm_cvtsi32_si128( *int_ptr ) isn't strict-alia=
sing
safe.  When those were new, it might have been around the time Intel started
exploring replacing ICC with the LLVM-based ICX.

Anyway, the requirement to support misaligned vector and float/double point=
ers
implies that _mm_load_ss/sd taking float*/double* doesn't imply alignof(flo=
at)
or alignof(double).

>  So either the intrinsics definition needs to be changed to be
> correct or GCC is correct.

That's an option; I'd love it if all the load/store intrinsics were changed
across all compilers to take void*.  It's ugly and a pain to type=20=20
_mm_loadu_si128( (const __m128i*)ptr )
as well as creating cognitive dissonance because alignof(__m128i) =3D=3D 16.

I'm not sure if it could break anything to change the intrinsics to take vo=
id*
even for older ones; possibly only C++ overload resolution for insane code =
that
defines a _mm_loadu_ps( other_type * ) and relies on float* args picking the
intrinsic.

If we changed just GCC, without getting buy-in from other compilers, taking
void* would let people's code compile on GCC without casts from stuff like
int*, when it wouldn't compile on other compilers.

That could be considered a bad thing if people test their code with GCC and=
 are
surprised to get reports of failure from people using compilers that follow
Intel's documentation for the intrinsic function arg types.=20
(https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html).=
  It
would basically be a case of being overly permissive for the feature / API =
that
people are trying to write portable code against.=