From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id A78503858CDB; Wed, 29 May 2024 19:13:12 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A78503858CDB
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1717009992;
	bh=Ns9wxZ5K/og0LRBjuynOb5hHzq115GEySgokMqlQoz4=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=eBiNl2lChak0M94A91Tg55veuGN47pWs99VNu3JEZTecdm/ymFWRhVzICl1TgVLwP
	 hKuqHnDts1XgRzBw4fvtZTE1972r3rbZNgYa1ua1X1JjdL6+WOhQipjSSUB65nRstu
	 u4YiGJM2Z71fYm/BSzLl75pblJgujGW4TrxWXQog=
From: "pcordes at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd
Date: Wed, 29 May 2024 19:13:11 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: sanitizer
X-Bugzilla-Version: 6.3.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: pcordes at gmail dot com
X-Bugzilla-Status: RESOLVED
X-Bugzilla-Resolution: FIXED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-84508-4-wb96e1xota@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-84508-4@http.gcc.gnu.org/bugzilla/>
References: <bug-84508-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508
--- Comment #24 from Peter Cordes <pcordes at gmail dot com> ---
(In reply to Jeffrey Walton from comment #23)
> (In reply to Peter Cordes from comment #22)
> > [...]
> > That instruction is useless and should never be used in asm except for
> > code-alignment reasons (1 byte longer than MOVLPS, same length as MOVSD, all
> > three doing the same thing for the memory-destination form).  But easy to
> > imagine some code using that intrinsic to store an unaligned double into a
> > byte buffer.
>
> Reading from and writing to a [unaligned] byte stream in 4 or 8 byte chunks
> is our use case. Eventually, we need to perform traditional SIMD processing.
> But the loads and stores have to occur using these old intrinsics due to
> the word types, data stream format and supported ISA's.
>
> I believe the other option is to memcpy the byte stream into a properly
> aligned intermediate buffer. But that could incur a performance hit if the
> optimizer misses the opportunity (and fails to elide the memcpy).
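
For reference, the memcpy route mentioned in the quote can be sketched roughly
like this.  The helper names are made up for illustration, and this is only a
minimal sketch: whether the memcpy actually gets elided depends on the compiler
and optimization level.

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_set_sd, _mm_cvtsd_f64 */
#include <string.h>     /* memcpy */

/* Hypothetical helper: read 8 bytes from any address into the low lane.
   memcpy has no alignment requirement, so nothing here depends on how
   the intrinsic headers declare their pointer types. */
static inline __m128d load_double_memcpy(const void *src)
{
    double d;
    memcpy(&d, src, sizeof d);
    return _mm_set_sd(d);       /* low lane = d, high lane = 0.0 */
}

/* Hypothetical helper: write the low double of v to any address. */
static inline void store_low_double_memcpy(void *dst, __m128d v)
{
    double d = _mm_cvtsd_f64(v);    /* extract the low lane */
    memcpy(dst, &d, sizeof d);
}
```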


Apparently GCC has been "broken" for ages, making it UB to use misaligned
pointers with any of these intrinsics that only just now had their alignment
requirements removed.  The same is still true of _mm_storel_pd, which is
unchanged by the patch.  This usually didn't result in miscompilation, though.

Going forward, simply avoid _mm_storel_pd.
Use _mm_store_sd (MOVSD) or _mm_storel_pi (MOVLPS) which have been fixed by
this patch.

_mm_store_sd derefs a double_u pointer, __attribute__((aligned(1),may_alias))
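
A minimal sketch of storing the low double into a byte buffer that way.  The
helper name is made up, and this assumes headers that include the fix
discussed here; with older headers the same source compiles, but the store
goes through a plain double*, which is what UBSan complains about.

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_store_sd */

/* Hypothetical helper: store the low double of v at an arbitrary byte
   address.  The intrinsic's signature wants a double*, hence the cast.
   Assumption: the installed headers make _mm_store_sd dereference an
   aligned(1), may_alias type internally, as described above. */
static inline void store_low_double_sd(void *dst, __m128d v)
{
    _mm_store_sd((double *)dst, v);
}
```

Usage would be something like store_low_double_sd(buf + offset, vec) with buf
an unsigned char array and no alignment guarantee on offset.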

_mm_storel_pi uses __builtin_ia32_storelps.
It didn't change in this patch, so presumably has been correct for longer.  If
you can put up with the amount of casting required to use it for the low double
of a __m128d (perhaps in a wrapper function that takes a void* and a vector),
_mm_storel_pi might be your best bet, unless there's anything weird about the
GCC internals for __builtin_ia32_storelps.
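
Such a wrapper might look roughly like this.  The function name is made up
for illustration, and this is a sketch under the assumption stated above that
the builtin behind _mm_storel_pi is fine with misaligned addresses.

```c
#include <emmintrin.h>  /* SSE2; also provides _mm_storel_pi via xmmintrin.h */

/* Hypothetical wrapper: store the low 8 bytes (the low double) of a
   __m128d to an arbitrary address using the MOVLPS intrinsic.
   Two casts are needed: the vector is reinterpreted as __m128 (a free
   bit-cast, no conversion instruction), and the destination as __m64*. */
static inline void store_low_double_lps(void *dst, __m128d v)
{
    _mm_storel_pi((__m64 *)dst, _mm_castpd_ps(v));
}
```

Keeping the casts inside one static inline function means callers only ever
pass a void* and a vector.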

The asm instruction you want is MOVLPS (1 byte shorter than the others in
non-AVX code), so it also has the advantage of hinting GCC to use that.