From: "pcordes at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd
Date: Wed, 29 May 2024 19:13:11 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #24 from Peter Cordes ---
(In reply to Jeffrey Walton from comment #23)
> (In reply to Peter Cordes from comment #22)
> > [...]
> > That instruction is useless and should never be used in asm except for
> > code-alignment reasons (1 byte longer than MOVLPS, same length as MOVSD, all
> > three doing the same thing for the memory-destination form). But easy to
> > imagine some code using that intrinsic to store an unaligned double into a
> > byte buffer.
>
> Reading from and writing to a [unaligned] byte stream in 4 or 8 byte chunks
> is our use case. Eventually, we need to perform traditional SIMD processing.
> But the loads and stores have to occur using these old intrinsics due to
> the word types, data stream format and supported ISA's.
>
> I believe the other option is to memcpy the byte stream into a properly
> aligned intermediate buffer. But that could incur a performance hit if the
> optimizer misses the opportunity (and fails to elide the memcpy).

Apparently GCC has been "broken" for ages, making it UB to use misaligned
pointers with any of these intrinsics that only just now had their alignment
requirements removed. The same applies to _mm_storel_pd, which is unchanged.
Usually this doesn't result in miscompilation, though.

Going forward, simply avoid _mm_storel_pd. Use _mm_store_sd (MOVSD) or
_mm_storel_pi (MOVLPS) instead; both should be fine with misaligned pointers
once this patch is applied:

_mm_store_sd dereferences a double_u pointer, __attribute__((aligned(1),may_alias))
_mm_storel_pi uses __builtin_ia32_storelps

_mm_storel_pi didn't change in this patch, so it has presumably been correct
for longer. If you can put up with the amount of casting required to use it
for the low double of a __m128d (perhaps in a wrapper function that takes a
void* and a vector), _mm_storel_pi might be your best bet, unless there's
anything weird about the GCC internals for __builtin_ia32_storelps.

The asm instruction you want is MOVLPS (1 byte shorter than the others in
non-AVX code), so using _mm_storel_pi also has the advantage of hinting GCC
to emit that instruction.