public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd
       [not found] <bug-84508-4@http.gcc.gnu.org/bugzilla/>
@ 2021-05-17  8:08 ` jakub at gcc dot gnu.org
  2022-03-26 23:35 ` peter at cordes dot ca
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 15+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-05-17  8:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|8.5                         |---

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd
       [not found] <bug-84508-4@http.gcc.gnu.org/bugzilla/>
  2021-05-17  8:08 ` [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd jakub at gcc dot gnu.org
@ 2022-03-26 23:35 ` peter at cordes dot ca
  2022-03-28  1:59 ` crazylht at gmail dot com
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 15+ messages in thread
From: peter at cordes dot ca @ 2022-03-26 23:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

Peter Cordes <peter at cordes dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #14 from Peter Cordes <peter at cordes dot ca> ---
This bug is mis-categorized; it's not a sanitizer bug, it's a bug in the
implementation _mm_load_ss / sd.

It currently derefs the  `float const*` arg directly, which is not
strict-aliasing or alignment safe.  alignof(float) is 4, but Intel's
documentation for this API still says "mem_addr does not need to be aligned on
any particular boundary."

_mm_load_ss (float const *__P)
{
  return _mm_set_ss (*__P);
}


As discussed on PR99754 _mm_load_si32(const void*) *is* strict-aliasing and
alignment safe.  But it only existed recently, and GCC11's implementation of it
is buggy (shuffling the element to the wrong place).  Before that, one safe way
to do a 32-bit SIMD load is with _mm_load_ss and _mm_castps_si128.  Or it was
supposed to be safe, but isn't!!

Clang uses a packed may_alias struct containing a float to get a safe load
done.  Another way would be casting the pointer to

typdef float aliasing_unaligned_f32 __attribute__((aligned(1),may_alias));

This is similar to what we do with __m32_u for use in aliasing-safe integer
load/store, except we define that as int with
vector_size(4),may_alias,aligned(1) for some reason.  Perhaps influenced by
__m64_u which is a vector of 2 ints.

MSVC is like gcc -fno-strict-aliasing, so however it handles intrinsics,
they're always aliasing-safe.

I'm not 100% sure about what ICC formally guarantees, but in practice it
doesn't move aliasing short*  stores across a _mm_load_ss( (float*)pshort )
load.
https://godbolt.org/z/6s76v71xz  I didn't test with _mm_store_ss aliasing with
short loads, only vice versa.

So GCC is the odd one out, out of the major 4 compilers that support Intel's
intrinsics API.  All our narrow load/store intrinsics should be strict-aliasing
and alignment safe, regardless of what pointer type they accept.

Intel's early design of taking float* and double* instead of void* could be
considered poor design.  Their naming with just load/store instead of
_mm_loadu_ss / storeu is also poor design, clearly motivated by the asm
differences rather than an actual intrinsic API difference.

In x86 asm, loads/stores narrower than 16 bytes never require alignment (unless
the AC bit is set in EFLAGS).  Assuming Intel modeled their intrinsics API
after their asm, then it makes sense to have load and loadu for ps and si128,
but only load/store with an implied lack of alignment for intrinsics that wrap
instructions like movlps / movhps / movss / movsd, and movd / movq, which do
narrower memory accesses.

That of course *doesn't* make sense in C terms, where it's always potentially a
problem to dereference misaligned pointers to narrow objects, even when
compiling for x86-64:
https://stackoverflow.com/questions/47510783/why-does-unaligned-access-to-mmaped-memory-sometimes-segfault-on-amd64
has an example and links some others, showing that compilers *don't* define the
behaviour of deref of misaligned pointers.

I'm pretty certain that Intel always intended their narrow load/store
intrinsics to not have any alignment requirements, like the asm instructions
that wrap them, but weren't thinking in C terms when naming them.  And were
sloppily in their choices of which ones to provide until decades later, since
it seems they thought that _mm_cvtsi32_si128(*x) was sufficient for a movd
load.  (Only the case on a compiler without strict-aliasing or alignment, since
the deref happens on the user's plain int*).

Anyway, hopefully this refutes the argument that _mm_load_sd should be aligned
because of the name, and clarifies what Intel might have been thinking when
naming these.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd
       [not found] <bug-84508-4@http.gcc.gnu.org/bugzilla/>
  2021-05-17  8:08 ` [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd jakub at gcc dot gnu.org
  2022-03-26 23:35 ` peter at cordes dot ca
@ 2022-03-28  1:59 ` crazylht at gmail dot com
  2022-03-28  2:09 ` pinskia at gcc dot gnu.org
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 15+ messages in thread
From: crazylht at gmail dot com @ 2022-03-28  1:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #15 from Hongtao.liu <crazylht at gmail dot com> ---

Clang's implementation:

1681static __inline__ __m128 __DEFAULT_FN_ATTRS
1682_mm_load_ss(const float *__p)
1683{
1684  struct __mm_load_ss_struct {
1685    float __u;
1686  } __attribute__((__packed__, __may_alias__));
1687  float __u = ((const struct __mm_load_ss_struct*)__p)->__u;
1688  return __extension__ (__m128){ __u, 0, 0, 0 };
1689}

Guess we can do similar things, will handle it in GCC13.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd
       [not found] <bug-84508-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2022-03-28  1:59 ` crazylht at gmail dot com
@ 2022-03-28  2:09 ` pinskia at gcc dot gnu.org
  2022-03-28  7:23 ` peter at cordes dot ca
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 15+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-03-28  2:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #16 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>According to Intel (
> https://software.intel.com/sites/landingpage/IntrinsicsGuide), there are no
> alignment requirements for _mm_load_sd, _mm_store_sd and _mm_loaddup_pd. For
> example, from _mm_load_sd:

I disagree with saying there is no alignment requirement.

The alignment requirement comes from the type of the argument (double const*).
So either the intrinsics definition needs to be changed to be correct or GCC is
correct.
Pointers themselves have an alignment requirement not just at the time of the
load/store of them.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd
       [not found] <bug-84508-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2022-03-28  2:09 ` pinskia at gcc dot gnu.org
@ 2022-03-28  7:23 ` peter at cordes dot ca
  2023-12-17 23:21 ` pinskia at gcc dot gnu.org
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 15+ messages in thread
From: peter at cordes dot ca @ 2022-03-28  7:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #17 from Peter Cordes <peter at cordes dot ca> ---
(In reply to Andrew Pinski from comment #16)
> >According to Intel (
> > https://software.intel.com/sites/landingpage/IntrinsicsGuide), there are no
> > alignment requirements for _mm_load_sd, _mm_store_sd and _mm_loaddup_pd. For
> > example, from _mm_load_sd:
> 
> I disagree with saying there is no alignment requirement.
> 
> The alignment requirement comes from the type of the argument (double
> const*). [...]
> Pointers themselves have an alignment requirement not just at the time of
> the load/store of them.

The intrinsics are badly designed to take pointer args with types other than
void*, despite how they're expected to work.  This is something we just need to
accept.  Starting with AVX-512, any new intrinsics take void*, but they haven't
redefined the old ones.

_mm_loadu_si128 takes a __m128i*, same as _mm_load_si128.  alignof(__m128i) ==
16, so _mm_loadu_si128 must not simply dereference it, that's what
_mm_load_si128 does.

Intel's intrinsics API requires you to do unaligned 16-byte loads by creating a
misaligned pointer and passing it to a loadu intrinsic.  (This in turn requires
that implementations supporting these intrinsics define the behaviour of
creating such a pointer without deref; in ISO C that alone would be UB.)

This additional unaligned-pointer behaviour that implementations must define
(at least for __m128i* and float/double*) is something I wrote about in an SO
answer:
https://stackoverflow.com/questions/52112605/is-reinterpret-casting-between-hardware-simd-vector-pointer-and-the-correspond


_mm_loadu_ps (like _mm_load_ps) takes a float*, but its entire purpose it to
not require alignment.

_mm512_loadu_ps takes a void* arg, so we can infer that earlier FP load
intrinsics really are intended to work on data with any alignment, not just
with the alignment of a float.

They're unlike a normal deref of a float* in aliasing rules, although that's
separate from creating a misaligned float* in code outside the intrinsic.  A
hypothetical low-performance portable emulation of intrinsics that ended up
dereferencing that float* arg directly would be broken for strict-aliasing as
well.

The requirement to define the behaviour of having a misaligned float* can be
blamed on Intel in 1995 (when SSE1 was new). Later extensions like AVX
_mm256_loadu_ps just followed the same pattern of taking float* until they
finally used void* for intrinsics introduced with or after AVX-512.

The introduction of _mm_loadu_si32 and si16 is another step in the right
direction, recognizing that _mm_cvtsi32_si128( *int_ptr ) isn't strict-aliasing
safe.  When those were new, it might have been around the time Intel started
exploring replacing ICC with the LLVM-based ICX.

Anyway, the requirement to support misaligned vector and float/double pointers
implies that _mm_load_ss/sd taking float*/double* doesn't imply alignof(float)
or alignof(double).

>  So either the intrinsics definition needs to be changed to be
> correct or GCC is correct.

That's an option; I'd love it if all the load/store intrinsics were changed
across all compilers to take void*.  It's ugly and a pain to type  
_mm_loadu_si128( (const __m128i*)ptr )
as well as creating cognitive dissonance because alignof(__m128i) == 16.

I'm not sure if it could break anything to change the intrinsics to take void*
even for older ones; possibly only C++ overload resolution for insane code that
defines a _mm_loadu_ps( other_type * ) and relies on float* args picking the
intrinsic.

If we changed just GCC, without getting buy-in from other compilers, taking
void* would let people's code compile on GCC without casts from stuff like
int*, when it wouldn't compile on other compilers.

That could be considered a bad thing if people test their code with GCC and are
surprised to get reports of failure from people using compilers that follow
Intel's documentation for the intrinsic function arg types. 
(https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html).  It
would basically be a case of being overly permissive for the feature / API that
people are trying to write portable code against.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd
       [not found] <bug-84508-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2022-03-28  7:23 ` peter at cordes dot ca
@ 2023-12-17 23:21 ` pinskia at gcc dot gnu.org
  2024-05-09  7:43 ` cvs-commit at gcc dot gnu.org
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 15+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-12-17 23:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pobrn at protonmail dot com

--- Comment #18 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 113053 has been marked as a duplicate of this bug. ***

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd
       [not found] <bug-84508-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2023-12-17 23:21 ` pinskia at gcc dot gnu.org
@ 2024-05-09  7:43 ` cvs-commit at gcc dot gnu.org
  2024-05-09  8:55 ` liuhongt at gcc dot gnu.org
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 15+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-09  7:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #19 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hu <hulin@gcc.gnu.org>:

https://gcc.gnu.org/g:5967696c0f6300da4387fea5d102be5bc9f23233

commit r15-337-g5967696c0f6300da4387fea5d102be5bc9f23233
Author: Hu, Lin1 <lin1.hu@intel.com>
Date:   Fri Jan 19 15:22:10 2024 +0800

    i386: Fix some intrinsics without alignment requirements.

    gcc/ChangeLog:

            PR target/84508
            * config/i386/emmintrin.h
            (_mm_load_sd): Remove alignment requirement.
            (_mm_store_sd): Ditto.
            (_mm_loadh_pd): Ditto.
            (_mm_loadl_pd): Ditto.
            (_mm_storel_pd): Add alignment requirement.
            * config/i386/xmmintrin.h
            (_mm_loadh_pi): Remove alignment requirement.
            (_mm_loadl_pi): Ditto.
            (_mm_load_ss): Ditto.
            (_mm_store_ss): Ditto.

    gcc/testsuite/ChangeLog:

            PR target/84508
            * gcc.target/i386/pr84508-1.c: New test.
            * gcc.target/i386/pr84508-2.c: Ditto.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd
       [not found] <bug-84508-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2024-05-09  7:43 ` cvs-commit at gcc dot gnu.org
@ 2024-05-09  8:55 ` liuhongt at gcc dot gnu.org
  2024-05-09 19:02 ` cvs-commit at gcc dot gnu.org
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 15+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-05-09  8:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

Hongtao Liu <liuhongt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |liuhongt at gcc dot gnu.org
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #20 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
Fixed in GCC15.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd
       [not found] <bug-84508-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2024-05-09  8:55 ` liuhongt at gcc dot gnu.org
@ 2024-05-09 19:02 ` cvs-commit at gcc dot gnu.org
  2024-05-29 18:45 ` pcordes at gmail dot com
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 15+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-09 19:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #21 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:e02b5683e77c2b4317b23be72e43b6e6cc6c8e5b

commit r15-350-ge02b5683e77c2b4317b23be72e43b6e6cc6c8e5b
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu May 9 20:59:05 2024 +0200

    testsuite: Fix up pr84508* tests [PR84508]

    The tests FAIL on x86_64-linux with
    /usr/bin/ld: cannot find -lubsan
    collect2: error: ld returned 1 exit status
    compiler exited with status 1
    FAIL: gcc.target/i386/pr84508-1.c (test for excess errors)
    Excess errors:
    /usr/bin/ld: cannot find -lubsan

    The problem is that only *.dg/ubsan/ubsan.exp calls ubsan_init
    which adds the needed search paths to libubsan library.
    So, link/run tests for -fsanitize=undefined need to go into
    gcc.dg/ubsan/ or g++.dg/ubsan/, even when they are target specific.

    2024-05-09  Jakub Jelinek  <jakub@redhat.com>

            PR target/84508
            * gcc.target/i386/pr84508-1.c: Move to ...
            * gcc.dg/ubsan/pr84508-1.c: ... here.  Restrict to i?86/x86_64
            non-ia32 targets.
            * gcc.target/i386/pr84508-2.c: Move to ...
            * gcc.dg/ubsan/pr84508-2.c: ... here.  Restrict to i?86/x86_64
            non-ia32 targets.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd
       [not found] <bug-84508-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2024-05-09 19:02 ` cvs-commit at gcc dot gnu.org
@ 2024-05-29 18:45 ` pcordes at gmail dot com
  2024-05-29 19:01 ` noloader at gmail dot com
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 15+ messages in thread
From: pcordes at gmail dot com @ 2024-05-29 18:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #22 from Peter Cordes <pcordes at gmail dot com> ---
Why are we adding an alignment requirement to _mm_storel_pd, the intrinsic for
MOVLPD?

It was defined in terms of _mm_store_sd (which this patch correctly changes to
remove the alignment requirement), so we're technically not *adding* an
alignment requirement, rather keeping it from the old definition of
_mm_store_sd.

This is a bad thing; we should be removing the alignment requirement on it,
too.

That instruction is useless and should never be used in asm except for
code-alignment reasons (1 byte longer than MOVLPS, same length as MOVSD, all
three doing the same thing for the memory-destination form).  But easy to
imagine some code using that intrinsic to store an unaligned double into a byte
buffer.

IDK if there's any authoritative documentation from Intel on which intrinsics
support unaligned pointers, but for intrinsics which are documented as
corresponding to one specific instruction (unlike _mm_set), the sensible
assumption would be that the intrinsic has the same alignment requirements as
the instruction.  For everything narrower than 16 bytes, that means no
alignment requirement.  I think most programmers would find it surprising if
that wasn't the case, especially since GCC doesn't AFAIK document the
intrinsics itself to specify anything else.

(And with Intel intrinsics, I think they're all intended to allow aliasing,
e.g. pointing a double* at a buffer also accessed with some struct type.)

Also, should the type name  double_u   be changed to something with __ to avoid
polluting the namespace?

(In reply to GCC Commits from comment #19)
> The master branch has been updated by Hu <hulin@gcc.gnu.org>:
> 
> https://gcc.gnu.org/g:5967696c0f6300da4387fea5d102be5bc9f23233
> 
> commit r15-337-g5967696c0f6300da4387fea5d102be5bc9f23233
...
>             (_mm_storel_pd): Add alignment requirement.
>             * config/i386/xmmintrin.h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd
       [not found] <bug-84508-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2024-05-29 18:45 ` pcordes at gmail dot com
@ 2024-05-29 19:01 ` noloader at gmail dot com
  2024-05-29 19:13 ` pcordes at gmail dot com
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 15+ messages in thread
From: noloader at gmail dot com @ 2024-05-29 19:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #23 from Jeffrey Walton <noloader at gmail dot com> ---
(In reply to Peter Cordes from comment #22)
> [...]
> That instruction is useless and should never be used in asm except for
> code-alignment reasons (1 byte longer than MOVLPS, same length as MOVSD, all
> three doing the same thing for the memory-destination form).  But easy to
> imagine some code using that intrinsic to store an unaligned double into a
> byte buffer.

Reading from and writing to a [unaligned] byte stream in 4 or 8 byte chunks is
our use case. Eventually, we need to perform traditional SIMD processing. But
the loads and stores have to occur using these old instrinsics due to the word
types, data stream format and supported ISA's.

I believe the other option is to memcpy the byte stream into a properly aligned
intermediate buffer. But that could incur a performance hit if the optimizer
misses the opportunity (and fails to elide the memcpy).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd
       [not found] <bug-84508-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2024-05-29 19:01 ` noloader at gmail dot com
@ 2024-05-29 19:13 ` pcordes at gmail dot com
  2024-05-30  6:01 ` liuhongt at gcc dot gnu.org
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 15+ messages in thread
From: pcordes at gmail dot com @ 2024-05-29 19:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #24 from Peter Cordes <pcordes at gmail dot com> ---
(In reply to Jeffrey Walton from comment #23)
> (In reply to Peter Cordes from comment #22)
> > [...]
> > That instruction is useless and should never be used in asm except for
> > code-alignment reasons (1 byte longer than MOVLPS, same length as MOVSD, all
> > three doing the same thing for the memory-destination form).  But easy to
> > imagine some code using that intrinsic to store an unaligned double into a
> > byte buffer.
> 
> Reading from and writing to a [unaligned] byte stream in 4 or 8 byte chunks
> is our use case. Eventually, we need to perform traditional SIMD processing.
> But the loads and stores have to occur using these old instrinsics due to
> the word types, data stream format and supported ISA's.
> 
> I believe the other option is to memcpy the byte stream into a properly
> aligned intermediate buffer. But that could incur a performance hit if the
> optimizer misses the opportunity (and fails to elide the memcpy).


Apparently GCC has been "broken" for ages, making it UB to use misaligned
pointers with any of these intrinsics that only just now had their alignment
requirements removed.  And with _mm_storel_pd which is the same as before. 
Usually not resulting in miscompilation, though.

Going forward, simply avoid _mm_storel_pd.
Use _mm_store_sd (MOVSD) or _mm_storel_pi (MOVLPS) which have been fixed by
this patch.

_mm_store_sd derefs a  double_u  pointer, __attribute__((aligned(1),may_alias))

_mm_storel_pi uses __builtin_ia32_storelps
It didn't change in this patch, so presumably has been correct for longer.  If
you can put up with the amount of casting required to use it for the low double
of a __m128d (perhaps in a wrapper function that takes a void* and a vector),
_mm_storel_pi might be your best bet, unless there's anything weird about the
GCC internals for __builtin_ia32_storelps

The asm instruction you want is MOVLPS (1 byte shorter than the others in
non-AVX code) so it also has the advantage of hinting GCC to use that.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd
       [not found] <bug-84508-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2024-05-29 19:13 ` pcordes at gmail dot com
@ 2024-05-30  6:01 ` liuhongt at gcc dot gnu.org
  2024-05-30  6:33 ` liuhongt at gcc dot gnu.org
  2024-05-30 18:38 ` pcordes at gmail dot com
  14 siblings, 0 replies; 15+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-05-30  6:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #25 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Peter Cordes from comment #22)
> Why are we adding an alignment requirement to _mm_storel_pd, the intrinsic
> for MOVLPD?
> 
From Intel intrinsic guide[1], there's explict "mem_addr does not need to be
aligned on any particular boundary" for mm_store_sd, but not for _mm_storel_pd.
[1] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html

> Also, should the type name  double_u   be changed to something with __ to
> avoid polluting the namespace?
Yes, __double_u makes sense.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd
       [not found] <bug-84508-4@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2024-05-30  6:01 ` liuhongt at gcc dot gnu.org
@ 2024-05-30  6:33 ` liuhongt at gcc dot gnu.org
  2024-05-30 18:38 ` pcordes at gmail dot com
  14 siblings, 0 replies; 15+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-05-30  6:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #26 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #25)
> (In reply to Peter Cordes from comment #22)
> > Why are we adding an alignment requirement to _mm_storel_pd, the intrinsic
> > for MOVLPD?
> > 
> From Intel intrinsic guide[1], there's explict "mem_addr does not need to be
> aligned on any particular boundary" for mm_store_sd, but not for
> _mm_storel_pd.
> [1] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
> 
But for mm_loadl_pd, it also says no need for alignment, I need to confirm with
my peers if there's any specific purpose on that.
And yes, for <16-byte memory access, there's no alignment requirement
functionally.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd
       [not found] <bug-84508-4@http.gcc.gnu.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2024-05-30  6:33 ` liuhongt at gcc dot gnu.org
@ 2024-05-30 18:38 ` pcordes at gmail dot com
  14 siblings, 0 replies; 15+ messages in thread
From: pcordes at gmail dot com @ 2024-05-30 18:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #27 from Peter Cordes <pcordes at gmail dot com> ---
(In reply to Hongtao Liu from comment #26)
> (In reply to Hongtao Liu from comment #25)
> > (In reply to Peter Cordes from comment #22)
> > > Why are we adding an alignment requirement to _mm_storel_pd, the intrinsic
> > > for MOVLPD?
> > > 
> > From Intel intrinsic guide[1], there's explict "mem_addr does not need to be
> > aligned on any particular boundary" for mm_store_sd, but not for
> > _mm_storel_pd.
> > [1] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
> > 
> But for mm_loadl_pd, it also says no need for alignment, I need to confirm
> with my peers if there's any specific purpose on that.
> And yes, for <16-byte memory access, there's no alignment requirement
> functionally.

Interesting, yes some entries explicitly say the memory can be unaligned, some
don't.  But I don't think we should read that as alignment required being the
default if not stated.  Every intrinsic that does require alignment explicitly
says so.  (Like _mm_load_si128.)  We could make the same argument in the other
direction, that if an alignment requirement isn't mentioned, we should assume
there isn't one.

And I already posted earlier about why we shouldn't assume C semantics based on
the pointer type as Andrew Pinski had thought.  Intel's intrinsic docs were
originally written for ICC (classic), which takes intrinsics very literally: an
intrinsic in the C source will (almost?) always compile to the corresponding
asm instruction.  And presumably not optimizing based on pointer-alignment UB
even on a deref.  And definitely not on strict-aliasing UB.

So the C defaults for deref of a double* or __m64* shouldn't be assumed even
when the docs don't say anything about alignment.  They also don't mention
aliasing but we know from Intel's examples of how to use intrinsics (I think)
that the load/store intrinsics are all may_alias accesses.

Intel's current ICX compiler is based on LLVM which does care about aliasing
and alignment UB when optimizing, but their intrinsic docs still read like
they're thinking more in terms of asm than in terms of the C abstract machine. 
Probably they haven't been rewritten with that in mind since they implement
them (in their own compilers) so they Just Work even when aliasing other types
or without alignment.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-05-30 18:38 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-84508-4@http.gcc.gnu.org/bugzilla/>
2021-05-17  8:08 ` [Bug sanitizer/84508] Load of misaligned address using _mm_load_sd jakub at gcc dot gnu.org
2022-03-26 23:35 ` peter at cordes dot ca
2022-03-28  1:59 ` crazylht at gmail dot com
2022-03-28  2:09 ` pinskia at gcc dot gnu.org
2022-03-28  7:23 ` peter at cordes dot ca
2023-12-17 23:21 ` pinskia at gcc dot gnu.org
2024-05-09  7:43 ` cvs-commit at gcc dot gnu.org
2024-05-09  8:55 ` liuhongt at gcc dot gnu.org
2024-05-09 19:02 ` cvs-commit at gcc dot gnu.org
2024-05-29 18:45 ` pcordes at gmail dot com
2024-05-29 19:01 ` noloader at gmail dot com
2024-05-29 19:13 ` pcordes at gmail dot com
2024-05-30  6:01 ` liuhongt at gcc dot gnu.org
2024-05-30  6:33 ` liuhongt at gcc dot gnu.org
2024-05-30 18:38 ` pcordes at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).