[Bug target/99754] New: [sse2] new _mm_loadu_si16 and _mm_loadu

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug target/99754] New: [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly
@ 2021-03-24 18:07 evan@coeus-group.com
  2021-03-25  6:52 ` [Bug target/99754] " crazylht at gmail dot com
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: evan@coeus-group.com @ 2021-03-24 18:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754

            Bug ID: 99754
           Summary: [sse2] new _mm_loadu_si16 and _mm_loadu_si32
                    implemented incorrectly
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: evan@coeus-group.com
  Target Milestone: ---

Created attachment 50470
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50470&action=edit
Trivial patch

_mm_loadu_si16 and _mm_loadu_si32 were implemented in GCC 11, but incorrectly. 
The value pointed to by the argument is supposed to go in the first element,
but _mm_set_epi16 / _mm_set_epi32 reverse the argument order so in GCC they go
in the *last* elemement.

The most straightforward solution would be to change the _mm_set_* calls so the
input is used for the last argument instead of the first (patch attached).

FWIW, here is LLVM's implementation:
<https://github.com/llvm/llvm-project/blob/a76d0207d5f94af698525d7dc1f0953ed35901a6/clang/lib/Headers/emmintrin.h#L1670-L1710>.
I've verified that LLVM's implementation matches ICC's.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/99754] [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly
  2021-03-24 18:07 [Bug target/99754] New: [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly evan@coeus-group.com
@ 2021-03-25  6:52 ` crazylht at gmail dot com
  2022-03-11 15:15 ` peter at cordes dot ca
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2021-03-25  6:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754

--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
Yes, __mm_set_epi32 will reverse order of parameters, Could you send out a
patch for review?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/99754] [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly
  2021-03-24 18:07 [Bug target/99754] New: [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly evan@coeus-group.com
  2021-03-25  6:52 ` [Bug target/99754] " crazylht at gmail dot com
@ 2022-03-11 15:15 ` peter at cordes dot ca
  2022-03-11 15:21 ` peter at cordes dot ca
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: peter at cordes dot ca @ 2022-03-11 15:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754

Peter Cordes <peter at cordes dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #2 from Peter Cordes <peter at cordes dot ca> ---
Can we get this patch applied soon?  There aren't any other
strict-aliasing-safe movd load intrinsics, but this one won't be portably
usable while there are buggy GCC versions around.

Until then, code should probably use something like


inline __m128i movd(void *p){
    return _mm_castps_si128(_mm_load_ss((const float*)p));
}

(Which believe it or not is strict-aliasing safe even on integer data.  At
least it should be; last I tested it was across compilers, except maybe on ICC.
 Would have to double-check there.)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/99754] [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly
  2021-03-24 18:07 [Bug target/99754] New: [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly evan@coeus-group.com
  2021-03-25  6:52 ` [Bug target/99754] " crazylht at gmail dot com
  2022-03-11 15:15 ` peter at cordes dot ca
@ 2022-03-11 15:21 ` peter at cordes dot ca
  2022-03-11 15:53 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: peter at cordes dot ca @ 2022-03-11 15:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754

--- Comment #3 from Peter Cordes <peter at cordes dot ca> ---
Wait a minute, the current implementation of _mm_loadu_si32 isn't
strict-aliasing or alignment safe!!!   That defeats the purpose for its
existence as something to use instead of _mm_cvtsi32_si128( *(int*)p );

The current code contains a deref of a plain (int*).
It should be using something like

typdef int unaligned_aliasing_int __attribute__((aligned(1),may_alias));

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/99754] [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly
  2021-03-24 18:07 [Bug target/99754] New: [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly evan@coeus-group.com
                   ` (2 preceding siblings ...)
  2022-03-11 15:21 ` peter at cordes dot ca
@ 2022-03-11 15:53 ` jakub at gcc dot gnu.org
  2022-03-14  9:45 ` cvs-commit at gcc dot gnu.org
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-03-11 15:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-03-11
           Assignee|unassigned at gcc dot gnu.org      |jakub at gcc dot gnu.org
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |ASSIGNED

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 52609
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52609&action=edit
gcc12-pr99754.patch

Full untested patch.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/99754] [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly
  2021-03-24 18:07 [Bug target/99754] New: [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly evan@coeus-group.com
                   ` (3 preceding siblings ...)
  2022-03-11 15:53 ` jakub at gcc dot gnu.org
@ 2022-03-14  9:45 ` cvs-commit at gcc dot gnu.org
  2022-03-26 23:03 ` peter at cordes dot ca
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-03-14  9:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:a010954cc190e942994ed5d56fba503851b20b9a

commit r12-7640-ga010954cc190e942994ed5d56fba503851b20b9a
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Mon Mar 14 10:44:38 2022 +0100

    i386: Fix up _mm_loadu_si{16,32} [PR99754]

    These intrinsics are supposed to do an unaligned may_alias load
    of a 16-bit or 32-bit value and store it as the first element of
    a 128-bit integer vector, with all other elements cleared.

    The current _mm_storeu_* implementation implements that correctly, uses
    __*_u types to do the store and extracts the first element of a vector into
    it.
    But _mm_loadu_si{16,32} gets it all wrong.  It performs an aligned
    non-may_alias load and because _mm_set_epi{16,32} has the args reversed,
    it also inserts it into the last vector element instead of first.

    The following patch fixes that.

    Note, while the Intrinsics guide for _mm_loadu_si32 says SSE2,
    for _mm_loadu_si16 it says strangely SSE.  But the intrinsics
    returns __m128i, which is only defined in emmintrin.h, and
    _mm_set_epi16 is also only SSE2 and later in emmintrin.h.
    Even clang defines it in emmintrin.h and ends up with inlining
    failure when calling _mm_loadu_si16 from sse,no-sse2 function.
    So, isn't that a bug in the intrinsic guide instead?

    2022-03-14  Jakub Jelinek  <jakub@redhat.com>

            PR target/99754
            * config/i386/emmintrin.h (_mm_loadu_si32): Put loaded value into
            first   rather than last element of the vector, use __m32_u to do
            a really unaligned load, use just 0 instead of (int)0.
            (_mm_loadu_si16): Put loaded value into first rather than last
            element of the vector, use __m16_u to do a really unaligned load,
            use just 0 instead of (short)0.

            * gcc.target/i386/pr99754-1.c: New test.
            * gcc.target/i386/pr99754-2.c: New test.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/99754] [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly
  2021-03-24 18:07 [Bug target/99754] New: [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly evan@coeus-group.com
                   ` (4 preceding siblings ...)
  2022-03-14  9:45 ` cvs-commit at gcc dot gnu.org
@ 2022-03-26 23:03 ` peter at cordes dot ca
  2022-03-28  1:44 ` crazylht at gmail dot com
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: peter at cordes dot ca @ 2022-03-26 23:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754

--- Comment #6 from Peter Cordes <peter at cordes dot ca> ---
Looks good to me, thanks for taking care of this quickly, hopefully we can get
this backported to the GCC11 series to limit the damage for people using these
newish intrinsics.  I'd love to recommend them for general use, except for this
GCC problem where some distros have already shipped GCC versions that compile
without error but in a 100% broken way.

Portable ways to do narrow alignment/aliasing-safe SIMD loads were sorely
lacking; there aren't good effective workarounds for this, especially for
16-bit loads.  (I still don't know how to portably / safely write code that
will compile to a memory-source PMOVZXBQ across all compilers; Intel's
intrinsics API is rather lacking in some areas and relies on compilers folding
loads into memory source operands.)

> So, isn't that a bug in the intrinsic guide instead?

Yes, __m128i _mm_loadu_si16 only really makes sense with SSE2 for PINSRW.  Even
movzx into an integer reg and then MOVD xmm, eax requires SSE2.  With only SSE1
you'd have to movzx / dword store to stack / MOVSS reload.

SSE1 makes *some* sense for _mm_loadu_si32 since it can be implemented with a
single MOVSS if MOVD isn't available.

But we already have SSE1 __m128 _mm_load_ss(const float *) for that.

Except GCC's implementation of _mm_load_ss isn't alignment and strict-aliasing
safe; it derefs the actual float *__P as _mm_set_ss (*__P).  Which I think is a
bug, although I'm not clear what semantics Intel intended for that intrinsic. 
Clang implements it as alignment/aliasing safe with a packed may_alias struct
containing a float.  MSVC always behaves like -fno-strict-aliasing, and I
*think* ICC does, too.

Perhaps best to follow the crowd and make all narrow load/store intrinsics
alignment and aliasing safe, unless that causes code-gen regressions; users can
_mm_set_ss( *ptr ) themselves if they want that to tell the compiler that's its
a normal C float object.

Was going to report this, but PR84508 is still open and already covers the
relevant ss and sd intrinsics.  That points out that Intel specifically
documents it as not requiring alignment, not mentioning aliasing.

----

Speaking of bouncing through a GP-integer reg, GCC unfortunately does that; it
seems to incorrectly think PINSRW xmm, mem, 0 requires -msse4.1, unlike with a
GP register source.  Reported as PR105066 along with related missed
optimizations about folding into a memory source operand for pmovzx/sx.

But that's unrelated to correctness; this bug can be closed unless we're
keeping it open until it's fixed in the GCC11 current stable series.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/99754] [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly
  2021-03-24 18:07 [Bug target/99754] New: [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly evan@coeus-group.com
                   ` (5 preceding siblings ...)
  2022-03-26 23:03 ` peter at cordes dot ca
@ 2022-03-28  1:44 ` crazylht at gmail dot com
  2022-03-28  2:33 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2022-03-28  1:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754

--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---

> 
> But that's unrelated to correctness; this bug can be closed unless we're
> keeping it open until it's fixed in the GCC11 current stable series.

Let me do the backporting.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/99754] [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly
  2021-03-24 18:07 [Bug target/99754] New: [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly evan@coeus-group.com
                   ` (6 preceding siblings ...)
  2022-03-28  1:44 ` crazylht at gmail dot com
@ 2022-03-28  2:33 ` cvs-commit at gcc dot gnu.org
  2022-03-28  2:35 ` crazylht at gmail dot com
  2022-05-07  7:26 ` jakub at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-03-28  2:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-11 branch has been updated by hongtao Liu
<liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:85568e505c3b06708ec0fb21d1ab4f78e0c66896

commit r11-9699-g85568e505c3b06708ec0fb21d1ab4f78e0c66896
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Mon Mar 14 10:44:38 2022 +0100

    i386: Fix up _mm_loadu_si{16,32} [PR99754]

    These intrinsics are supposed to do an unaligned may_alias load
    of a 16-bit or 32-bit value and store it as the first element of
    a 128-bit integer vector, with all other elements cleared.

    The current _mm_storeu_* implementation implements that correctly, uses
    __*_u types to do the store and extracts the first element of a vector into
    it.
    But _mm_loadu_si{16,32} gets it all wrong.  It performs an aligned
    non-may_alias load and because _mm_set_epi{16,32} has the args reversed,
    it also inserts it into the last vector element instead of first.

    The following patch fixes that.

    Note, while the Intrinsics guide for _mm_loadu_si32 says SSE2,
    for _mm_loadu_si16 it says strangely SSE.  But the intrinsics
    returns __m128i, which is only defined in emmintrin.h, and
    _mm_set_epi16 is also only SSE2 and later in emmintrin.h.
    Even clang defines it in emmintrin.h and ends up with inlining
    failure when calling _mm_loadu_si16 from sse,no-sse2 function.
    So, isn't that a bug in the intrinsic guide instead?

    2022-03-14  Jakub Jelinek  <jakub@redhat.com>

            PR target/99754
            * config/i386/emmintrin.h (_mm_loadu_si32): Put loaded value into
            first   rather than last element of the vector, use __m32_u to do
            a really unaligned load, use just 0 instead of (int)0.
            (_mm_loadu_si16): Put loaded value into first rather than last
            element of the vector, use __m16_u to do a really unaligned load,
            use just 0 instead of (short)0.

            * gcc.target/i386/pr99754-1.c: New test.
            * gcc.target/i386/pr99754-2.c: New test.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/99754] [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly
  2021-03-24 18:07 [Bug target/99754] New: [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly evan@coeus-group.com
                   ` (7 preceding siblings ...)
  2022-03-28  2:33 ` cvs-commit at gcc dot gnu.org
@ 2022-03-28  2:35 ` crazylht at gmail dot com
  2022-05-07  7:26 ` jakub at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2022-03-28  2:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754

--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #7)
> > 
> > But that's unrelated to correctness; this bug can be closed unless we're
> > keeping it open until it's fixed in the GCC11 current stable series.
> 
> Let me do the backporting.

Fixed in GCC 11.3

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/99754] [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly
  2021-03-24 18:07 [Bug target/99754] New: [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly evan@coeus-group.com
                   ` (8 preceding siblings ...)
  2022-03-28  2:35 ` crazylht at gmail dot com
@ 2022-05-07  7:26 ` jakub at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-05-07  7:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-05-07  7:26 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-24 18:07 [Bug target/99754] New: [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly evan@coeus-group.com
2021-03-25  6:52 ` [Bug target/99754] " crazylht at gmail dot com
2022-03-11 15:15 ` peter at cordes dot ca
2022-03-11 15:21 ` peter at cordes dot ca
2022-03-11 15:53 ` jakub at gcc dot gnu.org
2022-03-14  9:45 ` cvs-commit at gcc dot gnu.org
2022-03-26 23:03 ` peter at cordes dot ca
2022-03-28  1:44 ` crazylht at gmail dot com
2022-03-28  2:33 ` cvs-commit at gcc dot gnu.org
2022-03-28  2:35 ` crazylht at gmail dot com
2022-05-07  7:26 ` jakub at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).