From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 9CFFC3858C53; Sat, 26 Mar 2022 23:03:58 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9CFFC3858C53
From: "peter at cordes dot ca" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/99754] [sse2] new _mm_loadu_si16 and _mm_loadu_si32
 implemented incorrectly
Date: Sat, 26 Mar 2022 23:03:58 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: peter at cordes dot ca
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: jakub at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-99754-4-02F3G3sL4T@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-99754-4@http.gcc.gnu.org/bugzilla/>
References: <bug-99754-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754
--- Comment #6 from Peter Cordes <peter at cordes dot ca> ---
Looks good to me; thanks for taking care of this quickly.  Hopefully we can get
this backported to the GCC11 series to limit the damage for people using these
newish intrinsics.  I'd love to recommend them for general use, except for this
GCC problem where some distros have already shipped GCC versions that compile
without error but in a 100% broken way.

Portable ways to do narrow alignment/aliasing-safe SIMD loads were sorely
lacking; there aren't good workarounds for this, especially for 16-bit loads.
(I still don't know how to portably / safely write code that will compile to a
memory-source PMOVZXBQ across all compilers; Intel's intrinsics API is rather
lacking in some areas and relies on compilers folding loads into memory source
operands.)
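
For illustration, one commonly used fallback when the new intrinsics can't be
trusted is to memcpy into a scalar and move that into a vector register.  This
is only a sketch: the helper names load_u16_zext and load_u32_zext are
hypothetical (not part of any header), and it goes through an integer register
rather than guaranteeing a single memory-source instruction.

```c
#include <emmintrin.h>  /* SSE2: _mm_cvtsi32_si128 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, not part of any intrinsics header. */

/* Load 16 bits from any address, zero-extended into the whole vector. */
static inline __m128i load_u16_zext(const void *p)
{
    uint16_t tmp;
    memcpy(&tmp, p, sizeof tmp);     /* no alignment or aliasing assumptions */
    return _mm_cvtsi32_si128(tmp);   /* low dword = tmp, upper bits zeroed */
}

/* Load 32 bits from any address, zero-extended into the whole vector. */
static inline __m128i load_u32_zext(const void *p)
{
    uint32_t tmp;
    memcpy(&tmp, p, sizeof tmp);
    /* The cast is implementation-defined for values above INT_MAX;
       GCC and Clang wrap modulo 2^32, which is what is wanted here. */
    return _mm_cvtsi32_si128((int)tmp);
}
```

Both helpers accept a misaligned pointer into a byte buffer, which is the case
the narrow-load intrinsics are meant to cover.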


> So, isn't that a bug in the intrinsic guide instead?

Yes, __m128i _mm_loadu_si16 only really makes sense with SSE2 for PINSRW.  Even
movzx into an integer reg and then MOVD xmm, eax requires SSE2.  With only SSE1
you'd have to movzx / dword store to stack / MOVSS reload.

SSE1 makes *some* sense for _mm_loadu_si32 since it can be implemented with a
single MOVSS if MOVD isn't available.

But we already have SSE1 __m128 _mm_load_ss(const float *) for that.

Except GCC's implementation of _mm_load_ss isn't alignment and strict-aliasing
safe; it derefs the actual float *__P as _mm_set_ss (*__P), which I think is a
bug, although I'm not clear what semantics Intel intended for that intrinsic.
Clang implements it as alignment/aliasing safe with a packed may_alias struct
containing a float.  MSVC always behaves like -fno-strict-aliasing, and I
*think* ICC does, too.
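
A minimal sketch of the packed may_alias struct approach, assuming a compiler
that supports the GNU attributes (GCC or Clang).  The names unaligned_float and
load_ss_safe are hypothetical; this illustrates the technique and is not a copy
of either compiler's header.

```c
#include <xmmintrin.h>  /* SSE1: _mm_set_ss */

/* packed drops the alignment requirement to 1 byte;
   may_alias exempts accesses through this type from strict aliasing. */
struct unaligned_float {
    float f;
} __attribute__((__packed__, __may_alias__));

/* Hypothetical helper: load one float from any address into element 0,
   zeroing the upper three elements. */
static inline __m128 load_ss_safe(const void *p)
{
    float f = ((const struct unaligned_float *)p)->f;
    return _mm_set_ss(f);
}
```

With this shape the pointer can refer to bytes inside a char buffer or an
object of another type, which a plain float dereference does not allow.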

Perhaps best to follow the crowd and make all narrow load/store intrinsics
alignment and aliasing safe, unless that causes code-gen regressions; users can
_mm_set_ss( *ptr ) themselves if they want to tell the compiler that it's a
normal C float object.

Was going to report this, but PR84508 is still open and already covers the
relevant ss and sd intrinsics.  That PR points out that Intel specifically
documents it as not requiring alignment, without mentioning aliasing.

----

Speaking of bouncing through a GP-integer reg, GCC unfortunately does that; it
seems to incorrectly think PINSRW xmm, mem, 0 requires -msse4.1, unlike with a
GP register source.  Reported as PR105066 along with related missed
optimizations about folding into a memory source operand for pmovzx/sx.
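
For reference, a sketch of source that expresses the PINSRW form using only
SSE2 intrinsics.  The helper name load_u16_pinsrw is hypothetical, and whether
a given compiler folds the load into a memory-source PINSRW or goes through a
GP register is entirely up to its code generation.

```c
#include <emmintrin.h>  /* SSE2: _mm_insert_epi16, _mm_setzero_si128 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: insert a 16-bit value loaded from any address
   into element 0 of an all-zero vector. */
static inline __m128i load_u16_pinsrw(const void *p)
{
    uint16_t tmp;
    memcpy(&tmp, p, sizeof tmp);   /* alignment- and aliasing-safe load */
    return _mm_insert_epi16(_mm_setzero_si128(), tmp, 0);
}
```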

But that's unrelated to correctness; this bug can be closed unless we're
keeping it open until it's fixed in the GCC11 current stable series.