From: "peter at cordes dot ca"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/99754] [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly
Date: Sat, 26 Mar 2022 23:03:58 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Assigned-To: jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754

--- Comment #6 from Peter Cordes ---

Looks good to me, thanks for taking care of this quickly. Hopefully we can get this backported to the GCC 11 series to limit the damage for people using these newish intrinsics. I'd love to recommend them for general use, except for this GCC problem: some distros have already shipped GCC versions that compile them without error but in a 100% broken way.

Portable ways to do narrow alignment- and aliasing-safe SIMD loads were sorely lacking; there aren't good, effective workarounds for this, especially for 16-bit loads.
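A minimal sketch of the usual portable fallback, assuming x86-64 with SSE2 and a GCC- or Clang-style compiler: read the bytes with a fixed-size memcpy, which is alignment- and strict-aliasing-safe, then move the value into a vector with _mm_cvtsi32_si128 (MOVD). The helper names loadu_si32_portable and loadu_si16_portable are hypothetical, not part of Intel's API. This is correct, but whether a compiler emits the cheapest instruction sequence (rather than bouncing through a GP-integer register) is up to the optimizer, which is why it isn't a fully satisfying workaround, particularly for the 16-bit case.

```c
#include <emmintrin.h> /* SSE2: __m128i, _mm_cvtsi32_si128 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 32-bit load into the low dword, upper 12 bytes zeroed.
   memcpy into a local makes the access safe for any alignment and any
   underlying object type. */
static inline __m128i loadu_si32_portable(const void *p)
{
    int32_t v;
    memcpy(&v, p, sizeof v);     /* unaligned, aliasing-safe 4-byte read */
    return _mm_cvtsi32_si128(v); /* MOVD xmm, r32 (or MOVD xmm, m32 if folded) */
}

/* Hypothetical helper: 16-bit load, zero-extended into the low dword,
   upper 14 bytes zeroed. Typically MOVZX into a GP register, then MOVD. */
static inline __m128i loadu_si16_portable(const void *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);     /* unaligned, aliasing-safe 2-byte read */
    return _mm_cvtsi32_si128(v); /* v promotes to int: zero-extension */
}
```

Both helpers accept misaligned pointers into any buffer, such as a pointer into the middle of an unsigned char array.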
(I still don't know how to portably and safely write code that will compile to a memory-source PMOVZXBQ across all compilers; Intel's intrinsics API is rather lacking in some areas and relies on compilers folding loads into memory-source operands.)

> So, isn't that a bug in the intrinsic guide instead?

Yes. __m128i _mm_loadu_si16 only really makes sense with SSE2, for PINSRW. Even MOVZX into an integer register followed by MOVD xmm, eax requires SSE2. With only SSE1 you'd have to MOVZX, do a dword store to the stack, and reload it with MOVSS.

SSE1 makes *some* sense for _mm_loadu_si32, since it can be implemented with a single MOVSS if MOVD isn't available. But we already have the SSE1 intrinsic __m128 _mm_load_ss(const float *) for that.

Except that GCC's implementation of _mm_load_ss isn't alignment- and strict-aliasing-safe: it dereferences the actual float *__P, as _mm_set_ss (*__P). I think that is a bug, although I'm not clear on what semantics Intel intended for that intrinsic. Clang implements it as alignment- and aliasing-safe, using a packed may_alias struct containing a float. MSVC always behaves like -fno-strict-aliasing, and I *think* ICC does, too.

Perhaps it's best to follow the crowd and make all narrow load/store intrinsics alignment- and aliasing-safe, unless that causes code-gen regressions; users can write _mm_set_ss( *ptr ) themselves if they want to tell the compiler that it's a normal C float object.

I was going to report this, but PR84508 is still open and already covers the relevant ss and sd intrinsics. It points out that Intel specifically documents them as not requiring alignment, without mentioning aliasing.

----

Speaking of bouncing through a GP-integer register, GCC unfortunately does that: it seems to incorrectly think PINSRW xmm, mem, 0 requires -msse4.1, unlike the form with a GP register source. I reported that as PR105066, along with related missed optimizations about folding loads into a memory-source operand for pmovzx/sx.
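For the memory-source PMOVZXBQ case, here is a sketch under stated assumptions: x86-64, GCC or Clang, and SSE4.1 enabled only for this one function via the GNU target attribute. The helper name load_zx_u8x2_to_u64x2 is hypothetical. Because _mm_cvtepu8_epi64 takes an __m128i rather than a pointer, the 2-byte load has to be written as a separate step, and whether it gets folded into a single pmovzxbq xmm, word [mem] is left entirely to the compiler.

```c
#include <smmintrin.h> /* SSE4.1: _mm_cvtepu8_epi64 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero-extend 2 (possibly misaligned) bytes at p into
   two 64-bit lanes. The ideal code-gen is one memory-source PMOVZXBQ, but
   the intrinsics API can only express it as "load, then convert". */
static __attribute__((target("sse4.1")))
__m128i load_zx_u8x2_to_u64x2(const void *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);                         /* safe 2-byte read */
    return _mm_cvtepu8_epi64(_mm_cvtsi32_si128(v));  /* PMOVZXBQ */
}
```

Callers compiled without -msse4.1 should check for SSE4.1 at run time (for example with __builtin_cpu_supports("sse4.1")) before calling it.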
But PR105066 is unrelated to correctness; this bug can be closed unless we're keeping it open until the fix lands in the current stable GCC 11 series.
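The packed may_alias struct technique mentioned above for Clang's _mm_load_ss can be sketched as a standalone helper. This assumes a compiler that accepts the GNU C packed and may_alias type attributes (GCC and Clang do); load_ss_safe is a hypothetical name, and this is an illustration of the technique, not Clang's actual header code.

```c
#include <xmmintrin.h> /* SSE1: __m128, _mm_set_ss */

/* Hypothetical helper: load one float from p with no alignment requirement
   and without violating strict aliasing. */
static inline __m128 load_ss_safe(const void *p)
{
    /* packed: the struct's alignment drops to 1, so p may be misaligned.
       may_alias: accesses through this type may alias any object type. */
    struct float_u {
        float f;
    } __attribute__((packed, may_alias));

    float v = ((const struct float_u *)p)->f;
    return _mm_set_ss(v); /* lane 0 = v, lanes 1..3 = 0.0f */
}
```

When the pointer really does point to a C float object, _mm_set_ss(*ptr) remains the way to let the compiler assume normal alignment and aliasing rules.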