From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id F0711385F032; Fri, 17 May 2024 09:22:37 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org F0711385F032
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1715937757;
	bh=CMrg2wJJX7CGzl46FZ7I5ulh+69PTmQN+EbjFlxFOFM=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=MJh3z+Pjp0CEkkC1bmfX1+hqm8uGjjUkK/zD2kwwuZ2Frye30A4DCGaUiIDvOzBlE
	 s5DYU7SCkLEkTK6k2HE4ZAqn1lq7wM4hgAQyfa7N9/8rlEf4PNrtDVA4TcP8PY5lc3
	 roRyVH8xPno/axEGEtytsRqHfAnaSfiVu1KHk7XM=
From: "jan.wassenberg at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/115115] [12/13/14/15 Regression] highway-1.0.7 wrong
 _mm_cvttps_epi32() constant fold
Date: Fri, 17 May 2024 09:22:36 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: jan.wassenberg at gmail dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 12.4
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-115115-4-iKwExmkqse@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-115115-4@http.gcc.gnu.org/bugzilla/>
References: <bug-115115-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115115
--- Comment #9 from Jan Wassenberg <jan.wassenberg at gmail dot com> ---
On second thought, we are actually trying to convert out-of-bounds values to
the closest representable value. We rely on the documented behavior of the
instruction, as mentioned in comment #5, and then correct the result
afterwards.

Per the comment in the code below, it seems GCC since v11, or possibly even
v10, has been assuming this is UB and optimizing out our fix.

I do believe this is compiler misbehavior, rooted in treating the operation as
if it were scalar code. But vector instructions are more modern and have
tighter specs; for example, integers are 2's complement and wraparound for
addition is well-defined in the actual instructions.

Given that at least GCC's constant folding has unexpected results, we will have
to find a workaround. I had previously worried that a floating-point
min(input, (2^63)-1) is not exact. But comparing the float >= 2^63 and if so
returning (2^63)-1 would work, right? The CPU will anyway truncate the float to
int.
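
For illustration, here is a minimal scalar sketch of that compare-then-select
idea. It is not Highway code: SaturatingTruncate is a hypothetical name, it
uses double -> int64_t because the text above mentions 2^63, and the NaN result
is an assumption chosen to mirror the "integer indefinite" value the x86
conversion instructions produce. The point is that 2^63 is exactly
representable as a floating-point value whereas (2^63)-1 is not, so the
comparison against 2^63 is exact.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of Highway): truncates toward zero and
// saturates out-of-range inputs instead of relying on overflow behavior.
inline int64_t SaturatingTruncate(double v) {
  // 2^63 is a power of two, hence exactly representable as a double.
  constexpr double kTwoPow63 = 9223372036854775808.0;
  // Assumption: map NaN to the minimum value, like the hardware instruction.
  if (std::isnan(v)) return std::numeric_limits<int64_t>::min();
  // Anything >= 2^63 does not fit; return (2^63)-1.
  if (v >= kTwoPow63) return std::numeric_limits<int64_t>::max();
  // -2^63 itself fits, so only values strictly below it are clamped.
  if (v < -kTwoPow63) return std::numeric_limits<int64_t>::min();
  // In range: the cast is well-defined and truncates toward zero.
  return static_cast<int64_t>(v);
}
```

Because every out-of-range input is handled before the cast, the compiler has
no undefined behavior to exploit in this scalar form; whether the vector
equivalent is as cheap as the post-conversion fixup is a separate question.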

Our current fixup code:

// For ConvertTo float->int of same size, clamping before conversion would
// change the result because the max integer value is not exactly
// representable.
// Instead detect the overflow result after conversion and fix it.
// Generic for all vector lengths.
template <class DI>
HWY_INLINE VFromD<DI> FixConversionOverflow(DI di,
                                            VFromD<RebindToFloat<DI>> original,
                                            VFromD<DI> converted) {
  // Combinations of original and output sign:
  //   --: normal <0 or -huge_val to 80..00: OK
  //   -+: -0 to 0                         : OK
  //   +-: +huge_val to 80..00             : xor with FF..FF to get 7F..FF
  //   ++: normal >0                       : OK
  const VFromD<DI> sign_wrong = AndNot(BitCast(di, original), converted);
#if HWY_COMPILER_GCC_ACTUAL
  // Critical GCC 11 compiler bug (possibly also GCC 10): omits the Xor; also
  // Add() if using that instead. Work around with one more instruction.
  const RebindToUnsigned<DI> du;
  const VFromD<DI> mask = BroadcastSignBit(sign_wrong);
  const VFromD<DI> max = BitCast(di, ShiftRight<1>(BitCast(du, mask)));
  return IfVecThenElse(mask, max, converted);
#else
  return Xor(converted, BroadcastSignBit(sign_wrong));
#endif
}
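
To make the sign table in the comments concrete, here is a scalar model of the
non-GCC branch for a single 32-bit lane. FixLane is a hypothetical name used
only for this sketch; it assumes AndNot(a, b) computes ~a & b and that
BroadcastSignBit yields all-ones when the sign bit is set, and it takes the
already-converted value as an input rather than performing the conversion.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of one lane of FixConversionOverflow.
inline int32_t FixLane(float original, int32_t converted) {
  // BitCast(di, original): reinterpret the float's bits as an integer.
  uint32_t orig_bits;
  std::memcpy(&orig_bits, &original, sizeof(orig_bits));
  const uint32_t conv_bits = static_cast<uint32_t>(converted);
  // AndNot: sign bit is set only for the "+-" case, i.e. the original is
  // positive but the converted result is negative (80..00).
  const uint32_t sign_wrong = ~orig_bits & conv_bits;
  // BroadcastSignBit: all-ones if the sign bit is set, else zero.
  const uint32_t mask = (sign_wrong >> 31) ? 0xFFFFFFFFu : 0u;
  // Xor with FF..FF turns 80..00 into 7F..FF; Xor with zero is a no-op.
  return static_cast<int32_t>(conv_bits ^ mask);
}
```

For example, FixLane(3e9f, INT32_MIN) models a positive overflow and yields
INT32_MAX, whereas FixLane(-3e9f, INT32_MIN) leaves INT32_MIN unchanged.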