From mboxrd@z Thu Jan 1 00:00:00 1970
From: "jan.wassenberg at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/115115] [12/13/14/15 Regression] highway-1.0.7 wrong _mm_cvttps_epi32() constant fold
Date: Fri, 17 May 2024 09:22:36 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: jan.wassenberg at gmail dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 12.4
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Content-Type: text/plain; charset="UTF-8"
Auto-Submitted: auto-generated
MIME-Version: 1.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115115

--- Comment #9 from Jan Wassenberg ---
On second thought, we are actually trying to convert out-of-bounds values to
the closest representable. We use the documented behavior of the instruction,
as mentioned in #5, and then correct the result afterwards.

Per the comment in the code below, it seems GCC since v11 or even 10 has been
assuming this is UB, and optimizing out our fix.

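To make the "convert, then correct" idea concrete, here is a minimal scalar
sketch of one 32-bit lane. It is not the Highway implementation: ModelCvtt and
FixOverflow are hypothetical names, and ModelCvtt only models the documented
instruction result (80..00 for out-of-range or NaN inputs) instead of calling
the intrinsic, so the sketch does not depend on how a compiler folds it.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of one lane of the truncating float->int32
// conversion: in-range inputs are truncated toward zero, everything else
// (too large, too small, NaN) yields the bit pattern 80..00.
inline int32_t ModelCvtt(float f) {
  if (!(f >= -2147483648.0f && f < 2147483648.0f)) return INT32_MIN;
  return static_cast<int32_t>(f);
}

// Hypothetical scalar counterpart of the post-conversion fix: if the input
// was positive but the converted result is negative, flip all bits so that
// 80..00 becomes 7F..FF.
inline int32_t FixOverflow(float original, int32_t converted) {
  uint32_t orig_bits;
  std::memcpy(&orig_bits, &original, sizeof(orig_bits));
  const uint32_t conv_bits = static_cast<uint32_t>(converted);
  // Sign bit is set only for "input sign 0, output sign 1".
  const uint32_t sign_wrong = ~orig_bits & conv_bits;
  // Broadcast the sign bit to all 32 bits.
  const uint32_t mask = (sign_wrong >> 31) ? 0xFFFFFFFFu : 0u;
  return static_cast<int32_t>(conv_bits ^ mask);
}
```

With this model, FixOverflow(3e9f, ModelCvtt(3e9f)) gives 7F..FF, while
negative overflow and in-range values pass through unchanged.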
I do believe this is compiler misbehavior, rooted in treating the operation as
if it were scalar code. But vector instructions are more modern and have
tighter specs; for example, integers are 2's complement and wraparound for
addition is well-defined in the actual instructions.

Given that at least GCC's constant folding has unexpected results, we will have
to find a workaround. I had previously worried that a floating-point
min(input, (2^63)-1) is not exact. But comparing the float >= 2^63 and if so
returning (2^63)-1 would work, right? The CPU will anyway truncate the float to
int.

Our current fixup code:

// For ConvertTo float->int of same size, clamping before conversion would
// change the result because the max integer value is not exactly representable.
// Instead detect the overflow result after conversion and fix it.
// Generic for all vector lengths.
template <class DI>
HWY_INLINE VFromD<DI> FixConversionOverflow(
    DI di, VFromD<RebindToFloat<DI>> original, VFromD<DI> converted) {
  // Combinations of original and output sign:
  //   --: normal <0 or -huge_val to 80..00: OK
  //   -+: -0 to 0                         : OK
  //   +-: +huge_val to 80..00             : xor with FF..FF to get 7F..FF
  //   ++: normal >0                       : OK
  const VFromD<DI> sign_wrong = AndNot(BitCast(di, original), converted);
#if HWY_COMPILER_GCC_ACTUAL
  // Critical GCC 11 compiler bug (possibly also GCC 10): omits the Xor; also
  // Add() if using that instead. Work around with one more instruction.
  const RebindToUnsigned<DI> du;
  const VFromD<DI> mask = BroadcastSignBit(sign_wrong);
  const VFromD<DI> max = BitCast(di, ShiftRight<1>(BitCast(du, mask)));
  return IfVecThenElse(mask, max, converted);
#else
  return Xor(converted, BroadcastSignBit(sign_wrong));
#endif
}
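
For reference, the compare-before-convert idea asked about above could look
roughly like this in scalar form for the 64-bit case. This is only a sketch
under assumptions: SaturatingToInt64 is a hypothetical name, and mapping NaN to
the minimum value is my choice here, not something the instruction spec or
Highway promises. It relies on 2^63 being exactly representable as a double,
whereas (2^63)-1 is not and rounds up to 2^63.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical scalar sketch: saturating double -> int64 conversion that
// checks the range before converting, so the cast itself never sees an
// out-of-range value.
inline int64_t SaturatingToInt64(double v) {
  // 2^63 is a power of two and therefore exact in double precision.
  constexpr double kTwo63 = 9223372036854775808.0;
  if (v >= kTwo63) return std::numeric_limits<int64_t>::max();
  // Negated comparison also catches NaN (assumption: NaN maps to min).
  if (!(v >= -kTwo63)) return std::numeric_limits<int64_t>::min();
  // Now -2^63 <= v < 2^63, so truncation toward zero is well-defined.
  return static_cast<int64_t>(v);
}
```

The comparison is exact because every double is either >= 2^63 or at most
2^63 - 1024, so no in-range input is misclassified.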