From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 08191386183F; Wed, 8 Sep 2021 09:26:34 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 08191386183F From: "cvs-commit at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug target/102224] [9/10/11/12 regession] wrong code for `x * copysign(1.0, x)` since r9-5298-g33142cf9cf82aa1f Date: Wed, 08 Sep 2021 09:26:34 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: target X-Bugzilla-Version: 12.0 X-Bugzilla-Keywords: wrong-code X-Bugzilla-Severity: normal X-Bugzilla-Who: cvs-commit at gcc dot gnu.org X-Bugzilla-Status: ASSIGNED X-Bugzilla-Resolution: X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: jakub at gcc dot gnu.org X-Bugzilla-Target-Milestone: 9.5 X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: gcc-bugs@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-bugs mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Sep 2021 09:26:35 -0000 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D102224 --- Comment #13 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:a7b626d98a9a821ffb33466818d6aa86cac1d6fd commit r12-3413-ga7b626d98a9a821ffb33466818d6aa86cac1d6fd Author: Jakub Jelinek Date: Wed Sep 8 11:25:31 2021 +0200 i386: Fix up @xorsign3_1 [PR102224] As the testcase shows, we miscompile @xorsign3_1 if both input operands are in the same register, because the splitter overwrites op1 before with op1 & mask before using op0. For dest =3D xorsign op0, op0 we can actually simplify it from dest =3D (op0 & mask) ^ op0 to dest =3D op0 & ~mask (aka abs). The expander change is an optimization improvement, if we at expansion time know it is xorsign op0, op0, we can emit abs right away and get be= tter code through that. The @xorsign3_1 is a fix for the case where xorsign wouldn't be k= nown to have same operands during expansion, but during RTL optimizations th= ey would appear. For non-AVX we need to use earlyclobber, we require dest and op1 to be the same but op0 must be different because we overwr= ite op1 first. For AVX the constraints ensure that at most 2 of the 3 oper= ands may be the same register and if both inputs are the same, handles that case. This case can be easily tested with the xorsign3 expander change reverted. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Thinking about it more this morning, while this patch fixes the problems revealed in the testcase, the recent PR89984 change was buggy too, but perhaps that can be fixed incrementally. Because for AVX the new code destructively modifies op1. If that is different from dest, say on: float foo (float x, float y) { return x * __builtin_copysignf (1.0f, y) + y; } then we get after RA: (insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82]) (unspec:SF [ (reg:SF 20 xmm0 [88]) (reg:SF 21 xmm1 [89]) (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [= 0=20 S16 A128]) ] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1} (nil)) (insn 9 8 15 2 (set (reg:SF 20 xmm0 [87]) (plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82]) (reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm} (nil)) but split the xorsign into: vandps .LC0(%rip), %xmm1, %xmm1 vxorps %xmm0, %xmm1, %xmm0 and then the addition: vaddss %xmm1, %xmm0, %xmm0 which means we miscompile it - instead of adding y in the end we add __builtin_copysignf (0.0f, y). So, wonder if we don't want instead in addition to the &Yv <- Yv, 0 alternative (enabled for both pre-AVX and AVX as in this patch) the &Yv <- Yv, Yv where destination must be different from inputs and anoth= er Yv <- Yv, Yv where it can be the same but then need a match_scratch (with X for the other alternatives and =3DYv for the last one). That way we'd always have a safe register we can store the op1 & mask value into, either the destination (in the first alternative known to be equal to op1 which is needed for non-AVX but ok for AVX too), in the second alternative known to be different from both inputs and in the th= ird which could be used for those float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y)= ; } cases where op1 is naturally xmm1 and dest =3D=3D op0 naturally xmm0 we= 'd use some other register like xmm2. 2021-09-08 Jakub Jelinek PR target/102224 * config/i386/i386.md (xorsign3): If operands[1] is equal= to operands[2], emit abs2 instead. (@xorsign3_1): Add early-clobbers for output operand, ena= ble first alternative even for avx, add another alternative with =3D&Yv <- 0, Yv, Yvm constraints. * config/i386/i386-expand.c (ix86_split_xorsign): If op0 is equ= al to op1, emit vpandn instead. * gcc.dg/pr102224.c: New test. * gcc.target/i386/avx-pr102224.c: New test.=