From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 728D0385803B; Wed, 8 Sep 2021 12:09:25 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 728D0385803B From: "cvs-commit at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug target/89984] Extra register move Date: Wed, 08 Sep 2021 12:09:25 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: target X-Bugzilla-Version: 9.0 X-Bugzilla-Keywords: easyhack, missed-optimization, ra X-Bugzilla-Severity: enhancement X-Bugzilla-Who: cvs-commit at gcc dot gnu.org X-Bugzilla-Status: REOPENED X-Bugzilla-Resolution: X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org X-Bugzilla-Target-Milestone: 12.0 X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: gcc-bugs@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-bugs mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Sep 2021 12:09:25 -0000 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D89984 --- Comment #6 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:7485a52551d71db2e8bbfc4c484196bcc321a1cd commit r12-3417-g7485a52551d71db2e8bbfc4c484196bcc321a1cd Author: Jakub Jelinek Date: Wed Sep 8 14:06:10 2021 +0200 i386: Fix up xorsign for AVX [PR89984] Thinking about it more this morning, while this patch fixes the problems revealed in the testcase, the recent PR89984 change was buggy too, but perhaps that can be fixed incrementally. Because for AVX the new code destructively modifies op1. If that is different from dest, say on: float foo (float x, float y) { return x * __builtin_copysignf (1.0f, y) + y; } then we get after RA: (insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82]) (unspec:SF [ (reg:SF 20 xmm0 [88]) (reg:SF 21 xmm1 [89]) (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [= 0=20 S16 A128]) ] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1} (nil)) (insn 9 8 15 2 (set (reg:SF 20 xmm0 [87]) (plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82]) (reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm} (nil)) but split the xorsign into: vandps .LC0(%rip), %xmm1, %xmm1 vxorps %xmm0, %xmm1, %xmm0 and then the addition: vaddss %xmm1, %xmm0, %xmm0 which means we miscompile it - instead of adding y in the end we add __builtin_copysignf (0.0f, y). So, wonder if we don't want instead in addition to the &Yv <- Yv, 0 alternative (enabled for both pre-AVX and AVX as in this patch) the &Yv <- Yv, Yv where destination must be different from inputs and anoth= er Yv <- Yv, Yv where it can be the same but then need a match_scratch (with X for the other alternatives and =3DYv for the last one). That way we'd always have a safe register we can store the op1 & mask value into, either the destination (in the first alternative known to be equal to op1 which is needed for non-AVX but ok for AVX too), in the second alternative known to be different from both inputs and in the th= ird which could be used for those float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y)= ; } cases where op1 is naturally xmm1 and dest =3D=3D op0 naturally xmm0 we= 'd use some other register like xmm2. On Wed, Sep 08, 2021 at 05:23:40PM +0800, Hongtao Liu wrote: > I'm curious why we need the post_reload splitter @xorsign3_1 > for scalar mode, can't we just expand them into and/xor operations in > the expander, just like vector modes did. Following seems to work for all the testcases I've tried (and in some generates better code than the post-reload splitter). 2021-09-08 Jakub Jelinek liuhongt PR target/89984 * config/i386/i386.md (@xorsign3_1): Remove. * config/i386/i386-expand.c (ix86_expand_xorsign): Expand right away into AND with mask and XOR, using paradoxical subregs. (ix86_split_xorsign): Remove. * config/i386/i386-protos.h (ix86_split_xorsign): Remove. * gcc.target/i386/avx-pr102224.c: Fix up PR number. * gcc.dg/pr89984.c: New test. * gcc.target/i386/avx-pr89984.c: New test.=