From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-vk1-xa2b.google.com (mail-vk1-xa2b.google.com [IPv6:2607:f8b0:4864:20::a2b]) by sourceware.org (Postfix) with ESMTPS id 794FB3858415 for ; Wed, 8 Sep 2021 09:18:04 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 794FB3858415 Received: by mail-vk1-xa2b.google.com with SMTP id b13so518309vkl.10 for ; Wed, 08 Sep 2021 02:18:04 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Arxlw9ZJ6nlhmMN0+9t9x0vI06vwpmDJERVcHa6g5jQ=; b=GZd6CtXdC7hc5ngaXbKXtPZvWaComVKQrO3TUzf0f0yVN813ATULdvHNfAnbTOd+S6 OEWPfTVF34P01GHy3vgzhMF+8coUiG8iY/rVAVgQbjNy9NFFmwErKcy4nioqey+UHYlK G2IWqXXKqGv+pGh3C4av2CR3HkojhjaiQnB+/AJjOACU/yn0Unp0C4UNMShMVMG8DeTN M/fXSlYyrk1WJn/wcEj2opMLY9eKdRBuOP/+A0ZltZJ7uUlS8EXbCOE0DMoWnplyaEBW dvyfPEgXb9jISQ2aKDfdOad2YCtaIKMLULzgYyqwXwO9SALwsxsnHp+WoCE0k/zIH75o 4cPg== X-Gm-Message-State: AOAM5329wWAph1Om4WeSl5NbyAnhURtRaJW14VbHtch50Ee3iCWYfqoC SkP1T5A/WTicl17mzQ2u881TOiNVjdiQtXcDLks= X-Google-Smtp-Source: ABdhPJy41RXpvyi2JTkeCck4MehaeB+ZA+AsZbsktoK+OAgwXJedLqlbNHYLPdC8p3S87FYuPrkXQVWr7SaVcz9oZNc= X-Received: by 2002:a1f:944d:: with SMTP id w74mr1329708vkd.9.1631092683878; Wed, 08 Sep 2021 02:18:03 -0700 (PDT) MIME-Version: 1.0 References: <20210908074250.GO920497@tucnak> In-Reply-To: <20210908074250.GO920497@tucnak> From: Hongtao Liu Date: Wed, 8 Sep 2021 17:23:40 +0800 Message-ID: Subject: Re: [PATCH] i386: Fix up @xorsign3_1 [PR102224] To: Jakub Jelinek Cc: Uros Bizjak , liuhongt , "H.J. Lu" , GCC Patches Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-3.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Sep 2021 09:18:08 -0000 On Wed, Sep 8, 2021 at 3:43 PM Jakub Jelinek via Gcc-patches wrote: > > Hi! > > As the testcase shows, we miscompile @xorsign3_1 if both input > operands are in the same register, because the splitter overwrites op1 > before with op1 & mask before using op0. > > For dest = xorsign op0, op0 we can actually simplify it from > dest = (op0 & mask) ^ op0 to dest = op0 & ~mask (aka abs). > > The expander change is an optimization improvement, if we at expansion > time know it is xorsign op0, op0, we can emit abs right away and get better > code through that. > > The @xorsign3_1 is a fix for the case where xorsign wouldn't be known > to have same operands during expansion, but during RTL optimizations they > would appear. For non-AVX we need to use earlyclobber, we require > dest and op1 to be the same but op0 must be different because we overwrite > op1 first. For AVX the constraints ensure that at most 2 of the 3 operands > may be the same register and if both inputs are the same, handles that case. > This case can be easily tested with the xorsign3 expander change > reverted. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > Patch LGTM. PS: I'm curious why we need the post_reload splitter @xorsign3_1 for scalar mode, can't we just expand them into and/xor operations in the expander, just like vector modes did. Let me do some experiments to see whether it is ok to remove the splitter. > Thinking about it more this morning, while this patch fixes the problems > revealed in the testcase, the recent PR89984 change was buggy too, but > perhaps that can be fixed incrementally. Because for AVX the new code > destructively modifies op1. If that is different from dest, say on: > float > foo (float x, float y) > { > return x * __builtin_copysignf (1.0f, y) + y; > } > then we get after RA: > (insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82]) > (unspec:SF [ > (reg:SF 20 xmm0 [88]) > (reg:SF 21 xmm1 [89]) > (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S16 A128]) > ] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1} > (nil)) > (insn 9 8 15 2 (set (reg:SF 20 xmm0 [87]) > (plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82]) > (reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm} > (nil)) > but split the xorsign into: > vandps .LC0(%rip), %xmm1, %xmm1 > vxorps %xmm0, %xmm1, %xmm0 > and then the addition: > vaddss %xmm1, %xmm0, %xmm0 > which means we miscompile it - instead of adding y in the end we add > __builtin_copysignf (0.0f, y). > So, wonder if we don't want instead in addition to the &Yv <- Yv, 0 > alternative (enabled for both pre-AVX and AVX as in this patch) the > &Yv <- Yv, Yv where destination must be different from inputs and another > Yv <- Yv, Yv where it can be the same but then need a match_scratch > (with X for the other alternatives and =Yv for the last one). > That way we'd always have a safe register we can store the op1 & mask > value into, either the destination (in the first alternative known to > be equal to op1 which is needed for non-AVX but ok for AVX too), in the > second alternative known to be different from both inputs and in the third > which could be used for those > float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y); } > cases where op1 is naturally xmm1 and dest == op0 naturally xmm0 we'd use > some other register like xmm2. > > 2021-09-07 Jakub Jelinek > > PR target/102224 > * config/i386/i386.md (xorsign3): If operands[1] is equal to > operands[2], emit abs2 instead. > (@xorsign3_1): Add early-clobbers for output operand, enable > first alternative even for avx, add another alternative with > =&Yv <- 0, Yv, Yvm constraints. > * config/i386/i386-expand.c (ix86_split_xorsign): If op0 is equal > to op1, emit vpandn instead. > > * gcc.dg/pr102224.c: New test. > * gcc.target/i386/avx-pr102224.c: New test. > > --- gcc/config/i386/i386.md.jj 2021-09-06 14:47:43.199049975 +0200 > +++ gcc/config/i386/i386.md 2021-09-07 13:13:50.825603852 +0200 > @@ -10803,21 +10803,27 @@ (define_expand "xorsign3" > (match_operand:MODEF 1 "register_operand") > (match_operand:MODEF 2 "register_operand")] > "SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH" > - "ix86_expand_xorsign (operands); DONE;") > +{ > + if (rtx_equal_p (operands[1], operands[2])) > + emit_insn (gen_abs2 (operands[0], operands[1])); > + else > + ix86_expand_xorsign (operands); > + DONE; > +}) > > (define_insn_and_split "@xorsign3_1" > - [(set (match_operand:MODEF 0 "register_operand" "=Yv,Yv") > + [(set (match_operand:MODEF 0 "register_operand" "=&Yv,&Yv,&Yv") > (unspec:MODEF > - [(match_operand:MODEF 1 "register_operand" "Yv,Yv") > - (match_operand:MODEF 2 "register_operand" "0,Yv") > - (match_operand: 3 "nonimmediate_operand" "Yvm,Yvm")] > + [(match_operand:MODEF 1 "register_operand" "Yv,0,Yv") > + (match_operand:MODEF 2 "register_operand" "0,Yv,Yv") > + (match_operand: 3 "nonimmediate_operand" "Yvm,Yvm,Yvm")] > UNSPEC_XORSIGN))] > "SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH" > "#" > "&& reload_completed" > [(const_int 0)] > "ix86_split_xorsign (operands); DONE;" > - [(set_attr "isa" "noavx,avx")]) > + [(set_attr "isa" "*,avx,avx")]) > > ;; One complement instructions > > --- gcc/config/i386/i386-expand.c.jj 2021-09-07 12:35:26.340685935 +0200 > +++ gcc/config/i386/i386-expand.c 2021-09-07 13:09:43.961032052 +0200 > @@ -2289,12 +2289,39 @@ ix86_split_xorsign (rtx operands[]) > mode = GET_MODE (dest); > vmode = GET_MODE (mask); > > - op1 = lowpart_subreg (vmode, op1, mode); > - x = gen_rtx_AND (vmode, op1, mask); > - emit_insn (gen_rtx_SET (op1, x)); > + /* The constraints ensure that for non-AVX dest == op1 is > + different from op0, and for AVX that at most two of > + dest, op0 and op1 are the same register but the third one > + is different. */ > + if (rtx_equal_p (op0, op1)) > + { > + gcc_assert (TARGET_AVX && !rtx_equal_p (op0, dest)); > + if (vmode == V4SFmode) > + vmode = V4SImode; > + else > + { > + gcc_assert (vmode == V2DFmode); > + vmode = V2DImode; > + } > + mask = lowpart_subreg (vmode, mask, GET_MODE (mask)); > + if (MEM_P (mask)) > + { > + rtx msk = lowpart_subreg (vmode, dest, mode); > + emit_insn (gen_rtx_SET (msk, mask)); > + mask = msk; > + } > + op0 = lowpart_subreg (vmode, op0, mode); > + x = gen_rtx_AND (vmode, gen_rtx_NOT (vmode, mask), op0); > + } > + else > + { > + op1 = lowpart_subreg (vmode, op1, mode); > + x = gen_rtx_AND (vmode, op1, mask); > + emit_insn (gen_rtx_SET (op1, x)); > > - op0 = lowpart_subreg (vmode, op0, mode); > - x = gen_rtx_XOR (vmode, op1, op0); > + op0 = lowpart_subreg (vmode, op0, mode); > + x = gen_rtx_XOR (vmode, op1, op0); > + } > > dest = lowpart_subreg (vmode, dest, mode); > emit_insn (gen_rtx_SET (dest, x)); > --- gcc/testsuite/gcc.dg/pr102224.c.jj 2021-09-07 12:17:43.088507018 +0200 > +++ gcc/testsuite/gcc.dg/pr102224.c 2021-09-07 13:11:52.923241157 +0200 > @@ -0,0 +1,49 @@ > +/* PR target/102224 */ > +/* { dg-do run } */ > +/* { dg-options "-O2" } */ > + > +__attribute__((noipa)) float > +foo (float x) > +{ > + return x * __builtin_copysignf (1.0f, x); > +} > + > +__attribute__((noipa)) float > +bar (float x, float y) > +{ > + return x * __builtin_copysignf (1.0f, y); > +} > + > +__attribute__((noipa)) float > +baz (float z, float x) > +{ > + return x * __builtin_copysignf (1.0f, x); > +} > + > +__attribute__((noipa)) float > +qux (float z, float x, float y) > +{ > + return x * __builtin_copysignf (1.0f, y); > +} > + > +int > +main () > +{ > + if (foo (1.0f) != 1.0f > + || foo (-4.0f) != 4.0f) > + __builtin_abort (); > + if (bar (1.25f, 7.25f) != 1.25f > + || bar (1.75f, -3.25f) != -1.75f > + || bar (-2.25f, 7.5f) != -2.25f > + || bar (-3.0f, -4.0f) != 3.0f) > + __builtin_abort (); > + if (baz (5.5f, 1.0f) != 1.0f > + || baz (4.25f, -4.0f) != 4.0f) > + __builtin_abort (); > + if (qux (1.0f, 1.25f, 7.25f) != 1.25f > + || qux (2.0f, 1.75f, -3.25f) != -1.75f > + || qux (3.0f, -2.25f, 7.5f) != -2.25f > + || qux (4.0f, -3.0f, -4.0f) != 3.0f) > + __builtin_abort (); > + return 0; > +} > --- gcc/testsuite/gcc.target/i386/avx-pr102224.c.jj 2021-09-07 13:12:41.429567551 +0200 > +++ gcc/testsuite/gcc.target/i386/avx-pr102224.c 2021-09-07 13:13:11.397151393 +0200 > @@ -0,0 +1,23 @@ > +/* PR tree-optimization/51581 */ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mavx" } */ > +/* { dg-require-effective-target avx } */ > + > +#ifndef CHECK_H > +#define CHECK_H "avx-check.h" > +#endif > +#ifndef TEST > +#define TEST avx_test > +#endif > + > +#define main main1 > +#include "../../gcc.dg/pr102224.c" > +#undef main > + > +#include CHECK_H > + > +static void > +TEST (void) > +{ > + main1 (); > +} > > Jakub > -- BR, Hongtao