public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Hongtao Liu <crazylht@gmail.com>
To: Jakub Jelinek <jakub@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>, liuhongt <hongtao.liu@intel.com>,
	 "H.J. Lu" <hjl.tools@gmail.com>,
	GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] i386: Fix up @xorsign<mode>3_1 [PR102224]
Date: Wed, 8 Sep 2021 17:23:40 +0800	[thread overview]
Message-ID: <CAMZc-bypQ21TOktB7iDG0RA_HXdBPH3G3HnDT4j+=WNgNZ9_6w@mail.gmail.com> (raw)
In-Reply-To: <20210908074250.GO920497@tucnak>

On Wed, Sep 8, 2021 at 3:43 PM Jakub Jelinek via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi!
>
> As the testcase shows, we miscompile @xorsign<mode>3_1 if both input
> operands are in the same register, because the splitter overwrites op1
> before with op1 & mask before using op0.
>
> For dest = xorsign op0, op0 we can actually simplify it from
> dest = (op0 & mask) ^ op0 to dest = op0 & ~mask (aka abs).
>
> The expander change is an optimization improvement, if we at expansion
> time know it is xorsign op0, op0, we can emit abs right away and get better
> code through that.
>
> The @xorsign<mode>3_1 is a fix for the case where xorsign wouldn't be known
> to have same operands during expansion, but during RTL optimizations they
> would appear.  For non-AVX we need to use earlyclobber, we require
> dest and op1 to be the same but op0 must be different because we overwrite
> op1 first.  For AVX the constraints ensure that at most 2 of the 3 operands
> may be the same register and if both inputs are the same, handles that case.
> This case can be easily tested with the xorsign<mode>3 expander change
> reverted.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
Patch LGTM.

PS:
  I'm curious why we need the  post_reload splitter @xorsign<mode>3_1
for scalar mode, can't we just expand them into and/xor operations in
the expander, just like vector modes did.
Let me do some experiments to see whether it is ok to remove the splitter.




> Thinking about it more this morning, while this patch fixes the problems
> revealed in the testcase, the recent PR89984 change was buggy too, but
> perhaps that can be fixed incrementally.  Because for AVX the new code
> destructively modifies op1.  If that is different from dest, say on:
> float
> foo (float x, float y)
> {
>   return x * __builtin_copysignf (1.0f, y) + y;
> }
> then we get after RA:
> (insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82])
>         (unspec:SF [
>                 (reg:SF 20 xmm0 [88])
>                 (reg:SF 21 xmm1 [89])
>                 (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S16 A128])
>             ] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1}
>      (nil))
> (insn 9 8 15 2 (set (reg:SF 20 xmm0 [87])
>         (plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82])
>             (reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm}
>      (nil))
> but split the xorsign into:
>         vandps  .LC0(%rip), %xmm1, %xmm1
>         vxorps  %xmm0, %xmm1, %xmm0
> and then the addition:
>         vaddss  %xmm1, %xmm0, %xmm0
> which means we miscompile it - instead of adding y in the end we add
> __builtin_copysignf (0.0f, y).
> So, wonder if we don't want instead in addition to the &Yv <- Yv, 0
> alternative (enabled for both pre-AVX and AVX as in this patch) the
> &Yv <- Yv, Yv where destination must be different from inputs and another
> Yv <- Yv, Yv where it can be the same but then need a match_scratch
> (with X for the other alternatives and =Yv for the last one).
> That way we'd always have a safe register we can store the op1 & mask
> value into, either the destination (in the first alternative known to
> be equal to op1 which is needed for non-AVX but ok for AVX too), in the
> second alternative known to be different from both inputs and in the third
> which could be used for those
> float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y); }
> cases where op1 is naturally xmm1 and dest == op0 naturally xmm0 we'd use
> some other register like xmm2.
>
> 2021-09-07  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/102224
>         * config/i386/i386.md (xorsign<mode>3): If operands[1] is equal to
>         operands[2], emit abs<mode>2 instead.
>         (@xorsign<mode>3_1): Add early-clobbers for output operand, enable
>         first alternative even for avx, add another alternative with
>         =&Yv <- 0, Yv, Yvm constraints.
>         * config/i386/i386-expand.c (ix86_split_xorsign): If op0 is equal
>         to op1, emit vpandn instead.
>
>         * gcc.dg/pr102224.c: New test.
>         * gcc.target/i386/avx-pr102224.c: New test.
>
> --- gcc/config/i386/i386.md.jj  2021-09-06 14:47:43.199049975 +0200
> +++ gcc/config/i386/i386.md     2021-09-07 13:13:50.825603852 +0200
> @@ -10803,21 +10803,27 @@ (define_expand "xorsign<mode>3"
>     (match_operand:MODEF 1 "register_operand")
>     (match_operand:MODEF 2 "register_operand")]
>    "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
> -  "ix86_expand_xorsign (operands); DONE;")
> +{
> +  if (rtx_equal_p (operands[1], operands[2]))
> +    emit_insn (gen_abs<mode>2 (operands[0], operands[1]));
> +  else
> +    ix86_expand_xorsign (operands);
> +  DONE;
> +})
>
>  (define_insn_and_split "@xorsign<mode>3_1"
> -  [(set (match_operand:MODEF 0 "register_operand" "=Yv,Yv")
> +  [(set (match_operand:MODEF 0 "register_operand" "=&Yv,&Yv,&Yv")
>         (unspec:MODEF
> -         [(match_operand:MODEF 1 "register_operand" "Yv,Yv")
> -          (match_operand:MODEF 2 "register_operand" "0,Yv")
> -          (match_operand:<ssevecmode> 3 "nonimmediate_operand" "Yvm,Yvm")]
> +         [(match_operand:MODEF 1 "register_operand" "Yv,0,Yv")
> +          (match_operand:MODEF 2 "register_operand" "0,Yv,Yv")
> +          (match_operand:<ssevecmode> 3 "nonimmediate_operand" "Yvm,Yvm,Yvm")]
>           UNSPEC_XORSIGN))]
>    "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
>    "#"
>    "&& reload_completed"
>    [(const_int 0)]
>    "ix86_split_xorsign (operands); DONE;"
> -  [(set_attr "isa" "noavx,avx")])
> +  [(set_attr "isa" "*,avx,avx")])
>
>  ;; One complement instructions
>
> --- gcc/config/i386/i386-expand.c.jj    2021-09-07 12:35:26.340685935 +0200
> +++ gcc/config/i386/i386-expand.c       2021-09-07 13:09:43.961032052 +0200
> @@ -2289,12 +2289,39 @@ ix86_split_xorsign (rtx operands[])
>    mode = GET_MODE (dest);
>    vmode = GET_MODE (mask);
>
> -  op1 = lowpart_subreg (vmode, op1, mode);
> -  x = gen_rtx_AND (vmode, op1, mask);
> -  emit_insn (gen_rtx_SET (op1, x));
> +  /* The constraints ensure that for non-AVX dest == op1 is
> +     different from op0, and for AVX that at most two of
> +     dest, op0 and op1 are the same register but the third one
> +     is different.  */
> +  if (rtx_equal_p (op0, op1))
> +    {
> +      gcc_assert (TARGET_AVX && !rtx_equal_p (op0, dest));
> +      if (vmode == V4SFmode)
> +       vmode = V4SImode;
> +      else
> +       {
> +         gcc_assert (vmode == V2DFmode);
> +         vmode = V2DImode;
> +       }
> +      mask = lowpart_subreg (vmode, mask, GET_MODE (mask));
> +      if (MEM_P (mask))
> +       {
> +         rtx msk = lowpart_subreg (vmode, dest, mode);
> +         emit_insn (gen_rtx_SET (msk, mask));
> +         mask = msk;
> +       }
> +      op0 = lowpart_subreg (vmode, op0, mode);
> +      x = gen_rtx_AND (vmode, gen_rtx_NOT (vmode, mask), op0);
> +    }
> +  else
> +    {
> +      op1 = lowpart_subreg (vmode, op1, mode);
> +      x = gen_rtx_AND (vmode, op1, mask);
> +      emit_insn (gen_rtx_SET (op1, x));
>
> -  op0 = lowpart_subreg (vmode, op0, mode);
> -  x = gen_rtx_XOR (vmode, op1, op0);
> +      op0 = lowpart_subreg (vmode, op0, mode);
> +      x = gen_rtx_XOR (vmode, op1, op0);
> +    }
>
>    dest = lowpart_subreg (vmode, dest, mode);
>    emit_insn (gen_rtx_SET (dest, x));
> --- gcc/testsuite/gcc.dg/pr102224.c.jj  2021-09-07 12:17:43.088507018 +0200
> +++ gcc/testsuite/gcc.dg/pr102224.c     2021-09-07 13:11:52.923241157 +0200
> @@ -0,0 +1,49 @@
> +/* PR target/102224 */
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +__attribute__((noipa)) float
> +foo (float x)
> +{
> +  return x * __builtin_copysignf (1.0f, x);
> +}
> +
> +__attribute__((noipa)) float
> +bar (float x, float y)
> +{
> +  return x * __builtin_copysignf (1.0f, y);
> +}
> +
> +__attribute__((noipa)) float
> +baz (float z, float x)
> +{
> +  return x * __builtin_copysignf (1.0f, x);
> +}
> +
> +__attribute__((noipa)) float
> +qux (float z, float x, float y)
> +{
> +  return x * __builtin_copysignf (1.0f, y);
> +}
> +
> +int
> +main ()
> +{
> +  if (foo (1.0f) != 1.0f
> +      || foo (-4.0f) != 4.0f)
> +    __builtin_abort ();
> +  if (bar (1.25f, 7.25f) != 1.25f
> +      || bar (1.75f, -3.25f) != -1.75f
> +      || bar (-2.25f, 7.5f) != -2.25f
> +      || bar (-3.0f, -4.0f) != 3.0f)
> +    __builtin_abort ();
> +  if (baz (5.5f, 1.0f) != 1.0f
> +      || baz (4.25f, -4.0f) != 4.0f)
> +    __builtin_abort ();
> +  if (qux (1.0f, 1.25f, 7.25f) != 1.25f
> +      || qux (2.0f, 1.75f, -3.25f) != -1.75f
> +      || qux (3.0f, -2.25f, 7.5f) != -2.25f
> +      || qux (4.0f, -3.0f, -4.0f) != 3.0f)
> +    __builtin_abort ();
> +  return 0;
> +}
> --- gcc/testsuite/gcc.target/i386/avx-pr102224.c.jj     2021-09-07 13:12:41.429567551 +0200
> +++ gcc/testsuite/gcc.target/i386/avx-pr102224.c        2021-09-07 13:13:11.397151393 +0200
> @@ -0,0 +1,23 @@
> +/* PR tree-optimization/51581 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx" } */
> +/* { dg-require-effective-target avx } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "avx-check.h"
> +#endif
> +#ifndef TEST
> +#define TEST avx_test
> +#endif
> +
> +#define main main1
> +#include "../../gcc.dg/pr102224.c"
> +#undef main
> +
> +#include CHECK_H
> +
> +static void
> +TEST (void)
> +{
> +  main1 ();
> +}
>
>         Jakub
>


--
BR,
Hongtao

  reply	other threads:[~2021-09-08  9:18 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-08  7:42 Jakub Jelinek
2021-09-08  9:23 ` Hongtao Liu [this message]
2021-09-08  9:33   ` Jakub Jelinek
2021-09-08 10:00     ` Hongtao Liu
2021-09-08 10:02       ` Jakub Jelinek
2021-09-08 10:15         ` Hongtao Liu
2021-09-08 10:07       ` Jakub Jelinek
2021-09-14  0:57       ` Andrew Pinski
2021-09-14  2:06         ` Hongtao Liu
2021-09-14  2:18           ` Hongtao Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAMZc-bypQ21TOktB7iDG0RA_HXdBPH3G3HnDT4j+=WNgNZ9_6w@mail.gmail.com' \
    --to=crazylht@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=hjl.tools@gmail.com \
    --cc=hongtao.liu@intel.com \
    --cc=jakub@redhat.com \
    --cc=ubizjak@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).