public inbox for gcc-cvs@sourceware.org
help / color / mirror / Atom feed
* [gcc r12-3413] i386: Fix up @xorsign<mode>3_1 [PR102224]
@ 2021-09-08  9:26 Jakub Jelinek
  0 siblings, 0 replies; only message in thread
From: Jakub Jelinek @ 2021-09-08  9:26 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:a7b626d98a9a821ffb33466818d6aa86cac1d6fd

commit r12-3413-ga7b626d98a9a821ffb33466818d6aa86cac1d6fd
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Sep 8 11:25:31 2021 +0200

    i386: Fix up @xorsign<mode>3_1 [PR102224]
    
    As the testcase shows, we miscompile @xorsign<mode>3_1 if both input
    operands are in the same register, because the splitter overwrites op1
    before with op1 & mask before using op0.
    
    For dest = xorsign op0, op0 we can actually simplify it from
    dest = (op0 & mask) ^ op0 to dest = op0 & ~mask (aka abs).
    
    The expander change is an optimization improvement, if we at expansion
    time know it is xorsign op0, op0, we can emit abs right away and get better
    code through that.
    
    The @xorsign<mode>3_1 is a fix for the case where xorsign wouldn't be known
    to have same operands during expansion, but during RTL optimizations they
    would appear.  For non-AVX we need to use earlyclobber, we require
    dest and op1 to be the same but op0 must be different because we overwrite
    op1 first.  For AVX the constraints ensure that at most 2 of the 3 operands
    may be the same register and if both inputs are the same, handles that case.
    This case can be easily tested with the xorsign<mode>3 expander change
    reverted.
    
    Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
    
    Thinking about it more this morning, while this patch fixes the problems
    revealed in the testcase, the recent PR89984 change was buggy too, but
    perhaps that can be fixed incrementally.  Because for AVX the new code
    destructively modifies op1.  If that is different from dest, say on:
    float
    foo (float x, float y)
    {
      return x * __builtin_copysignf (1.0f, y) + y;
    }
    then we get after RA:
    (insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82])
            (unspec:SF [
                    (reg:SF 20 xmm0 [88])
                    (reg:SF 21 xmm1 [89])
                    (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S16 A128])
                ] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1}
         (nil))
    (insn 9 8 15 2 (set (reg:SF 20 xmm0 [87])
            (plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82])
                (reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm}
         (nil))
    but split the xorsign into:
            vandps  .LC0(%rip), %xmm1, %xmm1
            vxorps  %xmm0, %xmm1, %xmm0
    and then the addition:
            vaddss  %xmm1, %xmm0, %xmm0
    which means we miscompile it - instead of adding y in the end we add
    __builtin_copysignf (0.0f, y).
    So, wonder if we don't want instead in addition to the &Yv <- Yv, 0
    alternative (enabled for both pre-AVX and AVX as in this patch) the
    &Yv <- Yv, Yv where destination must be different from inputs and another
    Yv <- Yv, Yv where it can be the same but then need a match_scratch
    (with X for the other alternatives and =Yv for the last one).
    That way we'd always have a safe register we can store the op1 & mask
    value into, either the destination (in the first alternative known to
    be equal to op1 which is needed for non-AVX but ok for AVX too), in the
    second alternative known to be different from both inputs and in the third
    which could be used for those
    float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y); }
    cases where op1 is naturally xmm1 and dest == op0 naturally xmm0 we'd use
    some other register like xmm2.
    
    2021-09-08  Jakub Jelinek  <jakub@redhat.com>
    
            PR target/102224
            * config/i386/i386.md (xorsign<mode>3): If operands[1] is equal to
            operands[2], emit abs<mode>2 instead.
            (@xorsign<mode>3_1): Add early-clobbers for output operand, enable
            first alternative even for avx, add another alternative with
            =&Yv <- 0, Yv, Yvm constraints.
            * config/i386/i386-expand.c (ix86_split_xorsign): If op0 is equal
            to op1, emit vpandn instead.
    
            * gcc.dg/pr102224.c: New test.
            * gcc.target/i386/avx-pr102224.c: New test.

Diff:
---
 gcc/config/i386/i386-expand.c                | 37 ++++++++++++++++++---
 gcc/config/i386/i386.md                      | 18 ++++++----
 gcc/testsuite/gcc.dg/pr102224.c              | 49 ++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/avx-pr102224.c | 23 +++++++++++++
 4 files changed, 116 insertions(+), 11 deletions(-)

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index dfffbe598d4..0cc572c4903 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -2306,12 +2306,39 @@ ix86_split_xorsign (rtx operands[])
   mode = GET_MODE (dest);
   vmode = GET_MODE (mask);
 
-  op1 = lowpart_subreg (vmode, op1, mode);
-  x = gen_rtx_AND (vmode, op1, mask);
-  emit_insn (gen_rtx_SET (op1, x));
+  /* The constraints ensure that for non-AVX dest == op1 is
+     different from op0, and for AVX that at most two of
+     dest, op0 and op1 are the same register but the third one
+     is different.  */
+  if (rtx_equal_p (op0, op1))
+    {
+      gcc_assert (TARGET_AVX && !rtx_equal_p (op0, dest));
+      if (vmode == V4SFmode)
+	vmode = V4SImode;
+      else
+	{
+	  gcc_assert (vmode == V2DFmode);
+	  vmode = V2DImode;
+	}
+      mask = lowpart_subreg (vmode, mask, GET_MODE (mask));
+      if (MEM_P (mask))
+	{
+	  rtx msk = lowpart_subreg (vmode, dest, mode);
+	  emit_insn (gen_rtx_SET (msk, mask));
+	  mask = msk;
+	}
+      op0 = lowpart_subreg (vmode, op0, mode);
+      x = gen_rtx_AND (vmode, gen_rtx_NOT (vmode, mask), op0);
+    }
+  else
+    {
+      op1 = lowpart_subreg (vmode, op1, mode);
+      x = gen_rtx_AND (vmode, op1, mask);
+      emit_insn (gen_rtx_SET (op1, x));
 
-  op0 = lowpart_subreg (vmode, op0, mode);
-  x = gen_rtx_XOR (vmode, op1, op0);
+      op0 = lowpart_subreg (vmode, op0, mode);
+      x = gen_rtx_XOR (vmode, op1, op0);
+    }
 
   dest = lowpart_subreg (vmode, dest, mode);
   emit_insn (gen_rtx_SET (dest, x));
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index fe36d7ede4e..0414f24949e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10910,21 +10910,27 @@
    (match_operand:MODEF 1 "register_operand")
    (match_operand:MODEF 2 "register_operand")]
   "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
-  "ix86_expand_xorsign (operands); DONE;")
+{
+  if (rtx_equal_p (operands[1], operands[2]))
+    emit_insn (gen_abs<mode>2 (operands[0], operands[1]));
+  else
+    ix86_expand_xorsign (operands);
+  DONE;
+})
 
 (define_insn_and_split "@xorsign<mode>3_1"
-  [(set (match_operand:MODEF 0 "register_operand" "=Yv,Yv")
+  [(set (match_operand:MODEF 0 "register_operand" "=&Yv,&Yv,&Yv")
 	(unspec:MODEF
-	  [(match_operand:MODEF 1 "register_operand" "Yv,Yv")
-	   (match_operand:MODEF 2 "register_operand" "0,Yv")
-	   (match_operand:<ssevecmode> 3 "nonimmediate_operand" "Yvm,Yvm")]
+	  [(match_operand:MODEF 1 "register_operand" "Yv,0,Yv")
+	   (match_operand:MODEF 2 "register_operand" "0,Yv,Yv")
+	   (match_operand:<ssevecmode> 3 "nonimmediate_operand" "Yvm,Yvm,Yvm")]
 	  UNSPEC_XORSIGN))]
   "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
   "#"
   "&& reload_completed"
   [(const_int 0)]
   "ix86_split_xorsign (operands); DONE;"
-  [(set_attr "isa" "noavx,avx")])
+  [(set_attr "isa" "*,avx,avx")])
 \f
 ;; One complement instructions
 
diff --git a/gcc/testsuite/gcc.dg/pr102224.c b/gcc/testsuite/gcc.dg/pr102224.c
new file mode 100644
index 00000000000..9f09ba5ccbb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr102224.c
@@ -0,0 +1,49 @@
+/* PR target/102224 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+__attribute__((noipa)) float
+foo (float x)
+{
+  return x * __builtin_copysignf (1.0f, x);
+}
+
+__attribute__((noipa)) float
+bar (float x, float y)
+{
+  return x * __builtin_copysignf (1.0f, y);
+}
+
+__attribute__((noipa)) float
+baz (float z, float x)
+{
+  return x * __builtin_copysignf (1.0f, x);
+}
+
+__attribute__((noipa)) float
+qux (float z, float x, float y)
+{
+  return x * __builtin_copysignf (1.0f, y);
+}
+
+int
+main ()
+{
+  if (foo (1.0f) != 1.0f
+      || foo (-4.0f) != 4.0f)
+    __builtin_abort ();
+  if (bar (1.25f, 7.25f) != 1.25f
+      || bar (1.75f, -3.25f) != -1.75f
+      || bar (-2.25f, 7.5f) != -2.25f
+      || bar (-3.0f, -4.0f) != 3.0f)
+    __builtin_abort ();
+  if (baz (5.5f, 1.0f) != 1.0f
+      || baz (4.25f, -4.0f) != 4.0f)
+    __builtin_abort ();
+  if (qux (1.0f, 1.25f, 7.25f) != 1.25f
+      || qux (2.0f, 1.75f, -3.25f) != -1.75f
+      || qux (3.0f, -2.25f, 7.5f) != -2.25f
+      || qux (4.0f, -3.0f, -4.0f) != 3.0f)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx-pr102224.c b/gcc/testsuite/gcc.target/i386/avx-pr102224.c
new file mode 100644
index 00000000000..be6b88c05db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-pr102224.c
@@ -0,0 +1,23 @@
+/* PR tree-optimization/51581 */
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx" } */
+/* { dg-require-effective-target avx } */
+
+#ifndef CHECK_H
+#define CHECK_H "avx-check.h"
+#endif
+#ifndef TEST
+#define TEST avx_test
+#endif
+
+#define main main1
+#include "../../gcc.dg/pr102224.c"
+#undef main
+
+#include CHECK_H
+
+static void
+TEST (void)
+{
+  main1 ();
+}


^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2021-09-08  9:26 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-08  9:26 [gcc r12-3413] i386: Fix up @xorsign<mode>3_1 [PR102224] Jakub Jelinek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).