public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] i386: Fix up @xorsign<mode>3_1 [PR102224]
@ 2021-09-08  7:42 Jakub Jelinek
  2021-09-08  9:23 ` Hongtao Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2021-09-08  7:42 UTC (permalink / raw)
  To: Uros Bizjak, liuhongt, H.J. Lu; +Cc: gcc-patches

Hi!

As the testcase shows, we miscompile @xorsign<mode>3_1 if both input
operands are in the same register, because the splitter overwrites op1
before with op1 & mask before using op0.

For dest = xorsign op0, op0 we can actually simplify it from
dest = (op0 & mask) ^ op0 to dest = op0 & ~mask (aka abs).

The expander change is an optimization improvement, if we at expansion
time know it is xorsign op0, op0, we can emit abs right away and get better
code through that.

The @xorsign<mode>3_1 is a fix for the case where xorsign wouldn't be known
to have same operands during expansion, but during RTL optimizations they
would appear.  For non-AVX we need to use earlyclobber, we require
dest and op1 to be the same but op0 must be different because we overwrite
op1 first.  For AVX the constraints ensure that at most 2 of the 3 operands
may be the same register and if both inputs are the same, handles that case.
This case can be easily tested with the xorsign<mode>3 expander change
reverted.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Thinking about it more this morning, while this patch fixes the problems
revealed in the testcase, the recent PR89984 change was buggy too, but
perhaps that can be fixed incrementally.  Because for AVX the new code
destructively modifies op1.  If that is different from dest, say on:
float
foo (float x, float y)
{
  return x * __builtin_copysignf (1.0f, y) + y;
}
then we get after RA:
(insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82])
        (unspec:SF [
                (reg:SF 20 xmm0 [88])
                (reg:SF 21 xmm1 [89])
                (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S16 A128])
            ] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1}
     (nil))
(insn 9 8 15 2 (set (reg:SF 20 xmm0 [87])
        (plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82])
            (reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm}
     (nil))
but split the xorsign into:
        vandps  .LC0(%rip), %xmm1, %xmm1
        vxorps  %xmm0, %xmm1, %xmm0
and then the addition:
        vaddss  %xmm1, %xmm0, %xmm0
which means we miscompile it - instead of adding y in the end we add
__builtin_copysignf (0.0f, y).
So, wonder if we don't want instead in addition to the &Yv <- Yv, 0
alternative (enabled for both pre-AVX and AVX as in this patch) the
&Yv <- Yv, Yv where destination must be different from inputs and another
Yv <- Yv, Yv where it can be the same but then need a match_scratch
(with X for the other alternatives and =Yv for the last one).
That way we'd always have a safe register we can store the op1 & mask
value into, either the destination (in the first alternative known to
be equal to op1 which is needed for non-AVX but ok for AVX too), in the
second alternative known to be different from both inputs and in the third
which could be used for those
float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y); }
cases where op1 is naturally xmm1 and dest == op0 naturally xmm0 we'd use
some other register like xmm2.

2021-09-07  Jakub Jelinek  <jakub@redhat.com>

	PR target/102224
	* config/i386/i386.md (xorsign<mode>3): If operands[1] is equal to
	operands[2], emit abs<mode>2 instead.
	(@xorsign<mode>3_1): Add early-clobbers for output operand, enable
	first alternative even for avx, add another alternative with
	=&Yv <- 0, Yv, Yvm constraints.
	* config/i386/i386-expand.c (ix86_split_xorsign): If op0 is equal
	to op1, emit vpandn instead.

	* gcc.dg/pr102224.c: New test.
	* gcc.target/i386/avx-pr102224.c: New test.

--- gcc/config/i386/i386.md.jj	2021-09-06 14:47:43.199049975 +0200
+++ gcc/config/i386/i386.md	2021-09-07 13:13:50.825603852 +0200
@@ -10803,21 +10803,27 @@ (define_expand "xorsign<mode>3"
    (match_operand:MODEF 1 "register_operand")
    (match_operand:MODEF 2 "register_operand")]
   "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
-  "ix86_expand_xorsign (operands); DONE;")
+{
+  if (rtx_equal_p (operands[1], operands[2]))
+    emit_insn (gen_abs<mode>2 (operands[0], operands[1]));
+  else
+    ix86_expand_xorsign (operands);
+  DONE;
+})
 
 (define_insn_and_split "@xorsign<mode>3_1"
-  [(set (match_operand:MODEF 0 "register_operand" "=Yv,Yv")
+  [(set (match_operand:MODEF 0 "register_operand" "=&Yv,&Yv,&Yv")
 	(unspec:MODEF
-	  [(match_operand:MODEF 1 "register_operand" "Yv,Yv")
-	   (match_operand:MODEF 2 "register_operand" "0,Yv")
-	   (match_operand:<ssevecmode> 3 "nonimmediate_operand" "Yvm,Yvm")]
+	  [(match_operand:MODEF 1 "register_operand" "Yv,0,Yv")
+	   (match_operand:MODEF 2 "register_operand" "0,Yv,Yv")
+	   (match_operand:<ssevecmode> 3 "nonimmediate_operand" "Yvm,Yvm,Yvm")]
 	  UNSPEC_XORSIGN))]
   "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
   "#"
   "&& reload_completed"
   [(const_int 0)]
   "ix86_split_xorsign (operands); DONE;"
-  [(set_attr "isa" "noavx,avx")])
+  [(set_attr "isa" "*,avx,avx")])
 \f
 ;; One complement instructions
 
--- gcc/config/i386/i386-expand.c.jj	2021-09-07 12:35:26.340685935 +0200
+++ gcc/config/i386/i386-expand.c	2021-09-07 13:09:43.961032052 +0200
@@ -2289,12 +2289,39 @@ ix86_split_xorsign (rtx operands[])
   mode = GET_MODE (dest);
   vmode = GET_MODE (mask);
 
-  op1 = lowpart_subreg (vmode, op1, mode);
-  x = gen_rtx_AND (vmode, op1, mask);
-  emit_insn (gen_rtx_SET (op1, x));
+  /* The constraints ensure that for non-AVX dest == op1 is
+     different from op0, and for AVX that at most two of
+     dest, op0 and op1 are the same register but the third one
+     is different.  */
+  if (rtx_equal_p (op0, op1))
+    {
+      gcc_assert (TARGET_AVX && !rtx_equal_p (op0, dest));
+      if (vmode == V4SFmode)
+	vmode = V4SImode;
+      else
+	{
+	  gcc_assert (vmode == V2DFmode);
+	  vmode = V2DImode;
+	}
+      mask = lowpart_subreg (vmode, mask, GET_MODE (mask));
+      if (MEM_P (mask))
+	{
+	  rtx msk = lowpart_subreg (vmode, dest, mode);
+	  emit_insn (gen_rtx_SET (msk, mask));
+	  mask = msk;
+	}
+      op0 = lowpart_subreg (vmode, op0, mode);
+      x = gen_rtx_AND (vmode, gen_rtx_NOT (vmode, mask), op0);
+    }
+  else
+    {
+      op1 = lowpart_subreg (vmode, op1, mode);
+      x = gen_rtx_AND (vmode, op1, mask);
+      emit_insn (gen_rtx_SET (op1, x));
 
-  op0 = lowpart_subreg (vmode, op0, mode);
-  x = gen_rtx_XOR (vmode, op1, op0);
+      op0 = lowpart_subreg (vmode, op0, mode);
+      x = gen_rtx_XOR (vmode, op1, op0);
+    }
 
   dest = lowpart_subreg (vmode, dest, mode);
   emit_insn (gen_rtx_SET (dest, x));
--- gcc/testsuite/gcc.dg/pr102224.c.jj	2021-09-07 12:17:43.088507018 +0200
+++ gcc/testsuite/gcc.dg/pr102224.c	2021-09-07 13:11:52.923241157 +0200
@@ -0,0 +1,49 @@
+/* PR target/102224 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+__attribute__((noipa)) float
+foo (float x)
+{
+  return x * __builtin_copysignf (1.0f, x);
+}
+
+__attribute__((noipa)) float
+bar (float x, float y)
+{
+  return x * __builtin_copysignf (1.0f, y);
+}
+
+__attribute__((noipa)) float
+baz (float z, float x)
+{
+  return x * __builtin_copysignf (1.0f, x);
+}
+
+__attribute__((noipa)) float
+qux (float z, float x, float y)
+{
+  return x * __builtin_copysignf (1.0f, y);
+}
+
+int
+main ()
+{
+  if (foo (1.0f) != 1.0f
+      || foo (-4.0f) != 4.0f)
+    __builtin_abort ();
+  if (bar (1.25f, 7.25f) != 1.25f
+      || bar (1.75f, -3.25f) != -1.75f
+      || bar (-2.25f, 7.5f) != -2.25f
+      || bar (-3.0f, -4.0f) != 3.0f)
+    __builtin_abort ();
+  if (baz (5.5f, 1.0f) != 1.0f
+      || baz (4.25f, -4.0f) != 4.0f)
+    __builtin_abort ();
+  if (qux (1.0f, 1.25f, 7.25f) != 1.25f
+      || qux (2.0f, 1.75f, -3.25f) != -1.75f
+      || qux (3.0f, -2.25f, 7.5f) != -2.25f
+      || qux (4.0f, -3.0f, -4.0f) != 3.0f)
+    __builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.target/i386/avx-pr102224.c.jj	2021-09-07 13:12:41.429567551 +0200
+++ gcc/testsuite/gcc.target/i386/avx-pr102224.c	2021-09-07 13:13:11.397151393 +0200
@@ -0,0 +1,23 @@
+/* PR tree-optimization/51581 */
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx" } */
+/* { dg-require-effective-target avx } */
+
+#ifndef CHECK_H
+#define CHECK_H "avx-check.h"
+#endif
+#ifndef TEST
+#define TEST avx_test
+#endif
+
+#define main main1
+#include "../../gcc.dg/pr102224.c"
+#undef main
+
+#include CHECK_H
+
+static void
+TEST (void)
+{
+  main1 ();
+}

	Jakub


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] i386: Fix up @xorsign<mode>3_1 [PR102224]
  2021-09-08  7:42 [PATCH] i386: Fix up @xorsign<mode>3_1 [PR102224] Jakub Jelinek
@ 2021-09-08  9:23 ` Hongtao Liu
  2021-09-08  9:33   ` Jakub Jelinek
  0 siblings, 1 reply; 10+ messages in thread
From: Hongtao Liu @ 2021-09-08  9:23 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, liuhongt, H.J. Lu, GCC Patches

On Wed, Sep 8, 2021 at 3:43 PM Jakub Jelinek via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi!
>
> As the testcase shows, we miscompile @xorsign<mode>3_1 if both input
> operands are in the same register, because the splitter overwrites op1
> before with op1 & mask before using op0.
>
> For dest = xorsign op0, op0 we can actually simplify it from
> dest = (op0 & mask) ^ op0 to dest = op0 & ~mask (aka abs).
>
> The expander change is an optimization improvement, if we at expansion
> time know it is xorsign op0, op0, we can emit abs right away and get better
> code through that.
>
> The @xorsign<mode>3_1 is a fix for the case where xorsign wouldn't be known
> to have same operands during expansion, but during RTL optimizations they
> would appear.  For non-AVX we need to use earlyclobber, we require
> dest and op1 to be the same but op0 must be different because we overwrite
> op1 first.  For AVX the constraints ensure that at most 2 of the 3 operands
> may be the same register and if both inputs are the same, handles that case.
> This case can be easily tested with the xorsign<mode>3 expander change
> reverted.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
Patch LGTM.

PS:
  I'm curious why we need the  post_reload splitter @xorsign<mode>3_1
for scalar mode, can't we just expand them into and/xor operations in
the expander, just like vector modes did.
Let me do some experiments to see whether it is ok to remove the splitter.




> Thinking about it more this morning, while this patch fixes the problems
> revealed in the testcase, the recent PR89984 change was buggy too, but
> perhaps that can be fixed incrementally.  Because for AVX the new code
> destructively modifies op1.  If that is different from dest, say on:
> float
> foo (float x, float y)
> {
>   return x * __builtin_copysignf (1.0f, y) + y;
> }
> then we get after RA:
> (insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82])
>         (unspec:SF [
>                 (reg:SF 20 xmm0 [88])
>                 (reg:SF 21 xmm1 [89])
>                 (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S16 A128])
>             ] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1}
>      (nil))
> (insn 9 8 15 2 (set (reg:SF 20 xmm0 [87])
>         (plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82])
>             (reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm}
>      (nil))
> but split the xorsign into:
>         vandps  .LC0(%rip), %xmm1, %xmm1
>         vxorps  %xmm0, %xmm1, %xmm0
> and then the addition:
>         vaddss  %xmm1, %xmm0, %xmm0
> which means we miscompile it - instead of adding y in the end we add
> __builtin_copysignf (0.0f, y).
> So, wonder if we don't want instead in addition to the &Yv <- Yv, 0
> alternative (enabled for both pre-AVX and AVX as in this patch) the
> &Yv <- Yv, Yv where destination must be different from inputs and another
> Yv <- Yv, Yv where it can be the same but then need a match_scratch
> (with X for the other alternatives and =Yv for the last one).
> That way we'd always have a safe register we can store the op1 & mask
> value into, either the destination (in the first alternative known to
> be equal to op1 which is needed for non-AVX but ok for AVX too), in the
> second alternative known to be different from both inputs and in the third
> which could be used for those
> float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y); }
> cases where op1 is naturally xmm1 and dest == op0 naturally xmm0 we'd use
> some other register like xmm2.
>
> 2021-09-07  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/102224
>         * config/i386/i386.md (xorsign<mode>3): If operands[1] is equal to
>         operands[2], emit abs<mode>2 instead.
>         (@xorsign<mode>3_1): Add early-clobbers for output operand, enable
>         first alternative even for avx, add another alternative with
>         =&Yv <- 0, Yv, Yvm constraints.
>         * config/i386/i386-expand.c (ix86_split_xorsign): If op0 is equal
>         to op1, emit vpandn instead.
>
>         * gcc.dg/pr102224.c: New test.
>         * gcc.target/i386/avx-pr102224.c: New test.
>
> --- gcc/config/i386/i386.md.jj  2021-09-06 14:47:43.199049975 +0200
> +++ gcc/config/i386/i386.md     2021-09-07 13:13:50.825603852 +0200
> @@ -10803,21 +10803,27 @@ (define_expand "xorsign<mode>3"
>     (match_operand:MODEF 1 "register_operand")
>     (match_operand:MODEF 2 "register_operand")]
>    "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
> -  "ix86_expand_xorsign (operands); DONE;")
> +{
> +  if (rtx_equal_p (operands[1], operands[2]))
> +    emit_insn (gen_abs<mode>2 (operands[0], operands[1]));
> +  else
> +    ix86_expand_xorsign (operands);
> +  DONE;
> +})
>
>  (define_insn_and_split "@xorsign<mode>3_1"
> -  [(set (match_operand:MODEF 0 "register_operand" "=Yv,Yv")
> +  [(set (match_operand:MODEF 0 "register_operand" "=&Yv,&Yv,&Yv")
>         (unspec:MODEF
> -         [(match_operand:MODEF 1 "register_operand" "Yv,Yv")
> -          (match_operand:MODEF 2 "register_operand" "0,Yv")
> -          (match_operand:<ssevecmode> 3 "nonimmediate_operand" "Yvm,Yvm")]
> +         [(match_operand:MODEF 1 "register_operand" "Yv,0,Yv")
> +          (match_operand:MODEF 2 "register_operand" "0,Yv,Yv")
> +          (match_operand:<ssevecmode> 3 "nonimmediate_operand" "Yvm,Yvm,Yvm")]
>           UNSPEC_XORSIGN))]
>    "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
>    "#"
>    "&& reload_completed"
>    [(const_int 0)]
>    "ix86_split_xorsign (operands); DONE;"
> -  [(set_attr "isa" "noavx,avx")])
> +  [(set_attr "isa" "*,avx,avx")])
>
>  ;; One complement instructions
>
> --- gcc/config/i386/i386-expand.c.jj    2021-09-07 12:35:26.340685935 +0200
> +++ gcc/config/i386/i386-expand.c       2021-09-07 13:09:43.961032052 +0200
> @@ -2289,12 +2289,39 @@ ix86_split_xorsign (rtx operands[])
>    mode = GET_MODE (dest);
>    vmode = GET_MODE (mask);
>
> -  op1 = lowpart_subreg (vmode, op1, mode);
> -  x = gen_rtx_AND (vmode, op1, mask);
> -  emit_insn (gen_rtx_SET (op1, x));
> +  /* The constraints ensure that for non-AVX dest == op1 is
> +     different from op0, and for AVX that at most two of
> +     dest, op0 and op1 are the same register but the third one
> +     is different.  */
> +  if (rtx_equal_p (op0, op1))
> +    {
> +      gcc_assert (TARGET_AVX && !rtx_equal_p (op0, dest));
> +      if (vmode == V4SFmode)
> +       vmode = V4SImode;
> +      else
> +       {
> +         gcc_assert (vmode == V2DFmode);
> +         vmode = V2DImode;
> +       }
> +      mask = lowpart_subreg (vmode, mask, GET_MODE (mask));
> +      if (MEM_P (mask))
> +       {
> +         rtx msk = lowpart_subreg (vmode, dest, mode);
> +         emit_insn (gen_rtx_SET (msk, mask));
> +         mask = msk;
> +       }
> +      op0 = lowpart_subreg (vmode, op0, mode);
> +      x = gen_rtx_AND (vmode, gen_rtx_NOT (vmode, mask), op0);
> +    }
> +  else
> +    {
> +      op1 = lowpart_subreg (vmode, op1, mode);
> +      x = gen_rtx_AND (vmode, op1, mask);
> +      emit_insn (gen_rtx_SET (op1, x));
>
> -  op0 = lowpart_subreg (vmode, op0, mode);
> -  x = gen_rtx_XOR (vmode, op1, op0);
> +      op0 = lowpart_subreg (vmode, op0, mode);
> +      x = gen_rtx_XOR (vmode, op1, op0);
> +    }
>
>    dest = lowpart_subreg (vmode, dest, mode);
>    emit_insn (gen_rtx_SET (dest, x));
> --- gcc/testsuite/gcc.dg/pr102224.c.jj  2021-09-07 12:17:43.088507018 +0200
> +++ gcc/testsuite/gcc.dg/pr102224.c     2021-09-07 13:11:52.923241157 +0200
> @@ -0,0 +1,49 @@
> +/* PR target/102224 */
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +__attribute__((noipa)) float
> +foo (float x)
> +{
> +  return x * __builtin_copysignf (1.0f, x);
> +}
> +
> +__attribute__((noipa)) float
> +bar (float x, float y)
> +{
> +  return x * __builtin_copysignf (1.0f, y);
> +}
> +
> +__attribute__((noipa)) float
> +baz (float z, float x)
> +{
> +  return x * __builtin_copysignf (1.0f, x);
> +}
> +
> +__attribute__((noipa)) float
> +qux (float z, float x, float y)
> +{
> +  return x * __builtin_copysignf (1.0f, y);
> +}
> +
> +int
> +main ()
> +{
> +  if (foo (1.0f) != 1.0f
> +      || foo (-4.0f) != 4.0f)
> +    __builtin_abort ();
> +  if (bar (1.25f, 7.25f) != 1.25f
> +      || bar (1.75f, -3.25f) != -1.75f
> +      || bar (-2.25f, 7.5f) != -2.25f
> +      || bar (-3.0f, -4.0f) != 3.0f)
> +    __builtin_abort ();
> +  if (baz (5.5f, 1.0f) != 1.0f
> +      || baz (4.25f, -4.0f) != 4.0f)
> +    __builtin_abort ();
> +  if (qux (1.0f, 1.25f, 7.25f) != 1.25f
> +      || qux (2.0f, 1.75f, -3.25f) != -1.75f
> +      || qux (3.0f, -2.25f, 7.5f) != -2.25f
> +      || qux (4.0f, -3.0f, -4.0f) != 3.0f)
> +    __builtin_abort ();
> +  return 0;
> +}
> --- gcc/testsuite/gcc.target/i386/avx-pr102224.c.jj     2021-09-07 13:12:41.429567551 +0200
> +++ gcc/testsuite/gcc.target/i386/avx-pr102224.c        2021-09-07 13:13:11.397151393 +0200
> @@ -0,0 +1,23 @@
> +/* PR tree-optimization/51581 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx" } */
> +/* { dg-require-effective-target avx } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "avx-check.h"
> +#endif
> +#ifndef TEST
> +#define TEST avx_test
> +#endif
> +
> +#define main main1
> +#include "../../gcc.dg/pr102224.c"
> +#undef main
> +
> +#include CHECK_H
> +
> +static void
> +TEST (void)
> +{
> +  main1 ();
> +}
>
>         Jakub
>


--
BR,
Hongtao

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] i386: Fix up @xorsign<mode>3_1 [PR102224]
  2021-09-08  9:23 ` Hongtao Liu
@ 2021-09-08  9:33   ` Jakub Jelinek
  2021-09-08 10:00     ` Hongtao Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2021-09-08  9:33 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Uros Bizjak, liuhongt, H.J. Lu, GCC Patches

On Wed, Sep 08, 2021 at 05:23:40PM +0800, Hongtao Liu wrote:
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> Patch LGTM.

Thanks, committed.

> PS:
>   I'm curious why we need the  post_reload splitter @xorsign<mode>3_1
> for scalar mode, can't we just expand them into and/xor operations in
> the expander, just like vector modes did.
> Let me do some experiments to see whether it is ok to remove the splitter.

I bet it is the question how should the code look like before RA.
stv is somewhat related, but as that replaces whole chains, it can do:
(insn 14 5 6 2 (set (subreg:V2DI (reg:DI 92) 0)
        (vec_concat:V2DI (mem/c:DI (symbol_ref:SI ("c") [flags 0x2]  <var_decl 0x7f65a131fd80 c>) [1 c+0 S8 A64])
            (const_int 0 [0]))) "hohohou.c":6:9 -1
     (nil))
on loads of memory.
But it stv still does use paradoxical subregs:
(insn 10 16 11 2 (set (subreg:V2DI (reg:DI 91) 0)
        (minus:V2DI (subreg:V2DI (reg:DI 87) 0)
            (subreg:V2DI (reg:DI 94) 0))) "hohohou.c":6:13 5003 {*subv2di3}
     (expr_list:REG_DEAD (reg:DI 87)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(insn 11 10 0 2 (set (mem/c:DI (symbol_ref:SI ("a") [flags 0x2]  <var_decl 0x7f65a131fc60 a>) [1 a+0 S8 A64])
        (reg:DI 91)) "hohohou.c":6:5 76 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 91)
        (nil)))
so perhaps just using paradoxical subregs everywhere would be ok?

	Jakub


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] i386: Fix up @xorsign<mode>3_1 [PR102224]
  2021-09-08  9:33   ` Jakub Jelinek
@ 2021-09-08 10:00     ` Hongtao Liu
  2021-09-08 10:02       ` Jakub Jelinek
                         ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Hongtao Liu @ 2021-09-08 10:00 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, liuhongt, H.J. Lu, GCC Patches

On Wed, Sep 8, 2021 at 5:33 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Wed, Sep 08, 2021 at 05:23:40PM +0800, Hongtao Liu wrote:
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > >
> > Patch LGTM.
>
> Thanks, committed.
>
> > PS:
> >   I'm curious why we need the  post_reload splitter @xorsign<mode>3_1
> > for scalar mode, can't we just expand them into and/xor operations in
> > the expander, just like vector modes did.
> > Let me do some experiments to see whether it is ok to remove the splitter.
>
> I bet it is the question how should the code look like before RA.
> stv is somewhat related, but as that replaces whole chains, it can do:
> (insn 14 5 6 2 (set (subreg:V2DI (reg:DI 92) 0)
>         (vec_concat:V2DI (mem/c:DI (symbol_ref:SI ("c") [flags 0x2]  <var_decl 0x7f65a131fd80 c>) [1 c+0 S8 A64])
>             (const_int 0 [0]))) "hohohou.c":6:9 -1
>      (nil))
> on loads of memory.
> But it stv still does use paradoxical subregs:
> (insn 10 16 11 2 (set (subreg:V2DI (reg:DI 91) 0)
>         (minus:V2DI (subreg:V2DI (reg:DI 87) 0)
>             (subreg:V2DI (reg:DI 94) 0))) "hohohou.c":6:13 5003 {*subv2di3}
>      (expr_list:REG_DEAD (reg:DI 87)
>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>             (nil))))
> (insn 11 10 0 2 (set (mem/c:DI (symbol_ref:SI ("a") [flags 0x2]  <var_decl 0x7f65a131fc60 a>) [1 a+0 S8 A64])
>         (reg:DI 91)) "hohohou.c":6:5 76 {*movdi_internal}
>      (expr_list:REG_DEAD (reg:DI 91)
>         (nil)))
> so perhaps just using paradoxical subregs everywhere would be ok?
Yes, I think so.
And I find paradoxical subreg like (subreg:V4SF (reg:SF)) are not
allowed by validate_subreg until r11-621.
That's why post_reload splitter is needed here.
>         Jakub
>



--
BR,
Hongtao

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] i386: Fix up @xorsign<mode>3_1 [PR102224]
  2021-09-08 10:00     ` Hongtao Liu
@ 2021-09-08 10:02       ` Jakub Jelinek
  2021-09-08 10:15         ` Hongtao Liu
  2021-09-08 10:07       ` Jakub Jelinek
  2021-09-14  0:57       ` Andrew Pinski
  2 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2021-09-08 10:02 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Uros Bizjak, liuhongt, H.J. Lu, GCC Patches

On Wed, Sep 08, 2021 at 06:00:50PM +0800, Hongtao Liu wrote:
> Yes, I think so.
> And I find paradoxical subreg like (subreg:V4SF (reg:SF)) are not
> allowed by validate_subreg until r11-621.
> That's why post_reload splitter is needed here.

Following seems to work for all the testcases I've find (and in some
generates better code than the post-reload splitter):

2021-09-08  Jakub Jelinek  <jakub@redhat.com>
	    liuhongt  <hongtao.liu@intel.com>

	PR target/89984
	* config/i386/i386.md (@xorsign<mode>3_1): Remove.
	* config/i386/i386-expand.c (ix86_expand_xorsign): Expand right away
	into AND with mask and XOR, using paradoxical subregs.
	(ix86_split_xorsign): Remove.

	* gcc.target/i386/avx-pr102224.c: Fix up PR number.
	* gcc.dg/pr89984.c: New test.
	* gcc.target/i386/avx-pr89984.c: New test.

--- gcc/config/i386/i386.md.jj	2021-09-08 11:40:55.826534981 +0200
+++ gcc/config/i386/i386.md	2021-09-08 11:44:08.394828674 +0200
@@ -10918,20 +10918,6 @@ (define_expand "xorsign<mode>3"
   DONE;
 })
 
-(define_insn_and_split "@xorsign<mode>3_1"
-  [(set (match_operand:MODEF 0 "register_operand" "=&Yv,&Yv,&Yv")
-	(unspec:MODEF
-	  [(match_operand:MODEF 1 "register_operand" "Yv,0,Yv")
-	   (match_operand:MODEF 2 "register_operand" "0,Yv,Yv")
-	   (match_operand:<ssevecmode> 3 "nonimmediate_operand" "Yvm,Yvm,Yvm")]
-	  UNSPEC_XORSIGN))]
-  "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  "ix86_split_xorsign (operands); DONE;"
-  [(set_attr "isa" "*,avx,avx")])
-\f
 ;; One complement instructions
 
 (define_expand "one_cmpl<mode>2"
--- gcc/config/i386/i386-expand.c.jj	2021-09-08 11:40:55.824535010 +0200
+++ gcc/config/i386/i386-expand.c	2021-09-08 11:51:15.969819626 +0200
@@ -2270,7 +2270,7 @@ void
 ix86_expand_xorsign (rtx operands[])
 {
   machine_mode mode, vmode;
-  rtx dest, op0, op1, mask;
+  rtx dest, op0, op1, mask, x, temp;
 
   dest = operands[0];
   op0 = operands[1];
@@ -2285,60 +2285,15 @@ ix86_expand_xorsign (rtx operands[])
   else
     gcc_unreachable ();
 
+  temp = gen_reg_rtx (vmode);
   mask = ix86_build_signbit_mask (vmode, 0, 0);
 
-  emit_insn (gen_xorsign3_1 (mode, dest, op0, op1, mask));
-}
+  op1 = lowpart_subreg (vmode, op1, mode);
+  x = gen_rtx_AND (vmode, op1, mask);
+  emit_insn (gen_rtx_SET (temp, x));
 
-/* Deconstruct an xorsign operation into bit masks.  */
-
-void
-ix86_split_xorsign (rtx operands[])
-{
-  machine_mode mode, vmode;
-  rtx dest, op0, op1, mask, x;
-
-  dest = operands[0];
-  op0 = operands[1];
-  op1 = operands[2];
-  mask = operands[3];
-
-  mode = GET_MODE (dest);
-  vmode = GET_MODE (mask);
-
-  /* The constraints ensure that for non-AVX dest == op1 is
-     different from op0, and for AVX that at most two of
-     dest, op0 and op1 are the same register but the third one
-     is different.  */
-  if (rtx_equal_p (op0, op1))
-    {
-      gcc_assert (TARGET_AVX && !rtx_equal_p (op0, dest));
-      if (vmode == V4SFmode)
-	vmode = V4SImode;
-      else
-	{
-	  gcc_assert (vmode == V2DFmode);
-	  vmode = V2DImode;
-	}
-      mask = lowpart_subreg (vmode, mask, GET_MODE (mask));
-      if (MEM_P (mask))
-	{
-	  rtx msk = lowpart_subreg (vmode, dest, mode);
-	  emit_insn (gen_rtx_SET (msk, mask));
-	  mask = msk;
-	}
-      op0 = lowpart_subreg (vmode, op0, mode);
-      x = gen_rtx_AND (vmode, gen_rtx_NOT (vmode, mask), op0);
-    }
-  else
-    {
-      op1 = lowpart_subreg (vmode, op1, mode);
-      x = gen_rtx_AND (vmode, op1, mask);
-      emit_insn (gen_rtx_SET (op1, x));
-
-      op0 = lowpart_subreg (vmode, op0, mode);
-      x = gen_rtx_XOR (vmode, op1, op0);
-    }
+  op0 = lowpart_subreg (vmode, op0, mode);
+  x = gen_rtx_XOR (vmode, temp, op0);
 
   dest = lowpart_subreg (vmode, dest, mode);
   emit_insn (gen_rtx_SET (dest, x));
--- gcc/testsuite/gcc.target/i386/avx-pr102224.c.jj	2021-09-08 11:40:55.826534981 +0200
+++ gcc/testsuite/gcc.target/i386/avx-pr102224.c	2021-09-08 11:57:41.741386062 +0200
@@ -1,4 +1,4 @@
-/* PR tree-optimization/51581 */
+/* PR target/102224 */
 /* { dg-do run } */
 /* { dg-options "-O2 -mavx" } */
 /* { dg-require-effective-target avx } */
--- gcc/testsuite/gcc.dg/pr89984.c.jj	2021-09-08 11:56:33.799343240 +0200
+++ gcc/testsuite/gcc.dg/pr89984.c	2021-09-08 11:54:36.070001821 +0200
@@ -0,0 +1,20 @@
+/* PR target/89984 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+__attribute__((noipa)) float
+foo (float x, float y)
+{
+  return x * __builtin_copysignf (1.0f, y) + y;
+}
+
+int
+main ()
+{
+  if (foo (1.25f, 7.25f) != 1.25f + 7.25f
+      || foo (1.75f, -3.25f) != -1.75f + -3.25f
+      || foo (-2.25f, 7.5f) != -2.25f + 7.5f
+      || foo (-3.0f, -4.0f) != 3.0f + -4.0f)
+    __builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.target/i386/avx-pr89984.c.jj	2021-09-08 11:57:12.297800869 +0200
+++ gcc/testsuite/gcc.target/i386/avx-pr89984.c	2021-09-08 11:57:56.936172001 +0200
@@ -0,0 +1,23 @@
+/* PR target/89984 */
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx" } */
+/* { dg-require-effective-target avx } */
+
+#ifndef CHECK_H
+#define CHECK_H "avx-check.h"
+#endif
+#ifndef TEST
+#define TEST avx_test
+#endif
+
+#define main main1
+#include "../../gcc.dg/pr89984.c"
+#undef main
+
+#include CHECK_H
+
+static void
+TEST (void)
+{
+  main1 ();
+}


	Jakub


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] i386: Fix up @xorsign<mode>3_1 [PR102224]
  2021-09-08 10:00     ` Hongtao Liu
  2021-09-08 10:02       ` Jakub Jelinek
@ 2021-09-08 10:07       ` Jakub Jelinek
  2021-09-14  0:57       ` Andrew Pinski
  2 siblings, 0 replies; 10+ messages in thread
From: Jakub Jelinek @ 2021-09-08 10:07 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Uros Bizjak, liuhongt, H.J. Lu, GCC Patches

On Wed, Sep 08, 2021 at 06:00:50PM +0800, Hongtao Liu wrote:
> And I find paradoxical subreg like (subreg:V4SF (reg:SF)) are not
> allowed by validate_subreg until r11-621.

My reading of the r11-621 change is that it allowed (subreg:V4SF (reg:V2SF) 0)
but that (subreg:V4SF (reg:SF) 0) has been valid even before that.
In any case, the PR89984 fix was a missed-optimization fix, so we don't need
to backport that and thus don't need to backport the follow-up patch either.

	Jakub


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] i386: Fix up @xorsign<mode>3_1 [PR102224]
  2021-09-08 10:02       ` Jakub Jelinek
@ 2021-09-08 10:15         ` Hongtao Liu
  0 siblings, 0 replies; 10+ messages in thread
From: Hongtao Liu @ 2021-09-08 10:15 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, liuhongt, H.J. Lu, GCC Patches

On Wed, Sep 8, 2021 at 6:02 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Wed, Sep 08, 2021 at 06:00:50PM +0800, Hongtao Liu wrote:
> > Yes, I think so.
> > And I find paradoxical subreg like (subreg:V4SF (reg:SF)) are not
> > allowed by validate_subreg until r11-621.
> > That's why post_reload splitter is needed here.
>
> Following seems to work for all the testcases I've find (and in some
> generates better code than the post-reload splitter):
>
> 2021-09-08  Jakub Jelinek  <jakub@redhat.com>
>             liuhongt  <hongtao.liu@intel.com>
>
>         PR target/89984
>         * config/i386/i386.md (@xorsign<mode>3_1): Remove.
>         * config/i386/i386-expand.c (ix86_expand_xorsign): Expand right away
>         into AND with mask and XOR, using paradoxical subregs.
>         (ix86_split_xorsign): Remove.
Also remove this from i386-protos.h.
>
>         * gcc.target/i386/avx-pr102224.c: Fix up PR number.
>         * gcc.dg/pr89984.c: New test.
>         * gcc.target/i386/avx-pr89984.c: New test.
>
Other LGTM.
> --- gcc/config/i386/i386.md.jj  2021-09-08 11:40:55.826534981 +0200
> +++ gcc/config/i386/i386.md     2021-09-08 11:44:08.394828674 +0200
> @@ -10918,20 +10918,6 @@ (define_expand "xorsign<mode>3"
>    DONE;
>  })
>
> -(define_insn_and_split "@xorsign<mode>3_1"
> -  [(set (match_operand:MODEF 0 "register_operand" "=&Yv,&Yv,&Yv")
> -       (unspec:MODEF
> -         [(match_operand:MODEF 1 "register_operand" "Yv,0,Yv")
> -          (match_operand:MODEF 2 "register_operand" "0,Yv,Yv")
> -          (match_operand:<ssevecmode> 3 "nonimmediate_operand" "Yvm,Yvm,Yvm")]
> -         UNSPEC_XORSIGN))]
> -  "SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH"
> -  "#"
> -  "&& reload_completed"
> -  [(const_int 0)]
> -  "ix86_split_xorsign (operands); DONE;"
> -  [(set_attr "isa" "*,avx,avx")])
> -
>  ;; One complement instructions
>
>  (define_expand "one_cmpl<mode>2"
> --- gcc/config/i386/i386-expand.c.jj    2021-09-08 11:40:55.824535010 +0200
> +++ gcc/config/i386/i386-expand.c       2021-09-08 11:51:15.969819626 +0200
> @@ -2270,7 +2270,7 @@ void
>  ix86_expand_xorsign (rtx operands[])
>  {
>    machine_mode mode, vmode;
> -  rtx dest, op0, op1, mask;
> +  rtx dest, op0, op1, mask, x, temp;
>
>    dest = operands[0];
>    op0 = operands[1];
> @@ -2285,60 +2285,15 @@ ix86_expand_xorsign (rtx operands[])
>    else
>      gcc_unreachable ();
>
> +  temp = gen_reg_rtx (vmode);
>    mask = ix86_build_signbit_mask (vmode, 0, 0);
>
> -  emit_insn (gen_xorsign3_1 (mode, dest, op0, op1, mask));
> -}
> +  op1 = lowpart_subreg (vmode, op1, mode);
> +  x = gen_rtx_AND (vmode, op1, mask);
> +  emit_insn (gen_rtx_SET (temp, x));
>
> -/* Deconstruct an xorsign operation into bit masks.  */
> -
> -void
> -ix86_split_xorsign (rtx operands[])
> -{
> -  machine_mode mode, vmode;
> -  rtx dest, op0, op1, mask, x;
> -
> -  dest = operands[0];
> -  op0 = operands[1];
> -  op1 = operands[2];
> -  mask = operands[3];
> -
> -  mode = GET_MODE (dest);
> -  vmode = GET_MODE (mask);
> -
> -  /* The constraints ensure that for non-AVX dest == op1 is
> -     different from op0, and for AVX that at most two of
> -     dest, op0 and op1 are the same register but the third one
> -     is different.  */
> -  if (rtx_equal_p (op0, op1))
> -    {
> -      gcc_assert (TARGET_AVX && !rtx_equal_p (op0, dest));
> -      if (vmode == V4SFmode)
> -       vmode = V4SImode;
> -      else
> -       {
> -         gcc_assert (vmode == V2DFmode);
> -         vmode = V2DImode;
> -       }
> -      mask = lowpart_subreg (vmode, mask, GET_MODE (mask));
> -      if (MEM_P (mask))
> -       {
> -         rtx msk = lowpart_subreg (vmode, dest, mode);
> -         emit_insn (gen_rtx_SET (msk, mask));
> -         mask = msk;
> -       }
> -      op0 = lowpart_subreg (vmode, op0, mode);
> -      x = gen_rtx_AND (vmode, gen_rtx_NOT (vmode, mask), op0);
> -    }
> -  else
> -    {
> -      op1 = lowpart_subreg (vmode, op1, mode);
> -      x = gen_rtx_AND (vmode, op1, mask);
> -      emit_insn (gen_rtx_SET (op1, x));
> -
> -      op0 = lowpart_subreg (vmode, op0, mode);
> -      x = gen_rtx_XOR (vmode, op1, op0);
> -    }
> +  op0 = lowpart_subreg (vmode, op0, mode);
> +  x = gen_rtx_XOR (vmode, temp, op0);
>
>    dest = lowpart_subreg (vmode, dest, mode);
>    emit_insn (gen_rtx_SET (dest, x));
> --- gcc/testsuite/gcc.target/i386/avx-pr102224.c.jj     2021-09-08 11:40:55.826534981 +0200
> +++ gcc/testsuite/gcc.target/i386/avx-pr102224.c        2021-09-08 11:57:41.741386062 +0200
> @@ -1,4 +1,4 @@
> -/* PR tree-optimization/51581 */
> +/* PR target/102224 */
>  /* { dg-do run } */
>  /* { dg-options "-O2 -mavx" } */
>  /* { dg-require-effective-target avx } */
> --- gcc/testsuite/gcc.dg/pr89984.c.jj   2021-09-08 11:56:33.799343240 +0200
> +++ gcc/testsuite/gcc.dg/pr89984.c      2021-09-08 11:54:36.070001821 +0200
> @@ -0,0 +1,20 @@
> +/* PR target/89984 */
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +__attribute__((noipa)) float
> +foo (float x, float y)
> +{
> +  return x * __builtin_copysignf (1.0f, y) + y;
> +}
> +
> +int
> +main ()
> +{
> +  if (foo (1.25f, 7.25f) != 1.25f + 7.25f
> +      || foo (1.75f, -3.25f) != -1.75f + -3.25f
> +      || foo (-2.25f, 7.5f) != -2.25f + 7.5f
> +      || foo (-3.0f, -4.0f) != 3.0f + -4.0f)
> +    __builtin_abort ();
> +  return 0;
> +}
> --- gcc/testsuite/gcc.target/i386/avx-pr89984.c.jj      2021-09-08 11:57:12.297800869 +0200
> +++ gcc/testsuite/gcc.target/i386/avx-pr89984.c 2021-09-08 11:57:56.936172001 +0200
> @@ -0,0 +1,23 @@
> +/* PR target/89984 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx" } */
> +/* { dg-require-effective-target avx } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "avx-check.h"
> +#endif
> +#ifndef TEST
> +#define TEST avx_test
> +#endif
> +
> +#define main main1
> +#include "../../gcc.dg/pr89984.c"
> +#undef main
> +
> +#include CHECK_H
> +
> +static void
> +TEST (void)
> +{
> +  main1 ();
> +}
>
>
>         Jakub
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] i386: Fix up @xorsign<mode>3_1 [PR102224]
  2021-09-08 10:00     ` Hongtao Liu
  2021-09-08 10:02       ` Jakub Jelinek
  2021-09-08 10:07       ` Jakub Jelinek
@ 2021-09-14  0:57       ` Andrew Pinski
  2021-09-14  2:06         ` Hongtao Liu
  2 siblings, 1 reply; 10+ messages in thread
From: Andrew Pinski @ 2021-09-14  0:57 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Jakub Jelinek, liuhongt, GCC Patches

On Wed, Sep 8, 2021 at 2:55 AM Hongtao Liu via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Wed, Sep 8, 2021 at 5:33 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Wed, Sep 08, 2021 at 05:23:40PM +0800, Hongtao Liu wrote:
> > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > > >
> > > Patch LGTM.
> >
> > Thanks, committed.
> >
> > > PS:
> > >   I'm curious why we need the  post_reload splitter @xorsign<mode>3_1
> > > for scalar mode, can't we just expand them into and/xor operations in
> > > the expander, just like vector modes did.
> > > Let me do some experiments to see whether it is ok to remove the splitter.
> >
> > I bet it is the question how should the code look like before RA.
> > stv is somewhat related, but as that replaces whole chains, it can do:
> > (insn 14 5 6 2 (set (subreg:V2DI (reg:DI 92) 0)
> >         (vec_concat:V2DI (mem/c:DI (symbol_ref:SI ("c") [flags 0x2]  <var_decl 0x7f65a131fd80 c>) [1 c+0 S8 A64])
> >             (const_int 0 [0]))) "hohohou.c":6:9 -1
> >      (nil))
> > on loads of memory.
> > But it stv still does use paradoxical subregs:
> > (insn 10 16 11 2 (set (subreg:V2DI (reg:DI 91) 0)
> >         (minus:V2DI (subreg:V2DI (reg:DI 87) 0)
> >             (subreg:V2DI (reg:DI 94) 0))) "hohohou.c":6:13 5003 {*subv2di3}
> >      (expr_list:REG_DEAD (reg:DI 87)
> >         (expr_list:REG_UNUSED (reg:CC 17 flags)
> >             (nil))))
> > (insn 11 10 0 2 (set (mem/c:DI (symbol_ref:SI ("a") [flags 0x2]  <var_decl 0x7f65a131fc60 a>) [1 a+0 S8 A64])
> >         (reg:DI 91)) "hohohou.c":6:5 76 {*movdi_internal}
> >      (expr_list:REG_DEAD (reg:DI 91)
> >         (nil)))
> > so perhaps just using paradoxical subregs everywhere would be ok?
> Yes, I think so.
> And I find paradoxical subreg like (subreg:V4SF (reg:SF)) are not
> allowed by validate_subreg until r11-621.
> That's why post_reload splitter is needed here.

That is not exactly true.  It has been around since before 2005.  See
https://gcc.gnu.org/PR24436 which is referencing the fixme comment in
validate_subreg.

Thanks,
Andrew Pinski

Thanks,
Andrew Pinski

> >         Jakub
> >
>
>
>
> --
> BR,
> Hongtao

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] i386: Fix up @xorsign<mode>3_1 [PR102224]
  2021-09-14  0:57       ` Andrew Pinski
@ 2021-09-14  2:06         ` Hongtao Liu
  2021-09-14  2:18           ` Hongtao Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Hongtao Liu @ 2021-09-14  2:06 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Jakub Jelinek, liuhongt, GCC Patches

On Tue, Sep 14, 2021 at 8:58 AM Andrew Pinski <pinskia@gmail.com> wrote:
>
> On Wed, Sep 8, 2021 at 2:55 AM Hongtao Liu via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Wed, Sep 8, 2021 at 5:33 PM Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Wed, Sep 08, 2021 at 05:23:40PM +0800, Hongtao Liu wrote:
> > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > > > >
> > > > Patch LGTM.
> > >
> > > Thanks, committed.
> > >
> > > > PS:
> > > >   I'm curious why we need the  post_reload splitter @xorsign<mode>3_1
> > > > for scalar mode, can't we just expand them into and/xor operations in
> > > > the expander, just like vector modes did.
> > > > Let me do some experiments to see whether it is ok to remove the splitter.
> > >
> > > I bet it is the question how should the code look like before RA.
> > > stv is somewhat related, but as that replaces whole chains, it can do:
> > > (insn 14 5 6 2 (set (subreg:V2DI (reg:DI 92) 0)
> > >         (vec_concat:V2DI (mem/c:DI (symbol_ref:SI ("c") [flags 0x2]  <var_decl 0x7f65a131fd80 c>) [1 c+0 S8 A64])
> > >             (const_int 0 [0]))) "hohohou.c":6:9 -1
> > >      (nil))
> > > on loads of memory.
> > > But it stv still does use paradoxical subregs:
> > > (insn 10 16 11 2 (set (subreg:V2DI (reg:DI 91) 0)
> > >         (minus:V2DI (subreg:V2DI (reg:DI 87) 0)
> > >             (subreg:V2DI (reg:DI 94) 0))) "hohohou.c":6:13 5003 {*subv2di3}
> > >      (expr_list:REG_DEAD (reg:DI 87)
> > >         (expr_list:REG_UNUSED (reg:CC 17 flags)
> > >             (nil))))
> > > (insn 11 10 0 2 (set (mem/c:DI (symbol_ref:SI ("a") [flags 0x2]  <var_decl 0x7f65a131fc60 a>) [1 a+0 S8 A64])
> > >         (reg:DI 91)) "hohohou.c":6:5 76 {*movdi_internal}
> > >      (expr_list:REG_DEAD (reg:DI 91)
> > >         (nil)))
> > > so perhaps just using paradoxical subregs everywhere would be ok?
> > Yes, I think so.
> > And I find paradoxical subreg like (subreg:V4SF (reg:SF)) are not
> > allowed by validate_subreg until r11-621.
> > That's why post_reload splitter is needed here.
>
> That is not exactly true.  It has been around since before 2005.  See
> https://gcc.gnu.org/PR24436 which is referencing the fixme comment in
> validate_subreg.
We also have things like (subreg:V4SF(reg:V2SF) 0), the problem of
defining post_reload splitter with V2SF is movv2sf is only defined
under TARGET_64BIT if there's no mmx(so should we also enable 64-bit
vector 32-bit mode?).
And for xorsign w/o post_reload splitter, the code is cleaner and even
more optimal.
>
> Thanks,
> Andrew Pinski
>
> Thanks,
> Andrew Pinski
>
> > >         Jakub
> > >
> >
> >
> >
> > --
> > BR,
> > Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] i386: Fix up @xorsign<mode>3_1 [PR102224]
  2021-09-14  2:06         ` Hongtao Liu
@ 2021-09-14  2:18           ` Hongtao Liu
  0 siblings, 0 replies; 10+ messages in thread
From: Hongtao Liu @ 2021-09-14  2:18 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Jakub Jelinek, liuhongt, GCC Patches

On Tue, Sep 14, 2021 at 10:06 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Tue, Sep 14, 2021 at 8:58 AM Andrew Pinski <pinskia@gmail.com> wrote:
> >
> > On Wed, Sep 8, 2021 at 2:55 AM Hongtao Liu via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > On Wed, Sep 8, 2021 at 5:33 PM Jakub Jelinek <jakub@redhat.com> wrote:
> > > >
> > > > On Wed, Sep 08, 2021 at 05:23:40PM +0800, Hongtao Liu wrote:
> > > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > > > > >
> > > > > Patch LGTM.
> > > >
> > > > Thanks, committed.
> > > >
> > > > > PS:
> > > > >   I'm curious why we need the  post_reload splitter @xorsign<mode>3_1
> > > > > for scalar mode, can't we just expand them into and/xor operations in
> > > > > the expander, just like vector modes did.
> > > > > Let me do some experiments to see whether it is ok to remove the splitter.
> > > >
> > > > I bet it is the question how should the code look like before RA.
> > > > stv is somewhat related, but as that replaces whole chains, it can do:
> > > > (insn 14 5 6 2 (set (subreg:V2DI (reg:DI 92) 0)
> > > >         (vec_concat:V2DI (mem/c:DI (symbol_ref:SI ("c") [flags 0x2]  <var_decl 0x7f65a131fd80 c>) [1 c+0 S8 A64])
> > > >             (const_int 0 [0]))) "hohohou.c":6:9 -1
> > > >      (nil))
> > > > on loads of memory.
> > > > But it stv still does use paradoxical subregs:
> > > > (insn 10 16 11 2 (set (subreg:V2DI (reg:DI 91) 0)
> > > >         (minus:V2DI (subreg:V2DI (reg:DI 87) 0)
> > > >             (subreg:V2DI (reg:DI 94) 0))) "hohohou.c":6:13 5003 {*subv2di3}
> > > >      (expr_list:REG_DEAD (reg:DI 87)
> > > >         (expr_list:REG_UNUSED (reg:CC 17 flags)
> > > >             (nil))))
> > > > (insn 11 10 0 2 (set (mem/c:DI (symbol_ref:SI ("a") [flags 0x2]  <var_decl 0x7f65a131fc60 a>) [1 a+0 S8 A64])
> > > >         (reg:DI 91)) "hohohou.c":6:5 76 {*movdi_internal}
> > > >      (expr_list:REG_DEAD (reg:DI 91)
> > > >         (nil)))
> > > > so perhaps just using paradoxical subregs everywhere would be ok?
> > > Yes, I think so.
> > > And I find paradoxical subreg like (subreg:V4SF (reg:SF)) are not
> > > allowed by validate_subreg until r11-621.
> > > That's why post_reload splitter is needed here.
> >
> > That is not exactly true.  It has been around since before 2005.  See
> > https://gcc.gnu.org/PR24436 which is referencing the fixme comment in
> > validate_subreg.
> We also have things like (subreg:V4SF(reg:V2SF) 0), the problem of
> defining post_reload splitter with V2SF is movv2sf is only defined
> under TARGET_64BIT if there's no mmx(so should we also enable 64-bit
> vector 32-bit mode?).
> And for xorsign w/o post_reload splitter, the code is cleaner and even
> more optimal.
And if we allow something like subreg:V4SF (reg:TI), it seems we could
have something like
mov reg:SI, subreg:SI (reg:SF)
mov reg:TI, subreg:TI (reg:SI)
mov reg:V4SF, subreg:V4SF (reg:TI)

> >
> > Thanks,
> > Andrew Pinski
> >
> > Thanks,
> > Andrew Pinski
> >
> > > >         Jakub
> > > >
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-09-14  2:19 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-08  7:42 [PATCH] i386: Fix up @xorsign<mode>3_1 [PR102224] Jakub Jelinek
2021-09-08  9:23 ` Hongtao Liu
2021-09-08  9:33   ` Jakub Jelinek
2021-09-08 10:00     ` Hongtao Liu
2021-09-08 10:02       ` Jakub Jelinek
2021-09-08 10:15         ` Hongtao Liu
2021-09-08 10:07       ` Jakub Jelinek
2021-09-14  0:57       ` Andrew Pinski
2021-09-14  2:06         ` Hongtao Liu
2021-09-14  2:18           ` Hongtao Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).