[PATCH] [PR rtl-optimization/97249]Simplify vec

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
@ 2020-10-13  8:40 Hongtao Liu
  2020-10-13 19:59 ` Segher Boessenkool
  0 siblings, 1 reply; 20+ messages in thread
From: Hongtao Liu @ 2020-10-13  8:40 UTC (permalink / raw)
  To: GCC Patches, Segher Boessenkool

[-- Attachment #1: Type: text/plain, Size: 469 bytes --]

Hi:
  For rtx like
  (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
                   (parallel [(const_int 0) (const_int 1)]))
 it could be simplified as inner.

  Bootstrap is ok, regression test on i386 backend is ok.

gcc/ChangeLog
        PR rtl-optimization/97249
        * simplify-rtx.c (simplify_binary_operation_1): Simplify
        vec_select of paradoxical subreg.

gcc/testsuite/ChangeLog

        * gcc.target/i386/pr97249-1.c: New test.

-- 
BR,
Hongtao

[-- Attachment #2: 0001-Simplify-vec_select-of-paradoxical-subreg.patch --]
[-- Type: text/x-patch, Size: 3180 bytes --]

From c00369aa36d2e169b59287c58872c915953dd2a2 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Tue, 13 Oct 2020 15:35:29 +0800
Subject: [PATCH] Simplify vec_select of paradoxical subreg.

For rtx like
  (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
		   (parallel [(const_int 0) (const_int 1)]))
it could be simplified as inner.

gcc/ChangeLog
	PR rtl-optimization/97249
	* simplify-rtx.c (simplify_binary_operation_1): Simplify
	vec_select of paradoxical subreg.

gcc/testsuite/ChangeLog

	* gcc.target/i386/pr97249-1.c: New test.
---
 gcc/simplify-rtx.c                        | 27 ++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 +++++++++++++++++++++++
 2 files changed, 57 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 869f0d11b2e..9c397157f28 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4170,6 +4170,33 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 		    return subop1;
 		}
 	    }
+
+	  /* For cases like
+	     (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
+			      (parallel [(const_int 0) (const_int 1)])).
+	     return inner directly.  */
+	  if (GET_CODE (trueop0) == SUBREG
+	      && paradoxical_subreg_p (trueop0)
+	      && mode == GET_MODE (XEXP (trueop0, 0))
+	      && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
+	      && (GET_MODE_NUNITS (mode)).is_constant (&l1)
+	      && l0 % l1 == 0)
+	    {
+	      gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
+	      unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1;
+	      unsigned HOST_WIDE_INT sel = 0;
+	      int i = 0;
+	      for (;i != l1; i++)
+		{
+		  rtx j = XVECEXP (trueop1, 0, i);
+		  if (!CONST_INT_P (j))
+		    break;
+		  sel |= HOST_WIDE_INT_1U << UINTVAL (j);
+		}
+	      /* ??? Need to simplify XEXP (trueop0, 0) here.  */
+	      if (sel == expect)
+		return XEXP (trueop0, 0);
+	    }
 	}
 
       if (XVECLEN (trueop1, 0) == 1
diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c
new file mode 100644
index 00000000000..bc34aa8baa6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c
@@ -0,0 +1,30 @@
+/* PR target/97249  */
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O3 -masm=att" } */
+/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
+
+void
+foo (unsigned char* p1, unsigned char* p2, short* __restrict p3)
+{
+    for (int i = 0 ; i != 8; i++)
+     p3[i] = p1[i] + p2[i];
+     return;
+}
+
+void
+foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3)
+{
+    for (int i = 0 ; i != 4; i++)
+     p3[i] = p1[i] + p2[i];
+     return;
+}
+
+void
+foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3)
+{
+    for (int i = 0 ; i != 2; i++)
+      p3[i] = (long long)p1[i] + (long long)p2[i];
+     return;
+}
-- 
2.18.1


[-- Attachment #3: 0001-Simplify-vec_select-of-paradoxical-subreg.patch --]
[-- Type: text/x-patch, Size: 3180 bytes --]

From c00369aa36d2e169b59287c58872c915953dd2a2 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Tue, 13 Oct 2020 15:35:29 +0800
Subject: [PATCH] Simplify vec_select of paradoxical subreg.

For rtx like
  (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
		   (parallel [(const_int 0) (const_int 1)]))
it could be simplified as inner.

gcc/ChangeLog
	PR rtl-optimization/97249
	* simplify-rtx.c (simplify_binary_operation_1): Simplify
	vec_select of paradoxical subreg.

gcc/testsuite/ChangeLog

	* gcc.target/i386/pr97249-1.c: New test.
---
 gcc/simplify-rtx.c                        | 27 ++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 +++++++++++++++++++++++
 2 files changed, 57 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 869f0d11b2e..9c397157f28 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4170,6 +4170,33 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 		    return subop1;
 		}
 	    }
+
+	  /* For cases like
+	     (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
+			      (parallel [(const_int 0) (const_int 1)])).
+	     return inner directly.  */
+	  if (GET_CODE (trueop0) == SUBREG
+	      && paradoxical_subreg_p (trueop0)
+	      && mode == GET_MODE (XEXP (trueop0, 0))
+	      && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
+	      && (GET_MODE_NUNITS (mode)).is_constant (&l1)
+	      && l0 % l1 == 0)
+	    {
+	      gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
+	      unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1;
+	      unsigned HOST_WIDE_INT sel = 0;
+	      int i = 0;
+	      for (;i != l1; i++)
+		{
+		  rtx j = XVECEXP (trueop1, 0, i);
+		  if (!CONST_INT_P (j))
+		    break;
+		  sel |= HOST_WIDE_INT_1U << UINTVAL (j);
+		}
+	      /* ??? Need to simplify XEXP (trueop0, 0) here.  */
+	      if (sel == expect)
+		return XEXP (trueop0, 0);
+	    }
 	}
 
       if (XVECLEN (trueop1, 0) == 1
diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c
new file mode 100644
index 00000000000..bc34aa8baa6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c
@@ -0,0 +1,30 @@
+/* PR target/97249  */
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O3 -masm=att" } */
+/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
+
+void
+foo (unsigned char* p1, unsigned char* p2, short* __restrict p3)
+{
+    for (int i = 0 ; i != 8; i++)
+     p3[i] = p1[i] + p2[i];
+     return;
+}
+
+void
+foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3)
+{
+    for (int i = 0 ; i != 4; i++)
+     p3[i] = p1[i] + p2[i];
+     return;
+}
+
+void
+foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3)
+{
+    for (int i = 0 ; i != 2; i++)
+      p3[i] = (long long)p1[i] + (long long)p2[i];
+     return;
+}
-- 
2.18.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
  2020-10-13  8:40 [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg Hongtao Liu
@ 2020-10-13 19:59 ` Segher Boessenkool
  2020-10-14  5:43   ` Hongtao Liu
  0 siblings, 1 reply; 20+ messages in thread
From: Segher Boessenkool @ 2020-10-13 19:59 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: GCC Patches, H. J. Lu, wwwhhhyyy333

Hi!

On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
>   For rtx like
>   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
>                    (parallel [(const_int 0) (const_int 1)]))
>  it could be simplified as inner.

You could even simplify any vec_select of a subreg of X to just a
vec_select of X, by changing the selection vector a bit (well, only do
this if that is a constant vector, I suppose).  Not just for paradoxical
subregs either, just for *all* subregs.

> gcc/ChangeLog
>         PR rtl-optimization/97249
>         * simplify-rtx.c (simplify_binary_operation_1): Simplify
>         vec_select of paradoxical subreg.
> 
> gcc/testsuite/ChangeLog
> 
>         * gcc.target/i386/pr97249-1.c: New test.

> +	  /* For cases like
> +	     (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> +			      (parallel [(const_int 0) (const_int 1)])).
> +	     return inner directly.  */
> +	  if (GET_CODE (trueop0) == SUBREG
> +	      && paradoxical_subreg_p (trueop0)
> +	      && mode == GET_MODE (XEXP (trueop0, 0))
> +	      && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
> +	      && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> +	      && l0 % l1 == 0)

Why this?  Why does the number of elements of the input have to divide
that of the output?

> +	    {
> +	      gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
> +	      unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1;
> +	      unsigned HOST_WIDE_INT sel = 0;
> +	      int i = 0;
> +	      for (;i != l1; i++)

  for (int i = 0; i != l1; i++)

> +		{
> +		  rtx j = XVECEXP (trueop1, 0, i);
> +		  if (!CONST_INT_P (j))
> +		    break;
> +		  sel |= HOST_WIDE_INT_1U << UINTVAL (j);
> +		}
> +	      /* ??? Need to simplify XEXP (trueop0, 0) here.  */
> +	      if (sel == expect)
> +		return XEXP (trueop0, 0);
> +	    }
>  	}

If you just handle the much more generic case, all the other vec_select
simplifications can be done as well, not just this one.

> +/* PR target/97249  */
> +/* { dg-do compile } */
> +/* { dg-options "-mavx2 -O3 -masm=att" } */
> +/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */

I don't know enough about the x86 backend to know if this is exactly
what you need in the testsuite.  I do know a case of backslashitis when
I see one though -- you might want to use {} instead of "", and perhaps
\m and \M and \s etc.  And to make sure things are on one line, don't do
all that nastiness with [^\n], just start the RE with (?n) :-)


Segher

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
  2020-10-13 19:59 ` Segher Boessenkool
@ 2020-10-14  5:43   ` Hongtao Liu
  2020-10-14 17:35     ` Segher Boessenkool
  0 siblings, 1 reply; 20+ messages in thread
From: Hongtao Liu @ 2020-10-14  5:43 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: GCC Patches, H. J. Lu, wwwhhhyyy333

[-- Attachment #1: Type: text/plain, Size: 3502 bytes --]

On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
> >   For rtx like
> >   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> >                    (parallel [(const_int 0) (const_int 1)]))
> >  it could be simplified as inner.
>
> You could even simplify any vec_select of a subreg of X to just a
> vec_select of X, by changing the selection vector a bit (well, only do

Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to selection.

> this if that is a constant vector, I suppose).  Not just for paradoxical
> subregs either, just for *all* subregs.
>

Yes, and only when X has the same inner mode and more elements.

> > gcc/ChangeLog
> >         PR rtl-optimization/97249
> >         * simplify-rtx.c (simplify_binary_operation_1): Simplify
> >         vec_select of paradoxical subreg.
> >
> > gcc/testsuite/ChangeLog
> >
> >         * gcc.target/i386/pr97249-1.c: New test.
>
> > +       /* For cases like
> > +          (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> > +                           (parallel [(const_int 0) (const_int 1)])).
> > +          return inner directly.  */
> > +       if (GET_CODE (trueop0) == SUBREG
> > +           && paradoxical_subreg_p (trueop0)
> > +           && mode == GET_MODE (XEXP (trueop0, 0))
> > +           && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
> > +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> > +           && l0 % l1 == 0)
>
> Why this?  Why does the number of elements of the input have to divide
> that of the output?
>

Removed, also add condition for my upper comments.

> > +         {
> > +           gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
> > +           unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1;
> > +           unsigned HOST_WIDE_INT sel = 0;
> > +           int i = 0;
> > +           for (;i != l1; i++)
>
>   for (int i = 0; i != l1; i++)
>
> > +             {
> > +               rtx j = XVECEXP (trueop1, 0, i);
> > +               if (!CONST_INT_P (j))
> > +                 break;
> > +               sel |= HOST_WIDE_INT_1U << UINTVAL (j);
> > +             }
> > +           /* ??? Need to simplify XEXP (trueop0, 0) here.  */
> > +           if (sel == expect)
> > +             return XEXP (trueop0, 0);
> > +         }
> >       }
>
> If you just handle the much more generic case, all the other vec_select
> simplifications can be done as well, not just this one.
>

Yes, changed, also selection should be inside the elements of X.

> > +/* PR target/97249  */
> > +/* { dg-do compile } */
> > +/* { dg-options "-mavx2 -O3 -masm=att" } */
> > +/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> > +/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> > +/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
>
> I don't know enough about the x86 backend to know if this is exactly
> what you need in the testsuite.  I do know a case of backslashitis when
> I see one though -- you might want to use {} instead of "", and perhaps
> \m and \M and \s etc.  And to make sure things are on one line, don't do
> all that nastiness with [^\n], just start the RE with (?n) :-)
>

Yes, changed and it's very clean with usage of (?n) and {}.

>
> Segher

Update patch.

-- 
BR,
Hongtao

[-- Attachment #2: 0001-Simplify-vec_select-of-a-subreg-of-X-to-just-a-vec_s-v2.patch --]
[-- Type: text/x-patch, Size: 3473 bytes --]

From df71eb46e394e5b778c69e9e8f25b301997e365d Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Tue, 13 Oct 2020 15:35:29 +0800
Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of
 X.

gcc/ChangeLog
	PR rtl-optimization/97249
	* simplify-rtx.c (simplify_binary_operation_1): Simplify
	vec_select of a subreg of X to a vec_select of X when
	available.

gcc/testsuite/ChangeLog

	* gcc.target/i386/pr97249-1.c: New test.
---
 gcc/simplify-rtx.c                        | 44 +++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++
 2 files changed, 74 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 869f0d11b2e..8a10b6cf4d5 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4170,6 +4170,50 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 		    return subop1;
 		}
 	    }
+
+	  /* Simplify vec_select of a subreg of X to just a vec_select of X
+	     when available.  */
+	  int l2;
+	  if (GET_CODE (trueop0) == SUBREG
+	      && (GET_MODE_INNER (mode)
+		  == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))))
+	      && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
+	      && (GET_MODE_NUNITS (mode)).is_constant (&l1)
+	      && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0))))
+		  .is_constant (&l2)
+	      && known_le (l1, l2))
+	    {
+	      unsigned HOST_WIDE_INT subreg_offset = 0;
+	      gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
+	      gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0),
+					   GET_MODE_SIZE (GET_MODE_INNER (mode)),
+					   &subreg_offset));
+	      bool success = true;
+	      for (int i = 0;i != l1; i++)
+		{
+		  rtx j = XVECEXP (trueop1, 0, i);
+		  if (!CONST_INT_P (j)
+		      || known_ge (UINTVAL (j), l2 - subreg_offset))
+		    {
+		      success = false;
+		      break;
+		    }
+		}
+	      if (success)
+		{
+		  rtx par = trueop1;
+		  if (subreg_offset)
+		    {
+		      rtvec vec = rtvec_alloc (l1);
+		      for (int i = 0; i < l1; i++)
+			RTVEC_ELT (vec, i)
+			  = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i)
+					     + subreg_offset));
+		      par = gen_rtx_PARALLEL (VOIDmode, vec);
+		    }
+		  return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par);
+		}
+	    }
 	}
 
       if (XVECLEN (trueop1, 0) == 1
diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c
new file mode 100644
index 00000000000..4478a34a9f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c
@@ -0,0 +1,30 @@
+/* PR target/97249  */
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O3 -masm=att" } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxbw[ \t]+\(.*%xmm[0-9]} 2 } } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxwd[ \t]+\(.*%xmm[0-9]} 2 } } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxdq[ \t]+\(.*%xmm[0-9]} 2 } } */
+
+void
+foo (unsigned char* p1, unsigned char* p2, short* __restrict p3)
+{
+    for (int i = 0 ; i != 8; i++)
+     p3[i] = p1[i] + p2[i];
+     return;
+}
+
+void
+foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3)
+{
+    for (int i = 0 ; i != 4; i++)
+     p3[i] = p1[i] + p2[i];
+     return;
+}
+
+void
+foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3)
+{
+    for (int i = 0 ; i != 2; i++)
+      p3[i] = (long long)p1[i] + (long long)p2[i];
+     return;
+}
-- 
2.18.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
  2020-10-14  5:43   ` Hongtao Liu
@ 2020-10-14 17:35     ` Segher Boessenkool
  2020-10-14 17:55       ` Richard Biener
  2020-10-15  8:14       ` Hongtao Liu
  0 siblings, 2 replies; 20+ messages in thread
From: Segher Boessenkool @ 2020-10-14 17:35 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: GCC Patches, H. J. Lu, wwwhhhyyy333

Hi!

On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote:
> On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
> > >   For rtx like
> > >   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> > >                    (parallel [(const_int 0) (const_int 1)]))
> > >  it could be simplified as inner.
> >
> > You could even simplify any vec_select of a subreg of X to just a
> > vec_select of X, by changing the selection vector a bit (well, only do
> 
> Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to selection.

Exactly.

> > this if that is a constant vector, I suppose).  Not just for paradoxical
> > subregs either, just for *all* subregs.
> 
> Yes, and only when X has the same inner mode and more elements.

No, for *all*.  The mode of the first argument of vec_select does not
have to equal its result mode.

Any (constant indices) vec_select of a subreg can be written as just a
vec_select.

> +	  /* Simplify vec_select of a subreg of X to just a vec_select of X
> +	     when available.  */

What does "when available" mean here?

> +	  int l2;
> +	  if (GET_CODE (trueop0) == SUBREG
> +	      && (GET_MODE_INNER (mode)
> +		  == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))))

Don't use unnecessary parens here please, it makes it harder to read
(there are quite enough parens already :-) )

> +	      gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0),
> +					   GET_MODE_SIZE (GET_MODE_INNER (mode)),
> +					   &subreg_offset));

Why is this needed?

> +	      bool success = true;
> +	      for (int i = 0;i != l1; i++)

(space after ; )

> +		{
> +		  rtx j = XVECEXP (trueop1, 0, i);

(i and j and k ususally are integers, not rtx)

> +		  if (!CONST_INT_P (j)
> +		      || known_ge (UINTVAL (j), l2 - subreg_offset))
> +		    {
> +		      success = false;
> +		      break;
> +		    }
> +		}

You don't have to test if the input RTL is valid.  You can assume it is.

> +	      if (success)
> +		{
> +		  rtx par = trueop1;
> +		  if (subreg_offset)
> +		    {
> +		      rtvec vec = rtvec_alloc (l1);
> +		      for (int i = 0; i < l1; i++)
> +			RTVEC_ELT (vec, i)
> +			  = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i)
> +					     + subreg_offset));
> +		      par = gen_rtx_PARALLEL (VOIDmode, vec);
> +		    }
> +		  return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par);
> +		}
> +	    }

subreg_offset will differ in meaning if big-endian; is this correct
there, do all the stars align so this code works out fine there as well?

Looks fine otherwise, thanks :-)


Segher

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
  2020-10-14 17:35     ` Segher Boessenkool
@ 2020-10-14 17:55       ` Richard Biener
  2020-10-14 19:23         ` Segher Boessenkool
  2020-10-15  8:14       ` Hongtao Liu
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Biener @ 2020-10-14 17:55 UTC (permalink / raw)
  To: gcc-patches, Segher Boessenkool, Hongtao Liu; +Cc: GCC Patches

On October 14, 2020 7:35:32 PM GMT+02:00, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>Hi!
>
>On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote:
>> On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool
>> <segher@kernel.crashing.org> wrote:
>> > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
>> > >   For rtx like
>> > >   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
>> > >                    (parallel [(const_int 0) (const_int 1)]))
>> > >  it could be simplified as inner.
>> >
>> > You could even simplify any vec_select of a subreg of X to just a
>> > vec_select of X, by changing the selection vector a bit (well, only
>do
>> 
>> Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to
>selection.
>
>Exactly.
>
>> > this if that is a constant vector, I suppose).  Not just for
>paradoxical
>> > subregs either, just for *all* subregs.
>> 
>> Yes, and only when X has the same inner mode and more elements.
>
>No, for *all*.  The mode of the first argument of vec_select does not
>have to equal its result mode.

But IIRC the component mode needs to match. 

>Any (constant indices) vec_select of a subreg can be written as just a
>vec_select.
>
>> +	  /* Simplify vec_select of a subreg of X to just a vec_select of X
>> +	     when available.  */
>
>What does "when available" mean here?
>
>> +	  int l2;
>> +	  if (GET_CODE (trueop0) == SUBREG
>> +	      && (GET_MODE_INNER (mode)
>> +		  == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))))
>
>Don't use unnecessary parens here please, it makes it harder to read
>(there are quite enough parens already :-) )
>
>> +	      gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0),
>> +					   GET_MODE_SIZE (GET_MODE_INNER (mode)),
>> +					   &subreg_offset));
>
>Why is this needed?
>
>> +	      bool success = true;
>> +	      for (int i = 0;i != l1; i++)
>
>(space after ; )
>
>> +		{
>> +		  rtx j = XVECEXP (trueop1, 0, i);
>
>(i and j and k ususally are integers, not rtx)
>
>> +		  if (!CONST_INT_P (j)
>> +		      || known_ge (UINTVAL (j), l2 - subreg_offset))
>> +		    {
>> +		      success = false;
>> +		      break;
>> +		    }
>> +		}
>
>You don't have to test if the input RTL is valid.  You can assume it
>is.
>
>> +	      if (success)
>> +		{
>> +		  rtx par = trueop1;
>> +		  if (subreg_offset)
>> +		    {
>> +		      rtvec vec = rtvec_alloc (l1);
>> +		      for (int i = 0; i < l1; i++)
>> +			RTVEC_ELT (vec, i)
>> +			  = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i)
>> +					     + subreg_offset));
>> +		      par = gen_rtx_PARALLEL (VOIDmode, vec);
>> +		    }
>> +		  return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par);
>> +		}
>> +	    }
>
>subreg_offset will differ in meaning if big-endian; is this correct
>there, do all the stars align so this code works out fine there as
>well?
>
>Looks fine otherwise, thanks :-)
>
>
>Segher


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
  2020-10-14 17:55       ` Richard Biener
@ 2020-10-14 19:23         ` Segher Boessenkool
  0 siblings, 0 replies; 20+ messages in thread
From: Segher Boessenkool @ 2020-10-14 19:23 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Hongtao Liu

On Wed, Oct 14, 2020 at 07:55:55PM +0200, Richard Biener wrote:
> On October 14, 2020 7:35:32 PM GMT+02:00, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote:
> >> On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool
> >> <segher@kernel.crashing.org> wrote:
> >> > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
> >> > >   For rtx like
> >> > >   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> >> > >                    (parallel [(const_int 0) (const_int 1)]))
> >> > >  it could be simplified as inner.
> >> >
> >> > You could even simplify any vec_select of a subreg of X to just a
> >> > vec_select of X, by changing the selection vector a bit (well, only
> >do
> >> 
> >> Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to
> >selection.
> >
> >Exactly.
> >
> >> > this if that is a constant vector, I suppose).  Not just for
> >paradoxical
> >> > subregs either, just for *all* subregs.
> >> 
> >> Yes, and only when X has the same inner mode and more elements.
> >
> >No, for *all*.  The mode of the first argument of vec_select does not
> >have to equal its result mode.
> 
> But IIRC the component mode needs to match. 

Yeah, good point, at least the i386 backend uses crazy subregs, which
is why validate_subreg does not test this :-(


Segher

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
  2020-10-14 17:35     ` Segher Boessenkool
  2020-10-14 17:55       ` Richard Biener
@ 2020-10-15  8:14       ` Hongtao Liu
  2020-10-15  9:58         ` Hongtao Liu
  2020-10-20 20:43         ` Segher Boessenkool
  1 sibling, 2 replies; 20+ messages in thread
From: Hongtao Liu @ 2020-10-15  8:14 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: GCC Patches, H. J. Lu, Hongyu Wang

On Thu, Oct 15, 2020 at 1:37 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote:
> > On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool
> > <segher@kernel.crashing.org> wrote:
> > > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
> > > >   For rtx like
> > > >   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> > > >                    (parallel [(const_int 0) (const_int 1)]))
> > > >  it could be simplified as inner.
> > >
> > > You could even simplify any vec_select of a subreg of X to just a
> > > vec_select of X, by changing the selection vector a bit (well, only do
> >
> > Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to selection.
>
> Exactly.
>
> > > this if that is a constant vector, I suppose).  Not just for paradoxical
> > > subregs either, just for *all* subregs.
> >
> > Yes, and only when X has the same inner mode and more elements.
>
> No, for *all*.  The mode of the first argument of vec_select does not
> have to equal its result mode.
>
> Any (constant indices) vec_select of a subreg can be written as just a
> vec_select.
>
> > +       /* Simplify vec_select of a subreg of X to just a vec_select of X
> > +          when available.  */
>
> What does "when available" mean here?
>

When X has same component mode as vec_select, i'll add this to comments.

> > +       int l2;
> > +       if (GET_CODE (trueop0) == SUBREG
> > +           && (GET_MODE_INNER (mode)
> > +               == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))))
>
> Don't use unnecessary parens here please, it makes it harder to read
> (there are quite enough parens already :-) )
>

Yes.

> > +           gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0),
> > +                                        GET_MODE_SIZE (GET_MODE_INNER (mode)),
> > +                                        &subreg_offset));
>
> Why is this needed?

I only found this interface for poly_uint64 division to get subreg_offset.

>
> > +           bool success = true;
> > +           for (int i = 0;i != l1; i++)
>
> (space after ; )
>
> > +             {
> > +               rtx j = XVECEXP (trueop1, 0, i);
>
> (i and j and k ususally are integers, not rtx)
>
> > +               if (!CONST_INT_P (j)
> > +                   || known_ge (UINTVAL (j), l2 - subreg_offset))
> > +                 {
> > +                   success = false;
> > +                   break;
> > +                 }
> > +             }
>
> You don't have to test if the input RTL is valid.  You can assume it is.
>

This test is for something like (vec_select:v2di (subreg:v4di
(reg:v2di) 0)(parallel [ (const_int 2) (const_int 3)])).
const_int 2 here is out of range. Are you meaning the upper rtx wouldn't exist?

> > +           if (success)
> > +             {
> > +               rtx par = trueop1;
> > +               if (subreg_offset)
> > +                 {
> > +                   rtvec vec = rtvec_alloc (l1);
> > +                   for (int i = 0; i < l1; i++)
> > +                     RTVEC_ELT (vec, i)
> > +                       = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i)
> > +                                          + subreg_offset));
> > +                   par = gen_rtx_PARALLEL (VOIDmode, vec);
> > +                 }
> > +               return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par);
> > +             }
> > +         }
>
> subreg_offset will differ in meaning if big-endian; is this correct
Yes.
> there, do all the stars align so this code works out fine there as well?
>

i found it's a bit tricky to adjust selection index for target
BYTES_BIG_ENDIA != WORDS_BIG_ENDIAN.
Especially for component mode smaller than word, Any interface to handle this?

> Looks fine otherwise, thanks :-)
>
>
> Segher



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
  2020-10-15  8:14       ` Hongtao Liu
@ 2020-10-15  9:58         ` Hongtao Liu
  2020-10-15 12:38           ` Richard Sandiford
  2020-10-20 20:43         ` Segher Boessenkool
  1 sibling, 1 reply; 20+ messages in thread
From: Hongtao Liu @ 2020-10-15  9:58 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: GCC Patches, H. J. Lu, Hongyu Wang

[-- Attachment #1: Type: text/plain, Size: 4294 bytes --]

On Thu, Oct 15, 2020 at 4:14 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Oct 15, 2020 at 1:37 AM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >
> > Hi!
> >
> > On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote:
> > > On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool
> > > <segher@kernel.crashing.org> wrote:
> > > > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
> > > > >   For rtx like
> > > > >   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> > > > >                    (parallel [(const_int 0) (const_int 1)]))
> > > > >  it could be simplified as inner.
> > > >
> > > > You could even simplify any vec_select of a subreg of X to just a
> > > > vec_select of X, by changing the selection vector a bit (well, only do
> > >
> > > Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to selection.
> >
> > Exactly.
> >
> > > > this if that is a constant vector, I suppose).  Not just for paradoxical
> > > > subregs either, just for *all* subregs.
> > >
> > > Yes, and only when X has the same inner mode and more elements.
> >
> > No, for *all*.  The mode of the first argument of vec_select does not
> > have to equal its result mode.
> >
> > Any (constant indices) vec_select of a subreg can be written as just a
> > vec_select.
> >
> > > +       /* Simplify vec_select of a subreg of X to just a vec_select of X
> > > +          when available.  */
> >
> > What does "when available" mean here?
> >
>
> When X has same component mode as vec_select, i'll add this to comments.
>
> > > +       int l2;
> > > +       if (GET_CODE (trueop0) == SUBREG
> > > +           && (GET_MODE_INNER (mode)
> > > +               == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))))
> >
> > Don't use unnecessary parens here please, it makes it harder to read
> > (there are quite enough parens already :-) )
> >
>
> Yes.
>
> > > +           gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0),
> > > +                                        GET_MODE_SIZE (GET_MODE_INNER (mode)),
> > > +                                        &subreg_offset));
> >
> > Why is this needed?
>
> I only found this interface for poly_uint64 division to get subreg_offset.
>
> >
> > > +           bool success = true;
> > > +           for (int i = 0;i != l1; i++)
> >
> > (space after ; )
> >
> > > +             {
> > > +               rtx j = XVECEXP (trueop1, 0, i);
> >
> > (i and j and k ususally are integers, not rtx)
> >
> > > +               if (!CONST_INT_P (j)
> > > +                   || known_ge (UINTVAL (j), l2 - subreg_offset))
> > > +                 {
> > > +                   success = false;
> > > +                   break;
> > > +                 }
> > > +             }
> >
> > You don't have to test if the input RTL is valid.  You can assume it is.
> >
>
> This test is for something like (vec_select:v2di (subreg:v4di
> (reg:v2di) 0)(parallel [ (const_int 2) (const_int 3)])).
> const_int 2 here is out of range. Are you meaning the upper rtx wouldn't exist?
>

Remove the test.

> > > +           if (success)
> > > +             {
> > > +               rtx par = trueop1;
> > > +               if (subreg_offset)
> > > +                 {
> > > +                   rtvec vec = rtvec_alloc (l1);
> > > +                   for (int i = 0; i < l1; i++)
> > > +                     RTVEC_ELT (vec, i)
> > > +                       = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i)
> > > +                                          + subreg_offset));
> > > +                   par = gen_rtx_PARALLEL (VOIDmode, vec);
> > > +                 }
> > > +               return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par);
> > > +             }
> > > +         }
> >
> > subreg_offset will differ in meaning if big-endian; is this correct
> Yes.
> > there, do all the stars align so this code works out fine there as well?
> >
>
> i found it's a bit tricky to adjust selection index for target
> BYTES_BIG_ENDIA != WORDS_BIG_ENDIAN.
> Especially for component mode smaller than word, Any interface to handle this?
>

Use subreg_lsb to get offset from least significant bits, suppose this
could be independent of big/little-endian.

> > Looks fine otherwise, thanks :-)
> >
> >
> > Segher
>
>
>
> --
> BR,
> Hongtao

Update Patch.

-- 
BR,
Hongtao

[-- Attachment #2: 0001-Simplify-vec_select-of-a-subreg-of-X-to-just-a-vec-v3.patch --]
[-- Type: text/x-patch, Size: 3477 bytes --]

From 2b56fed24fe7ef7f49bcac131bbc7f51f36b0a6a Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Tue, 13 Oct 2020 15:35:29 +0800
Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of
 X.

gcc/ChangeLog
	PR rtl-optimization/97249
	* simplify-rtx.c (simplify_binary_operation_1): Simplify
	vec_select of a subreg of X to a vec_select of X when
	available.

gcc/testsuite/ChangeLog

	* gcc.target/i386/pr97249-1.c: New test.
---
 gcc/simplify-rtx.c                        | 43 +++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++
 2 files changed, 73 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 869f0d11b2e..ff831bcec8f 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4170,6 +4170,49 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 		    return subop1;
 		}
 	    }
+
+	  /* Simplify vec_select of a subreg of X to just a vec_select of X
+	     when X has same component mode as vec_select.  */
+	  int l2;
+	  if (GET_CODE (trueop0) == SUBREG
+	      && GET_MODE_INNER (mode)
+		 == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))
+	      && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
+	      && (GET_MODE_NUNITS (mode)).is_constant (&l1)
+	      && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0))))
+		  .is_constant (&l2)
+	      && known_le (l1, l2))
+	    {
+	      unsigned HOST_WIDE_INT subreg_offset = 0;
+	      gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
+	      gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT),
+					   GET_MODE_SIZE (GET_MODE_INNER (mode)),
+					   &subreg_offset));
+	      bool success = true;
+	      for (int i = 0; i != l1; i++)
+		{
+		  rtx idx  = XVECEXP (trueop1, 0, i);
+		  if (!CONST_INT_P (idx))
+		    {
+		      success = false;
+		      break;
+		    }
+		}
+	      if (success)
+		{
+		  rtx par = trueop1;
+		  if (subreg_offset)
+		    {
+		      rtvec vec = rtvec_alloc (l1);
+		      for (int i = 0; i < l1; i++)
+			RTVEC_ELT (vec, i)
+			  = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i)
+					     + subreg_offset));
+		      par = gen_rtx_PARALLEL (VOIDmode, vec);
+		    }
+		  return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par);
+		}
+	    }
 	}
 
       if (XVECLEN (trueop1, 0) == 1
diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c
new file mode 100644
index 00000000000..4478a34a9f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c
@@ -0,0 +1,30 @@
+/* PR target/97249  */
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O3 -masm=att" } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxbw[ \t]+\(.*%xmm[0-9]} 2 } } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxwd[ \t]+\(.*%xmm[0-9]} 2 } } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxdq[ \t]+\(.*%xmm[0-9]} 2 } } */
+
+void
+foo (unsigned char* p1, unsigned char* p2, short* __restrict p3)
+{
+    for (int i = 0 ; i != 8; i++)
+     p3[i] = p1[i] + p2[i];
+     return;
+}
+
+void
+foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3)
+{
+    for (int i = 0 ; i != 4; i++)
+     p3[i] = p1[i] + p2[i];
+     return;
+}
+
+void
+foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3)
+{
+    for (int i = 0 ; i != 2; i++)
+      p3[i] = (long long)p1[i] + (long long)p2[i];
+     return;
+}
-- 
2.18.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
  2020-10-15  9:58         ` Hongtao Liu
@ 2020-10-15 12:38           ` Richard Sandiford
  2020-10-19  5:18             ` Hongtao Liu
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Sandiford @ 2020-10-15 12:38 UTC (permalink / raw)
  To: Hongtao Liu via Gcc-patches; +Cc: Segher Boessenkool, Hongtao Liu

Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> +	  /* Simplify vec_select of a subreg of X to just a vec_select of X
> +	     when X has same component mode as vec_select.  */
> +	  int l2;
> +	  if (GET_CODE (trueop0) == SUBREG
> +	      && GET_MODE_INNER (mode)
> +		 == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))

Better to use SUBREG_REG here and below.

> +	      && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
> +	      && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> +	      && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0))))
> +		  .is_constant (&l2)
> +	      && known_le (l1, l2))
> +	    {
> +	      unsigned HOST_WIDE_INT subreg_offset = 0;
> +	      gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
> +	      gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT),
> +					   GET_MODE_SIZE (GET_MODE_INNER (mode)),
> +					   &subreg_offset));

can_div_trunc_p discards the remainder, whereas it looks like here
you want an exact multiple.

I don't think it's absolutely guaranteed that the “if” condition makes
the division by GET_MODE_SIZE exact.  E.g. in principle you could have
a subreg of a vector of TIs in which the subreg offset is misaligned by
a DI offset.

I'm not sure the subreg_lsb conversion is correct though.  On big-endian
targets, lane numbering follows memory layout, just like subreg byte
offsets do.  So ISTM that using SUBREG_BYTE (as per the earlier patch)
was correct.

In summary, I think the "if” condition should include something like:

  constant_mulitple_p (SUBREG_BYTE (trueop0),
                       GET_MODE_UNIT_BITSIZE (mode),
                       &subreg_offset)

Thanks,
Richard

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
  2020-10-15 12:38           ` Richard Sandiford
@ 2020-10-19  5:18             ` Hongtao Liu
  2020-10-19 15:31               ` Richard Sandiford
  0 siblings, 1 reply; 20+ messages in thread
From: Hongtao Liu @ 2020-10-19  5:18 UTC (permalink / raw)
  To: Hongtao Liu via Gcc-patches, Segher Boessenkool, Hongtao Liu,
	Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 2123 bytes --]

On Thu, Oct 15, 2020 at 8:38 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > +       /* Simplify vec_select of a subreg of X to just a vec_select of X
> > +          when X has same component mode as vec_select.  */
> > +       int l2;
> > +       if (GET_CODE (trueop0) == SUBREG
> > +           && GET_MODE_INNER (mode)
> > +              == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))
>
> Better to use SUBREG_REG here and below.
>

Yes and changed.

> > +           && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
> > +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> > +           && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0))))
> > +               .is_constant (&l2)
> > +           && known_le (l1, l2))
> > +         {
> > +           unsigned HOST_WIDE_INT subreg_offset = 0;
> > +           gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
> > +           gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT),
> > +                                        GET_MODE_SIZE (GET_MODE_INNER (mode)),
> > +                                        &subreg_offset));
>
> can_div_trunc_p discards the remainder, whereas it looks like here
> you want an exact multiple.
>
> I don't think it's absolutely guaranteed that the “if” condition makes
> the division by GET_MODE_SIZE exact.  E.g. in principle you could have
> a subreg of a vector of TIs in which the subreg offset is misaligned by
> a DI offset.
>
> I'm not sure the subreg_lsb conversion is correct though.  On big-endian
> targets, lane numbering follows memory layout, just like subreg byte
> offsets do.  So ISTM that using SUBREG_BYTE (as per the earlier patch)
> was correct.
>
> In summary, I think the "if” condition should include something like:
>
>   constant_mulitple_p (SUBREG_BYTE (trueop0),
>                        GET_MODE_UNIT_BITSIZE (mode),
>                        &subreg_offset)
>

Changed.

> Thanks,
> Richard


Update patch.

-- 
BR,
Hongtao

[-- Attachment #2: 0001-Simplify-vec_select-of-a-subreg-of-X-to-just-a-vec-v4.patch --]
[-- Type: text/x-patch, Size: 3430 bytes --]

From 8d154067963e453c337e6dc2c4f3f19bf0d6e11b Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Tue, 13 Oct 2020 15:35:29 +0800
Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of
 X.

gcc/ChangeLog
	PR rtl-optimization/97249
	* simplify-rtx.c (simplify_binary_operation_1): Simplify
	vec_select of a subreg of X to a vec_select of X.

gcc/testsuite/ChangeLog

	* gcc.target/i386/pr97249-1.c: New test.
---
 gcc/simplify-rtx.c                        | 44 +++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++
 2 files changed, 74 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 869f0d11b2e..b1009837b2b 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4170,6 +4170,50 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 		    return subop1;
 		}
 	    }
+
+	  /* Simplify vec_select of a subreg of X to just a vec_select of X
+	     when X has same component mode as vec_select.  */
+	  int l2;
+	  unsigned HOST_WIDE_INT subreg_offset = 0;
+	  if (GET_CODE (trueop0) == SUBREG
+	      && GET_MODE_INNER (mode)
+		 == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0)))
+	      && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
+	      && (GET_MODE_NUNITS (mode)).is_constant (&l1)
+	      && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))))
+		  .is_constant (&l2)
+	      && known_le (l1, l2)
+	      && constant_multiple_p (SUBREG_BYTE (trueop0),
+				      GET_MODE_UNIT_BITSIZE (mode),
+				      &subreg_offset))
+	    {
+
+	      gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
+	      bool success = true;
+	      for (int i = 0; i != l1; i++)
+		{
+		  rtx idx  = XVECEXP (trueop1, 0, i);
+		  if (!CONST_INT_P (idx))
+		    {
+		      success = false;
+		      break;
+		    }
+		}
+	      if (success)
+		{
+		  rtx par = trueop1;
+		  if (subreg_offset)
+		    {
+		      rtvec vec = rtvec_alloc (l1);
+		      for (int i = 0; i < l1; i++)
+			RTVEC_ELT (vec, i)
+			  = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i)
+					     + subreg_offset));
+		      par = gen_rtx_PARALLEL (VOIDmode, vec);
+		    }
+		  return gen_rtx_VEC_SELECT (mode, SUBREG_REG (trueop0), par);
+		}
+	    }
 	}
 
       if (XVECLEN (trueop1, 0) == 1
diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c
new file mode 100644
index 00000000000..4478a34a9f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c
@@ -0,0 +1,30 @@
+/* PR target/97249  */
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O3 -masm=att" } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxbw[ \t]+\(.*%xmm[0-9]} 2 } } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxwd[ \t]+\(.*%xmm[0-9]} 2 } } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxdq[ \t]+\(.*%xmm[0-9]} 2 } } */
+
+void
+foo (unsigned char* p1, unsigned char* p2, short* __restrict p3)
+{
+    for (int i = 0 ; i != 8; i++)
+     p3[i] = p1[i] + p2[i];
+     return;
+}
+
+void
+foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3)
+{
+    for (int i = 0 ; i != 4; i++)
+     p3[i] = p1[i] + p2[i];
+     return;
+}
+
+void
+foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3)
+{
+    for (int i = 0 ; i != 2; i++)
+      p3[i] = (long long)p1[i] + (long long)p2[i];
+     return;
+}
-- 
2.18.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
  2020-10-19  5:18             ` Hongtao Liu
@ 2020-10-19 15:31               ` Richard Sandiford
  2020-10-20  3:20                 ` Hongtao Liu
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Sandiford @ 2020-10-19 15:31 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Hongtao Liu via Gcc-patches, Segher Boessenkool

Hongtao Liu <crazylht@gmail.com> writes:
> On Thu, Oct 15, 2020 at 8:38 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > +       /* Simplify vec_select of a subreg of X to just a vec_select of X
>> > +          when X has same component mode as vec_select.  */
>> > +       int l2;
>> > +       if (GET_CODE (trueop0) == SUBREG
>> > +           && GET_MODE_INNER (mode)
>> > +              == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))
>>
>> Better to use SUBREG_REG here and below.
>>
>
> Yes and changed.
>
>> > +           && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
>> > +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
>> > +           && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0))))
>> > +               .is_constant (&l2)
>> > +           && known_le (l1, l2))
>> > +         {
>> > +           unsigned HOST_WIDE_INT subreg_offset = 0;
>> > +           gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
>> > +           gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT),
>> > +                                        GET_MODE_SIZE (GET_MODE_INNER (mode)),
>> > +                                        &subreg_offset));
>>
>> can_div_trunc_p discards the remainder, whereas it looks like here
>> you want an exact multiple.
>>
>> I don't think it's absolutely guaranteed that the “if” condition makes
>> the division by GET_MODE_SIZE exact.  E.g. in principle you could have
>> a subreg of a vector of TIs in which the subreg offset is misaligned by
>> a DI offset.
>>
>> I'm not sure the subreg_lsb conversion is correct though.  On big-endian
>> targets, lane numbering follows memory layout, just like subreg byte
>> offsets do.  So ISTM that using SUBREG_BYTE (as per the earlier patch)
>> was correct.
>>
>> In summary, I think the "if” condition should include something like:
>>
>>   constant_mulitple_p (SUBREG_BYTE (trueop0),
>>                        GET_MODE_UNIT_BITSIZE (mode),
>>                        &subreg_offset)
>>
>
> Changed.
>
>> Thanks,
>> Richard
>
>
> Update patch.
>
> -- 
> BR,
> Hongtao
>
> From 8d154067963e453c337e6dc2c4f3f19bf0d6e11b Mon Sep 17 00:00:00 2001
> From: liuhongt <hongtao.liu@intel.com>
> Date: Tue, 13 Oct 2020 15:35:29 +0800
> Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of
>  X.
>
> gcc/ChangeLog
> 	PR rtl-optimization/97249
> 	* simplify-rtx.c (simplify_binary_operation_1): Simplify
> 	vec_select of a subreg of X to a vec_select of X.
>
> gcc/testsuite/ChangeLog
>
> 	* gcc.target/i386/pr97249-1.c: New test.
> ---
>  gcc/simplify-rtx.c                        | 44 +++++++++++++++++++++++
>  gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++
>  2 files changed, 74 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c
>
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index 869f0d11b2e..b1009837b2b 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -4170,6 +4170,50 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
>  		    return subop1;
>  		}
>  	    }
> +
> +	  /* Simplify vec_select of a subreg of X to just a vec_select of X
> +	     when X has same component mode as vec_select.  */
> +	  int l2;
> +	  unsigned HOST_WIDE_INT subreg_offset = 0;
> +	  if (GET_CODE (trueop0) == SUBREG
> +	      && GET_MODE_INNER (mode)
> +		 == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0)))
> +	      && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)

Nothing really relies on this last line, and nothing uses l0, so better
to drop it.

> +	      && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> +	      && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))))
> +		  .is_constant (&l2)
> +	      && known_le (l1, l2)

I'm not sure the last two &&s are really the important condition.
I think we should drop them for the suggestion below.

> +	      && constant_multiple_p (SUBREG_BYTE (trueop0),
> +				      GET_MODE_UNIT_BITSIZE (mode),
> +				      &subreg_offset))
> +	    {
> +

Excess blank line.

> +	      gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));

This can just use ==.

> +	      bool success = true;
> +	      for (int i = 0; i != l1; i++)
> +		{
> +		  rtx idx  = XVECEXP (trueop1, 0, i);

Excess space.

> +		  if (!CONST_INT_P (idx))

Here I think we should check:

		      || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))

where:

       poly_uint64 nunits
         = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))).

This makes sure that all indices are in range.  In particular, it's
valid for the SUBREG_REG to be narrower than mode, for appropriate
vec_select indices

Thanks,
Richard

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
  2020-10-19 15:31               ` Richard Sandiford
@ 2020-10-20  3:20                 ` Hongtao Liu
  2020-10-20 16:42                   ` Richard Sandiford
  2020-10-20 21:05                   ` Segher Boessenkool
  0 siblings, 2 replies; 20+ messages in thread
From: Hongtao Liu @ 2020-10-20  3:20 UTC (permalink / raw)
  To: Hongtao Liu, Hongtao Liu via Gcc-patches, Segher Boessenkool,
	Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 5947 bytes --]

On Mon, Oct 19, 2020 at 11:31 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Hongtao Liu <crazylht@gmail.com> writes:
> > On Thu, Oct 15, 2020 at 8:38 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> > +       /* Simplify vec_select of a subreg of X to just a vec_select of X
> >> > +          when X has same component mode as vec_select.  */
> >> > +       int l2;
> >> > +       if (GET_CODE (trueop0) == SUBREG
> >> > +           && GET_MODE_INNER (mode)
> >> > +              == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))
> >>
> >> Better to use SUBREG_REG here and below.
> >>
> >
> > Yes and changed.
> >
> >> > +           && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
> >> > +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> >> > +           && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0))))
> >> > +               .is_constant (&l2)
> >> > +           && known_le (l1, l2))
> >> > +         {
> >> > +           unsigned HOST_WIDE_INT subreg_offset = 0;
> >> > +           gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
> >> > +           gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT),
> >> > +                                        GET_MODE_SIZE (GET_MODE_INNER (mode)),
> >> > +                                        &subreg_offset));
> >>
> >> can_div_trunc_p discards the remainder, whereas it looks like here
> >> you want an exact multiple.
> >>
> >> I don't think it's absolutely guaranteed that the “if” condition makes
> >> the division by GET_MODE_SIZE exact.  E.g. in principle you could have
> >> a subreg of a vector of TIs in which the subreg offset is misaligned by
> >> a DI offset.
> >>
> >> I'm not sure the subreg_lsb conversion is correct though.  On big-endian
> >> targets, lane numbering follows memory layout, just like subreg byte
> >> offsets do.  So ISTM that using SUBREG_BYTE (as per the earlier patch)
> >> was correct.
> >>
> >> In summary, I think the "if” condition should include something like:
> >>
> >>   constant_mulitple_p (SUBREG_BYTE (trueop0),
> >>                        GET_MODE_UNIT_BITSIZE (mode),
> >>                        &subreg_offset)
> >>
> >
> > Changed.
> >
> >> Thanks,
> >> Richard
> >
> >
> > Update patch.
> >
> > --
> > BR,
> > Hongtao
> >
> > From 8d154067963e453c337e6dc2c4f3f19bf0d6e11b Mon Sep 17 00:00:00 2001
> > From: liuhongt <hongtao.liu@intel.com>
> > Date: Tue, 13 Oct 2020 15:35:29 +0800
> > Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of
> >  X.
> >
> > gcc/ChangeLog
> >       PR rtl-optimization/97249
> >       * simplify-rtx.c (simplify_binary_operation_1): Simplify
> >       vec_select of a subreg of X to a vec_select of X.
> >
> > gcc/testsuite/ChangeLog
> >
> >       * gcc.target/i386/pr97249-1.c: New test.
> > ---
> >  gcc/simplify-rtx.c                        | 44 +++++++++++++++++++++++
> >  gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++
> >  2 files changed, 74 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c
> >
> > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> > index 869f0d11b2e..b1009837b2b 100644
> > --- a/gcc/simplify-rtx.c
> > +++ b/gcc/simplify-rtx.c
> > @@ -4170,6 +4170,50 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
> >                   return subop1;
> >               }
> >           }
> > +
> > +       /* Simplify vec_select of a subreg of X to just a vec_select of X
> > +          when X has same component mode as vec_select.  */
> > +       int l2;
> > +       unsigned HOST_WIDE_INT subreg_offset = 0;
> > +       if (GET_CODE (trueop0) == SUBREG
> > +           && GET_MODE_INNER (mode)
> > +              == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0)))
> > +           && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
>
> Nothing really relies on this last line, and nothing uses l0, so better
> to drop it.
>

Changed, so there won't be any vector mode with variable number elts.

> > +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> > +           && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))))
> > +               .is_constant (&l2)
> > +           && known_le (l1, l2)
>
> I'm not sure the last two &&s are really the important condition.
> I think we should drop them for the suggestion below.
>

Changed, assume gcc also support something like (vec_select:v4di
(reg:v2di) (parallel [ (const_int 0) (const_int 1) (const_int 1)
(const_int 0)]))
as long as the range of selection guaranteed by
  || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))

> > +           && constant_multiple_p (SUBREG_BYTE (trueop0),
> > +                                   GET_MODE_UNIT_BITSIZE (mode),
> > +                                   &subreg_offset))
> > +         {
> > +
>
> Excess blank line.
>

Changed.

> > +           gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
>
> This can just use ==.
>

Changed.

> > +           bool success = true;
> > +           for (int i = 0; i != l1; i++)
> > +             {
> > +               rtx idx  = XVECEXP (trueop1, 0, i);
>
> Excess space.

Changed.

>
> > +               if (!CONST_INT_P (idx))
>
> Here I think we should check:
>
>                       || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
>
> where:
>
>        poly_uint64 nunits
>          = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))).
>

Changed.

> This makes sure that all indices are in range.  In particular, it's
> valid for the SUBREG_REG to be narrower than mode, for appropriate
> vec_select indices
>

Yes, that's what paradoxical subreg means.

> Thanks,
> Richard




--
BR,
Hongtao

[-- Attachment #2: 0001-Simplify-vec_select-of-a-subreg-of-X-to-just-a-v5.patch --]
[-- Type: text/x-patch, Size: 3367 bytes --]

From 5dc3de7f3fabb932be2c097db00b75061228caaf Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Tue, 13 Oct 2020 15:35:29 +0800
Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of
 X.

gcc/ChangeLog
	PR rtl-optimization/97249
	* simplify-rtx.c (simplify_binary_operation_1): Simplify
	vec_select of a subreg of X to a vec_select of X.

gcc/testsuite/ChangeLog

	* gcc.target/i386/pr97249-1.c: New test.
---
 gcc/simplify-rtx.c                        | 41 +++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 +++++++++++++++++
 2 files changed, 71 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 869f0d11b2e..df751318237 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4170,6 +4170,47 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 		    return subop1;
 		}
 	    }
+
+	  /* Simplify vec_select of a subreg of X to just a vec_select of X
+	     when X has same component mode as vec_select.  */
+	  unsigned HOST_WIDE_INT subreg_offset = 0;
+	  if (GET_CODE (trueop0) == SUBREG
+	      && GET_MODE_INNER (mode)
+		 == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0)))
+	      && (GET_MODE_NUNITS (mode)).is_constant (&l1)
+	      && constant_multiple_p (SUBREG_BYTE (trueop0),
+				      GET_MODE_UNIT_BITSIZE (mode),
+				      &subreg_offset))
+	    {
+	      gcc_assert (XVECLEN (trueop1, 0) == l1);
+	      bool success = true;
+	      poly_uint64 nunits
+		= GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)));
+	      for (int i = 0; i != l1; i++)
+		{
+		  rtx idx = XVECEXP (trueop1, 0, i);
+		  if (!CONST_INT_P (idx)
+		      || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
+		    {
+		      success = false;
+		      break;
+		    }
+		}
+	      if (success)
+		{
+		  rtx par = trueop1;
+		  if (subreg_offset)
+		    {
+		      rtvec vec = rtvec_alloc (l1);
+		      for (int i = 0; i < l1; i++)
+			RTVEC_ELT (vec, i)
+			  = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i)
+					     + subreg_offset));
+		      par = gen_rtx_PARALLEL (VOIDmode, vec);
+		    }
+		  return gen_rtx_VEC_SELECT (mode, SUBREG_REG (trueop0), par);
+		}
+	    }
 	}
 
       if (XVECLEN (trueop1, 0) == 1
diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c
new file mode 100644
index 00000000000..4478a34a9f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c
@@ -0,0 +1,30 @@
+/* PR target/97249  */
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O3 -masm=att" } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxbw[ \t]+\(.*%xmm[0-9]} 2 } } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxwd[ \t]+\(.*%xmm[0-9]} 2 } } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxdq[ \t]+\(.*%xmm[0-9]} 2 } } */
+
+void
+foo (unsigned char* p1, unsigned char* p2, short* __restrict p3)
+{
+    for (int i = 0 ; i != 8; i++)
+     p3[i] = p1[i] + p2[i];
+     return;
+}
+
+void
+foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3)
+{
+    for (int i = 0 ; i != 4; i++)
+     p3[i] = p1[i] + p2[i];
+     return;
+}
+
+void
+foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3)
+{
+    for (int i = 0 ; i != 2; i++)
+      p3[i] = (long long)p1[i] + (long long)p2[i];
+     return;
+}
-- 
2.18.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
  2020-10-20  3:20                 ` Hongtao Liu
@ 2020-10-20 16:42                   ` Richard Sandiford
  2020-10-21  2:43                     ` Hongtao Liu
  2020-10-20 21:05                   ` Segher Boessenkool
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Sandiford @ 2020-10-20 16:42 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Hongtao Liu via Gcc-patches, Segher Boessenkool

Hongtao Liu <crazylht@gmail.com> writes:
>> > +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
>> > +           && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))))
>> > +               .is_constant (&l2)
>> > +           && known_le (l1, l2)
>>
>> I'm not sure the last two &&s are really the important condition.
>> I think we should drop them for the suggestion below.
>>
>
> Changed, assume gcc also support something like (vec_select:v4di
> (reg:v2di) (parallel [ (const_int 0) (const_int 1) (const_int 1)
> (const_int 0)]))
> as long as the range of selection guaranteed by
>   || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))

Yeah, that vec_select looks OK.

>>
>> > +               if (!CONST_INT_P (idx))
>>
>> Here I think we should check:
>>
>>                       || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
>>
>> where:
>>
>>        poly_uint64 nunits
>>          = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))).
>>
>
> Changed.
>
>> This makes sure that all indices are in range.  In particular, it's
>> valid for the SUBREG_REG to be narrower than mode, for appropriate
>> vec_select indices
>>
>
> Yes, that's what paradoxical subreg means.

But I was comparing the mode of the vec_select with the mode of the
SUBREG_REG (rather than the mode of trueop0 with the mode of the
SUBREG_REG, which is what matters for paradoxical subregs).

> +	  /* Simplify vec_select of a subreg of X to just a vec_select of X
> +	     when X has same component mode as vec_select.  */
> +	  unsigned HOST_WIDE_INT subreg_offset = 0;
> +	  if (GET_CODE (trueop0) == SUBREG
> +	      && GET_MODE_INNER (mode)
> +		 == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0)))
> +	      && (GET_MODE_NUNITS (mode)).is_constant (&l1)

Unnecessary brackets around “GET_MODE_NUNITS (mode)”.

> +	      && constant_multiple_p (SUBREG_BYTE (trueop0),
> +				      GET_MODE_UNIT_BITSIZE (mode),
> +				      &subreg_offset))

Sorry, my bad, this should be:

	      && constant_multiple_p (subreg_memory_offset (trueop0),
				      GET_MODE_UNIT_BITSIZE (mode),
				      &subreg_offset))

> +	    {
> +	      gcc_assert (XVECLEN (trueop1, 0) == l1);
> +	      bool success = true;
> +	      poly_uint64 nunits
> +		= GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)));
> +	      for (int i = 0; i != l1; i++)
> +		{
> +		  rtx idx = XVECEXP (trueop1, 0, i);
> +		  if (!CONST_INT_P (idx)
> +		      || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
> +		    {
> +		      success = false;
> +		      break;
> +		    }
> +		}
> +	      if (success)
> +		{
> +		  rtx par = trueop1;
> +		  if (subreg_offset)
> +		    {
> +		      rtvec vec = rtvec_alloc (l1);
> +		      for (int i = 0; i < l1; i++)
> +			RTVEC_ELT (vec, i)
> +			  = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i)
> +					     + subreg_offset));

This is applying subreg_offset to the pointer rather than the INTVAL.
It should be:

			  = GEN_INT (UINTVAL (XVECEXP (trueop1, 0, i))
				     + subreg_offset);

OK with those changes, thanks.

Richard

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
  2020-10-15  8:14       ` Hongtao Liu
  2020-10-15  9:58         ` Hongtao Liu
@ 2020-10-20 20:43         ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Segher Boessenkool @ 2020-10-20 20:43 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: GCC Patches, H. J. Lu, Hongyu Wang

On Thu, Oct 15, 2020 at 04:14:39PM +0800, Hongtao Liu wrote:
> On Thu, Oct 15, 2020 at 1:37 AM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > > +           gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0),
> > > +                                        GET_MODE_SIZE (GET_MODE_INNER (mode)),
> > > +                                        &subreg_offset));
> >
> > Why is this needed?
> 
> I only found this interface for poly_uint64 division to get subreg_offset.

I mean, why do you have this assert at all?

> > > +               if (!CONST_INT_P (j)
> > > +                   || known_ge (UINTVAL (j), l2 - subreg_offset))
> > > +                 {
> > > +                   success = false;
> > > +                   break;
> > > +                 }
> > > +             }
> >
> > You don't have to test if the input RTL is valid.  You can assume it is.
> >
> 
> This test is for something like (vec_select:v2di (subreg:v4di
> (reg:v2di) 0)(parallel [ (const_int 2) (const_int 3)])).
> const_int 2 here is out of range. Are you meaning the upper rtx wouldn't exist?

Assuming this is LE: yes, this is just invalid.  You can do whatever you
want with it (except ICE :-) )

> > subreg_offset will differ in meaning if big-endian; is this correct
> Yes.
> > there, do all the stars align so this code works out fine there as well?
> 
> i found it's a bit tricky to adjust selection index for target
> BYTES_BIG_ENDIA != WORDS_BIG_ENDIAN.
> Especially for component mode smaller than word, Any interface to handle this?

For most things you want BYTES_BIG_ENDIAN, anything in a subreg here for
example.  I don't know which of those vectors use; I cannot find it in
the documentation, either.


Segher

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
  2020-10-20  3:20                 ` Hongtao Liu
  2020-10-20 16:42                   ` Richard Sandiford
@ 2020-10-20 21:05                   ` Segher Boessenkool
  2020-10-21  3:17                     ` Hongtao Liu
  1 sibling, 1 reply; 20+ messages in thread
From: Segher Boessenkool @ 2020-10-20 21:05 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Hongtao Liu via Gcc-patches, Richard Sandiford

On Tue, Oct 20, 2020 at 11:20:48AM +0800, Hongtao Liu wrote:
> +	  unsigned HOST_WIDE_INT subreg_offset = 0;
> +	  if (GET_CODE (trueop0) == SUBREG
> +	      && GET_MODE_INNER (mode)
> +		 == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0)))
> +	      && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> +	      && constant_multiple_p (SUBREG_BYTE (trueop0),
> +				      GET_MODE_UNIT_BITSIZE (mode),
> +				      &subreg_offset))
> +	    {
> +	      gcc_assert (XVECLEN (trueop1, 0) == l1);

Why?  If we want to check that, it should be in RTL checking (and maybe
it already is!)

> +	      bool success = true;
> +	      poly_uint64 nunits
> +		= GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)));
> +	      for (int i = 0; i != l1; i++)
> +		{
> +		  rtx idx = XVECEXP (trueop1, 0, i);
> +		  if (!CONST_INT_P (idx)
> +		      || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))

Can that ever happen in valid code?  This seems to just hide problems.

> +		    {
> +		      success = false;
> +		      break;
> +		    }
> +		}
> +	      if (success)

If you have a huge piece of code like this, factor it?  Esp. if you now
need to have all kinds of booleans where you really just want to do
early returns.


Segher

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
  2020-10-20 16:42                   ` Richard Sandiford
@ 2020-10-21  2:43                     ` Hongtao Liu
  0 siblings, 0 replies; 20+ messages in thread
From: Hongtao Liu @ 2020-10-21  2:43 UTC (permalink / raw)
  To: Hongtao Liu, Hongtao Liu via Gcc-patches, Segher Boessenkool,
	Richard Sandiford

On Wed, Oct 21, 2020 at 12:42 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Hongtao Liu <crazylht@gmail.com> writes:
> >> > +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> >> > +           && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))))
> >> > +               .is_constant (&l2)
> >> > +           && known_le (l1, l2)
> >>
> >> I'm not sure the last two &&s are really the important condition.
> >> I think we should drop them for the suggestion below.
> >>
> >
> > Changed, assume gcc also support something like (vec_select:v4di
> > (reg:v2di) (parallel [ (const_int 0) (const_int 1) (const_int 1)
> > (const_int 0)]))
> > as long as the range of selection guaranteed by
> >   || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
>
> Yeah, that vec_select looks OK.
>
> >>
> >> > +               if (!CONST_INT_P (idx))
> >>
> >> Here I think we should check:
> >>
> >>                       || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
> >>
> >> where:
> >>
> >>        poly_uint64 nunits
> >>          = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))).
> >>
> >
> > Changed.
> >
> >> This makes sure that all indices are in range.  In particular, it's
> >> valid for the SUBREG_REG to be narrower than mode, for appropriate
> >> vec_select indices
> >>
> >
> > Yes, that's what paradoxical subreg means.
>
> But I was comparing the mode of the vec_select with the mode of the
> SUBREG_REG (rather than the mode of trueop0 with the mode of the
> SUBREG_REG, which is what matters for paradoxical subregs).
>
> > +       /* Simplify vec_select of a subreg of X to just a vec_select of X
> > +          when X has same component mode as vec_select.  */
> > +       unsigned HOST_WIDE_INT subreg_offset = 0;
> > +       if (GET_CODE (trueop0) == SUBREG
> > +           && GET_MODE_INNER (mode)
> > +              == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0)))
> > +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
>
> Unnecessary brackets around “GET_MODE_NUNITS (mode)”.
>

Changed.

> > +           && constant_multiple_p (SUBREG_BYTE (trueop0),
> > +                                   GET_MODE_UNIT_BITSIZE (mode),
> > +                                   &subreg_offset))
>
> Sorry, my bad, this should be:
>
>               && constant_multiple_p (subreg_memory_offset (trueop0),
>                                       GET_MODE_UNIT_BITSIZE (mode),
>                                       &subreg_offset))
>

Changed.

> > +         {
> > +           gcc_assert (XVECLEN (trueop1, 0) == l1);
> > +           bool success = true;
> > +           poly_uint64 nunits
> > +             = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)));
> > +           for (int i = 0; i != l1; i++)
> > +             {
> > +               rtx idx = XVECEXP (trueop1, 0, i);
> > +               if (!CONST_INT_P (idx)
> > +                   || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
> > +                 {
> > +                   success = false;
> > +                   break;
> > +                 }
> > +             }
> > +           if (success)
> > +             {
> > +               rtx par = trueop1;
> > +               if (subreg_offset)
> > +                 {
> > +                   rtvec vec = rtvec_alloc (l1);
> > +                   for (int i = 0; i < l1; i++)
> > +                     RTVEC_ELT (vec, i)
> > +                       = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i)
> > +                                          + subreg_offset));
>
> This is applying subreg_offset to the pointer rather than the INTVAL.
> It should be:
>
>                           = GEN_INT (UINTVAL (XVECEXP (trueop1, 0, i))
>                                      + subreg_offset);
>

oops, sorry for typo and changed.

> OK with those changes, thanks.
>
> Richard



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
  2020-10-20 21:05                   ` Segher Boessenkool
@ 2020-10-21  3:17                     ` Hongtao Liu
  2020-10-21 15:43                       ` Richard Sandiford
  0 siblings, 1 reply; 20+ messages in thread
From: Hongtao Liu @ 2020-10-21  3:17 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Hongtao Liu via Gcc-patches, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 2024 bytes --]

On Wed, Oct 21, 2020 at 5:07 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Tue, Oct 20, 2020 at 11:20:48AM +0800, Hongtao Liu wrote:
> > +       unsigned HOST_WIDE_INT subreg_offset = 0;
> > +       if (GET_CODE (trueop0) == SUBREG
> > +           && GET_MODE_INNER (mode)
> > +              == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0)))
> > +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> > +           && constant_multiple_p (SUBREG_BYTE (trueop0),
> > +                                   GET_MODE_UNIT_BITSIZE (mode),
> > +                                   &subreg_offset))
> > +         {
> > +           gcc_assert (XVECLEN (trueop1, 0) == l1);
>
> Why?  If we want to check that, it should be in RTL checking (and maybe
> it already is!)
>

Yes, RTL checking would guarantee that and it should be removed.

> > +           bool success = true;
> > +           poly_uint64 nunits
> > +             = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)));
> > +           for (int i = 0; i != l1; i++)
> > +             {
> > +               rtx idx = XVECEXP (trueop1, 0, i);
> > +               if (!CONST_INT_P (idx)
> > +                   || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
>
> Can that ever happen in valid code?  This seems to just hide problems.
>

for rtx like (vec_select:v4di:(subreg:v8di (reg:v2di))
 (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)])),
It seems valid for rtl checking.

> > +                 {
> > +                   success = false;
> > +                   break;
> > +                 }
> > +             }
> > +           if (success)
>
> If you have a huge piece of code like this, factor it?  Esp. if you now
> need to have all kinds of booleans where you really just want to do
> early returns.
>

I want to jump out of this if branch, since later codes in this function
 won't simplify VEC_SELECT further when it matches my if condition,
it's ok to use ealry returns.

>
> Segher

Update patch.

-- 
BR,
Hongtao

[-- Attachment #2: 0001-Simplify-vec_select-of-a-subreg-of-X-to-just-a-vec-v6.patch --]
[-- Type: text/x-patch, Size: 3217 bytes --]

From e4e9c256efc636e994b0994c69cb0b4e7edc25a0 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Tue, 13 Oct 2020 15:35:29 +0800
Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of
 X.

gcc/ChangeLog
	PR rtl-optimization/97249
	* simplify-rtx.c (simplify_binary_operation_1): Simplify
	vec_select of a subreg of X to a vec_select of X.

gcc/testsuite/ChangeLog

	* gcc.target/i386/pr97249-1.c: New test.
---
 gcc/simplify-rtx.c                        | 34 +++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++++++
 2 files changed, 64 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 869f0d11b2e..947a9f37241 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4170,6 +4170,40 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 		    return subop1;
 		}
 	    }
+
+	  /* Simplify vec_select of a subreg of X to just a vec_select of X
+	     when X has same component mode as vec_select.  */
+	  unsigned HOST_WIDE_INT subreg_offset = 0;
+	  if (GET_CODE (trueop0) == SUBREG
+	      && GET_MODE_INNER (mode)
+		 == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0)))
+	      && GET_MODE_NUNITS (mode).is_constant (&l1)
+	      && constant_multiple_p (subreg_memory_offset (trueop0),
+				      GET_MODE_UNIT_BITSIZE (mode),
+				      &subreg_offset))
+	    {
+	      poly_uint64 nunits
+		= GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)));
+	      rtx par = trueop1;
+	      for (int i = 0; i != l1; i++)
+		{
+		  rtx idx = XVECEXP (trueop1, 0, i);
+		  if (!CONST_INT_P (idx)
+		      || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
+		    return 0;
+		}
+
+	      if (subreg_offset)
+		{
+		  rtvec vec = rtvec_alloc (l1);
+		  for (int i = 0; i < l1; i++)
+		    RTVEC_ELT (vec, i)
+		      = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i))
+				 + subreg_offset);
+		  par = gen_rtx_PARALLEL (VOIDmode, vec);
+		}
+	      return gen_rtx_VEC_SELECT (mode, SUBREG_REG (trueop0), par);
+	    }
 	}
 
       if (XVECLEN (trueop1, 0) == 1
diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c
new file mode 100644
index 00000000000..4478a34a9f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c
@@ -0,0 +1,30 @@
+/* PR target/97249  */
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O3 -masm=att" } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxbw[ \t]+\(.*%xmm[0-9]} 2 } } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxwd[ \t]+\(.*%xmm[0-9]} 2 } } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxdq[ \t]+\(.*%xmm[0-9]} 2 } } */
+
+void
+foo (unsigned char* p1, unsigned char* p2, short* __restrict p3)
+{
+    for (int i = 0 ; i != 8; i++)
+     p3[i] = p1[i] + p2[i];
+     return;
+}
+
+void
+foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3)
+{
+    for (int i = 0 ; i != 4; i++)
+     p3[i] = p1[i] + p2[i];
+     return;
+}
+
+void
+foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3)
+{
+    for (int i = 0 ; i != 2; i++)
+      p3[i] = (long long)p1[i] + (long long)p2[i];
+     return;
+}
-- 
2.18.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
  2020-10-21  3:17                     ` Hongtao Liu
@ 2020-10-21 15:43                       ` Richard Sandiford
  2020-10-21 16:34                         ` Segher Boessenkool
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Sandiford @ 2020-10-21 15:43 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Segher Boessenkool, Hongtao Liu via Gcc-patches

Hongtao Liu <crazylht@gmail.com> writes:
> +	      poly_uint64 nunits
> +		= GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)));
> +	      rtx par = trueop1;
> +	      for (int i = 0; i != l1; i++)
> +		{
> +		  rtx idx = XVECEXP (trueop1, 0, i);
> +		  if (!CONST_INT_P (idx)
> +		      || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
> +		    return 0;
> +		}

I think the previous version was better.  We shouldn't assume that
further simplification rules will fail just because the conditions
for this rule haven't been met.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
  2020-10-21 15:43                       ` Richard Sandiford
@ 2020-10-21 16:34                         ` Segher Boessenkool
  2020-10-22  3:33                           ` Hongtao Liu
  0 siblings, 1 reply; 20+ messages in thread
From: Segher Boessenkool @ 2020-10-21 16:34 UTC (permalink / raw)
  To: Hongtao Liu, Hongtao Liu via Gcc-patches, richard.sandiford

On Wed, Oct 21, 2020 at 04:43:29PM +0100, Richard Sandiford wrote:
> Hongtao Liu <crazylht@gmail.com> writes:
> > +	      poly_uint64 nunits
> > +		= GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)));
> > +	      rtx par = trueop1;
> > +	      for (int i = 0; i != l1; i++)
> > +		{
> > +		  rtx idx = XVECEXP (trueop1, 0, i);
> > +		  if (!CONST_INT_P (idx)
> > +		      || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
> > +		    return 0;
> > +		}
> 
> I think the previous version was better.  We shouldn't assume that
> further simplification rules will fail just because the conditions
> for this rule haven't been met.

Yes.  My suggestion was to factor this big piece of code to a separate
function, and do an early return from *that*.

The patch is okay for trunk without that, with the clumsy booleans.
Thanks Hongtao!


Segher

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
  2020-10-21 16:34                         ` Segher Boessenkool
@ 2020-10-22  3:33                           ` Hongtao Liu
  0 siblings, 0 replies; 20+ messages in thread
From: Hongtao Liu @ 2020-10-22  3:33 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Hongtao Liu via Gcc-patches, Richard Sandiford

On Thu, Oct 22, 2020 at 12:36 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Wed, Oct 21, 2020 at 04:43:29PM +0100, Richard Sandiford wrote:
> > Hongtao Liu <crazylht@gmail.com> writes:
> > > +         poly_uint64 nunits
> > > +           = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)));
> > > +         rtx par = trueop1;
> > > +         for (int i = 0; i != l1; i++)
> > > +           {
> > > +             rtx idx = XVECEXP (trueop1, 0, i);
> > > +             if (!CONST_INT_P (idx)
> > > +                 || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
> > > +               return 0;
> > > +           }
> >
> > I think the previous version was better.  We shouldn't assume that
> > further simplification rules will fail just because the conditions
> > for this rule haven't been met.
>
> Yes.  My suggestion was to factor this big piece of code to a separate
> function, and do an early return from *that*.
>
> The patch is okay for trunk without that, with the clumsy booleans.
> Thanks Hongtao!
>
>
> Segher

Thank you both for the review, I'll commit the patch with *bool success* keeped.

-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-10-22  3:31 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-13  8:40 [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg Hongtao Liu
2020-10-13 19:59 ` Segher Boessenkool
2020-10-14  5:43   ` Hongtao Liu
2020-10-14 17:35     ` Segher Boessenkool
2020-10-14 17:55       ` Richard Biener
2020-10-14 19:23         ` Segher Boessenkool
2020-10-15  8:14       ` Hongtao Liu
2020-10-15  9:58         ` Hongtao Liu
2020-10-15 12:38           ` Richard Sandiford
2020-10-19  5:18             ` Hongtao Liu
2020-10-19 15:31               ` Richard Sandiford
2020-10-20  3:20                 ` Hongtao Liu
2020-10-20 16:42                   ` Richard Sandiford
2020-10-21  2:43                     ` Hongtao Liu
2020-10-20 21:05                   ` Segher Boessenkool
2020-10-21  3:17                     ` Hongtao Liu
2020-10-21 15:43                       ` Richard Sandiford
2020-10-21 16:34                         ` Segher Boessenkool
2020-10-22  3:33                           ` Hongtao Liu
2020-10-20 20:43         ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).