* [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. @ 2020-10-13 8:40 Hongtao Liu 2020-10-13 19:59 ` Segher Boessenkool 0 siblings, 1 reply; 20+ messages in thread From: Hongtao Liu @ 2020-10-13 8:40 UTC (permalink / raw) To: GCC Patches, Segher Boessenkool [-- Attachment #1: Type: text/plain, Size: 469 bytes --] Hi: For rtx like (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) (parallel [(const_int 0) (const_int 1)])) it could be simplified as inner. Bootstrap is ok, regression test on i386 backend is ok. gcc/ChangeLog PR rtl-optimization/97249 * simplify-rtx.c (simplify_binary_operation_1): Simplify vec_select of paradoxical subreg. gcc/testsuite/ChangeLog * gcc.target/i386/pr97249-1.c: New test. -- BR, Hongtao [-- Attachment #2: 0001-Simplify-vec_select-of-paradoxical-subreg.patch --] [-- Type: text/x-patch, Size: 3180 bytes --] From c00369aa36d2e169b59287c58872c915953dd2a2 Mon Sep 17 00:00:00 2001 From: liuhongt <hongtao.liu@intel.com> Date: Tue, 13 Oct 2020 15:35:29 +0800 Subject: [PATCH] Simplify vec_select of paradoxical subreg. For rtx like (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) (parallel [(const_int 0) (const_int 1)])) it could be simplified as inner. gcc/ChangeLog PR rtl-optimization/97249 * simplify-rtx.c (simplify_binary_operation_1): Simplify vec_select of paradoxical subreg. gcc/testsuite/ChangeLog * gcc.target/i386/pr97249-1.c: New test. --- gcc/simplify-rtx.c | 27 ++++++++++++++++++++ gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 +++++++++++++++++++++++ 2 files changed, 57 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index 869f0d11b2e..9c397157f28 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -4170,6 +4170,33 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode, return subop1; } } + + /* For cases like + (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) + (parallel [(const_int 0) (const_int 1)])). + return inner directly. */ + if (GET_CODE (trueop0) == SUBREG + && paradoxical_subreg_p (trueop0) + && mode == GET_MODE (XEXP (trueop0, 0)) + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) + && (GET_MODE_NUNITS (mode)).is_constant (&l1) + && l0 % l1 == 0) + { + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); + unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1; + unsigned HOST_WIDE_INT sel = 0; + int i = 0; + for (;i != l1; i++) + { + rtx j = XVECEXP (trueop1, 0, i); + if (!CONST_INT_P (j)) + break; + sel |= HOST_WIDE_INT_1U << UINTVAL (j); + } + /* ??? Need to simplify XEXP (trueop0, 0) here. */ + if (sel == expect) + return XEXP (trueop0, 0); + } } if (XVECLEN (trueop1, 0) == 1 diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c new file mode 100644 index 00000000000..bc34aa8baa6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c @@ -0,0 +1,30 @@ +/* PR target/97249 */ +/* { dg-do compile } */ +/* { dg-options "-mavx2 -O3 -masm=att" } */ +/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ + +void +foo (unsigned char* p1, unsigned char* p2, short* __restrict p3) +{ + for (int i = 0 ; i != 8; i++) + p3[i] = p1[i] + p2[i]; + return; +} + +void +foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3) +{ + for (int i = 0 ; i != 4; i++) + p3[i] = p1[i] + p2[i]; + return; +} + +void +foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3) +{ + for (int i = 0 ; i != 2; i++) + p3[i] = (long long)p1[i] + (long long)p2[i]; + return; +} -- 2.18.1 [-- Attachment #3: 0001-Simplify-vec_select-of-paradoxical-subreg.patch --] [-- Type: text/x-patch, Size: 3180 bytes --] From c00369aa36d2e169b59287c58872c915953dd2a2 Mon Sep 17 00:00:00 2001 From: liuhongt <hongtao.liu@intel.com> Date: Tue, 13 Oct 2020 15:35:29 +0800 Subject: [PATCH] Simplify vec_select of paradoxical subreg. For rtx like (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) (parallel [(const_int 0) (const_int 1)])) it could be simplified as inner. gcc/ChangeLog PR rtl-optimization/97249 * simplify-rtx.c (simplify_binary_operation_1): Simplify vec_select of paradoxical subreg. gcc/testsuite/ChangeLog * gcc.target/i386/pr97249-1.c: New test. --- gcc/simplify-rtx.c | 27 ++++++++++++++++++++ gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 +++++++++++++++++++++++ 2 files changed, 57 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index 869f0d11b2e..9c397157f28 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -4170,6 +4170,33 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode, return subop1; } } + + /* For cases like + (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) + (parallel [(const_int 0) (const_int 1)])). + return inner directly. */ + if (GET_CODE (trueop0) == SUBREG + && paradoxical_subreg_p (trueop0) + && mode == GET_MODE (XEXP (trueop0, 0)) + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) + && (GET_MODE_NUNITS (mode)).is_constant (&l1) + && l0 % l1 == 0) + { + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); + unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1; + unsigned HOST_WIDE_INT sel = 0; + int i = 0; + for (;i != l1; i++) + { + rtx j = XVECEXP (trueop1, 0, i); + if (!CONST_INT_P (j)) + break; + sel |= HOST_WIDE_INT_1U << UINTVAL (j); + } + /* ??? Need to simplify XEXP (trueop0, 0) here. */ + if (sel == expect) + return XEXP (trueop0, 0); + } } if (XVECLEN (trueop1, 0) == 1 diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c new file mode 100644 index 00000000000..bc34aa8baa6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c @@ -0,0 +1,30 @@ +/* PR target/97249 */ +/* { dg-do compile } */ +/* { dg-options "-mavx2 -O3 -masm=att" } */ +/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ + +void +foo (unsigned char* p1, unsigned char* p2, short* __restrict p3) +{ + for (int i = 0 ; i != 8; i++) + p3[i] = p1[i] + p2[i]; + return; +} + +void +foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3) +{ + for (int i = 0 ; i != 4; i++) + p3[i] = p1[i] + p2[i]; + return; +} + +void +foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3) +{ + for (int i = 0 ; i != 2; i++) + p3[i] = (long long)p1[i] + (long long)p2[i]; + return; +} -- 2.18.1 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. 2020-10-13 8:40 [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg Hongtao Liu @ 2020-10-13 19:59 ` Segher Boessenkool 2020-10-14 5:43 ` Hongtao Liu 0 siblings, 1 reply; 20+ messages in thread From: Segher Boessenkool @ 2020-10-13 19:59 UTC (permalink / raw) To: Hongtao Liu; +Cc: GCC Patches, H. J. Lu, wwwhhhyyy333 Hi! On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote: > For rtx like > (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) > (parallel [(const_int 0) (const_int 1)])) > it could be simplified as inner. You could even simplify any vec_select of a subreg of X to just a vec_select of X, by changing the selection vector a bit (well, only do this if that is a constant vector, I suppose). Not just for paradoxical subregs either, just for *all* subregs. > gcc/ChangeLog > PR rtl-optimization/97249 > * simplify-rtx.c (simplify_binary_operation_1): Simplify > vec_select of paradoxical subreg. > > gcc/testsuite/ChangeLog > > * gcc.target/i386/pr97249-1.c: New test. > + /* For cases like > + (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) > + (parallel [(const_int 0) (const_int 1)])). > + return inner directly. */ > + if (GET_CODE (trueop0) == SUBREG > + && paradoxical_subreg_p (trueop0) > + && mode == GET_MODE (XEXP (trueop0, 0)) > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > + && l0 % l1 == 0) Why this? Why does the number of elements of the input have to divide that of the output? > + { > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); > + unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1; > + unsigned HOST_WIDE_INT sel = 0; > + int i = 0; > + for (;i != l1; i++) for (int i = 0; i != l1; i++) > + { > + rtx j = XVECEXP (trueop1, 0, i); > + if (!CONST_INT_P (j)) > + break; > + sel |= HOST_WIDE_INT_1U << UINTVAL (j); > + } > + /* ??? Need to simplify XEXP (trueop0, 0) here. */ > + if (sel == expect) > + return XEXP (trueop0, 0); > + } > } If you just handle the much more generic case, all the other vec_select simplifications can be done as well, not just this one. > +/* PR target/97249 */ > +/* { dg-do compile } */ > +/* { dg-options "-mavx2 -O3 -masm=att" } */ > +/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ I don't know enough about the x86 backend to know if this is exactly what you need in the testsuite. I do know a case of backslashitis when I see one though -- you might want to use {} instead of "", and perhaps \m and \M and \s etc. And to make sure things are on one line, don't do all that nastiness with [^\n], just start the RE with (?n) :-) Segher ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. 2020-10-13 19:59 ` Segher Boessenkool @ 2020-10-14 5:43 ` Hongtao Liu 2020-10-14 17:35 ` Segher Boessenkool 0 siblings, 1 reply; 20+ messages in thread From: Hongtao Liu @ 2020-10-14 5:43 UTC (permalink / raw) To: Segher Boessenkool; +Cc: GCC Patches, H. J. Lu, wwwhhhyyy333 [-- Attachment #1: Type: text/plain, Size: 3502 bytes --] On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > Hi! > > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote: > > For rtx like > > (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) > > (parallel [(const_int 0) (const_int 1)])) > > it could be simplified as inner. > > You could even simplify any vec_select of a subreg of X to just a > vec_select of X, by changing the selection vector a bit (well, only do Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to selection. > this if that is a constant vector, I suppose). Not just for paradoxical > subregs either, just for *all* subregs. > Yes, and only when X has the same inner mode and more elements. > > gcc/ChangeLog > > PR rtl-optimization/97249 > > * simplify-rtx.c (simplify_binary_operation_1): Simplify > > vec_select of paradoxical subreg. > > > > gcc/testsuite/ChangeLog > > > > * gcc.target/i386/pr97249-1.c: New test. > > > + /* For cases like > > + (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) > > + (parallel [(const_int 0) (const_int 1)])). > > + return inner directly. */ > > + if (GET_CODE (trueop0) == SUBREG > > + && paradoxical_subreg_p (trueop0) > > + && mode == GET_MODE (XEXP (trueop0, 0)) > > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) > > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > > + && l0 % l1 == 0) > > Why this? Why does the number of elements of the input have to divide > that of the output? > Removed, also add condition for my upper comments. > > + { > > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); > > + unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1; > > + unsigned HOST_WIDE_INT sel = 0; > > + int i = 0; > > + for (;i != l1; i++) > > for (int i = 0; i != l1; i++) > > > + { > > + rtx j = XVECEXP (trueop1, 0, i); > > + if (!CONST_INT_P (j)) > > + break; > > + sel |= HOST_WIDE_INT_1U << UINTVAL (j); > > + } > > + /* ??? Need to simplify XEXP (trueop0, 0) here. */ > > + if (sel == expect) > > + return XEXP (trueop0, 0); > > + } > > } > > If you just handle the much more generic case, all the other vec_select > simplifications can be done as well, not just this one. > Yes, changed, also selection should be inside the elements of X. > > +/* PR target/97249 */ > > +/* { dg-do compile } */ > > +/* { dg-options "-mavx2 -O3 -masm=att" } */ > > +/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ > > +/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ > > +/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ > > I don't know enough about the x86 backend to know if this is exactly > what you need in the testsuite. I do know a case of backslashitis when > I see one though -- you might want to use {} instead of "", and perhaps > \m and \M and \s etc. And to make sure things are on one line, don't do > all that nastiness with [^\n], just start the RE with (?n) :-) > Yes, changed and it's very clean with usage of (?n) and {}. > > Segher Update patch. -- BR, Hongtao [-- Attachment #2: 0001-Simplify-vec_select-of-a-subreg-of-X-to-just-a-vec_s-v2.patch --] [-- Type: text/x-patch, Size: 3473 bytes --] From df71eb46e394e5b778c69e9e8f25b301997e365d Mon Sep 17 00:00:00 2001 From: liuhongt <hongtao.liu@intel.com> Date: Tue, 13 Oct 2020 15:35:29 +0800 Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of X. gcc/ChangeLog PR rtl-optimization/97249 * simplify-rtx.c (simplify_binary_operation_1): Simplify vec_select of a subreg of X to a vec_select of X when available. gcc/testsuite/ChangeLog * gcc.target/i386/pr97249-1.c: New test. --- gcc/simplify-rtx.c | 44 +++++++++++++++++++++++ gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++ 2 files changed, 74 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index 869f0d11b2e..8a10b6cf4d5 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -4170,6 +4170,50 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode, return subop1; } } + + /* Simplify vec_select of a subreg of X to just a vec_select of X + when available. */ + int l2; + if (GET_CODE (trueop0) == SUBREG + && (GET_MODE_INNER (mode) + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))) + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) + && (GET_MODE_NUNITS (mode)).is_constant (&l1) + && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0)))) + .is_constant (&l2) + && known_le (l1, l2)) + { + unsigned HOST_WIDE_INT subreg_offset = 0; + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); + gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0), + GET_MODE_SIZE (GET_MODE_INNER (mode)), + &subreg_offset)); + bool success = true; + for (int i = 0;i != l1; i++) + { + rtx j = XVECEXP (trueop1, 0, i); + if (!CONST_INT_P (j) + || known_ge (UINTVAL (j), l2 - subreg_offset)) + { + success = false; + break; + } + } + if (success) + { + rtx par = trueop1; + if (subreg_offset) + { + rtvec vec = rtvec_alloc (l1); + for (int i = 0; i < l1; i++) + RTVEC_ELT (vec, i) + = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i) + + subreg_offset)); + par = gen_rtx_PARALLEL (VOIDmode, vec); + } + return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par); + } + } } if (XVECLEN (trueop1, 0) == 1 diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c new file mode 100644 index 00000000000..4478a34a9f8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c @@ -0,0 +1,30 @@ +/* PR target/97249 */ +/* { dg-do compile } */ +/* { dg-options "-mavx2 -O3 -masm=att" } */ +/* { dg-final { scan-assembler-times {(?n)vpmovzxbw[ \t]+\(.*%xmm[0-9]} 2 } } */ +/* { dg-final { scan-assembler-times {(?n)vpmovzxwd[ \t]+\(.*%xmm[0-9]} 2 } } */ +/* { dg-final { scan-assembler-times {(?n)vpmovzxdq[ \t]+\(.*%xmm[0-9]} 2 } } */ + +void +foo (unsigned char* p1, unsigned char* p2, short* __restrict p3) +{ + for (int i = 0 ; i != 8; i++) + p3[i] = p1[i] + p2[i]; + return; +} + +void +foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3) +{ + for (int i = 0 ; i != 4; i++) + p3[i] = p1[i] + p2[i]; + return; +} + +void +foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3) +{ + for (int i = 0 ; i != 2; i++) + p3[i] = (long long)p1[i] + (long long)p2[i]; + return; +} -- 2.18.1 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. 2020-10-14 5:43 ` Hongtao Liu @ 2020-10-14 17:35 ` Segher Boessenkool 2020-10-14 17:55 ` Richard Biener 2020-10-15 8:14 ` Hongtao Liu 0 siblings, 2 replies; 20+ messages in thread From: Segher Boessenkool @ 2020-10-14 17:35 UTC (permalink / raw) To: Hongtao Liu; +Cc: GCC Patches, H. J. Lu, wwwhhhyyy333 Hi! On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote: > On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool > <segher@kernel.crashing.org> wrote: > > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote: > > > For rtx like > > > (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) > > > (parallel [(const_int 0) (const_int 1)])) > > > it could be simplified as inner. > > > > You could even simplify any vec_select of a subreg of X to just a > > vec_select of X, by changing the selection vector a bit (well, only do > > Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to selection. Exactly. > > this if that is a constant vector, I suppose). Not just for paradoxical > > subregs either, just for *all* subregs. > > Yes, and only when X has the same inner mode and more elements. No, for *all*. The mode of the first argument of vec_select does not have to equal its result mode. Any (constant indices) vec_select of a subreg can be written as just a vec_select. > + /* Simplify vec_select of a subreg of X to just a vec_select of X > + when available. */ What does "when available" mean here? > + int l2; > + if (GET_CODE (trueop0) == SUBREG > + && (GET_MODE_INNER (mode) > + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))) Don't use unnecessary parens here please, it makes it harder to read (there are quite enough parens already :-) ) > + gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0), > + GET_MODE_SIZE (GET_MODE_INNER (mode)), > + &subreg_offset)); Why is this needed? > + bool success = true; > + for (int i = 0;i != l1; i++) (space after ; ) > + { > + rtx j = XVECEXP (trueop1, 0, i); (i and j and k ususally are integers, not rtx) > + if (!CONST_INT_P (j) > + || known_ge (UINTVAL (j), l2 - subreg_offset)) > + { > + success = false; > + break; > + } > + } You don't have to test if the input RTL is valid. You can assume it is. > + if (success) > + { > + rtx par = trueop1; > + if (subreg_offset) > + { > + rtvec vec = rtvec_alloc (l1); > + for (int i = 0; i < l1; i++) > + RTVEC_ELT (vec, i) > + = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i) > + + subreg_offset)); > + par = gen_rtx_PARALLEL (VOIDmode, vec); > + } > + return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par); > + } > + } subreg_offset will differ in meaning if big-endian; is this correct there, do all the stars align so this code works out fine there as well? Looks fine otherwise, thanks :-) Segher ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. 2020-10-14 17:35 ` Segher Boessenkool @ 2020-10-14 17:55 ` Richard Biener 2020-10-14 19:23 ` Segher Boessenkool 2020-10-15 8:14 ` Hongtao Liu 1 sibling, 1 reply; 20+ messages in thread From: Richard Biener @ 2020-10-14 17:55 UTC (permalink / raw) To: gcc-patches, Segher Boessenkool, Hongtao Liu; +Cc: GCC Patches On October 14, 2020 7:35:32 PM GMT+02:00, Segher Boessenkool <segher@kernel.crashing.org> wrote: >Hi! > >On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote: >> On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool >> <segher@kernel.crashing.org> wrote: >> > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote: >> > > For rtx like >> > > (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) >> > > (parallel [(const_int 0) (const_int 1)])) >> > > it could be simplified as inner. >> > >> > You could even simplify any vec_select of a subreg of X to just a >> > vec_select of X, by changing the selection vector a bit (well, only >do >> >> Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to >selection. > >Exactly. > >> > this if that is a constant vector, I suppose). Not just for >paradoxical >> > subregs either, just for *all* subregs. >> >> Yes, and only when X has the same inner mode and more elements. > >No, for *all*. The mode of the first argument of vec_select does not >have to equal its result mode. But IIRC the component mode needs to match. >Any (constant indices) vec_select of a subreg can be written as just a >vec_select. > >> + /* Simplify vec_select of a subreg of X to just a vec_select of X >> + when available. */ > >What does "when available" mean here? > >> + int l2; >> + if (GET_CODE (trueop0) == SUBREG >> + && (GET_MODE_INNER (mode) >> + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))) > >Don't use unnecessary parens here please, it makes it harder to read >(there are quite enough parens already :-) ) > >> + gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0), >> + GET_MODE_SIZE (GET_MODE_INNER (mode)), >> + &subreg_offset)); > >Why is this needed? > >> + bool success = true; >> + for (int i = 0;i != l1; i++) > >(space after ; ) > >> + { >> + rtx j = XVECEXP (trueop1, 0, i); > >(i and j and k ususally are integers, not rtx) > >> + if (!CONST_INT_P (j) >> + || known_ge (UINTVAL (j), l2 - subreg_offset)) >> + { >> + success = false; >> + break; >> + } >> + } > >You don't have to test if the input RTL is valid. You can assume it >is. > >> + if (success) >> + { >> + rtx par = trueop1; >> + if (subreg_offset) >> + { >> + rtvec vec = rtvec_alloc (l1); >> + for (int i = 0; i < l1; i++) >> + RTVEC_ELT (vec, i) >> + = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i) >> + + subreg_offset)); >> + par = gen_rtx_PARALLEL (VOIDmode, vec); >> + } >> + return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par); >> + } >> + } > >subreg_offset will differ in meaning if big-endian; is this correct >there, do all the stars align so this code works out fine there as >well? > >Looks fine otherwise, thanks :-) > > >Segher ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. 2020-10-14 17:55 ` Richard Biener @ 2020-10-14 19:23 ` Segher Boessenkool 0 siblings, 0 replies; 20+ messages in thread From: Segher Boessenkool @ 2020-10-14 19:23 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches, Hongtao Liu On Wed, Oct 14, 2020 at 07:55:55PM +0200, Richard Biener wrote: > On October 14, 2020 7:35:32 PM GMT+02:00, Segher Boessenkool <segher@kernel.crashing.org> wrote: > >On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote: > >> On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool > >> <segher@kernel.crashing.org> wrote: > >> > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote: > >> > > For rtx like > >> > > (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) > >> > > (parallel [(const_int 0) (const_int 1)])) > >> > > it could be simplified as inner. > >> > > >> > You could even simplify any vec_select of a subreg of X to just a > >> > vec_select of X, by changing the selection vector a bit (well, only > >do > >> > >> Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to > >selection. > > > >Exactly. > > > >> > this if that is a constant vector, I suppose). Not just for > >paradoxical > >> > subregs either, just for *all* subregs. > >> > >> Yes, and only when X has the same inner mode and more elements. > > > >No, for *all*. The mode of the first argument of vec_select does not > >have to equal its result mode. > > But IIRC the component mode needs to match. Yeah, good point, at least the i386 backend uses crazy subregs, which is why validate_subreg does not test this :-( Segher ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. 2020-10-14 17:35 ` Segher Boessenkool 2020-10-14 17:55 ` Richard Biener @ 2020-10-15 8:14 ` Hongtao Liu 2020-10-15 9:58 ` Hongtao Liu 2020-10-20 20:43 ` Segher Boessenkool 1 sibling, 2 replies; 20+ messages in thread From: Hongtao Liu @ 2020-10-15 8:14 UTC (permalink / raw) To: Segher Boessenkool; +Cc: GCC Patches, H. J. Lu, Hongyu Wang On Thu, Oct 15, 2020 at 1:37 AM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > Hi! > > On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote: > > On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool > > <segher@kernel.crashing.org> wrote: > > > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote: > > > > For rtx like > > > > (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) > > > > (parallel [(const_int 0) (const_int 1)])) > > > > it could be simplified as inner. > > > > > > You could even simplify any vec_select of a subreg of X to just a > > > vec_select of X, by changing the selection vector a bit (well, only do > > > > Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to selection. > > Exactly. > > > > this if that is a constant vector, I suppose). Not just for paradoxical > > > subregs either, just for *all* subregs. > > > > Yes, and only when X has the same inner mode and more elements. > > No, for *all*. The mode of the first argument of vec_select does not > have to equal its result mode. > > Any (constant indices) vec_select of a subreg can be written as just a > vec_select. > > > + /* Simplify vec_select of a subreg of X to just a vec_select of X > > + when available. */ > > What does "when available" mean here? > When X has same component mode as vec_select, i'll add this to comments. > > + int l2; > > + if (GET_CODE (trueop0) == SUBREG > > + && (GET_MODE_INNER (mode) > > + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))) > > Don't use unnecessary parens here please, it makes it harder to read > (there are quite enough parens already :-) ) > Yes. > > + gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0), > > + GET_MODE_SIZE (GET_MODE_INNER (mode)), > > + &subreg_offset)); > > Why is this needed? I only found this interface for poly_uint64 division to get subreg_offset. > > > + bool success = true; > > + for (int i = 0;i != l1; i++) > > (space after ; ) > > > + { > > + rtx j = XVECEXP (trueop1, 0, i); > > (i and j and k ususally are integers, not rtx) > > > + if (!CONST_INT_P (j) > > + || known_ge (UINTVAL (j), l2 - subreg_offset)) > > + { > > + success = false; > > + break; > > + } > > + } > > You don't have to test if the input RTL is valid. You can assume it is. > This test is for something like (vec_select:v2di (subreg:v4di (reg:v2di) 0)(parallel [ (const_int 2) (const_int 3)])). const_int 2 here is out of range. Are you meaning the upper rtx wouldn't exist? > > + if (success) > > + { > > + rtx par = trueop1; > > + if (subreg_offset) > > + { > > + rtvec vec = rtvec_alloc (l1); > > + for (int i = 0; i < l1; i++) > > + RTVEC_ELT (vec, i) > > + = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i) > > + + subreg_offset)); > > + par = gen_rtx_PARALLEL (VOIDmode, vec); > > + } > > + return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par); > > + } > > + } > > subreg_offset will differ in meaning if big-endian; is this correct Yes. > there, do all the stars align so this code works out fine there as well? > i found it's a bit tricky to adjust selection index for target BYTES_BIG_ENDIA != WORDS_BIG_ENDIAN. Especially for component mode smaller than word, Any interface to handle this? > Looks fine otherwise, thanks :-) > > > Segher -- BR, Hongtao ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. 2020-10-15 8:14 ` Hongtao Liu @ 2020-10-15 9:58 ` Hongtao Liu 2020-10-15 12:38 ` Richard Sandiford 2020-10-20 20:43 ` Segher Boessenkool 1 sibling, 1 reply; 20+ messages in thread From: Hongtao Liu @ 2020-10-15 9:58 UTC (permalink / raw) To: Segher Boessenkool; +Cc: GCC Patches, H. J. Lu, Hongyu Wang [-- Attachment #1: Type: text/plain, Size: 4294 bytes --] On Thu, Oct 15, 2020 at 4:14 PM Hongtao Liu <crazylht@gmail.com> wrote: > > On Thu, Oct 15, 2020 at 1:37 AM Segher Boessenkool > <segher@kernel.crashing.org> wrote: > > > > Hi! > > > > On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote: > > > On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool > > > <segher@kernel.crashing.org> wrote: > > > > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote: > > > > > For rtx like > > > > > (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) > > > > > (parallel [(const_int 0) (const_int 1)])) > > > > > it could be simplified as inner. > > > > > > > > You could even simplify any vec_select of a subreg of X to just a > > > > vec_select of X, by changing the selection vector a bit (well, only do > > > > > > Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to selection. > > > > Exactly. > > > > > > this if that is a constant vector, I suppose). Not just for paradoxical > > > > subregs either, just for *all* subregs. > > > > > > Yes, and only when X has the same inner mode and more elements. > > > > No, for *all*. The mode of the first argument of vec_select does not > > have to equal its result mode. > > > > Any (constant indices) vec_select of a subreg can be written as just a > > vec_select. > > > > > + /* Simplify vec_select of a subreg of X to just a vec_select of X > > > + when available. */ > > > > What does "when available" mean here? > > > > When X has same component mode as vec_select, i'll add this to comments. > > > > + int l2; > > > + if (GET_CODE (trueop0) == SUBREG > > > + && (GET_MODE_INNER (mode) > > > + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))) > > > > Don't use unnecessary parens here please, it makes it harder to read > > (there are quite enough parens already :-) ) > > > > Yes. > > > > + gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0), > > > + GET_MODE_SIZE (GET_MODE_INNER (mode)), > > > + &subreg_offset)); > > > > Why is this needed? > > I only found this interface for poly_uint64 division to get subreg_offset. > > > > > > + bool success = true; > > > + for (int i = 0;i != l1; i++) > > > > (space after ; ) > > > > > + { > > > + rtx j = XVECEXP (trueop1, 0, i); > > > > (i and j and k ususally are integers, not rtx) > > > > > + if (!CONST_INT_P (j) > > > + || known_ge (UINTVAL (j), l2 - subreg_offset)) > > > + { > > > + success = false; > > > + break; > > > + } > > > + } > > > > You don't have to test if the input RTL is valid. You can assume it is. > > > > This test is for something like (vec_select:v2di (subreg:v4di > (reg:v2di) 0)(parallel [ (const_int 2) (const_int 3)])). > const_int 2 here is out of range. Are you meaning the upper rtx wouldn't exist? > Remove the test. > > > + if (success) > > > + { > > > + rtx par = trueop1; > > > + if (subreg_offset) > > > + { > > > + rtvec vec = rtvec_alloc (l1); > > > + for (int i = 0; i < l1; i++) > > > + RTVEC_ELT (vec, i) > > > + = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i) > > > + + subreg_offset)); > > > + par = gen_rtx_PARALLEL (VOIDmode, vec); > > > + } > > > + return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par); > > > + } > > > + } > > > > subreg_offset will differ in meaning if big-endian; is this correct > Yes. > > there, do all the stars align so this code works out fine there as well? > > > > i found it's a bit tricky to adjust selection index for target > BYTES_BIG_ENDIA != WORDS_BIG_ENDIAN. > Especially for component mode smaller than word, Any interface to handle this? > Use subreg_lsb to get offset from least significant bits, suppose this could be independent of big/little-endian. > > Looks fine otherwise, thanks :-) > > > > > > Segher > > > > -- > BR, > Hongtao Update Patch. -- BR, Hongtao [-- Attachment #2: 0001-Simplify-vec_select-of-a-subreg-of-X-to-just-a-vec-v3.patch --] [-- Type: text/x-patch, Size: 3477 bytes --] From 2b56fed24fe7ef7f49bcac131bbc7f51f36b0a6a Mon Sep 17 00:00:00 2001 From: liuhongt <hongtao.liu@intel.com> Date: Tue, 13 Oct 2020 15:35:29 +0800 Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of X. gcc/ChangeLog PR rtl-optimization/97249 * simplify-rtx.c (simplify_binary_operation_1): Simplify vec_select of a subreg of X to a vec_select of X when available. gcc/testsuite/ChangeLog * gcc.target/i386/pr97249-1.c: New test. --- gcc/simplify-rtx.c | 43 +++++++++++++++++++++++ gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++ 2 files changed, 73 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index 869f0d11b2e..ff831bcec8f 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -4170,6 +4170,49 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode, return subop1; } } + + /* Simplify vec_select of a subreg of X to just a vec_select of X + when X has same component mode as vec_select. */ + int l2; + if (GET_CODE (trueop0) == SUBREG + && GET_MODE_INNER (mode) + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))) + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) + && (GET_MODE_NUNITS (mode)).is_constant (&l1) + && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0)))) + .is_constant (&l2) + && known_le (l1, l2)) + { + unsigned HOST_WIDE_INT subreg_offset = 0; + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); + gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT), + GET_MODE_SIZE (GET_MODE_INNER (mode)), + &subreg_offset)); + bool success = true; + for (int i = 0; i != l1; i++) + { + rtx idx = XVECEXP (trueop1, 0, i); + if (!CONST_INT_P (idx)) + { + success = false; + break; + } + } + if (success) + { + rtx par = trueop1; + if (subreg_offset) + { + rtvec vec = rtvec_alloc (l1); + for (int i = 0; i < l1; i++) + RTVEC_ELT (vec, i) + = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i) + + subreg_offset)); + par = gen_rtx_PARALLEL (VOIDmode, vec); + } + return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par); + } + } } if (XVECLEN (trueop1, 0) == 1 diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c new file mode 100644 index 00000000000..4478a34a9f8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c @@ -0,0 +1,30 @@ +/* PR target/97249 */ +/* { dg-do compile } */ +/* { dg-options "-mavx2 -O3 -masm=att" } */ +/* { dg-final { scan-assembler-times {(?n)vpmovzxbw[ \t]+\(.*%xmm[0-9]} 2 } } */ +/* { dg-final { scan-assembler-times {(?n)vpmovzxwd[ \t]+\(.*%xmm[0-9]} 2 } } */ +/* { dg-final { scan-assembler-times {(?n)vpmovzxdq[ \t]+\(.*%xmm[0-9]} 2 } } */ + +void +foo (unsigned char* p1, unsigned char* p2, short* __restrict p3) +{ + for (int i = 0 ; i != 8; i++) + p3[i] = p1[i] + p2[i]; + return; +} + +void +foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3) +{ + for (int i = 0 ; i != 4; i++) + p3[i] = p1[i] + p2[i]; + return; +} + +void +foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3) +{ + for (int i = 0 ; i != 2; i++) + p3[i] = (long long)p1[i] + (long long)p2[i]; + return; +} -- 2.18.1 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. 2020-10-15 9:58 ` Hongtao Liu @ 2020-10-15 12:38 ` Richard Sandiford 2020-10-19 5:18 ` Hongtao Liu 0 siblings, 1 reply; 20+ messages in thread From: Richard Sandiford @ 2020-10-15 12:38 UTC (permalink / raw) To: Hongtao Liu via Gcc-patches; +Cc: Segher Boessenkool, Hongtao Liu Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > + /* Simplify vec_select of a subreg of X to just a vec_select of X > + when X has same component mode as vec_select. */ > + int l2; > + if (GET_CODE (trueop0) == SUBREG > + && GET_MODE_INNER (mode) > + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))) Better to use SUBREG_REG here and below. > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > + && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0)))) > + .is_constant (&l2) > + && known_le (l1, l2)) > + { > + unsigned HOST_WIDE_INT subreg_offset = 0; > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); > + gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT), > + GET_MODE_SIZE (GET_MODE_INNER (mode)), > + &subreg_offset)); can_div_trunc_p discards the remainder, whereas it looks like here you want an exact multiple. I don't think it's absolutely guaranteed that the “if” condition makes the division by GET_MODE_SIZE exact. E.g. in principle you could have a subreg of a vector of TIs in which the subreg offset is misaligned by a DI offset. I'm not sure the subreg_lsb conversion is correct though. On big-endian targets, lane numbering follows memory layout, just like subreg byte offsets do. So ISTM that using SUBREG_BYTE (as per the earlier patch) was correct. In summary, I think the "if” condition should include something like: constant_mulitple_p (SUBREG_BYTE (trueop0), GET_MODE_UNIT_BITSIZE (mode), &subreg_offset) Thanks, Richard ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. 2020-10-15 12:38 ` Richard Sandiford @ 2020-10-19 5:18 ` Hongtao Liu 2020-10-19 15:31 ` Richard Sandiford 0 siblings, 1 reply; 20+ messages in thread From: Hongtao Liu @ 2020-10-19 5:18 UTC (permalink / raw) To: Hongtao Liu via Gcc-patches, Segher Boessenkool, Hongtao Liu, Richard Sandiford [-- Attachment #1: Type: text/plain, Size: 2123 bytes --] On Thu, Oct 15, 2020 at 8:38 PM Richard Sandiford <richard.sandiford@arm.com> wrote: > > Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > + /* Simplify vec_select of a subreg of X to just a vec_select of X > > + when X has same component mode as vec_select. */ > > + int l2; > > + if (GET_CODE (trueop0) == SUBREG > > + && GET_MODE_INNER (mode) > > + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))) > > Better to use SUBREG_REG here and below. > Yes and changed. > > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) > > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > > + && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0)))) > > + .is_constant (&l2) > > + && known_le (l1, l2)) > > + { > > + unsigned HOST_WIDE_INT subreg_offset = 0; > > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); > > + gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT), > > + GET_MODE_SIZE (GET_MODE_INNER (mode)), > > + &subreg_offset)); > > can_div_trunc_p discards the remainder, whereas it looks like here > you want an exact multiple. > > I don't think it's absolutely guaranteed that the “if” condition makes > the division by GET_MODE_SIZE exact. E.g. in principle you could have > a subreg of a vector of TIs in which the subreg offset is misaligned by > a DI offset. > > I'm not sure the subreg_lsb conversion is correct though. On big-endian > targets, lane numbering follows memory layout, just like subreg byte > offsets do. So ISTM that using SUBREG_BYTE (as per the earlier patch) > was correct. > > In summary, I think the "if” condition should include something like: > > constant_mulitple_p (SUBREG_BYTE (trueop0), > GET_MODE_UNIT_BITSIZE (mode), > &subreg_offset) > Changed. > Thanks, > Richard Update patch. -- BR, Hongtao [-- Attachment #2: 0001-Simplify-vec_select-of-a-subreg-of-X-to-just-a-vec-v4.patch --] [-- Type: text/x-patch, Size: 3430 bytes --] From 8d154067963e453c337e6dc2c4f3f19bf0d6e11b Mon Sep 17 00:00:00 2001 From: liuhongt <hongtao.liu@intel.com> Date: Tue, 13 Oct 2020 15:35:29 +0800 Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of X. gcc/ChangeLog PR rtl-optimization/97249 * simplify-rtx.c (simplify_binary_operation_1): Simplify vec_select of a subreg of X to a vec_select of X. gcc/testsuite/ChangeLog * gcc.target/i386/pr97249-1.c: New test. --- gcc/simplify-rtx.c | 44 +++++++++++++++++++++++ gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++ 2 files changed, 74 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index 869f0d11b2e..b1009837b2b 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -4170,6 +4170,50 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode, return subop1; } } + + /* Simplify vec_select of a subreg of X to just a vec_select of X + when X has same component mode as vec_select. */ + int l2; + unsigned HOST_WIDE_INT subreg_offset = 0; + if (GET_CODE (trueop0) == SUBREG + && GET_MODE_INNER (mode) + == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0))) + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) + && (GET_MODE_NUNITS (mode)).is_constant (&l1) + && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))) + .is_constant (&l2) + && known_le (l1, l2) + && constant_multiple_p (SUBREG_BYTE (trueop0), + GET_MODE_UNIT_BITSIZE (mode), + &subreg_offset)) + { + + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); + bool success = true; + for (int i = 0; i != l1; i++) + { + rtx idx = XVECEXP (trueop1, 0, i); + if (!CONST_INT_P (idx)) + { + success = false; + break; + } + } + if (success) + { + rtx par = trueop1; + if (subreg_offset) + { + rtvec vec = rtvec_alloc (l1); + for (int i = 0; i < l1; i++) + RTVEC_ELT (vec, i) + = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i) + + subreg_offset)); + par = gen_rtx_PARALLEL (VOIDmode, vec); + } + return gen_rtx_VEC_SELECT (mode, SUBREG_REG (trueop0), par); + } + } } if (XVECLEN (trueop1, 0) == 1 diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c new file mode 100644 index 00000000000..4478a34a9f8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c @@ -0,0 +1,30 @@ +/* PR target/97249 */ +/* { dg-do compile } */ +/* { dg-options "-mavx2 -O3 -masm=att" } */ +/* { dg-final { scan-assembler-times {(?n)vpmovzxbw[ \t]+\(.*%xmm[0-9]} 2 } } */ +/* { dg-final { scan-assembler-times {(?n)vpmovzxwd[ \t]+\(.*%xmm[0-9]} 2 } } */ +/* { dg-final { scan-assembler-times {(?n)vpmovzxdq[ \t]+\(.*%xmm[0-9]} 2 } } */ + +void +foo (unsigned char* p1, unsigned char* p2, short* __restrict p3) +{ + for (int i = 0 ; i != 8; i++) + p3[i] = p1[i] + p2[i]; + return; +} + +void +foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3) +{ + for (int i = 0 ; i != 4; i++) + p3[i] = p1[i] + p2[i]; + return; +} + +void +foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3) +{ + for (int i = 0 ; i != 2; i++) + p3[i] = (long long)p1[i] + (long long)p2[i]; + return; +} -- 2.18.1 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. 2020-10-19 5:18 ` Hongtao Liu @ 2020-10-19 15:31 ` Richard Sandiford 2020-10-20 3:20 ` Hongtao Liu 0 siblings, 1 reply; 20+ messages in thread From: Richard Sandiford @ 2020-10-19 15:31 UTC (permalink / raw) To: Hongtao Liu; +Cc: Hongtao Liu via Gcc-patches, Segher Boessenkool Hongtao Liu <crazylht@gmail.com> writes: > On Thu, Oct 15, 2020 at 8:38 PM Richard Sandiford > <richard.sandiford@arm.com> wrote: >> >> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >> > + /* Simplify vec_select of a subreg of X to just a vec_select of X >> > + when X has same component mode as vec_select. */ >> > + int l2; >> > + if (GET_CODE (trueop0) == SUBREG >> > + && GET_MODE_INNER (mode) >> > + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))) >> >> Better to use SUBREG_REG here and below. >> > > Yes and changed. > >> > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) >> > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) >> > + && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0)))) >> > + .is_constant (&l2) >> > + && known_le (l1, l2)) >> > + { >> > + unsigned HOST_WIDE_INT subreg_offset = 0; >> > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); >> > + gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT), >> > + GET_MODE_SIZE (GET_MODE_INNER (mode)), >> > + &subreg_offset)); >> >> can_div_trunc_p discards the remainder, whereas it looks like here >> you want an exact multiple. >> >> I don't think it's absolutely guaranteed that the “if” condition makes >> the division by GET_MODE_SIZE exact. E.g. in principle you could have >> a subreg of a vector of TIs in which the subreg offset is misaligned by >> a DI offset. >> >> I'm not sure the subreg_lsb conversion is correct though. On big-endian >> targets, lane numbering follows memory layout, just like subreg byte >> offsets do. So ISTM that using SUBREG_BYTE (as per the earlier patch) >> was correct. >> >> In summary, I think the "if” condition should include something like: >> >> constant_mulitple_p (SUBREG_BYTE (trueop0), >> GET_MODE_UNIT_BITSIZE (mode), >> &subreg_offset) >> > > Changed. > >> Thanks, >> Richard > > > Update patch. > > -- > BR, > Hongtao > > From 8d154067963e453c337e6dc2c4f3f19bf0d6e11b Mon Sep 17 00:00:00 2001 > From: liuhongt <hongtao.liu@intel.com> > Date: Tue, 13 Oct 2020 15:35:29 +0800 > Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of > X. > > gcc/ChangeLog > PR rtl-optimization/97249 > * simplify-rtx.c (simplify_binary_operation_1): Simplify > vec_select of a subreg of X to a vec_select of X. > > gcc/testsuite/ChangeLog > > * gcc.target/i386/pr97249-1.c: New test. > --- > gcc/simplify-rtx.c | 44 +++++++++++++++++++++++ > gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++ > 2 files changed, 74 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c > > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > index 869f0d11b2e..b1009837b2b 100644 > --- a/gcc/simplify-rtx.c > +++ b/gcc/simplify-rtx.c > @@ -4170,6 +4170,50 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode, > return subop1; > } > } > + > + /* Simplify vec_select of a subreg of X to just a vec_select of X > + when X has same component mode as vec_select. */ > + int l2; > + unsigned HOST_WIDE_INT subreg_offset = 0; > + if (GET_CODE (trueop0) == SUBREG > + && GET_MODE_INNER (mode) > + == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0))) > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) Nothing really relies on this last line, and nothing uses l0, so better to drop it. > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > + && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))) > + .is_constant (&l2) > + && known_le (l1, l2) I'm not sure the last two &&s are really the important condition. I think we should drop them for the suggestion below. > + && constant_multiple_p (SUBREG_BYTE (trueop0), > + GET_MODE_UNIT_BITSIZE (mode), > + &subreg_offset)) > + { > + Excess blank line. > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); This can just use ==. > + bool success = true; > + for (int i = 0; i != l1; i++) > + { > + rtx idx = XVECEXP (trueop1, 0, i); Excess space. > + if (!CONST_INT_P (idx)) Here I think we should check: || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) where: poly_uint64 nunits = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))). This makes sure that all indices are in range. In particular, it's valid for the SUBREG_REG to be narrower than mode, for appropriate vec_select indices Thanks, Richard ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. 2020-10-19 15:31 ` Richard Sandiford @ 2020-10-20 3:20 ` Hongtao Liu 2020-10-20 16:42 ` Richard Sandiford 2020-10-20 21:05 ` Segher Boessenkool 0 siblings, 2 replies; 20+ messages in thread From: Hongtao Liu @ 2020-10-20 3:20 UTC (permalink / raw) To: Hongtao Liu, Hongtao Liu via Gcc-patches, Segher Boessenkool, Richard Sandiford [-- Attachment #1: Type: text/plain, Size: 5947 bytes --] On Mon, Oct 19, 2020 at 11:31 PM Richard Sandiford <richard.sandiford@arm.com> wrote: > > Hongtao Liu <crazylht@gmail.com> writes: > > On Thu, Oct 15, 2020 at 8:38 PM Richard Sandiford > > <richard.sandiford@arm.com> wrote: > >> > >> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > >> > + /* Simplify vec_select of a subreg of X to just a vec_select of X > >> > + when X has same component mode as vec_select. */ > >> > + int l2; > >> > + if (GET_CODE (trueop0) == SUBREG > >> > + && GET_MODE_INNER (mode) > >> > + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))) > >> > >> Better to use SUBREG_REG here and below. > >> > > > > Yes and changed. > > > >> > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) > >> > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > >> > + && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0)))) > >> > + .is_constant (&l2) > >> > + && known_le (l1, l2)) > >> > + { > >> > + unsigned HOST_WIDE_INT subreg_offset = 0; > >> > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); > >> > + gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT), > >> > + GET_MODE_SIZE (GET_MODE_INNER (mode)), > >> > + &subreg_offset)); > >> > >> can_div_trunc_p discards the remainder, whereas it looks like here > >> you want an exact multiple. > >> > >> I don't think it's absolutely guaranteed that the “if” condition makes > >> the division by GET_MODE_SIZE exact. E.g. in principle you could have > >> a subreg of a vector of TIs in which the subreg offset is misaligned by > >> a DI offset. > >> > >> I'm not sure the subreg_lsb conversion is correct though. On big-endian > >> targets, lane numbering follows memory layout, just like subreg byte > >> offsets do. So ISTM that using SUBREG_BYTE (as per the earlier patch) > >> was correct. > >> > >> In summary, I think the "if” condition should include something like: > >> > >> constant_mulitple_p (SUBREG_BYTE (trueop0), > >> GET_MODE_UNIT_BITSIZE (mode), > >> &subreg_offset) > >> > > > > Changed. > > > >> Thanks, > >> Richard > > > > > > Update patch. > > > > -- > > BR, > > Hongtao > > > > From 8d154067963e453c337e6dc2c4f3f19bf0d6e11b Mon Sep 17 00:00:00 2001 > > From: liuhongt <hongtao.liu@intel.com> > > Date: Tue, 13 Oct 2020 15:35:29 +0800 > > Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of > > X. > > > > gcc/ChangeLog > > PR rtl-optimization/97249 > > * simplify-rtx.c (simplify_binary_operation_1): Simplify > > vec_select of a subreg of X to a vec_select of X. > > > > gcc/testsuite/ChangeLog > > > > * gcc.target/i386/pr97249-1.c: New test. > > --- > > gcc/simplify-rtx.c | 44 +++++++++++++++++++++++ > > gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++ > > 2 files changed, 74 insertions(+) > > create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c > > > > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > > index 869f0d11b2e..b1009837b2b 100644 > > --- a/gcc/simplify-rtx.c > > +++ b/gcc/simplify-rtx.c > > @@ -4170,6 +4170,50 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode, > > return subop1; > > } > > } > > + > > + /* Simplify vec_select of a subreg of X to just a vec_select of X > > + when X has same component mode as vec_select. */ > > + int l2; > > + unsigned HOST_WIDE_INT subreg_offset = 0; > > + if (GET_CODE (trueop0) == SUBREG > > + && GET_MODE_INNER (mode) > > + == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0))) > > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) > > Nothing really relies on this last line, and nothing uses l0, so better > to drop it. > Changed, so there won't be any vector mode with variable number elts. > > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > > + && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))) > > + .is_constant (&l2) > > + && known_le (l1, l2) > > I'm not sure the last two &&s are really the important condition. > I think we should drop them for the suggestion below. > Changed, assume gcc also support something like (vec_select:v4di (reg:v2di) (parallel [ (const_int 0) (const_int 1) (const_int 1) (const_int 0)])) as long as the range of selection guaranteed by || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > > + && constant_multiple_p (SUBREG_BYTE (trueop0), > > + GET_MODE_UNIT_BITSIZE (mode), > > + &subreg_offset)) > > + { > > + > > Excess blank line. > Changed. > > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); > > This can just use ==. > Changed. > > + bool success = true; > > + for (int i = 0; i != l1; i++) > > + { > > + rtx idx = XVECEXP (trueop1, 0, i); > > Excess space. Changed. > > > + if (!CONST_INT_P (idx)) > > Here I think we should check: > > || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > > where: > > poly_uint64 nunits > = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))). > Changed. > This makes sure that all indices are in range. In particular, it's > valid for the SUBREG_REG to be narrower than mode, for appropriate > vec_select indices > Yes, that's what paradoxical subreg means. > Thanks, > Richard -- BR, Hongtao [-- Attachment #2: 0001-Simplify-vec_select-of-a-subreg-of-X-to-just-a-v5.patch --] [-- Type: text/x-patch, Size: 3367 bytes --] From 5dc3de7f3fabb932be2c097db00b75061228caaf Mon Sep 17 00:00:00 2001 From: liuhongt <hongtao.liu@intel.com> Date: Tue, 13 Oct 2020 15:35:29 +0800 Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of X. gcc/ChangeLog PR rtl-optimization/97249 * simplify-rtx.c (simplify_binary_operation_1): Simplify vec_select of a subreg of X to a vec_select of X. gcc/testsuite/ChangeLog * gcc.target/i386/pr97249-1.c: New test. --- gcc/simplify-rtx.c | 41 +++++++++++++++++++++++ gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 +++++++++++++++++ 2 files changed, 71 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index 869f0d11b2e..df751318237 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -4170,6 +4170,47 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode, return subop1; } } + + /* Simplify vec_select of a subreg of X to just a vec_select of X + when X has same component mode as vec_select. */ + unsigned HOST_WIDE_INT subreg_offset = 0; + if (GET_CODE (trueop0) == SUBREG + && GET_MODE_INNER (mode) + == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0))) + && (GET_MODE_NUNITS (mode)).is_constant (&l1) + && constant_multiple_p (SUBREG_BYTE (trueop0), + GET_MODE_UNIT_BITSIZE (mode), + &subreg_offset)) + { + gcc_assert (XVECLEN (trueop1, 0) == l1); + bool success = true; + poly_uint64 nunits + = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))); + for (int i = 0; i != l1; i++) + { + rtx idx = XVECEXP (trueop1, 0, i); + if (!CONST_INT_P (idx) + || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) + { + success = false; + break; + } + } + if (success) + { + rtx par = trueop1; + if (subreg_offset) + { + rtvec vec = rtvec_alloc (l1); + for (int i = 0; i < l1; i++) + RTVEC_ELT (vec, i) + = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i) + + subreg_offset)); + par = gen_rtx_PARALLEL (VOIDmode, vec); + } + return gen_rtx_VEC_SELECT (mode, SUBREG_REG (trueop0), par); + } + } } if (XVECLEN (trueop1, 0) == 1 diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c new file mode 100644 index 00000000000..4478a34a9f8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c @@ -0,0 +1,30 @@ +/* PR target/97249 */ +/* { dg-do compile } */ +/* { dg-options "-mavx2 -O3 -masm=att" } */ +/* { dg-final { scan-assembler-times {(?n)vpmovzxbw[ \t]+\(.*%xmm[0-9]} 2 } } */ +/* { dg-final { scan-assembler-times {(?n)vpmovzxwd[ \t]+\(.*%xmm[0-9]} 2 } } */ +/* { dg-final { scan-assembler-times {(?n)vpmovzxdq[ \t]+\(.*%xmm[0-9]} 2 } } */ + +void +foo (unsigned char* p1, unsigned char* p2, short* __restrict p3) +{ + for (int i = 0 ; i != 8; i++) + p3[i] = p1[i] + p2[i]; + return; +} + +void +foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3) +{ + for (int i = 0 ; i != 4; i++) + p3[i] = p1[i] + p2[i]; + return; +} + +void +foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3) +{ + for (int i = 0 ; i != 2; i++) + p3[i] = (long long)p1[i] + (long long)p2[i]; + return; +} -- 2.18.1 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. 2020-10-20 3:20 ` Hongtao Liu @ 2020-10-20 16:42 ` Richard Sandiford 2020-10-21 2:43 ` Hongtao Liu 2020-10-20 21:05 ` Segher Boessenkool 1 sibling, 1 reply; 20+ messages in thread From: Richard Sandiford @ 2020-10-20 16:42 UTC (permalink / raw) To: Hongtao Liu; +Cc: Hongtao Liu via Gcc-patches, Segher Boessenkool Hongtao Liu <crazylht@gmail.com> writes: >> > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) >> > + && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))) >> > + .is_constant (&l2) >> > + && known_le (l1, l2) >> >> I'm not sure the last two &&s are really the important condition. >> I think we should drop them for the suggestion below. >> > > Changed, assume gcc also support something like (vec_select:v4di > (reg:v2di) (parallel [ (const_int 0) (const_int 1) (const_int 1) > (const_int 0)])) > as long as the range of selection guaranteed by > || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) Yeah, that vec_select looks OK. >> >> > + if (!CONST_INT_P (idx)) >> >> Here I think we should check: >> >> || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) >> >> where: >> >> poly_uint64 nunits >> = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))). >> > > Changed. > >> This makes sure that all indices are in range. In particular, it's >> valid for the SUBREG_REG to be narrower than mode, for appropriate >> vec_select indices >> > > Yes, that's what paradoxical subreg means. But I was comparing the mode of the vec_select with the mode of the SUBREG_REG (rather than the mode of trueop0 with the mode of the SUBREG_REG, which is what matters for paradoxical subregs). > + /* Simplify vec_select of a subreg of X to just a vec_select of X > + when X has same component mode as vec_select. */ > + unsigned HOST_WIDE_INT subreg_offset = 0; > + if (GET_CODE (trueop0) == SUBREG > + && GET_MODE_INNER (mode) > + == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0))) > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) Unnecessary brackets around “GET_MODE_NUNITS (mode)”. > + && constant_multiple_p (SUBREG_BYTE (trueop0), > + GET_MODE_UNIT_BITSIZE (mode), > + &subreg_offset)) Sorry, my bad, this should be: && constant_multiple_p (subreg_memory_offset (trueop0), GET_MODE_UNIT_BITSIZE (mode), &subreg_offset)) > + { > + gcc_assert (XVECLEN (trueop1, 0) == l1); > + bool success = true; > + poly_uint64 nunits > + = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))); > + for (int i = 0; i != l1; i++) > + { > + rtx idx = XVECEXP (trueop1, 0, i); > + if (!CONST_INT_P (idx) > + || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > + { > + success = false; > + break; > + } > + } > + if (success) > + { > + rtx par = trueop1; > + if (subreg_offset) > + { > + rtvec vec = rtvec_alloc (l1); > + for (int i = 0; i < l1; i++) > + RTVEC_ELT (vec, i) > + = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i) > + + subreg_offset)); This is applying subreg_offset to the pointer rather than the INTVAL. It should be: = GEN_INT (UINTVAL (XVECEXP (trueop1, 0, i)) + subreg_offset); OK with those changes, thanks. Richard ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. 2020-10-20 16:42 ` Richard Sandiford @ 2020-10-21 2:43 ` Hongtao Liu 0 siblings, 0 replies; 20+ messages in thread From: Hongtao Liu @ 2020-10-21 2:43 UTC (permalink / raw) To: Hongtao Liu, Hongtao Liu via Gcc-patches, Segher Boessenkool, Richard Sandiford On Wed, Oct 21, 2020 at 12:42 AM Richard Sandiford <richard.sandiford@arm.com> wrote: > > Hongtao Liu <crazylht@gmail.com> writes: > >> > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > >> > + && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))) > >> > + .is_constant (&l2) > >> > + && known_le (l1, l2) > >> > >> I'm not sure the last two &&s are really the important condition. > >> I think we should drop them for the suggestion below. > >> > > > > Changed, assume gcc also support something like (vec_select:v4di > > (reg:v2di) (parallel [ (const_int 0) (const_int 1) (const_int 1) > > (const_int 0)])) > > as long as the range of selection guaranteed by > > || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > > Yeah, that vec_select looks OK. > > >> > >> > + if (!CONST_INT_P (idx)) > >> > >> Here I think we should check: > >> > >> || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > >> > >> where: > >> > >> poly_uint64 nunits > >> = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))). > >> > > > > Changed. > > > >> This makes sure that all indices are in range. In particular, it's > >> valid for the SUBREG_REG to be narrower than mode, for appropriate > >> vec_select indices > >> > > > > Yes, that's what paradoxical subreg means. > > But I was comparing the mode of the vec_select with the mode of the > SUBREG_REG (rather than the mode of trueop0 with the mode of the > SUBREG_REG, which is what matters for paradoxical subregs). > > > + /* Simplify vec_select of a subreg of X to just a vec_select of X > > + when X has same component mode as vec_select. */ > > + unsigned HOST_WIDE_INT subreg_offset = 0; > > + if (GET_CODE (trueop0) == SUBREG > > + && GET_MODE_INNER (mode) > > + == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0))) > > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > > Unnecessary brackets around “GET_MODE_NUNITS (mode)”. > Changed. > > + && constant_multiple_p (SUBREG_BYTE (trueop0), > > + GET_MODE_UNIT_BITSIZE (mode), > > + &subreg_offset)) > > Sorry, my bad, this should be: > > && constant_multiple_p (subreg_memory_offset (trueop0), > GET_MODE_UNIT_BITSIZE (mode), > &subreg_offset)) > Changed. > > + { > > + gcc_assert (XVECLEN (trueop1, 0) == l1); > > + bool success = true; > > + poly_uint64 nunits > > + = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))); > > + for (int i = 0; i != l1; i++) > > + { > > + rtx idx = XVECEXP (trueop1, 0, i); > > + if (!CONST_INT_P (idx) > > + || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > > + { > > + success = false; > > + break; > > + } > > + } > > + if (success) > > + { > > + rtx par = trueop1; > > + if (subreg_offset) > > + { > > + rtvec vec = rtvec_alloc (l1); > > + for (int i = 0; i < l1; i++) > > + RTVEC_ELT (vec, i) > > + = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i) > > + + subreg_offset)); > > This is applying subreg_offset to the pointer rather than the INTVAL. > It should be: > > = GEN_INT (UINTVAL (XVECEXP (trueop1, 0, i)) > + subreg_offset); > oops, sorry for typo and changed. > OK with those changes, thanks. > > Richard -- BR, Hongtao ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. 2020-10-20 3:20 ` Hongtao Liu 2020-10-20 16:42 ` Richard Sandiford @ 2020-10-20 21:05 ` Segher Boessenkool 2020-10-21 3:17 ` Hongtao Liu 1 sibling, 1 reply; 20+ messages in thread From: Segher Boessenkool @ 2020-10-20 21:05 UTC (permalink / raw) To: Hongtao Liu; +Cc: Hongtao Liu via Gcc-patches, Richard Sandiford On Tue, Oct 20, 2020 at 11:20:48AM +0800, Hongtao Liu wrote: > + unsigned HOST_WIDE_INT subreg_offset = 0; > + if (GET_CODE (trueop0) == SUBREG > + && GET_MODE_INNER (mode) > + == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0))) > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > + && constant_multiple_p (SUBREG_BYTE (trueop0), > + GET_MODE_UNIT_BITSIZE (mode), > + &subreg_offset)) > + { > + gcc_assert (XVECLEN (trueop1, 0) == l1); Why? If we want to check that, it should be in RTL checking (and maybe it already is!) > + bool success = true; > + poly_uint64 nunits > + = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))); > + for (int i = 0; i != l1; i++) > + { > + rtx idx = XVECEXP (trueop1, 0, i); > + if (!CONST_INT_P (idx) > + || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) Can that ever happen in valid code? This seems to just hide problems. > + { > + success = false; > + break; > + } > + } > + if (success) If you have a huge piece of code like this, factor it? Esp. if you now need to have all kinds of booleans where you really just want to do early returns. Segher ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. 2020-10-20 21:05 ` Segher Boessenkool @ 2020-10-21 3:17 ` Hongtao Liu 2020-10-21 15:43 ` Richard Sandiford 0 siblings, 1 reply; 20+ messages in thread From: Hongtao Liu @ 2020-10-21 3:17 UTC (permalink / raw) To: Segher Boessenkool; +Cc: Hongtao Liu via Gcc-patches, Richard Sandiford [-- Attachment #1: Type: text/plain, Size: 2024 bytes --] On Wed, Oct 21, 2020 at 5:07 AM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > On Tue, Oct 20, 2020 at 11:20:48AM +0800, Hongtao Liu wrote: > > + unsigned HOST_WIDE_INT subreg_offset = 0; > > + if (GET_CODE (trueop0) == SUBREG > > + && GET_MODE_INNER (mode) > > + == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0))) > > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > > + && constant_multiple_p (SUBREG_BYTE (trueop0), > > + GET_MODE_UNIT_BITSIZE (mode), > > + &subreg_offset)) > > + { > > + gcc_assert (XVECLEN (trueop1, 0) == l1); > > Why? If we want to check that, it should be in RTL checking (and maybe > it already is!) > Yes, RTL checking would guarantee that and it should be removed. > > + bool success = true; > > + poly_uint64 nunits > > + = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))); > > + for (int i = 0; i != l1; i++) > > + { > > + rtx idx = XVECEXP (trueop1, 0, i); > > + if (!CONST_INT_P (idx) > > + || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > > Can that ever happen in valid code? This seems to just hide problems. > for rtx like (vec_select:v4di:(subreg:v8di (reg:v2di)) (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)])), It seems valid for rtl checking. > > + { > > + success = false; > > + break; > > + } > > + } > > + if (success) > > If you have a huge piece of code like this, factor it? Esp. if you now > need to have all kinds of booleans where you really just want to do > early returns. > I want to jump out of this if branch, since later codes in this function won't simplify VEC_SELECT further when it matches my if condition, it's ok to use ealry returns. > > Segher Update patch. -- BR, Hongtao [-- Attachment #2: 0001-Simplify-vec_select-of-a-subreg-of-X-to-just-a-vec-v6.patch --] [-- Type: text/x-patch, Size: 3217 bytes --] From e4e9c256efc636e994b0994c69cb0b4e7edc25a0 Mon Sep 17 00:00:00 2001 From: liuhongt <hongtao.liu@intel.com> Date: Tue, 13 Oct 2020 15:35:29 +0800 Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of X. gcc/ChangeLog PR rtl-optimization/97249 * simplify-rtx.c (simplify_binary_operation_1): Simplify vec_select of a subreg of X to a vec_select of X. gcc/testsuite/ChangeLog * gcc.target/i386/pr97249-1.c: New test. --- gcc/simplify-rtx.c | 34 +++++++++++++++++++++++ gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++++++ 2 files changed, 64 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index 869f0d11b2e..947a9f37241 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -4170,6 +4170,40 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode, return subop1; } } + + /* Simplify vec_select of a subreg of X to just a vec_select of X + when X has same component mode as vec_select. */ + unsigned HOST_WIDE_INT subreg_offset = 0; + if (GET_CODE (trueop0) == SUBREG + && GET_MODE_INNER (mode) + == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0))) + && GET_MODE_NUNITS (mode).is_constant (&l1) + && constant_multiple_p (subreg_memory_offset (trueop0), + GET_MODE_UNIT_BITSIZE (mode), + &subreg_offset)) + { + poly_uint64 nunits + = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))); + rtx par = trueop1; + for (int i = 0; i != l1; i++) + { + rtx idx = XVECEXP (trueop1, 0, i); + if (!CONST_INT_P (idx) + || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) + return 0; + } + + if (subreg_offset) + { + rtvec vec = rtvec_alloc (l1); + for (int i = 0; i < l1; i++) + RTVEC_ELT (vec, i) + = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i)) + + subreg_offset); + par = gen_rtx_PARALLEL (VOIDmode, vec); + } + return gen_rtx_VEC_SELECT (mode, SUBREG_REG (trueop0), par); + } } if (XVECLEN (trueop1, 0) == 1 diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c new file mode 100644 index 00000000000..4478a34a9f8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c @@ -0,0 +1,30 @@ +/* PR target/97249 */ +/* { dg-do compile } */ +/* { dg-options "-mavx2 -O3 -masm=att" } */ +/* { dg-final { scan-assembler-times {(?n)vpmovzxbw[ \t]+\(.*%xmm[0-9]} 2 } } */ +/* { dg-final { scan-assembler-times {(?n)vpmovzxwd[ \t]+\(.*%xmm[0-9]} 2 } } */ +/* { dg-final { scan-assembler-times {(?n)vpmovzxdq[ \t]+\(.*%xmm[0-9]} 2 } } */ + +void +foo (unsigned char* p1, unsigned char* p2, short* __restrict p3) +{ + for (int i = 0 ; i != 8; i++) + p3[i] = p1[i] + p2[i]; + return; +} + +void +foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3) +{ + for (int i = 0 ; i != 4; i++) + p3[i] = p1[i] + p2[i]; + return; +} + +void +foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3) +{ + for (int i = 0 ; i != 2; i++) + p3[i] = (long long)p1[i] + (long long)p2[i]; + return; +} -- 2.18.1 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. 2020-10-21 3:17 ` Hongtao Liu @ 2020-10-21 15:43 ` Richard Sandiford 2020-10-21 16:34 ` Segher Boessenkool 0 siblings, 1 reply; 20+ messages in thread From: Richard Sandiford @ 2020-10-21 15:43 UTC (permalink / raw) To: Hongtao Liu; +Cc: Segher Boessenkool, Hongtao Liu via Gcc-patches Hongtao Liu <crazylht@gmail.com> writes: > + poly_uint64 nunits > + = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))); > + rtx par = trueop1; > + for (int i = 0; i != l1; i++) > + { > + rtx idx = XVECEXP (trueop1, 0, i); > + if (!CONST_INT_P (idx) > + || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > + return 0; > + } I think the previous version was better. We shouldn't assume that further simplification rules will fail just because the conditions for this rule haven't been met. Thanks, Richard ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. 2020-10-21 15:43 ` Richard Sandiford @ 2020-10-21 16:34 ` Segher Boessenkool 2020-10-22 3:33 ` Hongtao Liu 0 siblings, 1 reply; 20+ messages in thread From: Segher Boessenkool @ 2020-10-21 16:34 UTC (permalink / raw) To: Hongtao Liu, Hongtao Liu via Gcc-patches, richard.sandiford On Wed, Oct 21, 2020 at 04:43:29PM +0100, Richard Sandiford wrote: > Hongtao Liu <crazylht@gmail.com> writes: > > + poly_uint64 nunits > > + = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))); > > + rtx par = trueop1; > > + for (int i = 0; i != l1; i++) > > + { > > + rtx idx = XVECEXP (trueop1, 0, i); > > + if (!CONST_INT_P (idx) > > + || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > > + return 0; > > + } > > I think the previous version was better. We shouldn't assume that > further simplification rules will fail just because the conditions > for this rule haven't been met. Yes. My suggestion was to factor this big piece of code to a separate function, and do an early return from *that*. The patch is okay for trunk without that, with the clumsy booleans. Thanks Hongtao! Segher ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. 2020-10-21 16:34 ` Segher Boessenkool @ 2020-10-22 3:33 ` Hongtao Liu 0 siblings, 0 replies; 20+ messages in thread From: Hongtao Liu @ 2020-10-22 3:33 UTC (permalink / raw) To: Segher Boessenkool; +Cc: Hongtao Liu via Gcc-patches, Richard Sandiford On Thu, Oct 22, 2020 at 12:36 AM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > On Wed, Oct 21, 2020 at 04:43:29PM +0100, Richard Sandiford wrote: > > Hongtao Liu <crazylht@gmail.com> writes: > > > + poly_uint64 nunits > > > + = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))); > > > + rtx par = trueop1; > > > + for (int i = 0; i != l1; i++) > > > + { > > > + rtx idx = XVECEXP (trueop1, 0, i); > > > + if (!CONST_INT_P (idx) > > > + || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > > > + return 0; > > > + } > > > > I think the previous version was better. We shouldn't assume that > > further simplification rules will fail just because the conditions > > for this rule haven't been met. > > Yes. My suggestion was to factor this big piece of code to a separate > function, and do an early return from *that*. > > The patch is okay for trunk without that, with the clumsy booleans. > Thanks Hongtao! > > > Segher Thank you both for the review, I'll commit the patch with *bool success* keeped. -- BR, Hongtao ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg. 2020-10-15 8:14 ` Hongtao Liu 2020-10-15 9:58 ` Hongtao Liu @ 2020-10-20 20:43 ` Segher Boessenkool 1 sibling, 0 replies; 20+ messages in thread From: Segher Boessenkool @ 2020-10-20 20:43 UTC (permalink / raw) To: Hongtao Liu; +Cc: GCC Patches, H. J. Lu, Hongyu Wang On Thu, Oct 15, 2020 at 04:14:39PM +0800, Hongtao Liu wrote: > On Thu, Oct 15, 2020 at 1:37 AM Segher Boessenkool > <segher@kernel.crashing.org> wrote: > > > + gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0), > > > + GET_MODE_SIZE (GET_MODE_INNER (mode)), > > > + &subreg_offset)); > > > > Why is this needed? > > I only found this interface for poly_uint64 division to get subreg_offset. I mean, why do you have this assert at all? > > > + if (!CONST_INT_P (j) > > > + || known_ge (UINTVAL (j), l2 - subreg_offset)) > > > + { > > > + success = false; > > > + break; > > > + } > > > + } > > > > You don't have to test if the input RTL is valid. You can assume it is. > > > > This test is for something like (vec_select:v2di (subreg:v4di > (reg:v2di) 0)(parallel [ (const_int 2) (const_int 3)])). > const_int 2 here is out of range. Are you meaning the upper rtx wouldn't exist? Assuming this is LE: yes, this is just invalid. You can do whatever you want with it (except ICE :-) ) > > subreg_offset will differ in meaning if big-endian; is this correct > Yes. > > there, do all the stars align so this code works out fine there as well? > > i found it's a bit tricky to adjust selection index for target > BYTES_BIG_ENDIA != WORDS_BIG_ENDIAN. > Especially for component mode smaller than word, Any interface to handle this? For most things you want BYTES_BIG_ENDIAN, anything in a subreg here for example. I don't know which of those vectors use; I cannot find it in the documentation, either. Segher ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2020-10-22 3:31 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-10-13 8:40 [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg Hongtao Liu 2020-10-13 19:59 ` Segher Boessenkool 2020-10-14 5:43 ` Hongtao Liu 2020-10-14 17:35 ` Segher Boessenkool 2020-10-14 17:55 ` Richard Biener 2020-10-14 19:23 ` Segher Boessenkool 2020-10-15 8:14 ` Hongtao Liu 2020-10-15 9:58 ` Hongtao Liu 2020-10-15 12:38 ` Richard Sandiford 2020-10-19 5:18 ` Hongtao Liu 2020-10-19 15:31 ` Richard Sandiford 2020-10-20 3:20 ` Hongtao Liu 2020-10-20 16:42 ` Richard Sandiford 2020-10-21 2:43 ` Hongtao Liu 2020-10-20 21:05 ` Segher Boessenkool 2020-10-21 3:17 ` Hongtao Liu 2020-10-21 15:43 ` Richard Sandiford 2020-10-21 16:34 ` Segher Boessenkool 2020-10-22 3:33 ` Hongtao Liu 2020-10-20 20:43 ` Segher Boessenkool
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).