From: Hongtao Liu <crazylht@gmail.com>
To: Hongtao Liu <crazylht@gmail.com>,
Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org>,
Segher Boessenkool <segher@kernel.crashing.org>,
Richard Sandiford <richard.sandiford@arm.com>
Subject: Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.
Date: Tue, 20 Oct 2020 11:20:48 +0800
Message-ID: <CAMZc-bwKQOD9s3xPpkemE2yFUrrgrJ_NGMbYc-dZKA3-V2YqEg@mail.gmail.com>
In-Reply-To: <mptft6a6xmk.fsf@arm.com>
[-- Attachment #1: Type: text/plain, Size: 5947 bytes --]
On Mon, Oct 19, 2020 at 11:31 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Hongtao Liu <crazylht@gmail.com> writes:
> > On Thu, Oct 15, 2020 at 8:38 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> > + /* Simplify vec_select of a subreg of X to just a vec_select of X
> >> > + when X has same component mode as vec_select. */
> >> > + int l2;
> >> > + if (GET_CODE (trueop0) == SUBREG
> >> > + && GET_MODE_INNER (mode)
> >> > + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))
> >>
> >> Better to use SUBREG_REG here and below.
> >>
> >
> > Yes and changed.
> >
> >> > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
> >> > + && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> >> > + && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0))))
> >> > + .is_constant (&l2)
> >> > + && known_le (l1, l2))
> >> > + {
> >> > + unsigned HOST_WIDE_INT subreg_offset = 0;
> >> > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
> >> > + gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT),
> >> > + GET_MODE_SIZE (GET_MODE_INNER (mode)),
> >> > + &subreg_offset));
> >>
> >> can_div_trunc_p discards the remainder, whereas it looks like here
> >> you want an exact multiple.
> >>
> >> I don't think it's absolutely guaranteed that the “if” condition makes
> >> the division by GET_MODE_SIZE exact. E.g. in principle you could have
> >> a subreg of a vector of TIs in which the subreg offset is misaligned by
> >> a DI offset.
> >>
> >> I'm not sure the subreg_lsb conversion is correct though. On big-endian
> >> targets, lane numbering follows memory layout, just like subreg byte
> >> offsets do. So ISTM that using SUBREG_BYTE (as per the earlier patch)
> >> was correct.
> >>
> >> In summary, I think the "if" condition should include something like:
> >>
> >> constant_multiple_p (SUBREG_BYTE (trueop0),
> >> GET_MODE_UNIT_BITSIZE (mode),
> >> &subreg_offset)
> >>
> >
> > Changed.
> >
> >> Thanks,
> >> Richard
> >
> >
> > Update patch.
> >
> > --
> > BR,
> > Hongtao
> >
> > From 8d154067963e453c337e6dc2c4f3f19bf0d6e11b Mon Sep 17 00:00:00 2001
> > From: liuhongt <hongtao.liu@intel.com>
> > Date: Tue, 13 Oct 2020 15:35:29 +0800
> > Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of
> > X.
> >
> > gcc/ChangeLog
> > PR rtl-optimization/97249
> > * simplify-rtx.c (simplify_binary_operation_1): Simplify
> > vec_select of a subreg of X to a vec_select of X.
> >
> > gcc/testsuite/ChangeLog
> >
> > * gcc.target/i386/pr97249-1.c: New test.
> > ---
> > gcc/simplify-rtx.c | 44 +++++++++++++++++++++++
> > gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++
> > 2 files changed, 74 insertions(+)
> > create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c
> >
> > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> > index 869f0d11b2e..b1009837b2b 100644
> > --- a/gcc/simplify-rtx.c
> > +++ b/gcc/simplify-rtx.c
> > @@ -4170,6 +4170,50 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
> > return subop1;
> > }
> > }
> > +
> > + /* Simplify vec_select of a subreg of X to just a vec_select of X
> > + when X has same component mode as vec_select. */
> > + int l2;
> > + unsigned HOST_WIDE_INT subreg_offset = 0;
> > + if (GET_CODE (trueop0) == SUBREG
> > + && GET_MODE_INNER (mode)
> > + == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0)))
> > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
>
> Nothing really relies on this last line, and nothing uses l0, so better
> to drop it.
>
Changed, so there won't be any vector mode with a variable number of elements.
> > + && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> > + && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))))
> > + .is_constant (&l2)
> > + && known_le (l1, l2)
>
> I'm not sure the last two &&s are really the important condition.
> I think we should drop them for the suggestion below.
>
Changed. I assume GCC also supports something like
(vec_select:v4di (reg:v2di) (parallel [ (const_int 0) (const_int 1)
(const_int 1) (const_int 0)]))
as long as the range of the selection is guaranteed by the check
|| maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
> > + && constant_multiple_p (SUBREG_BYTE (trueop0),
> > + GET_MODE_UNIT_BITSIZE (mode),
> > + &subreg_offset))
> > + {
> > +
>
> Excess blank line.
>
Changed.
> > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
>
> This can just use ==.
>
Changed.
> > + bool success = true;
> > + for (int i = 0; i != l1; i++)
> > + {
> > + rtx idx = XVECEXP (trueop1, 0, i);
>
> Excess space.
Changed.
>
> > + if (!CONST_INT_P (idx))
>
> Here I think we should check:
>
> || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
>
> where:
>
> poly_uint64 nunits
> = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)));
>
Changed.
> This makes sure that all indices are in range. In particular, it's
> valid for the SUBREG_REG to be narrower than mode, for appropriate
> vec_select indices.
>
Yes, that's what paradoxical subreg means.
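To make the range check concrete, here is a minimal standalone sketch of the index rebasing, written outside GCC. It is not the patch itself: rebase_vec_select and its parameters are hypothetical names, and it uses plain integers instead of poly_uint64, so the maybe_ge test is assumed to reduce to an ordinary >= comparison.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of the simplification, not GCC code.
// `indices` are the vec_select lane numbers relative to the subreg,
// `subreg_offset` is the start of the subreg counted in elements, and
// `inner_nunits` is the number of elements in SUBREG_REG.
// Returns true and fills `out` with lane numbers relative to SUBREG_REG
// when every selected lane lies inside the inner register.
bool
rebase_vec_select (const std::vector<uint64_t> &indices,
                   uint64_t subreg_offset, uint64_t inner_nunits,
                   std::vector<uint64_t> &out)
{
  std::vector<uint64_t> rebased;
  rebased.reserve (indices.size ());
  for (uint64_t idx : indices)
    {
      // Stand-in for maybe_ge (UINTVAL (idx) + subreg_offset, nunits):
      // give up if any lane would read past the inner register.
      if (idx + subreg_offset >= inner_nunits)
        return false;
      rebased.push_back (idx + subreg_offset);
    }
  out = rebased;
  return true;
}
```

With this model, the selection {0, 1, 1, 0} over a two-element inner register at offset 0 is accepted unchanged, {0, 1} at offset 2 over a four-element register becomes {2, 3}, and {0, 1, 2, 3} over a two-element register is rejected because lanes 2 and 3 fall in the undefined part of the paradoxical subreg.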
> Thanks,
> Richard
--
BR,
Hongtao
[-- Attachment #2: 0001-Simplify-vec_select-of-a-subreg-of-X-to-just-a-v5.patch --]
[-- Type: text/x-patch, Size: 3367 bytes --]
From 5dc3de7f3fabb932be2c097db00b75061228caaf Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Tue, 13 Oct 2020 15:35:29 +0800
Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of
X.
gcc/ChangeLog
PR rtl-optimization/97249
* simplify-rtx.c (simplify_binary_operation_1): Simplify
vec_select of a subreg of X to a vec_select of X.
gcc/testsuite/ChangeLog
* gcc.target/i386/pr97249-1.c: New test.
---
gcc/simplify-rtx.c | 41 +++++++++++++++++++++++
gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 +++++++++++++++++
2 files changed, 71 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 869f0d11b2e..df751318237 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4170,6 +4170,47 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
return subop1;
}
}
+
+ /* Simplify vec_select of a subreg of X to just a vec_select of X
+ when X has same component mode as vec_select. */
+ unsigned HOST_WIDE_INT subreg_offset = 0;
+ if (GET_CODE (trueop0) == SUBREG
+ && GET_MODE_INNER (mode)
+ == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0)))
+ && (GET_MODE_NUNITS (mode)).is_constant (&l1)
+ && constant_multiple_p (SUBREG_BYTE (trueop0),
+ GET_MODE_UNIT_BITSIZE (mode),
+ &subreg_offset))
+ {
+ gcc_assert (XVECLEN (trueop1, 0) == l1);
+ bool success = true;
+ poly_uint64 nunits
+ = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)));
+ for (int i = 0; i != l1; i++)
+ {
+ rtx idx = XVECEXP (trueop1, 0, i);
+ if (!CONST_INT_P (idx)
+ || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
+ {
+ success = false;
+ break;
+ }
+ }
+ if (success)
+ {
+ rtx par = trueop1;
+ if (subreg_offset)
+ {
+ rtvec vec = rtvec_alloc (l1);
+ for (int i = 0; i < l1; i++)
+ RTVEC_ELT (vec, i)
+ = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i))
+ + subreg_offset);
+ par = gen_rtx_PARALLEL (VOIDmode, vec);
+ }
+ return gen_rtx_VEC_SELECT (mode, SUBREG_REG (trueop0), par);
+ }
+ }
}
if (XVECLEN (trueop1, 0) == 1
diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c
new file mode 100644
index 00000000000..4478a34a9f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c
@@ -0,0 +1,30 @@
+/* PR target/97249 */
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O3 -masm=att" } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxbw[ \t]+\(.*%xmm[0-9]} 2 } } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxwd[ \t]+\(.*%xmm[0-9]} 2 } } */
+/* { dg-final { scan-assembler-times {(?n)vpmovzxdq[ \t]+\(.*%xmm[0-9]} 2 } } */
+
+void
+foo (unsigned char* p1, unsigned char* p2, short* __restrict p3)
+{
+ for (int i = 0 ; i != 8; i++)
+ p3[i] = p1[i] + p2[i];
+ return;
+}
+
+void
+foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3)
+{
+ for (int i = 0 ; i != 4; i++)
+ p3[i] = p1[i] + p2[i];
+ return;
+}
+
+void
+foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3)
+{
+ for (int i = 0 ; i != 2; i++)
+ p3[i] = (long long)p1[i] + (long long)p2[i];
+ return;
+}
--
2.18.1