public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>,
	 Richard Biener <richard.guenther@gmail.com>,
	gcc Patches <gcc-patches@gcc.gnu.org>,
	 richard.sandiford@arm.com
Subject: Re: [match.pd] [SVE] Add pattern to transform svrev(svrev(v)) --> v
Date: Sun, 23 Apr 2023 11:41:19 +0530	[thread overview]
Message-ID: <CAAgBjMkgFog2+Ok3FnYi32420kEeRzWCwNTPnGrRuwzA+JdKXg@mail.gmail.com> (raw)
In-Reply-To: <mptsfct8ak3.fsf@arm.com>

[-- Attachment #1: Type: text/plain, Size: 10184 bytes --]

On Fri, 21 Apr 2023 at 21:57, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > On Wed, 19 Apr 2023 at 16:17, Richard Biener <richard.guenther@gmail.com> wrote:
> >>
> >> On Wed, Apr 19, 2023 at 11:21 AM Prathamesh Kulkarni
> >> <prathamesh.kulkarni@linaro.org> wrote:
> >> >
> >> > On Tue, 11 Apr 2023 at 19:36, Prathamesh Kulkarni
> >> > <prathamesh.kulkarni@linaro.org> wrote:
> >> > >
> >> > > On Tue, 11 Apr 2023 at 14:17, Richard Biener <richard.guenther@gmail.com> wrote:
> >> > > >
> >> > > > On Wed, Apr 5, 2023 at 10:39 AM Prathamesh Kulkarni via Gcc-patches
> >> > > > <gcc-patches@gcc.gnu.org> wrote:
> >> > > > >
> >> > > > > Hi,
> >> > > > > For the following test:
> >> > > > >
> >> > > > > svint32_t f(svint32_t v)
> >> > > > > {
> >> > > > >   return svrev_s32 (svrev_s32 (v));
> >> > > > > }
> >> > > > >
> >> > > > > We generate 2 rev instructions instead of nop:
> >> > > > > f:
> >> > > > >         rev     z0.s, z0.s
> >> > > > >         rev     z0.s, z0.s
> >> > > > >         ret
> >> > > > >
> >> > > > > The attached patch tries to fix that by trying to recognize the following
> >> > > > > pattern in match.pd:
> >> > > > > v1 = VEC_PERM_EXPR (v0, v0, mask)
> >> > > > > v2 = VEC_PERM_EXPR (v1, v1, mask)
> >> > > > > -->
> >> > > > > v2 = v0
> >> > > > > if mask is { nelts - 1, nelts - 2, nelts - 3, ... }
> >> > > > >
> >> > > > > Code-gen with patch:
> >> > > > > f:
> >> > > > >         ret
> >> > > > >
> >> > > > > Bootstrap+test passes on aarch64-linux-gnu, and SVE bootstrap in progress.
> >> > > > > Does it look OK for stage-1 ?
> >> > > >
> >> > > > I didn't look at the patch but tree-ssa-forwprop.cc:simplify_permutation should
> >> > > > handle two consecutive permutes with the is_combined_permutation_identity
> >> > > > which might need tweaking for VLA vectors
> >> > > Hi Richard,
> >> > > Thanks for the suggestions. The attached patch modifies
> >> > > is_combined_permutation_identity
> >> > > to recognize the above pattern.
> >> > > Does it look OK ?
> >> > > Bootstrap+test in progress on aarch64-linux-gnu and x86_64-linux-gnu.
> >> > Hi,
> >> > ping https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615502.html
> >>
> >> Can you instead of def_stmt pass in a bool whether rhs1 is equal to rhs2
> >> and amend the function comment accordingly, say,
> >>
> >>   tem = VEC_PERM <op0, op1, MASK1>;
> >>   res = VEC_PERM <tem, tem, MASK2>;
> >>
> >> SAME_P specifies whether op0 and op1 compare equal.  */
> >>
> >> +  if (def_stmt)
> >> +    gcc_checking_assert (is_gimple_assign (def_stmt)
> >> +                        && gimple_assign_rhs_code (def_stmt) == VEC_PERM_EXPR);
> >> this is then unnecessary
> >>
> >>    mask = fold_ternary (VEC_PERM_EXPR, TREE_TYPE (mask1), mask1, mask1, mask2);
> >> +
> >> +  /* For VLA masks, check for the following pattern:
> >> +     v1 = VEC_PERM_EXPR (v0, v0, mask)
> >> +     v2 = VEC_PERM_EXPR (v1, v1, mask)
> >> +     -->
> >> +     v2 = v0
> >>
> >> you are not using 'mask' so please defer fold_ternary until after your
> >> special-case.
> >>
> >> +  if (operand_equal_p (mask1, mask2, 0)
> >> +      && !VECTOR_CST_NELTS (mask1).is_constant ()
> >> +      && def_stmt
> >> +      && operand_equal_p (gimple_assign_rhs1 (def_stmt),
> >> +                         gimple_assign_rhs2 (def_stmt), 0))
> >> +    {
> >> +      vec_perm_builder builder;
> >> +      if (tree_to_vec_perm_builder (&builder, mask1))
> >> +       {
> >> +         poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask1));
> >> +         vec_perm_indices sel (builder, 1, nelts);
> >> +         if (sel.series_p (0, 1, nelts - 1, -1))
> >> +           return 1;
> >> +       }
> >> +      return 0;
> >>
> >> I'm defering to Richard whether this is the correct way to check for a vector
> >> reversing mask (I wonder how constructing such mask is even possible)
> > Hi Richard,
> > Thanks for the suggestions, I have updated the patch accordingly.
> >
> > The following hunk from svrev_impl::fold() constructs mask in reverse:
> >     /* Permute as { nelts - 1, nelts - 2, nelts - 3, ... }.  */
> >     poly_int64 nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (f.lhs));
> >     vec_perm_builder builder (nelts, 1, 3);
> >     for (int i = 0; i < 3; ++i)
> >       builder.quick_push (nelts - i - 1);
> >     return fold_permute (f, builder);
> >
> > To see if mask chooses elements in reverse, I borrowed it from function comment
> > for series_p in vec-perm-indices.cc:
> > /* Return true if index OUT_BASE + I * OUT_STEP selects input
> >    element IN_BASE + I * IN_STEP.  For example, the call to test
> >    whether a permute reverses a vector of N elements would be:
> >
> >      series_p (0, 1, N - 1, -1)
> >
> >    which would return true for { N - 1, N - 2, N - 3, ... }.  */
> >
> > Thanks,
> > Prathamesh
> >>
> >> Richard.
> >>
> >> > Thanks,
> >> > Prathamesh
> >> > >
> >> > > Thanks,
> >> > > Prathamesh
> >> > > >
> >> > > > Richard.
> >> > > >
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Prathamesh
> >
> > gcc/ChangeLog:
> >       * tree-ssa-forwprop.cc (is_combined_permutation_identity):
> >       New parameter same_p.
> >       Try to simplify two successive VEC_PERM_EXPRs with single operand
> >       and same mask, where mask chooses elements in reverse order.
> >
> > gcc/testesuite/ChangeLog:
> >       * gcc.target/aarch64/sve/acle/general/rev-1.c: New test.
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/rev-1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/rev-1.c
> > new file mode 100644
> > index 00000000000..e57ee67d716
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/rev-1.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-optimized" } */
> > +
> > +#include <arm_sve.h>
> > +
> > +svint32_t f(svint32_t v)
> > +{
> > +  return svrev_s32 (svrev_s32 (v));
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "return v_1\\(D\\)" "optimized" } } */
> > +/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */
> > diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
> > index 9b567440ba4..ebd4a368ae9 100644
> > --- a/gcc/tree-ssa-forwprop.cc
> > +++ b/gcc/tree-ssa-forwprop.cc
> > @@ -2528,11 +2528,16 @@ simplify_bitfield_ref (gimple_stmt_iterator *gsi)
> >    return true;
> >  }
> >
> > -/* Determine whether applying the 2 permutations (mask1 then mask2)
> > -   gives back one of the input.  */
> > +/* For the following sequence:
> > +   tem = VEC_PERM_EXPR <op0, op1, mask1>
> > +   res = VEC_PERM_EXPR <tem, tem, mask2>
> > +
> > +   Determine whether applying the 2 permutations (mask1 then mask2)
> > +   gives back one of the input. SAME_P specifies whether op0
> > +   and op1 compare equal.  */
> >
> >  static int
> > -is_combined_permutation_identity (tree mask1, tree mask2)
> > +is_combined_permutation_identity (tree mask1, tree mask2, bool same_p)
> >  {
> >    tree mask;
> >    unsigned HOST_WIDE_INT nelts, i, j;
> > @@ -2541,6 +2546,29 @@ is_combined_permutation_identity (tree mask1, tree mask2)
> >
> >    gcc_checking_assert (TREE_CODE (mask1) == VECTOR_CST
> >                      && TREE_CODE (mask2) == VECTOR_CST);
> > +
> > +  /* For VLA masks, check for the following pattern:
> > +     v1 = VEC_PERM_EXPR (v0, v0, mask1)
> > +     v2 = VEC_PERM_EXPR (v1, v1, mask2)
> > +     -->
> > +     v2 = v0
> > +     if mask1 == mask2 == {nelts - 1, nelts - 2, ...}.  */
> > +
> > +  if (operand_equal_p (mask1, mask2, 0)
> > +      && !VECTOR_CST_NELTS (mask1).is_constant ()
> > +      && same_p)
>
> The same_p isn't needed, since a reverse mask only picks from the
> first operand.  Canonicalisation rules should ensure that op0 and op1
> end up being equal in that case, but this fold doesn't depend on that.
>
> > +    {
> > +      vec_perm_builder builder;
> > +      if (tree_to_vec_perm_builder (&builder, mask1))
> > +     {
> > +       poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask1));
> > +       vec_perm_indices sel (builder, 1, nelts);
> > +       if (sel.series_p (0, 1, nelts - 1, -1))
> > +         return 1;
> > +     }
>
> I have conflicting feelings about this.
>
> (1) It seems quite special-purpose.  I'm a bit worried that we'll end
>     up with a list of highly specific folds (which could grow quite long)
>     rather than something more general.
>
> (2) Constructing a vec_perm_indices seems quite heavyweight for this
>     single check.  We could go directly from the mask instead.
>
> But “fixing” (2) reduces the generality of the structure and so leans
> more heavily into (1).
>
> I agree the check is correct though, so let's go with the patch
> in this form (without the same_p thing).  However:
>
> > +      return 0;
> > +    }
> > +
>
> I don't think we want this early exit.  It won't be used for all VLA cases
> due the operand_equal_p check.  And in future, more VLA-compatible stuff
> might be added further down.
Hi Richard,
Thanks for the suggestions, does the attached patch look OK ?
Bootstrapped+tested on aarch64-linux-gnu.

Thanks,
Prathamesh
>
> Thanks,
> Richard
>
> >    mask = fold_ternary (VEC_PERM_EXPR, TREE_TYPE (mask1), mask1, mask1, mask2);
> >    if (mask == NULL_TREE || TREE_CODE (mask) != VECTOR_CST)
> >      return 0;
> > @@ -2629,7 +2657,9 @@ simplify_permutation (gimple_stmt_iterator *gsi)
> >        op3 = gimple_assign_rhs3 (def_stmt);
> >        if (TREE_CODE (op3) != VECTOR_CST)
> >       return 0;
> > -      ident = is_combined_permutation_identity (op3, op2);
> > +      bool same_p = operand_equal_p (gimple_assign_rhs1 (def_stmt),
> > +                                  gimple_assign_rhs2 (def_stmt), 0);
> > +      ident = is_combined_permutation_identity (op3, op2, same_p);
> >        if (!ident)
> >       return 0;
> >        orig = (ident == 1) ? gimple_assign_rhs1 (def_stmt)

[-- Attachment #2: gnu-830-5.txt --]
[-- Type: text/plain, Size: 1989 bytes --]

gcc/ChangeLog:
	* tree-ssa-forwprop.cc (is_combined_permutation_identity): Try to
	simplify two successive VEC_PERM_EXPRs with single operand and same
	mask, where mask chooses elements in reverse order.

gcc/testesuite/ChangeLog:
	* gcc.target/aarch64/sve/acle/general/rev-1.c: New test.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/rev-1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/rev-1.c
new file mode 100644
index 00000000000..e57ee67d716
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/rev-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+#include <arm_sve.h>
+
+svint32_t f(svint32_t v)
+{
+  return svrev_s32 (svrev_s32 (v));
+}
+
+/* { dg-final { scan-tree-dump "return v_1\\(D\\)" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 9b567440ba4..61df7efe82c 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -2541,6 +2541,27 @@ is_combined_permutation_identity (tree mask1, tree mask2)
 
   gcc_checking_assert (TREE_CODE (mask1) == VECTOR_CST
 		       && TREE_CODE (mask2) == VECTOR_CST);
+
+  /* For VLA masks, check for the following pattern:
+     v1 = VEC_PERM_EXPR (v0, v0, mask1)
+     v2 = VEC_PERM_EXPR (v1, v1, mask2)
+     -->
+     v2 = v0
+     if mask1 == mask2 == {nelts - 1, nelts - 2, ...}.  */
+
+  if (operand_equal_p (mask1, mask2, 0)
+      && !VECTOR_CST_NELTS (mask1).is_constant ())
+    {
+      vec_perm_builder builder;
+      if (tree_to_vec_perm_builder (&builder, mask1))
+	{
+	  poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask1));
+	  vec_perm_indices sel (builder, 1, nelts);
+	  if (sel.series_p (0, 1, nelts - 1, -1))
+	    return 1;
+	}
+    }
+
   mask = fold_ternary (VEC_PERM_EXPR, TREE_TYPE (mask1), mask1, mask1, mask2);
   if (mask == NULL_TREE || TREE_CODE (mask) != VECTOR_CST)
     return 0;

  reply	other threads:[~2023-04-23  6:11 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-05  8:38 Prathamesh Kulkarni
2023-04-11  8:46 ` Richard Biener
2023-04-11 14:06   ` Prathamesh Kulkarni
2023-04-19  9:21     ` Prathamesh Kulkarni
2023-04-19 10:45       ` Richard Biener
2023-04-19 12:20         ` Prathamesh Kulkarni
2023-04-19 12:32           ` Richard Biener
2023-04-21 16:27           ` Richard Sandiford
2023-04-23  6:11             ` Prathamesh Kulkarni [this message]
2023-04-24  9:32               ` Richard Sandiford
2023-04-24 19:53                 ` Prathamesh Kulkarni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAAgBjMkgFog2+Ok3FnYi32420kEeRzWCwNTPnGrRuwzA+JdKXg@mail.gmail.com \
    --to=prathamesh.kulkarni@linaro.org \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=richard.guenther@gmail.com \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).