RE: [PATCH v2 9/18]middle-end optimize slp simplify back to back permutes.

From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>, "ook@ucw.cz" <ook@ucw.cz>
Subject: RE: [PATCH v2 9/18]middle-end optimize slp simplify back to back permutes.
Date: Wed, 4 Nov 2020 13:08:26 +0000
Message-ID: <VI1PR08MB532598C2E70FE6AEDB7B5477FFEF0@VI1PR08MB5325.eurprd08.prod.outlook.com>
In-Reply-To: <nycvar.YFH.7.76.2011041347090.10073@p653.nepu.fhfr.qr>

Hi Richi,

> -----Original Message-----
> From: rguenther@c653.arch.suse.de <rguenther@c653.arch.suse.de> On
> Behalf Of Richard Biener
> Sent: Wednesday, November 4, 2020 1:00 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; ook@ucw.cz
> Subject: Re: [PATCH v2 9/18]middle-end optimize slp simplify back to back
> permutes.
> 
> On Tue, 3 Nov 2020, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This optimizes sequential permutes, i.e. if there are two permutes
> > back to back, this function applies the permute of the parent to the
> > child and removes the parent.
> >
> > If the resulting permute in the child is now a no-op, then the child
> > is also dropped from the graph and the parent's parent is attached to
> > the child's child.
> >
> > This relies on the materialization point calculation in optimize SLP.
> >
> > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> > Tests are included as part of the final patch as they need the SLP
> > pattern matcher to insert permutes in between.
> >
> > This allows us to remove useless permutes such as
> >
> > 	ldr     q0, [x0, x3]
> > 	ldr     q2, [x1, x3]
> > 	trn1    v1.4s, v0.4s, v0.4s
> > 	trn2    v0.4s, v0.4s, v0.4s
> > 	trn1    v0.4s, v1.4s, v0.4s
> > 	mov     v1.16b, v3.16b
> > 	fcmla   v1.4s, v0.4s, v2.4s, #0
> > 	fcmla   v1.4s, v0.4s, v2.4s, #90
> > 	str     q1, [x2, x3]
> >
> > from the sequence the vectorizer puts out and give
> >
> > 	ldr     q0, [x0, x3]
> > 	ldr     q2, [x1, x3]
> > 	mov     v1.16b, v3.16b
> > 	fcmla   v1.4s, v0.4s, v2.4s, #0
> > 	fcmla   v1.4s, v0.4s, v2.4s, #90
> > 	str     q1, [x2, x3]
> >
> > instead
> >
> > Ok for master?
> 
> +             /* If the remaining permute is a no-op then we can just drop the
> +                node instead of materializing it.  */
> +             if (vect_slp_tree_permute_noop_p (node))
> +               {
> +                 if (dump_enabled_p ())
> +                   dump_printf_loc (MSG_NOTE, vect_location,
> +                                    "removing unneeded permute node %p\n",
> +                                    node);
> +
> +                  unsigned idx = SLP_TREE_LANE_PERMUTATION (node)[0].first;
> +                  slp_tree value = SLP_TREE_CHILDREN (node)[idx];
> +                  unsigned src = slpg->vertices[node->vertex].pred->src;
> +                  slp_tree prev = vertices[src];
> +                  unsigned dest;
> +                  slp_tree tmp;
> +                  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (prev), dest, tmp)
> +                    if (tmp == node)
> +                      {
> +                         SLP_TREE_CHILDREN (prev)[dest] = value;
> +                         break;
> +                       }
> 
> so I don't think this will work reliably since we do not update the graph when
> inserting permute nodes and thus the "parent"
> can refer to a permute rather than the original node now (we're just walking
> over all vertices in no specific order during materialization - guess using IPO
> might fix this apart from in cycles).  You would also need to iterate over preds
> here (pred_next).
> I guess removing no-op permutes is only important for costing?
> They should not cause any actual code generation?

Yeah, it's just for costing; the simplification of the permute is the part that fixes
the codegen. I could just clear the lane permute and change the costing function so that
it does not cost VEC_PERMs with no lane permutes (if it doesn't already do that).
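
As a rough illustration of that idea, here is a standalone sketch, not the actual GCC code. `perm_node`, `permute_cost` and `clear_noop_permute` are hypothetical names, and `lane_permutation` is only a stand-in for `lane_permutation_t`; the real costing hook and its signature are not shown in this thread.

```cpp
#include <utility>
#include <vector>

// Stand-in for lane_permutation_t: (child index, lane index) pairs.
using lane_permutation = std::vector<std::pair<unsigned, unsigned>>;

// Hypothetical, simplified model of a VEC_PERM SLP node.
struct perm_node {
  lane_permutation lanes;  // empty once the permute has been cleared
};

// Hypothetical cost hook: a node without a lane permutation emits no
// permute instruction, so it is charged nothing.
unsigned permute_cost(const perm_node &node, unsigned per_permute_cost)
{
  return node.lanes.empty() ? 0 : per_permute_cost;
}

// Clear the lane permutation of a node already known to be a no-op,
// leaving the node itself (and the graph around it) untouched.
void clear_noop_permute(perm_node &node)
{
  node.lanes.clear();
}
```

The point of clearing rather than unlinking is that no parent pointers or reference counts need to change.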

> 
> You also need to adjust reference counts when you change
> SLP_TREE_CHILDREN (prev)[dest], first add to that of VALUE and then
> slp_tree_free node itself (which might be tricky at this point).
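
For reference, the ordering described above (take the new reference first, then drop the old one, and do so for every parent) can be modelled in a standalone sketch. The `node` struct, `release` and `replace_in_parents` are hypothetical stand-ins and not GCC's slp_tree API; freeing is modelled by a flag so the sketch stays simple.

```cpp
#include <vector>

// Hypothetical, simplified model of a reference-counted SLP node.
struct node {
  unsigned refcnt = 1;
  bool freed = false;            // set instead of really deallocating
  std::vector<node *> children;
};

// Drop one reference; when the last one goes, the node is "freed" and
// gives up its references to its children.
void release(node *n)
{
  if (--n->refcnt == 0)
    {
      n->freed = true;
      for (node *c : n->children)
        release(c);
    }
}

// Replace every use of OLD_NODE in all PARENTS by VALUE.
void replace_in_parents(const std::vector<node *> &parents,
                        node *old_node, node *value)
{
  for (node *p : parents)
    for (node *&slot : p->children)
      if (slot == old_node)
        {
          ++value->refcnt;    // take the new reference first ...
          slot = value;
          release(old_node);  // ... then drop the old one
        }
}
```

Taking the reference on the replacement first matters when the old node holds the only other reference to it: releasing first would free the replacement before it is re-linked.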
> 
> +static bool
> +vect_slp_tree_permute_noop_p (slp_tree node)
> +{
> +  gcc_assert (SLP_TREE_CODE (node) == VEC_PERM_EXPR);
> +
> +  if (!SLP_TREE_LANE_PERMUTATION (node).exists ())
> +    return true;
> +
> +  unsigned x, seed;
> +  lane_permutation_t perms = SLP_TREE_LANE_PERMUTATION (node);
> +  seed = perms[0].second;
> +  for (x = 1; x < perms.length (); x++)
> +    if (perms[x].first != perms[0].first || perms[x].second != ++seed)
> +      return false;
> 
> 'seed' needs to be zero to be a noop permute and SLP_TREE_LANES
> (SLP_TREE_CHILDREN (node)[perms[0].first]) needs to be the same as
> SLP_TREE_LANES (node).  Otherwise you'll make permutes that select parts
> of a vector no-op.
> 
> Maybe simplify the patch and do the vect_slp_tree_permute_noop_p check
> in vectorizable_slp_permutation instead?
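
For reference, a standalone sketch of the stricter check described in the review comment: the selected lanes must start at zero, and the child must have as many lanes as the permute node. This is a model only; `lane_permutation` stands in for `lane_permutation_t`, `permute_is_noop` is a hypothetical name, and the lane counts are passed in explicitly instead of being read via SLP_TREE_LANES. It also assumes the permutation has one entry per lane of the node.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for lane_permutation_t: (child index, lane index) pairs.
using lane_permutation = std::vector<std::pair<unsigned, unsigned>>;

// Returns true only if PERM selects every lane of a single child, in
// order, starting at lane 0.  CHILD_LANES[i] is the lane count of
// child i; NODE_LANES is the lane count of the permute node itself.
bool permute_is_noop(const lane_permutation &perm,
                     const std::vector<unsigned> &child_lanes,
                     unsigned node_lanes)
{
  if (perm.empty())
    return true;  // no lane permutation recorded at all

  unsigned child = perm[0].first;
  // Selecting only part of a wider child is a real operation.
  if (child >= child_lanes.size() || child_lanes[child] != node_lanes)
    return false;
  if (perm.size() != node_lanes)
    return false;

  for (std::size_t i = 0; i < perm.size(); ++i)
    if (perm[i].first != child || perm[i].second != i)
      return false;
  return true;
}
```

With this version a permute such as { (0,2), (0,3) } on a four-lane child is rejected, whereas a check that only requires consecutive lanes from an arbitrary start would accept it.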
> 
> The permute node adjustment part is OK, thus up to
> 
> +         else if (SLP_TREE_LANE_PERMUTATION (node).exists ())
> +           {
> +             /* If the node is already a permute node we just need to apply
> +                the permutation to the permute node itself.  */
> +             if (dump_enabled_p ())
> +               dump_printf_loc (MSG_NOTE, vect_location,
> +                                "simplifying permute node %p\n",
> +                                node);
> +
> +             vect_slp_permute (perms[perm], SLP_TREE_LANE_PERMUTATION (node),
> +                               true);
> 
> in case you want to split up, independently of the rest of the patches.
> 

I could, but I don't think I can generate a testcase for this, as the build function
wouldn't have generated back-to-back permutes.  So I'd still need the rest to get
approved for a testcase 😊
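
To make the back-to-back case concrete, here is a standalone sketch of folding a parent permutation into a child permute node. It is a model, not the patch: `apply_parent_permute` is a hypothetical name, `lane_permutation` stands in for `lane_permutation_t`, and the direction of application (output lane i takes the entry the parent selects for i) is an assumption rather than a statement about what vect_slp_permute does with its `reverse` argument.

```cpp
#include <utility>
#include <vector>

// Stand-in for lane_permutation_t: (child index, lane index) pairs.
using lane_permutation = std::vector<std::pair<unsigned, unsigned>>;

// Fold the parent's permutation OUTER into the child permute node's
// lane permutation INNER, so that only one permute remains.
// Assumed convention: result[i] = inner[outer[i]].
lane_permutation apply_parent_permute(const std::vector<unsigned> &outer,
                                      const lane_permutation &inner)
{
  lane_permutation out;
  out.reserve(outer.size());
  for (unsigned src : outer)
    out.push_back(inner.at(src));  // at() throws if OUTER is out of range
  return out;
}
```

For example, if the child swaps adjacent lanes, { (0,1), (0,0), (0,3), (0,2) }, and the parent applies { 1, 0, 3, 2 }, the folded permutation is { (0,0), (0,1), (0,2), (0,3) }, i.e. the identity on child 0.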

Thanks,
Tamar

> Thanks,
> Richard.
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-vect-slp.c (vect_slp_tree_permute_noop_p): New.
> > 	(vect_optimize_slp): Optimize permutes.
> > 	(vectorizable_slp_permutation): Fix typo.
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg, Germany; GF: Felix Imend
