public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <rguenther@suse.de>
To: Tamar Christina <Tamar.Christina@arm.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>,  "ook@ucw.cz" <ook@ucw.cz>
Subject: RE: [PATCH 1/8 v9]middle-end slp: Support optimizing load distribution
Date: Thu, 7 Jan 2021 14:36:12 +0100 (CET)	[thread overview]
Message-ID: <nycvar.YFH.7.76.2101071434510.17979@zhemvz.fhfr.qr> (raw)
In-Reply-To: <VI1PR08MB53257ACF5B9F7BBEC623EB81FFAF0@VI1PR08MB5325.eurprd08.prod.outlook.com>

On Thu, 7 Jan 2021, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Thursday, January 7, 2021 1:21 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; ook@ucw.cz
> > Subject: Re: [PATCH 1/8 v9]middle-end slp: Support optimizing load
> > distribution
> > 
> > > From tamar.christina@arm.com Mon Dec 28 14:36:32 2020
> > > Date: Mon, 28 Dec 2020 13:35:56 +0000
> > > From: Tamar Christina <tamar.christina@arm.com>
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: nd@arm.com, rguenther@suse.de, ook@ucw.cz
> > > Subject: [PATCH 1/8 v9]middle-end slp: Support optimizing load
> > > distribution
> > >
> > > Hi All,
> > >
> > > This introduces a post processing step for the pattern matcher to
> > > flatten permutes introduced by the complex multiplications patterns.
> > >
> > > This performs a blend early such that SLP is not cancelled by the
> > > LOAD_LANES permute.  This is a temporary workaround to the fact that
> > > loads are not CSEd during building and is required to produce efficient code.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > > and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* tree-vect-slp.c (optimize_load_redistribution_1): New.
> > > 	(optimize_load_redistribution): New.
> > > 	(vect_match_slp_patterns): Use it.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index
> > >
> > 2a58e54fe51471df5f55ce4a524d0022744054b0..8360a59098f517498f3155f325c
> > f
> > > 8406466ac25c 100644
> > > --- a/gcc/tree-vect-slp.c
> > > +++ b/gcc/tree-vect-slp.c
> > > @@ -2228,6 +2228,115 @@ calculate_unrolling_factor (poly_uint64 nunits,
> > unsigned int group_size)
> > >    return exact_div (common_multiple (nunits, group_size),
> > > group_size);  }
> > >
> > > +/* Helper function of optimize_load_redistribution that performs the
> > operation
> > > +   recursively.  */
> > > +
> > > +static slp_tree
> > > +optimize_load_redistribution_1 (scalar_stmts_to_slp_tree_map_t
> > *bst_map,
> > > +				hash_set<slp_tree> *visited, slp_tree root) {
> > > +  if (visited->add (root))
> > > +    return NULL;
> > > +
> > > +  slp_tree node;
> > > +  unsigned i;
> > > +
> > > +  /* For now, we don't know anything about externals so do not do
> > > + anything.  */  if (SLP_TREE_DEF_TYPE (root) == vect_external_def
> > > +      || SLP_TREE_DEF_TYPE (root) == vect_constant_def)
> > 
> > use a single != vect_internal_def test please
> > 
> > > +    return NULL;
> > > +  else if (SLP_TREE_CODE (root) == VEC_PERM_EXPR
> > > +      && SLP_TREE_LANE_PERMUTATION (root).exists ()
> > > +      && !SLP_TREE_SCALAR_STMTS (root).exists ())
> > 
> > I think both last tests are unnecessary
> 
> It's there to prevent it from trying to optimize  two_operands nodes
> which are a vec_perm but contain no scalar statements. I didn't find a different
> way to distinguish between the two. The SLP tree can contain a number of these
> that haven't been pattern matched away.

Well, that's because of the weak check for what you want to pattern match
below.  Certainly !SLP_TREE_SCALAR_STMTS (root).exists () isn't a reliable
way to catch these.

> > 
> > > +    {
> > > +      /* First convert this node into a load node and add it to the leaves
> > > +         list and flatten the permute from a lane to a load one.  If it's
> > > +         unneeded it will be elided later.  */
> > > +      auto_vec<stmt_vec_info> stmts;
> > > +      stmts.create (SLP_TREE_LANES (root));
> > > +      load_permutation_t load_perm;
> > > +      load_perm.create (SLP_TREE_LANES (root));
> > > +      lane_permutation_t lane_perm = SLP_TREE_LANE_PERMUTATION
> > > + (root);
> > 
> > load_perm leaks when any of the below outs is taken
> > 
> > > +      for (unsigned j = 0; j < lane_perm.length (); j++)
> > > +        {
> > > +          std::pair<unsigned, unsigned> perm = lane_perm[j];
> > > +	  /* This isn't strictly needed, but this function is a temporary
> > > +	     one for specifically pattern matching, so don't want it to
> > > +	     optimize things the remainder of the pipeline will.  */
> > > +	  if (perm.first != j)
> > > +	    goto next;
> > 
> > but please elide it nevertheless
> > 
> > > +          node = SLP_TREE_CHILDREN (root)[perm.first];
> > > +
> > > +	  if (!SLP_TREE_LOAD_PERMUTATION (node).exists ())
> > > +	    return NULL;
> > 
> > so you want to check whether this is a load, I think more to the point would
> > be a vect_internal_def + zero SLP children check.  And a comment on what
> > we test (we do lack classification of SLP nodes, so a helper like
> > vect_is_slp_load_node or so would be OK as well)
> > 
> > > +
> > > +	  stmts.quick_push (SLP_TREE_SCALAR_STMTS
> > (node)[perm.second]);
> > > +          load_perm.safe_push (SLP_TREE_LOAD_PERMUTATION
> > > +(node)[perm.second]);
> > 
> > As you're doing here lacks a check that we are actually loading from the same
> > DR group.  I think it might be easier to just collect scalar stmts and throw
> > them at vect_build_slp_tree?  That should perform the necessary
> > verification, build the appropriate lane permute and perform the CSE.  Which
> > leads to the question why the VEC_PERM node doesn't have scalar stmts set
> > while we are actually be able to compute them here ... that is, the CSE
> > opportunity could have been noticed during pattern matching itself?
> > 
> > > +        }
> > > +
> > > +      if (dump_enabled_p ())
> > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > +			 "converting stmts on permute node %p\n", root);
> > > +
> > > +      slp_tree *value = bst_map->get (stmts);
> > > +      if (value)
> > > +	node = *value;
> > > +      else
> > > +	{
> > > +	  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, node)
> > > +	    SLP_TREE_REF_COUNT (node)++;
> > > +
> > > +	  vec<stmt_vec_info> stmts_cpy = stmts.copy ();
> > > +	  node = vect_create_new_slp_node (stmts_cpy.copy (), 0);
> > > +	  SLP_TREE_VECTYPE (node) = SLP_TREE_VECTYPE (root);
> > > +	  SLP_TREE_LOAD_PERMUTATION (node) = load_perm;
> > > +	  bst_map->put (stmts_cpy, node);
> > > +	}
> > > +      SLP_TREE_REF_COUNT (node)++;
> > 
> > Adjusting the refcount here but doing the replacement in the caller is a bit
> > awkward to follow - how about passing a reference so you can adjust the
> > edge here?
> > 
> > > +
> > > +      return node;
> > > +    }
> > > +
> > > +next:
> > > +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i , node)
> > > +    {
> > > +      slp_tree value = optimize_load_redistribution_1 (bst_map, visited,
> > node);
> > > +      if (value)
> > > +	{
> > > +          SLP_TREE_CHILDREN (root)[i] = value;
> > > +          vect_free_slp_tree (node);
> > > +	}
> > > +    }
> > > +
> > > +  return NULL;
> > > +}
> > > +
> > > +/* Temporary workaround for loads not being CSEd during SLP build.  This
> > > +   function will traverse the SLP tree rooted in ROOT for INSTANCE and find
> > > +   VEC_PERM nodes that blend vectors from multiple nodes that all read
> > from the
> > > +   same DR such that the final operation is equal to a permuted load.  Such
> > > +   NODES are then directly converted into LOADS themselves.  The nodes
> > are
> > > +   CSEd using BST_MAP.  */
> > > +
> > > +static void
> > > +optimize_load_redistribution (scalar_stmts_to_slp_tree_map_t
> > *bst_map,
> > > +			      slp_tree root)
> > > +{
> > > +  slp_tree node;
> > > +  unsigned i;
> > > +  hash_set<slp_tree> visited;
> > > +
> > > +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i , node)
> > > +    {
> > > +      slp_tree value = optimize_load_redistribution_1 (bst_map, &visited,
> > node);
> > > +      if (value)
> > > +	{
> > > +          SLP_TREE_CHILDREN (root)[i] = value;
> > > +          vect_free_slp_tree (node);
> > > +	}
> > > +    }
> > > +}
> > > +
> > >  /* Helper function of vect_match_slp_patterns.
> > >
> > >     Attempts to match patterns against the slp tree rooted in REF_NODE
> > > using @@ -2276,7 +2385,7 @@ static bool  vect_match_slp_patterns
> > > (slp_instance instance, vec_info *vinfo,
> > >  			 hash_set<slp_tree> *visited,
> > >  			 slp_tree_to_load_perm_map_t *perm_cache,
> > > -			 scalar_stmts_to_slp_tree_map_t * /* bst_map */)
> > > +			 scalar_stmts_to_slp_tree_map_t *bst_map)
> > >  {
> > >    DUMP_VECT_SCOPE ("vect_match_slp_patterns");
> > >    slp_tree *ref_node = &SLP_INSTANCE_TREE (instance); @@ -2291,6
> > > +2400,8 @@ vect_match_slp_patterns (slp_instance instance, vec_info
> > > *vinfo,
> > >
> > >    if (found_p)
> > >      {
> > > +      optimize_load_redistribution (bst_map, *ref_node);
> > > +
> > >        if (dump_enabled_p ())
> > >  	{
> > >  	  dump_printf_loc (MSG_NOTE, vect_location,
> > >
> > >
> > > --
> > >
> > >
> > >     [ Part 2, Text/X-DIFF 140 lines. ]
> > >     [ Unable to print this part. ]
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

  reply	other threads:[~2021-01-07 13:36 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-28 13:35 Tamar Christina
2020-12-28 13:36 ` [PATCH 2/8 v9]middle-end slp: fix is_linear_load_p to prevent multiple answers Tamar Christina
2021-01-07 13:17   ` Richard Biener
2020-12-28 13:36 ` [PATCH 3/8 v9]middle-end slp: handle externals correctly in linear_loads_p Tamar Christina
2021-01-07 13:17   ` Richard Biener
2020-12-28 13:37 ` [PATCH 4/8 v9]middle-end slp: upgrade complex add to new format and fix memory leaks Tamar Christina
2021-01-07 13:18   ` Richard Biener
2020-12-28 13:37 ` [PATCH 5/8 v9]middle-end slp: support complex multiply and complex multiply conjugate Tamar Christina
2021-01-08  9:37   ` Richard Biener
2021-01-11 11:01     ` Tamar Christina
2021-01-11 12:04       ` Richard Biener
2020-12-28 13:37 ` [PATCH 6/8 v9]middle-end slp: support complex FMA and complex FMA conjugate Tamar Christina
2021-01-08  9:45   ` Richard Biener
2021-01-08  9:59     ` Tamar Christina
2021-01-08 10:17       ` Richard Biener
2021-01-08 10:21         ` Tamar Christina
2021-01-11 10:24     ` Tamar Christina
2020-12-28 13:38 ` [PATCH 7/8 v9]middle-end slp: support complex FMS and complex FMS conjugate Tamar Christina
2021-01-08  9:49   ` Richard Biener
2021-01-08 10:02     ` Tamar Christina
2020-12-28 13:38 ` [PATCH 8/8 v9]middle-end slp: Add complex operations class to share first match among all matchers Tamar Christina
2021-01-08  9:50   ` Richard Biener
2021-01-07 13:20 ` [PATCH 1/8 v9]middle-end slp: Support optimizing load distribution Richard Biener
2021-01-07 13:25   ` Tamar Christina
2021-01-07 13:36     ` Richard Biener [this message]
2021-01-11 11:01   ` Tamar Christina
2021-01-11 13:54     ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.YFH.7.76.2101071434510.17979@zhemvz.fhfr.qr \
    --to=rguenther@suse.de \
    --cc=Tamar.Christina@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=nd@arm.com \
    --cc=ook@ucw.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).