[gcc(refs/users/rguenth/heads/vect-force-slp)] Improve combined store node splitting

public inbox for gcc-cvs@sourceware.org
help / color / mirror / Atom feed

* [gcc(refs/users/rguenth/heads/vect-force-slp)] Improve combined store node splitting
@ 2024-03-20 14:19 Richard Biener
  0 siblings, 0 replies; 2+ messages in thread
From: Richard Biener @ 2024-03-20 14:19 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:3a1fe1d6d9412430435b9672f3a1967a395d191d

commit 3a1fe1d6d9412430435b9672f3a1967a395d191d
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Mar 20 14:55:08 2024 +0100

    Improve combined store node splitting
    
    The following improves on the initial "Avoid splitting store dataref
    groups during SLP discovery" change, in particular on how we deal
    with the multi-input VEC_PERM node combining back the SLP instances
    into the single node for the whole group store.  Instead of
    combining the last two inputs recursively this more carefully
    selects nodes to combine (but still recursively), combining the
    first two nodes with the least number of inputs.  That should avoid
    the need for three-input permutes consistently.
    
            * tree-vect-slp.cc (vect_build_slp_instance): Split merge
            permute node in a better manner.

Diff:
---
 gcc/tree-vect-slp.cc | 66 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 53 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 71152411776..12b9cb2c519 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3619,18 +3619,45 @@ vect_build_slp_instance (vec_info *vinfo,
 		}
 
 	      /* ???  Now we have a single permute node but when that's
-	         fed more than two inputs it's prone to hit the limitation
+		 fed more than two inputs it's prone to hit the limitation
 		 on at most two sources for a VEC_PERM_EXPR.  Ideally
 		 we'd defer the following to the optimize-slp pass but
 		 for now split it here.
-		 ???  Optimally we'd produce permute nodes feeding in
-		 the same number of lanes from each input and also have
-		 the same vector type (only the width will eventually
-		 differ here), for now just do "something".  */
+		 For now perform pairwise reduction, reducing the two inputs
+		 with the least number of lanes to one and then repeat until
+		 we end up with two inputs.  */
 	      while (SLP_TREE_CHILDREN (perm).length () > 2)
 		{
-		  slp_tree b = SLP_TREE_CHILDREN (perm).pop ();
-		  slp_tree a = SLP_TREE_CHILDREN (perm).pop ();
+		  /* Pick the two nodes with the least number of lanes,
+		     prefer the earliest candidate and maintain ai < bi.  */
+		  int ai = -1;
+		  int bi = -1;
+		  for (unsigned ci = 0;
+		       ci < SLP_TREE_CHILDREN (perm).length (); ++ci)
+		    {
+		      if (ai == -1)
+			ai = ci;
+		      else if (bi == -1)
+			bi = ci;
+		      else if ((SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[ci])
+				< SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[ai]))
+			       || (SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[ci])
+				   < SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[bi])))
+			{
+			  if (SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[ai])
+			      <= SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[bi]))
+			    bi = ci;
+			  else
+			    {
+			      ai = bi;
+			      bi = ci;
+			    }
+			}
+		    }
+
+		  /* Produce a merge of nodes ai and bi.  */
+		  slp_tree a = SLP_TREE_CHILDREN (perm)[ai];
+		  slp_tree b = SLP_TREE_CHILDREN (perm)[bi];
 		  unsigned n = SLP_TREE_LANES (a) + SLP_TREE_LANES (b);
 		  slp_tree permab = vect_create_new_slp_node (2, VEC_PERM_EXPR);
 		  SLP_TREE_LANES (permab) = n;
@@ -3647,12 +3674,25 @@ vect_build_slp_instance (vec_info *vinfo,
 		  for (unsigned k = 0; k < SLP_TREE_LANES (b); ++k)
 		    SLP_TREE_LANE_PERMUTATION (permab)
 		      .quick_push (std::make_pair (1, k));
-		  /* ???  Popluate SLP_TREE_SCALAR_STMTS/OPS of permab.  */
-		  SLP_TREE_CHILDREN (perm).quick_push (permab);
-		  for (unsigned k = group_size - n; k < group_size; ++k)
-		    SLP_TREE_LANE_PERMUTATION (perm)[k]
-		      = std::make_pair (SLP_TREE_CHILDREN (perm).length () - 1,
-					k - (group_size - n));
+
+		  /* Put the merged node into 'perm', in place of a  */
+		  SLP_TREE_CHILDREN (perm)[ai] = permab;
+		  /* Adjust the references to b in the permutation
+		     of perm and to the later children which we'll
+		     remove.  */
+		  for (unsigned k = 0; k < SLP_TREE_LANES (perm); ++k)
+		    {
+		      std::pair<unsigned, unsigned> &p
+			  = SLP_TREE_LANE_PERMUTATION (perm)[k];
+		      if (p.first == (unsigned) bi)
+			{
+			  p.first = ai;
+			  p.second += SLP_TREE_LANES (a);
+			}
+		      else if (p.first > (unsigned) bi)
+			p.first--;
+		    }
+		  SLP_TREE_CHILDREN (perm).ordered_remove (bi);
 		}
 
 	  /* Create a new SLP instance.  */

^ permalink raw reply	[flat|nested] 2+ messages in thread

* [gcc(refs/users/rguenth/heads/vect-force-slp)] Improve combined store node splitting
@ 2024-05-13 14:28 Richard Biener
  0 siblings, 0 replies; 2+ messages in thread
From: Richard Biener @ 2024-05-13 14:28 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:4336060fe2db8ec41c0f108034a4ae8de89e5fa1

commit 4336060fe2db8ec41c0f108034a4ae8de89e5fa1
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Mar 20 14:55:08 2024 +0100

    Improve combined store node splitting
    
    The following improves on the initial "Avoid splitting store dataref
    groups during SLP discovery" change, in particular on how we deal
    with the multi-input VEC_PERM node combining back the SLP instances
    into the single node for the whole group store.  Instead of
    combining the last two inputs recursively this more carefully
    selects nodes to combine (but still recursively), combining the
    first two nodes with the least number of inputs.  That should avoid
    the need for three-input permutes consistently.
    
            * tree-vect-slp.cc (vect_build_slp_instance): Split merge
            permute node in a better manner.

Diff:
---
 gcc/tree-vect-slp.cc | 66 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 53 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index f3743997e9cd..7e6ff07db0ff 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3654,18 +3654,45 @@ vect_build_slp_instance (vec_info *vinfo,
 		}
 
 	      /* ???  Now we have a single permute node but when that's
-	         fed more than two inputs it's prone to hit the limitation
+		 fed more than two inputs it's prone to hit the limitation
 		 on at most two sources for a VEC_PERM_EXPR.  Ideally
 		 we'd defer the following to the optimize-slp pass but
 		 for now split it here.
-		 ???  Optimally we'd produce permute nodes feeding in
-		 the same number of lanes from each input and also have
-		 the same vector type (only the width will eventually
-		 differ here), for now just do "something".  */
+		 For now perform pairwise reduction, reducing the two inputs
+		 with the least number of lanes to one and then repeat until
+		 we end up with two inputs.  */
 	      while (SLP_TREE_CHILDREN (perm).length () > 2)
 		{
-		  slp_tree b = SLP_TREE_CHILDREN (perm).pop ();
-		  slp_tree a = SLP_TREE_CHILDREN (perm).pop ();
+		  /* Pick the two nodes with the least number of lanes,
+		     prefer the earliest candidate and maintain ai < bi.  */
+		  int ai = -1;
+		  int bi = -1;
+		  for (unsigned ci = 0;
+		       ci < SLP_TREE_CHILDREN (perm).length (); ++ci)
+		    {
+		      if (ai == -1)
+			ai = ci;
+		      else if (bi == -1)
+			bi = ci;
+		      else if ((SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[ci])
+				< SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[ai]))
+			       || (SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[ci])
+				   < SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[bi])))
+			{
+			  if (SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[ai])
+			      <= SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[bi]))
+			    bi = ci;
+			  else
+			    {
+			      ai = bi;
+			      bi = ci;
+			    }
+			}
+		    }
+
+		  /* Produce a merge of nodes ai and bi.  */
+		  slp_tree a = SLP_TREE_CHILDREN (perm)[ai];
+		  slp_tree b = SLP_TREE_CHILDREN (perm)[bi];
 		  unsigned n = SLP_TREE_LANES (a) + SLP_TREE_LANES (b);
 		  slp_tree permab = vect_create_new_slp_node (2, VEC_PERM_EXPR);
 		  SLP_TREE_LANES (permab) = n;
@@ -3682,12 +3709,25 @@ vect_build_slp_instance (vec_info *vinfo,
 		  for (unsigned k = 0; k < SLP_TREE_LANES (b); ++k)
 		    SLP_TREE_LANE_PERMUTATION (permab)
 		      .quick_push (std::make_pair (1, k));
-		  /* ???  Popluate SLP_TREE_SCALAR_STMTS/OPS of permab.  */
-		  SLP_TREE_CHILDREN (perm).quick_push (permab);
-		  for (unsigned k = group_size - n; k < group_size; ++k)
-		    SLP_TREE_LANE_PERMUTATION (perm)[k]
-		      = std::make_pair (SLP_TREE_CHILDREN (perm).length () - 1,
-					k - (group_size - n));
+
+		  /* Put the merged node into 'perm', in place of a  */
+		  SLP_TREE_CHILDREN (perm)[ai] = permab;
+		  /* Adjust the references to b in the permutation
+		     of perm and to the later children which we'll
+		     remove.  */
+		  for (unsigned k = 0; k < SLP_TREE_LANES (perm); ++k)
+		    {
+		      std::pair<unsigned, unsigned> &p
+			  = SLP_TREE_LANE_PERMUTATION (perm)[k];
+		      if (p.first == (unsigned) bi)
+			{
+			  p.first = ai;
+			  p.second += SLP_TREE_LANES (a);
+			}
+		      else if (p.first > (unsigned) bi)
+			p.first--;
+		    }
+		  SLP_TREE_CHILDREN (perm).ordered_remove (bi);
 		}
 
 	  /* Create a new SLP instance.  */

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2024-05-13 14:28 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-20 14:19 [gcc(refs/users/rguenth/heads/vect-force-slp)] Improve combined store node splitting Richard Biener
2024-05-13 14:28 Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).