Re: [RFC] Using main loop's updated IV as base_address for epilogue vectorization

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: Richard Sandiford <richard.sandiford@arm.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [RFC] Using main loop's updated IV as base_address for epilogue vectorization
Date: Mon, 14 Jun 2021 09:10:29 +0100	[thread overview]
Message-ID: <4925fee5-dcea-0c14-388c-85c881ffd918@arm.com> (raw)
In-Reply-To: <nycvar.YFH.7.76.2105201147390.9200@zhemvz.fhfr.qr>

[-- Attachment #1: Type: text/plain, Size: 4929 bytes --]

Hi,


On 20/05/2021 11:22, Richard Biener wrote:
> On Mon, 17 May 2021, Andre Vieira (lists) wrote:
>
>> Hi,
>>
>> So this is my second attempt at finding a way to improve how we generate the
>> vector IV's and teach the vectorizer to share them between main loop and
>> epilogues. On IRC we discussed my idea to use the loop's control_iv, but that
>> was a terrible idea and I quickly threw it in the bin. The main problem, that
>> for some reason I failed to see, was that the control_iv increases by 's' and
>> the datarefs by 's' * NELEMENTS where 's' is usually 1 and NELEMENTs the
>> amount of elements we handle per iteration. That means the epilogue loops
>> would have to start from the last loop's IV * the last loop's NELEMENT's and
>> that would just cause a mess.
>>
>> Instead I started to think about creating IV's for the datarefs and what I
>> thought worked best was to create these in scalar before peeling. That way the
>> peeling mechanisms takes care of the duplication of these for the vector and
>> scalar epilogues and it also takes care of adding phi-nodes for the
>> skip_vector paths.
> How does this work for if-converted loops where we use the
> non-if-converted scalar loop for (scalar) peeling but the
> if-converted scalar loop for vectorized epilogues?  I suppose
> you're only adjusting the if-converted copy.
True hadn't thought about this :(
>
>> These new IV's have two functions:
>> 1) 'vect_create_data_ref_ptr' can use them to:
>>   a) if it's the main loop: replace the values of the 'initial' value of the
>> main loop's IV and the initial values in the skip_vector phi-nodes
>>   b) Update the the skip_vector phi-nodes argument for the non-skip path with
>> the updated vector ptr.
> b) means the prologue IV will not be dead there so we actually need
> to compute it?  I suppose IVOPTs could be teached to replace an
> IV with its final value (based on some other IV) when it's unused?
> Or does it already magically do good?
It does not and ...
>
>> 2) They are used for the scalar epilogue ensuring they share the same
>> datareference ptr.
>>
>> There are still a variety of 'hacky' elements here and a lot of testing to be
>> done, but I hope to be able to clean them away. One of the main issues I had
>> was that I had to skip a couple of checks and things for the added phi-nodes
>> and update statements as these do not have stmt_vec_info representation.
>> Though I'm not sure adding this representation at their creation was much
>> cleaner... It is something I could play around with but I thought this was a
>> good moment to ask you for input. For instance, maybe we could do this
>> transformation before analysis?
>>
>> Also be aware that because I create a IV for each dataref this leads to
>> regressions with SVE codegen for instance. NEON is able to use the post-index
>> addressing mode to increase each dr IV at access time, but SVE can't do this.
>> For this I don't know if maybe we could try to be smart and create shared
>> IV's. So rather than make them based on the actual vector ptr, use a shared
>> sizetype IV that can be shared among dr IV's with the same step. Or maybe this
>> is something for IVOPTs?
> Certainly IVOPTs could decide to use the newly created IVs in the
> scalar loops for the DRs therein as well.  But since IVOPTs only
> considers a single loop at a time it will probably not pay too
> much attention and is only influenced by the out-of-loop uses of
> the final values of the IVs.
>
> My gut feeling tells me that whatever we do we'll have to look
> into improving IVOPTs to consider multiple loops.

So I redid the IV-sharing and it's looking a lot simpler and neater, 
however it only shares IVs between vectorized loops and not scalar pro- 
or epilogues. I am not certain IVOPTs will be able to deal with these, 
as it has no knowledge of the number of iterations of each different 
loop. So take for instance a prologue peeling for alignment loop and a 
first main vectorization loop. To be able to reuse the IV's from the 
prologue in the main vectorization loop it would need to know that the 
initial start adress + PEELING_NITERS == base address for main 
vectorization loop.

I'll starting testing this approach for correctness if there are no 
major concerns. Though I suspect we will only want to turn this into a 
patch once we have the IVOPTs work done to a point where it at least 
doesn't regress codegen because of shared IVs and eventually we can look 
at how to solve the sharing between vectorized and scalar loops.

A small nitpick on my own RFC. I will probably move the 'skip_e' to 
outside of the map, as we only need one per loop_vinfo and not one per 
DR. Initially I didnt even have this skip_e in, but was using the 
creation of a dummy PHI node and then replacing it with the real thing 
later. Though this made the code simpler, especially when inserting the 
'init's stmt_list.

Kind regards,
Andre

[-- Attachment #2: iv_rfc_3.patch --]
[-- Type: text/plain, Size: 8422 bytes --]

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index b317df532a9a92a619de9572378437d09c632ab0..e7d0f1e657b1a0c9bec75799817242e0bc1d8282 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -4909,11 +4909,27 @@ vect_create_data_ref_ptr (vec_info *vinfo, stmt_vec_info stmt_info,
 						   offset, byte_offset);
   if (new_stmt_list)
     {
-      if (pe)
-        {
-          new_bb = gsi_insert_seq_on_edge_immediate (pe, new_stmt_list);
-          gcc_assert (!new_bb);
-        }
+      if (loop_vinfo)
+	{
+	  iv_struct *iv = loop_vinfo->dr_ivs->get (dr);
+	  if (iv && iv->skip_e)
+	    {
+	      /* The initial value of the IV we are inserting here will be used
+		 in a PHI-node that will be used as the initial IV for the next
+		 vectorized epilogue, so we have to make sure we insert this
+		 NEW_STMT_LIST before the SKIP_E.  */
+	      gimple_stmt_iterator g = gsi_last_bb (iv->skip_e->src);
+	      if (gimple_code (gsi_stmt (g)) == GIMPLE_COND)
+		gsi_insert_seq_before (&g, new_stmt_list, GSI_NEW_STMT);
+	      else
+		gsi_insert_seq_after (&g, new_stmt_list, GSI_NEW_STMT);
+	    }
+	  else
+	    {
+	      new_bb = gsi_insert_seq_on_edge_immediate (pe, new_stmt_list);
+	      gcc_assert (!new_bb);
+	    }
+	}
       else
         gsi_insert_seq_before (gsi, new_stmt_list, GSI_SAME_STMT);
     }
@@ -4946,12 +4962,55 @@ vect_create_data_ref_ptr (vec_info *vinfo, stmt_vec_info stmt_info,
 
       standard_iv_increment_position (loop, &incr_gsi, &insert_after);
 
+      /* If this is an epilogue, make sure to use the previous loop's IV for
+	  the same DR as the init of this DR's IV.  */
+      if (loop_vinfo
+	  && LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo))
+	{
+	  iv_struct *iv
+	    = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo)->dr_ivs->get (dr);
+	  gcc_assert (iv);
+	  aggr_ptr_init = iv->def;
+	}
+
       create_iv (aggr_ptr_init,
 		 fold_convert (aggr_ptr_type, iv_step),
 		 aggr_ptr, loop, &incr_gsi, insert_after,
 		 &indx_before_incr, &indx_after_incr);
       incr = gsi_stmt (incr_gsi);
 
+      if (loop_vinfo)
+	{
+	  iv_struct &iv = loop_vinfo->dr_ivs->get_or_insert (dr);
+	  tree incr_lhs = gimple_get_lhs (incr);
+	  if (iv.skip_e)
+	    {
+	      edge e;
+	      edge_iterator ei;
+	      tree val;
+	      /* Create a skip PHI, use INCR_LHS as the value for the not-skip
+		 path(s) and AGGR_PTR_INIT as the value for the skip path.  */
+	      iv.def = copy_ssa_name (incr_lhs);
+	      gphi *skip_phi = create_phi_node (iv.def,
+						iv.skip_e->dest);
+
+	      FOR_EACH_EDGE (e, ei, skip_phi->bb->preds)
+		{
+		  if (e == iv.skip_e)
+		    val = aggr_ptr_init;
+		  else
+		    val = incr_lhs;
+		  add_phi_arg (skip_phi, val, e, UNKNOWN_LOCATION);
+		}
+	    }
+	  else
+	    {
+	      /* No skip PHI has been created, so we can use INCR_LHS as the
+		 IV.DEF.  */
+	      iv.def = incr_lhs;
+	    }
+	}
+
       /* Copy the points-to information if it exists. */
       if (DR_PTR_INFO (dr))
 	{
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 012f48bd4870125c820049b4fc70db0ef0759bdf..2475ab8b8263499f3471b64aa10b956523fe56df 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -2554,8 +2554,7 @@ class loop *
 vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 		 tree *niters_vector, tree *step_vector,
 		 tree *niters_vector_mult_vf_var, int th,
-		 bool check_profitability, bool niters_no_overflow,
-		 tree *advance)
+		 bool check_profitability, bool niters_no_overflow)
 {
   edge e, guard_e;
   tree type = TREE_TYPE (niters), guard_cond;
@@ -3059,10 +3058,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	  niters = PHI_RESULT (new_phi);
 	}
 
-      /* Set ADVANCE to the number of iterations performed by the previous
-	 loop and its prologue.  */
-      *advance = niters;
-
       if (!vect_epilogues_updated_niters)
 	{
 	  /* Subtract the number of iterations performed by the vectorized loop
@@ -3090,6 +3085,21 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   adjust_vec.release ();
   free_original_copy_tables ();
 
+  if (vect_epilogues && skip_e)
+    {
+	/* Keep track of the SKIP_E to later construct a phi-node per
+	 * datareference.  */
+	unsigned int i;
+	vec<data_reference_p> datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
+	struct data_reference *dr;
+
+	FOR_EACH_VEC_ELT (datarefs, i, dr)
+	  {
+	    iv_struct &iv = loop_vinfo->dr_ivs->get_or_insert (dr);
+	    iv.skip_e = skip_e;
+	  }
+    }
+
   return vect_epilogues ? epilog : NULL;
 }
 
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index ba36348b835c25bc556da71a133f81f8a6fc3745..fc2468ebc001935d86fee9b06855a02832b444ad 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -901,6 +901,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     }
 
   epilogue_vinfos.create (6);
+  dr_ivs = new hash_map<data_reference *, iv_struct> ();
 }
 
 /* Free all levels of rgroup CONTROLS.  */
@@ -927,6 +928,7 @@ _loop_vec_info::~_loop_vec_info ()
   delete ivexpr_map;
   delete scan_map;
   epilogue_vinfos.release ();
+  delete dr_ivs;
 
   /* When we release an epiloge vinfo that we do not intend to use
      avoid clearing AUX of the main loop which should continue to
@@ -9252,7 +9254,7 @@ find_in_mapping (tree t, void *context)
    prologue of the main loop.  */
 
 static void
-update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
+update_epilogue_loop_vinfo (class loop *epilogue)
 {
   loop_vec_info epilogue_vinfo = loop_vec_info_for_loop (epilogue);
   auto_vec<gimple *> stmt_worklist;
@@ -9267,11 +9269,6 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
   free (LOOP_VINFO_BBS (epilogue_vinfo));
   LOOP_VINFO_BBS (epilogue_vinfo) = epilogue_bbs;
 
-  /* Advance data_reference's with the number of iterations of the previous
-     loop and its prologue.  */
-  vect_update_inits_of_drs (epilogue_vinfo, advance, PLUS_EXPR);
-
-
   /* The EPILOGUE loop is a copy of the original loop so they share the same
      gimple UIDs.  In this loop we update the loop_vec_info of the EPILOGUE to
      point to the copied statements.  We also create a mapping of all LHS' in
@@ -9489,8 +9486,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 
   epilogue = vect_do_peeling (loop_vinfo, niters, nitersm1, &niters_vector,
 			      &step_vector, &niters_vector_mult_vf, th,
-			      check_profitability, niters_no_overflow,
-			      &advance);
+			      check_profitability, niters_no_overflow);
 
   if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo)
       && LOOP_VINFO_SCALAR_LOOP_SCALING (loop_vinfo).initialized_p ())
@@ -9818,7 +9814,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 
   if (epilogue)
     {
-      update_epilogue_loop_vinfo (epilogue, advance);
+      update_epilogue_loop_vinfo (epilogue);
 
       epilogue->simduid = loop->simduid;
       epilogue->force_vectorize = loop->force_vectorize;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 7dcb4cd0b46b03eef90705eed776d9c3dd797101..da7e2fe72a549634d3cb197e7136de995e20c2b0 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -545,6 +545,12 @@ typedef auto_vec<rgroup_controls> vec_loop_lens;
 
 typedef auto_vec<std::pair<data_reference*, tree> > drs_init_vec;
 
+struct iv_struct
+{
+  tree def;
+  edge skip_e;
+};
+
 /*-----------------------------------------------------------------*/
 /* Info on vectorized loops.                                       */
 /*-----------------------------------------------------------------*/
@@ -756,6 +762,8 @@ public:
      analysis.  */
   vec<_loop_vec_info *> epilogue_vinfos;
 
+  hash_map<data_reference *, iv_struct> *dr_ivs;
+
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -1791,8 +1799,7 @@ class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
 						     class loop *, edge);
 class loop *vect_loop_versioning (loop_vec_info, gimple *);
 extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
-				    tree *, tree *, tree *, int, bool, bool,
-				    tree *);
+				    tree *, tree *, tree *, int, bool, bool);
 extern void vect_prepare_for_masked_peels (loop_vec_info);
 extern dump_user_location_t find_loop_location (class loop *);
 extern bool vect_can_advance_ivs_p (loop_vec_info);

next prev parent reply	other threads:[~2021-06-14  8:10 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-30  9:07 Andre Vieira (lists)
2021-05-04  9:56 ` Richard Biener
2021-05-05 11:34   ` Andre Vieira (lists)
2021-05-05 12:34     ` Richard Biener
2021-05-05 16:58       ` Andre Vieira (lists)
2021-05-07 12:05         ` Richard Biener
2021-05-17  8:14         ` Andre Vieira (lists)
2021-05-20 10:22           ` Richard Biener
2021-06-14  8:10             ` Andre Vieira (lists) [this message]
2021-06-14  8:37               ` Richard Biener
2021-06-14 10:57                 ` Richard Biener
2021-06-16 10:24                   ` Andre Vieira (lists)
2021-06-16 11:13                     ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4925fee5-dcea-0c14-388c-85c881ffd918@arm.com \
    --to=andre.simoesdiasvieira@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=rguenther@suse.de \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).