public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: Richard Sandiford <richard.sandiford@arm.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [RFC] Using main loop's updated IV as base_address for epilogue vectorization
Date: Mon, 17 May 2021 09:14:49 +0100	[thread overview]
Message-ID: <c03961fa-ef0f-7ce4-180b-8414a610ee73@arm.com> (raw)
In-Reply-To: <6e4541ef-e0a1-1d2d-53f5-4bfed9a65598@arm.com>

[-- Attachment #1: Type: text/plain, Size: 2507 bytes --]

Hi,

So this is my second attempt at finding a way to improve how we generate 
the vector IV's and teach the vectorizer to share them between main loop 
and epilogues. On IRC we discussed my idea to use the loop's control_iv, 
but that was a terrible idea and I quickly threw it in the bin. The main 
problem, that for some reason I failed to see, was that the control_iv 
increases by 's' and the datarefs by 's' * NELEMENTS where 's' is 
usually 1 and NELEMENTs the amount of elements we handle per iteration. 
That means the epilogue loops would have to start from the last loop's 
IV * the last loop's NELEMENT's and that would just cause a mess.

Instead I started to think about creating IV's for the datarefs and what 
I thought worked best was to create these in scalar before peeling. That 
way the peeling mechanisms takes care of the duplication of these for 
the vector and scalar epilogues and it also takes care of adding 
phi-nodes for the skip_vector paths.
These new IV's have two functions:
1) 'vect_create_data_ref_ptr' can use them to:
  a) if it's the main loop: replace the values of the 'initial' value of 
the main loop's IV and the initial values in the skip_vector phi-nodes
  b) Update the the skip_vector phi-nodes argument for the non-skip path 
with the updated vector ptr.

2) They are used for the scalar epilogue ensuring they share the same 
datareference ptr.

There are still a variety of 'hacky' elements here and a lot of testing 
to be done, but I hope to be able to clean them away. One of the main 
issues I had was that I had to skip a couple of checks and things for 
the added phi-nodes and update statements as these do not have 
stmt_vec_info representation.  Though I'm not sure adding this 
representation at their creation was much cleaner... It is something I 
could play around with but I thought this was a good moment to ask you 
for input. For instance, maybe we could do this transformation before 
analysis?

Also be aware that because I create a IV for each dataref this leads to 
regressions with SVE codegen for instance. NEON is able to use the 
post-index addressing mode to increase each dr IV at access time, but 
SVE can't do this.  For this I don't know if maybe we could try to be 
smart and create shared IV's. So rather than make them based on the 
actual vector ptr, use a shared sizetype IV that can be shared among dr 
IV's with the same step. Or maybe this is something for IVOPTs?

Let me know what ya think!

Kind regards,
Andre

[-- Attachment #2: iv_rfc_2.patch --]
[-- Type: text/plain, Size: 12264 bytes --]

diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 8001cc54f518d9d9d1a0fcfe5790d22dae109fb2..939c0a7fefd4355dd75d7646ac2ae63ce23a0e14 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -174,6 +174,8 @@ struct data_reference
 
   /* Alias information for the data reference.  */
   struct dr_alias alias;
+
+  hash_map<loop_p, tree> *iv_bases;
 };
 
 #define DR_STMT(DR)                (DR)->stmt
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 124a7bea6a94161556a6622fa7b113b3cef98bcf..f638bb3e0aa007e0bf7ad8f75fb767d3484b02ce 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -1475,6 +1475,7 @@ void
 free_data_ref (data_reference_p dr)
 {
   DR_ACCESS_FNS (dr).release ();
+  delete dr->iv_bases;
   free (dr);
 }
 
@@ -1506,6 +1507,7 @@ create_data_ref (edge nest, loop_p loop, tree memref, gimple *stmt,
   DR_REF (dr) = memref;
   DR_IS_READ (dr) = is_read;
   DR_IS_CONDITIONAL_IN_STMT (dr) = is_conditional_in_stmt;
+  dr->iv_bases = new hash_map<loop_p, tree> ();
 
   dr_analyze_innermost (&DR_INNERMOST (dr), memref,
 			nest != NULL ? loop : NULL, stmt);
diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
index 86fc118b6befb06233e5e86a01454fd7075075e1..93e14d09763da5034ba97d09b07c94c20fe25a28 100644
--- a/gcc/tree-ssa-loop-manip.h
+++ b/gcc/tree-ssa-loop-manip.h
@@ -24,6 +24,8 @@ typedef void (*transform_callback)(class loop *, void *);
 
 extern void create_iv (tree, tree, tree, class loop *, gimple_stmt_iterator *,
 		       bool, tree *, tree *);
+extern void create_or_update_iv (tree, tree, tree, class loop *, gimple_stmt_iterator *,
+				  bool, tree *, tree *, gphi *, bool);
 extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
 					    class loop *);
 extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 28ae1316fa0eb6939a45d15e893b7386622ba60c..1709e175c382ef5d74c2f628a61c9fffe26f726d 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -57,9 +57,10 @@ static bitmap_obstack loop_renamer_obstack;
    VAR_AFTER (unless they are NULL).  */
 
 void
-create_iv (tree base, tree step, tree var, class loop *loop,
-	   gimple_stmt_iterator *incr_pos, bool after,
-	   tree *var_before, tree *var_after)
+create_or_update_iv (tree base, tree step, tree var, class loop *loop,
+		     gimple_stmt_iterator *incr_pos, bool after,
+		     tree *var_before, tree *var_after, gphi *update_phi,
+		     bool epilogue)
 {
   gassign *stmt;
   gphi *phi;
@@ -153,11 +154,64 @@ create_iv (tree base, tree step, tree var, class loop *loop,
   if (stmts)
     gsi_insert_seq_on_edge_immediate (pe, stmts);
 
+  if (update_phi) {
+      gimple_seq stmts;
+      gimple_stmt_iterator gsi;
+      imm_use_iterator imm_iter;
+      gimple *stmt;
+      tree old_va = PHI_ARG_DEF_FROM_EDGE (update_phi, loop_latch_edge (loop));
+      tree old_init = PHI_ARG_DEF_FROM_EDGE (update_phi, loop_preheader_edge (loop));
+      tree type = TREE_TYPE (old_va);
+      tree va_conv = fold_convert (type, va);
+
+      gsi = gsi_for_stmt (SSA_NAME_DEF_STMT (va));
+      va_conv = force_gimple_operand (va_conv, &stmts, true, NULL_TREE);
+
+      if (stmts)
+	gsi_insert_seq_after (&gsi, stmts, GSI_NEW_STMT);
+
+
+      FOR_EACH_IMM_USE_STMT (stmt, imm_iter, old_va) {
+	  if (!flow_bb_inside_loop_p (loop, stmt->bb))
+	    {
+	      /* We added this IV, so the outside uses should all be in PHI's
+		 added for skip_vector handling.  */
+	      gcc_assert (gimple_code (stmt) == GIMPLE_PHI);
+	      use_operand_p use_p;
+	      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
+		SET_USE (use_p, va_conv);
+
+	    }
+      }
+      if (epilogue)
+	initial = old_init;
+      else {
+	  tree initial_conv = fold_convert (type, initial);
+	  stmts = NULL;
+	  initial_conv = force_gimple_operand (initial_conv, &stmts, true,
+					       NULL_TREE);
+	  if (stmts)
+	    gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop),
+					      stmts);
+	  replace_uses_by (old_init, initial_conv);
+	  release_ssa_name_fn (cfun, old_init);
+      }
+  }
   phi = create_phi_node (vb, loop->header);
   add_phi_arg (phi, initial, loop_preheader_edge (loop), UNKNOWN_LOCATION);
   add_phi_arg (phi, va, loop_latch_edge (loop), UNKNOWN_LOCATION);
 }
 
+void
+create_iv (tree base, tree step, tree var, class loop *loop,
+	   gimple_stmt_iterator *incr_pos, bool after,
+	   tree *var_before, tree *var_after)
+{
+  create_or_update_iv (base, step, var, loop, incr_pos, after, var_before,
+		       var_after, NULL, false);
+}
+
+
 /* Return the innermost superloop LOOP of USE_LOOP that is a superloop of
    both DEF_LOOP and USE_LOOP.  */
 
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 97c8577ebe720cf857d811c70eccf60c7655509c..87ca4891df792af73f226ebfb1ce6753818ec87e 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -4941,11 +4941,14 @@ vect_create_data_ref_ptr (vec_info *vinfo, stmt_vec_info stmt_info,
 	}
 
       standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+      gphi *update_phi
+	= as_a<gphi *> (SSA_NAME_DEF_STMT (*dr->iv_bases->get (loop)));
 
-      create_iv (aggr_ptr_init,
+      create_or_update_iv (aggr_ptr_init,
 		 fold_convert (aggr_ptr_type, iv_step),
 		 aggr_ptr, loop, &incr_gsi, insert_after,
-		 &indx_before_incr, &indx_after_incr);
+		 &indx_before_incr, &indx_after_incr, update_phi,
+		 LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) != NULL);
       incr = gsi_stmt (incr_gsi);
 
       /* Copy the points-to information if it exists. */
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 012f48bd4870125c820049b4fc70db0ef0759bdf..89e0db5d8adcd9afa709dd37a1842ec36dfa13e6 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1402,6 +1402,10 @@ find_loop_location (class loop *loop)
 static bool
 iv_phi_p (stmt_vec_info stmt_info)
 {
+  /* This is to make sure we don't try to vectorize the IV's added by
+     the vectorizer for datarefs.  */
+  if (stmt_info == NULL)
+    return false;
   gphi *phi = as_a <gphi *> (stmt_info->stmt);
   if (virtual_operand_p (PHI_RESULT (phi)))
     return false;
@@ -1439,6 +1443,10 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
 
       gphi *phi = gsi.phi ();
       stmt_vec_info phi_info = loop_vinfo->lookup_stmt (phi);
+
+      if (!phi_info)
+	continue;
+
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location, "Analyze phi: %G",
 			 phi_info->stmt);
@@ -2280,7 +2288,8 @@ slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
 static void
 slpeel_update_phi_nodes_for_guard1 (class loop *skip_loop,
 				    class loop *update_loop,
-				    edge guard_edge, edge merge_edge)
+				    edge guard_edge, edge merge_edge,
+				    vec<data_reference_p> *datarefs)
 {
   location_t merge_loc, guard_loc;
   edge orig_e = loop_preheader_edge (skip_loop);
@@ -2310,6 +2319,25 @@ slpeel_update_phi_nodes_for_guard1 (class loop *skip_loop,
 
       /* Update phi in UPDATE_PHI.  */
       adjust_phi_and_debug_stmts (update_phi, update_e, new_res);
+
+      if (datarefs) {
+	  struct data_reference *dr;
+	  unsigned int i;
+
+	  FOR_EACH_VEC_ELT (*datarefs, i, dr) {
+	      if (*(dr->iv_bases->get (skip_loop)) == PHI_RESULT (orig_phi)) {
+		  tree &iv_base = dr->iv_bases->get_or_insert (update_loop);
+		  iv_base = PHI_RESULT (update_phi);
+		  /*
+		  tree new_dummy_var = make_ssa_name (TREE_TYPE (iv_base));
+		  SET_PHI_ARG_DEF (update_phi,
+				   loop_latch_edge (update_loop)->dest_idx,
+				   new_dummy_var);
+				   */
+	      }
+	  }
+	}
+
     }
 }
 
@@ -2831,7 +2859,10 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 					   irred_flag);
 	  e = EDGE_PRED (guard_to, 0);
 	  e = (e != guard_e ? e : EDGE_PRED (guard_to, 1));
-	  slpeel_update_phi_nodes_for_guard1 (prolog, loop, guard_e, e);
+	  slpeel_update_phi_nodes_for_guard1 (prolog, loop, guard_e, e,
+					      NULL);
+	  /* TODO: create dummy phi-nodes and iv-bases for prolog.  Update
+	     'loop's iv-bases.  */
 
 	  scale_bbs_frequencies (&bb_after_prolog, 1, prob_prolog);
 	  scale_loop_profile (prolog, prob_prolog, bound_prolog);
@@ -2931,7 +2962,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	  skip_e = guard_e;
 	  e = EDGE_PRED (guard_to, 0);
 	  e = (e != guard_e ? e : EDGE_PRED (guard_to, 1));
-	  slpeel_update_phi_nodes_for_guard1 (first_loop, epilog, guard_e, e);
+	  slpeel_update_phi_nodes_for_guard1 (first_loop, epilog, guard_e, e,
+					      &LOOP_VINFO_DATAREFS (loop_vinfo));
 
 	  /* Simply propagate profile info from guard_bb to guard_to which is
 	     a merge point of control flow.  */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 3e973e774af8f9205be893e01ad9263281116885..638f620db96438b67512aa0d364a4dde57e1ea42 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -9264,11 +9264,6 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
   free (LOOP_VINFO_BBS (epilogue_vinfo));
   LOOP_VINFO_BBS (epilogue_vinfo) = epilogue_bbs;
 
-  /* Advance data_reference's with the number of iterations of the previous
-     loop and its prologue.  */
-  vect_update_inits_of_drs (epilogue_vinfo, advance, PLUS_EXPR);
-
-
   /* The EPILOGUE loop is a copy of the original loop so they share the same
      gimple UIDs.  In this loop we update the loop_vec_info of the EPILOGUE to
      point to the copied statements.  We also create a mapping of all LHS' in
@@ -9281,7 +9276,9 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
 	{
 	  new_stmt = epilogue_phi_gsi.phi ();
 
-	  gcc_assert (gimple_uid (new_stmt) > 0);
+	  if (gimple_uid (new_stmt) == 0)
+	    continue;
+
 	  stmt_vinfo
 	    = epilogue_vinfo->stmt_vec_infos[gimple_uid (new_stmt) - 1];
 
@@ -9299,10 +9296,9 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
 	   !gsi_end_p (epilogue_gsi); gsi_next (&epilogue_gsi))
 	{
 	  new_stmt = gsi_stmt (epilogue_gsi);
-	  if (is_gimple_debug (new_stmt))
+	  if (is_gimple_debug (new_stmt) || gimple_uid (new_stmt) == 0)
 	    continue;
 
-	  gcc_assert (gimple_uid (new_stmt) > 0);
 	  stmt_vinfo
 	    = epilogue_vinfo->stmt_vec_infos[gimple_uid (new_stmt) - 1];
 
@@ -9484,6 +9480,51 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
   tree advance;
   drs_init_vec orig_drs_init;
 
+  if (LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) == NULL)
+    {
+      struct data_reference *dr;
+      unsigned int i;
+      gimple_seq stmts;
+      gimple_stmt_iterator gsi = gsi_for_stmt (get_loop_exit_condition (loop));
+
+      FOR_EACH_VEC_ELT (LOOP_VINFO_DATAREFS (loop_vinfo), i, dr) {
+	  tree &iv_base = dr->iv_bases->get_or_insert (loop);
+	  tree type = TREE_TYPE (DR_BASE_ADDRESS (dr));
+	  gphi *phi = create_phi_node (type, loop->header);
+
+	  /* Create an update.  */
+	  int step_i = int_cst_value (DR_STEP (dr));
+	  step_i = step_i / int_cst_value (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr))));
+	  tree step = build_int_cst (sizetype, step_i);
+	  tree update = fold_build2 (POINTER_PLUS_EXPR, type,
+				     PHI_RESULT (phi),
+				     step);
+	  stmts = NULL;
+	  update = force_gimple_operand (update, &stmts, true, NULL_TREE);
+	  gsi_insert_seq_before (&gsi, stmts, GSI_NEW_STMT);
+
+	  /* Replace IV in ref. */
+	  tree mem = fold_build2 (MEM_REF, TREE_TYPE (DR_REF (dr)),
+				  PHI_RESULT (phi),
+				  build_zero_cst (type));
+	  for (int j = 0; j < gimple_num_ops (DR_STMT (dr)); ++j) {
+	      tree op = gimple_op (DR_STMT (dr), j);
+	     if (op == DR_REF (dr))
+		gimple_set_op (DR_STMT (dr), j, mem);
+	      else
+		simplify_replace_tree (op, DR_REF (dr), mem, NULL, NULL, false);
+	  }
+	  update_stmt (DR_STMT (dr));
+
+	  tree dummy_base = make_ssa_name (type);
+	  add_phi_arg (phi, update, single_succ_edge (loop->latch),
+		       UNKNOWN_LOCATION);
+	  add_phi_arg (phi, dummy_base, loop_preheader_edge (loop), UNKNOWN_LOCATION);
+	  iv_base = PHI_RESULT (phi);
+      }
+    }
+
+
   epilogue = vect_do_peeling (loop_vinfo, niters, nitersm1, &niters_vector,
 			      &step_vector, &niters_vector_mult_vf, th,
 			      check_profitability, niters_no_overflow,

  parent reply	other threads:[~2021-05-17  8:14 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-30  9:07 Andre Vieira (lists)
2021-05-04  9:56 ` Richard Biener
2021-05-05 11:34   ` Andre Vieira (lists)
2021-05-05 12:34     ` Richard Biener
2021-05-05 16:58       ` Andre Vieira (lists)
2021-05-07 12:05         ` Richard Biener
2021-05-17  8:14         ` Andre Vieira (lists) [this message]
2021-05-20 10:22           ` Richard Biener
2021-06-14  8:10             ` Andre Vieira (lists)
2021-06-14  8:37               ` Richard Biener
2021-06-14 10:57                 ` Richard Biener
2021-06-16 10:24                   ` Andre Vieira (lists)
2021-06-16 11:13                     ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c03961fa-ef0f-7ce4-180b-8414a610ee73@arm.com \
    --to=andre.simoesdiasvieira@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=rguenther@suse.de \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).