public inbox for gcc-patches@gcc.gnu.org
* [PATCH 1/3]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables
@ 2023-10-02  7:41 Tamar Christina
  2023-10-02  7:41 ` [PATCH 2/3]middle-end: updated niters analysis to handle multiple exits Tamar Christina
From: Tamar Christina @ 2023-10-02  7:41 UTC
  To: gcc-patches; +Cc: nd, rguenther, jlaw


Hi All,

This patch is extracted from the series that adds early break vectorization,
in order to simplify the review of that series.

The goal of this one is to separate out the refactoring from the new
functionality.

This first patch moves the vectorizer's notion of a loop exit into dedicated
fields inside loop_vinfo.  During vectorization there can be three separate
copies of each loop: scalar, vectorized, and epilogue.  The scalar loop can also
be the versioned loop before peeling.

Because of this we track three different exits inside loop_vinfo, one for each
of these loops.  Additionally, each function that uses an exit now takes that
exit explicitly as an argument whenever it is not obvious which exit is
needed.

This is because callers often switch the loops being passed around.
While the caller knows which loop it is passing, the callee does not.
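
As a rough illustration only (this is not code from the patch), the pattern can
be sketched with toy types.  The field names vec_loop_iv, vec_epilogue_loop_iv
and scalar_loop_iv match the ones added to loop_vinfo; toy_edge, toy_loop,
toy_loop_vinfo and exit_dest are hypothetical stand-ins for the real GCC types
and helpers.

```cpp
#include <cassert>

// Toy stand-ins for GCC's `edge` and `class loop`; not the real types.
struct toy_edge { int src; int dest; };
struct toy_loop { int num; };

// Hypothetical mirror of the three exits tracked per loop_vinfo.
struct toy_loop_vinfo
{
  toy_edge *vec_loop_iv = nullptr;          // exit of the vectorized loop
  toy_edge *vec_epilogue_loop_iv = nullptr; // exit of the epilogue loop
  toy_edge *scalar_loop_iv = nullptr;       // exit of the scalar loop
};

// The callee no longer derives the exit from the loop it was handed;
// the caller, which knows which copy it is passing, names the exit.
int
exit_dest (const toy_loop *, const toy_edge *exit_e)
{
  return exit_e->dest;
}
```

With this shape a caller that swaps loops between calls must also say which
exit it means, so the callee cannot silently pick the wrong one.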

For now the loop exits are simply initialized to the same value as before, as
determined by single_exit (..).
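
A minimal sketch of that selection rule, assuming a plain list of exits in
place of GCC's own exit bookkeeping (toy_edge and pick_main_exit are
hypothetical names used only for this example):

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for GCC's `edge`; not the real type.
struct toy_edge { int src; int dest; };

// Return the only exit when there is exactly one, otherwise nullptr.
// This keeps the single-exit behaviour: loops with zero or several
// exits yield no main exit.
const toy_edge *
pick_main_exit (const std::vector<toy_edge> &exits)
{
  return exits.size () == 1 ? &exits[0] : nullptr;
}
```

A caller treats a null result as "cannot determine the main exit" and gives up
on the loop.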

No change in functionality is expected throughout this patch series.

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-linux-gnu with
no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-loop-distribution.cc (copy_loop_before): Pass exit explicitly.
	(loop_distribution::distribute_loop): Bail out if not single exit.
	* tree-scalar-evolution.cc (get_loop_exit_condition): New.
	* tree-scalar-evolution.h (get_loop_exit_condition): New.
	* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment): Pass exit
	explicitly.
	* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors,
	vect_set_loop_condition_partial_vectors_avx512,
	vect_set_loop_condition_normal, vect_set_loop_condition): Explicitly
	take exit.
	(slpeel_tree_duplicate_loop_to_edge_cfg): Explicitly take exit and
	return the corresponding new peeled exit.
	(slpeel_can_duplicate_loop_p): Explicitly take exit.
	(find_loop_location): Handle not knowing an explicit exit.
	(vect_update_ivs_after_vectorizer, vect_gen_vector_loop_niters_mult_vf,
	find_guard_arg, slpeel_update_phi_nodes_for_loops,
	slpeel_update_phi_nodes_for_guard2): Use new exits.
	(vect_do_peeling): Update bookkeeping to keep track of exits.
	* tree-vect-loop.cc (vect_get_loop_niters): Explicitly take exit to
	analyze.
	(vec_init_loop_exit_info): New.
	(_loop_vec_info::_loop_vec_info): Initialize vec_loop_iv,
	vec_epilogue_loop_iv, scalar_loop_iv.
	(vect_analyze_loop_form): Initialize exits.
	(vect_create_loop_vinfo): Set main exit.
	(vect_create_epilog_for_reduction, vectorizable_live_operation,
	vect_transform_loop): Use it.
	(scale_profile_for_vect_loop): Explicitly take exit to scale.
	* tree-vectorizer.cc (set_uid_loop_bbs): Initialize loop exit.
	* tree-vectorizer.h (LOOP_VINFO_IV_EXIT, LOOP_VINFO_EPILOGUE_IV_EXIT,
	LOOP_VINFO_SCALAR_IV_EXIT): New.
	(struct loop_vec_info): Add vec_loop_iv, vec_epilogue_loop_iv,
	scalar_loop_iv.
	(vect_set_loop_condition, slpeel_can_duplicate_loop_p,
	slpeel_tree_duplicate_loop_to_edge_cfg): Take explicit exits.
	(vec_init_loop_exit_info): New.
	(struct vect_loop_form_info): Add loop_exit.

--- inline copy of patch -- 
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index a28470b66ea935741a61fb73961ed7c927543a3d..902edc49ab588152a5b845f2c8a42a7e2a1d6080 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -949,7 +949,8 @@ copy_loop_before (class loop *loop, bool redirect_lc_phi_defs)
   edge preheader = loop_preheader_edge (loop);
 
   initialize_original_copy_tables ();
-  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader);
+  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, single_exit (loop), NULL,
+						NULL, preheader, NULL);
   gcc_assert (res != NULL);
 
   /* When a not last partition is supposed to keep the LC PHIs computed
@@ -3043,6 +3044,24 @@ loop_distribution::distribute_loop (class loop *loop,
       return 0;
     }
 
+  /* Loop distribution only does prologue peeling but we still need to
+     initialize loop exit information.  However we only support single exits at
+     the moment.  As such, should exit information not have been provided and we
+     have more than one exit, bail out.  */
+  if (!single_exit (loop))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "Loop %d not distributed: too many exits.\n",
+		 loop->num);
+
+      free_rdg (rdg);
+      loop_nest.release ();
+      free_data_refs (datarefs_vec);
+      delete ddrs_table;
+      return 0;
+    }
+
   data_reference_p dref;
   for (i = 0; datarefs_vec.iterate (i, &dref); ++i)
     dref->aux = (void *) (uintptr_t) i;
diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
index c58a8a16e81573aada38e912b7c58b3e1b23b66d..f35ca1bded0b841179e4958645d264ad23684019 100644
--- a/gcc/tree-scalar-evolution.h
+++ b/gcc/tree-scalar-evolution.h
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 
 extern tree number_of_latch_executions (class loop *);
 extern gcond *get_loop_exit_condition (const class loop *);
+extern gcond *get_loop_exit_condition (const_edge);
 
 extern void scev_initialize (void);
 extern bool scev_initialized_p (void);
diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index 3fb6951e6085352c027d32c3548246042b98b64b..7cafe5ce576079921e380aaab5c5c4aa84cea372 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -1292,9 +1292,17 @@ scev_dfs::follow_ssa_edge_expr (gimple *at_stmt, tree expr,
 
 gcond *
 get_loop_exit_condition (const class loop *loop)
+{
+  return get_loop_exit_condition (single_exit (loop));
+}
+
+/* If the statement just before the EXIT_EDGE contains a condition then
+   return the condition, otherwise NULL. */
+
+gcond *
+get_loop_exit_condition (const_edge exit_edge)
 {
   gcond *res = NULL;
-  edge exit_edge = single_exit (loop);
 
   if (dump_file && (dump_flags & TDF_SCEV))
     fprintf (dump_file, "(get_loop_exit_condition \n  ");
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 40ab568fe355964b878d770010aa9eeaef63eeac..9607a9fb25da26591ffd8071a02495f2042e0579 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -2078,7 +2078,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 
   /* Check if we can possibly peel the loop.  */
   if (!vect_can_advance_ivs_p (loop_vinfo)
-      || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
+      || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
+				       LOOP_VINFO_IV_EXIT (loop_vinfo))
       || loop->inner)
     do_peeling = false;
 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 09641901ff1e5c03dd07ab6f85dd67288f940ea2..e06717272aafc6d31cbdcb94840ac25de616da6d 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -803,7 +803,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
    final gcond.  */
 
 static gcond *
-vect_set_loop_condition_partial_vectors (class loop *loop,
+vect_set_loop_condition_partial_vectors (class loop *loop, edge exit_edge,
 					 loop_vec_info loop_vinfo, tree niters,
 					 tree final_iv, bool niters_maybe_zero,
 					 gimple_stmt_iterator loop_cond_gsi)
@@ -904,7 +904,6 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   add_header_seq (loop, header_seq);
 
   /* Get a boolean result that tells us whether to iterate.  */
-  edge exit_edge = single_exit (loop);
   gcond *cond_stmt;
   if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
       && !LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
@@ -935,7 +934,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   if (final_iv)
     {
       gassign *assign = gimple_build_assign (final_iv, orig_niters);
-      gsi_insert_on_edge_immediate (single_exit (loop), assign);
+      gsi_insert_on_edge_immediate (exit_edge, assign);
     }
 
   return cond_stmt;
@@ -953,6 +952,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
 
 static gcond *
 vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
+					 edge exit_edge,
 					 loop_vec_info loop_vinfo, tree niters,
 					 tree final_iv,
 					 bool niters_maybe_zero,
@@ -1144,7 +1144,6 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
   add_preheader_seq (loop, preheader_seq);
 
   /* Adjust the exit test using the decrementing IV.  */
-  edge exit_edge = single_exit (loop);
   tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : GT_EXPR;
   /* When we peel for alignment with niter_skip != 0 this can
      cause niter + niter_skip to wrap and since we are comparing the
@@ -1183,7 +1182,8 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
    loop handles exactly VF scalars per iteration.  */
 
 static gcond *
-vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
+vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
+				class loop *loop, tree niters, tree step,
 				tree final_iv, bool niters_maybe_zero,
 				gimple_stmt_iterator loop_cond_gsi)
 {
@@ -1191,13 +1191,12 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
   gcond *cond_stmt;
   gcond *orig_cond;
   edge pe = loop_preheader_edge (loop);
-  edge exit_edge = single_exit (loop);
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
   enum tree_code code;
   tree niters_type = TREE_TYPE (niters);
 
-  orig_cond = get_loop_exit_condition (loop);
+  orig_cond = get_loop_exit_condition (exit_edge);
   gcc_assert (orig_cond);
   loop_cond_gsi = gsi_for_stmt (orig_cond);
 
@@ -1305,19 +1304,18 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
   if (final_iv)
     {
       gassign *assign;
-      edge exit = single_exit (loop);
-      gcc_assert (single_pred_p (exit->dest));
+      gcc_assert (single_pred_p (exit_edge->dest));
       tree phi_dest
 	= integer_zerop (init) ? final_iv : copy_ssa_name (indx_after_incr);
       /* Make sure to maintain LC SSA form here and elide the subtraction
 	 if the value is zero.  */
-      gphi *phi = create_phi_node (phi_dest, exit->dest);
-      add_phi_arg (phi, indx_after_incr, exit, UNKNOWN_LOCATION);
+      gphi *phi = create_phi_node (phi_dest, exit_edge->dest);
+      add_phi_arg (phi, indx_after_incr, exit_edge, UNKNOWN_LOCATION);
       if (!integer_zerop (init))
 	{
 	  assign = gimple_build_assign (final_iv, MINUS_EXPR,
 					phi_dest, init);
-	  gimple_stmt_iterator gsi = gsi_after_labels (exit->dest);
+	  gimple_stmt_iterator gsi = gsi_after_labels (exit_edge->dest);
 	  gsi_insert_before (&gsi, assign, GSI_SAME_STMT);
 	}
     }
@@ -1348,29 +1346,33 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
    Assumption: the exit-condition of LOOP is the last stmt in the loop.  */
 
 void
-vect_set_loop_condition (class loop *loop, loop_vec_info loop_vinfo,
+vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo,
 			 tree niters, tree step, tree final_iv,
 			 bool niters_maybe_zero)
 {
   gcond *cond_stmt;
-  gcond *orig_cond = get_loop_exit_condition (loop);
+  gcond *orig_cond = get_loop_exit_condition (loop_e);
   gimple_stmt_iterator loop_cond_gsi = gsi_for_stmt (orig_cond);
 
   if (loop_vinfo && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
     {
       if (LOOP_VINFO_PARTIAL_VECTORS_STYLE (loop_vinfo) == vect_partial_vectors_avx512)
-	cond_stmt = vect_set_loop_condition_partial_vectors_avx512 (loop, loop_vinfo,
+	cond_stmt = vect_set_loop_condition_partial_vectors_avx512 (loop, loop_e,
+								    loop_vinfo,
 								    niters, final_iv,
 								    niters_maybe_zero,
 								    loop_cond_gsi);
       else
-	cond_stmt = vect_set_loop_condition_partial_vectors (loop, loop_vinfo,
+	cond_stmt = vect_set_loop_condition_partial_vectors (loop, loop_e,
+							     loop_vinfo,
 							     niters, final_iv,
 							     niters_maybe_zero,
 							     loop_cond_gsi);
     }
   else
-    cond_stmt = vect_set_loop_condition_normal (loop, niters, step, final_iv,
+    cond_stmt = vect_set_loop_condition_normal (loop_vinfo, loop_e, loop,
+						niters,
+						step, final_iv,
 						niters_maybe_zero,
 						loop_cond_gsi);
 
@@ -1439,7 +1441,6 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to)
 		     get_current_def (PHI_ARG_DEF_FROM_EDGE (from_phi, from)));
 }
 
-
 /* Given LOOP this function generates a new copy of it and puts it
    on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
    non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
@@ -1447,8 +1448,9 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to)
    entry or exit of LOOP.  */
 
 class loop *
-slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
-					class loop *scalar_loop, edge e)
+slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
+					class loop *scalar_loop,
+					edge scalar_exit, edge e, edge *new_e)
 {
   class loop *new_loop;
   basic_block *new_bbs, *bbs, *pbbs;
@@ -1458,13 +1460,16 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
   edge exit, new_exit;
   bool duplicate_outer_loop = false;
 
-  exit = single_exit (loop);
+  exit = loop_exit;
   at_exit = (e == exit);
   if (!at_exit && e != loop_preheader_edge (loop))
     return NULL;
 
   if (scalar_loop == NULL)
-    scalar_loop = loop;
+    {
+      scalar_loop = loop;
+      scalar_exit = loop_exit;
+    }
 
   bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
   pbbs = bbs + 1;
@@ -1490,13 +1495,15 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
   bbs[0] = preheader;
   new_bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
 
-  exit = single_exit (scalar_loop);
   copy_bbs (bbs, scalar_loop->num_nodes + 1, new_bbs,
-	    &exit, 1, &new_exit, NULL,
+	    &scalar_exit, 1, &new_exit, NULL,
 	    at_exit ? loop->latch : e->src, true);
-  exit = single_exit (loop);
+  exit = loop_exit;
   basic_block new_preheader = new_bbs[0];
 
+  if (new_e)
+    *new_e = new_exit;
+
   /* Before installing PHI arguments make sure that the edges
      into them match that of the scalar loop we analyzed.  This
      makes sure the SLP tree matches up between the main vectorized
@@ -1537,8 +1544,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
 	 but LOOP will not.  slpeel_update_phi_nodes_for_guard{1,2} expects
 	 the LOOP SSA_NAMEs (on the exit edge and edge from latch to
 	 header) to have current_def set, so copy them over.  */
-      slpeel_duplicate_current_defs_from_edges (single_exit (scalar_loop),
-						exit);
+      slpeel_duplicate_current_defs_from_edges (scalar_exit, exit);
       slpeel_duplicate_current_defs_from_edges (EDGE_SUCC (scalar_loop->latch,
 							   0),
 						EDGE_SUCC (loop->latch, 0));
@@ -1696,11 +1702,11 @@ slpeel_add_loop_guard (basic_block guard_bb, tree cond,
  */
 
 bool
-slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
+slpeel_can_duplicate_loop_p (const class loop *loop, const_edge exit_e,
+			     const_edge e)
 {
-  edge exit_e = single_exit (loop);
   edge entry_e = loop_preheader_edge (loop);
-  gcond *orig_cond = get_loop_exit_condition (loop);
+  gcond *orig_cond = get_loop_exit_condition (exit_e);
   gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
   unsigned int num_bb = loop->inner? 5 : 2;
 
@@ -1709,7 +1715,7 @@ slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
   if (!loop_outer (loop)
       || loop->num_nodes != num_bb
       || !empty_block_p (loop->latch)
-      || !single_exit (loop)
+      || !exit_e
       /* Verify that new loop exit condition can be trivially modified.  */
       || (!orig_cond || orig_cond != gsi_stmt (loop_exit_gsi))
       || (e != exit_e && e != entry_e))
@@ -1722,7 +1728,7 @@ slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
   return ret;
 }
 
-/* Function vect_get_loop_location.
+/* Function find_loop_location.
 
    Extract the location of the loop in the source code.
    If the loop is not well formed for vectorization, an estimated
@@ -1739,11 +1745,19 @@ find_loop_location (class loop *loop)
   if (!loop)
     return dump_user_location_t ();
 
-  stmt = get_loop_exit_condition (loop);
+  if (loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS))
+    {
+      /* We only care about the loop location, so use any exit with location
+	 information.  */
+      for (edge e : get_loop_exit_edges (loop))
+	{
+	  stmt = get_loop_exit_condition (e);
 
-  if (stmt
-      && LOCATION_LOCUS (gimple_location (stmt)) > BUILTINS_LOCATION)
-    return stmt;
+	  if (stmt
+	      && LOCATION_LOCUS (gimple_location (stmt)) > BUILTINS_LOCATION)
+	    return stmt;
+	}
+    }
 
   /* If we got here the loop is probably not "well formed",
      try to estimate the loop location */
@@ -1962,7 +1976,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
   gphi_iterator gsi, gsi1;
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block update_bb = update_e->dest;
-  basic_block exit_bb = single_exit (loop)->dest;
+
+  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
 
   /* Make sure there exists a single-predecessor exit bb:  */
   gcc_assert (single_pred_p (exit_bb));
@@ -2529,10 +2544,9 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
 {
   /* We should be using a step_vector of VF if VF is variable.  */
   int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant ();
-  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree type = TREE_TYPE (niters_vector);
   tree log_vf = build_int_cst (type, exact_log2 (vf));
-  basic_block exit_bb = single_exit (loop)->dest;
+  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
 
   gcc_assert (niters_vector_mult_vf_ptr != NULL);
   tree niters_vector_mult_vf = fold_build2 (LSHIFT_EXPR, type,
@@ -2555,11 +2569,11 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
    NULL.  */
 
 static tree
-find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED,
-		gphi *lcssa_phi)
+find_guard_arg (class loop *loop ATTRIBUTE_UNUSED,
+		class loop *epilog ATTRIBUTE_UNUSED,
+		const_edge e, gphi *lcssa_phi)
 {
   gphi_iterator gsi;
-  edge e = single_exit (loop);
 
   gcc_assert (single_pred_p (e->dest));
   for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
@@ -2620,7 +2634,8 @@ find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED,
 
 static void
 slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
-				   class loop *first, class loop *second,
+				   class loop *first, edge first_loop_e,
+				   class loop *second, edge second_loop_e,
 				   bool create_lcssa_for_iv_phis)
 {
   gphi_iterator gsi_update, gsi_orig;
@@ -2628,7 +2643,7 @@ slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
 
   edge first_latch_e = EDGE_SUCC (first->latch, 0);
   edge second_preheader_e = loop_preheader_edge (second);
-  basic_block between_bb = single_exit (first)->dest;
+  basic_block between_bb = first_loop_e->dest;
 
   gcc_assert (between_bb == second_preheader_e->src);
   gcc_assert (single_pred_p (between_bb) && single_succ_p (between_bb));
@@ -2651,7 +2666,7 @@ slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
 	{
 	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
 	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
-	  add_phi_arg (lcssa_phi, arg, single_exit (first), UNKNOWN_LOCATION);
+	  add_phi_arg (lcssa_phi, arg, first_loop_e, UNKNOWN_LOCATION);
 	  arg = new_res;
 	}
 
@@ -2664,7 +2679,7 @@ slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
      for correct vectorization of live stmts.  */
   if (loop == first)
     {
-      basic_block orig_exit = single_exit (second)->dest;
+      basic_block orig_exit = second_loop_e->dest;
       for (gsi_orig = gsi_start_phis (orig_exit);
 	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
 	{
@@ -2673,13 +2688,14 @@ slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
 	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p  (orig_arg))
 	    continue;
 
+	  const_edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
 	  /* Already created in the above loop.   */
-	  if (find_guard_arg (first, second, orig_phi))
+	  if (find_guard_arg (first, second, exit_e, orig_phi))
 	    continue;
 
 	  tree new_res = copy_ssa_name (orig_arg);
 	  gphi *lcphi = create_phi_node (new_res, between_bb);
-	  add_phi_arg (lcphi, orig_arg, single_exit (first), UNKNOWN_LOCATION);
+	  add_phi_arg (lcphi, orig_arg, first_loop_e, UNKNOWN_LOCATION);
 	}
     }
 }
@@ -2847,7 +2863,8 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
       if (!merge_arg)
 	merge_arg = old_arg;
 
-      tree guard_arg = find_guard_arg (loop, epilog, update_phi);
+      tree guard_arg
+	= find_guard_arg (loop, epilog, single_exit (loop), update_phi);
       /* If the var is live after loop but not a reduction, we simply
 	 use the old arg.  */
       if (!guard_arg)
@@ -3201,27 +3218,37 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
     }
 
   if (vect_epilogues)
-    /* Make sure to set the epilogue's epilogue scalar loop, such that we can
-       use the original scalar loop as remaining epilogue if necessary.  */
-    LOOP_VINFO_SCALAR_LOOP (epilogue_vinfo)
-      = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
+    {
+      /* Make sure to set the epilogue's epilogue scalar loop, such that we can
+	 use the original scalar loop as remaining epilogue if necessary.  */
+      LOOP_VINFO_SCALAR_LOOP (epilogue_vinfo)
+	= LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
+      LOOP_VINFO_SCALAR_IV_EXIT (epilogue_vinfo)
+	= LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
+    }
 
   if (prolog_peeling)
     {
       e = loop_preheader_edge (loop);
-      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
+      edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, exit_e, e));
 
       /* Peel prolog and put it on preheader edge of loop.  */
-      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e);
+      edge scalar_e = LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
+      edge prolog_e = NULL;
+      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, exit_e,
+						       scalar_loop, scalar_e,
+						       e, &prolog_e);
       gcc_assert (prolog);
       prolog->force_vectorize = false;
-      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
+      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, prolog_e, loop,
+					 exit_e, true);
       first_loop = prolog;
       reset_original_copy_tables ();
 
       /* Update the number of iterations for prolog loop.  */
       tree step_prolog = build_one_cst (TREE_TYPE (niters_prolog));
-      vect_set_loop_condition (prolog, NULL, niters_prolog,
+      vect_set_loop_condition (prolog, prolog_e, loop_vinfo, niters_prolog,
 			       step_prolog, NULL_TREE, false);
 
       /* Skip the prolog loop.  */
@@ -3275,8 +3302,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 
   if (epilog_peeling)
     {
-      e = single_exit (loop);
-      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
+      e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e, e));
 
       /* Peel epilog and put it on exit edge of loop.  If we are vectorizing
 	 said epilog then we should use a copy of the main loop as a starting
@@ -3285,12 +3312,18 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	 If we are not vectorizing the epilog then we should use the scalar loop
 	 as the transformations mentioned above make less or no sense when not
 	 vectorizing.  */
+      edge scalar_e = LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
       epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
-      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e);
+      edge epilog_e = vect_epilogues ? e : scalar_e;
+      edge new_epilog_e = NULL;
+      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, e, epilog,
+						       epilog_e, e,
+						       &new_epilog_e);
+      LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo) = new_epilog_e;
       gcc_assert (epilog);
-
       epilog->force_vectorize = false;
-      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
+      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, e, epilog,
+					 new_epilog_e, false);
       bb_before_epilog = loop_preheader_edge (epilog)->src;
 
       /* Scalar version loop may be preferred.  In this case, add guard
@@ -3374,16 +3407,16 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	{
 	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
 				    niters, niters_vector_mult_vf);
-	  guard_bb = single_exit (loop)->dest;
-	  guard_to = split_edge (single_exit (epilog));
+	  guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+	  edge epilog_e = LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo);
+	  guard_to = split_edge (epilog_e);
 	  guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to,
 					   skip_vector ? anchor : guard_bb,
 					   prob_epilog.invert (),
 					   irred_flag);
 	  if (vect_epilogues)
 	    epilogue_vinfo->skip_this_loop_edge = guard_e;
-	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
-					      single_exit (epilog));
+	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e, epilog_e);
 	  /* Only need to handle basic block before epilog loop if it's not
 	     the guard_bb, which is the case when skip_vector is true.  */
 	  if (guard_bb != bb_before_epilog)
@@ -3416,6 +3449,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
     {
       epilog->aux = epilogue_vinfo;
       LOOP_VINFO_LOOP (epilogue_vinfo) = epilog;
+      LOOP_VINFO_IV_EXIT (epilogue_vinfo)
+	= LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo);
 
       loop_constraint_clear (epilog, LOOP_C_INFINITE);
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 23c6e8259e7b133cd7acc6bcf0bad26423e9993a..6e60d84143626a8e1d801bb580f4dcebc73c7ba7 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -855,10 +855,9 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info loop_vinfo)
 
 
 static gcond *
-vect_get_loop_niters (class loop *loop, tree *assumptions,
+vect_get_loop_niters (class loop *loop, edge exit, tree *assumptions,
 		      tree *number_of_iterations, tree *number_of_iterationsm1)
 {
-  edge exit = single_exit (loop);
   class tree_niter_desc niter_desc;
   tree niter_assumptions, niter, may_be_zero;
   gcond *cond = get_loop_exit_condition (loop);
@@ -927,6 +926,20 @@ vect_get_loop_niters (class loop *loop, tree *assumptions,
   return cond;
 }
 
+/*  Determine the main loop exit for the vectorizer.  */
+
+edge
+vec_init_loop_exit_info (class loop *loop)
+{
+  /* Before we begin we must first determine which exit is the main one and
+     which are auxiliary exits.  */
+  auto_vec<edge> exits = get_loop_exit_edges (loop);
+  if (exits.length () == 1)
+    return exits[0];
+  else
+    return NULL;
+}
+
 /* Function bb_in_loop_p
 
    Used as predicate for dfs order traversal of the loop bbs.  */
@@ -987,7 +1000,10 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     has_mask_store (false),
     scalar_loop_scaling (profile_probability::uninitialized ()),
     scalar_loop (NULL),
-    orig_loop_info (NULL)
+    orig_loop_info (NULL),
+    vec_loop_iv (NULL),
+    vec_epilogue_loop_iv (NULL),
+    scalar_loop_iv (NULL)
 {
   /* CHECKME: We want to visit all BBs before their successors (except for
      latch blocks, for which this assertion wouldn't hold).  In the simple
@@ -1646,6 +1662,18 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 {
   DUMP_VECT_SCOPE ("vect_analyze_loop_form");
 
+  edge exit_e = vec_init_loop_exit_info (loop);
+  if (!exit_e)
+    return opt_result::failure_at (vect_location,
+				   "not vectorized:"
+				   " could not determine main exit from"
+				   " loop with multiple exits.\n");
+  info->loop_exit = exit_e;
+  if (dump_enabled_p ())
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       "using as main loop exit: %d -> %d [AUX: %p]\n",
+		       exit_e->src->index, exit_e->dest->index, exit_e->aux);
+
   /* Different restrictions apply when we are considering an inner-most loop,
      vs. an outer (nested) loop.
      (FORNOW. May want to relax some of these restrictions in the future).  */
@@ -1767,7 +1795,7 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 				   " abnormal loop exit edge.\n");
 
   info->loop_cond
-    = vect_get_loop_niters (loop, &info->assumptions,
+    = vect_get_loop_niters (loop, e, &info->assumptions,
 			    &info->number_of_iterations,
 			    &info->number_of_iterationsm1);
   if (!info->loop_cond)
@@ -1821,6 +1849,9 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
 
   stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (info->loop_cond);
   STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
+
+  LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
+
   if (info->inner_loop_cond)
     {
       stmt_vec_info inner_loop_cond_info
@@ -3063,9 +3094,9 @@ start_over:
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
       if (!vect_can_advance_ivs_p (loop_vinfo)
-	  || !slpeel_can_duplicate_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
-					   single_exit (LOOP_VINFO_LOOP
-							 (loop_vinfo))))
+	  || !slpeel_can_duplicate_loop_p (loop,
+					   LOOP_VINFO_IV_EXIT (loop_vinfo),
+					   LOOP_VINFO_IV_EXIT (loop_vinfo)))
         {
 	  ok = opt_result::failure_at (vect_location,
 				       "not vectorized: can't create required "
@@ -6002,7 +6033,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
          Store them in NEW_PHIS.  */
   if (double_reduc)
     loop = outer_loop;
-  exit_bb = single_exit (loop)->dest;
+  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
   exit_gsi = gsi_after_labels (exit_bb);
   reduc_inputs.create (slp_node ? vec_num : ncopies);
   for (unsigned i = 0; i < vec_num; i++)
@@ -6018,7 +6049,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 	  phi = create_phi_node (new_def, exit_bb);
 	  if (j)
 	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
-	  SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def);
+	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
 	  new_def = gimple_convert (&stmts, vectype, new_def);
 	  reduc_inputs.quick_push (new_def);
 	}
@@ -10416,12 +10447,12 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 	   lhs' = new_tree;  */
 
       class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-      basic_block exit_bb = single_exit (loop)->dest;
+      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
       gcc_assert (single_pred_p (exit_bb));
 
       tree vec_lhs_phi = copy_ssa_name (vec_lhs);
       gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
-      SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, vec_lhs);
+      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
 
       gimple_seq stmts = NULL;
       tree new_tree;
@@ -10965,7 +10996,7 @@ vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
    profile.  */
 
 static void
-scale_profile_for_vect_loop (class loop *loop, unsigned vf, bool flat)
+scale_profile_for_vect_loop (class loop *loop, edge exit_e, unsigned vf, bool flat)
 {
   /* For flat profiles do not scale down proportionally by VF and only
      cap by known iteration count bounds.  */
@@ -10980,7 +11011,6 @@ scale_profile_for_vect_loop (class loop *loop, unsigned vf, bool flat)
       return;
     }
   /* Loop body executes VF fewer times and exit increases VF times.  */
-  edge exit_e = single_exit (loop);
   profile_count entry_count = loop_preheader_edge (loop)->count ();
 
   /* If we have unreliable loop profile avoid dropping entry
@@ -11350,7 +11380,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 
   /* Make sure there exists a single-predecessor exit bb.  Do this before 
      versioning.   */
-  edge e = single_exit (loop);
+  edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
   if (! single_pred_p (e->dest))
     {
       split_loop_exit_edge (e, true);
@@ -11376,7 +11406,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
      loop closed PHI nodes on the exit.  */
   if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
     {
-      e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
+      e = LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
       if (! single_pred_p (e->dest))
 	{
 	  split_loop_exit_edge (e, true);
@@ -11625,8 +11655,9 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
      a zero NITERS becomes a nonzero NITERS_VECTOR.  */
   if (integer_onep (step_vector))
     niters_no_overflow = true;
-  vect_set_loop_condition (loop, loop_vinfo, niters_vector, step_vector,
-			   niters_vector_mult_vf, !niters_no_overflow);
+  vect_set_loop_condition (loop, LOOP_VINFO_IV_EXIT (loop_vinfo), loop_vinfo,
+			   niters_vector, step_vector, niters_vector_mult_vf,
+			   !niters_no_overflow);
 
   unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo);
 
@@ -11699,7 +11730,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 			  assumed_vf) - 1
 	 : wi::udiv_floor (loop->nb_iterations_estimate + bias_for_assumed,
 			   assumed_vf) - 1);
-  scale_profile_for_vect_loop (loop, assumed_vf, flat);
+  scale_profile_for_vect_loop (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
+			       assumed_vf, flat);
 
   if (dump_enabled_p ())
     {
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index f1d0cd79961abb095bc79d3b59a81930f0337e59..afa7a8e30891c782a0e5e3740ecc4377f5a31e54 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -919,10 +919,24 @@ public:
      analysis.  */
   vec<_loop_vec_info *> epilogue_vinfos;
 
+  /* The controlling loop IV for the current loop when vectorizing.  This IV
+     controls the natural exits of the loop.  */
+  edge vec_loop_iv;
+
+  /* The controlling loop IV for the epilogue loop when vectorizing.  This IV
+     controls the natural exits of the loop.  */
+  edge vec_epilogue_loop_iv;
+
+  /* The controlling loop IV for the scalar loop being vectorized.  This IV
+     controls the natural exits of the loop.  */
+  edge scalar_loop_iv;
 } *loop_vec_info;
 
 /* Access Functions.  */
 #define LOOP_VINFO_LOOP(L)                 (L)->loop
+#define LOOP_VINFO_IV_EXIT(L)              (L)->vec_loop_iv
+#define LOOP_VINFO_EPILOGUE_IV_EXIT(L)     (L)->vec_epilogue_loop_iv
+#define LOOP_VINFO_SCALAR_IV_EXIT(L)       (L)->scalar_loop_iv
 #define LOOP_VINFO_BBS(L)                  (L)->bbs
 #define LOOP_VINFO_NITERSM1(L)             (L)->num_itersm1
 #define LOOP_VINFO_NITERS(L)               (L)->num_iters
@@ -2155,11 +2169,13 @@ class auto_purge_vect_location
 
 /* Simple loop peeling and versioning utilities for vectorizer's purposes -
    in tree-vect-loop-manip.cc.  */
-extern void vect_set_loop_condition (class loop *, loop_vec_info,
+extern void vect_set_loop_condition (class loop *, edge, loop_vec_info,
 				     tree, tree, tree, bool);
-extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge);
-class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
-						     class loop *, edge);
+extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
+					 const_edge);
+class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
+						    class loop *, edge,
+						    edge, edge *);
 class loop *vect_loop_versioning (loop_vec_info, gimple *);
 extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
 				    tree *, tree *, tree *, int, bool, bool,
@@ -2169,6 +2185,7 @@ extern void vect_prepare_for_masked_peels (loop_vec_info);
 extern dump_user_location_t find_loop_location (class loop *);
 extern bool vect_can_advance_ivs_p (loop_vec_info);
 extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
+extern edge vec_init_loop_exit_info (class loop *);
 
 /* In tree-vect-stmts.cc.  */
 extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
@@ -2358,6 +2375,7 @@ struct vect_loop_form_info
   tree assumptions;
   gcond *loop_cond;
   gcond *inner_loop_cond;
+  edge loop_exit;
 };
 extern opt_result vect_analyze_loop_form (class loop *, vect_loop_form_info *);
 extern loop_vec_info vect_create_loop_vinfo (class loop *, vec_info_shared *,
diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
index a048e9d89178a37455bd7b83ab0f2a238a4ce69e..d97e2b54c25ac60378935392aa7b73476efed74b 100644
--- a/gcc/tree-vectorizer.cc
+++ b/gcc/tree-vectorizer.cc
@@ -943,6 +943,8 @@ set_uid_loop_bbs (loop_vec_info loop_vinfo, gimple *loop_vectorized_call,
   class loop *scalar_loop = get_loop (fun, tree_to_shwi (arg));
 
   LOOP_VINFO_SCALAR_LOOP (loop_vinfo) = scalar_loop;
+  LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo)
+    = vec_init_loop_exit_info (scalar_loop);
   gcc_checking_assert (vect_loop_vectorized_call (scalar_loop)
 		       == loop_vectorized_call);
   /* If we are going to vectorize outer loop, prevent vectorization
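
For reviewers skimming the series, the shape of the refactoring above can be
sketched outside of GCC.  This is only an illustration under stated
assumptions: Edge, Loop, LoopInfo, init_loop_exit_info and exit_dest_block
are hypothetical stand-ins invented for this note, not the GCC types or
functions.  The sketch shows the two ideas in the patch: the exit is chosen
once (the only exit, otherwise nothing) and stored per loop copy, and callees
take the exit as an argument instead of recomputing it from the loop.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy types, not the GCC ones: an edge is a pair of block ids.
struct Edge { int src; int dest; };

// A toy loop that only records its exit edges.
struct Loop { std::vector<Edge*> exits; };

// Same shape as vec_init_loop_exit_info in this patch: return the only exit,
// or nullptr when the loop does not have exactly one exit.
Edge* init_loop_exit_info(const Loop& loop) {
  return loop.exits.size() == 1 ? loop.exits[0] : nullptr;
}

// Toy version of the bookkeeping added to loop_vec_info: one exit per copy
// of the loop (vectorized, epilogue, scalar).
struct LoopInfo {
  Edge* vec_loop_iv = nullptr;
  Edge* vec_epilogue_loop_iv = nullptr;
  Edge* scalar_loop_iv = nullptr;
};

// The callee receives the exit explicitly, so it does not have to guess
// which copy of the loop the caller meant.
int exit_dest_block(const Edge* exit_e) {
  assert(exit_e != nullptr);
  return exit_e->dest;
}
```

A caller would fill LoopInfo once after analysis and then pass the stored
edge down, which is what the LOOP_VINFO_*_IV_EXIT accessors are used for in
the hunks above.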




-- 

[-- Attachment #2: rb17789.patch --]
[-- Type: text/plain, Size: 34459 bytes --]

diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index a28470b66ea935741a61fb73961ed7c927543a3d..902edc49ab588152a5b845f2c8a42a7e2a1d6080 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -949,7 +949,8 @@ copy_loop_before (class loop *loop, bool redirect_lc_phi_defs)
   edge preheader = loop_preheader_edge (loop);
 
   initialize_original_copy_tables ();
-  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader);
+  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, single_exit (loop), NULL,
+						NULL, preheader, NULL);
   gcc_assert (res != NULL);
 
   /* When a not last partition is supposed to keep the LC PHIs computed
@@ -3043,6 +3044,24 @@ loop_distribution::distribute_loop (class loop *loop,
       return 0;
     }
 
+  /* Loop distribution only does prologue peeling but we still need to
+     initialize loop exit information.  However we only support single exits at
+     the moment.  As such, should exit information not have been provided and we
+     have more than one exit, bail out.  */
+  if (!single_exit (loop))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "Loop %d not distributed: too many exits.\n",
+		 loop->num);
+
+      free_rdg (rdg);
+      loop_nest.release ();
+      free_data_refs (datarefs_vec);
+      delete ddrs_table;
+      return 0;
+    }
+
   data_reference_p dref;
   for (i = 0; datarefs_vec.iterate (i, &dref); ++i)
     dref->aux = (void *) (uintptr_t) i;
diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
index c58a8a16e81573aada38e912b7c58b3e1b23b66d..f35ca1bded0b841179e4958645d264ad23684019 100644
--- a/gcc/tree-scalar-evolution.h
+++ b/gcc/tree-scalar-evolution.h
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 
 extern tree number_of_latch_executions (class loop *);
 extern gcond *get_loop_exit_condition (const class loop *);
+extern gcond *get_loop_exit_condition (const_edge);
 
 extern void scev_initialize (void);
 extern bool scev_initialized_p (void);
diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index 3fb6951e6085352c027d32c3548246042b98b64b..7cafe5ce576079921e380aaab5c5c4aa84cea372 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -1292,9 +1292,17 @@ scev_dfs::follow_ssa_edge_expr (gimple *at_stmt, tree expr,
 
 gcond *
 get_loop_exit_condition (const class loop *loop)
+{
+  return get_loop_exit_condition (single_exit (loop));
+}
+
+/* If the statement just before the EXIT_EDGE contains a condition then
+   return the condition, otherwise NULL. */
+
+gcond *
+get_loop_exit_condition (const_edge exit_edge)
 {
   gcond *res = NULL;
-  edge exit_edge = single_exit (loop);
 
   if (dump_file && (dump_flags & TDF_SCEV))
     fprintf (dump_file, "(get_loop_exit_condition \n  ");
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 40ab568fe355964b878d770010aa9eeaef63eeac..9607a9fb25da26591ffd8071a02495f2042e0579 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -2078,7 +2078,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 
   /* Check if we can possibly peel the loop.  */
   if (!vect_can_advance_ivs_p (loop_vinfo)
-      || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
+      || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
+				       LOOP_VINFO_IV_EXIT (loop_vinfo))
       || loop->inner)
     do_peeling = false;
 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 09641901ff1e5c03dd07ab6f85dd67288f940ea2..e06717272aafc6d31cbdcb94840ac25de616da6d 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -803,7 +803,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
    final gcond.  */
 
 static gcond *
-vect_set_loop_condition_partial_vectors (class loop *loop,
+vect_set_loop_condition_partial_vectors (class loop *loop, edge exit_edge,
 					 loop_vec_info loop_vinfo, tree niters,
 					 tree final_iv, bool niters_maybe_zero,
 					 gimple_stmt_iterator loop_cond_gsi)
@@ -904,7 +904,6 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   add_header_seq (loop, header_seq);
 
   /* Get a boolean result that tells us whether to iterate.  */
-  edge exit_edge = single_exit (loop);
   gcond *cond_stmt;
   if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
       && !LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
@@ -935,7 +934,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   if (final_iv)
     {
       gassign *assign = gimple_build_assign (final_iv, orig_niters);
-      gsi_insert_on_edge_immediate (single_exit (loop), assign);
+      gsi_insert_on_edge_immediate (exit_edge, assign);
     }
 
   return cond_stmt;
@@ -953,6 +952,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
 
 static gcond *
 vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
+					 edge exit_edge,
 					 loop_vec_info loop_vinfo, tree niters,
 					 tree final_iv,
 					 bool niters_maybe_zero,
@@ -1144,7 +1144,6 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
   add_preheader_seq (loop, preheader_seq);
 
   /* Adjust the exit test using the decrementing IV.  */
-  edge exit_edge = single_exit (loop);
   tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : GT_EXPR;
   /* When we peel for alignment with niter_skip != 0 this can
      cause niter + niter_skip to wrap and since we are comparing the
@@ -1183,7 +1182,8 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
    loop handles exactly VF scalars per iteration.  */
 
 static gcond *
-vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
+vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
+				class loop *loop, tree niters, tree step,
 				tree final_iv, bool niters_maybe_zero,
 				gimple_stmt_iterator loop_cond_gsi)
 {
@@ -1191,13 +1191,12 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
   gcond *cond_stmt;
   gcond *orig_cond;
   edge pe = loop_preheader_edge (loop);
-  edge exit_edge = single_exit (loop);
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
   enum tree_code code;
   tree niters_type = TREE_TYPE (niters);
 
-  orig_cond = get_loop_exit_condition (loop);
+  orig_cond = get_loop_exit_condition (exit_edge);
   gcc_assert (orig_cond);
   loop_cond_gsi = gsi_for_stmt (orig_cond);
 
@@ -1305,19 +1304,18 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
   if (final_iv)
     {
       gassign *assign;
-      edge exit = single_exit (loop);
-      gcc_assert (single_pred_p (exit->dest));
+      gcc_assert (single_pred_p (exit_edge->dest));
       tree phi_dest
 	= integer_zerop (init) ? final_iv : copy_ssa_name (indx_after_incr);
       /* Make sure to maintain LC SSA form here and elide the subtraction
 	 if the value is zero.  */
-      gphi *phi = create_phi_node (phi_dest, exit->dest);
-      add_phi_arg (phi, indx_after_incr, exit, UNKNOWN_LOCATION);
+      gphi *phi = create_phi_node (phi_dest, exit_edge->dest);
+      add_phi_arg (phi, indx_after_incr, exit_edge, UNKNOWN_LOCATION);
       if (!integer_zerop (init))
 	{
 	  assign = gimple_build_assign (final_iv, MINUS_EXPR,
 					phi_dest, init);
-	  gimple_stmt_iterator gsi = gsi_after_labels (exit->dest);
+	  gimple_stmt_iterator gsi = gsi_after_labels (exit_edge->dest);
 	  gsi_insert_before (&gsi, assign, GSI_SAME_STMT);
 	}
     }
@@ -1348,29 +1346,33 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
    Assumption: the exit-condition of LOOP is the last stmt in the loop.  */
 
 void
-vect_set_loop_condition (class loop *loop, loop_vec_info loop_vinfo,
+vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo,
 			 tree niters, tree step, tree final_iv,
 			 bool niters_maybe_zero)
 {
   gcond *cond_stmt;
-  gcond *orig_cond = get_loop_exit_condition (loop);
+  gcond *orig_cond = get_loop_exit_condition (loop_e);
   gimple_stmt_iterator loop_cond_gsi = gsi_for_stmt (orig_cond);
 
   if (loop_vinfo && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
     {
       if (LOOP_VINFO_PARTIAL_VECTORS_STYLE (loop_vinfo) == vect_partial_vectors_avx512)
-	cond_stmt = vect_set_loop_condition_partial_vectors_avx512 (loop, loop_vinfo,
+	cond_stmt = vect_set_loop_condition_partial_vectors_avx512 (loop, loop_e,
+								    loop_vinfo,
 								    niters, final_iv,
 								    niters_maybe_zero,
 								    loop_cond_gsi);
       else
-	cond_stmt = vect_set_loop_condition_partial_vectors (loop, loop_vinfo,
+	cond_stmt = vect_set_loop_condition_partial_vectors (loop, loop_e,
+							     loop_vinfo,
 							     niters, final_iv,
 							     niters_maybe_zero,
 							     loop_cond_gsi);
     }
   else
-    cond_stmt = vect_set_loop_condition_normal (loop, niters, step, final_iv,
+    cond_stmt = vect_set_loop_condition_normal (loop_vinfo, loop_e, loop,
+						niters,
+						step, final_iv,
 						niters_maybe_zero,
 						loop_cond_gsi);
 
@@ -1439,7 +1441,6 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to)
 		     get_current_def (PHI_ARG_DEF_FROM_EDGE (from_phi, from)));
 }
 
-
 /* Given LOOP this function generates a new copy of it and puts it
    on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
    non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
@@ -1447,8 +1448,9 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to)
    entry or exit of LOOP.  */
 
 class loop *
-slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
-					class loop *scalar_loop, edge e)
+slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
+					class loop *scalar_loop,
+					edge scalar_exit, edge e, edge *new_e)
 {
   class loop *new_loop;
   basic_block *new_bbs, *bbs, *pbbs;
@@ -1458,13 +1460,16 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
   edge exit, new_exit;
   bool duplicate_outer_loop = false;
 
-  exit = single_exit (loop);
+  exit = loop_exit;
   at_exit = (e == exit);
   if (!at_exit && e != loop_preheader_edge (loop))
     return NULL;
 
   if (scalar_loop == NULL)
-    scalar_loop = loop;
+    {
+      scalar_loop = loop;
+      scalar_exit = loop_exit;
+    }
 
   bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
   pbbs = bbs + 1;
@@ -1490,13 +1495,15 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
   bbs[0] = preheader;
   new_bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
 
-  exit = single_exit (scalar_loop);
   copy_bbs (bbs, scalar_loop->num_nodes + 1, new_bbs,
-	    &exit, 1, &new_exit, NULL,
+	    &scalar_exit, 1, &new_exit, NULL,
 	    at_exit ? loop->latch : e->src, true);
-  exit = single_exit (loop);
+  exit = loop_exit;
   basic_block new_preheader = new_bbs[0];
 
+  if (new_e)
+    *new_e = new_exit;
+
   /* Before installing PHI arguments make sure that the edges
      into them match that of the scalar loop we analyzed.  This
      makes sure the SLP tree matches up between the main vectorized
@@ -1537,8 +1544,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
 	 but LOOP will not.  slpeel_update_phi_nodes_for_guard{1,2} expects
 	 the LOOP SSA_NAMEs (on the exit edge and edge from latch to
 	 header) to have current_def set, so copy them over.  */
-      slpeel_duplicate_current_defs_from_edges (single_exit (scalar_loop),
-						exit);
+      slpeel_duplicate_current_defs_from_edges (scalar_exit, exit);
       slpeel_duplicate_current_defs_from_edges (EDGE_SUCC (scalar_loop->latch,
 							   0),
 						EDGE_SUCC (loop->latch, 0));
@@ -1696,11 +1702,11 @@ slpeel_add_loop_guard (basic_block guard_bb, tree cond,
  */
 
 bool
-slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
+slpeel_can_duplicate_loop_p (const class loop *loop, const_edge exit_e,
+			     const_edge e)
 {
-  edge exit_e = single_exit (loop);
   edge entry_e = loop_preheader_edge (loop);
-  gcond *orig_cond = get_loop_exit_condition (loop);
+  gcond *orig_cond = get_loop_exit_condition (exit_e);
   gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
   unsigned int num_bb = loop->inner? 5 : 2;
 
@@ -1709,7 +1715,7 @@ slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
   if (!loop_outer (loop)
       || loop->num_nodes != num_bb
       || !empty_block_p (loop->latch)
-      || !single_exit (loop)
+      || !exit_e
       /* Verify that new loop exit condition can be trivially modified.  */
       || (!orig_cond || orig_cond != gsi_stmt (loop_exit_gsi))
       || (e != exit_e && e != entry_e))
@@ -1722,7 +1728,7 @@ slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
   return ret;
 }
 
-/* Function vect_get_loop_location.
+/* Function find_loop_location.
 
    Extract the location of the loop in the source code.
    If the loop is not well formed for vectorization, an estimated
@@ -1739,11 +1745,19 @@ find_loop_location (class loop *loop)
   if (!loop)
     return dump_user_location_t ();
 
-  stmt = get_loop_exit_condition (loop);
+  if (loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS))
+    {
+      /* We only care about the loop location, so use any exit with location
+	 information.  */
+      for (edge e : get_loop_exit_edges (loop))
+	{
+	  stmt = get_loop_exit_condition (e);
 
-  if (stmt
-      && LOCATION_LOCUS (gimple_location (stmt)) > BUILTINS_LOCATION)
-    return stmt;
+	  if (stmt
+	      && LOCATION_LOCUS (gimple_location (stmt)) > BUILTINS_LOCATION)
+	    return stmt;
+	}
+    }
 
   /* If we got here the loop is probably not "well formed",
      try to estimate the loop location */
@@ -1962,7 +1976,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
   gphi_iterator gsi, gsi1;
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block update_bb = update_e->dest;
-  basic_block exit_bb = single_exit (loop)->dest;
+
+  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
 
   /* Make sure there exists a single-predecessor exit bb:  */
   gcc_assert (single_pred_p (exit_bb));
@@ -2529,10 +2544,9 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
 {
   /* We should be using a step_vector of VF if VF is variable.  */
   int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant ();
-  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree type = TREE_TYPE (niters_vector);
   tree log_vf = build_int_cst (type, exact_log2 (vf));
-  basic_block exit_bb = single_exit (loop)->dest;
+  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
 
   gcc_assert (niters_vector_mult_vf_ptr != NULL);
   tree niters_vector_mult_vf = fold_build2 (LSHIFT_EXPR, type,
@@ -2555,11 +2569,11 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
    NULL.  */
 
 static tree
-find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED,
-		gphi *lcssa_phi)
+find_guard_arg (class loop *loop ATTRIBUTE_UNUSED,
+		class loop *epilog ATTRIBUTE_UNUSED,
+		const_edge e, gphi *lcssa_phi)
 {
   gphi_iterator gsi;
-  edge e = single_exit (loop);
 
   gcc_assert (single_pred_p (e->dest));
   for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
@@ -2620,7 +2634,8 @@ find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED,
 
 static void
 slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
-				   class loop *first, class loop *second,
+				   class loop *first, edge first_loop_e,
+				   class loop *second, edge second_loop_e,
 				   bool create_lcssa_for_iv_phis)
 {
   gphi_iterator gsi_update, gsi_orig;
@@ -2628,7 +2643,7 @@ slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
 
   edge first_latch_e = EDGE_SUCC (first->latch, 0);
   edge second_preheader_e = loop_preheader_edge (second);
-  basic_block between_bb = single_exit (first)->dest;
+  basic_block between_bb = first_loop_e->dest;
 
   gcc_assert (between_bb == second_preheader_e->src);
   gcc_assert (single_pred_p (between_bb) && single_succ_p (between_bb));
@@ -2651,7 +2666,7 @@ slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
 	{
 	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
 	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
-	  add_phi_arg (lcssa_phi, arg, single_exit (first), UNKNOWN_LOCATION);
+	  add_phi_arg (lcssa_phi, arg, first_loop_e, UNKNOWN_LOCATION);
 	  arg = new_res;
 	}
 
@@ -2664,7 +2679,7 @@ slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
      for correct vectorization of live stmts.  */
   if (loop == first)
     {
-      basic_block orig_exit = single_exit (second)->dest;
+      basic_block orig_exit = second_loop_e->dest;
       for (gsi_orig = gsi_start_phis (orig_exit);
 	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
 	{
@@ -2673,13 +2688,14 @@ slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
 	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p  (orig_arg))
 	    continue;
 
+	  const_edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
 	  /* Already created in the above loop.   */
-	  if (find_guard_arg (first, second, orig_phi))
+	  if (find_guard_arg (first, second, exit_e, orig_phi))
 	    continue;
 
 	  tree new_res = copy_ssa_name (orig_arg);
 	  gphi *lcphi = create_phi_node (new_res, between_bb);
-	  add_phi_arg (lcphi, orig_arg, single_exit (first), UNKNOWN_LOCATION);
+	  add_phi_arg (lcphi, orig_arg, first_loop_e, UNKNOWN_LOCATION);
 	}
     }
 }
@@ -2847,7 +2863,8 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
       if (!merge_arg)
 	merge_arg = old_arg;
 
-      tree guard_arg = find_guard_arg (loop, epilog, update_phi);
+      tree guard_arg
+	= find_guard_arg (loop, epilog, single_exit (loop), update_phi);
       /* If the var is live after loop but not a reduction, we simply
 	 use the old arg.  */
       if (!guard_arg)
@@ -3201,27 +3218,37 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
     }
 
   if (vect_epilogues)
-    /* Make sure to set the epilogue's epilogue scalar loop, such that we can
-       use the original scalar loop as remaining epilogue if necessary.  */
-    LOOP_VINFO_SCALAR_LOOP (epilogue_vinfo)
-      = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
+    {
+      /* Make sure to set the epilogue's epilogue scalar loop, such that we can
+	 use the original scalar loop as remaining epilogue if necessary.  */
+      LOOP_VINFO_SCALAR_LOOP (epilogue_vinfo)
+	= LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
+      LOOP_VINFO_SCALAR_IV_EXIT (epilogue_vinfo)
+	= LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
+    }
 
   if (prolog_peeling)
     {
       e = loop_preheader_edge (loop);
-      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
+      edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, exit_e, e));
 
       /* Peel prolog and put it on preheader edge of loop.  */
-      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e);
+      edge scalar_e = LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
+      edge prolog_e = NULL;
+      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, exit_e,
+						       scalar_loop, scalar_e,
+						       e, &prolog_e);
       gcc_assert (prolog);
       prolog->force_vectorize = false;
-      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
+      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, prolog_e, loop,
+					 exit_e, true);
       first_loop = prolog;
       reset_original_copy_tables ();
 
       /* Update the number of iterations for prolog loop.  */
       tree step_prolog = build_one_cst (TREE_TYPE (niters_prolog));
-      vect_set_loop_condition (prolog, NULL, niters_prolog,
+      vect_set_loop_condition (prolog, prolog_e, NULL, niters_prolog,
 			       step_prolog, NULL_TREE, false);
 
       /* Skip the prolog loop.  */
@@ -3275,8 +3302,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 
   if (epilog_peeling)
     {
-      e = single_exit (loop);
-      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
+      e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e, e));
 
       /* Peel epilog and put it on exit edge of loop.  If we are vectorizing
 	 said epilog then we should use a copy of the main loop as a starting
@@ -3285,12 +3312,18 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	 If we are not vectorizing the epilog then we should use the scalar loop
 	 as the transformations mentioned above make less or no sense when not
 	 vectorizing.  */
+      edge scalar_e = LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
       epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
-      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e);
+      edge epilog_e = vect_epilogues ? e : scalar_e;
+      edge new_epilog_e = NULL;
+      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, e, epilog,
+						       epilog_e, e,
+						       &new_epilog_e);
+      LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo) = new_epilog_e;
       gcc_assert (epilog);
-
       epilog->force_vectorize = false;
-      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
+      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, e, epilog,
+					 new_epilog_e, false);
       bb_before_epilog = loop_preheader_edge (epilog)->src;
 
       /* Scalar version loop may be preferred.  In this case, add guard
@@ -3374,16 +3407,16 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	{
 	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
 				    niters, niters_vector_mult_vf);
-	  guard_bb = single_exit (loop)->dest;
-	  guard_to = split_edge (single_exit (epilog));
+	  guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+	  edge epilog_e = LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo);
+	  guard_to = split_edge (epilog_e);
 	  guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to,
 					   skip_vector ? anchor : guard_bb,
 					   prob_epilog.invert (),
 					   irred_flag);
 	  if (vect_epilogues)
 	    epilogue_vinfo->skip_this_loop_edge = guard_e;
-	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
-					      single_exit (epilog));
+	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e, epilog_e);
 	  /* Only need to handle basic block before epilog loop if it's not
 	     the guard_bb, which is the case when skip_vector is true.  */
 	  if (guard_bb != bb_before_epilog)
@@ -3416,6 +3449,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
     {
       epilog->aux = epilogue_vinfo;
       LOOP_VINFO_LOOP (epilogue_vinfo) = epilog;
+      LOOP_VINFO_IV_EXIT (epilogue_vinfo)
+	= LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo);
 
       loop_constraint_clear (epilog, LOOP_C_INFINITE);
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 23c6e8259e7b133cd7acc6bcf0bad26423e9993a..6e60d84143626a8e1d801bb580f4dcebc73c7ba7 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -855,10 +855,9 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info loop_vinfo)
 
 
 static gcond *
-vect_get_loop_niters (class loop *loop, tree *assumptions,
+vect_get_loop_niters (class loop *loop, edge exit, tree *assumptions,
 		      tree *number_of_iterations, tree *number_of_iterationsm1)
 {
-  edge exit = single_exit (loop);
   class tree_niter_desc niter_desc;
   tree niter_assumptions, niter, may_be_zero;
   gcond *cond = get_loop_exit_condition (loop);
@@ -927,6 +926,20 @@ vect_get_loop_niters (class loop *loop, tree *assumptions,
   return cond;
 }
 
+/* Determine the main loop exit for the vectorizer.  */
+
+edge
+vec_init_loop_exit_info (class loop *loop)
+{
+  /* Before we begin we must first determine which exit is the main one and
+     which are auxiliary exits.  */
+  auto_vec<edge> exits = get_loop_exit_edges (loop);
+  if (exits.length () == 1)
+    return exits[0];
+  else
+    return NULL;
+}
+
 /* Function bb_in_loop_p
 
    Used as predicate for dfs order traversal of the loop bbs.  */
@@ -987,7 +1000,10 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     has_mask_store (false),
     scalar_loop_scaling (profile_probability::uninitialized ()),
     scalar_loop (NULL),
-    orig_loop_info (NULL)
+    orig_loop_info (NULL),
+    vec_loop_iv (NULL),
+    vec_epilogue_loop_iv (NULL),
+    scalar_loop_iv (NULL)
 {
   /* CHECKME: We want to visit all BBs before their successors (except for
      latch blocks, for which this assertion wouldn't hold).  In the simple
@@ -1646,6 +1662,18 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 {
   DUMP_VECT_SCOPE ("vect_analyze_loop_form");
 
+  edge exit_e = vec_init_loop_exit_info (loop);
+  if (!exit_e)
+    return opt_result::failure_at (vect_location,
+				   "not vectorized:"
+				   " could not determine main exit from"
+				   " loop with multiple exits.\n");
+  info->loop_exit = exit_e;
+  if (dump_enabled_p ())
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       "using as main loop exit: %d -> %d [AUX: %p]\n",
+		       exit_e->src->index, exit_e->dest->index, exit_e->aux);
+
   /* Different restrictions apply when we are considering an inner-most loop,
      vs. an outer (nested) loop.
      (FORNOW. May want to relax some of these restrictions in the future).  */
@@ -1767,7 +1795,7 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 				   " abnormal loop exit edge.\n");
 
   info->loop_cond
-    = vect_get_loop_niters (loop, &info->assumptions,
+    = vect_get_loop_niters (loop, e, &info->assumptions,
 			    &info->number_of_iterations,
 			    &info->number_of_iterationsm1);
   if (!info->loop_cond)
@@ -1821,6 +1849,9 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
 
   stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (info->loop_cond);
   STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
+
+  LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
+
   if (info->inner_loop_cond)
     {
       stmt_vec_info inner_loop_cond_info
@@ -3063,9 +3094,9 @@ start_over:
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
       if (!vect_can_advance_ivs_p (loop_vinfo)
-	  || !slpeel_can_duplicate_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
-					   single_exit (LOOP_VINFO_LOOP
-							 (loop_vinfo))))
+	  || !slpeel_can_duplicate_loop_p (loop,
+					   LOOP_VINFO_IV_EXIT (loop_vinfo),
+					   LOOP_VINFO_IV_EXIT (loop_vinfo)))
         {
 	  ok = opt_result::failure_at (vect_location,
 				       "not vectorized: can't create required "
@@ -6002,7 +6033,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
          Store them in NEW_PHIS.  */
   if (double_reduc)
     loop = outer_loop;
-  exit_bb = single_exit (loop)->dest;
+  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
   exit_gsi = gsi_after_labels (exit_bb);
   reduc_inputs.create (slp_node ? vec_num : ncopies);
   for (unsigned i = 0; i < vec_num; i++)
@@ -6018,7 +6049,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 	  phi = create_phi_node (new_def, exit_bb);
 	  if (j)
 	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
-	  SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def);
+	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
 	  new_def = gimple_convert (&stmts, vectype, new_def);
 	  reduc_inputs.quick_push (new_def);
 	}
@@ -10416,12 +10447,12 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 	   lhs' = new_tree;  */
 
       class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-      basic_block exit_bb = single_exit (loop)->dest;
+      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
       gcc_assert (single_pred_p (exit_bb));
 
       tree vec_lhs_phi = copy_ssa_name (vec_lhs);
       gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
-      SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, vec_lhs);
+      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
 
       gimple_seq stmts = NULL;
       tree new_tree;
@@ -10965,7 +10996,7 @@ vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
    profile.  */
 
 static void
-scale_profile_for_vect_loop (class loop *loop, unsigned vf, bool flat)
+scale_profile_for_vect_loop (class loop *loop, edge exit_e, unsigned vf, bool flat)
 {
   /* For flat profiles do not scale down proportionally by VF and only
      cap by known iteration count bounds.  */
@@ -10980,7 +11011,6 @@ scale_profile_for_vect_loop (class loop *loop, unsigned vf, bool flat)
       return;
     }
   /* Loop body executes VF fewer times and exit increases VF times.  */
-  edge exit_e = single_exit (loop);
   profile_count entry_count = loop_preheader_edge (loop)->count ();
 
   /* If we have unreliable loop profile avoid dropping entry
@@ -11350,7 +11380,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 
   /* Make sure there exists a single-predecessor exit bb.  Do this before 
      versioning.   */
-  edge e = single_exit (loop);
+  edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
   if (! single_pred_p (e->dest))
     {
       split_loop_exit_edge (e, true);
@@ -11376,7 +11406,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
      loop closed PHI nodes on the exit.  */
   if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
     {
-      e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
+      e = LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
       if (! single_pred_p (e->dest))
 	{
 	  split_loop_exit_edge (e, true);
@@ -11625,8 +11655,9 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
      a zero NITERS becomes a nonzero NITERS_VECTOR.  */
   if (integer_onep (step_vector))
     niters_no_overflow = true;
-  vect_set_loop_condition (loop, loop_vinfo, niters_vector, step_vector,
-			   niters_vector_mult_vf, !niters_no_overflow);
+  vect_set_loop_condition (loop, LOOP_VINFO_IV_EXIT (loop_vinfo), loop_vinfo,
+			   niters_vector, step_vector, niters_vector_mult_vf,
+			   !niters_no_overflow);
 
   unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo);
 
@@ -11699,7 +11730,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 			  assumed_vf) - 1
 	 : wi::udiv_floor (loop->nb_iterations_estimate + bias_for_assumed,
 			   assumed_vf) - 1);
-  scale_profile_for_vect_loop (loop, assumed_vf, flat);
+  scale_profile_for_vect_loop (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
+			       assumed_vf, flat);
 
   if (dump_enabled_p ())
     {
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index f1d0cd79961abb095bc79d3b59a81930f0337e59..afa7a8e30891c782a0e5e3740ecc4377f5a31e54 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -919,10 +919,24 @@ public:
      analysis.  */
   vec<_loop_vec_info *> epilogue_vinfos;
 
+  /* The controlling loop IV for the current loop when vectorizing.  This IV
+     controls the natural exits of the loop.  */
+  edge vec_loop_iv;
+
+  /* The controlling loop IV for the epilogue loop when vectorizing.  This IV
+     controls the natural exits of the loop.  */
+  edge vec_epilogue_loop_iv;
+
+  /* The controlling loop IV for the scalar loop being vectorized.  This IV
+     controls the natural exits of the loop.  */
+  edge scalar_loop_iv;
 } *loop_vec_info;
 
 /* Access Functions.  */
 #define LOOP_VINFO_LOOP(L)                 (L)->loop
+#define LOOP_VINFO_IV_EXIT(L)              (L)->vec_loop_iv
+#define LOOP_VINFO_EPILOGUE_IV_EXIT(L)     (L)->vec_epilogue_loop_iv
+#define LOOP_VINFO_SCALAR_IV_EXIT(L)       (L)->scalar_loop_iv
 #define LOOP_VINFO_BBS(L)                  (L)->bbs
 #define LOOP_VINFO_NITERSM1(L)             (L)->num_itersm1
 #define LOOP_VINFO_NITERS(L)               (L)->num_iters
@@ -2155,11 +2169,13 @@ class auto_purge_vect_location
 
 /* Simple loop peeling and versioning utilities for vectorizer's purposes -
    in tree-vect-loop-manip.cc.  */
-extern void vect_set_loop_condition (class loop *, loop_vec_info,
+extern void vect_set_loop_condition (class loop *, edge, loop_vec_info,
 				     tree, tree, tree, bool);
-extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge);
-class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
-						     class loop *, edge);
+extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
+					 const_edge);
+class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
+						    class loop *, edge,
+						    edge, edge *);
 class loop *vect_loop_versioning (loop_vec_info, gimple *);
 extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
 				    tree *, tree *, tree *, int, bool, bool,
@@ -2169,6 +2185,7 @@ extern void vect_prepare_for_masked_peels (loop_vec_info);
 extern dump_user_location_t find_loop_location (class loop *);
 extern bool vect_can_advance_ivs_p (loop_vec_info);
 extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
+extern edge vec_init_loop_exit_info (class loop *);
 
 /* In tree-vect-stmts.cc.  */
 extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
@@ -2358,6 +2375,7 @@ struct vect_loop_form_info
   tree assumptions;
   gcond *loop_cond;
   gcond *inner_loop_cond;
+  edge loop_exit;
 };
 extern opt_result vect_analyze_loop_form (class loop *, vect_loop_form_info *);
 extern loop_vec_info vect_create_loop_vinfo (class loop *, vec_info_shared *,
diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
index a048e9d89178a37455bd7b83ab0f2a238a4ce69e..d97e2b54c25ac60378935392aa7b73476efed74b 100644
--- a/gcc/tree-vectorizer.cc
+++ b/gcc/tree-vectorizer.cc
@@ -943,6 +943,8 @@ set_uid_loop_bbs (loop_vec_info loop_vinfo, gimple *loop_vectorized_call,
   class loop *scalar_loop = get_loop (fun, tree_to_shwi (arg));
 
   LOOP_VINFO_SCALAR_LOOP (loop_vinfo) = scalar_loop;
+  LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo)
+    = vec_init_loop_exit_info (scalar_loop);
   gcc_checking_assert (vect_loop_vectorized_call (scalar_loop)
 		       == loop_vectorized_call);
   /* If we are going to vectorize outer loop, prevent vectorization





* [PATCH 2/3]middle-end: updated niters analysis to handle multiple exits.
  2023-10-02  7:41 [PATCH 1/3]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables Tamar Christina
@ 2023-10-02  7:41 ` Tamar Christina
  2023-10-10 11:13   ` Richard Biener
  2023-10-02  7:42 ` [PATCH 3/3]middle-end: maintain LCSSA throughout loop peeling Tamar Christina
  2023-10-09 13:35 ` [PATCH 1/3]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables Richard Biener
  2 siblings, 1 reply; 12+ messages in thread
From: Tamar Christina @ 2023-10-02  7:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 14215 bytes --]

Hi All,

This second part updates niters analysis to be able to analyze any number of
exits.  If we have multiple exits we determine the main exit by finding the
first counting IV.
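
To illustrate the main exit selection, here is a minimal standalone sketch.
It does not use GCC internals: ExitModel and pick_main_exit are hypothetical
names, and the three booleans stand in for get_loop_exit_condition,
number_of_iterations_exit_assumptions and niter_desc.may_be_zero.  The
selection rule itself mirrors the loop added to vec_init_loop_exit_info in
this patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a loop exit edge; not a GCC type.
struct ExitModel {
  int id;             // position of the exit in the exit list
  bool has_cond;      // the exit is controlled by a conditional
  bool niters_known;  // niter analysis succeeded for this exit
  bool may_be_zero;   // analysis recorded a may_be_zero condition
};

// Pick the main exit following the rule in the patch: a single exit is
// returned as is; otherwise only analyzable exits are considered.
const ExitModel *pick_main_exit(const std::vector<ExitModel> &exits) {
  if (exits.size() == 1)
    return &exits[0];

  const ExitModel *candidate = nullptr;
  for (const ExitModel &e : exits) {
    if (!e.has_cond || !e.niters_known)
      continue;
    // An exit without a may_be_zero condition always becomes the
    // candidate; one with it is only taken when nothing was found yet.
    if (!e.may_be_zero || !candidate)
      candidate = &e;
  }

  // Any exit picked on the multi-exit path was analyzable.
  assert(candidate == nullptr || candidate->niters_known);
  return candidate;
}
```

Under this rule an analyzable exit with no may_be_zero condition replaces any
earlier candidate, so the result is the last such exit, or otherwise the first
analyzable exit.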

The change allows loops with multiple exits to pass the vectorizer's analysis,
but we later gracefully reject them.  It does however allow us to test whether
the exit handling is using the right exit everywhere.

Additionally since we analyze all exits, we now return all conditions for them
and determine which condition belongs to the main exit.

The main condition is needed because the vectorizer needs to ignore the main IV
condition during vectorization as it will replace it during codegen.
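
The split between the main condition and the remaining exit conditions can be
sketched outside of GCC as follows.  CondModel, SplitConds and
split_exit_conds are hypothetical names; a condition is reduced to an id and
the index of the basic block that holds it, standing in for gimple_bb (cond)
and exit_e->src.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a gcond: an id plus the block it lives in.
struct CondModel {
  int id;
  int bb;
};

// Result of the split: the main IV condition and all alternate ones.
struct SplitConds {
  int main_id;               // -1 when no condition sits in the main exit block
  std::vector<int> alt_ids;  // conditions of the other exits, in order
};

// Classify every exit condition by comparing its block with the source
// block of the main exit, as the loop in vect_analyze_loop_form does.
SplitConds split_exit_conds(const std::vector<CondModel> &conds,
                            int main_exit_src_bb) {
  // The caller is expected to bail out earlier when no conditions exist.
  assert(!conds.empty());

  SplitConds result{-1, {}};
  result.alt_ids.reserve(conds.size() - 1);
  for (const CondModel &c : conds) {
    if (c.bb != main_exit_src_bb)
      result.alt_ids.push_back(c.id);
    else
      result.main_id = c.id;
  }
  return result;
}
```

Keeping the main condition separate is what lets later code skip it while
still visiting every alternate condition.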

To track versioned loops we extend the contract between ifcvt and the vectorizer
to store the exit number in aux so that we can match it up again during peeling.
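
A small model of this contract, written without GCC internals, may help.
EdgeModel, tag_exits and match_exit are hypothetical names, and the sketch
assumes that both copies of a versioned loop carry the same index in aux for
corresponding exits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a CFG edge with an aux scratch pointer.
struct EdgeModel {
  int src;
  int dest;
  void *aux;
};

// Store each exit's position in aux, as the ifcvt hunk does before the
// edges are rewritten.
void tag_exits(std::vector<EdgeModel> &exits) {
  std::uintptr_t idx = 0;
  for (EdgeModel &e : exits)
    e.aux = reinterpret_cast<void *>(idx++);
}

// Find the exit of the scalar loop that carries the same index as
// loop_exit.  Returns nullptr when no exit matches.
const EdgeModel *match_exit(const std::vector<EdgeModel> &scalar_exits,
                            const EdgeModel &loop_exit) {
  // A loop that is being peeled has at least one exit.
  assert(!scalar_exits.empty());
  for (const EdgeModel &e : scalar_exits)
    if (e.aux == loop_exit.aux)
      return &e;
  return nullptr;
}
```

Note that an untagged edge has a null aux, which compares equal to index 0, so
the match is only meaningful once every exit has been tagged.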

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-linux-gnu with
no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-if-conv.cc (tree_if_conversion): Record exits in aux.
	* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Use
	it.
	* tree-vect-loop.cc (vect_get_loop_niters): Determine main exit.
	(vec_init_loop_exit_info): Extend analysis when multiple exits.
	(vect_analyze_loop_form): Record conds and determine main cond.
	(vect_create_loop_vinfo): Extend bookkeeping of conds.
	(vect_analyze_loop): Release conds.
	* tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS,
	LOOP_VINFO_LOOP_IV_COND): New.
	(struct vect_loop_form_info): Add conds, alt_loop_conds.
	(struct loop_vec_info): Add conds, loop_iv_cond.

--- inline copy of patch -- 
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 799f071965e5c41eb352b5530cf1d9c7ecf7bf25..3dc2290467797ebbfcef55903531b22829f4fdbd 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -3795,6 +3795,13 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
     }
   if (need_to_ifcvt)
     {
+      /* Before we rewrite edges we'll record their original position in the
+	 edge map such that we can map the edges between the ifcvt and the
+	 non-ifcvt loop during peeling.  */
+      uintptr_t idx = 0;
+      for (edge exit : get_loop_exit_edges (loop))
+	exit->aux = (void*)idx++;
+
       /* Now all statements are if-convertible.  Combine all the basic
 	 blocks into one huge basic block doing the if-conversion
 	 on-the-fly.  */
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index e06717272aafc6d31cbdcb94840ac25de616da6d..77f8e668bcc8beca99ba4052e1b12e0d17300262 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1470,6 +1470,18 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       scalar_loop = loop;
       scalar_exit = loop_exit;
     }
+  else if (scalar_loop == loop)
+    scalar_exit = loop_exit;
+  else
+    {
+      /* Loop has been versioned, match exits up using the aux index.  */
+      for (edge exit : get_loop_exit_edges (scalar_loop))
+	if (exit->aux == loop_exit->aux)
+	  {
+	    scalar_exit = exit;
+	    break;
+	  }
+    }
 
   bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
   pbbs = bbs + 1;
@@ -1501,6 +1513,8 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
   exit = loop_exit;
   basic_block new_preheader = new_bbs[0];
 
+  /* Record the new loop exit information.  new_loop doesn't have SCEV data and
+     so we must initialize the exit information.  */
   if (new_e)
     *new_e = new_exit;
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 6e60d84143626a8e1d801bb580f4dcebc73c7ba7..f1caa5f207d3b13da58c3a313b11d1ef98374349 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -851,79 +851,106 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info loop_vinfo)
    in NUMBER_OF_ITERATIONSM1.  Place the condition under which the
    niter information holds in ASSUMPTIONS.
 
-   Return the loop exit condition.  */
+   Return the loop exit conditions.  */
 
 
-static gcond *
-vect_get_loop_niters (class loop *loop, edge exit, tree *assumptions,
+static vec<gcond *>
+vect_get_loop_niters (class loop *loop, tree *assumptions, const_edge main_exit,
 		      tree *number_of_iterations, tree *number_of_iterationsm1)
 {
+  auto_vec<edge> exits = get_loop_exit_edges (loop);
+  vec<gcond *> conds;
+  conds.create (exits.length ());
   class tree_niter_desc niter_desc;
   tree niter_assumptions, niter, may_be_zero;
-  gcond *cond = get_loop_exit_condition (loop);
 
   *assumptions = boolean_true_node;
   *number_of_iterationsm1 = chrec_dont_know;
   *number_of_iterations = chrec_dont_know;
+
   DUMP_VECT_SCOPE ("get_loop_niters");
 
-  if (!exit)
-    return cond;
+  if (exits.is_empty ())
+    return conds;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "Loop has %d exits.\n",
+		     exits.length ());
+
+  edge exit;
+  unsigned int i;
+  FOR_EACH_VEC_ELT (exits, i, exit)
+    {
+      gcond *cond = get_loop_exit_condition (exit);
+      if (cond)
+	conds.safe_push (cond);
+
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "Analyzing exit %d...\n", i);
 
-  may_be_zero = NULL_TREE;
-  if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
-      || chrec_contains_undetermined (niter_desc.niter))
-    return cond;
+      may_be_zero = NULL_TREE;
+      if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
+          || chrec_contains_undetermined (niter_desc.niter))
+	continue;
 
-  niter_assumptions = niter_desc.assumptions;
-  may_be_zero = niter_desc.may_be_zero;
-  niter = niter_desc.niter;
+      niter_assumptions = niter_desc.assumptions;
+      may_be_zero = niter_desc.may_be_zero;
+      niter = niter_desc.niter;
 
-  if (may_be_zero && integer_zerop (may_be_zero))
-    may_be_zero = NULL_TREE;
+      if (may_be_zero && integer_zerop (may_be_zero))
+	may_be_zero = NULL_TREE;
 
-  if (may_be_zero)
-    {
-      if (COMPARISON_CLASS_P (may_be_zero))
+      if (may_be_zero)
 	{
-	  /* Try to combine may_be_zero with assumptions, this can simplify
-	     computation of niter expression.  */
-	  if (niter_assumptions && !integer_nonzerop (niter_assumptions))
-	    niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
-					     niter_assumptions,
-					     fold_build1 (TRUTH_NOT_EXPR,
-							  boolean_type_node,
-							  may_be_zero));
+	  if (COMPARISON_CLASS_P (may_be_zero))
+	    {
+	      /* Try to combine may_be_zero with assumptions, this can simplify
+		 computation of niter expression.  */
+	      if (niter_assumptions && !integer_nonzerop (niter_assumptions))
+		niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
+						 niter_assumptions,
+						 fold_build1 (TRUTH_NOT_EXPR,
+							      boolean_type_node,
+							      may_be_zero));
+	      else
+		niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
+				     build_int_cst (TREE_TYPE (niter), 0),
+				     rewrite_to_non_trapping_overflow (niter));
+
+	      may_be_zero = NULL_TREE;
+	    }
+	  else if (integer_nonzerop (may_be_zero) && exit == main_exit)
+	    {
+	      *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
+	      *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
+	      continue;
+	    }
 	  else
-	    niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
-				 build_int_cst (TREE_TYPE (niter), 0),
-				 rewrite_to_non_trapping_overflow (niter));
+	    continue;
+	}
 
-	  may_be_zero = NULL_TREE;
-	}
-      else if (integer_nonzerop (may_be_zero))
+      /* Loop assumptions are based off the normal exit.  */
+      if (exit == main_exit)
 	{
-	  *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
-	  *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
-	  return cond;
+	  *assumptions = niter_assumptions;
+	  *number_of_iterationsm1 = niter;
+
+	  /* We want the number of loop header executions which is the number
+	     of latch executions plus one.
+	     ???  For UINT_MAX latch executions this number overflows to zero
+	     for loops like do { n++; } while (n != 0);  */
+	  if (niter && !chrec_contains_undetermined (niter))
+	    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
+				 unshare_expr (niter),
+				 build_int_cst (TREE_TYPE (niter), 1));
+	  *number_of_iterations = niter;
 	}
-      else
-	return cond;
     }
 
-  *assumptions = niter_assumptions;
-  *number_of_iterationsm1 = niter;
-
-  /* We want the number of loop header executions which is the number
-     of latch executions plus one.
-     ???  For UINT_MAX latch executions this number overflows to zero
-     for loops like do { n++; } while (n != 0);  */
-  if (niter && !chrec_contains_undetermined (niter))
-    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), unshare_expr (niter),
-			  build_int_cst (TREE_TYPE (niter), 1));
-  *number_of_iterations = niter;
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "All loop exits successfully analyzed.\n");
 
-  return cond;
+  return conds;
 }
 
 /*  Determine the main loop exit for the vectorizer.  */
@@ -936,8 +963,25 @@ vec_init_loop_exit_info (class loop *loop)
   auto_vec<edge> exits = get_loop_exit_edges (loop);
   if (exits.length () == 1)
     return exits[0];
-  else
-    return NULL;
+
+  /* If we have multiple exits we only support counting IVs at the moment.
+     Analyze all exits and return one.  */
+  class tree_niter_desc niter_desc;
+  edge candidate = NULL;
+  for (edge exit : exits)
+    {
+      if (!get_loop_exit_condition (exit))
+	continue;
+
+      if (number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
+	  && !chrec_contains_undetermined (niter_desc.niter))
+	{
+	  if (!niter_desc.may_be_zero || !candidate)
+	    candidate = exit;
+	}
+    }
+
+  return candidate;
 }
 
 /* Function bb_in_loop_p
@@ -1788,21 +1832,31 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 				   "not vectorized: latch block not empty.\n");
 
   /* Make sure the exit is not abnormal.  */
-  edge e = single_exit (loop);
-  if (e->flags & EDGE_ABNORMAL)
+  if (exit_e->flags & EDGE_ABNORMAL)
     return opt_result::failure_at (vect_location,
 				   "not vectorized:"
 				   " abnormal loop exit edge.\n");
 
-  info->loop_cond
-    = vect_get_loop_niters (loop, e, &info->assumptions,
+  info->conds
+    = vect_get_loop_niters (loop, &info->assumptions, exit_e,
 			    &info->number_of_iterations,
 			    &info->number_of_iterationsm1);
-  if (!info->loop_cond)
+
+  if (info->conds.is_empty ())
     return opt_result::failure_at
       (vect_location,
        "not vectorized: complicated exit condition.\n");
 
+  /* Determine what the primary and alternate exit conds are.  */
+  info->alt_loop_conds.create (info->conds.length () - 1);
+  for (gcond *cond : info->conds)
+    {
+      if (exit_e->src != gimple_bb (cond))
+	info->alt_loop_conds.quick_push (cond);
+      else
+	info->loop_cond = cond;
+    }
+
   if (integer_zerop (info->assumptions)
       || !info->number_of_iterations
       || chrec_contains_undetermined (info->number_of_iterations))
@@ -1847,8 +1901,13 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
   if (!integer_onep (info->assumptions) && !main_loop_info)
     LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info->assumptions;
 
-  stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (info->loop_cond);
-  STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
+  for (gcond *cond : info->conds)
+    {
+      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (cond);
+      STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
+    }
+  LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
+  LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
 
   LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
 
@@ -3594,7 +3653,11 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 			 && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
 			 && !loop->simduid);
   if (!vect_epilogues)
-    return first_loop_vinfo;
+    {
+      loop_form_info.conds.release ();
+      loop_form_info.alt_loop_conds.release ();
+      return first_loop_vinfo;
+    }
 
   /* Now analyze first_loop_vinfo for epilogue vectorization.  */
   poly_uint64 lowest_th = LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo);
@@ -3694,6 +3757,9 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 			   (first_loop_vinfo->epilogue_vinfos[0]->vector_mode));
     }
 
+  loop_form_info.conds.release ();
+  loop_form_info.alt_loop_conds.release ();
+
   return first_loop_vinfo;
 }
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index afa7a8e30891c782a0e5e3740ecc4377f5a31e54..55b6771b271d5072fa1327d595e1dddb112cfdf6 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -882,6 +882,12 @@ public:
      we need to peel off iterations at the end to form an epilogue loop.  */
   bool peeling_for_niter;
 
+  /* List of additional loop IV conditionals found in the loop.  */
+  auto_vec<gcond *> conds;
+
+  /* Main loop IV cond.  */
+  gcond *loop_iv_cond;
+
   /* True if there are no loop carried data dependencies in the loop.
      If loop->safelen <= 1, then this is always true, either the loop
      didn't have any loop carried data dependencies, or the loop is being
@@ -984,6 +990,8 @@ public:
 #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
+#define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
+#define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
 #define LOOP_VINFO_SCALAR_LOOP(L)	   (L)->scalar_loop
 #define LOOP_VINFO_SCALAR_LOOP_SCALING(L)  (L)->scalar_loop_scaling
@@ -2373,7 +2381,9 @@ struct vect_loop_form_info
   tree number_of_iterations;
   tree number_of_iterationsm1;
   tree assumptions;
+  vec<gcond *> conds;
   gcond *loop_cond;
+  vec<gcond *> alt_loop_conds;
   gcond *inner_loop_cond;
   edge loop_exit;
 };




-- 

[-- Attachment #2: rb17790.patch --]
[-- Type: text/plain, Size: 12604 bytes --]

diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 799f071965e5c41eb352b5530cf1d9c7ecf7bf25..3dc2290467797ebbfcef55903531b22829f4fdbd 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -3795,6 +3795,13 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
     }
   if (need_to_ifcvt)
     {
+      /* Before we rewrite edges we'll record their original position in the
+	 edge map such that we can map the edges between the ifcvt and the
+	 non-ifcvt loop during peeling.  */
+      uintptr_t idx = 0;
+      for (edge exit : get_loop_exit_edges (loop))
+	exit->aux = (void*)idx++;
+
       /* Now all statements are if-convertible.  Combine all the basic
 	 blocks into one huge basic block doing the if-conversion
 	 on-the-fly.  */
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index e06717272aafc6d31cbdcb94840ac25de616da6d..77f8e668bcc8beca99ba4052e1b12e0d17300262 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1470,6 +1470,18 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       scalar_loop = loop;
       scalar_exit = loop_exit;
     }
+  else if (scalar_loop == loop)
+    scalar_exit = loop_exit;
+  else
+    {
+      /* Loop has been versioned, match exits up using the aux index.  */
+      for (edge exit : get_loop_exit_edges (scalar_loop))
+	if (exit->aux == loop_exit->aux)
+	  {
+	    scalar_exit = exit;
+	    break;
+	  }
+    }
 
   bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
   pbbs = bbs + 1;
@@ -1501,6 +1513,8 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
   exit = loop_exit;
   basic_block new_preheader = new_bbs[0];
 
+  /* Record the new loop exit information.  new_loop doesn't have SCEV data and
+     so we must initialize the exit information.  */
   if (new_e)
     *new_e = new_exit;
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 6e60d84143626a8e1d801bb580f4dcebc73c7ba7..f1caa5f207d3b13da58c3a313b11d1ef98374349 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -851,79 +851,106 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info loop_vinfo)
    in NUMBER_OF_ITERATIONSM1.  Place the condition under which the
    niter information holds in ASSUMPTIONS.
 
-   Return the loop exit condition.  */
+   Return the loop exit conditions.  */
 
 
-static gcond *
-vect_get_loop_niters (class loop *loop, edge exit, tree *assumptions,
+static vec<gcond *>
+vect_get_loop_niters (class loop *loop, tree *assumptions, const_edge main_exit,
 		      tree *number_of_iterations, tree *number_of_iterationsm1)
 {
+  auto_vec<edge> exits = get_loop_exit_edges (loop);
+  vec<gcond *> conds;
+  conds.create (exits.length ());
   class tree_niter_desc niter_desc;
   tree niter_assumptions, niter, may_be_zero;
-  gcond *cond = get_loop_exit_condition (loop);
 
   *assumptions = boolean_true_node;
   *number_of_iterationsm1 = chrec_dont_know;
   *number_of_iterations = chrec_dont_know;
+
   DUMP_VECT_SCOPE ("get_loop_niters");
 
-  if (!exit)
-    return cond;
+  if (exits.is_empty ())
+    return conds;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "Loop has %d exits.\n",
+		     exits.length ());
+
+  edge exit;
+  unsigned int i;
+  FOR_EACH_VEC_ELT (exits, i, exit)
+    {
+      gcond *cond = get_loop_exit_condition (exit);
+      if (cond)
+	conds.safe_push (cond);
+
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "Analyzing exit %d...\n", i);
 
-  may_be_zero = NULL_TREE;
-  if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
-      || chrec_contains_undetermined (niter_desc.niter))
-    return cond;
+      may_be_zero = NULL_TREE;
+      if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
+          || chrec_contains_undetermined (niter_desc.niter))
+	continue;
 
-  niter_assumptions = niter_desc.assumptions;
-  may_be_zero = niter_desc.may_be_zero;
-  niter = niter_desc.niter;
+      niter_assumptions = niter_desc.assumptions;
+      may_be_zero = niter_desc.may_be_zero;
+      niter = niter_desc.niter;
 
-  if (may_be_zero && integer_zerop (may_be_zero))
-    may_be_zero = NULL_TREE;
+      if (may_be_zero && integer_zerop (may_be_zero))
+	may_be_zero = NULL_TREE;
 
-  if (may_be_zero)
-    {
-      if (COMPARISON_CLASS_P (may_be_zero))
+      if (may_be_zero)
 	{
-	  /* Try to combine may_be_zero with assumptions, this can simplify
-	     computation of niter expression.  */
-	  if (niter_assumptions && !integer_nonzerop (niter_assumptions))
-	    niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
-					     niter_assumptions,
-					     fold_build1 (TRUTH_NOT_EXPR,
-							  boolean_type_node,
-							  may_be_zero));
+	  if (COMPARISON_CLASS_P (may_be_zero))
+	    {
+	      /* Try to combine may_be_zero with assumptions, this can simplify
+		 computation of niter expression.  */
+	      if (niter_assumptions && !integer_nonzerop (niter_assumptions))
+		niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
+						 niter_assumptions,
+						 fold_build1 (TRUTH_NOT_EXPR,
+							      boolean_type_node,
+							      may_be_zero));
+	      else
+		niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
+				     build_int_cst (TREE_TYPE (niter), 0),
+				     rewrite_to_non_trapping_overflow (niter));
+
+	      may_be_zero = NULL_TREE;
+	    }
+	  else if (integer_nonzerop (may_be_zero) && exit == main_exit)
+	    {
+	      *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
+	      *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
+	      continue;
+	    }
 	  else
-	    niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
-				 build_int_cst (TREE_TYPE (niter), 0),
-				 rewrite_to_non_trapping_overflow (niter));
+	    continue;
+	}
 
-	  may_be_zero = NULL_TREE;
-	}
-      else if (integer_nonzerop (may_be_zero))
+      /* Loop assumptions are based off the normal exit.  */
+      if (exit == main_exit)
 	{
-	  *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
-	  *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
-	  return cond;
+	  *assumptions = niter_assumptions;
+	  *number_of_iterationsm1 = niter;
+
+	  /* We want the number of loop header executions which is the number
+	     of latch executions plus one.
+	     ???  For UINT_MAX latch executions this number overflows to zero
+	     for loops like do { n++; } while (n != 0);  */
+	  if (niter && !chrec_contains_undetermined (niter))
+	    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
+				 unshare_expr (niter),
+				 build_int_cst (TREE_TYPE (niter), 1));
+	  *number_of_iterations = niter;
 	}
-      else
-	return cond;
     }
 
-  *assumptions = niter_assumptions;
-  *number_of_iterationsm1 = niter;
-
-  /* We want the number of loop header executions which is the number
-     of latch executions plus one.
-     ???  For UINT_MAX latch executions this number overflows to zero
-     for loops like do { n++; } while (n != 0);  */
-  if (niter && !chrec_contains_undetermined (niter))
-    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), unshare_expr (niter),
-			  build_int_cst (TREE_TYPE (niter), 1));
-  *number_of_iterations = niter;
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "All loop exits successfully analyzed.\n");
 
-  return cond;
+  return conds;
 }
 
 /*  Determine the main loop exit for the vectorizer.  */
@@ -936,8 +963,25 @@ vec_init_loop_exit_info (class loop *loop)
   auto_vec<edge> exits = get_loop_exit_edges (loop);
   if (exits.length () == 1)
     return exits[0];
-  else
-    return NULL;
+
+  /* If we have multiple exits we only support counting IVs at the moment.
+     Analyze all exits and return one.  */
+  class tree_niter_desc niter_desc;
+  edge candidate = NULL;
+  for (edge exit : exits)
+    {
+      if (!get_loop_exit_condition (exit))
+	continue;
+
+      if (number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
+	  && !chrec_contains_undetermined (niter_desc.niter))
+	{
+	  if (!niter_desc.may_be_zero || !candidate)
+	    candidate = exit;
+	}
+    }
+
+  return candidate;
 }
 
 /* Function bb_in_loop_p
@@ -1788,21 +1832,31 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 				   "not vectorized: latch block not empty.\n");
 
   /* Make sure the exit is not abnormal.  */
-  edge e = single_exit (loop);
-  if (e->flags & EDGE_ABNORMAL)
+  if (exit_e->flags & EDGE_ABNORMAL)
     return opt_result::failure_at (vect_location,
 				   "not vectorized:"
 				   " abnormal loop exit edge.\n");
 
-  info->loop_cond
-    = vect_get_loop_niters (loop, e, &info->assumptions,
+  info->conds
+    = vect_get_loop_niters (loop, &info->assumptions, exit_e,
 			    &info->number_of_iterations,
 			    &info->number_of_iterationsm1);
-  if (!info->loop_cond)
+
+  if (info->conds.is_empty ())
     return opt_result::failure_at
       (vect_location,
        "not vectorized: complicated exit condition.\n");
 
+  /* Determine what the primary and alternate exit conds are.  */
+  info->alt_loop_conds.create (info->conds.length () - 1);
+  for (gcond *cond : info->conds)
+    {
+      if (exit_e->src != gimple_bb (cond))
+	info->alt_loop_conds.quick_push (cond);
+      else
+	info->loop_cond = cond;
+    }
+
   if (integer_zerop (info->assumptions)
       || !info->number_of_iterations
       || chrec_contains_undetermined (info->number_of_iterations))
@@ -1847,8 +1901,13 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
   if (!integer_onep (info->assumptions) && !main_loop_info)
     LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info->assumptions;
 
-  stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (info->loop_cond);
-  STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
+  for (gcond *cond : info->conds)
+    {
+      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (cond);
+      STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
+    }
+  LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
+  LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
 
   LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
 
@@ -3594,7 +3653,11 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 			 && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
 			 && !loop->simduid);
   if (!vect_epilogues)
-    return first_loop_vinfo;
+    {
+      loop_form_info.conds.release ();
+      loop_form_info.alt_loop_conds.release ();
+      return first_loop_vinfo;
+    }
 
   /* Now analyze first_loop_vinfo for epilogue vectorization.  */
   poly_uint64 lowest_th = LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo);
@@ -3694,6 +3757,9 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 			   (first_loop_vinfo->epilogue_vinfos[0]->vector_mode));
     }
 
+  loop_form_info.conds.release ();
+  loop_form_info.alt_loop_conds.release ();
+
   return first_loop_vinfo;
 }
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index afa7a8e30891c782a0e5e3740ecc4377f5a31e54..55b6771b271d5072fa1327d595e1dddb112cfdf6 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -882,6 +882,12 @@ public:
      we need to peel off iterations at the end to form an epilogue loop.  */
   bool peeling_for_niter;
 
+  /* List of loop additional IV conditionals found in the loop.  */
+  auto_vec<gcond *> conds;
+
+  /* Main loop IV cond.  */
+  gcond* loop_iv_cond;
+
   /* True if there are no loop carried data dependencies in the loop.
      If loop->safelen <= 1, then this is always true, either the loop
      didn't have any loop carried data dependencies, or the loop is being
@@ -984,6 +990,8 @@ public:
 #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
+#define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
+#define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
 #define LOOP_VINFO_SCALAR_LOOP(L)	   (L)->scalar_loop
 #define LOOP_VINFO_SCALAR_LOOP_SCALING(L)  (L)->scalar_loop_scaling
@@ -2373,7 +2381,9 @@ struct vect_loop_form_info
   tree number_of_iterations;
   tree number_of_iterationsm1;
   tree assumptions;
+  vec<gcond *> conds;
   gcond *loop_cond;
+  vec<gcond *> alt_loop_conds;
   gcond *inner_loop_cond;
   edge loop_exit;
 };





* [PATCH 3/3]middle-end: maintain LCSSA throughout loop peeling
  2023-10-02  7:41 [PATCH 1/3]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables Tamar Christina
  2023-10-02  7:41 ` [PATCH 2/3]middle-end: updated niters analysis to handle multiple exits Tamar Christina
@ 2023-10-02  7:42 ` Tamar Christina
  2023-10-10 12:59   ` Richard Biener
  2023-10-09 13:35 ` [PATCH 1/3]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables Richard Biener
  2 siblings, 1 reply; 12+ messages in thread
From: Tamar Christina @ 2023-10-02  7:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 22918 bytes --]

Hi All,

This final patch updates peeling to maintain LCSSA all the way through.

It's significantly easier to maintain it during peeling while we still know
where all new edges connect rather than touching it up later as is currently
being done.
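
As a rough illustration of the invariant in question, here is a stand-alone toy
model.  It is not GCC code; the Use type and the helper names are hypothetical.
In loop-closed SSA, a name defined inside the loop may only be used after the
loop through a PHI in the exit block, as with i_x = PHI <i_2> in the comment
this patch removes.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and unrelated to GCC's
// real gimple/CFG data structures.
struct Use {
  std::string name;   // SSA name being used
  bool in_loop;       // is the using statement inside the loop?
  bool is_exit_phi;   // is the use an argument of a PHI in the exit block?
};

// Returns true if every use of a loop-defined name outside the loop goes
// through a PHI in the exit block, i.e. the loop is in loop-closed SSA form.
bool is_loop_closed(const std::set<std::string> &loop_defs,
                    const std::vector<Use> &uses) {
  for (const Use &u : uses) {
    if (!loop_defs.count(u.name))
      continue;   // not defined in the loop, so no constraint applies
    if (u.in_loop || u.is_exit_phi)
      continue;   // uses inside the loop or via an exit PHI are fine
    return false; // a direct use after the loop breaks LCSSA
  }
  return true;
}

// i_2 is defined in the loop; the exit block holds i_x = PHI <i_2> and all
// code after the loop uses i_x.
std::vector<Use> closed_example() {
  return {{"i_2", true, false}, {"i_2", false, true}, {"i_x", false, false}};
}

// i_2 is used directly after the loop, with no exit PHI in between.
std::vector<Use> open_example() {
  return {{"i_2", true, false}, {"i_2", false, false}};
}
```

Creating the exit PHI at the moment the exit edge is redirected keeps the first
shape true by construction, which is the motivation given above.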

This allows us to remove many of the helper functions that touch up the loops
at various points.  The only complication is loop distribution, where we
should be able to use the same approach.  However, depending on whether
redirect_lc_phi_defs is true or not, ldist will either try to maintain a
limited LCSSA form itself or remove all non-virtual phis.

The problem here is that if we maintain LCSSA then in some cases the blocks
connecting the two loops get PHIs to keep the loop IV up to date.

However there is no loop: the guard condition is rewritten as 0 != 0, so the
"loop" always exits.  Due to the PHI nodes, though, the probabilities end up
completely wrong; the impossible exit is treated as the likely edge.  This
causes incorrect warnings, and the presence of the PHIs prevents the blocks
from being simplified.

While it may be possible to make ldist work with LCSSA form, doing so seems more
work than not.  For that reason the peeling code has an additional parameter
used by only ldist to not connect the two loops during peeling.
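
The shape of that interface change can be sketched in isolation.  The snippet
below is a stand-alone toy that assumes only what the patch itself shows: a
trailing bool that defaults to true, with ldist passing false.  The names
duplicate_loop, PeelResult, vectorizer_peel and ldist_copy are hypothetical
stand-ins, not the real slpeel_tree_duplicate_loop_to_edge_cfg or its callers.

```cpp
#include <cassert>

// Hypothetical stand-in for the outcome of peeling; not a GCC type.
struct PeelResult {
  bool loops_connected;  // were LC PHIs created to chain the two loops?
};

// Hypothetical stand-in for the peeling entry point.  The trailing
// parameter defaults to true, so existing callers compile unchanged
// and get the connected (LCSSA-preserving) behaviour.
PeelResult duplicate_loop(int /*loop_id*/, bool flow_loops = true) {
  return PeelResult{flow_loops};
}

// Vectorizer-style caller: relies on the default.
PeelResult vectorizer_peel(int loop_id) {
  return duplicate_loop(loop_id);
}

// Loop-distribution-style caller: explicitly opts out of connecting
// the copy to the original loop.
PeelResult ldist_copy(int loop_id) {
  return duplicate_loop(loop_id, false);
}
```

A defaulted trailing parameter keeps the opt-out confined to the one caller
that needs it, at the cost of a flag that may be removable later if ldist is
taught to work with LCSSA form.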

This preserves the current behaviour of ldist until I can dive into its
implementation more.  Hopefully that's ok for now.

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-linux-gnu
with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-loop-distribution.cc (copy_loop_before): Request no LCSSA.
	* tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional
	asserts.
	(slpeel_tree_duplicate_loop_to_edge_cfg): Keep LCSSA during peeling.
	(find_guard_arg): Look value up through explicit edge and original defs.
	(vect_do_peeling): Use it.
	(slpeel_update_phi_nodes_for_guard2): Take explicit exit edge.
	(slpeel_update_phi_nodes_for_lcssa, slpeel_update_phi_nodes_for_loops):
	Remove.
	* tree-vect-loop.cc (vect_create_epilog_for_reduction): Initialize phi.
	* tree-vectorizer.h (slpeel_tree_duplicate_loop_to_edge_cfg): Add
	optional param to turn off LCSSA mode.

--- inline copy of patch -- 
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index 902edc49ab588152a5b845f2c8a42a7e2a1d6080..14fb884d3e91d79785867debaee4956a2d5b0bb1 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -950,7 +950,7 @@ copy_loop_before (class loop *loop, bool redirect_lc_phi_defs)
 
   initialize_original_copy_tables ();
   res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, single_exit (loop), NULL,
-						NULL, preheader, NULL);
+						NULL, preheader, NULL, false);
   gcc_assert (res != NULL);
 
   /* When a not last partition is supposed to keep the LC PHIs computed
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 77f8e668bcc8beca99ba4052e1b12e0d17300262..0e8c0be5384aab2399ed93966e7bf4918f6c87a5 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -252,6 +252,9 @@ adjust_phi_and_debug_stmts (gimple *update_phi, edge e, tree new_def)
 {
   tree orig_def = PHI_ARG_DEF_FROM_EDGE (update_phi, e);
 
+  gcc_assert (TREE_CODE (orig_def) != SSA_NAME
+	      || orig_def != new_def);
+
   SET_PHI_ARG_DEF (update_phi, e->dest_idx, new_def);
 
   if (MAY_HAVE_DEBUG_BIND_STMTS)
@@ -1445,12 +1448,19 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to)
    on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
    non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
    basic blocks from SCALAR_LOOP instead of LOOP, but to either the
-   entry or exit of LOOP.  */
+   entry or exit of LOOP.  If FLOW_LOOPS then connect LOOP to SCALAR_LOOP as a
+   continuation.  This is correct for cases where one loop continues from the
+   other like in the vectorizer, but not true for uses in e.g. loop distribution
+   where the loop is duplicated and then modified.
+
+   If UPDATED_DOMS is not NULL it is updated with the list of basic blocks whose
+   dominators were updated during the peeling.  */
 
 class loop *
 slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 					class loop *scalar_loop,
-					edge scalar_exit, edge e, edge *new_e)
+					edge scalar_exit, edge e, edge *new_e,
+					bool flow_loops)
 {
   class loop *new_loop;
   basic_block *new_bbs, *bbs, *pbbs;
@@ -1481,6 +1491,8 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 	    scalar_exit	= exit;
 	    break;
 	  }
+
+      gcc_assert (scalar_exit);
     }
 
   bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
@@ -1513,6 +1525,8 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
   exit = loop_exit;
   basic_block new_preheader = new_bbs[0];
 
+  gcc_assert (new_exit);
+
   /* Record the new loop exit information.  new_loop doesn't have SCEV data and
      so we must initialize the exit information.  */
   if (new_e)
@@ -1551,6 +1565,19 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
   for (unsigned i = (at_exit ? 0 : 1); i < scalar_loop->num_nodes + 1; i++)
     rename_variables_in_bb (new_bbs[i], duplicate_outer_loop);
 
+  /* Rename the exit uses.  */
+  for (edge exit : get_loop_exit_edges (new_loop))
+    for (auto gsi = gsi_start_phis (exit->dest);
+	 !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	tree orig_def = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), exit);
+	rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), exit));
+	if (MAY_HAVE_DEBUG_BIND_STMTS)
+	  adjust_debug_stmts (orig_def, PHI_RESULT (gsi.phi ()), exit->dest);
+      }
+
+  /* This condition happens when the loop has been versioned. e.g. due to ifcvt
+     versioning the loop.  */
   if (scalar_loop != loop)
     {
       /* If we copied from SCALAR_LOOP rather than LOOP, SSA_NAMEs from
@@ -1564,28 +1591,83 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 						EDGE_SUCC (loop->latch, 0));
     }
 
+  auto loop_exits = get_loop_exit_edges (loop);
+  auto_vec<basic_block> doms;
+
   if (at_exit) /* Add the loop copy at exit.  */
     {
-      if (scalar_loop != loop)
+      if (scalar_loop != loop && new_exit->dest != exit_dest)
 	{
-	  gphi_iterator gsi;
 	  new_exit = redirect_edge_and_branch (new_exit, exit_dest);
+	  flush_pending_stmts (new_exit);
+	}
 
-	  for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi);
-	       gsi_next (&gsi))
-	    {
-	      gphi *phi = gsi.phi ();
-	      tree orig_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
-	      location_t orig_locus
-		= gimple_phi_arg_location_from_edge (phi, e);
+      auto_vec <gimple *> new_phis;
+      hash_map <tree, tree> new_phi_args;
+      /* First create the empty phi nodes so that when we flush the
+	 statements they can be filled in.   However because there is no order
+	 between the PHI nodes in the exits and the loop headers we need to
+	 order them based on the order of the two headers.  First record the new
+	 phi nodes.  */
+      for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
+	   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
+	{
+	  gimple *from_phi = gsi_stmt (gsi_from);
+	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
+	  gphi *res = create_phi_node (new_res, new_preheader);
+	  new_phis.safe_push (res);
+	}
 
-	      add_phi_arg (phi, orig_arg, new_exit, orig_locus);
+      /* Then redirect the edges and flush the changes.  This writes out the new
+	 SSA names.  */
+      for (edge exit : loop_exits)
+	{
+	  edge e = redirect_edge_and_branch (exit, new_preheader);
+	  flush_pending_stmts (e);
+	}
+
+      /* Record the new SSA names in the cache so that we can skip materializing
+	 them again when we fill in the rest of the LCSSA variables.  */
+      for (auto phi : new_phis)
+	{
+	  tree new_arg = gimple_phi_arg (phi, 0)->def;
+	  new_phi_args.put (new_arg, gimple_phi_result (phi));
+	}
+
+      /* Copy the current loop LC PHI nodes between the original loop exit
+	 block and the new loop header.  This allows us to later split the
+	 preheader block and still find the right LC nodes.  */
+      edge latch_new = single_succ_edge (new_preheader);
+      for (auto gsi_from = gsi_start_phis (loop->header),
+	   gsi_to = gsi_start_phis (new_loop->header);
+	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+	   gsi_next (&gsi_from), gsi_next (&gsi_to))
+	{
+	  gimple *from_phi = gsi_stmt (gsi_from);
+	  gimple *to_phi = gsi_stmt (gsi_to);
+	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
+						loop_latch_edge (loop));
+
+	  /* Check if we've already created a new phi node during edge
+	     redirection.  If we have, only propagate the value downwards.  */
+	  if (tree *res = new_phi_args.get (new_arg))
+	    {
+	      adjust_phi_and_debug_stmts (to_phi, latch_new, *res);
+	      continue;
 	    }
+
+	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
+	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
+
+	  /* Main loop exit should use the final iter value.  */
+	  add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
+
+	  adjust_phi_and_debug_stmts (to_phi, latch_new, new_res);
 	}
-      redirect_edge_and_branch_force (e, new_preheader);
-      flush_pending_stmts (e);
+
       set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
-      if (was_imm_dom || duplicate_outer_loop)
+
+      if ((was_imm_dom || duplicate_outer_loop))
 	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
 
       /* And remove the non-necessary forwarder again.  Keep the other
@@ -1598,6 +1680,22 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
     }
   else /* Add the copy at entry.  */
     {
+      /* Copy the current loop LC PHI nodes between the original loop exit
+	 block and the new loop header.  This allows us to later split the
+	 preheader block and still find the right LC nodes.  */
+      for (auto gsi_from = gsi_start_phis (new_loop->header),
+	   gsi_to = gsi_start_phis (loop->header);
+	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+	   gsi_next (&gsi_from), gsi_next (&gsi_to))
+	{
+	  gimple *from_phi = gsi_stmt (gsi_from);
+	  gimple *to_phi = gsi_stmt (gsi_to);
+	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
+						loop_latch_edge (new_loop));
+	  adjust_phi_and_debug_stmts (to_phi, loop_preheader_edge (loop),
+				      new_arg);
+	}
+
       if (scalar_loop != loop)
 	{
 	  /* Remove the non-necessary forwarder of scalar_loop again.  */
@@ -1627,29 +1725,6 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 			       loop_preheader_edge (new_loop)->src);
     }
 
-  if (scalar_loop != loop)
-    {
-      /* Update new_loop->header PHIs, so that on the preheader
-	 edge they are the ones from loop rather than scalar_loop.  */
-      gphi_iterator gsi_orig, gsi_new;
-      edge orig_e = loop_preheader_edge (loop);
-      edge new_e = loop_preheader_edge (new_loop);
-
-      for (gsi_orig = gsi_start_phis (loop->header),
-	   gsi_new = gsi_start_phis (new_loop->header);
-	   !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
-	   gsi_next (&gsi_orig), gsi_next (&gsi_new))
-	{
-	  gphi *orig_phi = gsi_orig.phi ();
-	  gphi *new_phi = gsi_new.phi ();
-	  tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
-	  location_t orig_locus
-	    = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
-
-	  add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
-	}
-    }
-
   free (new_bbs);
   free (bbs);
 
@@ -2579,139 +2654,36 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
 
 /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP,
    this function searches for the corresponding lcssa phi node in exit
-   bb of LOOP.  If it is found, return the phi result; otherwise return
-   NULL.  */
+   bb of LOOP following the LCSSA_EDGE to the exit node.  If it is found,
+   return the phi result; otherwise return NULL.  */
 
 static tree
 find_guard_arg (class loop *loop ATTRIBUTE_UNUSED,
 		class loop *epilog ATTRIBUTE_UNUSED,
-		const_edge e, gphi *lcssa_phi)
+		const_edge e, gphi *lcssa_phi, int lcssa_edge = 0)
 {
   gphi_iterator gsi;
 
-  gcc_assert (single_pred_p (e->dest));
   for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
     {
       gphi *phi = gsi.phi ();
-      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
-			   PHI_ARG_DEF (lcssa_phi, 0), 0))
-	return PHI_RESULT (phi);
-    }
-  return NULL_TREE;
-}
-
-/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates FIRST/SECOND
-   from SECOND/FIRST and puts it at the original loop's preheader/exit
-   edge, the two loops are arranged as below:
-
-       preheader_a:
-     first_loop:
-       header_a:
-	 i_1 = PHI<i_0, i_2>;
-	 ...
-	 i_2 = i_1 + 1;
-	 if (cond_a)
-	   goto latch_a;
-	 else
-	   goto between_bb;
-       latch_a:
-	 goto header_a;
-
-       between_bb:
-	 ;; i_x = PHI<i_2>;   ;; LCSSA phi node to be created for FIRST,
-
-     second_loop:
-       header_b:
-	 i_3 = PHI<i_0, i_4>; ;; Use of i_0 to be replaced with i_x,
-				 or with i_2 if no LCSSA phi is created
-				 under condition of CREATE_LCSSA_FOR_IV_PHIS.
-	 ...
-	 i_4 = i_3 + 1;
-	 if (cond_b)
-	   goto latch_b;
-	 else
-	   goto exit_bb;
-       latch_b:
-	 goto header_b;
-
-       exit_bb:
-
-   This function creates loop closed SSA for the first loop; update the
-   second loop's PHI nodes by replacing argument on incoming edge with the
-   result of newly created lcssa PHI nodes.  IF CREATE_LCSSA_FOR_IV_PHIS
-   is false, Loop closed ssa phis will only be created for non-iv phis for
-   the first loop.
-
-   This function assumes exit bb of the first loop is preheader bb of the
-   second loop, i.e, between_bb in the example code.  With PHIs updated,
-   the second loop will execute rest iterations of the first.  */
-
-static void
-slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
-				   class loop *first, edge first_loop_e,
-				   class loop *second, edge second_loop_e,
-				   bool create_lcssa_for_iv_phis)
-{
-  gphi_iterator gsi_update, gsi_orig;
-  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-
-  edge first_latch_e = EDGE_SUCC (first->latch, 0);
-  edge second_preheader_e = loop_preheader_edge (second);
-  basic_block between_bb = first_loop_e->dest;
-
-  gcc_assert (between_bb == second_preheader_e->src);
-  gcc_assert (single_pred_p (between_bb) && single_succ_p (between_bb));
-  /* Either the first loop or the second is the loop to be vectorized.  */
-  gcc_assert (loop == first || loop == second);
-
-  for (gsi_orig = gsi_start_phis (first->header),
-       gsi_update = gsi_start_phis (second->header);
-       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
-       gsi_next (&gsi_orig), gsi_next (&gsi_update))
-    {
-      gphi *orig_phi = gsi_orig.phi ();
-      gphi *update_phi = gsi_update.phi ();
-
-      tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
-      /* Generate lcssa PHI node for the first loop.  */
-      gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
-      stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt (vect_phi);
-      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info))
+      /* Nested loops with multiple exits can have different no# phi node
+	arguments between the main loop and epilog as epilog falls to the
+	second loop.  */
+      if (gimple_phi_num_args (phi) > e->dest_idx)
 	{
-	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
-	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
-	  add_phi_arg (lcssa_phi, arg, first_loop_e, UNKNOWN_LOCATION);
-	  arg = new_res;
-	}
-
-      /* Update PHI node in the second loop by replacing arg on the loop's
-	 incoming edge.  */
-      adjust_phi_and_debug_stmts (update_phi, second_preheader_e, arg);
-    }
-
-  /* For epilogue peeling we have to make sure to copy all LC PHIs
-     for correct vectorization of live stmts.  */
-  if (loop == first)
-    {
-      basic_block orig_exit = second_loop_e->dest;
-      for (gsi_orig = gsi_start_phis (orig_exit);
-	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
-	{
-	  gphi *orig_phi = gsi_orig.phi ();
-	  tree orig_arg = PHI_ARG_DEF (orig_phi, 0);
-	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p  (orig_arg))
-	    continue;
-
-	  const_edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
-	  /* Already created in the above loop.   */
-	  if (find_guard_arg (first, second, exit_e, orig_phi))
+	 tree var = PHI_ARG_DEF (phi, e->dest_idx);
+	 if (TREE_CODE (var) != SSA_NAME)
 	    continue;
-
-	  tree new_res = copy_ssa_name (orig_arg);
-	  gphi *lcphi = create_phi_node (new_res, between_bb);
-	  add_phi_arg (lcphi, orig_arg, first_loop_e, UNKNOWN_LOCATION);
+	 tree def = get_current_def (var);
+	 if (!def)
+	   continue;
+	 if (operand_equal_p (def,
+			      PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
+	   return PHI_RESULT (phi);
 	}
     }
+  return NULL_TREE;
 }
 
 /* Function slpeel_add_loop_guard adds guard skipping from the beginning
@@ -2796,11 +2768,11 @@ slpeel_update_phi_nodes_for_guard1 (class loop *skip_loop,
     }
 }
 
-/* LOOP and EPILOG are two consecutive loops in CFG and EPILOG is copied
-   from LOOP.  Function slpeel_add_loop_guard adds guard skipping from a
-   point between the two loops to the end of EPILOG.  Edges GUARD_EDGE
-   and MERGE_EDGE are the two pred edges of merge_bb at the end of EPILOG.
-   The CFG looks like:
+/* LOOP and EPILOG are two consecutive loops in CFG connected by LOOP_EXIT edge
+   and EPILOG is copied from LOOP.  Function slpeel_add_loop_guard adds guard
+   skipping from a point between the two loops to the end of EPILOG.  Edges
+   GUARD_EDGE and MERGE_EDGE are the two pred edges of merge_bb at the end of
+   EPILOG.  The CFG looks like:
 
      loop:
        header_a:
@@ -2851,6 +2823,7 @@ slpeel_update_phi_nodes_for_guard1 (class loop *skip_loop,
 
 static void
 slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
+				    const_edge loop_exit,
 				    edge guard_edge, edge merge_edge)
 {
   gphi_iterator gsi;
@@ -2859,13 +2832,11 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
   gcc_assert (single_succ_p (merge_bb));
   edge e = single_succ_edge (merge_bb);
   basic_block exit_bb = e->dest;
-  gcc_assert (single_pred_p (exit_bb));
-  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
 
   for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
     {
       gphi *update_phi = gsi.phi ();
-      tree old_arg = PHI_ARG_DEF (update_phi, 0);
+      tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
 
       tree merge_arg = NULL_TREE;
 
@@ -2877,8 +2848,8 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
       if (!merge_arg)
 	merge_arg = old_arg;
 
-      tree guard_arg
-	= find_guard_arg (loop, epilog, single_exit (loop), update_phi);
+      tree guard_arg = find_guard_arg (loop, epilog, loop_exit,
+				       update_phi, e->dest_idx);
       /* If the var is live after loop but not a reduction, we simply
 	 use the old arg.  */
       if (!guard_arg)
@@ -2898,21 +2869,6 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
     }
 }
 
-/* EPILOG loop is duplicated from the original loop for vectorizing,
-   the arg of its loop closed ssa PHI needs to be updated.  */
-
-static void
-slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
-{
-  gphi_iterator gsi;
-  basic_block exit_bb = single_exit (epilog)->dest;
-
-  gcc_assert (single_pred_p (exit_bb));
-  edge e = EDGE_PRED (exit_bb, 0);
-  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
-    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
-}
-
 /* LOOP_VINFO is an epilogue loop whose corresponding main loop can be skipped.
    Return a value that equals:
 
@@ -3255,8 +3211,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 						       e, &prolog_e);
       gcc_assert (prolog);
       prolog->force_vectorize = false;
-      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, prolog_e, loop,
-					 exit_e, true);
+
       first_loop = prolog;
       reset_original_copy_tables ();
 
@@ -3336,8 +3291,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
       LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo) = new_epilog_e;
       gcc_assert (epilog);
       epilog->force_vectorize = false;
-      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, e, epilog,
-					 new_epilog_e, false);
       bb_before_epilog = loop_preheader_edge (epilog)->src;
 
       /* Scalar version loop may be preferred.  In this case, add guard
@@ -3430,7 +3383,9 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 					   irred_flag);
 	  if (vect_epilogues)
 	    epilogue_vinfo->skip_this_loop_edge = guard_e;
-	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e, epilog_e);
+	  edge main_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
+	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, main_iv, guard_e,
+					      epilog_e);
 	  /* Only need to handle basic block before epilog loop if it's not
 	     the guard_bb, which is the case when skip_vector is true.  */
 	  if (guard_bb != bb_before_epilog)
@@ -3441,8 +3396,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	    }
 	  scale_loop_profile (epilog, prob_epilog, -1);
 	}
-      else
-	slpeel_update_phi_nodes_for_lcssa (epilog);
 
       unsigned HOST_WIDE_INT bound;
       if (bound_scalar.is_constant (&bound))
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index f1caa5f207d3b13da58c3a313b11d1ef98374349..327cab0f736da7f1bd3e024d666df46ef9208107 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5877,7 +5877,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
   basic_block exit_bb;
   tree scalar_dest;
   tree scalar_type;
-  gimple *new_phi = NULL, *phi;
+  gimple *new_phi = NULL, *phi = NULL;
   gimple_stmt_iterator exit_gsi;
   tree new_temp = NULL_TREE, new_name, new_scalar_dest;
   gimple *epilog_stmt = NULL;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 55b6771b271d5072fa1327d595e1dddb112cfdf6..25ceb6600673d71fd6012443403997e921066483 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2183,7 +2183,7 @@ extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
 					 const_edge);
 class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
 						    class loop *, edge,
-						    edge, edge *);
+						    edge, edge *, bool = true);
 class loop *vect_loop_versioning (loop_vec_info, gimple *);
 extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
 				    tree *, tree *, tree *, int, bool, bool,




-- 

[-- Attachment #2: rb17791.patch --]
[-- Type: text/plain, Size: 20686 bytes --]

diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index 902edc49ab588152a5b845f2c8a42a7e2a1d6080..14fb884d3e91d79785867debaee4956a2d5b0bb1 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -950,7 +950,7 @@ copy_loop_before (class loop *loop, bool redirect_lc_phi_defs)
 
   initialize_original_copy_tables ();
   res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, single_exit (loop), NULL,
-						NULL, preheader, NULL);
+						NULL, preheader, NULL, false);
   gcc_assert (res != NULL);
 
   /* When a not last partition is supposed to keep the LC PHIs computed
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 77f8e668bcc8beca99ba4052e1b12e0d17300262..0e8c0be5384aab2399ed93966e7bf4918f6c87a5 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -252,6 +252,9 @@ adjust_phi_and_debug_stmts (gimple *update_phi, edge e, tree new_def)
 {
   tree orig_def = PHI_ARG_DEF_FROM_EDGE (update_phi, e);
 
+  gcc_assert (TREE_CODE (orig_def) != SSA_NAME
+	      || orig_def != new_def);
+
   SET_PHI_ARG_DEF (update_phi, e->dest_idx, new_def);
 
   if (MAY_HAVE_DEBUG_BIND_STMTS)
@@ -1445,12 +1448,19 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to)
    on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
    non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
    basic blocks from SCALAR_LOOP instead of LOOP, but to either the
-   entry or exit of LOOP.  */
+   entry or exit of LOOP.  If FLOW_LOOPS then connect LOOP to SCALAR_LOOP as a
+   continuation.  This is correct for cases where one loop continues from the
+   other like in the vectorizer, but not true for uses in e.g. loop distribution
+   where the loop is duplicated and then modified.
+
+   If UPDATED_DOMS is not NULL it is updated with the list of basic blocks whose
+   dominators were updated during the peeling.  */
 
 class loop *
 slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 					class loop *scalar_loop,
-					edge scalar_exit, edge e, edge *new_e)
+					edge scalar_exit, edge e, edge *new_e,
+					bool flow_loops)
 {
   class loop *new_loop;
   basic_block *new_bbs, *bbs, *pbbs;
@@ -1481,6 +1491,8 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 	    scalar_exit	= exit;
 	    break;
 	  }
+
+      gcc_assert (scalar_exit);
     }
 
   bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
@@ -1513,6 +1525,8 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
   exit = loop_exit;
   basic_block new_preheader = new_bbs[0];
 
+  gcc_assert (new_exit);
+
   /* Record the new loop exit information.  new_loop doesn't have SCEV data and
      so we must initialize the exit information.  */
   if (new_e)
@@ -1551,6 +1565,19 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
   for (unsigned i = (at_exit ? 0 : 1); i < scalar_loop->num_nodes + 1; i++)
     rename_variables_in_bb (new_bbs[i], duplicate_outer_loop);
 
+  /* Rename the exit uses.  */
+  for (edge exit : get_loop_exit_edges (new_loop))
+    for (auto gsi = gsi_start_phis (exit->dest);
+	 !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	tree orig_def = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), exit);
+	rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), exit));
+	if (MAY_HAVE_DEBUG_BIND_STMTS)
+	  adjust_debug_stmts (orig_def, PHI_RESULT (gsi.phi ()), exit->dest);
+      }
+
+  /* This condition happens when the loop has been versioned. e.g. due to ifcvt
+     versioning the loop.  */
   if (scalar_loop != loop)
     {
       /* If we copied from SCALAR_LOOP rather than LOOP, SSA_NAMEs from
@@ -1564,28 +1591,83 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 						EDGE_SUCC (loop->latch, 0));
     }
 
+  auto loop_exits = get_loop_exit_edges (loop);
+  auto_vec<basic_block> doms;
+
   if (at_exit) /* Add the loop copy at exit.  */
     {
-      if (scalar_loop != loop)
+      if (scalar_loop != loop && new_exit->dest != exit_dest)
 	{
-	  gphi_iterator gsi;
 	  new_exit = redirect_edge_and_branch (new_exit, exit_dest);
+	  flush_pending_stmts (new_exit);
+	}
 
-	  for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi);
-	       gsi_next (&gsi))
-	    {
-	      gphi *phi = gsi.phi ();
-	      tree orig_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
-	      location_t orig_locus
-		= gimple_phi_arg_location_from_edge (phi, e);
+      auto_vec <gimple *> new_phis;
+      hash_map <tree, tree> new_phi_args;
+      /* First create the empty phi nodes so that when we flush the
+	 statements they can be filled in.   However because there is no order
+	 between the PHI nodes in the exits and the loop headers we need to
+	 order them based on the order of the two headers.  First record the new
+	 phi nodes.  */
+      for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
+	   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
+	{
+	  gimple *from_phi = gsi_stmt (gsi_from);
+	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
+	  gphi *res = create_phi_node (new_res, new_preheader);
+	  new_phis.safe_push (res);
+	}
 
-	      add_phi_arg (phi, orig_arg, new_exit, orig_locus);
+      /* Then redirect the edges and flush the changes.  This writes out the new
+	 SSA names.  */
+      for (edge exit : loop_exits)
+	{
+	  edge e = redirect_edge_and_branch (exit, new_preheader);
+	  flush_pending_stmts (e);
+	}
+
+      /* Record the new SSA names in the cache so that we can skip materializing
+	 them again when we fill in the rest of the LCSSA variables.  */
+      for (auto phi : new_phis)
+	{
+	  tree new_arg = gimple_phi_arg (phi, 0)->def;
+	  new_phi_args.put (new_arg, gimple_phi_result (phi));
+	}
+
+      /* Copy the current loop LC PHI nodes between the original loop exit
+	 block and the new loop header.  This allows us to later split the
+	 preheader block and still find the right LC nodes.  */
+      edge latch_new = single_succ_edge (new_preheader);
+      for (auto gsi_from = gsi_start_phis (loop->header),
+	   gsi_to = gsi_start_phis (new_loop->header);
+	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+	   gsi_next (&gsi_from), gsi_next (&gsi_to))
+	{
+	  gimple *from_phi = gsi_stmt (gsi_from);
+	  gimple *to_phi = gsi_stmt (gsi_to);
+	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
+						loop_latch_edge (loop));
+
+	  /* Check if we've already created a new phi node during edge
+	     redirection.  If we have, only propagate the value downwards.  */
+	  if (tree *res = new_phi_args.get (new_arg))
+	    {
+	      adjust_phi_and_debug_stmts (to_phi, latch_new, *res);
+	      continue;
 	    }
+
+	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
+	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
+
+	  /* Main loop exit should use the final iter value.  */
+	  add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
+
+	  adjust_phi_and_debug_stmts (to_phi, latch_new, new_res);
 	}
-      redirect_edge_and_branch_force (e, new_preheader);
-      flush_pending_stmts (e);
+
       set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
-      if (was_imm_dom || duplicate_outer_loop)
+
+      if ((was_imm_dom || duplicate_outer_loop))
 	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
 
       /* And remove the non-necessary forwarder again.  Keep the other
@@ -1598,6 +1680,22 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
     }
   else /* Add the copy at entry.  */
     {
+      /* Copy the current loop LC PHI nodes between the original loop exit
+	 block and the new loop header.  This allows us to later split the
+	 preheader block and still find the right LC nodes.  */
+      for (auto gsi_from = gsi_start_phis (new_loop->header),
+	   gsi_to = gsi_start_phis (loop->header);
+	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+	   gsi_next (&gsi_from), gsi_next (&gsi_to))
+	{
+	  gimple *from_phi = gsi_stmt (gsi_from);
+	  gimple *to_phi = gsi_stmt (gsi_to);
+	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
+						loop_latch_edge (new_loop));
+	  adjust_phi_and_debug_stmts (to_phi, loop_preheader_edge (loop),
+				      new_arg);
+	}
+
       if (scalar_loop != loop)
 	{
 	  /* Remove the non-necessary forwarder of scalar_loop again.  */
@@ -1627,29 +1725,6 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 			       loop_preheader_edge (new_loop)->src);
     }
 
-  if (scalar_loop != loop)
-    {
-      /* Update new_loop->header PHIs, so that on the preheader
-	 edge they are the ones from loop rather than scalar_loop.  */
-      gphi_iterator gsi_orig, gsi_new;
-      edge orig_e = loop_preheader_edge (loop);
-      edge new_e = loop_preheader_edge (new_loop);
-
-      for (gsi_orig = gsi_start_phis (loop->header),
-	   gsi_new = gsi_start_phis (new_loop->header);
-	   !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
-	   gsi_next (&gsi_orig), gsi_next (&gsi_new))
-	{
-	  gphi *orig_phi = gsi_orig.phi ();
-	  gphi *new_phi = gsi_new.phi ();
-	  tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
-	  location_t orig_locus
-	    = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
-
-	  add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
-	}
-    }
-
   free (new_bbs);
   free (bbs);
 
@@ -2579,139 +2654,36 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
 
 /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP,
    this function searches for the corresponding lcssa phi node in exit
-   bb of LOOP.  If it is found, return the phi result; otherwise return
-   NULL.  */
+   bb of LOOP following the LCSSA_EDGE to the exit node.  If it is found,
+   return the phi result; otherwise return NULL.  */
 
 static tree
 find_guard_arg (class loop *loop ATTRIBUTE_UNUSED,
 		class loop *epilog ATTRIBUTE_UNUSED,
-		const_edge e, gphi *lcssa_phi)
+		const_edge e, gphi *lcssa_phi, int lcssa_edge = 0)
 {
   gphi_iterator gsi;
 
-  gcc_assert (single_pred_p (e->dest));
   for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
     {
       gphi *phi = gsi.phi ();
-      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
-			   PHI_ARG_DEF (lcssa_phi, 0), 0))
-	return PHI_RESULT (phi);
-    }
-  return NULL_TREE;
-}
-
-/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates FIRST/SECOND
-   from SECOND/FIRST and puts it at the original loop's preheader/exit
-   edge, the two loops are arranged as below:
-
-       preheader_a:
-     first_loop:
-       header_a:
-	 i_1 = PHI<i_0, i_2>;
-	 ...
-	 i_2 = i_1 + 1;
-	 if (cond_a)
-	   goto latch_a;
-	 else
-	   goto between_bb;
-       latch_a:
-	 goto header_a;
-
-       between_bb:
-	 ;; i_x = PHI<i_2>;   ;; LCSSA phi node to be created for FIRST,
-
-     second_loop:
-       header_b:
-	 i_3 = PHI<i_0, i_4>; ;; Use of i_0 to be replaced with i_x,
-				 or with i_2 if no LCSSA phi is created
-				 under condition of CREATE_LCSSA_FOR_IV_PHIS.
-	 ...
-	 i_4 = i_3 + 1;
-	 if (cond_b)
-	   goto latch_b;
-	 else
-	   goto exit_bb;
-       latch_b:
-	 goto header_b;
-
-       exit_bb:
-
-   This function creates loop closed SSA for the first loop; update the
-   second loop's PHI nodes by replacing argument on incoming edge with the
-   result of newly created lcssa PHI nodes.  IF CREATE_LCSSA_FOR_IV_PHIS
-   is false, Loop closed ssa phis will only be created for non-iv phis for
-   the first loop.
-
-   This function assumes exit bb of the first loop is preheader bb of the
-   second loop, i.e, between_bb in the example code.  With PHIs updated,
-   the second loop will execute rest iterations of the first.  */
-
-static void
-slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
-				   class loop *first, edge first_loop_e,
-				   class loop *second, edge second_loop_e,
-				   bool create_lcssa_for_iv_phis)
-{
-  gphi_iterator gsi_update, gsi_orig;
-  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-
-  edge first_latch_e = EDGE_SUCC (first->latch, 0);
-  edge second_preheader_e = loop_preheader_edge (second);
-  basic_block between_bb = first_loop_e->dest;
-
-  gcc_assert (between_bb == second_preheader_e->src);
-  gcc_assert (single_pred_p (between_bb) && single_succ_p (between_bb));
-  /* Either the first loop or the second is the loop to be vectorized.  */
-  gcc_assert (loop == first || loop == second);
-
-  for (gsi_orig = gsi_start_phis (first->header),
-       gsi_update = gsi_start_phis (second->header);
-       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
-       gsi_next (&gsi_orig), gsi_next (&gsi_update))
-    {
-      gphi *orig_phi = gsi_orig.phi ();
-      gphi *update_phi = gsi_update.phi ();
-
-      tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
-      /* Generate lcssa PHI node for the first loop.  */
-      gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
-      stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt (vect_phi);
-      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info))
+      /* Nested loops with multiple exits can have a different number of PHI
+	node arguments between the main loop and epilog as epilog falls to the
+	second loop.  */
+      if (gimple_phi_num_args (phi) > e->dest_idx)
 	{
-	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
-	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
-	  add_phi_arg (lcssa_phi, arg, first_loop_e, UNKNOWN_LOCATION);
-	  arg = new_res;
-	}
-
-      /* Update PHI node in the second loop by replacing arg on the loop's
-	 incoming edge.  */
-      adjust_phi_and_debug_stmts (update_phi, second_preheader_e, arg);
-    }
-
-  /* For epilogue peeling we have to make sure to copy all LC PHIs
-     for correct vectorization of live stmts.  */
-  if (loop == first)
-    {
-      basic_block orig_exit = second_loop_e->dest;
-      for (gsi_orig = gsi_start_phis (orig_exit);
-	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
-	{
-	  gphi *orig_phi = gsi_orig.phi ();
-	  tree orig_arg = PHI_ARG_DEF (orig_phi, 0);
-	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p  (orig_arg))
-	    continue;
-
-	  const_edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
-	  /* Already created in the above loop.   */
-	  if (find_guard_arg (first, second, exit_e, orig_phi))
+	 tree var = PHI_ARG_DEF (phi, e->dest_idx);
+	 if (TREE_CODE (var) != SSA_NAME)
 	    continue;
-
-	  tree new_res = copy_ssa_name (orig_arg);
-	  gphi *lcphi = create_phi_node (new_res, between_bb);
-	  add_phi_arg (lcphi, orig_arg, first_loop_e, UNKNOWN_LOCATION);
+	 tree def = get_current_def (var);
+	 if (!def)
+	   continue;
+	 if (operand_equal_p (def,
+			      PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
+	   return PHI_RESULT (phi);
 	}
     }
+  return NULL_TREE;
 }
 
 /* Function slpeel_add_loop_guard adds guard skipping from the beginning
@@ -2796,11 +2768,11 @@ slpeel_update_phi_nodes_for_guard1 (class loop *skip_loop,
     }
 }
 
-/* LOOP and EPILOG are two consecutive loops in CFG and EPILOG is copied
-   from LOOP.  Function slpeel_add_loop_guard adds guard skipping from a
-   point between the two loops to the end of EPILOG.  Edges GUARD_EDGE
-   and MERGE_EDGE are the two pred edges of merge_bb at the end of EPILOG.
-   The CFG looks like:
+/* LOOP and EPILOG are two consecutive loops in CFG connected by LOOP_EXIT edge
+   and EPILOG is copied from LOOP.  Function slpeel_add_loop_guard adds guard
+   skipping from a point between the two loops to the end of EPILOG.  Edges
+   GUARD_EDGE and MERGE_EDGE are the two pred edges of merge_bb at the end of
+   EPILOG.  The CFG looks like:
 
      loop:
        header_a:
@@ -2851,6 +2823,7 @@ slpeel_update_phi_nodes_for_guard1 (class loop *skip_loop,
 
 static void
 slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
+				    const_edge loop_exit,
 				    edge guard_edge, edge merge_edge)
 {
   gphi_iterator gsi;
@@ -2859,13 +2832,11 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
   gcc_assert (single_succ_p (merge_bb));
   edge e = single_succ_edge (merge_bb);
   basic_block exit_bb = e->dest;
-  gcc_assert (single_pred_p (exit_bb));
-  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
 
   for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
     {
       gphi *update_phi = gsi.phi ();
-      tree old_arg = PHI_ARG_DEF (update_phi, 0);
+      tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
 
       tree merge_arg = NULL_TREE;
 
@@ -2877,8 +2848,8 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
       if (!merge_arg)
 	merge_arg = old_arg;
 
-      tree guard_arg
-	= find_guard_arg (loop, epilog, single_exit (loop), update_phi);
+      tree guard_arg = find_guard_arg (loop, epilog, loop_exit,
+				       update_phi, e->dest_idx);
       /* If the var is live after loop but not a reduction, we simply
 	 use the old arg.  */
       if (!guard_arg)
@@ -2898,21 +2869,6 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
     }
 }
 
-/* EPILOG loop is duplicated from the original loop for vectorizing,
-   the arg of its loop closed ssa PHI needs to be updated.  */
-
-static void
-slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
-{
-  gphi_iterator gsi;
-  basic_block exit_bb = single_exit (epilog)->dest;
-
-  gcc_assert (single_pred_p (exit_bb));
-  edge e = EDGE_PRED (exit_bb, 0);
-  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
-    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
-}
-
 /* LOOP_VINFO is an epilogue loop whose corresponding main loop can be skipped.
    Return a value that equals:
 
@@ -3255,8 +3211,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 						       e, &prolog_e);
       gcc_assert (prolog);
       prolog->force_vectorize = false;
-      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, prolog_e, loop,
-					 exit_e, true);
+
       first_loop = prolog;
       reset_original_copy_tables ();
 
@@ -3336,8 +3291,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
       LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo) = new_epilog_e;
       gcc_assert (epilog);
       epilog->force_vectorize = false;
-      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, e, epilog,
-					 new_epilog_e, false);
       bb_before_epilog = loop_preheader_edge (epilog)->src;
 
       /* Scalar version loop may be preferred.  In this case, add guard
@@ -3430,7 +3383,9 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 					   irred_flag);
 	  if (vect_epilogues)
 	    epilogue_vinfo->skip_this_loop_edge = guard_e;
-	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e, epilog_e);
+	  edge main_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
+	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, main_iv, guard_e,
+					      epilog_e);
 	  /* Only need to handle basic block before epilog loop if it's not
 	     the guard_bb, which is the case when skip_vector is true.  */
 	  if (guard_bb != bb_before_epilog)
@@ -3441,8 +3396,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	    }
 	  scale_loop_profile (epilog, prob_epilog, -1);
 	}
-      else
-	slpeel_update_phi_nodes_for_lcssa (epilog);
 
       unsigned HOST_WIDE_INT bound;
       if (bound_scalar.is_constant (&bound))
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index f1caa5f207d3b13da58c3a313b11d1ef98374349..327cab0f736da7f1bd3e024d666df46ef9208107 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5877,7 +5877,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
   basic_block exit_bb;
   tree scalar_dest;
   tree scalar_type;
-  gimple *new_phi = NULL, *phi;
+  gimple *new_phi = NULL, *phi = NULL;
   gimple_stmt_iterator exit_gsi;
   tree new_temp = NULL_TREE, new_name, new_scalar_dest;
   gimple *epilog_stmt = NULL;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 55b6771b271d5072fa1327d595e1dddb112cfdf6..25ceb6600673d71fd6012443403997e921066483 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2183,7 +2183,7 @@ extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
 					 const_edge);
 class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
 						    class loop *, edge,
-						    edge, edge *);
+						    edge, edge *, bool = true);
 class loop *vect_loop_versioning (loop_vec_info, gimple *);
 extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
 				    tree *, tree *, tree *, int, bool, bool,





* Re: [PATCH 1/3]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables
  2023-10-02  7:41 [PATCH 1/3]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables Tamar Christina
  2023-10-02  7:41 ` [PATCH 2/3]middle-end: updated niters analysis to handle multiple exits Tamar Christina
  2023-10-02  7:42 ` [PATCH 3/3]middle-end: maintain LCSSA throughout loop peeling Tamar Christina
@ 2023-10-09 13:35 ` Richard Biener
  2023-10-11 10:45   ` Tamar Christina
  2 siblings, 1 reply; 12+ messages in thread
From: Richard Biener @ 2023-10-09 13:35 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 2 Oct 2023, Tamar Christina wrote:

> Hi All,
> 
> This is extracted out of the patch series to support early break vectorization
> in order to simplify the review of that patch series.
> 
> The goal of this one is to separate out the refactoring from the new
> functionality.
> 
> This first patch separates out the vectorizer's definition of an exit to their
> own values inside loop_vinfo.  During vectorization we can have three separate
> copies for each loop: scalar, vectorized, epilogue.  The scalar loop can also be
> the versioned loop before peeling.
> 
> Because of this we track 3 different exits inside loop_vinfo corresponding to
> each of these loops.  Additionally each function that uses an exit, when not
> obviously clear which exit is needed will now take the exit explicitly as an
> argument.
> 
> This is because often times the callers switch the loops being passed around.
> While the caller knows which loops it is, the callee does not.
> 
> For now the loop exits are simply initialized to same value as before determined
> by single_exit (..).
> 
> No change in functionality is expected throughout this patch series.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-linux-gnu, and
> no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-loop-distribution.cc (copy_loop_before): Pass exit explicitly.
> 	(loop_distribution::distribute_loop): Bail out of not single exit.
> 	* tree-scalar-evolution.cc (get_loop_exit_condition): New.
> 	* tree-scalar-evolution.h (get_loop_exit_condition): New.
> 	* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment): Pass exit
> 	explicitly.
> 	* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors,
> 	vect_set_loop_condition_partial_vectors_avx512,
> 	vect_set_loop_condition_normal, vect_set_loop_condition): Explicitly
> 	take exit.
> 	(slpeel_tree_duplicate_loop_to_edge_cfg): Explicitly take exit and
> 	return new peeled corresponding peeled exit.
> 	(slpeel_can_duplicate_loop_p): Explicitly take exit.
> 	(find_loop_location): Handle not knowing an explicit exit.
> 	(vect_update_ivs_after_vectorizer, vect_gen_vector_loop_niters_mult_vf,
> 	find_guard_arg, slpeel_update_phi_nodes_for_loops,
> 	slpeel_update_phi_nodes_for_guard2): Use new exits.
> 	(vect_do_peeling): Update bookkeeping to keep track of exits.
> 	* tree-vect-loop.cc (vect_get_loop_niters): Explicitly take exit to
> 	analyze.
> 	(vec_init_loop_exit_info): New.
> 	(_loop_vec_info::_loop_vec_info): Initialize vec_loop_iv,
> 	vec_epilogue_loop_iv, scalar_loop_iv.
> 	(vect_analyze_loop_form): Initialize exits.
> 	(vect_create_loop_vinfo): Set main exit.
> 	(vect_create_epilog_for_reduction, vectorizable_live_operation,
> 	vect_transform_loop): Use it.
> 	(scale_profile_for_vect_loop): Explicitly take exit to scale.
> 	* tree-vectorizer.cc (set_uid_loop_bbs): Initialize loop exit.
> 	* tree-vectorizer.h (LOOP_VINFO_IV_EXIT, LOOP_VINFO_EPILOGUE_IV_EXIT,
> 	LOOP_VINFO_SCALAR_IV_EXIT): New.
> 	(struct loop_vec_info): Add vec_loop_iv, vec_epilogue_loop_iv,
> 	scalar_loop_iv.
> 	(vect_set_loop_condition, slpeel_can_duplicate_loop_p,
> 	slpeel_tree_duplicate_loop_to_edge_cfg): Take explicit exits.
> 	(vec_init_loop_exit_info): New.
> 	(struct vect_loop_form_info): Add loop_exit.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
> index a28470b66ea935741a61fb73961ed7c927543a3d..902edc49ab588152a5b845f2c8a42a7e2a1d6080 100644
> --- a/gcc/tree-loop-distribution.cc
> +++ b/gcc/tree-loop-distribution.cc
> @@ -949,7 +949,8 @@ copy_loop_before (class loop *loop, bool redirect_lc_phi_defs)
>    edge preheader = loop_preheader_edge (loop);
>  
>    initialize_original_copy_tables ();
> -  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader);
> +  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, single_exit (loop), NULL,
> +						NULL, preheader, NULL);
>    gcc_assert (res != NULL);
>  
>    /* When a not last partition is supposed to keep the LC PHIs computed
> @@ -3043,6 +3044,24 @@ loop_distribution::distribute_loop (class loop *loop,
>        return 0;
>      }
>  
> +  /* Loop distribution only does prologue peeling but we still need to
> +     initialize loop exit information.  However we only support single exits at
> +     the moment.  As such, should exit information not have been provided and we
> +     have more than one exit, bail out.  */
> +  if (!single_exit (loop))
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +	fprintf (dump_file,
> +		 "Loop %d not distributed: too many exits.\n",
> +		 loop->num);
> +
> +      free_rdg (rdg);
> +      loop_nest.release ();
> +      free_data_refs (datarefs_vec);
> +      delete ddrs_table;
> +      return 0;
> +    }

We are checking single_exit in the caller:

unsigned int
loop_distribution::execute (function *fun)
{
...
  /* We can at the moment only distribute non-nested loops, thus restrict
     walking to innermost loops.  */
  for (auto loop : loops_list (cfun, LI_ONLY_INNERMOST))
    {
      /* Don't distribute multiple exit edges loop, or cold loop when
         not doing pattern detection.  */
      if (!single_exit (loop)
          || (!flag_tree_loop_distribute_patterns
              && !optimize_loop_for_speed_p (loop)))
        continue;

so this hunk shouldn't be necessary.
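
A minimal sketch of that reasoning, assuming toy stand-in types rather than
GCC's real `loop` and `edge` (every `toy_*` name below is hypothetical and
exists only for illustration).  The caller filters out loops without a
single exit, so the callee can state the precondition as an assertion
instead of repeating the bail-out path:

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for GCC's edge and loop types (hypothetical, not the
// real definitions).
struct toy_edge { int dest; };
struct toy_loop
{
  int num;
  std::vector<toy_edge> exits;
};

// Same contract as single_exit: the sole exit edge, or nullptr when the
// loop has zero or several exits.
const toy_edge *
toy_single_exit (const toy_loop &loop)
{
  return loop.exits.size () == 1 ? &loop.exits[0] : nullptr;
}

// Callee: relies on the caller's filter, so the single-exit requirement
// is only asserted, not re-checked with a cleanup path.
int
toy_distribute_loop (const toy_loop &loop)
{
  assert (toy_single_exit (loop));
  return 1;
}

// Caller: skips every loop that does not have exactly one exit and
// returns the number of loops handed to the callee.
int
toy_execute (const std::vector<toy_loop> &loops)
{
  int distributed = 0;
  for (const toy_loop &loop : loops)
    {
      if (!toy_single_exit (loop))
	continue;
      distributed += toy_distribute_loop (loop);
    }
  return distributed;
}
```

With this shape a multi-exit loop never reaches the callee, which is why a
second check there would be dead code.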

> +
>    data_reference_p dref;
>    for (i = 0; datarefs_vec.iterate (i, &dref); ++i)
>      dref->aux = (void *) (uintptr_t) i;
> diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
> index c58a8a16e81573aada38e912b7c58b3e1b23b66d..f35ca1bded0b841179e4958645d264ad23684019 100644
> --- a/gcc/tree-scalar-evolution.h
> +++ b/gcc/tree-scalar-evolution.h
> @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
>  
>  extern tree number_of_latch_executions (class loop *);
>  extern gcond *get_loop_exit_condition (const class loop *);
> +extern gcond *get_loop_exit_condition (const_edge);
>  
>  extern void scev_initialize (void);
>  extern bool scev_initialized_p (void);
> diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> index 3fb6951e6085352c027d32c3548246042b98b64b..7cafe5ce576079921e380aaab5c5c4aa84cea372 100644
> --- a/gcc/tree-scalar-evolution.cc
> +++ b/gcc/tree-scalar-evolution.cc
> @@ -1292,9 +1292,17 @@ scev_dfs::follow_ssa_edge_expr (gimple *at_stmt, tree expr,
>  
>  gcond *
>  get_loop_exit_condition (const class loop *loop)
> +{
> +  return get_loop_exit_condition (single_exit (loop));
> +}
> +
> +/* If the statement just before the EXIT_EDGE contains a condition then
> +   return the condition, otherwise NULL. */
> +
> +gcond *
> +get_loop_exit_condition (const_edge exit_edge)
>  {
>    gcond *res = NULL;
> -  edge exit_edge = single_exit (loop);
>  
>    if (dump_file && (dump_flags & TDF_SCEV))
>      fprintf (dump_file, "(get_loop_exit_condition \n  ");
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index 40ab568fe355964b878d770010aa9eeaef63eeac..9607a9fb25da26591ffd8071a02495f2042e0579 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -2078,7 +2078,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
>  
>    /* Check if we can possibly peel the loop.  */
>    if (!vect_can_advance_ivs_p (loop_vinfo)
> -      || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
> +      || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
> +				       LOOP_VINFO_IV_EXIT (loop_vinfo))
>        || loop->inner)
>      do_peeling = false;
>  
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 09641901ff1e5c03dd07ab6f85dd67288f940ea2..e06717272aafc6d31cbdcb94840ac25de616da6d 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -803,7 +803,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
>     final gcond.  */
>  
>  static gcond *
> -vect_set_loop_condition_partial_vectors (class loop *loop,
> +vect_set_loop_condition_partial_vectors (class loop *loop, edge exit_edge,
>  					 loop_vec_info loop_vinfo, tree niters,
>  					 tree final_iv, bool niters_maybe_zero,
>  					 gimple_stmt_iterator loop_cond_gsi)
> @@ -904,7 +904,6 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
>    add_header_seq (loop, header_seq);
>  
>    /* Get a boolean result that tells us whether to iterate.  */
> -  edge exit_edge = single_exit (loop);
>    gcond *cond_stmt;
>    if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
>        && !LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
> @@ -935,7 +934,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
>    if (final_iv)
>      {
>        gassign *assign = gimple_build_assign (final_iv, orig_niters);
> -      gsi_insert_on_edge_immediate (single_exit (loop), assign);
> +      gsi_insert_on_edge_immediate (exit_edge, assign);
>      }
>  
>    return cond_stmt;
> @@ -953,6 +952,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
>  
>  static gcond *
>  vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
> +					 edge exit_edge,
>  					 loop_vec_info loop_vinfo, tree niters,
>  					 tree final_iv,
>  					 bool niters_maybe_zero,
> @@ -1144,7 +1144,6 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
>    add_preheader_seq (loop, preheader_seq);
>  
>    /* Adjust the exit test using the decrementing IV.  */
> -  edge exit_edge = single_exit (loop);
>    tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : GT_EXPR;
>    /* When we peel for alignment with niter_skip != 0 this can
>       cause niter + niter_skip to wrap and since we are comparing the
> @@ -1183,7 +1182,8 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
>     loop handles exactly VF scalars per iteration.  */
>  
>  static gcond *
> -vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
> +vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
> +				class loop *loop, tree niters, tree step,
>  				tree final_iv, bool niters_maybe_zero,
>  				gimple_stmt_iterator loop_cond_gsi)
>  {
> @@ -1191,13 +1191,12 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
>    gcond *cond_stmt;
>    gcond *orig_cond;
>    edge pe = loop_preheader_edge (loop);
> -  edge exit_edge = single_exit (loop);
>    gimple_stmt_iterator incr_gsi;
>    bool insert_after;
>    enum tree_code code;
>    tree niters_type = TREE_TYPE (niters);
>  
> -  orig_cond = get_loop_exit_condition (loop);
> +  orig_cond = get_loop_exit_condition (exit_edge);
>    gcc_assert (orig_cond);
>    loop_cond_gsi = gsi_for_stmt (orig_cond);
>  
> @@ -1305,19 +1304,18 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
>    if (final_iv)
>      {
>        gassign *assign;
> -      edge exit = single_exit (loop);
> -      gcc_assert (single_pred_p (exit->dest));
> +      gcc_assert (single_pred_p (exit_edge->dest));
>        tree phi_dest
>  	= integer_zerop (init) ? final_iv : copy_ssa_name (indx_after_incr);
>        /* Make sure to maintain LC SSA form here and elide the subtraction
>  	 if the value is zero.  */
> -      gphi *phi = create_phi_node (phi_dest, exit->dest);
> -      add_phi_arg (phi, indx_after_incr, exit, UNKNOWN_LOCATION);
> +      gphi *phi = create_phi_node (phi_dest, exit_edge->dest);
> +      add_phi_arg (phi, indx_after_incr, exit_edge, UNKNOWN_LOCATION);
>        if (!integer_zerop (init))
>  	{
>  	  assign = gimple_build_assign (final_iv, MINUS_EXPR,
>  					phi_dest, init);
> -	  gimple_stmt_iterator gsi = gsi_after_labels (exit->dest);
> +	  gimple_stmt_iterator gsi = gsi_after_labels (exit_edge->dest);
>  	  gsi_insert_before (&gsi, assign, GSI_SAME_STMT);
>  	}
>      }
> @@ -1348,29 +1346,33 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
>     Assumption: the exit-condition of LOOP is the last stmt in the loop.  */
>  
>  void
> -vect_set_loop_condition (class loop *loop, loop_vec_info loop_vinfo,
> +vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo,
>  			 tree niters, tree step, tree final_iv,
>  			 bool niters_maybe_zero)
>  {
>    gcond *cond_stmt;
> -  gcond *orig_cond = get_loop_exit_condition (loop);
> +  gcond *orig_cond = get_loop_exit_condition (loop_e);
>    gimple_stmt_iterator loop_cond_gsi = gsi_for_stmt (orig_cond);
>  
>    if (loop_vinfo && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
>      {
>        if (LOOP_VINFO_PARTIAL_VECTORS_STYLE (loop_vinfo) == vect_partial_vectors_avx512)
> -	cond_stmt = vect_set_loop_condition_partial_vectors_avx512 (loop, loop_vinfo,
> +	cond_stmt = vect_set_loop_condition_partial_vectors_avx512 (loop, loop_e,
> +								    loop_vinfo,
>  								    niters, final_iv,
>  								    niters_maybe_zero,
>  								    loop_cond_gsi);
>        else
> -	cond_stmt = vect_set_loop_condition_partial_vectors (loop, loop_vinfo,
> +	cond_stmt = vect_set_loop_condition_partial_vectors (loop, loop_e,
> +							     loop_vinfo,
>  							     niters, final_iv,
>  							     niters_maybe_zero,
>  							     loop_cond_gsi);
>      }
>    else
> -    cond_stmt = vect_set_loop_condition_normal (loop, niters, step, final_iv,
> +    cond_stmt = vect_set_loop_condition_normal (loop_vinfo, loop_e, loop,
> +						niters,
> +						step, final_iv,
>  						niters_maybe_zero,
>  						loop_cond_gsi);
>  
> @@ -1439,7 +1441,6 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to)
>  		     get_current_def (PHI_ARG_DEF_FROM_EDGE (from_phi, from)));
>  }
>  
> -
>  /* Given LOOP this function generates a new copy of it and puts it
>     on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
>     non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
> @@ -1447,8 +1448,9 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to)
>     entry or exit of LOOP.  */
>  
>  class loop *
> -slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> -					class loop *scalar_loop, edge e)
> +slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
> +					class loop *scalar_loop,
> +					edge scalar_exit, edge e, edge *new_e)
>  {
>    class loop *new_loop;
>    basic_block *new_bbs, *bbs, *pbbs;
> @@ -1458,13 +1460,16 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
>    edge exit, new_exit;
>    bool duplicate_outer_loop = false;
>  
> -  exit = single_exit (loop);
> +  exit = loop_exit;
>    at_exit = (e == exit);
>    if (!at_exit && e != loop_preheader_edge (loop))
>      return NULL;
>  
>    if (scalar_loop == NULL)
> -    scalar_loop = loop;
> +    {
> +      scalar_loop = loop;
> +      scalar_exit = loop_exit;
> +    }
>  
>    bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
>    pbbs = bbs + 1;
> @@ -1490,13 +1495,15 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
>    bbs[0] = preheader;
>    new_bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
>  
> -  exit = single_exit (scalar_loop);
>    copy_bbs (bbs, scalar_loop->num_nodes + 1, new_bbs,
> -	    &exit, 1, &new_exit, NULL,
> +	    &scalar_exit, 1, &new_exit, NULL,
>  	    at_exit ? loop->latch : e->src, true);
> -  exit = single_exit (loop);
> +  exit = loop_exit;
>    basic_block new_preheader = new_bbs[0];
>  
> +  if (new_e)
> +    *new_e = new_exit;
> +
>    /* Before installing PHI arguments make sure that the edges
>       into them match that of the scalar loop we analyzed.  This
>       makes sure the SLP tree matches up between the main vectorized
> @@ -1537,8 +1544,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
>  	 but LOOP will not.  slpeel_update_phi_nodes_for_guard{1,2} expects
>  	 the LOOP SSA_NAMEs (on the exit edge and edge from latch to
>  	 header) to have current_def set, so copy them over.  */
> -      slpeel_duplicate_current_defs_from_edges (single_exit (scalar_loop),
> -						exit);
> +      slpeel_duplicate_current_defs_from_edges (scalar_exit, exit);
>        slpeel_duplicate_current_defs_from_edges (EDGE_SUCC (scalar_loop->latch,
>  							   0),
>  						EDGE_SUCC (loop->latch, 0));
> @@ -1696,11 +1702,11 @@ slpeel_add_loop_guard (basic_block guard_bb, tree cond,
>   */
>  
>  bool
> -slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
> +slpeel_can_duplicate_loop_p (const class loop *loop, const_edge exit_e,
> +			     const_edge e)
>  {
> -  edge exit_e = single_exit (loop);
>    edge entry_e = loop_preheader_edge (loop);
> -  gcond *orig_cond = get_loop_exit_condition (loop);
> +  gcond *orig_cond = get_loop_exit_condition (exit_e);
>    gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
>    unsigned int num_bb = loop->inner? 5 : 2;
>  
> @@ -1709,7 +1715,7 @@ slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
>    if (!loop_outer (loop)
>        || loop->num_nodes != num_bb
>        || !empty_block_p (loop->latch)
> -      || !single_exit (loop)
> +      || !exit_e
>        /* Verify that new loop exit condition can be trivially modified.  */
>        || (!orig_cond || orig_cond != gsi_stmt (loop_exit_gsi))
>        || (e != exit_e && e != entry_e))
> @@ -1722,7 +1728,7 @@ slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
>    return ret;
>  }
>  
> -/* Function vect_get_loop_location.
> +/* Function find_loop_location.
>  
>     Extract the location of the loop in the source code.
>     If the loop is not well formed for vectorization, an estimated
> @@ -1739,11 +1745,19 @@ find_loop_location (class loop *loop)
>    if (!loop)
>      return dump_user_location_t ();
>  
> -  stmt = get_loop_exit_condition (loop);
> +  if (loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS))
> +    {
> +      /* We only care about the loop location, so use any exit with location
> +	 information.  */
> +      for (edge e : get_loop_exit_edges (loop))
> +	{
> +	  stmt = get_loop_exit_condition (e);
>  
> -  if (stmt
> -      && LOCATION_LOCUS (gimple_location (stmt)) > BUILTINS_LOCATION)
> -    return stmt;
> +	  if (stmt
> +	      && LOCATION_LOCUS (gimple_location (stmt)) > BUILTINS_LOCATION)
> +	    return stmt;
> +	}
> +    }
>  
>    /* If we got here the loop is probably not "well formed",
>       try to estimate the loop location */
> @@ -1962,7 +1976,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
>    gphi_iterator gsi, gsi1;
>    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>    basic_block update_bb = update_e->dest;
> -  basic_block exit_bb = single_exit (loop)->dest;
> +
> +  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
>  
>    /* Make sure there exists a single-predecessor exit bb:  */
>    gcc_assert (single_pred_p (exit_bb));
> @@ -2529,10 +2544,9 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
>  {
>    /* We should be using a step_vector of VF if VF is variable.  */
>    int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant ();
> -  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>    tree type = TREE_TYPE (niters_vector);
>    tree log_vf = build_int_cst (type, exact_log2 (vf));
> -  basic_block exit_bb = single_exit (loop)->dest;
> +  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
>  
>    gcc_assert (niters_vector_mult_vf_ptr != NULL);
>    tree niters_vector_mult_vf = fold_build2 (LSHIFT_EXPR, type,
> @@ -2555,11 +2569,11 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
>     NULL.  */
>  
>  static tree
> -find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED,
> -		gphi *lcssa_phi)
> +find_guard_arg (class loop *loop ATTRIBUTE_UNUSED,
> +		class loop *epilog ATTRIBUTE_UNUSED,
> +		const_edge e, gphi *lcssa_phi)

please order 'e' after the corresponding loop argument

>  {
>    gphi_iterator gsi;
> -  edge e = single_exit (loop);
>  
>    gcc_assert (single_pred_p (e->dest));
>    for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
> @@ -2620,7 +2634,8 @@ find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED,
>  
>  static void
>  slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> -				   class loop *first, class loop *second,
> +				   class loop *first, edge first_loop_e,
> +				   class loop *second, edge second_loop_e,
>  				   bool create_lcssa_for_iv_phis)
>  {
>    gphi_iterator gsi_update, gsi_orig;
> @@ -2628,7 +2643,7 @@ slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
>  
>    edge first_latch_e = EDGE_SUCC (first->latch, 0);
>    edge second_preheader_e = loop_preheader_edge (second);
> -  basic_block between_bb = single_exit (first)->dest;
> +  basic_block between_bb = first_loop_e->dest;
>  
>    gcc_assert (between_bb == second_preheader_e->src);
>    gcc_assert (single_pred_p (between_bb) && single_succ_p (between_bb));
> @@ -2651,7 +2666,7 @@ slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
>  	{
>  	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
>  	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
> -	  add_phi_arg (lcssa_phi, arg, single_exit (first), UNKNOWN_LOCATION);
> +	  add_phi_arg (lcssa_phi, arg, first_loop_e, UNKNOWN_LOCATION);
>  	  arg = new_res;
>  	}
>  
> @@ -2664,7 +2679,7 @@ slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
>       for correct vectorization of live stmts.  */
>    if (loop == first)
>      {
> -      basic_block orig_exit = single_exit (second)->dest;
> +      basic_block orig_exit = second_loop_e->dest;
>        for (gsi_orig = gsi_start_phis (orig_exit);
>  	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
>  	{
> @@ -2673,13 +2688,14 @@ slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
>  	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p  (orig_arg))
>  	    continue;
>  
> +	  const_edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
>  	  /* Already created in the above loop.   */
> -	  if (find_guard_arg (first, second, orig_phi))
> +	  if (find_guard_arg (first, second, exit_e, orig_phi))
>  	    continue;
>  
>  	  tree new_res = copy_ssa_name (orig_arg);
>  	  gphi *lcphi = create_phi_node (new_res, between_bb);
> -	  add_phi_arg (lcphi, orig_arg, single_exit (first), UNKNOWN_LOCATION);
> +	  add_phi_arg (lcphi, orig_arg, first_loop_e, UNKNOWN_LOCATION);
>  	}
>      }
>  }
> @@ -2847,7 +2863,8 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
>        if (!merge_arg)
>  	merge_arg = old_arg;
>  
> -      tree guard_arg = find_guard_arg (loop, epilog, update_phi);
> +      tree guard_arg
> +	= find_guard_arg (loop, epilog, single_exit (loop), update_phi);

missed adjustment?  you are introducing a single_exit call here ...
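
For context on why a fresh single_exit call stands out in this patch, here is
a minimal standalone sketch.  It uses mock types, not the GCC ones:
mock_edge, mock_loop, mock_single_exit and guard_dest are hypothetical names,
and the only behaviour assumed from GCC is that single_exit yields NULL unless
the loop has exactly one exit.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for GCC's edge and loop; not the real types.
struct mock_edge { int dest_index; };
struct mock_loop { std::vector<mock_edge *> exits; };

// Mimics the assumed single_exit contract: null unless exactly one exit.
mock_edge *
mock_single_exit (const mock_loop &loop)
{
  return loop.exits.size () == 1 ? loop.exits[0] : nullptr;
}

// Explicit-exit style: the caller names the edge, so the callee never has
// to rediscover it from the loop.
int
guard_dest (const mock_edge *e)
{
  assert (e != nullptr);
  return e->dest_index;
}
```

With two exits the lookup gives nothing to dereference, while a caller that
passes its chosen edge keeps working.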

>        /* If the var is live after loop but not a reduction, we simply
>  	 use the old arg.  */
>        if (!guard_arg)
> @@ -3201,27 +3218,37 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>      }
>  
>    if (vect_epilogues)
> -    /* Make sure to set the epilogue's epilogue scalar loop, such that we can
> -       use the original scalar loop as remaining epilogue if necessary.  */
> -    LOOP_VINFO_SCALAR_LOOP (epilogue_vinfo)
> -      = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> +    {
> +      /* Make sure to set the epilogue's epilogue scalar loop, such that we can
> +	 use the original scalar loop as remaining epilogue if necessary.  */
> +      LOOP_VINFO_SCALAR_LOOP (epilogue_vinfo)
> +	= LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> +      LOOP_VINFO_SCALAR_IV_EXIT (epilogue_vinfo)
> +	= LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
> +    }
>  
>    if (prolog_peeling)
>      {
>        e = loop_preheader_edge (loop);
> -      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
> +      edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, exit_e, e));
>  
>        /* Peel prolog and put it on preheader edge of loop.  */
> -      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e);
> +      edge scalar_e = LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
> +      edge prolog_e = NULL;
> +      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, exit_e,
> +						       scalar_loop, scalar_e,
> +						       e, &prolog_e);
>        gcc_assert (prolog);
>        prolog->force_vectorize = false;
> -      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
> +      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, prolog_e, loop,
> +					 exit_e, true);
>        first_loop = prolog;
>        reset_original_copy_tables ();
>  
>        /* Update the number of iterations for prolog loop.  */
>        tree step_prolog = build_one_cst (TREE_TYPE (niters_prolog));
> -      vect_set_loop_condition (prolog, NULL, niters_prolog,
> +      vect_set_loop_condition (prolog, prolog_e, loop_vinfo, niters_prolog,
>  			       step_prolog, NULL_TREE, false);
>  
>        /* Skip the prolog loop.  */
> @@ -3275,8 +3302,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  
>    if (epilog_peeling)
>      {
> -      e = single_exit (loop);
> -      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
> +      e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e, e));
>  
>        /* Peel epilog and put it on exit edge of loop.  If we are vectorizing
>  	 said epilog then we should use a copy of the main loop as a starting
> @@ -3285,12 +3312,18 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  	 If we are not vectorizing the epilog then we should use the scalar loop
>  	 as the transformations mentioned above make less or no sense when not
>  	 vectorizing.  */
> +      edge scalar_e = LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
>        epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
> -      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e);
> +      edge epilog_e = vect_epilogues ? e : scalar_e;
> +      edge new_epilog_e = NULL;
> +      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, e, epilog,
> +						       epilog_e, e,
> +						       &new_epilog_e);
> +      LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo) = new_epilog_e;
>        gcc_assert (epilog);
> -
>        epilog->force_vectorize = false;
> -      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
> +      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, e, epilog,
> +					 new_epilog_e, false);
>        bb_before_epilog = loop_preheader_edge (epilog)->src;
>  
>        /* Scalar version loop may be preferred.  In this case, add guard
> @@ -3374,16 +3407,16 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  	{
>  	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
>  				    niters, niters_vector_mult_vf);
> -	  guard_bb = single_exit (loop)->dest;
> -	  guard_to = split_edge (single_exit (epilog));
> +	  guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> +	  edge epilog_e = LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo);
> +	  guard_to = split_edge (epilog_e);
>  	  guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to,
>  					   skip_vector ? anchor : guard_bb,
>  					   prob_epilog.invert (),
>  					   irred_flag);
>  	  if (vect_epilogues)
>  	    epilogue_vinfo->skip_this_loop_edge = guard_e;
> -	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
> -					      single_exit (epilog));
> +	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e, epilog_e);
>  	  /* Only need to handle basic block before epilog loop if it's not
>  	     the guard_bb, which is the case when skip_vector is true.  */
>  	  if (guard_bb != bb_before_epilog)
> @@ -3416,6 +3449,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>      {
>        epilog->aux = epilogue_vinfo;
>        LOOP_VINFO_LOOP (epilogue_vinfo) = epilog;
> +      LOOP_VINFO_IV_EXIT (epilogue_vinfo)
> +	= LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo);
>  
>        loop_constraint_clear (epilog, LOOP_C_INFINITE);
>  
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 23c6e8259e7b133cd7acc6bcf0bad26423e9993a..6e60d84143626a8e1d801bb580f4dcebc73c7ba7 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -855,10 +855,9 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info loop_vinfo)
>  
>  
>  static gcond *
> -vect_get_loop_niters (class loop *loop, tree *assumptions,
> +vect_get_loop_niters (class loop *loop, edge exit, tree *assumptions,
>  		      tree *number_of_iterations, tree *number_of_iterationsm1)
>  {
> -  edge exit = single_exit (loop);
>    class tree_niter_desc niter_desc;
>    tree niter_assumptions, niter, may_be_zero;
>    gcond *cond = get_loop_exit_condition (loop);
> @@ -927,6 +926,20 @@ vect_get_loop_niters (class loop *loop, tree *assumptions,
>    return cond;
>  }
>  
> +/*  Determine the main loop exit for the vectorizer.  */
> +
> +edge

can't this be 'static'?

> +vec_init_loop_exit_info (class loop *loop)
> +{
> +  /* Before we begin we must first determine which exit is the main one and
> +     which are auxiliary exits.  */
> +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> +  if (exits.length () == 1)
> +    return exits[0];
> +  else
> +    return NULL;
> +}
> +
>  /* Function bb_in_loop_p
>  
>     Used as predicate for dfs order traversal of the loop bbs.  */
> @@ -987,7 +1000,10 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
>      has_mask_store (false),
>      scalar_loop_scaling (profile_probability::uninitialized ()),
>      scalar_loop (NULL),
> -    orig_loop_info (NULL)
> +    orig_loop_info (NULL),
> +    vec_loop_iv (NULL),
> +    vec_epilogue_loop_iv (NULL),
> +    scalar_loop_iv (NULL)
>  {
>    /* CHECKME: We want to visit all BBs before their successors (except for
>       latch blocks, for which this assertion wouldn't hold).  In the simple
> @@ -1646,6 +1662,18 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
>  {
>    DUMP_VECT_SCOPE ("vect_analyze_loop_form");
>  
> +  edge exit_e = vec_init_loop_exit_info (loop);
> +  if (!exit_e)
> +    return opt_result::failure_at (vect_location,
> +				   "not vectorized:"
> +				   " could not determine main exit from"
> +				   " loop with multiple exits.\n");
> +  info->loop_exit = exit_e;
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +		     "using as main loop exit: %d -> %d [AUX: %p]\n",
> +		     exit_e->src->index, exit_e->dest->index, exit_e->aux);
> +
>    /* Different restrictions apply when we are considering an inner-most loop,
>       vs. an outer (nested) loop.
>       (FORNOW. May want to relax some of these restrictions in the future).  */
> @@ -1767,7 +1795,7 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
>  				   " abnormal loop exit edge.\n");
>  
>    info->loop_cond
> -    = vect_get_loop_niters (loop, &info->assumptions,
> +    = vect_get_loop_niters (loop, e, &info->assumptions,
>  			    &info->number_of_iterations,
>  			    &info->number_of_iterationsm1);
>    if (!info->loop_cond)
> @@ -1821,6 +1849,9 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
>  
>    stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (info->loop_cond);
>    STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
> +
> +  LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
> +
>    if (info->inner_loop_cond)
>      {
>        stmt_vec_info inner_loop_cond_info
> @@ -3063,9 +3094,9 @@ start_over:
>        if (dump_enabled_p ())
>          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
>        if (!vect_can_advance_ivs_p (loop_vinfo)
> -	  || !slpeel_can_duplicate_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> -					   single_exit (LOOP_VINFO_LOOP
> -							 (loop_vinfo))))
> +	  || !slpeel_can_duplicate_loop_p (loop,
> +					   LOOP_VINFO_IV_EXIT (loop_vinfo),
> +					   LOOP_VINFO_IV_EXIT (loop_vinfo)))
>          {
>  	  ok = opt_result::failure_at (vect_location,
>  				       "not vectorized: can't create required "
> @@ -6002,7 +6033,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>           Store them in NEW_PHIS.  */
>    if (double_reduc)
>      loop = outer_loop;
> -  exit_bb = single_exit (loop)->dest;
> +  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
>    exit_gsi = gsi_after_labels (exit_bb);
>    reduc_inputs.create (slp_node ? vec_num : ncopies);
>    for (unsigned i = 0; i < vec_num; i++)
> @@ -6018,7 +6049,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>  	  phi = create_phi_node (new_def, exit_bb);
>  	  if (j)
>  	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
> -	  SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def);
> +	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
>  	  new_def = gimple_convert (&stmts, vectype, new_def);
>  	  reduc_inputs.quick_push (new_def);
>  	}
> @@ -10416,12 +10447,12 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>  	   lhs' = new_tree;  */
>  
>        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> -      basic_block exit_bb = single_exit (loop)->dest;
> +      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
>        gcc_assert (single_pred_p (exit_bb));
>  
>        tree vec_lhs_phi = copy_ssa_name (vec_lhs);
>        gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> -      SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, vec_lhs);
> +      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
>  
>        gimple_seq stmts = NULL;
>        tree new_tree;
> @@ -10965,7 +10996,7 @@ vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
>     profile.  */
>  
>  static void
> -scale_profile_for_vect_loop (class loop *loop, unsigned vf, bool flat)
> +scale_profile_for_vect_loop (class loop *loop, edge exit_e, unsigned vf, bool flat)
>  {
>    /* For flat profiles do not scale down proportionally by VF and only
>       cap by known iteration count bounds.  */
> @@ -10980,7 +11011,6 @@ scale_profile_for_vect_loop (class loop *loop, unsigned vf, bool flat)
>        return;
>      }
>    /* Loop body executes VF fewer times and exit increases VF times.  */
> -  edge exit_e = single_exit (loop);
>    profile_count entry_count = loop_preheader_edge (loop)->count ();
>  
>    /* If we have unreliable loop profile avoid dropping entry
> @@ -11350,7 +11380,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>  
>    /* Make sure there exists a single-predecessor exit bb.  Do this before 
>       versioning.   */
> -  edge e = single_exit (loop);
> +  edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
>    if (! single_pred_p (e->dest))
>      {
>        split_loop_exit_edge (e, true);
> @@ -11376,7 +11406,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>       loop closed PHI nodes on the exit.  */
>    if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
>      {
> -      e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
> +      e = LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
>        if (! single_pred_p (e->dest))
>  	{
>  	  split_loop_exit_edge (e, true);
> @@ -11625,8 +11655,9 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>       a zero NITERS becomes a nonzero NITERS_VECTOR.  */
>    if (integer_onep (step_vector))
>      niters_no_overflow = true;
> -  vect_set_loop_condition (loop, loop_vinfo, niters_vector, step_vector,
> -			   niters_vector_mult_vf, !niters_no_overflow);
> +  vect_set_loop_condition (loop, LOOP_VINFO_IV_EXIT (loop_vinfo), loop_vinfo,
> +			   niters_vector, step_vector, niters_vector_mult_vf,
> +			   !niters_no_overflow);
>  
>    unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo);
>  
> @@ -11699,7 +11730,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>  			  assumed_vf) - 1
>  	 : wi::udiv_floor (loop->nb_iterations_estimate + bias_for_assumed,
>  			   assumed_vf) - 1);
> -  scale_profile_for_vect_loop (loop, assumed_vf, flat);
> +  scale_profile_for_vect_loop (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
> +			       assumed_vf, flat);
>  
>    if (dump_enabled_p ())
>      {
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index f1d0cd79961abb095bc79d3b59a81930f0337e59..afa7a8e30891c782a0e5e3740ecc4377f5a31e54 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -919,10 +919,24 @@ public:
>       analysis.  */
>    vec<_loop_vec_info *> epilogue_vinfos;
>  
> +  /* The controlling loop IV for the current loop when vectorizing.  This IV
> +     controls the natural exits of the loop.  */
> +  edge vec_loop_iv;
> +
> +  /* The controlling loop IV for the epilogue loop when vectorizing.  This IV
> +     controls the natural exits of the loop.  */
> +  edge vec_epilogue_loop_iv;
> +
> +  /* The controlling loop IV for the scalar loop being vectorized.  This IV
> +     controls the natural exits of the loop.  */
> +  edge scalar_loop_iv;

All of the above sound as if they were IVs; the access macros have
_EXIT at the end, can you give the fields above an _exit suffix as well?
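
One possible shape of that rename, as a standalone sketch with mock types.
The field names vec_loop_iv_exit, vec_epilogue_loop_iv_exit and
scalar_loop_iv_exit are an assumption chosen to mirror the macro names, and
mock_edge and mock_loop_vinfo are hypothetical stand-ins for the real types.

```cpp
#include <cassert>

// Hypothetical stand-in for GCC's edge.
struct mock_edge { int src_index; int dest_index; };

// Hypothetical stand-in for _loop_vec_info, holding only the exit fields.
struct mock_loop_vinfo
{
  /* Exit edge controlled by the main IV of the current loop.  */
  mock_edge *vec_loop_iv_exit;
  /* The same for the epilogue loop.  */
  mock_edge *vec_epilogue_loop_iv_exit;
  /* The same for the scalar loop being vectorized.  */
  mock_edge *scalar_loop_iv_exit;
};

// Accessors keep the names used in the patch; only the fields change.
#define LOOP_VINFO_IV_EXIT(L)          (L)->vec_loop_iv_exit
#define LOOP_VINFO_EPILOGUE_IV_EXIT(L) (L)->vec_epilogue_loop_iv_exit
#define LOOP_VINFO_SCALAR_IV_EXIT(L)   (L)->scalar_loop_iv_exit
```

Callers go through the macros, so a rename like this would touch only the
class definition, its constructor and the three #defines.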

Otherwise looks good to me.

Feel free to push approved patches of the series, no need to wait
until everything is approved.

Thanks,
Richard.

>  } *loop_vec_info;
>  
>  /* Access Functions.  */
>  #define LOOP_VINFO_LOOP(L)                 (L)->loop
> +#define LOOP_VINFO_IV_EXIT(L)              (L)->vec_loop_iv
> +#define LOOP_VINFO_EPILOGUE_IV_EXIT(L)     (L)->vec_epilogue_loop_iv
> +#define LOOP_VINFO_SCALAR_IV_EXIT(L)       (L)->scalar_loop_iv
>  #define LOOP_VINFO_BBS(L)                  (L)->bbs
>  #define LOOP_VINFO_NITERSM1(L)             (L)->num_itersm1
>  #define LOOP_VINFO_NITERS(L)               (L)->num_iters
> @@ -2155,11 +2169,13 @@ class auto_purge_vect_location
>  
>  /* Simple loop peeling and versioning utilities for vectorizer's purposes -
>     in tree-vect-loop-manip.cc.  */
> -extern void vect_set_loop_condition (class loop *, loop_vec_info,
> +extern void vect_set_loop_condition (class loop *, edge, loop_vec_info,
>  				     tree, tree, tree, bool);
> -extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge);
> -class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
> -						     class loop *, edge);
> +extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
> +					 const_edge);
> +class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
> +						    class loop *, edge,
> +						    edge, edge *);
>  class loop *vect_loop_versioning (loop_vec_info, gimple *);
>  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
>  				    tree *, tree *, tree *, int, bool, bool,
> @@ -2169,6 +2185,7 @@ extern void vect_prepare_for_masked_peels (loop_vec_info);
>  extern dump_user_location_t find_loop_location (class loop *);
>  extern bool vect_can_advance_ivs_p (loop_vec_info);
>  extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
> +extern edge vec_init_loop_exit_info (class loop *);
>  
>  /* In tree-vect-stmts.cc.  */
>  extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
> @@ -2358,6 +2375,7 @@ struct vect_loop_form_info
>    tree assumptions;
>    gcond *loop_cond;
>    gcond *inner_loop_cond;
> +  edge loop_exit;
>  };
>  extern opt_result vect_analyze_loop_form (class loop *, vect_loop_form_info *);
>  extern loop_vec_info vect_create_loop_vinfo (class loop *, vec_info_shared *,
> diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> index a048e9d89178a37455bd7b83ab0f2a238a4ce69e..d97e2b54c25ac60378935392aa7b73476efed74b 100644
> --- a/gcc/tree-vectorizer.cc
> +++ b/gcc/tree-vectorizer.cc
> @@ -943,6 +943,8 @@ set_uid_loop_bbs (loop_vec_info loop_vinfo, gimple *loop_vectorized_call,
>    class loop *scalar_loop = get_loop (fun, tree_to_shwi (arg));
>  
>    LOOP_VINFO_SCALAR_LOOP (loop_vinfo) = scalar_loop;
> +  LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo)
> +    = vec_init_loop_exit_info (scalar_loop);
>    gcc_checking_assert (vect_loop_vectorized_call (scalar_loop)
>  		       == loop_vectorized_call);
>    /* If we are going to vectorize outer loop, prevent vectorization
> 
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* Re: [PATCH 2/3]middle-end: updated niters analysis to handle multiple exits.
  2023-10-02  7:41 ` [PATCH 2/3]middle-end: updated niters analysis to handle multiple exits Tamar Christina
@ 2023-10-10 11:13   ` Richard Biener
  2023-10-11 10:54     ` Tamar Christina
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Biener @ 2023-10-10 11:13 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 2 Oct 2023, Tamar Christina wrote:

> Hi All,
> 
> This second part updates niters analysis to be able to analyze any number of
> exits.  If we have multiple exits we determine the main exit by finding the
> first counting IV.
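
The selection rule described above can be sketched in isolation as follows.
This is an illustration under assumptions, not the code from the patch:
mock_exit, its has_counting_iv flag and pick_main_exit are hypothetical, and
the flag stands in for whatever niter analysis decides that an exit is
controlled by a counting IV.

```cpp
#include <cassert>
#include <vector>

// Hypothetical exit record; has_counting_iv models "niter analysis found a
// counting IV for this exit".
struct mock_exit
{
  int id;
  bool has_counting_iv;
};

// Return the main exit: the only exit if there is just one, otherwise the
// first exit with a counting IV, or null if no exit qualifies.
const mock_exit *
pick_main_exit (const std::vector<mock_exit> &exits)
{
  if (exits.size () == 1)
    return &exits[0];
  for (const mock_exit &e : exits)
    if (e.has_counting_iv)
      return &e;
  return nullptr;
}
```

Returning null when nothing qualifies matches the patch 1/3 behaviour of
failing analysis when no main exit can be determined.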
> 
> The change allows the vectorizer to pass analysis for loops with multiple
> exits, but we later gracefully reject them.  It does however allow us to test
> whether the exit handling is using the right exit everywhere.
> 
> Additionally since we analyze all exits, we now return all conditions for them
> and determine which condition belongs to the main exit.
> 
> The main condition is needed because the vectorizer needs to ignore the main IV
> condition during vectorization as it will replace it during codegen.
> 
> To track versioned loops we extend the contract between ifcvt and the vectorizer
> to store the exit number in aux so that we can match it up again during peeling.
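
The aux contract can be illustrated with a small standalone sketch.  It
assumes the exit numbers are recorded before the loop is copied and that the
copied edges carry the same aux value; mock_edge, record_exit_numbers and
match_exit_by_aux are hypothetical names, not GCC functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for GCC's edge; only the aux slot matters here.
struct mock_edge
{
  int src_index;
  int dest_index;
  void *aux;
};

// Number the exits in iteration order, as the tree_if_conversion hunk does.
void
record_exit_numbers (std::vector<mock_edge *> &exits)
{
  std::uintptr_t idx = 0;
  for (mock_edge *exit : exits)
    exit->aux = reinterpret_cast<void *> (idx++);
}

// Find the exit in SCALAR_EXITS that carries the same number as LOOP_EXIT,
// or null if there is none.
mock_edge *
match_exit_by_aux (const std::vector<mock_edge *> &scalar_exits,
		   const mock_edge *loop_exit)
{
  for (mock_edge *exit : scalar_exits)
    if (exit->aux == loop_exit->aux)
      return exit;
  return nullptr;
}
```

In this sketch the first exit gets number 0, which is the same bit pattern as
an aux slot that was never set, so the scheme cannot tell "exit 0" from
"not recorded".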
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-linux-gnu
> with no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-if-conv.cc (tree_if_conversion): Record exits in aux.
> 	* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Use
> 	it.
> 	* tree-vect-loop.cc (vect_get_loop_niters): Determine main exit.
> 	(vec_init_loop_exit_info): Extend analysis when multiple exits.
> 	(vect_analyze_loop_form): Record conds and determine main cond.
> 	(vect_create_loop_vinfo): Extend bookkeeping of conds.
> 	(vect_analyze_loop): Release conds.
> 	* tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS,
> 	LOOP_VINFO_LOOP_IV_COND):  New.
> 	(struct vect_loop_form_info): Add conds, alt_loop_conds.
> 	(struct loop_vec_info): Add conds, loop_iv_cond.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 799f071965e5c41eb352b5530cf1d9c7ecf7bf25..3dc2290467797ebbfcef55903531b22829f4fdbd 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -3795,6 +3795,13 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
>      }
>    if (need_to_ifcvt)
>      {
> +      /* Before we rewrite edges we'll record their original position in the
> +	 edge map such that we can map the edges between the ifcvt and the
> +	 non-ifcvt loop during peeling.  */
> +      uintptr_t idx = 0;
> +      for (edge exit : get_loop_exit_edges (loop))
> +	exit->aux = (void*)idx++;
> +
>        /* Now all statements are if-convertible.  Combine all the basic
>  	 blocks into one huge basic block doing the if-conversion
>  	 on-the-fly.  */
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index e06717272aafc6d31cbdcb94840ac25de616da6d..77f8e668bcc8beca99ba4052e1b12e0d17300262 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1470,6 +1470,18 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>        scalar_loop = loop;
>        scalar_exit = loop_exit;
>      }
> +  else if (scalar_loop == loop)
> +    scalar_exit = loop_exit;
> +  else
> +    {
> +      /* Loop has been versioned; match exits up using the aux index.  */
> +      for (edge exit : get_loop_exit_edges (scalar_loop))
> +	if (exit->aux == loop_exit->aux)
> +	  {
> +	    scalar_exit = exit;
> +	    break;
> +	  }
> +    }
>  
>    bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
>    pbbs = bbs + 1;
> @@ -1501,6 +1513,8 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>    exit = loop_exit;
>    basic_block new_preheader = new_bbs[0];
>  
> +  /* Record the new loop exit information.  new_loop doesn't have SCEV data and
> +     so we must initialize the exit information.  */
>    if (new_e)
>      *new_e = new_exit;
>  
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 6e60d84143626a8e1d801bb580f4dcebc73c7ba7..f1caa5f207d3b13da58c3a313b11d1ef98374349 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -851,79 +851,106 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info loop_vinfo)
>     in NUMBER_OF_ITERATIONSM1.  Place the condition under which the
>     niter information holds in ASSUMPTIONS.
>  
> -   Return the loop exit condition.  */
> +   Return the loop exit conditions.  */
>  
>  
> -static gcond *
> -vect_get_loop_niters (class loop *loop, edge exit, tree *assumptions,
> +static vec<gcond *>
> +vect_get_loop_niters (class loop *loop, tree *assumptions, const_edge main_exit,
>  		      tree *number_of_iterations, tree *number_of_iterationsm1)

Any reason you swap exit and main_exit?  IMHO the input better pairs with
the other input 'loop'.


>  {
> +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> +  vec<gcond *> conds;
> +  conds.create (exits.length ());
>    class tree_niter_desc niter_desc;
>    tree niter_assumptions, niter, may_be_zero;
> -  gcond *cond = get_loop_exit_condition (loop);
>  
>    *assumptions = boolean_true_node;
>    *number_of_iterationsm1 = chrec_dont_know;
>    *number_of_iterations = chrec_dont_know;
> +
>    DUMP_VECT_SCOPE ("get_loop_niters");
>  
> -  if (!exit)
> -    return cond;
> +  if (exits.is_empty ())
> +    return conds;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "Loop has %d exits.\n",
> +		     exits.length ());
> +
> +  edge exit;
> +  unsigned int i;
> +  FOR_EACH_VEC_ELT (exits, i, exit)
> +    {
> +      gcond *cond = get_loop_exit_condition (exit);
> +      if (cond)
> +	conds.safe_push (cond);
> +
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location, "Analyzing exit %d...\n", i);
>  
> -  may_be_zero = NULL_TREE;
> -  if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
> -      || chrec_contains_undetermined (niter_desc.niter))
> -    return cond;
> +      may_be_zero = NULL_TREE;
> +      if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
> +          || chrec_contains_undetermined (niter_desc.niter))
> +	continue;
>  
> -  niter_assumptions = niter_desc.assumptions;
> -  may_be_zero = niter_desc.may_be_zero;
> -  niter = niter_desc.niter;
> +      niter_assumptions = niter_desc.assumptions;
> +      may_be_zero = niter_desc.may_be_zero;
> +      niter = niter_desc.niter;
>  
> -  if (may_be_zero && integer_zerop (may_be_zero))
> -    may_be_zero = NULL_TREE;
> +      if (may_be_zero && integer_zerop (may_be_zero))
> +	may_be_zero = NULL_TREE;
>  
> -  if (may_be_zero)
> -    {
> -      if (COMPARISON_CLASS_P (may_be_zero))
> +      if (may_be_zero)
>  	{
> -	  /* Try to combine may_be_zero with assumptions, this can simplify
> -	     computation of niter expression.  */
> -	  if (niter_assumptions && !integer_nonzerop (niter_assumptions))
> -	    niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
> -					     niter_assumptions,
> -					     fold_build1 (TRUTH_NOT_EXPR,
> -							  boolean_type_node,
> -							  may_be_zero));
> +	  if (COMPARISON_CLASS_P (may_be_zero))
> +	    {
> +	      /* Try to combine may_be_zero with assumptions, this can simplify
> +		 computation of niter expression.  */
> +	      if (niter_assumptions && !integer_nonzerop (niter_assumptions))
> +		niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
> +						 niter_assumptions,
> +						 fold_build1 (TRUTH_NOT_EXPR,
> +							      boolean_type_node,
> +							      may_be_zero));
> +	      else
> +		niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
> +				     build_int_cst (TREE_TYPE (niter), 0),
> +				     rewrite_to_non_trapping_overflow (niter));
> +
> +	      may_be_zero = NULL_TREE;
> +	    }
> +	  else if (integer_nonzerop (may_be_zero) && exit == main_exit)
> +	    {
> +	      *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
> +	      *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
> +	      continue;
> +	    }
>  	  else
> -	    niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
> -				 build_int_cst (TREE_TYPE (niter), 0),
> -				 rewrite_to_non_trapping_overflow (niter));
> +	    continue;
> +       }
>  
> -	  may_be_zero = NULL_TREE;
> -	}
> -      else if (integer_nonzerop (may_be_zero))
> +      /* Loop assumptions are based off the normal exit.  */
> +      if (exit == main_exit)

It's a bit hard to follow in patch form but I wonder why you even
analyze the number of iterations of the non-main exits, risking
clobbering the *number_* outputs which we later assume to be for
the main exit?
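
As a rough illustration of that concern, here is a toy model, not GCC code:
toy_exit and toy_get_loop_niters are made-up names, and the "analysis" is
just a stored number.  Every exit is still recorded, but only the main exit
may write the niters output, so an alternate exit can never clobber it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an exit edge plus the result niter analysis
// would give for it.  KNOWN is false when the analysis fails.
struct toy_exit
{
  int id;
  bool known;
  long latch_execs;
};

// Record every exit in SEEN (mirroring the conds vector), but derive the
// iteration count from the main exit only.  Returns -1 when unknown.
inline long
toy_get_loop_niters (const std::vector<toy_exit> &exits, int main_exit_id,
                     std::vector<int> &seen)
{
  long niters = -1;
  for (const toy_exit &e : exits)
    {
      seen.push_back (e.id);
      if (e.id != main_exit_id)
        continue;                 // No niter analysis for alternate exits.
      if (e.known)
        niters = e.latch_execs + 1; // Header executions = latch executions + 1.
    }
  return niters;
}
```

With exits {0, 1, 2} and main exit 1, the result depends only on exit 1,
whatever the other two exits would have produced.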

>  	{
> -	  *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
> -	  *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
> -	  return cond;
> +	  *assumptions = niter_assumptions;
> +	  *number_of_iterationsm1 = niter;
> +
> +	  /* We want the number of loop header executions which is the number
> +	     of latch executions plus one.
> +	     ???  For UINT_MAX latch executions this number overflows to zero
> +	     for loops like do { n++; } while (n != 0);  */
> +	  if (niter && !chrec_contains_undetermined (niter))
> +	    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
> +				 unshare_expr (niter),
> +				 build_int_cst (TREE_TYPE (niter), 1));
> +	  *number_of_iterations = niter;
>  	}
> -      else
> -	return cond;
>      }
>  
> -  *assumptions = niter_assumptions;
> -  *number_of_iterationsm1 = niter;
> -
> -  /* We want the number of loop header executions which is the number
> -     of latch executions plus one.
> -     ???  For UINT_MAX latch executions this number overflows to zero
> -     for loops like do { n++; } while (n != 0);  */
> -  if (niter && !chrec_contains_undetermined (niter))
> -    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), unshare_expr (niter),
> -			  build_int_cst (TREE_TYPE (niter), 1));
> -  *number_of_iterations = niter;
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "All loop exits successfully analyzed.\n");
>  
> -  return cond;
> +  return conds;
>  }
>  
>  /*  Determine the main loop exit for the vectorizer.  */
> @@ -936,8 +963,25 @@ vec_init_loop_exit_info (class loop *loop)
>    auto_vec<edge> exits = get_loop_exit_edges (loop);
>    if (exits.length () == 1)
>      return exits[0];
> -  else
> -    return NULL;
> +
> +  /* If we have multiple exits we only support counting IV at the moment.  Analyze
> +     all exits and return one */
> +  class tree_niter_desc niter_desc;
> +  edge candidate = NULL;
> +  for (edge exit : exits)
> +    {
> +      if (!get_loop_exit_condition (exit))
> +	continue;
> +
> +      if (number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
> +	  && !chrec_contains_undetermined (niter_desc.niter))
> +	{
> +	  if (!niter_desc.may_be_zero || !candidate)
> +	    candidate = exit;
> +	}
> +    }
> +
> +  return candidate;
>  }
>  
>  /* Function bb_in_loop_p
> @@ -1788,21 +1832,31 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
>  				   "not vectorized: latch block not empty.\n");
>  
>    /* Make sure the exit is not abnormal.  */
> -  edge e = single_exit (loop);
> -  if (e->flags & EDGE_ABNORMAL)
> +  if (exit_e->flags & EDGE_ABNORMAL)
>      return opt_result::failure_at (vect_location,
>  				   "not vectorized:"
>  				   " abnormal loop exit edge.\n");
>  
> -  info->loop_cond
> -    = vect_get_loop_niters (loop, e, &info->assumptions,
> +  info->conds
> +    = vect_get_loop_niters (loop, &info->assumptions, exit_e,
>  			    &info->number_of_iterations,
>  			    &info->number_of_iterationsm1);
> -  if (!info->loop_cond)
> +
> +  if (info->conds.is_empty ())
>      return opt_result::failure_at
>        (vect_location,
>         "not vectorized: complicated exit condition.\n");
>  
> +  /* Determine what the primary and alternate exit conds are.  */
> +  info->alt_loop_conds.create (info->conds.length () - 1);
> +  for (gcond *cond : info->conds)
> +    {
> +      if (exit_e->src != gimple_bb (cond))
> +	info->alt_loop_conds.quick_push (cond);
> +      else
> +	info->loop_cond = cond;
> +    }
> +

IMHO it would be simpler to have the primary exit condition in
info->conds[0] and the rest after that?  That avoids having two
arrays and one scalar in vect_loop_form_info.
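
A minimal sketch of that layout, using std::vector and a made-up toy_cond
type in place of vec<gcond *> (all names here are hypothetical): the
condition that lives in the main exit's source block goes first, the
alternate exit conditions follow in their original order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a gcond: the block it lives in and a label.
struct toy_cond
{
  int bb;
  std::string name;
};

// Return CONDS reordered so that the condition in MAIN_EXIT_SRC_BB comes
// first and the alternate exit conditions keep their relative order.
inline std::vector<toy_cond>
toy_order_conds (const std::vector<toy_cond> &conds, int main_exit_src_bb)
{
  std::vector<toy_cond> ordered;
  ordered.reserve (conds.size ());
  for (const toy_cond &c : conds)
    if (c.bb == main_exit_src_bb)
      ordered.push_back (c);
  for (const toy_cond &c : conds)
    if (c.bb != main_exit_src_bb)
      ordered.push_back (c);
  return ordered;
}
```

Consumers would then read element 0 for the IV condition and iterate from
element 1 for the rest, so a single container is enough.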

>    if (integer_zerop (info->assumptions)
>        || !info->number_of_iterations
>        || chrec_contains_undetermined (info->number_of_iterations))
> @@ -1847,8 +1901,13 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
>    if (!integer_onep (info->assumptions) && !main_loop_info)
>      LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info->assumptions;
>  
> -  stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (info->loop_cond);
> -  STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
> +  for (gcond *cond : info->conds)
> +    {
> +      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (cond);
> +      STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
> +    }
> +  LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
> +  LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
>    LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
>  
> @@ -3594,7 +3653,11 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>  			 && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
>  			 && !loop->simduid);
>    if (!vect_epilogues)
> -    return first_loop_vinfo;
> +    {
> +      loop_form_info.conds.release ();
> +      loop_form_info.alt_loop_conds.release ();
> +      return first_loop_vinfo;
> +    }

I think there's 'inner' where you leak these.  Maybe use auto_vec<>
in vect_loop_form_info instead?
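
To show why an owning member helps, here is a small stand-alone sketch.
It does not use GCC's vec.h; toy_auto_buf is a made-up RAII holder playing
the role auto_vec<> would play, and a global counter tracks live buffers.
Every return path releases the storage without an explicit release () call.

```cpp
#include <cassert>
#include <vector>

static int toy_live_buffers = 0;

// Hypothetical RAII holder: counts itself while alive.
struct toy_auto_buf
{
  toy_auto_buf () { ++toy_live_buffers; }
  ~toy_auto_buf () { --toy_live_buffers; }
  toy_auto_buf (const toy_auto_buf &) = delete;
  toy_auto_buf &operator= (const toy_auto_buf &) = delete;
  std::vector<int> data;
};

// Stand-in for a form-info struct that owns its condition list.
struct toy_form_info
{
  toy_auto_buf conds;
};

// Three return paths; the destructor runs on each of them.
inline int
toy_analyze (int path)
{
  toy_form_info info;
  info.conds.data.push_back (path);
  if (path == 0)
    return 0;   // An early return.
  if (path == 1)
    return 1;   // Another early return that is easy to forget.
  return 2;     // The normal return at the end.
}
```

The point is only that ownership is tied to scope, so adding a new early
return later cannot introduce a leak.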

Otherwise looks OK.

Thanks,
Richard.

>    /* Now analyze first_loop_vinfo for epilogue vectorization.  */
>    poly_uint64 lowest_th = LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo);
> @@ -3694,6 +3757,9 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>  			   (first_loop_vinfo->epilogue_vinfos[0]->vector_mode));
>      }
>  
> +  loop_form_info.conds.release ();
> +  loop_form_info.alt_loop_conds.release ();
> +
>    return first_loop_vinfo;
>  }
>  
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index afa7a8e30891c782a0e5e3740ecc4377f5a31e54..55b6771b271d5072fa1327d595e1dddb112cfdf6 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -882,6 +882,12 @@ public:
>       we need to peel off iterations at the end to form an epilogue loop.  */
>    bool peeling_for_niter;
>  
> +  /* List of loop additional IV conditionals found in the loop.  */
> +  auto_vec<gcond *> conds;
> +
> +  /* Main loop IV cond.  */
> +  gcond* loop_iv_cond;
> +
>    /* True if there are no loop carried data dependencies in the loop.
>       If loop->safelen <= 1, then this is always true, either the loop
>       didn't have any loop carried data dependencies, or the loop is being
> @@ -984,6 +990,8 @@ public:
>  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
>  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
>  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> +#define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> +#define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
>  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
>  #define LOOP_VINFO_SCALAR_LOOP(L)	   (L)->scalar_loop
>  #define LOOP_VINFO_SCALAR_LOOP_SCALING(L)  (L)->scalar_loop_scaling
> @@ -2373,7 +2381,9 @@ struct vect_loop_form_info
>    tree number_of_iterations;
>    tree number_of_iterationsm1;
>    tree assumptions;
> +  vec<gcond *> conds;
>    gcond *loop_cond;
> +  vec<gcond *> alt_loop_conds;
>    gcond *inner_loop_cond;
>    edge loop_exit;
>  };
> 
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* Re: [PATCH 3/3]middle-end: maintain LCSSA throughout loop peeling
  2023-10-02  7:42 ` [PATCH 3/3]middle-end: maintain LCSSA throughout loop peeling Tamar Christina
@ 2023-10-10 12:59   ` Richard Biener
  2023-10-11 11:16     ` Tamar Christina
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Biener @ 2023-10-10 12:59 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 2 Oct 2023, Tamar Christina wrote:

> Hi All,
> 
> This final patch updates peeling to maintain LCSSA all the way through.
> 
> It's significantly easier to maintain it during peeling while we still know
> where all new edges connect rather than touching it up later as is currently
> being done.
> 
> This allows us to remove many of the helper functions that touch up the loops
> at various parts.  The only complication is for loop distribution where we
> should be able to use the same, however ldist, depending on whether
> redirect_lc_phi_defs is true or not, will either try to maintain a limited LCSSA
> form itself or remove all non-virtual phis.
> 
> The problem here is that if we maintain LCSSA then in some cases the blocks
> connecting the two loops get PHIs to keep the loop IV up to date.
> 
> However there is no loop: the guard condition is rewritten as 0 != 0, so the
> "loop" always exits.  However due to the PHI nodes the probabilities get
> completely wrong.  It seems to think that the impossible exit is the likely
> edge.  This causes incorrect warnings and the presence of the PHIs prevents the
> blocks from being simplified.
> 
> While it may be possible to make ldist work with LCSSA form, doing so seems more
> work than not.  For that reason the peeling code has an additional parameter
> used by only ldist to not connect the two loops during peeling.
> 
> This preserves the current behaviour from ldist until I can dive into the
> implementation more.  Hopefully that's ok for now.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-linux-gnu, and
> no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-loop-distribution.cc (copy_loop_before): Request no LCSSA.
> 	* tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional
> 	asserts.
> 	(slpeel_tree_duplicate_loop_to_edge_cfg): Keep LCSSA during peeling.
> 	(find_guard_arg): Look value up through explicit edge and original defs.
> 	(vect_do_peeling): Use it.
> 	(slpeel_update_phi_nodes_for_guard2): Take explicit exit edge.
> 	(slpeel_update_phi_nodes_for_lcssa, slpeel_update_phi_nodes_for_loops):
> 	Remove.
> 	* tree-vect-loop.cc (vect_create_epilog_for_reduction): Initialize phi.
> 	* tree-vectorizer.h (slpeel_tree_duplicate_loop_to_edge_cfg): Add
> 	optional param to turn off LCSSA mode.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
> index 902edc49ab588152a5b845f2c8a42a7e2a1d6080..14fb884d3e91d79785867debaee4956a2d5b0bb1 100644
> --- a/gcc/tree-loop-distribution.cc
> +++ b/gcc/tree-loop-distribution.cc
> @@ -950,7 +950,7 @@ copy_loop_before (class loop *loop, bool redirect_lc_phi_defs)
>  
>    initialize_original_copy_tables ();
>    res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, single_exit (loop), NULL,
> -						NULL, preheader, NULL);
> +						NULL, preheader, NULL, false);
>    gcc_assert (res != NULL);
>  
>    /* When a not last partition is supposed to keep the LC PHIs computed
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 77f8e668bcc8beca99ba4052e1b12e0d17300262..0e8c0be5384aab2399ed93966e7bf4918f6c87a5 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -252,6 +252,9 @@ adjust_phi_and_debug_stmts (gimple *update_phi, edge e, tree new_def)
>  {
>    tree orig_def = PHI_ARG_DEF_FROM_EDGE (update_phi, e);
>  
> +  gcc_assert (TREE_CODE (orig_def) != SSA_NAME
> +	      || orig_def != new_def);
> +
>    SET_PHI_ARG_DEF (update_phi, e->dest_idx, new_def);
>  
>    if (MAY_HAVE_DEBUG_BIND_STMTS)
> @@ -1445,12 +1448,19 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to)
>     on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
>     non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
>     basic blocks from SCALAR_LOOP instead of LOOP, but to either the
> -   entry or exit of LOOP.  */
> +   entry or exit of LOOP.  If FLOW_LOOPS then connect LOOP to SCALAR_LOOP as a
> +   continuation.  This is correct for cases where one loop continues from the
> +   other like in the vectorizer, but not true for uses in e.g. loop distribution
> +   where the loop is duplicated and then modified.

But for loop distribution the second loop also "continues" from the first,
so maybe better say ", but not true for uses in e.g. loop distribution where
the contents of the loop body are split but the iteration space of both
copies remains the same."  It's an implementation limitation in loop
distribution that it for example doesn't support producing reductions
as the first loop (aka it cannot handle LC PHI nodes "in between").
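
The difference can be sketched at source level (plain C++, nothing
GCC-specific; the function names and the factor of 4 are made up for the
example).  In the distributed form both loops cover the whole range; in
the peeled form the second loop resumes from the induction variable value
the first loop left behind, which is what the LC PHIs between the loops
carry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Loop distribution (toy): the body is split in two and both copies walk
// the full iteration space [0, n).
inline void
toy_distributed (std::vector<int> &a, std::vector<int> &b)
{
  assert (a.size () == b.size ());
  for (std::size_t i = 0; i < a.size (); ++i)
    a[i] += 1;
  for (std::size_t i = 0; i < b.size (); ++i)
    b[i] *= 2;
}

// Vectorizer-style peeling (toy): the body stays whole and the second
// loop picks up the induction variable where the first one stopped.
inline void
toy_peeled (std::vector<int> &a, std::vector<int> &b)
{
  assert (a.size () == b.size ());
  const std::size_t n = a.size ();
  const std::size_t vf = 4;       // Hypothetical vectorization factor.
  std::size_t i = 0;
  for (; i + vf <= n; i += vf)    // "Main" loop, whole groups of VF.
    for (std::size_t j = i; j < i + vf; ++j)
      {
        a[j] += 1;
        b[j] *= 2;
      }
  for (; i < n; ++i)              // "Epilogue" continues from i.
    {
      a[i] += 1;
      b[i] *= 2;
    }
}
```

Both functions compute the same result; only the way the two loops relate
to each other differs.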

> +
> +   If UPDATED_DOMS is not NULL it is update with the list of basic blocks whoms
> +   dominators were updated during the peeling.  */
>  
>  class loop *
>  slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>  					class loop *scalar_loop,
> -					edge scalar_exit, edge e, edge *new_e)
> +					edge scalar_exit, edge e, edge *new_e,
> +					bool flow_loops)
>  {
>    class loop *new_loop;
>    basic_block *new_bbs, *bbs, *pbbs;
> @@ -1481,6 +1491,8 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>  	    scalar_exit	= exit;
>  	    break;
>  	  }
> +
> +      gcc_assert (scalar_exit);
>      }
>  
>    bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
> @@ -1513,6 +1525,8 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>    exit = loop_exit;
>    basic_block new_preheader = new_bbs[0];
>  
> +  gcc_assert (new_exit);
> +
>    /* Record the new loop exit information.  new_loop doesn't have SCEV data and
>       so we must initialize the exit information.  */
>    if (new_e)
> @@ -1551,6 +1565,19 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>    for (unsigned i = (at_exit ? 0 : 1); i < scalar_loop->num_nodes + 1; i++)
>      rename_variables_in_bb (new_bbs[i], duplicate_outer_loop);
>  
> +  /* Rename the exit uses.  */
> +  for (edge exit : get_loop_exit_edges (new_loop))
> +    for (auto gsi = gsi_start_phis (exit->dest);
> +	 !gsi_end_p (gsi); gsi_next (&gsi))
> +      {
> +	tree orig_def = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), exit);
> +	rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), exit));
> +	if (MAY_HAVE_DEBUG_BIND_STMTS)
> +	  adjust_debug_stmts (orig_def, PHI_RESULT (gsi.phi ()), exit->dest);
> +      }
> +
> +  /* This condition happens when the loop has been versioned. e.g. due to ifcvt
> +     versioning the loop.  */
>    if (scalar_loop != loop)
>      {
>        /* If we copied from SCALAR_LOOP rather than LOOP, SSA_NAMEs from
> @@ -1564,28 +1591,83 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>  						EDGE_SUCC (loop->latch, 0));
>      }
>  
> +  auto loop_exits = get_loop_exit_edges (loop);
> +  auto_vec<basic_block> doms;
> +
>    if (at_exit) /* Add the loop copy at exit.  */
>      {
> -      if (scalar_loop != loop)
> +      if (scalar_loop != loop && new_exit->dest != exit_dest)
>  	{
> -	  gphi_iterator gsi;
>  	  new_exit = redirect_edge_and_branch (new_exit, exit_dest);
> +	  flush_pending_stmts (new_exit);
> +	}
>  
> -	  for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi);
> -	       gsi_next (&gsi))
> -	    {
> -	      gphi *phi = gsi.phi ();
> -	      tree orig_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
> -	      location_t orig_locus
> -		= gimple_phi_arg_location_from_edge (phi, e);
> +      auto_vec <gimple *> new_phis;
> +      hash_map <tree, tree> new_phi_args;
> +      /* First create the empty phi nodes so that when we flush the
> +	 statements they can be filled in.   However because there is no order
> +	 between the PHI nodes in the exits and the loop headers we need to
> +	 order them base on the order of the two headers.  First record the new
> +	 phi nodes.  */
> +      for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
> +	   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
> +	{
> +	  gimple *from_phi = gsi_stmt (gsi_from);
> +	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> +	  gphi *res = create_phi_node (new_res, new_preheader);
> +	  new_phis.safe_push (res);
> +	}
>  
> -	      add_phi_arg (phi, orig_arg, new_exit, orig_locus);
> +      /* Then redirect the edges and flush the changes.  This writes out the new
> +	 SSA names.  */
> +      for (edge exit : loop_exits)

I realize at the moment it's the same, but we are redirecting
multiple exit edges here and from the walk above expect them
all to have the same set of PHI nodes - that looks a bit fragile?
Does this need adjustments later for the early exit vectorization?

This also somewhat confuses the original redirection of 'e', the main
exit with the later (*)

> +	{
> +	  edge e = redirect_edge_and_branch (exit, new_preheader);
> +	  flush_pending_stmts (e);
> +	}
> +
> +      /* Record the new SSA names in the cache so that we can skip materializing
> +	 them again when we fill in the rest of the LCSSA variables.  */
> +      for (auto phi : new_phis)
> +	{
> +	  tree new_arg = gimple_phi_arg (phi, 0)->def;

and here you look at the (for now) single edge we redirected ...

> +	  new_phi_args.put (new_arg, gimple_phi_result (phi));
> +	}
> +
> +      /* Copy the current loop LC PHI nodes between the original loop exit
> +	 block and the new loop header.  This allows us to later split the
> +	 preheader block and still find the right LC nodes.  */
> +      edge latch_new = single_succ_edge (new_preheader);

odd name - the single successor of a loop preheader is the loop
header and the corresponding edge is the loop entry edge, not the latch?
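
For reference, the terminology in miniature.  This is a toy CFG with
integer block ids, not the real basic_block/edge types, and every name in
it is invented for the example.

```cpp
#include <cassert>

// Hypothetical miniature of a natural loop: block indices only.
struct toy_loop
{
  int preheader;
  int header;
  int latch;
};

struct toy_edge
{
  int src;
  int dest;
};

// The entry edge runs preheader -> header.
inline bool
toy_is_entry_edge (const toy_loop &l, const toy_edge &e)
{
  return e.src == l.preheader && e.dest == l.header;
}

// The latch edge is the back edge latch -> header.
inline bool
toy_is_latch_edge (const toy_loop &l, const toy_edge &e)
{
  return e.src == l.latch && e.dest == l.header;
}
```

Under this model the single successor edge of the preheader satisfies
toy_is_entry_edge and never toy_is_latch_edge.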

> +      for (auto gsi_from = gsi_start_phis (loop->header),
> +	   gsi_to = gsi_start_phis (new_loop->header);
> +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);

Eh, can we have

  if (flow_loops)
    for  (auto ...)

please, even if that indents more?

> +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> +	{
> +	  gimple *from_phi = gsi_stmt (gsi_from);
> +	  gimple *to_phi = gsi_stmt (gsi_to);
> +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> +						loop_latch_edge (loop));
> +
> +	  /* Check if we've already created a new phi node during edge
> +	     redirection.  If we have, only propagate the value downwards.  */
> +	  if (tree *res = new_phi_args.get (new_arg))
> +	    {
> +	      adjust_phi_and_debug_stmts (to_phi, latch_new, *res);
> +	      continue;
>  	    }
> +
> +	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> +	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
> +
> +	  /* Main loop exit should use the final iter value.  */
> +	  add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);

For all other edges into the loop besides 'e' there's missing
PHI arguments?  You are using 'e' here again, but also use that as
temporary in for blocks, shadowing the parameter - that makes it
difficult to read.  Also it's sometimes 'e->dest' and sometimes
new_preheader - I think you want to use new_preheader here as well
(in create_phi_node) for consistency and ease of understanding.

ISTR when early break vectorization lands we're going to redirect
the alternate exits away again "fixing" the missing PHI args.

> +
> +	  adjust_phi_and_debug_stmts (to_phi, latch_new, new_res);
>  	}
> -      redirect_edge_and_branch_force (e, new_preheader);
> -      flush_pending_stmts (e);
> +
>        set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
> -      if (was_imm_dom || duplicate_outer_loop)
> +
> +      if ((was_imm_dom || duplicate_outer_loop))

extra ()s

>  	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
>  
>        /* And remove the non-necessary forwarder again.  Keep the other
> @@ -1598,6 +1680,22 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>      }
>    else /* Add the copy at entry.  */
>      {
> +      /* Copy the current loop LC PHI nodes between the original loop exit
> +	 block and the new loop header.  This allows us to later split the
> +	 preheader block and still find the right LC nodes.  */
> +      for (auto gsi_from = gsi_start_phis (new_loop->header),
> +	   gsi_to = gsi_start_phis (loop->header);
> +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);

same if (flow_loops)

> +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> +	{
> +	  gimple *from_phi = gsi_stmt (gsi_from);
> +	  gimple *to_phi = gsi_stmt (gsi_to);
> +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> +						loop_latch_edge (new_loop));

this looks wrong?  IMHO it should be the PHI_RESULT, no?  Note this
only triggers for alignment peeling ...

Otherwise looks OK.

Thanks,
Richard.


> +	  adjust_phi_and_debug_stmts (to_phi, loop_preheader_edge (loop),
> +				      new_arg);
> +	}
> +
>        if (scalar_loop != loop)
>  	{
>  	  /* Remove the non-necessary forwarder of scalar_loop again.  */
> @@ -1627,29 +1725,6 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>  			       loop_preheader_edge (new_loop)->src);
>      }
>  
> -  if (scalar_loop != loop)
> -    {
> -      /* Update new_loop->header PHIs, so that on the preheader
> -	 edge they are the ones from loop rather than scalar_loop.  */
> -      gphi_iterator gsi_orig, gsi_new;
> -      edge orig_e = loop_preheader_edge (loop);
> -      edge new_e = loop_preheader_edge (new_loop);
> -
> -      for (gsi_orig = gsi_start_phis (loop->header),
> -	   gsi_new = gsi_start_phis (new_loop->header);
> -	   !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
> -	   gsi_next (&gsi_orig), gsi_next (&gsi_new))
> -	{
> -	  gphi *orig_phi = gsi_orig.phi ();
> -	  gphi *new_phi = gsi_new.phi ();
> -	  tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
> -	  location_t orig_locus
> -	    = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
> -
> -	  add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
> -	}
> -    }
> -
>    free (new_bbs);
>    free (bbs);
>  
> @@ -2579,139 +2654,36 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
>  
>  /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP,
>     this function searches for the corresponding lcssa phi node in exit
> -   bb of LOOP.  If it is found, return the phi result; otherwise return
> -   NULL.  */
> +   bb of LOOP following the LCSSA_EDGE to the exit node.  If it is found,
> +   return the phi result; otherwise return NULL.  */
>  
>  static tree
>  find_guard_arg (class loop *loop ATTRIBUTE_UNUSED,
>  		class loop *epilog ATTRIBUTE_UNUSED,
> -		const_edge e, gphi *lcssa_phi)
> +		const_edge e, gphi *lcssa_phi, int lcssa_edge = 0)
>  {
>    gphi_iterator gsi;
>  
> -  gcc_assert (single_pred_p (e->dest));
>    for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
>      {
>        gphi *phi = gsi.phi ();
> -      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
> -			   PHI_ARG_DEF (lcssa_phi, 0), 0))
> -	return PHI_RESULT (phi);
> -    }
> -  return NULL_TREE;
> -}
> -
> -/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates FIRST/SECOND
> -   from SECOND/FIRST and puts it at the original loop's preheader/exit
> -   edge, the two loops are arranged as below:
> -
> -       preheader_a:
> -     first_loop:
> -       header_a:
> -	 i_1 = PHI<i_0, i_2>;
> -	 ...
> -	 i_2 = i_1 + 1;
> -	 if (cond_a)
> -	   goto latch_a;
> -	 else
> -	   goto between_bb;
> -       latch_a:
> -	 goto header_a;
> -
> -       between_bb:
> -	 ;; i_x = PHI<i_2>;   ;; LCSSA phi node to be created for FIRST,
> -
> -     second_loop:
> -       header_b:
> -	 i_3 = PHI<i_0, i_4>; ;; Use of i_0 to be replaced with i_x,
> -				 or with i_2 if no LCSSA phi is created
> -				 under condition of CREATE_LCSSA_FOR_IV_PHIS.
> -	 ...
> -	 i_4 = i_3 + 1;
> -	 if (cond_b)
> -	   goto latch_b;
> -	 else
> -	   goto exit_bb;
> -       latch_b:
> -	 goto header_b;
> -
> -       exit_bb:
> -
> -   This function creates loop closed SSA for the first loop; update the
> -   second loop's PHI nodes by replacing argument on incoming edge with the
> -   result of newly created lcssa PHI nodes.  IF CREATE_LCSSA_FOR_IV_PHIS
> -   is false, Loop closed ssa phis will only be created for non-iv phis for
> -   the first loop.
> -
> -   This function assumes exit bb of the first loop is preheader bb of the
> -   second loop, i.e, between_bb in the example code.  With PHIs updated,
> -   the second loop will execute rest iterations of the first.  */
> -
> -static void
> -slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> -				   class loop *first, edge first_loop_e,
> -				   class loop *second, edge second_loop_e,
> -				   bool create_lcssa_for_iv_phis)
> -{
> -  gphi_iterator gsi_update, gsi_orig;
> -  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> -
> -  edge first_latch_e = EDGE_SUCC (first->latch, 0);
> -  edge second_preheader_e = loop_preheader_edge (second);
> -  basic_block between_bb = first_loop_e->dest;
> -
> -  gcc_assert (between_bb == second_preheader_e->src);
> -  gcc_assert (single_pred_p (between_bb) && single_succ_p (between_bb));
> -  /* Either the first loop or the second is the loop to be vectorized.  */
> -  gcc_assert (loop == first || loop == second);
> -
> -  for (gsi_orig = gsi_start_phis (first->header),
> -       gsi_update = gsi_start_phis (second->header);
> -       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
> -       gsi_next (&gsi_orig), gsi_next (&gsi_update))
> -    {
> -      gphi *orig_phi = gsi_orig.phi ();
> -      gphi *update_phi = gsi_update.phi ();
> -
> -      tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
> -      /* Generate lcssa PHI node for the first loop.  */
> -      gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
> -      stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt (vect_phi);
> -      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info))
> +      /* Nested loops with multiple exits can have different no# phi node
> +	arguments between the main loop and epilog as epilog falls to the
> +	second loop.  */
> +      if (gimple_phi_num_args (phi) > e->dest_idx)
>  	{
> -	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
> -	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
> -	  add_phi_arg (lcssa_phi, arg, first_loop_e, UNKNOWN_LOCATION);
> -	  arg = new_res;
> -	}
> -
> -      /* Update PHI node in the second loop by replacing arg on the loop's
> -	 incoming edge.  */
> -      adjust_phi_and_debug_stmts (update_phi, second_preheader_e, arg);
> -    }
> -
> -  /* For epilogue peeling we have to make sure to copy all LC PHIs
> -     for correct vectorization of live stmts.  */
> -  if (loop == first)
> -    {
> -      basic_block orig_exit = second_loop_e->dest;
> -      for (gsi_orig = gsi_start_phis (orig_exit);
> -	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
> -	{
> -	  gphi *orig_phi = gsi_orig.phi ();
> -	  tree orig_arg = PHI_ARG_DEF (orig_phi, 0);
> -	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p  (orig_arg))
> -	    continue;
> -
> -	  const_edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> -	  /* Already created in the above loop.   */
> -	  if (find_guard_arg (first, second, exit_e, orig_phi))
> +	 tree var = PHI_ARG_DEF (phi, e->dest_idx);
> +	 if (TREE_CODE (var) != SSA_NAME)
>  	    continue;
> -
> -	  tree new_res = copy_ssa_name (orig_arg);
> -	  gphi *lcphi = create_phi_node (new_res, between_bb);
> -	  add_phi_arg (lcphi, orig_arg, first_loop_e, UNKNOWN_LOCATION);
> +	 tree def = get_current_def (var);
> +	 if (!def)
> +	   continue;
> +	 if (operand_equal_p (def,
> +			      PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
> +	   return PHI_RESULT (phi);
>  	}
>      }
> +  return NULL_TREE;
>  }
>  
>  /* Function slpeel_add_loop_guard adds guard skipping from the beginning
> @@ -2796,11 +2768,11 @@ slpeel_update_phi_nodes_for_guard1 (class loop *skip_loop,
>      }
>  }
>  
> -/* LOOP and EPILOG are two consecutive loops in CFG and EPILOG is copied
> -   from LOOP.  Function slpeel_add_loop_guard adds guard skipping from a
> -   point between the two loops to the end of EPILOG.  Edges GUARD_EDGE
> -   and MERGE_EDGE are the two pred edges of merge_bb at the end of EPILOG.
> -   The CFG looks like:
> +/* LOOP and EPILOG are two consecutive loops in CFG connected by LOOP_EXIT edge
> +   and EPILOG is copied from LOOP.  Function slpeel_add_loop_guard adds guard
> +   skipping from a point between the two loops to the end of EPILOG.  Edges
> +   GUARD_EDGE and MERGE_EDGE are the two pred edges of merge_bb at the end of
> +   EPILOG.  The CFG looks like:
>  
>       loop:
>         header_a:
> @@ -2851,6 +2823,7 @@ slpeel_update_phi_nodes_for_guard1 (class loop *skip_loop,
>  
>  static void
>  slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> +				    const_edge loop_exit,
>  				    edge guard_edge, edge merge_edge)
>  {
>    gphi_iterator gsi;
> @@ -2859,13 +2832,11 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
>    gcc_assert (single_succ_p (merge_bb));
>    edge e = single_succ_edge (merge_bb);
>    basic_block exit_bb = e->dest;
> -  gcc_assert (single_pred_p (exit_bb));
> -  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
>  
>    for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
>      {
>        gphi *update_phi = gsi.phi ();
> -      tree old_arg = PHI_ARG_DEF (update_phi, 0);
> +      tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
>  
>        tree merge_arg = NULL_TREE;
>  
> @@ -2877,8 +2848,8 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
>        if (!merge_arg)
>  	merge_arg = old_arg;
>  
> -      tree guard_arg
> -	= find_guard_arg (loop, epilog, single_exit (loop), update_phi);
> +      tree guard_arg = find_guard_arg (loop, epilog, loop_exit,
> +				       update_phi, e->dest_idx);
>        /* If the var is live after loop but not a reduction, we simply
>  	 use the old arg.  */
>        if (!guard_arg)
> @@ -2898,21 +2869,6 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
>      }
>  }
>  
> -/* EPILOG loop is duplicated from the original loop for vectorizing,
> -   the arg of its loop closed ssa PHI needs to be updated.  */
> -
> -static void
> -slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
> -{
> -  gphi_iterator gsi;
> -  basic_block exit_bb = single_exit (epilog)->dest;
> -
> -  gcc_assert (single_pred_p (exit_bb));
> -  edge e = EDGE_PRED (exit_bb, 0);
> -  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> -    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
> -}
> -
>  /* LOOP_VINFO is an epilogue loop whose corresponding main loop can be skipped.
>     Return a value that equals:
>  
> @@ -3255,8 +3211,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  						       e, &prolog_e);
>        gcc_assert (prolog);
>        prolog->force_vectorize = false;
> -      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, prolog_e, loop,
> -					 exit_e, true);
> +
>        first_loop = prolog;
>        reset_original_copy_tables ();
>  
> @@ -3336,8 +3291,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>        LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo) = new_epilog_e;
>        gcc_assert (epilog);
>        epilog->force_vectorize = false;
> -      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, e, epilog,
> -					 new_epilog_e, false);
>        bb_before_epilog = loop_preheader_edge (epilog)->src;
>  
>        /* Scalar version loop may be preferred.  In this case, add guard
> @@ -3430,7 +3383,9 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  					   irred_flag);
>  	  if (vect_epilogues)
>  	    epilogue_vinfo->skip_this_loop_edge = guard_e;
> -	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e, epilog_e);
> +	  edge main_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, main_iv, guard_e,
> +					      epilog_e);
>  	  /* Only need to handle basic block before epilog loop if it's not
>  	     the guard_bb, which is the case when skip_vector is true.  */
>  	  if (guard_bb != bb_before_epilog)
> @@ -3441,8 +3396,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  	    }
>  	  scale_loop_profile (epilog, prob_epilog, -1);
>  	}
> -      else
> -	slpeel_update_phi_nodes_for_lcssa (epilog);
>  
>        unsigned HOST_WIDE_INT bound;
>        if (bound_scalar.is_constant (&bound))
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index f1caa5f207d3b13da58c3a313b11d1ef98374349..327cab0f736da7f1bd3e024d666df46ef9208107 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -5877,7 +5877,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>    basic_block exit_bb;
>    tree scalar_dest;
>    tree scalar_type;
> -  gimple *new_phi = NULL, *phi;
> +  gimple *new_phi = NULL, *phi = NULL;
>    gimple_stmt_iterator exit_gsi;
>    tree new_temp = NULL_TREE, new_name, new_scalar_dest;
>    gimple *epilog_stmt = NULL;
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 55b6771b271d5072fa1327d595e1dddb112cfdf6..25ceb6600673d71fd6012443403997e921066483 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2183,7 +2183,7 @@ extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
>  					 const_edge);
>  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
>  						    class loop *, edge,
> -						    edge, edge *);
> +						    edge, edge *, bool = true);
>  class loop *vect_loop_versioning (loop_vec_info, gimple *);
>  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
>  				    tree *, tree *, tree *, int, bool, bool,
> 
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

* RE: [PATCH 1/3]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables
  2023-10-09 13:35 ` [PATCH 1/3]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables Richard Biener
@ 2023-10-11 10:45   ` Tamar Christina
  2023-10-11 12:07     ` Richard Biener
  0 siblings, 1 reply; 12+ messages in thread
From: Tamar Christina @ 2023-10-11 10:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> > @@ -2664,7 +2679,7 @@ slpeel_update_phi_nodes_for_loops
> (loop_vec_info loop_vinfo,
> >       for correct vectorization of live stmts.  */
> >    if (loop == first)
> >      {
> > -      basic_block orig_exit = single_exit (second)->dest;
> > +      basic_block orig_exit = second_loop_e->dest;
> >        for (gsi_orig = gsi_start_phis (orig_exit);
> >  	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
> >  	{
> > @@ -2673,13 +2688,14 @@ slpeel_update_phi_nodes_for_loops
> (loop_vec_info loop_vinfo,
> >  	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p
> (orig_arg))
> >  	    continue;
> >
> > +	  const_edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> >  	  /* Already created in the above loop.   */
> > -	  if (find_guard_arg (first, second, orig_phi))
> > +	  if (find_guard_arg (first, second, exit_e, orig_phi))
> >  	    continue;
> >
> >  	  tree new_res = copy_ssa_name (orig_arg);
> >  	  gphi *lcphi = create_phi_node (new_res, between_bb);
> > -	  add_phi_arg (lcphi, orig_arg, single_exit (first),
> UNKNOWN_LOCATION);
> > +	  add_phi_arg (lcphi, orig_arg, first_loop_e, UNKNOWN_LOCATION);
> >  	}
> >      }
> >  }
> > @@ -2847,7 +2863,8 @@ slpeel_update_phi_nodes_for_guard2 (class loop
> *loop, class loop *epilog,
> >        if (!merge_arg)
> >  	merge_arg = old_arg;
> >
> > -      tree guard_arg = find_guard_arg (loop, epilog, update_phi);
> > +      tree guard_arg
> > +	= find_guard_arg (loop, epilog, single_exit (loop), update_phi);
> 
> missed adjustment?  you are introducing a single_exit call here ...
> 

It's a very temporary one that gets removed in patch 3/3 when I start
passing the rest of the edges down explicitly. It allowed me to split the
patches a bit more.
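
As a rough illustration of what passing the edges down explicitly buys over
calling single_exit in the callee, here is a minimal standalone sketch.  It is
not GCC code: edge_model, loop_model and every function name below are
hypothetical stand-ins chosen for the example.

```cpp
#include <vector>

// Hypothetical stand-ins for GCC's edge and loop; not the real types.
struct edge_model { int src; int dest; };

struct loop_model {
  std::vector<edge_model> exits;
};

// Models single_exit (): only meaningful when the loop has exactly one exit.
const edge_model *single_exit_model (const loop_model &loop)
{
  return loop.exits.size () == 1 ? &loop.exits[0] : nullptr;
}

// Callee that derives the exit itself; it has no answer for multi-exit loops.
int exit_dest_implicit (const loop_model &loop)
{
  const edge_model *e = single_exit_model (loop);
  return e ? e->dest : -1;
}

// Callee that is told which exit to use; the caller, which knows which loop
// and which exit it is working on, makes the choice.
int exit_dest_explicit (const loop_model &, const edge_model &exit)
{
  return exit.dest;
}
```

With the explicit form the callee no longer depends on the loop having exactly
one exit, so the temporary single_exit call can move to the caller and then
disappear once the caller has the right edge at hand.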

> >        /* If the var is live after loop but not a reduction, we simply
> >  	 use the old arg.  */
> >        if (!guard_arg)
> > @@ -3201,27 +3218,37 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> tree niters, tree nitersm1,
> >      }
> >
> >    if (vect_epilogues)
> > -    /* Make sure to set the epilogue's epilogue scalar loop, such that we can
> > -       use the original scalar loop as remaining epilogue if necessary.  */
> > -    LOOP_VINFO_SCALAR_LOOP (epilogue_vinfo)
> > -      = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> > +    {
> > +      /* Make sure to set the epilogue's epilogue scalar loop, such that we can
> > +	 use the original scalar loop as remaining epilogue if necessary.  */
> > +      LOOP_VINFO_SCALAR_LOOP (epilogue_vinfo)
> > +	= LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> > +      LOOP_VINFO_SCALAR_IV_EXIT (epilogue_vinfo)
> > +	= LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
> > +    }
> >
> >    if (prolog_peeling)
> >      {
> >        e = loop_preheader_edge (loop);
> > -      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
> > +      edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > +      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, exit_e,
> > + e));
> >
> >        /* Peel prolog and put it on preheader edge of loop.  */
> > -      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e);
> > +      edge scalar_e = LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
> > +      edge prolog_e = NULL;
> > +      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, exit_e,
> > +						       scalar_loop, scalar_e,
> > +						       e, &prolog_e);
> >        gcc_assert (prolog);
> >        prolog->force_vectorize = false;
> > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
> > +      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, prolog_e, loop,
> > +					 exit_e, true);
> >        first_loop = prolog;
> >        reset_original_copy_tables ();
> >
> >        /* Update the number of iterations for prolog loop.  */
> >        tree step_prolog = build_one_cst (TREE_TYPE (niters_prolog));
> > -      vect_set_loop_condition (prolog, NULL, niters_prolog,
> > +      vect_set_loop_condition (prolog, prolog_e, loop_vinfo,
> > + niters_prolog,
> >  			       step_prolog, NULL_TREE, false);
> >
> >        /* Skip the prolog loop.  */
> > @@ -3275,8 +3302,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> > niters, tree nitersm1,
> >
> >    if (epilog_peeling)
> >      {
> > -      e = single_exit (loop);
> > -      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
> > +      e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > +      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e, e));
> >
> >        /* Peel epilog and put it on exit edge of loop.  If we are vectorizing
> >  	 said epilog then we should use a copy of the main loop as a
> > starting @@ -3285,12 +3312,18 @@ vect_do_peeling (loop_vec_info
> loop_vinfo, tree niters, tree nitersm1,
> >  	 If we are not vectorizing the epilog then we should use the scalar loop
> >  	 as the transformations mentioned above make less or no sense when
> not
> >  	 vectorizing.  */
> > +      edge scalar_e = LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
> >        epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
> > -      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e);
> > +      edge epilog_e = vect_epilogues ? e : scalar_e;
> > +      edge new_epilog_e = NULL;
> > +      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, e, epilog,
> > +						       epilog_e, e,
> > +						       &new_epilog_e);
> > +      LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo) = new_epilog_e;
> >        gcc_assert (epilog);
> > -
> >        epilog->force_vectorize = false;
> > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
> > +      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, e, epilog,
> > +					 new_epilog_e, false);
> >        bb_before_epilog = loop_preheader_edge (epilog)->src;
> >
> >        /* Scalar version loop may be preferred.  In this case, add
> > guard @@ -3374,16 +3407,16 @@ vect_do_peeling (loop_vec_info
> loop_vinfo, tree niters, tree nitersm1,
> >  	{
> >  	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> >  				    niters, niters_vector_mult_vf);
> > -	  guard_bb = single_exit (loop)->dest;
> > -	  guard_to = split_edge (single_exit (epilog));
> > +	  guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > +	  edge epilog_e = LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo);
> > +	  guard_to = split_edge (epilog_e);
> >  	  guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to,
> >  					   skip_vector ? anchor : guard_bb,
> >  					   prob_epilog.invert (),
> >  					   irred_flag);
> >  	  if (vect_epilogues)
> >  	    epilogue_vinfo->skip_this_loop_edge = guard_e;
> > -	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
> > -					      single_exit (epilog));
> > +	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
> > +epilog_e);
> >  	  /* Only need to handle basic block before epilog loop if it's not
> >  	     the guard_bb, which is the case when skip_vector is true.  */
> >  	  if (guard_bb != bb_before_epilog)
> > @@ -3416,6 +3449,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
> >      {
> >        epilog->aux = epilogue_vinfo;
> >        LOOP_VINFO_LOOP (epilogue_vinfo) = epilog;
> > +      LOOP_VINFO_IV_EXIT (epilogue_vinfo)
> > +	= LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo);
> >
> >        loop_constraint_clear (epilog, LOOP_C_INFINITE);
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index
> >
> 23c6e8259e7b133cd7acc6bcf0bad26423e9993a..6e60d84143626a8e1d80
> 1bb580f4
> > dcebc73c7ba7 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -855,10 +855,9 @@ vect_fixup_scalar_cycles_with_patterns
> > (loop_vec_info loop_vinfo)
> >
> >
> >  static gcond *
> > -vect_get_loop_niters (class loop *loop, tree *assumptions,
> > +vect_get_loop_niters (class loop *loop, edge exit, tree *assumptions,
> >  		      tree *number_of_iterations, tree
> *number_of_iterationsm1)  {
> > -  edge exit = single_exit (loop);
> >    class tree_niter_desc niter_desc;
> >    tree niter_assumptions, niter, may_be_zero;
> >    gcond *cond = get_loop_exit_condition (loop); @@ -927,6 +926,20 @@
> > vect_get_loop_niters (class loop *loop, tree *assumptions,
> >    return cond;
> >  }
> >
> > +/*  Determine the main loop exit for the vectorizer.  */
> > +
> > +edge
> 
> can't this be 'static'?

No, since it's also used by set_uid_loop_bbs, which sets the scalar loop it
gets from get_loop.

If I understand correctly, the loop expected there is the ifcvt loop?  If
that's the case I may be able to match it up through ->aux again, but since
set_uid_loop_bbs isn't called often I figured I can just re-analyze.
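
For reference, the ->aux matching alternative could look roughly like the
standalone sketch below.  It only models the idea of tagging each exit with its
index and looking the tag up again in the other copy of the loop; edge_model
and both function names are hypothetical and not part of GCC.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for GCC's edge; only the aux field matters here.
struct edge_model {
  int dest;
  void *aux;
};

// Tag each exit with its position in the exit list.
void record_exit_indices (std::vector<edge_model> &exits)
{
  std::uintptr_t idx = 0;
  for (edge_model &e : exits)
    e.aux = reinterpret_cast<void *> (idx++);
}

// Find the exit in OTHER_EXITS carrying the same tag as EXIT, or return
// nullptr when no exit carries that tag.
const edge_model *
match_exit_by_aux (const std::vector<edge_model> &other_exits,
		   const edge_model &exit)
{
  for (const edge_model &e : other_exits)
    if (e.aux == exit.aux)
      return &e;
  return nullptr;
}
```

The sketch assumes the tags are recorded before the loop is copied, so that
both copies carry the same indices.  Re-analysing instead avoids relying on
that ordering, at the cost of running the exit analysis a second time.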

Regards,
Tamar

> 
> > +vec_init_loop_exit_info (class loop *loop)
> > +{
> > +  /* Before we begin we must first determine which exit is the main one and
> > +     which are auxilary exits.  */
> > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > +  if (exits.length () == 1)
> > +    return exits[0];
> > +  else
> > +    return NULL;
> > +}
> > +
> >  /* Function bb_in_loop_p
> >
> >     Used as predicate for dfs order traversal of the loop bbs.  */ @@
> > -987,7 +1000,10 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in,
> vec_info_shared *shared)
> >      has_mask_store (false),
> >      scalar_loop_scaling (profile_probability::uninitialized ()),
> >      scalar_loop (NULL),
> > -    orig_loop_info (NULL)
> > +    orig_loop_info (NULL),
> > +    vec_loop_iv (NULL),
> > +    vec_epilogue_loop_iv (NULL),
> > +    scalar_loop_iv (NULL)
> >  {
> >    /* CHECKME: We want to visit all BBs before their successors (except for
> >       latch blocks, for which this assertion wouldn't hold).  In the
> > simple @@ -1646,6 +1662,18 @@ vect_analyze_loop_form (class loop
> > *loop, vect_loop_form_info *info)  {
> >    DUMP_VECT_SCOPE ("vect_analyze_loop_form");
> >
> > +  edge exit_e = vec_init_loop_exit_info (loop);
> > +  if (!exit_e)
> > +    return opt_result::failure_at (vect_location,
> > +				   "not vectorized:"
> > +				   " could not determine main exit from"
> > +				   " loop with multiple exits.\n");
> > +  info->loop_exit = exit_e;
> > +  if (dump_enabled_p ())
> > +      dump_printf_loc (MSG_NOTE, vect_location,
> > +		       "using as main loop exit: %d -> %d [AUX: %p]\n",
> > +		       exit_e->src->index, exit_e->dest->index, exit_e->aux);
> > +
> >    /* Different restrictions apply when we are considering an inner-most loop,
> >       vs. an outer (nested) loop.
> >       (FORNOW. May want to relax some of these restrictions in the
> > future).  */ @@ -1767,7 +1795,7 @@ vect_analyze_loop_form (class loop
> *loop, vect_loop_form_info *info)
> >  				   " abnormal loop exit edge.\n");
> >
> >    info->loop_cond
> > -    = vect_get_loop_niters (loop, &info->assumptions,
> > +    = vect_get_loop_niters (loop, e, &info->assumptions,
> >  			    &info->number_of_iterations,
> >  			    &info->number_of_iterationsm1);
> >    if (!info->loop_cond)
> > @@ -1821,6 +1849,9 @@ vect_create_loop_vinfo (class loop *loop,
> > vec_info_shared *shared,
> >
> >    stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (info-
> >loop_cond);
> >    STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
> > +
> > +  LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
> > +
> >    if (info->inner_loop_cond)
> >      {
> >        stmt_vec_info inner_loop_cond_info @@ -3063,9 +3094,9 @@
> > start_over:
> >        if (dump_enabled_p ())
> >          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
> >        if (!vect_can_advance_ivs_p (loop_vinfo)
> > -	  || !slpeel_can_duplicate_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> > -					   single_exit (LOOP_VINFO_LOOP
> > -							 (loop_vinfo))))
> > +	  || !slpeel_can_duplicate_loop_p (loop,
> > +					   LOOP_VINFO_IV_EXIT (loop_vinfo),
> > +					   LOOP_VINFO_IV_EXIT (loop_vinfo)))
> >          {
> >  	  ok = opt_result::failure_at (vect_location,
> >  				       "not vectorized: can't create required "
> > @@ -6002,7 +6033,7 @@ vect_create_epilog_for_reduction (loop_vec_info
> loop_vinfo,
> >           Store them in NEW_PHIS.  */
> >    if (double_reduc)
> >      loop = outer_loop;
> > -  exit_bb = single_exit (loop)->dest;
> > +  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> >    exit_gsi = gsi_after_labels (exit_bb);
> >    reduc_inputs.create (slp_node ? vec_num : ncopies);
> >    for (unsigned i = 0; i < vec_num; i++) @@ -6018,7 +6049,7 @@
> > vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> >  	  phi = create_phi_node (new_def, exit_bb);
> >  	  if (j)
> >  	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
> > -	  SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def);
> > +	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)-
> >dest_idx,
> > +def);
> >  	  new_def = gimple_convert (&stmts, vectype, new_def);
> >  	  reduc_inputs.quick_push (new_def);
> >  	}
> > @@ -10416,12 +10447,12 @@ vectorizable_live_operation (vec_info
> *vinfo, stmt_vec_info stmt_info,
> >  	   lhs' = new_tree;  */
> >
> >        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > -      basic_block exit_bb = single_exit (loop)->dest;
> > +      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> >        gcc_assert (single_pred_p (exit_bb));
> >
> >        tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> >        gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> > -      SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, vec_lhs);
> > +      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT
> > + (loop_vinfo)->dest_idx, vec_lhs);
> >
> >        gimple_seq stmts = NULL;
> >        tree new_tree;
> > @@ -10965,7 +10996,7 @@ vect_get_loop_len (loop_vec_info loop_vinfo,
> gimple_stmt_iterator *gsi,
> >     profile.  */
> >
> >  static void
> > -scale_profile_for_vect_loop (class loop *loop, unsigned vf, bool
> > flat)
> > +scale_profile_for_vect_loop (class loop *loop, edge exit_e, unsigned
> > +vf, bool flat)
> >  {
> >    /* For flat profiles do not scale down proportionally by VF and only
> >       cap by known iteration count bounds.  */ @@ -10980,7 +11011,6 @@
> > scale_profile_for_vect_loop (class loop *loop, unsigned vf, bool flat)
> >        return;
> >      }
> >    /* Loop body executes VF fewer times and exit increases VF times.
> > */
> > -  edge exit_e = single_exit (loop);
> >    profile_count entry_count = loop_preheader_edge (loop)->count ();
> >
> >    /* If we have unreliable loop profile avoid dropping entry @@
> > -11350,7 +11380,7 @@ vect_transform_loop (loop_vec_info loop_vinfo,
> > gimple *loop_vectorized_call)
> >
> >    /* Make sure there exists a single-predecessor exit bb.  Do this before
> >       versioning.   */
> > -  edge e = single_exit (loop);
> > +  edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> >    if (! single_pred_p (e->dest))
> >      {
> >        split_loop_exit_edge (e, true); @@ -11376,7 +11406,7 @@
> > vect_transform_loop (loop_vec_info loop_vinfo, gimple
> *loop_vectorized_call)
> >       loop closed PHI nodes on the exit.  */
> >    if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
> >      {
> > -      e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
> > +      e = LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
> >        if (! single_pred_p (e->dest))
> >  	{
> >  	  split_loop_exit_edge (e, true);
> > @@ -11625,8 +11655,9 @@ vect_transform_loop (loop_vec_info
> loop_vinfo, gimple *loop_vectorized_call)
> >       a zero NITERS becomes a nonzero NITERS_VECTOR.  */
> >    if (integer_onep (step_vector))
> >      niters_no_overflow = true;
> > -  vect_set_loop_condition (loop, loop_vinfo, niters_vector, step_vector,
> > -			   niters_vector_mult_vf, !niters_no_overflow);
> > +  vect_set_loop_condition (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
> loop_vinfo,
> > +			   niters_vector, step_vector, niters_vector_mult_vf,
> > +			   !niters_no_overflow);
> >
> >    unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo);
> >
> > @@ -11699,7 +11730,8 @@ vect_transform_loop (loop_vec_info
> loop_vinfo, gimple *loop_vectorized_call)
> >  			  assumed_vf) - 1
> >  	 : wi::udiv_floor (loop->nb_iterations_estimate + bias_for_assumed,
> >  			   assumed_vf) - 1);
> > -  scale_profile_for_vect_loop (loop, assumed_vf, flat);
> > +  scale_profile_for_vect_loop (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
> > +			       assumed_vf, flat);
> >
> >    if (dump_enabled_p ())
> >      {
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index
> >
> f1d0cd79961abb095bc79d3b59a81930f0337e59..afa7a8e30891c782a0e5e
> 3740ecc
> > 4377f5a31e54 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -919,10 +919,24 @@ public:
> >       analysis.  */
> >    vec<_loop_vec_info *> epilogue_vinfos;
> >
> > +  /* The controlling loop IV for the current loop when vectorizing.  This IV
> > +     controls the natural exits of the loop.  */
> > +  edge vec_loop_iv;
> > +
> > +  /* The controlling loop IV for the epilogue loop when vectorizing.  This IV
> > +     controls the natural exits of the loop.  */
> > +  edge vec_epilogue_loop_iv;
> > +
> > +  /* The controlling loop IV for the scalar loop being vectorized.  This IV
> > +     controls the natural exits of the loop.  */
> > +  edge scalar_loop_iv;
> 
> all of the above sound as if they were IVs, the access macros have _EXIT at the
> end, can you make the above as well?
> 
> Otherwise looks good to me.
> 
> Feel free to push approved patches of the series, no need to wait until
> everything is approved.
> 
> Thanks,
> Richard.
> 
> >  } *loop_vec_info;
> >
> >  /* Access Functions.  */
> >  #define LOOP_VINFO_LOOP(L)                 (L)->loop
> > +#define LOOP_VINFO_IV_EXIT(L)              (L)->vec_loop_iv
> > +#define LOOP_VINFO_EPILOGUE_IV_EXIT(L)     (L)->vec_epilogue_loop_iv
> > +#define LOOP_VINFO_SCALAR_IV_EXIT(L)       (L)->scalar_loop_iv
> >  #define LOOP_VINFO_BBS(L)                  (L)->bbs
> >  #define LOOP_VINFO_NITERSM1(L)             (L)->num_itersm1
> >  #define LOOP_VINFO_NITERS(L)               (L)->num_iters
> > @@ -2155,11 +2169,13 @@ class auto_purge_vect_location
> >
> >  /* Simple loop peeling and versioning utilities for vectorizer's purposes -
> >     in tree-vect-loop-manip.cc.  */
> > -extern void vect_set_loop_condition (class loop *, loop_vec_info,
> > +extern void vect_set_loop_condition (class loop *, edge,
> > +loop_vec_info,
> >  				     tree, tree, tree, bool);
> > -extern bool slpeel_can_duplicate_loop_p (const class loop *,
> > const_edge); -class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class
> loop *,
> > -						     class loop *, edge);
> > +extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
> > +					 const_edge);
> > +class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
> > +						    class loop *, edge,
> > +						    edge, edge *);
> >  class loop *vect_loop_versioning (loop_vec_info, gimple *);  extern
> > class loop *vect_do_peeling (loop_vec_info, tree, tree,
> >  				    tree *, tree *, tree *, int, bool, bool, @@ -
> 2169,6 +2185,7
> > @@ extern void vect_prepare_for_masked_peels (loop_vec_info);  extern
> > dump_user_location_t find_loop_location (class loop *);  extern bool
> > vect_can_advance_ivs_p (loop_vec_info);  extern void
> > vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
> > +extern edge vec_init_loop_exit_info (class loop *);
> >
> >  /* In tree-vect-stmts.cc.  */
> >  extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
> > @@ -2358,6 +2375,7 @@ struct vect_loop_form_info
> >    tree assumptions;
> >    gcond *loop_cond;
> >    gcond *inner_loop_cond;
> > +  edge loop_exit;
> >  };
> >  extern opt_result vect_analyze_loop_form (class loop *,
> > vect_loop_form_info *);  extern loop_vec_info vect_create_loop_vinfo
> > (class loop *, vec_info_shared *, diff --git a/gcc/tree-vectorizer.cc
> > b/gcc/tree-vectorizer.cc index
> >
> a048e9d89178a37455bd7b83ab0f2a238a4ce69e..d97e2b54c25ac6037893
> 5392aa7b
> > 73476efed74b 100644
> > --- a/gcc/tree-vectorizer.cc
> > +++ b/gcc/tree-vectorizer.cc
> > @@ -943,6 +943,8 @@ set_uid_loop_bbs (loop_vec_info loop_vinfo,
> gimple *loop_vectorized_call,
> >    class loop *scalar_loop = get_loop (fun, tree_to_shwi (arg));
> >
> >    LOOP_VINFO_SCALAR_LOOP (loop_vinfo) = scalar_loop;
> > +  LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo)
> > +    = vec_init_loop_exit_info (scalar_loop);
> >    gcc_checking_assert (vect_loop_vectorized_call (scalar_loop)
> >  		       == loop_vectorized_call);
> >    /* If we are going to vectorize outer loop, prevent vectorization
> >
> >
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)

* RE: [PATCH 2/3]middle-end: updated niters analysis to handle multiple exits.
  2023-10-10 11:13   ` Richard Biener
@ 2023-10-11 10:54     ` Tamar Christina
  2023-10-11 12:08       ` Richard Biener
  0 siblings, 1 reply; 12+ messages in thread
From: Tamar Christina @ 2023-10-11 10:54 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Tuesday, October 10, 2023 12:14 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: Re: [PATCH 2/3]middle-end: updated niters analysis to handle
> multiple exits.
> 
> On Mon, 2 Oct 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This second part updates niters analysis to be able to analyze any
> > number of exits.  If we have multiple exits we determine the main exit
> > by finding the first counting IV.
> >
> > The change allows the vectorizer to pass analysis for multiple loops,
> > but we later gracefully reject them.  It does however allow us to test
> > if the exit handling is using the right exit everywhere.
> >
> > Additionally since we analyze all exits, we now return all conditions
> > for them and determine which condition belongs to the main exit.
> >
> > The main condition is needed because the vectorizer needs to ignore
> > the main IV condition during vectorization as it will replace it during codegen.
> >
> > To track versioned loops we extend the contract between ifcvt and the
> > vectorizer to store the exit number in aux so that we can match it up again
> during peeling.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-linux-gnu,
> > and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-if-conv.cc (tree_if_conversion): Record exits in aux.
> > 	* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
> Use
> > 	it.
> > 	* tree-vect-loop.cc (vect_get_loop_niters): Determine main exit.
> > 	(vec_init_loop_exit_info): Extend analysis when multiple exits.
> > 	(vect_analyze_loop_form): Record conds and determine main cond.
> > 	(vect_create_loop_vinfo): Extend bookkeeping of conds.
> > 	(vect_analyze_loop): Release conds.
> > 	* tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS,
> > 	LOOP_VINFO_LOOP_IV_COND):  New.
> > 	(struct vect_loop_form_info): Add conds, alt_loop_conds;
> > 	(struct loop_vec_info): Add conds, loop_iv_cond.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index
> >
> 799f071965e5c41eb352b5530cf1d9c7ecf7bf25..3dc2290467797ebbfcef55
> 903531
> > b22829f4fdbd 100644
> > --- a/gcc/tree-if-conv.cc
> > +++ b/gcc/tree-if-conv.cc
> > @@ -3795,6 +3795,13 @@ tree_if_conversion (class loop *loop,
> vec<gimple *> *preds)
> >      }
> >    if (need_to_ifcvt)
> >      {
> > +      /* Before we rewrite edges we'll record their original position in the
> > +	 edge map such that we can map the edges between the ifcvt and the
> > +	 non-ifcvt loop during peeling.  */
> > +      uintptr_t idx = 0;
> > +      for (edge exit : get_loop_exit_edges (loop))
> > +	exit->aux = (void*)idx++;
> > +
> >        /* Now all statements are if-convertible.  Combine all the basic
> >  	 blocks into one huge basic block doing the if-conversion
> >  	 on-the-fly.  */
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index
> >
> e06717272aafc6d31cbdcb94840ac25de616da6d..77f8e668bcc8beca99ba4
> 052e1b1
> > 2e0d17300262 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -1470,6 +1470,18 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class
> loop *loop, edge loop_exit,
> >        scalar_loop = loop;
> >        scalar_exit = loop_exit;
> >      }
> > +  else if (scalar_loop == loop)
> > +    scalar_exit = loop_exit;
> > +  else
> > +    {
> > +      /* Loop has been version, match exits up using the aux index.  */
> > +      for (edge exit : get_loop_exit_edges (scalar_loop))
> > +	if (exit->aux == loop_exit->aux)
> > +	  {
> > +	    scalar_exit	= exit;
> > +	    break;
> > +	  }
> > +    }
> >
> >    bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
> >    pbbs = bbs + 1;
> > @@ -1501,6 +1513,8 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class
> loop *loop, edge loop_exit,
> >    exit = loop_exit;
> >    basic_block new_preheader = new_bbs[0];
> >
> > +  /* Record the new loop exit information.  new_loop doesn't have SCEV
> data and
> > +     so we must initialize the exit information.  */
> >    if (new_e)
> >      *new_e = new_exit;
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index
> >
> 6e60d84143626a8e1d801bb580f4dcebc73c7ba7..f1caa5f207d3b13da58c3
> a313b11
> > d1ef98374349 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -851,79 +851,106 @@ vect_fixup_scalar_cycles_with_patterns
> (loop_vec_info loop_vinfo)
> >     in NUMBER_OF_ITERATIONSM1.  Place the condition under which the
> >     niter information holds in ASSUMPTIONS.
> >
> > -   Return the loop exit condition.  */
> > +   Return the loop exit conditions.  */
> >
> >
> > -static gcond *
> > -vect_get_loop_niters (class loop *loop, edge exit, tree *assumptions,
> > +static vec<gcond *>
> > +vect_get_loop_niters (class loop *loop, tree *assumptions, const_edge
> > +main_exit,
> >  		      tree *number_of_iterations, tree
> *number_of_iterationsm1)
> 
> Any reason you swap exit and main_exit?  IMHO the input better pairs with
> the other input 'loop'.
> 

No, I think I was just rearranging things to fit more on a line.  I'll put them next
to their exits.

> 
> >  {
> > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > +  vec<gcond *> conds;
> > +  conds.create (exits.length ());
> >    class tree_niter_desc niter_desc;
> >    tree niter_assumptions, niter, may_be_zero;
> > -  gcond *cond = get_loop_exit_condition (loop);
> >
> >    *assumptions = boolean_true_node;
> >    *number_of_iterationsm1 = chrec_dont_know;
> >    *number_of_iterations = chrec_dont_know;
> > +
> >    DUMP_VECT_SCOPE ("get_loop_niters");
> >
> > -  if (!exit)
> > -    return cond;
> > +  if (exits.is_empty ())
> > +    return conds;
> > +
> > +  if (dump_enabled_p ())
> > +    dump_printf_loc (MSG_NOTE, vect_location, "Loop has %d exits.\n",
> > +		     exits.length ());
> > +
> > +  edge exit;
> > +  unsigned int i;
> > +  FOR_EACH_VEC_ELT (exits, i, exit)
> > +    {
> > +      gcond *cond = get_loop_exit_condition (exit);
> > +      if (cond)
> > +	conds.safe_push (cond);
> > +
> > +      if (dump_enabled_p ())
> > +	dump_printf_loc (MSG_NOTE, vect_location, "Analyzing exit %d...\n",
> > +i);
> >
> > -  may_be_zero = NULL_TREE;
> > -  if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc,
> NULL)
> > -      || chrec_contains_undetermined (niter_desc.niter))
> > -    return cond;
> > +      may_be_zero = NULL_TREE;
> > +      if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc,
> NULL)
> > +          || chrec_contains_undetermined (niter_desc.niter))
> > +	continue;
> >
> > -  niter_assumptions = niter_desc.assumptions;
> > -  may_be_zero = niter_desc.may_be_zero;
> > -  niter = niter_desc.niter;
> > +      niter_assumptions = niter_desc.assumptions;
> > +      may_be_zero = niter_desc.may_be_zero;
> > +      niter = niter_desc.niter;
> >
> > -  if (may_be_zero && integer_zerop (may_be_zero))
> > -    may_be_zero = NULL_TREE;
> > +      if (may_be_zero && integer_zerop (may_be_zero))
> > +	may_be_zero = NULL_TREE;
> >
> > -  if (may_be_zero)
> > -    {
> > -      if (COMPARISON_CLASS_P (may_be_zero))
> > +      if (may_be_zero)
> >  	{
> > -	  /* Try to combine may_be_zero with assumptions, this can simplify
> > -	     computation of niter expression.  */
> > -	  if (niter_assumptions && !integer_nonzerop (niter_assumptions))
> > -	    niter_assumptions = fold_build2 (TRUTH_AND_EXPR,
> boolean_type_node,
> > -					     niter_assumptions,
> > -					     fold_build1 (TRUTH_NOT_EXPR,
> > -							  boolean_type_node,
> > -							  may_be_zero));
> > +	  if (COMPARISON_CLASS_P (may_be_zero))
> > +	    {
> > +	      /* Try to combine may_be_zero with assumptions, this can simplify
> > +		 computation of niter expression.  */
> > +	      if (niter_assumptions && !integer_nonzerop (niter_assumptions))
> > +		niter_assumptions = fold_build2 (TRUTH_AND_EXPR,
> boolean_type_node,
> > +						 niter_assumptions,
> > +						 fold_build1
> (TRUTH_NOT_EXPR,
> > +
> boolean_type_node,
> > +							      may_be_zero));
> > +	      else
> > +		niter = fold_build3 (COND_EXPR, TREE_TYPE (niter),
> may_be_zero,
> > +				     build_int_cst (TREE_TYPE (niter), 0),
> > +				     rewrite_to_non_trapping_overflow (niter));
> > +
> > +	      may_be_zero = NULL_TREE;
> > +	    }
> > +	  else if (integer_nonzerop (may_be_zero) && exit == main_exit)
> > +	    {
> > +	      *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
> > +	      *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
> > +	      continue;
> > +	    }
> >  	  else
> > -	    niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
> > -				 build_int_cst (TREE_TYPE (niter), 0),
> > -				 rewrite_to_non_trapping_overflow (niter));
> > +	    continue;
> > +       }
> >
> > -	  may_be_zero = NULL_TREE;
> > -	}
> > -      else if (integer_nonzerop (may_be_zero))
> > +      /* Loop assumptions are based off the normal exit.  */
> > +      if (exit == main_exit)
> 
> > It's a bit hard to follow in patch form, but I wonder why you even analyze the
> > number of iterations of the non-main exits, risking clobbering the
> > *number_* outputs, which we later assume to be for the main exit?
> 

My original reasoning was that if we can't analyze the other exits, we probably
can't vectorize them either.  So I don't really need the results, but I thought it
would be useful to check.  I can skip them.
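
As a rough illustration of what skipping them could look like, here is a
minimal standalone sketch.  It is not the GCC code: Exit, NitersResult and
get_loop_niters are hypothetical stand-ins for edge, the niter outputs and
vect_get_loop_niters.  Every exit still contributes its condition, but only
the main exit is allowed to write the iteration counts.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-in for a loop exit edge; not a GCC type.
struct Exit {
  int id;                           // identifies the exit edge
  bool has_cond;                    // an exit condition was found
  std::optional<long> latch_iters;  // latch executions, if analyzable
};

// Hypothetical stand-in for the outputs of the niter analysis.
struct NitersResult {
  std::vector<int> conds;        // ids of exits that have a condition
  std::optional<long> nitersm1;  // latch executions of the main exit
  std::optional<long> niters;    // header executions (latch + 1)
};

NitersResult get_loop_niters(const std::vector<Exit>& exits, int main_exit_id) {
  NitersResult r;
  for (const Exit& e : exits) {
    // Conditions are collected for all exits.
    if (e.has_cond)
      r.conds.push_back(e.id);

    // Alternate exits never touch the niter outputs.
    if (e.id != main_exit_id)
      continue;

    // Main exit: leave the outputs unknown if the analysis failed.
    if (!e.latch_iters)
      continue;

    r.nitersm1 = *e.latch_iters;
    r.niters = *e.latch_iters + 1;
  }
  return r;
}
```

With this shape an alternate exit that happens to be analyzable cannot
clobber the counts that are later assumed to belong to the main exit.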

Thanks,
Tamar

> >  	{
> > -	  *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
> > -	  *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
> > -	  return cond;
> > +	  *assumptions = niter_assumptions;
> > +	  *number_of_iterationsm1 = niter;
> > +
> > +	  /* We want the number of loop header executions which is the
> number
> > +	     of latch executions plus one.
> > +	     ???  For UINT_MAX latch executions this number overflows to zero
> > +	     for loops like do { n++; } while (n != 0);  */
> > +	  if (niter && !chrec_contains_undetermined (niter))
> > +	    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
> > +				 unshare_expr (niter),
> > +				 build_int_cst (TREE_TYPE (niter), 1));
> > +	  *number_of_iterations = niter;
> >  	}
> > -      else
> > -	return cond;
> >      }
> >
> > -  *assumptions = niter_assumptions;
> > -  *number_of_iterationsm1 = niter;
> > -
> > -  /* We want the number of loop header executions which is the number
> > -     of latch executions plus one.
> > -     ???  For UINT_MAX latch executions this number overflows to zero
> > -     for loops like do { n++; } while (n != 0);  */
> > -  if (niter && !chrec_contains_undetermined (niter))
> > -    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), unshare_expr (niter),
> > -			  build_int_cst (TREE_TYPE (niter), 1));
> > -  *number_of_iterations = niter;
> > +  if (dump_enabled_p ())
> > +    dump_printf_loc (MSG_NOTE, vect_location,
> > +		     "All loop exits successfully analyzed.\n");
> >
> > -  return cond;
> > +  return conds;
> >  }
> >
> >  /*  Determine the main loop exit for the vectorizer.  */
> > @@ -936,8 +963,25 @@ vec_init_loop_exit_info (class loop *loop)
> >    auto_vec<edge> exits = get_loop_exit_edges (loop);
> >    if (exits.length () == 1)
> >      return exits[0];
> > -  else
> > -    return NULL;
> > +
> > +  /* If we have multiple exits we only support counting IV at the moment.
> Analyze
> > +     all exits and return one */
> > +  class tree_niter_desc niter_desc;
> > +  edge candidate = NULL;
> > +  for (edge exit : exits)
> > +    {
> > +      if (!get_loop_exit_condition (exit))
> > +	continue;
> > +
> > +      if (number_of_iterations_exit_assumptions (loop, exit, &niter_desc,
> NULL)
> > +	  && !chrec_contains_undetermined (niter_desc.niter))
> > +	{
> > +	  if (!niter_desc.may_be_zero || !candidate)
> > +	    candidate = exit;
> > +	}
> > +    }
> > +
> > +  return candidate;
> >  }
> >
> >  /* Function bb_in_loop_p
> > @@ -1788,21 +1832,31 @@ vect_analyze_loop_form (class loop *loop,
> vect_loop_form_info *info)
> >  				   "not vectorized: latch block not empty.\n");
> >
> >    /* Make sure the exit is not abnormal.  */
> > -  edge e = single_exit (loop);
> > -  if (e->flags & EDGE_ABNORMAL)
> > +  if (exit_e->flags & EDGE_ABNORMAL)
> >      return opt_result::failure_at (vect_location,
> >  				   "not vectorized:"
> >  				   " abnormal loop exit edge.\n");
> >
> > -  info->loop_cond
> > -    = vect_get_loop_niters (loop, e, &info->assumptions,
> > +  info->conds
> > +    = vect_get_loop_niters (loop, &info->assumptions, exit_e,
> >  			    &info->number_of_iterations,
> >  			    &info->number_of_iterationsm1);
> > -  if (!info->loop_cond)
> > +
> > +  if (info->conds.is_empty ())
> >      return opt_result::failure_at
> >        (vect_location,
> >         "not vectorized: complicated exit condition.\n");
> >
> > +  /* Determine what the primary and alternate exit conds are.  */
> > +  info->alt_loop_conds.create (info->conds.length () - 1);
> > +  for (gcond *cond : info->conds)
> > +    {
> > +      if (exit_e->src != gimple_bb (cond))
> > +	info->alt_loop_conds.quick_push (cond);
> > +      else
> > +	info->loop_cond = cond;
> > +    }
> > +
> 
> IMHO it would be simpler to have the primary exit condition in
> info->conds[0] and the rest after that?  That avoids having two
> arrays and one scalar in vect_loop_form_info.
> 
> >    if (integer_zerop (info->assumptions)
> >        || !info->number_of_iterations
> >        || chrec_contains_undetermined (info->number_of_iterations))
> > @@ -1847,8 +1901,13 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
> >    if (!integer_onep (info->assumptions) && !main_loop_info)
> >      LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info->assumptions;
> >
> > -  stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt
> > (info->loop_cond);
> > -  STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
> > +  for (gcond *cond : info->conds)
> > +    {
> > +      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (cond);
> > +      STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
> > +    }
> > +  LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
> > +  LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
> >    LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
> >
> > @@ -3594,7 +3653,11 @@ vect_analyze_loop (class loop *loop,
> vec_info_shared *shared)
> >  			 && LOOP_VINFO_PEELING_FOR_NITER
> (first_loop_vinfo)
> >  			 && !loop->simduid);
> >    if (!vect_epilogues)
> > -    return first_loop_vinfo;
> > +    {
> > +      loop_form_info.conds.release ();
> > +      loop_form_info.alt_loop_conds.release ();
> > +      return first_loop_vinfo;
> > +    }
> 
> I think there's 'inner' where you leak these.  Maybe use auto_vec<> in
> vect_loop_form_info instead?
> 
> Otherwise looks OK.
> 
> Thanks,
> Richard.
> 
> >    /* Now analyze first_loop_vinfo for epilogue vectorization.  */
> >    poly_uint64 lowest_th = LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo);
> > @@ -3694,6 +3757,9 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
> >  			   (first_loop_vinfo->epilogue_vinfos[0]-
> >vector_mode));
> >      }
> >
> > +  loop_form_info.conds.release ();
> > +  loop_form_info.alt_loop_conds.release ();
> > +
> >    return first_loop_vinfo;
> >  }
> >
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index
> >
> afa7a8e30891c782a0e5e3740ecc4377f5a31e54..55b6771b271d5072fa132
> 7d595e1
> > dddb112cfdf6 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -882,6 +882,12 @@ public:
> >       we need to peel off iterations at the end to form an epilogue loop.  */
> >    bool peeling_for_niter;
> >
> > +  /* List of loop additional IV conditionals found in the loop.  */
> > +  auto_vec<gcond *> conds;
> > +
> > +  /* Main loop IV cond.  */
> > +  gcond* loop_iv_cond;
> > +
> >    /* True if there are no loop carried data dependencies in the loop.
> >       If loop->safelen <= 1, then this is always true, either the loop
> >       didn't have any loop carried data dependencies, or the loop is being
> > @@ -984,6 +990,8 @@ public:
> >  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
> >  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
> >  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> > +#define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> > +#define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
> >  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)-
> >no_data_dependencies
> >  #define LOOP_VINFO_SCALAR_LOOP(L)	   (L)->scalar_loop
> >  #define LOOP_VINFO_SCALAR_LOOP_SCALING(L)  (L)->scalar_loop_scaling
> > @@ -2373,7 +2381,9 @@ struct vect_loop_form_info
> >    tree number_of_iterations;
> >    tree number_of_iterationsm1;
> >    tree assumptions;
> > +  vec<gcond *> conds;
> >    gcond *loop_cond;
> > +  vec<gcond *> alt_loop_conds;
> >    gcond *inner_loop_cond;
> >    edge loop_exit;
> >  };
> >
> >
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)


* RE: [PATCH 3/3]middle-end: maintain LCSSA throughout loop peeling
  2023-10-10 12:59   ` Richard Biener
@ 2023-10-11 11:16     ` Tamar Christina
  2023-10-11 12:09       ` Richard Biener
  0 siblings, 1 reply; 12+ messages in thread
From: Tamar Christina @ 2023-10-11 11:16 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> > +  auto loop_exits = get_loop_exit_edges (loop);
> > +  auto_vec<basic_block> doms;
> > +
> >    if (at_exit) /* Add the loop copy at exit.  */
> >      {
> > -      if (scalar_loop != loop)
> > +      if (scalar_loop != loop && new_exit->dest != exit_dest)
> >  	{
> > -	  gphi_iterator gsi;
> >  	  new_exit = redirect_edge_and_branch (new_exit, exit_dest);
> > +	  flush_pending_stmts (new_exit);
> > +	}
> >
> > -	  for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi);
> > -	       gsi_next (&gsi))
> > -	    {
> > -	      gphi *phi = gsi.phi ();
> > -	      tree orig_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
> > -	      location_t orig_locus
> > -		= gimple_phi_arg_location_from_edge (phi, e);
> > +      auto_vec <gimple *> new_phis;
> > +      hash_map <tree, tree> new_phi_args;
> > +      /* First create the empty phi nodes so that when we flush the
> > +	 statements they can be filled in.   However because there is no order
> > +	 between the PHI nodes in the exits and the loop headers we need to
> > +	 order them base on the order of the two headers.  First record the
> new
> > +	 phi nodes.  */
> > +      for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
> > +	   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
> > +	{
> > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > +	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> > +	  gphi *res = create_phi_node (new_res, new_preheader);
> > +	  new_phis.safe_push (res);
> > +	}
> >
> > -	      add_phi_arg (phi, orig_arg, new_exit, orig_locus);
> > +      /* Then redirect the edges and flush the changes.  This writes out the
> new
> > +	 SSA names.  */
> > +      for (edge exit : loop_exits)
> 
> I realize at the moment it's the same, but we are redirecting multiple exit edges
> here and from the walk above expect them all to have the same set of PHI
> nodes - that looks a bit fragile?

No, it only expects the two preheaders to have the same PHI nodes.  Since one loop
is copied from the other, we know that to be true.

Now of course there are cases where the exit blocks have more PHI nodes than the
headers (e.g. live values), but those are handled later in the hunk below (with new_phi_args).

For flush_pending_stmts to work I had to make sure the order of the PHI nodes is the
same as in the original.  This is why I can't iterate over the values in the exit block instead and
need to handle it in two steps.
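
To make the two steps a bit more concrete, here is a small standalone model.
It is only a sketch: Phi, create_empty_phis, flush_and_record and
lookup_or_create are hypothetical names, strings stand in for SSA names, and
the positional fill merely imitates what flush_pending_stmts does on a
redirected edge.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model of a PHI node: a result and one incoming argument.
struct Phi {
  std::string result;
  std::string arg;  // empty until the pending edge statements are flushed
};

// Step 1: create the empty PHIs, one per header PHI, in header order.
std::vector<Phi> create_empty_phis(const std::vector<std::string>& header_results) {
  std::vector<Phi> phis;
  for (const std::string& name : header_results)
    phis.push_back({name + "_lc", ""});
  return phis;
}

// Step 2: flushing fills the arguments positionally, so the order from step 1
// must match.  Record argument -> result so later code can reuse the PHI.
std::unordered_map<std::string, std::string>
flush_and_record(std::vector<Phi>& phis,
                 const std::vector<std::string>& pending_args) {
  std::unordered_map<std::string, std::string> cache;
  for (std::size_t i = 0; i < phis.size() && i < pending_args.size(); ++i) {
    phis[i].arg = pending_args[i];
    cache[pending_args[i]] = phis[i].result;
  }
  return cache;
}

// Later walk: reuse a PHI created during redirection, otherwise make a new one
// (the case of extra values such as live-outs).
std::string lookup_or_create(
    const std::unordered_map<std::string, std::string>& cache,
    std::vector<Phi>& phis, const std::string& arg) {
  auto it = cache.find(arg);
  if (it != cache.end())
    return it->second;
  phis.push_back({arg + "_lc", arg});
  return phis.back().result;
}
```

The cache plays the role of new_phi_args: values already materialized during
edge redirection are only propagated, and anything else gets a fresh PHI.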

> Does this need adjustments later for the early exit vectorization?
> 

I believe (I still need to finish the rebase) that the only adjustment I'll need here for multiple exits
is updating the dominators.  I don't think I'll need more.  I had issues with live values that
I had to handle specially before, but I think this new approach should deal with them already.

> This also somewhat confuses the original redirection of 'e', the main exit with
> the later (*)
> 
> > +	{
> > +	  edge e = redirect_edge_and_branch (exit, new_preheader);
> > +	  flush_pending_stmts (e);
> > +	}
> > +
> > +      /* Record the new SSA names in the cache so that we can skip
> materializing
> > +	 them again when we fill in the rest of the LCSSA variables.  */
> > +      for (auto phi : new_phis)
> > +	{
> > +	  tree new_arg = gimple_phi_arg (phi, 0)->def;
> 
> and here you look at the (for now) single edge we redirected ...
> 
> > +	  new_phi_args.put (new_arg, gimple_phi_result (phi));
> > +	}
> > +
> > +      /* Copy the current loop LC PHI nodes between the original loop exit
> > +	 block and the new loop header.  This allows us to later split the
> > +	 preheader block and still find the right LC nodes.  */
> > +      edge latch_new = single_succ_edge (new_preheader);
> 
> odd name - the single successor of a loop preheader is the loop header and the
> corresponding edge is the loop entry edge, not the latch?
> 
> > +      for (auto gsi_from = gsi_start_phis (loop->header),
> > +	   gsi_to = gsi_start_phis (new_loop->header);
> > +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> 
> Eh, can we have
> 
>   if (flow_loops)
>     for  (auto ...)
> 
> please, even if that indents more?
> 
> > +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> > +	{
> > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > +	  gimple *to_phi = gsi_stmt (gsi_to);
> > +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> > +						loop_latch_edge (loop));
> > +
> > +	  /* Check if we've already created a new phi node during edge
> > +	     redirection.  If we have, only propagate the value downwards.  */
> > +	  if (tree *res = new_phi_args.get (new_arg))
> > +	    {
> > +	      adjust_phi_and_debug_stmts (to_phi, latch_new, *res);
> > +	      continue;
> >  	    }
> > +
> > +	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> > +	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
> > +
> > +	  /* Main loop exit should use the final iter value.  */
> > +	  add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
> 
> For all other edges into the loop besides 'e' there's missing PHI arguments?
> You are using 'e' here again, but also use that as temporary in for blocks,
> shadowing the parameter - that makes it difficult to read.  Also it's sometimes
> 'e->dest' and sometimes new_preheader - I think you want to use
> new_preheader here as well (in create_phi_node) for consistency and ease of
> understanding.
> 
> ISTR when early break vectorization lands we're going to redirect the alternate
> exits away again "fixing" the missing PHI args.
> 

We did indeed have a discussion about this, and I'll expand on the reasoning in the
patch for early breaks.  But I think not redirecting the edges away for early break makes
more sense, as it treats early break, alignment peeling and epilogue vectorization the same
way, and the only difference is in the statements inside the guard blocks.

More importantly, this representation also makes it easier to implement First-Faulting
Loads (FFL) support.  For FFL we'll copy the main loop, and at the "fault" check we branch to a new
loop remainder that has the same sequences as the remainder of the main vector loop but
with different predicates.  The reason for this is to remove the predicate mangling from the
optimal/likely loop body, which is critical for performance.

Now, since FFL is intended to pair naturally with early break, having the early exit edges all
lead into the same block makes the flow a lot easier to manage.

But I'll make sure to include a diagram in the early break peeling patch.

Thanks,
Tamar

> > +
> > +	  adjust_phi_and_debug_stmts (to_phi, latch_new, new_res);
> >  	}
> > -      redirect_edge_and_branch_force (e, new_preheader);
> > -      flush_pending_stmts (e);
> > +
> >        set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
> > -      if (was_imm_dom || duplicate_outer_loop)
> > +
> > +      if ((was_imm_dom || duplicate_outer_loop))
> 
> extra ()s
> 
> >  	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit-
> >src);
> >
> >        /* And remove the non-necessary forwarder again.  Keep the other
> > @@ -1598,6 +1680,22 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
> >      }
> >    else /* Add the copy at entry.  */
> >      {
> > +      /* Copy the current loop LC PHI nodes between the original loop exit
> > +	 block and the new loop header.  This allows us to later split the
> > +	 preheader block and still find the right LC nodes.  */
> > +      for (auto gsi_from = gsi_start_phis (new_loop->header),
> > +	   gsi_to = gsi_start_phis (loop->header);
> > +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> 
> same if (flow_loops)
> 
> > +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> > +	{
> > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > +	  gimple *to_phi = gsi_stmt (gsi_to);
> > +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> > +						loop_latch_edge (new_loop));
> 
> this looks wrong?  IMHO it should be the PHI_RESULT, no?  Note this only
> triggers for alignment peeling ...
> 
> Otherwise looks OK.
> 
> Thanks,
> Richard.
> 
> 
> > +	  adjust_phi_and_debug_stmts (to_phi, loop_preheader_edge (loop),
> > +				      new_arg);
> > +	}
> > +
> >        if (scalar_loop != loop)
> >  	{
> >  	  /* Remove the non-necessary forwarder of scalar_loop again.  */ @@
> > -1627,29 +1725,6 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop
> *loop, edge loop_exit,
> >  			       loop_preheader_edge (new_loop)->src);
> >      }
> >
> > -  if (scalar_loop != loop)
> > -    {
> > -      /* Update new_loop->header PHIs, so that on the preheader
> > -	 edge they are the ones from loop rather than scalar_loop.  */
> > -      gphi_iterator gsi_orig, gsi_new;
> > -      edge orig_e = loop_preheader_edge (loop);
> > -      edge new_e = loop_preheader_edge (new_loop);
> > -
> > -      for (gsi_orig = gsi_start_phis (loop->header),
> > -	   gsi_new = gsi_start_phis (new_loop->header);
> > -	   !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
> > -	   gsi_next (&gsi_orig), gsi_next (&gsi_new))
> > -	{
> > -	  gphi *orig_phi = gsi_orig.phi ();
> > -	  gphi *new_phi = gsi_new.phi ();
> > -	  tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
> > -	  location_t orig_locus
> > -	    = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
> > -
> > -	  add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
> > -	}
> > -    }
> > -
> >    free (new_bbs);
> >    free (bbs);
> >
> > @@ -2579,139 +2654,36 @@ vect_gen_vector_loop_niters_mult_vf
> > (loop_vec_info loop_vinfo,
> >
> >  /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP,
> >     this function searches for the corresponding lcssa phi node in exit
> > -   bb of LOOP.  If it is found, return the phi result; otherwise return
> > -   NULL.  */
> > +   bb of LOOP following the LCSSA_EDGE to the exit node.  If it is found,
> > +   return the phi result; otherwise return NULL.  */
> >
> >  static tree
> >  find_guard_arg (class loop *loop ATTRIBUTE_UNUSED,
> >  		class loop *epilog ATTRIBUTE_UNUSED,
> > -		const_edge e, gphi *lcssa_phi)
> > +		const_edge e, gphi *lcssa_phi, int lcssa_edge = 0)
> >  {
> >    gphi_iterator gsi;
> >
> > -  gcc_assert (single_pred_p (e->dest));
> >    for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
> >      {
> >        gphi *phi = gsi.phi ();
> > -      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
> > -			   PHI_ARG_DEF (lcssa_phi, 0), 0))
> > -	return PHI_RESULT (phi);
> > -    }
> > -  return NULL_TREE;
> > -}
> > -
> > -/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates
> FIRST/SECOND
> > -   from SECOND/FIRST and puts it at the original loop's preheader/exit
> > -   edge, the two loops are arranged as below:
> > -
> > -       preheader_a:
> > -     first_loop:
> > -       header_a:
> > -	 i_1 = PHI<i_0, i_2>;
> > -	 ...
> > -	 i_2 = i_1 + 1;
> > -	 if (cond_a)
> > -	   goto latch_a;
> > -	 else
> > -	   goto between_bb;
> > -       latch_a:
> > -	 goto header_a;
> > -
> > -       between_bb:
> > -	 ;; i_x = PHI<i_2>;   ;; LCSSA phi node to be created for FIRST,
> > -
> > -     second_loop:
> > -       header_b:
> > -	 i_3 = PHI<i_0, i_4>; ;; Use of i_0 to be replaced with i_x,
> > -				 or with i_2 if no LCSSA phi is created
> > -				 under condition of
> CREATE_LCSSA_FOR_IV_PHIS.
> > -	 ...
> > -	 i_4 = i_3 + 1;
> > -	 if (cond_b)
> > -	   goto latch_b;
> > -	 else
> > -	   goto exit_bb;
> > -       latch_b:
> > -	 goto header_b;
> > -
> > -       exit_bb:
> > -
> > -   This function creates loop closed SSA for the first loop; update the
> > -   second loop's PHI nodes by replacing argument on incoming edge with the
> > -   result of newly created lcssa PHI nodes.  IF CREATE_LCSSA_FOR_IV_PHIS
> > -   is false, Loop closed ssa phis will only be created for non-iv phis for
> > -   the first loop.
> > -
> > -   This function assumes exit bb of the first loop is preheader bb of the
> > -   second loop, i.e, between_bb in the example code.  With PHIs updated,
> > -   the second loop will execute rest iterations of the first.  */
> > -
> > -static void
> > -slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > -				   class loop *first, edge first_loop_e,
> > -				   class loop *second, edge second_loop_e,
> > -				   bool create_lcssa_for_iv_phis)
> > -{
> > -  gphi_iterator gsi_update, gsi_orig;
> > -  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > -
> > -  edge first_latch_e = EDGE_SUCC (first->latch, 0);
> > -  edge second_preheader_e = loop_preheader_edge (second);
> > -  basic_block between_bb = first_loop_e->dest;
> > -
> > -  gcc_assert (between_bb == second_preheader_e->src);
> > -  gcc_assert (single_pred_p (between_bb) && single_succ_p
> > (between_bb));
> > -  /* Either the first loop or the second is the loop to be
> > vectorized.  */
> > -  gcc_assert (loop == first || loop == second);
> > -
> > -  for (gsi_orig = gsi_start_phis (first->header),
> > -       gsi_update = gsi_start_phis (second->header);
> > -       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
> > -       gsi_next (&gsi_orig), gsi_next (&gsi_update))
> > -    {
> > -      gphi *orig_phi = gsi_orig.phi ();
> > -      gphi *update_phi = gsi_update.phi ();
> > -
> > -      tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
> > -      /* Generate lcssa PHI node for the first loop.  */
> > -      gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
> > -      stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt (vect_phi);
> > -      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info))
> > +      /* Nested loops with multiple exits can have different no# phi node
> > +	arguments between the main loop and epilog as epilog falls to the
> > +	second loop.  */
> > +      if (gimple_phi_num_args (phi) > e->dest_idx)
> >  	{
> > -	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
> > -	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
> > -	  add_phi_arg (lcssa_phi, arg, first_loop_e, UNKNOWN_LOCATION);
> > -	  arg = new_res;
> > -	}
> > -
> > -      /* Update PHI node in the second loop by replacing arg on the loop's
> > -	 incoming edge.  */
> > -      adjust_phi_and_debug_stmts (update_phi, second_preheader_e, arg);
> > -    }
> > -
> > -  /* For epilogue peeling we have to make sure to copy all LC PHIs
> > -     for correct vectorization of live stmts.  */
> > -  if (loop == first)
> > -    {
> > -      basic_block orig_exit = second_loop_e->dest;
> > -      for (gsi_orig = gsi_start_phis (orig_exit);
> > -	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
> > -	{
> > -	  gphi *orig_phi = gsi_orig.phi ();
> > -	  tree orig_arg = PHI_ARG_DEF (orig_phi, 0);
> > -	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p
> (orig_arg))
> > -	    continue;
> > -
> > -	  const_edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > -	  /* Already created in the above loop.   */
> > -	  if (find_guard_arg (first, second, exit_e, orig_phi))
> > +	 tree var = PHI_ARG_DEF (phi, e->dest_idx);
> > +	 if (TREE_CODE (var) != SSA_NAME)
> >  	    continue;
> > -
> > -	  tree new_res = copy_ssa_name (orig_arg);
> > -	  gphi *lcphi = create_phi_node (new_res, between_bb);
> > -	  add_phi_arg (lcphi, orig_arg, first_loop_e, UNKNOWN_LOCATION);
> > +	 tree def = get_current_def (var);
> > +	 if (!def)
> > +	   continue;
> > +	 if (operand_equal_p (def,
> > +			      PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
> > +	   return PHI_RESULT (phi);
> >  	}
> >      }
> > +  return NULL_TREE;
> >  }
> >
> >  /* Function slpeel_add_loop_guard adds guard skipping from the beginning
> > @@ -2796,11 +2768,11 @@ slpeel_update_phi_nodes_for_guard1 (class loop *skip_loop,
> >      }
> >  }
> >
> > -/* LOOP and EPILOG are two consecutive loops in CFG and EPILOG is copied
> > -   from LOOP.  Function slpeel_add_loop_guard adds guard skipping from a
> > -   point between the two loops to the end of EPILOG.  Edges GUARD_EDGE
> > -   and MERGE_EDGE are the two pred edges of merge_bb at the end of
> EPILOG.
> > -   The CFG looks like:
> > +/* LOOP and EPILOG are two consecutive loops in CFG connected by
> LOOP_EXIT edge
> > +   and EPILOG is copied from LOOP.  Function slpeel_add_loop_guard adds
> guard
> > +   skipping from a point between the two loops to the end of EPILOG.  Edges
> > +   GUARD_EDGE and MERGE_EDGE are the two pred edges of merge_bb at
> the end of
> > +   EPILOG.  The CFG looks like:
> >
> >       loop:
> >         header_a:
> > @@ -2851,6 +2823,7 @@ slpeel_update_phi_nodes_for_guard1 (class loop
> > *skip_loop,
> >
> >  static void
> >  slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop
> > *epilog,
> > +				    const_edge loop_exit,
> >  				    edge guard_edge, edge merge_edge)  {
> >    gphi_iterator gsi;
> > @@ -2859,13 +2832,11 @@ slpeel_update_phi_nodes_for_guard2 (class
> loop *loop, class loop *epilog,
> >    gcc_assert (single_succ_p (merge_bb));
> >    edge e = single_succ_edge (merge_bb);
> >    basic_block exit_bb = e->dest;
> > -  gcc_assert (single_pred_p (exit_bb));
> > -  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
> >
> >    for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> >      {
> >        gphi *update_phi = gsi.phi ();
> > -      tree old_arg = PHI_ARG_DEF (update_phi, 0);
> > +      tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
> >
> >        tree merge_arg = NULL_TREE;
> >
> > @@ -2877,8 +2848,8 @@ slpeel_update_phi_nodes_for_guard2 (class loop
> *loop, class loop *epilog,
> >        if (!merge_arg)
> >  	merge_arg = old_arg;
> >
> > -      tree guard_arg
> > -	= find_guard_arg (loop, epilog, single_exit (loop), update_phi);
> > +      tree guard_arg = find_guard_arg (loop, epilog, loop_exit,
> > +				       update_phi, e->dest_idx);
> >        /* If the var is live after loop but not a reduction, we simply
> >  	 use the old arg.  */
> >        if (!guard_arg)
> > @@ -2898,21 +2869,6 @@ slpeel_update_phi_nodes_for_guard2 (class
> loop *loop, class loop *epilog,
> >      }
> >  }
> >
> > -/* EPILOG loop is duplicated from the original loop for vectorizing,
> > -   the arg of its loop closed ssa PHI needs to be updated.  */
> > -
> > -static void
> > -slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
> > -{
> > -  gphi_iterator gsi;
> > -  basic_block exit_bb = single_exit (epilog)->dest;
> > -
> > -  gcc_assert (single_pred_p (exit_bb));
> > -  edge e = EDGE_PRED (exit_bb, 0);
> > -  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > -    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
> > -}
> > -
> >  /* LOOP_VINFO is an epilogue loop whose corresponding main loop can be
> skipped.
> >     Return a value that equals:
> >
> > @@ -3255,8 +3211,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
> >  						       e, &prolog_e);
> >        gcc_assert (prolog);
> >        prolog->force_vectorize = false;
> > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, prolog_e, loop,
> > -					 exit_e, true);
> > +
> >        first_loop = prolog;
> >        reset_original_copy_tables ();
> >
> > @@ -3336,8 +3291,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
> >        LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo) = new_epilog_e;
> >        gcc_assert (epilog);
> >        epilog->force_vectorize = false;
> > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, e, epilog,
> > -					 new_epilog_e, false);
> >        bb_before_epilog = loop_preheader_edge (epilog)->src;
> >
> >        /* Scalar version loop may be preferred.  In this case, add guard
> > @@ -3430,7 +3383,9 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> >  					   irred_flag);
> >  	  if (vect_epilogues)
> >  	    epilogue_vinfo->skip_this_loop_edge = guard_e;
> > -	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e, epilog_e);
> > +	  edge main_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > +	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, main_iv, guard_e,
> > +					      epilog_e);
> >  	  /* Only need to handle basic block before epilog loop if it's not
> >  	     the guard_bb, which is the case when skip_vector is true.  */
> >  	  if (guard_bb != bb_before_epilog)
> > @@ -3441,8 +3396,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> >  	    }
> >  	  scale_loop_profile (epilog, prob_epilog, -1);
> >  	}
> > -      else
> > -	slpeel_update_phi_nodes_for_lcssa (epilog);
> >
> >        unsigned HOST_WIDE_INT bound;
> >        if (bound_scalar.is_constant (&bound))
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index f1caa5f207d3b13da58c3a313b11d1ef98374349..327cab0f736da7f1bd3e024d666df46ef9208107 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -5877,7 +5877,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> >    basic_block exit_bb;
> >    tree scalar_dest;
> >    tree scalar_type;
> > -  gimple *new_phi = NULL, *phi;
> > +  gimple *new_phi = NULL, *phi = NULL;
> >    gimple_stmt_iterator exit_gsi;
> >    tree new_temp = NULL_TREE, new_name, new_scalar_dest;
> >    gimple *epilog_stmt = NULL;
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index 55b6771b271d5072fa1327d595e1dddb112cfdf6..25ceb6600673d71fd6012443403997e921066483 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -2183,7 +2183,7 @@ extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
> >  					 const_edge);
> >  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
> >  						    class loop *, edge,
> > -						    edge, edge *);
> > +						    edge, edge *, bool = true);
> >  class loop *vect_loop_versioning (loop_vec_info, gimple *);
> >  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
> >  				    tree *, tree *, tree *, int, bool, bool,
> >
> >
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 1/3]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables
  2023-10-11 10:45   ` Tamar Christina
@ 2023-10-11 12:07     ` Richard Biener
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Biener @ 2023-10-11 12:07 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 11 Oct 2023, Tamar Christina wrote:

> > > @@ -2664,7 +2679,7 @@ slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > >       for correct vectorization of live stmts.  */
> > >    if (loop == first)
> > >      {
> > > -      basic_block orig_exit = single_exit (second)->dest;
> > > +      basic_block orig_exit = second_loop_e->dest;
> > >        for (gsi_orig = gsi_start_phis (orig_exit);
> > >  	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
> > >  	{
> > > @@ -2673,13 +2688,14 @@ slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > >  	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p (orig_arg))
> > >  	    continue;
> > >
> > > +	  const_edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > >  	  /* Already created in the above loop.   */
> > > -	  if (find_guard_arg (first, second, orig_phi))
> > > +	  if (find_guard_arg (first, second, exit_e, orig_phi))
> > >  	    continue;
> > >
> > >  	  tree new_res = copy_ssa_name (orig_arg);
> > >  	  gphi *lcphi = create_phi_node (new_res, between_bb);
> > > -	  add_phi_arg (lcphi, orig_arg, single_exit (first), UNKNOWN_LOCATION);
> > > +	  add_phi_arg (lcphi, orig_arg, first_loop_e, UNKNOWN_LOCATION);
> > >  	}
> > >      }
> > >  }
> > > @@ -2847,7 +2863,8 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> > >        if (!merge_arg)
> > >  	merge_arg = old_arg;
> > >
> > > -      tree guard_arg = find_guard_arg (loop, epilog, update_phi);
> > > +      tree guard_arg
> > > +	= find_guard_arg (loop, epilog, single_exit (loop), update_phi);
> > 
> > missed adjustment?  you are introducing a single_exit call here ...
> > 
> 
> It's a very temporary one that gets removed in patch 3/3 when I start
> passing the rest of the edges down explicitly. It allowed me to split the
> patches a bit more.

OK, fine.

> > >        /* If the var is live after loop but not a reduction, we simply
> > >  	 use the old arg.  */
> > >        if (!guard_arg)
> > > @@ -3201,27 +3218,37 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > >      }
> > >
> > >    if (vect_epilogues)
> > > -    /* Make sure to set the epilogue's epilogue scalar loop, such that we can
> > > -       use the original scalar loop as remaining epilogue if necessary.  */
> > > -    LOOP_VINFO_SCALAR_LOOP (epilogue_vinfo)
> > > -      = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> > > +    {
> > > +      /* Make sure to set the epilogue's epilogue scalar loop, such that we can
> > > +	 use the original scalar loop as remaining epilogue if necessary.  */
> > > +      LOOP_VINFO_SCALAR_LOOP (epilogue_vinfo)
> > > +	= LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> > > +      LOOP_VINFO_SCALAR_IV_EXIT (epilogue_vinfo)
> > > +	= LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
> > > +    }
> > >
> > >    if (prolog_peeling)
> > >      {
> > >        e = loop_preheader_edge (loop);
> > > -      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
> > > +      edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > +      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, exit_e, e));
> > >
> > >        /* Peel prolog and put it on preheader edge of loop.  */
> > > -      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e);
> > > +      edge scalar_e = LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
> > > +      edge prolog_e = NULL;
> > > +      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, exit_e,
> > > +						       scalar_loop, scalar_e,
> > > +						       e, &prolog_e);
> > >        gcc_assert (prolog);
> > >        prolog->force_vectorize = false;
> > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
> > > +      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, prolog_e, loop,
> > > +					 exit_e, true);
> > >        first_loop = prolog;
> > >        reset_original_copy_tables ();
> > >
> > >        /* Update the number of iterations for prolog loop.  */
> > >        tree step_prolog = build_one_cst (TREE_TYPE (niters_prolog));
> > > -      vect_set_loop_condition (prolog, NULL, niters_prolog,
> > > +      vect_set_loop_condition (prolog, prolog_e, loop_vinfo, niters_prolog,
> > >  			       step_prolog, NULL_TREE, false);
> > >
> > >        /* Skip the prolog loop.  */
> > > @@ -3275,8 +3302,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > >
> > >    if (epilog_peeling)
> > >      {
> > > -      e = single_exit (loop);
> > > -      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
> > > +      e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > +      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e, e));
> > >
> > >        /* Peel epilog and put it on exit edge of loop.  If we are vectorizing
> > >  	 said epilog then we should use a copy of the main loop as a starting
> > > @@ -3285,12 +3312,18 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > >  	 If we are not vectorizing the epilog then we should use the scalar loop
> > >  	 as the transformations mentioned above make less or no sense when not
> > >  	 vectorizing.  */
> > > +      edge scalar_e = LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
> > >        epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
> > > -      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e);
> > > +      edge epilog_e = vect_epilogues ? e : scalar_e;
> > > +      edge new_epilog_e = NULL;
> > > +      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, e, epilog,
> > > +						       epilog_e, e,
> > > +						       &new_epilog_e);
> > > +      LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo) = new_epilog_e;
> > >        gcc_assert (epilog);
> > > -
> > >        epilog->force_vectorize = false;
> > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
> > > +      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, e, epilog,
> > > +					 new_epilog_e, false);
> > >        bb_before_epilog = loop_preheader_edge (epilog)->src;
> > >
> > >        /* Scalar version loop may be preferred.  In this case, add guard
> > > @@ -3374,16 +3407,16 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > >  	{
> > >  	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > >  				    niters, niters_vector_mult_vf);
> > > -	  guard_bb = single_exit (loop)->dest;
> > > -	  guard_to = split_edge (single_exit (epilog));
> > > +	  guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > +	  edge epilog_e = LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo);
> > > +	  guard_to = split_edge (epilog_e);
> > >  	  guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to,
> > >  					   skip_vector ? anchor : guard_bb,
> > >  					   prob_epilog.invert (),
> > >  					   irred_flag);
> > >  	  if (vect_epilogues)
> > >  	    epilogue_vinfo->skip_this_loop_edge = guard_e;
> > > -	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
> > > -					      single_exit (epilog));
> > > +	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e, epilog_e);
> > >  	  /* Only need to handle basic block before epilog loop if it's not
> > >  	     the guard_bb, which is the case when skip_vector is true.  */
> > >  	  if (guard_bb != bb_before_epilog)
> > > @@ -3416,6 +3449,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > >      {
> > >        epilog->aux = epilogue_vinfo;
> > >        LOOP_VINFO_LOOP (epilogue_vinfo) = epilog;
> > > +      LOOP_VINFO_IV_EXIT (epilogue_vinfo)
> > > +	= LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo);
> > >
> > >        loop_constraint_clear (epilog, LOOP_C_INFINITE);
> > >
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > index 23c6e8259e7b133cd7acc6bcf0bad26423e9993a..6e60d84143626a8e1d801bb580f4dcebc73c7ba7 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -855,10 +855,9 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info loop_vinfo)
> > >
> > >
> > >  static gcond *
> > > -vect_get_loop_niters (class loop *loop, tree *assumptions,
> > > +vect_get_loop_niters (class loop *loop, edge exit, tree *assumptions,
> > >  		      tree *number_of_iterations, tree *number_of_iterationsm1)
> > >  {
> > > -  edge exit = single_exit (loop);
> > >    class tree_niter_desc niter_desc;
> > >    tree niter_assumptions, niter, may_be_zero;
> > >    gcond *cond = get_loop_exit_condition (loop);
> > > @@ -927,6 +926,20 @@ vect_get_loop_niters (class loop *loop, tree *assumptions,
> > >    return cond;
> > >  }
> > >
> > > +/*  Determine the main loop exit for the vectorizer.  */
> > > +
> > > +edge
> > 
> > can't this be 'static'?
> 
> No, since it's also used by set_uid_loop_bbs, which sets the scalar loop it gets from get_loop.
> 
> If I understand correctly, the loop expected here is the ifcvt loop? If that's the
> case I may be able to match it up through the ->aux again, but since set_uid_loop_bbs
> isn't called often I figured I can just re-analyze.

I see.

Richard.

> Regards,
> Tamar
> 
> > 
> > > +vec_init_loop_exit_info (class loop *loop)
> > > +{
> > > +  /* Before we begin we must first determine which exit is the main one and
> > > +     which are auxiliary exits.  */
> > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > +  if (exits.length () == 1)
> > > +    return exits[0];
> > > +  else
> > > +    return NULL;
> > > +}
> > > +
> > >  /* Function bb_in_loop_p
> > >
> > >     Used as predicate for dfs order traversal of the loop bbs.  */
> > > @@ -987,7 +1000,10 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
> > >      has_mask_store (false),
> > >      scalar_loop_scaling (profile_probability::uninitialized ()),
> > >      scalar_loop (NULL),
> > > -    orig_loop_info (NULL)
> > > +    orig_loop_info (NULL),
> > > +    vec_loop_iv (NULL),
> > > +    vec_epilogue_loop_iv (NULL),
> > > +    scalar_loop_iv (NULL)
> > >  {
> > >    /* CHECKME: We want to visit all BBs before their successors (except for
> > >       latch blocks, for which this assertion wouldn't hold).  In the simple
> > > @@ -1646,6 +1662,18 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> > >  {
> > >    DUMP_VECT_SCOPE ("vect_analyze_loop_form");
> > >
> > > +  edge exit_e = vec_init_loop_exit_info (loop);
> > > +  if (!exit_e)
> > > +    return opt_result::failure_at (vect_location,
> > > +				   "not vectorized:"
> > > +				   " could not determine main exit from"
> > > +				   " loop with multiple exits.\n");
> > > +  info->loop_exit = exit_e;
> > > +  if (dump_enabled_p ())
> > > +      dump_printf_loc (MSG_NOTE, vect_location,
> > > +		       "using as main loop exit: %d -> %d [AUX: %p]\n",
> > > +		       exit_e->src->index, exit_e->dest->index, exit_e->aux);
> > > +
> > >    /* Different restrictions apply when we are considering an inner-most loop,
> > >       vs. an outer (nested) loop.
> > >       (FORNOW. May want to relax some of these restrictions in the future).  */
> > > @@ -1767,7 +1795,7 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> > >  				   " abnormal loop exit edge.\n");
> > >
> > >    info->loop_cond
> > > -    = vect_get_loop_niters (loop, &info->assumptions,
> > > +    = vect_get_loop_niters (loop, e, &info->assumptions,
> > >  			    &info->number_of_iterations,
> > >  			    &info->number_of_iterationsm1);
> > >    if (!info->loop_cond)
> > > @@ -1821,6 +1849,9 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
> > >
> > >    stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (info->loop_cond);
> > >    STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
> > > +
> > > +  LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
> > > +
> > >    if (info->inner_loop_cond)
> > >      {
> > >        stmt_vec_info inner_loop_cond_info
> > > @@ -3063,9 +3094,9 @@ start_over:
> > >        if (dump_enabled_p ())
> > >          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
> > >        if (!vect_can_advance_ivs_p (loop_vinfo)
> > > -	  || !slpeel_can_duplicate_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> > > -					   single_exit (LOOP_VINFO_LOOP
> > > -							 (loop_vinfo))))
> > > +	  || !slpeel_can_duplicate_loop_p (loop,
> > > +					   LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > +					   LOOP_VINFO_IV_EXIT (loop_vinfo)))
> > >          {
> > >  	  ok = opt_result::failure_at (vect_location,
> > >  				       "not vectorized: can't create required "
> > > @@ -6002,7 +6033,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> > >           Store them in NEW_PHIS.  */
> > >    if (double_reduc)
> > >      loop = outer_loop;
> > > -  exit_bb = single_exit (loop)->dest;
> > > +  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > >    exit_gsi = gsi_after_labels (exit_bb);
> > >    reduc_inputs.create (slp_node ? vec_num : ncopies);
> > >    for (unsigned i = 0; i < vec_num; i++)
> > > @@ -6018,7 +6049,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> > >  	  phi = create_phi_node (new_def, exit_bb);
> > >  	  if (j)
> > >  	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
> > > -	  SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def);
> > > +	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
> > >  	  new_def = gimple_convert (&stmts, vectype, new_def);
> > >  	  reduc_inputs.quick_push (new_def);
> > >  	}
> > > @@ -10416,12 +10447,12 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
> > >  	   lhs' = new_tree;  */
> > >
> > >        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > -      basic_block exit_bb = single_exit (loop)->dest;
> > > +      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > >        gcc_assert (single_pred_p (exit_bb));
> > >
> > >        tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> > >        gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> > > -      SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, vec_lhs);
> > > +      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
> > >
> > >        gimple_seq stmts = NULL;
> > >        tree new_tree;
> > > @@ -10965,7 +10996,7 @@ vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
> > >     profile.  */
> > >
> > >  static void
> > > -scale_profile_for_vect_loop (class loop *loop, unsigned vf, bool flat)
> > > +scale_profile_for_vect_loop (class loop *loop, edge exit_e, unsigned vf, bool flat)
> > >  {
> > >    /* For flat profiles do not scale down proportionally by VF and only
> > >       cap by known iteration count bounds.  */
> > > @@ -10980,7 +11011,6 @@ scale_profile_for_vect_loop (class loop *loop, unsigned vf, bool flat)
> > >        return;
> > >      }
> > >    /* Loop body executes VF fewer times and exit increases VF times.  */
> > > -  edge exit_e = single_exit (loop);
> > >    profile_count entry_count = loop_preheader_edge (loop)->count ();
> > >
> > >    /* If we have unreliable loop profile avoid dropping entry
> > > @@ -11350,7 +11380,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > >
> > >    /* Make sure there exists a single-predecessor exit bb.  Do this before
> > >       versioning.   */
> > > -  edge e = single_exit (loop);
> > > +  edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > >    if (! single_pred_p (e->dest))
> > >      {
> > >        split_loop_exit_edge (e, true);
> > > @@ -11376,7 +11406,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > >       loop closed PHI nodes on the exit.  */
> > >    if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
> > >      {
> > > -      e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
> > > +      e = LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
> > >        if (! single_pred_p (e->dest))
> > >  	{
> > >  	  split_loop_exit_edge (e, true);
> > > @@ -11625,8 +11655,9 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > >       a zero NITERS becomes a nonzero NITERS_VECTOR.  */
> > >    if (integer_onep (step_vector))
> > >      niters_no_overflow = true;
> > > -  vect_set_loop_condition (loop, loop_vinfo, niters_vector, step_vector,
> > > -			   niters_vector_mult_vf, !niters_no_overflow);
> > > +  vect_set_loop_condition (loop, LOOP_VINFO_IV_EXIT (loop_vinfo), loop_vinfo,
> > > +			   niters_vector, step_vector, niters_vector_mult_vf,
> > > +			   !niters_no_overflow);
> > >
> > >    unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo);
> > >
> > > @@ -11699,7 +11730,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > >  			  assumed_vf) - 1
> > >  	 : wi::udiv_floor (loop->nb_iterations_estimate + bias_for_assumed,
> > >  			   assumed_vf) - 1);
> > > -  scale_profile_for_vect_loop (loop, assumed_vf, flat);
> > > +  scale_profile_for_vect_loop (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > +			       assumed_vf, flat);
> > >
> > >    if (dump_enabled_p ())
> > >      {
> > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > index f1d0cd79961abb095bc79d3b59a81930f0337e59..afa7a8e30891c782a0e5e3740ecc4377f5a31e54 100644
> > > --- a/gcc/tree-vectorizer.h
> > > +++ b/gcc/tree-vectorizer.h
> > > @@ -919,10 +919,24 @@ public:
> > >       analysis.  */
> > >    vec<_loop_vec_info *> epilogue_vinfos;
> > >
> > > +  /* The controlling loop IV for the current loop when vectorizing.  This IV
> > > +     controls the natural exits of the loop.  */
> > > +  edge vec_loop_iv;
> > > +
> > > +  /* The controlling loop IV for the epilogue loop when vectorizing.  This IV
> > > +     controls the natural exits of the loop.  */
> > > +  edge vec_epilogue_loop_iv;
> > > +
> > > +  /* The controlling loop IV for the scalar loop being vectorized.  This IV
> > > +     controls the natural exits of the loop.  */
> > > +  edge scalar_loop_iv;
> > 
> > all of the above sound as if they were IVs, the access macros have _EXIT at the
> > end, can you make the above as well?
> > 
> > Otherwise looks good to me.
> > 
> > Feel free to push approved patches of the series, no need to wait until
> > everything is approved.
> > 
> > Thanks,
> > Richard.
> > 
> > >  } *loop_vec_info;
> > >
> > >  /* Access Functions.  */
> > >  #define LOOP_VINFO_LOOP(L)                 (L)->loop
> > > +#define LOOP_VINFO_IV_EXIT(L)              (L)->vec_loop_iv
> > > +#define LOOP_VINFO_EPILOGUE_IV_EXIT(L)     (L)->vec_epilogue_loop_iv
> > > +#define LOOP_VINFO_SCALAR_IV_EXIT(L)       (L)->scalar_loop_iv
> > >  #define LOOP_VINFO_BBS(L)                  (L)->bbs
> > >  #define LOOP_VINFO_NITERSM1(L)             (L)->num_itersm1
> > >  #define LOOP_VINFO_NITERS(L)               (L)->num_iters
> > > @@ -2155,11 +2169,13 @@ class auto_purge_vect_location
> > >
> > >  /* Simple loop peeling and versioning utilities for vectorizer's purposes -
> > >     in tree-vect-loop-manip.cc.  */
> > > -extern void vect_set_loop_condition (class loop *, loop_vec_info,
> > > +extern void vect_set_loop_condition (class loop *, edge, loop_vec_info,
> > >  				     tree, tree, tree, bool);
> > > -extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge);
> > > -class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
> > > -						     class loop *, edge);
> > > +extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
> > > +					 const_edge);
> > > +class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
> > > +						    class loop *, edge,
> > > +						    edge, edge *);
> > >  class loop *vect_loop_versioning (loop_vec_info, gimple *);
> > >  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
> > >  				    tree *, tree *, tree *, int, bool, bool,
> > > @@ -2169,6 +2185,7 @@ extern void vect_prepare_for_masked_peels (loop_vec_info);
> > >  extern dump_user_location_t find_loop_location (class loop *);
> > >  extern bool vect_can_advance_ivs_p (loop_vec_info);
> > >  extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
> > > +extern edge vec_init_loop_exit_info (class loop *);
> > >
> > >  /* In tree-vect-stmts.cc.  */
> > >  extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
> > > @@ -2358,6 +2375,7 @@ struct vect_loop_form_info
> > >    tree assumptions;
> > >    gcond *loop_cond;
> > >    gcond *inner_loop_cond;
> > > +  edge loop_exit;
> > >  };
> > >  extern opt_result vect_analyze_loop_form (class loop *, vect_loop_form_info *);
> > >  extern loop_vec_info vect_create_loop_vinfo (class loop *, vec_info_shared *,
> > > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> > > index a048e9d89178a37455bd7b83ab0f2a238a4ce69e..d97e2b54c25ac60378935392aa7b73476efed74b 100644
> > > --- a/gcc/tree-vectorizer.cc
> > > +++ b/gcc/tree-vectorizer.cc
> > > @@ -943,6 +943,8 @@ set_uid_loop_bbs (loop_vec_info loop_vinfo, gimple *loop_vectorized_call,
> > >    class loop *scalar_loop = get_loop (fun, tree_to_shwi (arg));
> > >
> > >    LOOP_VINFO_SCALAR_LOOP (loop_vinfo) = scalar_loop;
> > > +  LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo)
> > > +    = vec_init_loop_exit_info (scalar_loop);
> > >    gcc_checking_assert (vect_loop_vectorized_call (scalar_loop)
> > >  		       == loop_vectorized_call);
> > >    /* If we are going to vectorize outer loop, prevent vectorization
> > >
> > >
> > >
> > >
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 2/3]middle-end: updated niters analysis to handle multiple exits.
  2023-10-11 10:54     ` Tamar Christina
@ 2023-10-11 12:08       ` Richard Biener
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Biener @ 2023-10-11 12:08 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 11 Oct 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Tuesday, October 10, 2023 12:14 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: Re: [PATCH 2/3]middle-end: updated niters analysis to handle
> > multiple exits.
> > 
> > On Mon, 2 Oct 2023, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This second part updates niters analysis to be able to analyze any
> > > number of exits.  If we have multiple exits we determine the main exit
> > > by finding the first counting IV.
> > >
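The selection rule in the paragraph above can be sketched outside of GCC roughly as follows. This is only an illustration under stated assumptions: `ExitInfo`, `niters_known`, and `pick_main_exit` are hypothetical names, and a "counting IV" is modelled simply as an exit whose iteration count could be computed. The actual patch works on `edge` and `tree_niter_desc`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a loop exit; not a GCC type.
struct ExitInfo {
  int src_bb;         // basic block index the exit leaves from
  bool niters_known;  // assumption: true when niter analysis succeeded
};

// Return the index of the first exit with a computable iteration count,
// or -1 when there is none (the caller would then reject the loop).
int pick_main_exit(const std::vector<ExitInfo> &exits) {
  for (int i = 0; i < static_cast<int>(exits.size()); ++i)
    if (exits[i].niters_known)
      return i;
  return -1;
}
```

With a single exit this degenerates to "take the only exit if it is analyzable", which is close to what patch 1/3 sets up.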
> > > The change allows the vectorizer to pass analysis for loops with multiple
> > > exits, but we later gracefully reject them.  It does however allow us to
> > > test if the exit handling is using the right exit everywhere.
> > >
> > > Additionally since we analyze all exits, we now return all conditions
> > > for them and determine which condition belongs to the main exit.
> > >
> > > The main condition is needed because the vectorizer needs to ignore
> > > the main IV condition during vectorization as it will replace it during codegen.
> > >
> > > To track versioned loops we extend the contract between ifcvt and the
> > > vectorizer to store the exit number in aux so that we can match it up again
> > > during peeling.
> > >
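A rough, self-contained sketch of the aux-index contract described above, not GCC code: `Edge`, `record_exit_positions`, and `match_exit` are hypothetical stand-ins for the `tree_if_conversion` and `slpeel_tree_duplicate_loop_to_edge_cfg` hunks further down in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for GCC's edge; only the fields this sketch needs.
struct Edge {
  int src_bb;
  int dest_bb;
  void *aux;
};

// Stamp every exit with its position in the exit list before the loop is
// rewritten, mirroring `exit->aux = (void*)idx++` in the patch.
void record_exit_positions(std::vector<Edge *> &exits) {
  std::uintptr_t idx = 0;
  for (Edge *e : exits)
    e->aux = reinterpret_cast<void *>(idx++);
}

// Find the exit of the other loop copy that carries the same stamp.
// Returns nullptr when no exit matches.
Edge *match_exit(const std::vector<Edge *> &scalar_exits,
                 const Edge *loop_exit) {
  for (Edge *e : scalar_exits)
    if (e->aux == loop_exit->aux)
      return e;
  return nullptr;
}
```

One thing the sketch makes visible: index 0 is stored as a null `aux`, so an exit that was never stamped cannot be told apart from the first exit. Whether that matters depends on who else writes `aux`, which the sketch does not model.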
> > > Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-linux-gnu
> > > with no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* tree-if-conv.cc (tree_if_conversion): Record exits in aux.
> > > 	* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Use
> > > 	it.
> > > 	* tree-vect-loop.cc (vect_get_loop_niters): Determine main exit.
> > > 	(vec_init_loop_exit_info): Extend analysis when multiple exits.
> > > 	(vect_analyze_loop_form): Record conds and determine main cond.
> > > 	(vect_create_loop_vinfo): Extend bookkeeping of conds.
> > > 	(vect_analyze_loop): Release conds.
> > > 	* tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS,
> > > 	LOOP_VINFO_LOOP_IV_COND):  New.
> > > 	(struct vect_loop_form_info): Add conds, alt_loop_conds;
> > > 	(struct loop_vec_info): Add conds, loop_iv_cond.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> > > index 799f071965e5c41eb352b5530cf1d9c7ecf7bf25..3dc2290467797ebbfcef55903531b22829f4fdbd 100644
> > > --- a/gcc/tree-if-conv.cc
> > > +++ b/gcc/tree-if-conv.cc
> > > @@ -3795,6 +3795,13 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
> > >      }
> > >    if (need_to_ifcvt)
> > >      {
> > > +      /* Before we rewrite edges we'll record their original position in the
> > > +	 edge map such that we can map the edges between the ifcvt and the
> > > +	 non-ifcvt loop during peeling.  */
> > > +      uintptr_t idx = 0;
> > > +      for (edge exit : get_loop_exit_edges (loop))
> > > +	exit->aux = (void*)idx++;
> > > +
> > >        /* Now all statements are if-convertible.  Combine all the basic
> > >  	 blocks into one huge basic block doing the if-conversion
> > >  	 on-the-fly.  */
> > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > index e06717272aafc6d31cbdcb94840ac25de616da6d..77f8e668bcc8beca99ba4052e1b12e0d17300262 100644
> > > --- a/gcc/tree-vect-loop-manip.cc
> > > +++ b/gcc/tree-vect-loop-manip.cc
> > > @@ -1470,6 +1470,18 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
> > >        scalar_loop = loop;
> > >        scalar_exit = loop_exit;
> > >      }
> > > +  else if (scalar_loop == loop)
> > > +    scalar_exit = loop_exit;
> > > +  else
> > > +    {
> > > +      /* Loop has been versioned, match exits up using the aux index.  */
> > > +      for (edge exit : get_loop_exit_edges (scalar_loop))
> > > +	if (exit->aux == loop_exit->aux)
> > > +	  {
> > > +	    scalar_exit	= exit;
> > > +	    break;
> > > +	  }
> > > +    }
> > >
> > >    bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
> > >    pbbs = bbs + 1;
> > > @@ -1501,6 +1513,8 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
> > >    exit = loop_exit;
> > >    basic_block new_preheader = new_bbs[0];
> > >
> > > +  /* Record the new loop exit information.  new_loop doesn't have SCEV data and
> > > +     so we must initialize the exit information.  */
> > >    if (new_e)
> > >      *new_e = new_exit;
> > >
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > index 6e60d84143626a8e1d801bb580f4dcebc73c7ba7..f1caa5f207d3b13da58c3a313b11d1ef98374349 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -851,79 +851,106 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info loop_vinfo)
> > >     in NUMBER_OF_ITERATIONSM1.  Place the condition under which the
> > >     niter information holds in ASSUMPTIONS.
> > >
> > > -   Return the loop exit condition.  */
> > > +   Return the loop exit conditions.  */
> > >
> > >
> > > -static gcond *
> > > -vect_get_loop_niters (class loop *loop, edge exit, tree *assumptions,
> > > +static vec<gcond *>
> > > +vect_get_loop_niters (class loop *loop, tree *assumptions, const_edge main_exit,
> > >  		      tree *number_of_iterations, tree *number_of_iterationsm1)
> > 
> > Any reason you swap exit and main_exit?  IMHO the input better pairs with
> > the other input 'loop'.
> > 
> 
> No, I think I was just rearranging things to fit more on a line.  I'll put them next
> to their exits.
> 
> > 
> > >  {
> > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > +  vec<gcond *> conds;
> > > +  conds.create (exits.length ());
> > >    class tree_niter_desc niter_desc;
> > >    tree niter_assumptions, niter, may_be_zero;
> > > -  gcond *cond = get_loop_exit_condition (loop);
> > >
> > >    *assumptions = boolean_true_node;
> > >    *number_of_iterationsm1 = chrec_dont_know;
> > >    *number_of_iterations = chrec_dont_know;
> > > +
> > >    DUMP_VECT_SCOPE ("get_loop_niters");
> > >
> > > -  if (!exit)
> > > -    return cond;
> > > +  if (exits.is_empty ())
> > > +    return conds;
> > > +
> > > +  if (dump_enabled_p ())
> > > +    dump_printf_loc (MSG_NOTE, vect_location, "Loop has %d exits.\n",
> > > +		     exits.length ());
> > > +
> > > +  edge exit;
> > > +  unsigned int i;
> > > +  FOR_EACH_VEC_ELT (exits, i, exit)
> > > +    {
> > > +      gcond *cond = get_loop_exit_condition (exit);
> > > +      if (cond)
> > > +	conds.safe_push (cond);
> > > +
> > > +      if (dump_enabled_p ())
> > > +	dump_printf_loc (MSG_NOTE, vect_location, "Analyzing exit %d...\n",
> > > +i);
> > >
> > > -  may_be_zero = NULL_TREE;
> > > -  if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc,
> > NULL)
> > > -      || chrec_contains_undetermined (niter_desc.niter))
> > > -    return cond;
> > > +      may_be_zero = NULL_TREE;
> > > +      if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc,
> > NULL)
> > > +          || chrec_contains_undetermined (niter_desc.niter))
> > > +	continue;
> > >
> > > -  niter_assumptions = niter_desc.assumptions;
> > > -  may_be_zero = niter_desc.may_be_zero;
> > > -  niter = niter_desc.niter;
> > > +      niter_assumptions = niter_desc.assumptions;
> > > +      may_be_zero = niter_desc.may_be_zero;
> > > +      niter = niter_desc.niter;
> > >
> > > -  if (may_be_zero && integer_zerop (may_be_zero))
> > > -    may_be_zero = NULL_TREE;
> > > +      if (may_be_zero && integer_zerop (may_be_zero))
> > > +	may_be_zero = NULL_TREE;
> > >
> > > -  if (may_be_zero)
> > > -    {
> > > -      if (COMPARISON_CLASS_P (may_be_zero))
> > > +      if (may_be_zero)
> > >  	{
> > > -	  /* Try to combine may_be_zero with assumptions, this can simplify
> > > -	     computation of niter expression.  */
> > > -	  if (niter_assumptions && !integer_nonzerop (niter_assumptions))
> > > -	    niter_assumptions = fold_build2 (TRUTH_AND_EXPR,
> > boolean_type_node,
> > > -					     niter_assumptions,
> > > -					     fold_build1 (TRUTH_NOT_EXPR,
> > > -							  boolean_type_node,
> > > -							  may_be_zero));
> > > +	  if (COMPARISON_CLASS_P (may_be_zero))
> > > +	    {
> > > +	      /* Try to combine may_be_zero with assumptions, this can simplify
> > > +		 computation of niter expression.  */
> > > +	      if (niter_assumptions && !integer_nonzerop (niter_assumptions))
> > > +		niter_assumptions = fold_build2 (TRUTH_AND_EXPR,
> > boolean_type_node,
> > > +						 niter_assumptions,
> > > +						 fold_build1
> > (TRUTH_NOT_EXPR,
> > > +
> > boolean_type_node,
> > > +							      may_be_zero));
> > > +	      else
> > > +		niter = fold_build3 (COND_EXPR, TREE_TYPE (niter),
> > may_be_zero,
> > > +				     build_int_cst (TREE_TYPE (niter), 0),
> > > +				     rewrite_to_non_trapping_overflow (niter));
> > > +
> > > +	      may_be_zero = NULL_TREE;
> > > +	    }
> > > +	  else if (integer_nonzerop (may_be_zero) && exit == main_exit)
> > > +	    {
> > > +	      *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
> > > +	      *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
> > > +	      continue;
> > > +	    }
> > >  	  else
> > > -	    niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
> > > -				 build_int_cst (TREE_TYPE (niter), 0),
> > > -				 rewrite_to_non_trapping_overflow (niter));
> > > +	    continue;
> > > +       }
> > >
> > > -	  may_be_zero = NULL_TREE;
> > > -	}
> > > -      else if (integer_nonzerop (may_be_zero))
> > > +      /* Loop assumptions are based off the normal exit.  */
> > > +      if (exit == main_exit)
> > 
> > It's a bit hard to follow in patch form but I wonder why you even analyze the
> > number of iterations of the non-main exits, risking clobbering the
> > *number_* outputs which we later assume to be for the main exit?
> > 
> 
> My original goal here was that if we can't analyze the other exits, we probably
> can't vectorize them. So I don't really need the results but I thought it useful to
> check.  I can skip them.

Please.  I don't think they need to be countable.
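
A minimal sketch of what skipping the non-main exits could look like,
using toy stand-ins rather than the real GCC types.  ToyExit, ToyNiters
and toy_get_loop_niters are hypothetical names made up for illustration;
the point is only that every exit contributes its condition, while the
iteration-count outputs are written for the main exit alone and so
cannot be clobbered by the others.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-ins for illustration only; none of these are GCC types.
struct ToyExit
{
  int cond_id;                // hypothetical id of the exit condition, -1 if none
  std::optional<long> niter;  // latch execution count, if the exit is countable
};

struct ToyNiters
{
  std::vector<int> conds;             // conditions of all exits that have one
  std::optional<long> iterations_m1;  // latch executions of the main exit
  std::optional<long> iterations;     // header executions of the main exit
};

// Collect the condition of every exit, but compute iteration counts only
// for the exit at index MAIN_EXIT.  Other exits need not be countable.
ToyNiters
toy_get_loop_niters (const std::vector<ToyExit> &exits, std::size_t main_exit)
{
  ToyNiters out;
  for (std::size_t i = 0; i < exits.size (); ++i)
    {
      if (exits[i].cond_id >= 0)
        out.conds.push_back (exits[i].cond_id);

      // Non-main exits never touch the iteration-count outputs.
      if (i != main_exit || !exits[i].niter)
        continue;

      out.iterations_m1 = exits[i].niter;
      // Header executions are latch executions plus one.
      out.iterations = *exits[i].niter + 1;
    }
  return out;
}
```

With three exits where only the second is the main one, the result
carries three conditions and the counts of the second exit only, even
when the first exit is not countable at all.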

Richard.

> Thanks,
> Tamar
> 
> > >  	{
> > > -	  *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
> > > -	  *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
> > > -	  return cond;
> > > +	  *assumptions = niter_assumptions;
> > > +	  *number_of_iterationsm1 = niter;
> > > +
> > > +	  /* We want the number of loop header executions which is the
> > number
> > > +	     of latch executions plus one.
> > > +	     ???  For UINT_MAX latch executions this number overflows to zero
> > > +	     for loops like do { n++; } while (n != 0);  */
> > > +	  if (niter && !chrec_contains_undetermined (niter))
> > > +	    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
> > > +				 unshare_expr (niter),
> > > +				 build_int_cst (TREE_TYPE (niter), 1));
> > > +	  *number_of_iterations = niter;
> > >  	}
> > > -      else
> > > -	return cond;
> > >      }
> > >
> > > -  *assumptions = niter_assumptions;
> > > -  *number_of_iterationsm1 = niter;
> > > -
> > > -  /* We want the number of loop header executions which is the number
> > > -     of latch executions plus one.
> > > -     ???  For UINT_MAX latch executions this number overflows to zero
> > > -     for loops like do { n++; } while (n != 0);  */
> > > -  if (niter && !chrec_contains_undetermined (niter))
> > > -    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), unshare_expr (niter),
> > > -			  build_int_cst (TREE_TYPE (niter), 1));
> > > -  *number_of_iterations = niter;
> > > +  if (dump_enabled_p ())
> > > +    dump_printf_loc (MSG_NOTE, vect_location,
> > > +		     "All loop exits successfully analyzed.\n");
> > >
> > > -  return cond;
> > > +  return conds;
> > >  }
> > >
> > >  /*  Determine the main loop exit for the vectorizer.  */
> > > @@ -936,8 +963,25 @@ vec_init_loop_exit_info (class loop *loop)
> > >    auto_vec<edge> exits = get_loop_exit_edges (loop);
> > >    if (exits.length () == 1)
> > >      return exits[0];
> > > -  else
> > > -    return NULL;
> > > +
> > > +  /* If we have multiple exits we only support counting IV at the moment.  Analyze
> > > +     all exits and return one */
> > > +  class tree_niter_desc niter_desc;
> > > +  edge candidate = NULL;
> > > +  for (edge exit : exits)
> > > +    {
> > > +      if (!get_loop_exit_condition (exit))
> > > +	continue;
> > > +
> > > +      if (number_of_iterations_exit_assumptions (loop, exit, &niter_desc,
> > NULL)
> > > +	  && !chrec_contains_undetermined (niter_desc.niter))
> > > +	{
> > > +	  if (!niter_desc.may_be_zero || !candidate)
> > > +	    candidate = exit;
> > > +	}
> > > +    }
> > > +
> > > +  return candidate;
> > >  }
> > >
> > >  /* Function bb_in_loop_p
> > > @@ -1788,21 +1832,31 @@ vect_analyze_loop_form (class loop *loop,
> > vect_loop_form_info *info)
> > >  				   "not vectorized: latch block not empty.\n");
> > >
> > >    /* Make sure the exit is not abnormal.  */
> > > -  edge e = single_exit (loop);
> > > -  if (e->flags & EDGE_ABNORMAL)
> > > +  if (exit_e->flags & EDGE_ABNORMAL)
> > >      return opt_result::failure_at (vect_location,
> > >  				   "not vectorized:"
> > >  				   " abnormal loop exit edge.\n");
> > >
> > > -  info->loop_cond
> > > -    = vect_get_loop_niters (loop, e, &info->assumptions,
> > > +  info->conds
> > > +    = vect_get_loop_niters (loop, &info->assumptions, exit_e,
> > >  			    &info->number_of_iterations,
> > >  			    &info->number_of_iterationsm1);
> > > -  if (!info->loop_cond)
> > > +
> > > +  if (info->conds.is_empty ())
> > >      return opt_result::failure_at
> > >        (vect_location,
> > >         "not vectorized: complicated exit condition.\n");
> > >
> > > +  /* Determine what the primary and alternate exit conds are.  */
> > > +  info->alt_loop_conds.create (info->conds.length () - 1);
> > > +  for (gcond *cond : info->conds)
> > > +    {
> > > +      if (exit_e->src != gimple_bb (cond))
> > > +	info->alt_loop_conds.quick_push (cond);
> > > +      else
> > > +	info->loop_cond = cond;
> > > +    }
> > > +
> > 
> > IMHO it would be simpler to have the primary exit condition in
> > info->conds[0] and the rest after that?  That avoids having two
> > arrays and one scalar in vect_loop_form_info.
> > 
> > >    if (integer_zerop (info->assumptions)
> > >        || !info->number_of_iterations
> > >        || chrec_contains_undetermined (info->number_of_iterations))
> > > @@ -1847,8 +1901,13 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
> > >    if (!integer_onep (info->assumptions) && !main_loop_info)
> > >      LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info->assumptions;
> > >
> > > -  stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (info->loop_cond);
> > > -  STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
> > > +  for (gcond *cond : info->conds)
> > > +    {
> > > +      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (cond);
> > > +      STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
> > > +    }
> > > +  LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
> > > +  LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
> > >    LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
> > >
> > > @@ -3594,7 +3653,11 @@ vect_analyze_loop (class loop *loop,
> > vec_info_shared *shared)
> > >  			 && LOOP_VINFO_PEELING_FOR_NITER
> > (first_loop_vinfo)
> > >  			 && !loop->simduid);
> > >    if (!vect_epilogues)
> > > -    return first_loop_vinfo;
> > > +    {
> > > +      loop_form_info.conds.release ();
> > > +      loop_form_info.alt_loop_conds.release ();
> > > +      return first_loop_vinfo;
> > > +    }
> > 
> > I think there's 'inner' where you leak these.  Maybe use auto_vec<> in
> > vect_loop_form_info instead?
> > 
> > Otherwise looks OK.
> > 
> > Thanks,
> > Richard.
> > 
> > >    /* Now analyze first_loop_vinfo for epilogue vectorization.  */
> > >    poly_uint64 lowest_th = LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo);
> > > @@ -3694,6 +3757,9 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
> > >  			   (first_loop_vinfo->epilogue_vinfos[0]-
> > >vector_mode));
> > >      }
> > >
> > > +  loop_form_info.conds.release ();
> > > +  loop_form_info.alt_loop_conds.release ();
> > > +
> > >    return first_loop_vinfo;
> > >  }
> > >
> > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > index afa7a8e30891c782a0e5e3740ecc4377f5a31e54..55b6771b271d5072fa1327d595e1dddb112cfdf6 100644
> > > --- a/gcc/tree-vectorizer.h
> > > +++ b/gcc/tree-vectorizer.h
> > > @@ -882,6 +882,12 @@ public:
> > >       we need to peel off iterations at the end to form an epilogue loop.  */
> > >    bool peeling_for_niter;
> > >
> > > +  /* List of loop additional IV conditionals found in the loop.  */
> > > +  auto_vec<gcond *> conds;
> > > +
> > > +  /* Main loop IV cond.  */
> > > +  gcond* loop_iv_cond;
> > > +
> > >    /* True if there are no loop carried data dependencies in the loop.
> > >       If loop->safelen <= 1, then this is always true, either the loop
> > >       didn't have any loop carried data dependencies, or the loop is being
> > > @@ -984,6 +990,8 @@ public:
> > >  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
> > >  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
> > >  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> > > +#define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> > > +#define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
> > >  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)-
> > >no_data_dependencies
> > >  #define LOOP_VINFO_SCALAR_LOOP(L)	   (L)->scalar_loop
> > >  #define LOOP_VINFO_SCALAR_LOOP_SCALING(L)  (L)->scalar_loop_scaling
> > > @@ -2373,7 +2381,9 @@ struct vect_loop_form_info
> > >    tree number_of_iterations;
> > >    tree number_of_iterationsm1;
> > >    tree assumptions;
> > > +  vec<gcond *> conds;
> > >    gcond *loop_cond;
> > > +  vec<gcond *> alt_loop_conds;
> > >    gcond *inner_loop_cond;
> > >    edge loop_exit;
> > >  };
> > >
> > >
> > >
> > >
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* RE: [PATCH 3/3]middle-end: maintain LCSSA throughout loop peeling
  2023-10-11 11:16     ` Tamar Christina
@ 2023-10-11 12:09       ` Richard Biener
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Biener @ 2023-10-11 12:09 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 11 Oct 2023, Tamar Christina wrote:

> > > +  auto loop_exits = get_loop_exit_edges (loop);
> > > +  auto_vec<basic_block> doms;
> > > +
> > >    if (at_exit) /* Add the loop copy at exit.  */
> > >      {
> > > -      if (scalar_loop != loop)
> > > +      if (scalar_loop != loop && new_exit->dest != exit_dest)
> > >  	{
> > > -	  gphi_iterator gsi;
> > >  	  new_exit = redirect_edge_and_branch (new_exit, exit_dest);
> > > +	  flush_pending_stmts (new_exit);
> > > +	}
> > >
> > > -	  for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi);
> > > -	       gsi_next (&gsi))
> > > -	    {
> > > -	      gphi *phi = gsi.phi ();
> > > -	      tree orig_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
> > > -	      location_t orig_locus
> > > -		= gimple_phi_arg_location_from_edge (phi, e);
> > > +      auto_vec <gimple *> new_phis;
> > > +      hash_map <tree, tree> new_phi_args;
> > > +      /* First create the empty phi nodes so that when we flush the
> > > +	 statements they can be filled in.   However because there is no order
> > > +	 between the PHI nodes in the exits and the loop headers we need to
> > > +	 order them based on the order of the two headers.  First record the new
> > > +	 phi nodes.  */
> > > +      for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
> > > +	   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
> > > +	{
> > > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > > +	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> > > +	  gphi *res = create_phi_node (new_res, new_preheader);
> > > +	  new_phis.safe_push (res);
> > > +	}
> > >
> > > -	      add_phi_arg (phi, orig_arg, new_exit, orig_locus);
> > > +      /* Then redirect the edges and flush the changes.  This writes out the
> > new
> > > +	 SSA names.  */
> > > +      for (edge exit : loop_exits)
> > 
> > I realize at the moment it's the same, but we are redirecting multiple exit edges
> > here and from the walk above expect them all to have the same set of PHI
> > nodes - that looks a bit fragile?
> 
> No, it only expects the two preheaders to have the same PHI nodes.  Since one loop
> is copied from the other we know that to be true.
> 
> Now of course there are cases where your exit blocks have more PHI nodes than the
> headers (e.g. live values) but those are handled later in the hunk below (with new_phi_args).
> 
> For the flush_pending_stmts to work I had to make sure the order of the phi nodes is the
> same as in the original.  This is why I can't iterate over the values in the exit block instead and
> need to handle it in two steps.
> 
> > Does this need adjustments later for the early exit vectorization?
> > 
> 
> I believe (need to finish the rebase) that the only adjustment I'll need here for multiple exits
> is the updates of the dominators.  I don't think I'll need more.  I had issues with live values that
> I had to handle specially before, but I think this new approach should deal with it already.

OK.
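
The two-step scheme described above can be pictured with a toy model.
This is only a sketch under stated assumptions: SSA names are plain
strings, ToyPhi, toy_build_lcssa_cache and toy_lookup_or_create are
hypothetical helpers, and the "flush" is modelled as filling arguments
positionally, which is why the empty PHIs have to be created in a fixed
order first.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy PHI node: a result name and its single incoming argument.
struct ToyPhi
{
  std::string result;
  std::string arg;
};

// Step 1: create empty PHIs in the order of FROM_RESULTS.
// Step 2: fill them positionally from PENDING_ARGS (the modelled flush).
// Step 3: record argument -> result so later code can reuse the names.
std::unordered_map<std::string, std::string>
toy_build_lcssa_cache (const std::vector<std::string> &from_results,
                       const std::vector<std::string> &pending_args,
                       std::vector<ToyPhi> &new_phis)
{
  for (const std::string &res : from_results)
    new_phis.push_back (ToyPhi{res + "_lc", ""});

  for (std::size_t i = 0; i < new_phis.size () && i < pending_args.size (); ++i)
    new_phis[i].arg = pending_args[i];

  std::unordered_map<std::string, std::string> cache;
  for (const ToyPhi &phi : new_phis)
    if (!phi.arg.empty ())
      cache[phi.arg] = phi.result;
  return cache;
}

// Reuse a cached name if the argument was already materialized,
// otherwise hand out a fresh name and count the creation.
std::string
toy_lookup_or_create (const std::unordered_map<std::string, std::string> &cache,
                      const std::string &arg, int &created)
{
  auto it = cache.find (arg);
  if (it != cache.end ())
    return it->second;
  ++created;
  return arg + "_new";
}
```

Values that were already written out during the modelled flush are found
in the cache and only propagated; anything else (a live value that has no
header PHI, say) still gets a fresh name.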

> > This also somewhat confuses the original redirection of 'e', the main exit with
> > the later (*)
> > 
> > > +	{
> > > +	  edge e = redirect_edge_and_branch (exit, new_preheader);
> > > +	  flush_pending_stmts (e);
> > > +	}
> > > +
> > > +      /* Record the new SSA names in the cache so that we can skip
> > materializing
> > > +	 them again when we fill in the rest of the LCSSA variables.  */
> > > +      for (auto phi : new_phis)
> > > +	{
> > > +	  tree new_arg = gimple_phi_arg (phi, 0)->def;
> > 
> > and here you look at the (for now) single edge we redirected ...
> > 
> > > +	  new_phi_args.put (new_arg, gimple_phi_result (phi));
> > > +	}
> > > +
> > > +      /* Copy the current loop LC PHI nodes between the original loop exit
> > > +	 block and the new loop header.  This allows us to later split the
> > > +	 preheader block and still find the right LC nodes.  */
> > > +      edge latch_new = single_succ_edge (new_preheader);
> > 
> > odd name - the single successor of a loop preheader is the loop header and the
> > corresponding edge is the loop entry edge, not the latch?
> > 
> > > +      for (auto gsi_from = gsi_start_phis (loop->header),
> > > +	   gsi_to = gsi_start_phis (new_loop->header);
> > > +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> > 
> > Eh, can we have
> > 
> >   if (flow_loops)
> >     for  (auto ...)
> > 
> > please, even if that indents more?
> > 
> > > +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> > > +	{
> > > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > > +	  gimple *to_phi = gsi_stmt (gsi_to);
> > > +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> > > +						loop_latch_edge (loop));
> > > +
> > > +	  /* Check if we've already created a new phi node during edge
> > > +	     redirection.  If we have, only propagate the value downwards.  */
> > > +	  if (tree *res = new_phi_args.get (new_arg))
> > > +	    {
> > > +	      adjust_phi_and_debug_stmts (to_phi, latch_new, *res);
> > > +	      continue;
> > >  	    }
> > > +
> > > +	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> > > +	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
> > > +
> > > +	  /* Main loop exit should use the final iter value.  */
> > > +	  add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
> > 
> > For all other edges into the loop besides 'e' there's missing PHI arguments?
> > You are using 'e' here again, but also use that as temporary in for blocks,
> > shadowing the parameter - that makes it difficult to read.  Also it's sometimes
> > 'e->dest' and sometimes new_preheader - I think you want to use
> > new_preheader here as well (in create_phi_node) for consistency and ease of
> > understanding.
> > 
> > ISTR when early break vectorization lands we're going to redirect the alternate
> > exits away again "fixing" the missing PHI args.
> > 
> 
> We indeed had a discussion about this, and I'll expand more on the reasoning in the
> patch for early breaks.  But I think not redirecting the edges away for early break makes
> more sense as it treats early break, alignment peeling and epilogue vectorization the same
> way and the only difference is in the statement inside the guard blocks.
> 
> But also more importantly this representation also makes it easier to implement First-Faulting
> Loads support.  For FFL we'll copy the main loop and at the "fault" check we branch to a new
> loop remainder that has the same sequences as the remainder of the main vector loop but
> with different predicates.  The reason for this is to remove the predicate mangling from the
> optimal/likely loop body which is critical for performance.
> 
> Now since FFL is intended to pair naturally with early break having the early exit edges all
> lead into the same block makes the flow a lot easier to manage.
> 
> But I'll make sure to include a diagram in the early break peeling patch.

Thanks.

So with the minor pending adjustments this series should be OK.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> > > +
> > > +	  adjust_phi_and_debug_stmts (to_phi, latch_new, new_res);
> > >  	}
> > > -      redirect_edge_and_branch_force (e, new_preheader);
> > > -      flush_pending_stmts (e);
> > > +
> > >        set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
> > > -      if (was_imm_dom || duplicate_outer_loop)
> > > +
> > > +      if ((was_imm_dom || duplicate_outer_loop))
> > 
> > extra ()s
> > 
> > >  	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit-
> > >src);
> > >
> > >        /* And remove the non-necessary forwarder again.  Keep the other
> > > @@ -1598,6 +1680,22 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
> > >      }
> > >    else /* Add the copy at entry.  */
> > >      {
> > > +      /* Copy the current loop LC PHI nodes between the original loop exit
> > > +	 block and the new loop header.  This allows us to later split the
> > > +	 preheader block and still find the right LC nodes.  */
> > > +      for (auto gsi_from = gsi_start_phis (new_loop->header),
> > > +	   gsi_to = gsi_start_phis (loop->header);
> > > +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> > 
> > same if (flow_loops)
> > 
> > > +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> > > +	{
> > > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > > +	  gimple *to_phi = gsi_stmt (gsi_to);
> > > +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> > > +						loop_latch_edge (new_loop));
> > 
> > this looks wrong?  IMHO it should be the PHI_RESULT, no?  Note this only
> > triggers for alignment peeling ...
> > 
> > Otherwise looks OK.
> > 
> > Thanks,
> > Richard.
> > 
> > 
> > > +	  adjust_phi_and_debug_stmts (to_phi, loop_preheader_edge (loop),
> > > +				      new_arg);
> > > +	}
> > > +
> > >        if (scalar_loop != loop)
> > >  	{
> > >  	  /* Remove the non-necessary forwarder of scalar_loop again.  */ @@
> > > -1627,29 +1725,6 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop
> > *loop, edge loop_exit,
> > >  			       loop_preheader_edge (new_loop)->src);
> > >      }
> > >
> > > -  if (scalar_loop != loop)
> > > -    {
> > > -      /* Update new_loop->header PHIs, so that on the preheader
> > > -	 edge they are the ones from loop rather than scalar_loop.  */
> > > -      gphi_iterator gsi_orig, gsi_new;
> > > -      edge orig_e = loop_preheader_edge (loop);
> > > -      edge new_e = loop_preheader_edge (new_loop);
> > > -
> > > -      for (gsi_orig = gsi_start_phis (loop->header),
> > > -	   gsi_new = gsi_start_phis (new_loop->header);
> > > -	   !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
> > > -	   gsi_next (&gsi_orig), gsi_next (&gsi_new))
> > > -	{
> > > -	  gphi *orig_phi = gsi_orig.phi ();
> > > -	  gphi *new_phi = gsi_new.phi ();
> > > -	  tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
> > > -	  location_t orig_locus
> > > -	    = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
> > > -
> > > -	  add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
> > > -	}
> > > -    }
> > > -
> > >    free (new_bbs);
> > >    free (bbs);
> > >
> > > @@ -2579,139 +2654,36 @@ vect_gen_vector_loop_niters_mult_vf
> > > (loop_vec_info loop_vinfo,
> > >
> > >  /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP,
> > >     this function searches for the corresponding lcssa phi node in exit
> > > -   bb of LOOP.  If it is found, return the phi result; otherwise return
> > > -   NULL.  */
> > > +   bb of LOOP following the LCSSA_EDGE to the exit node.  If it is found,
> > > +   return the phi result; otherwise return NULL.  */
> > >
> > >  static tree
> > >  find_guard_arg (class loop *loop ATTRIBUTE_UNUSED,
> > >  		class loop *epilog ATTRIBUTE_UNUSED,
> > > -		const_edge e, gphi *lcssa_phi)
> > > +		const_edge e, gphi *lcssa_phi, int lcssa_edge = 0)
> > >  {
> > >    gphi_iterator gsi;
> > >
> > > -  gcc_assert (single_pred_p (e->dest));
> > >    for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
> > >      {
> > >        gphi *phi = gsi.phi ();
> > > -      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
> > > -			   PHI_ARG_DEF (lcssa_phi, 0), 0))
> > > -	return PHI_RESULT (phi);
> > > -    }
> > > -  return NULL_TREE;
> > > -}
> > > -
> > > -/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates
> > FIRST/SECOND
> > > -   from SECOND/FIRST and puts it at the original loop's preheader/exit
> > > -   edge, the two loops are arranged as below:
> > > -
> > > -       preheader_a:
> > > -     first_loop:
> > > -       header_a:
> > > -	 i_1 = PHI<i_0, i_2>;
> > > -	 ...
> > > -	 i_2 = i_1 + 1;
> > > -	 if (cond_a)
> > > -	   goto latch_a;
> > > -	 else
> > > -	   goto between_bb;
> > > -       latch_a:
> > > -	 goto header_a;
> > > -
> > > -       between_bb:
> > > -	 ;; i_x = PHI<i_2>;   ;; LCSSA phi node to be created for FIRST,
> > > -
> > > -     second_loop:
> > > -       header_b:
> > > -	 i_3 = PHI<i_0, i_4>; ;; Use of i_0 to be replaced with i_x,
> > > -				 or with i_2 if no LCSSA phi is created
> > > -				 under condition of
> > CREATE_LCSSA_FOR_IV_PHIS.
> > > -	 ...
> > > -	 i_4 = i_3 + 1;
> > > -	 if (cond_b)
> > > -	   goto latch_b;
> > > -	 else
> > > -	   goto exit_bb;
> > > -       latch_b:
> > > -	 goto header_b;
> > > -
> > > -       exit_bb:
> > > -
> > > -   This function creates loop closed SSA for the first loop; update the
> > > -   second loop's PHI nodes by replacing argument on incoming edge with the
> > > -   result of newly created lcssa PHI nodes.  IF CREATE_LCSSA_FOR_IV_PHIS
> > > -   is false, Loop closed ssa phis will only be created for non-iv phis for
> > > -   the first loop.
> > > -
> > > -   This function assumes exit bb of the first loop is preheader bb of the
> > > -   second loop, i.e, between_bb in the example code.  With PHIs updated,
> > > -   the second loop will execute rest iterations of the first.  */
> > > -
> > > -static void
> > > -slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > > -				   class loop *first, edge first_loop_e,
> > > -				   class loop *second, edge second_loop_e,
> > > -				   bool create_lcssa_for_iv_phis)
> > > -{
> > > -  gphi_iterator gsi_update, gsi_orig;
> > > -  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > -
> > > -  edge first_latch_e = EDGE_SUCC (first->latch, 0);
> > > -  edge second_preheader_e = loop_preheader_edge (second);
> > > -  basic_block between_bb = first_loop_e->dest;
> > > -
> > > -  gcc_assert (between_bb == second_preheader_e->src);
> > > -  gcc_assert (single_pred_p (between_bb) && single_succ_p
> > > (between_bb));
> > > -  /* Either the first loop or the second is the loop to be
> > > vectorized.  */
> > > -  gcc_assert (loop == first || loop == second);
> > > -
> > > -  for (gsi_orig = gsi_start_phis (first->header),
> > > -       gsi_update = gsi_start_phis (second->header);
> > > -       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
> > > -       gsi_next (&gsi_orig), gsi_next (&gsi_update))
> > > -    {
> > > -      gphi *orig_phi = gsi_orig.phi ();
> > > -      gphi *update_phi = gsi_update.phi ();
> > > -
> > > -      tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
> > > -      /* Generate lcssa PHI node for the first loop.  */
> > > -      gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
> > > -      stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt (vect_phi);
> > > -      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info))
> > > +      /* Nested loops with multiple exits can have different no# phi node
> > > +	arguments between the main loop and epilog as epilog falls to the
> > > +	second loop.  */
> > > +      if (gimple_phi_num_args (phi) > e->dest_idx)
> > >  	{
> > > -	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
> > > -	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
> > > -	  add_phi_arg (lcssa_phi, arg, first_loop_e, UNKNOWN_LOCATION);
> > > -	  arg = new_res;
> > > -	}
> > > -
> > > -      /* Update PHI node in the second loop by replacing arg on the loop's
> > > -	 incoming edge.  */
> > > -      adjust_phi_and_debug_stmts (update_phi, second_preheader_e, arg);
> > > -    }
> > > -
> > > -  /* For epilogue peeling we have to make sure to copy all LC PHIs
> > > -     for correct vectorization of live stmts.  */
> > > -  if (loop == first)
> > > -    {
> > > -      basic_block orig_exit = second_loop_e->dest;
> > > -      for (gsi_orig = gsi_start_phis (orig_exit);
> > > -	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
> > > -	{
> > > -	  gphi *orig_phi = gsi_orig.phi ();
> > > -	  tree orig_arg = PHI_ARG_DEF (orig_phi, 0);
> > > -	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p
> > (orig_arg))
> > > -	    continue;
> > > -
> > > -	  const_edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > -	  /* Already created in the above loop.   */
> > > -	  if (find_guard_arg (first, second, exit_e, orig_phi))
> > > +	 tree var = PHI_ARG_DEF (phi, e->dest_idx);
> > > +	 if (TREE_CODE (var) != SSA_NAME)
> > >  	    continue;
> > > -
> > > -	  tree new_res = copy_ssa_name (orig_arg);
> > > -	  gphi *lcphi = create_phi_node (new_res, between_bb);
> > > -	  add_phi_arg (lcphi, orig_arg, first_loop_e, UNKNOWN_LOCATION);
> > > +	 tree def = get_current_def (var);
> > > +	 if (!def)
> > > +	   continue;
> > > +	 if (operand_equal_p (def,
> > > +			      PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
> > > +	   return PHI_RESULT (phi);
> > >  	}
> > >      }
> > > +  return NULL_TREE;
> > >  }
> > >
> > >  /* Function slpeel_add_loop_guard adds guard skipping from the beginning
> > > @@ -2796,11 +2768,11 @@ slpeel_update_phi_nodes_for_guard1 (class loop *skip_loop,
> > >      }
> > >  }
> > >
> > > -/* LOOP and EPILOG are two consecutive loops in CFG and EPILOG is copied
> > > -   from LOOP.  Function slpeel_add_loop_guard adds guard skipping from a
> > > -   point between the two loops to the end of EPILOG.  Edges GUARD_EDGE
> > > -   and MERGE_EDGE are the two pred edges of merge_bb at the end of
> > EPILOG.
> > > -   The CFG looks like:
> > > +/* LOOP and EPILOG are two consecutive loops in CFG connected by
> > LOOP_EXIT edge
> > > +   and EPILOG is copied from LOOP.  Function slpeel_add_loop_guard adds
> > guard
> > > +   skipping from a point between the two loops to the end of EPILOG.  Edges
> > > +   GUARD_EDGE and MERGE_EDGE are the two pred edges of merge_bb at
> > the end of
> > > +   EPILOG.  The CFG looks like:
> > >
> > >       loop:
> > >         header_a:
> > > @@ -2851,6 +2823,7 @@ slpeel_update_phi_nodes_for_guard1 (class loop
> > > *skip_loop,
> > >
> > >  static void
> > >  slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop
> > > *epilog,
> > > +				    const_edge loop_exit,
> > >  				    edge guard_edge, edge merge_edge)  {
> > >    gphi_iterator gsi;
> > > @@ -2859,13 +2832,11 @@ slpeel_update_phi_nodes_for_guard2 (class
> > loop *loop, class loop *epilog,
> > >    gcc_assert (single_succ_p (merge_bb));
> > >    edge e = single_succ_edge (merge_bb);
> > >    basic_block exit_bb = e->dest;
> > > -  gcc_assert (single_pred_p (exit_bb));
> > > -  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
> > >
> > >    for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > >      {
> > >        gphi *update_phi = gsi.phi ();
> > > -      tree old_arg = PHI_ARG_DEF (update_phi, 0);
> > > +      tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
> > >
> > >        tree merge_arg = NULL_TREE;
> > >
> > > @@ -2877,8 +2848,8 @@ slpeel_update_phi_nodes_for_guard2 (class loop
> > *loop, class loop *epilog,
> > >        if (!merge_arg)
> > >  	merge_arg = old_arg;
> > >
> > > -      tree guard_arg
> > > -	= find_guard_arg (loop, epilog, single_exit (loop), update_phi);
> > > +      tree guard_arg = find_guard_arg (loop, epilog, loop_exit,
> > > +				       update_phi, e->dest_idx);
> > >        /* If the var is live after loop but not a reduction, we simply
> > >  	 use the old arg.  */
> > >        if (!guard_arg)
> > > @@ -2898,21 +2869,6 @@ slpeel_update_phi_nodes_for_guard2 (class
> > loop *loop, class loop *epilog,
> > >      }
> > >  }
> > >
> > > -/* EPILOG loop is duplicated from the original loop for vectorizing,
> > > -   the arg of its loop closed ssa PHI needs to be updated.  */
> > > -
> > > -static void
> > > -slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
> > > -{
> > > -  gphi_iterator gsi;
> > > -  basic_block exit_bb = single_exit (epilog)->dest;
> > > -
> > > -  gcc_assert (single_pred_p (exit_bb));
> > > -  edge e = EDGE_PRED (exit_bb, 0);
> > > -  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > -    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
> > > -}
> > > -
> > >  /* LOOP_VINFO is an epilogue loop whose corresponding main loop can be skipped.
> > >     Return a value that equals:
> > >
> > > @@ -3255,8 +3211,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > >  						       e, &prolog_e);
> > >        gcc_assert (prolog);
> > >        prolog->force_vectorize = false;
> > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, prolog_e, loop,
> > > -					 exit_e, true);
> > > +
> > >        first_loop = prolog;
> > >        reset_original_copy_tables ();
> > >
> > > @@ -3336,8 +3291,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > >        LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo) = new_epilog_e;
> > >        gcc_assert (epilog);
> > >        epilog->force_vectorize = false;
> > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, e, epilog,
> > > -					 new_epilog_e, false);
> > >        bb_before_epilog = loop_preheader_edge (epilog)->src;
> > >
> > >        /* Scalar version loop may be preferred.  In this case, add guard
> > > @@ -3430,7 +3383,9 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > >  					   irred_flag);
> > >  	  if (vect_epilogues)
> > >  	    epilogue_vinfo->skip_this_loop_edge = guard_e;
> > > -	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e, epilog_e);
> > > +	  edge main_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > +	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, main_iv, guard_e,
> > > +					      epilog_e);
> > >  	  /* Only need to handle basic block before epilog loop if it's not
> > >  	     the guard_bb, which is the case when skip_vector is true.  */
> > >  	  if (guard_bb != bb_before_epilog)
> > > @@ -3441,8 +3396,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > >  	    }
> > >  	  scale_loop_profile (epilog, prob_epilog, -1);
> > >  	}
> > > -      else
> > > -	slpeel_update_phi_nodes_for_lcssa (epilog);
> > >
> > >        unsigned HOST_WIDE_INT bound;
> > >        if (bound_scalar.is_constant (&bound))
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > index f1caa5f207d3b13da58c3a313b11d1ef98374349..327cab0f736da7f1bd3e024d666df46ef9208107 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -5877,7 +5877,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> > >    basic_block exit_bb;
> > >    tree scalar_dest;
> > >    tree scalar_type;
> > > -  gimple *new_phi = NULL, *phi;
> > > +  gimple *new_phi = NULL, *phi = NULL;
> > >    gimple_stmt_iterator exit_gsi;
> > >    tree new_temp = NULL_TREE, new_name, new_scalar_dest;
> > >    gimple *epilog_stmt = NULL;
> > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > index 55b6771b271d5072fa1327d595e1dddb112cfdf6..25ceb6600673d71fd6012443403997e921066483 100644
> > > --- a/gcc/tree-vectorizer.h
> > > +++ b/gcc/tree-vectorizer.h
> > > @@ -2183,7 +2183,7 @@ extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
> > >  					 const_edge);
> > >  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
> > >  						    class loop *, edge,
> > > -						    edge, edge *);
> > > +						    edge, edge *, bool = true);
> > >  class loop *vect_loop_versioning (loop_vec_info, gimple *);
> > >  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
> > >  				    tree *, tree *, tree *, int, bool, bool,
> > >
> > >
> > >
> > >
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Thread overview: 12+ messages
2023-10-02  7:41 [PATCH 1/3]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables Tamar Christina
2023-10-02  7:41 ` [PATCH 2/3]middle-end: updated niters analysis to handle multiple exits Tamar Christina
2023-10-10 11:13   ` Richard Biener
2023-10-11 10:54     ` Tamar Christina
2023-10-11 12:08       ` Richard Biener
2023-10-02  7:42 ` [PATCH 3/3]middle-end: maintain LCSSA throughout loop peeling Tamar Christina
2023-10-10 12:59   ` Richard Biener
2023-10-11 11:16     ` Tamar Christina
2023-10-11 12:09       ` Richard Biener
2023-10-09 13:35 ` [PATCH 1/3]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables Richard Biener
2023-10-11 10:45   ` Tamar Christina
2023-10-11 12:07     ` Richard Biener
