public inbox for gcc-patches@gcc.gnu.org
* [PATCH]middle-end: maintain LCSSA form when peeled vector iterations have virtual operands
@ 2023-12-29 21:05 Tamar Christina
  2024-01-08 12:38 ` Richard Biener
  0 siblings, 1 reply; 3+ messages in thread
From: Tamar Christina @ 2023-12-29 21:05 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw


Hi All,

This patch fixes several interconnected issues.

1. When picking an exit we wanted to check that niter_desc.may_be_zero is not
   true, i.e. we want to pick an exit which we know will iterate at least once.
   However niter_desc.may_be_zero is not a boolean; it is a tree that encodes a
   boolean value.  !niter_desc.may_be_zero only checks whether we have some
   information, not what that information is.  This led us to pick a
   harder-to-vectorize exit more often than we should; a sketch of the
   corrected check and an example loop follow this list.

2. Because we had this bug, we used to pick an alternative exit much more often,
   which exposed one issue: when the loop accesses memory and we "invert it" we
   would corrupt the VUSE chain.  This is because on a peeled vector iteration
   every exit restarts the loop (i.e. they're all early) BUT since we may have
   performed a store, the vUSE would need to be updated.  This version maintains
   virtual PHIs correctly in these cases.  Note that we can't simply remove all
   of them and recreate them because we need the PHI nodes still in the right
   order for the skip_vector case.

3. Since we're moving the stores to a safe location I don't think we actually
   need to analyze whether the store is in range of the memref, because if we
   ever get there we know that the loads must be in range, and if the loads are
   in range and we reach the store we know the early breaks were not taken and
   so the scalar loop would have done the VF stores too.

4. Instead of searching for where to move stores to, they should always go in
   the exit belonging to the latch.  We can only ever delay stores, and even if
   we pick a different exit than the latch one as the main one, effects still
   happen in program order when vectorized.  If we don't move the stores to the
   latch exit but instead to wherever we pick as the "main" exit then we can
   perform incorrect memory accesses (luckily these are trapped by verify_ssa).

5. We used to analyze only the loads inside the same BB as an early break, and
   we'd never analyze the ones inside the block where we'd be moving memory
   references to.  This is obviously bogus, and to fix it this patch splits the
   two constraints apart.  We first validate that all load memory references
   are in bounds and only after that do we perform the alias checks for the
   writes.  This makes the code simpler to understand and more trivially correct.
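
For reference, here is the corrected exit-selection test from point 1, lifted
out of the vec_init_loop_exit_info hunk below (shown purely as an illustration
of the new behaviour, not as additional code):

      tree may_be_zero = niter_desc.may_be_zero;
      /* may_be_zero is a tree encoding a boolean, so test its value rather
         than just its presence, and prefer the exit closest to the latch.  */
      if ((may_be_zero && integer_zerop (may_be_zero))
          && (!candidate
              || dominated_by_p (CDI_DOMINATORS, exit->src, candidate->src)))
        candidate = exit;

And a small example of the kind of loop points 2-4 are about (essentially the
shape of the new vect-early-break_95 testcase): the store has to be delayed to
the latch-connected exit so it is only performed once we know no early break
was taken in that vector iteration.

    for (carry = 1, lp--; carry; lp--)
      {
        carry = *lp + carry;
        *lp = carry >>= 16;              /* store that has to be delayed */
        if (lp == &gen_to_words_words)   /* early break */
          break;
      }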

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues with --enable-checking=release --enable-lto
--with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/113137
	PR tree-optimization/113136
	PR tree-optimization/113172
	* tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
	* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
	(vect_do_peeling): Maintain virtual PHIs on inverted loops.
	* tree-vect-loop.cc (vec_init_loop_exit_info): Pick exit closest to
	the latch.
	(vect_create_loop_vinfo): Record all conds instead of only alt ones.
	* tree-vectorizer.h: Fix comment.

gcc/testsuite/ChangeLog:

	PR tree-optimization/113137
	PR tree-optimization/113136
	PR tree-optimization/113172
	* g++.dg/vect/vect-early-break_4-pr113137.cc: New test.
	* g++.dg/vect/vect-early-break_5-pr113137.cc: New test.
	* gcc.dg/vect/vect-early-break_95-pr113137.c: New test.
	* gcc.dg/vect/vect-early-break_96-pr113136.c: New test.
	* gcc.dg/vect/vect-early-break_97-pr113172.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc
new file mode 100644
index 0000000000000000000000000000000000000000..f78db8669dcc65f1b45ea78f4433d175e1138332
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int b;
+void a() __attribute__((__noreturn__));
+void c() {
+  char *buf;
+  int bufsz = 64;
+  while (b) {
+    !bufsz ? a(), 0 : *buf++ = bufsz--;
+    b -= 4;
+  }
+}
diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_5-pr113137.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_5-pr113137.cc
new file mode 100644
index 0000000000000000000000000000000000000000..dcd19fa2d2145e09de18279479b3f20fc27336ba
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/vect-early-break_5-pr113137.cc
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+char UnpackReadTables_BitLength[20];
+int UnpackReadTables_ZeroCount;
+void UnpackReadTables() {
+  for (unsigned I = 0; I < 20;)
+    while (UnpackReadTables_ZeroCount-- &&
+           I < sizeof(UnpackReadTables_BitLength))
+      UnpackReadTables_BitLength[I++] = 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_95-pr113137.cc b/gcc/testsuite/gcc.dg/vect/vect-early-break_95-pr113137.cc
new file mode 100644
index 0000000000000000000000000000000000000000..e8f5b06834d1076cb4d0d166a3aff50c9cdda330
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_95-pr113137.cc
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-w" } */
+
+short gen_to_words_words;
+void gen_to_words() {
+  unsigned short *lp = &gen_to_words_words;
+  long carry;
+  for (carry = 1, lp--; carry; lp--) {
+    carry = *lp + carry;
+    *lp = carry >>= 16;
+    if (lp == &gen_to_words_words)
+      break;
+  }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_96-pr113136.cc b/gcc/testsuite/gcc.dg/vect/vect-early-break_96-pr113136.cc
new file mode 100644
index 0000000000000000000000000000000000000000..016e749b0e3a9b9e4dbec0a192f2a0cb274b1a06
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_96-pr113136.cc
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+struct _reent { union { struct { char _l64a_buf[8]; } _reent; } _new; };
+static const char R64_ARRAY[] = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
+char *
+_l64a_r (struct _reent *rptr,
+     long value)
+{
+  char *ptr;
+  char *result;
+  int i, index;
+  unsigned long tmp = (unsigned long)value & 0xffffffff;
+  result = 
+          ((
+          rptr
+          )->_new._reent._l64a_buf)
+                               ;
+  ptr = result;
+  for (i = 0; i < 60; ++i)
+    {
+      if (tmp == 0)
+ {
+   *ptr = '\0';
+   break;
+ }
+      *ptr++ = R64_ARRAY[index];
+      tmp >>= 6;
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_97-pr113172.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_97-pr113172.c
new file mode 100644
index 0000000000000000000000000000000000000000..625b227b626491a1e4e2f3e02a3689c5ccb50089
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_97-pr113172.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int tswchp_2;
+short cpy_buf[8];
+void ts_endcmd() {
+  int i = 0;
+  for (; i < 8 && i < tswchp_2; i++)
+    cpy_buf[i] = i;
+}
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 3d9673fb0b580ff21ff151dc5c199840df41a1cd..cbec6d64aabc99be9cc67a00b5dee795b69789c2 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -671,10 +671,82 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
 		     "loop contains multiple exits, analyzing"
 		     " statement dependencies.\n");
 
+  /* Since we don't support general control flow, the location we'll move the
+     side-effects to is always the latch connected exit.  When we support
+     general control flow we can do better but for now this is fine.  */
+  dest_bb = single_pred (loop->latch);
+
+  /* First check if all load references in the loop are within bounds.  This
+     is very conservative for now.  */
+  auto_vec<data_reference_p> datarefs;
+  if (find_data_references_in_loop (loop, &datarefs) == chrec_dont_know)
+    return opt_result::failure_at (vect_location,
+				 "Could not analyze all data references in the "
+				 "the loop.  Unable to continue.\n");
+  for (auto dr_ref : datarefs)
+    {
+      if (!DR_IS_READ (dr_ref))
+	continue;
+
+      gimple *stmt = DR_STMT (dr_ref);
+
+      /* We currently only support statically allocated objects due to
+	 not having first-faulting loads support or peeling for
+	 alignment support.  Compute the size of the referenced object
+	 (it could be dynamically allocated).  */
+      tree obj = DR_BASE_ADDRESS (dr_ref);
+      if (!obj || TREE_CODE (obj) != ADDR_EXPR)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "early breaks only supported on statically"
+			 " allocated objects.\n");
+	  return opt_result::failure_at (stmt,
+			 "can't safely apply code motion to "
+			 "dependencies of %G to vectorize "
+			 "the early exit.\n", stmt);
+	}
+
+      tree refop = TREE_OPERAND (obj, 0);
+      tree refbase = get_base_address (refop);
+      if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
+	  || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "early breaks only supported on"
+			 " statically allocated objects.\n");
+	  return opt_result::failure_at (stmt,
+			 "can't safely apply code motion to "
+			 "dependencies of %G to vectorize "
+			 "the early exit.\n", stmt);
+	}
+
+      /* Check if vector accesses to the object will be within bounds.
+	 must be a constant or assume loop will be versioned or niters
+	 bounded by VF so accesses are within range.  */
+      if (!ref_within_array_bound (stmt, DR_REF (dr_ref)))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "early breaks not supported: vectorization "
+			 "would %s beyond size of obj.",
+			 DR_IS_READ (dr_ref) ? "read" : "write");
+	  return opt_result::failure_at (stmt,
+			 "can't safely apply code motion to "
+			 "dependencies of %G to vectorize "
+			 "the early exit.\n", stmt);
+	}
+    }
+
   for (gimple *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
     {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "inspecting exit: %G", c);
+
       stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
-      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
+      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type
+	  || gimple_bb (c) == dest_bb)
 	continue;
 
       gimple_stmt_iterator gsi = gsi_for_stmt (c);
@@ -694,54 +766,9 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
 	  if (!dr_ref)
 	    continue;
 
-	  /* We currently only support statically allocated objects due to
-	     not having first-faulting loads support or peeling for
-	     alignment support.  Compute the size of the referenced object
-	     (it could be dynamically allocated).  */
-	  tree obj = DR_BASE_ADDRESS (dr_ref);
-	  if (!obj || TREE_CODE (obj) != ADDR_EXPR)
-	    {
-	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				 "early breaks only supported on statically"
-				 " allocated objects.\n");
-	      return opt_result::failure_at (c,
-				 "can't safely apply code motion to "
-				 "dependencies of %G to vectorize "
-				 "the early exit.\n", c);
-	    }
-
-	  tree refop = TREE_OPERAND (obj, 0);
-	  tree refbase = get_base_address (refop);
-	  if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
-	      || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
-	    {
-	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				 "early breaks only supported on"
-				 " statically allocated objects.\n");
-	      return opt_result::failure_at (c,
-				 "can't safely apply code motion to "
-				 "dependencies of %G to vectorize "
-				 "the early exit.\n", c);
-	    }
-
-	  /* Check if vector accesses to the object will be within bounds.
-	     must be a constant or assume loop will be versioned or niters
-	     bounded by VF so accesses are within range.  */
-	  if (!ref_within_array_bound (stmt, DR_REF (dr_ref)))
-	    {
-	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				 "early breaks not supported: vectorization "
-				 "would %s beyond size of obj.",
-				 DR_IS_READ (dr_ref) ? "read" : "write");
-	      return opt_result::failure_at (c,
-				 "can't safely apply code motion to "
-				 "dependencies of %G to vectorize "
-				 "the early exit.\n", c);
-	    }
-
+	  /* Stores will be delayed until after the early breaks.  Because of
+	     that we don't need to know their bounds as we'd only perform them
+	     if we knew the vector iteration gets there.  */
 	  if (DR_IS_READ (dr_ref))
 	    bases.safe_push (dr_ref);
 	  else if (DR_IS_WRITE (dr_ref))
@@ -801,27 +828,12 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
 				 "marked statement for vUSE update: %G", stmt);
 	    }
 	}
-
-      /* Save destination as we go, BB are visited in order and the last one
-	is where statements should be moved to.  */
-      if (!dest_bb)
-	dest_bb = gimple_bb (c);
-      else
-	{
-	  basic_block curr_bb = gimple_bb (c);
-	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
-	    dest_bb = curr_bb;
-	}
     }
 
-  basic_block dest_bb0 = EDGE_SUCC (dest_bb, 0)->dest;
-  basic_block dest_bb1 = EDGE_SUCC (dest_bb, 1)->dest;
-  dest_bb = flow_bb_inside_loop_p (loop, dest_bb0) ? dest_bb0 : dest_bb1;
   /* We don't allow outer -> inner loop transitions which should have been
      trapped already during loop form analysis.  */
   gcc_assert (dest_bb->loop_father == loop);
 
-  gcc_assert (dest_bb);
   LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
 
   if (!LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).is_empty ())
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 295e1c91687461749e5ac7c5bf5e5fdcd8660d64..76d4979c0b3b374dcaacf6825a95a8714114a63b 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1626,11 +1626,12 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 	  flush_pending_stmts (e);
 	}
 
+      bool peeled_iters = single_pred (loop->latch) != loop_exit->src;
       /* Record the new SSA names in the cache so that we can skip materializing
 	 them again when we fill in the rest of the LCSSA variables.  */
       for (auto phi : new_phis)
 	{
-	  tree new_arg = gimple_phi_arg (phi, loop_exit->dest_idx)->def;
+	  tree new_arg = gimple_phi_arg_def (phi, loop_exit->dest_idx);
 
 	  if (!SSA_VAR_P (new_arg))
 	    continue;
@@ -1653,7 +1654,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 	      continue;
 	    }
 
-	  /* If we decide to remove the PHI node we should also not
+	  /* If we decided not to remove the PHI node we should also not
 	     rematerialize it later on.  */
 	  new_phi_args.put (new_arg, gimple_phi_result (phi));
 
@@ -1667,7 +1668,6 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       edge loop_entry = single_succ_edge (new_preheader);
       if (flow_loops)
 	{
-	  bool peeled_iters = single_pred (loop->latch) != loop_exit->src;
 	  /* Link through the main exit first.  */
 	  for (auto gsi_from = gsi_start_phis (loop->header),
 	       gsi_to = gsi_start_phis (new_loop->header);
@@ -1693,8 +1693,11 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 		    }
 		}
 	      /* If we have multiple exits and the vector loop is peeled then we
-		 need to use the value at start of loop.  */
-	      if (peeled_iters)
+		 need to use the value at start of loop.  If we're looking at
+		 virtual operands we have to keep the original link.   Virtual
+		 operands don't all become the same because we'll corrupt the
+		 vUSE chains among others.  */
+	      if (peeled_iters && !virtual_operand_p (new_arg))
 		{
 		  tree tmp_arg = gimple_phi_result (from_phi);
 		  if (!new_phi_args.get (tmp_arg))
@@ -1728,9 +1731,26 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 
 		  tree alt_arg = gimple_phi_result (from_phi);
 		  edge main_e = single_succ_edge (alt_loop_exit_block);
-		  for (edge e : loop_exits)
-		    if (e != loop_exit)
-		      SET_PHI_ARG_DEF (to_phi, main_e->dest_idx, alt_arg);
+
+		  /* Now update the virtual PHI nodes with the right value.  */
+		  if (peeled_iters
+		      && virtual_operand_p (alt_arg)
+		      && flow_bb_inside_loop_p (loop,
+				gimple_bb (SSA_NAME_DEF_STMT (alt_arg))))
+		    {
+			/* Link the alternative exit one.  */
+			tree def
+			  = gimple_phi_arg_def (to_phi, loop_exit->dest_idx);
+			gphi *phi = as_a <gphi *> (SSA_NAME_DEF_STMT (def));
+			tree exit_val = gimple_phi_arg_def (phi, 0);
+			SET_PHI_ARG_DEF (phi, 0, alt_arg);
+
+			/* And now the main merge block.  */
+			alt_arg = copy_ssa_name (def);
+			gphi *l_phi = create_phi_node (alt_arg, main_e->src);
+			SET_PHI_ARG_DEF (l_phi, 0, exit_val);
+		    }
+		  SET_PHI_ARG_DEF (to_phi, main_e->dest_idx, alt_arg);
 		}
 
 	      set_immediate_dominator (CDI_DOMINATORS, new_preheader,
@@ -3399,6 +3419,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
 					  update_e);
 
+      /* If we have a peeled vector iteration we will never skip the epilog loop
+	 and we can simplify the cfg a lot by not doing the edge split.  */
       if (skip_epilog || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
 	{
 	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index f51ae3e719e753059389cf9495b6d65b3b1191cb..37f1be1101ffae779214056a0886411e0683e887 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -976,7 +976,10 @@ vec_init_loop_exit_info (class loop *loop)
       if (number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
 	  && !chrec_contains_undetermined (niter_desc.niter))
 	{
-	  if (!niter_desc.may_be_zero || !candidate)
+	  tree may_be_zero = niter_desc.may_be_zero;
+	  if ((may_be_zero && integer_zerop (may_be_zero))
+	     && (!candidate
+		 || dominated_by_p (CDI_DOMINATORS, exit->src, candidate->src)))
 	    candidate = exit;
 	}
     }
@@ -1913,15 +1916,14 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
       STMT_VINFO_DEF_TYPE (loop_cond_info) = vect_condition_def;
     }
 
-  for (unsigned i = 1; i < info->conds.length (); i ++)
-    LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_push (info->conds[i]);
+  LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->conds);
   LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->conds[0];
 
   LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
 
   /* Check to see if we're vectorizing multiple exits.  */
   LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
-    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
+    = LOOP_VINFO_LOOP_CONDS (loop_vinfo).length () > 1;
 
   if (info->inner_loop_cond)
     {
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 785fc99ca27a4caf26b1fca887e6262108f515b2..d0ee95146b2907132bf68962e3f16d43c36afded 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -893,7 +893,7 @@ public:
      the loop for the break finding loop.  */
   bool early_breaks;
 
-  /* List of loop additional IV conditionals found in the loop.  */
+  /* List of loop all IV conditionals found in the loop.  */
   auto_vec<gcond *> conds;
 
   /* Main loop IV cond.  */




-- 


* Re: [PATCH]middle-end: maintain LCSSA form when peeled vector iterations have virtual operands
  2023-12-29 21:05 [PATCH]middle-end: maintain LCSSA form when peeled vector iterations have virtual operands Tamar Christina
@ 2024-01-08 12:38 ` Richard Biener
  2024-01-08 14:16   ` Tamar Christina
  0 siblings, 1 reply; 3+ messages in thread
From: Richard Biener @ 2024-01-08 12:38 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Fri, 29 Dec 2023, Tamar Christina wrote:

> Hi All,
> 
> This patch fixes several interconnected issues.
> 
> 1. When picking an exit we wanted to check that niter_desc.may_be_zero is not
>    true, i.e. we want to pick an exit which we know will iterate at least once.
>    However niter_desc.may_be_zero is not a boolean; it is a tree that encodes a
>    boolean value.  !niter_desc.may_be_zero only checks whether we have some
>    information, not what that information is.  This led us to pick a
>    harder-to-vectorize exit more often than we should.
> 
> 2. Because we had this bug, we used to pick an alternative exit much more often,
>    which exposed one issue: when the loop accesses memory and we "invert it" we
>    would corrupt the VUSE chain.  This is because on a peeled vector iteration
>    every exit restarts the loop (i.e. they're all early) BUT since we may have
>    performed a store, the vUSE would need to be updated.  This version maintains
>    virtual PHIs correctly in these cases.  Note that we can't simply remove all
>    of them and recreate them because we need the PHI nodes still in the right
>    order for the skip_vector case.
> 
> 3. Since we're moving the stores to a safe location I don't think we actually
>    need to analyze whether the store is in range of the memref, because if we
>    ever get there we know that the loads must be in range, and if the loads are
>    in range and we reach the store we know the early breaks were not taken and
>    so the scalar loop would have done the VF stores too.
> 
> 4. Instead of searching for where to move stores to, they should always go in
>    the exit belonging to the latch.  We can only ever delay stores, and even if
>    we pick a different exit than the latch one as the main one, effects still
>    happen in program order when vectorized.  If we don't move the stores to the
>    latch exit but instead to wherever we pick as the "main" exit then we can
>    perform incorrect memory accesses (luckily these are trapped by verify_ssa).
> 
> 5. We used to analyze only the loads inside the same BB as an early break, and
>    we'd never analyze the ones inside the block where we'd be moving memory
>    references to.  This is obviously bogus, and to fix it this patch splits the
>    two constraints apart.  We first validate that all load memory references
>    are in bounds and only after that do we perform the alias checks for the
>    writes.  This makes the code simpler to understand and more trivially correct.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues with --enable-checking=release --enable-lto
> --with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/113137
> 	PR tree-optimization/113136
> 	PR tree-optimization/113172
> 	* tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
> 	* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
> 	(vect_do_peeling): Maintain virtual PHIs on inverted loops.
> 	* tree-vect-loop.cc (vec_init_loop_exit_info): Pick exit closest to
> 	the latch.
> 	(vect_create_loop_vinfo): Record all conds instead of only alt ones.
> 	* tree-vectorizer.h: Fix comment.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/113137
> 	PR tree-optimization/113136
> 	PR tree-optimization/113172
> 	* g++.dg/vect/vect-early-break_4-pr113137.cc: New test.
> 	* g++.dg/vect/vect-early-break_5-pr113137.cc: New test.
> 	* gcc.dg/vect/vect-early-break_95-pr113137.c: New test.
> 	* gcc.dg/vect/vect-early-break_96-pr113136.c: New test.
> 	* gcc.dg/vect/vect-early-break_97-pr113172.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc
> new file mode 100644
> index 0000000000000000000000000000000000000000..f78db8669dcc65f1b45ea78f4433d175e1138332
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +int b;
> +void a() __attribute__((__noreturn__));
> +void c() {
> +  char *buf;
> +  int bufsz = 64;
> +  while (b) {
> +    !bufsz ? a(), 0 : *buf++ = bufsz--;
> +    b -= 4;
> +  }
> +}
> diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_5-pr113137.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_5-pr113137.cc
> new file mode 100644
> index 0000000000000000000000000000000000000000..dcd19fa2d2145e09de18279479b3f20fc27336ba
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_5-pr113137.cc
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +char UnpackReadTables_BitLength[20];
> +int UnpackReadTables_ZeroCount;
> +void UnpackReadTables() {
> +  for (unsigned I = 0; I < 20;)
> +    while (UnpackReadTables_ZeroCount-- &&
> +           I < sizeof(UnpackReadTables_BitLength))
> +      UnpackReadTables_BitLength[I++] = 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_95-pr113137.cc b/gcc/testsuite/gcc.dg/vect/vect-early-break_95-pr113137.cc
> new file mode 100644
> index 0000000000000000000000000000000000000000..e8f5b06834d1076cb4d0d166a3aff50c9cdda330
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_95-pr113137.cc
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-w" } */
> +
> +short gen_to_words_words;
> +void gen_to_words() {
> +  unsigned short *lp = &gen_to_words_words;
> +  long carry;
> +  for (carry = 1, lp--; carry; lp--) {
> +    carry = *lp + carry;
> +    *lp = carry >>= 16;
> +    if (lp == &gen_to_words_words)
> +      break;
> +  }
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_96-pr113136.cc b/gcc/testsuite/gcc.dg/vect/vect-early-break_96-pr113136.cc
> new file mode 100644
> index 0000000000000000000000000000000000000000..016e749b0e3a9b9e4dbec0a192f2a0cb274b1a06
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_96-pr113136.cc
> @@ -0,0 +1,32 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +struct _reent { union { struct { char _l64a_buf[8]; } _reent; } _new; };
> +static const char R64_ARRAY[] = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
> +char *
> +_l64a_r (struct _reent *rptr,
> +     long value)
> +{
> +  char *ptr;
> +  char *result;
> +  int i, index;
> +  unsigned long tmp = (unsigned long)value & 0xffffffff;
> +  result = 
> +          ((
> +          rptr
> +          )->_new._reent._l64a_buf)
> +                               ;
> +  ptr = result;
> +  for (i = 0; i < 60; ++i)
> +    {
> +      if (tmp == 0)
> + {
> +   *ptr = '\0';
> +   break;
> + }
> +      *ptr++ = R64_ARRAY[index];
> +      tmp >>= 6;
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_97-pr113172.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_97-pr113172.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..625b227b626491a1e4e2f3e02a3689c5ccb50089
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_97-pr113172.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +int tswchp_2;
> +short cpy_buf[8];
> +void ts_endcmd() {
> +  int i = 0;
> +  for (; i < 8 && i < tswchp_2; i++)
> +    cpy_buf[i] = i;
> +}
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index 3d9673fb0b580ff21ff151dc5c199840df41a1cd..cbec6d64aabc99be9cc67a00b5dee795b69789c2 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -671,10 +671,82 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
>  		     "loop contains multiple exits, analyzing"
>  		     " statement dependencies.\n");
>  
> +  /* Since we don't support general control flow, the location we'll move the
> +     side-effects to is always the latch connected exit.  When we support
> +     general control flow we can do better but for now this is fine.  */
> +  dest_bb = single_pred (loop->latch);
> +
> +  /* First check if all load references in the loop are within bounds.  This
> +     is very conservative for now.  */
> +  auto_vec<data_reference_p> datarefs;
> +  if (find_data_references_in_loop (loop, &datarefs) == chrec_dont_know)

loop_vinfo->datarefs has all datarefs already, no need to gather them
again.  You are now checking _all_ reads to be in-bounds even those
that are not feeding the early exit condition by being on the
in-loop edge of it (as we never move loads).  Was that intended?
I don't think that's necessary.

Btw, the existing code relies on LOOP_VINFO_LOOP_CONDS being ordered
last-to-first for the alias check to pick up all relevant reads first.

It's a bit non-obvious what we can ignore and what not.  I think
as you describe above we can only ignore memory accesses in the
block of the main IV exit or beyond (for the "late" early break).

So maybe instead of walking LOOP_VINFO_LOOP_CONDS walk BBs starting
from the main IV exit BB single predecessor (if the main IV exit
BB isn't the loop header)?
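
Purely as an illustration of that idea (hypothetical and untested, assuming
the restricted early-break form where the loop body is a single chain of
blocks so single_pred is always valid, and with the header-as-exit case still
needing a guard):

      /* Start at the predecessor of the latch-connected (main IV) exit block
         and walk towards the header, so accesses in that block or beyond are
         never inspected.  */
      basic_block main_exit_bb = single_pred (loop->latch);
      basic_block bb = single_pred (main_exit_bb);
      while (true)
        {
          for (auto gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
            {
              /* ... inspect the data reference of gsi_stmt (gsi) here ...  */
            }
          if (bb == loop->header)
            break;
          bb = single_pred (bb);
        }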

> +    return opt_result::failure_at (vect_location,
> +				 "Could not analyze all data references in the "
> +				 "the loop.  Unable to continue.\n");
> +  for (auto dr_ref : datarefs)
> +    {
> +      if (!DR_IS_READ (dr_ref))
> +	continue;
> +
> +      gimple *stmt = DR_STMT (dr_ref);
> +
> +      /* We currently only support statically allocated objects due to
> +	 not having first-faulting loads support or peeling for
> +	 alignment support.  Compute the size of the referenced object
> +	 (it could be dynamically allocated).  */
> +      tree obj = DR_BASE_ADDRESS (dr_ref);
> +      if (!obj || TREE_CODE (obj) != ADDR_EXPR)
> +	{
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			 "early breaks only supported on statically"
> +			 " allocated objects.\n");
> +	  return opt_result::failure_at (stmt,
> +			 "can't safely apply code motion to "
> +			 "dependencies of %G to vectorize "
> +			 "the early exit.\n", stmt);
> +	}
> +
> +      tree refop = TREE_OPERAND (obj, 0);
> +      tree refbase = get_base_address (refop);
> +      if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
> +	  || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
> +	{
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			 "early breaks only supported on"
> +			 " statically allocated objects.\n");
> +	  return opt_result::failure_at (stmt,
> +			 "can't safely apply code motion to "
> +			 "dependencies of %G to vectorize "
> +			 "the early exit.\n", stmt);
> +	}
> +
> +      /* Check if vector accesses to the object will be within bounds.
> +	 must be a constant or assume loop will be versioned or niters
> +	 bounded by VF so accesses are within range.  */
> +      if (!ref_within_array_bound (stmt, DR_REF (dr_ref)))
> +	{
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			 "early breaks not supported: vectorization "
> +			 "would %s beyond size of obj.",
> +			 DR_IS_READ (dr_ref) ? "read" : "write");
> +	  return opt_result::failure_at (stmt,
> +			 "can't safely apply code motion to "
> +			 "dependencies of %G to vectorize "
> +			 "the early exit.\n", stmt);
> +	}
> +    }
> +
>    for (gimple *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
>      {
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location, "inspecting exit: %G", c);
> +
>        stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
> -      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
> +      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type
> +	  || gimple_bb (c) == dest_bb)
>  	continue;
>  
>        gimple_stmt_iterator gsi = gsi_for_stmt (c);
> @@ -694,54 +766,9 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
>  	  if (!dr_ref)
>  	    continue;
>  
> -	  /* We currently only support statically allocated objects due to
> -	     not having first-faulting loads support or peeling for
> -	     alignment support.  Compute the size of the referenced object
> -	     (it could be dynamically allocated).  */
> -	  tree obj = DR_BASE_ADDRESS (dr_ref);
> -	  if (!obj || TREE_CODE (obj) != ADDR_EXPR)
> -	    {
> -	      if (dump_enabled_p ())
> -		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -				 "early breaks only supported on statically"
> -				 " allocated objects.\n");
> -	      return opt_result::failure_at (c,
> -				 "can't safely apply code motion to "
> -				 "dependencies of %G to vectorize "
> -				 "the early exit.\n", c);
> -	    }
> -
> -	  tree refop = TREE_OPERAND (obj, 0);
> -	  tree refbase = get_base_address (refop);
> -	  if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
> -	      || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
> -	    {
> -	      if (dump_enabled_p ())
> -		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -				 "early breaks only supported on"
> -				 " statically allocated objects.\n");
> -	      return opt_result::failure_at (c,
> -				 "can't safely apply code motion to "
> -				 "dependencies of %G to vectorize "
> -				 "the early exit.\n", c);
> -	    }
> -
> -	  /* Check if vector accesses to the object will be within bounds.
> -	     must be a constant or assume loop will be versioned or niters
> -	     bounded by VF so accesses are within range.  */
> -	  if (!ref_within_array_bound (stmt, DR_REF (dr_ref)))
> -	    {
> -	      if (dump_enabled_p ())
> -		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -				 "early breaks not supported: vectorization "
> -				 "would %s beyond size of obj.",
> -				 DR_IS_READ (dr_ref) ? "read" : "write");
> -	      return opt_result::failure_at (c,
> -				 "can't safely apply code motion to "
> -				 "dependencies of %G to vectorize "
> -				 "the early exit.\n", c);
> -	    }
> -
> +	  /* Stores will be delayed until after the early breaks.  Because of
> +	     that we don't need to know their bounds as we'd only perform them
> +	     if we knew the vector iteration gets there.  */
>  	  if (DR_IS_READ (dr_ref))
>  	    bases.safe_push (dr_ref);
>  	  else if (DR_IS_WRITE (dr_ref))
> @@ -801,27 +828,12 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
>  				 "marked statement for vUSE update: %G", stmt);
>  	    }
>  	}
> -
> -      /* Save destination as we go, BB are visited in order and the last one
> -	is where statements should be moved to.  */
> -      if (!dest_bb)
> -	dest_bb = gimple_bb (c);
> -      else
> -	{
> -	  basic_block curr_bb = gimple_bb (c);
> -	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
> -	    dest_bb = curr_bb;
> -	}
>      }
>  
> -  basic_block dest_bb0 = EDGE_SUCC (dest_bb, 0)->dest;
> -  basic_block dest_bb1 = EDGE_SUCC (dest_bb, 1)->dest;
> -  dest_bb = flow_bb_inside_loop_p (loop, dest_bb0) ? dest_bb0 : dest_bb1;
>    /* We don't allow outer -> inner loop transitions which should have been
>       trapped already during loop form analysis.  */
>    gcc_assert (dest_bb->loop_father == loop);
>  
> -  gcc_assert (dest_bb);
>    LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
>  
>    if (!LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).is_empty ())
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 295e1c91687461749e5ac7c5bf5e5fdcd8660d64..76d4979c0b3b374dcaacf6825a95a8714114a63b 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1626,11 +1626,12 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>  	  flush_pending_stmts (e);
>  	}
>  
> +      bool peeled_iters = single_pred (loop->latch) != loop_exit->src;
>        /* Record the new SSA names in the cache so that we can skip materializing
>  	 them again when we fill in the rest of the LCSSA variables.  */
>        for (auto phi : new_phis)
>  	{
> -	  tree new_arg = gimple_phi_arg (phi, loop_exit->dest_idx)->def;
> +	  tree new_arg = gimple_phi_arg_def (phi, loop_exit->dest_idx);
>  
>  	  if (!SSA_VAR_P (new_arg))
>  	    continue;
> @@ -1653,7 +1654,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>  	      continue;
>  	    }
>  
> -	  /* If we decide to remove the PHI node we should also not
> +	  /* If we decided not to remove the PHI node we should also not
>  	     rematerialize it later on.  */
>  	  new_phi_args.put (new_arg, gimple_phi_result (phi));
>  
> @@ -1667,7 +1668,6 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>        edge loop_entry = single_succ_edge (new_preheader);
>        if (flow_loops)
>  	{
> -	  bool peeled_iters = single_pred (loop->latch) != loop_exit->src;
>  	  /* Link through the main exit first.  */
>  	  for (auto gsi_from = gsi_start_phis (loop->header),
>  	       gsi_to = gsi_start_phis (new_loop->header);
> @@ -1693,8 +1693,11 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>  		    }
>  		}
>  	      /* If we have multiple exits and the vector loop is peeled then we
> -		 need to use the value at start of loop.  */
> -	      if (peeled_iters)
> +		 need to use the value at start of loop.  If we're looking at
> +		 virtual operands we have to keep the original link.   Virtual
> +		 operands don't all become the same because we'll corrupt the
> +		 vUSE chains among others.  */
> +	      if (peeled_iters && !virtual_operand_p (new_arg))
>  		{
>  		  tree tmp_arg = gimple_phi_result (from_phi);
>  		  if (!new_phi_args.get (tmp_arg))
> @@ -1728,9 +1731,26 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>  
>  		  tree alt_arg = gimple_phi_result (from_phi);
>  		  edge main_e = single_succ_edge (alt_loop_exit_block);
> -		  for (edge e : loop_exits)
> -		    if (e != loop_exit)
> -		      SET_PHI_ARG_DEF (to_phi, main_e->dest_idx, alt_arg);
> +
> +		  /* Now update the virtual PHI nodes with the right value.  */
> +		  if (peeled_iters
> +		      && virtual_operand_p (alt_arg)
> +		      && flow_bb_inside_loop_p (loop,
> +				gimple_bb (SSA_NAME_DEF_STMT (alt_arg))))
> +		    {
> +			/* Link the alternative exit one.  */
> +			tree def
> +			  = gimple_phi_arg_def (to_phi, loop_exit->dest_idx);
> +			gphi *phi = as_a <gphi *> (SSA_NAME_DEF_STMT (def));
> +			tree exit_val = gimple_phi_arg_def (phi, 0);
> +			SET_PHI_ARG_DEF (phi, 0, alt_arg);
> +
> +			/* And now the main merge block.  */
> +			alt_arg = copy_ssa_name (def);
> +			gphi *l_phi = create_phi_node (alt_arg, main_e->src);
> +			SET_PHI_ARG_DEF (l_phi, 0, exit_val);
> +		    }
> +		  SET_PHI_ARG_DEF (to_phi, main_e->dest_idx, alt_arg);
>  		}
>  
>  	      set_immediate_dominator (CDI_DOMINATORS, new_preheader,
> @@ -3399,6 +3419,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  	vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
>  					  update_e);
>  
> +      /* If we have a peeled vector iteration we will never skip the epilog loop
> +	 and we can simplify the cfg a lot by not doing the edge split.  */
>        if (skip_epilog || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>  	{
>  	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index f51ae3e719e753059389cf9495b6d65b3b1191cb..37f1be1101ffae779214056a0886411e0683e887 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -976,7 +976,10 @@ vec_init_loop_exit_info (class loop *loop)
>        if (number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
>  	  && !chrec_contains_undetermined (niter_desc.niter))
>  	{
> -	  if (!niter_desc.may_be_zero || !candidate)
> +	  tree may_be_zero = niter_desc.may_be_zero;
> +	  if ((may_be_zero && integer_zerop (may_be_zero))

niter_desc.may_be_zero is never NULL

The rest looks OK.  Is it possible to split out the parts that do not
require the vect_analyze_early_break_dependences changes?

Richard.

> +	     && (!candidate
> +		 || dominated_by_p (CDI_DOMINATORS, exit->src, candidate->src)))
>  	    candidate = exit;
>  	}
>      }
> @@ -1913,15 +1916,14 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
>        STMT_VINFO_DEF_TYPE (loop_cond_info) = vect_condition_def;
>      }
>  
> -  for (unsigned i = 1; i < info->conds.length (); i ++)
> -    LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_push (info->conds[i]);
> +  LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->conds);
>    LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->conds[0];
>  
>    LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
>  
>    /* Check to see if we're vectorizing multiple exits.  */
>    LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> -    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> +    = LOOP_VINFO_LOOP_CONDS (loop_vinfo).length () > 1;
>  
>    if (info->inner_loop_cond)
>      {
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 785fc99ca27a4caf26b1fca887e6262108f515b2..d0ee95146b2907132bf68962e3f16d43c36afded 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -893,7 +893,7 @@ public:
>       the loop for the break finding loop.  */
>    bool early_breaks;
>  
> -  /* List of loop additional IV conditionals found in the loop.  */
> +  /* List of loop all IV conditionals found in the loop.  */
>    auto_vec<gcond *> conds;
>  
>    /* Main loop IV cond.  */
> 
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 3+ messages in thread

* RE: [PATCH]middle-end: maintain LCSSA form when peeled vector iterations have virtual operands
  2024-01-08 12:38 ` Richard Biener
@ 2024-01-08 14:16   ` Tamar Christina
  0 siblings, 0 replies; 3+ messages in thread
From: Tamar Christina @ 2024-01-08 14:16 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Monday, January 8, 2024 12:38 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: Re: [PATCH]middle-end: maintain LCSSA form when peeled vector
> iterations have virtual operands
> 
> On Fri, 29 Dec 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This patch fixes several interconnected issues.
> >
> > 1. When picking an exit we wanted to check for niter_desc.may_be_zero not
> true.
> >    i.e. we want to pick an exit which we know will iterate at least once.
> >    However niter_desc.may_be_zero is not a boolean.  It is a tree that encodes
> >    a boolean value.  !niter_desc.may_be_zero is just checking if we have some
> >    information, not what the information is.  This leads us to pick a more
> >    difficult to vectorize exit more often than we should.
> >
> > 2. Because we had this bug, we used to pick an alternative exit much more ofthen
> >    which showed one issue, when the loop accesses memory and we "invert it" we
> >    would corrupt the VUSE chain.  This is because on an peeled vector iteration
> >    every exit restarts the loop (i.e. they're all early) BUT since we may have
> >    performed a store, the vUSE would need to be updated.  This version maintains
> >    virtual PHIs correctly in these cases.   Note that we can't simply remove all
> >    of them and recreate them because we need the PHI nodes still in the right
> >    order for if skip_vector.
> >
> > 3. Since we're moving the stores to a safe location I don't think we actually
> >    need to analyze whether the store is in range of the memref,  because if we
> >    ever get there, we know that the loads must be in range, and if the loads are
> >    in range and we get to the store we know the early breaks were not taken and
> >    so the scalar loop would have done the VF stores too.
> >
> > 4. Instead of searching for where to move stores to, they should always be in
> >    exit belonging to the latch.  We can only ever delay stores and even if we
> >    pick a different exit than the latch one as the main one, effects still
> >    happen in program order when vectorized.  If we don't move the stores to the
> >    latch exit but instead to whever we pick as the "main" exit then we can
> >    perform incorrect memory accesses (luckily these are trapped by verify_ssa).
> >
> > 5. We only used to analyze loads inside the same BB as an early break, and also
> >    we'd never analyze the ones inside the block where we'd be moving memory
> >    references to.  This is obviously bogus and to fix it this patch splits apart
> >    the two constraints.  We first validate that all load memory references are
> >    in bounds and only after that do we perform the alias checks for the writes.
> >    This makes the code simpler to understand and more trivially correct.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no issues with --enable-checking=release --enable-lto
> > --with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	PR tree-optimization/113137
> > 	PR tree-optimization/113136
> > 	PR tree-optimization/113172
> > 	* tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
> > 	* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
> > 	(vect_do_peeling): Maintain virtual PHIs on inverted loops.
> > 	* tree-vect-loop.cc (vec_init_loop_exit_info): Pick exit closes to
> > 	latch.
> > 	(vect_create_loop_vinfo): Record all conds instead of only alt ones.
> > 	* tree-vectorizer.h: Fix comment
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	PR tree-optimization/113137
> > 	PR tree-optimization/113136
> > 	PR tree-optimization/113172
> > 	* g++.dg/vect/vect-early-break_4-pr113137.cc: New test.
> > 	* g++.dg/vect/vect-early-break_5-pr113137.cc: New test.
> > 	* gcc.dg/vect/vect-early-break_95-pr113137.c: New test.
> > 	* gcc.dg/vect/vect-early-break_96-pr113136.c: New test.
> > 	* gcc.dg/vect/vect-early-break_97-pr113172.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc
> b/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..f78db8669dcc65f1b45ea7
> 8f4433d175e1138332
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +int b;
> > +void a() __attribute__((__noreturn__));
> > +void c() {
> > +  char *buf;
> > +  int bufsz = 64;
> > +  while (b) {
> > +    !bufsz ? a(), 0 : *buf++ = bufsz--;
> > +    b -= 4;
> > +  }
> > +}
> > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_5-pr113137.cc
> b/gcc/testsuite/g++.dg/vect/vect-early-break_5-pr113137.cc
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..dcd19fa2d2145e09de1827
> 9479b3f20fc27336ba
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_5-pr113137.cc
> > @@ -0,0 +1,13 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +char UnpackReadTables_BitLength[20];
> > +int UnpackReadTables_ZeroCount;
> > +void UnpackReadTables() {
> > +  for (unsigned I = 0; I < 20;)
> > +    while (UnpackReadTables_ZeroCount-- &&
> > +           I < sizeof(UnpackReadTables_BitLength))
> > +      UnpackReadTables_BitLength[I++] = 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_95-pr113137.cc
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_95-pr113137.cc
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..e8f5b06834d1076cb4d0d
> 166a3aff50c9cdda330
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_95-pr113137.cc
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-w" } */
> > +
> > +short gen_to_words_words;
> > +void gen_to_words() {
> > +  unsigned short *lp = &gen_to_words_words;
> > +  long carry;
> > +  for (carry = 1, lp--; carry; lp--) {
> > +    carry = *lp + carry;
> > +    *lp = carry >>= 16;
> > +    if (lp == &gen_to_words_words)
> > +      break;
> > +  }
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_96-pr113136.cc
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_96-pr113136.cc
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..016e749b0e3a9b9e4dbec
> 0a192f2a0cb274b1a06
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_96-pr113136.cc
> > @@ -0,0 +1,32 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +struct _reent { union { struct { char _l64a_buf[8]; } _reent; } _new; };
> > +static const char R64_ARRAY[] =
> "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
> ;
> > +char *
> > +_l64a_r (struct _reent *rptr,
> > +     long value)
> > +{
> > +  char *ptr;
> > +  char *result;
> > +  int i, index;
> > +  unsigned long tmp = (unsigned long)value & 0xffffffff;
> > +  result =
> > +          ((
> > +          rptr
> > +          )->_new._reent._l64a_buf)
> > +                               ;
> > +  ptr = result;
> > +  for (i = 0; i < 60; ++i)
> > +    {
> > +      if (tmp == 0)
> > + {
> > +   *ptr = '\0';
> > +   break;
> > + }
> > +      *ptr++ = R64_ARRAY[index];
> > +      tmp >>= 6;
> > +    }
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_97-pr113172.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_97-pr113172.c
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..625b227b626491a1e4e2f
> 3e02a3689c5ccb50089
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_97-pr113172.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +int tswchp_2;
> > +short cpy_buf[8];
> > +void ts_endcmd() {
> > +  int i = 0;
> > +  for (; i < 8 && i < tswchp_2; i++)
> > +    cpy_buf[i] = i;
> > +}
> > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > index
> 3d9673fb0b580ff21ff151dc5c199840df41a1cd..cbec6d64aabc99be9cc67a00b5
> dee795b69789c2 100644
> > --- a/gcc/tree-vect-data-refs.cc
> > +++ b/gcc/tree-vect-data-refs.cc
> > @@ -671,10 +671,82 @@ vect_analyze_early_break_dependences
> (loop_vec_info loop_vinfo)
> >  		     "loop contains multiple exits, analyzing"
> >  		     " statement dependencies.\n");
> >
> > +  /* Since we don't support general control flow, the location we'll move the
> > +     side-effects to is always the latch connected exit.  When we support
> > +     general control flow we can do better but for now this is fine.  */
> > +  dest_bb = single_pred (loop->latch);
> > +
> > +  /* First check if all load references in the loop are within bounds.  This
> > +     is very conservative for now.  */
> > +  auto_vec<data_reference_p> datarefs;
> > +  if (find_data_references_in_loop (loop, &datarefs) == chrec_dont_know)
> 
> loop_vinfo->datarefs has all datarefs already, no need to gather them
> again.  You are now checking _all_ reads to be in-bounds even those
> that are not feeding the early exit condition by being on the
> in-loop edge of it (as we never move loads).  Was that intended?
> I don't think that's necessary.

Yes that was intended, it was conservatively correct.  But I'll revisit it.
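
Roughly what I have in mind for the revisit, as an untested sketch that just
reuses the refs loop_vinfo already has instead of calling
find_data_references_in_loop again:

  /* Untested sketch: iterate the datarefs we already collected.  */
  for (data_reference *dr_ref : LOOP_VINFO_DATAREFS (loop_vinfo))
    {
      if (!DR_IS_READ (dr_ref))
        continue;
      /* ... same in-bounds checks as before ...  */
    }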

> 
> Btw, the existing code relies on LOOP_VINFO_LOOP_CONDS being ordered
> last-to-first for the alias check to pick up all relevant reads first.
> 
> It's a bit non-obvious what we can ignore and what not.  I think
> as you describe above we can only ignore memory accesses in the
> block of the main IV exit or beyond (for the "late" early break).
> 
> So maybe instead of walking LOOP_VINFO_LOOP_CONDS walk BBs starting
> from the main IV exit BB single predecessor (if the main IV exit
> BB isn't the loop header)?

This sort of broke indeed because we can only move stores to the loop latch exit;
anywhere else is invalid, but that means we will potentially move them over what the
vectorizer has chosen as the main exit.  i.e. you can have 3 exits and if we pick
the first exit as the main one, we still have to move the stores to the last one.

So the order no longer matters as you must not alias with any access in any exit
but the final one.  But I can change it to walking the BBs starting from the
destination of the move.
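
Something along these lines (rough, untested sketch; dest_bb here is the
block the stores get moved to, i.e. single_pred (loop->latch)):

  /* Walk the single-predecessor chain backwards from the block the
     stores are moved to, analyzing the loads in each block, and stop
     once we reach the loop header.  */
  basic_block bb = dest_bb;
  do
    {
      /* ... analyze the loads in BB ...  */
      bb = single_pred (bb);
    }
  while (bb != loop->header);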

> 
> > +    return opt_result::failure_at (vect_location,
> > +				 "Could not analyze all data references in the "
> > +				 "the loop.  Unable to continue.\n");
> > +  for (auto dr_ref : datarefs)
> > +    {
> > +      if (!DR_IS_READ (dr_ref))
> > +	continue;
> > +
> > +      gimple *stmt = DR_STMT (dr_ref);
> > +
> > +      /* We currently only support statically allocated objects due to
> > +	 not having first-faulting loads support or peeling for
> > +	 alignment support.  Compute the size of the referenced object
> > +	 (it could be dynamically allocated).  */
> > +      tree obj = DR_BASE_ADDRESS (dr_ref);
> > +      if (!obj || TREE_CODE (obj) != ADDR_EXPR)
> > +	{
> > +	  if (dump_enabled_p ())
> > +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			 "early breaks only supported on statically"
> > +			 " allocated objects.\n");
> > +	  return opt_result::failure_at (stmt,
> > +			 "can't safely apply code motion to "
> > +			 "dependencies of %G to vectorize "
> > +			 "the early exit.\n", stmt);
> > +	}
> > +
> > +      tree refop = TREE_OPERAND (obj, 0);
> > +      tree refbase = get_base_address (refop);
> > +      if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
> > +	  || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
> > +	{
> > +	  if (dump_enabled_p ())
> > +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			 "early breaks only supported on"
> > +			 " statically allocated objects.\n");
> > +	  return opt_result::failure_at (stmt,
> > +			 "can't safely apply code motion to "
> > +			 "dependencies of %G to vectorize "
> > +			 "the early exit.\n", stmt);
> > +	}
> > +
> > +      /* Check if vector accesses to the object will be within bounds.
> > +	 must be a constant or assume loop will be versioned or niters
> > +	 bounded by VF so accesses are within range.  */
> > +      if (!ref_within_array_bound (stmt, DR_REF (dr_ref)))
> > +	{
> > +	  if (dump_enabled_p ())
> > +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			 "early breaks not supported: vectorization "
> > +			 "would %s beyond size of obj.",
> > +			 DR_IS_READ (dr_ref) ? "read" : "write");
> > +	  return opt_result::failure_at (stmt,
> > +			 "can't safely apply code motion to "
> > +			 "dependencies of %G to vectorize "
> > +			 "the early exit.\n", stmt);
> > +	}
> > +    }
> > +
> >    for (gimple *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
> >      {
> > +      if (dump_enabled_p ())
> > +	dump_printf_loc (MSG_NOTE, vect_location, "inspecting exit: %G", c);
> > +
> >        stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
> > -      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
> > +      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type
> > +	  || gimple_bb (c) == dest_bb)
> >  	continue;
> >
> >        gimple_stmt_iterator gsi = gsi_for_stmt (c);
> > @@ -694,54 +766,9 @@ vect_analyze_early_break_dependences
> (loop_vec_info loop_vinfo)
> >  	  if (!dr_ref)
> >  	    continue;
> >
> > -	  /* We currently only support statically allocated objects due to
> > -	     not having first-faulting loads support or peeling for
> > -	     alignment support.  Compute the size of the referenced object
> > -	     (it could be dynamically allocated).  */
> > -	  tree obj = DR_BASE_ADDRESS (dr_ref);
> > -	  if (!obj || TREE_CODE (obj) != ADDR_EXPR)
> > -	    {
> > -	      if (dump_enabled_p ())
> > -		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > -				 "early breaks only supported on statically"
> > -				 " allocated objects.\n");
> > -	      return opt_result::failure_at (c,
> > -				 "can't safely apply code motion to "
> > -				 "dependencies of %G to vectorize "
> > -				 "the early exit.\n", c);
> > -	    }
> > -
> > -	  tree refop = TREE_OPERAND (obj, 0);
> > -	  tree refbase = get_base_address (refop);
> > -	  if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
> > -	      || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
> > -	    {
> > -	      if (dump_enabled_p ())
> > -		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > -				 "early breaks only supported on"
> > -				 " statically allocated objects.\n");
> > -	      return opt_result::failure_at (c,
> > -				 "can't safely apply code motion to "
> > -				 "dependencies of %G to vectorize "
> > -				 "the early exit.\n", c);
> > -	    }
> > -
> > -	  /* Check if vector accesses to the object will be within bounds.
> > -	     must be a constant or assume loop will be versioned or niters
> > -	     bounded by VF so accesses are within range.  */
> > -	  if (!ref_within_array_bound (stmt, DR_REF (dr_ref)))
> > -	    {
> > -	      if (dump_enabled_p ())
> > -		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > -				 "early breaks not supported: vectorization "
> > -				 "would %s beyond size of obj.",
> > -				 DR_IS_READ (dr_ref) ? "read" : "write");
> > -	      return opt_result::failure_at (c,
> > -				 "can't safely apply code motion to "
> > -				 "dependencies of %G to vectorize "
> > -				 "the early exit.\n", c);
> > -	    }
> > -
> > +	  /* Stores will be delayed until after the early breaks.  Because of
> > +	     that we don't need to know their bounds as we'd only perform them
> > +	     if we knew the vector iteration gets there.  */
> >  	  if (DR_IS_READ (dr_ref))
> >  	    bases.safe_push (dr_ref);
> >  	  else if (DR_IS_WRITE (dr_ref))
> > @@ -801,27 +828,12 @@ vect_analyze_early_break_dependences
> (loop_vec_info loop_vinfo)
> >  				 "marked statement for vUSE update: %G", stmt);
> >  	    }
> >  	}
> > -
> > -      /* Save destination as we go, BB are visited in order and the last one
> > -	is where statements should be moved to.  */
> > -      if (!dest_bb)
> > -	dest_bb = gimple_bb (c);
> > -      else
> > -	{
> > -	  basic_block curr_bb = gimple_bb (c);
> > -	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
> > -	    dest_bb = curr_bb;
> > -	}
> >      }
> >
> > -  basic_block dest_bb0 = EDGE_SUCC (dest_bb, 0)->dest;
> > -  basic_block dest_bb1 = EDGE_SUCC (dest_bb, 1)->dest;
> > -  dest_bb = flow_bb_inside_loop_p (loop, dest_bb0) ? dest_bb0 : dest_bb1;
> >    /* We don't allow outer -> inner loop transitions which should have been
> >       trapped already during loop form analysis.  */
> >    gcc_assert (dest_bb->loop_father == loop);
> >
> > -  gcc_assert (dest_bb);
> >    LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
> >
> >    if (!LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).is_empty ())
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index
> 295e1c91687461749e5ac7c5bf5e5fdcd8660d64..76d4979c0b3b374dcaacf682
> 5a95a8714114a63b 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -1626,11 +1626,12 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class
> loop *loop, edge loop_exit,
> >  	  flush_pending_stmts (e);
> >  	}
> >
> > +      bool peeled_iters = single_pred (loop->latch) != loop_exit->src;
> >        /* Record the new SSA names in the cache so that we can skip materializing
> >  	 them again when we fill in the rest of the LCSSA variables.  */
> >        for (auto phi : new_phis)
> >  	{
> > -	  tree new_arg = gimple_phi_arg (phi, loop_exit->dest_idx)->def;
> > +	  tree new_arg = gimple_phi_arg_def (phi, loop_exit->dest_idx);
> >
> >  	  if (!SSA_VAR_P (new_arg))
> >  	    continue;
> > @@ -1653,7 +1654,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop
> *loop, edge loop_exit,
> >  	      continue;
> >  	    }
> >
> > -	  /* If we decide to remove the PHI node we should also not
> > +	  /* If we decided not to remove the PHI node we should also not
> >  	     rematerialize it later on.  */
> >  	  new_phi_args.put (new_arg, gimple_phi_result (phi));
> >
> > @@ -1667,7 +1668,6 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop
> *loop, edge loop_exit,
> >        edge loop_entry = single_succ_edge (new_preheader);
> >        if (flow_loops)
> >  	{
> > -	  bool peeled_iters = single_pred (loop->latch) != loop_exit->src;
> >  	  /* Link through the main exit first.  */
> >  	  for (auto gsi_from = gsi_start_phis (loop->header),
> >  	       gsi_to = gsi_start_phis (new_loop->header);
> > @@ -1693,8 +1693,11 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop
> *loop, edge loop_exit,
> >  		    }
> >  		}
> >  	      /* If we have multiple exits and the vector loop is peeled then we
> > -		 need to use the value at start of loop.  */
> > -	      if (peeled_iters)
> > +		 need to use the value at start of loop.  If we're looking at
> > +		 virtual operands we have to keep the original link.   Virtual
> > +		 operands don't all become the same because we'll corrupt the
> > +		 vUSE chains among others.  */
> > +	      if (peeled_iters && !virtual_operand_p (new_arg))
> >  		{
> >  		  tree tmp_arg = gimple_phi_result (from_phi);
> >  		  if (!new_phi_args.get (tmp_arg))
> > @@ -1728,9 +1731,26 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop
> *loop, edge loop_exit,
> >
> >  		  tree alt_arg = gimple_phi_result (from_phi);
> >  		  edge main_e = single_succ_edge (alt_loop_exit_block);
> > -		  for (edge e : loop_exits)
> > -		    if (e != loop_exit)
> > -		      SET_PHI_ARG_DEF (to_phi, main_e->dest_idx, alt_arg);
> > +
> > +		  /* Now update the virtual PHI nodes with the right value.  */
> > +		  if (peeled_iters
> > +		      && virtual_operand_p (alt_arg)
> > +		      && flow_bb_inside_loop_p (loop,
> > +				gimple_bb (SSA_NAME_DEF_STMT (alt_arg))))
> > +		    {
> > +			/* Link the alternative exit one.  */
> > +			tree def
> > +			  = gimple_phi_arg_def (to_phi, loop_exit->dest_idx);
> > +			gphi *phi = as_a <gphi *> (SSA_NAME_DEF_STMT (def));
> > +			tree exit_val = gimple_phi_arg_def (phi, 0);
> > +			SET_PHI_ARG_DEF (phi, 0, alt_arg);
> > +
> > +			/* And now the main merge block.  */
> > +			alt_arg = copy_ssa_name (def);
> > +			gphi *l_phi = create_phi_node (alt_arg, main_e->src);
> > +			SET_PHI_ARG_DEF (l_phi, 0, exit_val);
> > +		    }
> > +		  SET_PHI_ARG_DEF (to_phi, main_e->dest_idx, alt_arg);
> >  		}
> >
> >  	      set_immediate_dominator (CDI_DOMINATORS, new_preheader,
> > @@ -3399,6 +3419,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
> >  	vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> >  					  update_e);
> >
> > +      /* If we have a peeled vector iteration we will never skip the epilog loop
> > +	 and we can simplify the cfg a lot by not doing the edge split.  */
> >        if (skip_epilog || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> >  	{
> >  	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index
> f51ae3e719e753059389cf9495b6d65b3b1191cb..37f1be1101ffae779214056a
> 0886411e0683e887 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -976,7 +976,10 @@ vec_init_loop_exit_info (class loop *loop)
> >        if (number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
> >  	  && !chrec_contains_undetermined (niter_desc.niter))
> >  	{
> > -	  if (!niter_desc.may_be_zero || !candidate)
> > +	  tree may_be_zero = niter_desc.may_be_zero;
> > +	  if ((may_be_zero && integer_zerop (may_be_zero))
> 
> niter_desc.may_be_zero is never NULL
> 

Hmm, vect_get_loop_niters has the same check that it's not NULL, so I thought it could be;
will remove it then.
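
i.e. something like (untested):

	  if (integer_zerop (niter_desc.may_be_zero)
	      && (!candidate
		  || dominated_by_p (CDI_DOMINATORS, exit->src, candidate->src)))
	    candidate = exit;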

> The rest looks OK.  Is it possible to split out the parts that do not
> require the vect_analyze_early_break_dependences changes?

Sure, will do.

Thanks,
Tamar

> 
> Richard.
> 
> > +	     && (!candidate
> > +		 || dominated_by_p (CDI_DOMINATORS, exit->src, candidate-
> >src)))
> >  	    candidate = exit;
> >  	}
> >      }
> > @@ -1913,15 +1916,14 @@ vect_create_loop_vinfo (class loop *loop,
> vec_info_shared *shared,
> >        STMT_VINFO_DEF_TYPE (loop_cond_info) = vect_condition_def;
> >      }
> >
> > -  for (unsigned i = 1; i < info->conds.length (); i ++)
> > -    LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_push (info->conds[i]);
> > +  LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->conds);
> >    LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->conds[0];
> >
> >    LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
> >
> >    /* Check to see if we're vectorizing multiple exits.  */
> >    LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > -    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> > +    = LOOP_VINFO_LOOP_CONDS (loop_vinfo).length () > 1;
> >
> >    if (info->inner_loop_cond)
> >      {
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index
> 785fc99ca27a4caf26b1fca887e6262108f515b2..d0ee95146b2907132bf68962e
> 3f16d43c36afded 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -893,7 +893,7 @@ public:
> >       the loop for the break finding loop.  */
> >    bool early_breaks;
> >
> > -  /* List of loop additional IV conditionals found in the loop.  */
> > +  /* List of loop all IV conditionals found in the loop.  */
> >    auto_vec<gcond *> conds;
> >
> >    /* Main loop IV cond.  */
> >
> >
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-01-08 14:16 UTC | newest]

2023-12-29 21:05 [PATCH]middle-end: maintain LCSSA form when peeled vector iterations have virtual operands Tamar Christina
2024-01-08 12:38 ` Richard Biener
2024-01-08 14:16   ` Tamar Christina
