From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>, "jlaw@ventanamicro.com" <jlaw@ventanamicro.com>
Subject: RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits
Date: Mon, 20 Nov 2023 21:57:38 +0000	[thread overview]
Message-ID: <VI1PR08MB5325326E1AF88F3235E641DBFFB4A@VI1PR08MB5325.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <nycvar.YFH.7.77.849.2311161054400.8772@jbgna.fhfr.qr>

Good morning,

Here's the respun patch.  As discussed, we now use reductions and inductions rather than scalar values:

This adds support to vectorizable_live_reduction to handle multiple exits by
searching for the exit in which the live value should be materialized.

Additionally, which element we're after depends on whether the exit it's
materialized in is an early exit, or whether the loop's main exit is
different from the loop's natural one (i.e. the one with the same src block as
the latch).

In those two cases we want the first rather than the last value, as we're going
to restart the iteration in the scalar loop.  For VLA this means we need to
reverse both the mask and the vector, since there's only a way to get the last
active element and not the first.

For inductions and multiple exits:
  - we test whether the target supports vectorizing the induction,
  - we mark all inductions in the loop as relevant,
  - during codegen we also generate live-operation code for otherwise
    non-live inductions,
  - an induction live in an early exit gets the first element rather than the
    last (see the example just below).
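
A minimal example of the induction case (an illustrative sketch, not one of
the new testcases):

  int first_match (int *a, int n)
  {
    int i;
    for (i = 0; i < n; i++)
      if (a[i] == 42)
	break;
    return i;
  }

Here 'i' is live on the early exit, and because the scalar loop restarts the
iteration in which the break happened, the value we need is the one from the
first active lane of that vector iteration, not the last.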

For reductions and multiple exits:
  - Reductions for early exits reduce the reduction definition statement
    rather than the reduction step.  This allows us to get the value at the
    start of the iteration (see the example below).
  - The peeling layout means that we just have to update one block, the merge
    block.  We expect all the reductions to be the same, but we leave it up to
    value numbering to clean up any duplicate code as we iterate over all
    edges.
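
For instance (again just a sketch):

  int sum_until_negative (int *a, int n)
  {
    int s = 0;
    for (int i = 0; i < n; i++)
      {
	if (a[i] < 0)
	  break;
	s += a[i];
      }
    return s;
  }

On the early exit the live value of 's' is the one from the start of the
breaking iteration, so we reduce the value coming from the reduction PHI
rather than the updated reduction step, and materialize the scalar result
once in the merge block.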

These two changes fix the reduction codegen example given before, which has
been added to the testsuite for early vect.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop.cc (vectorizable_live_operation): Support early exits.
	(vect_analyze_loop_operations): Check if target supports vectorizing IV.
	(vect_transform_loop): Call vectorizable_live_operation for non-live
	inductions or reductions.
	(find_connected_edge, vectorizable_live_operation_1): New.
	(vect_create_epilog_for_reduction): Support reductions in early break.
	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
	(vect_stmt_relevant_p): Mark all inductions when early break as being
	relevant.
	* tree-vectorizer.h (perm_mask_for_reverse): Expose.
	(vect_iv_increment_position): New.
	* tree-vect-loop-manip.cc (vect_iv_increment_position): Expose.

--- inline copy of patch ---

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 139311142b376d4eaa7ef8765608220b1eb92b31..af216d3dcb8a502639898ff67cb86948a7f140a4 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -453,7 +453,7 @@ vect_adjust_loop_lens_control (tree iv_type, gimple_seq *seq,
    INSERT_AFTER is set to true if the increment should be inserted after
    *BSI.  */
 
-static void
+void
 vect_iv_increment_position (edge loop_exit, gimple_stmt_iterator *bsi,
 			    bool *insert_after)
 {
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 8a50380de49bc12105be47ea1d8ee3cf1f2bdab4..b1c34c4c3aaf8bdf9bf52d5a726836936de772b6 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2163,6 +2163,15 @@ vect_analyze_loop_operations (loop_vec_info loop_vinfo)
 	    ok = vectorizable_live_operation (loop_vinfo, stmt_info, NULL, NULL,
 					      -1, false, &cost_vec);
 
+	  /* Check if we can perform the operation for early break if we force
+	     the live operation.  */
+	  if (ok
+	      && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+	      && !STMT_VINFO_LIVE_P (stmt_info)
+	      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+	    ok = vectorizable_live_operation (loop_vinfo, stmt_info, NULL, NULL,
+					      -1, false, &cost_vec);
+
           if (!ok)
 	    return opt_result::failure_at (phi,
 					   "not vectorized: relevant phi not "
@@ -5842,6 +5851,10 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
    SLP_NODE_INSTANCE is the SLP node instance containing SLP_NODE
    REDUC_INDEX says which rhs operand of the STMT_INFO is the reduction phi
      (counting from 0)
+   LOOP_EXIT is the edge to update in the merge block.  In the case of a single
+     exit this edge is always the main loop exit.
+   MAIN_EXIT_P indicates whether we are updating the main exit or an alternate
+     exit.  This determines whether we use the final or original value.
 
    This function:
    1. Completes the reduction def-use cycles.
@@ -5882,7 +5895,9 @@ static void
 vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 				  stmt_vec_info stmt_info,
 				  slp_tree slp_node,
-				  slp_instance slp_node_instance)
+				  slp_instance slp_node_instance,
+				  edge loop_exit,
+				  bool main_exit_p = true)
 {
   stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
   gcc_assert (reduc_info->is_reduc_info);
@@ -6053,7 +6068,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
       /* Create an induction variable.  */
       gimple_stmt_iterator incr_gsi;
       bool insert_after;
-      standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+      vect_iv_increment_position (loop_exit, &incr_gsi, &insert_after);
       create_iv (series_vect, PLUS_EXPR, vec_step, NULL_TREE, loop, &incr_gsi,
 		 insert_after, &indx_before_incr, &indx_after_incr);
 
@@ -6132,23 +6147,30 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
          Store them in NEW_PHIS.  */
   if (double_reduc)
     loop = outer_loop;
-  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+  /* We need to reduce values in all exits.  */
+  exit_bb = loop_exit->dest;
   exit_gsi = gsi_after_labels (exit_bb);
   reduc_inputs.create (slp_node ? vec_num : ncopies);
+  vec <gimple *> vec_stmts;
+  if (main_exit_p)
+    vec_stmts = STMT_VINFO_VEC_STMTS (rdef_info);
+  else
+    vec_stmts = STMT_VINFO_VEC_STMTS (STMT_VINFO_REDUC_DEF (rdef_info));
+
   for (unsigned i = 0; i < vec_num; i++)
     {
       gimple_seq stmts = NULL;
       if (slp_node)
 	def = vect_get_slp_vect_def (slp_node, i);
       else
-	def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]);
+	def = gimple_get_lhs (vec_stmts[0]);
       for (j = 0; j < ncopies; j++)
 	{
 	  tree new_def = copy_ssa_name (def);
 	  phi = create_phi_node (new_def, exit_bb);
 	  if (j)
-	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
-	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
+	    def = gimple_get_lhs (vec_stmts[j]);
+	  SET_PHI_ARG_DEF (phi, loop_exit->dest_idx, def);
 	  new_def = gimple_convert (&stmts, vectype, new_def);
 	  reduc_inputs.quick_push (new_def);
 	}
@@ -6885,7 +6907,20 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
           FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
 	    {
 	      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
-		SET_USE (use_p, scalar_result);
+		{
+		  gimple *stmt = USE_STMT (use_p);
+		  if (main_exit_p)
+		    SET_USE (use_p, scalar_result);
+		  else if (is_a <gphi *> (stmt))
+		    {
+		      /* For an early exit only update uses in the merge
+			 block.  */
+		      edge merge_e = single_succ_edge (loop_exit->dest);
+		      if (gimple_bb (stmt) != merge_e->dest)
+			continue;
+		      SET_PHI_ARG_DEF (stmt, merge_e->dest_idx, scalar_result);
+		    }
+		}
 	      update_stmt (use_stmt);
 	    }
         }
@@ -10481,6 +10516,156 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   return true;
 }
 
+/* Function vectorizable_live_operation_1.
+
+   Helper function for vectorizable_live_operation.  */
+
+tree
+vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
+			       stmt_vec_info stmt_info, edge exit_e,
+			       tree vectype, int ncopies, slp_tree slp_node,
+			       tree bitsize, tree bitstart, tree vec_lhs,
+			       tree lhs_type, bool restart_loop,
+			       gimple_stmt_iterator *exit_gsi)
+{
+  basic_block exit_bb = exit_e->dest;
+  gcc_assert (single_pred_p (exit_bb) || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
+
+  tree vec_lhs_phi = copy_ssa_name (vec_lhs);
+  gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
+  for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
+    SET_PHI_ARG_DEF (phi, i, vec_lhs);
+
+  gimple_seq stmts = NULL;
+  tree new_tree;
+  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+    {
+      /* Emit:
+
+	 SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
+
+	 where VEC_LHS is the vectorized live-out result and MASK is
+	 the loop mask for the final iteration.  */
+      gcc_assert (ncopies == 1 && !slp_node);
+      gimple_seq tem = NULL;
+      gimple_stmt_iterator gsi = gsi_last (tem);
+      tree len = vect_get_loop_len (loop_vinfo, &gsi,
+				    &LOOP_VINFO_LENS (loop_vinfo),
+				    1, vectype, 0, 0);
+
+      /* BIAS - 1.  */
+      signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+      tree bias_minus_one
+	= int_const_binop (MINUS_EXPR,
+			   build_int_cst (TREE_TYPE (len), biasval),
+			   build_one_cst (TREE_TYPE (len)));
+
+      /* LAST_INDEX = LEN + (BIAS - 1).  */
+      tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
+				     len, bias_minus_one);
+
+      /* This needs to implement extraction of the first index, but not sure
+	 how the LEN stuff works.  At the moment we shouldn't get here since
+	 there's no LEN support for early breaks.  But guard this so there's
+	 no incorrect codegen.  */
+      gcc_assert (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
+
+      /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
+      tree scalar_res
+	= gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
+			vec_lhs_phi, last_index);
+
+      /* Convert the extracted vector element to the scalar type.  */
+      new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
+    }
+  else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+    {
+      /* Emit:
+
+	 SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
+
+	 where VEC_LHS is the vectorized live-out result and MASK is
+	 the loop mask for the final iteration.  */
+      gcc_assert (!slp_node);
+      tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
+      gimple_seq tem = NULL;
+      gimple_stmt_iterator gsi = gsi_last (tem);
+      tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
+				      &LOOP_VINFO_MASKS (loop_vinfo),
+				      1, vectype, 0);
+      tree scalar_res;
+
+      /* For an inverted control flow with early breaks we want EXTRACT_FIRST
+	 instead of EXTRACT_LAST.  Emulate by reversing the vector and mask. */
+      if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  /* First create the permuted mask.  */
+	  tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
+	  tree perm_dest = copy_ssa_name (mask);
+	  gimple *perm_stmt
+		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
+				       mask, perm_mask);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
+				       &gsi);
+	  mask = perm_dest;
+
+	  /* Then permute the vector contents.  */
+	  tree perm_elem = perm_mask_for_reverse (vectype);
+	  perm_dest = copy_ssa_name (vec_lhs_phi);
+	  perm_stmt
+		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
+				       vec_lhs_phi, perm_elem);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
+				       &gsi);
+	  vec_lhs_phi = perm_dest;
+	}
+
+      gimple_seq_add_seq (&stmts, tem);
+
+      scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
+				 mask, vec_lhs_phi);
+
+      /* Convert the extracted vector element to the scalar type.  */
+      new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
+    }
+  else
+    {
+      tree bftype = TREE_TYPE (vectype);
+      if (VECTOR_BOOLEAN_TYPE_P (vectype))
+	bftype = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 1);
+      new_tree = build3 (BIT_FIELD_REF, bftype, vec_lhs_phi, bitsize, bitstart);
+      new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
+				       &stmts, true, NULL_TREE);
+    }
+
+  *exit_gsi = gsi_after_labels (exit_bb);
+  if (stmts)
+    gsi_insert_seq_before (exit_gsi, stmts, GSI_SAME_STMT);
+
+  return new_tree;
+}
+
+/* Find the edge that's the final one in the path from SRC to DEST and
+   return it.  At most one forwarder block may lie between SRC and DEST.  */
+
+static edge
+find_connected_edge (edge src, basic_block dest)
+{
+   if (src->dest == dest)
+     return src;
+
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, dest->preds)
+    {
+      if (src->dest == e->src)
+	return e;
+    }
+
+  return NULL;
+}
+
 /* Function vectorizable_live_operation.
 
    STMT_INFO computes a value that is used outside the loop.  Check if
@@ -10505,7 +10690,8 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
   int vec_entry = 0;
   poly_uint64 vec_index = 0;
 
-  gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
+  gcc_assert (STMT_VINFO_LIVE_P (stmt_info)
+	      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
 
   /* If a stmt of a reduction is live, vectorize it via
      vect_create_epilog_for_reduction.  vectorizable_reduction assessed
@@ -10530,8 +10716,22 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
       if (STMT_VINFO_REDUC_TYPE (reduc_info) == FOLD_LEFT_REDUCTION
 	  || STMT_VINFO_REDUC_TYPE (reduc_info) == EXTRACT_LAST_REDUCTION)
 	return true;
+
+      /* If early break we only have to materialize the reduction on the merge
+	 block, but we have to find an alternate exit first.  */
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  for (auto exit : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
+	    if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
+	      vect_create_epilog_for_reduction (loop_vinfo, stmt_info,
+						slp_node, slp_node_instance,
+						exit, false);
+	}
+
       vect_create_epilog_for_reduction (loop_vinfo, stmt_info, slp_node,
-					slp_node_instance);
+					slp_node_instance,
+					LOOP_VINFO_IV_EXIT (loop_vinfo));
+
       return true;
     }
 
@@ -10683,103 +10883,62 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 	   lhs' = new_tree;  */
 
       class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
-      gcc_assert (single_pred_p (exit_bb));
-
-      tree vec_lhs_phi = copy_ssa_name (vec_lhs);
-      gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
-      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
-
-      gimple_seq stmts = NULL;
-      tree new_tree;
-      if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
-	{
-	  /* Emit:
-
-	       SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
-
-	     where VEC_LHS is the vectorized live-out result and MASK is
-	     the loop mask for the final iteration.  */
-	  gcc_assert (ncopies == 1 && !slp_node);
-	  gimple_seq tem = NULL;
-	  gimple_stmt_iterator gsi = gsi_last (tem);
-	  tree len
-	    = vect_get_loop_len (loop_vinfo, &gsi,
-				 &LOOP_VINFO_LENS (loop_vinfo),
-				 1, vectype, 0, 0);
-
-	  /* BIAS - 1.  */
-	  signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
-	  tree bias_minus_one
-	    = int_const_binop (MINUS_EXPR,
-			       build_int_cst (TREE_TYPE (len), biasval),
-			       build_one_cst (TREE_TYPE (len)));
-
-	  /* LAST_INDEX = LEN + (BIAS - 1).  */
-	  tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
-					  len, bias_minus_one);
-
-	  /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
-	  tree scalar_res
-	    = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
-			    vec_lhs_phi, last_index);
-
-	  /* Convert the extracted vector element to the scalar type.  */
-	  new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
-	}
-      else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
-	{
-	  /* Emit:
-
-	       SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
-
-	     where VEC_LHS is the vectorized live-out result and MASK is
-	     the loop mask for the final iteration.  */
-	  gcc_assert (ncopies == 1 && !slp_node);
-	  tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
-	  gimple_seq tem = NULL;
-	  gimple_stmt_iterator gsi = gsi_last (tem);
-	  tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
-					  &LOOP_VINFO_MASKS (loop_vinfo),
-					  1, vectype, 0);
-	  gimple_seq_add_seq (&stmts, tem);
-	  tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
-					  mask, vec_lhs_phi);
-
-	  /* Convert the extracted vector element to the scalar type.  */
-	  new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
-	}
-      else
-	{
-	  tree bftype = TREE_TYPE (vectype);
-	  if (VECTOR_BOOLEAN_TYPE_P (vectype))
-	    bftype = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 1);
-	  new_tree = build3 (BIT_FIELD_REF, bftype,
-			     vec_lhs_phi, bitsize, bitstart);
-	  new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
-					   &stmts, true, NULL_TREE);
-	}
+      /* Check if we have a loop where the chosen exit is not the main exit.
+	 In these cases for an early break we restart the iteration the vector
+	 code performed.  For the live values we want the value at the start
+	 of the iteration rather than at the end.  */
+      edge main_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+      bool restart_loop = !vect_is_loop_exit_latch_pred (main_e, loop);
+      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
+	if (!is_gimple_debug (use_stmt)
+	    && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
+	  {
+	    basic_block use_bb = gimple_bb (use_stmt);
+	    if (!is_a <gphi *> (use_stmt))
+	      continue;
+	    for (auto exit_e : get_loop_exit_edges (loop))
+	      {
+		/* See if this exit leads to the value.  */
+		edge dest_e = find_connected_edge (exit_e, use_bb);
+		if (!dest_e || PHI_ARG_DEF_FROM_EDGE (use_stmt, dest_e) != lhs)
+		  continue;
 
-      gimple_stmt_iterator exit_gsi = gsi_after_labels (exit_bb);
-      if (stmts)
-	gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+		gimple *tmp_vec_stmt = vec_stmt;
+		tree tmp_vec_lhs = vec_lhs;
+		tree tmp_bitstart = bitstart;
+		/* For early exit where the exit is not in the BB that leads
+		   to the latch then we're restarting the iteration in the
+		   scalar loop.  So get the first live value.  */
+		if (restart_loop || exit_e != main_e)
+		  {
+		    tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
+		    tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
+		    tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
+		  }
 
-      /* Remove existing phis that copy from lhs and create copies
-	 from new_tree.  */
-      gimple_stmt_iterator gsi;
-      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
-	{
-	  gimple *phi = gsi_stmt (gsi);
-	  if ((gimple_phi_arg_def (phi, 0) == lhs))
-	    {
-	      remove_phi_node (&gsi, false);
-	      tree lhs_phi = gimple_phi_result (phi);
-	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
-	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
-	    }
-	  else
-	    gsi_next (&gsi);
-	}
+		gimple_stmt_iterator exit_gsi;
+		tree new_tree
+		  = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
+						   exit_e, vectype, ncopies,
+						   slp_node, bitsize,
+						   tmp_bitstart, tmp_vec_lhs,
+						   lhs_type, restart_loop,
+						   &exit_gsi);
+
+		/* Use the empty block on the exit to materialize the new stmts
+		   so we can update the PHI here.  */
+		if (gimple_phi_num_args (use_stmt) == 1)
+		  {
+		    auto gsi = gsi_for_stmt (use_stmt);
+		    remove_phi_node (&gsi, false);
+		    tree lhs_phi = gimple_phi_result (use_stmt);
+		    gimple *copy = gimple_build_assign (lhs_phi, new_tree);
+		    gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
+		  }
+		else
+		  SET_PHI_ARG_DEF (use_stmt, dest_e->dest_idx, new_tree);
+	      }
+	  }
 
       /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
       FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
@@ -11797,6 +11956,21 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 	      if (dump_enabled_p ())
 		dump_printf_loc (MSG_NOTE, vect_location, "transform phi.\n");
 	      vect_transform_stmt (loop_vinfo, stmt_info, NULL, NULL, NULL);
+	      /* If vectorizing early break we must also vectorize the use of
+		 the PHIs as a live operation.  */
+	      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+		  && !STMT_VINFO_LIVE_P (stmt_info)
+		  && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+		{
+		  if (dump_enabled_p ())
+		    dump_printf_loc (MSG_NOTE, vect_location,
+			 "----> vectorizing early break reduc or induc phi: %G",
+			 (gimple *) phi);
+		  bool done
+		    = vectorizable_live_operation (loop_vinfo, stmt_info, NULL,
+						   NULL, -1, true, NULL);
+		  gcc_assert (done);
+		}
 	    }
 	}
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index fe38beb4fa1d9f8593445354f56ba52e10a040cd..f1b6a13395f286f9997530bbe57cda3a00502f8f 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -342,6 +342,7 @@ is_simple_and_all_uses_invariant (stmt_vec_info stmt_info,
    - it has uses outside the loop.
    - it has vdefs (it alters memory).
    - control stmts in the loop (except for the exit condition).
+   - it is an induction and we have multiple exits.
 
    CHECKME: what other side effects would the vectorizer allow?  */
 
@@ -399,6 +400,19 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 	}
     }
 
+  /* Check if it's an induction and multiple exits.  In this case there will be
+     a usage later on after peeling which is needed for the alternate exit.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "vec_stmt_relevant_p: induction forced for "
+			   "early break.\n");
+      *relevant = vect_used_in_scope;
+
+    }
+
   if (*live_p && *relevant == vect_unused_in_scope
       && !is_simple_and_all_uses_invariant (stmt_info, loop_vinfo))
     {
@@ -1774,7 +1788,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
 /* If the target supports a permute mask that reverses the elements in
    a vector of type VECTYPE, return that mask, otherwise return null.  */
 
-static tree
+tree
 perm_mask_for_reverse (tree vectype)
 {
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 22a8c3d384d7ae1ca93079b64f2d40821b4a3c56..cfd6756492e4af460c2f5669ecccc82b1089cfe4 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2225,6 +2225,7 @@ extern bool vect_can_advance_ivs_p (loop_vec_info);
 extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
 extern edge vec_init_loop_exit_info (class loop *);
 extern bool vect_is_loop_exit_latch_pred (edge, class loop *);
+extern void vect_iv_increment_position (edge, gimple_stmt_iterator *, bool *);
 
 /* In tree-vect-stmts.cc.  */
 extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
@@ -2246,6 +2247,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
 				enum vect_def_type *,
 				tree *, stmt_vec_info * = NULL);
 extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
+extern tree perm_mask_for_reverse (tree);
 extern bool supportable_widening_operation (vec_info*, code_helper,
 					    stmt_vec_info, tree, tree,
 					    code_helper*, code_helper*,
