[PATCH GCC 6/9]Simplify control flow graph for vectorized loop

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Bin Cheng <Bin.Cheng@arm.com>
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: nd <nd@arm.com>
Subject: [PATCH GCC 6/9]Simplify control flow graph for vectorized loop
Date: Tue, 06 Sep 2016 18:53:00 -0000	[thread overview]
Message-ID: <AM4PR0802MB21631E58381E20EE07D06446E7F90@AM4PR0802MB2163.eurprd08.prod.outlook.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 4361 bytes --]

Hi,
This is the main patch improving control flow graph for vectorized loop.  It generally rewrites loop peeling stuff in vectorizer.  As described in patch, for a typical loop to be vectorized like:

       preheader:
     LOOP:
       header_bb:
	 loop_body
	 if (exit_loop_cond) goto exit_bb
	 else                goto header_bb
       exit_bb:

This patch peels prolog and epilog from the loop, adds guards skipping PROLOG and EPILOG for various conditions.  As a result, the changed CFG would look like:

       guard_bb_1:
	 if (prefer_scalar_loop) goto merge_bb_1
	 else                    goto guard_bb_2

       guard_bb_2:
         if (skip_prolog) goto merge_bb_2
         else             goto prolog_preheader

       prolog_preheader:
     PROLOG:
       prolog_header_bb:
	 prolog_body
	 if (exit_prolog_cond) goto prolog_exit_bb
	 else                  goto prolog_header_bb
       prolog_exit_bb:

       merge_bb_2:

       vector_preheader:
     VECTOR LOOP:
       vector_header_bb:
	 vector_body
	 if (exit_vector_cond) goto vector_exit_bb
	 else                  goto vector_header_bb
       vector_exit_bb:

       guard_bb_3:
	 if (skip_epilog) goto merge_bb_3
	 else             goto epilog_preheader

       merge_bb_1:

       epilog_preheader:
     EPILOG:
       epilog_header_bb:
	 epilog_body
	 if (exit_epilog_cond) goto merge_bb_3
	 else                  goto epilog_header_bb

       merge_bb_3:


Note this patch peels prolog and epilog only if it's necessary, as well as adds different guard_conditions/branches.  Also the first guard/branch could be further improved by merging it with loop versioning.

Before this patch, up to 4 branch instructions need to be executed before the vectorized loop is reached in the worst case, while the number is reduced to 2 with this patch.  The patch also does better in compile time analysis to avoid unnecessary peeling/branching.
From implementation's point of view, vectorizer needs to update induction variables and iteration bounds along with control flow changes.  Unfortunately, it also becomes much harder to follow because slpeel_* functions updates SSA by itself, rather than using update_ssa interface.  This patch tries to factor out SSA/IV/Niter_bound changes from CFG changes.  This should make the implementation easier to read, and I think it maybe a step forward to replace slpeel_* functions with generic GIMPLE loop copy interfaces as Richard suggested.

Thanks,
bin

2016-09-01  Bin Cheng  <bin.cheng@arm.com>

	* tree-vect-loop-manip.c (adjust_vec_debug_stmts): Don't release
	adjust_vec automatically.
	(slpeel_add_loop_guard): Remove param cond_expr_stmt_list.  Rename
	param exit_bb to guard_to.
	(slpeel_checking_verify_cfg_after_peeling):
	(set_prologue_iterations):
	(create_lcssa_for_virtual_phi): New func which is factored out from
	slpeel_tree_peel_loop_to_edge.
	(slpeel_tree_peel_loop_to_edge):
	(iv_phi_p): New func.
	(vect_can_advance_ivs_p): Call iv_phi_p.
	(vect_update_ivs_after_vectorizer): Call iv_phi_p.  Directly insert
	new gimple stmts in basic block.
	(vect_do_peeling_for_loop_bound):
	(vect_do_peeling_for_alignment):
	(vect_gen_niters_for_prolog_loop): Rename to...
	(vect_gen_prolog_loop_niters): ...Rename from.  Change parameters and
	adjust implementation.
	(vect_update_inits_of_drs): Fix code style issue.  Convert niters to
	sizetype if necessary.
	(vect_build_loop_niters): Move to here from tree-vect-loop.c.  Change
	it to external function.
	(vect_gen_scalar_loop_niters, vect_gen_vector_loop_niters): New.
	(vect_gen_vector_loop_niters_mult_vf): New.
	(slpeel_update_phi_nodes_for_loops): New.
	(slpeel_update_phi_nodes_for_guard1): Reimplement.
	(find_guard_arg, slpeel_update_phi_nodes_for_guard2): Reimplement.
	(slpeel_update_phi_nodes_for_lcssa, vect_do_peeling): New.
	* tree-vect-loop.c (vect_build_loop_niters): Move to file
	tree-vect-loop-manip.c
	(vect_generate_tmps_on_preheader): Delete.
	(vect_transform_loop): Rename vectorization_factor to vf.  Call
	vect_do_peeling instead of vect_do_peeling-* functions.
	* tree-vectorizer.h (vect_do_peeling): New decl.
	(vect_build_loop_niters, vect_gen_vector_loop_niters): New decls.
	(vect_do_peeling_for_loop_bound): Delete.
	(vect_do_peeling_for_alignment): Delete.

[-- Attachment #2: 006-cfg-vectorizer-20160902.txt --]
[-- Type: text/plain, Size: 92432 bytes --]

diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index b749afa..e566c39 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -188,8 +188,6 @@ adjust_vec_debug_stmts (void)
       adjust_debug_stmts_now (&adjust_vec.last ());
       adjust_vec.pop ();
     }
-
-  adjust_vec.release ();
 }
 
 /* Adjust any debug stmts that referenced FROM values to use the
@@ -235,434 +233,6 @@ adjust_phi_and_debug_stmts (gimple *update_phi, edge e, tree new_def)
 			gimple_bb (update_phi));
 }
 
-
-/* Update PHI nodes for a guard of the LOOP.
-
-   Input:
-   - LOOP, GUARD_EDGE: LOOP is a loop for which we added guard code that
-        controls whether LOOP is to be executed.  GUARD_EDGE is the edge that
-        originates from the guard-bb, skips LOOP and reaches the (unique) exit
-        bb of LOOP.  This loop-exit-bb is an empty bb with one successor.
-        We denote this bb NEW_MERGE_BB because before the guard code was added
-        it had a single predecessor (the LOOP header), and now it became a merge
-        point of two paths - the path that ends with the LOOP exit-edge, and
-        the path that ends with GUARD_EDGE.
-   - NEW_EXIT_BB: New basic block that is added by this function between LOOP
-        and NEW_MERGE_BB. It is used to place loop-closed-ssa-form exit-phis.
-
-   ===> The CFG before the guard-code was added:
-        LOOP_header_bb:
-          loop_body
-          if (exit_loop) goto update_bb
-          else           goto LOOP_header_bb
-        update_bb:
-
-   ==> The CFG after the guard-code was added:
-        guard_bb:
-          if (LOOP_guard_condition) goto new_merge_bb
-          else                      goto LOOP_header_bb
-        LOOP_header_bb:
-          loop_body
-          if (exit_loop_condition) goto new_merge_bb
-          else                     goto LOOP_header_bb
-        new_merge_bb:
-          goto update_bb
-        update_bb:
-
-   ==> The CFG after this function:
-        guard_bb:
-          if (LOOP_guard_condition) goto new_merge_bb
-          else                      goto LOOP_header_bb
-        LOOP_header_bb:
-          loop_body
-          if (exit_loop_condition) goto new_exit_bb
-          else                     goto LOOP_header_bb
-        new_exit_bb:
-        new_merge_bb:
-          goto update_bb
-        update_bb:
-
-   This function:
-   1. creates and updates the relevant phi nodes to account for the new
-      incoming edge (GUARD_EDGE) into NEW_MERGE_BB. This involves:
-      1.1. Create phi nodes at NEW_MERGE_BB.
-      1.2. Update the phi nodes at the successor of NEW_MERGE_BB (denoted
-           UPDATE_BB).  UPDATE_BB was the exit-bb of LOOP before NEW_MERGE_BB
-   2. preserves loop-closed-ssa-form by creating the required phi nodes
-      at the exit of LOOP (i.e, in NEW_EXIT_BB).
-
-   There are two flavors to this function:
-
-   slpeel_update_phi_nodes_for_guard1:
-     Here the guard controls whether we enter or skip LOOP, where LOOP is a
-     prolog_loop (loop1 below), and the new phis created in NEW_MERGE_BB are
-     for variables that have phis in the loop header.
-
-   slpeel_update_phi_nodes_for_guard2:
-     Here the guard controls whether we enter or skip LOOP, where LOOP is an
-     epilog_loop (loop2 below), and the new phis created in NEW_MERGE_BB are
-     for variables that have phis in the loop exit.
-
-   I.E., the overall structure is:
-
-        loop1_preheader_bb:
-                guard1 (goto loop1/merge1_bb)
-        loop1
-        loop1_exit_bb:
-                guard2 (goto merge1_bb/merge2_bb)
-        merge1_bb
-        loop2
-        loop2_exit_bb
-        merge2_bb
-        next_bb
-
-   slpeel_update_phi_nodes_for_guard1 takes care of creating phis in
-   loop1_exit_bb and merge1_bb. These are entry phis (phis for the vars
-   that have phis in loop1->header).
-
-   slpeel_update_phi_nodes_for_guard2 takes care of creating phis in
-   loop2_exit_bb and merge2_bb. These are exit phis (phis for the vars
-   that have phis in next_bb). It also adds some of these phis to
-   loop1_exit_bb.
-
-   slpeel_update_phi_nodes_for_guard1 is always called before
-   slpeel_update_phi_nodes_for_guard2. They are both needed in order
-   to create correct data-flow and loop-closed-ssa-form.
-
-   Generally slpeel_update_phi_nodes_for_guard1 creates phis for variables
-   that change between iterations of a loop (and therefore have a phi-node
-   at the loop entry), whereas slpeel_update_phi_nodes_for_guard2 creates
-   phis for variables that are used out of the loop (and therefore have
-   loop-closed exit phis). Some variables may be both updated between
-   iterations and used after the loop. This is why in loop1_exit_bb we
-   may need both entry_phis (created by slpeel_update_phi_nodes_for_guard1)
-   and exit phis (created by slpeel_update_phi_nodes_for_guard2).
-
-   - IS_NEW_LOOP: if IS_NEW_LOOP is true, then LOOP is a newly created copy of
-     an original loop. i.e., we have:
-
-           orig_loop
-           guard_bb (goto LOOP/new_merge)
-           new_loop <-- LOOP
-           new_exit
-           new_merge
-           next_bb
-
-     If IS_NEW_LOOP is false, then LOOP is an original loop, in which case we
-     have:
-
-           new_loop
-           guard_bb (goto LOOP/new_merge)
-           orig_loop <-- LOOP
-           new_exit
-           new_merge
-           next_bb
-
-     The SSA names defined in the original loop have a current
-     reaching definition that records the corresponding new ssa-name
-     used in the new duplicated loop copy.
-  */
-
-/* Function slpeel_update_phi_nodes_for_guard1
-
-   Input:
-   - GUARD_EDGE, LOOP, IS_NEW_LOOP, NEW_EXIT_BB - as explained above.
-   - DEFS - a bitmap of ssa names to mark new names for which we recorded
-            information.
-
-   In the context of the overall structure, we have:
-
-        loop1_preheader_bb:
-                guard1 (goto loop1/merge1_bb)
-LOOP->  loop1
-        loop1_exit_bb:
-                guard2 (goto merge1_bb/merge2_bb)
-        merge1_bb
-        loop2
-        loop2_exit_bb
-        merge2_bb
-        next_bb
-
-   For each name updated between loop iterations (i.e - for each name that has
-   an entry (loop-header) phi in LOOP) we create a new phi in:
-   1. merge1_bb (to account for the edge from guard1)
-   2. loop1_exit_bb (an exit-phi to keep LOOP in loop-closed form)
-*/
-
-static void
-slpeel_update_phi_nodes_for_guard1 (edge guard_edge, struct loop *loop,
-                                    bool is_new_loop, basic_block *new_exit_bb)
-{
-  gphi *orig_phi, *new_phi;
-  gphi *update_phi, *update_phi2;
-  tree guard_arg, loop_arg;
-  basic_block new_merge_bb = guard_edge->dest;
-  edge e = EDGE_SUCC (new_merge_bb, 0);
-  basic_block update_bb = e->dest;
-  basic_block orig_bb = loop->header;
-  edge new_exit_e;
-  tree current_new_name;
-  gphi_iterator gsi_orig, gsi_update;
-
-  /* Create new bb between loop and new_merge_bb.  */
-  *new_exit_bb = split_edge (single_exit (loop));
-
-  new_exit_e = EDGE_SUCC (*new_exit_bb, 0);
-
-  for (gsi_orig = gsi_start_phis (orig_bb),
-       gsi_update = gsi_start_phis (update_bb);
-       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
-       gsi_next (&gsi_orig), gsi_next (&gsi_update))
-    {
-      source_location loop_locus, guard_locus;
-      tree new_res;
-      orig_phi = gsi_orig.phi ();
-      update_phi = gsi_update.phi ();
-
-      /** 1. Handle new-merge-point phis  **/
-
-      /* 1.1. Generate new phi node in NEW_MERGE_BB:  */
-      new_res = copy_ssa_name (PHI_RESULT (orig_phi));
-      new_phi = create_phi_node (new_res, new_merge_bb);
-
-      /* 1.2. NEW_MERGE_BB has two incoming edges: GUARD_EDGE and the exit-edge
-            of LOOP. Set the two phi args in NEW_PHI for these edges:  */
-      loop_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, EDGE_SUCC (loop->latch, 0));
-      loop_locus = gimple_phi_arg_location_from_edge (orig_phi,
-						      EDGE_SUCC (loop->latch,
-								 0));
-      guard_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, loop_preheader_edge (loop));
-      guard_locus
-	= gimple_phi_arg_location_from_edge (orig_phi,
-					     loop_preheader_edge (loop));
-
-      add_phi_arg (new_phi, loop_arg, new_exit_e, loop_locus);
-      add_phi_arg (new_phi, guard_arg, guard_edge, guard_locus);
-
-      /* 1.3. Update phi in successor block.  */
-      gcc_assert (PHI_ARG_DEF_FROM_EDGE (update_phi, e) == loop_arg
-                  || PHI_ARG_DEF_FROM_EDGE (update_phi, e) == guard_arg);
-      adjust_phi_and_debug_stmts (update_phi, e, PHI_RESULT (new_phi));
-      update_phi2 = new_phi;
-
-
-      /** 2. Handle loop-closed-ssa-form phis  **/
-
-      if (virtual_operand_p (PHI_RESULT (orig_phi)))
-	continue;
-
-      /* 2.1. Generate new phi node in NEW_EXIT_BB:  */
-      new_res = copy_ssa_name (PHI_RESULT (orig_phi));
-      new_phi = create_phi_node (new_res, *new_exit_bb);
-
-      /* 2.2. NEW_EXIT_BB has one incoming edge: the exit-edge of the loop.  */
-      add_phi_arg (new_phi, loop_arg, single_exit (loop), loop_locus);
-
-      /* 2.3. Update phi in successor of NEW_EXIT_BB:  */
-      gcc_assert (PHI_ARG_DEF_FROM_EDGE (update_phi2, new_exit_e) == loop_arg);
-      adjust_phi_and_debug_stmts (update_phi2, new_exit_e,
-				  PHI_RESULT (new_phi));
-
-      /* 2.4. Record the newly created name with set_current_def.
-         We want to find a name such that
-                name = get_current_def (orig_loop_name)
-         and to set its current definition as follows:
-                set_current_def (name, new_phi_name)
-
-         If LOOP is a new loop then loop_arg is already the name we're
-         looking for. If LOOP is the original loop, then loop_arg is
-         the orig_loop_name and the relevant name is recorded in its
-         current reaching definition.  */
-      if (is_new_loop)
-        current_new_name = loop_arg;
-      else
-        {
-          current_new_name = get_current_def (loop_arg);
-	  /* current_def is not available only if the variable does not
-	     change inside the loop, in which case we also don't care
-	     about recording a current_def for it because we won't be
-	     trying to create loop-exit-phis for it.  */
-	  if (!current_new_name)
-	    continue;
-        }
-      tree new_name = get_current_def (current_new_name);
-      /* Because of peeled_chrec optimization it is possible that we have
-	 set this earlier.  Verify the PHI has the same value.  */
-      if (new_name)
-	{
-	  gimple *phi = SSA_NAME_DEF_STMT (new_name);
-	  gcc_assert (gimple_code (phi) == GIMPLE_PHI
-		      && gimple_bb (phi) == *new_exit_bb
-		      && (PHI_ARG_DEF_FROM_EDGE (phi, single_exit (loop))
-			  == loop_arg));
-	  continue;
-	}
-
-      set_current_def (current_new_name, PHI_RESULT (new_phi));
-    }
-}
-
-
-/* Function slpeel_update_phi_nodes_for_guard2
-
-   Input:
-   - GUARD_EDGE, LOOP, IS_NEW_LOOP, NEW_EXIT_BB - as explained above.
-
-   In the context of the overall structure, we have:
-
-        loop1_preheader_bb:
-                guard1 (goto loop1/merge1_bb)
-        loop1
-        loop1_exit_bb:
-                guard2 (goto merge1_bb/merge2_bb)
-        merge1_bb
-LOOP->  loop2
-        loop2_exit_bb
-        merge2_bb
-        next_bb
-
-   For each name used out side the loop (i.e - for each name that has an exit
-   phi in next_bb) we create a new phi in:
-   1. merge2_bb (to account for the edge from guard_bb)
-   2. loop2_exit_bb (an exit-phi to keep LOOP in loop-closed form)
-   3. guard2 bb (an exit phi to keep the preceding loop in loop-closed form),
-      if needed (if it wasn't handled by slpeel_update_phis_nodes_for_phi1).
-*/
-
-static void
-slpeel_update_phi_nodes_for_guard2 (edge guard_edge, struct loop *loop,
-                                    bool is_new_loop, basic_block *new_exit_bb)
-{
-  gphi *orig_phi, *new_phi;
-  gphi *update_phi, *update_phi2;
-  tree guard_arg, loop_arg;
-  basic_block new_merge_bb = guard_edge->dest;
-  edge e = EDGE_SUCC (new_merge_bb, 0);
-  basic_block update_bb = e->dest;
-  edge new_exit_e;
-  tree orig_def, orig_def_new_name;
-  tree new_name, new_name2;
-  tree arg;
-  gphi_iterator gsi;
-
-  /* Create new bb between loop and new_merge_bb.  */
-  *new_exit_bb = split_edge (single_exit (loop));
-
-  new_exit_e = EDGE_SUCC (*new_exit_bb, 0);
-
-  for (gsi = gsi_start_phis (update_bb); !gsi_end_p (gsi); gsi_next (&gsi))
-    {
-      tree new_res;
-      update_phi = gsi.phi ();
-      orig_phi = update_phi;
-      orig_def = PHI_ARG_DEF_FROM_EDGE (orig_phi, e);
-      /* This loop-closed-phi actually doesn't represent a use
-         out of the loop - the phi arg is a constant.  */
-      if (TREE_CODE (orig_def) != SSA_NAME)
-        continue;
-      orig_def_new_name = get_current_def (orig_def);
-      arg = NULL_TREE;
-
-      /** 1. Handle new-merge-point phis  **/
-
-      /* 1.1. Generate new phi node in NEW_MERGE_BB:  */
-      new_res = copy_ssa_name (PHI_RESULT (orig_phi));
-      new_phi = create_phi_node (new_res, new_merge_bb);
-
-      /* 1.2. NEW_MERGE_BB has two incoming edges: GUARD_EDGE and the exit-edge
-            of LOOP. Set the two PHI args in NEW_PHI for these edges:  */
-      new_name = orig_def;
-      new_name2 = NULL_TREE;
-      if (orig_def_new_name)
-        {
-          new_name = orig_def_new_name;
-	  /* Some variables have both loop-entry-phis and loop-exit-phis.
-	     Such variables were given yet newer names by phis placed in
-	     guard_bb by slpeel_update_phi_nodes_for_guard1. I.e:
-	     new_name2 = get_current_def (get_current_def (orig_name)).  */
-          new_name2 = get_current_def (new_name);
-        }
-
-      if (is_new_loop)
-        {
-          guard_arg = orig_def;
-          loop_arg = new_name;
-        }
-      else
-        {
-          guard_arg = new_name;
-          loop_arg = orig_def;
-        }
-      if (new_name2)
-        guard_arg = new_name2;
-
-      add_phi_arg (new_phi, loop_arg, new_exit_e, UNKNOWN_LOCATION);
-      add_phi_arg (new_phi, guard_arg, guard_edge, UNKNOWN_LOCATION);
-
-      /* 1.3. Update phi in successor block.  */
-      gcc_assert (PHI_ARG_DEF_FROM_EDGE (update_phi, e) == orig_def);
-      adjust_phi_and_debug_stmts (update_phi, e, PHI_RESULT (new_phi));
-      update_phi2 = new_phi;
-
-
-      /** 2. Handle loop-closed-ssa-form phis  **/
-
-      if (virtual_operand_p (PHI_RESULT (orig_phi)))
-	continue;
-
-      /* 2.1. Generate new phi node in NEW_EXIT_BB:  */
-      new_res = copy_ssa_name (PHI_RESULT (orig_phi));
-      new_phi = create_phi_node (new_res, *new_exit_bb);
-
-      /* 2.2. NEW_EXIT_BB has one incoming edge: the exit-edge of the loop.  */
-      add_phi_arg (new_phi, loop_arg, single_exit (loop), UNKNOWN_LOCATION);
-
-      /* 2.3. Update phi in successor of NEW_EXIT_BB:  */
-      gcc_assert (PHI_ARG_DEF_FROM_EDGE (update_phi2, new_exit_e) == loop_arg);
-      adjust_phi_and_debug_stmts (update_phi2, new_exit_e,
-				  PHI_RESULT (new_phi));
-
-
-      /** 3. Handle loop-closed-ssa-form phis for first loop  **/
-
-      /* 3.1. Find the relevant names that need an exit-phi in
-	 GUARD_BB, i.e. names for which
-	 slpeel_update_phi_nodes_for_guard1 had not already created a
-	 phi node. This is the case for names that are used outside
-	 the loop (and therefore need an exit phi) but are not updated
-	 across loop iterations (and therefore don't have a
-	 loop-header-phi).
-
-	 slpeel_update_phi_nodes_for_guard1 is responsible for
-	 creating loop-exit phis in GUARD_BB for names that have a
-	 loop-header-phi.  When such a phi is created we also record
-	 the new name in its current definition.  If this new name
-	 exists, then guard_arg was set to this new name (see 1.2
-	 above).  Therefore, if guard_arg is not this new name, this
-	 is an indication that an exit-phi in GUARD_BB was not yet
-	 created, so we take care of it here.  */
-      if (guard_arg == new_name2)
-	continue;
-      arg = guard_arg;
-
-      /* 3.2. Generate new phi node in GUARD_BB:  */
-      new_res = copy_ssa_name (PHI_RESULT (orig_phi));
-      new_phi = create_phi_node (new_res, guard_edge->src);
-
-      /* 3.3. GUARD_BB has one incoming edge:  */
-      gcc_assert (EDGE_COUNT (guard_edge->src->preds) == 1);
-      add_phi_arg (new_phi, arg, EDGE_PRED (guard_edge->src, 0),
-		   UNKNOWN_LOCATION);
-
-      /* 3.4. Update phi in successor of GUARD_BB:  */
-      gcc_assert (PHI_ARG_DEF_FROM_EDGE (update_phi2, guard_edge)
-                                                                == guard_arg);
-      adjust_phi_and_debug_stmts (update_phi2, guard_edge,
-				  PHI_RESULT (new_phi));
-    }
-}
-
-
 /* Make the LOOP iterate NITERS times. This is done by adding a new IV
    that starts at zero, increases by one and its limit is NITERS.
 
@@ -942,15 +512,14 @@ slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop,
 }
 
 
-/* Given the condition statement COND, put it as the last statement
-   of GUARD_BB; EXIT_BB is the basic block to skip the loop;
-   Assumes that this is the single exit of the guarded loop.
-   Returns the skip edge, inserts new stmts on the COND_EXPR_STMT_LIST.  */
+/* Given the condition expression COND, put it as the last statement of
+   GUARD_BB; set both edges' probability; set dominator of GUARD_TO to
+   DOM_BB; return the skip edge.  GUARD_TO is the target basic block to
+   skip the loop.  PROBABILITY is the skip edge's probability.  */
 
 static edge
 slpeel_add_loop_guard (basic_block guard_bb, tree cond,
-		       gimple_seq cond_expr_stmt_list,
-		       basic_block exit_bb, basic_block dom_bb,
+		       basic_block guard_to, basic_block dom_bb,
 		       int probability)
 {
   gimple_stmt_iterator gsi;
@@ -966,23 +535,21 @@ slpeel_add_loop_guard (basic_block guard_bb, tree cond,
   cond = force_gimple_operand_1 (cond, &gimplify_stmt_list, is_gimple_condexpr,
 				 NULL_TREE);
   if (gimplify_stmt_list)
-    gimple_seq_add_seq (&cond_expr_stmt_list, gimplify_stmt_list);
-  cond_stmt = gimple_build_cond_from_tree (cond, NULL_TREE, NULL_TREE);
-  if (cond_expr_stmt_list)
-    gsi_insert_seq_after (&gsi, cond_expr_stmt_list, GSI_NEW_STMT);
+    gsi_insert_seq_after (&gsi, gimplify_stmt_list, GSI_NEW_STMT);
 
+  cond_stmt = gimple_build_cond_from_tree (cond, NULL_TREE, NULL_TREE);
   gsi = gsi_last_bb (guard_bb);
   gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
 
   /* Add new edge to connect guard block to the merge/loop-exit block.  */
-  new_e = make_edge (guard_bb, exit_bb, EDGE_TRUE_VALUE);
+  new_e = make_edge (guard_bb, guard_to, EDGE_TRUE_VALUE);
 
   new_e->count = guard_bb->count;
   new_e->probability = probability;
   new_e->count = apply_probability (enter_e->count, probability);
   enter_e->count -= new_e->count;
   enter_e->probability = inverse_probability (probability);
-  set_immediate_dominator (CDI_DOMINATORS, exit_bb, dom_bb);
+  set_immediate_dominator (CDI_DOMINATORS, guard_to, dom_bb);
   return new_e;
 }
 
@@ -1018,217 +585,19 @@ slpeel_can_duplicate_loop_p (const struct loop *loop, const_edge e)
   return true;
 }
 
-static void
-slpeel_checking_verify_cfg_after_peeling (struct loop *first_loop,
-					  struct loop *second_loop)
-{
-  if (!flag_checking)
-    return;
-
-  basic_block loop1_exit_bb = single_exit (first_loop)->dest;
-  basic_block loop2_entry_bb = loop_preheader_edge (second_loop)->src;
-  basic_block loop1_entry_bb = loop_preheader_edge (first_loop)->src;
-
-  /* A guard that controls whether the second_loop is to be executed or skipped
-     is placed in first_loop->exit.  first_loop->exit therefore has two
-     successors - one is the preheader of second_loop, and the other is a bb
-     after second_loop.
-   */
-  gcc_assert (EDGE_COUNT (loop1_exit_bb->succs) == 2);
-
-  /* 1. Verify that one of the successors of first_loop->exit is the preheader
-        of second_loop.  */
-
-  /* The preheader of new_loop is expected to have two predecessors:
-     first_loop->exit and the block that precedes first_loop.  */
-
-  gcc_assert (EDGE_COUNT (loop2_entry_bb->preds) == 2
-              && ((EDGE_PRED (loop2_entry_bb, 0)->src == loop1_exit_bb
-                   && EDGE_PRED (loop2_entry_bb, 1)->src == loop1_entry_bb)
-               || (EDGE_PRED (loop2_entry_bb, 1)->src ==  loop1_exit_bb
-                   && EDGE_PRED (loop2_entry_bb, 0)->src == loop1_entry_bb)));
-
-  /* Verify that the other successor of first_loop->exit is after the
-     second_loop.  */
-  /* TODO */
-}
-
-/* If the run time cost model check determines that vectorization is
-   not profitable and hence scalar loop should be generated then set
-   FIRST_NITERS to prologue peeled iterations. This will allow all the
-   iterations to be executed in the prologue peeled scalar loop.  */
+/* If the loop has a virtual PHI, but exit bb doesn't, create a virtual PHI
+   in the exit bb and rename all the uses after the loop.  This simplifies
+   the *guard[12] routines, which assume loop closed SSA form for all PHIs
+   (but normally loop closed SSA form doesn't require virtual PHIs to be
+   in the same form).  Doing this early simplifies the checking what
+   uses should be renamed.  */
 
 static void
-set_prologue_iterations (basic_block bb_before_first_loop,
-			 tree *first_niters,
-			 struct loop *loop,
-			 unsigned int th,
-			 int probability)
-{
-  edge e;
-  basic_block cond_bb, then_bb;
-  tree var, prologue_after_cost_adjust_name;
-  gimple_stmt_iterator gsi;
-  gphi *newphi;
-  edge e_true, e_false, e_fallthru;
-  gcond *cond_stmt;
-  gimple_seq stmts = NULL;
-  tree cost_pre_condition = NULL_TREE;
-  tree scalar_loop_iters =
-    unshare_expr (LOOP_VINFO_NITERS_UNCHANGED (loop_vec_info_for_loop (loop)));
-
-  e = single_pred_edge (bb_before_first_loop);
-  cond_bb = split_edge (e);
-
-  e = single_pred_edge (bb_before_first_loop);
-  then_bb = split_edge (e);
-  set_immediate_dominator (CDI_DOMINATORS, then_bb, cond_bb);
-
-  e_false = make_single_succ_edge (cond_bb, bb_before_first_loop,
-				   EDGE_FALSE_VALUE);
-  set_immediate_dominator (CDI_DOMINATORS, bb_before_first_loop, cond_bb);
-
-  e_true = EDGE_PRED (then_bb, 0);
-  e_true->flags &= ~EDGE_FALLTHRU;
-  e_true->flags |= EDGE_TRUE_VALUE;
-
-  e_true->probability = probability;
-  e_false->probability = inverse_probability (probability);
-  e_true->count = apply_probability (cond_bb->count, probability);
-  e_false->count = cond_bb->count - e_true->count;
-  then_bb->frequency = EDGE_FREQUENCY (e_true);
-  then_bb->count = e_true->count;
-
-  e_fallthru = EDGE_SUCC (then_bb, 0);
-  e_fallthru->count = then_bb->count;
-
-  gsi = gsi_last_bb (cond_bb);
-  cost_pre_condition =
-    fold_build2 (LE_EXPR, boolean_type_node, scalar_loop_iters,
-	         build_int_cst (TREE_TYPE (scalar_loop_iters), th));
-  cost_pre_condition =
-    force_gimple_operand_gsi_1 (&gsi, cost_pre_condition, is_gimple_condexpr,
-				NULL_TREE, false, GSI_CONTINUE_LINKING);
-  cond_stmt = gimple_build_cond_from_tree (cost_pre_condition,
-					   NULL_TREE, NULL_TREE);
-  gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
-
-  var = create_tmp_var (TREE_TYPE (scalar_loop_iters),
-			"prologue_after_cost_adjust");
-  prologue_after_cost_adjust_name =
-    force_gimple_operand (scalar_loop_iters, &stmts, false, var);
-
-  gsi = gsi_last_bb (then_bb);
-  if (stmts)
-    gsi_insert_seq_after (&gsi, stmts, GSI_NEW_STMT);
-
-  newphi = create_phi_node (var, bb_before_first_loop);
-  add_phi_arg (newphi, prologue_after_cost_adjust_name, e_fallthru,
-	       UNKNOWN_LOCATION);
-  add_phi_arg (newphi, *first_niters, e_false, UNKNOWN_LOCATION);
-
-  *first_niters = PHI_RESULT (newphi);
-}
-
-/* Function slpeel_tree_peel_loop_to_edge.
-
-   Peel the first (last) iterations of LOOP into a new prolog (epilog) loop
-   that is placed on the entry (exit) edge E of LOOP. After this transformation
-   we have two loops one after the other - first-loop iterates FIRST_NITERS
-   times, and second-loop iterates the remainder NITERS - FIRST_NITERS times.
-   If the cost model indicates that it is profitable to emit a scalar
-   loop instead of the vector one, then the prolog (epilog) loop will iterate
-   for the entire unchanged scalar iterations of the loop.
-
-   Input:
-   - LOOP: the loop to be peeled.
-   - SCALAR_LOOP: if non-NULL, the alternate loop from which basic blocks
-	should be copied.
-   - E: the exit or entry edge of LOOP.
-        If it is the entry edge, we peel the first iterations of LOOP. In this
-        case first-loop is LOOP, and second-loop is the newly created loop.
-        If it is the exit edge, we peel the last iterations of LOOP. In this
-        case, first-loop is the newly created loop, and second-loop is LOOP.
-   - NITERS: the number of iterations that LOOP iterates.
-   - FIRST_NITERS: the number of iterations that the first-loop should iterate.
-   - UPDATE_FIRST_LOOP_COUNT:  specified whether this function is responsible
-        for updating the loop bound of the first-loop to FIRST_NITERS.  If it
-        is false, the caller of this function may want to take care of this
-        (this can be useful if we don't want new stmts added to first-loop).
-   - TH: cost model profitability threshold of iterations for vectorization.
-   - CHECK_PROFITABILITY: specify whether cost model check has not occurred
-                          during versioning and hence needs to occur during
-			  prologue generation or whether cost model check
-			  has not occurred during prologue generation and hence
-			  needs to occur during epilogue generation.
-   - BOUND1 is the upper bound on number of iterations of the first loop (if known)
-   - BOUND2 is the upper bound on number of iterations of the second loop (if known)
-
-
-   Output:
-   The function returns a pointer to the new loop-copy, or NULL if it failed
-   to perform the transformation.
-
-   The function generates two if-then-else guards: one before the first loop,
-   and the other before the second loop:
-   The first guard is:
-     if (FIRST_NITERS == 0) then skip the first loop,
-     and go directly to the second loop.
-   The second guard is:
-     if (FIRST_NITERS == NITERS) then skip the second loop.
-
-   If the optional COND_EXPR and COND_EXPR_STMT_LIST arguments are given
-   then the generated condition is combined with COND_EXPR and the
-   statements in COND_EXPR_STMT_LIST are emitted together with it.
-
-   FORNOW only simple loops are supported (see slpeel_can_duplicate_loop_p).
-   FORNOW the resulting code will not be in loop-closed-ssa form.
-*/
-
-static struct loop *
-slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop,
-			       edge e, tree *first_niters,
-			       tree niters, bool update_first_loop_count,
-			       unsigned int th, bool check_profitability,
-			       tree cond_expr, gimple_seq cond_expr_stmt_list,
-			       int bound1, int bound2)
+create_lcssa_for_virtual_phi (struct loop *loop)
 {
-  struct loop *new_loop = NULL, *first_loop, *second_loop;
-  edge skip_e;
-  tree pre_condition = NULL_TREE;
-  basic_block bb_before_second_loop, bb_after_second_loop;
-  basic_block bb_before_first_loop;
-  basic_block bb_between_loops;
-  basic_block new_exit_bb;
   gphi_iterator gsi;
   edge exit_e = single_exit (loop);
-  source_location loop_loc;
-  /* There are many aspects to how likely the first loop is going to be executed.
-     Without histogram we can't really do good job.  Simply set it to
-     2/3, so the first loop is not reordered to the end of function and
-     the hot path through stays short.  */
-  int first_guard_probability = 2 * REG_BR_PROB_BASE / 3;
-  int second_guard_probability = 2 * REG_BR_PROB_BASE / 3;
-  int probability_of_second_loop;
-
-  if (!slpeel_can_duplicate_loop_p (loop, e))
-    return NULL;
 
-  /* We might have a queued need to update virtual SSA form.  As we
-     delete the update SSA machinery below after doing a regular
-     incremental SSA update during loop copying make sure we don't
-     lose that fact.
-     ???  Needing to update virtual SSA form by renaming is unfortunate
-     but not all of the vectorizer code inserting new loads / stores
-     properly assigns virtual operands to those statements.  */
-  update_ssa (TODO_update_ssa_only_virtuals);
- 
-  /* If the loop has a virtual PHI, but exit bb doesn't, create a virtual PHI
-     in the exit bb and rename all the uses after the loop.  This simplifies
-     the *guard[12] routines, which assume loop closed SSA form for all PHIs
-     (but normally loop closed SSA form doesn't require virtual PHIs to be
-     in the same form).  Doing this early simplifies the checking what
-     uses should be renamed.  */
   for (gsi = gsi_start_phis (loop->header); !gsi_end_p (gsi); gsi_next (&gsi))
     if (virtual_operand_p (gimple_phi_result (gsi_stmt (gsi))))
       {
@@ -1257,247 +626,6 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop,
 	break;
       }
 
-  /* 1. Generate a copy of LOOP and put it on E (E is the entry/exit of LOOP).
-        Resulting CFG would be:
-
-        first_loop:
-        do {
-        } while ...
-
-        second_loop:
-        do {
-        } while ...
-
-        orig_exit_bb:
-   */
-
-  if (!(new_loop = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop,
-							   e)))
-    {
-      loop_loc = find_loop_location (loop);
-      dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc,
-                       "tree_duplicate_loop_to_edge_cfg failed.\n");
-      return NULL;
-    }
-
-  if (MAY_HAVE_DEBUG_STMTS)
-    {
-      gcc_assert (!adjust_vec.exists ());
-      adjust_vec.create (32);
-    }
-
-  if (e == exit_e)
-    {
-      /* NEW_LOOP was placed after LOOP.  */
-      first_loop = loop;
-      second_loop = new_loop;
-    }
-  else
-    {
-      /* NEW_LOOP was placed before LOOP.  */
-      first_loop = new_loop;
-      second_loop = loop;
-    }
-
-  /* 2.  Add the guard code in one of the following ways:
-
-     2.a Add the guard that controls whether the first loop is executed.
-         This occurs when this function is invoked for prologue or epilogue
-	 generation and when the cost model check can be done at compile time.
-
-         Resulting CFG would be:
-
-         bb_before_first_loop:
-         if (FIRST_NITERS == 0) GOTO bb_before_second_loop
-                                GOTO first-loop
-
-         first_loop:
-         do {
-         } while ...
-
-         bb_before_second_loop:
-
-         second_loop:
-         do {
-         } while ...
-
-         orig_exit_bb:
-
-     2.b Add the cost model check that allows the prologue
-         to iterate for the entire unchanged scalar
-         iterations of the loop in the event that the cost
-         model indicates that the scalar loop is more
-         profitable than the vector one. This occurs when
-	 this function is invoked for prologue generation
-	 and the cost model check needs to be done at run
-	 time.
-
-         Resulting CFG after prologue peeling would be:
-
-         if (scalar_loop_iterations <= th)
-           FIRST_NITERS = scalar_loop_iterations
-
-         bb_before_first_loop:
-         if (FIRST_NITERS == 0) GOTO bb_before_second_loop
-                                GOTO first-loop
-
-         first_loop:
-         do {
-         } while ...
-
-         bb_before_second_loop:
-
-         second_loop:
-         do {
-         } while ...
-
-         orig_exit_bb:
-
-     2.c Add the cost model check that allows the epilogue
-         to iterate for the entire unchanged scalar
-         iterations of the loop in the event that the cost
-         model indicates that the scalar loop is more
-         profitable than the vector one. This occurs when
-	 this function is invoked for epilogue generation
-	 and the cost model check needs to be done at run
-	 time.  This check is combined with any pre-existing
-	 check in COND_EXPR to avoid versioning.
-
-         Resulting CFG after prologue peeling would be:
-
-         bb_before_first_loop:
-         if ((scalar_loop_iterations <= th)
-             ||
-             FIRST_NITERS == 0) GOTO bb_before_second_loop
-                                GOTO first-loop
-
-         first_loop:
-         do {
-         } while ...
-
-         bb_before_second_loop:
-
-         second_loop:
-         do {
-         } while ...
-
-         orig_exit_bb:
-  */
-
-  bb_before_first_loop = split_edge (loop_preheader_edge (first_loop));
-  /* Loop copying insterted a forwarder block for us here.  */
-  bb_before_second_loop = single_exit (first_loop)->dest;
-
-  probability_of_second_loop = (inverse_probability (first_guard_probability)
-			        + combine_probabilities (second_guard_probability,
-                                                         first_guard_probability));
-  /* Theoretically preheader edge of first loop and exit edge should have
-     same frequencies.  Loop exit probablities are however easy to get wrong.
-     It is safer to copy value from original loop entry.  */
-  bb_before_second_loop->frequency
-     = combine_probabilities (bb_before_first_loop->frequency,
-                              probability_of_second_loop);
-  bb_before_second_loop->count
-     = apply_probability (bb_before_first_loop->count,
-			  probability_of_second_loop);
-  single_succ_edge (bb_before_second_loop)->count
-     = bb_before_second_loop->count;
-
-  /* Epilogue peeling.  */
-  if (!update_first_loop_count)
-    {
-      loop_vec_info loop_vinfo = loop_vec_info_for_loop (loop);
-      tree scalar_loop_iters = LOOP_VINFO_NITERSM1 (loop_vinfo);
-      unsigned limit = LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1;
-      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
-	limit = limit + 1;
-      if (check_profitability
-	  && th > limit)
-	limit = th;
-      pre_condition =
-	fold_build2 (LT_EXPR, boolean_type_node, scalar_loop_iters,
-		     build_int_cst (TREE_TYPE (scalar_loop_iters), limit));
-      if (cond_expr)
-	{
-	  pre_condition =
-	    fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
-			 pre_condition,
-			 fold_build1 (TRUTH_NOT_EXPR, boolean_type_node,
-				      cond_expr));
-	}
-    }
-
-  /* Prologue peeling.  */
-  else
-    {
-      if (check_profitability)
-	set_prologue_iterations (bb_before_first_loop, first_niters,
-				 loop, th, first_guard_probability);
-
-      pre_condition =
-	fold_build2 (LE_EXPR, boolean_type_node, *first_niters,
-		     build_int_cst (TREE_TYPE (*first_niters), 0));
-    }
-
-  skip_e = slpeel_add_loop_guard (bb_before_first_loop, pre_condition,
-				  cond_expr_stmt_list,
-                                  bb_before_second_loop, bb_before_first_loop,
-				  inverse_probability (first_guard_probability));
-  scale_loop_profile (first_loop, first_guard_probability,
-		      check_profitability && (int)th > bound1 ? th : bound1);
-  slpeel_update_phi_nodes_for_guard1 (skip_e, first_loop,
-				      first_loop == new_loop,
-				      &new_exit_bb);
-
-
-  /* 3. Add the guard that controls whether the second loop is executed.
-        Resulting CFG would be:
-
-        bb_before_first_loop:
-        if (FIRST_NITERS == 0) GOTO bb_before_second_loop (skip first loop)
-                               GOTO first-loop
-
-        first_loop:
-        do {
-        } while ...
-
-        bb_between_loops:
-        if (FIRST_NITERS == NITERS) GOTO bb_after_second_loop (skip second loop)
-                                    GOTO bb_before_second_loop
-
-        bb_before_second_loop:
-
-        second_loop:
-        do {
-        } while ...
-
-        bb_after_second_loop:
-
-        orig_exit_bb:
-   */
-
-  bb_between_loops = new_exit_bb;
-  bb_after_second_loop = split_edge (single_exit (second_loop));
-
-  pre_condition =
-	fold_build2 (EQ_EXPR, boolean_type_node, *first_niters, niters);
-  skip_e = slpeel_add_loop_guard (bb_between_loops, pre_condition, NULL,
-                                  bb_after_second_loop, bb_before_first_loop,
-				  inverse_probability (second_guard_probability));
-  scale_loop_profile (second_loop, probability_of_second_loop, bound2);
-  slpeel_update_phi_nodes_for_guard2 (skip_e, second_loop,
-                                     second_loop == new_loop, &new_exit_bb);
-
-  /* 4. Make first-loop iterate FIRST_NITERS times, if requested.
-   */
-  if (update_first_loop_count)
-    slpeel_make_loop_iterate_ntimes (first_loop, *first_niters);
-
-  delete_update_ssa ();
-
-  adjust_vec_debug_stmts ();
-
-  return new_loop;
 }
 
 /* Function vect_get_loop_location.
@@ -1541,6 +669,22 @@ find_loop_location (struct loop *loop)
   return UNKNOWN_LOCATION;
 }
 
+/* Return true if PHI defines an IV of the loop to be vectorized.  */
+
+static bool
+iv_phi_p (gphi *phi)
+{
+  if (virtual_operand_p (PHI_RESULT (phi)))
+    return false;
+
+  stmt_vec_info stmt_info = vinfo_for_stmt (phi);
+  gcc_assert (stmt_info != NULL);
+  if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
+      || STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
+    return false;
+
+  return true;
+}
 
 /* Function vect_can_advance_ivs_p
 
@@ -1556,7 +700,6 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
 {
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block bb = loop->header;
-  gimple *phi;
   gphi_iterator gsi;
 
   /* Analyze phi functions of the loop header.  */
@@ -1567,7 +710,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
     {
       tree evolution_part;
 
-      phi = gsi.phi ();
+      gphi *phi = gsi.phi ();
       if (dump_enabled_p ())
 	{
           dump_printf_loc (MSG_NOTE, vect_location, "Analyze phi: ");
@@ -1575,26 +718,17 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
 	}
 
       /* Skip virtual phi's. The data dependences that are associated with
-         virtual defs/uses (i.e., memory accesses) are analyzed elsewhere.  */
+	 virtual defs/uses (i.e., memory accesses) are analyzed elsewhere.
 
-      if (virtual_operand_p (PHI_RESULT (phi)))
+	 Skip reduction phis.  */
+      if (!iv_phi_p (phi))
 	{
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "virtual phi. skip.\n");
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "reduc or virtual phi. skip.\n");
 	  continue;
 	}
 
-      /* Skip reduction phis.  */
-
-      if (STMT_VINFO_DEF_TYPE (vinfo_for_stmt (phi)) == vect_reduction_def)
-        {
-          if (dump_enabled_p ())
-            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "reduc phi. skip.\n");
-          continue;
-        }
-
       /* Analyze the evolution function.  */
 
       evolution_part
@@ -1676,19 +810,17 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
  */
 
 static void
-vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, tree niters,
-				  edge update_e)
+vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
+				  tree niters, edge update_e)
 {
-  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  basic_block exit_bb = single_exit (loop)->dest;
-  gphi *phi, *phi1;
   gphi_iterator gsi, gsi1;
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block update_bb = update_e->dest;
-
-  gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
+  basic_block exit_bb = single_exit (loop)->dest;
 
   /* Make sure there exists a single-predecessor exit bb:  */
   gcc_assert (single_pred_p (exit_bb));
+  gcc_assert (single_succ_edge (exit_bb) == update_e);
 
   for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
        !gsi_end_p (gsi) && !gsi_end_p (gsi1);
@@ -1699,39 +831,27 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, tree niters,
       tree type;
       tree var, ni, ni_name;
       gimple_stmt_iterator last_gsi;
-      stmt_vec_info stmt_info;
 
-      phi = gsi.phi ();
-      phi1 = gsi1.phi ();
+      gphi *phi = gsi.phi ();
+      gphi *phi1 = gsi1.phi ();
       if (dump_enabled_p ())
-        {
-          dump_printf_loc (MSG_NOTE, vect_location,
-                           "vect_update_ivs_after_vectorizer: phi: ");
+	{
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "vect_update_ivs_after_vectorizer: phi: ");
 	  dump_gimple_stmt (MSG_NOTE, TDF_SLIM, phi, 0);
-        }
+	}
 
-      /* Skip virtual phi's.  */
-      if (virtual_operand_p (PHI_RESULT (phi)))
+      /* Skip reduction and virtual phis.  */
+      if (!iv_phi_p (phi))
 	{
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "virtual phi. skip.\n");
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "reduc or virtual phi. skip.\n");
 	  continue;
 	}
 
-      /* Skip reduction phis.  */
-      stmt_info = vinfo_for_stmt (phi);
-      if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
-	  || STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
-        {
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "reduc phi. skip.\n");
-          continue;
-        }
-
       type = TREE_TYPE (gimple_phi_result (phi));
-      step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info);
+      step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (vinfo_for_stmt (phi));
       step_expr = unshare_expr (step_expr);
 
       /* FORNOW: We do not support IVs whose evolution function is a polynomial
@@ -1752,118 +872,40 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, tree niters,
       var = create_tmp_var (type, "tmp");
 
       last_gsi = gsi_last_bb (exit_bb);
-      ni_name = force_gimple_operand_gsi (&last_gsi, ni, false, var,
-					  true, GSI_SAME_STMT);
+      gimple_seq new_stmts = NULL;
+      ni_name = force_gimple_operand (ni, &new_stmts, false, var);
+      /* Exit_bb shouldn't be empty.  */
+      if (!gsi_end_p (last_gsi))
+	gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
+      else
+	gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
 
       /* Fix phi expressions in the successor bb.  */
       adjust_phi_and_debug_stmts (phi1, update_e, ni_name);
     }
 }
 
-/* Function vect_do_peeling_for_loop_bound
-
-   Peel the last iterations of the loop represented by LOOP_VINFO.
-   The peeled iterations form a new epilog loop.  Given that the loop now
-   iterates NITERS times, the new epilog loop iterates
-   NITERS % VECTORIZATION_FACTOR times.
-
-   If CHECK_PROFITABILITY is 1 then profitability check is generated
-   using TH as a cost model profitability threshold of iterations for
-   vectorization.
+/* Function vect_gen_prolog_loop_niters
 
-   The original loop will later be made to iterate
-   NITERS / VECTORIZATION_FACTOR times (this value is placed into RATIO).
-
-   COND_EXPR and COND_EXPR_STMT_LIST are combined with a new generated
-   test.  */
-
-void
-vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
-				tree ni_name, tree ratio_mult_vf_name,
-				unsigned int th, bool check_profitability)
-{
-  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  struct loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
-  struct loop *new_loop;
-  edge update_e;
-  basic_block preheader;
-  int max_iter;
-  tree cond_expr = NULL_TREE;
-  gimple_seq cond_expr_stmt_list = NULL;
-
-  if (dump_enabled_p ())
-    dump_printf_loc (MSG_NOTE, vect_location,
-                     "=== vect_do_peeling_for_loop_bound ===\n");
-
-  initialize_original_copy_tables ();
-
-  new_loop
-    = slpeel_tree_peel_loop_to_edge (loop, scalar_loop, single_exit (loop),
-				     &ratio_mult_vf_name, ni_name, false,
-				     th, check_profitability,
-				     cond_expr, cond_expr_stmt_list,
-				     0, LOOP_VINFO_VECT_FACTOR (loop_vinfo));
-  gcc_assert (new_loop);
-  slpeel_checking_verify_cfg_after_peeling (loop, new_loop);
-
-  /* A guard that controls whether the new_loop is to be executed or skipped
-     is placed in LOOP->exit.  LOOP->exit therefore has two successors - one
-     is the preheader of NEW_LOOP, where the IVs from LOOP are used.  The other
-     is a bb after NEW_LOOP, where these IVs are not used.  Find the edge that
-     is on the path where the LOOP IVs are used and need to be updated.  */
-
-  preheader = loop_preheader_edge (new_loop)->src;
-  if (EDGE_PRED (preheader, 0)->src == single_exit (loop)->dest)
-    update_e = EDGE_PRED (preheader, 0);
-  else
-    update_e = EDGE_PRED (preheader, 1);
-
-  /* Update IVs of original loop as if they were advanced
-     by ratio_mult_vf_name steps.  */
-  vect_update_ivs_after_vectorizer (loop_vinfo, ratio_mult_vf_name, update_e);
-
-  /* For vectorization factor N, we need to copy last N-1 values in epilogue
-     and this means N-2 loopback edge executions.
-
-     PEELING_FOR_GAPS works by subtracting last iteration and thus the epilogue
-     will execute at least LOOP_VINFO_VECT_FACTOR times.  */
-  max_iter = (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
-	      ? LOOP_VINFO_VECT_FACTOR (loop_vinfo) * 2
-	      : LOOP_VINFO_VECT_FACTOR (loop_vinfo)) - 2;
-  if (check_profitability)
-    max_iter = MAX (max_iter, (int) th - 1);
-  record_niter_bound (new_loop, max_iter, false, true);
-  dump_printf (MSG_NOTE,
-               "Setting upper bound of nb iterations for epilogue "
-               "loop to %d\n", max_iter);
-
-  /* After peeling we have to reset scalar evolution analyzer.  */
-  scev_reset ();
-
-  free_original_copy_tables ();
-}
-
-
-/* Function vect_gen_niters_for_prolog_loop
-
-   Set the number of iterations for the loop represented by LOOP_VINFO
-   to the minimum between LOOP_NITERS (the original iteration count of the loop)
-   and the misalignment of DR - the data reference recorded in
-   LOOP_VINFO_UNALIGNED_DR (LOOP_VINFO).  As a result, after the execution of
-   this loop, the data reference DR will refer to an aligned location.
-
-   The following computation is generated:
+   Generate the number of iterations which should be peeled as prolog for the
+   loop represented by LOOP_VINFO.  It is calculated as the misalignment of
+   DR - the data reference recorded in LOOP_VINFO_UNALIGNED_DR (LOOP_VINFO).
+   As a result, after the execution of this loop, the data reference DR will
+   refer to an aligned location.  The following computation is generated:
 
    If the misalignment of DR is known at compile time:
      addr_mis = int mis = DR_MISALIGNMENT (dr);
    Else, compute address misalignment in bytes:
      addr_mis = addr & (vectype_align - 1)
 
-   prolog_niters = min (LOOP_NITERS, ((VF - addr_mis/elem_size)&(VF-1))/step)
+   prolog_niters = ((VF - addr_mis/elem_size)&(VF-1))/step
 
    (elem_size = element type size; an element is the scalar element whose type
    is the inner type of the vectype)
 
+   The computations will be emitted at the end of BB.  We also compute and
+   store upper bound of the result in BOUND.
+
    When the step of the data-ref in the loop is not 1 (as in interleaved data
    and SLP), the number of iterations of the prolog must be divided by the step
    (which is equal to the size of interleaved group).
@@ -1875,24 +917,21 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
    use TYPE_VECTOR_SUBPARTS.  */
 
 static tree
-vect_gen_niters_for_prolog_loop (loop_vec_info loop_vinfo, tree loop_niters, int *bound)
+vect_gen_prolog_loop_niters (loop_vec_info loop_vinfo,
+			     basic_block bb, int *bound)
 {
   struct data_reference *dr = LOOP_VINFO_UNALIGNED_DR (loop_vinfo);
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree var;
-  gimple_seq stmts;
+  tree niters_type = TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo));
+  gimple_seq stmts = NULL, new_stmts = NULL;
   tree iters, iters_name;
-  edge pe;
-  basic_block new_bb;
   gimple *dr_stmt = DR_STMT (dr);
   stmt_vec_info stmt_info = vinfo_for_stmt (dr_stmt);
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   int vectype_align = TYPE_ALIGN (vectype) / BITS_PER_UNIT;
-  tree niters_type = TREE_TYPE (loop_niters);
   int nelements = TYPE_VECTOR_SUBPARTS (vectype);
 
-  pe = loop_preheader_edge (loop);
-
   if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > 0)
     {
       int npeel = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
@@ -1902,16 +941,15 @@ vect_gen_niters_for_prolog_loop (loop_vec_info loop_vinfo, tree loop_niters, int
                          "known peeling = %d.\n", npeel);
 
       iters = build_int_cst (niters_type, npeel);
-      *bound = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
+      *bound = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) + 1;
     }
   else
     {
-      gimple_seq new_stmts = NULL;
       bool negative = tree_int_cst_compare (DR_STEP (dr), size_zero_node) < 0;
       tree offset = negative
 	  ? size_int (-TYPE_VECTOR_SUBPARTS (vectype) + 1) : size_zero_node;
       tree start_addr = vect_create_addr_base_for_vector_ref (dr_stmt,
-						&new_stmts, offset, loop);
+						&stmts, offset, loop);
       tree type = unsigned_type_for (TREE_TYPE (start_addr));
       tree vectype_align_minus_1 = build_int_cst (type, vectype_align - 1);
       HOST_WIDE_INT elem_size =
@@ -1922,17 +960,14 @@ vect_gen_niters_for_prolog_loop (loop_vec_info loop_vinfo, tree loop_niters, int
       tree byte_misalign;
       tree elem_misalign;
 
-      new_bb = gsi_insert_seq_on_edge_immediate (pe, new_stmts);
-      gcc_assert (!new_bb);
-
       /* Create:  byte_misalign = addr & (vectype_align - 1)  */
       byte_misalign =
-        fold_build2 (BIT_AND_EXPR, type, fold_convert (type, start_addr), 
-                     vectype_align_minus_1);
+	fold_build2 (BIT_AND_EXPR, type, fold_convert (type, start_addr),
+		     vectype_align_minus_1);
 
       /* Create:  elem_misalign = byte_misalign / element_size  */
       elem_misalign =
-        fold_build2 (RSHIFT_EXPR, type, byte_misalign, elem_size_log);
+	fold_build2 (RSHIFT_EXPR, type, byte_misalign, elem_size_log);
 
       /* Create:  (niters_type) (nelements - elem_misalign)&(nelements - 1)  */
       if (negative)
@@ -1944,13 +979,6 @@ vect_gen_niters_for_prolog_loop (loop_vec_info loop_vinfo, tree loop_niters, int
       *bound = nelements;
     }
 
-  /* Create:  prolog_loop_niters = min (iters, loop_niters) */
-  /* If the loop bound is known at compile time we already verified that it is
-     greater than vf; since the misalignment ('iters') is at most vf, there's
-     no need to generate the MIN_EXPR in this case.  */
-  if (TREE_CODE (loop_niters) != INTEGER_CST)
-    iters = fold_build2 (MIN_EXPR, niters_type, iters, loop_niters);
-
   if (dump_enabled_p ())
     {
       dump_printf_loc (MSG_NOTE, vect_location,
@@ -1960,16 +988,19 @@ vect_gen_niters_for_prolog_loop (loop_vec_info loop_vinfo, tree loop_niters, int
     }
 
   var = create_tmp_var (niters_type, "prolog_loop_niters");
-  stmts = NULL;
-  iters_name = force_gimple_operand (iters, &stmts, false, var);
+  iters_name = force_gimple_operand (iters, &new_stmts, false, var);
 
-  /* Insert stmt on loop preheader edge.  */
+  if (new_stmts)
+    gimple_seq_add_seq (&stmts, new_stmts);
   if (stmts)
     {
-      basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
-      gcc_assert (!new_bb);
+      gcc_assert (single_succ_p (bb));
+      gimple_stmt_iterator gsi = gsi_last_bb (bb);
+      if (gsi_end_p (gsi))
+	gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
+      else
+	gsi_insert_seq_after (&gsi, stmts, GSI_SAME_STMT);
     }
-
   return iters_name;
 }
 
@@ -2009,102 +1040,783 @@ vect_update_inits_of_drs (loop_vec_info loop_vinfo, tree niters)
   unsigned int i;
   vec<data_reference_p> datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
   struct data_reference *dr;
- 
- if (dump_enabled_p ())
+
+  if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
-                     "=== vect_update_inits_of_dr ===\n");
+		     "=== vect_update_inits_of_dr ===\n");
+
+  /* Adjust niters to sizetype and insert stmts on loop preheader edge.  */
+  if (!types_compatible_p (sizetype, TREE_TYPE (niters)))
+    {
+      gimple_seq seq;
+      edge pe = loop_preheader_edge (LOOP_VINFO_LOOP (loop_vinfo));
+      tree var = create_tmp_var (sizetype, "prolog_loop_adjusted_niters");
+
+      niters = fold_convert (sizetype, niters);
+      niters = force_gimple_operand (niters, &seq, false, var);
+      if (seq)
+	{
+	  basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
+	  gcc_assert (!new_bb);
+	}
+    }
 
   FOR_EACH_VEC_ELT (datarefs, i, dr)
     vect_update_init_of_dr (dr, niters);
 }
 
 
-/* Function vect_do_peeling_for_alignment
+/* This function builds ni_name = number of iterations.  Statements
+   are emitted on the loop preheader edge.  */
 
-   Peel the first 'niters' iterations of the loop represented by LOOP_VINFO.
-   'niters' is set to the misalignment of one of the data references in the
-   loop, thereby forcing it to refer to an aligned location at the beginning
-   of the execution of this loop.  The data reference for which we are
-   peeling is recorded in LOOP_VINFO_UNALIGNED_DR.
+tree
+vect_build_loop_niters (loop_vec_info loop_vinfo)
+{
+  tree ni = unshare_expr (LOOP_VINFO_NITERS (loop_vinfo));
+  if (TREE_CODE (ni) == INTEGER_CST)
+    return ni;
+  else
+    {
+      tree ni_name, var;
+      gimple_seq stmts = NULL;
+      edge pe = loop_preheader_edge (LOOP_VINFO_LOOP (loop_vinfo));
 
-   If CHECK_PROFITABILITY is 1 then profitability check is generated
-   using TH as a cost model profitability threshold of iterations for
-   vectorization.  */
+      var = create_tmp_var (TREE_TYPE (ni), "niters");
+      ni_name = force_gimple_operand (ni, &stmts, false, var);
+      if (stmts)
+	gsi_insert_seq_on_edge_immediate (pe, stmts);
+
+      return ni_name;
+    }
+}
+
+/* Calculate the number of iterations under which scalar loop will be
+   preferred than vectorized loop.  NITERS_PROLOG is the number of
+   iterations of prolog loop.  If it's integer const, the integer
+   number is also passed by INT_NITERS_PROLOG.  VF is vector factor;
+   TH is the threshold for vectorized loop if CHECK_PROFITABILITY is
+   true.  This function also store upper bound of the result in BOUND.  */
+
+static tree
+vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
+			     int bound_prolog, int vf, int th, int *bound,
+			     bool check_profitability)
+{
+  tree type = TREE_TYPE (niters_prolog);
+  tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
+			     build_int_cst (type, vf));
+
+  *bound = vf + bound_prolog;
+  if (check_profitability)
+    {
+      th++;
+      /* Peeling for constant times.  */
+      if (int_niters_prolog >= 0)
+	{
+	  *bound = (int_niters_prolog + vf < th
+					   ? th
+					   : vf + int_niters_prolog);
+	  return build_int_cst (type, *bound);
+	}
+      /* Peeling for unknown times, in this case, prolog loop must
+	 execute less than bound_prolog times.  */
+      if (th >=  vf + bound_prolog - 1)
+	{
+	  *bound = th;
+	  return build_int_cst (type, th);
+	}
+      /* Need to do runtime comparison, but bound remains the same.  */
+      else if (th > vf)
+	return fold_build2 (MAX_EXPR, type, build_int_cst (type, th), niters);
+    }
+  return niters;
+}
+
+/* This function generates the following statements:
+
+   niters = number of iterations loop executes (after peeling)
+   niters_vector = niters / vf
+
+   and places them on the loop preheader edge.  NITERS_NO_OVERFLOW is
+   true if NITERS doesn't overflow.  */
 
 void
-vect_do_peeling_for_alignment (loop_vec_info loop_vinfo, tree ni_name,
-			       unsigned int th, bool check_profitability)
+vect_gen_vector_loop_niters (loop_vec_info loop_vinfo, tree niters,
+			     tree *niters_vector_ptr, bool niters_no_overflow)
+{
+  tree ni_minus_gap, var;
+  tree niters_vector;
+  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  edge pe = loop_preheader_edge (LOOP_VINFO_LOOP (loop_vinfo));
+  tree log_vf = build_int_cst (TREE_TYPE (niters), exact_log2 (vf));
+
+  /* If epilogue loop is required because of data accesses with gaps, we
+     subtract one iteration from the total number of iterations here for
+     correct calculation of RATIO.  */
+  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+    {
+      ni_minus_gap = fold_build2 (MINUS_EXPR, TREE_TYPE (niters),
+				  niters,
+				  build_one_cst (TREE_TYPE (niters)));
+      if (!is_gimple_val (ni_minus_gap))
+	{
+	  var = create_tmp_var (TREE_TYPE (niters), "ni_gap");
+	  gimple *stmts = NULL;
+	  ni_minus_gap = force_gimple_operand (ni_minus_gap, &stmts,
+					       true, var);
+	  gsi_insert_seq_on_edge_immediate (pe, stmts);
+	}
+    }
+  else
+    ni_minus_gap = niters;
+
+  /* Create: niters >> log2(vf) */
+  /* If it's known that niters == number of latch executions + 1 doesn't
+     overflow, we can generate niters >> log2(vf); otherwise we generate
+     (niters - vf) >> log2(vf) + 1 by using the fact that we know ratio
+     will be at least one.  */
+  if (niters_no_overflow)
+    niters_vector = fold_build2 (RSHIFT_EXPR, TREE_TYPE (niters),
+				 ni_minus_gap, log_vf);
+  else
+    niters_vector
+      = fold_build2 (PLUS_EXPR, TREE_TYPE (niters),
+		     fold_build2 (RSHIFT_EXPR, TREE_TYPE (niters),
+				  fold_build2 (MINUS_EXPR, TREE_TYPE (niters),
+					       ni_minus_gap,
+					       build_int_cst
+						 (TREE_TYPE (niters), vf)),
+				  log_vf),
+		     build_int_cst (TREE_TYPE (niters), 1));
+
+  if (!is_gimple_val (niters_vector))
+    {
+      var = create_tmp_var (TREE_TYPE (niters), "bnd");
+      gimple *stmts = NULL;
+      niters_vector = force_gimple_operand (niters_vector, &stmts, true, var);
+      gsi_insert_seq_on_edge_immediate (pe, stmts);
+    }
+  *niters_vector_ptr = niters_vector;
+
+  return;
+}
+
+/* Given NITERS_VECTOR which is the number of iterations for vectorized
+   loop specified by LOOP_VINFO after vectorization, compute the number
+   of iterations before vectorization (niters_vector * vf) and store it
+   to NITERS_VECTOR_MULT_VF_PTR.  */
+
+static void
+vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
+				     tree niters_vector,
+				     tree *niters_vector_mult_vf_ptr)
 {
+  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  struct loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
-  tree niters_of_prolog_loop;
-  tree wide_prolog_niters;
-  struct loop *new_loop;
-  int max_iter;
-  int bound = 0;
+  tree type = TREE_TYPE (niters_vector);
+  tree log_vf = build_int_cst (type, exact_log2 (vf));
+  basic_block exit_bb = single_exit (loop)->dest;
 
-  if (dump_enabled_p ())
-    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
-                     "loop peeled for vectorization to enhance"
-                     " alignment\n");
+  gcc_assert (niters_vector_mult_vf_ptr != NULL);
+  tree niters_vector_mult_vf = fold_build2 (LSHIFT_EXPR, type,
+					    niters_vector, log_vf);
+  if (!is_gimple_val (niters_vector_mult_vf))
+    {
+      tree var = create_tmp_var (type, "niters_vector_mult_vf");
+      gimple_seq stmts = NULL;
+      niters_vector_mult_vf = force_gimple_operand (niters_vector_mult_vf,
+						    &stmts, true, var);
+      gimple_stmt_iterator gsi = gsi_start_bb (exit_bb);
+      gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
+    }
+  *niters_vector_mult_vf_ptr = niters_vector_mult_vf;
+}
+
+/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates FIRST/SECOND
+   from SECOND/FIRST and puts it at the original loop's preheader/exit
+   edge, the two loops are arranged as below:
+
+       preheader_a:
+     first_loop:
+       header_a:
+	 i_1 = PHI<i_0, i_2>;
+	 ...
+	 i_2 = i_1 + 1;
+	 if (cond_a)
+	   goto latch_a;
+	 else
+	   goto between_bb;
+       latch_a:
+	 goto header_a;
+
+       between_bb:
+	 ;; i_x = PHI<i_2>;   ;; LCSSA phi node to be created for FIRST,
+
+     second_loop:
+       header_b:
+	 i_3 = PHI<i_0, i_4>; ;; Use of i_0 to be replaced with i_x,
+				 or with i_2 if no LCSSA phi is created
+				 under condition of CREATE_LCSSA_FOR_IV_PHIS.
+	 ...
+	 i_4 = i_3 + 1;
+	 if (cond_b)
+	   goto latch_b;
+	 else
+	   goto exit_bb;
+       latch_b:
+	 goto header_b;
+
+       exit_bb:
+
+   This function creates loop closed SSA for the first loop; update the
+   second loop's PHI nodes by replacing argument on incoming edge with the
+   result of newly created lcssa PHI nodes.  IF CREATE_LCSSA_FOR_IV_PHIS
+   is false, Loop closed ssa phis will only be created for non-iv phis for
+   the first loop.  Above example will be transformed into:
+
+   This function assumes exit bb of the first loop is preheader bb of the
+   second loop, i.e, between_bb in the example code.  With PHIs updated,
+   the second loop will execute rest iterations of the first.  */
+
+static void
+slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
+				   struct loop *first, struct loop *second,
+				   bool create_lcssa_for_iv_phis)
+{
+  gphi_iterator gsi_update, gsi_orig;
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+
+  edge first_latch_e = EDGE_SUCC (first->latch, 0);
+  edge second_preheader_e = loop_preheader_edge (second);
+  basic_block between_bb = single_exit (first)->dest;
+
+  gcc_assert (between_bb == second_preheader_e->src);
+  gcc_assert (single_pred_p (between_bb) && single_succ_p (between_bb));
+  /* Either the first loop or the second is the loop to be vectorized.  */
+  gcc_assert (loop == first || loop == second);
+
+  for (gsi_orig = gsi_start_phis (first->header),
+       gsi_update = gsi_start_phis (second->header);
+       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
+       gsi_next (&gsi_orig), gsi_next (&gsi_update))
+    {
+      gphi *orig_phi = gsi_orig.phi ();
+      gphi *update_phi = gsi_update.phi ();
+
+      tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
+      /* Generate lcssa PHI node for the first loop.  */
+      gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
+      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi))
+	{
+	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
+	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
+	  add_phi_arg (lcssa_phi, arg, single_exit (first), UNKNOWN_LOCATION);
+	  arg = new_res;
+	}
+
+      /* Update PHI node in the second loop by replacing arg on the loop's
+	 incoming edge.  */
+      adjust_phi_and_debug_stmts (update_phi, second_preheader_e, arg);
+    }
+}
+
+/* Function slpeel_add_loop_guard adds guard skipping from the beginning
+   of SKIP_LOOP to the beginning of UPDATE_LOOP.  GUARD_EDGE and MERGE_EDGE
+   are two pred edges of the merge point before UPDATE_LOOP.  The two loops
+   appear like below:
+
+       guard_bb:
+	 if (cond)
+	   goto merge_bb;
+	 else
+	   goto skip_loop;
+
+     skip_loop:
+       header_a:
+	 i_1 = PHI<i_0, i_2>;
+	 ...
+	 i_2 = i_1 + 1;
+	 if (cond_a)
+	   goto latch_a;
+	 else
+	   goto exit_a;
+       latch_a:
+	 goto header_a;
+
+       exit_a:
+	 i_5 = PHI<i_2>;
+
+       merge_bb:
+	 ;; PHI (i_x = PHI<i_0, i_5>) to be created at merge point.
+
+     update_loop:
+       header_b:
+	 i_3 = PHI<i_5, i_4>;  ;; Use of i_5 to be replaced with i_x.
+	 ...
+	 i_4 = i_3 + 1;
+	 if (cond_b)
+	   goto latch_b;
+	 else
+	   goto exit_bb;
+       latch_b:
+	 goto header_b;
+
+       exit_bb:
+
+   This function creates PHI nodes at merge_bb and replaces the use of i_5
+   in the update_loop's PHI node with the result of new PHI result.  */
+
+static void
+slpeel_update_phi_nodes_for_guard1 (struct loop *skip_loop,
+				    struct loop *update_loop,
+				    edge guard_edge, edge merge_edge)
+{
+  source_location merge_loc, guard_loc;
+  edge orig_e = loop_preheader_edge (skip_loop);
+  edge update_e = loop_preheader_edge (update_loop);
+  gphi_iterator gsi_orig, gsi_update;
+
+  for ((gsi_orig = gsi_start_phis (skip_loop->header),
+	gsi_update = gsi_start_phis (update_loop->header));
+       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
+       gsi_next (&gsi_orig), gsi_next (&gsi_update))
+    {
+      gphi *orig_phi = gsi_orig.phi ();
+      gphi *update_phi = gsi_update.phi ();
+
+      /* Generate new phi node at merge bb of the guard.  */
+      tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
+      gphi *new_phi = create_phi_node (new_res, guard_edge->dest);
+
+      /* Merge bb has two incoming edges: GUARD_EDGE and MERGE_EDGE.  Set the
+	 args in NEW_PHI for these edges.  */
+      tree merge_arg = PHI_ARG_DEF_FROM_EDGE (update_phi, update_e);
+      tree guard_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
+      merge_loc = gimple_phi_arg_location_from_edge (update_phi, update_e);
+      guard_loc = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
+      add_phi_arg (new_phi, merge_arg, merge_edge, merge_loc);
+      add_phi_arg (new_phi, guard_arg, guard_edge, guard_loc);
+
+      /* Update phi in UPDATE_PHI.  */
+      adjust_phi_and_debug_stmts (update_phi, update_e, new_res);
+    }
+}
+
+/* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP,
+   this function searches for the corresponding lcssa phi node in exit
+   bb of LOOP.  If it is found, return the phi result; otherwise return
+   NULL.  */
+
+static tree
+find_guard_arg (struct loop *loop, struct loop *epilog ATTRIBUTE_UNUSED,
+		gphi *lcssa_phi)
+{
+  gphi_iterator gsi;
+  edge e = single_exit (loop);
+
+  gcc_assert (single_pred_p (e->dest));
+  for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gphi *phi = gsi.phi ();
+      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
+			   PHI_ARG_DEF (lcssa_phi, 0), 0))
+	return PHI_RESULT (phi);
+    }
+  return NULL_TREE;
+}
+
+/* LOOP and EPILOG are two consecutive loops in CFG and EPILOG is copied
+   from LOOP.  Function slpeel_add_loop_guard adds guard skipping from a
+   point between the two loops to the end of EPILOG.  Edges GUARD_EDGE
+   and MERGE_EDGE are the two pred edges of merge_bb at the end of EPILOG.
+   The CFG looks like:
+
+     loop:
+       header_a:
+	 i_1 = PHI<i_0, i_2>;
+	 ...
+	 i_2 = i_1 + 1;
+	 if (cond_a)
+	   goto latch_a;
+	 else
+	   goto exit_a;
+       latch_a:
+	 goto header_a;
+
+       exit_a:
+
+       guard_bb:
+	 if (cond)
+	   goto merge_bb;
+	 else
+	   goto epilog_loop;
+
+       ;; fall_through_bb
+
+     epilog_loop:
+       header_b:
+	 i_3 = PHI<i_2, i_4>;
+	 ...
+	 i_4 = i_3 + 1;
+	 if (cond_b)
+	   goto latch_b;
+	 else
+	   goto merge_bb;
+       latch_b:
+	 goto header_b;
+
+       merge_bb:
+	 ; PHI node (i_y = PHI<i_2, i_4>) to be created at merge point.
+
+       exit_bb:
+	 i_x = PHI<i_4>;  ;Use of i_4 to be replaced with i_y in merge_bb.
+
+   For each name used out side EPILOG (i.e - for each name that has a lcssa
+   phi in exit_bb) we create a new PHI in merge_bb.  The new PHI has two
+   args corresponding to GUARD_EDGE and MERGE_EDGE.  Arg for MERGE_EDGE is
+   the arg of the original PHI in exit_bb, arg for GUARD_EDGE is defined
+   by LOOP and is found in the exit bb of LOOP.  Arg of the original PHI
+   in exit_bb will also be updated.  */
+
+static void
+slpeel_update_phi_nodes_for_guard2 (struct loop *loop, struct loop *epilog,
+				    edge guard_edge, edge merge_edge)
+{
+  gphi_iterator gsi;
+  basic_block merge_bb = guard_edge->dest;
+
+  gcc_assert (single_succ_p (merge_bb));
+  edge e = single_succ_edge (merge_bb);
+  basic_block exit_bb = e->dest;
+  gcc_assert (single_pred_p (exit_bb));
+  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
+
+  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gphi *update_phi = gsi.phi ();
+      tree old_arg = PHI_ARG_DEF (update_phi, 0);
+      /* This loop-closed-phi actually doesn't represent a use out of the
+	 loop - the phi arg is a constant.  */
+      if (TREE_CODE (old_arg) != SSA_NAME)
+	continue;
+
+      tree merge_arg = get_current_def (old_arg);
+      if (!merge_arg)
+	merge_arg = old_arg;
+
+      tree guard_arg = find_guard_arg (loop, epilog, update_phi);
+      /* If the var is live after loop but not a reduction, we simply
+	 use the old arg.  */
+      if (!guard_arg)
+	guard_arg = old_arg;
+
+      /* Create new phi node in MERGE_BB:  */
+      tree new_res = copy_ssa_name (PHI_RESULT (update_phi));
+      gphi *merge_phi = create_phi_node (new_res, merge_bb);
+
+      /* MERGE_BB has two incoming edges: GUARD_EDGE and MERGE_EDGE, Set
+	 the two PHI args in merge_phi for these edges.  */
+      add_phi_arg (merge_phi, merge_arg, merge_edge, UNKNOWN_LOCATION);
+      add_phi_arg (merge_phi, guard_arg, guard_edge, UNKNOWN_LOCATION);
+
+      /* Update the original phi in exit_bb.  */
+      adjust_phi_and_debug_stmts (update_phi, e, new_res);
+    }
+}
+
+/* EPILOG loop is duplicated from the original loop for vectorizing,
+   the arg of its loop closed ssa PHI needs to be updated.  */
+
+static void
+slpeel_update_phi_nodes_for_lcssa (struct loop *epilog)
+{
+  gphi_iterator gsi;
+  basic_block exit_bb = single_exit (epilog)->dest;
+
+  gcc_assert (single_pred_p (exit_bb));
+  edge e = EDGE_PRED (exit_bb, 0);
+  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
+}
+
+/* Function vect_do_peeling.
+
+   Input:
+   - LOOP_VINFO: Represent a loop to be vectorized, which looks like:
+
+       preheader:
+     LOOP:
+       header_bb:
+	 loop_body
+	 if (exit_loop_cond) goto exit_bb
+	 else                goto header_bb
+       exit_bb:
+
+   - NITERS: The number of iterations of the loop.
+   - NITERSM1: The number of iterations of the loop's latch.
+   - NITERS_NO_OVERFLOW: No overflow in computing NITERS.
+   - TH, CHECK_PROFITABILITY: Threshold of niters to vectorize loop if
+			      CHECK_PROFITABILITY is true.
+   Output:
+   - NITERS_VECTOR: The number of iterations of loop after vectorization.
 
+   This function peels prolog and epilog from the loop, adds guards skipping
+   PROLOG and EPILOG for various conditions.  As a result, the changed CFG
+   would look like:
+
+       guard_bb_1:
+	 if (prefer_scalar_loop) goto merge_bb_1
+	 else                    goto guard_bb_2
+
+       guard_bb_2:
+         if (skip_prolog) goto merge_bb_2
+         else             goto prolog_preheader
+
+       prolog_preheader:
+     PROLOG:
+       prolog_header_bb:
+	 prolog_body
+	 if (exit_prolog_cond) goto prolog_exit_bb
+	 else                  goto prolog_header_bb
+       prolog_exit_bb:
+
+       merge_bb_2:
+
+       vector_preheader:
+     VECTOR LOOP:
+       vector_header_bb:
+	 vector_body
+	 if (exit_vector_cond) goto vector_exit_bb
+	 else                  goto vector_header_bb
+       vector_exit_bb:
+
+       guard_bb_3:
+	 if (skip_epilog) goto merge_bb_3
+	 else             goto epilog_preheader
+
+       merge_bb_1:
+
+       epilog_preheader:
+     EPILOG:
+       epilog_header_bb:
+	 epilog_body
+	 if (exit_epilog_cond) goto merge_bb_3
+	 else                  goto epilog_header_bb
+
+       merge_bb_3:
+
+   Note this function peels prolog and epilog only if it's necessary,
+   as well as guards.
+
+   TODO: Guard for prefer_scalar_loop should be emitted along with
+   versioning conditions if loop versioning is needed.  */
+
+void
+vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
+		 tree *niters_vector, int th, bool check_profitability,
+		 bool niters_no_overflow)
+{
+  edge e, guard_e;
+  tree type = TREE_TYPE (niters), guard_cond;
+  basic_block guard_bb, guard_to;
+  int prob_prolog, prob_vector, prob_epilog;
+  int bound_prolog = 0, bound_epilog = 0, bound = 0;
+  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  int prolog_peeling = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
+  bool epilog_peeling = (LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
+			 || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo));
+
+  if (!prolog_peeling && !epilog_peeling)
+    return;
+
+  prob_vector = 9 * REG_BR_PROB_BASE / 10;
+  if ((vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo)) == 2)
+    vf = 3;
+  prob_prolog = prob_epilog = (vf - 1) * REG_BR_PROB_BASE / vf;
+  vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+
+  struct loop *prolog, *epilog, *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  struct loop *first_loop = loop;
+  create_lcssa_for_virtual_phi (loop);
+  update_ssa (TODO_update_ssa_only_virtuals);
+
+  if (MAY_HAVE_DEBUG_STMTS)
+    {
+      gcc_assert (!adjust_vec.exists ());
+      adjust_vec.create (32);
+    }
   initialize_original_copy_tables ();
 
-  niters_of_prolog_loop = vect_gen_niters_for_prolog_loop (loop_vinfo,
-							   ni_name,
-							   &bound);
-
-  /* Peel the prolog loop and iterate it niters_of_prolog_loop.  */
-  new_loop =
-    slpeel_tree_peel_loop_to_edge (loop, scalar_loop,
-				   loop_preheader_edge (loop),
-				   &niters_of_prolog_loop, ni_name, true,
-				   th, check_profitability, NULL_TREE, NULL,
-				   bound, 0);
-
-  gcc_assert (new_loop);
-  slpeel_checking_verify_cfg_after_peeling (new_loop, loop);
-  /* For vectorization factor N, we need to copy at most N-1 values 
-     for alignment and this means N-2 loopback edge executions.  */
-  max_iter = LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 2;
-  if (check_profitability)
-    max_iter = MAX (max_iter, (int) th - 1);
-  record_niter_bound (new_loop, max_iter, false, true);
-  dump_printf (MSG_NOTE,
-               "Setting upper bound of nb iterations for prologue "
-               "loop to %d\n", max_iter);
-
-  /* Update number of times loop executes.  */
-  LOOP_VINFO_NITERS (loop_vinfo) = fold_build2 (MINUS_EXPR,
-		TREE_TYPE (ni_name), ni_name, niters_of_prolog_loop);
-  LOOP_VINFO_NITERSM1 (loop_vinfo) = fold_build2 (MINUS_EXPR,
-		TREE_TYPE (ni_name),
-		LOOP_VINFO_NITERSM1 (loop_vinfo), niters_of_prolog_loop);
-
-  if (types_compatible_p (sizetype, TREE_TYPE (niters_of_prolog_loop)))
-    wide_prolog_niters = niters_of_prolog_loop;
-  else
+  /* Prolog loop may be skipped.  */
+  bool skip_prolog = (prolog_peeling != 0);
+  /* Skip to epilog if scalar loop may be preferred.  It's only used when
+     we peel for epilog loop.  */
+  bool skip_vector = (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo));
+  /* Epilog loop must be executed if the number of iterations for epilog
+     loop is known at compile time, otherwise we need to add a check at
+     the end of vector loop and skip to the end of epilog loop.  */
+  bool skip_epilog = (prolog_peeling < 0
+		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo));
+  /* PEELING_FOR_GAPS is special because epilog loop must be executed.  */
+  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+    skip_epilog = false;
+
+  /* Record the anchor bb at which guard should be placed if scalar loop
+     may be preferred.  */
+  basic_block anchor = loop_preheader_edge (loop)->src;
+  if (skip_vector)
+    split_edge (loop_preheader_edge (loop));
+
+  tree niters_prolog = build_int_cst (type, 0);
+  source_location loop_loc = find_loop_location (loop);
+  struct loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
+  if (prolog_peeling)
     {
-      gimple_seq seq = NULL;
-      edge pe = loop_preheader_edge (loop);
-      tree wide_iters = fold_convert (sizetype, niters_of_prolog_loop);
-      tree var = create_tmp_var (sizetype, "prolog_loop_adjusted_niters");
-      wide_prolog_niters = force_gimple_operand (wide_iters, &seq, false,
-                                                 var);
-      if (seq)
+      e = loop_preheader_edge (loop);
+      if (!slpeel_can_duplicate_loop_p (loop, e))
 	{
-	  /* Insert stmt on loop preheader edge.  */
-          basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
-          gcc_assert (!new_bb);
-        }
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc,
+			   "loop can't be duplicated to preheader edge.\n");
+	  gcc_unreachable ();
+	}
+      /* Peel prolog and put it on preheader edge of loop.  */
+      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e);
+      if (!prolog)
+	{
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc,
+			   "slpeel_tree_duplicate_loop_to_edge_cfg failed.\n");
+	  gcc_unreachable ();
+	}
+      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
+      first_loop = prolog;
+      reset_original_copy_tables ();
+
+      /* Generate and update the number of iterations for prolog loop.  */
+      niters_prolog = vect_gen_prolog_loop_niters (loop_vinfo, anchor,
+						   &bound_prolog);
+      slpeel_make_loop_iterate_ntimes (prolog, niters_prolog);
+
+      /* Skip the prolog loop.  */
+      if (skip_prolog)
+	{
+	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
+				    niters_prolog, build_int_cst (type, 0));
+	  guard_bb = loop_preheader_edge (prolog)->src;
+	  guard_to = split_edge (loop_preheader_edge (loop));
+	  guard_e = slpeel_add_loop_guard (guard_bb, guard_cond,
+					   guard_to, guard_bb,
+					   inverse_probability (prob_prolog));
+	  e = EDGE_PRED (guard_to, 0);
+	  e = (e != guard_e ? e : EDGE_PRED (guard_to, 1));
+	  slpeel_update_phi_nodes_for_guard1 (prolog, loop, guard_e, e);
+	  scale_loop_profile (prolog, prob_prolog, bound_prolog);
+	}
+      /* Update init address of DRs.  */
+      vect_update_inits_of_drs (loop_vinfo, niters_prolog);
+      /* Update niters for vector loop.  */
+      LOOP_VINFO_NITERS (loop_vinfo)
+	= fold_build2 (MINUS_EXPR, type, niters, niters_prolog);
+      LOOP_VINFO_NITERSM1 (loop_vinfo)
+	= fold_build2 (MINUS_EXPR, type,
+		       LOOP_VINFO_NITERSM1 (loop_vinfo), niters_prolog);
+      niters = vect_build_loop_niters (loop_vinfo);
+
+      /* Prolog iterates at most bound_prolog - 1 times, latch iterates
+	 at most bound_prolog - 2 times.  */
+      record_niter_bound (prolog, bound_prolog - 2, false, true);
+      delete_update_ssa ();
+      adjust_vec_debug_stmts ();
+      scev_reset ();
     }
 
-  /* Update the init conditions of the access functions of all data refs.  */
-  vect_update_inits_of_drs (loop_vinfo, wide_prolog_niters);
+  if (epilog_peeling)
+    {
+      e = single_exit (loop);
+      if (!slpeel_can_duplicate_loop_p (loop, e))
+	{
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc,
+			   "loop can't be duplicated to exit edge.\n");
+	  gcc_unreachable ();
+	}
+      /* Peel epilog and put it on exit edge of loop.  */
+      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e);
+      if (!epilog)
+	{
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc,
+			   "slpeel_tree_duplicate_loop_to_edge_cfg failed.\n");
+	  gcc_unreachable ();
+	}
+      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
+
+      /* Scalar version loop may be preferred.  In this case, add guard
+	 and skip to epilog.  Note this only happens when the number of
+	 iterations of loop is unknown at compile time, otherwise this
+	 won't be vectorized.  */
+      if (skip_vector)
+	{
+	  /* Guard_cond needs is based on NITERSM1 because NITERS might
+	     overflow, so here it is niters_scalar - 1 generated.  In
+	     other words, both niters_scalar and bound_epilog are for
+	     scalar loop's latch.  */
+	  tree t = vect_gen_scalar_loop_niters (niters_prolog, prolog_peeling,
+						bound_prolog, vf - 1, th - 1,
+						&bound_epilog,
+						check_profitability);
+	  guard_cond = fold_build2 (LT_EXPR, boolean_type_node,
+				    nitersm1, t);
+	  guard_bb = anchor;
+	  guard_to = split_edge (loop_preheader_edge (epilog));
+	  guard_e = slpeel_add_loop_guard (guard_bb, guard_cond,
+					   guard_to, guard_bb,
+					   inverse_probability (prob_vector));
+	  e = EDGE_PRED (guard_to, 0);
+	  e = (e != guard_e ? e : EDGE_PRED (guard_to, 1));
+	  slpeel_update_phi_nodes_for_guard1 (first_loop, epilog, guard_e, e);
+	  scale_loop_profile (epilog, prob_vector, bound_epilog);
+	}
 
-  /* After peeling we have to reset scalar evolution analyzer.  */
-  scev_reset ();
+      tree niters_vector_mult_vf;
+      /* If loop is peeled for non-zero constant times, now niters refers to
+	 orig_niters - prolog_peeling, it won't overflow even the orig_niters
+	 overflows.  */
+      niters_no_overflow |= (prolog_peeling > 0);
+      vect_gen_vector_loop_niters (loop_vinfo, niters,
+				   niters_vector, niters_no_overflow);
+      vect_gen_vector_loop_niters_mult_vf (loop_vinfo, *niters_vector,
+					   &niters_vector_mult_vf);
+      /* Update IVs of original loop as if they were advanced by
+	 niters_vector_mult_vf steps.  */
+      gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
+      edge update_e = skip_vector ? e : loop_preheader_edge (epilog);
+      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
+					update_e);
+
+      if (skip_epilog)
+	{
+	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
+				    niters, niters_vector_mult_vf);
+	  guard_bb = single_exit (loop)->dest;
+	  guard_to = split_edge (single_exit (epilog));
+	  guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to,
+					   skip_vector ? anchor : guard_bb,
+					   inverse_probability (prob_epilog));
+	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
+					      single_exit (epilog));
+	  scale_loop_profile (epilog, prob_epilog, bound);
+	}
+      else
+	slpeel_update_phi_nodes_for_lcssa (epilog);
 
+      bound = (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) ? vf * 2 : vf) - 2;
+      /* We share epilog loop with scalar version loop.  */
+      bound_epilog = MAX (bound, bound_epilog - 1);
+      record_niter_bound (epilog, bound_epilog, false, true);
+
+      delete_update_ssa ();
+      adjust_vec_debug_stmts ();
+      scev_reset ();
+    }
+  adjust_vec.release ();
   free_original_copy_tables ();
 }
 
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 03ece95..0d37f55 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6610,120 +6610,6 @@ vect_loop_kill_debug_uses (struct loop *loop, gimple *stmt)
     }
 }
 
-
-/* This function builds ni_name = number of iterations.  Statements
-   are emitted on the loop preheader edge.  */
-
-static tree
-vect_build_loop_niters (loop_vec_info loop_vinfo)
-{
-  tree ni = unshare_expr (LOOP_VINFO_NITERS (loop_vinfo));
-  if (TREE_CODE (ni) == INTEGER_CST)
-    return ni;
-  else
-    {
-      tree ni_name, var;
-      gimple_seq stmts = NULL;
-      edge pe = loop_preheader_edge (LOOP_VINFO_LOOP (loop_vinfo));
-
-      var = create_tmp_var (TREE_TYPE (ni), "niters");
-      ni_name = force_gimple_operand (ni, &stmts, false, var);
-      if (stmts)
-	gsi_insert_seq_on_edge_immediate (pe, stmts);
-
-      return ni_name;
-    }
-}
-
-
-/* This function generates the following statements:
-
-   ni_name = number of iterations loop executes
-   ratio = ni_name / vf
-   ratio_mult_vf_name = ratio * vf
-
-   and places them on the loop preheader edge.  */
-
-static void
-vect_generate_tmps_on_preheader (loop_vec_info loop_vinfo,
-				 tree ni_name,
-				 tree *ratio_mult_vf_name_ptr,
-				 tree *ratio_name_ptr)
-{
-  tree ni_minus_gap_name;
-  tree var;
-  tree ratio_name;
-  tree ratio_mult_vf_name;
-  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-  edge pe = loop_preheader_edge (LOOP_VINFO_LOOP (loop_vinfo));
-  tree log_vf;
-
-  log_vf = build_int_cst (TREE_TYPE (ni_name), exact_log2 (vf));
-
-  /* If epilogue loop is required because of data accesses with gaps, we
-     subtract one iteration from the total number of iterations here for
-     correct calculation of RATIO.  */
-  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
-    {
-      ni_minus_gap_name = fold_build2 (MINUS_EXPR, TREE_TYPE (ni_name),
-				       ni_name,
-			               build_one_cst (TREE_TYPE (ni_name)));
-      if (!is_gimple_val (ni_minus_gap_name))
-	{
-	  var = create_tmp_var (TREE_TYPE (ni_name), "ni_gap");
-	  gimple *stmts = NULL;
-          ni_minus_gap_name = force_gimple_operand (ni_minus_gap_name, &stmts,
-						    true, var);
-	  gsi_insert_seq_on_edge_immediate (pe, stmts);
-        }
-    }
-  else
-    ni_minus_gap_name = ni_name;
-
-  /* Create: ratio = ni >> log2(vf) */
-  /* ???  As we have ni == number of latch executions + 1, ni could
-     have overflown to zero.  So avoid computing ratio based on ni
-     but compute it using the fact that we know ratio will be at least
-     one, thus via (ni - vf) >> log2(vf) + 1.  */
-  ratio_name
-    = fold_build2 (PLUS_EXPR, TREE_TYPE (ni_name),
-		   fold_build2 (RSHIFT_EXPR, TREE_TYPE (ni_name),
-				fold_build2 (MINUS_EXPR, TREE_TYPE (ni_name),
-					     ni_minus_gap_name,
-					     build_int_cst
-					       (TREE_TYPE (ni_name), vf)),
-				log_vf),
-		   build_int_cst (TREE_TYPE (ni_name), 1));
-  if (!is_gimple_val (ratio_name))
-    {
-      var = create_tmp_var (TREE_TYPE (ni_name), "bnd");
-      gimple *stmts = NULL;
-      ratio_name = force_gimple_operand (ratio_name, &stmts, true, var);
-      gsi_insert_seq_on_edge_immediate (pe, stmts);
-    }
-  *ratio_name_ptr = ratio_name;
-
-  /* Create: ratio_mult_vf = ratio << log2 (vf).  */
-
-  if (ratio_mult_vf_name_ptr)
-    {
-      ratio_mult_vf_name = fold_build2 (LSHIFT_EXPR, TREE_TYPE (ratio_name),
-					ratio_name, log_vf);
-      if (!is_gimple_val (ratio_mult_vf_name))
-	{
-	  var = create_tmp_var (TREE_TYPE (ni_name), "ratio_mult_vf");
-	  gimple *stmts = NULL;
-	  ratio_mult_vf_name = force_gimple_operand (ratio_mult_vf_name, &stmts,
-						     true, var);
-	  gsi_insert_seq_on_edge_immediate (pe, stmts);
-	}
-      *ratio_mult_vf_name_ptr = ratio_mult_vf_name;
-    }
-
-  return;
-}
-
-
 /* Function vect_transform_loop.
 
    The analysis phase has determined that the loop is vectorizable.
@@ -6737,8 +6623,8 @@ vect_transform_loop (loop_vec_info loop_vinfo)
   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
   int nbbs = loop->num_nodes;
   int i;
-  tree ratio = NULL;
-  int vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  tree niters_vector = NULL;
+  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   bool grouped_store;
   bool slp_scheduled = false;
   gimple *stmt, *pattern_stmt;
@@ -6808,49 +6694,20 @@ vect_transform_loop (loop_vec_info loop_vinfo)
 	}
     }
 
-  tree ni_name = vect_build_loop_niters (loop_vinfo);
-  LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo) = ni_name;
-
-  /* Peel the loop if there are data refs with unknown alignment.
-     Only one data ref with unknown store is allowed.  */
-
-  if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo))
+  tree niters = vect_build_loop_niters (loop_vinfo);
+  LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo) = niters;
+  tree nitersm1 = unshare_expr (LOOP_VINFO_NITERSM1 (loop_vinfo));
+  vect_do_peeling (loop_vinfo, niters, nitersm1, &niters_vector, th,
+		   check_profitability, false);
+  if (niters_vector == NULL_TREE)
     {
-      vect_do_peeling_for_alignment (loop_vinfo, ni_name,
-				     th, check_profitability);
-      check_profitability = false;
-      /* The above adjusts LOOP_VINFO_NITERS, so cause ni_name to
-	 be re-computed.  */
-      ni_name = NULL_TREE;
-    }
-
-  /* If the loop has a symbolic number of iterations 'n' (i.e. it's not a
-     compile time constant), or it is a constant that doesn't divide by the
-     vectorization factor, then an epilog loop needs to be created.
-     We therefore duplicate the loop: the original loop will be vectorized,
-     and will compute the first (n/VF) iterations.  The second copy of the loop
-     will remain scalar and will compute the remaining (n%VF) iterations.
-     (VF is the vectorization factor).  */
-
-  if (LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
-      || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
-    {
-      tree ratio_mult_vf;
-      if (!ni_name)
-	ni_name = vect_build_loop_niters (loop_vinfo);
-      vect_generate_tmps_on_preheader (loop_vinfo, ni_name, &ratio_mult_vf,
-				       &ratio);
-      vect_do_peeling_for_loop_bound (loop_vinfo, ni_name, ratio_mult_vf,
-				      th, check_profitability);
-    }
-  else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
-    ratio = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)),
-		LOOP_VINFO_INT_NITERS (loop_vinfo) / vectorization_factor);
-  else
-    {
-      if (!ni_name)
-	ni_name = vect_build_loop_niters (loop_vinfo);
-      vect_generate_tmps_on_preheader (loop_vinfo, ni_name, NULL, &ratio);
+      if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+	niters_vector
+	  = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)),
+			   LOOP_VINFO_INT_NITERS (loop_vinfo) / vf);
+      else
+	vect_gen_vector_loop_niters (loop_vinfo, niters, &niters_vector,
+				     false);
     }
 
   /* 1) Make sure the loop header has exactly two entries
@@ -6893,7 +6750,7 @@ vect_transform_loop (loop_vec_info loop_vinfo)
 
 	  if (STMT_VINFO_VECTYPE (stmt_info)
 	      && (TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info))
-		  != (unsigned HOST_WIDE_INT) vectorization_factor)
+		  != (unsigned HOST_WIDE_INT) vf)
 	      && dump_enabled_p ())
 	    dump_printf_loc (MSG_NOTE, vect_location, "multiple-types.\n");
 
@@ -7026,7 +6883,7 @@ vect_transform_loop (loop_vec_info loop_vinfo)
 		= (unsigned int)
 		  TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
 	      if (!STMT_SLP_TYPE (stmt_info)
-		  && nunits != (unsigned int) vectorization_factor
+		  && nunits != (unsigned int) vf
 		  && dump_enabled_p ())
 		  /* For SLP VF is set according to unrolling factor, and not
 		     to vector size, hence for SLP this print is not valid.  */
@@ -7098,11 +6955,11 @@ vect_transform_loop (loop_vec_info loop_vinfo)
 	}		        /* stmts in BB */
     }				/* BBs in loop */
 
-  slpeel_make_loop_iterate_ntimes (loop, ratio);
+  slpeel_make_loop_iterate_ntimes (loop, niters_vector);
 
   /* Reduce loop iterations by the vectorization factor.  */
-  scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vectorization_factor),
-		      expected_iterations / vectorization_factor);
+  scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
+		      expected_iterations / vf);
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
     {
       if (loop->nb_iterations_upper_bound != 0)
@@ -7112,16 +6969,14 @@ vect_transform_loop (loop_vec_info loop_vinfo)
 	   = loop->nb_iterations_likely_upper_bound - 1;
     }
   loop->nb_iterations_upper_bound
-    = wi::udiv_floor (loop->nb_iterations_upper_bound + 1,
-		      vectorization_factor) - 1;
+    = wi::udiv_floor (loop->nb_iterations_upper_bound + 1, vf) - 1;
   loop->nb_iterations_likely_upper_bound
-    = wi::udiv_floor (loop->nb_iterations_likely_upper_bound + 1,
-		      vectorization_factor) - 1;
+    = wi::udiv_floor (loop->nb_iterations_likely_upper_bound + 1, vf) - 1;
 
   if (loop->any_estimate)
     {
       loop->nb_iterations_estimate
-        = wi::udiv_floor (loop->nb_iterations_estimate, vectorization_factor);
+	= wi::udiv_floor (loop->nb_iterations_estimate, vf);
        if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
 	   && loop->nb_iterations_estimate != 0)
 	 loop->nb_iterations_estimate = loop->nb_iterations_estimate - 1;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 29ef676..e2e7323 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1028,10 +1028,8 @@ extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge);
 struct loop *slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *,
 						     struct loop *, edge);
 extern void vect_loop_versioning (loop_vec_info, unsigned int, bool);
-extern void vect_do_peeling_for_loop_bound (loop_vec_info, tree, tree,
-					    unsigned int, bool);
-extern void vect_do_peeling_for_alignment (loop_vec_info, tree,
-					   unsigned int, bool);
+extern void vect_do_peeling (loop_vec_info, tree, tree,
+			     tree *, int, bool, bool);
 extern source_location find_loop_location (struct loop *);
 extern bool vect_can_advance_ivs_p (loop_vec_info);
 
@@ -1143,6 +1141,8 @@ extern gimple *vect_force_simple_reduction (loop_vec_info, gimple *, bool,
 					    bool *, bool);
 /* Drive for loop analysis stage.  */
 extern loop_vec_info vect_analyze_loop (struct loop *);
+extern tree vect_build_loop_niters (loop_vec_info);
+extern void vect_gen_vector_loop_niters (loop_vec_info, tree, tree *, bool);
 /* Drive for loop transformation stage.  */
 extern void vect_transform_loop (loop_vec_info);
 extern loop_vec_info vect_analyze_loop_form (struct loop *);

next             reply	other threads:[~2016-09-06 18:52 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-06 18:53 Bin Cheng [this message]
2016-09-14 13:23 ` Richard Biener
2016-09-14 16:52   ` Jeff Law
2016-09-21  8:53     ` Bin.Cheng
2016-09-28 16:23       ` Jeff Law
2017-02-04 12:06 ` Jan Hubicka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AM4PR0802MB21631E58381E20EE07D06446E7F90@AM4PR0802MB2163.eurprd08.prod.outlook.com \
    --to=bin.cheng@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=nd@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).