Gimple loop splitting

public inbox for gcc-patches@gcc.gnu.org

* Gimple loop splitting
@ 2015-11-12 16:52 Michael Matz
  2015-11-12 21:44 ` Jeff Law
  2016-07-25  7:00 ` Gimple loop splitting Andrew Pinski
From: Michael Matz @ 2015-11-12 16:52 UTC
  To: gcc-patches

Hello,

This new pass implements loop iteration space splitting for loops that
contain a conditional that's always true for one part of the iteration
space and false for the other, i.e. situations like this:

  for (i = beg; i < end; i++)
    if (i < p)
      dothis();
    else
      dothat();

This is transformed into roughly:

  for (i = beg; i < p; i++)
    dothis();
  for (; i < end; i++)
    dothat();

Of course, it's not quite the above, as there need to be provisions for the
border conditions, e.g. if 'p' is outside the original iteration space, or
the conditional doesn't directly use the control IV but some other one, or
the IV runs backwards.  The testcase checks many of these border
conditions.
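
The simplest of these border conditions, a 'p' that lies outside the
iteration space, can be illustrated in plain C.  This is only a hand-written
sketch of the idea, not what the pass emits on GIMPLE; the names
sum_guarded, sum_split and check_split and the test arrays are made up for
the illustration.

```c
#include <assert.h>

/* Original form: the guard i < p is evaluated on every iteration.  */
int sum_guarded (int beg, int end, int p, const int *a, const int *b)
{
  int sum = 0;
  for (int i = beg; i < end; i++)
    if (i < p)
      sum += a[i];
    else
      sum += b[i];
  return sum;
}

/* Hand-split form.  The first loop stops at min (p, end), so a p beyond
   end cannot run past the original bound, and a p below beg simply makes
   the first loop execute zero times.  */
int sum_split (int beg, int end, int p, const int *a, const int *b)
{
  int sum = 0;
  int mid = p < end ? p : end;
  int i = beg;
  for (; i < mid; i++)
    sum += a[i];
  for (; i < end; i++)
    sum += b[i];
  return sum;
}

/* Sweep p across and beyond [beg, end) and compare both forms.
   Returns the number of cases checked.  */
int check_split (void)
{
  const int a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  const int b[8] = {-1, -2, -3, -4, -5, -6, -7, -8};
  int cases = 0;
  for (int beg = 0; beg <= 8; beg++)
    for (int end = 0; end <= 8; end++)
      for (int p = -3; p <= 11; p++)
        {
          assert (sum_guarded (beg, end, p, a, b)
                  == sum_split (beg, end, p, a, b));
          cases++;
        }
  return cases;
}
```

Calling check_split () from a test driver exercises p below beg, inside the
range and above end, as well as empty ranges where beg >= end.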

This transformation is in itself a good one but can also be an enabler for
the vectorizer.  It does increase code size when the loop body also
contains unconditional code (which is duplicated), so we only transform hot
loops.  I'm a bit unsure about the placement of the new pass, or whether it
should be a pass of its own at all.  Right now I've placed it after
unswitching and scev_cprop, before loop distribution.  Ideally I think all
three, together with loop fusion and a gimple unroller, should be integrated
into one loop nest optimizer; alas, we aren't there yet.

I'm planning to work on loop fusion in the future as well, but that's not 
for GCC 6.

I've regstrapped with this pass enabled at -O2 on x86-64-linux, without
regressions.  I've also checked cpu2006 (the non-fortran part) for
correctness, not yet for performance.  In the end it should probably only
be enabled for -O3+ (although if the whole loop body is conditional it
makes sense to also have it with -O2 because code growth is very small
then).

So, okay for trunk?


Ciao,
Michael.
	* passes.def (pass_loop_split): Add.
	* timevar.def (TV_LOOP_SPLIT): Add.
	* tree-pass.h (make_pass_loop_split): Declare.
	* tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
	* tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
	cfganal.h, tree-chrec.h, tree-affine.h, tree-scalar-evolution.h,
	gimple-pretty-print.h, gimple-fold.h, gimplify-me.h.
	(split_at_bb_p, patch_loop_exit, find_or_create_guard_phi,
	split_loop, tree_ssa_split_loops,
	make_pass_loop_split): New functions.
	(pass_data_loop_split): New.
	(pass_loop_split): New.

testsuite/
	* gcc.dg/loop-split.c: New test.

Index: passes.def
===================================================================
--- passes.def	(revision 229763)
+++ passes.def	(working copy)
@@ -233,6 +233,7 @@ along with GCC; see the file COPYING3.
 	  NEXT_PASS (pass_dce);
 	  NEXT_PASS (pass_tree_unswitch);
 	  NEXT_PASS (pass_scev_cprop);
+	  NEXT_PASS (pass_loop_split);
 	  NEXT_PASS (pass_record_bounds);
 	  NEXT_PASS (pass_loop_distribution);
 	  NEXT_PASS (pass_copy_prop);
Index: timevar.def
===================================================================
--- timevar.def	(revision 229763)
+++ timevar.def	(working copy)
@@ -179,6 +179,7 @@ DEFTIMEVAR (TV_LIM                   , "
 DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
 DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
 DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
+DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
 DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
 DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
 DEFTIMEVAR (TV_TREE_VECTORIZATION    , "tree vectorization")
Index: tree-pass.h
===================================================================
--- tree-pass.h	(revision 229763)
+++ tree-pass.h	(working copy)
@@ -366,6 +366,7 @@ extern gimple_opt_pass *make_pass_tree_n
 extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
Index: tree-ssa-loop-manip.h
===================================================================
--- tree-ssa-loop-manip.h	(revision 229763)
+++ tree-ssa-loop-manip.h	(working copy)
@@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
 
 extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
 		       bool, tree *, tree *);
+extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
+					    struct loop *);
 extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
 extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
 extern void verify_loop_closed_ssa (bool);
Index: tree-ssa-loop-unswitch.c
===================================================================
--- tree-ssa-loop-unswitch.c	(revision 229763)
+++ tree-ssa-loop-unswitch.c	(working copy)
@@ -31,12 +31,20 @@ along with GCC; see the file COPYING3.
 #include "tree-ssa.h"
 #include "tree-ssa-loop-niter.h"
 #include "tree-ssa-loop.h"
+#include "tree-ssa-loop-manip.h"
 #include "tree-into-ssa.h"
+#include "cfganal.h"
 #include "cfgloop.h"
+#include "tree-chrec.h"
+#include "tree-affine.h"
+#include "tree-scalar-evolution.h"
 #include "params.h"
 #include "tree-inline.h"
 #include "gimple-iterator.h"
+#include "gimple-pretty-print.h"
 #include "cfghooks.h"
+#include "gimple-fold.h"
+#include "gimplify-me.h"
 
 /* This file implements the loop unswitching, i.e. transformation of loops like
 
@@ -842,4 +850,551 @@ make_pass_tree_unswitch (gcc::context *c
   return new pass_tree_unswitch (ctxt);
 }
 
+/* Return true when BB inside LOOP is a potential iteration space
+   split point, i.e. ends with a condition like "IV < comp", which
+   is true on one side of the iteration space and false on the other,
+   and the split point can be computed.  If so, also return the border
+   point in *BORDER and the comparison induction variable in IV.  */
 
+static tree
+split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv)
+{
+  gimple *last;
+  gcond *stmt;
+  affine_iv iv2;
+
+  /* BB must end in a simple conditional jump.  */
+  last = last_stmt (bb);
+  if (!last || gimple_code (last) != GIMPLE_COND)
+    return NULL_TREE;
+  stmt = as_a <gcond *> (last);
+
+  enum tree_code code = gimple_cond_code (stmt);
+
+  /* Only handle relational comparisons, for equality and non-equality
+     we'd have to split the loop into two loops and a middle statement.  */
+  switch (code)
+    {
+      case LT_EXPR:
+      case LE_EXPR:
+      case GT_EXPR:
+      case GE_EXPR:
+	break;
+      default:
+	return NULL_TREE;
+    }
+
+  if (loop_exits_from_bb_p (loop, bb))
+    return NULL_TREE;
+
+  tree op0 = gimple_cond_lhs (stmt);
+  tree op1 = gimple_cond_rhs (stmt);
+
+  if (!simple_iv (loop, loop, op0, iv, false))
+    return NULL_TREE;
+  if (!simple_iv (loop, loop, op1, &iv2, false))
+    return NULL_TREE;
+
+  /* Make it so, that the first argument of the condition is
+     the looping one.  */
+  if (integer_zerop (iv->step))
+    {
+      std::swap (op0, op1);
+      std::swap (*iv, iv2);
+      code = swap_tree_comparison (code);
+      gimple_cond_set_condition (stmt, code, op0, op1);
+      update_stmt (stmt);
+    }
+
+  if (integer_zerop (iv->step))
+    return NULL_TREE;
+  if (!integer_zerop (iv2.step))
+    return NULL_TREE;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Found potential split point: ");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+      fprintf (dump_file, " { ");
+      print_generic_expr (dump_file, iv->base, TDF_SLIM);
+      fprintf (dump_file, " + I*");
+      print_generic_expr (dump_file, iv->step, TDF_SLIM);
+      fprintf (dump_file, " } %s ", get_tree_code_name (code));
+      print_generic_expr (dump_file, iv2.base, TDF_SLIM);
+      fprintf (dump_file, "\n");
+    }
+
+  *border = iv2.base;
+  return op0;
+}
+
+/* Given a GUARD conditional stmt inside LOOP, which we want to make always
+   true or false depending on INITIAL_TRUE, and adjusted values NEXTVAL
+   (a post-increment IV) and NEWBOUND (the comparator) adjust the loop
+   exit test statement to loop back only if the GUARD statement will
+   also be true/false in the next iteration.  */
+
+static void
+patch_loop_exit (struct loop *loop, gcond *guard, tree nextval, tree newbound,
+		 bool initial_true)
+{
+  edge exit = single_exit (loop);
+  gcond *stmt = as_a <gcond *> (last_stmt (exit->src));
+  gimple_cond_set_condition (stmt, gimple_cond_code (guard),
+			     nextval, newbound);
+  update_stmt (stmt);
+
+  edge stay = single_pred_edge (loop->latch);
+
+  exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
+  stay->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
+
+  if (initial_true)
+    {
+      exit->flags |= EDGE_FALSE_VALUE;
+      stay->flags |= EDGE_TRUE_VALUE;
+    }
+  else
+    {
+      exit->flags |= EDGE_TRUE_VALUE;
+      stay->flags |= EDGE_FALSE_VALUE;
+    }
+}
+
+/* Given an induction variable GUARD_IV, and its affine descriptor IV,
+   find the loop phi node in LOOP defining it directly, or create
+   such phi node.  Return that phi node.  */
+
+static gphi *
+find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/)
+{
+  gimple *def = SSA_NAME_DEF_STMT (guard_iv);
+  gphi *phi;
+  if ((phi = dyn_cast <gphi *> (def))
+      && gimple_bb (phi) == loop->header)
+    return phi;
+
+  /* XXX Create the PHI instead.  */
+  return NULL;
+}
+
+/* Checks if LOOP contains a conditional block whose condition
+   depends on which side in the iteration space it is, and if so
+   splits the iteration space into two loops.  Returns true if the
+   loop was split.  NITER must contain the iteration descriptor for the
+   single exit of LOOP.  */
+
+static bool
+split_loop (struct loop *loop, struct tree_niter_desc *niter)
+{
+  basic_block *bbs;
+  unsigned i;
+  bool changed = false;
+  tree guard_iv;
+  tree border;
+  affine_iv iv;
+
+  bbs = get_loop_body (loop);
+
+  /* Find a splitting opportunity.  */
+  for (i = 0; i < loop->num_nodes; i++)
+    if ((guard_iv = split_at_bb_p (loop, bbs[i], &border, &iv)))
+      {
+	/* Handling opposite steps is not implemented yet.  Neither
+	   is handling different step sizes.  */
+	if ((tree_int_cst_sign_bit (iv.step)
+	     != tree_int_cst_sign_bit (niter->control.step))
+	    || !tree_int_cst_equal (iv.step, niter->control.step))
+	  continue;
+
+	/* Find a loop PHI node that defines guard_iv directly,
+	   or create one doing that.  */
+	gphi *phi = find_or_create_guard_phi (loop, guard_iv, &iv);
+	if (!phi)
+	  continue;
+	gcond *guard_stmt = as_a<gcond *> (last_stmt (bbs[i]));
+	tree guard_init = PHI_ARG_DEF_FROM_EDGE (phi,
+						 loop_preheader_edge (loop));
+	enum tree_code guard_code = gimple_cond_code (guard_stmt);
+
+	/* Loop splitting is implemented by versioning the loop, placing
+	   the new loop in front of the old loop, make the first loop iterate
+	   as long as the conditional stays true (or false) and let the
+	   second (original) loop handle the rest of the iterations.
+
+	   First we need to determine if the condition will start being true
+	   or false in the first loop.  */
+	bool initial_true;
+	switch (guard_code)
+	  {
+	    case LT_EXPR:
+	    case LE_EXPR:
+	      initial_true = !tree_int_cst_sign_bit (iv.step);
+	      break;
+	    case GT_EXPR:
+	    case GE_EXPR:
+	      initial_true = tree_int_cst_sign_bit (iv.step);
+	      break;
+	    default:
+	      gcc_unreachable ();
+	  }
+
+	/* Build a condition that will skip the first loop when the
+	   guard condition won't ever be true (or false).  */
+	tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
+	if (initial_true)
+	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
+
+	/* Now version the loop, we will then have this situation:
+	   if (!cond)
+	     for (...) {body}   //floop
+	   else
+	     for (...) {body}   //loop
+	   join:  */
+	initialize_original_copy_tables ();
+	basic_block cond_bb;
+	struct loop *floop = loop_version (loop, cond, &cond_bb,
+					   REG_BR_PROB_BASE, REG_BR_PROB_BASE,
+					   REG_BR_PROB_BASE, false);
+	gcc_assert (floop);
+	update_ssa (TODO_update_ssa);
+
+	/* Now diddle the exit edge of the first loop (floop->join in the
+	   above) to either go to the common exit (join) or to the second
+	   loop, depending on if there are still iterations left, or not.
+	   We split the floop exit edge and insert a copy of the
+	   original exit expression into the new block, that either
+	   skips the second loop or goes to it.  */
+	edge exit = single_exit (floop);
+	basic_block skip_bb = split_edge (exit);
+	gcond *skip_stmt;
+	gimple_stmt_iterator gsi;
+	edge new_e, skip_e;
+
+	gimple *stmt = last_stmt (exit->src);
+	skip_stmt = gimple_build_cond (gimple_cond_code (stmt),
+				       gimple_cond_lhs (stmt),
+				       gimple_cond_rhs (stmt),
+				       NULL_TREE, NULL_TREE);
+	gsi = gsi_last_bb (skip_bb);
+	gsi_insert_after (&gsi, skip_stmt, GSI_NEW_STMT);
+
+	skip_e = EDGE_SUCC (skip_bb, 0);
+	skip_e->flags &= ~EDGE_FALLTHRU;
+	new_e = make_edge (skip_bb, loop_preheader_edge (loop)->src, 0);
+	if (exit->flags & EDGE_TRUE_VALUE)
+	  {
+	    skip_e->flags |= EDGE_TRUE_VALUE;
+	    new_e->flags |= EDGE_FALSE_VALUE;
+	  }
+	else
+	  {
+	    skip_e->flags |= EDGE_FALSE_VALUE;
+	    new_e->flags |= EDGE_TRUE_VALUE;
+	  }
+
+	new_e->count = skip_bb->count;
+	new_e->probability = PROB_LIKELY;
+	new_e->count = apply_probability (skip_e->count, PROB_LIKELY);
+	skip_e->count -= new_e->count;
+	skip_e->probability = inverse_probability (PROB_LIKELY);
+
+	/* Now we have created this situation:
+	     if (!cond) {
+	       for (...) {body; if (cexit) break;}
+	       if (!cexit) goto second;
+	     } else {
+	       second:
+	       for (...) {body; if (cexit) break;}
+	     }
+	     join:
+	   
+	   The second loop can now be entered by skipping the first
+	   loop (the initial values of its PHI nodes will be the
+	   original initial values), or by falling in from the first
+	   loop (the initial values will be the continuation values
+	   from the first loop).  Insert PHI nodes reflecting this
+	   in the pre-header of the second loop.  */
+
+	basic_block rest = loop_preheader_edge (loop)->src;
+	edge skip_first = find_edge (cond_bb, rest);
+	gcc_assert (skip_first);
+
+	edge firste = loop_preheader_edge (floop);
+	edge seconde = loop_preheader_edge (loop);
+	edge firstn = loop_latch_edge (floop);
+	gphi *new_guard_phi = 0;
+	gphi_iterator psi_first, psi_second;
+	for (psi_first = gsi_start_phis (floop->header),
+	     psi_second = gsi_start_phis (loop->header);
+	     !gsi_end_p (psi_first);
+	     gsi_next (&psi_first), gsi_next (&psi_second))
+	  {
+	    tree init, next, new_init;
+	    use_operand_p op;
+	    gphi *phi_first = psi_first.phi ();
+	    gphi *phi_second = psi_second.phi ();
+
+	    if (phi_second == phi)
+	      new_guard_phi = phi_first;
+
+	    init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste);
+	    next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn);
+	    op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde);
+	    gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
+
+	    /* Prefer using original variable as a base for the new ssa name.
+	       This is necessary for virtual ops, and useful in order to avoid
+	       losing debug info for real ops.  */
+	    if (TREE_CODE (next) == SSA_NAME
+		&& useless_type_conversion_p (TREE_TYPE (next),
+					      TREE_TYPE (init)))
+	      new_init = copy_ssa_name (next);
+	    else if (TREE_CODE (init) == SSA_NAME
+		     && useless_type_conversion_p (TREE_TYPE (init),
+						   TREE_TYPE (next)))
+	      new_init = copy_ssa_name (init);
+	    else if (useless_type_conversion_p (TREE_TYPE (next),
+						TREE_TYPE (init)))
+	      new_init = make_temp_ssa_name (TREE_TYPE (next), NULL,
+					     "unrinittmp");
+	    else
+	      new_init = make_temp_ssa_name (TREE_TYPE (init), NULL,
+					     "unrinittmp");
+
+	    gphi * newphi = create_phi_node (new_init, rest);
+	    add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION);
+	    add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION);
+	    SET_USE (op, new_init);
+	  }
+
+	/* The iterations of the second loop are now already
+	   exactly those that the first loop didn't do, but the
+	   iteration space of the first loop is still the original one.
+	   Build a new one, exactly covering those iterations where
+	   the conditional is true (or false).  For example, from such a loop:
+
+	     for (i = beg, j = beg2; i < end; i++, j++)
+	       if (j < c)  // this is supposed to be true
+	         ...
+
+	   we build new bounds and change the exit conditions such that
+	   it's effectively this:
+
+	     newend = min (end+beg2-beg, c)
+	     for (i = beg, j = beg2; j < newend; i++, j++)
+	       if (j < c)
+	         ...
+
+	   Depending on the direction of the IVs and if the exit tests
+	   are strict or include equality we need to use MIN or MAX,
+	   and add or subtract 1.  */
+
+	gimple_seq stmts = NULL;
+	/* The niter structure contains the after-increment IV, we need
+	   the loop-enter base, so subtract STEP once.  */
+	tree controlbase = force_gimple_operand (niter->control.base,
+						 &stmts, true, NULL_TREE);
+	tree controlstep = niter->control.step;
+	tree enddiff;
+	if (POINTER_TYPE_P (TREE_TYPE (controlbase)))
+	  {
+	    controlstep = gimple_build (&stmts, NEGATE_EXPR,
+					TREE_TYPE (controlstep), controlstep);
+	    enddiff = gimple_build (&stmts, POINTER_PLUS_EXPR,
+				    TREE_TYPE (controlbase),
+				    controlbase, controlstep);
+	  }
+	else
+	  enddiff = gimple_build (&stmts, MINUS_EXPR,
+				  TREE_TYPE (controlbase),
+				  controlbase, controlstep);
+
+	/* Compute beg-beg2.  */
+	if (POINTER_TYPE_P (TREE_TYPE (enddiff)))
+	  {
+	    tree tem = gimple_convert (&stmts, sizetype, guard_init);
+	    tem = gimple_build (&stmts, NEGATE_EXPR, sizetype, tem);
+	    enddiff = gimple_build (&stmts, POINTER_PLUS_EXPR,
+				    TREE_TYPE (enddiff),
+				    enddiff, tem);
+	  }
+	else
+	  enddiff = gimple_build (&stmts, MINUS_EXPR, TREE_TYPE (enddiff),
+				  enddiff, guard_init);
+
+	/* Compute end-(beg-beg2).  */
+	gimple_seq stmts2;
+	tree newbound = force_gimple_operand (niter->bound, &stmts2,
+					      true, NULL_TREE);
+	gimple_seq_add_seq_without_update (&stmts, stmts2);
+
+	if (POINTER_TYPE_P (TREE_TYPE (enddiff))
+	    || POINTER_TYPE_P (TREE_TYPE (newbound)))
+	  {
+	    enddiff = gimple_convert (&stmts, sizetype, enddiff);
+	    enddiff = gimple_build (&stmts, NEGATE_EXPR, sizetype, enddiff);
+	    newbound = gimple_build (&stmts, POINTER_PLUS_EXPR,
+				     TREE_TYPE (newbound),
+				     newbound, enddiff);
+	  }
+	else
+	  newbound = gimple_build (&stmts, MINUS_EXPR, TREE_TYPE (enddiff),
+				   newbound, enddiff);
+
+	/* Depending on the direction of the IVs the new bound for the first
+	   loop is the minimum or maximum of old bound and border.
+	   Also, if the guard condition isn't strictly less or greater,
+	   we need to adjust the bound.  */ 
+	int addbound = 0;
+	enum tree_code minmax;
+	if (niter->cmp == LT_EXPR)
+	  {
+	    /* GT and LE are the same, inverted.  */
+	    if (guard_code == GT_EXPR || guard_code == LE_EXPR)
+	      addbound = -1;
+	    minmax = MIN_EXPR;
+	  }
+	else
+	  {
+	    gcc_assert (niter->cmp == GT_EXPR);
+	    if (guard_code == GE_EXPR || guard_code == LT_EXPR)
+	      addbound = 1;
+	    minmax = MAX_EXPR;
+	  }
+
+	if (addbound)
+	  {
+	    tree type2 = TREE_TYPE (newbound);
+	    if (POINTER_TYPE_P (type2))
+	      type2 = sizetype;
+	    newbound = gimple_build (&stmts,
+				     POINTER_TYPE_P (TREE_TYPE (newbound))
+				     ? POINTER_PLUS_EXPR : PLUS_EXPR,
+				     TREE_TYPE (newbound),
+				     newbound,
+				     build_int_cst (type2, addbound));
+	  }
+
+	tree newend = gimple_build (&stmts, minmax, TREE_TYPE (border),
+				    border, newbound);
+	if (stmts)
+	  gsi_insert_seq_on_edge_immediate (loop_preheader_edge (floop),
+					    stmts);
+
+	/* Now patch the exit block of the first loop to compare
+	   the post-increment value of the guarding IV with the new end
+	   value.  */
+	tree new_guard_next = PHI_ARG_DEF_FROM_EDGE (new_guard_phi,
+						     loop_latch_edge (floop));
+	patch_loop_exit (floop, guard_stmt, new_guard_next, newend,
+			 initial_true);
+
+	/* Finally patch out the two copies of the condition to be always
+	   true/false (or opposite).  */
+	gcond *force_true = as_a<gcond *> (last_stmt (get_bb_copy (bbs[i])));
+	gcond *force_false = as_a<gcond *> (last_stmt (bbs[i]));
+	if (!initial_true)
+	  std::swap (force_true, force_false);
+	gimple_cond_make_true (force_true);
+	gimple_cond_make_false (force_false);
+	update_stmt (force_true);
+	update_stmt (force_false);
+
+	free_original_copy_tables ();
+
+	/* We destroyed LCSSA form above.  Eventually we might be able
+	   to fix it on the fly, for now simply punt and use the helper.  */
+	rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, floop);
+
+	changed = true;
+	if (dump_file && (dump_flags & TDF_DETAILS))
+	  fprintf (dump_file, ";; Loop split.\n");
+
+	/* Only deal with the first opportunity.  */
+	break;
+      }
+
+  free (bbs);
+  return changed;
+}
+
+/* Main entry point.  Perform loop splitting on all suitable loops.  */
+
+static unsigned int
+tree_ssa_split_loops (void)
+{
+  struct loop *loop;
+  bool changed = false;
+
+  gcc_assert (scev_initialized_p ());
+  /* Go through all loops starting from innermost.  */
+  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
+    {
+      struct tree_niter_desc niter;
+      if (single_exit (loop)
+	  /* ??? We could handle non-empty latches when we split
+	     the latch edge (not the exit edge), and put the new
+	     exit condition in the new block.  OTOH this executes some
+	     code unconditionally that might have been skipped by the
+	     original exit before.  */
+	  && empty_block_p (loop->latch)
+	  && !optimize_loop_for_size_p (loop)
+	  && number_of_iterations_exit (loop, single_exit (loop), &niter,
+					false, true)
+	  /* We can't yet handle loops controlled by a != predicate.  */
+	  && niter.cmp != NE_EXPR)
+	changed |= split_loop (loop, &niter);
+    }
+
+  if (changed)
+    return TODO_cleanup_cfg;
+  return 0;
+}
+
+/* Loop splitting pass.  */
+
+namespace {
+
+const pass_data pass_data_loop_split =
+{
+  GIMPLE_PASS, /* type */
+  "lsplit", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_LOOP_SPLIT, /* tv_id */
+  PROP_cfg, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_loop_split : public gimple_opt_pass
+{
+public:
+  pass_loop_split (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_loop_split, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return optimize >= 2; }
+  virtual unsigned int execute (function *);
+
+}; // class pass_loop_split
+
+unsigned int
+pass_loop_split::execute (function *fun)
+{
+  if (number_of_loops (fun) <= 1)
+    return 0;
+
+  return tree_ssa_split_loops ();
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_loop_split (gcc::context *ctxt)
+{
+  return new pass_loop_split (ctxt);
+}
Index: testsuite/gcc.dg/loop-split.c
===================================================================
--- testsuite/gcc.dg/loop-split.c	(revision 0)
+++ testsuite/gcc.dg/loop-split.c	(working copy)
@@ -0,0 +1,141 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-lsplit-details" } */
+
+#ifdef __cplusplus
+extern "C" int printf (const char *, ...);
+extern "C" void abort (void);
+#else
+extern int printf (const char *, ...);
+extern void abort (void);
+#endif
+
+#ifndef TRACE
+#define TRACE 0
+#endif
+
+#define loop(beg,step,beg2,cond1,cond2) \
+    do \
+      { \
+	sum = 0; \
+        for (i = (beg), j = (beg2); (cond1); i+=(step),j+=(step)) \
+          { \
+            if (cond2) { \
+	      if (TRACE > 1) printf ("a: %d %d\n", i, j); \
+              sum += a[i]; \
+	    } else { \
+	      if (TRACE > 1) printf ("b: %d %d\n", i, j); \
+              sum += b[i]; \
+	    } \
+          } \
+	if (TRACE > 0) printf ("sum: %d\n", sum); \
+	check = check * 47 + sum; \
+      } while (0)
+
+#if 1
+unsigned __attribute__((noinline, noclone)) dotest (int beg, int end, int step,
+					       int c, int *a, int *b, int beg2)
+{
+  unsigned check = 0;
+  int sum;
+  int i, j;
+  loop (beg, 1, beg2, i < end, j < c);
+  loop (beg, 1, beg2, i <= end, j < c);
+  loop (beg, 1, beg2, i < end, j <= c);
+  loop (beg, 1, beg2, i <= end, j <= c);
+  loop (beg, 1, beg2, i < end, j > c);
+  loop (beg, 1, beg2, i <= end, j > c);
+  loop (beg, 1, beg2, i < end, j >= c);
+  loop (beg, 1, beg2, i <= end, j >= c);
+  beg2 += end-beg;
+  loop (end, -1, beg2, i >= beg, j >= c);
+  loop (end, -1, beg2, i >= beg, j > c);
+  loop (end, -1, beg2, i > beg, j >= c);
+  loop (end, -1, beg2, i > beg, j > c);
+  loop (end, -1, beg2, i >= beg, j <= c);
+  loop (end, -1, beg2, i >= beg, j < c);
+  loop (end, -1, beg2, i > beg, j <= c);
+  loop (end, -1, beg2, i > beg, j < c);
+  return check;
+}
+
+#else
+
+int __attribute__((noinline, noclone)) f (int beg, int end, int step,
+					  int c, int *a, int *b, int beg2)
+{
+  int sum = 0;
+  int i, j;
+  //for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
+  for (i = end, j = beg2 + (end-beg); i > beg; i += -1, j-- /*step*/)
+    {
+      // i - j == X --> i = X + j
+      // --> i < end == X+j < end == j < end - X
+      // --> newend = end - (i_init - j_init)
+      // j < end-X && j < c --> j < min(end-X,c)
+      // j < end-X && j <= c --> j <= min(end-X-1,c) or j < min(end-X,c+1{OF!})
+      //if (j < c)
+      if (j >= c)
+	printf ("a: %d %d\n", i, j);
+      /*else
+	printf ("b: %d %d\n", i, j);*/
+	/*sum += a[i];
+      else
+	sum += b[i];*/
+    }
+  return sum;
+}
+
+int __attribute__((noinline, noclone)) f2 (int *beg, int *end, int step,
+					  int *c, int *a, int *b, int *beg2)
+{
+  int sum = 0;
+  int *i, *j;
+  for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
+    {
+      if (j <= c)
+	printf ("%d %d\n", i - beg, j - beg);
+	/*sum += a[i];
+      else
+	sum += b[i];*/
+    }
+  return sum;
+}
+#endif
+
+extern int printf (const char *, ...);
+
+int main ()
+{
+  int a[] = {0,0,0,0,0, 1,2,3,4,5,6,7,8,9,          0,0,0,0,0};
+  int b[] = {0,0,0,0,0, -1,-2,-3,-4,-5,-6,-7,-8,-9, 0,0,0,0,0,};
+  int c;
+  int diff = 0;
+  unsigned check = 0;
+  //dotest (0, 9, 1, -1, a+5, b+5, -1);
+  //return 0;
+  //f (0, 9, 1, -1, a+5, b+5, -1);
+  //return 0;
+  for (diff = -5; diff <= 5; diff++)
+    {
+      for (c = -1; c <= 10; c++)
+	{
+#if 0
+	  int s = f (0, 9, 1, c, a+5, b+5, diff);
+	  //int s = f2 (a+0, a+9, 1, a+c, a+5, b+5, a+diff);
+	  printf ("%d ", s);
+#else
+	  if (TRACE > 0)
+	    printf ("check %d %d\n", c, diff);
+	  check = check * 51 + dotest (0, 9, 1, c, a+5, b+5, diff);
+#endif
+	}
+      //printf ("\n");
+    }
+  //printf ("%u\n", check);
+  if (check != 3213344948)
+    abort ();
+  return 0;
+}
+
+/* All 16 loops in dotest should be split.  */
+/* { dg-final { scan-tree-dump-times "Loop split" 16 "lsplit" } } */
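
The bound computation described in the split_loop comment above
(newend = min (end+beg2-beg, c)) can be checked by hand in plain C.  The
following is only a sketch for the forward-running, strict "<" case; the
function names and the swept ranges are made up for the illustration, and
the bound is computed in long long purely so that the sketch itself cannot
overflow.

```c
#include <assert.h>

/* Original form: control IV i, guard on a second IV j that advances in
   lockstep with i.  */
int sum_guarded2 (int beg, int end, int beg2, int c,
                  const int *a, const int *b)
{
  int sum = 0;
  for (int i = beg, j = beg2; i < end; i++, j++)
    if (j < c)
      sum += a[i];
    else
      sum += b[i];
  return sum;
}

/* Hand-split form.  Since j - i == beg2 - beg throughout, the exit test
   i < end is equivalent to j < end + beg2 - beg.  The first loop may run
   only while both that test and the guard j < c hold, hence the min.  */
int sum_split2 (int beg, int end, int beg2, int c,
                const int *a, const int *b)
{
  int sum = 0;
  long long jend = (long long) end + beg2 - beg;
  long long newend = jend < c ? jend : c;
  int i = beg, j = beg2;
  for (; j < newend; i++, j++)
    sum += a[i];
  /* Remaining iterations: the guard is known to be false from here on.  */
  for (; i < end; i++, j++)
    sum += b[i];
  return sum;
}

/* Compare both forms over a range of offsets and borders.
   Returns the number of cases checked.  */
int check_split2 (void)
{
  const int a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  const int b[8] = {-1, -2, -3, -4, -5, -6, -7, -8};
  int cases = 0;
  for (int beg = 0; beg <= 8; beg++)
    for (int end = 0; end <= 8; end++)
      for (int beg2 = -5; beg2 <= 5; beg2++)
        for (int c = -1; c <= 10; c++)
          {
            assert (sum_guarded2 (beg, end, beg2, c, a, b)
                    == sum_split2 (beg, end, beg2, c, a, b));
            cases++;
          }
  return cases;
}
```

Non-strict guards such as j <= c and backward-running IVs need the MIN/MAX
choice and the adjustment by one that the comment mentions; those variants
are not shown here.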


* Re: Gimple loop splitting
  2015-11-12 16:52 Gimple loop splitting Michael Matz
@ 2015-11-12 21:44 ` Jeff Law
  2015-11-16 16:06   ` Michael Matz
  2016-07-25  7:00 ` Gimple loop splitting Andrew Pinski
From: Jeff Law @ 2015-11-12 21:44 UTC
  To: Michael Matz, gcc-patches

On 11/12/2015 09:52 AM, Michael Matz wrote:
> Hello,
>
> this new pass implements loop iteration space splitting for loops that
> contain a conditional that's always true for one part of the iteration
> space and false for the other, i.e. such situations:
FWIW, Ajit suggested the same transformation earlier this year.  During 
that discussion Richi indicated that for hmmer this transformation would 
enable vectorization.

>
> This transformation is in itself a good one but can also be an enabler for
> the vectorizer.
Agreed.


>   It does increase code size, when the loop body contains
> also unconditional code (that one is duplicated), so we only transform hot
> loops.
Probably ought to be disabled when we're not optimizing for speed as well.




>   I'm a bit unsure of the placement of the new pass, or if it should
> be an own pass at all.  Right now I've placed it after unswitching and
> scev_cprop, before loop distribution.  Ideally I think all three, together
> with loop fusion and an gimple unroller should be integrated into one loop
> nest optimizer, alas, we aren't there yet.
Given its impact on the looping structure, I'd think early in the loop 
optimizer.  Given the similarities with unswitching, I think 
before/after unswitching is a natural first cut.  We can always iterate 
if it looks like putting it elsewhere would make sense.



> I've regstrapped this pass enabled with -O2 on x86-64-linux, without
> regressions.  I've also checked cpu2006 (the non-fortran part) for
> correctness, not yet for performance.  In the end it should probably only
> be enabled for -O3+ (although if the whole loop body is conditional it
> makes sense to also have it with -O2 because code growth is very small
> then).
Very curious on the performance side, so if you could get some #s on 
that, it'd be greatly appreciated.

I'd be comfortable with this at -O2, but won't object if you'd prefer -O3.


>
> So, okay for trunk?
>
>
> Ciao,
> Michael.
> 	* passes.def (pass_loop_split): Add.
> 	* timevar.def (TV_LOOP_SPLIT): Add.
> 	* tree-pass.h (make_pass_loop_split): Declare.
> 	* tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
> 	* tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
> 	cfganal.h, tree-chrec.h, tree-affine.h, tree-scalar-evolution.h,
> 	gimple-pretty-print.h, gimple-fold.h, gimplify-me.h.
> 	(split_at_bb_p, patch_loop_exit, find_or_create_guard_phi,
> 	split_loop, tree_ssa_split_loops,
> 	make_pass_loop_split): New functions.
> 	(pass_data_loop_split): New.
> 	(pass_loop_split): New.
>
> testsuite/
> 	* gcc.dg/loop-split.c: New test.
Please clean up the #if 0/#if 1 code in the new tests.  You might also 
want to clean out the TRACE stuff.  Essentially the tests look like you 
just dropped in a test you'd been running by hand until now :-)

I don't see any negative tests -- ie tests that should not be split due 
to boundary conditions.  Do you have any from development?  If so it'd 
be good to have those too.
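
As a rough sketch of what such negative cases might look like, going only
by the checks visible in the posted patch: split_at_bb_p accepts only
<, <=, >, >= guards, and split_loop skips guards whose IV step differs from
the control IV's step.  The function names below are made up, and whether
the pass really leaves these loops alone would still have to be confirmed
against the lsplit dump.

```c
#include <assert.h>

/* Equality guard: true for at most one iteration, so splitting would
   need two loops plus a middle statement.  Presumably not split.  */
int sum_eq_guard (int beg, int end, int p, const int *a, const int *b)
{
  int sum = 0;
  for (int i = beg; i < end; i++)
    if (i == p)
      sum += a[i];
    else
      sum += b[i];
  return sum;
}

/* Guard IV j advances by 2 while the control IV i advances by 1.
   Differing step sizes are listed as not implemented in split_loop.  */
int sum_step_mismatch (int beg, int end, int c, const int *a, const int *b)
{
  int sum = 0;
  for (int i = beg, j = 0; i < end; i++, j += 2)
    if (j < c)
      sum += a[i];
    else
      sum += b[i];
  return sum;
}

/* Runtime sanity check of both functions; returns the number of checks.  */
int check_negative_cases (void)
{
  const int a[4] = {1, 2, 3, 4};
  const int b[4] = {10, 20, 30, 40};
  assert (sum_eq_guard (0, 4, 2, a, b) == 10 + 20 + 3 + 40);
  assert (sum_step_mismatch (0, 4, 3, a, b) == 1 + 2 + 30 + 40);
  return 2;
}
```

A real testsuite entry would presumably also verify that the dump contains
no "Loop split" line for these functions; that part is left out here.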

>
> Index: tree-ssa-loop-manip.h
> ===================================================================
> --- tree-ssa-loop-manip.h	(revision 229763)
> +++ tree-ssa-loop-manip.h	(working copy)
> @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
>
>   extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
>   		       bool, tree *, tree *);
> +extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
> +					    struct loop *);
>   extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
>   extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
>   extern void verify_loop_closed_ssa (bool);
> Index: tree-ssa-loop-unswitch.c
> ===================================================================
> --- tree-ssa-loop-unswitch.c	(revision 229763)
> +++ tree-ssa-loop-unswitch.c	(working copy)
Given the amount of new code, unless there's a strong need, I'd prefer 
this transformation to be implemented in its own file.



> +
> +/* Given an induction variable GUARD_IV, and its affine descriptor IV,
> +   find the loop phi node in LOOP defining it directly, or create
> +   such phi node.  Return that phi node.  */
> +
> +static gphi *
> +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/)
> +{
> +  gimple *def = SSA_NAME_DEF_STMT (guard_iv);
> +  gphi *phi;
> +  if ((phi = dyn_cast <gphi *> (def))
> +      && gimple_bb (phi) == loop->header)
> +    return phi;
> +
> +  /* XXX Create the PHI instead.  */
> +  return NULL;
So right now we just punt if we need to create the PHI?  Does that 
happen with any kind of regularity in practice?


> +}
> +
> +/* Checks if LOOP contains a conditional block whose condition
> +   depends on which side in the iteration space it is, and if so
> +   splits the iteration space into two loops.  Returns true if the
> +   loop was split.  NITER must contain the iteration descriptor for the
> +   single exit of LOOP.  */
> +
> +static bool
> +split_loop (struct loop *loop, struct tree_niter_desc *niter)
This should probably be broken up a bit more.  It's loooong as-is.

Without looking at how much stuff would have to be passed around, 
diddling the exit edge of the first loop, phi updates for the 2nd loop, 
fix iteration space of 2nd loop, exit block fixup might be a good 
initial cut at breaking this down into something of manageable size. 
Not sure if the setup and initial versioning should be broken out or not.


> +	initialize_original_copy_tables ();
> +	basic_block cond_bb;
> +	struct loop *floop = loop_version (loop, cond, &cond_bb,
> +					   REG_BR_PROB_BASE, REG_BR_PROB_BASE,
> +					   REG_BR_PROB_BASE, false);
> +	gcc_assert (floop);
> +	update_ssa (TODO_update_ssa);
> +
> +	/* Now diddle the exit edge of the first loop (floop->join in the
> +	   above) to either go to the common exit (join) or to the second
> +	   loop, depending on if there are still iterations left, or not.
> +	   We split the floop exit edge and insert a copy of the
> +	   original exit expression into the new block, that either
> +	   skips the second loop or goes to it.  */
So after diddling, haven't we mucked up the dominator tree and the SSA 
graph?   You're iterating over each PHI in two loop headers and fixing 
the SSA graph by hand AFAICT.   But ISTM the dominator tree is still 
mucked up, right?  I'm thinking specifically about the 2nd loop.  Though 
perhaps it just works since after all your transformations it'll still 
be immediately dominated by the same block as before your transformations.

Overall I think this looks real good.  The biggest problem IMHO is 
breaking down that monster function a bit.  I'm a bit concerned by the 
dominator tree state.  Worst case is we have to rebuild the dominators 
before ensuring we're LCSSA form, and even that doesn't seem too bad. 
As I mentioned, it may actually be the case that we're OK on the 
dominator tree, kind of by accident more than design.


Jeff




* Re: Gimple loop splitting
  2015-11-12 21:44 ` Jeff Law
@ 2015-11-16 16:06   ` Michael Matz
  2015-11-16 23:27     ` Jeff Law
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Matz @ 2015-11-16 16:06 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Hi,

On Thu, 12 Nov 2015, Jeff Law wrote:

> > this new pass implements loop iteration space splitting for loops that
> > contain a conditional that's always true for one part of the iteration
> > space and false for the other, i.e. such situations:
> FWIW, Ajit suggested the same transformation earlier this year.  During that
> discussion Richi indicated that for hmmer this transformation would enable
> vectorization.

It's a prerequisite indeed, but not enough in itself.  The next problem 
will be that only parts of access chains inside the hot loop are 
vectorizable, but for that those parts need to be disambiguated.  ICC is 
doing that by a massive chain of conditionals testing non-overlapping of 
the respective arrays at runtime.  Our vectorizer could also do that 
(possibly by increasing the allowed number of conditionals), but the next 
problem then is that one of these (then indeed separated) parts is not 
vectorizable by our vectorizer: it's a 'a[i] = f(a[i-1])' dependency that 
can't yet be handled by us.  If the separation of parts would be done by 
loop distribution that would be fine (we'd have separate loops for the 
parts, some of them vectorizable), but our loop distribution can't do 
runtime disambiguation, only our vectorizer.
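
For readers unfamiliar with the term, runtime disambiguation boils down to loop versioning behind an overlap test.  The following is only a hand-written sketch of that idea, not what ICC or our vectorizer actually emit; the names ranges_disjoint and add_arrays are made up, and the address arithmetic is assumed not to wrap around:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if the two byte ranges [a, a+bytes) and
   [b, b+bytes) do not overlap.  Assumes the additions do not wrap.  */
static int
ranges_disjoint (const void *a, const void *b, size_t bytes)
{
  uintptr_t pa = (uintptr_t) a;
  uintptr_t pb = (uintptr_t) b;
  return pa + bytes <= pb || pb + bytes <= pa;
}

/* Hypothetical example loop, versioned by hand.  Both versions compute
   the same thing; the point is that only the first one is known to be
   free of cross-iteration dependencies and hence a vectorization
   candidate.  */
void
add_arrays (int *dst, const int *src, size_t n)
{
  if (ranges_disjoint (dst, src, n * sizeof *dst))
    {
      /* No store to dst can feed a later load from src.  */
      for (size_t i = 0; i < n; i++)
	dst[i] += src[i];
    }
  else
    {
      /* Possible overlap: keep strict sequential order.  With
	 dst == src + 1 this is exactly an 'a[i] = f(a[i-1])' chain.  */
      for (size_t i = 0; i < n; i++)
	dst[i] += src[i];
    }
}
```

With many arrays in one loop the number of such pairwise tests grows quickly, which is the "massive chain of conditionals" mentioned above.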

hmmer is actually quite interesting because it's a fairly isolated hot 
loop posing quite challenging problems for us :)

> 
>   It does increase code size, when the loop body contains
> > also unconditional code (that one is duplicated), so we only transform hot
> > loops.
> 
> Probably ought to be disabled when we're not optimizing for speed as well.

That should be dealt with by '!optimize_loop_for_size_p (loop)'.

> > I've regstrapped this pass enabled with -O2 on x86-64-linux, without
> > regressions.  I've also checked cpu2006 (the non-fortran part) for
> > correctness, not yet for performance.  In the end it should probably only
> > be enabled for -O3+ (although if the whole loop body is conditional it
> > makes sense to also have it with -O2 because code growth is very small
> > then).
> 
> Very curious on the performance side, so if you could get some #s on that,
> it'd be greatly appreciated.

My test machine misbehaved over the weekend, but as soon as I have them 
I'll update here.

> > testsuite/
> > 	* gcc.dg/loop-split.c: New test.
> 
> Please clean up the #if 0/#if 1 code in the new tests.

Actually I'd prefer if that test contains the by-hand code and the TRACE 
stuff as well, I'd only change the #if 0 into some #if BYHAND or so ...

> You might also want to clean out the TRACE stuff.  Essentially the tests 
> look like you just dropped in a test you'd been running by hand until 
> now :-)

... the reason being, that bugs in the splitter are somewhat unwieldy to 
debug by just staring at the dumps, you only get a checksum mismatch, so 
TRACE=1 is for finding out which of the params and loops is actually 
miscompiled, TRACE=2 for finding the specific iteration that's broken, and 
the #if0 code for putting that situation into a non-macroized and smaller 
function than dotest.  (That's actually how I've run the testcase after I 
had it basically working, extending dotest with a couple more lines, aka 
example loop situations, adjusting the checksum, and then making a face and 
scratching my head and mucking with the TRACE and #if0 macros :) ).
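
To illustrate the workflow, here is a hypothetical miniature of such a checksum test with trace levels.  It is not the actual gcc.dg/loop-split.c; the names mix and run_loop and the checksum formula are invented for this sketch:

```c
#include <assert.h>
#include <stdio.h>

/* Trace level: 0 = quiet, 1 = one line per loop, 2 = one line per
   iteration.  Override with -DTRACE=1 or -DTRACE=2.  */
#ifndef TRACE
#define TRACE 0
#endif

/* Hypothetical checksum step: folds the loop id, the iteration and the
   branch that was taken into the running sum.  */
static unsigned
mix (unsigned sum, int loop_id, int i, int taken)
{
  return sum * 31u + (unsigned) (loop_id * 1000 + i * 2 + taken);
}

/* One example loop of the splittable shape.  A miscompiled split shows
   up as a different checksum; the trace output narrows it down.  */
unsigned
run_loop (int loop_id, int beg, int end, int p)
{
  unsigned sum = 0;
  for (int i = beg; i < end; i++)
    {
      int taken = i < p;
      sum = mix (sum, loop_id, i, taken);
#if TRACE >= 2
      printf ("loop %d: i=%d taken=%d sum=%u\n", loop_id, i, taken, sum);
#endif
    }
#if TRACE >= 1
  printf ("loop %d (beg=%d end=%d p=%d): checksum %u\n",
	  loop_id, beg, end, p, sum);
#endif
  return sum;
}
```

Level 1 tells which loop and parameter set disagrees with the expected checksum, level 2 which iteration went wrong.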

> I don't see any negative tests -- ie tests that should not be split due 
> to boundary conditions.  Do you have any from development?

Good point, I had some but only ones where I was able to extend the 
splitters to cover them.  I'll think of some that really shouldn't be 
split.

> > Index: tree-ssa-loop-unswitch.c
> > ===================================================================
> > --- tree-ssa-loop-unswitch.c	(revision 229763)
> > +++ tree-ssa-loop-unswitch.c	(working copy)
> Given the amount of new code, unless there's a strong need, I'd prefer this
> transformation to be implemented in its own file.

Okay.

> > +/* Given an induction variable GUARD_IV, and its affine descriptor IV,
> > +   find the loop phi node in LOOP defining it directly, or create
> > +   such phi node.  Return that phi node.  */
> > +
> > +static gphi *
> > +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv *
> > /*iv*/)
> > +{
> > +  gimple *def = SSA_NAME_DEF_STMT (guard_iv);
> > +  gphi *phi;
> > +  if ((phi = dyn_cast <gphi *> (def))
> > +      && gimple_bb (phi) == loop->header)
> > +    return phi;
> > +
> > +  /* XXX Create the PHI instead.  */
> > +  return NULL;
> 
> So right now we just punt if we need to create the PHI?  Does that 
> happen with any kind of regularity in practice?

Only with such situations:

  for (int i = start; i < end; i++) {
    if (i + offset < bound)
      ...
  }

Here the condition-IV is not directly defined by a PHI node.  If it 
happens often I don't know, I guess the usual situation is testing the 
control IV directly.  The deficiency is not hard to fix.
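
As a source-level illustration of that shape, here is a standalone sketch, not code from the patch.  The function names and the loop bodies are made up, and 'i + offset' and 'bound - offset' are assumed not to overflow.  The split point for such a guard is 'bound - offset', clamped to the end of the iteration space:

```c
#include <assert.h>

/* Original form: the guard tests i + offset, not the control IV.  */
static long
original (int start, int end, int offset, int bound)
{
  long sum = 0;
  for (int i = start; i < end; i++)
    {
      if (i + offset < bound)
	sum += 3 * i;		/* stands in for "dothis" */
      else
	sum -= i;		/* stands in for "dothat" */
    }
  return sum;
}

/* Hand-split form of the same loop.  */
static long
split_by_hand (int start, int end, int offset, int bound)
{
  long sum = 0;
  /* i + offset < bound  <=>  i < bound - offset (no overflow assumed).  */
  int border = bound - offset;
  if (border > end)
    border = end;
  int i = start;
  /* If border <= start this loop runs zero times.  */
  for (; i < border; i++)
    sum += 3 * i;
  for (; i < end; i++)
    sum -= i;
  return sum;
}

/* Compare both forms over a small grid of parameters, including
   borders outside the iteration space and empty loops.  */
void
check_split (void)
{
  for (int start = -3; start <= 3; start++)
    for (int end = -3; end <= 6; end++)
      for (int offset = -4; offset <= 4; offset++)
	for (int bound = -8; bound <= 8; bound++)
	  assert (original (start, end, offset, bound)
		  == split_by_hand (start, end, offset, bound));
}
```

The pass itself works on GIMPLE and needs the header PHI of the guard IV to do the equivalent rewrite, which is the part that is currently punted on.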

> > +static bool
> > +split_loop (struct loop *loop, struct tree_niter_desc *niter)
> This should probably be broken up a bit more.  It's loooong as-is.
> 
> Without looking at how much stuff would have to be passed around, 
> diddling the exit edge of the first loop, phi updates for the 2nd loop, 
> fix iteration space of 2nd loop, exit block fixup might be a good 
> initial cut at breaking this down into something of manageable size.

Thanks, I'll do that.

> > +	initialize_original_copy_tables ();
> > +	basic_block cond_bb;
> > +	struct loop *floop = loop_version (loop, cond, &cond_bb,
> > +					   REG_BR_PROB_BASE, REG_BR_PROB_BASE,
> > +					   REG_BR_PROB_BASE, false);
> > +	gcc_assert (floop);
> > +	update_ssa (TODO_update_ssa);
> > +
> > +	/* Now diddle the exit edge of the first loop (floop->join in the
> > +	   above) to either go to the common exit (join) or to the second
> > +	   loop, depending on if there are still iterations left, or not.
> > +	   We split the floop exit edge and insert a copy of the
> > +	   original exit expression into the new block, that either
> > +	   skips the second loop or goes to it.  */
> 
> So after diddling, haven't we mucked up the dominator tree and the SSA 
> graph? You're iterating over each PHI in two loop headers and fixing the 
> SSA graph by hand AFAICT.  But ISTM the dominator tree is still mucked 
> up, right?

I think I convinced myself on paper that the dominator tree is correct due 
to our helpers doing the right thing (loop_version() for the initial 
loop copying and split_edge for the above diddling).  Let's see if I can 
paint some ASCII art.  So, after loop_version (which updates dom) we 
have:

               .------if (cond)-------.
               v                      v
             pre1                   pre2
              |                      |
             h1<----.               h2<----.
              |     |                |     |
          .--ex1    |        .------ex2    |
          |    \    |        |        \    |
          |    l1---'        |        l2---'
          |                  |
          |                  |
          '--X--------->join<'

At this point dominators are all correct (due to loop_version updating 
them), in particular dom(pre1)==dom(pre2)==if(cond).  Now we split 
ex1->join at X, and split_edge also updates them (trivially), but we 
insert a new edge from split_bb to pre2.  There are no paths from region2 
into region1, and anything in region2 except pre2 is still dominated by 
pre2 (or something further down), so if anything changes, then dom(pre2).

               .------if (cond)----.
               v                   |
             pre1                  |
              |                    |
             h1<----.              |
              |     |              |
             ex1    |              |
              | \   |              |
              |  l1-'              |
              v                    |
          .-split-----------.      |
          |                 v      |
          |               pre2<----'
          |                |
          |               h2<----.
          |                |     |
          |               ex2    |
          |                | \   |
          |                | l2--'
          |              .-'
          '------>join<--'

But there's a path directly to pre2, skipping whole region1, so dom(pre2) 
must be still if(cond), as originally.  Also dom(join) doesn't change, 
because what was first a normal diamond between 
if(cond),region1,region2,join now is a meddled diamond with paths from 
region1 to region2, but not back, so the dominator of the join block still 
is the if(cond) block.

This is all true if the internal structure of region1/region2 is sensible, 
and single_exit() regions are such.  Even multiple exits to something 
behind join wouldn't change this, but we don't even have to think about 
this.

In addition, anything not updating dominators correctly would scream 
loudly in the verifier.

The SSA tree is correct after loop_version() and split_edge.  The new edge 
split_bb->pre2 needs the adjustments in that loop over loop PHI nodes.  
That walk must catch everything, if it wouldn't then that would mean a use 
in region2 that's defined in region1, that wasn't originally dominated by 
the def (and hence must have been a loop-carried value and hence be 
defined in the loop header PHI block).
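
The dominator claim can also be cross-checked mechanically.  The sketch below is standalone C, not GCC code: it encodes the second diagram as an edge list (block names follow the ASCII art, everything else is invented for illustration) and computes dominators with the textbook iterative bitset algorithm:

```c
#include <assert.h>

/* Blocks of the CFG after split_edge and the new split->pre2 edge.  */
enum block { COND, PRE1, H1, EX1, L1, SPLIT, PRE2, H2, EX2, L2, JOIN,
	     NBLOCKS };

static const int edges[][2] = {
  { COND, PRE1 }, { COND, PRE2 },
  { PRE1, H1 }, { H1, EX1 }, { EX1, L1 }, { L1, H1 },
  { EX1, SPLIT }, { SPLIT, PRE2 }, { SPLIT, JOIN },
  { PRE2, H2 }, { H2, EX2 }, { EX2, L2 }, { L2, H2 },
  { EX2, JOIN },
};
enum { NEDGES = sizeof edges / sizeof edges[0] };

/* dom[b] becomes the bitmask of all blocks dominating b (b included).
   Start from "everything dominates everything" and shrink to the
   fixed point of dom[b] = {b} | intersection of dom[pred].  */
static void
compute_dominators (unsigned dom[NBLOCKS])
{
  const unsigned all = (1u << NBLOCKS) - 1;
  for (int b = 0; b < NBLOCKS; b++)
    dom[b] = all;
  dom[COND] = 1u << COND;	/* the entry dominates only itself */

  int changed = 1;
  while (changed)
    {
      changed = 0;
      for (int b = 0; b < NBLOCKS; b++)
	{
	  if (b == COND)
	    continue;
	  unsigned meet = all;
	  for (int e = 0; e < NEDGES; e++)
	    if (edges[e][1] == b)
	      meet &= dom[edges[e][0]];
	  unsigned next = meet | (1u << b);
	  if (next != dom[b])
	    {
	      dom[b] = next;
	      changed = 1;
	    }
	}
    }
}

/* The immediate dominator of B is the strict dominator whose own
   dominator set equals the set of all strict dominators of B.
   Returns -1 for the entry block.  */
static int
immediate_dominator (const unsigned dom[NBLOCKS], int b)
{
  unsigned strict = dom[b] & ~(1u << b);
  for (int d = 0; d < NBLOCKS; d++)
    if (((strict >> d) & 1) && dom[d] == strict)
      return d;
  return -1;
}

void
check_split_cfg (void)
{
  unsigned dom[NBLOCKS];
  compute_dominators (dom);
  assert (immediate_dominator (dom, PRE2) == COND);
  assert (immediate_dominator (dom, JOIN) == COND);
  assert (immediate_dominator (dom, SPLIT) == EX1);
  assert (immediate_dominator (dom, H2) == PRE2);
}
```

The first two assertions are the statements argued above: pre2 and join keep the if(cond) block as immediate dominator even with the extra edge from the split block.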

> Overall I think this looks real good.  The biggest problem IMHO is 
> breaking down that monster function a bit.  I'm a bit concerned by the 
> dominator tree state.  Worst case is we have to rebuild the dominators 
> before ensuring we're LCSSA form, and even that doesn't seem too bad.

Actually keeping LCSSA form correct is doable as well, but needs another 
loop over one or the other PHI nodes.  I punted for now and called 
rewrite_into_loop_closed_ssa_1, which actually isn't too expensive for a 
single loop.

> As I mentioned, it may actually be the case that we're OK on the 
> dominator tree, kind of by accident more than design.

I'm pretty sure it is correct, and it is so by design :)

Thanks for the feedback, I'll update the patch accordingly.

Ciao,
Michael.


* Re: Gimple loop splitting
  2015-11-16 16:06   ` Michael Matz
@ 2015-11-16 23:27     ` Jeff Law
  2015-12-01 16:47       ` Gimple loop splitting v2 Michael Matz
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Law @ 2015-11-16 23:27 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc-patches

On 11/16/2015 09:05 AM, Michael Matz wrote:
> It's a prerequisite indeed, but not enough in itself.
Sigh.  OK.  One can always hope.



>
> hmmer is actually quite interesting because it's a fairly isolated hot
> loop posing quite challenging problems for us :)
Sounds like it.  Essentially, it's a TODO list :-)

>> Probably ought to be disabled when we're not optimizing for speed as well.
>
> That should be dealt with by '!optimize_loop_for_size_p (loop)'.
Doh, must have missed that.

>>
>> Please clean up the #if 0/#if 1 code in the new tests.
>
> Actually I'd prefer if that test contains the by-hand code and the TRACE
> stuff as well, I'd only change the #if 0 into some #if BYHAND or so ...
>
>> You might also want to clean out the TRACE stuff.  Essentially the tests
>> look like you just dropped in a test you'd been running by hand until
>> now :-)
>
> ... the reason being, that bugs in the splitter are somewhat unwieldy to
> debug by just staring at the dumps, you only get a checksum mismatch, so
> TRACE=1 is for finding out which of the params and loops is actually
> miscompiled, TRACE=2 for finding the specific iteration that's broken, and
> the #if0 code for putting that situation into a non-macroized and smaller
> function than dotest.  (That's actually how I've run the testcase after I
> had it basically working, extending dotest with a couple more lines, aka
> example loop situations, adjusting the checksum, and then making a face and
> scratching my head and mucking with the TRACE and #if0 macros :) ).
OK, if you want to keep them, then  have a consistent way to turn them 
on/off for future debugging.  if0/if1 doesn't provide much of a clue to 
someone else what to turn on/off if they need to debug this stuff.

>
>> I don't see any negative tests -- ie tests that should not be split due
>> to boundary conditions.  Do you have any from development?
>
> Good point, I had some but only ones where I was able to extend the
> splitters to cover them.  I'll think of some that really shouldn't be
> split.
If you've got them, certainly add them.  Though I realize they may get 
lost over time.

>
> Only with such situations:
>
>    for (int i = start; i < end; i++) {
>      if (i + offset < bound)
>        ...
>    }
>
> Here the condition-IV is not directly defined by a PHI node.  If it
> happens often I don't know, I guess the usual situation is testing the
> control IV directly.  The deficiency is not hard to fix.
I'm comfortable waiting until we see the need.

> I think I convinced myself on paper that the dominator tree is correct due
> to our helpers doing the right thing (loop_version() for the initial
> loop copying and split_edge for the above diddling).  Let's see if I can
> paint some ASCII art.  So, after loop_version (which updates dom) we
> have:
OK.  I was worried about the next step -- where we insert the 
conditional on the exit from pre1 to have it transfer to join or pre2.

But in that case, the immediate dominator of pre2 & join is still the 
initial if statement.  So I think we're OK.  That was the conclusion I 
was starting to come to yesterday, having the ascii art makes it pretty 
clear.  I'm just not good at conceptualizing a CFG.  I have to see it 
explicitly and then everything seems so clear and simple.

jeff


* Gimple loop splitting v2
  2015-11-16 23:27     ` Jeff Law
@ 2015-12-01 16:47       ` Michael Matz
  2015-12-01 22:57         ` Jeff Law
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Matz @ 2015-12-01 16:47 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Hi,

On Mon, 16 Nov 2015, Jeff Law wrote:

> OK, if you want to keep them, then have a consistent way to turn them 
> on/off for future debugging.  if0/if1 doesn't provide much of a clue to 
> someone else what to turn on/off if they need to debug this stuff.

> > > I don't see any negative tests -- ie tests that should not be split 
> > > due to boundary conditions.  Do you have any from development?
> > 
> > Good point, I had some but only ones where I was able to extend the 
> > splitters to cover them.  I'll think of some that really shouldn't be 
> > split.
> If you've got them, certainly add them.  Though I realize they may get 
> lost over time.

Actually, thinking a bit more about this, I don't have any that wouldn't 
be merely restrictions in the implementation that couldn't be lifted in 
the future (e.g. unequal step sizes), so I've added no additional ones.

> But in that case, the immediate dominator of pre2 & join is still the 
> initial if statement.  So I think we're OK.  That was the conclusion I 
> was starting to come to yesterday, having the ascii art makes it pretty 
> clear.  I'm just not good at conceptualizing a CFG.  I have to see it 
> explicitly and then everything seems so clear and simple.

So, this second version should reflect the review.  I've moved everything 
to a new file, split the long function into several logically separate 
ones, and even included ascii art in the comments :)  The testcase got a 
comment about what to #define for debugging.  I've enabled the pass at 
-O3, or alternatively when profile-use is on, similar to -funswitch-loops.  
I've also added a proper -fsplit-loops option.

There are two functional changes in v2: a bugfix to not try splitting a 
non-iterating loop (irritatingly such a loop returns true from 
number_of_iterations_exit, but with an ERROR_MARK comparator), and a 
limitation to avoid combinatorial explosion in artificial testcases: once 
we have done a splitting, we don't do any in that loop's parents (we may 
still do splitting in siblings or children of siblings).

I've also done some measurements: first, bootstrap time is unaffected, and 
regstrapping succeeds without regressions when I activate the pass by 
default.  Then SPECcpu2006: build times are unaffected, everything builds 
and also works with -fsplit-loops, and performance is mostly unaffected.  
Base is -Ofast -funroll-loops -fpeel-loops, peak adds -fsplit-loops.

                                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
400.perlbench    9770        325       30.1 *    9770        323       30.3 *  
401.bzip2        9650        382       25.2 *    9650        382       25.3 *  
403.gcc          8050        242       33.3 *    8050        241       33.4 *  
429.mcf          9120        311       29.3 *    9120        311       29.3 *  
445.gobmk       10490        392       26.8 *   10490        391       26.8 *  
456.hmmer        9330        345       27.0 *    9330        342       27.3 *  
458.sjeng       12100        422       28.7 *   12100        420       28.8 *  
462.libquantum  20720        308       67.3 *   20720        308       67.3 *  
464.h264ref     22130        423       52.3 *   22130        423       52.3 *  
471.omnetpp      6250        273       22.9 *    6250        273       22.9 *  
473.astar        7020        311       22.6 *    7020        311       22.6 *  
483.xalancbmk    6900        191       36.2 *    6900        190       36.2 *  
 Est. SPECint_base2006                 31.7
 Est. SPECint2006                                                      31.7

                                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
410.bwaves      13590        235       57.7 *   13590        235       57.8 *  
416.gamess                                  NR                              NR 
433.milc         9180        347       26.5 *    9180        345       26.6 *  
434.zeusmp       9100        269       33.9 *    9100        268       33.9 *  
435.gromacs      7140        260       27.4 *    7140        262       27.3 *  
436.cactusADM   11950        237       50.5 *   11950        240       49.9 *  
437.leslie3d     9400        228       41.3 *    9400        228       41.2 *  
444.namd         8020        312       25.7 *    8020        311       25.7 *  
447.dealII      11440        254       45.0 *   11440        254       45.0 *  
450.soplex       8340        201       41.4 *    8340        202       41.4 *  
453.povray                                  NR                              NR 
454.calculix     8250        282       29.2 *    8250        283       29.2 *  
459.GemsFDTD    10610        310       34.3 *   10610        309       34.3 *  
465.tonto        9840        683       14.4 *    9840        684       14.4 *  
470.lbm         13740        224       61.2 *   13740        224       61.3 *  
481.wrf         11170        291       38.4 *   11170        291       38.4 *  
482.sphinx3     19490        377       51.7 *   19490        377       51.6 *  
 Est. SPECfp_base2006                  36.3
 Est. SPECfp2006                                                       36.3

The 1% improvements and degradations are all inside the normal result 
variations on this machine (I have the feeling that the hmmer improvement 
is stable, and will recheck this).  Not all of the above had loops split 
at all, only: SPECint: 400.perlbench, 403.gcc, 445.gobmk, 456.hmmer, 
462.libquantum, 464.h264ref, 471.omnetpp and SPECfp: 435.gromacs, 
436.cactusADM, 447.dealII, 454.calculix.

So, okay for trunk?


Ciao,
Michael.


* Re: Gimple loop splitting v2
  2015-12-01 16:47       ` Gimple loop splitting v2 Michael Matz
@ 2015-12-01 22:57         ` Jeff Law
  2015-12-02 13:23           ` Michael Matz
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Law @ 2015-12-01 22:57 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc-patches

On 12/01/2015 09:46 AM, Michael Matz wrote:
> Hi,
>
> So, okay for trunk?
-ENOPATCH

Jeff


* Re: Gimple loop splitting v2
  2015-12-01 22:57         ` Jeff Law
@ 2015-12-02 13:23           ` Michael Matz
  2015-12-05  7:55             ` Jeff Law
  2016-07-25 20:57             ` Andrew Pinski
  0 siblings, 2 replies; 20+ messages in thread
From: Michael Matz @ 2015-12-02 13:23 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Hi,

On Tue, 1 Dec 2015, Jeff Law wrote:

> > So, okay for trunk?
> -ENOPATCH

Sigh :)
Here it is.


Ciao,
Michael.
	* common.opt (-fsplit-loops): New flag.
	* passes.def (pass_loop_split): Add.
	* opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
	(enable_fdo_optimizations): Add loop splitting.
	* timevar.def (TV_LOOP_SPLIT): Add.
	* tree-pass.h (make_pass_loop_split): Declare.
	* tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
	* tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
	* tree-ssa-loop-split.c: New file.
	* Makefile.in (OBJS): Add tree-ssa-loop-split.o.
	* doc/invoke.texi (fsplit-loops): Document.
	* doc/passes.texi (Loop optimization): Add paragraph about loop
	splitting.

testsuite/
	* gcc.dg/loop-split.c: New test.

Index: common.opt
===================================================================
--- common.opt	(revision 231115)
+++ common.opt	(working copy)
@@ -2453,6 +2457,10 @@ funswitch-loops
 Common Report Var(flag_unswitch_loops) Optimization
 Perform loop unswitching.
 
+fsplit-loops
+Common Report Var(flag_split_loops) Optimization
+Perform loop splitting.
+
 funwind-tables
 Common Report Var(flag_unwind_tables) Optimization
 Just generate unwind tables for exception handling.
Index: passes.def
===================================================================
--- passes.def	(revision 231115)
+++ passes.def	(working copy)
@@ -252,6 +252,7 @@ along with GCC; see the file COPYING3.
 	  NEXT_PASS (pass_dce);
 	  NEXT_PASS (pass_tree_unswitch);
 	  NEXT_PASS (pass_scev_cprop);
+	  NEXT_PASS (pass_loop_split);
 	  NEXT_PASS (pass_record_bounds);
 	  NEXT_PASS (pass_loop_distribution);
 	  NEXT_PASS (pass_copy_prop);
Index: opts.c
===================================================================
--- opts.c	(revision 231115)
+++ opts.c	(working copy)
@@ -532,6 +532,7 @@ static const struct default_options defa
        regardless of them being declared inline.  */
     { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },
     { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 },
+    { OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
@@ -1411,6 +1412,8 @@ enable_fdo_optimizations (struct gcc_opt
     opts->x_flag_ipa_cp_alignment = value;
   if (!opts_set->x_flag_predictive_commoning)
     opts->x_flag_predictive_commoning = value;
+  if (!opts_set->x_flag_split_loops)
+    opts->x_flag_split_loops = value;
   if (!opts_set->x_flag_unswitch_loops)
     opts->x_flag_unswitch_loops = value;
   if (!opts_set->x_flag_gcse_after_reload)
Index: timevar.def
===================================================================
--- timevar.def	(revision 231115)
+++ timevar.def	(working copy)
@@ -182,6 +182,7 @@ DEFTIMEVAR (TV_LIM                   , "
 DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
 DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
 DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
+DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
 DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
 DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
 DEFTIMEVAR (TV_TREE_VECTORIZATION    , "tree vectorization")
Index: tree-pass.h
===================================================================
--- tree-pass.h	(revision 231115)
+++ tree-pass.h	(working copy)
@@ -370,6 +370,7 @@ extern gimple_opt_pass *make_pass_tree_n
 extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
Index: tree-ssa-loop-manip.h
===================================================================
--- tree-ssa-loop-manip.h	(revision 231115)
+++ tree-ssa-loop-manip.h	(working copy)
@@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
 
 extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
 		       bool, tree *, tree *);
+extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
+					    struct loop *);
 extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
 extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
 extern void verify_loop_closed_ssa (bool);
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 231115)
+++ Makefile.in	(working copy)
@@ -1474,6 +1474,7 @@ OBJS = \
 	tree-ssa-loop-manip.o \
 	tree-ssa-loop-niter.o \
 	tree-ssa-loop-prefetch.o \
+	tree-ssa-loop-split.o \
 	tree-ssa-loop-unswitch.o \
 	tree-ssa-loop.o \
 	tree-ssa-math-opts.o \
Index: tree-ssa-loop-split.c
===================================================================
--- tree-ssa-loop-split.c	(revision 0)
+++ tree-ssa-loop-split.c	(working copy)
@@ -0,0 +1,686 @@
+/* Loop splitting.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "fold-const.h"
+#include "tree-cfg.h"
+#include "tree-ssa.h"
+#include "tree-ssa-loop-niter.h"
+#include "tree-ssa-loop.h"
+#include "tree-ssa-loop-manip.h"
+#include "tree-into-ssa.h"
+#include "cfgloop.h"
+#include "tree-scalar-evolution.h"
+#include "gimple-iterator.h"
+#include "gimple-pretty-print.h"
+#include "cfghooks.h"
+#include "gimple-fold.h"
+#include "gimplify-me.h"
+
+/* This file implements loop splitting, i.e. transformation of loops like
+
+   for (i = 0; i < 100; i++)
+     {
+       if (i < 50)
+         A;
+       else
+         B;
+     }
+
+   into:
+
+   for (i = 0; i < 50; i++)
+     {
+       A;
+     }
+   for (; i < 100; i++)
+     {
+       B;
+     }
+
+   */
+
+/* Return true when BB inside LOOP is a potential iteration space
+   split point, i.e. ends with a condition like "IV < comp", which
+   is true on one side of the iteration space and false on the other,
+   and the split point can be computed.  If so, also return the border
+   point in *BORDER and the comparison induction variable in IV.  */
+
+static tree
+split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv)
+{
+  gimple *last;
+  gcond *stmt;
+  affine_iv iv2;
+
+  /* BB must end in a simple conditional jump.  */
+  last = last_stmt (bb);
+  if (!last || gimple_code (last) != GIMPLE_COND)
+    return NULL_TREE;
+  stmt = as_a <gcond *> (last);
+
+  enum tree_code code = gimple_cond_code (stmt);
+
+  /* Only handle relational comparisons, for equality and non-equality
+     we'd have to split the loop into two loops and a middle statement.  */
+  switch (code)
+    {
+      case LT_EXPR:
+      case LE_EXPR:
+      case GT_EXPR:
+      case GE_EXPR:
+	break;
+      default:
+	return NULL_TREE;
+    }
+
+  if (loop_exits_from_bb_p (loop, bb))
+    return NULL_TREE;
+
+  tree op0 = gimple_cond_lhs (stmt);
+  tree op1 = gimple_cond_rhs (stmt);
+
+  if (!simple_iv (loop, loop, op0, iv, false))
+    return NULL_TREE;
+  if (!simple_iv (loop, loop, op1, &iv2, false))
+    return NULL_TREE;
+
+  /* Make it so that the first argument of the condition is
+     the looping one.  */
+  if (!integer_zerop (iv2.step))
+    {
+      std::swap (op0, op1);
+      std::swap (*iv, iv2);
+      code = swap_tree_comparison (code);
+      gimple_cond_set_condition (stmt, code, op0, op1);
+      update_stmt (stmt);
+    }
+  else if (integer_zerop (iv->step))
+    return NULL_TREE;
+  if (!integer_zerop (iv2.step))
+    return NULL_TREE;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Found potential split point: ");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+      fprintf (dump_file, " { ");
+      print_generic_expr (dump_file, iv->base, TDF_SLIM);
+      fprintf (dump_file, " + I*");
+      print_generic_expr (dump_file, iv->step, TDF_SLIM);
+      fprintf (dump_file, " } %s ", get_tree_code_name (code));
+      print_generic_expr (dump_file, iv2.base, TDF_SLIM);
+      fprintf (dump_file, "\n");
+    }
+
+  *border = iv2.base;
+  return op0;
+}
+
+/* Given a GUARD conditional stmt inside LOOP, which we want to make always
+   true or false depending on INITIAL_TRUE, and adjusted values NEXTVAL
+   (a post-increment IV) and NEWBOUND (the comparator) adjust the loop
+   exit test statement to loop back only if the GUARD statement will
+   also be true/false in the next iteration.  */
+
+static void
+patch_loop_exit (struct loop *loop, gcond *guard, tree nextval, tree newbound,
+		 bool initial_true)
+{
+  edge exit = single_exit (loop);
+  gcond *stmt = as_a <gcond *> (last_stmt (exit->src));
+  gimple_cond_set_condition (stmt, gimple_cond_code (guard),
+			     nextval, newbound);
+  update_stmt (stmt);
+
+  edge stay = single_pred_edge (loop->latch);
+
+  exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
+  stay->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
+
+  if (initial_true)
+    {
+      exit->flags |= EDGE_FALSE_VALUE;
+      stay->flags |= EDGE_TRUE_VALUE;
+    }
+  else
+    {
+      exit->flags |= EDGE_TRUE_VALUE;
+      stay->flags |= EDGE_FALSE_VALUE;
+    }
+}
+
+/* Given an induction variable GUARD_IV, and its affine descriptor IV,
+   find the loop phi node in LOOP defining it directly, or create
+   such phi node.  Return that phi node.  */
+
+static gphi *
+find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/)
+{
+  gimple *def = SSA_NAME_DEF_STMT (guard_iv);
+  gphi *phi;
+  if ((phi = dyn_cast <gphi *> (def))
+      && gimple_bb (phi) == loop->header)
+    return phi;
+
+  /* XXX Create the PHI instead.  */
+  return NULL;
+}
+
+/* This function updates the SSA form after connect_loops made a new
+   edge NEW_E leading from LOOP1 exit to LOOP2 (via an intermediate
+   conditional).  I.e. the second loop can now be entered either
+   via the original entry or via NEW_E, so the entry values of LOOP2
+   phi nodes are either the original ones or those at the exit
+   of LOOP1.  Insert new phi nodes in LOOP2 pre-header reflecting
+   this.  */
+
+static void
+connect_loop_phis (struct loop *loop1, struct loop *loop2, edge new_e)
+{
+  basic_block rest = loop_preheader_edge (loop2)->src;
+  gcc_assert (new_e->dest == rest);
+  edge skip_first = EDGE_PRED (rest, EDGE_PRED (rest, 0) == new_e);
+
+  edge firste = loop_preheader_edge (loop1);
+  edge seconde = loop_preheader_edge (loop2);
+  edge firstn = loop_latch_edge (loop1);
+  gphi_iterator psi_first, psi_second;
+  for (psi_first = gsi_start_phis (loop1->header),
+       psi_second = gsi_start_phis (loop2->header);
+       !gsi_end_p (psi_first);
+       gsi_next (&psi_first), gsi_next (&psi_second))
+    {
+      tree init, next, new_init;
+      use_operand_p op;
+      gphi *phi_first = psi_first.phi ();
+      gphi *phi_second = psi_second.phi ();
+
+      init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste);
+      next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn);
+      op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde);
+      gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
+
+      /* Prefer using original variable as a base for the new ssa name.
+	 This is necessary for virtual ops, and useful in order to avoid
+	 losing debug info for real ops.  */
+      if (TREE_CODE (next) == SSA_NAME
+	  && useless_type_conversion_p (TREE_TYPE (next),
+					TREE_TYPE (init)))
+	new_init = copy_ssa_name (next);
+      else if (TREE_CODE (init) == SSA_NAME
+	       && useless_type_conversion_p (TREE_TYPE (init),
+					     TREE_TYPE (next)))
+	new_init = copy_ssa_name (init);
+      else if (useless_type_conversion_p (TREE_TYPE (next),
+					  TREE_TYPE (init)))
+	new_init = make_temp_ssa_name (TREE_TYPE (next), NULL,
+				       "unrinittmp");
+      else
+	new_init = make_temp_ssa_name (TREE_TYPE (init), NULL,
+				       "unrinittmp");
+
+      gphi * newphi = create_phi_node (new_init, rest);
+      add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION);
+      add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION);
+      SET_USE (op, new_init);
+    }
+}
+
+/* The two loops LOOP1 and LOOP2 were just created by loop versioning,
+   they are still equivalent and placed in two arms of a diamond, like so:
+
+               .------if (cond)------.
+               v                     v
+             pre1                   pre2
+              |                      |
+        .--->h1                     h2<----.
+        |     |                      |     |
+        |    ex1---.            .---ex2    |
+        |    /     |            |     \    |
+        '---l1     X            |     l2---'
+                   |            |
+                   |            |
+                   '--->join<---'
+
+   This function transforms the program such that LOOP1 is conditionally
+   falling through to LOOP2, or skipping it.  This is done by splitting
+   the ex1->join edge at X in the diagram above, and inserting a condition
+   whose one arm goes to pre2, resulting in this situation:
+   
+               .------if (cond)------.
+               v                     v
+             pre1       .---------->pre2
+              |         |            |
+        .--->h1         |           h2<----.
+        |     |         |            |     |
+        |    ex1---.    |       .---ex2    |
+        |    /     v    |       |     \    |
+        '---l1   skip---'       |     l2---'
+                   |            |
+                   |            |
+                   '--->join<---'
+
+   
+   The condition used is the exit condition of LOOP1, which effectively means
+   that when the first loop exits (for whatever reason) but the real original
+   exit expression is still false the second loop will be entered.
+   The function returns the new edge cond->pre2.
+   
+   This doesn't update the SSA form, see connect_loop_phis for that.  */
+
+static edge
+connect_loops (struct loop *loop1, struct loop *loop2)
+{
+  edge exit = single_exit (loop1);
+  basic_block skip_bb = split_edge (exit);
+  gcond *skip_stmt;
+  gimple_stmt_iterator gsi;
+  edge new_e, skip_e;
+
+  gimple *stmt = last_stmt (exit->src);
+  skip_stmt = gimple_build_cond (gimple_cond_code (stmt),
+				 gimple_cond_lhs (stmt),
+				 gimple_cond_rhs (stmt),
+				 NULL_TREE, NULL_TREE);
+  gsi = gsi_last_bb (skip_bb);
+  gsi_insert_after (&gsi, skip_stmt, GSI_NEW_STMT);
+
+  skip_e = EDGE_SUCC (skip_bb, 0);
+  skip_e->flags &= ~EDGE_FALLTHRU;
+  new_e = make_edge (skip_bb, loop_preheader_edge (loop2)->src, 0);
+  if (exit->flags & EDGE_TRUE_VALUE)
+    {
+      skip_e->flags |= EDGE_TRUE_VALUE;
+      new_e->flags |= EDGE_FALSE_VALUE;
+    }
+  else
+    {
+      skip_e->flags |= EDGE_FALSE_VALUE;
+      new_e->flags |= EDGE_TRUE_VALUE;
+    }
+
+  new_e->count = skip_bb->count;
+  new_e->probability = PROB_LIKELY;
+  new_e->count = apply_probability (skip_e->count, PROB_LIKELY);
+  skip_e->count -= new_e->count;
+  skip_e->probability = inverse_probability (PROB_LIKELY);
+
+  return new_e;
+}
+
+/* This returns the new bound for iterations given the original iteration
+   space in NITER, an arbitrary new bound BORDER, assumed to be some
+   comparison value with a different IV, the initial value GUARD_INIT of
+   that other IV, and the comparison code GUARD_CODE that compares
+   that other IV with BORDER.  We return an SSA name, and place any
+   necessary statements for that computation into *STMTS.
+
+   For example for such a loop:
+
+     for (i = beg, j = guard_init; i < end; i++, j++)
+       if (j < border)  // this is supposed to be true/false
+         ...
+
+   we want to return a new bound (on j) that makes the loop iterate
+   as long as the condition j < border stays true.  We also don't want
+   to iterate more often than the original loop, so we have to introduce
+   some cut-off as well (via min/max), effectively resulting in:
+
+     newend = min (end+guard_init-beg, border)
+     for (i = beg, j = guard_init; j < newend; i++, j++)
+       if (j < border)
+         ...
+
+   Depending on the direction of the IVs and if the exit tests
+   are strict or non-strict we need to use MIN or MAX,
+   and add or subtract 1.  This routine computes newend above.  */
+
+static tree
+compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter,
+			 tree border,
+			 enum tree_code guard_code, tree guard_init)
+{
+  /* The niter structure contains the after-increment IV, we need
+     the loop-enter base, so subtract STEP once.  */
+  tree controlbase = force_gimple_operand (niter->control.base,
+					   stmts, true, NULL_TREE);
+  tree controlstep = niter->control.step;
+  tree enddiff;
+  if (POINTER_TYPE_P (TREE_TYPE (controlbase)))
+    {
+      controlstep = gimple_build (stmts, NEGATE_EXPR,
+				  TREE_TYPE (controlstep), controlstep);
+      enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
+			      TREE_TYPE (controlbase),
+			      controlbase, controlstep);
+    }
+  else
+    enddiff = gimple_build (stmts, MINUS_EXPR,
+			    TREE_TYPE (controlbase),
+			    controlbase, controlstep);
+
+  /* Compute beg-guard_init.  */
+  if (POINTER_TYPE_P (TREE_TYPE (enddiff)))
+    {
+      tree tem = gimple_convert (stmts, sizetype, guard_init);
+      tem = gimple_build (stmts, NEGATE_EXPR, sizetype, tem);
+      enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
+			      TREE_TYPE (enddiff),
+			      enddiff, tem);
+    }
+  else
+    enddiff = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
+			    enddiff, guard_init);
+
+  /* Compute end-(beg-guard_init).  */
+  gimple_seq stmts2;
+  tree newbound = force_gimple_operand (niter->bound, &stmts2,
+					true, NULL_TREE);
+  gimple_seq_add_seq_without_update (stmts, stmts2);
+
+  if (POINTER_TYPE_P (TREE_TYPE (enddiff))
+      || POINTER_TYPE_P (TREE_TYPE (newbound)))
+    {
+      enddiff = gimple_convert (stmts, sizetype, enddiff);
+      enddiff = gimple_build (stmts, NEGATE_EXPR, sizetype, enddiff);
+      newbound = gimple_build (stmts, POINTER_PLUS_EXPR,
+			       TREE_TYPE (newbound),
+			       newbound, enddiff);
+    }
+  else
+    newbound = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
+			     newbound, enddiff);
+
+  /* Depending on the direction of the IVs the new bound for the first
+     loop is the minimum or maximum of old bound and border.
+     Also, if the guard condition isn't strictly less or greater,
+     we need to adjust the bound.  */ 
+  int addbound = 0;
+  enum tree_code minmax;
+  if (niter->cmp == LT_EXPR)
+    {
+      /* GT and LE are the same, inverted.  */
+      if (guard_code == GT_EXPR || guard_code == LE_EXPR)
+	addbound = -1;
+      minmax = MIN_EXPR;
+    }
+  else
+    {
+      gcc_assert (niter->cmp == GT_EXPR);
+      if (guard_code == GE_EXPR || guard_code == LT_EXPR)
+	addbound = 1;
+      minmax = MAX_EXPR;
+    }
+
+  if (addbound)
+    {
+      tree type2 = TREE_TYPE (newbound);
+      if (POINTER_TYPE_P (type2))
+	type2 = sizetype;
+      newbound = gimple_build (stmts,
+			       POINTER_TYPE_P (TREE_TYPE (newbound))
+			       ? POINTER_PLUS_EXPR : PLUS_EXPR,
+			       TREE_TYPE (newbound),
+			       newbound,
+			       build_int_cst (type2, addbound));
+    }
+
+  tree newend = gimple_build (stmts, minmax, TREE_TYPE (border),
+			      border, newbound);
+  return newend;
+}
+
+/* Checks if LOOP1 contains a conditional block whose condition
+   depends on which side in the iteration space it is, and if so
+   splits the iteration space into two loops.  Returns true if the
+   loop was split.  NITER must contain the iteration descriptor for the
+   single exit of LOOP1.  */
+
+static bool
+split_loop (struct loop *loop1, struct tree_niter_desc *niter)
+{
+  basic_block *bbs;
+  unsigned i;
+  bool changed = false;
+  tree guard_iv;
+  tree border;
+  affine_iv iv;
+
+  bbs = get_loop_body (loop1);
+
+  /* Find a splitting opportunity.  */
+  for (i = 0; i < loop1->num_nodes; i++)
+    if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
+      {
+	/* Handling opposite steps is not implemented yet.  Neither
+	   is handling different step sizes.  */
+	if ((tree_int_cst_sign_bit (iv.step)
+	     != tree_int_cst_sign_bit (niter->control.step))
+	    || !tree_int_cst_equal (iv.step, niter->control.step))
+	  continue;
+
+	/* Find a loop PHI node that defines guard_iv directly,
+	   or create one doing that.  */
+	gphi *phi = find_or_create_guard_phi (loop1, guard_iv, &iv);
+	if (!phi)
+	  continue;
+	gcond *guard_stmt = as_a<gcond *> (last_stmt (bbs[i]));
+	tree guard_init = PHI_ARG_DEF_FROM_EDGE (phi,
+						 loop_preheader_edge (loop1));
+	enum tree_code guard_code = gimple_cond_code (guard_stmt);
+
+	/* Loop splitting is implemented by versioning the loop, placing
+	   the new loop after the old loop, making the first loop iterate
+	   as long as the conditional stays true (or false) and letting the
+	   second (new) loop handle the rest of the iterations.
+
+	   First we need to determine if the condition will start being true
+	   or false in the first loop.  */
+	bool initial_true;
+	switch (guard_code)
+	  {
+	    case LT_EXPR:
+	    case LE_EXPR:
+	      initial_true = !tree_int_cst_sign_bit (iv.step);
+	      break;
+	    case GT_EXPR:
+	    case GE_EXPR:
+	      initial_true = tree_int_cst_sign_bit (iv.step);
+	      break;
+	    default:
+	      gcc_unreachable ();
+	  }
+
+	/* Build a condition that will skip the first loop when the
+	   guard condition won't ever be true (or false).  */
+	gimple_seq stmts2;
+	border = force_gimple_operand (border, &stmts2, true, NULL_TREE);
+	if (stmts2)
+	  gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
+					    stmts2);
+	tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
+	if (!initial_true)
+	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
+
+	/* Now version the loop, placing loop2 after loop1 connecting
+	   them, and fix up SSA form for that.  */
+	initialize_original_copy_tables ();
+	basic_block cond_bb;
+	struct loop *loop2 = loop_version (loop1, cond, &cond_bb,
+					   REG_BR_PROB_BASE, REG_BR_PROB_BASE,
+					   REG_BR_PROB_BASE, true);
+	gcc_assert (loop2);
+	update_ssa (TODO_update_ssa);
+
+	edge new_e = connect_loops (loop1, loop2);
+	connect_loop_phis (loop1, loop2, new_e);
+
+	/* The iterations of the second loop are now already
+	   exactly those that the first loop didn't do, but the
+	   iteration space of the first loop is still the original one.
+	   Compute the new bound for the guarding IV and patch the
+	   loop exit to use it instead of original IV and bound.  */
+	gimple_seq stmts = NULL;
+	tree newend = compute_new_first_bound (&stmts, niter, border,
+					       guard_code, guard_init);
+	if (stmts)
+	  gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
+					    stmts);
+	tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
+	patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
+
+	/* Finally patch out the two copies of the condition to be always
+	   true/false (or opposite).  */
+	gcond *force_true = as_a<gcond *> (last_stmt (bbs[i]));
+	gcond *force_false = as_a<gcond *> (last_stmt (get_bb_copy (bbs[i])));
+	if (!initial_true)
+	  std::swap (force_true, force_false);
+	gimple_cond_make_true (force_true);
+	gimple_cond_make_false (force_false);
+	update_stmt (force_true);
+	update_stmt (force_false);
+
+	free_original_copy_tables ();
+
+	/* We destroyed LCSSA form above.  Eventually we might be able
+	   to fix it on the fly, for now simply punt and use the helper.  */
+	rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
+
+	changed = true;
+	if (dump_file && (dump_flags & TDF_DETAILS))
+	  fprintf (dump_file, ";; Loop split.\n");
+
+	/* Only deal with the first opportunity.  */
+	break;
+      }
+
+  free (bbs);
+  return changed;
+}
+
+/* Main entry point.  Perform loop splitting on all suitable loops.  */
+
+static unsigned int
+tree_ssa_split_loops (void)
+{
+  struct loop *loop;
+  bool changed = false;
+
+  gcc_assert (scev_initialized_p ());
+  FOR_EACH_LOOP (loop, 0)
+    loop->aux = NULL;
+
+  /* Go through all loops starting from innermost.  */
+  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
+    {
+      struct tree_niter_desc niter;
+      if (loop->aux)
+	{
+	  /* If any of our inner loops was split, don't split us,
+	     and mark our containing loop as having had splits as well.  */
+	  loop_outer (loop)->aux = loop;
+	  continue;
+	}
+
+      if (single_exit (loop)
+	  /* ??? We could handle non-empty latches when we split
+	     the latch edge (not the exit edge), and put the new
+	     exit condition in the new block.  OTOH this executes some
+	     code unconditionally that might have been skipped by the
+	     original exit before.  */
+	  && empty_block_p (loop->latch)
+	  && !optimize_loop_for_size_p (loop)
+	  && number_of_iterations_exit (loop, single_exit (loop), &niter,
+					false, true)
+	  && niter.cmp != ERROR_MARK
+	  /* We can't yet handle loops controlled by a != predicate.  */
+	  && niter.cmp != NE_EXPR)
+	{
+	  if (split_loop (loop, &niter))
+	    {
+	      /* Mark our containing loop as having had some split inner
+	         loops.  */
+	      loop_outer (loop)->aux = loop;
+	      changed = true;
+	    }
+	}
+    }
+
+  FOR_EACH_LOOP (loop, 0)
+    loop->aux = NULL;
+
+  if (changed)
+    return TODO_cleanup_cfg;
+  return 0;
+}
+
+/* Loop splitting pass.  */
+
+namespace {
+
+const pass_data pass_data_loop_split =
+{
+  GIMPLE_PASS, /* type */
+  "lsplit", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_LOOP_SPLIT, /* tv_id */
+  PROP_cfg, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_loop_split : public gimple_opt_pass
+{
+public:
+  pass_loop_split (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_loop_split, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return flag_split_loops != 0; }
+  virtual unsigned int execute (function *);
+
+}; // class pass_loop_split
+
+unsigned int
+pass_loop_split::execute (function *fun)
+{
+  if (number_of_loops (fun) <= 1)
+    return 0;
+
+  return tree_ssa_split_loops ();
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_loop_split (gcc::context *ctxt)
+{
+  return new pass_loop_split (ctxt);
+}
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 231115)
+++ doc/invoke.texi	(working copy)
@@ -446,7 +446,7 @@ Objective-C and Objective-C++ Dialects}.
 -fselective-scheduling -fselective-scheduling2 @gol
 -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol
 -fsemantic-interposition -fshrink-wrap -fsignaling-nans @gol
--fsingle-precision-constant -fsplit-ivs-in-unroller @gol
+-fsingle-precision-constant -fsplit-ivs-in-unroller -fsplit-loops @gol
 -fsplit-paths @gol
 -fsplit-wide-types -fssa-backprop -fssa-phiopt @gol
 -fstack-protector -fstack-protector-all -fstack-protector-strong @gol
@@ -10197,6 +10197,11 @@ Enabled with @option{-fprofile-use}.
 Enables the loop invariant motion pass in the RTL loop optimizer.  Enabled
 at level @option{-O1}
 
+@item -fsplit-loops
+@opindex fsplit-loops
+Split a loop into two if it contains a condition that's always true
+for one side of the iteration space and false for the other.
+
 @item -funswitch-loops
 @opindex funswitch-loops
 Move branches with loop invariant conditions out of the loop, with duplicates
Index: doc/passes.texi
===================================================================
--- doc/passes.texi	(revision 231115)
+++ doc/passes.texi	(working copy)
@@ -484,6 +484,12 @@ out of the loops.  To achieve this, a du
 each possible outcome of conditional jump(s).  The pass is implemented in
 @file{tree-ssa-loop-unswitch.c}.
 
+Loop splitting.  If a loop contains a conditional statement that is
+always true for one part of the iteration space and false for the other,
+this pass splits the loop into two, one dealing with one side, the other
+only with the other, thereby removing one inner-loop conditional.  The
+pass is implemented in @file{tree-ssa-loop-split.c}.
+
 The optimizations also use various utility functions contained in
 @file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and
 @file{cfgloopmanip.c}.
Index: testsuite/gcc.dg/loop-split.c
===================================================================
--- testsuite/gcc.dg/loop-split.c	(revision 0)
+++ testsuite/gcc.dg/loop-split.c	(working copy)
@@ -0,0 +1,147 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
+
+#ifdef __cplusplus
+extern "C" int printf (const char *, ...);
+extern "C" void abort (void);
+#else
+extern int printf (const char *, ...);
+extern void abort (void);
+#endif
+
+/* Define TRACE to 1 or 2 to get detailed tracing.
+   Define SINGLE_TEST to 1 or 2 to get a simple routine with
+   just one loop, called only one time or with multiple parameters,
+   to make debugging easier.  */
+#ifndef TRACE
+#define TRACE 0
+#endif
+
+#define loop(beg,step,beg2,cond1,cond2) \
+    do \
+      { \
+	sum = 0; \
+        for (i = (beg), j = (beg2); (cond1); i+=(step),j+=(step)) \
+          { \
+            if (cond2) { \
+	      if (TRACE > 1) printf ("a: %d %d\n", i, j); \
+              sum += a[i]; \
+	    } else { \
+	      if (TRACE > 1) printf ("b: %d %d\n", i, j); \
+              sum += b[i]; \
+	    } \
+          } \
+	if (TRACE > 0) printf ("sum: %d\n", sum); \
+	check = check * 47 + sum; \
+      } while (0)
+
+#ifndef SINGLE_TEST
+unsigned __attribute__((noinline, noclone)) dotest (int beg, int end, int step,
+					       int c, int *a, int *b, int beg2)
+{
+  unsigned check = 0;
+  int sum;
+  int i, j;
+  loop (beg, 1, beg2, i < end, j < c);
+  loop (beg, 1, beg2, i <= end, j < c);
+  loop (beg, 1, beg2, i < end, j <= c);
+  loop (beg, 1, beg2, i <= end, j <= c);
+  loop (beg, 1, beg2, i < end, j > c);
+  loop (beg, 1, beg2, i <= end, j > c);
+  loop (beg, 1, beg2, i < end, j >= c);
+  loop (beg, 1, beg2, i <= end, j >= c);
+  beg2 += end-beg;
+  loop (end, -1, beg2, i >= beg, j >= c);
+  loop (end, -1, beg2, i >= beg, j > c);
+  loop (end, -1, beg2, i > beg, j >= c);
+  loop (end, -1, beg2, i > beg, j > c);
+  loop (end, -1, beg2, i >= beg, j <= c);
+  loop (end, -1, beg2, i >= beg, j < c);
+  loop (end, -1, beg2, i > beg, j <= c);
+  loop (end, -1, beg2, i > beg, j < c);
+  return check;
+}
+
+#else
+
+int __attribute__((noinline, noclone)) f (int beg, int end, int step,
+					  int c, int *a, int *b, int beg2)
+{
+  int sum = 0;
+  int i, j;
+  //for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
+  for (i = end, j = beg2 + (end-beg); i > beg; i += -1, j-- /*step*/)
+    {
+      // i - j == X --> i = X + j
+      // --> i < end == X+j < end == j < end - X
+      // --> newend = end - (i_init - j_init)
+      // j < end-X && j < c --> j < min(end-X,c)
+      // j < end-X && j <= c --> j <= min(end-X-1,c) or j < min(end-X,c+1{OF!})
+      //if (j < c)
+      if (j >= c)
+	printf ("a: %d %d\n", i, j);
+      /*else
+	printf ("b: %d %d\n", i, j);*/
+	/*sum += a[i];
+      else
+	sum += b[i];*/
+    }
+  return sum;
+}
+
+int __attribute__((noinline, noclone)) f2 (int *beg, int *end, int step,
+					  int *c, int *a, int *b, int *beg2)
+{
+  int sum = 0;
+  int *i, *j;
+  for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
+    {
+      if (j <= c)
+	printf ("%d %d\n", i - beg, j - beg);
+	/*sum += a[i];
+      else
+	sum += b[i];*/
+    }
+  return sum;
+}
+#endif
+
+extern int printf (const char *, ...);
+
+int main ()
+{
+  int a[] = {0,0,0,0,0, 1,2,3,4,5,6,7,8,9,          0,0,0,0,0};
+  int b[] = {0,0,0,0,0, -1,-2,-3,-4,-5,-6,-7,-8,-9, 0,0,0,0,0,};
+  int c;
+  int diff = 0;
+  unsigned check = 0;
+#if defined(SINGLE_TEST) && (SINGLE_TEST == 1)
+  //dotest (0, 9, 1, -1, a+5, b+5, -1);
+  //return 0;
+  f (0, 9, 1, 5, a+5, b+5, -1);
+  return 0;
+#endif
+  for (diff = -5; diff <= 5; diff++)
+    {
+      for (c = -1; c <= 10; c++)
+	{
+#ifdef SINGLE_TEST
+	  int s = f (0, 9, 1, c, a+5, b+5, diff);
+	  //int s = f2 (a+0, a+9, 1, a+c, a+5, b+5, a+diff);
+	  printf ("%d ", s);
+#else
+	  if (TRACE > 0)
+	    printf ("check %d %d\n", c, diff);
+	  check = check * 51 + dotest (0, 9, 1, c, a+5, b+5, diff);
+#endif
+	}
+      //printf ("\n");
+    }
+  //printf ("%u\n", check);
+  if (check != 3213344948)
+    abort ();
+  return 0;
+}
+
+/* All 16 loops in dotest should be split.  */
+/* { dg-final { scan-tree-dump-times "Loop split" 16 "lsplit" } } */
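
As a source-level illustration of what compute_new_first_bound is after, here is a minimal sketch in plain C. It is not GIMPLE and not part of the patch. It hand-splits the forward-running "j < border" case from the comment above that function, using newend = min (end+guard_init-beg, border). The names sum_original, sum_split, min_int, check_split, tab_a and tab_b are made up for the example, and overflow in end+guard_init-beg is simply ignored.

```c
#include <assert.h>

/* Small data tables, hypothetical and only used by this sketch.  */
static const int tab_a[9] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
static const int tab_b[9] = { -1, -2, -3, -4, -5, -6, -7, -8, -9 };

/* Original loop shape: I is the control IV, J a second IV stepping in
   lockstep, and the guard "j < border" selects between A and B.  */
static int
sum_original (int beg, int end, int guard_init, int border,
              const int *a, const int *b)
{
  int sum = 0;
  for (int i = beg, j = guard_init; i < end; i++, j++)
    if (j < border)
      sum += a[i];
    else
      sum += b[i];
  return sum;
}

static int
min_int (int x, int y)
{
  return x < y ? x : y;
}

/* Hand-split version.  The first loop runs only while the guard is true
   and the original exit test would still hold; the second loop picks up
   at the same I and J and handles the remaining iterations.  */
static int
sum_split (int beg, int end, int guard_init, int border,
           const int *a, const int *b)
{
  int sum = 0;
  int i = beg, j = guard_init;
  /* Since j - guard_init == i - beg, "i < end" is the same test as
     "j < end + guard_init - beg".  */
  int newend = min_int (end + guard_init - beg, border);

  for (; j < newend; i++, j++)
    sum += a[i];                /* Guard known true here.  */
  for (; i < end; i++, j++)
    sum += b[i];                /* Guard known false here.  */
  return sum;
}

/* Compare both versions over the iteration space 0..8.  */
static void
check_split (int border, int guard_init)
{
  assert (sum_original (0, 9, guard_init, border, tab_a, tab_b)
          == sum_split (0, 9, guard_init, border, tab_a, tab_b));
}
```

Calling check_split over a range of border and guard_init values, much like main in the testcase does with c and diff, exercises the border cases where one of the two loops doesn't run at all. The non-strict guards and the backward-running IVs need the MAX and the +-1 adjustments described in the comment, which this sketch leaves out.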

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Gimple loop splitting v2
  2015-12-02 13:23           ` Michael Matz
@ 2015-12-05  7:55             ` Jeff Law
  2016-10-20 14:43               ` Michael Matz
  2016-07-25 20:57             ` Andrew Pinski
  1 sibling, 1 reply; 20+ messages in thread
From: Jeff Law @ 2015-12-05  7:55 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc-patches

On 12/02/2015 06:23 AM, Michael Matz wrote:
> Hi,
>
> On Tue, 1 Dec 2015, Jeff Law wrote:
>
>>> So, okay for trunk?
>> -ENOPATCH
>
> Sigh :)
> Here it is.
>
>
> Ciao,
> Michael.
> 	* common.opt (-fsplit-loops): New flag.
> 	* passes.def (pass_loop_split): Add.
> 	* opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
> 	(enable_fdo_optimizations): Add loop splitting.
> 	* timevar.def (TV_LOOP_SPLIT): Add.
> 	* tree-pass.h (make_pass_loop_split): Declare.
> 	* tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
> 	* tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
> 	* tree-ssa-loop-split.c: New file.
> 	* Makefile.in (OBJS): Add tree-ssa-loop-split.o.
> 	* doc/invoke.texi (fsplit-loops): Document.
> 	* doc/passes.texi (Loop optimization): Add paragraph about loop
> 	splitting.
>
> testsuite/
> 	* gcc.dg/loop-split.c: New test.
>
> Index: tree-ssa-loop-split.c

> +/* Return true when BB inside LOOP is a potential iteration space
> +   split point, i.e. ends with a condition like "IV < comp", which
> +   is true on one side of the iteration space and false on the other,
> +   and the split point can be computed.  If so, also return the border
> +   point in *BORDER and the comparison induction variable in IV.  */
> +
> +static tree
> +split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv)
> +{
> +  gimple *last;
> +  gcond *stmt;
> +  affine_iv iv2;
> +
  +
> +  /* Make it so, that the first argument of the condition is
> +     the looping one (only swap.  */
Nit.  I don't think you want a comma after "so".  And it looks like your 
comment got truncated as well.

With the comment above fixed, this is fine for the trunk.

jeff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Gimple loop splitting v2
  2015-12-05  7:55             ` Jeff Law
@ 2016-10-20 14:43               ` Michael Matz
  2016-10-20 14:56                 ` Bin.Cheng
  2016-10-20 19:17                 ` Jeff Law
  0 siblings, 2 replies; 20+ messages in thread
From: Michael Matz @ 2016-10-20 14:43 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Hi,

On Sat, 5 Dec 2015, Jeff Law wrote:

> Nit.  I don't think you want a comma after "so".  And it looks like your
> comment got truncated as well.
> 
> With the comment above fixed, this is fine for the trunk.

I'm terribly sorry to have dropped the ball here, but I've committed this 
now after not even a year ;-/ (r241374)  Obviously after rebootstrapping 
with all,ada languages.  I also did a benchmark run, which should be 
taken with a grain of salt as the machine had fairly variable results, but 
the improvements are real, though perhaps not always in that range (it's a 
normal three-repeat run).  I'm really curious whether our automatic tester 
can pick up similar improvements, because if so, it's extreme (5 to 15 
percent in some benchmarks) and we can brag about it for GCC 7 ;-)

                 Base        Base      Base      Peak   Peak      Peak
Benchmark        ref         run       ratio     ref    run       ratio
400.perlbench    9770        519       18.8 *    9770   508       19.2 *  
401.bzip2        9650        668       14.5 *    9650   666       14.5 *  
403.gcc          8050        455       17.7 *    8050   432       18.6 *  
429.mcf          9120        477       19.1 *    9120   467       19.5 *  
445.gobmk       10490        643       16.3 *   10490   644       16.3 *  
456.hmmer        9330        641       14.6 *    9330   614       15.2 *  
458.sjeng       12100        784       15.4 *   12100   762       15.9 *  
462.libquantum  20720        605       34.2 *   20720   600       34.5 *  
464.h264ref     22130        969       22.8 *   22130   969       22.8 *  
471.omnetpp      6250        438       14.3 *    6250   358       17.5 *  
473.astar        7020        494       14.2 *    7020   492       14.3 *  
483.xalancbmk    6900        342       20.2 *    6900   336       20.6 *  
 Est. SPECint(R)_base2006              17.9
 Est. SPECint2006                                                 18.5

410.bwaves      13590        563       24.1 *   13590   506       26.9 *  
416.gamess                                  NR                         NR 
433.milc         9180        375       24.5 *    9180   349       26.3 *  
434.zeusmp       9100        433       21.0 *    9100   423       21.5 *  
435.gromacs      7140        402       17.7 *    7140   411       17.4 *  
436.cactusADM   11950        486       24.6 *   11950   486       24.6 *  
437.leslie3d     9400        421       22.4 *    9400   419       22.4 *  
444.namd         8020        520       15.4 *    8020   520       15.4 *  
447.dealII                                  NR                         NR 
450.soplex       8340        393       21.2 *    8340   391       21.3 *  
453.povray       5320        277       19.2 *    5320   278       19.1 *  
454.calculix     8250        453       18.2 *    8250   460       17.9 *  
459.GemsFDTD    10610        542       19.6 *   10610   537       19.8 *  
465.tonto        9840        492       20.0 *    9840   491       20.0 *  
470.lbm         13740        466       29.5 *   13740   430       32.0 *  
481.wrf         11170        492       22.7 *   11170   457       24.4 *  
482.sphinx3     19490        659       29.6 *   19490   655       29.8 *  
 Est. SPECfp(R)_base2006               21.6
 Est. SPECfp2006                                                  22.1
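
To put numbers on the larger differences, here is a small sketch that turns pairs of run times from the table into a percentage reduction. It assumes that the first run-time column is the run without loop splitting and the second the run with it, which the table itself doesn't state. The struct and function names are invented for the example.

```c
#include <assert.h>
#include <stdio.h>

/* One table row: benchmark name and the two run times in seconds.
   Which column is "without" and which is "with" loop splitting is an
   assumption of this sketch.  */
struct spec_row
{
  const char *name;
  double before_s;
  double after_s;
};

/* A few rows copied from the table above.  */
static const struct spec_row rows[] = {
  { "403.gcc", 455, 432 },
  { "471.omnetpp", 438, 358 },
  { "410.bwaves", 563, 506 },
  { "433.milc", 375, 349 },
  { "470.lbm", 466, 430 },
  { "481.wrf", 492, 457 },
};

/* Percentage by which the run time dropped from BEFORE_S to AFTER_S.  */
static double
runtime_reduction_pct (double before_s, double after_s)
{
  assert (before_s > 0.0);
  return (before_s - after_s) / before_s * 100.0;
}

/* Print the reduction for every row in ROWS.  */
static void
print_reductions (void)
{
  for (size_t k = 0; k < sizeof rows / sizeof rows[0]; k++)
    printf ("%-12s %5.1f%%\n", rows[k].name,
            runtime_reduction_pct (rows[k].before_s, rows[k].after_s));
}
```

Calling print_reductions () lists the per-benchmark reduction; 471.omnetpp, for instance, comes out at roughly 18 percent (80 of 438 seconds), given the column assumption above.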


Ciao,
Michael.


* Re: Gimple loop splitting v2
  2016-10-20 14:43               ` Michael Matz
@ 2016-10-20 14:56                 ` Bin.Cheng
  2016-10-24  8:44                   ` Bin.Cheng
  2016-10-20 19:17                 ` Jeff Law
  1 sibling, 1 reply; 20+ messages in thread
From: Bin.Cheng @ 2016-10-20 14:56 UTC (permalink / raw)
  To: Michael Matz; +Cc: Jeff Law, gcc-patches List

On Thu, Oct 20, 2016 at 3:43 PM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Sat, 5 Dec 2015, Jeff Law wrote:
>
>> Nit.  I don't think you want a comma after "so".  And it looks like your
>> comment got truncated as well.
>>
>> With the comment above fixed, this is fine for the trunk.
>
> I'm terribly sorry to have dropped the ball here, but I've committed this
> now after not even a year ;-/ (r241374)  Obviously after rebootstrapping
> with all,ada languages.  I also did some benchmark run which should be
> taken with a grain of salt as the machine had fairly variant results but
> the improvements are real, though perhaps not always in that range (it's a
> normal three repeats run).  I'm really curious if our automatic tester can
> pick up similar improvements, because if so, it's extreme (5 to 15 percent
> in some benchmarks) and we can brag about it for GCC 7 ;-)
This is nice, thanks for doing it.  I will check the improvement on AArch64.

Thanks,
bin


* Re: Gimple loop splitting v2
  2016-10-20 14:56                 ` Bin.Cheng
@ 2016-10-24  8:44                   ` Bin.Cheng
  2016-10-24  9:02                     ` Michael Matz
  0 siblings, 1 reply; 20+ messages in thread
From: Bin.Cheng @ 2016-10-24  8:44 UTC (permalink / raw)
  To: Michael Matz; +Cc: Jeff Law, gcc-patches List

On Thu, Oct 20, 2016 at 3:55 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Thu, Oct 20, 2016 at 3:43 PM, Michael Matz <matz@suse.de> wrote:
>> Hi,
>>
>> On Sat, 5 Dec 2015, Jeff Law wrote:
>>
>>> Nit.  I don't think you want a comma after "so".  And it looks like your
>>> comment got truncated as well.
>>>
>>> With the comment above fixed, this is fine for the trunk.
>>
>> I'm terribly sorry to have dropped the ball here, but I've committed this
>> now after not even a year ;-/ (r241374)  Obviously after rebootstrapping
>> with all,ada languages.  I also did some benchmark run which should be
>> taken with a grain of salt as the machine had fairly variant results but
>> the improvements are real, though perhaps not always in that range (it's a
>> normal three repeats run).  I'm really curious if our automatic tester can
>> pick up similar improvements, because if so, it's extreme (5 to 15 percent
>> in some benchmarks) and we can brag about it for GCC 7 ;-)
> This is nice, thanks for doing it.  I will check the improvement on AArch64.
Hi,
Unfortunately I didn't reproduce the improvement in my run on AArch64,
I will double check if I made some mistakes.

Thanks,
bin


* Re: Gimple loop splitting v2
  2016-10-24  8:44                   ` Bin.Cheng
@ 2016-10-24  9:02                     ` Michael Matz
  2016-10-25 16:41                       ` Tamar Christina
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Matz @ 2016-10-24  9:02 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Jeff Law, gcc-patches List

Hi,

On Mon, 24 Oct 2016, Bin.Cheng wrote:

> Unfortunately I didn't reproduce the improvement in my run on AArch64, I 
> will double check if I made some mistakes.

Yeah, our regular testers also didn't pick up these kinds of improvements.  
As I said, the machine was quite jumpy (though not loaded at all, and 
fixated to run on one CPU) :-/


Ciao,
Michael.


* RE: Gimple loop splitting v2
  2016-10-24  9:02                     ` Michael Matz
@ 2016-10-25 16:41                       ` Tamar Christina
  0 siblings, 0 replies; 20+ messages in thread
From: Tamar Christina @ 2016-10-25 16:41 UTC (permalink / raw)
  To: Michael Matz, Bin.Cheng; +Cc: Jeff Law, gcc-patches List, nd

Hi Michael,

The commit seems to be causing an ICE on aarch64 (just tested latest trunk).

I've created a Bugzilla ticket with a test input https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78107

Regards,
Tamar



* Re: Gimple loop splitting v2
  2016-10-20 14:43               ` Michael Matz
  2016-10-20 14:56                 ` Bin.Cheng
@ 2016-10-20 19:17                 ` Jeff Law
  1 sibling, 0 replies; 20+ messages in thread
From: Jeff Law @ 2016-10-20 19:17 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc-patches

On 10/20/2016 08:43 AM, Michael Matz wrote:
> Hi,
>
> On Sat, 5 Dec 2015, Jeff Law wrote:
>
>> Nit.  I don't think you want a comma after "so".  And it looks like your
>> comment got truncated as well.
>>
>> With the comment above fixed, this is fine for the trunk.
>
> I'm terribly sorry to have dropped the ball here, but I've committed this
> now after not even a year ;-/ (r241374)
It'd totally fallen off my radar.  I had to go find it in my archives :-).

> Obviously after rebootstrapping
> with all,ada languages.  I also did some benchmark run which should be
> taken with a grain of salt as the machine had fairly variant results but
> the improvements are real, though perhaps not always in that range (it's a
> normal three repeats run).  I'm really curious if our automatic tester can
> pick up similar improvements, because if so, it's extreme (5 to 15 percent
> in some benchmarks) and we can brag about it for GCC 7 ;-)
Yea.  I don't expect it applies that often and ISTM that it's probably 
most beneficial by enabling other stuff later in the loop optimizer 
pipeline to see more loops without embedded flow control.


Anyway, glad to see it go in.

jeff


* Re: Gimple loop splitting v2
  2015-12-02 13:23           ` Michael Matz
  2015-12-05  7:55             ` Jeff Law
@ 2016-07-25 20:57             ` Andrew Pinski
  2016-07-26 11:32               ` Richard Biener
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Pinski @ 2016-07-25 20:57 UTC (permalink / raw)
  To: Michael Matz; +Cc: Jeff Law, GCC Patches

On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Tue, 1 Dec 2015, Jeff Law wrote:
>
>> > So, okay for trunk?
>> -ENOPATCH
>
> Sigh :)
> Here it is.


I found one problem with it.
Take:
void f(int *a, int M, int *b)
{
  for(int i = 0; i <= M; i++)
    {
       if (i < M)
        a[i] = i;
    }
}
---- CUT ---
There are two issues with the code as posted below.  The outermost loop's
aux is still set, which keeps the vectorizer from vectorizing the loop.
The other issue is that I need to run pass_scev_cprop after pass_loop_split
to handle the induction variable usage once the loop is gone, so that the
vectorizer will work.
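
To make the test case concrete, here is a hand-written sketch of what splitting this loop amounts to at the source level.  It is only an illustration, not the output of the pass: f_orig, f_split and same_result are made-up names, and the unused parameter b is dropped.

```c
#include <assert.h>
#include <string.h>

/* The loop from the test case, unchanged apart from the dropped
   parameter.  */
static void
f_orig (int *a, int M)
{
  for (int i = 0; i <= M; i++)
    {
      if (i < M)
        a[i] = i;
    }
}

/* Hand-split equivalent (illustration only, not compiler output).  */
static void
f_split (int *a, int M)
{
  int i = 0;

  /* First loop: the guard i < M holds, so the store is unconditional.  */
  for (; i < M; i++)
    a[i] = i;

  /* Second loop: the guard is false, so the body is empty.  At most
     the single iteration i == M is left.  */
  for (; i <= M; i++)
    ;
}

/* Run both versions on separate buffers and compare the results.
   M must not exceed the buffer size.  */
static int
same_result (int M)
{
  enum { N = 16 };
  int x[N], y[N];

  assert (M <= N);
  memset (x, -1, sizeof x);
  memset (y, -1, sizeof y);
  f_orig (x, M);
  f_split (y, M);
  return memcmp (x, y, sizeof x) == 0;
}
```

Once the empty second loop is recognized as dead, what remains is a plain store loop without embedded control flow, which is presumably the form the vectorizer needs to see here.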

Something like (note this is copy and paste from a terminal):
diff --git a/gcc/passes.def b/gcc/passes.def
index c327900..e8d6ea6 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -262,8 +262,8 @@ along with GCC; see the file COPYING3.  If not see
          NEXT_PASS (pass_copy_prop);
          NEXT_PASS (pass_dce);
          NEXT_PASS (pass_tree_unswitch);
-         NEXT_PASS (pass_scev_cprop);
          NEXT_PASS (pass_loop_split);
+         NEXT_PASS (pass_scev_cprop);
          NEXT_PASS (pass_record_bounds);
          NEXT_PASS (pass_loop_distribution);
          NEXT_PASS (pass_copy_prop);
diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index 5411530..e72ef19 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -592,7 +592,11 @@ tree_ssa_split_loops (void)

   gcc_assert (scev_initialized_p ());
   FOR_EACH_LOOP (loop, 0)
-    loop->aux = NULL;
+    {
+      loop->aux = NULL;
+      if (loop_outer (loop))
+       loop_outer (loop)->aux = NULL;
+    }

   /* Go through all loops starting from innermost.  */
   FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
@@ -631,7 +635,11 @@ tree_ssa_split_loops (void)
     }

   FOR_EACH_LOOP (loop, 0)
-    loop->aux = NULL;
+    {
+      loop->aux = NULL;
+      if (loop_outer (loop))
+       loop_outer (loop)->aux = NULL;
+    }

   if (changed)
     return TODO_cleanup_cfg;
-----  CUT -----

Thanks,
Andrew


>
>
> Ciao,
> Michael.
>         * common.opt (-fsplit-loops): New flag.
>         * passes.def (pass_loop_split): Add.
>         * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
>         (enable_fdo_optimizations): Add loop splitting.
>         * timevar.def (TV_LOOP_SPLIT): Add.
>         * tree-pass.h (make_pass_loop_split): Declare.
>         * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
>         * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
>         * tree-ssa-loop-split.c: New file.
>         * Makefile.in (OBJS): Add tree-ssa-loop-split.o.
>         * doc/invoke.texi (fsplit-loops): Document.
>         * doc/passes.texi (Loop optimization): Add paragraph about loop
>         splitting.
>
> testsuite/
>         * gcc.dg/loop-split.c: New test.
>
> Index: common.opt
> ===================================================================
> --- common.opt  (revision 231115)
> +++ common.opt  (working copy)
> @@ -2453,6 +2457,10 @@ funswitch-loops
>  Common Report Var(flag_unswitch_loops) Optimization
>  Perform loop unswitching.
>
> +fsplit-loops
> +Common Report Var(flag_split_loops) Optimization
> +Perform loop splitting.
> +
>  funwind-tables
>  Common Report Var(flag_unwind_tables) Optimization
>  Just generate unwind tables for exception handling.
> Index: passes.def
> ===================================================================
> --- passes.def  (revision 231115)
> +++ passes.def  (working copy)
> @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3.
>           NEXT_PASS (pass_dce);
>           NEXT_PASS (pass_tree_unswitch);
>           NEXT_PASS (pass_scev_cprop);
> +         NEXT_PASS (pass_loop_split);
>           NEXT_PASS (pass_record_bounds);
>           NEXT_PASS (pass_loop_distribution);
>           NEXT_PASS (pass_copy_prop);
> Index: opts.c
> ===================================================================
> --- opts.c      (revision 231115)
> +++ opts.c      (working copy)
> @@ -532,6 +532,7 @@ static const struct default_options defa
>         regardless of them being declared inline.  */
>      { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },
>      { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 },
> +    { OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
>      { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
>      { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
>      { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
> @@ -1411,6 +1412,8 @@ enable_fdo_optimizations (struct gcc_opt
>      opts->x_flag_ipa_cp_alignment = value;
>    if (!opts_set->x_flag_predictive_commoning)
>      opts->x_flag_predictive_commoning = value;
> +  if (!opts_set->x_flag_split_loops)
> +    opts->x_flag_split_loops = value;
>    if (!opts_set->x_flag_unswitch_loops)
>      opts->x_flag_unswitch_loops = value;
>    if (!opts_set->x_flag_gcse_after_reload)
> Index: timevar.def
> ===================================================================
> --- timevar.def (revision 231115)
> +++ timevar.def (working copy)
> @@ -182,6 +182,7 @@ DEFTIMEVAR (TV_LIM                   , "
>  DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
>  DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
>  DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
> +DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
>  DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
>  DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
>  DEFTIMEVAR (TV_TREE_VECTORIZATION    , "tree vectorization")
> Index: tree-pass.h
> ===================================================================
> --- tree-pass.h (revision 231115)
> +++ tree-pass.h (working copy)
> @@ -370,6 +370,7 @@ extern gimple_opt_pass *make_pass_tree_n
>  extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
> Index: tree-ssa-loop-manip.h
> ===================================================================
> --- tree-ssa-loop-manip.h       (revision 231115)
> +++ tree-ssa-loop-manip.h       (working copy)
> @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
>
>  extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
>                        bool, tree *, tree *);
> +extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
> +                                           struct loop *);
>  extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
>  extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
>  extern void verify_loop_closed_ssa (bool);
> Index: Makefile.in
> ===================================================================
> --- Makefile.in (revision 231115)
> +++ Makefile.in (working copy)
> @@ -1474,6 +1474,7 @@ OBJS = \
>         tree-ssa-loop-manip.o \
>         tree-ssa-loop-niter.o \
>         tree-ssa-loop-prefetch.o \
> +       tree-ssa-loop-split.o \
>         tree-ssa-loop-unswitch.o \
>         tree-ssa-loop.o \
>         tree-ssa-math-opts.o \
> Index: tree-ssa-loop-split.c
> ===================================================================
> --- tree-ssa-loop-split.c       (revision 0)
> +++ tree-ssa-loop-split.c       (working copy)
> @@ -0,0 +1,686 @@
> +/* Loop splitting.
> +   Copyright (C) 2015 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it
> +under the terms of the GNU General Public License as published by the
> +Free Software Foundation; either version 3, or (at your option) any
> +later version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT
> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "tree-pass.h"
> +#include "ssa.h"
> +#include "fold-const.h"
> +#include "tree-cfg.h"
> +#include "tree-ssa.h"
> +#include "tree-ssa-loop-niter.h"
> +#include "tree-ssa-loop.h"
> +#include "tree-ssa-loop-manip.h"
> +#include "tree-into-ssa.h"
> +#include "cfgloop.h"
> +#include "tree-scalar-evolution.h"
> +#include "gimple-iterator.h"
> +#include "gimple-pretty-print.h"
> +#include "cfghooks.h"
> +#include "gimple-fold.h"
> +#include "gimplify-me.h"
> +
> +/* This file implements loop splitting, i.e. transformation of loops like
> +
> +   for (i = 0; i < 100; i++)
> +     {
> +       if (i < 50)
> +         A;
> +       else
> +         B;
> +     }
> +
> +   into:
> +
> +   for (i = 0; i < 50; i++)
> +     {
> +       A;
> +     }
> +   for (; i < 100; i++)
> +     {
> +       B;
> +     }
> +
> +   */
> +
> +/* Return true when BB inside LOOP is a potential iteration space
> +   split point, i.e. ends with a condition like "IV < comp", which
> +   is true on one side of the iteration space and false on the other,
> +   and the split point can be computed.  If so, also return the border
> +   point in *BORDER and the comparison induction variable in IV.  */
> +
> +static tree
> +split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv)
> +{
> +  gimple *last;
> +  gcond *stmt;
> +  affine_iv iv2;
> +
> +  /* BB must end in a simple conditional jump.  */
> +  last = last_stmt (bb);
> +  if (!last || gimple_code (last) != GIMPLE_COND)
> +    return NULL_TREE;
> +  stmt = as_a <gcond *> (last);
> +
> +  enum tree_code code = gimple_cond_code (stmt);
> +
> +  /* Only handle relational comparisons, for equality and non-equality
> +     we'd have to split the loop into two loops and a middle statement.  */
> +  switch (code)
> +    {
> +      case LT_EXPR:
> +      case LE_EXPR:
> +      case GT_EXPR:
> +      case GE_EXPR:
> +       break;
> +      default:
> +       return NULL_TREE;
> +    }
> +
> +  if (loop_exits_from_bb_p (loop, bb))
> +    return NULL_TREE;
> +
> +  tree op0 = gimple_cond_lhs (stmt);
> +  tree op1 = gimple_cond_rhs (stmt);
> +
> +  if (!simple_iv (loop, loop, op0, iv, false))
> +    return NULL_TREE;
> +  if (!simple_iv (loop, loop, op1, &iv2, false))
> +    return NULL_TREE;
> +
> +  /* Make it so that the first argument of the condition is
> +     the looping one.  */
> +  if (!integer_zerop (iv2.step))
> +    {
> +      std::swap (op0, op1);
> +      std::swap (*iv, iv2);
> +      code = swap_tree_comparison (code);
> +      gimple_cond_set_condition (stmt, code, op0, op1);
> +      update_stmt (stmt);
> +    }
> +  else if (integer_zerop (iv->step))
> +    return NULL_TREE;
> +  if (!integer_zerop (iv2.step))
> +    return NULL_TREE;
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "Found potential split point: ");
> +      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> +      fprintf (dump_file, " { ");
> +      print_generic_expr (dump_file, iv->base, TDF_SLIM);
> +      fprintf (dump_file, " + I*");
> +      print_generic_expr (dump_file, iv->step, TDF_SLIM);
> +      fprintf (dump_file, " } %s ", get_tree_code_name (code));
> +      print_generic_expr (dump_file, iv2.base, TDF_SLIM);
> +      fprintf (dump_file, "\n");
> +    }
> +
> +  *border = iv2.base;
> +  return op0;
> +}
> +
> +/* Given a GUARD conditional stmt inside LOOP, which we want to make always
> +   true or false depending on INITIAL_TRUE, and adjusted values NEXTVAL
> +   (a post-increment IV) and NEWBOUND (the comparator) adjust the loop
> +   exit test statement to loop back only if the GUARD statement will
> +   also be true/false in the next iteration.  */
> +
> +static void
> +patch_loop_exit (struct loop *loop, gcond *guard, tree nextval, tree newbound,
> +                bool initial_true)
> +{
> +  edge exit = single_exit (loop);
> +  gcond *stmt = as_a <gcond *> (last_stmt (exit->src));
> +  gimple_cond_set_condition (stmt, gimple_cond_code (guard),
> +                            nextval, newbound);
> +  update_stmt (stmt);
> +
> +  edge stay = single_pred_edge (loop->latch);
> +
> +  exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
> +  stay->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
> +
> +  if (initial_true)
> +    {
> +      exit->flags |= EDGE_FALSE_VALUE;
> +      stay->flags |= EDGE_TRUE_VALUE;
> +    }
> +  else
> +    {
> +      exit->flags |= EDGE_TRUE_VALUE;
> +      stay->flags |= EDGE_FALSE_VALUE;
> +    }
> +}
> +
> +/* Given an induction variable GUARD_IV, and its affine descriptor IV,
> +   find the loop phi node in LOOP defining it directly, or create
> +   such phi node.  Return that phi node.  */
> +
> +static gphi *
> +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/)
> +{
> +  gimple *def = SSA_NAME_DEF_STMT (guard_iv);
> +  gphi *phi;
> +  if ((phi = dyn_cast <gphi *> (def))
> +      && gimple_bb (phi) == loop->header)
> +    return phi;
> +
> +  /* XXX Create the PHI instead.  */
> +  return NULL;
> +}
> +
> +/* This function updates the SSA form after connect_loops made a new
> +   edge NEW_E leading from LOOP1 exit to LOOP2 (via an intermediate
> +   conditional).  I.e. the second loop can now be entered either
> +   via the original entry or via NEW_E, so the entry values of LOOP2
> +   phi nodes are either the original ones or those at the exit
> +   of LOOP1.  Insert new phi nodes in LOOP2 pre-header reflecting
> +   this.  */
> +
> +static void
> +connect_loop_phis (struct loop *loop1, struct loop *loop2, edge new_e)
> +{
> +  basic_block rest = loop_preheader_edge (loop2)->src;
> +  gcc_assert (new_e->dest == rest);
> +  edge skip_first = EDGE_PRED (rest, EDGE_PRED (rest, 0) == new_e);
> +
> +  edge firste = loop_preheader_edge (loop1);
> +  edge seconde = loop_preheader_edge (loop2);
> +  edge firstn = loop_latch_edge (loop1);
> +  gphi_iterator psi_first, psi_second;
> +  for (psi_first = gsi_start_phis (loop1->header),
> +       psi_second = gsi_start_phis (loop2->header);
> +       !gsi_end_p (psi_first);
> +       gsi_next (&psi_first), gsi_next (&psi_second))
> +    {
> +      tree init, next, new_init;
> +      use_operand_p op;
> +      gphi *phi_first = psi_first.phi ();
> +      gphi *phi_second = psi_second.phi ();
> +
> +      init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste);
> +      next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn);
> +      op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde);
> +      gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
> +
> +      /* Prefer using original variable as a base for the new ssa name.
> +        This is necessary for virtual ops, and useful in order to avoid
> +        losing debug info for real ops.  */
> +      if (TREE_CODE (next) == SSA_NAME
> +         && useless_type_conversion_p (TREE_TYPE (next),
> +                                       TREE_TYPE (init)))
> +       new_init = copy_ssa_name (next);
> +      else if (TREE_CODE (init) == SSA_NAME
> +              && useless_type_conversion_p (TREE_TYPE (init),
> +                                            TREE_TYPE (next)))
> +       new_init = copy_ssa_name (init);
> +      else if (useless_type_conversion_p (TREE_TYPE (next),
> +                                         TREE_TYPE (init)))
> +       new_init = make_temp_ssa_name (TREE_TYPE (next), NULL,
> +                                      "unrinittmp");
> +      else
> +       new_init = make_temp_ssa_name (TREE_TYPE (init), NULL,
> +                                      "unrinittmp");
> +
> +      gphi * newphi = create_phi_node (new_init, rest);
> +      add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION);
> +      add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION);
> +      SET_USE (op, new_init);
> +    }
> +}
> +
> +/* The two loops LOOP1 and LOOP2 were just created by loop versioning,
> +   they are still equivalent and placed in two arms of a diamond, like so:
> +
> +               .------if (cond)------.
> +               v                     v
> +             pre1                   pre2
> +              |                      |
> +        .--->h1                     h2<----.
> +        |     |                      |     |
> +        |    ex1---.            .---ex2    |
> +        |    /     |            |     \    |
> +        '---l1     X            |     l2---'
> +                   |            |
> +                   |            |
> +                   '--->join<---'
> +
> +   This function transforms the program such that LOOP1 is conditionally
> +   falling through to LOOP2, or skipping it.  This is done by splitting
> +   the ex1->join edge at X in the diagram above, and inserting a condition
> +   whose one arm goes to pre2, resulting in this situation:
> +
> +               .------if (cond)------.
> +               v                     v
> +             pre1       .---------->pre2
> +              |         |            |
> +        .--->h1         |           h2<----.
> +        |     |         |            |     |
> +        |    ex1---.    |       .---ex2    |
> +        |    /     v    |       |     \    |
> +        '---l1   skip---'       |     l2---'
> +                   |            |
> +                   |            |
> +                   '--->join<---'
> +
> +
> +   The condition used is the exit condition of LOOP1, which effectively means
> +   that when the first loop exits (for whatever reason) but the real original
> +   exit expression is still false the second loop will be entered.
> +   The function returns the new edge cond->pre2.
> +
> +   This doesn't update the SSA form, see connect_loop_phis for that.  */
> +
> +static edge
> +connect_loops (struct loop *loop1, struct loop *loop2)
> +{
> +  edge exit = single_exit (loop1);
> +  basic_block skip_bb = split_edge (exit);
> +  gcond *skip_stmt;
> +  gimple_stmt_iterator gsi;
> +  edge new_e, skip_e;
> +
> +  gimple *stmt = last_stmt (exit->src);
> +  skip_stmt = gimple_build_cond (gimple_cond_code (stmt),
> +                                gimple_cond_lhs (stmt),
> +                                gimple_cond_rhs (stmt),
> +                                NULL_TREE, NULL_TREE);
> +  gsi = gsi_last_bb (skip_bb);
> +  gsi_insert_after (&gsi, skip_stmt, GSI_NEW_STMT);
> +
> +  skip_e = EDGE_SUCC (skip_bb, 0);
> +  skip_e->flags &= ~EDGE_FALLTHRU;
> +  new_e = make_edge (skip_bb, loop_preheader_edge (loop2)->src, 0);
> +  if (exit->flags & EDGE_TRUE_VALUE)
> +    {
> +      skip_e->flags |= EDGE_TRUE_VALUE;
> +      new_e->flags |= EDGE_FALSE_VALUE;
> +    }
> +  else
> +    {
> +      skip_e->flags |= EDGE_FALSE_VALUE;
> +      new_e->flags |= EDGE_TRUE_VALUE;
> +    }
> +
> +  new_e->count = skip_bb->count;
> +  new_e->probability = PROB_LIKELY;
> +  new_e->count = apply_probability (skip_e->count, PROB_LIKELY);
> +  skip_e->count -= new_e->count;
> +  skip_e->probability = inverse_probability (PROB_LIKELY);
> +
> +  return new_e;
> +}
> +
> +/* This returns the new bound for iterations given the original iteration
> +   space in NITER, an arbitrary new bound BORDER, assumed to be some
> +   comparison value with a different IV, the initial value GUARD_INIT of
> +   that other IV, and the comparison code GUARD_CODE that compares
> +   that other IV with BORDER.  We return an SSA name, and place any
> +   necessary statements for that computation into *STMTS.
> +
> +   For example for such a loop:
> +
> +     for (i = beg, j = guard_init; i < end; i++, j++)
> +       if (j < border)  // this is supposed to be true/false
> +         ...
> +
> +   we want to return a new bound (on j) that makes the loop iterate
> +   as long as the condition j < border stays true.  We also don't want
> +   to iterate more often than the original loop, so we have to introduce
> +   some cut-off as well (via min/max), effectively resulting in:
> +
> +     newend = min (end+guard_init-beg, border)
> +     for (i = beg, j = guard_init; j < newend; i++, j++)
> +       if (j < border)
> +         ...
> +
> +   Depending on the direction of the IVs and if the exit tests
> +   are strict or non-strict we need to use MIN or MAX,
> +   and add or subtract 1.  This routine computes newend above.  */
> +
> +static tree
> +compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter,
> +                        tree border,
> +                        enum tree_code guard_code, tree guard_init)
> +{
> +  /* The niter structure contains the after-increment IV, we need
> +     the loop-enter base, so subtract STEP once.  */
> +  tree controlbase = force_gimple_operand (niter->control.base,
> +                                          stmts, true, NULL_TREE);
> +  tree controlstep = niter->control.step;
> +  tree enddiff;
> +  if (POINTER_TYPE_P (TREE_TYPE (controlbase)))
> +    {
> +      controlstep = gimple_build (stmts, NEGATE_EXPR,
> +                                 TREE_TYPE (controlstep), controlstep);
> +      enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
> +                             TREE_TYPE (controlbase),
> +                             controlbase, controlstep);
> +    }
> +  else
> +    enddiff = gimple_build (stmts, MINUS_EXPR,
> +                           TREE_TYPE (controlbase),
> +                           controlbase, controlstep);
> +
> +  /* Compute beg-guard_init.  */
> +  if (POINTER_TYPE_P (TREE_TYPE (enddiff)))
> +    {
> +      tree tem = gimple_convert (stmts, sizetype, guard_init);
> +      tem = gimple_build (stmts, NEGATE_EXPR, sizetype, tem);
> +      enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
> +                             TREE_TYPE (enddiff),
> +                             enddiff, tem);
> +    }
> +  else
> +    enddiff = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
> +                           enddiff, guard_init);
> +
> +  /* Compute end-(beg-guard_init).  */
> +  gimple_seq stmts2;
> +  tree newbound = force_gimple_operand (niter->bound, &stmts2,
> +                                       true, NULL_TREE);
> +  gimple_seq_add_seq_without_update (stmts, stmts2);
> +
> +  if (POINTER_TYPE_P (TREE_TYPE (enddiff))
> +      || POINTER_TYPE_P (TREE_TYPE (newbound)))
> +    {
> +      enddiff = gimple_convert (stmts, sizetype, enddiff);
> +      enddiff = gimple_build (stmts, NEGATE_EXPR, sizetype, enddiff);
> +      newbound = gimple_build (stmts, POINTER_PLUS_EXPR,
> +                              TREE_TYPE (newbound),
> +                              newbound, enddiff);
> +    }
> +  else
> +    newbound = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
> +                            newbound, enddiff);
> +
> +  /* Depending on the direction of the IVs the new bound for the first
> +     loop is the minimum or maximum of old bound and border.
> +     Also, if the guard condition isn't strictly less or greater,
> +     we need to adjust the bound.  */
> +  int addbound = 0;
> +  enum tree_code minmax;
> +  if (niter->cmp == LT_EXPR)
> +    {
> +      /* GT and LE are the same, inverted.  */
> +      if (guard_code == GT_EXPR || guard_code == LE_EXPR)
> +       addbound = -1;
> +      minmax = MIN_EXPR;
> +    }
> +  else
> +    {
> +      gcc_assert (niter->cmp == GT_EXPR);
> +      if (guard_code == GE_EXPR || guard_code == LT_EXPR)
> +       addbound = 1;
> +      minmax = MAX_EXPR;
> +    }
> +
> +  if (addbound)
> +    {
> +      tree type2 = TREE_TYPE (newbound);
> +      if (POINTER_TYPE_P (type2))
> +       type2 = sizetype;
> +      newbound = gimple_build (stmts,
> +                              POINTER_TYPE_P (TREE_TYPE (newbound))
> +                              ? POINTER_PLUS_EXPR : PLUS_EXPR,
> +                              TREE_TYPE (newbound),
> +                              newbound,
> +                              build_int_cst (type2, addbound));
> +    }
> +
> +  tree newend = gimple_build (stmts, minmax, TREE_TYPE (border),
> +                             border, newbound);
> +  return newend;
> +}
> +
> +/* Checks if LOOP contains a conditional block whose condition
> +   depends on which side in the iteration space it is, and if so
> +   splits the iteration space into two loops.  Returns true if the
> +   loop was split.  NITER must contain the iteration descriptor for the
> +   single exit of LOOP.  */
> +
> +static bool
> +split_loop (struct loop *loop1, struct tree_niter_desc *niter)
> +{
> +  basic_block *bbs;
> +  unsigned i;
> +  bool changed = false;
> +  tree guard_iv;
> +  tree border;
> +  affine_iv iv;
> +
> +  bbs = get_loop_body (loop1);
> +
> +  /* Find a splitting opportunity.  */
> +  for (i = 0; i < loop1->num_nodes; i++)
> +    if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
> +      {
> +       /* Handling opposite steps is not implemented yet.  Neither
> +          is handling different step sizes.  */
> +       if ((tree_int_cst_sign_bit (iv.step)
> +            != tree_int_cst_sign_bit (niter->control.step))
> +           || !tree_int_cst_equal (iv.step, niter->control.step))
> +         continue;
> +
> +       /* Find a loop PHI node that defines guard_iv directly,
> +          or create one doing that.  */
> +       gphi *phi = find_or_create_guard_phi (loop1, guard_iv, &iv);
> +       if (!phi)
> +         continue;
> +       gcond *guard_stmt = as_a<gcond *> (last_stmt (bbs[i]));
> +       tree guard_init = PHI_ARG_DEF_FROM_EDGE (phi,
> +                                                loop_preheader_edge (loop1));
> +       enum tree_code guard_code = gimple_cond_code (guard_stmt);
> +
> +       /* Loop splitting is implemented by versioning the loop, placing
> +          the new loop after the old loop, making the first loop iterate
> +          as long as the conditional stays true (or false) and letting the
> +          second (new) loop handle the rest of the iterations.
> +
> +          First we need to determine if the condition will start being true
> +          or false in the first loop.  */
> +       bool initial_true;
> +       switch (guard_code)
> +         {
> +           case LT_EXPR:
> +           case LE_EXPR:
> +             initial_true = !tree_int_cst_sign_bit (iv.step);
> +             break;
> +           case GT_EXPR:
> +           case GE_EXPR:
> +             initial_true = tree_int_cst_sign_bit (iv.step);
> +             break;
> +           default:
> +             gcc_unreachable ();
> +         }
> +
> +       /* Build a condition that will skip the first loop when the
> +          guard condition won't ever be true (or false).  */
> +       gimple_seq stmts2;
> +       border = force_gimple_operand (border, &stmts2, true, NULL_TREE);
> +       if (stmts2)
> +         gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
> +                                           stmts2);
> +       tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
> +       if (!initial_true)
> +         cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> +
> +       /* Now version the loop, placing loop2 after loop1 connecting
> +          them, and fix up SSA form for that.  */
> +       initialize_original_copy_tables ();
> +       basic_block cond_bb;
> +       struct loop *loop2 = loop_version (loop1, cond, &cond_bb,
> +                                          REG_BR_PROB_BASE, REG_BR_PROB_BASE,
> +                                          REG_BR_PROB_BASE, true);
> +       gcc_assert (loop2);
> +       update_ssa (TODO_update_ssa);
> +
> +       edge new_e = connect_loops (loop1, loop2);
> +       connect_loop_phis (loop1, loop2, new_e);
> +
> +       /* The iterations of the second loop are now already
> +          exactly those that the first loop didn't do, but the
> +          iteration space of the first loop is still the original one.
> +          Compute the new bound for the guarding IV and patch the
> +          loop exit to use it instead of original IV and bound.  */
> +       gimple_seq stmts = NULL;
> +       tree newend = compute_new_first_bound (&stmts, niter, border,
> +                                              guard_code, guard_init);
> +       if (stmts)
> +         gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
> +                                           stmts);
> +       tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
> +       patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
> +
> +       /* Finally patch out the two copies of the condition to be always
> +          true/false (or opposite).  */
> +       gcond *force_true = as_a<gcond *> (last_stmt (bbs[i]));
> +       gcond *force_false = as_a<gcond *> (last_stmt (get_bb_copy (bbs[i])));
> +       if (!initial_true)
> +         std::swap (force_true, force_false);
> +       gimple_cond_make_true (force_true);
> +       gimple_cond_make_false (force_false);
> +       update_stmt (force_true);
> +       update_stmt (force_false);
> +
> +       free_original_copy_tables ();
> +
> +       /* We destroyed LCSSA form above.  Eventually we might be able
> +          to fix it on the fly, for now simply punt and use the helper.  */
> +       rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
> +
> +       changed = true;
> +       if (dump_file && (dump_flags & TDF_DETAILS))
> +         fprintf (dump_file, ";; Loop split.\n");
> +
> +       /* Only deal with the first opportunity.  */
> +       break;
> +      }
> +
> +  free (bbs);
> +  return changed;
> +}
> +
> +/* Main entry point.  Perform loop splitting on all suitable loops.  */
> +
> +static unsigned int
> +tree_ssa_split_loops (void)
> +{
> +  struct loop *loop;
> +  bool changed = false;
> +
> +  gcc_assert (scev_initialized_p ());
> +  FOR_EACH_LOOP (loop, 0)
> +    loop->aux = NULL;
> +
> +  /* Go through all loops starting from innermost.  */
> +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> +    {
> +      struct tree_niter_desc niter;
> +      if (loop->aux)
> +       {
> +         /* If any of our inner loops was split, don't split us,
> +            and mark our containing loop as having had splits as well.  */
> +         loop_outer (loop)->aux = loop;
> +         continue;
> +       }
> +
> +      if (single_exit (loop)
> +         /* ??? We could handle non-empty latches when we split
> +            the latch edge (not the exit edge), and put the new
> +            exit condition in the new block.  OTOH this executes some
> +            code unconditionally that might have been skipped by the
> +            original exit before.  */
> +         && empty_block_p (loop->latch)
> +         && !optimize_loop_for_size_p (loop)
> +         && number_of_iterations_exit (loop, single_exit (loop), &niter,
> +                                       false, true)
> +         && niter.cmp != ERROR_MARK
> +         /* We can't yet handle loops controlled by a != predicate.  */
> +         && niter.cmp != NE_EXPR)
> +       {
> +         if (split_loop (loop, &niter))
> +           {
> +             /* Mark our containing loop as having had some split inner
> +                loops.  */
> +             loop_outer (loop)->aux = loop;
> +             changed = true;
> +           }
> +       }
> +    }
> +
> +  FOR_EACH_LOOP (loop, 0)
> +    loop->aux = NULL;
> +
> +  if (changed)
> +    return TODO_cleanup_cfg;
> +  return 0;
> +}
> +
> +/* Loop splitting pass.  */
> +
> +namespace {
> +
> +const pass_data pass_data_loop_split =
> +{
> +  GIMPLE_PASS, /* type */
> +  "lsplit", /* name */
> +  OPTGROUP_LOOP, /* optinfo_flags */
> +  TV_LOOP_SPLIT, /* tv_id */
> +  PROP_cfg, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_loop_split : public gimple_opt_pass
> +{
> +public:
> +  pass_loop_split (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_loop_split, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) { return flag_split_loops != 0; }
> +  virtual unsigned int execute (function *);
> +
> +}; // class pass_loop_split
> +
> +unsigned int
> +pass_loop_split::execute (function *fun)
> +{
> +  if (number_of_loops (fun) <= 1)
> +    return 0;
> +
> +  return tree_ssa_split_loops ();
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_loop_split (gcc::context *ctxt)
> +{
> +  return new pass_loop_split (ctxt);
> +}
> Index: doc/invoke.texi
> ===================================================================
> --- doc/invoke.texi     (revision 231115)
> +++ doc/invoke.texi     (working copy)
> @@ -446,7 +446,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fselective-scheduling -fselective-scheduling2 @gol
>  -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol
>  -fsemantic-interposition -fshrink-wrap -fsignaling-nans @gol
> --fsingle-precision-constant -fsplit-ivs-in-unroller @gol
> +-fsingle-precision-constant -fsplit-ivs-in-unroller -fsplit-loops @gol
>  -fsplit-paths @gol
>  -fsplit-wide-types -fssa-backprop -fssa-phiopt @gol
>  -fstack-protector -fstack-protector-all -fstack-protector-strong @gol
> @@ -10197,6 +10197,11 @@ Enabled with @option{-fprofile-use}.
>  Enables the loop invariant motion pass in the RTL loop optimizer.  Enabled
>  at level @option{-O1}
>
> +@item -fsplit-loops
> +@opindex fsplit-loops
> +Split a loop into two if it contains a condition that's always true
> +for one side of the iteration space and false for the other.
> +
>  @item -funswitch-loops
>  @opindex funswitch-loops
>  Move branches with loop invariant conditions out of the loop, with duplicates
> Index: doc/passes.texi
> ===================================================================
> --- doc/passes.texi     (revision 231115)
> +++ doc/passes.texi     (working copy)
> @@ -484,6 +484,12 @@ out of the loops.  To achieve this, a du
>  each possible outcome of conditional jump(s).  The pass is implemented in
>  @file{tree-ssa-loop-unswitch.c}.
>
> +Loop splitting.  If a loop contains a conditional statement that is
> +always true for one part of the iteration space and false for the other,
> +this pass splits the loop into two, one dealing with one side, the other
> +only with the other, thereby removing one inner-loop conditional.  The
> +pass is implemented in @file{tree-ssa-loop-split.c}.
> +
>  The optimizations also use various utility functions contained in
>  @file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and
>  @file{cfgloopmanip.c}.
> Index: testsuite/gcc.dg/loop-split.c
> ===================================================================
> --- testsuite/gcc.dg/loop-split.c       (revision 0)
> +++ testsuite/gcc.dg/loop-split.c       (working copy)
> @@ -0,0 +1,147 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
> +
> +#ifdef __cplusplus
> +extern "C" int printf (const char *, ...);
> +extern "C" void abort (void);
> +#else
> +extern int printf (const char *, ...);
> +extern void abort (void);
> +#endif
> +
> +/* Define TRACE to 1 or 2 to get detailed tracing.
> +   Define SINGLE_TEST to 1 or 2 to get a simple routine with
> +   just one loop, called only one time or with multiple parameters,
> +   to make debugging easier.  */
> +#ifndef TRACE
> +#define TRACE 0
> +#endif
> +
> +#define loop(beg,step,beg2,cond1,cond2) \
> +    do \
> +      { \
> +       sum = 0; \
> +        for (i = (beg), j = (beg2); (cond1); i+=(step),j+=(step)) \
> +          { \
> +            if (cond2) { \
> +             if (TRACE > 1) printf ("a: %d %d\n", i, j); \
> +              sum += a[i]; \
> +           } else { \
> +             if (TRACE > 1) printf ("b: %d %d\n", i, j); \
> +              sum += b[i]; \
> +           } \
> +          } \
> +       if (TRACE > 0) printf ("sum: %d\n", sum); \
> +       check = check * 47 + sum; \
> +      } while (0)
> +
> +#ifndef SINGLE_TEST
> +unsigned __attribute__((noinline, noclone)) dotest (int beg, int end, int step,
> +                                              int c, int *a, int *b, int beg2)
> +{
> +  unsigned check = 0;
> +  int sum;
> +  int i, j;
> +  loop (beg, 1, beg2, i < end, j < c);
> +  loop (beg, 1, beg2, i <= end, j < c);
> +  loop (beg, 1, beg2, i < end, j <= c);
> +  loop (beg, 1, beg2, i <= end, j <= c);
> +  loop (beg, 1, beg2, i < end, j > c);
> +  loop (beg, 1, beg2, i <= end, j > c);
> +  loop (beg, 1, beg2, i < end, j >= c);
> +  loop (beg, 1, beg2, i <= end, j >= c);
> +  beg2 += end-beg;
> +  loop (end, -1, beg2, i >= beg, j >= c);
> +  loop (end, -1, beg2, i >= beg, j > c);
> +  loop (end, -1, beg2, i > beg, j >= c);
> +  loop (end, -1, beg2, i > beg, j > c);
> +  loop (end, -1, beg2, i >= beg, j <= c);
> +  loop (end, -1, beg2, i >= beg, j < c);
> +  loop (end, -1, beg2, i > beg, j <= c);
> +  loop (end, -1, beg2, i > beg, j < c);
> +  return check;
> +}
> +
> +#else
> +
> +int __attribute__((noinline, noclone)) f (int beg, int end, int step,
> +                                         int c, int *a, int *b, int beg2)
> +{
> +  int sum = 0;
> +  int i, j;
> +  //for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
> +  for (i = end, j = beg2 + (end-beg); i > beg; i += -1, j-- /*step*/)
> +    {
> +      // i - j == X --> i = X + j
> +      // --> i < end == X+j < end == j < end - X
> +      // --> newend = end - (i_init - j_init)
> +      // j < end-X && j < c --> j < min(end-X,c)
> +      // j < end-X && j <= c --> j <= min(end-X-1,c) or j < min(end-X,c+1{OF!})
> +      //if (j < c)
> +      if (j >= c)
> +       printf ("a: %d %d\n", i, j);
> +      /*else
> +       printf ("b: %d %d\n", i, j);*/
> +       /*sum += a[i];
> +      else
> +       sum += b[i];*/
> +    }
> +  return sum;
> +}
> +
> +int __attribute__((noinline, noclone)) f2 (int *beg, int *end, int step,
> +                                         int *c, int *a, int *b, int *beg2)
> +{
> +  int sum = 0;
> +  int *i, *j;
> +  for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
> +    {
> +      if (j <= c)
> +       printf ("%d %d\n", (int) (i - beg), (int) (j - beg));
> +       /*sum += a[i];
> +      else
> +       sum += b[i];*/
> +    }
> +  return sum;
> +}
> +#endif
> +
> +extern int printf (const char *, ...);
> +
> +int main ()
> +{
> +  int a[] = {0,0,0,0,0, 1,2,3,4,5,6,7,8,9,          0,0,0,0,0};
> +  int b[] = {0,0,0,0,0, -1,-2,-3,-4,-5,-6,-7,-8,-9, 0,0,0,0,0,};
> +  int c;
> +  int diff = 0;
> +  unsigned check = 0;
> +#if defined(SINGLE_TEST) && (SINGLE_TEST == 1)
> +  //dotest (0, 9, 1, -1, a+5, b+5, -1);
> +  //return 0;
> +  f (0, 9, 1, 5, a+5, b+5, -1);
> +  return 0;
> +#endif
> +  for (diff = -5; diff <= 5; diff++)
> +    {
> +      for (c = -1; c <= 10; c++)
> +       {
> +#ifdef SINGLE_TEST
> +         int s = f (0, 9, 1, c, a+5, b+5, diff);
> +         //int s = f2 (a+0, a+9, 1, a+c, a+5, b+5, a+diff);
> +         printf ("%d ", s);
> +#else
> +         if (TRACE > 0)
> +           printf ("check %d %d\n", c, diff);
> +         check = check * 51 + dotest (0, 9, 1, c, a+5, b+5, diff);
> +#endif
> +       }
> +      //printf ("\n");
> +    }
> +  //printf ("%u\n", check);
> +  if (check != 3213344948)
> +    abort ();
> +  return 0;
> +}
> +
> +/* All 16 loops in dotest should be split.  */
> +/* { dg-final { scan-tree-dump-times "Loop split" 16 "lsplit" } } */
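
The bound computation described in the comment on compute_new_first_bound
can be checked at the source level.  The following is only a sketch for the
simplest case (both IVs step by +1, exit test i < end, guard j < border);
the function names are hypothetical and not part of the patch, and the pass
itself builds the equivalent GIMPLE with MIN_EXPR/MAX_EXPR and handles the
other directions and non-strict comparisons.

```c
/* Hypothetical helper, not from the patch: bound of the first loop for
   step +1, exit test "i < end" and guard "j < border".  */
long
first_loop_bound (long beg, long end, long guard_init, long border)
{
  /* The original exit test i < end, rewritten in terms of j.  */
  long end_for_j = end + guard_init - beg;
  /* MIN_EXPR: never iterate more often than the original loop.  */
  return end_for_j < border ? end_for_j : border;
}

/* Original loop: count the iterations that take the "true" arm.  */
long
original_true_iterations (long beg, long end, long guard_init, long border)
{
  long n = 0;
  for (long i = beg, j = guard_init; i < end; i++, j++)
    if (j < border)
      n++;
  return n;
}

/* Split form: the first loop runs while the guard holds, the second
   handles the remaining iterations.  Returns -1 if the guard ever has
   the wrong value in either loop.  */
long
split_true_iterations (long beg, long end, long guard_init, long border)
{
  long n = 0;
  long i = beg, j = guard_init;
  long newend = first_loop_bound (beg, end, guard_init, border);

  for (; j < newend; i++, j++)
    {
      if (!(j < border))
        return -1;
      n++;
    }
  for (; i < end; i++, j++)
    if (j < border)
      return -1;
  return n;
}
```

With beg = 0, end = 9, guard_init = -1 and border = 5 the first loop gets
the bound min (9 + -1 - 0, 5) = 5; a border outside the iteration space
simply leaves one of the two loops without iterations.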


* Re: Gimple loop splitting v2
  2016-07-25 20:57             ` Andrew Pinski
@ 2016-07-26 11:32               ` Richard Biener
  2016-07-27  6:18                 ` Andrew Pinski
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Biener @ 2016-07-26 11:32 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Michael Matz, Jeff Law, GCC Patches

On Mon, Jul 25, 2016 at 10:57 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz <matz@suse.de> wrote:
>> Hi,
>>
>> On Tue, 1 Dec 2015, Jeff Law wrote:
>>
>>> > So, okay for trunk?
>>> -ENOPATCH
>>
>> Sigh :)
>> Here it is.
>
>
> I found one problem with it.
> Take:
> void f(int *a, int M, int *b)
> {
>   for(int i = 0; i <= M; i++)
>     {
>        if (i < M)
>         a[i] = i;
>     }
> }
> ---- CUT ---
> There are two issues with the code as below.  The outermost loop's
> aux is still set, which causes the vectorizer not to vectorize the loop.
> The other issue is that I need to run pass_scev_cprop after pass_loop_split
> to make the induction variable usage after the loop go away, so that the
> vectorizer will work.

I think scev_cprop needs to be rewritten as a utility so that the vectorizer
itself can (within its own cost-model) eliminate an induction using it.

Richard.

> Something like (note this is copy and paste from a terminal):
> diff --git a/gcc/passes.def b/gcc/passes.def
> index c327900..e8d6ea6 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -262,8 +262,8 @@ along with GCC; see the file COPYING3.  If not see
>           NEXT_PASS (pass_copy_prop);
>           NEXT_PASS (pass_dce);
>           NEXT_PASS (pass_tree_unswitch);
> -         NEXT_PASS (pass_scev_cprop);
>           NEXT_PASS (pass_loop_split);
> +         NEXT_PASS (pass_scev_cprop);
>           NEXT_PASS (pass_record_bounds);
>           NEXT_PASS (pass_loop_distribution);
>           NEXT_PASS (pass_copy_prop);
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 5411530..e72ef19 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -592,7 +592,11 @@ tree_ssa_split_loops (void)
>
>    gcc_assert (scev_initialized_p ());
>    FOR_EACH_LOOP (loop, 0)
> -    loop->aux = NULL;
> +    {
> +      loop->aux = NULL;
> +      if (loop_outer (loop))
> +       loop_outer (loop)->aux = NULL;
> +    }

How does the iterator not visit loop_outer (loop)?!
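
One possible reading, sketched as a toy model: if the walk skips the root
pseudo-loop (the function body) unless a flag like LI_INCLUDE_ROOT is
passed, then a mark set via loop_outer (loop)->aux on an outermost real
loop lands on the root and survives the clearing walk.  The struct and
function names below are made up for illustration and are not GCC's API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct loop; only the fields needed here.  */
struct toy_loop
{
  struct toy_loop *outer;
  void *aux;
};

/* Assumed analogue of LI_INCLUDE_ROOT.  */
enum { TOY_INCLUDE_ROOT = 1 };

/* Clear aux on every loop the walk visits.  Index 0 models the root
   pseudo-loop, which is skipped unless TOY_INCLUDE_ROOT is set.  */
void
toy_clear_aux (struct toy_loop *loops, size_t n, int flags)
{
  size_t first = (flags & TOY_INCLUDE_ROOT) ? 0 : 1;
  for (size_t k = first; k < n; k++)
    loops[k].aux = NULL;
}

/* Mimic "loop_outer (loop)->aux = loop" from the pass.  */
void
toy_mark_outer (struct toy_loop *loop)
{
  assert (loop->outer != NULL);
  loop->outer->aux = loop;
}
```

In this model, marking the outer of loops[1] and then clearing with flags 0
leaves loops[0].aux set; clearing with TOY_INCLUDE_ROOT resets it.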

>
>    /* Go through all loops starting from innermost.  */
>    FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> @@ -631,7 +635,11 @@ tree_ssa_split_loops (void)
>      }
>
>    FOR_EACH_LOOP (loop, 0)
> -    loop->aux = NULL;
> +    {
> +      loop->aux = NULL;
> +      if (loop_outer (loop))
> +       loop_outer (loop)->aux = NULL;
> +    }
>
>    if (changed)
>      return TODO_cleanup_cfg;
> -----  CUT -----
>
> Thanks,
> Andrew
>
>
>>
>>
>> Ciao,
>> Michael.
>>         * common.opt (-fsplit-loops): New flag.
>>         * passes.def (pass_loop_split): Add.
>>         * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
>>         (enable_fdo_optimizations): Add loop splitting.
>>         * timevar.def (TV_LOOP_SPLIT): Add.
>>         * tree-pass.h (make_pass_loop_split): Declare.
>>         * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
>>         * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h.
>>         * tree-ssa-loop-split.c: New file.
>>         * Makefile.in (OBJS): Add tree-ssa-loop-split.o.
>>         * doc/invoke.texi (fsplit-loops): Document.
>>         * doc/passes.texi (Loop optimization): Add paragraph about loop
>>         splitting.
>>
>> testsuite/
>>         * gcc.dg/loop-split.c: New test.
>>
>> Index: common.opt
>> ===================================================================
>> --- common.opt  (revision 231115)
>> +++ common.opt  (working copy)
>> @@ -2453,6 +2457,10 @@ funswitch-loops
>>  Common Report Var(flag_unswitch_loops) Optimization
>>  Perform loop unswitching.
>>
>> +fsplit-loops
>> +Common Report Var(flag_split_loops) Optimization
>> +Perform loop splitting.
>> +
>>  funwind-tables
>>  Common Report Var(flag_unwind_tables) Optimization
>>  Just generate unwind tables for exception handling.
>> Index: passes.def
>> ===================================================================
>> --- passes.def  (revision 231115)
>> +++ passes.def  (working copy)
>> @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3.
>>           NEXT_PASS (pass_dce);
>>           NEXT_PASS (pass_tree_unswitch);
>>           NEXT_PASS (pass_scev_cprop);
>> +         NEXT_PASS (pass_loop_split);
>>           NEXT_PASS (pass_record_bounds);
>>           NEXT_PASS (pass_loop_distribution);
>>           NEXT_PASS (pass_copy_prop);
>> Index: opts.c
>> ===================================================================
>> --- opts.c      (revision 231115)
>> +++ opts.c      (working copy)
>> @@ -532,6 +532,7 @@ static const struct default_options defa
>>         regardless of them being declared inline.  */
>>      { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },
>>      { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 },
>> +    { OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
>>      { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
>>      { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
>>      { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
>> @@ -1411,6 +1412,8 @@ enable_fdo_optimizations (struct gcc_opt
>>      opts->x_flag_ipa_cp_alignment = value;
>>    if (!opts_set->x_flag_predictive_commoning)
>>      opts->x_flag_predictive_commoning = value;
>> +  if (!opts_set->x_flag_split_loops)
>> +    opts->x_flag_split_loops = value;
>>    if (!opts_set->x_flag_unswitch_loops)
>>      opts->x_flag_unswitch_loops = value;
>>    if (!opts_set->x_flag_gcse_after_reload)
>> Index: timevar.def
>> ===================================================================
>> --- timevar.def (revision 231115)
>> +++ timevar.def (working copy)
>> @@ -182,6 +182,7 @@ DEFTIMEVAR (TV_LIM                   , "
>>  DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
>>  DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
>>  DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
>> +DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
>>  DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
>>  DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
>>  DEFTIMEVAR (TV_TREE_VECTORIZATION    , "tree vectorization")
>> Index: tree-pass.h
>> ===================================================================
>> --- tree-pass.h (revision 231115)
>> +++ tree-pass.h (working copy)
>> @@ -370,6 +370,7 @@ extern gimple_opt_pass *make_pass_tree_n
>>  extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
>>  extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
>>  extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
>> +extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
>>  extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
>>  extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
>>  extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
>> Index: tree-ssa-loop-manip.h
>> ===================================================================
>> --- tree-ssa-loop-manip.h       (revision 231115)
>> +++ tree-ssa-loop-manip.h       (working copy)
>> @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
>>
>>  extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
>>                        bool, tree *, tree *);
>> +extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
>> +                                           struct loop *);
>>  extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
>>  extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
>>  extern void verify_loop_closed_ssa (bool);
>> Index: Makefile.in
>> ===================================================================
>> --- Makefile.in (revision 231115)
>> +++ Makefile.in (working copy)
>> @@ -1474,6 +1474,7 @@ OBJS = \
>>         tree-ssa-loop-manip.o \
>>         tree-ssa-loop-niter.o \
>>         tree-ssa-loop-prefetch.o \
>> +       tree-ssa-loop-split.o \
>>         tree-ssa-loop-unswitch.o \
>>         tree-ssa-loop.o \
>>         tree-ssa-math-opts.o \
>> Index: tree-ssa-loop-split.c
>> ===================================================================
>> --- tree-ssa-loop-split.c       (revision 0)
>> +++ tree-ssa-loop-split.c       (working copy)
>> @@ -0,0 +1,686 @@
>> +/* Loop splitting.
>> +   Copyright (C) 2015 Free Software Foundation, Inc.
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify it
>> +under the terms of the GNU General Public License as published by the
>> +Free Software Foundation; either version 3, or (at your option) any
>> +later version.
>> +
>> +GCC is distributed in the hope that it will be useful, but WITHOUT
>> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>> +for more details.
>> +
>> +You should have received a copy of the GNU General Public License
>> +along with GCC; see the file COPYING3.  If not see
>> +<http://www.gnu.org/licenses/>.  */
>> +
>> +#include "config.h"
>> +#include "system.h"
>> +#include "coretypes.h"
>> +#include "backend.h"
>> +#include "tree.h"
>> +#include "gimple.h"
>> +#include "tree-pass.h"
>> +#include "ssa.h"
>> +#include "fold-const.h"
>> +#include "tree-cfg.h"
>> +#include "tree-ssa.h"
>> +#include "tree-ssa-loop-niter.h"
>> +#include "tree-ssa-loop.h"
>> +#include "tree-ssa-loop-manip.h"
>> +#include "tree-into-ssa.h"
>> +#include "cfgloop.h"
>> +#include "tree-scalar-evolution.h"
>> +#include "gimple-iterator.h"
>> +#include "gimple-pretty-print.h"
>> +#include "cfghooks.h"
>> +#include "gimple-fold.h"
>> +#include "gimplify-me.h"
>> +
>> +/* This file implements loop splitting, i.e. transformation of loops like
>> +
>> +   for (i = 0; i < 100; i++)
>> +     {
>> +       if (i < 50)
>> +         A;
>> +       else
>> +         B;
>> +     }
>> +
>> +   into:
>> +
>> +   for (i = 0; i < 50; i++)
>> +     {
>> +       A;
>> +     }
>> +   for (; i < 100; i++)
>> +     {
>> +       B;
>> +     }
>> +
>> +   */
>> +
>> +/* Return true when BB inside LOOP is a potential iteration space
>> +   split point, i.e. ends with a condition like "IV < comp", which
>> +   is true on one side of the iteration space and false on the other,
>> +   and the split point can be computed.  If so, also return the border
>> +   point in *BORDER and the comparison induction variable in IV.  */
>> +
>> +static tree
>> +split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv)
>> +{
>> +  gimple *last;
>> +  gcond *stmt;
>> +  affine_iv iv2;
>> +
>> +  /* BB must end in a simple conditional jump.  */
>> +  last = last_stmt (bb);
>> +  if (!last || gimple_code (last) != GIMPLE_COND)
>> +    return NULL_TREE;
>> +  stmt = as_a <gcond *> (last);
>> +
>> +  enum tree_code code = gimple_cond_code (stmt);
>> +
>> +  /* Only handle relational comparisons, for equality and non-equality
>> +     we'd have to split the loop into two loops and a middle statement.  */
>> +  switch (code)
>> +    {
>> +      case LT_EXPR:
>> +      case LE_EXPR:
>> +      case GT_EXPR:
>> +      case GE_EXPR:
>> +       break;
>> +      default:
>> +       return NULL_TREE;
>> +    }
>> +
>> +  if (loop_exits_from_bb_p (loop, bb))
>> +    return NULL_TREE;
>> +
>> +  tree op0 = gimple_cond_lhs (stmt);
>> +  tree op1 = gimple_cond_rhs (stmt);
>> +
>> +  if (!simple_iv (loop, loop, op0, iv, false))
>> +    return NULL_TREE;
>> +  if (!simple_iv (loop, loop, op1, &iv2, false))
>> +    return NULL_TREE;
>> +
>> +  /* Make it so that the first argument of the condition is
>> +     the looping one.  */
>> +  if (!integer_zerop (iv2.step))
>> +    {
>> +      std::swap (op0, op1);
>> +      std::swap (*iv, iv2);
>> +      code = swap_tree_comparison (code);
>> +      gimple_cond_set_condition (stmt, code, op0, op1);
>> +      update_stmt (stmt);
>> +    }
>> +  else if (integer_zerop (iv->step))
>> +    return NULL_TREE;
>> +  if (!integer_zerop (iv2.step))
>> +    return NULL_TREE;
>> +
>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>> +    {
>> +      fprintf (dump_file, "Found potential split point: ");
>> +      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>> +      fprintf (dump_file, " { ");
>> +      print_generic_expr (dump_file, iv->base, TDF_SLIM);
>> +      fprintf (dump_file, " + I*");
>> +      print_generic_expr (dump_file, iv->step, TDF_SLIM);
>> +      fprintf (dump_file, " } %s ", get_tree_code_name (code));
>> +      print_generic_expr (dump_file, iv2.base, TDF_SLIM);
>> +      fprintf (dump_file, "\n");
>> +    }
>> +
>> +  *border = iv2.base;
>> +  return op0;
>> +}
>> +
>> +/* Given a GUARD conditional stmt inside LOOP, which we want to make always
>> +   true or false depending on INITIAL_TRUE, and adjusted values NEXTVAL
>> +   (a post-increment IV) and NEWBOUND (the comparator) adjust the loop
>> +   exit test statement to loop back only if the GUARD statement will
>> +   also be true/false in the next iteration.  */
>> +
>> +static void
>> +patch_loop_exit (struct loop *loop, gcond *guard, tree nextval, tree newbound,
>> +                bool initial_true)
>> +{
>> +  edge exit = single_exit (loop);
>> +  gcond *stmt = as_a <gcond *> (last_stmt (exit->src));
>> +  gimple_cond_set_condition (stmt, gimple_cond_code (guard),
>> +                            nextval, newbound);
>> +  update_stmt (stmt);
>> +
>> +  edge stay = single_pred_edge (loop->latch);
>> +
>> +  exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
>> +  stay->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
>> +
>> +  if (initial_true)
>> +    {
>> +      exit->flags |= EDGE_FALSE_VALUE;
>> +      stay->flags |= EDGE_TRUE_VALUE;
>> +    }
>> +  else
>> +    {
>> +      exit->flags |= EDGE_TRUE_VALUE;
>> +      stay->flags |= EDGE_FALSE_VALUE;
>> +    }
>> +}
>> +
>> +/* Given an induction variable GUARD_IV, and its affine descriptor IV,
>> +   find the loop phi node in LOOP defining it directly, or create
>> +   such phi node.  Return that phi node.  */
>> +
>> +static gphi *
>> +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/)
>> +{
>> +  gimple *def = SSA_NAME_DEF_STMT (guard_iv);
>> +  gphi *phi;
>> +  if ((phi = dyn_cast <gphi *> (def))
>> +      && gimple_bb (phi) == loop->header)
>> +    return phi;
>> +
>> +  /* XXX Create the PHI instead.  */
>> +  return NULL;
>> +}
>> +
>> +/* This function updates the SSA form after connect_loops made a new
>> +   edge NEW_E leading from LOOP1 exit to LOOP2 (via an intermediate
>> +   conditional).  I.e. the second loop can now be entered either
>> +   via the original entry or via NEW_E, so the entry values of LOOP2
>> +   phi nodes are either the original ones or those at the exit
>> +   of LOOP1.  Insert new phi nodes in LOOP2 pre-header reflecting
>> +   this.  */
>> +
>> +static void
>> +connect_loop_phis (struct loop *loop1, struct loop *loop2, edge new_e)
>> +{
>> +  basic_block rest = loop_preheader_edge (loop2)->src;
>> +  gcc_assert (new_e->dest == rest);
>> +  edge skip_first = EDGE_PRED (rest, EDGE_PRED (rest, 0) == new_e);
>> +
>> +  edge firste = loop_preheader_edge (loop1);
>> +  edge seconde = loop_preheader_edge (loop2);
>> +  edge firstn = loop_latch_edge (loop1);
>> +  gphi_iterator psi_first, psi_second;
>> +  for (psi_first = gsi_start_phis (loop1->header),
>> +       psi_second = gsi_start_phis (loop2->header);
>> +       !gsi_end_p (psi_first);
>> +       gsi_next (&psi_first), gsi_next (&psi_second))
>> +    {
>> +      tree init, next, new_init;
>> +      use_operand_p op;
>> +      gphi *phi_first = psi_first.phi ();
>> +      gphi *phi_second = psi_second.phi ();
>> +
>> +      init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste);
>> +      next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn);
>> +      op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde);
>> +      gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
>> +
>> +      /* Prefer using original variable as a base for the new ssa name.
>> +        This is necessary for virtual ops, and useful in order to avoid
>> +        losing debug info for real ops.  */
>> +      if (TREE_CODE (next) == SSA_NAME
>> +         && useless_type_conversion_p (TREE_TYPE (next),
>> +                                       TREE_TYPE (init)))
>> +       new_init = copy_ssa_name (next);
>> +      else if (TREE_CODE (init) == SSA_NAME
>> +              && useless_type_conversion_p (TREE_TYPE (init),
>> +                                            TREE_TYPE (next)))
>> +       new_init = copy_ssa_name (init);
>> +      else if (useless_type_conversion_p (TREE_TYPE (next),
>> +                                         TREE_TYPE (init)))
>> +       new_init = make_temp_ssa_name (TREE_TYPE (next), NULL,
>> +                                      "unrinittmp");
>> +      else
>> +       new_init = make_temp_ssa_name (TREE_TYPE (init), NULL,
>> +                                      "unrinittmp");
>> +
>> +      gphi * newphi = create_phi_node (new_init, rest);
>> +      add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION);
>> +      add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION);
>> +      SET_USE (op, new_init);
>> +    }
>> +}
>> +
>> +/* The two loops LOOP1 and LOOP2 were just created by loop versioning,
>> +   they are still equivalent and placed in two arms of a diamond, like so:
>> +
>> +               .------if (cond)------.
>> +               v                     v
>> +             pre1                   pre2
>> +              |                      |
>> +        .--->h1                     h2<----.
>> +        |     |                      |     |
>> +        |    ex1---.            .---ex2    |
>> +        |    /     |            |     \    |
>> +        '---l1     X            |     l2---'
>> +                   |            |
>> +                   |            |
>> +                   '--->join<---'
>> +
>> +   This function transforms the program such that LOOP1 is conditionally
>> +   falling through to LOOP2, or skipping it.  This is done by splitting
>> +   the ex1->join edge at X in the diagram above, and inserting a condition
>> +   whose one arm goes to pre2, resulting in this situation:
>> +
>> +               .------if (cond)------.
>> +               v                     v
>> +             pre1       .---------->pre2
>> +              |         |            |
>> +        .--->h1         |           h2<----.
>> +        |     |         |            |     |
>> +        |    ex1---.    |       .---ex2    |
>> +        |    /     v    |       |     \    |
>> +        '---l1   skip---'       |     l2---'
>> +                   |            |
>> +                   |            |
>> +                   '--->join<---'
>> +
>> +
>> +   The condition used is the exit condition of LOOP1, which effectively means
>> +   that when the first loop exits (for whatever reason) but the real original
>> +   exit expression is still false the second loop will be entered.
>> +   The function returns the new edge cond->pre2.
>> +
>> +   This doesn't update the SSA form, see connect_loop_phis for that.  */
>> +
>> +static edge
>> +connect_loops (struct loop *loop1, struct loop *loop2)
>> +{
>> +  edge exit = single_exit (loop1);
>> +  basic_block skip_bb = split_edge (exit);
>> +  gcond *skip_stmt;
>> +  gimple_stmt_iterator gsi;
>> +  edge new_e, skip_e;
>> +
>> +  gimple *stmt = last_stmt (exit->src);
>> +  skip_stmt = gimple_build_cond (gimple_cond_code (stmt),
>> +                                gimple_cond_lhs (stmt),
>> +                                gimple_cond_rhs (stmt),
>> +                                NULL_TREE, NULL_TREE);
>> +  gsi = gsi_last_bb (skip_bb);
>> +  gsi_insert_after (&gsi, skip_stmt, GSI_NEW_STMT);
>> +
>> +  skip_e = EDGE_SUCC (skip_bb, 0);
>> +  skip_e->flags &= ~EDGE_FALLTHRU;
>> +  new_e = make_edge (skip_bb, loop_preheader_edge (loop2)->src, 0);
>> +  if (exit->flags & EDGE_TRUE_VALUE)
>> +    {
>> +      skip_e->flags |= EDGE_TRUE_VALUE;
>> +      new_e->flags |= EDGE_FALSE_VALUE;
>> +    }
>> +  else
>> +    {
>> +      skip_e->flags |= EDGE_FALSE_VALUE;
>> +      new_e->flags |= EDGE_TRUE_VALUE;
>> +    }
>> +
>> +  new_e->count = skip_bb->count;
>> +  new_e->probability = PROB_LIKELY;
>> +  new_e->count = apply_probability (skip_e->count, PROB_LIKELY);
>> +  skip_e->count -= new_e->count;
>> +  skip_e->probability = inverse_probability (PROB_LIKELY);
>> +
>> +  return new_e;
>> +}
>> +
>> +/* This returns the new bound for iterations given the original iteration
>> +   space in NITER, an arbitrary new bound BORDER, assumed to be some
>> +   comparison value with a different IV, the initial value GUARD_INIT of
>> +   that other IV, and the comparison code GUARD_CODE that compares
>> +   that other IV with BORDER.  We return an SSA name, and place any
>> +   necessary statements for that computation into *STMTS.
>> +
>> +   For example for such a loop:
>> +
>> +     for (i = beg, j = guard_init; i < end; i++, j++)
>> +       if (j < border)  // this is supposed to be true/false
>> +         ...
>> +
>> +   we want to return a new bound (on j) that makes the loop iterate
>> +   as long as the condition j < border stays true.  We also don't want
>> +   to iterate more often than the original loop, so we have to introduce
>> +   some cut-off as well (via min/max), effectively resulting in:
>> +
>> +     newend = min (end+guard_init-beg, border)
>> +     for (i = beg, j = guard_init; j < newend; i++, j++)
>> +       if (j < border)
>> +         ...
>> +
>> +   Depending on the direction of the IVs and if the exit tests
>> +   are strict or non-strict we need to use MIN or MAX,
>> +   and add or subtract 1.  This routine computes newend above.  */
>> +
>> +static tree
>> +compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter,
>> +                        tree border,
>> +                        enum tree_code guard_code, tree guard_init)
>> +{
>> +  /* The niter structure contains the after-increment IV, we need
>> +     the loop-enter base, so subtract STEP once.  */
>> +  tree controlbase = force_gimple_operand (niter->control.base,
>> +                                          stmts, true, NULL_TREE);
>> +  tree controlstep = niter->control.step;
>> +  tree enddiff;
>> +  if (POINTER_TYPE_P (TREE_TYPE (controlbase)))
>> +    {
>> +      controlstep = gimple_build (stmts, NEGATE_EXPR,
>> +                                 TREE_TYPE (controlstep), controlstep);
>> +      enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
>> +                             TREE_TYPE (controlbase),
>> +                             controlbase, controlstep);
>> +    }
>> +  else
>> +    enddiff = gimple_build (stmts, MINUS_EXPR,
>> +                           TREE_TYPE (controlbase),
>> +                           controlbase, controlstep);
>> +
>> +  /* Compute beg-guard_init.  */
>> +  if (POINTER_TYPE_P (TREE_TYPE (enddiff)))
>> +    {
>> +      tree tem = gimple_convert (stmts, sizetype, guard_init);
>> +      tem = gimple_build (stmts, NEGATE_EXPR, sizetype, tem);
>> +      enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
>> +                             TREE_TYPE (enddiff),
>> +                             enddiff, tem);
>> +    }
>> +  else
>> +    enddiff = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
>> +                           enddiff, guard_init);
>> +
>> +  /* Compute end-(beg-guard_init).  */
>> +  gimple_seq stmts2;
>> +  tree newbound = force_gimple_operand (niter->bound, &stmts2,
>> +                                       true, NULL_TREE);
>> +  gimple_seq_add_seq_without_update (stmts, stmts2);
>> +
>> +  if (POINTER_TYPE_P (TREE_TYPE (enddiff))
>> +      || POINTER_TYPE_P (TREE_TYPE (newbound)))
>> +    {
>> +      enddiff = gimple_convert (stmts, sizetype, enddiff);
>> +      enddiff = gimple_build (stmts, NEGATE_EXPR, sizetype, enddiff);
>> +      newbound = gimple_build (stmts, POINTER_PLUS_EXPR,
>> +                              TREE_TYPE (newbound),
>> +                              newbound, enddiff);
>> +    }
>> +  else
>> +    newbound = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
>> +                            newbound, enddiff);
>> +
>> +  /* Depending on the direction of the IVs the new bound for the first
>> +     loop is the minimum or maximum of old bound and border.
>> +     Also, if the guard condition isn't strictly less or greater,
>> +     we need to adjust the bound.  */
>> +  int addbound = 0;
>> +  enum tree_code minmax;
>> +  if (niter->cmp == LT_EXPR)
>> +    {
>> +      /* GT and LE are the same, inverted.  */
>> +      if (guard_code == GT_EXPR || guard_code == LE_EXPR)
>> +       addbound = -1;
>> +      minmax = MIN_EXPR;
>> +    }
>> +  else
>> +    {
>> +      gcc_assert (niter->cmp == GT_EXPR);
>> +      if (guard_code == GE_EXPR || guard_code == LT_EXPR)
>> +       addbound = 1;
>> +      minmax = MAX_EXPR;
>> +    }
>> +
>> +  if (addbound)
>> +    {
>> +      tree type2 = TREE_TYPE (newbound);
>> +      if (POINTER_TYPE_P (type2))
>> +       type2 = sizetype;
>> +      newbound = gimple_build (stmts,
>> +                              POINTER_TYPE_P (TREE_TYPE (newbound))
>> +                              ? POINTER_PLUS_EXPR : PLUS_EXPR,
>> +                              TREE_TYPE (newbound),
>> +                              newbound,
>> +                              build_int_cst (type2, addbound));
>> +    }
>> +
>> +  tree newend = gimple_build (stmts, minmax, TREE_TYPE (border),
>> +                             border, newbound);
>> +  return newend;
>> +}
>> +
>> +/* Checks if LOOP contains a conditional block whose condition
>> +   depends on which side in the iteration space it is, and if so
>> +   splits the iteration space into two loops.  Returns true if the
>> +   loop was split.  NITER must contain the iteration descriptor for the
>> +   single exit of LOOP.  */
>> +
>> +static bool
>> +split_loop (struct loop *loop1, struct tree_niter_desc *niter)
>> +{
>> +  basic_block *bbs;
>> +  unsigned i;
>> +  bool changed = false;
>> +  tree guard_iv;
>> +  tree border;
>> +  affine_iv iv;
>> +
>> +  bbs = get_loop_body (loop1);
>> +
>> +  /* Find a splitting opportunity.  */
>> +  for (i = 0; i < loop1->num_nodes; i++)
>> +    if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
>> +      {
>> +       /* Handling opposite steps is not implemented yet.  Neither
>> +          is handling different step sizes.  */
>> +       if ((tree_int_cst_sign_bit (iv.step)
>> +            != tree_int_cst_sign_bit (niter->control.step))
>> +           || !tree_int_cst_equal (iv.step, niter->control.step))
>> +         continue;
>> +
>> +       /* Find a loop PHI node that defines guard_iv directly,
>> +          or create one doing that.  */
>> +       gphi *phi = find_or_create_guard_phi (loop1, guard_iv, &iv);
>> +       if (!phi)
>> +         continue;
>> +       gcond *guard_stmt = as_a<gcond *> (last_stmt (bbs[i]));
>> +       tree guard_init = PHI_ARG_DEF_FROM_EDGE (phi,
>> +                                                loop_preheader_edge (loop1));
>> +       enum tree_code guard_code = gimple_cond_code (guard_stmt);
>> +
>> +       /* Loop splitting is implemented by versioning the loop, placing
>> +          the new loop after the old loop, making the first loop iterate
>> +          as long as the conditional stays true (or false) and letting the
>> +          second (new) loop handle the rest of the iterations.
>> +
>> +          First we need to determine if the condition will start being true
>> +          or false in the first loop.  */
>> +       bool initial_true;
>> +       switch (guard_code)
>> +         {
>> +           case LT_EXPR:
>> +           case LE_EXPR:
>> +             initial_true = !tree_int_cst_sign_bit (iv.step);
>> +             break;
>> +           case GT_EXPR:
>> +           case GE_EXPR:
>> +             initial_true = tree_int_cst_sign_bit (iv.step);
>> +             break;
>> +           default:
>> +             gcc_unreachable ();
>> +         }
>> +
>> +       /* Build a condition that will skip the first loop when the
>> +          guard condition won't ever be true (or false).  */
>> +       gimple_seq stmts2;
>> +       border = force_gimple_operand (border, &stmts2, true, NULL_TREE);
>> +       if (stmts2)
>> +         gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
>> +                                           stmts2);
>> +       tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
>> +       if (!initial_true)
>> +         cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>> +
>> +       /* Now version the loop, placing loop2 after loop1 connecting
>> +          them, and fix up SSA form for that.  */
>> +       initialize_original_copy_tables ();
>> +       basic_block cond_bb;
>> +       struct loop *loop2 = loop_version (loop1, cond, &cond_bb,
>> +                                          REG_BR_PROB_BASE, REG_BR_PROB_BASE,
>> +                                          REG_BR_PROB_BASE, true);
>> +       gcc_assert (loop2);
>> +       update_ssa (TODO_update_ssa);
>> +
>> +       edge new_e = connect_loops (loop1, loop2);
>> +       connect_loop_phis (loop1, loop2, new_e);
>> +
>> +       /* The iterations of the second loop are now already
>> +          exactly those that the first loop didn't do, but the
>> +          iteration space of the first loop is still the original one.
>> +          Compute the new bound for the guarding IV and patch the
>> +          loop exit to use it instead of original IV and bound.  */
>> +       gimple_seq stmts = NULL;
>> +       tree newend = compute_new_first_bound (&stmts, niter, border,
>> +                                              guard_code, guard_init);
>> +       if (stmts)
>> +         gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
>> +                                           stmts);
>> +       tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
>> +       patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
>> +
>> +       /* Finally patch out the two copies of the condition to be always
>> +          true/false (or opposite).  */
>> +       gcond *force_true = as_a<gcond *> (last_stmt (bbs[i]));
>> +       gcond *force_false = as_a<gcond *> (last_stmt (get_bb_copy (bbs[i])));
>> +       if (!initial_true)
>> +         std::swap (force_true, force_false);
>> +       gimple_cond_make_true (force_true);
>> +       gimple_cond_make_false (force_false);
>> +       update_stmt (force_true);
>> +       update_stmt (force_false);
>> +
>> +       free_original_copy_tables ();
>> +
>> +       /* We destroyed LCSSA form above.  Eventually we might be able
>> +          to fix it on the fly, for now simply punt and use the helper.  */
>> +       rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
>> +
>> +       changed = true;
>> +       if (dump_file && (dump_flags & TDF_DETAILS))
>> +         fprintf (dump_file, ";; Loop split.\n");
>> +
>> +       /* Only deal with the first opportunity.  */
>> +       break;
>> +      }
>> +
>> +  free (bbs);
>> +  return changed;
>> +}
>> +
>> +/* Main entry point.  Perform loop splitting on all suitable loops.  */
>> +
>> +static unsigned int
>> +tree_ssa_split_loops (void)
>> +{
>> +  struct loop *loop;
>> +  bool changed = false;
>> +
>> +  gcc_assert (scev_initialized_p ());
>> +  FOR_EACH_LOOP (loop, 0)
>> +    loop->aux = NULL;
>> +
>> +  /* Go through all loops starting from innermost.  */
>> +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
>> +    {
>> +      struct tree_niter_desc niter;
>> +      if (loop->aux)
>> +       {
>> +         /* If any of our inner loops was split, don't split us,
>> +            and mark our containing loop as having had splits as well.  */
>> +         loop_outer (loop)->aux = loop;
>> +         continue;
>> +       }
>> +
>> +      if (single_exit (loop)
>> +         /* ??? We could handle non-empty latches when we split
>> +            the latch edge (not the exit edge), and put the new
>> +            exit condition in the new block.  OTOH this executes some
>> +            code unconditionally that might have been skipped by the
>> +            original exit before.  */
>> +         && empty_block_p (loop->latch)
>> +         && !optimize_loop_for_size_p (loop)
>> +         && number_of_iterations_exit (loop, single_exit (loop), &niter,
>> +                                       false, true)
>> +         && niter.cmp != ERROR_MARK
>> +         /* We can't yet handle loops controlled by a != predicate.  */
>> +         && niter.cmp != NE_EXPR)
>> +       {
>> +         if (split_loop (loop, &niter))
>> +           {
>> +             /* Mark our containing loop as having had some split inner
>> +                loops.  */
>> +             loop_outer (loop)->aux = loop;
>> +             changed = true;
>> +           }
>> +       }
>> +    }
>> +
>> +  FOR_EACH_LOOP (loop, 0)
>> +    loop->aux = NULL;
>> +
>> +  if (changed)
>> +    return TODO_cleanup_cfg;
>> +  return 0;
>> +}
>> +
>> +/* Loop splitting pass.  */
>> +
>> +namespace {
>> +
>> +const pass_data pass_data_loop_split =
>> +{
>> +  GIMPLE_PASS, /* type */
>> +  "lsplit", /* name */
>> +  OPTGROUP_LOOP, /* optinfo_flags */
>> +  TV_LOOP_SPLIT, /* tv_id */
>> +  PROP_cfg, /* properties_required */
>> +  0, /* properties_provided */
>> +  0, /* properties_destroyed */
>> +  0, /* todo_flags_start */
>> +  0, /* todo_flags_finish */
>> +};
>> +
>> +class pass_loop_split : public gimple_opt_pass
>> +{
>> +public:
>> +  pass_loop_split (gcc::context *ctxt)
>> +    : gimple_opt_pass (pass_data_loop_split, ctxt)
>> +  {}
>> +
>> +  /* opt_pass methods: */
>> +  virtual bool gate (function *) { return flag_split_loops != 0; }
>> +  virtual unsigned int execute (function *);
>> +
>> +}; // class pass_loop_split
>> +
>> +unsigned int
>> +pass_loop_split::execute (function *fun)
>> +{
>> +  if (number_of_loops (fun) <= 1)
>> +    return 0;
>> +
>> +  return tree_ssa_split_loops ();
>> +}
>> +
>> +} // anon namespace
>> +
>> +gimple_opt_pass *
>> +make_pass_loop_split (gcc::context *ctxt)
>> +{
>> +  return new pass_loop_split (ctxt);
>> +}
>> Index: doc/invoke.texi
>> ===================================================================
>> --- doc/invoke.texi     (revision 231115)
>> +++ doc/invoke.texi     (working copy)
>> @@ -446,7 +446,7 @@ Objective-C and Objective-C++ Dialects}.
>>  -fselective-scheduling -fselective-scheduling2 @gol
>>  -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol
>>  -fsemantic-interposition -fshrink-wrap -fsignaling-nans @gol
>> --fsingle-precision-constant -fsplit-ivs-in-unroller @gol
>> +-fsingle-precision-constant -fsplit-ivs-in-unroller -fsplit-loops @gol
>>  -fsplit-paths @gol
>>  -fsplit-wide-types -fssa-backprop -fssa-phiopt @gol
>>  -fstack-protector -fstack-protector-all -fstack-protector-strong @gol
>> @@ -10197,6 +10197,11 @@ Enabled with @option{-fprofile-use}.
>>  Enables the loop invariant motion pass in the RTL loop optimizer.  Enabled
>>  at level @option{-O1}
>>
>> +@item -fsplit-loops
>> +@opindex fsplit-loops
>> +Split a loop into two if it contains a condition that's always true
>> +for one side of the iteration space and false for the other.
>> +
>>  @item -funswitch-loops
>>  @opindex funswitch-loops
>>  Move branches with loop invariant conditions out of the loop, with duplicates
>> Index: doc/passes.texi
>> ===================================================================
>> --- doc/passes.texi     (revision 231115)
>> +++ doc/passes.texi     (working copy)
>> @@ -484,6 +484,12 @@ out of the loops.  To achieve this, a du
>>  each possible outcome of conditional jump(s).  The pass is implemented in
>>  @file{tree-ssa-loop-unswitch.c}.
>>
>> +Loop splitting.  If a loop contains a conditional statement that is
>> +always true for one part of the iteration space and false for the other,
>> +this pass splits the loop into two, one dealing with one side, the other
>> +only with the other, thereby removing one inner-loop conditional.  The
>> +pass is implemented in @file{tree-ssa-loop-split.c}.
>> +
>>  The optimizations also use various utility functions contained in
>>  @file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and
>>  @file{cfgloopmanip.c}.
>> Index: testsuite/gcc.dg/loop-split.c
>> ===================================================================
>> --- testsuite/gcc.dg/loop-split.c       (revision 0)
>> +++ testsuite/gcc.dg/loop-split.c       (working copy)
>> @@ -0,0 +1,147 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
>> +
>> +#ifdef __cplusplus
>> +extern "C" int printf (const char *, ...);
>> +extern "C" void abort (void);
>> +#else
>> +extern int printf (const char *, ...);
>> +extern void abort (void);
>> +#endif
>> +
>> +/* Define TRACE to 1 or 2 to get detailed tracing.
>> +   Define SINGLE_TEST to 1 or 2 to get a simple routine with
>> +   just one loop, called only one time or with multiple parameters,
>> +   to make debugging easier.  */
>> +#ifndef TRACE
>> +#define TRACE 0
>> +#endif
>> +
>> +#define loop(beg,step,beg2,cond1,cond2) \
>> +    do \
>> +      { \
>> +       sum = 0; \
>> +        for (i = (beg), j = (beg2); (cond1); i+=(step),j+=(step)) \
>> +          { \
>> +            if (cond2) { \
>> +             if (TRACE > 1) printf ("a: %d %d\n", i, j); \
>> +              sum += a[i]; \
>> +           } else { \
>> +             if (TRACE > 1) printf ("b: %d %d\n", i, j); \
>> +              sum += b[i]; \
>> +           } \
>> +          } \
>> +       if (TRACE > 0) printf ("sum: %d\n", sum); \
>> +       check = check * 47 + sum; \
>> +      } while (0)
>> +
>> +#ifndef SINGLE_TEST
>> +unsigned __attribute__((noinline, noclone)) dotest (int beg, int end, int step,
>> +                                              int c, int *a, int *b, int beg2)
>> +{
>> +  unsigned check = 0;
>> +  int sum;
>> +  int i, j;
>> +  loop (beg, 1, beg2, i < end, j < c);
>> +  loop (beg, 1, beg2, i <= end, j < c);
>> +  loop (beg, 1, beg2, i < end, j <= c);
>> +  loop (beg, 1, beg2, i <= end, j <= c);
>> +  loop (beg, 1, beg2, i < end, j > c);
>> +  loop (beg, 1, beg2, i <= end, j > c);
>> +  loop (beg, 1, beg2, i < end, j >= c);
>> +  loop (beg, 1, beg2, i <= end, j >= c);
>> +  beg2 += end-beg;
>> +  loop (end, -1, beg2, i >= beg, j >= c);
>> +  loop (end, -1, beg2, i >= beg, j > c);
>> +  loop (end, -1, beg2, i > beg, j >= c);
>> +  loop (end, -1, beg2, i > beg, j > c);
>> +  loop (end, -1, beg2, i >= beg, j <= c);
>> +  loop (end, -1, beg2, i >= beg, j < c);
>> +  loop (end, -1, beg2, i > beg, j <= c);
>> +  loop (end, -1, beg2, i > beg, j < c);
>> +  return check;
>> +}
>> +
>> +#else
>> +
>> +int __attribute__((noinline, noclone)) f (int beg, int end, int step,
>> +                                         int c, int *a, int *b, int beg2)
>> +{
>> +  int sum = 0;
>> +  int i, j;
>> +  //for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
>> +  for (i = end, j = beg2 + (end-beg); i > beg; i += -1, j-- /*step*/)
>> +    {
>> +      // i - j == X --> i = X + j
>> +      // --> i < end == X+j < end == j < end - X
>> +      // --> newend = end - (i_init - j_init)
>> +      // j < end-X && j < c --> j < min(end-X,c)
>> +      // j < end-X && j <= c --> j <= min(end-X-1,c) or j < min(end-X,c+1{OF!})
>> +      //if (j < c)
>> +      if (j >= c)
>> +       printf ("a: %d %d\n", i, j);
>> +      /*else
>> +       printf ("b: %d %d\n", i, j);*/
>> +       /*sum += a[i];
>> +      else
>> +       sum += b[i];*/
>> +    }
>> +  return sum;
>> +}
>> +
>> +int __attribute__((noinline, noclone)) f2 (int *beg, int *end, int step,
>> +                                         int *c, int *a, int *b, int *beg2)
>> +{
>> +  int sum = 0;
>> +  int *i, *j;
>> +  for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
>> +    {
>> +      if (j <= c)
>> +       printf ("%d %d\n", i - beg, j - beg);
>> +       /*sum += a[i];
>> +      else
>> +       sum += b[i];*/
>> +    }
>> +  return sum;
>> +}
>> +#endif
>> +
>> +extern int printf (const char *, ...);
>> +
>> +int main ()
>> +{
>> +  int a[] = {0,0,0,0,0, 1,2,3,4,5,6,7,8,9,          0,0,0,0,0};
>> +  int b[] = {0,0,0,0,0, -1,-2,-3,-4,-5,-6,-7,-8,-9, 0,0,0,0,0,};
>> +  int c;
>> +  int diff = 0;
>> +  unsigned check = 0;
>> +#if defined(SINGLE_TEST) && (SINGLE_TEST == 1)
>> +  //dotest (0, 9, 1, -1, a+5, b+5, -1);
>> +  //return 0;
>> +  f (0, 9, 1, 5, a+5, b+5, -1);
>> +  return 0;
>> +#endif
>> +  for (diff = -5; diff <= 5; diff++)
>> +    {
>> +      for (c = -1; c <= 10; c++)
>> +       {
>> +#ifdef SINGLE_TEST
>> +         int s = f (0, 9, 1, c, a+5, b+5, diff);
>> +         //int s = f2 (a+0, a+9, 1, a+c, a+5, b+5, a+diff);
>> +         printf ("%d ", s);
>> +#else
>> +         if (TRACE > 0)
>> +           printf ("check %d %d\n", c, diff);
>> +         check = check * 51 + dotest (0, 9, 1, c, a+5, b+5, diff);
>> +#endif
>> +       }
>> +      //printf ("\n");
>> +    }
>> +  //printf ("%u\n", check);
>> +  if (check != 3213344948)
>> +    abort ();
>> +  return 0;
>> +}
>> +
>> +/* All 16 loops in dotest should be split.  */
>> +/* { dg-final { scan-tree-dump-times "Loop split" 16 "lsplit" } } */
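
As a rough illustration of the bound computation described in the comment on compute_new_first_bound in the patch quoted above, here is a minimal C sketch for the simplest case only: both IVs step by +1, the exit test is "i < end" and the guard is "j < border".  The names split_bound, run_original and run_split are made up for this note and are not part of the patch; the pass itself works on GIMPLE and additionally handles the MAX case, the +-1 adjustments for non-strict comparisons, and pointer types.  Overflow is assumed not to happen.

```c
#include <assert.h>

/* Hypothetical helper: bound on j for the first loop, i.e.
   newend = min (end + guard_init - beg, border).
   Assumes the arithmetic does not overflow.  */
int
split_bound (int beg, int end, int guard_init, int border)
{
  int limit = end + guard_init - beg;   /* "i < end" re-expressed on j */
  return limit < border ? limit : border;
}

/* Original loop: the guard is evaluated in every iteration.  The trace
   (which arm ran for which i) is folded into a checksum.  */
unsigned
run_original (int beg, int end, int guard_init, int border)
{
  unsigned check = 0;
  int i, j;
  for (i = beg, j = guard_init; i < end; i++, j++)
    if (j < border)
      check = check * 31 + (unsigned) i * 2 + 1;   /* stands in for A */
    else
      check = check * 31 + (unsigned) i * 2 + 2;   /* stands in for B */
  return check;
}

/* Hand-split version: the first loop runs while the guard is true,
   the second loop picks up the remaining iterations.  */
unsigned
run_split (int beg, int end, int guard_init, int border)
{
  unsigned check = 0;
  int i = beg, j = guard_init;
  int newend = split_bound (beg, end, guard_init, border);
  assert (newend <= border);
  for (; j < newend; i++, j++)
    check = check * 31 + (unsigned) i * 2 + 1;     /* A only */
  for (; i < end; i++, j++)
    check = check * 31 + (unsigned) i * 2 + 2;     /* B only */
  return check;
}
```

Calling run_original and run_split with the same arguments should give the same checksum, including when the guard is false from the start (the first loop then runs zero times) or when border lies beyond the original iteration space (the min cuts the first loop off at the original end).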

* Re: Gimple loop splitting v2
  2016-07-26 11:32               ` Richard Biener
@ 2016-07-27  6:18                 ` Andrew Pinski
  2016-07-27  8:11                   ` Richard Biener
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Pinski @ 2016-07-27  6:18 UTC (permalink / raw)
  To: Richard Biener; +Cc: Michael Matz, Jeff Law, GCC Patches

On Tue, Jul 26, 2016 at 4:32 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, Jul 25, 2016 at 10:57 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz <matz@suse.de> wrote:
>>> Hi,
>>>
>>> On Tue, 1 Dec 2015, Jeff Law wrote:
>>>
>>>> > So, okay for trunk?
>>>> -ENOPATCH
>>>
>>> Sigh :)
>>> Here it is.
>>
>>
>> I found one problem with it.
>> Take:
>> void f(int *a, int M, int *b)
>> {
>>   for(int i = 0; i <= M; i++)
>>     {
>>        if (i < M)
>>         a[i] = i;
>>     }
>> }
>> ---- CUT ---
>> There are two issues with the code as below.  The outermost loop's
>> aux is still set, which causes the vectorizer not to vectorize the loop.
>> The other issue is that I need to run pass_scev_cprop after pass_loop_split
>> to get the induction variable usage after the loop is gone, so that the
>> vectorizer will work.
>
> I think scev_cprop needs to be rewritten as a utility so that the vectorizer
> itself can (within its own cost-model) eliminate an induction using it.
>
> Richard.
>
>> Something like (note this is copy and paste from a terminal):
>> diff --git a/gcc/passes.def b/gcc/passes.def
>> index c327900..e8d6ea6 100644
>> --- a/gcc/passes.def
>> +++ b/gcc/passes.def
>> @@ -262,8 +262,8 @@ along with GCC; see the file COPYING3.  If not see
>>           NEXT_PASS (pass_copy_prop);
>>           NEXT_PASS (pass_dce);
>>           NEXT_PASS (pass_tree_unswitch);
>> -         NEXT_PASS (pass_scev_cprop);
>>           NEXT_PASS (pass_loop_split);
>> +         NEXT_PASS (pass_scev_cprop);
>>           NEXT_PASS (pass_record_bounds);
>>           NEXT_PASS (pass_loop_distribution);
>>           NEXT_PASS (pass_copy_prop);
>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>> index 5411530..e72ef19 100644
>> --- a/gcc/tree-ssa-loop-split.c
>> +++ b/gcc/tree-ssa-loop-split.c
>> @@ -592,7 +592,11 @@ tree_ssa_split_loops (void)
>>
>>    gcc_assert (scev_initialized_p ());
>>    FOR_EACH_LOOP (loop, 0)
>> -    loop->aux = NULL;
>> +    {
>> +      loop->aux = NULL;
>> +      if (loop_outer (loop))
>> +       loop_outer (loop)->aux = NULL;
>> +    }
>
> How does the iterator not visit loop_outer (loop)?!

The iterator with flags of 0 does not visit the root.  So the way
to fix this is to replace 0 (the flags argument) with LI_INCLUDE_ROOT so
that we zero out the root's aux too.

Thanks,
Andrew

>
>>
>>    /* Go through all loops starting from innermost.  */
>>    FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
>> @@ -631,7 +635,11 @@ tree_ssa_split_loops (void)
>>      }
>>
>>    FOR_EACH_LOOP (loop, 0)
>> -    loop->aux = NULL;
>> +    {
>> +      loop->aux = NULL;
>> +      if (loop_outer (loop))
>> +       loop_outer (loop)->aux = NULL;
>> +    }
>>
>>    if (changed)
>>      return TODO_cleanup_cfg;
>> -----  CUT -----
>>
>> Thanks,
>> Andrew
>>
>>
>>>
>>>
>>> Ciao,
>>> Michael.
>>>         * common.opt (-fsplit-loops): New flag.
>>>         * passes.def (pass_loop_split): Add.
>>>         * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
>>>         (enable_fdo_optimizations): Add loop splitting.
>>>         * timevar.def (TV_LOOP_SPLIT): Add.
>>>         * tree-pass.h (make_pass_loop_split): Declare.
>>>         * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
>>>         * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h.
>>>         * tree-ssa-loop-split.c: New file.
>>>         * Makefile.in (OBJS): Add tree-ssa-loop-split.o.
>>>         * doc/invoke.texi (fsplit-loops): Document.
>>>         * doc/passes.texi (Loop optimization): Add paragraph about loop
>>>         splitting.
>>>
>>> testsuite/
>>>         * gcc.dg/loop-split.c: New test.
>>>
>>> Index: common.opt
>>> ===================================================================
>>> --- common.opt  (revision 231115)
>>> +++ common.opt  (working copy)
>>> @@ -2453,6 +2457,10 @@ funswitch-loops
>>>  Common Report Var(flag_unswitch_loops) Optimization
>>>  Perform loop unswitching.
>>>
>>> +fsplit-loops
>>> +Common Report Var(flag_split_loops) Optimization
>>> +Perform loop splitting.
>>> +
>>>  funwind-tables
>>>  Common Report Var(flag_unwind_tables) Optimization
>>>  Just generate unwind tables for exception handling.
>>> Index: passes.def
>>> ===================================================================
>>> --- passes.def  (revision 231115)
>>> +++ passes.def  (working copy)
>>> @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3.
>>>           NEXT_PASS (pass_dce);
>>>           NEXT_PASS (pass_tree_unswitch);
>>>           NEXT_PASS (pass_scev_cprop);
>>> +         NEXT_PASS (pass_loop_split);
>>>           NEXT_PASS (pass_record_bounds);
>>>           NEXT_PASS (pass_loop_distribution);
>>>           NEXT_PASS (pass_copy_prop);
>>> Index: opts.c
>>> ===================================================================
>>> --- opts.c      (revision 231115)
>>> +++ opts.c      (working copy)
>>> @@ -532,6 +532,7 @@ static const struct default_options defa
>>>         regardless of them being declared inline.  */
>>>      { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },
>>>      { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 },
>>> +    { OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
>>>      { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
>>>      { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
>>>      { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
>>> @@ -1411,6 +1412,8 @@ enable_fdo_optimizations (struct gcc_opt
>>>      opts->x_flag_ipa_cp_alignment = value;
>>>    if (!opts_set->x_flag_predictive_commoning)
>>>      opts->x_flag_predictive_commoning = value;
>>> +  if (!opts_set->x_flag_split_loops)
>>> +    opts->x_flag_split_loops = value;
>>>    if (!opts_set->x_flag_unswitch_loops)
>>>      opts->x_flag_unswitch_loops = value;
>>>    if (!opts_set->x_flag_gcse_after_reload)
>>> Index: timevar.def
>>> ===================================================================
>>> --- timevar.def (revision 231115)
>>> +++ timevar.def (working copy)
>>> @@ -182,6 +182,7 @@ DEFTIMEVAR (TV_LIM                   , "
>>>  DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
>>>  DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
>>>  DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
>>> +DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
>>>  DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
>>>  DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
>>>  DEFTIMEVAR (TV_TREE_VECTORIZATION    , "tree vectorization")
>>> Index: tree-pass.h
>>> ===================================================================
>>> --- tree-pass.h (revision 231115)
>>> +++ tree-pass.h (working copy)
>>> @@ -370,6 +370,7 @@ extern gimple_opt_pass *make_pass_tree_n
>>>  extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
>>>  extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
>>>  extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
>>> +extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
>>>  extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
>>>  extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
>>>  extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
>>> Index: tree-ssa-loop-manip.h
>>> ===================================================================
>>> --- tree-ssa-loop-manip.h       (revision 231115)
>>> +++ tree-ssa-loop-manip.h       (working copy)
>>> @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
>>>
>>>  extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
>>>                        bool, tree *, tree *);
>>> +extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
>>> +                                           struct loop *);
>>>  extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
>>>  extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
>>>  extern void verify_loop_closed_ssa (bool);
>>> Index: Makefile.in
>>> ===================================================================
>>> --- Makefile.in (revision 231115)
>>> +++ Makefile.in (working copy)
>>> @@ -1474,6 +1474,7 @@ OBJS = \
>>>         tree-ssa-loop-manip.o \
>>>         tree-ssa-loop-niter.o \
>>>         tree-ssa-loop-prefetch.o \
>>> +       tree-ssa-loop-split.o \
>>>         tree-ssa-loop-unswitch.o \
>>>         tree-ssa-loop.o \
>>>         tree-ssa-math-opts.o \
>>> Index: tree-ssa-loop-split.c
>>> ===================================================================
>>> --- tree-ssa-loop-split.c       (revision 0)
>>> +++ tree-ssa-loop-split.c       (working copy)
>>> @@ -0,0 +1,686 @@
>>> +/* Loop splitting.
>>> +   Copyright (C) 2015 Free Software Foundation, Inc.
>>> +
>>> +This file is part of GCC.
>>> +
>>> +GCC is free software; you can redistribute it and/or modify it
>>> +under the terms of the GNU General Public License as published by the
>>> +Free Software Foundation; either version 3, or (at your option) any
>>> +later version.
>>> +
>>> +GCC is distributed in the hope that it will be useful, but WITHOUT
>>> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>> +for more details.
>>> +
>>> +You should have received a copy of the GNU General Public License
>>> +along with GCC; see the file COPYING3.  If not see
>>> +<http://www.gnu.org/licenses/>.  */
>>> +
>>> +#include "config.h"
>>> +#include "system.h"
>>> +#include "coretypes.h"
>>> +#include "backend.h"
>>> +#include "tree.h"
>>> +#include "gimple.h"
>>> +#include "tree-pass.h"
>>> +#include "ssa.h"
>>> +#include "fold-const.h"
>>> +#include "tree-cfg.h"
>>> +#include "tree-ssa.h"
>>> +#include "tree-ssa-loop-niter.h"
>>> +#include "tree-ssa-loop.h"
>>> +#include "tree-ssa-loop-manip.h"
>>> +#include "tree-into-ssa.h"
>>> +#include "cfgloop.h"
>>> +#include "tree-scalar-evolution.h"
>>> +#include "gimple-iterator.h"
>>> +#include "gimple-pretty-print.h"
>>> +#include "cfghooks.h"
>>> +#include "gimple-fold.h"
>>> +#include "gimplify-me.h"
>>> +
>>> +/* This file implements loop splitting, i.e. transformation of loops like
>>> +
>>> +   for (i = 0; i < 100; i++)
>>> +     {
>>> +       if (i < 50)
>>> +         A;
>>> +       else
>>> +         B;
>>> +     }
>>> +
>>> +   into:
>>> +
>>> +   for (i = 0; i < 50; i++)
>>> +     {
>>> +       A;
>>> +     }
>>> +   for (; i < 100; i++)
>>> +     {
>>> +       B;
>>> +     }
>>> +
>>> +   */
>>> +
>>> +/* Return true when BB inside LOOP is a potential iteration space
>>> +   split point, i.e. ends with a condition like "IV < comp", which
>>> +   is true on one side of the iteration space and false on the other,
>>> +   and the split point can be computed.  If so, also return the border
>>> +   point in *BORDER and the comparison induction variable in IV.  */
>>> +
>>> +static tree
>>> +split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv)
>>> +{
>>> +  gimple *last;
>>> +  gcond *stmt;
>>> +  affine_iv iv2;
>>> +
>>> +  /* BB must end in a simple conditional jump.  */
>>> +  last = last_stmt (bb);
>>> +  if (!last || gimple_code (last) != GIMPLE_COND)
>>> +    return NULL_TREE;
>>> +  stmt = as_a <gcond *> (last);
>>> +
>>> +  enum tree_code code = gimple_cond_code (stmt);
>>> +
>>> +  /* Only handle relational comparisons, for equality and non-equality
>>> +     we'd have to split the loop into two loops and a middle statement.  */
>>> +  switch (code)
>>> +    {
>>> +      case LT_EXPR:
>>> +      case LE_EXPR:
>>> +      case GT_EXPR:
>>> +      case GE_EXPR:
>>> +       break;
>>> +      default:
>>> +       return NULL_TREE;
>>> +    }
>>> +
>>> +  if (loop_exits_from_bb_p (loop, bb))
>>> +    return NULL_TREE;
>>> +
>>> +  tree op0 = gimple_cond_lhs (stmt);
>>> +  tree op1 = gimple_cond_rhs (stmt);
>>> +
>>> +  if (!simple_iv (loop, loop, op0, iv, false))
>>> +    return NULL_TREE;
>>> +  if (!simple_iv (loop, loop, op1, &iv2, false))
>>> +    return NULL_TREE;
>>> +
>>> +  /* Make it so that the first argument of the condition is
>>> +     the looping one.  */
>>> +  if (!integer_zerop (iv2.step))
>>> +    {
>>> +      std::swap (op0, op1);
>>> +      std::swap (*iv, iv2);
>>> +      code = swap_tree_comparison (code);
>>> +      gimple_cond_set_condition (stmt, code, op0, op1);
>>> +      update_stmt (stmt);
>>> +    }
>>> +  else if (integer_zerop (iv->step))
>>> +    return NULL_TREE;
>>> +  if (!integer_zerop (iv2.step))
>>> +    return NULL_TREE;
>>> +
>>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>>> +    {
>>> +      fprintf (dump_file, "Found potential split point: ");
>>> +      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>>> +      fprintf (dump_file, " { ");
>>> +      print_generic_expr (dump_file, iv->base, TDF_SLIM);
>>> +      fprintf (dump_file, " + I*");
>>> +      print_generic_expr (dump_file, iv->step, TDF_SLIM);
>>> +      fprintf (dump_file, " } %s ", get_tree_code_name (code));
>>> +      print_generic_expr (dump_file, iv2.base, TDF_SLIM);
>>> +      fprintf (dump_file, "\n");
>>> +    }
>>> +
>>> +  *border = iv2.base;
>>> +  return op0;
>>> +}
>>> +
>>> +/* Given a GUARD conditional stmt inside LOOP, which we want to make always
>>> +   true or false depending on INITIAL_TRUE, and adjusted values NEXTVAL
>>> +   (a post-increment IV) and NEWBOUND (the comparator) adjust the loop
>>> +   exit test statement to loop back only if the GUARD statement will
>>> +   also be true/false in the next iteration.  */
>>> +
>>> +static void
>>> +patch_loop_exit (struct loop *loop, gcond *guard, tree nextval, tree newbound,
>>> +                bool initial_true)
>>> +{
>>> +  edge exit = single_exit (loop);
>>> +  gcond *stmt = as_a <gcond *> (last_stmt (exit->src));
>>> +  gimple_cond_set_condition (stmt, gimple_cond_code (guard),
>>> +                            nextval, newbound);
>>> +  update_stmt (stmt);
>>> +
>>> +  edge stay = single_pred_edge (loop->latch);
>>> +
>>> +  exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
>>> +  stay->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
>>> +
>>> +  if (initial_true)
>>> +    {
>>> +      exit->flags |= EDGE_FALSE_VALUE;
>>> +      stay->flags |= EDGE_TRUE_VALUE;
>>> +    }
>>> +  else
>>> +    {
>>> +      exit->flags |= EDGE_TRUE_VALUE;
>>> +      stay->flags |= EDGE_FALSE_VALUE;
>>> +    }
>>> +}
>>> +
>>> +/* Given an induction variable GUARD_IV, and its affine descriptor IV,
>>> +   find the loop phi node in LOOP defining it directly, or create
>>> +   such phi node.  Return that phi node.  */
>>> +
>>> +static gphi *
>>> +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/)
>>> +{
>>> +  gimple *def = SSA_NAME_DEF_STMT (guard_iv);
>>> +  gphi *phi;
>>> +  if ((phi = dyn_cast <gphi *> (def))
>>> +      && gimple_bb (phi) == loop->header)
>>> +    return phi;
>>> +
>>> +  /* XXX Create the PHI instead.  */
>>> +  return NULL;
>>> +}
>>> +
>>> +/* This function updates the SSA form after connect_loops made a new
>>> +   edge NEW_E leading from LOOP1 exit to LOOP2 (via an intermediate
>>> +   conditional).  I.e. the second loop can now be entered either
>>> +   via the original entry or via NEW_E, so the entry values of LOOP2
>>> +   phi nodes are either the original ones or those at the exit
>>> +   of LOOP1.  Insert new phi nodes in LOOP2 pre-header reflecting
>>> +   this.  */
>>> +
>>> +static void
>>> +connect_loop_phis (struct loop *loop1, struct loop *loop2, edge new_e)
>>> +{
>>> +  basic_block rest = loop_preheader_edge (loop2)->src;
>>> +  gcc_assert (new_e->dest == rest);
>>> +  edge skip_first = EDGE_PRED (rest, EDGE_PRED (rest, 0) == new_e);
>>> +
>>> +  edge firste = loop_preheader_edge (loop1);
>>> +  edge seconde = loop_preheader_edge (loop2);
>>> +  edge firstn = loop_latch_edge (loop1);
>>> +  gphi_iterator psi_first, psi_second;
>>> +  for (psi_first = gsi_start_phis (loop1->header),
>>> +       psi_second = gsi_start_phis (loop2->header);
>>> +       !gsi_end_p (psi_first);
>>> +       gsi_next (&psi_first), gsi_next (&psi_second))
>>> +    {
>>> +      tree init, next, new_init;
>>> +      use_operand_p op;
>>> +      gphi *phi_first = psi_first.phi ();
>>> +      gphi *phi_second = psi_second.phi ();
>>> +
>>> +      init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste);
>>> +      next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn);
>>> +      op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde);
>>> +      gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
>>> +
>>> +      /* Prefer using original variable as a base for the new ssa name.
>>> +        This is necessary for virtual ops, and useful in order to avoid
>>> +        losing debug info for real ops.  */
>>> +      if (TREE_CODE (next) == SSA_NAME
>>> +         && useless_type_conversion_p (TREE_TYPE (next),
>>> +                                       TREE_TYPE (init)))
>>> +       new_init = copy_ssa_name (next);
>>> +      else if (TREE_CODE (init) == SSA_NAME
>>> +              && useless_type_conversion_p (TREE_TYPE (init),
>>> +                                            TREE_TYPE (next)))
>>> +       new_init = copy_ssa_name (init);
>>> +      else if (useless_type_conversion_p (TREE_TYPE (next),
>>> +                                         TREE_TYPE (init)))
>>> +       new_init = make_temp_ssa_name (TREE_TYPE (next), NULL,
>>> +                                      "unrinittmp");
>>> +      else
>>> +       new_init = make_temp_ssa_name (TREE_TYPE (init), NULL,
>>> +                                      "unrinittmp");
>>> +
>>> +      gphi * newphi = create_phi_node (new_init, rest);
>>> +      add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION);
>>> +      add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION);
>>> +      SET_USE (op, new_init);
>>> +    }
>>> +}
>>> +
>>> +/* The two loops LOOP1 and LOOP2 were just created by loop versioning,
>>> +   they are still equivalent and placed in two arms of a diamond, like so:
>>> +
>>> +               .------if (cond)------.
>>> +               v                     v
>>> +             pre1                   pre2
>>> +              |                      |
>>> +        .--->h1                     h2<----.
>>> +        |     |                      |     |
>>> +        |    ex1---.            .---ex2    |
>>> +        |    /     |            |     \    |
>>> +        '---l1     X            |     l2---'
>>> +                   |            |
>>> +                   |            |
>>> +                   '--->join<---'
>>> +
>>> +   This function transforms the program such that LOOP1 is conditionally
>>> +   falling through to LOOP2, or skipping it.  This is done by splitting
>>> +   the ex1->join edge at X in the diagram above, and inserting a condition
>>> +   whose one arm goes to pre2, resulting in this situation:
>>> +
>>> +               .------if (cond)------.
>>> +               v                     v
>>> +             pre1       .---------->pre2
>>> +              |         |            |
>>> +        .--->h1         |           h2<----.
>>> +        |     |         |            |     |
>>> +        |    ex1---.    |       .---ex2    |
>>> +        |    /     v    |       |     \    |
>>> +        '---l1   skip---'       |     l2---'
>>> +                   |            |
>>> +                   |            |
>>> +                   '--->join<---'
>>> +
>>> +
>>> +   The condition used is the exit condition of LOOP1, which effectively means
>>> +   that when the first loop exits (for whatever reason) but the real original
>>> +   exit expression is still false the second loop will be entered.
>>> +   The function returns the new edge cond->pre2.
>>> +
>>> +   This doesn't update the SSA form, see connect_loop_phis for that.  */
>>> +
>>> +static edge
>>> +connect_loops (struct loop *loop1, struct loop *loop2)
>>> +{
>>> +  edge exit = single_exit (loop1);
>>> +  basic_block skip_bb = split_edge (exit);
>>> +  gcond *skip_stmt;
>>> +  gimple_stmt_iterator gsi;
>>> +  edge new_e, skip_e;
>>> +
>>> +  gimple *stmt = last_stmt (exit->src);
>>> +  skip_stmt = gimple_build_cond (gimple_cond_code (stmt),
>>> +                                gimple_cond_lhs (stmt),
>>> +                                gimple_cond_rhs (stmt),
>>> +                                NULL_TREE, NULL_TREE);
>>> +  gsi = gsi_last_bb (skip_bb);
>>> +  gsi_insert_after (&gsi, skip_stmt, GSI_NEW_STMT);
>>> +
>>> +  skip_e = EDGE_SUCC (skip_bb, 0);
>>> +  skip_e->flags &= ~EDGE_FALLTHRU;
>>> +  new_e = make_edge (skip_bb, loop_preheader_edge (loop2)->src, 0);
>>> +  if (exit->flags & EDGE_TRUE_VALUE)
>>> +    {
>>> +      skip_e->flags |= EDGE_TRUE_VALUE;
>>> +      new_e->flags |= EDGE_FALSE_VALUE;
>>> +    }
>>> +  else
>>> +    {
>>> +      skip_e->flags |= EDGE_FALSE_VALUE;
>>> +      new_e->flags |= EDGE_TRUE_VALUE;
>>> +    }
>>> +
>>> +  new_e->count = skip_bb->count;
>>> +  new_e->probability = PROB_LIKELY;
>>> +  new_e->count = apply_probability (skip_e->count, PROB_LIKELY);
>>> +  skip_e->count -= new_e->count;
>>> +  skip_e->probability = inverse_probability (PROB_LIKELY);
>>> +
>>> +  return new_e;
>>> +}
>>> +
>>> +/* This returns the new bound for iterations given the original iteration
>>> +   space in NITER, an arbitrary new bound BORDER, assumed to be some
>>> +   comparison value with a different IV, the initial value GUARD_INIT of
>>> +   that other IV, and the comparison code GUARD_CODE that compares
>>> +   that other IV with BORDER.  We return an SSA name, and place any
>>> +   necessary statements for that computation into *STMTS.
>>> +
>>> +   For example for such a loop:
>>> +
>>> +     for (i = beg, j = guard_init; i < end; i++, j++)
>>> +       if (j < border)  // this is supposed to be true/false
>>> +         ...
>>> +
>>> +   we want to return a new bound (on j) that makes the loop iterate
>>> +   as long as the condition j < border stays true.  We also don't want
>>> +   to iterate more often than the original loop, so we have to introduce
>>> +   some cut-off as well (via min/max), effectively resulting in:
>>> +
>>> +     newend = min (end+guard_init-beg, border)
>>> +     for (i = beg, j = guard_init; j < newend; i++, j++)
>>> +       if (j < border)
>>> +         ...
>>> +
>>> +   Depending on the direction of the IVs and if the exit tests
>>> +   are strict or non-strict we need to use MIN or MAX,
>>> +   and add or subtract 1.  This routine computes newend above.  */
>>> +
>>> +static tree
>>> +compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter,
>>> +                        tree border,
>>> +                        enum tree_code guard_code, tree guard_init)
>>> +{
>>> +  /* The niter structure contains the after-increment IV, we need
>>> +     the loop-enter base, so subtract STEP once.  */
>>> +  tree controlbase = force_gimple_operand (niter->control.base,
>>> +                                          stmts, true, NULL_TREE);
>>> +  tree controlstep = niter->control.step;
>>> +  tree enddiff;
>>> +  if (POINTER_TYPE_P (TREE_TYPE (controlbase)))
>>> +    {
>>> +      controlstep = gimple_build (stmts, NEGATE_EXPR,
>>> +                                 TREE_TYPE (controlstep), controlstep);
>>> +      enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
>>> +                             TREE_TYPE (controlbase),
>>> +                             controlbase, controlstep);
>>> +    }
>>> +  else
>>> +    enddiff = gimple_build (stmts, MINUS_EXPR,
>>> +                           TREE_TYPE (controlbase),
>>> +                           controlbase, controlstep);
>>> +
>>> +  /* Compute beg-guard_init.  */
>>> +  if (POINTER_TYPE_P (TREE_TYPE (enddiff)))
>>> +    {
>>> +      tree tem = gimple_convert (stmts, sizetype, guard_init);
>>> +      tem = gimple_build (stmts, NEGATE_EXPR, sizetype, tem);
>>> +      enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
>>> +                             TREE_TYPE (enddiff),
>>> +                             enddiff, tem);
>>> +    }
>>> +  else
>>> +    enddiff = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
>>> +                           enddiff, guard_init);
>>> +
>>> +  /* Compute end-(beg-guard_init).  */
>>> +  gimple_seq stmts2;
>>> +  tree newbound = force_gimple_operand (niter->bound, &stmts2,
>>> +                                       true, NULL_TREE);
>>> +  gimple_seq_add_seq_without_update (stmts, stmts2);
>>> +
>>> +  if (POINTER_TYPE_P (TREE_TYPE (enddiff))
>>> +      || POINTER_TYPE_P (TREE_TYPE (newbound)))
>>> +    {
>>> +      enddiff = gimple_convert (stmts, sizetype, enddiff);
>>> +      enddiff = gimple_build (stmts, NEGATE_EXPR, sizetype, enddiff);
>>> +      newbound = gimple_build (stmts, POINTER_PLUS_EXPR,
>>> +                              TREE_TYPE (newbound),
>>> +                              newbound, enddiff);
>>> +    }
>>> +  else
>>> +    newbound = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
>>> +                            newbound, enddiff);
>>> +
>>> +  /* Depending on the direction of the IVs the new bound for the first
>>> +     loop is the minimum or maximum of old bound and border.
>>> +     Also, if the guard condition isn't strictly less or greater,
>>> +     we need to adjust the bound.  */
>>> +  int addbound = 0;
>>> +  enum tree_code minmax;
>>> +  if (niter->cmp == LT_EXPR)
>>> +    {
>>> +      /* GT and LE are the same, inverted.  */
>>> +      if (guard_code == GT_EXPR || guard_code == LE_EXPR)
>>> +       addbound = -1;
>>> +      minmax = MIN_EXPR;
>>> +    }
>>> +  else
>>> +    {
>>> +      gcc_assert (niter->cmp == GT_EXPR);
>>> +      if (guard_code == GE_EXPR || guard_code == LT_EXPR)
>>> +       addbound = 1;
>>> +      minmax = MAX_EXPR;
>>> +    }
>>> +
>>> +  if (addbound)
>>> +    {
>>> +      tree type2 = TREE_TYPE (newbound);
>>> +      if (POINTER_TYPE_P (type2))
>>> +       type2 = sizetype;
>>> +      newbound = gimple_build (stmts,
>>> +                              POINTER_TYPE_P (TREE_TYPE (newbound))
>>> +                              ? POINTER_PLUS_EXPR : PLUS_EXPR,
>>> +                              TREE_TYPE (newbound),
>>> +                              newbound,
>>> +                              build_int_cst (type2, addbound));
>>> +    }
>>> +
>>> +  tree newend = gimple_build (stmts, minmax, TREE_TYPE (border),
>>> +                             border, newbound);
>>> +  return newend;
>>> +}
>>> +
>>> +/* Checks if LOOP contains a conditional block whose condition
>>> +   depends on which side in the iteration space it is, and if so
>>> +   splits the iteration space into two loops.  Returns true if the
>>> +   loop was split.  NITER must contain the iteration descriptor for the
>>> +   single exit of LOOP.  */
>>> +
>>> +static bool
>>> +split_loop (struct loop *loop1, struct tree_niter_desc *niter)
>>> +{
>>> +  basic_block *bbs;
>>> +  unsigned i;
>>> +  bool changed = false;
>>> +  tree guard_iv;
>>> +  tree border;
>>> +  affine_iv iv;
>>> +
>>> +  bbs = get_loop_body (loop1);
>>> +
>>> +  /* Find a splitting opportunity.  */
>>> +  for (i = 0; i < loop1->num_nodes; i++)
>>> +    if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
>>> +      {
>>> +       /* Handling opposite steps is not implemented yet.  Neither
>>> +          is handling different step sizes.  */
>>> +       if ((tree_int_cst_sign_bit (iv.step)
>>> +            != tree_int_cst_sign_bit (niter->control.step))
>>> +           || !tree_int_cst_equal (iv.step, niter->control.step))
>>> +         continue;
>>> +
>>> +       /* Find a loop PHI node that defines guard_iv directly,
>>> +          or create one doing that.  */
>>> +       gphi *phi = find_or_create_guard_phi (loop1, guard_iv, &iv);
>>> +       if (!phi)
>>> +         continue;
>>> +       gcond *guard_stmt = as_a<gcond *> (last_stmt (bbs[i]));
>>> +       tree guard_init = PHI_ARG_DEF_FROM_EDGE (phi,
>>> +                                                loop_preheader_edge (loop1));
>>> +       enum tree_code guard_code = gimple_cond_code (guard_stmt);
>>> +
>>> +       /* Loop splitting is implemented by versioning the loop, placing
>>> +          the new loop after the old loop, make the first loop iterate
>>> +          as long as the conditional stays true (or false) and let the
>>> +          second (new) loop handle the rest of the iterations.
>>> +
>>> +          First we need to determine if the condition will start being true
>>> +          or false in the first loop.  */
>>> +       bool initial_true;
>>> +       switch (guard_code)
>>> +         {
>>> +           case LT_EXPR:
>>> +           case LE_EXPR:
>>> +             initial_true = !tree_int_cst_sign_bit (iv.step);
>>> +             break;
>>> +           case GT_EXPR:
>>> +           case GE_EXPR:
>>> +             initial_true = tree_int_cst_sign_bit (iv.step);
>>> +             break;
>>> +           default:
>>> +             gcc_unreachable ();
>>> +         }
>>> +
>>> +       /* Build a condition that will skip the first loop when the
>>> +          guard condition won't ever be true (or false).  */
>>> +       gimple_seq stmts2;
>>> +       border = force_gimple_operand (border, &stmts2, true, NULL_TREE);
>>> +       if (stmts2)
>>> +         gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
>>> +                                           stmts2);
>>> +       tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
>>> +       if (!initial_true)
>>> +         cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>> +
>>> +       /* Now version the loop, placing loop2 after loop1 connecting
>>> +          them, and fix up SSA form for that.  */
>>> +       initialize_original_copy_tables ();
>>> +       basic_block cond_bb;
>>> +       struct loop *loop2 = loop_version (loop1, cond, &cond_bb,
>>> +                                          REG_BR_PROB_BASE, REG_BR_PROB_BASE,
>>> +                                          REG_BR_PROB_BASE, true);
>>> +       gcc_assert (loop2);
>>> +       update_ssa (TODO_update_ssa);
>>> +
>>> +       edge new_e = connect_loops (loop1, loop2);
>>> +       connect_loop_phis (loop1, loop2, new_e);
>>> +
>>> +       /* The iterations of the second loop are now already
>>> +          exactly those that the first loop didn't do, but the
>>> +          iteration space of the first loop is still the original one.
>>> +          Compute the new bound for the guarding IV and patch the
>>> +          loop exit to use it instead of original IV and bound.  */
>>> +       gimple_seq stmts = NULL;
>>> +       tree newend = compute_new_first_bound (&stmts, niter, border,
>>> +                                              guard_code, guard_init);
>>> +       if (stmts)
>>> +         gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
>>> +                                           stmts);
>>> +       tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
>>> +       patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
>>> +
>>> +       /* Finally patch out the two copies of the condition to be always
>>> +          true/false (or opposite).  */
>>> +       gcond *force_true = as_a<gcond *> (last_stmt (bbs[i]));
>>> +       gcond *force_false = as_a<gcond *> (last_stmt (get_bb_copy (bbs[i])));
>>> +       if (!initial_true)
>>> +         std::swap (force_true, force_false);
>>> +       gimple_cond_make_true (force_true);
>>> +       gimple_cond_make_false (force_false);
>>> +       update_stmt (force_true);
>>> +       update_stmt (force_false);
>>> +
>>> +       free_original_copy_tables ();
>>> +
>>> +       /* We destroyed LCSSA form above.  Eventually we might be able
>>> +          to fix it on the fly, for now simply punt and use the helper.  */
>>> +       rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
>>> +
>>> +       changed = true;
>>> +       if (dump_file && (dump_flags & TDF_DETAILS))
>>> +         fprintf (dump_file, ";; Loop split.\n");
>>> +
>>> +       /* Only deal with the first opportunity.  */
>>> +       break;
>>> +      }
>>> +
>>> +  free (bbs);
>>> +  return changed;
>>> +}
>>> +
>>> +/* Main entry point.  Perform loop splitting on all suitable loops.  */
>>> +
>>> +static unsigned int
>>> +tree_ssa_split_loops (void)
>>> +{
>>> +  struct loop *loop;
>>> +  bool changed = false;
>>> +
>>> +  gcc_assert (scev_initialized_p ());
>>> +  FOR_EACH_LOOP (loop, 0)
>>> +    loop->aux = NULL;
>>> +
>>> +  /* Go through all loops starting from innermost.  */
>>> +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
>>> +    {
>>> +      struct tree_niter_desc niter;
>>> +      if (loop->aux)
>>> +       {
>>> +         /* If any of our inner loops was split, don't split us,
>>> +            and mark our containing loop as having had splits as well.  */
>>> +         loop_outer (loop)->aux = loop;
>>> +         continue;
>>> +       }
>>> +
>>> +      if (single_exit (loop)
>>> +         /* ??? We could handle non-empty latches when we split
>>> +            the latch edge (not the exit edge), and put the new
>>> +            exit condition in the new block.  OTOH this executes some
>>> +            code unconditionally that might have been skipped by the
>>> +            original exit before.  */
>>> +         && empty_block_p (loop->latch)
>>> +         && !optimize_loop_for_size_p (loop)
>>> +         && number_of_iterations_exit (loop, single_exit (loop), &niter,
>>> +                                       false, true)
>>> +         && niter.cmp != ERROR_MARK
>>> +         /* We can't yet handle loops controlled by a != predicate.  */
>>> +         && niter.cmp != NE_EXPR)
>>> +       {
>>> +         if (split_loop (loop, &niter))
>>> +           {
>>> +             /* Mark our containing loop as having had some split inner
>>> +                loops.  */
>>> +             loop_outer (loop)->aux = loop;
>>> +             changed = true;
>>> +           }
>>> +       }
>>> +    }
>>> +
>>> +  FOR_EACH_LOOP (loop, 0)
>>> +    loop->aux = NULL;
>>> +
>>> +  if (changed)
>>> +    return TODO_cleanup_cfg;
>>> +  return 0;
>>> +}
>>> +
>>> +/* Loop splitting pass.  */
>>> +
>>> +namespace {
>>> +
>>> +const pass_data pass_data_loop_split =
>>> +{
>>> +  GIMPLE_PASS, /* type */
>>> +  "lsplit", /* name */
>>> +  OPTGROUP_LOOP, /* optinfo_flags */
>>> +  TV_LOOP_SPLIT, /* tv_id */
>>> +  PROP_cfg, /* properties_required */
>>> +  0, /* properties_provided */
>>> +  0, /* properties_destroyed */
>>> +  0, /* todo_flags_start */
>>> +  0, /* todo_flags_finish */
>>> +};
>>> +
>>> +class pass_loop_split : public gimple_opt_pass
>>> +{
>>> +public:
>>> +  pass_loop_split (gcc::context *ctxt)
>>> +    : gimple_opt_pass (pass_data_loop_split, ctxt)
>>> +  {}
>>> +
>>> +  /* opt_pass methods: */
>>> +  virtual bool gate (function *) { return flag_split_loops != 0; }
>>> +  virtual unsigned int execute (function *);
>>> +
>>> +}; // class pass_loop_split
>>> +
>>> +unsigned int
>>> +pass_loop_split::execute (function *fun)
>>> +{
>>> +  if (number_of_loops (fun) <= 1)
>>> +    return 0;
>>> +
>>> +  return tree_ssa_split_loops ();
>>> +}
>>> +
>>> +} // anon namespace
>>> +
>>> +gimple_opt_pass *
>>> +make_pass_loop_split (gcc::context *ctxt)
>>> +{
>>> +  return new pass_loop_split (ctxt);
>>> +}
>>> Index: doc/invoke.texi
>>> ===================================================================
>>> --- doc/invoke.texi     (revision 231115)
>>> +++ doc/invoke.texi     (working copy)
>>> @@ -446,7 +446,7 @@ Objective-C and Objective-C++ Dialects}.
>>>  -fselective-scheduling -fselective-scheduling2 @gol
>>>  -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol
>>>  -fsemantic-interposition -fshrink-wrap -fsignaling-nans @gol
>>> --fsingle-precision-constant -fsplit-ivs-in-unroller @gol
>>> +-fsingle-precision-constant -fsplit-ivs-in-unroller -fsplit-loops @gol
>>>  -fsplit-paths @gol
>>>  -fsplit-wide-types -fssa-backprop -fssa-phiopt @gol
>>>  -fstack-protector -fstack-protector-all -fstack-protector-strong @gol
>>> @@ -10197,6 +10197,11 @@ Enabled with @option{-fprofile-use}.
>>>  Enables the loop invariant motion pass in the RTL loop optimizer.  Enabled
>>>  at level @option{-O1}
>>>
>>> +@item -fsplit-loops
>>> +@opindex fsplit-loops
>>> +Split a loop into two if it contains a condition that's always true
>>> +for one side of the iteration space and false for the other.
>>> +
>>>  @item -funswitch-loops
>>>  @opindex funswitch-loops
>>>  Move branches with loop invariant conditions out of the loop, with duplicates
>>> Index: doc/passes.texi
>>> ===================================================================
>>> --- doc/passes.texi     (revision 231115)
>>> +++ doc/passes.texi     (working copy)
>>> @@ -484,6 +484,12 @@ out of the loops.  To achieve this, a du
>>>  each possible outcome of conditional jump(s).  The pass is implemented in
>>>  @file{tree-ssa-loop-unswitch.c}.
>>>
>>> +Loop splitting.  If a loop contains a conditional statement that is
>>> +always true for one part of the iteration space and false for the other,
>>> +this pass splits the loop into two, one dealing with one side and the
>>> +other only with the other, thereby removing one inner-loop conditional.  The
>>> +pass is implemented in @file{tree-ssa-loop-split.c}.
>>> +
>>>  The optimizations also use various utility functions contained in
>>>  @file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and
>>>  @file{cfgloopmanip.c}.
>>> Index: testsuite/gcc.dg/loop-split.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/loop-split.c       (revision 0)
>>> +++ testsuite/gcc.dg/loop-split.c       (working copy)
>>> @@ -0,0 +1,147 @@
>>> +/* { dg-do run } */
>>> +/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
>>> +
>>> +#ifdef __cplusplus
>>> +extern "C" int printf (const char *, ...);
>>> +extern "C" void abort (void);
>>> +#else
>>> +extern int printf (const char *, ...);
>>> +extern void abort (void);
>>> +#endif
>>> +
>>> +/* Define TRACE to 1 or 2 to get detailed tracing.
>>> +   Define SINGLE_TEST to 1 or 2 to get a simple routine with
>>> +   just one loop, called only one time or with multiple parameters,
>>> +   to make debugging easier.  */
>>> +#ifndef TRACE
>>> +#define TRACE 0
>>> +#endif
>>> +
>>> +#define loop(beg,step,beg2,cond1,cond2) \
>>> +    do \
>>> +      { \
>>> +       sum = 0; \
>>> +        for (i = (beg), j = (beg2); (cond1); i+=(step),j+=(step)) \
>>> +          { \
>>> +            if (cond2) { \
>>> +             if (TRACE > 1) printf ("a: %d %d\n", i, j); \
>>> +              sum += a[i]; \
>>> +           } else { \
>>> +             if (TRACE > 1) printf ("b: %d %d\n", i, j); \
>>> +              sum += b[i]; \
>>> +           } \
>>> +          } \
>>> +       if (TRACE > 0) printf ("sum: %d\n", sum); \
>>> +       check = check * 47 + sum; \
>>> +      } while (0)
>>> +
>>> +#ifndef SINGLE_TEST
>>> +unsigned __attribute__((noinline, noclone)) dotest (int beg, int end, int step,
>>> +                                              int c, int *a, int *b, int beg2)
>>> +{
>>> +  unsigned check = 0;
>>> +  int sum;
>>> +  int i, j;
>>> +  loop (beg, 1, beg2, i < end, j < c);
>>> +  loop (beg, 1, beg2, i <= end, j < c);
>>> +  loop (beg, 1, beg2, i < end, j <= c);
>>> +  loop (beg, 1, beg2, i <= end, j <= c);
>>> +  loop (beg, 1, beg2, i < end, j > c);
>>> +  loop (beg, 1, beg2, i <= end, j > c);
>>> +  loop (beg, 1, beg2, i < end, j >= c);
>>> +  loop (beg, 1, beg2, i <= end, j >= c);
>>> +  beg2 += end-beg;
>>> +  loop (end, -1, beg2, i >= beg, j >= c);
>>> +  loop (end, -1, beg2, i >= beg, j > c);
>>> +  loop (end, -1, beg2, i > beg, j >= c);
>>> +  loop (end, -1, beg2, i > beg, j > c);
>>> +  loop (end, -1, beg2, i >= beg, j <= c);
>>> +  loop (end, -1, beg2, i >= beg, j < c);
>>> +  loop (end, -1, beg2, i > beg, j <= c);
>>> +  loop (end, -1, beg2, i > beg, j < c);
>>> +  return check;
>>> +}
>>> +
>>> +#else
>>> +
>>> +int __attribute__((noinline, noclone)) f (int beg, int end, int step,
>>> +                                         int c, int *a, int *b, int beg2)
>>> +{
>>> +  int sum = 0;
>>> +  int i, j;
>>> +  //for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
>>> +  for (i = end, j = beg2 + (end-beg); i > beg; i += -1, j-- /*step*/)
>>> +    {
>>> +      // i - j == X --> i = X + j
>>> +      // --> i < end == X+j < end == j < end - X
>>> +      // --> newend = end - (i_init - j_init)
>>> +      // j < end-X && j < c --> j < min(end-X,c)
>>> +      // j < end-X && j <= c --> j <= min(end-X-1,c) or j < min(end-X,c+1{OF!})
>>> +      //if (j < c)
>>> +      if (j >= c)
>>> +       printf ("a: %d %d\n", i, j);
>>> +      /*else
>>> +       printf ("b: %d %d\n", i, j);*/
>>> +       /*sum += a[i];
>>> +      else
>>> +       sum += b[i];*/
>>> +    }
>>> +  return sum;
>>> +}
>>> +
>>> +int __attribute__((noinline, noclone)) f2 (int *beg, int *end, int step,
>>> +                                         int *c, int *a, int *b, int *beg2)
>>> +{
>>> +  int sum = 0;
>>> +  int *i, *j;
>>> +  for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
>>> +    {
>>> +      if (j <= c)
>>> +       printf ("%d %d\n", i - beg, j - beg);
>>> +       /*sum += a[i];
>>> +      else
>>> +       sum += b[i];*/
>>> +    }
>>> +  return sum;
>>> +}
>>> +#endif
>>> +
>>> +extern int printf (const char *, ...);
>>> +
>>> +int main ()
>>> +{
>>> +  int a[] = {0,0,0,0,0, 1,2,3,4,5,6,7,8,9,          0,0,0,0,0};
>>> +  int b[] = {0,0,0,0,0, -1,-2,-3,-4,-5,-6,-7,-8,-9, 0,0,0,0,0,};
>>> +  int c;
>>> +  int diff = 0;
>>> +  unsigned check = 0;
>>> +#if defined(SINGLE_TEST) && (SINGLE_TEST == 1)
>>> +  //dotest (0, 9, 1, -1, a+5, b+5, -1);
>>> +  //return 0;
>>> +  f (0, 9, 1, 5, a+5, b+5, -1);
>>> +  return 0;
>>> +#endif
>>> +  for (diff = -5; diff <= 5; diff++)
>>> +    {
>>> +      for (c = -1; c <= 10; c++)
>>> +       {
>>> +#ifdef SINGLE_TEST
>>> +         int s = f (0, 9, 1, c, a+5, b+5, diff);
>>> +         //int s = f2 (a+0, a+9, 1, a+c, a+5, b+5, a+diff);
>>> +         printf ("%d ", s);
>>> +#else
>>> +         if (TRACE > 0)
>>> +           printf ("check %d %d\n", c, diff);
>>> +         check = check * 51 + dotest (0, 9, 1, c, a+5, b+5, diff);
>>> +#endif
>>> +       }
>>> +      //printf ("\n");
>>> +    }
>>> +  //printf ("%u\n", check);
>>> +  if (check != 3213344948)
>>> +    abort ();
>>> +  return 0;
>>> +}
>>> +
>>> +/* All 16 loops in dotest should be split.  */
>>> +/* { dg-final { scan-tree-dump-times "Loop split" 16 "lsplit" } } */
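
For readers following the test above, here is a minimal hand-written sketch of what splitting one instance of the `loop` macro amounts to at the source level.  It assumes the simplest case only: both IVs step by +1, and both the exit test and the guard use a strict `<`.  The names `sum_guarded`, `sum_split` and `check_all` are made up for illustration and are not part of the patch or the testsuite; the pass itself works on GIMPLE and also has to handle the other directions and the non-strict comparisons.

```c
#include <assert.h>

/* Original shape: the guard "j < c" is evaluated on every iteration.  */
static int sum_guarded(int beg, int end, int beg2, int c,
                       const int *a, const int *b)
{
    int sum = 0;
    for (int i = beg, j = beg2; i < end; i++, j++) {
        if (j < c)
            sum += a[i];
        else
            sum += b[i];
    }
    return sum;
}

/* Hand-split shape: the first loop covers the part of the iteration
   space where the guard is true, the second loop covers the rest.  */
static int sum_split(int beg, int end, int beg2, int c,
                     const int *a, const int *b)
{
    int sum = 0;
    int i = beg, j = beg2;

    /* Bound on j: stop when the guard would turn false, but never run
       past the original exit, i.e. min (end + beg2 - beg, c).  */
    int newend = end + (beg2 - beg);
    if (c < newend)
        newend = c;

    for (; j < newend; i++, j++)
        sum += a[i];
    for (; i < end; i++, j++)
        sum += b[i];
    return sum;
}

/* Compare both shapes over the same parameter grid that main() in the
   test uses; returns the number of cases compared.  */
int check_all(void)
{
    static const int a[] = {0,0,0,0,0, 1,2,3,4,5,6,7,8,9, 0,0,0,0,0};
    static const int b[] = {0,0,0,0,0, -1,-2,-3,-4,-5,-6,-7,-8,-9, 0,0,0,0,0};
    int cases = 0;

    for (int diff = -5; diff <= 5; diff++)
        for (int c = -1; c <= 10; c++) {
            assert(sum_guarded(0, 9, diff, c, a + 5, b + 5)
                   == sum_split(0, 9, diff, c, a + 5, b + 5));
            cases++;
        }
    return cases;
}
```

Calling `check_all ()` from a small `main` is enough to exercise it.  The clamp on `newend` is what covers the border cases where `c` lies outside the original iteration space: either the first or the second loop then simply runs zero times.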

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Gimple loop splitting v2
  2016-07-27  6:18                 ` Andrew Pinski
@ 2016-07-27  8:11                   ` Richard Biener
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Biener @ 2016-07-27  8:11 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Michael Matz, Jeff Law, GCC Patches

On Wed, Jul 27, 2016 at 8:17 AM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Tue, Jul 26, 2016 at 4:32 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Jul 25, 2016 at 10:57 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>> On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz <matz@suse.de> wrote:
>>>> Hi,
>>>>
>>>> On Tue, 1 Dec 2015, Jeff Law wrote:
>>>>
>>>>> > So, okay for trunk?
>>>>> -ENOPATCH
>>>>
>>>> Sigh :)
>>>> Here it is.
>>>
>>>
>>> I found one problem with it.
>>> Take:
>>> void f(int *a, int M, int *b)
>>> {
>>>   for(int i = 0; i <= M; i++)
>>>     {
>>>        if (i < M)
>>>         a[i] = i;
>>>     }
>>> }
>>> ---- CUT ---
>>> There are two issues with the code as below.  The outermost loop's
>>> aux is still set, which causes the vectorizer not to vectorize the loop.
>>> The other issue is that I need to run pass_scev_cprop after
>>> pass_loop_split to eliminate the uses of the induction variable after
>>> the loop, so that the vectorizer will work.
>>
>> I think scev_cprop needs to be rewritten as a utility so that the vectorizer
>> itself can (within its own cost-model) eliminate an induction using it.
>>
>> Richard.
>>
>>> Something like (note this is copy and paste from a terminal):
>>> diff --git a/gcc/passes.def b/gcc/passes.def
>>> index c327900..e8d6ea6 100644
>>> --- a/gcc/passes.def
>>> +++ b/gcc/passes.def
>>> @@ -262,8 +262,8 @@ along with GCC; see the file COPYING3.  If not see
>>>           NEXT_PASS (pass_copy_prop);
>>>           NEXT_PASS (pass_dce);
>>>           NEXT_PASS (pass_tree_unswitch);
>>> -         NEXT_PASS (pass_scev_cprop);
>>>           NEXT_PASS (pass_loop_split);
>>> +         NEXT_PASS (pass_scev_cprop);
>>>           NEXT_PASS (pass_record_bounds);
>>>           NEXT_PASS (pass_loop_distribution);
>>>           NEXT_PASS (pass_copy_prop);
>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>>> index 5411530..e72ef19 100644
>>> --- a/gcc/tree-ssa-loop-split.c
>>> +++ b/gcc/tree-ssa-loop-split.c
>>> @@ -592,7 +592,11 @@ tree_ssa_split_loops (void)
>>>
>>>    gcc_assert (scev_initialized_p ());
>>>    FOR_EACH_LOOP (loop, 0)
>>> -    loop->aux = NULL;
>>> +    {
>>> +      loop->aux = NULL;
>>> +      if (loop_outer (loop))
>>> +       loop_outer (loop)->aux = NULL;
>>> +    }
>>
>> How does the iterator not visit loop_outer (loop)?!
>
> The iterator with flags of 0 does not visit the root.  So the way
> to fix this is to replace 0 (which is the flags) with LI_INCLUDE_ROOT so
> we zero out the root too.

Or not set ->aux on the root in the first place.
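
To make the two options concrete, here is a toy model of the marking scheme.  It is only a sketch and not GCC code: `struct toy_loop`, `clear_aux_skipping_root`, `mark_outer` and `root_aux_is_stale` are hypothetical names, and the only behaviour taken from this thread is that the clearing walk does not visit the root of the loop tree.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a loop tree node; the root has outer == NULL.  */
struct toy_loop {
    struct toy_loop *outer;
    void *aux;
};

/* Clear aux on every loop except the root, modelling a walk over the
   loop tree that skips the root.  */
static void clear_aux_skipping_root(struct toy_loop *loops, size_t n)
{
    for (size_t k = 0; k < n; k++)
        if (loops[k].outer != NULL)
            loops[k].aux = NULL;
}

/* Record in the containing loop that LOOP was split.  With SKIP_ROOT
   set, the mark is never placed on the root in the first place.  */
static void mark_outer(struct toy_loop *loop, int skip_root)
{
    struct toy_loop *outer = loop->outer;
    if (outer == NULL)
        return;
    if (skip_root && outer->outer == NULL)
        return;
    outer->aux = loop;
}

/* Build root -> one top-level loop, "split" the top-level loop, run the
   final clearing walk, and report whether the root still has aux set.  */
int root_aux_is_stale(int skip_root)
{
    struct toy_loop loops[2] = { { NULL, NULL }, { NULL, NULL } };
    loops[1].outer = &loops[0];          /* loops[0] is the root */

    clear_aux_skipping_root(loops, 2);   /* initial clearing walk */
    mark_outer(&loops[1], skip_root);    /* top-level loop was split */
    clear_aux_skipping_root(loops, 2);   /* final clearing walk */

    assert(loops[1].aux == NULL);        /* non-root loops are always clean */
    return loops[0].aux != NULL;
}
```

In this model `root_aux_is_stale (0)` reports a leftover pointer on the root, while `root_aux_is_stale (1)` does not, without having to extend the clearing walk to the root.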

Richard.

> Thanks,
> Andrew
>
>>
>>>
>>>    /* Go through all loops starting from innermost.  */
>>>    FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
>>> @@ -631,7 +635,11 @@ tree_ssa_split_loops (void)
>>>      }
>>>
>>>    FOR_EACH_LOOP (loop, 0)
>>> -    loop->aux = NULL;
>>> +    {
>>> +      loop->aux = NULL;
>>> +      if (loop_outer (loop))
>>> +       loop_outer (loop)->aux = NULL;
>>> +    }
>>>
>>>    if (changed)
>>>      return TODO_cleanup_cfg;
>>> -----  CUT -----
>>>
>>> Thanks,
>>> Andrew
>>>
>>>
>>>>
>>>>
>>>> Ciao,
>>>> Michael.
>>>>         * common.opt (-fsplit-loops): New flag.
>>>>         * passes.def (pass_loop_split): Add.
>>>>         * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
>>>>         (enable_fdo_optimizations): Add loop splitting.
>>>>         * timevar.def (TV_LOOP_SPLIT): Add.
>>>>         * tree-pass.h (make_pass_loop_split): Declare.
>>>>         * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
>>>>         * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
>>>>         * tree-ssa-loop-split.c: New file.
>>>>         * Makefile.in (OBJS): Add tree-ssa-loop-split.o.
>>>>         * doc/invoke.texi (fsplit-loops): Document.
>>>>         * doc/passes.texi (Loop optimization): Add paragraph about loop
>>>>         splitting.
>>>>
>>>> testsuite/
>>>>         * gcc.dg/loop-split.c: New test.
>>>>
>>>> Index: common.opt
>>>> ===================================================================
>>>> --- common.opt  (revision 231115)
>>>> +++ common.opt  (working copy)
>>>> @@ -2453,6 +2457,10 @@ funswitch-loops
>>>>  Common Report Var(flag_unswitch_loops) Optimization
>>>>  Perform loop unswitching.
>>>>
>>>> +fsplit-loops
>>>> +Common Report Var(flag_split_loops) Optimization
>>>> +Perform loop splitting.
>>>> +
>>>>  funwind-tables
>>>>  Common Report Var(flag_unwind_tables) Optimization
>>>>  Just generate unwind tables for exception handling.
>>>> Index: passes.def
>>>> ===================================================================
>>>> --- passes.def  (revision 231115)
>>>> +++ passes.def  (working copy)
>>>> @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3.
>>>>           NEXT_PASS (pass_dce);
>>>>           NEXT_PASS (pass_tree_unswitch);
>>>>           NEXT_PASS (pass_scev_cprop);
>>>> +         NEXT_PASS (pass_loop_split);
>>>>           NEXT_PASS (pass_record_bounds);
>>>>           NEXT_PASS (pass_loop_distribution);
>>>>           NEXT_PASS (pass_copy_prop);
>>>> Index: opts.c
>>>> ===================================================================
>>>> --- opts.c      (revision 231115)
>>>> +++ opts.c      (working copy)
>>>> @@ -532,6 +532,7 @@ static const struct default_options defa
>>>>         regardless of them being declared inline.  */
>>>>      { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },
>>>>      { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 },
>>>> +    { OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
>>>>      { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
>>>>      { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
>>>>      { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
>>>> @@ -1411,6 +1412,8 @@ enable_fdo_optimizations (struct gcc_opt
>>>>      opts->x_flag_ipa_cp_alignment = value;
>>>>    if (!opts_set->x_flag_predictive_commoning)
>>>>      opts->x_flag_predictive_commoning = value;
>>>> +  if (!opts_set->x_flag_split_loops)
>>>> +    opts->x_flag_split_loops = value;
>>>>    if (!opts_set->x_flag_unswitch_loops)
>>>>      opts->x_flag_unswitch_loops = value;
>>>>    if (!opts_set->x_flag_gcse_after_reload)
>>>> Index: timevar.def
>>>> ===================================================================
>>>> --- timevar.def (revision 231115)
>>>> +++ timevar.def (working copy)
>>>> @@ -182,6 +182,7 @@ DEFTIMEVAR (TV_LIM                   , "
>>>>  DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
>>>>  DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
>>>>  DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
>>>> +DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
>>>>  DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
>>>>  DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
>>>>  DEFTIMEVAR (TV_TREE_VECTORIZATION    , "tree vectorization")
>>>> Index: tree-pass.h
>>>> ===================================================================
>>>> --- tree-pass.h (revision 231115)
>>>> +++ tree-pass.h (working copy)
>>>> @@ -370,6 +370,7 @@ extern gimple_opt_pass *make_pass_tree_n
>>>>  extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
>>>>  extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
>>>>  extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
>>>> +extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
>>>>  extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
>>>>  extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
>>>>  extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
>>>> Index: tree-ssa-loop-manip.h
>>>> ===================================================================
>>>> --- tree-ssa-loop-manip.h       (revision 231115)
>>>> +++ tree-ssa-loop-manip.h       (working copy)
>>>> @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
>>>>
>>>>  extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
>>>>                        bool, tree *, tree *);
>>>> +extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
>>>> +                                           struct loop *);
>>>>  extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
>>>>  extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
>>>>  extern void verify_loop_closed_ssa (bool);
>>>> Index: Makefile.in
>>>> ===================================================================
>>>> --- Makefile.in (revision 231115)
>>>> +++ Makefile.in (working copy)
>>>> @@ -1474,6 +1474,7 @@ OBJS = \
>>>>         tree-ssa-loop-manip.o \
>>>>         tree-ssa-loop-niter.o \
>>>>         tree-ssa-loop-prefetch.o \
>>>> +       tree-ssa-loop-split.o \
>>>>         tree-ssa-loop-unswitch.o \
>>>>         tree-ssa-loop.o \
>>>>         tree-ssa-math-opts.o \
>>>> Index: tree-ssa-loop-split.c
>>>> ===================================================================
>>>> --- tree-ssa-loop-split.c       (revision 0)
>>>> +++ tree-ssa-loop-split.c       (working copy)
>>>> @@ -0,0 +1,686 @@
>>>> +/* Loop splitting.
>>>> +   Copyright (C) 2015 Free Software Foundation, Inc.
>>>> +
>>>> +This file is part of GCC.
>>>> +
>>>> +GCC is free software; you can redistribute it and/or modify it
>>>> +under the terms of the GNU General Public License as published by the
>>>> +Free Software Foundation; either version 3, or (at your option) any
>>>> +later version.
>>>> +
>>>> +GCC is distributed in the hope that it will be useful, but WITHOUT
>>>> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>>> +for more details.
>>>> +
>>>> +You should have received a copy of the GNU General Public License
>>>> +along with GCC; see the file COPYING3.  If not see
>>>> +<http://www.gnu.org/licenses/>.  */
>>>> +
>>>> +#include "config.h"
>>>> +#include "system.h"
>>>> +#include "coretypes.h"
>>>> +#include "backend.h"
>>>> +#include "tree.h"
>>>> +#include "gimple.h"
>>>> +#include "tree-pass.h"
>>>> +#include "ssa.h"
>>>> +#include "fold-const.h"
>>>> +#include "tree-cfg.h"
>>>> +#include "tree-ssa.h"
>>>> +#include "tree-ssa-loop-niter.h"
>>>> +#include "tree-ssa-loop.h"
>>>> +#include "tree-ssa-loop-manip.h"
>>>> +#include "tree-into-ssa.h"
>>>> +#include "cfgloop.h"
>>>> +#include "tree-scalar-evolution.h"
>>>> +#include "gimple-iterator.h"
>>>> +#include "gimple-pretty-print.h"
>>>> +#include "cfghooks.h"
>>>> +#include "gimple-fold.h"
>>>> +#include "gimplify-me.h"
>>>> +
>>>> +/* This file implements loop splitting, i.e. transformation of loops like
>>>> +
>>>> +   for (i = 0; i < 100; i++)
>>>> +     {
>>>> +       if (i < 50)
>>>> +         A;
>>>> +       else
>>>> +         B;
>>>> +     }
>>>> +
>>>> +   into:
>>>> +
>>>> +   for (i = 0; i < 50; i++)
>>>> +     {
>>>> +       A;
>>>> +     }
>>>> +   for (; i < 100; i++)
>>>> +     {
>>>> +       B;
>>>> +     }
>>>> +
>>>> +   */
>>>> +
>>>> +/* Return true when BB inside LOOP is a potential iteration space
>>>> +   split point, i.e. ends with a condition like "IV < comp", which
>>>> +   is true on one side of the iteration space and false on the other,
>>>> +   and the split point can be computed.  If so, also return the border
>>>> +   point in *BORDER and the comparison induction variable in IV.  */
>>>> +
>>>> +static tree
>>>> +split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv)
>>>> +{
>>>> +  gimple *last;
>>>> +  gcond *stmt;
>>>> +  affine_iv iv2;
>>>> +
>>>> +  /* BB must end in a simple conditional jump.  */
>>>> +  last = last_stmt (bb);
>>>> +  if (!last || gimple_code (last) != GIMPLE_COND)
>>>> +    return NULL_TREE;
>>>> +  stmt = as_a <gcond *> (last);
>>>> +
>>>> +  enum tree_code code = gimple_cond_code (stmt);
>>>> +
>>>> +  /* Only handle relational comparisons, for equality and non-equality
>>>> +     we'd have to split the loop into two loops and a middle statement.  */
>>>> +  switch (code)
>>>> +    {
>>>> +      case LT_EXPR:
>>>> +      case LE_EXPR:
>>>> +      case GT_EXPR:
>>>> +      case GE_EXPR:
>>>> +       break;
>>>> +      default:
>>>> +       return NULL_TREE;
>>>> +    }
>>>> +
>>>> +  if (loop_exits_from_bb_p (loop, bb))
>>>> +    return NULL_TREE;
>>>> +
>>>> +  tree op0 = gimple_cond_lhs (stmt);
>>>> +  tree op1 = gimple_cond_rhs (stmt);
>>>> +
>>>> +  if (!simple_iv (loop, loop, op0, iv, false))
>>>> +    return NULL_TREE;
>>>> +  if (!simple_iv (loop, loop, op1, &iv2, false))
>>>> +    return NULL_TREE;
>>>> +
>>>> +  /* Make it so that the first argument of the condition is
>>>> +     the looping one.  */
>>>> +  if (!integer_zerop (iv2.step))
>>>> +    {
>>>> +      std::swap (op0, op1);
>>>> +      std::swap (*iv, iv2);
>>>> +      code = swap_tree_comparison (code);
>>>> +      gimple_cond_set_condition (stmt, code, op0, op1);
>>>> +      update_stmt (stmt);
>>>> +    }
>>>> +  else if (integer_zerop (iv->step))
>>>> +    return NULL_TREE;
>>>> +  if (!integer_zerop (iv2.step))
>>>> +    return NULL_TREE;
>>>> +
>>>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>>>> +    {
>>>> +      fprintf (dump_file, "Found potential split point: ");
>>>> +      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>>>> +      fprintf (dump_file, " { ");
>>>> +      print_generic_expr (dump_file, iv->base, TDF_SLIM);
>>>> +      fprintf (dump_file, " + I*");
>>>> +      print_generic_expr (dump_file, iv->step, TDF_SLIM);
>>>> +      fprintf (dump_file, " } %s ", get_tree_code_name (code));
>>>> +      print_generic_expr (dump_file, iv2.base, TDF_SLIM);
>>>> +      fprintf (dump_file, "\n");
>>>> +    }
>>>> +
>>>> +  *border = iv2.base;
>>>> +  return op0;
>>>> +}
>>>> +
>>>> +/* Given a GUARD conditional stmt inside LOOP, which we want to make always
>>>> +   true or false depending on INITIAL_TRUE, and adjusted values NEXTVAL
>>>> +   (a post-increment IV) and NEWBOUND (the comparator) adjust the loop
>>>> +   exit test statement to loop back only if the GUARD statement will
>>>> +   also be true/false in the next iteration.  */
>>>> +
>>>> +static void
>>>> +patch_loop_exit (struct loop *loop, gcond *guard, tree nextval, tree newbound,
>>>> +                bool initial_true)
>>>> +{
>>>> +  edge exit = single_exit (loop);
>>>> +  gcond *stmt = as_a <gcond *> (last_stmt (exit->src));
>>>> +  gimple_cond_set_condition (stmt, gimple_cond_code (guard),
>>>> +                            nextval, newbound);
>>>> +  update_stmt (stmt);
>>>> +
>>>> +  edge stay = single_pred_edge (loop->latch);
>>>> +
>>>> +  exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
>>>> +  stay->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
>>>> +
>>>> +  if (initial_true)
>>>> +    {
>>>> +      exit->flags |= EDGE_FALSE_VALUE;
>>>> +      stay->flags |= EDGE_TRUE_VALUE;
>>>> +    }
>>>> +  else
>>>> +    {
>>>> +      exit->flags |= EDGE_TRUE_VALUE;
>>>> +      stay->flags |= EDGE_FALSE_VALUE;
>>>> +    }
>>>> +}
>>>> +
>>>> +/* Given an induction variable GUARD_IV, and its affine descriptor IV,
>>>> +   find the loop phi node in LOOP defining it directly, or create
>>>> +   such phi node.  Return that phi node.  */
>>>> +
>>>> +static gphi *
>>>> +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/)
>>>> +{
>>>> +  gimple *def = SSA_NAME_DEF_STMT (guard_iv);
>>>> +  gphi *phi;
>>>> +  if ((phi = dyn_cast <gphi *> (def))
>>>> +      && gimple_bb (phi) == loop->header)
>>>> +    return phi;
>>>> +
>>>> +  /* XXX Create the PHI instead.  */
>>>> +  return NULL;
>>>> +}
>>>> +
>>>> +/* This function updates the SSA form after connect_loops made a new
>>>> +   edge NEW_E leading from LOOP1 exit to LOOP2 (via an intermediate
>>>> +   conditional).  I.e. the second loop can now be entered either
>>>> +   via the original entry or via NEW_E, so the entry values of LOOP2
>>>> +   phi nodes are either the original ones or those at the exit
>>>> +   of LOOP1.  Insert new phi nodes in LOOP2 pre-header reflecting
>>>> +   this.  */
>>>> +
>>>> +static void
>>>> +connect_loop_phis (struct loop *loop1, struct loop *loop2, edge new_e)
>>>> +{
>>>> +  basic_block rest = loop_preheader_edge (loop2)->src;
>>>> +  gcc_assert (new_e->dest == rest);
>>>> +  edge skip_first = EDGE_PRED (rest, EDGE_PRED (rest, 0) == new_e);
>>>> +
>>>> +  edge firste = loop_preheader_edge (loop1);
>>>> +  edge seconde = loop_preheader_edge (loop2);
>>>> +  edge firstn = loop_latch_edge (loop1);
>>>> +  gphi_iterator psi_first, psi_second;
>>>> +  for (psi_first = gsi_start_phis (loop1->header),
>>>> +       psi_second = gsi_start_phis (loop2->header);
>>>> +       !gsi_end_p (psi_first);
>>>> +       gsi_next (&psi_first), gsi_next (&psi_second))
>>>> +    {
>>>> +      tree init, next, new_init;
>>>> +      use_operand_p op;
>>>> +      gphi *phi_first = psi_first.phi ();
>>>> +      gphi *phi_second = psi_second.phi ();
>>>> +
>>>> +      init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste);
>>>> +      next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn);
>>>> +      op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde);
>>>> +      gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
>>>> +
>>>> +      /* Prefer using original variable as a base for the new ssa name.
>>>> +        This is necessary for virtual ops, and useful in order to avoid
>>>> +        losing debug info for real ops.  */
>>>> +      if (TREE_CODE (next) == SSA_NAME
>>>> +         && useless_type_conversion_p (TREE_TYPE (next),
>>>> +                                       TREE_TYPE (init)))
>>>> +       new_init = copy_ssa_name (next);
>>>> +      else if (TREE_CODE (init) == SSA_NAME
>>>> +              && useless_type_conversion_p (TREE_TYPE (init),
>>>> +                                            TREE_TYPE (next)))
>>>> +       new_init = copy_ssa_name (init);
>>>> +      else if (useless_type_conversion_p (TREE_TYPE (next),
>>>> +                                         TREE_TYPE (init)))
>>>> +       new_init = make_temp_ssa_name (TREE_TYPE (next), NULL,
>>>> +                                      "unrinittmp");
>>>> +      else
>>>> +       new_init = make_temp_ssa_name (TREE_TYPE (init), NULL,
>>>> +                                      "unrinittmp");
>>>> +
>>>> +      gphi * newphi = create_phi_node (new_init, rest);
>>>> +      add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION);
>>>> +      add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION);
>>>> +      SET_USE (op, new_init);
>>>> +    }
>>>> +}
>>>> +
>>>> +/* The two loops LOOP1 and LOOP2 were just created by loop versioning,
>>>> +   they are still equivalent and placed in two arms of a diamond, like so:
>>>> +
>>>> +               .------if (cond)------.
>>>> +               v                     v
>>>> +             pre1                   pre2
>>>> +              |                      |
>>>> +        .--->h1                     h2<----.
>>>> +        |     |                      |     |
>>>> +        |    ex1---.            .---ex2    |
>>>> +        |    /     |            |     \    |
>>>> +        '---l1     X            |     l2---'
>>>> +                   |            |
>>>> +                   |            |
>>>> +                   '--->join<---'
>>>> +
>>>> +   This function transforms the program such that LOOP1 is conditionally
>>>> +   falling through to LOOP2, or skipping it.  This is done by splitting
>>>> +   the ex1->join edge at X in the diagram above, and inserting a condition
>>>> +   whose one arm goes to pre2, resulting in this situation:
>>>> +
>>>> +               .------if (cond)------.
>>>> +               v                     v
>>>> +             pre1       .---------->pre2
>>>> +              |         |            |
>>>> +        .--->h1         |           h2<----.
>>>> +        |     |         |            |     |
>>>> +        |    ex1---.    |       .---ex2    |
>>>> +        |    /     v    |       |     \    |
>>>> +        '---l1   skip---'       |     l2---'
>>>> +                   |            |
>>>> +                   |            |
>>>> +                   '--->join<---'
>>>> +
>>>> +
>>>> +   The condition used is the exit condition of LOOP1, which effectively means
>>>> +   that when the first loop exits (for whatever reason) but the real original
>>>> +   exit expression is still false the second loop will be entered.
>>>> +   The function returns the new edge cond->pre2.
>>>> +
>>>> +   This doesn't update the SSA form, see connect_loop_phis for that.  */
>>>> +
>>>> +static edge
>>>> +connect_loops (struct loop *loop1, struct loop *loop2)
>>>> +{
>>>> +  edge exit = single_exit (loop1);
>>>> +  basic_block skip_bb = split_edge (exit);
>>>> +  gcond *skip_stmt;
>>>> +  gimple_stmt_iterator gsi;
>>>> +  edge new_e, skip_e;
>>>> +
>>>> +  gimple *stmt = last_stmt (exit->src);
>>>> +  skip_stmt = gimple_build_cond (gimple_cond_code (stmt),
>>>> +                                gimple_cond_lhs (stmt),
>>>> +                                gimple_cond_rhs (stmt),
>>>> +                                NULL_TREE, NULL_TREE);
>>>> +  gsi = gsi_last_bb (skip_bb);
>>>> +  gsi_insert_after (&gsi, skip_stmt, GSI_NEW_STMT);
>>>> +
>>>> +  skip_e = EDGE_SUCC (skip_bb, 0);
>>>> +  skip_e->flags &= ~EDGE_FALLTHRU;
>>>> +  new_e = make_edge (skip_bb, loop_preheader_edge (loop2)->src, 0);
>>>> +  if (exit->flags & EDGE_TRUE_VALUE)
>>>> +    {
>>>> +      skip_e->flags |= EDGE_TRUE_VALUE;
>>>> +      new_e->flags |= EDGE_FALSE_VALUE;
>>>> +    }
>>>> +  else
>>>> +    {
>>>> +      skip_e->flags |= EDGE_FALSE_VALUE;
>>>> +      new_e->flags |= EDGE_TRUE_VALUE;
>>>> +    }
>>>> +
>>>> +  new_e->count = skip_bb->count;
>>>> +  new_e->probability = PROB_LIKELY;
>>>> +  new_e->count = apply_probability (skip_e->count, PROB_LIKELY);
>>>> +  skip_e->count -= new_e->count;
>>>> +  skip_e->probability = inverse_probability (PROB_LIKELY);
>>>> +
>>>> +  return new_e;
>>>> +}
>>>> +
>>>> +/* This returns the new bound for iterations given the original iteration
>>>> +   space in NITER, an arbitrary new bound BORDER, assumed to be some
>>>> +   comparison value with a different IV, the initial value GUARD_INIT of
>>>> +   that other IV, and the comparison code GUARD_CODE that compares
>>>> +   that other IV with BORDER.  We return an SSA name, and place any
>>>> +   necessary statements for that computation into *STMTS.
>>>> +
>>>> +   For example for such a loop:
>>>> +
>>>> +     for (i = beg, j = guard_init; i < end; i++, j++)
>>>> +       if (j < border)  // this is supposed to be true/false
>>>> +         ...
>>>> +
>>>> +   we want to return a new bound (on j) that makes the loop iterate
>>>> +   as long as the condition j < border stays true.  We also don't want
>>>> +   to iterate more often than the original loop, so we have to introduce
>>>> +   some cut-off as well (via min/max), effectively resulting in:
>>>> +
>>>> +     newend = min (end+guard_init-beg, border)
>>>> +     for (i = beg, j = guard_init; j < newend; i++, j++)
>>>> +       if (j < border)
>>>> +         ...
>>>> +
>>>> +   Depending on the direction of the IVs and if the exit tests
>>>> +   are strict or non-strict we need to use MIN or MAX,
>>>> +   and add or subtract 1.  This routine computes newend above.  */
>>>> +
>>>> +static tree
>>>> +compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter,
>>>> +                        tree border,
>>>> +                        enum tree_code guard_code, tree guard_init)
>>>> +{
>>>> +  /* The niter structure contains the after-increment IV, we need
>>>> +     the loop-enter base, so subtract STEP once.  */
>>>> +  tree controlbase = force_gimple_operand (niter->control.base,
>>>> +                                          stmts, true, NULL_TREE);
>>>> +  tree controlstep = niter->control.step;
>>>> +  tree enddiff;
>>>> +  if (POINTER_TYPE_P (TREE_TYPE (controlbase)))
>>>> +    {
>>>> +      controlstep = gimple_build (stmts, NEGATE_EXPR,
>>>> +                                 TREE_TYPE (controlstep), controlstep);
>>>> +      enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
>>>> +                             TREE_TYPE (controlbase),
>>>> +                             controlbase, controlstep);
>>>> +    }
>>>> +  else
>>>> +    enddiff = gimple_build (stmts, MINUS_EXPR,
>>>> +                           TREE_TYPE (controlbase),
>>>> +                           controlbase, controlstep);
>>>> +
>>>> +  /* Compute beg-guard_init.  */
>>>> +  if (POINTER_TYPE_P (TREE_TYPE (enddiff)))
>>>> +    {
>>>> +      tree tem = gimple_convert (stmts, sizetype, guard_init);
>>>> +      tem = gimple_build (stmts, NEGATE_EXPR, sizetype, tem);
>>>> +      enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
>>>> +                             TREE_TYPE (enddiff),
>>>> +                             enddiff, tem);
>>>> +    }
>>>> +  else
>>>> +    enddiff = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
>>>> +                           enddiff, guard_init);
>>>> +
>>>> +  /* Compute end-(beg-guard_init).  */
>>>> +  gimple_seq stmts2;
>>>> +  tree newbound = force_gimple_operand (niter->bound, &stmts2,
>>>> +                                       true, NULL_TREE);
>>>> +  gimple_seq_add_seq_without_update (stmts, stmts2);
>>>> +
>>>> +  if (POINTER_TYPE_P (TREE_TYPE (enddiff))
>>>> +      || POINTER_TYPE_P (TREE_TYPE (newbound)))
>>>> +    {
>>>> +      enddiff = gimple_convert (stmts, sizetype, enddiff);
>>>> +      enddiff = gimple_build (stmts, NEGATE_EXPR, sizetype, enddiff);
>>>> +      newbound = gimple_build (stmts, POINTER_PLUS_EXPR,
>>>> +                              TREE_TYPE (newbound),
>>>> +                              newbound, enddiff);
>>>> +    }
>>>> +  else
>>>> +    newbound = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
>>>> +                            newbound, enddiff);
>>>> +
>>>> +  /* Depending on the direction of the IVs the new bound for the first
>>>> +     loop is the minimum or maximum of old bound and border.
>>>> +     Also, if the guard condition isn't strictly less or greater,
>>>> +     we need to adjust the bound.  */
>>>> +  int addbound = 0;
>>>> +  enum tree_code minmax;
>>>> +  if (niter->cmp == LT_EXPR)
>>>> +    {
>>>> +      /* GT and LE are the same, inverted.  */
>>>> +      if (guard_code == GT_EXPR || guard_code == LE_EXPR)
>>>> +       addbound = -1;
>>>> +      minmax = MIN_EXPR;
>>>> +    }
>>>> +  else
>>>> +    {
>>>> +      gcc_assert (niter->cmp == GT_EXPR);
>>>> +      if (guard_code == GE_EXPR || guard_code == LT_EXPR)
>>>> +       addbound = 1;
>>>> +      minmax = MAX_EXPR;
>>>> +    }
>>>> +
>>>> +  if (addbound)
>>>> +    {
>>>> +      tree type2 = TREE_TYPE (newbound);
>>>> +      if (POINTER_TYPE_P (type2))
>>>> +       type2 = sizetype;
>>>> +      newbound = gimple_build (stmts,
>>>> +                              POINTER_TYPE_P (TREE_TYPE (newbound))
>>>> +                              ? POINTER_PLUS_EXPR : PLUS_EXPR,
>>>> +                              TREE_TYPE (newbound),
>>>> +                              newbound,
>>>> +                              build_int_cst (type2, addbound));
>>>> +    }
>>>> +
>>>> +  tree newend = gimple_build (stmts, minmax, TREE_TYPE (border),
>>>> +                             border, newbound);
>>>> +  return newend;
>>>> +}
>>>> +
>>>> +/* Checks if LOOP contains a conditional block whose condition
>>>> +   depends on which side in the iteration space it is, and if so
>>>> +   splits the iteration space into two loops.  Returns true if the
>>>> +   loop was split.  NITER must contain the iteration descriptor for the
>>>> +   single exit of LOOP.  */
>>>> +
>>>> +static bool
>>>> +split_loop (struct loop *loop1, struct tree_niter_desc *niter)
>>>> +{
>>>> +  basic_block *bbs;
>>>> +  unsigned i;
>>>> +  bool changed = false;
>>>> +  tree guard_iv;
>>>> +  tree border;
>>>> +  affine_iv iv;
>>>> +
>>>> +  bbs = get_loop_body (loop1);
>>>> +
>>>> +  /* Find a splitting opportunity.  */
>>>> +  for (i = 0; i < loop1->num_nodes; i++)
>>>> +    if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
>>>> +      {
>>>> +       /* Handling opposite steps is not implemented yet.  Neither
>>>> +          is handling different step sizes.  */
>>>> +       if ((tree_int_cst_sign_bit (iv.step)
>>>> +            != tree_int_cst_sign_bit (niter->control.step))
>>>> +           || !tree_int_cst_equal (iv.step, niter->control.step))
>>>> +         continue;
>>>> +
>>>> +       /* Find a loop PHI node that defines guard_iv directly,
>>>> +          or create one doing that.  */
>>>> +       gphi *phi = find_or_create_guard_phi (loop1, guard_iv, &iv);
>>>> +       if (!phi)
>>>> +         continue;
>>>> +       gcond *guard_stmt = as_a<gcond *> (last_stmt (bbs[i]));
>>>> +       tree guard_init = PHI_ARG_DEF_FROM_EDGE (phi,
>>>> +                                                loop_preheader_edge (loop1));
>>>> +       enum tree_code guard_code = gimple_cond_code (guard_stmt);
>>>> +
>>>> +       /* Loop splitting is implemented by versioning the loop, placing
>>>> +          the new loop after the old loop, making the first loop iterate
>>>> +          as long as the conditional stays true (or false) and letting the
>>>> +          second (new) loop handle the rest of the iterations.
>>>> +
>>>> +          First we need to determine if the condition will start being true
>>>> +          or false in the first loop.  */
>>>> +       bool initial_true;
>>>> +       switch (guard_code)
>>>> +         {
>>>> +           case LT_EXPR:
>>>> +           case LE_EXPR:
>>>> +             initial_true = !tree_int_cst_sign_bit (iv.step);
>>>> +             break;
>>>> +           case GT_EXPR:
>>>> +           case GE_EXPR:
>>>> +             initial_true = tree_int_cst_sign_bit (iv.step);
>>>> +             break;
>>>> +           default:
>>>> +             gcc_unreachable ();
>>>> +         }
>>>> +
>>>> +       /* Build a condition that will skip the first loop when the
>>>> +          guard condition won't ever be true (or false).  */
>>>> +       gimple_seq stmts2;
>>>> +       border = force_gimple_operand (border, &stmts2, true, NULL_TREE);
>>>> +       if (stmts2)
>>>> +         gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
>>>> +                                           stmts2);
>>>> +       tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
>>>> +       if (!initial_true)
>>>> +         cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>>> +
>>>> +       /* Now version the loop, placing loop2 after loop1 connecting
>>>> +          them, and fix up SSA form for that.  */
>>>> +       initialize_original_copy_tables ();
>>>> +       basic_block cond_bb;
>>>> +       struct loop *loop2 = loop_version (loop1, cond, &cond_bb,
>>>> +                                          REG_BR_PROB_BASE, REG_BR_PROB_BASE,
>>>> +                                          REG_BR_PROB_BASE, true);
>>>> +       gcc_assert (loop2);
>>>> +       update_ssa (TODO_update_ssa);
>>>> +
>>>> +       edge new_e = connect_loops (loop1, loop2);
>>>> +       connect_loop_phis (loop1, loop2, new_e);
>>>> +
>>>> +       /* The iterations of the second loop are now already
>>>> +          exactly those that the first loop didn't do, but the
>>>> +          iteration space of the first loop is still the original one.
>>>> +          Compute the new bound for the guarding IV and patch the
>>>> +          loop exit to use it instead of original IV and bound.  */
>>>> +       gimple_seq stmts = NULL;
>>>> +       tree newend = compute_new_first_bound (&stmts, niter, border,
>>>> +                                              guard_code, guard_init);
>>>> +       if (stmts)
>>>> +         gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
>>>> +                                           stmts);
>>>> +       tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
>>>> +       patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
>>>> +
>>>> +       /* Finally patch out the two copies of the condition to be always
>>>> +          true/false (or opposite).  */
>>>> +       gcond *force_true = as_a<gcond *> (last_stmt (bbs[i]));
>>>> +       gcond *force_false = as_a<gcond *> (last_stmt (get_bb_copy (bbs[i])));
>>>> +       if (!initial_true)
>>>> +         std::swap (force_true, force_false);
>>>> +       gimple_cond_make_true (force_true);
>>>> +       gimple_cond_make_false (force_false);
>>>> +       update_stmt (force_true);
>>>> +       update_stmt (force_false);
>>>> +
>>>> +       free_original_copy_tables ();
>>>> +
>>>> +       /* We destroyed LCSSA form above.  Eventually we might be able
>>>> +          to fix it on the fly, for now simply punt and use the helper.  */
>>>> +       rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
>>>> +
>>>> +       changed = true;
>>>> +       if (dump_file && (dump_flags & TDF_DETAILS))
>>>> +         fprintf (dump_file, ";; Loop split.\n");
>>>> +
>>>> +       /* Only deal with the first opportunity.  */
>>>> +       break;
>>>> +      }
>>>> +
>>>> +  free (bbs);
>>>> +  return changed;
>>>> +}
>>>> +
>>>> +/* Main entry point.  Perform loop splitting on all suitable loops.  */
>>>> +
>>>> +static unsigned int
>>>> +tree_ssa_split_loops (void)
>>>> +{
>>>> +  struct loop *loop;
>>>> +  bool changed = false;
>>>> +
>>>> +  gcc_assert (scev_initialized_p ());
>>>> +  FOR_EACH_LOOP (loop, 0)
>>>> +    loop->aux = NULL;
>>>> +
>>>> +  /* Go through all loops starting from innermost.  */
>>>> +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
>>>> +    {
>>>> +      struct tree_niter_desc niter;
>>>> +      if (loop->aux)
>>>> +       {
>>>> +         /* If any of our inner loops was split, don't split us,
>>>> +            and mark our containing loop as having had splits as well.  */
>>>> +         loop_outer (loop)->aux = loop;
>>>> +         continue;
>>>> +       }
>>>> +
>>>> +      if (single_exit (loop)
>>>> +         /* ??? We could handle non-empty latches when we split
>>>> +            the latch edge (not the exit edge), and put the new
>>>> +            exit condition in the new block.  OTOH this executes some
>>>> +            code unconditionally that might have been skipped by the
>>>> +            original exit before.  */
>>>> +         && empty_block_p (loop->latch)
>>>> +         && !optimize_loop_for_size_p (loop)
>>>> +         && number_of_iterations_exit (loop, single_exit (loop), &niter,
>>>> +                                       false, true)
>>>> +         && niter.cmp != ERROR_MARK
>>>> +         /* We can't yet handle loops controlled by a != predicate.  */
>>>> +         && niter.cmp != NE_EXPR)
>>>> +       {
>>>> +         if (split_loop (loop, &niter))
>>>> +           {
>>>> +             /* Mark our containing loop as having had some split inner
>>>> +                loops.  */
>>>> +             loop_outer (loop)->aux = loop;
>>>> +             changed = true;
>>>> +           }
>>>> +       }
>>>> +    }
>>>> +
>>>> +  FOR_EACH_LOOP (loop, 0)
>>>> +    loop->aux = NULL;
>>>> +
>>>> +  if (changed)
>>>> +    return TODO_cleanup_cfg;
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +/* Loop splitting pass.  */
>>>> +
>>>> +namespace {
>>>> +
>>>> +const pass_data pass_data_loop_split =
>>>> +{
>>>> +  GIMPLE_PASS, /* type */
>>>> +  "lsplit", /* name */
>>>> +  OPTGROUP_LOOP, /* optinfo_flags */
>>>> +  TV_LOOP_SPLIT, /* tv_id */
>>>> +  PROP_cfg, /* properties_required */
>>>> +  0, /* properties_provided */
>>>> +  0, /* properties_destroyed */
>>>> +  0, /* todo_flags_start */
>>>> +  0, /* todo_flags_finish */
>>>> +};
>>>> +
>>>> +class pass_loop_split : public gimple_opt_pass
>>>> +{
>>>> +public:
>>>> +  pass_loop_split (gcc::context *ctxt)
>>>> +    : gimple_opt_pass (pass_data_loop_split, ctxt)
>>>> +  {}
>>>> +
>>>> +  /* opt_pass methods: */
>>>> +  virtual bool gate (function *) { return flag_split_loops != 0; }
>>>> +  virtual unsigned int execute (function *);
>>>> +
>>>> +}; // class pass_loop_split
>>>> +
>>>> +unsigned int
>>>> +pass_loop_split::execute (function *fun)
>>>> +{
>>>> +  if (number_of_loops (fun) <= 1)
>>>> +    return 0;
>>>> +
>>>> +  return tree_ssa_split_loops ();
>>>> +}
>>>> +
>>>> +} // anon namespace
>>>> +
>>>> +gimple_opt_pass *
>>>> +make_pass_loop_split (gcc::context *ctxt)
>>>> +{
>>>> +  return new pass_loop_split (ctxt);
>>>> +}
>>>> Index: doc/invoke.texi
>>>> ===================================================================
>>>> --- doc/invoke.texi     (revision 231115)
>>>> +++ doc/invoke.texi     (working copy)
>>>> @@ -446,7 +446,7 @@ Objective-C and Objective-C++ Dialects}.
>>>>  -fselective-scheduling -fselective-scheduling2 @gol
>>>>  -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol
>>>>  -fsemantic-interposition -fshrink-wrap -fsignaling-nans @gol
>>>> --fsingle-precision-constant -fsplit-ivs-in-unroller @gol
>>>> +-fsingle-precision-constant -fsplit-ivs-in-unroller -fsplit-loops @gol
>>>>  -fsplit-paths @gol
>>>>  -fsplit-wide-types -fssa-backprop -fssa-phiopt @gol
>>>>  -fstack-protector -fstack-protector-all -fstack-protector-strong @gol
>>>> @@ -10197,6 +10197,11 @@ Enabled with @option{-fprofile-use}.
>>>>  Enables the loop invariant motion pass in the RTL loop optimizer.  Enabled
>>>>  at level @option{-O1}
>>>>
>>>> +@item -fsplit-loops
>>>> +@opindex fsplit-loops
>>>> +Split a loop into two if it contains a condition that's always true
>>>> +for one side of the iteration space and false for the other.
>>>> +
>>>>  @item -funswitch-loops
>>>>  @opindex funswitch-loops
>>>>  Move branches with loop invariant conditions out of the loop, with duplicates
>>>> Index: doc/passes.texi
>>>> ===================================================================
>>>> --- doc/passes.texi     (revision 231115)
>>>> +++ doc/passes.texi     (working copy)
>>>> @@ -484,6 +484,12 @@ out of the loops.  To achieve this, a du
>>>>  each possible outcome of conditional jump(s).  The pass is implemented in
>>>>  @file{tree-ssa-loop-unswitch.c}.
>>>>
>>>> +Loop splitting.  If a loop contains a conditional statement that is
>>>> +always true for one part of the iteration space and false for the other,
>>>> +this pass splits the loop into two, one dealing with one side, the other
>>>> +only with the other, thereby removing one inner-loop conditional.  The
>>>> +pass is implemented in @file{tree-ssa-loop-split.c}.
>>>> +
>>>>  The optimizations also use various utility functions contained in
>>>>  @file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and
>>>>  @file{cfgloopmanip.c}.
>>>> Index: testsuite/gcc.dg/loop-split.c
>>>> ===================================================================
>>>> --- testsuite/gcc.dg/loop-split.c       (revision 0)
>>>> +++ testsuite/gcc.dg/loop-split.c       (working copy)
>>>> @@ -0,0 +1,147 @@
>>>> +/* { dg-do run } */
>>>> +/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
>>>> +
>>>> +#ifdef __cplusplus
>>>> +extern "C" int printf (const char *, ...);
>>>> +extern "C" void abort (void);
>>>> +#else
>>>> +extern int printf (const char *, ...);
>>>> +extern void abort (void);
>>>> +#endif
>>>> +
>>>> +/* Define TRACE to 1 or 2 to get detailed tracing.
>>>> +   Define SINGLE_TEST to 1 or 2 to get a simple routine with
>>>> +   just one loop, called only one time or with multiple parameters,
>>>> +   to make debugging easier.  */
>>>> +#ifndef TRACE
>>>> +#define TRACE 0
>>>> +#endif
>>>> +
>>>> +#define loop(beg,step,beg2,cond1,cond2) \
>>>> +    do \
>>>> +      { \
>>>> +       sum = 0; \
>>>> +        for (i = (beg), j = (beg2); (cond1); i+=(step),j+=(step)) \
>>>> +          { \
>>>> +            if (cond2) { \
>>>> +             if (TRACE > 1) printf ("a: %d %d\n", i, j); \
>>>> +              sum += a[i]; \
>>>> +           } else { \
>>>> +             if (TRACE > 1) printf ("b: %d %d\n", i, j); \
>>>> +              sum += b[i]; \
>>>> +           } \
>>>> +          } \
>>>> +       if (TRACE > 0) printf ("sum: %d\n", sum); \
>>>> +       check = check * 47 + sum; \
>>>> +      } while (0)
>>>> +
>>>> +#ifndef SINGLE_TEST
>>>> +unsigned __attribute__((noinline, noclone)) dotest (int beg, int end, int step,
>>>> +                                              int c, int *a, int *b, int beg2)
>>>> +{
>>>> +  unsigned check = 0;
>>>> +  int sum;
>>>> +  int i, j;
>>>> +  loop (beg, 1, beg2, i < end, j < c);
>>>> +  loop (beg, 1, beg2, i <= end, j < c);
>>>> +  loop (beg, 1, beg2, i < end, j <= c);
>>>> +  loop (beg, 1, beg2, i <= end, j <= c);
>>>> +  loop (beg, 1, beg2, i < end, j > c);
>>>> +  loop (beg, 1, beg2, i <= end, j > c);
>>>> +  loop (beg, 1, beg2, i < end, j >= c);
>>>> +  loop (beg, 1, beg2, i <= end, j >= c);
>>>> +  beg2 += end-beg;
>>>> +  loop (end, -1, beg2, i >= beg, j >= c);
>>>> +  loop (end, -1, beg2, i >= beg, j > c);
>>>> +  loop (end, -1, beg2, i > beg, j >= c);
>>>> +  loop (end, -1, beg2, i > beg, j > c);
>>>> +  loop (end, -1, beg2, i >= beg, j <= c);
>>>> +  loop (end, -1, beg2, i >= beg, j < c);
>>>> +  loop (end, -1, beg2, i > beg, j <= c);
>>>> +  loop (end, -1, beg2, i > beg, j < c);
>>>> +  return check;
>>>> +}
>>>> +
>>>> +#else
>>>> +
>>>> +int __attribute__((noinline, noclone)) f (int beg, int end, int step,
>>>> +                                         int c, int *a, int *b, int beg2)
>>>> +{
>>>> +  int sum = 0;
>>>> +  int i, j;
>>>> +  //for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
>>>> +  for (i = end, j = beg2 + (end-beg); i > beg; i += -1, j-- /*step*/)
>>>> +    {
>>>> +      // i - j == X --> i = X + j
>>>> +      // --> i < end == X+j < end == j < end - X
>>>> +      // --> newend = end - (i_init - j_init)
>>>> +      // j < end-X && j < c --> j < min(end-X,c)
>>>> +      // j < end-X && j <= c --> j <= min(end-X-1,c) or j < min(end-X,c+1{OF!})
>>>> +      //if (j < c)
>>>> +      if (j >= c)
>>>> +       printf ("a: %d %d\n", i, j);
>>>> +      /*else
>>>> +       printf ("b: %d %d\n", i, j);*/
>>>> +       /*sum += a[i];
>>>> +      else
>>>> +       sum += b[i];*/
>>>> +    }
>>>> +  return sum;
>>>> +}
>>>> +
>>>> +int __attribute__((noinline, noclone)) f2 (int *beg, int *end, int step,
>>>> +                                         int *c, int *a, int *b, int *beg2)
>>>> +{
>>>> +  int sum = 0;
>>>> +  int *i, *j;
>>>> +  for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
>>>> +    {
>>>> +      if (j <= c)
>>>> +       printf ("%d %d\n", i - beg, j - beg);
>>>> +       /*sum += a[i];
>>>> +      else
>>>> +       sum += b[i];*/
>>>> +    }
>>>> +  return sum;
>>>> +}
>>>> +#endif
>>>> +
>>>> +extern int printf (const char *, ...);
>>>> +
>>>> +int main ()
>>>> +{
>>>> +  int a[] = {0,0,0,0,0, 1,2,3,4,5,6,7,8,9,          0,0,0,0,0};
>>>> +  int b[] = {0,0,0,0,0, -1,-2,-3,-4,-5,-6,-7,-8,-9, 0,0,0,0,0,};
>>>> +  int c;
>>>> +  int diff = 0;
>>>> +  unsigned check = 0;
>>>> +#if defined(SINGLE_TEST) && (SINGLE_TEST == 1)
>>>> +  //dotest (0, 9, 1, -1, a+5, b+5, -1);
>>>> +  //return 0;
>>>> +  f (0, 9, 1, 5, a+5, b+5, -1);
>>>> +  return 0;
>>>> +#endif
>>>> +  for (diff = -5; diff <= 5; diff++)
>>>> +    {
>>>> +      for (c = -1; c <= 10; c++)
>>>> +       {
>>>> +#ifdef SINGLE_TEST
>>>> +         int s = f (0, 9, 1, c, a+5, b+5, diff);
>>>> +         //int s = f2 (a+0, a+9, 1, a+c, a+5, b+5, a+diff);
>>>> +         printf ("%d ", s);
>>>> +#else
>>>> +         if (TRACE > 0)
>>>> +           printf ("check %d %d\n", c, diff);
>>>> +         check = check * 51 + dotest (0, 9, 1, c, a+5, b+5, diff);
>>>> +#endif
>>>> +       }
>>>> +      //printf ("\n");
>>>> +    }
>>>> +  //printf ("%u\n", check);
>>>> +  if (check != 3213344948)
>>>> +    abort ();
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +/* All 16 loops in dotest should be split.  */
>>>> +/* { dg-final { scan-tree-dump-times "Loop split" 16 "lsplit" } } */
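
As an illustration of the bound computation described in the comment on
compute_new_first_bound, here is a minimal standalone C sketch.  It is not
part of the patch.  It only covers the simplest case, where both IVs step
by +1 and both the exit test and the guard are strict '<'.  The names
min_int, original, split and self_check are made up for this example, and
end + guard_init - beg is assumed not to overflow.

```c
#include <assert.h>

/* Hypothetical helper for this sketch only.  */
static int
min_int (int a, int b)
{
  return a < b ? a : b;
}

/* The loop before splitting: i is the control IV, j the guard IV,
   both stepping by +1.  */
static int
original (int beg, int end, int guard_init, int border,
          const int *a, const int *b)
{
  int sum = 0;
  for (int i = beg, j = guard_init; i < end; i++, j++)
    if (j < border)
      sum += a[i];
    else
      sum += b[i];
  return sum;
}

/* The same computation after splitting by hand.  The first loop runs
   on j up to newend = min (end - (beg - guard_init), border), so the
   guard is known to be true in it; the second loop continues from
   wherever the first one stopped and handles the remaining iterations,
   where the guard is known to be false.  */
static int
split (int beg, int end, int guard_init, int border,
       const int *a, const int *b)
{
  int sum = 0;
  int i = beg, j = guard_init;
  int newend = min_int (end - (beg - guard_init), border);

  for (; j < newend; i++, j++)
    sum += a[i];
  for (; i < end; i++, j++)
    sum += b[i];
  return sum;
}

/* Compare both versions over the same parameter ranges that main ()
   in the testcase sweeps (diff in [-5, 5], c in [-1, 10]).  Returns
   the number of combinations checked.  */
static int
self_check (void)
{
  static const int a[9] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
  static const int b[9] = { -1, -2, -3, -4, -5, -6, -7, -8, -9 };
  int checked = 0;

  for (int guard_init = -5; guard_init <= 5; guard_init++)
    for (int border = -1; border <= 10; border++)
      {
        assert (split (0, 9, guard_init, border, a, b)
                == original (0, 9, guard_init, border, a, b));
        checked++;
      }
  return checked;
}
```

Calling self_check () from a small main () exercises the cases where the
border lies before, inside and after the original iteration space.  The
non-strict and downward-counting variants need the MIN/MAX choice and the
+1/-1 adjustment that the pass computes, which this sketch leaves out.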

* Re: Gimple loop splitting
  2015-11-12 16:52 Gimple loop splitting Michael Matz
  2015-11-12 21:44 ` Jeff Law
@ 2016-07-25  7:00 ` Andrew Pinski
  2016-07-25 14:27   ` Michael Matz
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Pinski @ 2016-07-25  7:00 UTC (permalink / raw)
  To: Michael Matz; +Cc: GCC Patches

On Thu, Nov 12, 2015 at 8:52 AM, Michael Matz <matz@suse.de> wrote:
> Hello,
>
> this new pass implements loop iteration space splitting for loops that
> contain a conditional that's always true for one part of the iteration
> space and false for the other, i.e. such situations:
>
>   for (i = beg; i < end; i++)
>     if (i < p)
>       dothis();
>     else
>       dothat();
>
> this is transformed into roughly:
>
>   for (i = beg; i < p; i++)
>     dothis();
>   for (; i < end; i++)
>     dothat();
>
> Of course, not quite the above as there needs to be provisions for the
> border conditions, if e.g. 'p' is outside the original iteration space, or
> the conditional doesn't directly use the control IV, but some other, or
> the IV runs backwards.  The testcase checks many of these border
> conditions.
>
> This transformation is in itself a good one but can also be an enabler for
> the vectorizer.  It does increase code size, when the loop body contains
> also unconditional code (that one is duplicated), so we only transform hot
> loops.  I'm a bit unsure of the placement of the new pass, or if it should
> be an own pass at all.  Right now I've placed it after unswitching and
> scev_cprop, before loop distribution.  Ideally I think all three, together
> with loop fusion and an gimple unroller should be integrated into one loop
> nest optimizer, alas, we aren't there yet.
>
> I'm planning to work on loop fusion in the future as well, but that's not
> for GCC 6.
>
> I've regstrapped this pass enabled with -O2 on x86-64-linux, without
> regressions.  I've also checked cpu2006 (the non-fortran part) for
> correctness, not yet for performance.  In the end it should probably only
> be enabled for -O3+ (although if the whole loop body is conditional it
> makes sense to also have it with -O2 because code growth is very small
> then).
>
> So, okay for trunk?

Whatever happened to this patch?  I was looking into doing this
myself today, but then I found this patch.
Since it is stage 1 of GCC 7, it might be a good idea to get this patch
into GCC.

Thanks,
Andrew

>
>
> Ciao,
> Michael.
>         * passes.def (pass_loop_split): Add.
>         * timevar.def (TV_LOOP_SPLIT): Add.
>         * tree-pass.h (make_pass_loop_split): Declare.
>         * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
>         * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
>         cfganal.h, tree-chrec.h, tree-affine.h, tree-scalar-evolution.h,
>         gimple-pretty-print.h, gimple-fold.h, gimplify-me.h.
>         (split_at_bb_p, patch_loop_exit, find_or_create_guard_phi,
>         split_loop, tree_ssa_split_loops,
>         make_pass_loop_split): New functions.
>         (pass_data_loop_split): New.
>         (pass_loop_split): New.
>
> testsuite/
>         * gcc.dg/loop-split.c: New test.
>
> Index: passes.def
> ===================================================================
> --- passes.def  (revision 229763)
> +++ passes.def  (working copy)
> @@ -233,6 +233,7 @@ along with GCC; see the file COPYING3.
>           NEXT_PASS (pass_dce);
>           NEXT_PASS (pass_tree_unswitch);
>           NEXT_PASS (pass_scev_cprop);
> +         NEXT_PASS (pass_loop_split);
>           NEXT_PASS (pass_record_bounds);
>           NEXT_PASS (pass_loop_distribution);
>           NEXT_PASS (pass_copy_prop);
> Index: timevar.def
> ===================================================================
> --- timevar.def (revision 229763)
> +++ timevar.def (working copy)
> @@ -179,6 +179,7 @@ DEFTIMEVAR (TV_LIM                   , "
>  DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
>  DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
>  DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
> +DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
>  DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
>  DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
>  DEFTIMEVAR (TV_TREE_VECTORIZATION    , "tree vectorization")
> Index: tree-pass.h
> ===================================================================
> --- tree-pass.h (revision 229763)
> +++ tree-pass.h (working copy)
> @@ -366,6 +366,7 @@ extern gimple_opt_pass *make_pass_tree_n
>  extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
> Index: tree-ssa-loop-manip.h
> ===================================================================
> --- tree-ssa-loop-manip.h       (revision 229763)
> +++ tree-ssa-loop-manip.h       (working copy)
> @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
>
>  extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
>                        bool, tree *, tree *);
> +extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
> +                                           struct loop *);
>  extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
>  extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
>  extern void verify_loop_closed_ssa (bool);
> Index: tree-ssa-loop-unswitch.c
> ===================================================================
> --- tree-ssa-loop-unswitch.c    (revision 229763)
> +++ tree-ssa-loop-unswitch.c    (working copy)
> @@ -31,12 +31,20 @@ along with GCC; see the file COPYING3.
>  #include "tree-ssa.h"
>  #include "tree-ssa-loop-niter.h"
>  #include "tree-ssa-loop.h"
> +#include "tree-ssa-loop-manip.h"
>  #include "tree-into-ssa.h"
> +#include "cfganal.h"
>  #include "cfgloop.h"
> +#include "tree-chrec.h"
> +#include "tree-affine.h"
> +#include "tree-scalar-evolution.h"
>  #include "params.h"
>  #include "tree-inline.h"
>  #include "gimple-iterator.h"
> +#include "gimple-pretty-print.h"
>  #include "cfghooks.h"
> +#include "gimple-fold.h"
> +#include "gimplify-me.h"
>
>  /* This file implements the loop unswitching, i.e. transformation of loops like
>
> @@ -842,4 +850,551 @@ make_pass_tree_unswitch (gcc::context *c
>    return new pass_tree_unswitch (ctxt);
>  }
>
> +/* Return true when BB inside LOOP is a potential iteration space
> +   split point, i.e. ends with a condition like "IV < comp", which
> +   is true on one side of the iteration space and false on the other,
> +   and the split point can be computed.  If so, also return the border
> +   point in *BORDER and the comparison induction variable in IV.  */
>
> +static tree
> +split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv)
> +{
> +  gimple *last;
> +  gcond *stmt;
> +  affine_iv iv2;
> +
> +  /* BB must end in a simple conditional jump.  */
> +  last = last_stmt (bb);
> +  if (!last || gimple_code (last) != GIMPLE_COND)
> +    return NULL_TREE;
> +  stmt = as_a <gcond *> (last);
> +
> +  enum tree_code code = gimple_cond_code (stmt);
> +
> +  /* Only handle relational comparisons, for equality and non-equality
> +     we'd have to split the loop into two loops and a middle statement.  */
> +  switch (code)
> +    {
> +      case LT_EXPR:
> +      case LE_EXPR:
> +      case GT_EXPR:
> +      case GE_EXPR:
> +       break;
> +      default:
> +       return NULL_TREE;
> +    }
> +
> +  if (loop_exits_from_bb_p (loop, bb))
> +    return NULL_TREE;
> +
> +  tree op0 = gimple_cond_lhs (stmt);
> +  tree op1 = gimple_cond_rhs (stmt);
> +
> +  if (!simple_iv (loop, loop, op0, iv, false))
> +    return NULL_TREE;
> +  if (!simple_iv (loop, loop, op1, &iv2, false))
> +    return NULL_TREE;
> +
> +  /* Make it so, that the first argument of the condition is
> +     the looping one.  */
> +  if (integer_zerop (iv->step))
> +    {
> +      std::swap (op0, op1);
> +      std::swap (*iv, iv2);
> +      code = swap_tree_comparison (code);
> +      gimple_cond_set_condition (stmt, code, op0, op1);
> +      update_stmt (stmt);
> +    }
> +
> +  if (integer_zerop (iv->step))
> +    return NULL_TREE;
> +  if (!integer_zerop (iv2.step))
> +    return NULL_TREE;
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "Found potential split point: ");
> +      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> +      fprintf (dump_file, " { ");
> +      print_generic_expr (dump_file, iv->base, TDF_SLIM);
> +      fprintf (dump_file, " + I*");
> +      print_generic_expr (dump_file, iv->step, TDF_SLIM);
> +      fprintf (dump_file, " } %s ", get_tree_code_name (code));
> +      print_generic_expr (dump_file, iv2.base, TDF_SLIM);
> +      fprintf (dump_file, "\n");
> +    }
> +
> +  *border = iv2.base;
> +  return op0;
> +}
> +
> +/* Given a GUARD conditional stmt inside LOOP, which we want to make always
> +   true or false depending on INITIAL_TRUE, and adjusted values NEXTVAL
> +   (a post-increment IV) and NEWBOUND (the comparator) adjust the loop
> +   exit test statement to loop back only if the GUARD statement will
> +   also be true/false in the next iteration.  */
> +
> +static void
> +patch_loop_exit (struct loop *loop, gcond *guard, tree nextval, tree newbound,
> +                bool initial_true)
> +{
> +  edge exit = single_exit (loop);
> +  gcond *stmt = as_a <gcond *> (last_stmt (exit->src));
> +  gimple_cond_set_condition (stmt, gimple_cond_code (guard),
> +                            nextval, newbound);
> +  update_stmt (stmt);
> +
> +  edge stay = single_pred_edge (loop->latch);
> +
> +  exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
> +  stay->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
> +
> +  if (initial_true)
> +    {
> +      exit->flags |= EDGE_FALSE_VALUE;
> +      stay->flags |= EDGE_TRUE_VALUE;
> +    }
> +  else
> +    {
> +      exit->flags |= EDGE_TRUE_VALUE;
> +      stay->flags |= EDGE_FALSE_VALUE;
> +    }
> +}
> +
> +/* Given an induction variable GUARD_IV, and its affine descriptor IV,
> +   find the loop phi node in LOOP defining it directly, or create
> +   such phi node.  Return that phi node.  */
> +
> +static gphi *
> +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/)
> +{
> +  gimple *def = SSA_NAME_DEF_STMT (guard_iv);
> +  gphi *phi;
> +  if ((phi = dyn_cast <gphi *> (def))
> +      && gimple_bb (phi) == loop->header)
> +    return phi;
> +
> +  /* XXX Create the PHI instead.  */
> +  return NULL;
> +}
> +
> +/* Checks if LOOP contains a conditional block whose condition
> +   depends on which side in the iteration space it is, and if so
> +   splits the iteration space into two loops.  Returns true if the
> +   loop was split.  NITER must contain the iteration descriptor for the
> +   single exit of LOOP.  */
> +
> +static bool
> +split_loop (struct loop *loop, struct tree_niter_desc *niter)
> +{
> +  basic_block *bbs;
> +  unsigned i;
> +  bool changed = false;
> +  tree guard_iv;
> +  tree border;
> +  affine_iv iv;
> +
> +  bbs = get_loop_body (loop);
> +
> +  /* Find a splitting opportunity.  */
> +  for (i = 0; i < loop->num_nodes; i++)
> +    if ((guard_iv = split_at_bb_p (loop, bbs[i], &border, &iv)))
> +      {
> +       /* Handling opposite steps is not implemented yet.  Neither
> +          is handling different step sizes.  */
> +       if ((tree_int_cst_sign_bit (iv.step)
> +            != tree_int_cst_sign_bit (niter->control.step))
> +           || !tree_int_cst_equal (iv.step, niter->control.step))
> +         continue;
> +
> +       /* Find a loop PHI node that defines guard_iv directly,
> +          or create one doing that.  */
> +       gphi *phi = find_or_create_guard_phi (loop, guard_iv, &iv);
> +       if (!phi)
> +         continue;
> +       gcond *guard_stmt = as_a<gcond *> (last_stmt (bbs[i]));
> +       tree guard_init = PHI_ARG_DEF_FROM_EDGE (phi,
> +                                                loop_preheader_edge (loop));
> +       enum tree_code guard_code = gimple_cond_code (guard_stmt);
> +
> +       /* Loop splitting is implemented by versioning the loop, placing
> +          the new loop in front of the old loop, making the first loop iterate
> +          as long as the conditional stays true (or false) and letting the
> +          second (original) loop handle the rest of the iterations.
> +
> +          First we need to determine if the condition will start being true
> +          or false in the first loop.  */
> +       bool initial_true;
> +       switch (guard_code)
> +         {
> +           case LT_EXPR:
> +           case LE_EXPR:
> +             initial_true = !tree_int_cst_sign_bit (iv.step);
> +             break;
> +           case GT_EXPR:
> +           case GE_EXPR:
> +             initial_true = tree_int_cst_sign_bit (iv.step);
> +             break;
> +           default:
> +             gcc_unreachable ();
> +         }
> +
> +       /* Build a condition that will skip the first loop when the
> +          guard condition won't ever be true (or false).  */
> +       tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
> +       if (initial_true)
> +         cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> +
> +       /* Now version the loop, we will then have this situation:
> +          if (!cond)
> +            for (...) {body}   //floop
> +          else
> +            for (...) {body}   //loop
> +          join:  */
> +       initialize_original_copy_tables ();
> +       basic_block cond_bb;
> +       struct loop *floop = loop_version (loop, cond, &cond_bb,
> +                                          REG_BR_PROB_BASE, REG_BR_PROB_BASE,
> +                                          REG_BR_PROB_BASE, false);
> +       gcc_assert (floop);
> +       update_ssa (TODO_update_ssa);
> +
> +       /* Now diddle the exit edge of the first loop (floop->join in the
> +          above) to either go to the common exit (join) or to the second
> +          loop, depending on if there are still iterations left, or not.
> +          We split the floop exit edge and insert a copy of the
> +          original exit expression into the new block, that either
> +          skips the second loop or goes to it.  */
> +       edge exit = single_exit (floop);
> +       basic_block skip_bb = split_edge (exit);
> +       gcond *skip_stmt;
> +       gimple_stmt_iterator gsi;
> +       edge new_e, skip_e;
> +
> +       gimple *stmt = last_stmt (exit->src);
> +       skip_stmt = gimple_build_cond (gimple_cond_code (stmt),
> +                                      gimple_cond_lhs (stmt),
> +                                      gimple_cond_rhs (stmt),
> +                                      NULL_TREE, NULL_TREE);
> +       gsi = gsi_last_bb (skip_bb);
> +       gsi_insert_after (&gsi, skip_stmt, GSI_NEW_STMT);
> +
> +       skip_e = EDGE_SUCC (skip_bb, 0);
> +       skip_e->flags &= ~EDGE_FALLTHRU;
> +       new_e = make_edge (skip_bb, loop_preheader_edge (loop)->src, 0);
> +       if (exit->flags & EDGE_TRUE_VALUE)
> +         {
> +           skip_e->flags |= EDGE_TRUE_VALUE;
> +           new_e->flags |= EDGE_FALSE_VALUE;
> +         }
> +       else
> +         {
> +           skip_e->flags |= EDGE_FALSE_VALUE;
> +           new_e->flags |= EDGE_TRUE_VALUE;
> +         }
> +
> +       new_e->count = skip_bb->count;
> +       new_e->probability = PROB_LIKELY;
> +       new_e->count = apply_probability (skip_e->count, PROB_LIKELY);
> +       skip_e->count -= new_e->count;
> +       skip_e->probability = inverse_probability (PROB_LIKELY);
> +
> +       /* Now we have created this situation:
> +            if (!cond) {
> +              for (...) {body; if (cexit) break;}
> +              if (!cexit) goto second;
> +            } else {
> +              second:
> +              for (...) {body; if (cexit) break;}
> +            }
> +            join:
> +
> +          The second loop can now be entered by skipping the first
> +          loop (the initial values of its PHI nodes will be the
> +          original initial values), or by falling in from the first
> +          loop (the initial values will be the continuation values
> +          from the first loop).  Insert PHI nodes reflecting this
> +          in the pre-header of the second loop.  */
> +
> +       basic_block rest = loop_preheader_edge (loop)->src;
> +       edge skip_first = find_edge (cond_bb, rest);
> +       gcc_assert (skip_first);
> +
> +       edge firste = loop_preheader_edge (floop);
> +       edge seconde = loop_preheader_edge (loop);
> +       edge firstn = loop_latch_edge (floop);
> +       gphi *new_guard_phi = 0;
> +       gphi_iterator psi_first, psi_second;
> +       for (psi_first = gsi_start_phis (floop->header),
> +            psi_second = gsi_start_phis (loop->header);
> +            !gsi_end_p (psi_first);
> +            gsi_next (&psi_first), gsi_next (&psi_second))
> +         {
> +           tree init, next, new_init;
> +           use_operand_p op;
> +           gphi *phi_first = psi_first.phi ();
> +           gphi *phi_second = psi_second.phi ();
> +
> +           if (phi_second == phi)
> +             new_guard_phi = phi_first;
> +
> +           init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste);
> +           next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn);
> +           op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde);
> +           gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
> +
> +           /* Prefer using original variable as a base for the new ssa name.
> +              This is necessary for virtual ops, and useful in order to avoid
> +              losing debug info for real ops.  */
> +           if (TREE_CODE (next) == SSA_NAME
> +               && useless_type_conversion_p (TREE_TYPE (next),
> +                                             TREE_TYPE (init)))
> +             new_init = copy_ssa_name (next);
> +           else if (TREE_CODE (init) == SSA_NAME
> +                    && useless_type_conversion_p (TREE_TYPE (init),
> +                                                  TREE_TYPE (next)))
> +             new_init = copy_ssa_name (init);
> +           else if (useless_type_conversion_p (TREE_TYPE (next),
> +                                               TREE_TYPE (init)))
> +             new_init = make_temp_ssa_name (TREE_TYPE (next), NULL,
> +                                            "unrinittmp");
> +           else
> +             new_init = make_temp_ssa_name (TREE_TYPE (init), NULL,
> +                                            "unrinittmp");
> +
> +           gphi * newphi = create_phi_node (new_init, rest);
> +           add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION);
> +           add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION);
> +           SET_USE (op, new_init);
> +         }
> +
> +       /* The iterations of the second loop are now already
> +          exactly those that the first loop didn't do, but the
> +          iteration space of the first loop is still the original one.
> +          Build a new one, exactly covering those iterations where
> +          the conditional is true (or false).  For example, from such a loop:
> +
> +            for (i = beg, j = beg2; i < end; i++, j++)
> +              if (j < c)  // this is supposed to be true
> +                ...
> +
> +          we build new bounds and change the exit conditions such that
> +          it's effectively this:
> +
> +            newend = min (end+beg2-beg, c)
> +            for (i = beg, j = beg2; j < newend; i++, j++)
> +              if (j < c)
> +                ...
> +
> +          Depending on the direction of the IVs and if the exit tests
> +          are strict or include equality we need to use MIN or MAX,
> +          and add or subtract 1.  */
> +
> +       gimple_seq stmts = NULL;
> +       /* The niter structure contains the after-increment IV, we need
> +          the loop-enter base, so subtract STEP once.  */
> +       tree controlbase = force_gimple_operand (niter->control.base,
> +                                                &stmts, true, NULL_TREE);
> +       tree controlstep = niter->control.step;
> +       tree enddiff;
> +       if (POINTER_TYPE_P (TREE_TYPE (controlbase)))
> +         {
> +           controlstep = gimple_build (&stmts, NEGATE_EXPR,
> +                                       TREE_TYPE (controlstep), controlstep);
> +           enddiff = gimple_build (&stmts, POINTER_PLUS_EXPR,
> +                                   TREE_TYPE (controlbase),
> +                                   controlbase, controlstep);
> +         }
> +       else
> +         enddiff = gimple_build (&stmts, MINUS_EXPR,
> +                                 TREE_TYPE (controlbase),
> +                                 controlbase, controlstep);
> +
> +       /* Compute beg-beg2.  */
> +       if (POINTER_TYPE_P (TREE_TYPE (enddiff)))
> +         {
> +           tree tem = gimple_convert (&stmts, sizetype, guard_init);
> +           tem = gimple_build (&stmts, NEGATE_EXPR, sizetype, tem);
> +           enddiff = gimple_build (&stmts, POINTER_PLUS_EXPR,
> +                                   TREE_TYPE (enddiff),
> +                                   enddiff, tem);
> +         }
> +       else
> +         enddiff = gimple_build (&stmts, MINUS_EXPR, TREE_TYPE (enddiff),
> +                                 enddiff, guard_init);
> +
> +       /* Compute end-(beg-beg2).  */
> +       gimple_seq stmts2;
> +       tree newbound = force_gimple_operand (niter->bound, &stmts2,
> +                                             true, NULL_TREE);
> +       gimple_seq_add_seq_without_update (&stmts, stmts2);
> +
> +       if (POINTER_TYPE_P (TREE_TYPE (enddiff))
> +           || POINTER_TYPE_P (TREE_TYPE (newbound)))
> +         {
> +           enddiff = gimple_convert (&stmts, sizetype, enddiff);
> +           enddiff = gimple_build (&stmts, NEGATE_EXPR, sizetype, enddiff);
> +           newbound = gimple_build (&stmts, POINTER_PLUS_EXPR,
> +                                    TREE_TYPE (newbound),
> +                                    newbound, enddiff);
> +         }
> +       else
> +         newbound = gimple_build (&stmts, MINUS_EXPR, TREE_TYPE (enddiff),
> +                                  newbound, enddiff);
> +
> +       /* Depending on the direction of the IVs the new bound for the first
> +          loop is the minimum or maximum of old bound and border.
> +          Also, if the guard condition isn't strictly less or greater,
> +          we need to adjust the bound.  */
> +       int addbound = 0;
> +       enum tree_code minmax;
> +       if (niter->cmp == LT_EXPR)
> +         {
> +           /* GT and LE are the same, inverted.  */
> +           if (guard_code == GT_EXPR || guard_code == LE_EXPR)
> +             addbound = -1;
> +           minmax = MIN_EXPR;
> +         }
> +       else
> +         {
> +           gcc_assert (niter->cmp == GT_EXPR);
> +           if (guard_code == GE_EXPR || guard_code == LT_EXPR)
> +             addbound = 1;
> +           minmax = MAX_EXPR;
> +         }
> +
> +       if (addbound)
> +         {
> +           tree type2 = TREE_TYPE (newbound);
> +           if (POINTER_TYPE_P (type2))
> +             type2 = sizetype;
> +           newbound = gimple_build (&stmts,
> +                                    POINTER_TYPE_P (TREE_TYPE (newbound))
> +                                    ? POINTER_PLUS_EXPR : PLUS_EXPR,
> +                                    TREE_TYPE (newbound),
> +                                    newbound,
> +                                    build_int_cst (type2, addbound));
> +         }
> +
> +       tree newend = gimple_build (&stmts, minmax, TREE_TYPE (border),
> +                                   border, newbound);
> +       if (stmts)
> +         gsi_insert_seq_on_edge_immediate (loop_preheader_edge (floop),
> +                                           stmts);
> +
> +       /* Now patch the exit block of the first loop to compare
> +          the post-increment value of the guarding IV with the new end
> +          value.  */
> +       tree new_guard_next = PHI_ARG_DEF_FROM_EDGE (new_guard_phi,
> +                                                    loop_latch_edge (floop));
> +       patch_loop_exit (floop, guard_stmt, new_guard_next, newend,
> +                        initial_true);
> +
> +       /* Finally patch out the two copies of the condition to be always
> +          true/false (or opposite).  */
> +       gcond *force_true = as_a<gcond *> (last_stmt (get_bb_copy (bbs[i])));
> +       gcond *force_false = as_a<gcond *> (last_stmt (bbs[i]));
> +       if (!initial_true)
> +         std::swap (force_true, force_false);
> +       gimple_cond_make_true (force_true);
> +       gimple_cond_make_false (force_false);
> +       update_stmt (force_true);
> +       update_stmt (force_false);
> +
> +       free_original_copy_tables ();
> +
> +       /* We destroyed LCSSA form above.  Eventually we might be able
> +          to fix it on the fly, for now simply punt and use the helper.  */
> +       rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, floop);
> +
> +       changed = true;
> +       if (dump_file && (dump_flags & TDF_DETAILS))
> +         fprintf (dump_file, ";; Loop split.\n");
> +
> +       /* Only deal with the first opportunity.  */
> +       break;
> +      }
> +
> +  free (bbs);
> +  return changed;
> +}
> +
> +/* Main entry point.  Perform loop splitting on all suitable loops.  */
> +
> +static unsigned int
> +tree_ssa_split_loops (void)
> +{
> +  struct loop *loop;
> +  bool changed = false;
> +
> +  gcc_assert (scev_initialized_p ());
> +  /* Go through all loops starting from innermost.  */
> +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> +    {
> +      struct tree_niter_desc niter;
> +      if (single_exit (loop)
> +         /* ??? We could handle non-empty latches when we split
> +            the latch edge (not the exit edge), and put the new
> +            exit condition in the new block.  OTOH this executes some
> +            code unconditionally that might have been skipped by the
> +            original exit before.  */
> +         && empty_block_p (loop->latch)
> +         && !optimize_loop_for_size_p (loop)
> +         && number_of_iterations_exit (loop, single_exit (loop), &niter,
> +                                       false, true)
> +         /* We can't yet handle loops controlled by a != predicate.  */
> +         && niter.cmp != NE_EXPR)
> +       changed |= split_loop (loop, &niter);
> +    }
> +
> +  if (changed)
> +    return TODO_cleanup_cfg;
> +  return 0;
> +}
> +
> +/* Loop splitting pass.  */
> +
> +namespace {
> +
> +const pass_data pass_data_loop_split =
> +{
> +  GIMPLE_PASS, /* type */
> +  "lsplit", /* name */
> +  OPTGROUP_LOOP, /* optinfo_flags */
> +  TV_LOOP_SPLIT, /* tv_id */
> +  PROP_cfg, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_loop_split : public gimple_opt_pass
> +{
> +public:
> +  pass_loop_split (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_loop_split, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) { return optimize >= 2; }
> +  virtual unsigned int execute (function *);
> +
> +}; // class pass_loop_split
> +
> +unsigned int
> +pass_loop_split::execute (function *fun)
> +{
> +  if (number_of_loops (fun) <= 1)
> +    return 0;
> +
> +  return tree_ssa_split_loops ();
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_loop_split (gcc::context *ctxt)
> +{
> +  return new pass_loop_split (ctxt);
> +}
> Index: testsuite/gcc.dg/loop-split.c
> ===================================================================
> --- testsuite/gcc.dg/loop-split.c       (revision 0)
> +++ testsuite/gcc.dg/loop-split.c       (working copy)
> @@ -0,0 +1,141 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-lsplit-details" } */
> +
> +#ifdef __cplusplus
> +extern "C" int printf (const char *, ...);
> +extern "C" void abort (void);
> +#else
> +extern int printf (const char *, ...);
> +extern void abort (void);
> +#endif
> +
> +#ifndef TRACE
> +#define TRACE 0
> +#endif
> +
> +#define loop(beg,step,beg2,cond1,cond2) \
> +    do \
> +      { \
> +       sum = 0; \
> +        for (i = (beg), j = (beg2); (cond1); i+=(step),j+=(step)) \
> +          { \
> +            if (cond2) { \
> +             if (TRACE > 1) printf ("a: %d %d\n", i, j); \
> +              sum += a[i]; \
> +           } else { \
> +             if (TRACE > 1) printf ("b: %d %d\n", i, j); \
> +              sum += b[i]; \
> +           } \
> +          } \
> +       if (TRACE > 0) printf ("sum: %d\n", sum); \
> +       check = check * 47 + sum; \
> +      } while (0)
> +
> +#if 1
> +unsigned __attribute__((noinline, noclone)) dotest (int beg, int end, int step,
> +                                              int c, int *a, int *b, int beg2)
> +{
> +  unsigned check = 0;
> +  int sum;
> +  int i, j;
> +  loop (beg, 1, beg2, i < end, j < c);
> +  loop (beg, 1, beg2, i <= end, j < c);
> +  loop (beg, 1, beg2, i < end, j <= c);
> +  loop (beg, 1, beg2, i <= end, j <= c);
> +  loop (beg, 1, beg2, i < end, j > c);
> +  loop (beg, 1, beg2, i <= end, j > c);
> +  loop (beg, 1, beg2, i < end, j >= c);
> +  loop (beg, 1, beg2, i <= end, j >= c);
> +  beg2 += end-beg;
> +  loop (end, -1, beg2, i >= beg, j >= c);
> +  loop (end, -1, beg2, i >= beg, j > c);
> +  loop (end, -1, beg2, i > beg, j >= c);
> +  loop (end, -1, beg2, i > beg, j > c);
> +  loop (end, -1, beg2, i >= beg, j <= c);
> +  loop (end, -1, beg2, i >= beg, j < c);
> +  loop (end, -1, beg2, i > beg, j <= c);
> +  loop (end, -1, beg2, i > beg, j < c);
> +  return check;
> +}
> +
> +#else
> +
> +int __attribute__((noinline, noclone)) f (int beg, int end, int step,
> +                                         int c, int *a, int *b, int beg2)
> +{
> +  int sum = 0;
> +  int i, j;
> +  //for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
> +  for (i = end, j = beg2 + (end-beg); i > beg; i += -1, j-- /*step*/)
> +    {
> +      // i - j == X --> i = X + j
> +      // --> i < end == X+j < end == j < end - X
> +      // --> newend = end - (i_init - j_init)
> +      // j < end-X && j < c --> j < min(end-X,c)
> +      // j < end-X && j <= c --> j <= min(end-X-1,c) or j < min(end-X,c+1{OF!})
> +      //if (j < c)
> +      if (j >= c)
> +       printf ("a: %d %d\n", i, j);
> +      /*else
> +       printf ("b: %d %d\n", i, j);*/
> +       /*sum += a[i];
> +      else
> +       sum += b[i];*/
> +    }
> +  return sum;
> +}
> +
> +int __attribute__((noinline, noclone)) f2 (int *beg, int *end, int step,
> +                                         int *c, int *a, int *b, int *beg2)
> +{
> +  int sum = 0;
> +  int *i, *j;
> +  for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
> +    {
> +      if (j <= c)
> +       printf ("%d %d\n", i - beg, j - beg);
> +       /*sum += a[i];
> +      else
> +       sum += b[i];*/
> +    }
> +  return sum;
> +}
> +#endif
> +
> +extern int printf (const char *, ...);
> +
> +int main ()
> +{
> +  int a[] = {0,0,0,0,0, 1,2,3,4,5,6,7,8,9,          0,0,0,0,0};
> +  int b[] = {0,0,0,0,0, -1,-2,-3,-4,-5,-6,-7,-8,-9, 0,0,0,0,0,};
> +  int c;
> +  int diff = 0;
> +  unsigned check = 0;
> +  //dotest (0, 9, 1, -1, a+5, b+5, -1);
> +  //return 0;
> +  //f (0, 9, 1, -1, a+5, b+5, -1);
> +  //return 0;
> +  for (diff = -5; diff <= 5; diff++)
> +    {
> +      for (c = -1; c <= 10; c++)
> +       {
> +#if 0
> +         int s = f (0, 9, 1, c, a+5, b+5, diff);
> +         //int s = f2 (a+0, a+9, 1, a+c, a+5, b+5, a+diff);
> +         printf ("%d ", s);
> +#else
> +         if (TRACE > 0)
> +           printf ("check %d %d\n", c, diff);
> +         check = check * 51 + dotest (0, 9, 1, c, a+5, b+5, diff);
> +#endif
> +       }
> +      //printf ("\n");
> +    }
> +  //printf ("%u\n", check);
> +  if (check != 3213344948)
> +    abort ();
> +  return 0;
> +}
> +
> +/* All 16 loops in dotest should be split.  */
> +/* { dg-final { scan-tree-dump-times "Loop split" 16 "lsplit" } } */


* Re: Gimple loop splitting
  2016-07-25  7:00 ` Gimple loop splitting Andrew Pinski
@ 2016-07-25 14:27   ` Michael Matz
  0 siblings, 0 replies; 20+ messages in thread
From: Michael Matz @ 2016-07-25 14:27 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: GCC Patches

Hi,

On Sun, 24 Jul 2016, Andrew Pinski wrote:

> What ever happened to this patch?

It got accepted but I deferred inclusion in GCC 6 because it 
was late in the cycle then and performance results didn't show super 
improvements (only looked at cpu2006).  No regressions, but no nice 
speedups either.

> I was looking into doing this myself today but I found this patch. It is 
> stage 1 of GCC 7, it might be a good idea to get this patch into GCC.

Indeed.  If you want to performance-test it on something where you know it 
should help, I'm all ears.

Ciao,
Michael.


end of thread, other threads:[~2016-10-25 16:41 UTC | newest]

Thread overview: 20+ messages
2015-11-12 16:52 Gimple loop splitting Michael Matz
2015-11-12 21:44 ` Jeff Law
2015-11-16 16:06   ` Michael Matz
2015-11-16 23:27     ` Jeff Law
2015-12-01 16:47       ` Gimple loop splitting v2 Michael Matz
2015-12-01 22:57         ` Jeff Law
2015-12-02 13:23           ` Michael Matz
2015-12-05  7:55             ` Jeff Law
2016-10-20 14:43               ` Michael Matz
2016-10-20 14:56                 ` Bin.Cheng
2016-10-24  8:44                   ` Bin.Cheng
2016-10-24  9:02                     ` Michael Matz
2016-10-25 16:41                       ` Tamar Christina
2016-10-20 19:17                 ` Jeff Law
2016-07-25 20:57             ` Andrew Pinski
2016-07-26 11:32               ` Richard Biener
2016-07-27  6:18                 ` Andrew Pinski
2016-07-27  8:11                   ` Richard Biener
2016-07-25  7:00 ` Gimple loop splitting Andrew Pinski
2016-07-25 14:27   ` Michael Matz
