* Gimple loop splitting
@ 2015-11-12 16:52 Michael Matz
2015-11-12 21:44 ` Jeff Law
2016-07-25 7:00 ` Gimple loop splitting Andrew Pinski
0 siblings, 2 replies; 20+ messages in thread
From: Michael Matz @ 2015-11-12 16:52 UTC
To: gcc-patches
Hello,

this new pass implements loop iteration space splitting for loops that
contain a conditional that's always true for one part of the iteration
space and false for the other, i.e. situations like this:

  for (i = beg; i < end; i++)
    if (i < p)
      dothis();
    else
      dothat();

This is transformed into roughly:

  for (i = beg; i < p; i++)
    dothis();
  for (; i < end; i++)
    dothat();

Of course, it's not quite the above, as there need to be provisions for the
border conditions, e.g. if 'p' is outside the original iteration space, or
the conditional doesn't directly use the control IV but some other one, or
the IV runs backwards.  The testcase checks many of these border
conditions.

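To make the border handling concrete, here is a hand-written C sketch of the simplest case only: the guard uses the control IV itself and the IV counts upwards.  It is not output of the pass.  The names sum_guarded, sum_split, check_split and the clamped bound mid are made up for illustration; the pass itself works on GIMPLE and covers more cases.

```c
#include <assert.h>

/* Original form: the guard i < p is evaluated on every iteration.  */
long sum_guarded(const int *a, const int *b, int beg, int end, int p)
{
    long sum = 0;
    for (int i = beg; i < end; i++) {
        if (i < p)
            sum += a[i];
        else
            sum += b[i];
    }
    return sum;
}

/* Hand-split form (illustrative, not what the pass emits).  */
long sum_split(const int *a, const int *b, int beg, int end, int p)
{
    long sum = 0;
    int i = beg;
    /* Clamp the border: the first loop must never run past 'end',
       even when p lies beyond the original iteration space.  */
    int mid = p < end ? p : end;

    for (; i < mid; i++)      /* guard is known to be true here */
        sum += a[i];
    /* If p <= beg the first loop ran zero times and i is still beg,
       so the second loop covers the whole original range.  */
    for (; i < end; i++)      /* guard is known to be false here */
        sum += b[i];
    return sum;
}

/* Compare both forms for one parameter set; asserts on a mismatch.  */
void check_split(const int *a, const int *b, int beg, int end, int p)
{
    assert(sum_guarded(a, b, beg, end, p) == sum_split(a, b, beg, end, p));
}
```

Calling check_split with p below beg, inside [beg, end) and above end exercises the three border situations for this simple case.
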
This transformation is in itself a good one but can also be an enabler for
the vectorizer.  It does increase code size when the loop body also
contains unconditional code (that code is duplicated), so we only transform
hot loops.  I'm a bit unsure of the placement of the new pass, or if it
should be its own pass at all.  Right now I've placed it after unswitching
and scev_cprop, before loop distribution.  Ideally I think all three,
together with loop fusion and a gimple unroller, should be integrated into
one loop nest optimizer; alas, we aren't there yet.

I'm planning to work on loop fusion in the future as well, but that's not
for GCC 6.

I've regstrapped with this pass enabled at -O2 on x86-64-linux, without
regressions.  I've also checked cpu2006 (the non-fortran part) for
correctness, not yet for performance.  In the end it should probably only
be enabled for -O3+ (although if the whole loop body is conditional it
makes sense to also have it with -O2 because code growth is very small
then).

So, okay for trunk?

Ciao,
Michael.

* passes.def (pass_loop_split): Add.
* timevar.def (TV_LOOP_SPLIT): Add.
* tree-pass.h (make_pass_loop_split): Declare.
* tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
* tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
cfganal.h, tree-chrec.h, tree-affine.h, tree-scalar-evolution.h,
gimple-pretty-print.h, gimple-fold.h, gimplify-me.h.
(split_at_bb_p, patch_loop_exit, find_or_create_guard_phi,
split_loop, tree_ssa_split_loops,
make_pass_loop_split): New functions.
(pass_data_loop_split): New.
(pass_loop_split): New.
testsuite/
* gcc.dg/loop-split.c: New test.
Index: passes.def
===================================================================
--- passes.def (revision 229763)
+++ passes.def (working copy)
@@ -233,6 +233,7 @@ along with GCC; see the file COPYING3.
NEXT_PASS (pass_dce);
NEXT_PASS (pass_tree_unswitch);
NEXT_PASS (pass_scev_cprop);
+ NEXT_PASS (pass_loop_split);
NEXT_PASS (pass_record_bounds);
NEXT_PASS (pass_loop_distribution);
NEXT_PASS (pass_copy_prop);
Index: timevar.def
===================================================================
--- timevar.def (revision 229763)
+++ timevar.def (working copy)
@@ -179,6 +179,7 @@ DEFTIMEVAR (TV_LIM , "
DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv")
DEFTIMEVAR (TV_SCEV_CONST , "scev constant prop")
DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH , "tree loop unswitching")
+DEFTIMEVAR (TV_LOOP_SPLIT , "loop splitting")
DEFTIMEVAR (TV_COMPLETE_UNROLL , "complete unrolling")
DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
DEFTIMEVAR (TV_TREE_VECTORIZATION , "tree vectorization")
Index: tree-pass.h
===================================================================
--- tree-pass.h (revision 229763)
+++ tree-pass.h (working copy)
@@ -366,6 +366,7 @@ extern gimple_opt_pass *make_pass_tree_n
extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
Index: tree-ssa-loop-manip.h
===================================================================
--- tree-ssa-loop-manip.h (revision 229763)
+++ tree-ssa-loop-manip.h (working copy)
@@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
bool, tree *, tree *);
+extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
+ struct loop *);
extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
extern void verify_loop_closed_ssa (bool);
Index: tree-ssa-loop-unswitch.c
===================================================================
--- tree-ssa-loop-unswitch.c (revision 229763)
+++ tree-ssa-loop-unswitch.c (working copy)
@@ -31,12 +31,20 @@ along with GCC; see the file COPYING3.
#include "tree-ssa.h"
#include "tree-ssa-loop-niter.h"
#include "tree-ssa-loop.h"
+#include "tree-ssa-loop-manip.h"
#include "tree-into-ssa.h"
+#include "cfganal.h"
#include "cfgloop.h"
+#include "tree-chrec.h"
+#include "tree-affine.h"
+#include "tree-scalar-evolution.h"
#include "params.h"
#include "tree-inline.h"
#include "gimple-iterator.h"
+#include "gimple-pretty-print.h"
#include "cfghooks.h"
+#include "gimple-fold.h"
+#include "gimplify-me.h"
/* This file implements the loop unswitching, i.e. transformation of loops like
@@ -842,4 +850,551 @@ make_pass_tree_unswitch (gcc::context *c
return new pass_tree_unswitch (ctxt);
}
+/* Return non-NULL (the IV operand of the condition) when BB inside LOOP is
+ a potential iteration space split point, i.e. ends with a condition like
+ "IV < comp", which is true on one side of the iteration space and false
+ on the other, and the split point can be computed. If so, also return
+ the border point in *BORDER and the affine descriptor of that IV in *IV. */
+static tree
+split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv)
+{
+ gimple *last;
+ gcond *stmt;
+ affine_iv iv2;
+
+ /* BB must end in a simple conditional jump. */
+ last = last_stmt (bb);
+ if (!last || gimple_code (last) != GIMPLE_COND)
+ return NULL_TREE;
+ stmt = as_a <gcond *> (last);
+
+ enum tree_code code = gimple_cond_code (stmt);
+
+ /* Only handle relational comparisons, for equality and non-equality
+ we'd have to split the loop into two loops and a middle statement. */
+ switch (code)
+ {
+ case LT_EXPR:
+ case LE_EXPR:
+ case GT_EXPR:
+ case GE_EXPR:
+ break;
+ default:
+ return NULL_TREE;
+ }
+
+ if (loop_exits_from_bb_p (loop, bb))
+ return NULL_TREE;
+
+ tree op0 = gimple_cond_lhs (stmt);
+ tree op1 = gimple_cond_rhs (stmt);
+
+ if (!simple_iv (loop, loop, op0, iv, false))
+ return NULL_TREE;
+ if (!simple_iv (loop, loop, op1, &iv2, false))
+ return NULL_TREE;
+
+ /* Make it so that the first argument of the condition is
+ the looping one. */
+ if (integer_zerop (iv->step))
+ {
+ std::swap (op0, op1);
+ std::swap (*iv, iv2);
+ code = swap_tree_comparison (code);
+ gimple_cond_set_condition (stmt, code, op0, op1);
+ update_stmt (stmt);
+ }
+
+ if (integer_zerop (iv->step))
+ return NULL_TREE;
+ if (!integer_zerop (iv2.step))
+ return NULL_TREE;
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, "Found potential split point: ");
+ print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+ fprintf (dump_file, " { ");
+ print_generic_expr (dump_file, iv->base, TDF_SLIM);
+ fprintf (dump_file, " + I*");
+ print_generic_expr (dump_file, iv->step, TDF_SLIM);
+ fprintf (dump_file, " } %s ", get_tree_code_name (code));
+ print_generic_expr (dump_file, iv2.base, TDF_SLIM);
+ fprintf (dump_file, "\n");
+ }
+
+ *border = iv2.base;
+ return op0;
+}
+
+/* Given a GUARD conditional stmt inside LOOP, which we want to make always
+ true or false depending on INITIAL_TRUE, and adjusted values NEXTVAL
+ (a post-increment IV) and NEWBOUND (the comparator) adjust the loop
+ exit test statement to loop back only if the GUARD statement will
+ also be true/false in the next iteration. */
+
+static void
+patch_loop_exit (struct loop *loop, gcond *guard, tree nextval, tree newbound,
+ bool initial_true)
+{
+ edge exit = single_exit (loop);
+ gcond *stmt = as_a <gcond *> (last_stmt (exit->src));
+ gimple_cond_set_condition (stmt, gimple_cond_code (guard),
+ nextval, newbound);
+ update_stmt (stmt);
+
+ edge stay = single_pred_edge (loop->latch);
+
+ exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
+ stay->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
+
+ if (initial_true)
+ {
+ exit->flags |= EDGE_FALSE_VALUE;
+ stay->flags |= EDGE_TRUE_VALUE;
+ }
+ else
+ {
+ exit->flags |= EDGE_TRUE_VALUE;
+ stay->flags |= EDGE_FALSE_VALUE;
+ }
+}
+
+/* Given an induction variable GUARD_IV, and its affine descriptor IV,
+ find the loop phi node in LOOP defining it directly, or create
+ such a phi node. Return that phi node. */
+
+static gphi *
+find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/)
+{
+ gimple *def = SSA_NAME_DEF_STMT (guard_iv);
+ gphi *phi;
+ if ((phi = dyn_cast <gphi *> (def))
+ && gimple_bb (phi) == loop->header)
+ return phi;
+
+ /* XXX Create the PHI instead. */
+ return NULL;
+}
+
+/* Checks if LOOP contains a conditional block whose condition
+ depends on which side of the iteration space it is on, and if so
+ splits the iteration space into two loops. Returns true if the
+ loop was split. NITER must contain the iteration descriptor for the
+ single exit of LOOP. */
+
+static bool
+split_loop (struct loop *loop, struct tree_niter_desc *niter)
+{
+ basic_block *bbs;
+ unsigned i;
+ bool changed = false;
+ tree guard_iv;
+ tree border;
+ affine_iv iv;
+
+ bbs = get_loop_body (loop);
+
+ /* Find a splitting opportunity. */
+ for (i = 0; i < loop->num_nodes; i++)
+ if ((guard_iv = split_at_bb_p (loop, bbs[i], &border, &iv)))
+ {
+ /* Handling opposite steps is not implemented yet. Neither
+ is handling different step sizes. */
+ if ((tree_int_cst_sign_bit (iv.step)
+ != tree_int_cst_sign_bit (niter->control.step))
+ || !tree_int_cst_equal (iv.step, niter->control.step))
+ continue;
+
+ /* Find a loop PHI node that defines guard_iv directly,
+ or create one doing that. */
+ gphi *phi = find_or_create_guard_phi (loop, guard_iv, &iv);
+ if (!phi)
+ continue;
+ gcond *guard_stmt = as_a<gcond *> (last_stmt (bbs[i]));
+ tree guard_init = PHI_ARG_DEF_FROM_EDGE (phi,
+ loop_preheader_edge (loop));
+ enum tree_code guard_code = gimple_cond_code (guard_stmt);
+
+ /* Loop splitting is implemented by versioning the loop, placing
+ the new loop in front of the old loop, make the first loop iterate
+ as long as the conditional stays true (or false) and let the
+ second (original) loop handle the rest of the iterations.
+
+ First we need to determine if the condition will start being true
+ or false in the first loop. */
+ bool initial_true;
+ switch (guard_code)
+ {
+ case LT_EXPR:
+ case LE_EXPR:
+ initial_true = !tree_int_cst_sign_bit (iv.step);
+ break;
+ case GT_EXPR:
+ case GE_EXPR:
+ initial_true = tree_int_cst_sign_bit (iv.step);
+ break;
+ default:
+ gcc_unreachable ();
+ }
+
+ /* Build a condition that will skip the first loop when the
+ guard condition won't ever be true (or false). */
+ tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
+ if (initial_true)
+ cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+
+ /* Now version the loop, we will then have this situation:
+ if (!cond)
+ for (...) {body} //floop
+ else
+ for (...) {body} //loop
+ join: */
+ initialize_original_copy_tables ();
+ basic_block cond_bb;
+ struct loop *floop = loop_version (loop, cond, &cond_bb,
+ REG_BR_PROB_BASE, REG_BR_PROB_BASE,
+ REG_BR_PROB_BASE, false);
+ gcc_assert (floop);
+ update_ssa (TODO_update_ssa);
+
+ /* Now diddle the exit edge of the first loop (floop->join in the
+ above) to either go to the common exit (join) or to the second
+ loop, depending on if there are still iterations left, or not.
+ We split the floop exit edge and insert a copy of the
+ original exit expression into the new block, that either
+ skips the second loop or goes to it. */
+ edge exit = single_exit (floop);
+ basic_block skip_bb = split_edge (exit);
+ gcond *skip_stmt;
+ gimple_stmt_iterator gsi;
+ edge new_e, skip_e;
+
+ gimple *stmt = last_stmt (exit->src);
+ skip_stmt = gimple_build_cond (gimple_cond_code (stmt),
+ gimple_cond_lhs (stmt),
+ gimple_cond_rhs (stmt),
+ NULL_TREE, NULL_TREE);
+ gsi = gsi_last_bb (skip_bb);
+ gsi_insert_after (&gsi, skip_stmt, GSI_NEW_STMT);
+
+ skip_e = EDGE_SUCC (skip_bb, 0);
+ skip_e->flags &= ~EDGE_FALLTHRU;
+ new_e = make_edge (skip_bb, loop_preheader_edge (loop)->src, 0);
+ if (exit->flags & EDGE_TRUE_VALUE)
+ {
+ skip_e->flags |= EDGE_TRUE_VALUE;
+ new_e->flags |= EDGE_FALSE_VALUE;
+ }
+ else
+ {
+ skip_e->flags |= EDGE_FALSE_VALUE;
+ new_e->flags |= EDGE_TRUE_VALUE;
+ }
+
+ new_e->count = skip_bb->count;
+ new_e->probability = PROB_LIKELY;
+ new_e->count = apply_probability (skip_e->count, PROB_LIKELY);
+ skip_e->count -= new_e->count;
+ skip_e->probability = inverse_probability (PROB_LIKELY);
+
+ /* Now we have created this situation:
+ if (!cond) {
+ for (...) {body; if (cexit) break;}
+ if (!cexit) goto second;
+ } else {
+ second:
+ for (...) {body; if (cexit) break;}
+ }
+ join:
+
+ The second loop can now be entered by skipping the first
+ loop (the initial values of its PHI nodes will be the
+ original initial values), or by falling in from the first
+ loop (the initial values will be the continuation values
+ from the first loop). Insert PHI nodes reflecting this
+ in the pre-header of the second loop. */
+
+ basic_block rest = loop_preheader_edge (loop)->src;
+ edge skip_first = find_edge (cond_bb, rest);
+ gcc_assert (skip_first);
+
+ edge firste = loop_preheader_edge (floop);
+ edge seconde = loop_preheader_edge (loop);
+ edge firstn = loop_latch_edge (floop);
+ gphi *new_guard_phi = 0;
+ gphi_iterator psi_first, psi_second;
+ for (psi_first = gsi_start_phis (floop->header),
+ psi_second = gsi_start_phis (loop->header);
+ !gsi_end_p (psi_first);
+ gsi_next (&psi_first), gsi_next (&psi_second))
+ {
+ tree init, next, new_init;
+ use_operand_p op;
+ gphi *phi_first = psi_first.phi ();
+ gphi *phi_second = psi_second.phi ();
+
+ if (phi_second == phi)
+ new_guard_phi = phi_first;
+
+ init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste);
+ next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn);
+ op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde);
+ gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
+
+ /* Prefer using original variable as a base for the new ssa name.
+ This is necessary for virtual ops, and useful in order to avoid
+ losing debug info for real ops. */
+ if (TREE_CODE (next) == SSA_NAME
+ && useless_type_conversion_p (TREE_TYPE (next),
+ TREE_TYPE (init)))
+ new_init = copy_ssa_name (next);
+ else if (TREE_CODE (init) == SSA_NAME
+ && useless_type_conversion_p (TREE_TYPE (init),
+ TREE_TYPE (next)))
+ new_init = copy_ssa_name (init);
+ else if (useless_type_conversion_p (TREE_TYPE (next),
+ TREE_TYPE (init)))
+ new_init = make_temp_ssa_name (TREE_TYPE (next), NULL,
+ "unrinittmp");
+ else
+ new_init = make_temp_ssa_name (TREE_TYPE (init), NULL,
+ "unrinittmp");
+
+ gphi * newphi = create_phi_node (new_init, rest);
+ add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION);
+ add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION);
+ SET_USE (op, new_init);
+ }
+
+ /* The iterations of the second loop are now already
+ exactly those that the first loop didn't do, but the
+ iteration space of the first loop is still the original one.
+ Build a new one, exactly covering those iterations where
+ the conditional is true (or false). For example, from such a loop:
+
+ for (i = beg, j = beg2; i < end; i++, j++)
+ if (j < c) // this is supposed to be true
+ ...
+
+ we build new bounds and change the exit conditions such that
+ it's effectively this:
+
+ newend = min (end+beg2-beg, c)
+ for (i = beg, j = beg2; j < newend; i++, j++)
+ if (j < c)
+ ...
+
+ Depending on the direction of the IVs and if the exit tests
+ are strict or include equality we need to use MIN or MAX,
+ and add or subtract 1. */
+
+ gimple_seq stmts = NULL;
+ /* The niter structure contains the after-increment IV, we need
+ the loop-enter base, so subtract STEP once. */
+ tree controlbase = force_gimple_operand (niter->control.base,
+ &stmts, true, NULL_TREE);
+ tree controlstep = niter->control.step;
+ tree enddiff;
+ if (POINTER_TYPE_P (TREE_TYPE (controlbase)))
+ {
+ controlstep = gimple_build (&stmts, NEGATE_EXPR,
+ TREE_TYPE (controlstep), controlstep);
+ enddiff = gimple_build (&stmts, POINTER_PLUS_EXPR,
+ TREE_TYPE (controlbase),
+ controlbase, controlstep);
+ }
+ else
+ enddiff = gimple_build (&stmts, MINUS_EXPR,
+ TREE_TYPE (controlbase),
+ controlbase, controlstep);
+
+ /* Compute beg-beg2. */
+ if (POINTER_TYPE_P (TREE_TYPE (enddiff)))
+ {
+ tree tem = gimple_convert (&stmts, sizetype, guard_init);
+ tem = gimple_build (&stmts, NEGATE_EXPR, sizetype, tem);
+ enddiff = gimple_build (&stmts, POINTER_PLUS_EXPR,
+ TREE_TYPE (enddiff),
+ enddiff, tem);
+ }
+ else
+ enddiff = gimple_build (&stmts, MINUS_EXPR, TREE_TYPE (enddiff),
+ enddiff, guard_init);
+
+ /* Compute end-(beg-beg2). */
+ gimple_seq stmts2;
+ tree newbound = force_gimple_operand (niter->bound, &stmts2,
+ true, NULL_TREE);
+ gimple_seq_add_seq_without_update (&stmts, stmts2);
+
+ if (POINTER_TYPE_P (TREE_TYPE (enddiff))
+ || POINTER_TYPE_P (TREE_TYPE (newbound)))
+ {
+ enddiff = gimple_convert (&stmts, sizetype, enddiff);
+ enddiff = gimple_build (&stmts, NEGATE_EXPR, sizetype, enddiff);
+ newbound = gimple_build (&stmts, POINTER_PLUS_EXPR,
+ TREE_TYPE (newbound),
+ newbound, enddiff);
+ }
+ else
+ newbound = gimple_build (&stmts, MINUS_EXPR, TREE_TYPE (enddiff),
+ newbound, enddiff);
+
+ /* Depending on the direction of the IVs the new bound for the first
+ loop is the minimum or maximum of old bound and border.
+ Also, if the guard condition isn't strictly less or greater,
+ we need to adjust the bound. */
+ int addbound = 0;
+ enum tree_code minmax;
+ if (niter->cmp == LT_EXPR)
+ {
+ /* GT and LE are the same, inverted. */
+ if (guard_code == GT_EXPR || guard_code == LE_EXPR)
+ addbound = -1;
+ minmax = MIN_EXPR;
+ }
+ else
+ {
+ gcc_assert (niter->cmp == GT_EXPR);
+ if (guard_code == GE_EXPR || guard_code == LT_EXPR)
+ addbound = 1;
+ minmax = MAX_EXPR;
+ }
+
+ if (addbound)
+ {
+ tree type2 = TREE_TYPE (newbound);
+ if (POINTER_TYPE_P (type2))
+ type2 = sizetype;
+ newbound = gimple_build (&stmts,
+ POINTER_TYPE_P (TREE_TYPE (newbound))
+ ? POINTER_PLUS_EXPR : PLUS_EXPR,
+ TREE_TYPE (newbound),
+ newbound,
+ build_int_cst (type2, addbound));
+ }
+
+ tree newend = gimple_build (&stmts, minmax, TREE_TYPE (border),
+ border, newbound);
+ if (stmts)
+ gsi_insert_seq_on_edge_immediate (loop_preheader_edge (floop),
+ stmts);
+
+ /* Now patch the exit block of the first loop to compare
+ the post-increment value of the guarding IV with the new end
+ value. */
+ tree new_guard_next = PHI_ARG_DEF_FROM_EDGE (new_guard_phi,
+ loop_latch_edge (floop));
+ patch_loop_exit (floop, guard_stmt, new_guard_next, newend,
+ initial_true);
+
+ /* Finally patch out the two copies of the condition to be always
+ true/false (or opposite). */
+ gcond *force_true = as_a<gcond *> (last_stmt (get_bb_copy (bbs[i])));
+ gcond *force_false = as_a<gcond *> (last_stmt (bbs[i]));
+ if (!initial_true)
+ std::swap (force_true, force_false);
+ gimple_cond_make_true (force_true);
+ gimple_cond_make_false (force_false);
+ update_stmt (force_true);
+ update_stmt (force_false);
+
+ free_original_copy_tables ();
+
+ /* We destroyed LCSSA form above. Eventually we might be able
+ to fix it on the fly, for now simply punt and use the helper. */
+ rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, floop);
+
+ changed = true;
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, ";; Loop split.\n");
+
+ /* Only deal with the first opportunity. */
+ break;
+ }
+
+ free (bbs);
+ return changed;
+}
+
+/* Main entry point. Perform loop splitting on all suitable loops. */
+
+static unsigned int
+tree_ssa_split_loops (void)
+{
+ struct loop *loop;
+ bool changed = false;
+
+ gcc_assert (scev_initialized_p ());
+ /* Go through all loops starting from innermost. */
+ FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
+ {
+ struct tree_niter_desc niter;
+ if (single_exit (loop)
+ /* ??? We could handle non-empty latches when we split
+ the latch edge (not the exit edge), and put the new
+ exit condition in the new block. OTOH this executes some
+ code unconditionally that might have been skipped by the
+ original exit before. */
+ && empty_block_p (loop->latch)
+ && !optimize_loop_for_size_p (loop)
+ && number_of_iterations_exit (loop, single_exit (loop), &niter,
+ false, true)
+ /* We can't yet handle loops controlled by a != predicate. */
+ && niter.cmp != NE_EXPR)
+ changed |= split_loop (loop, &niter);
+ }
+
+ if (changed)
+ return TODO_cleanup_cfg;
+ return 0;
+}
+
+/* Loop splitting pass. */
+
+namespace {
+
+const pass_data pass_data_loop_split =
+{
+ GIMPLE_PASS, /* type */
+ "lsplit", /* name */
+ OPTGROUP_LOOP, /* optinfo_flags */
+ TV_LOOP_SPLIT, /* tv_id */
+ PROP_cfg, /* properties_required */
+ 0, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ 0, /* todo_flags_finish */
+};
+
+class pass_loop_split : public gimple_opt_pass
+{
+public:
+ pass_loop_split (gcc::context *ctxt)
+ : gimple_opt_pass (pass_data_loop_split, ctxt)
+ {}
+
+ /* opt_pass methods: */
+ virtual bool gate (function *) { return optimize >= 2; }
+ virtual unsigned int execute (function *);
+
+}; // class pass_loop_split
+
+unsigned int
+pass_loop_split::execute (function *fun)
+{
+ if (number_of_loops (fun) <= 1)
+ return 0;
+
+ return tree_ssa_split_loops ();
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_loop_split (gcc::context *ctxt)
+{
+ return new pass_loop_split (ctxt);
+}
Index: testsuite/gcc.dg/loop-split.c
===================================================================
--- testsuite/gcc.dg/loop-split.c (revision 0)
+++ testsuite/gcc.dg/loop-split.c (working copy)
@@ -0,0 +1,141 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-lsplit-details" } */
+
+#ifdef __cplusplus
+extern "C" int printf (const char *, ...);
+extern "C" void abort (void);
+#else
+extern int printf (const char *, ...);
+extern void abort (void);
+#endif
+
+#ifndef TRACE
+#define TRACE 0
+#endif
+
+#define loop(beg,step,beg2,cond1,cond2) \
+ do \
+ { \
+ sum = 0; \
+ for (i = (beg), j = (beg2); (cond1); i+=(step),j+=(step)) \
+ { \
+ if (cond2) { \
+ if (TRACE > 1) printf ("a: %d %d\n", i, j); \
+ sum += a[i]; \
+ } else { \
+ if (TRACE > 1) printf ("b: %d %d\n", i, j); \
+ sum += b[i]; \
+ } \
+ } \
+ if (TRACE > 0) printf ("sum: %d\n", sum); \
+ check = check * 47 + sum; \
+ } while (0)
+
+#if 1
+unsigned __attribute__((noinline, noclone)) dotest (int beg, int end, int step,
+ int c, int *a, int *b, int beg2)
+{
+ unsigned check = 0;
+ int sum;
+ int i, j;
+ loop (beg, 1, beg2, i < end, j < c);
+ loop (beg, 1, beg2, i <= end, j < c);
+ loop (beg, 1, beg2, i < end, j <= c);
+ loop (beg, 1, beg2, i <= end, j <= c);
+ loop (beg, 1, beg2, i < end, j > c);
+ loop (beg, 1, beg2, i <= end, j > c);
+ loop (beg, 1, beg2, i < end, j >= c);
+ loop (beg, 1, beg2, i <= end, j >= c);
+ beg2 += end-beg;
+ loop (end, -1, beg2, i >= beg, j >= c);
+ loop (end, -1, beg2, i >= beg, j > c);
+ loop (end, -1, beg2, i > beg, j >= c);
+ loop (end, -1, beg2, i > beg, j > c);
+ loop (end, -1, beg2, i >= beg, j <= c);
+ loop (end, -1, beg2, i >= beg, j < c);
+ loop (end, -1, beg2, i > beg, j <= c);
+ loop (end, -1, beg2, i > beg, j < c);
+ return check;
+}
+
+#else
+
+int __attribute__((noinline, noclone)) f (int beg, int end, int step,
+ int c, int *a, int *b, int beg2)
+{
+ int sum = 0;
+ int i, j;
+ //for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
+ for (i = end, j = beg2 + (end-beg); i > beg; i += -1, j-- /*step*/)
+ {
+ // i - j == X --> i = X + j
+ // --> i < end == X+j < end == j < end - X
+ // --> newend = end - (i_init - j_init)
+ // j < end-X && j < c --> j < min(end-X,c)
+ // j < end-X && j <= c --> j <= min(end-X-1,c) or j < min(end-X,c+1{OF!})
+ //if (j < c)
+ if (j >= c)
+ printf ("a: %d %d\n", i, j);
+ /*else
+ printf ("b: %d %d\n", i, j);*/
+ /*sum += a[i];
+ else
+ sum += b[i];*/
+ }
+ return sum;
+}
+
+int __attribute__((noinline, noclone)) f2 (int *beg, int *end, int step,
+ int *c, int *a, int *b, int *beg2)
+{
+ int sum = 0;
+ int *i, *j;
+ for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
+ {
+ if (j <= c)
+ printf ("%d %d\n", i - beg, j - beg);
+ /*sum += a[i];
+ else
+ sum += b[i];*/
+ }
+ return sum;
+}
+#endif
+
+extern int printf (const char *, ...);
+
+int main ()
+{
+ int a[] = {0,0,0,0,0, 1,2,3,4,5,6,7,8,9, 0,0,0,0,0};
+ int b[] = {0,0,0,0,0, -1,-2,-3,-4,-5,-6,-7,-8,-9, 0,0,0,0,0,};
+ int c;
+ int diff = 0;
+ unsigned check = 0;
+ //dotest (0, 9, 1, -1, a+5, b+5, -1);
+ //return 0;
+ //f (0, 9, 1, -1, a+5, b+5, -1);
+ //return 0;
+ for (diff = -5; diff <= 5; diff++)
+ {
+ for (c = -1; c <= 10; c++)
+ {
+#if 0
+ int s = f (0, 9, 1, c, a+5, b+5, diff);
+ //int s = f2 (a+0, a+9, 1, a+c, a+5, b+5, a+diff);
+ printf ("%d ", s);
+#else
+ if (TRACE > 0)
+ printf ("check %d %d\n", c, diff);
+ check = check * 51 + dotest (0, 9, 1, c, a+5, b+5, diff);
+#endif
+ }
+ //printf ("\n");
+ }
+ //printf ("%u\n", check);
+ if (check != 3213344948)
+ abort ();
+ return 0;
+}
+
+/* All 16 loops in dotest should be split. */
+/* { dg-final { scan-tree-dump-times "Loop split" 16 "lsplit" } } */
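
The bound computation described in the split_loop comments (a guard on a second IV j that runs in lockstep with the control IV i) can be illustrated in plain C.  This is a hand-written sketch under assumptions, not what the pass emits: the names sum_guarded2, sum_split2, check_split2, jend and newend are illustrative, both IVs step by +1, both tests are strict, and possible overflow in end - (beg - beg2) is ignored.

```c
#include <assert.h>

/* Original form: control IV i, guard on the lockstep IV j.  */
long sum_guarded2(const int *a, const int *b,
                  int beg, int end, int beg2, int c)
{
    long sum = 0;
    int i, j;
    for (i = beg, j = beg2; i < end; i++, j++) {
        if (j < c)
            sum += a[i];
        else
            sum += b[i];
    }
    return sum;
}

/* Hand-split form.  Since i - j == beg - beg2 throughout,
   i < end is equivalent to j < end - (beg - beg2).  */
long sum_split2(const int *a, const int *b,
                int beg, int end, int beg2, int c)
{
    long sum = 0;
    int i = beg, j = beg2;
    int jend = end - (beg - beg2);      /* old exit bound, expressed in j */
    int newend = jend < c ? jend : c;   /* min (end + beg2 - beg, c) */

    for (; j < newend; i++, j++)        /* guard j < c is true */
        sum += a[i];
    for (; i < end; i++, j++)           /* remaining iterations, guard false */
        sum += b[i];
    return sum;
}

/* Compare both forms for one parameter set; asserts on a mismatch.  */
void check_split2(const int *a, const int *b,
                  int beg, int end, int beg2, int c)
{
    assert(sum_guarded2(a, b, beg, end, beg2, c)
           == sum_split2(a, b, beg, end, beg2, c));
}
```

With non-strict tests or downward-counting IVs the sketch would need MAX instead of MIN and a +1/-1 adjustment of the bound, as the comments in the patch note.
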
* Re: Gimple loop splitting
2015-11-12 16:52 Gimple loop splitting Michael Matz
@ 2015-11-12 21:44 ` Jeff Law
2015-11-16 16:06 ` Michael Matz
2016-07-25 7:00 ` Gimple loop splitting Andrew Pinski
1 sibling, 1 reply; 20+ messages in thread
From: Jeff Law @ 2015-11-12 21:44 UTC
To: Michael Matz, gcc-patches
On 11/12/2015 09:52 AM, Michael Matz wrote:
> Hello,
>
> this new pass implements loop iteration space splitting for loops that
> contain a conditional that's always true for one part of the iteration
> space and false for the other, i.e. such situations:
FWIW, Ajit suggested the same transformation earlier this year. During
that discussion Richi indicated that for hmmer this transformation would
enable vectorization.
>
> This transformation is in itself a good one but can also be an enabler for
> the vectorizer.
Agreed.
> It does increase code size, when the loop body contains
> also unconditional code (that one is duplicated), so we only transform hot
> loops.
Probably ought to be disabled when we're not optimizing for speed as well.
> I'm a bit unsure of the placement of the new pass, or if it should
> be an own pass at all. Right now I've placed it after unswitching and
> scev_cprop, before loop distribution. Ideally I think all three, together
> with loop fusion and an gimple unroller should be integrated into one loop
> nest optimizer, alas, we aren't there yet.
Given its impact on the looping structure, I'd think early in the loop
optimizer. Given the similarities with unswitching, I think
before/after unswitching is a natural first cut. We can always iterate
if it looks like putting it elsewhere would make sense.
> I've regstrapped this pass enabled with -O2 on x86-64-linux, without
> regressions. I've also checked cpu2006 (the non-fortran part) for
> correctness, not yet for performance. In the end it should probably only
> be enabled for -O3+ (although if the whole loop body is conditional it
> makes sense to also have it with -O2 because code growth is very small
> then).
Very curious on the performance side, so if you could get some #s on
that, it'd be greatly appreciated.
I'd be comfortable with this at -O2, but won't object if you'd prefer -O3.
>
> So, okay for trunk?
>
>
> Ciao,
> Michael.
> * passes.def (pass_loop_split): Add.
> * timevar.def (TV_LOOP_SPLIT): Add.
> * tree-pass.h (make_pass_loop_split): Declare.
> * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
> * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
> cfganal.h, tree-chrec.h, tree-affine.h, tree-scalar-evolution.h,
> gimple-pretty-print.h, gimple-fold.h, gimplify-me.h.
> (split_at_bb_p, patch_loop_exit, find_or_create_guard_phi,
> split_loop, tree_ssa_split_loops,
> make_pass_loop_split): New functions.
> (pass_data_loop_split): New.
> (pass_loop_split): New.
>
> testsuite/
> * gcc.dg/loop-split.c: New test.
Please clean up the #if 0/#if 1 code in the new tests. You might also
want to clean out the TRACE stuff. Essentially the tests look like you
just dropped in a test you'd been running by hand until now :-)
I don't see any negative tests -- i.e. tests that should not be split due
to boundary conditions. Do you have any from development? If so it'd
be good to have those too.
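
As an illustration of what such a negative test might look like, here is a hypothetical C sketch; it is not taken from the thread.  It relies only on what the posted split_at_bb_p shows: guards other than <, <=, > and >= are rejected, so an equality guard should not be a split point.  The function names and the dg-final line in the comment are assumptions modelled on the posted loop-split.c, and the dump expectation has not been verified.

```c
#include <assert.h>

/* Hypothetical negative case: the guard is an equality test, which the
   posted split_at_bb_p does not accept as a split point.
   An assumed (unverified) check in the style of the posted test would be:
   { dg-final { scan-tree-dump-times "Loop split" 0 "lsplit" } }  */
int sum_eq_guard(const int *a, const int *b, int beg, int end, int p)
{
    int sum = 0;
    for (int i = beg; i < end; i++) {
        if (i == p)
            sum += a[i];
        else
            sum += b[i];
    }
    return sum;
}

/* Run-time sanity check so the test is still a meaningful "dg-do run".  */
void check_eq_guard(void)
{
    static const int a[] = {1, 2, 3, 4};
    static const int b[] = {10, 20, 30, 40};
    /* p inside the range: exactly one element comes from a.  */
    assert(sum_eq_guard(a, b, 0, 4, 2) == 10 + 20 + 3 + 40);
    /* p outside the range: everything comes from b.  */
    assert(sum_eq_guard(a, b, 0, 4, -1) == 100);
}
```

A loop whose exit test uses != would be another candidate, since tree_ssa_split_loops in the patch skips loops with niter.cmp == NE_EXPR.
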
>
> Index: tree-ssa-loop-manip.h
> ===================================================================
> --- tree-ssa-loop-manip.h (revision 229763)
> +++ tree-ssa-loop-manip.h (working copy)
> @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
>
> extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
> bool, tree *, tree *);
> +extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
> + struct loop *);
> extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
> extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
> extern void verify_loop_closed_ssa (bool);
> Index: tree-ssa-loop-unswitch.c
> ===================================================================
> --- tree-ssa-loop-unswitch.c (revision 229763)
> +++ tree-ssa-loop-unswitch.c (working copy)
Given the amount of new code, unless there's a strong need, I'd prefer
this transformation to be implemented in its own file.
> +
> +/* Give an induction variable GUARD_IV, and its affine descriptor IV,
> + find the loop phi node in LOOP defining it directly, or create
> + such phi node. Return that phi node. */
> +
> +static gphi *
> +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/)
> +{
> + gimple *def = SSA_NAME_DEF_STMT (guard_iv);
> + gphi *phi;
> + if ((phi = dyn_cast <gphi *> (def))
> + && gimple_bb (phi) == loop->header)
> + return phi;
> +
> + /* XXX Create the PHI instead. */
> + return NULL;
So right now we just punt if we need to create the PHI? Does that
happen with any kind of regularity in practice?
> +}
> +
> +/* Checks if LOOP contains an conditional block whose condition
> + depends on which side in the iteration space it is, and if so
> + splits the iteration space into two loops. Returns true if the
> + loop was split. NITER must contain the iteration descriptor for the
> + single exit of LOOP. */
> +
> +static bool
> +split_loop (struct loop *loop, struct tree_niter_desc *niter)
This should probably be broken up a bit more. It's loooong as-is.
Without looking at how much stuff would have to be passed around,
diddling the exit edge of the first loop, phi updates for the 2nd loop,
fix iteration space of 2nd loop, exit block fixup might be a good
initial cut at breaking this down into something of manageable size.
Not sure if the setup and initial versioning should be broken out or not.
> + initialize_original_copy_tables ();
> + basic_block cond_bb;
> + struct loop *floop = loop_version (loop, cond, &cond_bb,
> + REG_BR_PROB_BASE, REG_BR_PROB_BASE,
> + REG_BR_PROB_BASE, false);
> + gcc_assert (floop);
> + update_ssa (TODO_update_ssa);
> +
> + /* Now diddle the exit edge of the first loop (floop->join in the
> + above) to either go to the common exit (join) or to the second
> + loop, depending on if there are still iterations left, or not.
> + We split the floop exit edge and insert a copy of the
> + original exit expression into the new block, that either
> + skips the second loop or goes to it. */
So after diddling, haven't we mucked up the dominator tree and the SSA
graph? You're iterating over each PHI in two loop headers and fixing
the SSA graph by hand AFAICT. But ISTM the dominator tree is still
mucked up, right? I'm thinking specifically about the 2nd loop. Though
perhaps it just works since after all your transformations it'll still
be immediately dominated by the same block as before your transformations.
Overall I think this looks real good. The biggest problem IMHO is
breaking down that monster function a bit. I'm a bit concerned by the
dominator tree state. Worst case is we have to rebuild the dominators
before ensuring we're LCSSA form, and even that doesn't seem too bad.
As I mentioned, it may actually be the case that we're OK on the
dominator tree, kind of by accident more than design.
Jeff
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Gimple loop splitting
2015-11-12 21:44 ` Jeff Law
@ 2015-11-16 16:06 ` Michael Matz
2015-11-16 23:27 ` Jeff Law
0 siblings, 1 reply; 20+ messages in thread
From: Michael Matz @ 2015-11-16 16:06 UTC (permalink / raw)
To: Jeff Law; +Cc: gcc-patches
Hi,
On Thu, 12 Nov 2015, Jeff Law wrote:
> > this new pass implements loop iteration space splitting for loops that
> > contain a conditional that's always true for one part of the iteration
> > space and false for the other, i.e. such situations:
> FWIW, Ajit suggested the same transformation earlier this year. During that
> discussion Richi indicated that for hmmer this transformation would enable
> vectorization.
It's a prerequisite indeed, but not enough in itself. The next problem
will be that only parts of access chains inside the hot loop are
vectorizable, but for that those parts need to be disambiguated. ICC is
doing that by a massive chain of conditionals testing non-overlapping of
the respective arrays at runtime. Our vectorizer could also do that
(possibly by increasing the allowed number of conditionals), but the next
problem then is that one of these (then indeed separated) parts is not
vectorizable by our vectorizer: it's a 'a[i] = f(a[i-1])' dependency that
can't yet be handled by us. If the separation of parts would be done by
loop distribution that would be fine (we'd have separate loops for the
parts, some of them vectorizable), but our loop distribution can't do
runtime disambiguation, only our vectorizer.
hmmer is actually quite interesting because it's a fairly isolated hot
loop posing quite challenging problems for us :)
>
> It does increase code size, when the loop body contains
> > also unconditional code (that one is duplicated), so we only transform hot
> > loops.
>
> Probably ought to be disabled when we're not optimizing for speed as well.
That should be dealt with by '!optimize_loop_for_size_p (loop)'.
> > I've regstrapped this pass enabled with -O2 on x86-64-linux, without
> > regressions. I've also checked cpu2006 (the non-fortran part) for
> > correctness, not yet for performance. In the end it should probably only
> > be enabled for -O3+ (although if the whole loop body is conditional it
> > makes sense to also have it with -O2 because code growth is very small
> > then).
>
> Very curious on the performance side, so if you could get some #s on that,
> it'd be greatly appreciated.
My test machine misbehaved over the weekend, but as soon as I have them
I'll update here.
> > testsuite/
> > * gcc.dg/loop-split.c: New test.
>
> Please clean up the #if 0/#if 1 code in the new tests.
Actually I'd prefer if that test contains the by-hand code and the TRACE
stuff as well, I'd only change the #if 0 into some #if BYHAND or so ...
> You might also want to clean out the TRACE stuff. Essentially the tests
> look like you just dropped in a test you'd been running by hand until
> now :-)
... the reason being, that bugs in the splitter are somewhat unwieldy to
debug by just staring at the dumps, you only get a checksum mismatch, so
TRACE=1 is for finding out which of the params and loops is actually
miscompiled, TRACE=2 for finding the specific iteration that's broken, and
the #if0 code for putting that situation into a non-macroized and smaller
function than dotest. (That's actually how I've run the testcase after I
had it basically working, extending dotest with a couple more lines, aka
example loop situations, adjusting the checksum, and then making a face and
scratching my head and mucking with the TRACE and #if0 macros :) ).
> I don't see any negative tests -- ie tests that should not be split due
> to boundary conditions. Do you have any from development?
Good point, I had some but only ones where I was able to extend the
splitters to cover them. I'll think of some that really shouldn't be
split.
> > Index: tree-ssa-loop-unswitch.c
> > ===================================================================
> > --- tree-ssa-loop-unswitch.c (revision 229763)
> > +++ tree-ssa-loop-unswitch.c (working copy)
> Given the amount of new code, unless there's a strong need, I'd prefer this
> transformation to be implemented in its own file.
Okay.
> > +/* Given an induction variable GUARD_IV, and its affine descriptor IV,
> > + find the loop phi node in LOOP defining it directly, or create
> > + such phi node. Return that phi node. */
> > +
> > +static gphi *
> > +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv *
> > /*iv*/)
> > +{
> > + gimple *def = SSA_NAME_DEF_STMT (guard_iv);
> > + gphi *phi;
> > + if ((phi = dyn_cast <gphi *> (def))
> > + && gimple_bb (phi) == loop->header)
> > + return phi;
> > +
> > + /* XXX Create the PHI instead. */
> > + return NULL;
>
> So right now we just punt if we need to create the PHI? Does that
> happen with any kind of regularity in practice?
Only with such situations:
for (int i = start; i < end; i++) {
if (i + offset < bound)
...
}
Here the condition-IV is not directly defined by a PHI node. If it
happens often I don't know, I guess the usual situation is testing the
control IV directly. The deficiency is not hard to fix.
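A minimal sketch of that situation and of the split one would want from it, assuming stand-in loop bodies. The names run_original and run_split are hypothetical and do not appear in the patch. The border for i is bound - offset, clamped into [start, end] so that a split point outside the iteration space simply leaves one of the two loops empty; the sketch also assumes that i + offset and bound - offset do not overflow:

```c
#include <assert.h>

/* Original form: the guard tests i + offset, not the control IV itself.  */
static long
run_original (int start, int end, int offset, int bound)
{
  long sum = 0;
  for (int i = start; i < end; i++)
    {
      if (i + offset < bound)
        sum += 3L * i;   /* stand-in for the "true" arm */
      else
        sum -= i;        /* stand-in for the "false" arm */
    }
  return sum;
}

/* Hand-split form: first loop runs while the guard is true, the second
   loop picks up at the border and runs the "false" arm only.  */
static long
run_split (int start, int end, int offset, int bound)
{
  long sum = 0;
  int border = bound - offset;   /* i + offset < bound  <=>  i < border */

  /* Clamp the border into the iteration space.  */
  if (border < start)
    border = start;
  if (border > end)
    border = end;

  int i;
  for (i = start; i < border; i++)
    sum += 3L * i;
  for (; i < end; i++)
    sum -= i;
  return sum;
}
```

Both functions should agree for any arguments within the stated assumptions, e.g. run_original (0, 10, 2, 5) and run_split (0, 10, 2, 5).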
> > +static bool
> > +split_loop (struct loop *loop, struct tree_niter_desc *niter)
> This should probably be broken up a bit more. It's loooong as-is.
>
> Without looking at how much stuff would have to be passed around,
> diddling the exit edge of the first loop, phi updates for the 2nd loop,
> fix iteration space of 2nd loop, exit block fixup might be a good
> initial cut at breaking this down into something of manageable size.
Thanks, I'll do that.
> > + initialize_original_copy_tables ();
> > + basic_block cond_bb;
> > + struct loop *floop = loop_version (loop, cond, &cond_bb,
> > + REG_BR_PROB_BASE, REG_BR_PROB_BASE,
> > + REG_BR_PROB_BASE, false);
> > + gcc_assert (floop);
> > + update_ssa (TODO_update_ssa);
> > +
> > + /* Now diddle the exit edge of the first loop (floop->join in the
> > + above) to either go to the common exit (join) or to the second
> > + loop, depending on if there are still iterations left, or not.
> > + We split the floop exit edge and insert a copy of the
> > + original exit expression into the new block, that either
> > + skips the second loop or goes to it. */
>
> So after diddling, haven't we mucked up the dominator tree and the SSA
> graph? You're iterating over each PHI in two loop headers and fixing the
> SSA graph by hand AFAICT. But ISTM the dominator tree is still mucked
> up, right?
I think I convinced myself on paper that the dominator tree is correct due
to our helpers doing the right thing (loop_version() for the initial
loop copying and split_edge for the above diddling). Let's see if I can
paint some ASCII art. So, after loop_version (which updates dom) we
have:
   .------if (cond)-------.
   v                      v
  pre1                   pre2
   |                      |
   h1<----.               h2<----.
   |      |               |      |
.--ex1    |        .------ex2    |
|   \     |        |       \     |
|    l1---'        |        l2---'
|                  |
|                  |
'--X--------->join<'
At this point dominators are all correct (due to loop_version updating
them), in particular dom(pre1)==dom(pre2)==if(cond). Now we split
ex1->join at X, and split_edge also updates them (trivially), but we
insert a new edge from split_bb to pre2. There are no paths from region2
into region1, and anything in region2 except pre2 is still dominated by
pre2 (or something further down), so if anything changes, then dom(pre2).
      .------if (cond)----.
      v                   |
     pre1                 |
      |                   |
      h1<----.            |
      |      |            |
      ex1    |            |
      |  \   |            |
      |   l1-'            |
      v                   |
.-split-----------.       |
|                 v       |
|                pre2<----'
|                 |
|                 h2<----.
|                 |      |
|                 ex2    |
|                 |  \   |
|                 |  l2--'
|               .-'
'------>join<--'
But there's a path directly to pre2, skipping whole region1, so dom(pre2)
must be still if(cond), as originally. Also dom(join) doesn't change,
because what was first a normal diamond between
if(cond),region1,region2,join now is a meddled diamond with paths from
region1 to region2, but not back, so the dominator of the join block still
is the if(cond) block.
This is all true if the internal structure of region1/region2 is sensible,
and single_exit() regions are such. Even multiple exits to something
behind join wouldn't change this, but we don't even have to think about
this.
In addition, anything not updating dominators correctly would scream
loudly in the verifier.
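The dominator claim for the second picture can also be checked mechanically. The following is a small sketch, not GCC code: it encodes that CFG by hand (block names taken from the ASCII art, all identifiers hypothetical) and computes dominator sets with the textbook iterative intersection over predecessors. The immediate dominator of a block is then its strict dominator with the largest dominator set.

```c
#include <assert.h>
#include <stddef.h>

enum { COND, PRE1, H1, EX1, L1, SPLIT, PRE2, H2, EX2, L2, JOIN, NBLOCKS };

/* Edges of the CFG after splitting ex1->join and adding split->pre2.  */
static const int edges[][2] = {
  { COND, PRE1 }, { COND, PRE2 },
  { PRE1, H1 }, { H1, EX1 }, { EX1, L1 }, { L1, H1 }, { EX1, SPLIT },
  { SPLIT, PRE2 }, { SPLIT, JOIN },
  { PRE2, H2 }, { H2, EX2 }, { EX2, L2 }, { L2, H2 }, { EX2, JOIN },
};
#define NEDGES (sizeof edges / sizeof edges[0])

static unsigned dom[NBLOCKS];   /* bit d set in dom[b]: d dominates b */

static void
compute_dominators (void)
{
  const unsigned all = (1u << NBLOCKS) - 1;
  for (int b = 0; b < NBLOCKS; b++)
    dom[b] = all;
  dom[COND] = 1u << COND;       /* the entry dominates only itself */

  for (int changed = 1; changed; )
    {
      changed = 0;
      for (int b = 0; b < NBLOCKS; b++)
        {
          if (b == COND)
            continue;
          /* dom(b) = {b} united with the intersection over all preds.  */
          unsigned meet = all;
          for (size_t e = 0; e < NEDGES; e++)
            if (edges[e][1] == b)
              meet &= dom[edges[e][0]];
          unsigned next = meet | (1u << b);
          if (next != dom[b])
            {
              dom[b] = next;
              changed = 1;
            }
        }
    }
}

static int
count_bits (unsigned x)
{
  int n = 0;
  for (; x; x &= x - 1)
    n++;
  return n;
}

/* Return the immediate dominator of B, or -1 for the entry block.  */
static int
immediate_dominator (int b)
{
  compute_dominators ();
  int best = -1, best_size = -1;
  for (int d = 0; d < NBLOCKS; d++)
    if (d != b && (dom[b] & (1u << d)) && count_bits (dom[d]) > best_size)
      {
        best = d;
        best_size = count_bits (dom[d]);
      }
  return best;
}
```

On this graph immediate_dominator (PRE2) and immediate_dominator (JOIN) both yield COND, matching the argument above, while the blocks inside each region keep their dominators within that region.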
The SSA tree is correct after loop_version() and split_edge. The new edge
split_bb->pre2 needs the adjustments in that loop over loop PHI nodes.
That walk must catch everything, if it wouldn't then that would mean a use
in region2 that's defined in region1, that wasn't originally dominated by
the def (and hence must have been a loop-carried value and hence be
defined in the loop header PHI block).
> Overall I think this looks real good. The biggest problem IMHO is
> breaking down that monster function a bit. I'm a bit concerned by the
> dominator tree state. Worst case is we have to rebuild the dominators
> before ensuring we're LCSSA form, and even that doesn't seem too bad.
Actually keeping LCSSA form correct is doable as well, but needs another
loop over one or the other PHI nodes. I punted for now and called
rewrite_into_loop_closed_ssa_1, which actually isn't too expensive for a
single loop.
> As I mentioned, it may actually be the case that we're OK on the
> dominator tree, kind of by accident more than design.
I'm pretty sure it is correct, and it is so by design :)
Thanks for the feedback, I'll update the patch accordingly.
Ciao,
Michael.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Gimple loop splitting
2015-11-16 16:06 ` Michael Matz
@ 2015-11-16 23:27 ` Jeff Law
2015-12-01 16:47 ` Gimple loop splitting v2 Michael Matz
0 siblings, 1 reply; 20+ messages in thread
From: Jeff Law @ 2015-11-16 23:27 UTC (permalink / raw)
To: Michael Matz; +Cc: gcc-patches
On 11/16/2015 09:05 AM, Michael Matz wrote:
> It's a prerequisite indeed, but not enough in itself.
Sigh. OK. One can always hope.
>
> hmmer is actually quite interesting because it's a fairly isolated hot
> loop posing quite challenging problems for us :)
Sounds like it. Essentially, it's a TODO list :-)
>> Probably ought to be disabled when we're not optimizing for speed as well.
>
> That should be dealt with by '!optimize_loop_for_size_p (loop)'.
Doh, must have missed that.
>>
>> Please clean up the #if 0/#if 1 code in the new tests.
>
> Actually I'd prefer if that test contains the by-hand code and the TRACE
> stuff as well, I'd only change the #if 0 into some #if BYHAND or so ...
>
>> You might also want to clean out the TRACE stuff. Essentially the tests
>> look like you just dropped in a test you'd been running by hand until
>> now :-)
>
> ... the reason being, that bugs in the splitter are somewhat unwieldy to
> debug by just staring at the dumps, you only get a checksum mismatch, so
> TRACE=1 is for finding out which of the params and loops is actually
> miscompiled, TRACE=2 for finding the specific iteration that's broken, and
> the #if0 code for putting that situation into a non-macroized and smaller
> function than dotest. (That's actually how I've run the testcase after I
> had it basically working, extending dotest with a couple more lines, aka
> example loop situations, adjusting the checksum, and then making a face and
> scratching my head and mucking with the TRACE and #if0 macros :) ).
OK, if you want to keep them, then have a consistent way to turn them
on/off for future debugging. if0/if1 doesn't provide much of a clue to
someone else what to turn on/off if they need to debug this stuff.
>
>> I don't see any negative tests -- ie tests that should not be split due
>> to boundary conditions. Do you have any from development?
>
> Good point, I had some but only ones where I was able to extend the
> splitters to cover them. I'll think of some that really shouldn't be
> split.
If you've got them, certainly add them. Though I realize they may get
lost over time.
>
> Only with such situations:
>
> for (int i = start; i < end; i++) {
> if (i + offset < bound)
> ...
> }
>
> Here the condition-IV is not directly defined by a PHI node. If it
> happens often I don't know, I guess the usual situation is testing the
> control IV directly. The deficiency is not hard to fix.
I'm comfortable waiting until we see the need.
> I think I convinced myself on paper that the dominator tree is correct due
> to our helpers doing the right thing (loop_version() for the initial
> loop copying and split_edge for the above diddling). Let's see if I can
> paint some ASCII art. So, after loop_version (which updates dom) we
> have:
OK. I was worried about the next step -- where we insert the
conditional on the exit from pre1 to have it transfer to join or pre2.
But in that case, the immediate dominator of pre2 & join is still the
initial if statement. So I think we're OK. That was the conclusion I
was starting to come to yesterday, having the ascii art makes it pretty
clear. I'm just not good at conceptualizing a CFG. I have to see it
explicitly and then everything seems so clear and simple.
jeff
^ permalink raw reply [flat|nested] 20+ messages in thread
* Gimple loop splitting v2
2015-11-16 23:27 ` Jeff Law
@ 2015-12-01 16:47 ` Michael Matz
2015-12-01 22:57 ` Jeff Law
0 siblings, 1 reply; 20+ messages in thread
From: Michael Matz @ 2015-12-01 16:47 UTC (permalink / raw)
To: Jeff Law; +Cc: gcc-patches
Hi,
On Mon, 16 Nov 2015, Jeff Law wrote:
> OK, if you want to keep them, then have a consistent way to turn them
> on/off for future debugging. if0/if1 doesn't provide much of a clue to
> someone else what to turn on/off if they need to debug this stuff.
> > > I don't see any negative tests -- ie tests that should not be split
> > > due to boundary conditions. Do you have any from development?
> >
> > Good point, I had some but only ones where I was able to extend the
> > splitters to cover them. I'll think of some that really shouldn't be
> > split.
> If you've got them, certainly add them. Though I realize they may get
> lost over time.
Actually, thinking a bit more about this, I don't have any that wouldn't
be merely restrictions in the implementation that couldn't be lifted in
the future (e.g. unequal step sizes), so I've added no additional ones.
> But in that case, the immediate dominator of pre2 & join is still the
> initial if statement. So I think we're OK. That was the conclusion I
> was starting to come to yesterday, having the ascii art makes it pretty
> clear. I'm just not good at conceptualizing a CFG. I have to see it
> explicitly and then everything seems so clear and simple.
So, this second version should reflect the review. I've moved everything
to a new file, split the long function into several logically separate
ones, and even included ascii art in the comments :) The testcase got a
comment about what to #define for debugging. I've enabled the pass at
-O3, or alternatively when profile-use is on, similar to -funswitch-loops.
I've also added a proper -fsplit-loops option.
There are two functional changes in v2: a bugfix to not try splitting a
non-iterating loop (irritatingly such a loop returns true from
number_of_iterations_exit, but with an ERROR_MARK comparator), and a
limitation to avoid combinatorial explosion in artificial testcases: Once
we have done a splitting, we don't do any in that loop's parents (we may
still do splitting in siblings or children of siblings).
I've also done some measurements: first, bootstrap time is unaffected, and
regstrapping succeeds without regressions when I activate the pass by
default. Then SPECcpu2006: build times are unaffected, everything builds
and works also with -fsplit-loops, performance is mostly unaffected, base
is -Ofast -funroll-loops -fpeel-loops, peak adds -fsplit-loops.
                                  Estimated                       Estimated
                 Base       Base       Base      Peak       Peak       Peak
Benchmarks       Ref.   Run Time      Ratio      Ref.   Run Time      Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
400.perlbench    9770        325       30.1 *    9770        323       30.3 *
401.bzip2        9650        382       25.2 *    9650        382       25.3 *
403.gcc          8050        242       33.3 *    8050        241       33.4 *
429.mcf          9120        311       29.3 *    9120        311       29.3 *
445.gobmk       10490        392       26.8 *   10490        391       26.8 *
456.hmmer        9330        345       27.0 *    9330        342       27.3 *
458.sjeng       12100        422       28.7 *   12100        420       28.8 *
462.libquantum  20720        308       67.3 *   20720        308       67.3 *
464.h264ref     22130        423       52.3 *   22130        423       52.3 *
471.omnetpp      6250        273       22.9 *    6250        273       22.9 *
473.astar        7020        311       22.6 *    7020        311       22.6 *
483.xalancbmk    6900        191       36.2 *    6900        190       36.2 *
Est. SPECint_base2006                  31.7
Est. SPECint2006                                                       31.7

                                  Estimated                       Estimated
                 Base       Base       Base      Peak       Peak       Peak
Benchmarks       Ref.   Run Time      Ratio      Ref.   Run Time      Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
410.bwaves      13590        235       57.7 *   13590        235       57.8 *
416.gamess                               NR                              NR
433.milc         9180        347       26.5 *    9180        345       26.6 *
434.zeusmp       9100        269       33.9 *    9100        268       33.9 *
435.gromacs      7140        260       27.4 *    7140        262       27.3 *
436.cactusADM   11950        237       50.5 *   11950        240       49.9 *
437.leslie3d     9400        228       41.3 *    9400        228       41.2 *
444.namd         8020        312       25.7 *    8020        311       25.7 *
447.dealII      11440        254       45.0 *   11440        254       45.0 *
450.soplex       8340        201       41.4 *    8340        202       41.4 *
453.povray                               NR                              NR
454.calculix     8250        282       29.2 *    8250        283       29.2 *
459.GemsFDTD    10610        310       34.3 *   10610        309       34.3 *
465.tonto        9840        683       14.4 *    9840        684       14.4 *
470.lbm         13740        224       61.2 *   13740        224       61.3 *
481.wrf         11170        291       38.4 *   11170        291       38.4 *
482.sphinx3     19490        377       51.7 *   19490        377       51.6 *
Est. SPECfp_base2006                   36.3
Est. SPECfp2006                                                        36.3
The 1% improvements and degradations are all inside the normal result
variations on this machine (I have the feeling that the hmmer improvement
is stable, and will recheck this). Not all of the above had loops split
at all, only: SPECint: 400.perlbench, 403.gcc, 445.gobmk, 456.hmmer,
462.libquantum, 464.h264ref, 471.omnetpp and SPECfp: 435.gromacs,
436.cactusADM, 447.dealII, 454.calculix.
So, okay for trunk?
Ciao,
Michael.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Gimple loop splitting v2
2015-12-01 16:47 ` Gimple loop splitting v2 Michael Matz
@ 2015-12-01 22:57 ` Jeff Law
2015-12-02 13:23 ` Michael Matz
0 siblings, 1 reply; 20+ messages in thread
From: Jeff Law @ 2015-12-01 22:57 UTC (permalink / raw)
To: Michael Matz; +Cc: gcc-patches
On 12/01/2015 09:46 AM, Michael Matz wrote:
> Hi,
>
> So, okay for trunk?
-ENOPATCH
Jeff
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Gimple loop splitting v2
2015-12-01 22:57 ` Jeff Law
@ 2015-12-02 13:23 ` Michael Matz
2015-12-05 7:55 ` Jeff Law
2016-07-25 20:57 ` Andrew Pinski
0 siblings, 2 replies; 20+ messages in thread
From: Michael Matz @ 2015-12-02 13:23 UTC (permalink / raw)
To: Jeff Law; +Cc: gcc-patches
Hi,
On Tue, 1 Dec 2015, Jeff Law wrote:
> > So, okay for trunk?
> -ENOPATCH
Sigh :)
Here it is.
Ciao,
Michael.
* common.opt (-fsplit-loops): New flag.
* passes.def (pass_loop_split): Add.
* opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
(enable_fdo_optimizations): Add loop splitting.
* timevar.def (TV_LOOP_SPLIT): Add.
* tree-pass.h (make_pass_loop_split): Declare.
* tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
* tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h.
* tree-ssa-loop-split.c: New file.
* Makefile.in (OBJS): Add tree-ssa-loop-split.o.
* doc/invoke.texi (fsplit-loops): Document.
* doc/passes.texi (Loop optimization): Add paragraph about loop
splitting.
testsuite/
* gcc.dg/loop-split.c: New test.
Index: common.opt
===================================================================
--- common.opt (revision 231115)
+++ common.opt (working copy)
@@ -2453,6 +2457,10 @@ funswitch-loops
Common Report Var(flag_unswitch_loops) Optimization
Perform loop unswitching.
+fsplit-loops
+Common Report Var(flag_split_loops) Optimization
+Perform loop splitting.
+
funwind-tables
Common Report Var(flag_unwind_tables) Optimization
Just generate unwind tables for exception handling.
Index: passes.def
===================================================================
--- passes.def (revision 231115)
+++ passes.def (working copy)
@@ -252,6 +252,7 @@ along with GCC; see the file COPYING3.
NEXT_PASS (pass_dce);
NEXT_PASS (pass_tree_unswitch);
NEXT_PASS (pass_scev_cprop);
+ NEXT_PASS (pass_loop_split);
NEXT_PASS (pass_record_bounds);
NEXT_PASS (pass_loop_distribution);
NEXT_PASS (pass_copy_prop);
Index: opts.c
===================================================================
--- opts.c (revision 231115)
+++ opts.c (working copy)
@@ -532,6 +532,7 @@ static const struct default_options defa
regardless of them being declared inline. */
{ OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },
{ OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 },
+ { OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
{ OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
{ OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
{ OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
@@ -1411,6 +1412,8 @@ enable_fdo_optimizations (struct gcc_opt
opts->x_flag_ipa_cp_alignment = value;
if (!opts_set->x_flag_predictive_commoning)
opts->x_flag_predictive_commoning = value;
+ if (!opts_set->x_flag_split_loops)
+ opts->x_flag_split_loops = value;
if (!opts_set->x_flag_unswitch_loops)
opts->x_flag_unswitch_loops = value;
if (!opts_set->x_flag_gcse_after_reload)
Index: timevar.def
===================================================================
--- timevar.def (revision 231115)
+++ timevar.def (working copy)
@@ -182,6 +182,7 @@ DEFTIMEVAR (TV_LIM , "
DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv")
DEFTIMEVAR (TV_SCEV_CONST , "scev constant prop")
DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH , "tree loop unswitching")
+DEFTIMEVAR (TV_LOOP_SPLIT , "loop splitting")
DEFTIMEVAR (TV_COMPLETE_UNROLL , "complete unrolling")
DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
DEFTIMEVAR (TV_TREE_VECTORIZATION , "tree vectorization")
Index: tree-pass.h
===================================================================
--- tree-pass.h (revision 231115)
+++ tree-pass.h (working copy)
@@ -370,6 +370,7 @@ extern gimple_opt_pass *make_pass_tree_n
extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
Index: tree-ssa-loop-manip.h
===================================================================
--- tree-ssa-loop-manip.h (revision 231115)
+++ tree-ssa-loop-manip.h (working copy)
@@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
bool, tree *, tree *);
+extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
+ struct loop *);
extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
extern void verify_loop_closed_ssa (bool);
Index: Makefile.in
===================================================================
--- Makefile.in (revision 231115)
+++ Makefile.in (working copy)
@@ -1474,6 +1474,7 @@ OBJS = \
tree-ssa-loop-manip.o \
tree-ssa-loop-niter.o \
tree-ssa-loop-prefetch.o \
+ tree-ssa-loop-split.o \
tree-ssa-loop-unswitch.o \
tree-ssa-loop.o \
tree-ssa-math-opts.o \
Index: tree-ssa-loop-split.c
===================================================================
--- tree-ssa-loop-split.c (revision 0)
+++ tree-ssa-loop-split.c (working copy)
@@ -0,0 +1,686 @@
+/* Loop splitting.
+ Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3. If not see
+<http://www.gnu.org/licenses/>. */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "fold-const.h"
+#include "tree-cfg.h"
+#include "tree-ssa.h"
+#include "tree-ssa-loop-niter.h"
+#include "tree-ssa-loop.h"
+#include "tree-ssa-loop-manip.h"
+#include "tree-into-ssa.h"
+#include "cfgloop.h"
+#include "tree-scalar-evolution.h"
+#include "gimple-iterator.h"
+#include "gimple-pretty-print.h"
+#include "cfghooks.h"
+#include "gimple-fold.h"
+#include "gimplify-me.h"
+
+/* This file implements loop splitting, i.e. transformation of loops like
+
+ for (i = 0; i < 100; i++)
+ {
+ if (i < 50)
+ A;
+ else
+ B;
+ }
+
+ into:
+
+ for (i = 0; i < 50; i++)
+ {
+ A;
+ }
+ for (; i < 100; i++)
+ {
+ B;
+ }
+
+ */
+
+/* Return non-NULL when BB inside LOOP is a potential iteration space
+ split point, i.e. ends with a condition like "IV < comp", which
+ is true on one side of the iteration space and false on the other,
+ and the split point can be computed. If so, return the guard IV,
+ its border point in *BORDER and its affine descriptor in IV. */
+
+static tree
+split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv)
+{
+ gimple *last;
+ gcond *stmt;
+ affine_iv iv2;
+
+ /* BB must end in a simple conditional jump. */
+ last = last_stmt (bb);
+ if (!last || gimple_code (last) != GIMPLE_COND)
+ return NULL_TREE;
+ stmt = as_a <gcond *> (last);
+
+ enum tree_code code = gimple_cond_code (stmt);
+
+ /* Only handle relational comparisons, for equality and non-equality
+ we'd have to split the loop into two loops and a middle statement. */
+ switch (code)
+ {
+ case LT_EXPR:
+ case LE_EXPR:
+ case GT_EXPR:
+ case GE_EXPR:
+ break;
+ default:
+ return NULL_TREE;
+ }
+
+ if (loop_exits_from_bb_p (loop, bb))
+ return NULL_TREE;
+
+ tree op0 = gimple_cond_lhs (stmt);
+ tree op1 = gimple_cond_rhs (stmt);
+
+ if (!simple_iv (loop, loop, op0, iv, false))
+ return NULL_TREE;
+ if (!simple_iv (loop, loop, op1, &iv2, false))
+ return NULL_TREE;
+
+ /* Make it so that the first argument of the condition is
+ the looping one. */
+ if (!integer_zerop (iv2.step))
+ {
+ std::swap (op0, op1);
+ std::swap (*iv, iv2);
+ code = swap_tree_comparison (code);
+ gimple_cond_set_condition (stmt, code, op0, op1);
+ update_stmt (stmt);
+ }
+ else if (integer_zerop (iv->step))
+ return NULL_TREE;
+ if (!integer_zerop (iv2.step))
+ return NULL_TREE;
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, "Found potential split point: ");
+ print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+ fprintf (dump_file, " { ");
+ print_generic_expr (dump_file, iv->base, TDF_SLIM);
+ fprintf (dump_file, " + I*");
+ print_generic_expr (dump_file, iv->step, TDF_SLIM);
+ fprintf (dump_file, " } %s ", get_tree_code_name (code));
+ print_generic_expr (dump_file, iv2.base, TDF_SLIM);
+ fprintf (dump_file, "\n");
+ }
+
+ *border = iv2.base;
+ return op0;
+}
+
+/* Given a GUARD conditional stmt inside LOOP, which we want to make always
+ true or false depending on INITIAL_TRUE, and adjusted values NEXTVAL
+ (a post-increment IV) and NEWBOUND (the comparator) adjust the loop
+ exit test statement to loop back only if the GUARD statement will
+ also be true/false in the next iteration. */
+
+static void
+patch_loop_exit (struct loop *loop, gcond *guard, tree nextval, tree newbound,
+ bool initial_true)
+{
+ edge exit = single_exit (loop);
+ gcond *stmt = as_a <gcond *> (last_stmt (exit->src));
+ gimple_cond_set_condition (stmt, gimple_cond_code (guard),
+ nextval, newbound);
+ update_stmt (stmt);
+
+ edge stay = single_pred_edge (loop->latch);
+
+ exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
+ stay->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
+
+ if (initial_true)
+ {
+ exit->flags |= EDGE_FALSE_VALUE;
+ stay->flags |= EDGE_TRUE_VALUE;
+ }
+ else
+ {
+ exit->flags |= EDGE_TRUE_VALUE;
+ stay->flags |= EDGE_FALSE_VALUE;
+ }
+}
+
+/* Given an induction variable GUARD_IV, and its affine descriptor IV,
+ find the loop phi node in LOOP defining it directly, or create
+ such phi node. Return that phi node. */
+
+static gphi *
+find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/)
+{
+ gimple *def = SSA_NAME_DEF_STMT (guard_iv);
+ gphi *phi;
+ if ((phi = dyn_cast <gphi *> (def))
+ && gimple_bb (phi) == loop->header)
+ return phi;
+
+ /* XXX Create the PHI instead. */
+ return NULL;
+}
+
+/* This function updates the SSA form after connect_loops made a new
+ edge NEW_E leading from LOOP1 exit to LOOP2 (via an intermediate
+ conditional). I.e. the second loop can now be entered either
+ via the original entry or via NEW_E, so the entry values of LOOP2
+ phi nodes are either the original ones or those at the exit
+ of LOOP1. Insert new phi nodes in LOOP2 pre-header reflecting
+ this. */
+
+static void
+connect_loop_phis (struct loop *loop1, struct loop *loop2, edge new_e)
+{
+ basic_block rest = loop_preheader_edge (loop2)->src;
+ gcc_assert (new_e->dest == rest);
+ edge skip_first = EDGE_PRED (rest, EDGE_PRED (rest, 0) == new_e);
+
+ edge firste = loop_preheader_edge (loop1);
+ edge seconde = loop_preheader_edge (loop2);
+ edge firstn = loop_latch_edge (loop1);
+ gphi_iterator psi_first, psi_second;
+ for (psi_first = gsi_start_phis (loop1->header),
+ psi_second = gsi_start_phis (loop2->header);
+ !gsi_end_p (psi_first);
+ gsi_next (&psi_first), gsi_next (&psi_second))
+ {
+ tree init, next, new_init;
+ use_operand_p op;
+ gphi *phi_first = psi_first.phi ();
+ gphi *phi_second = psi_second.phi ();
+
+ init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste);
+ next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn);
+ op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde);
+ gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
+
+ /* Prefer using original variable as a base for the new ssa name.
+ This is necessary for virtual ops, and useful in order to avoid
+ losing debug info for real ops. */
+ if (TREE_CODE (next) == SSA_NAME
+ && useless_type_conversion_p (TREE_TYPE (next),
+ TREE_TYPE (init)))
+ new_init = copy_ssa_name (next);
+ else if (TREE_CODE (init) == SSA_NAME
+ && useless_type_conversion_p (TREE_TYPE (init),
+ TREE_TYPE (next)))
+ new_init = copy_ssa_name (init);
+ else if (useless_type_conversion_p (TREE_TYPE (next),
+ TREE_TYPE (init)))
+ new_init = make_temp_ssa_name (TREE_TYPE (next), NULL,
+ "unrinittmp");
+ else
+ new_init = make_temp_ssa_name (TREE_TYPE (init), NULL,
+ "unrinittmp");
+
+ gphi * newphi = create_phi_node (new_init, rest);
+ add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION);
+ add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION);
+ SET_USE (op, new_init);
+ }
+}
+
+/* The two loops LOOP1 and LOOP2 were just created by loop versioning,
+ they are still equivalent and placed in two arms of a diamond, like so:
+
+ .------if (cond)------.
+ v v
+ pre1 pre2
+ | |
+ .--->h1 h2<----.
+ | | | |
+ | ex1---. .---ex2 |
+ | / | | \ |
+ '---l1 X | l2---'
+ | |
+ | |
+ '--->join<---'
+
+ This function transforms the program such that LOOP1 is conditionally
+ falling through to LOOP2, or skipping it. This is done by splitting
+ the ex1->join edge at X in the diagram above, and inserting a condition
+ whose one arm goes to pre2, resulting in this situation:
+
+ .------if (cond)------.
+ v v
+ pre1 .---------->pre2
+ | | |
+ .--->h1 | h2<----.
+ | | | | |
+ | ex1---. | .---ex2 |
+ | / v | | \ |
+ '---l1 skip---' | l2---'
+ | |
+ | |
+ '--->join<---'
+
+
+ The condition used is the exit condition of LOOP1, which effectively means
+ that when the first loop exits (for whatever reason) but the real original
+ exit expression is still false the second loop will be entered.
+ The function returns the new edge cond->pre2.
+
+ This doesn't update the SSA form, see connect_loop_phis for that. */
+
+static edge
+connect_loops (struct loop *loop1, struct loop *loop2)
+{
+ edge exit = single_exit (loop1);
+ basic_block skip_bb = split_edge (exit);
+ gcond *skip_stmt;
+ gimple_stmt_iterator gsi;
+ edge new_e, skip_e;
+
+ gimple *stmt = last_stmt (exit->src);
+ skip_stmt = gimple_build_cond (gimple_cond_code (stmt),
+ gimple_cond_lhs (stmt),
+ gimple_cond_rhs (stmt),
+ NULL_TREE, NULL_TREE);
+ gsi = gsi_last_bb (skip_bb);
+ gsi_insert_after (&gsi, skip_stmt, GSI_NEW_STMT);
+
+ skip_e = EDGE_SUCC (skip_bb, 0);
+ skip_e->flags &= ~EDGE_FALLTHRU;
+ new_e = make_edge (skip_bb, loop_preheader_edge (loop2)->src, 0);
+ if (exit->flags & EDGE_TRUE_VALUE)
+ {
+ skip_e->flags |= EDGE_TRUE_VALUE;
+ new_e->flags |= EDGE_FALSE_VALUE;
+ }
+ else
+ {
+ skip_e->flags |= EDGE_FALSE_VALUE;
+ new_e->flags |= EDGE_TRUE_VALUE;
+ }
+
+ new_e->count = skip_bb->count;
+ new_e->probability = PROB_LIKELY;
+ new_e->count = apply_probability (skip_e->count, PROB_LIKELY);
+ skip_e->count -= new_e->count;
+ skip_e->probability = inverse_probability (PROB_LIKELY);
+
+ return new_e;
+}
+
+/* This returns the new bound for iterations given the original iteration
+ space in NITER, an arbitrary new bound BORDER, assumed to be some
+ comparison value with a different IV, the initial value GUARD_INIT of
+ that other IV, and the comparison code GUARD_CODE that compares
+ that other IV with BORDER. We return an SSA name, and place any
+ necessary statements for that computation into *STMTS.
+
+ For example for such a loop:
+
+ for (i = beg, j = guard_init; i < end; i++, j++)
+ if (j < border) // this is supposed to be true/false
+ ...
+
+ we want to return a new bound (on j) that makes the loop iterate
+ as long as the condition j < border stays true. We also don't want
+ to iterate more often than the original loop, so we have to introduce
+ some cut-off as well (via min/max), effectively resulting in:
+
+ newend = min (end+guard_init-beg, border)
+ for (i = beg, j = guard_init; j < newend; i++, j++)
+ if (j < border)
+ ...
+
+ Depending on the direction of the IVs and if the exit tests
+ are strict or non-strict we need to use MIN or MAX,
+ and add or subtract 1. This routine computes newend above. */
+
+static tree
+compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter,
+ tree border,
+ enum tree_code guard_code, tree guard_init)
+{
+ /* The niter structure contains the after-increment IV, we need
+ the loop-enter base, so subtract STEP once. */
+ tree controlbase = force_gimple_operand (niter->control.base,
+ stmts, true, NULL_TREE);
+ tree controlstep = niter->control.step;
+ tree enddiff;
+ if (POINTER_TYPE_P (TREE_TYPE (controlbase)))
+ {
+ controlstep = gimple_build (stmts, NEGATE_EXPR,
+ TREE_TYPE (controlstep), controlstep);
+ enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
+ TREE_TYPE (controlbase),
+ controlbase, controlstep);
+ }
+ else
+ enddiff = gimple_build (stmts, MINUS_EXPR,
+ TREE_TYPE (controlbase),
+ controlbase, controlstep);
+
+ /* Compute beg-guard_init. */
+ if (POINTER_TYPE_P (TREE_TYPE (enddiff)))
+ {
+ tree tem = gimple_convert (stmts, sizetype, guard_init);
+ tem = gimple_build (stmts, NEGATE_EXPR, sizetype, tem);
+ enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
+ TREE_TYPE (enddiff),
+ enddiff, tem);
+ }
+ else
+ enddiff = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
+ enddiff, guard_init);
+
+ /* Compute end-(beg-guard_init). */
+ gimple_seq stmts2;
+ tree newbound = force_gimple_operand (niter->bound, &stmts2,
+ true, NULL_TREE);
+ gimple_seq_add_seq_without_update (stmts, stmts2);
+
+ if (POINTER_TYPE_P (TREE_TYPE (enddiff))
+ || POINTER_TYPE_P (TREE_TYPE (newbound)))
+ {
+ enddiff = gimple_convert (stmts, sizetype, enddiff);
+ enddiff = gimple_build (stmts, NEGATE_EXPR, sizetype, enddiff);
+ newbound = gimple_build (stmts, POINTER_PLUS_EXPR,
+ TREE_TYPE (newbound),
+ newbound, enddiff);
+ }
+ else
+ newbound = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
+ newbound, enddiff);
+
+ /* Depending on the direction of the IVs the new bound for the first
+ loop is the minimum or maximum of old bound and border.
+ Also, if the guard condition isn't strictly less or greater,
+ we need to adjust the bound. */
+ int addbound = 0;
+ enum tree_code minmax;
+ if (niter->cmp == LT_EXPR)
+ {
+ /* GT and LE are the same, inverted. */
+ if (guard_code == GT_EXPR || guard_code == LE_EXPR)
+ addbound = -1;
+ minmax = MIN_EXPR;
+ }
+ else
+ {
+ gcc_assert (niter->cmp == GT_EXPR);
+ if (guard_code == GE_EXPR || guard_code == LT_EXPR)
+ addbound = 1;
+ minmax = MAX_EXPR;
+ }
+
+ if (addbound)
+ {
+ tree type2 = TREE_TYPE (newbound);
+ if (POINTER_TYPE_P (type2))
+ type2 = sizetype;
+ newbound = gimple_build (stmts,
+ POINTER_TYPE_P (TREE_TYPE (newbound))
+ ? POINTER_PLUS_EXPR : PLUS_EXPR,
+ TREE_TYPE (newbound),
+ newbound,
+ build_int_cst (type2, addbound));
+ }
+
+ tree newend = gimple_build (stmts, minmax, TREE_TYPE (border),
+ border, newbound);
+ return newend;
+}
+
+/* Checks if LOOP contains a conditional block whose condition
+ depends on which side in the iteration space it is, and if so
+ splits the iteration space into two loops. Returns true if the
+ loop was split. NITER must contain the iteration descriptor for the
+ single exit of LOOP. */
+
+static bool
+split_loop (struct loop *loop1, struct tree_niter_desc *niter)
+{
+ basic_block *bbs;
+ unsigned i;
+ bool changed = false;
+ tree guard_iv;
+ tree border;
+ affine_iv iv;
+
+ bbs = get_loop_body (loop1);
+
+ /* Find a splitting opportunity. */
+ for (i = 0; i < loop1->num_nodes; i++)
+ if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
+ {
+ /* Handling opposite steps is not implemented yet. Neither
+ is handling different step sizes. */
+ if ((tree_int_cst_sign_bit (iv.step)
+ != tree_int_cst_sign_bit (niter->control.step))
+ || !tree_int_cst_equal (iv.step, niter->control.step))
+ continue;
+
+ /* Find a loop PHI node that defines guard_iv directly,
+ or create one doing that. */
+ gphi *phi = find_or_create_guard_phi (loop1, guard_iv, &iv);
+ if (!phi)
+ continue;
+ gcond *guard_stmt = as_a<gcond *> (last_stmt (bbs[i]));
+ tree guard_init = PHI_ARG_DEF_FROM_EDGE (phi,
+ loop_preheader_edge (loop1));
+ enum tree_code guard_code = gimple_cond_code (guard_stmt);
+
+ /* Loop splitting is implemented by versioning the loop, placing
+ the new loop after the old loop, making the first loop iterate
+ as long as the conditional stays true (or false) and letting the
+ second (new) loop handle the rest of the iterations.
+
+ First we need to determine if the condition will start being true
+ or false in the first loop. */
+ bool initial_true;
+ switch (guard_code)
+ {
+ case LT_EXPR:
+ case LE_EXPR:
+ initial_true = !tree_int_cst_sign_bit (iv.step);
+ break;
+ case GT_EXPR:
+ case GE_EXPR:
+ initial_true = tree_int_cst_sign_bit (iv.step);
+ break;
+ default:
+ gcc_unreachable ();
+ }
+
+ /* Build a condition that will skip the first loop when the
+ guard condition won't ever be true (or false). */
+ gimple_seq stmts2;
+ border = force_gimple_operand (border, &stmts2, true, NULL_TREE);
+ if (stmts2)
+ gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
+ stmts2);
+ tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
+ if (!initial_true)
+ cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+
+ /* Now version the loop, placing loop2 after loop1 connecting
+ them, and fix up SSA form for that. */
+ initialize_original_copy_tables ();
+ basic_block cond_bb;
+ struct loop *loop2 = loop_version (loop1, cond, &cond_bb,
+ REG_BR_PROB_BASE, REG_BR_PROB_BASE,
+ REG_BR_PROB_BASE, true);
+ gcc_assert (loop2);
+ update_ssa (TODO_update_ssa);
+
+ edge new_e = connect_loops (loop1, loop2);
+ connect_loop_phis (loop1, loop2, new_e);
+
+ /* The iterations of the second loop are now already
+ exactly those that the first loop didn't do, but the
+ iteration space of the first loop is still the original one.
+ Compute the new bound for the guarding IV and patch the
+ loop exit to use it instead of original IV and bound. */
+ gimple_seq stmts = NULL;
+ tree newend = compute_new_first_bound (&stmts, niter, border,
+ guard_code, guard_init);
+ if (stmts)
+ gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
+ stmts);
+ tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
+ patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
+
+ /* Finally patch out the two copies of the condition to be always
+ true/false (or opposite). */
+ gcond *force_true = as_a<gcond *> (last_stmt (bbs[i]));
+ gcond *force_false = as_a<gcond *> (last_stmt (get_bb_copy (bbs[i])));
+ if (!initial_true)
+ std::swap (force_true, force_false);
+ gimple_cond_make_true (force_true);
+ gimple_cond_make_false (force_false);
+ update_stmt (force_true);
+ update_stmt (force_false);
+
+ free_original_copy_tables ();
+
+ /* We destroyed LCSSA form above. Eventually we might be able
+ to fix it on the fly, for now simply punt and use the helper. */
+ rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
+
+ changed = true;
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, ";; Loop split.\n");
+
+ /* Only deal with the first opportunity. */
+ break;
+ }
+
+ free (bbs);
+ return changed;
+}
+
+/* Main entry point. Perform loop splitting on all suitable loops. */
+
+static unsigned int
+tree_ssa_split_loops (void)
+{
+ struct loop *loop;
+ bool changed = false;
+
+ gcc_assert (scev_initialized_p ());
+ FOR_EACH_LOOP (loop, 0)
+ loop->aux = NULL;
+
+ /* Go through all loops starting from innermost. */
+ FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
+ {
+ struct tree_niter_desc niter;
+ if (loop->aux)
+ {
+ /* If any of our inner loops was split, don't split us,
+ and mark our containing loop as having had splits as well. */
+ loop_outer (loop)->aux = loop;
+ continue;
+ }
+
+ if (single_exit (loop)
+ /* ??? We could handle non-empty latches when we split
+ the latch edge (not the exit edge), and put the new
+ exit condition in the new block. OTOH this executes some
+ code unconditionally that might have been skipped by the
+ original exit before. */
+ && empty_block_p (loop->latch)
+ && !optimize_loop_for_size_p (loop)
+ && number_of_iterations_exit (loop, single_exit (loop), &niter,
+ false, true)
+ && niter.cmp != ERROR_MARK
+ /* We can't yet handle loops controlled by a != predicate. */
+ && niter.cmp != NE_EXPR)
+ {
+ if (split_loop (loop, &niter))
+ {
+ /* Mark our containing loop as having had some split inner
+ loops. */
+ loop_outer (loop)->aux = loop;
+ changed = true;
+ }
+ }
+ }
+
+ FOR_EACH_LOOP (loop, 0)
+ loop->aux = NULL;
+
+ if (changed)
+ return TODO_cleanup_cfg;
+ return 0;
+}
+
+/* Loop splitting pass. */
+
+namespace {
+
+const pass_data pass_data_loop_split =
+{
+ GIMPLE_PASS, /* type */
+ "lsplit", /* name */
+ OPTGROUP_LOOP, /* optinfo_flags */
+ TV_LOOP_SPLIT, /* tv_id */
+ PROP_cfg, /* properties_required */
+ 0, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ 0, /* todo_flags_finish */
+};
+
+class pass_loop_split : public gimple_opt_pass
+{
+public:
+ pass_loop_split (gcc::context *ctxt)
+ : gimple_opt_pass (pass_data_loop_split, ctxt)
+ {}
+
+ /* opt_pass methods: */
+ virtual bool gate (function *) { return flag_split_loops != 0; }
+ virtual unsigned int execute (function *);
+
+}; // class pass_loop_split
+
+unsigned int
+pass_loop_split::execute (function *fun)
+{
+ if (number_of_loops (fun) <= 1)
+ return 0;
+
+ return tree_ssa_split_loops ();
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_loop_split (gcc::context *ctxt)
+{
+ return new pass_loop_split (ctxt);
+}
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 231115)
+++ doc/invoke.texi (working copy)
@@ -446,7 +446,7 @@ Objective-C and Objective-C++ Dialects}.
-fselective-scheduling -fselective-scheduling2 @gol
-fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol
-fsemantic-interposition -fshrink-wrap -fsignaling-nans @gol
--fsingle-precision-constant -fsplit-ivs-in-unroller @gol
+-fsingle-precision-constant -fsplit-ivs-in-unroller -fsplit-loops @gol
-fsplit-paths @gol
-fsplit-wide-types -fssa-backprop -fssa-phiopt @gol
-fstack-protector -fstack-protector-all -fstack-protector-strong @gol
@@ -10197,6 +10197,11 @@ Enabled with @option{-fprofile-use}.
Enables the loop invariant motion pass in the RTL loop optimizer. Enabled
at level @option{-O1}
+@item -fsplit-loops
+@opindex fsplit-loops
+Split a loop into two if it contains a condition that's always true
+for one side of the iteration space and false for the other.
+
@item -funswitch-loops
@opindex funswitch-loops
Move branches with loop invariant conditions out of the loop, with duplicates
Index: doc/passes.texi
===================================================================
--- doc/passes.texi (revision 231115)
+++ doc/passes.texi (working copy)
@@ -484,6 +484,12 @@ out of the loops. To achieve this, a du
each possible outcome of conditional jump(s). The pass is implemented in
@file{tree-ssa-loop-unswitch.c}.
+Loop splitting. If a loop contains a conditional statement that is
+always true for one part of the iteration space and false for the other,
+this pass splits the loop into two, one dealing with one side and the other
+only with the other, thereby removing one inner-loop conditional. The
+pass is implemented in @file{tree-ssa-loop-split.c}.
+
The optimizations also use various utility functions contained in
@file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and
@file{cfgloopmanip.c}.
Index: testsuite/gcc.dg/loop-split.c
===================================================================
--- testsuite/gcc.dg/loop-split.c (revision 0)
+++ testsuite/gcc.dg/loop-split.c (working copy)
@@ -0,0 +1,147 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
+
+#ifdef __cplusplus
+extern "C" int printf (const char *, ...);
+extern "C" void abort (void);
+#else
+extern int printf (const char *, ...);
+extern void abort (void);
+#endif
+
+/* Define TRACE to 1 or 2 to get detailed tracing.
+ Define SINGLE_TEST to 1 or 2 to get a simple routine with
+ just one loop, called only one time or with multiple parameters,
+ to make debugging easier. */
+#ifndef TRACE
+#define TRACE 0
+#endif
+
+#define loop(beg,step,beg2,cond1,cond2) \
+ do \
+ { \
+ sum = 0; \
+ for (i = (beg), j = (beg2); (cond1); i+=(step),j+=(step)) \
+ { \
+ if (cond2) { \
+ if (TRACE > 1) printf ("a: %d %d\n", i, j); \
+ sum += a[i]; \
+ } else { \
+ if (TRACE > 1) printf ("b: %d %d\n", i, j); \
+ sum += b[i]; \
+ } \
+ } \
+ if (TRACE > 0) printf ("sum: %d\n", sum); \
+ check = check * 47 + sum; \
+ } while (0)
+
+#ifndef SINGLE_TEST
+unsigned __attribute__((noinline, noclone)) dotest (int beg, int end, int step,
+ int c, int *a, int *b, int beg2)
+{
+ unsigned check = 0;
+ int sum;
+ int i, j;
+ loop (beg, 1, beg2, i < end, j < c);
+ loop (beg, 1, beg2, i <= end, j < c);
+ loop (beg, 1, beg2, i < end, j <= c);
+ loop (beg, 1, beg2, i <= end, j <= c);
+ loop (beg, 1, beg2, i < end, j > c);
+ loop (beg, 1, beg2, i <= end, j > c);
+ loop (beg, 1, beg2, i < end, j >= c);
+ loop (beg, 1, beg2, i <= end, j >= c);
+ beg2 += end-beg;
+ loop (end, -1, beg2, i >= beg, j >= c);
+ loop (end, -1, beg2, i >= beg, j > c);
+ loop (end, -1, beg2, i > beg, j >= c);
+ loop (end, -1, beg2, i > beg, j > c);
+ loop (end, -1, beg2, i >= beg, j <= c);
+ loop (end, -1, beg2, i >= beg, j < c);
+ loop (end, -1, beg2, i > beg, j <= c);
+ loop (end, -1, beg2, i > beg, j < c);
+ return check;
+}
+
+#else
+
+int __attribute__((noinline, noclone)) f (int beg, int end, int step,
+ int c, int *a, int *b, int beg2)
+{
+ int sum = 0;
+ int i, j;
+ //for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
+ for (i = end, j = beg2 + (end-beg); i > beg; i += -1, j-- /*step*/)
+ {
+ // i - j == X --> i = X + j
+ // --> i < end == X+j < end == j < end - X
+ // --> newend = end - (i_init - j_init)
+ // j < end-X && j < c --> j < min(end-X,c)
+ // j < end-X && j <= c --> j <= min(end-X-1,c) or j < min(end-X,c+1{OF!})
+ //if (j < c)
+ if (j >= c)
+ printf ("a: %d %d\n", i, j);
+ /*else
+ printf ("b: %d %d\n", i, j);*/
+ /*sum += a[i];
+ else
+ sum += b[i];*/
+ }
+ return sum;
+}
+
+int __attribute__((noinline, noclone)) f2 (int *beg, int *end, int step,
+ int *c, int *a, int *b, int *beg2)
+{
+ int sum = 0;
+ int *i, *j;
+ for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
+ {
+ if (j <= c)
+ printf ("%d %d\n", i - beg, j - beg);
+ /*sum += a[i];
+ else
+ sum += b[i];*/
+ }
+ return sum;
+}
+#endif
+
+extern int printf (const char *, ...);
+
+int main ()
+{
+ int a[] = {0,0,0,0,0, 1,2,3,4,5,6,7,8,9, 0,0,0,0,0};
+ int b[] = {0,0,0,0,0, -1,-2,-3,-4,-5,-6,-7,-8,-9, 0,0,0,0,0,};
+ int c;
+ int diff = 0;
+ unsigned check = 0;
+#if defined(SINGLE_TEST) && (SINGLE_TEST == 1)
+ //dotest (0, 9, 1, -1, a+5, b+5, -1);
+ //return 0;
+ f (0, 9, 1, 5, a+5, b+5, -1);
+ return 0;
+#endif
+ for (diff = -5; diff <= 5; diff++)
+ {
+ for (c = -1; c <= 10; c++)
+ {
+#ifdef SINGLE_TEST
+ int s = f (0, 9, 1, c, a+5, b+5, diff);
+ //int s = f2 (a+0, a+9, 1, a+c, a+5, b+5, a+diff);
+ printf ("%d ", s);
+#else
+ if (TRACE > 0)
+ printf ("check %d %d\n", c, diff);
+ check = check * 51 + dotest (0, 9, 1, c, a+5, b+5, diff);
+#endif
+ }
+ //printf ("\n");
+ }
+ //printf ("%u\n", check);
+ if (check != 3213344948)
+ abort ();
+ return 0;
+}
+
+/* All 16 loops in dotest should be split. */
+/* { dg-final { scan-tree-dump-times "Loop split" 16 "lsplit" } } */
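The bound computation described in the comment on compute_new_first_bound can be illustrated outside the compiler. The following is a hand-written sketch, not code from the patch: the function names (sum2_original, sum2_split, sum2_down_original, sum2_down_split) are made up for illustration, and the cut-off is computed in a wider type purely so the sketch itself cannot overflow. The first pair shows an ascending loop with a strict guard (MIN, no adjustment); the second pair shows a descending loop with a non-strict guard, where the cut-off gains a +1 and MIN turns into MAX.

```c
#include <assert.h>

/* Ascending loop: i controls the exit, j is a second IV with the same
   step, and the guard compares j with a loop-invariant border.  */
int
sum2_original (const int *a, const int *b, int beg, int end,
               int guard_init, int border)
{
  int sum = 0;
  for (int i = beg, j = guard_init; i < end; i++, j++)
    if (j < border)
      sum += a[i];
    else
      sum += b[i];
  return sum;
}

/* Hand-split form: newend = min (end + guard_init - beg, border).
   Since i - j is constant, "j < end + guard_init - beg" is "i < end".  */
int
sum2_split (const int *a, const int *b, int beg, int end,
            int guard_init, int border)
{
  int sum = 0;
  long long cutoff = (long long) end + guard_init - beg;
  long long newend = cutoff < border ? cutoff : border;
  int i = beg, j = guard_init;
  for (; j < newend; i++, j++)   /* guard known to be true */
    sum += a[i];
  for (; i < end; i++, j++)      /* guard known to be false */
    sum += b[i];
  return sum;
}

/* Descending loop with a non-strict guard.  */
int
sum2_down_original (const int *a, const int *b, int beg, int end,
                    int guard_init, int border)
{
  int sum = 0;
  for (int i = end, j = guard_init; i > beg; i--, j--)
    if (j >= border)
      sum += a[i];
    else
      sum += b[i];
  return sum;
}

/* "i > beg" is "j > beg + guard_init - end", i.e.
   "j >= beg + guard_init - end + 1", so the cut-off is adjusted by one
   and combined with the border via MAX instead of MIN.  */
int
sum2_down_split (const int *a, const int *b, int beg, int end,
                 int guard_init, int border)
{
  int sum = 0;
  long long cutoff = (long long) beg + guard_init - end + 1;
  long long newend = cutoff > border ? cutoff : border;
  int i = end, j = guard_init;
  for (; j >= newend; i--, j--)  /* guard known to be true */
    sum += a[i];
  for (; i > beg; i--, j--)      /* guard known to be false */
    sum += b[i];
  return sum;
}
```

Sweeping guard_init and border over a small range, in the spirit of the diff/c loops in main above, is a cheap way to confirm that each split form agrees with its original for borders inside and outside the iteration space.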
* Re: Gimple loop splitting v2
2015-12-02 13:23 ` Michael Matz
@ 2015-12-05 7:55 ` Jeff Law
2016-10-20 14:43 ` Michael Matz
2016-07-25 20:57 ` Andrew Pinski
1 sibling, 1 reply; 20+ messages in thread
From: Jeff Law @ 2015-12-05 7:55 UTC (permalink / raw)
To: Michael Matz; +Cc: gcc-patches
On 12/02/2015 06:23 AM, Michael Matz wrote:
> Hi,
>
> On Tue, 1 Dec 2015, Jeff Law wrote:
>
>>> So, okay for trunk?
>> -ENOPATCH
>
> Sigh :)
> Here it is.
>
>
> Ciao,
> Michael.
> * common.opt (-fsplit-loops): New flag.
> * passes.def (pass_loop_split): Add.
> * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
> (enable_fdo_optimizations): Add loop splitting.
> * timevar.def (TV_LOOP_SPLIT): Add.
> * tree-pass.h (make_pass_loop_split): Declare.
> * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
> * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
> * tree-ssa-loop-split.c: New file.
> * Makefile.in (OBJS): Add tree-ssa-loop-split.o.
> * doc/invoke.texi (fsplit-loops): Document.
> * doc/passes.texi (Loop optimization): Add paragraph about loop
> splitting.
>
> testsuite/
> * gcc.dg/loop-split.c: New test.
>
> Index: tree-ssa-loop-split.c
> +/* Return true when BB inside LOOP is a potential iteration space
> + split point, i.e. ends with a condition like "IV < comp", which
> + is true on one side of the iteration space and false on the other,
> + and the split point can be computed. If so, also return the border
> + point in *BORDER and the comparison induction variable in IV. */
> +
> +static tree
> +split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv)
> +{
> + gimple *last;
> + gcond *stmt;
> + affine_iv iv2;
> +
+
> + /* Make it so, that the first argument of the condition is
> + the looping one (only swap. */
Nit. I don't think you want a comma after "so". And it looks like your
comment got truncated as well.
With the comment above fixed, this is fine for the trunk.
jeff
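The comment under review describes a normalisation step: if the looping operand is on the right-hand side of the condition, the operands are exchanged and the comparison code is swapped to match (the quoted patch uses swap_tree_comparison for this). A minimal stand-alone sketch of that idea follows; the enum cmp_code and the helpers swap_cmp and eval_cmp are hypothetical names for illustration only and are not GCC interfaces.

```c
#include <assert.h>

/* Hypothetical stand-in for the four relational tree codes.  */
enum cmp_code { CMP_LT, CMP_LE, CMP_GT, CMP_GE };

/* Return the comparison that yields the same result once the two
   operands are exchanged: "a < b" is "b > a", "a <= b" is "b >= a".  */
enum cmp_code
swap_cmp (enum cmp_code code)
{
  switch (code)
    {
    case CMP_LT: return CMP_GT;
    case CMP_LE: return CMP_GE;
    case CMP_GT: return CMP_LT;
    case CMP_GE: return CMP_LE;
    }
  return code;
}

/* Evaluate "lhs CODE rhs" on plain integers.  */
int
eval_cmp (enum cmp_code code, int lhs, int rhs)
{
  switch (code)
    {
    case CMP_LT: return lhs < rhs;
    case CMP_LE: return lhs <= rhs;
    case CMP_GT: return lhs > rhs;
    case CMP_GE: return lhs >= rhs;
    }
  return 0;
}
```

With this, a guard written as "border > i" can be treated as "i < border" without changing its truth value, which is what lets the rest of the analysis assume the IV is the first operand.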
* Re: Gimple loop splitting
2015-11-12 16:52 Gimple loop splitting Michael Matz
2015-11-12 21:44 ` Jeff Law
@ 2016-07-25 7:00 ` Andrew Pinski
2016-07-25 14:27 ` Michael Matz
1 sibling, 1 reply; 20+ messages in thread
From: Andrew Pinski @ 2016-07-25 7:00 UTC (permalink / raw)
To: Michael Matz; +Cc: GCC Patches
On Thu, Nov 12, 2015 at 8:52 AM, Michael Matz <matz@suse.de> wrote:
> Hello,
>
> this new pass implements loop iteration space splitting for loops that
> contain a conditional that's always true for one part of the iteration
> space and false for the other, i.e. such situations:
>
> for (i = beg; i < end; i++)
> if (i < p)
> dothis();
> else
> dothat();
>
> this is transformed into roughly:
>
> for (i = beg; i < p; i++)
> dothis();
> for (; i < end; i++)
> dothat();
>
> Of course, not quite the above as there needs to be provisions for the
> border conditions, if e.g. 'p' is outside the original iteration space, or
> the conditional doesn't directly use the control IV, but some other, or
> the IV runs backwards. The testcase checks many of these border
> conditions.
>
> This transformation is in itself a good one but can also be an enabler for
> the vectorizer. It does increase code size, when the loop body contains
> also unconditional code (that one is duplicated), so we only transform hot
> loops. I'm a bit unsure of the placement of the new pass, or if it should
> be an own pass at all. Right now I've placed it after unswitching and
> scev_cprop, before loop distribution. Ideally I think all three, together
> with loop fusion and an gimple unroller should be integrated into one loop
> nest optimizer, alas, we aren't there yet.
>
> I'm planning to work on loop fusion in the future as well, but that's not
> for GCC 6.
>
> I've regstrapped this pass enabled with -O2 on x86-64-linux, without
> regressions. I've also checked cpu2006 (the non-fortran part) for
> correctness, not yet for performance. In the end it should probably only
> be enabled for -O3+ (although if the whole loop body is conditional it
> makes sense to also have it with -O2 because code growth is very small
> then).
>
> So, okay for trunk?
What ever happened to this patch? I was looking into doing this
myself today but I found this patch.
It is stage 1 of GCC 7, so it might be a good idea to get this patch into GCC.
Thanks,
Andrew
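For readers skimming the thread, the transformation in the quoted description can be sketched in plain C. This is a hand-written illustration under simple assumptions (ascending IV, guard on the control IV itself), not output of the pass; the function names sum_original and sum_split are invented for the example. It also shows one of the border conditions the description mentions: the first loop is clamped to the original bound so that a 'p' outside the iteration space is harmless.

```c
#include <assert.h>

/* Original loop: the guard "i < p" is true for a prefix of the
   iteration space and false for the remainder.  */
int
sum_original (const int *a, const int *b, int beg, int end, int p)
{
  int sum = 0;
  for (int i = beg; i < end; i++)
    if (i < p)
      sum += a[i];
    else
      sum += b[i];
  return sum;
}

/* Hand-split version.  The first loop stops at min (p, end) so it never
   runs past the original bound; the second loop continues from whatever
   value of i the first loop left behind.  */
int
sum_split (const int *a, const int *b, int beg, int end, int p)
{
  int sum = 0;
  int bound = p < end ? p : end;
  int i = beg;
  for (; i < bound; i++)   /* guard known to be true */
    sum += a[i];
  for (; i < end; i++)     /* guard known to be false */
    sum += b[i];
  return sum;
}
```

If p is at or below beg the first loop runs zero times and the second does all the work; if p is at or above end the roles are reversed. Neither loop body contains the conditional any more.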
>
>
> Ciao,
> Michael.
> * passes.def (pass_loop_split): Add.
> * timevar.def (TV_LOOP_SPLIT): Add.
> * tree-pass.h (make_pass_loop_split): Declare.
> * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
> * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
> cfganal.h, tree-chrec.h, tree-affine.h, tree-scalar-evolution.h,
> gimple-pretty-print.h, gimple-fold.h, gimplify-me.h.
> (split_at_bb_p, patch_loop_exit, find_or_create_guard_phi,
> split_loop, tree_ssa_split_loops,
> make_pass_loop_split): New functions.
> (pass_data_loop_split): New.
> (pass_loop_split): New.
>
> testsuite/
> * gcc.dg/loop-split.c: New test.
>
> Index: passes.def
> ===================================================================
> --- passes.def (revision 229763)
> +++ passes.def (working copy)
> @@ -233,6 +233,7 @@ along with GCC; see the file COPYING3.
> NEXT_PASS (pass_dce);
> NEXT_PASS (pass_tree_unswitch);
> NEXT_PASS (pass_scev_cprop);
> + NEXT_PASS (pass_loop_split);
> NEXT_PASS (pass_record_bounds);
> NEXT_PASS (pass_loop_distribution);
> NEXT_PASS (pass_copy_prop);
> Index: timevar.def
> ===================================================================
> --- timevar.def (revision 229763)
> +++ timevar.def (working copy)
> @@ -179,6 +179,7 @@ DEFTIMEVAR (TV_LIM , "
> DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv")
> DEFTIMEVAR (TV_SCEV_CONST , "scev constant prop")
> DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH , "tree loop unswitching")
> +DEFTIMEVAR (TV_LOOP_SPLIT , "loop splitting")
> DEFTIMEVAR (TV_COMPLETE_UNROLL , "complete unrolling")
> DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
> DEFTIMEVAR (TV_TREE_VECTORIZATION , "tree vectorization")
> Index: tree-pass.h
> ===================================================================
> --- tree-pass.h (revision 229763)
> +++ tree-pass.h (working copy)
> @@ -366,6 +366,7 @@ extern gimple_opt_pass *make_pass_tree_n
> extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
> Index: tree-ssa-loop-manip.h
> ===================================================================
> --- tree-ssa-loop-manip.h (revision 229763)
> +++ tree-ssa-loop-manip.h (working copy)
> @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
>
> extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
> bool, tree *, tree *);
> +extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
> + struct loop *);
> extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
> extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
> extern void verify_loop_closed_ssa (bool);
> Index: tree-ssa-loop-unswitch.c
> ===================================================================
> --- tree-ssa-loop-unswitch.c (revision 229763)
> +++ tree-ssa-loop-unswitch.c (working copy)
> @@ -31,12 +31,20 @@ along with GCC; see the file COPYING3.
> #include "tree-ssa.h"
> #include "tree-ssa-loop-niter.h"
> #include "tree-ssa-loop.h"
> +#include "tree-ssa-loop-manip.h"
> #include "tree-into-ssa.h"
> +#include "cfganal.h"
> #include "cfgloop.h"
> +#include "tree-chrec.h"
> +#include "tree-affine.h"
> +#include "tree-scalar-evolution.h"
> #include "params.h"
> #include "tree-inline.h"
> #include "gimple-iterator.h"
> +#include "gimple-pretty-print.h"
> #include "cfghooks.h"
> +#include "gimple-fold.h"
> +#include "gimplify-me.h"
>
> /* This file implements the loop unswitching, i.e. transformation of loops like
>
> @@ -842,4 +850,551 @@ make_pass_tree_unswitch (gcc::context *c
> return new pass_tree_unswitch (ctxt);
> }
>
> +/* Return true when BB inside LOOP is a potential iteration space
> + split point, i.e. ends with a condition like "IV < comp", which
> + is true on one side of the iteration space and false on the other,
> + and the split point can be computed. If so, also return the border
> + point in *BORDER and the comparison induction variable in IV. */
>
> +static tree
> +split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv)
> +{
> + gimple *last;
> + gcond *stmt;
> + affine_iv iv2;
> +
> + /* BB must end in a simple conditional jump. */
> + last = last_stmt (bb);
> + if (!last || gimple_code (last) != GIMPLE_COND)
> + return NULL_TREE;
> + stmt = as_a <gcond *> (last);
> +
> + enum tree_code code = gimple_cond_code (stmt);
> +
> + /* Only handle relational comparisons, for equality and non-equality
> + we'd have to split the loop into two loops and a middle statement. */
> + switch (code)
> + {
> + case LT_EXPR:
> + case LE_EXPR:
> + case GT_EXPR:
> + case GE_EXPR:
> + break;
> + default:
> + return NULL_TREE;
> + }
> +
> + if (loop_exits_from_bb_p (loop, bb))
> + return NULL_TREE;
> +
> + tree op0 = gimple_cond_lhs (stmt);
> + tree op1 = gimple_cond_rhs (stmt);
> +
> + if (!simple_iv (loop, loop, op0, iv, false))
> + return NULL_TREE;
> + if (!simple_iv (loop, loop, op1, &iv2, false))
> + return NULL_TREE;
> +
> + /* Make it so, that the first argument of the condition is
> + the looping one. */
> + if (integer_zerop (iv->step))
> + {
> + std::swap (op0, op1);
> + std::swap (*iv, iv2);
> + code = swap_tree_comparison (code);
> + gimple_cond_set_condition (stmt, code, op0, op1);
> + update_stmt (stmt);
> + }
> +
> + if (integer_zerop (iv->step))
> + return NULL_TREE;
> + if (!integer_zerop (iv2.step))
> + return NULL_TREE;
> +
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + fprintf (dump_file, "Found potential split point: ");
> + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> + fprintf (dump_file, " { ");
> + print_generic_expr (dump_file, iv->base, TDF_SLIM);
> + fprintf (dump_file, " + I*");
> + print_generic_expr (dump_file, iv->step, TDF_SLIM);
> + fprintf (dump_file, " } %s ", get_tree_code_name (code));
> + print_generic_expr (dump_file, iv2.base, TDF_SLIM);
> + fprintf (dump_file, "\n");
> + }
> +
> + *border = iv2.base;
> + return op0;
> +}
> +
> +/* Given a GUARD conditional stmt inside LOOP, which we want to make always
> + true or false depending on INITIAL_TRUE, and adjusted values NEXTVAL
> + (a post-increment IV) and NEWBOUND (the comparator), adjust the loop
> + exit test statement to loop back only if the GUARD statement will
> + also be true/false in the next iteration. */
> +
> +static void
> +patch_loop_exit (struct loop *loop, gcond *guard, tree nextval, tree newbound,
> + bool initial_true)
> +{
> + edge exit = single_exit (loop);
> + gcond *stmt = as_a <gcond *> (last_stmt (exit->src));
> + gimple_cond_set_condition (stmt, gimple_cond_code (guard),
> + nextval, newbound);
> + update_stmt (stmt);
> +
> + edge stay = single_pred_edge (loop->latch);
> +
> + exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
> + stay->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
> +
> + if (initial_true)
> + {
> + exit->flags |= EDGE_FALSE_VALUE;
> + stay->flags |= EDGE_TRUE_VALUE;
> + }
> + else
> + {
> + exit->flags |= EDGE_TRUE_VALUE;
> + stay->flags |= EDGE_FALSE_VALUE;
> + }
> +}
> +
> +/* Given an induction variable GUARD_IV, and its affine descriptor IV,
> + find the loop phi node in LOOP defining it directly, or create
> + such phi node. Return that phi node. */
> +
> +static gphi *
> +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/)
> +{
> + gimple *def = SSA_NAME_DEF_STMT (guard_iv);
> + gphi *phi;
> + if ((phi = dyn_cast <gphi *> (def))
> + && gimple_bb (phi) == loop->header)
> + return phi;
> +
> + /* XXX Create the PHI instead. */
> + return NULL;
> +}
> +
> +/* Checks if LOOP contains a conditional block whose condition
> + depends on which side in the iteration space it is, and if so
> + splits the iteration space into two loops. Returns true if the
> + loop was split. NITER must contain the iteration descriptor for the
> + single exit of LOOP. */
> +
> +static bool
> +split_loop (struct loop *loop, struct tree_niter_desc *niter)
> +{
> + basic_block *bbs;
> + unsigned i;
> + bool changed = false;
> + tree guard_iv;
> + tree border;
> + affine_iv iv;
> +
> + bbs = get_loop_body (loop);
> +
> + /* Find a splitting opportunity. */
> + for (i = 0; i < loop->num_nodes; i++)
> + if ((guard_iv = split_at_bb_p (loop, bbs[i], &border, &iv)))
> + {
> + /* Handling opposite steps is not implemented yet. Neither
> + is handling different step sizes. */
> + if ((tree_int_cst_sign_bit (iv.step)
> + != tree_int_cst_sign_bit (niter->control.step))
> + || !tree_int_cst_equal (iv.step, niter->control.step))
> + continue;
> +
> + /* Find a loop PHI node that defines guard_iv directly,
> + or create one doing that. */
> + gphi *phi = find_or_create_guard_phi (loop, guard_iv, &iv);
> + if (!phi)
> + continue;
> + gcond *guard_stmt = as_a<gcond *> (last_stmt (bbs[i]));
> + tree guard_init = PHI_ARG_DEF_FROM_EDGE (phi,
> + loop_preheader_edge (loop));
> + enum tree_code guard_code = gimple_cond_code (guard_stmt);
> +
> + /* Loop splitting is implemented by versioning the loop, placing
> + the new loop in front of the old loop, making the first loop iterate
> + as long as the conditional stays true (or false) and letting the
> + second (original) loop handle the rest of the iterations.
> +
> + First we need to determine if the condition will start being true
> + or false in the first loop. */
> + bool initial_true;
> + switch (guard_code)
> + {
> + case LT_EXPR:
> + case LE_EXPR:
> + initial_true = !tree_int_cst_sign_bit (iv.step);
> + break;
> + case GT_EXPR:
> + case GE_EXPR:
> + initial_true = tree_int_cst_sign_bit (iv.step);
> + break;
> + default:
> + gcc_unreachable ();
> + }
> +
> + /* Build a condition that will skip the first loop when the
> + guard condition won't ever be true (or false). */
> + tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
> + if (initial_true)
> + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> +
> + /* Now version the loop, we will then have this situation:
> + if (!cond)
> + for (...) {body} //floop
> + else
> + for (...) {body} //loop
> + join: */
> + initialize_original_copy_tables ();
> + basic_block cond_bb;
> + struct loop *floop = loop_version (loop, cond, &cond_bb,
> + REG_BR_PROB_BASE, REG_BR_PROB_BASE,
> + REG_BR_PROB_BASE, false);
> + gcc_assert (floop);
> + update_ssa (TODO_update_ssa);
> +
> + /* Now diddle the exit edge of the first loop (floop->join in the
> + above) to either go to the common exit (join) or to the second
> + loop, depending on if there are still iterations left, or not.
> + We split the floop exit edge and insert a copy of the
> + original exit expression into the new block, that either
> + skips the second loop or goes to it. */
> + edge exit = single_exit (floop);
> + basic_block skip_bb = split_edge (exit);
> + gcond *skip_stmt;
> + gimple_stmt_iterator gsi;
> + edge new_e, skip_e;
> +
> + gimple *stmt = last_stmt (exit->src);
> + skip_stmt = gimple_build_cond (gimple_cond_code (stmt),
> + gimple_cond_lhs (stmt),
> + gimple_cond_rhs (stmt),
> + NULL_TREE, NULL_TREE);
> + gsi = gsi_last_bb (skip_bb);
> + gsi_insert_after (&gsi, skip_stmt, GSI_NEW_STMT);
> +
> + skip_e = EDGE_SUCC (skip_bb, 0);
> + skip_e->flags &= ~EDGE_FALLTHRU;
> + new_e = make_edge (skip_bb, loop_preheader_edge (loop)->src, 0);
> + if (exit->flags & EDGE_TRUE_VALUE)
> + {
> + skip_e->flags |= EDGE_TRUE_VALUE;
> + new_e->flags |= EDGE_FALSE_VALUE;
> + }
> + else
> + {
> + skip_e->flags |= EDGE_FALSE_VALUE;
> + new_e->flags |= EDGE_TRUE_VALUE;
> + }
> +
> + new_e->count = skip_bb->count;
> + new_e->probability = PROB_LIKELY;
> + new_e->count = apply_probability (skip_e->count, PROB_LIKELY);
> + skip_e->count -= new_e->count;
> + skip_e->probability = inverse_probability (PROB_LIKELY);
> +
> + /* Now we have created this situation:
> + if (!cond) {
> + for (...) {body; if (cexit) break;}
> + if (!cexit) goto second;
> + } else {
> + second:
> + for (...) {body; if (cexit) break;}
> + }
> + join:
> +
> + The second loop can now be entered by skipping the first
> + loop (the initial values of its PHI nodes will be the
> + original initial values), or by falling in from the first
> + loop (the initial values will be the continuation values
> + from the first loop). Insert PHI nodes reflecting this
> + in the pre-header of the second loop. */
> +
> + basic_block rest = loop_preheader_edge (loop)->src;
> + edge skip_first = find_edge (cond_bb, rest);
> + gcc_assert (skip_first);
> +
> + edge firste = loop_preheader_edge (floop);
> + edge seconde = loop_preheader_edge (loop);
> + edge firstn = loop_latch_edge (floop);
> + gphi *new_guard_phi = 0;
> + gphi_iterator psi_first, psi_second;
> + for (psi_first = gsi_start_phis (floop->header),
> + psi_second = gsi_start_phis (loop->header);
> + !gsi_end_p (psi_first);
> + gsi_next (&psi_first), gsi_next (&psi_second))
> + {
> + tree init, next, new_init;
> + use_operand_p op;
> + gphi *phi_first = psi_first.phi ();
> + gphi *phi_second = psi_second.phi ();
> +
> + if (phi_second == phi)
> + new_guard_phi = phi_first;
> +
> + init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste);
> + next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn);
> + op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde);
> + gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
> +
> + /* Prefer using original variable as a base for the new ssa name.
> + This is necessary for virtual ops, and useful in order to avoid
> + losing debug info for real ops. */
> + if (TREE_CODE (next) == SSA_NAME
> + && useless_type_conversion_p (TREE_TYPE (next),
> + TREE_TYPE (init)))
> + new_init = copy_ssa_name (next);
> + else if (TREE_CODE (init) == SSA_NAME
> + && useless_type_conversion_p (TREE_TYPE (init),
> + TREE_TYPE (next)))
> + new_init = copy_ssa_name (init);
> + else if (useless_type_conversion_p (TREE_TYPE (next),
> + TREE_TYPE (init)))
> + new_init = make_temp_ssa_name (TREE_TYPE (next), NULL,
> + "unrinittmp");
> + else
> + new_init = make_temp_ssa_name (TREE_TYPE (init), NULL,
> + "unrinittmp");
> +
> + gphi * newphi = create_phi_node (new_init, rest);
> + add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION);
> + add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION);
> + SET_USE (op, new_init);
> + }
> +
> + /* The iterations of the second loop are now already
> + exactly those that the first loop didn't do, but the
> + iteration space of the first loop is still the original one.
> + Build a new one, exactly covering those iterations where
> + the conditional is true (or false). For example, from such a loop:
> +
> + for (i = beg, j = beg2; i < end; i++, j++)
> + if (j < c) // this is supposed to be true
> + ...
> +
> + we build new bounds and change the exit conditions such that
> + it's effectively this:
> +
> + newend = min (end+beg2-beg, c)
> + for (i = beg, j = beg2; j < newend; i++, j++)
> + if (j < c)
> + ...
> +
> + Depending on the direction of the IVs and if the exit tests
> + are strict or include equality we need to use MIN or MAX,
> + and add or subtract 1. */
> +
> + gimple_seq stmts = NULL;
> + /* The niter structure contains the after-increment IV, we need
> + the loop-enter base, so subtract STEP once. */
> + tree controlbase = force_gimple_operand (niter->control.base,
> + &stmts, true, NULL_TREE);
> + tree controlstep = niter->control.step;
> + tree enddiff;
> + if (POINTER_TYPE_P (TREE_TYPE (controlbase)))
> + {
> + controlstep = gimple_build (&stmts, NEGATE_EXPR,
> + TREE_TYPE (controlstep), controlstep);
> + enddiff = gimple_build (&stmts, POINTER_PLUS_EXPR,
> + TREE_TYPE (controlbase),
> + controlbase, controlstep);
> + }
> + else
> + enddiff = gimple_build (&stmts, MINUS_EXPR,
> + TREE_TYPE (controlbase),
> + controlbase, controlstep);
> +
> + /* Compute beg-beg2. */
> + if (POINTER_TYPE_P (TREE_TYPE (enddiff)))
> + {
> + tree tem = gimple_convert (&stmts, sizetype, guard_init);
> + tem = gimple_build (&stmts, NEGATE_EXPR, sizetype, tem);
> + enddiff = gimple_build (&stmts, POINTER_PLUS_EXPR,
> + TREE_TYPE (enddiff),
> + enddiff, tem);
> + }
> + else
> + enddiff = gimple_build (&stmts, MINUS_EXPR, TREE_TYPE (enddiff),
> + enddiff, guard_init);
> +
> + /* Compute end-(beg-beg2). */
> + gimple_seq stmts2;
> + tree newbound = force_gimple_operand (niter->bound, &stmts2,
> + true, NULL_TREE);
> + gimple_seq_add_seq_without_update (&stmts, stmts2);
> +
> + if (POINTER_TYPE_P (TREE_TYPE (enddiff))
> + || POINTER_TYPE_P (TREE_TYPE (newbound)))
> + {
> + enddiff = gimple_convert (&stmts, sizetype, enddiff);
> + enddiff = gimple_build (&stmts, NEGATE_EXPR, sizetype, enddiff);
> + newbound = gimple_build (&stmts, POINTER_PLUS_EXPR,
> + TREE_TYPE (newbound),
> + newbound, enddiff);
> + }
> + else
> + newbound = gimple_build (&stmts, MINUS_EXPR, TREE_TYPE (enddiff),
> + newbound, enddiff);
> +
> + /* Depending on the direction of the IVs the new bound for the first
> + loop is the minimum or maximum of old bound and border.
> + Also, if the guard condition isn't strictly less or greater,
> + we need to adjust the bound. */
> + int addbound = 0;
> + enum tree_code minmax;
> + if (niter->cmp == LT_EXPR)
> + {
> + /* GT and LE are the same, inverted. */
> + if (guard_code == GT_EXPR || guard_code == LE_EXPR)
> + addbound = -1;
> + minmax = MIN_EXPR;
> + }
> + else
> + {
> + gcc_assert (niter->cmp == GT_EXPR);
> + if (guard_code == GE_EXPR || guard_code == LT_EXPR)
> + addbound = 1;
> + minmax = MAX_EXPR;
> + }
> +
> + if (addbound)
> + {
> + tree type2 = TREE_TYPE (newbound);
> + if (POINTER_TYPE_P (type2))
> + type2 = sizetype;
> + newbound = gimple_build (&stmts,
> + POINTER_TYPE_P (TREE_TYPE (newbound))
> + ? POINTER_PLUS_EXPR : PLUS_EXPR,
> + TREE_TYPE (newbound),
> + newbound,
> + build_int_cst (type2, addbound));
> + }
> +
> + tree newend = gimple_build (&stmts, minmax, TREE_TYPE (border),
> + border, newbound);
> + if (stmts)
> + gsi_insert_seq_on_edge_immediate (loop_preheader_edge (floop),
> + stmts);
> +
> + /* Now patch the exit block of the first loop to compare
> + the post-increment value of the guarding IV with the new end
> + value. */
> + tree new_guard_next = PHI_ARG_DEF_FROM_EDGE (new_guard_phi,
> + loop_latch_edge (floop));
> + patch_loop_exit (floop, guard_stmt, new_guard_next, newend,
> + initial_true);
> +
> + /* Finally patch out the two copies of the condition to be always
> + true/false (or opposite). */
> + gcond *force_true = as_a<gcond *> (last_stmt (get_bb_copy (bbs[i])));
> + gcond *force_false = as_a<gcond *> (last_stmt (bbs[i]));
> + if (!initial_true)
> + std::swap (force_true, force_false);
> + gimple_cond_make_true (force_true);
> + gimple_cond_make_false (force_false);
> + update_stmt (force_true);
> + update_stmt (force_false);
> +
> + free_original_copy_tables ();
> +
> + /* We destroyed LCSSA form above. Eventually we might be able
> + to fix it on the fly, for now simply punt and use the helper. */
> + rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, floop);
> +
> + changed = true;
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, ";; Loop split.\n");
> +
> + /* Only deal with the first opportunity. */
> + break;
> + }
> +
> + free (bbs);
> + return changed;
> +}
> +
> +/* Main entry point. Perform loop splitting on all suitable loops. */
> +
> +static unsigned int
> +tree_ssa_split_loops (void)
> +{
> + struct loop *loop;
> + bool changed = false;
> +
> + gcc_assert (scev_initialized_p ());
> + /* Go through all loops starting from innermost. */
> + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> + {
> + struct tree_niter_desc niter;
> + if (single_exit (loop)
> + /* ??? We could handle non-empty latches when we split
> + the latch edge (not the exit edge), and put the new
> + exit condition in the new block. OTOH this executes some
> + code unconditionally that might have been skipped by the
> + original exit before. */
> + && empty_block_p (loop->latch)
> + && !optimize_loop_for_size_p (loop)
> + && number_of_iterations_exit (loop, single_exit (loop), &niter,
> + false, true)
> + /* We can't yet handle loops controlled by a != predicate. */
> + && niter.cmp != NE_EXPR)
> + changed |= split_loop (loop, &niter);
> + }
> +
> + if (changed)
> + return TODO_cleanup_cfg;
> + return 0;
> +}
> +
> +/* Loop splitting pass. */
> +
> +namespace {
> +
> +const pass_data pass_data_loop_split =
> +{
> + GIMPLE_PASS, /* type */
> + "lsplit", /* name */
> + OPTGROUP_LOOP, /* optinfo_flags */
> + TV_LOOP_SPLIT, /* tv_id */
> + PROP_cfg, /* properties_required */
> + 0, /* properties_provided */
> + 0, /* properties_destroyed */
> + 0, /* todo_flags_start */
> + 0, /* todo_flags_finish */
> +};
> +
> +class pass_loop_split : public gimple_opt_pass
> +{
> +public:
> + pass_loop_split (gcc::context *ctxt)
> + : gimple_opt_pass (pass_data_loop_split, ctxt)
> + {}
> +
> + /* opt_pass methods: */
> + virtual bool gate (function *) { return optimize >= 2; }
> + virtual unsigned int execute (function *);
> +
> +}; // class pass_loop_split
> +
> +unsigned int
> +pass_loop_split::execute (function *fun)
> +{
> + if (number_of_loops (fun) <= 1)
> + return 0;
> +
> + return tree_ssa_split_loops ();
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_loop_split (gcc::context *ctxt)
> +{
> + return new pass_loop_split (ctxt);
> +}
> Index: testsuite/gcc.dg/loop-split.c
> ===================================================================
> --- testsuite/gcc.dg/loop-split.c (revision 0)
> +++ testsuite/gcc.dg/loop-split.c (working copy)
> @@ -0,0 +1,141 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-lsplit-details" } */
> +
> +#ifdef __cplusplus
> +extern "C" int printf (const char *, ...);
> +extern "C" void abort (void);
> +#else
> +extern int printf (const char *, ...);
> +extern void abort (void);
> +#endif
> +
> +#ifndef TRACE
> +#define TRACE 0
> +#endif
> +
> +#define loop(beg,step,beg2,cond1,cond2) \
> + do \
> + { \
> + sum = 0; \
> + for (i = (beg), j = (beg2); (cond1); i+=(step),j+=(step)) \
> + { \
> + if (cond2) { \
> + if (TRACE > 1) printf ("a: %d %d\n", i, j); \
> + sum += a[i]; \
> + } else { \
> + if (TRACE > 1) printf ("b: %d %d\n", i, j); \
> + sum += b[i]; \
> + } \
> + } \
> + if (TRACE > 0) printf ("sum: %d\n", sum); \
> + check = check * 47 + sum; \
> + } while (0)
> +
> +#if 1
> +unsigned __attribute__((noinline, noclone)) dotest (int beg, int end, int step,
> + int c, int *a, int *b, int beg2)
> +{
> + unsigned check = 0;
> + int sum;
> + int i, j;
> + loop (beg, 1, beg2, i < end, j < c);
> + loop (beg, 1, beg2, i <= end, j < c);
> + loop (beg, 1, beg2, i < end, j <= c);
> + loop (beg, 1, beg2, i <= end, j <= c);
> + loop (beg, 1, beg2, i < end, j > c);
> + loop (beg, 1, beg2, i <= end, j > c);
> + loop (beg, 1, beg2, i < end, j >= c);
> + loop (beg, 1, beg2, i <= end, j >= c);
> + beg2 += end-beg;
> + loop (end, -1, beg2, i >= beg, j >= c);
> + loop (end, -1, beg2, i >= beg, j > c);
> + loop (end, -1, beg2, i > beg, j >= c);
> + loop (end, -1, beg2, i > beg, j > c);
> + loop (end, -1, beg2, i >= beg, j <= c);
> + loop (end, -1, beg2, i >= beg, j < c);
> + loop (end, -1, beg2, i > beg, j <= c);
> + loop (end, -1, beg2, i > beg, j < c);
> + return check;
> +}
> +
> +#else
> +
> +int __attribute__((noinline, noclone)) f (int beg, int end, int step,
> + int c, int *a, int *b, int beg2)
> +{
> + int sum = 0;
> + int i, j;
> + //for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
> + for (i = end, j = beg2 + (end-beg); i > beg; i += -1, j-- /*step*/)
> + {
> + // i - j == X --> i = X + j
> + // --> i < end == X+j < end == j < end - X
> + // --> newend = end - (i_init - j_init)
> + // j < end-X && j < c --> j < min(end-X,c)
> + // j < end-X && j <= c --> j <= min(end-X-1,c) or j < min(end-X,c+1{OF!})
> + //if (j < c)
> + if (j >= c)
> + printf ("a: %d %d\n", i, j);
> + /*else
> + printf ("b: %d %d\n", i, j);*/
> + /*sum += a[i];
> + else
> + sum += b[i];*/
> + }
> + return sum;
> +}
> +
> +int __attribute__((noinline, noclone)) f2 (int *beg, int *end, int step,
> + int *c, int *a, int *b, int *beg2)
> +{
> + int sum = 0;
> + int *i, *j;
> + for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
> + {
> + if (j <= c)
> + printf ("%d %d\n", i - beg, j - beg);
> + /*sum += a[i];
> + else
> + sum += b[i];*/
> + }
> + return sum;
> +}
> +#endif
> +
> +extern int printf (const char *, ...);
> +
> +int main ()
> +{
> + int a[] = {0,0,0,0,0, 1,2,3,4,5,6,7,8,9, 0,0,0,0,0};
> + int b[] = {0,0,0,0,0, -1,-2,-3,-4,-5,-6,-7,-8,-9, 0,0,0,0,0,};
> + int c;
> + int diff = 0;
> + unsigned check = 0;
> + //dotest (0, 9, 1, -1, a+5, b+5, -1);
> + //return 0;
> + //f (0, 9, 1, -1, a+5, b+5, -1);
> + //return 0;
> + for (diff = -5; diff <= 5; diff++)
> + {
> + for (c = -1; c <= 10; c++)
> + {
> +#if 0
> + int s = f (0, 9, 1, c, a+5, b+5, diff);
> + //int s = f2 (a+0, a+9, 1, a+c, a+5, b+5, a+diff);
> + printf ("%d ", s);
> +#else
> + if (TRACE > 0)
> + printf ("check %d %d\n", c, diff);
> + check = check * 51 + dotest (0, 9, 1, c, a+5, b+5, diff);
> +#endif
> + }
> + //printf ("\n");
> + }
> + //printf ("%u\n", check);
> + if (check != 3213344948)
> + abort ();
> + return 0;
> +}
> +
> +/* All 16 loops in dotest should be split. */
> +/* { dg-final { scan-tree-dump-times "Loop split" 16 "lsplit" } } */
* Re: Gimple loop splitting
2016-07-25 7:00 ` Gimple loop splitting Andrew Pinski
@ 2016-07-25 14:27 ` Michael Matz
0 siblings, 0 replies; 20+ messages in thread
From: Michael Matz @ 2016-07-25 14:27 UTC (permalink / raw)
To: Andrew Pinski; +Cc: GCC Patches
Hi,
On Sun, 24 Jul 2016, Andrew Pinski wrote:
> What ever happened to this patch?
It got accepted, but I deferred inclusion in GCC 6 because it
was late in the cycle then and performance results didn't show large
improvements (I only looked at cpu2006). No regressions, but no nice
speedups either.
> I was looking into doing this myself today but I found this patch. It is
> stage 1 of GCC 7, it might be a good idea to get this patch into GCC.
Indeed. If you want to performance-test it on something where you know it
should help, I'm all ears.
Ciao,
Michael.
* Re: Gimple loop splitting v2
2015-12-02 13:23 ` Michael Matz
2015-12-05 7:55 ` Jeff Law
@ 2016-07-25 20:57 ` Andrew Pinski
2016-07-26 11:32 ` Richard Biener
1 sibling, 1 reply; 20+ messages in thread
From: Andrew Pinski @ 2016-07-25 20:57 UTC (permalink / raw)
To: Michael Matz; +Cc: Jeff Law, GCC Patches
On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Tue, 1 Dec 2015, Jeff Law wrote:
>
>> > So, okay for trunk?
>> -ENOPATCH
>
> Sigh :)
> Here it is.
I found one problem with it.
Take:
void f(int *a, int M, int *b)
{
for(int i = 0; i <= M; i++)
{
if (i < M)
a[i] = i;
}
}
---- CUT ---
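For reference, here is a rough hand-written sketch of what splitting this
loop amounts to at the source level. It is not the GIMPLE the pass emits;
the names f_orig, f_split, same_result and N are made up for illustration,
and the unused parameter b is dropped. The guard i < M holds for every
iteration except the last one (i == M), so the first loop takes all the
stores and the second loop is left with an empty body.

```c
#include <assert.h>
#include <string.h>

enum { N = 16 };  /* Illustrative buffer size, not part of the report.  */

/* The loop from the report, minus the unused parameter.  */
static void
f_orig (int *a, int M)
{
  for (int i = 0; i <= M; i++)
    if (i < M)
      a[i] = i;
}

/* Hand-split version: the first loop covers the iterations where the
   guard is true, the second loop runs the remaining iterations with
   the guard known to be false, so its body is empty.  */
static void
f_split (int *a, int M)
{
  int i = 0;
  for (; i < M; i++)
    a[i] = i;
  for (; i <= M; i++)
    ;
}

/* Run both versions on identical buffers and compare the results.
   Returns nonzero when they agree.  M must stay below N.  */
static int
same_result (int M)
{
  int x[N], y[N];
  assert (M < N);
  memset (x, 0xff, sizeof x);
  memset (y, 0xff, sizeof y);
  f_orig (x, M);
  f_split (y, M);
  return memcmp (x, y, sizeof x) == 0;
}
```

Calling same_result for a few small M (including a negative one, where
neither loop runs) is enough to see that the two forms store the same
values. Once the second loop is empty, the first one is a plain
counted loop, which is the shape the vectorizer wants.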
There are two issues with the code as below. The outermost loop's
aux is still set, which causes the vectorizer not to vectorize the loop.
The other issue is that I need to run pass_scev_cprop after pass_loop_split
to get the induction variable usage after the loop is gone, so the
vectorizer will work.
Something like (note this is copy and paste from a terminal):
diff --git a/gcc/passes.def b/gcc/passes.def
index c327900..e8d6ea6 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -262,8 +262,8 @@ along with GCC; see the file COPYING3. If not see
NEXT_PASS (pass_copy_prop);
NEXT_PASS (pass_dce);
NEXT_PASS (pass_tree_unswitch);
- NEXT_PASS (pass_scev_cprop);
NEXT_PASS (pass_loop_split);
+ NEXT_PASS (pass_scev_cprop);
NEXT_PASS (pass_record_bounds);
NEXT_PASS (pass_loop_distribution);
NEXT_PASS (pass_copy_prop);
diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index 5411530..e72ef19 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -592,7 +592,11 @@ tree_ssa_split_loops (void)
gcc_assert (scev_initialized_p ());
FOR_EACH_LOOP (loop, 0)
- loop->aux = NULL;
+ {
+ loop->aux = NULL;
+ if (loop_outer (loop))
+ loop_outer (loop)->aux = NULL;
+ }
/* Go through all loops starting from innermost. */
FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
@@ -631,7 +635,11 @@ tree_ssa_split_loops (void)
}
FOR_EACH_LOOP (loop, 0)
- loop->aux = NULL;
+ {
+ loop->aux = NULL;
+ if (loop_outer (loop))
+ loop_outer (loop)->aux = NULL;
+ }
if (changed)
return TODO_cleanup_cfg;
----- CUT -----
Thanks,
Andrew
>
>
> Ciao,
> Michael.
> * common.opt (-fsplit-loops): New flag.
> * passes.def (pass_loop_split): Add.
> * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
> (enable_fdo_optimizations): Add loop splitting.
> * timevar.def (TV_LOOP_SPLIT): Add.
> * tree-pass.h (make_pass_loop_split): Declare.
> * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
> * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
> * tree-ssa-loop-split.c: New file.
> * Makefile.in (OBJS): Add tree-ssa-loop-split.o.
> * doc/invoke.texi (fsplit-loops): Document.
> * doc/passes.texi (Loop optimization): Add paragraph about loop
> splitting.
>
> testsuite/
> * gcc.dg/loop-split.c: New test.
>
> Index: common.opt
> ===================================================================
> --- common.opt (revision 231115)
> +++ common.opt (working copy)
> @@ -2453,6 +2457,10 @@ funswitch-loops
> Common Report Var(flag_unswitch_loops) Optimization
> Perform loop unswitching.
>
> +fsplit-loops
> +Common Report Var(flag_split_loops) Optimization
> +Perform loop splitting.
> +
> funwind-tables
> Common Report Var(flag_unwind_tables) Optimization
> Just generate unwind tables for exception handling.
> Index: passes.def
> ===================================================================
> --- passes.def (revision 231115)
> +++ passes.def (working copy)
> @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3.
> NEXT_PASS (pass_dce);
> NEXT_PASS (pass_tree_unswitch);
> NEXT_PASS (pass_scev_cprop);
> + NEXT_PASS (pass_loop_split);
> NEXT_PASS (pass_record_bounds);
> NEXT_PASS (pass_loop_distribution);
> NEXT_PASS (pass_copy_prop);
> Index: opts.c
> ===================================================================
> --- opts.c (revision 231115)
> +++ opts.c (working copy)
> @@ -532,6 +532,7 @@ static const struct default_options defa
> regardless of them being declared inline. */
> { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },
> { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 },
> + { OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
> { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
> { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
> { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
> @@ -1411,6 +1412,8 @@ enable_fdo_optimizations (struct gcc_opt
> opts->x_flag_ipa_cp_alignment = value;
> if (!opts_set->x_flag_predictive_commoning)
> opts->x_flag_predictive_commoning = value;
> + if (!opts_set->x_flag_split_loops)
> + opts->x_flag_split_loops = value;
> if (!opts_set->x_flag_unswitch_loops)
> opts->x_flag_unswitch_loops = value;
> if (!opts_set->x_flag_gcse_after_reload)
> Index: timevar.def
> ===================================================================
> --- timevar.def (revision 231115)
> +++ timevar.def (working copy)
> @@ -182,6 +182,7 @@ DEFTIMEVAR (TV_LIM , "
> DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv")
> DEFTIMEVAR (TV_SCEV_CONST , "scev constant prop")
> DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH , "tree loop unswitching")
> +DEFTIMEVAR (TV_LOOP_SPLIT , "loop splitting")
> DEFTIMEVAR (TV_COMPLETE_UNROLL , "complete unrolling")
> DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
> DEFTIMEVAR (TV_TREE_VECTORIZATION , "tree vectorization")
> Index: tree-pass.h
> ===================================================================
> --- tree-pass.h (revision 231115)
> +++ tree-pass.h (working copy)
> @@ -370,6 +370,7 @@ extern gimple_opt_pass *make_pass_tree_n
> extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
> Index: tree-ssa-loop-manip.h
> ===================================================================
> --- tree-ssa-loop-manip.h (revision 231115)
> +++ tree-ssa-loop-manip.h (working copy)
> @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
>
> extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
> bool, tree *, tree *);
> +extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
> + struct loop *);
> extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
> extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
> extern void verify_loop_closed_ssa (bool);
> Index: Makefile.in
> ===================================================================
> --- Makefile.in (revision 231115)
> +++ Makefile.in (working copy)
> @@ -1474,6 +1474,7 @@ OBJS = \
> tree-ssa-loop-manip.o \
> tree-ssa-loop-niter.o \
> tree-ssa-loop-prefetch.o \
> + tree-ssa-loop-split.o \
> tree-ssa-loop-unswitch.o \
> tree-ssa-loop.o \
> tree-ssa-math-opts.o \
> Index: tree-ssa-loop-split.c
> ===================================================================
> --- tree-ssa-loop-split.c (revision 0)
> +++ tree-ssa-loop-split.c (working copy)
> @@ -0,0 +1,686 @@
> +/* Loop splitting.
> + Copyright (C) 2015 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it
> +under the terms of the GNU General Public License as published by the
> +Free Software Foundation; either version 3, or (at your option) any
> +later version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT
> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3. If not see
> +<http://www.gnu.org/licenses/>. */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "tree-pass.h"
> +#include "ssa.h"
> +#include "fold-const.h"
> +#include "tree-cfg.h"
> +#include "tree-ssa.h"
> +#include "tree-ssa-loop-niter.h"
> +#include "tree-ssa-loop.h"
> +#include "tree-ssa-loop-manip.h"
> +#include "tree-into-ssa.h"
> +#include "cfgloop.h"
> +#include "tree-scalar-evolution.h"
> +#include "gimple-iterator.h"
> +#include "gimple-pretty-print.h"
> +#include "cfghooks.h"
> +#include "gimple-fold.h"
> +#include "gimplify-me.h"
> +
> +/* This file implements loop splitting, i.e. transformation of loops like
> +
> + for (i = 0; i < 100; i++)
> + {
> + if (i < 50)
> + A;
> + else
> + B;
> + }
> +
> + into:
> +
> + for (i = 0; i < 50; i++)
> + {
> + A;
> + }
> + for (; i < 100; i++)
> + {
> + B;
> + }
> +
> + */
> +
> +/* Return true when BB inside LOOP is a potential iteration space
> + split point, i.e. ends with a condition like "IV < comp", which
> + is true on one side of the iteration space and false on the other,
> + and the split point can be computed. If so, also return the border
> + point in *BORDER and the comparison induction variable in IV. */
> +
> +static tree
> +split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv)
> +{
> + gimple *last;
> + gcond *stmt;
> + affine_iv iv2;
> +
> + /* BB must end in a simple conditional jump. */
> + last = last_stmt (bb);
> + if (!last || gimple_code (last) != GIMPLE_COND)
> + return NULL_TREE;
> + stmt = as_a <gcond *> (last);
> +
> + enum tree_code code = gimple_cond_code (stmt);
> +
> + /* Only handle relational comparisons, for equality and non-equality
> + we'd have to split the loop into two loops and a middle statement. */
> + switch (code)
> + {
> + case LT_EXPR:
> + case LE_EXPR:
> + case GT_EXPR:
> + case GE_EXPR:
> + break;
> + default:
> + return NULL_TREE;
> + }
> +
> + if (loop_exits_from_bb_p (loop, bb))
> + return NULL_TREE;
> +
> + tree op0 = gimple_cond_lhs (stmt);
> + tree op1 = gimple_cond_rhs (stmt);
> +
> + if (!simple_iv (loop, loop, op0, iv, false))
> + return NULL_TREE;
> + if (!simple_iv (loop, loop, op1, &iv2, false))
> + return NULL_TREE;
> +
> + /* Make it so that the first argument of the condition is
> + the looping one. */
> + if (!integer_zerop (iv2.step))
> + {
> + std::swap (op0, op1);
> + std::swap (*iv, iv2);
> + code = swap_tree_comparison (code);
> + gimple_cond_set_condition (stmt, code, op0, op1);
> + update_stmt (stmt);
> + }
> + else if (integer_zerop (iv->step))
> + return NULL_TREE;
> + if (!integer_zerop (iv2.step))
> + return NULL_TREE;
> +
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + fprintf (dump_file, "Found potential split point: ");
> + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> + fprintf (dump_file, " { ");
> + print_generic_expr (dump_file, iv->base, TDF_SLIM);
> + fprintf (dump_file, " + I*");
> + print_generic_expr (dump_file, iv->step, TDF_SLIM);
> + fprintf (dump_file, " } %s ", get_tree_code_name (code));
> + print_generic_expr (dump_file, iv2.base, TDF_SLIM);
> + fprintf (dump_file, "\n");
> + }
> +
> + *border = iv2.base;
> + return op0;
> +}
> +
> +/* Given a GUARD conditional stmt inside LOOP, which we want to make always
> + true or false depending on INITIAL_TRUE, and adjusted values NEXTVAL
> + (a post-increment IV) and NEWBOUND (the comparator) adjust the loop
> + exit test statement to loop back only if the GUARD statement will
> + also be true/false in the next iteration. */
> +
> +static void
> +patch_loop_exit (struct loop *loop, gcond *guard, tree nextval, tree newbound,
> + bool initial_true)
> +{
> + edge exit = single_exit (loop);
> + gcond *stmt = as_a <gcond *> (last_stmt (exit->src));
> + gimple_cond_set_condition (stmt, gimple_cond_code (guard),
> + nextval, newbound);
> + update_stmt (stmt);
> +
> + edge stay = single_pred_edge (loop->latch);
> +
> + exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
> + stay->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
> +
> + if (initial_true)
> + {
> + exit->flags |= EDGE_FALSE_VALUE;
> + stay->flags |= EDGE_TRUE_VALUE;
> + }
> + else
> + {
> + exit->flags |= EDGE_TRUE_VALUE;
> + stay->flags |= EDGE_FALSE_VALUE;
> + }
> +}
> +
> +/* Given an induction variable GUARD_IV, and its affine descriptor IV,
> + find the loop phi node in LOOP defining it directly, or create
> + such phi node. Return that phi node. */
> +
> +static gphi *
> +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/)
> +{
> + gimple *def = SSA_NAME_DEF_STMT (guard_iv);
> + gphi *phi;
> + if ((phi = dyn_cast <gphi *> (def))
> + && gimple_bb (phi) == loop->header)
> + return phi;
> +
> + /* XXX Create the PHI instead. */
> + return NULL;
> +}
> +
> +/* This function updates the SSA form after connect_loops made a new
> + edge NEW_E leading from LOOP1 exit to LOOP2 (via an intermediate
> + conditional). I.e. the second loop can now be entered either
> + via the original entry or via NEW_E, so the entry values of LOOP2
> + phi nodes are either the original ones or those at the exit
> + of LOOP1. Insert new phi nodes in LOOP2 pre-header reflecting
> + this. */
> +
> +static void
> +connect_loop_phis (struct loop *loop1, struct loop *loop2, edge new_e)
> +{
> + basic_block rest = loop_preheader_edge (loop2)->src;
> + gcc_assert (new_e->dest == rest);
> + edge skip_first = EDGE_PRED (rest, EDGE_PRED (rest, 0) == new_e);
> +
> + edge firste = loop_preheader_edge (loop1);
> + edge seconde = loop_preheader_edge (loop2);
> + edge firstn = loop_latch_edge (loop1);
> + gphi_iterator psi_first, psi_second;
> + for (psi_first = gsi_start_phis (loop1->header),
> + psi_second = gsi_start_phis (loop2->header);
> + !gsi_end_p (psi_first);
> + gsi_next (&psi_first), gsi_next (&psi_second))
> + {
> + tree init, next, new_init;
> + use_operand_p op;
> + gphi *phi_first = psi_first.phi ();
> + gphi *phi_second = psi_second.phi ();
> +
> + init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste);
> + next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn);
> + op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde);
> + gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
> +
> + /* Prefer using original variable as a base for the new ssa name.
> + This is necessary for virtual ops, and useful in order to avoid
> + losing debug info for real ops. */
> + if (TREE_CODE (next) == SSA_NAME
> + && useless_type_conversion_p (TREE_TYPE (next),
> + TREE_TYPE (init)))
> + new_init = copy_ssa_name (next);
> + else if (TREE_CODE (init) == SSA_NAME
> + && useless_type_conversion_p (TREE_TYPE (init),
> + TREE_TYPE (next)))
> + new_init = copy_ssa_name (init);
> + else if (useless_type_conversion_p (TREE_TYPE (next),
> + TREE_TYPE (init)))
> + new_init = make_temp_ssa_name (TREE_TYPE (next), NULL,
> + "unrinittmp");
> + else
> + new_init = make_temp_ssa_name (TREE_TYPE (init), NULL,
> + "unrinittmp");
> +
> + gphi * newphi = create_phi_node (new_init, rest);
> + add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION);
> + add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION);
> + SET_USE (op, new_init);
> + }
> +}
> +
> +/* The two loops LOOP1 and LOOP2 were just created by loop versioning,
> + they are still equivalent and placed in two arms of a diamond, like so:
> +
> + .------if (cond)------.
> + v v
> + pre1 pre2
> + | |
> + .--->h1 h2<----.
> + | | | |
> + | ex1---. .---ex2 |
> + | / | | \ |
> + '---l1 X | l2---'
> + | |
> + | |
> + '--->join<---'
> +
> + This function transforms the program such that LOOP1 is conditionally
> + falling through to LOOP2, or skipping it. This is done by splitting
> + the ex1->join edge at X in the diagram above, and inserting a condition
> + whose one arm goes to pre2, resulting in this situation:
> +
> + .------if (cond)------.
> + v v
> + pre1 .---------->pre2
> + | | |
> + .--->h1 | h2<----.
> + | | | | |
> + | ex1---. | .---ex2 |
> + | / v | | \ |
> + '---l1 skip---' | l2---'
> + | |
> + | |
> + '--->join<---'
> +
> +
> + The condition used is the exit condition of LOOP1, which effectively means
> + that when the first loop exits (for whatever reason) but the real original
> + exit expression is still false the second loop will be entered.
> + The function returns the new edge cond->pre2.
> +
> + This doesn't update the SSA form, see connect_loop_phis for that. */
> +
> +static edge
> +connect_loops (struct loop *loop1, struct loop *loop2)
> +{
> + edge exit = single_exit (loop1);
> + basic_block skip_bb = split_edge (exit);
> + gcond *skip_stmt;
> + gimple_stmt_iterator gsi;
> + edge new_e, skip_e;
> +
> + gimple *stmt = last_stmt (exit->src);
> + skip_stmt = gimple_build_cond (gimple_cond_code (stmt),
> + gimple_cond_lhs (stmt),
> + gimple_cond_rhs (stmt),
> + NULL_TREE, NULL_TREE);
> + gsi = gsi_last_bb (skip_bb);
> + gsi_insert_after (&gsi, skip_stmt, GSI_NEW_STMT);
> +
> + skip_e = EDGE_SUCC (skip_bb, 0);
> + skip_e->flags &= ~EDGE_FALLTHRU;
> + new_e = make_edge (skip_bb, loop_preheader_edge (loop2)->src, 0);
> + if (exit->flags & EDGE_TRUE_VALUE)
> + {
> + skip_e->flags |= EDGE_TRUE_VALUE;
> + new_e->flags |= EDGE_FALSE_VALUE;
> + }
> + else
> + {
> + skip_e->flags |= EDGE_FALSE_VALUE;
> + new_e->flags |= EDGE_TRUE_VALUE;
> + }
> +
> + new_e->count = skip_bb->count;
> + new_e->probability = PROB_LIKELY;
> + new_e->count = apply_probability (skip_e->count, PROB_LIKELY);
> + skip_e->count -= new_e->count;
> + skip_e->probability = inverse_probability (PROB_LIKELY);
> +
> + return new_e;
> +}
> +
> +/* This returns the new bound for iterations given the original iteration
> + space in NITER, an arbitrary new bound BORDER, assumed to be some
> + comparison value with a different IV, the initial value GUARD_INIT of
> + that other IV, and the comparison code GUARD_CODE that compares
> + that other IV with BORDER. We return an SSA name, and place any
> + necessary statements for that computation into *STMTS.
> +
> + For example for such a loop:
> +
> + for (i = beg, j = guard_init; i < end; i++, j++)
> + if (j < border) // this is supposed to be true/false
> + ...
> +
> + we want to return a new bound (on j) that makes the loop iterate
> + as long as the condition j < border stays true. We also don't want
> + to iterate more often than the original loop, so we have to introduce
> + some cut-off as well (via min/max), effectively resulting in:
> +
> + newend = min (end+guard_init-beg, border)
> + for (i = beg, j = guard_init; j < newend; i++, j++)
> + if (j < border)
> + ...
> +
> + Depending on the direction of the IVs and if the exit tests
> + are strict or non-strict we need to use MIN or MAX,
> + and add or subtract 1. This routine computes newend above. */
> +
> +static tree
> +compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter,
> + tree border,
> + enum tree_code guard_code, tree guard_init)
> +{
> + /* The niter structure contains the after-increment IV, we need
> + the loop-enter base, so subtract STEP once. */
> + tree controlbase = force_gimple_operand (niter->control.base,
> + stmts, true, NULL_TREE);
> + tree controlstep = niter->control.step;
> + tree enddiff;
> + if (POINTER_TYPE_P (TREE_TYPE (controlbase)))
> + {
> + controlstep = gimple_build (stmts, NEGATE_EXPR,
> + TREE_TYPE (controlstep), controlstep);
> + enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
> + TREE_TYPE (controlbase),
> + controlbase, controlstep);
> + }
> + else
> + enddiff = gimple_build (stmts, MINUS_EXPR,
> + TREE_TYPE (controlbase),
> + controlbase, controlstep);
> +
> + /* Compute beg-guard_init. */
> + if (POINTER_TYPE_P (TREE_TYPE (enddiff)))
> + {
> + tree tem = gimple_convert (stmts, sizetype, guard_init);
> + tem = gimple_build (stmts, NEGATE_EXPR, sizetype, tem);
> + enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
> + TREE_TYPE (enddiff),
> + enddiff, tem);
> + }
> + else
> + enddiff = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
> + enddiff, guard_init);
> +
> + /* Compute end-(beg-guard_init). */
> + gimple_seq stmts2;
> + tree newbound = force_gimple_operand (niter->bound, &stmts2,
> + true, NULL_TREE);
> + gimple_seq_add_seq_without_update (stmts, stmts2);
> +
> + if (POINTER_TYPE_P (TREE_TYPE (enddiff))
> + || POINTER_TYPE_P (TREE_TYPE (newbound)))
> + {
> + enddiff = gimple_convert (stmts, sizetype, enddiff);
> + enddiff = gimple_build (stmts, NEGATE_EXPR, sizetype, enddiff);
> + newbound = gimple_build (stmts, POINTER_PLUS_EXPR,
> + TREE_TYPE (newbound),
> + newbound, enddiff);
> + }
> + else
> + newbound = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
> + newbound, enddiff);
> +
> + /* Depending on the direction of the IVs the new bound for the first
> + loop is the minimum or maximum of old bound and border.
> + Also, if the guard condition isn't strictly less or greater,
> + we need to adjust the bound. */
> + int addbound = 0;
> + enum tree_code minmax;
> + if (niter->cmp == LT_EXPR)
> + {
> + /* GT and LE are the same, inverted. */
> + if (guard_code == GT_EXPR || guard_code == LE_EXPR)
> + addbound = -1;
> + minmax = MIN_EXPR;
> + }
> + else
> + {
> + gcc_assert (niter->cmp == GT_EXPR);
> + if (guard_code == GE_EXPR || guard_code == LT_EXPR)
> + addbound = 1;
> + minmax = MAX_EXPR;
> + }
> +
> + if (addbound)
> + {
> + tree type2 = TREE_TYPE (newbound);
> + if (POINTER_TYPE_P (type2))
> + type2 = sizetype;
> + newbound = gimple_build (stmts,
> + POINTER_TYPE_P (TREE_TYPE (newbound))
> + ? POINTER_PLUS_EXPR : PLUS_EXPR,
> + TREE_TYPE (newbound),
> + newbound,
> + build_int_cst (type2, addbound));
> + }
> +
> + tree newend = gimple_build (stmts, minmax, TREE_TYPE (border),
> + border, newbound);
> + return newend;
> +}
> +
> +/* Checks if LOOP contains a conditional block whose condition
> + depends on which side of the iteration space it is in, and if so
> + splits the iteration space into two loops. Returns true if the
> + loop was split. NITER must contain the iteration descriptor for the
> + single exit of LOOP. */
> +
> +static bool
> +split_loop (struct loop *loop1, struct tree_niter_desc *niter)
> +{
> + basic_block *bbs;
> + unsigned i;
> + bool changed = false;
> + tree guard_iv;
> + tree border;
> + affine_iv iv;
> +
> + bbs = get_loop_body (loop1);
> +
> + /* Find a splitting opportunity. */
> + for (i = 0; i < loop1->num_nodes; i++)
> + if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
> + {
> + /* Handling opposite steps is not implemented yet. Neither
> + is handling different step sizes. */
> + if ((tree_int_cst_sign_bit (iv.step)
> + != tree_int_cst_sign_bit (niter->control.step))
> + || !tree_int_cst_equal (iv.step, niter->control.step))
> + continue;
> +
> + /* Find a loop PHI node that defines guard_iv directly,
> + or create one doing that. */
> + gphi *phi = find_or_create_guard_phi (loop1, guard_iv, &iv);
> + if (!phi)
> + continue;
> + gcond *guard_stmt = as_a<gcond *> (last_stmt (bbs[i]));
> + tree guard_init = PHI_ARG_DEF_FROM_EDGE (phi,
> + loop_preheader_edge (loop1));
> + enum tree_code guard_code = gimple_cond_code (guard_stmt);
> +
> + /* Loop splitting is implemented by versioning the loop, placing
> + the new loop after the old loop, making the first loop iterate
> + as long as the conditional stays true (or false) and letting the
> + second (new) loop handle the rest of the iterations.
> +
> + First we need to determine if the condition will start being true
> + or false in the first loop. */
> + bool initial_true;
> + switch (guard_code)
> + {
> + case LT_EXPR:
> + case LE_EXPR:
> + initial_true = !tree_int_cst_sign_bit (iv.step);
> + break;
> + case GT_EXPR:
> + case GE_EXPR:
> + initial_true = tree_int_cst_sign_bit (iv.step);
> + break;
> + default:
> + gcc_unreachable ();
> + }
> +
> + /* Build a condition that will skip the first loop when the
> + guard condition won't ever be true (or false). */
> + gimple_seq stmts2;
> + border = force_gimple_operand (border, &stmts2, true, NULL_TREE);
> + if (stmts2)
> + gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
> + stmts2);
> + tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
> + if (!initial_true)
> + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> +
> + /* Now version the loop, placing loop2 after loop1 connecting
> + them, and fix up SSA form for that. */
> + initialize_original_copy_tables ();
> + basic_block cond_bb;
> + struct loop *loop2 = loop_version (loop1, cond, &cond_bb,
> + REG_BR_PROB_BASE, REG_BR_PROB_BASE,
> + REG_BR_PROB_BASE, true);
> + gcc_assert (loop2);
> + update_ssa (TODO_update_ssa);
> +
> + edge new_e = connect_loops (loop1, loop2);
> + connect_loop_phis (loop1, loop2, new_e);
> +
> + /* The iterations of the second loop are now already
> + exactly those that the first loop didn't do, but the
> + iteration space of the first loop is still the original one.
> + Compute the new bound for the guarding IV and patch the
> + loop exit to use it instead of original IV and bound. */
> + gimple_seq stmts = NULL;
> + tree newend = compute_new_first_bound (&stmts, niter, border,
> + guard_code, guard_init);
> + if (stmts)
> + gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
> + stmts);
> + tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
> + patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
> +
> + /* Finally patch out the two copies of the condition to be always
> + true/false (or opposite). */
> + gcond *force_true = as_a<gcond *> (last_stmt (bbs[i]));
> + gcond *force_false = as_a<gcond *> (last_stmt (get_bb_copy (bbs[i])));
> + if (!initial_true)
> + std::swap (force_true, force_false);
> + gimple_cond_make_true (force_true);
> + gimple_cond_make_false (force_false);
> + update_stmt (force_true);
> + update_stmt (force_false);
> +
> + free_original_copy_tables ();
> +
> + /* We destroyed LCSSA form above. Eventually we might be able
> + to fix it on the fly, for now simply punt and use the helper. */
> + rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
> +
> + changed = true;
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, ";; Loop split.\n");
> +
> + /* Only deal with the first opportunity. */
> + break;
> + }
> +
> + free (bbs);
> + return changed;
> +}
> +
> +/* Main entry point. Perform loop splitting on all suitable loops. */
> +
> +static unsigned int
> +tree_ssa_split_loops (void)
> +{
> + struct loop *loop;
> + bool changed = false;
> +
> + gcc_assert (scev_initialized_p ());
> + FOR_EACH_LOOP (loop, 0)
> + loop->aux = NULL;
> +
> + /* Go through all loops starting from innermost. */
> + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> + {
> + struct tree_niter_desc niter;
> + if (loop->aux)
> + {
> + /* If any of our inner loops was split, don't split us,
> + and mark our containing loop as having had splits as well. */
> + loop_outer (loop)->aux = loop;
> + continue;
> + }
> +
> + if (single_exit (loop)
> + /* ??? We could handle non-empty latches when we split
> + the latch edge (not the exit edge), and put the new
> + exit condition in the new block. OTOH this executes some
> + code unconditionally that might have been skipped by the
> + original exit before. */
> + && empty_block_p (loop->latch)
> + && !optimize_loop_for_size_p (loop)
> + && number_of_iterations_exit (loop, single_exit (loop), &niter,
> + false, true)
> + && niter.cmp != ERROR_MARK
> + /* We can't yet handle loops controlled by a != predicate. */
> + && niter.cmp != NE_EXPR)
> + {
> + if (split_loop (loop, &niter))
> + {
> + /* Mark our containing loop as having had some split inner
> + loops. */
> + loop_outer (loop)->aux = loop;
> + changed = true;
> + }
> + }
> + }
> +
> + FOR_EACH_LOOP (loop, 0)
> + loop->aux = NULL;
> +
> + if (changed)
> + return TODO_cleanup_cfg;
> + return 0;
> +}
> +
> +/* Loop splitting pass. */
> +
> +namespace {
> +
> +const pass_data pass_data_loop_split =
> +{
> + GIMPLE_PASS, /* type */
> + "lsplit", /* name */
> + OPTGROUP_LOOP, /* optinfo_flags */
> + TV_LOOP_SPLIT, /* tv_id */
> + PROP_cfg, /* properties_required */
> + 0, /* properties_provided */
> + 0, /* properties_destroyed */
> + 0, /* todo_flags_start */
> + 0, /* todo_flags_finish */
> +};
> +
> +class pass_loop_split : public gimple_opt_pass
> +{
> +public:
> + pass_loop_split (gcc::context *ctxt)
> + : gimple_opt_pass (pass_data_loop_split, ctxt)
> + {}
> +
> + /* opt_pass methods: */
> + virtual bool gate (function *) { return flag_split_loops != 0; }
> + virtual unsigned int execute (function *);
> +
> +}; // class pass_loop_split
> +
> +unsigned int
> +pass_loop_split::execute (function *fun)
> +{
> + if (number_of_loops (fun) <= 1)
> + return 0;
> +
> + return tree_ssa_split_loops ();
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_loop_split (gcc::context *ctxt)
> +{
> + return new pass_loop_split (ctxt);
> +}
> Index: doc/invoke.texi
> ===================================================================
> --- doc/invoke.texi (revision 231115)
> +++ doc/invoke.texi (working copy)
> @@ -446,7 +446,7 @@ Objective-C and Objective-C++ Dialects}.
> -fselective-scheduling -fselective-scheduling2 @gol
> -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol
> -fsemantic-interposition -fshrink-wrap -fsignaling-nans @gol
> --fsingle-precision-constant -fsplit-ivs-in-unroller @gol
> +-fsingle-precision-constant -fsplit-ivs-in-unroller -fsplit-loops @gol
> -fsplit-paths @gol
> -fsplit-wide-types -fssa-backprop -fssa-phiopt @gol
> -fstack-protector -fstack-protector-all -fstack-protector-strong @gol
> @@ -10197,6 +10197,11 @@ Enabled with @option{-fprofile-use}.
> Enables the loop invariant motion pass in the RTL loop optimizer. Enabled
> at level @option{-O1}
>
> +@item -fsplit-loops
> +@opindex fsplit-loops
> +Split a loop into two if it contains a condition that's always true
> +for one side of the iteration space and false for the other.
> +
> @item -funswitch-loops
> @opindex funswitch-loops
> Move branches with loop invariant conditions out of the loop, with duplicates
> Index: doc/passes.texi
> ===================================================================
> --- doc/passes.texi (revision 231115)
> +++ doc/passes.texi (working copy)
> @@ -484,6 +484,12 @@ out of the loops. To achieve this, a du
> each possible outcome of conditional jump(s). The pass is implemented in
> @file{tree-ssa-loop-unswitch.c}.
>
> +Loop splitting. If a loop contains a conditional statement that is
> +always true for one part of the iteration space and false for the other,
> +this pass splits the loop into two, one dealing with one side and the
> +other dealing with the other, thereby removing one inner-loop conditional. The
> +pass is implemented in @file{tree-ssa-loop-split.c}.
> +
> The optimizations also use various utility functions contained in
> @file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and
> @file{cfgloopmanip.c}.
> Index: testsuite/gcc.dg/loop-split.c
> ===================================================================
> --- testsuite/gcc.dg/loop-split.c (revision 0)
> +++ testsuite/gcc.dg/loop-split.c (working copy)
> @@ -0,0 +1,147 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
> +
> +#ifdef __cplusplus
> +extern "C" int printf (const char *, ...);
> +extern "C" void abort (void);
> +#else
> +extern int printf (const char *, ...);
> +extern void abort (void);
> +#endif
> +
> +/* Define TRACE to 1 or 2 to get detailed tracing.
> + Define SINGLE_TEST to 1 or 2 to get a simple routine with
> + just one loop, called only one time or with multiple parameters,
> + to make debugging easier. */
> +#ifndef TRACE
> +#define TRACE 0
> +#endif
> +
> +#define loop(beg,step,beg2,cond1,cond2) \
> + do \
> + { \
> + sum = 0; \
> + for (i = (beg), j = (beg2); (cond1); i+=(step),j+=(step)) \
> + { \
> + if (cond2) { \
> + if (TRACE > 1) printf ("a: %d %d\n", i, j); \
> + sum += a[i]; \
> + } else { \
> + if (TRACE > 1) printf ("b: %d %d\n", i, j); \
> + sum += b[i]; \
> + } \
> + } \
> + if (TRACE > 0) printf ("sum: %d\n", sum); \
> + check = check * 47 + sum; \
> + } while (0)
> +
> +#ifndef SINGLE_TEST
> +unsigned __attribute__((noinline, noclone)) dotest (int beg, int end, int step,
> + int c, int *a, int *b, int beg2)
> +{
> + unsigned check = 0;
> + int sum;
> + int i, j;
> + loop (beg, 1, beg2, i < end, j < c);
> + loop (beg, 1, beg2, i <= end, j < c);
> + loop (beg, 1, beg2, i < end, j <= c);
> + loop (beg, 1, beg2, i <= end, j <= c);
> + loop (beg, 1, beg2, i < end, j > c);
> + loop (beg, 1, beg2, i <= end, j > c);
> + loop (beg, 1, beg2, i < end, j >= c);
> + loop (beg, 1, beg2, i <= end, j >= c);
> + beg2 += end-beg;
> + loop (end, -1, beg2, i >= beg, j >= c);
> + loop (end, -1, beg2, i >= beg, j > c);
> + loop (end, -1, beg2, i > beg, j >= c);
> + loop (end, -1, beg2, i > beg, j > c);
> + loop (end, -1, beg2, i >= beg, j <= c);
> + loop (end, -1, beg2, i >= beg, j < c);
> + loop (end, -1, beg2, i > beg, j <= c);
> + loop (end, -1, beg2, i > beg, j < c);
> + return check;
> +}
> +
> +#else
> +
> +int __attribute__((noinline, noclone)) f (int beg, int end, int step,
> + int c, int *a, int *b, int beg2)
> +{
> + int sum = 0;
> + int i, j;
> + //for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
> + for (i = end, j = beg2 + (end-beg); i > beg; i += -1, j-- /*step*/)
> + {
> + // i - j == X --> i = X + j
> + // --> i < end == X+j < end == j < end - X
> + // --> newend = end - (i_init - j_init)
> + // j < end-X && j < c --> j < min(end-X,c)
> + // j < end-X && j <= c --> j <= min(end-X-1,c) or j < min(end-X,c+1{OF!})
> + //if (j < c)
> + if (j >= c)
> + printf ("a: %d %d\n", i, j);
> + /*else
> + printf ("b: %d %d\n", i, j);*/
> + /*sum += a[i];
> + else
> + sum += b[i];*/
> + }
> + return sum;
> +}
> +
> +int __attribute__((noinline, noclone)) f2 (int *beg, int *end, int step,
> + int *c, int *a, int *b, int *beg2)
> +{
> + int sum = 0;
> + int *i, *j;
> + for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
> + {
> + if (j <= c)
> + printf ("%d %d\n", i - beg, j - beg);
> + /*sum += a[i];
> + else
> + sum += b[i];*/
> + }
> + return sum;
> +}
> +#endif
> +
> +extern int printf (const char *, ...);
> +
> +int main ()
> +{
> + int a[] = {0,0,0,0,0, 1,2,3,4,5,6,7,8,9, 0,0,0,0,0};
> + int b[] = {0,0,0,0,0, -1,-2,-3,-4,-5,-6,-7,-8,-9, 0,0,0,0,0,};
> + int c;
> + int diff = 0;
> + unsigned check = 0;
> +#if defined(SINGLE_TEST) && (SINGLE_TEST == 1)
> + //dotest (0, 9, 1, -1, a+5, b+5, -1);
> + //return 0;
> + f (0, 9, 1, 5, a+5, b+5, -1);
> + return 0;
> +#endif
> + for (diff = -5; diff <= 5; diff++)
> + {
> + for (c = -1; c <= 10; c++)
> + {
> +#ifdef SINGLE_TEST
> + int s = f (0, 9, 1, c, a+5, b+5, diff);
> + //int s = f2 (a+0, a+9, 1, a+c, a+5, b+5, a+diff);
> + printf ("%d ", s);
> +#else
> + if (TRACE > 0)
> + printf ("check %d %d\n", c, diff);
> + check = check * 51 + dotest (0, 9, 1, c, a+5, b+5, diff);
> +#endif
> + }
> + //printf ("\n");
> + }
> + //printf ("%u\n", check);
> + if (check != 3213344948)
> + abort ();
> + return 0;
> +}
> +
> +/* All 16 loops in dotest should be split. */
> +/* { dg-final { scan-tree-dump-times "Loop split" 16 "lsplit" } } */
* Re: Gimple loop splitting v2
2016-07-25 20:57 ` Andrew Pinski
@ 2016-07-26 11:32 ` Richard Biener
2016-07-27 6:18 ` Andrew Pinski
0 siblings, 1 reply; 20+ messages in thread
From: Richard Biener @ 2016-07-26 11:32 UTC (permalink / raw)
To: Andrew Pinski; +Cc: Michael Matz, Jeff Law, GCC Patches
On Mon, Jul 25, 2016 at 10:57 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz <matz@suse.de> wrote:
>> Hi,
>>
>> On Tue, 1 Dec 2015, Jeff Law wrote:
>>
>>> > So, okay for trunk?
>>> -ENOPATCH
>>
>> Sigh :)
>> Here it is.
>
>
> I found one problem with it.
> Take:
> void f(int *a, int M, int *b)
> {
> for(int i = 0; i <= M; i++)
> {
> if (i < M)
> a[i] = i;
> }
> }
> ---- CUT ---
> There are two issues with the code as below. The outermost loop's
> aux is still set, which causes the vectorizer not to vectorize the loop.
> The other issue is that I need to run pass_scev_cprop after pass_loop_split
> to get the induction variable usage after the loop is gone so the
> vectorizer will work.
I think scev_cprop needs to be rewritten as a utility so that the vectorizer
itself can (within its own cost-model) eliminate an induction using it.
Richard.
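
For reference, the effect of splitting on the testcase quoted above can be
sketched by hand. This is only an illustration, not what the pass emits: the
names f_orig, f_split and run are made up here, the int *b parameter is
dropped because the testcase never uses it, and newend is computed following
the min (end+guard_init-beg, border) comment in compute_new_first_bound,
with `end' taken as M + 1 to turn the non-strict exit test i <= M into a
strict one (small M assumed, so no overflow).

```cpp
#include <algorithm>
#include <vector>

// Original loop from the quoted testcase: the guard `i < M' is true for
// every iteration except the last one (i == M).
static void f_orig (int *a, int M)
{
  for (int i = 0; i <= M; i++)
    if (i < M)
      a[i] = i;
}

// Hand-split equivalent (hypothetical, written for illustration only).
static void f_split (int *a, int M)
{
  int beg = 0, guard_init = 0, border = M;
  int end = M + 1;                     // strict form of `i <= M'
  int newend = std::min (end + guard_init - beg, border);
  int i = beg;
  for (; i < newend; i++)              // guard known to be true here
    a[i] = i;
  for (; i <= M; i++)                  // guard known to be false: no work
    ;
}

// Helper: run FN on a buffer pre-filled with -1 and return the buffer.
static std::vector<int> run (void (*fn) (int *, int), int M)
{
  std::vector<int> buf (M >= 0 ? M + 1 : 1, -1);
  fn (buf.data (), M);
  return buf;
}
```

The second loop ends up with an empty body, so after cleanup only the
induction variable's final value is left to compute, which is presumably
where running scev_cprop afterwards helps.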
> Something like (note this is copy and paste from a terminal):
> diff --git a/gcc/passes.def b/gcc/passes.def
> index c327900..e8d6ea6 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -262,8 +262,8 @@ along with GCC; see the file COPYING3. If not see
> NEXT_PASS (pass_copy_prop);
> NEXT_PASS (pass_dce);
> NEXT_PASS (pass_tree_unswitch);
> - NEXT_PASS (pass_scev_cprop);
> NEXT_PASS (pass_loop_split);
> + NEXT_PASS (pass_scev_cprop);
> NEXT_PASS (pass_record_bounds);
> NEXT_PASS (pass_loop_distribution);
> NEXT_PASS (pass_copy_prop);
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 5411530..e72ef19 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -592,7 +592,11 @@ tree_ssa_split_loops (void)
>
> gcc_assert (scev_initialized_p ());
> FOR_EACH_LOOP (loop, 0)
> - loop->aux = NULL;
> + {
> + loop->aux = NULL;
> + if (loop_outer (loop))
> + loop_outer (loop)->aux = NULL;
> + }
How does the iterator not visit loop_outer (loop)?!
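
A possible answer, stated as an assumption rather than a verified fact:
for a top-level loop, loop_outer returns the root pseudo-loop that stands
for the whole function body, and FOR_EACH_LOOP with flags 0 does not visit
that root unless LI_INCLUDE_ROOT is given. The toy model below (toy_loop,
toy_loops, clear_aux, mark_parent and root_aux_left_set are all invented
names, not GCC code) shows how the root's aux could then survive the
clearing walk.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for GCC's struct loop; only the fields needed here.
struct toy_loop
{
  toy_loop *outer;   // enclosing loop, nullptr only for the root
  void *aux;
};

// Hypothetical function with a single real loop.  Index 0 plays the role
// of the root pseudo-loop, index 1 is the only real loop.
struct toy_loops
{
  std::vector<toy_loop> nodes;
  toy_loops () : nodes (2)
  {
    nodes[0] = { nullptr, nullptr };
    nodes[1] = { &nodes[0], nullptr };
  }
};

// Clear aux on every visited loop.  The root is visited only on request,
// modelling the assumed FOR_EACH_LOOP behaviour without LI_INCLUDE_ROOT.
static void clear_aux (toy_loops &ls, bool include_root)
{
  for (std::size_t i = include_root ? 0 : 1; i < ls.nodes.size (); i++)
    ls.nodes[i].aux = nullptr;
}

// Mimic the pass marking the parent of a loop that was split.
static void mark_parent (toy_loop &l)
{
  l.outer->aux = &l;
}

// Returns true if the root's aux is still set after the clearing walk.
static bool root_aux_left_set (bool include_root)
{
  toy_loops ls;
  mark_parent (ls.nodes[1]);
  clear_aux (ls, include_root);
  return ls.nodes[0].aux != nullptr;
}
```

In this model, walking with the root included would clear the mark without
the extra loop_outer assignments proposed in the quoted hunk.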
>
> /* Go through all loops starting from innermost. */
> FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> @@ -631,7 +635,11 @@ tree_ssa_split_loops (void)
> }
>
> FOR_EACH_LOOP (loop, 0)
> - loop->aux = NULL;
> + {
> + loop->aux = NULL;
> + if (loop_outer (loop))
> + loop_outer (loop)->aux = NULL;
> + }
>
> if (changed)
> return TODO_cleanup_cfg;
> ----- CUT -----
>
> Thanks,
> Andrew
>
>
>>
>>
>> Ciao,
>> Michael.
>> * common.opt (-fsplit-loops): New flag.
>> * passes.def (pass_loop_split): Add.
>> * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
>> (enable_fdo_optimizations): Add loop splitting.
>> * timevar.def (TV_LOOP_SPLIT): Add.
>> * tree-pass.h (make_pass_loop_split): Declare.
>> * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
>> * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h.
>> * tree-ssa-loop-split.c: New file.
>> * Makefile.in (OBJS): Add tree-ssa-loop-split.o.
>> * doc/invoke.texi (fsplit-loops): Document.
>> * doc/passes.texi (Loop optimization): Add paragraph about loop
>> splitting.
>>
>> testsuite/
>> * gcc.dg/loop-split.c: New test.
>>
>> Index: common.opt
>> ===================================================================
>> --- common.opt (revision 231115)
>> +++ common.opt (working copy)
>> @@ -2453,6 +2457,10 @@ funswitch-loops
>> Common Report Var(flag_unswitch_loops) Optimization
>> Perform loop unswitching.
>>
>> +fsplit-loops
>> +Common Report Var(flag_split_loops) Optimization
>> +Perform loop splitting.
>> +
>> funwind-tables
>> Common Report Var(flag_unwind_tables) Optimization
>> Just generate unwind tables for exception handling.
>> Index: passes.def
>> ===================================================================
>> --- passes.def (revision 231115)
>> +++ passes.def (working copy)
>> @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3.
>> NEXT_PASS (pass_dce);
>> NEXT_PASS (pass_tree_unswitch);
>> NEXT_PASS (pass_scev_cprop);
>> + NEXT_PASS (pass_loop_split);
>> NEXT_PASS (pass_record_bounds);
>> NEXT_PASS (pass_loop_distribution);
>> NEXT_PASS (pass_copy_prop);
>> Index: opts.c
>> ===================================================================
>> --- opts.c (revision 231115)
>> +++ opts.c (working copy)
>> @@ -532,6 +532,7 @@ static const struct default_options defa
>> regardless of them being declared inline. */
>> { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },
>> { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 },
>> + { OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
>> { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
>> { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
>> { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
>> @@ -1411,6 +1412,8 @@ enable_fdo_optimizations (struct gcc_opt
>> opts->x_flag_ipa_cp_alignment = value;
>> if (!opts_set->x_flag_predictive_commoning)
>> opts->x_flag_predictive_commoning = value;
>> + if (!opts_set->x_flag_split_loops)
>> + opts->x_flag_split_loops = value;
>> if (!opts_set->x_flag_unswitch_loops)
>> opts->x_flag_unswitch_loops = value;
>> if (!opts_set->x_flag_gcse_after_reload)
>> Index: timevar.def
>> ===================================================================
>> --- timevar.def (revision 231115)
>> +++ timevar.def (working copy)
>> @@ -182,6 +182,7 @@ DEFTIMEVAR (TV_LIM , "
>> DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv")
>> DEFTIMEVAR (TV_SCEV_CONST , "scev constant prop")
>> DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH , "tree loop unswitching")
>> +DEFTIMEVAR (TV_LOOP_SPLIT , "loop splitting")
>> DEFTIMEVAR (TV_COMPLETE_UNROLL , "complete unrolling")
>> DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
>> DEFTIMEVAR (TV_TREE_VECTORIZATION , "tree vectorization")
>> Index: tree-pass.h
>> ===================================================================
>> --- tree-pass.h (revision 231115)
>> +++ tree-pass.h (working copy)
>> @@ -370,6 +370,7 @@ extern gimple_opt_pass *make_pass_tree_n
>> extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
>> extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
>> extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
>> +extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
>> extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
>> extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
>> extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
>> Index: tree-ssa-loop-manip.h
>> ===================================================================
>> --- tree-ssa-loop-manip.h (revision 231115)
>> +++ tree-ssa-loop-manip.h (working copy)
>> @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
>>
>> extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
>> bool, tree *, tree *);
>> +extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
>> + struct loop *);
>> extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
>> extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
>> extern void verify_loop_closed_ssa (bool);
>> Index: Makefile.in
>> ===================================================================
>> --- Makefile.in (revision 231115)
>> +++ Makefile.in (working copy)
>> @@ -1474,6 +1474,7 @@ OBJS = \
>> tree-ssa-loop-manip.o \
>> tree-ssa-loop-niter.o \
>> tree-ssa-loop-prefetch.o \
>> + tree-ssa-loop-split.o \
>> tree-ssa-loop-unswitch.o \
>> tree-ssa-loop.o \
>> tree-ssa-math-opts.o \
>> Index: tree-ssa-loop-split.c
>> ===================================================================
>> --- tree-ssa-loop-split.c (revision 0)
>> +++ tree-ssa-loop-split.c (working copy)
>> @@ -0,0 +1,686 @@
>> +/* Loop splitting.
>> + Copyright (C) 2015 Free Software Foundation, Inc.
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify it
>> +under the terms of the GNU General Public License as published by the
>> +Free Software Foundation; either version 3, or (at your option) any
>> +later version.
>> +
>> +GCC is distributed in the hope that it will be useful, but WITHOUT
>> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
>> +for more details.
>> +
>> +You should have received a copy of the GNU General Public License
>> +along with GCC; see the file COPYING3. If not see
>> +<http://www.gnu.org/licenses/>. */
>> +
>> +#include "config.h"
>> +#include "system.h"
>> +#include "coretypes.h"
>> +#include "backend.h"
>> +#include "tree.h"
>> +#include "gimple.h"
>> +#include "tree-pass.h"
>> +#include "ssa.h"
>> +#include "fold-const.h"
>> +#include "tree-cfg.h"
>> +#include "tree-ssa.h"
>> +#include "tree-ssa-loop-niter.h"
>> +#include "tree-ssa-loop.h"
>> +#include "tree-ssa-loop-manip.h"
>> +#include "tree-into-ssa.h"
>> +#include "cfgloop.h"
>> +#include "tree-scalar-evolution.h"
>> +#include "gimple-iterator.h"
>> +#include "gimple-pretty-print.h"
>> +#include "cfghooks.h"
>> +#include "gimple-fold.h"
>> +#include "gimplify-me.h"
>> +
>> +/* This file implements loop splitting, i.e. transformation of loops like
>> +
>> + for (i = 0; i < 100; i++)
>> + {
>> + if (i < 50)
>> + A;
>> + else
>> + B;
>> + }
>> +
>> + into:
>> +
>> + for (i = 0; i < 50; i++)
>> + {
>> + A;
>> + }
>> + for (; i < 100; i++)
>> + {
>> + B;
>> + }
>> +
>> + */
>> +
>> +/* Return true when BB inside LOOP is a potential iteration space
>> + split point, i.e. ends with a condition like "IV < comp", which
>> + is true on one side of the iteration space and false on the other,
>> + and the split point can be computed. If so, also return the border
>> + point in *BORDER and the comparison induction variable in IV. */
>> +
>> +static tree
>> +split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv)
>> +{
>> + gimple *last;
>> + gcond *stmt;
>> + affine_iv iv2;
>> +
>> + /* BB must end in a simple conditional jump. */
>> + last = last_stmt (bb);
>> + if (!last || gimple_code (last) != GIMPLE_COND)
>> + return NULL_TREE;
>> + stmt = as_a <gcond *> (last);
>> +
>> + enum tree_code code = gimple_cond_code (stmt);
>> +
>> + /* Only handle relational comparisons, for equality and non-equality
>> + we'd have to split the loop into two loops and a middle statement. */
>> + switch (code)
>> + {
>> + case LT_EXPR:
>> + case LE_EXPR:
>> + case GT_EXPR:
>> + case GE_EXPR:
>> + break;
>> + default:
>> + return NULL_TREE;
>> + }
>> +
>> + if (loop_exits_from_bb_p (loop, bb))
>> + return NULL_TREE;
>> +
>> + tree op0 = gimple_cond_lhs (stmt);
>> + tree op1 = gimple_cond_rhs (stmt);
>> +
>> + if (!simple_iv (loop, loop, op0, iv, false))
>> + return NULL_TREE;
>> + if (!simple_iv (loop, loop, op1, &iv2, false))
>> + return NULL_TREE;
>> +
>> + /* Make it so that the first argument of the condition is
>> + the looping one. */
>> + if (!integer_zerop (iv2.step))
>> + {
>> + std::swap (op0, op1);
>> + std::swap (*iv, iv2);
>> + code = swap_tree_comparison (code);
>> + gimple_cond_set_condition (stmt, code, op0, op1);
>> + update_stmt (stmt);
>> + }
>> + else if (integer_zerop (iv->step))
>> + return NULL_TREE;
>> + if (!integer_zerop (iv2.step))
>> + return NULL_TREE;
>> +
>> + if (dump_file && (dump_flags & TDF_DETAILS))
>> + {
>> + fprintf (dump_file, "Found potential split point: ");
>> + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>> + fprintf (dump_file, " { ");
>> + print_generic_expr (dump_file, iv->base, TDF_SLIM);
>> + fprintf (dump_file, " + I*");
>> + print_generic_expr (dump_file, iv->step, TDF_SLIM);
>> + fprintf (dump_file, " } %s ", get_tree_code_name (code));
>> + print_generic_expr (dump_file, iv2.base, TDF_SLIM);
>> + fprintf (dump_file, "\n");
>> + }
>> +
>> + *border = iv2.base;
>> + return op0;
>> +}
>> +
>> +/* Given a GUARD conditional stmt inside LOOP, which we want to make always
>> + true or false depending on INITIAL_TRUE, and adjusted values NEXTVAL
>> + (a post-increment IV) and NEWBOUND (the comparator) adjust the loop
>> + exit test statement to loop back only if the GUARD statement will
>> + also be true/false in the next iteration. */
>> +
>> +static void
>> +patch_loop_exit (struct loop *loop, gcond *guard, tree nextval, tree newbound,
>> + bool initial_true)
>> +{
>> + edge exit = single_exit (loop);
>> + gcond *stmt = as_a <gcond *> (last_stmt (exit->src));
>> + gimple_cond_set_condition (stmt, gimple_cond_code (guard),
>> + nextval, newbound);
>> + update_stmt (stmt);
>> +
>> + edge stay = single_pred_edge (loop->latch);
>> +
>> + exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
>> + stay->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
>> +
>> + if (initial_true)
>> + {
>> + exit->flags |= EDGE_FALSE_VALUE;
>> + stay->flags |= EDGE_TRUE_VALUE;
>> + }
>> + else
>> + {
>> + exit->flags |= EDGE_TRUE_VALUE;
>> + stay->flags |= EDGE_FALSE_VALUE;
>> + }
>> +}
>> +
>> +/* Given an induction variable GUARD_IV, and its affine descriptor IV,
>> + find the loop phi node in LOOP defining it directly, or create
>> + such phi node. Return that phi node. */
>> +
>> +static gphi *
>> +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/)
>> +{
>> + gimple *def = SSA_NAME_DEF_STMT (guard_iv);
>> + gphi *phi;
>> + if ((phi = dyn_cast <gphi *> (def))
>> + && gimple_bb (phi) == loop->header)
>> + return phi;
>> +
>> + /* XXX Create the PHI instead. */
>> + return NULL;
>> +}
>> +
>> +/* This function updates the SSA form after connect_loops made a new
>> + edge NEW_E leading from LOOP1 exit to LOOP2 (via an intermediate
>> + conditional). I.e. the second loop can now be entered either
>> + via the original entry or via NEW_E, so the entry values of LOOP2
>> + phi nodes are either the original ones or those at the exit
>> + of LOOP1. Insert new phi nodes in LOOP2 pre-header reflecting
>> + this. */
>> +
>> +static void
>> +connect_loop_phis (struct loop *loop1, struct loop *loop2, edge new_e)
>> +{
>> + basic_block rest = loop_preheader_edge (loop2)->src;
>> + gcc_assert (new_e->dest == rest);
>> + edge skip_first = EDGE_PRED (rest, EDGE_PRED (rest, 0) == new_e);
>> +
>> + edge firste = loop_preheader_edge (loop1);
>> + edge seconde = loop_preheader_edge (loop2);
>> + edge firstn = loop_latch_edge (loop1);
>> + gphi_iterator psi_first, psi_second;
>> + for (psi_first = gsi_start_phis (loop1->header),
>> + psi_second = gsi_start_phis (loop2->header);
>> + !gsi_end_p (psi_first);
>> + gsi_next (&psi_first), gsi_next (&psi_second))
>> + {
>> + tree init, next, new_init;
>> + use_operand_p op;
>> + gphi *phi_first = psi_first.phi ();
>> + gphi *phi_second = psi_second.phi ();
>> +
>> + init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste);
>> + next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn);
>> + op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde);
>> + gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
>> +
>> + /* Prefer using original variable as a base for the new ssa name.
>> + This is necessary for virtual ops, and useful in order to avoid
>> + losing debug info for real ops. */
>> + if (TREE_CODE (next) == SSA_NAME
>> + && useless_type_conversion_p (TREE_TYPE (next),
>> + TREE_TYPE (init)))
>> + new_init = copy_ssa_name (next);
>> + else if (TREE_CODE (init) == SSA_NAME
>> + && useless_type_conversion_p (TREE_TYPE (init),
>> + TREE_TYPE (next)))
>> + new_init = copy_ssa_name (init);
>> + else if (useless_type_conversion_p (TREE_TYPE (next),
>> + TREE_TYPE (init)))
>> + new_init = make_temp_ssa_name (TREE_TYPE (next), NULL,
>> + "unrinittmp");
>> + else
>> + new_init = make_temp_ssa_name (TREE_TYPE (init), NULL,
>> + "unrinittmp");
>> +
>> + gphi * newphi = create_phi_node (new_init, rest);
>> + add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION);
>> + add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION);
>> + SET_USE (op, new_init);
>> + }
>> +}
>> +
>> +/* The two loops LOOP1 and LOOP2 were just created by loop versioning,
>> + they are still equivalent and placed in two arms of a diamond, like so:
>> +
>> + .------if (cond)------.
>> + v v
>> + pre1 pre2
>> + | |
>> + .--->h1 h2<----.
>> + | | | |
>> + | ex1---. .---ex2 |
>> + | / | | \ |
>> + '---l1 X | l2---'
>> + | |
>> + | |
>> + '--->join<---'
>> +
>> + This function transforms the program such that LOOP1 is conditionally
>> + falling through to LOOP2, or skipping it. This is done by splitting
>> + the ex1->join edge at X in the diagram above, and inserting a condition
>> + whose one arm goes to pre2, resulting in this situation:
>> +
>> + .------if (cond)------.
>> + v v
>> + pre1 .---------->pre2
>> + | | |
>> + .--->h1 | h2<----.
>> + | | | | |
>> + | ex1---. | .---ex2 |
>> + | / v | | \ |
>> + '---l1 skip---' | l2---'
>> + | |
>> + | |
>> + '--->join<---'
>> +
>> +
>> + The condition used is the exit condition of LOOP1, which effectively means
>> + that when the first loop exits (for whatever reason) but the real original
>> + exit expression is still false the second loop will be entered.
>> + The function returns the new edge cond->pre2.
>> +
>> + This doesn't update the SSA form, see connect_loop_phis for that. */
>> +
>> +static edge
>> +connect_loops (struct loop *loop1, struct loop *loop2)
>> +{
>> + edge exit = single_exit (loop1);
>> + basic_block skip_bb = split_edge (exit);
>> + gcond *skip_stmt;
>> + gimple_stmt_iterator gsi;
>> + edge new_e, skip_e;
>> +
>> + gimple *stmt = last_stmt (exit->src);
>> + skip_stmt = gimple_build_cond (gimple_cond_code (stmt),
>> + gimple_cond_lhs (stmt),
>> + gimple_cond_rhs (stmt),
>> + NULL_TREE, NULL_TREE);
>> + gsi = gsi_last_bb (skip_bb);
>> + gsi_insert_after (&gsi, skip_stmt, GSI_NEW_STMT);
>> +
>> + skip_e = EDGE_SUCC (skip_bb, 0);
>> + skip_e->flags &= ~EDGE_FALLTHRU;
>> + new_e = make_edge (skip_bb, loop_preheader_edge (loop2)->src, 0);
>> + if (exit->flags & EDGE_TRUE_VALUE)
>> + {
>> + skip_e->flags |= EDGE_TRUE_VALUE;
>> + new_e->flags |= EDGE_FALSE_VALUE;
>> + }
>> + else
>> + {
>> + skip_e->flags |= EDGE_FALSE_VALUE;
>> + new_e->flags |= EDGE_TRUE_VALUE;
>> + }
>> +
>> + new_e->count = skip_bb->count;
>> + new_e->probability = PROB_LIKELY;
>> + new_e->count = apply_probability (skip_e->count, PROB_LIKELY);
>> + skip_e->count -= new_e->count;
>> + skip_e->probability = inverse_probability (PROB_LIKELY);
>> +
>> + return new_e;
>> +}
>> +
>> +/* This returns the new bound for iterations given the original iteration
>> + space in NITER, an arbitrary new bound BORDER, assumed to be some
>> + comparison value with a different IV, the initial value GUARD_INIT of
>> + that other IV, and the comparison code GUARD_CODE that compares
>> + that other IV with BORDER. We return an SSA name, and place any
>> + necessary statements for that computation into *STMTS.
>> +
>> + For example for such a loop:
>> +
>> + for (i = beg, j = guard_init; i < end; i++, j++)
>> + if (j < border) // this is supposed to be true/false
>> + ...
>> +
>> + we want to return a new bound (on j) that makes the loop iterate
>> + as long as the condition j < border stays true. We also don't want
>> + to iterate more often than the original loop, so we have to introduce
>> + some cut-off as well (via min/max), effectively resulting in:
>> +
>> + newend = min (end+guard_init-beg, border)
>> + for (i = beg, j = guard_init; j < newend; i++, j++)
>> + if (j < border)
>> + ...
>> +
>> + Depending on the direction of the IVs and if the exit tests
>> + are strict or non-strict we need to use MIN or MAX,
>> + and add or subtract 1. This routine computes newend above. */
>> +
>> +static tree
>> +compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter,
>> + tree border,
>> + enum tree_code guard_code, tree guard_init)
>> +{
>> + /* The niter structure contains the after-increment IV, we need
>> + the loop-enter base, so subtract STEP once. */
>> + tree controlbase = force_gimple_operand (niter->control.base,
>> + stmts, true, NULL_TREE);
>> + tree controlstep = niter->control.step;
>> + tree enddiff;
>> + if (POINTER_TYPE_P (TREE_TYPE (controlbase)))
>> + {
>> + controlstep = gimple_build (stmts, NEGATE_EXPR,
>> + TREE_TYPE (controlstep), controlstep);
>> + enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
>> + TREE_TYPE (controlbase),
>> + controlbase, controlstep);
>> + }
>> + else
>> + enddiff = gimple_build (stmts, MINUS_EXPR,
>> + TREE_TYPE (controlbase),
>> + controlbase, controlstep);
>> +
>> + /* Compute beg-guard_init. */
>> + if (POINTER_TYPE_P (TREE_TYPE (enddiff)))
>> + {
>> + tree tem = gimple_convert (stmts, sizetype, guard_init);
>> + tem = gimple_build (stmts, NEGATE_EXPR, sizetype, tem);
>> + enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
>> + TREE_TYPE (enddiff),
>> + enddiff, tem);
>> + }
>> + else
>> + enddiff = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
>> + enddiff, guard_init);
>> +
>> + /* Compute end-(beg-guard_init). */
>> + gimple_seq stmts2;
>> + tree newbound = force_gimple_operand (niter->bound, &stmts2,
>> + true, NULL_TREE);
>> + gimple_seq_add_seq_without_update (stmts, stmts2);
>> +
>> + if (POINTER_TYPE_P (TREE_TYPE (enddiff))
>> + || POINTER_TYPE_P (TREE_TYPE (newbound)))
>> + {
>> + enddiff = gimple_convert (stmts, sizetype, enddiff);
>> + enddiff = gimple_build (stmts, NEGATE_EXPR, sizetype, enddiff);
>> + newbound = gimple_build (stmts, POINTER_PLUS_EXPR,
>> + TREE_TYPE (newbound),
>> + newbound, enddiff);
>> + }
>> + else
>> + newbound = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
>> + newbound, enddiff);
>> +
>> + /* Depending on the direction of the IVs the new bound for the first
>> + loop is the minimum or maximum of old bound and border.
>> + Also, if the guard condition isn't strictly less or greater,
>> + we need to adjust the bound. */
>> + int addbound = 0;
>> + enum tree_code minmax;
>> + if (niter->cmp == LT_EXPR)
>> + {
>> + /* GT and LE are the same, inverted. */
>> + if (guard_code == GT_EXPR || guard_code == LE_EXPR)
>> + addbound = -1;
>> + minmax = MIN_EXPR;
>> + }
>> + else
>> + {
>> + gcc_assert (niter->cmp == GT_EXPR);
>> + if (guard_code == GE_EXPR || guard_code == LT_EXPR)
>> + addbound = 1;
>> + minmax = MAX_EXPR;
>> + }
>> +
>> + if (addbound)
>> + {
>> + tree type2 = TREE_TYPE (newbound);
>> + if (POINTER_TYPE_P (type2))
>> + type2 = sizetype;
>> + newbound = gimple_build (stmts,
>> + POINTER_TYPE_P (TREE_TYPE (newbound))
>> + ? POINTER_PLUS_EXPR : PLUS_EXPR,
>> + TREE_TYPE (newbound),
>> + newbound,
>> + build_int_cst (type2, addbound));
>> + }
>> +
>> + tree newend = gimple_build (stmts, minmax, TREE_TYPE (border),
>> + border, newbound);
>> + return newend;
>> +}
>> +
>> +/* Checks if LOOP contains a conditional block whose condition
>> + depends on which side in the iteration space it is, and if so
>> + splits the iteration space into two loops. Returns true if the
>> + loop was split. NITER must contain the iteration descriptor for the
>> + single exit of LOOP. */
>> +
>> +static bool
>> +split_loop (struct loop *loop1, struct tree_niter_desc *niter)
>> +{
>> + basic_block *bbs;
>> + unsigned i;
>> + bool changed = false;
>> + tree guard_iv;
>> + tree border;
>> + affine_iv iv;
>> +
>> + bbs = get_loop_body (loop1);
>> +
>> + /* Find a splitting opportunity. */
>> + for (i = 0; i < loop1->num_nodes; i++)
>> + if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
>> + {
>> + /* Handling opposite steps is not implemented yet. Neither
>> + is handling different step sizes. */
>> + if ((tree_int_cst_sign_bit (iv.step)
>> + != tree_int_cst_sign_bit (niter->control.step))
>> + || !tree_int_cst_equal (iv.step, niter->control.step))
>> + continue;
>> +
>> + /* Find a loop PHI node that defines guard_iv directly,
>> + or create one doing that. */
>> + gphi *phi = find_or_create_guard_phi (loop1, guard_iv, &iv);
>> + if (!phi)
>> + continue;
>> + gcond *guard_stmt = as_a<gcond *> (last_stmt (bbs[i]));
>> + tree guard_init = PHI_ARG_DEF_FROM_EDGE (phi,
>> + loop_preheader_edge (loop1));
>> + enum tree_code guard_code = gimple_cond_code (guard_stmt);
>> +
>> + /* Loop splitting is implemented by versioning the loop, placing
>> + the new loop after the old loop, making the first loop iterate
>> + as long as the conditional stays true (or false) and letting the
>> + second (new) loop handle the rest of the iterations.
>> +
>> + First we need to determine if the condition will start being true
>> + or false in the first loop. */
>> + bool initial_true;
>> + switch (guard_code)
>> + {
>> + case LT_EXPR:
>> + case LE_EXPR:
>> + initial_true = !tree_int_cst_sign_bit (iv.step);
>> + break;
>> + case GT_EXPR:
>> + case GE_EXPR:
>> + initial_true = tree_int_cst_sign_bit (iv.step);
>> + break;
>> + default:
>> + gcc_unreachable ();
>> + }
>> +
>> + /* Build a condition that will skip the first loop when the
>> + guard condition won't ever be true (or false). */
>> + gimple_seq stmts2;
>> + border = force_gimple_operand (border, &stmts2, true, NULL_TREE);
>> + if (stmts2)
>> + gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
>> + stmts2);
>> + tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
>> + if (!initial_true)
>> + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>> +
>> + /* Now version the loop, placing loop2 after loop1 connecting
>> + them, and fix up SSA form for that. */
>> + initialize_original_copy_tables ();
>> + basic_block cond_bb;
>> + struct loop *loop2 = loop_version (loop1, cond, &cond_bb,
>> + REG_BR_PROB_BASE, REG_BR_PROB_BASE,
>> + REG_BR_PROB_BASE, true);
>> + gcc_assert (loop2);
>> + update_ssa (TODO_update_ssa);
>> +
>> + edge new_e = connect_loops (loop1, loop2);
>> + connect_loop_phis (loop1, loop2, new_e);
>> +
>> + /* The iterations of the second loop are now already
>> + exactly those that the first loop didn't do, but the
>> + iteration space of the first loop is still the original one.
>> + Compute the new bound for the guarding IV and patch the
>> + loop exit to use it instead of original IV and bound. */
>> + gimple_seq stmts = NULL;
>> + tree newend = compute_new_first_bound (&stmts, niter, border,
>> + guard_code, guard_init);
>> + if (stmts)
>> + gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
>> + stmts);
>> + tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
>> + patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
>> +
>> + /* Finally patch out the two copies of the condition to be always
>> + true/false (or opposite). */
>> + gcond *force_true = as_a<gcond *> (last_stmt (bbs[i]));
>> + gcond *force_false = as_a<gcond *> (last_stmt (get_bb_copy (bbs[i])));
>> + if (!initial_true)
>> + std::swap (force_true, force_false);
>> + gimple_cond_make_true (force_true);
>> + gimple_cond_make_false (force_false);
>> + update_stmt (force_true);
>> + update_stmt (force_false);
>> +
>> + free_original_copy_tables ();
>> +
>> + /* We destroyed LCSSA form above. Eventually we might be able
>> + to fix it on the fly, for now simply punt and use the helper. */
>> + rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
>> +
>> + changed = true;
>> + if (dump_file && (dump_flags & TDF_DETAILS))
>> + fprintf (dump_file, ";; Loop split.\n");
>> +
>> + /* Only deal with the first opportunity. */
>> + break;
>> + }
>> +
>> + free (bbs);
>> + return changed;
>> +}
>> +
>> +/* Main entry point. Perform loop splitting on all suitable loops. */
>> +
>> +static unsigned int
>> +tree_ssa_split_loops (void)
>> +{
>> + struct loop *loop;
>> + bool changed = false;
>> +
>> + gcc_assert (scev_initialized_p ());
>> + FOR_EACH_LOOP (loop, 0)
>> + loop->aux = NULL;
>> +
>> + /* Go through all loops starting from innermost. */
>> + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
>> + {
>> + struct tree_niter_desc niter;
>> + if (loop->aux)
>> + {
>> + /* If any of our inner loops was split, don't split us,
>> + and mark our containing loop as having had splits as well. */
>> + loop_outer (loop)->aux = loop;
>> + continue;
>> + }
>> +
>> + if (single_exit (loop)
>> + /* ??? We could handle non-empty latches when we split
>> + the latch edge (not the exit edge), and put the new
>> + exit condition in the new block. OTOH this executes some
>> + code unconditionally that might have been skipped by the
>> + original exit before. */
>> + && empty_block_p (loop->latch)
>> + && !optimize_loop_for_size_p (loop)
>> + && number_of_iterations_exit (loop, single_exit (loop), &niter,
>> + false, true)
>> + && niter.cmp != ERROR_MARK
>> + /* We can't yet handle loops controlled by a != predicate. */
>> + && niter.cmp != NE_EXPR)
>> + {
>> + if (split_loop (loop, &niter))
>> + {
>> + /* Mark our containing loop as having had some split inner
>> + loops. */
>> + loop_outer (loop)->aux = loop;
>> + changed = true;
>> + }
>> + }
>> + }
>> +
>> + FOR_EACH_LOOP (loop, 0)
>> + loop->aux = NULL;
>> +
>> + if (changed)
>> + return TODO_cleanup_cfg;
>> + return 0;
>> +}
>> +
>> +/* Loop splitting pass. */
>> +
>> +namespace {
>> +
>> +const pass_data pass_data_loop_split =
>> +{
>> + GIMPLE_PASS, /* type */
>> + "lsplit", /* name */
>> + OPTGROUP_LOOP, /* optinfo_flags */
>> + TV_LOOP_SPLIT, /* tv_id */
>> + PROP_cfg, /* properties_required */
>> + 0, /* properties_provided */
>> + 0, /* properties_destroyed */
>> + 0, /* todo_flags_start */
>> + 0, /* todo_flags_finish */
>> +};
>> +
>> +class pass_loop_split : public gimple_opt_pass
>> +{
>> +public:
>> + pass_loop_split (gcc::context *ctxt)
>> + : gimple_opt_pass (pass_data_loop_split, ctxt)
>> + {}
>> +
>> + /* opt_pass methods: */
>> + virtual bool gate (function *) { return flag_split_loops != 0; }
>> + virtual unsigned int execute (function *);
>> +
>> +}; // class pass_loop_split
>> +
>> +unsigned int
>> +pass_loop_split::execute (function *fun)
>> +{
>> + if (number_of_loops (fun) <= 1)
>> + return 0;
>> +
>> + return tree_ssa_split_loops ();
>> +}
>> +
>> +} // anon namespace
>> +
>> +gimple_opt_pass *
>> +make_pass_loop_split (gcc::context *ctxt)
>> +{
>> + return new pass_loop_split (ctxt);
>> +}
>> Index: doc/invoke.texi
>> ===================================================================
>> --- doc/invoke.texi (revision 231115)
>> +++ doc/invoke.texi (working copy)
>> @@ -446,7 +446,7 @@ Objective-C and Objective-C++ Dialects}.
>> -fselective-scheduling -fselective-scheduling2 @gol
>> -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol
>> -fsemantic-interposition -fshrink-wrap -fsignaling-nans @gol
>> --fsingle-precision-constant -fsplit-ivs-in-unroller @gol
>> +-fsingle-precision-constant -fsplit-ivs-in-unroller -fsplit-loops @gol
>> -fsplit-paths @gol
>> -fsplit-wide-types -fssa-backprop -fssa-phiopt @gol
>> -fstack-protector -fstack-protector-all -fstack-protector-strong @gol
>> @@ -10197,6 +10197,11 @@ Enabled with @option{-fprofile-use}.
>> Enables the loop invariant motion pass in the RTL loop optimizer. Enabled
>> at level @option{-O1}
>>
>> +@item -fsplit-loops
>> +@opindex fsplit-loops
>> +Split a loop into two if it contains a condition that's always true
>> +for one side of the iteration space and false for the other.
>> +
>> @item -funswitch-loops
>> @opindex funswitch-loops
>> Move branches with loop invariant conditions out of the loop, with duplicates
>> Index: doc/passes.texi
>> ===================================================================
>> --- doc/passes.texi (revision 231115)
>> +++ doc/passes.texi (working copy)
>> @@ -484,6 +484,12 @@ out of the loops. To achieve this, a du
>> each possible outcome of conditional jump(s). The pass is implemented in
>> @file{tree-ssa-loop-unswitch.c}.
>>
>> +Loop splitting. If a loop contains a conditional statement that is
>> +always true for one part of the iteration space and false for the other,
>> +this pass splits the loop into two, one dealing with one side, the other
>> +only with the other, thereby removing one inner-loop conditional. The
>> +pass is implemented in @file{tree-ssa-loop-split.c}.
>> +
>> The optimizations also use various utility functions contained in
>> @file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and
>> @file{cfgloopmanip.c}.
>> Index: testsuite/gcc.dg/loop-split.c
>> ===================================================================
>> --- testsuite/gcc.dg/loop-split.c (revision 0)
>> +++ testsuite/gcc.dg/loop-split.c (working copy)
>> @@ -0,0 +1,147 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
>> +
>> +#ifdef __cplusplus
>> +extern "C" int printf (const char *, ...);
>> +extern "C" void abort (void);
>> +#else
>> +extern int printf (const char *, ...);
>> +extern void abort (void);
>> +#endif
>> +
>> +/* Define TRACE to 1 or 2 to get detailed tracing.
>> + Define SINGLE_TEST to 1 or 2 to get a simple routine with
>> + just one loop, called only one time or with multiple parameters,
>> + to make debugging easier. */
>> +#ifndef TRACE
>> +#define TRACE 0
>> +#endif
>> +
>> +#define loop(beg,step,beg2,cond1,cond2) \
>> + do \
>> + { \
>> + sum = 0; \
>> + for (i = (beg), j = (beg2); (cond1); i+=(step),j+=(step)) \
>> + { \
>> + if (cond2) { \
>> + if (TRACE > 1) printf ("a: %d %d\n", i, j); \
>> + sum += a[i]; \
>> + } else { \
>> + if (TRACE > 1) printf ("b: %d %d\n", i, j); \
>> + sum += b[i]; \
>> + } \
>> + } \
>> + if (TRACE > 0) printf ("sum: %d\n", sum); \
>> + check = check * 47 + sum; \
>> + } while (0)
>> +
>> +#ifndef SINGLE_TEST
>> +unsigned __attribute__((noinline, noclone)) dotest (int beg, int end, int step,
>> + int c, int *a, int *b, int beg2)
>> +{
>> + unsigned check = 0;
>> + int sum;
>> + int i, j;
>> + loop (beg, 1, beg2, i < end, j < c);
>> + loop (beg, 1, beg2, i <= end, j < c);
>> + loop (beg, 1, beg2, i < end, j <= c);
>> + loop (beg, 1, beg2, i <= end, j <= c);
>> + loop (beg, 1, beg2, i < end, j > c);
>> + loop (beg, 1, beg2, i <= end, j > c);
>> + loop (beg, 1, beg2, i < end, j >= c);
>> + loop (beg, 1, beg2, i <= end, j >= c);
>> + beg2 += end-beg;
>> + loop (end, -1, beg2, i >= beg, j >= c);
>> + loop (end, -1, beg2, i >= beg, j > c);
>> + loop (end, -1, beg2, i > beg, j >= c);
>> + loop (end, -1, beg2, i > beg, j > c);
>> + loop (end, -1, beg2, i >= beg, j <= c);
>> + loop (end, -1, beg2, i >= beg, j < c);
>> + loop (end, -1, beg2, i > beg, j <= c);
>> + loop (end, -1, beg2, i > beg, j < c);
>> + return check;
>> +}
>> +
>> +#else
>> +
>> +int __attribute__((noinline, noclone)) f (int beg, int end, int step,
>> + int c, int *a, int *b, int beg2)
>> +{
>> + int sum = 0;
>> + int i, j;
>> + //for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
>> + for (i = end, j = beg2 + (end-beg); i > beg; i += -1, j-- /*step*/)
>> + {
>> + // i - j == X --> i = X + j
>> + // --> i < end == X+j < end == j < end - X
>> + // --> newend = end - (i_init - j_init)
>> + // j < end-X && j < c --> j < min(end-X,c)
>> + // j < end-X && j <= c --> j <= min(end-X-1,c) or j < min(end-X,c+1{OF!})
>> + //if (j < c)
>> + if (j >= c)
>> + printf ("a: %d %d\n", i, j);
>> + /*else
>> + printf ("b: %d %d\n", i, j);*/
>> + /*sum += a[i];
>> + else
>> + sum += b[i];*/
>> + }
>> + return sum;
>> +}
>> +
>> +int __attribute__((noinline, noclone)) f2 (int *beg, int *end, int step,
>> + int *c, int *a, int *b, int *beg2)
>> +{
>> + int sum = 0;
>> + int *i, *j;
>> + for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
>> + {
>> + if (j <= c)
>> + printf ("%d %d\n", i - beg, j - beg);
>> + /*sum += a[i];
>> + else
>> + sum += b[i];*/
>> + }
>> + return sum;
>> +}
>> +#endif
>> +
>> +extern int printf (const char *, ...);
>> +
>> +int main ()
>> +{
>> + int a[] = {0,0,0,0,0, 1,2,3,4,5,6,7,8,9, 0,0,0,0,0};
>> + int b[] = {0,0,0,0,0, -1,-2,-3,-4,-5,-6,-7,-8,-9, 0,0,0,0,0,};
>> + int c;
>> + int diff = 0;
>> + unsigned check = 0;
>> +#if defined(SINGLE_TEST) && (SINGLE_TEST == 1)
>> + //dotest (0, 9, 1, -1, a+5, b+5, -1);
>> + //return 0;
>> + f (0, 9, 1, 5, a+5, b+5, -1);
>> + return 0;
>> +#endif
>> + for (diff = -5; diff <= 5; diff++)
>> + {
>> + for (c = -1; c <= 10; c++)
>> + {
>> +#ifdef SINGLE_TEST
>> + int s = f (0, 9, 1, c, a+5, b+5, diff);
>> + //int s = f2 (a+0, a+9, 1, a+c, a+5, b+5, a+diff);
>> + printf ("%d ", s);
>> +#else
>> + if (TRACE > 0)
>> + printf ("check %d %d\n", c, diff);
>> + check = check * 51 + dotest (0, 9, 1, c, a+5, b+5, diff);
>> +#endif
>> + }
>> + //printf ("\n");
>> + }
>> + //printf ("%u\n", check);
>> + if (check != 3213344948)
>> + abort ();
>> + return 0;
>> +}
>> +
>> +/* All 16 loops in dotest should be split. */
>> +/* { dg-final { scan-tree-dump-times "Loop split" 16 "lsplit" } } */
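To make the bound computation described in the comment on compute_new_first_bound concrete, here is a minimal standalone C sketch of the simplest case only: both IVs step by +1 and both tests are strict `<`. The names sum_original, sum_split, min_int and splits_agree are made up for illustration and are not part of the patch. The sketch also assumes that end + guard_init - beg does not overflow.

```c
#include <assert.h>

/* Hypothetical helper, not from the patch.  */
static int
min_int (int x, int y)
{
  return x < y ? x : y;
}

/* Original form: the guard "j < border" is tested on every iteration.  */
int
sum_original (const int *a, const int *b, int beg, int end,
              int guard_init, int border)
{
  int sum = 0;
  for (int i = beg, j = guard_init; i < end; i++, j++)
    if (j < border)
      sum += a[i];
    else
      sum += b[i];
  return sum;
}

/* Split form: the first loop runs only while the guard is true and the
   original exit test still holds, the second loop does the rest.  */
int
sum_split (const int *a, const int *b, int beg, int end,
           int guard_init, int border)
{
  int sum = 0;
  int i = beg, j = guard_init;
  /* i < end is equivalent to j < end + guard_init - beg, because
     j - guard_init == i - beg holds on every iteration.  */
  int newend = min_int (end + guard_init - beg, border);
  for (; j < newend; i++, j++)
    sum += a[i];
  for (; i < end; i++, j++)
    sum += b[i];
  return sum;
}

/* Compare both forms over a small grid of parameters, including cases
   where the border lies outside the iteration space.  Returns 1 if
   they always agree.  */
int
splits_agree (void)
{
  int a[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
  int b[10] = { -1, -2, -3, -4, -5, -6, -7, -8, -9, -10 };
  for (int beg = 0; beg <= 3; beg++)
    for (int end = 0; end <= 10; end++)
      for (int gi = -5; gi <= 5; gi++)
        for (int border = -3; border <= 13; border++)
          if (sum_original (a, b, beg, end, gi, border)
              != sum_split (a, b, beg, end, gi, border))
            return 0;
  return 1;
}
```

Calling splits_agree () from a test driver should return 1. The other direction and the non-strict comparisons need the MAX and +/-1 adjustments that the patch handles and this sketch leaves out.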
* Re: Gimple loop splitting v2
2016-07-26 11:32 ` Richard Biener
@ 2016-07-27 6:18 ` Andrew Pinski
2016-07-27 8:11 ` Richard Biener
0 siblings, 1 reply; 20+ messages in thread
From: Andrew Pinski @ 2016-07-27 6:18 UTC (permalink / raw)
To: Richard Biener; +Cc: Michael Matz, Jeff Law, GCC Patches
On Tue, Jul 26, 2016 at 4:32 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, Jul 25, 2016 at 10:57 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz <matz@suse.de> wrote:
>>> Hi,
>>>
>>> On Tue, 1 Dec 2015, Jeff Law wrote:
>>>
>>>> > So, okay for trunk?
>>>> -ENOPATCH
>>>
>>> Sigh :)
>>> Here it is.
>>
>>
>> I found one problem with it.
>> Take:
>> void f(int *a, int M, int *b)
>> {
>> for(int i = 0; i <= M; i++)
>> {
>> if (i < M)
>> a[i] = i;
>> }
>> }
>> ---- CUT ---
>> There are two issues with the code as below. The outermost loop's
>> aux is still set, which causes the vectorizer not to vectorize the loop.
>> The other issue is that I need to run pass_scev_cprop after pass_loop_split
>> to eliminate the induction variable's uses after the loop, so that the
>> vectorizer will work.
>
> I think scev_cprop needs to be rewritten as a utility so that the vectorizer
> itself can (within its own cost-model) eliminate an induction using it.
>
> Richard.
>
>> Something like (note this is copy and paste from a terminal):
>> diff --git a/gcc/passes.def b/gcc/passes.def
>> index c327900..e8d6ea6 100644
>> --- a/gcc/passes.def
>> +++ b/gcc/passes.def
>> @@ -262,8 +262,8 @@ along with GCC; see the file COPYING3. If not see
>> NEXT_PASS (pass_copy_prop);
>> NEXT_PASS (pass_dce);
>> NEXT_PASS (pass_tree_unswitch);
>> - NEXT_PASS (pass_scev_cprop);
>> NEXT_PASS (pass_loop_split);
>> + NEXT_PASS (pass_scev_cprop);
>> NEXT_PASS (pass_record_bounds);
>> NEXT_PASS (pass_loop_distribution);
>> NEXT_PASS (pass_copy_prop);
>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>> index 5411530..e72ef19 100644
>> --- a/gcc/tree-ssa-loop-split.c
>> +++ b/gcc/tree-ssa-loop-split.c
>> @@ -592,7 +592,11 @@ tree_ssa_split_loops (void)
>>
>> gcc_assert (scev_initialized_p ());
>> FOR_EACH_LOOP (loop, 0)
>> - loop->aux = NULL;
>> + {
>> + loop->aux = NULL;
>> + if (loop_outer (loop))
>> + loop_outer (loop)->aux = NULL;
>> + }
>
> How does the iterator not visit loop_outer (loop)?!
The iterator with flags of 0 does not visit the root. So the way
to fix this is to replace 0 (which is the flags argument) with
LI_INCLUDE_ROOT so we zero out the root too.
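
A toy model of the problem, purely for illustration: struct toy_loop, clear_aux, mark_outer and stale_root_marker_demo are invented names, not GCC's struct loop or its iterator. It only mimics an iteration that skips the node without an outer loop, assuming that is how the root ends up with a stale aux.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a loop tree node; NOT GCC's struct loop.  */
struct toy_loop
{
  struct toy_loop *outer;  /* enclosing loop, NULL for the root */
  void *aux;               /* scratch marker used by the pass */
};

/* Clear aux on every node.  With include_root == 0 the root is
   skipped, mimicking an iterator that does not visit it.  */
void
clear_aux (struct toy_loop *loops, size_t n, int include_root)
{
  for (size_t k = 0; k < n; k++)
    {
      if (!include_root && loops[k].outer == NULL)
        continue;
      loops[k].aux = NULL;
    }
}

/* Mark the containing loop of LOOP, as done after a successful split.  */
void
mark_outer (struct toy_loop *loop)
{
  if (loop->outer)
    loop->outer->aux = loop;
}

/* Returns 1 if the root still carries a marker after the cleanup.  */
int
stale_root_marker_demo (int include_root)
{
  struct toy_loop loops[2] = { { NULL, NULL }, { NULL, NULL } };
  loops[1].outer = &loops[0];          /* loops[0] is the root */
  mark_outer (&loops[1]);              /* inner loop was "split" */
  assert (loops[0].aux == &loops[1]);  /* root is marked now */
  clear_aux (loops, 2, include_root);
  return loops[0].aux != NULL;
}
```

With include_root == 0 the demo returns 1 (the marker survives), with include_root == 1 it returns 0.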
Thanks,
Andrew
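
For reference, a hand-written sketch of what the test case quoted in this message looks like before and after splitting, at the source level. This is only an approximation of what the pass produces on GIMPLE; f_original, f_split and f_forms_agree are illustrative names, and the unused parameter b is dropped.

```c
#include <assert.h>
#include <string.h>

/* The loop from the test case: the guard is false only for i == M.  */
void
f_original (int *a, int M)
{
  for (int i = 0; i <= M; i++)
    if (i < M)
      a[i] = i;
}

/* Roughly the split form: the first loop keeps the store and has no
   conditional, the second loop covers the remaining iteration, where
   the guard is false and nothing is stored.  */
void
f_split (int *a, int M)
{
  int i = 0;
  for (; i < M; i++)
    a[i] = i;
  for (; i <= M; i++)
    ;
}

/* Returns 1 if both forms leave identical array contents for a range
   of M, including M <= 0 where no store happens at all.  */
int
f_forms_agree (void)
{
  for (int M = -2; M <= 8; M++)
    {
      int x[8], y[8];
      memset (x, -1, sizeof x);
      memset (y, -1, sizeof y);
      f_original (x, M);
      f_split (y, M);
      if (memcmp (x, y, sizeof x) != 0)
        return 0;
    }
  return 1;
}
```

The first loop is the one the vectorizer should be able to handle. Its final value of i still feeds the second loop, which is presumably the use after the loop that running pass_scev_cprop afterwards is meant to remove.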
>
>>
>> /* Go through all loops starting from innermost. */
>> FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
>> @@ -631,7 +635,11 @@ tree_ssa_split_loops (void)
>> }
>>
>> FOR_EACH_LOOP (loop, 0)
>> - loop->aux = NULL;
>> + {
>> + loop->aux = NULL;
>> + if (loop_outer (loop))
>> + loop_outer (loop)->aux = NULL;
>> + }
>>
>> if (changed)
>> return TODO_cleanup_cfg;
>> ----- CUT -----
>>
>> Thanks,
>> Andrew
>>
>>
>>>
>>>
>>> Ciao,
>>> Michael.
>>> * common.opt (-fsplit-loops): New flag.
>>> * passes.def (pass_loop_split): Add.
>>> * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
>>> (enable_fdo_optimizations): Add loop splitting.
>>> * timevar.def (TV_LOOP_SPLIT): Add.
>>> * tree-pass.h (make_pass_loop_split): Declare.
>>> * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
>>> * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h.
>>> * tree-ssa-loop-split.c: New file.
>>> * Makefile.in (OBJS): Add tree-ssa-loop-split.o.
>>> * doc/invoke.texi (fsplit-loops): Document.
>>> * doc/passes.texi (Loop optimization): Add paragraph about loop
>>> splitting.
>>>
>>> testsuite/
>>> * gcc.dg/loop-split.c: New test.
>>>
>>> Index: common.opt
>>> ===================================================================
>>> --- common.opt (revision 231115)
>>> +++ common.opt (working copy)
>>> @@ -2453,6 +2457,10 @@ funswitch-loops
>>> Common Report Var(flag_unswitch_loops) Optimization
>>> Perform loop unswitching.
>>>
>>> +fsplit-loops
>>> +Common Report Var(flag_split_loops) Optimization
>>> +Perform loop splitting.
>>> +
>>> funwind-tables
>>> Common Report Var(flag_unwind_tables) Optimization
>>> Just generate unwind tables for exception handling.
>>> Index: passes.def
>>> ===================================================================
>>> --- passes.def (revision 231115)
>>> +++ passes.def (working copy)
>>> @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3.
>>> NEXT_PASS (pass_dce);
>>> NEXT_PASS (pass_tree_unswitch);
>>> NEXT_PASS (pass_scev_cprop);
>>> + NEXT_PASS (pass_loop_split);
>>> NEXT_PASS (pass_record_bounds);
>>> NEXT_PASS (pass_loop_distribution);
>>> NEXT_PASS (pass_copy_prop);
>>> Index: opts.c
>>> ===================================================================
>>> --- opts.c (revision 231115)
>>> +++ opts.c (working copy)
>>> @@ -532,6 +532,7 @@ static const struct default_options defa
>>> regardless of them being declared inline. */
>>> { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },
>>> { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 },
>>> + { OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
>>> { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
>>> { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
>>> { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
>>> @@ -1411,6 +1412,8 @@ enable_fdo_optimizations (struct gcc_opt
>>> opts->x_flag_ipa_cp_alignment = value;
>>> if (!opts_set->x_flag_predictive_commoning)
>>> opts->x_flag_predictive_commoning = value;
>>> + if (!opts_set->x_flag_split_loops)
>>> + opts->x_flag_split_loops = value;
>>> if (!opts_set->x_flag_unswitch_loops)
>>> opts->x_flag_unswitch_loops = value;
>>> if (!opts_set->x_flag_gcse_after_reload)
>>> Index: timevar.def
>>> ===================================================================
>>> --- timevar.def (revision 231115)
>>> +++ timevar.def (working copy)
>>> @@ -182,6 +182,7 @@ DEFTIMEVAR (TV_LIM , "
>>> DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv")
>>> DEFTIMEVAR (TV_SCEV_CONST , "scev constant prop")
>>> DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH , "tree loop unswitching")
>>> +DEFTIMEVAR (TV_LOOP_SPLIT , "loop splitting")
>>> DEFTIMEVAR (TV_COMPLETE_UNROLL , "complete unrolling")
>>> DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
>>> DEFTIMEVAR (TV_TREE_VECTORIZATION , "tree vectorization")
>>> Index: tree-pass.h
>>> ===================================================================
>>> --- tree-pass.h (revision 231115)
>>> +++ tree-pass.h (working copy)
>>> @@ -370,6 +370,7 @@ extern gimple_opt_pass *make_pass_tree_n
>>> extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
>>> extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
>>> extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
>>> +extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
>>> extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
>>> extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
>>> extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
>>> Index: tree-ssa-loop-manip.h
>>> ===================================================================
>>> --- tree-ssa-loop-manip.h (revision 231115)
>>> +++ tree-ssa-loop-manip.h (working copy)
>>> @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
>>>
>>> extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
>>> bool, tree *, tree *);
>>> +extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
>>> + struct loop *);
>>> extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
>>> extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
>>> extern void verify_loop_closed_ssa (bool);
>>> Index: Makefile.in
>>> ===================================================================
>>> --- Makefile.in (revision 231115)
>>> +++ Makefile.in (working copy)
>>> @@ -1474,6 +1474,7 @@ OBJS = \
>>> tree-ssa-loop-manip.o \
>>> tree-ssa-loop-niter.o \
>>> tree-ssa-loop-prefetch.o \
>>> + tree-ssa-loop-split.o \
>>> tree-ssa-loop-unswitch.o \
>>> tree-ssa-loop.o \
>>> tree-ssa-math-opts.o \
>>> Index: tree-ssa-loop-split.c
>>> ===================================================================
>>> --- tree-ssa-loop-split.c (revision 0)
>>> +++ tree-ssa-loop-split.c (working copy)
>>> @@ -0,0 +1,686 @@
>>> +/* Loop splitting.
>>> + Copyright (C) 2015 Free Software Foundation, Inc.
>>> +
>>> +This file is part of GCC.
>>> +
>>> +GCC is free software; you can redistribute it and/or modify it
>>> +under the terms of the GNU General Public License as published by the
>>> +Free Software Foundation; either version 3, or (at your option) any
>>> +later version.
>>> +
>>> +GCC is distributed in the hope that it will be useful, but WITHOUT
>>> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
>>> +for more details.
>>> +
>>> +You should have received a copy of the GNU General Public License
>>> +along with GCC; see the file COPYING3. If not see
>>> +<http://www.gnu.org/licenses/>. */
>>> +
>>> +#include "config.h"
>>> +#include "system.h"
>>> +#include "coretypes.h"
>>> +#include "backend.h"
>>> +#include "tree.h"
>>> +#include "gimple.h"
>>> +#include "tree-pass.h"
>>> +#include "ssa.h"
>>> +#include "fold-const.h"
>>> +#include "tree-cfg.h"
>>> +#include "tree-ssa.h"
>>> +#include "tree-ssa-loop-niter.h"
>>> +#include "tree-ssa-loop.h"
>>> +#include "tree-ssa-loop-manip.h"
>>> +#include "tree-into-ssa.h"
>>> +#include "cfgloop.h"
>>> +#include "tree-scalar-evolution.h"
>>> +#include "gimple-iterator.h"
>>> +#include "gimple-pretty-print.h"
>>> +#include "cfghooks.h"
>>> +#include "gimple-fold.h"
>>> +#include "gimplify-me.h"
>>> +
>>> +/* This file implements loop splitting, i.e. transformation of loops like
>>> +
>>> + for (i = 0; i < 100; i++)
>>> + {
>>> + if (i < 50)
>>> + A;
>>> + else
>>> + B;
>>> + }
>>> +
>>> + into:
>>> +
>>> + for (i = 0; i < 50; i++)
>>> + {
>>> + A;
>>> + }
>>> + for (; i < 100; i++)
>>> + {
>>> + B;
>>> + }
>>> +
>>> + */
>>> +
>>> +/* Return true when BB inside LOOP is a potential iteration space
>>> + split point, i.e. ends with a condition like "IV < comp", which
>>> + is true on one side of the iteration space and false on the other,
>>> + and the split point can be computed. If so, also return the border
>>> + point in *BORDER and the comparison induction variable in IV. */
>>> +
>>> +static tree
>>> +split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv)
>>> +{
>>> + gimple *last;
>>> + gcond *stmt;
>>> + affine_iv iv2;
>>> +
>>> + /* BB must end in a simple conditional jump. */
>>> + last = last_stmt (bb);
>>> + if (!last || gimple_code (last) != GIMPLE_COND)
>>> + return NULL_TREE;
>>> + stmt = as_a <gcond *> (last);
>>> +
>>> + enum tree_code code = gimple_cond_code (stmt);
>>> +
>>> + /* Only handle relational comparisons, for equality and non-equality
>>> + we'd have to split the loop into two loops and a middle statement. */
>>> + switch (code)
>>> + {
>>> + case LT_EXPR:
>>> + case LE_EXPR:
>>> + case GT_EXPR:
>>> + case GE_EXPR:
>>> + break;
>>> + default:
>>> + return NULL_TREE;
>>> + }
>>> +
>>> + if (loop_exits_from_bb_p (loop, bb))
>>> + return NULL_TREE;
>>> +
>>> + tree op0 = gimple_cond_lhs (stmt);
>>> + tree op1 = gimple_cond_rhs (stmt);
>>> +
>>> + if (!simple_iv (loop, loop, op0, iv, false))
>>> + return NULL_TREE;
>>> + if (!simple_iv (loop, loop, op1, &iv2, false))
>>> + return NULL_TREE;
>>> +
>>> + /* Make it so that the first argument of the condition is
>>> + the looping one. */
>>> + if (!integer_zerop (iv2.step))
>>> + {
>>> + std::swap (op0, op1);
>>> + std::swap (*iv, iv2);
>>> + code = swap_tree_comparison (code);
>>> + gimple_cond_set_condition (stmt, code, op0, op1);
>>> + update_stmt (stmt);
>>> + }
>>> + else if (integer_zerop (iv->step))
>>> + return NULL_TREE;
>>> + if (!integer_zerop (iv2.step))
>>> + return NULL_TREE;
>>> +
>>> + if (dump_file && (dump_flags & TDF_DETAILS))
>>> + {
>>> + fprintf (dump_file, "Found potential split point: ");
>>> + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>>> + fprintf (dump_file, " { ");
>>> + print_generic_expr (dump_file, iv->base, TDF_SLIM);
>>> + fprintf (dump_file, " + I*");
>>> + print_generic_expr (dump_file, iv->step, TDF_SLIM);
>>> + fprintf (dump_file, " } %s ", get_tree_code_name (code));
>>> + print_generic_expr (dump_file, iv2.base, TDF_SLIM);
>>> + fprintf (dump_file, "\n");
>>> + }
>>> +
>>> + *border = iv2.base;
>>> + return op0;
>>> +}
>>> +
>>> +/* Given a GUARD conditional stmt inside LOOP, which we want to make always
>>> + true or false depending on INITIAL_TRUE, and adjusted values NEXTVAL
>>> + (a post-increment IV) and NEWBOUND (the comparator) adjust the loop
>>> + exit test statement to loop back only if the GUARD statement will
>>> + also be true/false in the next iteration. */
>>> +
>>> +static void
>>> +patch_loop_exit (struct loop *loop, gcond *guard, tree nextval, tree newbound,
>>> + bool initial_true)
>>> +{
>>> + edge exit = single_exit (loop);
>>> + gcond *stmt = as_a <gcond *> (last_stmt (exit->src));
>>> + gimple_cond_set_condition (stmt, gimple_cond_code (guard),
>>> + nextval, newbound);
>>> + update_stmt (stmt);
>>> +
>>> + edge stay = single_pred_edge (loop->latch);
>>> +
>>> + exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
>>> + stay->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
>>> +
>>> + if (initial_true)
>>> + {
>>> + exit->flags |= EDGE_FALSE_VALUE;
>>> + stay->flags |= EDGE_TRUE_VALUE;
>>> + }
>>> + else
>>> + {
>>> + exit->flags |= EDGE_TRUE_VALUE;
>>> + stay->flags |= EDGE_FALSE_VALUE;
>>> + }
>>> +}
>>> +
>>> +/* Given an induction variable GUARD_IV, and its affine descriptor IV,
>>> + find the loop phi node in LOOP defining it directly, or create
>>> + such phi node. Return that phi node. */
>>> +
>>> +static gphi *
>>> +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/)
>>> +{
>>> + gimple *def = SSA_NAME_DEF_STMT (guard_iv);
>>> + gphi *phi;
>>> + if ((phi = dyn_cast <gphi *> (def))
>>> + && gimple_bb (phi) == loop->header)
>>> + return phi;
>>> +
>>> + /* XXX Create the PHI instead. */
>>> + return NULL;
>>> +}
>>> +
>>> +/* This function updates the SSA form after connect_loops made a new
>>> + edge NEW_E leading from LOOP1 exit to LOOP2 (via an intermediate
>>> + conditional). I.e. the second loop can now be entered either
>>> + via the original entry or via NEW_E, so the entry values of LOOP2
>>> + phi nodes are either the original ones or those at the exit
>>> + of LOOP1. Insert new phi nodes in LOOP2 pre-header reflecting
>>> + this. */
>>> +
>>> +static void
>>> +connect_loop_phis (struct loop *loop1, struct loop *loop2, edge new_e)
>>> +{
>>> + basic_block rest = loop_preheader_edge (loop2)->src;
>>> + gcc_assert (new_e->dest == rest);
>>> + edge skip_first = EDGE_PRED (rest, EDGE_PRED (rest, 0) == new_e);
>>> +
>>> + edge firste = loop_preheader_edge (loop1);
>>> + edge seconde = loop_preheader_edge (loop2);
>>> + edge firstn = loop_latch_edge (loop1);
>>> + gphi_iterator psi_first, psi_second;
>>> + for (psi_first = gsi_start_phis (loop1->header),
>>> + psi_second = gsi_start_phis (loop2->header);
>>> + !gsi_end_p (psi_first);
>>> + gsi_next (&psi_first), gsi_next (&psi_second))
>>> + {
>>> + tree init, next, new_init;
>>> + use_operand_p op;
>>> + gphi *phi_first = psi_first.phi ();
>>> + gphi *phi_second = psi_second.phi ();
>>> +
>>> + init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste);
>>> + next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn);
>>> + op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde);
>>> + gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
>>> +
>>> + /* Prefer using original variable as a base for the new ssa name.
>>> + This is necessary for virtual ops, and useful in order to avoid
>>> + losing debug info for real ops. */
>>> + if (TREE_CODE (next) == SSA_NAME
>>> + && useless_type_conversion_p (TREE_TYPE (next),
>>> + TREE_TYPE (init)))
>>> + new_init = copy_ssa_name (next);
>>> + else if (TREE_CODE (init) == SSA_NAME
>>> + && useless_type_conversion_p (TREE_TYPE (init),
>>> + TREE_TYPE (next)))
>>> + new_init = copy_ssa_name (init);
>>> + else if (useless_type_conversion_p (TREE_TYPE (next),
>>> + TREE_TYPE (init)))
>>> + new_init = make_temp_ssa_name (TREE_TYPE (next), NULL,
>>> + "unrinittmp");
>>> + else
>>> + new_init = make_temp_ssa_name (TREE_TYPE (init), NULL,
>>> + "unrinittmp");
>>> +
>>> + gphi * newphi = create_phi_node (new_init, rest);
>>> + add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION);
>>> + add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION);
>>> + SET_USE (op, new_init);
>>> + }
>>> +}
>>> +
>>> +/* The two loops LOOP1 and LOOP2 were just created by loop versioning,
>>> + they are still equivalent and placed in two arms of a diamond, like so:
>>> +
>>> + .------if (cond)------.
>>> + v v
>>> + pre1 pre2
>>> + | |
>>> + .--->h1 h2<----.
>>> + | | | |
>>> + | ex1---. .---ex2 |
>>> + | / | | \ |
>>> + '---l1 X | l2---'
>>> + | |
>>> + | |
>>> + '--->join<---'
>>> +
>>> + This function transforms the program such that LOOP1 is conditionally
>>> + falling through to LOOP2, or skipping it. This is done by splitting
>>> + the ex1->join edge at X in the diagram above, and inserting a condition
>>> + whose one arm goes to pre2, resulting in this situation:
>>> +
>>> + .------if (cond)------.
>>> + v v
>>> + pre1 .---------->pre2
>>> + | | |
>>> + .--->h1 | h2<----.
>>> + | | | | |
>>> + | ex1---. | .---ex2 |
>>> + | / v | | \ |
>>> + '---l1 skip---' | l2---'
>>> + | |
>>> + | |
>>> + '--->join<---'
>>> +
>>> +
>>> + The condition used is the exit condition of LOOP1, which effectively means
>>> + that when the first loop exits (for whatever reason) but the real original
>>> + exit expression is still false the second loop will be entered.
>>> + The function returns the new edge cond->pre2.
>>> +
>>> + This doesn't update the SSA form, see connect_loop_phis for that. */
>>> +
>>> +static edge
>>> +connect_loops (struct loop *loop1, struct loop *loop2)
>>> +{
>>> + edge exit = single_exit (loop1);
>>> + basic_block skip_bb = split_edge (exit);
>>> + gcond *skip_stmt;
>>> + gimple_stmt_iterator gsi;
>>> + edge new_e, skip_e;
>>> +
>>> + gimple *stmt = last_stmt (exit->src);
>>> + skip_stmt = gimple_build_cond (gimple_cond_code (stmt),
>>> + gimple_cond_lhs (stmt),
>>> + gimple_cond_rhs (stmt),
>>> + NULL_TREE, NULL_TREE);
>>> + gsi = gsi_last_bb (skip_bb);
>>> + gsi_insert_after (&gsi, skip_stmt, GSI_NEW_STMT);
>>> +
>>> + skip_e = EDGE_SUCC (skip_bb, 0);
>>> + skip_e->flags &= ~EDGE_FALLTHRU;
>>> + new_e = make_edge (skip_bb, loop_preheader_edge (loop2)->src, 0);
>>> + if (exit->flags & EDGE_TRUE_VALUE)
>>> + {
>>> + skip_e->flags |= EDGE_TRUE_VALUE;
>>> + new_e->flags |= EDGE_FALSE_VALUE;
>>> + }
>>> + else
>>> + {
>>> + skip_e->flags |= EDGE_FALSE_VALUE;
>>> + new_e->flags |= EDGE_TRUE_VALUE;
>>> + }
>>> +
>>> + new_e->count = skip_bb->count;
>>> + new_e->probability = PROB_LIKELY;
>>> + new_e->count = apply_probability (skip_e->count, PROB_LIKELY);
>>> + skip_e->count -= new_e->count;
>>> + skip_e->probability = inverse_probability (PROB_LIKELY);
>>> +
>>> + return new_e;
>>> +}
>>> +
>>> +/* This returns the new bound for iterations given the original iteration
>>> + space in NITER, an arbitrary new bound BORDER, assumed to be some
>>> + comparison value with a different IV, the initial value GUARD_INIT of
>>> + that other IV, and the comparison code GUARD_CODE that compares
>>> + that other IV with BORDER. We return an SSA name, and place any
>>> + necessary statements for that computation into *STMTS.
>>> +
>>> + For example for such a loop:
>>> +
>>> + for (i = beg, j = guard_init; i < end; i++, j++)
>>> + if (j < border) // this is supposed to be true/false
>>> + ...
>>> +
>>> + we want to return a new bound (on j) that makes the loop iterate
>>> + as long as the condition j < border stays true. We also don't want
>>> + to iterate more often than the original loop, so we have to introduce
>>> + some cut-off as well (via min/max), effectively resulting in:
>>> +
>>> + newend = min (end+guard_init-beg, border)
>>> + for (i = beg, j = guard_init; j < newend; i++, j++)
>>> + if (j < border)
>>> + ...
>>> +
>>> + Depending on the direction of the IVs and if the exit tests
>>> + are strict or non-strict we need to use MIN or MAX,
>>> + and add or subtract 1. This routine computes newend above. */
>>> +
>>> +static tree
>>> +compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter,
>>> + tree border,
>>> + enum tree_code guard_code, tree guard_init)
>>> +{
>>> + /* The niter structure contains the after-increment IV, we need
>>> + the loop-enter base, so subtract STEP once. */
>>> + tree controlbase = force_gimple_operand (niter->control.base,
>>> + stmts, true, NULL_TREE);
>>> + tree controlstep = niter->control.step;
>>> + tree enddiff;
>>> + if (POINTER_TYPE_P (TREE_TYPE (controlbase)))
>>> + {
>>> + controlstep = gimple_build (stmts, NEGATE_EXPR,
>>> + TREE_TYPE (controlstep), controlstep);
>>> + enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
>>> + TREE_TYPE (controlbase),
>>> + controlbase, controlstep);
>>> + }
>>> + else
>>> + enddiff = gimple_build (stmts, MINUS_EXPR,
>>> + TREE_TYPE (controlbase),
>>> + controlbase, controlstep);
>>> +
>>> + /* Compute beg-guard_init. */
>>> + if (POINTER_TYPE_P (TREE_TYPE (enddiff)))
>>> + {
>>> + tree tem = gimple_convert (stmts, sizetype, guard_init);
>>> + tem = gimple_build (stmts, NEGATE_EXPR, sizetype, tem);
>>> + enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
>>> + TREE_TYPE (enddiff),
>>> + enddiff, tem);
>>> + }
>>> + else
>>> + enddiff = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
>>> + enddiff, guard_init);
>>> +
>>> + /* Compute end-(beg-guard_init). */
>>> + gimple_seq stmts2;
>>> + tree newbound = force_gimple_operand (niter->bound, &stmts2,
>>> + true, NULL_TREE);
>>> + gimple_seq_add_seq_without_update (stmts, stmts2);
>>> +
>>> + if (POINTER_TYPE_P (TREE_TYPE (enddiff))
>>> + || POINTER_TYPE_P (TREE_TYPE (newbound)))
>>> + {
>>> + enddiff = gimple_convert (stmts, sizetype, enddiff);
>>> + enddiff = gimple_build (stmts, NEGATE_EXPR, sizetype, enddiff);
>>> + newbound = gimple_build (stmts, POINTER_PLUS_EXPR,
>>> + TREE_TYPE (newbound),
>>> + newbound, enddiff);
>>> + }
>>> + else
>>> + newbound = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
>>> + newbound, enddiff);
>>> +
>>> + /* Depending on the direction of the IVs the new bound for the first
>>> + loop is the minimum or maximum of old bound and border.
>>> + Also, if the guard condition isn't strictly less or greater,
>>> + we need to adjust the bound. */
>>> + int addbound = 0;
>>> + enum tree_code minmax;
>>> + if (niter->cmp == LT_EXPR)
>>> + {
>>> + /* GT and LE are the same, inverted. */
>>> + if (guard_code == GT_EXPR || guard_code == LE_EXPR)
>>> + addbound = -1;
>>> + minmax = MIN_EXPR;
>>> + }
>>> + else
>>> + {
>>> + gcc_assert (niter->cmp == GT_EXPR);
>>> + if (guard_code == GE_EXPR || guard_code == LT_EXPR)
>>> + addbound = 1;
>>> + minmax = MAX_EXPR;
>>> + }
>>> +
>>> + if (addbound)
>>> + {
>>> + tree type2 = TREE_TYPE (newbound);
>>> + if (POINTER_TYPE_P (type2))
>>> + type2 = sizetype;
>>> + newbound = gimple_build (stmts,
>>> + POINTER_TYPE_P (TREE_TYPE (newbound))
>>> + ? POINTER_PLUS_EXPR : PLUS_EXPR,
>>> + TREE_TYPE (newbound),
>>> + newbound,
>>> + build_int_cst (type2, addbound));
>>> + }
>>> +
>>> + tree newend = gimple_build (stmts, minmax, TREE_TYPE (border),
>>> + border, newbound);
>>> + return newend;
>>> +}
>>> +
>>> +/* Checks if LOOP contains a conditional block whose condition
>>> + depends on which side in the iteration space it is, and if so
>>> + splits the iteration space into two loops. Returns true if the
>>> + loop was split. NITER must contain the iteration descriptor for the
>>> + single exit of LOOP. */
>>> +
>>> +static bool
>>> +split_loop (struct loop *loop1, struct tree_niter_desc *niter)
>>> +{
>>> + basic_block *bbs;
>>> + unsigned i;
>>> + bool changed = false;
>>> + tree guard_iv;
>>> + tree border;
>>> + affine_iv iv;
>>> +
>>> + bbs = get_loop_body (loop1);
>>> +
>>> + /* Find a splitting opportunity. */
>>> + for (i = 0; i < loop1->num_nodes; i++)
>>> + if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
>>> + {
>>> + /* Handling opposite steps is not implemented yet. Neither
>>> + is handling different step sizes. */
>>> + if ((tree_int_cst_sign_bit (iv.step)
>>> + != tree_int_cst_sign_bit (niter->control.step))
>>> + || !tree_int_cst_equal (iv.step, niter->control.step))
>>> + continue;
>>> +
>>> + /* Find a loop PHI node that defines guard_iv directly,
>>> + or create one doing that. */
>>> + gphi *phi = find_or_create_guard_phi (loop1, guard_iv, &iv);
>>> + if (!phi)
>>> + continue;
>>> + gcond *guard_stmt = as_a<gcond *> (last_stmt (bbs[i]));
>>> + tree guard_init = PHI_ARG_DEF_FROM_EDGE (phi,
>>> + loop_preheader_edge (loop1));
>>> + enum tree_code guard_code = gimple_cond_code (guard_stmt);
>>> +
>>> + /* Loop splitting is implemented by versioning the loop, placing
>>> + the new loop after the old loop, making the first loop iterate
>>> + as long as the conditional stays true (or false) and letting the
>>> + second (new) loop handle the rest of the iterations.
>>> +
>>> + First we need to determine if the condition will start being true
>>> + or false in the first loop. */
>>> + bool initial_true;
>>> + switch (guard_code)
>>> + {
>>> + case LT_EXPR:
>>> + case LE_EXPR:
>>> + initial_true = !tree_int_cst_sign_bit (iv.step);
>>> + break;
>>> + case GT_EXPR:
>>> + case GE_EXPR:
>>> + initial_true = tree_int_cst_sign_bit (iv.step);
>>> + break;
>>> + default:
>>> + gcc_unreachable ();
>>> + }
>>> +
>>> + /* Build a condition that will skip the first loop when the
>>> + guard condition won't ever be true (or false). */
>>> + gimple_seq stmts2;
>>> + border = force_gimple_operand (border, &stmts2, true, NULL_TREE);
>>> + if (stmts2)
>>> + gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
>>> + stmts2);
>>> + tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
>>> + if (!initial_true)
>>> + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>> +
>>> + /* Now version the loop, placing loop2 after loop1 connecting
>>> + them, and fix up SSA form for that. */
>>> + initialize_original_copy_tables ();
>>> + basic_block cond_bb;
>>> + struct loop *loop2 = loop_version (loop1, cond, &cond_bb,
>>> + REG_BR_PROB_BASE, REG_BR_PROB_BASE,
>>> + REG_BR_PROB_BASE, true);
>>> + gcc_assert (loop2);
>>> + update_ssa (TODO_update_ssa);
>>> +
>>> + edge new_e = connect_loops (loop1, loop2);
>>> + connect_loop_phis (loop1, loop2, new_e);
>>> +
>>> + /* The iterations of the second loop are now already
>>> + exactly those that the first loop didn't do, but the
>>> + iteration space of the first loop is still the original one.
>>> + Compute the new bound for the guarding IV and patch the
>>> + loop exit to use it instead of original IV and bound. */
>>> + gimple_seq stmts = NULL;
>>> + tree newend = compute_new_first_bound (&stmts, niter, border,
>>> + guard_code, guard_init);
>>> + if (stmts)
>>> + gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
>>> + stmts);
>>> + tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
>>> + patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
>>> +
>>> + /* Finally patch out the two copies of the condition to be always
>>> + true/false (or opposite). */
>>> + gcond *force_true = as_a<gcond *> (last_stmt (bbs[i]));
>>> + gcond *force_false = as_a<gcond *> (last_stmt (get_bb_copy (bbs[i])));
>>> + if (!initial_true)
>>> + std::swap (force_true, force_false);
>>> + gimple_cond_make_true (force_true);
>>> + gimple_cond_make_false (force_false);
>>> + update_stmt (force_true);
>>> + update_stmt (force_false);
>>> +
>>> + free_original_copy_tables ();
>>> +
>>> + /* We destroyed LCSSA form above. Eventually we might be able
>>> + to fix it on the fly, for now simply punt and use the helper. */
>>> + rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
>>> +
>>> + changed = true;
>>> + if (dump_file && (dump_flags & TDF_DETAILS))
>>> + fprintf (dump_file, ";; Loop split.\n");
>>> +
>>> + /* Only deal with the first opportunity. */
>>> + break;
>>> + }
>>> +
>>> + free (bbs);
>>> + return changed;
>>> +}
>>> +
>>> +/* Main entry point. Perform loop splitting on all suitable loops. */
>>> +
>>> +static unsigned int
>>> +tree_ssa_split_loops (void)
>>> +{
>>> + struct loop *loop;
>>> + bool changed = false;
>>> +
>>> + gcc_assert (scev_initialized_p ());
>>> + FOR_EACH_LOOP (loop, 0)
>>> + loop->aux = NULL;
>>> +
>>> + /* Go through all loops starting from innermost. */
>>> + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
>>> + {
>>> + struct tree_niter_desc niter;
>>> + if (loop->aux)
>>> + {
>>> + /* If any of our inner loops was split, don't split us,
>>> + and mark our containing loop as having had splits as well. */
>>> + loop_outer (loop)->aux = loop;
>>> + continue;
>>> + }
>>> +
>>> + if (single_exit (loop)
>>> + /* ??? We could handle non-empty latches when we split
>>> + the latch edge (not the exit edge), and put the new
>>> + exit condition in the new block. OTOH this executes some
>>> + code unconditionally that might have been skipped by the
>>> + original exit before. */
>>> + && empty_block_p (loop->latch)
>>> + && !optimize_loop_for_size_p (loop)
>>> + && number_of_iterations_exit (loop, single_exit (loop), &niter,
>>> + false, true)
>>> + && niter.cmp != ERROR_MARK
>>> + /* We can't yet handle loops controlled by a != predicate. */
>>> + && niter.cmp != NE_EXPR)
>>> + {
>>> + if (split_loop (loop, &niter))
>>> + {
>>> + /* Mark our containing loop as having had some split inner
>>> + loops. */
>>> + loop_outer (loop)->aux = loop;
>>> + changed = true;
>>> + }
>>> + }
>>> + }
>>> +
>>> + FOR_EACH_LOOP (loop, 0)
>>> + loop->aux = NULL;
>>> +
>>> + if (changed)
>>> + return TODO_cleanup_cfg;
>>> + return 0;
>>> +}
>>> +
>>> +/* Loop splitting pass. */
>>> +
>>> +namespace {
>>> +
>>> +const pass_data pass_data_loop_split =
>>> +{
>>> + GIMPLE_PASS, /* type */
>>> + "lsplit", /* name */
>>> + OPTGROUP_LOOP, /* optinfo_flags */
>>> + TV_LOOP_SPLIT, /* tv_id */
>>> + PROP_cfg, /* properties_required */
>>> + 0, /* properties_provided */
>>> + 0, /* properties_destroyed */
>>> + 0, /* todo_flags_start */
>>> + 0, /* todo_flags_finish */
>>> +};
>>> +
>>> +class pass_loop_split : public gimple_opt_pass
>>> +{
>>> +public:
>>> + pass_loop_split (gcc::context *ctxt)
>>> + : gimple_opt_pass (pass_data_loop_split, ctxt)
>>> + {}
>>> +
>>> + /* opt_pass methods: */
>>> + virtual bool gate (function *) { return flag_split_loops != 0; }
>>> + virtual unsigned int execute (function *);
>>> +
>>> +}; // class pass_loop_split
>>> +
>>> +unsigned int
>>> +pass_loop_split::execute (function *fun)
>>> +{
>>> + if (number_of_loops (fun) <= 1)
>>> + return 0;
>>> +
>>> + return tree_ssa_split_loops ();
>>> +}
>>> +
>>> +} // anon namespace
>>> +
>>> +gimple_opt_pass *
>>> +make_pass_loop_split (gcc::context *ctxt)
>>> +{
>>> + return new pass_loop_split (ctxt);
>>> +}
>>> Index: doc/invoke.texi
>>> ===================================================================
>>> --- doc/invoke.texi (revision 231115)
>>> +++ doc/invoke.texi (working copy)
>>> @@ -446,7 +446,7 @@ Objective-C and Objective-C++ Dialects}.
>>> -fselective-scheduling -fselective-scheduling2 @gol
>>> -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol
>>> -fsemantic-interposition -fshrink-wrap -fsignaling-nans @gol
>>> --fsingle-precision-constant -fsplit-ivs-in-unroller @gol
>>> +-fsingle-precision-constant -fsplit-ivs-in-unroller -fsplit-loops @gol
>>> -fsplit-paths @gol
>>> -fsplit-wide-types -fssa-backprop -fssa-phiopt @gol
>>> -fstack-protector -fstack-protector-all -fstack-protector-strong @gol
>>> @@ -10197,6 +10197,11 @@ Enabled with @option{-fprofile-use}.
>>> Enables the loop invariant motion pass in the RTL loop optimizer. Enabled
>>> at level @option{-O1}
>>>
>>> +@item -fsplit-loops
>>> +@opindex fsplit-loops
>>> +Split a loop into two if it contains a condition that's always true
>>> +for one side of the iteration space and false for the other.
>>> +
>>> @item -funswitch-loops
>>> @opindex funswitch-loops
>>> Move branches with loop invariant conditions out of the loop, with duplicates
>>> Index: doc/passes.texi
>>> ===================================================================
>>> --- doc/passes.texi (revision 231115)
>>> +++ doc/passes.texi (working copy)
>>> @@ -484,6 +484,12 @@ out of the loops. To achieve this, a du
>>> each possible outcome of conditional jump(s). The pass is implemented in
>>> @file{tree-ssa-loop-unswitch.c}.
>>>
>>> +Loop splitting. If a loop contains a conditional statement that is
>>> +always true for one part of the iteration space and false for the other,
>>> +this pass splits the loop into two, one handling only the first part and
>>> +the other only the second, thereby removing one inner-loop conditional. The
>>> +pass is implemented in @file{tree-ssa-loop-split.c}.
>>> +
>>> The optimizations also use various utility functions contained in
>>> @file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and
>>> @file{cfgloopmanip.c}.
>>> Index: testsuite/gcc.dg/loop-split.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/loop-split.c (revision 0)
>>> +++ testsuite/gcc.dg/loop-split.c (working copy)
>>> @@ -0,0 +1,147 @@
>>> +/* { dg-do run } */
>>> +/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
>>> +
>>> +#ifdef __cplusplus
>>> +extern "C" int printf (const char *, ...);
>>> +extern "C" void abort (void);
>>> +#else
>>> +extern int printf (const char *, ...);
>>> +extern void abort (void);
>>> +#endif
>>> +
>>> +/* Define TRACE to 1 or 2 to get detailed tracing.
>>> + Define SINGLE_TEST to 1 or 2 to get a simple routine with
>>> + just one loop, called only one time or with multiple parameters,
>>> + to make debugging easier. */
>>> +#ifndef TRACE
>>> +#define TRACE 0
>>> +#endif
>>> +
>>> +#define loop(beg,step,beg2,cond1,cond2) \
>>> + do \
>>> + { \
>>> + sum = 0; \
>>> + for (i = (beg), j = (beg2); (cond1); i+=(step),j+=(step)) \
>>> + { \
>>> + if (cond2) { \
>>> + if (TRACE > 1) printf ("a: %d %d\n", i, j); \
>>> + sum += a[i]; \
>>> + } else { \
>>> + if (TRACE > 1) printf ("b: %d %d\n", i, j); \
>>> + sum += b[i]; \
>>> + } \
>>> + } \
>>> + if (TRACE > 0) printf ("sum: %d\n", sum); \
>>> + check = check * 47 + sum; \
>>> + } while (0)
>>> +
>>> +#ifndef SINGLE_TEST
>>> +unsigned __attribute__((noinline, noclone)) dotest (int beg, int end, int step,
>>> + int c, int *a, int *b, int beg2)
>>> +{
>>> + unsigned check = 0;
>>> + int sum;
>>> + int i, j;
>>> + loop (beg, 1, beg2, i < end, j < c);
>>> + loop (beg, 1, beg2, i <= end, j < c);
>>> + loop (beg, 1, beg2, i < end, j <= c);
>>> + loop (beg, 1, beg2, i <= end, j <= c);
>>> + loop (beg, 1, beg2, i < end, j > c);
>>> + loop (beg, 1, beg2, i <= end, j > c);
>>> + loop (beg, 1, beg2, i < end, j >= c);
>>> + loop (beg, 1, beg2, i <= end, j >= c);
>>> + beg2 += end-beg;
>>> + loop (end, -1, beg2, i >= beg, j >= c);
>>> + loop (end, -1, beg2, i >= beg, j > c);
>>> + loop (end, -1, beg2, i > beg, j >= c);
>>> + loop (end, -1, beg2, i > beg, j > c);
>>> + loop (end, -1, beg2, i >= beg, j <= c);
>>> + loop (end, -1, beg2, i >= beg, j < c);
>>> + loop (end, -1, beg2, i > beg, j <= c);
>>> + loop (end, -1, beg2, i > beg, j < c);
>>> + return check;
>>> +}
>>> +
>>> +#else
>>> +
>>> +int __attribute__((noinline, noclone)) f (int beg, int end, int step,
>>> + int c, int *a, int *b, int beg2)
>>> +{
>>> + int sum = 0;
>>> + int i, j;
>>> + //for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
>>> + for (i = end, j = beg2 + (end-beg); i > beg; i += -1, j-- /*step*/)
>>> + {
>>> + // i - j == X --> i = X + j
>>> + // --> i < end == X+j < end == j < end - X
>>> + // --> newend = end - (i_init - j_init)
>>> + // j < end-X && j < c --> j < min(end-X,c)
>>> + // j < end-X && j <= c --> j <= min(end-X-1,c) or j < min(end-X,c+1{OF!})
>>> + //if (j < c)
>>> + if (j >= c)
>>> + printf ("a: %d %d\n", i, j);
>>> + /*else
>>> + printf ("b: %d %d\n", i, j);*/
>>> + /*sum += a[i];
>>> + else
>>> + sum += b[i];*/
>>> + }
>>> + return sum;
>>> +}
>>> +
>>> +int __attribute__((noinline, noclone)) f2 (int *beg, int *end, int step,
>>> + int *c, int *a, int *b, int *beg2)
>>> +{
>>> + int sum = 0;
>>> + int *i, *j;
>>> + for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
>>> + {
>>> + if (j <= c)
>>> + printf ("%d %d\n", i - beg, j - beg);
>>> + /*sum += a[i];
>>> + else
>>> + sum += b[i];*/
>>> + }
>>> + return sum;
>>> +}
>>> +#endif
>>> +
>>> +extern int printf (const char *, ...);
>>> +
>>> +int main ()
>>> +{
>>> + int a[] = {0,0,0,0,0, 1,2,3,4,5,6,7,8,9, 0,0,0,0,0};
>>> + int b[] = {0,0,0,0,0, -1,-2,-3,-4,-5,-6,-7,-8,-9, 0,0,0,0,0,};
>>> + int c;
>>> + int diff = 0;
>>> + unsigned check = 0;
>>> +#if defined(SINGLE_TEST) && (SINGLE_TEST == 1)
>>> + //dotest (0, 9, 1, -1, a+5, b+5, -1);
>>> + //return 0;
>>> + f (0, 9, 1, 5, a+5, b+5, -1);
>>> + return 0;
>>> +#endif
>>> + for (diff = -5; diff <= 5; diff++)
>>> + {
>>> + for (c = -1; c <= 10; c++)
>>> + {
>>> +#ifdef SINGLE_TEST
>>> + int s = f (0, 9, 1, c, a+5, b+5, diff);
>>> + //int s = f2 (a+0, a+9, 1, a+c, a+5, b+5, a+diff);
>>> + printf ("%d ", s);
>>> +#else
>>> + if (TRACE > 0)
>>> + printf ("check %d %d\n", c, diff);
>>> + check = check * 51 + dotest (0, 9, 1, c, a+5, b+5, diff);
>>> +#endif
>>> + }
>>> + //printf ("\n");
>>> + }
>>> + //printf ("%u\n", check);
>>> + if (check != 3213344948)
>>> + abort ();
>>> + return 0;
>>> +}
>>> +
>>> +/* All 16 loops in dotest should be split. */
>>> +/* { dg-final { scan-tree-dump-times "Loop split" 16 "lsplit" } } */
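As a reading aid for the bound computation described in the comment above
compute_new_first_bound, here is a minimal hand-written sketch of the
forward, strict-comparison case only (i < end, j < border, step 1). The
names sum_original, sum_split and min_int are hypothetical and do not
appear in the patch; the sketch assumes small values so that
end + guard_init - beg cannot overflow, and it is not what the pass
literally emits.

```c
/* Original form: the guard "j < border" is true for a prefix of the
   iteration space and false for the rest.  */
static int
sum_original (const int *a, const int *b, int beg, int end,
              int guard_init, int border)
{
  int sum = 0;
  for (int i = beg, j = guard_init; i < end; i++, j++)
    if (j < border)
      sum += a[i];
    else
      sum += b[i];
  return sum;
}

static int
min_int (int x, int y)
{
  return x < y ? x : y;
}

/* Hand-split form.  The first loop is controlled by the guard IV j and
   runs only while the guard holds, but never longer than the original
   loop would: newend = min (end + guard_init - beg, border).  The second
   loop picks up i where the first one stopped.  */
static int
sum_split (const int *a, const int *b, int beg, int end,
           int guard_init, int border)
{
  int sum = 0;
  int i = beg, j = guard_init;
  int newend = min_int (end + guard_init - beg, border);

  for (; j < newend; i++, j++)   /* guard known true here */
    sum += a[i];
  for (; i < end; i++, j++)      /* guard known false here */
    sum += b[i];
  return sum;
}
```

Both functions should return the same value for any border and guard_init,
including the cases where the border lies entirely before or after the
iteration space; the non-strict and backward variants need the +1/-1 and
MAX adjustments the comment mentions.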
* Re: Gimple loop splitting v2
2016-07-27 6:18 ` Andrew Pinski
@ 2016-07-27 8:11 ` Richard Biener
0 siblings, 0 replies; 20+ messages in thread
From: Richard Biener @ 2016-07-27 8:11 UTC (permalink / raw)
To: Andrew Pinski; +Cc: Michael Matz, Jeff Law, GCC Patches
On Wed, Jul 27, 2016 at 8:17 AM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Tue, Jul 26, 2016 at 4:32 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Jul 25, 2016 at 10:57 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>> On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz <matz@suse.de> wrote:
>>>> Hi,
>>>>
>>>> On Tue, 1 Dec 2015, Jeff Law wrote:
>>>>
>>>>> > So, okay for trunk?
>>>>> -ENOPATCH
>>>>
>>>> Sigh :)
>>>> Here it is.
>>>
>>>
>>> I found one problem with it.
>>> Take:
>>> void f(int *a, int M, int *b)
>>> {
>>> for(int i = 0; i <= M; i++)
>>> {
>>> if (i < M)
>>> a[i] = i;
>>> }
>>> }
>>> ---- CUT ---
>>> There are two issues with the code as below. The outermost loop's
>>> aux is still set, which causes the vectorizer not to vectorize the loop.
>>> The other issue is that I need to run pass_scev_cprop after
>>> pass_loop_split to eliminate the use of the induction variable after
>>> the loop, so that the vectorizer will work.
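As a point of reference, the effect splitting is meant to have on the function above can be written out by hand. This is only a sketch under the assumption that the guard i < M fails solely on the last iteration; f_orig, f_split and count_split_mismatches are illustrative names, and the split form is not actual compiler output.

```c
#include <assert.h>
#include <string.h>

/* The loop from the report: the guard i < M holds for every iteration
   except the final one, where i == M.  */
static void f_orig (int *a, int M)
{
  for (int i = 0; i <= M; i++)
    {
      if (i < M)
        a[i] = i;
    }
}

/* Hand-split form (illustrative only).  The first loop covers the
   iterations where the guard is true and has an unconditional store;
   the second covers the remaining iteration, where the guard is false
   and nothing is left to do.  */
static void f_split (int *a, int M)
{
  int i = 0;
  for (; i < M; i++)
    a[i] = i;
  for (; i <= M; i++)
    ;
}

/* Hypothetical helper: count values of M for which the two forms
   leave different array contents.  */
int count_split_mismatches (void)
{
  int mismatches = 0;
  for (int M = -2; M < 16; M++)
    {
      int x[16], y[16];
      memset (x, -1, sizeof x);
      memset (y, -1, sizeof y);
      f_orig (x, M);
      f_split (y, M);
      if (memcmp (x, y, sizeof x) != 0)
        mismatches++;
    }
  return mismatches;
}
```

In f_split the variable i is still live after the first loop, which is the induction variable usage after the loop that the report wants pass_scev_cprop to clean up.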
>>
>> I think scev_cprop needs to be rewritten as a utility so that the vectorizer
>> itself can (within its own cost-model) eliminate an induction using it.
>>
>> Richard.
>>
>>> Something like (note this is copy and paste from a terminal):
>>> diff --git a/gcc/passes.def b/gcc/passes.def
>>> index c327900..e8d6ea6 100644
>>> --- a/gcc/passes.def
>>> +++ b/gcc/passes.def
>>> @@ -262,8 +262,8 @@ along with GCC; see the file COPYING3. If not see
>>> NEXT_PASS (pass_copy_prop);
>>> NEXT_PASS (pass_dce);
>>> NEXT_PASS (pass_tree_unswitch);
>>> - NEXT_PASS (pass_scev_cprop);
>>> NEXT_PASS (pass_loop_split);
>>> + NEXT_PASS (pass_scev_cprop);
>>> NEXT_PASS (pass_record_bounds);
>>> NEXT_PASS (pass_loop_distribution);
>>> NEXT_PASS (pass_copy_prop);
>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>>> index 5411530..e72ef19 100644
>>> --- a/gcc/tree-ssa-loop-split.c
>>> +++ b/gcc/tree-ssa-loop-split.c
>>> @@ -592,7 +592,11 @@ tree_ssa_split_loops (void)
>>>
>>> gcc_assert (scev_initialized_p ());
>>> FOR_EACH_LOOP (loop, 0)
>>> - loop->aux = NULL;
>>> + {
>>> + loop->aux = NULL;
>>> + if (loop_outer (loop))
>>> + loop_outer (loop)->aux = NULL;
>>> + }
>>
>> How does the iterator not visit loop_outer (loop)?!
>
> The iterator with flags of 0 does not visit the root. So the way
> to fix this is to replace 0 (which is the flags) with LI_INCLUDE_ROOT so
> we zero out the root too.
Or not set ->aux on the root in the first place.
Richard.
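A toy model of the two options, as a rough sketch only: struct toy_loop and the helpers below are hypothetical stand-ins, not GCC's struct loop or its iterators. The assumptions are that the root of the loop tree has no outer loop and that the cleanup walk skips it, as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a loop tree node; only what is needed here.  */
struct toy_loop
{
  struct toy_loop *outer;   /* NULL for the root of the loop tree */
  void *aux;
};

/* Mimic a cleanup walk that does not visit the root.  */
static void clear_aux_skip_root (struct toy_loop *loops, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (loops[i].outer != NULL)
      loops[i].aux = NULL;
}

/* Mark the containing loop unconditionally after a split.  */
static void mark_outer (struct toy_loop *loop)
{
  loop->outer->aux = loop;
}

/* Alternative: never set aux on the root in the first place.  */
static void mark_outer_unless_root (struct toy_loop *loop)
{
  if (loop->outer->outer != NULL)
    loop->outer->aux = loop;
}

/* Build root -> inner, "split" inner, run the cleanup walk, and report
   whether the root's aux is still set afterwards.  */
int root_aux_leaks (int guard_root)
{
  struct toy_loop loops[2] = { { NULL, NULL }, { NULL, NULL } };
  loops[1].outer = &loops[0];
  if (guard_root)
    mark_outer_unless_root (&loops[1]);
  else
    mark_outer (&loops[1]);
  clear_aux_skip_root (loops, 2);
  return loops[0].aux != NULL;
}
```

With unconditional marking the root keeps a stale aux after cleanup; guarding the marking avoids that without changing the iterator flags.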
> Thanks,
> Andrew
>
>>
>>>
>>> /* Go through all loops starting from innermost. */
>>> FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
>>> @@ -631,7 +635,11 @@ tree_ssa_split_loops (void)
>>> }
>>>
>>> FOR_EACH_LOOP (loop, 0)
>>> - loop->aux = NULL;
>>> + {
>>> + loop->aux = NULL;
>>> + if (loop_outer (loop))
>>> + loop_outer (loop)->aux = NULL;
>>> + }
>>>
>>> if (changed)
>>> return TODO_cleanup_cfg;
>>> ----- CUT -----
>>>
>>> Thanks,
>>> Andrew
>>>
>>>
>>>>
>>>>
>>>> Ciao,
>>>> Michael.
>>>> * common.opt (-fsplit-loops): New flag.
>>>> * passes.def (pass_loop_split): Add.
>>>> * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
>>>> (enable_fdo_optimizations): Add loop splitting.
>>>> * timevar.def (TV_LOOP_SPLIT): Add.
>>>> * tree-pass.h (make_pass_loop_split): Declare.
>>>> * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
>>>> * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h.
>>>> * tree-ssa-loop-split.c: New file.
>>>> * Makefile.in (OBJS): Add tree-ssa-loop-split.o.
>>>> * doc/invoke.texi (fsplit-loops): Document.
>>>> * doc/passes.texi (Loop optimization): Add paragraph about loop
>>>> splitting.
>>>>
>>>> testsuite/
>>>> * gcc.dg/loop-split.c: New test.
>>>>
>>>> Index: common.opt
>>>> ===================================================================
>>>> --- common.opt (revision 231115)
>>>> +++ common.opt (working copy)
>>>> @@ -2453,6 +2457,10 @@ funswitch-loops
>>>> Common Report Var(flag_unswitch_loops) Optimization
>>>> Perform loop unswitching.
>>>>
>>>> +fsplit-loops
>>>> +Common Report Var(flag_split_loops) Optimization
>>>> +Perform loop splitting.
>>>> +
>>>> funwind-tables
>>>> Common Report Var(flag_unwind_tables) Optimization
>>>> Just generate unwind tables for exception handling.
>>>> Index: passes.def
>>>> ===================================================================
>>>> --- passes.def (revision 231115)
>>>> +++ passes.def (working copy)
>>>> @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3.
>>>> NEXT_PASS (pass_dce);
>>>> NEXT_PASS (pass_tree_unswitch);
>>>> NEXT_PASS (pass_scev_cprop);
>>>> + NEXT_PASS (pass_loop_split);
>>>> NEXT_PASS (pass_record_bounds);
>>>> NEXT_PASS (pass_loop_distribution);
>>>> NEXT_PASS (pass_copy_prop);
>>>> Index: opts.c
>>>> ===================================================================
>>>> --- opts.c (revision 231115)
>>>> +++ opts.c (working copy)
>>>> @@ -532,6 +532,7 @@ static const struct default_options defa
>>>> regardless of them being declared inline. */
>>>> { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },
>>>> { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 },
>>>> + { OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
>>>> { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
>>>> { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
>>>> { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
>>>> @@ -1411,6 +1412,8 @@ enable_fdo_optimizations (struct gcc_opt
>>>> opts->x_flag_ipa_cp_alignment = value;
>>>> if (!opts_set->x_flag_predictive_commoning)
>>>> opts->x_flag_predictive_commoning = value;
>>>> + if (!opts_set->x_flag_split_loops)
>>>> + opts->x_flag_split_loops = value;
>>>> if (!opts_set->x_flag_unswitch_loops)
>>>> opts->x_flag_unswitch_loops = value;
>>>> if (!opts_set->x_flag_gcse_after_reload)
>>>> Index: timevar.def
>>>> ===================================================================
>>>> --- timevar.def (revision 231115)
>>>> +++ timevar.def (working copy)
>>>> @@ -182,6 +182,7 @@ DEFTIMEVAR (TV_LIM , "
>>>> DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv")
>>>> DEFTIMEVAR (TV_SCEV_CONST , "scev constant prop")
>>>> DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH , "tree loop unswitching")
>>>> +DEFTIMEVAR (TV_LOOP_SPLIT , "loop splitting")
>>>> DEFTIMEVAR (TV_COMPLETE_UNROLL , "complete unrolling")
>>>> DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
>>>> DEFTIMEVAR (TV_TREE_VECTORIZATION , "tree vectorization")
>>>> Index: tree-pass.h
>>>> ===================================================================
>>>> --- tree-pass.h (revision 231115)
>>>> +++ tree-pass.h (working copy)
>>>> @@ -370,6 +370,7 @@ extern gimple_opt_pass *make_pass_tree_n
>>>> extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
>>>> extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
>>>> extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
>>>> +extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
>>>> extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
>>>> extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
>>>> extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
>>>> Index: tree-ssa-loop-manip.h
>>>> ===================================================================
>>>> --- tree-ssa-loop-manip.h (revision 231115)
>>>> +++ tree-ssa-loop-manip.h (working copy)
>>>> @@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
>>>>
>>>> extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
>>>> bool, tree *, tree *);
>>>> +extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
>>>> + struct loop *);
>>>> extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
>>>> extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
>>>> extern void verify_loop_closed_ssa (bool);
>>>> Index: Makefile.in
>>>> ===================================================================
>>>> --- Makefile.in (revision 231115)
>>>> +++ Makefile.in (working copy)
>>>> @@ -1474,6 +1474,7 @@ OBJS = \
>>>> tree-ssa-loop-manip.o \
>>>> tree-ssa-loop-niter.o \
>>>> tree-ssa-loop-prefetch.o \
>>>> + tree-ssa-loop-split.o \
>>>> tree-ssa-loop-unswitch.o \
>>>> tree-ssa-loop.o \
>>>> tree-ssa-math-opts.o \
>>>> Index: tree-ssa-loop-split.c
>>>> ===================================================================
>>>> --- tree-ssa-loop-split.c (revision 0)
>>>> +++ tree-ssa-loop-split.c (working copy)
>>>> @@ -0,0 +1,686 @@
>>>> +/* Loop splitting.
>>>> + Copyright (C) 2015 Free Software Foundation, Inc.
>>>> +
>>>> +This file is part of GCC.
>>>> +
>>>> +GCC is free software; you can redistribute it and/or modify it
>>>> +under the terms of the GNU General Public License as published by the
>>>> +Free Software Foundation; either version 3, or (at your option) any
>>>> +later version.
>>>> +
>>>> +GCC is distributed in the hope that it will be useful, but WITHOUT
>>>> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
>>>> +for more details.
>>>> +
>>>> +You should have received a copy of the GNU General Public License
>>>> +along with GCC; see the file COPYING3. If not see
>>>> +<http://www.gnu.org/licenses/>. */
>>>> +
>>>> +#include "config.h"
>>>> +#include "system.h"
>>>> +#include "coretypes.h"
>>>> +#include "backend.h"
>>>> +#include "tree.h"
>>>> +#include "gimple.h"
>>>> +#include "tree-pass.h"
>>>> +#include "ssa.h"
>>>> +#include "fold-const.h"
>>>> +#include "tree-cfg.h"
>>>> +#include "tree-ssa.h"
>>>> +#include "tree-ssa-loop-niter.h"
>>>> +#include "tree-ssa-loop.h"
>>>> +#include "tree-ssa-loop-manip.h"
>>>> +#include "tree-into-ssa.h"
>>>> +#include "cfgloop.h"
>>>> +#include "tree-scalar-evolution.h"
>>>> +#include "gimple-iterator.h"
>>>> +#include "gimple-pretty-print.h"
>>>> +#include "cfghooks.h"
>>>> +#include "gimple-fold.h"
>>>> +#include "gimplify-me.h"
>>>> +
>>>> +/* This file implements loop splitting, i.e. transformation of loops like
>>>> +
>>>> + for (i = 0; i < 100; i++)
>>>> + {
>>>> + if (i < 50)
>>>> + A;
>>>> + else
>>>> + B;
>>>> + }
>>>> +
>>>> + into:
>>>> +
>>>> + for (i = 0; i < 50; i++)
>>>> + {
>>>> + A;
>>>> + }
>>>> + for (; i < 100; i++)
>>>> + {
>>>> + B;
>>>> + }
>>>> +
>>>> + */
>>>> +
>>>> +/* Return true when BB inside LOOP is a potential iteration space
>>>> + split point, i.e. ends with a condition like "IV < comp", which
>>>> + is true on one side of the iteration space and false on the other,
>>>> + and the split point can be computed. If so, also return the border
>>>> + point in *BORDER and the comparison induction variable in IV. */
>>>> +
>>>> +static tree
>>>> +split_at_bb_p (struct loop *loop, basic_block bb, tree *border, affine_iv *iv)
>>>> +{
>>>> + gimple *last;
>>>> + gcond *stmt;
>>>> + affine_iv iv2;
>>>> +
>>>> + /* BB must end in a simple conditional jump. */
>>>> + last = last_stmt (bb);
>>>> + if (!last || gimple_code (last) != GIMPLE_COND)
>>>> + return NULL_TREE;
>>>> + stmt = as_a <gcond *> (last);
>>>> +
>>>> + enum tree_code code = gimple_cond_code (stmt);
>>>> +
>>>> + /* Only handle relational comparisons, for equality and non-equality
>>>> + we'd have to split the loop into two loops and a middle statement. */
>>>> + switch (code)
>>>> + {
>>>> + case LT_EXPR:
>>>> + case LE_EXPR:
>>>> + case GT_EXPR:
>>>> + case GE_EXPR:
>>>> + break;
>>>> + default:
>>>> + return NULL_TREE;
>>>> + }
>>>> +
>>>> + if (loop_exits_from_bb_p (loop, bb))
>>>> + return NULL_TREE;
>>>> +
>>>> + tree op0 = gimple_cond_lhs (stmt);
>>>> + tree op1 = gimple_cond_rhs (stmt);
>>>> +
>>>> + if (!simple_iv (loop, loop, op0, iv, false))
>>>> + return NULL_TREE;
>>>> + if (!simple_iv (loop, loop, op1, &iv2, false))
>>>> + return NULL_TREE;
>>>> +
>>>> + /* Make it so that the first argument of the condition is
>>>> + the looping one. */
>>>> + if (!integer_zerop (iv2.step))
>>>> + {
>>>> + std::swap (op0, op1);
>>>> + std::swap (*iv, iv2);
>>>> + code = swap_tree_comparison (code);
>>>> + gimple_cond_set_condition (stmt, code, op0, op1);
>>>> + update_stmt (stmt);
>>>> + }
>>>> + else if (integer_zerop (iv->step))
>>>> + return NULL_TREE;
>>>> + if (!integer_zerop (iv2.step))
>>>> + return NULL_TREE;
>>>> +
>>>> + if (dump_file && (dump_flags & TDF_DETAILS))
>>>> + {
>>>> + fprintf (dump_file, "Found potential split point: ");
>>>> + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>>>> + fprintf (dump_file, " { ");
>>>> + print_generic_expr (dump_file, iv->base, TDF_SLIM);
>>>> + fprintf (dump_file, " + I*");
>>>> + print_generic_expr (dump_file, iv->step, TDF_SLIM);
>>>> + fprintf (dump_file, " } %s ", get_tree_code_name (code));
>>>> + print_generic_expr (dump_file, iv2.base, TDF_SLIM);
>>>> + fprintf (dump_file, "\n");
>>>> + }
>>>> +
>>>> + *border = iv2.base;
>>>> + return op0;
>>>> +}
>>>> +
>>>> +/* Given a GUARD conditional stmt inside LOOP, which we want to make always
>>>> + true or false depending on INITIAL_TRUE, and adjusted values NEXTVAL
>>>> + (a post-increment IV) and NEWBOUND (the comparator) adjust the loop
>>>> + exit test statement to loop back only if the GUARD statement will
>>>> + also be true/false in the next iteration. */
>>>> +
>>>> +static void
>>>> +patch_loop_exit (struct loop *loop, gcond *guard, tree nextval, tree newbound,
>>>> + bool initial_true)
>>>> +{
>>>> + edge exit = single_exit (loop);
>>>> + gcond *stmt = as_a <gcond *> (last_stmt (exit->src));
>>>> + gimple_cond_set_condition (stmt, gimple_cond_code (guard),
>>>> + nextval, newbound);
>>>> + update_stmt (stmt);
>>>> +
>>>> + edge stay = single_pred_edge (loop->latch);
>>>> +
>>>> + exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
>>>> + stay->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
>>>> +
>>>> + if (initial_true)
>>>> + {
>>>> + exit->flags |= EDGE_FALSE_VALUE;
>>>> + stay->flags |= EDGE_TRUE_VALUE;
>>>> + }
>>>> + else
>>>> + {
>>>> + exit->flags |= EDGE_TRUE_VALUE;
>>>> + stay->flags |= EDGE_FALSE_VALUE;
>>>> + }
>>>> +}
>>>> +
>>>> +/* Given an induction variable GUARD_IV, and its affine descriptor IV,
>>>> + find the loop phi node in LOOP defining it directly, or create
>>>> + such phi node. Return that phi node. */
>>>> +
>>>> +static gphi *
>>>> +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/)
>>>> +{
>>>> + gimple *def = SSA_NAME_DEF_STMT (guard_iv);
>>>> + gphi *phi;
>>>> + if ((phi = dyn_cast <gphi *> (def))
>>>> + && gimple_bb (phi) == loop->header)
>>>> + return phi;
>>>> +
>>>> + /* XXX Create the PHI instead. */
>>>> + return NULL;
>>>> +}
>>>> +
>>>> +/* This function updates the SSA form after connect_loops made a new
>>>> + edge NEW_E leading from LOOP1 exit to LOOP2 (via an intermediate
>>>> + conditional). I.e. the second loop can now be entered either
>>>> + via the original entry or via NEW_E, so the entry values of LOOP2
>>>> + phi nodes are either the original ones or those at the exit
>>>> + of LOOP1. Insert new phi nodes in LOOP2 pre-header reflecting
>>>> + this. */
>>>> +
>>>> +static void
>>>> +connect_loop_phis (struct loop *loop1, struct loop *loop2, edge new_e)
>>>> +{
>>>> + basic_block rest = loop_preheader_edge (loop2)->src;
>>>> + gcc_assert (new_e->dest == rest);
>>>> + edge skip_first = EDGE_PRED (rest, EDGE_PRED (rest, 0) == new_e);
>>>> +
>>>> + edge firste = loop_preheader_edge (loop1);
>>>> + edge seconde = loop_preheader_edge (loop2);
>>>> + edge firstn = loop_latch_edge (loop1);
>>>> + gphi_iterator psi_first, psi_second;
>>>> + for (psi_first = gsi_start_phis (loop1->header),
>>>> + psi_second = gsi_start_phis (loop2->header);
>>>> + !gsi_end_p (psi_first);
>>>> + gsi_next (&psi_first), gsi_next (&psi_second))
>>>> + {
>>>> + tree init, next, new_init;
>>>> + use_operand_p op;
>>>> + gphi *phi_first = psi_first.phi ();
>>>> + gphi *phi_second = psi_second.phi ();
>>>> +
>>>> + init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste);
>>>> + next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn);
>>>> + op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde);
>>>> + gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
>>>> +
>>>> + /* Prefer using original variable as a base for the new ssa name.
>>>> + This is necessary for virtual ops, and useful in order to avoid
>>>> + losing debug info for real ops. */
>>>> + if (TREE_CODE (next) == SSA_NAME
>>>> + && useless_type_conversion_p (TREE_TYPE (next),
>>>> + TREE_TYPE (init)))
>>>> + new_init = copy_ssa_name (next);
>>>> + else if (TREE_CODE (init) == SSA_NAME
>>>> + && useless_type_conversion_p (TREE_TYPE (init),
>>>> + TREE_TYPE (next)))
>>>> + new_init = copy_ssa_name (init);
>>>> + else if (useless_type_conversion_p (TREE_TYPE (next),
>>>> + TREE_TYPE (init)))
>>>> + new_init = make_temp_ssa_name (TREE_TYPE (next), NULL,
>>>> + "unrinittmp");
>>>> + else
>>>> + new_init = make_temp_ssa_name (TREE_TYPE (init), NULL,
>>>> + "unrinittmp");
>>>> +
>>>> + gphi * newphi = create_phi_node (new_init, rest);
>>>> + add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION);
>>>> + add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION);
>>>> + SET_USE (op, new_init);
>>>> + }
>>>> +}
>>>> +
>>>> +/* The two loops LOOP1 and LOOP2 were just created by loop versioning,
>>>> + they are still equivalent and placed in two arms of a diamond, like so:
>>>> +
>>>> + .------if (cond)------.
>>>> + v v
>>>> + pre1 pre2
>>>> + | |
>>>> + .--->h1 h2<----.
>>>> + | | | |
>>>> + | ex1---. .---ex2 |
>>>> + | / | | \ |
>>>> + '---l1 X | l2---'
>>>> + | |
>>>> + | |
>>>> + '--->join<---'
>>>> +
>>>> + This function transforms the program such that LOOP1 is conditionally
>>>> + falling through to LOOP2, or skipping it. This is done by splitting
>>>> + the ex1->join edge at X in the diagram above, and inserting a condition
>>>> + whose one arm goes to pre2, resulting in this situation:
>>>> +
>>>> + .------if (cond)------.
>>>> + v v
>>>> + pre1 .---------->pre2
>>>> + | | |
>>>> + .--->h1 | h2<----.
>>>> + | | | | |
>>>> + | ex1---. | .---ex2 |
>>>> + | / v | | \ |
>>>> + '---l1 skip---' | l2---'
>>>> + | |
>>>> + | |
>>>> + '--->join<---'
>>>> +
>>>> +
>>>> + The condition used is the exit condition of LOOP1, which effectively means
>>>> + that when the first loop exits (for whatever reason) but the real original
>>>> + exit expression is still false the second loop will be entered.
>>>> + The function returns the new edge cond->pre2.
>>>> +
>>>> + This doesn't update the SSA form, see connect_loop_phis for that. */
>>>> +
>>>> +static edge
>>>> +connect_loops (struct loop *loop1, struct loop *loop2)
>>>> +{
>>>> + edge exit = single_exit (loop1);
>>>> + basic_block skip_bb = split_edge (exit);
>>>> + gcond *skip_stmt;
>>>> + gimple_stmt_iterator gsi;
>>>> + edge new_e, skip_e;
>>>> +
>>>> + gimple *stmt = last_stmt (exit->src);
>>>> + skip_stmt = gimple_build_cond (gimple_cond_code (stmt),
>>>> + gimple_cond_lhs (stmt),
>>>> + gimple_cond_rhs (stmt),
>>>> + NULL_TREE, NULL_TREE);
>>>> + gsi = gsi_last_bb (skip_bb);
>>>> + gsi_insert_after (&gsi, skip_stmt, GSI_NEW_STMT);
>>>> +
>>>> + skip_e = EDGE_SUCC (skip_bb, 0);
>>>> + skip_e->flags &= ~EDGE_FALLTHRU;
>>>> + new_e = make_edge (skip_bb, loop_preheader_edge (loop2)->src, 0);
>>>> + if (exit->flags & EDGE_TRUE_VALUE)
>>>> + {
>>>> + skip_e->flags |= EDGE_TRUE_VALUE;
>>>> + new_e->flags |= EDGE_FALSE_VALUE;
>>>> + }
>>>> + else
>>>> + {
>>>> + skip_e->flags |= EDGE_FALSE_VALUE;
>>>> + new_e->flags |= EDGE_TRUE_VALUE;
>>>> + }
>>>> +
>>>> + new_e->count = skip_bb->count;
>>>> + new_e->probability = PROB_LIKELY;
>>>> + new_e->count = apply_probability (skip_e->count, PROB_LIKELY);
>>>> + skip_e->count -= new_e->count;
>>>> + skip_e->probability = inverse_probability (PROB_LIKELY);
>>>> +
>>>> + return new_e;
>>>> +}
>>>> +
>>>> +/* This returns the new bound for iterations given the original iteration
>>>> + space in NITER, an arbitrary new bound BORDER, assumed to be some
>>>> + comparison value with a different IV, the initial value GUARD_INIT of
>>>> + that other IV, and the comparison code GUARD_CODE that compares
>>>> + that other IV with BORDER. We return an SSA name, and place any
>>>> + necessary statements for that computation into *STMTS.
>>>> +
>>>> + For example for such a loop:
>>>> +
>>>> + for (i = beg, j = guard_init; i < end; i++, j++)
>>>> + if (j < border) // this is supposed to be true/false
>>>> + ...
>>>> +
>>>> + we want to return a new bound (on j) that makes the loop iterate
>>>> + as long as the condition j < border stays true. We also don't want
>>>> + to iterate more often than the original loop, so we have to introduce
>>>> + some cut-off as well (via min/max), effectively resulting in:
>>>> +
>>>> + newend = min (end+guard_init-beg, border)
>>>> + for (i = beg, j = guard_init; j < newend; i++, j++)
>>>> + if (j < border)
>>>> + ...
>>>> +
>>>> + Depending on the direction of the IVs and if the exit tests
>>>> + are strict or non-strict we need to use MIN or MAX,
>>>> + and add or subtract 1. This routine computes newend above. */
>>>> +
>>>> +static tree
>>>> +compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter,
>>>> + tree border,
>>>> + enum tree_code guard_code, tree guard_init)
>>>> +{
>>>> + /* The niter structure contains the after-increment IV, we need
>>>> + the loop-enter base, so subtract STEP once. */
>>>> + tree controlbase = force_gimple_operand (niter->control.base,
>>>> + stmts, true, NULL_TREE);
>>>> + tree controlstep = niter->control.step;
>>>> + tree enddiff;
>>>> + if (POINTER_TYPE_P (TREE_TYPE (controlbase)))
>>>> + {
>>>> + controlstep = gimple_build (stmts, NEGATE_EXPR,
>>>> + TREE_TYPE (controlstep), controlstep);
>>>> + enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
>>>> + TREE_TYPE (controlbase),
>>>> + controlbase, controlstep);
>>>> + }
>>>> + else
>>>> + enddiff = gimple_build (stmts, MINUS_EXPR,
>>>> + TREE_TYPE (controlbase),
>>>> + controlbase, controlstep);
>>>> +
>>>> + /* Compute beg-guard_init. */
>>>> + if (POINTER_TYPE_P (TREE_TYPE (enddiff)))
>>>> + {
>>>> + tree tem = gimple_convert (stmts, sizetype, guard_init);
>>>> + tem = gimple_build (stmts, NEGATE_EXPR, sizetype, tem);
>>>> + enddiff = gimple_build (stmts, POINTER_PLUS_EXPR,
>>>> + TREE_TYPE (enddiff),
>>>> + enddiff, tem);
>>>> + }
>>>> + else
>>>> + enddiff = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
>>>> + enddiff, guard_init);
>>>> +
>>>> + /* Compute end-(beg-guard_init). */
>>>> + gimple_seq stmts2;
>>>> + tree newbound = force_gimple_operand (niter->bound, &stmts2,
>>>> + true, NULL_TREE);
>>>> + gimple_seq_add_seq_without_update (stmts, stmts2);
>>>> +
>>>> + if (POINTER_TYPE_P (TREE_TYPE (enddiff))
>>>> + || POINTER_TYPE_P (TREE_TYPE (newbound)))
>>>> + {
>>>> + enddiff = gimple_convert (stmts, sizetype, enddiff);
>>>> + enddiff = gimple_build (stmts, NEGATE_EXPR, sizetype, enddiff);
>>>> + newbound = gimple_build (stmts, POINTER_PLUS_EXPR,
>>>> + TREE_TYPE (newbound),
>>>> + newbound, enddiff);
>>>> + }
>>>> + else
>>>> + newbound = gimple_build (stmts, MINUS_EXPR, TREE_TYPE (enddiff),
>>>> + newbound, enddiff);
>>>> +
>>>> + /* Depending on the direction of the IVs the new bound for the first
>>>> + loop is the minimum or maximum of old bound and border.
>>>> + Also, if the guard condition isn't strictly less or greater,
>>>> + we need to adjust the bound. */
>>>> + int addbound = 0;
>>>> + enum tree_code minmax;
>>>> + if (niter->cmp == LT_EXPR)
>>>> + {
>>>> + /* GT and LE are the same, inverted. */
>>>> + if (guard_code == GT_EXPR || guard_code == LE_EXPR)
>>>> + addbound = -1;
>>>> + minmax = MIN_EXPR;
>>>> + }
>>>> + else
>>>> + {
>>>> + gcc_assert (niter->cmp == GT_EXPR);
>>>> + if (guard_code == GE_EXPR || guard_code == LT_EXPR)
>>>> + addbound = 1;
>>>> + minmax = MAX_EXPR;
>>>> + }
>>>> +
>>>> + if (addbound)
>>>> + {
>>>> + tree type2 = TREE_TYPE (newbound);
>>>> + if (POINTER_TYPE_P (type2))
>>>> + type2 = sizetype;
>>>> + newbound = gimple_build (stmts,
>>>> + POINTER_TYPE_P (TREE_TYPE (newbound))
>>>> + ? POINTER_PLUS_EXPR : PLUS_EXPR,
>>>> + TREE_TYPE (newbound),
>>>> + newbound,
>>>> + build_int_cst (type2, addbound));
>>>> + }
>>>> +
>>>> + tree newend = gimple_build (stmts, minmax, TREE_TYPE (border),
>>>> + border, newbound);
>>>> + return newend;
>>>> +}
>>>> +
>>>> +/* Checks if LOOP contains a conditional block whose condition
>>>> + depends on which side in the iteration space it is, and if so
>>>> + splits the iteration space into two loops. Returns true if the
>>>> + loop was split. NITER must contain the iteration descriptor for the
>>>> + single exit of LOOP. */
>>>> +
>>>> +static bool
>>>> +split_loop (struct loop *loop1, struct tree_niter_desc *niter)
>>>> +{
>>>> + basic_block *bbs;
>>>> + unsigned i;
>>>> + bool changed = false;
>>>> + tree guard_iv;
>>>> + tree border;
>>>> + affine_iv iv;
>>>> +
>>>> + bbs = get_loop_body (loop1);
>>>> +
>>>> + /* Find a splitting opportunity. */
>>>> + for (i = 0; i < loop1->num_nodes; i++)
>>>> + if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
>>>> + {
>>>> + /* Handling opposite steps is not implemented yet. Neither
>>>> + is handling different step sizes. */
>>>> + if ((tree_int_cst_sign_bit (iv.step)
>>>> + != tree_int_cst_sign_bit (niter->control.step))
>>>> + || !tree_int_cst_equal (iv.step, niter->control.step))
>>>> + continue;
>>>> +
>>>> + /* Find a loop PHI node that defines guard_iv directly,
>>>> + or create one doing that. */
>>>> + gphi *phi = find_or_create_guard_phi (loop1, guard_iv, &iv);
>>>> + if (!phi)
>>>> + continue;
>>>> + gcond *guard_stmt = as_a<gcond *> (last_stmt (bbs[i]));
>>>> + tree guard_init = PHI_ARG_DEF_FROM_EDGE (phi,
>>>> + loop_preheader_edge (loop1));
>>>> + enum tree_code guard_code = gimple_cond_code (guard_stmt);
>>>> +
>>>> + /* Loop splitting is implemented by versioning the loop, placing
>>>> + the new loop after the old loop, make the first loop iterate
>>>> + as long as the conditional stays true (or false) and let the
>>>> + second (new) loop handle the rest of the iterations.
>>>> +
>>>> + First we need to determine if the condition will start being true
>>>> + or false in the first loop. */
>>>> + bool initial_true;
>>>> + switch (guard_code)
>>>> + {
>>>> + case LT_EXPR:
>>>> + case LE_EXPR:
>>>> + initial_true = !tree_int_cst_sign_bit (iv.step);
>>>> + break;
>>>> + case GT_EXPR:
>>>> + case GE_EXPR:
>>>> + initial_true = tree_int_cst_sign_bit (iv.step);
>>>> + break;
>>>> + default:
>>>> + gcc_unreachable ();
>>>> + }
>>>> +
>>>> + /* Build a condition that will skip the first loop when the
>>>> + guard condition won't ever be true (or false). */
>>>> + gimple_seq stmts2;
>>>> + border = force_gimple_operand (border, &stmts2, true, NULL_TREE);
>>>> + if (stmts2)
>>>> + gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
>>>> + stmts2);
>>>> + tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
>>>> + if (!initial_true)
>>>> + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>>> +
>>>> + /* Now version the loop, placing loop2 after loop1 connecting
>>>> + them, and fix up SSA form for that. */
>>>> + initialize_original_copy_tables ();
>>>> + basic_block cond_bb;
>>>> + struct loop *loop2 = loop_version (loop1, cond, &cond_bb,
>>>> + REG_BR_PROB_BASE, REG_BR_PROB_BASE,
>>>> + REG_BR_PROB_BASE, true);
>>>> + gcc_assert (loop2);
>>>> + update_ssa (TODO_update_ssa);
>>>> +
>>>> + edge new_e = connect_loops (loop1, loop2);
>>>> + connect_loop_phis (loop1, loop2, new_e);
>>>> +
>>>> + /* The iterations of the second loop are now already
>>>> + exactly those that the first loop didn't do, but the
>>>> + iteration space of the first loop is still the original one.
>>>> + Compute the new bound for the guarding IV and patch the
>>>> + loop exit to use it instead of original IV and bound. */
>>>> + gimple_seq stmts = NULL;
>>>> + tree newend = compute_new_first_bound (&stmts, niter, border,
>>>> + guard_code, guard_init);
>>>> + if (stmts)
>>>> + gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
>>>> + stmts);
>>>> + tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
>>>> + patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
>>>> +
>>>> + /* Finally patch out the two copies of the condition to be always
>>>> + true/false (or opposite). */
>>>> + gcond *force_true = as_a<gcond *> (last_stmt (bbs[i]));
>>>> + gcond *force_false = as_a<gcond *> (last_stmt (get_bb_copy (bbs[i])));
>>>> + if (!initial_true)
>>>> + std::swap (force_true, force_false);
>>>> + gimple_cond_make_true (force_true);
>>>> + gimple_cond_make_false (force_false);
>>>> + update_stmt (force_true);
>>>> + update_stmt (force_false);
>>>> +
>>>> + free_original_copy_tables ();
>>>> +
>>>> + /* We destroyed LCSSA form above. Eventually we might be able
>>>> + to fix it on the fly, for now simply punt and use the helper. */
>>>> + rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
>>>> +
>>>> + changed = true;
>>>> + if (dump_file && (dump_flags & TDF_DETAILS))
>>>> + fprintf (dump_file, ";; Loop split.\n");
>>>> +
>>>> + /* Only deal with the first opportunity. */
>>>> + break;
>>>> + }
>>>> +
>>>> + free (bbs);
>>>> + return changed;
>>>> +}
>>>> +
>>>> +/* Main entry point. Perform loop splitting on all suitable loops. */
>>>> +
>>>> +static unsigned int
>>>> +tree_ssa_split_loops (void)
>>>> +{
>>>> + struct loop *loop;
>>>> + bool changed = false;
>>>> +
>>>> + gcc_assert (scev_initialized_p ());
>>>> + FOR_EACH_LOOP (loop, 0)
>>>> + loop->aux = NULL;
>>>> +
>>>> + /* Go through all loops starting from innermost. */
>>>> + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
>>>> + {
>>>> + struct tree_niter_desc niter;
>>>> + if (loop->aux)
>>>> + {
>>>> + /* If any of our inner loops was split, don't split us,
>>>> + and mark our containing loop as having had splits as well. */
>>>> + loop_outer (loop)->aux = loop;
>>>> + continue;
>>>> + }
>>>> +
>>>> + if (single_exit (loop)
>>>> + /* ??? We could handle non-empty latches when we split
>>>> + the latch edge (not the exit edge), and put the new
>>>> + exit condition in the new block. OTOH this executes some
>>>> + code unconditionally that might have been skipped by the
>>>> + original exit before. */
>>>> + && empty_block_p (loop->latch)
>>>> + && !optimize_loop_for_size_p (loop)
>>>> + && number_of_iterations_exit (loop, single_exit (loop), &niter,
>>>> + false, true)
>>>> + && niter.cmp != ERROR_MARK
>>>> + /* We can't yet handle loops controlled by a != predicate. */
>>>> + && niter.cmp != NE_EXPR)
>>>> + {
>>>> + if (split_loop (loop, &niter))
>>>> + {
>>>> + /* Mark our containing loop as having had some split inner
>>>> + loops. */
>>>> + loop_outer (loop)->aux = loop;
>>>> + changed = true;
>>>> + }
>>>> + }
>>>> + }
>>>> +
>>>> + FOR_EACH_LOOP (loop, 0)
>>>> + loop->aux = NULL;
>>>> +
>>>> + if (changed)
>>>> + return TODO_cleanup_cfg;
>>>> + return 0;
>>>> +}
>>>> +
>>>> +/* Loop splitting pass. */
>>>> +
>>>> +namespace {
>>>> +
>>>> +const pass_data pass_data_loop_split =
>>>> +{
>>>> + GIMPLE_PASS, /* type */
>>>> + "lsplit", /* name */
>>>> + OPTGROUP_LOOP, /* optinfo_flags */
>>>> + TV_LOOP_SPLIT, /* tv_id */
>>>> + PROP_cfg, /* properties_required */
>>>> + 0, /* properties_provided */
>>>> + 0, /* properties_destroyed */
>>>> + 0, /* todo_flags_start */
>>>> + 0, /* todo_flags_finish */
>>>> +};
>>>> +
>>>> +class pass_loop_split : public gimple_opt_pass
>>>> +{
>>>> +public:
>>>> + pass_loop_split (gcc::context *ctxt)
>>>> + : gimple_opt_pass (pass_data_loop_split, ctxt)
>>>> + {}
>>>> +
>>>> + /* opt_pass methods: */
>>>> + virtual bool gate (function *) { return flag_split_loops != 0; }
>>>> + virtual unsigned int execute (function *);
>>>> +
>>>> +}; // class pass_loop_split
>>>> +
>>>> +unsigned int
>>>> +pass_loop_split::execute (function *fun)
>>>> +{
>>>> + if (number_of_loops (fun) <= 1)
>>>> + return 0;
>>>> +
>>>> + return tree_ssa_split_loops ();
>>>> +}
>>>> +
>>>> +} // anon namespace
>>>> +
>>>> +gimple_opt_pass *
>>>> +make_pass_loop_split (gcc::context *ctxt)
>>>> +{
>>>> + return new pass_loop_split (ctxt);
>>>> +}
>>>> Index: doc/invoke.texi
>>>> ===================================================================
>>>> --- doc/invoke.texi (revision 231115)
>>>> +++ doc/invoke.texi (working copy)
>>>> @@ -446,7 +446,7 @@ Objective-C and Objective-C++ Dialects}.
>>>> -fselective-scheduling -fselective-scheduling2 @gol
>>>> -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol
>>>> -fsemantic-interposition -fshrink-wrap -fsignaling-nans @gol
>>>> --fsingle-precision-constant -fsplit-ivs-in-unroller @gol
>>>> +-fsingle-precision-constant -fsplit-ivs-in-unroller -fsplit-loops @gol
>>>> -fsplit-paths @gol
>>>> -fsplit-wide-types -fssa-backprop -fssa-phiopt @gol
>>>> -fstack-protector -fstack-protector-all -fstack-protector-strong @gol
>>>> @@ -10197,6 +10197,11 @@ Enabled with @option{-fprofile-use}.
>>>> Enables the loop invariant motion pass in the RTL loop optimizer. Enabled
>>>> at level @option{-O1}
>>>>
>>>> +@item -fsplit-loops
>>>> +@opindex fsplit-loops
>>>> +Split a loop into two if it contains a condition that's always true
>>>> +for one side of the iteration space and false for the other.
>>>> +
>>>> @item -funswitch-loops
>>>> @opindex funswitch-loops
>>>> Move branches with loop invariant conditions out of the loop, with duplicates
>>>> Index: doc/passes.texi
>>>> ===================================================================
>>>> --- doc/passes.texi (revision 231115)
>>>> +++ doc/passes.texi (working copy)
>>>> @@ -484,6 +484,12 @@ out of the loops. To achieve this, a du
>>>> each possible outcome of conditional jump(s). The pass is implemented in
>>>> @file{tree-ssa-loop-unswitch.c}.
>>>>
>>>> +Loop splitting. If a loop contains a conditional statement that is
>>>> +always true for one part of the iteration space and false for the other,
>>>> +this pass splits the loop into two, one dealing with one side, the other
>>>> +only with the other, thereby removing one inner-loop conditional. The
>>>> +pass is implemented in @file{tree-ssa-loop-split.c}.
>>>> +
>>>> The optimizations also use various utility functions contained in
>>>> @file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and
>>>> @file{cfgloopmanip.c}.
>>>> Index: testsuite/gcc.dg/loop-split.c
>>>> ===================================================================
>>>> --- testsuite/gcc.dg/loop-split.c (revision 0)
>>>> +++ testsuite/gcc.dg/loop-split.c (working copy)
>>>> @@ -0,0 +1,147 @@
>>>> +/* { dg-do run } */
>>>> +/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
>>>> +
>>>> +#ifdef __cplusplus
>>>> +extern "C" int printf (const char *, ...);
>>>> +extern "C" void abort (void);
>>>> +#else
>>>> +extern int printf (const char *, ...);
>>>> +extern void abort (void);
>>>> +#endif
>>>> +
>>>> +/* Define TRACE to 1 or 2 to get detailed tracing.
>>>> + Define SINGLE_TEST to 1 or 2 to get a simple routine with
>>>> + just one loop, called only one time or with multiple parameters,
>>>> + to make debugging easier. */
>>>> +#ifndef TRACE
>>>> +#define TRACE 0
>>>> +#endif
>>>> +
>>>> +#define loop(beg,step,beg2,cond1,cond2) \
>>>> + do \
>>>> + { \
>>>> + sum = 0; \
>>>> + for (i = (beg), j = (beg2); (cond1); i+=(step),j+=(step)) \
>>>> + { \
>>>> + if (cond2) { \
>>>> + if (TRACE > 1) printf ("a: %d %d\n", i, j); \
>>>> + sum += a[i]; \
>>>> + } else { \
>>>> + if (TRACE > 1) printf ("b: %d %d\n", i, j); \
>>>> + sum += b[i]; \
>>>> + } \
>>>> + } \
>>>> + if (TRACE > 0) printf ("sum: %d\n", sum); \
>>>> + check = check * 47 + sum; \
>>>> + } while (0)
>>>> +
>>>> +#ifndef SINGLE_TEST
>>>> +unsigned __attribute__((noinline, noclone)) dotest (int beg, int end, int step,
>>>> + int c, int *a, int *b, int beg2)
>>>> +{
>>>> + unsigned check = 0;
>>>> + int sum;
>>>> + int i, j;
>>>> + loop (beg, 1, beg2, i < end, j < c);
>>>> + loop (beg, 1, beg2, i <= end, j < c);
>>>> + loop (beg, 1, beg2, i < end, j <= c);
>>>> + loop (beg, 1, beg2, i <= end, j <= c);
>>>> + loop (beg, 1, beg2, i < end, j > c);
>>>> + loop (beg, 1, beg2, i <= end, j > c);
>>>> + loop (beg, 1, beg2, i < end, j >= c);
>>>> + loop (beg, 1, beg2, i <= end, j >= c);
>>>> + beg2 += end-beg;
>>>> + loop (end, -1, beg2, i >= beg, j >= c);
>>>> + loop (end, -1, beg2, i >= beg, j > c);
>>>> + loop (end, -1, beg2, i > beg, j >= c);
>>>> + loop (end, -1, beg2, i > beg, j > c);
>>>> + loop (end, -1, beg2, i >= beg, j <= c);
>>>> + loop (end, -1, beg2, i >= beg, j < c);
>>>> + loop (end, -1, beg2, i > beg, j <= c);
>>>> + loop (end, -1, beg2, i > beg, j < c);
>>>> + return check;
>>>> +}
>>>> +
>>>> +#else
>>>> +
>>>> +int __attribute__((noinline, noclone)) f (int beg, int end, int step,
>>>> + int c, int *a, int *b, int beg2)
>>>> +{
>>>> + int sum = 0;
>>>> + int i, j;
>>>> + //for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
>>>> + for (i = end, j = beg2 + (end-beg); i > beg; i += -1, j-- /*step*/)
>>>> + {
>>>> + // i - j == X --> i = X + j
>>>> + // --> i < end == X+j < end == j < end - X
>>>> + // --> newend = end - (i_init - j_init)
>>>> + // j < end-X && j < c --> j < min(end-X,c)
>>>> + // j < end-X && j <= c --> j <= min(end-X-1,c) or j < min(end-X,c+1{OF!})
>>>> + //if (j < c)
>>>> + if (j >= c)
>>>> + printf ("a: %d %d\n", i, j);
>>>> + /*else
>>>> + printf ("b: %d %d\n", i, j);*/
>>>> + /*sum += a[i];
>>>> + else
>>>> + sum += b[i];*/
>>>> + }
>>>> + return sum;
>>>> +}
>>>> +
>>>> +int __attribute__((noinline, noclone)) f2 (int *beg, int *end, int step,
>>>> + int *c, int *a, int *b, int *beg2)
>>>> +{
>>>> + int sum = 0;
>>>> + int *i, *j;
>>>> + for (i = beg, j = beg2; i < end; i += 1, j++ /*step*/)
>>>> + {
>>>> + if (j <= c)
>>>> + printf ("%d %d\n", i - beg, j - beg);
>>>> + /*sum += a[i];
>>>> + else
>>>> + sum += b[i];*/
>>>> + }
>>>> + return sum;
>>>> +}
>>>> +#endif
>>>> +
>>>> +extern int printf (const char *, ...);
>>>> +
>>>> +int main ()
>>>> +{
>>>> + int a[] = {0,0,0,0,0, 1,2,3,4,5,6,7,8,9, 0,0,0,0,0};
>>>> + int b[] = {0,0,0,0,0, -1,-2,-3,-4,-5,-6,-7,-8,-9, 0,0,0,0,0,};
>>>> + int c;
>>>> + int diff = 0;
>>>> + unsigned check = 0;
>>>> +#if defined(SINGLE_TEST) && (SINGLE_TEST == 1)
>>>> + //dotest (0, 9, 1, -1, a+5, b+5, -1);
>>>> + //return 0;
>>>> + f (0, 9, 1, 5, a+5, b+5, -1);
>>>> + return 0;
>>>> +#endif
>>>> + for (diff = -5; diff <= 5; diff++)
>>>> + {
>>>> + for (c = -1; c <= 10; c++)
>>>> + {
>>>> +#ifdef SINGLE_TEST
>>>> + int s = f (0, 9, 1, c, a+5, b+5, diff);
>>>> + //int s = f2 (a+0, a+9, 1, a+c, a+5, b+5, a+diff);
>>>> + printf ("%d ", s);
>>>> +#else
>>>> + if (TRACE > 0)
>>>> + printf ("check %d %d\n", c, diff);
>>>> + check = check * 51 + dotest (0, 9, 1, c, a+5, b+5, diff);
>>>> +#endif
>>>> + }
>>>> + //printf ("\n");
>>>> + }
>>>> + //printf ("%u\n", check);
>>>> + if (check != 3213344948)
>>>> + abort ();
>>>> + return 0;
>>>> +}
>>>> +
>>>> +/* All 16 loops in dotest should be split. */
>>>> +/* { dg-final { scan-tree-dump-times "Loop split" 16 "lsplit" } } */
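For readers following the testcase: the sketch below hand-writes what splitting does to the first loop in dotest (cond1 `i < end`, cond2 `j < c`, step 1). It is an illustration only, not the GIMPLE the pass emits. The names `sum_original`, `sum_split` and `check_sweep`, and the way the split point is clamped, are assumptions made for this example, and the arithmetic assumes no signed overflow.

```c
#include <assert.h>

/* Original shape: the guard `j < c` is true for a prefix of the
   iteration space and false for the rest.  */
static int
sum_original (int beg, int end, int c, const int *a, const int *b, int beg2)
{
  int sum = 0;
  for (int i = beg, j = beg2; i < end; i++, j++)
    {
      if (j < c)
        sum += a[i];
      else
        sum += b[i];
    }
  return sum;
}

/* Hand-split shape (hypothetical helper, not what the pass emits).
   Since j - i is constant, `j < c` is the same as `i < c - (beg2 - beg)`.
   Call that bound `split` and clamp it into [beg, end], so a bound
   outside the iteration space simply leaves one of the loops empty.  */
static int
sum_split (int beg, int end, int c, const int *a, const int *b, int beg2)
{
  int sum = 0;
  int split = c - (beg2 - beg);   /* assumes no signed overflow */
  if (split < beg)
    split = beg;
  if (split > end)
    split = end;
  int i = beg;
  for (; i < split; i++)          /* guard known true here */
    sum += a[i];
  for (; i < end; i++)            /* guard known false here */
    sum += b[i];
  return sum;
}

/* Sweep the same `c` and `diff` ranges as main () in the testcase,
   assert that both shapes agree, and return how many combinations
   were checked.  */
static int
check_sweep (void)
{
  int a[] = {0,0,0,0,0, 1,2,3,4,5,6,7,8,9, 0,0,0,0,0};
  int b[] = {0,0,0,0,0, -1,-2,-3,-4,-5,-6,-7,-8,-9, 0,0,0,0,0};
  int checked = 0;
  for (int diff = -5; diff <= 5; diff++)
    for (int c = -1; c <= 10; c++)
      {
        assert (sum_original (0, 9, c, a + 5, b + 5, diff)
                == sum_split (0, 9, c, a + 5, b + 5, diff));
        checked++;
      }
  return checked;
}
```

Calling check_sweep () from a main () covers split points both inside and outside the iteration space; the asserts fire if the two shapes ever disagree.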
* Re: Gimple loop splitting v2
2015-12-05 7:55 ` Jeff Law
@ 2016-10-20 14:43 ` Michael Matz
2016-10-20 14:56 ` Bin.Cheng
2016-10-20 19:17 ` Jeff Law
0 siblings, 2 replies; 20+ messages in thread
From: Michael Matz @ 2016-10-20 14:43 UTC (permalink / raw)
To: Jeff Law; +Cc: gcc-patches
Hi,
On Sat, 5 Dec 2015, Jeff Law wrote:
> Nit. I don't think you want a comma after "so". And it looks like your
> comment got truncated as well.
>
> With the comment above fixed, this is fine for the trunk.
I'm terribly sorry to have dropped the ball here, but I've committed this
now after not even a year ;-/ (r241374) Obviously after rebootstrapping
with all,ada languages. I also did some benchmark run which should be
taken with a grain of salt as the machine had fairly variant results but
the improvements are real, though perhaps not always in that range (it's a
normal three repeats run). I'm really curious if our automatic tester can
pick up similar improvements, because if so, it's extreme (5 to 15 percent
in some benchmarks) and we can brag about it for GCC 7 ;-)
                ---------- Base ---------   ---------- Peak ---------
Benchmarks      Ref Time Run Time Ratio     Ref Time Run Time Ratio
400.perlbench       9770      519  18.8 *       9770      508  19.2 *
401.bzip2           9650      668  14.5 *       9650      666  14.5 *
403.gcc             8050      455  17.7 *       8050      432  18.6 *
429.mcf             9120      477  19.1 *       9120      467  19.5 *
445.gobmk          10490      643  16.3 *      10490      644  16.3 *
456.hmmer           9330      641  14.6 *       9330      614  15.2 *
458.sjeng          12100      784  15.4 *      12100      762  15.9 *
462.libquantum     20720      605  34.2 *      20720      600  34.5 *
464.h264ref        22130      969  22.8 *      22130      969  22.8 *
471.omnetpp         6250      438  14.3 *       6250      358  17.5 *
473.astar           7020      494  14.2 *       7020      492  14.3 *
483.xalancbmk       6900      342  20.2 *       6900      336  20.6 *
Est. SPECint(R)_base2006           17.9
Est. SPECint2006                                               18.5

410.bwaves         13590      563  24.1 *      13590      506  26.9 *
416.gamess                           NR                          NR
433.milc            9180      375  24.5 *       9180      349  26.3 *
434.zeusmp          9100      433  21.0 *       9100      423  21.5 *
435.gromacs         7140      402  17.7 *       7140      411  17.4 *
436.cactusADM      11950      486  24.6 *      11950      486  24.6 *
437.leslie3d        9400      421  22.4 *       9400      419  22.4 *
444.namd            8020      520  15.4 *       8020      520  15.4 *
447.dealII                           NR                          NR
450.soplex          8340      393  21.2 *       8340      391  21.3 *
453.povray          5320      277  19.2 *       5320      278  19.1 *
454.calculix        8250      453  18.2 *       8250      460  17.9 *
459.GemsFDTD       10610      542  19.6 *      10610      537  19.8 *
465.tonto           9840      492  20.0 *       9840      491  20.0 *
470.lbm            13740      466  29.5 *      13740      430  32.0 *
481.wrf            11170      492  22.7 *      11170      457  24.4 *
482.sphinx3        19490      659  29.6 *      19490      655  29.8 *
Est. SPECfp(R)_base2006            21.6
Est. SPECfp2006                                                22.1
Ciao,
Michael.
* Re: Gimple loop splitting v2
2016-10-20 14:43 ` Michael Matz
@ 2016-10-20 14:56 ` Bin.Cheng
2016-10-24 8:44 ` Bin.Cheng
2016-10-20 19:17 ` Jeff Law
1 sibling, 1 reply; 20+ messages in thread
From: Bin.Cheng @ 2016-10-20 14:56 UTC (permalink / raw)
To: Michael Matz; +Cc: Jeff Law, gcc-patches List
On Thu, Oct 20, 2016 at 3:43 PM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Sat, 5 Dec 2015, Jeff Law wrote:
>
>> Nit. I don't think you want a comma after "so". And it looks like your
>> comment got truncated as well.
>>
>> With the comment above fixed, this is fine for the trunk.
>
> I'm terribly sorry to have dropped the ball here, but I've committed this
> now after not even a year ;-/ (r241374) Obviously after rebootstrapping
> with all,ada languages. I also did some benchmark run which should be
> taken with a grain of salt as the machine had fairly variant results but
> the improvements are real, though perhaps not always in that range (it's a
> normal three repeats run). I'm really curious if our automatic tester can
> pick up similar improvements, because if so, it's extreme (5 to 15 percent
> in some benchmarks) and we can brag about it for GCC 7 ;-)
This is nice, thanks for doing it. I will check the improvement on AArch64.
Thanks,
bin
>
> 400.perlbench 9770 519 18.8 * 9770 508 19.2 *
> 401.bzip2 9650 668 14.5 * 9650 666 14.5 *
> 403.gcc 8050 455 17.7 * 8050 432 18.6 *
> 429.mcf 9120 477 19.1 * 9120 467 19.5 *
> 445.gobmk 10490 643 16.3 * 10490 644 16.3 *
> 456.hmmer 9330 641 14.6 * 9330 614 15.2 *
> 458.sjeng 12100 784 15.4 * 12100 762 15.9 *
> 462.libquantum 20720 605 34.2 * 20720 600 34.5 *
> 464.h264ref 22130 969 22.8 * 22130 969 22.8 *
> 471.omnetpp 6250 438 14.3 * 6250 358 17.5 *
> 473.astar 7020 494 14.2 * 7020 492 14.3 *
> 483.xalancbmk 6900 342 20.2 * 6900 336 20.6 *
> Est. SPECint(R)_base2006 17.9
> Est. SPECint2006 18.5
>
> 410.bwaves 13590 563 24.1 * 13590 506 26.9 *
> 416.gamess NR NR
> 433.milc 9180 375 24.5 * 9180 349 26.3 *
> 434.zeusmp 9100 433 21.0 * 9100 423 21.5 *
> 435.gromacs 7140 402 17.7 * 7140 411 17.4 *
> 436.cactusADM 11950 486 24.6 * 11950 486 24.6 *
> 437.leslie3d 9400 421 22.4 * 9400 419 22.4 *
> 444.namd 8020 520 15.4 * 8020 520 15.4 *
> 447.dealII NR NR
> 450.soplex 8340 393 21.2 * 8340 391 21.3 *
> 453.povray 5320 277 19.2 * 5320 278 19.1 *
> 454.calculix 8250 453 18.2 * 8250 460 17.9 *
> 459.GemsFDTD 10610 542 19.6 * 10610 537 19.8 *
> 465.tonto 9840 492 20.0 * 9840 491 20.0 *
> 470.lbm 13740 466 29.5 * 13740 430 32.0 *
> 481.wrf 11170 492 22.7 * 11170 457 24.4 *
> 482.sphinx3 19490 659 29.6 * 19490 655 29.8 *
> Est. SPECfp(R)_base2006 21.6
> Est. SPECfp2006 22.1
>
>
> Ciao,
> Michael.
* Re: Gimple loop splitting v2
2016-10-20 14:43 ` Michael Matz
2016-10-20 14:56 ` Bin.Cheng
@ 2016-10-20 19:17 ` Jeff Law
1 sibling, 0 replies; 20+ messages in thread
From: Jeff Law @ 2016-10-20 19:17 UTC (permalink / raw)
To: Michael Matz; +Cc: gcc-patches
On 10/20/2016 08:43 AM, Michael Matz wrote:
> Hi,
>
> On Sat, 5 Dec 2015, Jeff Law wrote:
>
>> Nit. I don't think you want a comma after "so". And it looks like your
>> comment got truncated as well.
>>
>> With the comment above fixed, this is fine for the trunk.
>
> I'm terribly sorry to have dropped the ball here, but I've committed this
> now after not even a year ;-/ (r241374)
It'd totally fallen off my radar. I had to go find it in my archives :-).
> Obviously after rebootstrapping
> with all,ada languages. I also did some benchmark run which should be
> taken with a grain of salt as the machine had fairly variant results but
> the improvements are real, though perhaps not always in that range (it's a
> normal three repeats run). I'm really curious if our automatic tester can
> pick up similar improvements, because if so, it's extreme (5 to 15 percent
> in some benchmarks) and we can brag about it for GCC 7 ;-)
Yea. I don't expect it applies that often and ISTM that it's probably
most beneficial by enabling other stuff later in the loop optimizer
pipeline to see more loops without embedded flow control.
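A hedged illustration of the point about embedded flow control: in the hand-written sketch below (not compiler output; `fill_guarded`, `fill_split` and `fills_agree` are hypothetical names), each loop left after splitting has a straight-line body. Whether a later pass such as the vectorizer actually picks up such loops depends on the target and the options in use.

```c
#include <assert.h>
#include <string.h>

enum { N = 16 };

/* Before: every iteration carries a branch on the IV.  */
static void
fill_guarded (int *dst, const int *x, const int *y, int n, int p)
{
  for (int i = 0; i < n; i++)
    {
      if (i < p)
        dst[i] = x[i] + 1;
      else
        dst[i] = y[i] * 2;
    }
}

/* After (hand-split): two loops, each with a branch-free body.
   The split point is clamped into [0, n] so that a `p` outside
   the iteration space leaves one loop empty.  */
static void
fill_split (int *dst, const int *x, const int *y, int n, int p)
{
  int mid = p < 0 ? 0 : (p > n ? n : p);
  int i = 0;
  for (; i < mid; i++)
    dst[i] = x[i] + 1;
  for (; i < n; i++)
    dst[i] = y[i] * 2;
}

/* Compare both versions for split points inside and outside [0, N]
   and return how many split points were checked.  */
static int
fills_agree (void)
{
  int x[N], y[N], d1[N], d2[N];
  int checked = 0;
  for (int i = 0; i < N; i++)
    {
      x[i] = i;
      y[i] = 100 - i;
    }
  for (int p = -3; p <= N + 3; p++)
    {
      fill_guarded (d1, x, y, N, p);
      fill_split (d2, x, y, N, p);
      assert (memcmp (d1, d2, sizeof d1) == 0);
      checked++;
    }
  return checked;
}
```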
Anyway, glad to see it go in.
jeff
* Re: Gimple loop splitting v2
2016-10-20 14:56 ` Bin.Cheng
@ 2016-10-24 8:44 ` Bin.Cheng
2016-10-24 9:02 ` Michael Matz
0 siblings, 1 reply; 20+ messages in thread
From: Bin.Cheng @ 2016-10-24 8:44 UTC (permalink / raw)
To: Michael Matz; +Cc: Jeff Law, gcc-patches List
On Thu, Oct 20, 2016 at 3:55 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Thu, Oct 20, 2016 at 3:43 PM, Michael Matz <matz@suse.de> wrote:
>> Hi,
>>
>> On Sat, 5 Dec 2015, Jeff Law wrote:
>>
>>> Nit. I don't think you want a comma after "so". And it looks like your
>>> comment got truncated as well.
>>>
>>> With the comment above fixed, this is fine for the trunk.
>>
>> I'm terribly sorry to have dropped the ball here, but I've committed this
>> now after not even a year ;-/ (r241374) Obviously after rebootstrapping
>> with all,ada languages. I also did some benchmark run which should be
>> taken with a grain of salt as the machine had fairly variant results but
>> the improvements are real, though perhaps not always in that range (it's a
>> normal three repeats run). I'm really curious if our automatic tester can
>> pick up similar improvements, because if so, it's extreme (5 to 15 percent
>> in some benchmarks) and we can brag about it for GCC 7 ;-)
> This is nice, thanks for doing it. I will check the improvement on AArch64.
Hi,
Unfortunately I didn't reproduce the improvement in my run on AArch64,
I will double check if I made some mistakes.
Thanks,
bin
>>
>> 400.perlbench 9770 519 18.8 * 9770 508 19.2 *
>> 401.bzip2 9650 668 14.5 * 9650 666 14.5 *
>> 403.gcc 8050 455 17.7 * 8050 432 18.6 *
>> 429.mcf 9120 477 19.1 * 9120 467 19.5 *
>> 445.gobmk 10490 643 16.3 * 10490 644 16.3 *
>> 456.hmmer 9330 641 14.6 * 9330 614 15.2 *
>> 458.sjeng 12100 784 15.4 * 12100 762 15.9 *
>> 462.libquantum 20720 605 34.2 * 20720 600 34.5 *
>> 464.h264ref 22130 969 22.8 * 22130 969 22.8 *
>> 471.omnetpp 6250 438 14.3 * 6250 358 17.5 *
>> 473.astar 7020 494 14.2 * 7020 492 14.3 *
>> 483.xalancbmk 6900 342 20.2 * 6900 336 20.6 *
>> Est. SPECint(R)_base2006 17.9
>> Est. SPECint2006 18.5
>>
>> 410.bwaves 13590 563 24.1 * 13590 506 26.9 *
>> 416.gamess NR NR
>> 433.milc 9180 375 24.5 * 9180 349 26.3 *
>> 434.zeusmp 9100 433 21.0 * 9100 423 21.5 *
>> 435.gromacs 7140 402 17.7 * 7140 411 17.4 *
>> 436.cactusADM 11950 486 24.6 * 11950 486 24.6 *
>> 437.leslie3d 9400 421 22.4 * 9400 419 22.4 *
>> 444.namd 8020 520 15.4 * 8020 520 15.4 *
>> 447.dealII NR NR
>> 450.soplex 8340 393 21.2 * 8340 391 21.3 *
>> 453.povray 5320 277 19.2 * 5320 278 19.1 *
>> 454.calculix 8250 453 18.2 * 8250 460 17.9 *
>> 459.GemsFDTD 10610 542 19.6 * 10610 537 19.8 *
>> 465.tonto 9840 492 20.0 * 9840 491 20.0 *
>> 470.lbm 13740 466 29.5 * 13740 430 32.0 *
>> 481.wrf 11170 492 22.7 * 11170 457 24.4 *
>> 482.sphinx3 19490 659 29.6 * 19490 655 29.8 *
>> Est. SPECfp(R)_base2006 21.6
>> Est. SPECfp2006 22.1
>>
>>
>> Ciao,
>> Michael.
* Re: Gimple loop splitting v2
2016-10-24 8:44 ` Bin.Cheng
@ 2016-10-24 9:02 ` Michael Matz
2016-10-25 16:41 ` Tamar Christina
0 siblings, 1 reply; 20+ messages in thread
From: Michael Matz @ 2016-10-24 9:02 UTC (permalink / raw)
To: Bin.Cheng; +Cc: Jeff Law, gcc-patches List
Hi,
On Mon, 24 Oct 2016, Bin.Cheng wrote:
> Unfortunately I didn't reproduce the improvement in my run on AArch64, I
> will double check if I made some mistakes.
Yeah, our regular testers also didn't pick up these kinds of improvements.
As I said, the machine was quite jumpy (though not loaded at all, and
fixated to run on one CPU) :-/
Ciao,
Michael.
* RE: Gimple loop splitting v2
2016-10-24 9:02 ` Michael Matz
@ 2016-10-25 16:41 ` Tamar Christina
0 siblings, 0 replies; 20+ messages in thread
From: Tamar Christina @ 2016-10-25 16:41 UTC (permalink / raw)
To: Michael Matz, Bin.Cheng; +Cc: Jeff Law, gcc-patches List, nd
Hi Michael,
The commit seems to be causing an ICE on aarch64 (just tested latest trunk).
I've created a Bugzilla ticket with a test input https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78107
Regards,
Tamar
> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
> owner@gcc.gnu.org] On Behalf Of Michael Matz
> Sent: 24 October 2016 10:02
> To: Bin.Cheng
> Cc: Jeff Law; gcc-patches List
> Subject: Re: Gimple loop splitting v2
>
> Hi,
>
> On Mon, 24 Oct 2016, Bin.Cheng wrote:
>
> > Unfortunately I didn't reproduce the improvement in my run on AArch64,
> > I will double check if I made some mistakes.
>
> Yeah, our regular testers also didn't pick up these kinds of improvements.
> As I said, the machine was quite jumpy (though not loaded at all, and fixated
> to run on one CPU) :-/
>
>
> Ciao,
> Michael.
Thread overview: 20+ messages
2015-11-12 16:52 Gimple loop splitting Michael Matz
2015-11-12 21:44 ` Jeff Law
2015-11-16 16:06 ` Michael Matz
2015-11-16 23:27 ` Jeff Law
2015-12-01 16:47 ` Gimple loop splitting v2 Michael Matz
2015-12-01 22:57 ` Jeff Law
2015-12-02 13:23 ` Michael Matz
2015-12-05 7:55 ` Jeff Law
2016-10-20 14:43 ` Michael Matz
2016-10-20 14:56 ` Bin.Cheng
2016-10-24 8:44 ` Bin.Cheng
2016-10-24 9:02 ` Michael Matz
2016-10-25 16:41 ` Tamar Christina
2016-10-20 19:17 ` Jeff Law
2016-07-25 20:57 ` Andrew Pinski
2016-07-26 11:32 ` Richard Biener
2016-07-27 6:18 ` Andrew Pinski
2016-07-27 8:11 ` Richard Biener
2016-07-25 7:00 ` Gimple loop splitting Andrew Pinski
2016-07-25 14:27 ` Michael Matz