public inbox for gcc-patches@gcc.gnu.org
* [PATCH] Loop split upon semi-invariant condition (PR tree-optimization/89134)
From: Feng Xue OS @ 2019-03-12  7:33 UTC
  To: gcc-patches

This patch implements a loop transformation driven by a conditional statement inside the loop that we call semi-invariant: the values its condition depends on are modified in only one of its two branches.

Consider a loop such as:

    void f (std::map<int, int> m)
    {
        for (auto it = m.begin (); it != m.end (); ++it) {
            /* if (b) is semi-invariant. */
            if (b) {
                b = do_something();    /* Has effect on b */
            } else {
                /* No effect on b */
            }
            statements;                      /* Also no effect on b */
        }
    }

It can be transformed by a kind of loop split into:

    void f (std::map<int, int> m)
    {
        auto it = m.begin ();
        for (; it != m.end (); ++it) {
            if (b) {
                b = do_something();
            } else {
                statements;
                ++it;
                break;
            }
            statements;
        }

        for (; it != m.end (); ++it) {
            statements;
        }
    }
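
The equivalence of the two loop shapes can be checked by hand with a small, self-contained sketch. Everything in it is hypothetical and only for illustration: `do_something` is modeled as a countdown that returns false once a budget is used up, "statements" is a running sum over the map values, and the `result` struct merely records what each variant observed. It is not code from the patch.

```cpp
#include <cassert>
#include <map>

// Hypothetical record of what a loop variant observed.
struct result
{
  int calls;   // how many times do_something () ran
  long sum;    // effect of "statements"
};

// Hypothetical stand-in for do_something (): true until BUDGET is used up.
static bool
do_something (int &budget)
{
  return --budget > 0;
}

// Original shape: `if (b)` is evaluated in every iteration.
static result
run_original (const std::map<int, int> &m, int budget)
{
  result r = { 0, 0 };
  bool b = true;
  for (auto it = m.begin (); it != m.end (); ++it)
    {
      if (b)
        {
          b = do_something (budget);
          ++r.calls;
        }
      r.sum += it->second;          // "statements"
    }
  return r;
}

// Split shape: once b is false, finish the current iteration, advance the
// iterator, and hand over to a second loop that never tests b.
static result
run_split (const std::map<int, int> &m, int budget)
{
  result r = { 0, 0 };
  bool b = true;
  auto it = m.begin ();
  for (; it != m.end (); ++it)
    {
      if (b)
        {
          b = do_something (budget);
          ++r.calls;
        }
      else
        {
          r.sum += it->second;      // "statements"
          ++it;
          break;
        }
      r.sum += it->second;          // "statements"
    }

  for (; it != m.end (); ++it)
    r.sum += it->second;            // "statements"
  return r;
}
```

Calling both functions with the same map and the same budget should give identical `calls` and `sum` values, whether `b` turns false early, late, or never.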

If "statements" is empty, the second loop becomes an empty loop, which can be removed (that part will come in a separate patch). If "statements" consists of straight-line instructions, the second loop becomes a candidate for vectorization. In practice, this optimization was found to improve a real application by 7%.
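
To make the empty-"statements" case concrete, here is a minimal sketch under stated assumptions: the container is a `std::vector<int>`, `keep_going` is a hypothetical side-effect-free stand-in for `do_something`, and the second function is hand-written to show what remains once the empty second loop has been deleted. It illustrates the intended effect only and is not output of the pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for do_something (): no side effects.
static bool
keep_going (int x)
{
  return x >= 0;
}

// Original shape with empty "statements": after b turns false, the loop
// still walks the rest of V without doing any work.
static int
calls_original (const std::vector<int> &v)
{
  int calls = 0;
  bool b = true;
  for (std::size_t i = 0; i < v.size (); ++i)
    if (b)
      {
        b = keep_going (v[i]);
        ++calls;
      }
  return calls;
}

// Split shape after the empty second loop is removed: what is left is an
// early exit as soon as b turns false.
static int
calls_after_split (const std::vector<int> &v)
{
  int calls = 0;
  bool b = true;
  for (std::size_t i = 0; i < v.size (); ++i)
    {
      if (!b)
        break;
      b = keep_going (v[i]);
      ++calls;
    }
  return calls;
}
```

Both functions return the same count for any input; the second simply stops iterating once the condition can no longer change.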

Since this is just another kind of loop split, the code is mostly placed in the existing tree-ssa-loop-split module. It is controlled by -fsplit-loops, which is enabled at -O3.

Feng


diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 64bf6017d16..a6c2878d652 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,23 @@
+2019-03-12  Feng Xue <fxue@os.amperecomputing.com>
+
+	PR tree-optimization/89134
+	* doc/invoke.texi (max-cond-loop-split-insns): Document new --params.
+	(min-cond-loop-split-prob): Likewise.
+	* params.def: Add max-cond-loop-split-insns, min-cond-loop-split-prob.
+	* passes.def (pass_cond_loop_split): New pass.
+	* timevar.def (TV_COND_LOOP_SPLIT): New time variable.
+	* tree-pass.h (make_pass_cond_loop_split): New declaration.
+	* tree-ssa-loop-split.c (split_info): New class.
+	(find_vdef_in_loop, vuse_semi_invariant_p): New functions.
+	(ssa_semi_invariant_p, stmt_semi_invariant_p): Likewise.
+	(can_branch_be_excluded, get_cond_invariant_branch): Likewise.
+	(is_cond_in_hidden_loop, compute_added_num_insns): Likewise.
+	(can_split_loop_on_cond, mark_cond_to_split_loop): Likewise.
+	(split_loop_for_cond, tree_ssa_split_loops_for_cond): Likewise.
+	(pass_data_cond_loop_split): New variable.
+	(pass_cond_loop_split): New class.
+	(make_pass_cond_loop_split): New function.
+
 2019-03-11  Jakub Jelinek  <jakub@redhat.com>
 
 	PR middle-end/89655
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index df0883f2fc9..f5e09bd71fd 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11316,6 +11316,14 @@ The maximum number of branches unswitched in a single loop.
 @item lim-expensive
 The minimum cost of an expensive expression in the loop invariant motion.
 
+@item max-cond-loop-split-insns
+The maximum number of insns to be increased due to loop split on
+semi-invariant condition statement.
+
+@item min-cond-loop-split-prob
+The minimum threshold for probability of semi-invariant condition
+statement to trigger loop split.
+
 @item iv-consider-all-candidates-bound
 Bound on number of candidates for induction variables, below which
 all candidates are considered for each use in induction variable
diff --git a/gcc/params.def b/gcc/params.def
index 3f1576448be..2e067526958 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -386,6 +386,18 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
 	"The maximum number of unswitchings in a single loop.",
 	3, 0, 0)
 
+/* The maximum number of increased insns due to loop split on semi-invariant
+   condition statement.  */
+DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
+	"max-cond-loop-split-insns",
+	"The maximum number of insns to be increased due to loop split on semi-invariant condition statement.",
+	100, 0, 0)
+
+DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
+	"min-cond-loop-split-prob",
+	"The minimum threshold for probability of semi-invariant condition statement to trigger loop split.",
+	30, 0, 100)
+
 /* The maximum number of insns in loop header duplicated by the copy loop
    headers pass.  */
 DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
diff --git a/gcc/passes.def b/gcc/passes.def
index 446a7c48276..bde7f4c50c0 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -265,6 +265,7 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_tree_unswitch);
 	  NEXT_PASS (pass_scev_cprop);
 	  NEXT_PASS (pass_loop_split);
+	  NEXT_PASS (pass_cond_loop_split);
 	  NEXT_PASS (pass_loop_versioning);
 	  NEXT_PASS (pass_loop_jam);
 	  /* All unswitching, final value replacement and splitting can expose
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 54154464a58..39f2df0e3ec 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -189,6 +189,7 @@ DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
 DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
 DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
 DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
+DEFTIMEVAR (TV_COND_LOOP_SPLIT       , "loop splitting for conditions")
 DEFTIMEVAR (TV_LOOP_JAM              , "unroll and jam")
 DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
 DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 47be59b2a11..f441ba36871 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -367,6 +367,7 @@ extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_linterchange (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_cond_loop_split (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_loop_jam (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index 999c9a30366..d287a0d7d4c 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -32,7 +32,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop.h"
 #include "tree-ssa-loop-manip.h"
 #include "tree-into-ssa.h"
+#include "tree-inline.h"
 #include "cfgloop.h"
+#include "params.h"
 #include "tree-scalar-evolution.h"
 #include "gimple-iterator.h"
 #include "gimple-pretty-print.h"
@@ -40,7 +42,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "gimplify-me.h"
 
-/* This file implements loop splitting, i.e. transformation of loops like
+/* This file implements two kinds of loop splitting.
+
+   One transformation of loops like:
 
    for (i = 0; i < 100; i++)
      {
@@ -670,6 +674,803 @@ tree_ssa_split_loops (void)
   return 0;
 }
 
+
+/* Another transformation of loops like:
+
+   for (i = INIT (); CHECK (i); i = NEXT ())
+     {
+       if (expr (a_1, a_2, ..., a_n))
+         a_j = ...;  // changes at least one a_j
+       else
+         S;          // does not change any a_j
+     }
+
+   into:
+
+   for (i = INIT (); CHECK (i); i = NEXT ())
+     {
+       if (expr (a_1, a_2, ..., a_n))
+         a_j = ...;
+       else
+         {
+           S;
+           i = NEXT ();
+           break;
+         }
+     }
+
+   for (; CHECK (i); i = NEXT ())
+     {
+       S;
+     }
+
+   */
+
+/* Data structure to hold temporary information during loop split upon
+   semi-invariant conditional statement. */
+class split_info {
+public:
+  /* Array of all basic blocks in a loop, returned by get_loop_body(). */
+  basic_block *bbs;
+
+  /* All memory store/clobber statements in a loop. */
+  auto_vec<gimple *> stores;
+
+  /* Whether above memory stores vector has been filled. */
+  bool set_stores;
+
+  /* Semi-invariant conditional statement, upon which to split loop. */
+  gcond *cond;
+
+  split_info () : bbs (NULL),  set_stores (false), cond (NULL) { }
+
+  ~split_info ()
+    {
+      if (bbs)
+        free (bbs);
+    }
+};
+
+/* Find all statements with memory-write effect in a loop, including memory
+   stores and non-pure function calls, and keep them in a vector. This is
+   done only once, because the vector stays constant during the analysis
+   stage of a semi-invariant condition. */
+
+static void
+find_vdef_in_loop (struct loop *loop)
+{
+  split_info *info = (split_info *) loop->aux;
+  gphi *vphi = get_virtual_phi (loop->header);
+
+  /* Indicate memory store vector has been filled. */
+  info->set_stores = true;
+
+  /* If loop contains memory operation, there must be a virtual PHI node in
+     loop header basic block. */
+  if (vphi == NULL)
+    return;
+
+  /* All virtual SSA names inside the loop are connected to be a cyclic
+     graph via virtual PHI nodes. The virtual PHI node in loop header just
+     links the first and the last virtual SSA names, by using the last as
+     PHI operand to define the first. */
+  const edge latch = loop_latch_edge (loop);
+  const tree first = gimple_phi_result (vphi);
+  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
+
+  /* The virtual SSA cyclic graph might consist of only one SSA name, which
+     is defined by itself.
+
+        .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
+
+     This means the loop contains only memory loads, so we can skip it. */
+  if (first == last)
+    return;
+
+  auto_vec<gimple *> others;
+  auto_vec<tree> worklist;
+  auto_bitmap visited;
+
+  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
+  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
+  worklist.safe_push (last);
+
+  do
+    {
+      tree vuse = worklist.pop ();
+      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
+
+      /* We mark the first and last SSA names as visited at the beginning,
+         and walk backward from the last SSA name toward the first, which
+         ensures that this do-while will not touch SSA names defined
+         outside of the loop. */
+      gcc_assert (gimple_bb (stmt)
+                  && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
+
+      if (gimple_code (stmt) == GIMPLE_PHI)
+        {
+          gphi *phi = as_a <gphi *> (stmt);
+
+          for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+            {
+              tree arg = gimple_phi_arg_def (stmt, i);
+
+              if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
+                worklist.safe_push (arg);
+            }
+        }
+      else
+        {
+          tree prev = gimple_vuse (stmt);
+
+          /* A non-pure call statement is conservatively assumed to impact
+             all memory locations. So place call statements ahead of other
+             memory stores in the vector, with the idea of using them as
+             shortcut terminators for memory alias analysis, as a
+             compile-time optimization. */
+          if (gimple_code (stmt) == GIMPLE_CALL)
+            info->stores.safe_push (stmt);
+          else
+            others.safe_push (stmt);
+
+          if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
+            worklist.safe_push (prev);
+        }
+    } while (!worklist.is_empty ());
+
+  info->stores.safe_splice (others);
+}
+
+
+/* Given a memory load or pure call statement, check whether it is impacted
+   by some memory store in the loop excluding those basic blocks dominated
+   by SKIP_HEAD (those basic blocks always correspond to one branch of
+   a conditional statement). If SKIP_HEAD is NULL, all basic blocks of the
+   loop are checked. */
+
+static bool
+vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
+                       const_basic_block skip_head)
+{
+  split_info *info = (split_info *) loop->aux;
+
+  /* Collect memory store/clobber statements if not done yet. */
+  if (!info->set_stores)
+    find_vdef_in_loop (loop);
+
+  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
+  ao_ref ref;
+  gimple *store;
+  unsigned i;
+
+  ao_ref_init (&ref, rhs);
+
+  FOR_EACH_VEC_ELT (info->stores, i, store)
+    {
+      /* Skip those basic blocks dominated by SKIP_HEAD. */
+      if (skip_head
+          && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
+        continue;
+
+      /* For a pure call, it is assumed to be impacted by any memory store.
+         For a memory load, use memory alias analysis to check that. */
+      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
+        return false;
+    }
+
+  return true;
+}
+
+/* Forward declaration */
+
+static bool
+stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
+                       const_basic_block skip_head);
+
+/* Suppose one condition branch, led by SKIP_HEAD, is not executed in a
+   certain iteration; check whether an SSA name remains unchanged in the next
+   iteration. We call this characteristic semi-invariantness. SKIP_HEAD might
+   be NULL; if so, nothing is excluded, and all basic blocks and control flows
+   in the loop will be considered. */
+
+static bool
+ssa_semi_invariant_p (struct loop *loop, const tree name,
+                      const_basic_block skip_head)
+{
+  gimple *def = SSA_NAME_DEF_STMT (name);
+  const_basic_block def_bb = gimple_bb (def);
+
+  /* An SSA name defined outside a loop is definitely semi-invariant. */
+  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
+    return true;
+
+  /* This function is used to check semi-invariantness of a condition
+     statement, and SKIP_HEAD is always given as head of one of its
+     branches. So it implies that SSA name to check should be defined
+     before the conditional statement, and also before SKIP_HEAD. */
+
+  if (gimple_code (def) == GIMPLE_PHI)
+    {
+      /* In a normal loop, if a PHI node is located not in loop header, all
+         its source operands should be defined inside the loop. As we
+         mentioned before, these source definitions are ahead of SKIP_HEAD,
+         and will not be bypassed. Therefore, in each iteration, any of
+         these sources might be value provider to the SSA name, which for
+         sure should not be seen as invariant. */
+      if (def_bb != loop->header || !skip_head)
+        return false;
+
+      const_edge latch = loop_latch_edge (loop);
+      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
+
+      /* A PHI node in loop header always contains two source operands,
+         one is initial value, the other is the copy of last iteration
+         through loop latch, we call it latch value. From this PHI node
+         to definition of latch value, if excluding those basic blocks
+         dominated by SKIP_HEAD, there is no definition of other version
+         of same variable, SSA name defined by the PHI node is
+         semi-invariant.
+
+                         loop entry
+                              |     .--- latch ---.
+                              |     |             |
+                              v     v             |
+                  x_1 = PHI <x_0,  x_3>           |
+                           |                      |
+                           v                      |
+              .------- if (cond) -------.         |
+              |                         |         |
+              |                     [ SKIP ]      |
+              |                         |         |
+              |                     x_2 = ...     |
+              |                         |         |
+              '---- T ---->.<---- F ----'         |
+                           |                      |
+                           v                      |
+                  x_3 = PHI <x_1, x_2>            |
+                           |                      |
+                           '----------------------'
+
+        Suppose that in a certain iteration, execution flow in the above
+        graph goes through the true branch. Then the source value that
+        defines x_3 in the false branch (x_2) is skipped, so x_3 only comes
+        from x_1. Since x_1 in the next iteration is defined by x_3, x_1 will
+        never change if COND always chooses the true branch from then on. */
+
+      while (from != name)
+        {
+          /* A new value comes from a CONSTANT. */
+          if (TREE_CODE (from) != SSA_NAME)
+            return false;
+
+          gimple *stmt = SSA_NAME_DEF_STMT (from);
+          const_basic_block bb = gimple_bb (stmt);
+
+          /* A new value comes from outside of loop. */
+          if (!bb || !flow_bb_inside_loop_p (loop, bb))
+            return false;
+
+          from = NULL_TREE;
+
+          if (gimple_code (stmt) == GIMPLE_PHI)
+            {
+              gphi *phi = as_a <gphi *> (stmt);
+
+              for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+                {
+                  const_edge e = gimple_phi_arg_edge (phi, i);
+
+                  /* Skip redefinition from basic blocks being excluded. */
+                  if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
+                    {
+                      /* More than one source operand can provide a value
+                         to the SSA name. */
+                      if (from)
+                        return false;
+
+                      from = gimple_phi_arg_def (phi, i);
+                    }
+                }
+            }
+          else if (gimple_code (stmt) == GIMPLE_ASSIGN)
+            {
+              /* For simple value copy, check its rhs instead. */
+              if (gimple_assign_ssa_name_copy_p (stmt))
+                from = gimple_assign_rhs1 (stmt);
+            }
+
+          /* Any other kind of definition is deemed to introduce a new value
+             to the SSA name. */
+          if (!from)
+            return false;
+        }
+      return true;
+    }
+
+  /* Value originated from volatile memory load or return of normal (non-
+     const/pure) call should not be treated as constant in each iteration. */
+  if (gimple_has_side_effects (def))
+    return false;
+
+  /* Check if any memory store may kill memory load at this place. */
+  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
+    return false;
+
+  /* Check operands of definition statement of the SSA name. */
+  return stmt_semi_invariant_p (loop, def, skip_head);
+}
+
+/* Check whether a statement is semi-invariant, iff all its operands are
+   semi-invariant. */
+
+static bool
+stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
+                       const_basic_block skip_head)
+{
+  ssa_op_iter iter;
+  tree use;
+
+  /* Although an operand of a statement might be an SSA name, CONSTANT or
+     VARDECL, here we only need to check SSA name operands. A VARDECL operand
+     involves a memory load, and that check must already have been done in
+     ssa_semi_invariant_p before this function is invoked. */
+  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
+    {
+      if (!ssa_semi_invariant_p (loop, use, skip_head))
+        return false;
+    }
+
+  return true;
+}
+
+/* Determine whether, when one branch of a conditional statement is not
+   taken, we can exclude the leading basic block of that branch and the
+   basic blocks dominated by it. */
+
+static bool
+can_branch_be_excluded (basic_block branch_bb)
+{
+  if (single_pred_p (branch_bb))
+    return true;
+
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, branch_bb->preds)
+    {
+      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
+        continue;
+
+      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
+        continue;
+
+      /* The branch can be reached through another path, not just from the
+         conditional statement. */
+      return false;
+    }
+
+  return true;
+}
+
+/* Find out which branch of a conditional statement is invariant. That
+   is: once the branch is selected in certain loop iteration, any operand
+   that contributes to computation of the conditional statement remains
+   unchanged in all following iterations. */
+
+static int
+get_cond_invariant_branch (struct loop *loop, gcond *cond)
+{
+  basic_block cond_bb = gimple_bb (cond);
+  basic_block targ_bb[2];
+  bool invar[2];
+  unsigned invar_checks;
+
+  for (unsigned i = 0; i < 2; i++)
+    {
+      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
+
+      /* One branch directs to loop exit, no need to perform loop split upon
+         this conditional statement. Firstly, it is trivial if the exit
+         branch is semi-invariant, for the statement is just loop-breaking.
+         Secondly, if the opposite branch is semi-invariant, it means that
+         the statement is real loop-invariant, which is covered by loop
+         unswitch. */
+      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
+        return -1;
+    }
+
+  invar_checks = 0;
+
+  for (unsigned i = 0; i < 2; i++)
+    {
+      invar[!i] = false;
+
+      if (!can_branch_be_excluded (targ_bb[i]))
+        continue;
+
+      /* Given a semi-invariant branch, if its opposite branch dominates
+         loop latch, it and its following trace will only be executed in
+         final iteration of loop, namely it is not part of repeated body
+         of the loop. Similar to the above case that the branch is loop
+         exit, no need to split loop. */
+      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
+        continue;
+
+      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
+      invar_checks++;
+    }
+
+  /* Both branches being invariant (handled by loop unswitch) or both
+     being variant is not what we want. */
+  if (invar[0] ^ !invar[1])
+    return -1;
+
+  /* Found a real loop-invariant condition, do nothing. */
+  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
+    return -1;
+
+  return invar[1];
+}
+
+/* Return TRUE if a conditional statement in a normal loop is also inside
+   a nested non-recognized loop, such as an irreducible loop. */
+
+static bool
+is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb,
+                        int branch)
+{
+  basic_block branch_bb = EDGE_SUCC (cond_bb, branch)->dest;
+
+  if (cond_bb == loop->header || branch_bb == loop->latch)
+    return false;
+
+  basic_block *bbs = ((split_info *) loop->aux)->bbs;
+  auto_vec<basic_block> worklist;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    bbs[i]->flags &= ~BB_REACHABLE;
+
+  /* Mark the latch basic block as visited so that it acts as the end point
+     of the reachability traversal. */
+  loop->latch->flags |= BB_REACHABLE;
+
+  gcc_assert (flow_bb_inside_loop_p (loop, branch_bb));
+
+  /* Start from specified branch, the opposite branch is ignored for it
+     will not be executed. */
+  branch_bb->flags |= BB_REACHABLE;
+  worklist.safe_push (branch_bb);
+
+  do
+    {
+      basic_block bb = worklist.pop ();
+      edge e;
+      edge_iterator ei;
+
+      FOR_EACH_EDGE (e, ei, bb->succs)
+        {
+          basic_block succ_bb = e->dest;
+
+          if (succ_bb == cond_bb)
+            return true;
+
+          if (!flow_bb_inside_loop_p (loop, succ_bb))
+            continue;
+
+          if (succ_bb->flags & BB_REACHABLE)
+            continue;
+
+          succ_bb->flags |= BB_REACHABLE;
+          worklist.safe_push (succ_bb);
+        }
+    } while (!worklist.is_empty ());
+
+  return false;
+}
+
+
+/* Calculate increased code size measured by estimated insn number if
+   applying loop split upon certain branch of a conditional statement. */
+
+static int
+compute_added_num_insns (struct loop *loop, const_basic_block cond_bb,
+                         int branch)
+{
+  const_basic_block targ_bb_var = EDGE_SUCC (cond_bb, !branch)->dest;
+  basic_block *bbs = ((split_info *) loop->aux)->bbs;
+  int num = 0;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    {
+      /* Do not count basic blocks that are only in the opposite branch. */
+      if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var))
+        continue;
+
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi);
+           gsi_next (&gsi))
+        num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);
+    }
+
+  return num;
+}
+
+/* Return true if it is eligible and profitable to perform loop split upon
+   a conditional statement. */
+
+static bool
+can_split_loop_on_cond (struct loop *loop, gcond *cond)
+{
+  int branch = get_cond_invariant_branch (loop, cond);
+
+  if (branch < 0)
+    return false;
+
+  basic_block cond_bb = gimple_bb (cond);
+
+  /* Add a threshold for increased code size to disable loop split. */
+  if (compute_added_num_insns (loop, cond_bb, branch) >
+      PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))
+    return false;
+
+  /* In each iteration, the conditional statement candidate should be
+     executed only once. */
+  if (is_cond_in_hidden_loop (loop, cond_bb, branch))
+    return false;
+
+  profile_probability prob = EDGE_SUCC (cond_bb, branch)->probability;
+
+  /* When accurate profile information is available, and execution
+     frequency of the branch is too low, just let it go. */
+  if (prob.reliable_p ())
+    {
+      int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB);
+
+      if (prob < profile_probability::always ().apply_scale (thres, 100))
+        return false;
+    }
+
+  /* Temporarily keep branch index in conditional statement. */
+  gimple_set_plf (cond, GF_PLF_1, branch);
+  return true;
+}
+
+/* Traverse all conditional statements in a loop, to find out a good
+   candidate upon which we can do loop split. */
+
+static bool
+mark_cond_to_split_loop (struct loop *loop)
+{
+  split_info *info = new split_info ();
+  basic_block *bbs = info->bbs = get_loop_body (loop);
+
+  /* Allocate an area to keep temporary info, and associate its address
+     with loop aux field. */
+  loop->aux = info;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    {
+      basic_block bb = bbs[i];
+
+      /* Skip statement in inner recognized loop, because we want that
+         conditional statement executes at most once in each iteration. */
+      if (bb->loop_father != loop)
+        continue;
+
+      /* Actually this check is not a must constraint. With it, we can
+         ensure conditional statement will execute at least once in
+         each iteration. */
+      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+        continue;
+
+      gimple *last = last_stmt (bb);
+
+      if (!last || gimple_code (last) != GIMPLE_COND)
+        continue;
+
+      gcond *cond = as_a <gcond *> (last);
+
+      if (can_split_loop_on_cond (loop, cond))
+        {
+          info->cond = cond;
+          return true;
+        }
+    }
+
+  delete info;
+  loop->aux = NULL;
+
+  return false;
+}
+
+/* Given a loop with a chosen conditional statement candidate, perform loop
+   split transformation illustrated as the following graph.
+
+               .-------T------ if (true) ------F------.
+               |                    .---------------. |
+               |                    |               | |
+               v                    |               v v
+          pre-header                |            pre-header
+               | .------------.     |                 | .------------.
+               | |            |     |                 | |            |
+               | v            |     |                 | v            |
+             header           |     |               header           |
+               |              |     |                 |              |
+       [ bool r = cond; ]     |     |                 |              |
+               |              |     |                 |              |
+      .---- if (r) -----.     |     |        .--- if (true) ---.     |
+      |                 |     |     |        |                 |     |
+  invariant             |     |     |    invariant             |     |
+      |                 |     |     |        |                 |     |
+      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
+               |              |    /                  |              |
+             stmts            |   /                 stmts            |
+               |              |  /                    |              |
+              / \             | /                    / \             |
+     .-------*   *       [ if (!r) ]        .-------*   *            |
+     |           |            |             |           |            |
+     |         latch          |             |         latch          |
+     |           |            |             |           |            |
+     |           '------------'             |           '------------'
+     '------------------------. .-----------'
+             loop1            | |                   loop2
+                              v v
+                             exits
+
+   In the graph, loop1 represents the part derived from the original loop,
+   and loop2 is duplicated using loop_version (), which corresponds to the
+   part of the original loop being split out. In loop1, a new bool temporary
+   (r) is introduced to keep the value of the condition result. On the
+   original latch edge of loop1, we insert a new conditional statement whose
+   value comes from that temporary (r): one of its branches goes back to the
+   loop1 header as a latch edge, and the other goes to the loop2 pre-header
+   as an entry edge. In loop2, we also abandon the variant branch of the
+   conditional statement candidate by setting a constant bool condition,
+   based on which branch is semi-invariant. */
+
+static bool
+split_loop_for_cond (struct loop *loop1)
+{
+  split_info *info = (split_info *) loop1->aux;
+  gcond *cond = info->cond;
+  basic_block cond_bb = gimple_bb (cond);
+  int branch = gimple_plf (cond, GF_PLF_1);
+  bool true_invar = !!(EDGE_SUCC (cond_bb, branch)->flags & EDGE_TRUE_VALUE);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
+              current_function_name (), loop1->num,
+              true_invar ? "T" : "F", cond_bb->index);
+     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
+   }
+
+  initialize_original_copy_tables ();
+
+  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
+                                     profile_probability::always (),
+                                     profile_probability::never (),
+                                     profile_probability::always (),
+                                     profile_probability::always (),
+                                     true);
+  if (!loop2)
+    {
+      free_original_copy_tables ();
+      return false;
+    }
+
+  /* Generate a bool type temporary to hold result of the condition. */
+  tree tmp = make_ssa_name (boolean_type_node);
+  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
+  gimple *stmt = gimple_build_assign (tmp,
+                                      gimple_cond_code (cond),
+                                      gimple_cond_lhs (cond),
+                                      gimple_cond_rhs (cond));
+
+  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
+  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
+  update_stmt (cond);
+
+  /* Replace the condition in loop2 with a bool constant to let pass
+     manager remove the variant branch after current pass finishes. */
+  basic_block cond_bb_copy = get_bb_copy (cond_bb);
+  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
+
+  if (true_invar)
+    gimple_cond_make_true (cond_copy);
+  else
+    gimple_cond_make_false (cond_copy);
+
+  update_stmt (cond_copy);
+
+  /* Insert a new conditional statement on latch edge of loop1. This
+     statement acts as a switch to transfer execution from loop1 to
+     loop2, when loop1 enters into invariant state. */
+  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
+  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
+  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
+                                          NULL_TREE, NULL_TREE);
+
+  gsi = gsi_last_bb (break_bb);
+  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
+
+  edge to_loop1 = single_succ_edge (break_bb);
+  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
+
+  to_loop1->flags &= ~EDGE_FALLTHRU;
+
+  if (true_invar)
+    {
+      to_loop1->flags |= EDGE_FALSE_VALUE;
+      to_loop2->flags |= EDGE_TRUE_VALUE;
+    }
+  else
+    {
+      to_loop1->flags |= EDGE_TRUE_VALUE;
+      to_loop2->flags |= EDGE_FALSE_VALUE;
+    }
+
+  update_ssa (TODO_update_ssa);
+
+  /* Due to introduction of a control flow edge from loop1 latch to loop2
+     pre-header, we should update PHIs in loop2 to reflect this connection
+     between loop1 and loop2. */
+  connect_loop_phis (loop1, loop2, to_loop2);
+
+  free_original_copy_tables ();
+
+  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
+
+  return true;
+}
+
+/* Main entry point to perform loop splitting for suitable if-conditions
+   in all loops. */
+
+static unsigned int
+tree_ssa_split_loops_for_cond (void)
+{
+  struct loop *loop;
+  auto_vec<struct loop *> loop_list;
+  bool changed = false;
+  unsigned i;
+
+  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
+    loop->aux = NULL;
+
+  /* Go through all loops starting from innermost. */
+  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
+    {
+      /* Put the loop in a list if a conditional statement candidate is
+         found in it. This is the analysis stage; nothing in the function
+         is changed here. */
+      if (!loop->aux
+          && !optimize_loop_for_size_p (loop)
+          && mark_cond_to_split_loop (loop))
+        loop_list.safe_push (loop);
+
+      /* If any of our inner loops was split, don't split us,
+         and mark our containing loop as having had splits as well. */
+      loop_outer (loop)->aux = loop->aux;
+    }
+
+  FOR_EACH_VEC_ELT (loop_list, i, loop)
+    {
+      /* Take each selected loop and perform the loop split. This is the
+         transformation stage. */
+      changed |= split_loop_for_cond (loop);
+
+      delete (split_info *) loop->aux;
+    }
+
+  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
+    loop->aux = NULL;
+
+  if (changed)
+    return TODO_cleanup_cfg;
+  return 0;
+}
+
+
 /* Loop splitting pass.  */
 
 namespace {
@@ -716,3 +1517,48 @@ make_pass_loop_split (gcc::context *ctxt)
 {
   return new pass_loop_split (ctxt);
 }
+
+namespace {
+
+const pass_data pass_data_cond_loop_split =
+{
+  GIMPLE_PASS, /* type */
+  "cond_lsplit", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_COND_LOOP_SPLIT, /* tv_id */
+  PROP_cfg, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_cond_loop_split : public gimple_opt_pass
+{
+public:
+  pass_cond_loop_split (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_cond_loop_split, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return flag_split_loops != 0; }
+  virtual unsigned int execute (function *);
+
+}; // class pass_cond_loop_split
+
+unsigned int
+pass_cond_loop_split::execute (function *fun)
+{
+  if (number_of_loops (fun) <= 1)
+    return 0;
+
+  return tree_ssa_split_loops_for_cond ();
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_cond_loop_split (gcc::context *ctxt)
+{
+  return new pass_cond_loop_split (ctxt);
+}
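
For readers following the thread, the source-level effect described in the cover letter can be sketched in plain C++. This is an illustration only and not part of the patch: `next_flag`, `original_loop`, `split_loop`, and the `calls` counter are hypothetical names chosen for the example, and a vector of ints stands in for the `std::map` iteration. The sketch assumes, as the cover letter does, that only the true branch can change `b`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for do_something(): the flag stays set only
// while the current element is even.
static bool next_flag(int v) { return v % 2 == 0; }

// Original shape: `if (b)` is semi-invariant, because only the true
// branch can change b. `calls` counts how often the true branch runs.
static int original_loop(const std::vector<int>& data, bool b, int& calls) {
    int sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (b) {
            b = next_flag(data[i]);  // has effect on b
            ++calls;
        }
        sum += data[i];              // "statements": no effect on b
    }
    return sum;
}

// Split shape: once b is seen to be false it cannot become true again,
// so control leaves the first loop and finishes in a second loop that
// no longer tests b.
static int split_loop(const std::vector<int>& data, bool b, int& calls) {
    int sum = 0;
    std::size_t i = 0;
    for (; i < data.size(); ++i) {
        if (b) {
            b = next_flag(data[i]);
            ++calls;
        } else {
            sum += data[i];  // finish this iteration's "statements"
            ++i;             // the loop increment is skipped by break
            break;
        }
        sum += data[i];
    }
    for (; i < data.size(); ++i)
        sum += data[i];      // straight-line body, no test of b
    return sum;
}
```

Both functions should produce the same sum and the same number of true-branch executions for any input; the second loop in `split_loop` is the one that becomes a candidate for removal or vectorization.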


* Re: [PATCH] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-03-12  7:33 [PATCH] Loop split upon semi-invariant condition (PR tree-optimization/89134) Feng Xue OS
@ 2019-03-12  8:33 ` Richard Biener
  2019-03-13  2:13   ` Feng Xue OS
  2019-05-06  3:04   ` Feng Xue OS
  0 siblings, 2 replies; 31+ messages in thread
From: Richard Biener @ 2019-03-12  8:33 UTC (permalink / raw)
  To: Feng Xue OS; +Cc: gcc-patches

On Tue, Mar 12, 2019 at 7:20 AM Feng Xue OS <fxue@os.amperecomputing.com> wrote:
>
> This patch implements a loop transformation based on one of the loop's conditional statements, which we call semi-invariant because its computation is affected by only one of its branches.
>
> Suppose a loop as:
>
>     void f (std::map<int, int> m)
>     {
>         for (auto it = m.begin (); it != m.end (); ++it) {
>             /* if (b) is semi-invariant. */
>             if (b) {
>                 b = do_something();    /* Has effect on b */
>             } else {
>                                                         /* No effect on b */
>             }
>             statements;                      /* Also no effect on b */
>         }
>     }
>
> A transformation, kind of loop split, could be:
>
>     void f (std::map<int, int> m)
>     {
>         for (auto it = m.begin (); it != m.end (); ++it) {
>             if (b) {
>                 b = do_something();
>             } else {
>                 ++it;
>                 statements;
>                 break;
>             }
>             statements;
>         }
>
>         for (; it != m.end (); ++it) {
>             statements;
>         }
>     }
>
> If "statements" contains nothing, the second loop becomes empty and can be removed. (This part will be given in another patch.) And if "statements" are straight-line instructions, we get an opportunity to vectorize the second loop. In practice, this optimization is found to improve some real application by 7%.
>
> Since it is just a kind of loop split, the code is mainly placed in the existing tree-ssa-loop-split module; it is controlled by -fsplit-loops and is enabled with -O3.

Note the transform itself is jump-threading with the threading
duplicating a whole CFG cycle.

I didn't look at the patch details yet since this is suitable for GCC 10 only.

Thanks for implementing this.
Richard.

> Feng
>
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 64bf6017d16..a6c2878d652 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,23 @@
> +2019-03-12  Feng Xue <fxue@os.amperecomputing.com>
> +
> +       PR tree-optimization/89134
> +        * doc/invoke.texi (max-cond-loop-split-insns): Document new --params.
> +       (min-cond-loop-split-prob): Likewise.
> +       * params.def: Add max-cond-loop-split-insns, min-cond-loop-split-prob.
> +       * passes.def (pass_cond_loop_split) : New pass.
> +       * timevar.def (TV_COND_LOOP_SPLIT): New time variable.
> +       * tree-pass.h (make_pass_cond_loop_split): New declaration.
> +       * tree-ssa-loop-split.c (split_info): New class.
> +       (find_vdef_in_loop, vuse_semi_invariant_p): New functions.
> +       (ssa_semi_invariant_p, stmt_semi_invariant_p): Likewise.
> +       (can_branch_be_excluded, get_cond_invariant_branch): Likewise.
> +       (is_cond_in_hidden_loop, compute_added_num_insns): Likewise.
> +       (can_split_loop_on_cond, mark_cond_to_split_loop): Likewise.
> +       (split_loop_for_cond, tree_ssa_split_loops_for_cond): Likewise.
> +       (pass_data_cond_loop_split): New variable.
> +       (pass_cond_loop_split): New class.
> +       (make_pass_cond_loop_split): New function.
> +
>  2019-03-11  Jakub Jelinek  <jakub@redhat.com>
>
>         PR middle-end/89655
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index df0883f2fc9..f5e09bd71fd 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11316,6 +11316,14 @@ The maximum number of branches unswitched in a single loop.
>  @item lim-expensive
>  The minimum cost of an expensive expression in the loop invariant motion.
>
> +@item max-cond-loop-split-insns
> +The maximum number of insns to be increased due to loop split on
> +semi-invariant condition statement.
> +
> +@item min-cond-loop-split-prob
> +The minimum threshold for probability of semi-invariant condition
> +statement to trigger loop split.
> +
>  @item iv-consider-all-candidates-bound
>  Bound on number of candidates for induction variables, below which
>  all candidates are considered for each use in induction variable
> diff --git a/gcc/params.def b/gcc/params.def
> index 3f1576448be..2e067526958 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -386,6 +386,18 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
>         "The maximum number of unswitchings in a single loop.",
>         3, 0, 0)
>
> +/* The maximum number of increased insns due to loop split on semi-invariant
> +   condition statement.  */
> +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> +       "max-cond-loop-split-insns",
> +       "The maximum number of insns to be increased due to loop split on semi-invariant condition statement.",
> +       100, 0, 0)
> +
> +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> +       "min-cond-loop-split-prob",
> +       "The minimum threshold for probability of semi-invariant condition statement to trigger loop split.",
> +       30, 0, 100)
> +
>  /* The maximum number of insns in loop header duplicated by the copy loop
>     headers pass.  */
>  DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 446a7c48276..bde7f4c50c0 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -265,6 +265,7 @@ along with GCC; see the file COPYING3.  If not see
>           NEXT_PASS (pass_tree_unswitch);
>           NEXT_PASS (pass_scev_cprop);
>           NEXT_PASS (pass_loop_split);
> +         NEXT_PASS (pass_cond_loop_split);
>           NEXT_PASS (pass_loop_versioning);
>           NEXT_PASS (pass_loop_jam);
>           /* All unswitching, final value replacement and splitting can expose
> diff --git a/gcc/timevar.def b/gcc/timevar.def
> index 54154464a58..39f2df0e3ec 100644
> --- a/gcc/timevar.def
> +++ b/gcc/timevar.def
> @@ -189,6 +189,7 @@ DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
>  DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
>  DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
>  DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
> +DEFTIMEVAR (TV_COND_LOOP_SPLIT       , "loop splitting for conditions")
>  DEFTIMEVAR (TV_LOOP_JAM              , "unroll and jam")
>  DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
>  DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 47be59b2a11..f441ba36871 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -367,6 +367,7 @@ extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_linterchange (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_cond_loop_split (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_loop_jam (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 999c9a30366..d287a0d7d4c 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -32,7 +32,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-loop.h"
>  #include "tree-ssa-loop-manip.h"
>  #include "tree-into-ssa.h"
> +#include "tree-inline.h"
>  #include "cfgloop.h"
> +#include "params.h"
>  #include "tree-scalar-evolution.h"
>  #include "gimple-iterator.h"
>  #include "gimple-pretty-print.h"
> @@ -40,7 +42,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple-fold.h"
>  #include "gimplify-me.h"
>
> -/* This file implements loop splitting, i.e. transformation of loops like
> +/* This file implements two kinds of loop splitting.
> +
> +   One transformation of loops like:
>
>     for (i = 0; i < 100; i++)
>       {
> @@ -670,6 +674,803 @@ tree_ssa_split_loops (void)
>    return 0;
>  }
>
> +
> +/* Another transformation of loops like:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))
> +         a_j = ...;  // change at least one a_j
> +       else
> +         S;          // not change any a_j
> +     }
> +
> +   into:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))
> +         a_j = ...;
> +       else
> +         {
> +           S;
> +           i = NEXT ();
> +           break;
> +         }
> +     }
> +
> +   for (; CHECK (i); i = NEXT ())
> +     {
> +       S;
> +     }
> +
> +   */
> +
> +/* Data structure to hold temporary information during loop split upon
> +   semi-invariant conditional statement. */
> +class split_info {
> +public:
> +  /* Array of all basic blocks in a loop, returned by get_loop_body(). */
> +  basic_block *bbs;
> +
> +  /* All memory store/clobber statements in a loop. */
> +  auto_vec<gimple *> stores;
> +
> +  /* Whether above memory stores vector has been filled. */
> +  bool set_stores;
> +
> +  /* Semi-invariant conditional statement, upon which to split loop. */
> +  gcond *cond;
> +
> +  split_info () : bbs (NULL),  set_stores (false), cond (NULL) { }
> +
> +  ~split_info ()
> +    {
> +      if (bbs)
> +        free (bbs);
> +    }
> +};
> +
> +/* Find all statements with memory-write effect in a loop, including memory
> +   store and non-pure function call, and keep those in a vector. This work
> +   is done only once, because the vector should stay constant during the
> +   analysis stage of the semi-invariant condition. */
> +
> +static void
> +find_vdef_in_loop (struct loop *loop)
> +{
> +  split_info *info = (split_info *) loop->aux;
> +  gphi *vphi = get_virtual_phi (loop->header);
> +
> +  /* Indicate memory store vector has been filled. */
> +  info->set_stores = true;
> +
> +  /* If loop contains memory operation, there must be a virtual PHI node in
> +     loop header basic block. */
> +  if (vphi == NULL)
> +    return;
> +
> +  /* All virtual SSA names inside the loop are connected to be a cyclic
> +     graph via virtual PHI nodes. The virtual PHI node in loop header just
> +     links the first and the last virtual SSA names, by using the last as
> +     PHI operand to define the first. */
> +  const edge latch = loop_latch_edge (loop);
> +  const tree first = gimple_phi_result (vphi);
> +  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
> +
> +  /* The virtual SSA cyclic graph might consist of only one SSA name that
> +     is defined by itself.
> +
> +        .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
> +
> +     This means the loop contains only memory loads, so we can skip it. */
> +  if (first == last)
> +    return;
> +
> +  auto_vec<gimple *> others;
> +  auto_vec<tree> worklist;
> +  auto_bitmap visited;
> +
> +  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
> +  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
> +  worklist.safe_push (last);
> +
> +  do
> +    {
> +      tree vuse = worklist.pop ();
> +      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
> +
> +      /* We mark the first and last SSA names as visited at the beginning,
> +         and walk backward from the last SSA name toward the first, which
> +         ensures that this do-while will not touch SSA names defined
> +         outside of the loop. */
> +      gcc_assert (gimple_bb (stmt)
> +                  && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
> +
> +      if (gimple_code (stmt) == GIMPLE_PHI)
> +        {
> +          gphi *phi = as_a <gphi *> (stmt);
> +
> +          for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> +            {
> +              tree arg = gimple_phi_arg_def (stmt, i);
> +
> +              if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
> +                worklist.safe_push (arg);
> +            }
> +        }
> +      else
> +        {
> +          tree prev = gimple_vuse (stmt);
> +
> +          /* A non-pure call statement is conservatively assumed to impact
> +             all memory locations. So place call statements ahead of other
> +             memory stores in the vector, with the idea of using them as
> +             shortcut terminators for memory alias analysis, a kind of
> +             compile-time optimization. */
> +          if (gimple_code (stmt) == GIMPLE_CALL)
> +            info->stores.safe_push (stmt);
> +          else
> +            others.safe_push (stmt);
> +
> +          if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
> +            worklist.safe_push (prev);
> +        }
> +    } while (!worklist.is_empty ());
> +
> +  info->stores.safe_splice (others);
> +}
> +
> +
> +/* Given a memory load or pure call statement, check whether it is impacted
> +   by some memory store in the loop excluding those basic blocks dominated
> +   by SKIP_HEAD (those basic blocks always correspond to one branch of
> +   a conditional statement). If SKIP_HEAD is NULL, all basic blocks of the
> +   loop are checked. */
> +
> +static bool
> +vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                       const_basic_block skip_head)
> +{
> +  split_info *info = (split_info *) loop->aux;
> +
> +  /* Collect memory store/clobber statements if that has not been done yet. */
> +  if (!info->set_stores)
> +    find_vdef_in_loop (loop);
> +
> +  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
> +  ao_ref ref;
> +  gimple *store;
> +  unsigned i;
> +
> +  ao_ref_init (&ref, rhs);
> +
> +  FOR_EACH_VEC_ELT (info->stores, i, store)
> +    {
> +      /* Skip those basic blocks dominated by SKIP_HEAD. */
> +      if (skip_head
> +          && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
> +        continue;
> +
> +      /* For a pure call, it is assumed to be impacted by any memory store.
> +         For a memory load, use memory alias analysis to check that. */
> +      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
> +        return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Forward declaration */
> +
> +static bool
> +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                       const_basic_block skip_head);
> +
> +/* Suppose one condition branch, led by SKIP_HEAD, is not executed in a
> +   certain iteration; check whether an SSA name remains unchanged in the
> +   next iteration. We call this characteristic semi-invariantness.
> +   SKIP_HEAD might be NULL; if so, nothing is excluded, and all basic blocks
> +   and control flows in the loop will be considered. */
> +
> +static bool
> +ssa_semi_invariant_p (struct loop *loop, const tree name,
> +                      const_basic_block skip_head)
> +{
> +  gimple *def = SSA_NAME_DEF_STMT (name);
> +  const_basic_block def_bb = gimple_bb (def);
> +
> +  /* An SSA name defined outside a loop is definitely semi-invariant. */
> +  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
> +    return true;
> +
> +  /* This function is used to check semi-invariantness of a condition
> +     statement, and SKIP_HEAD is always given as head of one of its
> +     branches. So it implies that SSA name to check should be defined
> +     before the conditional statement, and also before SKIP_HEAD. */
> +
> +  if (gimple_code (def) == GIMPLE_PHI)
> +    {
> +      /* In a normal loop, if a PHI node is located not in loop header, all
> +         its source operands should be defined inside the loop. As we
> +         mentioned before, these source definitions are ahead of SKIP_HEAD,
> +         and will not be bypassed. Therefore, in each iteration, any of
> +         these sources might be value provider to the SSA name, which for
> +         sure should not be seen as invariant. */
> +      if (def_bb != loop->header || !skip_head)
> +        return false;
> +
> +      const_edge latch = loop_latch_edge (loop);
> +      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
> +
> +      /* A PHI node in loop header always contains two source operands,
> +         one is initial value, the other is the copy of last iteration
> +         through loop latch, we call it latch value. From this PHI node
> +         to definition of latch value, if excluding those basic blocks
> +         dominated by SKIP_HEAD, there is no definition of other version
> +         of same variable, SSA name defined by the PHI node is
> +         semi-invariant.
> +
> +                         loop entry
> +                              |     .--- latch ---.
> +                              |     |             |
> +                              v     v             |
> +                  x_1 = PHI <x_0,  x_3>           |
> +                           |                      |
> +                           v                      |
> +              .------- if (cond) -------.         |
> +              |                         |         |
> +              |                     [ SKIP ]      |
> +              |                         |         |
> +              |                     x_2 = ...     |
> +              |                         |         |
> +              '---- T ---->.<---- F ----'         |
> +                           |                      |
> +                           v                      |
> +                  x_3 = PHI <x_1, x_2>            |
> +                           |                      |
> +                           '----------------------'
> +
> +        Suppose in some iteration, execution flow in the above graph goes
> +        through the true branch, so the source value defined in the false
> +        branch (x_2) is skipped. Then x_3 only comes from x_1, and x_1 in
> +        the next iteration is defined by x_3, so x_1 will never change if
> +        COND always chooses the true branch from then on. */
> +
> +      while (from != name)
> +        {
> +          /* A new value comes from a CONSTANT. */
> +          if (TREE_CODE (from) != SSA_NAME)
> +            return false;
> +
> +          gimple *stmt = SSA_NAME_DEF_STMT (from);
> +          const_basic_block bb = gimple_bb (stmt);
> +
> +          /* A new value comes from outside of loop. */
> +          if (!bb || !flow_bb_inside_loop_p (loop, bb))
> +            return false;
> +
> +          from = NULL_TREE;
> +
> +          if (gimple_code (stmt) == GIMPLE_PHI)
> +            {
> +              gphi *phi = as_a <gphi *> (stmt);
> +
> +              for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> +                {
> +                  const_edge e = gimple_phi_arg_edge (phi, i);
> +
> +                  /* Skip redefinition from basic blocks being excluded. */
> +                  if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
> +                    {
> +                      /* There is more than one source operand that can
> +                         provide a value to the SSA name. */
> +                      if (from)
> +                        return false;
> +
> +                      from = gimple_phi_arg_def (phi, i);
> +                    }
> +                }
> +            }
> +          else if (gimple_code (stmt) == GIMPLE_ASSIGN)
> +            {
> +              /* For simple value copy, check its rhs instead. */
> +              if (gimple_assign_ssa_name_copy_p (stmt))
> +                from = gimple_assign_rhs1 (stmt);
> +            }
> +
> +          /* Any other kind of definition is deemed to introduce a new value
> +             to the SSA name. */
> +          if (!from)
> +            return false;
> +        }
> +      return true;
> +    }
> +
> +  /* Value originated from volatile memory load or return of normal (non-
> +     const/pure) call should not be treated as constant in each iteration. */
> +  if (gimple_has_side_effects (def))
> +    return false;
> +
> +  /* Check if any memory store may kill memory load at this place. */
> +  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
> +    return false;
> +
> +  /* Check operands of definition statement of the SSA name. */
> +  return stmt_semi_invariant_p (loop, def, skip_head);
> +}
> +
> +/* Check whether a statement is semi-invariant, iff all its operands are
> +   semi-invariant. */
> +
> +static bool
> +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                       const_basic_block skip_head)
> +{
> +  ssa_op_iter iter;
> +  tree use;
> +
> +  /* Although an operand of a statement might be an SSA name, CONSTANT or
> +     VARDECL, here we only need to check SSA name operands. A VARDECL
> +     operand involves a memory load, so it must already have been checked
> +     in ssa_semi_invariant_p prior to invocation of this function. */
> +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
> +    {
> +      if (!ssa_semi_invariant_p (loop, use, skip_head))
> +        return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Determine whether, when one branch of a conditional statement is not
> +   taken, we can exclude the leading basic block of the branch and those
> +   basic blocks dominated by the leading one. */
> +
> +static bool
> +can_branch_be_excluded (basic_block branch_bb)
> +{
> +  if (single_pred_p (branch_bb))
> +    return true;
> +
> +  edge e;
> +  edge_iterator ei;
> +
> +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
> +    {
> +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
> +        continue;
> +
> +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
> +        continue;
> +
> +       /* The branch can be reached through other path, not just from the
> +          conditional statement. */
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Find out which branch of a conditional statement is invariant. That
> +   is: once the branch is selected in certain loop iteration, any operand
> +   that contributes to computation of the conditional statement remains
> +   unchanged in all following iterations. */
> +
> +static int
> +get_cond_invariant_branch (struct loop *loop, gcond *cond)
> +{
> +  basic_block cond_bb = gimple_bb (cond);
> +  basic_block targ_bb[2];
> +  bool invar[2];
> +  unsigned invar_checks;
> +
> +  for (unsigned i = 0; i < 2; i++)
> +    {
> +      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
> +
> +      /* One branch directs to loop exit, no need to perform loop split upon
> +         this conditional statement. Firstly, it is trivial if the exit
> +         branch is semi-invariant, for the statement is just loop-breaking.
> +         Secondly, if the opposite branch is semi-invariant, it means that
> +         the statement is real loop-invariant, which is covered by loop
> +         unswitch. */
> +      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
> +        return -1;
> +    }
> +
> +  invar_checks = 0;
> +
> +  for (unsigned i = 0; i < 2; i++)
> +    {
> +      invar[!i] = false;
> +
> +      if (!can_branch_be_excluded (targ_bb[i]))
> +        continue;
> +
> +      /* Given a semi-invariant branch, if its opposite branch dominates
> +         loop latch, it and its following trace will only be executed in
> +         final iteration of loop, namely it is not part of repeated body
> +         of the loop. Similar to the above case that the branch is loop
> +         exit, no need to split loop. */
> +      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
> +        continue;
> +
> +      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
> +      invar_checks++;
> +    }
> +
> +  /* Both branches being invariant (handled by loop unswitch) or both
> +     being variant is not what we want. */
> +  if (invar[0] ^ !invar[1])
> +    return -1;
> +
> +  /* Found a real loop-invariant condition, do nothing. */
> +  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
> +    return -1;
> +
> +  return invar[1];
> +}
> +
> +/* Return TRUE if a conditional statement in a normal loop is also inside
> +   a nested non-recognized loop, such as an irreducible loop. */
> +
> +static bool
> +is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb,
> +                        int branch)
> +{
> +  basic_block branch_bb = EDGE_SUCC (cond_bb, branch)->dest;
> +
> +  if (cond_bb == loop->header || branch_bb == loop->latch)
> +    return false;
> +
> +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> +  auto_vec<basic_block> worklist;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    bbs[i]->flags &= ~BB_REACHABLE;
> +
> +  /* Mark latch basic block as visited to be end point for reachability
> +     traversal. */
> +  loop->latch->flags |= BB_REACHABLE;
> +
> +  gcc_assert (flow_bb_inside_loop_p (loop, branch_bb));
> +
> +  /* Start from specified branch, the opposite branch is ignored for it
> +     will not be executed. */
> +  branch_bb->flags |= BB_REACHABLE;
> +  worklist.safe_push (branch_bb);
> +
> +  do
> +    {
> +      basic_block bb = worklist.pop ();
> +      edge e;
> +      edge_iterator ei;
> +
> +      FOR_EACH_EDGE (e, ei, bb->succs)
> +        {
> +          basic_block succ_bb = e->dest;
> +
> +          if (succ_bb == cond_bb)
> +            return true;
> +
> +          if (!flow_bb_inside_loop_p (loop, succ_bb))
> +            continue;
> +
> +          if (succ_bb->flags & BB_REACHABLE)
> +            continue;
> +
> +          succ_bb->flags |= BB_REACHABLE;
> +          worklist.safe_push (succ_bb);
> +        }
> +    } while (!worklist.is_empty ());
> +
> +  return false;
> +}
> +
> +
> +/* Calculate increased code size measured by estimated insn number if
> +   applying loop split upon certain branch of a conditional statement. */
> +
> +static int
> +compute_added_num_insns (struct loop *loop, const_basic_block cond_bb,
> +                         int branch)
> +{
> +  const_basic_block targ_bb_var = EDGE_SUCC (cond_bb, !branch)->dest;
> +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> +  int num = 0;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    {
> +      /* Do not count basic blocks that are only in the opposite branch. */
> +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var))
> +        continue;
> +
> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi);
> +           gsi_next (&gsi))
> +        num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);
> +    }
> +
> +  return num;
> +}
> +
> +/* Return true if it is eligible and profitable to perform loop split upon
> +   a conditional statement. */
> +
> +static bool
> +can_split_loop_on_cond (struct loop *loop, gcond *cond)
> +{
> +  int branch = get_cond_invariant_branch (loop, cond);
> +
> +  if (branch < 0)
> +    return false;
> +
> +  basic_block cond_bb = gimple_bb (cond);
> +
> +  /* Add a threshold for increased code size to disable loop split. */
> +  if (compute_added_num_insns (loop, cond_bb, branch) >
> +      PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))
> +    return false;
> +
> +  /* In each iteration, the conditional statement candidate should be
> +     executed only once. */
> +  if (is_cond_in_hidden_loop (loop, cond_bb, branch))
> +    return false;
> +
> +  profile_probability prob = EDGE_SUCC (cond_bb, branch)->probability;
> +
> +  /* When accurate profile information is available and the execution
> +     frequency of the branch is too low, do not split. */
> +  if (prob.reliable_p ())
> +    {
> +      int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB);
> +
> +      if (prob < profile_probability::always ().apply_scale (thres, 100))
> +        return false;
> +    }
> +
> +  /* Temporarily keep branch index in conditional statement. */
> +  gimple_set_plf (cond, GF_PLF_1, branch);
> +  return true;
> +}
> +
> +/* Traverse all conditional statements in a loop, to find out a good
> +   candidate upon which we can do loop split. */
> +
> +static bool
> +mark_cond_to_split_loop (struct loop *loop)
> +{
> +  split_info *info = new split_info ();
> +  basic_block *bbs = info->bbs = get_loop_body (loop);
> +
> +  /* Allocate an area to keep temporary info, and associate its address
> +     with loop aux field. */
> +  loop->aux = info;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    {
> +      basic_block bb = bbs[i];
> +
> +      /* Skip statements in a recognized inner loop, because we want the
> +         conditional statement to execute at most once in each iteration. */
> +      if (bb->loop_father != loop)
> +        continue;
> +
> +      /* This check is not strictly required. With it, we ensure that
> +         the conditional statement executes at least once in each
> +         iteration. */
> +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> +        continue;
> +
> +      gimple *last = last_stmt (bb);
> +
> +      if (!last || gimple_code (last) != GIMPLE_COND)
> +        continue;
> +
> +      gcond *cond = as_a <gcond *> (last);
> +
> +      if (can_split_loop_on_cond (loop, cond))
> +        {
> +          info->cond = cond;
> +          return true;
> +        }
> +    }
> +
> +  delete info;
> +  loop->aux = NULL;
> +
> +  return false;
> +}
> +
> +/* Given a loop with a chosen conditional statement candidate, perform loop
> +   split transformation illustrated as the following graph.
> +
> +               .-------T------ if (true) ------F------.
> +               |                    .---------------. |
> +               |                    |               | |
> +               v                    |               v v
> +          pre-header                |            pre-header
> +               | .------------.     |                 | .------------.
> +               | |            |     |                 | |            |
> +               | v            |     |                 | v            |
> +             header           |     |               header           |
> +               |              |     |                 |              |
> +       [ bool r = cond; ]     |     |                 |              |
> +               |              |     |                 |              |
> +      .---- if (r) -----.     |     |        .--- if (true) ---.     |
> +      |                 |     |     |        |                 |     |
> +  invariant             |     |     |    invariant             |     |
> +      |                 |     |     |        |                 |     |
> +      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
> +               |              |    /                  |              |
> +             stmts            |   /                 stmts            |
> +               |              |  /                    |              |
> +              / \             | /                    / \             |
> +     .-------*   *       [ if (!r) ]        .-------*   *            |
> +     |           |            |             |           |            |
> +     |         latch          |             |         latch          |
> +     |           |            |             |           |            |
> +     |           '------------'             |           '------------'
> +     '------------------------. .-----------'
> +             loop1            | |                   loop2
> +                              v v
> +                             exits
> +
> +   In the graph, loop1 represents the part derived from the original loop,
> +   and loop2 is duplicated using loop_version (), which corresponds to the
> +   part of the original loop being split out. In loop1, a new bool temporary
> +   (r) is introduced to keep the value of the condition result. On the
> +   original latch edge of loop1, we insert a new conditional statement whose
> +   value comes from that temporary (r); one of its branches goes back to the
> +   loop1 header as a latch edge, and the other goes to the loop2 pre-header
> +   as an entry edge. In loop2, we abandon the variant branch of the
> +   conditional statement candidate by setting a constant bool condition,
> +   based on which branch is semi-invariant. */
> +
> +static bool
> +split_loop_for_cond (struct loop *loop1)
> +{
> +  split_info *info = (split_info *) loop1->aux;
> +  gcond *cond = info->cond;
> +  basic_block cond_bb = gimple_bb (cond);
> +  int branch = gimple_plf (cond, GF_PLF_1);
> +  bool true_invar = !!(EDGE_SUCC (cond_bb, branch)->flags & EDGE_TRUE_VALUE);
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
> +               current_function_name (), loop1->num,
> +               true_invar ? "T" : "F", cond_bb->index);
> +      print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
> +    }
> +
> +  initialize_original_copy_tables ();
> +
> +  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> +                                     profile_probability::always (),
> +                                     profile_probability::never (),
> +                                     profile_probability::always (),
> +                                     profile_probability::always (),
> +                                     true);
> +  if (!loop2)
> +    {
> +      free_original_copy_tables ();
> +      return false;
> +    }
> +
> +  /* Generate a bool type temporary to hold result of the condition. */
> +  tree tmp = make_ssa_name (boolean_type_node);
> +  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> +  gimple *stmt = gimple_build_assign (tmp,
> +                                      gimple_cond_code (cond),
> +                                      gimple_cond_lhs (cond),
> +                                      gimple_cond_rhs (cond));
> +
> +  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
> +  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
> +  update_stmt (cond);
> +
> +  /* Replace the condition in loop2 with a bool constant to let pass
> +     manager remove the variant branch after current pass finishes. */
> +  basic_block cond_bb_copy = get_bb_copy (cond_bb);
> +  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
> +
> +  if (true_invar)
> +    gimple_cond_make_true (cond_copy);
> +  else
> +    gimple_cond_make_false (cond_copy);
> +
> +  update_stmt (cond_copy);
> +
> +  /* Insert a new conditional statement on latch edge of loop1. This
> +     statement acts as a switch to transfer execution from loop1 to
> +     loop2, when loop1 enters into invariant state. */
> +  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
> +  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
> +  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
> +                                          NULL_TREE, NULL_TREE);
> +
> +  gsi = gsi_last_bb (break_bb);
> +  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
> +
> +  edge to_loop1 = single_succ_edge (break_bb);
> +  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
> +
> +  to_loop1->flags &= ~EDGE_FALLTHRU;
> +
> +  if (true_invar)
> +    {
> +      to_loop1->flags |= EDGE_FALSE_VALUE;
> +      to_loop2->flags |= EDGE_TRUE_VALUE;
> +    }
> +  else
> +    {
> +      to_loop1->flags |= EDGE_TRUE_VALUE;
> +      to_loop2->flags |= EDGE_FALSE_VALUE;
> +    }
> +
> +  update_ssa (TODO_update_ssa);
> +
> +  /* Due to introduction of a control flow edge from loop1 latch to loop2
> +     pre-header, we should update PHIs in loop2 to reflect this connection
> +     between loop1 and loop2. */
> +  connect_loop_phis (loop1, loop2, to_loop2);
> +
> +  free_original_copy_tables ();
> +
> +  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
> +
> +  return true;
> +}
> +
> +/* Main entry point to perform loop splitting for suitable if-conditions
> +   in all loops. */
> +
> +static unsigned int
> +tree_ssa_split_loops_for_cond (void)
> +{
> +  struct loop *loop;
> +  auto_vec<struct loop *> loop_list;
> +  bool changed = false;
> +  unsigned i;
> +
> +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> +    loop->aux = NULL;
> +
> +  /* Go through all loops starting from innermost. */
> +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> +    {
> +      /* Put the loop in a list if a conditional statement candidate is
> +         found in it. This is the analysis stage; nothing in the function
> +         is changed here. */
> +      if (!loop->aux
> +          && !optimize_loop_for_size_p (loop)
> +          && mark_cond_to_split_loop (loop))
> +        loop_list.safe_push (loop);
> +
> +      /* If any of our inner loops was split, don't split us,
> +         and mark our containing loop as having had splits as well. */
> +      loop_outer (loop)->aux = loop->aux;
> +    }
> +
> +  FOR_EACH_VEC_ELT (loop_list, i, loop)
> +    {
> +      /* Extract the selected loop and perform loop split. This is the
> +         transformation stage. */
> +      changed |= split_loop_for_cond (loop);
> +
> +      delete (split_info *) loop->aux;
> +    }
> +
> +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> +    loop->aux = NULL;
> +
> +  if (changed)
> +    return TODO_cleanup_cfg;
> +  return 0;
> +}
> +
> +
>  /* Loop splitting pass.  */
>
>  namespace {
> @@ -716,3 +1517,48 @@ make_pass_loop_split (gcc::context *ctxt)
>  {
>    return new pass_loop_split (ctxt);
>  }
> +
> +namespace {
> +
> +const pass_data pass_data_cond_loop_split =
> +{
> +  GIMPLE_PASS, /* type */
> +  "cond_lsplit", /* name */
> +  OPTGROUP_LOOP, /* optinfo_flags */
> +  TV_COND_LOOP_SPLIT, /* tv_id */
> +  PROP_cfg, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_cond_loop_split : public gimple_opt_pass
> +{
> +public:
> +  pass_cond_loop_split (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_cond_loop_split, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) { return flag_split_loops != 0; }
> +  virtual unsigned int execute (function *);
> +
> +}; // class pass_cond_loop_split
> +
> +unsigned int
> +pass_cond_loop_split::execute (function *fun)
> +{
> +  if (number_of_loops (fun) <= 1)
> +    return 0;
> +
> +  return tree_ssa_split_loops_for_cond ();
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_cond_loop_split (gcc::context *ctxt)
> +{
> +  return new pass_cond_loop_split (ctxt);
> +}
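
The profitability checks in can_split_loop_on_cond () above can be summarized in a small standalone sketch. It is not GCC code: the struct split_candidate, its fields, and profitable_to_split are hypothetical names used only for illustration. The defaults 100 and 30 are the values this patch gives max-cond-loop-split-insns and min-cond-loop-split-prob in params.def. The semi-invariance and hidden-loop checks are deliberately left out.

```cpp
#include <cassert>

// Hypothetical summary of one split candidate (not a GCC type).
struct split_candidate
{
  int added_insns;          // estimated insns duplicated by the split
  bool prob_reliable;       // is the branch probability trustworthy?
  int branch_prob_percent;  // probability of the invariant branch, 0..100
};

// Mirrors the two thresholds checked in can_split_loop_on_cond ():
// reject when the code growth is too large, or when a reliable profile
// says the invariant branch is taken too rarely.
bool
profitable_to_split (const split_candidate &c,
                     int max_insns = 100,   // max-cond-loop-split-insns
                     int min_prob = 30)     // min-cond-loop-split-prob
{
  assert (c.branch_prob_percent >= 0 && c.branch_prob_percent <= 100);

  if (c.added_insns > max_insns)
    return false;

  // An unreliable profile does not veto the split.
  if (c.prob_reliable && c.branch_prob_percent < min_prob)
    return false;

  return true;
}
```

Under these assumptions, a candidate that adds 50 insns with a reliable 40% probability is accepted, a reliable 29% is rejected, and an unreliable profile skips the probability test altogether.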

* Re: [PATCH] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-03-12  8:33 ` Richard Biener
@ 2019-03-13  2:13   ` Feng Xue OS
  2019-03-13  9:43     ` Kyrill Tkachov
  2019-05-06  3:04   ` Feng Xue OS
  1 sibling, 1 reply; 31+ messages in thread
From: Feng Xue OS @ 2019-03-13  2:13 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

Richard,

    Thanks for your comment. Yes, it is a kind of jump threading that uses knowledge of the loop structure. What is the rough timeline for GCC 10?


Regards,

Feng


________________________________
From: Richard Biener <richard.guenther@gmail.com>
Sent: Tuesday, March 12, 2019 4:31:49 PM
To: Feng Xue OS
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Loop split upon semi-invariant condition (PR tree-optimization/89134)

On Tue, Mar 12, 2019 at 7:20 AM Feng Xue OS <fxue@os.amperecomputing.com> wrote:
>
> This patch implements a loop transformation on one of the loop's conditional statements, which we call semi-invariant because its computation is affected in only one of its branches.
>
> Suppose a loop as:
>
>     void f (std::map<int, int> m)
>     {
>         for (auto it = m.begin (); it != m.end (); ++it) {
>             /* if (b) is semi-invariant. */
>             if (b) {
>                 b = do_something();    /* Has effect on b */
>             } else {
>                                                         /* No effect on b */
>             }
>             statements;                      /* Also no effect on b */
>         }
>     }
>
> A transformation, kind of loop split, could be:
>
>     void f (std::map<int, int> m)
>     {
>         for (auto it = m.begin (); it != m.end (); ++it) {
>             if (b) {
>                 b = do_something();
>             } else {
>                 ++it;
>                 statements;
>                 break;
>             }
>             statements;
>         }
>
>         for (; it != m.end (); ++it) {
>             statements;
>         }
>     }
>
> If "statements" contains nothing, the second loop becomes an empty one, which can be removed. (This part will be given in another patch.) And if "statements" are straight-line instructions, we get an opportunity to vectorize the second loop. In practice, this optimization is found to improve a real application by 7%.
>
> Since it is just a kind of loop split, the code is mainly placed in the existing tree-ssa-loop-split module, is controlled by -fsplit-loops, and is enabled with -O3.

Note the transform itself is jump-threading with the threading
duplicating a whole CFG cycle.
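
As a source-level illustration of this remark, here is a minimal sketch, assuming a hypothetical keep_going () in place of do_something () and a plain sum in place of "statements". Once b turns false, control is threaded into a second copy of the loop body in which the test on b is gone. The names run_original, run_split and keep_going are made up for this example and are not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for do_something (): b stays true until a
// negative element is seen.
static bool
keep_going (int x)
{
  return x >= 0;
}

// Original loop: "if (b)" is semi-invariant, because only its true
// branch can change b.
int
run_original (const std::vector<int> &v)
{
  bool b = true;
  int sum = 0;

  for (std::size_t i = 0; i < v.size (); ++i)
    {
      if (b)
        b = keep_going (v[i]);
      sum += v[i];              // "statements": no effect on b
    }
  return sum;
}

// Split form: the first time b is found false, finish the current
// iteration and continue in a second loop that no longer tests b.
int
run_split (const std::vector<int> &v)
{
  bool b = true;
  int sum = 0;
  std::size_t i = 0;

  for (; i < v.size (); ++i)
    {
      if (b)
        b = keep_going (v[i]);
      else
        {
          sum += v[i];
          ++i;
          break;
        }
      sum += v[i];
    }

  for (; i < v.size (); ++i)
    sum += v[i];                // branch-free copy of the loop body

  return sum;
}
```

Both functions are meant to return the same value for any input; the second loop of run_split is the straight-line body that a later pass could vectorize.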

I didn't look at the patch details yet since this is suitable for GCC 10 only.

Thanks for implementing this.
Richard.

> Feng
>
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 64bf6017d16..a6c2878d652 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,23 @@
> +2019-03-12  Feng Xue <fxue@os.amperecomputing.com>
> +
> +       PR tree-optimization/89134
> +       * doc/invoke.texi (max-cond-loop-split-insns): Document new --params.
> +       (min-cond-loop-split-prob): Likewise.
> +       * params.def: Add max-cond-loop-split-insns, min-cond-loop-split-prob.
> +       * passes.def (pass_cond_loop_split): New pass.
> +       * timevar.def (TV_COND_LOOP_SPLIT): New time variable.
> +       * tree-pass.h (make_pass_cond_loop_split): New declaration.
> +       * tree-ssa-loop-split.c (split_info): New class.
> +       (find_vdef_in_loop, vuse_semi_invariant_p): New functions.
> +       (ssa_semi_invariant_p, stmt_semi_invariant_p): Likewise.
> +       (can_branch_be_excluded, get_cond_invariant_branch): Likewise.
> +       (is_cond_in_hidden_loop, compute_added_num_insns): Likewise.
> +       (can_split_loop_on_cond, mark_cond_to_split_loop): Likewise.
> +       (split_loop_for_cond, tree_ssa_split_loops_for_cond): Likewise.
> +       (pass_data_cond_loop_split): New variable.
> +       (pass_cond_loop_split): New class.
> +       (make_pass_cond_loop_split): New function.
> +
>  2019-03-11  Jakub Jelinek  <jakub@redhat.com>
>
>         PR middle-end/89655
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index df0883f2fc9..f5e09bd71fd 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11316,6 +11316,14 @@ The maximum number of branches unswitched in a single loop.
>  @item lim-expensive
>  The minimum cost of an expensive expression in the loop invariant motion.
>
> +@item max-cond-loop-split-insns
> +The maximum number of insns added by a loop split on a
> +semi-invariant condition statement.
> +
> +@item min-cond-loop-split-prob
> +The minimum probability threshold of a semi-invariant condition
> +statement to trigger a loop split.
> +
>  @item iv-consider-all-candidates-bound
>  Bound on number of candidates for induction variables, below which
>  all candidates are considered for each use in induction variable
> diff --git a/gcc/params.def b/gcc/params.def
> index 3f1576448be..2e067526958 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -386,6 +386,18 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
>         "The maximum number of unswitchings in a single loop.",
>         3, 0, 0)
>
> +/* The maximum number of increased insns due to loop split on semi-invariant
> +   condition statement.  */
> +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> +       "max-cond-loop-split-insns",
> +       "The maximum number of insns to be increased due to loop split on semi-invariant condition statement.",
> +       100, 0, 0)
> +
> +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> +       "min-cond-loop-split-prob",
> +       "The minimum threshold for probability of semi-invariant condition statement to trigger loop split.",
> +       30, 0, 100)
> +
>  /* The maximum number of insns in loop header duplicated by the copy loop
>     headers pass.  */
>  DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 446a7c48276..bde7f4c50c0 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -265,6 +265,7 @@ along with GCC; see the file COPYING3.  If not see
>           NEXT_PASS (pass_tree_unswitch);
>           NEXT_PASS (pass_scev_cprop);
>           NEXT_PASS (pass_loop_split);
> +         NEXT_PASS (pass_cond_loop_split);
>           NEXT_PASS (pass_loop_versioning);
>           NEXT_PASS (pass_loop_jam);
>           /* All unswitching, final value replacement and splitting can expose
> diff --git a/gcc/timevar.def b/gcc/timevar.def
> index 54154464a58..39f2df0e3ec 100644
> --- a/gcc/timevar.def
> +++ b/gcc/timevar.def
> @@ -189,6 +189,7 @@ DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
>  DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
>  DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
>  DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
> +DEFTIMEVAR (TV_COND_LOOP_SPLIT       , "loop splitting for conditions")
>  DEFTIMEVAR (TV_LOOP_JAM              , "unroll and jam")
>  DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
>  DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 47be59b2a11..f441ba36871 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -367,6 +367,7 @@ extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_linterchange (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_cond_loop_split (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_loop_jam (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 999c9a30366..d287a0d7d4c 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -32,7 +32,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-loop.h"
>  #include "tree-ssa-loop-manip.h"
>  #include "tree-into-ssa.h"
> +#include "tree-inline.h"
>  #include "cfgloop.h"
> +#include "params.h"
>  #include "tree-scalar-evolution.h"
>  #include "gimple-iterator.h"
>  #include "gimple-pretty-print.h"
> @@ -40,7 +42,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple-fold.h"
>  #include "gimplify-me.h"
>
> -/* This file implements loop splitting, i.e. transformation of loops like
> +/* This file implements two kinds of loop splitting.
> +
> +   One transformation of loops like:
>
>     for (i = 0; i < 100; i++)
>       {
> @@ -670,6 +674,803 @@ tree_ssa_split_loops (void)
>    return 0;
>  }
>
> +
> +/* Another transformation of loops like:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))
> +         a_j = ...;  // changes at least one a_j
> +       else
> +         S;          // does not change any a_j
> +     }
> +
> +   into:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))
> +         a_j = ...;
> +       else
> +         {
> +           S;
> +           i = NEXT ();
> +           break;
> +         }
> +     }
> +
> +   for (; CHECK (i); i = NEXT ())
> +     {
> +       S;
> +     }
> +
> +   */
> +
> +/* Data structure to hold temporary information during loop split upon
> +   semi-invariant conditional statement. */
> +class split_info {
> +public:
> +  /* Array of all basic blocks in a loop, returned by get_loop_body(). */
> +  basic_block *bbs;
> +
> +  /* All memory store/clobber statements in a loop. */
> +  auto_vec<gimple *> stores;
> +
> +  /* Whether above memory stores vector has been filled. */
> +  bool set_stores;
> +
> +  /* Semi-invariant conditional statement, upon which to split loop. */
> +  gcond *cond;
> +
> +  split_info () : bbs (NULL), set_stores (false), cond (NULL) { }
> +
> +  ~split_info ()
> +    {
> +      if (bbs)
> +        free (bbs);
> +    }
> +};
> +
> +/* Find all statements with memory-write effect in a loop, including memory
> +   stores and non-pure function calls, and keep them in a vector. This work
> +   is done only once, since the vector stays constant during the analysis
> +   stage of the semi-invariant condition. */
> +
> +static void
> +find_vdef_in_loop (struct loop *loop)
> +{
> +  split_info *info = (split_info *) loop->aux;
> +  gphi *vphi = get_virtual_phi (loop->header);
> +
> +  /* Indicate memory store vector has been filled. */
> +  info->set_stores = true;
> +
> +  /* If loop contains memory operation, there must be a virtual PHI node in
> +     loop header basic block. */
> +  if (vphi == NULL)
> +    return;
> +
> +  /* All virtual SSA names inside the loop are connected to be a cyclic
> +     graph via virtual PHI nodes. The virtual PHI node in loop header just
> +     links the first and the last virtual SSA names, by using the last as
> +     PHI operand to define the first. */
> +  const edge latch = loop_latch_edge (loop);
> +  const tree first = gimple_phi_result (vphi);
> +  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
> +
> +  /* The virtual SSA cyclic graph might consist of only one SSA name, which
> +     is defined by itself.
> +
> +        .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
> +
> +     This means the loop contains only memory loads, so we can skip it. */
> +  if (first == last)
> +    return;
> +
> +  auto_vec<gimple *> others;
> +  auto_vec<tree> worklist;
> +  auto_bitmap visited;
> +
> +  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
> +  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
> +  worklist.safe_push (last);
> +
> +  do
> +    {
> +      tree vuse = worklist.pop ();
> +      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
> +
> +      /* We mark the first and last SSA names as visited at the beginning,
> +         and start the process backward from the last SSA name toward the
> +         first, which ensures that this do-while will not touch SSA names
> +         defined outside of the loop. */
> +      gcc_assert (gimple_bb (stmt)
> +                  && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
> +
> +      if (gimple_code (stmt) == GIMPLE_PHI)
> +        {
> +          gphi *phi = as_a <gphi *> (stmt);
> +
> +          for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> +            {
> +              tree arg = gimple_phi_arg_def (stmt, i);
> +
> +              if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
> +                worklist.safe_push (arg);
> +            }
> +        }
> +      else
> +        {
> +          tree prev = gimple_vuse (stmt);
> +
> +          /* A non-pure call statement is conservatively assumed to impact
> +             all memory locations. So place call statements ahead of other
> +             memory stores in the vector, with the idea of using them as
> +             shortcut terminators for memory alias analysis, as a
> +             compile-time optimization. */
> +          if (gimple_code (stmt) == GIMPLE_CALL)
> +            info->stores.safe_push (stmt);
> +          else
> +            others.safe_push (stmt);
> +
> +          if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
> +            worklist.safe_push (prev);
> +        }
> +    } while (!worklist.is_empty ());
> +
> +  info->stores.safe_splice (others);
> +}
> +
> +
> +/* Given a memory load or pure call statement, check whether it is impacted
> +   by some memory store in the loop excluding those basic blocks dominated
> +   by SKIP_HEAD (those basic blocks always correspond to one branch of
> +   a conditional statement). If SKIP_HEAD is NULL, all basic blocks of the
> +   loop are checked. */
> +
> +static bool
> +vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                       const_basic_block skip_head)
> +{
> +  split_info *info = (split_info *) loop->aux;
> +
> +  /* Collect memory store/clobber statements if that has not been done. */
> +  if (!info->set_stores)
> +    find_vdef_in_loop (loop);
> +
> +  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
> +  ao_ref ref;
> +  gimple *store;
> +  unsigned i;
> +
> +  ao_ref_init (&ref, rhs);
> +
> +  FOR_EACH_VEC_ELT (info->stores, i, store)
> +    {
> +      /* Skip those basic blocks dominated by SKIP_HEAD. */
> +      if (skip_head
> +          && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
> +        continue;
> +
> +      /* For a pure call, it is assumed to be impacted by any memory store.
> +         For a memory load, use memory alias analysis to check that. */
> +      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
> +        return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Forward declaration */
> +
> +static bool
> +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                       const_basic_block skip_head);
> +
> +/* Suppose one condition branch, led by SKIP_HEAD, is not executed in a
> +   certain iteration; check whether an SSA name remains unchanged in the
> +   next iteration. We call this characteristic semi-invariantness.
> +   SKIP_HEAD might be NULL; if so, nothing is excluded, and all basic
> +   blocks and control flows in the loop will be considered. */
> +
> +static bool
> +ssa_semi_invariant_p (struct loop *loop, const tree name,
> +                      const_basic_block skip_head)
> +{
> +  gimple *def = SSA_NAME_DEF_STMT (name);
> +  const_basic_block def_bb = gimple_bb (def);
> +
> +  /* An SSA name defined outside a loop is definitely semi-invariant. */
> +  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
> +    return true;
> +
> +  /* This function is used to check semi-invariantness of a condition
> +     statement, and SKIP_HEAD is always given as head of one of its
> +     branches. So it implies that SSA name to check should be defined
> +     before the conditional statement, and also before SKIP_HEAD. */
> +
> +  if (gimple_code (def) == GIMPLE_PHI)
> +    {
> +      /* In a normal loop, if a PHI node is not located in loop header, all
> +         its source operands should be defined inside the loop. As we
> +         mentioned before, these source definitions are ahead of SKIP_HEAD,
> +         and will not be bypassed. Therefore, in each iteration, any of
> +         these sources might be value provider to the SSA name, which for
> +         sure should not be seen as invariant. */
> +      if (def_bb != loop->header || !skip_head)
> +        return false;
> +
> +      const_edge latch = loop_latch_edge (loop);
> +      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
> +
> +      /* A PHI node in loop header always contains two source operands,
> +         one is the initial value, the other is the copy from the last
> +         iteration through the loop latch, which we call the latch value.
> +         If, from this PHI node to the definition of the latch value and
> +         excluding those basic blocks dominated by SKIP_HEAD, there is no
> +         definition of another version of the same variable, the SSA name
> +         defined by the PHI node is semi-invariant.
> +
> +                         loop entry
> +                              |     .--- latch ---.
> +                              |     |             |
> +                              v     v             |
> +                  x_1 = PHI <x_0,  x_3>           |
> +                           |                      |
> +                           v                      |
> +              .------- if (cond) -------.         |
> +              |                         |         |
> +              |                     [ SKIP ]      |
> +              |                         |         |
> +              |                     x_2 = ...     |
> +              |                         |         |
> +              '---- T ---->.<---- F ----'         |
> +                           |                      |
> +                           v                      |
> +                  x_3 = PHI <x_1, x_2>            |
> +                           |                      |
> +                           '----------------------'
> +
> +        Suppose in a certain iteration, execution flow in the above graph
> +        goes through the true branch, so the source value of x_3 from the
> +        false branch (x_2) is skipped, x_3 only comes from x_1, and x_1 in
> +        the next iteration is defined by x_3. Then x_1 will never change
> +        if COND always chooses the true branch from then on. */
> +
> +      while (from != name)
> +        {
> +          /* A new value comes from a CONSTANT. */
> +          if (TREE_CODE (from) != SSA_NAME)
> +            return false;
> +
> +          gimple *stmt = SSA_NAME_DEF_STMT (from);
> +          const_basic_block bb = gimple_bb (stmt);
> +
> +          /* A new value comes from outside of loop. */
> +          if (!bb || !flow_bb_inside_loop_p (loop, bb))
> +            return false;
> +
> +          from = NULL_TREE;
> +
> +          if (gimple_code (stmt) == GIMPLE_PHI)
> +            {
> +              gphi *phi = as_a <gphi *> (stmt);
> +
> +              for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> +                {
> +                  const_edge e = gimple_phi_arg_edge (phi, i);
> +
> +                  /* Skip redefinition from basic blocks being excluded. */
> +                  if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
> +                    {
> +                      /* There are more than one source operands that can
> +                         provide value to the SSA name. */
> +                      if (from)
> +                        return false;
> +
> +                      from = gimple_phi_arg_def (phi, i);
> +                    }
> +                }
> +            }
> +          else if (gimple_code (stmt) == GIMPLE_ASSIGN)
> +            {
> +              /* For simple value copy, check its rhs instead. */
> +              if (gimple_assign_ssa_name_copy_p (stmt))
> +                from = gimple_assign_rhs1 (stmt);
> +            }
> +
> +          /* Any other kind of definition is deemed to introduce a new value
> +             to the SSA name. */
> +          if (!from)
> +            return false;
> +        }
> +      return true;
> +    }
> +
> +  /* Value originated from volatile memory load or return of normal (non-
> +     const/pure) call should not be treated as constant in each iteration. */
> +  if (gimple_has_side_effects (def))
> +    return false;
> +
> +  /* Check if any memory store may kill memory load at this place. */
> +  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
> +    return false;
> +
> +  /* Check operands of definition statement of the SSA name. */
> +  return stmt_semi_invariant_p (loop, def, skip_head);
> +}
> +
> +/* Check whether a statement is semi-invariant, iff all its operands are
> +   semi-invariant. */
> +
> +static bool
> +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                       const_basic_block skip_head)
> +{
> +  ssa_op_iter iter;
> +  tree use;
> +
> +  /* An operand of a statement may be an SSA name, a constant or a VAR_DECL,
> +     but only SSA name operands need to be checked here. A VAR_DECL operand
> +     involves a memory load, and that check must already have been done in
> +     ssa_semi_invariant_p before this function is invoked. */
> +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
> +    {
> +      if (!ssa_semi_invariant_p (loop, use, skip_head))
> +        return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Determine whether, when one branch of a conditional statement is not
> +   selected, we can exclude the leading basic block of that branch and
> +   the basic blocks dominated by it. */
> +
> +static bool
> +can_branch_be_excluded (basic_block branch_bb)
> +{
> +  if (single_pred_p (branch_bb))
> +    return true;
> +
> +  edge e;
> +  edge_iterator ei;
> +
> +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
> +    {
> +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
> +        continue;
> +
> +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
> +        continue;
> +
> +      /* The branch can be reached through another path, not just from the
> +         conditional statement. */
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Find out which branch of a conditional statement is invariant. That
> +   is: once the branch is selected in certain loop iteration, any operand
> +   that contributes to computation of the conditional statement remains
> +   unchanged in all following iterations. */
> +
> +static int
> +get_cond_invariant_branch (struct loop *loop, gcond *cond)
> +{
> +  basic_block cond_bb = gimple_bb (cond);
> +  basic_block targ_bb[2];
> +  bool invar[2];
> +  unsigned invar_checks;
> +
> +  for (unsigned i = 0; i < 2; i++)
> +    {
> +      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
> +
> +      /* If one branch leads to a loop exit, there is no need to split the
> +         loop on this conditional statement. Firstly, it is trivial if the
> +         exit branch is semi-invariant, for the statement just breaks out
> +         of the loop. Secondly, if the opposite branch is semi-invariant,
> +         the statement is really loop-invariant, which is covered by loop
> +         unswitch. */
> +      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
> +        return -1;
> +    }
> +
> +  invar_checks = 0;
> +
> +  for (unsigned i = 0; i < 2; i++)
> +    {
> +      invar[!i] = false;
> +
> +      if (!can_branch_be_excluded (targ_bb[i]))
> +        continue;
> +
> +      /* Given a semi-invariant branch, if its opposite branch dominates
> +         loop latch, it and its following trace will only be executed in
> +         final iteration of loop, namely it is not part of repeated body
> +         of the loop. Similar to the above case that the branch is loop
> +         exit, no need to split loop. */
> +      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
> +        continue;
> +
> +      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
> +      invar_checks++;
> +    }
> +
> +  /* Both branches being invariant (handled by loop unswitch) or both
> +     being variant is not what we want. */
> +  if (invar[0] ^ !invar[1])
> +    return -1;
> +
> +  /* Found a real loop-invariant condition, do nothing. */
> +  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
> +    return -1;
> +
> +  return invar[1];
> +}
> +
> +/* Return TRUE if the conditional statement in a normal loop is also inside
> +   a nested non-recognized loop, such as an irreducible loop. */
> +
> +static bool
> +is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb,
> +                        int branch)
> +{
> +  basic_block branch_bb = EDGE_SUCC (cond_bb, branch)->dest;
> +
> +  if (cond_bb == loop->header || branch_bb == loop->latch)
> +    return false;
> +
> +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> +  auto_vec<basic_block> worklist;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    bbs[i]->flags &= ~BB_REACHABLE;
> +
> +  /* Mark latch basic block as visited to be end point for reachability
> +     traversal. */
> +  loop->latch->flags |= BB_REACHABLE;
> +
> +  gcc_assert (flow_bb_inside_loop_p (loop, branch_bb));
> +
> +  /* Start from specified branch, the opposite branch is ignored for it
> +     will not be executed. */
> +  branch_bb->flags |= BB_REACHABLE;
> +  worklist.safe_push (branch_bb);
> +
> +  do
> +    {
> +      basic_block bb = worklist.pop ();
> +      edge e;
> +      edge_iterator ei;
> +
> +      FOR_EACH_EDGE (e, ei, bb->succs)
> +        {
> +          basic_block succ_bb = e->dest;
> +
> +          if (succ_bb == cond_bb)
> +            return true;
> +
> +          if (!flow_bb_inside_loop_p (loop, succ_bb))
> +            continue;
> +
> +          if (succ_bb->flags & BB_REACHABLE)
> +            continue;
> +
> +          succ_bb->flags |= BB_REACHABLE;
> +          worklist.safe_push (succ_bb);
> +        }
> +    } while (!worklist.is_empty ());
> +
> +  return false;
> +}
> +
> +
> +/* Calculate increased code size measured by estimated insn number if
> +   applying loop split upon certain branch of a conditional statement. */
> +
> +static int
> +compute_added_num_insns (struct loop *loop, const_basic_block cond_bb,
> +                         int branch)
> +{
> +  const_basic_block targ_bb_var = EDGE_SUCC (cond_bb, !branch)->dest;
> +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> +  int num = 0;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    {
> +      /* Do not count basic blocks only in opposite branch. */
> +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var))
> +        continue;
> +
> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi);
> +           gsi_next (&gsi))
> +        num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);
> +    }
> +
> +  return num;
> +}
> +
> +/* Return true if it is eligible and profitable to perform loop split upon
> +   a conditional statement. */
> +
> +static bool
> +can_split_loop_on_cond (struct loop *loop, gcond *cond)
> +{
> +  int branch = get_cond_invariant_branch (loop, cond);
> +
> +  if (branch < 0)
> +    return false;
> +
> +  basic_block cond_bb = gimple_bb (cond);
> +
> +  /* Add a threshold for increased code size to disable loop split. */
> +  if (compute_added_num_insns (loop, cond_bb, branch) >
> +      PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))
> +    return false;
> +
> +  /* In each iteration, conditional statement candidate should be
> +     executed only once. */
> +  if (is_cond_in_hidden_loop (loop, cond_bb, branch))
> +    return false;
> +
> +  profile_probability prob = EDGE_SUCC (cond_bb, branch)->probability;
> +
> +  /* When accurate profile information is available, and execution
> +     frequency of the branch is too low, just let it go. */
> +  if (prob.reliable_p ())
> +    {
> +      int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB);
> +
> +      if (prob < profile_probability::always ().apply_scale (thres, 100))
> +        return false;
> +    }
> +
> +  /* Temporarily keep branch index in conditional statement. */
> +  gimple_set_plf (cond, GF_PLF_1, branch);
> +  return true;
> +}
> +
> +/* Traverse all conditional statements in a loop, to find out a good
> +   candidate upon which we can do loop split. */
> +
> +static bool
> +mark_cond_to_split_loop (struct loop *loop)
> +{
> +  split_info *info = new split_info ();
> +  basic_block *bbs = info->bbs = get_loop_body (loop);
> +
> +  /* Allocate an area to keep temporary info, and associate its address
> +     with loop aux field. */
> +  loop->aux = info;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    {
> +      basic_block bb = bbs[i];
> +
> +      /* Skip statement in inner recognized loop, because we want that
> +         conditional statement executes at most once in each iteration. */
> +      if (bb->loop_father != loop)
> +        continue;
> +
> +      /* This check is not strictly required, but with it we can ensure
> +         the conditional statement executes at least once in each
> +         iteration. */
> +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> +        continue;
> +
> +      gimple *last = last_stmt (bb);
> +
> +      if (!last || gimple_code (last) != GIMPLE_COND)
> +        continue;
> +
> +      gcond *cond = as_a <gcond *> (last);
> +
> +      if (can_split_loop_on_cond (loop, cond))
> +        {
> +          info->cond = cond;
> +          return true;
> +        }
> +    }
> +
> +  delete info;
> +  loop->aux = NULL;
> +
> +  return false;
> +}
> +
> +/* Given a loop with a chosen conditional statement candidate, perform loop
> +   split transformation illustrated as the following graph.
> +
> +               .-------T------ if (true) ------F------.
> +               |                    .---------------. |
> +               |                    |               | |
> +               v                    |               v v
> +          pre-header                |            pre-header
> +               | .------------.     |                 | .------------.
> +               | |            |     |                 | |            |
> +               | v            |     |                 | v            |
> +             header           |     |               header           |
> +               |              |     |                 |              |
> +       [ bool r = cond; ]     |     |                 |              |
> +               |              |     |                 |              |
> +      .---- if (r) -----.     |     |        .--- if (true) ---.     |
> +      |                 |     |     |        |                 |     |
> +  invariant             |     |     |    invariant             |     |
> +      |                 |     |     |        |                 |     |
> +      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
> +               |              |    /                  |              |
> +             stmts            |   /                 stmts            |
> +               |              |  /                    |              |
> +              / \             | /                    / \             |
> +     .-------*   *       [ if (!r) ]        .-------*   *            |
> +     |           |            |             |           |            |
> +     |         latch          |             |         latch          |
> +     |           |            |             |           |            |
> +     |           '------------'             |           '------------'
> +     '------------------------. .-----------'
> +             loop1            | |                   loop2
> +                              v v
> +                             exits
> +
> +   In the graph, loop1 represents the part derived from original one, and
> +   loop2 is duplicated using loop_version (), which corresponds to the part
> +   of original one being split out. In loop1, a new bool temporary (r)
> +   is introduced to keep value of the condition result. In original latch
> +   edge of loop1, we insert a new conditional statement whose value comes
> +   from previous temporary (r), one of its branch goes back to loop1 header
> +   as a latch edge, and the other branch goes to loop2 pre-header as an
> +   entry edge. And also in loop2, we abandon the variant branch of the
> +   conditional statement candidate by setting a constant bool condition,
> +   based on which branch is semi-invariant. */
> +
> +static bool
> +split_loop_for_cond (struct loop *loop1)
> +{
> +  split_info *info = (split_info *) loop1->aux;
> +  gcond *cond = info->cond;
> +  basic_block cond_bb = gimple_bb (cond);
> +  int branch = gimple_plf (cond, GF_PLF_1);
> +  bool true_invar = !!(EDGE_SUCC (cond_bb, branch)->flags & EDGE_TRUE_VALUE);
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +   {
> +     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
> +              current_function_name (), loop1->num,
> +              true_invar ? "T" : "F", cond_bb->index);
> +     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
> +   }
> +
> +  initialize_original_copy_tables ();
> +
> +  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> +                                     profile_probability::always (),
> +                                     profile_probability::never (),
> +                                     profile_probability::always (),
> +                                     profile_probability::always (),
> +                                     true);
> +  if (!loop2)
> +    {
> +      free_original_copy_tables ();
> +      return false;
> +    }
> +
> +  /* Generate a bool type temporary to hold result of the condition. */
> +  tree tmp = make_ssa_name (boolean_type_node);
> +  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> +  gimple *stmt = gimple_build_assign (tmp,
> +                                      gimple_cond_code (cond),
> +                                      gimple_cond_lhs (cond),
> +                                      gimple_cond_rhs (cond));
> +
> +  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
> +  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
> +  update_stmt (cond);
> +
> +  /* Replace the condition in loop2 with a bool constant to let pass
> +     manager remove the variant branch after current pass finishes. */
> +  basic_block cond_bb_copy = get_bb_copy (cond_bb);
> +  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
> +
> +  if (true_invar)
> +    gimple_cond_make_true (cond_copy);
> +  else
> +    gimple_cond_make_false (cond_copy);
> +
> +  update_stmt (cond_copy);
> +
> +  /* Insert a new conditional statement on latch edge of loop1. This
> +     statement acts as a switch to transfer execution from loop1 to
> +     loop2, when loop1 enters into invariant state. */
> +  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
> +  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
> +  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
> +                                          NULL_TREE, NULL_TREE);
> +
> +  gsi = gsi_last_bb (break_bb);
> +  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
> +
> +  edge to_loop1 = single_succ_edge (break_bb);
> +  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
> +
> +  to_loop1->flags &= ~EDGE_FALLTHRU;
> +
> +  if (true_invar)
> +    {
> +      to_loop1->flags |= EDGE_FALSE_VALUE;
> +      to_loop2->flags |= EDGE_TRUE_VALUE;
> +    }
> +  else
> +    {
> +      to_loop1->flags |= EDGE_TRUE_VALUE;
> +      to_loop2->flags |= EDGE_FALSE_VALUE;
> +    }
> +
> +  update_ssa (TODO_update_ssa);
> +
> +  /* Due to introduction of a control flow edge from loop1 latch to loop2
> +     pre-header, we should update PHIs in loop2 to reflect this connection
> +     between loop1 and loop2. */
> +  connect_loop_phis (loop1, loop2, to_loop2);
> +
> +  free_original_copy_tables ();
> +
> +  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
> +
> +  return true;
> +}
> +
> +/* Main entry point to perform loop splitting for suitable if-conditions
> +   in all loops. */
> +
> +static unsigned int
> +tree_ssa_split_loops_for_cond (void)
> +{
> +  struct loop *loop;
> +  auto_vec<struct loop *> loop_list;
> +  bool changed = false;
> +  unsigned i;
> +
> +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> +    loop->aux = NULL;
> +
> +  /* Go through all loops starting from innermost. */
> +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> +    {
> +      /* Put the loop in a list if a conditional statement candidate is
> +         found in it. This is the analysis stage, which does not change
> +         anything in the function. */
> +      if (!loop->aux
> +          && !optimize_loop_for_size_p (loop)
> +          && mark_cond_to_split_loop (loop))
> +        loop_list.safe_push (loop);
> +
> +      /* If any of our inner loops was split, don't split us,
> +         and mark our containing loop as having had splits as well. */
> +      loop_outer (loop)->aux = loop->aux;
> +    }
> +
> +  FOR_EACH_VEC_ELT (loop_list, i, loop)
> +    {
> +      /* Extract the selected loop and perform loop split. This is the
> +         transformation stage. */
> +      changed |= split_loop_for_cond (loop);
> +
> +      delete (split_info *) loop->aux;
> +    }
> +
> +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> +    loop->aux = NULL;
> +
> +  if (changed)
> +    return TODO_cleanup_cfg;
> +  return 0;
> +}
> +
> +
>  /* Loop splitting pass.  */
>
>  namespace {
> @@ -716,3 +1517,48 @@ make_pass_loop_split (gcc::context *ctxt)
>  {
>    return new pass_loop_split (ctxt);
>  }
> +
> +namespace {
> +
> +const pass_data pass_data_cond_loop_split =
> +{
> +  GIMPLE_PASS, /* type */
> +  "cond_lsplit", /* name */
> +  OPTGROUP_LOOP, /* optinfo_flags */
> +  TV_COND_LOOP_SPLIT, /* tv_id */
> +  PROP_cfg, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_cond_loop_split : public gimple_opt_pass
> +{
> +public:
> +  pass_cond_loop_split (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_cond_loop_split, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) { return flag_split_loops != 0; }
> +  virtual unsigned int execute (function *);
> +
> +}; // class pass_cond_loop_split
> +
> +unsigned int
> +pass_cond_loop_split::execute (function *fun)
> +{
> +  if (number_of_loops (fun) <= 1)
> +    return 0;
> +
> +  return tree_ssa_split_loops_for_cond ();
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_cond_loop_split (gcc::context *ctxt)
> +{
> +  return new pass_cond_loop_split (ctxt);
> +}
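
The reachability walk in is_cond_in_hidden_loop above can be modelled on a toy graph. The sketch below is only an illustration and not part of the patch: the `cfg` alias, `reaches_cond_again` and `self_check` are made-up names, basic blocks are plain integers, and every block in the graph is assumed to belong to the loop. As in the patch, the latch is marked as visited up front so the walk never follows the back edge.

```cpp
#include <cassert>
#include <vector>

// Toy CFG (hypothetical): succs[b] lists the successor blocks of block b.
using cfg = std::vector<std::vector<int>>;

// Return true if COND can be reached again from BRANCH without going
// through LATCH, i.e. COND could run more than once per loop iteration.
bool reaches_cond_again (const cfg &succs, int cond, int branch, int latch)
{
  if (branch == latch)
    return false;

  std::vector<bool> visited (succs.size (), false);
  std::vector<int> worklist;

  visited[latch] = true;    // the latch is the end point of the walk
  visited[branch] = true;   // start from the selected branch only
  worklist.push_back (branch);

  while (!worklist.empty ())
    {
      int bb = worklist.back ();
      worklist.pop_back ();

      for (int succ : succs[bb])
        {
          if (succ == cond)
            return true;
          if (visited[succ])
            continue;
          visited[succ] = true;
          worklist.push_back (succ);
        }
    }

  return false;
}

void self_check ()
{
  // 0 = header, 1 = cond, 2 and 3 = the two branches, 4 = latch.
  cfg plain = {{1}, {2, 3}, {4}, {4}, {0}};
  assert (!reaches_cond_again (plain, 1, 2, 4));

  // Same shape plus an extra edge 2 -> 1, so the condition can be
  // evaluated again before the latch is reached.
  cfg extra = {{1}, {2, 3}, {4, 1}, {4}, {0}};
  assert (reaches_cond_again (extra, 1, 2, 4));
}
```

Calling `self_check ()` from a small driver exercises both graphs. The real function additionally skips successors outside the loop and returns early when the condition block is the loop header, which this sketch leaves out.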

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-03-13  2:13   ` Feng Xue OS
@ 2019-03-13  9:43     ` Kyrill Tkachov
  2019-03-13 12:11       ` Richard Biener
  2019-03-14  3:31       ` Feng Xue OS
  0 siblings, 2 replies; 31+ messages in thread
From: Kyrill Tkachov @ 2019-03-13  9:43 UTC (permalink / raw)
  To: Feng Xue OS, Richard Biener; +Cc: gcc-patches

Hi Feng,

On 3/13/19 1:56 AM, Feng Xue OS wrote:
> Richard,
>
>     Thanks for your comment. Yes, it is like kind of jump threading 
> with knowledge of loop structure. And what is rough time for GCC 10?
>
>

GCC 10 will be released once the number of P1 regressions gets down to 
zero. Past experience shows that it's around the April/May timeframe.

In the meantime my comment on the patch is that you should add some 
tests to the testsuite that showcase this transformation.
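
For illustration only, a test could start from source along these lines. This is a rough sketch rather than a verified testcase: `below_limit`, `count_after_flip` and `self_check` are made-up names, the dump name "cond_lsplit" and the scanned string "split loop" are taken from the pass name and dump message in the patch, and whether the pass actually fires on exactly this input has not been checked.

```cpp
/* { dg-do compile } */
/* { dg-options "-O3 -fdump-tree-cond_lsplit-details" } */

#include <cassert>

// Hypothetical helper, kept out of line so the call stays in the loop.
__attribute__ ((noinline)) bool
below_limit (int v)
{
  return v < 10;
}

// "if (b)" is the intended semi-invariant condition: b can only change
// in the true branch, so once it becomes false it stays false.
__attribute__ ((noinline)) int
count_after_flip (const int *a, int n)
{
  bool b = true;
  int hits = 0;

  for (int i = 0; i < n; i++)
    {
      if (b)
        b = below_limit (a[i]);
      else
        hits++;
    }

  return hits;
}

// Sanity check of the expected result when the file is run by hand:
// b flips to false at the value 30, and the two elements after it are
// counted.
void
self_check ()
{
  const int a[] = { 1, 2, 30, 4, 5 };
  assert (count_after_flip (a, 5) == 2);
}

/* { dg-final { scan-tree-dump "split loop" "cond_lsplit" } } */
```

A run-time variant would add a main that calls `self_check ()` and use dg-do run instead, so that the split loops are also checked for the same result as the original.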

Thanks,

Kyrill


> Regards,
>
> Feng
>
>
> ________________________________
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Tuesday, March 12, 2019 4:31:49 PM
> To: Feng Xue OS
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Loop split upon semi-invariant condition (PR 
> tree-optimization/89134)
>
> On Tue, Mar 12, 2019 at 7:20 AM Feng Xue OS 
> <fxue@os.amperecomputing.com> wrote:
> >
> > This patch implements a loop transformation on one of 
> its conditional statements, which we call semi-invariant because 
> its computation is affected in only one of its branches.
> >
> > Suppose a loop as:
> >
> >     void f (std::map<int, int> m)
> >     {
> >         for (auto it = m.begin (); it != m.end (); ++it) {
> >             /* if (b) is semi-invariant. */
> >             if (b) {
> >                 b = do_something();    /* Has effect on b */
> >             } else {
> >                 /* No effect on b */
> >             }
> >             statements;                      /* Also no effect on b */
> >         }
> >     }
> >
> > A transformation, kind of loop split, could be:
> >
> >     void f (std::map<int, int> m)
> >     {
> >         for (auto it = m.begin (); it != m.end (); ++it) {
> >             if (b) {
> >                 b = do_something();
> >             } else {
> >                 ++it;
> >                 statements;
> >                 break;
> >             }
> >             statements;
> >         }
> >
> >         for (; it != m.end (); ++it) {
> >             statements;
> >         }
> >     }
> >
> > If "statements" contains nothing, the second loop becomes an empty 
> one, which can be removed. (This part will be given in another patch). 
> And if "statements" are straight-line instructions, we get an 
> opportunity to vectorize the second loop. In practice, this 
> optimization is found to improve some real application by 7%.
> >
> > Since it is just a kind of loop split, the code is mainly placed 
> in the existing tree-ssa-loop-split module, is controlled by 
> -fsplit-loops, and is enabled with -O3.
>
> Note the transform itself is jump-threading with the threading
> duplicating a whole CFG cycle.
>
> I didn't look at the patch details yet since this is suitable for GCC 
> 10 only.
>
> Thanks for implementing this.
> Richard.
>
> > Feng
> >
> >
> > diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> > index 64bf6017d16..a6c2878d652 100644
> > --- a/gcc/ChangeLog
> > +++ b/gcc/ChangeLog
> > @@ -1,3 +1,23 @@
> > +2019-03-12  Feng Xue <fxue@os.amperecomputing.com>
> > +
> > +       PR tree-optimization/89134
> > +        * doc/invoke.texi (max-cond-loop-split-insns): Document new 
> --params.
> > +       (min-cond-loop-split-prob): Likewise.
> > +       * params.def: Add max-cond-loop-split-insns, 
> min-cond-loop-split-prob.
> > +       * passes.def (pass_cond_loop_split) : New pass.
> > +       * timevar.def (TV_COND_LOOP_SPLIT): New time variable.
> > +       * tree-pass.h (make_pass_cond_loop_split): New declaration.
> > +       * tree-ssa-loop-split.c (split_info): New class.
> > +       (find_vdef_in_loop, vuse_semi_invariant_p): New functions.
> > +       (ssa_semi_invariant_p, stmt_semi_invariant_p): Likewise.
> > +       (can_branch_be_excluded, get_cond_invariant_branch): Likewise.
> > +       (is_cond_in_hidden_loop, compute_added_num_insns): Likewise.
> > +       (can_split_loop_on_cond, mark_cond_to_split_loop): Likewise.
> > +       (split_loop_for_cond, tree_ssa_split_loops_for_cond): Likewise.
> > +       (pass_data_cond_loop_split): New variable.
> > +       (pass_cond_loop_split): New class.
> > +       (make_pass_cond_loop_split): New function.
> > +
> >  2019-03-11  Jakub Jelinek  <jakub@redhat.com>
> >
> >         PR middle-end/89655
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index df0883f2fc9..f5e09bd71fd 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -11316,6 +11316,14 @@ The maximum number of branches unswitched 
> in a single loop.
> >  @item lim-expensive
> >  The minimum cost of an expensive expression in the loop invariant 
> motion.
> >
> > +@item max-cond-loop-split-insns
> > +The maximum number of insns to be increased due to loop split on
> > +semi-invariant condition statement.
> > +
> > +@item min-cond-loop-split-prob
> > +The minimum threshold for probability of semi-invariant condition
> > +statement to trigger loop split.
> > +
> >  @item iv-consider-all-candidates-bound
> >  Bound on number of candidates for induction variables, below which
> >  all candidates are considered for each use in induction variable
> > diff --git a/gcc/params.def b/gcc/params.def
> > index 3f1576448be..2e067526958 100644
> > --- a/gcc/params.def
> > +++ b/gcc/params.def
> > @@ -386,6 +386,18 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
> >         "The maximum number of unswitchings in a single loop.",
> >         3, 0, 0)
> >
> > +/* The maximum number of increased insns due to loop split on 
> semi-invariant
> > +   condition statement.  */
> > +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> > +       "max-cond-loop-split-insns",
> > +       "The maximum number of insns to be increased due to loop 
> split on semi-invariant condition statement.",
> > +       100, 0, 0)
> > +
> > +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> > +       "min-cond-loop-split-prob",
> > +       "The minimum threshold for probability of semi-invariant condition statement to trigger loop split.",
> > +       30, 0, 100)
> > +
> >  /* The maximum number of insns in loop header duplicated by the 
> copy loop
> >     headers pass.  */
> >  DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
> > diff --git a/gcc/passes.def b/gcc/passes.def
> > index 446a7c48276..bde7f4c50c0 100644
> > --- a/gcc/passes.def
> > +++ b/gcc/passes.def
> > @@ -265,6 +265,7 @@ along with GCC; see the file COPYING3.  If not see
> >           NEXT_PASS (pass_tree_unswitch);
> >           NEXT_PASS (pass_scev_cprop);
> >           NEXT_PASS (pass_loop_split);
> > +         NEXT_PASS (pass_cond_loop_split);
> >           NEXT_PASS (pass_loop_versioning);
> >           NEXT_PASS (pass_loop_jam);
> >           /* All unswitching, final value replacement and splitting 
> can expose
> > diff --git a/gcc/timevar.def b/gcc/timevar.def
> > index 54154464a58..39f2df0e3ec 100644
> > --- a/gcc/timevar.def
> > +++ b/gcc/timevar.def
> > @@ -189,6 +189,7 @@ DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree 
> canonical iv")
> >  DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
> >  DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
> >  DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
> > +DEFTIMEVAR (TV_COND_LOOP_SPLIT       , "loop splitting for conditions")
> >  DEFTIMEVAR (TV_LOOP_JAM              , "unroll and jam")
> >  DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
> >  DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
> > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> > index 47be59b2a11..f441ba36871 100644
> > --- a/gcc/tree-pass.h
> > +++ b/gcc/tree-pass.h
> > @@ -367,6 +367,7 @@ extern gimple_opt_pass *make_pass_lim 
> (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_linterchange (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
> > +extern gimple_opt_pass *make_pass_cond_loop_split (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_loop_jam (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
> > diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> > index 999c9a30366..d287a0d7d4c 100644
> > --- a/gcc/tree-ssa-loop-split.c
> > +++ b/gcc/tree-ssa-loop-split.c
> > @@ -32,7 +32,9 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "tree-ssa-loop.h"
> >  #include "tree-ssa-loop-manip.h"
> >  #include "tree-into-ssa.h"
> > +#include "tree-inline.h"
> >  #include "cfgloop.h"
> > +#include "params.h"
> >  #include "tree-scalar-evolution.h"
> >  #include "gimple-iterator.h"
> >  #include "gimple-pretty-print.h"
> > @@ -40,7 +42,9 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "gimple-fold.h"
> >  #include "gimplify-me.h"
> >
> > -/* This file implements loop splitting, i.e. transformation of 
> loops like
> > +/* This file implements two kinds of loop splitting.
> > +
> > +   One transformation of loops like:
> >
> >     for (i = 0; i < 100; i++)
> >       {
> > @@ -670,6 +674,803 @@ tree_ssa_split_loops (void)
> >    return 0;
> >  }
> >
> > +
> > +/* Another transformation of loops like:
> > +
> > +   for (i = INIT (); CHECK (i); i = NEXT ())
> > +     {
> > +       if (expr (a_1, a_2, ..., a_n))
> > +         a_j = ...;  // changes at least one a_j
> > +       else
> > +         S;          // does not change any a_j
> > +     }
> > +
> > +   into:
> > +
> > +   for (i = INIT (); CHECK (i); i = NEXT ())
> > +     {
> > +       if (expr (a_1, a_2, ..., a_n))
> > +         a_j = ...;
> > +       else
> > +         {
> > +           S;
> > +           i = NEXT ();
> > +           break;
> > +         }
> > +     }
> > +
> > +   for (; CHECK (i); i = NEXT ())
> > +     {
> > +       S;
> > +     }
> > +
> > +   */
> > +
> > +/* Data structure to hold temporary information during loop split upon
> > +   semi-invariant conditional statement. */
> > +class split_info {
> > +public:
> > +  /* Array of all basic blocks in a loop, returned by 
> get_loop_body(). */
> > +  basic_block *bbs;
> > +
> > +  /* All memory store/clobber statements in a loop. */
> > +  auto_vec<gimple *> stores;
> > +
> > +  /* Whether above memory stores vector has been filled. */
> > +  bool set_stores;
> > +
> > +  /* Semi-invariant conditional statement, upon which to split loop. */
> > +  gcond *cond;
> > +
> > +  split_info () : bbs (NULL),  set_stores (false), cond (NULL) { }
> > +
> > +  ~split_info ()
> > +    {
> > +      if (bbs)
> > +        free (bbs);
> > +    }
> > +};
> > +
> > +/* Find all statements with memory-write effect in a loop, including
> > +   memory stores and non-pure function calls, and keep them in a vector.
> > +   This is done only once, since the vector stays constant during the
> > +   analysis stage of a semi-invariant condition. */
> > +
> > +static void
> > +find_vdef_in_loop (struct loop *loop)
> > +{
> > +  split_info *info = (split_info *) loop->aux;
> > +  gphi *vphi = get_virtual_phi (loop->header);
> > +
> > +  /* Indicate memory store vector has been filled. */
> > +  info->set_stores = true;
> > +
> > +  /* If loop contains memory operation, there must be a virtual PHI 
> node in
> > +     loop header basic block. */
> > +  if (vphi == NULL)
> > +    return;
> > +
> > +  /* All virtual SSA names inside the loop are connected to be a cyclic
> > +     graph via virtual PHI nodes. The virtual PHI node in loop 
> header just
> > +     links the first and the last virtual SSA names, by using the 
> last as
> > +     PHI operand to define the first. */
> > +  const edge latch = loop_latch_edge (loop);
> > +  const tree first = gimple_phi_result (vphi);
> > +  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
> > +
> > +  /* The virtual SSA cyclic graph might consist of only one SSA name,
> > +     which is defined by itself.
> > +
> > +        .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
> > +
> > +     This means the loop contains only memory loads, so we can skip it. */
> > +  if (first == last)
> > +    return;
> > +
> > +  auto_vec<gimple *> others;
> > +  auto_vec<tree> worklist;
> > +  auto_bitmap visited;
> > +
> > +  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
> > +  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
> > +  worklist.safe_push (last);
> > +
> > +  do
> > +    {
> > +      tree vuse = worklist.pop ();
> > +      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
> > +
> > +      /* We mark the first and last SSA names as visited at the beginning,
> > +         and start the process from the last SSA name, working backward
> > +         toward the first, which ensures that this do-while will not touch
> > +         SSA names defined outside of the loop. */
> > +      gcc_assert (gimple_bb (stmt)
> > +                  && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
> > +
> > +      if (gimple_code (stmt) == GIMPLE_PHI)
> > +        {
> > +          gphi *phi = as_a <gphi *> (stmt);
> > +
> > +          for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> > +            {
> > +              tree arg = gimple_phi_arg_def (stmt, i);
> > +
> > +              if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
> > +                worklist.safe_push (arg);
> > +            }
> > +        }
> > +      else
> > +        {
> > +          tree prev = gimple_vuse (stmt);
> > +
> > +          /* Non-pure call statement is conservatively assumed to impact
> > +             all memory locations. So place call statements ahead of other
> > +             memory stores in the vector with the idea of using them as
> > +             shortcut terminators to memory alias analysis, kind of
> > +             optimization for compilation. */
> > +          if (gimple_code (stmt) == GIMPLE_CALL)
> > +            info->stores.safe_push (stmt);
> > +          else
> > +            others.safe_push (stmt);
> > +
> > +          if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
> > +            worklist.safe_push (prev);
> > +        }
> > +    } while (!worklist.is_empty ());
> > +
> > +  info->stores.safe_splice (others);
> > +}
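
For readers following the thread, the worklist walk above can be sketched outside GCC. This is only a toy model under stated assumptions: virtual SSA names are small integers, and `VDef` and `collect_stores` are hypothetical names that do not exist in GCC. It illustrates the backward traversal from the latch value and the calls-first ordering of the result, nothing more.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model of a statement that defines a virtual SSA name.
struct VDef {
  enum Kind { Phi, Store, Call } kind;
  // Phi: incoming virtual names; Store/Call: the single previous name (VUSE).
  std::vector<int> prev;
};

// defs[n] is the statement defining virtual name n.
// `first` models the header virtual PHI result, `last` its latch argument.
// Returns the defining names of calls, followed by those of other stores.
std::vector<int> collect_stores(const std::vector<VDef>& defs,
                                int first, int last) {
  assert(first >= 0 && static_cast<std::size_t>(first) < defs.size());
  assert(last >= 0 && static_cast<std::size_t>(last) < defs.size());

  std::vector<int> calls, others;
  if (first == last)  // Loop contains only loads.
    return calls;

  // Pre-marking `first` keeps the walk from leaving the loop.
  std::set<int> visited;
  visited.insert(first);
  visited.insert(last);
  std::vector<int> worklist(1, last);

  while (!worklist.empty()) {
    int name = worklist.back();
    worklist.pop_back();
    const VDef& d = defs.at(name);

    if (d.kind == VDef::Call)
      calls.push_back(name);
    else if (d.kind == VDef::Store)
      others.push_back(name);

    for (std::size_t i = 0; i < d.prev.size(); ++i)
      if (visited.insert(d.prev[i]).second)
        worklist.push_back(d.prev[i]);
  }

  // Calls go first so a later alias walk can stop early on them.
  calls.insert(calls.end(), others.begin(), others.end());
  return calls;
}
```

With a chain store, call, store between the header PHI and the latch value, the call ends up at the front of the returned vector even though it was visited second.
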
> > +
> > +
> > +/* Given a memory load or pure call statement, check whether it is impacted
> > +   by some memory store in the loop excluding those basic blocks dominated
> > +   by SKIP_HEAD (those basic blocks always correspond to one branch of
> > +   a conditional statement). If SKIP_HEAD is NULL, all basic blocks of the
> > +   loop are checked. */
> > +
> > +static bool
> > +vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
> > +                       const_basic_block skip_head)
> > +{
> > +  split_info *info = (split_info *) loop->aux;
> > +
> > +  /* Collect memory store/clobber statements if not done yet. */
> > +  if (!info->set_stores)
> > +    find_vdef_in_loop (loop);
> > +
> > +  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
> > +  ao_ref ref;
> > +  gimple *store;
> > +  unsigned i;
> > +
> > +  ao_ref_init (&ref, rhs);
> > +
> > +  FOR_EACH_VEC_ELT (info->stores, i, store)
> > +    {
> > +      /* Skip those basic blocks dominated by SKIP_HEAD. */
> > +      if (skip_head
> > +          && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
> > +        continue;
> > +
> > +      /* For a pure call, it is assumed to be impacted by any memory store.
> > +         For a memory load, use memory alias analysis to check that. */
> > +      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
> > +        return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Forward declaration */
> > +
> > +static bool
> > +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> > +                       const_basic_block skip_head);
> > +
> > +/* Suppose one condition branch, led by SKIP_HEAD, is not executed in certain
> > +   iteration, check whether an SSA name remains unchanged in next iteration.
> > +   We can call this characteristic semi-invariantness. SKIP_HEAD might be
> > +   NULL, if so, nothing excluded, all basic blocks and control flows in the
> > +   loop will be considered. */
> > +
> > +static bool
> > +ssa_semi_invariant_p (struct loop *loop, const tree name,
> > +                      const_basic_block skip_head)
> > +{
> > +  gimple *def = SSA_NAME_DEF_STMT (name);
> > +  const_basic_block def_bb = gimple_bb (def);
> > +
> > +  /* An SSA name defined outside a loop is definitely semi-invariant. */
> > +  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
> > +    return true;
> > +
> > +  /* This function is used to check semi-invariantness of a condition
> > +     statement, and SKIP_HEAD is always given as head of one of its
> > +     branches. So it implies that SSA name to check should be defined
> > +     before the conditional statement, and also before SKIP_HEAD. */
> > +
> > +  if (gimple_code (def) == GIMPLE_PHI)
> > +    {
> > +      /* In a normal loop, if a PHI node is located not in loop header, all
> > +         its source operands should be defined inside the loop. As we
> > +         mentioned before, these source definitions are ahead of SKIP_HEAD,
> > +         and will not be bypassed. Therefore, in each iteration, any of
> > +         these sources might be value provider to the SSA name, which for
> > +         sure should not be seen as invariant. */
> > +      if (def_bb != loop->header || !skip_head)
> > +        return false;
> > +
> > +      const_edge latch = loop_latch_edge (loop);
> > +      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
> > +
> > +      /* A PHI node in loop header always contains two source operands,
> > +         one is initial value, the other is the copy of last iteration
> > +         through loop latch, we call it latch value. From this PHI node
> > +         to definition of latch value, if excluding those basic blocks
> > +         dominated by SKIP_HEAD, there is no definition of other version
> > +         of same variable, SSA name defined by the PHI node is
> > +         semi-invariant.
> > +
> > +                         loop entry
> > +                              |     .--- latch ---.
> > +                              |     |             |
> > +                              v     v             |
> > +                  x_1 = PHI <x_0, x_3>            |
> > +                           |                      |
> > +                           v                      |
> > +              .------- if (cond) -------.         |
> > +              |                         |         |
> > +              |                     [ SKIP ]      |
> > +              |                         |         |
> > +              |                     x_2 = ...     |
> > +              |                         |         |
> > +              '---- T ---->.<---- F ----'         |
> > +                           |                      |
> > +                           v                      |
> > +                  x_3 = PHI <x_1, x_2>            |
> > +                           |                      |
> > +                           '----------------------'
> > +
> > +        Suppose in certain iteration, execution flow in above graph goes
> > +        through true branch, which means that one source value to define
> > +        x_3 in false branch (x_2) is skipped, x_3 only comes from x_1, and
> > +        x_1 in next iterations is defined by x_3, we know that x_1 will
> > +        never change if COND always chooses true branch from then on. */
> > +
> > +      while (from != name)
> > +        {
> > +          /* A new value comes from a CONSTANT. */
> > +          if (TREE_CODE (from) != SSA_NAME)
> > +            return false;
> > +
> > +          gimple *stmt = SSA_NAME_DEF_STMT (from);
> > +          const_basic_block bb = gimple_bb (stmt);
> > +
> > +          /* A new value comes from outside of loop. */
> > +          if (!bb || !flow_bb_inside_loop_p (loop, bb))
> > +            return false;
> > +
> > +          from = NULL_TREE;
> > +
> > +          if (gimple_code (stmt) == GIMPLE_PHI)
> > +            {
> > +              gphi *phi = as_a <gphi *> (stmt);
> > +
> > +              for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> > +                {
> > +                  const_edge e = gimple_phi_arg_edge (phi, i);
> > +
> > +                  /* Skip redefinition from basic blocks being excluded. */
> > +                  if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
> > +                    {
> > +                      /* There is more than one source operand that can
> > +                         provide value to the SSA name. */
> > +                      if (from)
> > +                        return false;
> > +
> > +                      from = gimple_phi_arg_def (phi, i);
> > +                    }
> > +                }
> > +            }
> > +          else if (gimple_code (stmt) == GIMPLE_ASSIGN)
> > +            {
> > +              /* For simple value copy, check its rhs instead. */
> > +              if (gimple_assign_ssa_name_copy_p (stmt))
> > +                from = gimple_assign_rhs1 (stmt);
> > +            }
> > +
> > +          /* Any other kind of definition is deemed to introduce a new value
> > +             to the SSA name. */
> > +          if (!from)
> > +            return false;
> > +        }
> > +      return true;
> > +    }
> > +
> > +  /* Value originated from volatile memory load or return of normal (non-
> > +     const/pure) call should not be treated as constant in each iteration. */
> > +  if (gimple_has_side_effects (def))
> > +    return false;
> > +
> > +  /* Check if any memory store may kill memory load at this place. */
> > +  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
> > +    return false;
> > +
> > +  /* Check operands of definition statement of the SSA name. */
> > +  return stmt_semi_invariant_p (loop, def, skip_head);
> > +}
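
The header-PHI case above is the heart of the analysis, so here is a small standalone sketch of the chain walk from the latch value back to the PHI result. It is a toy model, not GCC code: SSA names and basic blocks are non-negative integers, `Def` and `header_phi_semi_invariant` are hypothetical names, and "dominated by SKIP_HEAD" is replaced by a precomputed set of skipped block numbers. The `seen` set is an extra guard of this sketch that the patch does not have.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical model of an in-loop definition of an SSA name.
struct Def {
  enum Kind { Phi, Copy, Other } kind;
  // Phi: (incoming value, source block of the incoming edge) pairs.
  // Copy: one pair whose first member is the copied name.
  std::vector<std::pair<int, int> > args;
};

// NAME is the result of a loop-header PHI and LATCH_VALUE its latch argument.
// Names missing from DEFS model constants or definitions outside the loop.
bool header_phi_semi_invariant(const std::map<int, Def>& defs,
                               const std::set<int>& skipped_blocks,
                               int name, int latch_value) {
  std::set<int> seen;  // Guards this sketch against cycles missing NAME.
  int from = latch_value;

  while (from != name) {
    if (!seen.insert(from).second)
      return false;

    std::map<int, Def>::const_iterator it = defs.find(from);
    if (it == defs.end())
      return false;  // A new value enters from outside the loop.

    const Def& d = it->second;
    int next = -1;

    if (d.kind == Def::Phi) {
      for (std::size_t i = 0; i < d.args.size(); ++i) {
        if (skipped_blocks.count(d.args[i].second))
          continue;       // Value would come from the excluded branch.
        if (next != -1)
          return false;   // More than one possible value provider.
        next = d.args[i].first;
      }
    } else if (d.kind == Def::Copy) {
      assert(d.args.size() == 1);
      next = d.args[0].first;
    }

    if (next == -1)
      return false;  // Any other definition introduces a new value.
    from = next;
  }
  return true;
}
```

Feeding it the graph from the comment (x_3 = PHI <x_1, x_2> with x_2 defined in the skipped block) yields true; with no block skipped the same PHI has two providers and the answer is false.
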
> > +
> > +/* Check whether a statement is semi-invariant, iff all its operands are
> > +   semi-invariant. */
> > +
> > +static bool
> > +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> > +                       const_basic_block skip_head)
> > +{
> > +  ssa_op_iter iter;
> > +  tree use;
> > +
> > +  /* Although operand of a statement might be SSA name, CONSTANT or VARDECL,
> > +     here we only need to check SSA name operands. Since a VARDECL operand
> > +     involves a memory load, the check on it must already have been done
> > +     in ssa_semi_invariant_p prior to invocation of this function. */
> > +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
> > +    {
> > +      if (!ssa_semi_invariant_p (loop, use, skip_head))
> > +        return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Determine whether, when one branch of a conditional statement is not
> > +   selected, we can exclude the leading basic block of the branch and those
> > +   basic blocks dominated by the leading one. */
> > +
> > +static bool
> > +can_branch_be_excluded (basic_block branch_bb)
> > +{
> > +  if (single_pred_p (branch_bb))
> > +    return true;
> > +
> > +  edge e;
> > +  edge_iterator ei;
> > +
> > +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
> > +    {
> > +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
> > +        continue;
> > +
> > +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
> > +        continue;
> > +
> > +      /* The branch can be reached through other path, not just from the
> > +         conditional statement. */
> > +      return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Find out which branch of a conditional statement is invariant. That
> > +   is: once the branch is selected in certain loop iteration, any operand
> > +   that contributes to computation of the conditional statement remains
> > +   unchanged in all following iterations. */
> > +
> > +static int
> > +get_cond_invariant_branch (struct loop *loop, gcond *cond)
> > +{
> > +  basic_block cond_bb = gimple_bb (cond);
> > +  basic_block targ_bb[2];
> > +  bool invar[2];
> > +  unsigned invar_checks;
> > +
> > +  for (unsigned i = 0; i < 2; i++)
> > +    {
> > +      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
> > +
> > +      /* One branch directs to loop exit, no need to perform loop split upon
> > +         this conditional statement. Firstly, it is trivial if the exit
> > +         branch is semi-invariant, for the statement is just loop-breaking.
> > +         Secondly, if the opposite branch is semi-invariant, it means that
> > +         the statement is real loop-invariant, which is covered by loop
> > +         unswitch. */
> > +      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
> > +        return -1;
> > +    }
> > +
> > +  invar_checks = 0;
> > +
> > +  for (unsigned i = 0; i < 2; i++)
> > +    {
> > +      invar[!i] = false;
> > +
> > +      if (!can_branch_be_excluded (targ_bb[i]))
> > +        continue;
> > +
> > +      /* Given a semi-invariant branch, if its opposite branch dominates
> > +         loop latch, it and its following trace will only be executed in
> > +         final iteration of loop, namely it is not part of repeated body
> > +         of the loop. Similar to the above case that the branch is loop
> > +         exit, no need to split loop. */
> > +      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
> > +        continue;
> > +
> > +      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
> > +      invar_checks++;
> > +    }
> > +
> > +  /* With both branches being invariant (handled by loop unswitch) or
> > +     variant is not what we want. */
> > +  if (invar[0] ^ !invar[1])
> > +    return -1;
> > +
> > +  /* Found a real loop-invariant condition, do nothing. */
> > +  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
> > +    return -1;
> > +
> > +  return invar[1];
> > +}
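
The `invar[0] ^ !invar[1]` test is easy to misread, so here is its truth table written out as a tiny standalone function. `pick_invariant_branch` is a hypothetical name for illustration; the sketch deliberately leaves out the `invar_checks` fallback for fully loop-invariant conditions.

```cpp
#include <cassert>

// Returns the successor index (0 or 1) of the single semi-invariant branch,
// or -1 when both branches or neither branch qualify.
int pick_invariant_branch(bool invar0, bool invar1) {
  // `invar0 ^ !invar1` is nonzero exactly when the two flags are equal.
  assert(((invar0 ^ !invar1) != 0) == (invar0 == invar1));

  if (invar0 == invar1)
    return -1;
  return invar1 ? 1 : 0;
}
```

Only the two mixed combinations select a branch, which matches the final `return invar[1];` in the function above.
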
> > +
> > +/* Return TRUE if conditional statement in a normal loop is also inside
> > +   a nested non-recognized loop, such as an irreducible loop. */
> > +
> > +static bool
> > +is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb,
> > +                        int branch)
> > +{
> > +  basic_block branch_bb = EDGE_SUCC (cond_bb, branch)->dest;
> > +
> > +  if (cond_bb == loop->header || branch_bb == loop->latch)
> > +    return false;
> > +
> > +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> > +  auto_vec<basic_block> worklist;
> > +
> > +  for (unsigned i = 0; i < loop->num_nodes; i++)
> > +    bbs[i]->flags &= ~BB_REACHABLE;
> > +
> > +  /* Mark latch basic block as visited to be end point for reachability
> > +     traversal. */
> > +  loop->latch->flags |= BB_REACHABLE;
> > +
> > +  gcc_assert (flow_bb_inside_loop_p (loop, branch_bb));
> > +
> > +  /* Start from specified branch, the opposite branch is ignored for it
> > +     will not be executed. */
> > +  branch_bb->flags |= BB_REACHABLE;
> > +  worklist.safe_push (branch_bb);
> > +
> > +  do
> > +    {
> > +      basic_block bb = worklist.pop ();
> > +      edge e;
> > +      edge_iterator ei;
> > +
> > +      FOR_EACH_EDGE (e, ei, bb->succs)
> > +        {
> > +          basic_block succ_bb = e->dest;
> > +
> > +          if (succ_bb == cond_bb)
> > +            return true;
> > +
> > +          if (!flow_bb_inside_loop_p (loop, succ_bb))
> > +            continue;
> > +
> > +          if (succ_bb->flags & BB_REACHABLE)
> > +            continue;
> > +
> > +          succ_bb->flags |= BB_REACHABLE;
> > +          worklist.safe_push (succ_bb);
> > +        }
> > +    } while (!worklist.is_empty ());
> > +
> > +  return false;
> > +}
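
The reachability walk above can be tried on a hand-written CFG. The following is a minimal sketch, assuming a toy adjacency-list graph that contains only blocks of the loop (exit edges are simply left out); `Cfg` and `cond_in_hidden_cycle` are hypothetical names, not GCC API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// succs[b] lists the successor block indices of block b.
typedef std::vector<std::vector<int> > Cfg;

// Returns true if COND can be reached again from BRANCH without going
// through LATCH, i.e. the condition sits on a cycle the loop tree missed.
bool cond_in_hidden_cycle(const Cfg& succs, int cond, int branch,
                          int latch, int header) {
  assert(cond >= 0 && static_cast<std::size_t>(cond) < succs.size());
  assert(branch >= 0 && static_cast<std::size_t>(branch) < succs.size());
  assert(latch >= 0 && static_cast<std::size_t>(latch) < succs.size());

  if (cond == header || branch == latch)
    return false;

  std::vector<bool> reachable(succs.size(), false);
  reachable[latch] = true;   // The latch is the end point of the walk.
  reachable[branch] = true;  // Start from the selected branch only.
  std::vector<int> worklist(1, branch);

  while (!worklist.empty()) {
    int bb = worklist.back();
    worklist.pop_back();

    for (std::size_t i = 0; i < succs[bb].size(); ++i) {
      int succ = succs[bb][i];
      if (succ == cond)
        return true;
      if (reachable[succ])
        continue;
      reachable[succ] = true;
      worklist.push_back(succ);
    }
  }
  return false;
}
```

In a plain diamond (header, condition, two arms, latch) the walk stops at the latch and reports false; adding a join block with an edge back to the condition makes it report true.
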
> > +
> > +
> > +/* Calculate increased code size measured by estimated insn number if
> > +   applying loop split upon certain branch of a conditional statement. */
> > +
> > +static int
> > +compute_added_num_insns (struct loop *loop, const_basic_block cond_bb,
> > +                         int branch)
> > +{
> > +  const_basic_block targ_bb_var = EDGE_SUCC (cond_bb, !branch)->dest;
> > +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> > +  int num = 0;
> > +
> > +  for (unsigned i = 0; i < loop->num_nodes; i++)
> > +    {
> > +      /* Do not count basic blocks only in opposite branch. */
> > +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var))
> > +        continue;
> > +
> > +      for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi);
> > +           gsi_next (&gsi))
> > +        num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);
> > +    }
> > +
> > +  return num;
> > +}
> > +
> > +/* Return true if it is eligible and profitable to perform loop split upon
> > +   a conditional statement. */
> > +
> > +static bool
> > +can_split_loop_on_cond (struct loop *loop, gcond *cond)
> > +{
> > +  int branch = get_cond_invariant_branch (loop, cond);
> > +
> > +  if (branch < 0)
> > +    return false;
> > +
> > +  basic_block cond_bb = gimple_bb (cond);
> > +
> > +  /* Add a threshold for increased code size to disable loop split. */
> > +  if (compute_added_num_insns (loop, cond_bb, branch) >
> > +      PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))
> > +    return false;
> > +
> > +  /* In each iteration, conditional statement candidate should be
> > +     executed only once. */
> > +  if (is_cond_in_hidden_loop (loop, cond_bb, branch))
> > +    return false;
> > +
> > +  profile_probability prob = EDGE_SUCC (cond_bb, branch)->probability;
> > +
> > +  /* When accurate profile information is available, and execution
> > +     frequency of the branch is too low, just let it go. */
> > +  if (prob.reliable_p ())
> > +    {
> > +      int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB);
> > +
> > +      if (prob < profile_probability::always ().apply_scale (thres, 100))
> > +        return false;
> > +    }
> > +
> > +  /* Temporarily keep branch index in conditional statement. */
> > +  gimple_set_plf (cond, GF_PLF_1, branch);
> > +  return true;
> > +}
> > +
> > +/* Traverse all conditional statements in a loop, to find out a good
> > +   candidate upon which we can do loop split. */
> > +
> > +static bool
> > +mark_cond_to_split_loop (struct loop *loop)
> > +{
> > +  split_info *info = new split_info ();
> > +  basic_block *bbs = info->bbs = get_loop_body (loop);
> > +
> > +  /* Allocate an area to keep temporary info, and associate its address
> > +     with loop aux field. */
> > +  loop->aux = info;
> > +
> > +  for (unsigned i = 0; i < loop->num_nodes; i++)
> > +    {
> > +      basic_block bb = bbs[i];
> > +
> > +      /* Skip statement in inner recognized loop, because we want that
> > +         conditional statement executes at most once in each iteration. */
> > +      if (bb->loop_father != loop)
> > +        continue;
> > +
> > +      /* Actually this check is not a must constraint. With it, we can
> > +         ensure conditional statement will execute at least once in
> > +         each iteration. */
> > +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> > +        continue;
> > +
> > +      gimple *last = last_stmt (bb);
> > +
> > +      if (!last || gimple_code (last) != GIMPLE_COND)
> > +        continue;
> > +
> > +      gcond *cond = as_a <gcond *> (last);
> > +
> > +      if (can_split_loop_on_cond (loop, cond))
> > +        {
> > +          info->cond = cond;
> > +          return true;
> > +        }
> > +    }
> > +
> > +  delete info;
> > +  loop->aux = NULL;
> > +
> > +  return false;
> > +}
> > +
> > +/* Given a loop with a chosen conditional statement candidate, perform loop
> > +   split transformation illustrated as the following graph.
> > +
> > +               .-------T------ if (true) ------F------.
> > +               |                    .---------------. |
> > +               |                    |               | |
> > +               v                    |               v v
> > +          pre-header                |            pre-header
> > +               | .------------.     |                 | .------------.
> > +               | |            |     |                 | |            |
> > +               | v            |     |                 | v            |
> > +             header           |     |               header           |
> > +               |              |     |                 |              |
> > +       [ bool r = cond; ]     |     |                 |              |
> > +               |              |     |                 |              |
> > +      .---- if (r) -----.     |     |        .--- if (true) ---.     |
> > +      |                 |     |     |        |                 |     |
> > +  invariant             |     |     |    invariant             |     |
> > +      |                 |     |     |        |                 |     |
> > +      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
> > +               |              |    /                  |              |
> > +             stmts            |   /                 stmts            |
> > +               |              |  /                    |              |
> > +              / \             | /                    / \             |
> > +     .-------*   *       [ if (!r) ]        .-------*   *            |
> > +     |           |            |             |           |            |
> > +     |         latch          |             |         latch          |
> > +     |           |            |             |           |            |
> > +     |           '------------'             |           '------------'
> > +     '------------------------. .-----------'
> > +             loop1            | |                   loop2
> > +                              v v
> > +                             exits
> > +
> > +   In the graph, loop1 represents the part derived from original one, and
> > +   loop2 is duplicated using loop_version (), which corresponds to the part
> > +   of original one being split out. In loop1, a new bool temporary (r)
> > +   is introduced to keep value of the condition result. In original latch
> > +   edge of loop1, we insert a new conditional statement whose value comes
> > +   from previous temporary (r), one of its branches goes back to loop1 header
> > +   as a latch edge, and the other branch goes to loop2 pre-header as an
> > +   entry edge. And also in loop2, we abandon the variant branch of the
> > +   conditional statement candidate by setting a constant bool condition,
> > +   based on which branch is semi-invariant. */
> > +
> > +static bool
> > +split_loop_for_cond (struct loop *loop1)
> > +{
> > +  split_info *info = (split_info *) loop1->aux;
> > +  gcond *cond = info->cond;
> > +  basic_block cond_bb = gimple_bb (cond);
> > +  int branch = gimple_plf (cond, GF_PLF_1);
> > +  bool true_invar = !!(EDGE_SUCC (cond_bb, branch)->flags & EDGE_TRUE_VALUE);
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +    {
> > +      fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
> > +               current_function_name (), loop1->num,
> > +               true_invar ? "T" : "F", cond_bb->index);
> > +      print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
> > +    }
> > +
> > +  initialize_original_copy_tables ();
> > +
> > +  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> > +                                     profile_probability::always (),
> > +                                     profile_probability::never (),
> > +                                     profile_probability::always (),
> > +                                     profile_probability::always (),
> > +                                     true);
> > +  if (!loop2)
> > +    {
> > +      free_original_copy_tables ();
> > +      return false;
> > +    }
> > +
> > +  /* Generate a bool type temporary to hold result of the condition. */
> > +  tree tmp = make_ssa_name (boolean_type_node);
> > +  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> > +  gimple *stmt = gimple_build_assign (tmp,
> > +                                      gimple_cond_code (cond),
> > +                                      gimple_cond_lhs (cond),
> > +                                      gimple_cond_rhs (cond));
> > +
> > +  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
> > +  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
> > +  update_stmt (cond);
> > +
> > +  /* Replace the condition in loop2 with a bool constant to let pass
> > +     manager remove the variant branch after current pass finishes. */
> > +  basic_block cond_bb_copy = get_bb_copy (cond_bb);
> > +  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
> > +
> > +  if (true_invar)
> > +    gimple_cond_make_true (cond_copy);
> > +  else
> > +    gimple_cond_make_false (cond_copy);
> > +
> > +  update_stmt (cond_copy);
> > +
> > +  /* Insert a new conditional statement on latch edge of loop1. This
> > +     statement acts as a switch to transfer execution from loop1 to
> > +     loop2, when loop1 enters into invariant state. */
> > +  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
> > +  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
> > +  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
> > +                                          NULL_TREE, NULL_TREE);
> > +
> > +  gsi = gsi_last_bb (break_bb);
> > +  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
> > +
> > +  edge to_loop1 = single_succ_edge (break_bb);
> > +  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
> > +
> > +  to_loop1->flags &= ~EDGE_FALLTHRU;
> > +
> > +  if (true_invar)
> > +    {
> > +      to_loop1->flags |= EDGE_FALSE_VALUE;
> > +      to_loop2->flags |= EDGE_TRUE_VALUE;
> > +    }
> > +  else
> > +    {
> > +      to_loop1->flags |= EDGE_TRUE_VALUE;
> > +      to_loop2->flags |= EDGE_FALSE_VALUE;
> > +    }
> > +
> > +  update_ssa (TODO_update_ssa);
> > +
> > +  /* Due to introduction of a control flow edge from loop1 latch to loop2
> > +     pre-header, we should update PHIs in loop2 to reflect this connection
> > +     between loop1 and loop2. */
> > +  connect_loop_phis (loop1, loop2, to_loop2);
> > +
> > +  free_original_copy_tables ();
> > +
> > +  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
> > +
> > +  return true;
> > +}
> > +
> > +/* Main entry point to perform loop splitting for suitable if-conditions
> > +   in all loops. */
> > +
> > +static unsigned int
> > +tree_ssa_split_loops_for_cond (void)
> > +{
> > +  struct loop *loop;
> > +  auto_vec<struct loop *> loop_list;
> > +  bool changed = false;
> > +  unsigned i;
> > +
> > +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> > +    loop->aux = NULL;
> > +
> > +  /* Go through all loops starting from innermost. */
> > +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> > +    {
> > +      /* Put loop in a list if a conditional statement candidate is found in
> > +         the loop. This is the analysis stage, which changes nothing in the
> > +         function. */
> > +      if (!loop->aux
> > +          && !optimize_loop_for_size_p (loop)
> > +          && mark_cond_to_split_loop (loop))
> > +        loop_list.safe_push (loop);
> > +
> > +      /* If any of our inner loops was split, don't split us,
> > +         and mark our containing loop as having had splits as well. */
> > +      loop_outer (loop)->aux = loop->aux;
> > +    }
> > +
> > +  FOR_EACH_VEC_ELT (loop_list, i, loop)
> > +    {
> > +      /* Extract selected loop and perform loop split. This is the
> > +         transformation stage. */
> > +      changed |= split_loop_for_cond (loop);
> > +
> > +      delete (split_info *) loop->aux;
> > +    }
> > +
> > +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> > +    loop->aux = NULL;
> > +
> > +  if (changed)
> > +    return TODO_cleanup_cfg;
> > +  return 0;
> > +}
> > +
> > +
> >  /* Loop splitting pass.  */
> >
> >  namespace {
> > @@ -716,3 +1517,48 @@ make_pass_loop_split (gcc::context *ctxt)
> >  {
> >    return new pass_loop_split (ctxt);
> >  }
> > +
> > +namespace {
> > +
> > +const pass_data pass_data_cond_loop_split =
> > +{
> > +  GIMPLE_PASS, /* type */
> > +  "cond_lsplit", /* name */
> > +  OPTGROUP_LOOP, /* optinfo_flags */
> > +  TV_COND_LOOP_SPLIT, /* tv_id */
> > +  PROP_cfg, /* properties_required */
> > +  0, /* properties_provided */
> > +  0, /* properties_destroyed */
> > +  0, /* todo_flags_start */
> > +  0, /* todo_flags_finish */
> > +};
> > +
> > +class pass_cond_loop_split : public gimple_opt_pass
> > +{
> > +public:
> > +  pass_cond_loop_split (gcc::context *ctxt)
> > +    : gimple_opt_pass (pass_data_cond_loop_split, ctxt)
> > +  {}
> > +
> > +  /* opt_pass methods: */
> > +  virtual bool gate (function *) { return flag_split_loops != 0; }
> > +  virtual unsigned int execute (function *);
> > +
> > +}; // class pass_cond_loop_split
> > +
> > +unsigned int
> > +pass_cond_loop_split::execute (function *fun)
> > +{
> > +  if (number_of_loops (fun) <= 1)
> > +    return 0;
> > +
> > +  return tree_ssa_split_loops_for_cond ();
> > +}
> > +
> > +} // anon namespace
> > +
> > +gimple_opt_pass *
> > +make_pass_cond_loop_split (gcc::context *ctxt)
> > +{
> > +  return new pass_cond_loop_split (ctxt);
> > +}
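
As a source-level illustration of what the pass is meant to achieve, here is a minimal sketch that runs the same computation in the original and in the split form. It assumes a plain vector instead of the `std::map` from the patch description, and `keep_flag`, `run_original` and `run_split` are hypothetical names standing in for `do_something ()` and the two loop shapes. The split form is written by hand in the spirit of the CFG diagram; it is not compiler output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for do_something (): the flag stays set while the
// elements are non-negative.
static bool keep_flag(int x) { return x >= 0; }

// Original shape: `if (b)` is semi-invariant, because only its true branch
// can change b.
long run_original(const std::vector<int>& v) {
  bool b = true;
  long sum = 0;
  for (std::size_t i = 0; i < v.size(); ++i) {
    if (b)
      b = keep_flag(v[i]);
    sum += v[i];  // "statements": no effect on b.
  }
  return sum;
}

// Split shape: once b becomes false it can never change again, so the rest
// of the iteration space runs in a second loop without the condition.
long run_split(const std::vector<int>& v) {
  bool b = true;
  long sum = 0;
  std::size_t i = 0;

  for (; i < v.size(); ++i) {
    if (!b)
      break;  // Hand the remaining iterations over to the second loop.
    b = keep_flag(v[i]);
    sum += v[i];
  }

  for (; i < v.size(); ++i)
    sum += v[i];  // Branch-free body, a candidate for vectorization.

  assert(i == v.size());
  return sum;
}
```

Both functions return the same value for any input; the second loop is the part that becomes empty or vectorizable, depending on what "statements" contains.
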


* Re: [PATCH] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-03-13  9:43     ` Kyrill Tkachov
@ 2019-03-13 12:11       ` Richard Biener
  2019-03-13 12:39         ` Kyrill Tkachov
  2019-03-14  3:31       ` Feng Xue OS
  1 sibling, 1 reply; 31+ messages in thread
From: Richard Biener @ 2019-03-13 12:11 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Feng Xue OS, gcc-patches

On Wed, Mar 13, 2019 at 10:40 AM Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> Hi Feng,
>
> On 3/13/19 1:56 AM, Feng Xue OS wrote:
> > Richard,
> >
> >     Thanks for your comment. Yes, it is like kind of jump threading
> > with knowledge of loop structure. And what is rough time for GCC 10?
> >
> >
>
> GCC 10 will be released once the number of P1 regressions gets down to
> zero. Past experience shows that it's around the April/May timeframe.

Note GCC 10 is due only next year.

> In the meantime my comment on the patch is that you should add some
> tests to the testsuite that showcase this transformation.
>
> Thanks,
>
> Kyrill
>
>
> > Regards,
> >
> > Feng
> >
> >
> > ________________________________
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: Tuesday, March 12, 2019 4:31:49 PM
> > To: Feng Xue OS
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] Loop split upon semi-invariant condition (PR
> > tree-optimization/89134)
> >
> > On Tue, Mar 12, 2019 at 7:20 AM Feng Xue OS
> > <fxue@os.amperecomputing.com> wrote:
> > >
> > > > This patch is composed to implement a loop transformation on one of
> > > > its conditional statements, which we call semi-invariant, in that
> > > > its computation is impacted in only one of its branches.
> > >
> > > Suppose a loop as:
> > >
> > >     void f (std::map<int, int> m)
> > >     {
> > >         for (auto it = m.begin (); it != m.end (); ++it) {
> > >             /* if (b) is semi-invariant. */
> > >             if (b) {
> > >                 b = do_something();    /* Has effect on b */
> > >             } else {
> > > >                 /* No effect on b */
> > >             }
> > >             statements;                      /* Also no effect on b */
> > >         }
> > >     }
> > >
> > > A transformation, kind of loop split, could be:
> > >
> > >     void f (std::map<int, int> m)
> > >     {
> > >         for (auto it = m.begin (); it != m.end (); ++it) {
> > >             if (b) {
> > >                 b = do_something();
> > >             } else {
> > >                 ++it;
> > >                 statements;
> > >                 break;
> > >             }
> > >             statements;
> > >         }
> > >
> > >         for (; it != m.end (); ++it) {
> > >             statements;
> > >         }
> > >     }
> > >
> > > > If "statements" contains nothing, the second loop becomes an empty
> > > > one, which can be removed. (This part will be given in another patch).
> > > > And if "statements" are straight line instructions, we get an
> > > > opportunity to vectorize the second loop. In practice, this
> > > > optimization is found to improve some real application by 7%.
> > >
> > > > Since it is just a kind of loop split, the code is mainly placed
> > > > in the existing tree-ssa-loop-split module, is controlled by
> > > > -fsplit-loops, and is enabled with -O3.
> >
> > Note the transform itself is jump-threading with the threading
> > duplicating a whole CFG cycle.
> >
> > I didn't look at the patch details yet since this is suitable for GCC
> > 10 only.
> >
> > Thanks for implementing this.
> > Richard.
> >
> > > Feng
> > >
> > >
> > > diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> > > index 64bf6017d16..a6c2878d652 100644
> > > --- a/gcc/ChangeLog
> > > +++ b/gcc/ChangeLog
> > > @@ -1,3 +1,23 @@
> > > +2019-03-12  Feng Xue <fxue@os.amperecomputing.com>
> > > +
> > > +       PR tree-optimization/89134
> > > > +       * doc/invoke.texi (max-cond-loop-split-insns): Document new --params.
> > > +       (min-cond-loop-split-prob): Likewise.
> > > > +       * params.def: Add max-cond-loop-split-insns, min-cond-loop-split-prob.
> > > +       * passes.def (pass_cond_loop_split) : New pass.
> > > +       * timevar.def (TV_COND_LOOP_SPLIT): New time variable.
> > > +       * tree-pass.h (make_pass_cond_loop_split): New declaration.
> > > +       * tree-ssa-loop-split.c (split_info): New class.
> > > +       (find_vdef_in_loop, vuse_semi_invariant_p): New functions.
> > > +       (ssa_semi_invariant_p, stmt_semi_invariant_p): Likewise.
> > > +       (can_branch_be_excluded, get_cond_invariant_branch): Likewise.
> > > +       (is_cond_in_hidden_loop, compute_added_num_insns): Likewise.
> > > +       (can_split_loop_on_cond, mark_cond_to_split_loop): Likewise.
> > > +       (split_loop_for_cond, tree_ssa_split_loops_for_cond): Likewise.
> > > +       (pass_data_cond_loop_split): New variable.
> > > +       (pass_cond_loop_split): New class.
> > > +       (make_pass_cond_loop_split): New function.
> > > +
> > >  2019-03-11  Jakub Jelinek  <jakub@redhat.com>
> > >
> > >         PR middle-end/89655
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index df0883f2fc9..f5e09bd71fd 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -11316,6 +11316,14 @@ The maximum number of branches unswitched in a single loop.
> > >  @item lim-expensive
> > >  The minimum cost of an expensive expression in the loop invariant motion.
> > >
> > > +@item max-cond-loop-split-insns
> > > +The maximum number of insns to be increased due to loop split on
> > > +semi-invariant condition statement.
> > > +
> > > +@item min-cond-loop-split-prob
> > > +The minimum threshold for probability of semi-invariant condition
> > > +statement to trigger loop split.
> > > +
> > >  @item iv-consider-all-candidates-bound
> > >  Bound on number of candidates for induction variables, below which
> > >  all candidates are considered for each use in induction variable
> > > diff --git a/gcc/params.def b/gcc/params.def
> > > index 3f1576448be..2e067526958 100644
> > > --- a/gcc/params.def
> > > +++ b/gcc/params.def
> > > @@ -386,6 +386,18 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
> > >         "The maximum number of unswitchings in a single loop.",
> > >         3, 0, 0)
> > >
> > > +/* The maximum number of increased insns due to loop split on semi-invariant
> > > +   condition statement.  */
> > > +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> > > +       "max-cond-loop-split-insns",
> > > +       "The maximum number of insns to be increased due to loop split on semi-invariant condition statement.",
> > > +       100, 0, 0)
> > > +
> > > +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> > > +       "min-cond-loop-split-prob",
> > > +       "The minimum threshold for probability of semi-invariant condition statement to trigger loop split.",
> > > +       30, 0, 100)
> > > +
> > >  /* The maximum number of insns in loop header duplicated by the copy loop
> > >     headers pass.  */
> > >  DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
> > > diff --git a/gcc/passes.def b/gcc/passes.def
> > > index 446a7c48276..bde7f4c50c0 100644
> > > --- a/gcc/passes.def
> > > +++ b/gcc/passes.def
> > > @@ -265,6 +265,7 @@ along with GCC; see the file COPYING3.  If not see
> > >           NEXT_PASS (pass_tree_unswitch);
> > >           NEXT_PASS (pass_scev_cprop);
> > >           NEXT_PASS (pass_loop_split);
> > > +         NEXT_PASS (pass_cond_loop_split);
> > >           NEXT_PASS (pass_loop_versioning);
> > >           NEXT_PASS (pass_loop_jam);
> > >           /* All unswitching, final value replacement and splitting can expose
> > > diff --git a/gcc/timevar.def b/gcc/timevar.def
> > > index 54154464a58..39f2df0e3ec 100644
> > > --- a/gcc/timevar.def
> > > +++ b/gcc/timevar.def
> > > @@ -189,6 +189,7 @@ DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
> > >  DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
> > >  DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
> > >  DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
> > > +DEFTIMEVAR (TV_COND_LOOP_SPLIT       , "loop splitting for conditions")
> > >  DEFTIMEVAR (TV_LOOP_JAM              , "unroll and jam")
> > >  DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
> > >  DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
> > > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> > > index 47be59b2a11..f441ba36871 100644
> > > --- a/gcc/tree-pass.h
> > > +++ b/gcc/tree-pass.h
> > > @@ -367,6 +367,7 @@ extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
> > >  extern gimple_opt_pass *make_pass_linterchange (gcc::context *ctxt);
> > >  extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
> > >  extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
> > > +extern gimple_opt_pass *make_pass_cond_loop_split (gcc::context *ctxt);
> > >  extern gimple_opt_pass *make_pass_loop_jam (gcc::context *ctxt);
> > >  extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
> > >  extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
> > > diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> > > index 999c9a30366..d287a0d7d4c 100644
> > > --- a/gcc/tree-ssa-loop-split.c
> > > +++ b/gcc/tree-ssa-loop-split.c
> > > @@ -32,7 +32,9 @@ along with GCC; see the file COPYING3.  If not see
> > >  #include "tree-ssa-loop.h"
> > >  #include "tree-ssa-loop-manip.h"
> > >  #include "tree-into-ssa.h"
> > > +#include "tree-inline.h"
> > >  #include "cfgloop.h"
> > > +#include "params.h"
> > >  #include "tree-scalar-evolution.h"
> > >  #include "gimple-iterator.h"
> > >  #include "gimple-pretty-print.h"
> > > @@ -40,7 +42,9 @@ along with GCC; see the file COPYING3.  If not see
> > >  #include "gimple-fold.h"
> > >  #include "gimplify-me.h"
> > >
> > > -/* This file implements loop splitting, i.e. transformation of loops like
> > > +/* This file implements two kinds of loop splitting.
> > > +
> > > +   One transformation of loops like:
> > >
> > >     for (i = 0; i < 100; i++)
> > >       {
> > > @@ -670,6 +674,803 @@ tree_ssa_split_loops (void)
> > >    return 0;
> > >  }
> > >
> > > +
> > > +/* Another transformation of loops like:
> > > +
> > > +   for (i = INIT (); CHECK (i); i = NEXT ())
> > > +     {
> > > +       if (expr (a_1, a_2, ..., a_n))
> > > +         a_j = ...;  // change at least one a_j
> > > +       else
> > > +         S;          // not change any a_j
> > > +     }
> > > +
> > > +   into:
> > > +
> > > +   for (i = INIT (); CHECK (i); i = NEXT ())
> > > +     {
> > > +       if (expr (a_1, a_2, ..., a_n))
> > > +         a_j = ...;
> > > +       else
> > > +         {
> > > +           S;
> > > +           i = NEXT ();
> > > +           break;
> > > +         }
> > > +     }
> > > +
> > > +   for (; CHECK (i); i = NEXT ())
> > > +     {
> > > +       S;
> > > +     }
> > > +
> > > +   */
> > > +
> > > +/* Data structure to hold temporary information during loop split upon
> > > +   semi-invariant conditional statement. */
> > > +class split_info {
> > > +public:
> > > +  /* Array of all basic blocks in a loop, returned by get_loop_body(). */
> > > +  basic_block *bbs;
> > > +
> > > +  /* All memory store/clobber statements in a loop. */
> > > +  auto_vec<gimple *> stores;
> > > +
> > > +  /* Whether above memory stores vector has been filled. */
> > > +  bool set_stores;
> > > +
> > > +  /* Semi-invariant conditional statement, upon which to split loop. */
> > > +  gcond *cond;
> > > +
> > > +  split_info () : bbs (NULL),  set_stores (false), cond (NULL) { }
> > > +
> > > +  ~split_info ()
> > > +    {
> > > +      if (bbs)
> > > +        free (bbs);
> > > +    }
> > > +};
> > > +
> > > +/* Find all statements with memory-write effect in a loop, including memory
> > > +   store and non-pure function call, and keep those in a vector. This work
> > > +   is done only once, since the vector should stay constant during the
> > > +   analysis stage of semi-invariant condition. */
> > > +
> > > +static void
> > > +find_vdef_in_loop (struct loop *loop)
> > > +{
> > > +  split_info *info = (split_info *) loop->aux;
> > > +  gphi *vphi = get_virtual_phi (loop->header);
> > > +
> > > +  /* Indicate memory store vector has been filled. */
> > > +  info->set_stores = true;
> > > +
> > > +  /* If loop contains memory operation, there must be a virtual PHI node in
> > > +     loop header basic block. */
> > > +  if (vphi == NULL)
> > > +    return;
> > > +
> > > +  /* All virtual SSA names inside the loop are connected to be a cyclic
> > > +     graph via virtual PHI nodes. The virtual PHI node in loop header just
> > > +     links the first and the last virtual SSA names, by using the last as
> > > +     PHI operand to define the first. */
> > > +  const edge latch = loop_latch_edge (loop);
> > > +  const tree first = gimple_phi_result (vphi);
> > > +  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
> > > +
> > > +  /* The virtual SSA cyclic graph might consist of only one SSA name, which
> > > +     is defined by itself.
> > > +
> > > +        .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
> > > +
> > > +     This means the loop contains only memory loads, so we can skip it. */
> > > +  if (first == last)
> > > +    return;
> > > +
> > > +  auto_vec<gimple *> others;
> > > +  auto_vec<tree> worklist;
> > > +  auto_bitmap visited;
> > > +
> > > +  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
> > > +  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
> > > +  worklist.safe_push (last);
> > > +
> > > +  do
> > > +    {
> > > +      tree vuse = worklist.pop ();
> > > +      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
> > > +
> > > +      /* We mark the first and last SSA names as visited at the beginning,
> > > +         and walk backward from the last SSA name toward the first,
> > > +         which ensures that this do-while will not touch SSA names
> > > +         defined outside of the loop. */
> > > +      gcc_assert (gimple_bb (stmt)
> > > +                  && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
> > > +
> > > +      if (gimple_code (stmt) == GIMPLE_PHI)
> > > +        {
> > > +          gphi *phi = as_a <gphi *> (stmt);
> > > +
> > > +          for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> > > +            {
> > > +              tree arg = gimple_phi_arg_def (stmt, i);
> > > +
> > > +              if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
> > > +                worklist.safe_push (arg);
> > > +            }
> > > +        }
> > > +      else
> > > +        {
> > > +          tree prev = gimple_vuse (stmt);
> > > +
> > > +          /* Non-pure call statement is conservatively assumed to impact
> > > +             all memory locations. So place call statements ahead of other
> > > +             memory stores in the vector with the idea of using them as
> > > +             shortcut terminators to memory alias analysis, as a
> > > +             compile-time optimization. */
> > > +          if (gimple_code (stmt) == GIMPLE_CALL)
> > > +            info->stores.safe_push (stmt);
> > > +          else
> > > +            others.safe_push (stmt);
> > > +
> > > +          if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
> > > +            worklist.safe_push (prev);
> > > +        }
> > > +    } while (!worklist.is_empty ());
> > > +
> > > +  info->stores.safe_splice (others);
> > > +}
> > > +
> > > +
> > > +/* Given a memory load or pure call statement, check whether it is impacted
> > > +   by some memory store in the loop excluding those basic blocks dominated
> > > +   by SKIP_HEAD (those basic blocks always correspond to one branch of
> > > +   a conditional statement). If SKIP_HEAD is NULL, all basic blocks of the
> > > +   loop are checked. */
> > > +
> > > +static bool
> > > +vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
> > > +                       const_basic_block skip_head)
> > > +{
> > > +  split_info *info = (split_info *) loop->aux;
> > > +
> > > +  /* Collect memory store/clobber statements if that has not been done yet. */
> > > +  if (!info->set_stores)
> > > +    find_vdef_in_loop (loop);
> > > +
> > > +  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
> > > +  ao_ref ref;
> > > +  gimple *store;
> > > +  unsigned i;
> > > +
> > > +  ao_ref_init (&ref, rhs);
> > > +
> > > +  FOR_EACH_VEC_ELT (info->stores, i, store)
> > > +    {
> > > +      /* Skip those basic blocks dominated by SKIP_HEAD. */
> > > +      if (skip_head
> > > +          && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
> > > +        continue;
> > > +
> > > +      /* For a pure call, it is assumed to be impacted by any memory store.
> > > +         For a memory load, use memory alias analysis to check that. */
> > > +      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
> > > +        return false;
> > > +    }
> > > +
> > > +  return true;
> > > +}
> > > +
> > > +/* Forward declaration */
> > > +
> > > +static bool
> > > +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> > > +                       const_basic_block skip_head);
> > > +
> > > +/* Suppose one condition branch, led by SKIP_HEAD, is not executed in a certain
> > > +   iteration; check whether an SSA name remains unchanged in the next iteration.
> > > +   We call this characteristic semi-invariantness. SKIP_HEAD might be NULL;
> > > +   if so, nothing is excluded, and all basic blocks and control flows in the
> > > +   loop will be considered. */
> > > +
> > > +static bool
> > > +ssa_semi_invariant_p (struct loop *loop, const tree name,
> > > +                      const_basic_block skip_head)
> > > +{
> > > +  gimple *def = SSA_NAME_DEF_STMT (name);
> > > +  const_basic_block def_bb = gimple_bb (def);
> > > +
> > > +  /* An SSA name defined outside a loop is definitely semi-invariant. */
> > > +  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
> > > +    return true;
> > > +
> > > +  /* This function is used to check semi-invariantness of a condition
> > > +     statement, and SKIP_HEAD is always given as head of one of its
> > > +     branches. So it implies that SSA name to check should be defined
> > > +     before the conditional statement, and also before SKIP_HEAD. */
> > > +
> > > +  if (gimple_code (def) == GIMPLE_PHI)
> > > +    {
> > > +      /* In a normal loop, if a PHI node is located not in loop header, all
> > > +         its source operands should be defined inside the loop. As we
> > > +         mentioned before, these source definitions are ahead of SKIP_HEAD,
> > > +         and will not be bypassed. Therefore, in each iteration, any of
> > > +         these sources might be value provider to the SSA name, which for
> > > +         sure should not be seen as invariant. */
> > > +      if (def_bb != loop->header || !skip_head)
> > > +        return false;
> > > +
> > > +      const_edge latch = loop_latch_edge (loop);
> > > +      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
> > > +
> > > +      /* A PHI node in loop header always contains two source operands,
> > > +         one is initial value, the other is the copy of last iteration
> > > +         through loop latch, we call it latch value. From this PHI node
> > > +         to definition of latch value, if excluding those basic blocks
> > > +         dominated by SKIP_HEAD, there is no definition of other version
> > > +         of same variable, SSA name defined by the PHI node is
> > > +         semi-invariant.
> > > +
> > > +                         loop entry
> > > +                              |     .--- latch ---.
> > > +                              |     |             |
> > > +                              v     v             |
> > > +                  x_1 = PHI <x_0, x_3>            |
> > > +                           |                      |
> > > +                           v                      |
> > > +              .------- if (cond) -------.         |
> > > +              |                         |         |
> > > +              |                     [ SKIP ]      |
> > > +              |                         |         |
> > > +              |                     x_2 = ...     |
> > > +              |                         |         |
> > > +              '---- T ---->.<---- F ----'         |
> > > +                           |                      |
> > > +                           v                      |
> > > +                  x_3 = PHI <x_1, x_2>            |
> > > +                           |                      |
> > > +                           '----------------------'
> > > +
> > > +        Suppose in certain iteration, execution flow in above graph goes
> > > +        through true branch, which means that one source value to define
> > > +        x_3 in false branch (x_2) is skipped, x_3 only comes from x_1, and
> > > +        x_1 in next iterations is defined by x_3, we know that x_1 will
> > > +        never change if COND always chooses true branch from then on. */
> > > +
> > > +      while (from != name)
> > > +        {
> > > +          /* A new value comes from a CONSTANT. */
> > > +          if (TREE_CODE (from) != SSA_NAME)
> > > +            return false;
> > > +
> > > +          gimple *stmt = SSA_NAME_DEF_STMT (from);
> > > +          const_basic_block bb = gimple_bb (stmt);
> > > +
> > > +          /* A new value comes from outside of loop. */
> > > +          if (!bb || !flow_bb_inside_loop_p (loop, bb))
> > > +            return false;
> > > +
> > > +          from = NULL_TREE;
> > > +
> > > +          if (gimple_code (stmt) == GIMPLE_PHI)
> > > +            {
> > > +              gphi *phi = as_a <gphi *> (stmt);
> > > +
> > > +              for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> > > +                {
> > > +                  const_edge e = gimple_phi_arg_edge (phi, i);
> > > +
> > > +                  /* Skip redefinition from basic blocks being excluded. */
> > > +                  if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
> > > +                    {
> > > +                      /* There is more than one source operand that can
> > > +                         provide value to the SSA name. */
> > > +                      if (from)
> > > +                        return false;
> > > +
> > > +                      from = gimple_phi_arg_def (phi, i);
> > > +                    }
> > > +                }
> > > +            }
> > > +          else if (gimple_code (stmt) == GIMPLE_ASSIGN)
> > > +            {
> > > +              /* For simple value copy, check its rhs instead. */
> > > +              if (gimple_assign_ssa_name_copy_p (stmt))
> > > +                from = gimple_assign_rhs1 (stmt);
> > > +            }
> > > +
> > > +          /* Any other kind of definition is deemed to introduce a new value
> > > +             to the SSA name. */
> > > +          if (!from)
> > > +            return false;
> > > +        }
> > > +      return true;
> > > +    }
> > > +
> > > +  /* Value originated from volatile memory load or return of normal (non-
> > > +     const/pure) call should not be treated as constant in each iteration. */
> > > +  if (gimple_has_side_effects (def))
> > > +    return false;
> > > +
> > > +  /* Check if any memory store may kill memory load at this place. */
> > > +  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
> > > +    return false;
> > > +
> > > +  /* Check operands of definition statement of the SSA name. */
> > > +  return stmt_semi_invariant_p (loop, def, skip_head);
> > > +}
> > > +
> > > +/* Check whether a statement is semi-invariant, iff all its operands are
> > > +   semi-invariant. */
> > > +
> > > +static bool
> > > +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> > > +                       const_basic_block skip_head)
> > > +{
> > > +  ssa_op_iter iter;
> > > +  tree use;
> > > +
> > > +  /* Although operand of a statement might be SSA name, CONSTANT or VARDECL,
> > > +     here we only need to check SSA name operands. Because a VARDECL operand
> > > +     involves a memory load, the check on it must already have been done in
> > > +     ssa_semi_invariant_p prior to invocation of this function. */
> > > +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
> > > +    {
> > > +      if (!ssa_semi_invariant_p (loop, use, skip_head))
> > > +        return false;
> > > +    }
> > > +
> > > +  return true;
> > > +}
> > > +
> > > +/* Determine whether, when one branch of a conditional statement is not
> > > +   selected, we can exclude the leading basic block of the branch and those
> > > +   basic blocks dominated by the leading one. */
> > > +
> > > +static bool
> > > +can_branch_be_excluded (basic_block branch_bb)
> > > +{
> > > +  if (single_pred_p (branch_bb))
> > > +    return true;
> > > +
> > > +  edge e;
> > > +  edge_iterator ei;
> > > +
> > > +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
> > > +    {
> > > +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
> > > +        continue;
> > > +
> > > +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
> > > +        continue;
> > > +
> > > +      /* The branch can be reached through other path, not just from the
> > > +         conditional statement. */
> > > +      return false;
> > > +    }
> > > +
> > > +  return true;
> > > +}
> > > +
> > > +/* Find out which branch of a conditional statement is invariant. That
> > > +   is: once the branch is selected in certain loop iteration, any operand
> > > +   that contributes to computation of the conditional statement remains
> > > +   unchanged in all following iterations. */
> > > +
> > > +static int
> > > +get_cond_invariant_branch (struct loop *loop, gcond *cond)
> > > +{
> > > +  basic_block cond_bb = gimple_bb (cond);
> > > +  basic_block targ_bb[2];
> > > +  bool invar[2];
> > > +  unsigned invar_checks;
> > > +
> > > +  for (unsigned i = 0; i < 2; i++)
> > > +    {
> > > +      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
> > > +
> > > +      /* One branch directs to loop exit, no need to perform loop split upon
> > > +         this conditional statement. Firstly, it is trivial if the exit
> > > +         branch is semi-invariant, for the statement is just loop-breaking.
> > > +         Secondly, if the opposite branch is semi-invariant, it means that
> > > +         the statement is real loop-invariant, which is covered by loop
> > > +         unswitch. */
> > > +      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
> > > +        return -1;
> > > +    }
> > > +
> > > +  invar_checks = 0;
> > > +
> > > +  for (unsigned i = 0; i < 2; i++)
> > > +    {
> > > +      invar[!i] = false;
> > > +
> > > +      if (!can_branch_be_excluded (targ_bb[i]))
> > > +        continue;
> > > +
> > > +      /* Given a semi-invariant branch, if its opposite branch dominates
> > > +         loop latch, it and its following trace will only be executed in
> > > +         final iteration of loop, namely it is not part of repeated body
> > > +         of the loop. Similar to the above case that the branch is loop
> > > +         exit, no need to split loop. */
> > > +      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
> > > +        continue;
> > > +
> > > +      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
> > > +      invar_checks++;
> > > +    }
> > > +
> > > +  /* Both branches being invariant (handled by loop unswitch), or both
> > > +     being variant, is not what we want. */
> > > +  if (invar[0] ^ !invar[1])
> > > +    return -1;
> > > +
> > > +  /* Found a real loop-invariant condition, do nothing. */
> > > +  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
> > > +    return -1;
> > > +
> > > +  return invar[1];
> > > +}
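The final selection in get_cond_invariant_branch () is compact, so here is a small sketch of just that step. pick_branch () is a hypothetical helper, not part of the patch, and it deliberately leaves out the invar_checks test for a truly loop-invariant condition.

```cpp
#include <cassert>

// Hypothetical helper mirroring the last lines of
// get_cond_invariant_branch ().  INVAR0 and INVAR1 say whether branch 0
// and branch 1 were found to be semi-invariant.  Returns the index of the
// semi-invariant branch, or -1 unless exactly one branch qualifies.
int pick_branch (bool invar0, bool invar1)
{
  // "a ^ !b" is true exactly when a == b, i.e. both branches are
  // invariant or both are variant.
  if (invar0 ^ !invar1)
    return -1;

  // Exactly one of the two flags is set at this point.
  assert (invar0 != invar1);
  return invar1;   // 1 if branch 1 is the invariant one, otherwise 0
}
```

So the XOR rejects the two "equal" rows of the truth table and lets only the mixed rows through; writing it as invar[0] == invar[1] would express the same test.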
> > > +
> > > +/* Return TRUE if conditional statement in a normal loop is also inside
> > > +   a nested non-recognized loop, such as an irreducible loop. */
> > > +
> > > +static bool
> > > +is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb,
> > > +                        int branch)
> > > +{
> > > +  basic_block branch_bb = EDGE_SUCC (cond_bb, branch)->dest;
> > > +
> > > +  if (cond_bb == loop->header || branch_bb == loop->latch)
> > > +    return false;
> > > +
> > > +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> > > +  auto_vec<basic_block> worklist;
> > > +
> > > +  for (unsigned i = 0; i < loop->num_nodes; i++)
> > > +    bbs[i]->flags &= ~BB_REACHABLE;
> > > +
> > > +  /* Mark latch basic block as visited to be end point for reachability
> > > +     traversal. */
> > > +  loop->latch->flags |= BB_REACHABLE;
> > > +
> > > +  gcc_assert (flow_bb_inside_loop_p (loop, branch_bb));
> > > +
> > > +  /* Start from specified branch, the opposite branch is ignored for it
> > > +     will not be executed. */
> > > +  branch_bb->flags |= BB_REACHABLE;
> > > +  worklist.safe_push (branch_bb);
> > > +
> > > +  do
> > > +    {
> > > +      basic_block bb = worklist.pop ();
> > > +      edge e;
> > > +      edge_iterator ei;
> > > +
> > > +      FOR_EACH_EDGE (e, ei, bb->succs)
> > > +        {
> > > +          basic_block succ_bb = e->dest;
> > > +
> > > +          if (succ_bb == cond_bb)
> > > +            return true;
> > > +
> > > +          if (!flow_bb_inside_loop_p (loop, succ_bb))
> > > +            continue;
> > > +
> > > +          if (succ_bb->flags & BB_REACHABLE)
> > > +            continue;
> > > +
> > > +          succ_bb->flags |= BB_REACHABLE;
> > > +          worklist.safe_push (succ_bb);
> > > +        }
> > > +    } while (!worklist.is_empty ());
> > > +
> > > +  return false;
> > > +}
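As a reading aid, the traversal in is_cond_in_hidden_loop () can be sketched on a toy CFG. Everything below is an assumption made for illustration: blocks are plain indices, succs[b] lists the successors of block b, all blocks are taken to be inside the loop, and a bool vector replaces the BB_REACHABLE flag.

```cpp
#include <cassert>
#include <vector>

// Toy model of the walk: starting from BRANCH_BB, follow successor edges
// without passing through the latch, and report whether COND_BB can be
// reached again within the same iteration.
bool cond_reachable_again (const std::vector<std::vector<int>> &succs,
                           int header, int latch, int cond_bb, int branch_bb)
{
  assert (!succs.empty ());

  if (cond_bb == header || branch_bb == latch)
    return false;

  std::vector<bool> seen (succs.size (), false);
  std::vector<int> worklist;

  seen[latch] = true;        // the latch is the end point of the walk
  seen[branch_bb] = true;
  worklist.push_back (branch_bb);

  while (!worklist.empty ())
    {
      int bb = worklist.back ();
      worklist.pop_back ();

      for (int succ : succs[bb])
        {
          if (succ == cond_bb)
            return true;     // back at the condition before the latch
          if (seen[succ])
            continue;
          seen[succ] = true;
          worklist.push_back (succ);
        }
    }
  return false;
}
```

With blocks 0 (header), 1 (condition), 2 and 3 (branches) and 4 (latch), a plain diamond gives false for either branch, while adding an edge from block 2 back to block 1 makes the walk from block 2 return true.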
> > > +
> > > +
> > > +/* Calculate increased code size measured by estimated insn number if
> > > +   applying loop split upon certain branch of a conditional statement. */
> > > +
> > > +static int
> > > +compute_added_num_insns (struct loop *loop, const_basic_block cond_bb,
> > > +                         int branch)
> > > +{
> > > +  const_basic_block targ_bb_var = EDGE_SUCC (cond_bb, !branch)->dest;
> > > +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> > > +  int num = 0;
> > > +
> > > +  for (unsigned i = 0; i < loop->num_nodes; i++)
> > > +    {
> > > +      /* Do not count basic blocks only in opposite branch. */
> > > +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var))
> > > +        continue;
> > > +
> > > +      for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi);
> > > +           gsi_next (&gsi))
> > > +        num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);
> > > +    }
> > > +
> > > +  return num;
> > > +}
> > > +
> > > +/* Return true if it is eligible and profitable to perform loop split upon
> > > +   a conditional statement. */
> > > +
> > > +static bool
> > > +can_split_loop_on_cond (struct loop *loop, gcond *cond)
> > > +{
> > > +  int branch = get_cond_invariant_branch (loop, cond);
> > > +
> > > +  if (branch < 0)
> > > +    return false;
> > > +
> > > +  basic_block cond_bb = gimple_bb (cond);
> > > +
> > > +  /* Add a threshold for increased code size to disable loop split. */
> > > +  if (compute_added_num_insns (loop, cond_bb, branch) >
> > > +      PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))
> > > +    return false;
> > > +
> > > +  /* In each iteration, conditional statement candidate should be
> > > +     executed only once. */
> > > +  if (is_cond_in_hidden_loop (loop, cond_bb, branch))
> > > +    return false;
> > > +
> > > +  profile_probability prob = EDGE_SUCC (cond_bb, branch)->probability;
> > > +
> > > +  /* When accurate profile information is available, and execution
> > > +     frequency of the branch is too low, just let it go. */
> > > +  if (prob.reliable_p ())
> > > +    {
> > > +      int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB);
> > > +
> > > +      if (prob < profile_probability::always ().apply_scale (thres, 100))
> > > +        return false;
> > > +    }
> > > +
> > > +  /* Temporarily keep branch index in conditional statement. */
> > > +  gimple_set_plf (cond, GF_PLF_1, branch);
> > > +  return true;
> > > +}
> > > +
> > > +/* Traverse all conditional statements in a loop, to find out a good
> > > +   candidate upon which we can do loop split. */
> > > +
> > > +static bool
> > > +mark_cond_to_split_loop (struct loop *loop)
> > > +{
> > > +  split_info *info = new split_info ();
> > > +  basic_block *bbs = info->bbs = get_loop_body (loop);
> > > +
> > > +  /* Allocate an area to keep temporary info, and associate its address
> > > +     with loop aux field. */
> > > +  loop->aux = info;
> > > +
> > > +  for (unsigned i = 0; i < loop->num_nodes; i++)
> > > +    {
> > > +      basic_block bb = bbs[i];
> > > +
> > > +      /* Skip statement in inner recognized loop, because we want that
> > > +         conditional statement executes at most once in each iteration. */
> > > +      if (bb->loop_father != loop)
> > > +        continue;
> > > +
> > > +      /* Actually this check is not strictly required. With it, we can
> > > +         ensure conditional statement will execute at least once in
> > > +         each iteration. */
> > > +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> > > +        continue;
> > > +
> > > +      gimple *last = last_stmt (bb);
> > > +
> > > +      if (!last || gimple_code (last) != GIMPLE_COND)
> > > +        continue;
> > > +
> > > +      gcond *cond = as_a <gcond *> (last);
> > > +
> > > +      if (can_split_loop_on_cond (loop, cond))
> > > +        {
> > > +          info->cond = cond;
> > > +          return true;
> > > +        }
> > > +    }
> > > +
> > > +  delete info;
> > > +  loop->aux = NULL;
> > > +
> > > +  return false;
> > > +}
> > > +
> > > +/* Given a loop with a chosen conditional statement candidate, perform loop
> > > +   split transformation illustrated as the following graph.
> > > +
> > > +               .-------T------ if (true) ------F------.
> > > +               |                    .---------------. |
> > > +               |                    |               | |
> > > +               v                    |               v v
> > > +          pre-header                |            pre-header
> > > +               | .------------.     |                 | .------------.
> > > +               | |            |     |                 | |            |
> > > +               | v            |     |                 | v            |
> > > +             header           |     |               header           |
> > > +               |              |     |                 |              |
> > > +       [ bool r = cond; ]     |     |                 |              |
> > > +               |              |     |                 |              |
> > > +      .---- if (r) -----.     |     |        .--- if (true) ---.     |
> > > +      |                 |     |     |        |                 |     |
> > > +  invariant             |     |     |    invariant             |     |
> > > +      |                 |     |     |        |                 |     |
> > > +      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
> > > +               |              |    /                  |              |
> > > +             stmts            |   /                 stmts            |
> > > +               |              |  /                    |              |
> > > +              / \             | /                    / \             |
> > > +     .-------*   *       [ if (!r) ]        .-------*   *            |
> > > +     |           |            |             |           |            |
> > > +     |         latch          |             |         latch          |
> > > +     |           |            |             |           |            |
> > > +     |           '------------'             |           '------------'
> > > +     '------------------------. .-----------'
> > > +             loop1            | |                   loop2
> > > +                              v v
> > > +                             exits
> > > +
> > > +   In the graph, loop1 represents the part derived from original one, and
> > > +   loop2 is duplicated using loop_version (), which corresponds to the part
> > > +   of original one being split out. In loop1, a new bool temporary (r)
> > > +   is introduced to keep value of the condition result. In original latch
> > > +   edge of loop1, we insert a new conditional statement whose value comes
> > > +   from previous temporary (r), one of its branches goes back to loop1 header
> > > +   as a latch edge, and the other branch goes to loop2 pre-header as an
> > > +   entry edge. And also in loop2, we abandon the variant branch of the
> > > +   conditional statement candidate by setting a constant bool condition,
> > > +   based on which branch is semi-invariant. */
> > > +
> > > +static bool
> > > +split_loop_for_cond (struct loop *loop1)
> > > +{
> > > +  split_info *info = (split_info *) loop1->aux;
> > > +  gcond *cond = info->cond;
> > > +  basic_block cond_bb = gimple_bb (cond);
> > > +  int branch = gimple_plf (cond, GF_PLF_1);
> > > +  bool true_invar = !!(EDGE_SUCC (cond_bb, branch)->flags & EDGE_TRUE_VALUE);
> > > +
> > > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > > +   {
> > > +     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
> > > +              current_function_name (), loop1->num,
> > > +              true_invar ? "T" : "F", cond_bb->index);
> > > +     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
> > > +   }
> > > +
> > > +  initialize_original_copy_tables ();
> > > +
> > > +  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> > > +                                     profile_probability::always (),
> > > +                                     profile_probability::never (),
> > > +                                     profile_probability::always (),
> > > +                                     profile_probability::always (),
> > > +                                     true);
> > > +  if (!loop2)
> > > +    {
> > > +      free_original_copy_tables ();
> > > +      return false;
> > > +    }
> > > +
> > > +  /* Generate a bool type temporary to hold result of the condition. */
> > > +  tree tmp = make_ssa_name (boolean_type_node);
> > > +  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> > > +  gimple *stmt = gimple_build_assign (tmp,
> > > +                                      gimple_cond_code (cond),
> > > +                                      gimple_cond_lhs (cond),
> > > +                                      gimple_cond_rhs (cond));
> > > +
> > > +  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
> > > +  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
> > > +  update_stmt (cond);
> > > +
> > > +  /* Replace the condition in loop2 with a bool constant to let pass
> > > +     manager remove the variant branch after current pass finishes. */
> > > +  basic_block cond_bb_copy = get_bb_copy (cond_bb);
> > > +  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
> > > +
> > > +  if (true_invar)
> > > +    gimple_cond_make_true (cond_copy);
> > > +  else
> > > +    gimple_cond_make_false (cond_copy);
> > > +
> > > +  update_stmt (cond_copy);
> > > +
> > > +  /* Insert a new conditional statement on latch edge of loop1. This
> > > +     statement acts as a switch to transfer execution from loop1 to
> > > +     loop2, when loop1 enters into invariant state. */
> > > +  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
> > > +  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
> > > +  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
> > > +                                          NULL_TREE, NULL_TREE);
> > > +
> > > +  gsi = gsi_last_bb (break_bb);
> > > +  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
> > > +
> > > +  edge to_loop1 = single_succ_edge (break_bb);
> > > +  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
> > > +
> > > +  to_loop1->flags &= ~EDGE_FALLTHRU;
> > > +
> > > +  if (true_invar)
> > > +    {
> > > +      to_loop1->flags |= EDGE_FALSE_VALUE;
> > > +      to_loop2->flags |= EDGE_TRUE_VALUE;
> > > +    }
> > > +  else
> > > +    {
> > > +      to_loop1->flags |= EDGE_TRUE_VALUE;
> > > +      to_loop2->flags |= EDGE_FALSE_VALUE;
> > > +    }
> > > +
> > > +  update_ssa (TODO_update_ssa);
> > > +
> > > +  /* Due to introduction of a control flow edge from loop1 latch to loop2
> > > +     pre-header, we should update PHIs in loop2 to reflect this connection
> > > +     between loop1 and loop2. */
> > > +  connect_loop_phis (loop1, loop2, to_loop2);
> > > +
> > > +  free_original_copy_tables ();
> > > +
> > > +  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
> > > +
> > > +  return true;
> > > +}
> > > +
> > > +/* Main entry point to perform loop splitting for suitable if-conditions
> > > +   in all loops. */
> > > +
> > > +static unsigned int
> > > +tree_ssa_split_loops_for_cond (void)
> > > +{
> > > +  struct loop *loop;
> > > +  auto_vec<struct loop *> loop_list;
> > > +  bool changed = false;
> > > +  unsigned i;
> > > +
> > > +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> > > +    loop->aux = NULL;
> > > +
> > > +  /* Go through all loops starting from innermost. */
> > > +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> > > +    {
> > > +      /* Put the loop in a list if a conditional statement candidate is
> > > +         found in it. This is the analysis stage; nothing in the function
> > > +         is changed. */
> > > +      if (!loop->aux
> > > +          && !optimize_loop_for_size_p (loop)
> > > +          && mark_cond_to_split_loop (loop))
> > > +        loop_list.safe_push (loop);
> > > +
> > > +      /* If any of our inner loops was split, don't split us,
> > > +         and mark our containing loop as having had splits as well. */
> > > +      loop_outer (loop)->aux = loop->aux;
> > > +    }
> > > +
> > > +  FOR_EACH_VEC_ELT (loop_list, i, loop)
> > > +    {
> > > +      /* Extract selected loop and perform loop split. This is the
> > > +         transformation stage. */
> > > +      changed |= split_loop_for_cond (loop);
> > > +
> > > +      delete (split_info *) loop->aux;
> > > +    }
> > > +
> > > +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> > > +    loop->aux = NULL;
> > > +
> > > +  if (changed)
> > > +    return TODO_cleanup_cfg;
> > > +  return 0;
> > > +}
> > > +
> > > +
> > >  /* Loop splitting pass.  */
> > >
> > >  namespace {
> > > @@ -716,3 +1517,48 @@ make_pass_loop_split (gcc::context *ctxt)
> > >  {
> > >    return new pass_loop_split (ctxt);
> > >  }
> > > +
> > > +namespace {
> > > +
> > > +const pass_data pass_data_cond_loop_split =
> > > +{
> > > +  GIMPLE_PASS, /* type */
> > > +  "cond_lsplit", /* name */
> > > +  OPTGROUP_LOOP, /* optinfo_flags */
> > > +  TV_COND_LOOP_SPLIT, /* tv_id */
> > > +  PROP_cfg, /* properties_required */
> > > +  0, /* properties_provided */
> > > +  0, /* properties_destroyed */
> > > +  0, /* todo_flags_start */
> > > +  0, /* todo_flags_finish */
> > > +};
> > > +
> > > +class pass_cond_loop_split : public gimple_opt_pass
> > > +{
> > > +public:
> > > +  pass_cond_loop_split (gcc::context *ctxt)
> > > +    : gimple_opt_pass (pass_data_cond_loop_split, ctxt)
> > > +  {}
> > > +
> > > +  /* opt_pass methods: */
> > > +  virtual bool gate (function *) { return flag_split_loops != 0; }
> > > +  virtual unsigned int execute (function *);
> > > +
> > > +}; // class pass_cond_loop_split
> > > +
> > > +unsigned int
> > > +pass_cond_loop_split::execute (function *fun)
> > > +{
> > > +  if (number_of_loops (fun) <= 1)
> > > +    return 0;
> > > +
> > > +  return tree_ssa_split_loops_for_cond ();
> > > +}
> > > +
> > > +} // anon namespace
> > > +
> > > +gimple_opt_pass *
> > > +make_pass_cond_loop_split (gcc::context *ctxt)
> > > +{
> > > +  return new pass_cond_loop_split (ctxt);
> > > +}

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-03-13 12:11       ` Richard Biener
@ 2019-03-13 12:39         ` Kyrill Tkachov
  0 siblings, 0 replies; 31+ messages in thread
From: Kyrill Tkachov @ 2019-03-13 12:39 UTC (permalink / raw)
  To: Richard Biener; +Cc: Feng Xue OS, gcc-patches


On 3/13/19 12:07 PM, Richard Biener wrote:
> On Wed, Mar 13, 2019 at 10:40 AM Kyrill Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
>> Hi Feng,
>>
>> On 3/13/19 1:56 AM, Feng Xue OS wrote:
>>> Richard,
>>>
>>>      Thanks for your comment. Yes, it is like kind of jump threading
>>> with knowledge of loop structure. And what is rough time for GCC 10?
>>>
>>>
>> GCC 10 will be released once the number of P1 regressions gets down to
>> zero. Past experience shows that it's around the April/May timeframe.
> Note GCC 10 is due only next year.

Errr, yes. I meant that GCC 10 *development* will start once GCC 9 is 
*released*.

Thanks,

Kyrill

>
>> In the meantime my comment on the patch is that you should add some
>> tests to the testsuite that showcase this transformation.
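
A testcase along these lines might look like the sketch below. It is only an illustration, not something from the patch: the function names are made up, the dump name is assumed from the pass name "cond_lsplit", the scan pattern is assumed from the "split loop %d at branch" dump message in split_loop_for_cond, and whether the pass really fires on this loop has not been checked.

```cpp
/* Hypothetical testcase sketch; directives and scan pattern are assumptions.  */
/* { dg-do compile } */
/* { dg-options "-O3 -fdump-tree-cond_lsplit-details" } */

/* Stand-in for do_something (); kept out of line so the call stays a call.  */
__attribute__ ((noinline)) int
still_positive (int v)
{
  return v > 0;
}

/* "if (b)" is semi-invariant: only the true branch can change b, so once
   the false branch is taken it is taken for the rest of the loop.  */
__attribute__ ((noinline)) int
count_after_flag_clears (const int *data, int n)
{
  int b = 1;
  int count = 0;

  for (int i = 0; i < n; i++)
    {
      if (b)
        b = still_positive (data[i]);
      else
        count++;
    }

  return count;
}

/* { dg-final { scan-tree-dump "split loop" "cond_lsplit" } } */
```

A runtime variant (dg-do run) that checks the result would also guard against wrong code, not only against the pass not triggering.
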
>>
>> Thanks,
>>
>> Kyrill
>>
>>
>>> Regards,
>>>
>>> Feng
>>>
>>>
>>> ________________________________
>>> From: Richard Biener <richard.guenther@gmail.com>
>>> Sent: Tuesday, March 12, 2019 4:31:49 PM
>>> To: Feng Xue OS
>>> Cc: gcc-patches@gcc.gnu.org
>>> Subject: Re: [PATCH] Loop split upon semi-invariant condition (PR
>>> tree-optimization/89134)
>>>
>>> On Tue, Mar 12, 2019 at 7:20 AM Feng Xue OS
>>> <fxue@os.amperecomputing.com> wrote:
>>>> This patch is composed to implement a loop transformation on one of
>>>> its conditional statements, which we call it semi-invariant, in that
>>>> its computation is impacted in only one of its branches.
>>>> Suppose a loop as:
>>>>
>>>>      void f (std::map<int, int> m)
>>>>      {
>>>>          for (auto it = m.begin (); it != m.end (); ++it) {
>>>>              /* if (b) is semi-invariant. */
>>>>              if (b) {
>>>>                  b = do_something();    /* Has effect on b */
>>>>              } else {
>>>> /* No effect on b */
>>>>              }
>>>>              statements;                      /* Also no effect on b */
>>>>          }
>>>>      }
>>>>
>>>> A transformation, kind of loop split, could be:
>>>>
>>>>      void f (std::map<int, int> m)
>>>>      {
>>>>          for (auto it = m.begin (); it != m.end (); ++it) {
>>>>              if (b) {
>>>>                  b = do_something();
>>>>              } else {
>>>>                  ++it;
>>>>                  statements;
>>>>                  break;
>>>>              }
>>>>              statements;
>>>>          }
>>>>
>>>>          for (; it != m.end (); ++it) {
>>>>              statements;
>>>>          }
>>>>      }
>>>>
>>>> If "statements" contains nothing, the second loop becomes an empty
>>>> one, which can be removed. (This part will be given in another patch).
>>>> And if "statements" are straight line instructions, we get an
>>>> opportunity to vectorize the second loop. In practice, this
>>>> optimization is found to improve some real application by 7%.
>>>> Since it is just a kind of loop split, the code is mainly placed
>>>> in existing tree-ssa-loop-split module, and is controlled by
>>>> -fsplit-loops, and is enabled with -O3.
>>>
>>> Note the transform itself is jump-threading with the threading
>>> duplicating a whole CFG cycle.
>>>
>>> I didn't look at the patch details yet since this is suitable for GCC
>>> 10 only.
>>>
>>> Thanks for implementing this.
>>> Richard.
>>>
>>>> Feng
>>>>
>>>>
>>>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>>>> index 64bf6017d16..a6c2878d652 100644
>>>> --- a/gcc/ChangeLog
>>>> +++ b/gcc/ChangeLog
>>>> @@ -1,3 +1,23 @@
>>>> +2019-03-12  Feng Xue <fxue@os.amperecomputing.com>
>>>> +
>>>> +       PR tree-optimization/89134
>>>> +       * doc/invoke.texi (max-cond-loop-split-insns): Document new --params.
>>>> +       (min-cond-loop-split-prob): Likewise.
>>>> +       * params.def: Add max-cond-loop-split-insns, min-cond-loop-split-prob.
>>>> +       * passes.def (pass_cond_loop_split) : New pass.
>>>> +       * timevar.def (TV_COND_LOOP_SPLIT): New time variable.
>>>> +       * tree-pass.h (make_pass_cond_loop_split): New declaration.
>>>> +       * tree-ssa-loop-split.c (split_info): New class.
>>>> +       (find_vdef_in_loop, vuse_semi_invariant_p): New functions.
>>>> +       (ssa_semi_invariant_p, stmt_semi_invariant_p): Likewise.
>>>> +       (can_branch_be_excluded, get_cond_invariant_branch): Likewise.
>>>> +       (is_cond_in_hidden_loop, compute_added_num_insns): Likewise.
>>>> +       (can_split_loop_on_cond, mark_cond_to_split_loop): Likewise.
>>>> +       (split_loop_for_cond, tree_ssa_split_loops_for_cond): Likewise.
>>>> +       (pass_data_cond_loop_split): New variable.
>>>> +       (pass_cond_loop_split): New class.
>>>> +       (make_pass_cond_loop_split): New function.
>>>> +
>>>>   2019-03-11  Jakub Jelinek  <jakub@redhat.com>
>>>>
>>>>          PR middle-end/89655
>>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>>>> index df0883f2fc9..f5e09bd71fd 100644
>>>> --- a/gcc/doc/invoke.texi
>>>> +++ b/gcc/doc/invoke.texi
>>>> @@ -11316,6 +11316,14 @@ The maximum number of branches unswitched in a single loop.
>>>>   @item lim-expensive
>>>>   The minimum cost of an expensive expression in the loop invariant motion.
>>>> +@item max-cond-loop-split-insns
>>>> +The maximum number of insns to be increased due to loop split on
>>>> +semi-invariant condition statement.
>>>> +
>>>> +@item min-cond-loop-split-prob
>>>> +The minimum threshold for probability of semi-invariant condition
>>>> +statement to trigger loop split.
>>>> +
>>>>   @item iv-consider-all-candidates-bound
>>>>   Bound on number of candidates for induction variables, below which
>>>>   all candidates are considered for each use in induction variable
>>>> diff --git a/gcc/params.def b/gcc/params.def
>>>> index 3f1576448be..2e067526958 100644
>>>> --- a/gcc/params.def
>>>> +++ b/gcc/params.def
>>>> @@ -386,6 +386,18 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
>>>>          "The maximum number of unswitchings in a single loop.",
>>>>          3, 0, 0)
>>>>
>>>> +/* The maximum number of increased insns due to loop split on semi-invariant
>>>> +   condition statement.  */
>>>> +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
>>>> +       "max-cond-loop-split-insns",
>>>> +       "The maximum number of insns to be increased due to loop split on semi-invariant condition statement.",
>>>> +       100, 0, 0)
>>>> +
>>>> +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
>>>> +       "min-cond-loop-split-prob",
>>>> +       "The minimum threshold for probability of semi-invariant condition statement to trigger loop split.",
>>>> +       30, 0, 100)
>>>> +
>>>>   /* The maximum number of insns in loop header duplicated by the copy loop
>>>>      headers pass.  */
>>>>   DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
>>>> diff --git a/gcc/passes.def b/gcc/passes.def
>>>> index 446a7c48276..bde7f4c50c0 100644
>>>> --- a/gcc/passes.def
>>>> +++ b/gcc/passes.def
>>>> @@ -265,6 +265,7 @@ along with GCC; see the file COPYING3.  If not see
>>>>            NEXT_PASS (pass_tree_unswitch);
>>>>            NEXT_PASS (pass_scev_cprop);
>>>>            NEXT_PASS (pass_loop_split);
>>>> +         NEXT_PASS (pass_cond_loop_split);
>>>>            NEXT_PASS (pass_loop_versioning);
>>>>            NEXT_PASS (pass_loop_jam);
>>>>            /* All unswitching, final value replacement and splitting can expose
>>>> diff --git a/gcc/timevar.def b/gcc/timevar.def
>>>> index 54154464a58..39f2df0e3ec 100644
>>>> --- a/gcc/timevar.def
>>>> +++ b/gcc/timevar.def
>>>> @@ -189,6 +189,7 @@ DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
>>>>   DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
>>>>   DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
>>>>   DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
>>>> +DEFTIMEVAR (TV_COND_LOOP_SPLIT       , "loop splitting for conditions")
>>>>   DEFTIMEVAR (TV_LOOP_JAM              , "unroll and jam")
>>>>   DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
>>>>   DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
>>>> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
>>>> index 47be59b2a11..f441ba36871 100644
>>>> --- a/gcc/tree-pass.h
>>>> +++ b/gcc/tree-pass.h
>>>> @@ -367,6 +367,7 @@ extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
>>>>   extern gimple_opt_pass *make_pass_linterchange (gcc::context *ctxt);
>>>>   extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
>>>>   extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
>>>> +extern gimple_opt_pass *make_pass_cond_loop_split (gcc::context *ctxt);
>>>>   extern gimple_opt_pass *make_pass_loop_jam (gcc::context *ctxt);
>>>>   extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
>>>>   extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>>>> index 999c9a30366..d287a0d7d4c 100644
>>>> --- a/gcc/tree-ssa-loop-split.c
>>>> +++ b/gcc/tree-ssa-loop-split.c
>>>> @@ -32,7 +32,9 @@ along with GCC; see the file COPYING3.  If not see
>>>>   #include "tree-ssa-loop.h"
>>>>   #include "tree-ssa-loop-manip.h"
>>>>   #include "tree-into-ssa.h"
>>>> +#include "tree-inline.h"
>>>>   #include "cfgloop.h"
>>>> +#include "params.h"
>>>>   #include "tree-scalar-evolution.h"
>>>>   #include "gimple-iterator.h"
>>>>   #include "gimple-pretty-print.h"
>>>> @@ -40,7 +42,9 @@ along with GCC; see the file COPYING3.  If not see
>>>>   #include "gimple-fold.h"
>>>>   #include "gimplify-me.h"
>>>>
>>>> -/* This file implements loop splitting, i.e. transformation of loops like
>>>> +/* This file implements two kinds of loop splitting.
>>>> +
>>>> +   One transformation of loops like:
>>>>
>>>>      for (i = 0; i < 100; i++)
>>>>        {
>>>> @@ -670,6 +674,803 @@ tree_ssa_split_loops (void)
>>>>     return 0;
>>>>   }
>>>>
>>>> +
>>>> +/* Another transformation of loops like:
>>>> +
>>>> +   for (i = INIT (); CHECK (i); i = NEXT ())
>>>> +     {
>>>> +       if (expr (a_1, a_2, ..., a_n))
>>>> +         a_j = ...;  // change at least one a_j
>>>> +       else
>>>> +         S;          // not change any a_j
>>>> +     }
>>>> +
>>>> +   into:
>>>> +
>>>> +   for (i = INIT (); CHECK (i); i = NEXT ())
>>>> +     {
>>>> +       if (expr (a_1, a_2, ..., a_n))
>>>> +         a_j = ...;
>>>> +       else
>>>> +         {
>>>> +           S;
>>>> +           i = NEXT ();
>>>> +           break;
>>>> +         }
>>>> +     }
>>>> +
>>>> +   for (; CHECK (i); i = NEXT ())
>>>> +     {
>>>> +       S;
>>>> +     }
>>>> +
>>>> +   */
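
To make the comment above concrete, here is a minimal source-level sketch of the same rewrite, assuming a flag b that only the true branch can clear. keep_going (), original_loop () and split_loop () are hypothetical names for illustration; the pass itself works on GIMPLE and the CFG, not on source text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the "a_j = ..." computation: the flag stays
// set until a negative element is seen.
static bool keep_going (int v) { return v >= 0; }

// Original form: "if (b)" is semi-invariant, because only the true branch
// can change b.
static long original_loop (const std::vector<int> &data)
{
  bool b = true;
  long sum = 0;
  for (std::size_t i = 0; i < data.size (); ++i)
    {
      if (b)
        b = keep_going (data[i]);   // may change b
      else
        sum += data[i];             // S: never changes b
    }
  return sum;
}

// Split form: the first time the else branch runs, do S, advance the
// iterator once, and fall through into a second loop containing only S.
static long split_loop (const std::vector<int> &data)
{
  bool b = true;
  long sum = 0;
  std::size_t i = 0;
  for (; i < data.size (); ++i)
    {
      if (b)
        b = keep_going (data[i]);
      else
        {
          sum += data[i];           // S
          ++i;                      // i = NEXT ()
          break;
        }
    }
  for (; i < data.size (); ++i)
    sum += data[i];                 // S only; the condition is gone
  return sum;
}
```

Both functions should return the same value for any input; the second loop no longer tests b, which is what opens the door to vectorizing it or removing it when S is empty.
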
>>>> +
>>>> +/* Data structure to hold temporary information during loop split upon
>>>> +   semi-invariant conditional statement. */
>>>> +class split_info {
>>>> +public:
>>>> +  /* Array of all basic blocks in a loop, returned by get_loop_body(). */
>>>> +  basic_block *bbs;
>>>> +
>>>> +  /* All memory store/clobber statements in a loop. */
>>>> +  auto_vec<gimple *> stores;
>>>> +
>>>> +  /* Whether above memory stores vector has been filled. */
>>>> +  bool set_stores;
>>>> +
>>>> +  /* Semi-invariant conditional statement, upon which to split loop. */
>>>> +  gcond *cond;
>>>> +
>>>> +  split_info () : bbs (NULL),  set_stores (false), cond (NULL) { }
>>>> +
>>>> +  ~split_info ()
>>>> +    {
>>>> +      if (bbs)
>>>> +        free (bbs);
>>>> +    }
>>>> +};
>>>> +
>>>> +/* Find all statements with memory-write effect in a loop, including memory
>>>> +   store and non-pure function call, and keep those in a vector. This work
>>>> +   is only done for one time, for the vector should be constant during
>>>> +   analysis stage of semi-invariant condition. */
>>>> +
>>>> +static void
>>>> +find_vdef_in_loop (struct loop *loop)
>>>> +{
>>>> +  split_info *info = (split_info *) loop->aux;
>>>> +  gphi *vphi = get_virtual_phi (loop->header);
>>>> +
>>>> +  /* Indicate memory store vector has been filled. */
>>>> +  info->set_stores = true;
>>>> +
>>>> +  /* If loop contains memory operation, there must be a virtual PHI node in
>>>> +     loop header basic block. */
>>>> +  if (vphi == NULL)
>>>> +    return;
>>>> +
>>>> +  /* All virtual SSA names inside the loop are connected to be a cyclic
>>>> +     graph via virtual PHI nodes. The virtual PHI node in loop header just
>>>> +     links the first and the last virtual SSA names, by using the last as
>>>> +     PHI operand to define the first. */
>>>> +  const edge latch = loop_latch_edge (loop);
>>>> +  const tree first = gimple_phi_result (vphi);
>>>> +  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
>>>> +
>>>> +  /* The virtual SSA cyclic graph might consist of only one SSA name, which
>>>> +     is defined by itself.
>>>> +
>>>> +        .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
>>>> +
>>>> +     This means the loop contains only memory loads, so we can skip it. */
>>>> +  if (first == last)
>>>> +    return;
>>>> +
>>>> +  auto_vec<gimple *> others;
>>>> +  auto_vec<tree> worklist;
>>>> +  auto_bitmap visited;
>>>> +
>>>> +  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
>>>> +  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
>>>> +  worklist.safe_push (last);
>>>> +
>>>> +  do
>>>> +    {
>>>> +      tree vuse = worklist.pop ();
>>>> +      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
>>>> +
>>>> +      /* We mark the first and last SSA names as visited at the beginning,
>>>> +         and reversely start the process from the last SSA name toward the
>>>> +         first, which ensures that this do-while will not touch SSA names
>>>> +         defined outside of the loop. */
>>>> +      gcc_assert (gimple_bb (stmt)
>>>> +                  && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
>>>> +
>>>> +      if (gimple_code (stmt) == GIMPLE_PHI)
>>>> +        {
>>>> +          gphi *phi = as_a <gphi *> (stmt);
>>>> +
>>>> +          for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
>>>> +            {
>>>> +              tree arg = gimple_phi_arg_def (stmt, i);
>>>> +
>>>> +              if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
>>>> +                worklist.safe_push (arg);
>>>> +            }
>>>> +        }
>>>> +      else
>>>> +        {
>>>> +          tree prev = gimple_vuse (stmt);
>>>> +
>>>> +          /* Non-pure call statement is conservatively assumed to impact
>>>> +             all memory locations. So place call statements ahead of other
>>>> +             memory stores in the vector with the idea of using them as
>>>> +             shortcut terminators to memory alias analysis, kind of
>>>> +             optimization for compilation. */
>>>> +          if (gimple_code (stmt) == GIMPLE_CALL)
>>>> +            info->stores.safe_push (stmt);
>>>> +          else
>>>> +            others.safe_push (stmt);
>>>> +
>>>> +          if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
>>>> +            worklist.safe_push (prev);
>>>> +        }
>>>> +    } while (!worklist.is_empty ());
>>>> +
>>>> +  info->stores.safe_splice (others);
>>>> +}
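
The backward walk in find_vdef_in_loop can be pictured with a toy model. The sketch below is an assumption-laden illustration, not GCC code: VDef, Kind, example_defs () and collect_stores () are invented names, integer ids stand in for virtual SSA names, and a vector of flags stands in for the visited bitmap.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a virtual definition; not GCC's data structures.
enum class Kind { Phi, Call, Store };

struct VDef
{
  Kind kind;
  std::vector<int> prev;   // ids of the virtual operands this def uses
  std::string text;        // label for the toy statement
};

// Walk backwards from LAST toward FIRST and return the statements seen,
// with calls placed ahead of plain stores.
static std::vector<std::string>
collect_stores (const std::vector<VDef> &defs, int first, int last)
{
  std::vector<std::string> calls, others;
  if (first == last)
    return calls;                  // the loop only reads memory

  std::vector<bool> visited (defs.size (), false);
  std::vector<int> worklist;
  visited[first] = visited[last] = true;   // FIRST is never expanded
  worklist.push_back (last);

  while (!worklist.empty ())
    {
      int id = worklist.back ();
      worklist.pop_back ();
      const VDef &d = defs[id];

      if (d.kind == Kind::Call)
        calls.push_back (d.text);
      else if (d.kind == Kind::Store)
        others.push_back (d.text);

      for (int p : d.prev)
        if (!visited[p])
          {
            visited[p] = true;
            worklist.push_back (p);
          }
    }

  calls.insert (calls.end (), others.begin (), others.end ());
  return calls;
}

// A diamond inside a loop: the header PHI (id 0) feeds two branches that
// merge in a second PHI (id 4), which is the latch value.
static std::vector<VDef> example_defs ()
{
  return {
    { Kind::Phi,   {4},    "header PHI" },
    { Kind::Store, {0},    "a[i] = x" },
    { Kind::Call,  {1},    "foo ()" },
    { Kind::Store, {0},    "b = y" },
    { Kind::Phi,   {2, 3}, "merge PHI" },
  };
}
```

Marking both ends as visited up front is what keeps the walk inside the loop: the header PHI is reached but never expanded, so its entry operand is never followed.
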
>>>> +
>>>> +
>>>> +/* Given a memory load or pure call statement, check whether it is impacted
>>>> +   by some memory store in the loop excluding those basic blocks dominated
>>>> +   by SKIP_HEAD (those basic blocks always correspond to one branch of
>>>> +   a conditional statement). If SKIP_HEAD is NULL, all basic blocks of the
>>>> +   loop are checked. */
>>>> +
>>>> +static bool
>>>> +vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
>>>> +                       const_basic_block skip_head)
>>>> +{
>>>> +  split_info *info = (split_info *) loop->aux;
>>>> +
>>>> +  /* Collect memory store/clobber statements if not done yet. */
>>>> +  if (!info->set_stores)
>>>> +    find_vdef_in_loop (loop);
>>>> +
>>>> +  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
>>>> +  ao_ref ref;
>>>> +  gimple *store;
>>>> +  unsigned i;
>>>> +
>>>> +  ao_ref_init (&ref, rhs);
>>>> +
>>>> +  FOR_EACH_VEC_ELT (info->stores, i, store)
>>>> +    {
>>>> +      /* Skip those basic blocks dominated by SKIP_HEAD. */
>>>> +      if (skip_head
>>>> +          && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
>>>> +        continue;
>>>> +
>>>> +      /* For a pure call, it is assumed to be impacted by any memory store.
>>>> +         For a memory load, use memory alias analysis to check that. */
>>>> +      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +  return true;
>>>> +}
>>>> +
>>>> +/* Forward declaration */
>>>> +
>>>> +static bool
>>>> +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
>>>> +                       const_basic_block skip_head);
>>>> +
>>>> +/* Suppose one condition branch, led by SKIP_HEAD, is not executed in certain
>>>> +   iteration, check whether an SSA name remains unchanged in next iteration.
>>>> +   We can call this characteristic semi-invariantness. SKIP_HEAD might be
>>>> +   NULL, if so, nothing excluded, all basic blocks and control flows in the
>>>> +   loop will be considered. */
>>>> +
>>>> +static bool
>>>> +ssa_semi_invariant_p (struct loop *loop, const tree name,
>>>> +                      const_basic_block skip_head)
>>>> +{
>>>> +  gimple *def = SSA_NAME_DEF_STMT (name);
>>>> +  const_basic_block def_bb = gimple_bb (def);
>>>> +
>>>> +  /* An SSA name defined outside a loop is definitely semi-invariant. */
>>>> +  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
>>>> +    return true;
>>>> +
>>>> +  /* This function is used to check semi-invariantness of a condition
>>>> +     statement, and SKIP_HEAD is always given as head of one of its
>>>> +     branches. So it implies that SSA name to check should be defined
>>>> +     before the conditional statement, and also before SKIP_HEAD. */
>>>> +
>>>> +  if (gimple_code (def) == GIMPLE_PHI)
>>>> +    {
>>>> +      /* In a normal loop, if a PHI node is located not in loop header, all
>>>> +         its source operands should be defined inside the loop. As we
>>>> +         mentioned before, these source definitions are ahead of SKIP_HEAD,
>>>> +         and will not be bypassed. Therefore, in each iteration, any of
>>>> +         these sources might be value provider to the SSA name, which for
>>>> +         sure should not be seen as invariant. */
>>>> +      if (def_bb != loop->header || !skip_head)
>>>> +        return false;
>>>> +
>>>> +      const_edge latch = loop_latch_edge (loop);
>>>> +      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
>>>> +
>>>> +      /* A PHI node in loop header always contains two source operands,
>>>> +         one is initial value, the other is the copy of last iteration
>>>> +         through loop latch, we call it latch value. From this PHI node
>>>> +         to definition of latch value, if excluding those basic blocks
>>>> +         dominated by SKIP_HEAD, there is no definition of other version
>>>> +         of same variable, SSA name defined by the PHI node is
>>>> +         semi-invariant.
>>>> +
>>>> +                         loop entry
>>>> +                              |     .--- latch ---.
>>>> +                              |     |             |
>>>> +                              v     v             |
>>>> +                  x_1 = PHI <x_0, x_3>           |
>>>> +                           |                      |
>>>> +                           v                      |
>>>> +              .------- if (cond) -------.         |
>>>> +              |                         |         |
>>>> +              |                     [ SKIP ]      |
>>>> +              |                         |         |
>>>> +              |                     x_2 = ...     |
>>>> +              |                         |         |
>>>> +              '---- T ---->.<---- F ----'         |
>>>> +                           |                      |
>>>> +                           v                      |
>>>> +                  x_3 = PHI <x_1, x_2>            |
>>>> +                           |                      |
>>>> +                           '----------------------'
>>>> +
>>>> +        Suppose in certain iteration, execution flow in above graph goes
>>>> +        through true branch, which means that one source value to define
>>>> +        x_3 in false branch (x_2) is skipped, x_3 only comes from x_1, and
>>>> +        x_1 in next iterations is defined by x_3, we know that x_1 will
>>>> +        never change if COND always chooses true branch from then on. */
>>>> +
>>>> +      while (from != name)
>>>> +        {
>>>> +          /* A new value comes from a CONSTANT. */
>>>> +          if (TREE_CODE (from) != SSA_NAME)
>>>> +            return false;
>>>> +
>>>> +          gimple *stmt = SSA_NAME_DEF_STMT (from);
>>>> +          const_basic_block bb = gimple_bb (stmt);
>>>> +
>>>> +          /* A new value comes from outside of loop. */
>>>> +          if (!bb || !flow_bb_inside_loop_p (loop, bb))
>>>> +            return false;
>>>> +
>>>> +          from = NULL_TREE;
>>>> +
>>>> +          if (gimple_code (stmt) == GIMPLE_PHI)
>>>> +            {
>>>> +              gphi *phi = as_a <gphi *> (stmt);
>>>> +
>>>> +              for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
>>>> +                {
>>>> +                  const_edge e = gimple_phi_arg_edge (phi, i);
>>>> +
>>>> +                  /* Skip redefinition from basic blocks being excluded. */
>>>> +                  if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
>>>> +                    {
>>>> +                      /* More than one source operand can provide value
>>>> +                         to the SSA name. */
>>>> +                      if (from)
>>>> +                        return false;
>>>> +
>>>> +                      from = gimple_phi_arg_def (phi, i);
>>>> +                    }
>>>> +                }
>>>> +            }
>>>> +          else if (gimple_code (stmt) == GIMPLE_ASSIGN)
>>>> +            {
>>>> +              /* For simple value copy, check its rhs instead. */
>>>> +              if (gimple_assign_ssa_name_copy_p (stmt))
>>>> +                from = gimple_assign_rhs1 (stmt);
>>>> +            }
>>>> +
>>>> +          /* Any other kind of definition is deemed to introduce a new value
>>>> +             to the SSA name. */
>>>> +          if (!from)
>>>> +            return false;
>>>> +        }
>>>> +      return true;
>>>> +    }
>>>> +
>>>> +  /* Value originated from volatile memory load or return of normal (non-
>>>> +     const/pure) call should not be treated as constant in each iteration. */
>>>> +  if (gimple_has_side_effects (def))
>>>> +    return false;
>>>> +
>>>> +  /* Check if any memory store may kill memory load at this place. */
>>>> +  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
>>>> +    return false;
>>>> +
>>>> +  /* Check operands of definition statement of the SSA name. */
>>>> +  return stmt_semi_invariant_p (loop, def, skip_head);
>>>> +}
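
The latch-value walk in the PHI case can be modelled in isolation. The following is a rough sketch under simplifying assumptions, not the patch's code: Def, PhiArg, diagram_defs () and chain_is_semi_invariant () are hypothetical, integer ids replace SSA names, and a per-argument flag replaces the dominated_by_p (..., skip_head) test.

```cpp
#include <cassert>
#include <vector>

enum class DefKind { Outside, Phi, Copy, Other };

struct PhiArg
{
  int name;            // id of the incoming SSA name
  bool from_skipped;   // incoming edge starts in a block under SKIP_HEAD
};

struct Def
{
  DefKind kind;
  std::vector<PhiArg> args;   // PHI arguments, or one element for a copy
};

// Starting at the latch value FROM, follow the single non-skipped value
// provider at each step.  The header PHI result NAME is semi-invariant
// only if that chain leads back to NAME itself.
static bool
chain_is_semi_invariant (const std::vector<Def> &defs, int name, int from)
{
  while (from != name)
    {
      const Def &d = defs[from];
      int next = -1;

      if (d.kind == DefKind::Phi)
        {
          for (const PhiArg &a : d.args)
            if (!a.from_skipped)
              {
                if (next != -1)
                  return false;   // two possible value providers
                next = a.name;
              }
        }
      else if (d.kind == DefKind::Copy)
        next = d.args[0].name;

      if (next == -1)
        return false;             // outside def or a fresh computation
      from = next;
    }
  return true;
}

// The graph from the comment: x_0 (id 0) comes from outside, x_1 (id 1) is
// the header PHI, x_2 (id 2) is computed in the false branch, and x_3
// (id 3) merges x_1 and x_2.
static std::vector<Def> diagram_defs (bool skip_false_branch)
{
  return {
    { DefKind::Outside, {} },
    { DefKind::Phi,     { {0, false}, {3, false} } },
    { DefKind::Other,   {} },
    { DefKind::Phi,     { {1, false}, {2, skip_false_branch} } },
  };
}
```

With the false branch excluded, x_3 can only take its value from x_1, so the chain closes on x_1; with nothing excluded, x_3 has two providers and the check fails, which matches the !skip_head early return in the patch.
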
>>>> +
>>>> +/* Check whether a statement is semi-invariant, iff all its operands are
>>>> +   semi-invariant. */
>>>> +
>>>> +static bool
>>>> +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
>>>> +                       const_basic_block skip_head)
>>>> +{
>>>> +  ssa_op_iter iter;
>>>> +  tree use;
>>>> +
>>>> +  /* Although operand of a statement might be SSA name, CONSTANT or VARDECL,
>>>> +     here we only need to check SSA name operands. For VARDECL operand
>>>> +     involves memory load, check on VARDECL operand must have been done
>>>> +     prior to invocation of this function in ssa_semi_invariant_p. */
>>>> +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
>>>> +    {
>>>> +      if (!ssa_semi_invariant_p (loop, use, skip_head))
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +  return true;
>>>> +}
>>>> +
>>>> +/* Determine whether, when one branch of a conditional statement is not
>>>> +   selected, we can exclude the leading basic block of that branch and the
>>>> +   basic blocks dominated by it. */
>>>> +
>>>> +static bool
>>>> +can_branch_be_excluded (basic_block branch_bb)
>>>> +{
>>>> +  if (single_pred_p (branch_bb))
>>>> +    return true;
>>>> +
>>>> +  edge e;
>>>> +  edge_iterator ei;
>>>> +
>>>> +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
>>>> +    {
>>>> +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
>>>> +        continue;
>>>> +
>>>> +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
>>>> +        continue;
>>>> +
>>>> +      /* The branch can be reached through other path, not just from the
>>>> +         conditional statement. */
>>>> +      return false;
>>>> +    }
>>>> +
>>>> +  return true;
>>>> +}
>>>> +
>>>> +/* Find out which branch of a conditional statement is invariant. That
>>>> +   is: once the branch is selected in certain loop iteration, any operand
>>>> +   that contributes to computation of the conditional statement remains
>>>> +   unchanged in all following iterations. */
>>>> +
>>>> +static int
>>>> +get_cond_invariant_branch (struct loop *loop, gcond *cond)
>>>> +{
>>>> +  basic_block cond_bb = gimple_bb (cond);
>>>> +  basic_block targ_bb[2];
>>>> +  bool invar[2];
>>>> +  unsigned invar_checks;
>>>> +
>>>> +  for (unsigned i = 0; i < 2; i++)
>>>> +    {
>>>> +      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
>>>> +
>>>> +      /* One branch directs to loop exit, no need to perform loop split upon
>>>> +         this conditional statement. Firstly, it is trivial if the exit
>>>> +         branch is semi-invariant, for the statement is just loop-breaking.
>>>> +         Secondly, if the opposite branch is semi-invariant, it means that
>>>> +         the statement is real loop-invariant, which is covered by loop
>>>> +         unswitch. */
>>>> +      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +  invar_checks = 0;
>>>> +
>>>> +  for (unsigned i = 0; i < 2; i++)
>>>> +    {
>>>> +      invar[!i] = false;
>>>> +
>>>> +      if (!can_branch_be_excluded (targ_bb[i]))
>>>> +        continue;
>>>> +
>>>> +      /* Given a semi-invariant branch, if its opposite branch dominates
>>>> +         loop latch, it and its following trace will only be executed in
>>>> +         final iteration of loop, namely it is not part of repeated body
>>>> +         of the loop. Similar to the above case that the branch is loop
>>>> +         exit, no need to split loop. */
>>>> +      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
>>>> +        continue;
>>>> +
>>>> +      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
>>>> +      invar_checks++;
>>>> +    }
>>>> +
>>>> +  /* With both branches being invariant (handled by loop unswitch) or
>>>> +     variant is not what we want. */
>>>> +  if (invar[0] ^ !invar[1])
>>>> +    return -1;
>>>> +
>>>> +  /* Found a real loop-invariant condition, do nothing. */
>>>> +  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
>>>> +    return -1;
>>>> +
>>>> +  return invar[1];
>>>> +}
>>>> +
>>>> +/* Return TRUE if conditional statement in a normal loop is also inside
>>>> +   a nested non-recognized loop, such as an irreducible loop. */
>>>> +
>>>> +static bool
>>>> +is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb,
>>>> +                        int branch)
>>>> +{
>>>> +  basic_block branch_bb = EDGE_SUCC (cond_bb, branch)->dest;
>>>> +
>>>> +  if (cond_bb == loop->header || branch_bb == loop->latch)
>>>> +    return false;
>>>> +
>>>> +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
>>>> +  auto_vec<basic_block> worklist;
>>>> +
>>>> +  for (unsigned i = 0; i < loop->num_nodes; i++)
>>>> +    bbs[i]->flags &= ~BB_REACHABLE;
>>>> +
>>>> +  /* Mark latch basic block as visited to be end point for reachability
>>>> +     traversal. */
>>>> +  loop->latch->flags |= BB_REACHABLE;
>>>> +
>>>> +  gcc_assert (flow_bb_inside_loop_p (loop, branch_bb));
>>>> +
>>>> +  /* Start from specified branch, the opposite branch is ignored for it
>>>> +     will not be executed. */
>>>> +  branch_bb->flags |= BB_REACHABLE;
>>>> +  worklist.safe_push (branch_bb);
>>>> +
>>>> +  do
>>>> +    {
>>>> +      basic_block bb = worklist.pop ();
>>>> +      edge e;
>>>> +      edge_iterator ei;
>>>> +
>>>> +      FOR_EACH_EDGE (e, ei, bb->succs)
>>>> +        {
>>>> +          basic_block succ_bb = e->dest;
>>>> +
>>>> +          if (succ_bb == cond_bb)
>>>> +            return true;
>>>> +
>>>> +          if (!flow_bb_inside_loop_p (loop, succ_bb))
>>>> +            continue;
>>>> +
>>>> +          if (succ_bb->flags & BB_REACHABLE)
>>>> +            continue;
>>>> +
>>>> +          succ_bb->flags |= BB_REACHABLE;
>>>> +          worklist.safe_push (succ_bb);
>>>> +        }
>>>> +    } while (!worklist.is_empty ());
>>>> +
>>>> +  return false;
>>>> +}
>>>> +
>>>> +
>>>> +/* Calculate increased code size measured by estimated insn number if
>>>> +   applying loop split upon certain branch of a conditional statement. */
>>>> +
>>>> +static int
>>>> +compute_added_num_insns (struct loop *loop, const_basic_block cond_bb,
>>>> +                         int branch)
>>>> +{
>>>> +  const_basic_block targ_bb_var = EDGE_SUCC (cond_bb, !branch)->dest;
>>>> +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
>>>> +  int num = 0;
>>>> +
>>>> +  for (unsigned i = 0; i < loop->num_nodes; i++)
>>>> +    {
>>>> +      /* Do not count basic blocks only in opposite branch. */
>>>> +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var))
>>>> +        continue;
>>>> +
>>>> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]);
>>>> +           !gsi_end_p (gsi); gsi_next (&gsi))
>>>> +        num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);
>>>> +    }
>>>> +
>>>> +  return num;
>>>> +}
>>>> +
>>>> +/* Return true if it is eligible and profitable to perform loop split upon
>>>> +   a conditional statement. */
>>>> +
>>>> +static bool
>>>> +can_split_loop_on_cond (struct loop *loop, gcond *cond)
>>>> +{
>>>> +  int branch = get_cond_invariant_branch (loop, cond);
>>>> +
>>>> +  if (branch < 0)
>>>> +    return false;
>>>> +
>>>> +  basic_block cond_bb = gimple_bb (cond);
>>>> +
>>>> +  /* Add a threshold for increased code size to disable loop split. */
>>>> +  if (compute_added_num_insns (loop, cond_bb, branch) >
>>>> +      PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))
>>>> +    return false;
>>>> +
>>>> +  /* In each iteration, conditional statement candidate should be
>>>> +     executed only once. */
>>>> +  if (is_cond_in_hidden_loop (loop, cond_bb, branch))
>>>> +    return false;
>>>> +
>>>> +  profile_probability prob = EDGE_SUCC (cond_bb, branch)->probability;
>>>> +
>>>> +  /* When accurate profile information is available, and execution
>>>> +     frequency of the branch is too low, just let it go. */
>>>> +  if (prob.reliable_p ())
>>>> +    {
>>>> +      int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB);
>>>> +
>>>> +      if (prob < profile_probability::always ().apply_scale (thres, 100))
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +  /* Temporarily keep branch index in conditional statement. */
>>>> +  gimple_set_plf (cond, GF_PLF_1, branch);
>>>> +  return true;
>>>> +}
>>>> +
>>>> +/* Traverse all conditional statements in a loop, to find out a good
>>>> +   candidate upon which we can do loop split. */
>>>> +
>>>> +static bool
>>>> +mark_cond_to_split_loop (struct loop *loop)
>>>> +{
>>>> +  split_info *info = new split_info ();
>>>> +  basic_block *bbs = info->bbs = get_loop_body (loop);
>>>> +
>>>> +  /* Allocate an area to keep temporary info, and associate its address
>>>> +     with loop aux field. */
>>>> +  loop->aux = info;
>>>> +
>>>> +  for (unsigned i = 0; i < loop->num_nodes; i++)
>>>> +    {
>>>> +      basic_block bb = bbs[i];
>>>> +
>>>> +      /* Skip statement in inner recognized loop, because we want that
>>>> +         conditional statement executes at most once in each iteration. */
>>>> +      if (bb->loop_father != loop)
>>>> +        continue;
>>>> +
>>>> +      /* Actually this check is not a must constraint. With it, we can
>>>> +         ensure conditional statement will execute at least once in
>>>> +         each iteration. */
>>>> +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>>>> +        continue;
>>>> +
>>>> +      gimple *last = last_stmt (bb);
>>>> +
>>>> +      if (!last || gimple_code (last) != GIMPLE_COND)
>>>> +        continue;
>>>> +
>>>> +      gcond *cond = as_a <gcond *> (last);
>>>> +
>>>> +      if (can_split_loop_on_cond (loop, cond))
>>>> +        {
>>>> +          info->cond = cond;
>>>> +          return true;
>>>> +        }
>>>> +    }
>>>> +
>>>> +  delete info;
>>>> +  loop->aux = NULL;
>>>> +
>>>> +  return false;
>>>> +}
>>>> +
>>>> +/* Given a loop with a chosen conditional statement candidate, perform loop
>>>> +   split transformation illustrated as the following graph.
>>>> +
>>>> +               .-------T------ if (true) ------F------.
>>>> +               |                    .---------------. |
>>>> +               |                    |               | |
>>>> +               v                    |               v v
>>>> +          pre-header                |            pre-header
>>>> +               | .------------.     |                 | .------------.
>>>> +               | |            |     |                 | |            |
>>>> +               | v            |     |                 | v            |
>>>> +             header           |     |               header           |
>>>> +               |              |     |                 |              |
>>>> +       [ bool r = cond; ]     |     |                 |              |
>>>> +               |              |     |                 |              |
>>>> +      .---- if (r) -----.     |     |        .--- if (true) ---.     |
>>>> +      |                 |     |     |        |                 |     |
>>>> +  invariant             |     |     |    invariant             |     |
>>>> +      |                 |     |     |        |                 |     |
>>>> +      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
>>>> +               |              |    /                  |              |
>>>> +             stmts            |   /                 stmts            |
>>>> +               |              |  /                    |              |
>>>> +              / \             | /                    / \             |
>>>> +     .-------*   *       [ if (!r) ]        .-------*   *            |
>>>> +     |           |            |             |           |            |
>>>> +     |         latch          |             |         latch          |
>>>> +     |           |            |             |           |            |
>>>> +     |           '------------'             |           '------------'
>>>> +     '------------------------. .-----------'
>>>> +             loop1            | |                   loop2
>>>> +                              v v
>>>> +                             exits
>>>> +
>>>> +   In the graph, loop1 represents the part derived from the original one, and
>>>> +   loop2 is duplicated using loop_version (), which corresponds to the part
>>>> +   of the original one being split out. In loop1, a new bool temporary (r)
>>>> +   is introduced to keep the value of the condition result. In the original
>>>> +   latch edge of loop1, we insert a new conditional statement whose value
>>>> +   comes from the previous temporary (r); one of its branches goes back to
>>>> +   the loop1 header as a latch edge, and the other branch goes to the loop2
>>>> +   pre-header as an entry edge. Also, in loop2, we abandon the variant
>>>> +   branch of the conditional statement candidate by setting a constant bool
>>>> +   condition, based on which branch is semi-invariant. */
>>>> +
>>>> +static bool
>>>> +split_loop_for_cond (struct loop *loop1)
>>>> +{
>>>> +  split_info *info = (split_info *) loop1->aux;
>>>> +  gcond *cond = info->cond;
>>>> +  basic_block cond_bb = gimple_bb (cond);
>>>> +  int branch = gimple_plf (cond, GF_PLF_1);
>>>> +  bool true_invar
>>>> +    = !!(EDGE_SUCC (cond_bb, branch)->flags & EDGE_TRUE_VALUE);
>>>> +
>>>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>>>> +   {
>>>> +     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
>>>> +              current_function_name (), loop1->num,
>>>> +              true_invar ? "T" : "F", cond_bb->index);
>>>> +     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
>>>> +   }
>>>> +
>>>> +  initialize_original_copy_tables ();
>>>> +
>>>> +  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
>>>> +                                     profile_probability::always (),
>>>> +                                     profile_probability::never (),
>>>> +                                     profile_probability::always (),
>>>> +                                     profile_probability::always (),
>>>> +                                     true);
>>>> +  if (!loop2)
>>>> +    {
>>>> +      free_original_copy_tables ();
>>>> +      return false;
>>>> +    }
>>>> +
>>>> +  /* Generate a bool type temporary to hold result of the condition. */
>>>> +  tree tmp = make_ssa_name (boolean_type_node);
>>>> +  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
>>>> +  gimple *stmt = gimple_build_assign (tmp,
>>>> +                                      gimple_cond_code (cond),
>>>> +                                      gimple_cond_lhs (cond),
>>>> +                                      gimple_cond_rhs (cond));
>>>> +
>>>> +  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
>>>> +  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
>>>> +  update_stmt (cond);
>>>> +
>>>> +  /* Replace the condition in loop2 with a bool constant to let pass
>>>> +     manager remove the variant branch after current pass finishes. */
>>>> +  basic_block cond_bb_copy = get_bb_copy (cond_bb);
>>>> +  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
>>>> +
>>>> +  if (true_invar)
>>>> +    gimple_cond_make_true (cond_copy);
>>>> +  else
>>>> +    gimple_cond_make_false (cond_copy);
>>>> +
>>>> +  update_stmt (cond_copy);
>>>> +
>>>> +  /* Insert a new conditional statement on latch edge of loop1. This
>>>> +     statement acts as a switch to transfer execution from loop1 to
>>>> +     loop2, when loop1 enters into invariant state. */
>>>> +  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
>>>> +  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
>>>> +  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
>>>> +                                          NULL_TREE, NULL_TREE);
>>>> +
>>>> +  gsi = gsi_last_bb (break_bb);
>>>> +  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
>>>> +
>>>> +  edge to_loop1 = single_succ_edge (break_bb);
>>>> +  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
>>>> +
>>>> +  to_loop1->flags &= ~EDGE_FALLTHRU;
>>>> +
>>>> +  if (true_invar)
>>>> +    {
>>>> +      to_loop1->flags |= EDGE_FALSE_VALUE;
>>>> +      to_loop2->flags |= EDGE_TRUE_VALUE;
>>>> +    }
>>>> +  else
>>>> +    {
>>>> +      to_loop1->flags |= EDGE_TRUE_VALUE;
>>>> +      to_loop2->flags |= EDGE_FALSE_VALUE;
>>>> +    }
>>>> +
>>>> +  update_ssa (TODO_update_ssa);
>>>> +
>>>> +  /* Due to introduction of a control flow edge from loop1 latch to loop2
>>>> +     pre-header, we should update PHIs in loop2 to reflect this connection
>>>> +     between loop1 and loop2. */
>>>> +  connect_loop_phis (loop1, loop2, to_loop2);
>>>> +
>>>> +  free_original_copy_tables ();
>>>> +
>>>> +  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
>>>> +
>>>> +  return true;
>>>> +}
>>>> +
>>>> +/* Main entry point to perform loop splitting for suitable if-conditions
>>>> +   in all loops. */
>>>> +
>>>> +static unsigned int
>>>> +tree_ssa_split_loops_for_cond (void)
>>>> +{
>>>> +  struct loop *loop;
>>>> +  auto_vec<struct loop *> loop_list;
>>>> +  bool changed = false;
>>>> +  unsigned i;
>>>> +
>>>> +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
>>>> +    loop->aux = NULL;
>>>> +
>>>> +  /* Go through all loops starting from innermost. */
>>>> +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
>>>> +    {
>>>> +      /* Put the loop in a list if a conditional statement candidate is found
>>>> +         in it. This is the analysis stage, which changes nothing in the
>>>> +         function. */
>>>> +      if (!loop->aux
>>>> +          && !optimize_loop_for_size_p (loop)
>>>> +          && mark_cond_to_split_loop (loop))
>>>> +        loop_list.safe_push (loop);
>>>> +
>>>> +      /* If any of our inner loops was split, don't split us,
>>>> +         and mark our containing loop as having had splits as well. */
>>>> +      loop_outer (loop)->aux = loop->aux;
>>>> +    }
>>>> +
>>>> +  FOR_EACH_VEC_ELT (loop_list, i, loop)
>>>> +    {
>>>> +      /* Extract selected loop and perform loop split. This is stage for
>>>> +         transformation. */
>>>> +      changed |= split_loop_for_cond (loop);
>>>> +
>>>> +      delete (split_info *) loop->aux;
>>>> +    }
>>>> +
>>>> +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
>>>> +    loop->aux = NULL;
>>>> +
>>>> +  if (changed)
>>>> +    return TODO_cleanup_cfg;
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +
>>>>   /* Loop splitting pass.  */
>>>>
>>>>   namespace {
>>>> @@ -716,3 +1517,48 @@ make_pass_loop_split (gcc::context *ctxt)
>>>>   {
>>>>     return new pass_loop_split (ctxt);
>>>>   }
>>>> +
>>>> +namespace {
>>>> +
>>>> +const pass_data pass_data_cond_loop_split =
>>>> +{
>>>> +  GIMPLE_PASS, /* type */
>>>> +  "cond_lsplit", /* name */
>>>> +  OPTGROUP_LOOP, /* optinfo_flags */
>>>> +  TV_COND_LOOP_SPLIT, /* tv_id */
>>>> +  PROP_cfg, /* properties_required */
>>>> +  0, /* properties_provided */
>>>> +  0, /* properties_destroyed */
>>>> +  0, /* todo_flags_start */
>>>> +  0, /* todo_flags_finish */
>>>> +};
>>>> +
>>>> +class pass_cond_loop_split : public gimple_opt_pass
>>>> +{
>>>> +public:
>>>> +  pass_cond_loop_split (gcc::context *ctxt)
>>>> +    : gimple_opt_pass (pass_data_cond_loop_split, ctxt)
>>>> +  {}
>>>> +
>>>> +  /* opt_pass methods: */
>>>> +  virtual bool gate (function *) { return flag_split_loops != 0; }
>>>> +  virtual unsigned int execute (function *);
>>>> +
>>>> +}; // class pass_cond_loop_split
>>>> +
>>>> +unsigned int
>>>> +pass_cond_loop_split::execute (function *fun)
>>>> +{
>>>> +  if (number_of_loops (fun) <= 1)
>>>> +    return 0;
>>>> +
>>>> +  return tree_ssa_split_loops_for_cond ();
>>>> +}
>>>> +
>>>> +} // anon namespace
>>>> +
>>>> +gimple_opt_pass *
>>>> +make_pass_cond_loop_split (gcc::context *ctxt)
>>>> +{
>>>> +  return new pass_cond_loop_split (ctxt);
>>>> +}

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-03-13  9:43     ` Kyrill Tkachov
  2019-03-13 12:11       ` Richard Biener
@ 2019-03-14  3:31       ` Feng Xue OS
  1 sibling, 0 replies; 31+ messages in thread
From: Feng Xue OS @ 2019-03-14  3:31 UTC (permalink / raw)
  To: Kyrill Tkachov, Richard Biener; +Cc: gcc-patches

Ok. Got it. And I will add some cases.

Thanks,
Feng
________________________________
From: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
Sent: Wednesday, March 13, 2019 5:40:37 PM
To: Feng Xue OS; Richard Biener
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Loop split upon semi-invariant condition (PR tree-optimization/89134)

Hi Feng,

On 3/13/19 1:56 AM, Feng Xue OS wrote:
> Richard,
>
>     Thanks for your comment. Yes, it is a kind of jump threading
> with knowledge of loop structure. And what is the rough timeframe for GCC 10?
>
>

GCC 10 will be released once the number of P1 regressions gets down to
zero. Past experience shows that it's around the April/May timeframe.

In the meantime my comment on the patch is that you should add some
tests to the testsuite that showcase this transformation.
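
As a rough illustration of the kind of loop such a test might exercise,
here is a source-level sketch only. The names consume, sum_original and
sum_split are made up for this example, a plain vector stands in for the
std::map iteration in the patch description, and this is not a DejaGnu
test with dg- directives. It writes the loop once in its original form
and once split by hand in the way the patch description outlines, so
the two can be compared:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for do_something () in the patch description:
// it keeps the flag set until the budget is used up.
static bool consume (int &budget) { return --budget > 0; }

// Original form.  "if (b)" is semi-invariant: only the true branch can
// change b, so once b is false it stays false for the rest of the loop.
int sum_original (const std::vector<int> &v, int budget)
{
  bool b = budget > 0;
  int sum = 0;
  for (std::size_t i = 0; i < v.size (); ++i)
    {
      if (b)
        b = consume (budget);
      sum += v[i];              // "statements": no effect on b
    }
  return sum + budget;
}

// Hand-split form.  The first loop runs while b may still change; the
// second loop runs the remaining iterations without testing b at all.
int sum_split (const std::vector<int> &v, int budget)
{
  bool b = budget > 0;
  int sum = 0;
  std::size_t i = 0;
  for (; i < v.size (); ++i)
    {
      if (b)
        b = consume (budget);
      else
        {
          sum += v[i];          // finish the current iteration ...
          ++i;                  // ... advance, then leave the first loop
          break;
        }
      sum += v[i];
    }
  for (; i < v.size (); ++i)
    sum += v[i];                // straight-line body, no condition on b
  return sum + budget;
}
```

Calling both functions with the same vector and budget should give the
same result whichever iteration b turns false in; a real testsuite entry
would additionally need to check the pass dump.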

Thanks,

Kyrill


> Regards,
>
> Feng
>
>
> ________________________________
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Tuesday, March 12, 2019 4:31:49 PM
> To: Feng Xue OS
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Loop split upon semi-invariant condition (PR
> tree-optimization/89134)
>
> On Tue, Mar 12, 2019 at 7:20 AM Feng Xue OS
> <fxue@os.amperecomputing.com> wrote:
> >
> > This patch implements a loop transformation on one of its conditional
> > statements, which we call semi-invariant, in that its computation is
> > impacted in only one of its branches.
> >
> > Suppose a loop as:
> >
> >     void f (std::map<int, int> m)
> >     {
> >         for (auto it = m.begin (); it != m.end (); ++it) {
> >             /* if (b) is semi-invariant. */
> >             if (b) {
> >                 b = do_something();    /* Has effect on b */
> >             } else {
> >                 /* No effect on b */
> >             }
> >             statements;                      /* Also no effect on b */
> >         }
> >     }
> >
> > A transformation, kind of loop split, could be:
> >
> >     void f (std::map<int, int> m)
> >     {
> >         for (auto it = m.begin (); it != m.end (); ++it) {
> >             if (b) {
> >                 b = do_something();
> >             } else {
> >                 ++it;
> >                 statements;
> >                 break;
> >             }
> >             statements;
> >         }
> >
> >         for (; it != m.end (); ++it) {
> >             statements;
> >         }
> >     }
> >
> > If "statements" contains nothing, the second loop becomes an empty
> > one, which can be removed. (This part will be given in another patch.)
> > And if "statements" are straight-line instructions, we get an
> > opportunity to vectorize the second loop. In practice, this
> > optimization is found to improve some real application by 7%.
> >
> > Since it is just a kind of loop split, the code is mainly placed
> > in the existing tree-ssa-loop-split module, is controlled by
> > -fsplit-loops, and is enabled with -O3.
>
> Note the transform itself is jump-threading with the threading
> duplicating a whole CFG cycle.
>
> I didn't look at the patch details yet since this is suitable for GCC
> 10 only.
>
> Thanks for implementing this.
> Richard.
>
> > Feng
> >
> >
> > diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> > index 64bf6017d16..a6c2878d652 100644
> > --- a/gcc/ChangeLog
> > +++ b/gcc/ChangeLog
> > @@ -1,3 +1,23 @@
> > +2019-03-12  Feng Xue <fxue@os.amperecomputing.com>
> > +
> > +       PR tree-optimization/89134
> > +       * doc/invoke.texi (max-cond-loop-split-insns): Document new --params.
> > +       (min-cond-loop-split-prob): Likewise.
> > +       * params.def: Add max-cond-loop-split-insns, min-cond-loop-split-prob.
> > +       * passes.def (pass_cond_loop_split): New pass.
> > +       * timevar.def (TV_COND_LOOP_SPLIT): New time variable.
> > +       * tree-pass.h (make_pass_cond_loop_split): New declaration.
> > +       * tree-ssa-loop-split.c (split_info): New class.
> > +       (find_vdef_in_loop, vuse_semi_invariant_p): New functions.
> > +       (ssa_semi_invariant_p, stmt_semi_invariant_p): Likewise.
> > +       (can_branch_be_excluded, get_cond_invariant_branch): Likewise.
> > +       (is_cond_in_hidden_loop, compute_added_num_insns): Likewise.
> > +       (can_split_loop_on_cond, mark_cond_to_split_loop): Likewise.
> > +       (split_loop_for_cond, tree_ssa_split_loops_for_cond): Likewise.
> > +       (pass_data_cond_loop_split): New variable.
> > +       (pass_cond_loop_split): New class.
> > +       (make_pass_cond_loop_split): New function.
> > +
> >  2019-03-11  Jakub Jelinek  <jakub@redhat.com>
> >
> >         PR middle-end/89655
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index df0883f2fc9..f5e09bd71fd 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -11316,6 +11316,14 @@ The maximum number of branches unswitched in a single loop.
> >  @item lim-expensive
> >  The minimum cost of an expensive expression in the loop invariant motion.
> >
> > +@item max-cond-loop-split-insns
> > +The maximum number of insns to be increased due to loop split on
> > +semi-invariant condition statement.
> > +
> > +@item min-cond-loop-split-prob
> > +The minimum threshold for probability of semi-invariant condition
> > +statement to trigger loop split.
> > +
> >  @item iv-consider-all-candidates-bound
> >  Bound on number of candidates for induction variables, below which
> >  all candidates are considered for each use in induction variable
> > diff --git a/gcc/params.def b/gcc/params.def
> > index 3f1576448be..2e067526958 100644
> > --- a/gcc/params.def
> > +++ b/gcc/params.def
> > @@ -386,6 +386,18 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
> >         "The maximum number of unswitchings in a single loop.",
> >         3, 0, 0)
> >
> > +/* The maximum number of increased insns due to loop split on semi-invariant
> > +   condition statement.  */
> > +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> > +       "max-cond-loop-split-insns",
> > +       "The maximum number of insns to be increased due to loop split on semi-invariant condition statement.",
> > +       100, 0, 0)
> > +
> > +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> > +       "min-cond-loop-split-prob",
> > +       "The minimum threshold for probability of semi-invariant condition statement to trigger loop split.",
> > +       30, 0, 100)
> > +
> >  /* The maximum number of insns in loop header duplicated by the copy loop
> >     headers pass.  */
> >  DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
> > diff --git a/gcc/passes.def b/gcc/passes.def
> > index 446a7c48276..bde7f4c50c0 100644
> > --- a/gcc/passes.def
> > +++ b/gcc/passes.def
> > @@ -265,6 +265,7 @@ along with GCC; see the file COPYING3.  If not see
> >           NEXT_PASS (pass_tree_unswitch);
> >           NEXT_PASS (pass_scev_cprop);
> >           NEXT_PASS (pass_loop_split);
> > +         NEXT_PASS (pass_cond_loop_split);
> >           NEXT_PASS (pass_loop_versioning);
> >           NEXT_PASS (pass_loop_jam);
> >           /* All unswitching, final value replacement and splitting can expose
> > diff --git a/gcc/timevar.def b/gcc/timevar.def
> > index 54154464a58..39f2df0e3ec 100644
> > --- a/gcc/timevar.def
> > +++ b/gcc/timevar.def
> > @@ -189,6 +189,7 @@ DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
> >  DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
> >  DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
> >  DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
> > +DEFTIMEVAR (TV_COND_LOOP_SPLIT       , "loop splitting for conditions")
> >  DEFTIMEVAR (TV_LOOP_JAM              , "unroll and jam")
> >  DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
> >  DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
> > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> > index 47be59b2a11..f441ba36871 100644
> > --- a/gcc/tree-pass.h
> > +++ b/gcc/tree-pass.h
> > @@ -367,6 +367,7 @@ extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_linterchange (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
> > +extern gimple_opt_pass *make_pass_cond_loop_split (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_loop_jam (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
> > diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> > index 999c9a30366..d287a0d7d4c 100644
> > --- a/gcc/tree-ssa-loop-split.c
> > +++ b/gcc/tree-ssa-loop-split.c
> > @@ -32,7 +32,9 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "tree-ssa-loop.h"
> >  #include "tree-ssa-loop-manip.h"
> >  #include "tree-into-ssa.h"
> > +#include "tree-inline.h"
> >  #include "cfgloop.h"
> > +#include "params.h"
> >  #include "tree-scalar-evolution.h"
> >  #include "gimple-iterator.h"
> >  #include "gimple-pretty-print.h"
> > @@ -40,7 +42,9 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "gimple-fold.h"
> >  #include "gimplify-me.h"
> >
> > -/* This file implements loop splitting, i.e. transformation of loops like
> > +/* This file implements two kinds of loop splitting.
> > +
> > +   One transformation of loops like:
> >
> >     for (i = 0; i < 100; i++)
> >       {
> > @@ -670,6 +674,803 @@ tree_ssa_split_loops (void)
> >    return 0;
> >  }
> >
> > +
> > +/* Another transformation of loops like:
> > +
> > +   for (i = INIT (); CHECK (i); i = NEXT ())
> > +     {
> > +       if (expr (a_1, a_2, ..., a_n))
> > +         a_j = ...;  // change at least one a_j
> > +       else
> > +         S;          // not change any a_j
> > +     }
> > +
> > +   into:
> > +
> > +   for (i = INIT (); CHECK (i); i = NEXT ())
> > +     {
> > +       if (expr (a_1, a_2, ..., a_n))
> > +         a_j = ...;
> > +       else
> > +         {
> > +           S;
> > +           i = NEXT ();
> > +           break;
> > +         }
> > +     }
> > +
> > +   for (; CHECK (i); i = NEXT ())
> > +     {
> > +       S;
> > +     }
> > +
> > +   */
> > +
> > +/* Data structure to hold temporary information during loop split upon
> > +   semi-invariant conditional statement. */
> > +class split_info {
> > +public:
> > +  /* Array of all basic blocks in a loop, returned by get_loop_body(). */
> > +  basic_block *bbs;
> > +
> > +  /* All memory store/clobber statements in a loop. */
> > +  auto_vec<gimple *> stores;
> > +
> > +  /* Whether above memory stores vector has been filled. */
> > +  bool set_stores;
> > +
> > +  /* Semi-invariant conditional statement, upon which to split loop. */
> > +  gcond *cond;
> > +
> > +  split_info () : bbs (NULL),  set_stores (false), cond (NULL) { }
> > +
> > +  ~split_info ()
> > +    {
> > +      if (bbs)
> > +        free (bbs);
> > +    }
> > +};
> > +
> > +/* Find all statements with memory-write effect in a loop, including memory
> > +   store and non-pure function call, and keep those in a vector. This work
> > +   is only done for one time, for the vector should be constant during
> > +   analysis stage of semi-invariant condition. */
> > +
> > +static void
> > +find_vdef_in_loop (struct loop *loop)
> > +{
> > +  split_info *info = (split_info *) loop->aux;
> > +  gphi *vphi = get_virtual_phi (loop->header);
> > +
> > +  /* Indicate memory store vector has been filled. */
> > +  info->set_stores = true;
> > +
> > +  /* If loop contains memory operation, there must be a virtual PHI node in
> > +     loop header basic block. */
> > +  if (vphi == NULL)
> > +    return;
> > +
> > +  /* All virtual SSA names inside the loop are connected to be a cyclic
> > +     graph via virtual PHI nodes. The virtual PHI node in loop header just
> > +     links the first and the last virtual SSA names, by using the last as
> > +     PHI operand to define the first. */
> > +  const edge latch = loop_latch_edge (loop);
> > +  const tree first = gimple_phi_result (vphi);
> > +  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
> > +
> > +  /* The virtual SSA cyclic graph might consist of only one SSA name, which
> > +     is defined by itself.
> > +
> > +        .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
> > +
> > +     This means the loop contains only memory loads, so we can skip it. */
> > +  if (first == last)
> > +    return;
> > +
> > +  auto_vec<gimple *> others;
> > +  auto_vec<tree> worklist;
> > +  auto_bitmap visited;
> > +
> > +  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
> > +  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
> > +  worklist.safe_push (last);
> > +
> > +  do
> > +    {
> > +      tree vuse = worklist.pop ();
> > +      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
> > +
> > +      /* We mark the first and last SSA names as visited at the beginning,
> > +         and reversely start the process from the last SSA name toward the
> > +         first, which ensures that this do-while will not touch SSA names
> > +         defined outside of the loop. */
> > +      gcc_assert (gimple_bb (stmt)
> > +                  && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
> > +
> > +      if (gimple_code (stmt) == GIMPLE_PHI)
> > +        {
> > +          gphi *phi = as_a <gphi *> (stmt);
> > +
> > +          for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> > +            {
> > +              tree arg = gimple_phi_arg_def (stmt, i);
> > +
> > +              if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
> > +                worklist.safe_push (arg);
> > +            }
> > +        }
> > +      else
> > +        {
> > +          tree prev = gimple_vuse (stmt);
> > +
> > +          /* Non-pure call statement is conservatively assumed to impact
> > +             all memory locations. So place call statements ahead of other
> > +             memory stores in the vector with the idea of using them as
> > +             shortcut terminators to memory alias analysis, kind of
> > +             optimization for compilation. */
> > +          if (gimple_code (stmt) == GIMPLE_CALL)
> > +            info->stores.safe_push (stmt);
> > +          else
> > +            others.safe_push (stmt);
> > +
> > +          if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
> > +            worklist.safe_push (prev);
> > +        }
> > +    } while (!worklist.is_empty ());
> > +
> > +    info->stores.safe_splice (others);
> > +}
> > +
> > +
> > +/* Given a memory load or pure call statement, check whether it is impacted
> > +   by some memory store in the loop excluding those basic blocks dominated
> > +   by SKIP_HEAD (those basic blocks always correspond to one branch of
> > +   a conditional statement). If SKIP_HEAD is NULL, all basic blocks of the
> > +   loop are checked. */
> > +
> > +static bool
> > +vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
> > +                       const_basic_block skip_head)
> > +{
> > +  split_info *info = (split_info *) loop->aux;
> > +
> > +  /* Collect memory store/clobber statements if that has not been done yet. */
> > +  if (!info->set_stores)
> > +    find_vdef_in_loop (loop);
> > +
> > +  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
> > +  ao_ref ref;
> > +  gimple *store;
> > +  unsigned i;
> > +
> > +  ao_ref_init (&ref, rhs);
> > +
> > +  FOR_EACH_VEC_ELT (info->stores, i, store)
> > +    {
> > +      /* Skip those basic blocks dominated by SKIP_HEAD. */
> > +      if (skip_head
> > +          && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
> > +        continue;
> > +
> > +      /* For a pure call, it is assumed to be impacted by any memory store.
> > +         For a memory load, use memory alias analysis to check that. */
> > +      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
> > +        return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Forward declaration */
> > +
> > +static bool
> > +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> > +                       const_basic_block skip_head);
> > +
> > +/* Suppose one condition branch, led by SKIP_HEAD, is not executed in certain
> > +   iteration, check whether an SSA name remains unchanged in next iteration.
> > +   We can call this characteristic semi-invariantness. SKIP_HEAD might be
> > +   NULL, if so, nothing excluded, all basic blocks and control flows in the
> > +   loop will be considered. */
> > +
> > +static bool
> > +ssa_semi_invariant_p (struct loop *loop, const tree name,
> > +                      const_basic_block skip_head)
> > +{
> > +  gimple *def = SSA_NAME_DEF_STMT (name);
> > +  const_basic_block def_bb = gimple_bb (def);
> > +
> > +  /* An SSA name defined outside a loop is definitely semi-invariant. */
> > +  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
> > +    return true;
> > +
> > +  /* This function is used to check semi-invariantness of a condition
> > +     statement, and SKIP_HEAD is always given as head of one of its
> > +     branches. So it implies that SSA name to check should be defined
> > +     before the conditional statement, and also before SKIP_HEAD. */
> > +
> > +  if (gimple_code (def) == GIMPLE_PHI)
> > +    {
> > +      /* In a normal loop, if a PHI node is located not in loop header, all
> > +         its source operands should be defined inside the loop. As we
> > +         mentioned before, these source definitions are ahead of SKIP_HEAD,
> > +         and will not be bypassed. Therefore, in each iteration, any of
> > +         these sources might be value provider to the SSA name, which for
> > +         sure should not be seen as invariant. */
> > +      if (def_bb != loop->header || !skip_head)
> > +        return false;
> > +
> > +      const_edge latch = loop_latch_edge (loop);
> > +      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
> > +
> > +      /* A PHI node in loop header always contains two source operands,
> > +         one is initial value, the other is the copy of last iteration
> > +         through loop latch, we call it latch value. From this PHI node
> > +         to definition of latch value, if excluding those basic blocks
> > +         dominated by SKIP_HEAD, there is no definition of other version
> > +         of same variable, SSA name defined by the PHI node is
> > +         semi-invariant.
> > +
> > +                         loop entry
> > +                              |     .--- latch ---.
> > +                              |     |             |
> > +                              v     v             |
> > +                  x_1 = PHI <x_0, x_3>           |
> > +                           |                      |
> > +                           v                      |
> > +              .------- if (cond) -------.         |
> > +              |                         |         |
> > +              |                     [ SKIP ]      |
> > +              |                         |         |
> > +              |                     x_2 = ...     |
> > +              |                         |         |
> > +              '---- T ---->.<---- F ----'         |
> > +                           |                      |
> > +                           v                      |
> > +                  x_3 = PHI <x_1, x_2>            |
> > +                           |                      |
> > +                           '----------------------'
> > +
> > +        Suppose in certain iteration, execution flow in above graph goes
> > +        through true branch, which means that one source value to define
> > +        x_3 in false branch (x_2) is skipped, x_3 only comes from x_1, and
> > +        x_1 in next iterations is defined by x_3, we know that x_1 will
> > +        never change if COND always chooses true branch from then on. */
> > +
> > +      while (from != name)
> > +        {
> > +          /* A new value comes from a CONSTANT. */
> > +          if (TREE_CODE (from) != SSA_NAME)
> > +            return false;
> > +
> > +          gimple *stmt = SSA_NAME_DEF_STMT (from);
> > +          const_basic_block bb = gimple_bb (stmt);
> > +
> > +          /* A new value comes from outside of loop. */
> > +          if (!bb || !flow_bb_inside_loop_p (loop, bb))
> > +            return false;
> > +
> > +          from = NULL_TREE;
> > +
> > +          if (gimple_code (stmt) == GIMPLE_PHI)
> > +            {
> > +              gphi *phi = as_a <gphi *> (stmt);
> > +
> > +              for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> > +                {
> > +                  const_edge e = gimple_phi_arg_edge (phi, i);
> > +
> > +                  /* Skip redefinition from basic blocks being excluded. */
> > +                  if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
> > +                    {
> > +                      /* There are more than one source operands that can
> > +                         provide value to the SSA name. */
> > +                      if (from)
> > +                        return false;
> > +
> > +                      from = gimple_phi_arg_def (phi, i);
> > +                    }
> > +                }
> > +            }
> > +          else if (gimple_code (stmt) == GIMPLE_ASSIGN)
> > +            {
> > +              /* For simple value copy, check its rhs instead. */
> > +              if (gimple_assign_ssa_name_copy_p (stmt))
> > +                from = gimple_assign_rhs1 (stmt);
> > +            }
> > +
> > +          /* Any other kind of definition is deemed to introduce a new value
> > +             to the SSA name. */
> > +          if (!from)
> > +            return false;
> > +        }
> > +        return true;
> > +    }
> > +
> > +  /* Value originated from volatile memory load or return of normal (non-
> > +     const/pure) call should not be treated as constant in each iteration. */
> > +  if (gimple_has_side_effects (def))
> > +    return false;
> > +
> > +  /* Check if any memory store may kill memory load at this place. */
> > +  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
> > +    return false;
> > +
> > +  /* Check operands of definition statement of the SSA name. */
> > +  return stmt_semi_invariant_p (loop, def, skip_head);
> > +}
> > +
> > +/* Check whether a statement is semi-invariant, iff all its operands are
> > +   semi-invariant. */
> > +
> > +static bool
> > +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> > +                       const_basic_block skip_head)
> > +{
> > +  ssa_op_iter iter;
> > +  tree use;
> > +
> > +  /* Although operand of a statement might be SSA name, CONSTANT or VARDECL,
> > +     here we only need to check SSA name operands. For VARDECL operand
> > +     involves memory load, check on VARDECL operand must have been done
> > +     prior to invocation of this function in ssa_semi_invariant_p. */
> > +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
> > +    {
> > +      if (!ssa_semi_invariant_p (loop, use, skip_head))
> > +        return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Determine if unselect one branch of a conditional statement, whether we
> > +   can exclude leading basic block of the branch and those basic blocks
> > +   dominated by the leading one. */
> > +
> > +static bool
> > +can_branch_be_excluded (basic_block branch_bb)
> > +{
> > +  if (single_pred_p (branch_bb))
> > +    return true;
> > +
> > +  edge e;
> > +  edge_iterator ei;
> > +
> > +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
> > +    {
> > +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
> > +        continue;
> > +
> > +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
> > +        continue;
> > +
> > +      /* The branch can be reached through other path, not just from the
> > +         conditional statement. */
> > +      return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Find out which branch of a conditional statement is invariant. That
> > +   is: once the branch is selected in certain loop iteration, any operand
> > +   that contributes to computation of the conditional statement remains
> > +   unchanged in all following iterations. */
> > +
> > +static int
> > +get_cond_invariant_branch (struct loop *loop, gcond *cond)
> > +{
> > +  basic_block cond_bb = gimple_bb (cond);
> > +  basic_block targ_bb[2];
> > +  bool invar[2];
> > +  unsigned invar_checks;
> > +
> > +  for (unsigned i = 0; i < 2; i++)
> > +    {
> > +      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
> > +
> > +      /* One branch directs to loop exit, no need to perform loop split upon
> > +         this conditional statement. Firstly, it is trivial if the exit
> > +         branch is semi-invariant, for the statement is just loop-breaking.
> > +         Secondly, if the opposite branch is semi-invariant, it means that
> > +         the statement is real loop-invariant, which is covered by loop
> > +         unswitch. */
> > +      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
> > +        return -1;
> > +    }
> > +
> > +  invar_checks = 0;
> > +
> > +  for (unsigned i = 0; i < 2; i++)
> > +    {
> > +      invar[!i] = false;
> > +
> > +      if (!can_branch_be_excluded (targ_bb[i]))
> > +        continue;
> > +
> > +      /* Given a semi-invariant branch, if its opposite branch dominates
> > +         loop latch, it and its following trace will only be executed in
> > +         final iteration of loop, namely it is not part of repeated body
> > +         of the loop. Similar to the above case that the branch is loop
> > +         exit, no need to split loop. */
> > +      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
> > +        continue;
> > +
> > +      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
> > +      invar_checks++;
> > +    }
> > +
> > +  /* With both branches being invariant (handled by loop unswitch) or
> > +     variant is not what we want. */
> > +  if (invar[0] ^ !invar[1])
> > +    return -1;
> > +
> > +  /* Found a real loop-invariant condition, do nothing. */
> > +  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
> > +    return -1;
> > +
> > +  return invar[1];
> > +}
> > +
> > +/* Return TRUE if a conditional statement in a normal loop is also inside
> > +   a nested non-recognized loop, such as an irreducible loop. */
> > +
> > +static bool
> > +is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb,
> > +                        int branch)
> > +{
> > +  basic_block branch_bb = EDGE_SUCC (cond_bb, branch)->dest;
> > +
> > +  if (cond_bb == loop->header || branch_bb == loop->latch)
> > +    return false;
> > +
> > +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> > +  auto_vec<basic_block> worklist;
> > +
> > +  for (unsigned i = 0; i < loop->num_nodes; i++)
> > +    bbs[i]->flags &= ~BB_REACHABLE;
> > +
> > +  /* Mark latch basic block as visited to be end point for reachability
> > +     traversal. */
> > +  loop->latch->flags |= BB_REACHABLE;
> > +
> > +  gcc_assert (flow_bb_inside_loop_p (loop, branch_bb));
> > +
> > +  /* Start from specified branch, the opposite branch is ignored for it
> > +     will not be executed. */
> > +  branch_bb->flags |= BB_REACHABLE;
> > +  worklist.safe_push (branch_bb);
> > +
> > +  do
> > +    {
> > +      basic_block bb = worklist.pop ();
> > +      edge e;
> > +      edge_iterator ei;
> > +
> > +      FOR_EACH_EDGE (e, ei, bb->succs)
> > +        {
> > +          basic_block succ_bb = e->dest;
> > +
> > +          if (succ_bb == cond_bb)
> > +            return true;
> > +
> > +          if (!flow_bb_inside_loop_p (loop, succ_bb))
> > +            continue;
> > +
> > +          if (succ_bb->flags & BB_REACHABLE)
> > +            continue;
> > +
> > +          succ_bb->flags |= BB_REACHABLE;
> > +          worklist.safe_push (succ_bb);
> > +        }
> > +    } while (!worklist.is_empty ());
> > +
> > +  return false;
> > +}
> > +
> > +
> > +/* Calculate increased code size measured by estimated insn number if
> > +   applying loop split upon certain branch of a conditional statement. */
> > +
> > +static int
> > +compute_added_num_insns (struct loop *loop, const_basic_block cond_bb,
> > +                         int branch)
> > +{
> > +  const_basic_block targ_bb_var = EDGE_SUCC (cond_bb, !branch)->dest;
> > +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> > +  int num = 0;
> > +
> > +  for (unsigned i = 0; i < loop->num_nodes; i++)
> > +    {
> > +      /* Do not count basic blocks only in opposite branch. */
> > +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var))
> > +        continue;
> > +
> > +      for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi);
> > +           gsi_next (&gsi))
> > +        num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);
> > +    }
> > +
> > +  return num;
> > +}
> > +
> > +/* Return true if it is eligible and profitable to perform loop split upon
> > +   a conditional statement. */
> > +
> > +static bool
> > +can_split_loop_on_cond (struct loop *loop, gcond *cond)
> > +{
> > +  int branch = get_cond_invariant_branch (loop, cond);
> > +
> > +  if (branch < 0)
> > +    return false;
> > +
> > +  basic_block cond_bb = gimple_bb (cond);
> > +
> > +  /* Add a threshold for increased code size to disable loop split. */
> > +  if (compute_added_num_insns (loop, cond_bb, branch) >
> > +      PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))
> > +    return false;
> > +
> > +  /* In each iteration, conditional statement candidate should be
> > +     executed only once. */
> > +  if (is_cond_in_hidden_loop (loop, cond_bb, branch))
> > +    return false;
> > +
> > +  profile_probability prob = EDGE_SUCC (cond_bb, branch)->probability;
> > +
> > +  /* When accurate profile information is available, and execution
> > +     frequency of the branch is too low, just let it go. */
> > +  if (prob.reliable_p ())
> > +    {
> > +      int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB);
> > +
> > +      if (prob < profile_probability::always ().apply_scale (thres, 100))
> > +        return false;
> > +    }
> > +
> > +  /* Temporarily keep branch index in conditional statement. */
> > +  gimple_set_plf (cond, GF_PLF_1, branch);
> > +  return true;
> > +}
> > +
> > +/* Traverse all conditional statements in a loop, to find out a good
> > +   candidate upon which we can do loop split. */
> > +
> > +static bool
> > +mark_cond_to_split_loop (struct loop *loop)
> > +{
> > +  split_info *info = new split_info ();
> > +  basic_block *bbs = info->bbs = get_loop_body (loop);
> > +
> > +  /* Allocate an area to keep temporary info, and associate its address
> > +     with loop aux field. */
> > +  loop->aux = info;
> > +
> > +  for (unsigned i = 0; i < loop->num_nodes; i++)
> > +    {
> > +      basic_block bb = bbs[i];
> > +
> > +      /* Skip statement in inner recognized loop, because we want that
> > +         conditional statement executes at most once in each iteration. */
> > +      if (bb->loop_father != loop)
> > +        continue;
> > +
> > +      /* Actually this check is not a must constraint. With it, we can
> > +         ensure conditional statement will execute at least once in
> > +         each iteration. */
> > +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> > +        continue;
> > +
> > +      gimple *last = last_stmt (bb);
> > +
> > +      if (!last || gimple_code (last) != GIMPLE_COND)
> > +        continue;
> > +
> > +      gcond *cond = as_a <gcond *> (last);
> > +
> > +      if (can_split_loop_on_cond (loop, cond))
> > +        {
> > +          info->cond = cond;
> > +          return true;
> > +        }
> > +    }
> > +
> > +  delete info;
> > +  loop->aux = NULL;
> > +
> > +  return false;
> > +}
> > +
> > +/* Given a loop with a chosen conditional statement candidate, perform loop
> > +   split transformation illustrated as the following graph.
> > +
> > +               .-------T------ if (true) ------F------.
> > +               |                    .---------------. |
> > +               |                    |               | |
> > +               v                    |               v v
> > +          pre-header                |            pre-header
> > +               | .------------.     |                 | .------------.
> > +               | |            |     |                 | |            |
> > +               | v            |     |                 | v            |
> > +             header           |     |               header           |
> > +               |              |     |                 |              |
> > +       [ bool r = cond; ]     |     |                 |              |
> > +               |              |     |                 |              |
> > +      .---- if (r) -----.     |     |        .--- if (true) ---.     |
> > +      |                 |     |     |        |                 |     |
> > +  invariant             |     |     |    invariant             |     |
> > +      |                 |     |     |        |                 |     |
> > +      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
> > +               |              |    /                  |              |
> > +             stmts            |   /                 stmts            |
> > +               |              |  /                    |              |
> > +              / \             | /                    / \             |
> > +     .-------*   *       [ if (!r) ]        .-------*   *            |
> > +     |           |            |             |           |            |
> > +     |         latch          |             |         latch          |
> > +     |           |            |             |           |            |
> > +     |           '------------'             |           '------------'
> > +     '------------------------. .-----------'
> > +             loop1            | |                   loop2
> > +                              v v
> > +                             exits
> > +
> > +   In the graph, loop1 represents the part derived from original one, and
> > +   loop2 is duplicated using loop_version (), which corresponds to the part
> > +   of original one being split off. In loop1, a new bool temporary (r)
> > +   is introduced to keep value of the condition result. In original latch
> > +   edge of loop1, we insert a new conditional statement whose value comes
> > +   from previous temporary (r), one of its branches goes back to loop1 header
> > +   as a latch edge, and the other branch goes to loop2 pre-header as an
> > +   entry edge. And also in loop2, we abandon the variant branch of the
> > +   conditional statement candidate by setting a constant bool condition,
> > +   based on which branch is semi-invariant. */
> > +
> > +static bool
> > +split_loop_for_cond (struct loop *loop1)
> > +{
> > +  split_info *info = (split_info *) loop1->aux;
> > +  gcond *cond = info->cond;
> > +  basic_block cond_bb = gimple_bb (cond);
> > +  int branch = gimple_plf (cond, GF_PLF_1);
> > +  bool true_invar = !!(EDGE_SUCC (cond_bb, branch)->flags & EDGE_TRUE_VALUE);
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +   {
> > +     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
> > +              current_function_name (), loop1->num,
> > +              true_invar ? "T" : "F", cond_bb->index);
> > +     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
> > +   }
> > +
> > +  initialize_original_copy_tables ();
> > +
> > +  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> > +                                     profile_probability::always (),
> > +                                     profile_probability::never (),
> > +                                     profile_probability::always (),
> > +                                     profile_probability::always (),
> > +                                     true);
> > +  if (!loop2)
> > +    {
> > +      free_original_copy_tables ();
> > +      return false;
> > +    }
> > +
> > +  /* Generate a bool type temporary to hold result of the condition. */
> > +  tree tmp = make_ssa_name (boolean_type_node);
> > +  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> > +  gimple *stmt = gimple_build_assign (tmp,
> > +                                      gimple_cond_code (cond),
> > +                                      gimple_cond_lhs (cond),
> > +                                      gimple_cond_rhs (cond));
> > +
> > +  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
> > +  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
> > +  update_stmt (cond);
> > +
> > +  /* Replace the condition in loop2 with a bool constant to let pass
> > +     manager remove the variant branch after current pass finishes. */
> > +  basic_block cond_bb_copy = get_bb_copy (cond_bb);
> > +  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
> > +
> > +  if (true_invar)
> > +    gimple_cond_make_true (cond_copy);
> > +  else
> > +    gimple_cond_make_false (cond_copy);
> > +
> > +  update_stmt (cond_copy);
> > +
> > +  /* Insert a new conditional statement on latch edge of loop1. This
> > +     statement acts as a switch to transfer execution from loop1 to
> > +     loop2, when loop1 enters into invariant state. */
> > +  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
> > +  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
> > +  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
> > +                                          NULL_TREE, NULL_TREE);
> > +
> > +  gsi = gsi_last_bb (break_bb);
> > +  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
> > +
> > +  edge to_loop1 = single_succ_edge (break_bb);
> > +  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
> > +
> > +  to_loop1->flags &= ~EDGE_FALLTHRU;
> > +
> > +  if (true_invar)
> > +    {
> > +      to_loop1->flags |= EDGE_FALSE_VALUE;
> > +      to_loop2->flags |= EDGE_TRUE_VALUE;
> > +    }
> > +  else
> > +    {
> > +      to_loop1->flags |= EDGE_TRUE_VALUE;
> > +      to_loop2->flags |= EDGE_FALSE_VALUE;
> > +    }
> > +
> > +  update_ssa (TODO_update_ssa);
> > +
> > +  /* Due to introduction of a control flow edge from loop1 latch to loop2
> > +     pre-header, we should update PHIs in loop2 to reflect this connection
> > +     between loop1 and loop2. */
> > +  connect_loop_phis (loop1, loop2, to_loop2);
> > +
> > +  free_original_copy_tables ();
> > +
> > +  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
> > +
> > +  return true;
> > +}
> > +
> > +/* Main entry point to perform loop splitting for suitable if-conditions
> > +   in all loops. */
> > +
> > +static unsigned int
> > +tree_ssa_split_loops_for_cond (void)
> > +{
> > +  struct loop *loop;
> > +  auto_vec<struct loop *> loop_list;
> > +  bool changed = false;
> > +  unsigned i;
> > +
> > +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> > +    loop->aux = NULL;
> > +
> > +  /* Go through all loops starting from innermost. */
> > +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> > +    {
> > +      /* Put loop in a list if found a conditional statement candidate in
> > +         the loop. This is stage for analysis, no change anything in the
> > +         function. */
> > +      if (!loop->aux
> > +          && !optimize_loop_for_size_p (loop)
> > +          && mark_cond_to_split_loop (loop))
> > +        loop_list.safe_push (loop);
> > +
> > +      /* If any of our inner loops was split, don't split us,
> > +         and mark our containing loop as having had splits as well. */
> > +      loop_outer (loop)->aux = loop->aux;
> > +    }
> > +
> > +  FOR_EACH_VEC_ELT (loop_list, i, loop)
> > +    {
> > +      /* Extract selected loop and perform loop split. This is stage for
> > +         transformation. */
> > +      changed |= split_loop_for_cond (loop);
> > +
> > +      delete (split_info *) loop->aux;
> > +    }
> > +
> > +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> > +    loop->aux = NULL;
> > +
> > +  if (changed)
> > +    return TODO_cleanup_cfg;
> > +  return 0;
> > +}
> > +
> > +
> >  /* Loop splitting pass.  */
> >
> >  namespace {
> > @@ -716,3 +1517,48 @@ make_pass_loop_split (gcc::context *ctxt)
> >  {
> >    return new pass_loop_split (ctxt);
> >  }
> > +
> > +namespace {
> > +
> > +const pass_data pass_data_cond_loop_split =
> > +{
> > +  GIMPLE_PASS, /* type */
> > +  "cond_lsplit", /* name */
> > +  OPTGROUP_LOOP, /* optinfo_flags */
> > +  TV_COND_LOOP_SPLIT, /* tv_id */
> > +  PROP_cfg, /* properties_required */
> > +  0, /* properties_provided */
> > +  0, /* properties_destroyed */
> > +  0, /* todo_flags_start */
> > +  0, /* todo_flags_finish */
> > +};
> > +
> > +class pass_cond_loop_split : public gimple_opt_pass
> > +{
> > +public:
> > +  pass_cond_loop_split (gcc::context *ctxt)
> > +    : gimple_opt_pass (pass_data_cond_loop_split, ctxt)
> > +  {}
> > +
> > +  /* opt_pass methods: */
> > +  virtual bool gate (function *) { return flag_split_loops != 0; }
> > +  virtual unsigned int execute (function *);
> > +
> > +}; // class pass_cond_loop_split
> > +
> > +unsigned int
> > +pass_cond_loop_split::execute (function *fun)
> > +{
> > +  if (number_of_loops (fun) <= 1)
> > +    return 0;
> > +
> > +  return tree_ssa_split_loops_for_cond ();
> > +}
> > +
> > +} // anon namespace
> > +
> > +gimple_opt_pass *
> > +make_pass_cond_loop_split (gcc::context *ctxt)
> > +{
> > +  return new pass_cond_loop_split (ctxt);
> > +}
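
For readers following the quoted patch, the reachability walk in is_cond_in_hidden_loop () can be modelled outside GCC. The following is only a minimal sketch under stated assumptions: the CFG is a plain adjacency list, and the names toy_cfg and cond_in_hidden_cycle are hypothetical, not GCC types or functions. It shows the same idea: start from the chosen branch, treat the latch as the end point, and report whether the condition block can be reached again within one iteration.

```cpp
#include <cassert>
#include <vector>

// Toy CFG: succs[b] lists the successors of block b.  This is an
// illustrative stand-in, unrelated to GCC's basic_block/edge types.
using toy_cfg = std::vector<std::vector<int>>;

// Return true if COND can be reached again from BRANCH without passing
// through LATCH, i.e. the condition may run more than once per iteration.
static bool
cond_in_hidden_cycle (const toy_cfg &succs, int header, int cond,
                      int branch, int latch)
{
  if (cond == header || branch == latch)
    return false;

  std::vector<bool> seen (succs.size (), false);
  std::vector<int> worklist;

  seen[latch] = true;   // The latch terminates the traversal.
  seen[branch] = true;
  worklist.push_back (branch);

  while (!worklist.empty ())
    {
      int bb = worklist.back ();
      worklist.pop_back ();

      for (int succ : succs[bb])
        {
          if (succ == cond)
            return true;

          if (!seen[succ])
            {
              seen[succ] = true;
              worklist.push_back (succ);
            }
        }
    }

  return false;
}
```

With blocks 0 (header), 1 (condition), 2 and 3 (branches) and 4 (latch), a plain diamond gives false, while an extra edge from block 2 back to block 1 gives true.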

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-03-12  8:33 ` Richard Biener
  2019-03-13  2:13   ` Feng Xue OS
@ 2019-05-06  3:04   ` Feng Xue OS
  2019-05-06 10:17     ` Richard Biener
  1 sibling, 1 reply; 31+ messages in thread
From: Feng Xue OS @ 2019-05-06  3:04 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

Hi Richard,


Since GCC 9 has been released, could you find some time to take a look at this patch? Thanks.


Feng

________________________________
From: Richard Biener <richard.guenther@gmail.com>
Sent: Tuesday, March 12, 2019 4:31:49 PM
To: Feng Xue OS
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Loop split upon semi-invariant condition (PR tree-optimization/89134)

On Tue, Mar 12, 2019 at 7:20 AM Feng Xue OS <fxue@os.amperecomputing.com> wrote:
>
> This patch implements a loop transformation on one of the loop's conditional statements, which we call semi-invariant, in that its computation is impacted in only one of its branches.
>
> Suppose a loop as:
>
>     void f (std::map<int, int> m)
>     {
>         for (auto it = m.begin (); it != m.end (); ++it) {
>             /* if (b) is semi-invariant. */
>             if (b) {
>                 b = do_something();    /* Has effect on b */
>             } else {
>                                                         /* No effect on b */
>             }
>             statements;                      /* Also no effect on b */
>         }
>     }
>
> A transformation, kind of loop split, could be:
>
>     void f (std::map<int, int> m)
>     {
>         for (auto it = m.begin (); it != m.end (); ++it) {
>             if (b) {
>                 b = do_something();
>             } else {
>                 ++it;
>                 statements;
>                 break;
>             }
>             statements;
>         }
>
>         for (; it != m.end (); ++it) {
>             statements;
>         }
>     }
>
> If "statements" contains nothing, the second loop becomes an empty one, which can be removed. (This part will be given in another patch). And if "statements" are straight line instructions, we get an opportunity to vectorize the second loop. In practice, this optimization is found to improve some real application by 7%.
>
> Since it is just a kind of loop split, the code is mainly placed in the existing tree-ssa-loop-split module, is controlled by -fsplit-loops, and is enabled with -O3.

Note the transform itself is jump-threading with the threading
duplicating a whole CFG cycle.
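
To make the source-level effect concrete, here is a hand-written sketch of the loop from the description before and after the split. It is an illustration only, not output of the patch: do_something (), its call budget, and the summing body standing in for "statements" are all hypothetical. Once b is observed false it can no longer change, so the rest of the iteration space runs in a second loop that has no test of b.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for do_something (): returns true until the
// budget runs out, after which the flag becomes (and stays) false.
static bool
do_something (int &budget)
{
  return --budget > 0;
}

// Original form: `if (b)' is semi-invariant, because only its true
// branch can modify b.
static long
original_loop (const std::map<int, int> &m, bool b, int &budget)
{
  long sum = 0;
  for (auto it = m.begin (); it != m.end (); ++it)
    {
      if (b)
        b = do_something (budget);
      sum += it->second;        // "statements": no effect on b
    }
  return sum;
}

// Hand-split form: leave the first loop as soon as b is false and
// finish the remaining elements in a loop without the condition.
static long
split_loop (const std::map<int, int> &m, bool b, int &budget)
{
  long sum = 0;
  auto it = m.begin ();
  for (; it != m.end (); ++it)
    {
      if (!b)
        break;                  // b is invariant from here on
      b = do_something (budget);
      sum += it->second;
    }
  for (; it != m.end (); ++it)
    sum += it->second;          // candidate for vectorization
  return sum;
}
```

Both functions call do_something () the same number of times and visit every element exactly once, which is the property the transformation has to preserve.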

I didn't look at the patch details yet since this is suitable for GCC 10 only.

Thanks for implementing this.
Richard.

> Feng
>
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 64bf6017d16..a6c2878d652 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,23 @@
> +2019-03-12  Feng Xue <fxue@os.amperecomputing.com>
> +
> +       PR tree-optimization/89134
> +        * doc/invoke.texi (max-cond-loop-split-insns): Document new --params.
> +       (min-cond-loop-split-prob): Likewise.
> +       * params.def: Add max-cond-loop-split-insns, min-cond-loop-split-prob.
> +       * passes.def (pass_cond_loop_split) : New pass.
> +       * timevar.def (TV_COND_LOOP_SPLIT): New time variable.
> +       * tree-pass.h (make_pass_cond_loop_split): New declaration.
> +       * tree-ssa-loop-split.c (split_info): New class.
> +       (find_vdef_in_loop, vuse_semi_invariant_p): New functions.
> +       (ssa_semi_invariant_p, stmt_semi_invariant_p): Likewise.
> +       (can_branch_be_excluded, get_cond_invariant_branch): Likewise.
> +       (is_cond_in_hidden_loop, compute_added_num_insns): Likewise.
> +       (can_split_loop_on_cond, mark_cond_to_split_loop): Likewise.
> +       (split_loop_for_cond, tree_ssa_split_loops_for_cond): Likewise.
> +       (pass_data_cond_loop_split): New variable.
> +       (pass_cond_loop_split): New class.
> +       (make_pass_cond_loop_split): New function.
> +
>  2019-03-11  Jakub Jelinek  <jakub@redhat.com>
>
>         PR middle-end/89655
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index df0883f2fc9..f5e09bd71fd 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11316,6 +11316,14 @@ The maximum number of branches unswitched in a single loop.
>  @item lim-expensive
>  The minimum cost of an expensive expression in the loop invariant motion.
>
> +@item max-cond-loop-split-insns
> +The maximum number of insns that may be added by a loop split on a
> +semi-invariant condition statement.
> +
> +@item min-cond-loop-split-prob
> +The minimum probability threshold for a semi-invariant condition
> +statement to trigger a loop split.
> +
>  @item iv-consider-all-candidates-bound
>  Bound on number of candidates for induction variables, below which
>  all candidates are considered for each use in induction variable
> diff --git a/gcc/params.def b/gcc/params.def
> index 3f1576448be..2e067526958 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -386,6 +386,18 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
>         "The maximum number of unswitchings in a single loop.",
>         3, 0, 0)
>
> +/* The maximum number of increased insns due to loop split on semi-invariant
> +   condition statement.  */
> +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> +       "max-cond-loop-split-insns",
> +       "The maximum number of insns to be increased due to loop split on semi-invariant condition statement.",
> +       100, 0, 0)
> +
> +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> +       "min-cond-loop-split-prob",
> +       "The minimum threshold for probability of semi-invariant condition statement to trigger loop split.",
> +       30, 0, 100)
> +
>  /* The maximum number of insns in loop header duplicated by the copy loop
>     headers pass.  */
>  DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 446a7c48276..bde7f4c50c0 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -265,6 +265,7 @@ along with GCC; see the file COPYING3.  If not see
>           NEXT_PASS (pass_tree_unswitch);
>           NEXT_PASS (pass_scev_cprop);
>           NEXT_PASS (pass_loop_split);
> +         NEXT_PASS (pass_cond_loop_split);
>           NEXT_PASS (pass_loop_versioning);
>           NEXT_PASS (pass_loop_jam);
>           /* All unswitching, final value replacement and splitting can expose
> diff --git a/gcc/timevar.def b/gcc/timevar.def
> index 54154464a58..39f2df0e3ec 100644
> --- a/gcc/timevar.def
> +++ b/gcc/timevar.def
> @@ -189,6 +189,7 @@ DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
>  DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
>  DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
>  DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
> +DEFTIMEVAR (TV_COND_LOOP_SPLIT       , "loop splitting for conditions")
>  DEFTIMEVAR (TV_LOOP_JAM              , "unroll and jam")
>  DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
>  DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 47be59b2a11..f441ba36871 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -367,6 +367,7 @@ extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_linterchange (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_cond_loop_split (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_loop_jam (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 999c9a30366..d287a0d7d4c 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -32,7 +32,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-loop.h"
>  #include "tree-ssa-loop-manip.h"
>  #include "tree-into-ssa.h"
> +#include "tree-inline.h"
>  #include "cfgloop.h"
> +#include "params.h"
>  #include "tree-scalar-evolution.h"
>  #include "gimple-iterator.h"
>  #include "gimple-pretty-print.h"
> @@ -40,7 +42,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple-fold.h"
>  #include "gimplify-me.h"
>
> -/* This file implements loop splitting, i.e. transformation of loops like
> +/* This file implements two kinds of loop splitting.
> +
> +   One transformation of loops like:
>
>     for (i = 0; i < 100; i++)
>       {
> @@ -670,6 +674,803 @@ tree_ssa_split_loops (void)
>    return 0;
>  }
>
> +
> +/* Another transformation of loops like:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))
> +         a_j = ...;  // changes at least one a_j
> +       else
> +         S;          // does not change any a_j
> +     }
> +
> +   into:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))
> +         a_j = ...;
> +       else
> +         {
> +           S;
> +           i = NEXT ();
> +           break;
> +         }
> +     }
> +
> +   for (; CHECK (i); i = NEXT ())
> +     {
> +       S;
> +     }
> +
> +   */
> +
> +/* Data structure to hold temporary information during loop split upon
> +   semi-invariant conditional statement. */
> +class split_info {
> +public:
> +  /* Array of all basic blocks in a loop, returned by get_loop_body(). */
> +  basic_block *bbs;
> +
> +  /* All memory store/clobber statements in a loop. */
> +  auto_vec<gimple *> stores;
> +
> +  /* Whether above memory stores vector has been filled. */
> +  bool set_stores;
> +
> +  /* Semi-invariant conditional statement, upon which to split loop. */
> +  gcond *cond;
> +
> +  split_info () : bbs (NULL),  set_stores (false), cond (NULL) { }
> +
> +  ~split_info ()
> +    {
> +      if (bbs)
> +        free (bbs);
> +    }
> +};
> +
> +/* Find all statements with memory-write effect in a loop, including memory
> +   stores and non-pure function calls, and keep them in a vector. This is
> +   done only once, since the vector stays constant during the analysis
> +   stage of a semi-invariant condition. */
> +
> +static void
> +find_vdef_in_loop (struct loop *loop)
> +{
> +  split_info *info = (split_info *) loop->aux;
> +  gphi *vphi = get_virtual_phi (loop->header);
> +
> +  /* Indicate memory store vector has been filled. */
> +  info->set_stores = true;
> +
> +  /* If loop contains memory operation, there must be a virtual PHI node in
> +     loop header basic block. */
> +  if (vphi == NULL)
> +    return;
> +
> +  /* All virtual SSA names inside the loop are connected to be a cyclic
> +     graph via virtual PHI nodes. The virtual PHI node in loop header just
> +     links the first and the last virtual SSA names, by using the last as
> +     PHI operand to define the first. */
> +  const edge latch = loop_latch_edge (loop);
> +  const tree first = gimple_phi_result (vphi);
> +  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
> +
> +  /* The virtual SSA cyclic graph might consist of only one SSA name, which
> +     is defined by itself.
> +
> +        .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
> +
> +     This means the loop contains only memory loads, so we can skip it. */
> +  if (first == last)
> +    return;
> +
> +  auto_vec<gimple *> others;
> +  auto_vec<tree> worklist;
> +  auto_bitmap visited;
> +
> +  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
> +  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
> +  worklist.safe_push (last);
> +
> +  do
> +    {
> +      tree vuse = worklist.pop ();
> +      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
> +
> +      /* We mark the first and last SSA names as visited at the beginning,
> +         and start the walk backwards from the last SSA name toward the
> +         first, which ensures that this do-while will not touch SSA names
> +         defined outside of the loop. */
> +      gcc_assert (gimple_bb (stmt)
> +                  && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
> +
> +      if (gimple_code (stmt) == GIMPLE_PHI)
> +        {
> +          gphi *phi = as_a <gphi *> (stmt);
> +
> +          for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> +            {
> +              tree arg = gimple_phi_arg_def (stmt, i);
> +
> +              if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
> +                worklist.safe_push (arg);
> +            }
> +        }
> +      else
> +        {
> +          tree prev = gimple_vuse (stmt);
> +
> +          /* Non-pure call statement is conservatively assumed to impact
> +             all memory locations. So place call statements ahead of other
> +             memory stores in the vector, with the idea of using them as
> +             shortcut terminators for memory alias analysis, as a
> +             compile-time optimization. */
> +          if (gimple_code (stmt) == GIMPLE_CALL)
> +            info->stores.safe_push (stmt);
> +          else
> +            others.safe_push (stmt);
> +
> +          if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
> +            worklist.safe_push (prev);
> +        }
> +    } while (!worklist.is_empty ());
> +
> +  info->stores.safe_splice (others);
> +}
> +
> +
> +/* Given a memory load or pure call statement, check whether it is impacted
> +   by some memory store in the loop, excluding those basic blocks dominated
> +   by SKIP_HEAD (those basic blocks always correspond to one branch of
> +   a conditional statement). If SKIP_HEAD is NULL, all basic blocks of the
> +   loop are checked. */
> +
> +static bool
> +vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                       const_basic_block skip_head)
> +{
> +  split_info *info = (split_info *) loop->aux;
> +
> +  /* Collect memory store/clobber statements if not done yet. */
> +  if (!info->set_stores)
> +    find_vdef_in_loop (loop);
> +
> +  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
> +  ao_ref ref;
> +  gimple *store;
> +  unsigned i;
> +
> +  ao_ref_init (&ref, rhs);
> +
> +  FOR_EACH_VEC_ELT (info->stores, i, store)
> +    {
> +      /* Skip those basic blocks dominated by SKIP_HEAD. */
> +      if (skip_head
> +          && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
> +        continue;
> +
> +      /* For a pure call, it is assumed to be impacted by any memory store.
> +         For a memory load, use memory alias analysis to check that. */
> +      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
> +        return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Forward declaration */
> +
> +static bool
> +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                       const_basic_block skip_head);
> +
> +/* Suppose one condition branch, led by SKIP_HEAD, is not executed in a
> +   certain iteration; check whether an SSA name remains unchanged in the
> +   next iteration. We call this property semi-invariantness. SKIP_HEAD
> +   might be NULL; if so, nothing is excluded, and all basic blocks and
> +   control flows in the loop will be considered. */
> +
> +static bool
> +ssa_semi_invariant_p (struct loop *loop, const tree name,
> +                      const_basic_block skip_head)
> +{
> +  gimple *def = SSA_NAME_DEF_STMT (name);
> +  const_basic_block def_bb = gimple_bb (def);
> +
> +  /* An SSA name defined outside a loop is definitely semi-invariant. */
> +  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
> +    return true;
> +
> +  /* This function is used to check semi-invariantness of a condition
> +     statement, and SKIP_HEAD is always given as head of one of its
> +     branches. So it implies that SSA name to check should be defined
> +     before the conditional statement, and also before SKIP_HEAD. */
> +
> +  if (gimple_code (def) == GIMPLE_PHI)
> +    {
> +      /* In a normal loop, if a PHI node is located not in loop header, all
> +         its source operands should be defined inside the loop. As we
> +         mentioned before, these source definitions are ahead of SKIP_HEAD,
> +         and will not be bypassed. Therefore, in each iteration, any of
> +         these sources might be value provider to the SSA name, which for
> +         sure should not be seen as invariant. */
> +      if (def_bb != loop->header || !skip_head)
> +        return false;
> +
> +      const_edge latch = loop_latch_edge (loop);
> +      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
> +
> +      /* A PHI node in loop header always contains two source operands,
> +         one is initial value, the other is the copy of last iteration
> +         through loop latch, we call it latch value. If, on the way from
> +         this PHI node to the definition of the latch value, excluding
> +         basic blocks dominated by SKIP_HEAD, there is no other definition
> +         of the same variable, the SSA name defined by the PHI node is
> +         semi-invariant.
> +
> +                         loop entry
> +                              |     .--- latch ---.
> +                              |     |             |
> +                              v     v             |
> +                  x_1 = PHI <x_0,  x_3>           |
> +                           |                      |
> +                           v                      |
> +              .------- if (cond) -------.         |
> +              |                         |         |
> +              |                     [ SKIP ]      |
> +              |                         |         |
> +              |                     x_2 = ...     |
> +              |                         |         |
> +              '---- T ---->.<---- F ----'         |
> +                           |                      |
> +                           v                      |
> +                  x_3 = PHI <x_1, x_2>            |
> +                           |                      |
> +                           '----------------------'
> +
> +        Suppose in certain iteration, execution flow in above graph goes
> +        through true branch, which means that one source value to define
> +        x_3 in false branch (x_2) is skipped, x_3 only comes from x_1, and
> +        x_1 in next iterations is defined by x_3, we know that x_1 will
> +        never change if COND always chooses true branch from then on. */
> +
> +      while (from != name)
> +        {
> +          /* A new value comes from a CONSTANT. */
> +          if (TREE_CODE (from) != SSA_NAME)
> +            return false;
> +
> +          gimple *stmt = SSA_NAME_DEF_STMT (from);
> +          const_basic_block bb = gimple_bb (stmt);
> +
> +          /* A new value comes from outside of loop. */
> +          if (!bb || !flow_bb_inside_loop_p (loop, bb))
> +            return false;
> +
> +          from = NULL_TREE;
> +
> +          if (gimple_code (stmt) == GIMPLE_PHI)
> +            {
> +              gphi *phi = as_a <gphi *> (stmt);
> +
> +              for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> +                {
> +                  const_edge e = gimple_phi_arg_edge (phi, i);
> +
> +                  /* Skip redefinition from basic blocks being excluded. */
> +                  if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
> +                    {
> +                      /* There are more than one source operands that can
> +                         provide value to the SSA name. */
> +                      if (from)
> +                        return false;
> +
> +                      from = gimple_phi_arg_def (phi, i);
> +                    }
> +                }
> +            }
> +          else if (gimple_code (stmt) == GIMPLE_ASSIGN)
> +            {
> +              /* For simple value copy, check its rhs instead. */
> +              if (gimple_assign_ssa_name_copy_p (stmt))
> +                from = gimple_assign_rhs1 (stmt);
> +            }
> +
> +          /* Any other kind of definition is deemed to introduce a new value
> +             to the SSA name. */
> +          if (!from)
> +            return false;
> +        }
> +      return true;
> +    }
> +
> +  /* Value originated from volatile memory load or return of normal (non-
> +     const/pure) call should not be treated as constant in each iteration. */
> +  if (gimple_has_side_effects (def))
> +    return false;
> +
> +  /* Check if any memory store may kill memory load at this place. */
> +  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
> +    return false;
> +
> +  /* Check operands of definition statement of the SSA name. */
> +  return stmt_semi_invariant_p (loop, def, skip_head);
> +}
> +
> +/* Check whether a statement is semi-invariant, iff all its operands are
> +   semi-invariant. */
> +
> +static bool
> +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                       const_basic_block skip_head)
> +{
> +  ssa_op_iter iter;
> +  tree use;
> +
> +  /* Although operand of a statement might be SSA name, CONSTANT or VARDECL,
> +     here we only need to check SSA name operands. A VARDECL operand
> +     involves a memory load, and the check on it must have been done
> +     prior to invocation of this function in ssa_semi_invariant_p. */
> +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
> +    {
> +      if (!ssa_semi_invariant_p (loop, use, skip_head))
> +        return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Determine whether, when one branch of a conditional statement is not
> +   selected, we can exclude the leading basic block of the branch and those
> +   basic blocks dominated by the leading one. */
> +
> +static bool
> +can_branch_be_excluded (basic_block branch_bb)
> +{
> +  if (single_pred_p (branch_bb))
> +    return true;
> +
> +  edge e;
> +  edge_iterator ei;
> +
> +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
> +    {
> +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
> +        continue;
> +
> +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
> +        continue;
> +
> +      /* The branch can be reached through other path, not just from the
> +         conditional statement. */
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Find out which branch of a conditional statement is invariant. That
> +   is: once the branch is selected in certain loop iteration, any operand
> +   that contributes to computation of the conditional statement remains
> +   unchanged in all following iterations. */
> +
> +static int
> +get_cond_invariant_branch (struct loop *loop, gcond *cond)
> +{
> +  basic_block cond_bb = gimple_bb (cond);
> +  basic_block targ_bb[2];
> +  bool invar[2];
> +  unsigned invar_checks;
> +
> +  for (unsigned i = 0; i < 2; i++)
> +    {
> +      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
> +
> +      /* One branch directs to loop exit, no need to perform loop split upon
> +         this conditional statement. Firstly, it is trivial if the exit
> +         branch is semi-invariant, for the statement is just loop-breaking.
> +         Secondly, if the opposite branch is semi-invariant, it means that
> +         the statement is real loop-invariant, which is covered by loop
> +         unswitch. */
> +      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
> +        return -1;
> +    }
> +
> +  invar_checks = 0;
> +
> +  for (unsigned i = 0; i < 2; i++)
> +    {
> +      invar[!i] = false;
> +
> +      if (!can_branch_be_excluded (targ_bb[i]))
> +        continue;
> +
> +      /* Given a semi-invariant branch, if its opposite branch dominates
> +         loop latch, it and its following trace will only be executed in
> +         final iteration of loop, namely it is not part of repeated body
> +         of the loop. Similar to the above case that the branch is loop
> +         exit, no need to split loop. */
> +      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
> +        continue;
> +
> +      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
> +      invar_checks++;
> +    }
> +
> +  /* Both branches being invariant (handled by loop unswitch) or both being
> +     variant is not what we want. */
> +  if (invar[0] ^ !invar[1])
> +    return -1;
> +
> +  /* Found a real loop-invariant condition, do nothing. */
> +  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
> +    return -1;
> +
> +  return invar[1];
> +}
> +
> +/* Return TRUE if conditional statement in a normal loop is also inside
> +   a nested non-recognized loop, such as an irreducible loop. */
> +
> +static bool
> +is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb,
> +                        int branch)
> +{
> +  basic_block branch_bb = EDGE_SUCC (cond_bb, branch)->dest;
> +
> +  if (cond_bb == loop->header || branch_bb == loop->latch)
> +    return false;
> +
> +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> +  auto_vec<basic_block> worklist;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    bbs[i]->flags &= ~BB_REACHABLE;
> +
> +  /* Mark latch basic block as visited to be end point for reachability
> +     traversal. */
> +  loop->latch->flags |= BB_REACHABLE;
> +
> +  gcc_assert (flow_bb_inside_loop_p (loop, branch_bb));
> +
> +  /* Start from specified branch, the opposite branch is ignored for it
> +     will not be executed. */
> +  branch_bb->flags |= BB_REACHABLE;
> +  worklist.safe_push (branch_bb);
> +
> +  do
> +    {
> +      basic_block bb = worklist.pop ();
> +      edge e;
> +      edge_iterator ei;
> +
> +      FOR_EACH_EDGE (e, ei, bb->succs)
> +        {
> +          basic_block succ_bb = e->dest;
> +
> +          if (succ_bb == cond_bb)
> +            return true;
> +
> +          if (!flow_bb_inside_loop_p (loop, succ_bb))
> +            continue;
> +
> +          if (succ_bb->flags & BB_REACHABLE)
> +            continue;
> +
> +          succ_bb->flags |= BB_REACHABLE;
> +          worklist.safe_push (succ_bb);
> +        }
> +    } while (!worklist.is_empty ());
> +
> +  return false;
> +}
> +
> +
> +/* Calculate increased code size measured by estimated insn number if
> +   applying loop split upon certain branch of a conditional statement. */
> +
> +static int
> +compute_added_num_insns (struct loop *loop, const_basic_block cond_bb,
> +                         int branch)
> +{
> +  const_basic_block targ_bb_var = EDGE_SUCC (cond_bb, !branch)->dest;
> +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> +  int num = 0;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    {
> +      /* Do not count basic blocks only in opposite branch. */
> +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var))
> +        continue;
> +
> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi);
> +           gsi_next (&gsi))
> +        num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);
> +    }
> +
> +  return num;
> +}
> +
> +/* Return true if it is eligible and profitable to perform loop split upon
> +   a conditional statement. */
> +
> +static bool
> +can_split_loop_on_cond (struct loop *loop, gcond *cond)
> +{
> +  int branch = get_cond_invariant_branch (loop, cond);
> +
> +  if (branch < 0)
> +    return false;
> +
> +  basic_block cond_bb = gimple_bb (cond);
> +
> +  /* Add a threshold for increased code size to disable loop split. */
> +  if (compute_added_num_insns (loop, cond_bb, branch) >
> +      PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))
> +    return false;
> +
> +  /* In each iteration, conditional statement candidate should be
> +     executed only once. */
> +  if (is_cond_in_hidden_loop (loop, cond_bb, branch))
> +    return false;
> +
> +  profile_probability prob = EDGE_SUCC (cond_bb, branch)->probability;
> +
> +  /* When accurate profile information is available, and execution
> +     frequency of the branch is too low, just let it go. */
> +  if (prob.reliable_p ())
> +    {
> +      int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB);
> +
> +      if (prob < profile_probability::always ().apply_scale (thres, 100))
> +        return false;
> +    }
> +
> +  /* Temporarily keep branch index in conditional statement. */
> +  gimple_set_plf (cond, GF_PLF_1, branch);
> +  return true;
> +}
> +
> +/* Traverse all conditional statements in a loop, to find out a good
> +   candidate upon which we can do loop split. */
> +
> +static bool
> +mark_cond_to_split_loop (struct loop *loop)
> +{
> +  split_info *info = new split_info ();
> +  basic_block *bbs = info->bbs = get_loop_body (loop);
> +
> +  /* Allocate an area to keep temporary info, and associate its address
> +     with loop aux field. */
> +  loop->aux = info;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    {
> +      basic_block bb = bbs[i];
> +
> +      /* Skip statement in inner recognized loop, because we want that
> +         conditional statement executes at most once in each iteration. */
> +      if (bb->loop_father != loop)
> +        continue;
> +
> +      /* Actually this check is not a hard constraint. With it, we can
> +         ensure conditional statement will execute at least once in
> +         each iteration. */
> +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> +        continue;
> +
> +      gimple *last = last_stmt (bb);
> +
> +      if (!last || gimple_code (last) != GIMPLE_COND)
> +        continue;
> +
> +      gcond *cond = as_a <gcond *> (last);
> +
> +      if (can_split_loop_on_cond (loop, cond))
> +        {
> +          info->cond = cond;
> +          return true;
> +        }
> +    }
> +
> +  delete info;
> +  loop->aux = NULL;
> +
> +  return false;
> +}
> +
> +/* Given a loop with a chosen conditional statement candidate, perform loop
> +   split transformation illustrated as the following graph.
> +
> +               .-------T------ if (true) ------F------.
> +               |                    .---------------. |
> +               |                    |               | |
> +               v                    |               v v
> +          pre-header                |            pre-header
> +               | .------------.     |                 | .------------.
> +               | |            |     |                 | |            |
> +               | v            |     |                 | v            |
> +             header           |     |               header           |
> +               |              |     |                 |              |
> +       [ bool r = cond; ]     |     |                 |              |
> +               |              |     |                 |              |
> +      .---- if (r) -----.     |     |        .--- if (true) ---.     |
> +      |                 |     |     |        |                 |     |
> +  invariant             |     |     |    invariant             |     |
> +      |                 |     |     |        |                 |     |
> +      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
> +               |              |    /                  |              |
> +             stmts            |   /                 stmts            |
> +               |              |  /                    |              |
> +              / \             | /                    / \             |
> +     .-------*   *       [ if (!r) ]        .-------*   *            |
> +     |           |            |             |           |            |
> +     |         latch          |             |         latch          |
> +     |           |            |             |           |            |
> +     |           '------------'             |           '------------'
> +     '------------------------. .-----------'
> +             loop1            | |                   loop2
> +                              v v
> +                             exits
> +
> +   In the graph, loop1 represents the part derived from original one, and
> +   loop2 is duplicated using loop_version (), which corresponds to the part
> +   of original one being split out. In loop1, a new bool temporary (r)
> +   is introduced to keep value of the condition result. In original latch
> +   edge of loop1, we insert a new conditional statement whose value comes
> +   from previous temporary (r), one of its branches goes back to loop1 header
> +   as a latch edge, and the other branch goes to loop2 pre-header as an
> +   entry edge. And also in loop2, we abandon the variant branch of the
> +   conditional statement candidate by setting a constant bool condition,
> +   based on which branch is semi-invariant. */
> +
> +static bool
> +split_loop_for_cond (struct loop *loop1)
> +{
> +  split_info *info = (split_info *) loop1->aux;
> +  gcond *cond = info->cond;
> +  basic_block cond_bb = gimple_bb (cond);
> +  int branch = gimple_plf (cond, GF_PLF_1);
> +  bool true_invar = !!(EDGE_SUCC (cond_bb, branch)->flags & EDGE_TRUE_VALUE);
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
> +               current_function_name (), loop1->num,
> +               true_invar ? "T" : "F", cond_bb->index);
> +      print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
> +    }
> +
> +  initialize_original_copy_tables ();
> +
> +  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> +                                     profile_probability::always (),
> +                                     profile_probability::never (),
> +                                     profile_probability::always (),
> +                                     profile_probability::always (),
> +                                     true);
> +  if (!loop2)
> +    {
> +      free_original_copy_tables ();
> +      return false;
> +    }
> +
> +  /* Generate a bool type temporary to hold result of the condition. */
> +  tree tmp = make_ssa_name (boolean_type_node);
> +  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> +  gimple *stmt = gimple_build_assign (tmp,
> +                                      gimple_cond_code (cond),
> +                                      gimple_cond_lhs (cond),
> +                                      gimple_cond_rhs (cond));
> +
> +  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
> +  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
> +  update_stmt (cond);
> +
> +  /* Replace the condition in loop2 with a bool constant to let pass
> +     manager remove the variant branch after current pass finishes. */
> +  basic_block cond_bb_copy = get_bb_copy (cond_bb);
> +  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
> +
> +  if (true_invar)
> +    gimple_cond_make_true (cond_copy);
> +  else
> +    gimple_cond_make_false (cond_copy);
> +
> +  update_stmt (cond_copy);
> +
> +  /* Insert a new conditional statement on latch edge of loop1. This
> +     statement acts as a switch to transfer execution from loop1 to
> +     loop2, when loop1 enters into invariant state. */
> +  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
> +  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
> +  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
> +                                          NULL_TREE, NULL_TREE);
> +
> +  gsi = gsi_last_bb (break_bb);
> +  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
> +
> +  edge to_loop1 = single_succ_edge (break_bb);
> +  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
> +
> +  to_loop1->flags &= ~EDGE_FALLTHRU;
> +
> +  if (true_invar)
> +    {
> +      to_loop1->flags |= EDGE_FALSE_VALUE;
> +      to_loop2->flags |= EDGE_TRUE_VALUE;
> +    }
> +  else
> +    {
> +      to_loop1->flags |= EDGE_TRUE_VALUE;
> +      to_loop2->flags |= EDGE_FALSE_VALUE;
> +    }
> +
> +  update_ssa (TODO_update_ssa);
> +
> +  /* Due to introduction of a control flow edge from loop1 latch to loop2
> +     pre-header, we should update PHIs in loop2 to reflect this connection
> +     between loop1 and loop2. */
> +  connect_loop_phis (loop1, loop2, to_loop2);
> +
> +  free_original_copy_tables ();
> +
> +  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
> +
> +  return true;
> +}
> +
> +/* Main entry point to perform loop splitting for suitable if-conditions
> +   in all loops. */
> +
> +static unsigned int
> +tree_ssa_split_loops_for_cond (void)
> +{
> +  struct loop *loop;
> +  auto_vec<struct loop *> loop_list;
> +  bool changed = false;
> +  unsigned i;
> +
> +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> +    loop->aux = NULL;
> +
> +  /* Go through all loops starting from innermost. */
> +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> +    {
> +      /* Put the loop in a list if a conditional statement candidate is
> +         found in it. This is the analysis stage; nothing in the function
> +         is changed. */
> +      if (!loop->aux
> +          && !optimize_loop_for_size_p (loop)
> +          && mark_cond_to_split_loop (loop))
> +        loop_list.safe_push (loop);
> +
> +      /* If any of our inner loops was split, don't split us,
> +         and mark our containing loop as having had splits as well. */
> +      loop_outer (loop)->aux = loop->aux;
> +    }
> +
> +  FOR_EACH_VEC_ELT (loop_list, i, loop)
> +    {
> +      /* Extract the selected loop and split it. This is the
> +         transformation stage. */
> +      changed |= split_loop_for_cond (loop);
> +
> +      delete (split_info *) loop->aux;
> +    }
> +
> +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> +    loop->aux = NULL;
> +
> +  if (changed)
> +    return TODO_cleanup_cfg;
> +  return 0;
> +}
> +
> +
>  /* Loop splitting pass.  */
>
>  namespace {
> @@ -716,3 +1517,48 @@ make_pass_loop_split (gcc::context *ctxt)
>  {
>    return new pass_loop_split (ctxt);
>  }
> +
> +namespace {
> +
> +const pass_data pass_data_cond_loop_split =
> +{
> +  GIMPLE_PASS, /* type */
> +  "cond_lsplit", /* name */
> +  OPTGROUP_LOOP, /* optinfo_flags */
> +  TV_COND_LOOP_SPLIT, /* tv_id */
> +  PROP_cfg, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_cond_loop_split : public gimple_opt_pass
> +{
> +public:
> +  pass_cond_loop_split (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_cond_loop_split, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) { return flag_split_loops != 0; }
> +  virtual unsigned int execute (function *);
> +
> +}; // class pass_cond_loop_split
> +
> +unsigned int
> +pass_cond_loop_split::execute (function *fun)
> +{
> +  if (number_of_loops (fun) <= 1)
> +    return 0;
> +
> +  return tree_ssa_split_loops_for_cond ();
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_cond_loop_split (gcc::context *ctxt)
> +{
> +  return new pass_cond_loop_split (ctxt);
> +}
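
For readers following the thread, the source-level effect of the transformation described in the cover letter can be sketched in standalone C++. This is only an illustration and not code from the patch: the vector input, the `limit` parameter, and the functions `original_loop`, `split_loop`, and `check_equivalence` are hypothetical stand-ins, with `prefix < limit` playing the role of do_something() and `total += 2 * v[i]` playing the role of "statements".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original form: `b` is semi-invariant, because only the true branch
// can change it. Once `b` becomes false it stays false.
int original_loop(const std::vector<int>& v, int limit) {
  bool b = true;
  int prefix = 0;  // touched only while b is true
  int total = 0;   // the "statements" part, no effect on b
  for (std::size_t i = 0; i < v.size(); ++i) {
    if (b) {
      prefix += v[i];
      b = prefix < limit;  // hypothetical stand-in for do_something()
    }
    total += 2 * v[i];
  }
  return total + prefix;
}

// Split form: as soon as the invariant (false) branch is taken, finish
// the current iteration, leave the first loop, and continue in a second
// loop that no longer tests `b`.
int split_loop(const std::vector<int>& v, int limit) {
  bool b = true;
  int prefix = 0;
  int total = 0;
  std::size_t i = 0;
  for (; i < v.size(); ++i) {
    if (b) {
      prefix += v[i];
      b = prefix < limit;
    } else {
      total += 2 * v[i];  // "statements" for this iteration
      ++i;                // the increment skipped by the break
      break;
    }
    total += 2 * v[i];
  }
  for (; i < v.size(); ++i)  // condition-free remainder
    total += 2 * v[i];
  return total + prefix;
}

// Both forms are expected to compute the same value for any input.
void check_equivalence(const std::vector<int>& v, int limit) {
  assert(original_loop(v, limit) == split_loop(v, limit));
}
```

Calling `check_equivalence` on a few inputs (an empty vector, a limit that is never reached, a limit reached mid-way) exercises the three paths through the split form. The second loop has a straight-line body, which is where the vectorization opportunity mentioned in the cover letter would come from.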

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-05-06  3:04   ` Feng Xue OS
@ 2019-05-06 10:17     ` Richard Biener
  2019-06-18  7:00       ` Ping: [PATCH V2] " Feng Xue OS
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Biener @ 2019-05-06 10:17 UTC (permalink / raw)
  To: Feng Xue OS, Michael Matz; +Cc: gcc-patches

On Mon, May 6, 2019 at 5:04 AM Feng Xue OS <fxue@os.amperecomputing.com> wrote:
>
> Hi Richard,
>
>
>    Since gcc 9 has been released, will you get some time to take a look at this patch? Thanks.

I'm working through the backlog but I also hope Micha can have a look
here since he authored the loop splitting code.

Richard.

>
> Feng
>
> ________________________________
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Tuesday, March 12, 2019 4:31:49 PM
> To: Feng Xue OS
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Loop split upon semi-invariant condition (PR tree-optimization/89134)
>
> On Tue, Mar 12, 2019 at 7:20 AM Feng Xue OS <fxue@os.amperecomputing.com> wrote:
> >
> > This patch implements a loop transformation based on one of the loop's conditional statements, which we call semi-invariant, in that its computation is affected by only one of its branches.
> >
> > Suppose a loop as:
> >
> >     void f (std::map<int, int> m)
> >     {
> >         for (auto it = m.begin (); it != m.end (); ++it) {
> >             /* if (b) is semi-invariant. */
> >             if (b) {
> >                 b = do_something();    /* Has effect on b */
> >             } else {
> >                                                         /* No effect on b */
> >             }
> >             statements;                      /* Also no effect on b */
> >         }
> >     }
> >
> > A transformation, kind of loop split, could be:
> >
> >     void f (std::map<int, int> m)
> >     {
> >         for (auto it = m.begin (); it != m.end (); ++it) {
> >             if (b) {
> >                 b = do_something();
> >             } else {
> >                 ++it;
> >                 statements;
> >                 break;
> >             }
> >             statements;
> >         }
> >
> >         for (; it != m.end (); ++it) {
> >             statements;
> >         }
> >     }
> >
> > If "statements" contains nothing, the second loop becomes an empty one, which can be removed. (This part will be given in another patch.) And if "statements" are straight-line instructions, we get an opportunity to vectorize the second loop. In practice, this optimization is found to improve a real application by 7%.
> >
> > Since it is just a kind of loop split, the code is mainly placed in the existing tree-ssa-loop-split module, is controlled by -fsplit-loops, and is enabled at -O3.
>
> Note the transform itself is jump-threading with the threading
> duplicating a whole CFG cycle.
>
> I didn't look at the patch details yet since this is suitable for GCC 10 only.
>
> Thanks for implementing this.
> Richard.
>
> > Feng
> >
> >
> > diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> > index 64bf6017d16..a6c2878d652 100644
> > --- a/gcc/ChangeLog
> > +++ b/gcc/ChangeLog
> > @@ -1,3 +1,23 @@
> > +2019-03-12  Feng Xue <fxue@os.amperecomputing.com>
> > +
> > +       PR tree-optimization/89134
> > +        * doc/invoke.texi (max-cond-loop-split-insns): Document new --params.
> > +       (min-cond-loop-split-prob): Likewise.
> > +       * params.def: Add max-cond-loop-split-insns, min-cond-loop-split-prob.
> > +       * passes.def (pass_cond_loop_split): New pass.
> > +       * timevar.def (TV_COND_LOOP_SPLIT): New time variable.
> > +       * tree-pass.h (make_pass_cond_loop_split): New declaration.
> > +       * tree-ssa-loop-split.c (split_info): New class.
> > +       (find_vdef_in_loop, vuse_semi_invariant_p): New functions.
> > +       (ssa_semi_invariant_p, stmt_semi_invariant_p): Likewise.
> > +       (can_branch_be_excluded, get_cond_invariant_branch): Likewise.
> > +       (is_cond_in_hidden_loop, compute_added_num_insns): Likewise.
> > +       (can_split_loop_on_cond, mark_cond_to_split_loop): Likewise.
> > +       (split_loop_for_cond, tree_ssa_split_loops_for_cond): Likewise.
> > +       (pass_data_cond_loop_split): New variable.
> > +       (pass_cond_loop_split): New class.
> > +       (make_pass_cond_loop_split): New function.
> > +
> >  2019-03-11  Jakub Jelinek  <jakub@redhat.com>
> >
> >         PR middle-end/89655
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index df0883f2fc9..f5e09bd71fd 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -11316,6 +11316,14 @@ The maximum number of branches unswitched in a single loop.
> >  @item lim-expensive
> >  The minimum cost of an expensive expression in the loop invariant motion.
> >
> > +@item max-cond-loop-split-insns
> > +The maximum number of insns by which code size may increase due to loop
> > +split on a semi-invariant condition statement.
> > +
> > +@item min-cond-loop-split-prob
> > +The minimum threshold for probability of semi-invariant condition
> > +statement to trigger loop split.
> > +
> >  @item iv-consider-all-candidates-bound
> >  Bound on number of candidates for induction variables, below which
> >  all candidates are considered for each use in induction variable
> > diff --git a/gcc/params.def b/gcc/params.def
> > index 3f1576448be..2e067526958 100644
> > --- a/gcc/params.def
> > +++ b/gcc/params.def
> > @@ -386,6 +386,18 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
> >         "The maximum number of unswitchings in a single loop.",
> >         3, 0, 0)
> >
> > +/* The maximum number of increased insns due to loop split on semi-invariant
> > +   condition statement.  */
> > +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> > +       "max-cond-loop-split-insns",
> > +       "The maximum number of insns to be increased due to loop split on semi-invariant condition statement.",
> > +       100, 0, 0)
> > +
> > +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> > +       "min-cond-loop-split-prob",
> > +       "The minimum threshold for probability of semi-invariant condition statement to trigger loop split.",
> > +       30, 0, 100)
> > +
> >  /* The maximum number of insns in loop header duplicated by the copy loop
> >     headers pass.  */
> >  DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
> > diff --git a/gcc/passes.def b/gcc/passes.def
> > index 446a7c48276..bde7f4c50c0 100644
> > --- a/gcc/passes.def
> > +++ b/gcc/passes.def
> > @@ -265,6 +265,7 @@ along with GCC; see the file COPYING3.  If not see
> >           NEXT_PASS (pass_tree_unswitch);
> >           NEXT_PASS (pass_scev_cprop);
> >           NEXT_PASS (pass_loop_split);
> > +         NEXT_PASS (pass_cond_loop_split);
> >           NEXT_PASS (pass_loop_versioning);
> >           NEXT_PASS (pass_loop_jam);
> >           /* All unswitching, final value replacement and splitting can expose
> > diff --git a/gcc/timevar.def b/gcc/timevar.def
> > index 54154464a58..39f2df0e3ec 100644
> > --- a/gcc/timevar.def
> > +++ b/gcc/timevar.def
> > @@ -189,6 +189,7 @@ DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
> >  DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
> >  DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
> >  DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
> > +DEFTIMEVAR (TV_COND_LOOP_SPLIT       , "loop splitting for conditions")
> >  DEFTIMEVAR (TV_LOOP_JAM              , "unroll and jam")
> >  DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
> >  DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
> > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> > index 47be59b2a11..f441ba36871 100644
> > --- a/gcc/tree-pass.h
> > +++ b/gcc/tree-pass.h
> > @@ -367,6 +367,7 @@ extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_linterchange (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
> > +extern gimple_opt_pass *make_pass_cond_loop_split (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_loop_jam (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
> > diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> > index 999c9a30366..d287a0d7d4c 100644
> > --- a/gcc/tree-ssa-loop-split.c
> > +++ b/gcc/tree-ssa-loop-split.c
> > @@ -32,7 +32,9 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "tree-ssa-loop.h"
> >  #include "tree-ssa-loop-manip.h"
> >  #include "tree-into-ssa.h"
> > +#include "tree-inline.h"
> >  #include "cfgloop.h"
> > +#include "params.h"
> >  #include "tree-scalar-evolution.h"
> >  #include "gimple-iterator.h"
> >  #include "gimple-pretty-print.h"
> > @@ -40,7 +42,9 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "gimple-fold.h"
> >  #include "gimplify-me.h"
> >
> > -/* This file implements loop splitting, i.e. transformation of loops like
> > +/* This file implements two kinds of loop splitting.
> > +
> > +   One transformation of loops like:
> >
> >     for (i = 0; i < 100; i++)
> >       {
> > @@ -670,6 +674,803 @@ tree_ssa_split_loops (void)
> >    return 0;
> >  }
> >
> > +
> > +/* Another transformation of loops like:
> > +
> > +   for (i = INIT (); CHECK (i); i = NEXT ())
> > +     {
> > +       if (expr (a_1, a_2, ..., a_n))
> > +         a_j = ...;  // change at least one a_j
> > +       else
> > +         S;          // not change any a_j
> > +     }
> > +
> > +   into:
> > +
> > +   for (i = INIT (); CHECK (i); i = NEXT ())
> > +     {
> > +       if (expr (a_1, a_2, ..., a_n))
> > +         a_j = ...;
> > +       else
> > +         {
> > +           S;
> > +           i = NEXT ();
> > +           break;
> > +         }
> > +     }
> > +
> > +   for (; CHECK (i); i = NEXT ())
> > +     {
> > +       S;
> > +     }
> > +
> > +   */
> > +
> > +/* Data structure to hold temporary information during loop split upon
> > +   semi-invariant conditional statement. */
> > +class split_info {
> > +public:
> > +  /* Array of all basic blocks in a loop, returned by get_loop_body(). */
> > +  basic_block *bbs;
> > +
> > +  /* All memory store/clobber statements in a loop. */
> > +  auto_vec<gimple *> stores;
> > +
> > +  /* Whether above memory stores vector has been filled. */
> > +  bool set_stores;
> > +
> > +  /* Semi-invariant conditional statement, upon which to split loop. */
> > +  gcond *cond;
> > +
> > +  split_info () : bbs (NULL),  set_stores (false), cond (NULL) { }
> > +
> > +  ~split_info ()
> > +    {
> > +      if (bbs)
> > +        free (bbs);
> > +    }
> > +};
> > +
> > +/* Find all statements with memory-write effect in a loop, including memory
> > +   store and non-pure function call, and keep those in a vector. This work
> > +   is only done for one time, for the vector should be constant during
> > +   analysis stage of semi-invariant condition. */
> > +
> > +static void
> > +find_vdef_in_loop (struct loop *loop)
> > +{
> > +  split_info *info = (split_info *) loop->aux;
> > +  gphi *vphi = get_virtual_phi (loop->header);
> > +
> > +  /* Indicate memory store vector has been filled. */
> > +  info->set_stores = true;
> > +
> > +  /* If loop contains memory operation, there must be a virtual PHI node in
> > +     loop header basic block. */
> > +  if (vphi == NULL)
> > +    return;
> > +
> > +  /* All virtual SSA names inside the loop are connected to be a cyclic
> > +     graph via virtual PHI nodes. The virtual PHI node in loop header just
> > +     links the first and the last virtual SSA names, by using the last as
> > +     PHI operand to define the first. */
> > +  const edge latch = loop_latch_edge (loop);
> > +  const tree first = gimple_phi_result (vphi);
> > +  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
> > +
> > +  /* The virtual SSA cyclic graph might consist of only one SSA name, which
> > +     is defined by itself.
> > +
> > +        .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
> > +
> > +     This means the loop contains only memory loads, so we can skip it. */
> > +  if (first == last)
> > +    return;
> > +
> > +  auto_vec<gimple *> others;
> > +  auto_vec<tree> worklist;
> > +  auto_bitmap visited;
> > +
> > +  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
> > +  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
> > +  worklist.safe_push (last);
> > +
> > +  do
> > +    {
> > +      tree vuse = worklist.pop ();
> > +      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
> > +
> > +      /* We mark the first and last SSA names as visited at the beginning,
> > +         and walk backwards from the last SSA name toward the first, which
> > +         ensures that this do-while loop will not touch SSA names defined
> > +         outside of the loop. */
> > +      gcc_assert (gimple_bb (stmt)
> > +                  && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
> > +
> > +      if (gimple_code (stmt) == GIMPLE_PHI)
> > +        {
> > +          gphi *phi = as_a <gphi *> (stmt);
> > +
> > +          for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> > +            {
> > +              tree arg = gimple_phi_arg_def (stmt, i);
> > +
> > +              if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
> > +                worklist.safe_push (arg);
> > +            }
> > +        }
> > +      else
> > +        {
> > +          tree prev = gimple_vuse (stmt);
> > +
> > +          /* A non-pure call statement is conservatively assumed to impact
> > +             all memory locations. So place call statements ahead of other
> > +             memory stores in the vector, so that they act as shortcut
> > +             terminators for memory alias analysis; this is a compile-time
> > +             optimization. */
> > +          if (gimple_code (stmt) == GIMPLE_CALL)
> > +            info->stores.safe_push (stmt);
> > +          else
> > +            others.safe_push (stmt);
> > +
> > +          if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
> > +            worklist.safe_push (prev);
> > +        }
> > +    } while (!worklist.is_empty ());
> > +
> > +  info->stores.safe_splice (others);
> > +}
> > +
> > +
> > +/* Given a memory load or pure call statement, check whether it is impacted
> > +   by some memory store in the loop, excluding those basic blocks dominated
> > +   by SKIP_HEAD (those basic blocks always correspond to one branch of
> > +   a conditional statement). If SKIP_HEAD is NULL, all basic blocks of the
> > +   loop are checked. */
> > +
> > +static bool
> > +vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
> > +                       const_basic_block skip_head)
> > +{
> > +  split_info *info = (split_info *) loop->aux;
> > +
> > +  /* Collect memory store/clobber statements if that has not been done. */
> > +  if (!info->set_stores)
> > +    find_vdef_in_loop (loop);
> > +
> > +  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
> > +  ao_ref ref;
> > +  gimple *store;
> > +  unsigned i;
> > +
> > +  ao_ref_init (&ref, rhs);
> > +
> > +  FOR_EACH_VEC_ELT (info->stores, i, store)
> > +    {
> > +      /* Skip those basic blocks dominated by SKIP_HEAD. */
> > +      if (skip_head
> > +          && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
> > +        continue;
> > +
> > +      /* For a pure call, it is assumed to be impacted by any memory store.
> > +         For a memory load, use memory alias analysis to check that. */
> > +      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
> > +        return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Forward declaration */
> > +
> > +static bool
> > +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> > +                       const_basic_block skip_head);
> > +
> > +/* Suppose one condition branch, led by SKIP_HEAD, is not executed in a
> > +   certain iteration; check whether an SSA name remains unchanged in the
> > +   next iteration. We call this characteristic semi-invariantness.
> > +   SKIP_HEAD might be NULL; if so, nothing is excluded, and all basic
> > +   blocks and control flows in the loop are considered. */
> > +
> > +static bool
> > +ssa_semi_invariant_p (struct loop *loop, const tree name,
> > +                      const_basic_block skip_head)
> > +{
> > +  gimple *def = SSA_NAME_DEF_STMT (name);
> > +  const_basic_block def_bb = gimple_bb (def);
> > +
> > +  /* An SSA name defined outside a loop is definitely semi-invariant. */
> > +  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
> > +    return true;
> > +
> > +  /* This function is used to check semi-invariantness of a condition
> > +     statement, and SKIP_HEAD is always given as head of one of its
> > +     branches. So it implies that SSA name to check should be defined
> > +     before the conditional statement, and also before SKIP_HEAD. */
> > +
> > +  if (gimple_code (def) == GIMPLE_PHI)
> > +    {
> > +      /* In a normal loop, if a PHI node is located not in loop header, all
> > +         its source operands should be defined inside the loop. As we
> > +         mentioned before, these source definitions are ahead of SKIP_HEAD,
> > +         and will not be bypassed. Therefore, in each iteration, any of
> > +         these sources might be value provider to the SSA name, which for
> > +         sure should not be seen as invariant. */
> > +      if (def_bb != loop->header || !skip_head)
> > +        return false;
> > +
> > +      const_edge latch = loop_latch_edge (loop);
> > +      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
> > +
> > +      /* A PHI node in loop header always contains two source operands,
> > +         one is initial value, the other is the copy of last iteration
> > +         through loop latch, we call it latch value. From this PHI node
> > +         to definition of latch value, if excluding those basic blocks
> > +         dominated by SKIP_HEAD, there is no definition of other version
> > +         of same variable, SSA name defined by the PHI node is
> > +         semi-invariant.
> > +
> > +                         loop entry
> > +                              |     .--- latch ---.
> > +                              |     |             |
> > +                              v     v             |
> > +                  x_1 = PHI <x_0,  x_3>           |
> > +                           |                      |
> > +                           v                      |
> > +              .------- if (cond) -------.         |
> > +              |                         |         |
> > +              |                     [ SKIP ]      |
> > +              |                         |         |
> > +              |                     x_2 = ...     |
> > +              |                         |         |
> > +              '---- T ---->.<---- F ----'         |
> > +                           |                      |
> > +                           v                      |
> > +                  x_3 = PHI <x_1, x_2>            |
> > +                           |                      |
> > +                           '----------------------'
> > +
> > +        Suppose that in a certain iteration, execution flow in the above
> > +        graph goes through the true branch, which means that one source
> > +        value to define x_3 in the false branch (x_2) is skipped, x_3 only
> > +        comes from x_1, and x_1 in the next iterations is defined by x_3.
> > +        We then know that x_1 will never change if COND always chooses the
> > +        true branch from then on. */
> > +
> > +      while (from != name)
> > +        {
> > +          /* A new value comes from a CONSTANT. */
> > +          if (TREE_CODE (from) != SSA_NAME)
> > +            return false;
> > +
> > +          gimple *stmt = SSA_NAME_DEF_STMT (from);
> > +          const_basic_block bb = gimple_bb (stmt);
> > +
> > +          /* A new value comes from outside of loop. */
> > +          if (!bb || !flow_bb_inside_loop_p (loop, bb))
> > +            return false;
> > +
> > +          from = NULL_TREE;
> > +
> > +          if (gimple_code (stmt) == GIMPLE_PHI)
> > +            {
> > +              gphi *phi = as_a <gphi *> (stmt);
> > +
> > +              for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> > +                {
> > +                  const_edge e = gimple_phi_arg_edge (phi, i);
> > +
> > +                  /* Skip redefinition from basic blocks being excluded. */
> > +                  if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
> > +                    {
> > +                      /* There is more than one source operand that can
> > +                         provide a value to the SSA name. */
> > +                      if (from)
> > +                        return false;
> > +
> > +                      from = gimple_phi_arg_def (phi, i);
> > +                    }
> > +                }
> > +            }
> > +          else if (gimple_code (stmt) == GIMPLE_ASSIGN)
> > +            {
> > +              /* For simple value copy, check its rhs instead. */
> > +              if (gimple_assign_ssa_name_copy_p (stmt))
> > +                from = gimple_assign_rhs1 (stmt);
> > +            }
> > +
> > +          /* Any other kind of definition is deemed to introduce a new value
> > +             to the SSA name. */
> > +          if (!from)
> > +            return false;
> > +        }
> > +      return true;
> > +    }
> > +
> > +  /* Value originated from volatile memory load or return of normal (non-
> > +     const/pure) call should not be treated as constant in each iteration. */
> > +  if (gimple_has_side_effects (def))
> > +    return false;
> > +
> > +  /* Check if any memory store may kill memory load at this place. */
> > +  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
> > +    return false;
> > +
> > +  /* Check operands of definition statement of the SSA name. */
> > +  return stmt_semi_invariant_p (loop, def, skip_head);
> > +}
> > +
> > +/* Check whether a statement is semi-invariant, iff all its operands are
> > +   semi-invariant. */
> > +
> > +static bool
> > +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> > +                       const_basic_block skip_head)
> > +{
> > +  ssa_op_iter iter;
> > +  tree use;
> > +
> > +  /* Although an operand of a statement might be an SSA name, CONSTANT or
> > +     VARDECL, here we only need to check SSA name operands. A VARDECL
> > +     operand involves a memory load, which must already have been checked
> > +     in ssa_semi_invariant_p before this function is invoked. */
> > +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
> > +    {
> > +      if (!ssa_semi_invariant_p (loop, use, skip_head))
> > +        return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Determine whether, when one branch of a conditional statement is not
> > +   selected, we can exclude the leading basic block of the branch and the
> > +   basic blocks dominated by it. */
> > +
> > +static bool
> > +can_branch_be_excluded (basic_block branch_bb)
> > +{
> > +  if (single_pred_p (branch_bb))
> > +    return true;
> > +
> > +  edge e;
> > +  edge_iterator ei;
> > +
> > +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
> > +    {
> > +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
> > +        continue;
> > +
> > +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
> > +        continue;
> > +
> > +      /* The branch can be reached through another path, not just from
> > +         the conditional statement. */
> > +      return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Find out which branch of a conditional statement is invariant. That
> > +   is: once the branch is selected in certain loop iteration, any operand
> > +   that contributes to computation of the conditional statement remains
> > +   unchanged in all following iterations. */
> > +
> > +static int
> > +get_cond_invariant_branch (struct loop *loop, gcond *cond)
> > +{
> > +  basic_block cond_bb = gimple_bb (cond);
> > +  basic_block targ_bb[2];
> > +  bool invar[2];
> > +  unsigned invar_checks;
> > +
> > +  for (unsigned i = 0; i < 2; i++)
> > +    {
> > +      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
> > +
> > +      /* One branch directs to loop exit, no need to perform loop split upon
> > +         this conditional statement. Firstly, it is trivial if the exit
> > +         branch is semi-invariant, for the statement is just loop-breaking.
> > +         Secondly, if the opposite branch is semi-invariant, it means that
> > +         the statement is real loop-invariant, which is covered by loop
> > +         unswitch. */
> > +      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
> > +        return -1;
> > +    }
> > +
> > +  invar_checks = 0;
> > +
> > +  for (unsigned i = 0; i < 2; i++)
> > +    {
> > +      invar[!i] = false;
> > +
> > +      if (!can_branch_be_excluded (targ_bb[i]))
> > +        continue;
> > +
> > +      /* Given a semi-invariant branch, if its opposite branch dominates
> > +         loop latch, it and its following trace will only be executed in
> > +         final iteration of loop, namely it is not part of repeated body
> > +         of the loop. Similar to the above case that the branch is loop
> > +         exit, no need to split loop. */
> > +      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
> > +        continue;
> > +
> > +      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
> > +      invar_checks++;
> > +    }
> > +
> > +  /* Both branches being invariant (handled by loop unswitching) or both
> > +     being variant is not what we want. */
> > +  if (invar[0] ^ !invar[1])
> > +    return -1;
> > +
> > +  /* Found a real loop-invariant condition, do nothing. */
> > +  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
> > +    return -1;
> > +
> > +  return invar[1];
> > +}
> > +
> > +/* Return TRUE if a conditional statement in a normal loop is also inside
> > +   a nested non-recognized loop, such as an irreducible loop. */
> > +
> > +static bool
> > +is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb,
> > +                        int branch)
> > +{
> > +  basic_block branch_bb = EDGE_SUCC (cond_bb, branch)->dest;
> > +
> > +  if (cond_bb == loop->header || branch_bb == loop->latch)
> > +    return false;
> > +
> > +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> > +  auto_vec<basic_block> worklist;
> > +
> > +  for (unsigned i = 0; i < loop->num_nodes; i++)
> > +    bbs[i]->flags &= ~BB_REACHABLE;
> > +
> > +  /* Mark the latch basic block as visited so that it acts as the end
> > +     point of the reachability traversal. */
> > +  loop->latch->flags |= BB_REACHABLE;
> > +
> > +  gcc_assert (flow_bb_inside_loop_p (loop, branch_bb));
> > +
> > +  /* Start from specified branch, the opposite branch is ignored for it
> > +     will not be executed. */
> > +  branch_bb->flags |= BB_REACHABLE;
> > +  worklist.safe_push (branch_bb);
> > +
> > +  do
> > +    {
> > +      basic_block bb = worklist.pop ();
> > +      edge e;
> > +      edge_iterator ei;
> > +
> > +      FOR_EACH_EDGE (e, ei, bb->succs)
> > +        {
> > +          basic_block succ_bb = e->dest;
> > +
> > +          if (succ_bb == cond_bb)
> > +            return true;
> > +
> > +          if (!flow_bb_inside_loop_p (loop, succ_bb))
> > +            continue;
> > +
> > +          if (succ_bb->flags & BB_REACHABLE)
> > +            continue;
> > +
> > +          succ_bb->flags |= BB_REACHABLE;
> > +          worklist.safe_push (succ_bb);
> > +        }
> > +    } while (!worklist.is_empty ());
> > +
> > +  return false;
> > +}
> > +
> > +
> > +/* Calculate increased code size measured by estimated insn number if
> > +   applying loop split upon certain branch of a conditional statement. */
> > +
> > +static int
> > +compute_added_num_insns (struct loop *loop, const_basic_block cond_bb,
> > +                         int branch)
> > +{
> > +  const_basic_block targ_bb_var = EDGE_SUCC (cond_bb, !branch)->dest;
> > +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> > +  int num = 0;
> > +
> > +  for (unsigned i = 0; i < loop->num_nodes; i++)
> > +    {
> > +      /* Do not count basic blocks only in opposite branch. */
> > +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var))
> > +        continue;
> > +
> > +      for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi);
> > +           gsi_next (&gsi))
> > +        num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);
> > +    }
> > +
> > +  return num;
> > +}
> > +
> > +/* Return true if it is eligible and profitable to perform loop split upon
> > +   a conditional statement. */
> > +
> > +static bool
> > +can_split_loop_on_cond (struct loop *loop, gcond *cond)
> > +{
> > +  int branch = get_cond_invariant_branch (loop, cond);
> > +
> > +  if (branch < 0)
> > +    return false;
> > +
> > +  basic_block cond_bb = gimple_bb (cond);
> > +
> > +  /* Add a threshold for increased code size to disable loop split. */
> > +  if (compute_added_num_insns (loop, cond_bb, branch) >
> > +      PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))
> > +    return false;
> > +
> > +  /* In each iteration, conditional statement candidate should be
> > +     executed only once. */
> > +  if (is_cond_in_hidden_loop (loop, cond_bb, branch))
> > +    return false;
> > +
> > +  profile_probability prob = EDGE_SUCC (cond_bb, branch)->probability;
> > +
> > +  /* When accurate profile information is available, and execution
> > +     frequency of the branch is too low, do not split the loop. */
> > +  if (prob.reliable_p ())
> > +    {
> > +      int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB);
> > +
> > +      if (prob < profile_probability::always ().apply_scale (thres, 100))
> > +        return false;
> > +    }
> > +
> > +  /* Temporarily keep branch index in conditional statement. */
> > +  gimple_set_plf (cond, GF_PLF_1, branch);
> > +  return true;
> > +}
> > +
> > +/* Traverse all conditional statements in a loop, to find out a good
> > +   candidate upon which we can do loop split. */
> > +
> > +static bool
> > +mark_cond_to_split_loop (struct loop *loop)
> > +{
> > +  split_info *info = new split_info ();
> > +  basic_block *bbs = info->bbs = get_loop_body (loop);
> > +
> > +  /* Allocate an area to keep temporary info, and associate its address
> > +     with loop aux field. */
> > +  loop->aux = info;
> > +
> > +  for (unsigned i = 0; i < loop->num_nodes; i++)
> > +    {
> > +      basic_block bb = bbs[i];
> > +
> > +      /* Skip statement in inner recognized loop, because we want that
> > +         conditional statement executes at most once in each iteration. */
> > +      if (bb->loop_father != loop)
> > +        continue;
> > +
> > +      /* This check is not strictly required. With it, we can
> > +         ensure conditional statement will execute at least once in
> > +         each iteration. */
> > +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> > +        continue;
> > +
> > +      gimple *last = last_stmt (bb);
> > +
> > +      if (!last || gimple_code (last) != GIMPLE_COND)
> > +        continue;
> > +
> > +      gcond *cond = as_a <gcond *> (last);
> > +
> > +      if (can_split_loop_on_cond (loop, cond))
> > +        {
> > +          info->cond = cond;
> > +          return true;
> > +        }
> > +    }
> > +
> > +  delete info;
> > +  loop->aux = NULL;
> > +
> > +  return false;
> > +}
> > +
> > +/* Given a loop with a chosen conditional statement candidate, perform loop
> > +   split transformation illustrated as the following graph.
> > +
> > +               .-------T------ if (true) ------F------.
> > +               |                    .---------------. |
> > +               |                    |               | |
> > +               v                    |               v v
> > +          pre-header                |            pre-header
> > +               | .------------.     |                 | .------------.
> > +               | |            |     |                 | |            |
> > +               | v            |     |                 | v            |
> > +             header           |     |               header           |
> > +               |              |     |                 |              |
> > +       [ bool r = cond; ]     |     |                 |              |
> > +               |              |     |                 |              |
> > +      .---- if (r) -----.     |     |        .--- if (true) ---.     |
> > +      |                 |     |     |        |                 |     |
> > +  invariant             |     |     |    invariant             |     |
> > +      |                 |     |     |        |                 |     |
> > +      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
> > +               |              |    /                  |              |
> > +             stmts            |   /                 stmts            |
> > +               |              |  /                    |              |
> > +              / \             | /                    / \             |
> > +     .-------*   *       [ if (!r) ]        .-------*   *            |
> > +     |           |            |             |           |            |
> > +     |         latch          |             |         latch          |
> > +     |           |            |             |           |            |
> > +     |           '------------'             |           '------------'
> > +     '------------------------. .-----------'
> > +             loop1            | |                   loop2
> > +                              v v
> > +                             exits
> > +
> > +   In the graph, loop1 represents the part derived from original one, and
> > +   loop2 is duplicated using loop_version (), which corresponds to the part
> > +   of original one being split out. In loop1, a new bool temporary (r)
> > +   is introduced to keep value of the condition result. In original latch
> > +   edge of loop1, we insert a new conditional statement whose value comes
> > +   from previous temporary (r), one of its branches goes back to loop1 header
> > +   as a latch edge, and the other branch goes to loop2 pre-header as an
> > +   entry edge. And also in loop2, we abandon the variant branch of the
> > +   conditional statement candidate by setting a constant bool condition,
> > +   based on which branch is semi-invariant. */
> > +
> > +static bool
> > +split_loop_for_cond (struct loop *loop1)
> > +{
> > +  split_info *info = (split_info *) loop1->aux;
> > +  gcond *cond = info->cond;
> > +  basic_block cond_bb = gimple_bb (cond);
> > +  int branch = gimple_plf (cond, GF_PLF_1);
> > +  bool true_invar = !!(EDGE_SUCC (cond_bb, branch)->flags & EDGE_TRUE_VALUE);
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +   {
> > +     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
> > +              current_function_name (), loop1->num,
> > +              true_invar ? "T" : "F", cond_bb->index);
> > +     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
> > +   }
> > +
> > +  initialize_original_copy_tables ();
> > +
> > +  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> > +                                     profile_probability::always (),
> > +                                     profile_probability::never (),
> > +                                     profile_probability::always (),
> > +                                     profile_probability::always (),
> > +                                     true);
> > +  if (!loop2)
> > +    {
> > +      free_original_copy_tables ();
> > +      return false;
> > +    }
> > +
> > +  /* Generate a bool type temporary to hold result of the condition. */
> > +  tree tmp = make_ssa_name (boolean_type_node);
> > +  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> > +  gimple *stmt = gimple_build_assign (tmp,
> > +                                      gimple_cond_code (cond),
> > +                                      gimple_cond_lhs (cond),
> > +                                      gimple_cond_rhs (cond));
> > +
> > +  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
> > +  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
> > +  update_stmt (cond);
> > +
> > +  /* Replace the condition in loop2 with a bool constant to let pass
> > +     manager remove the variant branch after current pass finishes. */
> > +  basic_block cond_bb_copy = get_bb_copy (cond_bb);
> > +  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
> > +
> > +  if (true_invar)
> > +    gimple_cond_make_true (cond_copy);
> > +  else
> > +    gimple_cond_make_false (cond_copy);
> > +
> > +  update_stmt (cond_copy);
> > +
> > +  /* Insert a new conditional statement on latch edge of loop1. This
> > +     statement acts as a switch to transfer execution from loop1 to
> > +     loop2, when loop1 enters into invariant state. */
> > +  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
> > +  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
> > +  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
> > +                                          NULL_TREE, NULL_TREE);
> > +
> > +  gsi = gsi_last_bb (break_bb);
> > +  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
> > +
> > +  edge to_loop1 = single_succ_edge (break_bb);
> > +  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
> > +
> > +  to_loop1->flags &= ~EDGE_FALLTHRU;
> > +
> > +  if (true_invar)
> > +    {
> > +      to_loop1->flags |= EDGE_FALSE_VALUE;
> > +      to_loop2->flags |= EDGE_TRUE_VALUE;
> > +    }
> > +  else
> > +    {
> > +      to_loop1->flags |= EDGE_TRUE_VALUE;
> > +      to_loop2->flags |= EDGE_FALSE_VALUE;
> > +    }
> > +
> > +  update_ssa (TODO_update_ssa);
> > +
> > +  /* Due to introduction of a control flow edge from loop1 latch to loop2
> > +     pre-header, we should update PHIs in loop2 to reflect this connection
> > +     between loop1 and loop2. */
> > +  connect_loop_phis (loop1, loop2, to_loop2);
> > +
> > +  free_original_copy_tables ();
> > +
> > +  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
> > +
> > +  return true;
> > +}
> > +
> > +/* Main entry point to perform loop splitting for suitable if-conditions
> > +   in all loops. */
> > +
> > +static unsigned int
> > +tree_ssa_split_loops_for_cond (void)
> > +{
> > +  struct loop *loop;
> > +  auto_vec<struct loop *> loop_list;
> > +  bool changed = false;
> > +  unsigned i;
> > +
> > +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> > +    loop->aux = NULL;
> > +
> > +  /* Go through all loops starting from innermost. */
> > +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> > +    {
> > +      /* Put loop in a list if a conditional statement candidate is found
> > +         in the loop. This is the analysis stage, which does not change
> > +         anything in the function. */
> > +      if (!loop->aux
> > +          && !optimize_loop_for_size_p (loop)
> > +          && mark_cond_to_split_loop (loop))
> > +        loop_list.safe_push (loop);
> > +
> > +      /* If any of our inner loops was split, don't split us,
> > +         and mark our containing loop as having had splits as well. */
> > +      loop_outer (loop)->aux = loop->aux;
> > +    }
> > +
> > +  FOR_EACH_VEC_ELT (loop_list, i, loop)
> > +    {
> > +      /* Extract selected loop and perform loop split. This is the
> > +         transformation stage. */
> > +      changed |= split_loop_for_cond (loop);
> > +
> > +      delete (split_info *) loop->aux;
> > +    }
> > +
> > +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> > +    loop->aux = NULL;
> > +
> > +  if (changed)
> > +    return TODO_cleanup_cfg;
> > +  return 0;
> > +}
> > +
> > +
> >  /* Loop splitting pass.  */
> >
> >  namespace {
> > @@ -716,3 +1517,48 @@ make_pass_loop_split (gcc::context *ctxt)
> >  {
> >    return new pass_loop_split (ctxt);
> >  }
> > +
> > +namespace {
> > +
> > +const pass_data pass_data_cond_loop_split =
> > +{
> > +  GIMPLE_PASS, /* type */
> > +  "cond_lsplit", /* name */
> > +  OPTGROUP_LOOP, /* optinfo_flags */
> > +  TV_COND_LOOP_SPLIT, /* tv_id */
> > +  PROP_cfg, /* properties_required */
> > +  0, /* properties_provided */
> > +  0, /* properties_destroyed */
> > +  0, /* todo_flags_start */
> > +  0, /* todo_flags_finish */
> > +};
> > +
> > +class pass_cond_loop_split : public gimple_opt_pass
> > +{
> > +public:
> > +  pass_cond_loop_split (gcc::context *ctxt)
> > +    : gimple_opt_pass (pass_data_cond_loop_split, ctxt)
> > +  {}
> > +
> > +  /* opt_pass methods: */
> > +  virtual bool gate (function *) { return flag_split_loops != 0; }
> > +  virtual unsigned int execute (function *);
> > +
> > +}; // class pass_cond_loop_split
> > +
> > +unsigned int
> > +pass_cond_loop_split::execute (function *fun)
> > +{
> > +  if (number_of_loops (fun) <= 1)
> > +    return 0;
> > +
> > +  return tree_ssa_split_loops_for_cond ();
> > +}
> > +
> > +} // anon namespace
> > +
> > +gimple_opt_pass *
> > +make_pass_cond_loop_split (gcc::context *ctxt)
> > +{
> > +  return new pass_cond_loop_split (ctxt);
> > +}
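
The reachability walk in the quoted is_cond_in_hidden_loop can be tried
outside of GCC on a toy graph. What follows is only a sketch under
assumptions: the adjacency-list CFG, the integer block ids and the name
reaches_cond_without_latch are hypothetical and not part of the patch. It
mirrors the same idea: pre-mark the latch as visited so the walk stops
there, start from the chosen branch, and report whether the condition block
is reached again.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG: block i has successors succs[i].
// Returns true if COND can be reached again from START without passing
// through LATCH, i.e. COND may run more than once per iteration.
static bool
reaches_cond_without_latch (const std::vector<std::vector<int>> &succs,
                            int cond, int start, int latch)
{
  assert (start >= 0 && static_cast<std::size_t> (start) < succs.size ());

  if (start == latch)
    return false;

  std::vector<bool> visited (succs.size (), false);
  std::vector<int> worklist;

  // Marking the latch as visited makes it the end point of the walk.
  visited[latch] = true;
  visited[start] = true;
  worklist.push_back (start);

  while (!worklist.empty ())
    {
      int bb = worklist.back ();
      worklist.pop_back ();

      for (int succ : succs[bb])
        {
          if (succ == cond)
            return true;

          if (visited[succ])
            continue;

          visited[succ] = true;
          worklist.push_back (succ);
        }
    }

  return false;
}
```

With blocks 0 (header), 1 (condition), 2 and 3 (branches) and 4 (latch), an
extra edge 2->1 makes the walk from block 2 return true, while the walk from
block 3 stops at the latch and returns false.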

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Ping: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-05-06 10:17     ` Richard Biener
@ 2019-06-18  7:00       ` Feng Xue OS
  2019-07-15  2:34         ` Ping agian: " Feng Xue OS
  0 siblings, 1 reply; 31+ messages in thread
From: Feng Xue OS @ 2019-06-18  7:00 UTC (permalink / raw)
  To: Richard Biener, Michael Matz; +Cc: gcc-patches

Richard & Michael,

   I made some coding-style adjustments and added test cases in this version.

   Would you please take a look at the patch? It is a bit long and might take
   some of your time.

Thanks a lot.
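
As a reminder of what the transformation does at source level, here is a
minimal sketch, assuming a simplified integer loop instead of the GIMPLE the
pass really works on. The names script, run_original and run_split are made
up for this illustration; script is just a stand-in for do_something (). The
point is that once the condition is false it stays false, so the rest of the
iterations can run in a second loop that has no condition at all.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for do_something (): replays scripted values,
// then returns 0 forever.
struct script
{
  std::vector<int> values;
  std::size_t pos;

  explicit script (std::vector<int> v) : values (std::move (v)), pos (0) {}

  int next () { return pos < values.size () ? values[pos++] : 0; }
};

// Original loop: "if (b)" is semi-invariant, only its true branch writes b.
static std::vector<int>
run_original (int n, int b, script s)
{
  std::vector<int> trace;

  for (int i = 0; i < n; i++)
    {
      if (b)
        b = s.next ();      // has effect on b
      trace.push_back (b);  // "statements": no effect on b
    }

  return trace;
}

// Split form: leave the first loop as soon as b is seen to be false.
static std::vector<int>
run_split (int n, int b, script s)
{
  std::vector<int> trace;
  int i = 0;

  for (; i < n; i++)
    {
      if (b)
        b = s.next ();
      else
        {
          trace.push_back (b);
          i++;
          break;
        }
      trace.push_back (b);
    }

  // Second loop: the condition is known to be false, only "statements".
  for (; i < n; i++)
    trace.push_back (b);

  return trace;
}
```

Both functions should produce the same trace for any inputs, which is a
quick way to convince oneself that the extra "i++" before the break is
needed.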

----
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9a46f93d89d..2334b184945 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,23 @@
+2019-06-18  Feng Xue <fxue@os.amperecomputing.com>
+
+	PR tree-optimization/89134
+	* doc/invoke.texi (max-cond-loop-split-insns): Document new --params.
+	(min-cond-loop-split-prob): Likewise.
+	* params.def: Add max-cond-loop-split-insns, min-cond-loop-split-prob.
+	* passes.def (pass_cond_loop_split): New pass.
+	* timevar.def (TV_COND_LOOP_SPLIT): New time variable.
+	* tree-pass.h (make_pass_cond_loop_split): New declaration.
+	* tree-ssa-loop-split.c (split_info): New class.
+	(find_vdef_in_loop, vuse_semi_invariant_p): New functions.
+	(ssa_semi_invariant_p, stmt_semi_invariant_p): Likewise.
+	(branch_removable_p, get_cond_invariant_branch): Likewise.
+	(is_cond_in_hidden_loop, compute_added_num_insns): Likewise.
+	(can_split_loop_on_cond, mark_cond_to_split_loop): Likewise.
+	(split_loop_for_cond, tree_ssa_split_loops_for_cond): Likewise.
+	(pass_data_cond_loop_split): New variable.
+	(pass_cond_loop_split): New class.
+	(make_pass_cond_loop_split): New function.
+
 2019-06-18  Kewen Lin  <linkw@gcc.gnu.org>
 
 	PR middle-end/80791
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index eaef4cd63d2..0427fede3d6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11352,6 +11352,14 @@ The maximum number of branches unswitched in a single loop.
 @item lim-expensive
 The minimum cost of an expensive expression in the loop invariant motion.
 
+@item max-cond-loop-split-insns
+The maximum number of insns to be increased due to loop split on
+semi-invariant condition statement.
+
+@item min-cond-loop-split-prob
+The minimum threshold for probability of semi-invariant condition
+statement to trigger loop split.
+
 @item iv-consider-all-candidates-bound
 Bound on number of candidates for induction variables, below which
 all candidates are considered for each use in induction variable
diff --git a/gcc/params.def b/gcc/params.def
index 0db60951413..5384f7d1c4d 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
 	"The maximum number of unswitchings in a single loop.",
 	3, 0, 0)
 
+/* The maximum number of increased insns due to loop split on semi-invariant
+   condition statement.  */
+DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
+	"max-cond-loop-split-insns",
+	"The maximum number of insns to be increased due to loop split on "
+	"semi-invariant condition statement.",
+	100, 0, 0)
+
+DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
+	"min-cond-loop-split-prob",
+	"The minimum threshold for probability of semi-invariant condition "
+	"statement to trigger loop split.",
+	30, 0, 100)
+
 /* The maximum number of insns in loop header duplicated by the copy loop
    headers pass.  */
 DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
diff --git a/gcc/passes.def b/gcc/passes.def
index ad2efabd385..bb32b88738e 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -261,6 +261,7 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_tree_unswitch);
 	  NEXT_PASS (pass_scev_cprop);
 	  NEXT_PASS (pass_loop_split);
+	  NEXT_PASS (pass_cond_loop_split);
 	  NEXT_PASS (pass_loop_versioning);
 	  NEXT_PASS (pass_loop_jam);
 	  /* All unswitching, final value replacement and splitting can expose
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 27a522e0140..9aa069e5c29 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2019-06-18  Feng Xue  <fxue@os.amperecomputing.com>
+
+	* gcc.dg/tree-ssa/loop-cond-split-1.c: New test.
+	* g++.dg/tree-ssa/loop-cond-split-1.C: New test.
+
 2019-06-17  Jakub Jelinek  <jakub@redhat.com>
 
 	* gcc.dg/vect/vect-simd-8.c: New test.
diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
new file mode 100644
index 00000000000..df269c5ee44
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-cond_lsplit-details" } */
+
+#include <string>
+#include <map>
+
+using namespace std;
+
+class  A
+{
+public:
+  bool empty;
+  void set (string s);
+};
+
+class  B
+{
+  map<int, string> m;
+  void f ();
+};
+
+extern A *ga;
+
+void B::f ()
+{
+  for (map<int, string>::iterator iter = m.begin (); iter != m.end (); ++iter)
+    {
+      if (ga->empty)
+        ga->set (iter->second);
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "cond_lsplit" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
new file mode 100644
index 00000000000..a0eb7a26ad5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-cond_lsplit-details" } */
+
+__attribute__((pure)) __attribute__((noinline)) int inc (int i)
+{
+  return i + 1;
+}
+
+extern int do_something (void);
+extern int b;
+
+void test(int n)
+{
+  int i;
+
+  for (i = 0; i < n; i = inc (i))
+    {
+      if (b)
+        b = do_something();
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "cond_lsplit" } } */
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 13cb470b688..5a2a80a29f7 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -188,6 +188,7 @@ DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
 DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
 DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
 DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
+DEFTIMEVAR (TV_COND_LOOP_SPLIT       , "loop splitting for conditions")
 DEFTIMEVAR (TV_LOOP_JAM              , "unroll and jam")
 DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
 DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 3a0b3805d24..cdb7ef3c9f2 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -367,6 +367,7 @@ extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_linterchange (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_cond_loop_split (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_loop_jam (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index 999c9a30366..7239d0cfb00 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -32,7 +32,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop.h"
 #include "tree-ssa-loop-manip.h"
 #include "tree-into-ssa.h"
+#include "tree-inline.h"
 #include "cfgloop.h"
+#include "params.h"
 #include "tree-scalar-evolution.h"
 #include "gimple-iterator.h"
 #include "gimple-pretty-print.h"
@@ -40,7 +42,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "gimplify-me.h"
 
-/* This file implements loop splitting, i.e. transformation of loops like
+/* This file implements two kinds of loop splitting.
+
+   One transformation of loops like:
 
    for (i = 0; i < 100; i++)
      {
@@ -670,6 +674,782 @@ tree_ssa_split_loops (void)
   return 0;
 }
 
+
+/* Another transformation of loops like:
+
+   for (i = INIT (); CHECK (i); i = NEXT ())
+     {
+       if (expr (a_1, a_2, ..., a_n))
+         a_j = ...;  // change at least one a_j
+       else
+         S;          // not change any a_j
+     }
+
+   into:
+
+   for (i = INIT (); CHECK (i); i = NEXT ())
+     {
+       if (expr (a_1, a_2, ..., a_n))
+         a_j = ...;
+       else
+         {
+           S;
+           i = NEXT ();
+           break;
+         }
+     }
+
+   for (; CHECK (i); i = NEXT ())
+     {
+       S;
+     }
+
+   */
+
+/* Data structure to hold temporary information during loop split upon
+   semi-invariant conditional statement.  */
+class split_info {
+public:
+  /* Array of all basic blocks in a loop, returned by get_loop_body().  */
+  basic_block *bbs;
+
+  /* All memory store/clobber statements in a loop.  */
+  auto_vec<gimple *> stores;
+
+  /* Whether above memory stores vector has been filled.  */
+  bool set_stores;
+
+  /* Semi-invariant conditional statement, upon which to split loop.  */
+  gcond *cond;
+
+  split_info () : bbs (NULL),  set_stores (false), cond (NULL) { }
+
+  ~split_info ()
+    {
+      if (bbs)
+	free (bbs);
+    }
+};
+
+/* Find all statements with memory-write effect in LOOP, including memory
+   store and non-pure function call, and keep those in a vector.  This work
+   is only done one time, for the vector should be constant during analysis
+   stage of semi-invariant condition.  */
+
+static void
+find_vdef_in_loop (struct loop *loop)
+{
+  split_info *info = (split_info *) loop->aux;
+  gphi *vphi = get_virtual_phi (loop->header);
+
+  /* Indicate memory store vector has been filled.  */
+  info->set_stores = true;
+
+  /* If loop contains memory operation, there must be a virtual PHI node in
+     loop header basic block.  */
+  if (vphi == NULL)
+    return;
+
+  /* All virtual SSA names inside the loop are connected to be a cyclic
+     graph via virtual PHI nodes.  The virtual PHI node in loop header just
+     links the first and the last virtual SSA names, by using the last as
+     PHI operand to define the first.  */
+  const edge latch = loop_latch_edge (loop);
+  const tree first = gimple_phi_result (vphi);
+  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
+
+  /* The virtual SSA cyclic graph might consist of only one SSA name, which
+     is defined by itself.
+
+       .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
+
+     This means the loop contains only memory loads, so we can skip it.  */
+  if (first == last)
+    return;
+
+  auto_vec<gimple *> others;
+  auto_vec<tree> worklist;
+  auto_bitmap visited;
+
+  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
+  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
+  worklist.safe_push (last);
+
+  do
+    {
+      tree vuse = worklist.pop ();
+      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
+
+      /* We mark the first and last SSA names as visited at the beginning,
+	 and reversely start the process from the last SSA name towards the
+	 first, which ensures that this do-while will not touch SSA names
+	 defined outside of the loop.  */
+      gcc_assert (gimple_bb (stmt)
+		  && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
+
+      if (gimple_code (stmt) == GIMPLE_PHI)
+	{
+	  gphi *phi = as_a <gphi *> (stmt);
+
+	  for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+	    {
+	      tree arg = gimple_phi_arg_def (stmt, i);
+
+	      if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
+		worklist.safe_push (arg);
+	    }
+	}
+      else
+	{
+	  tree prev = gimple_vuse (stmt);
+
+	  /* Non-pure call statement is conservatively assumed to impact all
+	     memory locations.  So place call statements ahead of other memory
+	     stores in the vector with the idea of using them as shortcut
+	     terminators to memory alias analysis.  */
+	  if (gimple_code (stmt) == GIMPLE_CALL)
+	    info->stores.safe_push (stmt);
+	  else
+	    others.safe_push (stmt);
+
+	  if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
+	    worklist.safe_push (prev);
+	}
+    } while (!worklist.is_empty ());
+
+  info->stores.safe_splice (others);
+}
+
+
+/* Given STMT, memory load or pure call statement, check whether it is impacted
+   by some memory store in LOOP, excluding trace starting from SKIP_HEAD (the
+   trace is composed of SKIP_HEAD and those basic block dominated by it, always
+   corresponds to one branch of a conditional statement).  If SKIP_HEAD is
+   NULL, all basic blocks of LOOP are checked.  */
+
+static bool
+vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
+		       const_basic_block skip_head)
+{
+  split_info *info = (split_info *) loop->aux;
+
+  /* Collect memory store/clobber statements if that has not been done.  */
+  if (!info->set_stores)
+    find_vdef_in_loop (loop);
+
+  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
+  ao_ref ref;
+  gimple *store;
+  unsigned i;
+
+  ao_ref_init (&ref, rhs);
+
+  FOR_EACH_VEC_ELT (info->stores, i, store)
+    {
+      /* Skip basic blocks dominated by SKIP_HEAD, if non-NULL.  */
+      if (skip_head
+	  && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
+	continue;
+
+      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
+	return false;
+    }
+
+  return true;
+}
+
+/* Forward declaration.  */
+
+static bool
+stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
+		       const_basic_block skip_head);
+
+/* Suppose one condition branch, led by SKIP_HEAD, is not executed from a
+   certain iteration of LOOP onwards, check whether an SSA name (NAME) remains
+   unchanged in the next iteration.  We call this characteristic
+   semi-invariantness.  SKIP_HEAD might be NULL; if so, nothing is excluded and
+   all basic blocks and control flows in the loop will be considered.  If
+   non-NULL, SSA name to check is supposed to be defined before SKIP_HEAD.  */
+
+static bool
+ssa_semi_invariant_p (struct loop *loop, const tree name,
+		      const_basic_block skip_head)
+{
+  gimple *def = SSA_NAME_DEF_STMT (name);
+  const_basic_block def_bb = gimple_bb (def);
+
+  /* An SSA name defined outside a loop is definitely semi-invariant.  */
+  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
+    return true;
+
+  if (gimple_code (def) == GIMPLE_PHI)
+    {
+      /* For PHI node that is not in loop header, its source operands should
+	 be defined inside the loop, which are seen as loop variant.  */
+      if (def_bb != loop->header || !skip_head)
+	return false;
+
+      const_edge latch = loop_latch_edge (loop);
+      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
+
+      /* A PHI node in loop header contains two source operands, one is
+	 initial value, the other is the copy of last iteration through loop
+	 latch, we call it latch value.  From the PHI node to definition
+	 of latch value, if excluding branch trace from SKIP_HEAD, there
+	 is no definition of other version of same variable, SSA name defined
+	 by the PHI node is semi-invariant.
+
+                         loop entry
+                              |     .--- latch ---.
+                              |     |             |
+                              v     v             |
+                  x_1 = PHI <x_0,  x_3>           |
+                           |                      |
+                           v                      |
+              .------- if (cond) -------.         |
+              |                         |         |
+              |                     [ SKIP ]      |
+              |                         |         |
+              |                     x_2 = ...     |
+              |                         |         |
+              '---- T ---->.<---- F ----'         |
+                           |                      |
+                           v                      |
+                  x_3 = PHI <x_1, x_2>            |
+                           |                      |
+                           '----------------------'
+
+	Suppose in certain iteration, execution flow in above graph goes
+	through true branch, which means that one source value to define
+	x_3 in false branch (x_2) is skipped, x_3 only comes from x_1, and
+	x_1 in next iterations is defined by x_3, we know that x_1 will
+	never change if COND always chooses true branch from then on.  */
+
+      while (from != name)
+	{
+	  /* A new value comes from a CONSTANT.  */
+	  if (TREE_CODE (from) != SSA_NAME)
+	    return false;
+
+	  gimple *stmt = SSA_NAME_DEF_STMT (from);
+	  const_basic_block bb = gimple_bb (stmt);
+
+	  /* A new value comes from outside of loop.  */
+	  if (!bb || !flow_bb_inside_loop_p (loop, bb))
+	    return false;
+
+	  from = NULL_TREE;
+
+	  if (gimple_code (stmt) == GIMPLE_PHI)
+	    {
+	      gphi *phi = as_a <gphi *> (stmt);
+
+	      for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+		{
+		  const_edge e = gimple_phi_arg_edge (phi, i);
+
+		  /* Not consider redefinitions in excluded basic blocks.  */
+		  if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
+		    {
+		      /* There is more than one source operand that can
+			 provide value to the SSA name, it is variant.  */
+		      if (from)
+			return false;
+
+		      from = gimple_phi_arg_def (phi, i);
+		    }
+		}
+	    }
+	  else if (gimple_code (stmt) == GIMPLE_ASSIGN)
+	    {
+	      /* For simple value copy, check its rhs instead.  */
+	      if (gimple_assign_ssa_name_copy_p (stmt))
+		from = gimple_assign_rhs1 (stmt);
+	    }
+
+	  /* Any other kind of definition is deemed to introduce a new value
+	     to the SSA name.  */
+	  if (!from)
+	    return false;
+	}
+      return true;
+    }
+
+  /* Value originated from volatile memory load or return of normal (non-
+     const/pure) call should not be treated as constant in each iteration.  */
+  if (gimple_has_side_effects (def))
+    return false;
+
+  /* Check if any memory store may kill memory load at this place.  */
+  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
+    return false;
+
+  /* Check operands of definition statement of the SSA name.  */
+  return stmt_semi_invariant_p (loop, def, skip_head);
+}
+
+/* Check whether STMT is semi-invariant in LOOP, iff all its operands are
+   semi-invariant.  Trace composed of basic block SKIP_HEAD and basic blocks
+   dominated by it are excluded from the loop.  */
+
+static bool
+stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
+		       const_basic_block skip_head)
+{
+  ssa_op_iter iter;
+  tree use;
+
+  /* Although operand of a statement might be SSA name, CONSTANT or VARDECL,
+     here we only need to check SSA name operands.  This is because check on
+     VARDECL operands, which involve memory loads, must have been done
+     prior to invocation of this function in vuse_semi_invariant_p.  */
+  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
+    {
+      if (!ssa_semi_invariant_p (loop, use, skip_head))
+	return false;
+    }
+
+  return true;
+}
+
+/* Given that a conditional statement never transfers execution to one of its
+   branches, determine whether we can remove that branch's leading basic block
+   (BRANCH_BB) and the basic blocks dominated by BRANCH_BB.  */
+
+static bool
+branch_removable_p (basic_block branch_bb)
+{
+  if (single_pred_p (branch_bb))
+    return true;
+
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, branch_bb->preds)
+    {
+      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
+	continue;
+
+      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
+	continue;
+
+       /* The branch can be reached from opposite branch, or from some
+	  statement not dominated by the conditional statement.  */
+      return false;
+    }
+
+  return true;
+}
+
+/* Find out which branch of a conditional statement (COND) is invariant in the
+   execution context of LOOP.  That is: once the branch is selected in certain
+   iteration of the loop, any operand that contributes to computation of the
+   conditional statement remains unchanged in all following iterations.  */
+
+static int
+get_cond_invariant_branch (struct loop *loop, gcond *cond)
+{
+  basic_block cond_bb = gimple_bb (cond);
+  basic_block targ_bb[2];
+  bool invar[2];
+  unsigned invar_checks;
+
+  for (unsigned i = 0; i < 2; i++)
+    {
+      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
+
+      /* One branch directs to loop exit, no need to perform loop split upon
+	 this conditional statement.  Firstly, it is trivial if the exit branch
+	 is semi-invariant, for the statement is just to break loop.  Secondly,
+	 if the opposite branch is semi-invariant, it means that the statement
+	 is real loop-invariant, which is covered by loop unswitch.  */
+      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
+	return -1;
+    }
+
+  invar_checks = 0;
+
+  for (unsigned i = 0; i < 2; i++)
+    {
+      invar[!i] = false;
+
+      if (!branch_removable_p (targ_bb[i]))
+	continue;
+
+      /* Given a semi-invariant branch, if its opposite branch dominates
+	 loop latch, it and its following trace will only be executed in
+	 final iteration of loop, namely it is not part of repeated body
+	 of the loop.  Similar to the above case that the branch is loop
+	 exit, no need to split loop.  */
+      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
+	continue;
+
+      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
+      invar_checks++;
+    }
+
+  /* Both branches being invariant (handled by loop unswitch), or both
+     being variant, is not what we want.  */
+  if (invar[0] ^ !invar[1])
+    return -1;
+
+  /* Found a real loop-invariant condition, do nothing.  */
+  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
+    return -1;
+
+  return invar[1];
+}
+
+/* Given a conditional statement in LOOP, whose basic block is COND_BB,
+   suppose its execution only goes through one of its branches, whose index
+   is specified by BRANCH.  Return TRUE if this statement can still execute
+   multiple times in one iteration of LOOP, because the statement belongs to
+   a nested unrecognized loop.  */
+
+static bool
+is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb,
+			int branch)
+{
+  basic_block branch_bb = EDGE_SUCC (cond_bb, branch)->dest;
+
+  if (cond_bb == loop->header || branch_bb == loop->latch)
+    return false;
+
+  basic_block *bbs = ((split_info *) loop->aux)->bbs;
+  auto_vec<basic_block> worklist;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    bbs[i]->flags &= ~BB_REACHABLE;
+
+  /* Mark latch basic block as visited so as to terminate reachability
+     traversal.  */
+  loop->latch->flags |= BB_REACHABLE;
+
+  gcc_assert (flow_bb_inside_loop_p (loop, branch_bb));
+
+  /* Start from the specified branch, the opposite branch is ignored for it
+     will not be executed.  */
+  branch_bb->flags |= BB_REACHABLE;
+  worklist.safe_push (branch_bb);
+
+  do
+    {
+      basic_block bb = worklist.pop ();
+      edge e;
+      edge_iterator ei;
+
+      FOR_EACH_EDGE (e, ei, bb->succs)
+	{
+	  basic_block succ_bb = e->dest;
+
+	  if (succ_bb == cond_bb)
+	    return true;
+
+	  if (!flow_bb_inside_loop_p (loop, succ_bb))
+	    continue;
+
+	  if (succ_bb->flags & BB_REACHABLE)
+	    continue;
+
+	  succ_bb->flags |= BB_REACHABLE;
+	  worklist.safe_push (succ_bb);
+	}
+    } while (!worklist.is_empty ());
+
+  return false;
+}
+
+
+/* Calculate increased code size measured by estimated insn number if applying
+   loop split upon certain branch (BRANCH) of a conditional statement whose
+   basic block is COND_BB.  */
+
+static int
+compute_added_num_insns (struct loop *loop, const_basic_block cond_bb,
+			 int branch)
+{
+  const_basic_block targ_bb_var = EDGE_SUCC (cond_bb, !branch)->dest;
+  basic_block *bbs = ((split_info *) loop->aux)->bbs;
+  int num = 0;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    {
+      /* Do not count basic blocks that are only in the opposite branch.  */
+      if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var))
+	continue;
+
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi);
+	   gsi_next (&gsi))
+	num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);
+    }
+
+  return num;
+}
+
+/* Return true if it is eligible and profitable to perform loop split upon
+   a conditional statement COND in LOOP.  */
+
+static bool
+can_split_loop_on_cond (struct loop *loop, gcond *cond)
+{
+  int branch = get_cond_invariant_branch (loop, cond);
+
+  if (branch < 0)
+    return false;
+
+  basic_block cond_bb = gimple_bb (cond);
+  profile_probability prob = EDGE_SUCC (cond_bb, branch)->probability;
+
+  /* When accurate profile information is available, and execution
+     frequency of the branch is too low, just let it go.  */
+  if (prob.reliable_p ())
+    {
+      int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB);
+
+      if (prob < profile_probability::always ().apply_scale (thres, 100))
+	return false;
+    }
+
+  /* Add a threshold for increased code size to disable loop split.  */
+  if (compute_added_num_insns (loop, cond_bb, branch) >
+      PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))
+    return false;
+
+  /* Skip conditional statement that is inside a nested unrecognized loop.  */
+  if (is_cond_in_hidden_loop (loop, cond_bb, branch))
+    return false;
+
+  /* Temporarily keep branch index in conditional statement.  */
+  gimple_set_plf (cond, GF_PLF_1, branch);
+  return true;
+}
+
+/* Traverse all conditional statements in LOOP, to find out a good candidate
+   upon which we can do loop split.  */
+
+static bool
+mark_cond_to_split_loop (struct loop *loop)
+{
+  split_info *info = new split_info ();
+  basic_block *bbs = info->bbs = get_loop_body (loop);
+
+  /* Allocate an area to keep temporary info, and associate its address
+     with loop aux field.  */
+  loop->aux = info;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    {
+      basic_block bb = bbs[i];
+
+      /* We only consider conditional statements that are executed at most
+	 once in each iteration of the loop.  So skip statements in inner
+	 loops.  */
+      if (bb->loop_father != loop)
+	continue;
+
+      /* This check is not strictly required.  With it, we can ensure that the
+	 conditional statement is always executed in each iteration.  */
+      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+	continue;
+
+      gimple *last = last_stmt (bb);
+
+      if (!last || gimple_code (last) != GIMPLE_COND)
+	continue;
+
+      gcond *cond = as_a <gcond *> (last);
+
+      if (can_split_loop_on_cond (loop, cond))
+	{
+	  info->cond = cond;
+	  return true;
+	}
+    }
+
+  delete info;
+  loop->aux = NULL;
+
+  return false;
+}
+
+/* Given a loop (LOOP1) with a chosen conditional statement candidate, perform
+   loop split transformation illustrated as the following graph.
+
+               .-------T------ if (true) ------F------.
+               |                    .---------------. |
+               |                    |               | |
+               v                    |               v v
+          pre-header                |            pre-header
+               | .------------.     |                 | .------------.
+               | |            |     |                 | |            |
+               | v            |     |                 | v            |
+             header           |     |               header           |
+               |              |     |                 |              |
+       [ bool r = cond; ]     |     |                 |              |
+               |              |     |                 |              |
+      .---- if (r) -----.     |     |        .--- if (true) ---.     |
+      |                 |     |     |        |                 |     |
+  invariant             |     |     |    invariant             |     |
+      |                 |     |     |        |                 |     |
+      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
+               |              |    /                  |              |
+             stmts            |   /                 stmts            |
+               |              |  /                    |              |
+              / \             | /                    / \             |
+     .-------*   *       [ if (!r) ]        .-------*   *            |
+     |           |            |             |           |            |
+     |         latch          |             |         latch          |
+     |           |            |             |           |            |
+     |           '------------'             |           '------------'
+     '------------------------. .-----------'
+             loop1            | |                   loop2
+                              v v
+                             exits
+
+   In the graph, loop1 represents the part derived from original one, and
+   loop2 is duplicated using loop_version (), which corresponds to the part
+   of the original one being split out.  In loop1, a new bool temporary (r)
+   is introduced to keep value of the condition result.  In original latch
+   edge of loop1, we insert a new conditional statement whose value comes
+   from previous temporary (r), one of its branch goes back to loop1 header
+   as a latch edge, and the other branch goes to loop2 pre-header as an entry
+   edge.  And also in loop2, we abandon the variant branch of the conditional
+   statement candidate by setting a constant bool condition, based on which
+   branch is semi-invariant.  */
+
+static bool
+split_loop_for_cond (struct loop *loop1)
+{
+  split_info *info = (split_info *) loop1->aux;
+  gcond *cond = info->cond;
+  basic_block cond_bb = gimple_bb (cond);
+  int branch = gimple_plf (cond, GF_PLF_1);
+  bool true_invar = !!(EDGE_SUCC (cond_bb, branch)->flags & EDGE_TRUE_VALUE);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
+	      current_function_name (), loop1->num,
+	      true_invar ? "T" : "F", cond_bb->index);
+     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
+   }
+
+  initialize_original_copy_tables ();
+
+  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
+				     profile_probability::always (),
+				     profile_probability::never (),
+				     profile_probability::always (),
+				     profile_probability::always (),
+				     true);
+  if (!loop2)
+    {
+      free_original_copy_tables ();
+      return false;
+    }
+
+  /* Generate a bool type temporary to hold result of the condition.  */
+  tree tmp = make_ssa_name (boolean_type_node);
+  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
+  gimple *stmt = gimple_build_assign (tmp,
+				      gimple_cond_code (cond),
+				      gimple_cond_lhs (cond),
+				      gimple_cond_rhs (cond));
+
+  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
+  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
+  update_stmt (cond);
+
+  basic_block cond_bb_copy = get_bb_copy (cond_bb);
+  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
+
+  /* Replace the condition in loop2 with a bool constant to let PassManager
+     remove the variant branch after current pass completes.  */
+  if (true_invar)
+    gimple_cond_make_true (cond_copy);
+  else
+    gimple_cond_make_false (cond_copy);
+
+  update_stmt (cond_copy);
+
+  /* Insert a new conditional statement on latch edge of loop1.  This
+     statement acts as a switch to transfer execution from loop1 to loop2,
+     when loop1 enters into invariant state.  */
+  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
+  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
+  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
+					  NULL_TREE, NULL_TREE);
+
+  gsi = gsi_last_bb (break_bb);
+  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
+
+  edge to_loop1 = single_succ_edge (break_bb);
+  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
+
+  to_loop1->flags &= ~EDGE_FALLTHRU;
+  to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
+  to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
+
+  update_ssa (TODO_update_ssa);
+
+  /* Due to introduction of a control flow edge from loop1 latch to loop2
+     pre-header, we should update PHIs in loop2 to reflect this connection
+     between loop1 and loop2.  */
+  connect_loop_phis (loop1, loop2, to_loop2);
+
+  free_original_copy_tables ();
+
+  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
+
+  return true;
+}
+
+/* Main entry point to perform loop splitting for suitable if-conditions
+   in all loops.  */
+
+static unsigned int
+tree_ssa_split_loops_for_cond (void)
+{
+  struct loop *loop;
+  auto_vec<struct loop *> loop_list;
+  bool changed = false;
+  unsigned i;
+
+  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
+    loop->aux = NULL;
+
+  /* Go through all loops starting from innermost.  */
+  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
+    {
+      /* Put the loop in a list if a conditional statement candidate is found
+	 in it.  This is the analysis stage, which does not change the
+	 function.  */
+      if (!loop->aux
+	  && !optimize_loop_for_size_p (loop)
+	  && mark_cond_to_split_loop (loop))
+	loop_list.safe_push (loop);
+
+      /* If any of our inner loops was split, don't split us,
+	 and mark our containing loop as having had splits as well.  */
+      loop_outer (loop)->aux = loop->aux;
+    }
+
+  FOR_EACH_VEC_ELT (loop_list, i, loop)
+    {
+      /* Extract the selected loop and perform the loop split.  This is the
+	 transformation stage.  */
+      changed |= split_loop_for_cond (loop);
+
+      delete (split_info *) loop->aux;
+    }
+
+  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
+    loop->aux = NULL;
+
+  if (changed)
+    return TODO_cleanup_cfg;
+  return 0;
+}
+
+
 /* Loop splitting pass.  */
 
 namespace {
@@ -716,3 +1496,48 @@ make_pass_loop_split (gcc::context *ctxt)
 {
   return new pass_loop_split (ctxt);
 }
+
+namespace {
+
+const pass_data pass_data_cond_loop_split =
+{
+  GIMPLE_PASS, /* type */
+  "cond_lsplit", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_COND_LOOP_SPLIT, /* tv_id */
+  PROP_cfg, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_cond_loop_split : public gimple_opt_pass
+{
+public:
+  pass_cond_loop_split (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_cond_loop_split, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return flag_split_loops != 0; }
+  virtual unsigned int execute (function *);
+
+}; // class pass_cond_loop_split
+
+unsigned int
+pass_cond_loop_split::execute (function *fun)
+{
+  if (number_of_loops (fun) <= 1)
+    return 0;
+
+  return tree_ssa_split_loops_for_cond ();
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_cond_loop_split (gcc::context *ctxt)
+{
+  return new pass_cond_loop_split (ctxt);
+}
-- 
2.17.1


* Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-06-18  7:00       ` Ping: [PATCH V2] " Feng Xue OS
@ 2019-07-15  2:34         ` Feng Xue OS
  2019-07-29 20:30           ` Michael Matz
  2019-09-12 11:10           ` Ping agian: " Richard Biener
  0 siblings, 2 replies; 31+ messages in thread
From: Feng Xue OS @ 2019-07-15  2:34 UTC (permalink / raw)
  To: Richard Biener, Michael Matz; +Cc: gcc-patches

Some time has passed, so I am pinging again. I made this patch because it gives a 7%
performance benefit in some real application. For convenience, the optimization to be
implemented is described again below. I hope to get your comments on the patch, or
design suggestions. Thanks!


Suppose a loop as:

    void f (std::map<int, int> m)
    {
        for (auto it = m.begin (); it != m.end (); ++it) {
            /* if (b) is semi-invariant. */
            if (b) {
                b = do_something();    /* Has effect on b */
            } else {
                                                        /* No effect on b */
            }
            statements;                      /* Also no effect on b */
        }
    }

A transformation, kind of loop split, could be:

    void f (std::map<int, int> m)
    {
        for (auto it = m.begin (); it != m.end (); ++it) {
            if (b) {
                b = do_something();
            } else {
                ++it;
                statements;
                break;
            }
            statements;
        }

        for (; it != m.end (); ++it) {
            statements;
        }
    }

If "statements" contains nothing, the second loop becomes an empty one, which can be removed.
And if "statements" are straight line instructions, we get an opportunity to vectorize the second loop.
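
The equivalence of the two forms above can be checked with a small standalone C sketch. It is not part of the patch: update_flag (), original_loop () and split_loop () are hypothetical names, with update_flag () standing in for do_something () and a running sum standing in for "statements". The calls counter only records how often the semi-invariant branch runs.

```c
#include <stddef.h>

/* Hypothetical stand-in for do_something (): the flag stays set only
   while the elements seen so far are non-negative.  */
static int update_flag(int x)
{
    return x >= 0;
}

/* Original form: "if (b)" is semi-invariant, because b can only change
   inside the true branch.  Once b becomes 0 it stays 0.  */
long original_loop(const int *a, size_t n, int b, int *calls)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (b) {
            b = update_flag(a[i]);   /* Has effect on b */
            (*calls)++;
        }
        sum += a[i];                 /* "statements": no effect on b */
    }
    return sum;
}

/* Split form: the first loop runs until b is found to be 0, finishes that
   iteration, and hands over to a second loop with no test on b.  */
long split_loop(const int *a, size_t n, int b, int *calls)
{
    long sum = 0;
    size_t i = 0;

    for (; i < n; i++) {
        if (b) {
            b = update_flag(a[i]);
            (*calls)++;
        } else {
            sum += a[i];             /* finish the current iteration */
            i++;
            break;
        }
        sum += a[i];
    }

    for (; i < n; i++)               /* condition-free remainder */
        sum += a[i];

    return sum;
}
```

For the input {1, 2, -3, 4, 5} with b initially 1, both versions return 9 and call update_flag () three times. The second loop in split_loop () carries no conditional on b, which is what makes it a candidate for vectorization.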


Feng

________________________________
From: Feng Xue OS
Sent: Tuesday, June 18, 2019 3:00 PM
To: Richard Biener; Michael Matz
Cc: gcc-patches@gcc.gnu.org
Subject: Ping: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)

Richard & Michael,

   I made some adjustments on coding style and added test cases for this version.

   Would you please take a look at the patch? It is a little long and might take some
   of your time.

Thanks a lot.

----
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9a46f93d89d..2334b184945 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,23 @@
+2019-06-18  Feng Xue <fxue@os.amperecomputing.com>
+
+       PR tree-optimization/89134
+       * doc/invoke.texi (max-cond-loop-split-insns): Document new --params.
+       (min-cond-loop-split-prob): Likewise.
+       * params.def: Add max-cond-loop-split-insns, min-cond-loop-split-prob.
+       * passes.def (pass_cond_loop_split) : New pass.
+       * timevar.def (TV_COND_LOOP_SPLIT): New time variable.
+       * tree-pass.h (make_pass_cond_loop_split): New declaration.
+       * tree-ssa-loop-split.c (split_info): New class.
+       (find_vdef_in_loop, vuse_semi_invariant_p): New functions.
+       (ssa_semi_invariant_p, stmt_semi_invariant_p): Likewise.
+       (branch_removable_p, get_cond_invariant_branch): Likewise.
+       (is_cond_in_hidden_loop, compute_added_num_insns): Likewise.
+       (can_split_loop_on_cond, mark_cond_to_split_loop): Likewise.
+       (split_loop_for_cond, tree_ssa_split_loops_for_cond): Likewise.
+       (pass_data_cond_loop_split): New variable.
+       (pass_cond_loop_split): New class.
+       (make_pass_cond_loop_split): New function.
+
 2019-06-18  Kewen Lin  <linkw@gcc.gnu.org>

         PR middle-end/80791
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index eaef4cd63d2..0427fede3d6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11352,6 +11352,14 @@ The maximum number of branches unswitched in a single loop.
 @item lim-expensive
 The minimum cost of an expensive expression in the loop invariant motion.

+@item max-cond-loop-split-insns
+The maximum number of insns to be increased due to loop split on
+semi-invariant condition statement.
+
+@item min-cond-loop-split-prob
+The minimum threshold for probability of semi-invariant condition
+statement to trigger loop split.
+
 @item iv-consider-all-candidates-bound
 Bound on number of candidates for induction variables, below which
 all candidates are considered for each use in induction variable
diff --git a/gcc/params.def b/gcc/params.def
index 0db60951413..5384f7d1c4d 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
         "The maximum number of unswitchings in a single loop.",
         3, 0, 0)

+/* The maximum number of increased insns due to loop split on semi-invariant
+   condition statement.  */
+DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
+       "max-cond-loop-split-insns",
+       "The maximum number of insns to be increased due to loop split on "
+       "semi-invariant condition statement.",
+       100, 0, 0)
+
+DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
+       "min-cond-loop-split-prob",
+       "The minimum threshold for probability of semi-invariant condition "
+       "statement to trigger loop split.",
+       30, 0, 100)
+
 /* The maximum number of insns in loop header duplicated by the copy loop
    headers pass.  */
 DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
diff --git a/gcc/passes.def b/gcc/passes.def
index ad2efabd385..bb32b88738e 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -261,6 +261,7 @@ along with GCC; see the file COPYING3.  If not see
           NEXT_PASS (pass_tree_unswitch);
           NEXT_PASS (pass_scev_cprop);
           NEXT_PASS (pass_loop_split);
+         NEXT_PASS (pass_cond_loop_split);
           NEXT_PASS (pass_loop_versioning);
           NEXT_PASS (pass_loop_jam);
           /* All unswitching, final value replacement and splitting can expose
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 27a522e0140..9aa069e5c29 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2019-06-18  Feng Xue  <fxue@os.amperecomputing.com>
+
+       * gcc.dg/tree-ssa/loop-cond-split-1.c: New test.
+       * g++.dg/tree-ssa/loop-cond-split-1.C: New test.
+
 2019-06-17  Jakub Jelinek  <jakub@redhat.com>

         * gcc.dg/vect/vect-simd-8.c: New test.
diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
new file mode 100644
index 00000000000..df269c5ee44
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-cond_lsplit-details" } */
+
+#include <string>
+#include <map>
+
+using namespace std;
+
+class  A
+{
+public:
+  bool empty;
+  void set (string s);
+};
+
+class  B
+{
+  map<int, string> m;
+  void f ();
+};
+
+extern A *ga;
+
+void B::f ()
+{
+  for (map<int, string>::iterator iter = m.begin (); iter != m.end (); ++iter)
+    {
+      if (ga->empty)
+        ga->set (iter->second);
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "cond_lsplit" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
new file mode 100644
index 00000000000..a0eb7a26ad5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-cond_lsplit-details" } */
+
+__attribute__((pure)) __attribute__((noinline)) int inc (int i)
+{
+  return i + 1;
+}
+
+extern int do_something (void);
+extern int b;
+
+void test(int n)
+{
+  int i;
+
+  for (i = 0; i < n; i = inc (i))
+    {
+      if (b)
+        b = do_something();
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "cond_lsplit" } } */
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 13cb470b688..5a2a80a29f7 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -188,6 +188,7 @@ DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
 DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
 DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
 DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
+DEFTIMEVAR (TV_COND_LOOP_SPLIT       , "loop splitting for conditions")
 DEFTIMEVAR (TV_LOOP_JAM              , "unroll and jam")
 DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
 DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 3a0b3805d24..cdb7ef3c9f2 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -367,6 +367,7 @@ extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_linterchange (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_cond_loop_split (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_loop_jam (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index 999c9a30366..7239d0cfb00 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -32,7 +32,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop.h"
 #include "tree-ssa-loop-manip.h"
 #include "tree-into-ssa.h"
+#include "tree-inline.h"
 #include "cfgloop.h"
+#include "params.h"
 #include "tree-scalar-evolution.h"
 #include "gimple-iterator.h"
 #include "gimple-pretty-print.h"
@@ -40,7 +42,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "gimplify-me.h"

-/* This file implements loop splitting, i.e. transformation of loops like
+/* This file implements two kinds of loop splitting.
+
+   One transformation of loops like:

    for (i = 0; i < 100; i++)
      {
@@ -670,6 +674,782 @@ tree_ssa_split_loops (void)
   return 0;
 }

+
+/* Another transformation of loops like:
+
+   for (i = INIT (); CHECK (i); i = NEXT ())
+     {
+       if (expr (a_1, a_2, ..., a_n))
+         a_j = ...;  // change at least one a_j
+       else
+         S;          // not change any a_j
+     }
+
+   into:
+
+   for (i = INIT (); CHECK (i); i = NEXT ())
+     {
+       if (expr (a_1, a_2, ..., a_n))
+         a_j = ...;
+       else
+         {
+           S;
+           i = NEXT ();
+           break;
+         }
+     }
+
+   for (; CHECK (i); i = NEXT ())
+     {
+       S;
+     }
+
+   */
+
+/* Data structure to hold temporary information during loop split upon
+   semi-invariant conditional statement.  */
+class split_info {
+public:
+  /* Array of all basic blocks in a loop, returned by get_loop_body().  */
+  basic_block *bbs;
+
+  /* All memory store/clobber statements in a loop.  */
+  auto_vec<gimple *> stores;
+
+  /* Whether above memory stores vector has been filled.  */
+  bool set_stores;
+
+  /* Semi-invariant conditional statement, upon which to split loop.  */
+  gcond *cond;
+
+  split_info () : bbs (NULL),  set_stores (false), cond (NULL) { }
+
+  ~split_info ()
+    {
+      if (bbs)
+       free (bbs);
+    }
+};
+
+/* Find all statements with memory-write effect in LOOP, including memory
+   store and non-pure function call, and keep those in a vector.  This work
+   is only done one time, for the vector should be constant during analysis
+   stage of semi-invariant condition.  */
+
+static void
+find_vdef_in_loop (struct loop *loop)
+{
+  split_info *info = (split_info *) loop->aux;
+  gphi *vphi = get_virtual_phi (loop->header);
+
+  /* Indicate memory store vector has been filled.  */
+  info->set_stores = true;
+
+  /* If loop contains memory operation, there must be a virtual PHI node in
+     loop header basic block.  */
+  if (vphi == NULL)
+    return;
+
+  /* All virtual SSA names inside the loop are connected to be a cyclic
+     graph via virtual PHI nodes.  The virtual PHI node in loop header just
+     links the first and the last virtual SSA names, by using the last as
+     PHI operand to define the first.  */
+  const edge latch = loop_latch_edge (loop);
+  const tree first = gimple_phi_result (vphi);
+  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
+
+  /* The virtual SSA cyclic graph might consist of only one SSA name, which
+     is defined by itself.
+
+       .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
+
+     This means the loop contains only memory loads, so we can skip it.  */
+  if (first == last)
+    return;
+
+  auto_vec<gimple *> others;
+  auto_vec<tree> worklist;
+  auto_bitmap visited;
+
+  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
+  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
+  worklist.safe_push (last);
+
+  do
+    {
+      tree vuse = worklist.pop ();
+      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
+
+      /* We mark the first and last SSA names as visited at the beginning,
+        and reversely start the process from the last SSA name towards the
+        first, which ensures that this do-while will not touch SSA names
+        defined outside of the loop.  */
+      gcc_assert (gimple_bb (stmt)
+                 && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
+
+      if (gimple_code (stmt) == GIMPLE_PHI)
+       {
+         gphi *phi = as_a <gphi *> (stmt);
+
+         for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+           {
+             tree arg = gimple_phi_arg_def (stmt, i);
+
+             if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
+               worklist.safe_push (arg);
+           }
+       }
+      else
+       {
+         tree prev = gimple_vuse (stmt);
+
+         /* Non-pure call statement is conservatively assumed to impact all
+            memory locations.  So place call statements ahead of other memory
+            stores in the vector with the idea of using them as shortcut
+            terminators to memory alias analysis.  */
+         if (gimple_code (stmt) == GIMPLE_CALL)
+           info->stores.safe_push (stmt);
+         else
+           others.safe_push (stmt);
+
+         if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
+           worklist.safe_push (prev);
+       }
+    } while (!worklist.is_empty ());
+
+    info->stores.safe_splice (others);
+}
+
+
+/* Given STMT, memory load or pure call statement, check whether it is impacted
+   by some memory store in LOOP, excluding trace starting from SKIP_HEAD (the
+   trace is composed of SKIP_HEAD and those basic block dominated by it, always
+   corresponds to one branch of a conditional statement).  If SKIP_HEAD is
+   NULL, all basic blocks of LOOP are checked.  */
+
+static bool
+vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
+                      const_basic_block skip_head)
+{
+  split_info *info = (split_info *) loop->aux;
+
+  /* Collect memory store/clobber statements if that has not been done yet.  */
+  if (!info->set_stores)
+    find_vdef_in_loop (loop);
+
+  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
+  ao_ref ref;
+  gimple *store;
+  unsigned i;
+
+  ao_ref_init (&ref, rhs);
+
+  FOR_EACH_VEC_ELT (info->stores, i, store)
+    {
+      /* Skip basic blocks dominated by SKIP_HEAD, if non-NULL.  */
+      if (skip_head
+         && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
+       continue;
+
+      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
+       return false;
+    }
+
+  return true;
+}
+
+/* Forward declaration.  */
+
+static bool
+stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
+                      const_basic_block skip_head);
+
+/* Suppose one condition branch, led by SKIP_HEAD, is no longer executed
+   after a certain iteration of LOOP; check whether an SSA name (NAME)
+   remains unchanged in the following iterations.  We call this property
+   semi-invariantness.  SKIP_HEAD may be NULL; if so, nothing is excluded
+   and all basic blocks and control flow in the loop are considered.  If it
+   is non-NULL, the SSA name to check must be defined before SKIP_HEAD.  */
+
+static bool
+ssa_semi_invariant_p (struct loop *loop, const tree name,
+                     const_basic_block skip_head)
+{
+  gimple *def = SSA_NAME_DEF_STMT (name);
+  const_basic_block def_bb = gimple_bb (def);
+
+  /* An SSA name defined outside a loop is definitely semi-invariant.  */
+  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
+    return true;
+
+  if (gimple_code (def) == GIMPLE_PHI)
+    {
+      /* For PHI node that is not in loop header, its source operands should
+        be defined inside the loop, which are seen as loop variant.  */
+      if (def_bb != loop->header || !skip_head)
+       return false;
+
+      const_edge latch = loop_latch_edge (loop);
+      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
+
+      /* A PHI node in loop header contains two source operands, one is
+        initial value, the other is the copy of last iteration through loop
+        latch, we call it latch value.  From the PHI node to definition
+        of latch value, if excluding branch trace from SKIP_HEAD, there
+        is no definition of other version of same variable, SSA name defined
+        by the PHI node is semi-invariant.
+
+                         loop entry
+                              |     .--- latch ---.
+                              |     |             |
+                              v     v             |
+                  x_1 = PHI <x_0,  x_3>           |
+                           |                      |
+                           v                      |
+              .------- if (cond) -------.         |
+              |                         |         |
+              |                     [ SKIP ]      |
+              |                         |         |
+              |                     x_2 = ...     |
+              |                         |         |
+              '---- T ---->.<---- F ----'         |
+                           |                      |
+                           v                      |
+                  x_3 = PHI <x_1, x_2>            |
+                           |                      |
+                           '----------------------'
+
+       Suppose that in a certain iteration, execution flow in the above
+       graph goes through the true branch, which means that the source value
+       defining x_3 in the false branch (x_2) is skipped.  Then x_3 only
+       comes from x_1, and x_1 in the next iteration is defined by x_3, so
+       x_1 never changes if COND always chooses the true branch from then
+       on.  */
+
+      while (from != name)
+       {
+         /* A new value comes from a CONSTANT.  */
+         if (TREE_CODE (from) != SSA_NAME)
+           return false;
+
+         gimple *stmt = SSA_NAME_DEF_STMT (from);
+         const_basic_block bb = gimple_bb (stmt);
+
+         /* A new value comes from outside of loop.  */
+         if (!bb || !flow_bb_inside_loop_p (loop, bb))
+           return false;
+
+         from = NULL_TREE;
+
+         if (gimple_code (stmt) == GIMPLE_PHI)
+           {
+             gphi *phi = as_a <gphi *> (stmt);
+
+             for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+               {
+                 const_edge e = gimple_phi_arg_edge (phi, i);
+
+                 /* Ignore redefinitions in excluded basic blocks.  */
+                 if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
+                   {
+                     /* More than one source operand can provide a value
+                        to the SSA name, so it is variant.  */
+                     if (from)
+                       return false;
+
+                     from = gimple_phi_arg_def (phi, i);
+                   }
+               }
+           }
+         else if (gimple_code (stmt) == GIMPLE_ASSIGN)
+           {
+             /* For simple value copy, check its rhs instead.  */
+             if (gimple_assign_ssa_name_copy_p (stmt))
+               from = gimple_assign_rhs1 (stmt);
+           }
+
+         /* Any other kind of definition is deemed to introduce a new value
+            to the SSA name.  */
+         if (!from)
+           return false;
+       }
+       return true;
+    }
+
+  /* Value originated from volatile memory load or return of normal (non-
+     const/pure) call should not be treated as constant in each iteration.  */
+  if (gimple_has_side_effects (def))
+    return false;
+
+  /* Check if any memory store may kill memory load at this place.  */
+  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
+    return false;
+
+  /* Check operands of definition statement of the SSA name.  */
+  return stmt_semi_invariant_p (loop, def, skip_head);
+}
+
+/* Check whether STMT is semi-invariant in LOOP, iff all its operands are
+   semi-invariant.  Trace composed of basic block SKIP_HEAD and basic blocks
+   dominated by it are excluded from the loop.  */
+
+static bool
+stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
+                      const_basic_block skip_head)
+{
+  ssa_op_iter iter;
+  tree use;
+
+  /* Although operand of a statement might be SSA name, CONSTANT or VARDECL,
+     here we only need to check SSA name operands.  This is because check on
+     VARDECL operands, which involve memory loads, must have been done
+     prior to invocation of this function in vuse_semi_invariant_p.  */
+  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
+    {
+      if (!ssa_semi_invariant_p (loop, use, skip_head))
+       return false;
+    }
+
+  return true;
+}
+
+/* Determine when conditional statement never transfers execution to one of its
+   branch, whether we can remove the branch's leading basic block (BRANCH_BB)
+   and those basic blocks dominated by BRANCH_BB.  */
+
+static bool
+branch_removable_p (basic_block branch_bb)
+{
+  if (single_pred_p (branch_bb))
+    return true;
+
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, branch_bb->preds)
+    {
+      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
+       continue;
+
+      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
+       continue;
+
+       /* The branch can be reached from opposite branch, or from some
+         statement not dominated by the conditional statement.  */
+      return false;
+    }
+
+  return true;
+}
+
+/* Find out which branch of a conditional statement (COND) is invariant in the
+   execution context of LOOP.  That is: once the branch is selected in certain
+   iteration of the loop, any operand that contributes to computation of the
+   conditional statement remains unchanged in all following iterations.  */
+
+static int
+get_cond_invariant_branch (struct loop *loop, gcond *cond)
+{
+  basic_block cond_bb = gimple_bb (cond);
+  basic_block targ_bb[2];
+  bool invar[2];
+  unsigned invar_checks;
+
+  for (unsigned i = 0; i < 2; i++)
+    {
+      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
+
+      /* One branch directs to loop exit, no need to perform loop split upon
+        this conditional statement.  Firstly, it is trivial if the exit branch
+        is semi-invariant, for the statement is just to break loop.  Secondly,
+        if the opposite branch is semi-invariant, it means that the statement
+        is real loop-invariant, which is covered by loop unswitch.  */
+      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
+       return -1;
+    }
+
+  invar_checks = 0;
+
+  for (unsigned i = 0; i < 2; i++)
+    {
+      invar[!i] = false;
+
+      if (!branch_removable_p (targ_bb[i]))
+       continue;
+
+      /* Given a semi-invariant branch, if its opposite branch dominates
+        loop latch, it and its following trace will only be executed in
+        final iteration of loop, namely it is not part of repeated body
+        of the loop.  Similar to the above case that the branch is loop
+        exit, no need to split loop.  */
+      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
+       continue;
+
+      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
+      invar_checks++;
+    }
+
+  /* With both branches being invariant (handled by loop unswitch) or
+     variant is not what we want.  */
+  if (invar[0] ^ !invar[1])
+    return -1;
+
+  /* Found a real loop-invariant condition, do nothing.  */
+  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
+    return -1;
+
+  return invar[1];
+}
+
+/* Given a conditional statement in LOOP, whose basic block is COND_BB,
+   suppose its execution only goes through one of its branch, whose index is
+   specified by BRANCH.  Return TRUE if this statement still executes multiple
+   times in one iteration of LOOP, in that the statement belongs a nested
+   unrecognized loop.  */
+
+static bool
+is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb,
+                       int branch)
+{
+  basic_block branch_bb = EDGE_SUCC (cond_bb, branch)->dest;
+
+  if (cond_bb == loop->header || branch_bb == loop->latch)
+    return false;
+
+  basic_block *bbs = ((split_info *) loop->aux)->bbs;
+  auto_vec<basic_block> worklist;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    bbs[i]->flags &= ~BB_REACHABLE;
+
+  /* Mark the latch basic block as visited so as to terminate the
+     reachability traversal.  */
+  loop->latch->flags |= BB_REACHABLE;
+
+  gcc_assert (flow_bb_inside_loop_p (loop, branch_bb));
+
+  /* Start from the specified branch, the opposite branch is ignored for it
+     will not be executed.  */
+  branch_bb->flags |= BB_REACHABLE;
+  worklist.safe_push (branch_bb);
+
+  do
+    {
+      basic_block bb = worklist.pop ();
+      edge e;
+      edge_iterator ei;
+
+      FOR_EACH_EDGE (e, ei, bb->succs)
+       {
+         basic_block succ_bb = e->dest;
+
+         if (succ_bb == cond_bb)
+           return true;
+
+         if (!flow_bb_inside_loop_p (loop, succ_bb))
+           continue;
+
+         if (succ_bb->flags & BB_REACHABLE)
+           continue;
+
+         succ_bb->flags |= BB_REACHABLE;
+         worklist.safe_push (succ_bb);
+       }
+    } while (!worklist.is_empty ());
+
+  return false;
+}
+
+
+/* Calculate increased code size measured by estimated insn number if applying
+   loop split upon certain branch (BRANCH) of a conditional statement whose
+   basic block is COND_BB.  */
+
+static int
+compute_added_num_insns (struct loop *loop, const_basic_block cond_bb,
+                        int branch)
+{
+  const_basic_block targ_bb_var = EDGE_SUCC (cond_bb, !branch)->dest;
+  basic_block *bbs = ((split_info *) loop->aux)->bbs;
+  int num = 0;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    {
+      /* Do no count basic blocks only in opposite branch.  */
+      if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var))
+       continue;
+
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi);
+          gsi_next (&gsi))
+       num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);
+    }
+
+  return num;
+}
+
+/* Return true if it is eligible and profitable to perform loop split upon
+   a conditional statement COND in LOOP.  */
+
+static bool
+can_split_loop_on_cond (struct loop *loop, gcond *cond)
+{
+  int branch = get_cond_invariant_branch (loop, cond);
+
+  if (branch < 0)
+    return false;
+
+  basic_block cond_bb = gimple_bb (cond);
+  profile_probability prob = EDGE_SUCC (cond_bb, branch)->probability;
+
+  /* When accurate profile information is available, and execution
+     frequency of the branch is too low, just let it go.  */
+  if (prob.reliable_p ())
+    {
+      int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB);
+
+      if (prob < profile_probability::always ().apply_scale (thres, 100))
+       return false;
+    }
+
+  /* Add a threshold for increased code size to disable loop split.  */
+  if (compute_added_num_insns (loop, cond_bb, branch) >
+      PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))
+    return false;
+
+  /* Skip conditional statement that is inside a nested unrecognized loop.  */
+  if (is_cond_in_hidden_loop (loop, cond_bb, branch))
+    return false;
+
+  /* Temporarily keep branch index in conditional statement.  */
+  gimple_set_plf (cond, GF_PLF_1, branch);
+  return true;
+}
+
+/* Traverse all conditional statements in LOOP, to find out a good candidate
+   upon which we can do loop split.  */
+
+static bool
+mark_cond_to_split_loop (struct loop *loop)
+{
+  split_info *info = new split_info ();
+  basic_block *bbs = info->bbs = get_loop_body (loop);
+
+  /* Allocate an area to keep temporary info, and associate its address
+     with loop aux field.  */
+  loop->aux = info;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    {
+      basic_block bb = bbs[i];
+
+      /* Only consider conditional statements that are executed at most once
+        in each iteration of the loop.  So skip statements in inner loops.  */
+      if (bb->loop_father != loop)
+       continue;
+
+      /* This check is not strictly required.  With it, we can ensure that the
+        conditional statement is executed in every iteration.  */
+      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+       continue;
+
+      gimple *last = last_stmt (bb);
+
+      if (!last || gimple_code (last) != GIMPLE_COND)
+       continue;
+
+      gcond *cond = as_a <gcond *> (last);
+
+      if (can_split_loop_on_cond (loop, cond))
+       {
+         info->cond = cond;
+         return true;
+       }
+    }
+
+  delete info;
+  loop->aux = NULL;
+
+  return false;
+}
+
+/* Given a loop (LOOP1) with a chosen conditional statement candidate, perform
+   loop split transformation illustrated as the following graph.
+
+               .-------T------ if (true) ------F------.
+               |                    .---------------. |
+               |                    |               | |
+               v                    |               v v
+          pre-header                |            pre-header
+               | .------------.     |                 | .------------.
+               | |            |     |                 | |            |
+               | v            |     |                 | v            |
+             header           |     |               header           |
+               |              |     |                 |              |
+       [ bool r = cond; ]     |     |                 |              |
+               |              |     |                 |              |
+      .---- if (r) -----.     |     |        .--- if (true) ---.     |
+      |                 |     |     |        |                 |     |
+  invariant             |     |     |    invariant             |     |
+      |                 |     |     |        |                 |     |
+      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
+               |              |    /                  |              |
+             stmts            |   /                 stmts            |
+               |              |  /                    |              |
+              / \             | /                    / \             |
+     .-------*   *       [ if (!r) ]        .-------*   *            |
+     |           |            |             |           |            |
+     |         latch          |             |         latch          |
+     |           |            |             |           |            |
+     |           '------------'             |           '------------'
+     '------------------------. .-----------'
+             loop1            | |                   loop2
+                              v v
+                             exits
+
+   In the graph, loop1 represents the part derived from the original loop,
+   and loop2 is duplicated using loop_version (), which corresponds to the
+   part of the original loop being split out.  In loop1, a new bool
+   temporary (r) is introduced to keep the value of the condition result.
+   On the original latch edge of loop1, we insert a new conditional
+   statement whose value comes from that temporary (r); one of its branches
+   goes back to the loop1 header as a latch edge, and the other branch goes
+   to the loop2 pre-header as an entry edge.  In loop2, we also abandon the
+   variant branch of the conditional statement candidate by setting a
+   constant bool condition, based on which branch is semi-invariant.  */
+
+static bool
+split_loop_for_cond (struct loop *loop1)
+{
+  split_info *info = (split_info *) loop1->aux;
+  gcond *cond = info->cond;
+  basic_block cond_bb = gimple_bb (cond);
+  int branch = gimple_plf (cond, GF_PLF_1);
+  bool true_invar = !!(EDGE_SUCC (cond_bb, branch)->flags & EDGE_TRUE_VALUE);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
+             current_function_name (), loop1->num,
+             true_invar ? "T" : "F", cond_bb->index);
+     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
+   }
+
+  initialize_original_copy_tables ();
+
+  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
+                                    profile_probability::always (),
+                                    profile_probability::never (),
+                                    profile_probability::always (),
+                                    profile_probability::always (),
+                                    true);
+  if (!loop2)
+    {
+      free_original_copy_tables ();
+      return false;
+    }
+
+  /* Generate a bool type temporary to hold result of the condition.  */
+  tree tmp = make_ssa_name (boolean_type_node);
+  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
+  gimple *stmt = gimple_build_assign (tmp,
+                                     gimple_cond_code (cond),
+                                     gimple_cond_lhs (cond),
+                                     gimple_cond_rhs (cond));
+
+  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
+  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
+  update_stmt (cond);
+
+  basic_block cond_bb_copy = get_bb_copy (cond_bb);
+  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
+
+  /* Replace the condition in loop2 with a bool constant to let PassManager
+     remove the variant branch after current pass completes.  */
+  if (true_invar)
+    gimple_cond_make_true (cond_copy);
+  else
+    gimple_cond_make_false (cond_copy);
+
+  update_stmt (cond_copy);
+
+  /* Insert a new conditional statement on latch edge of loop1.  This
+     statement acts as a switch to transfer execution from loop1 to loop2,
+     when loop1 enters into invariant state.  */
+  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
+  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
+  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
+                                         NULL_TREE, NULL_TREE);
+
+  gsi = gsi_last_bb (break_bb);
+  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
+
+  edge to_loop1 = single_succ_edge (break_bb);
+  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
+
+  to_loop1->flags &= ~EDGE_FALLTHRU;
+  to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
+  to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
+
+  update_ssa (TODO_update_ssa);
+
+  /* Due to introduction of a control flow edge from loop1 latch to loop2
+     pre-header, we should update PHIs in loop2 to reflect this connection
+     between loop1 and loop2.  */
+  connect_loop_phis (loop1, loop2, to_loop2);
+
+  free_original_copy_tables ();
+
+  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
+
+  return true;
+}
+
+/* Main entry point to perform loop splitting for suitable if-conditions
+   in all loops.  */
+
+static unsigned int
+tree_ssa_split_loops_for_cond (void)
+{
+  struct loop *loop;
+  auto_vec<struct loop *> loop_list;
+  bool changed = false;
+  unsigned i;
+
+  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
+    loop->aux = NULL;
+
+  /* Go through all loops starting from innermost.  */
+  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
+    {
+      /* Put the loop in a list if a conditional statement candidate is found
+        in it.  This analysis stage does not change the function.  */
+      if (!loop->aux
+         && !optimize_loop_for_size_p (loop)
+         && mark_cond_to_split_loop (loop))
+       loop_list.safe_push (loop);
+
+      /* If any of our inner loops was split, don't split us,
+        and mark our containing loop as having had splits as well.  */
+      loop_outer (loop)->aux = loop->aux;
+    }
+
+  FOR_EACH_VEC_ELT (loop_list, i, loop)
+    {
+      /* Extract selected loop and perform loop split.  This is stage for
+        transformation.  */
+      changed |= split_loop_for_cond (loop);
+
+      delete (split_info *) loop->aux;
+    }
+
+  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
+    loop->aux = NULL;
+
+  if (changed)
+    return TODO_cleanup_cfg;
+  return 0;
+}
+
+
 /* Loop splitting pass.  */

 namespace {
@@ -716,3 +1496,48 @@ make_pass_loop_split (gcc::context *ctxt)
 {
   return new pass_loop_split (ctxt);
 }
+
+namespace {
+
+const pass_data pass_data_cond_loop_split =
+{
+  GIMPLE_PASS, /* type */
+  "cond_lsplit", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_COND_LOOP_SPLIT, /* tv_id */
+  PROP_cfg, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_cond_loop_split : public gimple_opt_pass
+{
+public:
+  pass_cond_loop_split (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_cond_loop_split, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return flag_split_loops != 0; }
+  virtual unsigned int execute (function *);
+
+}; // class pass_cond_loop_split
+
+unsigned int
+pass_cond_loop_split::execute (function *fun)
+{
+  if (number_of_loops (fun) <= 1)
+    return 0;
+
+  return tree_ssa_split_loops_for_cond ();
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_cond_loop_split (gcc::context *ctxt)
+{
+  return new pass_cond_loop_split (ctxt);
+}
--
2.17.1


* Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-07-15  2:34         ` Ping agian: " Feng Xue OS
@ 2019-07-29 20:30           ` Michael Matz
  2019-07-31  7:25             ` Feng Xue OS
  2019-09-12 10:21             ` Feng Xue OS
  2019-09-12 11:10           ` Ping agian: " Richard Biener
  1 sibling, 2 replies; 31+ messages in thread
From: Michael Matz @ 2019-07-29 20:30 UTC (permalink / raw)
  To: Feng Xue OS; +Cc: Richard Biener, gcc-patches

Hello Feng,

first, sorry for the terrible delay in reviewing, but here is one now :)

Generally I do like the idea of the transformation, and the basic building 
blocks seem to be sound.  But I dislike it being a separate pass, so 
please integrate the code you have written into the existing loop split 
pass.  Most building blocks can be used as is, except the main driver.

The existing loop-split code uses loop->aux as binary marker and analyses 
and transforms loops in one go, you're doing it separately.  That 
separation makes sense for you, so the existing code should be changed to 
also do that separately.  Some info for the existing loop-split analysis 
needs to be stored in the info struct then as well, which can be done if 
you add some fields in yours.  Some splitting-out of code from the 
existing main driver is probably needed (basically the parts that 
determine eligibility and which cond statement to split).
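
A rough sketch of that analyse-first, transform-later structure, using
hypothetical stand-in types (toy_loop, split_result) rather than the real
ones from tree-ssa-loop-split.c.  It only illustrates the shape of the
driver: phase 1 records a candidate per loop without touching anything,
phase 2 consumes the recorded info and releases it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins, not the types used by the patch.
struct split_result { int cond_id; };

struct toy_loop
{
  int candidate_cond;   // id of an eligible condition, or -1 if none
  split_result *aux;    // analysis result, filled in by analyze ()
  bool was_split;
};

// Phase 1: analysis only; remember which condition to split on.
static bool analyze (toy_loop &loop)
{
  if (loop.candidate_cond < 0)
    return false;
  loop.aux = new split_result ();
  loop.aux->cond_id = loop.candidate_cond;
  return true;
}

// Phase 2: transformation, driven purely by the stored analysis result.
static bool transform (toy_loop &loop)
{
  loop.was_split = loop.aux != nullptr;
  return loop.was_split;
}

// Driver: collect candidates first, then transform them and free the info.
static int split_loops (std::vector<toy_loop> &loops)
{
  std::vector<toy_loop *> selected;
  for (std::size_t i = 0; i < loops.size (); ++i)
    if (analyze (loops[i]))
      selected.push_back (&loops[i]);

  int changed = 0;
  for (std::size_t i = 0; i < selected.size (); ++i)
    {
      changed += transform (*selected[i]);
      delete selected[i]->aux;
      selected[i]->aux = nullptr;
    }
  return changed;
}
```

With both kinds of splitting sharing one info struct, the same driver
could presumably dispatch to either transformation in phase 2.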

The two routines that actually split the loops (those that call 
loop_version) also have an overlap, so maybe something more can be 
commonized between the two (ultimately the way of splitting in both 
variants is different, so somewhere they'll do something else, but still 
some parts are common).

So, with these general remarks, some more concrete ones about the patch:

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index eaef4cd63d2..0427fede3d6 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11352,6 +11352,14 @@ The maximum number of branches unswitched in a single loop.
>  @item lim-expensive
>  The minimum cost of an expensive expression in the loop invariant motion.
> 
> +@item max-cond-loop-split-insns
> +The maximum number of insns to be increased due to loop split on
> +semi-invariant condition statement.

"to be increased" --> "to be generated" (or "added")

> +@item min-cond-loop-split-prob
> +The minimum threshold for probability of semi-invaraint condition
> +statement to trigger loop split.

typo, semi-invariant
I think somewhere in the docs your definition of semi-invariant needs
to be stated in some form (can be short, doesn't need to reproduce the 
diagram or such), so don't just replicate the short info from the 
params.def file.

> diff --git a/gcc/params.def b/gcc/params.def
> index 0db60951413..5384f7d1c4d 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
>          "The maximum number of unswitchings in a single loop.",
>          3, 0, 0)
> 
> +/* The maximum number of increased insns due to loop split on semi-invariant
> +   condition statement.  */
> +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> +       "max-cond-loop-split-insns",
> +       "The maximum number of insns to be increased due to loop split on "
> +       "semi-invariant condition statement.",
> +       100, 0, 0)
> +
> +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> +       "min-cond-loop-split-prob",
> +       "The minimum threshold for probability of semi-invaraint condition "
> +       "statement to trigger loop split.",

Same typo: "semi-invariant".

> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 999c9a30366..7239d0cfb00 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> -/* This file implements loop splitting, i.e. transformation of loops like
> +/* This file implements two kind of loop splitting.

kind_s_, plural

> +   One transformation of loops like:
> 
>     for (i = 0; i < 100; i++)
>       {
> @@ -670,6 +674,782 @@ tree_ssa_split_loops (void)
>    return 0;
>  }
> 
> +
> +/* Another transformation of loops like:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))
> +         a_j = ...;  // change at least one a_j
> +       else
> +         S;          // not change any a_j
> +     }

You should mention that 'expr' needs to be pure, i.e. once it
becomes false and the inputs don't change, that it remains false.

> +
> +   into:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))
> +         a_j = ...;
> +       else
> +         {
> +           S;
> +           i = NEXT ();
> +           break;
> +         }
> +     }
> +
> +   for (; CHECK (i); i = NEXT ())
> +     {
> +       S;
> +     }
> +
> +   */
> +
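
For illustration, a small self-contained instance of the before/after
forms quoted above.  The names (sum_original, sum_split, budget) are made
up for this sketch.  The condition "budget > 0" is semi-invariant: only
the true branch writes budget, so once the else branch is taken the
condition stays false for all remaining iterations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original loop: the condition is re-evaluated in every iteration.
static long sum_original (const std::vector<int> &v, int budget)
{
  long sum = 0;
  for (std::size_t i = 0; i < v.size (); ++i)
    {
      if (budget > 0)
        budget -= v[i];   // changes an operand of the condition
      else
        sum += v[i];      // does not change any operand of the condition
    }
  return sum;
}

// Split form: the first loop runs until the else branch is taken once,
// the second loop runs the remaining iterations without the condition.
static long sum_split (const std::vector<int> &v, int budget)
{
  long sum = 0;
  std::size_t i = 0;
  for (; i < v.size (); ++i)
    {
      if (budget > 0)
        budget -= v[i];
      else
        {
          sum += v[i];
          ++i;            // the "i = NEXT ()" step before leaving loop 1
          break;
        }
    }
  for (; i < v.size (); ++i)
    sum += v[i];
  return sum;
}
```

Both functions should return the same value for any input; the second
loop is branch-free and so a plausible vectorization candidate.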
...
> +/* Determine when conditional statement never transfers execution to one of its
> +   branch, whether we can remove the branch's leading basic block (BRANCH_BB)
> +   and those basic blocks dominated by BRANCH_BB.  */
> +
> +static bool
> +branch_removable_p (basic_block branch_bb)
> +{
> +  if (single_pred_p (branch_bb))
> +    return true;
> +
> +  edge e;
> +  edge_iterator ei;
> +
> +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
> +    {
> +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
> +       continue;
> +
> +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
> +       continue;

My gut feeling is surprised by this.  So one of the predecessors of
branch_bb dominates it.  Why should that indicate that branch_bb
can be safely removed?

Think about something like this:

   esrc --> cond_bb --> branch_bb
    '-------------------^

(cond_bb is the caller's bb of the cond statement in question).  Now esrc
dominates branch_bb but still you can't simply remove it, even if
the cond_bb->branch_bb edge becomes unexecutable.
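
A minimal sketch of that three-block situation, using a hypothetical
adjacency-list CFG (node 0 = esrc, 1 = cond_bb, 2 = branch_bb) and a naive
dominance test, not GCC's dominator machinery.  It only checks the two
facts stated above: esrc dominates branch_bb, and branch_bb stays
reachable once the cond_bb->branch_bb edge is treated as dead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature CFG: g[n] lists the successors of node n.
typedef std::vector<std::vector<int> > cfg;

// Return true if TARGET is reachable from FROM without visiting node
// REMOVED (-1 for none) and without following edge SKIP_SRC -> SKIP_DST.
static bool reachable (const cfg &g, int from, int target, int removed,
                       int skip_src, int skip_dst)
{
  if (from == removed)
    return false;
  std::vector<bool> seen (g.size (), false);
  std::vector<int> work (1, from);
  seen[from] = true;
  while (!work.empty ())
    {
      int n = work.back ();
      work.pop_back ();
      if (n == target)
        return true;
      for (std::size_t k = 0; k < g[n].size (); ++k)
        {
          int s = g[n][k];
          if (s == removed || seen[s] || (n == skip_src && s == skip_dst))
            continue;
          seen[s] = true;
          work.push_back (s);
        }
    }
  return false;
}

// A dominates B if every path from ENTRY to B goes through A.
static bool dominates (const cfg &g, int entry, int a, int b)
{
  return a == b || !reachable (g, entry, b, a, -1, -1);
}

// Build the example: esrc -> cond_bb -> branch_bb, plus esrc -> branch_bb.
static cfg make_example ()
{
  cfg g (3);
  g[0].push_back (1);
  g[0].push_back (2);
  g[1].push_back (2);
  return g;
}
```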

> +       /* The branch can be reached from opposite branch, or from some
> +         statement not dominated by the conditional statement.  */
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Find out which branch of a conditional statement (COND) is invariant in the
> +   execution context of LOOP.  That is: once the branch is selected in certain
> +   iteration of the loop, any operand that contributes to computation of the
> +   conditional statement remains unchanged in all following iterations.  */
> +
> +static int
> +get_cond_invariant_branch (struct loop *loop, gcond *cond)

Please return an edge here, not an edge index (which removes the use of
-1).  I think the name (and comment) should consistently talk about
semi-invariant, not invariant.  For when you really need an edge index 
later, you could use "EDGE_SUCC(bb, 0) != edge".  But you probably don't 
really need it, e.g. instead of using the gimple pass-local-flag on a 
statement you can just as well also store the edge in your info structure.
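
To make the suggestion concrete, a sketch with hypothetical stand-ins
(edge_def, block) for GCC's edge and basic_block; the semi_invariant flag
merely represents the outcome of the analysis.  The function returns the
edge itself, with a null pointer meaning "no suitable branch", and the
index is recovered on demand in the spirit of "EDGE_SUCC(bb, 0) != edge".

```cpp
#include <cassert>

// Hypothetical stand-ins, only for illustration.
struct block;
struct edge_def { block *src; block *dest; bool semi_invariant; };
struct block { edge_def *succ[2]; };

// Return the single semi-invariant outgoing edge of COND_BB, or nullptr
// if neither or both edges qualify.
static edge_def *get_semi_invariant_edge (block *cond_bb)
{
  edge_def *e0 = cond_bb->succ[0];
  edge_def *e1 = cond_bb->succ[1];
  if (e0->semi_invariant == e1->semi_invariant)
    return nullptr;
  return e0->semi_invariant ? e0 : e1;
}

// Recover the successor index of E in BB when it is really needed.
static int edge_index (const block *bb, const edge_def *e)
{
  return bb->succ[0] != e;
}
```

Storing the returned edge in the info structure would then replace the
pass-local flag on the statement.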

> +/* Given a conditional statement in LOOP, whose basic block is COND_BB,
> +   suppose its execution only goes through one of its branch, whose index is
> +   specified by BRANCH.  Return TRUE if this statement still executes multiple
> +   times in one iteration of LOOP, in that the statement belongs a nested
> +   unrecognized loop.  */
> +
> +static bool
> +is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb,
> +                       int branch)

With above change in get_cond_invariant_branch, this also should
take an edge, not a bb+edge-index.

> +/* Calculate increased code size measured by estimated insn number if applying
> +   loop split upon certain branch (BRANCH) of a conditional statement whose
> +   basic block is COND_BB.  */
> +
> +static int
> +compute_added_num_insns (struct loop *loop, const_basic_block cond_bb,
> +                        int branch)

This should take an edge as well.

> +{
> +  const_basic_block targ_bb_var = EDGE_SUCC (cond_bb, !branch)->dest;
> +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> +  int num = 0;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    {
> +      /* Do no count basic blocks only in opposite branch.  */
> +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var))
> +       continue;
> +
> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi);
> +          gsi_next (&gsi))
> +       num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);

Replace the loop by
  num += estimate_num_insns_seq (bb_seq (bbs[i]), &eni_size_weights);

> +/* Return true if it is eligible and profitable to perform loop split upon
> +   a conditional statement COND in LOOP.  */
> +
> +static bool
> +can_split_loop_on_cond (struct loop *loop, gcond *cond)
> +{
> +  int branch = get_cond_invariant_branch (loop, cond);
> +
> +  if (branch < 0)
> +    return false;
> +
> +  basic_block cond_bb = gimple_bb (cond);
> +  profile_probability prob = EDGE_SUCC (cond_bb, branch)->probability;
> +
> +  /* When accurate profile information is available, and execution
> +     frequency of the branch is too low, just let it go.  */
> +  if (prob.reliable_p ())
> +    {
> +      int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB);
> +
> +      if (prob < profile_probability::always ().apply_scale (thres, 100))
> +       return false;
> +    }
> +
> +  /* Add a threshold for increased code size to disable loop split.  */
> +  if (compute_added_num_insns (loop, cond_bb, branch) >
> +      PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))

Operator should be first on next line, not last on previous line.

> +    return false;
> +
> +  /* Skip conditional statement that is inside a nested unrecognized loop.  */
> +  if (is_cond_in_hidden_loop (loop, cond_bb, branch))
> +    return false;

This part (as well as is_cond_in_hidden_loop implementation) had me 
confused for a while, because "unrecognized" loops seems strange.  I think 
I now know what you try to do here, but I wonder if there's an easier way, 
or at least about which situations you stumbled into that made you write 
this code.

> +
> +  /* Temporarily keep branch index in conditional statement.  */
> +  gimple_set_plf (cond, GF_PLF_1, branch);

i.e. here, store the edge in your info structure.

> +/* Main entry point to perform loop splitting for suitable if-conditions
> +   in all loops.  */
> +
> +static unsigned int
> +tree_ssa_split_loops_for_cond (void)

So, from here on the code should be integrated into the existing code
of the file (which might need changes as well for this integration to look 
good).

That's it for now.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-07-29 20:30           ` Michael Matz
@ 2019-07-31  7:25             ` Feng Xue OS
  2019-09-12 10:21             ` Feng Xue OS
  1 sibling, 0 replies; 31+ messages in thread
From: Feng Xue OS @ 2019-07-31  7:25 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Biener, gcc-patches

Thanks for these comments.

Feng


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-07-29 20:30           ` Michael Matz
  2019-07-31  7:25             ` Feng Xue OS
@ 2019-09-12 10:21             ` Feng Xue OS
  2019-09-12 10:23               ` [PATCH V3] " Feng Xue OS
  2019-10-09  4:42               ` Ping: [PATCH V2] " Feng Xue OS
  1 sibling, 2 replies; 31+ messages in thread
From: Feng Xue OS @ 2019-09-12 10:21 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Biener, gcc-patches

Hi, Michael,

  Sorry for the late reply; I was tied up with other tasks.  I have prepared a
new version of the patch that follows your suggestions.  Please review it at
your convenience.

> Generally I do like the idea of the transformation, and the basic building
> blocks seem to be sound.  But I dislike it being a separate pass, so
> please integrate the code you have written into the existing loop split
> pass.  Most building blocks can be used as is, except the main driver.
The new transformation is now integrated into the existing loop split pass.

>> +@item max-cond-loop-split-insns
>> +The maximum number of insns to be increased due to loop split on
>> +semi-invariant condition statement.

> "to be increased" --> "to be generated" (or "added")
Done.

>> +@item min-cond-loop-split-prob
>> +The minimum threshold for probability of semi-invaraint condition
>> +statement to trigger loop split.

> typo, semi-invariant
Done.

> I think somewhere in the docs your definition of semi-invariant needs
> to be stated in some form (can be short, doesn't need to reproduce the
> diagram or such), so don't just replicate the short info from the
> params.def file.
Done.

>> +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
>> +       "min-cond-loop-split-prob",
>> +       "The minimum threshold for probability of semi-invaraint condition "
>> +       "statement to trigger loop split.",

> Same typo: "semi-invariant".
Done.

>> -/* This file implements loop splitting, i.e. transformation of loops like
>> +/* This file implements two kind of loop splitting.

> kind_s_, plural
Done.

>> +/* Another transformation of loops like:
>> +
>> +   for (i = INIT (); CHECK (i); i = NEXT ())
>> +     {
>> +       if (expr (a_1, a_2, ..., a_n))
>> +         a_j = ...;  // change at least one a_j
>> +       else
>> +         S;          // not change any a_j
>> +     }

> You should mention that 'expr' needs to be pure, i.e. once it
> becomes false and the inputs don't change, that it remains false.
Done.
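
As a side note for readers of the thread, the equivalence of the two loop
shapes in the comment can be checked with a small standalone sketch.  It is
not GCC code: Result, next_flag, run_original and run_split are made-up
names, and next_flag is only a hypothetical stand-in for do_something ().  The
sketch assumes, as the review asks, that the condition depends only on b and
that neither the else branch nor the trailing statements modify b.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Observable effects of running the loop, so the two forms can be compared.
struct Result {
  int b;          // final value of the semi-invariant flag
  int calls;      // how many times the "then" branch ran
  int else_runs;  // how many times the "else" branch ran
  int tail_runs;  // how many times the trailing statements ran
  bool operator== (const Result &o) const {
    return b == o.b && calls == o.calls
           && else_runs == o.else_runs && tail_runs == o.tail_runs;
  }
};

// Hypothetical stand-in for do_something (): replays SCRIPT and yields 0
// once the script is exhausted.
static int next_flag (const std::vector<int> &script, int &calls) {
  assert (calls >= 0);
  std::size_t idx = static_cast<std::size_t> (calls++);
  return idx < script.size () ? script[idx] : 0;
}

// Original shape: the condition on b is re-tested in every iteration.
Result run_original (int n, int b, const std::vector<int> &script) {
  Result r{b, 0, 0, 0};
  for (int i = 0; i < n; ++i) {
    if (r.b)
      r.b = next_flag (script, r.calls);  // may change b
    else
      ++r.else_runs;                      // never changes b
    ++r.tail_runs;                        // never changes b
  }
  return r;
}

// Split shape: once b is seen to be false, finish that iteration, leave the
// first loop, and run the remaining iterations without testing b again.
Result run_split (int n, int b, const std::vector<int> &script) {
  Result r{b, 0, 0, 0};
  int i = 0;
  for (; i < n; ++i) {
    if (r.b)
      r.b = next_flag (script, r.calls);
    else {
      ++r.else_runs;
      ++r.tail_runs;
      ++i;
      break;
    }
    ++r.tail_runs;
  }
  for (; i < n; ++i) {
    ++r.else_runs;
    ++r.tail_runs;
  }
  return r;
}
```

Calling both functions with the same arguments, for example
run_original (10, 1, {1, 1, 0}) and run_split (10, 1, {1, 1, 0}), should give
equal Result values; the second loop of run_split never reads b, which is
what makes it a candidate for removal or vectorization.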

>> +static bool
>> +branch_removable_p (basic_block branch_bb)
>> +{
>> +  if (single_pred_p (branch_bb))
>> +    return true;
>> +
>> +  edge e;
>> +  edge_iterator ei;
>> +
>> +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
>> +    {
>> +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
>> +       continue;
>> +
>> +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
>> +       continue;

> My gut feeling is surprised by this.  So one of the predecessors of
> branch_bb dominates it.  Why should that indicate that branch_bb
> can be safely removed?
>
> Think about something like this:
>
>   esrc --> cond_bb --> branch_bb
>   '-------------------^

If all predecessors of branch_bb dominate it, these predecessors must also be
in a dominance relationship with each other, the block of the conditional
statement must be branch_bb's immediate dominator, and branch_bb is removable.

In your example, "esrc" dominates "branch_bb", so the FOR_EACH_EDGE loop just
continues and nothing is decided yet.  On the next iteration we reach
"cond_bb", which does not dominate "branch_bb", so the function returns false
at the return statement that follows.
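
To make the argument easier to follow, here is a minimal sketch that replays
the quoted check on the esrc/cond_bb/branch_bb example.  It is a toy model, not
GCC code: ToyCfg, dominated_by, branch_removable and matz_example are
hypothetical names, blocks are plain integers, and the immediate dominators
are written down by hand instead of being computed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG: blocks are indices; idom[b] is the immediate dominator of b
// (the entry block is its own immediate dominator).
struct ToyCfg {
  std::vector<int> idom;
  std::vector<std::vector<int>> preds;
};

// True if block A is dominated by block B (every block dominates itself).
bool dominated_by (const ToyCfg &cfg, int a, int b) {
  for (;;) {
    if (a == b)
      return true;
    if (cfg.idom[a] == a)
      return false;  // reached the entry block without meeting B
    a = cfg.idom[a];
  }
}

// Same structure as the quoted branch_removable_p, on the toy CFG.
bool branch_removable (const ToyCfg &cfg, int branch_bb) {
  assert (branch_bb >= 0
          && static_cast<std::size_t> (branch_bb) < cfg.preds.size ());
  if (cfg.preds[branch_bb].size () == 1)
    return true;
  for (int src : cfg.preds[branch_bb]) {
    if (dominated_by (cfg, src, branch_bb))
      continue;  // predecessor lies inside the branch itself
    if (dominated_by (cfg, branch_bb, src))
      continue;  // predecessor dominates the branch
    return false;
  }
  return true;
}

// The example from the review: esrc (0) -> cond_bb (1) -> branch_bb (2),
// plus a direct edge esrc -> branch_bb.
ToyCfg matz_example () {
  ToyCfg cfg;
  cfg.idom = {0, 0, 0};            // esrc dominates both other blocks
  cfg.preds = {{}, {0}, {0, 1}};   // branch_bb has predecessors esrc, cond_bb
  return cfg;
}
```

On this graph branch_removable (matz_example (), 2) is false: the esrc edge is
skipped by the second "continue", and the cond_bb edge then reaches the
"return false".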

> (cond_bb is the callers bb of the cond statement in question).  Now esrc
> dominates branch_bb but still you can't simply remove it, even if
> the cond_bb->branch_bb edge becomes unexecutable.


>> +static int
>> +get_cond_invariant_branch (struct loop *loop, gcond *cond)

> Please return an edge here, not an edge index (which removes the using of
> -1).  I think the name (and comment) should consistently talk about
> semi-invariant, not invariant.  For when you really need an edge index
> later, you could use "EDGE_SUCC(bb, 0) != edge".  But you probably don't
> really need it, e.g. instead of using the gimple pass-local-flag on a
> statement you can just as well also store the edge in your info structure.
Done.
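
For reference, the "EDGE_SUCC(bb, 0) != edge" idiom mentioned above can be
illustrated outside GCC with a tiny sketch.  ToyEdge, ToyBlock, succ_index and
demo_edge_index are hypothetical stand-ins for illustration only; the real
edge and basic_block types carry much more state.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for GCC's edge and basic_block.
struct ToyEdge { int dest; };
struct ToyBlock { const ToyEdge *succs[2]; };

// Index (0 or 1) of edge E among the two successors of BB, recovered by
// comparing against the first successor, as the review suggests.
int succ_index (const ToyBlock &bb, const ToyEdge *e) {
  assert (e == bb.succs[0] || e == bb.succs[1]);
  return bb.succs[0] != e;
}

// Demo helper: build a two-way block and report the index of one of its edges.
int demo_edge_index (bool second) {
  ToyEdge t{1}, f{2};
  ToyBlock bb{{&t, &f}};
  return succ_index (bb, second ? &f : &t);
}
```

The point of the suggestion is that the edge itself is the richer handle: the
index can always be derived from it, while a missing branch can be expressed
as a null edge instead of -1.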

>> +static bool
>> +is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb,
>> +                       int branch)

> With above change in get_cond_invariant_branch, this also should
> take an edge, not a bb+edge-index.
Done.

>> +static int
>> +compute_added_num_insns (struct loop *loop, const_basic_block cond_bb,
>> +                        int branch)

> This should take an edge as well.
Done.

>> +  for (unsigned i = 0; i < loop->num_nodes; i++)
>> +    {
>> +      /* Do no count basic blocks only in opposite branch.  */
>> +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var))
>> +       continue;
>> +
>> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi);
>> +          gsi_next (&gsi))
>> +       num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);

> Replace the loop by
>  estimate_num_insn_seq (bb_seq (bbs[i]), &eni_size_weights);
Done.


>> +  /* Add a threshold for increased code size to disable loop split.  */
>> +  if (compute_added_num_insns (loop, cond_bb, branch) >
>> +      PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))

> Operator should be first on next line, not last on previous line.
Done.
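
The two cost checks quoted above can be summarized in a small standalone
sketch.  This is a simplified model under stated assumptions, not the patch
itself: passes_cost_checks and the two constants are hypothetical names,
probabilities are plain percentages rather than profile_probability values,
and the defaults (100 insns, 30 percent) are the ones given in the params.def
hunk of this thread.

```cpp
#include <cassert>

// Defaults as listed for the two new params in this thread.
const int kMaxCondLoopSplitInsns = 100;  // max-cond-loop-split-insns
const int kMinCondLoopSplitProb = 30;    // min-cond-loop-split-prob, percent

// Hypothetical model of the profitability gate: returns true if neither
// the probability check nor the code-size check rejects the split.
bool passes_cost_checks (int prob_percent, bool prob_reliable, int added_insns,
                         int min_prob = kMinCondLoopSplitProb,
                         int max_insns = kMaxCondLoopSplitInsns) {
  assert (prob_percent >= 0 && prob_percent <= 100);

  // Only when the profile is reliable: skip branches taken too rarely.
  if (prob_reliable && prob_percent < min_prob)
    return false;

  // Reject splits that would add too much code.
  if (added_insns > max_insns)
    return false;

  return true;
}
```

For instance, a branch taken 20 percent of the time is rejected only when the
profile is reliable; with a guessed profile the probability check is skipped
and only the size limit applies.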

>> +  /* Skip conditional statement that is inside a nested unrecognized loop.  */
>> +  if (is_cond_in_hidden_loop (loop, cond_bb, branch))
>> +    return false;

> This part (as well as is_cond_in_hidden_loop implementation) had me
> confused for a while, because "unrecognized" loops seems strange.  I think
> I now know what you try to do here, but I wonder if there's an easier way,
> or at least about which situations you stumbled into that made you write
> this code.
The new version uses the BB_IRREDUCIBLE_LOOP flag to check that, since the
tree-loop-init pass requests LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS, which
marks irreducible loops.
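
For readers unfamiliar with the term, an irreducible region is a cycle in the
CFG that can be entered at more than one block, so it has no single header
and is not recorded as a natural loop.  The following sketch shows the source
shape that typically produces one; two_entry_cycle is a made-up function, and
whether a given compiler keeps this exact shape after earlier passes is an
assumption.

```cpp
#include <cassert>

// A cycle between labels A and B that can be entered at either label.
// Neither block dominates the other, so the cycle has no single header.
// For n >= 1 the function returns n regardless of the entry point.
int two_entry_cycle (bool enter_at_b, int n) {
  assert (n >= 1);
  int count = 0;
  if (enter_at_b)
    goto B;      // second entry into the cycle
A:
  ++count;
  if (count >= n)
    return count;
B:
  ++count;
  if (count < n)
    goto A;      // back edge closing the cycle
  return count;
}
```

A conditional statement placed between A and B would run several times
without any recognized loop around it, which appears to be the situation the
"hidden loop" check is meant to exclude.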

>> +
>> +  /* Temporarily keep branch index in conditional statement.  */
>> +  gimple_set_plf (cond, GF_PLF_1, branch);

> i.e. here, store the edge in your info structure.
Done.

>> +/* Main entry point to perform loop splitting for suitable if-conditions
>> +   in all loops.  */
>> +
>> +static unsigned int
>> +tree_ssa_split_loops_for_cond (void)

> So, from here on the code should be integrated into the existing code
> of the file (which might need changes as well for this integration to look
> good).
Done.

Thanks,
Feng

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-09-12 10:21             ` Feng Xue OS
@ 2019-09-12 10:23               ` Feng Xue OS
  2019-10-15 16:01                 ` Philipp Tomsich
  2019-10-09  4:42               ` Ping: [PATCH V2] " Feng Xue OS
  1 sibling, 1 reply; 31+ messages in thread
From: Feng Xue OS @ 2019-09-12 10:23 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Biener, gcc-patches

---
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1391a562c35..28981fa1048 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11418,6 +11418,19 @@ The maximum number of branches unswitched in a single loop.
 @item lim-expensive
 The minimum cost of an expensive expression in the loop invariant motion.
 
+@item max-cond-loop-split-insns
+In a loop, if a branch of a conditional statement is selected since certain
+loop iteration, any operand that contributes to computation of the conditional
+expression remains unchanged in all following iterations, the statement is
+semi-invariant, upon which we can do a kind of loop split transformation.
+@option{max-cond-loop-split-insns} controls maximum number of insns to be
+added due to loop split on semi-invariant conditional statement.
+
+@item min-cond-loop-split-prob
+When FDO profile information is available, @option{min-cond-loop-split-prob}
+specifies minimum threshold for probability of semi-invariant condition
+statement to trigger loop split.
+
 @item iv-consider-all-candidates-bound
 Bound on number of candidates for induction variables, below which
 all candidates are considered for each use in induction variable
diff --git a/gcc/params.def b/gcc/params.def
index 13001a7bb2d..12bc8c26c9e 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
 	"The maximum number of unswitchings in a single loop.",
 	3, 0, 0)
 
+/* The maximum number of increased insns due to loop split on semi-invariant
+   condition statement.  */
+DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
+	"max-cond-loop-split-insns",
+	"The maximum number of insns to be added due to loop split on "
+	"semi-invariant condition statement.",
+	100, 0, 0)
+
+DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
+	"min-cond-loop-split-prob",
+	"The minimum threshold for probability of semi-invariant condition "
+	"statement to trigger loop split.",
+	30, 0, 100)
+
 /* The maximum number of insns in loop header duplicated by the copy loop
    headers pass.  */
 DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,

diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
new file mode 100644
index 00000000000..51f9da22fc7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
+
+#include <string>
+#include <map>
+
+using namespace std;
+
+class  A
+{
+public:
+  bool empty;
+  void set (string s);
+};
+
+class  B
+{
+  map<int, string> m;
+  void f ();
+};
+
+extern A *ga;
+
+void B::f ()
+{
+  for (map<int, string>::iterator iter = m.begin (); iter != m.end (); ++iter)
+    {
+      if (ga->empty)
+        ga->set (iter->second);
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
new file mode 100644
index 00000000000..bbd522d6bcd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
+
+__attribute__((pure)) __attribute__((noinline)) int inc (int i)
+{
+  return i + 1;
+}
+
+extern int do_something (void);
+extern int b;
+
+void test(int n)
+{
+  int i;
+
+  for (i = 0; i < n; i = inc (i))
+    {
+      if (b)
+        b = do_something();
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index f5f083384bc..e4a1b6d2019 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -32,7 +32,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop.h"
 #include "tree-ssa-loop-manip.h"
 #include "tree-into-ssa.h"
+#include "tree-inline.h"
+#include "tree-cfgcleanup.h"
 #include "cfgloop.h"
+#include "params.h"
 #include "tree-scalar-evolution.h"
 #include "gimple-iterator.h"
 #include "gimple-pretty-print.h"
@@ -40,7 +43,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "gimplify-me.h"
 
-/* This file implements loop splitting, i.e. transformation of loops like
+/* This file implements two kinds of loop splitting.
+
+   One transformation of loops like:
 
    for (i = 0; i < 100; i++)
      {
@@ -612,6 +617,722 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
   return changed;
 }
 
+/* Another transformation of loops like:
+
+   for (i = INIT (); CHECK (i); i = NEXT ())
+     {
+       if (expr (a_1, a_2, ..., a_n))  // expr is pure
+         a_j = ...;  // change at least one a_j
+       else
+         S;          // not change any a_j
+     }
+
+   into:
+
+   for (i = INIT (); CHECK (i); i = NEXT ())
+     {
+       if (expr (a_1, a_2, ..., a_n))
+         a_j = ...;
+       else
+         {
+           S;
+           i = NEXT ();
+           break;
+         }
+     }
+
+   for (; CHECK (i); i = NEXT ())
+     {
+       S;
+     }
+
+   */
+
+/* Data structure to hold temporary information during loop split upon
+   semi-invariant conditional statement.  */
+class split_info {
+public:
+  /* Array of all basic blocks in a loop, returned by get_loop_body().  */
+  basic_block *bbs;
+
+  /* All memory store/clobber statements in a loop.  */
+  auto_vec<gimple *> memory_stores;
+
+  /* Whether above memory stores vector still needs to be filled.  */
+  int need_init;
+
+  split_info () : bbs (NULL),  need_init (true) { }
+
+  ~split_info ()
+    {
+      if (bbs)
+	free (bbs);
+    }
+};
+
+/* Find all statements with memory-write effect in LOOP, including memory
+   store and non-pure function call, and keep those in a vector.  This work
+   is only done one time, for the vector should be constant during analysis
+   stage of semi-invariant condition.  */
+
+static void
+find_vdef_in_loop (struct loop *loop)
+{
+  split_info *info = (split_info *) loop->aux;
+  gphi *vphi = get_virtual_phi (loop->header);
+
+  /* Indicate memory store vector has been filled.  */
+  info->need_init = false;
+
+  /* If loop contains memory operation, there must be a virtual PHI node in
+     loop header basic block.  */
+  if (vphi == NULL)
+    return;
+
+  /* All virtual SSA names inside the loop are connected to be a cyclic
+     graph via virtual PHI nodes.  The virtual PHI node in loop header just
+     links the first and the last virtual SSA names, by using the last as
+     PHI operand to define the first.  */
+  const edge latch = loop_latch_edge (loop);
+  const tree first = gimple_phi_result (vphi);
+  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
+
+  /* The virtual SSA cyclic graph might consist of only one SSA name, who
+     is defined by itself.
+
+       .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
+
+     This means the loop contains only memory loads, so we can skip it.  */
+  if (first == last)
+    return;
+
+  auto_vec<gimple *> other_stores;
+  auto_vec<tree> worklist;
+  auto_bitmap visited;
+
+  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
+  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
+  worklist.safe_push (last);
+
+  do
+    {
+      tree vuse = worklist.pop ();
+      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
+
+      /* We mark the first and last SSA names as visited at the beginning,
+	 and reversely start the process from the last SSA name towards the
+	 first, which ensures that this do-while will not touch SSA names
+	 defined outside of the loop.  */
+      gcc_assert (gimple_bb (stmt)
+		  && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
+
+      if (gimple_code (stmt) == GIMPLE_PHI)
+	{
+	  gphi *phi = as_a <gphi *> (stmt);
+
+	  for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+	    {
+	      tree arg = gimple_phi_arg_def (stmt, i);
+
+	      if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
+		worklist.safe_push (arg);
+	    }
+	}
+      else
+	{
+	  tree prev = gimple_vuse (stmt);
+
+	  /* Non-pure call statement is conservatively assumed to impact all
+	     memory locations.  So place call statements ahead of other memory
+	     stores in the vector with an idea of using them as shortcut
+	     terminators to memory alias analysis.  */
+	  if (gimple_code (stmt) == GIMPLE_CALL)
+	    info->memory_stores.safe_push (stmt);
+	  else
+	    other_stores.safe_push (stmt);
+
+	  if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
+	    worklist.safe_push (prev);
+	}
+    } while (!worklist.is_empty ());
+
+  info->memory_stores.safe_splice (other_stores);
+}
+
+
+/* Given STMT, memory load or pure call statement, check whether it is impacted
+   by some memory store in LOOP, excluding trace starting from SKIP_HEAD (the
+   trace is composed of SKIP_HEAD and those basic block dominated by it, always
+   corresponds to one branch of a conditional statement).  If SKIP_HEAD is
+   NULL, all basic blocks of LOOP are checked.  */
+
+static bool
+vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
+		       const_basic_block skip_head)
+{
+  split_info *info = (split_info *) loop->aux;
+
+  /* Collect memory store/clobber statements if not done yet.  */
+  if (info->need_init)
+    find_vdef_in_loop (loop);
+
+  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
+  ao_ref ref;
+  gimple *store;
+  unsigned i;
+
+  ao_ref_init (&ref, rhs);
+
+  FOR_EACH_VEC_ELT (info->memory_stores, i, store)
+    {
+      /* Skip basic blocks dominated by SKIP_HEAD, if non-NULL.  */
+      if (skip_head
+	  && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
+	continue;
+
+      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
+	return false;
+    }
+
+  return true;
+}
+
+/* Forward declaration.  */
+
+static bool
+stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
+		       const_basic_block skip_head);
+
+/* Suppose one condition branch, led by SKIP_HEAD, is not executed since
+   certain iteration of LOOP, check whether an SSA name (NAME) remains
+   unchanged in next iteration.  We call this characteristic
+   semi-invariantness.  SKIP_HEAD might be NULL, if so, nothing excluded,
+   all basic blocks and control flows in the loop will be considered.  If
+   non-NULL, SSA name to check is supposed to be defined before SKIP_HEAD.  */
+
+static bool
+ssa_semi_invariant_p (struct loop *loop, const tree name,
+		      const_basic_block skip_head)
+{
+  gimple *def = SSA_NAME_DEF_STMT (name);
+  const_basic_block def_bb = gimple_bb (def);
+
+  /* An SSA name defined outside a loop is definitely semi-invariant.  */
+  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
+    return true;
+
+  if (gimple_code (def) == GIMPLE_PHI)
+    {
+      /* For PHI node that is not in loop header, its source operands should
+	 be defined inside the loop, which are seen as loop variant.  */
+      if (def_bb != loop->header || !skip_head)
+	return false;
+
+      const_edge latch = loop_latch_edge (loop);
+      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
+
+      /* A PHI node in loop header contains two source operands, one is
+	 initial value, the other is the copy of last iteration through loop
+	 latch, we call it latch value.  From the PHI node to definition
+	 of latch value, if excluding branch trace from SKIP_HEAD, there
+	 is no definition of other version of same variable, SSA name defined
+	 by the PHI node is semi-invariant.
+
+                         loop entry
+                              |     .--- latch ---.
+                              |     |             |
+                              v     v             |
+                  x_1 = PHI <x_0,  x_3>           |
+                           |                      |
+                           v                      |
+              .------- if (cond) -------.         |
+              |                         |         |
+              |                     [ SKIP ]      |
+              |                         |         |
+              |                     x_2 = ...     |
+              |                         |         |
+              '---- T ---->.<---- F ----'         |
+                           |                      |
+                           v                      |
+                  x_3 = PHI <x_1, x_2>            |
+                           |                      |
+                           '----------------------'
+
+	Suppose in certain iteration, execution flow in above graph goes
+	through true branch, which means that one source value to define
+	x_3 in false branch (x_2) is skipped, x_3 only comes from x_1, and
+	x_1 in next iterations is defined by x_3, we know that x_1 will
+	never change if COND always chooses true branch from then on.  */
+
+      while (from != name)
+	{
+	  /* A new value comes from a CONSTANT.  */
+	  if (TREE_CODE (from) != SSA_NAME)
+	    return false;
+
+	  gimple *stmt = SSA_NAME_DEF_STMT (from);
+	  const_basic_block bb = gimple_bb (stmt);
+
+	  /* A new value comes from outside of loop.  */
+	  if (!bb || !flow_bb_inside_loop_p (loop, bb))
+	    return false;
+
+	  from = NULL_TREE;
+
+	  if (gimple_code (stmt) == GIMPLE_PHI)
+	    {
+	      gphi *phi = as_a <gphi *> (stmt);
+
+	      for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+		{
+		  const_edge e = gimple_phi_arg_edge (phi, i);
+
+		  /* Not consider redefinitions in excluded basic blocks.  */
+		  if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
+		    {
+		      /* There is more than one source operand that can
+			 provide value to the SSA name, it is variant.  */
+		      if (from)
+			return false;
+
+		      from = gimple_phi_arg_def (phi, i);
+		    }
+		}
+	    }
+	  else if (gimple_code (stmt) == GIMPLE_ASSIGN)
+	    {
+	      /* For simple value copy, check its rhs instead.  */
+	      if (gimple_assign_ssa_name_copy_p (stmt))
+		from = gimple_assign_rhs1 (stmt);
+	    }
+
+	  /* Any other kind of definition is deemed to introduce a new value
+	     to the SSA name.  */
+	  if (!from)
+	    return false;
+	}
+      return true;
+    }
+
+  /* Value originated from volatile memory load or return of normal (non-
+     const/pure) call should not be treated as constant in each iteration.  */
+  if (gimple_has_side_effects (def))
+    return false;
+
+  /* Check if any memory store may kill memory load at this place.  */
+  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
+    return false;
+
+  /* Check operands of definition statement of the SSA name.  */
+  return stmt_semi_invariant_p (loop, def, skip_head);
+}
+
+/* Check whether STMT is semi-invariant in LOOP, iff all its operands are
+   semi-invariant.  Trace composed of basic block SKIP_HEAD and basic blocks
+   dominated by it are excluded from the loop.  */
+
+static bool
+stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
+		       const_basic_block skip_head)
+{
+  ssa_op_iter iter;
+  tree use;
+
+  /* Although operand of a statement might be SSA name, CONSTANT or VARDECL,
+     here we only need to check SSA name operands.  This is because check on
+     VARDECL operands, which involve memory loads, must have been done
+     prior to invocation of this function in vuse_semi_invariant_p.  */
+  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
+    {
+      if (!ssa_semi_invariant_p (loop, use, skip_head))
+	return false;
+    }
+
+  return true;
+}
+
+/* Determine whether, when a conditional statement never transfers execution
+   to one of its branches, we can remove the branch's leading basic block
+   (BRANCH_BB) and those basic blocks dominated by BRANCH_BB.  */
+
+static bool
+branch_removable_p (basic_block branch_bb)
+{
+  if (single_pred_p (branch_bb))
+    return true;
+
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, branch_bb->preds)
+    {
+      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
+	continue;
+
+      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
+	continue;
+
+      /* The branch can be reached from opposite branch, or from some
+	 statement not dominated by the conditional statement.  */
+      return false;
+    }
+
+  return true;
+}
+
+/* Find out which branch of a conditional statement (COND) is invariant in the
+   execution context of LOOP.  That is: once the branch is selected in certain
+   iteration of the loop, any operand that contributes to computation of the
+   conditional statement remains unchanged in all following iterations.  */
+
+static edge
+get_cond_invariant_branch (struct loop *loop, gcond *cond)
+{
+  basic_block cond_bb = gimple_bb (cond);
+  basic_block targ_bb[2];
+  bool invar[2];
+  unsigned invar_checks;
+
+  for (unsigned i = 0; i < 2; i++)
+    {
+      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
+
+      /* One branch directs to loop exit, no need to perform loop split upon
+	 this conditional statement.  Firstly, it is trivial if the exit branch
+	 is semi-invariant, for the statement is just to break loop.  Secondly,
+	 if the opposite branch is semi-invariant, it means that the statement
+	 is real loop-invariant, which is covered by loop unswitch.  */
+      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
+	return NULL;
+    }
+
+  invar_checks = 0;
+
+  for (unsigned i = 0; i < 2; i++)
+    {
+      invar[!i] = false;
+
+      if (!branch_removable_p (targ_bb[i]))
+	continue;
+
+      /* Given a semi-invariant branch, if its opposite branch dominates
+	 loop latch, it and its following trace will only be executed in
+	 final iteration of loop, namely it is not part of repeated body
+	 of the loop.  Similar to the above case that the branch is loop
+	 exit, no need to split loop.  */
+      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
+	continue;
+
+      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
+      invar_checks++;
+    }
+
+  /* Both branches being invariant (handled by loop unswitch) or both being
+     variant is not what we want.  */
+  if (invar[0] ^ !invar[1])
+    return NULL;
+
+  /* Found a real loop-invariant condition, do nothing.  */
+  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
+    return NULL;
+
+  return EDGE_SUCC (cond_bb, (unsigned) invar[1]);
+}
+
+/* Calculate increased code size measured by estimated insn number if applying
+   loop split upon certain branch (BRANCH_EDGE) of a conditional statement.  */
+
+static int
+compute_added_num_insns (struct loop *loop, const_edge branch_edge)
+{
+  basic_block cond_bb = branch_edge->src;
+  unsigned branch = EDGE_SUCC (cond_bb, 1) == branch_edge;
+  basic_block opposite_bb = EDGE_SUCC (cond_bb, !branch)->dest;
+  basic_block *bbs = ((split_info *) loop->aux)->bbs;
+  int num = 0;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    {
+      /* Do not count basic blocks only in opposite branch.  */
+      if (dominated_by_p (CDI_DOMINATORS, bbs[i], opposite_bb))
+	continue;
+
+      num += estimate_num_insns_seq (bb_seq (bbs[i]), &eni_size_weights);
+    }
+
+  /* It is unnecessary to evaluate expression of the conditional statement
+     in new loop that contains only invariant branch.  This expression should
+     be constant value (either true or false).  Exclude code size of insns
+     that contribute to computation of the expression.  */
+
+  auto_vec<gimple *> worklist;
+  hash_set<gimple *> removed;
+  gimple *stmt = last_stmt (cond_bb);
+
+  worklist.safe_push (stmt);
+  removed.add (stmt);
+  num -= estimate_num_insns (stmt, &eni_size_weights);
+
+  do
+    {
+      ssa_op_iter opnd_iter;
+      use_operand_p opnd_p;
+
+      stmt = worklist.pop ();
+      FOR_EACH_PHI_OR_STMT_USE (opnd_p, stmt, opnd_iter, SSA_OP_USE)
+	{
+	  tree opnd = USE_FROM_PTR (opnd_p);
+
+	  if (TREE_CODE (opnd) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (opnd))
+	    continue;
+
+	  gimple *opnd_stmt = SSA_NAME_DEF_STMT (opnd);
+	  use_operand_p use_p;
+	  imm_use_iterator use_iter;
+
+	  if (removed.contains (opnd_stmt)
+	      || !flow_bb_inside_loop_p (loop, gimple_bb (opnd_stmt)))
+	    continue;
+
+	  FOR_EACH_IMM_USE_FAST (use_p, use_iter, opnd)
+	    {
+	      gimple *use_stmt = USE_STMT (use_p);
+
+	      if (!is_gimple_debug (use_stmt) && !removed.contains (use_stmt))
+		{
+		  opnd_stmt = NULL;
+		  break;
+		}
+	    }
+
+	  if (opnd_stmt)
+	    {
+	      worklist.safe_push (opnd_stmt);
+	      removed.add (opnd_stmt);
+	      num -= estimate_num_insns (opnd_stmt, &eni_size_weights);
+	    }
+	}
+    } while (!worklist.is_empty ());
+
+  gcc_assert (num >= 0);
+  return num;
+}
+
+/* Find out the loop-invariant branch of a conditional statement (COND) if it
+   has one, and check whether it is eligible and profitable to perform loop
+   split upon this branch in LOOP.  */
+
+static edge
+get_cond_branch_to_split_loop (struct loop *loop, gcond *cond)
+{
+  edge invar_branch = get_cond_invariant_branch (loop, cond);
+
+  if (!invar_branch)
+    return NULL;
+
+  profile_probability prob = invar_branch->probability;
+
+  /* When accurate profile information is available, and execution
+     frequency of the branch is too low, just let it go.  */
+  if (prob.reliable_p ())
+    {
+      int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB);
+
+      if (prob < profile_probability::always ().apply_scale (thres, 100))
+	return NULL;
+    }
+
+  /* Add a threshold for increased code size to disable loop split.  */
+  if (compute_added_num_insns (loop, invar_branch)
+      > PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))
+    return NULL;
+
+  return invar_branch;
+}
+
+/* Given a loop (LOOP1) with a loop-invariant branch (INVAR_BRANCH) of some
+   conditional statement, perform loop split transformation illustrated
+   as the following graph.
+
+               .-------T------ if (true) ------F------.
+               |                    .---------------. |
+               |                    |               | |
+               v                    |               v v
+          pre-header                |            pre-header
+               | .------------.     |                 | .------------.
+               | |            |     |                 | |            |
+               | v            |     |                 | v            |
+             header           |     |               header           |
+               |              |     |                 |              |
+       [ bool r = cond; ]     |     |                 |              |
+               |              |     |                 |              |
+      .---- if (r) -----.     |     |        .--- if (true) ---.     |
+      |                 |     |     |        |                 |     |
+  invariant             |     |     |    invariant             |     |
+      |                 |     |     |        |                 |     |
+      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
+               |              |    /                  |              |
+             stmts            |   /                 stmts            |
+               |              |  /                    |              |
+              / \             | /                    / \             |
+     .-------*   *       [ if (!r) ]        .-------*   *            |
+     |           |            |             |           |            |
+     |         latch          |             |         latch          |
+     |           |            |             |           |            |
+     |           '------------'             |           '------------'
+     '------------------------. .-----------'
+             loop1            | |                   loop2
+                              v v
+                             exits
+
+   In the graph, loop1 represents the part derived from original one, and
+   loop2 is duplicated using loop_version (), which corresponds to the part
+   of original one being split out.  In loop1, a new bool temporary (r)
+   is introduced to keep value of the condition result.  In original latch
+   edge of loop1, we insert a new conditional statement whose value comes
+   from previous temporary (r), one of its branches goes back to loop1 header
+   as a latch edge, and the other branch goes to loop2 pre-header as an entry
+   edge.  And also in loop2, we abandon the variant branch of the conditional
+   statement candidate by setting a constant bool condition, based on which
+   branch is semi-invariant.  */
+
+static bool
+do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
+{
+  basic_block cond_bb = invar_branch->src;
+  bool true_invar = !!(invar_branch->flags & EDGE_TRUE_VALUE);
+  gcond *cond = as_a <gcond *> (last_stmt (cond_bb));
+
+  gcc_assert (cond_bb->loop_father == loop1);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
+	      current_function_name (), loop1->num,
+	      true_invar ? "T" : "F", cond_bb->index);
+     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
+   }
+
+  initialize_original_copy_tables ();
+
+  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
+				     profile_probability::always (),
+				     profile_probability::never (),
+				     profile_probability::always (),
+				     profile_probability::always (),
+				     true);
+  if (!loop2)
+    {
+      free_original_copy_tables ();
+      return false;
+    }
+
+  /* Generate a bool type temporary to hold result of the condition.  */
+  tree tmp = make_ssa_name (boolean_type_node);
+  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
+  gimple *stmt = gimple_build_assign (tmp,
+				      gimple_cond_code (cond),
+				      gimple_cond_lhs (cond),
+				      gimple_cond_rhs (cond));
+
+  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
+  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
+  update_stmt (cond);
+
+  basic_block cond_bb_copy = get_bb_copy (cond_bb);
+  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
+
+  /* Replace the condition in loop2 with a bool constant to let PassManager
+     remove the variant branch after current pass completes.  */
+  if (true_invar)
+    gimple_cond_make_true (cond_copy);
+  else
+    gimple_cond_make_false (cond_copy);
+
+  update_stmt (cond_copy);
+
+  /* Insert a new conditional statement on latch edge of loop1.  This
+     statement acts as a switch to transfer execution from loop1 to loop2,
+     when loop1 enters into invariant state.  */
+  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
+  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
+  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
+					  NULL_TREE, NULL_TREE);
+
+  gsi = gsi_last_bb (break_bb);
+  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
+
+  edge to_loop1 = single_succ_edge (break_bb);
+  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
+
+  to_loop1->flags &= ~EDGE_FALLTHRU;
+  to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
+  to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
+
+  update_ssa (TODO_update_ssa);
+
+  /* Due to introduction of a control flow edge from loop1 latch to loop2
+     pre-header, we should update PHIs in loop2 to reflect this connection
+     between loop1 and loop2.  */
+  connect_loop_phis (loop1, loop2, to_loop2);
+
+  free_original_copy_tables ();
+
+  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
+
+  return true;
+}
+
+/* Traverse all conditional statements in LOOP, to find out a good candidate
+   upon which we can do loop split.  */
+
+static bool
+split_loop_on_cond (struct loop *loop)
+{
+  split_info *info = new split_info ();
+  basic_block *bbs = info->bbs = get_loop_body (loop);
+  bool do_split = false;
+
+  /* Allocate an area to keep temporary info, and associate its address
+     with loop aux field.  */
+  loop->aux = info;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    {
+      basic_block bb = bbs[i];
+
+      /* Only consider conditional statements executed at most once in each
+	 iteration of the loop, so skip statements in inner loops.  */
+      if ((bb->loop_father != loop) || (bb->flags & BB_IRREDUCIBLE_LOOP))
+	continue;
+
+      /* Actually this check is not a hard constraint.  With it, we can ensure
+	 conditional statement will always be executed in each iteration.  */
+      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+	continue;
+
+      gimple *last = last_stmt (bb);
+
+      if (!last || gimple_code (last) != GIMPLE_COND)
+	continue;
+
+      gcond *cond = as_a <gcond *> (last);
+      edge branch_edge = get_cond_branch_to_split_loop (loop, cond);
+
+      if (branch_edge)
+	{
+	  do_split_loop_on_cond (loop, branch_edge);
+	  do_split = true;
+	  break;
+	}
+    }
+
+  delete info;
+  loop->aux = NULL;
+
+  return do_split;
+}
+
 /* Main entry point.  Perform loop splitting on all suitable loops.  */
 
 static unsigned int
@@ -662,6 +1383,32 @@ tree_ssa_split_loops (void)
 	}
     }
 
+  if (changed)
+    {
+      cleanup_tree_cfg ();
+      changed = false;
+    }
+
+  /* Perform loop splitting for suitable if-conditions in all loops.  */
+  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
+    loop->aux = NULL;
+
+  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
+    {
+      if (loop->aux)
+        {
+	  loop_outer (loop)->aux = loop;
+	  continue;
+	}
+
+      if (!optimize_loop_for_size_p (loop)
+	  && split_loop_on_cond (loop))
+	{
+	  loop_outer (loop)->aux = loop;
+	  changed = true;
+	}
+    }
+
   FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
     loop->aux = NULL;
 
-- 
2.17.1
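
The header-PHI walk in ssa_semi_invariant_p above can be modelled outside GCC
with a small toy. This is only a sketch under stated assumptions: the types
DefKind, PhiArg, Def and Defs, and the functions header_phi_semi_invariant and
diagram_defs, are hypothetical and do not exist in GCC. The from_skipped flag
stands in for the dominated_by_p (CDI_DOMINATORS, e->src, skip_head) test, a
name missing from the map stands in for a constant or an out-of-loop
definition, and the step bound is a guard added by the sketch, not something
the patch does.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins for GIMPLE definitions (hypothetical, not GCC types).
enum class DefKind { Phi, Copy, Other };

struct PhiArg
{
  std::string value;   // operand name
  bool from_skipped;   // incoming edge starts inside the excluded branch
};

struct Def
{
  DefKind kind;
  std::vector<PhiArg> args;   // used when kind == Phi
  std::string copy_src;       // used when kind == Copy
};

// Definitions located inside the loop, keyed by the name they define.
using Defs = std::map<std::string, Def>;

// Walk from the latch value of a header PHI back towards NAME, following
// only PHI operands that do not come from the excluded branch.
bool
header_phi_semi_invariant (const Defs &defs, const std::string &name,
                           const std::string &latch_value)
{
  std::string from = latch_value;

  for (std::size_t steps = 0; from != name; ++steps)
    {
      // Guard added by this sketch: give up on chains that never reach NAME.
      if (steps > defs.size ())
        return false;

      // A constant or a definition outside the loop brings in a new value.
      auto it = defs.find (from);
      if (it == defs.end ())
        return false;

      const Def &def = it->second;
      std::string next;

      if (def.kind == DefKind::Phi)
        {
          for (const PhiArg &arg : def.args)
            {
              if (arg.from_skipped)
                continue;
              // More than one live source operand: variant.
              if (!next.empty ())
                return false;
              next = arg.value;
            }
        }
      else if (def.kind == DefKind::Copy)
        next = def.copy_src;

      // Any other kind of definition introduces a new value.
      if (next.empty ())
        return false;
      from = next;
    }
  return true;
}

// In-loop definitions from the diagram: x_3 = PHI <x_1, x_2>, x_2 = ...
Defs
diagram_defs (bool skip_false_branch)
{
  Defs defs;
  defs["x_3"] = Def{DefKind::Phi,
                    {{"x_1", false}, {"x_2", skip_false_branch}},
                    ""};
  defs["x_2"] = Def{DefKind::Other, {}, ""};
  return defs;
}
```

With the false branch excluded, header_phi_semi_invariant (diagram_defs (true),
"x_1", "x_3") reaches x_1 through x_3; with nothing excluded, x_3 has two live
operands and the walk reports the name as variant.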

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-07-15  2:34         ` Ping agian: " Feng Xue OS
  2019-07-29 20:30           ` Michael Matz
@ 2019-09-12 11:10           ` Richard Biener
  2019-09-12 13:52             ` Feng Xue OS
  1 sibling, 1 reply; 31+ messages in thread
From: Richard Biener @ 2019-09-12 11:10 UTC (permalink / raw)
  To: Feng Xue OS; +Cc: Michael Matz, gcc-patches

On Mon, Jul 15, 2019 at 4:20 AM Feng Xue OS <fxue@os.amperecomputing.com> wrote:
>
> Some time has passed, so I am pinging again. I made this patch because it gives
> a 7% performance benefit in a real application. For convenience, the
> optimization to be implemented is listed again below. I hope for your comments
> on the patch, or design suggestions. Thanks!

Replying again to the very first post since it contains the figure below.

> Suppose a loop as:
>
>     void f (std::map<int, int> m)
>     {
>         for (auto it = m.begin (); it != m.end (); ++it) {
>             /* if (b) is semi-invariant. */
>             if (b) {
>                 b = do_something();    /* Has effect on b */
>             } else {
>                                                         /* No effect on b */
>             }
>             statements;                      /* Also no effect on b */
>         }
>     }
>
> A transformation, kind of loop split, could be:
>
>     void f (std::map<int, int> m)
>     {
>         for (auto it = m.begin (); it != m.end (); ++it) {
>             if (b) {
>                 b = do_something();
>             } else {
>                 ++it;
>                 statements;
>                 break;
>             }
>             statements;
>         }
>
>         for (; it != m.end (); ++it) {
>             statements;
>         }
>     }

So if you consider approaching this from unswitching instead, we'd unswitch
it on if (b) but treat the condition as constant only in the 'false' path,
thus the transformed code would look like the following.  I believe
implementing this in the existing unswitching pass involves a lot less code
than putting it into the splitting pass, but it would catch exactly the same
cases?

  if (b)
    {
      for (auto it = m.begin (); it != m.end (); ++it) {
          /* if (b) is non-invariant. */
          if (b) {
              b = do_something();    /* Has effect on b */
          } else {
                                     /* No effect on b */
          }
          statements;                /* Also no effect on b */
      }
    }
  else
    {
      for (auto it = m.begin (); it != m.end (); ++it) {
          /* if (b) is invariant. */
          if (false) {
              b = do_something();    /* Has effect on b */
          } else {
                                     /* No effect on b */
          }
          statements;                /* Also no effect on b */
      }
    }
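
The three loop shapes discussed in this thread (the original loop, the split
form from the first post, and the unswitched form above) can be compared with
a small standalone program. This is a minimal sketch, not part of the patch:
do_something and its budget counter are hypothetical stand-ins, a
std::vector replaces the std::map, and "statements" is modelled as a running
sum. It only checks that the three forms compute the same result.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for do_something (): uses up one unit of budget and
// reports whether any budget is left.
static bool
do_something (int &budget)
{
  return --budget > 0;
}

// Original loop: "if (b)" is semi-invariant, only its true branch changes b.
std::pair<int, int>
original_loop (const std::vector<int> &v, bool b, int budget)
{
  int sum = 0;
  for (auto it = v.begin (); it != v.end (); ++it)
    {
      if (b)
        b = do_something (budget);
      sum += *it;               /* "statements": no effect on b */
    }
  return {sum, budget};
}

// Split form from the first post: leave the first loop once b is false.
std::pair<int, int>
split_loop (const std::vector<int> &v, bool b, int budget)
{
  int sum = 0;
  auto it = v.begin ();
  for (; it != v.end (); ++it)
    {
      if (b)
        b = do_something (budget);
      else
        {
          sum += *it;
          ++it;
          break;
        }
      sum += *it;
    }
  for (; it != v.end (); ++it)
    sum += *it;                 /* no test of b any more */
  return {sum, budget};
}

// Unswitched form: the condition is treated as constant only on the
// 'false' path.
std::pair<int, int>
unswitched_loop (const std::vector<int> &v, bool b, int budget)
{
  int sum = 0;
  if (b)
    {
      for (auto it = v.begin (); it != v.end (); ++it)
        {
          if (b)
            b = do_something (budget);
          sum += *it;
        }
    }
  else
    {
      for (auto it = v.begin (); it != v.end (); ++it)
        sum += *it;             /* "if (false)" branch dropped */
    }
  return {sum, budget};
}
```

Calling the three functions with the same vector, initial b and budget is
expected to return equal pairs, whether b starts out true or false.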


> If "statements" contains nothing, the second loop becomes an empty one, which can be removed.
> And if "statements" are straight line instructions, we get an opportunity to vectorize the second loop.
>
> Feng
>
>
> ________________________________
> From: Feng Xue OS
> Sent: Tuesday, June 18, 2019 3:00 PM
> To: Richard Biener; Michael Matz
> Cc: gcc-patches@gcc.gnu.org
> Subject: Ping: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)
>
> Richard & Michael,
>
>    I made some adjustments on coding style and added test cases for this version.
>
>    Would you please take a look at the patch? It is a little bit long and might steal some
>    of your time.
>
> Thanks a lot.
>
> ----
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 9a46f93d89d..2334b184945 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,23 @@
> +2019-06-18  Feng Xue <fxue@os.amperecomputing.com>
> +
> +       PR tree-optimization/89134
> +       * doc/invoke.texi (max-cond-loop-split-insns): Document new --params.
> +       (min-cond-loop-split-prob): Likewise.
> +       * params.def: Add max-cond-loop-split-insns, min-cond-loop-split-prob.
> +       * passes.def (pass_cond_loop_split): New pass.
> +       * timevar.def (TV_COND_LOOP_SPLIT): New time variable.
> +       * tree-pass.h (make_pass_cond_loop_split): New declaration.
> +       * tree-ssa-loop-split.c (split_info): New class.
> +       (find_vdef_in_loop, vuse_semi_invariant_p): New functions.
> +       (ssa_semi_invariant_p, stmt_semi_invariant_p): Likewise.
> +       (branch_removable_p, get_cond_invariant_branch): Likewise.
> +       (is_cond_in_hidden_loop, compute_added_num_insns): Likewise.
> +       (can_split_loop_on_cond, mark_cond_to_split_loop): Likewise.
> +       (split_loop_for_cond, tree_ssa_split_loops_for_cond): Likewise.
> +       (pass_data_cond_loop_split): New variable.
> +       (pass_cond_loop_split): New class.
> +       (make_pass_cond_loop_split): New function.
> +
>  2019-06-18  Kewen Lin  <linkw@gcc.gnu.org>
>
>          PR middle-end/80791
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index eaef4cd63d2..0427fede3d6 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11352,6 +11352,14 @@ The maximum number of branches unswitched in a single loop.
>  @item lim-expensive
>  The minimum cost of an expensive expression in the loop invariant motion.
>
> +@item max-cond-loop-split-insns
> +The maximum number of insns to be increased due to loop split on
> +semi-invariant condition statement.
> +
> +@item min-cond-loop-split-prob
> +The minimum threshold for probability of semi-invariant condition
> +statement to trigger loop split.
> +
>  @item iv-consider-all-candidates-bound
>  Bound on number of candidates for induction variables, below which
>  all candidates are considered for each use in induction variable
> diff --git a/gcc/params.def b/gcc/params.def
> index 0db60951413..5384f7d1c4d 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
>          "The maximum number of unswitchings in a single loop.",
>          3, 0, 0)
>
> +/* The maximum number of increased insns due to loop split on semi-invariant
> +   condition statement.  */
> +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> +       "max-cond-loop-split-insns",
> +       "The maximum number of insns to be increased due to loop split on "
> +       "semi-invariant condition statement.",
> +       100, 0, 0)
> +
> +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> +       "min-cond-loop-split-prob",
> +       "The minimum threshold for probability of semi-invariant condition "
> +       "statement to trigger loop split.",
> +       30, 0, 100)
> +
>  /* The maximum number of insns in loop header duplicated by the copy loop
>     headers pass.  */
>  DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
> diff --git a/gcc/passes.def b/gcc/passes.def
> index ad2efabd385..bb32b88738e 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -261,6 +261,7 @@ along with GCC; see the file COPYING3.  If not see
>            NEXT_PASS (pass_tree_unswitch);
>            NEXT_PASS (pass_scev_cprop);
>            NEXT_PASS (pass_loop_split);
> +         NEXT_PASS (pass_cond_loop_split);
>            NEXT_PASS (pass_loop_versioning);
>            NEXT_PASS (pass_loop_jam);
>            /* All unswitching, final value replacement and splitting can expose
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 27a522e0140..9aa069e5c29 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,8 @@
> +2019-06-18  Feng Xue  <fxue@os.amperecomputing.com>
> +
> +       * gcc.dg/tree-ssa/loop-cond-split-1.c: New test.
> +       * g++.dg/tree-ssa/loop-cond-split-1.C: New test.
> +
>  2019-06-17  Jakub Jelinek  <jakub@redhat.com>
>
>          * gcc.dg/vect/vect-simd-8.c: New test.
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> new file mode 100644
> index 00000000000..df269c5ee44
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-cond_lsplit-details" } */
> +
> +#include <string>
> +#include <map>
> +
> +using namespace std;
> +
> +class  A
> +{
> +public:
> +  bool empty;
> +  void set (string s);
> +};
> +
> +class  B
> +{
> +  map<int, string> m;
> +  void f ();
> +};
> +
> +extern A *ga;
> +
> +void B::f ()
> +{
> +  for (map<int, string>::iterator iter = m.begin (); iter != m.end (); ++iter)
> +    {
> +      if (ga->empty)
> +        ga->set (iter->second);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "cond_lsplit" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> new file mode 100644
> index 00000000000..a0eb7a26ad5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-cond_lsplit-details" } */
> +
> +__attribute__((pure)) __attribute__((noinline)) int inc (int i)
> +{
> +  return i + 1;
> +}
> +
> +extern int do_something (void);
> +extern int b;
> +
> +void test(int n)
> +{
> +  int i;
> +
> +  for (i = 0; i < n; i = inc (i))
> +    {
> +      if (b)
> +        b = do_something();
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "cond_lsplit" } } */
> diff --git a/gcc/timevar.def b/gcc/timevar.def
> index 13cb470b688..5a2a80a29f7 100644
> --- a/gcc/timevar.def
> +++ b/gcc/timevar.def
> @@ -188,6 +188,7 @@ DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
>  DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
>  DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
>  DEFTIMEVAR (TV_LOOP_SPLIT            , "loop splitting")
> +DEFTIMEVAR (TV_COND_LOOP_SPLIT       , "loop splitting for conditions")
>  DEFTIMEVAR (TV_LOOP_JAM              , "unroll and jam")
>  DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
>  DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 3a0b3805d24..cdb7ef3c9f2 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -367,6 +367,7 @@ extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_linterchange (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_cond_loop_split (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_loop_jam (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 999c9a30366..7239d0cfb00 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -32,7 +32,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-loop.h"
>  #include "tree-ssa-loop-manip.h"
>  #include "tree-into-ssa.h"
> +#include "tree-inline.h"
>  #include "cfgloop.h"
> +#include "params.h"
>  #include "tree-scalar-evolution.h"
>  #include "gimple-iterator.h"
>  #include "gimple-pretty-print.h"
> @@ -40,7 +42,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple-fold.h"
>  #include "gimplify-me.h"
>
> -/* This file implements loop splitting, i.e. transformation of loops like
> +/* This file implements two kinds of loop splitting.
> +
> +   One transformation of loops like:
>
>     for (i = 0; i < 100; i++)
>       {
> @@ -670,6 +674,782 @@ tree_ssa_split_loops (void)
>    return 0;
>  }
>
> +
> +/* Another transformation of loops like:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))
> +         a_j = ...;  // change at least one a_j
> +       else
> +         S;          // not change any a_j
> +     }
> +
> +   into:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))
> +         a_j = ...;
> +       else
> +         {
> +           S;
> +           i = NEXT ();
> +           break;
> +         }
> +     }
> +
> +   for (; CHECK (i); i = NEXT ())
> +     {
> +       S;
> +     }
> +
> +   */
> +
> +/* Data structure to hold temporary information during loop split upon
> +   semi-invariant conditional statement.  */
> +class split_info {
> +public:
> +  /* Array of all basic blocks in a loop, returned by get_loop_body().  */
> +  basic_block *bbs;
> +
> +  /* All memory store/clobber statements in a loop.  */
> +  auto_vec<gimple *> stores;
> +
> +  /* Whether above memory stores vector has been filled.  */
> +  bool set_stores;
> +
> +  /* Semi-invariant conditional statement, upon which to split loop.  */
> +  gcond *cond;
> +
> +  split_info () : bbs (NULL),  set_stores (false), cond (NULL) { }
> +
> +  ~split_info ()
> +    {
> +      if (bbs)
> +       free (bbs);
> +    }
> +};
> +
> +/* Find all statements with memory-write effect in LOOP, including memory
> +   store and non-pure function call, and keep those in a vector.  This work
> +   is only done one time, for the vector should be constant during analysis
> +   stage of semi-invariant condition.  */
> +
> +static void
> +find_vdef_in_loop (struct loop *loop)
> +{
> +  split_info *info = (split_info *) loop->aux;
> +  gphi *vphi = get_virtual_phi (loop->header);
> +
> +  /* Indicate memory store vector has been filled.  */
> +  info->set_stores = true;
> +
> +  /* If loop contains memory operation, there must be a virtual PHI node in
> +     loop header basic block.  */
> +  if (vphi == NULL)
> +    return;
> +
> +  /* All virtual SSA names inside the loop are connected to be a cyclic
> +     graph via virtual PHI nodes.  The virtual PHI node in loop header just
> +     links the first and the last virtual SSA names, by using the last as
> +     PHI operand to define the first.  */
> +  const edge latch = loop_latch_edge (loop);
> +  const tree first = gimple_phi_result (vphi);
> +  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
> +
> +  /* The virtual SSA cyclic graph might consist of only one SSA name, who
> +     is defined by itself.
> +
> +       .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
> +
> +     This means the loop contains only memory loads, so we can skip it.  */
> +  if (first == last)
> +    return;
> +
> +  auto_vec<gimple *> others;
> +  auto_vec<tree> worklist;
> +  auto_bitmap visited;
> +
> +  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
> +  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
> +  worklist.safe_push (last);
> +
> +  do
> +    {
> +      tree vuse = worklist.pop ();
> +      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
> +
> +      /* We mark the first and last SSA names as visited at the beginning,
> +        and reversely start the process from the last SSA name towards the
> +        first, which ensures that this do-while will not touch SSA names
> +        defined outside of the loop.  */
> +      gcc_assert (gimple_bb (stmt)
> +                 && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
> +
> +      if (gimple_code (stmt) == GIMPLE_PHI)
> +       {
> +         gphi *phi = as_a <gphi *> (stmt);
> +
> +         for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> +           {
> +             tree arg = gimple_phi_arg_def (stmt, i);
> +
> +             if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
> +               worklist.safe_push (arg);
> +           }
> +       }
> +      else
> +       {
> +         tree prev = gimple_vuse (stmt);
> +
> +         /* Non-pure call statement is conservatively assumed to impact all
> +            memory locations.  So place call statements ahead of other memory
> +            stores in the vector with an idea of using them as shortcut
> +            terminators to memory alias analysis.  */
> +         if (gimple_code (stmt) == GIMPLE_CALL)
> +           info->stores.safe_push (stmt);
> +         else
> +           others.safe_push (stmt);
> +
> +         if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
> +           worklist.safe_push (prev);
> +       }
> +    } while (!worklist.is_empty ());
> +
> +    info->stores.safe_splice (others);
> +}
> +
> +
> +/* Given STMT, memory load or pure call statement, check whether it is impacted
> +   by some memory store in LOOP, excluding trace starting from SKIP_HEAD (the
> +   trace is composed of SKIP_HEAD and those basic block dominated by it, always
> +   corresponds to one branch of a conditional statement).  If SKIP_HEAD is
> +   NULL, all basic blocks of LOOP are checked.  */
> +
> +static bool
> +vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                      const_basic_block skip_head)
> +{
> +  split_info *info = (split_info *) loop->aux;
> +
> +  /* Collect memory store/clobber statements if not done yet.  */
> +  if (!info->set_stores)
> +    find_vdef_in_loop (loop);
> +
> +  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
> +  ao_ref ref;
> +  gimple *store;
> +  unsigned i;
> +
> +  ao_ref_init (&ref, rhs);
> +
> +  FOR_EACH_VEC_ELT (info->stores, i, store)
> +    {
> +      /* Skip basic blocks dominated by SKIP_HEAD, if non-NULL.  */
> +      if (skip_head
> +         && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
> +       continue;
> +
> +      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
> +       return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Forward declaration.  */
> +
> +static bool
> +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                      const_basic_block skip_head);
> +
> +/* Suppose one condition branch, led by SKIP_HEAD, is no longer executed from
> +   a certain iteration of LOOP onwards; check whether an SSA name (NAME)
> +   remains unchanged in the next iteration.  We call this characteristic
> +   semi-invariantness.  SKIP_HEAD might be NULL; if so, nothing is excluded
> +   and all basic blocks and control flows in the loop are considered.  If
> +   non-NULL, the SSA name to check is supposed to be defined before
> +   SKIP_HEAD.  */
> +
> +static bool
> +ssa_semi_invariant_p (struct loop *loop, const tree name,
> +                     const_basic_block skip_head)
> +{
> +  gimple *def = SSA_NAME_DEF_STMT (name);
> +  const_basic_block def_bb = gimple_bb (def);
> +
> +  /* An SSA name defined outside a loop is definitely semi-invariant.  */
> +  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
> +    return true;
> +
> +  if (gimple_code (def) == GIMPLE_PHI)
> +    {
> +      /* For PHI node that is not in loop header, its source operands should
> +        be defined inside the loop, which are seen as loop variant.  */
> +      if (def_bb != loop->header || !skip_head)
> +       return false;
> +
> +      const_edge latch = loop_latch_edge (loop);
> +      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
> +
> +      /* A PHI node in loop header contains two source operands, one is
> +        initial value, the other is the copy of last iteration through loop
> +        latch, we call it latch value.  From the PHI node to definition
> +        of latch value, if excluding branch trace from SKIP_HEAD, there
> +        is no definition of other version of same variable, SSA name defined
> +        by the PHI node is semi-invariant.
> +
> +                         loop entry
> +                              |     .--- latch ---.
> +                              |     |             |
> +                              v     v             |
> +                  x_1 = PHI <x_0,  x_3>           |
> +                           |                      |
> +                           v                      |
> +              .------- if (cond) -------.         |
> +              |                         |         |
> +              |                     [ SKIP ]      |
> +              |                         |         |
> +              |                     x_2 = ...     |
> +              |                         |         |
> +              '---- T ---->.<---- F ----'         |
> +                           |                      |
> +                           v                      |
> +                  x_3 = PHI <x_1, x_2>            |
> +                           |                      |
> +                           '----------------------'
> +
> +       Suppose that in a certain iteration, execution flow in the above graph
> +       goes through the true branch, which means that one source value
> +       defining x_3, the one from the false branch (x_2), is skipped.  Then
> +       x_3 only comes from x_1, and x_1 in the next iteration is defined by
> +       x_3, so we know that x_1 will never change if COND always chooses the
> +       true branch from then on.  */
> +
> +      while (from != name)
> +       {
> +         /* A new value comes from a CONSTANT.  */
> +         if (TREE_CODE (from) != SSA_NAME)
> +           return false;
> +
> +         gimple *stmt = SSA_NAME_DEF_STMT (from);
> +         const_basic_block bb = gimple_bb (stmt);
> +
> +         /* A new value comes from outside of loop.  */
> +         if (!bb || !flow_bb_inside_loop_p (loop, bb))
> +           return false;
> +
> +         from = NULL_TREE;
> +
> +         if (gimple_code (stmt) == GIMPLE_PHI)
> +           {
> +             gphi *phi = as_a <gphi *> (stmt);
> +
> +             for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> +               {
> +                 const_edge e = gimple_phi_arg_edge (phi, i);
> +
> +                 /* Do not consider redefinitions in excluded basic blocks.  */
> +                 if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
> +                   {
> +                     /* More than one source operand can provide a value
> +                        to the SSA name, so it is variant.  */
> +                     if (from)
> +                       return false;
> +
> +                     from = gimple_phi_arg_def (phi, i);
> +                   }
> +               }
> +           }
> +         else if (gimple_code (stmt) == GIMPLE_ASSIGN)
> +           {
> +             /* For simple value copy, check its rhs instead.  */
> +             if (gimple_assign_ssa_name_copy_p (stmt))
> +               from = gimple_assign_rhs1 (stmt);
> +           }
> +
> +         /* Any other kind of definition is deemed to introduce a new value
> +            to the SSA name.  */
> +         if (!from)
> +           return false;
> +       }
> +       return true;
> +    }
> +
> +  /* Value originated from volatile memory load or return of normal (non-
> +     const/pure) call should not be treated as constant in each iteration.  */
> +  if (gimple_has_side_effects (def))
> +    return false;
> +
> +  /* Check if any memory store may kill memory load at this place.  */
> +  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
> +    return false;
> +
> +  /* Check operands of definition statement of the SSA name.  */
> +  return stmt_semi_invariant_p (loop, def, skip_head);
> +}
> +
> +/* Check whether STMT is semi-invariant in LOOP, iff all its operands are
> +   semi-invariant.  Trace composed of basic block SKIP_HEAD and basic blocks
> +   dominated by it are excluded from the loop.  */
> +
> +static bool
> +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                      const_basic_block skip_head)
> +{
> +  ssa_op_iter iter;
> +  tree use;
> +
> +  /* Although operand of a statement might be SSA name, CONSTANT or VARDECL,
> +     here we only need to check SSA name operands.  This is because check on
> +     VARDECL operands, which involve memory loads, must have been done
> +     prior to invocation of this function in vuse_semi_invariant_p.  */
> +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
> +    {
> +      if (!ssa_semi_invariant_p (loop, use, skip_head))
> +       return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Determine, when a conditional statement never transfers execution to one of
> +   its branches, whether we can remove the branch's leading basic block
> +   (BRANCH_BB) and those basic blocks dominated by BRANCH_BB.  */
> +
> +static bool
> +branch_removable_p (basic_block branch_bb)
> +{
> +  if (single_pred_p (branch_bb))
> +    return true;
> +
> +  edge e;
> +  edge_iterator ei;
> +
> +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
> +    {
> +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
> +       continue;
> +
> +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
> +       continue;
> +
> +       /* The branch can be reached from opposite branch, or from some
> +         statement not dominated by the conditional statement.  */
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Find out which branch of a conditional statement (COND) is invariant in the
> +   execution context of LOOP.  That is: once the branch is selected in certain
> +   iteration of the loop, any operand that contributes to computation of the
> +   conditional statement remains unchanged in all following iterations.  */
> +
> +static int
> +get_cond_invariant_branch (struct loop *loop, gcond *cond)
> +{
> +  basic_block cond_bb = gimple_bb (cond);
> +  basic_block targ_bb[2];
> +  bool invar[2];
> +  unsigned invar_checks;
> +
> +  for (unsigned i = 0; i < 2; i++)
> +    {
> +      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
> +
> +      /* One branch directs to loop exit, no need to perform loop split upon
> +        this conditional statement.  Firstly, it is trivial if the exit branch
> +        is semi-invariant, for the statement is just to break loop.  Secondly,
> +        if the opposite branch is semi-invariant, it means that the statement
> +        is real loop-invariant, which is covered by loop unswitch.  */
> +      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
> +       return -1;
> +    }
> +
> +  invar_checks = 0;
> +
> +  for (unsigned i = 0; i < 2; i++)
> +    {
> +      invar[!i] = false;
> +
> +      if (!branch_removable_p (targ_bb[i]))
> +       continue;
> +
> +      /* Given a semi-invariant branch, if its opposite branch dominates
> +        loop latch, it and its following trace will only be executed in
> +        final iteration of loop, namely it is not part of repeated body
> +        of the loop.  Similar to the above case that the branch is loop
> +        exit, no need to split loop.  */
> +      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
> +       continue;
> +
> +      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
> +      invar_checks++;
> +    }
> +
> +  /* Both branches being invariant (handled by loop unswitch) or both
> +     being variant is not what we want.  */
> +  if (invar[0] ^ !invar[1])
> +    return -1;
> +
> +  /* Found a real loop-invariant condition, do nothing.  */
> +  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
> +    return -1;
> +
> +  return invar[1];
> +}
> +
> +/* Given a conditional statement in LOOP, whose basic block is COND_BB,
> +   suppose its execution only goes through one of its branches, whose index
> +   is specified by BRANCH.  Return TRUE if this statement still executes
> +   multiple times in one iteration of LOOP, in that the statement belongs to
> +   a nested unrecognized loop.  */
> +
> +static bool
> +is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb,
> +                       int branch)
> +{
> +  basic_block branch_bb = EDGE_SUCC (cond_bb, branch)->dest;
> +
> +  if (cond_bb == loop->header || branch_bb == loop->latch)
> +    return false;
> +
> +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> +  auto_vec<basic_block> worklist;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    bbs[i]->flags &= ~BB_REACHABLE;
> +
> +  /* Mark latch basic block as visited so as to terminate reachability
> +     traversal.  */
> +  loop->latch->flags |= BB_REACHABLE;
> +
> +  gcc_assert (flow_bb_inside_loop_p (loop, branch_bb));
> +
> +  /* Start from the specified branch, the opposite branch is ignored for it
> +     will not be executed.  */
> +  branch_bb->flags |= BB_REACHABLE;
> +  worklist.safe_push (branch_bb);
> +
> +  do
> +    {
> +      basic_block bb = worklist.pop ();
> +      edge e;
> +      edge_iterator ei;
> +
> +      FOR_EACH_EDGE (e, ei, bb->succs)
> +       {
> +         basic_block succ_bb = e->dest;
> +
> +         if (succ_bb == cond_bb)
> +           return true;
> +
> +         if (!flow_bb_inside_loop_p (loop, succ_bb))
> +           continue;
> +
> +         if (succ_bb->flags & BB_REACHABLE)
> +           continue;
> +
> +         succ_bb->flags |= BB_REACHABLE;
> +         worklist.safe_push (succ_bb);
> +       }
> +    } while (!worklist.is_empty ());
> +
> +  return false;
> +}
> +
> +
> +/* Calculate increased code size measured by estimated insn number if applying
> +   loop split upon certain branch (BRANCH) of a conditional statement whose
> +   basic block is COND_BB.  */
> +
> +static int
> +compute_added_num_insns (struct loop *loop, const_basic_block cond_bb,
> +                        int branch)
> +{
> +  const_basic_block targ_bb_var = EDGE_SUCC (cond_bb, !branch)->dest;
> +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> +  int num = 0;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    {
> +      /* Do not count basic blocks only in opposite branch.  */
> +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var))
> +       continue;
> +
> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi);
> +          gsi_next (&gsi))
> +       num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);
> +    }
> +
> +  return num;
> +}
> +
> +/* Return true if it is eligible and profitable to perform loop split upon
> +   a conditional statement COND in LOOP.  */
> +
> +static bool
> +can_split_loop_on_cond (struct loop *loop, gcond *cond)
> +{
> +  int branch = get_cond_invariant_branch (loop, cond);
> +
> +  if (branch < 0)
> +    return false;
> +
> +  basic_block cond_bb = gimple_bb (cond);
> +  profile_probability prob = EDGE_SUCC (cond_bb, branch)->probability;
> +
> +  /* When accurate profile information is available, and execution
> +     frequency of the branch is too low, just let it go.  */
> +  if (prob.reliable_p ())
> +    {
> +      int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB);
> +
> +      if (prob < profile_probability::always ().apply_scale (thres, 100))
> +       return false;
> +    }
> +
> +  /* Add a threshold for increased code size to disable loop split.  */
> +  if (compute_added_num_insns (loop, cond_bb, branch) >
> +      PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))
> +    return false;
> +
> +  /* Skip conditional statement that is inside a nested unrecognized loop.  */
> +  if (is_cond_in_hidden_loop (loop, cond_bb, branch))
> +    return false;
> +
> +  /* Temporarily keep branch index in conditional statement.  */
> +  gimple_set_plf (cond, GF_PLF_1, branch);
> +  return true;
> +}
> +
> +/* Traverse all conditional statements in LOOP, to find out a good candidate
> +   upon which we can do loop split.  */
> +
> +static bool
> +mark_cond_to_split_loop (struct loop *loop)
> +{
> +  split_info *info = new split_info ();
> +  basic_block *bbs = info->bbs = get_loop_body (loop);
> +
> +  /* Allocate an area to keep temporary info, and associate its address
> +     with loop aux field.  */
> +  loop->aux = info;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    {
> +      basic_block bb = bbs[i];
> +
> +      /* We only consider conditional statements that are executed at most
> +        once in each iteration of the loop.  So skip statements in inner
> +        loops.  */
> +      if (bb->loop_father != loop)
> +       continue;
> +
> +      /* This check is not strictly required.  With it, we can ensure the
> +        conditional statement is always executed in each iteration.  */
> +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> +       continue;
> +
> +      gimple *last = last_stmt (bb);
> +
> +      if (!last || gimple_code (last) != GIMPLE_COND)
> +       continue;
> +
> +      gcond *cond = as_a <gcond *> (last);
> +
> +      if (can_split_loop_on_cond (loop, cond))
> +       {
> +         info->cond = cond;
> +         return true;
> +       }
> +    }
> +
> +  delete info;
> +  loop->aux = NULL;
> +
> +  return false;
> +}
> +
> +/* Given a loop (LOOP1) with a chosen conditional statement candidate, perform
> +   loop split transformation illustrated as the following graph.
> +
> +               .-------T------ if (true) ------F------.
> +               |                    .---------------. |
> +               |                    |               | |
> +               v                    |               v v
> +          pre-header                |            pre-header
> +               | .------------.     |                 | .------------.
> +               | |            |     |                 | |            |
> +               | v            |     |                 | v            |
> +             header           |     |               header           |
> +               |              |     |                 |              |
> +       [ bool r = cond; ]     |     |                 |              |
> +               |              |     |                 |              |
> +      .---- if (r) -----.     |     |        .--- if (true) ---.     |
> +      |                 |     |     |        |                 |     |
> +  invariant             |     |     |    invariant             |     |
> +      |                 |     |     |        |                 |     |
> +      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
> +               |              |    /                  |              |
> +             stmts            |   /                 stmts            |
> +               |              |  /                    |              |
> +              / \             | /                    / \             |
> +     .-------*   *       [ if (!r) ]        .-------*   *            |
> +     |           |            |             |           |            |
> +     |         latch          |             |         latch          |
> +     |           |            |             |           |            |
> +     |           '------------'             |           '------------'
> +     '------------------------. .-----------'
> +             loop1            | |                   loop2
> +                              v v
> +                             exits
> +
> +   In the graph, loop1 represents the part derived from the original loop,
> +   and loop2 is duplicated using loop_version (), which corresponds to the
> +   part of the original loop being split out.  In loop1, a new bool
> +   temporary (r) is introduced to keep the value of the condition result.
> +   On the original latch edge of loop1, we insert a new conditional statement
> +   whose value comes from the temporary (r); one of its branches goes back
> +   to the loop1 header as a latch edge, and the other branch goes to the
> +   loop2 pre-header as an entry edge.  In loop2, we abandon the variant
> +   branch of the conditional statement candidate by setting a constant bool
> +   condition, based on which branch is semi-invariant.  */
> +
> +static bool
> +split_loop_for_cond (struct loop *loop1)
> +{
> +  split_info *info = (split_info *) loop1->aux;
> +  gcond *cond = info->cond;
> +  basic_block cond_bb = gimple_bb (cond);
> +  int branch = gimple_plf (cond, GF_PLF_1);
> +  bool true_invar = !!(EDGE_SUCC (cond_bb, branch)->flags & EDGE_TRUE_VALUE);
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +   {
> +     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
> +             current_function_name (), loop1->num,
> +             true_invar ? "T" : "F", cond_bb->index);
> +     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
> +   }
> +
> +  initialize_original_copy_tables ();
> +
> +  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> +                                    profile_probability::always (),
> +                                    profile_probability::never (),
> +                                    profile_probability::always (),
> +                                    profile_probability::always (),
> +                                    true);
> +  if (!loop2)
> +    {
> +      free_original_copy_tables ();
> +      return false;
> +    }
> +
> +  /* Generate a bool type temporary to hold result of the condition.  */
> +  tree tmp = make_ssa_name (boolean_type_node);
> +  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> +  gimple *stmt = gimple_build_assign (tmp,
> +                                     gimple_cond_code (cond),
> +                                     gimple_cond_lhs (cond),
> +                                     gimple_cond_rhs (cond));
> +
> +  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
> +  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
> +  update_stmt (cond);
> +
> +  basic_block cond_bb_copy = get_bb_copy (cond_bb);
> +  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
> +
> +  /* Replace the condition in loop2 with a bool constant to let PassManager
> +     remove the variant branch after current pass completes.  */
> +  if (true_invar)
> +    gimple_cond_make_true (cond_copy);
> +  else
> +    gimple_cond_make_false (cond_copy);
> +
> +  update_stmt (cond_copy);
> +
> +  /* Insert a new conditional statement on latch edge of loop1.  This
> +     statement acts as a switch to transfer execution from loop1 to loop2,
> +     when loop1 enters into invariant state.  */
> +  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
> +  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
> +  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
> +                                         NULL_TREE, NULL_TREE);
> +
> +  gsi = gsi_last_bb (break_bb);
> +  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
> +
> +  edge to_loop1 = single_succ_edge (break_bb);
> +  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
> +
> +  to_loop1->flags &= ~EDGE_FALLTHRU;
> +  to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
> +  to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
> +
> +  update_ssa (TODO_update_ssa);
> +
> +  /* Due to introduction of a control flow edge from loop1 latch to loop2
> +     pre-header, we should update PHIs in loop2 to reflect this connection
> +     between loop1 and loop2.  */
> +  connect_loop_phis (loop1, loop2, to_loop2);
> +
> +  free_original_copy_tables ();
> +
> +  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
> +
> +  return true;
> +}
> +
> +/* Main entry point to perform loop splitting for suitable if-conditions
> +   in all loops.  */
> +
> +static unsigned int
> +tree_ssa_split_loops_for_cond (void)
> +{
> +  struct loop *loop;
> +  auto_vec<struct loop *> loop_list;
> +  bool changed = false;
> +  unsigned i;
> +
> +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> +    loop->aux = NULL;
> +
> +  /* Go through all loops starting from innermost.  */
> +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> +    {
> +      /* Put loop in a list if found a conditional statement candidate in it.
> +        This is stage for analysis, not change anything of the function.  */
> +      if (!loop->aux
> +         && !optimize_loop_for_size_p (loop)
> +         && mark_cond_to_split_loop (loop))
> +       loop_list.safe_push (loop);
> +
> +      /* If any of our inner loops was split, don't split us,
> +        and mark our containing loop as having had splits as well.  */
> +      loop_outer (loop)->aux = loop->aux;
> +    }
> +
> +  FOR_EACH_VEC_ELT (loop_list, i, loop)
> +    {
> +      /* Extract selected loop and perform loop split.  This is stage for
> +        transformation.  */
> +      changed |= split_loop_for_cond (loop);
> +
> +      delete (split_info *) loop->aux;
> +    }
> +
> +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> +    loop->aux = NULL;
> +
> +  if (changed)
> +    return TODO_cleanup_cfg;
> +  return 0;
> +}
> +
> +
>  /* Loop splitting pass.  */
>
>  namespace {
> @@ -716,3 +1496,48 @@ make_pass_loop_split (gcc::context *ctxt)
>  {
>    return new pass_loop_split (ctxt);
>  }
> +
> +namespace {
> +
> +const pass_data pass_data_cond_loop_split =
> +{
> +  GIMPLE_PASS, /* type */
> +  "cond_lsplit", /* name */
> +  OPTGROUP_LOOP, /* optinfo_flags */
> +  TV_COND_LOOP_SPLIT, /* tv_id */
> +  PROP_cfg, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_cond_loop_split : public gimple_opt_pass
> +{
> +public:
> +  pass_cond_loop_split (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_cond_loop_split, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) { return flag_split_loops != 0; }
> +  virtual unsigned int execute (function *);
> +
> +}; // class pass_cond_loop_split
> +
> +unsigned int
> +pass_cond_loop_split::execute (function *fun)
> +{
> +  if (number_of_loops (fun) <= 1)
> +    return 0;
> +
> +  return tree_ssa_split_loops_for_cond ();
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_cond_loop_split (gcc::context *ctxt)
> +{
> +  return new pass_cond_loop_split (ctxt);
> +}
> --
> 2.17.1


* Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-09-12 11:10           ` Ping agian: " Richard Biener
@ 2019-09-12 13:52             ` Feng Xue OS
  0 siblings, 0 replies; 31+ messages in thread
From: Feng Xue OS @ 2019-09-12 13:52 UTC (permalink / raw)
  To: Richard Biener; +Cc: Michael Matz, gcc-patches

>> Suppose a loop as:
>>
>>     void f (std::map<int, int> m)
>>     {
>>         for (auto it = m.begin (); it != m.end (); ++it) {
>>             /* if (b) is semi-invariant. */
>>             if (b) {
>>                 b = do_something();    /* Has effect on b */
>>             } else {
>>                                                         /* No effect on b */
>>             }
>>             statements;                      /* Also no effect on b */
>>         }
>>     }
>>
>> A transformation, kind of loop split, could be:
>>
>>     void f (std::map<int, int> m)
>>     {
>>         for (auto it = m.begin (); it != m.end (); ++it) {
>>             if (b) {
>>                 b = do_something();
>>             } else {
>>                 ++it;
>>                 statements;
>>                 break;
>>             }
>>             statements;
>>         }
>>
>>         for (; it != m.end (); ++it) {
>>             statements;
>>         }
>>     }

> So if you consider approaching this from unswitching instead we'd
> unswitch it on if (b) but
> treat the condition as constant only in the 'false' path, thus the
> transformed code would
> look like the following.  I believe implementing this in the existing
> unswitching pass
> involves a lot less code than putting it into the splitting pass but
> it would catch
> exactly the same cases?

Not necessarily.

Firstly, the following transformation is legal only when "b" is
semi-invariant, which means that once a branch of "if (b)" is selected in a
certain iteration, execution always goes to that branch in all following
iterations. Most of the code in this loop-split patch was written to check
the semi-invariantness of a conditional expression, and the same check would
also be needed in a loop-unswitch solution.
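
As a rough source-level illustration of what semi-invariance buys us (this is
only a sketch, not code from the patch; do_something (), original () and
split () are made-up names, and the container is simplified to a vector):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for do_something (): stays true while the
// element is positive.
static bool do_something (int v) { return v > 0; }

// Original loop: "b" is written only inside its own true branch, so once
// the false branch is taken, "b" can never become true again.
static int original (const std::vector<int> &data, bool b)
{
  int sum = 0;
  for (int v : data)
    {
      if (b)
        b = do_something (v);   // has effect on b
      sum += v;                 // "statements": no effect on b
    }
  return sum;
}

// Hand-written split: leave the first loop as soon as "b" is false and
// finish the remaining iterations in a loop without the condition.
static int split (const std::vector<int> &data, bool b)
{
  int sum = 0;
  std::size_t i = 0;
  for (; i < data.size (); ++i)
    {
      if (!b)
        break;                  // b stays false from here on
      b = do_something (data[i]);
      sum += data[i];
    }
  for (; i < data.size (); ++i)
    sum += data[i];             // condition-free remainder
  return sum;
}
```

Both functions should compute the same result for any input; the split is
valid only because nothing outside the true branch writes "b".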

Secondly, duplicating or moving an invariant expression out of a loop is
simple: all intermediate computations are already outside the loop, so we only
need to handle the result SSA name that is inside the loop. For a
semi-invariant expression, however, we have to duplicate the tree of
statements that contributes to the computation of the condition expression,
similar to code hoisting, which requires extra code.

Also, the loop-unswitch solution is weaker than loop-split. Suppose the
initial value of "b" is true and it changes to false after some iterations:
not only does unswitching not help in that case, it also introduces extra
cost due to the enclosing "if (b)".
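
To make the difference concrete, here is a small sketch (again hypothetical
code, not taken from the patch; run_stats, unswitch_style () and
split_style () are illustrative names) that counts how often "if (b)" is
evaluated inside a loop for both approaches:

```cpp
#include <cstddef>
#include <vector>

struct run_stats
{
  long sum;         // result of "statements"
  long cond_tests;  // evaluations of "if (b)" inside a loop
};

// Hypothetical stand-in for do_something ().
static bool do_something (int v) { return v > 0; }

// Unswitch-style: the initial value of "b" selects a loop copy once, and
// only the "false" copy drops the condition.
static run_stats unswitch_style (const std::vector<int> &data, bool b)
{
  run_stats r = {0, 0};
  if (b)
    {
      for (int v : data)
        {
          ++r.cond_tests;
          if (b)
            b = do_something (v);
          r.sum += v;
        }
    }
  else
    {
      for (int v : data)
        r.sum += v;
    }
  return r;
}

// Split-style: switch to the condition-free loop as soon as "b" turns false.
static run_stats split_style (const std::vector<int> &data, bool b)
{
  run_stats r = {0, 0};
  std::size_t i = 0;
  for (; i < data.size (); ++i)
    {
      ++r.cond_tests;
      if (!b)
        break;
      b = do_something (data[i]);
      r.sum += data[i];
    }
  for (; i < data.size (); ++i)
    r.sum += data[i];
  return r;
}
```

With "b" initially true and turning false early, the unswitch-style version
keeps testing "b" in every iteration, while the split-style version stops
testing it right after the change.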

>   if (b)
>    {
>         for (auto it = m.begin (); it != m.end (); ++it) {
>              /* if (b) is non-invariant. */
>             if (b) {
>                 b = do_something();    /* Has effect on b */
>              } else {
>                                                         /* No effect on b */
>             }
>             statements;                      /* Also no effect on b */
>         }
>     }
>   else
>     {
>           for (auto it = m.begin (); it != m.end (); ++it) {
>              /* if (b) is invariant. */
>              if (false) {
>                  b = do_something();    /* Has effect on b */
>              } else {
>                                                          /* No effect on b */
>              }
>              statements;                      /* Also no effect on b */
>          }
>     }


* Ping: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-09-12 10:21             ` Feng Xue OS
  2019-09-12 10:23               ` [PATCH V3] " Feng Xue OS
@ 2019-10-09  4:42               ` Feng Xue OS
  1 sibling, 0 replies; 31+ messages in thread
From: Feng Xue OS @ 2019-10-09  4:42 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Biener, gcc-patches

Hi, Michael,

   Would you please take a look at this modified version?

Thanks,
Feng

________________________________________
From: Feng Xue OS <fxue@os.amperecomputing.com>
Sent: Thursday, September 12, 2019 6:21 PM
To: Michael Matz
Cc: Richard Biener; gcc-patches@gcc.gnu.org
Subject: Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)

Hi, Michael,

  Sorry for the late reply; I was tied up with other tasks. I have prepared
a new version that incorporates your suggestions. Please review it at your
convenience.

> Generally I do like the idea of the transformation, and the basic building
> blocks seem to be sound.  But I dislike it being a separate pass, so
> please integrate the code you have written into the existing loop split
> pass.  Most building blocks can be used as is, except the main driver.
The new transformation is now integrated into the existing loop split pass.

>> +@item max-cond-loop-split-insns
>> +The maximum number of insns to be increased due to loop split on
>> +semi-invariant condition statement.

> "to be increased" --> "to be generated" (or "added")
Done.

>> +@item min-cond-loop-split-prob
>> +The minimum threshold for probability of semi-invaraint condition
>> +statement to trigger loop split.

> typo, semi-invariant
Done.

> I think somewhere in the docs your definition of semi-invariant needs
> to be stated in some form (can be short, doesn't need to reproduce the
> diagram or such), so don't just replicate the short info from the
> params.def file.
Done.

>> +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
>> +       "min-cond-loop-split-prob",
>> +       "The minimum threshold for probability of semi-invaraint condition "
>> +       "statement to trigger loop split.",

> Same typo: "semi-invariant".
Done.

>> -/* This file implements loop splitting, i.e. transformation of loops like
>> +/* This file implements two kind of loop splitting.

> kind_s_, plural
Done.

>> +/* Another transformation of loops like:
>> +
>> +   for (i = INIT (); CHECK (i); i = NEXT ())
>> +     {
>> +       if (expr (a_1, a_2, ..., a_n))
>> +         a_j = ...;  // change at least one a_j
>> +       else
>> +         S;          // not change any a_j
>> +     }

> You should mention that 'expr' needs to be pure, i.e. once it
> becomes false and the inputs don't change, that it remains false.
Done.
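
To illustrate why the purity requirement matters, here is a small sketch in
plain C. It is an illustration only, under stated assumptions: pure_pred,
impure_pred and the counting loops are hypothetical and not taken from the
patch. With a pure predicate the split form gives the same result as the
original loop. With a predicate that depends on hidden state, the split form
wrongly assumes the condition stays false and the results diverge.

```c
#include <assert.h>

typedef int (*pred_fn) (int);

/* Hidden state that makes impure_pred depend on more than its argument.  */
int hidden_calls;

/* Pure: the result depends only on A.  */
int pure_pred (int a) { return a > 0; }

/* Impure: alternates false, true, false, ... regardless of A.  */
int impure_pred (int a) { (void) a; return (++hidden_calls % 2) == 0; }

/* Original loop shape: the true branch changes A, the else branch does not.
   Returns how often the else branch ran.  */
int run_original (pred_fn expr, int a, int n)
{
  int else_count = 0;
  for (int i = 0; i < n; i++)
    {
      if (expr (a))
        a = a - 1;
      else
        else_count++;
    }
  return else_count;
}

/* Split loop shape: after the first false result, EXPR is never called
   again and every remaining iteration takes the else branch.  */
int run_split (pred_fn expr, int a, int n)
{
  int else_count = 0;
  int i;
  for (i = 0; i < n; i++)
    {
      if (expr (a))
        a = a - 1;
      else
        {
          else_count++;
          i++;
          break;
        }
    }
  for (; i < n; i++)
    else_count++;
  return else_count;
}

/* Compare both shapes, resetting the hidden state before each run.  */
int split_matches_original (pred_fn expr, int a, int n)
{
  assert (n >= 0);
  hidden_calls = 0;
  int expected = run_original (expr, a, n);
  hidden_calls = 0;
  return run_split (expr, a, n) == expected;
}
```

With these definitions, split_matches_original (pure_pred, 3, 10) holds,
while split_matches_original (impure_pred, 3, 10) does not.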

>> +static bool
>> +branch_removable_p (basic_block branch_bb)
>> +{
>> +  if (single_pred_p (branch_bb))
>> +    return true;
>> +
>> +  edge e;
>> +  edge_iterator ei;
>> +
>> +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
>> +    {
>> +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
>> +       continue;
>> +
>> +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
>> +       continue;

> My gut feeling is surprised by this.  So one of the predecessors of
> branch_bb dominates it.  Why should that indicate that branch_bb
> can be safely removed?
>
> Think about something like this:
>
>   esrc --> cond_bb --> branch_bb
>   '-------------------^

If all predecessors of branch_bb dominate it, these predecessors must also
be in a dominance relationship with each other, the block of the conditional
statement must be branch_bb's immediate dominator, and branch_bb is removable.

In your example, "esrc" dominates "branch_bb", so the loop over the
predecessors simply continues. On the next iteration of that loop we reach
"cond_bb", which does not dominate "branch_bb", so the function returns false
at the return statement that follows the two checks.

> (cond_bb is the callers bb of the cond statement in question).  Now esrc
> dominates branch_bb but still you can't simply remove it, even if
> the cond_bb->branch_bb edge becomes unexecutable.
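
To make the example concrete, here is a toy model of the predecessor walk in
plain C. It is a sketch under stated assumptions, not GCC code: the bitmask
CFG, the toy_* helpers and the block numbering are all hypothetical, and the
loop back edge is left out because it does not affect the answer for
branch_bb. With the esrc -> branch_bb edge present the check rejects
branch_bb; without it branch_bb has a single predecessor and is accepted.

```c
#include <assert.h>

/* Toy CFG with at most 8 blocks; predecessor and dominator sets are
   bitmasks.  All names are illustrative, not GCC data structures.  */
enum { MAX_BLOCKS = 8 };

struct toy_cfg
{
  int num_blocks;
  unsigned preds[MAX_BLOCKS];   /* preds[b]: bit p set if edge p -> b */
  unsigned dom[MAX_BLOCKS];     /* dom[b]: bit d set if d dominates b */
};

void toy_add_edge (struct toy_cfg *g, int src, int dest)
{
  assert (src >= 0 && src < g->num_blocks);
  assert (dest >= 0 && dest < g->num_blocks);
  g->preds[dest] |= 1u << src;
}

/* Classic iterative dominator computation; block 0 is the entry.  */
void toy_compute_dominators (struct toy_cfg *g)
{
  unsigned all = (1u << g->num_blocks) - 1;
  g->dom[0] = 1u;
  for (int b = 1; b < g->num_blocks; b++)
    g->dom[b] = all;

  for (int changed = 1; changed;)
    {
      changed = 0;
      for (int b = 1; b < g->num_blocks; b++)
        {
          unsigned meet = all;
          for (int p = 0; p < g->num_blocks; p++)
            if (g->preds[b] & (1u << p))
              meet &= g->dom[p];
          unsigned next = meet | (1u << b);
          if (next != g->dom[b])
            {
              g->dom[b] = next;
              changed = 1;
            }
        }
    }
}

/* Same argument order as dominated_by_p: is BB dominated by DOM_BB?  */
int toy_dominated_by (const struct toy_cfg *g, int bb, int dom_bb)
{
  return (g->dom[bb] >> dom_bb) & 1u;
}

/* Mirrors the predecessor walk of branch_removable_p in the patch.  */
int toy_branch_removable (const struct toy_cfg *g, int branch_bb)
{
  unsigned preds = g->preds[branch_bb];
  if (preds != 0 && (preds & (preds - 1)) == 0)   /* single predecessor */
    return 1;

  for (int p = 0; p < g->num_blocks; p++)
    {
      if (!(preds & (1u << p)))
        continue;
      if (toy_dominated_by (g, p, branch_bb))
        continue;
      if (toy_dominated_by (g, branch_bb, p))
        continue;
      return 0;
    }
  return 1;
}

/* Blocks: 0 = esrc, 1 = cond_bb, 2 = branch_bb, 3 = other branch, 4 = join.
   WITH_BYPASS adds the esrc -> branch_bb edge from the example.  */
int example_branch_removable (int with_bypass)
{
  struct toy_cfg g = { .num_blocks = 5 };
  toy_add_edge (&g, 0, 1);
  if (with_bypass)
    toy_add_edge (&g, 0, 2);
  toy_add_edge (&g, 1, 2);
  toy_add_edge (&g, 1, 3);
  toy_add_edge (&g, 2, 4);
  toy_add_edge (&g, 3, 4);
  toy_compute_dominators (&g);
  return toy_branch_removable (&g, 2);
}
```

In this model example_branch_removable (1) corresponds to the diagram above,
and example_branch_removable (0) to the plain if/else case.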


>> +static int
>> +get_cond_invariant_branch (struct loop *loop, gcond *cond)

> Please return an edge here, not an edge index (which removes the using of
> -1).  I think the name (and comment) should consistently talk about
> semi-invariant, not invariant.  For when you really need an edge index
> later, you could use "EDGE_SUCC(bb, 0) != edge".  But you probably don't
> really need it, e.g. instead of using the gimple pass-local-flag on a
> statement you can just as well also store the edge in your info structure.
Done.

>> +static bool
>> +is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb,
>> +                       int branch)

> With above change in get_cond_invariant_branch, this also should
> take an edge, not a bb+edge-index.
Done.

>> +static int
>> +compute_added_num_insns (struct loop *loop, const_basic_block cond_bb,
>> +                        int branch)

> This should take an edge as well.
Done.

>> +  for (unsigned i = 0; i < loop->num_nodes; i++)
>> +    {
>> +      /* Do no count basic blocks only in opposite branch.  */
>> +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var))
>> +       continue;
>> +
>> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi);
>> +          gsi_next (&gsi))
>> +       num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);

> Replace the loop by
>  estimate_num_insn_seq (bb_seq (bbs[i]), &eni_size_weights);
Done.


>> +  /* Add a threshold for increased code size to disable loop split.  */
>> +  if (compute_added_num_insns (loop, cond_bb, branch) >
>> +      PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))

> Operator should be first on next line, not last on previous line.
Done.

>> +  /* Skip conditional statement that is inside a nested unrecognized loop.  */
>> +  if (is_cond_in_hidden_loop (loop, cond_bb, branch))
>> +    return false;

> This part (as well as is_cond_in_hidden_loop implementation) had me
> confused for a while, because "unrecognized" loops seems strange.  I think
> I now know what you try to do here, but I wonder if there's an easier way,
> or at least about which situations you stumbled into that made you write
> this code.
I now use the BB_IRREDUCIBLE_LOOP flag to check that, since the
tree-loop-init pass requires LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS, which
marks irreducible loops.

>> +
>> +  /* Temporarily keep branch index in conditional statement.  */
>> +  gimple_set_plf (cond, GF_PLF_1, branch);

> i.e. here, store the edge in your info structure.
Done.

>> +/* Main entry point to perform loop splitting for suitable if-conditions
>> +   in all loops.  */
>> +
>> +static unsigned int
>> +tree_ssa_split_loops_for_cond (void)

> So, from here on the code should be integrated into the existing code
> of the file (which might need changes as well for this integration to look
> good).
Done.

Thanks,
Feng


* Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-09-12 10:23               ` [PATCH V3] " Feng Xue OS
@ 2019-10-15 16:01                 ` Philipp Tomsich
  2019-10-15 16:06                   ` Michael Matz
  2019-10-16  2:00                   ` [PATCH V3] " Feng Xue OS
  0 siblings, 2 replies; 31+ messages in thread
From: Philipp Tomsich @ 2019-10-15 16:01 UTC (permalink / raw)
  To: Feng Xue OS
  Cc: Michael Matz, Richard Biener, gcc-patches,
	Christoph Müllner, erick.ochoa

Feng,

This looks good from our side and has proven useful (combined with the other 2 patches) in
our testing with SPEC2017.
Given that this looks final: what is the plan for getting this merged?

Thanks,
Philipp.

> On 12.09.2019, at 12:23, Feng Xue OS <fxue at os dot amperecomputing dot com> wrote:
> 
> ---
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 1391a562c35..28981fa1048 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11418,6 +11418,19 @@ The maximum number of branches unswitched in a single loop.
> @item lim-expensive
> The minimum cost of an expensive expression in the loop invariant motion.
> 
> +@item max-cond-loop-split-insns
> +In a loop, if a branch of a conditional statement is selected since certain
> +loop iteration, any operand that contributes to computation of the conditional
> +expression remains unchanged in all following iterations, the statement is
> +semi-invariant, upon which we can do a kind of loop split transformation.
> +@option{max-cond-loop-split-insns} controls maximum number of insns to be
> +added due to loop split on semi-invariant conditional statement.
> +
> +@item min-cond-loop-split-prob
> +When FDO profile information is available, @option{min-cond-loop-split-prob}
> +specifies minimum threshold for probability of semi-invariant condition
> +statement to trigger loop split.
> +
> @item iv-consider-all-candidates-bound
> Bound on number of candidates for induction variables, below which
> all candidates are considered for each use in induction variable
> diff --git a/gcc/params.def b/gcc/params.def
> index 13001a7bb2d..12bc8c26c9e 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
> 	"The maximum number of unswitchings in a single loop.",
> 	3, 0, 0)
> 
> +/* The maximum number of increased insns due to loop split on semi-invariant
> +   condition statement.  */
> +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> +	"max-cond-loop-split-insns",
> +	"The maximum number of insns to be added due to loop split on "
> +	"semi-invariant condition statement.",
> +	100, 0, 0)
> +
> +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> +	"min-cond-loop-split-prob",
> +	"The minimum threshold for probability of semi-invariant condition "
> +	"statement to trigger loop split.",
> +	30, 0, 100)
> +
> /* The maximum number of insns in loop header duplicated by the copy loop
>    headers pass.  */
> DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
> 
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> new file mode 100644
> index 00000000000..51f9da22fc7
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> +
> +#include <string>
> +#include <map>
> +
> +using namespace std;
> +
> +class  A
> +{
> +public:
> +  bool empty;
> +  void set (string s);
> +};
> +
> +class  B
> +{
> +  map<int, string> m;
> +  void f ();
> +};
> +
> +extern A *ga;
> +
> +void B::f ()
> +{
> +  for (map<int, string>::iterator iter = m.begin (); iter != m.end (); ++iter)
> +    {
> +      if (ga->empty)
> +        ga->set (iter->second);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> new file mode 100644
> index 00000000000..bbd522d6bcd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> +
> +__attribute__((pure)) __attribute__((noinline)) int inc (int i)
> +{
> +  return i + 1;
> +}
> +
> +extern int do_something (void);
> +extern int b;
> +
> +void test(int n)
> +{
> +  int i;
> +
> +  for (i = 0; i < n; i = inc (i))
> +    {
> +      if (b)
> +        b = do_something();
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index f5f083384bc..e4a1b6d2019 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -32,7 +32,10 @@ along with GCC; see the file COPYING3.  If not see
> #include "tree-ssa-loop.h"
> #include "tree-ssa-loop-manip.h"
> #include "tree-into-ssa.h"
> +#include "tree-inline.h"
> +#include "tree-cfgcleanup.h"
> #include "cfgloop.h"
> +#include "params.h"
> #include "tree-scalar-evolution.h"
> #include "gimple-iterator.h"
> #include "gimple-pretty-print.h"
> @@ -40,7 +43,9 @@ along with GCC; see the file COPYING3.  If not see
> #include "gimple-fold.h"
> #include "gimplify-me.h"
> 
> -/* This file implements loop splitting, i.e. transformation of loops like
> +/* This file implements two kinds of loop splitting.
> +
> +   One transformation of loops like:
> 
>    for (i = 0; i < 100; i++)
>      {
> @@ -612,6 +617,722 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
>   return changed;
> }
> 
> +/* Another transformation of loops like:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))  // expr is pure
> +         a_j = ...;  // change at least one a_j
> +       else
> +         S;          // not change any a_j
> +     }
> +
> +   into:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))
> +         a_j = ...;
> +       else
> +         {
> +           S;
> +           i = NEXT ();
> +           break;
> +         }
> +     }
> +
> +   for (; CHECK (i); i = NEXT ())
> +     {
> +       S;
> +     }
> +
> +   */
> +
> +/* Data structure to hold temporary information during loop split upon
> +   semi-invariant conditional statement.  */
> +class split_info {
> +public:
> +  /* Array of all basic blocks in a loop, returned by get_loop_body().  */
> +  basic_block *bbs;
> +
> +  /* All memory store/clobber statements in a loop.  */
> +  auto_vec<gimple *> memory_stores;
> +
> +  /* Whether above memory stores vector has been filled.  */
> +  int need_init;
> +
> +  split_info () : bbs (NULL),  need_init (true) { }
> +
> +  ~split_info ()
> +    {
> +      if (bbs)
> +	free (bbs);
> +    }
> +};
> +
> +/* Find all statements with memory-write effect in LOOP, including memory
> +   store and non-pure function call, and keep those in a vector.  This work
> +   is only done one time, for the vector should be constant during analysis
> +   stage of semi-invariant condition.  */
> +
> +static void
> +find_vdef_in_loop (struct loop *loop)
> +{
> +  split_info *info = (split_info *) loop->aux;
> +  gphi *vphi = get_virtual_phi (loop->header);
> +
> +  /* Indicate memory store vector has been filled.  */
> +  info->need_init = false;
> +
> +  /* If loop contains memory operation, there must be a virtual PHI node in
> +     loop header basic block.  */
> +  if (vphi == NULL)
> +    return;
> +
> +  /* All virtual SSA names inside the loop are connected to be a cyclic
> +     graph via virtual PHI nodes.  The virtual PHI node in loop header just
> +     links the first and the last virtual SSA names, by using the last as
> +     PHI operand to define the first.  */
> +  const edge latch = loop_latch_edge (loop);
> +  const tree first = gimple_phi_result (vphi);
> +  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
> +
> +  /* The virtual SSA cyclic graph might consist of only one SSA name, who
> +     is defined by itself.
> +
> +       .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
> +
> +     This means the loop contains only memory loads, so we can skip it.  */
> +  if (first == last)
> +    return;
> +
> +  auto_vec<gimple *> other_stores;
> +  auto_vec<tree> worklist;
> +  auto_bitmap visited;
> +
> +  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
> +  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
> +  worklist.safe_push (last);
> +
> +  do
> +    {
> +      tree vuse = worklist.pop ();
> +      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
> +
> +      /* We mark the first and last SSA names as visited at the beginning,
> +	 and reversely start the process from the last SSA name towards the
> +	 first, which ensures that this do-while will not touch SSA names
> +	 defined outside of the loop.  */
> +      gcc_assert (gimple_bb (stmt)
> +		  && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
> +
> +      if (gimple_code (stmt) == GIMPLE_PHI)
> +	{
> +	  gphi *phi = as_a <gphi *> (stmt);
> +
> +	  for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> +	    {
> +	      tree arg = gimple_phi_arg_def (stmt, i);
> +
> +	      if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
> +		worklist.safe_push (arg);
> +	    }
> +	}
> +      else
> +	{
> +	  tree prev = gimple_vuse (stmt);
> +
> +	  /* Non-pure call statement is conservatively assumed to impact all
> +	     memory locations.  So place call statements ahead of other memory
> +	     stores in the vector with an idea of using them as shortcut
> +	     terminators to memory alias analysis.  */
> +	  if (gimple_code (stmt) == GIMPLE_CALL)
> +	    info->memory_stores.safe_push (stmt);
> +	  else
> +	    other_stores.safe_push (stmt);
> +
> +	  if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
> +	    worklist.safe_push (prev);
> +	}
> +    } while (!worklist.is_empty ());
> +
> +    info->memory_stores.safe_splice (other_stores);
> +}
> +
> +
> +/* Given STMT, memory load or pure call statement, check whether it is impacted
> +   by some memory store in LOOP, excluding trace starting from SKIP_HEAD (the
> +   trace is composed of SKIP_HEAD and those basic block dominated by it, always
> +   corresponds to one branch of a conditional statement).  If SKIP_HEAD is
> +   NULL, all basic blocks of LOOP are checked.  */
> +
> +static bool
> +vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
> +		       const_basic_block skip_head)
> +{
> +  split_info *info = (split_info *) loop->aux;
> +
> +  /* Collect memory store/clobber statements if not done yet.  */
> +  if (info->need_init)
> +    find_vdef_in_loop (loop);
> +
> +  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
> +  ao_ref ref;
> +  gimple *store;
> +  unsigned i;
> +
> +  ao_ref_init (&ref, rhs);
> +
> +  FOR_EACH_VEC_ELT (info->memory_stores, i, store)
> +    {
> +      /* Skip basic blocks dominated by SKIP_HEAD, if non-NULL.  */
> +      if (skip_head
> +	  && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
> +	continue;
> +
> +      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
> +	return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Forward declaration.  */
> +
> +static bool
> +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> +		       const_basic_block skip_head);
> +
> +/* Suppose one condition branch, led by SKIP_HEAD, is not executed since
> +   certain iteration of LOOP, check whether an SSA name (NAME) remains
> +   unchanged in next iteration.  We call this characteristic
> +   semi-invariantness.  SKIP_HEAD might be NULL, if so, nothing excluded, all
> +   basic blocks and control flows in the loop will be considered.  If non-
> +   NULL, SSA name to check is supposed to be defined before SKIP_HEAD.  */
> +
> +static bool
> +ssa_semi_invariant_p (struct loop *loop, const tree name,
> +		      const_basic_block skip_head)
> +{
> +  gimple *def = SSA_NAME_DEF_STMT (name);
> +  const_basic_block def_bb = gimple_bb (def);
> +
> +  /* An SSA name defined outside a loop is definitely semi-invariant.  */
> +  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
> +    return true;
> +
> +  if (gimple_code (def) == GIMPLE_PHI)
> +    {
> +      /* For PHI node that is not in loop header, its source operands should
> +	 be defined inside the loop, which are seen as loop variant.  */
> +      if (def_bb != loop->header || !skip_head)
> +	return false;
> +
> +      const_edge latch = loop_latch_edge (loop);
> +      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
> +
> +      /* A PHI node in loop header contains two source operands, one is
> +	 initial value, the other is the copy of last iteration through loop
> +	 latch, we call it latch value.  From the PHI node to definition
> +	 of latch value, if excluding branch trace from SKIP_HEAD, there
> +	 is no definition of other version of same variable, SSA name defined
> +	 by the PHI node is semi-invariant.
> +
> +                         loop entry
> +                              |     .--- latch ---.
> +                              |     |             |
> +                              v     v             |
> +                  x_1 = PHI <x_0,  x_3>           |
> +                           |                      |
> +                           v                      |
> +              .------- if (cond) -------.         |
> +              |                         |         |
> +              |                     [ SKIP ]      |
> +              |                         |         |
> +              |                     x_2 = ...     |
> +              |                         |         |
> +              '---- T ---->.<---- F ----'         |
> +                           |                      |
> +                           v                      |
> +                  x_3 = PHI <x_1, x_2>            |
> +                           |                      |
> +                           '----------------------'
> +
> +	Suppose in certain iteration, execution flow in above graph goes
> +	through true branch, which means that one source value to define
> +	x_3 in false branch (x_2) is skipped, x_3 only comes from x_1, and
> +	x_1 in next iterations is defined by x_3, we know that x_1 will
> +	never change if COND always chooses true branch from then on.  */
> +
> +      while (from != name)
> +	{
> +	  /* A new value comes from a CONSTANT.  */
> +	  if (TREE_CODE (from) != SSA_NAME)
> +	    return false;
> +
> +	  gimple *stmt = SSA_NAME_DEF_STMT (from);
> +	  const_basic_block bb = gimple_bb (stmt);
> +
> +	  /* A new value comes from outside of loop.  */
> +	  if (!bb || !flow_bb_inside_loop_p (loop, bb))
> +	    return false;
> +
> +	  from = NULL_TREE;
> +
> +	  if (gimple_code (stmt) == GIMPLE_PHI)
> +	    {
> +	      gphi *phi = as_a <gphi *> (stmt);
> +
> +	      for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> +		{
> +		  const_edge e = gimple_phi_arg_edge (phi, i);
> +
> +		  /* Not consider redefinitions in excluded basic blocks.  */
> +		  if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
> +		    {
> +		      /* There are more than one source operands that can
> +			 provide value to the SSA name, it is variant.  */
> +		      if (from)
> +			return false;
> +
> +		      from = gimple_phi_arg_def (phi, i);
> +		    }
> +		}
> +	    }
> +	  else if (gimple_code (stmt) == GIMPLE_ASSIGN)
> +	    {
> +	      /* For simple value copy, check its rhs instead.  */
> +	      if (gimple_assign_ssa_name_copy_p (stmt))
> +		from = gimple_assign_rhs1 (stmt);
> +	    }
> +
> +	  /* Any other kind of definition is deemed to introduce a new value
> +	     to the SSA name.  */
> +	  if (!from)
> +	    return false;
> +	}
> +	return true;
> +    }
> +
> +  /* Value originated from volatile memory load or return of normal (non-
> +     const/pure) call should not be treated as constant in each iteration.  */
> +  if (gimple_has_side_effects (def))
> +    return false;
> +
> +  /* Check if any memory store may kill memory load at this place.  */
> +  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
> +    return false;
> +
> +  /* Check operands of definition statement of the SSA name.  */
> +  return stmt_semi_invariant_p (loop, def, skip_head);
> +}
> +
> +/* Check whether STMT is semi-invariant in LOOP, iff all its operands are
> +   semi-invariant.  Trace composed of basic block SKIP_HEAD and basic blocks
> +   dominated by it are excluded from the loop.  */
> +
> +static bool
> +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> +		       const_basic_block skip_head)
> +{
> +  ssa_op_iter iter;
> +  tree use;
> +
> +  /* Although operand of a statement might be SSA name, CONSTANT or VARDECL,
> +     here we only need to check SSA name operands.  This is because check on
> +     VARDECL operands, which involve memory loads, must have been done
> +     prior to invocation of this function in vuse_semi_invariant_p.  */
> +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
> +    {
> +      if (!ssa_semi_invariant_p (loop, use, skip_head))
> +	return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Determine when conditional statement never transfers execution to one of its
> +   branch, whether we can remove the branch's leading basic block (BRANCH_BB)
> +   and those basic blocks dominated by BRANCH_BB.  */
> +
> +static bool
> +branch_removable_p (basic_block branch_bb)
> +{
> +  if (single_pred_p (branch_bb))
> +    return true;
> +
> +  edge e;
> +  edge_iterator ei;
> +
> +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
> +    {
> +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
> +	continue;
> +
> +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
> +	continue;
> +
> +       /* The branch can be reached from opposite branch, or from some
> +	  statement not dominated by the conditional statement.  */
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Find out which branch of a conditional statement (COND) is invariant in the
> +   execution context of LOOP.  That is: once the branch is selected in certain
> +   iteration of the loop, any operand that contributes to computation of the
> +   conditional statement remains unchanged in all following iterations.  */
> +
> +static edge
> +get_cond_invariant_branch (struct loop *loop, gcond *cond)
> +{
> +  basic_block cond_bb = gimple_bb (cond);
> +  basic_block targ_bb[2];
> +  bool invar[2];
> +  unsigned invar_checks;
> +
> +  for (unsigned i = 0; i < 2; i++)
> +    {
> +      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
> +
> +      /* One branch directs to loop exit, no need to perform loop split upon
> +	 this conditional statement.  Firstly, it is trivial if the exit branch
> +	 is semi-invariant, for the statement is just to break loop.  Secondly,
> +	 if the opposite branch is semi-invariant, it means that the statement
> +	 is real loop-invariant, which is covered by loop unswitch.  */
> +      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
> +	return NULL;
> +    }
> +
> +  invar_checks = 0;
> +
> +  for (unsigned i = 0; i < 2; i++)
> +    {
> +      invar[!i] = false;
> +
> +      if (!branch_removable_p (targ_bb[i]))
> +	continue;
> +
> +      /* Given a semi-invariant branch, if its opposite branch dominates
> +	 loop latch, it and its following trace will only be executed in
> +	 final iteration of loop, namely it is not part of repeated body
> +	 of the loop.  Similar to the above case that the branch is loop
> +	 exit, no need to split loop.  */
> +      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
> +	continue;
> +
> +      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
> +      invar_checks++;
> +    }
> +
> +  /* With both branches being invariant (handled by loop unswitch) or
> +     variant is not what we want.  */
> +  if (invar[0] ^ !invar[1])
> +    return NULL;
> +
> +  /* Found a real loop-invariant condition, do nothing.  */
> +  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
> +    return NULL;
> +
> +  return EDGE_SUCC (cond_bb, (unsigned) invar[1]);
> +}
> +
> +/* Calculate increased code size measured by estimated insn number if applying
> +   loop split upon certain branch (BRANCH_EDGE) of a conditional statement.  */
> +
> +static int
> +compute_added_num_insns (struct loop *loop, const_edge branch_edge)
> +{
> +  basic_block cond_bb = branch_edge->src;
> +  unsigned branch = EDGE_SUCC (cond_bb, 1) == branch_edge;
> +  basic_block opposite_bb = EDGE_SUCC (cond_bb, !branch)->dest;
> +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> +  int num = 0;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    {
> +      /* Do not count basic blocks only in opposite branch.  */
> +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], opposite_bb))
> +	continue;
> +
> +      num += estimate_num_insns_seq (bb_seq (bbs[i]), &eni_size_weights);
> +    }
> +
> +  /* It is unnecessary to evaluate expression of the conditional statement
> +     in new loop that contains only invariant branch.  This expression should
> +     be constant value (either true or false).  Exclude code size of insns
> +     that contribute to computation of the expression.  */
> +
> +  auto_vec<gimple *> worklist;
> +  hash_set<gimple *> removed;
> +  gimple *stmt = last_stmt (cond_bb);
> +
> +  worklist.safe_push (stmt);
> +  removed.add (stmt);
> +  num -= estimate_num_insns (stmt, &eni_size_weights);
> +
> +  do
> +    {
> +      ssa_op_iter opnd_iter;
> +      use_operand_p opnd_p;
> +
> +      stmt = worklist.pop ();
> +      FOR_EACH_PHI_OR_STMT_USE (opnd_p, stmt, opnd_iter, SSA_OP_USE)
> +	{
> +	  tree opnd = USE_FROM_PTR (opnd_p);
> +
> +	  if (TREE_CODE (opnd) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (opnd))
> +	    continue;
> +
> +	  gimple *opnd_stmt = SSA_NAME_DEF_STMT (opnd);
> +	  use_operand_p use_p;
> +	  imm_use_iterator use_iter;
> +
> +	  if (removed.contains (opnd_stmt)
> +	      || !flow_bb_inside_loop_p (loop, gimple_bb (opnd_stmt)))
> +	    continue;
> +
> +	  FOR_EACH_IMM_USE_FAST (use_p, use_iter, opnd)
> +	    {
> +              gimple *use_stmt = USE_STMT (use_p);
> +
> +	      if (!is_gimple_debug (use_stmt) && !removed.contains (use_stmt))
> +		{
> +		  opnd_stmt = NULL;
> +		  break;
> +		}
> +	    }
> +
> +	  if (opnd_stmt)
> +	    {
> +	      worklist.safe_push (opnd_stmt);
> +	      removed.add (opnd_stmt);
> +	      num -= estimate_num_insns (opnd_stmt, &eni_size_weights);
> +	    }
> +	}
> +    } while (!worklist.is_empty ());
> +
> +  gcc_assert (num >= 0);
> +  return num;
> +}
> +
> +/* Find out loop-invariant branch of a conditional statement (COND) if it has,
> +   and check whether it is eligible and profitable to perform loop split upon
> +   this branch in LOOP.  */
> +
> +static edge
> +get_cond_branch_to_split_loop (struct loop *loop, gcond *cond)
> +{
> +  edge invar_branch = get_cond_invariant_branch (loop, cond);
> +
> +  if (!invar_branch)
> +    return NULL;
> +
> +  profile_probability prob = invar_branch->probability;
> +
> +  /* When accurate profile information is available, and execution
> +     frequency of the branch is too low, just let it go.  */
> +  if (prob.reliable_p ())
> +    {
> +      int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB);
> +
> +      if (prob < profile_probability::always ().apply_scale (thres, 100))
> +	return NULL;
> +    }
> +
> +  /* Add a threshold for increased code size to disable loop split.  */
> +  if (compute_added_num_insns (loop, invar_branch)
> +      > PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))
> +    return NULL;
> +
> +  return invar_branch;
> +}
> +
> +/* Given a loop (LOOP1) with a loop-invariant branch (INVAR_BRANCH) of some
> +   conditional statement, perform loop split transformation illustrated
> +   as the following graph.
> +
> +               .-------T------ if (true) ------F------.
> +               |                    .---------------. |
> +               |                    |               | |
> +               v                    |               v v
> +          pre-header                |            pre-header
> +               | .------------.     |                 | .------------.
> +               | |            |     |                 | |            |
> +               | v            |     |                 | v            |
> +             header           |     |               header           |
> +               |              |     |                 |              |
> +       [ bool r = cond; ]     |     |                 |              |
> +               |              |     |                 |              |
> +      .---- if (r) -----.     |     |        .--- if (true) ---.     |
> +      |                 |     |     |        |                 |     |
> +  invariant             |     |     |    invariant             |     |
> +      |                 |     |     |        |                 |     |
> +      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
> +               |              |    /                  |              |
> +             stmts            |   /                 stmts            |
> +               |              |  /                    |              |
> +              / \             | /                    / \             |
> +     .-------*   *       [ if (!r) ]        .-------*   *            |
> +     |           |            |             |           |            |
> +     |         latch          |             |         latch          |
> +     |           |            |             |           |            |
> +     |           '------------'             |           '------------'
> +     '------------------------. .-----------'
> +             loop1            | |                   loop2
> +                              v v
> +                             exits
> +
> +   In the graph, loop1 represents the part derived from the original loop,
> +   and loop2 is duplicated using loop_version (), which corresponds to the
> +   part of the original loop being split out.  In loop1, a new bool temporary
> +   (r) is introduced to keep the value of the condition result.  On the
> +   original latch edge of loop1, we insert a new conditional statement whose
> +   value comes from the temporary (r); one of its branches goes back to the
> +   loop1 header as a latch edge, and the other goes to the loop2 pre-header as
> +   an entry edge.  In loop2, we abandon the variant branch of the conditional
> +   statement candidate by setting a constant bool condition, based on which
> +   branch is semi-invariant.  */
> +
> +static bool
> +do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
> +{
> +  basic_block cond_bb = invar_branch->src;
> +  bool true_invar = !!(invar_branch->flags & EDGE_TRUE_VALUE);
> +  gcond *cond = as_a <gcond *> (last_stmt (cond_bb));
> +
> +  gcc_assert (cond_bb->loop_father == loop1);
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +   {
> +     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
> +	      current_function_name (), loop1->num,
> +	      true_invar ? "T" : "F", cond_bb->index);
> +     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
> +   }
> +
> +  initialize_original_copy_tables ();
> +
> +  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> +				     profile_probability::always (),
> +				     profile_probability::never (),
> +				     profile_probability::always (),
> +				     profile_probability::always (),
> +				     true);
> +  if (!loop2)
> +    {
> +      free_original_copy_tables ();
> +      return false;
> +    }
> +
> +  /* Generate a bool type temporary to hold result of the condition.  */
> +  tree tmp = make_ssa_name (boolean_type_node);
> +  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> +  gimple *stmt = gimple_build_assign (tmp,
> +				      gimple_cond_code (cond),
> +				      gimple_cond_lhs (cond),
> +				      gimple_cond_rhs (cond));
> +
> +  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
> +  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
> +  update_stmt (cond);
> +
> +  basic_block cond_bb_copy = get_bb_copy (cond_bb);
> +  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
> +
> +  /* Replace the condition in loop2 with a bool constant to let PassManager
> +     remove the variant branch after current pass completes.  */
> +  if (true_invar)
> +    gimple_cond_make_true (cond_copy);
> +  else
> +    gimple_cond_make_false (cond_copy);
> +
> +  update_stmt (cond_copy);
> +
> +  /* Insert a new conditional statement on latch edge of loop1.  This
> +     statement acts as a switch to transfer execution from loop1 to loop2,
> +     when loop1 enters into invariant state.  */
> +  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
> +  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
> +  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
> +					  NULL_TREE, NULL_TREE);
> +
> +  gsi = gsi_last_bb (break_bb);
> +  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
> +
> +  edge to_loop1 = single_succ_edge (break_bb);
> +  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
> +
> +  to_loop1->flags &= ~EDGE_FALLTHRU;
> +  to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
> +  to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
> +
> +  update_ssa (TODO_update_ssa);
> +
> +  /* Due to introduction of a control flow edge from loop1 latch to loop2
> +     pre-header, we should update PHIs in loop2 to reflect this connection
> +     between loop1 and loop2.  */
> +  connect_loop_phis (loop1, loop2, to_loop2);
> +
> +  free_original_copy_tables ();
> +
> +  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
> +
> +  return true;
> +}
> +
> +/* Traverse all conditional statements in LOOP, to find out a good candidate
> +   upon which we can do loop split.  */
> +
> +static bool
> +split_loop_on_cond (struct loop *loop)
> +{
> +  split_info *info = new split_info ();
> +  basic_block *bbs = info->bbs = get_loop_body (loop);
> +  bool do_split = false;
> +
> +  /* Allocate an area to keep temporary info, and associate its address
> +     with loop aux field.  */
> +  loop->aux = info;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    {
> +      basic_block bb = bbs[i];
> +
> +      /* We only consider conditional statements that are executed at most once
> +	 in each iteration of the loop.  So skip statements in inner loops.  */
> +      if ((bb->loop_father != loop) || (bb->flags & BB_IRREDUCIBLE_LOOP))
> +	continue;
> +
> +      /* Strictly speaking, this check is not required.  With it, we ensure
> +	 the conditional statement is executed in every iteration.  */
> +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> +	continue;
> +
> +      gimple *last = last_stmt (bb);
> +
> +      if (!last || gimple_code (last) != GIMPLE_COND)
> +	continue;
> +
> +      gcond *cond = as_a <gcond *> (last);
> +      edge branch_edge = get_cond_branch_to_split_loop (loop, cond);
> +
> +      if (branch_edge)
> +	{
> +	  do_split_loop_on_cond (loop, branch_edge);
> +	  do_split = true;
> +	  break;
> +	}
> +    }
> +
> +  delete info;
> +  loop->aux = NULL;
> +
> +  return do_split;
> +}
> +
> /* Main entry point.  Perform loop splitting on all suitable loops.  */
> 
> static unsigned int
> @@ -662,6 +1383,32 @@ tree_ssa_split_loops (void)
> 	}
>     }
> 
> +  if (changed)
> +    {
> +      cleanup_tree_cfg ();
> +      changed = false;
> +    }
> +
> +  /* Perform loop splitting for suitable if-conditions in all loops.  */
> +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> +    loop->aux = NULL;
> +
> +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> +    {
> +      if (loop->aux)
> +        {
> +	  loop_outer (loop)->aux = loop;
> +	  continue;
> +	}
> +
> +      if (!optimize_loop_for_size_p (loop)
> +	  && split_loop_on_cond (loop))
> +	{
> +	  loop_outer (loop)->aux = loop;
> +	  changed = true;
> +	}
> +    }
> +
>   FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
>     loop->aux = NULL;
> 
> -- 
> 2.17.1
> 
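
For readers following the thread, the source-level effect of the transformation described in the patch comments can be sketched by hand. This is only an illustration under stated assumptions, not the GIMPLE-level code the pass generates: countdown, run_original and run_split are hypothetical names, and countdown merely stands in for the do_something () call used in the test case.

```c
#include <stddef.h>

/* Hypothetical stand-in for do_something (): yields the next flag value.  */
static int countdown (int b) { return b - 1; }

/* Original loop: the flag is tested in every iteration.  Once b becomes
   zero, nothing in the loop can change it again, so "if (b)" is
   semi-invariant.  */
static int run_original (int b, const int *data, size_t n, long *sum)
{
  for (size_t i = 0; i < n; i++)
    {
      if (b)
        b = countdown (b);
      *sum += data[i];
    }
  return b;
}

/* Hand-split version: the first loop runs until the condition settles on
   its invariant (false) branch, then the second loop finishes the remaining
   iterations without testing the flag at all.  */
static int run_split (int b, const int *data, size_t n, long *sum)
{
  size_t i;

  for (i = 0; i < n; i++)
    {
      if (b)
        b = countdown (b);
      else
        {
          *sum += data[i];
          i++;
          break;
        }
      *sum += data[i];
    }

  for (; i < n; i++)
    *sum += data[i];

  return b;
}
```

Calling both functions with the same inputs should produce the same sum and the same final flag value. The second loop in run_split no longer contains the condition, which is what gives later passes room to simplify it further.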


* Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-10-15 16:01                 ` Philipp Tomsich
@ 2019-10-15 16:06                   ` Michael Matz
  2019-10-22 10:16                     ` Feng Xue OS
  2019-10-16  2:00                   ` [PATCH V3] " Feng Xue OS
  1 sibling, 1 reply; 31+ messages in thread
From: Michael Matz @ 2019-10-15 16:06 UTC (permalink / raw)
  To: Philipp Tomsich
  Cc: Feng Xue OS, Richard Biener, gcc-patches, Christoph Müllner,
	erick.ochoa

Hi,

On Tue, 15 Oct 2019, Philipp Tomsich wrote:

> This looks good from our side and has proven useful (combined with the other 2 patches) in
> our testing with SPEC2017.
> Given that this looks final: what is the plan for getting this merged?

I'll get to review this v3 version this week.


Ciao,
Michael.

> 
> Thanks,
> Philipp.
> 
> > On 12.09.2019, at 12:23, Feng Xue OS <fxue at os dot amperecomputing dot com> wrote:
> > 
> > ---
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 1391a562c35..28981fa1048 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -11418,6 +11418,19 @@ The maximum number of branches unswitched in a single loop.
> > @item lim-expensive
> > The minimum cost of an expensive expression in the loop invariant motion.
> > 
> > +@item max-cond-loop-split-insns
> > +In a loop, if selecting one branch of a conditional statement in some
> > +iteration guarantees that every operand contributing to the conditional
> > +expression remains unchanged in all following iterations, the statement is
> > +semi-invariant, and we can do a kind of loop split transformation upon it.
> > +@option{max-cond-loop-split-insns} controls the maximum number of insns to
> > +be added due to loop split on a semi-invariant conditional statement.
> > +
> > +@item min-cond-loop-split-prob
> > +When FDO profile information is available, @option{min-cond-loop-split-prob}
> > +specifies the minimum threshold for the probability of a semi-invariant
> > +condition statement to trigger loop split.
> > +
> > @item iv-consider-all-candidates-bound
> > Bound on number of candidates for induction variables, below which
> > all candidates are considered for each use in induction variable
> > diff --git a/gcc/params.def b/gcc/params.def
> > index 13001a7bb2d..12bc8c26c9e 100644
> > --- a/gcc/params.def
> > +++ b/gcc/params.def
> > @@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
> > 	"The maximum number of unswitchings in a single loop.",
> > 	3, 0, 0)
> > 
> > +/* The maximum number of increased insns due to loop split on semi-invariant
> > +   condition statement.  */
> > +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> > +	"max-cond-loop-split-insns",
> > +	"The maximum number of insns to be added due to loop split on "
> > +	"semi-invariant condition statement.",
> > +	100, 0, 0)
> > +
> > +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> > +	"min-cond-loop-split-prob",
> > +	"The minimum threshold for probability of semi-invariant condition "
> > +	"statement to trigger loop split.",
> > +	30, 0, 100)
> > +
> > /* The maximum number of insns in loop header duplicated by the copy loop
> >    headers pass.  */
> > DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
> > 
> > diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> > new file mode 100644
> > index 00000000000..51f9da22fc7
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> > @@ -0,0 +1,33 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> > +
> > +#include <string>
> > +#include <map>
> > +
> > +using namespace std;
> > +
> > +class  A
> > +{
> > +public:
> > +  bool empty;
> > +  void set (string s);
> > +};
> > +
> > +class  B
> > +{
> > +  map<int, string> m;
> > +  void f ();
> > +};
> > +
> > +extern A *ga;
> > +
> > +void B::f ()
> > +{
> > +  for (map<int, string>::iterator iter = m.begin (); iter != m.end (); ++iter)
> > +    {
> > +      if (ga->empty)
> > +        ga->set (iter->second);
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> > new file mode 100644
> > index 00000000000..bbd522d6bcd
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> > +
> > +__attribute__((pure)) __attribute__((noinline)) int inc (int i)
> > +{
> > +  return i + 1;
> > +}
> > +
> > +extern int do_something (void);
> > +extern int b;
> > +
> > +void test(int n)
> > +{
> > +  int i;
> > +
> > +  for (i = 0; i < n; i = inc (i))
> > +    {
> > +      if (b)
> > +        b = do_something();
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
> > diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> > index f5f083384bc..e4a1b6d2019 100644
> > --- a/gcc/tree-ssa-loop-split.c
> > +++ b/gcc/tree-ssa-loop-split.c
> > @@ -32,7 +32,10 @@ along with GCC; see the file COPYING3.  If not see
> > #include "tree-ssa-loop.h"
> > #include "tree-ssa-loop-manip.h"
> > #include "tree-into-ssa.h"
> > +#include "tree-inline.h"
> > +#include "tree-cfgcleanup.h"
> > #include "cfgloop.h"
> > +#include "params.h"
> > #include "tree-scalar-evolution.h"
> > #include "gimple-iterator.h"
> > #include "gimple-pretty-print.h"
> > @@ -40,7 +43,9 @@ along with GCC; see the file COPYING3.  If not see
> > #include "gimple-fold.h"
> > #include "gimplify-me.h"
> > 
> > -/* This file implements loop splitting, i.e. transformation of loops like
> > +/* This file implements two kinds of loop splitting.
> > +
> > +   One transformation of loops like:
> > 
> >    for (i = 0; i < 100; i++)
> >      {
> > @@ -612,6 +617,722 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
> >   return changed;
> > }
> > 
> > +/* Another transformation of loops like:
> > +
> > +   for (i = INIT (); CHECK (i); i = NEXT ())
> > +     {
> > +       if (expr (a_1, a_2, ..., a_n))  // expr is pure
> > +         a_j = ...;  // change at least one a_j
> > +       else
> > +         S;          // not change any a_j
> > +     }
> > +
> > +   into:
> > +
> > +   for (i = INIT (); CHECK (i); i = NEXT ())
> > +     {
> > +       if (expr (a_1, a_2, ..., a_n))
> > +         a_j = ...;
> > +       else
> > +         {
> > +           S;
> > +           i = NEXT ();
> > +           break;
> > +         }
> > +     }
> > +
> > +   for (; CHECK (i); i = NEXT ())
> > +     {
> > +       S;
> > +     }
> > +
> > +   */
> > +
> > +/* Data structure to hold temporary information during loop split upon
> > +   semi-invariant conditional statement.  */
> > +class split_info {
> > +public:
> > +  /* Array of all basic blocks in a loop, returned by get_loop_body().  */
> > +  basic_block *bbs;
> > +
> > +  /* All memory store/clobber statements in a loop.  */
> > +  auto_vec<gimple *> memory_stores;
> > +
> > +  /* Whether the above memory stores vector still needs to be filled.  */
> > +  bool need_init;
> > +
> > +  split_info () : bbs (NULL), need_init (true) { }
> > +
> > +  ~split_info ()
> > +    {
> > +      if (bbs)
> > +	free (bbs);
> > +    }
> > +};
> > +
> > +/* Find all statements with memory-write effect in LOOP, including memory
> > +   stores and non-pure function calls, and keep them in a vector.  This is
> > +   done only once, since the vector stays constant during the analysis
> > +   stage of a semi-invariant condition.  */
> > +
> > +static void
> > +find_vdef_in_loop (struct loop *loop)
> > +{
> > +  split_info *info = (split_info *) loop->aux;
> > +  gphi *vphi = get_virtual_phi (loop->header);
> > +
> > +  /* Indicate memory store vector has been filled.  */
> > +  info->need_init = false;
> > +
> > +  /* If loop contains memory operation, there must be a virtual PHI node in
> > +     loop header basic block.  */
> > +  if (vphi == NULL)
> > +    return;
> > +
> > +  /* All virtual SSA names inside the loop are connected to be a cyclic
> > +     graph via virtual PHI nodes.  The virtual PHI node in loop header just
> > +     links the first and the last virtual SSA names, by using the last as
> > +     PHI operand to define the first.  */
> > +  const edge latch = loop_latch_edge (loop);
> > +  const tree first = gimple_phi_result (vphi);
> > +  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
> > +
> > +  /* The virtual SSA cyclic graph might consist of only one SSA name, which
> > +     is defined by itself.
> > +
> > +       .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
> > +
> > +     This means the loop contains only memory loads, so we can skip it.  */
> > +  if (first == last)
> > +    return;
> > +
> > +  auto_vec<gimple *> other_stores;
> > +  auto_vec<tree> worklist;
> > +  auto_bitmap visited;
> > +
> > +  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
> > +  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
> > +  worklist.safe_push (last);
> > +
> > +  do
> > +    {
> > +      tree vuse = worklist.pop ();
> > +      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
> > +
> > +      /* We mark the first and last SSA names as visited at the beginning,
> > +	 and reversely start the process from the last SSA name towards the
> > +	 first, which ensures that this do-while will not touch SSA names
> > +	 defined outside of the loop.  */
> > +      gcc_assert (gimple_bb (stmt)
> > +		  && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
> > +
> > +      if (gimple_code (stmt) == GIMPLE_PHI)
> > +	{
> > +	  gphi *phi = as_a <gphi *> (stmt);
> > +
> > +	  for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> > +	    {
> > +	      tree arg = gimple_phi_arg_def (stmt, i);
> > +
> > +	      if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
> > +		worklist.safe_push (arg);
> > +	    }
> > +	}
> > +      else
> > +	{
> > +	  tree prev = gimple_vuse (stmt);
> > +
> > +	  /* A non-pure call statement is conservatively assumed to impact all
> > +	     memory locations.  So place call statements ahead of other memory
> > +	     stores in the vector, with the idea of using them as shortcut
> > +	     terminators for memory alias analysis.  */
> > +	  if (gimple_code (stmt) == GIMPLE_CALL)
> > +	    info->memory_stores.safe_push (stmt);
> > +	  else
> > +	    other_stores.safe_push (stmt);
> > +
> > +	  if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
> > +	    worklist.safe_push (prev);
> > +	}
> > +    } while (!worklist.is_empty ());
> > +
> > +    info->memory_stores.safe_splice (other_stores);
> > +}
> > +
> > +
> > +/* Given STMT, memory load or pure call statement, check whether it is impacted
> > +   by some memory store in LOOP, excluding trace starting from SKIP_HEAD (the
> > +   trace is composed of SKIP_HEAD and those basic block dominated by it, always
> > +   corresponds to one branch of a conditional statement).  If SKIP_HEAD is
> > +   NULL, all basic blocks of LOOP are checked.  */
> > +
> > +static bool
> > +vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
> > +		       const_basic_block skip_head)
> > +{
> > +  split_info *info = (split_info *) loop->aux;
> > +
> > +  /* Collect memory store/clobber statements if that has not been done.  */
> > +  if (info->need_init)
> > +    find_vdef_in_loop (loop);
> > +
> > +  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
> > +  ao_ref ref;
> > +  gimple *store;
> > +  unsigned i;
> > +
> > +  ao_ref_init (&ref, rhs);
> > +
> > +  FOR_EACH_VEC_ELT (info->memory_stores, i, store)
> > +    {
> > +      /* Skip basic blocks dominated by SKIP_HEAD, if non-NULL.  */
> > +      if (skip_head
> > +	  && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
> > +	continue;
> > +
> > +      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
> > +	return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Forward declaration.  */
> > +
> > +static bool
> > +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> > +		       const_basic_block skip_head);
> > +
> > +/* Suppose one condition branch, led by SKIP_HEAD, is not executed from a
> > +   certain iteration of LOOP onwards, check whether an SSA name (NAME)
> > +   remains unchanged in the next iteration.  We call this characteristic
> > +   semi-invariantness.  SKIP_HEAD might be NULL; if so, nothing is excluded
> > +   and all basic blocks and control flows in the loop are considered.  If
> > +   non-NULL, the SSA name to check must be defined before SKIP_HEAD.  */
> > +
> > +static bool
> > +ssa_semi_invariant_p (struct loop *loop, const tree name,
> > +		      const_basic_block skip_head)
> > +{
> > +  gimple *def = SSA_NAME_DEF_STMT (name);
> > +  const_basic_block def_bb = gimple_bb (def);
> > +
> > +  /* An SSA name defined outside a loop is definitely semi-invariant.  */
> > +  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
> > +    return true;
> > +
> > +  if (gimple_code (def) == GIMPLE_PHI)
> > +    {
> > +      /* For PHI node that is not in loop header, its source operands should
> > +	 be defined inside the loop, which are seen as loop variant.  */
> > +      if (def_bb != loop->header || !skip_head)
> > +	return false;
> > +
> > +      const_edge latch = loop_latch_edge (loop);
> > +      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
> > +
> > +      /* A PHI node in loop header contains two source operands, one is
> > +	 initial value, the other is the copy of last iteration through loop
> > +	 latch, we call it latch value.  From the PHI node to definition
> > +	 of latch value, if excluding branch trace from SKIP_HEAD, there
> > +	 is no definition of other version of same variable, SSA name defined
> > +	 by the PHI node is semi-invariant.
> > +
> > +                         loop entry
> > +                              |     .--- latch ---.
> > +                              |     |             |
> > +                              v     v             |
> > +                  x_1 = PHI <x_0,  x_3>           |
> > +                           |                      |
> > +                           v                      |
> > +              .------- if (cond) -------.         |
> > +              |                         |         |
> > +              |                     [ SKIP ]      |
> > +              |                         |         |
> > +              |                     x_2 = ...     |
> > +              |                         |         |
> > +              '---- T ---->.<---- F ----'         |
> > +                           |                      |
> > +                           v                      |
> > +                  x_3 = PHI <x_1, x_2>            |
> > +                           |                      |
> > +                           '----------------------'
> > +
> > +	Suppose in a certain iteration, execution flow in the above graph goes
> > +	through the true branch, which means that one source value to define
> > +	x_3 in the false branch (x_2) is skipped, x_3 only comes from x_1, and
> > +	x_1 in the next iterations is defined by x_3; we know that x_1 will
> > +	never change if COND always chooses the true branch from then on.  */
> > +
> > +      while (from != name)
> > +	{
> > +	  /* A new value comes from a CONSTANT.  */
> > +	  if (TREE_CODE (from) != SSA_NAME)
> > +	    return false;
> > +
> > +	  gimple *stmt = SSA_NAME_DEF_STMT (from);
> > +	  const_basic_block bb = gimple_bb (stmt);
> > +
> > +	  /* A new value comes from outside of loop.  */
> > +	  if (!bb || !flow_bb_inside_loop_p (loop, bb))
> > +	    return false;
> > +
> > +	  from = NULL_TREE;
> > +
> > +	  if (gimple_code (stmt) == GIMPLE_PHI)
> > +	    {
> > +	      gphi *phi = as_a <gphi *> (stmt);
> > +
> > +	      for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> > +		{
> > +		  const_edge e = gimple_phi_arg_edge (phi, i);
> > +
> > +		  /* Do not consider redefinitions in excluded basic blocks.  */
> > +		  if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
> > +		    {
> > +		      /* If more than one source operand can provide a
> > +			 value to the SSA name, it is variant.  */
> > +		      if (from)
> > +			return false;
> > +
> > +		      from = gimple_phi_arg_def (phi, i);
> > +		    }
> > +		}
> > +	    }
> > +	  else if (gimple_code (stmt) == GIMPLE_ASSIGN)
> > +	    {
> > +	      /* For simple value copy, check its rhs instead.  */
> > +	      if (gimple_assign_ssa_name_copy_p (stmt))
> > +		from = gimple_assign_rhs1 (stmt);
> > +	    }
> > +
> > +	  /* Any other kind of definition is deemed to introduce a new value
> > +	     to the SSA name.  */
> > +	  if (!from)
> > +	    return false;
> > +	}
> > +	return true;
> > +    }
> > +
> > +  /* Value originated from volatile memory load or return of normal (non-
> > +     const/pure) call should not be treated as constant in each iteration.  */
> > +  if (gimple_has_side_effects (def))
> > +    return false;
> > +
> > +  /* Check if any memory store may kill memory load at this place.  */
> > +  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
> > +    return false;
> > +
> > +  /* Check operands of definition statement of the SSA name.  */
> > +  return stmt_semi_invariant_p (loop, def, skip_head);
> > +}
> > +
> > +/* Check whether STMT is semi-invariant in LOOP, iff all its operands are
> > +   semi-invariant.  Trace composed of basic block SKIP_HEAD and basic blocks
> > +   dominated by it are excluded from the loop.  */
> > +
> > +static bool
> > +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> > +		       const_basic_block skip_head)
> > +{
> > +  ssa_op_iter iter;
> > +  tree use;
> > +
> > +  /* Although operand of a statement might be SSA name, CONSTANT or VARDECL,
> > +     here we only need to check SSA name operands.  This is because check on
> > +     VARDECL operands, which involve memory loads, must have been done
> > +     prior to invocation of this function in vuse_semi_invariant_p.  */
> > +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
> > +    {
> > +      if (!ssa_semi_invariant_p (loop, use, skip_head))
> > +	return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* When a conditional statement never transfers execution to one of its
> > +   branches, determine whether we can remove the branch's leading basic block
> > +   (BRANCH_BB) and those basic blocks dominated by BRANCH_BB.  */
> > +
> > +static bool
> > +branch_removable_p (basic_block branch_bb)
> > +{
> > +  if (single_pred_p (branch_bb))
> > +    return true;
> > +
> > +  edge e;
> > +  edge_iterator ei;
> > +
> > +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
> > +    {
> > +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
> > +	continue;
> > +
> > +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
> > +	continue;
> > +
> > +       /* The branch can be reached from opposite branch, or from some
> > +	  statement not dominated by the conditional statement.  */
> > +      return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Find out which branch of a conditional statement (COND) is invariant in the
> > +   execution context of LOOP.  That is: once the branch is selected in certain
> > +   iteration of the loop, any operand that contributes to computation of the
> > +   conditional statement remains unchanged in all following iterations.  */
> > +
> > +static edge
> > +get_cond_invariant_branch (struct loop *loop, gcond *cond)
> > +{
> > +  basic_block cond_bb = gimple_bb (cond);
> > +  basic_block targ_bb[2];
> > +  bool invar[2];
> > +  unsigned invar_checks;
> > +
> > +  for (unsigned i = 0; i < 2; i++)
> > +    {
> > +      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
> > +
> > +      /* One branch directs to loop exit, no need to perform loop split upon
> > +	 this conditional statement.  Firstly, it is trivial if the exit branch
> > +	 is semi-invariant, for the statement is just to break loop.  Secondly,
> > +	 if the opposite branch is semi-invariant, it means that the statement
> > +	 is real loop-invariant, which is covered by loop unswitch.  */
> > +      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
> > +	return NULL;
> > +    }
> > +
> > +  invar_checks = 0;
> > +
> > +  for (unsigned i = 0; i < 2; i++)
> > +    {
> > +      invar[!i] = false;
> > +
> > +      if (!branch_removable_p (targ_bb[i]))
> > +	continue;
> > +
> > +      /* Given a semi-invariant branch, if its opposite branch dominates
> > +	 loop latch, it and its following trace will only be executed in
> > +	 final iteration of loop, namely it is not part of repeated body
> > +	 of the loop.  Similar to the above case that the branch is loop
> > +	 exit, no need to split loop.  */
> > +      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
> > +	continue;
> > +
> > +      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
> > +      invar_checks++;
> > +    }
> > +
> > +  /* Both branches being invariant (handled by loop unswitch) or both
> > +     being variant is not what we want.  */
> > +  if (invar[0] ^ !invar[1])
> > +    return NULL;
> > +
> > +  /* Found a real loop-invariant condition, do nothing.  */
> > +  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
> > +    return NULL;
> > +
> > +  return EDGE_SUCC (cond_bb, (unsigned) invar[1]);
> > +}
> > +
> > +/* Calculate increased code size measured by estimated insn number if applying
> > +   loop split upon certain branch (BRANCH_EDGE) of a conditional statement.  */
> > +
> > +static int
> > +compute_added_num_insns (struct loop *loop, const_edge branch_edge)
> > +{
> > +  basic_block cond_bb = branch_edge->src;
> > +  unsigned branch = EDGE_SUCC (cond_bb, 1) == branch_edge;
> > +  basic_block opposite_bb = EDGE_SUCC (cond_bb, !branch)->dest;
> > +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> > +  int num = 0;
> > +
> > +  for (unsigned i = 0; i < loop->num_nodes; i++)
> > +    {
> > +      /* Do not count basic blocks only in opposite branch.  */
> > +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], opposite_bb))
> > +	continue;
> > +
> > +      num += estimate_num_insns_seq (bb_seq (bbs[i]), &eni_size_weights);
> > +    }
> > +
> > +  /* It is unnecessary to evaluate expression of the conditional statement
> > +     in new loop that contains only invariant branch.  This expression should
> > +     be constant value (either true or false).  Exclude code size of insns
> > +     that contribute to computation of the expression.  */
> > +
> > +  auto_vec<gimple *> worklist;
> > +  hash_set<gimple *> removed;
> > +  gimple *stmt = last_stmt (cond_bb);
> > +
> > +  worklist.safe_push (stmt);
> > +  removed.add (stmt);
> > +  num -= estimate_num_insns (stmt, &eni_size_weights);
> > +
> > +  do
> > +    {
> > +      ssa_op_iter opnd_iter;
> > +      use_operand_p opnd_p;
> > +
> > +      stmt = worklist.pop ();
> > +      FOR_EACH_PHI_OR_STMT_USE (opnd_p, stmt, opnd_iter, SSA_OP_USE)
> > +	{
> > +	  tree opnd = USE_FROM_PTR (opnd_p);
> > +
> > +	  if (TREE_CODE (opnd) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (opnd))
> > +	    continue;
> > +
> > +	  gimple *opnd_stmt = SSA_NAME_DEF_STMT (opnd);
> > +	  use_operand_p use_p;
> > +	  imm_use_iterator use_iter;
> > +
> > +	  if (removed.contains (opnd_stmt)
> > +	      || !flow_bb_inside_loop_p (loop, gimple_bb (opnd_stmt)))
> > +	    continue;
> > +
> > +	  FOR_EACH_IMM_USE_FAST (use_p, use_iter, opnd)
> > +	    {
> > +              gimple *use_stmt = USE_STMT (use_p);
> > +
> > +	      if (!is_gimple_debug (use_stmt) && !removed.contains (use_stmt))
> > +		{
> > +		  opnd_stmt = NULL;
> > +		  break;
> > +		}
> > +	    }
> > +
> > +	  if (opnd_stmt)
> > +	    {
> > +	      worklist.safe_push (opnd_stmt);
> > +	      removed.add (opnd_stmt);
> > +	      num -= estimate_num_insns (opnd_stmt, &eni_size_weights);
> > +	    }
> > +	}
> > +    } while (!worklist.is_empty ());
> > +
> > +  gcc_assert (num >= 0);
> > +  return num;
> > +}
> > +
> > +/* Find the loop-invariant branch of a conditional statement (COND), if it
> > +   has one, and check whether it is eligible and profitable to perform loop
> > +   split upon this branch in LOOP.  */
> > +
> > +static edge
> > +get_cond_branch_to_split_loop (struct loop *loop, gcond *cond)
> > +{
> > +  edge invar_branch = get_cond_invariant_branch (loop, cond);
> > +
> > +  if (!invar_branch)
> > +    return NULL;
> > +
> > +  profile_probability prob = invar_branch->probability;
> > +
> > +  /* When accurate profile information is available, and execution
> > +     frequency of the branch is too low, just let it go.  */
> > +  if (prob.reliable_p ())
> > +    {
> > +      int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB);
> > +
> > +      if (prob < profile_probability::always ().apply_scale (thres, 100))
> > +	return NULL;
> > +    }
> > +
> > +  /* Add a threshold for increased code size to disable loop split.  */
> > +  if (compute_added_num_insns (loop, invar_branch)
> > +      > PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))
> > +    return NULL;
> > +
> > +  return invar_branch;
> > +}
> > +
> > +/* Given a loop (LOOP1) with a loop-invariant branch (INVAR_BRANCH) of some
> > +   conditional statement, perform loop split transformation illustrated
> > +   as the following graph.
> > +
> > +               .-------T------ if (true) ------F------.
> > +               |                    .---------------. |
> > +               |                    |               | |
> > +               v                    |               v v
> > +          pre-header                |            pre-header
> > +               | .------------.     |                 | .------------.
> > +               | |            |     |                 | |            |
> > +               | v            |     |                 | v            |
> > +             header           |     |               header           |
> > +               |              |     |                 |              |
> > +       [ bool r = cond; ]     |     |                 |              |
> > +               |              |     |                 |              |
> > +      .---- if (r) -----.     |     |        .--- if (true) ---.     |
> > +      |                 |     |     |        |                 |     |
> > +  invariant             |     |     |    invariant             |     |
> > +      |                 |     |     |        |                 |     |
> > +      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
> > +               |              |    /                  |              |
> > +             stmts            |   /                 stmts            |
> > +               |              |  /                    |              |
> > +              / \             | /                    / \             |
> > +     .-------*   *       [ if (!r) ]        .-------*   *            |
> > +     |           |            |             |           |            |
> > +     |         latch          |             |         latch          |
> > +     |           |            |             |           |            |
> > +     |           '------------'             |           '------------'
> > +     '------------------------. .-----------'
> > +             loop1            | |                   loop2
> > +                              v v
> > +                             exits
> > +
> > +   In the graph, loop1 represents the part derived from the original one, and
> > +   loop2 is duplicated using loop_version (), which corresponds to the part
> > +   of the original one being split out.  In loop1, a new bool temporary (r)
> > +   is introduced to keep value of the condition result.  In original latch
> > +   edge of loop1, we insert a new conditional statement whose value comes
> > +   from previous temporary (r), one of its branches goes back to loop1 header
> > +   as a latch edge, and the other branch goes to loop2 pre-header as an entry
> > +   edge.  And also in loop2, we abandon the variant branch of the conditional
> > +   statement candidate by setting a constant bool condition, based on which
> > +   branch is semi-invariant.  */
> > +
> > +static bool
> > +do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
> > +{
> > +  basic_block cond_bb = invar_branch->src;
> > +  bool true_invar = !!(invar_branch->flags & EDGE_TRUE_VALUE);
> > +  gcond *cond = as_a <gcond *> (last_stmt (cond_bb));
> > +
> > +  gcc_assert (cond_bb->loop_father == loop1);
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +   {
> > +     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
> > +	      current_function_name (), loop1->num,
> > +	      true_invar ? "T" : "F", cond_bb->index);
> > +     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
> > +   }
> > +
> > +  initialize_original_copy_tables ();
> > +
> > +  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> > +				     profile_probability::always (),
> > +				     profile_probability::never (),
> > +				     profile_probability::always (),
> > +				     profile_probability::always (),
> > +				     true);
> > +  if (!loop2)
> > +    {
> > +      free_original_copy_tables ();
> > +      return false;
> > +    }
> > +
> > +  /* Generate a bool type temporary to hold result of the condition.  */
> > +  tree tmp = make_ssa_name (boolean_type_node);
> > +  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> > +  gimple *stmt = gimple_build_assign (tmp,
> > +				      gimple_cond_code (cond),
> > +				      gimple_cond_lhs (cond),
> > +				      gimple_cond_rhs (cond));
> > +
> > +  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
> > +  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
> > +  update_stmt (cond);
> > +
> > +  basic_block cond_bb_copy = get_bb_copy (cond_bb);
> > +  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
> > +
> > +  /* Replace the condition in loop2 with a bool constant to let PassManager
> > +     remove the variant branch after current pass completes.  */
> > +  if (true_invar)
> > +    gimple_cond_make_true (cond_copy);
> > +  else
> > +    gimple_cond_make_false (cond_copy);
> > +
> > +  update_stmt (cond_copy);
> > +
> > +  /* Insert a new conditional statement on latch edge of loop1.  This
> > +     statement acts as a switch to transfer execution from loop1 to loop2,
> > +     when loop1 enters into invariant state.  */
> > +  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
> > +  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
> > +  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
> > +					  NULL_TREE, NULL_TREE);
> > +
> > +  gsi = gsi_last_bb (break_bb);
> > +  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
> > +
> > +  edge to_loop1 = single_succ_edge (break_bb);
> > +  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
> > +
> > +  to_loop1->flags &= ~EDGE_FALLTHRU;
> > +  to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
> > +  to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
> > +
> > +  update_ssa (TODO_update_ssa);
> > +
> > +  /* Due to introduction of a control flow edge from loop1 latch to loop2
> > +     pre-header, we should update PHIs in loop2 to reflect this connection
> > +     between loop1 and loop2.  */
> > +  connect_loop_phis (loop1, loop2, to_loop2);
> > +
> > +  free_original_copy_tables ();
> > +
> > +  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
> > +
> > +  return true;
> > +}
> > +
> > +/* Traverse all conditional statements in LOOP, to find out a good candidate
> > +   upon which we can do loop split.  */
> > +
> > +static bool
> > +split_loop_on_cond (struct loop *loop)
> > +{
> > +  split_info *info = new split_info ();
> > +  basic_block *bbs = info->bbs = get_loop_body (loop);
> > +  bool do_split = false;
> > +
> > +  /* Allocate an area to keep temporary info, and associate its address
> > +     with loop aux field.  */
> > +  loop->aux = info;
> > +
> > +  for (unsigned i = 0; i < loop->num_nodes; i++)
> > +    {
> > +      basic_block bb = bbs[i];
> > +
> > +      /* Only consider conditional statements executed at most once in each
> > +	 iteration of the loop.  So skip statements in inner loops.  */
> > +      if ((bb->loop_father != loop) || (bb->flags & BB_IRREDUCIBLE_LOOP))
> > +	continue;
> > +
> > +      /* This check is not strictly required.  With it, we can ensure that
> > +	 the conditional statement is executed in every iteration.  */
> > +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> > +	continue;
> > +
> > +      gimple *last = last_stmt (bb);
> > +
> > +      if (!last || gimple_code (last) != GIMPLE_COND)
> > +	continue;
> > +
> > +      gcond *cond = as_a <gcond *> (last);
> > +      edge branch_edge = get_cond_branch_to_split_loop (loop, cond);
> > +
> > +      if (branch_edge)
> > +	{
> > +	  do_split_loop_on_cond (loop, branch_edge);
> > +	  do_split = true;
> > +	  break;
> > +	}
> > +    }
> > +
> > +  delete info;
> > +  loop->aux = NULL;
> > +
> > +  return do_split;
> > +}
> > +
> > /* Main entry point.  Perform loop splitting on all suitable loops.  */
> > 
> > static unsigned int
> > @@ -662,6 +1383,32 @@ tree_ssa_split_loops (void)
> > 	}
> >     }
> > 
> > +  if (changed)
> > +    {
> > +      cleanup_tree_cfg ();
> > +      changed = false;
> > +    }
> > +
> > +  /* Perform loop splitting for suitable if-conditions in all loops.  */
> > +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> > +    loop->aux = NULL;
> > +
> > +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> > +    {
> > +      if (loop->aux)
> > +        {
> > +	  loop_outer (loop)->aux = loop;
> > +	  continue;
> > +	}
> > +
> > +      if (!optimize_loop_for_size_p (loop)
> > +	  && split_loop_on_cond (loop))
> > +	{
> > +	  loop_outer (loop)->aux = loop;
> > +	  changed = true;
> > +	}
> > +    }
> > +
> >   FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> >     loop->aux = NULL;
> > 
> > -- 
> > 2.17.1
> > 
> 
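
For reference, the source-level effect of the split quoted above can be
sketched in plain C.  This is only an illustration under stated assumptions:
do_something () here is a hypothetical stand-in that returns nonzero for a
limited number of calls, and run_original (), run_split () and check_forms ()
are names made up for this sketch, not part of the patch.  The pass itself
works on GIMPLE rather than on source, so this shows the intent, not the
implementation.

```c
#include <assert.h>

/* Hypothetical stand-in for do_something () in the testcase: returns
   nonzero until the call budget is used up, then 0.  */
static int budget;

static int
do_something (void)
{
  return --budget > 0;
}

/* Original form: 'if (b)' is tested in every iteration, although b can
   never become nonzero again once the else branch has been taken.  */
static long
run_original (int n, int b, int calls)
{
  long sum = 0;

  budget = calls;
  for (int i = 0; i < n; i++)
    {
      if (b)
        b = do_something ();   /* May change b.  */
      else
        sum += i;              /* S: does not change b.  */
    }
  return sum;
}

/* Hand-split form: the first time the else branch is taken, finish that
   iteration and continue in a second loop that no longer tests b.  */
static long
run_split (int n, int b, int calls)
{
  long sum = 0;
  int i;

  budget = calls;
  for (i = 0; i < n; i++)
    {
      if (b)
        b = do_something ();
      else
        {
          sum += i;
          i++;                 /* The 'break' skips the loop's own i++.  */
          break;
        }
    }

  for (; i < n; i++)
    sum += i;                  /* Only S remains.  */

  return sum;
}

/* Abort if the two forms ever disagree for the given inputs.  */
static void
check_forms (int n, int b, int calls)
{
  assert (run_original (n, b, calls) == run_split (n, b, calls));
}
```

With n = 10, b = 1 and a budget of three calls, both forms sum i = 3..9 and
return 42.  The second loop of the split form contains only S, which is what
lets later passes simplify or vectorize it.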


* Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-10-15 16:01                 ` Philipp Tomsich
  2019-10-15 16:06                   ` Michael Matz
@ 2019-10-16  2:00                   ` Feng Xue OS
  1 sibling, 0 replies; 31+ messages in thread
From: Feng Xue OS @ 2019-10-16  2:00 UTC (permalink / raw)
  To: Philipp Tomsich
  Cc: Michael Matz, Richard Biener, gcc-patches,
	Christoph Müllner, erick.ochoa

Hi Philipp,

   This is an updated patch based on comments from Michael; if he thinks it is OK, we will merge it into trunk. Thanks,

Feng

________________________________________
From: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
Sent: Tuesday, October 15, 2019 11:49 PM
To: Feng Xue OS
Cc: Michael Matz; Richard Biener; gcc-patches@gcc.gnu.org; Christoph Müllner; erick.ochoa@theobroma-systems.com
Subject: Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)

Feng,

This looks good from our side and has proven useful (combined with the other 2 patches) in
our testing with SPEC2017.
Given that this looks final: what is the plan for getting this merged?

Thanks,
Philipp.

> On 12.09.2019, at 12:23, Feng Xue OS <fxue at os dot amperecomputing dot com> wrote:
>
> ---
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 1391a562c35..28981fa1048 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11418,6 +11418,19 @@ The maximum number of branches unswitched in a single loop.
> @item lim-expensive
> The minimum cost of an expensive expression in the loop invariant motion.
>
> +@item max-cond-loop-split-insns
> +In a loop, a conditional statement is semi-invariant if, once one of its
> +branches is selected in some iteration, every operand that contributes to
> +the conditional expression remains unchanged in all following iterations.
> +A kind of loop split transformation can be applied to such a statement.
> +@option{max-cond-loop-split-insns} controls the maximum number of insns
> +added by loop split on a semi-invariant conditional statement.
> +
> +@item min-cond-loop-split-prob
> +When FDO profile information is available, @option{min-cond-loop-split-prob}
> +specifies the minimum probability threshold that a semi-invariant condition
> +statement must reach to trigger loop split.
> +
> @item iv-consider-all-candidates-bound
> Bound on number of candidates for induction variables, below which
> all candidates are considered for each use in induction variable
> diff --git a/gcc/params.def b/gcc/params.def
> index 13001a7bb2d..12bc8c26c9e 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
>       "The maximum number of unswitchings in a single loop.",
>       3, 0, 0)
>
> +/* The maximum number of increased insns due to loop split on semi-invariant
> +   condition statement.  */
> +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> +     "max-cond-loop-split-insns",
> +     "The maximum number of insns to be added due to loop split on "
> +     "semi-invariant condition statement.",
> +     100, 0, 0)
> +
> +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> +     "min-cond-loop-split-prob",
> +     "The minimum threshold for probability of semi-invariant condition "
> +     "statement to trigger loop split.",
> +     30, 0, 100)
> +
> /* The maximum number of insns in loop header duplicated by the copy loop
>    headers pass.  */
> DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
>
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> new file mode 100644
> index 00000000000..51f9da22fc7
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> +
> +#include <string>
> +#include <map>
> +
> +using namespace std;
> +
> +class  A
> +{
> +public:
> +  bool empty;
> +  void set (string s);
> +};
> +
> +class  B
> +{
> +  map<int, string> m;
> +  void f ();
> +};
> +
> +extern A *ga;
> +
> +void B::f ()
> +{
> +  for (map<int, string>::iterator iter = m.begin (); iter != m.end (); ++iter)
> +    {
> +      if (ga->empty)
> +        ga->set (iter->second);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> new file mode 100644
> index 00000000000..bbd522d6bcd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> +
> +__attribute__((pure)) __attribute__((noinline)) int inc (int i)
> +{
> +  return i + 1;
> +}
> +
> +extern int do_something (void);
> +extern int b;
> +
> +void test(int n)
> +{
> +  int i;
> +
> +  for (i = 0; i < n; i = inc (i))
> +    {
> +      if (b)
> +        b = do_something();
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index f5f083384bc..e4a1b6d2019 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -32,7 +32,10 @@ along with GCC; see the file COPYING3.  If not see
> #include "tree-ssa-loop.h"
> #include "tree-ssa-loop-manip.h"
> #include "tree-into-ssa.h"
> +#include "tree-inline.h"
> +#include "tree-cfgcleanup.h"
> #include "cfgloop.h"
> +#include "params.h"
> #include "tree-scalar-evolution.h"
> #include "gimple-iterator.h"
> #include "gimple-pretty-print.h"
> @@ -40,7 +43,9 @@ along with GCC; see the file COPYING3.  If not see
> #include "gimple-fold.h"
> #include "gimplify-me.h"
>
> -/* This file implements loop splitting, i.e. transformation of loops like
> +/* This file implements two kinds of loop splitting.
> +
> +   One transformation of loops like:
>
>    for (i = 0; i < 100; i++)
>      {
> @@ -612,6 +617,722 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
>   return changed;
> }
>
> +/* Another transformation of loops like:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))  // expr is pure
> +         a_j = ...;  // change at least one a_j
> +       else
> +         S;          // not change any a_j
> +     }
> +
> +   into:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))
> +         a_j = ...;
> +       else
> +         {
> +           S;
> +           i = NEXT ();
> +           break;
> +         }
> +     }
> +
> +   for (; CHECK (i); i = NEXT ())
> +     {
> +       S;
> +     }
> +
> +   */
> +
> +/* Data structure to hold temporary information during loop split upon
> +   semi-invariant conditional statement.  */
> +class split_info {
> +public:
> +  /* Array of all basic blocks in a loop, returned by get_loop_body().  */
> +  basic_block *bbs;
> +
> +  /* All memory store/clobber statements in a loop.  */
> +  auto_vec<gimple *> memory_stores;
> +
> +  /* Whether the memory stores vector above still needs to be filled.  */
> +  int need_init;
> +
> +  split_info () : bbs (NULL),  need_init (true) { }
> +
> +  ~split_info ()
> +    {
> +      if (bbs)
> +     free (bbs);
> +    }
> +};
> +
> +/* Find all statements with memory-write effect in LOOP, including memory
> +   store and non-pure function call, and keep those in a vector.  This work
> +   is only done one time, for the vector should be constant during analysis
> +   stage of semi-invariant condition.  */
> +
> +static void
> +find_vdef_in_loop (struct loop *loop)
> +{
> +  split_info *info = (split_info *) loop->aux;
> +  gphi *vphi = get_virtual_phi (loop->header);
> +
> +  /* Indicate memory store vector has been filled.  */
> +  info->need_init = false;
> +
> +  /* If loop contains memory operation, there must be a virtual PHI node in
> +     loop header basic block.  */
> +  if (vphi == NULL)
> +    return;
> +
> +  /* All virtual SSA names inside the loop are connected to be a cyclic
> +     graph via virtual PHI nodes.  The virtual PHI node in loop header just
> +     links the first and the last virtual SSA names, by using the last as
> +     PHI operand to define the first.  */
> +  const edge latch = loop_latch_edge (loop);
> +  const tree first = gimple_phi_result (vphi);
> +  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
> +
> +  /* The virtual SSA cyclic graph might consist of only one SSA name, which
> +     is defined by itself.
> +
> +       .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
> +
> +     This means the loop contains only memory loads, so we can skip it.  */
> +  if (first == last)
> +    return;
> +
> +  auto_vec<gimple *> other_stores;
> +  auto_vec<tree> worklist;
> +  auto_bitmap visited;
> +
> +  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
> +  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
> +  worklist.safe_push (last);
> +
> +  do
> +    {
> +      tree vuse = worklist.pop ();
> +      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
> +
> +      /* We mark the first and last SSA names as visited at the beginning,
> +      and walk backwards from the last SSA name towards the first, which
> +      ensures that this do-while will not touch SSA names defined outside
> +      of the loop.  */
> +      gcc_assert (gimple_bb (stmt)
> +               && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
> +
> +      if (gimple_code (stmt) == GIMPLE_PHI)
> +     {
> +       gphi *phi = as_a <gphi *> (stmt);
> +
> +       for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> +         {
> +           tree arg = gimple_phi_arg_def (stmt, i);
> +
> +           if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
> +             worklist.safe_push (arg);
> +         }
> +     }
> +      else
> +     {
> +       tree prev = gimple_vuse (stmt);
> +
> +       /* Non-pure call statement is conservatively assumed to impact all
> +          memory locations.  So place call statements ahead of other memory
> +          stores in the vector with the idea of using them as shortcut
> +          terminators to memory alias analysis.  */
> +       if (gimple_code (stmt) == GIMPLE_CALL)
> +         info->memory_stores.safe_push (stmt);
> +       else
> +         other_stores.safe_push (stmt);
> +
> +       if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
> +         worklist.safe_push (prev);
> +     }
> +    } while (!worklist.is_empty ());
> +
> +    info->memory_stores.safe_splice (other_stores);
> +}
> +
> +
> +/* Given STMT, a memory load or pure call statement, check whether it is
> +   impacted by some memory store in LOOP, excluding the trace starting from
> +   SKIP_HEAD (the trace is composed of SKIP_HEAD and those basic blocks
> +   dominated by it, and always corresponds to one branch of a conditional
> +   statement).  If SKIP_HEAD is NULL, all basic blocks of LOOP are checked.  */
> +
> +static bool
> +vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                    const_basic_block skip_head)
> +{
> +  split_info *info = (split_info *) loop->aux;
> +
> +  /* Collect memory store/clobber statements if that has not been done.  */
> +  if (info->need_init)
> +    find_vdef_in_loop (loop);
> +
> +  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
> +  ao_ref ref;
> +  gimple *store;
> +  unsigned i;
> +
> +  ao_ref_init (&ref, rhs);
> +
> +  FOR_EACH_VEC_ELT (info->memory_stores, i, store)
> +    {
> +      /* Skip basic blocks dominated by SKIP_HEAD, if non-NULL.  */
> +      if (skip_head
> +       && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
> +     continue;
> +
> +      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
> +     return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Forward declaration.  */
> +
> +static bool
> +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                    const_basic_block skip_head);
> +
> +/* Suppose one condition branch, led by SKIP_HEAD, is no longer executed from
> +   a certain iteration of LOOP onwards; check whether an SSA name (NAME)
> +   remains unchanged in the next iteration.  We call this characteristic
> +   semi-invariantness.  SKIP_HEAD might be NULL; if so, nothing is excluded
> +   and all basic blocks and control flows in the loop are considered.  If
> +   non-NULL, NAME is supposed to be defined before SKIP_HEAD.  */
> +
> +static bool
> +ssa_semi_invariant_p (struct loop *loop, const tree name,
> +                   const_basic_block skip_head)
> +{
> +  gimple *def = SSA_NAME_DEF_STMT (name);
> +  const_basic_block def_bb = gimple_bb (def);
> +
> +  /* An SSA name defined outside a loop is definitely semi-invariant.  */
> +  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
> +    return true;
> +
> +  if (gimple_code (def) == GIMPLE_PHI)
> +    {
> +      /* For PHI node that is not in loop header, its source operands should
> +      be defined inside the loop, which are seen as loop variant.  */
> +      if (def_bb != loop->header || !skip_head)
> +     return false;
> +
> +      const_edge latch = loop_latch_edge (loop);
> +      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
> +
> +      /* A PHI node in loop header contains two source operands, one is
> +      initial value, the other is the copy of last iteration through loop
> +      latch, we call it latch value.  From the PHI node to definition
> +      of latch value, if excluding branch trace from SKIP_HEAD, there
> +      is no definition of other version of same variable, SSA name defined
> +      by the PHI node is semi-invariant.
> +
> +                         loop entry
> +                              |     .--- latch ---.
> +                              |     |             |
> +                              v     v             |
> +                  x_1 = PHI <x_0,  x_3>           |
> +                           |                      |
> +                           v                      |
> +              .------- if (cond) -------.         |
> +              |                         |         |
> +              |                     [ SKIP ]      |
> +              |                         |         |
> +              |                     x_2 = ...     |
> +              |                         |         |
> +              '---- T ---->.<---- F ----'         |
> +                           |                      |
> +                           v                      |
> +                  x_3 = PHI <x_1, x_2>            |
> +                           |                      |
> +                           '----------------------'
> +
> +     Suppose in certain iteration, execution flow in above graph goes
> +     through true branch, which means that one source value to define
> +     x_3 in false branch (x_2) is skipped, x_3 only comes from x_1, and
> +     x_1 in next iterations is defined by x_3, we know that x_1 will
> +     never change if COND always chooses true branch from then on.  */
> +
> +      while (from != name)
> +     {
> +       /* A new value comes from a CONSTANT.  */
> +       if (TREE_CODE (from) != SSA_NAME)
> +         return false;
> +
> +       gimple *stmt = SSA_NAME_DEF_STMT (from);
> +       const_basic_block bb = gimple_bb (stmt);
> +
> +       /* A new value comes from outside of loop.  */
> +       if (!bb || !flow_bb_inside_loop_p (loop, bb))
> +         return false;
> +
> +       from = NULL_TREE;
> +
> +       if (gimple_code (stmt) == GIMPLE_PHI)
> +         {
> +           gphi *phi = as_a <gphi *> (stmt);
> +
> +           for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> +             {
> +               const_edge e = gimple_phi_arg_edge (phi, i);
> +
> +               /* Do not consider redefinitions in excluded basic blocks.  */
> +               if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
> +                 {
> +                   /* More than one source operand can provide a value
> +                      to the SSA name, so it is variant.  */
> +                   if (from)
> +                     return false;
> +
> +                   from = gimple_phi_arg_def (phi, i);
> +                 }
> +             }
> +         }
> +       else if (gimple_code (stmt) == GIMPLE_ASSIGN)
> +         {
> +           /* For simple value copy, check its rhs instead.  */
> +           if (gimple_assign_ssa_name_copy_p (stmt))
> +             from = gimple_assign_rhs1 (stmt);
> +         }
> +
> +       /* Any other kind of definition is deemed to introduce a new value
> +          to the SSA name.  */
> +       if (!from)
> +         return false;
> +     }
> +     return true;
> +    }
> +
> +  /* Value originated from volatile memory load or return of normal (non-
> +     const/pure) call should not be treated as constant in each iteration.  */
> +  if (gimple_has_side_effects (def))
> +    return false;
> +
> +  /* Check if any memory store may kill memory load at this place.  */
> +  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
> +    return false;
> +
> +  /* Check operands of definition statement of the SSA name.  */
> +  return stmt_semi_invariant_p (loop, def, skip_head);
> +}
> +
> +/* Check whether STMT is semi-invariant in LOOP, iff all its operands are
> +   semi-invariant.  Trace composed of basic block SKIP_HEAD and basic blocks
> +   dominated by it are excluded from the loop.  */
> +
> +static bool
> +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                    const_basic_block skip_head)
> +{
> +  ssa_op_iter iter;
> +  tree use;
> +
> +  /* Although operand of a statement might be SSA name, CONSTANT or VARDECL,
> +     here we only need to check SSA name operands.  This is because check on
> +     VARDECL operands, which involve memory loads, must have been done
> +     prior to invocation of this function in vuse_semi_invariant_p.  */
> +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
> +    {
> +      if (!ssa_semi_invariant_p (loop, use, skip_head))
> +     return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Determine, when a conditional statement never transfers execution to one
> +   of its branches, whether we can remove the branch's leading basic block
> +   (BRANCH_BB) and those basic blocks dominated by BRANCH_BB.  */
> +
> +static bool
> +branch_removable_p (basic_block branch_bb)
> +{
> +  if (single_pred_p (branch_bb))
> +    return true;
> +
> +  edge e;
> +  edge_iterator ei;
> +
> +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
> +    {
> +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
> +     continue;
> +
> +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
> +     continue;
> +
> +       /* The branch can be reached from opposite branch, or from some
> +       statement not dominated by the conditional statement.  */
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Find out which branch of a conditional statement (COND) is invariant in the
> +   execution context of LOOP.  That is: once the branch is selected in certain
> +   iteration of the loop, any operand that contributes to computation of the
> +   conditional statement remains unchanged in all following iterations.  */
> +
> +static edge
> +get_cond_invariant_branch (struct loop *loop, gcond *cond)
> +{
> +  basic_block cond_bb = gimple_bb (cond);
> +  basic_block targ_bb[2];
> +  bool invar[2];
> +  unsigned invar_checks;
> +
> +  for (unsigned i = 0; i < 2; i++)
> +    {
> +      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
> +
> +      /* One branch directs to loop exit, no need to perform loop split upon
> +      this conditional statement.  Firstly, it is trivial if the exit branch
> +      is semi-invariant, for the statement is just to break loop.  Secondly,
> +      if the opposite branch is semi-invariant, it means that the statement
> +      is really loop-invariant, which is covered by loop unswitch.  */
> +      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
> +     return NULL;
> +    }
> +
> +  invar_checks = 0;
> +
> +  for (unsigned i = 0; i < 2; i++)
> +    {
> +      invar[!i] = false;
> +
> +      if (!branch_removable_p (targ_bb[i]))
> +     continue;
> +
> +      /* Given a semi-invariant branch, if its opposite branch dominates
> +      loop latch, it and its following trace will only be executed in
> +      final iteration of loop, namely it is not part of repeated body
> +      of the loop.  Similar to the above case that the branch is loop
> +      exit, no need to split loop.  */
> +      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
> +     continue;
> +
> +      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
> +      invar_checks++;
> +    }
> +
> +  /* Both branches being invariant (handled by loop unswitch) or both being
> +     variant is not what we want.  */
> +  if (invar[0] ^ !invar[1])
> +    return NULL;
> +
> +  /* Found a real loop-invariant condition, do nothing.  */
> +  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
> +    return NULL;
> +
> +  return EDGE_SUCC (cond_bb, (unsigned) invar[1]);
> +}
> +
> +/* Calculate increased code size measured by estimated insn number if applying
> +   loop split upon certain branch (BRANCH_EDGE) of a conditional statement.  */
> +
> +static int
> +compute_added_num_insns (struct loop *loop, const_edge branch_edge)
> +{
> +  basic_block cond_bb = branch_edge->src;
> +  unsigned branch = EDGE_SUCC (cond_bb, 1) == branch_edge;
> +  basic_block opposite_bb = EDGE_SUCC (cond_bb, !branch)->dest;
> +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> +  int num = 0;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    {
> +      /* Do not count basic blocks only in opposite branch.  */
> +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], opposite_bb))
> +     continue;
> +
> +      num += estimate_num_insns_seq (bb_seq (bbs[i]), &eni_size_weights);
> +    }
> +
> +  /* It is unnecessary to evaluate expression of the conditional statement
> +     in new loop that contains only invariant branch.  This expression should
> +     be constant value (either true or false).  Exclude code size of insns
> +     that contribute to computation of the expression.  */
> +
> +  auto_vec<gimple *> worklist;
> +  hash_set<gimple *> removed;
> +  gimple *stmt = last_stmt (cond_bb);
> +
> +  worklist.safe_push (stmt);
> +  removed.add (stmt);
> +  num -= estimate_num_insns (stmt, &eni_size_weights);
> +
> +  do
> +    {
> +      ssa_op_iter opnd_iter;
> +      use_operand_p opnd_p;
> +
> +      stmt = worklist.pop ();
> +      FOR_EACH_PHI_OR_STMT_USE (opnd_p, stmt, opnd_iter, SSA_OP_USE)
> +     {
> +       tree opnd = USE_FROM_PTR (opnd_p);
> +
> +       if (TREE_CODE (opnd) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (opnd))
> +         continue;
> +
> +       gimple *opnd_stmt = SSA_NAME_DEF_STMT (opnd);
> +       use_operand_p use_p;
> +       imm_use_iterator use_iter;
> +
> +       if (removed.contains (opnd_stmt)
> +           || !flow_bb_inside_loop_p (loop, gimple_bb (opnd_stmt)))
> +         continue;
> +
> +       FOR_EACH_IMM_USE_FAST (use_p, use_iter, opnd)
> +         {
> +              gimple *use_stmt = USE_STMT (use_p);
> +
> +           if (!is_gimple_debug (use_stmt) && !removed.contains (use_stmt))
> +             {
> +               opnd_stmt = NULL;
> +               break;
> +             }
> +         }
> +
> +       if (opnd_stmt)
> +         {
> +           worklist.safe_push (opnd_stmt);
> +           removed.add (opnd_stmt);
> +           num -= estimate_num_insns (opnd_stmt, &eni_size_weights);
> +         }
> +     }
> +    } while (!worklist.is_empty ());
> +
> +  gcc_assert (num >= 0);
> +  return num;
> +}
> +
> +/* Find out loop-invariant branch of a conditional statement (COND) if it has,
> +   and check whether it is eligible and profitable to perform loop split upon
> +   this branch in LOOP.  */
> +
> +static edge
> +get_cond_branch_to_split_loop (struct loop *loop, gcond *cond)
> +{
> +  edge invar_branch = get_cond_invariant_branch (loop, cond);
> +
> +  if (!invar_branch)
> +    return NULL;
> +
> +  profile_probability prob = invar_branch->probability;
> +
> +  /* When accurate profile information is available, and execution
> +     frequency of the branch is too low, just let it go.  */
> +  if (prob.reliable_p ())
> +    {
> +      int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB);
> +
> +      if (prob < profile_probability::always ().apply_scale (thres, 100))
> +     return NULL;
> +    }
> +
> +  /* Add a threshold for increased code size to disable loop split.  */
> +  if (compute_added_num_insns (loop, invar_branch)
> +      > PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))
> +    return NULL;
> +
> +  return invar_branch;
> +}
> +
> +/* Given a loop (LOOP1) with a loop-invariant branch (INVAR_BRANCH) of some
> +   conditional statement, perform loop split transformation illustrated
> +   as the following graph.
> +
> +               .-------T------ if (true) ------F------.
> +               |                    .---------------. |
> +               |                    |               | |
> +               v                    |               v v
> +          pre-header                |            pre-header
> +               | .------------.     |                 | .------------.
> +               | |            |     |                 | |            |
> +               | v            |     |                 | v            |
> +             header           |     |               header           |
> +               |              |     |                 |              |
> +       [ bool r = cond; ]     |     |                 |              |
> +               |              |     |                 |              |
> +      .---- if (r) -----.     |     |        .--- if (true) ---.     |
> +      |                 |     |     |        |                 |     |
> +  invariant             |     |     |    invariant             |     |
> +      |                 |     |     |        |                 |     |
> +      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
> +               |              |    /                  |              |
> +             stmts            |   /                 stmts            |
> +               |              |  /                    |              |
> +              / \             | /                    / \             |
> +     .-------*   *       [ if (!r) ]        .-------*   *            |
> +     |           |            |             |           |            |
> +     |         latch          |             |         latch          |
> +     |           |            |             |           |            |
> +     |           '------------'             |           '------------'
> +     '------------------------. .-----------'
> +             loop1            | |                   loop2
> +                              v v
> +                             exits
> +
> +   In the graph, loop1 represents the part derived from the original loop,
> +   and loop2 is duplicated using loop_version (), which corresponds to the
> +   part of the original loop being split out.  In loop1, a new bool temporary
> +   (r) is introduced to keep the value of the condition result.  On the
> +   original latch edge of loop1, we insert a new conditional statement whose
> +   value comes from the temporary (r); one of its branches goes back to the
> +   loop1 header as a latch edge, and the other goes to the loop2 pre-header
> +   as an entry edge.  In loop2, we abandon the variant branch of the
> +   conditional statement candidate by setting a constant bool condition,
> +   based on which branch is semi-invariant.  */
> +
> +static bool
> +do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
> +{
> +  basic_block cond_bb = invar_branch->src;
> +  bool true_invar = !!(invar_branch->flags & EDGE_TRUE_VALUE);
> +  gcond *cond = as_a <gcond *> (last_stmt (cond_bb));
> +
> +  gcc_assert (cond_bb->loop_father == loop1);
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +   {
> +     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
> +           current_function_name (), loop1->num,
> +           true_invar ? "T" : "F", cond_bb->index);
> +     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
> +   }
> +
> +  initialize_original_copy_tables ();
> +
> +  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> +                                  profile_probability::always (),
> +                                  profile_probability::never (),
> +                                  profile_probability::always (),
> +                                  profile_probability::always (),
> +                                  true);
> +  if (!loop2)
> +    {
> +      free_original_copy_tables ();
> +      return false;
> +    }
> +
> +  /* Generate a bool type temporary to hold result of the condition.  */
> +  tree tmp = make_ssa_name (boolean_type_node);
> +  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> +  gimple *stmt = gimple_build_assign (tmp,
> +                                   gimple_cond_code (cond),
> +                                   gimple_cond_lhs (cond),
> +                                   gimple_cond_rhs (cond));
> +
> +  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
> +  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
> +  update_stmt (cond);
> +
> +  basic_block cond_bb_copy = get_bb_copy (cond_bb);
> +  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
> +
> +  /* Replace the condition in loop2 with a bool constant to let PassManager
> +     remove the variant branch after current pass completes.  */
> +  if (true_invar)
> +    gimple_cond_make_true (cond_copy);
> +  else
> +    gimple_cond_make_false (cond_copy);
> +
> +  update_stmt (cond_copy);
> +
> +  /* Insert a new conditional statement on latch edge of loop1.  This
> +     statement acts as a switch to transfer execution from loop1 to loop2,
> +     when loop1 enters into invariant state.  */
> +  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
> +  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
> +  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
> +                                       NULL_TREE, NULL_TREE);
> +
> +  gsi = gsi_last_bb (break_bb);
> +  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
> +
> +  edge to_loop1 = single_succ_edge (break_bb);
> +  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
> +
> +  to_loop1->flags &= ~EDGE_FALLTHRU;
> +  to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
> +  to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
> +
> +  update_ssa (TODO_update_ssa);
> +
> +  /* Due to introduction of a control flow edge from loop1 latch to loop2
> +     pre-header, we should update PHIs in loop2 to reflect this connection
> +     between loop1 and loop2.  */
> +  connect_loop_phis (loop1, loop2, to_loop2);
> +
> +  free_original_copy_tables ();
> +
> +  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
> +
> +  return true;
> +}
> +
> +/* Traverse all conditional statements in LOOP, to find out a good candidate
> +   upon which we can do loop split.  */
> +
> +static bool
> +split_loop_on_cond (struct loop *loop)
> +{
> +  split_info *info = new split_info ();
> +  basic_block *bbs = info->bbs = get_loop_body (loop);
> +  bool do_split = false;
> +
> +  /* Allocate an area to keep temporary info, and associate its address
> +     with loop aux field.  */
> +  loop->aux = info;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    {
> +      basic_block bb = bbs[i];
> +
> +      /* We only consider conditional statements that are executed at most
> +      once in each iteration of the loop.  So skip statements in inner loops.  */
> +      if ((bb->loop_father != loop) || (bb->flags & BB_IRREDUCIBLE_LOOP))
> +     continue;
> +
> +      /* This check is not strictly required.  With it, we can ensure the
> +      conditional statement is always executed in each iteration.  */
> +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> +     continue;
> +
> +      gimple *last = last_stmt (bb);
> +
> +      if (!last || gimple_code (last) != GIMPLE_COND)
> +     continue;
> +
> +      gcond *cond = as_a <gcond *> (last);
> +      edge branch_edge = get_cond_branch_to_split_loop (loop, cond);
> +
> +      if (branch_edge)
> +     {
> +       do_split_loop_on_cond (loop, branch_edge);
> +       do_split = true;
> +       break;
> +     }
> +    }
> +
> +  delete info;
> +  loop->aux = NULL;
> +
> +  return do_split;
> +}
> +
> /* Main entry point.  Perform loop splitting on all suitable loops.  */
>
> static unsigned int
> @@ -662,6 +1383,32 @@ tree_ssa_split_loops (void)
>       }
>     }
>
> +  if (changed)
> +    {
> +      cleanup_tree_cfg ();
> +      changed = false;
> +    }
> +
> +  /* Perform loop splitting for suitable if-conditions in all loops.  */
> +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> +    loop->aux = NULL;
> +
> +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> +    {
> +      if (loop->aux)
> +        {
> +       loop_outer (loop)->aux = loop;
> +       continue;
> +     }
> +
> +      if (!optimize_loop_for_size_p (loop)
> +       && split_loop_on_cond (loop))
> +     {
> +       loop_outer (loop)->aux = loop;
> +       changed = true;
> +     }
> +    }
> +
>   FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
>     loop->aux = NULL;
>
> --
> 2.17.1
>

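For readers following the thread, the source-level effect described in the patch's header comment (split the loop at the point where the condition becomes false and stays false) can be sketched in plain C. This is only an illustration under stated assumptions, not code from the patch: `struct state`, `budget`, `run_original`, `run_split` and `same_result` are hypothetical names, and `do_something` here is a stand-in that simply counts down so the condition eventually turns false.

```c
#include <stdbool.h>

/* Hypothetical state for the illustration: B is the semi-invariant flag,
   BUDGET models how long do_something () keeps B set, CALLS counts calls
   to do_something (), and S_COUNT counts executions of statement S.  */
struct state { int b; int budget; int calls; int s_count; };

/* Stand-in for the external do_something () of the testcase.  */
static int do_something (struct state *st)
{
  st->calls++;
  return --st->budget > 0;
}

/* Original loop: once B is false, nothing in the else branch changes it.  */
static void run_original (struct state *st, int n)
{
  for (int i = 0; i < n; i++)
    {
      if (st->b)
        st->b = do_something (st);
      else
        st->s_count++;          /* S: does not change B.  */
    }
}

/* Split form: leave the first loop as soon as the else branch is taken,
   then run a second loop that contains only S.  */
static void run_split (struct state *st, int n)
{
  int i;

  for (i = 0; i < n; i++)
    {
      if (st->b)
        st->b = do_something (st);
      else
        {
          st->s_count++;
          i++;
          break;
        }
    }

  for (; i < n; i++)
    st->s_count++;
}

/* Run both forms from the same initial state and compare the results.  */
static bool same_result (int n, int b, int budget)
{
  struct state a = { b, budget, 0, 0 };
  struct state c = a;

  run_original (&a, n);
  run_split (&c, n);
  return a.b == c.b && a.calls == c.calls && a.s_count == c.s_count;
}
```

The second loop no longer tests the flag at all, which is what makes it a candidate for later removal or vectorization as mentioned in the original submission.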
^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-10-15 16:06                   ` Michael Matz
@ 2019-10-22 10:16                     ` Feng Xue OS
  2019-10-22 11:16                       ` Michael Matz
  0 siblings, 1 reply; 31+ messages in thread
From: Feng Xue OS @ 2019-10-22 10:16 UTC (permalink / raw)
  To: Michael Matz, Philipp Tomsich
  Cc: Richard Biener, gcc-patches, Christoph Müllner, erick.ochoa

Hi, Michael,

  Since the GCC 10 release is coming, it would be good if we could get this patch in before then. Thanks.

Feng.

________________________________________
From: Michael Matz <matz@suse.de>
Sent: Wednesday, October 16, 2019 12:01 AM
To: Philipp Tomsich
Cc: Feng Xue OS; Richard Biener; gcc-patches@gcc.gnu.org; Christoph Müllner; erick.ochoa@theobroma-systems.com
Subject: Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)

Hi,

On Tue, 15 Oct 2019, Philipp Tomsich wrote:

> This looks good from our side and has proven useful (combined with the other 2 patches) in
> our testing with SPEC2017.
> Given that this looks final: what is the plan for getting this merged?

I'll get to review this v3 version this week.


Ciao,
Michael.

>
> Thanks,
> Philipp.
>
> > On 12.09.2019, at 12:23, Feng Xue OS <fxue at os dot amperecomputing dot com> wrote:
> >
> > ---
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 1391a562c35..28981fa1048 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -11418,6 +11418,19 @@ The maximum number of branches unswitched in a single loop.
> > @item lim-expensive
> > The minimum cost of an expensive expression in the loop invariant motion.
> >
> > +@item max-cond-loop-split-insns
> > +In a loop, a conditional statement is semi-invariant if, once one of its
> > +branches has been selected in some iteration, every operand that contributes
> > +to computation of the conditional expression remains unchanged in all
> > +following iterations.  A loop can be split upon such a statement.
> > +@option{max-cond-loop-split-insns} controls the maximum number of insns to be
> > +added due to loop split on a semi-invariant conditional statement.
> > +
> > +@item min-cond-loop-split-prob
> > +When FDO profile information is available, @option{min-cond-loop-split-prob}
> > +specifies the minimum probability, in percent, that the semi-invariant
> > +branch must have for the loop split to be triggered.
> > +
> > @item iv-consider-all-candidates-bound
> > Bound on number of candidates for induction variables, below which
> > all candidates are considered for each use in induction variable
> > diff --git a/gcc/params.def b/gcc/params.def
> > index 13001a7bb2d..12bc8c26c9e 100644
> > --- a/gcc/params.def
> > +++ b/gcc/params.def
> > @@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
> >     "The maximum number of unswitchings in a single loop.",
> >     3, 0, 0)
> >
> > +/* The maximum number of increased insns due to loop split on semi-invariant
> > +   condition statement.  */
> > +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> > +   "max-cond-loop-split-insns",
> > +   "The maximum number of insns to be added due to loop split on "
> > +   "semi-invariant condition statement.",
> > +   100, 0, 0)
> > +
> > +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> > +   "min-cond-loop-split-prob",
> > +   "The minimum threshold for probability of semi-invariant condition "
> > +   "statement to trigger loop split.",
> > +   30, 0, 100)
> > +
> > /* The maximum number of insns in loop header duplicated by the copy loop
> >    headers pass.  */
> > DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
> >
> > diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> > new file mode 100644
> > index 00000000000..51f9da22fc7
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> > @@ -0,0 +1,33 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> > +
> > +#include <string>
> > +#include <map>
> > +
> > +using namespace std;
> > +
> > +class  A
> > +{
> > +public:
> > +  bool empty;
> > +  void set (string s);
> > +};
> > +
> > +class  B
> > +{
> > +  map<int, string> m;
> > +  void f ();
> > +};
> > +
> > +extern A *ga;
> > +
> > +void B::f ()
> > +{
> > +  for (map<int, string>::iterator iter = m.begin (); iter != m.end (); ++iter)
> > +    {
> > +      if (ga->empty)
> > +        ga->set (iter->second);
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> > new file mode 100644
> > index 00000000000..bbd522d6bcd
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> > +
> > +__attribute__((pure)) __attribute__((noinline)) int inc (int i)
> > +{
> > +  return i + 1;
> > +}
> > +
> > +extern int do_something (void);
> > +extern int b;
> > +
> > +void test(int n)
> > +{
> > +  int i;
> > +
> > +  for (i = 0; i < n; i = inc (i))
> > +    {
> > +      if (b)
> > +        b = do_something();
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
> > diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> > index f5f083384bc..e4a1b6d2019 100644
> > --- a/gcc/tree-ssa-loop-split.c
> > +++ b/gcc/tree-ssa-loop-split.c
> > @@ -32,7 +32,10 @@ along with GCC; see the file COPYING3.  If not see
> > #include "tree-ssa-loop.h"
> > #include "tree-ssa-loop-manip.h"
> > #include "tree-into-ssa.h"
> > +#include "tree-inline.h"
> > +#include "tree-cfgcleanup.h"
> > #include "cfgloop.h"
> > +#include "params.h"
> > #include "tree-scalar-evolution.h"
> > #include "gimple-iterator.h"
> > #include "gimple-pretty-print.h"
> > @@ -40,7 +43,9 @@ along with GCC; see the file COPYING3.  If not see
> > #include "gimple-fold.h"
> > #include "gimplify-me.h"
> >
> > -/* This file implements loop splitting, i.e. transformation of loops like
> > +/* This file implements two kinds of loop splitting.
> > +
> > +   One transformation of loops like:
> >
> >    for (i = 0; i < 100; i++)
> >      {
> > @@ -612,6 +617,722 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
> >   return changed;
> > }
> >
> > +/* Another transformation of loops like:
> > +
> > +   for (i = INIT (); CHECK (i); i = NEXT ())
> > +     {
> > +       if (expr (a_1, a_2, ..., a_n))  // expr is pure
> > +         a_j = ...;  // change at least one a_j
> > +       else
> > +         S;          // not change any a_j
> > +     }
> > +
> > +   into:
> > +
> > +   for (i = INIT (); CHECK (i); i = NEXT ())
> > +     {
> > +       if (expr (a_1, a_2, ..., a_n))
> > +         a_j = ...;
> > +       else
> > +         {
> > +           S;
> > +           i = NEXT ();
> > +           break;
> > +         }
> > +     }
> > +
> > +   for (; CHECK (i); i = NEXT ())
> > +     {
> > +       S;
> > +     }
> > +
> > +   */
> > +
> > +/* Data structure to hold temporary information during loop split upon
> > +   semi-invariant conditional statement.  */
> > +class split_info {
> > +public:
> > +  /* Array of all basic blocks in a loop, returned by get_loop_body().  */
> > +  basic_block *bbs;
> > +
> > +  /* All memory store/clobber statements in a loop.  */
> > +  auto_vec<gimple *> memory_stores;
> > +
> > +  /* Whether the memory stores vector above still needs to be filled.  */
> > +  int need_init;
> > +
> > +  split_info () : bbs (NULL),  need_init (true) { }
> > +
> > +  ~split_info ()
> > +    {
> > +      if (bbs)
> > +   free (bbs);
> > +    }
> > +};
> > +
> > +/* Find all statements with memory-write effect in LOOP, including memory
> > +   store and non-pure function call, and keep those in a vector.  This work
> > +   is only done one time, for the vector should be constant during analysis
> > +   stage of semi-invariant condition.  */
> > +
> > +static void
> > +find_vdef_in_loop (struct loop *loop)
> > +{
> > +  split_info *info = (split_info *) loop->aux;
> > +  gphi *vphi = get_virtual_phi (loop->header);
> > +
> > +  /* Indicate memory store vector has been filled.  */
> > +  info->need_init = false;
> > +
> > +  /* If loop contains memory operation, there must be a virtual PHI node in
> > +     loop header basic block.  */
> > +  if (vphi == NULL)
> > +    return;
> > +
> > +  /* All virtual SSA names inside the loop are connected to be a cyclic
> > +     graph via virtual PHI nodes.  The virtual PHI node in loop header just
> > +     links the first and the last virtual SSA names, by using the last as
> > +     PHI operand to define the first.  */
> > +  const edge latch = loop_latch_edge (loop);
> > +  const tree first = gimple_phi_result (vphi);
> > +  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
> > +
> > +  /* The virtual SSA cyclic graph might consist of only one SSA name, who
> > +     is defined by itself.
> > +
> > +       .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
> > +
> > +     This means the loop contains only memory loads, so we can skip it.  */
> > +  if (first == last)
> > +    return;
> > +
> > +  auto_vec<gimple *> other_stores;
> > +  auto_vec<tree> worklist;
> > +  auto_bitmap visited;
> > +
> > +  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
> > +  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
> > +  worklist.safe_push (last);
> > +
> > +  do
> > +    {
> > +      tree vuse = worklist.pop ();
> > +      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
> > +
> > +      /* We mark the first and last SSA names as visited at the beginning,
> > +    and reversely start the process from the last SSA name towards the
> > +    first, which ensures that this do-while will not touch SSA names
> > +    defined outside of the loop.  */
> > +      gcc_assert (gimple_bb (stmt)
> > +             && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
> > +
> > +      if (gimple_code (stmt) == GIMPLE_PHI)
> > +   {
> > +     gphi *phi = as_a <gphi *> (stmt);
> > +
> > +     for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> > +       {
> > +         tree arg = gimple_phi_arg_def (stmt, i);
> > +
> > +         if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
> > +           worklist.safe_push (arg);
> > +       }
> > +   }
> > +      else
> > +   {
> > +     tree prev = gimple_vuse (stmt);
> > +
> > +     /* Non-pure call statement is conservatively assumed to impact all
> > +        memory locations.  So place call statements ahead of other memory
> > +        stores in the vector, with the idea of using them as shortcut
> > +        terminators for memory alias analysis.  */
> > +     if (gimple_code (stmt) == GIMPLE_CALL)
> > +       info->memory_stores.safe_push (stmt);
> > +     else
> > +       other_stores.safe_push (stmt);
> > +
> > +     if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
> > +       worklist.safe_push (prev);
> > +   }
> > +    } while (!worklist.is_empty ());
> > +
> > +    info->memory_stores.safe_splice (other_stores);
> > +}
> > +
> > +
> > +/* Given STMT, memory load or pure call statement, check whether it is impacted
> > +   by some memory store in LOOP, excluding trace starting from SKIP_HEAD (the
> > +   trace is composed of SKIP_HEAD and those basic block dominated by it, always
> > +   corresponds to one branch of a conditional statement).  If SKIP_HEAD is
> > +   NULL, all basic blocks of LOOP are checked.  */
> > +
> > +static bool
> > +vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
> > +                  const_basic_block skip_head)
> > +{
> > +  split_info *info = (split_info *) loop->aux;
> > +
> > +  /* Collect memory store/clobber statements if not done yet.  */
> > +  if (info->need_init)
> > +    find_vdef_in_loop (loop);
> > +
> > +  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
> > +  ao_ref ref;
> > +  gimple *store;
> > +  unsigned i;
> > +
> > +  ao_ref_init (&ref, rhs);
> > +
> > +  FOR_EACH_VEC_ELT (info->memory_stores, i, store)
> > +    {
> > +      /* Skip basic blocks dominated by SKIP_HEAD, if non-NULL.  */
> > +      if (skip_head
> > +     && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
> > +   continue;
> > +
> > +      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
> > +   return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Forward declaration.  */
> > +
> > +static bool
> > +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> > +                  const_basic_block skip_head);
> > +
> > +/* Suppose one condition branch, led by SKIP_HEAD, is no longer executed
> > +   from a certain iteration of LOOP onwards; check whether an SSA name (NAME)
> > +   remains unchanged in the next iteration.  We call this characteristic
> > +   semi-invariantness.  SKIP_HEAD might be NULL; if so, nothing is excluded
> > +   and all basic blocks and control flows in the loop are considered.  If
> > +   non-NULL, NAME is supposed to be defined before SKIP_HEAD.  */
> > +
> > +static bool
> > +ssa_semi_invariant_p (struct loop *loop, const tree name,
> > +                 const_basic_block skip_head)
> > +{
> > +  gimple *def = SSA_NAME_DEF_STMT (name);
> > +  const_basic_block def_bb = gimple_bb (def);
> > +
> > +  /* An SSA name defined outside a loop is definitely semi-invariant.  */
> > +  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
> > +    return true;
> > +
> > +  if (gimple_code (def) == GIMPLE_PHI)
> > +    {
> > +      /* For PHI node that is not in loop header, its source operands should
> > +    be defined inside the loop, which are seen as loop variant.  */
> > +      if (def_bb != loop->header || !skip_head)
> > +   return false;
> > +
> > +      const_edge latch = loop_latch_edge (loop);
> > +      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
> > +
> > +      /* A PHI node in loop header contains two source operands, one is
> > +    initial value, the other is the copy of last iteration through loop
> > +    latch, we call it latch value.  From the PHI node to definition
> > +    of latch value, if excluding branch trace from SKIP_HEAD, there
> > +    is no definition of other version of same variable, SSA name defined
> > +    by the PHI node is semi-invariant.
> > +
> > +                         loop entry
> > +                              |     .--- latch ---.
> > +                              |     |             |
> > +                              v     v             |
> > +                  x_1 = PHI <x_0,  x_3>           |
> > +                           |                      |
> > +                           v                      |
> > +              .------- if (cond) -------.         |
> > +              |                         |         |
> > +              |                     [ SKIP ]      |
> > +              |                         |         |
> > +              |                     x_2 = ...     |
> > +              |                         |         |
> > +              '---- T ---->.<---- F ----'         |
> > +                           |                      |
> > +                           v                      |
> > +                  x_3 = PHI <x_1, x_2>            |
> > +                           |                      |
> > +                           '----------------------'
> > +
> > +   Suppose that in a certain iteration, execution flow in the above graph
> > +   goes through the true branch, which means that one source value defining
> > +   x_3, the one from the false branch (x_2), is skipped.  x_3 then only comes
> > +   from x_1, and x_1 in the next iteration is defined by x_3, so x_1 will
> > +   never change if COND always chooses the true branch from then on.  */
> > +
> > +      while (from != name)
> > +   {
> > +     /* A new value comes from a CONSTANT.  */
> > +     if (TREE_CODE (from) != SSA_NAME)
> > +       return false;
> > +
> > +     gimple *stmt = SSA_NAME_DEF_STMT (from);
> > +     const_basic_block bb = gimple_bb (stmt);
> > +
> > +     /* A new value comes from outside of loop.  */
> > +     if (!bb || !flow_bb_inside_loop_p (loop, bb))
> > +       return false;
> > +
> > +     from = NULL_TREE;
> > +
> > +     if (gimple_code (stmt) == GIMPLE_PHI)
> > +       {
> > +         gphi *phi = as_a <gphi *> (stmt);
> > +
> > +         for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> > +           {
> > +             const_edge e = gimple_phi_arg_edge (phi, i);
> > +
> > +             /* Not consider redefinitions in excluded basic blocks.  */
> > +             if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
> > +               {
> > +                 /* There are more than one source operands that can
> > +                    provide value to the SSA name, it is variant.  */
> > +                 if (from)
> > +                   return false;
> > +
> > +                 from = gimple_phi_arg_def (phi, i);
> > +               }
> > +           }
> > +       }
> > +     else if (gimple_code (stmt) == GIMPLE_ASSIGN)
> > +       {
> > +         /* For simple value copy, check its rhs instead.  */
> > +         if (gimple_assign_ssa_name_copy_p (stmt))
> > +           from = gimple_assign_rhs1 (stmt);
> > +       }
> > +
> > +     /* Any other kind of definition is deemed to introduce a new value
> > +        to the SSA name.  */
> > +     if (!from)
> > +       return false;
> > +   }
> > +   return true;
> > +    }
> > +
> > +  /* Value originated from volatile memory load or return of normal (non-
> > +     const/pure) call should not be treated as constant in each iteration.  */
> > +  if (gimple_has_side_effects (def))
> > +    return false;
> > +
> > +  /* Check if any memory store may kill memory load at this place.  */
> > +  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
> > +    return false;
> > +
> > +  /* Check operands of definition statement of the SSA name.  */
> > +  return stmt_semi_invariant_p (loop, def, skip_head);
> > +}
> > +
> > +/* Check whether STMT is semi-invariant in LOOP, iff all its operands are
> > +   semi-invariant.  Trace composed of basic block SKIP_HEAD and basic blocks
> > +   dominated by it are excluded from the loop.  */
> > +
> > +static bool
> > +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> > +                  const_basic_block skip_head)
> > +{
> > +  ssa_op_iter iter;
> > +  tree use;
> > +
> > +  /* Although operand of a statement might be SSA name, CONSTANT or VARDECL,
> > +     here we only need to check SSA name operands.  This is because check on
> > +     VARDECL operands, which involve memory loads, must have been done
> > +     prior to invocation of this function in vuse_semi_invariant_p.  */
> > +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
> > +    {
> > +      if (!ssa_semi_invariant_p (loop, use, skip_head))
> > +   return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Determine whether, when a conditional statement never transfers execution
> > +   to one of its branches, we can remove the branch's leading basic block
> > +   (BRANCH_BB) and those basic blocks dominated by BRANCH_BB.  */
> > +
> > +static bool
> > +branch_removable_p (basic_block branch_bb)
> > +{
> > +  if (single_pred_p (branch_bb))
> > +    return true;
> > +
> > +  edge e;
> > +  edge_iterator ei;
> > +
> > +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
> > +    {
> > +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
> > +   continue;
> > +
> > +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
> > +   continue;
> > +
> > +       /* The branch can be reached from opposite branch, or from some
> > +     statement not dominated by the conditional statement.  */
> > +      return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Find out which branch of a conditional statement (COND) is invariant in the
> > +   execution context of LOOP.  That is: once the branch is selected in certain
> > +   iteration of the loop, any operand that contributes to computation of the
> > +   conditional statement remains unchanged in all following iterations.  */
> > +
> > +static edge
> > +get_cond_invariant_branch (struct loop *loop, gcond *cond)
> > +{
> > +  basic_block cond_bb = gimple_bb (cond);
> > +  basic_block targ_bb[2];
> > +  bool invar[2];
> > +  unsigned invar_checks;
> > +
> > +  for (unsigned i = 0; i < 2; i++)
> > +    {
> > +      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
> > +
> > +      /* One branch directs to loop exit, no need to perform loop split upon
> > +    this conditional statement.  Firstly, it is trivial if the exit branch
> > +    is semi-invariant, for the statement is just to break loop.  Secondly,
> > +    if the opposite branch is semi-invariant, it means that the statement
> > +    is real loop-invariant, which is covered by loop unswitch.  */
> > +      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
> > +   return NULL;
> > +    }
> > +
> > +  invar_checks = 0;
> > +
> > +  for (unsigned i = 0; i < 2; i++)
> > +    {
> > +      invar[!i] = false;
> > +
> > +      if (!branch_removable_p (targ_bb[i]))
> > +   continue;
> > +
> > +      /* Given a semi-invariant branch, if its opposite branch dominates
> > +    loop latch, it and its following trace will only be executed in
> > +    final iteration of loop, namely it is not part of repeated body
> > +    of the loop.  Similar to the above case that the branch is loop
> > +    exit, no need to split loop.  */
> > +      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
> > +   continue;
> > +
> > +      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
> > +      invar_checks++;
> > +    }
> > +
> > +  /* With both branches being invariant (handled by loop unswitch) or
> > +     variant is not what we want.  */
> > +  if (invar[0] ^ !invar[1])
> > +    return NULL;
> > +
> > +  /* Found a real loop-invariant condition, do nothing.  */
> > +  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
> > +    return NULL;
> > +
> > +  return EDGE_SUCC (cond_bb, (unsigned) invar[1]);
> > +}
> > +
> > +/* Calculate increased code size measured by estimated insn number if applying
> > +   loop split upon certain branch (BRANCH_EDGE) of a conditional statement.  */
> > +
> > +static int
> > +compute_added_num_insns (struct loop *loop, const_edge branch_edge)
> > +{
> > +  basic_block cond_bb = branch_edge->src;
> > +  unsigned branch = EDGE_SUCC (cond_bb, 1) == branch_edge;
> > +  basic_block opposite_bb = EDGE_SUCC (cond_bb, !branch)->dest;
> > +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> > +  int num = 0;
> > +
> > +  for (unsigned i = 0; i < loop->num_nodes; i++)
> > +    {
> > +      /* Do no count basic blocks only in opposite branch.  */
> > +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], opposite_bb))
> > +   continue;
> > +
> > +      num += estimate_num_insns_seq (bb_seq (bbs[i]), &eni_size_weights);
> > +    }
> > +
> > +  /* It is unnecessary to evaluate expression of the conditional statement
> > +     in new loop that contains only invariant branch.  This expresion should
> > +     be constant value (either true or false).  Exclude code size of insns
> > +     that contribute to computation of the expression.  */
> > +
> > +  auto_vec<gimple *> worklist;
> > +  hash_set<gimple *> removed;
> > +  gimple *stmt = last_stmt (cond_bb);
> > +
> > +  worklist.safe_push (stmt);
> > +  removed.add (stmt);
> > +  num -= estimate_num_insns (stmt, &eni_size_weights);
> > +
> > +  do
> > +    {
> > +      ssa_op_iter opnd_iter;
> > +      use_operand_p opnd_p;
> > +
> > +      stmt = worklist.pop ();
> > +      FOR_EACH_PHI_OR_STMT_USE (opnd_p, stmt, opnd_iter, SSA_OP_USE)
> > +   {
> > +     tree opnd = USE_FROM_PTR (opnd_p);
> > +
> > +     if (TREE_CODE (opnd) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (opnd))
> > +       continue;
> > +
> > +     gimple *opnd_stmt = SSA_NAME_DEF_STMT (opnd);
> > +     use_operand_p use_p;
> > +     imm_use_iterator use_iter;
> > +
> > +     if (removed.contains (opnd_stmt)
> > +         || !flow_bb_inside_loop_p (loop, gimple_bb (opnd_stmt)))
> > +       continue;
> > +
> > +     FOR_EACH_IMM_USE_FAST (use_p, use_iter, opnd)
> > +       {
> > +              gimple *use_stmt = USE_STMT (use_p);
> > +
> > +         if (!is_gimple_debug (use_stmt) && !removed.contains (use_stmt))
> > +           {
> > +             opnd_stmt = NULL;
> > +             break;
> > +           }
> > +       }
> > +
> > +     if (opnd_stmt)
> > +       {
> > +         worklist.safe_push (opnd_stmt);
> > +         removed.add (opnd_stmt);
> > +         num -= estimate_num_insns (opnd_stmt, &eni_size_weights);
> > +       }
> > +   }
> > +    } while (!worklist.is_empty ());
> > +
> > +  gcc_assert (num >= 0);
> > +  return num;
> > +}
> > +
> > +/* Find out loop-invariant branch of a conditional statement (COND) if it has,
> > +   and check whether it is eligible and profitable to perform loop split upon
> > +   this branch in LOOP.  */
> > +
> > +static edge
> > +get_cond_branch_to_split_loop (struct loop *loop, gcond *cond)
> > +{
> > +  edge invar_branch = get_cond_invariant_branch (loop, cond);
> > +
> > +  if (!invar_branch)
> > +    return NULL;
> > +
> > +  profile_probability prob = invar_branch->probability;
> > +
> > +  /* When accurate profile information is available, and execution
> > +     frequency of the branch is too low, just let it go.  */
> > +  if (prob.reliable_p ())
> > +    {
> > +      int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB);
> > +
> > +      if (prob < profile_probability::always ().apply_scale (thres, 100))
> > +   return NULL;
> > +    }
> > +
> > +  /* Add a threshold for increased code size to disable loop split.  */
> > +  if (compute_added_num_insns (loop, invar_branch)
> > +      > PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))
> > +    return NULL;
> > +
> > +  return invar_branch;
> > +}
> > +
> > +/* Given a loop (LOOP1) with a loop-invariant branch (INVAR_BRANCH) of some
> > +   conditional statement, perform loop split transformation illustrated
> > +   as the following graph.
> > +
> > +               .-------T------ if (true) ------F------.
> > +               |                    .---------------. |
> > +               |                    |               | |
> > +               v                    |               v v
> > +          pre-header                |            pre-header
> > +               | .------------.     |                 | .------------.
> > +               | |            |     |                 | |            |
> > +               | v            |     |                 | v            |
> > +             header           |     |               header           |
> > +               |              |     |                 |              |
> > +       [ bool r = cond; ]     |     |                 |              |
> > +               |              |     |                 |              |
> > +      .---- if (r) -----.     |     |        .--- if (true) ---.     |
> > +      |                 |     |     |        |                 |     |
> > +  invariant             |     |     |    invariant             |     |
> > +      |                 |     |     |        |                 |     |
> > +      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
> > +               |              |    /                  |              |
> > +             stmts            |   /                 stmts            |
> > +               |              |  /                    |              |
> > +              / \             | /                    / \             |
> > +     .-------*   *       [ if (!r) ]        .-------*   *            |
> > +     |           |            |             |           |            |
> > +     |         latch          |             |         latch          |
> > +     |           |            |             |           |            |
> > +     |           '------------'             |           '------------'
> > +     '------------------------. .-----------'
> > +             loop1            | |                   loop2
> > +                              v v
> > +                             exits
> > +
> > +   In the graph, loop1 represents the part derived from original one, and
> > +   loop2 is duplicated using loop_version (), which corresponds to the part
> > +   of original one being splitted out.  In loop1, a new bool temporary (r)
> > +   is introduced to keep value of the condition result.  In original latch
> > +   edge of loop1, we insert a new conditional statement whose value comes
> > +   from previous temporary (r), one of its branch goes back to loop1 header
> > +   as a latch edge, and the other branch goes to loop2 pre-header as an entry
> > +   edge.  And also in loop2, we abandon the variant branch of the conditional
> > +   statement candidate by setting a constant bool condition, based on which
> > +   branch is semi-invariant.  */
> > +
> > +static bool
> > +do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
> > +{
> > +  basic_block cond_bb = invar_branch->src;
> > +  bool true_invar = !!(invar_branch->flags & EDGE_TRUE_VALUE);
> > +  gcond *cond = as_a <gcond *> (last_stmt (cond_bb));
> > +
> > +  gcc_assert (cond_bb->loop_father == loop1);
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +   {
> > +     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
> > +         current_function_name (), loop1->num,
> > +         true_invar ? "T" : "F", cond_bb->index);
> > +     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
> > +   }
> > +
> > +  initialize_original_copy_tables ();
> > +
> > +  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> > +                                profile_probability::always (),
> > +                                profile_probability::never (),
> > +                                profile_probability::always (),
> > +                                profile_probability::always (),
> > +                                true);
> > +  if (!loop2)
> > +    {
> > +      free_original_copy_tables ();
> > +      return false;
> > +    }
> > +
> > +  /* Generate a bool type temporary to hold result of the condition.  */
> > +  tree tmp = make_ssa_name (boolean_type_node);
> > +  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> > +  gimple *stmt = gimple_build_assign (tmp,
> > +                                 gimple_cond_code (cond),
> > +                                 gimple_cond_lhs (cond),
> > +                                 gimple_cond_rhs (cond));
> > +
> > +  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
> > +  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
> > +  update_stmt (cond);
> > +
> > +  basic_block cond_bb_copy = get_bb_copy (cond_bb);
> > +  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
> > +
> > +  /* Replace the condition in loop2 with a bool constant to let PassManager
> > +     remove the variant branch after current pass completes.  */
> > +  if (true_invar)
> > +    gimple_cond_make_true (cond_copy);
> > +  else
> > +    gimple_cond_make_false (cond_copy);
> > +
> > +  update_stmt (cond_copy);
> > +
> > +  /* Insert a new conditional statement on latch edge of loop1.  This
> > +     statement acts as a switch to transfer execution from loop1 to loop2,
> > +     when loop1 enters into invariant state.  */
> > +  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
> > +  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
> > +  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
> > +                                     NULL_TREE, NULL_TREE);
> > +
> > +  gsi = gsi_last_bb (break_bb);
> > +  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
> > +
> > +  edge to_loop1 = single_succ_edge (break_bb);
> > +  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
> > +
> > +  to_loop1->flags &= ~EDGE_FALLTHRU;
> > +  to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
> > +  to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
> > +
> > +  update_ssa (TODO_update_ssa);
> > +
> > +  /* Due to introduction of a control flow edge from loop1 latch to loop2
> > +     pre-header, we should update PHIs in loop2 to reflect this connection
> > +     between loop1 and loop2.  */
> > +  connect_loop_phis (loop1, loop2, to_loop2);
> > +
> > +  free_original_copy_tables ();
> > +
> > +  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
> > +
> > +  return true;
> > +}
> > +
> > +/* Traverse all conditional statements in LOOP, to find out a good candidate
> > +   upon which we can do loop split.  */
> > +
> > +static bool
> > +split_loop_on_cond (struct loop *loop)
> > +{
> > +  split_info *info = new split_info ();
> > +  basic_block *bbs = info->bbs = get_loop_body (loop);
> > +  bool do_split = false;
> > +
> > +  /* Allocate an area to keep temporary info, and associate its address
> > +     with loop aux field.  */
> > +  loop->aux = info;
> > +
> > +  for (unsigned i = 0; i < loop->num_nodes; i++)
> > +    {
> > +      basic_block bb = bbs[i];
> > +
> > +      /* We only consider conditional statement, which be executed at most once
> > +    in each iteration of the loop.  So skip statements in inner loops.  */
> > +      if ((bb->loop_father != loop) || (bb->flags & BB_IRREDUCIBLE_LOOP))
> > +   continue;
> > +
> > +      /* Actually this check is not a must constraint. With it, we can ensure
> > +    conditional statement will always be executed in each iteration. */
> > +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> > +   continue;
> > +
> > +      gimple *last = last_stmt (bb);
> > +
> > +      if (!last || gimple_code (last) != GIMPLE_COND)
> > +   continue;
> > +
> > +      gcond *cond = as_a <gcond *> (last);
> > +      edge branch_edge = get_cond_branch_to_split_loop (loop, cond);
> > +
> > +      if (branch_edge)
> > +   {
> > +     do_split_loop_on_cond (loop, branch_edge);
> > +     do_split = true;
> > +     break;
> > +   }
> > +    }
> > +
> > +  delete info;
> > +  loop->aux = NULL;
> > +
> > +  return do_split;
> > +}
> > +
> > /* Main entry point.  Perform loop splitting on all suitable loops.  */
> >
> > static unsigned int
> > @@ -662,6 +1383,32 @@ tree_ssa_split_loops (void)
> >     }
> >     }
> >
> > +  if (changed)
> > +    {
> > +      cleanup_tree_cfg ();
> > +      changed = false;
> > +    }
> > +
> > +  /* Perform loop splitting for suitable if-conditions in all loops.  */
> > +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> > +    loop->aux = NULL;
> > +
> > +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> > +    {
> > +      if (loop->aux)
> > +        {
> > +     loop_outer (loop)->aux = loop;
> > +     continue;
> > +   }
> > +
> > +      if (!optimize_loop_for_size_p (loop)
> > +     && split_loop_on_cond (loop))
> > +   {
> > +     loop_outer (loop)->aux = loop;
> > +     changed = true;
> > +   }
> > +    }
> > +
> >   FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> >     loop->aux = NULL;
> >
> > --
> > 2.17.1
> >
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-10-22 10:16                     ` Feng Xue OS
@ 2019-10-22 11:16                       ` Michael Matz
  2019-10-23  5:49                         ` Feng Xue OS
  0 siblings, 1 reply; 31+ messages in thread
From: Michael Matz @ 2019-10-22 11:16 UTC (permalink / raw)
  To: Feng Xue OS
  Cc: Philipp Tomsich, Richard Biener, gcc-patches,
	Christoph Müllner, erick.ochoa

Hello,

I've only noticed a couple typos, and one minor remark.  From my 
perspective it's okay, but you still need the okay of a proper reviewer, 
for which you might want to state the testing/regression state of this 
patch relative to trunk.  The remarks follow:

On Tue, 22 Oct 2019, Feng Xue OS wrote:

> > > +/* Suppose one condition branch, led by SKIP_HEAD, is not executed since
> > > +   certain iteration of LOOP, check whether an SSA name (NAME) remains
> > > +   unchanged in next interation.  We call this characterisic as semi-

"iteration", "characteristic", remove "as"

> > > +             /* Not consider redefinitions in excluded basic blocks.  */

"Don't consider"

> > > +  /* It is unnecessary to evaluate expression of the conditional statement
> > > +     in new loop that contains only invariant branch.  This expresion should

"expression"

> > > @@ -662,6 +1383,32 @@ tree_ssa_split_loops (void)
> > >     }
> > >     }
> > >
> > > +  if (changed)
> > > +    {
> > > +      cleanup_tree_cfg ();
> > > +      changed = false;
> > > +    }
> > > +
> > > +  /* Perform loop splitting for suitable if-conditions in all loops.  */
> > > +  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> > > +    loop->aux = NULL;
> > > +
> > > +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> > > +    {
> > > +      if (loop->aux)
> > > +        {
> > > +     loop_outer (loop)->aux = loop;
> > > +     continue;
> > > +   }
> > > +
> > > +      if (!optimize_loop_for_size_p (loop)
> > > +     && split_loop_on_cond (loop))
> > > +   {
> > > +     loop_outer (loop)->aux = loop;
> > > +     changed = true;
> > > +   }
> > > +    }
> > > +
> > >   FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
> > >     loop->aux = NULL;

I just wonder why you duplicated these three loops instead of integrating 
the real body into the existing LI_FROM_INNERMOST loop.  I would have 
expected your "if (!optimize_loop_for_size_p && split_loop_on_cond)" block 
to simply be the else block of the existing
"if (... conditions for normal loop splitting ...)" block.

Either way it's okay with me.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-10-22 11:16                       ` Michael Matz
@ 2019-10-23  5:49                         ` Feng Xue OS
  2019-10-23  9:10                           ` Richard Biener
  0 siblings, 1 reply; 31+ messages in thread
From: Feng Xue OS @ 2019-10-23  5:49 UTC (permalink / raw)
  To: Michael Matz, Richard Biener
  Cc: Philipp Tomsich, gcc-patches, Christoph Müllner, erick.ochoa

Michael,

> I've only noticed a couple typos, and one minor remark. 
Typos corrected.

> I just wonder why you duplicated these three loops instead of integrating
> the real body into the existing LI_FROM_INNERMOST loop.  I would have
> expected your "if (!optimize_loop_for_size_p && split_loop_on_cond)" block
> to simply be the else block of the existing
> "if (... conditions for normal loop splitting ...)" block.
Adjusted to do both kinds of loop split in the same LI_FROM_INNERMOST loop.
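
As a rough illustration of that single walk, here is a toy model, not the patch code: ToyLoop and split_all are hypothetical names, the bool fields stand in for the results of split_loop (), split_loop_on_cond () and optimize_loop_for_size_p (), and the exact ordering of the checks is an assumption.

```cpp
#include <vector>

// Toy stand-in for a loop tree node; this is not GCC's `class loop`.
struct ToyLoop {
  int outer;         // index of the enclosing loop, -1 for the root
  bool classic_ok;   // pretend result of split_loop ()
  bool cond_ok;      // pretend result of split_loop_on_cond ()
  bool for_size;     // pretend result of optimize_loop_for_size_p ()
  bool inner_split;  // plays the role of loop->aux; start it as false
};

// Visit loops innermost first (the vector is assumed to be ordered that
// way) and try both kinds of split in one walk.  Returns the number of
// loops that were split.
int split_all (std::vector<ToyLoop> &loops)
{
  int changed = 0;

  for (ToyLoop &l : loops)
    {
      if (l.inner_split)
        {
          // An inner loop was split: skip this loop, but pass the mark on
          // to its own enclosing loop.
          if (l.outer >= 0)
            loops[l.outer].inner_split = true;
          continue;
        }

      if (l.for_size)
        continue;

      // Try the existing kind of split first, the new kind second.
      if (l.classic_ok || l.cond_ok)
        {
          if (l.outer >= 0)
            loops[l.outer].inner_split = true;
          ++changed;
        }
    }

  return changed;
}
```

In this model a loop nest is split at most once per walk, at its innermost eligible loop, which mirrors how the quoted driver uses loop->aux.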

> From my perspective it's okay, but you still need the okay of a proper reviewer,
> for which you might want to state the testing/regression state of this
> patch relative to trunk. 

Richard,
  
  Is it ok to commit this patch? Bootstrap and regression tests passed. For
performance, we get about a 7% improvement on SPEC2017 omnetpp with this
patch.
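
For anyone joining the thread late, the transformation can be pictured at source level roughly as below. This is a hand-written sketch modelled on the test case, not compiler output: refill () is a hypothetical stand-in for do_something (), and the function names are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for do_something (): yields the new value of the
// flag and eventually yields 0.
static int refill (int &budget) { return --budget > 0; }

// Original form.  `if (b)` is semi-invariant: only the true branch can
// change b, so once the false side is taken, b stays 0 for good.
long original (const std::vector<int> &v, int b, int budget)
{
  long sum = 0;
  for (std::size_t i = 0; i < v.size (); ++i)
    {
      if (b)
        b = refill (budget);
      sum += v[i];
    }
  return sum + budget;
}

// Hand-split form.  The first loop keeps the test; when b is found to be
// 0, it finishes the current iteration and leaves.  The second loop runs
// the remaining iterations without the test.
long split (const std::vector<int> &v, int b, int budget)
{
  long sum = 0;
  std::size_t i = 0;

  for (; i < v.size (); ++i)
    {
      if (b)
        b = refill (budget);
      else
        {
          sum += v[i];
          ++i;
          break;
        }
      sum += v[i];
    }

  for (; i < v.size (); ++i)
    sum += v[i];

  return sum + budget;
}
```

Both functions return the same value for the same inputs. The second loop in split () is straight-line code, which is what gives later passes such as the vectorizer a chance.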

Thanks,
Feng

---
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1407d019d14..d41e5aa0215 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11481,6 +11481,19 @@ The maximum number of branches unswitched in a single loop.
 @item lim-expensive
 The minimum cost of an expensive expression in the loop invariant motion.
 
+@item max-loop-cond-split-insns
+In a loop, if one branch of a conditional statement, once selected in a
+certain iteration, leaves every operand that contributes to computation of
+the conditional expression unchanged in all following iterations, the
+statement is semi-invariant, and a kind of loop split can be applied to it.
+@option{max-loop-cond-split-insns} controls the maximum number of insns to be
+added due to loop split on a semi-invariant conditional statement.
+
+@item min-loop-cond-split-prob
+When FDO profile information is available, @option{min-loop-cond-split-prob}
+specifies the minimum probability that the invariant branch of a
+semi-invariant condition statement must have to trigger loop split.
+
 @item iv-consider-all-candidates-bound
 Bound on number of candidates for induction variables, below which
 all candidates are considered for each use in induction variable
diff --git a/gcc/params.def b/gcc/params.def
index 322c37f8b96..73b59f7465e 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -415,6 +415,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
 	"The maximum number of unswitchings in a single loop.",
 	3, 0, 0)
 
+/* The maximum number of increased insns due to loop split on semi-invariant
+   condition statement.  */
+DEFPARAM(PARAM_MAX_LOOP_COND_SPLIT_INSNS,
+	"max-loop-cond-split-insns",
+	"The maximum number of insns to be added due to loop split on "
+	"semi-invariant condition statement.",
+	100, 0, 0)
+
+DEFPARAM(PARAM_MIN_LOOP_COND_SPLIT_PROB,
+	"min-loop-cond-split-prob",
+	"The minimum threshold for probability of semi-invariant condition "
+	"statement to trigger loop split.",
+	30, 0, 100)
+
 /* The maximum number of insns in loop header duplicated by the copy loop
    headers pass.  */
 DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,

diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
new file mode 100644
index 00000000000..51f9da22fc7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
+
+#include <string>
+#include <map>
+
+using namespace std;
+
+class  A
+{
+public:
+  bool empty;
+  void set (string s);
+};
+
+class  B
+{
+  map<int, string> m;
+  void f ();
+};
+
+extern A *ga;
+
+void B::f ()
+{
+  for (map<int, string>::iterator iter = m.begin (); iter != m.end (); ++iter)
+    {
+      if (ga->empty)
+        ga->set (iter->second);
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
new file mode 100644
index 00000000000..bbd522d6bcd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
+
+__attribute__((pure)) __attribute__((noinline)) int inc (int i)
+{
+  return i + 1;
+}
+
+extern int do_something (void);
+extern int b;
+
+void test(int n)
+{
+  int i;
+
+  for (i = 0; i < n; i = inc (i))
+    {
+      if (b)
+        b = do_something();
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index f5f083384bc..5cffd4bb508 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -32,7 +32,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop.h"
 #include "tree-ssa-loop-manip.h"
 #include "tree-into-ssa.h"
+#include "tree-inline.h"
+#include "tree-cfgcleanup.h"
 #include "cfgloop.h"
+#include "params.h"
 #include "tree-scalar-evolution.h"
 #include "gimple-iterator.h"
 #include "gimple-pretty-print.h"
@@ -40,7 +43,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "gimplify-me.h"
 
-/* This file implements loop splitting, i.e. transformation of loops like
+/* This file implements two kinds of loop splitting.
+
+   One transformation of loops like:
 
    for (i = 0; i < 100; i++)
      {
@@ -487,8 +492,9 @@ compute_new_first_bound (gimple_seq *stmts, class tree_niter_desc *niter,
    single exit of LOOP.  */
 
 static bool
-split_loop (class loop *loop1, class tree_niter_desc *niter)
+split_loop (class loop *loop1)
 {
+  class tree_niter_desc niter;
   basic_block *bbs;
   unsigned i;
   bool changed = false;
@@ -496,8 +502,28 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
   tree border = NULL_TREE;
   affine_iv iv;
 
+  if (!single_exit (loop1)
+      /* ??? We could handle non-empty latches when we split the latch edge
+         (not the exit edge), and put the new exit condition in the new block.
+	 OTOH this executes some code unconditionally that might have been
+	 skipped by the original exit before.  */
+      || !empty_block_p (loop1->latch)
+      || !easy_exit_values (loop1)
+      || !number_of_iterations_exit (loop1, single_exit (loop1), &niter,
+				     false, true)
+      || niter.cmp == ERROR_MARK
+      /* We can't yet handle loops controlled by a != predicate.  */
+      || niter.cmp == NE_EXPR)
+    return false;
+
   bbs = get_loop_body (loop1);
 
+  if (!can_copy_bbs_p (bbs, loop1->num_nodes))
+    {
+      free (bbs);
+      return false;
+    }
+
   /* Find a splitting opportunity.  */
   for (i = 0; i < loop1->num_nodes; i++)
     if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
@@ -505,8 +531,8 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
 	/* Handling opposite steps is not implemented yet.  Neither
 	   is handling different step sizes.  */
 	if ((tree_int_cst_sign_bit (iv.step)
-	     != tree_int_cst_sign_bit (niter->control.step))
-	    || !tree_int_cst_equal (iv.step, niter->control.step))
+	     != tree_int_cst_sign_bit (niter.control.step))
+	    || !tree_int_cst_equal (iv.step, niter.control.step))
 	  continue;
 
 	/* Find a loop PHI node that defines guard_iv directly,
@@ -575,7 +601,7 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
 	   Compute the new bound for the guarding IV and patch the
 	   loop exit to use it instead of original IV and bound.  */
 	gimple_seq stmts = NULL;
-	tree newend = compute_new_first_bound (&stmts, niter, border,
+	tree newend = compute_new_first_bound (&stmts, &niter, border,
 					       guard_code, guard_init);
 	if (stmts)
 	  gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
@@ -612,6 +638,722 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
   return changed;
 }
 
+/* Another transformation of loops like:
+
+   for (i = INIT (); CHECK (i); i = NEXT ())
+     {
+       if (expr (a_1, a_2, ..., a_n))  // expr is pure
+         a_j = ...;  // change at least one a_j
+       else
+         S;          // does not change any a_j
+     }
+
+   into:
+
+   for (i = INIT (); CHECK (i); i = NEXT ())
+     {
+       if (expr (a_1, a_2, ..., a_n))
+         a_j = ...;
+       else
+         {
+           S;
+           i = NEXT ();
+           break;
+         }
+     }
+
+   for (; CHECK (i); i = NEXT ())
+     {
+       S;
+     }
+
+   */
+
+/* Data structure to hold temporary information during loop split upon
+   semi-invariant conditional statement.  */
+class split_info {
+public:
+  /* Array of all basic blocks in a loop, returned by get_loop_body().  */
+  basic_block *bbs;
+
+  /* All memory store/clobber statements in a loop.  */
+  auto_vec<gimple *> memory_stores;
+
+  /* Whether the memory stores vector above still needs to be filled.  */
+  bool need_init;
+
+  split_info () : bbs (NULL),  need_init (true) { }
+
+  ~split_info ()
+    {
+      if (bbs)
+	free (bbs);
+    }
+};
+
+/* Find all statements with memory-write effect in LOOP, including memory
+   store and non-pure function call, and keep those in a vector.  This work
+   is only done one time, for the vector should be constant during analysis
+   stage of semi-invariant condition.  */
+
+static void
+find_vdef_in_loop (struct loop *loop)
+{
+  split_info *info = (split_info *) loop->aux;
+  gphi *vphi = get_virtual_phi (loop->header);
+
+  /* Indicate memory store vector has been filled.  */
+  info->need_init = false;
+
+  /* If loop contains memory operation, there must be a virtual PHI node in
+     loop header basic block.  */
+  if (vphi == NULL)
+    return;
+
+  /* All virtual SSA names inside the loop are connected to be a cyclic
+     graph via virtual PHI nodes.  The virtual PHI node in loop header just
+     links the first and the last virtual SSA names, by using the last as
+     PHI operand to define the first.  */
+  const edge latch = loop_latch_edge (loop);
+  const tree first = gimple_phi_result (vphi);
+  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
+
+  /* The virtual SSA cyclic graph might consist of only one SSA name, which
+     is defined by itself.
+
+       .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
+
+     This means the loop contains only memory loads, so we can skip it.  */
+  if (first == last)
+    return;
+
+  auto_vec<gimple *> other_stores;
+  auto_vec<tree> worklist;
+  auto_bitmap visited;
+
+  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
+  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
+  worklist.safe_push (last);
+
+  do
+    {
+      tree vuse = worklist.pop ();
+      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
+
+      /* We mark the first and last SSA names as visited at the beginning,
+	 and reversely start the process from the last SSA name towards the
+	 first, which ensures that this do-while will not touch SSA names
+	 defined outside of the loop.  */
+      gcc_assert (gimple_bb (stmt)
+		  && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
+
+      if (gimple_code (stmt) == GIMPLE_PHI)
+	{
+	  gphi *phi = as_a <gphi *> (stmt);
+
+	  for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+	    {
+	      tree arg = gimple_phi_arg_def (stmt, i);
+
+	      if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
+		worklist.safe_push (arg);
+	    }
+	}
+      else
+	{
+	  tree prev = gimple_vuse (stmt);
+
+	  /* Non-pure call statement is conservatively assumed to impact all
+	     memory locations.  So place call statements ahead of other memory
+	     stores in the vector with an idea of using them as shortcut
+	     terminators to memory alias analysis.  */
+	  if (gimple_code (stmt) == GIMPLE_CALL)
+	    info->memory_stores.safe_push (stmt);
+	  else
+	    other_stores.safe_push (stmt);
+
+	  if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
+	    worklist.safe_push (prev);
+	}
+    } while (!worklist.is_empty ());
+
+    info->memory_stores.safe_splice (other_stores);
+}
+
+
+/* Given STMT, memory load or pure call statement, check whether it is impacted
+   by some memory store in LOOP, excluding trace starting from SKIP_HEAD (the
+   trace is composed of SKIP_HEAD and those basic blocks dominated by it, always
+   corresponds to one branch of a conditional statement).  If SKIP_HEAD is
+   NULL, all basic blocks of LOOP are checked.  */
+
+static bool
+vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
+		       const_basic_block skip_head)
+{
+  split_info *info = (split_info *) loop->aux;
+
+  /* Collect memory store/clobber statements if not done yet.  */
+  if (info->need_init)
+    find_vdef_in_loop (loop);
+
+  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
+  ao_ref ref;
+  gimple *store;
+  unsigned i;
+
+  ao_ref_init (&ref, rhs);
+
+  FOR_EACH_VEC_ELT (info->memory_stores, i, store)
+    {
+      /* Skip basic blocks dominated by SKIP_HEAD, if non-NULL.  */
+      if (skip_head
+	  && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
+	continue;
+
+      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
+	return false;
+    }
+
+  return true;
+}
+
+/* Forward declaration.  */
+
+static bool
+stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
+		       const_basic_block skip_head);
+
+/* Suppose one condition branch, led by SKIP_HEAD, is not executed since
+   certain iteration of LOOP, check whether an SSA name (NAME) remains
+   unchanged in next iteration.  We call this characteristic semi-
+   invariantness.  SKIP_HEAD might be NULL, if so, nothing excluded, all
+   basic blocks and control flows in the loop will be considered.  If non-
+   NULL, SSA name to check is supposed to be defined before SKIP_HEAD.  */
+
+static bool
+ssa_semi_invariant_p (struct loop *loop, const tree name,
+		      const_basic_block skip_head)
+{
+  gimple *def = SSA_NAME_DEF_STMT (name);
+  const_basic_block def_bb = gimple_bb (def);
+
+  /* An SSA name defined outside a loop is definitely semi-invariant.  */
+  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
+    return true;
+
+  if (gimple_code (def) == GIMPLE_PHI)
+    {
+      /* For PHI node that is not in loop header, its source operands should
+	 be defined inside the loop, which are seen as loop variant.  */
+      if (def_bb != loop->header || !skip_head)
+	return false;
+
+      const_edge latch = loop_latch_edge (loop);
+      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
+
+      /* A PHI node in the loop header has two source operands: the initial
+	 value, and the copy from the previous iteration through the loop
+	 latch, which we call the latch value.  If, between the PHI node and
+	 the definition of the latch value and excluding the branch trace
+	 from SKIP_HEAD, no other version of the same variable is defined,
+	 the SSA name defined by the PHI node is semi-invariant.
+
+                         loop entry
+                              |     .--- latch ---.
+                              |     |             |
+                              v     v             |
+                  x_1 = PHI <x_0,  x_3>           |
+                           |                      |
+                           v                      |
+              .------- if (cond) -------.         |
+              |                         |         |
+              |                     [ SKIP ]      |
+              |                         |         |
+              |                     x_2 = ...     |
+              |                         |         |
+              '---- T ---->.<---- F ----'         |
+                           |                      |
+                           v                      |
+                  x_3 = PHI <x_1, x_2>            |
+                           |                      |
+                           '----------------------'
+
+	 Suppose that in a certain iteration, execution in the above graph
+	 takes the true branch, so that the source value of x_3 defined in
+	 the false branch (x_2) is skipped.  Then x_3 comes only from x_1,
+	 and x_1 in the next iteration is defined by x_3, so x_1 never
+	 changes as long as COND keeps choosing the true branch.  */
+
+      while (from != name)
+	{
+	  /* A new value comes from a CONSTANT.  */
+	  if (TREE_CODE (from) != SSA_NAME)
+	    return false;
+
+	  gimple *stmt = SSA_NAME_DEF_STMT (from);
+	  const_basic_block bb = gimple_bb (stmt);
+
+	  /* A new value comes from outside of loop.  */
+	  if (!bb || !flow_bb_inside_loop_p (loop, bb))
+	    return false;
+
+	  from = NULL_TREE;
+
+	  if (gimple_code (stmt) == GIMPLE_PHI)
+	    {
+	      gphi *phi = as_a <gphi *> (stmt);
+
+	      for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+		{
+		  const_edge e = gimple_phi_arg_edge (phi, i);
+
+		  /* Don't consider redefinitions in excluded basic blocks.  */
+		  if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
+		    {
+		      /* More than one source operand can provide a value
+			 to the SSA name, so it is variant.  */
+		      if (from)
+			return false;
+
+		      from = gimple_phi_arg_def (phi, i);
+		    }
+		}
+	    }
+	  else if (gimple_code (stmt) == GIMPLE_ASSIGN)
+	    {
+	      /* For simple value copy, check its rhs instead.  */
+	      if (gimple_assign_ssa_name_copy_p (stmt))
+		from = gimple_assign_rhs1 (stmt);
+	    }
+
+	  /* Any other kind of definition is deemed to introduce a new value
+	     to the SSA name.  */
+	  if (!from)
+	    return false;
+	}
+      return true;
+    }
+
+  /* A value from a volatile memory load or from a normal (non-const/pure)
+     call must not be treated as constant across iterations.  */
+  if (gimple_has_side_effects (def))
+    return false;
+
+  /* Check if any memory store may kill memory load at this place.  */
+  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
+    return false;
+
+  /* Check operands of definition statement of the SSA name.  */
+  return stmt_semi_invariant_p (loop, def, skip_head);
+}
+
+/* Check whether STMT is semi-invariant in LOOP, which holds iff all its
+   operands are semi-invariant.  The trace composed of basic block SKIP_HEAD
+   and the basic blocks dominated by it is excluded from the loop.  */
+
+static bool
+stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
+		       const_basic_block skip_head)
+{
+  ssa_op_iter iter;
+  tree use;
+
+  /* Although an operand of a statement might be an SSA name, CONSTANT or
+     VARDECL, here we only need to check SSA name operands.  This is because
+     VARDECL operands, which involve memory loads, must already have been
+     checked in vuse_semi_invariant_p before this function is invoked.  */
+  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
+    {
+      if (!ssa_semi_invariant_p (loop, use, skip_head))
+	return false;
+    }
+
+  return true;
+}
+
+/* Determine whether, when a conditional statement never transfers execution
+   to one of its branches, we can remove that branch's leading basic block
+   (BRANCH_BB) and the basic blocks dominated by BRANCH_BB.  */
+
+static bool
+branch_removable_p (basic_block branch_bb)
+{
+  if (single_pred_p (branch_bb))
+    return true;
+
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, branch_bb->preds)
+    {
+      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
+	continue;
+
+      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
+	continue;
+
+      /* The branch can be reached from opposite branch, or from some
+	 statement not dominated by the conditional statement.  */
+      return false;
+    }
+
+  return true;
+}
+
+/* Find out which branch of a conditional statement (COND) is invariant in the
+   execution context of LOOP.  That is: once the branch is selected in some
+   iteration of the loop, any operand that contributes to computation of the
+   conditional statement remains unchanged in all following iterations.  */
+
+static edge
+get_cond_invariant_branch (struct loop *loop, gcond *cond)
+{
+  basic_block cond_bb = gimple_bb (cond);
+  basic_block targ_bb[2];
+  bool invar[2];
+  unsigned invar_checks;
+
+  for (unsigned i = 0; i < 2; i++)
+    {
+      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
+
+      /* If one branch leads to a loop exit, there is no need to split the
+	 loop on this conditional statement.  Firstly, it is trivial if the
+	 exit branch is semi-invariant, since the statement just breaks out
+	 of the loop.  Secondly, if the opposite branch is semi-invariant,
+	 the statement is loop-invariant, which loop unswitching handles.  */
+      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
+	return NULL;
+    }
+
+  invar_checks = 0;
+
+  for (unsigned i = 0; i < 2; i++)
+    {
+      invar[!i] = false;
+
+      if (!branch_removable_p (targ_bb[i]))
+	continue;
+
+      /* Given a semi-invariant branch, if its opposite branch dominates
+	 the loop latch, it and its following trace will only be executed
+	 in the final iteration of the loop, i.e. it is not part of the
+	 repeated body of the loop.  As in the above case where the branch
+	 is a loop exit, there is no need to split the loop.  */
+      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
+	continue;
+
+      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
+      invar_checks++;
+    }
+
+  /* Both branches being invariant (handled by loop unswitch) or both being
+     variant is not what we want.  */
+  if (invar[0] ^ !invar[1])
+    return NULL;
+
+  /* Found a real loop-invariant condition, do nothing.  */
+  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
+    return NULL;
+
+  return EDGE_SUCC (cond_bb, (unsigned) invar[1]);
+}
+
+/* Calculate the code size increase, in estimated insns, of applying loop
+   split upon a certain branch (BRANCH_EDGE) of a conditional statement.  */
+
+static int
+compute_added_num_insns (struct loop *loop, const_edge branch_edge)
+{
+  basic_block cond_bb = branch_edge->src;
+  unsigned branch = EDGE_SUCC (cond_bb, 1) == branch_edge;
+  basic_block opposite_bb = EDGE_SUCC (cond_bb, !branch)->dest;
+  basic_block *bbs = ((split_info *) loop->aux)->bbs;
+  int num = 0;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    {
+      /* Do not count basic blocks that are only in the opposite branch.  */
+      if (dominated_by_p (CDI_DOMINATORS, bbs[i], opposite_bb))
+	continue;
+
+      num += estimate_num_insns_seq (bb_seq (bbs[i]), &eni_size_weights);
+    }
+
+  /* It is unnecessary to evaluate the expression of the conditional statement
+     in the new loop, which contains only the invariant branch.  There the
+     expression is a constant (either true or false).  Exclude the code size
+     of insns that contribute to computation of the expression.  */
+
+  auto_vec<gimple *> worklist;
+  hash_set<gimple *> removed;
+  gimple *stmt = last_stmt (cond_bb);
+
+  worklist.safe_push (stmt);
+  removed.add (stmt);
+  num -= estimate_num_insns (stmt, &eni_size_weights);
+
+  do
+    {
+      ssa_op_iter opnd_iter;
+      use_operand_p opnd_p;
+
+      stmt = worklist.pop ();
+      FOR_EACH_PHI_OR_STMT_USE (opnd_p, stmt, opnd_iter, SSA_OP_USE)
+	{
+	  tree opnd = USE_FROM_PTR (opnd_p);
+
+	  if (TREE_CODE (opnd) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (opnd))
+	    continue;
+
+	  gimple *opnd_stmt = SSA_NAME_DEF_STMT (opnd);
+	  use_operand_p use_p;
+	  imm_use_iterator use_iter;
+
+	  if (removed.contains (opnd_stmt)
+	      || !flow_bb_inside_loop_p (loop, gimple_bb (opnd_stmt)))
+	    continue;
+
+	  FOR_EACH_IMM_USE_FAST (use_p, use_iter, opnd)
+	    {
+	      gimple *use_stmt = USE_STMT (use_p);
+
+	      if (!is_gimple_debug (use_stmt) && !removed.contains (use_stmt))
+		{
+		  opnd_stmt = NULL;
+		  break;
+		}
+	    }
+
+	  if (opnd_stmt)
+	    {
+	      worklist.safe_push (opnd_stmt);
+	      removed.add (opnd_stmt);
+	      num -= estimate_num_insns (opnd_stmt, &eni_size_weights);
+	    }
+	}
+    } while (!worklist.is_empty ());
+
+  gcc_assert (num >= 0);
+  return num;
+}
+
+/* Find the invariant branch of a conditional statement (COND), if it has
+   one, and check whether it is eligible and profitable to perform loop split
+   upon this branch in LOOP.  */
+
+static edge
+get_cond_branch_to_split_loop (struct loop *loop, gcond *cond)
+{
+  edge invar_branch = get_cond_invariant_branch (loop, cond);
+
+  if (!invar_branch)
+    return NULL;
+
+  profile_probability prob = invar_branch->probability;
+
+  /* When accurate profile information is available, and execution
+     frequency of the branch is too low, just let it go.  */
+  if (prob.reliable_p ())
+    {
+      int thres = PARAM_VALUE (PARAM_MIN_LOOP_COND_SPLIT_PROB);
+
+      if (prob < profile_probability::always ().apply_scale (thres, 100))
+	return NULL;
+    }
+
+  /* Add a threshold for increased code size to disable loop split.  */
+  if (compute_added_num_insns (loop, invar_branch)
+      > PARAM_VALUE (PARAM_MAX_LOOP_COND_SPLIT_INSNS))
+    return NULL;
+
+  return invar_branch;
+}
+
+/* Given a loop (LOOP1) with a loop-invariant branch (INVAR_BRANCH) of some
+   conditional statement, perform loop split transformation illustrated
+   as the following graph.
+
+               .-------T------ if (true) ------F------.
+               |                    .---------------. |
+               |                    |               | |
+               v                    |               v v
+          pre-header                |            pre-header
+               | .------------.     |                 | .------------.
+               | |            |     |                 | |            |
+               | v            |     |                 | v            |
+             header           |     |               header           |
+               |              |     |                 |              |
+       [ bool r = cond; ]     |     |                 |              |
+               |              |     |                 |              |
+      .---- if (r) -----.     |     |        .--- if (true) ---.     |
+      |                 |     |     |        |                 |     |
+  invariant             |     |     |    invariant             |     |
+      |                 |     |     |        |                 |     |
+      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
+               |              |    /                  |              |
+             stmts            |   /                 stmts            |
+               |              |  /                    |              |
+              / \             | /                    / \             |
+     .-------*   *       [ if (!r) ]        .-------*   *            |
+     |           |            |             |           |            |
+     |         latch          |             |         latch          |
+     |           |            |             |           |            |
+     |           '------------'             |           '------------'
+     '------------------------. .-----------'
+             loop1            | |                   loop2
+                              v v
+                             exits
+
+   In the graph, loop1 represents the part derived from the original loop, and
+   loop2 is duplicated using loop_version (), which corresponds to the part
+   of the original loop being split out.  In loop1, a new bool temporary (r)
+   is introduced to keep the value of the condition result.  On the original
+   latch edge of loop1, we insert a new conditional statement whose value
+   comes from the temporary (r); one of its branches goes back to the loop1
+   header as a latch edge, and the other goes to the loop2 pre-header as an
+   entry edge.  In loop2, we abandon the variant branch of the candidate
+   conditional statement by setting a constant bool condition, based on
+   which branch is semi-invariant.  */
+
+static bool
+do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
+{
+  basic_block cond_bb = invar_branch->src;
+  bool true_invar = !!(invar_branch->flags & EDGE_TRUE_VALUE);
+  gcond *cond = as_a <gcond *> (last_stmt (cond_bb));
+
+  gcc_assert (cond_bb->loop_father == loop1);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
+	       current_function_name (), loop1->num,
+	       true_invar ? "T" : "F", cond_bb->index);
+      print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
+    }
+
+  initialize_original_copy_tables ();
+
+  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
+				     profile_probability::always (),
+				     profile_probability::never (),
+				     profile_probability::always (),
+				     profile_probability::always (),
+				     true);
+  if (!loop2)
+    {
+      free_original_copy_tables ();
+      return false;
+    }
+
+  /* Generate a bool type temporary to hold result of the condition.  */
+  tree tmp = make_ssa_name (boolean_type_node);
+  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
+  gimple *stmt = gimple_build_assign (tmp,
+				      gimple_cond_code (cond),
+				      gimple_cond_lhs (cond),
+				      gimple_cond_rhs (cond));
+
+  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
+  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
+  update_stmt (cond);
+
+  basic_block cond_bb_copy = get_bb_copy (cond_bb);
+  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
+
+  /* Replace the condition in loop2 with a bool constant to let the pass
+     manager remove the variant branch after the current pass completes.  */
+  if (true_invar)
+    gimple_cond_make_true (cond_copy);
+  else
+    gimple_cond_make_false (cond_copy);
+
+  update_stmt (cond_copy);
+
+  /* Insert a new conditional statement on the latch edge of loop1.  This
+     statement acts as a switch to transfer execution from loop1 to loop2
+     once loop1 enters the invariant state.  */
+  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
+  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
+  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
+					  NULL_TREE, NULL_TREE);
+
+  gsi = gsi_last_bb (break_bb);
+  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
+
+  edge to_loop1 = single_succ_edge (break_bb);
+  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
+
+  to_loop1->flags &= ~EDGE_FALLTHRU;
+  to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
+  to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
+
+  update_ssa (TODO_update_ssa);
+
+  /* Due to introduction of a control flow edge from loop1 latch to loop2
+     pre-header, we should update PHIs in loop2 to reflect this connection
+     between loop1 and loop2.  */
+  connect_loop_phis (loop1, loop2, to_loop2);
+
+  free_original_copy_tables ();
+
+  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
+
+  return true;
+}
+
+/* Traverse all conditional statements in LOOP, to find out a good candidate
+   upon which we can do loop split.  */
+
+static bool
+split_loop_on_cond (struct loop *loop)
+{
+  split_info *info = new split_info ();
+  basic_block *bbs = info->bbs = get_loop_body (loop);
+  bool do_split = false;
+
+  /* Allocate an area to keep temporary info, and associate its address
+     with loop aux field.  */
+  loop->aux = info;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    {
+      basic_block bb = bbs[i];
+
+      /* Only consider conditional statements executed at most once in each
+	 iteration of the loop, so skip statements in inner loops.  */
+      if ((bb->loop_father != loop) || (bb->flags & BB_IRREDUCIBLE_LOOP))
+	continue;
+
+      /* This check is not strictly required.  With it, we ensure that the
+	 conditional statement is always executed in each iteration.  */
+      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+	continue;
+
+      gimple *last = last_stmt (bb);
+
+      if (!last || gimple_code (last) != GIMPLE_COND)
+	continue;
+
+      gcond *cond = as_a <gcond *> (last);
+      edge branch_edge = get_cond_branch_to_split_loop (loop, cond);
+
+      if (branch_edge)
+	{
+	  do_split_loop_on_cond (loop, branch_edge);
+	  do_split = true;
+	  break;
+	}
+    }
+
+  delete info;
+  loop->aux = NULL;
+
+  return do_split;
+}
+
 /* Main entry point.  Perform loop splitting on all suitable loops.  */
 
 static unsigned int
@@ -627,7 +1369,6 @@ tree_ssa_split_loops (void)
   /* Go through all loops starting from innermost.  */
   FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
     {
-      class tree_niter_desc niter;
       if (loop->aux)
 	{
 	  /* If any of our inner loops was split, don't split us,
@@ -636,29 +1377,14 @@ tree_ssa_split_loops (void)
 	  continue;
 	}
 
-      if (single_exit (loop)
-	  /* ??? We could handle non-empty latches when we split
-	     the latch edge (not the exit edge), and put the new
-	     exit condition in the new block.  OTOH this executes some
-	     code unconditionally that might have been skipped by the
-	     original exit before.  */
-	  && empty_block_p (loop->latch)
-	  && !optimize_loop_for_size_p (loop)
-	  && easy_exit_values (loop)
-	  && number_of_iterations_exit (loop, single_exit (loop), &niter,
-					false, true)
-	  && niter.cmp != ERROR_MARK
-	  /* We can't yet handle loops controlled by a != predicate.  */
-	  && niter.cmp != NE_EXPR
-	  && can_duplicate_loop_p (loop))
+      if (optimize_loop_for_size_p (loop))
+	continue;
+
+      if (split_loop (loop) || split_loop_on_cond (loop))
 	{
-	  if (split_loop (loop, &niter))
-	    {
-	      /* Mark our containing loop as having had some split inner
-	         loops.  */
-	      loop_outer (loop)->aux = loop;
-	      changed = true;
-	    }
+	  /* Mark our containing loop as having had some split inner loops.  */
+	  loop_outer (loop)->aux = loop;
+	  changed = true;
 	}
     }
 
-- 
2.17.1
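
For readers of the archive, the source-level effect of the transformation
described in the patch can be sketched in plain C.  This is only an
illustration modelled on the loop-cond-split-1.c test, not what the pass
emits on GIMPLE.  The names do_something_stub, run_original, run_split and
check_split_equivalence are hypothetical, and the stub is assumed to return
zero from its third call onward.

```c
#include <assert.h>

static int b;        /* Condition variable, changed only in the true branch.  */
static int calls;    /* Number of calls made to the stub.  */

/* Hypothetical stand-in for do_something (): returns nonzero for the first
   two calls and zero afterwards.  */
static int
do_something_stub (void)
{
  calls++;
  return calls < 3;
}

/* Original loop: "if (b)" is semi-invariant, since only the true branch
   modifies b.  Returns the number of iterations that took the else path.  */
static int
run_original (int n)
{
  int idle = 0;
  for (int i = 0; i < n; i++)
    {
      if (b)
        b = do_something_stub ();
      else
        idle++;
    }
  return idle;
}

/* Hand-split form: once the false branch is taken, b can no longer change,
   so the remaining iterations run in a second loop without the test.  */
static int
run_split (int n)
{
  int idle = 0;
  int i;
  for (i = 0; i < n; i++)
    {
      if (b)
        b = do_something_stub ();
      else
        {
          idle++;
          i++;
          break;
        }
    }
  for (; i < n; i++)
    idle++;
  return idle;
}

/* Run both forms from the same initial state and report whether they agree
   on the result and on the number of stub calls.  */
int
check_split_equivalence (int n)
{
  b = 1;
  calls = 0;
  int expect = run_original (n);
  int calls_expect = calls;

  b = 1;
  calls = 0;
  int got = run_split (n);

  return got == expect && calls == calls_expect;
}
```

Calling check_split_equivalence (10) from a test driver compares the two
forms.  The second loop in run_split has a straight-line body, which is the
shape that gives later passes a chance to vectorize or remove it.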

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-10-23  5:49                         ` Feng Xue OS
@ 2019-10-23  9:10                           ` Richard Biener
  2019-10-23  9:37                             ` Feng Xue OS
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Biener @ 2019-10-23  9:10 UTC (permalink / raw)
  To: Feng Xue OS
  Cc: Michael Matz, Philipp Tomsich, gcc-patches,
	Christoph Müllner, erick.ochoa

On Wed, Oct 23, 2019 at 5:36 AM Feng Xue OS <fxue@os.amperecomputing.com> wrote:
>
> Michael,
>
> > I've only noticed a couple typos, and one minor remark.
> Typos corrected.
>
> > I just wonder why you duplicated these three loops instead of integrating
> > the real body into the existing LI_FROM_INNERMOST loop.  I would have
> > expected your "if (!optimize_loop_for_size_p && split_loop_on_cond)" block
> > to simply be the else block of the existing
> > "if (... conditions for normal loop splitting ...)" block.
> Adjusted to do two kinds of loop-split in same LI_FROM_INNERMOST loop.
>
> > From my perspective it's okay, but you still need the okay of a proper reviewer,
> > for which you might want to state the testing/regression state of this
> > patch relative to trunk.
>
> Richard,
>
>   Is it ok to commit this patch? Bootstrap and regression test passed. And for
> performance, we can get about 7% improvement on spec2017 omnetpp with this
> patch.

Can you please provide the corresponding ChangeLog entries as well and
attach the patch?  It seems to be garbled by some encoding.

Thanks,
Richard.

> Thanks,
> Feng
>
> ---
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 1407d019d14..d41e5aa0215 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11481,6 +11481,19 @@ The maximum number of branches unswitched in a single loop.
>  @item lim-expensive
>  The minimum cost of an expensive expression in the loop invariant motion.
>
> +@item max-loop-cond-split-insns
> +A conditional statement in a loop is semi-invariant if, once one of its
> +branches is selected in some iteration, every operand that contributes to
> +the conditional expression remains unchanged in all following iterations;
> +a kind of loop split transformation can be done upon such a statement.
> +@option{max-loop-cond-split-insns} controls the maximum number of insns
> +to be added due to loop split on a semi-invariant conditional statement.
> +
> +@item min-loop-cond-split-prob
> +When FDO profile information is available, @option{min-loop-cond-split-prob}
> +specifies the minimum probability threshold for a semi-invariant conditional
> +statement to trigger loop split.
> +
>  @item iv-consider-all-candidates-bound
>  Bound on number of candidates for induction variables, below which
>  all candidates are considered for each use in induction variable
> diff --git a/gcc/params.def b/gcc/params.def
> index 322c37f8b96..73b59f7465e 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -415,6 +415,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
>         "The maximum number of unswitchings in a single loop.",
>         3, 0, 0)
>
> +/* The maximum number of increased insns due to loop split on semi-invariant
> +   condition statement.  */
> +DEFPARAM(PARAM_MAX_LOOP_COND_SPLIT_INSNS,
> +       "max-loop-cond-split-insns",
> +       "The maximum number of insns to be added due to loop split on "
> +       "semi-invariant condition statement.",
> +       100, 0, 0)
> +
> +DEFPARAM(PARAM_MIN_LOOP_COND_SPLIT_PROB,
> +       "min-loop-cond-split-prob",
> +       "The minimum threshold for probability of semi-invariant condition "
> +       "statement to trigger loop split.",
> +       30, 0, 100)
> +
>  /* The maximum number of insns in loop header duplicated by the copy loop
>     headers pass.  */
>  DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
>
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> new file mode 100644
> index 00000000000..51f9da22fc7
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> +
> +#include <string>
> +#include <map>
> +
> +using namespace std;
> +
> +class  A
> +{
> +public:
> +  bool empty;
> +  void set (string s);
> +};
> +
> +class  B
> +{
> +  map<int, string> m;
> +  void f ();
> +};
> +
> +extern A *ga;
> +
> +void B::f ()
> +{
> +  for (map<int, string>::iterator iter = m.begin (); iter != m.end (); ++iter)
> +    {
> +      if (ga->empty)
> +        ga->set (iter->second);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> new file mode 100644
> index 00000000000..bbd522d6bcd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> +
> +__attribute__((pure)) __attribute__((noinline)) int inc (int i)
> +{
> +  return i + 1;
> +}
> +
> +extern int do_something (void);
> +extern int b;
> +
> +void test(int n)
> +{
> +  int i;
> +
> +  for (i = 0; i < n; i = inc (i))
> +    {
> +      if (b)
> +        b = do_something();
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index f5f083384bc..5cffd4bb508 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -32,7 +32,10 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-loop.h"
>  #include "tree-ssa-loop-manip.h"
>  #include "tree-into-ssa.h"
> +#include "tree-inline.h"
> +#include "tree-cfgcleanup.h"
>  #include "cfgloop.h"
> +#include "params.h"
>  #include "tree-scalar-evolution.h"
>  #include "gimple-iterator.h"
>  #include "gimple-pretty-print.h"
> @@ -40,7 +43,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple-fold.h"
>  #include "gimplify-me.h"
>
> -/* This file implements loop splitting, i.e. transformation of loops like
> +/* This file implements two kinds of loop splitting.
> +
> +   One transformation of loops like:
>
>     for (i = 0; i < 100; i++)
>       {
> @@ -487,8 +492,9 @@ compute_new_first_bound (gimple_seq *stmts, class tree_niter_desc *niter,
>     single exit of LOOP.  */
>
>  static bool
> -split_loop (class loop *loop1, class tree_niter_desc *niter)
> +split_loop (class loop *loop1)
>  {
> +  class tree_niter_desc niter;
>    basic_block *bbs;
>    unsigned i;
>    bool changed = false;
> @@ -496,8 +502,28 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
>    tree border = NULL_TREE;
>    affine_iv iv;
>
> +  if (!single_exit (loop1)
> +      /* ??? We could handle non-empty latches when we split the latch edge
> +         (not the exit edge), and put the new exit condition in the new block.
> +        OTOH this executes some code unconditionally that might have been
> +        skipped by the original exit before.  */
> +      || !empty_block_p (loop1->latch)
> +      || !easy_exit_values (loop1)
> +      || !number_of_iterations_exit (loop1, single_exit (loop1), &niter,
> +                                    false, true)
> +      || niter.cmp == ERROR_MARK
> +      /* We can't yet handle loops controlled by a != predicate.  */
> +      || niter.cmp == NE_EXPR)
> +    return false;
> +
>    bbs = get_loop_body (loop1);
>
> +  if (!can_copy_bbs_p (bbs, loop1->num_nodes))
> +    {
> +      free (bbs);
> +      return false;
> +    }
> +
>    /* Find a splitting opportunity.  */
>    for (i = 0; i < loop1->num_nodes; i++)
>      if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
> @@ -505,8 +531,8 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
>         /* Handling opposite steps is not implemented yet.  Neither
>            is handling different step sizes.  */
>         if ((tree_int_cst_sign_bit (iv.step)
> -            != tree_int_cst_sign_bit (niter->control.step))
> -           || !tree_int_cst_equal (iv.step, niter->control.step))
> +            != tree_int_cst_sign_bit (niter.control.step))
> +           || !tree_int_cst_equal (iv.step, niter.control.step))
>           continue;
>
>         /* Find a loop PHI node that defines guard_iv directly,
> @@ -575,7 +601,7 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
>            Compute the new bound for the guarding IV and patch the
>            loop exit to use it instead of original IV and bound.  */
>         gimple_seq stmts = NULL;
> -       tree newend = compute_new_first_bound (&stmts, niter, border,
> +       tree newend = compute_new_first_bound (&stmts, &niter, border,
>                                                guard_code, guard_init);
>         if (stmts)
>           gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
> @@ -612,6 +638,722 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
>    return changed;
>  }
>
> +/* Another transformation of loops like:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))  // expr is pure
> +         a_j = ...;  // change at least one a_j
> +       else
> +         S;          // not change any a_j
> +     }
> +
> +   into:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))
> +         a_j = ...;
> +       else
> +         {
> +           S;
> +           i = NEXT ();
> +           break;
> +         }
> +     }
> +
> +   for (; CHECK (i); i = NEXT ())
> +     {
> +       S;
> +     }
> +
> +   */
> +
> +/* Data structure to hold temporary information during loop split upon
> +   semi-invariant conditional statement.  */
> +class split_info {
> +public:
> +  /* Array of all basic blocks in a loop, returned by get_loop_body().  */
> +  basic_block *bbs;
> +
> +  /* All memory store/clobber statements in a loop.  */
> +  auto_vec<gimple *> memory_stores;
> +
> +  /* Whether the above memory stores vector still needs to be filled.  */
> +  bool need_init;
> +
> +  split_info () : bbs (NULL), need_init (true) { }
> +
> +  ~split_info ()
> +    {
> +      if (bbs)
> +       free (bbs);
> +    }
> +};
> +
> +/* Find all statements with memory-write effect in LOOP, including memory
> +   stores and non-pure function calls, and keep them in a vector.  This is
> +   done only once, since the vector stays constant during the analysis
> +   stage of a semi-invariant condition.  */
> +
> +static void
> +find_vdef_in_loop (struct loop *loop)
> +{
> +  split_info *info = (split_info *) loop->aux;
> +  gphi *vphi = get_virtual_phi (loop->header);
> +
> +  /* Indicate memory store vector has been filled.  */
> +  info->need_init = false;
> +
> +  /* If the loop contains memory operations, there must be a virtual PHI
> +     node in the loop header basic block.  */
> +  if (vphi == NULL)
> +    return;
> +
> +  /* All virtual SSA names inside the loop are connected to be a cyclic
> +     graph via virtual PHI nodes.  The virtual PHI node in loop header just
> +     links the first and the last virtual SSA names, by using the last as
> +     PHI operand to define the first.  */
> +  const edge latch = loop_latch_edge (loop);
> +  const tree first = gimple_phi_result (vphi);
> +  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
> +
> +  /* The virtual SSA cyclic graph might consist of only one SSA name, which
> +     is defined by itself.
> +
> +       .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
> +
> +     This means the loop contains only memory loads, so we can skip it.  */
> +  if (first == last)
> +    return;
> +
> +  auto_vec<gimple *> other_stores;
> +  auto_vec<tree> worklist;
> +  auto_bitmap visited;
> +
> +  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
> +  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
> +  worklist.safe_push (last);
> +
> +  do
> +    {
> +      tree vuse = worklist.pop ();
> +      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
> +
> +      /* We mark the first and last SSA names as visited at the beginning,
> +        and reversely start the process from the last SSA name towards the
> +        first, which ensures that this do-while will not touch SSA names
> +        defined outside of the loop.  */
> +      gcc_assert (gimple_bb (stmt)
> +                 && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
> +
> +      if (gimple_code (stmt) == GIMPLE_PHI)
> +       {
> +         gphi *phi = as_a <gphi *> (stmt);
> +
> +         for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> +           {
> +             tree arg = gimple_phi_arg_def (stmt, i);
> +
> +             if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
> +               worklist.safe_push (arg);
> +           }
> +       }
> +      else
> +       {
> +         tree prev = gimple_vuse (stmt);
> +
> +         /* Non-pure call statement is conservatively assumed to impact all
> +            memory locations.  So place call statements ahead of other memory
> +            stores in the vector, with the idea of using them as shortcut
> +            terminators to memory alias analysis.  */
> +         if (gimple_code (stmt) == GIMPLE_CALL)
> +           info->memory_stores.safe_push (stmt);
> +         else
> +           other_stores.safe_push (stmt);
> +
> +         if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
> +           worklist.safe_push (prev);
> +       }
> +    } while (!worklist.is_empty ());
> +
> +    info->memory_stores.safe_splice (other_stores);
> +}
> +
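
As a reading aid for the traversal above, here is a minimal stand-alone sketch of the same worklist pattern outside of GCC. The node type, the array-based graph and the name collect_stores are assumptions made up for illustration; none of this is GCC API. It walks backwards from the last definition, never expands the header PHI (first), and returns call nodes ahead of plain stores, which is the ordering the comment in the patch asks for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 16

/* Hypothetical stand-ins for virtual PHIs, non-pure calls and stores.  */
enum kind { KIND_PHI, KIND_CALL, KIND_STORE };

struct node
{
  enum kind kind;
  int preds[2];  /* Indices of the definitions this node reads; -1 if unused.  */
};

/* Walk backwards from LAST without expanding FIRST.  Write the indices of
   call nodes, followed by plain store nodes, into OUT and return how many
   were written.  OUT must have room for MAX_NODES entries.  */
static size_t
collect_stores (const struct node *nodes, int first, int last, int *out)
{
  bool visited[MAX_NODES] = { false };
  int worklist[MAX_NODES];
  int others[MAX_NODES];
  size_t n_work = 0, n_calls = 0, n_others = 0;

  assert (first >= 0 && first < MAX_NODES && last >= 0 && last < MAX_NODES);

  /* A header PHI fed by itself means there are no stores at all.  */
  if (first == last)
    return 0;

  visited[first] = visited[last] = true;
  worklist[n_work++] = last;

  while (n_work > 0)
    {
      int cur = worklist[--n_work];
      const struct node *n = &nodes[cur];

      if (n->kind == KIND_CALL)
        out[n_calls++] = cur;
      else if (n->kind == KIND_STORE)
        others[n_others++] = cur;

      /* Each node is pushed at most once, so the worklist cannot overflow.  */
      for (int i = 0; i < 2; i++)
        {
          int p = n->preds[i];
          if (p >= 0 && !visited[p])
            {
              visited[p] = true;
              worklist[n_work++] = p;
            }
        }
    }

  /* Append plain stores after the calls.  */
  for (size_t i = 0; i < n_others; i++)
    out[n_calls + i] = others[i];

  return n_calls + n_others;
}
```

For a diamond-shaped loop body (header PHI 0, store 1 and call 2 on the two arms, merge PHI 3, final store 4), calling collect_stores (nodes, 0, 4, out) visits every definition once and puts the call at the front of out.
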
> +
> +/* Given STMT, memory load or pure call statement, check whether it is impacted
> +   by some memory store in LOOP, excluding trace starting from SKIP_HEAD (the
> +   trace is composed of SKIP_HEAD and those basic blocks dominated by it, and
> +   always corresponds to one branch of a conditional statement).  If SKIP_HEAD
> +   is NULL, all basic blocks of LOOP are checked.  */
> +
> +static bool
> +vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                      const_basic_block skip_head)
> +{
> +  split_info *info = (split_info *) loop->aux;
> +
> +  /* Collect memory store/clobber statements if that has not been done.  */
> +  if (info->need_init)
> +    find_vdef_in_loop (loop);
> +
> +  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
> +  ao_ref ref;
> +  gimple *store;
> +  unsigned i;
> +
> +  ao_ref_init (&ref, rhs);
> +
> +  FOR_EACH_VEC_ELT (info->memory_stores, i, store)
> +    {
> +      /* Skip basic blocks dominated by SKIP_HEAD, if non-NULL.  */
> +      if (skip_head
> +         && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
> +       continue;
> +
> +      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
> +       return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Forward declaration.  */
> +
> +static bool
> +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                      const_basic_block skip_head);
> +
> +/* Suppose one condition branch, led by SKIP_HEAD, is not executed since
> +   certain iteration of LOOP, check whether an SSA name (NAME) remains
> +   unchanged in next iteration.  We call this characteristic semi-
> +   invariantness.  SKIP_HEAD might be NULL, if so, nothing excluded, all
> +   basic blocks and control flows in the loop will be considered.  If non-
> +   NULL, SSA name to check is supposed to be defined before SKIP_HEAD.  */
> +
> +static bool
> +ssa_semi_invariant_p (struct loop *loop, const tree name,
> +                     const_basic_block skip_head)
> +{
> +  gimple *def = SSA_NAME_DEF_STMT (name);
> +  const_basic_block def_bb = gimple_bb (def);
> +
> +  /* An SSA name defined outside a loop is definitely semi-invariant.  */
> +  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
> +    return true;
> +
> +  if (gimple_code (def) == GIMPLE_PHI)
> +    {
> +      /* For PHI node that is not in loop header, its source operands should
> +        be defined inside the loop, which are seen as loop variant.  */
> +      if (def_bb != loop->header || !skip_head)
> +       return false;
> +
> +      const_edge latch = loop_latch_edge (loop);
> +      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
> +
> +      /* A PHI node in loop header contains two source operands, one is
> +        initial value, the other is the copy of last iteration through loop
> +        latch, we call it latch value.  From the PHI node to definition
> +        of latch value, if excluding branch trace from SKIP_HEAD, there
> +        is no definition of other version of same variable, SSA name defined
> +        by the PHI node is semi-invariant.
> +
> +                         loop entry
> +                              |     .--- latch ---.
> +                              |     |             |
> +                              v     v             |
> +                  x_1 = PHI <x_0,  x_3>           |
> +                           |                      |
> +                           v                      |
> +              .------- if (cond) -------.         |
> +              |                         |         |
> +              |                     [ SKIP ]      |
> +              |                         |         |
> +              |                     x_2 = ...     |
> +              |                         |         |
> +              '---- T ---->.<---- F ----'         |
> +                           |                      |
> +                           v                      |
> +                  x_3 = PHI <x_1, x_2>            |
> +                           |                      |
> +                           '----------------------'
> +
> +       Suppose in certain iteration, execution flow in above graph goes
> +       through true branch, which means that one source value to define
> +       x_3 in false branch (x_2) is skipped, x_3 only comes from x_1, and
> +       x_1 in next iterations is defined by x_3, we know that x_1 will
> +       never change if COND always chooses true branch from then on.  */
> +
> +      while (from != name)
> +       {
> +         /* A new value comes from a CONSTANT.  */
> +         if (TREE_CODE (from) != SSA_NAME)
> +           return false;
> +
> +         gimple *stmt = SSA_NAME_DEF_STMT (from);
> +         const_basic_block bb = gimple_bb (stmt);
> +
> +         /* A new value comes from outside of loop.  */
> +         if (!bb || !flow_bb_inside_loop_p (loop, bb))
> +           return false;
> +
> +         from = NULL_TREE;
> +
> +         if (gimple_code (stmt) == GIMPLE_PHI)
> +           {
> +             gphi *phi = as_a <gphi *> (stmt);
> +
> +             for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> +               {
> +                 const_edge e = gimple_phi_arg_edge (phi, i);
> +
> +                 /* Don't consider redefinitions in excluded basic blocks.  */
> +                 if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
> +                   {
> +                     /* There is more than one source operand that can
> +                        provide value to the SSA name, it is variant.  */
> +                     if (from)
> +                       return false;
> +
> +                     from = gimple_phi_arg_def (phi, i);
> +                   }
> +               }
> +           }
> +         else if (gimple_code (stmt) == GIMPLE_ASSIGN)
> +           {
> +             /* For simple value copy, check its rhs instead.  */
> +             if (gimple_assign_ssa_name_copy_p (stmt))
> +               from = gimple_assign_rhs1 (stmt);
> +           }
> +
> +         /* Any other kind of definition is deemed to introduce a new value
> +            to the SSA name.  */
> +         if (!from)
> +           return false;
> +       }
> +       return true;
> +    }
> +
> +  /* Value originated from volatile memory load or return of normal (non-
> +     const/pure) call should not be treated as constant in each iteration.  */
> +  if (gimple_has_side_effects (def))
> +    return false;
> +
> +  /* Check if any memory store may kill memory load at this place.  */
> +  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
> +    return false;
> +
> +  /* Check operands of definition statement of the SSA name.  */
> +  return stmt_semi_invariant_p (loop, def, skip_head);
> +}
> +
> +/* Check whether STMT is semi-invariant in LOOP, iff all its operands are
> +   semi-invariant.  Trace composed of basic block SKIP_HEAD and basic blocks
> +   dominated by it are excluded from the loop.  */
> +
> +static bool
> +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                      const_basic_block skip_head)
> +{
> +  ssa_op_iter iter;
> +  tree use;
> +
> +  /* Although operand of a statement might be SSA name, CONSTANT or VARDECL,
> +     here we only need to check SSA name operands.  This is because check on
> +     VARDECL operands, which involve memory loads, must have been done
> +     prior to invocation of this function in vuse_semi_invariant_p.  */
> +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
> +    {
> +      if (!ssa_semi_invariant_p (loop, use, skip_head))
> +       return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Determine when conditional statement never transfers execution to one of its
> +   branches, whether we can remove the branch's leading basic block (BRANCH_BB)
> +   and those basic blocks dominated by BRANCH_BB.  */
> +
> +static bool
> +branch_removable_p (basic_block branch_bb)
> +{
> +  if (single_pred_p (branch_bb))
> +    return true;
> +
> +  edge e;
> +  edge_iterator ei;
> +
> +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
> +    {
> +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
> +       continue;
> +
> +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
> +       continue;
> +
> +       /* The branch can be reached from opposite branch, or from some
> +         statement not dominated by the conditional statement.  */
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Find out which branch of a conditional statement (COND) is invariant in the
> +   execution context of LOOP.  That is: once the branch is selected in certain
> +   iteration of the loop, any operand that contributes to computation of the
> +   conditional statement remains unchanged in all following iterations.  */
> +
> +static edge
> +get_cond_invariant_branch (struct loop *loop, gcond *cond)
> +{
> +  basic_block cond_bb = gimple_bb (cond);
> +  basic_block targ_bb[2];
> +  bool invar[2];
> +  unsigned invar_checks;
> +
> +  for (unsigned i = 0; i < 2; i++)
> +    {
> +      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
> +
> +      /* One branch directs to loop exit, no need to perform loop split upon
> +        this conditional statement.  Firstly, it is trivial if the exit branch
> +        is semi-invariant, for the statement is just to break loop.  Secondly,
> +        if the opposite branch is semi-invariant, it means that the statement
> +        is real loop-invariant, which is covered by loop unswitch.  */
> +      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
> +       return NULL;
> +    }
> +
> +  invar_checks = 0;
> +
> +  for (unsigned i = 0; i < 2; i++)
> +    {
> +      invar[!i] = false;
> +
> +      if (!branch_removable_p (targ_bb[i]))
> +       continue;
> +
> +      /* Given a semi-invariant branch, if its opposite branch dominates
> +        loop latch, it and its following trace will only be executed in
> +        final iteration of loop, namely it is not part of repeated body
> +        of the loop.  Similar to the above case that the branch is loop
> +        exit, no need to split loop.  */
> +      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
> +       continue;
> +
> +      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
> +      invar_checks++;
> +    }
> +
> +  /* Both branches being invariant (handled by loop unswitch) or both being
> +     variant is not what we want.  */
> +  if (invar[0] ^ !invar[1])
> +    return NULL;
> +
> +  /* Found a real loop-invariant condition, do nothing.  */
> +  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
> +    return NULL;
> +
> +  return EDGE_SUCC (cond_bb, (unsigned) invar[1]);
> +}
> +
> +/* Calculate increased code size measured by estimated insn number if applying
> +   loop split upon certain branch (BRANCH_EDGE) of a conditional statement.  */
> +
> +static int
> +compute_added_num_insns (struct loop *loop, const_edge branch_edge)
> +{
> +  basic_block cond_bb = branch_edge->src;
> +  unsigned branch = EDGE_SUCC (cond_bb, 1) == branch_edge;
> +  basic_block opposite_bb = EDGE_SUCC (cond_bb, !branch)->dest;
> +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> +  int num = 0;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    {
> +      /* Do not count basic blocks only in opposite branch.  */
> +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], opposite_bb))
> +       continue;
> +
> +      num += estimate_num_insns_seq (bb_seq (bbs[i]), &eni_size_weights);
> +    }
> +
> +  /* It is unnecessary to evaluate expression of the conditional statement
> +     in new loop that contains only invariant branch.  This expression should
> +     be constant value (either true or false).  Exclude code size of insns
> +     that contribute to computation of the expression.  */
> +
> +  auto_vec<gimple *> worklist;
> +  hash_set<gimple *> removed;
> +  gimple *stmt = last_stmt (cond_bb);
> +
> +  worklist.safe_push (stmt);
> +  removed.add (stmt);
> +  num -= estimate_num_insns (stmt, &eni_size_weights);
> +
> +  do
> +    {
> +      ssa_op_iter opnd_iter;
> +      use_operand_p opnd_p;
> +
> +      stmt = worklist.pop ();
> +      FOR_EACH_PHI_OR_STMT_USE (opnd_p, stmt, opnd_iter, SSA_OP_USE)
> +       {
> +         tree opnd = USE_FROM_PTR (opnd_p);
> +
> +         if (TREE_CODE (opnd) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (opnd))
> +           continue;
> +
> +         gimple *opnd_stmt = SSA_NAME_DEF_STMT (opnd);
> +         use_operand_p use_p;
> +         imm_use_iterator use_iter;
> +
> +         if (removed.contains (opnd_stmt)
> +             || !flow_bb_inside_loop_p (loop, gimple_bb (opnd_stmt)))
> +           continue;
> +
> +         FOR_EACH_IMM_USE_FAST (use_p, use_iter, opnd)
> +           {
> +             gimple *use_stmt = USE_STMT (use_p);
> +
> +             if (!is_gimple_debug (use_stmt) && !removed.contains (use_stmt))
> +               {
> +                 opnd_stmt = NULL;
> +                 break;
> +               }
> +           }
> +
> +         if (opnd_stmt)
> +           {
> +             worklist.safe_push (opnd_stmt);
> +             removed.add (opnd_stmt);
> +             num -= estimate_num_insns (opnd_stmt, &eni_size_weights);
> +           }
> +       }
> +    } while (!worklist.is_empty ());
> +
> +  gcc_assert (num >= 0);
> +  return num;
> +}
> +
> +/* Find out loop-invariant branch of a conditional statement (COND) if it has
> +   one, and check whether it is eligible and profitable to perform loop split
> +   upon this branch in LOOP.  */
> +
> +static edge
> +get_cond_branch_to_split_loop (struct loop *loop, gcond *cond)
> +{
> +  edge invar_branch = get_cond_invariant_branch (loop, cond);
> +
> +  if (!invar_branch)
> +    return NULL;
> +
> +  profile_probability prob = invar_branch->probability;
> +
> +  /* When accurate profile information is available, and execution
> +     frequency of the branch is too low, just let it go.  */
> +  if (prob.reliable_p ())
> +    {
> +      int thres = PARAM_VALUE (PARAM_MIN_LOOP_COND_SPLIT_PROB);
> +
> +      if (prob < profile_probability::always ().apply_scale (thres, 100))
> +       return NULL;
> +    }
> +
> +  /* Add a threshold for increased code size to disable loop split.  */
> +  if (compute_added_num_insns (loop, invar_branch)
> +      > PARAM_VALUE (PARAM_MAX_LOOP_COND_SPLIT_INSNS))
> +    return NULL;
> +
> +  return invar_branch;
> +}
> +
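
The two profitability gates in get_cond_branch_to_split_loop can be restated with plain integers. The sketch below is only that, a restatement: the function name and the percent-based probability are assumptions for illustration, and the constants are the defaults given in the params.def hunk of the V3 patch quoted later in this thread (30 and 100).

```c
#include <assert.h>
#include <stdbool.h>

/* Defaults taken from the params.def hunk of the V3 patch.  */
enum { MIN_LOOP_COND_SPLIT_PROB = 30, MAX_LOOP_COND_SPLIT_INSNS = 100 };

/* PROB_PERCENT is the probability of the semi-invariant branch in percent,
   PROB_RELIABLE tells whether it comes from real profile data, and
   ADDED_INSNS is the estimated code growth.  */
static bool
worth_splitting (int prob_percent, bool prob_reliable, int added_insns)
{
  assert (prob_percent >= 0 && prob_percent <= 100);

  /* The probability gate applies only when the profile is reliable.  */
  if (prob_reliable && prob_percent < MIN_LOOP_COND_SPLIT_PROB)
    return false;

  /* Refuse the split when it would add too much code.  */
  return added_insns <= MAX_LOOP_COND_SPLIT_INSNS;
}
```

Note that an unreliable (guessed) probability never blocks the split; only the size limit applies in that case.
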
> +/* Given a loop (LOOP1) with a loop-invariant branch (INVAR_BRANCH) of some
> +   conditional statement, perform loop split transformation illustrated
> +   as the following graph.
> +
> +               .-------T------ if (true) ------F------.
> +               |                    .---------------. |
> +               |                    |               | |
> +               v                    |               v v
> +          pre-header                |            pre-header
> +               | .------------.     |                 | .------------.
> +               | |            |     |                 | |            |
> +               | v            |     |                 | v            |
> +             header           |     |               header           |
> +               |              |     |                 |              |
> +       [ bool r = cond; ]     |     |                 |              |
> +               |              |     |                 |              |
> +      .---- if (r) -----.     |     |        .--- if (true) ---.     |
> +      |                 |     |     |        |                 |     |
> +  invariant             |     |     |    invariant             |     |
> +      |                 |     |     |        |                 |     |
> +      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
> +               |              |    /                  |              |
> +             stmts            |   /                 stmts            |
> +               |              |  /                    |              |
> +              / \             | /                    / \             |
> +     .-------*   *       [ if (!r) ]        .-------*   *            |
> +     |           |            |             |           |            |
> +     |         latch          |             |         latch          |
> +     |           |            |             |           |            |
> +     |           '------------'             |           '------------'
> +     '------------------------. .-----------'
> +             loop1            | |                   loop2
> +                              v v
> +                             exits
> +
> +   In the graph, loop1 represents the part derived from original one, and
> +   loop2 is duplicated using loop_version (), which corresponds to the part
> +   of original one being split out.  In loop1, a new bool temporary (r)
> +   is introduced to keep value of the condition result.  In original latch
> +   edge of loop1, we insert a new conditional statement whose value comes
> +   from previous temporary (r), one of its branches goes back to loop1 header
> +   as a latch edge, and the other branch goes to loop2 pre-header as an entry
> +   edge.  And also in loop2, we abandon the variant branch of the conditional
> +   statement candidate by setting a constant bool condition, based on which
> +   branch is semi-invariant.  */
> +
> +static bool
> +do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
> +{
> +  basic_block cond_bb = invar_branch->src;
> +  bool true_invar = !!(invar_branch->flags & EDGE_TRUE_VALUE);
> +  gcond *cond = as_a <gcond *> (last_stmt (cond_bb));
> +
> +  gcc_assert (cond_bb->loop_father == loop1);
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +   {
> +     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
> +             current_function_name (), loop1->num,
> +             true_invar ? "T" : "F", cond_bb->index);
> +     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
> +   }
> +
> +  initialize_original_copy_tables ();
> +
> +  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> +                                    profile_probability::always (),
> +                                    profile_probability::never (),
> +                                    profile_probability::always (),
> +                                    profile_probability::always (),
> +                                    true);
> +  if (!loop2)
> +    {
> +      free_original_copy_tables ();
> +      return false;
> +    }
> +
> +  /* Generate a bool type temporary to hold result of the condition.  */
> +  tree tmp = make_ssa_name (boolean_type_node);
> +  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> +  gimple *stmt = gimple_build_assign (tmp,
> +                                     gimple_cond_code (cond),
> +                                     gimple_cond_lhs (cond),
> +                                     gimple_cond_rhs (cond));
> +
> +  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
> +  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
> +  update_stmt (cond);
> +
> +  basic_block cond_bb_copy = get_bb_copy (cond_bb);
> +  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
> +
> +  /* Replace the condition in loop2 with a bool constant to let PassManager
> +     remove the variant branch after current pass completes.  */
> +  if (true_invar)
> +    gimple_cond_make_true (cond_copy);
> +  else
> +    gimple_cond_make_false (cond_copy);
> +
> +  update_stmt (cond_copy);
> +
> +  /* Insert a new conditional statement on latch edge of loop1.  This
> +     statement acts as a switch to transfer execution from loop1 to loop2,
> +     when loop1 enters into invariant state.  */
> +  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
> +  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
> +  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
> +                                         NULL_TREE, NULL_TREE);
> +
> +  gsi = gsi_last_bb (break_bb);
> +  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
> +
> +  edge to_loop1 = single_succ_edge (break_bb);
> +  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
> +
> +  to_loop1->flags &= ~EDGE_FALLTHRU;
> +  to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
> +  to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
> +
> +  update_ssa (TODO_update_ssa);
> +
> +  /* Due to introduction of a control flow edge from loop1 latch to loop2
> +     pre-header, we should update PHIs in loop2 to reflect this connection
> +     between loop1 and loop2.  */
> +  connect_loop_phis (loop1, loop2, to_loop2);
> +
> +  free_original_copy_tables ();
> +
> +  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
> +
> +  return true;
> +}
> +
> +/* Traverse all conditional statements in LOOP, to find out a good candidate
> +   upon which we can do loop split.  */
> +
> +static bool
> +split_loop_on_cond (struct loop *loop)
> +{
> +  split_info *info = new split_info ();
> +  basic_block *bbs = info->bbs = get_loop_body (loop);
> +  bool do_split = false;
> +
> +  /* Allocate an area to keep temporary info, and associate its address
> +     with loop aux field.  */
> +  loop->aux = info;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    {
> +      basic_block bb = bbs[i];
> +
> +      /* We only consider conditional statement, which is executed at most once
> +        in each iteration of the loop.  So skip statements in inner loops.  */
> +      if ((bb->loop_father != loop) || (bb->flags & BB_IRREDUCIBLE_LOOP))
> +       continue;
> +
> +      /* Actually this check is not a strict requirement.  With it, we can ensure
> +        conditional statement will always be executed in each iteration.  */
> +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> +       continue;
> +
> +      gimple *last = last_stmt (bb);
> +
> +      if (!last || gimple_code (last) != GIMPLE_COND)
> +       continue;
> +
> +      gcond *cond = as_a <gcond *> (last);
> +      edge branch_edge = get_cond_branch_to_split_loop (loop, cond);
> +
> +      if (branch_edge)
> +       {
> +         do_split_loop_on_cond (loop, branch_edge);
> +         do_split = true;
> +         break;
> +       }
> +    }
> +
> +  delete info;
> +  loop->aux = NULL;
> +
> +  return do_split;
> +}
> +
>  /* Main entry point.  Perform loop splitting on all suitable loops.  */
>
>  static unsigned int
> @@ -627,7 +1369,6 @@ tree_ssa_split_loops (void)
>    /* Go through all loops starting from innermost.  */
>    FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
>      {
> -      class tree_niter_desc niter;
>        if (loop->aux)
>         {
>           /* If any of our inner loops was split, don't split us,
> @@ -636,29 +1377,14 @@ tree_ssa_split_loops (void)
>           continue;
>         }
>
> -      if (single_exit (loop)
> -         /* ??? We could handle non-empty latches when we split
> -            the latch edge (not the exit edge), and put the new
> -            exit condition in the new block.  OTOH this executes some
> -            code unconditionally that might have been skipped by the
> -            original exit before.  */
> -         && empty_block_p (loop->latch)
> -         && !optimize_loop_for_size_p (loop)
> -         && easy_exit_values (loop)
> -         && number_of_iterations_exit (loop, single_exit (loop), &niter,
> -                                       false, true)
> -         && niter.cmp != ERROR_MARK
> -         /* We can't yet handle loops controlled by a != predicate.  */
> -         && niter.cmp != NE_EXPR
> -         && can_duplicate_loop_p (loop))
> +      if (optimize_loop_for_size_p (loop))
> +        continue;
> +
> +      if (split_loop (loop) || split_loop_on_cond (loop))
>         {
> -         if (split_loop (loop, &niter))
> -           {
> -             /* Mark our containing loop as having had some split inner
> -                loops.  */
> -             loop_outer (loop)->aux = loop;
> -             changed = true;
> -           }
> +         /* Mark our containing loop as having had some split inner loops.  */
> +         loop_outer (loop)->aux = loop;
> +         changed = true;
>         }
>      }
>
> --
> 2.17.1
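
For readers who want to see the effect at source level, here is a hand-written C sketch of what the split amounts to for a loop shaped like the one in gcc.dg/tree-ssa/loop-cond-split-1.c. It is an illustration under assumptions, not compiler output: do_something is a made-up stub driven by a budget counter, work stands in for the "statements" of the original description, and split_loop is written by hand.

```c
#include <assert.h>

static int b;       /* The semi-invariant condition variable.  */
static int budget;  /* Hypothetical state driving the do_something stub.  */
static int calls;   /* Number of do_something calls, for comparison.  */
static long work;   /* Stand-in for the loop's other statements.  */

/* Stub: returns non-zero until the budget is used up.  */
static int
do_something (void)
{
  calls++;
  return --budget > 0;
}

/* Loop before the transformation.  */
static void
original_loop (int n)
{
  for (int i = 0; i < n; i++)
    {
      if (b)
        b = do_something ();
      work += i;
    }
}

/* Hand-split version: once b is seen to be zero it can never change again,
   so the rest of the iterations run in a second loop without the test.  */
static void
split_loop (int n)
{
  int i = 0;

  for (; i < n; i++)
    {
      if (b)
        b = do_something ();
      else
        {
          work += i;
          i++;
          break;
        }
      work += i;
    }

  for (; i < n; i++)
    work += i;
}

/* Reset the shared state, run LOOP and return the accumulated work.  */
static long
run (void (*loop) (int), int n, int initial_budget)
{
  assert (n >= 0);
  b = 1;
  budget = initial_budget;
  calls = 0;
  work = 0;
  loop (n);
  return work;
}
```

Running both versions through run with the same arguments gives the same work total and the same number of do_something calls; the second loop in split_loop has a straight-line body, which is what opens the door to vectorization.
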

* Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-10-23  9:10                           ` Richard Biener
@ 2019-10-23  9:37                             ` Feng Xue OS
  2019-10-23 10:32                               ` Richard Biener
  0 siblings, 1 reply; 31+ messages in thread
From: Feng Xue OS @ 2019-10-23  9:37 UTC (permalink / raw)
  To: Richard Biener
  Cc: Michael Matz, Philipp Tomsich, gcc-patches,
	Christoph Müllner, erick.ochoa

[-- Attachment #1: Type: text/plain, Size: 39167 bytes --]

Patch attached.

Feng

________________________________________
From: Richard Biener <richard.guenther@gmail.com>
Sent: Wednesday, October 23, 2019 5:04 PM
To: Feng Xue OS
Cc: Michael Matz; Philipp Tomsich; gcc-patches@gcc.gnu.org; Christoph Müllner; erick.ochoa@theobroma-systems.com
Subject: Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)

On Wed, Oct 23, 2019 at 5:36 AM Feng Xue OS <fxue@os.amperecomputing.com> wrote:
>
> Michael,
>
> > I've only noticed a couple typos, and one minor remark.
> Typos corrected.
>
> > I just wonder why you duplicated these three loops instead of integrating
> > the real body into the existing LI_FROM_INNERMOST loop.  I would have
> > expected your "if (!optimize_loop_for_size_p && split_loop_on_cond)" block
> > to simply be the else block of the existing
> > "if (... conditions for normal loop splitting ...)" block.
> Adjusted to do two kinds of loop-split in same LI_FROM_INNERMOST loop.
>
> > From my perspective it's okay, but you still need the okay of a proper reviewer,
> > for which you might want to state the testing/regression state of this
> > patch relative to trunk.
>
> Richard,
>
>   Is it ok to commit this patch? Bootstrap and regression test passed. And for
> performance, we can get about 7% improvement on spec2017 omnetpp with this
> patch.

Can you please provide the corresponding ChangeLog entries as well and
attach the patch?  It seems to be garbled by some encoding.

Thanks,
Richard.

> Thanks,
> Feng
>
> ---
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 1407d019d14..d41e5aa0215 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11481,6 +11481,19 @@ The maximum number of branches unswitched in a single loop.
>  @item lim-expensive
>  The minimum cost of an expensive expression in the loop invariant motion.
>
> +@item max-loop-cond-split-insns
> +In a loop, if once a branch of a conditional statement is selected in some
> +loop iteration, every operand that contributes to computation of the conditional
> +expression remains unchanged in all following iterations, the statement is
> +semi-invariant, upon which we can do a kind of loop split transformation.
> +@option{max-loop-cond-split-insns} controls the maximum number of insns to be
> +added due to loop split on a semi-invariant conditional statement.
> +
> +@item min-loop-cond-split-prob
> +When FDO profile information is available, @option{min-loop-cond-split-prob}
> +specifies the minimum threshold for the probability of a semi-invariant
> +condition statement to trigger loop split.
> +
>  @item iv-consider-all-candidates-bound
>  Bound on number of candidates for induction variables, below which
>  all candidates are considered for each use in induction variable
> diff --git a/gcc/params.def b/gcc/params.def
> index 322c37f8b96..73b59f7465e 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -415,6 +415,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
>         "The maximum number of unswitchings in a single loop.",
>         3, 0, 0)
>
> +/* The maximum number of increased insns due to loop split on semi-invariant
> +   condition statement.  */
> +DEFPARAM(PARAM_MAX_LOOP_COND_SPLIT_INSNS,
> +       "max-loop-cond-split-insns",
> +       "The maximum number of insns to be added due to loop split on "
> +       "semi-invariant condition statement.",
> +       100, 0, 0)
> +
> +DEFPARAM(PARAM_MIN_LOOP_COND_SPLIT_PROB,
> +       "min-loop-cond-split-prob",
> +       "The minimum threshold for probability of semi-invariant condition "
> +       "statement to trigger loop split.",
> +       30, 0, 100)
> +
>  /* The maximum number of insns in loop header duplicated by the copy loop
>     headers pass.  */
>  DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
>
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> new file mode 100644
> index 00000000000..51f9da22fc7
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> +
> +#include <string>
> +#include <map>
> +
> +using namespace std;
> +
> +class  A
> +{
> +public:
> +  bool empty;
> +  void set (string s);
> +};
> +
> +class  B
> +{
> +  map<int, string> m;
> +  void f ();
> +};
> +
> +extern A *ga;
> +
> +void B::f ()
> +{
> +  for (map<int, string>::iterator iter = m.begin (); iter != m.end (); ++iter)
> +    {
> +      if (ga->empty)
> +        ga->set (iter->second);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> new file mode 100644
> index 00000000000..bbd522d6bcd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> +
> +__attribute__((pure)) __attribute__((noinline)) int inc (int i)
> +{
> +  return i + 1;
> +}
> +
> +extern int do_something (void);
> +extern int b;
> +
> +void test(int n)
> +{
> +  int i;
> +
> +  for (i = 0; i < n; i = inc (i))
> +    {
> +      if (b)
> +        b = do_something();
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index f5f083384bc..5cffd4bb508 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -32,7 +32,10 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-loop.h"
>  #include "tree-ssa-loop-manip.h"
>  #include "tree-into-ssa.h"
> +#include "tree-inline.h"
> +#include "tree-cfgcleanup.h"
>  #include "cfgloop.h"
> +#include "params.h"
>  #include "tree-scalar-evolution.h"
>  #include "gimple-iterator.h"
>  #include "gimple-pretty-print.h"
> @@ -40,7 +43,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple-fold.h"
>  #include "gimplify-me.h"
>
> -/* This file implements loop splitting, i.e. transformation of loops like
> +/* This file implements two kinds of loop splitting.
> +
> +   One transformation of loops like:
>
>     for (i = 0; i < 100; i++)
>       {
> @@ -487,8 +492,9 @@ compute_new_first_bound (gimple_seq *stmts, class tree_niter_desc *niter,
>     single exit of LOOP.  */
>
>  static bool
> -split_loop (class loop *loop1, class tree_niter_desc *niter)
> +split_loop (class loop *loop1)
>  {
> +  class tree_niter_desc niter;
>    basic_block *bbs;
>    unsigned i;
>    bool changed = false;
> @@ -496,8 +502,28 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
>    tree border = NULL_TREE;
>    affine_iv iv;
>
> +  if (!single_exit (loop1)
> +      /* ??? We could handle non-empty latches when we split the latch edge
> +         (not the exit edge), and put the new exit condition in the new block.
> +        OTOH this executes some code unconditionally that might have been
> +        skipped by the original exit before.  */
> +      || !empty_block_p (loop1->latch)
> +      || !easy_exit_values (loop1)
> +      || !number_of_iterations_exit (loop1, single_exit (loop1), &niter,
> +                                    false, true)
> +      || niter.cmp == ERROR_MARK
> +      /* We can't yet handle loops controlled by a != predicate.  */
> +      || niter.cmp == NE_EXPR)
> +    return false;
> +
>    bbs = get_loop_body (loop1);
>
> +  if (!can_copy_bbs_p (bbs, loop1->num_nodes))
> +    {
> +      free (bbs);
> +      return false;
> +    }
> +
>    /* Find a splitting opportunity.  */
>    for (i = 0; i < loop1->num_nodes; i++)
>      if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
> @@ -505,8 +531,8 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
>         /* Handling opposite steps is not implemented yet.  Neither
>            is handling different step sizes.  */
>         if ((tree_int_cst_sign_bit (iv.step)
> -            != tree_int_cst_sign_bit (niter->control.step))
> -           || !tree_int_cst_equal (iv.step, niter->control.step))
> +            != tree_int_cst_sign_bit (niter.control.step))
> +           || !tree_int_cst_equal (iv.step, niter.control.step))
>           continue;
>
>         /* Find a loop PHI node that defines guard_iv directly,
> @@ -575,7 +601,7 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
>            Compute the new bound for the guarding IV and patch the
>            loop exit to use it instead of original IV and bound.  */
>         gimple_seq stmts = NULL;
> -       tree newend = compute_new_first_bound (&stmts, niter, border,
> +       tree newend = compute_new_first_bound (&stmts, &niter, border,
>                                                guard_code, guard_init);
>         if (stmts)
>           gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
> @@ -612,6 +638,722 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
>    return changed;
>  }
>
> +/* Another transformation of loops like:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))  // expr is pure
> +         a_j = ...;  // change at least one a_j
> +       else
> +         S;          // not change any a_j
> +     }
> +
> +   into:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> +     {
> +       if (expr (a_1, a_2, ..., a_n))
> +         a_j = ...;
> +       else
> +         {
> +           S;
> +           i = NEXT ();
> +           break;
> +         }
> +     }
> +
> +   for (; CHECK (i); i = NEXT ())
> +     {
> +       S;
> +     }
> +
> +   */
> +
> +/* Data structure to hold temporary information during loop split upon
> +   semi-invariant conditional statement.  */
> +class split_info {
> +public:
> +  /* Array of all basic blocks in a loop, returned by get_loop_body().  */
> +  basic_block *bbs;
> +
> +  /* All memory store/clobber statements in a loop.  */
> +  auto_vec<gimple *> memory_stores;
> +
> +  /* Whether the memory stores vector above still needs to be filled.  */
> +  int need_init;
> +
> +  split_info () : bbs (NULL),  need_init (true) { }
> +
> +  ~split_info ()
> +    {
> +      if (bbs)
> +       free (bbs);
> +    }
> +};
> +
> +/* Find all statements with memory-write effect in LOOP, including memory
> +   stores and non-pure function calls, and keep them in a vector.  This is
> +   done only once, because the vector does not change during the analysis
> +   stage of a semi-invariant condition.  */
> +
> +static void
> +find_vdef_in_loop (struct loop *loop)
> +{
> +  split_info *info = (split_info *) loop->aux;
> +  gphi *vphi = get_virtual_phi (loop->header);
> +
> +  /* Indicate memory store vector has been filled.  */
> +  info->need_init = false;
> +
> +  /* If loop contains memory operation, there must be a virtual PHI node in
> +     loop header basic block.  */
> +  if (vphi == NULL)
> +    return;
> +
> +  /* All virtual SSA names inside the loop are connected to be a cyclic
> +     graph via virtual PHI nodes.  The virtual PHI node in loop header just
> +     links the first and the last virtual SSA names, by using the last as
> +     PHI operand to define the first.  */
> +  const edge latch = loop_latch_edge (loop);
> +  const tree first = gimple_phi_result (vphi);
> +  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
> +
> +  /* The virtual SSA cyclic graph might consist of only one SSA name, which
> +     is defined by itself.
> +
> +       .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
> +
> +     This means the loop contains only memory loads, so we can skip it.  */
> +  if (first == last)
> +    return;
> +
> +  auto_vec<gimple *> other_stores;
> +  auto_vec<tree> worklist;
> +  auto_bitmap visited;
> +
> +  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
> +  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
> +  worklist.safe_push (last);
> +
> +  do
> +    {
> +      tree vuse = worklist.pop ();
> +      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
> +
> +      /* We mark the first and last SSA names as visited at the beginning,
> +        and walk backwards from the last SSA name towards the first, which
> +        ensures that this do-while will not touch SSA names defined
> +        outside of the loop.  */
> +      gcc_assert (gimple_bb (stmt)
> +                 && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
> +
> +      if (gimple_code (stmt) == GIMPLE_PHI)
> +       {
> +         gphi *phi = as_a <gphi *> (stmt);
> +
> +         for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> +           {
> +             tree arg = gimple_phi_arg_def (stmt, i);
> +
> +             if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
> +               worklist.safe_push (arg);
> +           }
> +       }
> +      else
> +       {
> +         tree prev = gimple_vuse (stmt);
> +
> +         /* Non-pure call statement is conservatively assumed to impact all
> +            memory locations.  So place call statements ahead of other memory
> +            stores in the vector with the idea of using them as shortcut
> +            terminators to memory alias analysis.  */
> +         if (gimple_code (stmt) == GIMPLE_CALL)
> +           info->memory_stores.safe_push (stmt);
> +         else
> +           other_stores.safe_push (stmt);
> +
> +         if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
> +           worklist.safe_push (prev);
> +       }
> +    } while (!worklist.is_empty ());
> +
> +    info->memory_stores.safe_splice (other_stores);
> +}
> +
> +
> +/* Given STMT, memory load or pure call statement, check whether it is impacted
> +   by some memory store in LOOP, excluding trace starting from SKIP_HEAD (the
> +   trace is composed of SKIP_HEAD and those basic block dominated by it, always
> +   corresponds to one branch of a conditional statement).  If SKIP_HEAD is
> +   NULL, all basic blocks of LOOP are checked.  */
> +
> +static bool
> +vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                      const_basic_block skip_head)
> +{
> +  split_info *info = (split_info *) loop->aux;
> +
> +  /* Collect memory store/clobber statements if not done yet.  */
> +  if (info->need_init)
> +    find_vdef_in_loop (loop);
> +
> +  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
> +  ao_ref ref;
> +  gimple *store;
> +  unsigned i;
> +
> +  ao_ref_init (&ref, rhs);
> +
> +  FOR_EACH_VEC_ELT (info->memory_stores, i, store)
> +    {
> +      /* Skip basic blocks dominated by SKIP_HEAD, if non-NULL.  */
> +      if (skip_head
> +         && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
> +       continue;
> +
> +      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
> +       return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Forward declaration.  */
> +
> +static bool
> +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                      const_basic_block skip_head);
> +
> +/* Suppose one condition branch, led by SKIP_HEAD, is no longer executed
> +   from a certain iteration of LOOP onwards; check whether an SSA name
> +   (NAME) remains unchanged in the next iteration.  We call this property
> +   semi-invariance.  If SKIP_HEAD is NULL, nothing is excluded and all
> +   basic blocks and control flow in the loop are considered.  If it is
> +   non-NULL, NAME is expected to be defined before SKIP_HEAD.  */
> +
> +static bool
> +ssa_semi_invariant_p (struct loop *loop, const tree name,
> +                     const_basic_block skip_head)
> +{
> +  gimple *def = SSA_NAME_DEF_STMT (name);
> +  const_basic_block def_bb = gimple_bb (def);
> +
> +  /* An SSA name defined outside a loop is definitely semi-invariant.  */
> +  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
> +    return true;
> +
> +  if (gimple_code (def) == GIMPLE_PHI)
> +    {
> +      /* For PHI node that is not in loop header, its source operands should
> +        be defined inside the loop, which are seen as loop variant.  */
> +      if (def_bb != loop->header || !skip_head)
> +       return false;
> +
> +      const_edge latch = loop_latch_edge (loop);
> +      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
> +
> +      /* A PHI node in loop header contains two source operands, one is
> +        initial value, the other is the copy of last iteration through loop
> +        latch, we call it latch value.  From the PHI node to definition
> +        of latch value, if excluding branch trace from SKIP_HEAD, there
> +        is no definition of other version of same variable, SSA name defined
> +        by the PHI node is semi-invariant.
> +
> +                         loop entry
> +                              |     .--- latch ---.
> +                              |     |             |
> +                              v     v             |
> +                  x_1 = PHI <x_0,  x_3>           |
> +                           |                      |
> +                           v                      |
> +              .------- if (cond) -------.         |
> +              |                         |         |
> +              |                     [ SKIP ]      |
> +              |                         |         |
> +              |                     x_2 = ...     |
> +              |                         |         |
> +              '---- T ---->.<---- F ----'         |
> +                           |                      |
> +                           v                      |
> +                  x_3 = PHI <x_1, x_2>            |
> +                           |                      |
> +                           '----------------------'
> +
> +       Suppose in certain iteration, execution flow in above graph goes
> +       through true branch, which means that one source value to define
> +       x_3 in false branch (x_2) is skipped, x_3 only comes from x_1, and
> +       x_1 in next iterations is defined by x_3, we know that x_1 will
> +       never change if COND always chooses true branch from then on.  */
> +
> +      while (from != name)
> +       {
> +         /* A new value comes from a CONSTANT.  */
> +         if (TREE_CODE (from) != SSA_NAME)
> +           return false;
> +
> +         gimple *stmt = SSA_NAME_DEF_STMT (from);
> +         const_basic_block bb = gimple_bb (stmt);
> +
> +         /* A new value comes from outside of loop.  */
> +         if (!bb || !flow_bb_inside_loop_p (loop, bb))
> +           return false;
> +
> +         from = NULL_TREE;
> +
> +         if (gimple_code (stmt) == GIMPLE_PHI)
> +           {
> +             gphi *phi = as_a <gphi *> (stmt);
> +
> +             for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> +               {
> +                 const_edge e = gimple_phi_arg_edge (phi, i);
> +
> +                 /* Don't consider redefinitions in excluded basic blocks.  */
> +                 if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
> +                   {
> +                     /* More than one source operand can provide a value
> +                        to the SSA name, so it is variant.  */
> +                     if (from)
> +                       return false;
> +
> +                     from = gimple_phi_arg_def (phi, i);
> +                   }
> +               }
> +           }
> +         else if (gimple_code (stmt) == GIMPLE_ASSIGN)
> +           {
> +             /* For simple value copy, check its rhs instead.  */
> +             if (gimple_assign_ssa_name_copy_p (stmt))
> +               from = gimple_assign_rhs1 (stmt);
> +           }
> +
> +         /* Any other kind of definition is deemed to introduce a new value
> +            to the SSA name.  */
> +         if (!from)
> +           return false;
> +       }
> +       return true;
> +    }
> +
> +  /* Value originated from volatile memory load or return of normal (non-
> +     const/pure) call should not be treated as constant in each iteration.  */
> +  if (gimple_has_side_effects (def))
> +    return false;
> +
> +  /* Check if any memory store may kill memory load at this place.  */
> +  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
> +    return false;
> +
> +  /* Check operands of definition statement of the SSA name.  */
> +  return stmt_semi_invariant_p (loop, def, skip_head);
> +}
> +
> +/* Check whether STMT is semi-invariant in LOOP, which holds iff all its
> +   operands are semi-invariant.  The trace composed of basic block SKIP_HEAD
> +   and the basic blocks dominated by it is excluded from the loop.  */
> +
> +static bool
> +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> +                      const_basic_block skip_head)
> +{
> +  ssa_op_iter iter;
> +  tree use;
> +
> +  /* Although operand of a statement might be SSA name, CONSTANT or VARDECL,
> +     here we only need to check SSA name operands.  This is because check on
> +     VARDECL operands, which involve memory loads, must have been done
> +     prior to invocation of this function in vuse_semi_invariant_p.  */
> +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
> +    {
> +      if (!ssa_semi_invariant_p (loop, use, skip_head))
> +       return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Determine whether, when a conditional statement never transfers execution
> +   to one of its branches, we can remove the branch's leading basic block
> +   (BRANCH_BB) and the basic blocks dominated by BRANCH_BB.  */
> +
> +static bool
> +branch_removable_p (basic_block branch_bb)
> +{
> +  if (single_pred_p (branch_bb))
> +    return true;
> +
> +  edge e;
> +  edge_iterator ei;
> +
> +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
> +    {
> +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
> +       continue;
> +
> +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
> +       continue;
> +
> +       /* The branch can be reached from opposite branch, or from some
> +         statement not dominated by the conditional statement.  */
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Find out which branch of a conditional statement (COND) is invariant in the
> +   execution context of LOOP.  That is: once the branch is selected in certain
> +   iteration of the loop, any operand that contributes to computation of the
> +   conditional statement remains unchanged in all following iterations.  */
> +
> +static edge
> +get_cond_invariant_branch (struct loop *loop, gcond *cond)
> +{
> +  basic_block cond_bb = gimple_bb (cond);
> +  basic_block targ_bb[2];
> +  bool invar[2];
> +  unsigned invar_checks;
> +
> +  for (unsigned i = 0; i < 2; i++)
> +    {
> +      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
> +
> +      /* One branch leads to the loop exit, so there is no need to split the
> +        loop upon this conditional statement.  Firstly, it is trivial if the
> +        exit branch is semi-invariant, for the statement just breaks the loop.
> +        Secondly, if the opposite branch is semi-invariant, the statement is
> +        truly loop-invariant, which is covered by loop unswitching.  */
> +      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
> +       return NULL;
> +    }
> +
> +  invar_checks = 0;
> +
> +  for (unsigned i = 0; i < 2; i++)
> +    {
> +      invar[!i] = false;
> +
> +      if (!branch_removable_p (targ_bb[i]))
> +       continue;
> +
> +      /* Given a semi-invariant branch, if its opposite branch dominates
> +        loop latch, it and its following trace will only be executed in
> +        final iteration of loop, namely it is not part of repeated body
> +        of the loop.  Similar to the above case that the branch is loop
> +        exit, no need to split loop.  */
> +      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
> +       continue;
> +
> +      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
> +      invar_checks++;
> +    }
> +
> +  /* Both branches being invariant (handled by loop unswitch) or both being
> +     variant is not what we want.  */
> +  if (invar[0] ^ !invar[1])
> +    return NULL;
> +
> +  /* Found a real loop-invariant condition, do nothing.  */
> +  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
> +    return NULL;
> +
> +  return EDGE_SUCC (cond_bb, (unsigned) invar[1]);
> +}
> +
> +/* Calculate increased code size measured by estimated insn number if applying
> +   loop split upon certain branch (BRANCH_EDGE) of a conditional statement.  */
> +
> +static int
> +compute_added_num_insns (struct loop *loop, const_edge branch_edge)
> +{
> +  basic_block cond_bb = branch_edge->src;
> +  unsigned branch = EDGE_SUCC (cond_bb, 1) == branch_edge;
> +  basic_block opposite_bb = EDGE_SUCC (cond_bb, !branch)->dest;
> +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> +  int num = 0;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    {
> +      /* Do not count basic blocks only in opposite branch.  */
> +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], opposite_bb))
> +       continue;
> +
> +      num += estimate_num_insns_seq (bb_seq (bbs[i]), &eni_size_weights);
> +    }
> +
> +  /* It is unnecessary to evaluate expression of the conditional statement
> +     in new loop that contains only invariant branch.  This expression should
> +     be constant value (either true or false).  Exclude code size of insns
> +     that contribute to computation of the expression.  */
> +
> +  auto_vec<gimple *> worklist;
> +  hash_set<gimple *> removed;
> +  gimple *stmt = last_stmt (cond_bb);
> +
> +  worklist.safe_push (stmt);
> +  removed.add (stmt);
> +  num -= estimate_num_insns (stmt, &eni_size_weights);
> +
> +  do
> +    {
> +      ssa_op_iter opnd_iter;
> +      use_operand_p opnd_p;
> +
> +      stmt = worklist.pop ();
> +      FOR_EACH_PHI_OR_STMT_USE (opnd_p, stmt, opnd_iter, SSA_OP_USE)
> +       {
> +         tree opnd = USE_FROM_PTR (opnd_p);
> +
> +         if (TREE_CODE (opnd) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (opnd))
> +           continue;
> +
> +         gimple *opnd_stmt = SSA_NAME_DEF_STMT (opnd);
> +         use_operand_p use_p;
> +         imm_use_iterator use_iter;
> +
> +         if (removed.contains (opnd_stmt)
> +             || !flow_bb_inside_loop_p (loop, gimple_bb (opnd_stmt)))
> +           continue;
> +
> +         FOR_EACH_IMM_USE_FAST (use_p, use_iter, opnd)
> +           {
> +             gimple *use_stmt = USE_STMT (use_p);
> +
> +             if (!is_gimple_debug (use_stmt) && !removed.contains (use_stmt))
> +               {
> +                 opnd_stmt = NULL;
> +                 break;
> +               }
> +           }
> +
> +         if (opnd_stmt)
> +           {
> +             worklist.safe_push (opnd_stmt);
> +             removed.add (opnd_stmt);
> +             num -= estimate_num_insns (opnd_stmt, &eni_size_weights);
> +           }
> +       }
> +    } while (!worklist.is_empty ());
> +
> +  gcc_assert (num >= 0);
> +  return num;
> +}
> +
> +/* Find the invariant branch of a conditional statement (COND), if it has
> +   one, and check whether it is eligible and profitable to perform loop split
> +   upon this branch in LOOP.  */
> +
> +static edge
> +get_cond_branch_to_split_loop (struct loop *loop, gcond *cond)
> +{
> +  edge invar_branch = get_cond_invariant_branch (loop, cond);
> +
> +  if (!invar_branch)
> +    return NULL;
> +
> +  profile_probability prob = invar_branch->probability;
> +
> +  /* When accurate profile information is available, and execution
> +     frequency of the branch is too low, just let it go.  */
> +  if (prob.reliable_p ())
> +    {
> +      int thres = PARAM_VALUE (PARAM_MIN_LOOP_COND_SPLIT_PROB);
> +
> +      if (prob < profile_probability::always ().apply_scale (thres, 100))
> +       return NULL;
> +    }
> +
> +  /* Add a threshold for increased code size to disable loop split.  */
> +  if (compute_added_num_insns (loop, invar_branch)
> +      > PARAM_VALUE (PARAM_MAX_LOOP_COND_SPLIT_INSNS))
> +    return NULL;
> +
> +  return invar_branch;
> +}
> +
> +/* Given a loop (LOOP1) with a loop-invariant branch (INVAR_BRANCH) of some
> +   conditional statement, perform loop split transformation illustrated
> +   as the following graph.
> +
> +               .-------T------ if (true) ------F------.
> +               |                    .---------------. |
> +               |                    |               | |
> +               v                    |               v v
> +          pre-header                |            pre-header
> +               | .------------.     |                 | .------------.
> +               | |            |     |                 | |            |
> +               | v            |     |                 | v            |
> +             header           |     |               header           |
> +               |              |     |                 |              |
> +       [ bool r = cond; ]     |     |                 |              |
> +               |              |     |                 |              |
> +      .---- if (r) -----.     |     |        .--- if (true) ---.     |
> +      |                 |     |     |        |                 |     |
> +  invariant             |     |     |    invariant             |     |
> +      |                 |     |     |        |                 |     |
> +      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
> +               |              |    /                  |              |
> +             stmts            |   /                 stmts            |
> +               |              |  /                    |              |
> +              / \             | /                    / \             |
> +     .-------*   *       [ if (!r) ]        .-------*   *            |
> +     |           |            |             |           |            |
> +     |         latch          |             |         latch          |
> +     |           |            |             |           |            |
> +     |           '------------'             |           '------------'
> +     '------------------------. .-----------'
> +             loop1            | |                   loop2
> +                              v v
> +                             exits
> +
> +   In the graph, loop1 represents the part derived from original one, and
> +   loop2 is duplicated using loop_version (), which corresponds to the part
> +   of original one being split out.  In loop1, a new bool temporary (r)
> +   is introduced to keep value of the condition result.  In original latch
> +   edge of loop1, we insert a new conditional statement whose value comes
> +   from previous temporary (r), one of its branches goes back to loop1 header
> +   as a latch edge, and the other branch goes to loop2 pre-header as an entry
> +   edge.  And also in loop2, we abandon the variant branch of the conditional
> +   statement candidate by setting a constant bool condition, based on which
> +   branch is semi-invariant.  */
> +
> +static bool
> +do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
> +{
> +  basic_block cond_bb = invar_branch->src;
> +  bool true_invar = !!(invar_branch->flags & EDGE_TRUE_VALUE);
> +  gcond *cond = as_a <gcond *> (last_stmt (cond_bb));
> +
> +  gcc_assert (cond_bb->loop_father == loop1);
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +   {
> +     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
> +             current_function_name (), loop1->num,
> +             true_invar ? "T" : "F", cond_bb->index);
> +     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
> +   }
> +
> +  initialize_original_copy_tables ();
> +
> +  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> +                                    profile_probability::always (),
> +                                    profile_probability::never (),
> +                                    profile_probability::always (),
> +                                    profile_probability::always (),
> +                                    true);
> +  if (!loop2)
> +    {
> +      free_original_copy_tables ();
> +      return false;
> +    }
> +
> +  /* Generate a bool type temporary to hold result of the condition.  */
> +  tree tmp = make_ssa_name (boolean_type_node);
> +  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> +  gimple *stmt = gimple_build_assign (tmp,
> +                                     gimple_cond_code (cond),
> +                                     gimple_cond_lhs (cond),
> +                                     gimple_cond_rhs (cond));
> +
> +  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
> +  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
> +  update_stmt (cond);
> +
> +  basic_block cond_bb_copy = get_bb_copy (cond_bb);
> +  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
> +
> +  /* Replace the condition in loop2 with a bool constant to let PassManager
> +     remove the variant branch after current pass completes.  */
> +  if (true_invar)
> +    gimple_cond_make_true (cond_copy);
> +  else
> +    gimple_cond_make_false (cond_copy);
> +
> +  update_stmt (cond_copy);
> +
> +  /* Insert a new conditional statement on latch edge of loop1.  This
> +     statement acts as a switch to transfer execution from loop1 to loop2,
> +     when loop1 enters into invariant state.  */
> +  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
> +  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
> +  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
> +                                         NULL_TREE, NULL_TREE);
> +
> +  gsi = gsi_last_bb (break_bb);
> +  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
> +
> +  edge to_loop1 = single_succ_edge (break_bb);
> +  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
> +
> +  to_loop1->flags &= ~EDGE_FALLTHRU;
> +  to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
> +  to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
> +
> +  update_ssa (TODO_update_ssa);
> +
> +  /* Due to introduction of a control flow edge from loop1 latch to loop2
> +     pre-header, we should update PHIs in loop2 to reflect this connection
> +     between loop1 and loop2.  */
> +  connect_loop_phis (loop1, loop2, to_loop2);
> +
> +  free_original_copy_tables ();
> +
> +  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
> +
> +  return true;
> +}
> +
> +/* Traverse all conditional statements in LOOP, to find out a good candidate
> +   upon which we can do loop split.  */
> +
> +static bool
> +split_loop_on_cond (struct loop *loop)
> +{
> +  split_info *info = new split_info ();
> +  basic_block *bbs = info->bbs = get_loop_body (loop);
> +  bool do_split = false;
> +
> +  /* Allocate an area to keep temporary info, and associate its address
> +     with loop aux field.  */
> +  loop->aux = info;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +    {
> +      basic_block bb = bbs[i];
> +
> +      /* Only consider conditional statements that are executed at most once
> +        in each iteration of the loop.  So skip statements in inner loops.  */
> +      if ((bb->loop_father != loop) || (bb->flags & BB_IRREDUCIBLE_LOOP))
> +       continue;
> +
> +      /* This check is not strictly required.  With it, we can ensure that the
> +        conditional statement is always executed in each iteration.  */
> +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> +       continue;
> +
> +      gimple *last = last_stmt (bb);
> +
> +      if (!last || gimple_code (last) != GIMPLE_COND)
> +       continue;
> +
> +      gcond *cond = as_a <gcond *> (last);
> +      edge branch_edge = get_cond_branch_to_split_loop (loop, cond);
> +
> +      if (branch_edge)
> +       {
> +         do_split_loop_on_cond (loop, branch_edge);
> +         do_split = true;
> +         break;
> +       }
> +    }
> +
> +  delete info;
> +  loop->aux = NULL;
> +
> +  return do_split;
> +}
> +
>  /* Main entry point.  Perform loop splitting on all suitable loops.  */
>
>  static unsigned int
> @@ -627,7 +1369,6 @@ tree_ssa_split_loops (void)
>    /* Go through all loops starting from innermost.  */
>    FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
>      {
> -      class tree_niter_desc niter;
>        if (loop->aux)
>         {
>           /* If any of our inner loops was split, don't split us,
> @@ -636,29 +1377,14 @@ tree_ssa_split_loops (void)
>           continue;
>         }
>
> -      if (single_exit (loop)
> -         /* ??? We could handle non-empty latches when we split
> -            the latch edge (not the exit edge), and put the new
> -            exit condition in the new block.  OTOH this executes some
> -            code unconditionally that might have been skipped by the
> -            original exit before.  */
> -         && empty_block_p (loop->latch)
> -         && !optimize_loop_for_size_p (loop)
> -         && easy_exit_values (loop)
> -         && number_of_iterations_exit (loop, single_exit (loop), &niter,
> -                                       false, true)
> -         && niter.cmp != ERROR_MARK
> -         /* We can't yet handle loops controlled by a != predicate.  */
> -         && niter.cmp != NE_EXPR
> -         && can_duplicate_loop_p (loop))
> +      if (optimize_loop_for_size_p (loop))
> +        continue;
> +
> +      if (split_loop (loop) || split_loop_on_cond (loop))
>         {
> -         if (split_loop (loop, &niter))
> -           {
> -             /* Mark our containing loop as having had some split inner
> -                loops.  */
> -             loop_outer (loop)->aux = loop;
> -             changed = true;
> -           }
> +         /* Mark our containing loop as having had some split inner loops.  */
> +         loop_outer (loop)->aux = loop;
> +         changed = true;
>         }
>      }
>
> --
> 2.17.1
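
For readers skimming the thread, the source-level effect of the split can be
sketched by hand.  This is only an illustration of the transformation described
in the patch comments, not output of the pass: the helper do_something () below
is a hypothetical stand-in (a counter-driven function), and the names
original_loop, split_loop_by_hand and budget are made up for this example.

```c
#include <stddef.h>

/* Hypothetical stand-in for do_something (): stays nonzero until the
   budget runs out, then returns 0.  */
static int budget;

static int
do_something (void)
{
  return --budget > 0;
}

/* Original form.  "if (b)" is semi-invariant: only its true branch can
   change b, so once b becomes 0 it stays 0 for the rest of the loop.  */
static long
original_loop (const int *a, size_t n, int b, int init_budget)
{
  long sum = 0;

  budget = init_budget;
  for (size_t i = 0; i < n; i++)
    {
      if (b)
        b = do_something ();
      sum += a[i];              /* "statements": no effect on b */
    }
  return sum + budget;
}

/* Hand-split form.  The first loop runs until the false branch is taken
   once; the second loop then finishes the remaining iterations without
   testing b at all.  */
static long
split_loop_by_hand (const int *a, size_t n, int b, int init_budget)
{
  long sum = 0;
  size_t i;

  budget = init_budget;
  for (i = 0; i < n; i++)
    {
      if (b)
        b = do_something ();
      else
        {
          sum += a[i];
          i++;                  /* the break skips the loop's own i++ */
          break;
        }
      sum += a[i];
    }

  for (; i < n; i++)
    sum += a[i];

  return sum + budget;
}
```

Both functions should return the same value for the same arguments; the second
loop in the hand-split form is straight-line code.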

[-- Attachment #2: loop-split.patch --]
[-- Type: text/x-patch; name="loop-split.patch", Size: 35296 bytes --]

From 3cb899240b28e46457728bd9a6f2957d873e8dff Mon Sep 17 00:00:00 2001
From: Feng Xue <feng.xue@amperecomputing.com>
Date: Tue, 12 Mar 2019 11:46:19 +0800
Subject: [PATCH] Loop split on semi-invariant conditional statement

---
 gcc/ChangeLog                                 |  16 +
 gcc/doc/invoke.texi                           |  13 +
 gcc/params.def                                |  14 +
 gcc/testsuite/ChangeLog                       |   6 +
 .../g++.dg/tree-ssa/loop-cond-split-1.C       |  33 +
 .../gcc.dg/tree-ssa/loop-cond-split-1.c       |  23 +
 gcc/tree-ssa-loop-split.c                     | 782 +++++++++++++++++-
 7 files changed, 859 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8ec312d7470..780060b8698 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,19 @@
+2019-10-23  Feng Xue <fxue@os.amperecomputing.com>
+
+	PR tree-optimization/89134
+	* doc/invoke.texi (max-loop-cond-split-insns): Document new --params.
+	(min-loop-cond-split-prob): Likewise.
+	* params.def: Add max-loop-cond-split-insns, min-loop-cond-split-prob.
+	* tree-ssa-loop-split.c (split_loop): Remove niter parameter, move some
+	outside checks on loop into the function.
+	(split_info): New class.
+	(find_vdef_in_loop, vuse_semi_invariant_p): New functions.
+	(ssa_semi_invariant_p, stmt_semi_invariant_p): Likewise.
+	(branch_removable_p, get_cond_invariant_branch): Likewise.
+	(compute_added_num_insns, get_cond_branch_to_split_loop): Likewise.
+	(do_split_loop_on_cond, split_loop_on_cond): Likewise.
+	(tree_ssa_split_loops): Add loop split on conditional statement.
+
 2019-10-23  Iain Sandoe  <iain@sandoe.co.uk>
 
 	* config/rs6000/darwin.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Guard
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1407d019d14..d41e5aa0215 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11481,6 +11481,19 @@ The maximum number of branches unswitched in a single loop.
 @item lim-expensive
 The minimum cost of an expensive expression in the loop invariant motion.
 
+@item max-loop-cond-split-insns
+In a loop, if one branch of a conditional statement, once selected in some
+iteration, leaves every operand that contributes to the conditional
+expression unchanged in all following iterations, the statement is
+semi-invariant, and the loop can be split on it.
+@option{max-loop-cond-split-insns} controls the maximum number of insns to be
+added by a loop split on a semi-invariant conditional statement.
+
+@item min-loop-cond-split-prob
+When FDO profile information is available, @option{min-loop-cond-split-prob}
+specifies the minimum probability threshold for a semi-invariant conditional
+statement to trigger a loop split.
+
 @item iv-consider-all-candidates-bound
 Bound on number of candidates for induction variables, below which
 all candidates are considered for each use in induction variable
diff --git a/gcc/params.def b/gcc/params.def
index 322c37f8b96..73b59f7465e 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -415,6 +415,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
 	"The maximum number of unswitchings in a single loop.",
 	3, 0, 0)
 
+/* The maximum number of increased insns due to loop split on semi-invariant
+   condition statement.  */
+DEFPARAM(PARAM_MAX_LOOP_COND_SPLIT_INSNS,
+	"max-loop-cond-split-insns",
+	"The maximum number of insns to be added due to loop split on "
+	"semi-invariant condition statement.",
+	100, 0, 0)
+
+DEFPARAM(PARAM_MIN_LOOP_COND_SPLIT_PROB,
+	"min-loop-cond-split-prob",
+	"The minimum threshold for probability of semi-invariant condition "
+	"statement to trigger loop split.",
+	30, 0, 100)
+
 /* The maximum number of insns in loop header duplicated by the copy loop
    headers pass.  */
 DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index fd272807d0b..892cf457b8d 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2019-10-23  Feng Xue  <fxue@os.amperecomputing.com>
+
+	PR tree-optimization/89134
+	* gcc.dg/tree-ssa/loop-cond-split-1.c: New test.
+	* g++.dg/tree-ssa/loop-cond-split-1.C: New test.
+
 2019-10-22  Marc Glisse  <marc.glisse@inria.fr>
 
 	PR c++/85746
diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
new file mode 100644
index 00000000000..51f9da22fc7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
+
+#include <string>
+#include <map>
+
+using namespace std;
+
+class  A
+{
+public:
+  bool empty;
+  void set (string s);
+};
+
+class  B
+{
+  map<int, string> m;
+  void f ();
+};
+
+extern A *ga;
+
+void B::f ()
+{
+  for (map<int, string>::iterator iter = m.begin (); iter != m.end (); ++iter)
+    {
+      if (ga->empty)
+        ga->set (iter->second);
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
new file mode 100644
index 00000000000..bbd522d6bcd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
+
+__attribute__((pure)) __attribute__((noinline)) int inc (int i)
+{
+  return i + 1;
+}
+
+extern int do_something (void);
+extern int b;
+
+void test(int n)
+{
+  int i;
+
+  for (i = 0; i < n; i = inc (i))
+    {
+      if (b)
+        b = do_something();
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index f5f083384bc..5cffd4bb508 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -32,7 +32,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop.h"
 #include "tree-ssa-loop-manip.h"
 #include "tree-into-ssa.h"
+#include "tree-inline.h"
+#include "tree-cfgcleanup.h"
 #include "cfgloop.h"
+#include "params.h"
 #include "tree-scalar-evolution.h"
 #include "gimple-iterator.h"
 #include "gimple-pretty-print.h"
@@ -40,7 +43,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "gimplify-me.h"
 
-/* This file implements loop splitting, i.e. transformation of loops like
+/* This file implements two kinds of loop splitting.
+
+   One transformation of loops like:
 
    for (i = 0; i < 100; i++)
      {
@@ -487,8 +492,9 @@ compute_new_first_bound (gimple_seq *stmts, class tree_niter_desc *niter,
    single exit of LOOP.  */
 
 static bool
-split_loop (class loop *loop1, class tree_niter_desc *niter)
+split_loop (class loop *loop1)
 {
+  class tree_niter_desc niter;
   basic_block *bbs;
   unsigned i;
   bool changed = false;
@@ -496,8 +502,28 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
   tree border = NULL_TREE;
   affine_iv iv;
 
+  if (!single_exit (loop1)
+      /* ??? We could handle non-empty latches when we split the latch edge
+         (not the exit edge), and put the new exit condition in the new block.
+	 OTOH this executes some code unconditionally that might have been
+	 skipped by the original exit before.  */
+      || !empty_block_p (loop1->latch)
+      || !easy_exit_values (loop1)
+      || !number_of_iterations_exit (loop1, single_exit (loop1), &niter,
+				     false, true)
+      || niter.cmp == ERROR_MARK
+      /* We can't yet handle loops controlled by a != predicate.  */
+      || niter.cmp == NE_EXPR)
+    return false;
+
   bbs = get_loop_body (loop1);
 
+  if (!can_copy_bbs_p (bbs, loop1->num_nodes))
+    {
+      free (bbs);
+      return false;
+    }
+
   /* Find a splitting opportunity.  */
   for (i = 0; i < loop1->num_nodes; i++)
     if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
@@ -505,8 +531,8 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
 	/* Handling opposite steps is not implemented yet.  Neither
 	   is handling different step sizes.  */
 	if ((tree_int_cst_sign_bit (iv.step)
-	     != tree_int_cst_sign_bit (niter->control.step))
-	    || !tree_int_cst_equal (iv.step, niter->control.step))
+	     != tree_int_cst_sign_bit (niter.control.step))
+	    || !tree_int_cst_equal (iv.step, niter.control.step))
 	  continue;
 
 	/* Find a loop PHI node that defines guard_iv directly,
@@ -575,7 +601,7 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
 	   Compute the new bound for the guarding IV and patch the
 	   loop exit to use it instead of original IV and bound.  */
 	gimple_seq stmts = NULL;
-	tree newend = compute_new_first_bound (&stmts, niter, border,
+	tree newend = compute_new_first_bound (&stmts, &niter, border,
 					       guard_code, guard_init);
 	if (stmts)
 	  gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
@@ -612,6 +638,722 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
   return changed;
 }
 
+/* Another transformation of loops like:
+
+   for (i = INIT (); CHECK (i); i = NEXT ())
+     {
+       if (expr (a_1, a_2, ..., a_n))  // expr is pure
+         a_j = ...;  // changes at least one a_j
+       else
+         S;          // does not change any a_j
+     }
+
+   into:
+
+   for (i = INIT (); CHECK (i); i = NEXT ())
+     {
+       if (expr (a_1, a_2, ..., a_n))
+         a_j = ...;
+       else
+         {
+           S;
+           i = NEXT ();
+           break;
+         }
+     }
+
+   for (; CHECK (i); i = NEXT ())
+     {
+       S;
+     }
+
+   */
+
+/* Data structure to hold temporary information during loop split upon
+   semi-invariant conditional statement.  */
+class split_info {
+public:
+  /* Array of all basic blocks in a loop, returned by get_loop_body().  */
+  basic_block *bbs;
+
+  /* All memory store/clobber statements in a loop.  */
+  auto_vec<gimple *> memory_stores;
+
+  /* Whether the memory stores vector above still needs to be filled.  */
+  int need_init;
+
+  split_info () : bbs (NULL),  need_init (true) { }
+
+  ~split_info ()
+    {
+      if (bbs)
+	free (bbs);
+    }
+};
+
+/* Find all statements with memory-write effect in LOOP, including memory
+   store and non-pure function call, and keep those in a vector.  This work
+   is done only once, since the vector stays constant during the analysis
+   stage of semi-invariant condition.  */
+
+static void
+find_vdef_in_loop (struct loop *loop)
+{
+  split_info *info = (split_info *) loop->aux;
+  gphi *vphi = get_virtual_phi (loop->header);
+
+  /* Indicate memory store vector has been filled.  */
+  info->need_init = false;
+
+  /* If loop contains memory operation, there must be a virtual PHI node in
+     loop header basic block.  */
+  if (vphi == NULL)
+    return;
+
+  /* All virtual SSA names inside the loop are connected to be a cyclic
+     graph via virtual PHI nodes.  The virtual PHI node in loop header just
+     links the first and the last virtual SSA names, by using the last as
+     PHI operand to define the first.  */
+  const edge latch = loop_latch_edge (loop);
+  const tree first = gimple_phi_result (vphi);
+  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
+
+  /* The virtual SSA cyclic graph might consist of only one SSA name, which
+     is defined by itself.
+
+       .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
+
+     This means the loop contains only memory loads, so we can skip it.  */
+  if (first == last)
+    return;
+
+  auto_vec<gimple *> other_stores;
+  auto_vec<tree> worklist;
+  auto_bitmap visited;
+
+  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
+  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
+  worklist.safe_push (last);
+
+  do
+    {
+      tree vuse = worklist.pop ();
+      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
+
+      /* We mark the first and last SSA names as visited at the beginning,
+	 and walk backwards from the last SSA name towards the first, which
+	 ensures that this do-while will not touch SSA names defined outside
+	 of the loop.  */
+      gcc_assert (gimple_bb (stmt)
+		  && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
+
+      if (gimple_code (stmt) == GIMPLE_PHI)
+	{
+	  gphi *phi = as_a <gphi *> (stmt);
+
+	  for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+	    {
+	      tree arg = gimple_phi_arg_def (stmt, i);
+
+	      if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
+		worklist.safe_push (arg);
+	    }
+	}
+      else
+	{
+	  tree prev = gimple_vuse (stmt);
+
+	  /* Non-pure call statement is conservatively assumed to impact all
+	     memory locations.  So place call statements ahead of other memory
+	     stores in the vector, with the idea of using them as shortcut
+	     terminators to memory alias analysis.  */
+	  if (gimple_code (stmt) == GIMPLE_CALL)
+	    info->memory_stores.safe_push (stmt);
+	  else
+	    other_stores.safe_push (stmt);
+
+	  if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
+	    worklist.safe_push (prev);
+	}
+    } while (!worklist.is_empty ());
+
+  info->memory_stores.safe_splice (other_stores);
+}
+
+
+/* Given STMT, a memory load or pure call statement, check whether it is
+   impacted by some memory store in LOOP, excluding the trace starting from
+   SKIP_HEAD (the trace is composed of SKIP_HEAD and the basic blocks dominated
+   by it, and always corresponds to one branch of a conditional statement).  If
+   SKIP_HEAD is NULL, all basic blocks of LOOP are checked.  */
+
+static bool
+vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
+		       const_basic_block skip_head)
+{
+  split_info *info = (split_info *) loop->aux;
+
+  /* Collect memory store/clobber statements if that has not been done yet.  */
+  if (info->need_init)
+    find_vdef_in_loop (loop);
+
+  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
+  ao_ref ref;
+  gimple *store;
+  unsigned i;
+
+  ao_ref_init (&ref, rhs);
+
+  FOR_EACH_VEC_ELT (info->memory_stores, i, store)
+    {
+      /* Skip basic blocks dominated by SKIP_HEAD, if non-NULL.  */
+      if (skip_head
+	  && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
+	continue;
+
+      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
+	return false;
+    }
+
+  return true;
+}
+
+/* Forward declaration.  */
+
+static bool
+stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
+		       const_basic_block skip_head);
+
+/* Suppose one condition branch, led by SKIP_HEAD, is no longer executed
+   from a certain iteration of LOOP onwards; check whether an SSA name (NAME)
+   remains unchanged in the following iterations.  We call this property
+   semi-invariance.  SKIP_HEAD might be NULL; if so, nothing is excluded, and
+   all basic blocks and control flows in the loop are considered.  If it is
+   non-NULL, the SSA name to check must be defined before SKIP_HEAD.  */
+
+static bool
+ssa_semi_invariant_p (struct loop *loop, const tree name,
+		      const_basic_block skip_head)
+{
+  gimple *def = SSA_NAME_DEF_STMT (name);
+  const_basic_block def_bb = gimple_bb (def);
+
+  /* An SSA name defined outside a loop is definitely semi-invariant.  */
+  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
+    return true;
+
+  if (gimple_code (def) == GIMPLE_PHI)
+    {
+      /* For PHI node that is not in loop header, its source operands should
+	 be defined inside the loop, which are seen as loop variant.  */
+      if (def_bb != loop->header || !skip_head)
+	return false;
+
+      const_edge latch = loop_latch_edge (loop);
+      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
+
+      /* A PHI node in loop header has two source operands: the initial
+	 value, and the copy from the last iteration through the loop latch,
+	 which we call the latch value.  If, between the PHI node and the
+	 definition of the latch value, and excluding the branch trace from
+	 SKIP_HEAD, no other version of the same variable is defined, the SSA
+	 name defined by the PHI node is semi-invariant.
+
+                         loop entry
+                              |     .--- latch ---.
+                              |     |             |
+                              v     v             |
+                  x_1 = PHI <x_0,  x_3>           |
+                           |                      |
+                           v                      |
+              .------- if (cond) -------.         |
+              |                         |         |
+              |                     [ SKIP ]      |
+              |                         |         |
+              |                     x_2 = ...     |
+              |                         |         |
+              '---- T ---->.<---- F ----'         |
+                           |                      |
+                           v                      |
+                  x_3 = PHI <x_1, x_2>            |
+                           |                      |
+                           '----------------------'
+
+	Suppose that in a certain iteration, execution flow in the above graph
+	goes through the true branch, so the source value that defines x_3 in
+	the false branch (x_2) is skipped.  Then x_3 only comes from x_1, and
+	x_1 in the next iteration is defined by x_3, so x_1 never changes as
+	long as COND keeps choosing the true branch from then on.  */
+
+      while (from != name)
+	{
+	  /* A new value comes from a CONSTANT.  */
+	  if (TREE_CODE (from) != SSA_NAME)
+	    return false;
+
+	  gimple *stmt = SSA_NAME_DEF_STMT (from);
+	  const_basic_block bb = gimple_bb (stmt);
+
+	  /* A new value comes from outside of loop.  */
+	  if (!bb || !flow_bb_inside_loop_p (loop, bb))
+	    return false;
+
+	  from = NULL_TREE;
+
+	  if (gimple_code (stmt) == GIMPLE_PHI)
+	    {
+	      gphi *phi = as_a <gphi *> (stmt);
+
+	      for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+		{
+		  const_edge e = gimple_phi_arg_edge (phi, i);
+
+		  /* Don't consider redefinitions in excluded basic blocks.  */
+		  if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
+		    {
+		      /* More than one source operand can provide a value to
+			 the SSA name, so it is variant.  */
+		      if (from)
+			return false;
+
+		      from = gimple_phi_arg_def (phi, i);
+		    }
+		}
+	    }
+	  else if (gimple_code (stmt) == GIMPLE_ASSIGN)
+	    {
+	      /* For simple value copy, check its rhs instead.  */
+	      if (gimple_assign_ssa_name_copy_p (stmt))
+		from = gimple_assign_rhs1 (stmt);
+	    }
+
+	  /* Any other kind of definition is deemed to introduce a new value
+	     to the SSA name.  */
+	  if (!from)
+	    return false;
+	}
+      return true;
+    }
+
+  /* Value originated from volatile memory load or return of normal (non-
+     const/pure) call should not be treated as constant in each iteration.  */
+  if (gimple_has_side_effects (def))
+    return false;
+
+  /* Check if any memory store may kill memory load at this place.  */
+  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
+    return false;
+
+  /* Check operands of definition statement of the SSA name.  */
+  return stmt_semi_invariant_p (loop, def, skip_head);
+}
+
+/* Check whether STMT is semi-invariant in LOOP, which holds iff all its
+   operands are semi-invariant.  The trace composed of SKIP_HEAD and the basic
+   blocks dominated by it is excluded from the loop.  */
+
+static bool
+stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
+		       const_basic_block skip_head)
+{
+  ssa_op_iter iter;
+  tree use;
+
+  /* Although operand of a statement might be SSA name, CONSTANT or VARDECL,
+     here we only need to check SSA name operands.  This is because check on
+     VARDECL operands, which involve memory loads, must have been done
+     prior to invocation of this function in vuse_semi_invariant_p.  */
+  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
+    {
+      if (!ssa_semi_invariant_p (loop, use, skip_head))
+	return false;
+    }
+
+  return true;
+}
+
+/* When a conditional statement never transfers execution to one of its
+   branches, determine whether we can remove the branch's leading basic block
+   (BRANCH_BB) and those basic blocks dominated by BRANCH_BB.  */
+
+static bool
+branch_removable_p (basic_block branch_bb)
+{
+  if (single_pred_p (branch_bb))
+    return true;
+
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, branch_bb->preds)
+    {
+      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
+	continue;
+
+      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
+	continue;
+
+       /* The branch can be reached from opposite branch, or from some
+	  statement not dominated by the conditional statement.  */
+      return false;
+    }
+
+  return true;
+}
+
+/* Find out which branch of a conditional statement (COND) is invariant in the
+   execution context of LOOP.  That is: once the branch is selected in certain
+   iteration of the loop, any operand that contributes to computation of the
+   conditional statement remains unchanged in all following iterations.  */
+
+static edge
+get_cond_invariant_branch (struct loop *loop, gcond *cond)
+{
+  basic_block cond_bb = gimple_bb (cond);
+  basic_block targ_bb[2];
+  bool invar[2];
+  unsigned invar_checks;
+
+  for (unsigned i = 0; i < 2; i++)
+    {
+      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
+
+      /* If one branch leads to the loop exit, there is no need to split the
+	 loop upon this conditional statement.  Firstly, it is trivial if the
+	 exit branch is semi-invariant, for the statement just breaks the loop.
+	 Secondly, if the opposite branch is semi-invariant, the statement is
+	 really loop-invariant, which is covered by loop unswitching.  */
+      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
+	return NULL;
+    }
+
+  invar_checks = 0;
+
+  for (unsigned i = 0; i < 2; i++)
+    {
+      invar[!i] = false;
+
+      if (!branch_removable_p (targ_bb[i]))
+	continue;
+
+      /* Given a semi-invariant branch, if its opposite branch dominates
+	 loop latch, it and its following trace will only be executed in
+	 final iteration of loop, namely it is not part of repeated body
+	 of the loop.  Similar to the above case that the branch is loop
+	 exit, no need to split loop.  */
+      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
+	continue;
+
+      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
+      invar_checks++;
+    }
+
+  /* Both branches being invariant (handled by loop unswitch) or both being
+     variant is not what we want.  */
+  if (invar[0] ^ !invar[1])
+    return NULL;
+
+  /* Found a real loop-invariant condition, do nothing.  */
+  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
+    return NULL;
+
+  return EDGE_SUCC (cond_bb, (unsigned) invar[1]);
+}
+
+/* Calculate increased code size measured by estimated insn number if applying
+   loop split upon certain branch (BRANCH_EDGE) of a conditional statement.  */
+
+static int
+compute_added_num_insns (struct loop *loop, const_edge branch_edge)
+{
+  basic_block cond_bb = branch_edge->src;
+  unsigned branch = EDGE_SUCC (cond_bb, 1) == branch_edge;
+  basic_block opposite_bb = EDGE_SUCC (cond_bb, !branch)->dest;
+  basic_block *bbs = ((split_info *) loop->aux)->bbs;
+  int num = 0;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    {
+      /* Do not count basic blocks only in opposite branch.  */
+      if (dominated_by_p (CDI_DOMINATORS, bbs[i], opposite_bb))
+	continue;
+
+      num += estimate_num_insns_seq (bb_seq (bbs[i]), &eni_size_weights);
+    }
+
+  /* It is unnecessary to evaluate expression of the conditional statement
+     in new loop that contains only invariant branch.  This expression should
+     be constant value (either true or false).  Exclude code size of insns
+     that contribute to computation of the expression.  */
+
+  auto_vec<gimple *> worklist;
+  hash_set<gimple *> removed;
+  gimple *stmt = last_stmt (cond_bb);
+
+  worklist.safe_push (stmt);
+  removed.add (stmt);
+  num -= estimate_num_insns (stmt, &eni_size_weights);
+
+  do
+    {
+      ssa_op_iter opnd_iter;
+      use_operand_p opnd_p;
+
+      stmt = worklist.pop ();
+      FOR_EACH_PHI_OR_STMT_USE (opnd_p, stmt, opnd_iter, SSA_OP_USE)
+	{
+	  tree opnd = USE_FROM_PTR (opnd_p);
+
+	  if (TREE_CODE (opnd) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (opnd))
+	    continue;
+
+	  gimple *opnd_stmt = SSA_NAME_DEF_STMT (opnd);
+	  use_operand_p use_p;
+	  imm_use_iterator use_iter;
+
+	  if (removed.contains (opnd_stmt)
+	      || !flow_bb_inside_loop_p (loop, gimple_bb (opnd_stmt)))
+	    continue;
+
+	  FOR_EACH_IMM_USE_FAST (use_p, use_iter, opnd)
+	    {
+	      gimple *use_stmt = USE_STMT (use_p);
+
+	      if (!is_gimple_debug (use_stmt) && !removed.contains (use_stmt))
+		{
+		  opnd_stmt = NULL;
+		  break;
+		}
+	    }
+
+	  if (opnd_stmt)
+	    {
+	      worklist.safe_push (opnd_stmt);
+	      removed.add (opnd_stmt);
+	      num -= estimate_num_insns (opnd_stmt, &eni_size_weights);
+	    }
+	}
+    } while (!worklist.is_empty ());
+
+  gcc_assert (num >= 0);
+  return num;
+}
+
+/* Find out loop-invariant branch of a conditional statement (COND) if it has
+   one, and check whether it is eligible and profitable to perform loop split
+   upon this branch in LOOP.  */
+
+static edge
+get_cond_branch_to_split_loop (struct loop *loop, gcond *cond)
+{
+  edge invar_branch = get_cond_invariant_branch (loop, cond);
+
+  if (!invar_branch)
+    return NULL;
+
+  profile_probability prob = invar_branch->probability;
+
+  /* When accurate profile information is available, and execution
+     frequency of the branch is too low, just let it go.  */
+  if (prob.reliable_p ())
+    {
+      int thres = PARAM_VALUE (PARAM_MIN_LOOP_COND_SPLIT_PROB);
+
+      if (prob < profile_probability::always ().apply_scale (thres, 100))
+	return NULL;
+    }
+
+  /* Add a threshold for increased code size to disable loop split.  */
+  if (compute_added_num_insns (loop, invar_branch)
+      > PARAM_VALUE (PARAM_MAX_LOOP_COND_SPLIT_INSNS))
+    return NULL;
+
+  return invar_branch;
+}
+
+/* Given a loop (LOOP1) with a loop-invariant branch (INVAR_BRANCH) of some
+   conditional statement, perform loop split transformation illustrated
+   as the following graph.
+
+               .-------T------ if (true) ------F------.
+               |                    .---------------. |
+               |                    |               | |
+               v                    |               v v
+          pre-header                |            pre-header
+               | .------------.     |                 | .------------.
+               | |            |     |                 | |            |
+               | v            |     |                 | v            |
+             header           |     |               header           |
+               |              |     |                 |              |
+       [ bool r = cond; ]     |     |                 |              |
+               |              |     |                 |              |
+      .---- if (r) -----.     |     |        .--- if (true) ---.     |
+      |                 |     |     |        |                 |     |
+  invariant             |     |     |    invariant             |     |
+      |                 |     |     |        |                 |     |
+      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
+               |              |    /                  |              |
+             stmts            |   /                 stmts            |
+               |              |  /                    |              |
+              / \             | /                    / \             |
+     .-------*   *       [ if (!r) ]        .-------*   *            |
+     |           |            |             |           |            |
+     |         latch          |             |         latch          |
+     |           |            |             |           |            |
+     |           '------------'             |           '------------'
+     '------------------------. .-----------'
+             loop1            | |                   loop2
+                              v v
+                             exits
+
+   In the graph, loop1 represents the part derived from the original loop, and
+   loop2 is duplicated using loop_version (), which corresponds to the part
+   of the original loop being split out.  In loop1, a new bool temporary (r)
+   is introduced to keep the value of the condition result.  On the original
+   latch edge of loop1, we insert a new conditional statement whose value comes
+   from the temporary (r); one of its branches goes back to the loop1 header
+   as a latch edge, and the other branch goes to the loop2 pre-header as an
+   entry edge.  In loop2, we abandon the variant branch of the conditional
+   statement candidate by setting a constant bool condition, based on which
+   branch is semi-invariant.  */
+
+static bool
+do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
+{
+  basic_block cond_bb = invar_branch->src;
+  bool true_invar = !!(invar_branch->flags & EDGE_TRUE_VALUE);
+  gcond *cond = as_a <gcond *> (last_stmt (cond_bb));
+
+  gcc_assert (cond_bb->loop_father == loop1);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
+	      current_function_name (), loop1->num,
+	      true_invar ? "T" : "F", cond_bb->index);
+     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
+   }
+
+  initialize_original_copy_tables ();
+
+  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
+				     profile_probability::always (),
+				     profile_probability::never (),
+				     profile_probability::always (),
+				     profile_probability::always (),
+				     true);
+  if (!loop2)
+    {
+      free_original_copy_tables ();
+      return false;
+    }
+
+  /* Generate a bool type temporary to hold result of the condition.  */
+  tree tmp = make_ssa_name (boolean_type_node);
+  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
+  gimple *stmt = gimple_build_assign (tmp,
+				      gimple_cond_code (cond),
+				      gimple_cond_lhs (cond),
+				      gimple_cond_rhs (cond));
+
+  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
+  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
+  update_stmt (cond);
+
+  basic_block cond_bb_copy = get_bb_copy (cond_bb);
+  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
+
+  /* Replace the condition in loop2 with a bool constant to let PassManager
+     remove the variant branch after current pass completes.  */
+  if (true_invar)
+    gimple_cond_make_true (cond_copy);
+  else
+    gimple_cond_make_false (cond_copy);
+
+  update_stmt (cond_copy);
+
+  /* Insert a new conditional statement on latch edge of loop1.  This
+     statement acts as a switch to transfer execution from loop1 to loop2,
+     when loop1 enters into invariant state.  */
+  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
+  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
+  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
+					  NULL_TREE, NULL_TREE);
+
+  gsi = gsi_last_bb (break_bb);
+  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
+
+  edge to_loop1 = single_succ_edge (break_bb);
+  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
+
+  to_loop1->flags &= ~EDGE_FALLTHRU;
+  to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
+  to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
+
+  update_ssa (TODO_update_ssa);
+
+  /* Due to introduction of a control flow edge from loop1 latch to loop2
+     pre-header, we should update PHIs in loop2 to reflect this connection
+     between loop1 and loop2.  */
+  connect_loop_phis (loop1, loop2, to_loop2);
+
+  free_original_copy_tables ();
+
+  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
+
+  return true;
+}
+
+/* Traverse all conditional statements in LOOP, to find out a good candidate
+   upon which we can do loop split.  */
+
+static bool
+split_loop_on_cond (struct loop *loop)
+{
+  split_info *info = new split_info ();
+  basic_block *bbs = info->bbs = get_loop_body (loop);
+  bool do_split = false;
+
+  /* Allocate an area to keep temporary info, and associate its address
+     with loop aux field.  */
+  loop->aux = info;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    {
+      basic_block bb = bbs[i];
+
+      /* Only consider conditional statements that are executed at most once
+	 in each iteration of the loop.  So skip statements in inner loops.  */
+      if ((bb->loop_father != loop) || (bb->flags & BB_IRREDUCIBLE_LOOP))
+	continue;
+
+      /* Strictly speaking, this check is not required.  With it, we ensure the
+	 conditional statement is executed in every iteration.  */
+      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+	continue;
+
+      gimple *last = last_stmt (bb);
+
+      if (!last || gimple_code (last) != GIMPLE_COND)
+	continue;
+
+      gcond *cond = as_a <gcond *> (last);
+      edge branch_edge = get_cond_branch_to_split_loop (loop, cond);
+
+      if (branch_edge)
+	{
+	  do_split_loop_on_cond (loop, branch_edge);
+	  do_split = true;
+	  break;
+	}
+    }
+
+  delete info;
+  loop->aux = NULL;
+
+  return do_split;
+}
+
 /* Main entry point.  Perform loop splitting on all suitable loops.  */
 
 static unsigned int
@@ -627,7 +1369,6 @@ tree_ssa_split_loops (void)
   /* Go through all loops starting from innermost.  */
   FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
     {
-      class tree_niter_desc niter;
       if (loop->aux)
 	{
 	  /* If any of our inner loops was split, don't split us,
@@ -636,29 +1377,14 @@ tree_ssa_split_loops (void)
 	  continue;
 	}
 
-      if (single_exit (loop)
-	  /* ??? We could handle non-empty latches when we split
-	     the latch edge (not the exit edge), and put the new
-	     exit condition in the new block.  OTOH this executes some
-	     code unconditionally that might have been skipped by the
-	     original exit before.  */
-	  && empty_block_p (loop->latch)
-	  && !optimize_loop_for_size_p (loop)
-	  && easy_exit_values (loop)
-	  && number_of_iterations_exit (loop, single_exit (loop), &niter,
-					false, true)
-	  && niter.cmp != ERROR_MARK
-	  /* We can't yet handle loops controlled by a != predicate.  */
-	  && niter.cmp != NE_EXPR
-	  && can_duplicate_loop_p (loop))
+      if (optimize_loop_for_size_p (loop))
+        continue;
+
+      if (split_loop (loop) || split_loop_on_cond (loop))
 	{
-	  if (split_loop (loop, &niter))
-	    {
-	      /* Mark our containing loop as having had some split inner
-	         loops.  */
-	      loop_outer (loop)->aux = loop;
-	      changed = true;
-	    }
+	  /* Mark our containing loop as having had some split inner loops.  */
+	  loop_outer (loop)->aux = loop;
+	  changed = true;
 	}
     }
 
-- 
2.17.1



* Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-10-23  9:37                             ` Feng Xue OS
@ 2019-10-23 10:32                               ` Richard Biener
  2019-10-25  5:20                                 ` Feng Xue OS
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Biener @ 2019-10-23 10:32 UTC (permalink / raw)
  To: Feng Xue OS
  Cc: Michael Matz, Philipp Tomsich, gcc-patches,
	Christoph Müllner, erick.ochoa

On Wed, Oct 23, 2019 at 11:11 AM Feng Xue OS
<fxue@os.amperecomputing.com> wrote:
>
> Patch attached.

+      /* For PHI node that is not in loop header, its source operands should
+        be defined inside the loop, which are seen as loop variant.  */
+      if (def_bb != loop->header || !skip_head)
+       return false;

so if we have

 for (;;)
  {
     if (x)
       a = ..;
     else
       a = ...;
     if (cond-to-split-on dependent on a)
...
  }

the above is too restrictive in case 'x' is semi-invariant as well, correct?

+         /* A new value comes from outside of loop.  */
+         if (!bb || !flow_bb_inside_loop_p (loop, bb))
+           return false;

but that means starting from the second iteration the value is invariant.

+                 /* Don't consider redefinitions in excluded basic blocks.  */
+                 if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
+                   {
+                     /* There are more than one source operands that can
+                        provide value to the SSA name, it is variant.  */
+                     if (from)
+                       return false;

they might be the same though, for PHIs with > 2 arguments.
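
To illustrate the point, here is a minimal standalone sketch (not GCC code) of a check that accepts several non-excluded PHI arguments as long as they all carry the same value.  ToyPhiArg and unique_incoming_value are hypothetical names, and plain integers stand in for SSA names:

```cpp
#include <cassert>
#include <vector>

// Toy PHI argument (an assumption for illustration, not GCC's gphi):
// `value` is an id standing in for an SSA name, `excluded` marks an
// argument whose incoming edge lies in the skipped branch.
struct ToyPhiArg {
  int value;
  bool excluded;
};

// Return the single value that can reach the PHI result through
// non-excluded edges, or -1 if there is none or more than one distinct value.
int unique_incoming_value(const std::vector<ToyPhiArg> &args) {
  int from = -1;
  for (const ToyPhiArg &arg : args) {
    if (arg.excluded)
      continue;
    // A second argument is only a problem if it differs from the first.
    if (from != -1 && from != arg.value)
      return -1;
    from = arg.value;
  }
  return from;
}
```

With this shape, a PHI with three arguments where two non-excluded edges carry the same name would still be followed instead of being rejected as variant.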

In the cycle handling you are not recursing via stmt_semi_invariant_p
but only handle SSA name copies - any particular reason for that?

+static bool
+branch_removable_p (basic_block branch_bb)
+{
+  if (single_pred_p (branch_bb))
+    return true;

I'm not sure what this function tests - at least the single_pred_p check
looks odd to me given the dominator checks later.  The single predecessor
could simply be a forwarder.  I wonder if you are looking for branches forming
an irreducible loop?  I think you can then check EDGE_IRREDUCIBLE_LOOP
or BB_IRREDUCIBLE_LOOP on the condition block (btw, I don't see
testcases covering the apparent special-cases in the patch - referring to
existing ones via a comment often helps understanding the code).

+
+  return EDGE_SUCC (cond_bb, (unsigned) invar[1]);
+}

magic ensures that invar[1] is always the invariant edge?  Oh, it's a bool.
Ick.  I wonder if logic with int invariant_edge = -1; and the loop setting
it to either 0 or 1 would be easier to follow...
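
A minimal standalone sketch of that shape, outside of GCC.  pick_invariant_edge is a hypothetical helper, and returning -1 when both branches are invariant is an assumption made for this sketch (such a condition would be fully loop-invariant):

```cpp
#include <cassert>

// semi_invariant[i] says whether successor edge i of the condition block
// was proved semi-invariant.  Returns the index (0 or 1) of the single
// invariant edge, or -1 if there is no such edge or both are invariant.
int pick_invariant_edge(const bool semi_invariant[2]) {
  int invariant_edge = -1;
  for (int i = 0; i < 2; ++i) {
    if (!semi_invariant[i])
      continue;
    // Assumption: both branches invariant is not a split candidate.
    if (invariant_edge != -1)
      return -1;
    invariant_edge = i;
  }
  return invariant_edge;
}
```

The caller would then test for -1 and index the successor edges with the result, without casting a bool to an edge index.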

Note your stmt_semi_invariant_p check is exponential for a condition
like

   _1 = 1;
   _2 = _1 + _1;
   _3 = _2 + _2;
   if (_3 != param_4(D))

because you don't track ops you already proved semi-invariant.  We've
run into such situation repeatedly in SCEV analysis so I doubt it can be
disregarded as irrelevant in practice.  A worklist approach could then
also get rid of the recursion.  You are already computing the stmts
forming the condition in compute_added_num_insns so another option
is to re-use that.
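
To make the blow-up concrete, here is a small standalone model (not GCC code) of such a chain, walked once by plain recursion and once with a worklist plus visited set.  ToyStmt, make_chain, naive_check and worklist_check are hypothetical names for this sketch only:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy statement: _k = _op0 + _op1.  Operand index -1 stands for a
// constant leaf.  `variant` marks a statement known not to be invariant.
struct ToyStmt {
  int op0;
  int op1;
  bool variant;
};

// Build _0 = 1 + 1; _1 = _0 + _0; ...; _{n-1} = _{n-2} + _{n-2}.
std::vector<ToyStmt> make_chain(int n) {
  std::vector<ToyStmt> stmts;
  for (int k = 0; k < n; ++k)
    stmts.push_back({k - 1, k - 1, false});
  return stmts;
}

// Plain recursion: each use re-walks the whole operand tree, so the
// number of visits doubles with every level of the chain.
bool naive_check(const std::vector<ToyStmt> &stmts, int idx, long &visits) {
  if (idx < 0)
    return true;
  ++visits;
  if (stmts[idx].variant)
    return false;
  return naive_check(stmts, stmts[idx].op0, visits)
         && naive_check(stmts, stmts[idx].op1, visits);
}

// Worklist with a visited set: each statement is examined at most once,
// and no recursion is needed.
bool worklist_check(const std::vector<ToyStmt> &stmts, int root, long &visits) {
  std::vector<bool> seen(stmts.size(), false);
  std::vector<int> worklist;
  if (root >= 0) {
    seen[root] = true;
    worklist.push_back(root);
  }
  while (!worklist.empty()) {
    int idx = worklist.back();
    worklist.pop_back();
    ++visits;
    if (stmts[idx].variant)
      return false;
    const int ops[2] = {stmts[idx].op0, stmts[idx].op1};
    for (int op : ops)
      if (op >= 0 && !seen[op]) {
        seen[op] = true;
        worklist.push_back(op);
      }
  }
  return true;
}
```

For a chain of 10 statements the recursive walk visits 1023 nodes while the worklist walk visits 10.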

Btw, I wonder if we can simply re-use PARAM_MAX_PEELED_INSNS
instead of adding yet another param (it also happens to have the same
size).  Because we are "peeling" the loop.

+  edge invar_branch = get_cond_invariant_branch (loop, cond);
+
+  if (!invar_branch)
+    return NULL;

extra vertical space is unwanted in such cases.

+  if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
+             current_function_name (), loop1->num,
+             true_invar ? "T" : "F", cond_bb->index);
+     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
+   }

can you please use sth like

  if (dump_enabled_p ())
    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
                             cond, "loop split on semi-invariant condition");

so -fopt-info-loop will show it?

+  /* Generate a bool type temporary to hold result of the condition.  */
+  tree tmp = make_ssa_name (boolean_type_node);
+  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
+  gimple *stmt = gimple_build_assign (tmp,
+                                     gimple_cond_code (cond),
+                                     gimple_cond_lhs (cond),
+                                     gimple_cond_rhs (cond));

shorter is

   gimple_seq stmts = NULL;
   tree tmp = gimple_build (&stmts, gimple_cond_code (cond),
                            boolean_type_node,
                            gimple_cond_lhs (cond), gimple_cond_rhs (cond));
   gsi_insert_seq_before (&gsi, stmts, GSI_NEW_STMT);

+  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
+  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);

but I wonder what's the point here to move the condition computation to
a temporary?  Why not just build the original condition again for break_cond?

in split_loop_on_cond you'll find the first semi-invariant condition
to split on,
but we'll not visit the split loop again (also for original splitting I guess).
Don't we eventually want to recurse on that?

Otherwise the patch looks reasonable.  Sorry for the many bits above and the
late real review from me...

Thanks,
Richard.


> Feng
>
> ________________________________________
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Wednesday, October 23, 2019 5:04 PM
> To: Feng Xue OS
> Cc: Michael Matz; Philipp Tomsich; gcc-patches@gcc.gnu.org; Christoph Müllner; erick.ochoa@theobroma-systems.com
> Subject: Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)
>
> On Wed, Oct 23, 2019 at 5:36 AM Feng Xue OS <fxue@os.amperecomputing.com> wrote:
> >
> > Michael,
> >
> > > I've only noticed a couple typos, and one minor remark.
> > Typos corrected.
> >
> > > I just wonder why you duplicated these three loops instead of integrating
> > > the real body into the existing LI_FROM_INNERMOST loop.  I would have
> > > expected your "if (!optimize_loop_for_size_p && split_loop_on_cond)" block
> > > to simply be the else block of the existing
> > > "if (... conditions for normal loop splitting ...)" block.
> > Adjusted to do two kinds of loop-split in same LI_FROM_INNERMOST loop.
> >
> > > From my perspective it's okay, but you still need the okay of a proper reviewer,
> > > for which you might want to state the testing/regression state of this
> > > patch relative to trunk.
> >
> > Richard,
> >
> >   Is it ok to commit this patch? Bootstrap and regression test passed. And for
> > performance, we can get about 7% improvement on spec2017 omnetpp with this
> > patch.
>
> Can you please provide the corresponding ChangeLog entries as well and
> attach the patch?  It seems to be garbled by some encoding.
>
> Thanks,
> Richard.
>
> > Thanks,
> > Feng
> >
> > ---
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 1407d019d14..d41e5aa0215 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -11481,6 +11481,19 @@ The maximum number of branches unswitched in a single loop.
> >  @item lim-expensive
> >  The minimum cost of an expensive expression in the loop invariant motion.
> >
> > +@item max-loop-cond-split-insns
> > +A conditional statement in a loop is semi-invariant if, once one of its
> > +branches is selected in some iteration, every operand that contributes to the
> > +conditional expression remains unchanged in all following iterations.  We can
> > +do a kind of loop split transformation upon such a statement.
> > +@option{max-loop-cond-split-insns} controls the maximum number of insns to be
> > +added due to loop split on a semi-invariant conditional statement.
> > +
> > +@item min-loop-cond-split-prob
> > +When FDO profile information is available, @option{min-loop-cond-split-prob}
> > +specifies minimum threshold for probability of semi-invariant condition
> > +statement to trigger loop split.
> > +
> >  @item iv-consider-all-candidates-bound
> >  Bound on number of candidates for induction variables, below which
> >  all candidates are considered for each use in induction variable
> > diff --git a/gcc/params.def b/gcc/params.def
> > index 322c37f8b96..73b59f7465e 100644
> > --- a/gcc/params.def
> > +++ b/gcc/params.def
> > @@ -415,6 +415,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
> >         "The maximum number of unswitchings in a single loop.",
> >         3, 0, 0)
> >
> > +/* The maximum number of increased insns due to loop split on semi-invariant
> > +   condition statement.  */
> > +DEFPARAM(PARAM_MAX_LOOP_COND_SPLIT_INSNS,
> > +       "max-loop-cond-split-insns",
> > +       "The maximum number of insns to be added due to loop split on "
> > +       "semi-invariant condition statement.",
> > +       100, 0, 0)
> > +
> > +DEFPARAM(PARAM_MIN_LOOP_COND_SPLIT_PROB,
> > +       "min-loop-cond-split-prob",
> > +       "The minimum threshold for probability of semi-invariant condition "
> > +       "statement to trigger loop split.",
> > +       30, 0, 100)
> > +
> >  /* The maximum number of insns in loop header duplicated by the copy loop
> >     headers pass.  */
> >  DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
> >
> > diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> > new file mode 100644
> > index 00000000000..51f9da22fc7
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> > @@ -0,0 +1,33 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> > +
> > +#include <string>
> > +#include <map>
> > +
> > +using namespace std;
> > +
> > +class  A
> > +{
> > +public:
> > +  bool empty;
> > +  void set (string s);
> > +};
> > +
> > +class  B
> > +{
> > +  map<int, string> m;
> > +  void f ();
> > +};
> > +
> > +extern A *ga;
> > +
> > +void B::f ()
> > +{
> > +  for (map<int, string>::iterator iter = m.begin (); iter != m.end (); ++iter)
> > +    {
> > +      if (ga->empty)
> > +        ga->set (iter->second);
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> > new file mode 100644
> > index 00000000000..bbd522d6bcd
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> > +
> > +__attribute__((pure)) __attribute__((noinline)) int inc (int i)
> > +{
> > +  return i + 1;
> > +}
> > +
> > +extern int do_something (void);
> > +extern int b;
> > +
> > +void test(int n)
> > +{
> > +  int i;
> > +
> > +  for (i = 0; i < n; i = inc (i))
> > +    {
> > +      if (b)
> > +        b = do_something();
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
> > diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> > index f5f083384bc..5cffd4bb508 100644
> > --- a/gcc/tree-ssa-loop-split.c
> > +++ b/gcc/tree-ssa-loop-split.c
> > @@ -32,7 +32,10 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "tree-ssa-loop.h"
> >  #include "tree-ssa-loop-manip.h"
> >  #include "tree-into-ssa.h"
> > +#include "tree-inline.h"
> > +#include "tree-cfgcleanup.h"
> >  #include "cfgloop.h"
> > +#include "params.h"
> >  #include "tree-scalar-evolution.h"
> >  #include "gimple-iterator.h"
> >  #include "gimple-pretty-print.h"
> > @@ -40,7 +43,9 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "gimple-fold.h"
> >  #include "gimplify-me.h"
> >
> > -/* This file implements loop splitting, i.e. transformation of loops like
> > +/* This file implements two kinds of loop splitting.
> > +
> > +   One transformation of loops like:
> >
> >     for (i = 0; i < 100; i++)
> >       {
> > @@ -487,8 +492,9 @@ compute_new_first_bound (gimple_seq *stmts, class tree_niter_desc *niter,
> >     single exit of LOOP.  */
> >
> >  static bool
> > -split_loop (class loop *loop1, class tree_niter_desc *niter)
> > +split_loop (class loop *loop1)
> >  {
> > +  class tree_niter_desc niter;
> >    basic_block *bbs;
> >    unsigned i;
> >    bool changed = false;
> > @@ -496,8 +502,28 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
> >    tree border = NULL_TREE;
> >    affine_iv iv;
> >
> > +  if (!single_exit (loop1)
> > +      /* ??? We could handle non-empty latches when we split the latch edge
> > +         (not the exit edge), and put the new exit condition in the new block.
> > +        OTOH this executes some code unconditionally that might have been
> > +        skipped by the original exit before.  */
> > +      || !empty_block_p (loop1->latch)
> > +      || !easy_exit_values (loop1)
> > +      || !number_of_iterations_exit (loop1, single_exit (loop1), &niter,
> > +                                    false, true)
> > +      || niter.cmp == ERROR_MARK
> > +      /* We can't yet handle loops controlled by a != predicate.  */
> > +      || niter.cmp == NE_EXPR)
> > +    return false;
> > +
> >    bbs = get_loop_body (loop1);
> >
> > +  if (!can_copy_bbs_p (bbs, loop1->num_nodes))
> > +    {
> > +      free (bbs);
> > +      return false;
> > +    }
> > +
> >    /* Find a splitting opportunity.  */
> >    for (i = 0; i < loop1->num_nodes; i++)
> >      if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
> > @@ -505,8 +531,8 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
> >         /* Handling opposite steps is not implemented yet.  Neither
> >            is handling different step sizes.  */
> >         if ((tree_int_cst_sign_bit (iv.step)
> > -            != tree_int_cst_sign_bit (niter->control.step))
> > -           || !tree_int_cst_equal (iv.step, niter->control.step))
> > +            != tree_int_cst_sign_bit (niter.control.step))
> > +           || !tree_int_cst_equal (iv.step, niter.control.step))
> >           continue;
> >
> >         /* Find a loop PHI node that defines guard_iv directly,
> > @@ -575,7 +601,7 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
> >            Compute the new bound for the guarding IV and patch the
> >            loop exit to use it instead of original IV and bound.  */
> >         gimple_seq stmts = NULL;
> > -       tree newend = compute_new_first_bound (&stmts, niter, border,
> > +       tree newend = compute_new_first_bound (&stmts, &niter, border,
> >                                                guard_code, guard_init);
> >         if (stmts)
> >           gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
> > @@ -612,6 +638,722 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
> >    return changed;
> >  }
> >
> > +/* Another transformation of loops like:
> > +
> > +   for (i = INIT (); CHECK (i); i = NEXT ())
> > +     {
> > +       if (expr (a_1, a_2, ..., a_n))  // expr is pure
> > +         a_j = ...;  // change at least one a_j
> > +       else
> > +         S;          // not change any a_j
> > +     }
> > +
> > +   into:
> > +
> > +   for (i = INIT (); CHECK (i); i = NEXT ())
> > +     {
> > +       if (expr (a_1, a_2, ..., a_n))
> > +         a_j = ...;
> > +       else
> > +         {
> > +           S;
> > +           i = NEXT ();
> > +           break;
> > +         }
> > +     }
> > +
> > +   for (; CHECK (i); i = NEXT ())
> > +     {
> > +       S;
> > +     }
> > +
> > +   */
> > +
> > +/* Data structure to hold temporary information during loop split upon
> > +   semi-invariant conditional statement.  */
> > +class split_info {
> > +public:
> > +  /* Array of all basic blocks in a loop, returned by get_loop_body().  */
> > +  basic_block *bbs;
> > +
> > +  /* All memory store/clobber statements in a loop.  */
> > +  auto_vec<gimple *> memory_stores;
> > +
> > +  /* Whether the above memory stores vector still needs to be filled.  */
> > +  int need_init;
> > +
> > +  split_info () : bbs (NULL),  need_init (true) { }
> > +
> > +  ~split_info ()
> > +    {
> > +      if (bbs)
> > +       free (bbs);
> > +    }
> > +};
> > +
> > +/* Find all statements with memory-write effect in LOOP, including memory
> > +   store and non-pure function call, and keep those in a vector.  This work
> > +   is only done one time, for the vector should be constant during analysis
> > +   stage of semi-invariant condition.  */
> > +
> > +static void
> > +find_vdef_in_loop (struct loop *loop)
> > +{
> > +  split_info *info = (split_info *) loop->aux;
> > +  gphi *vphi = get_virtual_phi (loop->header);
> > +
> > +  /* Indicate memory store vector has been filled.  */
> > +  info->need_init = false;
> > +
> > +  /* If loop contains memory operation, there must be a virtual PHI node in
> > +     loop header basic block.  */
> > +  if (vphi == NULL)
> > +    return;
> > +
> > +  /* All virtual SSA names inside the loop are connected to be a cyclic
> > +     graph via virtual PHI nodes.  The virtual PHI node in loop header just
> > +     links the first and the last virtual SSA names, by using the last as
> > +     PHI operand to define the first.  */
> > +  const edge latch = loop_latch_edge (loop);
> > +  const tree first = gimple_phi_result (vphi);
> > +  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
> > +
> > +  /* The virtual SSA cyclic graph might consist of only one SSA name, which
> > +     is defined by itself.
> > +
> > +       .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
> > +
> > +     This means the loop contains only memory loads, so we can skip it.  */
> > +  if (first == last)
> > +    return;
> > +
> > +  auto_vec<gimple *> other_stores;
> > +  auto_vec<tree> worklist;
> > +  auto_bitmap visited;
> > +
> > +  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
> > +  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
> > +  worklist.safe_push (last);
> > +
> > +  do
> > +    {
> > +      tree vuse = worklist.pop ();
> > +      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
> > +
> > +      /* We mark the first and last SSA names as visited at the beginning,
> > +        and reversely start the process from the last SSA name towards the
> > +        first, which ensures that this do-while will not touch SSA names
> > +        defined outside of the loop.  */
> > +      gcc_assert (gimple_bb (stmt)
> > +                 && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
> > +
> > +      if (gimple_code (stmt) == GIMPLE_PHI)
> > +       {
> > +         gphi *phi = as_a <gphi *> (stmt);
> > +
> > +         for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> > +           {
> > +             tree arg = gimple_phi_arg_def (stmt, i);
> > +
> > +             if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
> > +               worklist.safe_push (arg);
> > +           }
> > +       }
> > +      else
> > +       {
> > +         tree prev = gimple_vuse (stmt);
> > +
> > +         /* Non-pure call statement is conservatively assumed to impact all
> > +            memory locations.  So place call statements ahead of other memory
> > +            stores in the vector with the idea of using them as shortcut
> > +            terminators to memory alias analysis.  */
> > +         if (gimple_code (stmt) == GIMPLE_CALL)
> > +           info->memory_stores.safe_push (stmt);
> > +         else
> > +           other_stores.safe_push (stmt);
> > +
> > +         if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
> > +           worklist.safe_push (prev);
> > +       }
> > +    } while (!worklist.is_empty ());
> > +
> > +    info->memory_stores.safe_splice (other_stores);
> > +}
> > +
> > +
> > +/* Given STMT, memory load or pure call statement, check whether it is impacted
> > +   by some memory store in LOOP, excluding trace starting from SKIP_HEAD (the
> > +   trace is composed of SKIP_HEAD and those basic blocks dominated by it, always
> > +   corresponds to one branch of a conditional statement).  If SKIP_HEAD is
> > +   NULL, all basic blocks of LOOP are checked.  */
> > +
> > +static bool
> > +vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
> > +                      const_basic_block skip_head)
> > +{
> > +  split_info *info = (split_info *) loop->aux;
> > +
> > +  /* Collect memory store/clobber statements if that has not been done yet.  */
> > +  if (info->need_init)
> > +    find_vdef_in_loop (loop);
> > +
> > +  tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
> > +  ao_ref ref;
> > +  gimple *store;
> > +  unsigned i;
> > +
> > +  ao_ref_init (&ref, rhs);
> > +
> > +  FOR_EACH_VEC_ELT (info->memory_stores, i, store)
> > +    {
> > +      /* Skip basic blocks dominated by SKIP_HEAD, if non-NULL.  */
> > +      if (skip_head
> > +         && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
> > +       continue;
> > +
> > +      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
> > +       return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Forward declaration.  */
> > +
> > +static bool
> > +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> > +                      const_basic_block skip_head);
> > +
> > +/* Suppose one condition branch, led by SKIP_HEAD, is no longer executed from
> > +   a certain iteration of LOOP onwards.  Check whether an SSA name (NAME)
> > +   remains unchanged in the next iteration.  We call this characteristic
> > +   semi-invariantness.  SKIP_HEAD might be NULL; if so, nothing is excluded and
> > +   all basic blocks and control flows in the loop are considered.  If non-NULL,
> > +   the SSA name to check is supposed to be defined before SKIP_HEAD.  */
> > +
> > +static bool
> > +ssa_semi_invariant_p (struct loop *loop, const tree name,
> > +                     const_basic_block skip_head)
> > +{
> > +  gimple *def = SSA_NAME_DEF_STMT (name);
> > +  const_basic_block def_bb = gimple_bb (def);
> > +
> > +  /* An SSA name defined outside a loop is definitely semi-invariant.  */
> > +  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
> > +    return true;
> > +
> > +  if (gimple_code (def) == GIMPLE_PHI)
> > +    {
> > +      /* For PHI node that is not in loop header, its source operands should
> > +        be defined inside the loop, which are seen as loop variant.  */
> > +      if (def_bb != loop->header || !skip_head)
> > +       return false;
> > +
> > +      const_edge latch = loop_latch_edge (loop);
> > +      tree from = PHI_ARG_DEF_FROM_EDGE (as_a <gphi *> (def), latch);
> > +
> > +      /* A PHI node in loop header contains two source operands, one is
> > +        initial value, the other is the copy of last iteration through loop
> > +        latch, we call it latch value.  From the PHI node to definition
> > +        of latch value, if excluding branch trace from SKIP_HEAD, there
> > +        is no definition of other version of same variable, SSA name defined
> > +        by the PHI node is semi-invariant.
> > +
> > +                         loop entry
> > +                              |     .--- latch ---.
> > +                              |     |             |
> > +                              v     v             |
> > +                  x_1 = PHI <x_0,  x_3>           |
> > +                           |                      |
> > +                           v                      |
> > +              .------- if (cond) -------.         |
> > +              |                         |         |
> > +              |                     [ SKIP ]      |
> > +              |                         |         |
> > +              |                     x_2 = ...     |
> > +              |                         |         |
> > +              '---- T ---->.<---- F ----'         |
> > +                           |                      |
> > +                           v                      |
> > +                  x_3 = PHI <x_1, x_2>            |
> > +                           |                      |
> > +                           '----------------------'
> > +
> > +       Suppose in certain iteration, execution flow in above graph goes
> > +       through true branch, which means that one source value to define
> > +       x_3 in false branch (x_2) is skipped, x_3 only comes from x_1, and
> > +       x_1 in next iterations is defined by x_3, we know that x_1 will
> > +       never change if COND always chooses true branch from then on.  */
> > +
> > +      while (from != name)
> > +       {
> > +         /* A new value comes from a CONSTANT.  */
> > +         if (TREE_CODE (from) != SSA_NAME)
> > +           return false;
> > +
> > +         gimple *stmt = SSA_NAME_DEF_STMT (from);
> > +         const_basic_block bb = gimple_bb (stmt);
> > +
> > +         /* A new value comes from outside of loop.  */
> > +         if (!bb || !flow_bb_inside_loop_p (loop, bb))
> > +           return false;
> > +
> > +         from = NULL_TREE;
> > +
> > +         if (gimple_code (stmt) == GIMPLE_PHI)
> > +           {
> > +             gphi *phi = as_a <gphi *> (stmt);
> > +
> > +             for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> > +               {
> > +                 const_edge e = gimple_phi_arg_edge (phi, i);
> > +
> > +                 /* Don't consider redefinitions in excluded basic blocks.  */
> > +                 if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
> > +                   {
> > +                     /* More than one source operand can provide a
> > +                        value to the SSA name, so it is variant.  */
> > +                     if (from)
> > +                       return false;
> > +
> > +                     from = gimple_phi_arg_def (phi, i);
> > +                   }
> > +               }
> > +           }
> > +         else if (gimple_code (stmt) == GIMPLE_ASSIGN)
> > +           {
> > +             /* For simple value copy, check its rhs instead.  */
> > +             if (gimple_assign_ssa_name_copy_p (stmt))
> > +               from = gimple_assign_rhs1 (stmt);
> > +           }
> > +
> > +         /* Any other kind of definition is deemed to introduce a new value
> > +            to the SSA name.  */
> > +         if (!from)
> > +           return false;
> > +       }
> > +       return true;
> > +    }
> > +
> > +  /* Value originated from volatile memory load or return of normal (non-
> > +     const/pure) call should not be treated as constant in each iteration.  */
> > +  if (gimple_has_side_effects (def))
> > +    return false;
> > +
> > +  /* Check if any memory store may kill memory load at this place.  */
> > +  if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head))
> > +    return false;
> > +
> > +  /* Check operands of definition statement of the SSA name.  */
> > +  return stmt_semi_invariant_p (loop, def, skip_head);
> > +}
> > +
> > +/* Check whether STMT is semi-invariant in LOOP, iff all its operands are
> > +   semi-invariant.  Trace composed of basic block SKIP_HEAD and basic blocks
> > +   dominated by it are excluded from the loop.  */
> > +
> > +static bool
> > +stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
> > +                      const_basic_block skip_head)
> > +{
> > +  ssa_op_iter iter;
> > +  tree use;
> > +
> > +  /* Although operand of a statement might be SSA name, CONSTANT or VARDECL,
> > +     here we only need to check SSA name operands.  This is because check on
> > +     VARDECL operands, which involve memory loads, must have been done
> > +     prior to invocation of this function in vuse_semi_invariant_p.  */
> > +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
> > +    {
> > +      if (!ssa_semi_invariant_p (loop, use, skip_head))
> > +       return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* When a conditional statement never transfers execution to one of its
> > +   branches, determine whether we can remove the branch's leading basic block
> > +   (BRANCH_BB) and those basic blocks dominated by BRANCH_BB.  */
> > +
> > +static bool
> > +branch_removable_p (basic_block branch_bb)
> > +{
> > +  if (single_pred_p (branch_bb))
> > +    return true;
> > +
> > +  edge e;
> > +  edge_iterator ei;
> > +
> > +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
> > +    {
> > +      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
> > +       continue;
> > +
> > +      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
> > +       continue;
> > +
> > +       /* The branch can be reached from opposite branch, or from some
> > +         statement not dominated by the conditional statement.  */
> > +      return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Find out which branch of a conditional statement (COND) is invariant in the
> > +   execution context of LOOP.  That is: once the branch is selected in certain
> > +   iteration of the loop, any operand that contributes to computation of the
> > +   conditional statement remains unchanged in all following iterations.  */
> > +
> > +static edge
> > +get_cond_invariant_branch (struct loop *loop, gcond *cond)
> > +{
> > +  basic_block cond_bb = gimple_bb (cond);
> > +  basic_block targ_bb[2];
> > +  bool invar[2];
> > +  unsigned invar_checks;
> > +
> > +  for (unsigned i = 0; i < 2; i++)
> > +    {
> > +      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
> > +
> > +      /* One branch directs to loop exit, no need to perform loop split upon
> > +        this conditional statement.  Firstly, it is trivial if the exit branch
> > +        is semi-invariant, for the statement is just to break loop.  Secondly,
> > +        if the opposite branch is semi-invariant, it means that the statement
> > +        is real loop-invariant, which is covered by loop unswitch.  */
> > +      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
> > +       return NULL;
> > +    }
> > +
> > +  invar_checks = 0;
> > +
> > +  for (unsigned i = 0; i < 2; i++)
> > +    {
> > +      invar[!i] = false;
> > +
> > +      if (!branch_removable_p (targ_bb[i]))
> > +       continue;
> > +
> > +      /* Given a semi-invariant branch, if its opposite branch dominates
> > +        loop latch, it and its following trace will only be executed in
> > +        final iteration of loop, namely it is not part of repeated body
> > +        of the loop.  Similar to the above case that the branch is loop
> > +        exit, no need to split loop.  */
> > +      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
> > +       continue;
> > +
> > +      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
> > +      invar_checks++;
> > +    }
> > +
> > +  /* With both branches being invariant (handled by loop unswitch) or
> > +     variant is not what we want.  */
> > +  if (invar[0] ^ !invar[1])
> > +    return NULL;
> > +
> > +  /* Found a real loop-invariant condition, do nothing.  */
> > +  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
> > +    return NULL;
> > +
> > +  return EDGE_SUCC (cond_bb, (unsigned) invar[1]);
> > +}
> > +
> > +/* Calculate increased code size measured by estimated insn number if applying
> > +   loop split upon certain branch (BRANCH_EDGE) of a conditional statement.  */
> > +
> > +static int
> > +compute_added_num_insns (struct loop *loop, const_edge branch_edge)
> > +{
> > +  basic_block cond_bb = branch_edge->src;
> > +  unsigned branch = EDGE_SUCC (cond_bb, 1) == branch_edge;
> > +  basic_block opposite_bb = EDGE_SUCC (cond_bb, !branch)->dest;
> > +  basic_block *bbs = ((split_info *) loop->aux)->bbs;
> > +  int num = 0;
> > +
> > +  for (unsigned i = 0; i < loop->num_nodes; i++)
> > +    {
> > +      /* Do not count basic blocks only in opposite branch.  */
> > +      if (dominated_by_p (CDI_DOMINATORS, bbs[i], opposite_bb))
> > +       continue;
> > +
> > +      num += estimate_num_insns_seq (bb_seq (bbs[i]), &eni_size_weights);
> > +    }
> > +
> > +  /* It is unnecessary to evaluate expression of the conditional statement
> > +     in new loop that contains only invariant branch.  This expression should
> > +     be constant value (either true or false).  Exclude code size of insns
> > +     that contribute to computation of the expression.  */
> > +
> > +  auto_vec<gimple *> worklist;
> > +  hash_set<gimple *> removed;
> > +  gimple *stmt = last_stmt (cond_bb);
> > +
> > +  worklist.safe_push (stmt);
> > +  removed.add (stmt);
> > +  num -= estimate_num_insns (stmt, &eni_size_weights);
> > +
> > +  do
> > +    {
> > +      ssa_op_iter opnd_iter;
> > +      use_operand_p opnd_p;
> > +
> > +      stmt = worklist.pop ();
> > +      FOR_EACH_PHI_OR_STMT_USE (opnd_p, stmt, opnd_iter, SSA_OP_USE)
> > +       {
> > +         tree opnd = USE_FROM_PTR (opnd_p);
> > +
> > +         if (TREE_CODE (opnd) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (opnd))
> > +           continue;
> > +
> > +         gimple *opnd_stmt = SSA_NAME_DEF_STMT (opnd);
> > +         use_operand_p use_p;
> > +         imm_use_iterator use_iter;
> > +
> > +         if (removed.contains (opnd_stmt)
> > +             || !flow_bb_inside_loop_p (loop, gimple_bb (opnd_stmt)))
> > +           continue;
> > +
> > +         FOR_EACH_IMM_USE_FAST (use_p, use_iter, opnd)
> > +           {
> > +             gimple *use_stmt = USE_STMT (use_p);
> > +
> > +             if (!is_gimple_debug (use_stmt) && !removed.contains (use_stmt))
> > +               {
> > +                 opnd_stmt = NULL;
> > +                 break;
> > +               }
> > +           }
> > +
> > +         if (opnd_stmt)
> > +           {
> > +             worklist.safe_push (opnd_stmt);
> > +             removed.add (opnd_stmt);
> > +             num -= estimate_num_insns (opnd_stmt, &eni_size_weights);
> > +           }
> > +       }
> > +    } while (!worklist.is_empty ());
> > +
> > +  gcc_assert (num >= 0);
> > +  return num;
> > +}
> > +
> > +/* Find out loop-invariant branch of a conditional statement (COND) if it has,
> > +   and check whether it is eligible and profitable to perform loop split upon
> > +   this branch in LOOP.  */
> > +
> > +static edge
> > +get_cond_branch_to_split_loop (struct loop *loop, gcond *cond)
> > +{
> > +  edge invar_branch = get_cond_invariant_branch (loop, cond);
> > +
> > +  if (!invar_branch)
> > +    return NULL;
> > +
> > +  profile_probability prob = invar_branch->probability;
> > +
> > +  /* When accurate profile information is available, and execution
> > +     frequency of the branch is too low, just let it go.  */
> > +  if (prob.reliable_p ())
> > +    {
> > +      int thres = PARAM_VALUE (PARAM_MIN_LOOP_COND_SPLIT_PROB);
> > +
> > +      if (prob < profile_probability::always ().apply_scale (thres, 100))
> > +       return NULL;
> > +    }
> > +
> > +  /* Add a threshold for increased code size to disable loop split.  */
> > +  if (compute_added_num_insns (loop, invar_branch)
> > +      > PARAM_VALUE (PARAM_MAX_LOOP_COND_SPLIT_INSNS))
> > +    return NULL;
> > +
> > +  return invar_branch;
> > +}
> > +
> > +/* Given a loop (LOOP1) with a loop-invariant branch (INVAR_BRANCH) of some
> > +   conditional statement, perform loop split transformation illustrated
> > +   as the following graph.
> > +
> > +               .-------T------ if (true) ------F------.
> > +               |                    .---------------. |
> > +               |                    |               | |
> > +               v                    |               v v
> > +          pre-header                |            pre-header
> > +               | .------------.     |                 | .------------.
> > +               | |            |     |                 | |            |
> > +               | v            |     |                 | v            |
> > +             header           |     |               header           |
> > +               |              |     |                 |              |
> > +       [ bool r = cond; ]     |     |                 |              |
> > +               |              |     |                 |              |
> > +      .---- if (r) -----.     |     |        .--- if (true) ---.     |
> > +      |                 |     |     |        |                 |     |
> > +  invariant             |     |     |    invariant             |     |
> > +      |                 |     |     |        |                 |     |
> > +      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
> > +               |              |    /                  |              |
> > +             stmts            |   /                 stmts            |
> > +               |              |  /                    |              |
> > +              / \             | /                    / \             |
> > +     .-------*   *       [ if (!r) ]        .-------*   *            |
> > +     |           |            |             |           |            |
> > +     |         latch          |             |         latch          |
> > +     |           |            |             |           |            |
> > +     |           '------------'             |           '------------'
> > +     '------------------------. .-----------'
> > +             loop1            | |                   loop2
> > +                              v v
> > +                             exits
> > +
> > +   In the graph, loop1 represents the part derived from original one, and
> > +   loop2 is duplicated using loop_version (), which corresponds to the part
> > +   of original one being split out.  In loop1, a new bool temporary (r)
> > +   is introduced to keep value of the condition result.  In original latch
> > +   edge of loop1, we insert a new conditional statement whose value comes
> > +   from previous temporary (r), one of its branch goes back to loop1 header
> > +   as a latch edge, and the other branch goes to loop2 pre-header as an entry
> > +   edge.  And also in loop2, we abandon the variant branch of the conditional
> > +   statement candidate by setting a constant bool condition, based on which
> > +   branch is semi-invariant.  */
> > +
> > +static bool
> > +do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
> > +{
> > +  basic_block cond_bb = invar_branch->src;
> > +  bool true_invar = !!(invar_branch->flags & EDGE_TRUE_VALUE);
> > +  gcond *cond = as_a <gcond *> (last_stmt (cond_bb));
> > +
> > +  gcc_assert (cond_bb->loop_father == loop1);
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +   {
> > +     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
> > +             current_function_name (), loop1->num,
> > +             true_invar ? "T" : "F", cond_bb->index);
> > +     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
> > +   }
> > +
> > +  initialize_original_copy_tables ();
> > +
> > +  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> > +                                    profile_probability::always (),
> > +                                    profile_probability::never (),
> > +                                    profile_probability::always (),
> > +                                    profile_probability::always (),
> > +                                    true);
> > +  if (!loop2)
> > +    {
> > +      free_original_copy_tables ();
> > +      return false;
> > +    }
> > +
> > +  /* Generate a bool type temporary to hold result of the condition.  */
> > +  tree tmp = make_ssa_name (boolean_type_node);
> > +  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> > +  gimple *stmt = gimple_build_assign (tmp,
> > +                                     gimple_cond_code (cond),
> > +                                     gimple_cond_lhs (cond),
> > +                                     gimple_cond_rhs (cond));
> > +
> > +  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
> > +  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
> > +  update_stmt (cond);
> > +
> > +  basic_block cond_bb_copy = get_bb_copy (cond_bb);
> > +  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
> > +
> > +  /* Replace the condition in loop2 with a bool constant to let PassManager
> > +     remove the variant branch after current pass completes.  */
> > +  if (true_invar)
> > +    gimple_cond_make_true (cond_copy);
> > +  else
> > +    gimple_cond_make_false (cond_copy);
> > +
> > +  update_stmt (cond_copy);
> > +
> > +  /* Insert a new conditional statement on latch edge of loop1.  This
> > +     statement acts as a switch to transfer execution from loop1 to loop2,
> > +     when loop1 enters into invariant state.  */
> > +  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
> > +  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
> > +  gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node,
> > +                                         NULL_TREE, NULL_TREE);
> > +
> > +  gsi = gsi_last_bb (break_bb);
> > +  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
> > +
> > +  edge to_loop1 = single_succ_edge (break_bb);
> > +  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
> > +
> > +  to_loop1->flags &= ~EDGE_FALLTHRU;
> > +  to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
> > +  to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
> > +
> > +  update_ssa (TODO_update_ssa);
> > +
> > +  /* Due to introduction of a control flow edge from loop1 latch to loop2
> > +     pre-header, we should update PHIs in loop2 to reflect this connection
> > +     between loop1 and loop2.  */
> > +  connect_loop_phis (loop1, loop2, to_loop2);
> > +
> > +  free_original_copy_tables ();
> > +
> > +  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
> > +
> > +  return true;
> > +}
> > +
> > +/* Traverse all conditional statements in LOOP, to find out a good candidate
> > +   upon which we can do loop split.  */
> > +
> > +static bool
> > +split_loop_on_cond (struct loop *loop)
> > +{
> > +  split_info *info = new split_info ();
> > +  basic_block *bbs = info->bbs = get_loop_body (loop);
> > +  bool do_split = false;
> > +
> > +  /* Allocate an area to keep temporary info, and associate its address
> > +     with loop aux field.  */
> > +  loop->aux = info;
> > +
> > +  for (unsigned i = 0; i < loop->num_nodes; i++)
> > +    {
> > +      basic_block bb = bbs[i];
> > +
> > +      /* We only consider conditional statement, which is executed at most once
> > +        in each iteration of the loop.  So skip statements in inner loops.  */
> > +      if ((bb->loop_father != loop) || (bb->flags & BB_IRREDUCIBLE_LOOP))
> > +       continue;
> > +
> > +      /* Actually this check is not a must constraint.  With it, we can ensure
> > +        conditional statement will always be executed in each iteration.  */
> > +      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> > +       continue;
> > +
> > +      gimple *last = last_stmt (bb);
> > +
> > +      if (!last || gimple_code (last) != GIMPLE_COND)
> > +       continue;
> > +
> > +      gcond *cond = as_a <gcond *> (last);
> > +      edge branch_edge = get_cond_branch_to_split_loop (loop, cond);
> > +
> > +      if (branch_edge)
> > +       {
> > +         do_split_loop_on_cond (loop, branch_edge);
> > +         do_split = true;
> > +         break;
> > +       }
> > +    }
> > +
> > +  delete info;
> > +  loop->aux = NULL;
> > +
> > +  return do_split;
> > +}
> > +
> >  /* Main entry point.  Perform loop splitting on all suitable loops.  */
> >
> >  static unsigned int
> > @@ -627,7 +1369,6 @@ tree_ssa_split_loops (void)
> >    /* Go through all loops starting from innermost.  */
> >    FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> >      {
> > -      class tree_niter_desc niter;
> >        if (loop->aux)
> >         {
> >           /* If any of our inner loops was split, don't split us,
> > @@ -636,29 +1377,14 @@ tree_ssa_split_loops (void)
> >           continue;
> >         }
> >
> > -      if (single_exit (loop)
> > -         /* ??? We could handle non-empty latches when we split
> > -            the latch edge (not the exit edge), and put the new
> > -            exit condition in the new block.  OTOH this executes some
> > -            code unconditionally that might have been skipped by the
> > -            original exit before.  */
> > -         && empty_block_p (loop->latch)
> > -         && !optimize_loop_for_size_p (loop)
> > -         && easy_exit_values (loop)
> > -         && number_of_iterations_exit (loop, single_exit (loop), &niter,
> > -                                       false, true)
> > -         && niter.cmp != ERROR_MARK
> > -         /* We can't yet handle loops controlled by a != predicate.  */
> > -         && niter.cmp != NE_EXPR
> > -         && can_duplicate_loop_p (loop))
> > +      if (optimize_loop_for_size_p (loop))
> > +        continue;
> > +
> > +      if (split_loop (loop) || split_loop_on_cond (loop))
> >         {
> > -         if (split_loop (loop, &niter))
> > -           {
> > -             /* Mark our containing loop as having had some split inner
> > -                loops.  */
> > -             loop_outer (loop)->aux = loop;
> > -             changed = true;
> > -           }
> > +         /* Mark our containing loop as having had some split inner loops.  */
> > +         loop_outer (loop)->aux = loop;
> > +         changed = true;
> >         }
> >      }
> >
> > --
> > 2.17.1

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-10-23 10:32                               ` Richard Biener
@ 2019-10-25  5:20                                 ` Feng Xue OS
  2019-10-31 15:56                                   ` [PATCH V4] " Feng Xue OS
  0 siblings, 1 reply; 31+ messages in thread
From: Feng Xue OS @ 2019-10-25  5:20 UTC (permalink / raw)
  To: Richard Biener
  Cc: Michael Matz, Philipp Tomsich, gcc-patches,
	Christoph Müllner, erick.ochoa

Richard,

    Thanks for your comments. 

>+      /* For PHI node that is not in loop header, its source operands should
>+        be defined inside the loop, which are seen as loop variant.  */
>+      if (def_bb != loop->header || !skip_head)
>+       return false;

> so if we have
>
> for (;;)
>  {
>     if (x)
>       a = ..;
>     else
>       a = ...;
>     if (cond-to-split-on dependent on a)
> ...
>  }
>
> the above is too restrictive in case 'x' is semi-invariant as well, correct?
In the above case, cond-on-a will not be identified as semi-invariant, because
a is defined by a PHI with genuinely multiple sources. To handle it, besides each
source value, we would need an extra check on each source's control
dependence node (x in this case), which might add a fair amount of code.
Anyway, I'll give it a try.
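
To make the case under discussion concrete, here is a source-level sketch
(plain C++, not GIMPLE). The function names and the way x is updated are
assumptions made up for illustration. x can only change inside its own true
branch, so once x is false, a is always 2 and the condition on a always takes
its else branch. run_split shows by hand the shape the transformation is
aiming for; it is not output of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original loop: x is only stored to inside its own true branch, so once x
// becomes false it stays false, and from then on a is always 2.
long run_original(const std::vector<int> &data, bool x)
{
  long sum = 0;
  for (std::size_t i = 0; i < data.size(); ++i)
    {
      int a;
      if (x)
        {
          a = 1;
          x = data[i] > 0;   // the only store to x
        }
      else
        a = 2;

      if (a == 1)            // condition dependent on a
        sum += data[i];
      else
        sum -= data[i];
    }
  return sum;
}

// Hand-split version: the first loop runs until the condition on a takes its
// invariant (else) branch for the first time; the second loop keeps only that
// branch.
long run_split(const std::vector<int> &data, bool x)
{
  long sum = 0;
  std::size_t i = 0;
  for (; i < data.size(); ++i)
    {
      int a;
      if (x)
        {
          a = 1;
          x = data[i] > 0;
        }
      else
        a = 2;

      if (a == 1)
        sum += data[i];
      else
        {
          sum -= data[i];
          ++i;               // finish this iteration, then leave the first loop
          break;
        }
    }

  for (; i < data.size(); ++i)
    sum -= data[i];          // x is false here, so a would always be 2
  return sum;
}
```

Both functions should agree for any input, e.g. run_original ({3, -1, 4, 5},
true) and run_split with the same arguments.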


>+         /* A new value comes from outside of loop.  */
>+         if (!bb || !flow_bb_inside_loop_p (loop, bb))
>+           return false;

> but that means starting from the second iteration the value is invariant.
No. The traversal runs in the reverse direction of loop execution. In the
following, we start from "x_1 = ", extract the latch value x_3, look up the
definition of x_3, and finally reach "x_1 =" again.

Loop:
      x_1 = PHI (x_0, x_3)
      ... 
      x_3 = 
      ...
      goto Loop;


>+                 /* Don't consider redefinitions in excluded basic blocks.  */
>+                 if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
>+                   {
>+                     /* There are more than one source operands that can
>+                        provide value to the SSA name, it is variant.  */
>+                     if (from)
>+                       return false;
>
> they might be the same though, for PHIs with > 2 arguments.
OK. Will add value equivalence check.


> In the cycle handling you are not recursing via stmt_semi_invariant_p
> but only handle SSA name copies - any particular reason for that?
The cycle handling is specific to SSA names that cross iterations. Such a name
is semi-invariant if it remains unchanged after a certain iteration, which
means its value from the previous iteration (coming from the latch edge) is
just a copy of itself, nothing else. So recursion via stmt_semi_invariant_p
is unnecessary.

Loop:
      x_1 = PHI (x_0, x_3);
      x_2 = PHI(x_1, value defined in excluded branch);
      x_3 = x_2;
      goto Loop;
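
The backward walk can be sketched on a toy model like this. Everything here
(DefKind, Def, latch_value_is_self_copy) is hypothetical and not GCC's data
structures. A PHI whose other argument comes from the excluded branch is
modelled as a plain copy of its remaining argument, which is an assumption of
the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy model of SSA definitions (hypothetical, not GCC's representation).
enum class DefKind { HeaderPhi, Copy, Other };

struct Def
{
  DefKind kind;
  std::string src;  // HeaderPhi: value arriving on the latch edge.
                    // Copy: the name being copied.  Other: unused.
};

using Defs = std::map<std::string, Def>;

// Start at a loop-header PHI, take its latch value and follow plain copies
// backwards.  The PHI is treated as semi-invariant only if the walk arrives
// back at the PHI itself; any other kind of definition introduces a new value.
bool latch_value_is_self_copy(const Defs &defs, const std::string &phi)
{
  auto it = defs.find(phi);
  if (it == defs.end() || it->second.kind != DefKind::HeaderPhi)
    return false;

  std::string cur = it->second.src;
  // Bound the walk so a copy cycle that never reaches PHI cannot loop forever.
  for (std::size_t steps = 0; steps <= defs.size(); ++steps)
    {
      if (cur == phi)
        return true;
      auto d = defs.find(cur);
      if (d == defs.end() || d->second.kind != DefKind::Copy)
        return false;
      cur = d->second.src;
    }
  return false;
}
```

With x_1 = PHI (x_0, x_3), x_2 = copy of x_1, x_3 = copy of x_2, the walk
returns to x_1 and the answer is true; if x_3 is defined by anything else,
the answer is false.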


>+static bool
>+branch_removable_p (basic_block branch_bb)
>+{
>+  if (single_pred_p (branch_bb))
>+    return true;
>
> I'm not sure what this function tests - at least the single_pred_p check
> looks odd to me given the dominator checks later.  The single predecessor
> could simply be a forwarder.  I wonder if you are looking for branches forming
> an irreducible loop?  I think you can then check EDGE_IRREDUCIBLE_LOOP
> or BB_IRREDUCIBLE_LOOP on the condition block (btw, I don't see
> testcases covering the apparent special-cases in the patch - referring to
> existing ones via a comment often helps understanding the code).

This function tests whether a branch is reachable from any place other than
its conditional statement (it returns true when it is not). This ensures that
when the branch is not selected upon condition evaluation, the trace led by
the branch will never be executed, so that it can be excluded during
semi-invariantness analysis.

If single_pred_p holds, only the conditional statement can reach the branch.

If not, consider a half-diamond control flow graph with a back edge to the
true branch.

            condition
               |  \
               |   \
               |  false branch
   .--->----.  |   /
   |        |  |  /
 other    true branch
   |        |
   '---<----'

If there is an edge from the false branch, the true branch cannot be excluded
even when it is not selected.  The back edge from "other" (which is dominated
by the true branch) does not have any impact.
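
The same check can be tried on a toy CFG. This is only a sketch: Cfg,
dominates and branch_removable are made-up names, and dominance is computed
by brute-force reachability instead of GCC's dominator tree.

```cpp
#include <cassert>
#include <vector>

// Toy CFG (hypothetical): succs[n] lists the successors of node n.
struct Cfg
{
  int entry;
  std::vector<std::vector<int>> succs;
};

// A dominates B if B cannot be reached from the entry once A is removed.
bool dominates(const Cfg &g, int a, int b)
{
  if (a == b)
    return true;
  std::vector<bool> seen(g.succs.size(), false);
  std::vector<int> stack;
  if (g.entry != a)
    {
      stack.push_back(g.entry);
      seen[g.entry] = true;
    }
  while (!stack.empty())
    {
      int n = stack.back();
      stack.pop_back();
      if (n == b)
        return false;
      for (int s : g.succs[n])
        if (s != a && !seen[s])
          {
            seen[s] = true;
            stack.push_back(s);
          }
    }
  return true;
}

// Mirror of the predecessor walk: every edge into BRANCH must come either
// from a block dominated by BRANCH (a back edge from inside the branch) or
// from a block dominating BRANCH (e.g. the condition block).
bool branch_removable(const Cfg &g, int branch)
{
  for (int p = 0; p < (int) g.succs.size(); ++p)
    for (int s : g.succs[p])
      {
        if (s != branch)
          continue;
        if (dominates(g, branch, p))
          continue;
        if (dominates(g, p, branch))
          continue;
        return false;
      }
  return true;
}
```

Numbering the graph above as 0 = condition, 1 = false branch, 2 = true
branch, 3 = other, the true branch is not removable when the edge 1 -> 2
exists, and becomes removable once that edge is dropped.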


>+
>+  return EDGE_SUCC (cond_bb, (unsigned) invar[1]);
>+}
>
> magic ensures that invar[1] is always the invariant edge?  Oh, it's a bool.
> Ick.  I wonder if logic with int invariant_edge = -1; and the loop setting
> it to either 0 or 1 would be easier to follow...
OK.
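
A rough sketch of how the suggested invariant_edge = -1 logic might look, as
standalone C++ outside of GCC. The function name and its input array are
hypothetical; each element stands for the result of the per-branch analysis.

```cpp
#include <array>
#include <cassert>

// excluding_makes_invariant[i] is true when the condition becomes
// semi-invariant once the blocks reached through successor i are ignored.
// Returns the index of the invariant successor edge, or -1 when there is no
// single candidate (neither edge qualifies, or both do).
int pick_invariant_edge(const std::array<bool, 2> &excluding_makes_invariant)
{
  int invariant_edge = -1;

  for (int i = 0; i < 2; ++i)
    {
      if (!excluding_makes_invariant[i])
        continue;

      // Both branches qualify: not a semi-invariant case, give up.
      if (invariant_edge != -1)
        return -1;

      // Ignoring successor i makes the opposite edge the invariant one.
      invariant_edge = 1 - i;
    }

  return invariant_edge;
}
```

The caller then only has to test for -1, with no bool-to-index cast.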


> Note your stmt_semi_invariant_p check is exponential for a condition
> like
>
>   _1 = 1;
>   _2 = _1 + _1;
>   _3 = _2 + _2;
>   if (_3 != param_4(D))
>
> because you don't track ops you already proved semi-invariant.  We've
> run into such situation repeatedly in SCEV analysis so I doubt it can be
> disregarded as irrelevant in practice.  A worklist approach could then
> also get rid of the recursion.  You are already computing the stmts
> forming the condition in compute_added_num_insns so another option
> is to re-use that.
OK.
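
To illustrate the cost difference, here is a small standalone sketch. It uses
memoized recursion rather than the worklist mentioned above, and Stmt,
make_doubling_chain and the two check functions are made-up names. Statement
i uses statement i - 1 twice, like _2 = _1 + _1 in the example.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Toy statement: a list of operand statement ids.  A statement without
// operands stands for an invariant leaf such as "_1 = 1".
struct Stmt
{
  std::vector<int> operands;
};

// Build s[0] as a leaf and s[i] = s[i-1] + s[i-1] for i in 1..n.
std::vector<Stmt> make_doubling_chain(int n)
{
  std::vector<Stmt> stmts(n + 1);
  for (int i = 1; i <= n; ++i)
    stmts[i].operands = {i - 1, i - 1};
  return stmts;
}

// Plain recursion: every use re-checks its definition.
bool check_naive(const std::vector<Stmt> &stmts, int id, long &visits)
{
  ++visits;
  for (int op : stmts[id].operands)
    if (!check_naive(stmts, op, visits))
      return false;
  return true;
}

// Same check, but results already proved are remembered and reused.
bool check_memoized(const std::vector<Stmt> &stmts, int id,
                    std::unordered_map<int, bool> &cache, long &visits)
{
  auto hit = cache.find(id);
  if (hit != cache.end())
    return hit->second;

  ++visits;
  bool result = true;
  for (int op : stmts[id].operands)
    if (!check_memoized(stmts, op, cache, visits))
      {
        result = false;
        break;
      }
  cache[id] = result;
  return result;
}
```

For a chain of depth n the plain version makes 2^(n+1) - 1 visits, while the
memoized one makes n + 1.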


> Btw, I wonder if we can simply re-use PARAM_MAX_PEELED_INSNS
> instead of adding yet another param (it also happens to have the same
> size).  Because we are "peeling" the loop.
I'll check that.

>+  edge invar_branch = get_cond_invariant_branch (loop, cond);
>+
>+  if (!invar_branch)
>+    return NULL;
>
> extra vertical space is unwanted in such cases.
OK.

>+  if (dump_file && (dump_flags & TDF_DETAILS))
>+   {
>+     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
>+             current_function_name (), loop1->num,
>+             true_invar ? "T" : "F", cond_bb->index);
>+     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
>+   }
>
> can you please use sth like
>
>  if (dump_enabled_p ())
>    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>                             cond, "loop split on semi-invariant condition");
>
> so -fopt-info-loop will show it?
OK.


>+  /* Generate a bool type temporary to hold result of the condition.  */
>+  tree tmp = make_ssa_name (boolean_type_node);
>+  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
>+  gimple *stmt = gimple_build_assign (tmp,
>+                                     gimple_cond_code (cond),
>+                                     gimple_cond_lhs (cond),
>+                                     gimple_cond_rhs (cond));
>
> shorter is
>
>   gimple_seq stmts = NULL;
>   tree tmp = gimple_build (&stmts, gimple_cond_code (cond),
>                                      boolean_type_node,
>  gimple_cond_lhs (cond), gimple_cond_rhs (cond));
>   gsi_insert_seq_before (&gsi, stmts, GSI_NEW_STMT);
OK.


>+  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
>+  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
>
> but I wonder what's the point here to move the condition computation to
> a temporary?  Why not just build the original condition again for break_cond?
OK.


> in split_loop_on_cond you'll find the first semi-invariant condition
> to split on,
> but we'll not visit the split loop again (also for original splitting I guess).
> Don't we eventually want to recurse on that?
Currently, we only do one round of loop split. Enabling more than one split
on a loop is a TODO.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH V4] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-10-25  5:20                                 ` Feng Xue OS
@ 2019-10-31 15:56                                   ` Feng Xue OS
  2019-11-05 14:04                                     ` Richard Biener
  0 siblings, 1 reply; 31+ messages in thread
From: Feng Xue OS @ 2019-10-31 15:56 UTC (permalink / raw)
  To: Richard Biener
  Cc: Michael Matz, Philipp Tomsich, gcc-patches,
	Christoph Müllner, erick.ochoa

[-- Attachment #1: Type: text/plain, Size: 7604 bytes --]

Hi, Richard

   This is a new patch that supports a more general form of semi-invariant
condition, using control dependence analysis.

Thanks,
Feng

________________________________________
From: Feng Xue OS <fxue@os.amperecomputing.com>
Sent: Friday, October 25, 2019 11:43 AM
To: Richard Biener
Cc: Michael Matz; Philipp Tomsich; gcc-patches@gcc.gnu.org; Christoph Müllner; erick.ochoa@theobroma-systems.com
Subject: Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)

Richard,

    Thanks for your comments.

>+      /* For PHI node that is not in loop header, its source operands should
>+        be defined inside the loop, which are seen as loop variant.  */
>+      if (def_bb != loop->header || !skip_head)
>+       return false;

> so if we have
>
> for (;;)
>  {
>     if (x)
>       a = ..;
>     else
>       a = ...;
>     if (cond-to-split-on dependent on a)
> ...
>  }
>
> the above is too restrictive in case 'x' is semi-invariant as well, correct?
In above case, cond-on-a will not be identified as semi-invariant, in that
a is defined by PHI with real multi-sources. To handle it,  besides each
source value, we should add extra check on each source's control
dependence node (x in the case), which might have not a little code expansion.
Anyway, I'll have a try.


>+         /* A new value comes from outside of loop.  */
>+         if (!bb || !flow_bb_inside_loop_p (loop, bb))
>+           return false;

> but that means starting from the second iteration the value is invariant.
No. Traversal direction is reverse to loop execution. In the following,
start from "x_1 = ", extract latch value x_3, and get x_3 definition, and
finally reach "x_1 =".

Loop:
      x_1 = PHI (x_0, x_3)
      ...
      x_3 =
      ...
      goto Loop;


>+                 /* Don't consider redefinitions in excluded basic blocks.  */
>+                 if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
>+                   {
>+                     /* There are more than one source operands that can
>+                        provide value to the SSA name, it is variant.  */
>+                     if (from)
>+                       return false;
>
> they might be the same though, for PHIs with > 2 arguments.
OK. Will add value equivalence check.


> In the cycle handling you are not recursing via stmt_semi_invariant_p
> but only handle SSA name copies - any particular reason for that?
The cycle handling is specified for ssa that crosses iteration. It is
semi-invariant if it remains unchanged after certain iteration, which
means its value in previous iteration (coming from latch edge) is just
a copy of its self,  nothing else. So, recursion via stmt_semi_invariant_p
is unnecessary.

Loop:
      x_1 = PHI (x_0, x_3);
      x_2 = PHI(x_1, value defined in excluded branch);
      x_3 = x_2;
      goto Loop;


>+static bool
>+branch_removable_p (basic_block branch_bb)
>+{
>+  if (single_pred_p (branch_bb))
>+    return true;
>
> I'm not sure what this function tests - at least the single_pred_p check
> looks odd to me given the dominator checks later.  The single predecessor
> could simply be a forwarder.  I wonder if you are looking for branches forming
> an irreducible loop?  I think you can then check EDGE_IRREDUCIBLE_LOOP
> or BB_IRREDUCIBLE_LOOP on the condition block (btw, I don't see
> testcases covering the appearant special-cases in the patch - refering to
> existing ones via a comment often helps understanding the code).

Upon condition evaluation, if a branch is not selected,
This function test a branch is reachable from other place other than its
conditional statement. This ensure that when the branch is not selected
upon condition evaluation, trace path led by the branch will never
be executed so that it can be excluded  during semi-invariantness analysis.

If single_pred_p, only condition statement can reach the branch.

If not, consider a half diamond condition control graph, with a back-edge to
true branch.

            condition
               |  \
               |   \
               |  false branch
   .--->----.  |   /
   |        |  |  /
 other    true branch
   |        |
   '---<----'

If there is an edge from false branch, true branch can not be excluded even it
is not selected.  And back edge from "other" (dominated by true branch) does
not have any impact.


>+
>+  return EDGE_SUCC (cond_bb, (unsigned) invar[1]);
>+}
>
> magic ensures that invar[1] is always the invariant edge?  Oh, it's a bool.
> Ick.  I wonder if logic with int invariant_edge = -1; and the loop setting
> it to either 0 or 1 would be easier to follow...
OK.


> Note your stmt_semi_invariant_p check is exponential for a condition
> like
>
>   _1 = 1;
>   _2 = _1 + _1;
>   _3 = _2 + _2;
>   if (_3 != param_4(D))
>
> because you don't track ops you already proved semi-invariant.  We've
> run into such situation repeatedly in SCEV analysis so I doubt it can be
> disregarded as irrelevant in practice.  A worklist approach could then
> also get rid of the recursion.  You are already computing the stmts
> forming the condition in compute_added_num_insns so another option
> is to re-use that.
OK.


> Btw, I wonder if we can simply re-use PARAM_MAX_PEELED_INSNS
> instead of adding yet another param (it also happens to have the same
> size).  Because we are "peeling" the loop.
I'll check that.

>+  edge invar_branch = get_cond_invariant_branch (loop, cond);
>+
>+  if (!invar_branch)
>+    return NULL;
>
> extra vertical space is unwanted in such cases.
OK.

>+  if (dump_file && (dump_flags & TDF_DETAILS))
>+   {
>+     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
>+             current_function_name (), loop1->num,
>+             true_invar ? "T" : "F", cond_bb->index);
>+     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
>+   }
>
> can you please use sth like
>
>  if (dump_enabled_p ())
>    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>                             cond, "loop split on semi-invariant condition");
>
> so -fopt-info-loop will show it?
OK.


>+  /* Generate a bool type temporary to hold result of the condition.  */
>+  tree tmp = make_ssa_name (boolean_type_node);
>+  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
>+  gimple *stmt = gimple_build_assign (tmp,
>+                                     gimple_cond_code (cond),
>+                                     gimple_cond_lhs (cond),
>+                                     gimple_cond_rhs (cond));
>
> shorter is
>
>   gimple_seq stmts = NULL;
>   tree tmp = gimple_build (&stmts, gimple_cond_code (cond),
>                                      boolean_type_node,
>  gimple_cond_lhs (cond), gimple_cond_rhs (cond));
>   gsi_insert_seq_before (&gsi, stmts, GSI_NEW_STMT);
OK.


>+  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
>+  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
>
> but I wonder what's the point here to move the condition computation to
> a temporary?  Why not just build the original condition again for break_cond?
OK.


> in split_loop_on_cond you'll find the first semi-invariant condition
> to split on,
> but we'll not visit the split loop again (also for original splitting I guess).
> Don't we eventually want to recurse on that?
Currently, we do only one round of loop splitting. Enabling more than one
split per loop is left as a TODO.
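
For reference, a hand-written source-level picture of what a single round of the split produces. This is only an illustration under assumptions: `do_something`, `original` and `split_once` are hypothetical names, and the pass itself works on GIMPLE rather than on source like this.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a call that may clear the flag.
static int do_something (int x) { return x % 3; }

// Original loop: FLAG is only written in the true branch, so once it
// becomes zero it stays zero for the rest of the loop.
int original (const std::vector<int> &v, int flag, long &sum)
{
  sum = 0;
  for (std::size_t i = 0; i < v.size (); i++)
    {
      if (flag)
        flag = do_something (v[i]);
      sum += v[i];                  // "statements": no effect on FLAG
    }
  return flag;
}

// After one round of splitting: the first loop exits as soon as the
// semi-invariant (false) branch is taken, and the second loop runs the
// remaining iterations without testing FLAG at all.
int split_once (const std::vector<int> &v, int flag, long &sum)
{
  sum = 0;
  std::size_t i = 0;
  for (; i < v.size (); i++)
    {
      if (flag)
        flag = do_something (v[i]);
      else
        {
          sum += v[i];
          i++;
          break;
        }
      sum += v[i];
    }

  for (; i < v.size (); i++)
    sum += v[i];
  return flag;
}
```

The second loop here contains no further semi-invariant condition, so one round is enough for this shape; loops with several such conditions are where revisiting the split loops would matter.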

[-- Attachment #2: 0001-Loop-split-on-semi-invariant-conditional-statement.patch --]
[-- Type: application/octet-stream, Size: 43766 bytes --]

From bebbf6079687becc7cf088df3cd8bacda43920c4 Mon Sep 17 00:00:00 2001
From: Feng Xue <feng.xue@amperecomputing.com>
Date: Tue, 12 Mar 2019 11:46:19 +0800
Subject: [PATCH] Loop split on semi-invariant conditional statement

---
 gcc/ChangeLog                                 |   18 +
 gcc/doc/invoke.texi                           |    5 +
 gcc/params.def                                |    6 +
 gcc/testsuite/ChangeLog                       |    6 +
 .../g++.dg/tree-ssa/loop-cond-split-1.C       |   33 +
 .../gcc.dg/tree-ssa/loop-cond-split-1.c       |   97 ++
 gcc/tree-ssa-loop-split.c                     | 1023 ++++++++++++++++-
 7 files changed, 1160 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8ec312d7470..1679ce83faa 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,21 @@
+2019-10-23  Feng Xue  <fxue@os.amperecomputing.com>
+
+	PR tree-optimization/89134
+	* doc/invoke.texi (min-loop-cond-split-prob): Document new --params.
+	* params.def: Add min-loop-cond-split-prob.
+	* tree-ssa-loop-split.c (split_loop): Remove niter parameter, move some
+	outside checks on loop into the function.
+	(split_info): New class.
+	(find_vdef_in_loop, get_control_equiv_head_block): New functions.
+	(find_control_dep_blocks, vuse_semi_invariant_p): Likewise.
+	(ssa_semi_invariant_p, loop_iter_phi_semi_invariant_p): Likewise.
+	(control_dep_semi_invariant_p, stmt_semi_invariant_p_1): Likewise.
+	(stmt_semi_invariant_p, branch_removable_p): Likewise.
+	(get_cond_invariant_branch, compute_added_num_insns): Likewise.
+	(get_cond_branch_to_split_loop, do_split_loop_on_cond): Likewise.
+	(split_loop_on_cond): Likewise.
+	(tree_ssa_split_loops): Add loop split on conditional statement.
+
 2019-10-23  Iain Sandoe  <iain@sandoe.co.uk>
 
 	* config/rs6000/darwin.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Guard
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1407d019d14..ac147cd0727 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11481,6 +11481,11 @@ The maximum number of branches unswitched in a single loop.
 @item lim-expensive
 The minimum cost of an expensive expression in the loop invariant motion.
 
+@item min-loop-cond-split-prob
+When FDO profile information is available, @option{min-loop-cond-split-prob}
+specifies the minimum threshold for the probability of a semi-invariant
+condition statement to trigger a loop split.
+
 @item iv-consider-all-candidates-bound
 Bound on number of candidates for induction variables, below which
 all candidates are considered for each use in induction variable
diff --git a/gcc/params.def b/gcc/params.def
index 322c37f8b96..33b5907b0d0 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -415,6 +415,12 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
 	"The maximum number of unswitchings in a single loop.",
 	3, 0, 0)
 
+DEFPARAM(PARAM_MIN_LOOP_COND_SPLIT_PROB,
+	"min-loop-cond-split-prob",
+	"The minimum threshold for probability of semi-invariant condition "
+	"statement to trigger loop split.",
+	30, 0, 100)
+
 /* The maximum number of insns in loop header duplicated by the copy loop
    headers pass.  */
 DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index fd272807d0b..892cf457b8d 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2019-10-23  Feng Xue  <fxue@os.amperecomputing.com>
+
+	PR tree-optimization/89134
+	* gcc.dg/tree-ssa/loop-cond-split-1.c: New test.
+	* g++.dg/tree-ssa/loop-cond-split-1.C: New test.
+
 2019-10-22  Marc Glisse  <marc.glisse@inria.fr>
 
 	PR c++/85746
diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
new file mode 100644
index 00000000000..0d679cb9035
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
+
+#include <string>
+#include <map>
+
+using namespace std;
+
+class  A
+{
+public:
+  bool empty;
+  void set (string s);
+};
+
+class  B
+{
+  map<int, string> m;
+  void f ();
+};
+
+extern A *ga;
+
+void B::f ()
+{
+  for (map<int, string>::iterator iter = m.begin (); iter != m.end (); ++iter)
+    {
+      if (ga->empty)
+        ga->set (iter->second);
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "loop split on semi-invariant condition at false branch" 1 "lsplit" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
new file mode 100644
index 00000000000..feb776e8373
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
@@ -0,0 +1,97 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
+
+extern const int step;
+
+int ga, gb;
+
+__attribute__((pure)) __attribute__((noinline)) int inc (int i)
+{
+  return i + step;
+}
+
+extern int do_something (void);
+
+void test1 (int n)
+{
+  int i;
+
+  for (i = 0; i < n; i = inc (i))
+    {
+      if (ga)
+        ga = do_something ();
+    }
+}
+
+void test2 (int n, int p)
+{
+  int i;
+  int v;
+
+  for (i = 0; i < n ; i = inc (i))
+    {
+      if (ga)
+       {
+         v = inc (2);
+         gb += 1;
+       }
+      else
+       {
+         v = p * p;
+         gb *= 3;
+       }
+
+      if (v < 10)
+        ga = do_something ();
+    }
+}
+
+void test3 (int n, int p)
+{
+  int i;
+  int c = p + 1;
+  int v;
+
+  for (i = 0; i < n ; i = inc (i))
+    {
+      if (c)
+       {
+         v = inc (c);
+         gb += 1;
+       }
+      else
+       {
+         v = p * p;
+         gb *= 3;
+       }
+
+      if (v < 10)
+        c = do_something ();
+    }
+}
+
+void test4 (int n, int p)
+{
+  int i;
+  int v;
+
+  for (i = 0; i < n ; i = inc (i))
+    {
+      if (ga)
+       {
+         v = inc (2);
+         if (gb > 16)
+           v = inc (5);
+       }
+      else
+       {
+         v = p * p;
+         gb += 2;
+       }
+
+      if (v < 10)
+        ga = do_something ();
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "loop split on semi-invariant condition at false branch" 3 "lsplit" } } */
diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index f5f083384bc..f5f99c21dc2 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -32,7 +32,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop.h"
 #include "tree-ssa-loop-manip.h"
 #include "tree-into-ssa.h"
+#include "tree-inline.h"
+#include "tree-cfgcleanup.h"
 #include "cfgloop.h"
+#include "params.h"
 #include "tree-scalar-evolution.h"
 #include "gimple-iterator.h"
 #include "gimple-pretty-print.h"
@@ -40,7 +43,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "gimplify-me.h"
 
-/* This file implements loop splitting, i.e. transformation of loops like
+/* This file implements two kinds of loop splitting.
+
+   One transformation of loops like:
 
    for (i = 0; i < 100; i++)
      {
@@ -487,8 +492,9 @@ compute_new_first_bound (gimple_seq *stmts, class tree_niter_desc *niter,
    single exit of LOOP.  */
 
 static bool
-split_loop (class loop *loop1, class tree_niter_desc *niter)
+split_loop (class loop *loop1)
 {
+  class tree_niter_desc niter;
   basic_block *bbs;
   unsigned i;
   bool changed = false;
@@ -496,8 +502,28 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
   tree border = NULL_TREE;
   affine_iv iv;
 
+  if (!single_exit (loop1)
+      /* ??? We could handle non-empty latches when we split the latch edge
+	 (not the exit edge), and put the new exit condition in the new block.
+	 OTOH this executes some code unconditionally that might have been
+	 skipped by the original exit before.  */
+      || !empty_block_p (loop1->latch)
+      || !easy_exit_values (loop1)
+      || !number_of_iterations_exit (loop1, single_exit (loop1), &niter,
+				     false, true)
+      || niter.cmp == ERROR_MARK
+      /* We can't yet handle loops controlled by a != predicate.  */
+      || niter.cmp == NE_EXPR)
+    return false;
+
   bbs = get_loop_body (loop1);
 
+  if (!can_copy_bbs_p (bbs, loop1->num_nodes))
+    {
+      free (bbs);
+      return false;
+    }
+
   /* Find a splitting opportunity.  */
   for (i = 0; i < loop1->num_nodes; i++)
     if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
@@ -505,8 +531,8 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
 	/* Handling opposite steps is not implemented yet.  Neither
 	   is handling different step sizes.  */
 	if ((tree_int_cst_sign_bit (iv.step)
-	     != tree_int_cst_sign_bit (niter->control.step))
-	    || !tree_int_cst_equal (iv.step, niter->control.step))
+	     != tree_int_cst_sign_bit (niter.control.step))
+	    || !tree_int_cst_equal (iv.step, niter.control.step))
 	  continue;
 
 	/* Find a loop PHI node that defines guard_iv directly,
@@ -575,7 +601,7 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
 	   Compute the new bound for the guarding IV and patch the
 	   loop exit to use it instead of original IV and bound.  */
 	gimple_seq stmts = NULL;
-	tree newend = compute_new_first_bound (&stmts, niter, border,
+	tree newend = compute_new_first_bound (&stmts, &niter, border,
 					       guard_code, guard_init);
 	if (stmts)
 	  gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
@@ -612,6 +638,956 @@ split_loop (class loop *loop1, class tree_niter_desc *niter)
   return changed;
 }
 
+/* Another transformation of loops like:
+
+   for (i = INIT (); CHECK (i); i = NEXT ())
+     {
+       if (expr (a_1, a_2, ..., a_n))  // expr is pure
+         a_j = ...;  // change at least one a_j
+       else
+         S;          // not change any a_j
+     }
+
+   into:
+
+   for (i = INIT (); CHECK (i); i = NEXT ())
+     {
+       if (expr (a_1, a_2, ..., a_n))
+         a_j = ...;
+       else
+         {
+           S;
+           i = NEXT ();
+           break;
+         }
+     }
+
+   for (; CHECK (i); i = NEXT ())
+     {
+       S;
+     }
+
+   */
+
+/* Data structure to hold temporary information during loop split upon
+   semi-invariant conditional statement.  */
+class split_info {
+public:
+  /* Array of all basic blocks in a loop, returned by get_loop_body().  */
+  basic_block *bbs;
+
+  /* All memory store/clobber statements in a loop.  */
+  auto_vec<gimple *> memory_stores;
+
+  /* Whether the above memory stores vector still needs to be filled.  */
+  bool need_init;
+
+  /* Control dependencies of basic blocks in a loop.  */
+  auto_vec<hash_set<basic_block> *> control_deps;
+
+  split_info () : bbs (NULL),  need_init (true) { }
+
+  ~split_info ()
+    {
+      if (bbs)
+	free (bbs);
+
+      for (unsigned i = 0; i < control_deps.length (); i++)
+	delete control_deps[i];
+    }
+};
+
+/* Find all statements with memory-write effect in LOOP, including memory
+   store and non-pure function call, and keep those in a vector.  This work
+   is only done one time, for the vector should be constant during analysis
+   stage of semi-invariant condition.  */
+
+static void
+find_vdef_in_loop (struct loop *loop)
+{
+  split_info *info = (split_info *) loop->aux;
+  gphi *vphi = get_virtual_phi (loop->header);
+
+  /* Indicate memory store vector has been filled.  */
+  info->need_init = false;
+
+  /* If loop contains memory operation, there must be a virtual PHI node in
+     loop header basic block.  */
+  if (vphi == NULL)
+    return;
+
+  /* All virtual SSA names inside the loop are connected to be a cyclic
+     graph via virtual PHI nodes.  The virtual PHI node in loop header just
+     links the first and the last virtual SSA names, by using the last as
+     PHI operand to define the first.  */
+  const edge latch = loop_latch_edge (loop);
+  const tree first = gimple_phi_result (vphi);
+  const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch);
+
+  /* The virtual SSA cyclic graph might consist of only one SSA name, which
+     is defined by itself.
+
+       .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)>
+
+     This means the loop contains only memory loads, so we can skip it.  */
+  if (first == last)
+    return;
+
+  auto_vec<gimple *> other_stores;
+  auto_vec<tree> worklist;
+  auto_bitmap visited;
+
+  bitmap_set_bit (visited, SSA_NAME_VERSION (first));
+  bitmap_set_bit (visited, SSA_NAME_VERSION (last));
+  worklist.safe_push (last);
+
+  do
+    {
+      tree vuse = worklist.pop ();
+      gimple *stmt = SSA_NAME_DEF_STMT (vuse);
+
+      /* We mark the first and last SSA names as visited at the beginning,
+	 and walk backwards from the last SSA name towards the first, which
+	 ensures that this do-while will not touch SSA names defined outside
+	 the loop.  */
+      gcc_assert (gimple_bb (stmt)
+		  && flow_bb_inside_loop_p (loop, gimple_bb (stmt)));
+
+      if (gimple_code (stmt) == GIMPLE_PHI)
+	{
+	  gphi *phi = as_a <gphi *> (stmt);
+
+	  for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+	    {
+	      tree arg = gimple_phi_arg_def (stmt, i);
+
+	      if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
+		worklist.safe_push (arg);
+	    }
+	}
+      else
+	{
+	  tree prev = gimple_vuse (stmt);
+
+	  /* Non-pure call statement is conservatively assumed to impact all
+	     memory locations.  So place call statements ahead of other memory
+	     stores in the vector with an idea of using them as shortcut
+	     terminators to memory alias analysis.  */
+	  if (gimple_code (stmt) == GIMPLE_CALL)
+	    info->memory_stores.safe_push (stmt);
+	  else
+	    other_stores.safe_push (stmt);
+
+	  if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev)))
+	    worklist.safe_push (prev);
+	}
+    } while (!worklist.is_empty ());
+
+  info->memory_stores.safe_splice (other_stores);
+}
+
+/* Two basic blocks have equivalent control dependency if one dominates
+   the other, and is post-dominated by the latter.  Given a basic block
+   BB in LOOP, find the farthest equivalent dominating basic block.  For BB,
+   there is a constraint that BB does not post-dominate loop header of LOOP;
+   this means BB is control-dependent on at least one basic block in LOOP.  */
+
+static basic_block
+get_control_equiv_head_block (struct loop *loop, basic_block bb)
+{
+  while (!bb->aux)
+    {
+      basic_block dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+
+      gcc_checking_assert (dom_bb && flow_bb_inside_loop_p (loop, dom_bb));
+
+      if (!dominated_by_p (CDI_POST_DOMINATORS, dom_bb, bb))
+	break;
+
+      bb = dom_bb;
+    }
+  return bb;
+}
+
+/* Given a BB in LOOP, find out all basic blocks in LOOP that BB is control-
+   dependent on.  */
+
+static hash_set<basic_block> *
+find_control_dep_blocks (struct loop *loop, basic_block bb)
+{
+  /* If BB has the same control dependency as the loop header, it is not
+     control-dependent on any basic block in LOOP.  */
+  if (dominated_by_p (CDI_POST_DOMINATORS, loop->header, bb))
+    return NULL;
+
+  basic_block equiv_head = get_control_equiv_head_block (loop, bb);
+
+  if (equiv_head->aux)
+    {
+      /* There is a basic block with available control dependency equivalent
+	 to BB.  No need to recompute that, and also set this information
+	 to other equivalent basic blocks.  */
+      for (; bb != equiv_head;
+	   bb = get_immediate_dominator (CDI_DOMINATORS, bb))
+	bb->aux = equiv_head->aux;
+      return (hash_set<basic_block> *) equiv_head->aux;
+    }
+
+  /* A basic block Y is control-dependent on another X iff there exists
+     a path from X to Y, in which every basic block other than X and Y
+     is post-dominated by Y, but X is not post-dominated by Y.
+
+     According to this rule, recursively traverse basic blocks in the loop
+     backwards starting from BB, if a basic block is post-dominated by BB,
+     extend current post-dominating path to the basic block, otherwise it
+     is one that BB is control-dependent on.  */
+
+  auto_vec<basic_block> pdom_worklist;
+  hash_set<basic_block> pdom_visited;
+  hash_set<basic_block> *dep_bbs = new hash_set<basic_block>;
+
+  pdom_worklist.safe_push (equiv_head);
+
+  do
+    {
+      basic_block pdom_bb = pdom_worklist.pop ();
+      edge_iterator ei;
+      edge e;
+
+      if (pdom_visited.add (pdom_bb))
+	continue;
+
+      FOR_EACH_EDGE (e, ei, pdom_bb->preds)
+	{
+	  basic_block pred_bb = e->src;
+
+	  if (!dominated_by_p (CDI_POST_DOMINATORS, pred_bb, bb))
+	    {
+	      dep_bbs->add (pred_bb);
+	      continue;
+	    }
+
+	  pred_bb = get_control_equiv_head_block (loop, pred_bb);
+
+	  if (pdom_visited.contains (pred_bb))
+	    continue;
+
+	  if (!pred_bb->aux)
+	    {
+	      pdom_worklist.safe_push (pred_bb);
+	      continue;
+	    }
+
+	  /* For some basic block in the path being extended, if its control
+	     dependency has been computed, reuse the information instead of
+	     computing again from scratch.  */
+	  hash_set<basic_block> *pred_dep_bbs
+			= (hash_set<basic_block> *) pred_bb->aux;
+
+	  for (hash_set<basic_block>::iterator iter = pred_dep_bbs->begin ();
+	       iter != pred_dep_bbs->end (); ++iter)
+	    {
+	      basic_block pred_dep_bb = *iter;
+
+	      /* Each basic block is either one that BB is control-dependent
+		 on, or is post-dominated by BB; in the latter case, extend the
+		 path from it.  */
+	      if (!dominated_by_p (CDI_POST_DOMINATORS, pred_dep_bb, bb))
+		dep_bbs->add (pred_dep_bb);
+	      else if (!pdom_visited.contains (pred_dep_bb))
+		pdom_worklist.safe_push (pred_dep_bb);
+	    }
+	}
+    } while (!pdom_worklist.is_empty ());
+
+  /* Record computed control dependencies in loop so that we can reach them
+     when reclaiming resources.  */
+  ((split_info *) loop->aux)->control_deps.safe_push (dep_bbs);
+
+  /* Associate control dependence with related equivalent basic blocks.  */
+  for (equiv_head->aux = dep_bbs; bb != equiv_head;
+       bb = get_immediate_dominator (CDI_DOMINATORS, bb))
+    bb->aux = dep_bbs;
+
+  return dep_bbs;
+}
+
+/* Forward declaration */
+
+static bool
+stmt_semi_invariant_p_1 (struct loop *loop, gimple *stmt,
+			 const_basic_block skip_head,
+			 hash_map<gimple *, bool> &stmt_stat);
+
+/* Given STMT, memory load or pure call statement, check whether it is impacted
+   by some memory store in LOOP, excluding trace starting from SKIP_HEAD (the
+   trace is composed of SKIP_HEAD and those basic blocks dominated by it, and
+   always corresponds to one branch of a conditional statement).  If SKIP_HEAD
+   is NULL, all basic blocks of LOOP are checked.  */
+
+static bool
+vuse_semi_invariant_p (struct loop *loop, gimple *stmt,
+		       const_basic_block skip_head)
+{
+  split_info *info = (split_info *) loop->aux;
+  tree rhs = NULL_TREE;
+  ao_ref ref;
+  gimple *store;
+  unsigned i;
+
+  /* Collect memory store/clobber statements if haven't done that.  */
+  if (info->need_init)
+    find_vdef_in_loop (loop);
+
+  if (is_gimple_assign (stmt))
+    rhs = gimple_assign_rhs1 (stmt);
+
+  ao_ref_init (&ref, rhs);
+
+  FOR_EACH_VEC_ELT (info->memory_stores, i, store)
+    {
+      /* Skip basic blocks dominated by SKIP_HEAD, if non-NULL.  */
+      if (skip_head
+	  && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head))
+	continue;
+
+      if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref))
+	return false;
+    }
+
+  return true;
+}
+
+/* Suppose one condition branch, led by SKIP_HEAD, is no longer executed from
+   a certain iteration of LOOP onwards, check whether an SSA name (NAME)
+   remains unchanged in the next iteration.  We call this property
+   semi-invariance.  SKIP_HEAD might be NULL, in which case nothing is
+   excluded and all basic blocks and control flows in the loop are considered.
+   The semi-invariant state of each checked statement is cached in hash map
+   STMT_STAT to avoid redundant computation on a possible later re-check.  */
+
+static inline bool
+ssa_semi_invariant_p (struct loop *loop, tree name,
+		      const_basic_block skip_head,
+		      hash_map<gimple *, bool> &stmt_stat)
+{
+  gimple *def = SSA_NAME_DEF_STMT (name);
+  const_basic_block def_bb = gimple_bb (def);
+
+  /* An SSA name defined outside loop is definitely semi-invariant.  */
+  if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb))
+    return true;
+
+  return stmt_semi_invariant_p_1 (loop, def, skip_head, stmt_stat);
+}
+
+/* Check whether a loop iteration PHI node (LOOP_PHI) defines a value that is
+   semi-invariant in LOOP.  Basic blocks dominated by SKIP_HEAD, if non-NULL,
+   are excluded from LOOP.  */
+
+static bool
+loop_iter_phi_semi_invariant_p (struct loop *loop, gphi *loop_phi,
+				const_basic_block skip_head)
+{
+  const_edge latch = loop_latch_edge (loop);
+  tree name = gimple_phi_result (loop_phi);
+  tree from = PHI_ARG_DEF_FROM_EDGE (loop_phi, latch);
+
+  gcc_checking_assert (from);
+
+  /* A loop iteration PHI node is located in the loop header and has two
+     source operands: one is an initial value coming from outside the loop,
+     the other is a value through the latch of the loop, derived in the last
+     iteration; we call the latter the latch value.  If, from the PHI node to
+     the definition of the latch value and excluding the branch trace starting
+     from SKIP_HEAD, there is no value redefinition other than copy assignment
+     or the like, the SSA name defined by the PHI node is semi-invariant.
+
+                         loop entry
+                              |     .--- latch ---.
+                              |     |             |
+                              v     v             |
+                  x_1 = PHI <x_0,  x_3>           |
+                           |                      |
+                           v                      |
+              .------- if (cond) -------.         |
+              |                         |         |
+              |                     [ SKIP ]      |
+              |                         |         |
+              |                     x_2 = ...     |
+              |                         |         |
+              '---- T ---->.<---- F ----'         |
+                           |                      |
+                           v                      |
+                  x_3 = PHI <x_1, x_2>            |
+                           |                      |
+                           '----------------------'
+
+     Suppose that in a certain iteration, execution flow in the above graph
+     goes through the true branch, so that the source value defining x_3 in
+     the false branch (x_2) is skipped.  Then x_3 only comes from x_1, and x_1
+     in the next iterations is defined by x_3, so x_1 will never change if
+     COND always chooses the true branch from then on.  */
+
+  while (from != name)
+    {
+      /* A new value comes from a CONSTANT.  */
+      if (TREE_CODE (from) != SSA_NAME)
+	return false;
+
+      gimple *stmt = SSA_NAME_DEF_STMT (from);
+      const_basic_block bb = gimple_bb (stmt);
+
+      /* A new value comes from outside the loop.  */
+      if (!bb || !flow_bb_inside_loop_p (loop, bb))
+	return false;
+
+      from = NULL_TREE;
+
+      if (gimple_code (stmt) == GIMPLE_PHI)
+	{
+	  gphi *phi = as_a <gphi *> (stmt);
+
+	  for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+	    {
+	      if (skip_head)
+		{
+		  const_edge e = gimple_phi_arg_edge (phi, i);
+
+		  /* Don't consider redefinitions in excluded basic blocks.  */
+		  if (dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
+		    continue;
+		}
+
+	      tree arg = gimple_phi_arg_def (phi, i);
+
+	      if (!from)
+		from = arg;
+	      else if (!operand_equal_p (from, arg, 0))
+		/* More than one source operand provides a different value
+		   to the SSA name, so it is variant.  */
+		return false;
+	    }
+	}
+      else if (gimple_code (stmt) == GIMPLE_ASSIGN)
+	{
+	  /* For simple value copy, check its rhs instead.  */
+	  if (gimple_assign_ssa_name_copy_p (stmt))
+	    from = gimple_assign_rhs1 (stmt);
+	}
+
+      /* Any other kind of definition is deemed to introduce a new value
+	 to the SSA name.  */
+      if (!from)
+	return false;
+    }
+  return true;
+}
+
+/* Check whether conditional predicates that BB is control-dependent on, are
+   semi-invariant in LOOP.  Basic blocks dominated by SKIP_HEAD, if non-NULL,
+   are excluded from LOOP.  Semi-invariant state of checked statement is cached
+   in hash map STMT_STAT.  */
+
+static bool
+control_dep_semi_invariant_p (struct loop *loop, basic_block bb,
+			      const_basic_block skip_head,
+			      hash_map<gimple *, bool> &stmt_stat)
+{
+  hash_set<basic_block> *dep_bbs = find_control_dep_blocks (loop, bb);
+
+  if (!dep_bbs)
+    return true;
+
+  for (hash_set<basic_block>::iterator iter = dep_bbs->begin ();
+       iter != dep_bbs->end (); ++iter)
+    {
+      gimple *last = last_stmt (*iter);
+
+      if (!last)
+	return false;
+
+      /* Only check condition predicates.  */
+      if (gimple_code (last) != GIMPLE_COND
+	  && gimple_code (last) != GIMPLE_SWITCH)
+	return false;
+
+      if (!stmt_semi_invariant_p_1 (loop, last, skip_head, stmt_stat))
+	return false;
+    }
+
+  return true;
+}
+
+/* Check whether STMT is semi-invariant in LOOP, iff all its operands are
+   semi-invariant, consequently, all its defined values are semi-invariant.
+   Basic blocks dominated by SKIP_HEAD, if non-NULL, are excluded from LOOP.
+   Semi-invariant state of checked statement is cached in hash map
+   STMT_STAT.  */
+
+static bool
+stmt_semi_invariant_p_1 (struct loop *loop, gimple *stmt,
+			 const_basic_block skip_head,
+			 hash_map<gimple *, bool> &stmt_stat)
+{
+  bool existed;
+  bool &invar = stmt_stat.get_or_insert (stmt, &existed);
+
+  if (existed)
+    return invar;
+
+  /* A statement might depend on itself, which is treated as variant.  So set
+     state of statement under check to be variant to ensure that.  */
+  invar = false;
+
+  if (gimple_code (stmt) == GIMPLE_PHI)
+    {
+      gphi *phi = as_a <gphi *> (stmt);
+
+      if (gimple_bb (stmt) == loop->header)
+	{
+	  invar = loop_iter_phi_semi_invariant_p (loop, phi, skip_head);
+	  return invar;
+	}
+
+      /* A loop PHI node that is not located in the loop header is semi-
+	 invariant only if two conditions are met.  The first is that its
+	 source values are derived from CONSTANT (including loop-invariant
+	 value), or from SSA name defined by semi-invariant loop iteration PHI
+	 node.  The second is that its source incoming edges are control-
+	 dependent on semi-invariant conditional predicates.  */
+      for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+	{
+	  const_edge e = gimple_phi_arg_edge (phi, i);
+	  tree arg = gimple_phi_arg_def (phi, i);
+
+	  if (TREE_CODE (arg) == SSA_NAME)
+	    {
+	      if (!ssa_semi_invariant_p (loop, arg, skip_head, stmt_stat))
+		return false;
+
+	      /* If source value is defined in location from where the source
+		 edge comes in, no need to check control dependency again
+		 since this has been done in above SSA name check stage.  */
+	      if (e->src == gimple_bb (SSA_NAME_DEF_STMT (arg)))
+		continue;
+	    }
+
+	  if (!control_dep_semi_invariant_p (loop, e->src, skip_head,
+					     stmt_stat))
+	    return false;
+	}
+    }
+  else
+    {
+      ssa_op_iter iter;
+      tree use;
+
+      /* Volatile memory load or return of normal (non-const/non-pure) call
+	 should not be treated as constant in each iteration of loop.  */
+      if (gimple_has_side_effects (stmt))
+	return false;
+
+      /* Check if any memory store may kill memory load at this place.  */
+      if (gimple_vuse (stmt) && !vuse_semi_invariant_p (loop, stmt, skip_head))
+	return false;
+
+      /* Although operand of a statement might be SSA name, CONSTANT or
+	 VARDECL, here we only need to check SSA name operands.  This is
+	 because check on VARDECL operands, which involve memory loads,
+	 must have been done prior to invocation of this function in
+	 vuse_semi_invariant_p.  */
+      FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
+	if (!ssa_semi_invariant_p (loop, use, skip_head, stmt_stat))
+	  return false;
+    }
+
+  if (!control_dep_semi_invariant_p (loop, gimple_bb (stmt), skip_head,
+				     stmt_stat))
+    return false;
+
+  /* Here we SHOULD NOT use invar = true, since hash map might be changed due
+     to new insertion, and invar may point to invalid memory.  */
+  stmt_stat.put (stmt, true);
+  return true;
+}
+
+/* A wrapper function to check whether STMT is semi-invariant in LOOP.  Basic
+   blocks dominated by SKIP_HEAD, if non-NULL, are excluded from LOOP.  */
+
+static bool
+stmt_semi_invariant_p (struct loop *loop, gimple *stmt,
+		       const_basic_block skip_head)
+{
+  hash_map<gimple *, bool> stmt_stat;
+  return stmt_semi_invariant_p_1 (loop, stmt, skip_head, stmt_stat);
+}
+
+/* Determine, when conditional statement never transfers execution to one of
+   its branches, whether we can remove the branch's leading basic block
+   (BRANCH_BB) and those basic blocks dominated by BRANCH_BB.  */
+
+static bool
+branch_removable_p (basic_block branch_bb)
+{
+  edge_iterator ei;
+  edge e;
+
+  if (single_pred_p (branch_bb))
+    return true;
+
+  FOR_EACH_EDGE (e, ei, branch_bb->preds)
+    {
+      if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
+	continue;
+
+      if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
+	continue;
+
+      /* The branch can be reached from opposite branch, or from some
+	 statement not dominated by the conditional statement.  */
+      return false;
+    }
+
+  return true;
+}
+
+/* Find out which branch of a conditional statement (COND) is invariant in the
+   execution context of LOOP.  That is: once the branch is selected in certain
+   iteration of the loop, any operand that contributes to computation of the
+   conditional statement remains unchanged in all following iterations.  */
+
+static edge
+get_cond_invariant_branch (struct loop *loop, gcond *cond)
+{
+  basic_block cond_bb = gimple_bb (cond);
+  basic_block targ_bb[2];
+  bool invar[2];
+  unsigned invar_checks = 0;
+
+  for (unsigned i = 0; i < 2; i++)
+    {
+      targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest;
+
+      /* One branch directs to loop exit, no need to perform loop split upon
+	 this conditional statement.  Firstly, it is trivial if the exit branch
+	 is semi-invariant, for the statement is just to break loop.  Secondly,
+	 if the opposite branch is semi-invariant, it means that the statement
+	 is truly loop-invariant, which is covered by loop unswitch.  */
+      if (!flow_bb_inside_loop_p (loop, targ_bb[i]))
+	return NULL;
+    }
+
+  for (unsigned i = 0; i < 2; i++)
+    {
+      invar[!i] = false;
+
+      if (!branch_removable_p (targ_bb[i]))
+	continue;
+
+      /* Given a semi-invariant branch, if its opposite branch dominates
+	 loop latch, it and its following trace will only be executed in
+	 final iteration of loop, namely it is not part of repeated body
+	 of the loop.  Similar to the above case that the branch is loop
+	 exit, no need to split loop.  */
+      if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i]))
+	continue;
+
+      invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]);
+      invar_checks++;
+    }
+
+  /* Both branches being invariant (handled by loop unswitch) or both
+     being variant is not what we want.  */
+  if (invar[0] ^ !invar[1])
+    return NULL;
+
+  /* Found a real loop-invariant condition, do nothing.  */
+  if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL))
+    return NULL;
+
+  return EDGE_SUCC (cond_bb, invar[0] ? 0 : 1);
+}
+
+/* Calculate increased code size measured by estimated insn number if applying
+   loop split upon certain branch (BRANCH_EDGE) of a conditional statement.  */
+
+static int
+compute_added_num_insns (struct loop *loop, const_edge branch_edge)
+{
+  basic_block cond_bb = branch_edge->src;
+  unsigned branch = EDGE_SUCC (cond_bb, 1) == branch_edge;
+  basic_block opposite_bb = EDGE_SUCC (cond_bb, !branch)->dest;
+  basic_block *bbs = ((split_info *) loop->aux)->bbs;
+  int num = 0;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    {
+      /* Do not count basic blocks only in opposite branch.  */
+      if (dominated_by_p (CDI_DOMINATORS, bbs[i], opposite_bb))
+	continue;
+
+      num += estimate_num_insns_seq (bb_seq (bbs[i]), &eni_size_weights);
+    }
+
+  /* It is unnecessary to evaluate expression of the conditional statement
+     in new loop that contains only invariant branch.  This expression should
+     be constant value (either true or false).  Exclude code size of insns
+     that contribute to computation of the expression.  */
+
+  auto_vec<gimple *> worklist;
+  hash_set<gimple *> removed;
+  gimple *stmt = last_stmt (cond_bb);
+
+  worklist.safe_push (stmt);
+  removed.add (stmt);
+  num -= estimate_num_insns (stmt, &eni_size_weights);
+
+  do
+    {
+      ssa_op_iter opnd_iter;
+      use_operand_p opnd_p;
+
+      stmt = worklist.pop ();
+      FOR_EACH_PHI_OR_STMT_USE (opnd_p, stmt, opnd_iter, SSA_OP_USE)
+	{
+	  tree opnd = USE_FROM_PTR (opnd_p);
+
+	  if (TREE_CODE (opnd) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (opnd))
+	    continue;
+
+	  gimple *opnd_stmt = SSA_NAME_DEF_STMT (opnd);
+	  use_operand_p use_p;
+	  imm_use_iterator use_iter;
+
+	  if (removed.contains (opnd_stmt)
+	      || !flow_bb_inside_loop_p (loop, gimple_bb (opnd_stmt)))
+	    continue;
+
+	  FOR_EACH_IMM_USE_FAST (use_p, use_iter, opnd)
+	    {
+	      gimple *use_stmt = USE_STMT (use_p);
+
+	      if (!is_gimple_debug (use_stmt) && !removed.contains (use_stmt))
+		{
+		  opnd_stmt = NULL;
+		  break;
+		}
+	    }
+
+	  if (opnd_stmt)
+	    {
+	      worklist.safe_push (opnd_stmt);
+	      removed.add (opnd_stmt);
+	      num -= estimate_num_insns (opnd_stmt, &eni_size_weights);
+	    }
+	}
+    } while (!worklist.is_empty ());
+
+  gcc_assert (num >= 0);
+  return num;
+}
+
+/* Find out the loop-invariant branch of a conditional statement (COND) if it
+   has one, and check whether it is eligible and profitable to perform loop
+   split upon this branch in LOOP.  */
+
+static edge
+get_cond_branch_to_split_loop (struct loop *loop, gcond *cond)
+{
+  edge invar_branch = get_cond_invariant_branch (loop, cond);
+  if (!invar_branch)
+    return NULL;
+
+  /* When accurate profile information is available, and execution
+     frequency of the branch is too low, just let it go.  */
+  profile_probability prob = invar_branch->probability;
+  if (prob.reliable_p ())
+    {
+      int thres = PARAM_VALUE (PARAM_MIN_LOOP_COND_SPLIT_PROB);
+
+      if (prob < profile_probability::always ().apply_scale (thres, 100))
+	return NULL;
+    }
+
+  /* Add a threshold for increased code size to disable loop split.  */
+  if (compute_added_num_insns (loop, invar_branch)
+      > PARAM_VALUE (PARAM_MAX_PEELED_INSNS))
+    return NULL;
+
+  return invar_branch;
+}
+
+/* Given a loop (LOOP1) with a loop-invariant branch (INVAR_BRANCH) of some
+   conditional statement, perform loop split transformation illustrated
+   as the following graph.
+
+               .-------T------ if (true) ------F------.
+               |                    .---------------. |
+               |                    |               | |
+               v                    |               v v
+          pre-header                |            pre-header
+               | .------------.     |                 | .------------.
+               | |            |     |                 | |            |
+               | v            |     |                 | v            |
+             header           |     |               header           |
+               |              |     |                 |              |
+      .--- if (cond) ---.     |     |        .--- if (true) ---.     |
+      |                 |     |     |        |                 |     |
+  invariant             |     |     |    invariant             |     |
+      |                 |     |     |        |                 |     |
+      '---T--->.<---F---'     |     |        '---T--->.<---F---'     |
+               |              |    /                  |              |
+             stmts            |   /                 stmts            |
+               |              F  T                    |              |
+              / \             | /                    / \             |
+     .-------*   *      [ if (cond) ]       .-------*   *            |
+     |           |            |             |           |            |
+     |         latch          |             |         latch          |
+     |           |            |             |           |            |
+     |           '------------'             |           '------------'
+     '------------------------. .-----------'
+             loop1            | |                   loop2
+                              v v
+                             exits
+
+   In the graph, loop1 represents the part derived from the original loop, and
+   loop2 is duplicated using loop_version (), which corresponds to the part
+   of the original loop being split out.  In the original latch edge of loop1,
+   we insert a new conditional statement duplicated from the semi-invariant
+   cond; one of its branches goes back to the loop1 header as a latch edge, and
+   the other branch goes to the loop2 pre-header as an entry edge.  Also in
+   loop2, we abandon the variant branch of the conditional statement by setting
+   a constant bool condition, based on which branch is semi-invariant.  */
+
+static bool
+do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
+{
+  basic_block cond_bb = invar_branch->src;
+  bool true_invar = !!(invar_branch->flags & EDGE_TRUE_VALUE);
+  gcond *cond = as_a <gcond *> (last_stmt (cond_bb));
+
+  gcc_assert (cond_bb->loop_father == loop1);
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, cond,
+		     "loop split on semi-invariant condition at %s branch\n",
+		     true_invar ? "true" : "false");
+
+  initialize_original_copy_tables ();
+
+  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
+				     profile_probability::always (),
+				     profile_probability::never (),
+				     profile_probability::always (),
+				     profile_probability::always (),
+				     true);
+  if (!loop2)
+    {
+      free_original_copy_tables ();
+      return false;
+    }
+
+  basic_block cond_bb_copy = get_bb_copy (cond_bb);
+  gcond *cond_copy = as_a<gcond *> (last_stmt (cond_bb_copy));
+
+  /* Replace the condition in loop2 with a bool constant to let PassManager
+     remove the variant branch after current pass completes.  */
+  if (true_invar)
+    gimple_cond_make_true (cond_copy);
+  else
+    gimple_cond_make_false (cond_copy);
+
+  update_stmt (cond_copy);
+
+  /* Insert a new conditional statement on latch edge of loop1, its condition
+     is duplicated from the semi-invariant.  This statement acts as a switch
+     to transfer execution from loop1 to loop2, when loop1 enters into
+     invariant state.  */
+  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
+  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
+  gimple *break_cond = gimple_build_cond (gimple_cond_code (cond),
+					  gimple_cond_lhs (cond),
+					  gimple_cond_rhs (cond),
+					  NULL_TREE, NULL_TREE);
+
+  gimple_stmt_iterator gsi = gsi_last_bb (break_bb);
+  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
+
+  edge to_loop1 = single_succ_edge (break_bb);
+  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
+
+  to_loop1->flags &= ~EDGE_FALLTHRU;
+  to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
+  to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
+
+  update_ssa (TODO_update_ssa);
+
+  /* Due to introduction of a control flow edge from loop1 latch to loop2
+     pre-header, we should update PHIs in loop2 to reflect this connection
+     between loop1 and loop2.  */
+  connect_loop_phis (loop1, loop2, to_loop2);
+
+  free_original_copy_tables ();
+
+  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
+
+  return true;
+}
+
+/* Traverse all conditional statements in LOOP, to find out a good candidate
+   upon which we can do loop split.  */
+
+static bool
+split_loop_on_cond (struct loop *loop)
+{
+  split_info *info = new split_info ();
+  basic_block *bbs = info->bbs = get_loop_body (loop);
+  bool do_split = false;
+
+  /* Allocate an area to keep temporary info, and associate its address
+     with loop aux field.  */
+  loop->aux = info;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    bbs[i]->aux = NULL;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    {
+      basic_block bb = bbs[i];
+
+      /* We only consider conditional statements that are executed at most once
+	 in each iteration of the loop.  So skip statements in inner loops.  */
+      if ((bb->loop_father != loop) || (bb->flags & BB_IRREDUCIBLE_LOOP))
+	continue;
+
+      /* Actually this check is not a must constraint.  With it, we can ensure
+	 conditional statement will always be executed in each iteration.  */
+      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+	continue;
+
+      gimple *last = last_stmt (bb);
+
+      if (!last || gimple_code (last) != GIMPLE_COND)
+	continue;
+
+      gcond *cond = as_a <gcond *> (last);
+      edge branch_edge = get_cond_branch_to_split_loop (loop, cond);
+
+      if (branch_edge)
+	{
+	  do_split_loop_on_cond (loop, branch_edge);
+	  do_split = true;
+	  break;
+	}
+    }
+
+  delete info;
+  loop->aux = NULL;
+
+  return do_split;
+}
+
 /* Main entry point.  Perform loop splitting on all suitable loops.  */
 
 static unsigned int
@@ -621,13 +1597,15 @@ tree_ssa_split_loops (void)
   bool changed = false;
 
   gcc_assert (scev_initialized_p ());
+
+  calculate_dominance_info (CDI_POST_DOMINATORS);
+
   FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
     loop->aux = NULL;
 
   /* Go through all loops starting from innermost.  */
   FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
     {
-      class tree_niter_desc niter;
       if (loop->aux)
 	{
 	  /* If any of our inner loops was split, don't split us,
@@ -636,35 +1614,24 @@ tree_ssa_split_loops (void)
 	  continue;
 	}
 
-      if (single_exit (loop)
-	  /* ??? We could handle non-empty latches when we split
-	     the latch edge (not the exit edge), and put the new
-	     exit condition in the new block.  OTOH this executes some
-	     code unconditionally that might have been skipped by the
-	     original exit before.  */
-	  && empty_block_p (loop->latch)
-	  && !optimize_loop_for_size_p (loop)
-	  && easy_exit_values (loop)
-	  && number_of_iterations_exit (loop, single_exit (loop), &niter,
-					false, true)
-	  && niter.cmp != ERROR_MARK
-	  /* We can't yet handle loops controlled by a != predicate.  */
-	  && niter.cmp != NE_EXPR
-	  && can_duplicate_loop_p (loop))
+      if (optimize_loop_for_size_p (loop))
+	continue;
+
+      if (split_loop (loop) || split_loop_on_cond (loop))
 	{
-	  if (split_loop (loop, &niter))
-	    {
-	      /* Mark our containing loop as having had some split inner
-	         loops.  */
-	      loop_outer (loop)->aux = loop;
-	      changed = true;
-	    }
+	  /* Mark our containing loop as having had some split inner loops.  */
+	  loop_outer (loop)->aux = loop;
+	  changed = true;
 	}
     }
 
   FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
     loop->aux = NULL;
 
+  clear_aux_for_blocks ();
+
+  free_dominance_info (CDI_POST_DOMINATORS);
+
   if (changed)
     return TODO_cleanup_cfg;
   return 0;
-- 
2.17.1
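
For readers following the thread, the effect of the transformation performed by do_split_loop_on_cond can be sketched at source level. The following is only an illustration under stated assumptions, not code from the patch: keep_going, original_loop and split_loop are hypothetical names, and the extra "acc += 1000" exists solely to make the work done in the variant branch observable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for do_something (): the flag stays set only while
// the inspected value is below 3.
static bool keep_going (int v) { return v < 3; }

// Original form.  "if (b)" is semi-invariant: only its true branch writes b,
// so once the false branch is taken, b never changes again.
static int original_loop (const std::vector<int> &data)
{
  bool b = true;
  int acc = 0;
  for (std::size_t i = 0; i < data.size (); ++i)
    {
      if (b)
        {
          b = keep_going (data[i]);   // has an effect on b
          acc += 1000;                // marks work done in the variant branch
        }
      acc += data[i];                 // "statements": no effect on b
    }
  return acc;
}

// Split form.  loop1 keeps the condition and re-tests it before the latch;
// loop2 is the copy in which the condition is known to be false.
static int split_loop (const std::vector<int> &data)
{
  bool b = true;
  int acc = 0;
  std::size_t i = 0;
  for (; i < data.size (); ++i)
    {
      if (b)
        {
          b = keep_going (data[i]);
          acc += 1000;
        }
      acc += data[i];
      if (!b)                         // duplicated condition guarding the latch
        {
          ++i;
          break;                      // enter loop2
        }
    }
  for (; i < data.size (); ++i)       // loop2: condition folded to false
    acc += data[i];
  return acc;
}
```

Both functions are meant to compute the same value for any input; the second loop of split_loop is branch-free, which is what makes it a candidate for later vectorization or removal.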



* Re: [PATCH V4] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-10-31 15:56                                   ` [PATCH V4] " Feng Xue OS
@ 2019-11-05 14:04                                     ` Richard Biener
  2019-11-06  7:13                                       ` Feng Xue OS
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Biener @ 2019-11-05 14:04 UTC (permalink / raw)
  To: Feng Xue OS
  Cc: Michael Matz, Philipp Tomsich, gcc-patches,
	Christoph Müllner, erick.ochoa

On Thu, Oct 31, 2019 at 3:38 PM Feng Xue OS <fxue@os.amperecomputing.com> wrote:
>
> Hi, Richard
>
>    This is a new patch to support more generalized semi-invariant condition, which uses
> control dependence analysis.

Uh.  Note it's not exactly helpful to change algorithms between reviews,
that makes it just harder :/

Btw, I notice you use post-dominance info.  Note that we generally do not
keep that up-to-date with CFG manipulations (and for dominators fast queries
are disabled).  Probably the way we walk & transform loops makes this safe
but it's something to remember when extending that.  Possibly doing analysis
of all candidates first and then applying the transform for all wanted cases
would avoid this (and maybe also can reduce the number of update_ssa calls).
I guess this can be done as followup.

The patch is OK.

Thanks,
Richard.



> Thanks,
> Feng
>
> ________________________________________
> From: Feng Xue OS <fxue@os.amperecomputing.com>
> Sent: Friday, October 25, 2019 11:43 AM
> To: Richard Biener
> Cc: Michael Matz; Philipp Tomsich; gcc-patches@gcc.gnu.org; Christoph Müllner; erick.ochoa@theobroma-systems.com
> Subject: Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)
>
> Richard,
>
>     Thanks for your comments.
>
> >+      /* For PHI node that is not in loop header, its source operands should
> >+        be defined inside the loop, which are seen as loop variant.  */
> >+      if (def_bb != loop->header || !skip_head)
> >+       return false;
>
> > so if we have
> >
> > for (;;)
> >  {
> >     if (x)
> >       a = ..;
> >     else
> >       a = ...;
> >     if (cond-to-split-on dependent on a)
> > ...
> >  }
> >
> > the above is too restrictive in case 'x' is semi-invariant as well, correct?
> In the above case, cond-on-a will not be identified as semi-invariant, in that
> a is defined by a PHI with genuinely multiple sources. To handle it, besides
> each source value, we should add an extra check on each source's control
> dependence node (x in this case), which might cause considerable code expansion.
> Anyway, I'll have a try.
>
>
> >+         /* A new value comes from outside of loop.  */
> >+         if (!bb || !flow_bb_inside_loop_p (loop, bb))
> >+           return false;
>
> > but that means starting from the second iteration the value is invariant.
> No. Traversal direction is reverse to loop execution. In the following,
> start from "x_1 = ", extract latch value x_3, and get x_3 definition, and
> finally reach "x_1 =".
>
> Loop:
>       x_1 = PHI (x_0, x_3)
>       ...
>       x_3 =
>       ...
>       goto Loop;
>
>
> >+                 /* Don't consider redefinitions in excluded basic blocks.  */
> >+                 if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
> >+                   {
> >+                     /* There are more than one source operands that can
> >+                        provide value to the SSA name, it is variant.  */
> >+                     if (from)
> >+                       return false;
> >
> > they might be the same though, for PHIs with > 2 arguments.
> OK. Will add value equivalence check.
>
>
> > In the cycle handling you are not recursing via stmt_semi_invariant_p
> > but only handle SSA name copies - any particular reason for that?
> The cycle handling is specific to an SSA name that crosses iterations. It is
> semi-invariant if it remains unchanged after a certain iteration, which
> means its value in the previous iteration (coming from the latch edge) is just
> a copy of itself, nothing else. So, recursion via stmt_semi_invariant_p
> is unnecessary.
>
> Loop:
>       x_1 = PHI (x_0, x_3);
>       x_2 = PHI(x_1, value defined in excluded branch);
>       x_3 = x_2;
>       goto Loop;
>
>
> >+static bool
> >+branch_removable_p (basic_block branch_bb)
> >+{
> >+  if (single_pred_p (branch_bb))
> >+    return true;
> >
> > I'm not sure what this function tests - at least the single_pred_p check
> > looks odd to me given the dominator checks later.  The single predecessor
> > could simply be a forwarder.  I wonder if you are looking for branches forming
> > an irreducible loop?  I think you can then check EDGE_IRREDUCIBLE_LOOP
> > or BB_IRREDUCIBLE_LOOP on the condition block (btw, I don't see
> > testcases covering the apparent special-cases in the patch - referring to
> > existing ones via a comment often helps understanding the code).
>
> This function tests whether a branch is reachable from anywhere other than
> its conditional statement.  This ensures that when the branch is not selected
> upon condition evaluation, the trace path led by the branch will never be
> executed, so that it can be excluded during semi-invariantness analysis.
>
> If single_pred_p, only condition statement can reach the branch.
>
> If not, consider a half diamond condition control graph, with a back-edge to
> true branch.
>
>             condition
>                |  \
>                |   \
>                |  false branch
>    .--->----.  |   /
>    |        |  |  /
>  other    true branch
>    |        |
>    '---<----'
>
> If there is an edge from the false branch, the true branch cannot be excluded
> even if it is not selected.  A back edge from "other" (dominated by the true
> branch) does not have any impact.
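
The predecessor test described above can be tried out on a toy control flow graph. This is a sketch under assumptions, not GCC code: Cfg, compute_dominators, toy_branch_removable_p and half_diamond are hypothetical names, dominators are computed with a plain iterative data-flow pass instead of GCC's dominance info, and every block is assumed to be reachable from block 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG: preds[b] lists the predecessors of block b; block 0 is the entry.
struct Cfg
{
  std::vector<std::vector<int>> preds;
};

// Returns dom, where dom[b][d] is true when block d dominates block b.
static std::vector<std::vector<bool>> compute_dominators (const Cfg &cfg)
{
  const std::size_t n = cfg.preds.size ();
  std::vector<std::vector<bool>> dom (n, std::vector<bool> (n, true));
  dom[0].assign (n, false);
  dom[0][0] = true;
  for (bool changed = true; changed;)
    {
      changed = false;
      for (std::size_t b = 1; b < n; ++b)
        {
          // Intersect the dominator sets of all predecessors, then add b.
          std::vector<bool> next (n, true);
          for (int p : cfg.preds[b])
            for (std::size_t d = 0; d < n; ++d)
              next[d] = next[d] && dom[p][d];
          next[b] = true;
          if (next != dom[b])
            {
              dom[b] = next;
              changed = true;
            }
        }
    }
  return dom;
}

// Mirrors the shape of branch_removable_p: every predecessor edge must come
// either from a block dominated by the branch or from a block dominating it.
static bool toy_branch_removable_p (const Cfg &cfg, int branch_bb)
{
  if (cfg.preds[branch_bb].size () == 1)
    return true;
  const std::vector<std::vector<bool>> dom = compute_dominators (cfg);
  for (int src : cfg.preds[branch_bb])
    {
      if (dom[src][branch_bb])   // edge from inside the branch (back edge)
        continue;
      if (dom[branch_bb][src])   // edge from a dominator, e.g. the condition
        continue;
      return false;              // reachable from somewhere else
    }
  return true;
}

// Half diamond from the mail: 0 = condition, 1 = false branch,
// 2 = true branch, 3 = "other" (dominated by the true branch).
static Cfg half_diamond (bool false_reaches_true)
{
  Cfg cfg;
  cfg.preds.resize (4);
  cfg.preds[1].push_back (0);
  cfg.preds[2].push_back (0);
  cfg.preds[2].push_back (3);
  cfg.preds[3].push_back (2);
  if (false_reaches_true)
    cfg.preds[2].push_back (1);
  return cfg;
}
```

With the edge from the false branch present, toy_branch_removable_p (half_diamond (true), 2) rejects the true branch; without it, the back edge from "other" alone does not prevent removal.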
>
>
> >+
> >+  return EDGE_SUCC (cond_bb, (unsigned) invar[1]);
> >+}
> >
> > magic ensures that invar[1] is always the invariant edge?  Oh, it's a bool.
> > Ick.  I wonder if logic with int invariant_edge = -1; and the loop setting
> > it to either 0 or 1 would be easier to follow...
> OK.
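
One possible reading of the "int invariant_edge = -1" suggestion is sketched below. It is an illustration only: pick_invariant_edge and its two parameters are hypothetical, and the real function would obtain the two flags from stmt_semi_invariant_p rather than take them as arguments.

```cpp
#include <cassert>

// Returns the index (0 or 1) of the single semi-invariant successor edge,
// or -1 when neither edge or both edges qualify.
static int pick_invariant_edge (bool edge0_invariant, bool edge1_invariant)
{
  const bool invariant[2] = { edge0_invariant, edge1_invariant };
  int invariant_edge = -1;

  for (int i = 0; i < 2; ++i)
    {
      if (!invariant[i])
        continue;

      // A second invariant edge means the condition is fully loop-invariant,
      // which is a case for loop unswitching rather than loop splitting.
      if (invariant_edge != -1)
        return -1;

      invariant_edge = i;
    }

  return invariant_edge;
}
```

Compared with indexing by a bool, the sentinel value makes the "no candidate" outcome explicit at the call site.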
>
>
> > Note your stmt_semi_invariant_p check is exponential for a condition
> > like
> >
> >   _1 = 1;
> >   _2 = _1 + _1;
> >   _3 = _2 + _2;
> >   if (_3 != param_4(D))
> >
> > because you don't track ops you already proved semi-invariant.  We've
> > run into such situation repeatedly in SCEV analysis so I doubt it can be
> > disregarded as irrelevant in practice.  A worklist approach could then
> > also get rid of the recursion.  You are already computing the stmts
> > forming the condition in compute_added_num_insns so another option
> > is to re-use that.
> OK.
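
The blow-up mentioned above, and the effect of tracking already-visited operands, can be reproduced on a toy definition chain. This is a sketch with hypothetical names (Def, visit_naive, visit_worklist, doubling_chain); it only counts visits and does not model the actual semi-invariance test.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Toy SSA definition: operands are indices of earlier definitions.
struct Def
{
  std::vector<int> operands;
};

// Naive recursion: every use of an operand is walked again.
static long visit_naive (const std::vector<Def> &defs, int name)
{
  long visits = 1;
  for (int op : defs[name].operands)
    visits += visit_naive (defs, op);
  return visits;
}

// Worklist with a visited set: each definition is processed once.
static long visit_worklist (const std::vector<Def> &defs, int root)
{
  std::unordered_set<int> seen;
  std::vector<int> worklist;
  seen.insert (root);
  worklist.push_back (root);
  long visits = 0;
  while (!worklist.empty ())
    {
      int name = worklist.back ();
      worklist.pop_back ();
      ++visits;
      for (int op : defs[name].operands)
        if (seen.insert (op).second)
          worklist.push_back (op);
    }
  return visits;
}

// Builds _0 = 1; _k = _(k-1) + _(k-1) for k = 1 .. length - 1.
static std::vector<Def> doubling_chain (int length)
{
  std::vector<Def> defs (length);
  for (int k = 1; k < length; ++k)
    {
      defs[k].operands.push_back (k - 1);
      defs[k].operands.push_back (k - 1);
    }
  return defs;
}
```

For a chain of 11 definitions, the naive walk starting at the last one makes 2047 visits while the worklist makes 11, which matches the exponential versus linear behaviour described in the review comment.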
>
>
> > Btw, I wonder if we can simply re-use PARAM_MAX_PEELED_INSNS
> > instead of adding yet another param (it also happens to have the same
> > size).  Because we are "peeling" the loop.
> I'll check that.
>
> >+  edge invar_branch = get_cond_invariant_branch (loop, cond);
> >+
> >+  if (!invar_branch)
> >+    return NULL;
> >
> > extra vertical space is unwanted in such cases.
> OK.
>
> >+  if (dump_file && (dump_flags & TDF_DETAILS))
> >+   {
> >+     fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
> >+             current_function_name (), loop1->num,
> >+             true_invar ? "T" : "F", cond_bb->index);
> >+     print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
> >+   }
> >
> > can you please use sth like
> >
> >  if (dump_enabled_p ())
> >    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
> >                             cond, "loop split on semi-invariant condition");
> >
> > so -fopt-info-loop will show it?
> OK.
>
>
> >+  /* Generate a bool type temporary to hold result of the condition.  */
> >+  tree tmp = make_ssa_name (boolean_type_node);
> >+  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> >+  gimple *stmt = gimple_build_assign (tmp,
> >+                                     gimple_cond_code (cond),
> >+                                     gimple_cond_lhs (cond),
> >+                                     gimple_cond_rhs (cond));
> >
> > shorter is
> >
> >   gimple_seq stmts = NULL;
> >   tree tmp = gimple_build (&stmts, gimple_cond_code (cond),
> >                                      boolean_type_node,
> >  gimple_cond_lhs (cond), gimple_cond_rhs (cond));
> >   gsi_insert_seq_before (&gsi, stmts, GSI_NEW_STMT);
> OK.
>
>
> >+  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
> >+  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);
> >
> > but I wonder what's the point here to move the condition computation to
> > a temporary?  Why not just build the original condition again for break_cond?
> OK.
>
>
> > in split_loop_on_cond you'll find the first semi-invariant condition to
> > split on, but we'll not visit the split loop again (also for original
> > splitting I guess).  Don't we eventually want to recurse on that?
> Currently, we only do one round of loop splitting. Enabling more than one
> split on a loop is a TODO.


* Re: [PATCH V4] Loop split upon semi-invariant condition (PR tree-optimization/89134)
  2019-11-05 14:04                                     ` Richard Biener
@ 2019-11-06  7:13                                       ` Feng Xue OS
  0 siblings, 0 replies; 31+ messages in thread
From: Feng Xue OS @ 2019-11-06  7:13 UTC (permalink / raw)
  To: Richard Biener
  Cc: Michael Matz, Philipp Tomsich, gcc-patches,
	Christoph Müllner, erick.ochoa

> Uh.  Note it's not exactly helpful to change algorithms between reviews,
> that makes it just harder :/
>
> Btw, I notice you use post-dominance info.  Note that we generally do not
> keep that up-to-date with CFG manipulations (and for dominators fast queries
> are disabled).  Probably the way we walk & transform loops makes this safe
> but it's something to remember when extending that.  Possibly doing analysis
> of all candidates first and then applying the transform for all wanted cases
> would avoid this (and maybe also can reduce the number of update_ssa calls).
> I guess this can be done as followup.

Ok. Thanks for the suggestion.

Feng


end of thread, other threads:[~2019-11-06  7:13 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
2019-03-12  7:33 [PATCH] Loop split upon semi-invariant condition (PR tree-optimization/89134) Feng Xue OS
2019-03-12  8:33 ` Richard Biener
2019-03-13  2:13   ` Feng Xue OS
2019-03-13  9:43     ` Kyrill Tkachov
2019-03-13 12:11       ` Richard Biener
2019-03-13 12:39         ` Kyrill Tkachov
2019-03-14  3:31       ` Feng Xue OS
2019-05-06  3:04   ` Feng Xue OS
2019-05-06 10:17     ` Richard Biener
2019-06-18  7:00       ` Ping: [PATCH V2] " Feng Xue OS
2019-07-15  2:34         ` Ping agian: " Feng Xue OS
2019-07-29 20:30           ` Michael Matz
2019-07-31  7:25             ` Feng Xue OS
2019-09-12 10:21             ` Feng Xue OS
2019-09-12 10:23               ` [PATCH V3] " Feng Xue OS
2019-10-15 16:01                 ` Philipp Tomsich
2019-10-15 16:06                   ` Michael Matz
2019-10-22 10:16                     ` Feng Xue OS
2019-10-22 11:16                       ` Michael Matz
2019-10-23  5:49                         ` Feng Xue OS
2019-10-23  9:10                           ` Richard Biener
2019-10-23  9:37                             ` Feng Xue OS
2019-10-23 10:32                               ` Richard Biener
2019-10-25  5:20                                 ` Feng Xue OS
2019-10-31 15:56                                   ` [PATCH V4] " Feng Xue OS
2019-11-05 14:04                                     ` Richard Biener
2019-11-06  7:13                                       ` Feng Xue OS
2019-10-16  2:00                   ` [PATCH V3] " Feng Xue OS
2019-10-09  4:42               ` Ping: [PATCH V2] " Feng Xue OS
2019-09-12 11:10           ` Ping agian: " Richard Biener
2019-09-12 13:52             ` Feng Xue OS
