public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] loop split fix and functions renaming
@ 2021-10-27  6:34 Xionghu Luo
  2021-10-27  6:34 ` [PATCH v2 1/4] Fix loop split incorrect count and probability Xionghu Luo
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Xionghu Luo @ 2021-10-27  6:34 UTC (permalink / raw)
  To: gcc-patches
  Cc: segher, dje.gcc, wschmidt, guojiufu, linkw, rguenther, hubicka,
	Xionghu Luo

This patchset is an followed update from [1].
Patch 1 is expecting review comments from Honza[2];
Patch 2 refactors loop_version to remove loopify call and adjust
condition generation later than loopify;
Patch 3 and Patch 4 are function renamings to help better understanding.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582600.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582607.html

Xionghu Luo (4):
  Fix loop split incorrect count and probability
  Refactor loop_version
  Rename loop_version to clone_loop_to_header_edge.
  Rename duplicate_loop_to_header_edge to duplicate_loop_body_to_header_edge

 gcc/cfghooks.c                |  27 +++----
 gcc/cfghooks.h                |  13 ++-
 gcc/cfgloopmanip.c            | 144 +++++++++++-----------------------
 gcc/cfgloopmanip.h            |  17 ++--
 gcc/cfgrtl.c                  |   2 +-
 gcc/doc/loop.texi             |   4 +-
 gcc/gimple-loop-versioning.cc |  11 +--
 gcc/loop-unroll.c             |  27 +++----
 gcc/modulo-sched.c            |   6 +-
 gcc/tree-cfg.c                |   2 +-
 gcc/tree-if-conv.c            |  13 +--
 gcc/tree-loop-distribution.c  |   4 +-
 gcc/tree-parloops.c           |  10 +--
 gcc/tree-ssa-loop-ivcanon.c   |   4 +-
 gcc/tree-ssa-loop-manip.c     |  31 ++++----
 gcc/tree-ssa-loop-manip.h     |   7 +-
 gcc/tree-ssa-loop-split.c     |  39 +++++----
 gcc/tree-ssa-loop-unswitch.c  |   8 +-
 gcc/tree-vect-loop-manip.c    |   5 +-
 19 files changed, 159 insertions(+), 215 deletions(-)

-- 
2.27.0.90.geebb51ba8c


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/4] Fix loop split incorrect count and probability
  2021-10-27  6:34 [PATCH v2 0/4] loop split fix and functions renaming Xionghu Luo
@ 2021-10-27  6:34 ` Xionghu Luo
  2021-10-27  7:07   ` Jan Hubicka
  2021-10-27  6:34 ` [PATCH v2 2/4] Refactor loop_version Xionghu Luo
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Xionghu Luo @ 2021-10-27  6:34 UTC (permalink / raw)
  To: gcc-patches
  Cc: segher, dje.gcc, wschmidt, guojiufu, linkw, rguenther, hubicka,
	Xionghu Luo

loop split condition is moved between loop1 and loop2, the split bb's
count and probability should also be duplicated instead of (100% vs INV),
secondly, the original loop1 and loop2 count need be propotional from the
original loop.

Regression tested pass, OK for master?

diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
...
   int prephitmp_16;
   int prephitmp_25;

   <bb 2> [local count: 118111600]:
   if (n_7(D) > 0)
     goto <bb 4>; [89.00%]
   else
     goto <bb 3>; [11.00%]

   <bb 3> [local count: 118111600]:
   return;

   <bb 4> [local count: 105119324]:
   pretmp_3 = ga;

-  <bb 5> [local count: 955630225]:
+  <bb 5> [local count: 315357973]:
   # i_13 = PHI <i_10(20), 0(4)>
   # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
   if (prephitmp_12 != 0)
     goto <bb 6>; [33.00%]
   else
     goto <bb 7>; [67.00%]

-  <bb 6> [local count: 315357972]:
+  <bb 6> [local count: 104068130]:
   _2 = do_something ();
   ga = _2;

-  <bb 7> [local count: 955630225]:
+  <bb 7> [local count: 315357973]:
   # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
   i_10 = inc (i_13);
   if (n_7(D) > i_10)
     goto <bb 21>; [89.00%]
   else
     goto <bb 11>; [11.00%]

   <bb 11> [local count: 105119324]:
   goto <bb 3>; [100.00%]

-  <bb 21> [local count: 850510901]:
+  <bb 21> [local count: 280668596]:
   if (prephitmp_12 != 0)
-    goto <bb 20>; [100.00%]
+    goto <bb 20>; [33.00%]
   else
-    goto <bb 19>; [INV]
+    goto <bb 19>; [67.00%]

-  <bb 20> [local count: 850510901]:
+  <bb 20> [local count: 280668596]:
   goto <bb 5>; [100.00%]

-  <bb 19> [count: 0]:
+  <bb 19> [local count: 70429947]:
   # i_23 = PHI <i_10(21)>
   # prephitmp_25 = PHI <prephitmp_5(21)>

-  <bb 12> [local count: 955630225]:
+  <bb 12> [local count: 640272252]:
   # i_15 = PHI <i_23(19), i_22(16)>
   # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
   i_22 = inc (i_15);
   if (n_7(D) > i_22)
     goto <bb 16>; [89.00%]
   else
     goto <bb 11>; [11.00%]

-  <bb 16> [local count: 850510901]:
+  <bb 16> [local count: 569842305]:
   goto <bb 12>; [100.00%]

 }

gcc/ChangeLog:

	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
	(do_split_loop_on_cond): Likewise.
---
 gcc/tree-ssa-loop-split.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index 3f6ad046623..d30782888f3 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -575,7 +575,11 @@ split_loop (class loop *loop1)
 					    stmts2);
 	tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
 	if (!initial_true)
-	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
+	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+
+	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
+			   ? EDGE_SUCC (bbs[i], 0)
+			   : EDGE_SUCC (bbs[i], 1);
 
 	/* Now version the loop, placing loop2 after loop1 connecting
 	   them, and fix up SSA form for that.  */
@@ -583,10 +587,10 @@ split_loop (class loop *loop1)
 	basic_block cond_bb;
 
 	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
-					   profile_probability::always (),
-					   profile_probability::always (),
-					   profile_probability::always (),
-					   profile_probability::always (),
+					   true_edge->probability,
+					   true_edge->probability.invert (),
+					   true_edge->probability,
+					   true_edge->probability.invert (),
 					   true);
 	gcc_assert (loop2);
 
@@ -1486,10 +1490,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
   initialize_original_copy_tables ();
 
   struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
-				     profile_probability::always (),
-				     profile_probability::never (),
-				     profile_probability::always (),
-				     profile_probability::always (),
+				     invar_branch->probability.invert (),
+				     invar_branch->probability,
+				     invar_branch->probability.invert (),
+				     invar_branch->probability,
 				     true);
   if (!loop2)
     {
@@ -1530,6 +1534,9 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
   to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
   to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
 
+  to_loop1->probability = invar_branch->probability.invert ();
+  to_loop2->probability = invar_branch->probability;
+
   /* Due to introduction of a control flow edge from loop1 latch to loop2
      pre-header, we should update PHIs in loop2 to reflect this connection
      between loop1 and loop2.  */
-- 
2.27.0.90.geebb51ba8c


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 2/4] Refactor loop_version
  2021-10-27  6:34 [PATCH v2 0/4] loop split fix and functions renaming Xionghu Luo
  2021-10-27  6:34 ` [PATCH v2 1/4] Fix loop split incorrect count and probability Xionghu Luo
@ 2021-10-27  6:34 ` Xionghu Luo
  2021-10-29 11:52   ` Richard Biener
  2021-10-27  6:34 ` [PATCH v2 3/4] Rename loop_version to clone_loop_to_header_edge Xionghu Luo
  2021-10-27  6:34 ` [PATCH v2 4/4] Rename duplicate_loop_to_header_edge to duplicate_loop_body_to_header_edge Xionghu Luo
  3 siblings, 1 reply; 15+ messages in thread
From: Xionghu Luo @ 2021-10-27  6:34 UTC (permalink / raw)
  To: gcc-patches
  Cc: segher, dje.gcc, wschmidt, guojiufu, linkw, rguenther, hubicka,
	Xionghu Luo

loop_version currently does lv_adjust_loop_entry_edge
before it loopifys the copy inserted on the header.  This patch moves
the condition generation later and thus we have four pieces to help
understanding of how the adjustment works:
 1) duplicating the loop on the entry edge.
 2) loopify the duplicated new loop.
 3) adjusting the CFG to insert a condition branching to either loop
 with lv_adjust_loop_entry_edge.
 4) From loopify extract the scale_loop_frequencies bits.

Also removed some piece of code seems obviously useless which is not
completely sure:
 - redirect_all_edges since it is false and loopify only called once.
 - extract_cond_bb_edges and lv_flush_pending_stmts (false_edge) as the
 edge is not redirected actually.

gcc/ChangeLog:

	* cfgloopmanip.c (loop_version): Refactor loopify to
	loop_version.  Move condition generation after loopify.
	(loopify): Delete.
	* cfgloopmanip.h (loopify): Delete.
---
 gcc/cfgloopmanip.c | 113 ++++++++++++---------------------------------
 gcc/cfgloopmanip.h |   3 --
 2 files changed, 29 insertions(+), 87 deletions(-)

diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index 82c242dd720..a30ebe1cdb4 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -846,71 +846,6 @@ create_empty_loop_on_edge (edge entry_edge,
   return loop;
 }
 
-/* Make area between HEADER_EDGE and LATCH_EDGE a loop by connecting
-   latch to header and update loop tree and dominators
-   accordingly. Everything between them plus LATCH_EDGE destination must
-   be dominated by HEADER_EDGE destination, and back-reachable from
-   LATCH_EDGE source.  HEADER_EDGE is redirected to basic block SWITCH_BB,
-   FALSE_EDGE of SWITCH_BB to original destination of HEADER_EDGE and
-   TRUE_EDGE of SWITCH_BB to original destination of LATCH_EDGE.
-   Returns the newly created loop.  Frequencies and counts in the new loop
-   are scaled by FALSE_SCALE and in the old one by TRUE_SCALE.  */
-
-class loop *
-loopify (edge latch_edge, edge header_edge,
-	 basic_block switch_bb, edge true_edge, edge false_edge,
-	 bool redirect_all_edges, profile_probability true_scale,
-	 profile_probability false_scale)
-{
-  basic_block succ_bb = latch_edge->dest;
-  basic_block pred_bb = header_edge->src;
-  class loop *loop = alloc_loop ();
-  class loop *outer = loop_outer (succ_bb->loop_father);
-  profile_count cnt;
-
-  loop->header = header_edge->dest;
-  loop->latch = latch_edge->src;
-
-  cnt = header_edge->count ();
-
-  /* Redirect edges.  */
-  loop_redirect_edge (latch_edge, loop->header);
-  loop_redirect_edge (true_edge, succ_bb);
-
-  /* During loop versioning, one of the switch_bb edge is already properly
-     set. Do not redirect it again unless redirect_all_edges is true.  */
-  if (redirect_all_edges)
-    {
-      loop_redirect_edge (header_edge, switch_bb);
-      loop_redirect_edge (false_edge, loop->header);
-
-      /* Update dominators.  */
-      set_immediate_dominator (CDI_DOMINATORS, switch_bb, pred_bb);
-      set_immediate_dominator (CDI_DOMINATORS, loop->header, switch_bb);
-    }
-
-  set_immediate_dominator (CDI_DOMINATORS, succ_bb, switch_bb);
-
-  /* Compute new loop.  */
-  add_loop (loop, outer);
-
-  /* Add switch_bb to appropriate loop.  */
-  if (switch_bb->loop_father)
-    remove_bb_from_loops (switch_bb);
-  add_bb_to_loop (switch_bb, outer);
-
-  /* Fix counts.  */
-  if (redirect_all_edges)
-    {
-      switch_bb->count = cnt;
-    }
-  scale_loop_frequencies (loop, false_scale);
-  scale_loop_frequencies (succ_bb->loop_father, true_scale);
-  update_dominators_in_loop (loop);
-
-  return loop;
-}
-
 /* Remove the latch edge of a LOOP and update loops to indicate that
    the LOOP was removed.  After this function, original loop latch will
    have no successor, which caller is expected to fix somehow.
@@ -1681,7 +1616,7 @@ loop_version (class loop *loop,
 	      bool place_after)
 {
   basic_block first_head, second_head;
-  edge entry, latch_edge, true_edge, false_edge;
+  edge entry, latch_edge;
   int irred_flag;
   class loop *nloop;
   basic_block cond_bb;
@@ -1694,7 +1629,7 @@ loop_version (class loop *loop,
   /* Note down head of loop as first_head.  */
   first_head = entry->dest;
 
-  /* Duplicate loop.  */
+  /* 1) Duplicate loop on the entry edge.  */
   if (!cfg_hook_duplicate_loop_to_header_edge (loop, entry, 1,
 					       NULL, NULL, NULL, 0))
     {
@@ -1702,11 +1637,28 @@ loop_version (class loop *loop,
       return NULL;
     }
 
+  /* 2) loopify the duplicated new loop. */
+  latch_edge = single_succ_edge (get_bb_copy (loop->latch));
+  nloop = alloc_loop ();
+  class loop *outer = loop_outer (latch_edge->dest->loop_father);
+  edge new_header_edge = single_pred_edge (get_bb_copy (loop->header));
+  nloop->header = new_header_edge->dest;
+  nloop->latch = latch_edge->src;
+  loop_redirect_edge (latch_edge, nloop->header);
+
+  /* Compute new loop.  */
+  add_loop (nloop, outer);
+  copy_loop_info (loop, nloop);
+  set_loop_copy (loop, nloop);
+
+  /* loopify redirected latch_edge. Update its PENDING_STMTS.  */
+  lv_flush_pending_stmts (latch_edge);
+
   /* After duplication entry edge now points to new loop head block.
      Note down new head as second_head.  */
   second_head = entry->dest;
 
-  /* Split loop entry edge and insert new block with cond expr.  */
+  /* 3) Split loop entry edge and insert new block with cond expr.  */
   cond_bb =  lv_adjust_loop_entry_edge (first_head, second_head,
 					entry, cond_expr, then_prob, else_prob);
   if (condition_bb)
@@ -1718,24 +1670,17 @@ loop_version (class loop *loop,
       return NULL;
     }
 
-  latch_edge = single_succ_edge (get_bb_copy (loop->latch));
-
-  extract_cond_bb_edges (cond_bb, &true_edge, &false_edge);
-  nloop = loopify (latch_edge,
-		   single_pred_edge (get_bb_copy (loop->header)),
-		   cond_bb, true_edge, false_edge,
-		   false /* Do not redirect all edges.  */,
-		   then_scale, else_scale);
-
-  copy_loop_info (loop, nloop);
-  set_loop_copy (loop, nloop);
+  /* Add cond_bb to appropriate loop.  */
+  if (cond_bb->loop_father)
+    remove_bb_from_loops (cond_bb);
+  add_bb_to_loop (cond_bb, outer);
 
-  /* loopify redirected latch_edge. Update its PENDING_STMTS.  */
-  lv_flush_pending_stmts (latch_edge);
+  /* 4) Scale the original loop and new loop frequency.  */
+  scale_loop_frequencies (loop, then_scale);
+  scale_loop_frequencies (nloop, else_scale);
+  update_dominators_in_loop (loop);
+  update_dominators_in_loop (nloop);
 
-  /* loopify redirected condition_bb's succ edge. Update its PENDING_STMTS.  */
-  extract_cond_bb_edges (cond_bb, &true_edge, &false_edge);
-  lv_flush_pending_stmts (false_edge);
   /* Adjust irreducible flag.  */
   if (irred_flag)
     {
diff --git a/gcc/cfgloopmanip.h b/gcc/cfgloopmanip.h
index 07d5f925b79..312a3b48d05 100644
--- a/gcc/cfgloopmanip.h
+++ b/gcc/cfgloopmanip.h
@@ -42,9 +42,6 @@ extern void scale_loop_profile (class loop *, profile_probability, gcov_type);
 extern edge create_empty_if_region_on_edge (edge, tree);
 extern class loop *create_empty_loop_on_edge (edge, tree, tree, tree, tree,
 					       tree *, tree *, class loop *);
-extern class loop *loopify (edge, edge,
-			     basic_block, edge, edge, bool,
-			     profile_probability, profile_probability);
 extern void unloop (class loop *, bool *, bitmap);
 extern void copy_loop_info (class loop *loop, class loop *target);
 extern class loop * duplicate_loop (class loop *, class loop *,
-- 
2.27.0.90.geebb51ba8c


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 3/4] Rename loop_version to clone_loop_to_header_edge.
  2021-10-27  6:34 [PATCH v2 0/4] loop split fix and functions renaming Xionghu Luo
  2021-10-27  6:34 ` [PATCH v2 1/4] Fix loop split incorrect count and probability Xionghu Luo
  2021-10-27  6:34 ` [PATCH v2 2/4] Refactor loop_version Xionghu Luo
@ 2021-10-27  6:34 ` Xionghu Luo
  2021-11-03 13:36   ` Richard Biener
  2021-10-27  6:34 ` [PATCH v2 4/4] Rename duplicate_loop_to_header_edge to duplicate_loop_body_to_header_edge Xionghu Luo
  3 siblings, 1 reply; 15+ messages in thread
From: Xionghu Luo @ 2021-10-27  6:34 UTC (permalink / raw)
  To: gcc-patches
  Cc: segher, dje.gcc, wschmidt, guojiufu, linkw, rguenther, hubicka,
	Xionghu Luo

Name loop_copy is used in gcc/cfg.c already.

gcc/ChangeLog:

	* cfgloopmanip.c (force_single_succ_latches): Rename
	loop_version to clone_loop_to_header_edge.
	(lv_adjust_loop_entry_edge): Likewise.
	(loop_version): Likewise.
	(clone_loop_to_header_edge): Likewise.
	* cfgloopmanip.h (class loop): Likewise.
	(clone_loop_to_header_edge): Likewise.
	* gimple-loop-versioning.cc (loop_versioning::version_loop): Likewise.
	* modulo-sched.c (sms_schedule): Likewise.
	* tree-if-conv.c (version_loop_for_if_conversion): Likewise.
	* tree-loop-distribution.c (version_loop_by_alias_check): Likewise.
	* tree-parloops.c (gen_parallel_loop): Likewise.
	* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Likewise.
	* tree-ssa-loop-split.c (split_loop): Likewise.
	(get_cond_branch_to_split_loop): Likewise.
	(do_split_loop_on_cond): Likewise.
	* tree-ssa-loop-unswitch.c (tree_unswitch_loop): Likewise.
	* tree-vect-loop-manip.c (vect_loop_versioning): Likewise.
---
 gcc/cfgloopmanip.c            | 22 +++++++++++-----------
 gcc/cfgloopmanip.h            |  8 ++++----
 gcc/gimple-loop-versioning.cc | 11 ++++++-----
 gcc/modulo-sched.c            |  6 +++---
 gcc/tree-if-conv.c            | 13 +++++++------
 gcc/tree-loop-distribution.c  |  4 ++--
 gcc/tree-parloops.c           | 10 +++++-----
 gcc/tree-ssa-loop-manip.c     |  9 +++++----
 gcc/tree-ssa-loop-split.c     | 30 +++++++++++++++---------------
 gcc/tree-ssa-loop-unswitch.c  |  8 +++-----
 gcc/tree-vect-loop-manip.c    |  5 +++--
 11 files changed, 64 insertions(+), 62 deletions(-)

diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index a30ebe1cdb4..066fbddbcfe 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -1535,11 +1535,10 @@ force_single_succ_latches (void)
   loops_state_set (LOOPS_HAVE_SIMPLE_LATCHES);
 }
 
-/* This function is called from loop_version.  It splits the entry edge
-   of the loop we want to version, adds the versioning condition, and
-   adjust the edges to the two versions of the loop appropriately.
-   e is an incoming edge. Returns the basic block containing the
-   condition.
+/* This function is called from clone_loop_to_header_edge.  It splits the entry
+  edge of the loop we want to version, adds the versioning condition, and adjust
+  the edges to the two versions of the loop appropriately. e is an incoming
+  edge. Returns the basic block containing the condition.
 
    --- edge e ---- > [second_head]
 
@@ -1588,7 +1587,7 @@ lv_adjust_loop_entry_edge (basic_block first_head, basic_block second_head,
   return new_head;
 }
 
-/* Main entry point for Loop Versioning transformation.
+/* Main entry point for Loop copy transformation.
 
    This transformation given a condition and a loop, creates
    -if (condition) { loop_copy1 } else { loop_copy2 },
@@ -1609,11 +1608,12 @@ lv_adjust_loop_entry_edge (basic_block first_head, basic_block second_head,
    instruction stream, otherwise it is placed before LOOP.  */
 
 class loop *
-loop_version (class loop *loop,
-	      void *cond_expr, basic_block *condition_bb,
-	      profile_probability then_prob, profile_probability else_prob,
-	      profile_probability then_scale, profile_probability else_scale,
-	      bool place_after)
+clone_loop_to_header_edge (class loop *loop, void *cond_expr,
+			   basic_block *condition_bb,
+			   profile_probability then_prob,
+			   profile_probability else_prob,
+			   profile_probability then_scale,
+			   profile_probability else_scale, bool place_after)
 {
   basic_block first_head, second_head;
   edge entry, latch_edge;
diff --git a/gcc/cfgloopmanip.h b/gcc/cfgloopmanip.h
index 312a3b48d05..eac09518702 100644
--- a/gcc/cfgloopmanip.h
+++ b/gcc/cfgloopmanip.h
@@ -55,9 +55,9 @@ extern bool mfb_keep_just (edge);
 basic_block create_preheader (class loop *, int);
 extern void create_preheaders (int);
 extern void force_single_succ_latches (void);
-class loop * loop_version (class loop *, void *,
-			    basic_block *,
-			    profile_probability, profile_probability,
-			    profile_probability, profile_probability, bool);
+class loop *
+clone_loop_to_header_edge (class loop *, void *, basic_block *,
+			   profile_probability, profile_probability,
+			   profile_probability, profile_probability, bool);
 
 #endif /* GCC_CFGLOOPMANIP_H */
diff --git a/gcc/gimple-loop-versioning.cc b/gcc/gimple-loop-versioning.cc
index 15e0803dc29..c0fa60b91f8 100644
--- a/gcc/gimple-loop-versioning.cc
+++ b/gcc/gimple-loop-versioning.cc
@@ -1686,11 +1686,12 @@ loop_versioning::version_loop (class loop *loop)
   /* Version the loop.  */
   initialize_original_copy_tables ();
   basic_block cond_bb;
-  li.optimized_loop = loop_version (loop, cond, &cond_bb,
-				    profile_probability::unlikely (),
-				    profile_probability::likely (),
-				    profile_probability::unlikely (),
-				    profile_probability::likely (), true);
+  li.optimized_loop
+    = clone_loop_to_header_edge (loop, cond, &cond_bb,
+				 profile_probability::unlikely (),
+				 profile_probability::likely (),
+				 profile_probability::unlikely (),
+				 profile_probability::likely (), true);
   free_original_copy_tables ();
   if (!li.optimized_loop)
     {
diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c
index 1c1b459d34f..26cd76a279c 100644
--- a/gcc/modulo-sched.c
+++ b/gcc/modulo-sched.c
@@ -1737,9 +1737,9 @@ sms_schedule (void)
 	      profile_probability prob = profile_probability::guessed_always ()
 				.apply_scale (PROB_SMS_ENOUGH_ITERATIONS, 100);
 
-	      loop_version (loop, comp_rtx, &condition_bb,
-	  		    prob, prob.invert (),
-			    prob, prob.invert (), true);
+	      clone_loop_to_header_edge (loop, comp_rtx, &condition_bb, prob,
+					 prob.invert (), prob, prob.invert (),
+					 true);
 	    }
 
 	  /* Now apply the scheduled kernel to the RTL of the loop.  */
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index d7b7b309309..893231a2d16 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -2871,7 +2871,8 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
 				  integer_zero_node);
   gimple_call_set_lhs (g, cond);
 
-  /* Save BB->aux around loop_version as that uses the same field.  */
+  /* Save BB->aux around clone_loop_to_header_edge as that uses the same field.
+   */
   save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
   void **saved_preds = XALLOCAVEC (void *, save_length);
   for (unsigned i = 0; i < save_length; i++)
@@ -2880,11 +2881,11 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
   initialize_original_copy_tables ();
   /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED
      is re-merged in the vectorizer.  */
-  new_loop = loop_version (loop, cond, &cond_bb,
-			   profile_probability::always (),
-			   profile_probability::always (),
-			   profile_probability::always (),
-			   profile_probability::always (), true);
+  new_loop = clone_loop_to_header_edge (loop, cond, &cond_bb,
+					profile_probability::always (),
+					profile_probability::always (),
+					profile_probability::always (),
+					profile_probability::always (), true);
   free_original_copy_tables ();
 
   for (unsigned i = 0; i < save_length; i++)
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 2df762c8aa8..ae543fa022d 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -2705,8 +2705,8 @@ version_loop_by_alias_check (vec<struct partition *> *partitions,
 
   prob = profile_probability::guessed_always ().apply_scale (9, 10);
   initialize_original_copy_tables ();
-  nloop = loop_version (loop, lhs, &cond_bb, prob, prob.invert (),
-			prob, prob.invert (), true);
+  nloop = clone_loop_to_header_edge (loop, lhs, &cond_bb, prob, prob.invert (),
+				     prob, prob.invert (), true);
   free_original_copy_tables ();
   /* Record the original loop number in newly generated loops.  In case of
      distribution, the original loop will be distributed and the new loop
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 5e64d5ed7a3..e19fc52f38b 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -3083,11 +3083,11 @@ gen_parallel_loop (class loop *loop,
       initialize_original_copy_tables ();
 
       /* We assume that the loop usually iterates a lot.  */
-      loop_version (loop, many_iterations_cond, NULL,
-		    profile_probability::likely (),
-		    profile_probability::unlikely (),
-		    profile_probability::likely (),
-		    profile_probability::unlikely (), true);
+      clone_loop_to_header_edge (loop, many_iterations_cond, NULL,
+				 profile_probability::likely (),
+				 profile_probability::unlikely (),
+				 profile_probability::likely (),
+				 profile_probability::unlikely (), true);
       update_ssa (TODO_update_ssa);
       free_original_copy_tables ();
     }
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index c7a2f67b129..350e25bb8d2 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -1282,10 +1282,11 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
 	 frequencies in its body because of this change (scale the frequencies
 	 of blocks before and after the exit by appropriate factors).  */
       profile_probability scale_unrolled = prob_entry;
-      new_loop = loop_version (loop, enter_main_cond, NULL, prob_entry,
-			       prob_entry.invert (), scale_unrolled,
-			       profile_probability::guessed_always (),
-			       true);
+      new_loop
+	= clone_loop_to_header_edge (loop, enter_main_cond, NULL, prob_entry,
+				     prob_entry.invert (), scale_unrolled,
+				     profile_probability::guessed_always (),
+				     true);
       gcc_assert (new_loop != NULL);
       update_ssa (TODO_update_ssa);
 
diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index d30782888f3..5bc9c0f6443 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -586,12 +586,12 @@ split_loop (class loop *loop1)
 	initialize_original_copy_tables ();
 	basic_block cond_bb;
 
-	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
-					   true_edge->probability,
-					   true_edge->probability.invert (),
-					   true_edge->probability,
-					   true_edge->probability.invert (),
-					   true);
+	class loop *loop2
+	  = clone_loop_to_header_edge (loop1, cond, &cond_bb,
+				       true_edge->probability,
+				       true_edge->probability.invert (),
+				       true_edge->probability,
+				       true_edge->probability.invert (), true);
 	gcc_assert (loop2);
 
 	edge new_e = connect_loops (loop1, loop2);
@@ -1465,9 +1465,9 @@ get_cond_branch_to_split_loop (struct loop *loop, gcond *cond)
                              exits
 
    In the graph, loop1 represents the part derived from original one, and
-   loop2 is duplicated using loop_version (), which corresponds to the part
-   of original one being splitted out.  In original latch edge of loop1, we
-   insert a new conditional statement duplicated from the semi-invariant cond,
+   loop2 is duplicated using clone_loop_to_header_edge (), which corresponds to
+   the part of original one being splitted out.  In original latch edge of loop1,
+   we insert a new conditional statement duplicated from the semi-invariant cond,
    and one of its branch goes back to loop1 header as a latch edge, and the
    other branch goes to loop2 pre-header as an entry edge.  And also in loop2,
    we abandon the variant branch of the conditional statement by setting a
@@ -1489,12 +1489,12 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
 
   initialize_original_copy_tables ();
 
-  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
-				     invar_branch->probability.invert (),
-				     invar_branch->probability,
-				     invar_branch->probability.invert (),
-				     invar_branch->probability,
-				     true);
+  struct loop *loop2
+    = clone_loop_to_header_edge (loop1, boolean_true_node, NULL,
+				 invar_branch->probability.invert (),
+				 invar_branch->probability,
+				 invar_branch->probability.invert (),
+				 invar_branch->probability, true);
   if (!loop2)
     {
       free_original_copy_tables ();
diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
index fe4dacc0833..6beebe368a7 100644
--- a/gcc/tree-ssa-loop-unswitch.c
+++ b/gcc/tree-ssa-loop-unswitch.c
@@ -488,11 +488,9 @@ tree_unswitch_loop (class loop *loop,
 
   extract_true_false_edges_from_block (unswitch_on, &edge_true, &edge_false);
   prob_true = edge_true->probability;
-  return loop_version (loop, unshare_expr (cond),
-		       NULL, prob_true,
-		       prob_true.invert (),
-		       prob_true, prob_true.invert (),
-		       false);
+  return clone_loop_to_header_edge (loop, unshare_expr (cond), NULL, prob_true,
+				    prob_true.invert (), prob_true,
+				    prob_true.invert (), false);
 }
 
 /* Unswitch outer loops by hoisting invariant guard on
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 4988c93fdb6..ebc013f195e 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -3558,8 +3558,9 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
 			 loop_to_version->num);
 
       initialize_original_copy_tables ();
-      nloop = loop_version (loop_to_version, cond_expr, &condition_bb,
-			    prob, prob.invert (), prob, prob.invert (), true);
+      nloop = clone_loop_to_header_edge (loop_to_version, cond_expr,
+					 &condition_bb, prob, prob.invert (),
+					 prob, prob.invert (), true);
       gcc_assert (nloop);
       nloop = get_loop_copy (loop);
 
-- 
2.27.0.90.geebb51ba8c


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 4/4] Rename duplicate_loop_to_header_edge to duplicate_loop_body_to_header_edge
  2021-10-27  6:34 [PATCH v2 0/4] loop split fix and functions renaming Xionghu Luo
                   ` (2 preceding siblings ...)
  2021-10-27  6:34 ` [PATCH v2 3/4] Rename loop_version to clone_loop_to_header_edge Xionghu Luo
@ 2021-10-27  6:34 ` Xionghu Luo
  2021-10-29 11:50   ` Richard Biener
  3 siblings, 1 reply; 15+ messages in thread
From: Xionghu Luo @ 2021-10-27  6:34 UTC (permalink / raw)
  To: gcc-patches
  Cc: segher, dje.gcc, wschmidt, guojiufu, linkw, rguenther, hubicka,
	Xionghu Luo

gcc/ChangeLog:

	* cfghooks.c (cfg_hook_duplicate_loop_to_header_edge): Rename
	duplicate_loop_to_header_edge to
	duplicate_loop_body_to_header_edge.
	(cfg_hook_duplicate_loop_body_to_header_edge): Likewise.
	* cfghooks.h (struct cfg_hooks): Likewise.
	(cfg_hook_duplicate_loop_body_to_header_edge): Likewise.
	* cfgloopmanip.c (duplicate_loop_body_to_header_edge): Likewise.
	(clone_loop_to_header_edge): Likewise.
	* cfgloopmanip.h (duplicate_loop_body_to_header_edge): Likewise.
	* cfgrtl.c (struct cfg_hooks): Likewise.
	* doc/loop.texi: Likewise.
	* loop-unroll.c (unroll_loop_constant_iterations): Likewise.
	(unroll_loop_runtime_iterations): Likewise.
	(unroll_loop_stupid): Likewise.
	(apply_opt_in_copies): Likewise.
	* tree-cfg.c (struct cfg_hooks): Likewise.
	* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Likewise.
	(try_peel_loop): Likewise.
	* tree-ssa-loop-manip.c (copy_phi_node_args): Likewise.
	(gimple_duplicate_loop_body_to_header_edge): Likewise.
	(tree_transform_and_unroll_loop): Likewise.
	* tree-ssa-loop-manip.h (gimple_duplicate_loop_body_to_header_edge):
	Likewise.
---
 gcc/cfghooks.c              | 27 ++++++++++++---------------
 gcc/cfghooks.h              | 13 ++++++-------
 gcc/cfgloopmanip.c          |  9 ++++-----
 gcc/cfgloopmanip.h          |  6 +++---
 gcc/cfgrtl.c                |  2 +-
 gcc/doc/loop.texi           |  4 ++--
 gcc/loop-unroll.c           | 27 ++++++++++++---------------
 gcc/tree-cfg.c              |  2 +-
 gcc/tree-ssa-loop-ivcanon.c |  4 ++--
 gcc/tree-ssa-loop-manip.c   | 22 ++++++++++++----------
 gcc/tree-ssa-loop-manip.h   |  7 +++----
 11 files changed, 58 insertions(+), 65 deletions(-)

diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c
index 50b9b177639..23eb364bee6 100644
--- a/gcc/cfghooks.c
+++ b/gcc/cfghooks.c
@@ -1226,25 +1226,22 @@ lv_flush_pending_stmts (edge e)
     cfg_hooks->flush_pending_stmts (e);
 }
 
-/* Loop versioning uses the duplicate_loop_to_header_edge to create
+/* Loop versioning uses the duplicate_loop_body_to_header_edge to create
    a new version of the loop basic-blocks, the parameters here are
-   exactly the same as in duplicate_loop_to_header_edge or
-   tree_duplicate_loop_to_header_edge; while in tree-ssa there is
+   exactly the same as in duplicate_loop_body_to_header_edge or
+   tree_duplicate_loop_body_to_header_edge; while in tree-ssa there is
    additional work to maintain ssa information that's why there is
-   a need to call the tree_duplicate_loop_to_header_edge rather
-   than duplicate_loop_to_header_edge when we are in tree mode.  */
+   a need to call the tree_duplicate_loop_body_to_header_edge rather
+   than duplicate_loop_body_to_header_edge when we are in tree mode.  */
 bool
-cfg_hook_duplicate_loop_to_header_edge (class loop *loop, edge e,
-					unsigned int ndupl,
-					sbitmap wont_exit, edge orig,
-					vec<edge> *to_remove,
-					int flags)
+cfg_hook_duplicate_loop_body_to_header_edge (class loop *loop, edge e,
+					     unsigned int ndupl,
+					     sbitmap wont_exit, edge orig,
+					     vec<edge> *to_remove, int flags)
 {
-  gcc_assert (cfg_hooks->cfg_hook_duplicate_loop_to_header_edge);
-  return cfg_hooks->cfg_hook_duplicate_loop_to_header_edge (loop, e,
-							    ndupl, wont_exit,
-							    orig, to_remove,
-							    flags);
+  gcc_assert (cfg_hooks->cfg_hook_duplicate_loop_body_to_header_edge);
+  return cfg_hooks->cfg_hook_duplicate_loop_body_to_header_edge (
+    loop, e, ndupl, wont_exit, orig, to_remove, flags);
 }
 
 /* Conditional jumps are represented differently in trees and RTL,
diff --git a/gcc/cfghooks.h b/gcc/cfghooks.h
index 8645fe5b9e7..29aa2bf0636 100644
--- a/gcc/cfghooks.h
+++ b/gcc/cfghooks.h
@@ -166,7 +166,7 @@ struct cfg_hooks
 
   /* A hook for duplicating loop in CFG, currently this is used
      in loop versioning.  */
-  bool (*cfg_hook_duplicate_loop_to_header_edge) (class loop *, edge,
+  bool (*cfg_hook_duplicate_loop_body_to_header_edge) (class loop *, edge,
 						  unsigned, sbitmap,
 						  edge, vec<edge> *,
 						  int);
@@ -250,12 +250,11 @@ extern bool block_ends_with_condjump_p (const_basic_block bb);
 extern int flow_call_edges_add (sbitmap);
 extern void execute_on_growing_pred (edge);
 extern void execute_on_shrinking_pred (edge);
-extern bool cfg_hook_duplicate_loop_to_header_edge (class loop *loop, edge,
-						    unsigned int ndupl,
-						    sbitmap wont_exit,
-						    edge orig,
-						    vec<edge> *to_remove,
-						    int flags);
+extern bool
+cfg_hook_duplicate_loop_body_to_header_edge (class loop *loop, edge,
+					     unsigned int ndupl,
+					     sbitmap wont_exit, edge orig,
+					     vec<edge> *to_remove, int flags);
 
 extern void lv_flush_pending_stmts (edge);
 extern void extract_cond_bb_edges (basic_block, edge *, edge*);
diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index 066fbddbcfe..455c3ef8db9 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -1059,10 +1059,9 @@ can_duplicate_loop_p (const class loop *loop)
    impossible.  */
 
 bool
-duplicate_loop_to_header_edge (class loop *loop, edge e,
-			       unsigned int ndupl, sbitmap wont_exit,
-			       edge orig, vec<edge> *to_remove,
-			       int flags)
+duplicate_loop_body_to_header_edge (class loop *loop, edge e,
+				    unsigned int ndupl, sbitmap wont_exit,
+				    edge orig, vec<edge> *to_remove, int flags)
 {
   class loop *target, *aloop;
   class loop **orig_loops;
@@ -1630,7 +1629,7 @@ clone_loop_to_header_edge (class loop *loop, void *cond_expr,
   first_head = entry->dest;
 
   /* 1) Duplicate loop on the entry edge.  */
-  if (!cfg_hook_duplicate_loop_to_header_edge (loop, entry, 1,
+  if (!cfg_hook_duplicate_loop_body_to_header_edge (loop, entry, 1,
 					       NULL, NULL, NULL, 0))
     {
       entry->flags |= irred_flag;
diff --git a/gcc/cfgloopmanip.h b/gcc/cfgloopmanip.h
index eac09518702..42ba0689825 100644
--- a/gcc/cfgloopmanip.h
+++ b/gcc/cfgloopmanip.h
@@ -48,9 +48,9 @@ extern class loop * duplicate_loop (class loop *, class loop *,
 				     class loop * = NULL);
 extern void duplicate_subloops (class loop *, class loop *);
 extern bool can_duplicate_loop_p (const class loop *loop);
-extern bool duplicate_loop_to_header_edge (class loop *, edge,
-					   unsigned, sbitmap, edge,
- 					   vec<edge> *, int);
+extern bool
+duplicate_loop_body_to_header_edge (class loop *, edge, unsigned, sbitmap, edge,
+				    vec<edge> *, int);
 extern bool mfb_keep_just (edge);
 basic_block create_preheader (class loop *, int);
 extern void create_preheaders (int);
diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index 4f3b1e8f3dc..e3a724bddb4 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -5344,7 +5344,7 @@ struct cfg_hooks cfg_layout_rtl_cfg_hooks = {
   rtl_flow_call_edges_add,
   NULL, /* execute_on_growing_pred */
   NULL, /* execute_on_shrinking_pred */
-  duplicate_loop_to_header_edge, /* duplicate loop for trees */
+  duplicate_loop_body_to_header_edge, /* duplicate loop for trees */
   rtl_lv_add_condition_to_bb, /* lv_add_condition_to_bb */
   NULL, /* lv_adjust_loop_header_phi*/
   rtl_extract_cond_bb_edges, /* extract_cond_bb_edges */
diff --git a/gcc/doc/loop.texi b/gcc/doc/loop.texi
index 94eed6720b1..29a580066d6 100644
--- a/gcc/doc/loop.texi
+++ b/gcc/doc/loop.texi
@@ -249,8 +249,8 @@ are only reliable for the innermost loops:
 @item @code{create_iv}: Creates a new induction variable.  Only works on
 GIMPLE@.  @code{standard_iv_increment_position} can be used to find a
 suitable place for the iv increment.
-@item @code{duplicate_loop_to_header_edge},
-@code{tree_duplicate_loop_to_header_edge}: These functions (on RTL and
+@item @code{duplicate_loop_body_to_header_edge},
+@code{tree_duplicate_loop_body_to_header_edge}: These functions (on RTL and
 on GIMPLE) duplicate the body of the loop prescribed number of times on
 one of the edges entering loop header, thus performing either loop
 unrolling or loop peeling.  @code{can_duplicate_loop_p}
diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
index 2b31fafa3a3..f554ebb8450 100644
--- a/gcc/loop-unroll.c
+++ b/gcc/loop-unroll.c
@@ -520,14 +520,11 @@ unroll_loop_constant_iterations (class loop *loop)
       if (exit_mod)
 	{
 	  opt_info_start_duplication (opt_info);
-          ok = duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
-					      exit_mod,
-					      wont_exit, desc->out_edge,
-					      &remove_edges,
-					      DLTHE_FLAG_UPDATE_FREQ
-					      | (opt_info && exit_mod > 1
-						 ? DLTHE_RECORD_COPY_NUMBER
-						   : 0));
+	  ok = duplicate_loop_body_to_header_edge (
+	    loop, loop_preheader_edge (loop), exit_mod, wont_exit,
+	    desc->out_edge, &remove_edges,
+	    DLTHE_FLAG_UPDATE_FREQ
+	      | (opt_info && exit_mod > 1 ? DLTHE_RECORD_COPY_NUMBER : 0));
 	  gcc_assert (ok);
 
           if (opt_info && exit_mod > 1)
@@ -569,7 +566,7 @@ unroll_loop_constant_iterations (class loop *loop)
 	    bitmap_clear_bit (wont_exit, 1);
 
           opt_info_start_duplication (opt_info);
-	  ok = duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
+	  ok = duplicate_loop_body_to_header_edge (loop, loop_preheader_edge (loop),
 					      exit_mod + 1,
 					      wont_exit, desc->out_edge,
 					      &remove_edges,
@@ -606,7 +603,7 @@ unroll_loop_constant_iterations (class loop *loop)
   /* Now unroll the loop.  */
 
   opt_info_start_duplication (opt_info);
-  ok = duplicate_loop_to_header_edge (loop, loop_latch_edge (loop),
+  ok = duplicate_loop_body_to_header_edge (loop, loop_latch_edge (loop),
 				      max_unroll,
 				      wont_exit, desc->out_edge,
 				      &remove_edges,
@@ -975,7 +972,7 @@ unroll_loop_runtime_iterations (class loop *loop)
       if (!desc->noloop_assumptions)
 	bitmap_set_bit (wont_exit, 1);
       ezc_swtch = loop_preheader_edge (loop)->src;
-      ok = duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
+      ok = duplicate_loop_body_to_header_edge (loop, loop_preheader_edge (loop),
 					  1, wont_exit, desc->out_edge,
 					  &remove_edges,
 					  DLTHE_FLAG_UPDATE_FREQ);
@@ -997,7 +994,7 @@ unroll_loop_runtime_iterations (class loop *loop)
       bitmap_clear (wont_exit);
       if (i != n_peel - 1 || !last_may_exit)
 	bitmap_set_bit (wont_exit, 1);
-      ok = duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
+      ok = duplicate_loop_body_to_header_edge (loop, loop_preheader_edge (loop),
 					  1, wont_exit, desc->out_edge,
 					  &remove_edges,
 					  DLTHE_FLAG_UPDATE_FREQ);
@@ -1061,7 +1058,7 @@ unroll_loop_runtime_iterations (class loop *loop)
   bitmap_clear_bit (wont_exit, may_exit_copy);
   opt_info_start_duplication (opt_info);
 
-  ok = duplicate_loop_to_header_edge (loop, loop_latch_edge (loop),
+  ok = duplicate_loop_body_to_header_edge (loop, loop_latch_edge (loop),
 				      max_unroll,
 				      wont_exit, desc->out_edge,
 				      &remove_edges,
@@ -1255,7 +1252,7 @@ unroll_loop_stupid (class loop *loop)
   bitmap_clear (wont_exit);
   opt_info_start_duplication (opt_info);
 
-  ok = duplicate_loop_to_header_edge (loop, loop_latch_edge (loop),
+  ok = duplicate_loop_body_to_header_edge (loop, loop_latch_edge (loop),
 				      nunroll, wont_exit,
 				      NULL, NULL,
 				      DLTHE_FLAG_UPDATE_FREQ
@@ -2019,7 +2016,7 @@ apply_opt_in_copies (struct opt_info *opt_info,
       orig_bb = get_bb_original (bb);
 
       /* bb->aux holds position in copy sequence initialized by
-	 duplicate_loop_to_header_edge.  */
+	 duplicate_loop_body_to_header_edge.  */
       delta = determine_split_iv_delta ((size_t)bb->aux, n_copies,
 					unrolling);
       bb->aux = 0;
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 9883eaaa9bf..d0177cb1c07 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -9072,7 +9072,7 @@ struct cfg_hooks gimple_cfg_hooks = {
   gimple_flow_call_edges_add,   /* flow_call_edges_add */
   gimple_execute_on_growing_pred,	/* execute_on_growing_pred */
   gimple_execute_on_shrinking_pred, /* execute_on_shrinking_pred */
-  gimple_duplicate_loop_to_header_edge, /* duplicate loop for trees */
+  gimple_duplicate_loop_body_to_header_edge, /* duplicate loop for trees */
   gimple_lv_add_condition_to_bb, /* lv_add_condition_to_bb */
   gimple_lv_adjust_loop_header_phi, /* lv_adjust_loop_header_phi*/
   extract_true_false_edges_from_block, /* extract_cond_bb_edges */
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index 8d8791f837e..a9107ef5ffb 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -903,7 +903,7 @@ try_unroll_loop_completely (class loop *loop,
       if (may_be_zero)
 	bitmap_clear_bit (wont_exit, 1);
 
-      if (!gimple_duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
+      if (!gimple_duplicate_loop_body_to_header_edge (loop, loop_preheader_edge (loop),
 						 n_unroll, wont_exit,
 						 exit, &edges_to_remove,
 						 DLTHE_FLAG_UPDATE_FREQ
@@ -1094,7 +1094,7 @@ try_peel_loop (class loop *loop,
     }
   if (may_be_zero)
     bitmap_clear_bit (wont_exit, 1);
-  if (!gimple_duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
+  if (!gimple_duplicate_loop_body_to_header_edge (loop, loop_preheader_edge (loop),
 					     npeel, wont_exit,
 					     exit, &edges_to_remove,
 					     DLTHE_FLAG_UPDATE_FREQ))
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 350e25bb8d2..b60da4fb084 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -911,7 +911,7 @@ copy_phi_node_args (unsigned first_new_block)
 }
 
 
-/* The same as cfgloopmanip.c:duplicate_loop_to_header_edge, but also
+/* The same as cfgloopmanip.c:duplicate_loop_body_to_header_edge, but also
    updates the PHI nodes at start of the copied region.  In order to
    achieve this, only loops whose exits all lead to the same location
    are handled.
@@ -921,10 +921,10 @@ copy_phi_node_args (unsigned first_new_block)
    after the loop has been duplicated.  */
 
 bool
-gimple_duplicate_loop_to_header_edge (class loop *loop, edge e,
-				    unsigned int ndupl, sbitmap wont_exit,
-				    edge orig, vec<edge> *to_remove,
-				    int flags)
+gimple_duplicate_loop_body_to_header_edge (class loop *loop, edge e,
+					   unsigned int ndupl,
+					   sbitmap wont_exit, edge orig,
+					   vec<edge> *to_remove, int flags)
 {
   unsigned first_new_block;
 
@@ -934,8 +934,8 @@ gimple_duplicate_loop_to_header_edge (class loop *loop, edge e,
     return false;
 
   first_new_block = last_basic_block_for_fn (cfun);
-  if (!duplicate_loop_to_header_edge (loop, e, ndupl, wont_exit,
-				      orig, to_remove, flags))
+  if (!duplicate_loop_body_to_header_edge (loop, e, ndupl, wont_exit, orig,
+					   to_remove, flags))
     return false;
 
   /* Readd the removed phi args for e.  */
@@ -1390,9 +1390,11 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
   bitmap_clear_bit (wont_exit, factor - 1);
 
   auto_vec<edge> to_remove;
-  bool ok = gimple_duplicate_loop_to_header_edge
-	  (loop, loop_latch_edge (loop), factor - 1,
-	   wont_exit, new_exit, &to_remove, DLTHE_FLAG_UPDATE_FREQ);
+  bool ok
+    = gimple_duplicate_loop_body_to_header_edge (loop, loop_latch_edge (loop),
+						 factor - 1, wont_exit,
+						 new_exit, &to_remove,
+						 DLTHE_FLAG_UPDATE_FREQ);
   gcc_assert (ok);
 
   for (edge e : to_remove)
diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
index 86fc118b6be..792c366701b 100644
--- a/gcc/tree-ssa-loop-manip.h
+++ b/gcc/tree-ssa-loop-manip.h
@@ -42,10 +42,9 @@ extern basic_block ip_end_pos (class loop *);
 extern basic_block ip_normal_pos (class loop *);
 extern void standard_iv_increment_position (class loop *,
 					    gimple_stmt_iterator *, bool *);
-extern bool gimple_duplicate_loop_to_header_edge (class loop *, edge,
-						  unsigned int, sbitmap,
-						  edge, vec<edge> *,
-						  int);
+extern bool
+gimple_duplicate_loop_body_to_header_edge (class loop *, edge, unsigned int,
+					   sbitmap, edge, vec<edge> *, int);
 extern bool can_unroll_loop_p (class loop *loop, unsigned factor,
 			       class tree_niter_desc *niter);
 extern gcov_type niter_for_unrolled_loop (class loop *, unsigned);
-- 
2.27.0.90.geebb51ba8c


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/4] Fix loop split incorrect count and probability
  2021-10-27  6:34 ` [PATCH v2 1/4] Fix loop split incorrect count and probability Xionghu Luo
@ 2021-10-27  7:07   ` Jan Hubicka
  2021-10-27  7:23     ` Jan Hubicka
  2021-10-27  7:29     ` Richard Biener
  0 siblings, 2 replies; 15+ messages in thread
From: Jan Hubicka @ 2021-10-27  7:07 UTC (permalink / raw)
  To: Xionghu Luo; +Cc: gcc-patches, rguenther, segher, wschmidt, linkw, dje.gcc

> 
> gcc/ChangeLog:
> 
> 	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> 	(do_split_loop_on_cond): Likewise.
> ---
>  gcc/tree-ssa-loop-split.c | 25 ++++++++++++++++---------
>  1 file changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 3f6ad046623..d30782888f3 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -575,7 +575,11 @@ split_loop (class loop *loop1)
>  					    stmts2);
>  	tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
>  	if (!initial_true)
> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> +
> +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> +			   ? EDGE_SUCC (bbs[i], 0)
> +			   : EDGE_SUCC (bbs[i], 1);
>  
>  	/* Now version the loop, placing loop2 after loop1 connecting
>  	   them, and fix up SSA form for that.  */
> @@ -583,10 +587,10 @@ split_loop (class loop *loop1)
>  	basic_block cond_bb;
>  
>  	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> -					   profile_probability::always (),
> -					   profile_probability::always (),
> -					   profile_probability::always (),
> -					   profile_probability::always (),
> +					   true_edge->probability,
> +					   true_edge->probability.invert (),
> +					   true_edge->probability,
> +					   true_edge->probability.invert (),
>  					   true);

As discussed yesterday, for loop of form

for (...)
  if (cond)
    cond = something();
  else
    something2

Split as

loop1:
for (...)
  if (true)
    cond = something();
    if (!cond)
      break
  else
    something2 ();

loop2:
for (...)
  if (false)
    cond = something();
  else
    something2 ();

If "if (cond)" has probability p, you want to scale loop1 by p
and loop2 by 1-p as your patch does, but you need to exclude the basic
blocks guarded by the condition.

One way is to break out loop_version and implement it inline, other
option (perhaps leading to less code duplication) is to add argument listing
basic blocks that should not be scaled, which would be set to both arms
of the if.

Are there other profile patches of your I should look at?
Honza
>  	gcc_assert (loop2);
>  
> @@ -1486,10 +1490,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
>    initialize_original_copy_tables ();
>  
>    struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> -				     profile_probability::always (),
> -				     profile_probability::never (),
> -				     profile_probability::always (),
> -				     profile_probability::always (),
> +				     invar_branch->probability.invert (),
> +				     invar_branch->probability,
> +				     invar_branch->probability.invert (),
> +				     invar_branch->probability,
>  				     true);
>    if (!loop2)
>      {
> @@ -1530,6 +1534,9 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
>    to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
>    to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
>  
> +  to_loop1->probability = invar_branch->probability.invert ();
> +  to_loop2->probability = invar_branch->probability;
> +
>    /* Due to introduction of a control flow edge from loop1 latch to loop2
>       pre-header, we should update PHIs in loop2 to reflect this connection
>       between loop1 and loop2.  */
> -- 
> 2.27.0.90.geebb51ba8c
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/4] Fix loop split incorrect count and probability
  2021-10-27  7:07   ` Jan Hubicka
@ 2021-10-27  7:23     ` Jan Hubicka
  2021-10-27  7:29     ` Richard Biener
  1 sibling, 0 replies; 15+ messages in thread
From: Jan Hubicka @ 2021-10-27  7:23 UTC (permalink / raw)
  To: Xionghu Luo, rguenther, segher, wschmidt, linkw, gcc-patches, dje.gcc

> As discussed yesterday, for loop of form
> 
> for (...)
>   if (cond)
>     cond = something();
>   else
>     something2
> 
> Split as
> 

Say "if (cond)" has probability p, then individual statements scale as
follows:

  loop1:
p    for (...)
p      if (true)
1        cond = something();
1        if (!cond)
x          break
0      else
0        something2 ();
     
     loop2:
1-p  for (...)
1-p    if (false)
0        cond = something();
1      else
1        something2 ();

Block with x has same count as preheader of the loop.

Your patch does
  loop1:
p    for (...)
p      if (true)
p        cond = something();
p        if (!cond)
x          break
p      else
p        something2 ();
     
     loop2:
1-p  for (...)
1-p    if (false)
1-p      cond = something();
1-p    else
1-p      something2 ();

One does not need set 0 correctly (blocks will be removed), but it would
be nice to avoid dropping 1s down. Looking at this, things will go well
with your change if we are guaranteed that something() and something2()
is always 1 bb becuase block merging happening after turning condiitonal
to constant will remove the misupdated count. Your profile scales after merging
is:
  loop1:
p    for (...)
1        cond = something();
1        if (!cond)
x          break
     
     loop2:
1-p  for (...)
1       something2 ();

assuming that profile was sane and frequency of something() is
p*frequency of the conditional and similarly for something2().
This is why you see final profile correct.  So if we split only loops
with one BB then/else arms, the patch is OK with comment explaining
this.

Also I wonder, do we correctly duplicate&update known bounds on
iteration counts attached to the loop struccture?

Honza
> 
> If "if (cond)" has probability p, you want to scale loop1 by p
> and loop2 by 1-p as your patch does, but you need to exclude the basic
> blocks guarded by the condition.
> 
> One way is to break out loop_version and implement it inline, other
> option (perhaps leading to less code duplication) is to add argument listing
> basic blocks that should not be scaled, which would be set to both arms
> of the if.
> 
> Are there other profile patches of your I should look at?
> Honza
> >  	gcc_assert (loop2);
> >  
> > @@ -1486,10 +1490,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
> >    initialize_original_copy_tables ();
> >  
> >    struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> > -				     profile_probability::always (),
> > -				     profile_probability::never (),
> > -				     profile_probability::always (),
> > -				     profile_probability::always (),
> > +				     invar_branch->probability.invert (),
> > +				     invar_branch->probability,
> > +				     invar_branch->probability.invert (),
> > +				     invar_branch->probability,
> >  				     true);
> >    if (!loop2)
> >      {
> > @@ -1530,6 +1534,9 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
> >    to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
> >    to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
> >  
> > +  to_loop1->probability = invar_branch->probability.invert ();
> > +  to_loop2->probability = invar_branch->probability;
> > +
> >    /* Due to introduction of a control flow edge from loop1 latch to loop2
> >       pre-header, we should update PHIs in loop2 to reflect this connection
> >       between loop1 and loop2.  */
> > -- 
> > 2.27.0.90.geebb51ba8c
> > 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/4] Fix loop split incorrect count and probability
  2021-10-27  7:07   ` Jan Hubicka
  2021-10-27  7:23     ` Jan Hubicka
@ 2021-10-27  7:29     ` Richard Biener
  2021-10-27  7:44       ` Jan Hubicka
  1 sibling, 1 reply; 15+ messages in thread
From: Richard Biener @ 2021-10-27  7:29 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Xionghu Luo, gcc-patches, segher, wschmidt, linkw, dje.gcc

On Wed, 27 Oct 2021, Jan Hubicka wrote:

> > 
> > gcc/ChangeLog:
> > 
> > 	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> > 	(do_split_loop_on_cond): Likewise.
> > ---
> >  gcc/tree-ssa-loop-split.c | 25 ++++++++++++++++---------
> >  1 file changed, 16 insertions(+), 9 deletions(-)
> > 
> > diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> > index 3f6ad046623..d30782888f3 100644
> > --- a/gcc/tree-ssa-loop-split.c
> > +++ b/gcc/tree-ssa-loop-split.c
> > @@ -575,7 +575,11 @@ split_loop (class loop *loop1)
> >  					    stmts2);
> >  	tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
> >  	if (!initial_true)
> > -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
> > +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> > +
> > +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> > +			   ? EDGE_SUCC (bbs[i], 0)
> > +			   : EDGE_SUCC (bbs[i], 1);
> >  
> >  	/* Now version the loop, placing loop2 after loop1 connecting
> >  	   them, and fix up SSA form for that.  */
> > @@ -583,10 +587,10 @@ split_loop (class loop *loop1)
> >  	basic_block cond_bb;
> >  
> >  	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> > -					   profile_probability::always (),
> > -					   profile_probability::always (),
> > -					   profile_probability::always (),
> > -					   profile_probability::always (),
> > +					   true_edge->probability,
> > +					   true_edge->probability.invert (),
> > +					   true_edge->probability,
> > +					   true_edge->probability.invert (),
> >  					   true);
> 
> As discussed yesterday, for loop of form
> 
> for (...)
>   if (cond)
>     cond = something();
>   else
>     something2
> 
> Split as

Note that you are missing to conditionalize loop1 execution
on 'cond' (not sure if that makes a difference).

> loop1:
if (cond)
> for (...)
>   if (true)
>     cond = something();
>     if (!cond)
>       break
>   else
>     something2 ();
> 
> loop2:
> for (...)
>   if (false)
>     cond = something();
>   else
>     something2 ();
> 
> If "if (cond)" has probability p, you want to scale loop1 by p
> and loop2 by 1-p as your patch does, but you need to exclude the basic
> blocks guarded by the condition.
> 
> One way is to break out loop_version and implement it inline, other
> option (perhaps leading to less code duplication) is to add argument listing
> basic blocks that should not be scaled, which would be set to both arms
> of the if.
> 
> Are there other profile patches of your I should look at?
> Honza
> >  	gcc_assert (loop2);
> >  
> > @@ -1486,10 +1490,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
> >    initialize_original_copy_tables ();
> >  
> >    struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> > -				     profile_probability::always (),
> > -				     profile_probability::never (),
> > -				     profile_probability::always (),
> > -				     profile_probability::always (),
> > +				     invar_branch->probability.invert (),
> > +				     invar_branch->probability,
> > +				     invar_branch->probability.invert (),
> > +				     invar_branch->probability,
> >  				     true);
> >    if (!loop2)
> >      {
> > @@ -1530,6 +1534,9 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
> >    to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
> >    to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
> >  
> > +  to_loop1->probability = invar_branch->probability.invert ();
> > +  to_loop2->probability = invar_branch->probability;
> > +
> >    /* Due to introduction of a control flow edge from loop1 latch to loop2
> >       pre-header, we should update PHIs in loop2 to reflect this connection
> >       between loop1 and loop2.  */
> > -- 
> > 2.27.0.90.geebb51ba8c
> > 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/4] Fix loop split incorrect count and probability
  2021-10-27  7:29     ` Richard Biener
@ 2021-10-27  7:44       ` Jan Hubicka
  2021-11-08  6:09         ` [PATCH v3 " Xionghu Luo
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Hubicka @ 2021-10-27  7:44 UTC (permalink / raw)
  To: Richard Biener; +Cc: Xionghu Luo, gcc-patches, segher, wschmidt, linkw, dje.gcc

> On Wed, 27 Oct 2021, Jan Hubicka wrote:
> 
> > > 
> > > gcc/ChangeLog:
> > > 
> > > 	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> > > 	(do_split_loop_on_cond): Likewise.
> > > ---
> > >  gcc/tree-ssa-loop-split.c | 25 ++++++++++++++++---------
> > >  1 file changed, 16 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> > > index 3f6ad046623..d30782888f3 100644
> > > --- a/gcc/tree-ssa-loop-split.c
> > > +++ b/gcc/tree-ssa-loop-split.c
> > > @@ -575,7 +575,11 @@ split_loop (class loop *loop1)
> > >  					    stmts2);
> > >  	tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
> > >  	if (!initial_true)
> > > -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
> > > +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> > > +
> > > +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> > > +			   ? EDGE_SUCC (bbs[i], 0)
> > > +			   : EDGE_SUCC (bbs[i], 1);
> > >  
> > >  	/* Now version the loop, placing loop2 after loop1 connecting
> > >  	   them, and fix up SSA form for that.  */
> > > @@ -583,10 +587,10 @@ split_loop (class loop *loop1)
> > >  	basic_block cond_bb;
> > >  
> > >  	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> > > -					   profile_probability::always (),
> > > -					   profile_probability::always (),
> > > -					   profile_probability::always (),
> > > -					   profile_probability::always (),
> > > +					   true_edge->probability,
> > > +					   true_edge->probability.invert (),
> > > +					   true_edge->probability,
> > > +					   true_edge->probability.invert (),
> > >  					   true);
> > 
> > As discussed yesterday, for loop of form
> > 
> > for (...)
> >   if (cond)
> >     cond = something();
> >   else
> >     something2
> > 
> > Split as
> 
> Note that you are missing to conditionalize loop1 execution
> on 'cond' (not sure if that makes a difference).
You are right - forgot to mention that.

Entry conditional makes no difference on scaling stmts inside loop but
affects its header and expected trip count. We however need to set up
probability of this conditional (and preheader count if it exists)
There is no general way to read the probability of this initial
conditional from cfg profile.  So I guess we are stuck with guessing
some arbitrary value. I guess common case is that cond is true first
iteration tough and often we can easily see that fromo PHI node
initializing the test variable.

Other thing that changes is expected number of iterations of the split
loops, so we may want to update the exit conditinal probability
accordingly...

Honza
> 
> > loop1:
> if (cond)
> > for (...)
> >   if (true)
> >     cond = something();
> >     if (!cond)
> >       break
> >   else
> >     something2 ();
> > 
> > loop2:
> > for (...)
> >   if (false)
> >     cond = something();
> >   else
> >     something2 ();
> > 
> > If "if (cond)" has probability p, you want to scale loop1 by p
> > and loop2 by 1-p as your patch does, but you need to exclude the basic
> > blocks guarded by the condition.
> > 
> > One way is to break out loop_version and implement it inline, other
> > option (perhaps leading to less code duplication) is to add argument listing
> > basic blocks that should not be scaled, which would be set to both arms
> > of the if.
> > 
> > Are there other profile patches of your I should look at?
> > Honza
> > >  	gcc_assert (loop2);
> > >  
> > > @@ -1486,10 +1490,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
> > >    initialize_original_copy_tables ();
> > >  
> > >    struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> > > -				     profile_probability::always (),
> > > -				     profile_probability::never (),
> > > -				     profile_probability::always (),
> > > -				     profile_probability::always (),
> > > +				     invar_branch->probability.invert (),
> > > +				     invar_branch->probability,
> > > +				     invar_branch->probability.invert (),
> > > +				     invar_branch->probability,
> > >  				     true);
> > >    if (!loop2)
> > >      {
> > > @@ -1530,6 +1534,9 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
> > >    to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
> > >    to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
> > >  
> > > +  to_loop1->probability = invar_branch->probability.invert ();
> > > +  to_loop2->probability = invar_branch->probability;
> > > +
> > >    /* Due to introduction of a control flow edge from loop1 latch to loop2
> > >       pre-header, we should update PHIs in loop2 to reflect this connection
> > >       between loop1 and loop2.  */
> > > -- 
> > > 2.27.0.90.geebb51ba8c
> > > 
> > 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] Rename duplicate_loop_to_header_edge to duplicate_loop_body_to_header_edge
  2021-10-27  6:34 ` [PATCH v2 4/4] Rename duplicate_loop_to_header_edge to duplicate_loop_body_to_header_edge Xionghu Luo
@ 2021-10-29 11:50   ` Richard Biener
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Biener @ 2021-10-29 11:50 UTC (permalink / raw)
  To: Xionghu Luo
  Cc: gcc-patches, segher, dje.gcc, wschmidt, guojiufu, linkw, hubicka

On Wed, 27 Oct 2021, Xionghu Luo wrote:

> gcc/ChangeLog:
> 
> 	* cfghooks.c (cfg_hook_duplicate_loop_to_header_edge): Rename
> 	duplicate_loop_to_header_edge to
> 	duplicate_loop_body_to_header_edge.
> 	(cfg_hook_duplicate_loop_body_to_header_edge): Likewise.
> 	* cfghooks.h (struct cfg_hooks): Likewise.
> 	(cfg_hook_duplicate_loop_body_to_header_edge): Likewise.
> 	* cfgloopmanip.c (duplicate_loop_body_to_header_edge): Likewise.
> 	(clone_loop_to_header_edge): Likewise.
> 	* cfgloopmanip.h (duplicate_loop_body_to_header_edge): Likewise.
> 	* cfgrtl.c (struct cfg_hooks): Likewise.
> 	* doc/loop.texi: Likewise.
> 	* loop-unroll.c (unroll_loop_constant_iterations): Likewise.
> 	(unroll_loop_runtime_iterations): Likewise.
> 	(unroll_loop_stupid): Likewise.
> 	(apply_opt_in_copies): Likewise.
> 	* tree-cfg.c (struct cfg_hooks): Likewise.
> 	* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Likewise.
> 	(try_peel_loop): Likewise.
> 	* tree-ssa-loop-manip.c (copy_phi_node_args): Likewise.
> 	(gimple_duplicate_loop_body_to_header_edge): Likewise.
> 	(tree_transform_and_unroll_loop): Likewise.
> 	* tree-ssa-loop-manip.h (gimple_duplicate_loop_body_to_header_edge):
> 	Likewise.

This renaming is OK (you can commit it independently).

Thanks,
Richard.

> ---
>  gcc/cfghooks.c              | 27 ++++++++++++---------------
>  gcc/cfghooks.h              | 13 ++++++-------
>  gcc/cfgloopmanip.c          |  9 ++++-----
>  gcc/cfgloopmanip.h          |  6 +++---
>  gcc/cfgrtl.c                |  2 +-
>  gcc/doc/loop.texi           |  4 ++--
>  gcc/loop-unroll.c           | 27 ++++++++++++---------------
>  gcc/tree-cfg.c              |  2 +-
>  gcc/tree-ssa-loop-ivcanon.c |  4 ++--
>  gcc/tree-ssa-loop-manip.c   | 22 ++++++++++++----------
>  gcc/tree-ssa-loop-manip.h   |  7 +++----
>  11 files changed, 58 insertions(+), 65 deletions(-)
> 
> diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c
> index 50b9b177639..23eb364bee6 100644
> --- a/gcc/cfghooks.c
> +++ b/gcc/cfghooks.c
> @@ -1226,25 +1226,22 @@ lv_flush_pending_stmts (edge e)
>      cfg_hooks->flush_pending_stmts (e);
>  }
>  
> -/* Loop versioning uses the duplicate_loop_to_header_edge to create
> +/* Loop versioning uses the duplicate_loop_body_to_header_edge to create
>     a new version of the loop basic-blocks, the parameters here are
> -   exactly the same as in duplicate_loop_to_header_edge or
> -   tree_duplicate_loop_to_header_edge; while in tree-ssa there is
> +   exactly the same as in duplicate_loop_body_to_header_edge or
> +   tree_duplicate_loop_body_to_header_edge; while in tree-ssa there is
>     additional work to maintain ssa information that's why there is
> -   a need to call the tree_duplicate_loop_to_header_edge rather
> -   than duplicate_loop_to_header_edge when we are in tree mode.  */
> +   a need to call the tree_duplicate_loop_body_to_header_edge rather
> +   than duplicate_loop_body_to_header_edge when we are in tree mode.  */
>  bool
> -cfg_hook_duplicate_loop_to_header_edge (class loop *loop, edge e,
> -					unsigned int ndupl,
> -					sbitmap wont_exit, edge orig,
> -					vec<edge> *to_remove,
> -					int flags)
> +cfg_hook_duplicate_loop_body_to_header_edge (class loop *loop, edge e,
> +					     unsigned int ndupl,
> +					     sbitmap wont_exit, edge orig,
> +					     vec<edge> *to_remove, int flags)
>  {
> -  gcc_assert (cfg_hooks->cfg_hook_duplicate_loop_to_header_edge);
> -  return cfg_hooks->cfg_hook_duplicate_loop_to_header_edge (loop, e,
> -							    ndupl, wont_exit,
> -							    orig, to_remove,
> -							    flags);
> +  gcc_assert (cfg_hooks->cfg_hook_duplicate_loop_body_to_header_edge);
> +  return cfg_hooks->cfg_hook_duplicate_loop_body_to_header_edge (
> +    loop, e, ndupl, wont_exit, orig, to_remove, flags);
>  }
>  
>  /* Conditional jumps are represented differently in trees and RTL,
> diff --git a/gcc/cfghooks.h b/gcc/cfghooks.h
> index 8645fe5b9e7..29aa2bf0636 100644
> --- a/gcc/cfghooks.h
> +++ b/gcc/cfghooks.h
> @@ -166,7 +166,7 @@ struct cfg_hooks
>  
>    /* A hook for duplicating loop in CFG, currently this is used
>       in loop versioning.  */
> -  bool (*cfg_hook_duplicate_loop_to_header_edge) (class loop *, edge,
> +  bool (*cfg_hook_duplicate_loop_body_to_header_edge) (class loop *, edge,
>  						  unsigned, sbitmap,
>  						  edge, vec<edge> *,
>  						  int);
> @@ -250,12 +250,11 @@ extern bool block_ends_with_condjump_p (const_basic_block bb);
>  extern int flow_call_edges_add (sbitmap);
>  extern void execute_on_growing_pred (edge);
>  extern void execute_on_shrinking_pred (edge);
> -extern bool cfg_hook_duplicate_loop_to_header_edge (class loop *loop, edge,
> -						    unsigned int ndupl,
> -						    sbitmap wont_exit,
> -						    edge orig,
> -						    vec<edge> *to_remove,
> -						    int flags);
> +extern bool
> +cfg_hook_duplicate_loop_body_to_header_edge (class loop *loop, edge,
> +					     unsigned int ndupl,
> +					     sbitmap wont_exit, edge orig,
> +					     vec<edge> *to_remove, int flags);
>  
>  extern void lv_flush_pending_stmts (edge);
>  extern void extract_cond_bb_edges (basic_block, edge *, edge*);
> diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
> index 066fbddbcfe..455c3ef8db9 100644
> --- a/gcc/cfgloopmanip.c
> +++ b/gcc/cfgloopmanip.c
> @@ -1059,10 +1059,9 @@ can_duplicate_loop_p (const class loop *loop)
>     impossible.  */
>  
>  bool
> -duplicate_loop_to_header_edge (class loop *loop, edge e,
> -			       unsigned int ndupl, sbitmap wont_exit,
> -			       edge orig, vec<edge> *to_remove,
> -			       int flags)
> +duplicate_loop_body_to_header_edge (class loop *loop, edge e,
> +				    unsigned int ndupl, sbitmap wont_exit,
> +				    edge orig, vec<edge> *to_remove, int flags)
>  {
>    class loop *target, *aloop;
>    class loop **orig_loops;
> @@ -1630,7 +1629,7 @@ clone_loop_to_header_edge (class loop *loop, void *cond_expr,
>    first_head = entry->dest;
>  
>    /* 1) Duplicate loop on the entry edge.  */
> -  if (!cfg_hook_duplicate_loop_to_header_edge (loop, entry, 1,
> +  if (!cfg_hook_duplicate_loop_body_to_header_edge (loop, entry, 1,
>  					       NULL, NULL, NULL, 0))
>      {
>        entry->flags |= irred_flag;
> diff --git a/gcc/cfgloopmanip.h b/gcc/cfgloopmanip.h
> index eac09518702..42ba0689825 100644
> --- a/gcc/cfgloopmanip.h
> +++ b/gcc/cfgloopmanip.h
> @@ -48,9 +48,9 @@ extern class loop * duplicate_loop (class loop *, class loop *,
>  				     class loop * = NULL);
>  extern void duplicate_subloops (class loop *, class loop *);
>  extern bool can_duplicate_loop_p (const class loop *loop);
> -extern bool duplicate_loop_to_header_edge (class loop *, edge,
> -					   unsigned, sbitmap, edge,
> - 					   vec<edge> *, int);
> +extern bool
> +duplicate_loop_body_to_header_edge (class loop *, edge, unsigned, sbitmap, edge,
> +				    vec<edge> *, int);
>  extern bool mfb_keep_just (edge);
>  basic_block create_preheader (class loop *, int);
>  extern void create_preheaders (int);
> diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
> index 4f3b1e8f3dc..e3a724bddb4 100644
> --- a/gcc/cfgrtl.c
> +++ b/gcc/cfgrtl.c
> @@ -5344,7 +5344,7 @@ struct cfg_hooks cfg_layout_rtl_cfg_hooks = {
>    rtl_flow_call_edges_add,
>    NULL, /* execute_on_growing_pred */
>    NULL, /* execute_on_shrinking_pred */
> -  duplicate_loop_to_header_edge, /* duplicate loop for trees */
> +  duplicate_loop_body_to_header_edge, /* duplicate loop for trees */
>    rtl_lv_add_condition_to_bb, /* lv_add_condition_to_bb */
>    NULL, /* lv_adjust_loop_header_phi*/
>    rtl_extract_cond_bb_edges, /* extract_cond_bb_edges */
> diff --git a/gcc/doc/loop.texi b/gcc/doc/loop.texi
> index 94eed6720b1..29a580066d6 100644
> --- a/gcc/doc/loop.texi
> +++ b/gcc/doc/loop.texi
> @@ -249,8 +249,8 @@ are only reliable for the innermost loops:
>  @item @code{create_iv}: Creates a new induction variable.  Only works on
>  GIMPLE@.  @code{standard_iv_increment_position} can be used to find a
>  suitable place for the iv increment.
> -@item @code{duplicate_loop_to_header_edge},
> -@code{tree_duplicate_loop_to_header_edge}: These functions (on RTL and
> +@item @code{duplicate_loop_body_to_header_edge},
> +@code{tree_duplicate_loop_body_to_header_edge}: These functions (on RTL and
>  on GIMPLE) duplicate the body of the loop prescribed number of times on
>  one of the edges entering loop header, thus performing either loop
>  unrolling or loop peeling.  @code{can_duplicate_loop_p}
> diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
> index 2b31fafa3a3..f554ebb8450 100644
> --- a/gcc/loop-unroll.c
> +++ b/gcc/loop-unroll.c
> @@ -520,14 +520,11 @@ unroll_loop_constant_iterations (class loop *loop)
>        if (exit_mod)
>  	{
>  	  opt_info_start_duplication (opt_info);
> -          ok = duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
> -					      exit_mod,
> -					      wont_exit, desc->out_edge,
> -					      &remove_edges,
> -					      DLTHE_FLAG_UPDATE_FREQ
> -					      | (opt_info && exit_mod > 1
> -						 ? DLTHE_RECORD_COPY_NUMBER
> -						   : 0));
> +	  ok = duplicate_loop_body_to_header_edge (
> +	    loop, loop_preheader_edge (loop), exit_mod, wont_exit,
> +	    desc->out_edge, &remove_edges,
> +	    DLTHE_FLAG_UPDATE_FREQ
> +	      | (opt_info && exit_mod > 1 ? DLTHE_RECORD_COPY_NUMBER : 0));
>  	  gcc_assert (ok);
>  
>            if (opt_info && exit_mod > 1)
> @@ -569,7 +566,7 @@ unroll_loop_constant_iterations (class loop *loop)
>  	    bitmap_clear_bit (wont_exit, 1);
>  
>            opt_info_start_duplication (opt_info);
> -	  ok = duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
> +	  ok = duplicate_loop_body_to_header_edge (loop, loop_preheader_edge (loop),
>  					      exit_mod + 1,
>  					      wont_exit, desc->out_edge,
>  					      &remove_edges,
> @@ -606,7 +603,7 @@ unroll_loop_constant_iterations (class loop *loop)
>    /* Now unroll the loop.  */
>  
>    opt_info_start_duplication (opt_info);
> -  ok = duplicate_loop_to_header_edge (loop, loop_latch_edge (loop),
> +  ok = duplicate_loop_body_to_header_edge (loop, loop_latch_edge (loop),
>  				      max_unroll,
>  				      wont_exit, desc->out_edge,
>  				      &remove_edges,
> @@ -975,7 +972,7 @@ unroll_loop_runtime_iterations (class loop *loop)
>        if (!desc->noloop_assumptions)
>  	bitmap_set_bit (wont_exit, 1);
>        ezc_swtch = loop_preheader_edge (loop)->src;
> -      ok = duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
> +      ok = duplicate_loop_body_to_header_edge (loop, loop_preheader_edge (loop),
>  					  1, wont_exit, desc->out_edge,
>  					  &remove_edges,
>  					  DLTHE_FLAG_UPDATE_FREQ);
> @@ -997,7 +994,7 @@ unroll_loop_runtime_iterations (class loop *loop)
>        bitmap_clear (wont_exit);
>        if (i != n_peel - 1 || !last_may_exit)
>  	bitmap_set_bit (wont_exit, 1);
> -      ok = duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
> +      ok = duplicate_loop_body_to_header_edge (loop, loop_preheader_edge (loop),
>  					  1, wont_exit, desc->out_edge,
>  					  &remove_edges,
>  					  DLTHE_FLAG_UPDATE_FREQ);
> @@ -1061,7 +1058,7 @@ unroll_loop_runtime_iterations (class loop *loop)
>    bitmap_clear_bit (wont_exit, may_exit_copy);
>    opt_info_start_duplication (opt_info);
>  
> -  ok = duplicate_loop_to_header_edge (loop, loop_latch_edge (loop),
> +  ok = duplicate_loop_body_to_header_edge (loop, loop_latch_edge (loop),
>  				      max_unroll,
>  				      wont_exit, desc->out_edge,
>  				      &remove_edges,
> @@ -1255,7 +1252,7 @@ unroll_loop_stupid (class loop *loop)
>    bitmap_clear (wont_exit);
>    opt_info_start_duplication (opt_info);
>  
> -  ok = duplicate_loop_to_header_edge (loop, loop_latch_edge (loop),
> +  ok = duplicate_loop_body_to_header_edge (loop, loop_latch_edge (loop),
>  				      nunroll, wont_exit,
>  				      NULL, NULL,
>  				      DLTHE_FLAG_UPDATE_FREQ
> @@ -2019,7 +2016,7 @@ apply_opt_in_copies (struct opt_info *opt_info,
>        orig_bb = get_bb_original (bb);
>  
>        /* bb->aux holds position in copy sequence initialized by
> -	 duplicate_loop_to_header_edge.  */
> +	 duplicate_loop_body_to_header_edge.  */
>        delta = determine_split_iv_delta ((size_t)bb->aux, n_copies,
>  					unrolling);
>        bb->aux = 0;
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index 9883eaaa9bf..d0177cb1c07 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -9072,7 +9072,7 @@ struct cfg_hooks gimple_cfg_hooks = {
>    gimple_flow_call_edges_add,   /* flow_call_edges_add */
>    gimple_execute_on_growing_pred,	/* execute_on_growing_pred */
>    gimple_execute_on_shrinking_pred, /* execute_on_shrinking_pred */
> -  gimple_duplicate_loop_to_header_edge, /* duplicate loop for trees */
> +  gimple_duplicate_loop_body_to_header_edge, /* duplicate loop for trees */
>    gimple_lv_add_condition_to_bb, /* lv_add_condition_to_bb */
>    gimple_lv_adjust_loop_header_phi, /* lv_adjust_loop_header_phi*/
>    extract_true_false_edges_from_block, /* extract_cond_bb_edges */
> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
> index 8d8791f837e..a9107ef5ffb 100644
> --- a/gcc/tree-ssa-loop-ivcanon.c
> +++ b/gcc/tree-ssa-loop-ivcanon.c
> @@ -903,7 +903,7 @@ try_unroll_loop_completely (class loop *loop,
>        if (may_be_zero)
>  	bitmap_clear_bit (wont_exit, 1);
>  
> -      if (!gimple_duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
> +      if (!gimple_duplicate_loop_body_to_header_edge (loop, loop_preheader_edge (loop),
>  						 n_unroll, wont_exit,
>  						 exit, &edges_to_remove,
>  						 DLTHE_FLAG_UPDATE_FREQ
> @@ -1094,7 +1094,7 @@ try_peel_loop (class loop *loop,
>      }
>    if (may_be_zero)
>      bitmap_clear_bit (wont_exit, 1);
> -  if (!gimple_duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
> +  if (!gimple_duplicate_loop_body_to_header_edge (loop, loop_preheader_edge (loop),
>  					     npeel, wont_exit,
>  					     exit, &edges_to_remove,
>  					     DLTHE_FLAG_UPDATE_FREQ))
> diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
> index 350e25bb8d2..b60da4fb084 100644
> --- a/gcc/tree-ssa-loop-manip.c
> +++ b/gcc/tree-ssa-loop-manip.c
> @@ -911,7 +911,7 @@ copy_phi_node_args (unsigned first_new_block)
>  }
>  
>  
> -/* The same as cfgloopmanip.c:duplicate_loop_to_header_edge, but also
> +/* The same as cfgloopmanip.c:duplicate_loop_body_to_header_edge, but also
>     updates the PHI nodes at start of the copied region.  In order to
>     achieve this, only loops whose exits all lead to the same location
>     are handled.
> @@ -921,10 +921,10 @@ copy_phi_node_args (unsigned first_new_block)
>     after the loop has been duplicated.  */
>  
>  bool
> -gimple_duplicate_loop_to_header_edge (class loop *loop, edge e,
> -				    unsigned int ndupl, sbitmap wont_exit,
> -				    edge orig, vec<edge> *to_remove,
> -				    int flags)
> +gimple_duplicate_loop_body_to_header_edge (class loop *loop, edge e,
> +					   unsigned int ndupl,
> +					   sbitmap wont_exit, edge orig,
> +					   vec<edge> *to_remove, int flags)
>  {
>    unsigned first_new_block;
>  
> @@ -934,8 +934,8 @@ gimple_duplicate_loop_to_header_edge (class loop *loop, edge e,
>      return false;
>  
>    first_new_block = last_basic_block_for_fn (cfun);
> -  if (!duplicate_loop_to_header_edge (loop, e, ndupl, wont_exit,
> -				      orig, to_remove, flags))
> +  if (!duplicate_loop_body_to_header_edge (loop, e, ndupl, wont_exit, orig,
> +					   to_remove, flags))
>      return false;
>  
>    /* Readd the removed phi args for e.  */
> @@ -1390,9 +1390,11 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
>    bitmap_clear_bit (wont_exit, factor - 1);
>  
>    auto_vec<edge> to_remove;
> -  bool ok = gimple_duplicate_loop_to_header_edge
> -	  (loop, loop_latch_edge (loop), factor - 1,
> -	   wont_exit, new_exit, &to_remove, DLTHE_FLAG_UPDATE_FREQ);
> +  bool ok
> +    = gimple_duplicate_loop_body_to_header_edge (loop, loop_latch_edge (loop),
> +						 factor - 1, wont_exit,
> +						 new_exit, &to_remove,
> +						 DLTHE_FLAG_UPDATE_FREQ);
>    gcc_assert (ok);
>  
>    for (edge e : to_remove)
> diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
> index 86fc118b6be..792c366701b 100644
> --- a/gcc/tree-ssa-loop-manip.h
> +++ b/gcc/tree-ssa-loop-manip.h
> @@ -42,10 +42,9 @@ extern basic_block ip_end_pos (class loop *);
>  extern basic_block ip_normal_pos (class loop *);
>  extern void standard_iv_increment_position (class loop *,
>  					    gimple_stmt_iterator *, bool *);
> -extern bool gimple_duplicate_loop_to_header_edge (class loop *, edge,
> -						  unsigned int, sbitmap,
> -						  edge, vec<edge> *,
> -						  int);
> +extern bool
> +gimple_duplicate_loop_body_to_header_edge (class loop *, edge, unsigned int,
> +					   sbitmap, edge, vec<edge> *, int);
>  extern bool can_unroll_loop_p (class loop *loop, unsigned factor,
>  			       class tree_niter_desc *niter);
>  extern gcov_type niter_for_unrolled_loop (class loop *, unsigned);
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/4] Refactor loop_version
  2021-10-27  6:34 ` [PATCH v2 2/4] Refactor loop_version Xionghu Luo
@ 2021-10-29 11:52   ` Richard Biener
  2021-11-01  5:28     ` Xionghu Luo
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Biener @ 2021-10-29 11:52 UTC (permalink / raw)
  To: Xionghu Luo
  Cc: gcc-patches, segher, dje.gcc, wschmidt, guojiufu, linkw, hubicka

On Wed, 27 Oct 2021, Xionghu Luo wrote:

> loop_version currently does lv_adjust_loop_entry_edge
> before it loopifys the copy inserted on the header.  This patch moves
> the condition generation later and thus we have four pieces to help
> understanding of how the adjustment works:
>  1) duplicating the loop on the entry edge.
>  2) loopify the duplicated new loop.
>  3) adjusting the CFG to insert a condition branching to either loop
>  with lv_adjust_loop_entry_edge.
>  4) From loopify extract the scale_loop_frequencies bits.
> 
> Also removed some piece of code seems obviously useless which is not
> completely sure:
>  - redirect_all_edges since it is false and loopify only called once.
>  - extract_cond_bb_edges and lv_flush_pending_stmts (false_edge) as the
>  edge is not redirected actually.

This is OK (you can also commit this independently), thanks for the
cleanup.

Richard.

> gcc/ChangeLog:
> 
> 	* cfgloopmanip.c (loop_version): Refactor loopify to
> 	loop_version.  Move condition generation after loopify.
> 	(loopify): Delete.
> 	* cfgloopmanip.h (loopify): Delete.
> ---
>  gcc/cfgloopmanip.c | 113 ++++++++++++---------------------------------
>  gcc/cfgloopmanip.h |   3 --
>  2 files changed, 29 insertions(+), 87 deletions(-)
> 
> diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
> index 82c242dd720..a30ebe1cdb4 100644
> --- a/gcc/cfgloopmanip.c
> +++ b/gcc/cfgloopmanip.c
> @@ -846,71 +846,6 @@ create_empty_loop_on_edge (edge entry_edge,
>    return loop;
>  }
>  
> -/* Make area between HEADER_EDGE and LATCH_EDGE a loop by connecting
> -   latch to header and update loop tree and dominators
> -   accordingly. Everything between them plus LATCH_EDGE destination must
> -   be dominated by HEADER_EDGE destination, and back-reachable from
> -   LATCH_EDGE source.  HEADER_EDGE is redirected to basic block SWITCH_BB,
> -   FALSE_EDGE of SWITCH_BB to original destination of HEADER_EDGE and
> -   TRUE_EDGE of SWITCH_BB to original destination of LATCH_EDGE.
> -   Returns the newly created loop.  Frequencies and counts in the new loop
> -   are scaled by FALSE_SCALE and in the old one by TRUE_SCALE.  */
> -
> -class loop *
> -loopify (edge latch_edge, edge header_edge,
> -	 basic_block switch_bb, edge true_edge, edge false_edge,
> -	 bool redirect_all_edges, profile_probability true_scale,
> -	 profile_probability false_scale)
> -{
> -  basic_block succ_bb = latch_edge->dest;
> -  basic_block pred_bb = header_edge->src;
> -  class loop *loop = alloc_loop ();
> -  class loop *outer = loop_outer (succ_bb->loop_father);
> -  profile_count cnt;
> -
> -  loop->header = header_edge->dest;
> -  loop->latch = latch_edge->src;
> -
> -  cnt = header_edge->count ();
> -
> -  /* Redirect edges.  */
> -  loop_redirect_edge (latch_edge, loop->header);
> -  loop_redirect_edge (true_edge, succ_bb);
> -
> -  /* During loop versioning, one of the switch_bb edge is already properly
> -     set. Do not redirect it again unless redirect_all_edges is true.  */
> -  if (redirect_all_edges)
> -    {
> -      loop_redirect_edge (header_edge, switch_bb);
> -      loop_redirect_edge (false_edge, loop->header);
> -
> -      /* Update dominators.  */
> -      set_immediate_dominator (CDI_DOMINATORS, switch_bb, pred_bb);
> -      set_immediate_dominator (CDI_DOMINATORS, loop->header, switch_bb);
> -    }
> -
> -  set_immediate_dominator (CDI_DOMINATORS, succ_bb, switch_bb);
> -
> -  /* Compute new loop.  */
> -  add_loop (loop, outer);
> -
> -  /* Add switch_bb to appropriate loop.  */
> -  if (switch_bb->loop_father)
> -    remove_bb_from_loops (switch_bb);
> -  add_bb_to_loop (switch_bb, outer);
> -
> -  /* Fix counts.  */
> -  if (redirect_all_edges)
> -    {
> -      switch_bb->count = cnt;
> -    }
> -  scale_loop_frequencies (loop, false_scale);
> -  scale_loop_frequencies (succ_bb->loop_father, true_scale);
> -  update_dominators_in_loop (loop);
> -
> -  return loop;
> -}
> -
>  /* Remove the latch edge of a LOOP and update loops to indicate that
>     the LOOP was removed.  After this function, original loop latch will
>     have no successor, which caller is expected to fix somehow.
> @@ -1681,7 +1616,7 @@ loop_version (class loop *loop,
>  	      bool place_after)
>  {
>    basic_block first_head, second_head;
> -  edge entry, latch_edge, true_edge, false_edge;
> +  edge entry, latch_edge;
>    int irred_flag;
>    class loop *nloop;
>    basic_block cond_bb;
> @@ -1694,7 +1629,7 @@ loop_version (class loop *loop,
>    /* Note down head of loop as first_head.  */
>    first_head = entry->dest;
>  
> -  /* Duplicate loop.  */
> +  /* 1) Duplicate loop on the entry edge.  */
>    if (!cfg_hook_duplicate_loop_to_header_edge (loop, entry, 1,
>  					       NULL, NULL, NULL, 0))
>      {
> @@ -1702,11 +1637,28 @@ loop_version (class loop *loop,
>        return NULL;
>      }
>  
> +  /* 2) loopify the duplicated new loop. */
> +  latch_edge = single_succ_edge (get_bb_copy (loop->latch));
> +  nloop = alloc_loop ();
> +  class loop *outer = loop_outer (latch_edge->dest->loop_father);
> +  edge new_header_edge = single_pred_edge (get_bb_copy (loop->header));
> +  nloop->header = new_header_edge->dest;
> +  nloop->latch = latch_edge->src;
> +  loop_redirect_edge (latch_edge, nloop->header);
> +
> +  /* Compute new loop.  */
> +  add_loop (nloop, outer);
> +  copy_loop_info (loop, nloop);
> +  set_loop_copy (loop, nloop);
> +
> +  /* loopify redirected latch_edge. Update its PENDING_STMTS.  */
> +  lv_flush_pending_stmts (latch_edge);
> +
>    /* After duplication entry edge now points to new loop head block.
>       Note down new head as second_head.  */
>    second_head = entry->dest;
>  
> -  /* Split loop entry edge and insert new block with cond expr.  */
> +  /* 3) Split loop entry edge and insert new block with cond expr.  */
>    cond_bb =  lv_adjust_loop_entry_edge (first_head, second_head,
>  					entry, cond_expr, then_prob, else_prob);
>    if (condition_bb)
> @@ -1718,24 +1670,17 @@ loop_version (class loop *loop,
>        return NULL;
>      }
>  
> -  latch_edge = single_succ_edge (get_bb_copy (loop->latch));
> -
> -  extract_cond_bb_edges (cond_bb, &true_edge, &false_edge);
> -  nloop = loopify (latch_edge,
> -		   single_pred_edge (get_bb_copy (loop->header)),
> -		   cond_bb, true_edge, false_edge,
> -		   false /* Do not redirect all edges.  */,
> -		   then_scale, else_scale);
> -
> -  copy_loop_info (loop, nloop);
> -  set_loop_copy (loop, nloop);
> +  /* Add cond_bb to appropriate loop.  */
> +  if (cond_bb->loop_father)
> +    remove_bb_from_loops (cond_bb);
> +  add_bb_to_loop (cond_bb, outer);
>  
> -  /* loopify redirected latch_edge. Update its PENDING_STMTS.  */
> -  lv_flush_pending_stmts (latch_edge);
> +  /* 4) Scale the original loop and new loop frequency.  */
> +  scale_loop_frequencies (loop, then_scale);
> +  scale_loop_frequencies (nloop, else_scale);
> +  update_dominators_in_loop (loop);
> +  update_dominators_in_loop (nloop);
>  
> -  /* loopify redirected condition_bb's succ edge. Update its PENDING_STMTS.  */
> -  extract_cond_bb_edges (cond_bb, &true_edge, &false_edge);
> -  lv_flush_pending_stmts (false_edge);
>    /* Adjust irreducible flag.  */
>    if (irred_flag)
>      {
> diff --git a/gcc/cfgloopmanip.h b/gcc/cfgloopmanip.h
> index 07d5f925b79..312a3b48d05 100644
> --- a/gcc/cfgloopmanip.h
> +++ b/gcc/cfgloopmanip.h
> @@ -42,9 +42,6 @@ extern void scale_loop_profile (class loop *, profile_probability, gcov_type);
>  extern edge create_empty_if_region_on_edge (edge, tree);
>  extern class loop *create_empty_loop_on_edge (edge, tree, tree, tree, tree,
>  					       tree *, tree *, class loop *);
> -extern class loop *loopify (edge, edge,
> -			     basic_block, edge, edge, bool,
> -			     profile_probability, profile_probability);
>  extern void unloop (class loop *, bool *, bitmap);
>  extern void copy_loop_info (class loop *loop, class loop *target);
>  extern class loop * duplicate_loop (class loop *, class loop *,
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/4] Refactor loop_version
  2021-10-29 11:52   ` Richard Biener
@ 2021-11-01  5:28     ` Xionghu Luo
  0 siblings, 0 replies; 15+ messages in thread
From: Xionghu Luo @ 2021-11-01  5:28 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, segher, dje.gcc, wschmidt, guojiufu, linkw, hubicka



On 2021/10/29 19:52, Richard Biener wrote:
> On Wed, 27 Oct 2021, Xionghu Luo wrote:
> 
>> loop_version currently does lv_adjust_loop_entry_edge
>> before it loopifys the copy inserted on the header.  This patch moves
>> the condition generation later and thus we have four pieces to help
>> understanding of how the adjustment works:
>>  1) duplicating the loop on the entry edge.
>>  2) loopify the duplicated new loop.
>>  3) adjusting the CFG to insert a condition branching to either loop
>>  with lv_adjust_loop_entry_edge.
>>  4) From loopify extract the scale_loop_frequencies bits.
>>
>> Also removed some piece of code seems obviously useless which is not
>> completely sure:
>>  - redirect_all_edges since it is false and loopify only called once.
>>  - extract_cond_bb_edges and lv_flush_pending_stmts (false_edge) as the
>>  edge is not redirected actually.
> 
> This is OK (you can also commit this independently), thanks for the
> cleanup.

Thanks, committed this and [PATCH v2 4/4] to r12-4818 and r12-4819.

-- 
Thanks,
Xionghu

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/4] Rename loop_version to clone_loop_to_header_edge.
  2021-10-27  6:34 ` [PATCH v2 3/4] Rename loop_version to clone_loop_to_header_edge Xionghu Luo
@ 2021-11-03 13:36   ` Richard Biener
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Biener @ 2021-11-03 13:36 UTC (permalink / raw)
  To: Xionghu Luo
  Cc: gcc-patches, segher, dje.gcc, wschmidt, guojiufu, linkw, hubicka

On Wed, 27 Oct 2021, Xionghu Luo wrote:

> Name loop_copy is used in gcc/cfg.c already.

But it's not actually cloning the loop to the header edge but doing
more - adding a condition and re-wiring edges so the cloned loop
is no longer on the header edge.

So no, I don't think this is good to go.  If we can make use
of the part of loop_version that does this we should instead
split out a new API but keep loop_version with its name but
using clone_loop_to_header_edge then.

But we can leave that when we have use of such API.  I originally
thought loop splitting might be able to use such API to make
the transform it does clearer - but whether that's the case is
left as an exercise to the reader.

Thanks,
Richard.

> gcc/ChangeLog:
> 
> 	* cfgloopmanip.c (force_single_succ_latches): Rename
> 	loop_version to clone_loop_to_header_edge.
> 	(lv_adjust_loop_entry_edge): Likewise.
> 	(loop_version): Likewise.
> 	(clone_loop_to_header_edge): Likewise.
> 	* cfgloopmanip.h (class loop): Likewise.
> 	(clone_loop_to_header_edge): Likewise.
> 	* gimple-loop-versioning.cc (loop_versioning::version_loop): Likewise.
> 	* modulo-sched.c (sms_schedule): Likewise.
> 	* tree-if-conv.c (version_loop_for_if_conversion): Likewise.
> 	* tree-loop-distribution.c (version_loop_by_alias_check): Likewise.
> 	* tree-parloops.c (gen_parallel_loop): Likewise.
> 	* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Likewise.
> 	* tree-ssa-loop-split.c (split_loop): Likewise.
> 	(get_cond_branch_to_split_loop): Likewise.
> 	(do_split_loop_on_cond): Likewise.
> 	* tree-ssa-loop-unswitch.c (tree_unswitch_loop): Likewise.
> 	* tree-vect-loop-manip.c (vect_loop_versioning): Likewise.
> ---
>  gcc/cfgloopmanip.c            | 22 +++++++++++-----------
>  gcc/cfgloopmanip.h            |  8 ++++----
>  gcc/gimple-loop-versioning.cc | 11 ++++++-----
>  gcc/modulo-sched.c            |  6 +++---
>  gcc/tree-if-conv.c            | 13 +++++++------
>  gcc/tree-loop-distribution.c  |  4 ++--
>  gcc/tree-parloops.c           | 10 +++++-----
>  gcc/tree-ssa-loop-manip.c     |  9 +++++----
>  gcc/tree-ssa-loop-split.c     | 30 +++++++++++++++---------------
>  gcc/tree-ssa-loop-unswitch.c  |  8 +++-----
>  gcc/tree-vect-loop-manip.c    |  5 +++--
>  11 files changed, 64 insertions(+), 62 deletions(-)
> 
> diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
> index a30ebe1cdb4..066fbddbcfe 100644
> --- a/gcc/cfgloopmanip.c
> +++ b/gcc/cfgloopmanip.c
> @@ -1535,11 +1535,10 @@ force_single_succ_latches (void)
>    loops_state_set (LOOPS_HAVE_SIMPLE_LATCHES);
>  }
>  
> -/* This function is called from loop_version.  It splits the entry edge
> -   of the loop we want to version, adds the versioning condition, and
> -   adjust the edges to the two versions of the loop appropriately.
> -   e is an incoming edge. Returns the basic block containing the
> -   condition.
> +/* This function is called from clone_loop_to_header_edge.  It splits the entry
> +  edge of the loop we want to version, adds the versioning condition, and adjust
> +  the edges to the two versions of the loop appropriately. e is an incoming
> +  edge. Returns the basic block containing the condition.
>  
>     --- edge e ---- > [second_head]
>  
> @@ -1588,7 +1587,7 @@ lv_adjust_loop_entry_edge (basic_block first_head, basic_block second_head,
>    return new_head;
>  }
>  
> -/* Main entry point for Loop Versioning transformation.
> +/* Main entry point for Loop copy transformation.
>  
>     This transformation given a condition and a loop, creates
>     -if (condition) { loop_copy1 } else { loop_copy2 },
> @@ -1609,11 +1608,12 @@ lv_adjust_loop_entry_edge (basic_block first_head, basic_block second_head,
>     instruction stream, otherwise it is placed before LOOP.  */
>  
>  class loop *
> -loop_version (class loop *loop,
> -	      void *cond_expr, basic_block *condition_bb,
> -	      profile_probability then_prob, profile_probability else_prob,
> -	      profile_probability then_scale, profile_probability else_scale,
> -	      bool place_after)
> +clone_loop_to_header_edge (class loop *loop, void *cond_expr,
> +			   basic_block *condition_bb,
> +			   profile_probability then_prob,
> +			   profile_probability else_prob,
> +			   profile_probability then_scale,
> +			   profile_probability else_scale, bool place_after)
>  {
>    basic_block first_head, second_head;
>    edge entry, latch_edge;
> diff --git a/gcc/cfgloopmanip.h b/gcc/cfgloopmanip.h
> index 312a3b48d05..eac09518702 100644
> --- a/gcc/cfgloopmanip.h
> +++ b/gcc/cfgloopmanip.h
> @@ -55,9 +55,9 @@ extern bool mfb_keep_just (edge);
>  basic_block create_preheader (class loop *, int);
>  extern void create_preheaders (int);
>  extern void force_single_succ_latches (void);
> -class loop * loop_version (class loop *, void *,
> -			    basic_block *,
> -			    profile_probability, profile_probability,
> -			    profile_probability, profile_probability, bool);
> +class loop *
> +clone_loop_to_header_edge (class loop *, void *, basic_block *,
> +			   profile_probability, profile_probability,
> +			   profile_probability, profile_probability, bool);
>  
>  #endif /* GCC_CFGLOOPMANIP_H */
> diff --git a/gcc/gimple-loop-versioning.cc b/gcc/gimple-loop-versioning.cc
> index 15e0803dc29..c0fa60b91f8 100644
> --- a/gcc/gimple-loop-versioning.cc
> +++ b/gcc/gimple-loop-versioning.cc
> @@ -1686,11 +1686,12 @@ loop_versioning::version_loop (class loop *loop)
>    /* Version the loop.  */
>    initialize_original_copy_tables ();
>    basic_block cond_bb;
> -  li.optimized_loop = loop_version (loop, cond, &cond_bb,
> -				    profile_probability::unlikely (),
> -				    profile_probability::likely (),
> -				    profile_probability::unlikely (),
> -				    profile_probability::likely (), true);
> +  li.optimized_loop
> +    = clone_loop_to_header_edge (loop, cond, &cond_bb,
> +				 profile_probability::unlikely (),
> +				 profile_probability::likely (),
> +				 profile_probability::unlikely (),
> +				 profile_probability::likely (), true);
>    free_original_copy_tables ();
>    if (!li.optimized_loop)
>      {
> diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c
> index 1c1b459d34f..26cd76a279c 100644
> --- a/gcc/modulo-sched.c
> +++ b/gcc/modulo-sched.c
> @@ -1737,9 +1737,9 @@ sms_schedule (void)
>  	      profile_probability prob = profile_probability::guessed_always ()
>  				.apply_scale (PROB_SMS_ENOUGH_ITERATIONS, 100);
>  
> -	      loop_version (loop, comp_rtx, &condition_bb,
> -	  		    prob, prob.invert (),
> -			    prob, prob.invert (), true);
> +	      clone_loop_to_header_edge (loop, comp_rtx, &condition_bb, prob,
> +					 prob.invert (), prob, prob.invert (),
> +					 true);
>  	    }
>  
>  	  /* Now apply the scheduled kernel to the RTL of the loop.  */
> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> index d7b7b309309..893231a2d16 100644
> --- a/gcc/tree-if-conv.c
> +++ b/gcc/tree-if-conv.c
> @@ -2871,7 +2871,8 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
>  				  integer_zero_node);
>    gimple_call_set_lhs (g, cond);
>  
> -  /* Save BB->aux around loop_version as that uses the same field.  */
> +  /* Save BB->aux around clone_loop_to_header_edge as that uses the same field.
> +   */
>    save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
>    void **saved_preds = XALLOCAVEC (void *, save_length);
>    for (unsigned i = 0; i < save_length; i++)
> @@ -2880,11 +2881,11 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
>    initialize_original_copy_tables ();
>    /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED
>       is re-merged in the vectorizer.  */
> -  new_loop = loop_version (loop, cond, &cond_bb,
> -			   profile_probability::always (),
> -			   profile_probability::always (),
> -			   profile_probability::always (),
> -			   profile_probability::always (), true);
> +  new_loop = clone_loop_to_header_edge (loop, cond, &cond_bb,
> +					profile_probability::always (),
> +					profile_probability::always (),
> +					profile_probability::always (),
> +					profile_probability::always (), true);
>    free_original_copy_tables ();
>  
>    for (unsigned i = 0; i < save_length; i++)
> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
> index 2df762c8aa8..ae543fa022d 100644
> --- a/gcc/tree-loop-distribution.c
> +++ b/gcc/tree-loop-distribution.c
> @@ -2705,8 +2705,8 @@ version_loop_by_alias_check (vec<struct partition *> *partitions,
>  
>    prob = profile_probability::guessed_always ().apply_scale (9, 10);
>    initialize_original_copy_tables ();
> -  nloop = loop_version (loop, lhs, &cond_bb, prob, prob.invert (),
> -			prob, prob.invert (), true);
> +  nloop = clone_loop_to_header_edge (loop, lhs, &cond_bb, prob, prob.invert (),
> +				     prob, prob.invert (), true);
>    free_original_copy_tables ();
>    /* Record the original loop number in newly generated loops.  In case of
>       distribution, the original loop will be distributed and the new loop
> diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
> index 5e64d5ed7a3..e19fc52f38b 100644
> --- a/gcc/tree-parloops.c
> +++ b/gcc/tree-parloops.c
> @@ -3083,11 +3083,11 @@ gen_parallel_loop (class loop *loop,
>        initialize_original_copy_tables ();
>  
>        /* We assume that the loop usually iterates a lot.  */
> -      loop_version (loop, many_iterations_cond, NULL,
> -		    profile_probability::likely (),
> -		    profile_probability::unlikely (),
> -		    profile_probability::likely (),
> -		    profile_probability::unlikely (), true);
> +      clone_loop_to_header_edge (loop, many_iterations_cond, NULL,
> +				 profile_probability::likely (),
> +				 profile_probability::unlikely (),
> +				 profile_probability::likely (),
> +				 profile_probability::unlikely (), true);
>        update_ssa (TODO_update_ssa);
>        free_original_copy_tables ();
>      }
> diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
> index c7a2f67b129..350e25bb8d2 100644
> --- a/gcc/tree-ssa-loop-manip.c
> +++ b/gcc/tree-ssa-loop-manip.c
> @@ -1282,10 +1282,11 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
>  	 frequencies in its body because of this change (scale the frequencies
>  	 of blocks before and after the exit by appropriate factors).  */
>        profile_probability scale_unrolled = prob_entry;
> -      new_loop = loop_version (loop, enter_main_cond, NULL, prob_entry,
> -			       prob_entry.invert (), scale_unrolled,
> -			       profile_probability::guessed_always (),
> -			       true);
> +      new_loop
> +	= clone_loop_to_header_edge (loop, enter_main_cond, NULL, prob_entry,
> +				     prob_entry.invert (), scale_unrolled,
> +				     profile_probability::guessed_always (),
> +				     true);
>        gcc_assert (new_loop != NULL);
>        update_ssa (TODO_update_ssa);
>  
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index d30782888f3..5bc9c0f6443 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -586,12 +586,12 @@ split_loop (class loop *loop1)
>  	initialize_original_copy_tables ();
>  	basic_block cond_bb;
>  
> -	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> -					   true_edge->probability,
> -					   true_edge->probability.invert (),
> -					   true_edge->probability,
> -					   true_edge->probability.invert (),
> -					   true);
> +	class loop *loop2
> +	  = clone_loop_to_header_edge (loop1, cond, &cond_bb,
> +				       true_edge->probability,
> +				       true_edge->probability.invert (),
> +				       true_edge->probability,
> +				       true_edge->probability.invert (), true);
>  	gcc_assert (loop2);
>  
>  	edge new_e = connect_loops (loop1, loop2);
> @@ -1465,9 +1465,9 @@ get_cond_branch_to_split_loop (struct loop *loop, gcond *cond)
>                               exits
>  
>     In the graph, loop1 represents the part derived from original one, and
> -   loop2 is duplicated using loop_version (), which corresponds to the part
> -   of original one being splitted out.  In original latch edge of loop1, we
> -   insert a new conditional statement duplicated from the semi-invariant cond,
> +   loop2 is duplicated using clone_loop_to_header_edge (), which corresponds to
> +   the part of original one being splitted out.  In original latch edge of loop1,
> +   we insert a new conditional statement duplicated from the semi-invariant cond,
>     and one of its branch goes back to loop1 header as a latch edge, and the
>     other branch goes to loop2 pre-header as an entry edge.  And also in loop2,
>     we abandon the variant branch of the conditional statement by setting a
> @@ -1489,12 +1489,12 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
>  
>    initialize_original_copy_tables ();
>  
> -  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> -				     invar_branch->probability.invert (),
> -				     invar_branch->probability,
> -				     invar_branch->probability.invert (),
> -				     invar_branch->probability,
> -				     true);
> +  struct loop *loop2
> +    = clone_loop_to_header_edge (loop1, boolean_true_node, NULL,
> +				 invar_branch->probability.invert (),
> +				 invar_branch->probability,
> +				 invar_branch->probability.invert (),
> +				 invar_branch->probability, true);
>    if (!loop2)
>      {
>        free_original_copy_tables ();
> diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
> index fe4dacc0833..6beebe368a7 100644
> --- a/gcc/tree-ssa-loop-unswitch.c
> +++ b/gcc/tree-ssa-loop-unswitch.c
> @@ -488,11 +488,9 @@ tree_unswitch_loop (class loop *loop,
>  
>    extract_true_false_edges_from_block (unswitch_on, &edge_true, &edge_false);
>    prob_true = edge_true->probability;
> -  return loop_version (loop, unshare_expr (cond),
> -		       NULL, prob_true,
> -		       prob_true.invert (),
> -		       prob_true, prob_true.invert (),
> -		       false);
> +  return clone_loop_to_header_edge (loop, unshare_expr (cond), NULL, prob_true,
> +				    prob_true.invert (), prob_true,
> +				    prob_true.invert (), false);
>  }
>  
>  /* Unswitch outer loops by hoisting invariant guard on
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index 4988c93fdb6..ebc013f195e 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -3558,8 +3558,9 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
>  			 loop_to_version->num);
>  
>        initialize_original_copy_tables ();
> -      nloop = loop_version (loop_to_version, cond_expr, &condition_bb,
> -			    prob, prob.invert (), prob, prob.invert (), true);
> +      nloop = clone_loop_to_header_edge (loop_to_version, cond_expr,
> +					 &condition_bb, prob, prob.invert (),
> +					 prob, prob.invert (), true);
>        gcc_assert (nloop);
>        nloop = get_loop_copy (loop);
>  
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 1/4] Fix loop split incorrect count and probability
  2021-10-27  7:44       ` Jan Hubicka
@ 2021-11-08  6:09         ` Xionghu Luo
  2021-11-24  5:11           ` Xionghu Luo
  0 siblings, 1 reply; 15+ messages in thread
From: Xionghu Luo @ 2021-11-08  6:09 UTC (permalink / raw)
  To: Jan Hubicka, Richard Biener; +Cc: gcc-patches, segher, wschmidt, linkw, dje.gcc



On 2021/10/27 15:44, Jan Hubicka wrote:
>> On Wed, 27 Oct 2021, Jan Hubicka wrote:
>>
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
>>>> 	(do_split_loop_on_cond): Likewise.
>>>> ---
>>>>  gcc/tree-ssa-loop-split.c | 25 ++++++++++++++++---------
>>>>  1 file changed, 16 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>>>> index 3f6ad046623..d30782888f3 100644
>>>> --- a/gcc/tree-ssa-loop-split.c
>>>> +++ b/gcc/tree-ssa-loop-split.c
>>>> @@ -575,7 +575,11 @@ split_loop (class loop *loop1)
>>>>  					    stmts2);
>>>>  	tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
>>>>  	if (!initial_true)
>>>> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
>>>> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>>> +
>>>> +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
>>>> +			   ? EDGE_SUCC (bbs[i], 0)
>>>> +			   : EDGE_SUCC (bbs[i], 1);
>>>>  
>>>>  	/* Now version the loop, placing loop2 after loop1 connecting
>>>>  	   them, and fix up SSA form for that.  */
>>>> @@ -583,10 +587,10 @@ split_loop (class loop *loop1)
>>>>  	basic_block cond_bb;
>>>>  
>>>>  	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
>>>> -					   profile_probability::always (),
>>>> -					   profile_probability::always (),
>>>> -					   profile_probability::always (),
>>>> -					   profile_probability::always (),
>>>> +					   true_edge->probability,
>>>> +					   true_edge->probability.invert (),
>>>> +					   true_edge->probability,
>>>> +					   true_edge->probability.invert (),
>>>>  					   true);
>>>
>>> As discussed yesterday, for loop of form
>>>
>>> for (...)
>>>   if (cond)
>>>     cond = something();
>>>   else
>>>     something2
>>>
>>> Split as
>>
>> Note that you are missing to conditionalize loop1 execution
>> on 'cond' (not sure if that makes a difference).
> You are right - forgot to mention that.
> 
> Entry conditional makes no difference on scaling stmts inside loop but
> affects its header and expected trip count. We however need to set up
> probability of this conditional (and preheader count if it exists)
> There is no general way to read the probability of this initial
> conditional from cfg profile.  So I guess we are stuck with guessing
> some arbitrary value. I guess common case is that cond is true first
> iteration tough and often we can easily see that fromo PHI node
> initializing the test variable.
> 
> Other thing that changes is expected number of iterations of the split
> loops, so we may want to update the exit conditinal probability
> accordingly...
> 
Sorry for the late reply.  The below updated patch mainly solves the issues
you pointed out:
  - profile count proportion for both original loop and copied loop
without dropping down the true branch's count;
  - probability update in the two loops and between the two loops;
  - number of iterations update/check for split_loop.


[PATCH v3] Fix loop split incorrect count and probability

In tree-ssa-loop-split.c, split_loop and split_loop_on_cond does two
kind of split. split_loop only works for single loop and insert edge at
exit when split, while split_loop_on_cond is not limited to single loop
and insert edge at latch when split.  Both split behavior should consider
loop count and probability update.  For split_loop, loop split condition
is moved in front of loop1 and loop2; But split_loop_on_cond moves the
condition between loop1 and loop2, this patch does:
1) profile count proportion for both original loop and copied loop
without dropping down the true branch's count;
2) probability update in and between the two loops;
3) number of iterations update for split_loop.

Regression tested pass, OK for master?

Changes diff for split_loop and split_loop_on_cond cases:

1) diff base/loop-split.c.151t.lsplit patched/loop-split.c.152t.lsplit
...
   <bb 2> [local count: 118111600]:
   if (beg_5(D) < end_8(D))
     goto <bb 14>; [89.00%]
   else
     goto <bb 6>; [11.00%]

   <bb 14> [local count: 105119324]:
   if (beg2_6(D) < c_9(D))
-    goto <bb 15>; [100.00%]
+    goto <bb 15>; [33.00%]
   else
-    goto <bb 16>; [100.00%]
+    goto <bb 16>; [67.00%]

-  <bb 15> [local count: 105119324]:
+  <bb 15> [local count: 34689377]:
   _25 = beg_5(D) + 1;
   _26 = end_8(D) - beg_5(D);
   _27 = beg2_6(D) + _26;
   _28 = MIN_EXPR <c_9(D), _27>;

-  <bb 3> [local count: 955630225]:
+  <bb 3> [local count: 315357973]:
   # i_16 = PHI <i_11(8), beg_5(D)(15)>
   # j_17 = PHI <j_12(8), beg2_6(D)(15)>
   printf ("a: %d %d\n", i_16, j_17);
   i_11 = i_16 + 1;
   j_12 = j_17 + 1;
   if (j_12 < _28)
-    goto <bb 8>; [89.00%]
+    goto <bb 8>; [29.37%]
   else
-    goto <bb 17>; [11.00%]
+    goto <bb 17>; [70.63%]

-  <bb 8> [local count: 850510901]:
+  <bb 8> [local count: 280668596]:
   goto <bb 3>; [100.00%]

-  <bb 16> [local count: 105119324]:
+  <bb 16> [local count: 70429947]:
   # i_22 = PHI <beg_5(D)(14), i_29(17)>
   # j_23 = PHI <beg2_6(D)(14), j_30(17)>

   <bb 10> [local count: 955630225]:
   # i_2 = PHI <i_22(16), i_20(13)>
   # j_1 = PHI <j_23(16), j_21(13)>
   i_20 = i_2 + 1;
   j_21 = j_1 + 1;
   if (end_8(D) > i_20)
-    goto <bb 13>; [89.00%]
+    goto <bb 13>; [59.63%]
   else
-    goto <bb 9>; [11.00%]
+    goto <bb 9>; [40.37%]

-  <bb 13> [local count: 850510901]:
+  <bb 13> [local count: 569842305]:
   goto <bb 10>; [100.00%]

   <bb 17> [local count: 105119324]:
   # i_29 = PHI <i_11(3)>
   # j_30 = PHI <j_12(3)>
   if (end_8(D) > i_29)
     goto <bb 16>; [80.00%]
   else
     goto <bb 9>; [20.00%]

   <bb 9> [local count: 105119324]:

   <bb 6> [local count: 118111600]:
   return 0;

 }

2) diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
...
   <bb 2> [local count: 118111600]:
   if (n_7(D) > 0)
     goto <bb 4>; [89.00%]
   else
     goto <bb 3>; [11.00%]

   <bb 3> [local count: 118111600]:
   return;

   <bb 4> [local count: 105119324]:
   pretmp_3 = ga;

-  <bb 5> [local count: 955630225]:
+  <bb 5> [local count: 315357973]:
   # i_13 = PHI <i_10(20), 0(4)>
   # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
   if (prephitmp_12 != 0)
     goto <bb 6>; [33.00%]
   else
     goto <bb 7>; [67.00%]

   <bb 6> [local count: 315357972]:
   _2 = do_something ();
   ga = _2;

-  <bb 7> [local count: 955630225]:
+  <bb 7> [local count: 315357973]:
   # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
   i_10 = inc (i_13);
   if (n_7(D) > i_10)
     goto <bb 21>; [89.00%]
   else
     goto <bb 11>; [11.00%]

   <bb 11> [local count: 105119324]:
   goto <bb 3>; [100.00%]

-  <bb 21> [local count: 850510901]:
+  <bb 21> [local count: 280668596]:
   if (prephitmp_12 != 0)
-    goto <bb 20>; [100.00%]
+    goto <bb 20>; [33.00%]
   else
-    goto <bb 19>; [INV]
+    goto <bb 19>; [67.00%]

-  <bb 20> [local count: 850510901]:
+  <bb 20> [local count: 280668596]:
   goto <bb 5>; [100.00%]

-  <bb 19> [count: 0]:
+  <bb 19> [local count: 70429947]:
   # i_23 = PHI <i_10(21)>
   # prephitmp_25 = PHI <prephitmp_5(21)>

-  <bb 12> [local count: 955630225]:
+  <bb 12> [local count: 640272252]:
   # i_15 = PHI <i_23(19), i_22(16)>
   # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
   i_22 = inc (i_15);
   if (n_7(D) > i_22)
-    goto <bb 16>; [89.00%]
+    goto <bb 16>; [59.63%]
   else
-    goto <bb 11>; [11.00%]
+    goto <bb 11>; [40.37%]

-  <bb 16> [local count: 850510901]:
+  <bb 16> [local count: 569842305]:
   goto <bb 12>; [100.00%]

 }

gcc/ChangeLog:

	* tree-ssa-loop-split.c (split_loop): Fix incorrect
	profile_count and probability.
	(do_split_loop_on_cond): Likewise.
---
 gcc/tree-ssa-loop-split.c | 110 +++++++++++++++++++++++++++++++++++---
 1 file changed, 102 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index 3f6ad046623..102766241fb 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -575,7 +575,10 @@ split_loop (class loop *loop1)
 					    stmts2);
 	tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
 	if (!initial_true)
-	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
+	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+
+	edge true_edge, false_edge;
+	extract_true_false_edges_from_block (bbs[i], &true_edge, &false_edge);
 
 	/* Now version the loop, placing loop2 after loop1 connecting
 	   them, and fix up SSA form for that.  */
@@ -583,11 +586,11 @@ split_loop (class loop *loop1)
 	basic_block cond_bb;
 
 	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
-					   profile_probability::always (),
-					   profile_probability::always (),
-					   profile_probability::always (),
-					   profile_probability::always (),
-					   true);
+					  true_edge->probability,
+					  true_edge->probability.invert (),
+					  profile_probability::always (),
+					  profile_probability::always (),
+					  true);
 	gcc_assert (loop2);
 
 	edge new_e = connect_loops (loop1, loop2);
@@ -607,6 +610,53 @@ split_loop (class loop *loop1)
 	tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
 	patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
 
+	/* Check first loop's number of iterations.  */
+	update_ssa (TODO_update_ssa);
+	gcc_assert (number_of_iterations_exit (loop1, single_exit (loop1),
+					       &niter, false, true));
+
+	/* Proportion first loop's bb counts except those dominated by true
+	   branch to avoid drop 1s down.  */
+	basic_block *bbs1, *bbs2;
+	bbs1 = get_loop_body (loop1);
+	unsigned j;
+	for (j = 0; j < loop1->num_nodes; j++)
+	  if (bbs1[j] == loop1->latch
+	      || !dominated_by_p (CDI_DOMINATORS, bbs1[j], true_edge->dest))
+	    bbs1[j]->count
+	      = bbs1[j]->count.apply_probability (true_edge->probability);
+	free (bbs1);
+
+	/* Fix first loop's exit probability after scaling.  */
+	edge exit_to_latch1 = single_pred_edge (loop1->latch);
+	exit_to_latch1->probability = exit_to_latch1->probability.apply_scale (
+	  true_edge->probability.to_reg_br_prob_base (), REG_BR_PROB_BASE);
+	single_exit (loop1)->probability
+	  = exit_to_latch1->probability.invert ();
+
+	/* Check second loop's number of iterations.  */
+	class tree_niter_desc niter2;
+	gcc_assert (number_of_iterations_exit (loop2, single_exit (loop2),
+					       &niter2, false, true));
+
+	/* Proportion second loop's bb counts except those dominated by false
+	   branch to avoid drop 1s down.  */
+	basic_block bbi_copy = get_bb_copy (false_edge->dest);
+	bbs2 = get_loop_body (loop2);
+	for (j = 0; j < loop2->num_nodes; j++)
+	  if (bbs2[j] == loop2->latch
+	      || !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
+	    bbs2[j]->count = bbs2[j]->count.apply_probability (
+	      true_edge->probability.invert ());
+	free (bbs2);
+
+	/* Fix second loop's exit probability after scaling.  */
+	edge exit_to_latch2 = single_pred_edge (loop2->latch);
+	exit_to_latch2->probability = exit_to_latch2->probability.apply_scale (
+	  false_edge->probability.to_reg_br_prob_base (), REG_BR_PROB_BASE);
+	single_exit (loop2)->probability
+	  = exit_to_latch2->probability.invert ();
+
 	/* Finally patch out the two copies of the condition to be always
 	   true/false (or opposite).  */
 	gcond *force_true = as_a<gcond *> (last_stmt (bbs[i]));
@@ -1486,8 +1536,8 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
   initialize_original_copy_tables ();
 
   struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
-				     profile_probability::always (),
-				     profile_probability::never (),
+				     invar_branch->probability.invert (),
+				     invar_branch->probability,
 				     profile_probability::always (),
 				     profile_probability::always (),
 				     true);
@@ -1535,6 +1585,50 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
      between loop1 and loop2.  */
   connect_loop_phis (loop1, loop2, to_loop2);
 
+  update_ssa (TODO_update_ssa);
+
+  edge true_edge, false_edge, skip_edge1, skip_edge2;
+  extract_true_false_edges_from_block (cond_bb, &true_edge, &false_edge);
+
+  /* Proportion first loop's bb counts except those dominated by true
+     branch to avoid drop 1s down.  */
+  skip_edge1 = true_invar ? false_edge : true_edge;
+  skip_edge2 = true_invar ? true_edge : false_edge;
+  basic_block *bbs1, *bbs2;
+  bbs1 = get_loop_body (loop1);
+  unsigned j;
+  for (j = 0; j < loop1->num_nodes; j++)
+    if (bbs1[j] == loop1->latch
+	|| !dominated_by_p (CDI_DOMINATORS, bbs1[j], skip_edge1->dest))
+      bbs1[j]->count
+	= bbs1[j]->count.apply_probability (skip_edge1->probability);
+  free (bbs1);
+
+  /* Fix first loop's exit probability after scaling.  */
+  to_loop1->probability = invar_branch->probability.invert ();
+  to_loop2->probability = invar_branch->probability;
+
+  /* Proportion second loop's bb counts except those dominated by false
+     branch to avoid drop 1s down.  */
+  basic_block bbi_copy = get_bb_copy (skip_edge2->dest);
+  bbs2 = get_loop_body (loop2);
+  for (j = 0; j < loop2->num_nodes; j++)
+    if (bbs2[j] == loop2->latch
+	|| !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
+      bbs2[j]->count
+	= bbs2[j]->count.apply_probability (skip_edge2->probability);
+  free (bbs2);
+
+  /* Fix second loop's exit probability after scaling.  */
+  edge loop2_latch_exit;
+  edge exit_to_latch2 = single_pred_edge (loop2->latch);
+  exit_to_latch2->probability = exit_to_latch2->probability.apply_scale (
+    skip_edge2->probability.to_reg_br_prob_base (), REG_BR_PROB_BASE);
+  loop2_latch_exit = EDGE_SUCC (exit_to_latch2->src, 0) == exit_to_latch2
+		       ? EDGE_SUCC (exit_to_latch2->src, 1)
+		       : EDGE_SUCC (exit_to_latch2->src, 0);
+  loop2_latch_exit->probability = exit_to_latch2->probability.invert ();
+
   free_original_copy_tables ();
 
   return true;
-- 
2.27.0.90.geebb51ba8c


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 1/4] Fix loop split incorrect count and probability
  2021-11-08  6:09         ` [PATCH v3 " Xionghu Luo
@ 2021-11-24  5:11           ` Xionghu Luo
  0 siblings, 0 replies; 15+ messages in thread
From: Xionghu Luo @ 2021-11-24  5:11 UTC (permalink / raw)
  To: Jan Hubicka, Richard Biener; +Cc: wschmidt, dje.gcc, gcc-patches, linkw, segher

Gentle ping, thanks.

[PATCH v3] Fix loop split incorrect count and probability

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583626.html


On 2021/11/8 14:09, Xionghu Luo via Gcc-patches wrote:
> 
> 
> On 2021/10/27 15:44, Jan Hubicka wrote:
>>> On Wed, 27 Oct 2021, Jan Hubicka wrote:
>>>
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> 	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
>>>>> 	(do_split_loop_on_cond): Likewise.
>>>>> ---
>>>>>  gcc/tree-ssa-loop-split.c | 25 ++++++++++++++++---------
>>>>>  1 file changed, 16 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>>>>> index 3f6ad046623..d30782888f3 100644
>>>>> --- a/gcc/tree-ssa-loop-split.c
>>>>> +++ b/gcc/tree-ssa-loop-split.c
>>>>> @@ -575,7 +575,11 @@ split_loop (class loop *loop1)
>>>>>  					    stmts2);
>>>>>  	tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
>>>>>  	if (!initial_true)
>>>>> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
>>>>> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>>>> +
>>>>> +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
>>>>> +			   ? EDGE_SUCC (bbs[i], 0)
>>>>> +			   : EDGE_SUCC (bbs[i], 1);
>>>>>  
>>>>>  	/* Now version the loop, placing loop2 after loop1 connecting
>>>>>  	   them, and fix up SSA form for that.  */
>>>>> @@ -583,10 +587,10 @@ split_loop (class loop *loop1)
>>>>>  	basic_block cond_bb;
>>>>>  
>>>>>  	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
>>>>> -					   profile_probability::always (),
>>>>> -					   profile_probability::always (),
>>>>> -					   profile_probability::always (),
>>>>> -					   profile_probability::always (),
>>>>> +					   true_edge->probability,
>>>>> +					   true_edge->probability.invert (),
>>>>> +					   true_edge->probability,
>>>>> +					   true_edge->probability.invert (),
>>>>>  					   true);
>>>>
>>>> As discussed yesterday, for loop of form
>>>>
>>>> for (...)
>>>>   if (cond)
>>>>     cond = something();
>>>>   else
>>>>     something2
>>>>
>>>> Split as
>>>
>>> Note that you are missing to conditionalize loop1 execution
>>> on 'cond' (not sure if that makes a difference).
>> You are right - forgot to mention that.
>>
>> Entry conditional makes no difference on scaling stmts inside loop but
>> affects its header and expected trip count. We however need to set up
>> probability of this conditional (and preheader count if it exists)
>> There is no general way to read the probability of this initial
>> conditional from cfg profile.  So I guess we are stuck with guessing
>> some arbitrary value. I guess common case is that cond is true first
>> iteration tough and often we can easily see that fromo PHI node
>> initializing the test variable.
>>
>> Other thing that changes is expected number of iterations of the split
>> loops, so we may want to update the exit conditinal probability
>> accordingly...
>>
> Sorry for the late reply.  The below updated patch mainly solves the issues
> you pointed out:
>   - profile count proportion for both original loop and copied loop
> without dropping down the true branch's count;
>   - probability update in the two loops and between the two loops;
>   - number of iterations update/check for split_loop.
> 
> 
> [PATCH v3] Fix loop split incorrect count and probability
> 
> In tree-ssa-loop-split.c, split_loop and split_loop_on_cond does two
> kind of split. split_loop only works for single loop and insert edge at
> exit when split, while split_loop_on_cond is not limited to single loop
> and insert edge at latch when split.  Both split behavior should consider
> loop count and probability update.  For split_loop, loop split condition
> is moved in front of loop1 and loop2; But split_loop_on_cond moves the
> condition between loop1 and loop2, this patch does:
> 1) profile count proportion for both original loop and copied loop
> without dropping down the true branch's count;
> 2) probability update in and between the two loops;
> 3) number of iterations update for split_loop.
> 
> Regression tested pass, OK for master?
> 
> Changes diff for split_loop and split_loop_on_cond cases:
> 
> 1) diff base/loop-split.c.151t.lsplit patched/loop-split.c.152t.lsplit
> ...
>    <bb 2> [local count: 118111600]:
>    if (beg_5(D) < end_8(D))
>      goto <bb 14>; [89.00%]
>    else
>      goto <bb 6>; [11.00%]
> 
>    <bb 14> [local count: 105119324]:
>    if (beg2_6(D) < c_9(D))
> -    goto <bb 15>; [100.00%]
> +    goto <bb 15>; [33.00%]
>    else
> -    goto <bb 16>; [100.00%]
> +    goto <bb 16>; [67.00%]
> 
> -  <bb 15> [local count: 105119324]:
> +  <bb 15> [local count: 34689377]:
>    _25 = beg_5(D) + 1;
>    _26 = end_8(D) - beg_5(D);
>    _27 = beg2_6(D) + _26;
>    _28 = MIN_EXPR <c_9(D), _27>;
> 
> -  <bb 3> [local count: 955630225]:
> +  <bb 3> [local count: 315357973]:
>    # i_16 = PHI <i_11(8), beg_5(D)(15)>
>    # j_17 = PHI <j_12(8), beg2_6(D)(15)>
>    printf ("a: %d %d\n", i_16, j_17);
>    i_11 = i_16 + 1;
>    j_12 = j_17 + 1;
>    if (j_12 < _28)
> -    goto <bb 8>; [89.00%]
> +    goto <bb 8>; [29.37%]
>    else
> -    goto <bb 17>; [11.00%]
> +    goto <bb 17>; [70.63%]
> 
> -  <bb 8> [local count: 850510901]:
> +  <bb 8> [local count: 280668596]:
>    goto <bb 3>; [100.00%]
> 
> -  <bb 16> [local count: 105119324]:
> +  <bb 16> [local count: 70429947]:
>    # i_22 = PHI <beg_5(D)(14), i_29(17)>
>    # j_23 = PHI <beg2_6(D)(14), j_30(17)>
> 
>    <bb 10> [local count: 955630225]:
>    # i_2 = PHI <i_22(16), i_20(13)>
>    # j_1 = PHI <j_23(16), j_21(13)>
>    i_20 = i_2 + 1;
>    j_21 = j_1 + 1;
>    if (end_8(D) > i_20)
> -    goto <bb 13>; [89.00%]
> +    goto <bb 13>; [59.63%]
>    else
> -    goto <bb 9>; [11.00%]
> +    goto <bb 9>; [40.37%]
> 
> -  <bb 13> [local count: 850510901]:
> +  <bb 13> [local count: 569842305]:
>    goto <bb 10>; [100.00%]
> 
>    <bb 17> [local count: 105119324]:
>    # i_29 = PHI <i_11(3)>
>    # j_30 = PHI <j_12(3)>
>    if (end_8(D) > i_29)
>      goto <bb 16>; [80.00%]
>    else
>      goto <bb 9>; [20.00%]
> 
>    <bb 9> [local count: 105119324]:
> 
>    <bb 6> [local count: 118111600]:
>    return 0;
> 
>  }
> 
> 2) diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
> ...
>    <bb 2> [local count: 118111600]:
>    if (n_7(D) > 0)
>      goto <bb 4>; [89.00%]
>    else
>      goto <bb 3>; [11.00%]
> 
>    <bb 3> [local count: 118111600]:
>    return;
> 
>    <bb 4> [local count: 105119324]:
>    pretmp_3 = ga;
> 
> -  <bb 5> [local count: 955630225]:
> +  <bb 5> [local count: 315357973]:
>    # i_13 = PHI <i_10(20), 0(4)>
>    # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
>    if (prephitmp_12 != 0)
>      goto <bb 6>; [33.00%]
>    else
>      goto <bb 7>; [67.00%]
> 
>    <bb 6> [local count: 315357972]:
>    _2 = do_something ();
>    ga = _2;
> 
> -  <bb 7> [local count: 955630225]:
> +  <bb 7> [local count: 315357973]:
>    # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
>    i_10 = inc (i_13);
>    if (n_7(D) > i_10)
>      goto <bb 21>; [89.00%]
>    else
>      goto <bb 11>; [11.00%]
> 
>    <bb 11> [local count: 105119324]:
>    goto <bb 3>; [100.00%]
> 
> -  <bb 21> [local count: 850510901]:
> +  <bb 21> [local count: 280668596]:
>    if (prephitmp_12 != 0)
> -    goto <bb 20>; [100.00%]
> +    goto <bb 20>; [33.00%]
>    else
> -    goto <bb 19>; [INV]
> +    goto <bb 19>; [67.00%]
> 
> -  <bb 20> [local count: 850510901]:
> +  <bb 20> [local count: 280668596]:
>    goto <bb 5>; [100.00%]
> 
> -  <bb 19> [count: 0]:
> +  <bb 19> [local count: 70429947]:
>    # i_23 = PHI <i_10(21)>
>    # prephitmp_25 = PHI <prephitmp_5(21)>
> 
> -  <bb 12> [local count: 955630225]:
> +  <bb 12> [local count: 640272252]:
>    # i_15 = PHI <i_23(19), i_22(16)>
>    # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
>    i_22 = inc (i_15);
>    if (n_7(D) > i_22)
> -    goto <bb 16>; [89.00%]
> +    goto <bb 16>; [59.63%]
>    else
> -    goto <bb 11>; [11.00%]
> +    goto <bb 11>; [40.37%]
> 
> -  <bb 16> [local count: 850510901]:
> +  <bb 16> [local count: 569842305]:
>    goto <bb 12>; [100.00%]
> 
>  }
> 
> gcc/ChangeLog:
> 
> 	* tree-ssa-loop-split.c (split_loop): Fix incorrect
> 	profile_count and probability.
> 	(do_split_loop_on_cond): Likewise.
> ---
>  gcc/tree-ssa-loop-split.c | 110 +++++++++++++++++++++++++++++++++++---
>  1 file changed, 102 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 3f6ad046623..102766241fb 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -575,7 +575,10 @@ split_loop (class loop *loop1)
>  					    stmts2);
>  	tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
>  	if (!initial_true)
> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> +
> +	edge true_edge, false_edge;
> +	extract_true_false_edges_from_block (bbs[i], &true_edge, &false_edge);
> 
>  	/* Now version the loop, placing loop2 after loop1 connecting
>  	   them, and fix up SSA form for that.  */
> @@ -583,11 +586,11 @@ split_loop (class loop *loop1)
>  	basic_block cond_bb;
> 
>  	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> -					   profile_probability::always (),
> -					   profile_probability::always (),
> -					   profile_probability::always (),
> -					   profile_probability::always (),
> -					   true);
> +					  true_edge->probability,
> +					  true_edge->probability.invert (),
> +					  profile_probability::always (),
> +					  profile_probability::always (),
> +					  true);
>  	gcc_assert (loop2);
> 
>  	edge new_e = connect_loops (loop1, loop2);
> @@ -607,6 +610,53 @@ split_loop (class loop *loop1)
>  	tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
>  	patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
> 
> +	/* Check first loop's number of iterations.  */
> +	update_ssa (TODO_update_ssa);
> +	gcc_assert (number_of_iterations_exit (loop1, single_exit (loop1),
> +					       &niter, false, true));
> +
> +	/* Proportion first loop's bb counts except those dominated by true
> +	   branch to avoid drop 1s down.  */
> +	basic_block *bbs1, *bbs2;
> +	bbs1 = get_loop_body (loop1);
> +	unsigned j;
> +	for (j = 0; j < loop1->num_nodes; j++)
> +	  if (bbs1[j] == loop1->latch
> +	      || !dominated_by_p (CDI_DOMINATORS, bbs1[j], true_edge->dest))
> +	    bbs1[j]->count
> +	      = bbs1[j]->count.apply_probability (true_edge->probability);
> +	free (bbs1);
> +
> +	/* Fix first loop's exit probability after scaling.  */
> +	edge exit_to_latch1 = single_pred_edge (loop1->latch);
> +	exit_to_latch1->probability = exit_to_latch1->probability.apply_scale (
> +	  true_edge->probability.to_reg_br_prob_base (), REG_BR_PROB_BASE);
> +	single_exit (loop1)->probability
> +	  = exit_to_latch1->probability.invert ();
> +
> +	/* Check second loop's number of iterations.  */
> +	class tree_niter_desc niter2;
> +	gcc_assert (number_of_iterations_exit (loop2, single_exit (loop2),
> +					       &niter2, false, true));
> +
> +	/* Proportion second loop's bb counts except those dominated by false
> +	   branch to avoid drop 1s down.  */
> +	basic_block bbi_copy = get_bb_copy (false_edge->dest);
> +	bbs2 = get_loop_body (loop2);
> +	for (j = 0; j < loop2->num_nodes; j++)
> +	  if (bbs2[j] == loop2->latch
> +	      || !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
> +	    bbs2[j]->count = bbs2[j]->count.apply_probability (
> +	      true_edge->probability.invert ());
> +	free (bbs2);
> +
> +	/* Fix second loop's exit probability after scaling.  */
> +	edge exit_to_latch2 = single_pred_edge (loop2->latch);
> +	exit_to_latch2->probability = exit_to_latch2->probability.apply_scale (
> +	  false_edge->probability.to_reg_br_prob_base (), REG_BR_PROB_BASE);
> +	single_exit (loop2)->probability
> +	  = exit_to_latch2->probability.invert ();
> +
>  	/* Finally patch out the two copies of the condition to be always
>  	   true/false (or opposite).  */
>  	gcond *force_true = as_a<gcond *> (last_stmt (bbs[i]));
> @@ -1486,8 +1536,8 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
>    initialize_original_copy_tables ();
> 
>    struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> -				     profile_probability::always (),
> -				     profile_probability::never (),
> +				     invar_branch->probability.invert (),
> +				     invar_branch->probability,
>  				     profile_probability::always (),
>  				     profile_probability::always (),
>  				     true);
> @@ -1535,6 +1585,50 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
>       between loop1 and loop2.  */
>    connect_loop_phis (loop1, loop2, to_loop2);
> 
> +  update_ssa (TODO_update_ssa);
> +
> +  edge true_edge, false_edge, skip_edge1, skip_edge2;
> +  extract_true_false_edges_from_block (cond_bb, &true_edge, &false_edge);
> +
> +  /* Proportion first loop's bb counts except those dominated by true
> +     branch to avoid drop 1s down.  */
> +  skip_edge1 = true_invar ? false_edge : true_edge;
> +  skip_edge2 = true_invar ? true_edge : false_edge;
> +  basic_block *bbs1, *bbs2;
> +  bbs1 = get_loop_body (loop1);
> +  unsigned j;
> +  for (j = 0; j < loop1->num_nodes; j++)
> +    if (bbs1[j] == loop1->latch
> +	|| !dominated_by_p (CDI_DOMINATORS, bbs1[j], skip_edge1->dest))
> +      bbs1[j]->count
> +	= bbs1[j]->count.apply_probability (skip_edge1->probability);
> +  free (bbs1);
> +
> +  /* Fix first loop's exit probability after scaling.  */
> +  to_loop1->probability = invar_branch->probability.invert ();
> +  to_loop2->probability = invar_branch->probability;
> +
> +  /* Proportion second loop's bb counts except those dominated by false
> +     branch to avoid drop 1s down.  */
> +  basic_block bbi_copy = get_bb_copy (skip_edge2->dest);
> +  bbs2 = get_loop_body (loop2);
> +  for (j = 0; j < loop2->num_nodes; j++)
> +    if (bbs2[j] == loop2->latch
> +	|| !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
> +      bbs2[j]->count
> +	= bbs2[j]->count.apply_probability (skip_edge2->probability);
> +  free (bbs2);
> +
> +  /* Fix second loop's exit probability after scaling.  */
> +  edge loop2_latch_exit;
> +  edge exit_to_latch2 = single_pred_edge (loop2->latch);
> +  exit_to_latch2->probability = exit_to_latch2->probability.apply_scale (
> +    skip_edge2->probability.to_reg_br_prob_base (), REG_BR_PROB_BASE);
> +  loop2_latch_exit = EDGE_SUCC (exit_to_latch2->src, 0) == exit_to_latch2
> +		       ? EDGE_SUCC (exit_to_latch2->src, 1)
> +		       : EDGE_SUCC (exit_to_latch2->src, 0);
> +  loop2_latch_exit->probability = exit_to_latch2->probability.invert ();
> +
>    free_original_copy_tables ();
> 
>    return true;
> 

-- 
Thanks,
Xionghu

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-11-24  5:11 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-27  6:34 [PATCH v2 0/4] loop split fix and functions renaming Xionghu Luo
2021-10-27  6:34 ` [PATCH v2 1/4] Fix loop split incorrect count and probability Xionghu Luo
2021-10-27  7:07   ` Jan Hubicka
2021-10-27  7:23     ` Jan Hubicka
2021-10-27  7:29     ` Richard Biener
2021-10-27  7:44       ` Jan Hubicka
2021-11-08  6:09         ` [PATCH v3 " Xionghu Luo
2021-11-24  5:11           ` Xionghu Luo
2021-10-27  6:34 ` [PATCH v2 2/4] Refactor loop_version Xionghu Luo
2021-10-29 11:52   ` Richard Biener
2021-11-01  5:28     ` Xionghu Luo
2021-10-27  6:34 ` [PATCH v2 3/4] Rename loop_version to clone_loop_to_header_edge Xionghu Luo
2021-11-03 13:36   ` Richard Biener
2021-10-27  6:34 ` [PATCH v2 4/4] Rename duplicate_loop_to_header_edge to duplicate_loop_body_to_header_edge Xionghu Luo
2021-10-29 11:50   ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).