public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Dependency patches for hoist LIM code to cold loop
@ 2021-12-08  5:54 Xionghu Luo
  2021-12-08  5:54 ` [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL Xionghu Luo
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Xionghu Luo @ 2021-12-08  5:54 UTC (permalink / raw)
  To: gcc-patches
  Cc: segher, dje.gcc, wschmidt, guojiufu, linkw, hubicka,
	richard.guenther, Xionghu Luo

This patchset is a recollect of previously sent patches.  Thanks
Richard that The "Don't move cold code out of loop by checking bb count"
is approved[1], but there are still 3 prerequesite patches to supplement
or avoid regression.

1) Patch [1/3] is the RTL part of not hoisting LIM code out of cold loop, it
could improve perlbench by 7.69% [2].
2) Patch [2/3] is a test case regression fix for pr103270.c, after enabling
gimple part of hoisting LIM code to coldest loop [1], the store
instruction in loop won't be moved out of inner loop, it is caused by a
jump-threading patch unexpectedly turning a hot inner loop to cold loop,
this patch could recover the inner loop to be hot[3].
3) As data showed in [2], besides improvement, there is also a small regression
on SPEC2017 544.nab_r (-1.55%).  After investigation, it turned out to be
the profile count and probability is not correctly adjusted in loop
split, with this patch [3/3], the only regression is also fixed. This version
slightly updates [4] to fix ICEs.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586319.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580109.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585195.html
[4] https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585290.html

Xionghu Luo (3):
  loop-invariant: Don't move cold bb instructions to preheader in RTL
  Fix incorrect loop exit edge probability [PR103270]
  Fix loop split incorrect count and probability

 gcc/loop-invariant.c            | 10 ++--
 gcc/predict.c                   | 10 ++--
 gcc/tree-ssa-loop-split.c       | 85 +++++++++++++++++++++++++++++----
 gcc/testsuite/gcc.dg/pr103270.c | 19 ++++++++
 4 files changed, 109 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr103270.c

-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL
  2021-12-08  5:54 [PATCH 0/3] Dependency patches for hoist LIM code to cold loop Xionghu Luo
@ 2021-12-08  5:54 ` Xionghu Luo
  2021-12-08 23:26   ` Jeff Law
  2021-12-13  9:14   ` Jan Hubicka
  2021-12-08  5:54 ` [PATCH 2/3] Fix incorrect loop exit edge probability [PR103270] Xionghu Luo
  2021-12-08  5:54 ` [PATCH 3/3] Fix loop split incorrect count and probability Xionghu Luo
  2 siblings, 2 replies; 22+ messages in thread
From: Xionghu Luo @ 2021-12-08  5:54 UTC (permalink / raw)
  To: gcc-patches
  Cc: segher, dje.gcc, wschmidt, guojiufu, linkw, hubicka,
	richard.guenther, Xionghu Luo

gcc/ChangeLog:

	* loop-invariant.c (find_invariants_bb): Check profile count
	before motion.
	(find_invariants_body): Add argument.
---
 gcc/loop-invariant.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 5eee2e5c9f8..c61c8612fae 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed)
    call.  */
 
 static void
-find_invariants_bb (basic_block bb, bool always_reached, bool always_executed)
+find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
+		    bool always_executed)
 {
   rtx_insn *insn;
+  basic_block preheader = loop_preheader_edge (loop)->src;
+
+  if (preheader->count > bb->count)
+    return;
 
   FOR_BB_INSNS (bb, insn)
     {
@@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block *body,
   unsigned i;
 
   for (i = 0; i < loop->num_nodes; i++)
-    find_invariants_bb (body[i],
-			bitmap_bit_p (always_reached, i),
+    find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
 			bitmap_bit_p (always_executed, i));
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 2/3] Fix incorrect loop exit edge probability [PR103270]
  2021-12-08  5:54 [PATCH 0/3] Dependency patches for hoist LIM code to cold loop Xionghu Luo
  2021-12-08  5:54 ` [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL Xionghu Luo
@ 2021-12-08  5:54 ` Xionghu Luo
  2021-12-08 23:28   ` Jeff Law
  2021-12-13  9:25   ` Jan Hubicka
  2021-12-08  5:54 ` [PATCH 3/3] Fix loop split incorrect count and probability Xionghu Luo
  2 siblings, 2 replies; 22+ messages in thread
From: Xionghu Luo @ 2021-12-08  5:54 UTC (permalink / raw)
  To: gcc-patches
  Cc: segher, dje.gcc, wschmidt, guojiufu, linkw, hubicka,
	richard.guenther, Xionghu Luo

r12-4526 cancelled jump thread path rotates loop. It exposes a issue in
profile-estimate when predict_extra_loop_exits, outer loop's exit edge
is marked as inner loop's extra loop exit and set with incorrect
prediction, then a hot inner loop will become cold loop finally through
optimizations, this patch add loop check when searching extra exit edges
to avoid unexpected predict_edge from predict_paths_for_bb.

Regression tested on P8LE, OK for master?

gcc/ChangeLog:

	PR middle-end/103270
	* predict.c (predict_extra_loop_exits): Add loop parameter.
	(predict_loops): Call with loop argument.

gcc/testsuite/ChangeLog:

	PR middle-end/103270
	* gcc.dg/pr103270.c: New test.
---
 gcc/predict.c                   | 10 ++++++----
 gcc/testsuite/gcc.dg/pr103270.c | 19 +++++++++++++++++++
 2 files changed, 25 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr103270.c

diff --git a/gcc/predict.c b/gcc/predict.c
index 3cb4e3c0eb5..5b6e0cf722b 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -1859,7 +1859,7 @@ predict_iv_comparison (class loop *loop, basic_block bb,
    exits to predict them using PRED_LOOP_EXTRA_EXIT.  */
 
 static void
-predict_extra_loop_exits (edge exit_edge)
+predict_extra_loop_exits (class loop *loop, edge exit_edge)
 {
   unsigned i;
   bool check_value_one;
@@ -1912,12 +1912,14 @@ predict_extra_loop_exits (edge exit_edge)
 	continue;
       if (EDGE_COUNT (e->src->succs) != 1)
 	{
-	  predict_paths_leading_to_edge (e, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN);
+	  predict_paths_leading_to_edge (e, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN,
+					 loop);
 	  continue;
 	}
 
       FOR_EACH_EDGE (e1, ei, e->src->preds)
-	predict_paths_leading_to_edge (e1, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN);
+	predict_paths_leading_to_edge (e1, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN,
+				       loop);
     }
 }
 
@@ -2008,7 +2010,7 @@ predict_loops (void)
 			 ex->src->index, ex->dest->index);
 	      continue;
 	    }
-	  predict_extra_loop_exits (ex);
+	  predict_extra_loop_exits (loop, ex);
 
 	  if (number_of_iterations_exit (loop, ex, &niter_desc, false, false))
 	    niter = niter_desc.niter;
diff --git a/gcc/testsuite/gcc.dg/pr103270.c b/gcc/testsuite/gcc.dg/pr103270.c
new file mode 100644
index 00000000000..819310e360e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr103270.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-profile_estimate" } */
+
+void test(int a, int* i)
+{
+  for (; a < 5; ++a)
+    {
+      int b = 0;
+      int c = 0;
+      for (; b != -11; b--)
+	for (int d = 0; d ==0; d++)
+	  {
+	    *i += c & a;
+	    c = b;
+	  }
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "extra loop exit heuristics of edge\[^:\]*:" "profile_estimate"} } */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 3/3] Fix loop split incorrect count and probability
  2021-12-08  5:54 [PATCH 0/3] Dependency patches for hoist LIM code to cold loop Xionghu Luo
  2021-12-08  5:54 ` [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL Xionghu Luo
  2021-12-08  5:54 ` [PATCH 2/3] Fix incorrect loop exit edge probability [PR103270] Xionghu Luo
@ 2021-12-08  5:54 ` Xionghu Luo
  2021-12-08 23:47   ` Jeff Law
  2 siblings, 1 reply; 22+ messages in thread
From: Xionghu Luo @ 2021-12-08  5:54 UTC (permalink / raw)
  To: gcc-patches
  Cc: segher, dje.gcc, wschmidt, guojiufu, linkw, hubicka,
	richard.guenther, Xionghu Luo

In tree-ssa-loop-split.c, split_loop and split_loop_on_cond does two
kind of split. split_loop only works for single loop and insert edge at
exit when split, while split_loop_on_cond is not limited to single loop
and insert edge at latch when split.  Both split behavior should consider
loop count and probability update.  For split_loop, loop split condition
is moved in front of loop1 and loop2; But split_loop_on_cond moves the
condition between loop1 and loop2, this patch does:
 1) profile count proportion for both original loop and copied loop
without dropping down the true branch's count;
 2) probability update in the two loops and between the two loops.

Regression tested pass, OK for master?

Changes diff for split_loop and split_loop_on_cond cases:

1) diff base/loop-split.c.151t.lsplit patched/loop-split.c.152t.lsplit
...
   <bb 2> [local count: 118111600]:
   if (beg_5(D) < end_8(D))
     goto <bb 14>; [89.00%]
   else
     goto <bb 6>; [11.00%]

   <bb 14> [local count: 105119324]:
   if (beg2_6(D) < c_9(D))
-    goto <bb 15>; [100.00%]
+    goto <bb 15>; [33.00%]
   else
-    goto <bb 16>; [100.00%]
+    goto <bb 16>; [67.00%]

-  <bb 15> [local count: 105119324]:
+  <bb 15> [local count: 34689377]:
   _25 = beg_5(D) + 1;
   _26 = end_8(D) - beg_5(D);
   _27 = beg2_6(D) + _26;
   _28 = MIN_EXPR <c_9(D), _27>;

-  <bb 3> [local count: 955630225]:
+  <bb 3> [local count: 315357973]:
   # i_16 = PHI <i_11(8), beg_5(D)(15)>
   # j_17 = PHI <j_12(8), beg2_6(D)(15)>
   printf ("a: %d %d\n", i_16, j_17);
   i_11 = i_16 + 1;
   j_12 = j_17 + 1;
   if (j_12 < _28)
-    goto <bb 8>; [89.00%]
+    goto <bb 8>; [29.37%]
   else
-    goto <bb 17>; [11.00%]
+    goto <bb 17>; [70.63%]

-  <bb 8> [local count: 850510901]:
+  <bb 8> [local count: 280668596]:
   goto <bb 3>; [100.00%]

-  <bb 16> [local count: 105119324]:
+  <bb 16> [local count: 70429947]:
   # i_22 = PHI <beg_5(D)(14), i_29(17)>
   # j_23 = PHI <beg2_6(D)(14), j_30(17)>

   <bb 10> [local count: 955630225]:
   # i_2 = PHI <i_22(16), i_20(13)>
   # j_1 = PHI <j_23(16), j_21(13)>
   i_20 = i_2 + 1;
   j_21 = j_1 + 1;
   if (end_8(D) > i_20)
-    goto <bb 13>; [89.00%]
+    goto <bb 13>; [59.63%]
   else
-    goto <bb 9>; [11.00%]
+    goto <bb 9>; [40.37%]

-  <bb 13> [local count: 850510901]:
+  <bb 13> [local count: 569842305]:
   goto <bb 10>; [100.00%]

   <bb 17> [local count: 105119324]:
   # i_29 = PHI <i_11(3)>
   # j_30 = PHI <j_12(3)>
   if (end_8(D) > i_29)
     goto <bb 16>; [80.00%]
   else
     goto <bb 9>; [20.00%]

   <bb 9> [local count: 105119324]:

   <bb 6> [local count: 118111600]:
   return 0;

 }
   <bb 2> [local count: 118111600]:
-  if (beg_5(D) < end_8(D))
+  _1 = end_6(D) - beg_7(D);
+  j_9 = _1 + beg2_8(D);
+  if (end_6(D) > beg_7(D))
     goto <bb 14>; [89.00%]
   else
     goto <bb 6>; [11.00%]

   <bb 14> [local count: 105119324]:
-  if (beg2_6(D) < c_9(D))
-    goto <bb 15>; [100.00%]
+  if (j_9 >= c_11(D))
+    goto <bb 15>; [33.00%]
   else
-    goto <bb 16>; [100.00%]
+    goto <bb 16>; [67.00%]

-  <bb 15> [local count: 105119324]:
-  _25 = beg_5(D) + 1;
-  _26 = end_8(D) - beg_5(D);
-  _27 = beg2_6(D) + _26;
-  _28 = MIN_EXPR <c_9(D), _27>;
-
-  <bb 3> [local count: 955630225]:
-  # i_16 = PHI <i_11(8), beg_5(D)(15)>
-  # j_17 = PHI <j_12(8), beg2_6(D)(15)>
-  printf ("a: %d %d\n", i_16, j_17);
-  i_11 = i_16 + 1;
-  j_12 = j_17 + 1;
-  if (j_12 < _28)
-    goto <bb 8>; [89.00%]
+  <bb 15> [local count: 34689377]:
+  _27 = end_6(D) + -1;
+  _28 = beg_7(D) - end_6(D);
+  _29 = j_9 + _28;
+  _30 = _29 + 1;
+  _31 = MAX_EXPR <c_11(D), _30>;
+
+  <bb 3> [local count: 315357973]:
+  # i_18 = PHI <i_13(8), end_6(D)(15)>
+  # j_19 = PHI <j_14(8), j_9(15)>
+  printf ("a: %d %d\n", i_18, j_19);
+  i_13 = i_18 + -1;
+  j_14 = j_19 + -1;
+  if (j_14 >= _31)
+    goto <bb 8>; [29.37%]
   else
-    goto <bb 17>; [11.00%]
+    goto <bb 17>; [70.63%]

-  <bb 8> [local count: 850510901]:
+  <bb 8> [local count: 280668596]:
   goto <bb 3>; [100.00%]

-  <bb 16> [local count: 105119324]:
-  # i_22 = PHI <beg_5(D)(14), i_29(17)>
-  # j_23 = PHI <beg2_6(D)(14), j_30(17)>
+  <bb 16> [local count: 70429947]:
+  # i_24 = PHI <end_6(D)(14), i_32(17)>
+  # j_25 = PHI <j_9(14), j_33(17)>

   <bb 10> [local count: 955630225]:
-  # i_2 = PHI <i_22(16), i_20(13)>
-  # j_1 = PHI <j_23(16), j_21(13)>
-  i_20 = i_2 + 1;
-  j_21 = j_1 + 1;
-  if (end_8(D) > i_20)
+  # i_3 = PHI <i_24(16), i_22(13)>
+  # j_2 = PHI <j_25(16), j_23(13)>
+  i_22 = i_3 + -1;
+  j_23 = j_2 + -1;
+  if (beg_7(D) < i_22)
     goto <bb 13>; [89.00%]
   else
     goto <bb 9>; [11.00%]

-  <bb 13> [local count: 850510901]:
+  <bb 13> [local count: 569842305]:
   goto <bb 10>; [100.00%]

   <bb 17> [local count: 105119324]:
-  # i_29 = PHI <i_11(3)>
-  # j_30 = PHI <j_12(3)>
-  if (end_8(D) > i_29)
+  # i_32 = PHI <i_13(3)>
+  # j_33 = PHI <j_14(3)>
+  if (beg_7(D) < i_32)
     goto <bb 16>; [80.00%]
   else
     goto <bb 9>; [20.00%]

   <bb 9> [local count: 105119324]:

   <bb 6> [local count: 118111600]:
   return 0;

 }

2) diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
...
   <bb 2> [local count: 118111600]:
   if (n_7(D) > 0)
     goto <bb 4>; [89.00%]
   else
     goto <bb 3>; [11.00%]

   <bb 3> [local count: 118111600]:
   return;

   <bb 4> [local count: 105119324]:
   pretmp_3 = ga;

-  <bb 5> [local count: 955630225]:
+  <bb 5> [local count: 315357973]:
   # i_13 = PHI <i_10(20), 0(4)>
   # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
   if (prephitmp_12 != 0)
     goto <bb 6>; [33.00%]
   else
     goto <bb 7>; [67.00%]

   <bb 6> [local count: 315357972]:
   _2 = do_something ();
   ga = _2;

-  <bb 7> [local count: 955630225]:
+  <bb 7> [local count: 315357973]:
   # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
   i_10 = inc (i_13);
   if (n_7(D) > i_10)
     goto <bb 21>; [89.00%]
   else
     goto <bb 11>; [11.00%]

   <bb 11> [local count: 105119324]:
   goto <bb 3>; [100.00%]

-  <bb 21> [local count: 850510901]:
+  <bb 21> [local count: 280668596]:
   if (prephitmp_12 != 0)
-    goto <bb 20>; [100.00%]
+    goto <bb 20>; [33.00%]
   else
-    goto <bb 19>; [INV]
+    goto <bb 19>; [67.00%]

-  <bb 20> [local count: 850510901]:
+  <bb 20> [local count: 280668596]:
   goto <bb 5>; [100.00%]

-  <bb 19> [count: 0]:
+  <bb 19> [local count: 70429947]:
   # i_23 = PHI <i_10(21)>
   # prephitmp_25 = PHI <prephitmp_5(21)>

-  <bb 12> [local count: 955630225]:
+  <bb 12> [local count: 640272252]:
   # i_15 = PHI <i_23(19), i_22(16)>
   # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
   i_22 = inc (i_15);
   if (n_7(D) > i_22)
     goto <bb 16>; [89.00%]
   else
     goto <bb 11>; [11.00%]

-  <bb 16> [local count: 850510901]:
+  <bb 16> [local count: 569842305]:
   goto <bb 12>; [100.00%]

 }

gcc/ChangeLog:

	* tree-ssa-loop-split.c (split_loop): Fix incorrect
	profile_count and probability.
	(do_split_loop_on_cond): Likewise.
---
 gcc/tree-ssa-loop-split.c | 85 +++++++++++++++++++++++++++++++++++----
 1 file changed, 77 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index 3f6ad046623..33128061aab 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -575,7 +575,10 @@ split_loop (class loop *loop1)
 					    stmts2);
 	tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
 	if (!initial_true)
-	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
+	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+
+	edge true_edge, false_edge;
+	extract_true_false_edges_from_block (bbs[i], &true_edge, &false_edge);
 
 	/* Now version the loop, placing loop2 after loop1 connecting
 	   them, and fix up SSA form for that.  */
@@ -583,11 +586,11 @@ split_loop (class loop *loop1)
 	basic_block cond_bb;
 
 	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
-					   profile_probability::always (),
-					   profile_probability::always (),
-					   profile_probability::always (),
-					   profile_probability::always (),
-					   true);
+					  true_edge->probability,
+					  true_edge->probability.invert (),
+					  profile_probability::always (),
+					  profile_probability::always (),
+					  true);
 	gcc_assert (loop2);
 
 	edge new_e = connect_loops (loop1, loop2);
@@ -607,6 +610,38 @@ split_loop (class loop *loop1)
 	tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
 	patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
 
+	update_ssa (TODO_update_ssa);
+
+	/* Proportion first loop's bb counts except those dominated by true
+	   branch to avoid drop 1s down.  */
+	basic_block *bbs1, *bbs2;
+	bbs1 = get_loop_body (loop1);
+	unsigned j;
+	for (j = 0; j < loop1->num_nodes; j++)
+	  if (bbs1[j] == loop1->latch
+	      || !dominated_by_p (CDI_DOMINATORS, bbs1[j], true_edge->dest))
+	    bbs1[j]->count
+	      = bbs1[j]->count.apply_probability (true_edge->probability);
+	free (bbs1);
+
+	/* Fix first loop's exit probability after scaling.  */
+	edge exit_to_latch1 = single_pred_edge (loop1->latch);
+	exit_to_latch1->probability = exit_to_latch1->probability.apply_scale (
+	  true_edge->probability.to_reg_br_prob_base (), REG_BR_PROB_BASE);
+	single_exit (loop1)->probability
+	  = exit_to_latch1->probability.invert ();
+
+	/* Proportion second loop's bb counts except those dominated by false
+	   branch to avoid drop 1s down.  */
+	basic_block bbi_copy = get_bb_copy (false_edge->dest);
+	bbs2 = get_loop_body (loop2);
+	for (j = 0; j < loop2->num_nodes; j++)
+	  if (bbs2[j] == loop2->latch
+	      || !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
+	    bbs2[j]->count = bbs2[j]->count.apply_probability (
+	      true_edge->probability.invert ());
+	free (bbs2);
+
 	/* Finally patch out the two copies of the condition to be always
 	   true/false (or opposite).  */
 	gcond *force_true = as_a<gcond *> (last_stmt (bbs[i]));
@@ -1486,8 +1521,8 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
   initialize_original_copy_tables ();
 
   struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
-				     profile_probability::always (),
-				     profile_probability::never (),
+				     invar_branch->probability.invert (),
+				     invar_branch->probability,
 				     profile_probability::always (),
 				     profile_probability::always (),
 				     true);
@@ -1535,6 +1570,40 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
      between loop1 and loop2.  */
   connect_loop_phis (loop1, loop2, to_loop2);
 
+  update_ssa (TODO_update_ssa);
+
+  edge true_edge, false_edge, skip_edge1, skip_edge2;
+  extract_true_false_edges_from_block (cond_bb, &true_edge, &false_edge);
+
+  /* Proportion first loop's bb counts except those dominated by true
+     branch to avoid drop 1s down.  */
+  skip_edge1 = true_invar ? false_edge : true_edge;
+  skip_edge2 = true_invar ? true_edge : false_edge;
+  basic_block *bbs1, *bbs2;
+  bbs1 = get_loop_body (loop1);
+  unsigned j;
+  for (j = 0; j < loop1->num_nodes; j++)
+    if (bbs1[j] == loop1->latch
+	|| !dominated_by_p (CDI_DOMINATORS, bbs1[j], skip_edge1->dest))
+      bbs1[j]->count
+	= bbs1[j]->count.apply_probability (skip_edge1->probability);
+  free (bbs1);
+
+  /* Fix first loop's exit probability after scaling.  */
+  to_loop1->probability = invar_branch->probability.invert ();
+  to_loop2->probability = invar_branch->probability;
+
+  /* Proportion second loop's bb counts except those dominated by false
+     branch to avoid drop 1s down.  */
+  basic_block bbi_copy = get_bb_copy (skip_edge2->dest);
+  bbs2 = get_loop_body (loop2);
+  for (j = 0; j < loop2->num_nodes; j++)
+    if (bbs2[j] == loop2->latch
+	|| !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
+      bbs2[j]->count
+	= bbs2[j]->count.apply_probability (skip_edge2->probability);
+  free (bbs2);
+
   free_original_copy_tables ();
 
   return true;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL
  2021-12-08  5:54 ` [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL Xionghu Luo
@ 2021-12-08 23:26   ` Jeff Law
  2021-12-13  9:14   ` Jan Hubicka
  1 sibling, 0 replies; 22+ messages in thread
From: Jeff Law @ 2021-12-08 23:26 UTC (permalink / raw)
  To: Xionghu Luo, gcc-patches; +Cc: segher, hubicka, wschmidt, linkw, dje.gcc



On 12/7/2021 10:54 PM, Xionghu Luo via Gcc-patches wrote:
> gcc/ChangeLog:
>
> 	* loop-invariant.c (find_invariants_bb): Check profile count
> 	before motion.
> 	(find_invariants_body): Add argument.
OK
jeff


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] Fix incorrect loop exit edge probability [PR103270]
  2021-12-08  5:54 ` [PATCH 2/3] Fix incorrect loop exit edge probability [PR103270] Xionghu Luo
@ 2021-12-08 23:28   ` Jeff Law
  2021-12-13  9:25   ` Jan Hubicka
  1 sibling, 0 replies; 22+ messages in thread
From: Jeff Law @ 2021-12-08 23:28 UTC (permalink / raw)
  To: Xionghu Luo, gcc-patches; +Cc: segher, hubicka, wschmidt, linkw, dje.gcc



On 12/7/2021 10:54 PM, Xionghu Luo via Gcc-patches wrote:
> r12-4526 cancelled jump thread path rotates loop. It exposes a issue in
> profile-estimate when predict_extra_loop_exits, outer loop's exit edge
> is marked as inner loop's extra loop exit and set with incorrect
> prediction, then a hot inner loop will become cold loop finally through
> optimizations, this patch add loop check when searching extra exit edges
> to avoid unexpected predict_edge from predict_paths_for_bb.
>
> Regression tested on P8LE, OK for master?
>
> gcc/ChangeLog:
>
> 	PR middle-end/103270
> 	* predict.c (predict_extra_loop_exits): Add loop parameter.
> 	(predict_loops): Call with loop argument.
>
> gcc/testsuite/ChangeLog:
>
> 	PR middle-end/103270
> 	* gcc.dg/pr103270.c: New test.
OK
jeff


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/3] Fix loop split incorrect count and probability
  2021-12-08  5:54 ` [PATCH 3/3] Fix loop split incorrect count and probability Xionghu Luo
@ 2021-12-08 23:47   ` Jeff Law
  2021-12-13  8:57     ` Xionghu Luo
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff Law @ 2021-12-08 23:47 UTC (permalink / raw)
  To: Xionghu Luo, gcc-patches; +Cc: segher, hubicka, wschmidt, linkw, dje.gcc



On 12/7/2021 10:54 PM, Xionghu Luo via Gcc-patches wrote:
> In tree-ssa-loop-split.c, split_loop and split_loop_on_cond does two
> kind of split. split_loop only works for single loop and insert edge at
> exit when split, while split_loop_on_cond is not limited to single loop
> and insert edge at latch when split.  Both split behavior should consider
> loop count and probability update.  For split_loop, loop split condition
> is moved in front of loop1 and loop2; But split_loop_on_cond moves the
> condition between loop1 and loop2, this patch does:
>   1) profile count proportion for both original loop and copied loop
> without dropping down the true branch's count;
>   2) probability update in the two loops and between the two loops.
>
> Regression tested pass, OK for master?
>
> Changes diff for split_loop and split_loop_on_cond cases:
>
> 1) diff base/loop-split.c.151t.lsplit patched/loop-split.c.152t.lsplit
> ...
>     <bb 2> [local count: 118111600]:
>     if (beg_5(D) < end_8(D))
>       goto <bb 14>; [89.00%]
>     else
>       goto <bb 6>; [11.00%]
>
>     <bb 14> [local count: 105119324]:
>     if (beg2_6(D) < c_9(D))
> -    goto <bb 15>; [100.00%]
> +    goto <bb 15>; [33.00%]
>     else
> -    goto <bb 16>; [100.00%]
> +    goto <bb 16>; [67.00%]
>
> -  <bb 15> [local count: 105119324]:
> +  <bb 15> [local count: 34689377]:
>     _25 = beg_5(D) + 1;
>     _26 = end_8(D) - beg_5(D);
>     _27 = beg2_6(D) + _26;
>     _28 = MIN_EXPR <c_9(D), _27>;
>
> -  <bb 3> [local count: 955630225]:
> +  <bb 3> [local count: 315357973]:
>     # i_16 = PHI <i_11(8), beg_5(D)(15)>
>     # j_17 = PHI <j_12(8), beg2_6(D)(15)>
>     printf ("a: %d %d\n", i_16, j_17);
>     i_11 = i_16 + 1;
>     j_12 = j_17 + 1;
>     if (j_12 < _28)
> -    goto <bb 8>; [89.00%]
> +    goto <bb 8>; [29.37%]
>     else
> -    goto <bb 17>; [11.00%]
> +    goto <bb 17>; [70.63%]
>
> -  <bb 8> [local count: 850510901]:
> +  <bb 8> [local count: 280668596]:
>     goto <bb 3>; [100.00%]
>
> -  <bb 16> [local count: 105119324]:
> +  <bb 16> [local count: 70429947]:
>     # i_22 = PHI <beg_5(D)(14), i_29(17)>
>     # j_23 = PHI <beg2_6(D)(14), j_30(17)>
>
>     <bb 10> [local count: 955630225]:
>     # i_2 = PHI <i_22(16), i_20(13)>
>     # j_1 = PHI <j_23(16), j_21(13)>
>     i_20 = i_2 + 1;
>     j_21 = j_1 + 1;
>     if (end_8(D) > i_20)
> -    goto <bb 13>; [89.00%]
> +    goto <bb 13>; [59.63%]
>     else
> -    goto <bb 9>; [11.00%]
> +    goto <bb 9>; [40.37%]
>
> -  <bb 13> [local count: 850510901]:
> +  <bb 13> [local count: 569842305]:
>     goto <bb 10>; [100.00%]
>
>     <bb 17> [local count: 105119324]:
>     # i_29 = PHI <i_11(3)>
>     # j_30 = PHI <j_12(3)>
>     if (end_8(D) > i_29)
>       goto <bb 16>; [80.00%]
>     else
>       goto <bb 9>; [20.00%]
>
>     <bb 9> [local count: 105119324]:
>
>     <bb 6> [local count: 118111600]:
>     return 0;
>
>   }
>     <bb 2> [local count: 118111600]:
> -  if (beg_5(D) < end_8(D))
> +  _1 = end_6(D) - beg_7(D);
> +  j_9 = _1 + beg2_8(D);
> +  if (end_6(D) > beg_7(D))
>       goto <bb 14>; [89.00%]
>     else
>       goto <bb 6>; [11.00%]
>
>     <bb 14> [local count: 105119324]:
> -  if (beg2_6(D) < c_9(D))
> -    goto <bb 15>; [100.00%]
> +  if (j_9 >= c_11(D))
> +    goto <bb 15>; [33.00%]
>     else
> -    goto <bb 16>; [100.00%]
> +    goto <bb 16>; [67.00%]
>
> -  <bb 15> [local count: 105119324]:
> -  _25 = beg_5(D) + 1;
> -  _26 = end_8(D) - beg_5(D);
> -  _27 = beg2_6(D) + _26;
> -  _28 = MIN_EXPR <c_9(D), _27>;
> -
> -  <bb 3> [local count: 955630225]:
> -  # i_16 = PHI <i_11(8), beg_5(D)(15)>
> -  # j_17 = PHI <j_12(8), beg2_6(D)(15)>
> -  printf ("a: %d %d\n", i_16, j_17);
> -  i_11 = i_16 + 1;
> -  j_12 = j_17 + 1;
> -  if (j_12 < _28)
> -    goto <bb 8>; [89.00%]
> +  <bb 15> [local count: 34689377]:
> +  _27 = end_6(D) + -1;
> +  _28 = beg_7(D) - end_6(D);
> +  _29 = j_9 + _28;
> +  _30 = _29 + 1;
> +  _31 = MAX_EXPR <c_11(D), _30>;
> +
> +  <bb 3> [local count: 315357973]:
> +  # i_18 = PHI <i_13(8), end_6(D)(15)>
> +  # j_19 = PHI <j_14(8), j_9(15)>
> +  printf ("a: %d %d\n", i_18, j_19);
> +  i_13 = i_18 + -1;
> +  j_14 = j_19 + -1;
> +  if (j_14 >= _31)
> +    goto <bb 8>; [29.37%]
>     else
> -    goto <bb 17>; [11.00%]
> +    goto <bb 17>; [70.63%]
>
> -  <bb 8> [local count: 850510901]:
> +  <bb 8> [local count: 280668596]:
>     goto <bb 3>; [100.00%]
>
> -  <bb 16> [local count: 105119324]:
> -  # i_22 = PHI <beg_5(D)(14), i_29(17)>
> -  # j_23 = PHI <beg2_6(D)(14), j_30(17)>
> +  <bb 16> [local count: 70429947]:
> +  # i_24 = PHI <end_6(D)(14), i_32(17)>
> +  # j_25 = PHI <j_9(14), j_33(17)>
>
>     <bb 10> [local count: 955630225]:
> -  # i_2 = PHI <i_22(16), i_20(13)>
> -  # j_1 = PHI <j_23(16), j_21(13)>
> -  i_20 = i_2 + 1;
> -  j_21 = j_1 + 1;
> -  if (end_8(D) > i_20)
> +  # i_3 = PHI <i_24(16), i_22(13)>
> +  # j_2 = PHI <j_25(16), j_23(13)>
> +  i_22 = i_3 + -1;
> +  j_23 = j_2 + -1;
> +  if (beg_7(D) < i_22)
>       goto <bb 13>; [89.00%]
>     else
>       goto <bb 9>; [11.00%]
>
> -  <bb 13> [local count: 850510901]:
> +  <bb 13> [local count: 569842305]:
>     goto <bb 10>; [100.00%]
>
>     <bb 17> [local count: 105119324]:
> -  # i_29 = PHI <i_11(3)>
> -  # j_30 = PHI <j_12(3)>
> -  if (end_8(D) > i_29)
> +  # i_32 = PHI <i_13(3)>
> +  # j_33 = PHI <j_14(3)>
> +  if (beg_7(D) < i_32)
>       goto <bb 16>; [80.00%]
>     else
>       goto <bb 9>; [20.00%]
>
>     <bb 9> [local count: 105119324]:
>
>     <bb 6> [local count: 118111600]:
>     return 0;
>
>   }
>
> 2) diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
> ...
>     <bb 2> [local count: 118111600]:
>     if (n_7(D) > 0)
>       goto <bb 4>; [89.00%]
>     else
>       goto <bb 3>; [11.00%]
>
>     <bb 3> [local count: 118111600]:
>     return;
>
>     <bb 4> [local count: 105119324]:
>     pretmp_3 = ga;
>
> -  <bb 5> [local count: 955630225]:
> +  <bb 5> [local count: 315357973]:
>     # i_13 = PHI <i_10(20), 0(4)>
>     # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
>     if (prephitmp_12 != 0)
>       goto <bb 6>; [33.00%]
>     else
>       goto <bb 7>; [67.00%]
>
>     <bb 6> [local count: 315357972]:
>     _2 = do_something ();
>     ga = _2;
>
> -  <bb 7> [local count: 955630225]:
> +  <bb 7> [local count: 315357973]:
>     # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
>     i_10 = inc (i_13);
>     if (n_7(D) > i_10)
>       goto <bb 21>; [89.00%]
>     else
>       goto <bb 11>; [11.00%]
>
>     <bb 11> [local count: 105119324]:
>     goto <bb 3>; [100.00%]
>
> -  <bb 21> [local count: 850510901]:
> +  <bb 21> [local count: 280668596]:
>     if (prephitmp_12 != 0)
> -    goto <bb 20>; [100.00%]
> +    goto <bb 20>; [33.00%]
>     else
> -    goto <bb 19>; [INV]
> +    goto <bb 19>; [67.00%]
>
> -  <bb 20> [local count: 850510901]:
> +  <bb 20> [local count: 280668596]:
>     goto <bb 5>; [100.00%]
>
> -  <bb 19> [count: 0]:
> +  <bb 19> [local count: 70429947]:
>     # i_23 = PHI <i_10(21)>
>     # prephitmp_25 = PHI <prephitmp_5(21)>
>
> -  <bb 12> [local count: 955630225]:
> +  <bb 12> [local count: 640272252]:
>     # i_15 = PHI <i_23(19), i_22(16)>
>     # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
>     i_22 = inc (i_15);
>     if (n_7(D) > i_22)
>       goto <bb 16>; [89.00%]
>     else
>       goto <bb 11>; [11.00%]
>
> -  <bb 16> [local count: 850510901]:
> +  <bb 16> [local count: 569842305]:
>     goto <bb 12>; [100.00%]
>
>   }
>
> gcc/ChangeLog:
>
> 	* tree-ssa-loop-split.c (split_loop): Fix incorrect
> 	profile_count and probability.
> 	(do_split_loop_on_cond): Likewise.
> ---
>   gcc/tree-ssa-loop-split.c | 85 +++++++++++++++++++++++++++++++++++----
>   1 file changed, 77 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 3f6ad046623..33128061aab 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
>
> @@ -607,6 +610,38 @@ split_loop (class loop *loop1)
>   	tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
>   	patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
>   
> +	update_ssa (TODO_update_ssa);
> +
> +	/* Proportion first loop's bb counts except those dominated by true
> +	   branch to avoid drop 1s down.  */
> +	basic_block *bbs1, *bbs2;
> +	bbs1 = get_loop_body (loop1);
> +	unsigned j;
> +	for (j = 0; j < loop1->num_nodes; j++)
> +	  if (bbs1[j] == loop1->latch
> +	      || !dominated_by_p (CDI_DOMINATORS, bbs1[j], true_edge->dest))
> +	    bbs1[j]->count
> +	      = bbs1[j]->count.apply_probability (true_edge->probability);
> +	free (bbs1);
It looks like there's two copies of this code in this patch, one in 
split_loop and the other in do_split_loop_on_cond.  Would it make sense 
to factor it out into its own little function?


> +
> +	/* Proportion second loop's bb counts except those dominated by false
> +	   branch to avoid drop 1s down.  */
> +	basic_block bbi_copy = get_bb_copy (false_edge->dest);
> +	bbs2 = get_loop_body (loop2);
> +	for (j = 0; j < loop2->num_nodes; j++)
> +	  if (bbs2[j] == loop2->latch
> +	      || !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
> +	    bbs2[j]->count = bbs2[j]->count.apply_probability (
> +	      true_edge->probability.invert ());
> +	free (bbs2);
Similarly for this block of code.

If those can be reasonably factored out into two helper functions to be 
called from split_loop and do_split_loop_on_cond, then this is OK with 
the refactoring.

jeff


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/3] Fix loop split incorrect count and probability
  2021-12-08 23:47   ` Jeff Law
@ 2021-12-13  8:57     ` Xionghu Luo
  2021-12-21  3:57       ` Xionghu Luo
  0 siblings, 1 reply; 22+ messages in thread
From: Xionghu Luo @ 2021-12-13  8:57 UTC (permalink / raw)
  To: Jeff Law, gcc-patches; +Cc: segher, hubicka, wschmidt, linkw, dje.gcc



On 2021/12/9 07:47, Jeff Law wrote:
>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>> index 3f6ad046623..33128061aab 100644
>> --- a/gcc/tree-ssa-loop-split.c
>> +++ b/gcc/tree-ssa-loop-split.c
>>
>> @@ -607,6 +610,38 @@ split_loop (class loop *loop1)
>>       tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge
>> (loop1));
>>       patch_loop_exit (loop1, guard_stmt, guard_next, newend,
>> initial_true);
>>   +    update_ssa (TODO_update_ssa);
>> +
>> +    /* Proportion first loop's bb counts except those dominated by true
>> +       branch to avoid drop 1s down.  */
>> +    basic_block *bbs1, *bbs2;
>> +    bbs1 = get_loop_body (loop1);
>> +    unsigned j;
>> +    for (j = 0; j < loop1->num_nodes; j++)
>> +      if (bbs1[j] == loop1->latch
>> +          || !dominated_by_p (CDI_DOMINATORS, bbs1[j], true_edge->dest))
>> +        bbs1[j]->count
>> +          = bbs1[j]->count.apply_probability (true_edge->probability);
>> +    free (bbs1);
> It looks like there's two copies of this code in this patch, one in
> split_loop and the other in do_split_loop_on_cond.  Would it make sense
> to factor it out into its own little function?
> 
> 
>> +
>> +    /* Proportion second loop's bb counts except those dominated by
>> false
>> +       branch to avoid drop 1s down.  */
>> +    basic_block bbi_copy = get_bb_copy (false_edge->dest);
>> +    bbs2 = get_loop_body (loop2);
>> +    for (j = 0; j < loop2->num_nodes; j++)
>> +      if (bbs2[j] == loop2->latch
>> +          || !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
>> +        bbs2[j]->count = bbs2[j]->count.apply_probability (
>> +          true_edge->probability.invert ());
>> +    free (bbs2);
> Similarly for this block of code.
> 
> If those can be reasonably factored out into two helper functions to be
> called from split_loop and do_split_loop_on_cond, then this is OK with
> the refactoring.
> 
> jeff


Thanks for the comments, updated as below.  Will commit this patchset and the
approved patch for LIM if there are no objections:


[PATCH v2 3/3] Fix loop split incorrect count and probability

In tree-ssa-loop-split.c, split_loop and split_loop_on_cond does two
kind of split. split_loop only works for single loop and insert edge at
exit when split, while split_loop_on_cond is not limited to single loop
and insert edge at latch when split.  Both split behavior should consider
loop count and probability update.  For split_loop, loop split condition
is moved in front of loop1 and loop2; But split_loop_on_cond moves the
condition between loop1 and loop2, this patch does:
 1) profile count proportion for both original loop and copied loop
without dropping down the true branch's count;
 2) probability update in the two loops and between the two loops.

Regression tested pass, OK for master?

Changes diff for split_loop and split_loop_on_cond cases:

1) diff base/loop-split.c.151t.lsplit patched/loop-split.c.152t.lsplit
...
   <bb 2> [local count: 118111600]:
   _1 = end_6(D) - beg_7(D);
   j_9 = _1 + beg2_8(D);
   if (end_6(D) > beg_7(D))
     goto <bb 14>; [89.00%]
   else
     goto <bb 6>; [11.00%]

   <bb 14> [local count: 105119324]:
   if (j_9 >= c_11(D))
-    goto <bb 15>; [100.00%]
+    goto <bb 15>; [33.00%]
   else
-    goto <bb 16>; [100.00%]
+    goto <bb 16>; [67.00%]

-  <bb 15> [local count: 105119324]:
+  <bb 15> [local count: 34689377]:
   _27 = end_6(D) + -1;
   _28 = beg_7(D) - end_6(D);
   _29 = j_9 + _28;
   _30 = _29 + 1;
   _31 = MAX_EXPR <c_11(D), _30>;

-  <bb 3> [local count: 955630225]:
+  <bb 3> [local count: 315357973]:
   # i_18 = PHI <i_13(8), end_6(D)(15)>
   # j_19 = PHI <j_14(8), j_9(15)>
   printf ("a: %d %d\n", i_18, j_19);
   i_13 = i_18 + -1;
   j_14 = j_19 + -1;
   if (j_14 >= _31)
-    goto <bb 8>; [89.00%]
+    goto <bb 8>; [29.37%]
   else
-    goto <bb 17>; [11.00%]
+    goto <bb 17>; [70.63%]

-  <bb 8> [local count: 850510901]:
+  <bb 8> [local count: 280668596]:
   goto <bb 3>; [100.00%]

-  <bb 16> [local count: 105119324]:
+  <bb 16> [local count: 70429947]:
   # i_24 = PHI <end_6(D)(14), i_32(17)>
   # j_25 = PHI <j_9(14), j_33(17)>

   <bb 10> [local count: 955630225]:
   # i_3 = PHI <i_24(16), i_22(13)>
   # j_2 = PHI <j_25(16), j_23(13)>
   i_22 = i_3 + -1;
   j_23 = j_2 + -1;
   if (beg_7(D) < i_22)
     goto <bb 13>; [89.00%]
   else
     goto <bb 9>; [11.00%]

-  <bb 13> [local count: 850510901]:
+  <bb 13> [local count: 569842305]:
   goto <bb 10>; [100.00%]

   <bb 17> [local count: 105119324]:
   # i_32 = PHI <i_13(3)>
   # j_33 = PHI <j_14(3)>
   if (beg_7(D) < i_32)
     goto <bb 16>; [80.00%]
   else
     goto <bb 9>; [20.00%]

   <bb 9> [local count: 105119324]:

   <bb 6> [local count: 118111600]:
   return 0;

 }

2) diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
...
   <bb 2> [local count: 118111600]:
   if (n_7(D) > 0)
     goto <bb 4>; [89.00%]
   else
     goto <bb 3>; [11.00%]

   <bb 3> [local count: 118111600]:
   return;

   <bb 4> [local count: 105119324]:
   pretmp_3 = ga;

-  <bb 5> [local count: 955630225]:
+  <bb 5> [local count: 315357973]:
   # i_13 = PHI <i_10(20), 0(4)>
   # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
   if (prephitmp_12 != 0)
     goto <bb 6>; [33.00%]
   else
     goto <bb 7>; [67.00%]

   <bb 6> [local count: 315357972]:
   _2 = do_something ();
   ga = _2;

-  <bb 7> [local count: 955630225]:
+  <bb 7> [local count: 315357973]:
   # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
   i_10 = inc (i_13);
   if (n_7(D) > i_10)
     goto <bb 21>; [89.00%]
   else
     goto <bb 11>; [11.00%]

   <bb 11> [local count: 105119324]:
   goto <bb 3>; [100.00%]

-  <bb 21> [local count: 850510901]:
+  <bb 21> [local count: 280668596]:
   if (prephitmp_12 != 0)
-    goto <bb 20>; [100.00%]
+    goto <bb 20>; [33.00%]
   else
-    goto <bb 19>; [INV]
+    goto <bb 19>; [67.00%]

-  <bb 20> [local count: 850510901]:
+  <bb 20> [local count: 280668596]:
   goto <bb 5>; [100.00%]

-  <bb 19> [count: 0]:
+  <bb 19> [local count: 70429947]:
   # i_23 = PHI <i_10(21)>
   # prephitmp_25 = PHI <prephitmp_5(21)>

-  <bb 12> [local count: 955630225]:
+  <bb 12> [local count: 640272252]:
   # i_15 = PHI <i_23(19), i_22(16)>
   # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
   i_22 = inc (i_15);
   if (n_7(D) > i_22)
     goto <bb 16>; [89.00%]
   else
     goto <bb 11>; [11.00%]

-  <bb 16> [local count: 850510901]:
+  <bb 16> [local count: 569842305]:
   goto <bb 12>; [100.00%]
 }

gcc/ChangeLog:

	* tree-ssa-loop-split.c (Fix_loop_bb_probability): New function.
	(split_loop): Fix incorrect profile_count and probability.
	(do_split_loop_on_cond): Likewise.
---
 gcc/tree-ssa-loop-split.c | 72 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 64 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index 3f6ad046623..55011aeed97 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -484,6 +484,39 @@ compute_new_first_bound (gimple_seq *stmts, class tree_niter_desc *niter,
   return newend;
 }
 
+/* Fix the two loop's bb count after split based on the split edge probability,
+   don't adjust the bbs dominated by true branches of that loop to avoid
+   dropping 1s down.  */
+static void
+fix_loop_bb_probability (class loop *loop1, class loop *loop2, edge true_edge,
+			 edge false_edge)
+{
+  update_ssa (TODO_update_ssa);
+
+  /* Proportion first loop's bb counts except those dominated by true
+     branch to avoid drop 1s down.  */
+  basic_block *bbs1, *bbs2;
+  bbs1 = get_loop_body (loop1);
+  unsigned j;
+  for (j = 0; j < loop1->num_nodes; j++)
+    if (bbs1[j] == loop1->latch
+	|| !dominated_by_p (CDI_DOMINATORS, bbs1[j], true_edge->dest))
+      bbs1[j]->count
+	= bbs1[j]->count.apply_probability (true_edge->probability);
+  free (bbs1);
+
+  /* Proportion second loop's bb counts except those dominated by false
+     branch to avoid drop 1s down.  */
+  basic_block bbi_copy = get_bb_copy (false_edge->dest);
+  bbs2 = get_loop_body (loop2);
+  for (j = 0; j < loop2->num_nodes; j++)
+    if (bbs2[j] == loop2->latch
+	|| !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
+      bbs2[j]->count
+	= bbs2[j]->count.apply_probability (true_edge->probability.invert ());
+  free (bbs2);
+}
+
 /* Checks if LOOP contains an conditional block whose condition
    depends on which side in the iteration space it is, and if so
    splits the iteration space into two loops.  Returns true if the
@@ -575,7 +608,10 @@ split_loop (class loop *loop1)
 					    stmts2);
 	tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
 	if (!initial_true)
-	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
+	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+
+	edge true_edge, false_edge;
+	extract_true_false_edges_from_block (bbs[i], &true_edge, &false_edge);
 
 	/* Now version the loop, placing loop2 after loop1 connecting
 	   them, and fix up SSA form for that.  */
@@ -583,11 +619,11 @@ split_loop (class loop *loop1)
 	basic_block cond_bb;
 
 	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
-					   profile_probability::always (),
-					   profile_probability::always (),
-					   profile_probability::always (),
-					   profile_probability::always (),
-					   true);
+					  true_edge->probability,
+					  true_edge->probability.invert (),
+					  profile_probability::always (),
+					  profile_probability::always (),
+					  true);
 	gcc_assert (loop2);
 
 	edge new_e = connect_loops (loop1, loop2);
@@ -607,6 +643,15 @@ split_loop (class loop *loop1)
 	tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
 	patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
 
+	fix_loop_bb_probability (loop1, loop2, true_edge, false_edge);
+
+	/* Fix first loop's exit probability after scaling.  */
+	edge exit_to_latch1 = single_pred_edge (loop1->latch);
+	exit_to_latch1->probability = exit_to_latch1->probability.apply_scale (
+	  true_edge->probability.to_reg_br_prob_base (), REG_BR_PROB_BASE);
+	single_exit (loop1)->probability
+	  = exit_to_latch1->probability.invert ();
+
 	/* Finally patch out the two copies of the condition to be always
 	   true/false (or opposite).  */
 	gcond *force_true = as_a<gcond *> (last_stmt (bbs[i]));
@@ -1486,8 +1531,8 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
   initialize_original_copy_tables ();
 
   struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
-				     profile_probability::always (),
-				     profile_probability::never (),
+				     invar_branch->probability.invert (),
+				     invar_branch->probability,
 				     profile_probability::always (),
 				     profile_probability::always (),
 				     true);
@@ -1535,6 +1580,17 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
      between loop1 and loop2.  */
   connect_loop_phis (loop1, loop2, to_loop2);
 
+  edge true_edge, false_edge, skip_edge1, skip_edge2;
+  extract_true_false_edges_from_block (cond_bb, &true_edge, &false_edge);
+
+  skip_edge1 = true_invar ? false_edge : true_edge;
+  skip_edge2 = true_invar ? true_edge : false_edge;
+  fix_loop_bb_probability (loop1, loop2, skip_edge1, skip_edge2);
+
+  /* Fix first loop's exit probability after scaling.  */
+  to_loop1->probability = invar_branch->probability.invert ();
+  to_loop2->probability = invar_branch->probability;
+
   free_original_copy_tables ();
 
   return true;
-- 
2.27.0.90.geebb51ba8c




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL
  2021-12-08  5:54 ` [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL Xionghu Luo
  2021-12-08 23:26   ` Jeff Law
@ 2021-12-13  9:14   ` Jan Hubicka
  2021-12-13 10:24     ` Jan Hubicka
  1 sibling, 1 reply; 22+ messages in thread
From: Jan Hubicka @ 2021-12-13  9:14 UTC (permalink / raw)
  To: Xionghu Luo; +Cc: gcc-patches, segher, wschmidt, linkw, dje.gcc

> gcc/ChangeLog:
> 
> 	* loop-invariant.c (find_invariants_bb): Check profile count
> 	before motion.
> 	(find_invariants_body): Add argument.
> ---
>  gcc/loop-invariant.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> index 5eee2e5c9f8..c61c8612fae 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed)
>     call.  */
>  
>  static void
> -find_invariants_bb (basic_block bb, bool always_reached, bool always_executed)
> +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
> +		    bool always_executed)
>  {
>    rtx_insn *insn;
> +  basic_block preheader = loop_preheader_edge (loop)->src;
> +
> +  if (preheader->count > bb->count)
> +    return;

Please add a comment explaining the conditional and if possible also a
testcase.  Since profile updating and use is sensitive topic and it may
trigger regressions later, it is important to keep track of info why
given tests was added.

I wonder why the cost model chose to move any invariatns to preheader
why preheader->count > bb->count?

Honza
>  
>    FOR_BB_INSNS (bb, insn)
>      {
> @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block *body,
>    unsigned i;
>  
>    for (i = 0; i < loop->num_nodes; i++)
> -    find_invariants_bb (body[i],
> -			bitmap_bit_p (always_reached, i),
> +    find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
>  			bitmap_bit_p (always_executed, i));
>  }
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] Fix incorrect loop exit edge probability [PR103270]
  2021-12-08  5:54 ` [PATCH 2/3] Fix incorrect loop exit edge probability [PR103270] Xionghu Luo
  2021-12-08 23:28   ` Jeff Law
@ 2021-12-13  9:25   ` Jan Hubicka
  2021-12-14  9:27     ` Xionghu Luo
  1 sibling, 1 reply; 22+ messages in thread
From: Jan Hubicka @ 2021-12-13  9:25 UTC (permalink / raw)
  To: Xionghu Luo; +Cc: gcc-patches, segher, wschmidt, linkw, dje.gcc

> r12-4526 cancelled jump thread path rotates loop. It exposes a issue in
> profile-estimate when predict_extra_loop_exits, outer loop's exit edge
> is marked as inner loop's extra loop exit and set with incorrect
> prediction, then a hot inner loop will become cold loop finally through
> optimizations, this patch add loop check when searching extra exit edges
> to avoid unexpected predict_edge from predict_paths_for_bb.
> 
> Regression tested on P8LE, OK for master?
> 
> gcc/ChangeLog:
> 
> 	PR middle-end/103270
> 	* predict.c (predict_extra_loop_exits): Add loop parameter.
> 	(predict_loops): Call with loop argument.

With changes to branch predictors it is useful to re-test their
effectivity on spec and see if their hitrates are still mathcing
reality.  You can do it by buiding spec with -fprofile-generate, train
it and then build with -fprofile-use -fdump-tree-ipa-profile-details
and use contrib/analyze_brprob.py that will collect info on how they
work.

This patch looks good to me, but it would be nice to have things reality
checked (and since we did not do the stats for some time, there may be
surprises) so if you could run the specs and post results of
analyze_brprob, it would be great.  I will also try to get to that soon,
but currently I am bit swamped by other problems I noticed on clang
builds.

Thanks a lot for working on profile fixes - I am trying now to get
things into shape.  With Martin we added basic testing infrastructure
for keeping track of profile updates and I am trying to see how it works
in practice now.  Hopefully it will make it easier to judge on profile
updating patches. I would welcome list of patches I should look at.

I will write separate mail on this.
Honza
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR middle-end/103270
> 	* gcc.dg/pr103270.c: New test.
> ---
>  gcc/predict.c                   | 10 ++++++----
>  gcc/testsuite/gcc.dg/pr103270.c | 19 +++++++++++++++++++
>  2 files changed, 25 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr103270.c
> 
> diff --git a/gcc/predict.c b/gcc/predict.c
> index 3cb4e3c0eb5..5b6e0cf722b 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -1859,7 +1859,7 @@ predict_iv_comparison (class loop *loop, basic_block bb,
>     exits to predict them using PRED_LOOP_EXTRA_EXIT.  */
>  
>  static void
> -predict_extra_loop_exits (edge exit_edge)
> +predict_extra_loop_exits (class loop *loop, edge exit_edge)
>  {
>    unsigned i;
>    bool check_value_one;
> @@ -1912,12 +1912,14 @@ predict_extra_loop_exits (edge exit_edge)
>  	continue;
>        if (EDGE_COUNT (e->src->succs) != 1)
>  	{
> -	  predict_paths_leading_to_edge (e, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN);
> +	  predict_paths_leading_to_edge (e, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN,
> +					 loop);
>  	  continue;
>  	}
>  
>        FOR_EACH_EDGE (e1, ei, e->src->preds)
> -	predict_paths_leading_to_edge (e1, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN);
> +	predict_paths_leading_to_edge (e1, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN,
> +				       loop);
>      }
>  }
>  
> @@ -2008,7 +2010,7 @@ predict_loops (void)
>  			 ex->src->index, ex->dest->index);
>  	      continue;
>  	    }
> -	  predict_extra_loop_exits (ex);
> +	  predict_extra_loop_exits (loop, ex);
>  
>  	  if (number_of_iterations_exit (loop, ex, &niter_desc, false, false))
>  	    niter = niter_desc.niter;
> diff --git a/gcc/testsuite/gcc.dg/pr103270.c b/gcc/testsuite/gcc.dg/pr103270.c
> new file mode 100644
> index 00000000000..819310e360e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr103270.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-profile_estimate" } */
> +
> +void test(int a, int* i)
> +{
> +  for (; a < 5; ++a)
> +    {
> +      int b = 0;
> +      int c = 0;
> +      for (; b != -11; b--)
> +	for (int d = 0; d ==0; d++)
> +	  {
> +	    *i += c & a;
> +	    c = b;
> +	  }
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "extra loop exit heuristics of edge\[^:\]*:" "profile_estimate"} } */
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL
  2021-12-13  9:14   ` Jan Hubicka
@ 2021-12-13 10:24     ` Jan Hubicka
  2021-12-14  9:21       ` Xionghu Luo
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Hubicka @ 2021-12-13 10:24 UTC (permalink / raw)
  To: Xionghu Luo; +Cc: gcc-patches, segher, wschmidt, linkw, dje.gcc

> > gcc/ChangeLog:
> > 
> > 	* loop-invariant.c (find_invariants_bb): Check profile count
> > 	before motion.
> > 	(find_invariants_body): Add argument.
> > ---
> >  gcc/loop-invariant.c | 10 +++++++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> > index 5eee2e5c9f8..c61c8612fae 100644
> > --- a/gcc/loop-invariant.c
> > +++ b/gcc/loop-invariant.c
> > @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed)
> >     call.  */
> >  
> >  static void
> > -find_invariants_bb (basic_block bb, bool always_reached, bool always_executed)
> > +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
> > +		    bool always_executed)
> >  {
> >    rtx_insn *insn;
> > +  basic_block preheader = loop_preheader_edge (loop)->src;
> > +
> > +  if (preheader->count > bb->count)
> > +    return;
> 
> Please add a comment explaining the conditional and if possible also a
> testcase.  Since profile updating and use is sensitive topic and it may
> trigger regressions later, it is important to keep track of info why
> given tests was added.
> 
> I wonder why the cost model chose to move any invariatns to preheader
> why preheader->count > bb->count?

Thinking about this more, you want to test:
  if (!always_executed && preheader->count > bb->count)
    return;
This will rule out some profile misupates.

Also the code updating always_reached is worng:
      if (always_reached
          && CALL_P (insn)
          && (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
              || ! RTL_CONST_OR_PURE_CALL_P (insn)))
  always_reached = false;
It misses the case where statement can trhow external exception or
volatile asms that can also terminate execution.

I bit worry on how often the test will have false positives with guessed
profile where earlier loop optimizations reduced trip count below
realistic estimate.

Honza
> 
> Honza
> >  
> >    FOR_BB_INSNS (bb, insn)
> >      {
> > @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block *body,
> >    unsigned i;
> >  
> >    for (i = 0; i < loop->num_nodes; i++)
> > -    find_invariants_bb (body[i],
> > -			bitmap_bit_p (always_reached, i),
> > +    find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
> >  			bitmap_bit_p (always_executed, i));
> >  }
> >  
> > -- 
> > 2.25.1
> > 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL
  2021-12-13 10:24     ` Jan Hubicka
@ 2021-12-14  9:21       ` Xionghu Luo
  2021-12-16 11:20         ` Jan Hubicka
  0 siblings, 1 reply; 22+ messages in thread
From: Xionghu Luo @ 2021-12-14  9:21 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, segher, wschmidt, linkw, dje.gcc



On 2021/12/13 18:24, Jan Hubicka wrote:
>>> gcc/ChangeLog:
>>>
>>> 	* loop-invariant.c (find_invariants_bb): Check profile count
>>> 	before motion.
>>> 	(find_invariants_body): Add argument.
>>> ---
>>>  gcc/loop-invariant.c | 10 +++++++---
>>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
>>> index 5eee2e5c9f8..c61c8612fae 100644
>>> --- a/gcc/loop-invariant.c
>>> +++ b/gcc/loop-invariant.c
>>> @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed)
>>>     call.  */
>>>  
>>>  static void
>>> -find_invariants_bb (basic_block bb, bool always_reached, bool always_executed)
>>> +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
>>> +		    bool always_executed)
>>>  {
>>>    rtx_insn *insn;
>>> +  basic_block preheader = loop_preheader_edge (loop)->src;
>>> +
>>> +  if (preheader->count > bb->count)
>>> +    return;
>>
>> Please add a comment explaining the conditional and if possible also a
>> testcase.  Since profile updating and use is sensitive topic and it may
>> trigger regressions later, it is important to keep track of info why
>> given tests was added.
 
OK. Comments like?

/* Don't move insn of cold BB out of loop to preheader to reduce calculations
   and register live range in hot loop with cold BB.  */


And maybe some dump log will help tracking in xxx.c.271r.loop2_invariant.

--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1190,7 +1190,12 @@ find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
   basic_block preheader = loop_preheader_edge (loop)->src;

   if (preheader->count > bb->count)
-    return;
+    {
+      if (dump_file)
+       fprintf (dump_file, "Don't move invariant from bb: %d in loop %d\n",
+                bb->index, loop->num);
+      return;
+    }


This case could reflect the patch's effect:


gcc/testsuite/gcc.dg/loop-invariant-2.c
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-rtl-loop2_invariant" } */

volatile int x;
void
bar (int, char *, char *);
void
foo (int *a, int n, int k)
{
  int i;

  for (i = 0; i < n; i++)
    {
      if (__builtin_expect (x, 0))
        bar (k / 5, "one", "two");
      a[i] = k;
    }
}

/* { dg-final { scan-rtl-dump-not "Decided to move invariant" "loop2_invariant" } } */


insn 27,28,29 was hoisted out of loop, with the count test patch, they are kept in
loop body.

 diff -U15 base/ssa-lim-23.c.271r.loop2_invariant patched/ssa-lim-23.c.271r.loop2_invariant

 *****ending processing of loop 1 ******
-Set in insn 27 is invariant (0), cost 16, depends on
-Set in insn 28 is invariant (1), cost 16, depends on
-Set in insn 29 is invariant (2), cost 8, depends on
-Set in insn 30 is invariant (3), cost 0, depends on 0
-Set in insn 31 is invariant (4), cost 0, depends on 1
-Set in insn 32 is invariant (5), cost 0, depends on 2
-Decided to move invariant 0 -- gain 16
-Decided to move invariant 1 -- gain 16
-Decided to move invariant 2 -- gain 8
-deferring rescan insn with uid = 27.
-deferring rescan insn with uid = 30.
-deferring rescan insn with uid = 61.
-changing bb of uid 27
-  from 5 to 3
-deferring rescan insn with uid = 28.
-deferring rescan insn with uid = 31.
-deferring rescan insn with uid = 62.
-changing bb of uid 28
-  from 5 to 3
-deferring rescan insn with uid = 29.
-deferring rescan insn with uid = 32.
-deferring rescan insn with uid = 63.
-changing bb of uid 29
-  from 5 to 3
 starting the processing of deferred insns
-rescanning insn with uid = 27.
-rescanning insn with uid = 28.
-rescanning insn with uid = 29.
-rescanning insn with uid = 30.
-rescanning insn with uid = 31.
-rescanning insn with uid = 32.
-rescanning insn with uid = 61.
-rescanning insn with uid = 62.
-rescanning insn with uid = 63.
 ending the processing of deferred insns
 starting the processing of deferred insns
 ending the processing of deferred insns

...

    55: r138:DI=unspec[`*.LANCHOR0',%2:DI] 47
       REG_EQUAL `*.LANCHOR0'
-   27: r139:DI=unspec[`*.LC0',%2:DI] 47
-   28: r140:DI=unspec[`*.LC1',%2:DI] 47
-   29: r141:DI=sign_extend(r118:SI)
    39: L39:
    21: NOTE_INSN_BASIC_BLOCK 4
    23: r117:SI=[r138:DI]
    24: r133:CC=cmp(r117:SI,0)
       REG_DEAD r117:SI
    25: pc={(r133:CC==0)?L34:pc}
       REG_DEAD r133:CC
       REG_BR_PROB 966367644
    26: NOTE_INSN_BASIC_BLOCK 5
-   61: r134:DI=r139:DI
-   62: r135:DI=r140:DI
-   63: r136:DI=r141:DI
-   30: %5:DI=r139:DI
+   27: r134:DI=unspec[`*.LC0',%2:DI] 47
+      REG_EQUAL `*.LC0'
+   28: r135:DI=unspec[`*.LC1',%2:DI] 47
+      REG_EQUAL `*.LC1'
+   29: r136:DI=sign_extend(r118:SI)
+   30: %5:DI=r134:DI
       REG_DEAD r134:DI
       REG_EQUAL `*.LC0'
-   31: %4:DI=r140:DI
+   31: %4:DI=r135:DI
       REG_DEAD r135:DI
       REG_EQUAL `*.LC1'
-   32: %3:DI=r141:DI
+   32: %3:DI=r136:DI
       REG_DEAD r136:DI
    33: call [`bar'] argc:0
       REG_DEAD %5:DI
       REG_DEAD %4:DI
       REG_DEAD %3:DI
       REG_CALL_DECL `bar'
    34: L34:
    35: NOTE_INSN_BASIC_BLOCK 6

>>
>> I wonder why the cost model chose to move any invariatns to preheader
>> why preheader->count > bb->count?
> 
> Thinking about this more, you want to test:
>   if (!always_executed && preheader->count > bb->count)
>     return;
> This will rule out some profile misupates.
> 
> Also the code updating always_reached is worng:
>       if (always_reached
>           && CALL_P (insn)
>           && (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
>               || ! RTL_CONST_OR_PURE_CALL_P (insn)))
>   always_reached = false;
> It misses the case where statement can trhow external exception or
> volatile asms that can also terminate execution.
> 
> I bit worry on how often the test will have false positives with guessed
> profile where earlier loop optimizations reduced trip count below
> realistic estimate.

Sorry I don't understand why always_executed and always_reached matters here?
the test is based on BB before the FOR_BB_INSNS loop...

> 
> Honza
>>
>> Honza
>>>  
>>>    FOR_BB_INSNS (bb, insn)
>>>      {
>>> @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block *body,
>>>    unsigned i;
>>>  
>>>    for (i = 0; i < loop->num_nodes; i++)
>>> -    find_invariants_bb (body[i],
>>> -			bitmap_bit_p (always_reached, i),
>>> +    find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
>>>  			bitmap_bit_p (always_executed, i));
>>>  }
>>>  
>>> -- 
>>> 2.25.1
>>>

-- 
Thanks,
Xionghu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] Fix incorrect loop exit edge probability [PR103270]
  2021-12-13  9:25   ` Jan Hubicka
@ 2021-12-14  9:27     ` Xionghu Luo
  2021-12-15  6:40       ` Xionghu Luo
  0 siblings, 1 reply; 22+ messages in thread
From: Xionghu Luo @ 2021-12-14  9:27 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, segher, wschmidt, linkw, dje.gcc



On 2021/12/13 17:25, Jan Hubicka wrote:
>> r12-4526 cancelled jump thread path rotates loop. It exposes a issue in
>> profile-estimate when predict_extra_loop_exits, outer loop's exit edge
>> is marked as inner loop's extra loop exit and set with incorrect
>> prediction, then a hot inner loop will become cold loop finally through
>> optimizations, this patch add loop check when searching extra exit edges
>> to avoid unexpected predict_edge from predict_paths_for_bb.
>>
>> Regression tested on P8LE, OK for master?
>>
>> gcc/ChangeLog:
>>
>> 	PR middle-end/103270
>> 	* predict.c (predict_extra_loop_exits): Add loop parameter.
>> 	(predict_loops): Call with loop argument.
> 
> With changes to branch predictors it is useful to re-test their
> effectivity on spec and see if their hitrates are still mathcing
> reality.  You can do it by buiding spec with -fprofile-generate, train
> it and then build with -fprofile-use -fdump-tree-ipa-profile-details
> and use contrib/analyze_brprob.py that will collect info on how they
> work.
> 
> This patch looks good to me, but it would be nice to have things reality
> checked (and since we did not do the stats for some time, there may be
> surprises) so if you could run the specs and post results of
> analyze_brprob, it would be great.  I will also try to get to that soon,
> but currently I am bit swamped by other problems I noticed on clang
> builds.
> 
> Thanks a lot for working on profile fixes - I am trying now to get
> things into shape.  With Martin we added basic testing infrastructure
> for keeping track of profile updates and I am trying to see how it works
> in practice now.  Hopefully it will make it easier to judge on profile
> updating patches. I would welcome list of patches I should look at.
> 
> I will write separate mail on this.
> Honza


With the patch, the analyze_brprob.py outputs below data with PGO build,
there is no verification code in the script, so how to check whether it
is correct?  Run it again without the patch and compare "extra loop exit"
field?


./contrib/analyze_brprob.py ~/workspace/tests/spec2017/dump_file_all
HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
__builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688244    3.27G   0.6%                     53%:2
recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303799    3.78G   0.7%                     52%:3
loop guard                                   1177   1.6%       56.33%   42.54% /  80.32%     7373601457    7.37G   1.4%                     50%:2
opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   6.0%                     21%:2
loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.6%                     18%:1
loop iterations                              4761   6.3%       99.98%   84.27% /  84.27%    73463634555   73.46G  13.9%
pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.8%                     34%:1
opcode values nonequal (on trees)           12237  16.3%       70.70%   70.86% /  83.54%    36638772333   36.64G   6.9%
guessed loop iterations                     16760  22.3%       99.78%   91.49% /  91.49%   162952747918  162.95G  30.9%

HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
no prediction                               12730  16.9%       39.29%   33.32% /  79.93%   121106031835  121.11G  23.0%
first match                                 25261  33.6%       92.17%   88.33% /  88.98%   296652487962  296.65G  56.3%
DS theory                                   28333  37.7%       63.03%   72.05% /  85.00%   109563734005  109.56G  20.8%
combined                                    75232 100.0%       73.17%   72.32% /  86.08%   527351738575  527.35G 100.0%

Loop count: 37870
  avg. # of iter: 8444.77
  median # of iter: 7.00
  avg. (1% cutoff) # of iter: 174.68
  avg. (5% cutoff) # of iter: 55.14
  avg. (10% cutoff) # of iter: 35.21
  avg. (20% cutoff) # of iter: 26.23
  avg. (30% cutoff) # of iter: 21.70


>>
>> gcc/testsuite/ChangeLog:
>>
>> 	PR middle-end/103270
>> 	* gcc.dg/pr103270.c: New test.
>> ---
>>  gcc/predict.c                   | 10 ++++++----
>>  gcc/testsuite/gcc.dg/pr103270.c | 19 +++++++++++++++++++
>>  2 files changed, 25 insertions(+), 4 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/pr103270.c
>>
>> diff --git a/gcc/predict.c b/gcc/predict.c
>> index 3cb4e3c0eb5..5b6e0cf722b 100644
>> --- a/gcc/predict.c
>> +++ b/gcc/predict.c
>> @@ -1859,7 +1859,7 @@ predict_iv_comparison (class loop *loop, basic_block bb,
>>     exits to predict them using PRED_LOOP_EXTRA_EXIT.  */
>>  
>>  static void
>> -predict_extra_loop_exits (edge exit_edge)
>> +predict_extra_loop_exits (class loop *loop, edge exit_edge)
>>  {
>>    unsigned i;
>>    bool check_value_one;
>> @@ -1912,12 +1912,14 @@ predict_extra_loop_exits (edge exit_edge)
>>  	continue;
>>        if (EDGE_COUNT (e->src->succs) != 1)
>>  	{
>> -	  predict_paths_leading_to_edge (e, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN);
>> +	  predict_paths_leading_to_edge (e, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN,
>> +					 loop);
>>  	  continue;
>>  	}
>>  
>>        FOR_EACH_EDGE (e1, ei, e->src->preds)
>> -	predict_paths_leading_to_edge (e1, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN);
>> +	predict_paths_leading_to_edge (e1, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN,
>> +				       loop);
>>      }
>>  }
>>  
>> @@ -2008,7 +2010,7 @@ predict_loops (void)
>>  			 ex->src->index, ex->dest->index);
>>  	      continue;
>>  	    }
>> -	  predict_extra_loop_exits (ex);
>> +	  predict_extra_loop_exits (loop, ex);
>>  
>>  	  if (number_of_iterations_exit (loop, ex, &niter_desc, false, false))
>>  	    niter = niter_desc.niter;
>> diff --git a/gcc/testsuite/gcc.dg/pr103270.c b/gcc/testsuite/gcc.dg/pr103270.c
>> new file mode 100644
>> index 00000000000..819310e360e
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr103270.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-profile_estimate" } */
>> +
>> +void test(int a, int* i)
>> +{
>> +  for (; a < 5; ++a)
>> +    {
>> +      int b = 0;
>> +      int c = 0;
>> +      for (; b != -11; b--)
>> +	for (int d = 0; d ==0; d++)
>> +	  {
>> +	    *i += c & a;
>> +	    c = b;
>> +	  }
>> +    }
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-not "extra loop exit heuristics of edge\[^:\]*:" "profile_estimate"} } */
>> -- 
>> 2.25.1
>>

-- 
Thanks,
Xionghu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] Fix incorrect loop exit edge probability [PR103270]
  2021-12-14  9:27     ` Xionghu Luo
@ 2021-12-15  6:40       ` Xionghu Luo
  2021-12-16 11:18         ` Jan Hubicka
  0 siblings, 1 reply; 22+ messages in thread
From: Xionghu Luo @ 2021-12-15  6:40 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: wschmidt, dje.gcc, gcc-patches, linkw, segher



On 2021/12/14 17:27, Xionghu Luo via Gcc-patches wrote:
> 
> 
> On 2021/12/13 17:25, Jan Hubicka wrote:
>>> r12-4526 cancelled jump thread path rotates loop. It exposes a issue in
>>> profile-estimate when predict_extra_loop_exits, outer loop's exit edge
>>> is marked as inner loop's extra loop exit and set with incorrect
>>> prediction, then a hot inner loop will become cold loop finally through
>>> optimizations, this patch add loop check when searching extra exit edges
>>> to avoid unexpected predict_edge from predict_paths_for_bb.
>>>
>>> Regression tested on P8LE, OK for master?
>>>
>>> gcc/ChangeLog:
>>>
>>> 	PR middle-end/103270
>>> 	* predict.c (predict_extra_loop_exits): Add loop parameter.
>>> 	(predict_loops): Call with loop argument.
>>
>> With changes to branch predictors it is useful to re-test their
>> effectivity on spec and see if their hitrates are still mathcing
>> reality.  You can do it by buiding spec with -fprofile-generate, train
>> it and then build with -fprofile-use -fdump-tree-ipa-profile-details
>> and use contrib/analyze_brprob.py that will collect info on how they
>> work.
>>
>> This patch looks good to me, but it would be nice to have things reality
>> checked (and since we did not do the stats for some time, there may be
>> surprises) so if you could run the specs and post results of
>> analyze_brprob, it would be great.  I will also try to get to that soon,
>> but currently I am bit swamped by other problems I noticed on clang
>> builds.
>>
>> Thanks a lot for working on profile fixes - I am trying now to get
>> things into shape.  With Martin we added basic testing infrastructure
>> for keeping track of profile updates and I am trying to see how it works
>> in practice now.  Hopefully it will make it easier to judge on profile
>> updating patches. I would welcome list of patches I should look at.
>>
>> I will write separate mail on this.
>> Honza
> 
> 
> With the patch, the analyze_brprob.py outputs below data with PGO build,
> there is no verification code in the script, so how to check whether it
> is correct?  Run it again without the patch and compare "extra loop exit"
> field?
> 
> 
> ./contrib/analyze_brprob.py ~/workspace/tests/spec2017/dump_file_all
> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
> Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
> loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
> __builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
> loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
> extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
> guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
> negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
> loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
> const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
> indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
> polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688244    3.27G   0.6%                     53%:2
> recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
> goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
> null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
> continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303799    3.78G   0.7%                     52%:3
> loop guard                                   1177   1.6%       56.33%   42.54% /  80.32%     7373601457    7.37G   1.4%                     50%:2
> opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   6.0%                     21%:2
> loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.6%                     18%:1
> loop iterations                              4761   6.3%       99.98%   84.27% /  84.27%    73463634555   73.46G  13.9%
> pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
> call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.8%                     34%:1
> opcode values nonequal (on trees)           12237  16.3%       70.70%   70.86% /  83.54%    36638772333   36.64G   6.9%
> guessed loop iterations                     16760  22.3%       99.78%   91.49% /  91.49%   162952747918  162.95G  30.9%
> 
> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> no prediction                               12730  16.9%       39.29%   33.32% /  79.93%   121106031835  121.11G  23.0%
> first match                                 25261  33.6%       92.17%   88.33% /  88.98%   296652487962  296.65G  56.3%
> DS theory                                   28333  37.7%       63.03%   72.05% /  85.00%   109563734005  109.56G  20.8%
> combined                                    75232 100.0%       73.17%   72.32% /  86.08%   527351738575  527.35G 100.0%
> 
> Loop count: 37870
>   avg. # of iter: 8444.77
>   median # of iter: 7.00
>   avg. (1% cutoff) # of iter: 174.68
>   avg. (5% cutoff) # of iter: 55.14
>   avg. (10% cutoff) # of iter: 35.21
>   avg. (20% cutoff) # of iter: 26.23
>   avg. (30% cutoff) # of iter: 21.70

This is the output data collected without the patch, as can be seen, no difference on "extra loop exit".
But this issue should be fixed.


./contrib/analyze_brprob_spec.py ~/workspace/tests/spec2017/

benchspec
HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
__builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688238    3.27G   0.6%                     53%:2
recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303795    3.78G   0.7%                     52%:3
loop guard                                   1178   1.6%       56.37%   42.54% /  80.32%     7373601533    7.37G   1.4%                     50%:2
opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   5.9%                     21%:2
loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.4%                     18%:1
loop iterations                              4772   6.3%       99.98%   84.27% /  84.27%    74045982111   74.05G  13.8%
pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.7%                     34%:1
opcode values nonequal (on trees)           12240  16.2%       70.71%   70.86% /  83.54%    36638772682   36.64G   6.9%
guessed loop iterations                     16854  22.4%       99.78%   91.21% /  91.22%   169765264401  169.77G  31.7%

HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
no prediction                               12731  16.9%       39.30%   33.32% /  79.93%   121106031963  121.11G  22.6%
first match                                 25366  33.7%       92.20%   88.24% /  88.88%   304047352001  304.05G  56.9%
DS theory                                   28337  37.6%       63.03%   72.05% /  85.00%   109563734430  109.56G  20.5%
combined                                    75342 100.0%       73.21%   72.49% /  86.06%   534746603167  534.75G 100.0%

Loop count: 38058
  avg. # of iter: 8403.32
  median # of iter: 7.00
  avg. (1% cutoff) # of iter: 173.72
  avg. (5% cutoff) # of iter: 54.90
  avg. (10% cutoff) # of iter: 35.20
  avg. (20% cutoff) # of iter: 26.35
  avg. (30% cutoff) # of iter: 21.87


-- 
Thanks,
Xionghu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] Fix incorrect loop exit edge probability [PR103270]
  2021-12-15  6:40       ` Xionghu Luo
@ 2021-12-16 11:18         ` Jan Hubicka
  2021-12-21  3:56           ` Xionghu Luo
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Hubicka @ 2021-12-16 11:18 UTC (permalink / raw)
  To: Xionghu Luo; +Cc: wschmidt, dje.gcc, gcc-patches, linkw, segher

> > 
> > 
> > ./contrib/analyze_brprob.py ~/workspace/tests/spec2017/dump_file_all
> > HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> > noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
> > Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
> > loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
> > __builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
> > loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
> > extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
> > guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
> > negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
> > loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
> > const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
> > indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
> > polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688244    3.27G   0.6%                     53%:2
> > recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
> > goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
> > null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
> > continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303799    3.78G   0.7%                     52%:3
> > loop guard                                   1177   1.6%       56.33%   42.54% /  80.32%     7373601457    7.37G   1.4%                     50%:2
> > opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   6.0%                     21%:2
> > loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.6%                     18%:1
> > loop iterations                              4761   6.3%       99.98%   84.27% /  84.27%    73463634555   73.46G  13.9%
> > pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
> > call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.8%                     34%:1
> > opcode values nonequal (on trees)           12237  16.3%       70.70%   70.86% /  83.54%    36638772333   36.64G   6.9%
> > guessed loop iterations                     16760  22.3%       99.78%   91.49% /  91.49%   162952747918  162.95G  30.9%
> > 
> > HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> > no prediction                               12730  16.9%       39.29%   33.32% /  79.93%   121106031835  121.11G  23.0%
> > first match                                 25261  33.6%       92.17%   88.33% /  88.98%   296652487962  296.65G  56.3%
> > DS theory                                   28333  37.7%       63.03%   72.05% /  85.00%   109563734005  109.56G  20.8%
> > combined                                    75232 100.0%       73.17%   72.32% /  86.08%   527351738575  527.35G 100.0%
> > 
> > Loop count: 37870
> >   avg. # of iter: 8444.77
> >   median # of iter: 7.00
> >   avg. (1% cutoff) # of iter: 174.68
> >   avg. (5% cutoff) # of iter: 55.14
> >   avg. (10% cutoff) # of iter: 35.21
> >   avg. (20% cutoff) # of iter: 26.23
> >   avg. (30% cutoff) # of iter: 21.70
> 
> This is the output data collected without the patch, as can be seen, no difference on "extra loop exit".
> But this issue should be fixed.
> 
> 
> ./contrib/analyze_brprob_spec.py ~/workspace/tests/spec2017/
> 
> benchspec
> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
> Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
> loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
> __builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
> loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
> extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
> guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
> negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
> loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
> const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
> indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
> polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688238    3.27G   0.6%                     53%:2
> recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
> goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
> null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
> continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303795    3.78G   0.7%                     52%:3
> loop guard                                   1178   1.6%       56.37%   42.54% /  80.32%     7373601533    7.37G   1.4%                     50%:2
> opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   5.9%                     21%:2
> loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.4%                     18%:1
> loop iterations                              4772   6.3%       99.98%   84.27% /  84.27%    74045982111   74.05G  13.8%
> pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
> call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.7%                     34%:1
> opcode values nonequal (on trees)           12240  16.2%       70.71%   70.86% /  83.54%    36638772682   36.64G   6.9%
> guessed loop iterations                     16854  22.4%       99.78%   91.21% /  91.22%   169765264401  169.77G  31.7%
> 
> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> no prediction                               12731  16.9%       39.30%   33.32% /  79.93%   121106031963  121.11G  22.6%
> first match                                 25366  33.7%       92.20%   88.24% /  88.88%   304047352001  304.05G  56.9%
> DS theory                                   28337  37.6%       63.03%   72.05% /  85.00%   109563734430  109.56G  20.5%
> combined                                    75342 100.0%       73.21%   72.49% /  86.06%   534746603167  534.75G 100.0%

Thank you.  So it seems that the problem does not trigger in Spec but I
was also wondering if our current predict.def values are anywhere near
to reality.

THe table reads as follows:  
 - BRANCHES is number of branches the heuristics hit on (so extra loop
   exit has 80 and therefore we do not have that good statistics on it)
 - HITRATE is the probability that the prediction goes given direction
   during the train run.
   after / is the value which would be reached by perfect predictor
   (which predict branch to the direction that dominates during train)
   Extra loop exit is 81% out of 89% so it is pretty close to optimum
 - COVERAGE is how many times the predicted branch was executed

In general the idea is that for most heuristics (wihch can not determine
exact value like loop iteraitons) HITRATE values can be put to
predict.def so the Dempster-Shafer formula (DS theory) combines the
hypothesis sort of realistically (it assumes that all the predictors are
staistically independent which they are not).

We have HITRATE 67 for extra loop exit which is bit off what we do have
in the measured data, but I think our predict.def is still based on
spec2006 numbers.

So the patch is OK.  Perhaps we could experiment with updating
predict.def (It does develop even when run across same benchmark suite
since early optimizations change - this stage1 I think the threading
work definitly affects the situation substantially)

Honza
> 
> Loop count: 38058
>   avg. # of iter: 8403.32
>   median # of iter: 7.00
>   avg. (1% cutoff) # of iter: 173.72
>   avg. (5% cutoff) # of iter: 54.90
>   avg. (10% cutoff) # of iter: 35.20
>   avg. (20% cutoff) # of iter: 26.35
>   avg. (30% cutoff) # of iter: 21.87
> 
> 
> -- 
> Thanks,
> Xionghu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL
  2021-12-14  9:21       ` Xionghu Luo
@ 2021-12-16 11:20         ` Jan Hubicka
  2021-12-17  1:30           ` Xionghu Luo
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Hubicka @ 2021-12-16 11:20 UTC (permalink / raw)
  To: Xionghu Luo; +Cc: gcc-patches, segher, wschmidt, linkw, dje.gcc

>  
> OK. Comments like?
> 
> /* Don't move insn of cold BB out of loop to preheader to reduce calculations
>    and register live range in hot loop with cold BB.  */

Looks good.
> 
> 
> And maybe some dump log will help tracking in xxx.c.271r.loop2_invariant.
> 
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1190,7 +1190,12 @@ find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
>    basic_block preheader = loop_preheader_edge (loop)->src;
> 
>    if (preheader->count > bb->count)
> -    return;
> +    {
> +      if (dump_file)
> +       fprintf (dump_file, "Don't move invariant from bb: %d in loop %d\n",
> +                bb->index, loop->num);
> +      return;
> +    }

This is also a good idea - testcases are quite important for this typo
of stuff.
> > 
> > Thinking about this more, you want to test:
> >   if (!always_executed && preheader->count > bb->count)
> >     return;
> > This will rule out some profile misupates.
> > 
> > Also the code updating always_reached is worng:
> >       if (always_reached
> >           && CALL_P (insn)
> >           && (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
> >               || ! RTL_CONST_OR_PURE_CALL_P (insn)))
> >   always_reached = false;
> > It misses the case where statement can trhow external exception or
> > volatile asms that can also terminate execution.
> > 
> > I bit worry on how often the test will have false positives with guessed
> > profile where earlier loop optimizations reduced trip count below
> > realistic estimate.
> 
> Sorry I don't understand why always_executed and always_reached matters here?
> the test is based on BB before the FOR_BB_INSNS loop...

always_executed is useful here to avoid the situation where bb->count or
preheader->count is wrong and it would disable wanted transformation.

always_executed is just a bug I noticed while looking at the code.  I
will produce testcase for bugzilla.

Honza

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL
  2021-12-16 11:20         ` Jan Hubicka
@ 2021-12-17  1:30           ` Xionghu Luo
  2021-12-29  1:43             ` Xionghu Luo
  0 siblings, 1 reply; 22+ messages in thread
From: Xionghu Luo @ 2021-12-17  1:30 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, segher, wschmidt, linkw, dje.gcc



On 2021/12/16 19:20, Jan Hubicka wrote:
>>  
>> OK. Comments like?
>>
>> /* Don't move insn of cold BB out of loop to preheader to reduce calculations
>>    and register live range in hot loop with cold BB.  */
> 
> Looks good.
>>
>>
>> And maybe some dump log will help tracking in xxx.c.271r.loop2_invariant.
>>
>> --- a/gcc/loop-invariant.c
>> +++ b/gcc/loop-invariant.c
>> @@ -1190,7 +1190,12 @@ find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
>>    basic_block preheader = loop_preheader_edge (loop)->src;
>>
>>    if (preheader->count > bb->count)
>> -    return;
>> +    {
>> +      if (dump_file)
>> +       fprintf (dump_file, "Don't move invariant from bb: %d in loop %d\n",
>> +                bb->index, loop->num);
>> +      return;
>> +    }
> 
> This is also a good idea - testcases are quite important for this typo
> of stuff.
>>>
>>> Thinking about this more, you want to test:
>>>   if (!always_executed && preheader->count > bb->count)
>>>     return;
>>> This will rule out some profile misupates.
>>>
>>> Also the code updating always_reached is worng:
>>>       if (always_reached
>>>           && CALL_P (insn)
>>>           && (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
>>>               || ! RTL_CONST_OR_PURE_CALL_P (insn)))
>>>   always_reached = false;
>>> It misses the case where statement can trhow external exception or
>>> volatile asms that can also terminate execution.
>>>
>>> I bit worry on how often the test will have false positives with guessed
>>> profile where earlier loop optimizations reduced trip count below
>>> realistic estimate.
>>
>> Sorry I don't understand why always_executed and always_reached matters here?
>> the test is based on BB before the FOR_BB_INSNS loop...
> 
> always_executed is useful here to avoid the situation where bb->count or
> preheader->count is wrong and it would disable wanted transformation.
> 
> always_executed is just a bug I noticed while looking at the code.  I
> will produce testcase for bugzilla.
> 
> Honza


Thanks, so is this OK to commit now?  And any additional comments for
the "[PATCH 3/3] Fix loop split incorrect count and probability"
(https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586374.html)?
 

Updated comments and testcase:

[PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL

From: Xiong Hu Luo <luoxhu@linux.ibm.com>

gcc/ChangeLog:

	* loop-invariant.c (find_invariants_bb): Check profile count
	before motion.
	(find_invariants_body): Add argument.

gcc/testsuite/ChangeLog:

	* gcc.dg/loop-invariant-2.c: New.
---
 gcc/loop-invariant.c                    | 17 ++++++++++++++---
 gcc/testsuite/gcc.dg/loop-invariant-2.c | 20 ++++++++++++++++++++
 2 files changed, 34 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/loop-invariant-2.c

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index fca0c2b24be..690f7704a0b 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1183,9 +1183,21 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed)
    call.  */
 
 static void
-find_invariants_bb (basic_block bb, bool always_reached, bool always_executed)
+find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
+		    bool always_executed)
 {
   rtx_insn *insn;
+  basic_block preheader = loop_preheader_edge (loop)->src;
+
+  /* Don't move insn of cold BB out of loop to preheader to reduce calculations
+     and register live range in hot loop with cold BB.  */
+  if (!always_executed && preheader->count > bb->count)
+    {
+      if (dump_file)
+	fprintf (dump_file, "Don't move invariant from bb: %d out of loop %d\n",
+		 bb->index, loop->num);
+      return;
+    }
 
   FOR_BB_INSNS (bb, insn)
     {
@@ -1214,8 +1226,7 @@ find_invariants_body (class loop *loop, basic_block *body,
   unsigned i;
 
   for (i = 0; i < loop->num_nodes; i++)
-    find_invariants_bb (body[i],
-			bitmap_bit_p (always_reached, i),
+    find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
 			bitmap_bit_p (always_executed, i));
 }
 
diff --git a/gcc/testsuite/gcc.dg/loop-invariant-2.c b/gcc/testsuite/gcc.dg/loop-invariant-2.c
new file mode 100644
index 00000000000..df3d8458569
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/loop-invariant-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-loop2_invariant" } */
+
+volatile int x;
+void
+bar (int, char *, char *);
+void
+foo (int *a, int n, int k)
+{
+  int i;
+
+  for (i = 0; i < n; i++)
+    {
+      if (__builtin_expect (x, 0))
+	bar (k / 5, "one", "two");
+      a[i] = k;
+    }
+}
+
+/* { dg-final { scan-rtl-dump "Don't move invariant from bb: .*out of loop" "loop2_invariant" } } */
-- 
2.27.0.90.geebb51ba8c



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] Fix incorrect loop exit edge probability [PR103270]
  2021-12-16 11:18         ` Jan Hubicka
@ 2021-12-21  3:56           ` Xionghu Luo
  0 siblings, 0 replies; 22+ messages in thread
From: Xionghu Luo @ 2021-12-21  3:56 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: wschmidt, dje.gcc, gcc-patches, linkw, segher



On 2021/12/16 19:18, Jan Hubicka wrote:
>>>
>>>
>>> ./contrib/analyze_brprob.py ~/workspace/tests/spec2017/dump_file_all
>>> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
>>> noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
>>> Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
>>> loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
>>> __builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
>>> loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
>>> extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
>>> guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
>>> negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
>>> loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
>>> const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
>>> indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
>>> polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688244    3.27G   0.6%                     53%:2
>>> recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
>>> goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
>>> null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
>>> continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303799    3.78G   0.7%                     52%:3
>>> loop guard                                   1177   1.6%       56.33%   42.54% /  80.32%     7373601457    7.37G   1.4%                     50%:2
>>> opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   6.0%                     21%:2
>>> loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.6%                     18%:1
>>> loop iterations                              4761   6.3%       99.98%   84.27% /  84.27%    73463634555   73.46G  13.9%
>>> pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
>>> call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.8%                     34%:1
>>> opcode values nonequal (on trees)           12237  16.3%       70.70%   70.86% /  83.54%    36638772333   36.64G   6.9%
>>> guessed loop iterations                     16760  22.3%       99.78%   91.49% /  91.49%   162952747918  162.95G  30.9%
>>>
>>> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
>>> no prediction                               12730  16.9%       39.29%   33.32% /  79.93%   121106031835  121.11G  23.0%
>>> first match                                 25261  33.6%       92.17%   88.33% /  88.98%   296652487962  296.65G  56.3%
>>> DS theory                                   28333  37.7%       63.03%   72.05% /  85.00%   109563734005  109.56G  20.8%
>>> combined                                    75232 100.0%       73.17%   72.32% /  86.08%   527351738575  527.35G 100.0%
>>>
>>> Loop count: 37870
>>>   avg. # of iter: 8444.77
>>>   median # of iter: 7.00
>>>   avg. (1% cutoff) # of iter: 174.68
>>>   avg. (5% cutoff) # of iter: 55.14
>>>   avg. (10% cutoff) # of iter: 35.21
>>>   avg. (20% cutoff) # of iter: 26.23
>>>   avg. (30% cutoff) # of iter: 21.70
>>
>> This is the output data collected without the patch, as can be seen, no difference on "extra loop exit".
>> But this issue should be fixed.
>>
>>
>> ./contrib/analyze_brprob_spec.py ~/workspace/tests/spec2017/
>>
>> benchspec
>> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
>> noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
>> Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
>> loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
>> __builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
>> loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
>> extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
>> guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
>> negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
>> loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
>> const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
>> indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
>> polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688238    3.27G   0.6%                     53%:2
>> recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
>> goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
>> null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
>> continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303795    3.78G   0.7%                     52%:3
>> loop guard                                   1178   1.6%       56.37%   42.54% /  80.32%     7373601533    7.37G   1.4%                     50%:2
>> opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   5.9%                     21%:2
>> loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.4%                     18%:1
>> loop iterations                              4772   6.3%       99.98%   84.27% /  84.27%    74045982111   74.05G  13.8%
>> pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
>> call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.7%                     34%:1
>> opcode values nonequal (on trees)           12240  16.2%       70.71%   70.86% /  83.54%    36638772682   36.64G   6.9%
>> guessed loop iterations                     16854  22.4%       99.78%   91.21% /  91.22%   169765264401  169.77G  31.7%
>>
>> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
>> no prediction                               12731  16.9%       39.30%   33.32% /  79.93%   121106031963  121.11G  22.6%
>> first match                                 25366  33.7%       92.20%   88.24% /  88.88%   304047352001  304.05G  56.9%
>> DS theory                                   28337  37.6%       63.03%   72.05% /  85.00%   109563734430  109.56G  20.5%
>> combined                                    75342 100.0%       73.21%   72.49% /  86.06%   534746603167  534.75G 100.0%
> 
> Thank you.  So it seems that the problem does not trigger in Spec but I
> was also wondering if our current predict.def values are anywhere near
> to reality.
> 
> THe table reads as follows:  
>  - BRANCHES is number of branches the heuristics hit on (so extra loop
>    exit has 80 and therefore we do not have that good statistics on it)
>  - HITRATE is the probability that the prediction goes given direction
>    during the train run.
>    after / is the value which would be reached by perfect predictor
>    (which predict branch to the direction that dominates during train)
>    Extra loop exit is 81% out of 89% so it is pretty close to optimum
>  - COVERAGE is how many times the predicted branch was executed
> 
> In general the idea is that for most heuristics (wihch can not determine
> exact value like loop iteraitons) HITRATE values can be put to
> predict.def so the Dempster-Shafer formula (DS theory) combines the
> hypothesis sort of realistically (it assumes that all the predictors are
> staistically independent which they are not).
> 
> We have HITRATE 67 for extra loop exit which is bit off what we do have
> in the measured data, but I think our predict.def is still based on
> spec2006 numbers.
> 
> So the patch is OK.  Perhaps we could experiment with updating
> predict.def (It does develop even when run across same benchmark suite
> since early optimizations change - this stage1 I think the threading
> work definitly affects the situation substantially)

Thanks, committed to r12-6085.

> 
> Honza
>>
>> Loop count: 38058
>>   avg. # of iter: 8403.32
>>   median # of iter: 7.00
>>   avg. (1% cutoff) # of iter: 173.72
>>   avg. (5% cutoff) # of iter: 54.90
>>   avg. (10% cutoff) # of iter: 35.20
>>   avg. (20% cutoff) # of iter: 26.35
>>   avg. (30% cutoff) # of iter: 21.87
>>
>>
>> -- 
>> Thanks,
>> Xionghu

-- 
Thanks,
Xionghu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/3] Fix loop split incorrect count and probability
  2021-12-13  8:57     ` Xionghu Luo
@ 2021-12-21  3:57       ` Xionghu Luo
  0 siblings, 0 replies; 22+ messages in thread
From: Xionghu Luo @ 2021-12-21  3:57 UTC (permalink / raw)
  To: Jeff Law, gcc-patches; +Cc: hubicka, dje.gcc, wschmidt, linkw, segher



On 2021/12/13 16:57, Xionghu Luo via Gcc-patches wrote:
> 
> 
> On 2021/12/9 07:47, Jeff Law wrote:
>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>>> index 3f6ad046623..33128061aab 100644
>>> --- a/gcc/tree-ssa-loop-split.c
>>> +++ b/gcc/tree-ssa-loop-split.c
>>>
>>> @@ -607,6 +610,38 @@ split_loop (class loop *loop1)
>>>       tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge
>>> (loop1));
>>>       patch_loop_exit (loop1, guard_stmt, guard_next, newend,
>>> initial_true);
>>>   +    update_ssa (TODO_update_ssa);
>>> +
>>> +    /* Proportion first loop's bb counts except those dominated by true
>>> +       branch to avoid drop 1s down.  */
>>> +    basic_block *bbs1, *bbs2;
>>> +    bbs1 = get_loop_body (loop1);
>>> +    unsigned j;
>>> +    for (j = 0; j < loop1->num_nodes; j++)
>>> +      if (bbs1[j] == loop1->latch
>>> +          || !dominated_by_p (CDI_DOMINATORS, bbs1[j], true_edge->dest))
>>> +        bbs1[j]->count
>>> +          = bbs1[j]->count.apply_probability (true_edge->probability);
>>> +    free (bbs1);
>> It looks like there's two copies of this code in this patch, one in
>> split_loop and the other in do_split_loop_on_cond.  Would it make sense
>> to factor it out into its own little function?
>>
>>
>>> +
>>> +    /* Proportion second loop's bb counts except those dominated by
>>> false
>>> +       branch to avoid drop 1s down.  */
>>> +    basic_block bbi_copy = get_bb_copy (false_edge->dest);
>>> +    bbs2 = get_loop_body (loop2);
>>> +    for (j = 0; j < loop2->num_nodes; j++)
>>> +      if (bbs2[j] == loop2->latch
>>> +          || !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
>>> +        bbs2[j]->count = bbs2[j]->count.apply_probability (
>>> +          true_edge->probability.invert ());
>>> +    free (bbs2);
>> Similarly for this block of code.
>>
>> If those can be reasonably factored out into two helper functions to be
>> called from split_loop and do_split_loop_on_cond, then this is OK with
>> the refactoring.
>>
>> jeff
> 
> 
> Thanks for the comments, updated as below.  Will commit this patchset and the
> approved patch for LIM if there are no objections:

This patch is committed to r12-6086.

> 
> 
> [PATCH v2 3/3] Fix loop split incorrect count and probability
> 
> In tree-ssa-loop-split.c, split_loop and split_loop_on_cond does two
> kind of split. split_loop only works for single loop and insert edge at
> exit when split, while split_loop_on_cond is not limited to single loop
> and insert edge at latch when split.  Both split behavior should consider
> loop count and probability update.  For split_loop, loop split condition
> is moved in front of loop1 and loop2; But split_loop_on_cond moves the
> condition between loop1 and loop2, this patch does:
>  1) profile count proportion for both original loop and copied loop
> without dropping down the true branch's count;
>  2) probability update in the two loops and between the two loops.
> 
> Regression tested pass, OK for master?
> 
> Changes diff for split_loop and split_loop_on_cond cases:
> 
> 1) diff base/loop-split.c.151t.lsplit patched/loop-split.c.152t.lsplit
> ...
>    <bb 2> [local count: 118111600]:
>    _1 = end_6(D) - beg_7(D);
>    j_9 = _1 + beg2_8(D);
>    if (end_6(D) > beg_7(D))
>      goto <bb 14>; [89.00%]
>    else
>      goto <bb 6>; [11.00%]
> 
>    <bb 14> [local count: 105119324]:
>    if (j_9 >= c_11(D))
> -    goto <bb 15>; [100.00%]
> +    goto <bb 15>; [33.00%]
>    else
> -    goto <bb 16>; [100.00%]
> +    goto <bb 16>; [67.00%]
> 
> -  <bb 15> [local count: 105119324]:
> +  <bb 15> [local count: 34689377]:
>    _27 = end_6(D) + -1;
>    _28 = beg_7(D) - end_6(D);
>    _29 = j_9 + _28;
>    _30 = _29 + 1;
>    _31 = MAX_EXPR <c_11(D), _30>;
> 
> -  <bb 3> [local count: 955630225]:
> +  <bb 3> [local count: 315357973]:
>    # i_18 = PHI <i_13(8), end_6(D)(15)>
>    # j_19 = PHI <j_14(8), j_9(15)>
>    printf ("a: %d %d\n", i_18, j_19);
>    i_13 = i_18 + -1;
>    j_14 = j_19 + -1;
>    if (j_14 >= _31)
> -    goto <bb 8>; [89.00%]
> +    goto <bb 8>; [29.37%]
>    else
> -    goto <bb 17>; [11.00%]
> +    goto <bb 17>; [70.63%]
> 
> -  <bb 8> [local count: 850510901]:
> +  <bb 8> [local count: 280668596]:
>    goto <bb 3>; [100.00%]
> 
> -  <bb 16> [local count: 105119324]:
> +  <bb 16> [local count: 70429947]:
>    # i_24 = PHI <end_6(D)(14), i_32(17)>
>    # j_25 = PHI <j_9(14), j_33(17)>
> 
>    <bb 10> [local count: 955630225]:
>    # i_3 = PHI <i_24(16), i_22(13)>
>    # j_2 = PHI <j_25(16), j_23(13)>
>    i_22 = i_3 + -1;
>    j_23 = j_2 + -1;
>    if (beg_7(D) < i_22)
>      goto <bb 13>; [89.00%]
>    else
>      goto <bb 9>; [11.00%]
> 
> -  <bb 13> [local count: 850510901]:
> +  <bb 13> [local count: 569842305]:
>    goto <bb 10>; [100.00%]
> 
>    <bb 17> [local count: 105119324]:
>    # i_32 = PHI <i_13(3)>
>    # j_33 = PHI <j_14(3)>
>    if (beg_7(D) < i_32)
>      goto <bb 16>; [80.00%]
>    else
>      goto <bb 9>; [20.00%]
> 
>    <bb 9> [local count: 105119324]:
> 
>    <bb 6> [local count: 118111600]:
>    return 0;
> 
>  }
> 
> 2) diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
> ...
>    <bb 2> [local count: 118111600]:
>    if (n_7(D) > 0)
>      goto <bb 4>; [89.00%]
>    else
>      goto <bb 3>; [11.00%]
> 
>    <bb 3> [local count: 118111600]:
>    return;
> 
>    <bb 4> [local count: 105119324]:
>    pretmp_3 = ga;
> 
> -  <bb 5> [local count: 955630225]:
> +  <bb 5> [local count: 315357973]:
>    # i_13 = PHI <i_10(20), 0(4)>
>    # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
>    if (prephitmp_12 != 0)
>      goto <bb 6>; [33.00%]
>    else
>      goto <bb 7>; [67.00%]
> 
>    <bb 6> [local count: 315357972]:
>    _2 = do_something ();
>    ga = _2;
> 
> -  <bb 7> [local count: 955630225]:
> +  <bb 7> [local count: 315357973]:
>    # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
>    i_10 = inc (i_13);
>    if (n_7(D) > i_10)
>      goto <bb 21>; [89.00%]
>    else
>      goto <bb 11>; [11.00%]
> 
>    <bb 11> [local count: 105119324]:
>    goto <bb 3>; [100.00%]
> 
> -  <bb 21> [local count: 850510901]:
> +  <bb 21> [local count: 280668596]:
>    if (prephitmp_12 != 0)
> -    goto <bb 20>; [100.00%]
> +    goto <bb 20>; [33.00%]
>    else
> -    goto <bb 19>; [INV]
> +    goto <bb 19>; [67.00%]
> 
> -  <bb 20> [local count: 850510901]:
> +  <bb 20> [local count: 280668596]:
>    goto <bb 5>; [100.00%]
> 
> -  <bb 19> [count: 0]:
> +  <bb 19> [local count: 70429947]:
>    # i_23 = PHI <i_10(21)>
>    # prephitmp_25 = PHI <prephitmp_5(21)>
> 
> -  <bb 12> [local count: 955630225]:
> +  <bb 12> [local count: 640272252]:
>    # i_15 = PHI <i_23(19), i_22(16)>
>    # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
>    i_22 = inc (i_15);
>    if (n_7(D) > i_22)
>      goto <bb 16>; [89.00%]
>    else
>      goto <bb 11>; [11.00%]
> 
> -  <bb 16> [local count: 850510901]:
> +  <bb 16> [local count: 569842305]:
>    goto <bb 12>; [100.00%]
>  }
> 
> gcc/ChangeLog:
> 
> 	* tree-ssa-loop-split.c (Fix_loop_bb_probability): New function.
> 	(split_loop): Fix incorrect profile_count and probability.
> 	(do_split_loop_on_cond): Likewise.
> ---
>  gcc/tree-ssa-loop-split.c | 72 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 64 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 3f6ad046623..55011aeed97 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -484,6 +484,39 @@ compute_new_first_bound (gimple_seq *stmts, class tree_niter_desc *niter,
>    return newend;
>  }
> 
> +/* Fix the two loop's bb count after split based on the split edge probability,
> +   don't adjust the bbs dominated by true branches of that loop to avoid
> +   dropping 1s down.  */
> +static void
> +fix_loop_bb_probability (class loop *loop1, class loop *loop2, edge true_edge,
> +			 edge false_edge)
> +{
> +  update_ssa (TODO_update_ssa);
> +
> +  /* Proportion first loop's bb counts except those dominated by true
> +     branch to avoid drop 1s down.  */
> +  basic_block *bbs1, *bbs2;
> +  bbs1 = get_loop_body (loop1);
> +  unsigned j;
> +  for (j = 0; j < loop1->num_nodes; j++)
> +    if (bbs1[j] == loop1->latch
> +	|| !dominated_by_p (CDI_DOMINATORS, bbs1[j], true_edge->dest))
> +      bbs1[j]->count
> +	= bbs1[j]->count.apply_probability (true_edge->probability);
> +  free (bbs1);
> +
> +  /* Proportion second loop's bb counts except those dominated by false
> +     branch to avoid drop 1s down.  */
> +  basic_block bbi_copy = get_bb_copy (false_edge->dest);
> +  bbs2 = get_loop_body (loop2);
> +  for (j = 0; j < loop2->num_nodes; j++)
> +    if (bbs2[j] == loop2->latch
> +	|| !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
> +      bbs2[j]->count
> +	= bbs2[j]->count.apply_probability (true_edge->probability.invert ());
> +  free (bbs2);
> +}
> +
>  /* Checks if LOOP contains an conditional block whose condition
>     depends on which side in the iteration space it is, and if so
>     splits the iteration space into two loops.  Returns true if the
> @@ -575,7 +608,10 @@ split_loop (class loop *loop1)
>  					    stmts2);
>  	tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
>  	if (!initial_true)
> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> +
> +	edge true_edge, false_edge;
> +	extract_true_false_edges_from_block (bbs[i], &true_edge, &false_edge);
> 
>  	/* Now version the loop, placing loop2 after loop1 connecting
>  	   them, and fix up SSA form for that.  */
> @@ -583,11 +619,11 @@ split_loop (class loop *loop1)
>  	basic_block cond_bb;
> 
>  	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> -					   profile_probability::always (),
> -					   profile_probability::always (),
> -					   profile_probability::always (),
> -					   profile_probability::always (),
> -					   true);
> +					  true_edge->probability,
> +					  true_edge->probability.invert (),
> +					  profile_probability::always (),
> +					  profile_probability::always (),
> +					  true);
>  	gcc_assert (loop2);
> 
>  	edge new_e = connect_loops (loop1, loop2);
> @@ -607,6 +643,15 @@ split_loop (class loop *loop1)
>  	tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
>  	patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
> 
> +	fix_loop_bb_probability (loop1, loop2, true_edge, false_edge);
> +
> +	/* Fix first loop's exit probability after scaling.  */
> +	edge exit_to_latch1 = single_pred_edge (loop1->latch);
> +	exit_to_latch1->probability = exit_to_latch1->probability.apply_scale (
> +	  true_edge->probability.to_reg_br_prob_base (), REG_BR_PROB_BASE);
> +	single_exit (loop1)->probability
> +	  = exit_to_latch1->probability.invert ();
> +
>  	/* Finally patch out the two copies of the condition to be always
>  	   true/false (or opposite).  */
>  	gcond *force_true = as_a<gcond *> (last_stmt (bbs[i]));
> @@ -1486,8 +1531,8 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
>    initialize_original_copy_tables ();
> 
>    struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> -				     profile_probability::always (),
> -				     profile_probability::never (),
> +				     invar_branch->probability.invert (),
> +				     invar_branch->probability,
>  				     profile_probability::always (),
>  				     profile_probability::always (),
>  				     true);
> @@ -1535,6 +1580,17 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
>       between loop1 and loop2.  */
>    connect_loop_phis (loop1, loop2, to_loop2);
> 
> +  edge true_edge, false_edge, skip_edge1, skip_edge2;
> +  extract_true_false_edges_from_block (cond_bb, &true_edge, &false_edge);
> +
> +  skip_edge1 = true_invar ? false_edge : true_edge;
> +  skip_edge2 = true_invar ? true_edge : false_edge;
> +  fix_loop_bb_probability (loop1, loop2, skip_edge1, skip_edge2);
> +
> +  /* Fix first loop's exit probability after scaling.  */
> +  to_loop1->probability = invar_branch->probability.invert ();
> +  to_loop2->probability = invar_branch->probability;
> +
>    free_original_copy_tables ();
> 
>    return true;

-- 
Thanks,
Xionghu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL
  2021-12-17  1:30           ` Xionghu Luo
@ 2021-12-29  1:43             ` Xionghu Luo
  2021-12-29 12:55               ` Jan Hubicka
  0 siblings, 1 reply; 22+ messages in thread
From: Xionghu Luo @ 2021-12-29  1:43 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: wschmidt, dje.gcc, gcc-patches, linkw, segher



On 2021/12/17 09:30, Xionghu Luo via Gcc-patches wrote:
> 
> 
> On 2021/12/16 19:20, Jan Hubicka wrote:
>>>  
>>> OK. Comments like?
>>>
>>> /* Don't move insn of cold BB out of loop to preheader to reduce calculations
>>>    and register live range in hot loop with cold BB.  */
>>
>> Looks good.
>>>
>>>
>>> And maybe some dump log will help tracking in xxx.c.271r.loop2_invariant.
>>>
>>> --- a/gcc/loop-invariant.c
>>> +++ b/gcc/loop-invariant.c
>>> @@ -1190,7 +1190,12 @@ find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
>>>    basic_block preheader = loop_preheader_edge (loop)->src;
>>>
>>>    if (preheader->count > bb->count)
>>> -    return;
>>> +    {
>>> +      if (dump_file)
>>> +       fprintf (dump_file, "Don't move invariant from bb: %d in loop %d\n",
>>> +                bb->index, loop->num);
>>> +      return;
>>> +    }
>>
>> This is also a good idea - testcases are quite important for this typo
>> of stuff.
>>>>
>>>> Thinking about this more, you want to test:
>>>>   if (!always_executed && preheader->count > bb->count)
>>>>     return;
>>>> This will rule out some profile misupates.
>>>>
>>>> Also the code updating always_reached is worng:
>>>>       if (always_reached
>>>>           && CALL_P (insn)
>>>>           && (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
>>>>               || ! RTL_CONST_OR_PURE_CALL_P (insn)))
>>>>   always_reached = false;
>>>> It misses the case where statement can trhow external exception or
>>>> volatile asms that can also terminate execution.
>>>>
>>>> I bit worry on how often the test will have false positives with guessed
>>>> profile where earlier loop optimizations reduced trip count below
>>>> realistic estimate.
>>>
>>> Sorry I don't understand why always_executed and always_reached matters here?
>>> the test is based on BB before the FOR_BB_INSNS loop...
>>
>> always_executed is useful here to avoid the situation where bb->count or
>> preheader->count is wrong and it would disable wanted transformation.
>>
>> always_executed is just a bug I noticed while looking at the code.  I
>> will produce testcase for bugzilla.
>>
>> Honza
> 
> 
> Thanks, so is this OK to commit now?  And any additional comments for
> the "[PATCH 3/3] Fix loop split incorrect count and probability"
> (https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586374.html)?
> 
> 
> Updated comments and testcase:
> 
> [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL

This is the last patch for the "cold bb in hot loop optimization" I'd
like commit soon if not other comments,  to let it fully tested more
broadly before stage4.  Thanks.

Regression tested pass on powerpc64le-linux-gnu {P10,P9},
powerpc64-linux-gnu {P8, P7} and X86 though.

> 
> From: Xiong Hu Luo <luoxhu@linux.ibm.com>
> 
> gcc/ChangeLog:
> 
> 	* loop-invariant.c (find_invariants_bb): Check profile count
> 	before motion.
> 	(find_invariants_body): Add argument.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/loop-invariant-2.c: New.
> ---
>  gcc/loop-invariant.c                    | 17 ++++++++++++++---
>  gcc/testsuite/gcc.dg/loop-invariant-2.c | 20 ++++++++++++++++++++
>  2 files changed, 34 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/loop-invariant-2.c
> 
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> index fca0c2b24be..690f7704a0b 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1183,9 +1183,21 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed)
>     call.  */
> 
>  static void
> -find_invariants_bb (basic_block bb, bool always_reached, bool always_executed)
> +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
> +		    bool always_executed)
>  {
>    rtx_insn *insn;
> +  basic_block preheader = loop_preheader_edge (loop)->src;
> +
> +  /* Don't move insn of cold BB out of loop to preheader to reduce calculations
> +     and register live range in hot loop with cold BB.  */
> +  if (!always_executed && preheader->count > bb->count)
> +    {
> +      if (dump_file)
> +	fprintf (dump_file, "Don't move invariant from bb: %d out of loop %d\n",
> +		 bb->index, loop->num);
> +      return;
> +    }
> 
>    FOR_BB_INSNS (bb, insn)
>      {
> @@ -1214,8 +1226,7 @@ find_invariants_body (class loop *loop, basic_block *body,
>    unsigned i;
> 
>    for (i = 0; i < loop->num_nodes; i++)
> -    find_invariants_bb (body[i],
> -			bitmap_bit_p (always_reached, i),
> +    find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
>  			bitmap_bit_p (always_executed, i));
>  }
> 
> diff --git a/gcc/testsuite/gcc.dg/loop-invariant-2.c b/gcc/testsuite/gcc.dg/loop-invariant-2.c
> new file mode 100644
> index 00000000000..df3d8458569
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/loop-invariant-2.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-loop2_invariant" } */
> +
> +volatile int x;
> +void
> +bar (int, char *, char *);
> +void
> +foo (int *a, int n, int k)
> +{
> +  int i;
> +
> +  for (i = 0; i < n; i++)
> +    {
> +      if (__builtin_expect (x, 0))
> +	bar (k / 5, "one", "two");
> +      a[i] = k;
> +    }
> +}
> +
> +/* { dg-final { scan-rtl-dump "Don't move invariant from bb: .*out of loop" "loop2_invariant" } } */

-- 
Thanks,
Xionghu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL
  2021-12-29  1:43             ` Xionghu Luo
@ 2021-12-29 12:55               ` Jan Hubicka
  2021-12-30  6:08                 ` Xionghu Luo
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Hubicka @ 2021-12-29 12:55 UTC (permalink / raw)
  To: Xionghu Luo; +Cc: wschmidt, dje.gcc, gcc-patches, linkw, segher

> > 
> > From: Xiong Hu Luo <luoxhu@linux.ibm.com>
> > 
> > gcc/ChangeLog:
> > 
> > 	* loop-invariant.c (find_invariants_bb): Check profile count
> > 	before motion.
> > 	(find_invariants_body): Add argument.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 	* gcc.dg/loop-invariant-2.c: New.
OK,
thanks!
Honza

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL
  2021-12-29 12:55               ` Jan Hubicka
@ 2021-12-30  6:08                 ` Xionghu Luo
  0 siblings, 0 replies; 22+ messages in thread
From: Xionghu Luo @ 2021-12-30  6:08 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: wschmidt, dje.gcc, gcc-patches, linkw, segher



On 2021/12/29 20:55, Jan Hubicka wrote:
>>>
>>> From: Xiong Hu Luo <luoxhu@linux.ibm.com>
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* loop-invariant.c (find_invariants_bb): Check profile count
>>> 	before motion.
>>> 	(find_invariants_body): Add argument.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 	* gcc.dg/loop-invariant-2.c: New.
> OK,
> thanks!
> Honza

Thanks, committed to r12-6149.

-- 
Thanks,
Xionghu

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-12-30  6:08 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-08  5:54 [PATCH 0/3] Dependency patches for hoist LIM code to cold loop Xionghu Luo
2021-12-08  5:54 ` [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL Xionghu Luo
2021-12-08 23:26   ` Jeff Law
2021-12-13  9:14   ` Jan Hubicka
2021-12-13 10:24     ` Jan Hubicka
2021-12-14  9:21       ` Xionghu Luo
2021-12-16 11:20         ` Jan Hubicka
2021-12-17  1:30           ` Xionghu Luo
2021-12-29  1:43             ` Xionghu Luo
2021-12-29 12:55               ` Jan Hubicka
2021-12-30  6:08                 ` Xionghu Luo
2021-12-08  5:54 ` [PATCH 2/3] Fix incorrect loop exit edge probability [PR103270] Xionghu Luo
2021-12-08 23:28   ` Jeff Law
2021-12-13  9:25   ` Jan Hubicka
2021-12-14  9:27     ` Xionghu Luo
2021-12-15  6:40       ` Xionghu Luo
2021-12-16 11:18         ` Jan Hubicka
2021-12-21  3:56           ` Xionghu Luo
2021-12-08  5:54 ` [PATCH 3/3] Fix loop split incorrect count and probability Xionghu Luo
2021-12-08 23:47   ` Jeff Law
2021-12-13  8:57     ` Xionghu Luo
2021-12-21  3:57       ` Xionghu Luo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).