[PATCH] Fix loop split incorrect count and probability

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Fix loop split incorrect count and probability
@ 2021-08-03  8:58 Xionghu Luo
  2021-08-04  2:42 ` Xionghu Luo
  2021-08-06 11:46 ` Richard Biener
  0 siblings, 2 replies; 20+ messages in thread
From: Xionghu Luo @ 2021-08-03  8:58 UTC (permalink / raw)
  To: gcc-patches
  Cc: segher, dje.gcc, wschmidt, guojiufu, linkw, rguenther, hubicka,
	Xionghu Luo

loop split condition is moved between loop1 and loop2, the split bb's
count and probability should also be duplicated instead of (100% vs INV),
secondly, the original loop1 and loop2 count need be propotional from the
original loop.

Regression tested pass, OK for master?

diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
...
   int prephitmp_16;
   int prephitmp_25;

   <bb 2> [local count: 118111600]:
   if (n_7(D) > 0)
     goto <bb 4>; [89.00%]
   else
     goto <bb 3>; [11.00%]

   <bb 3> [local count: 118111600]:
   return;

   <bb 4> [local count: 105119324]:
   pretmp_3 = ga;

-  <bb 5> [local count: 955630225]:
+  <bb 5> [local count: 315357973]:
   # i_13 = PHI <i_10(20), 0(4)>
   # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
   if (prephitmp_12 != 0)
     goto <bb 6>; [33.00%]
   else
     goto <bb 7>; [67.00%]

-  <bb 6> [local count: 315357972]:
+  <bb 6> [local count: 104068130]:
   _2 = do_something ();
   ga = _2;

-  <bb 7> [local count: 955630225]:
+  <bb 7> [local count: 315357973]:
   # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
   i_10 = inc (i_13);
   if (n_7(D) > i_10)
     goto <bb 21>; [89.00%]
   else
     goto <bb 11>; [11.00%]

   <bb 11> [local count: 105119324]:
   goto <bb 3>; [100.00%]

-  <bb 21> [local count: 850510901]:
+  <bb 21> [local count: 280668596]:
   if (prephitmp_12 != 0)
-    goto <bb 20>; [100.00%]
+    goto <bb 20>; [33.00%]
   else
-    goto <bb 19>; [INV]
+    goto <bb 19>; [67.00%]

-  <bb 20> [local count: 850510901]:
+  <bb 20> [local count: 280668596]:
   goto <bb 5>; [100.00%]

-  <bb 19> [count: 0]:
+  <bb 19> [local count: 70429947]:
   # i_23 = PHI <i_10(21)>
   # prephitmp_25 = PHI <prephitmp_5(21)>

-  <bb 12> [local count: 955630225]:
+  <bb 12> [local count: 640272252]:
   # i_15 = PHI <i_23(19), i_22(16)>
   # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
   i_22 = inc (i_15);
   if (n_7(D) > i_22)
     goto <bb 16>; [89.00%]
   else
     goto <bb 11>; [11.00%]

-  <bb 16> [local count: 850510901]:
+  <bb 16> [local count: 569842305]:
   goto <bb 12>; [100.00%]

 }

gcc/ChangeLog:

	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
	(do_split_loop_on_cond): Likewise.
---
 gcc/tree-ssa-loop-split.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index 3a09bbc39e5..8e5a7ded0f7 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -583,10 +583,10 @@ split_loop (class loop *loop1)
 	basic_block cond_bb;
 
 	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
-					   profile_probability::always (),
-					   profile_probability::always (),
-					   profile_probability::always (),
-					   profile_probability::always (),
+					   true_edge->probability,
+					   true_edge->probability.invert (),
+					   true_edge->probability,
+					   true_edge->probability.invert (),
 					   true);
 	gcc_assert (loop2);
 
@@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
   initialize_original_copy_tables ();
 
   struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
-				     profile_probability::always (),
-				     profile_probability::never (),
-				     profile_probability::always (),
-				     profile_probability::always (),
+				     invar_branch->probability.invert (),
+				     invar_branch->probability,
+				     invar_branch->probability.invert (),
+				     invar_branch->probability,
 				     true);
   if (!loop2)
     {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix loop split incorrect count and probability
  2021-08-03  8:58 [PATCH] Fix loop split incorrect count and probability Xionghu Luo
@ 2021-08-04  2:42 ` Xionghu Luo
  2021-08-06 11:46 ` Richard Biener
  1 sibling, 0 replies; 20+ messages in thread
From: Xionghu Luo @ 2021-08-04  2:42 UTC (permalink / raw)
  To: gcc-patches
  Cc: segher, dje.gcc, wschmidt, guojiufu, linkw, rguenther, hubicka

[-- Attachment #1: Type: text/plain, Size: 538 bytes --]

I' like to split this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html

to two patches:

0001-Fix-loop-split-incorrect-count-and-probability.patch
0002-Don-t-move-cold-code-out-of-loop-by-checking-bb-coun.patch

since they are solving two different things, please help to review
the attached series.  They show obvious performance improvement on
both P8 and P9 for CPU2017, and I am not sure how it will affect other
platforms like X86 and AArch64, it will be grateful if someone could
try it.  Thanks.


Xionghu

[-- Attachment #2: 0001-Fix-loop-split-incorrect-count-and-probability.patch --]
[-- Type: text/plain, Size: 4898 bytes --]

From 4e1ef5b1f423484a6789750e7cc0cf2e94517f20 Mon Sep 17 00:00:00 2001
From: Xionghu Luo <luoxhu@linux.ibm.com>
Date: Tue, 3 Aug 2021 03:44:14 -0500
Subject: [PATCH 1/2] Fix loop split incorrect count and probability

loop split condition is moved between loop1 and loop2, the split bb's
count and probability should also be duplicated instead of (100% vs INV),
secondly, the original loop1 and loop2 count need be propotional from the
original loop.

Regression tested pass, OK for master?

diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
...
   int prephitmp_16;
   int prephitmp_25;

   <bb 2> [local count: 118111600]:
   if (n_7(D) > 0)
     goto <bb 4>; [89.00%]
   else
     goto <bb 3>; [11.00%]

   <bb 3> [local count: 118111600]:
   return;

   <bb 4> [local count: 105119324]:
   pretmp_3 = ga;

-  <bb 5> [local count: 955630225]:
+  <bb 5> [local count: 315357973]:
   # i_13 = PHI <i_10(20), 0(4)>
   # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
   if (prephitmp_12 != 0)
     goto <bb 6>; [33.00%]
   else
     goto <bb 7>; [67.00%]

-  <bb 6> [local count: 315357972]:
+  <bb 6> [local count: 104068130]:
   _2 = do_something ();
   ga = _2;

-  <bb 7> [local count: 955630225]:
+  <bb 7> [local count: 315357973]:
   # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
   i_10 = inc (i_13);
   if (n_7(D) > i_10)
     goto <bb 21>; [89.00%]
   else
     goto <bb 11>; [11.00%]

   <bb 11> [local count: 105119324]:
   goto <bb 3>; [100.00%]

-  <bb 21> [local count: 850510901]:
+  <bb 21> [local count: 280668596]:
   if (prephitmp_12 != 0)
-    goto <bb 20>; [100.00%]
+    goto <bb 20>; [33.00%]
   else
-    goto <bb 19>; [INV]
+    goto <bb 19>; [67.00%]

-  <bb 20> [local count: 850510901]:
+  <bb 20> [local count: 280668596]:
   goto <bb 5>; [100.00%]

-  <bb 19> [count: 0]:
+  <bb 19> [local count: 70429947]:
   # i_23 = PHI <i_10(21)>
   # prephitmp_25 = PHI <prephitmp_5(21)>

-  <bb 12> [local count: 955630225]:
+  <bb 12> [local count: 640272252]:
   # i_15 = PHI <i_23(19), i_22(16)>
   # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
   i_22 = inc (i_15);
   if (n_7(D) > i_22)
     goto <bb 16>; [89.00%]
   else
     goto <bb 11>; [11.00%]

-  <bb 16> [local count: 850510901]:
+  <bb 16> [local count: 569842305]:
   goto <bb 12>; [100.00%]

 }

gcc/ChangeLog:

	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
	(do_split_loop_on_cond): Likewise.
---
 gcc/tree-ssa-loop-split.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index 3f6ad046623..d30782888f3 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -575,7 +575,11 @@ split_loop (class loop *loop1)
 					    stmts2);
 	tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
 	if (!initial_true)
-	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
+	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+
+	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
+			   ? EDGE_SUCC (bbs[i], 0)
+			   : EDGE_SUCC (bbs[i], 1);
 
 	/* Now version the loop, placing loop2 after loop1 connecting
 	   them, and fix up SSA form for that.  */
@@ -583,10 +587,10 @@ split_loop (class loop *loop1)
 	basic_block cond_bb;
 
 	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
-					   profile_probability::always (),
-					   profile_probability::always (),
-					   profile_probability::always (),
-					   profile_probability::always (),
+					   true_edge->probability,
+					   true_edge->probability.invert (),
+					   true_edge->probability,
+					   true_edge->probability.invert (),
 					   true);
 	gcc_assert (loop2);
 
@@ -1486,10 +1490,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
   initialize_original_copy_tables ();
 
   struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
-				     profile_probability::always (),
-				     profile_probability::never (),
-				     profile_probability::always (),
-				     profile_probability::always (),
+				     invar_branch->probability.invert (),
+				     invar_branch->probability,
+				     invar_branch->probability.invert (),
+				     invar_branch->probability,
 				     true);
   if (!loop2)
     {
@@ -1530,6 +1534,9 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
   to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
   to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
 
+  to_loop1->probability = invar_branch->probability.invert ();
+  to_loop2->probability = invar_branch->probability;
+
   /* Due to introduction of a control flow edge from loop1 latch to loop2
      pre-header, we should update PHIs in loop2 to reflect this connection
      between loop1 and loop2.  */
-- 
2.27.0.90.geebb51ba8c


[-- Attachment #3: 0002-Don-t-move-cold-code-out-of-loop-by-checking-bb-coun.patch --]
[-- Type: text/plain, Size: 8766 bytes --]

From b015708a4a3a3cdadb184012174f813c11bcdec2 Mon Sep 17 00:00:00 2001
From: Xiong Hu Luo <luoxhu@linux.ibm.com>
Date: Mon, 5 Jul 2021 03:57:11 -0500
Subject: [PATCH 2/2] Don't move cold code out of loop by checking bb count

There was a patch trying to avoid move cold block out of loop:

https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html

Richard suggested to "never hoist anything from a bb with lower execution
frequency to a bb with higher one in LIM invariantness_dom_walker
before_dom_children".

This patch does this profile count check in both gimple LIM
move_computations_worker and RTL loop-invariant.c find_invariants_bb,
if the loop bb is colder than loop preheader, don't hoist it out of
loop.

SPEC2017 performance evaluation shows 1% performance improvement for
intrate GEOMEAN and no obvious regression for others.  Especially,
500.perlbench_r +7.52% (Perf shows function S_regtry of perlbench is
largely improved.), and 548.exchange2_r+1.98%, 526.blender_r +1.00%
on P8LE.

Regression and bootstrap tested pass on P8LE, any comments?  Thanks.

gcc/ChangeLog:

	* loop-invariant.c (find_invariants_bb): Check profile count
	before motion.
	(find_invariants_body): Add argument.
	* tree-ssa-loop-im.c (move_computations_worker): Check profile
	count before motion.
	(execute_sm): Likewise.
	(execute_sm_exit): Check pointer validness.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/recip-3.c: Adjust.
---
 gcc/loop-invariant.c                    |  10 +-
 gcc/testsuite/gcc.dg/tree-ssa/recip-3.c |   2 +-
 gcc/tree-ssa-loop-im.c                  | 154 +++++++++++++++++++++++-
 3 files changed, 157 insertions(+), 9 deletions(-)

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index fca0c2b24be..5c3be7bf0eb 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed)
    call.  */
 
 static void
-find_invariants_bb (basic_block bb, bool always_reached, bool always_executed)
+find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
+		    bool always_executed)
 {
   rtx_insn *insn;
+  basic_block preheader = loop_preheader_edge (loop)->src;
+
+  if (preheader->count > bb->count)
+    return;
 
   FOR_BB_INSNS (bb, insn)
     {
@@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block *body,
   unsigned i;
 
   for (i = 0; i < loop->num_nodes; i++)
-    find_invariants_bb (body[i],
-			bitmap_bit_p (always_reached, i),
+    find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
 			bitmap_bit_p (always_executed, i));
 }
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
index 638bf38db8c..641c91e719e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
@@ -23,4 +23,4 @@ float h ()
 	F[0] += E / d;
 }
 
-/* { dg-final { scan-tree-dump-times " / " 1 "recip" } } */
+/* { dg-final { scan-tree-dump-times " / " 5 "recip" } } */
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index d9f75d5025e..80d61ddf531 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -1154,6 +1154,58 @@ move_computations_worker (basic_block bb)
 	  continue;
 	}
 
+      edge e = loop_preheader_edge (level);
+      if (e->src->count > bb->count)
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fprintf (dump_file, "PHI node NOT moved to %d from %d:\n",
+		       e->src->index, bb->index);
+	      print_gimple_stmt (dump_file, stmt, 0);
+	      fprintf (dump_file, "(cost %u) out of loop %d.\n\n", cost,
+		       level->num);
+	    }
+	  gsi_next (&bsi);
+	  continue;
+	}
+      else
+	{
+	  unsigned i;
+	  bool skip_phi_move = false;
+	  for (i = 0; i < gimple_phi_num_args (stmt); i++)
+	    {
+	      tree def = PHI_ARG_DEF (stmt, i);
+
+	      if (TREE_CODE (def) != SSA_NAME)
+		continue;
+
+	      gimple *def_stmt = SSA_NAME_DEF_STMT (def);
+
+	      if (!gimple_bb (def_stmt))
+		continue;
+
+	      if (!dominated_by_p (CDI_DOMINATORS, e->src,
+				   gimple_bb (def_stmt)))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    {
+		      fprintf (dump_file, "PHI node NOT moved to %d from %d\n",
+			       e->src->index, bb->index);
+		      print_gimple_stmt (dump_file, stmt, 0);
+		      fprintf (dump_file, "(cost %u) out of loop %d.\n\n", cost,
+			       level->num);
+		    }
+		  skip_phi_move = true;
+		  break;
+		}
+	    }
+	  if (skip_phi_move)
+	    {
+	      gsi_next (&bsi);
+	      continue;
+	    }
+	}
+
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "Moving PHI node\n");
@@ -1191,14 +1243,13 @@ move_computations_worker (basic_block bb)
 	  tree lhs = gimple_assign_lhs (new_stmt);
 	  SSA_NAME_RANGE_INFO (lhs) = NULL;
 	}
-      gsi_insert_on_edge (loop_preheader_edge (level), new_stmt);
+      gsi_insert_on_edge (e, new_stmt);
       remove_phi_node (&bsi, false);
     }
 
   for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi); )
     {
       edge e;
-
       gimple *stmt = gsi_stmt (bsi);
 
       lim_data = get_lim_data (stmt);
@@ -1221,7 +1272,83 @@ move_computations_worker (basic_block bb)
       /* We do not really want to move conditionals out of the loop; we just
 	 placed it here to force its operands to be moved if necessary.  */
       if (gimple_code (stmt) == GIMPLE_COND)
-	continue;
+	{
+	  gsi_next (&bsi);
+	  continue;
+	}
+
+      e = loop_preheader_edge (level);
+      if (e->src->count > bb->count)
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fprintf (dump_file, "stmt: Statement NOT moved to %d from %d \n",
+		       e->src->index, bb->index);
+	      print_gimple_stmt (dump_file, stmt, 0);
+	      fprintf (dump_file, "(cost %u) out of loop %d.\n\n", cost,
+		       level->num);
+	    }
+	  gsi_next (&bsi);
+	  continue;
+	}
+      else
+	{
+	  if (is_gimple_assign (stmt))
+	    {
+	      tree rhs1 = gimple_assign_rhs1 (stmt);
+	      tree rhs2 = gimple_assign_rhs2 (stmt);
+	      if (TREE_CODE (rhs1) == MEM_REF)
+		{
+		  rhs2 = TREE_OPERAND (rhs1, 1);
+		  rhs1 = TREE_OPERAND (rhs1, 0);
+		}
+	      gimple *stmt1 = NULL, *stmt2 = NULL;
+	      basic_block def_bb;
+	      if (rhs1 && TREE_CODE (rhs1) == SSA_NAME)
+		{
+		  stmt1 = SSA_NAME_DEF_STMT (rhs1);
+		  def_bb = gimple_bb (stmt1);
+		  if (stmt1
+		      && def_bb
+		      && (def_bb == bb
+			  || !dominated_by_p (CDI_DOMINATORS, e->src, def_bb)))
+		    {
+		      if (dump_file && (dump_flags & TDF_DETAILS))
+			{
+			  fprintf (dump_file,
+				   "stmt1: Statement NOT moved to %d from %d\n",
+				   e->src->index, bb->index);
+			  print_gimple_stmt (dump_file, stmt, 0);
+			  fprintf (dump_file, "(cost %u) out of loop %d.\n\n",
+				   cost, level->num);
+			}
+		      gsi_next (&bsi);
+		      continue;
+		    }
+		}
+	      if (rhs2 && TREE_CODE (rhs2) == SSA_NAME)
+		{
+		  stmt2 = SSA_NAME_DEF_STMT (rhs2);
+		  def_bb = gimple_bb (stmt2);
+		  if (stmt2 && def_bb
+		      && (def_bb == bb
+			  || !dominated_by_p (CDI_DOMINATORS, e->src, def_bb)))
+		    {
+		      if (dump_file && (dump_flags & TDF_DETAILS))
+			{
+			  fprintf (dump_file,
+				   "stmt2: Statement NOT moved to %d from %d\n",
+				   e->src->index, bb->index);
+			  print_gimple_stmt (dump_file, stmt, 0);
+			  fprintf (dump_file, "(cost %u) out of loop %d.\n\n",
+				   cost, level->num);
+			}
+		      gsi_next (&bsi);
+		      continue;
+		    }
+		}
+	    }
+	}
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
@@ -1231,7 +1358,6 @@ move_computations_worker (basic_block bb)
 		   cost, level->num);
 	}
 
-      e = loop_preheader_edge (level);
       gcc_assert (!gimple_vdef (stmt));
       if (gimple_vuse (stmt))
 	{
@@ -2133,6 +2259,19 @@ execute_sm (class loop *loop, im_mem_ref *ref,
   bool multi_threaded_model_p = false;
   gimple_stmt_iterator gsi;
   sm_aux *aux = new sm_aux;
+  basic_block bb = gimple_bb (first_mem_ref_loc (loop, ref)->stmt);
+
+  edge e = loop_preheader_edge (loop);
+  if (e->src->count > bb->count)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Don't execute store motion of ");
+	  print_generic_expr (dump_file, ref->mem.ref);
+	  fprintf (dump_file, " from loop %d\n", loop->num);
+	}
+      return;
+    }
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -2241,7 +2380,12 @@ execute_sm_exit (class loop *loop, edge ex, vec<seq_entry> &seq,
 	}
       else
 	{
-	  sm_aux *aux = *aux_map.get (ref);
+	  sm_aux **paux = aux_map.get (ref);
+	  sm_aux *aux;
+	  if (paux)
+	    aux = *paux;
+	  else
+	    continue;
 	  if (!aux->store_flag || kind == sm_ord)
 	    {
 	      gassign *store;
-- 
2.27.0.90.geebb51ba8c


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix loop split incorrect count and probability
  2021-08-03  8:58 [PATCH] Fix loop split incorrect count and probability Xionghu Luo
  2021-08-04  2:42 ` Xionghu Luo
@ 2021-08-06 11:46 ` Richard Biener
  2021-08-09  2:37   ` Xionghu Luo
  2021-08-09  4:33   ` Feng Xue OS
  1 sibling, 2 replies; 20+ messages in thread
From: Richard Biener @ 2021-08-06 11:46 UTC (permalink / raw)
  To: Xionghu Luo
  Cc: gcc-patches, segher, dje.gcc, wschmidt, guojiufu, linkw, hubicka

On Tue, 3 Aug 2021, Xionghu Luo wrote:

> loop split condition is moved between loop1 and loop2, the split bb's
> count and probability should also be duplicated instead of (100% vs INV),
> secondly, the original loop1 and loop2 count need be propotional from the
> original loop.
> 
> Regression tested pass, OK for master?
> 
> diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
> ...
>    int prephitmp_16;
>    int prephitmp_25;
> 
>    <bb 2> [local count: 118111600]:
>    if (n_7(D) > 0)
>      goto <bb 4>; [89.00%]
>    else
>      goto <bb 3>; [11.00%]
> 
>    <bb 3> [local count: 118111600]:
>    return;
> 
>    <bb 4> [local count: 105119324]:
>    pretmp_3 = ga;
> 
> -  <bb 5> [local count: 955630225]:
> +  <bb 5> [local count: 315357973]:
>    # i_13 = PHI <i_10(20), 0(4)>
>    # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
>    if (prephitmp_12 != 0)
>      goto <bb 6>; [33.00%]
>    else
>      goto <bb 7>; [67.00%]
> 
> -  <bb 6> [local count: 315357972]:
> +  <bb 6> [local count: 104068130]:
>    _2 = do_something ();
>    ga = _2;
> 
> -  <bb 7> [local count: 955630225]:
> +  <bb 7> [local count: 315357973]:
>    # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
>    i_10 = inc (i_13);
>    if (n_7(D) > i_10)
>      goto <bb 21>; [89.00%]
>    else
>      goto <bb 11>; [11.00%]
> 
>    <bb 11> [local count: 105119324]:
>    goto <bb 3>; [100.00%]
> 
> -  <bb 21> [local count: 850510901]:
> +  <bb 21> [local count: 280668596]:
>    if (prephitmp_12 != 0)
> -    goto <bb 20>; [100.00%]
> +    goto <bb 20>; [33.00%]
>    else
> -    goto <bb 19>; [INV]
> +    goto <bb 19>; [67.00%]
> 
> -  <bb 20> [local count: 850510901]:
> +  <bb 20> [local count: 280668596]:
>    goto <bb 5>; [100.00%]
> 
> -  <bb 19> [count: 0]:
> +  <bb 19> [local count: 70429947]:
>    # i_23 = PHI <i_10(21)>
>    # prephitmp_25 = PHI <prephitmp_5(21)>
> 
> -  <bb 12> [local count: 955630225]:
> +  <bb 12> [local count: 640272252]:
>    # i_15 = PHI <i_23(19), i_22(16)>
>    # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
>    i_22 = inc (i_15);
>    if (n_7(D) > i_22)
>      goto <bb 16>; [89.00%]
>    else
>      goto <bb 11>; [11.00%]
> 
> -  <bb 16> [local count: 850510901]:
> +  <bb 16> [local count: 569842305]:
>    goto <bb 12>; [100.00%]
> 
>  }
> 
> gcc/ChangeLog:
> 
> 	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> 	(do_split_loop_on_cond): Likewise.
> ---
>  gcc/tree-ssa-loop-split.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 3a09bbc39e5..8e5a7ded0f7 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
>  	basic_block cond_bb;
>  
>  	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> -					   profile_probability::always (),
> -					   profile_probability::always (),
> -					   profile_probability::always (),
> -					   profile_probability::always (),
> +					   true_edge->probability,
> +					   true_edge->probability.invert (),
> +					   true_edge->probability,
> +					   true_edge->probability.invert (),
>  					   true);

there is no 'true_edge' variable at this point.

>  	gcc_assert (loop2);
>  
> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
>    initialize_original_copy_tables ();
>  
>    struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> -				     profile_probability::always (),
> -				     profile_probability::never (),
> -				     profile_probability::always (),
> -				     profile_probability::always (),
> +				     invar_branch->probability.invert (),
> +				     invar_branch->probability,
> +				     invar_branch->probability.invert (),
> +				     invar_branch->probability,
>  				     true);
>    if (!loop2)
>      {

The patch introduction seems to talk about do_split_loop_on_cond only.
Since loop versioning inserts a condition with the passed probabilities
but in this case a 'boolean_true_node' condition the then and else
probabilities passed look correct.  It's just the scaling arguments
that look wrong?  This loop_version call should get a comment as to
why we are passing probabilities the way we do.

It does seem that scaling the loop by the invar_branch probability
is correct.  Since this does similar things to unswitching, I see
that unswitching does

prob_true = edge_true->probability;
loop_version (loop, unshare_expr (cond),
                       NULL, prob_true,
                       prob_true.invert (),
                       prob_true, prob_true.invert (),
                       false);

which maybe suggests that your invar_branch based passing should
depend on 'true_invar'?

Also compared to unswitching the first loop is always entered,
so I wonder if the scaling is correct with respect to that
given unswitching where only ever one loop is entered?

Thanks,
Richard.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix loop split incorrect count and probability
  2021-08-06 11:46 ` Richard Biener
@ 2021-08-09  2:37   ` Xionghu Luo
  2021-08-10 14:47     ` Richard Biener
  2021-08-09  4:33   ` Feng Xue OS
  1 sibling, 1 reply; 20+ messages in thread
From: Xionghu Luo @ 2021-08-09  2:37 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, segher, fxue, wschmidt, guojiufu, linkw, hubicka

Thanks,

On 2021/8/6 19:46, Richard Biener wrote:
> On Tue, 3 Aug 2021, Xionghu Luo wrote:
> 
>> loop split condition is moved between loop1 and loop2, the split bb's
>> count and probability should also be duplicated instead of (100% vs INV),
>> secondly, the original loop1 and loop2 count need be propotional from the
>> original loop.
>>
>>
>> diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
>> ...
>>     int prephitmp_16;
>>     int prephitmp_25;
>>
>>     <bb 2> [local count: 118111600]:
>>     if (n_7(D) > 0)
>>       goto <bb 4>; [89.00%]
>>     else
>>       goto <bb 3>; [11.00%]
>>
>>     <bb 3> [local count: 118111600]:
>>     return;
>>
>>     <bb 4> [local count: 105119324]:
>>     pretmp_3 = ga;
>>
>> -  <bb 5> [local count: 955630225]:
>> +  <bb 5> [local count: 315357973]:
>>     # i_13 = PHI <i_10(20), 0(4)>
>>     # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
>>     if (prephitmp_12 != 0)
>>       goto <bb 6>; [33.00%]
>>     else
>>       goto <bb 7>; [67.00%]
>>
>> -  <bb 6> [local count: 315357972]:
>> +  <bb 6> [local count: 104068130]:
>>     _2 = do_something ();
>>     ga = _2;
>>
>> -  <bb 7> [local count: 955630225]:
>> +  <bb 7> [local count: 315357973]:
>>     # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
>>     i_10 = inc (i_13);
>>     if (n_7(D) > i_10)
>>       goto <bb 21>; [89.00%]
>>     else
>>       goto <bb 11>; [11.00%]
>>
>>     <bb 11> [local count: 105119324]:
>>     goto <bb 3>; [100.00%]
>>
>> -  <bb 21> [local count: 850510901]:
>> +  <bb 21> [local count: 280668596]:
>>     if (prephitmp_12 != 0)
>> -    goto <bb 20>; [100.00%]
>> +    goto <bb 20>; [33.00%]
>>     else
>> -    goto <bb 19>; [INV]
>> +    goto <bb 19>; [67.00%]
>>
>> -  <bb 20> [local count: 850510901]:
>> +  <bb 20> [local count: 280668596]:
>>     goto <bb 5>; [100.00%]
>>
>> -  <bb 19> [count: 0]:
>> +  <bb 19> [local count: 70429947]:
>>     # i_23 = PHI <i_10(21)>
>>     # prephitmp_25 = PHI <prephitmp_5(21)>
>>
>> -  <bb 12> [local count: 955630225]:
>> +  <bb 12> [local count: 640272252]:
>>     # i_15 = PHI <i_23(19), i_22(16)>
>>     # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
>>     i_22 = inc (i_15);
>>     if (n_7(D) > i_22)
>>       goto <bb 16>; [89.00%]
>>     else
>>       goto <bb 11>; [11.00%]
>>
>> -  <bb 16> [local count: 850510901]:
>> +  <bb 16> [local count: 569842305]:
>>     goto <bb 12>; [100.00%]
>>
>>   }
>>
>> gcc/ChangeLog:
>>
>> 	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
>> 	(do_split_loop_on_cond): Likewise.
>> ---
>>   gcc/tree-ssa-loop-split.c | 16 ++++++++--------
>>   1 file changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>> index 3a09bbc39e5..8e5a7ded0f7 100644
>> --- a/gcc/tree-ssa-loop-split.c
>> +++ b/gcc/tree-ssa-loop-split.c
>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
>>   	basic_block cond_bb;

 	if (!initial_true)
-	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
+	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+
+	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
+			   ? EDGE_SUCC (bbs[i], 0)
+			   : EDGE_SUCC (bbs[i], 1);

>>   
>>   	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
>> -					   profile_probability::always (),
>> -					   profile_probability::always (),
>> -					   profile_probability::always (),
>> -					   profile_probability::always (),
>> +					   true_edge->probability,
>> +					   true_edge->probability.invert (),
>> +					   true_edge->probability,
>> +					   true_edge->probability.invert (),
>>   					   true);
> 
> there is no 'true_edge' variable at this point.

Sorry, missed the above hunk when split the patch. 

> 
>>   	gcc_assert (loop2);
>>   
>> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
>>     initialize_original_copy_tables ();
>>   
>>     struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
>> -				     profile_probability::always (),
>> -				     profile_probability::never (),
>> -				     profile_probability::always (),
>> -				     profile_probability::always (),
>> +				     invar_branch->probability.invert (),
>> +				     invar_branch->probability,
>> +				     invar_branch->probability.invert (),
>> +				     invar_branch->probability,
>>   				     true);
>>     if (!loop2)
>>       {
> 
> The patch introduction seems to talk about do_split_loop_on_cond only.

split_loop faces similar issue though it sets the two branches to 100% vs 100%
and no scaling which seems also incorrect.

> Since loop versioning inserts a condition with the passed probabilities
> but in this case a 'boolean_true_node' condition the then and else
> probabilities passed look correct.  It's just the scaling arguments
> that look wrong?  This loop_version call should get a comment as to
> why we are passing probabilities the way we do.

This optimization is transforming from:

for (i = 0; i < n; i = inc (i))
    {
      if (ga)
        ga = do_something ();
}

to: 

  for (i = 0; i < x; i = inc (i))
{
    if (true)
         ga = do_something ();
        if (!ga)
          break;
}
  for (; i < n; i = inc (i))
{
    if (false)
         ga = do_something ();
}


`boolean_true_node` is passed in to make the first loop's condition
statement to be 'true', after returning from loop_version, there is a
piece of code forcing the condition in second loop to be false,
and the original condition is moved from *in loop* to *exit edge*
between loop1 and loop2. 

@fxue may have inputs about this since he contributed it years ago.

> 
> It does seem that scaling the loop by the invar_branch probability
> is correct.  Since this does similar things to unswitching, I see
> that unswitching does
> 
> prob_true = edge_true->probability;
> loop_version (loop, unshare_expr (cond),
>                         NULL, prob_true,
>                         prob_true.invert (),
>                         prob_true, prob_true.invert (),
>                         false);
> 
> which maybe suggests that your invar_branch based passing should
> depend on 'true_invar'?
> 
> Also compared to unswitching the first loop is always entered,
> so I wonder if the scaling is correct with respect to that
> given unswitching where only ever one loop is entered?


Scaling is based on the probability calculated on "if (ga)", if it is
33% vs 67% before split, then it is reasonable to scale loop1 to 33%
and loop2 to 67% suppose the branch probability is correct enough?

unswitch also scaled the two loops based on prob_true, if prob_true
is 50%, it decreases X's count to HALF unexpectedly since it should be
executed on both branches?  while loop split still kept execute both first
loop and second loop, it seems even more accurate than loop unswitching?

tree-ssa-loop-unswitch.c:

   while (A)
     {
       if (inv)
         B;

       X;

       if (!inv)
	 C;
     }

   where inv is the loop invariant, into

   if (inv)
     {
       while (A)
	 {
           B;
	   X;
	 }
     }
   else
     {
       while (A)
	 {
	   X;
	   C;
	 }
     }

-- 
Thanks,
Xionghu

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix loop split incorrect count and probability
  2021-08-06 11:46 ` Richard Biener
  2021-08-09  2:37   ` Xionghu Luo
@ 2021-08-09  4:33   ` Feng Xue OS
  1 sibling, 0 replies; 20+ messages in thread
From: Feng Xue OS @ 2021-08-09  4:33 UTC (permalink / raw)
  To: Xionghu Luo, Richard Biener
  Cc: segher, wschmidt, linkw, gcc-patches, hubicka, dje.gcc

Yes. Condition to to switch two versioned loops is "true", the first two arguments should be 100% and 0%.

It is different from normal loop split, we could not deduce exactly precise probability for
condition-based loop split, since cfg inside loop2 would be changed. (invar-branch is replaced
to "true", as shown in the comment on do_split_loop_on_cond). Any way, your way of scaling
two loops' probabilities according to that of invar-branch, seems to be a better heuristics than
original, which would give us more reasonable execution count, at least for loop header bb.

Thanks,
Feng

________________________________________
From: Gcc-patches <gcc-patches-bounces+fxue=os.amperecomputing.com@gcc.gnu.org> on behalf of Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org>
Sent: Friday, August 6, 2021 7:46 PM
To: Xionghu Luo
Cc: segher@kernel.crashing.org; wschmidt@linux.ibm.com; linkw@gcc.gnu.org; gcc-patches@gcc.gnu.org; hubicka@ucw.cz; dje.gcc@gmail.com
Subject: Re: [PATCH] Fix loop split incorrect count and probability

On Tue, 3 Aug 2021, Xionghu Luo wrote:

> loop split condition is moved between loop1 and loop2, the split bb's
> count and probability should also be duplicated instead of (100% vs INV),
> secondly, the original loop1 and loop2 count need be propotional from the
> original loop.
>
> Regression tested pass, OK for master?
>
> diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
> ...
>    int prephitmp_16;
>    int prephitmp_25;
>
>    <bb 2> [local count: 118111600]:
>    if (n_7(D) > 0)
>      goto <bb 4>; [89.00%]
>    else
>      goto <bb 3>; [11.00%]
>
>    <bb 3> [local count: 118111600]:
>    return;
>
>    <bb 4> [local count: 105119324]:
>    pretmp_3 = ga;
>
> -  <bb 5> [local count: 955630225]:
> +  <bb 5> [local count: 315357973]:
>    # i_13 = PHI <i_10(20), 0(4)>
>    # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
>    if (prephitmp_12 != 0)
>      goto <bb 6>; [33.00%]
>    else
>      goto <bb 7>; [67.00%]
>
> -  <bb 6> [local count: 315357972]:
> +  <bb 6> [local count: 104068130]:
>    _2 = do_something ();
>    ga = _2;
>
> -  <bb 7> [local count: 955630225]:
> +  <bb 7> [local count: 315357973]:
>    # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
>    i_10 = inc (i_13);
>    if (n_7(D) > i_10)
>      goto <bb 21>; [89.00%]
>    else
>      goto <bb 11>; [11.00%]
>
>    <bb 11> [local count: 105119324]:
>    goto <bb 3>; [100.00%]
>
> -  <bb 21> [local count: 850510901]:
> +  <bb 21> [local count: 280668596]:
>    if (prephitmp_12 != 0)
> -    goto <bb 20>; [100.00%]
> +    goto <bb 20>; [33.00%]
>    else
> -    goto <bb 19>; [INV]
> +    goto <bb 19>; [67.00%]
>
> -  <bb 20> [local count: 850510901]:
> +  <bb 20> [local count: 280668596]:
>    goto <bb 5>; [100.00%]
>
> -  <bb 19> [count: 0]:
> +  <bb 19> [local count: 70429947]:
>    # i_23 = PHI <i_10(21)>
>    # prephitmp_25 = PHI <prephitmp_5(21)>
>
> -  <bb 12> [local count: 955630225]:
> +  <bb 12> [local count: 640272252]:
>    # i_15 = PHI <i_23(19), i_22(16)>
>    # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
>    i_22 = inc (i_15);
>    if (n_7(D) > i_22)
>      goto <bb 16>; [89.00%]
>    else
>      goto <bb 11>; [11.00%]
>
> -  <bb 16> [local count: 850510901]:
> +  <bb 16> [local count: 569842305]:
>    goto <bb 12>; [100.00%]
>
>  }
>
> gcc/ChangeLog:
>
>       * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
>       (do_split_loop_on_cond): Likewise.
> ---
>  gcc/tree-ssa-loop-split.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 3a09bbc39e5..8e5a7ded0f7 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
>       basic_block cond_bb;
>
>       class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> -                                        profile_probability::always (),
> -                                        profile_probability::always (),
> -                                        profile_probability::always (),
> -                                        profile_probability::always (),
> +                                        true_edge->probability,
> +                                        true_edge->probability.invert (),
> +                                        true_edge->probability,
> +                                        true_edge->probability.invert (),
>                                          true);

there is no 'true_edge' variable at this point.

>       gcc_assert (loop2);
>
> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
>    initialize_original_copy_tables ();
>
>    struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> -                                  profile_probability::always (),
> -                                  profile_probability::never (),
> -                                  profile_probability::always (),
> -                                  profile_probability::always (),
> +                                  invar_branch->probability.invert (),
> +                                  invar_branch->probability,
> +                                  invar_branch->probability.invert (),
> +                                  invar_branch->probability,
>                                    true);
>    if (!loop2)
>      {

The patch introduction seems to talk about do_split_loop_on_cond only.
Since loop versioning inserts a condition with the passed probabilities
but in this case a 'boolean_true_node' condition the then and else
probabilities passed look correct.  It's just the scaling arguments
that look wrong?  This loop_version call should get a comment as to
why we are passing probabilities the way we do.

It does seem that scaling the loop by the invar_branch probability
is correct.  Since this does similar things to unswitching, I see
that unswitching does

prob_true = edge_true->probability;
loop_version (loop, unshare_expr (cond),
                       NULL, prob_true,
                       prob_true.invert (),
                       prob_true, prob_true.invert (),
                       false);

which maybe suggests that your invar_branch based passing should
depend on 'true_invar'?

Also compared to unswitching the first loop is always entered,
so I wonder if the scaling is correct with respect to that
given unswitching where only ever one loop is entered?

Thanks,
Richard.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix loop split incorrect count and probability
  2021-08-09  2:37   ` Xionghu Luo
@ 2021-08-10 14:47     ` Richard Biener
  2021-08-11  3:03       ` Feng Xue OS
  2021-08-11  8:32       ` Xionghu Luo
  0 siblings, 2 replies; 20+ messages in thread
From: Richard Biener @ 2021-08-10 14:47 UTC (permalink / raw)
  To: Xionghu Luo; +Cc: gcc-patches, segher, fxue, wschmidt, guojiufu, linkw, hubicka

On Mon, 9 Aug 2021, Xionghu Luo wrote:

> Thanks,
> 
> On 2021/8/6 19:46, Richard Biener wrote:
> > On Tue, 3 Aug 2021, Xionghu Luo wrote:
> > 
> >> loop split condition is moved between loop1 and loop2, the split bb's
> >> count and probability should also be duplicated instead of (100% vs INV),
> >> secondly, the original loop1 and loop2 count need be propotional from the
> >> original loop.
> >>
> >>
> >> diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
> >> ...
> >>     int prephitmp_16;
> >>     int prephitmp_25;
> >>
> >>     <bb 2> [local count: 118111600]:
> >>     if (n_7(D) > 0)
> >>       goto <bb 4>; [89.00%]
> >>     else
> >>       goto <bb 3>; [11.00%]
> >>
> >>     <bb 3> [local count: 118111600]:
> >>     return;
> >>
> >>     <bb 4> [local count: 105119324]:
> >>     pretmp_3 = ga;
> >>
> >> -  <bb 5> [local count: 955630225]:
> >> +  <bb 5> [local count: 315357973]:
> >>     # i_13 = PHI <i_10(20), 0(4)>
> >>     # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
> >>     if (prephitmp_12 != 0)
> >>       goto <bb 6>; [33.00%]
> >>     else
> >>       goto <bb 7>; [67.00%]
> >>
> >> -  <bb 6> [local count: 315357972]:
> >> +  <bb 6> [local count: 104068130]:
> >>     _2 = do_something ();
> >>     ga = _2;
> >>
> >> -  <bb 7> [local count: 955630225]:
> >> +  <bb 7> [local count: 315357973]:
> >>     # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
> >>     i_10 = inc (i_13);
> >>     if (n_7(D) > i_10)
> >>       goto <bb 21>; [89.00%]
> >>     else
> >>       goto <bb 11>; [11.00%]
> >>
> >>     <bb 11> [local count: 105119324]:
> >>     goto <bb 3>; [100.00%]
> >>
> >> -  <bb 21> [local count: 850510901]:
> >> +  <bb 21> [local count: 280668596]:
> >>     if (prephitmp_12 != 0)
> >> -    goto <bb 20>; [100.00%]
> >> +    goto <bb 20>; [33.00%]
> >>     else
> >> -    goto <bb 19>; [INV]
> >> +    goto <bb 19>; [67.00%]
> >>
> >> -  <bb 20> [local count: 850510901]:
> >> +  <bb 20> [local count: 280668596]:
> >>     goto <bb 5>; [100.00%]
> >>
> >> -  <bb 19> [count: 0]:
> >> +  <bb 19> [local count: 70429947]:
> >>     # i_23 = PHI <i_10(21)>
> >>     # prephitmp_25 = PHI <prephitmp_5(21)>
> >>
> >> -  <bb 12> [local count: 955630225]:
> >> +  <bb 12> [local count: 640272252]:
> >>     # i_15 = PHI <i_23(19), i_22(16)>
> >>     # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
> >>     i_22 = inc (i_15);
> >>     if (n_7(D) > i_22)
> >>       goto <bb 16>; [89.00%]
> >>     else
> >>       goto <bb 11>; [11.00%]
> >>
> >> -  <bb 16> [local count: 850510901]:
> >> +  <bb 16> [local count: 569842305]:
> >>     goto <bb 12>; [100.00%]
> >>
> >>   }
> >>
> >> gcc/ChangeLog:
> >>
> >> 	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> >> 	(do_split_loop_on_cond): Likewise.
> >> ---
> >>   gcc/tree-ssa-loop-split.c | 16 ++++++++--------
> >>   1 file changed, 8 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> >> index 3a09bbc39e5..8e5a7ded0f7 100644
> >> --- a/gcc/tree-ssa-loop-split.c
> >> +++ b/gcc/tree-ssa-loop-split.c
> >> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
> >>   	basic_block cond_bb;
> 
>  	if (!initial_true)
> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> +
> +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> +			   ? EDGE_SUCC (bbs[i], 0)
> +			   : EDGE_SUCC (bbs[i], 1);
> 
> >>   
> >>   	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> >> -					   profile_probability::always (),
> >> -					   profile_probability::always (),
> >> -					   profile_probability::always (),
> >> -					   profile_probability::always (),
> >> +					   true_edge->probability,
> >> +					   true_edge->probability.invert (),
> >> +					   true_edge->probability,
> >> +					   true_edge->probability.invert (),
> >>   					   true);
> > 
> > there is no 'true_edge' variable at this point.
> 
> Sorry, missed the above hunk when split the patch. 
> 
> > 
> >>   	gcc_assert (loop2);
> >>   
> >> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
> >>     initialize_original_copy_tables ();
> >>   
> >>     struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> >> -				     profile_probability::always (),
> >> -				     profile_probability::never (),
> >> -				     profile_probability::always (),
> >> -				     profile_probability::always (),
> >> +				     invar_branch->probability.invert (),
> >> +				     invar_branch->probability,
> >> +				     invar_branch->probability.invert (),
> >> +				     invar_branch->probability,
> >>   				     true);
> >>     if (!loop2)
> >>       {
> > 
> > The patch introduction seems to talk about do_split_loop_on_cond only.
> 
> split_loop faces similar issue though it sets the two branches to 100% vs 100%
> and no scaling which seems also incorrect.
> 
> > Since loop versioning inserts a condition with the passed probabilities
> > but in this case a 'boolean_true_node' condition the then and else
> > probabilities passed look correct.  It's just the scaling arguments
> > that look wrong?  This loop_version call should get a comment as to
> > why we are passing probabilities the way we do.
> 
> This optimization is transforming from:
> 
> for (i = 0; i < n; i = inc (i))
>     {
>       if (ga)
>         ga = do_something ();
> }
> 
> to: 
> 
>   for (i = 0; i < x; i = inc (i))
> {
>     if (true)
>          ga = do_something ();
>         if (!ga)
>           break;
> }
>   for (; i < n; i = inc (i))
> {
>     if (false)
>          ga = do_something ();
> }
> 
> 
> `boolean_true_node` is passed in to make the first loop's condition
> statement to be 'true', after returning from loop_version, there is a
> piece of code forcing the condition in second loop to be false,
> and the original condition is moved from *in loop* to *exit edge*
> between loop1 and loop2. 

Yes, one complication is that we use loop_version but do not retain
the CFG it creates.  Something like the vectorizers
slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
but then that code doesn't do any scaling yet.  But then
loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
we could write a variant that simply doesn't mangle the CFG
with a new condition switching between both loops but keeps them
executed after each other ...

As said, this adds to the confusion and some awkwardness.

> @fxue may have inputs about this since he contributed it years ago.
> 
> > 
> > It does seem that scaling the loop by the invar_branch probability
> > is correct.  Since this does similar things to unswitching, I see
> > that unswitching does
> > 
> > prob_true = edge_true->probability;
> > loop_version (loop, unshare_expr (cond),
> >                         NULL, prob_true,
> >                         prob_true.invert (),
> >                         prob_true, prob_true.invert (),
> >                         false);
> > 
> > which maybe suggests that your invar_branch based passing should
> > depend on 'true_invar'?
> > 
> > Also compared to unswitching the first loop is always entered,
> > so I wonder if the scaling is correct with respect to that
> > given unswitching where only ever one loop is entered?
> 
> 
> Scaling is based on the probability calculated on "if (ga)", if it is
> 33% vs 67% before split, then it is reasonable to scale loop1 to 33%
> and loop2 to 67% suppose the branch probability is correct enough?
> 
> unswitch also scaled the two loops based on prob_true, if prob_true
> is 50%, it decreases X's count to HALF unexpectedly since it should be
> executed on both branches?  while loop split still kept execute both first
> loop and second loop, it seems even more accurate than loop unswitching?

I just was saying that both doing exactly the same looks wrong (on
either side).

> tree-ssa-loop-unswitch.c:
> 
>    while (A)
>      {
>        if (inv)
>          B;
> 
>        X;
> 
>        if (!inv)
> 	 C;
>      }
> 
>    where inv is the loop invariant, into
> 
>    if (inv)
>      {
>        while (A)
> 	 {
>            B;
> 	   X;
> 	 }
>      }
>    else
>      {
>        while (A)
> 	 {
> 	   X;
> 	   C;
> 	 }
>      }

Yes, here scaling based on the if (inv) probability looks obviously
100% correct to me.  But I might be wrong.  And thus the
splitting case must be still off (so I conclude).  Hmm, in fact
I think for the loop unswitching case the scaling of the B and
C blocks is off?  Those should be considered always executed.
Note the loop unswitching pass is altering the conditions
condition to static true/false but I don't think it performs
further adjustments.

That said, likely the profile update cannot be done uniformly
for all blocks of a loop?

Richard.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix loop split incorrect count and probability
  2021-08-10 14:47     ` Richard Biener
@ 2021-08-11  3:03       ` Feng Xue OS
  2021-10-26 13:05         ` Jan Hubicka
  2021-08-11  8:32       ` Xionghu Luo
  1 sibling, 1 reply; 20+ messages in thread
From: Feng Xue OS @ 2021-08-11  3:03 UTC (permalink / raw)
  To: Richard Biener, Xionghu Luo
  Cc: gcc-patches, segher, wschmidt, guojiufu, linkw, hubicka

Any transformation involving cfg alteration would face same problem,
it is not that easy to update new cfg with reasonable and seemly-correct
profile count. We can adjust probability for impacted condition bbs, but
lack of a utility like what static profile estimating pass does, and only
propagates count partially.

Thanks,
Feng

________________________________________
From: Richard Biener <rguenther@suse.de>
Sent: Tuesday, August 10, 2021 10:47 PM
To: Xionghu Luo
Cc: gcc-patches@gcc.gnu.org; segher@kernel.crashing.org; Feng Xue OS; wschmidt@linux.ibm.com; guojiufu@linux.ibm.com; linkw@gcc.gnu.org; hubicka@ucw.cz
Subject: Re: [PATCH] Fix loop split incorrect count and probability

On Mon, 9 Aug 2021, Xionghu Luo wrote:

> Thanks,
>
> On 2021/8/6 19:46, Richard Biener wrote:
> > On Tue, 3 Aug 2021, Xionghu Luo wrote:
> >
> >> loop split condition is moved between loop1 and loop2, the split bb's
> >> count and probability should also be duplicated instead of (100% vs INV),
> >> secondly, the original loop1 and loop2 count need be propotional from the
> >> original loop.
> >>
> >>
> >> diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
> >> ...
> >>     int prephitmp_16;
> >>     int prephitmp_25;
> >>
> >>     <bb 2> [local count: 118111600]:
> >>     if (n_7(D) > 0)
> >>       goto <bb 4>; [89.00%]
> >>     else
> >>       goto <bb 3>; [11.00%]
> >>
> >>     <bb 3> [local count: 118111600]:
> >>     return;
> >>
> >>     <bb 4> [local count: 105119324]:
> >>     pretmp_3 = ga;
> >>
> >> -  <bb 5> [local count: 955630225]:
> >> +  <bb 5> [local count: 315357973]:
> >>     # i_13 = PHI <i_10(20), 0(4)>
> >>     # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
> >>     if (prephitmp_12 != 0)
> >>       goto <bb 6>; [33.00%]
> >>     else
> >>       goto <bb 7>; [67.00%]
> >>
> >> -  <bb 6> [local count: 315357972]:
> >> +  <bb 6> [local count: 104068130]:
> >>     _2 = do_something ();
> >>     ga = _2;
> >>
> >> -  <bb 7> [local count: 955630225]:
> >> +  <bb 7> [local count: 315357973]:
> >>     # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
> >>     i_10 = inc (i_13);
> >>     if (n_7(D) > i_10)
> >>       goto <bb 21>; [89.00%]
> >>     else
> >>       goto <bb 11>; [11.00%]
> >>
> >>     <bb 11> [local count: 105119324]:
> >>     goto <bb 3>; [100.00%]
> >>
> >> -  <bb 21> [local count: 850510901]:
> >> +  <bb 21> [local count: 280668596]:
> >>     if (prephitmp_12 != 0)
> >> -    goto <bb 20>; [100.00%]
> >> +    goto <bb 20>; [33.00%]
> >>     else
> >> -    goto <bb 19>; [INV]
> >> +    goto <bb 19>; [67.00%]
> >>
> >> -  <bb 20> [local count: 850510901]:
> >> +  <bb 20> [local count: 280668596]:
> >>     goto <bb 5>; [100.00%]
> >>
> >> -  <bb 19> [count: 0]:
> >> +  <bb 19> [local count: 70429947]:
> >>     # i_23 = PHI <i_10(21)>
> >>     # prephitmp_25 = PHI <prephitmp_5(21)>
> >>
> >> -  <bb 12> [local count: 955630225]:
> >> +  <bb 12> [local count: 640272252]:
> >>     # i_15 = PHI <i_23(19), i_22(16)>
> >>     # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
> >>     i_22 = inc (i_15);
> >>     if (n_7(D) > i_22)
> >>       goto <bb 16>; [89.00%]
> >>     else
> >>       goto <bb 11>; [11.00%]
> >>
> >> -  <bb 16> [local count: 850510901]:
> >> +  <bb 16> [local count: 569842305]:
> >>     goto <bb 12>; [100.00%]
> >>
> >>   }
> >>
> >> gcc/ChangeLog:
> >>
> >>    * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> >>    (do_split_loop_on_cond): Likewise.
> >> ---
> >>   gcc/tree-ssa-loop-split.c | 16 ++++++++--------
> >>   1 file changed, 8 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> >> index 3a09bbc39e5..8e5a7ded0f7 100644
> >> --- a/gcc/tree-ssa-loop-split.c
> >> +++ b/gcc/tree-ssa-loop-split.c
> >> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
> >>    basic_block cond_bb;
>
>       if (!initial_true)
> -       cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> +       cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> +
> +     edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> +                        ? EDGE_SUCC (bbs[i], 0)
> +                        : EDGE_SUCC (bbs[i], 1);
>
> >>
> >>    class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> >> -                                     profile_probability::always (),
> >> -                                     profile_probability::always (),
> >> -                                     profile_probability::always (),
> >> -                                     profile_probability::always (),
> >> +                                     true_edge->probability,
> >> +                                     true_edge->probability.invert (),
> >> +                                     true_edge->probability,
> >> +                                     true_edge->probability.invert (),
> >>                                       true);
> >
> > there is no 'true_edge' variable at this point.
>
> Sorry, missed the above hunk when split the patch.
>
> >
> >>    gcc_assert (loop2);
> >>
> >> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
> >>     initialize_original_copy_tables ();
> >>
> >>     struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> >> -                               profile_probability::always (),
> >> -                               profile_probability::never (),
> >> -                               profile_probability::always (),
> >> -                               profile_probability::always (),
> >> +                               invar_branch->probability.invert (),
> >> +                               invar_branch->probability,
> >> +                               invar_branch->probability.invert (),
> >> +                               invar_branch->probability,
> >>                                 true);
> >>     if (!loop2)
> >>       {
> >
> > The patch introduction seems to talk about do_split_loop_on_cond only.
>
> split_loop faces similar issue though it sets the two branches to 100% vs 100%
> and no scaling which seems also incorrect.
>
> > Since loop versioning inserts a condition with the passed probabilities
> > but in this case a 'boolean_true_node' condition the then and else
> > probabilities passed look correct.  It's just the scaling arguments
> > that look wrong?  This loop_version call should get a comment as to
> > why we are passing probabilities the way we do.
>
> This optimization is transforming from:
>
> for (i = 0; i < n; i = inc (i))
>     {
>       if (ga)
>         ga = do_something ();
> }
>
> to:
>
>   for (i = 0; i < x; i = inc (i))
> {
>     if (true)
>          ga = do_something ();
>         if (!ga)
>           break;
> }
>   for (; i < n; i = inc (i))
> {
>     if (false)
>          ga = do_something ();
> }
>
>
> `boolean_true_node` is passed in to make the first loop's condition
> statement to be 'true', after returning from loop_version, there is a
> piece of code forcing the condition in second loop to be false,
> and the original condition is moved from *in loop* to *exit edge*
> between loop1 and loop2.

Yes, one complication is that we use loop_version but do not retain
the CFG it creates.  Something like the vectorizers
slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
but then that code doesn't do any scaling yet.  But then
loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
we could write a variant that simply doesn't mangle the CFG
with a new condition switching between both loops but keeps them
executed after each other ...

As said, this adds to the confusion and some awkwardness.

> @fxue may have inputs about this since he contributed it years ago.
>
> >
> > It does seem that scaling the loop by the invar_branch probability
> > is correct.  Since this does similar things to unswitching, I see
> > that unswitching does
> >
> > prob_true = edge_true->probability;
> > loop_version (loop, unshare_expr (cond),
> >                         NULL, prob_true,
> >                         prob_true.invert (),
> >                         prob_true, prob_true.invert (),
> >                         false);
> >
> > which maybe suggests that your invar_branch based passing should
> > depend on 'true_invar'?
> >
> > Also compared to unswitching the first loop is always entered,
> > so I wonder if the scaling is correct with respect to that
> > given unswitching where only ever one loop is entered?
>
>
> Scaling is based on the probability calculated on "if (ga)", if it is
> 33% vs 67% before split, then it is reasonable to scale loop1 to 33%
> and loop2 to 67% suppose the branch probability is correct enough?
>
> unswitch also scaled the two loops based on prob_true, if prob_true
> is 50%, it decreases X's count to HALF unexpectedly since it should be
> executed on both branches?  while loop split still kept execute both first
> loop and second loop, it seems even more accurate than loop unswitching?

I just was saying that both doing exactly the same looks wrong (on
either side).

> tree-ssa-loop-unswitch.c:
>
>    while (A)
>      {
>        if (inv)
>          B;
>
>        X;
>
>        if (!inv)
>        C;
>      }
>
>    where inv is the loop invariant, into
>
>    if (inv)
>      {
>        while (A)
>        {
>            B;
>          X;
>        }
>      }
>    else
>      {
>        while (A)
>        {
>          X;
>          C;
>        }
>      }

Yes, here scaling based on the if (inv) probability looks obviously
100% correct to me.  But I might be wrong.  And thus the
splitting case must be still off (so I conclude).  Hmm, in fact
I think for the loop unswitching case the scaling of the B and
C blocks is off?  Those should be considered always executed.
Note the loop unswitching pass is altering the conditions
condition to static true/false but I don't think it performs
further adjustments.

That said, likely the profile update cannot be done uniformly
for all blocks of a loop?

Richard.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix loop split incorrect count and probability
  2021-08-10 14:47     ` Richard Biener
  2021-08-11  3:03       ` Feng Xue OS
@ 2021-08-11  8:32       ` Xionghu Luo
  2021-08-11  9:16         ` Richard Biener
  1 sibling, 1 reply; 20+ messages in thread
From: Xionghu Luo @ 2021-08-11  8:32 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, segher, fxue, wschmidt, guojiufu, linkw, hubicka



On 2021/8/10 22:47, Richard Biener wrote:
> On Mon, 9 Aug 2021, Xionghu Luo wrote:
> 
>> Thanks,
>>
>> On 2021/8/6 19:46, Richard Biener wrote:
>>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
>>>
>>>> loop split condition is moved between loop1 and loop2, the split bb's
>>>> count and probability should also be duplicated instead of (100% vs INV),
>>>> secondly, the original loop1 and loop2 count need be propotional from the
>>>> original loop.
>>>>
>>>>
>>>> diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
>>>> ...
>>>>      int prephitmp_16;
>>>>      int prephitmp_25;
>>>>
>>>>      <bb 2> [local count: 118111600]:
>>>>      if (n_7(D) > 0)
>>>>        goto <bb 4>; [89.00%]
>>>>      else
>>>>        goto <bb 3>; [11.00%]
>>>>
>>>>      <bb 3> [local count: 118111600]:
>>>>      return;
>>>>
>>>>      <bb 4> [local count: 105119324]:
>>>>      pretmp_3 = ga;
>>>>
>>>> -  <bb 5> [local count: 955630225]:
>>>> +  <bb 5> [local count: 315357973]:
>>>>      # i_13 = PHI <i_10(20), 0(4)>
>>>>      # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
>>>>      if (prephitmp_12 != 0)
>>>>        goto <bb 6>; [33.00%]
>>>>      else
>>>>        goto <bb 7>; [67.00%]
>>>>
>>>> -  <bb 6> [local count: 315357972]:
>>>> +  <bb 6> [local count: 104068130]:
>>>>      _2 = do_something ();
>>>>      ga = _2;
>>>>
>>>> -  <bb 7> [local count: 955630225]:
>>>> +  <bb 7> [local count: 315357973]:
>>>>      # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
>>>>      i_10 = inc (i_13);
>>>>      if (n_7(D) > i_10)
>>>>        goto <bb 21>; [89.00%]
>>>>      else
>>>>        goto <bb 11>; [11.00%]
>>>>
>>>>      <bb 11> [local count: 105119324]:
>>>>      goto <bb 3>; [100.00%]
>>>>
>>>> -  <bb 21> [local count: 850510901]:
>>>> +  <bb 21> [local count: 280668596]:
>>>>      if (prephitmp_12 != 0)
>>>> -    goto <bb 20>; [100.00%]
>>>> +    goto <bb 20>; [33.00%]
>>>>      else
>>>> -    goto <bb 19>; [INV]
>>>> +    goto <bb 19>; [67.00%]
>>>>
>>>> -  <bb 20> [local count: 850510901]:
>>>> +  <bb 20> [local count: 280668596]:
>>>>      goto <bb 5>; [100.00%]
>>>>
>>>> -  <bb 19> [count: 0]:
>>>> +  <bb 19> [local count: 70429947]:
>>>>      # i_23 = PHI <i_10(21)>
>>>>      # prephitmp_25 = PHI <prephitmp_5(21)>
>>>>
>>>> -  <bb 12> [local count: 955630225]:
>>>> +  <bb 12> [local count: 640272252]:
>>>>      # i_15 = PHI <i_23(19), i_22(16)>
>>>>      # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
>>>>      i_22 = inc (i_15);
>>>>      if (n_7(D) > i_22)
>>>>        goto <bb 16>; [89.00%]
>>>>      else
>>>>        goto <bb 11>; [11.00%]
>>>>
>>>> -  <bb 16> [local count: 850510901]:
>>>> +  <bb 16> [local count: 569842305]:
>>>>      goto <bb 12>; [100.00%]
>>>>
>>>>    }
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
>>>> 	(do_split_loop_on_cond): Likewise.
>>>> ---
>>>>    gcc/tree-ssa-loop-split.c | 16 ++++++++--------
>>>>    1 file changed, 8 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>>>> index 3a09bbc39e5..8e5a7ded0f7 100644
>>>> --- a/gcc/tree-ssa-loop-split.c
>>>> +++ b/gcc/tree-ssa-loop-split.c
>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
>>>>    	basic_block cond_bb;
>>
>>   	if (!initial_true)
>> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>> +
>> +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
>> +			   ? EDGE_SUCC (bbs[i], 0)
>> +			   : EDGE_SUCC (bbs[i], 1);
>>
>>>>    
>>>>    	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
>>>> -					   profile_probability::always (),
>>>> -					   profile_probability::always (),
>>>> -					   profile_probability::always (),
>>>> -					   profile_probability::always (),
>>>> +					   true_edge->probability,
>>>> +					   true_edge->probability.invert (),
>>>> +					   true_edge->probability,
>>>> +					   true_edge->probability.invert (),
>>>>    					   true);
>>>
>>> there is no 'true_edge' variable at this point.
>>
>> Sorry, missed the above hunk when split the patch.
>>
>>>
>>>>    	gcc_assert (loop2);
>>>>    
>>>> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
>>>>      initialize_original_copy_tables ();
>>>>    
>>>>      struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
>>>> -				     profile_probability::always (),
>>>> -				     profile_probability::never (),
>>>> -				     profile_probability::always (),
>>>> -				     profile_probability::always (),
>>>> +				     invar_branch->probability.invert (),
>>>> +				     invar_branch->probability,
>>>> +				     invar_branch->probability.invert (),
>>>> +				     invar_branch->probability,
>>>>    				     true);
>>>>      if (!loop2)
>>>>        {
>>>
>>> The patch introduction seems to talk about do_split_loop_on_cond only.
>>
>> split_loop faces similar issue though it sets the two branches to 100% vs 100%
>> and no scaling which seems also incorrect.
>>
>>> Since loop versioning inserts a condition with the passed probabilities
>>> but in this case a 'boolean_true_node' condition the then and else
>>> probabilities passed look correct.  It's just the scaling arguments
>>> that look wrong?  This loop_version call should get a comment as to
>>> why we are passing probabilities the way we do.
>>
>> This optimization is transforming from:
>>
>> for (i = 0; i < n; i = inc (i))
>>      {
>>        if (ga)
>>          ga = do_something ();
>> }
>>
>> to:
>>
>>    for (i = 0; i < x; i = inc (i))
>> {
>>      if (true)
>>           ga = do_something ();
>>          if (!ga)
>>            break;
>> }
>>    for (; i < n; i = inc (i))
>> {
>>      if (false)
>>           ga = do_something ();
>> }
>>
>>
>> `boolean_true_node` is passed in to make the first loop's condition
>> statement to be 'true', after returning from loop_version, there is a
>> piece of code forcing the condition in second loop to be false,
>> and the original condition is moved from *in loop* to *exit edge*
>> between loop1 and loop2.
> 
> Yes, one complication is that we use loop_version but do not retain
> the CFG it creates.  Something like the vectorizers
> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
> but then that code doesn't do any scaling yet.  But then
> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
> we could write a variant that simply doesn't mangle the CFG
> with a new condition switching between both loops but keeps them
> executed after each other ...
> 
> As said, this adds to the confusion and some awkwardness.

Then loop_version in loop split requires two types of variant, one
is to insert condition to loop preheader for 'split_loops' usage, 
another is to insert condition to loop exit for 'do_split_loop_on_cond'
usage, it needs one extra function to encapsulate these cfg alterations
from outside to inside.

unswitching only execute one loop as it only moves the invariant condition
to first loop's pre-header.  While 'split_loops' and 'do_split_loop_on_cond'
may execute both loops no matter the condition is moved to the first loop's
preheader or exit.

The condition stmt in loop unswitching is invariant, but it is variant
in loop splitting, that's why loop unswitching execute only one loop
but loop splitting executes both loops.

Seems we need two condition arguments for loop_version, one for connecting
loop1 preheader to loop2 preheader, another one for connecting loop1's exit
to loop2's header?  Then it will be more generic for both unswitching pass
and splitting pass.  Looks a bit complicated and conflicted with loop_version's
comments:

/* Main entry point for Loop Versioning transformation.

   This transformation given a condition and a loop, creates
   -if (condition) { loop_copy1 } else { loop_copy2 }, ... */


And this only works for loop split usage, those many other places
doesn't use loop_version like this?

grep "loop_version (" * -r
qgcc/tree-parloops.c:      loop_version (loop, many_iterations_cond, NULL,
gcc/tree-vect-loop-manip.c:      nloop = loop_version (loop_to_version, cond_expr, &condition_bb,
gcc/tree-loop-distribution.c:  nloop = loop_version (loop, lhs, &cond_bb, prob, prob.invert (),
gcc/tree-if-conv.c:  new_loop = loop_version (loop, cond, &cond_bb,
gcc/tree-ssa-loop-manip.c:  new_loop = loop_version (loop, enter_main_cond, NULL, prob_entry,
gcc/cfgloopmanip.c:loop_version (class loop *loop,
gcc/tree-ssa-loop-split.c:      class loop *loop2 = loop_version (loop1, cond, &cond_bb,
gcc/tree-ssa-loop-split.c:   loop2 is duplicated using loop_version (), which corresponds to the part
gcc/tree-ssa-loop-split.c:  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
gcc/tree-ssa-loop-unswitch.c:  return loop_version (loop, unshare_expr (cond),
gcc/modulo-sched.c:           loop_version (loop, comp_rtx, &condition_bb,
gcc/cfgloopmanip.h:class loop * loop_version (class loop *, void *,
gcc/gimple-loop-versioning.cc:  li.optimized_loop = loop_version (loop, cond, &cond_bb,  

> 
>> @fxue may have inputs about this since he contributed it years ago.
>>
>>>
>>> It does seem that scaling the loop by the invar_branch probability
>>> is correct.  Since this does similar things to unswitching, I see
>>> that unswitching does
>>>
>>> prob_true = edge_true->probability;
>>> loop_version (loop, unshare_expr (cond),
>>>                          NULL, prob_true,
>>>                          prob_true.invert (),
>>>                          prob_true, prob_true.invert (),
>>>                          false);
>>>
>>> which maybe suggests that your invar_branch based passing should
>>> depend on 'true_invar'?
>>>
>>> Also compared to unswitching the first loop is always entered,
>>> so I wonder if the scaling is correct with respect to that
>>> given unswitching where only ever one loop is entered?
>>
>>
>> Scaling is based on the probability calculated on "if (ga)", if it is
>> 33% vs 67% before split, then it is reasonable to scale loop1 to 33%
>> and loop2 to 67% suppose the branch probability is correct enough?
>>
>> unswitch also scaled the two loops based on prob_true, if prob_true
>> is 50%, it decreases X's count to HALF unexpectedly since it should be
>> executed on both branches?  while loop split still kept execute both first
>> loop and second loop, it seems even more accurate than loop unswitching?
> 
> I just was saying that both doing exactly the same looks wrong (on
> either side).
> 
>> tree-ssa-loop-unswitch.c:
>>
>>     while (A)
>>       {
>>         if (inv)
>>           B;
>>
>>         X;
>>
>>         if (!inv)
>> 	 C;
>>       }
>>
>>     where inv is the loop invariant, into
>>
>>     if (inv)
>>       {
>>         while (A)
>> 	 {
>>             B;
>> 	   X;
>> 	 }
>>       }
>>     else
>>       {
>>         while (A)
>> 	 {
>> 	   X;
>> 	   C;
>> 	 }
>>       }
> 
> Yes, here scaling based on the if (inv) probability looks obviously
> 100% correct to me.  But I might be wrong.  And thus the
> splitting case must be still off (so I conclude).  Hmm, in fact
> I think for the loop unswitching case the scaling of the B and
> C blocks is off?

B, C and X are all scaled in tree-ssa-loop-unswitch.c:

  prob_true = edge_true->probability;
  return loop_version (loop, unshare_expr (cond),
		       NULL, prob_true,
		       prob_true.invert (),
		       prob_true, prob_true.invert (),
		       false);

>  Those should be considered always executed.
> Note the loop unswitching pass is altering the conditions
> condition to static true/false but I don't think it performs
> further adjustments.

loop unswitching calls tree_unswitch_single_loop recursively, 
firstly, it calls loop_version to generate nloop, then it calls
itself again for nloop and loop with simplify_using_entry_checks
to cut their false branches respectively.

  /* Invoke itself on modified loops.  */
  tree_unswitch_single_loop (nloop, num + 1);
  tree_unswitch_single_loop (loop, num + 1);

> 
> That said, likely the profile update cannot be done uniformly
> for all blocks of a loop?
> 
> Richard.
> 

-- 
Thanks,
Xionghu

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix loop split incorrect count and probability
  2021-08-11  8:32       ` Xionghu Luo
@ 2021-08-11  9:16         ` Richard Biener
  2021-08-12  3:24           ` Xionghu Luo
  2021-09-22  8:40           ` Xionghu Luo
  0 siblings, 2 replies; 20+ messages in thread
From: Richard Biener @ 2021-08-11  9:16 UTC (permalink / raw)
  To: Xionghu Luo; +Cc: gcc-patches, segher, fxue, wschmidt, guojiufu, linkw, hubicka

On Wed, 11 Aug 2021, Xionghu Luo wrote:

> 
> 
> On 2021/8/10 22:47, Richard Biener wrote:
> > On Mon, 9 Aug 2021, Xionghu Luo wrote:
> > 
> >> Thanks,
> >>
> >> On 2021/8/6 19:46, Richard Biener wrote:
> >>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
> >>>
> >>>> loop split condition is moved between loop1 and loop2, the split bb's
> >>>> count and probability should also be duplicated instead of (100% vs INV),
> >>>> secondly, the original loop1 and loop2 count need be propotional from the
> >>>> original loop.
> >>>>
> >>>>
> >>>> diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
> >>>> ...
> >>>>      int prephitmp_16;
> >>>>      int prephitmp_25;
> >>>>
> >>>>      <bb 2> [local count: 118111600]:
> >>>>      if (n_7(D) > 0)
> >>>>        goto <bb 4>; [89.00%]
> >>>>      else
> >>>>        goto <bb 3>; [11.00%]
> >>>>
> >>>>      <bb 3> [local count: 118111600]:
> >>>>      return;
> >>>>
> >>>>      <bb 4> [local count: 105119324]:
> >>>>      pretmp_3 = ga;
> >>>>
> >>>> -  <bb 5> [local count: 955630225]:
> >>>> +  <bb 5> [local count: 315357973]:
> >>>>      # i_13 = PHI <i_10(20), 0(4)>
> >>>>      # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
> >>>>      if (prephitmp_12 != 0)
> >>>>        goto <bb 6>; [33.00%]
> >>>>      else
> >>>>        goto <bb 7>; [67.00%]
> >>>>
> >>>> -  <bb 6> [local count: 315357972]:
> >>>> +  <bb 6> [local count: 104068130]:
> >>>>      _2 = do_something ();
> >>>>      ga = _2;
> >>>>
> >>>> -  <bb 7> [local count: 955630225]:
> >>>> +  <bb 7> [local count: 315357973]:
> >>>>      # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
> >>>>      i_10 = inc (i_13);
> >>>>      if (n_7(D) > i_10)
> >>>>        goto <bb 21>; [89.00%]
> >>>>      else
> >>>>        goto <bb 11>; [11.00%]
> >>>>
> >>>>      <bb 11> [local count: 105119324]:
> >>>>      goto <bb 3>; [100.00%]
> >>>>
> >>>> -  <bb 21> [local count: 850510901]:
> >>>> +  <bb 21> [local count: 280668596]:
> >>>>      if (prephitmp_12 != 0)
> >>>> -    goto <bb 20>; [100.00%]
> >>>> +    goto <bb 20>; [33.00%]
> >>>>      else
> >>>> -    goto <bb 19>; [INV]
> >>>> +    goto <bb 19>; [67.00%]
> >>>>
> >>>> -  <bb 20> [local count: 850510901]:
> >>>> +  <bb 20> [local count: 280668596]:
> >>>>      goto <bb 5>; [100.00%]
> >>>>
> >>>> -  <bb 19> [count: 0]:
> >>>> +  <bb 19> [local count: 70429947]:
> >>>>      # i_23 = PHI <i_10(21)>
> >>>>      # prephitmp_25 = PHI <prephitmp_5(21)>
> >>>>
> >>>> -  <bb 12> [local count: 955630225]:
> >>>> +  <bb 12> [local count: 640272252]:
> >>>>      # i_15 = PHI <i_23(19), i_22(16)>
> >>>>      # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
> >>>>      i_22 = inc (i_15);
> >>>>      if (n_7(D) > i_22)
> >>>>        goto <bb 16>; [89.00%]
> >>>>      else
> >>>>        goto <bb 11>; [11.00%]
> >>>>
> >>>> -  <bb 16> [local count: 850510901]:
> >>>> +  <bb 16> [local count: 569842305]:
> >>>>      goto <bb 12>; [100.00%]
> >>>>
> >>>>    }
> >>>>
> >>>> gcc/ChangeLog:
> >>>>
> >>>> 	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> >>>> 	(do_split_loop_on_cond): Likewise.
> >>>> ---
> >>>>    gcc/tree-ssa-loop-split.c | 16 ++++++++--------
> >>>>    1 file changed, 8 insertions(+), 8 deletions(-)
> >>>>
> >>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> >>>> index 3a09bbc39e5..8e5a7ded0f7 100644
> >>>> --- a/gcc/tree-ssa-loop-split.c
> >>>> +++ b/gcc/tree-ssa-loop-split.c
> >>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
> >>>>    	basic_block cond_bb;
> >>
> >>   	if (!initial_true)
> >> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> >> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> >> +
> >> +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> >> +			   ? EDGE_SUCC (bbs[i], 0)
> >> +			   : EDGE_SUCC (bbs[i], 1);
> >>
> >>>>    
> >>>>    	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> >>>> -					   profile_probability::always (),
> >>>> -					   profile_probability::always (),
> >>>> -					   profile_probability::always (),
> >>>> -					   profile_probability::always (),
> >>>> +					   true_edge->probability,
> >>>> +					   true_edge->probability.invert (),
> >>>> +					   true_edge->probability,
> >>>> +					   true_edge->probability.invert (),
> >>>>    					   true);
> >>>
> >>> there is no 'true_edge' variable at this point.
> >>
> >> Sorry, missed the above hunk when split the patch.
> >>
> >>>
> >>>>    	gcc_assert (loop2);
> >>>>    
> >>>> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
> >>>>      initialize_original_copy_tables ();
> >>>>    
> >>>>      struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> >>>> -				     profile_probability::always (),
> >>>> -				     profile_probability::never (),
> >>>> -				     profile_probability::always (),
> >>>> -				     profile_probability::always (),
> >>>> +				     invar_branch->probability.invert (),
> >>>> +				     invar_branch->probability,
> >>>> +				     invar_branch->probability.invert (),
> >>>> +				     invar_branch->probability,
> >>>>    				     true);
> >>>>      if (!loop2)
> >>>>        {
> >>>
> >>> The patch introduction seems to talk about do_split_loop_on_cond only.
> >>
> >> split_loop faces similar issue though it sets the two branches to 100% vs 100%
> >> and no scaling which seems also incorrect.
> >>
> >>> Since loop versioning inserts a condition with the passed probabilities
> >>> but in this case a 'boolean_true_node' condition the then and else
> >>> probabilities passed look correct.  It's just the scaling arguments
> >>> that look wrong?  This loop_version call should get a comment as to
> >>> why we are passing probabilities the way we do.
> >>
> >> This optimization is transforming from:
> >>
> >> for (i = 0; i < n; i = inc (i))
> >>      {
> >>        if (ga)
> >>          ga = do_something ();
> >> }
> >>
> >> to:
> >>
> >>    for (i = 0; i < x; i = inc (i))
> >> {
> >>      if (true)
> >>           ga = do_something ();
> >>          if (!ga)
> >>            break;
> >> }
> >>    for (; i < n; i = inc (i))
> >> {
> >>      if (false)
> >>           ga = do_something ();
> >> }
> >>
> >>
> >> `boolean_true_node` is passed in to make the first loop's condition
> >> statement to be 'true', after returning from loop_version, there is a
> >> piece of code forcing the condition in second loop to be false,
> >> and the original condition is moved from *in loop* to *exit edge*
> >> between loop1 and loop2.
> > 
> > Yes, one complication is that we use loop_version but do not retain
> > the CFG it creates.  Something like the vectorizers
> > slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
> > but then that code doesn't do any scaling yet.  But then
> > loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
> > we could write a variant that simply doesn't mangle the CFG
> > with a new condition switching between both loops but keeps them
> > executed after each other ...
> > 
> > As said, this adds to the confusion and some awkwardness.
> 
> Then loop_version in loop split requires two types of variant, one
> is to insert condition to loop preheader for 'split_loops' usage, 
> another is to insert condition to loop exit for 'do_split_loop_on_cond'
> usage, it needs one extra function to encapsulate these cfg alterations
> from outside to inside.
> 
> unswitching only execute one loop as it only moves the invariant condition
> to first loop's pre-header.  While 'split_loops' and 'do_split_loop_on_cond'
> may execute both loops no matter the condition is moved to the first loop's
> preheader or exit.
> 
> The condition stmt in loop unswitching is invariant, but it is variant
> in loop splitting, that's why loop unswitching execute only one loop
> but loop splitting executes both loops.
> 
> Seems we need two condition arguments for loop_version, one for connecting
> loop1 preheader to loop2 preheader, another one for connecting loop1's exit
> to loop2's header?  Then it will be more generic for both unswitching pass
> and splitting pass.  Looks a bit complicated and conflicted with loop_version's
> comments:
> 
> /* Main entry point for Loop Versioning transformation.
> 
>    This transformation given a condition and a loop, creates
>    -if (condition) { loop_copy1 } else { loop_copy2 }, ... */
> 
> 
> And this only works for loop split usage, those many other places
> doesn't use loop_version like this?

Yes, as said if you don't want the above CFG then you probably
shouldn't use loop_version but instead use its building blocks
(and some refactoring of loop_version can make that easier).

I think splitting wants

  loop_copy1
  if (condition)
    loop_copy2

IMHO it would be good to split 'loopify' into the actual loopification
and the scaling.  Basically make the part of loop_version that
copies the loop on the header edge and creates a loop structure for
it separate.

loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg
(copy_loop_before).

> grep "loop_version (" * -r
> qgcc/tree-parloops.c:      loop_version (loop, many_iterations_cond, NULL,
> gcc/tree-vect-loop-manip.c:      nloop = loop_version (loop_to_version, cond_expr, &condition_bb,
> gcc/tree-loop-distribution.c:  nloop = loop_version (loop, lhs, &cond_bb, prob, prob.invert (),
> gcc/tree-if-conv.c:  new_loop = loop_version (loop, cond, &cond_bb,
> gcc/tree-ssa-loop-manip.c:  new_loop = loop_version (loop, enter_main_cond, NULL, prob_entry,
> gcc/cfgloopmanip.c:loop_version (class loop *loop,
> gcc/tree-ssa-loop-split.c:      class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> gcc/tree-ssa-loop-split.c:   loop2 is duplicated using loop_version (), which corresponds to the part
> gcc/tree-ssa-loop-split.c:  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> gcc/tree-ssa-loop-unswitch.c:  return loop_version (loop, unshare_expr (cond),
> gcc/modulo-sched.c:           loop_version (loop, comp_rtx, &condition_bb,
> gcc/cfgloopmanip.h:class loop * loop_version (class loop *, void *,
> gcc/gimple-loop-versioning.cc:  li.optimized_loop = loop_version (loop, cond, &cond_bb,  
> 
> > 
> >> @fxue may have inputs about this since he contributed it years ago.
> >>
> >>>
> >>> It does seem that scaling the loop by the invar_branch probability
> >>> is correct.  Since this does similar things to unswitching, I see
> >>> that unswitching does
> >>>
> >>> prob_true = edge_true->probability;
> >>> loop_version (loop, unshare_expr (cond),
> >>>                          NULL, prob_true,
> >>>                          prob_true.invert (),
> >>>                          prob_true, prob_true.invert (),
> >>>                          false);
> >>>
> >>> which maybe suggests that your invar_branch based passing should
> >>> depend on 'true_invar'?
> >>>
> >>> Also compared to unswitching the first loop is always entered,
> >>> so I wonder if the scaling is correct with respect to that
> >>> given unswitching where only ever one loop is entered?
> >>
> >>
> >> Scaling is based on the probability calculated on "if (ga)", if it is
> >> 33% vs 67% before split, then it is reasonable to scale loop1 to 33%
> >> and loop2 to 67% suppose the branch probability is correct enough?
> >>
> >> unswitch also scaled the two loops based on prob_true, if prob_true
> >> is 50%, it decreases X's count to HALF unexpectedly since it should be
> >> executed on both branches?  while loop split still kept execute both first
> >> loop and second loop, it seems even more accurate than loop unswitching?
> > 
> > I just was saying that both doing exactly the same looks wrong (on
> > either side).
> > 
> >> tree-ssa-loop-unswitch.c:
> >>
> >>     while (A)
> >>       {
> >>         if (inv)
> >>           B;
> >>
> >>         X;
> >>
> >>         if (!inv)
> >> 	 C;
> >>       }
> >>
> >>     where inv is the loop invariant, into
> >>
> >>     if (inv)
> >>       {
> >>         while (A)
> >> 	 {
> >>             B;
> >> 	   X;
> >> 	 }
> >>       }
> >>     else
> >>       {
> >>         while (A)
> >> 	 {
> >> 	   X;
> >> 	   C;
> >> 	 }
> >>       }
> > 
> > Yes, here scaling based on the if (inv) probability looks obviously
> > 100% correct to me.  But I might be wrong.  And thus the
> > splitting case must be still off (so I conclude).  Hmm, in fact
> > I think for the loop unswitching case the scaling of the B and
> > C blocks is off?
> 
> B, C and X are all scaled in tree-ssa-loop-unswitch.c:
> 
>   prob_true = edge_true->probability;
>   return loop_version (loop, unshare_expr (cond),
> 		       NULL, prob_true,
> 		       prob_true.invert (),
> 		       prob_true, prob_true.invert (),
> 		       false);
> 
> >  Those should be considered always executed.
> > Note the loop unswitching pass is altering the conditions
> > condition to static true/false but I don't think it performs
> > further adjustments.
> 
> loop unswitching calls tree_unswitch_single_loop recursively, 
> firstly, it calls loop_version to generate nloop, then it calls
> itself again for nloop and loop with simplify_using_entry_checks
> to cut their false branches respectively.
> 
>   /* Invoke itself on modified loops.  */
>   tree_unswitch_single_loop (nloop, num + 1);
>   tree_unswitch_single_loop (loop, num + 1);
> 
> > 
> > That said, likely the profile update cannot be done uniformly
> > for all blocks of a loop?
> > 
> > Richard.
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix loop split incorrect count and probability
  2021-08-11  9:16         ` Richard Biener
@ 2021-08-12  3:24           ` Xionghu Luo
  2021-09-22  8:40           ` Xionghu Luo
  1 sibling, 0 replies; 20+ messages in thread
From: Xionghu Luo @ 2021-08-12  3:24 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, segher, fxue, wschmidt, guojiufu, linkw, hubicka



On 2021/8/11 17:16, Richard Biener wrote:
> On Wed, 11 Aug 2021, Xionghu Luo wrote:
> 
>>
>>
>> On 2021/8/10 22:47, Richard Biener wrote:
>>> On Mon, 9 Aug 2021, Xionghu Luo wrote:
>>>
>>>> Thanks,
>>>>
>>>> On 2021/8/6 19:46, Richard Biener wrote:
>>>>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
>>>>>
>>>>>> loop split condition is moved between loop1 and loop2, the split bb's
>>>>>> count and probability should also be duplicated instead of (100% vs INV),
>>>>>> secondly, the original loop1 and loop2 count need be propotional from the
>>>>>> original loop.
>>>>>>
>>>>>>
>>>>>> diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
>>>>>> ...
>>>>>>       int prephitmp_16;
>>>>>>       int prephitmp_25;
>>>>>>
>>>>>>       <bb 2> [local count: 118111600]:
>>>>>>       if (n_7(D) > 0)
>>>>>>         goto <bb 4>; [89.00%]
>>>>>>       else
>>>>>>         goto <bb 3>; [11.00%]
>>>>>>
>>>>>>       <bb 3> [local count: 118111600]:
>>>>>>       return;
>>>>>>
>>>>>>       <bb 4> [local count: 105119324]:
>>>>>>       pretmp_3 = ga;
>>>>>>
>>>>>> -  <bb 5> [local count: 955630225]:
>>>>>> +  <bb 5> [local count: 315357973]:
>>>>>>       # i_13 = PHI <i_10(20), 0(4)>
>>>>>>       # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
>>>>>>       if (prephitmp_12 != 0)
>>>>>>         goto <bb 6>; [33.00%]
>>>>>>       else
>>>>>>         goto <bb 7>; [67.00%]
>>>>>>
>>>>>> -  <bb 6> [local count: 315357972]:
>>>>>> +  <bb 6> [local count: 104068130]:
>>>>>>       _2 = do_something ();
>>>>>>       ga = _2;
>>>>>>
>>>>>> -  <bb 7> [local count: 955630225]:
>>>>>> +  <bb 7> [local count: 315357973]:
>>>>>>       # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
>>>>>>       i_10 = inc (i_13);
>>>>>>       if (n_7(D) > i_10)
>>>>>>         goto <bb 21>; [89.00%]
>>>>>>       else
>>>>>>         goto <bb 11>; [11.00%]
>>>>>>
>>>>>>       <bb 11> [local count: 105119324]:
>>>>>>       goto <bb 3>; [100.00%]
>>>>>>
>>>>>> -  <bb 21> [local count: 850510901]:
>>>>>> +  <bb 21> [local count: 280668596]:
>>>>>>       if (prephitmp_12 != 0)
>>>>>> -    goto <bb 20>; [100.00%]
>>>>>> +    goto <bb 20>; [33.00%]
>>>>>>       else
>>>>>> -    goto <bb 19>; [INV]
>>>>>> +    goto <bb 19>; [67.00%]
>>>>>>
>>>>>> -  <bb 20> [local count: 850510901]:
>>>>>> +  <bb 20> [local count: 280668596]:
>>>>>>       goto <bb 5>; [100.00%]
>>>>>>
>>>>>> -  <bb 19> [count: 0]:
>>>>>> +  <bb 19> [local count: 70429947]:
>>>>>>       # i_23 = PHI <i_10(21)>
>>>>>>       # prephitmp_25 = PHI <prephitmp_5(21)>
>>>>>>
>>>>>> -  <bb 12> [local count: 955630225]:
>>>>>> +  <bb 12> [local count: 640272252]:
>>>>>>       # i_15 = PHI <i_23(19), i_22(16)>
>>>>>>       # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
>>>>>>       i_22 = inc (i_15);
>>>>>>       if (n_7(D) > i_22)
>>>>>>         goto <bb 16>; [89.00%]
>>>>>>       else
>>>>>>         goto <bb 11>; [11.00%]
>>>>>>
>>>>>> -  <bb 16> [local count: 850510901]:
>>>>>> +  <bb 16> [local count: 569842305]:
>>>>>>       goto <bb 12>; [100.00%]
>>>>>>
>>>>>>     }
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>> 	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
>>>>>> 	(do_split_loop_on_cond): Likewise.
>>>>>> ---
>>>>>>     gcc/tree-ssa-loop-split.c | 16 ++++++++--------
>>>>>>     1 file changed, 8 insertions(+), 8 deletions(-)
>>>>>>
>>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>>>>>> index 3a09bbc39e5..8e5a7ded0f7 100644
>>>>>> --- a/gcc/tree-ssa-loop-split.c
>>>>>> +++ b/gcc/tree-ssa-loop-split.c
>>>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
>>>>>>     	basic_block cond_bb;
>>>>
>>>>    	if (!initial_true)
>>>> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>>> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>>> +
>>>> +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
>>>> +			   ? EDGE_SUCC (bbs[i], 0)
>>>> +			   : EDGE_SUCC (bbs[i], 1);
>>>>
>>>>>>     
>>>>>>     	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
>>>>>> -					   profile_probability::always (),
>>>>>> -					   profile_probability::always (),
>>>>>> -					   profile_probability::always (),
>>>>>> -					   profile_probability::always (),
>>>>>> +					   true_edge->probability,
>>>>>> +					   true_edge->probability.invert (),
>>>>>> +					   true_edge->probability,
>>>>>> +					   true_edge->probability.invert (),
>>>>>>     					   true);
>>>>>
>>>>> there is no 'true_edge' variable at this point.
>>>>
>>>> Sorry, missed the above hunk when split the patch.
>>>>
>>>>>
>>>>>>     	gcc_assert (loop2);
>>>>>>     
>>>>>> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
>>>>>>       initialize_original_copy_tables ();
>>>>>>     
>>>>>>       struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
>>>>>> -				     profile_probability::always (),
>>>>>> -				     profile_probability::never (),
>>>>>> -				     profile_probability::always (),
>>>>>> -				     profile_probability::always (),
>>>>>> +				     invar_branch->probability.invert (),
>>>>>> +				     invar_branch->probability,
>>>>>> +				     invar_branch->probability.invert (),
>>>>>> +				     invar_branch->probability,
>>>>>>     				     true);
>>>>>>       if (!loop2)
>>>>>>         {
>>>>>
>>>>> The patch introduction seems to talk about do_split_loop_on_cond only.
>>>>
>>>> split_loop faces similar issue though it sets the two branches to 100% vs 100%
>>>> and no scaling which seems also incorrect.
>>>>
>>>>> Since loop versioning inserts a condition with the passed probabilities
>>>>> but in this case a 'boolean_true_node' condition the then and else
>>>>> probabilities passed look correct.  It's just the scaling arguments
>>>>> that look wrong?  This loop_version call should get a comment as to
>>>>> why we are passing probabilities the way we do.
>>>>
>>>> This optimization is transforming from:
>>>>
>>>> for (i = 0; i < n; i = inc (i))
>>>>       {
>>>>         if (ga)
>>>>           ga = do_something ();
>>>> }
>>>>
>>>> to:
>>>>
>>>>     for (i = 0; i < x; i = inc (i))
>>>> {
>>>>       if (true)
>>>>            ga = do_something ();
>>>>           if (!ga)
>>>>             break;
>>>> }
>>>>     for (; i < n; i = inc (i))
>>>> {
>>>>       if (false)
>>>>            ga = do_something ();
>>>> }
>>>>
>>>>
>>>> `boolean_true_node` is passed in to make the first loop's condition
>>>> statement to be 'true', after returning from loop_version, there is a
>>>> piece of code forcing the condition in second loop to be false,
>>>> and the original condition is moved from *in loop* to *exit edge*
>>>> between loop1 and loop2.
>>>
>>> Yes, one complication is that we use loop_version but do not retain
>>> the CFG it creates.  Something like the vectorizers
>>> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
>>> but then that code doesn't do any scaling yet.  But then
>>> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
>>> we could write a variant that simply doesn't mangle the CFG
>>> with a new condition switching between both loops but keeps them
>>> executed after each other ...
>>>
>>> As said, this adds to the confusion and some awkwardness.
>>
>> Then loop_version in loop split requires two types of variant, one
>> is to insert condition to loop preheader for 'split_loops' usage,
>> another is to insert condition to loop exit for 'do_split_loop_on_cond'
>> usage, it needs one extra function to encapsulate these cfg alterations
>> from outside to inside.
>>
>> unswitching only execute one loop as it only moves the invariant condition
>> to first loop's pre-header.  While 'split_loops' and 'do_split_loop_on_cond'
>> may execute both loops no matter the condition is moved to the first loop's
>> preheader or exit.
>>
>> The condition stmt in loop unswitching is invariant, but it is variant
>> in loop splitting, that's why loop unswitching execute only one loop
>> but loop splitting executes both loops.
>>
>> Seems we need two condition arguments for loop_version, one for connecting
>> loop1 preheader to loop2 preheader, another one for connecting loop1's exit
>> to loop2's header?  Then it will be more generic for both unswitching pass
>> and splitting pass.  Looks a bit complicated and conflicted with loop_version's
>> comments:
>>
>> /* Main entry point for Loop Versioning transformation.
>>
>>     This transformation given a condition and a loop, creates
>>     -if (condition) { loop_copy1 } else { loop_copy2 }, ... */
>>
>>
>> And this only works for loop split usage, those many other places
>> doesn't use loop_version like this?
> 
> Yes, as said if you don't want the above CFG then you probably
> shouldn't use loop_version but instead use its building blocks
> (and some refactoring of loop_version can make that easier).
> 
> I think splitting wants
> 
>    loop_copy1
>    if (condition)
>      loop_copy2

Maybe

   if (condition1)
   loop_copy1
   if (!condition1 || condition2)
     loop_copy2

as loop_copy1 is not always executed in loop split.

> 
> IMHO it would be good to split 'loopify' into the actual loopification
> and the scaling.  Basically make the part of loop_version that
> copies the loop on the header edge and creates a loop structure for
> it separate.
> 
> loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg
> (copy_loop_before).
> 
>> grep "loop_version (" * -r
>> qgcc/tree-parloops.c:      loop_version (loop, many_iterations_cond, NULL,
>> gcc/tree-vect-loop-manip.c:      nloop = loop_version (loop_to_version, cond_expr, &condition_bb,
>> gcc/tree-loop-distribution.c:  nloop = loop_version (loop, lhs, &cond_bb, prob, prob.invert (),
>> gcc/tree-if-conv.c:  new_loop = loop_version (loop, cond, &cond_bb,
>> gcc/tree-ssa-loop-manip.c:  new_loop = loop_version (loop, enter_main_cond, NULL, prob_entry,
>> gcc/cfgloopmanip.c:loop_version (class loop *loop,
>> gcc/tree-ssa-loop-split.c:      class loop *loop2 = loop_version (loop1, cond, &cond_bb,
>> gcc/tree-ssa-loop-split.c:   loop2 is duplicated using loop_version (), which corresponds to the part
>> gcc/tree-ssa-loop-split.c:  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
>> gcc/tree-ssa-loop-unswitch.c:  return loop_version (loop, unshare_expr (cond),
>> gcc/modulo-sched.c:           loop_version (loop, comp_rtx, &condition_bb,
>> gcc/cfgloopmanip.h:class loop * loop_version (class loop *, void *,
>> gcc/gimple-loop-versioning.cc:  li.optimized_loop = loop_version (loop, cond, &cond_bb,
>>
>>>
>>>> @fxue may have inputs about this since he contributed it years ago.
>>>>
>>>>>
>>>>> It does seem that scaling the loop by the invar_branch probability
>>>>> is correct.  Since this does similar things to unswitching, I see
>>>>> that unswitching does
>>>>>
>>>>> prob_true = edge_true->probability;
>>>>> loop_version (loop, unshare_expr (cond),
>>>>>                           NULL, prob_true,
>>>>>                           prob_true.invert (),
>>>>>                           prob_true, prob_true.invert (),
>>>>>                           false);
>>>>>
>>>>> which maybe suggests that your invar_branch based passing should
>>>>> depend on 'true_invar'?
>>>>>
>>>>> Also compared to unswitching the first loop is always entered,
>>>>> so I wonder if the scaling is correct with respect to that
>>>>> given unswitching where only ever one loop is entered?
>>>>
>>>>
>>>> Scaling is based on the probability calculated on "if (ga)", if it is
>>>> 33% vs 67% before split, then it is reasonable to scale loop1 to 33%
>>>> and loop2 to 67% suppose the branch probability is correct enough?
>>>>
>>>> unswitch also scaled the two loops based on prob_true, if prob_true
>>>> is 50%, it decreases X's count to HALF unexpectedly since it should be
>>>> executed on both branches?  while loop split still kept execute both first
>>>> loop and second loop, it seems even more accurate than loop unswitching?
>>>
>>> I just was saying that both doing exactly the same looks wrong (on
>>> either side).
>>>
>>>> tree-ssa-loop-unswitch.c:
>>>>
>>>>      while (A)
>>>>        {
>>>>          if (inv)
>>>>            B;
>>>>
>>>>          X;
>>>>
>>>>          if (!inv)
>>>> 	 C;
>>>>        }
>>>>
>>>>      where inv is the loop invariant, into
>>>>
>>>>      if (inv)
>>>>        {
>>>>          while (A)
>>>> 	 {
>>>>              B;
>>>> 	   X;
>>>> 	 }
>>>>        }
>>>>      else
>>>>        {
>>>>          while (A)
>>>> 	 {
>>>> 	   X;
>>>> 	   C;
>>>> 	 }
>>>>        }
>>>
>>> Yes, here scaling based on the if (inv) probability looks obviously
>>> 100% correct to me.  But I might be wrong.  And thus the
>>> splitting case must be still off (so I conclude).  Hmm, in fact
>>> I think for the loop unswitching case the scaling of the B and
>>> C blocks is off?
>>
>> B, C and X are all scaled in tree-ssa-loop-unswitch.c:
>>
>>    prob_true = edge_true->probability;
>>    return loop_version (loop, unshare_expr (cond),
>> 		       NULL, prob_true,
>> 		       prob_true.invert (),
>> 		       prob_true, prob_true.invert (),
>> 		       false);
>>
>>>   Those should be considered always executed.
>>> Note the loop unswitching pass is altering the conditions
>>> condition to static true/false but I don't think it performs
>>> further adjustments.
>>
>> loop unswitching calls tree_unswitch_single_loop recursively,
>> firstly, it calls loop_version to generate nloop, then it calls
>> itself again for nloop and loop with simplify_using_entry_checks
>> to cut their false branches respectively.
>>
>>    /* Invoke itself on modified loops.  */
>>    tree_unswitch_single_loop (nloop, num + 1);
>>    tree_unswitch_single_loop (loop, num + 1);
>>
>>>
>>> That said, likely the profile update cannot be done uniformly
>>> for all blocks of a loop?
>>>
>>> Richard.
>>>
>>
>>
> 

-- 
Thanks,
Xionghu

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix loop split incorrect count and probability
  2021-08-11  9:16         ` Richard Biener
  2021-08-12  3:24           ` Xionghu Luo
@ 2021-09-22  8:40           ` Xionghu Luo
  2021-09-23 12:17             ` Richard Biener
  1 sibling, 1 reply; 20+ messages in thread
From: Xionghu Luo @ 2021-09-22  8:40 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, segher, fxue, wschmidt, guojiufu, linkw, hubicka



On 2021/8/11 17:16, Richard Biener wrote:
> On Wed, 11 Aug 2021, Xionghu Luo wrote:
> 
>>
>>
>> On 2021/8/10 22:47, Richard Biener wrote:
>>> On Mon, 9 Aug 2021, Xionghu Luo wrote:
>>>
>>>> Thanks,
>>>>
>>>> On 2021/8/6 19:46, Richard Biener wrote:
>>>>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
>>>>>
>>>>>> loop split condition is moved between loop1 and loop2, the split bb's
>>>>>> count and probability should also be duplicated instead of (100% vs INV),
>>>>>> secondly, the original loop1 and loop2 count need be propotional from the
>>>>>> original loop.
>>>>>>
>>>>>>
>>>>>> diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
>>>>>> ...
>>>>>>       int prephitmp_16;
>>>>>>       int prephitmp_25;
>>>>>>
>>>>>>       <bb 2> [local count: 118111600]:
>>>>>>       if (n_7(D) > 0)
>>>>>>         goto <bb 4>; [89.00%]
>>>>>>       else
>>>>>>         goto <bb 3>; [11.00%]
>>>>>>
>>>>>>       <bb 3> [local count: 118111600]:
>>>>>>       return;
>>>>>>
>>>>>>       <bb 4> [local count: 105119324]:
>>>>>>       pretmp_3 = ga;
>>>>>>
>>>>>> -  <bb 5> [local count: 955630225]:
>>>>>> +  <bb 5> [local count: 315357973]:
>>>>>>       # i_13 = PHI <i_10(20), 0(4)>
>>>>>>       # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
>>>>>>       if (prephitmp_12 != 0)
>>>>>>         goto <bb 6>; [33.00%]
>>>>>>       else
>>>>>>         goto <bb 7>; [67.00%]
>>>>>>
>>>>>> -  <bb 6> [local count: 315357972]:
>>>>>> +  <bb 6> [local count: 104068130]:
>>>>>>       _2 = do_something ();
>>>>>>       ga = _2;
>>>>>>
>>>>>> -  <bb 7> [local count: 955630225]:
>>>>>> +  <bb 7> [local count: 315357973]:
>>>>>>       # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
>>>>>>       i_10 = inc (i_13);
>>>>>>       if (n_7(D) > i_10)
>>>>>>         goto <bb 21>; [89.00%]
>>>>>>       else
>>>>>>         goto <bb 11>; [11.00%]
>>>>>>
>>>>>>       <bb 11> [local count: 105119324]:
>>>>>>       goto <bb 3>; [100.00%]
>>>>>>
>>>>>> -  <bb 21> [local count: 850510901]:
>>>>>> +  <bb 21> [local count: 280668596]:
>>>>>>       if (prephitmp_12 != 0)
>>>>>> -    goto <bb 20>; [100.00%]
>>>>>> +    goto <bb 20>; [33.00%]
>>>>>>       else
>>>>>> -    goto <bb 19>; [INV]
>>>>>> +    goto <bb 19>; [67.00%]
>>>>>>
>>>>>> -  <bb 20> [local count: 850510901]:
>>>>>> +  <bb 20> [local count: 280668596]:
>>>>>>       goto <bb 5>; [100.00%]
>>>>>>
>>>>>> -  <bb 19> [count: 0]:
>>>>>> +  <bb 19> [local count: 70429947]:
>>>>>>       # i_23 = PHI <i_10(21)>
>>>>>>       # prephitmp_25 = PHI <prephitmp_5(21)>
>>>>>>
>>>>>> -  <bb 12> [local count: 955630225]:
>>>>>> +  <bb 12> [local count: 640272252]:
>>>>>>       # i_15 = PHI <i_23(19), i_22(16)>
>>>>>>       # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
>>>>>>       i_22 = inc (i_15);
>>>>>>       if (n_7(D) > i_22)
>>>>>>         goto <bb 16>; [89.00%]
>>>>>>       else
>>>>>>         goto <bb 11>; [11.00%]
>>>>>>
>>>>>> -  <bb 16> [local count: 850510901]:
>>>>>> +  <bb 16> [local count: 569842305]:
>>>>>>       goto <bb 12>; [100.00%]
>>>>>>
>>>>>>     }
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>> 	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
>>>>>> 	(do_split_loop_on_cond): Likewise.
>>>>>> ---
>>>>>>     gcc/tree-ssa-loop-split.c | 16 ++++++++--------
>>>>>>     1 file changed, 8 insertions(+), 8 deletions(-)
>>>>>>
>>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>>>>>> index 3a09bbc39e5..8e5a7ded0f7 100644
>>>>>> --- a/gcc/tree-ssa-loop-split.c
>>>>>> +++ b/gcc/tree-ssa-loop-split.c
>>>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
>>>>>>     	basic_block cond_bb;
>>>>
>>>>    	if (!initial_true)
>>>> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>>> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>>> +
>>>> +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
>>>> +			   ? EDGE_SUCC (bbs[i], 0)
>>>> +			   : EDGE_SUCC (bbs[i], 1);
>>>>
>>>>>>     
>>>>>>     	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
>>>>>> -					   profile_probability::always (),
>>>>>> -					   profile_probability::always (),
>>>>>> -					   profile_probability::always (),
>>>>>> -					   profile_probability::always (),
>>>>>> +					   true_edge->probability,
>>>>>> +					   true_edge->probability.invert (),
>>>>>> +					   true_edge->probability,
>>>>>> +					   true_edge->probability.invert (),
>>>>>>     					   true);
>>>>>
>>>>> there is no 'true_edge' variable at this point.
>>>>
>>>> Sorry, missed the above hunk when split the patch.
>>>>
>>>>>
>>>>>>     	gcc_assert (loop2);
>>>>>>     
>>>>>> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
>>>>>>       initialize_original_copy_tables ();
>>>>>>     
>>>>>>       struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
>>>>>> -				     profile_probability::always (),
>>>>>> -				     profile_probability::never (),
>>>>>> -				     profile_probability::always (),
>>>>>> -				     profile_probability::always (),
>>>>>> +				     invar_branch->probability.invert (),
>>>>>> +				     invar_branch->probability,
>>>>>> +				     invar_branch->probability.invert (),
>>>>>> +				     invar_branch->probability,
>>>>>>     				     true);
>>>>>>       if (!loop2)
>>>>>>         {
>>>>>
>>>>> The patch introduction seems to talk about do_split_loop_on_cond only.
>>>>
>>>> split_loop faces similar issue though it sets the two branches to 100% vs 100%
>>>> and no scaling which seems also incorrect.
>>>>
>>>>> Since loop versioning inserts a condition with the passed probabilities
>>>>> but in this case a 'boolean_true_node' condition the then and else
>>>>> probabilities passed look correct.  It's just the scaling arguments
>>>>> that look wrong?  This loop_version call should get a comment as to
>>>>> why we are passing probabilities the way we do.
>>>>
>>>> This optimization is transforming from:
>>>>
>>>> for (i = 0; i < n; i = inc (i))
>>>>       {
>>>>         if (ga)
>>>>           ga = do_something ();
>>>> }
>>>>
>>>> to:
>>>>
>>>>     for (i = 0; i < x; i = inc (i))
>>>> {
>>>>       if (true)
>>>>            ga = do_something ();
>>>>           if (!ga)
>>>>             break;
>>>> }
>>>>     for (; i < n; i = inc (i))
>>>> {
>>>>       if (false)
>>>>            ga = do_something ();
>>>> }
>>>>
>>>>
>>>> `boolean_true_node` is passed in to make the first loop's condition
>>>> statement to be 'true', after returning from loop_version, there is a
>>>> piece of code forcing the condition in second loop to be false,
>>>> and the original condition is moved from *in loop* to *exit edge*
>>>> between loop1 and loop2.
>>>
>>> Yes, one complication is that we use loop_version but do not retain
>>> the CFG it creates.  Something like the vectorizers
>>> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
>>> but then that code doesn't do any scaling yet.  But then
>>> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
>>> we could write a variant that simply doesn't mangle the CFG
>>> with a new condition switching between both loops but keeps them
>>> executed after each other ...
>>>
>>> As said, this adds to the confusion and some awkwardness.
>>
>> Then loop_version in loop split requires two types of variant, one
>> is to insert condition to loop preheader for 'split_loops' usage,
>> another is to insert condition to loop exit for 'do_split_loop_on_cond'
>> usage, it needs one extra function to encapsulate these cfg alterations
>> from outside to inside.
>>
>> unswitching only execute one loop as it only moves the invariant condition
>> to first loop's pre-header.  While 'split_loops' and 'do_split_loop_on_cond'
>> may execute both loops no matter the condition is moved to the first loop's
>> preheader or exit.
>>
>> The condition stmt in loop unswitching is invariant, but it is variant
>> in loop splitting, that's why loop unswitching execute only one loop
>> but loop splitting executes both loops.
>>
>> Seems we need two condition arguments for loop_version, one for connecting
>> loop1 preheader to loop2 preheader, another one for connecting loop1's exit
>> to loop2's header?  Then it will be more generic for both unswitching pass
>> and splitting pass.  Looks a bit complicated and conflicted with loop_version's
>> comments:
>>
>> /* Main entry point for Loop Versioning transformation.
>>
>>     This transformation given a condition and a loop, creates
>>     -if (condition) { loop_copy1 } else { loop_copy2 }, ... */
>>
>>
>> And this only works for loop split usage, those many other places
>> doesn't use loop_version like this?
> 
> Yes, as said if you don't want the above CFG then you probably
> shouldn't use loop_version but instead use its building blocks
> (and some refactoring of loop_version can make that easier).
> 
> I think splitting wants
> 
>    loop_copy1
>    if (condition)
>      loop_copy2
> 
> IMHO it would be good to split 'loopify' into the actual loopification
> and the scaling.  Basically make the part of loop_version that
> copies the loop on the header edge and creates a loop structure for
> it separate.
> 
> loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg
> (copy_loop_before).
> 

Unfortunately slpeel_tree_duplicate_loop_to_edge_cfg only supports
copying loops with single exit, it would cause many ICEs in it even
building GCC stage1 (line 1065, line 1184 due to exit or new_exit
is NULL returning from single_exit (loop).).  Seems loop_version is
more flexible for loop split.


https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/tree-vect-loop-manip.c;h=4988c93fdb61507a26430651b416ae61b217793a;hb=HEAD


-- 
Thanks,
Xionghu

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix loop split incorrect count and probability
  2021-09-22  8:40           ` Xionghu Luo
@ 2021-09-23 12:17             ` Richard Biener
  2021-10-15  5:51               ` Xionghu Luo
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Biener @ 2021-09-23 12:17 UTC (permalink / raw)
  To: Xionghu Luo; +Cc: gcc-patches, segher, fxue, wschmidt, guojiufu, linkw, hubicka

On Wed, 22 Sep 2021, Xionghu Luo wrote:

> 
> 
> On 2021/8/11 17:16, Richard Biener wrote:
> > On Wed, 11 Aug 2021, Xionghu Luo wrote:
> > 
> >>
> >>
> >> On 2021/8/10 22:47, Richard Biener wrote:
> >>> On Mon, 9 Aug 2021, Xionghu Luo wrote:
> >>>
> >>>> Thanks,
> >>>>
> >>>> On 2021/8/6 19:46, Richard Biener wrote:
> >>>>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
> >>>>>
> >>>>>> loop split condition is moved between loop1 and loop2, the split bb's
> >>>>>> count and probability should also be duplicated instead of (100% vs
> >>>>>> INV),
> >>>>>> secondly, the original loop1 and loop2 count need be propotional from
> >>>>>> the
> >>>>>> original loop.
> >>>>>>
> >>>>>>
> >>>>>> diff base/loop-cond-split-1.c.151t.lsplit
> >>>>>> patched/loop-cond-split-1.c.151t.lsplit:
> >>>>>> ...
> >>>>>>       int prephitmp_16;
> >>>>>>       int prephitmp_25;
> >>>>>>
> >>>>>>       <bb 2> [local count: 118111600]:
> >>>>>>       if (n_7(D) > 0)
> >>>>>>         goto <bb 4>; [89.00%]
> >>>>>>       else
> >>>>>>         goto <bb 3>; [11.00%]
> >>>>>>
> >>>>>>       <bb 3> [local count: 118111600]:
> >>>>>>       return;
> >>>>>>
> >>>>>>       <bb 4> [local count: 105119324]:
> >>>>>>       pretmp_3 = ga;
> >>>>>>
> >>>>>> -  <bb 5> [local count: 955630225]:
> >>>>>> +  <bb 5> [local count: 315357973]:
> >>>>>>       # i_13 = PHI <i_10(20), 0(4)>
> >>>>>>       # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
> >>>>>>       if (prephitmp_12 != 0)
> >>>>>>         goto <bb 6>; [33.00%]
> >>>>>>       else
> >>>>>>         goto <bb 7>; [67.00%]
> >>>>>>
> >>>>>> -  <bb 6> [local count: 315357972]:
> >>>>>> +  <bb 6> [local count: 104068130]:
> >>>>>>       _2 = do_something ();
> >>>>>>       ga = _2;
> >>>>>>
> >>>>>> -  <bb 7> [local count: 955630225]:
> >>>>>> +  <bb 7> [local count: 315357973]:
> >>>>>>       # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
> >>>>>>       i_10 = inc (i_13);
> >>>>>>       if (n_7(D) > i_10)
> >>>>>>         goto <bb 21>; [89.00%]
> >>>>>>       else
> >>>>>>         goto <bb 11>; [11.00%]
> >>>>>>
> >>>>>>       <bb 11> [local count: 105119324]:
> >>>>>>       goto <bb 3>; [100.00%]
> >>>>>>
> >>>>>> -  <bb 21> [local count: 850510901]:
> >>>>>> +  <bb 21> [local count: 280668596]:
> >>>>>>       if (prephitmp_12 != 0)
> >>>>>> -    goto <bb 20>; [100.00%]
> >>>>>> +    goto <bb 20>; [33.00%]
> >>>>>>       else
> >>>>>> -    goto <bb 19>; [INV]
> >>>>>> +    goto <bb 19>; [67.00%]
> >>>>>>
> >>>>>> -  <bb 20> [local count: 850510901]:
> >>>>>> +  <bb 20> [local count: 280668596]:
> >>>>>>       goto <bb 5>; [100.00%]
> >>>>>>
> >>>>>> -  <bb 19> [count: 0]:
> >>>>>> +  <bb 19> [local count: 70429947]:
> >>>>>>       # i_23 = PHI <i_10(21)>
> >>>>>>       # prephitmp_25 = PHI <prephitmp_5(21)>
> >>>>>>
> >>>>>> -  <bb 12> [local count: 955630225]:
> >>>>>> +  <bb 12> [local count: 640272252]:
> >>>>>>       # i_15 = PHI <i_23(19), i_22(16)>
> >>>>>>       # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
> >>>>>>       i_22 = inc (i_15);
> >>>>>>       if (n_7(D) > i_22)
> >>>>>>         goto <bb 16>; [89.00%]
> >>>>>>       else
> >>>>>>         goto <bb 11>; [11.00%]
> >>>>>>
> >>>>>> -  <bb 16> [local count: 850510901]:
> >>>>>> +  <bb 16> [local count: 569842305]:
> >>>>>>       goto <bb 12>; [100.00%]
> >>>>>>
> >>>>>>     }
> >>>>>>
> >>>>>> gcc/ChangeLog:
> >>>>>>
> >>>>>>  * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> >>>>>>  (do_split_loop_on_cond): Likewise.
> >>>>>> ---
> >>>>>>     gcc/tree-ssa-loop-split.c | 16 ++++++++--------
> >>>>>>     1 file changed, 8 insertions(+), 8 deletions(-)
> >>>>>>
> >>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> >>>>>> index 3a09bbc39e5..8e5a7ded0f7 100644
> >>>>>> --- a/gcc/tree-ssa-loop-split.c
> >>>>>> +++ b/gcc/tree-ssa-loop-split.c
> >>>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
> >>>>>>      basic_block cond_bb;
> >>>>
> >>>>    	if (!initial_true)
> >>>> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> >>>> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> >>>> +
> >>>> +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> >>>> +			   ? EDGE_SUCC (bbs[i], 0)
> >>>> +			   : EDGE_SUCC (bbs[i], 1);
> >>>>
> >>>>>>     
> >>>>>>     	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> >>>>>> -					   profile_probability::always
> >>>>>> (),
> >>>>>> -					   profile_probability::always
> >>>>>> (),
> >>>>>> -					   profile_probability::always
> >>>>>> (),
> >>>>>> -					   profile_probability::always
> >>>>>> (),
> >>>>>> +					   true_edge->probability,
> >>>>>> +
> >>>>>> true_edge->probability.invert (),
> >>>>>> +					   true_edge->probability,
> >>>>>> +
> >>>>>> true_edge->probability.invert (),
> >>>>>>             true);
> >>>>>
> >>>>> there is no 'true_edge' variable at this point.
> >>>>
> >>>> Sorry, missed the above hunk when split the patch.
> >>>>
> >>>>>
> >>>>>>      gcc_assert (loop2);
> >>>>>>     @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1,
> >>>>>> edge invar_branch)
> >>>>>>       initialize_original_copy_tables ();
> >>>>>>     
> >>>>>>       struct loop *loop2 = loop_version (loop1, boolean_true_node,
> >>>>>> NULL,
> >>>>>> -				     profile_probability::always (),
> >>>>>> -				     profile_probability::never (),
> >>>>>> -				     profile_probability::always (),
> >>>>>> -				     profile_probability::always (),
> >>>>>> +				     invar_branch->probability.invert
> >>>>>> (),
> >>>>>> +				     invar_branch->probability,
> >>>>>> +				     invar_branch->probability.invert
> >>>>>> (),
> >>>>>> +				     invar_branch->probability,
> >>>>>>       				     true);
> >>>>>>       if (!loop2)
> >>>>>>         {
> >>>>>
> >>>>> The patch introduction seems to talk about do_split_loop_on_cond only.
> >>>>
> >>>> split_loop faces similar issue though it sets the two branches to 100% vs
> >>>> 100%
> >>>> and no scaling which seems also incorrect.
> >>>>
> >>>>> Since loop versioning inserts a condition with the passed probabilities
> >>>>> but in this case a 'boolean_true_node' condition the then and else
> >>>>> probabilities passed look correct.  It's just the scaling arguments
> >>>>> that look wrong?  This loop_version call should get a comment as to
> >>>>> why we are passing probabilities the way we do.
> >>>>
> >>>> This optimization is transforming from:
> >>>>
> >>>> for (i = 0; i < n; i = inc (i))
> >>>>       {
> >>>>         if (ga)
> >>>>           ga = do_something ();
> >>>> }
> >>>>
> >>>> to:
> >>>>
> >>>>     for (i = 0; i < x; i = inc (i))
> >>>> {
> >>>>       if (true)
> >>>>           ga = do_something ();
> >>>>           if (!ga)
> >>>>             break;
> >>>> }
> >>>>     for (; i < n; i = inc (i))
> >>>> {
> >>>>       if (false)
> >>>>            ga = do_something ();
> >>>> }
> >>>>
> >>>>
> >>>> `boolean_true_node` is passed in to make the first loop's condition
> >>>> statement to be 'true', after returning from loop_version, there is a
> >>>> piece of code forcing the condition in second loop to be false,
> >>>> and the original condition is moved from *in loop* to *exit edge*
> >>>> between loop1 and loop2.
> >>>
> >>> Yes, one complication is that we use loop_version but do not retain
> >>> the CFG it creates.  Something like the vectorizers
> >>> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
> >>> but then that code doesn't do any scaling yet.  But then
> >>> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
> >>> we could write a variant that simply doesn't mangle the CFG
> >>> with a new condition switching between both loops but keeps them
> >>> executed after each other ...
> >>>
> >>> As said, this adds to the confusion and some awkwardness.
> >>
> >> Then loop_version in loop split requires two types of variant, one
> >> is to insert condition to loop preheader for 'split_loops' usage,
> >> another is to insert condition to loop exit for 'do_split_loop_on_cond'
> >> usage, it needs one extra function to encapsulate these cfg alterations
> >> from outside to inside.
> >>
> >> unswitching only execute one loop as it only moves the invariant condition
> >> to first loop's pre-header.  While 'split_loops' and
> >> 'do_split_loop_on_cond'
> >> may execute both loops no matter the condition is moved to the first loop's
> >> preheader or exit.
> >>
> >> The condition stmt in loop unswitching is invariant, but it is variant
> >> in loop splitting, that's why loop unswitching execute only one loop
> >> but loop splitting executes both loops.
> >>
> >> Seems we need two condition arguments for loop_version, one for connecting
> >> loop1 preheader to loop2 preheader, another one for connecting loop1's exit
> >> to loop2's header?  Then it will be more generic for both unswitching pass
> >> and splitting pass.  Looks a bit complicated and conflicted with
> >> loop_version's
> >> comments:
> >>
> >> /* Main entry point for Loop Versioning transformation.
> >>
> >>     This transformation given a condition and a loop, creates
> >>     -if (condition) { loop_copy1 } else { loop_copy2 }, ... */
> >>
> >>
> >> And this only works for loop split usage, those many other places
> >> doesn't use loop_version like this?
> > 
> > Yes, as said if you don't want the above CFG then you probably
> > shouldn't use loop_version but instead use its building blocks
> > (and some refactoring of loop_version can make that easier).
> > 
> > I think splitting wants
> > 
> >    loop_copy1
> >    if (condition)
> >      loop_copy2
> > 
> > IMHO it would be good to split 'loopify' into the actual loopification
> > and the scaling.  Basically make the part of loop_version that
> > copies the loop on the header edge and creates a loop structure for
> > it separate.
> > 
> > loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg
> > (copy_loop_before).
> > 
> 
> Unfortunately slpeel_tree_duplicate_loop_to_edge_cfg only supports
> copying loops with single exit, it would cause many ICEs in it even
> building GCC stage1 (line 1065, line 1184 due to exit or new_exit
> is NULL returning from single_exit (loop).).  Seems loop_version is
> more flexible for loop split.

Hmm, sure - loop_version does not need to do anything special with
exits since control flow either enters the original or the loop copy.

But slpeel_tree_duplicate_loop_to_edge_cfg intends to create
control flow that enters _both_ loops, so it needs to have
an edge from the first loop exit to the second loop entry.

One could extend this to a region copy, copying eventual CFG merges
of exits and specifying which exit of a SEME region is supposed
to be connected to the original region entry.

After all that's what loop splitting needs in the end - though not
sure what exactly it does with more than one exit.

So there's another set of "loop" copying, gimple_duplicate_sese_region,
which doesn't actually require a single exit but a single "important"
exit.  That might match how you treat multiple exits.

Richard.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix loop split incorrect count and probability
  2021-09-23 12:17             ` Richard Biener
@ 2021-10-15  5:51               ` Xionghu Luo
  2021-10-21  8:43                 ` Xionghu Luo
  0 siblings, 1 reply; 20+ messages in thread
From: Xionghu Luo @ 2021-10-15  5:51 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, segher, fxue, wschmidt, guojiufu, linkw, hubicka

[-- Attachment #1: Type: text/plain, Size: 12709 bytes --]



On 2021/9/23 20:17, Richard Biener wrote:
> On Wed, 22 Sep 2021, Xionghu Luo wrote:
> 
>>
>>
>> On 2021/8/11 17:16, Richard Biener wrote:
>>> On Wed, 11 Aug 2021, Xionghu Luo wrote:
>>>
>>>>
>>>>
>>>> On 2021/8/10 22:47, Richard Biener wrote:
>>>>> On Mon, 9 Aug 2021, Xionghu Luo wrote:
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> On 2021/8/6 19:46, Richard Biener wrote:
>>>>>>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
>>>>>>>
>>>>>>>> loop split condition is moved between loop1 and loop2, the split bb's
>>>>>>>> count and probability should also be duplicated instead of (100% vs
>>>>>>>> INV),
>>>>>>>> secondly, the original loop1 and loop2 count need be propotional from
>>>>>>>> the
>>>>>>>> original loop.
>>>>>>>>
>>>>>>>>
>>>>>>>> diff base/loop-cond-split-1.c.151t.lsplit
>>>>>>>> patched/loop-cond-split-1.c.151t.lsplit:
>>>>>>>> ...
>>>>>>>>       int prephitmp_16;
>>>>>>>>       int prephitmp_25;
>>>>>>>>
>>>>>>>>       <bb 2> [local count: 118111600]:
>>>>>>>>       if (n_7(D) > 0)
>>>>>>>>         goto <bb 4>; [89.00%]
>>>>>>>>       else
>>>>>>>>         goto <bb 3>; [11.00%]
>>>>>>>>
>>>>>>>>       <bb 3> [local count: 118111600]:
>>>>>>>>       return;
>>>>>>>>
>>>>>>>>       <bb 4> [local count: 105119324]:
>>>>>>>>       pretmp_3 = ga;
>>>>>>>>
>>>>>>>> -  <bb 5> [local count: 955630225]:
>>>>>>>> +  <bb 5> [local count: 315357973]:
>>>>>>>>       # i_13 = PHI <i_10(20), 0(4)>
>>>>>>>>       # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
>>>>>>>>       if (prephitmp_12 != 0)
>>>>>>>>         goto <bb 6>; [33.00%]
>>>>>>>>       else
>>>>>>>>         goto <bb 7>; [67.00%]
>>>>>>>>
>>>>>>>> -  <bb 6> [local count: 315357972]:
>>>>>>>> +  <bb 6> [local count: 104068130]:
>>>>>>>>       _2 = do_something ();
>>>>>>>>       ga = _2;
>>>>>>>>
>>>>>>>> -  <bb 7> [local count: 955630225]:
>>>>>>>> +  <bb 7> [local count: 315357973]:
>>>>>>>>       # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
>>>>>>>>       i_10 = inc (i_13);
>>>>>>>>       if (n_7(D) > i_10)
>>>>>>>>         goto <bb 21>; [89.00%]
>>>>>>>>       else
>>>>>>>>         goto <bb 11>; [11.00%]
>>>>>>>>
>>>>>>>>       <bb 11> [local count: 105119324]:
>>>>>>>>       goto <bb 3>; [100.00%]
>>>>>>>>
>>>>>>>> -  <bb 21> [local count: 850510901]:
>>>>>>>> +  <bb 21> [local count: 280668596]:
>>>>>>>>       if (prephitmp_12 != 0)
>>>>>>>> -    goto <bb 20>; [100.00%]
>>>>>>>> +    goto <bb 20>; [33.00%]
>>>>>>>>       else
>>>>>>>> -    goto <bb 19>; [INV]
>>>>>>>> +    goto <bb 19>; [67.00%]
>>>>>>>>
>>>>>>>> -  <bb 20> [local count: 850510901]:
>>>>>>>> +  <bb 20> [local count: 280668596]:
>>>>>>>>       goto <bb 5>; [100.00%]
>>>>>>>>
>>>>>>>> -  <bb 19> [count: 0]:
>>>>>>>> +  <bb 19> [local count: 70429947]:
>>>>>>>>       # i_23 = PHI <i_10(21)>
>>>>>>>>       # prephitmp_25 = PHI <prephitmp_5(21)>
>>>>>>>>
>>>>>>>> -  <bb 12> [local count: 955630225]:
>>>>>>>> +  <bb 12> [local count: 640272252]:
>>>>>>>>       # i_15 = PHI <i_23(19), i_22(16)>
>>>>>>>>       # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
>>>>>>>>       i_22 = inc (i_15);
>>>>>>>>       if (n_7(D) > i_22)
>>>>>>>>         goto <bb 16>; [89.00%]
>>>>>>>>       else
>>>>>>>>         goto <bb 11>; [11.00%]
>>>>>>>>
>>>>>>>> -  <bb 16> [local count: 850510901]:
>>>>>>>> +  <bb 16> [local count: 569842305]:
>>>>>>>>       goto <bb 12>; [100.00%]
>>>>>>>>
>>>>>>>>     }
>>>>>>>>
>>>>>>>> gcc/ChangeLog:
>>>>>>>>
>>>>>>>>  * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
>>>>>>>>  (do_split_loop_on_cond): Likewise.
>>>>>>>> ---
>>>>>>>>     gcc/tree-ssa-loop-split.c | 16 ++++++++--------
>>>>>>>>     1 file changed, 8 insertions(+), 8 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>>>>>>>> index 3a09bbc39e5..8e5a7ded0f7 100644
>>>>>>>> --- a/gcc/tree-ssa-loop-split.c
>>>>>>>> +++ b/gcc/tree-ssa-loop-split.c
>>>>>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
>>>>>>>>      basic_block cond_bb;
>>>>>>
>>>>>>    	if (!initial_true)
>>>>>> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>>>>> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>>>>> +
>>>>>> +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
>>>>>> +			   ? EDGE_SUCC (bbs[i], 0)
>>>>>> +			   : EDGE_SUCC (bbs[i], 1);
>>>>>>
>>>>>>>>     
>>>>>>>>     	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
>>>>>>>> -					   profile_probability::always
>>>>>>>> (),
>>>>>>>> -					   profile_probability::always
>>>>>>>> (),
>>>>>>>> -					   profile_probability::always
>>>>>>>> (),
>>>>>>>> -					   profile_probability::always
>>>>>>>> (),
>>>>>>>> +					   true_edge->probability,
>>>>>>>> +
>>>>>>>> true_edge->probability.invert (),
>>>>>>>> +					   true_edge->probability,
>>>>>>>> +
>>>>>>>> true_edge->probability.invert (),
>>>>>>>>             true);
>>>>>>>
>>>>>>> there is no 'true_edge' variable at this point.
>>>>>>
>>>>>> Sorry, missed the above hunk when split the patch.
>>>>>>
>>>>>>>
>>>>>>>>      gcc_assert (loop2);
>>>>>>>>     @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1,
>>>>>>>> edge invar_branch)
>>>>>>>>       initialize_original_copy_tables ();
>>>>>>>>     
>>>>>>>>       struct loop *loop2 = loop_version (loop1, boolean_true_node,
>>>>>>>> NULL,
>>>>>>>> -				     profile_probability::always (),
>>>>>>>> -				     profile_probability::never (),
>>>>>>>> -				     profile_probability::always (),
>>>>>>>> -				     profile_probability::always (),
>>>>>>>> +				     invar_branch->probability.invert
>>>>>>>> (),
>>>>>>>> +				     invar_branch->probability,
>>>>>>>> +				     invar_branch->probability.invert
>>>>>>>> (),
>>>>>>>> +				     invar_branch->probability,
>>>>>>>>       				     true);
>>>>>>>>       if (!loop2)
>>>>>>>>         {
>>>>>>>
>>>>>>> The patch introduction seems to talk about do_split_loop_on_cond only.
>>>>>>
>>>>>> split_loop faces similar issue though it sets the two branches to 100% vs
>>>>>> 100%
>>>>>> and no scaling which seems also incorrect.
>>>>>>
>>>>>>> Since loop versioning inserts a condition with the passed probabilities
>>>>>>> but in this case a 'boolean_true_node' condition the then and else
>>>>>>> probabilities passed look correct.  It's just the scaling arguments
>>>>>>> that look wrong?  This loop_version call should get a comment as to
>>>>>>> why we are passing probabilities the way we do.
>>>>>>
>>>>>> This optimization is transforming from:
>>>>>>
>>>>>> for (i = 0; i < n; i = inc (i))
>>>>>>       {
>>>>>>         if (ga)
>>>>>>           ga = do_something ();
>>>>>> }
>>>>>>
>>>>>> to:
>>>>>>
>>>>>>     for (i = 0; i < x; i = inc (i))
>>>>>> {
>>>>>>       if (true)
>>>>>>           ga = do_something ();
>>>>>>           if (!ga)
>>>>>>             break;
>>>>>> }
>>>>>>     for (; i < n; i = inc (i))
>>>>>> {
>>>>>>       if (false)
>>>>>>            ga = do_something ();
>>>>>> }
>>>>>>
>>>>>>
>>>>>> `boolean_true_node` is passed in to make the first loop's condition
>>>>>> statement to be 'true', after returning from loop_version, there is a
>>>>>> piece of code forcing the condition in second loop to be false,
>>>>>> and the original condition is moved from *in loop* to *exit edge*
>>>>>> between loop1 and loop2.
>>>>>
>>>>> Yes, one complication is that we use loop_version but do not retain
>>>>> the CFG it creates.  Something like the vectorizers
>>>>> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
>>>>> but then that code doesn't do any scaling yet.  But then
>>>>> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
>>>>> we could write a variant that simply doesn't mangle the CFG
>>>>> with a new condition switching between both loops but keeps them
>>>>> executed after each other ...
>>>>>
>>>>> As said, this adds to the confusion and some awkwardness.
>>>>
>>>> Then loop_version in loop split requires two types of variant, one
>>>> is to insert condition to loop preheader for 'split_loops' usage,
>>>> another is to insert condition to loop exit for 'do_split_loop_on_cond'
>>>> usage, it needs one extra function to encapsulate these cfg alterations
>>>> from outside to inside.
>>>>
>>>> unswitching only execute one loop as it only moves the invariant condition
>>>> to first loop's pre-header.  While 'split_loops' and
>>>> 'do_split_loop_on_cond'
>>>> may execute both loops no matter the condition is moved to the first loop's
>>>> preheader or exit.
>>>>
>>>> The condition stmt in loop unswitching is invariant, but it is variant
>>>> in loop splitting, that's why loop unswitching execute only one loop
>>>> but loop splitting executes both loops.
>>>>
>>>> Seems we need two condition arguments for loop_version, one for connecting
>>>> loop1 preheader to loop2 preheader, another one for connecting loop1's exit
>>>> to loop2's header?  Then it will be more generic for both unswitching pass
>>>> and splitting pass.  Looks a bit complicated and conflicted with
>>>> loop_version's
>>>> comments:
>>>>
>>>> /* Main entry point for Loop Versioning transformation.
>>>>
>>>>     This transformation given a condition and a loop, creates
>>>>     -if (condition) { loop_copy1 } else { loop_copy2 }, ... */
>>>>
>>>>
>>>> And this only works for loop split usage, those many other places
>>>> doesn't use loop_version like this?
>>>
>>> Yes, as said if you don't want the above CFG then you probably
>>> shouldn't use loop_version but instead use its building blocks
>>> (and some refactoring of loop_version can make that easier).
>>>
>>> I think splitting wants
>>>
>>>    loop_copy1
>>>    if (condition)
>>>      loop_copy2
>>>
>>> IMHO it would be good to split 'loopify' into the actual loopification
>>> and the scaling.  Basically make the part of loop_version that
>>> copies the loop on the header edge and creates a loop structure for
>>> it separate.
>>>
>>> loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg
>>> (copy_loop_before).
>>>
>>
>> Unfortunately slpeel_tree_duplicate_loop_to_edge_cfg only supports
>> copying loops with single exit, it would cause many ICEs in it even
>> building GCC stage1 (line 1065, line 1184 due to exit or new_exit
>> is NULL returning from single_exit (loop).).  Seems loop_version is
>> more flexible for loop split.
> 
> Hmm, sure - loop_version does not need to do anything special with
> exits since control flow either enters the original or the loop copy.
> 
> But slpeel_tree_duplicate_loop_to_edge_cfg intends to create
> control flow that enters _both_ loops, so it needs to have
> an edge from the first loop exit to the second loop entry.
> 
> One could extend this to a region copy, copying eventual CFG merges
> of exits and specifying which exit of a SEME region is supposed
> to be connected to the original region entry.
> 
> After all that's what loop splitting needs in the end - though not
> sure what exactly it does with more than one exit.

In tree-ssa-loop-split.c,  split_loop only accepts single exit loop,
the recently added split_loop_on_cond could process multiple exits loop.

For example, do some changes to the loop-cond-split-1.c,

int ga;
extern int a;
extern int b;
extern int c;

void test1 (int n) {
  int i;
  for (i = 0; i < n; i = inc (i)). {
      if (a+3 > 0)
       break;

      if (ga)
        ga = do_something ();

      for (int j = 0; j < n; j++)
        {
           b+=5;
           if (b > c) break;
        }
    }
}

the "if (ga)" will be a new exit edge from loop_copy1 to loop_copy2. 
I am not sure whether it is valuable to do semi-invariant loop split to such
cases with multiple exits, but obviously the split_loop_on_cond is a special
case from split_loop both duplicate loop to 
   if (condition1) {loop_copy1} if (condition2) {loop_copy2}
The only difference is condition1 is true for split_loop_on_cond.

> 
> So there's another set of "loop" copying, gimple_duplicate_sese_region,
> which doesn't actually require a single exit but a single "important"
> exit.  That might match how you treat multiple exits.

gimple_duplicate_sese_region doesn't handle subloops and latches.  Finally,
I found that loop_version is still much better
than slpeel_tree_duplicate_loop_to_edge_cfg and gimple_duplicate_sese_region
since it could handle all cases like multiple exits/subloops, etc.  I did some
refactor to the code to introduce loop_version2 to create duplicated loops
with two input conditions as attached patch, is this reasonable enough?

I also tried to copy the code in loop_version out of it to don't call loop_version
in loop_version2, but it seems useless with many duplicate code and NOT get rid
of creating "if (condition1) {loop_copy1}" at first?


-- 
Thanks,
Xionghu

[-- Attachment #2: 0001-Add-loop_version2-to-support-loop-transformation-wit.patch --]
[-- Type: text/plain, Size: 13160 bytes --]

From cde3a3a1ab6001f340f3da5c37b416eb42febc8e Mon Sep 17 00:00:00 2001
From: Xiong Hu Luo <luoxhu@linux.ibm.com>
Date: Thu, 14 Oct 2021 04:01:50 -0500
Subject: [PATCH] Add loop_version2 to support loop transformation with two
 conditions

loop_version2 is added to support transform loop to
  - if (condition1) {loop_copy1} if(condition2) {loop_copy2},
to support both loop split and loop conditional split usage.  Actually
do_split_loop_on_cond is a specific usage from loop split when
condition1 is true. The looop_version2 implementation calls
loop_version first to create
 -if (condition1) { loop_copy1  } else { loop_copy2  }
first, then insert condition2 to loop_copy2 with connect_loops.

gcc/ChangeLog:

	* tree-ssa-loop-split.c (connect_loop_phis): Move from ...
	(connect_loops): Move from ...
	(split_loop): Call loop_version2.
	(do_split_loop_on_cond): Call loop_version2.
	* cfgloopmanip.c (connect_loop_phis): ... to this.
	(connect_loops): ... to this.
	(loop_version2): New function.
	* cfgloopmanip.h (loop_version2): New declare.
---
 gcc/cfgloopmanip.c        | 178 ++++++++++++++++++++++++++++++++++++++
 gcc/cfgloopmanip.h        |   5 ++
 gcc/tree-ssa-loop-split.c |  58 ++++---------
 3 files changed, 201 insertions(+), 40 deletions(-)

diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index 82c242dd720..b5d0d99734d 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -24,6 +24,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtl.h"
 #include "tree.h"
 #include "gimple.h"
+#include "ssa.h"
+#include "tree-cfg.h"
 #include "cfghooks.h"
 #include "cfganal.h"
 #include "cfgloop.h"
@@ -1768,3 +1770,179 @@ loop_version (class loop *loop,
 
   return nloop;
 }
+
+/* This function updates the SSA form after connect_loops made a new
+   edge NEW_E leading from LOOP1 exit to LOOP2 (via in intermediate
+   conditional).  I.e. the second loop can now be entered either
+   via the original entry or via NEW_E, so the entry values of LOOP2
+   phi nodes are either the original ones or those at the exit
+   of LOOP1.  Insert new phi nodes in LOOP2 pre-header reflecting
+   this.  The loops need to fulfill easy_exit_values().  */
+
+static void
+connect_loop_phis (class loop *loop1, class loop *loop2, edge new_e)
+{
+  basic_block rest = loop_preheader_edge (loop2)->src;
+  gcc_assert (new_e->dest == rest);
+  edge skip_first = EDGE_PRED (rest, EDGE_PRED (rest, 0) == new_e);
+
+  edge firste = loop_preheader_edge (loop1);
+  edge seconde = loop_preheader_edge (loop2);
+  edge firstn = loop_latch_edge (loop1);
+  gphi_iterator psi_first, psi_second;
+  for (psi_first = gsi_start_phis (loop1->header),
+       psi_second = gsi_start_phis (loop2->header);
+       !gsi_end_p (psi_first);
+       gsi_next (&psi_first), gsi_next (&psi_second))
+    {
+      tree init, next, new_init;
+      use_operand_p op;
+      gphi *phi_first = psi_first.phi ();
+      gphi *phi_second = psi_second.phi ();
+
+      init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste);
+      next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn);
+      op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde);
+      gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
+
+      /* Prefer using original variable as a base for the new ssa name.
+	 This is necessary for virtual ops, and useful in order to avoid
+	 losing debug info for real ops.  */
+      if (TREE_CODE (next) == SSA_NAME
+	  && useless_type_conversion_p (TREE_TYPE (next),
+					TREE_TYPE (init)))
+	new_init = copy_ssa_name (next);
+      else if (TREE_CODE (init) == SSA_NAME
+	       && useless_type_conversion_p (TREE_TYPE (init),
+					     TREE_TYPE (next)))
+	new_init = copy_ssa_name (init);
+      else if (useless_type_conversion_p (TREE_TYPE (next),
+					  TREE_TYPE (init)))
+	new_init = make_temp_ssa_name (TREE_TYPE (next), NULL,
+				       "unrinittmp");
+      else
+	new_init = make_temp_ssa_name (TREE_TYPE (init), NULL,
+				       "unrinittmp");
+
+      gphi * newphi = create_phi_node (new_init, rest);
+      add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION);
+      add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION);
+      SET_USE (op, new_init);
+    }
+}
+
+/* The two loops LOOP1 and LOOP2 were just created by loop versioning,
+   they are still equivalent and placed in two arms of a diamond, like so:
+
+               .------if (cond)------.
+               v                     v
+             pre1                   pre2
+              |                      |
+        .--->h1                     h2<----.
+        |     |                      |     |
+        |    ex1---.            .---ex2    |
+        |    /     |            |     \    |
+        '---l1     X            |     l2---'
+                   |            |
+                   |            |
+                   '--->join<---'
+
+   This function transforms the program such that LOOP1 is conditionally
+   falling through to LOOP2, or skipping it.  This is done by splitting
+   the ex1->join edge at X in the diagram above, and inserting a condition
+   whose one arm goes to pre2, resulting in this situation:
+
+               .------if (cond)------.
+               v                     v
+             pre1       .---------->pre2
+              |         |            |
+        .--->h1         |           h2<----.
+        |     |         |            |     |
+        |    ex1---.    |       .---ex2    |
+        |    /     v    |       |     \    |
+        '---l1   skip---'       |     l2---'
+                   |            |
+                   |            |
+                   '--->join<---'
+
+
+   The condition used is the exit condition of LOOP1, which effectively means
+   that when the first loop exits (for whatever reason) but the real original
+   exit expression is still false the second loop will be entered.
+   The function returns the new edge cond->pre2.
+
+   This doesn't update the SSA form, see connect_loop_phis for that.  */
+
+static edge
+connect_loops (class loop *loop1, class loop *loop2, void *cond_e)
+{
+  tree cond_expr = (tree) cond_e;
+  edge exit = single_exit (loop1);
+  basic_block skip_bb = split_edge (exit);
+  gimple *new_cond_expr;
+  gimple_stmt_iterator gsi;
+  edge new_e, skip_e;
+
+  new_cond_expr = gimple_build_cond_from_tree (cond_expr, NULL_TREE, NULL_TREE);
+
+  gsi = gsi_last_bb (skip_bb);
+  gsi_insert_after (&gsi, new_cond_expr, GSI_NEW_STMT);
+
+  skip_e = EDGE_SUCC (skip_bb, 0);
+  skip_e->flags &= ~EDGE_FALLTHRU;
+  new_e = make_edge (skip_bb, loop_preheader_edge (loop2)->src, 0);
+  if (exit->flags & EDGE_TRUE_VALUE)
+    {
+      skip_e->flags |= EDGE_TRUE_VALUE;
+      new_e->flags |= EDGE_FALSE_VALUE;
+    }
+  else
+    {
+      skip_e->flags |= EDGE_FALSE_VALUE;
+      new_e->flags |= EDGE_TRUE_VALUE;
+    }
+
+  new_e->probability = profile_probability::likely ();
+  skip_e->probability = new_e->probability.invert ();
+
+  return new_e;
+}
+
+/* Main entry point for Loop Versioning transformation with two conditions.
+
+   This transformation given a condition and a loop, creates
+   -if (condition1) { loop_copy1 } if (condition2) { loop_copy2 },
+   where loop_copy1 is the loop transformed in one way, and loop_copy2
+   is the loop transformed in another way (or unchanged). COND_EXPR1
+   and COND_EXPR2 may be a run time test for things that were not resolved by
+   static analysis (overlapping ranges (anti-aliasing), alignment, etc.).
+
+   If non-NULL, CONDITION_BB is set to the basic block
+   containing the condition.
+
+   THEN_PROB is the probability of the then edge of the if.  THEN_SCALE
+   is the ratio by that the frequencies in the original loop should
+   be scaled.  ELSE_SCALE is the ratio by that the frequencies in the
+   new loop should be scaled.
+
+   If PLACE_AFTER is true, we place the new loop after LOOP in the
+   instruction stream, otherwise it is placed before LOOP.  */
+
+class loop *
+loop_version2 (class loop *loop, void *cond_expr1, basic_block *condition_bb,
+	       void *cond_expr2, profile_probability then_prob,
+	       profile_probability else_prob, profile_probability then_scale,
+	       profile_probability else_scale, bool place_after)
+{
+  /* Generate if (condition1) {loop_copy1} else {loop_copy2} and propotion the
+    profile_count first.  */
+  class loop *nloop = loop_version (loop, cond_expr1, condition_bb, then_prob,
+				    else_prob, then_scale, else_scale, place_after);
+
+  /* Fix up the cfg to
+      - if (condition1) {loop_copy1}; if (condition2) {loop_copy2}.  */
+  edge new_e = connect_loops (loop, nloop, cond_expr2);
+  connect_loop_phis (loop, nloop, new_e);
+
+  return nloop;
+}
diff --git a/gcc/cfgloopmanip.h b/gcc/cfgloopmanip.h
index 07d5f925b79..23b077e94d5 100644
--- a/gcc/cfgloopmanip.h
+++ b/gcc/cfgloopmanip.h
@@ -63,4 +63,9 @@ class loop * loop_version (class loop *, void *,
 			    profile_probability, profile_probability,
 			    profile_probability, profile_probability, bool);
 
+class loop * loop_version2 (class loop *, void *,
+			    basic_block *, void *,
+			    profile_probability, profile_probability,
+			    profile_probability, profile_probability, bool);
+
 #endif /* GCC_CFGLOOPMANIP_H */
diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index d30782888f3..f3096488473 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -573,9 +573,9 @@ split_loop (class loop *loop1)
 	if (stmts2)
 	  gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
 					    stmts2);
-	tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
+	tree cond1 = build2 (guard_code, boolean_type_node, guard_init, border);
 	if (!initial_true)
-	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+	  cond1 = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond1);
 
 	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
 			   ? EDGE_SUCC (bbs[i], 0)
@@ -586,16 +586,19 @@ split_loop (class loop *loop1)
 	initialize_original_copy_tables ();
 	basic_block cond_bb;
 
-	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
+	edge exit = single_exit (loop1);
+	gimple *stmt = last_stmt (exit->src);
+	tree cond2 = build2 (gimple_cond_code (stmt), boolean_type_node,
+			     gimple_cond_lhs (stmt), gimple_cond_rhs (stmt));
+
+	class loop *loop2 = loop_version2 (loop1, cond1, &cond_bb, cond2,
 					   true_edge->probability,
 					   true_edge->probability.invert (),
 					   true_edge->probability,
 					   true_edge->probability.invert (),
 					   true);
-	gcc_assert (loop2);
 
-	edge new_e = connect_loops (loop1, loop2);
-	connect_loop_phis (loop1, loop2, new_e);
+	gcc_assert (loop2);
 
 	/* The iterations of the second loop is now already
 	   exactly those that the first loop didn't do, but the
@@ -1489,12 +1492,16 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
 
   initialize_original_copy_tables ();
 
-  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
-				     invar_branch->probability.invert (),
-				     invar_branch->probability,
+  gimple *break_cond = gimple_build_cond (gimple_cond_code(cond),
+					  gimple_cond_lhs (cond),
+					  gimple_cond_rhs (cond),
+					  NULL_TREE, NULL_TREE);
+
+  class loop *loop2 = loop_version2 (loop1, boolean_true_node, &cond_bb,
+				     break_cond, profile_probability::always (),
+				     profile_probability::never (),
 				     invar_branch->probability.invert (),
-				     invar_branch->probability,
-				     true);
+				     invar_branch->probability, true);
   if (!loop2)
     {
       free_original_copy_tables ();
@@ -1513,35 +1520,6 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
 
   update_stmt (cond_copy);
 
-  /* Insert a new conditional statement on latch edge of loop1, its condition
-     is duplicated from the semi-invariant.  This statement acts as a switch
-     to transfer execution from loop1 to loop2, when loop1 enters into
-     invariant state.  */
-  basic_block latch_bb = split_edge (loop_latch_edge (loop1));
-  basic_block break_bb = split_edge (single_pred_edge (latch_bb));
-  gimple *break_cond = gimple_build_cond (gimple_cond_code(cond),
-					  gimple_cond_lhs (cond),
-					  gimple_cond_rhs (cond),
-					  NULL_TREE, NULL_TREE);
-
-  gimple_stmt_iterator gsi = gsi_last_bb (break_bb);
-  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
-
-  edge to_loop1 = single_succ_edge (break_bb);
-  edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0);
-
-  to_loop1->flags &= ~EDGE_FALLTHRU;
-  to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
-  to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
-
-  to_loop1->probability = invar_branch->probability.invert ();
-  to_loop2->probability = invar_branch->probability;
-
-  /* Due to introduction of a control flow edge from loop1 latch to loop2
-     pre-header, we should update PHIs in loop2 to reflect this connection
-     between loop1 and loop2.  */
-  connect_loop_phis (loop1, loop2, to_loop2);
-
   free_original_copy_tables ();
 
   return true;
-- 
2.27.0.90.geebb51ba8c


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix loop split incorrect count and probability
  2021-10-15  5:51               ` Xionghu Luo
@ 2021-10-21  8:43                 ` Xionghu Luo
  2021-10-21 10:55                   ` Richard Biener
  0 siblings, 1 reply; 20+ messages in thread
From: Xionghu Luo @ 2021-10-21  8:43 UTC (permalink / raw)
  To: Richard Biener; +Cc: segher, wschmidt, linkw, gcc-patches, hubicka



On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote:
> 
> 
> On 2021/9/23 20:17, Richard Biener wrote:
>> On Wed, 22 Sep 2021, Xionghu Luo wrote:
>>
>>>
>>>
>>> On 2021/8/11 17:16, Richard Biener wrote:
>>>> On Wed, 11 Aug 2021, Xionghu Luo wrote:
>>>>
>>>>>
>>>>>
>>>>> On 2021/8/10 22:47, Richard Biener wrote:
>>>>>> On Mon, 9 Aug 2021, Xionghu Luo wrote:
>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> On 2021/8/6 19:46, Richard Biener wrote:
>>>>>>>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
>>>>>>>>
>>>>>>>>> loop split condition is moved between loop1 and loop2, the split bb's
>>>>>>>>> count and probability should also be duplicated instead of (100% vs
>>>>>>>>> INV),
>>>>>>>>> secondly, the original loop1 and loop2 count need be propotional from
>>>>>>>>> the
>>>>>>>>> original loop.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> diff base/loop-cond-split-1.c.151t.lsplit
>>>>>>>>> patched/loop-cond-split-1.c.151t.lsplit:
>>>>>>>>> ...
>>>>>>>>>       int prephitmp_16;
>>>>>>>>>       int prephitmp_25;
>>>>>>>>>
>>>>>>>>>       <bb 2> [local count: 118111600]:
>>>>>>>>>       if (n_7(D) > 0)
>>>>>>>>>         goto <bb 4>; [89.00%]
>>>>>>>>>       else
>>>>>>>>>         goto <bb 3>; [11.00%]
>>>>>>>>>
>>>>>>>>>       <bb 3> [local count: 118111600]:
>>>>>>>>>       return;
>>>>>>>>>
>>>>>>>>>       <bb 4> [local count: 105119324]:
>>>>>>>>>       pretmp_3 = ga;
>>>>>>>>>
>>>>>>>>> -  <bb 5> [local count: 955630225]:
>>>>>>>>> +  <bb 5> [local count: 315357973]:
>>>>>>>>>       # i_13 = PHI <i_10(20), 0(4)>
>>>>>>>>>       # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
>>>>>>>>>       if (prephitmp_12 != 0)
>>>>>>>>>         goto <bb 6>; [33.00%]
>>>>>>>>>       else
>>>>>>>>>         goto <bb 7>; [67.00%]
>>>>>>>>>
>>>>>>>>> -  <bb 6> [local count: 315357972]:
>>>>>>>>> +  <bb 6> [local count: 104068130]:
>>>>>>>>>       _2 = do_something ();
>>>>>>>>>       ga = _2;
>>>>>>>>>
>>>>>>>>> -  <bb 7> [local count: 955630225]:
>>>>>>>>> +  <bb 7> [local count: 315357973]:
>>>>>>>>>       # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
>>>>>>>>>       i_10 = inc (i_13);
>>>>>>>>>       if (n_7(D) > i_10)
>>>>>>>>>         goto <bb 21>; [89.00%]
>>>>>>>>>       else
>>>>>>>>>         goto <bb 11>; [11.00%]
>>>>>>>>>
>>>>>>>>>       <bb 11> [local count: 105119324]:
>>>>>>>>>       goto <bb 3>; [100.00%]
>>>>>>>>>
>>>>>>>>> -  <bb 21> [local count: 850510901]:
>>>>>>>>> +  <bb 21> [local count: 280668596]:
>>>>>>>>>       if (prephitmp_12 != 0)
>>>>>>>>> -    goto <bb 20>; [100.00%]
>>>>>>>>> +    goto <bb 20>; [33.00%]
>>>>>>>>>       else
>>>>>>>>> -    goto <bb 19>; [INV]
>>>>>>>>> +    goto <bb 19>; [67.00%]
>>>>>>>>>
>>>>>>>>> -  <bb 20> [local count: 850510901]:
>>>>>>>>> +  <bb 20> [local count: 280668596]:
>>>>>>>>>       goto <bb 5>; [100.00%]
>>>>>>>>>
>>>>>>>>> -  <bb 19> [count: 0]:
>>>>>>>>> +  <bb 19> [local count: 70429947]:
>>>>>>>>>       # i_23 = PHI <i_10(21)>
>>>>>>>>>       # prephitmp_25 = PHI <prephitmp_5(21)>
>>>>>>>>>
>>>>>>>>> -  <bb 12> [local count: 955630225]:
>>>>>>>>> +  <bb 12> [local count: 640272252]:
>>>>>>>>>       # i_15 = PHI <i_23(19), i_22(16)>
>>>>>>>>>       # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
>>>>>>>>>       i_22 = inc (i_15);
>>>>>>>>>       if (n_7(D) > i_22)
>>>>>>>>>         goto <bb 16>; [89.00%]
>>>>>>>>>       else
>>>>>>>>>         goto <bb 11>; [11.00%]
>>>>>>>>>
>>>>>>>>> -  <bb 16> [local count: 850510901]:
>>>>>>>>> +  <bb 16> [local count: 569842305]:
>>>>>>>>>       goto <bb 12>; [100.00%]
>>>>>>>>>
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>
>>>>>>>>>  * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
>>>>>>>>>  (do_split_loop_on_cond): Likewise.
>>>>>>>>> ---
>>>>>>>>>     gcc/tree-ssa-loop-split.c | 16 ++++++++--------
>>>>>>>>>     1 file changed, 8 insertions(+), 8 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>>>>>>>>> index 3a09bbc39e5..8e5a7ded0f7 100644
>>>>>>>>> --- a/gcc/tree-ssa-loop-split.c
>>>>>>>>> +++ b/gcc/tree-ssa-loop-split.c
>>>>>>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
>>>>>>>>>      basic_block cond_bb;
>>>>>>>
>>>>>>>    	if (!initial_true)
>>>>>>> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>>>>>> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>>>>>> +
>>>>>>> +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
>>>>>>> +			   ? EDGE_SUCC (bbs[i], 0)
>>>>>>> +			   : EDGE_SUCC (bbs[i], 1);
>>>>>>>
>>>>>>>>>     
>>>>>>>>>     	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
>>>>>>>>> -					   profile_probability::always
>>>>>>>>> (),
>>>>>>>>> -					   profile_probability::always
>>>>>>>>> (),
>>>>>>>>> -					   profile_probability::always
>>>>>>>>> (),
>>>>>>>>> -					   profile_probability::always
>>>>>>>>> (),
>>>>>>>>> +					   true_edge->probability,
>>>>>>>>> +
>>>>>>>>> true_edge->probability.invert (),
>>>>>>>>> +					   true_edge->probability,
>>>>>>>>> +
>>>>>>>>> true_edge->probability.invert (),
>>>>>>>>>             true);
>>>>>>>>
>>>>>>>> there is no 'true_edge' variable at this point.
>>>>>>>
>>>>>>> Sorry, missed the above hunk when split the patch.
>>>>>>>
>>>>>>>>
>>>>>>>>>      gcc_assert (loop2);
>>>>>>>>>     @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1,
>>>>>>>>> edge invar_branch)
>>>>>>>>>       initialize_original_copy_tables ();
>>>>>>>>>     
>>>>>>>>>       struct loop *loop2 = loop_version (loop1, boolean_true_node,
>>>>>>>>> NULL,
>>>>>>>>> -				     profile_probability::always (),
>>>>>>>>> -				     profile_probability::never (),
>>>>>>>>> -				     profile_probability::always (),
>>>>>>>>> -				     profile_probability::always (),
>>>>>>>>> +				     invar_branch->probability.invert
>>>>>>>>> (),
>>>>>>>>> +				     invar_branch->probability,
>>>>>>>>> +				     invar_branch->probability.invert
>>>>>>>>> (),
>>>>>>>>> +				     invar_branch->probability,
>>>>>>>>>       				     true);
>>>>>>>>>       if (!loop2)
>>>>>>>>>         {
>>>>>>>>
>>>>>>>> The patch introduction seems to talk about do_split_loop_on_cond only.
>>>>>>>
>>>>>>> split_loop faces similar issue though it sets the two branches to 100% vs
>>>>>>> 100%
>>>>>>> and no scaling which seems also incorrect.
>>>>>>>
>>>>>>>> Since loop versioning inserts a condition with the passed probabilities
>>>>>>>> but in this case a 'boolean_true_node' condition the then and else
>>>>>>>> probabilities passed look correct.  It's just the scaling arguments
>>>>>>>> that look wrong?  This loop_version call should get a comment as to
>>>>>>>> why we are passing probabilities the way we do.
>>>>>>>
>>>>>>> This optimization is transforming from:
>>>>>>>
>>>>>>> for (i = 0; i < n; i = inc (i))
>>>>>>>       {
>>>>>>>         if (ga)
>>>>>>>           ga = do_something ();
>>>>>>> }
>>>>>>>
>>>>>>> to:
>>>>>>>
>>>>>>>     for (i = 0; i < x; i = inc (i))
>>>>>>> {
>>>>>>>       if (true)
>>>>>>>           ga = do_something ();
>>>>>>>           if (!ga)
>>>>>>>             break;
>>>>>>> }
>>>>>>>     for (; i < n; i = inc (i))
>>>>>>> {
>>>>>>>       if (false)
>>>>>>>            ga = do_something ();
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> `boolean_true_node` is passed in to make the first loop's condition
>>>>>>> statement to be 'true', after returning from loop_version, there is a
>>>>>>> piece of code forcing the condition in second loop to be false,
>>>>>>> and the original condition is moved from *in loop* to *exit edge*
>>>>>>> between loop1 and loop2.
>>>>>>
>>>>>> Yes, one complication is that we use loop_version but do not retain
>>>>>> the CFG it creates.  Something like the vectorizers
>>>>>> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
>>>>>> but then that code doesn't do any scaling yet.  But then
>>>>>> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
>>>>>> we could write a variant that simply doesn't mangle the CFG
>>>>>> with a new condition switching between both loops but keeps them
>>>>>> executed after each other ...
>>>>>>
>>>>>> As said, this adds to the confusion and some awkwardness.
>>>>>
>>>>> Then loop_version in loop split requires two types of variant, one
>>>>> is to insert condition to loop preheader for 'split_loops' usage,
>>>>> another is to insert condition to loop exit for 'do_split_loop_on_cond'
>>>>> usage, it needs one extra function to encapsulate these cfg alterations
>>>>> from outside to inside.
>>>>>
>>>>> unswitching only execute one loop as it only moves the invariant condition
>>>>> to first loop's pre-header.  While 'split_loops' and
>>>>> 'do_split_loop_on_cond'
>>>>> may execute both loops no matter the condition is moved to the first loop's
>>>>> preheader or exit.
>>>>>
>>>>> The condition stmt in loop unswitching is invariant, but it is variant
>>>>> in loop splitting, that's why loop unswitching execute only one loop
>>>>> but loop splitting executes both loops.
>>>>>
>>>>> Seems we need two condition arguments for loop_version, one for connecting
>>>>> loop1 preheader to loop2 preheader, another one for connecting loop1's exit
>>>>> to loop2's header?  Then it will be more generic for both unswitching pass
>>>>> and splitting pass.  Looks a bit complicated and conflicted with
>>>>> loop_version's
>>>>> comments:
>>>>>
>>>>> /* Main entry point for Loop Versioning transformation.
>>>>>
>>>>>     This transformation given a condition and a loop, creates
>>>>>     -if (condition) { loop_copy1 } else { loop_copy2 }, ... */
>>>>>
>>>>>
>>>>> And this only works for loop split usage, those many other places
>>>>> doesn't use loop_version like this?
>>>>
>>>> Yes, as said if you don't want the above CFG then you probably
>>>> shouldn't use loop_version but instead use its building blocks
>>>> (and some refactoring of loop_version can make that easier).
>>>>
>>>> I think splitting wants
>>>>
>>>>    loop_copy1
>>>>    if (condition)
>>>>      loop_copy2
>>>>
>>>> IMHO it would be good to split 'loopify' into the actual loopification
>>>> and the scaling.  Basically make the part of loop_version that
>>>> copies the loop on the header edge and creates a loop structure for
>>>> it separate.
>>>>
>>>> loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg
>>>> (copy_loop_before).
>>>>
>>>
>>> Unfortunately slpeel_tree_duplicate_loop_to_edge_cfg only supports
>>> copying loops with single exit, it would cause many ICEs in it even
>>> building GCC stage1 (line 1065, line 1184 due to exit or new_exit
>>> is NULL returning from single_exit (loop).).  Seems loop_version is
>>> more flexible for loop split.
>>
>> Hmm, sure - loop_version does not need to do anything special with
>> exits since control flow either enters the original or the loop copy.
>>
>> But slpeel_tree_duplicate_loop_to_edge_cfg intends to create
>> control flow that enters _both_ loops, so it needs to have
>> an edge from the first loop exit to the second loop entry.
>>
>> One could extend this to a region copy, copying eventual CFG merges
>> of exits and specifying which exit of a SEME region is supposed
>> to be connected to the original region entry.
>>
>> After all that's what loop splitting needs in the end - though not
>> sure what exactly it does with more than one exit.
> 
> In tree-ssa-loop-split.c,  split_loop only accepts single exit loop,
> the recently added split_loop_on_cond could process multiple exits loop.
> 
> For example, do some changes to the loop-cond-split-1.c,
> 
> int ga;
> extern int a;
> extern int b;
> extern int c;
> 
> void test1 (int n) {
>   int i;
>   for (i = 0; i < n; i = inc (i)). {
>       if (a+3 > 0)
>        break;
> 
>       if (ga)
>         ga = do_something ();
> 
>       for (int j = 0; j < n; j++)
>         {
>            b+=5;
>            if (b > c) break;
>         }
>     }
> }
> 
> the "if (ga)" will be a new exit edge from loop_copy1 to loop_copy2. 
> I am not sure whether it is valuable to do semi-invariant loop split to such
> cases with multiple exits, but obviously the split_loop_on_cond is a special
> case from split_loop both duplicate loop to 
>    if (condition1) {loop_copy1} if (condition2) {loop_copy2}
> The only difference is condition1 is true for split_loop_on_cond.
> 
>>
>> So there's another set of "loop" copying, gimple_duplicate_sese_region,
>> which doesn't actually require a single exit but a single "important"
>> exit.  That might match how you treat multiple exits.
> 
> gimple_duplicate_sese_region doesn't handle subloops and latches.  Finally,
> I found that loop_version is still much better
> than slpeel_tree_duplicate_loop_to_edge_cfg and gimple_duplicate_sese_region
> since it could handle all cases like multiple exits/subloops, etc.  I did some
> refactor to the code to introduce loop_version2 to create duplicated loops
> with two input conditions as attached patch, is this reasonable enough?
> 
> I also tried to copy the code in loop_version out of it to don't call loop_version
> in loop_version2, but it seems useless with many duplicate code and NOT get rid
> of creating "if (condition1) {loop_copy1}" at first?
> 
> 

The previous attached patch 0001-Add-loop_version2-to-support-loop-transformation-wit.patch
had issues when connecting two loops, reason is split_loop connect_loops at *exit* edge,
do_split_loop_on_cond connect_loops at latch_edge.  So they have different CFGs even
both with two conditions :(.  It seems not so that useful to also define another new function
'connect_loops_at_latch' given the limited usage in loop split only?  And the loop_version2
also looks not so general again ...


-- 
Thanks,
Xionghu

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix loop split incorrect count and probability
  2021-10-21  8:43                 ` Xionghu Luo
@ 2021-10-21 10:55                   ` Richard Biener
  2021-10-26  5:40                     ` Xionghu Luo
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Biener @ 2021-10-21 10:55 UTC (permalink / raw)
  To: Xionghu Luo; +Cc: segher, wschmidt, linkw, gcc-patches, hubicka

On Thu, 21 Oct 2021, Xionghu Luo wrote:

> 
> 
> On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote:
> > 
> > 
> > On 2021/9/23 20:17, Richard Biener wrote:
> >> On Wed, 22 Sep 2021, Xionghu Luo wrote:
> >>
> >>>
> >>>
> >>> On 2021/8/11 17:16, Richard Biener wrote:
> >>>> On Wed, 11 Aug 2021, Xionghu Luo wrote:
> >>>>
> >>>>>
> >>>>>
> >>>>> On 2021/8/10 22:47, Richard Biener wrote:
> >>>>>> On Mon, 9 Aug 2021, Xionghu Luo wrote:
> >>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> On 2021/8/6 19:46, Richard Biener wrote:
> >>>>>>>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
> >>>>>>>>
> >>>>>>>>> loop split condition is moved between loop1 and loop2, the split bb's
> >>>>>>>>> count and probability should also be duplicated instead of (100% vs
> >>>>>>>>> INV),
> >>>>>>>>> secondly, the original loop1 and loop2 count need be propotional from
> >>>>>>>>> the
> >>>>>>>>> original loop.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> diff base/loop-cond-split-1.c.151t.lsplit
> >>>>>>>>> patched/loop-cond-split-1.c.151t.lsplit:
> >>>>>>>>> ...
> >>>>>>>>>       int prephitmp_16;
> >>>>>>>>>       int prephitmp_25;
> >>>>>>>>>
> >>>>>>>>>       <bb 2> [local count: 118111600]:
> >>>>>>>>>       if (n_7(D) > 0)
> >>>>>>>>>         goto <bb 4>; [89.00%]
> >>>>>>>>>       else
> >>>>>>>>>         goto <bb 3>; [11.00%]
> >>>>>>>>>
> >>>>>>>>>       <bb 3> [local count: 118111600]:
> >>>>>>>>>       return;
> >>>>>>>>>
> >>>>>>>>>       <bb 4> [local count: 105119324]:
> >>>>>>>>>       pretmp_3 = ga;
> >>>>>>>>>
> >>>>>>>>> -  <bb 5> [local count: 955630225]:
> >>>>>>>>> +  <bb 5> [local count: 315357973]:
> >>>>>>>>>       # i_13 = PHI <i_10(20), 0(4)>
> >>>>>>>>>       # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
> >>>>>>>>>       if (prephitmp_12 != 0)
> >>>>>>>>>         goto <bb 6>; [33.00%]
> >>>>>>>>>       else
> >>>>>>>>>         goto <bb 7>; [67.00%]
> >>>>>>>>>
> >>>>>>>>> -  <bb 6> [local count: 315357972]:
> >>>>>>>>> +  <bb 6> [local count: 104068130]:
> >>>>>>>>>       _2 = do_something ();
> >>>>>>>>>       ga = _2;
> >>>>>>>>>
> >>>>>>>>> -  <bb 7> [local count: 955630225]:
> >>>>>>>>> +  <bb 7> [local count: 315357973]:
> >>>>>>>>>       # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
> >>>>>>>>>       i_10 = inc (i_13);
> >>>>>>>>>       if (n_7(D) > i_10)
> >>>>>>>>>         goto <bb 21>; [89.00%]
> >>>>>>>>>       else
> >>>>>>>>>         goto <bb 11>; [11.00%]
> >>>>>>>>>
> >>>>>>>>>       <bb 11> [local count: 105119324]:
> >>>>>>>>>       goto <bb 3>; [100.00%]
> >>>>>>>>>
> >>>>>>>>> -  <bb 21> [local count: 850510901]:
> >>>>>>>>> +  <bb 21> [local count: 280668596]:
> >>>>>>>>>       if (prephitmp_12 != 0)
> >>>>>>>>> -    goto <bb 20>; [100.00%]
> >>>>>>>>> +    goto <bb 20>; [33.00%]
> >>>>>>>>>       else
> >>>>>>>>> -    goto <bb 19>; [INV]
> >>>>>>>>> +    goto <bb 19>; [67.00%]
> >>>>>>>>>
> >>>>>>>>> -  <bb 20> [local count: 850510901]:
> >>>>>>>>> +  <bb 20> [local count: 280668596]:
> >>>>>>>>>       goto <bb 5>; [100.00%]
> >>>>>>>>>
> >>>>>>>>> -  <bb 19> [count: 0]:
> >>>>>>>>> +  <bb 19> [local count: 70429947]:
> >>>>>>>>>       # i_23 = PHI <i_10(21)>
> >>>>>>>>>       # prephitmp_25 = PHI <prephitmp_5(21)>
> >>>>>>>>>
> >>>>>>>>> -  <bb 12> [local count: 955630225]:
> >>>>>>>>> +  <bb 12> [local count: 640272252]:
> >>>>>>>>>       # i_15 = PHI <i_23(19), i_22(16)>
> >>>>>>>>>       # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
> >>>>>>>>>       i_22 = inc (i_15);
> >>>>>>>>>       if (n_7(D) > i_22)
> >>>>>>>>>         goto <bb 16>; [89.00%]
> >>>>>>>>>       else
> >>>>>>>>>         goto <bb 11>; [11.00%]
> >>>>>>>>>
> >>>>>>>>> -  <bb 16> [local count: 850510901]:
> >>>>>>>>> +  <bb 16> [local count: 569842305]:
> >>>>>>>>>       goto <bb 12>; [100.00%]
> >>>>>>>>>
> >>>>>>>>>     }
> >>>>>>>>>
> >>>>>>>>> gcc/ChangeLog:
> >>>>>>>>>
> >>>>>>>>>  * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> >>>>>>>>>  (do_split_loop_on_cond): Likewise.
> >>>>>>>>> ---
> >>>>>>>>>     gcc/tree-ssa-loop-split.c | 16 ++++++++--------
> >>>>>>>>>     1 file changed, 8 insertions(+), 8 deletions(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> >>>>>>>>> index 3a09bbc39e5..8e5a7ded0f7 100644
> >>>>>>>>> --- a/gcc/tree-ssa-loop-split.c
> >>>>>>>>> +++ b/gcc/tree-ssa-loop-split.c
> >>>>>>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
> >>>>>>>>>      basic_block cond_bb;
> >>>>>>>
> >>>>>>>    	if (!initial_true)
> >>>>>>> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> >>>>>>> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> >>>>>>> +
> >>>>>>> +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> >>>>>>> +			   ? EDGE_SUCC (bbs[i], 0)
> >>>>>>> +			   : EDGE_SUCC (bbs[i], 1);
> >>>>>>>
> >>>>>>>>>     
> >>>>>>>>>     	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> >>>>>>>>> -					   profile_probability::always
> >>>>>>>>> (),
> >>>>>>>>> -					   profile_probability::always
> >>>>>>>>> (),
> >>>>>>>>> -					   profile_probability::always
> >>>>>>>>> (),
> >>>>>>>>> -					   profile_probability::always
> >>>>>>>>> (),
> >>>>>>>>> +					   true_edge->probability,
> >>>>>>>>> +
> >>>>>>>>> true_edge->probability.invert (),
> >>>>>>>>> +					   true_edge->probability,
> >>>>>>>>> +
> >>>>>>>>> true_edge->probability.invert (),
> >>>>>>>>>             true);
> >>>>>>>>
> >>>>>>>> there is no 'true_edge' variable at this point.
> >>>>>>>
> >>>>>>> Sorry, missed the above hunk when split the patch.
> >>>>>>>
> >>>>>>>>
> >>>>>>>>>      gcc_assert (loop2);
> >>>>>>>>>     @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1,
> >>>>>>>>> edge invar_branch)
> >>>>>>>>>       initialize_original_copy_tables ();
> >>>>>>>>>     
> >>>>>>>>>       struct loop *loop2 = loop_version (loop1, boolean_true_node,
> >>>>>>>>> NULL,
> >>>>>>>>> -				     profile_probability::always (),
> >>>>>>>>> -				     profile_probability::never (),
> >>>>>>>>> -				     profile_probability::always (),
> >>>>>>>>> -				     profile_probability::always (),
> >>>>>>>>> +				     invar_branch->probability.invert
> >>>>>>>>> (),
> >>>>>>>>> +				     invar_branch->probability,
> >>>>>>>>> +				     invar_branch->probability.invert
> >>>>>>>>> (),
> >>>>>>>>> +				     invar_branch->probability,
> >>>>>>>>>       				     true);
> >>>>>>>>>       if (!loop2)
> >>>>>>>>>         {
> >>>>>>>>
> >>>>>>>> The patch introduction seems to talk about do_split_loop_on_cond only.
> >>>>>>>
> >>>>>>> split_loop faces similar issue though it sets the two branches to 100% vs
> >>>>>>> 100%
> >>>>>>> and no scaling which seems also incorrect.
> >>>>>>>
> >>>>>>>> Since loop versioning inserts a condition with the passed probabilities
> >>>>>>>> but in this case a 'boolean_true_node' condition the then and else
> >>>>>>>> probabilities passed look correct.  It's just the scaling arguments
> >>>>>>>> that look wrong?  This loop_version call should get a comment as to
> >>>>>>>> why we are passing probabilities the way we do.
> >>>>>>>
> >>>>>>> This optimization is transforming from:
> >>>>>>>
> >>>>>>> for (i = 0; i < n; i = inc (i))
> >>>>>>>       {
> >>>>>>>         if (ga)
> >>>>>>>           ga = do_something ();
> >>>>>>> }
> >>>>>>>
> >>>>>>> to:
> >>>>>>>
> >>>>>>>     for (i = 0; i < x; i = inc (i))
> >>>>>>> {
> >>>>>>>       if (true)
> >>>>>>>           ga = do_something ();
> >>>>>>>           if (!ga)
> >>>>>>>             break;
> >>>>>>> }
> >>>>>>>     for (; i < n; i = inc (i))
> >>>>>>> {
> >>>>>>>       if (false)
> >>>>>>>            ga = do_something ();
> >>>>>>> }
> >>>>>>>
> >>>>>>>
> >>>>>>> `boolean_true_node` is passed in to make the first loop's condition
> >>>>>>> statement to be 'true', after returning from loop_version, there is a
> >>>>>>> piece of code forcing the condition in second loop to be false,
> >>>>>>> and the original condition is moved from *in loop* to *exit edge*
> >>>>>>> between loop1 and loop2.
> >>>>>>
> >>>>>> Yes, one complication is that we use loop_version but do not retain
> >>>>>> the CFG it creates.  Something like the vectorizers
> >>>>>> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
> >>>>>> but then that code doesn't do any scaling yet.  But then
> >>>>>> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
> >>>>>> we could write a variant that simply doesn't mangle the CFG
> >>>>>> with a new condition switching between both loops but keeps them
> >>>>>> executed after each other ...
> >>>>>>
> >>>>>> As said, this adds to the confusion and some awkwardness.
> >>>>>
> >>>>> Then loop_version in loop split requires two types of variant, one
> >>>>> is to insert condition to loop preheader for 'split_loops' usage,
> >>>>> another is to insert condition to loop exit for 'do_split_loop_on_cond'
> >>>>> usage, it needs one extra function to encapsulate these cfg alterations
> >>>>> from outside to inside.
> >>>>>
> >>>>> unswitching only execute one loop as it only moves the invariant condition
> >>>>> to first loop's pre-header.  While 'split_loops' and
> >>>>> 'do_split_loop_on_cond'
> >>>>> may execute both loops no matter the condition is moved to the first loop's
> >>>>> preheader or exit.
> >>>>>
> >>>>> The condition stmt in loop unswitching is invariant, but it is variant
> >>>>> in loop splitting, that's why loop unswitching execute only one loop
> >>>>> but loop splitting executes both loops.
> >>>>>
> >>>>> Seems we need two condition arguments for loop_version, one for connecting
> >>>>> loop1 preheader to loop2 preheader, another one for connecting loop1's exit
> >>>>> to loop2's header?  Then it will be more generic for both unswitching pass
> >>>>> and splitting pass.  Looks a bit complicated and conflicted with
> >>>>> loop_version's
> >>>>> comments:
> >>>>>
> >>>>> /* Main entry point for Loop Versioning transformation.
> >>>>>
> >>>>>     This transformation given a condition and a loop, creates
> >>>>>     -if (condition) { loop_copy1 } else { loop_copy2 }, ... */
> >>>>>
> >>>>>
> >>>>> And this only works for loop split usage, those many other places
> >>>>> doesn't use loop_version like this?
> >>>>
> >>>> Yes, as said if you don't want the above CFG then you probably
> >>>> shouldn't use loop_version but instead use its building blocks
> >>>> (and some refactoring of loop_version can make that easier).
> >>>>
> >>>> I think splitting wants
> >>>>
> >>>>    loop_copy1
> >>>>    if (condition)
> >>>>      loop_copy2
> >>>>
> >>>> IMHO it would be good to split 'loopify' into the actual loopification
> >>>> and the scaling.  Basically make the part of loop_version that
> >>>> copies the loop on the header edge and creates a loop structure for
> >>>> it separate.
> >>>>
> >>>> loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg
> >>>> (copy_loop_before).
> >>>>
> >>>
> >>> Unfortunately slpeel_tree_duplicate_loop_to_edge_cfg only supports
> >>> copying loops with single exit, it would cause many ICEs in it even
> >>> building GCC stage1 (line 1065, line 1184 due to exit or new_exit
> >>> is NULL returning from single_exit (loop).).  Seems loop_version is
> >>> more flexible for loop split.
> >>
> >> Hmm, sure - loop_version does not need to do anything special with
> >> exits since control flow either enters the original or the loop copy.
> >>
> >> But slpeel_tree_duplicate_loop_to_edge_cfg intends to create
> >> control flow that enters _both_ loops, so it needs to have
> >> an edge from the first loop exit to the second loop entry.
> >>
> >> One could extend this to a region copy, copying eventual CFG merges
> >> of exits and specifying which exit of a SEME region is supposed
> >> to be connected to the original region entry.
> >>
> >> After all that's what loop splitting needs in the end - though not
> >> sure what exactly it does with more than one exit.
> > 
> > In tree-ssa-loop-split.c,  split_loop only accepts single exit loop,
> > the recently added split_loop_on_cond could process multiple exits loop.
> > 
> > For example, do some changes to the loop-cond-split-1.c,
> > 
> > int ga;
> > extern int a;
> > extern int b;
> > extern int c;
> > 
> > void test1 (int n) {
> >   int i;
> >   for (i = 0; i < n; i = inc (i)). {
> >       if (a+3 > 0)
> >        break;
> > 
> >       if (ga)
> >         ga = do_something ();
> > 
> >       for (int j = 0; j < n; j++)
> >         {
> >            b+=5;
> >            if (b > c) break;
> >         }
> >     }
> > }
> > 
> > the "if (ga)" will be a new exit edge from loop_copy1 to loop_copy2. 
> > I am not sure whether it is valuable to do semi-invariant loop split to such
> > cases with multiple exits, but obviously the split_loop_on_cond is a special
> > case from split_loop both duplicate loop to 
> >    if (condition1) {loop_copy1} if (condition2) {loop_copy2}
> > The only difference is condition1 is true for split_loop_on_cond.
> > 
> >>
> >> So there's another set of "loop" copying, gimple_duplicate_sese_region,
> >> which doesn't actually require a single exit but a single "important"
> >> exit.  That might match how you treat multiple exits.
> > 
> > gimple_duplicate_sese_region doesn't handle subloops and latches.  Finally,
> > I found that loop_version is still much better
> > than slpeel_tree_duplicate_loop_to_edge_cfg and gimple_duplicate_sese_region
> > since it could handle all cases like multiple exits/subloops, etc.  I did some
> > refactor to the code to introduce loop_version2 to create duplicated loops
> > with two input conditions as attached patch, is this reasonable enough?
> > 
> > I also tried to copy the code in loop_version out of it to don't call loop_version
> > in loop_version2, but it seems useless with many duplicate code and NOT get rid
> > of creating "if (condition1) {loop_copy1}" at first?
> > 
> > 
> 
> The previous attached patch 0001-Add-loop_version2-to-support-loop-transformation-wit.patch
> had issues when connecting two loops, reason is split_loop connect_loops at *exit* edge,
> do_split_loop_on_cond connect_loops at latch_edge.  So they have different CFGs even
> both with two conditions :(.  It seems not so that useful to also define another new function
> 'connect_loops_at_latch' given the limited usage in loop split only?  And the loop_version2
> also looks not so general again ...

All I wanted to say is that none of the current high-level APIs we have
are a 1:1 match for loop splitting and that they in fact might do
stuff that's unnecessary or counter-productive.  If inventing a new API
doesn't sound appealing then maybe refactoring an existing
(maybe loop_version) to expose re-usable pieces would make more sense.

For example loop_version currently does lv_adjust_loop_entry_edge
before it loopifys the copy inserted on the header.  I wonder if
the condition generation can be done later and thus we can
have three pieces - 1) duplicating the loop on the entry edge,
2) adjusting the CFG to insert a condition branching to either loop,
3) from loopify extract the scale_loop_frequencies bits (in fact
loopify is only used from within cfgloopmanip.c)

That said, it shouldn't be a requirement to do all this to fix the
profile for loop splitting it's just that the current situation
does not help understanding of how the adjustment works.

Richard.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix loop split incorrect count and probability
  2021-10-21 10:55                   ` Richard Biener
@ 2021-10-26  5:40                     ` Xionghu Luo
  2021-10-26 11:59                       ` Richard Biener
  0 siblings, 1 reply; 20+ messages in thread
From: Xionghu Luo @ 2021-10-26  5:40 UTC (permalink / raw)
  To: Richard Biener; +Cc: segher, wschmidt, linkw, gcc-patches, hubicka

[-- Attachment #1: Type: text/plain, Size: 15904 bytes --]



On 2021/10/21 18:55, Richard Biener wrote:
> On Thu, 21 Oct 2021, Xionghu Luo wrote:
> 
>>
>>
>> On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote:
>>>
>>>
>>> On 2021/9/23 20:17, Richard Biener wrote:
>>>> On Wed, 22 Sep 2021, Xionghu Luo wrote:
>>>>
>>>>>
>>>>>
>>>>> On 2021/8/11 17:16, Richard Biener wrote:
>>>>>> On Wed, 11 Aug 2021, Xionghu Luo wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2021/8/10 22:47, Richard Biener wrote:
>>>>>>>> On Mon, 9 Aug 2021, Xionghu Luo wrote:
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> On 2021/8/6 19:46, Richard Biener wrote:
>>>>>>>>>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
>>>>>>>>>>
>>>>>>>>>>> loop split condition is moved between loop1 and loop2, the split bb's
>>>>>>>>>>> count and probability should also be duplicated instead of (100% vs
>>>>>>>>>>> INV),
>>>>>>>>>>> secondly, the original loop1 and loop2 count need be propotional from
>>>>>>>>>>> the
>>>>>>>>>>> original loop.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> diff base/loop-cond-split-1.c.151t.lsplit
>>>>>>>>>>> patched/loop-cond-split-1.c.151t.lsplit:
>>>>>>>>>>> ...
>>>>>>>>>>>       int prephitmp_16;
>>>>>>>>>>>       int prephitmp_25;
>>>>>>>>>>>
>>>>>>>>>>>       <bb 2> [local count: 118111600]:
>>>>>>>>>>>       if (n_7(D) > 0)
>>>>>>>>>>>         goto <bb 4>; [89.00%]
>>>>>>>>>>>       else
>>>>>>>>>>>         goto <bb 3>; [11.00%]
>>>>>>>>>>>
>>>>>>>>>>>       <bb 3> [local count: 118111600]:
>>>>>>>>>>>       return;
>>>>>>>>>>>
>>>>>>>>>>>       <bb 4> [local count: 105119324]:
>>>>>>>>>>>       pretmp_3 = ga;
>>>>>>>>>>>
>>>>>>>>>>> -  <bb 5> [local count: 955630225]:
>>>>>>>>>>> +  <bb 5> [local count: 315357973]:
>>>>>>>>>>>       # i_13 = PHI <i_10(20), 0(4)>
>>>>>>>>>>>       # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
>>>>>>>>>>>       if (prephitmp_12 != 0)
>>>>>>>>>>>         goto <bb 6>; [33.00%]
>>>>>>>>>>>       else
>>>>>>>>>>>         goto <bb 7>; [67.00%]
>>>>>>>>>>>
>>>>>>>>>>> -  <bb 6> [local count: 315357972]:
>>>>>>>>>>> +  <bb 6> [local count: 104068130]:
>>>>>>>>>>>       _2 = do_something ();
>>>>>>>>>>>       ga = _2;
>>>>>>>>>>>
>>>>>>>>>>> -  <bb 7> [local count: 955630225]:
>>>>>>>>>>> +  <bb 7> [local count: 315357973]:
>>>>>>>>>>>       # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
>>>>>>>>>>>       i_10 = inc (i_13);
>>>>>>>>>>>       if (n_7(D) > i_10)
>>>>>>>>>>>         goto <bb 21>; [89.00%]
>>>>>>>>>>>       else
>>>>>>>>>>>         goto <bb 11>; [11.00%]
>>>>>>>>>>>
>>>>>>>>>>>       <bb 11> [local count: 105119324]:
>>>>>>>>>>>       goto <bb 3>; [100.00%]
>>>>>>>>>>>
>>>>>>>>>>> -  <bb 21> [local count: 850510901]:
>>>>>>>>>>> +  <bb 21> [local count: 280668596]:
>>>>>>>>>>>       if (prephitmp_12 != 0)
>>>>>>>>>>> -    goto <bb 20>; [100.00%]
>>>>>>>>>>> +    goto <bb 20>; [33.00%]
>>>>>>>>>>>       else
>>>>>>>>>>> -    goto <bb 19>; [INV]
>>>>>>>>>>> +    goto <bb 19>; [67.00%]
>>>>>>>>>>>
>>>>>>>>>>> -  <bb 20> [local count: 850510901]:
>>>>>>>>>>> +  <bb 20> [local count: 280668596]:
>>>>>>>>>>>       goto <bb 5>; [100.00%]
>>>>>>>>>>>
>>>>>>>>>>> -  <bb 19> [count: 0]:
>>>>>>>>>>> +  <bb 19> [local count: 70429947]:
>>>>>>>>>>>       # i_23 = PHI <i_10(21)>
>>>>>>>>>>>       # prephitmp_25 = PHI <prephitmp_5(21)>
>>>>>>>>>>>
>>>>>>>>>>> -  <bb 12> [local count: 955630225]:
>>>>>>>>>>> +  <bb 12> [local count: 640272252]:
>>>>>>>>>>>       # i_15 = PHI <i_23(19), i_22(16)>
>>>>>>>>>>>       # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
>>>>>>>>>>>       i_22 = inc (i_15);
>>>>>>>>>>>       if (n_7(D) > i_22)
>>>>>>>>>>>         goto <bb 16>; [89.00%]
>>>>>>>>>>>       else
>>>>>>>>>>>         goto <bb 11>; [11.00%]
>>>>>>>>>>>
>>>>>>>>>>> -  <bb 16> [local count: 850510901]:
>>>>>>>>>>> +  <bb 16> [local count: 569842305]:
>>>>>>>>>>>       goto <bb 12>; [100.00%]
>>>>>>>>>>>
>>>>>>>>>>>     }
>>>>>>>>>>>
>>>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>>>
>>>>>>>>>>>  * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
>>>>>>>>>>>  (do_split_loop_on_cond): Likewise.
>>>>>>>>>>> ---
>>>>>>>>>>>     gcc/tree-ssa-loop-split.c | 16 ++++++++--------
>>>>>>>>>>>     1 file changed, 8 insertions(+), 8 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>>>>>>>>>>> index 3a09bbc39e5..8e5a7ded0f7 100644
>>>>>>>>>>> --- a/gcc/tree-ssa-loop-split.c
>>>>>>>>>>> +++ b/gcc/tree-ssa-loop-split.c
>>>>>>>>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
>>>>>>>>>>>      basic_block cond_bb;
>>>>>>>>>
>>>>>>>>>    	if (!initial_true)
>>>>>>>>> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>>>>>>>> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>>>>>>>> +
>>>>>>>>> +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
>>>>>>>>> +			   ? EDGE_SUCC (bbs[i], 0)
>>>>>>>>> +			   : EDGE_SUCC (bbs[i], 1);
>>>>>>>>>
>>>>>>>>>>>     
>>>>>>>>>>>     	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
>>>>>>>>>>> -					   profile_probability::always
>>>>>>>>>>> (),
>>>>>>>>>>> -					   profile_probability::always
>>>>>>>>>>> (),
>>>>>>>>>>> -					   profile_probability::always
>>>>>>>>>>> (),
>>>>>>>>>>> -					   profile_probability::always
>>>>>>>>>>> (),
>>>>>>>>>>> +					   true_edge->probability,
>>>>>>>>>>> +
>>>>>>>>>>> true_edge->probability.invert (),
>>>>>>>>>>> +					   true_edge->probability,
>>>>>>>>>>> +
>>>>>>>>>>> true_edge->probability.invert (),
>>>>>>>>>>>             true);
>>>>>>>>>>
>>>>>>>>>> there is no 'true_edge' variable at this point.
>>>>>>>>>
>>>>>>>>> Sorry, missed the above hunk when split the patch.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>      gcc_assert (loop2);
>>>>>>>>>>>     @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1,
>>>>>>>>>>> edge invar_branch)
>>>>>>>>>>>       initialize_original_copy_tables ();
>>>>>>>>>>>     
>>>>>>>>>>>       struct loop *loop2 = loop_version (loop1, boolean_true_node,
>>>>>>>>>>> NULL,
>>>>>>>>>>> -				     profile_probability::always (),
>>>>>>>>>>> -				     profile_probability::never (),
>>>>>>>>>>> -				     profile_probability::always (),
>>>>>>>>>>> -				     profile_probability::always (),
>>>>>>>>>>> +				     invar_branch->probability.invert
>>>>>>>>>>> (),
>>>>>>>>>>> +				     invar_branch->probability,
>>>>>>>>>>> +				     invar_branch->probability.invert
>>>>>>>>>>> (),
>>>>>>>>>>> +				     invar_branch->probability,
>>>>>>>>>>>       				     true);
>>>>>>>>>>>       if (!loop2)
>>>>>>>>>>>         {
>>>>>>>>>>
>>>>>>>>>> The patch introduction seems to talk about do_split_loop_on_cond only.
>>>>>>>>>
>>>>>>>>> split_loop faces similar issue though it sets the two branches to 100% vs
>>>>>>>>> 100%
>>>>>>>>> and no scaling which seems also incorrect.
>>>>>>>>>
>>>>>>>>>> Since loop versioning inserts a condition with the passed probabilities
>>>>>>>>>> but in this case a 'boolean_true_node' condition the then and else
>>>>>>>>>> probabilities passed look correct.  It's just the scaling arguments
>>>>>>>>>> that look wrong?  This loop_version call should get a comment as to
>>>>>>>>>> why we are passing probabilities the way we do.
>>>>>>>>>
>>>>>>>>> This optimization is transforming from:
>>>>>>>>>
>>>>>>>>> for (i = 0; i < n; i = inc (i))
>>>>>>>>>       {
>>>>>>>>>         if (ga)
>>>>>>>>>           ga = do_something ();
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> to:
>>>>>>>>>
>>>>>>>>>     for (i = 0; i < x; i = inc (i))
>>>>>>>>> {
>>>>>>>>>       if (true)
>>>>>>>>>           ga = do_something ();
>>>>>>>>>           if (!ga)
>>>>>>>>>             break;
>>>>>>>>> }
>>>>>>>>>     for (; i < n; i = inc (i))
>>>>>>>>> {
>>>>>>>>>       if (false)
>>>>>>>>>            ga = do_something ();
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> `boolean_true_node` is passed in to make the first loop's condition
>>>>>>>>> statement to be 'true', after returning from loop_version, there is a
>>>>>>>>> piece of code forcing the condition in second loop to be false,
>>>>>>>>> and the original condition is moved from *in loop* to *exit edge*
>>>>>>>>> between loop1 and loop2.
>>>>>>>>
>>>>>>>> Yes, one complication is that we use loop_version but do not retain
>>>>>>>> the CFG it creates.  Something like the vectorizers
>>>>>>>> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
>>>>>>>> but then that code doesn't do any scaling yet.  But then
>>>>>>>> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
>>>>>>>> we could write a variant that simply doesn't mangle the CFG
>>>>>>>> with a new condition switching between both loops but keeps them
>>>>>>>> executed after each other ...
>>>>>>>>
>>>>>>>> As said, this adds to the confusion and some awkwardness.
>>>>>>>
>>>>>>> Then loop_version in loop split requires two types of variant, one
>>>>>>> is to insert condition to loop preheader for 'split_loops' usage,
>>>>>>> another is to insert condition to loop exit for 'do_split_loop_on_cond'
>>>>>>> usage, it needs one extra function to encapsulate these cfg alterations
>>>>>>> from outside to inside.
>>>>>>>
>>>>>>> unswitching only execute one loop as it only moves the invariant condition
>>>>>>> to first loop's pre-header.  While 'split_loops' and
>>>>>>> 'do_split_loop_on_cond'
>>>>>>> may execute both loops no matter the condition is moved to the first loop's
>>>>>>> preheader or exit.
>>>>>>>
>>>>>>> The condition stmt in loop unswitching is invariant, but it is variant
>>>>>>> in loop splitting, that's why loop unswitching execute only one loop
>>>>>>> but loop splitting executes both loops.
>>>>>>>
>>>>>>> Seems we need two condition arguments for loop_version, one for connecting
>>>>>>> loop1 preheader to loop2 preheader, another one for connecting loop1's exit
>>>>>>> to loop2's header?  Then it will be more generic for both unswitching pass
>>>>>>> and splitting pass.  Looks a bit complicated and conflicted with
>>>>>>> loop_version's
>>>>>>> comments:
>>>>>>>
>>>>>>> /* Main entry point for Loop Versioning transformation.
>>>>>>>
>>>>>>>     This transformation given a condition and a loop, creates
>>>>>>>     -if (condition) { loop_copy1 } else { loop_copy2 }, ... */
>>>>>>>
>>>>>>>
>>>>>>> And this only works for loop split usage, those many other places
>>>>>>> doesn't use loop_version like this?
>>>>>>
>>>>>> Yes, as said if you don't want the above CFG then you probably
>>>>>> shouldn't use loop_version but instead use its building blocks
>>>>>> (and some refactoring of loop_version can make that easier).
>>>>>>
>>>>>> I think splitting wants
>>>>>>
>>>>>>    loop_copy1
>>>>>>    if (condition)
>>>>>>      loop_copy2
>>>>>>
>>>>>> IMHO it would be good to split 'loopify' into the actual loopification
>>>>>> and the scaling.  Basically make the part of loop_version that
>>>>>> copies the loop on the header edge and creates a loop structure for
>>>>>> it separate.
>>>>>>
>>>>>> loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg
>>>>>> (copy_loop_before).
>>>>>>
>>>>>
>>>>> Unfortunately slpeel_tree_duplicate_loop_to_edge_cfg only supports
>>>>> copying loops with single exit, it would cause many ICEs in it even
>>>>> building GCC stage1 (line 1065, line 1184 due to exit or new_exit
>>>>> is NULL returning from single_exit (loop).).  Seems loop_version is
>>>>> more flexible for loop split.
>>>>
>>>> Hmm, sure - loop_version does not need to do anything special with
>>>> exits since control flow either enters the original or the loop copy.
>>>>
>>>> But slpeel_tree_duplicate_loop_to_edge_cfg intends to create
>>>> control flow that enters _both_ loops, so it needs to have
>>>> an edge from the first loop exit to the second loop entry.
>>>>
>>>> One could extend this to a region copy, copying eventual CFG merges
>>>> of exits and specifying which exit of a SEME region is supposed
>>>> to be connected to the original region entry.
>>>>
>>>> After all that's what loop splitting needs in the end - though not
>>>> sure what exactly it does with more than one exit.
>>>
>>> In tree-ssa-loop-split.c,  split_loop only accepts single exit loop,
>>> the recently added split_loop_on_cond could process multiple exits loop.
>>>
>>> For example, do some changes to the loop-cond-split-1.c,
>>>
>>> int ga;
>>> extern int a;
>>> extern int b;
>>> extern int c;
>>>
>>> void test1 (int n) {
>>>   int i;
>>>   for (i = 0; i < n; i = inc (i)). {
>>>       if (a+3 > 0)
>>>        break;
>>>
>>>       if (ga)
>>>         ga = do_something ();
>>>
>>>       for (int j = 0; j < n; j++)
>>>         {
>>>            b+=5;
>>>            if (b > c) break;
>>>         }
>>>     }
>>> }
>>>
>>> the "if (ga)" will be a new exit edge from loop_copy1 to loop_copy2. 
>>> I am not sure whether it is valuable to do semi-invariant loop split to such
>>> cases with multiple exits, but obviously the split_loop_on_cond is a special
>>> case from split_loop both duplicate loop to 
>>>    if (condition1) {loop_copy1} if (condition2) {loop_copy2}
>>> The only difference is condition1 is true for split_loop_on_cond.
>>>
>>>>
>>>> So there's another set of "loop" copying, gimple_duplicate_sese_region,
>>>> which doesn't actually require a single exit but a single "important"
>>>> exit.  That might match how you treat multiple exits.
>>>
>>> gimple_duplicate_sese_region doesn't handle subloops and latches.  Finally,
>>> I found that loop_version is still much better
>>> than slpeel_tree_duplicate_loop_to_edge_cfg and gimple_duplicate_sese_region
>>> since it could handle all cases like multiple exits/subloops, etc.  I did some
>>> refactor to the code to introduce loop_version2 to create duplicated loops
>>> with two input conditions as attached patch, is this reasonable enough?
>>>
>>> I also tried to copy the code in loop_version out of it to don't call loop_version
>>> in loop_version2, but it seems useless with many duplicate code and NOT get rid
>>> of creating "if (condition1) {loop_copy1}" at first?
>>>
>>>
>>
>> The previous attached patch 0001-Add-loop_version2-to-support-loop-transformation-wit.patch
>> had issues when connecting two loops, reason is split_loop connect_loops at *exit* edge,
>> do_split_loop_on_cond connect_loops at latch_edge.  So they have different CFGs even
>> both with two conditions :(.  It seems not so that useful to also define another new function
>> 'connect_loops_at_latch' given the limited usage in loop split only?  And the loop_version2
>> also looks not so general again ...
> 
> All I wanted to say is that none of the current high-level APIs we have
> are a 1:1 match for loop splitting and that they in fact might do
> stuff that's unnecessary or counter-productive.  If inventing a new API
> doesn't sound appealing then maybe refactoring an existing
> (maybe loop_version) to expose re-usable pieces would make more sense.
> 
> For example loop_version currently does lv_adjust_loop_entry_edge
> before it loopifys the copy inserted on the header.  I wonder if
> the condition generation can be done later and thus we can
> have three pieces - 1) duplicating the loop on the entry edge,
> 2) adjusting the CFG to insert a condition branching to either loop,
> 3) from loopify extract the scale_loop_frequencies bits (in fact
> loopify is only used from within cfgloopmanip.c)
> 
> That said, it shouldn't be a requirement to do all this to fix the
> profile for loop splitting it's just that the current situation
> does not help understanding of how the adjustment works.

The attached patch 0002-Refactor-loop_version.patch is to refactor loop_version
according to your above comments.

loopify is moved before condition generation, scale_loop_frequencies is
extracted out of loopify. 

0001-Fix-loop-split-incorrect-count-and-probability.patch is still the initial
version that fixes the probability and frequency issues in loop split, just a
repeat, both passed regression test on P8LE, OK for master?


> 
> Richard.
> 

-- 
Thanks,
Xionghu

[-- Attachment #2: 0001-Fix-loop-split-incorrect-count-and-probability.patch --]
[-- Type: text/plain, Size: 4898 bytes --]

From 2b4a48f05fb6eac83ec6153b8f89496ee8807b88 Mon Sep 17 00:00:00 2001
From: Xionghu Luo <luoxhu@linux.ibm.com>
Date: Tue, 3 Aug 2021 03:44:14 -0500
Subject: [PATCH 1/2] Fix loop split incorrect count and probability

loop split condition is moved between loop1 and loop2, the split bb's
count and probability should also be duplicated instead of (100% vs INV),
secondly, the original loop1 and loop2 count need be propotional from the
original loop.

Regression tested pass, OK for master?

diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
...
   int prephitmp_16;
   int prephitmp_25;

   <bb 2> [local count: 118111600]:
   if (n_7(D) > 0)
     goto <bb 4>; [89.00%]
   else
     goto <bb 3>; [11.00%]

   <bb 3> [local count: 118111600]:
   return;

   <bb 4> [local count: 105119324]:
   pretmp_3 = ga;

-  <bb 5> [local count: 955630225]:
+  <bb 5> [local count: 315357973]:
   # i_13 = PHI <i_10(20), 0(4)>
   # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
   if (prephitmp_12 != 0)
     goto <bb 6>; [33.00%]
   else
     goto <bb 7>; [67.00%]

-  <bb 6> [local count: 315357972]:
+  <bb 6> [local count: 104068130]:
   _2 = do_something ();
   ga = _2;

-  <bb 7> [local count: 955630225]:
+  <bb 7> [local count: 315357973]:
   # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
   i_10 = inc (i_13);
   if (n_7(D) > i_10)
     goto <bb 21>; [89.00%]
   else
     goto <bb 11>; [11.00%]

   <bb 11> [local count: 105119324]:
   goto <bb 3>; [100.00%]

-  <bb 21> [local count: 850510901]:
+  <bb 21> [local count: 280668596]:
   if (prephitmp_12 != 0)
-    goto <bb 20>; [100.00%]
+    goto <bb 20>; [33.00%]
   else
-    goto <bb 19>; [INV]
+    goto <bb 19>; [67.00%]

-  <bb 20> [local count: 850510901]:
+  <bb 20> [local count: 280668596]:
   goto <bb 5>; [100.00%]

-  <bb 19> [count: 0]:
+  <bb 19> [local count: 70429947]:
   # i_23 = PHI <i_10(21)>
   # prephitmp_25 = PHI <prephitmp_5(21)>

-  <bb 12> [local count: 955630225]:
+  <bb 12> [local count: 640272252]:
   # i_15 = PHI <i_23(19), i_22(16)>
   # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
   i_22 = inc (i_15);
   if (n_7(D) > i_22)
     goto <bb 16>; [89.00%]
   else
     goto <bb 11>; [11.00%]

-  <bb 16> [local count: 850510901]:
+  <bb 16> [local count: 569842305]:
   goto <bb 12>; [100.00%]

 }

gcc/ChangeLog:

	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
	(do_split_loop_on_cond): Likewise.
---
 gcc/tree-ssa-loop-split.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index 3f6ad046623..d30782888f3 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -575,7 +575,11 @@ split_loop (class loop *loop1)
 					    stmts2);
 	tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
 	if (!initial_true)
-	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
+	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+
+	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
+			   ? EDGE_SUCC (bbs[i], 0)
+			   : EDGE_SUCC (bbs[i], 1);
 
 	/* Now version the loop, placing loop2 after loop1 connecting
 	   them, and fix up SSA form for that.  */
@@ -583,10 +587,10 @@ split_loop (class loop *loop1)
 	basic_block cond_bb;
 
 	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
-					   profile_probability::always (),
-					   profile_probability::always (),
-					   profile_probability::always (),
-					   profile_probability::always (),
+					   true_edge->probability,
+					   true_edge->probability.invert (),
+					   true_edge->probability,
+					   true_edge->probability.invert (),
 					   true);
 	gcc_assert (loop2);
 
@@ -1486,10 +1490,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
   initialize_original_copy_tables ();
 
   struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
-				     profile_probability::always (),
-				     profile_probability::never (),
-				     profile_probability::always (),
-				     profile_probability::always (),
+				     invar_branch->probability.invert (),
+				     invar_branch->probability,
+				     invar_branch->probability.invert (),
+				     invar_branch->probability,
 				     true);
   if (!loop2)
     {
@@ -1530,6 +1534,9 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
   to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
   to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
 
+  to_loop1->probability = invar_branch->probability.invert ();
+  to_loop2->probability = invar_branch->probability;
+
   /* Due to introduction of a control flow edge from loop1 latch to loop2
      pre-header, we should update PHIs in loop2 to reflect this connection
      between loop1 and loop2.  */
-- 
2.27.0.90.geebb51ba8c


[-- Attachment #3: 0002-Refactor-loop_version.patch --]
[-- Type: text/plain, Size: 4342 bytes --]

From e55a70b8d6fe1b0d2bf0905613acaff087bd40e9 Mon Sep 17 00:00:00 2001
From: Xiong Hu Luo <luoxhu@linux.ibm.com>
Date: Mon, 25 Oct 2021 03:38:55 -0500
Subject: [PATCH 2/2] Refactor loop_version

loop_version currently does lv_adjust_loop_entry_edge
before it loopifys the copy inserted on the header.  This patch moves
the condition generation later and thus we have four pieces to help
understanding of how the adjustment works:
 1) duplicating the loop on the entry edge.
 2) loopify the duplicated new loop.
 3) adjusting the CFG to insert a condition branching to either loop
 with lv_adjust_loop_entry_edge.
 4) From loopify extract the scale_loop_frequencies bits.

I also removed some piece of code seems obviously useless which is not
completely sure:
 - redirect_all_edges since it is false and loopify only called once.
 - extract_cond_bb_edges and lv_flush_pending_stmts (false_edge) as the
 edge is not redirected actually.
 - The unused function loopify is still not removed yet.

gcc/ChangeLog:

	* cfgloopmanip.c (loop_version): Refactor loopify to
	loop_version.  Move condition generation after loopify.
---
 gcc/cfgloopmanip.c | 48 ++++++++++++++++++++++++++++------------------
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index 82c242dd720..5d1491b8c1c 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -1681,7 +1681,7 @@ loop_version (class loop *loop,
 	      bool place_after)
 {
   basic_block first_head, second_head;
-  edge entry, latch_edge, true_edge, false_edge;
+  edge entry, latch_edge;
   int irred_flag;
   class loop *nloop;
   basic_block cond_bb;
@@ -1694,7 +1694,7 @@ loop_version (class loop *loop,
   /* Note down head of loop as first_head.  */
   first_head = entry->dest;
 
-  /* Duplicate loop.  */
+  /* 1) Duplicate loop on the entry edge.  */
   if (!cfg_hook_duplicate_loop_to_header_edge (loop, entry, 1,
 					       NULL, NULL, NULL, 0))
     {
@@ -1702,11 +1702,28 @@ loop_version (class loop *loop,
       return NULL;
     }
 
+  /* 2) loopify the duplicated new loop. */
+  latch_edge = single_succ_edge (get_bb_copy (loop->latch));
+  nloop = alloc_loop ();
+  class loop *outer = loop_outer (latch_edge->dest->loop_father);
+  edge new_header_edge = single_pred_edge (get_bb_copy (loop->header));
+  nloop->header = new_header_edge->dest;
+  nloop->latch = latch_edge->src;
+  loop_redirect_edge (latch_edge, nloop->header);
+
+  /* Compute new loop.  */
+  add_loop (nloop, outer);
+  copy_loop_info (loop, nloop);
+  set_loop_copy (loop, nloop);
+
+  /* loopify redirected latch_edge. Update its PENDING_STMTS.  */
+  lv_flush_pending_stmts (latch_edge);
+
   /* After duplication entry edge now points to new loop head block.
      Note down new head as second_head.  */
   second_head = entry->dest;
 
-  /* Split loop entry edge and insert new block with cond expr.  */
+  /* 3) Split loop entry edge and insert new block with cond expr.  */
   cond_bb =  lv_adjust_loop_entry_edge (first_head, second_head,
 					entry, cond_expr, then_prob, else_prob);
   if (condition_bb)
@@ -1718,24 +1735,17 @@ loop_version (class loop *loop,
       return NULL;
     }
 
-  latch_edge = single_succ_edge (get_bb_copy (loop->latch));
-
-  extract_cond_bb_edges (cond_bb, &true_edge, &false_edge);
-  nloop = loopify (latch_edge,
-		   single_pred_edge (get_bb_copy (loop->header)),
-		   cond_bb, true_edge, false_edge,
-		   false /* Do not redirect all edges.  */,
-		   then_scale, else_scale);
-
-  copy_loop_info (loop, nloop);
-  set_loop_copy (loop, nloop);
+  /* Add cond_bb to appropriate loop.  */
+  if (cond_bb->loop_father)
+    remove_bb_from_loops (cond_bb);
+  add_bb_to_loop (cond_bb, outer);
 
-  /* loopify redirected latch_edge. Update its PENDING_STMTS.  */
-  lv_flush_pending_stmts (latch_edge);
+  /* 4) Scale the original loop and new loop frequency.  */
+  scale_loop_frequencies (loop, then_scale);
+  scale_loop_frequencies (nloop, else_scale);
+  update_dominators_in_loop (loop);
+  update_dominators_in_loop (nloop);
 
-  /* loopify redirected condition_bb's succ edge. Update its PENDING_STMTS.  */
-  extract_cond_bb_edges (cond_bb, &true_edge, &false_edge);
-  lv_flush_pending_stmts (false_edge);
   /* Adjust irreducible flag.  */
   if (irred_flag)
     {
-- 
2.27.0.90.geebb51ba8c


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix loop split incorrect count and probability
  2021-10-26  5:40                     ` Xionghu Luo
@ 2021-10-26 11:59                       ` Richard Biener
  2021-10-26 12:19                         ` Jan Hubicka
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Biener @ 2021-10-26 11:59 UTC (permalink / raw)
  To: Xionghu Luo; +Cc: segher, wschmidt, linkw, gcc-patches, hubicka

On Tue, 26 Oct 2021, Xionghu Luo wrote:

> 
> 
> On 2021/10/21 18:55, Richard Biener wrote:
> > On Thu, 21 Oct 2021, Xionghu Luo wrote:
> > 
> >>
> >>
> >> On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote:
> >>>
> >>>
> >>> On 2021/9/23 20:17, Richard Biener wrote:
> >>>> On Wed, 22 Sep 2021, Xionghu Luo wrote:
> >>>>
> >>>>>
> >>>>>
> >>>>> On 2021/8/11 17:16, Richard Biener wrote:
> >>>>>> On Wed, 11 Aug 2021, Xionghu Luo wrote:
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 2021/8/10 22:47, Richard Biener wrote:
> >>>>>>>> On Mon, 9 Aug 2021, Xionghu Luo wrote:
> >>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> On 2021/8/6 19:46, Richard Biener wrote:
> >>>>>>>>>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
> >>>>>>>>>>
> >>>>>>>>>>> loop split condition is moved between loop1 and loop2, the split bb's
> >>>>>>>>>>> count and probability should also be duplicated instead of (100% vs
> >>>>>>>>>>> INV),
> >>>>>>>>>>> secondly, the original loop1 and loop2 count need be propotional from
> >>>>>>>>>>> the
> >>>>>>>>>>> original loop.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> diff base/loop-cond-split-1.c.151t.lsplit
> >>>>>>>>>>> patched/loop-cond-split-1.c.151t.lsplit:
> >>>>>>>>>>> ...
> >>>>>>>>>>>       int prephitmp_16;
> >>>>>>>>>>>       int prephitmp_25;
> >>>>>>>>>>>
> >>>>>>>>>>>       <bb 2> [local count: 118111600]:
> >>>>>>>>>>>       if (n_7(D) > 0)
> >>>>>>>>>>>         goto <bb 4>; [89.00%]
> >>>>>>>>>>>       else
> >>>>>>>>>>>         goto <bb 3>; [11.00%]
> >>>>>>>>>>>
> >>>>>>>>>>>       <bb 3> [local count: 118111600]:
> >>>>>>>>>>>       return;
> >>>>>>>>>>>
> >>>>>>>>>>>       <bb 4> [local count: 105119324]:
> >>>>>>>>>>>       pretmp_3 = ga;
> >>>>>>>>>>>
> >>>>>>>>>>> -  <bb 5> [local count: 955630225]:
> >>>>>>>>>>> +  <bb 5> [local count: 315357973]:
> >>>>>>>>>>>       # i_13 = PHI <i_10(20), 0(4)>
> >>>>>>>>>>>       # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
> >>>>>>>>>>>       if (prephitmp_12 != 0)
> >>>>>>>>>>>         goto <bb 6>; [33.00%]
> >>>>>>>>>>>       else
> >>>>>>>>>>>         goto <bb 7>; [67.00%]
> >>>>>>>>>>>
> >>>>>>>>>>> -  <bb 6> [local count: 315357972]:
> >>>>>>>>>>> +  <bb 6> [local count: 104068130]:
> >>>>>>>>>>>       _2 = do_something ();
> >>>>>>>>>>>       ga = _2;
> >>>>>>>>>>>
> >>>>>>>>>>> -  <bb 7> [local count: 955630225]:
> >>>>>>>>>>> +  <bb 7> [local count: 315357973]:
> >>>>>>>>>>>       # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
> >>>>>>>>>>>       i_10 = inc (i_13);
> >>>>>>>>>>>       if (n_7(D) > i_10)
> >>>>>>>>>>>         goto <bb 21>; [89.00%]
> >>>>>>>>>>>       else
> >>>>>>>>>>>         goto <bb 11>; [11.00%]
> >>>>>>>>>>>
> >>>>>>>>>>>       <bb 11> [local count: 105119324]:
> >>>>>>>>>>>       goto <bb 3>; [100.00%]
> >>>>>>>>>>>
> >>>>>>>>>>> -  <bb 21> [local count: 850510901]:
> >>>>>>>>>>> +  <bb 21> [local count: 280668596]:
> >>>>>>>>>>>       if (prephitmp_12 != 0)
> >>>>>>>>>>> -    goto <bb 20>; [100.00%]
> >>>>>>>>>>> +    goto <bb 20>; [33.00%]
> >>>>>>>>>>>       else
> >>>>>>>>>>> -    goto <bb 19>; [INV]
> >>>>>>>>>>> +    goto <bb 19>; [67.00%]
> >>>>>>>>>>>
> >>>>>>>>>>> -  <bb 20> [local count: 850510901]:
> >>>>>>>>>>> +  <bb 20> [local count: 280668596]:
> >>>>>>>>>>>       goto <bb 5>; [100.00%]
> >>>>>>>>>>>
> >>>>>>>>>>> -  <bb 19> [count: 0]:
> >>>>>>>>>>> +  <bb 19> [local count: 70429947]:
> >>>>>>>>>>>       # i_23 = PHI <i_10(21)>
> >>>>>>>>>>>       # prephitmp_25 = PHI <prephitmp_5(21)>
> >>>>>>>>>>>
> >>>>>>>>>>> -  <bb 12> [local count: 955630225]:
> >>>>>>>>>>> +  <bb 12> [local count: 640272252]:
> >>>>>>>>>>>       # i_15 = PHI <i_23(19), i_22(16)>
> >>>>>>>>>>>       # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
> >>>>>>>>>>>       i_22 = inc (i_15);
> >>>>>>>>>>>       if (n_7(D) > i_22)
> >>>>>>>>>>>         goto <bb 16>; [89.00%]
> >>>>>>>>>>>       else
> >>>>>>>>>>>         goto <bb 11>; [11.00%]
> >>>>>>>>>>>
> >>>>>>>>>>> -  <bb 16> [local count: 850510901]:
> >>>>>>>>>>> +  <bb 16> [local count: 569842305]:
> >>>>>>>>>>>       goto <bb 12>; [100.00%]
> >>>>>>>>>>>
> >>>>>>>>>>>     }
> >>>>>>>>>>>
> >>>>>>>>>>> gcc/ChangeLog:
> >>>>>>>>>>>
> >>>>>>>>>>>  * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> >>>>>>>>>>>  (do_split_loop_on_cond): Likewise.
> >>>>>>>>>>> ---
> >>>>>>>>>>>     gcc/tree-ssa-loop-split.c | 16 ++++++++--------
> >>>>>>>>>>>     1 file changed, 8 insertions(+), 8 deletions(-)
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> >>>>>>>>>>> index 3a09bbc39e5..8e5a7ded0f7 100644
> >>>>>>>>>>> --- a/gcc/tree-ssa-loop-split.c
> >>>>>>>>>>> +++ b/gcc/tree-ssa-loop-split.c
> >>>>>>>>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
> >>>>>>>>>>>      basic_block cond_bb;
> >>>>>>>>>
> >>>>>>>>>    	if (!initial_true)
> >>>>>>>>> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> >>>>>>>>> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> >>>>>>>>> +
> >>>>>>>>> +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> >>>>>>>>> +			   ? EDGE_SUCC (bbs[i], 0)
> >>>>>>>>> +			   : EDGE_SUCC (bbs[i], 1);
> >>>>>>>>>
> >>>>>>>>>>>     
> >>>>>>>>>>>     	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> >>>>>>>>>>> -					   profile_probability::always
> >>>>>>>>>>> (),
> >>>>>>>>>>> -					   profile_probability::always
> >>>>>>>>>>> (),
> >>>>>>>>>>> -					   profile_probability::always
> >>>>>>>>>>> (),
> >>>>>>>>>>> -					   profile_probability::always
> >>>>>>>>>>> (),
> >>>>>>>>>>> +					   true_edge->probability,
> >>>>>>>>>>> +
> >>>>>>>>>>> true_edge->probability.invert (),
> >>>>>>>>>>> +					   true_edge->probability,
> >>>>>>>>>>> +
> >>>>>>>>>>> true_edge->probability.invert (),
> >>>>>>>>>>>             true);
> >>>>>>>>>>
> >>>>>>>>>> there is no 'true_edge' variable at this point.
> >>>>>>>>>
> >>>>>>>>> Sorry, missed the above hunk when split the patch.
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>      gcc_assert (loop2);
> >>>>>>>>>>>     @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1,
> >>>>>>>>>>> edge invar_branch)
> >>>>>>>>>>>       initialize_original_copy_tables ();
> >>>>>>>>>>>     
> >>>>>>>>>>>       struct loop *loop2 = loop_version (loop1, boolean_true_node,
> >>>>>>>>>>> NULL,
> >>>>>>>>>>> -				     profile_probability::always (),
> >>>>>>>>>>> -				     profile_probability::never (),
> >>>>>>>>>>> -				     profile_probability::always (),
> >>>>>>>>>>> -				     profile_probability::always (),
> >>>>>>>>>>> +				     invar_branch->probability.invert
> >>>>>>>>>>> (),
> >>>>>>>>>>> +				     invar_branch->probability,
> >>>>>>>>>>> +				     invar_branch->probability.invert
> >>>>>>>>>>> (),
> >>>>>>>>>>> +				     invar_branch->probability,
> >>>>>>>>>>>       				     true);
> >>>>>>>>>>>       if (!loop2)
> >>>>>>>>>>>         {
> >>>>>>>>>>
> >>>>>>>>>> The patch introduction seems to talk about do_split_loop_on_cond only.
> >>>>>>>>>
> >>>>>>>>> split_loop faces similar issue though it sets the two branches to 100% vs
> >>>>>>>>> 100%
> >>>>>>>>> and no scaling which seems also incorrect.
> >>>>>>>>>
> >>>>>>>>>> Since loop versioning inserts a condition with the passed probabilities
> >>>>>>>>>> but in this case a 'boolean_true_node' condition the then and else
> >>>>>>>>>> probabilities passed look correct.  It's just the scaling arguments
> >>>>>>>>>> that look wrong?  This loop_version call should get a comment as to
> >>>>>>>>>> why we are passing probabilities the way we do.
> >>>>>>>>>
> >>>>>>>>> This optimization is transforming from:
> >>>>>>>>>
> >>>>>>>>> for (i = 0; i < n; i = inc (i))
> >>>>>>>>>       {
> >>>>>>>>>         if (ga)
> >>>>>>>>>           ga = do_something ();
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> to:
> >>>>>>>>>
> >>>>>>>>>     for (i = 0; i < x; i = inc (i))
> >>>>>>>>> {
> >>>>>>>>>       if (true)
> >>>>>>>>>           ga = do_something ();
> >>>>>>>>>           if (!ga)
> >>>>>>>>>             break;
> >>>>>>>>> }
> >>>>>>>>>     for (; i < n; i = inc (i))
> >>>>>>>>> {
> >>>>>>>>>       if (false)
> >>>>>>>>>            ga = do_something ();
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> `boolean_true_node` is passed in to make the first loop's condition
> >>>>>>>>> statement to be 'true', after returning from loop_version, there is a
> >>>>>>>>> piece of code forcing the condition in second loop to be false,
> >>>>>>>>> and the original condition is moved from *in loop* to *exit edge*
> >>>>>>>>> between loop1 and loop2.
> >>>>>>>>
> >>>>>>>> Yes, one complication is that we use loop_version but do not retain
> >>>>>>>> the CFG it creates.  Something like the vectorizers
> >>>>>>>> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
> >>>>>>>> but then that code doesn't do any scaling yet.  But then
> >>>>>>>> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
> >>>>>>>> we could write a variant that simply doesn't mangle the CFG
> >>>>>>>> with a new condition switching between both loops but keeps them
> >>>>>>>> executed after each other ...
> >>>>>>>>
> >>>>>>>> As said, this adds to the confusion and some awkwardness.
> >>>>>>>
> >>>>>>> Then loop_version in loop split requires two types of variant, one
> >>>>>>> is to insert condition to loop preheader for 'split_loops' usage,
> >>>>>>> another is to insert condition to loop exit for 'do_split_loop_on_cond'
> >>>>>>> usage, it needs one extra function to encapsulate these cfg alterations
> >>>>>>> from outside to inside.
> >>>>>>>
> >>>>>>> unswitching only execute one loop as it only moves the invariant condition
> >>>>>>> to first loop's pre-header.  While 'split_loops' and
> >>>>>>> 'do_split_loop_on_cond'
> >>>>>>> may execute both loops no matter the condition is moved to the first loop's
> >>>>>>> preheader or exit.
> >>>>>>>
> >>>>>>> The condition stmt in loop unswitching is invariant, but it is variant
> >>>>>>> in loop splitting, that's why loop unswitching execute only one loop
> >>>>>>> but loop splitting executes both loops.
> >>>>>>>
> >>>>>>> Seems we need two condition arguments for loop_version, one for connecting
> >>>>>>> loop1 preheader to loop2 preheader, another one for connecting loop1's exit
> >>>>>>> to loop2's header?  Then it will be more generic for both unswitching pass
> >>>>>>> and splitting pass.  Looks a bit complicated and conflicted with
> >>>>>>> loop_version's
> >>>>>>> comments:
> >>>>>>>
> >>>>>>> /* Main entry point for Loop Versioning transformation.
> >>>>>>>
> >>>>>>>     This transformation given a condition and a loop, creates
> >>>>>>>     -if (condition) { loop_copy1 } else { loop_copy2 }, ... */
> >>>>>>>
> >>>>>>>
> >>>>>>> And this only works for loop split usage, those many other places
> >>>>>>> doesn't use loop_version like this?
> >>>>>>
> >>>>>> Yes, as said if you don't want the above CFG then you probably
> >>>>>> shouldn't use loop_version but instead use its building blocks
> >>>>>> (and some refactoring of loop_version can make that easier).
> >>>>>>
> >>>>>> I think splitting wants
> >>>>>>
> >>>>>>    loop_copy1
> >>>>>>    if (condition)
> >>>>>>      loop_copy2
> >>>>>>
> >>>>>> IMHO it would be good to split 'loopify' into the actual loopification
> >>>>>> and the scaling.  Basically make the part of loop_version that
> >>>>>> copies the loop on the header edge and creates a loop structure for
> >>>>>> it separate.
> >>>>>>
> >>>>>> loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg
> >>>>>> (copy_loop_before).
> >>>>>>
> >>>>>
> >>>>> Unfortunately slpeel_tree_duplicate_loop_to_edge_cfg only supports
> >>>>> copying loops with single exit, it would cause many ICEs in it even
> >>>>> building GCC stage1 (line 1065, line 1184 due to exit or new_exit
> >>>>> is NULL returning from single_exit (loop).).  Seems loop_version is
> >>>>> more flexible for loop split.
> >>>>
> >>>> Hmm, sure - loop_version does not need to do anything special with
> >>>> exits since control flow either enters the original or the loop copy.
> >>>>
> >>>> But slpeel_tree_duplicate_loop_to_edge_cfg intends to create
> >>>> control flow that enters _both_ loops, so it needs to have
> >>>> an edge from the first loop exit to the second loop entry.
> >>>>
> >>>> One could extend this to a region copy, copying eventual CFG merges
> >>>> of exits and specifying which exit of a SEME region is supposed
> >>>> to be connected to the original region entry.
> >>>>
> >>>> After all that's what loop splitting needs in the end - though not
> >>>> sure what exactly it does with more than one exit.
> >>>
> >>> In tree-ssa-loop-split.c,  split_loop only accepts single exit loop,
> >>> the recently added split_loop_on_cond could process multiple exits loop.
> >>>
> >>> For example, do some changes to the loop-cond-split-1.c,
> >>>
> >>> int ga;
> >>> extern int a;
> >>> extern int b;
> >>> extern int c;
> >>>
> >>> void test1 (int n) {
> >>>   int i;
> >>>   for (i = 0; i < n; i = inc (i)). {
> >>>       if (a+3 > 0)
> >>>        break;
> >>>
> >>>       if (ga)
> >>>         ga = do_something ();
> >>>
> >>>       for (int j = 0; j < n; j++)
> >>>         {
> >>>            b+=5;
> >>>            if (b > c) break;
> >>>         }
> >>>     }
> >>> }
> >>>
> >>> the "if (ga)" will be a new exit edge from loop_copy1 to loop_copy2. 
> >>> I am not sure whether it is valuable to do semi-invariant loop split to such
> >>> cases with multiple exits, but obviously the split_loop_on_cond is a special
> >>> case from split_loop both duplicate loop to 
> >>>    if (condition1) {loop_copy1} if (condition2) {loop_copy2}
> >>> The only difference is condition1 is true for split_loop_on_cond.
> >>>
> >>>>
> >>>> So there's another set of "loop" copying, gimple_duplicate_sese_region,
> >>>> which doesn't actually require a single exit but a single "important"
> >>>> exit.  That might match how you treat multiple exits.
> >>>
> >>> gimple_duplicate_sese_region doesn't handle subloops and latches.  Finally,
> >>> I found that loop_version is still much better
> >>> than slpeel_tree_duplicate_loop_to_edge_cfg and gimple_duplicate_sese_region
> >>> since it could handle all cases like multiple exits/subloops, etc.  I did some
> >>> refactor to the code to introduce loop_version2 to create duplicated loops
> >>> with two input conditions as attached patch, is this reasonable enough?
> >>>
> >>> I also tried to copy the code in loop_version out of it to don't call loop_version
> >>> in loop_version2, but it seems useless with many duplicate code and NOT get rid
> >>> of creating "if (condition1) {loop_copy1}" at first?
> >>>
> >>>
> >>
> >> The previous attached patch 0001-Add-loop_version2-to-support-loop-transformation-wit.patch
> >> had issues when connecting two loops, reason is split_loop connect_loops at *exit* edge,
> >> do_split_loop_on_cond connect_loops at latch_edge.  So they have different CFGs even
> >> both with two conditions :(.  It seems not so that useful to also define another new function
> >> 'connect_loops_at_latch' given the limited usage in loop split only?  And the loop_version2
> >> also looks not so general again ...
> > 
> > All I wanted to say is that none of the current high-level APIs we have
> > are a 1:1 match for loop splitting and that they in fact might do
> > stuff that's unnecessary or counter-productive.  If inventing a new API
> > doesn't sound appealing then maybe refactoring an existing
> > (maybe loop_version) to expose re-usable pieces would make more sense.
> > 
> > For example loop_version currently does lv_adjust_loop_entry_edge
> > before it loopifys the copy inserted on the header.  I wonder if
> > the condition generation can be done later and thus we can
> > have three pieces - 1) duplicating the loop on the entry edge,
> > 2) adjusting the CFG to insert a condition branching to either loop,
> > 3) from loopify extract the scale_loop_frequencies bits (in fact
> > loopify is only used from within cfgloopmanip.c)
> > 
> > That said, it shouldn't be a requirement to do all this to fix the
> > profile for loop splitting it's just that the current situation
> > does not help understanding of how the adjustment works.
> 
> The attached patch 0002-Refactor-loop_version.patch is to refactor loop_version
> according to your above comments.
> 
> loopify is moved before condition generation, scale_loop_frequencies is
> extracted out of loopify. 

I think that's a good improvement (and if loopify is unused after the
patch it should be removed together with it).  The loop copying part
could then be a new clone_loop_to_header_edge function, or loop_copy
(rather than loop_version).

The existing duplicate_loop_to_header_edge function is named badly
since it does duplicate_loop_body_to_header_edge ...

> 0001-Fix-loop-split-incorrect-count-and-probability.patch is still the initial
> version that fixes the probability and frequency issues in loop split, just a
> repeat, both passed regression test on P8LE, OK for master?

That's still the original patch, I don't see any of the comments addressed
and I do not have enough knowledge to approve the patch.

Sorry here, but I really hoped that Honza would eventually chime in.

Richard.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix loop split incorrect count and probability
  2021-10-26 11:59                       ` Richard Biener
@ 2021-10-26 12:19                         ` Jan Hubicka
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Hubicka @ 2021-10-26 12:19 UTC (permalink / raw)
  To: Richard Biener; +Cc: Xionghu Luo, segher, wschmidt, linkw, gcc-patches

> On Tue, 26 Oct 2021, Xionghu Luo wrote:
> 
> > 
> > 
> > On 2021/10/21 18:55, Richard Biener wrote:
> > > On Thu, 21 Oct 2021, Xionghu Luo wrote:
> > > 
> > >>
> > >>
> > >> On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote:
> > >>>
> > >>>
> > >>> On 2021/9/23 20:17, Richard Biener wrote:
> > >>>> On Wed, 22 Sep 2021, Xionghu Luo wrote:
> > >>>>
> > >>>>>
> > >>>>>
> > >>>>> On 2021/8/11 17:16, Richard Biener wrote:
> > >>>>>> On Wed, 11 Aug 2021, Xionghu Luo wrote:
> > >>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 2021/8/10 22:47, Richard Biener wrote:
> > >>>>>>>> On Mon, 9 Aug 2021, Xionghu Luo wrote:
> > >>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>>
> > >>>>>>>>> On 2021/8/6 19:46, Richard Biener wrote:
> > >>>>>>>>>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> loop split condition is moved between loop1 and loop2, the split bb's
> > >>>>>>>>>>> count and probability should also be duplicated instead of (100% vs
> > >>>>>>>>>>> INV),
> > >>>>>>>>>>> secondly, the original loop1 and loop2 count need be propotional from
> > >>>>>>>>>>> the
> > >>>>>>>>>>> original loop.
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> diff base/loop-cond-split-1.c.151t.lsplit
> > >>>>>>>>>>> patched/loop-cond-split-1.c.151t.lsplit:
> > >>>>>>>>>>> ...
> > >>>>>>>>>>>       int prephitmp_16;
> > >>>>>>>>>>>       int prephitmp_25;
> > >>>>>>>>>>>
> > >>>>>>>>>>>       <bb 2> [local count: 118111600]:
> > >>>>>>>>>>>       if (n_7(D) > 0)
> > >>>>>>>>>>>         goto <bb 4>; [89.00%]
> > >>>>>>>>>>>       else
> > >>>>>>>>>>>         goto <bb 3>; [11.00%]
> > >>>>>>>>>>>
> > >>>>>>>>>>>       <bb 3> [local count: 118111600]:
> > >>>>>>>>>>>       return;
> > >>>>>>>>>>>
> > >>>>>>>>>>>       <bb 4> [local count: 105119324]:
> > >>>>>>>>>>>       pretmp_3 = ga;
> > >>>>>>>>>>>
> > >>>>>>>>>>> -  <bb 5> [local count: 955630225]:
> > >>>>>>>>>>> +  <bb 5> [local count: 315357973]:
> > >>>>>>>>>>>       # i_13 = PHI <i_10(20), 0(4)>
> > >>>>>>>>>>>       # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
> > >>>>>>>>>>>       if (prephitmp_12 != 0)
> > >>>>>>>>>>>         goto <bb 6>; [33.00%]
> > >>>>>>>>>>>       else
> > >>>>>>>>>>>         goto <bb 7>; [67.00%]
> > >>>>>>>>>>>
> > >>>>>>>>>>> -  <bb 6> [local count: 315357972]:
> > >>>>>>>>>>> +  <bb 6> [local count: 104068130]:
> > >>>>>>>>>>>       _2 = do_something ();
> > >>>>>>>>>>>       ga = _2;
> > >>>>>>>>>>>
> > >>>>>>>>>>> -  <bb 7> [local count: 955630225]:
> > >>>>>>>>>>> +  <bb 7> [local count: 315357973]:
> > >>>>>>>>>>>       # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
> > >>>>>>>>>>>       i_10 = inc (i_13);
> > >>>>>>>>>>>       if (n_7(D) > i_10)
> > >>>>>>>>>>>         goto <bb 21>; [89.00%]
> > >>>>>>>>>>>       else
> > >>>>>>>>>>>         goto <bb 11>; [11.00%]
> > >>>>>>>>>>>
> > >>>>>>>>>>>       <bb 11> [local count: 105119324]:
> > >>>>>>>>>>>       goto <bb 3>; [100.00%]
> > >>>>>>>>>>>
> > >>>>>>>>>>> -  <bb 21> [local count: 850510901]:
> > >>>>>>>>>>> +  <bb 21> [local count: 280668596]:
> > >>>>>>>>>>>       if (prephitmp_12 != 0)
> > >>>>>>>>>>> -    goto <bb 20>; [100.00%]
> > >>>>>>>>>>> +    goto <bb 20>; [33.00%]
> > >>>>>>>>>>>       else
> > >>>>>>>>>>> -    goto <bb 19>; [INV]
> > >>>>>>>>>>> +    goto <bb 19>; [67.00%]
> > >>>>>>>>>>>
> > >>>>>>>>>>> -  <bb 20> [local count: 850510901]:
> > >>>>>>>>>>> +  <bb 20> [local count: 280668596]:
> > >>>>>>>>>>>       goto <bb 5>; [100.00%]
> > >>>>>>>>>>>
> > >>>>>>>>>>> -  <bb 19> [count: 0]:
> > >>>>>>>>>>> +  <bb 19> [local count: 70429947]:
> > >>>>>>>>>>>       # i_23 = PHI <i_10(21)>
> > >>>>>>>>>>>       # prephitmp_25 = PHI <prephitmp_5(21)>
> > >>>>>>>>>>>
> > >>>>>>>>>>> -  <bb 12> [local count: 955630225]:
> > >>>>>>>>>>> +  <bb 12> [local count: 640272252]:
> > >>>>>>>>>>>       # i_15 = PHI <i_23(19), i_22(16)>
> > >>>>>>>>>>>       # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
> > >>>>>>>>>>>       i_22 = inc (i_15);
> > >>>>>>>>>>>       if (n_7(D) > i_22)
> > >>>>>>>>>>>         goto <bb 16>; [89.00%]
> > >>>>>>>>>>>       else
> > >>>>>>>>>>>         goto <bb 11>; [11.00%]
> > >>>>>>>>>>>
> > >>>>>>>>>>> -  <bb 16> [local count: 850510901]:
> > >>>>>>>>>>> +  <bb 16> [local count: 569842305]:
> > >>>>>>>>>>>       goto <bb 12>; [100.00%]
> > >>>>>>>>>>>
> > >>>>>>>>>>>     }
> > >>>>>>>>>>>
> > >>>>>>>>>>> gcc/ChangeLog:
> > >>>>>>>>>>>
> > >>>>>>>>>>>  * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> > >>>>>>>>>>>  (do_split_loop_on_cond): Likewise.
> > >>>>>>>>>>> ---
> > >>>>>>>>>>>     gcc/tree-ssa-loop-split.c | 16 ++++++++--------
> > >>>>>>>>>>>     1 file changed, 8 insertions(+), 8 deletions(-)
> > >>>>>>>>>>>
> > >>>>>>>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> > >>>>>>>>>>> index 3a09bbc39e5..8e5a7ded0f7 100644
> > >>>>>>>>>>> --- a/gcc/tree-ssa-loop-split.c
> > >>>>>>>>>>> +++ b/gcc/tree-ssa-loop-split.c
> > >>>>>>>>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
> > >>>>>>>>>>>      basic_block cond_bb;
> > >>>>>>>>>
> > >>>>>>>>>    	if (!initial_true)
> > >>>>>>>>> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> > >>>>>>>>> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> > >>>>>>>>> +
> > >>>>>>>>> +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> > >>>>>>>>> +			   ? EDGE_SUCC (bbs[i], 0)
> > >>>>>>>>> +			   : EDGE_SUCC (bbs[i], 1);
> > >>>>>>>>>
> > >>>>>>>>>>>     
> > >>>>>>>>>>>     	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> > >>>>>>>>>>> -					   profile_probability::always
> > >>>>>>>>>>> (),
> > >>>>>>>>>>> -					   profile_probability::always
> > >>>>>>>>>>> (),
> > >>>>>>>>>>> -					   profile_probability::always
> > >>>>>>>>>>> (),
> > >>>>>>>>>>> -					   profile_probability::always
> > >>>>>>>>>>> (),
> > >>>>>>>>>>> +					   true_edge->probability,
> > >>>>>>>>>>> +
> > >>>>>>>>>>> true_edge->probability.invert (),
> > >>>>>>>>>>> +					   true_edge->probability,
> > >>>>>>>>>>> +
> > >>>>>>>>>>> true_edge->probability.invert (),
> > >>>>>>>>>>>             true);
> > >>>>>>>>>>
> > >>>>>>>>>> there is no 'true_edge' variable at this point.
> > >>>>>>>>>
> > >>>>>>>>> Sorry, missed the above hunk when split the patch.
> > >>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>>      gcc_assert (loop2);
> > >>>>>>>>>>>     @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1,
> > >>>>>>>>>>> edge invar_branch)
> > >>>>>>>>>>>       initialize_original_copy_tables ();
> > >>>>>>>>>>>     
> > >>>>>>>>>>>       struct loop *loop2 = loop_version (loop1, boolean_true_node,
> > >>>>>>>>>>> NULL,
> > >>>>>>>>>>> -				     profile_probability::always (),
> > >>>>>>>>>>> -				     profile_probability::never (),
> > >>>>>>>>>>> -				     profile_probability::always (),
> > >>>>>>>>>>> -				     profile_probability::always (),
> > >>>>>>>>>>> +				     invar_branch->probability.invert
> > >>>>>>>>>>> (),
> > >>>>>>>>>>> +				     invar_branch->probability,
> > >>>>>>>>>>> +				     invar_branch->probability.invert
> > >>>>>>>>>>> (),
> > >>>>>>>>>>> +				     invar_branch->probability,
> > >>>>>>>>>>>       				     true);
> > >>>>>>>>>>>       if (!loop2)
> > >>>>>>>>>>>         {
> > >>>>>>>>>>
> > >>>>>>>>>> The patch introduction seems to talk about do_split_loop_on_cond only.
> > >>>>>>>>>
> > >>>>>>>>> split_loop faces similar issue though it sets the two branches to 100% vs
> > >>>>>>>>> 100%
> > >>>>>>>>> and no scaling which seems also incorrect.
> > >>>>>>>>>
> > >>>>>>>>>> Since loop versioning inserts a condition with the passed probabilities
> > >>>>>>>>>> but in this case a 'boolean_true_node' condition the then and else
> > >>>>>>>>>> probabilities passed look correct.  It's just the scaling arguments
> > >>>>>>>>>> that look wrong?  This loop_version call should get a comment as to
> > >>>>>>>>>> why we are passing probabilities the way we do.
> > >>>>>>>>>
> > >>>>>>>>> This optimization is transforming from:
> > >>>>>>>>>
> > >>>>>>>>> for (i = 0; i < n; i = inc (i))
> > >>>>>>>>>       {
> > >>>>>>>>>         if (ga)
> > >>>>>>>>>           ga = do_something ();
> > >>>>>>>>> }
> > >>>>>>>>>
> > >>>>>>>>> to:
> > >>>>>>>>>
> > >>>>>>>>>     for (i = 0; i < x; i = inc (i))
> > >>>>>>>>> {
> > >>>>>>>>>       if (true)
> > >>>>>>>>>           ga = do_something ();
> > >>>>>>>>>           if (!ga)
> > >>>>>>>>>             break;
> > >>>>>>>>> }
> > >>>>>>>>>     for (; i < n; i = inc (i))
> > >>>>>>>>> {
> > >>>>>>>>>       if (false)
> > >>>>>>>>>            ga = do_something ();
> > >>>>>>>>> }
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> `boolean_true_node` is passed in to make the first loop's condition
> > >>>>>>>>> statement to be 'true', after returning from loop_version, there is a
> > >>>>>>>>> piece of code forcing the condition in second loop to be false,
> > >>>>>>>>> and the original condition is moved from *in loop* to *exit edge*
> > >>>>>>>>> between loop1 and loop2.
> > >>>>>>>>
> > >>>>>>>> Yes, one complication is that we use loop_version but do not retain
> > >>>>>>>> the CFG it creates.  Something like the vectorizers
> > >>>>>>>> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
> > >>>>>>>> but then that code doesn't do any scaling yet.  But then
> > >>>>>>>> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
> > >>>>>>>> we could write a variant that simply doesn't mangle the CFG
> > >>>>>>>> with a new condition switching between both loops but keeps them
> > >>>>>>>> executed after each other ...
> > >>>>>>>>
> > >>>>>>>> As said, this adds to the confusion and some awkwardness.
> > >>>>>>>
> > >>>>>>> Then loop_version in loop split requires two types of variant, one
> > >>>>>>> is to insert condition to loop preheader for 'split_loops' usage,
> > >>>>>>> another is to insert condition to loop exit for 'do_split_loop_on_cond'
> > >>>>>>> usage, it needs one extra function to encapsulate these cfg alterations
> > >>>>>>> from outside to inside.
> > >>>>>>>
> > >>>>>>> unswitching only execute one loop as it only moves the invariant condition
> > >>>>>>> to first loop's pre-header.  While 'split_loops' and
> > >>>>>>> 'do_split_loop_on_cond'
> > >>>>>>> may execute both loops no matter the condition is moved to the first loop's
> > >>>>>>> preheader or exit.
> > >>>>>>>
> > >>>>>>> The condition stmt in loop unswitching is invariant, but it is variant
> > >>>>>>> in loop splitting, that's why loop unswitching execute only one loop
> > >>>>>>> but loop splitting executes both loops.
> > >>>>>>>
> > >>>>>>> Seems we need two condition arguments for loop_version, one for connecting
> > >>>>>>> loop1 preheader to loop2 preheader, another one for connecting loop1's exit
> > >>>>>>> to loop2's header?  Then it will be more generic for both unswitching pass
> > >>>>>>> and splitting pass.  Looks a bit complicated and conflicted with
> > >>>>>>> loop_version's
> > >>>>>>> comments:
> > >>>>>>>
> > >>>>>>> /* Main entry point for Loop Versioning transformation.
> > >>>>>>>
> > >>>>>>>     This transformation given a condition and a loop, creates
> > >>>>>>>     -if (condition) { loop_copy1 } else { loop_copy2 }, ... */
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> And this only works for loop split usage, those many other places
> > >>>>>>> doesn't use loop_version like this?
> > >>>>>>
> > >>>>>> Yes, as said if you don't want the above CFG then you probably
> > >>>>>> shouldn't use loop_version but instead use its building blocks
> > >>>>>> (and some refactoring of loop_version can make that easier).
> > >>>>>>
> > >>>>>> I think splitting wants
> > >>>>>>
> > >>>>>>    loop_copy1
> > >>>>>>    if (condition)
> > >>>>>>      loop_copy2
> > >>>>>>
> > >>>>>> IMHO it would be good to split 'loopify' into the actual loopification
> > >>>>>> and the scaling.  Basically make the part of loop_version that
> > >>>>>> copies the loop on the header edge and creates a loop structure for
> > >>>>>> it separate.
> > >>>>>>
> > >>>>>> loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg
> > >>>>>> (copy_loop_before).
> > >>>>>>
> > >>>>>
> > >>>>> Unfortunately slpeel_tree_duplicate_loop_to_edge_cfg only supports
> > >>>>> copying loops with single exit, it would cause many ICEs in it even
> > >>>>> building GCC stage1 (line 1065, line 1184 due to exit or new_exit
> > >>>>> is NULL returning from single_exit (loop).).  Seems loop_version is
> > >>>>> more flexible for loop split.
> > >>>>
> > >>>> Hmm, sure - loop_version does not need to do anything special with
> > >>>> exits since control flow either enters the original or the loop copy.
> > >>>>
> > >>>> But slpeel_tree_duplicate_loop_to_edge_cfg intends to create
> > >>>> control flow that enters _both_ loops, so it needs to have
> > >>>> an edge from the first loop exit to the second loop entry.
> > >>>>
> > >>>> One could extend this to a region copy, copying eventual CFG merges
> > >>>> of exits and specifying which exit of a SEME region is supposed
> > >>>> to be connected to the original region entry.
> > >>>>
> > >>>> After all that's what loop splitting needs in the end - though not
> > >>>> sure what exactly it does with more than one exit.
> > >>>
> > >>> In tree-ssa-loop-split.c,  split_loop only accepts single exit loop,
> > >>> the recently added split_loop_on_cond could process multiple exits loop.
> > >>>
> > >>> For example, do some changes to the loop-cond-split-1.c,
> > >>>
> > >>> int ga;
> > >>> extern int a;
> > >>> extern int b;
> > >>> extern int c;
> > >>>
> > >>> void test1 (int n) {
> > >>>   int i;
> > >>>   for (i = 0; i < n; i = inc (i)). {
> > >>>       if (a+3 > 0)
> > >>>        break;
> > >>>
> > >>>       if (ga)
> > >>>         ga = do_something ();
> > >>>
> > >>>       for (int j = 0; j < n; j++)
> > >>>         {
> > >>>            b+=5;
> > >>>            if (b > c) break;
> > >>>         }
> > >>>     }
> > >>> }
> > >>>
> > >>> the "if (ga)" will be a new exit edge from loop_copy1 to loop_copy2. 
> > >>> I am not sure whether it is valuable to do semi-invariant loop split to such
> > >>> cases with multiple exits, but obviously the split_loop_on_cond is a special
> > >>> case from split_loop both duplicate loop to 
> > >>>    if (condition1) {loop_copy1} if (condition2) {loop_copy2}
> > >>> The only difference is condition1 is true for split_loop_on_cond.
> > >>>
> > >>>>
> > >>>> So there's another set of "loop" copying, gimple_duplicate_sese_region,
> > >>>> which doesn't actually require a single exit but a single "important"
> > >>>> exit.  That might match how you treat multiple exits.
> > >>>
> > >>> gimple_duplicate_sese_region doesn't handle subloops and latches.  Finally,
> > >>> I found that loop_version is still much better
> > >>> than slpeel_tree_duplicate_loop_to_edge_cfg and gimple_duplicate_sese_region
> > >>> since it could handle all cases like multiple exits/subloops, etc.  I did some
> > >>> refactor to the code to introduce loop_version2 to create duplicated loops
> > >>> with two input conditions as attached patch, is this reasonable enough?
> > >>>
> > >>> I also tried to copy the code in loop_version out of it to don't call loop_version
> > >>> in loop_version2, but it seems useless with many duplicate code and NOT get rid
> > >>> of creating "if (condition1) {loop_copy1}" at first?
> > >>>
> > >>>
> > >>
> > >> The previous attached patch 0001-Add-loop_version2-to-support-loop-transformation-wit.patch
> > >> had issues when connecting two loops, reason is split_loop connect_loops at *exit* edge,
> > >> do_split_loop_on_cond connect_loops at latch_edge.  So they have different CFGs even
> > >> both with two conditions :(.  It seems not so that useful to also define another new function
> > >> 'connect_loops_at_latch' given the limited usage in loop split only?  And the loop_version2
> > >> also looks not so general again ...
> > > 
> > > All I wanted to say is that none of the current high-level APIs we have
> > > are a 1:1 match for loop splitting and that they in fact might do
> > > stuff that's unnecessary or counter-productive.  If inventing a new API
> > > doesn't sound appealing then maybe refactoring an existing
> > > (maybe loop_version) to expose re-usable pieces would make more sense.
> > > 
> > > For example loop_version currently does lv_adjust_loop_entry_edge
> > > before it loopifys the copy inserted on the header.  I wonder if
> > > the condition generation can be done later and thus we can
> > > have three pieces - 1) duplicating the loop on the entry edge,
> > > 2) adjusting the CFG to insert a condition branching to either loop,
> > > 3) from loopify extract the scale_loop_frequencies bits (in fact
> > > loopify is only used from within cfgloopmanip.c)
> > > 
> > > That said, it shouldn't be a requirement to do all this to fix the
> > > profile for loop splitting it's just that the current situation
> > > does not help understanding of how the adjustment works.
> > 
> > The attached patch 0002-Refactor-loop_version.patch is to refactor loop_version
> > according to your above comments.
> > 
> > loopify is moved before condition generation, scale_loop_frequencies is
> > extracted out of loopify. 
> 
> I think that's a good improvement (and if loopify is unused after the
> patch it should be removed together with it).  The loop copying part
> could then be a new clone_loop_to_header_edge function, or loop_copy
> (rather than loop_version).
> 
> The existing duplicate_loop_to_header_edge function is named badly
> since it does duplicate_loop_body_to_header_edge ...
> 
> > 0001-Fix-loop-split-incorrect-count-and-probability.patch is still the initial
> > version that fixes the probability and frequency issues in loop split, just a
> > repeat, both passed regression test on P8LE, OK for master?
> 
> That's still the original patch, I don't see any of the comments addressed
> and I do not have enough knowledge to approve the patch.
> 
> Sorry here, but I really hoped that Honza would eventually chime in.
Sorry, I got bit lost in the thread. What patch I should look at?
Honza
> 
> Richard.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix loop split incorrect count and probability
  2021-08-11  3:03       ` Feng Xue OS
@ 2021-10-26 13:05         ` Jan Hubicka
  2021-10-27  1:42           ` Xionghu Luo
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Hubicka @ 2021-10-26 13:05 UTC (permalink / raw)
  To: Feng Xue OS
  Cc: Richard Biener, Xionghu Luo, gcc-patches, segher, wschmidt,
	guojiufu, linkw

> >

> That said, likely the profile update cannot be done uniformly
> for all blocks of a loop?

For the loop:

for (i = 0; i < n; i = inc (i))
    {
      if (ga)
        ga = do_something ();
}

to:

  for (i = 0; i < x; i = inc (i))
{
    if (true)
         ga = do_something ();
        if (!ga)
          break;
}
  for (; i < n; i = inc (i))
{
    if (false)
         ga = do_something ();
}

If probability of if (ga) being true is p, then you indeed can scale the
first loop by p and second loop by 1-p.

Imagine that loop has n iterations and it takes m iterations for ga to
become false, then probability of if(ga) is m/n and you get frequencies
with m=n*(m/n) for first loop and n-m=n*(1-n/m) for second loop.

Because the conditional becomes constant true, one needs to scale up the
basic block guarded by the if (true) up by n/m to compensate for the
change.  With that the udpate should be right.
Ideally one can bypass scaling of basic block(s) containing
 ga = do_something () since the scaling first scales down to m/n
and then scale sup to m/n. Which may not combine to noop.
Perhaps one wants to have a parameter specifying basic blocks on which
the scaling is performed while duplicating for this?

Honza
> 
> Richard.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Fix loop split incorrect count and probability
  2021-10-26 13:05         ` Jan Hubicka
@ 2021-10-27  1:42           ` Xionghu Luo
  0 siblings, 0 replies; 20+ messages in thread
From: Xionghu Luo @ 2021-10-27  1:42 UTC (permalink / raw)
  To: Jan Hubicka, Feng Xue OS
  Cc: Richard Biener, gcc-patches, segher, wschmidt, guojiufu, linkw



On 2021/10/26 21:05, Jan Hubicka wrote:
>>>
> 
>> That said, likely the profile update cannot be done uniformly
>> for all blocks of a loop?
> 
> For the loop:
> 
> for (i = 0; i < n; i = inc (i))
>     {
>       if (ga)
>         ga = do_something ();
> }
> 
> to:
> 
>   for (i = 0; i < x; i = inc (i))
> {
>     if (true)
>          ga = do_something ();
>         if (!ga)
>           break;
> }
>   for (; i < n; i = inc (i))
> {
>     if (false)
>          ga = do_something ();
> }
> 
> If probability of if (ga) being true is p, then you indeed can scale the
> first loop by p and second loop by 1-p.
> 
> Imagine that loop has n iterations and it takes m iterations for ga to
> become false, then probability of if(ga) is m/n and you get frequencies
> with m=n*(m/n) for first loop and n-m=n*(1-n/m) for second loop.
> 
> Because the conditional becomes constant true, one needs to scale up the
> basic block guarded by the if (true) up by n/m to compensate for the
> change.  With that the udpate should be right.
> Ideally one can bypass scaling of basic block(s) containing
>  ga = do_something () since the scaling first scales down to m/n
> and then scale sup to m/n. Which may not combine to noop.
> Perhaps one wants to have a parameter specifying basic blocks on which
> the scaling is performed while duplicating for this?

Yes.  This is the patch I tried to fix the issue for the case you
pasted for loop split.

https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576566.html


 5<---------------- 
 | \              |
 6-- 7            |
    |  \          |
    21  11->3     |
    | \           |
    19 20----------
    |
    12<----
    / |   |
3<-11 16-- 

with the patch, Loop 1's bb {5,6,7,21,20} is scaled to 33% * count, 
Loop 2's bb {12,16} is scaled to 66% * count,
especially, probability of edge 21->19 and 21->20 is fixed to 
(33%, 67%) instead of (100%, INV).


-  <bb 5> [local count: 955630225]:
+  <bb 5> [local count: 315357973]:
   # i_13 = PHI <i_10(20), 0(4)>
   # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
   if (prephitmp_12 != 0)
     goto <bb 6>; [33.00%]
   else
     goto <bb 7>; [67.00%]

-  <bb 6> [local count: 315357972]:
+  <bb 6> [local count: 104068130]:
   _2 = do_something ();
   ga = _2;

-  <bb 7> [local count: 955630225]:
+  <bb 7> [local count: 315357973]:
   # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
   i_10 = inc (i_13);
   if (n_7(D) > i_10)
     goto <bb 21>; [89.00%]
   else
     goto <bb 11>; [11.00%]

   <bb 11> [local count: 105119324]:
   goto <bb 3>; [100.00%]

-  <bb 21> [local count: 850510901]:
+  <bb 21> [local count: 280668596]:
   if (prephitmp_12 != 0)
-    goto <bb 20>; [100.00%]
+    goto <bb 20>; [33.00%]
   else
-    goto <bb 19>; [INV]
+    goto <bb 19>; [67.00%]

-  <bb 20> [local count: 850510901]:
+  <bb 20> [local count: 280668596]:
   goto <bb 5>; [100.00%]

-  <bb 19> [count: 0]:
+  <bb 19> [local count: 70429947]:
   # i_23 = PHI <i_10(21)>
   # prephitmp_25 = PHI <prephitmp_5(21)>

-  <bb 12> [local count: 955630225]:
+  <bb 12> [local count: 640272252]:
   # i_15 = PHI <i_23(19), i_22(16)>
   # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
   i_22 = inc (i_15);
   if (n_7(D) > i_22)
     goto <bb 16>; [89.00%]
   else
     goto <bb 11>; [11.00%]

-  <bb 16> [local count: 850510901]:
+  <bb 16> [local count: 569842305]:
   goto <bb 12>; [100.00%]

> 
> Honza
>>
>> Richard.

-- 
Thanks,
Xionghu

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-10-27  1:43 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-03  8:58 [PATCH] Fix loop split incorrect count and probability Xionghu Luo
2021-08-04  2:42 ` Xionghu Luo
2021-08-06 11:46 ` Richard Biener
2021-08-09  2:37   ` Xionghu Luo
2021-08-10 14:47     ` Richard Biener
2021-08-11  3:03       ` Feng Xue OS
2021-10-26 13:05         ` Jan Hubicka
2021-10-27  1:42           ` Xionghu Luo
2021-08-11  8:32       ` Xionghu Luo
2021-08-11  9:16         ` Richard Biener
2021-08-12  3:24           ` Xionghu Luo
2021-09-22  8:40           ` Xionghu Luo
2021-09-23 12:17             ` Richard Biener
2021-10-15  5:51               ` Xionghu Luo
2021-10-21  8:43                 ` Xionghu Luo
2021-10-21 10:55                   ` Richard Biener
2021-10-26  5:40                     ` Xionghu Luo
2021-10-26 11:59                       ` Richard Biener
2021-10-26 12:19                         ` Jan Hubicka
2021-08-09  4:33   ` Feng Xue OS

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).