public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH GCC][2/2]Refine CFG and bound information for split loops
@ 2017-06-14 13:08 Bin Cheng
  2017-06-14 16:05 ` Bernhard Reutner-Fischer
  2017-06-30 16:09 ` Jeff Law
  0 siblings, 2 replies; 4+ messages in thread
From: Bin Cheng @ 2017-06-14 13:08 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 1647 bytes --]

Hi,
Loop split currently generates below control flow graph for split loops:
+
+               .------ guard1  ------.
+               v                     v
+         pre1(loop1)    .---------->pre2(loop2)
+              |         |            |
+        .--->h1         |           h2<----.
+        |     |         |            |     |
+        |    ex1---.    |       .---ex2    |
+        |    /     v    |       |     \    |
+        '---l1 guard2---'       |     l2---'
+                   |            |
+                   |            |
+                   '--->join<---'
+
In which,
+   LOOP2 is the second loop after split, GUARD1 and GUARD2 are the two bbs
+   controling if loop2 should be executed.

Take added test case as an example, the second loop only iterates for 1 time,
as a result, the CFG and loop niter bound information can be refined.  In general,
guard1/guard2 can be folded to true/false if loop2's niters is known at compilation
time.  This patch does such improvement by analyzing and refining niters of
loop2; as well as using that information to simplify CFG.  With this patch,
the second split loop as in test can be completely unrolled by later passes.

Bootstrap and test on x86_64 and AArch64.  Is it OK?

Thanks,
bin
2017-06-12  Bin Cheng  <bin.cheng@arm.com>

	* tree-ssa-loop-split.c (compute_new_first_bound): Compute and
	return bound information for the second split loop.
	(adjust_loop_split): New function.
	(split_loop): Update calls to above functions.

gcc/testsuite/ChangeLog
2017-06-12  Bin Cheng  <bin.cheng@arm.com>

	* gcc.dg/loop-split-1.c: New test.

[-- Attachment #2: 0002-lsplit-refine-cfg-niter-bound-20170601.txt --]
[-- Type: text/plain, Size: 11709 bytes --]

From 61855c74c7db6178008f007198aedee9a03f13e6 Mon Sep 17 00:00:00 2001
From: amker <amker@amker-laptop.(none)>
Date: Sun, 4 Jun 2017 02:26:34 +0800
Subject: [PATCH 2/2] lsplit-refine-cfg-niter-bound-20170601.txt

---
 gcc/testsuite/gcc.dg/loop-split-1.c |  16 +++
 gcc/tree-ssa-loop-split.c           | 197 ++++++++++++++++++++++++++++++++----
 2 files changed, 194 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/loop-split-1.c

diff --git a/gcc/testsuite/gcc.dg/loop-split-1.c b/gcc/testsuite/gcc.dg/loop-split-1.c
new file mode 100644
index 0000000..2fd3e04
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/loop-split-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
+
+int foo (int *a, int *b, int len)
+{
+  int k;
+  for (k = 1; k <= len; k++)
+    {
+      a[k]++;
+
+      if (k < len)
+	b[k]++;
+    }
+}
+
+/* { dg-final { scan-tree-dump "The second split loop iterates at 0 latch times." "lsplit" } } */
diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index f8fe8e6..73c7dc2 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -364,10 +364,9 @@ connect_loops (struct loop *loop1, struct loop *loop2)
 
 /* This returns the new bound for iterations given the original iteration
    space in NITER, an arbitrary new bound BORDER, assumed to be some
-   comparison value with a different IV, the initial value GUARD_INIT of
-   that other IV, and the comparison code GUARD_CODE that compares
-   that other IV with BORDER.  We return an SSA name, and place any
-   necessary statements for that computation into *STMTS.
+   comparison value with a different IV, the initial value GUARD_INIT and
+   STEP of the other IV, and the comparison code GUARD_CODE that compares
+   that other IV with BORDER.
 
    For example for such a loop:
 
@@ -387,28 +386,41 @@ connect_loops (struct loop *loop1, struct loop *loop2)
 
    Depending on the direction of the IVs and if the exit tests
    are strict or non-strict we need to use MIN or MAX,
-   and add or subtract 1.  This routine computes newend above.  */
+   and add or subtract 1.  This routine computes newend above.
+
+   After computing the new bound (on j), we may be able to compare the
+   first loop's iteration space against the original loop's.  If it is
+   comparable at compilation time, we may be able to compute the niter
+   bound of the second loop.  Record the second loop's iteration bound
+   to SECOND_LOOP_NITER_BOUND which has below meaning:
+
+     -3: Don't know anything about the second loop;
+     -2: The second loop must not be executed;
+     -1: The second loop must be executed, but niter bound is unknown;
+      n: The second loop must be executed, niter bound is n (>= 0);
+
+   Note we compute niter bound for the second loop's latch.  */
 
 static tree
-compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter,
-			 tree border,
-			 enum tree_code guard_code, tree guard_init)
+compute_new_first_bound (struct tree_niter_desc *niter, tree border,
+			 enum tree_code guard_code, tree guard_init,
+			 tree step, HOST_WIDE_INT *second_loop_niter_bound)
 {
-  /* The niter structure contains the after-increment IV, we need
-     the loop-enter base, so subtract STEP once.  */
   tree controlbase = niter->control.base;
   tree controlstep = niter->control.step;
   tree enddiff, end = niter->bound;
   tree type;
 
-  /* Compute end-beg.  */
+  /* Compute end-beg.
+     Note the niter structure contains the after-increment IV, we need the
+     loop-enter base, so subtract STEP once.  */
   if (POINTER_TYPE_P (TREE_TYPE (controlbase)))
     {
       controlstep = fold_build1 (NEGATE_EXPR,
 				 TREE_TYPE (controlstep), controlstep);
       enddiff = fold_build_pointer_plus (controlbase, controlstep);
 
-      type = unsigned_type_for (enddiff);
+      type = unsigned_type_for (TREE_TYPE (enddiff));
       enddiff = fold_build1 (NEGATE_EXPR, type, fold_convert (type, enddiff));
       end = fold_convert (type, end);
       enddiff = fold_build2 (PLUS_EXPR, type, end, enddiff);
@@ -432,15 +444,16 @@ compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter,
   /* Depending on the direction of the IVs the new bound for the first
      loop is the minimum or maximum of old bound and border.
      Also, if the guard condition isn't strictly less or greater,
-     we need to adjust the bound.  */ 
+     we need to adjust the bound.  */
   int addbound = 0;
-  enum tree_code minmax;
+  enum tree_code minmax, cmp_code;
   if (niter->cmp == LT_EXPR)
     {
       /* GT and LE are the same, inverted.  */
       if (guard_code == GT_EXPR || guard_code == LE_EXPR)
 	addbound = -1;
       minmax = MIN_EXPR;
+      cmp_code = LT_EXPR;
     }
   else
     {
@@ -448,6 +461,7 @@ compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter,
       if (guard_code == GE_EXPR || guard_code == LT_EXPR)
 	addbound = 1;
       minmax = MAX_EXPR;
+      cmp_code = GT_EXPR;
     }
 
   if (addbound)
@@ -462,8 +476,148 @@ compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter,
 			      build_int_cst (type, addbound));
     }
 
-  newbound = fold_build2 (minmax, TREE_TYPE (border), border, newbound);
-  return force_gimple_operand (newbound, stmts, true, NULL_TREE);
+  tree cmp_rslt = fold_build2 (cmp_code, boolean_type_node, border, newbound);
+  /* Only handle simple case with unit step.  */
+  if (wi::eq_p (wi::abs (step), 1)
+      && cmp_rslt != NULL_TREE && integer_nonzerop (cmp_rslt))
+    {
+      if (integer_nonzerop (cmp_rslt))
+	{
+	  tree niter_type, diff;
+
+	  niter_type = unsigned_type_for (TREE_TYPE (newbound));
+	  /* The split condition indicates smaller iteration space than the
+	     original loop, so the second loop must be executed.  */
+	  if (cmp_code == LT_EXPR)
+	    diff = fold_build2 (MINUS_EXPR, niter_type,
+				fold_convert (niter_type, newbound),
+				fold_convert (niter_type, border));
+	  else
+	    diff = fold_build2 (MINUS_EXPR, niter_type,
+				fold_convert (niter_type, border),
+				fold_convert (niter_type, newbound));
+
+	  /* Check if niter can be computed.  */
+	  if (tree_fits_shwi_p (diff) && !tree_int_cst_sign_bit (diff))
+	    *second_loop_niter_bound = tree_to_shwi (diff) - 1;
+	  else
+	    *second_loop_niter_bound = -1;
+
+	  return border;
+	}
+      else if (integer_zerop (cmp_rslt))
+	{
+	  /* The split condition indicates larger iteration space than the
+	     oiginal loop, so the second loop must not be executed.  */
+	  *second_loop_niter_bound = -2;
+	  return newbound;
+	}
+    }
+
+  cmp_rslt = fold_build2 (EQ_EXPR, boolean_type_node, border, newbound);
+  if (cmp_rslt != NULL_TREE && integer_nonzerop (cmp_rslt))
+    {
+      /* The split condition indicates the same iteration space as the
+	 original space, so the second loop must not be executed.  */
+      *second_loop_niter_bound = -2;
+      return border;
+    }
+  *second_loop_niter_bound = -3;
+  return fold_build2 (minmax, TREE_TYPE (border), border, newbound);
+}
+
+/* We have below control flow graph after versioning the original loop:
+
+               .------ guard1  ------.
+               v                     v
+         pre1(loop1)    .---------->pre2(loop2)
+              |         |            |
+        .--->h1         |           h2<----.
+        |     |         |            |     |
+        |    ex1---.    |       .---ex2    |
+        |    /     v    |       |     \    |
+        '---l1 guard2---'       |     l2---'
+                   |            |
+                   |            |
+                   '--->join<---'
+
+   LOOP2 is the second loop after split, GUARD1 and GUARD2 are the two bbs
+   controling if loop2 should be executed.  SECOND_LOOP_NITER_BOUND is the
+   niter bound for LOOP2's latch which has below meaning:
+
+     -3: Don't know anything about the second loop;
+     -2: The second loop must not be executed;
+     -1: The second loop must be executed, but niter bound is unknown;
+      n: The second loop must be executed, niter bound is n (>= 0).
+
+   With the information, this function folds GUARD1 and GUARD2 to true or
+   false so that LOOP2 is always executed or skipped.  It also sets upper
+   bound for LOOP2's niter.  */
+
+static void
+adjust_loop_split (struct loop *loop2,
+		   basic_block guard1, basic_block guard2,
+		   HOST_WIDE_INT second_loop_niter_bound)
+{
+  if (second_loop_niter_bound == -3)
+    return;
+
+  gcond *cond1 = as_a<gcond *> (last_stmt (guard1));
+  gcond *cond2 = as_a<gcond *> (last_stmt (guard2));
+
+  gcc_assert (EDGE_COUNT (guard1->succs) == 2);
+  gcc_assert (EDGE_COUNT (guard2->succs) == 2);
+  edge e = EDGE_SUCC (guard2, 0);
+  if (e->dest != loop_preheader_edge (loop2)->src)
+    {
+      e = EDGE_SUCC (guard2, 1);
+      gcc_assert (e->dest == loop_preheader_edge (loop2)->src);
+    }
+
+  bool invert_cond2 = (e->flags & EDGE_FALSE_VALUE);
+
+  /* Loop2 should never be executed.  */
+  if (second_loop_niter_bound == -2)
+    {
+      gimple_cond_make_true (cond1);
+      if (invert_cond2)
+	gimple_cond_make_true (cond2);
+      else
+	gimple_cond_make_false (cond2);
+
+      update_stmt (cond1);
+      update_stmt (cond2);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, ";; The second split loop is never executed.\n");
+
+      return;
+    }
+  /* Loop2 must be executed for unknown times.  */
+  if (second_loop_niter_bound == -1)
+    {
+      if (invert_cond2)
+	gimple_cond_make_false (cond2);
+      else
+	gimple_cond_make_true (cond2);
+
+      update_stmt (cond2);
+      return;
+    }
+
+  /* Loop2 must be executed for known times.  */
+  gcc_assert (second_loop_niter_bound >= 0);
+  if (invert_cond2)
+    gimple_cond_make_false (cond2);
+  else
+    gimple_cond_make_true (cond2);
+
+  update_stmt (cond2);
+  record_niter_bound (loop2, second_loop_niter_bound, true, true);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, ";; The second split loop iterates at %i latch"
+	     " times.\n", (int)max_loop_iterations_int (loop2));
 }
 
 /* Checks if LOOP contains an conditional block whose condition
@@ -536,7 +690,7 @@ split_loop (struct loop *loop1, struct tree_niter_desc *niter)
 					    stmts2);
 	tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
 	if (!initial_true)
-	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
+	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
 
 	/* Now version the loop, placing loop2 after loop1 connecting
 	   them, and fix up SSA form for that.  */
@@ -558,13 +712,18 @@ split_loop (struct loop *loop1, struct tree_niter_desc *niter)
 	   Compute the new bound for the guarding IV and patch the
 	   loop exit to use it instead of original IV and bound.  */
 	gimple_seq stmts = NULL;
-	tree newend = compute_new_first_bound (&stmts, niter, border,
-					       guard_code, guard_init);
+	HOST_WIDE_INT second_loop_niter_bound;
+	tree newend = compute_new_first_bound (niter, border, guard_code,
+					       guard_init, iv.step,
+					       &second_loop_niter_bound);
+	newend = force_gimple_operand (newend, &stmts, true, NULL_TREE);
 	if (stmts)
 	  gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
 					    stmts);
 	tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
 	patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
+	adjust_loop_split (loop2, cond_bb, new_e->src,
+			   second_loop_niter_bound);
 
 	/* Finally patch out the two copies of the condition to be always
 	   true/false (or opposite).  */
-- 
1.9.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH GCC][2/2]Refine CFG and bound information for split loops
  2017-06-14 13:08 [PATCH GCC][2/2]Refine CFG and bound information for split loops Bin Cheng
@ 2017-06-14 16:05 ` Bernhard Reutner-Fischer
  2017-06-30 16:09 ` Jeff Law
  1 sibling, 0 replies; 4+ messages in thread
From: Bernhard Reutner-Fischer @ 2017-06-14 16:05 UTC (permalink / raw)
  To: Bin Cheng; +Cc: gcc-patches, nd

On 14 June 2017 at 15:08, Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,
> Loop split currently generates below control flow graph for split loops:
> +
> +               .------ guard1  ------.
> +               v                     v
> +         pre1(loop1)    .---------->pre2(loop2)
> +              |         |            |
> +        .--->h1         |           h2<----.
> +        |     |         |            |     |
> +        |    ex1---.    |       .---ex2    |
> +        |    /     v    |       |     \    |
> +        '---l1 guard2---'       |     l2---'
> +                   |            |
> +                   |            |
> +                   '--->join<---'
> +
> In which,
> +   LOOP2 is the second loop after split, GUARD1 and GUARD2 are the two bbs
> +   controling if loop2 should be executed.
>
> Take added test case as an example, the second loop only iterates for 1 time,
> as a result, the CFG and loop niter bound information can be refined.  In general,
> guard1/guard2 can be folded to true/false if loop2's niters is known at compilation
> time.  This patch does such improvement by analyzing and refining niters of
> loop2; as well as using that information to simplify CFG.  With this patch,
> the second split loop as in test can be completely unrolled by later passes.

@@ -462,8 +476,148 @@ compute_new_first_bound (gimple_seq *stmts,
struct tree_niter_desc *niter,
       build_int_cst (type, addbound));
     }

-  newbound = fold_build2 (minmax, TREE_TYPE (border), border, newbound);
-  return force_gimple_operand (newbound, stmts, true, NULL_TREE);
+  tree cmp_rslt = fold_build2 (cmp_code, boolean_type_node, border, newbound);
+  /* Only handle simple case with unit step.  */
+  if (wi::eq_p (wi::abs (step), 1)
+      && cmp_rslt != NULL_TREE && integer_nonzerop (cmp_rslt))

Did you mean to drop the above integer_nonzerop() in favour of
distinguishing -1 and -2 below or did you disable -2 on purpose?
If so then i'd swap the order of the NULL check and the wi:: as
supposedly faster and easier to read..

thanks,

+    {
+      if (integer_nonzerop (cmp_rslt))
+ {
+  tree niter_type, diff;
+
+  niter_type = unsigned_type_for (TREE_TYPE (newbound));
+  /* The split condition indicates smaller iteration space than the
+     original loop, so the second loop must be executed.  */
+  if (cmp_code == LT_EXPR)
+    diff = fold_build2 (MINUS_EXPR, niter_type,
+ fold_convert (niter_type, newbound),
+ fold_convert (niter_type, border));
+  else
+    diff = fold_build2 (MINUS_EXPR, niter_type,
+ fold_convert (niter_type, border),
+ fold_convert (niter_type, newbound));
+
+  /* Check if niter can be computed.  */
+  if (tree_fits_shwi_p (diff) && !tree_int_cst_sign_bit (diff))
+    *second_loop_niter_bound = tree_to_shwi (diff) - 1;
+  else
+    *second_loop_niter_bound = -1;
+
+  return border;
+ }
+      else if (integer_zerop (cmp_rslt))
+ {
+  /* The split condition indicates larger iteration space than the
+     oiginal loop, so the second loop must not be executed.  */
+  *second_loop_niter_bound = -2;
+  return newbound;
+ }
+    }
+

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH GCC][2/2]Refine CFG and bound information for split loops
  2017-06-14 13:08 [PATCH GCC][2/2]Refine CFG and bound information for split loops Bin Cheng
  2017-06-14 16:05 ` Bernhard Reutner-Fischer
@ 2017-06-30 16:09 ` Jeff Law
  2017-07-04 15:13   ` Bin.Cheng
  1 sibling, 1 reply; 4+ messages in thread
From: Jeff Law @ 2017-06-30 16:09 UTC (permalink / raw)
  To: Bin Cheng, gcc-patches; +Cc: nd

On 06/14/2017 07:08 AM, Bin Cheng wrote:
> Hi,
> Loop split currently generates below control flow graph for split loops:
> +
> +               .------ guard1  ------.
> +               v                     v
> +         pre1(loop1)    .---------->pre2(loop2)
> +              |         |            |
> +        .--->h1         |           h2<----.
> +        |     |         |            |     |
> +        |    ex1---.    |       .---ex2    |
> +        |    /     v    |       |     \    |
> +        '---l1 guard2---'       |     l2---'
> +                   |            |
> +                   |            |
> +                   '--->join<---'
> +
> In which,
> +   LOOP2 is the second loop after split, GUARD1 and GUARD2 are the two bbs
> +   controling if loop2 should be executed.
> 
> Take added test case as an example, the second loop only iterates for 1 time,
> as a result, the CFG and loop niter bound information can be refined.  In general,
> guard1/guard2 can be folded to true/false if loop2's niters is known at compilation
> time.  This patch does such improvement by analyzing and refining niters of
> loop2; as well as using that information to simplify CFG.  With this patch,
> the second split loop as in test can be completely unrolled by later passes.
In your testcase the second loop iterates precisely once, right?  In
fact, we know it always executes precisely one time regardless of the
incoming value of LEN.  That's the property you're trying to detect and
exploit, right?


> 
> Bootstrap and test on x86_64 and AArch64.  Is it OK?
> 
> Thanks,
> bin
> 2017-06-12  Bin Cheng  <bin.cheng@arm.com>
> 
> 	* tree-ssa-loop-split.c (compute_new_first_bound): Compute and
> 	return bound information for the second split loop.
> 	(adjust_loop_split): New function.
> 	(split_loop): Update calls to above functions.
> 
> gcc/testsuite/ChangeLog
> 2017-06-12  Bin Cheng  <bin.cheng@arm.com>
> 
> 	* gcc.dg/loop-split-1.c: New test.
> 
> 
> 0002-lsplit-refine-cfg-niter-bound-20170601.txt
> 
> 
> From 61855c74c7db6178008f007198aedee9a03f13e6 Mon Sep 17 00:00:00 2001
> From: amker <amker@amker-laptop.(none)>
> Date: Sun, 4 Jun 2017 02:26:34 +0800
> Subject: [PATCH 2/2] lsplit-refine-cfg-niter-bound-20170601.txt
> 
> ---
>  gcc/testsuite/gcc.dg/loop-split-1.c |  16 +++
>  gcc/tree-ssa-loop-split.c           | 197 ++++++++++++++++++++++++++++++++----
>  2 files changed, 194 insertions(+), 19 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/loop-split-1.c
> 
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index f8fe8e6..73c7dc2 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -387,28 +386,41 @@ connect_loops (struct loop *loop1, struct loop *loop2)
>  
>     Depending on the direction of the IVs and if the exit tests
>     are strict or non-strict we need to use MIN or MAX,
> -   and add or subtract 1.  This routine computes newend above.  */
> +   and add or subtract 1.  This routine computes newend above.
> +
> +   After computing the new bound (on j), we may be able to compare the
> +   first loop's iteration space against the original loop's.  If it is
> +   comparable at compilation time, we may be able to compute the niter
> +   bound of the second loop.  Record the second loop's iteration bound
> +   to SECOND_LOOP_NITER_BOUND which has below meaning:
> +
> +     -3: Don't know anything about the second loop;
> +     -2: The second loop must not be executed;
> +     -1: The second loop must be executed, but niter bound is unknown;
> +      n: The second loop must be executed, niter bound is n (>= 0);
> +
> +   Note we compute niter bound for the second loop's latch.  */
How hard would it be to have a test for each of the 4 cases above?  I
don't always ask for this, but ISTM this is a good chance to make sure
most of the new code gets covered by testing.

Does case -2 really occur in practice or are you just being safe?  ISTM
that if case -2 occurs then the loop shouldn't have been split to begin
with.  If this happens in practice, do we clean up all the dead code if
the bounds are set properly.

Similarly for N = 0, presumably meaning the second loop iterates exactly
once, but never traverses its backedge, is there any value in changing
the loop latch test so that it's always false?  Or will that get cleaned
up later?


>  
>  static tree
> -compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter,
> -			 tree border,
> -			 enum tree_code guard_code, tree guard_init)
> +compute_new_first_bound (struct tree_niter_desc *niter, tree border,
> +			 enum tree_code guard_code, tree guard_init,
> +			 tree step, HOST_WIDE_INT *second_loop_niter_bound)
>  {
> -  /* The niter structure contains the after-increment IV, we need
> -     the loop-enter base, so subtract STEP once.  */
>    tree controlbase = niter->control.base;
>    tree controlstep = niter->control.step;
>    tree enddiff, end = niter->bound;
>    tree type;
>  
> -  /* Compute end-beg.  */
> +  /* Compute end-beg.
> +     Note the niter structure contains the after-increment IV, we need the
> +     loop-enter base, so subtract STEP once.  */
>    if (POINTER_TYPE_P (TREE_TYPE (controlbase)))
>      {
>        controlstep = fold_build1 (NEGATE_EXPR,
>  				 TREE_TYPE (controlstep), controlstep);
>        enddiff = fold_build_pointer_plus (controlbase, controlstep);
>  
> -      type = unsigned_type_for (enddiff);
> +      type = unsigned_type_for (TREE_TYPE (enddiff));
Isn't this an indication the trunk sources are broken?  Consider pushing
this forward independent of the rest of this change if it looks like
we're going to have to iterate more than once or twice.



> @@ -462,8 +476,148 @@ compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter,
>  			      build_int_cst (type, addbound));
>      }
>  
> -  newbound = fold_build2 (minmax, TREE_TYPE (border), border, newbound);
> -  return force_gimple_operand (newbound, stmts, true, NULL_TREE);
> +  tree cmp_rslt = fold_build2 (cmp_code, boolean_type_node, border, newbound);
> +  /* Only handle simple case with unit step.  */
> +  if (wi::eq_p (wi::abs (step), 1)
> +      && cmp_rslt != NULL_TREE && integer_nonzerop (cmp_rslt))
> +    {
> +      if (integer_nonzerop (cmp_rslt))
> +	{
> +	  tree niter_type, diff;
> +
> +	  niter_type = unsigned_type_for (TREE_TYPE (newbound));
> +	  /* The split condition indicates smaller iteration space than the
> +	     original loop, so the second loop must be executed.  */
> +	  if (cmp_code == LT_EXPR)
> +	    diff = fold_build2 (MINUS_EXPR, niter_type,
> +				fold_convert (niter_type, newbound),
> +				fold_convert (niter_type, border));
> +	  else
> +	    diff = fold_build2 (MINUS_EXPR, niter_type,
> +				fold_convert (niter_type, border),
> +				fold_convert (niter_type, newbound));
> +
> +	  /* Check if niter can be computed.  */
> +	  if (tree_fits_shwi_p (diff) && !tree_int_cst_sign_bit (diff))
> +	    *second_loop_niter_bound = tree_to_shwi (diff) - 1;
> +	  else
> +	    *second_loop_niter_bound = -1;
> +
> +	  return border;
> +	}
> +      else if (integer_zerop (cmp_rslt))
> +	{
> +	  /* The split condition indicates larger iteration space than the
> +	     oiginal loop, so the second loop must not be executed.  */
s/oiginal/original/


Overall I like it and it looks reasonably close to being ready.   Let's
get the answers to the few questions above and decide how to move
forward after that.

Thanks,

Jeff

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH GCC][2/2]Refine CFG and bound information for split loops
  2017-06-30 16:09 ` Jeff Law
@ 2017-07-04 15:13   ` Bin.Cheng
  0 siblings, 0 replies; 4+ messages in thread
From: Bin.Cheng @ 2017-07-04 15:13 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bin Cheng, gcc-patches, Bernhard Reutner-Fischer

On Fri, Jun 30, 2017 at 5:09 PM, Jeff Law <law@redhat.com> wrote:
> On 06/14/2017 07:08 AM, Bin Cheng wrote:
>> Hi,
>> Loop split currently generates below control flow graph for split loops:
>> +
>> +               .------ guard1  ------.
>> +               v                     v
>> +         pre1(loop1)    .---------->pre2(loop2)
>> +              |         |            |
>> +        .--->h1         |           h2<----.
>> +        |     |         |            |     |
>> +        |    ex1---.    |       .---ex2    |
>> +        |    /     v    |       |     \    |
>> +        '---l1 guard2---'       |     l2---'
>> +                   |            |
>> +                   |            |
>> +                   '--->join<---'
>> +
>> In which,
>> +   LOOP2 is the second loop after split, GUARD1 and GUARD2 are the two bbs
>> +   controling if loop2 should be executed.
>>
>> Take added test case as an example, the second loop only iterates for 1 time,
>> as a result, the CFG and loop niter bound information can be refined.  In general,
>> guard1/guard2 can be folded to true/false if loop2's niters is known at compilation
>> time.  This patch does such improvement by analyzing and refining niters of
>> loop2; as well as using that information to simplify CFG.  With this patch,
>> the second split loop as in test can be completely unrolled by later passes.
> In your testcase the second loop iterates precisely once, right?  In
> fact, we know it always executes precisely one time regardless of the
> incoming value of LEN.  That's the property you're trying to detect and
> exploit, right?
>
>
>>
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>
>> Thanks,
>> bin
>> 2017-06-12  Bin Cheng  <bin.cheng@arm.com>
>>
>>       * tree-ssa-loop-split.c (compute_new_first_bound): Compute and
>>       return bound information for the second split loop.
>>       (adjust_loop_split): New function.
>>       (split_loop): Update calls to above functions.
>>
>> gcc/testsuite/ChangeLog
>> 2017-06-12  Bin Cheng  <bin.cheng@arm.com>
>>
>>       * gcc.dg/loop-split-1.c: New test.
>>
>>
>> 0002-lsplit-refine-cfg-niter-bound-20170601.txt
>>
>>
>> From 61855c74c7db6178008f007198aedee9a03f13e6 Mon Sep 17 00:00:00 2001
>> From: amker <amker@amker-laptop.(none)>
>> Date: Sun, 4 Jun 2017 02:26:34 +0800
>> Subject: [PATCH 2/2] lsplit-refine-cfg-niter-bound-20170601.txt
>>
>> ---
>>  gcc/testsuite/gcc.dg/loop-split-1.c |  16 +++
>>  gcc/tree-ssa-loop-split.c           | 197 ++++++++++++++++++++++++++++++++----
>>  2 files changed, 194 insertions(+), 19 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/loop-split-1.c
>>
>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>> index f8fe8e6..73c7dc2 100644
>> --- a/gcc/tree-ssa-loop-split.c
>> +++ b/gcc/tree-ssa-loop-split.c
>> @@ -387,28 +386,41 @@ connect_loops (struct loop *loop1, struct loop *loop2)
>>
>>     Depending on the direction of the IVs and if the exit tests
>>     are strict or non-strict we need to use MIN or MAX,
>> -   and add or subtract 1.  This routine computes newend above.  */
>> +   and add or subtract 1.  This routine computes newend above.
>> +
>> +   After computing the new bound (on j), we may be able to compare the
>> +   first loop's iteration space against the original loop's.  If it is
>> +   comparable at compilation time, we may be able to compute the niter
>> +   bound of the second loop.  Record the second loop's iteration bound
>> +   to SECOND_LOOP_NITER_BOUND which has below meaning:
>> +
>> +     -3: Don't know anything about the second loop;
>> +     -2: The second loop must not be executed;
>> +     -1: The second loop must be executed, but niter bound is unknown;
>> +      n: The second loop must be executed, niter bound is n (>= 0);
>> +
>> +   Note we compute niter bound for the second loop's latch.  */
> How hard would it be to have a test for each of the 4 cases above?  I
> don't always ask for this, but ISTM this is a good chance to make sure
> most of the new code gets covered by testing.
>
> Does case -2 really occur in practice or are you just being safe?  ISTM
> that if case -2 occurs then the loop shouldn't have been split to begin
> with.  If this happens in practice, do we clean up all the dead code if
> the bounds are set properly.
>
> Similarly for N = 0, presumably meaning the second loop iterates exactly
> once, but never traverses its backedge, is there any value in changing
> the loop latch test so that it's always false?  Or will that get cleaned
> up later?
>
>
>>
>>  static tree
>> -compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter,
>> -                      tree border,
>> -                      enum tree_code guard_code, tree guard_init)
>> +compute_new_first_bound (struct tree_niter_desc *niter, tree border,
>> +                      enum tree_code guard_code, tree guard_init,
>> +                      tree step, HOST_WIDE_INT *second_loop_niter_bound)
>>  {
>> -  /* The niter structure contains the after-increment IV, we need
>> -     the loop-enter base, so subtract STEP once.  */
>>    tree controlbase = niter->control.base;
>>    tree controlstep = niter->control.step;
>>    tree enddiff, end = niter->bound;
>>    tree type;
>>
>> -  /* Compute end-beg.  */
>> +  /* Compute end-beg.
>> +     Note the niter structure contains the after-increment IV, we need the
>> +     loop-enter base, so subtract STEP once.  */
>>    if (POINTER_TYPE_P (TREE_TYPE (controlbase)))
>>      {
>>        controlstep = fold_build1 (NEGATE_EXPR,
>>                                TREE_TYPE (controlstep), controlstep);
>>        enddiff = fold_build_pointer_plus (controlbase, controlstep);
>>
>> -      type = unsigned_type_for (enddiff);
>> +      type = unsigned_type_for (TREE_TYPE (enddiff));
> Isn't this an indication the trunk sources are broken?  Consider pushing
> this forward independent of the rest of this change if it looks like
> we're going to have to iterate more than once or twice.
>
>
>
>> @@ -462,8 +476,148 @@ compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter,
>>                             build_int_cst (type, addbound));
>>      }
>>
>> -  newbound = fold_build2 (minmax, TREE_TYPE (border), border, newbound);
>> -  return force_gimple_operand (newbound, stmts, true, NULL_TREE);
>> +  tree cmp_rslt = fold_build2 (cmp_code, boolean_type_node, border, newbound);
>> +  /* Only handle simple case with unit step.  */
>> +  if (wi::eq_p (wi::abs (step), 1)
>> +      && cmp_rslt != NULL_TREE && integer_nonzerop (cmp_rslt))
>> +    {
>> +      if (integer_nonzerop (cmp_rslt))
>> +     {
>> +       tree niter_type, diff;
>> +
>> +       niter_type = unsigned_type_for (TREE_TYPE (newbound));
>> +       /* The split condition indicates smaller iteration space than the
>> +          original loop, so the second loop must be executed.  */
>> +       if (cmp_code == LT_EXPR)
>> +         diff = fold_build2 (MINUS_EXPR, niter_type,
>> +                             fold_convert (niter_type, newbound),
>> +                             fold_convert (niter_type, border));
>> +       else
>> +         diff = fold_build2 (MINUS_EXPR, niter_type,
>> +                             fold_convert (niter_type, border),
>> +                             fold_convert (niter_type, newbound));
>> +
>> +       /* Check if niter can be computed.  */
>> +       if (tree_fits_shwi_p (diff) && !tree_int_cst_sign_bit (diff))
>> +         *second_loop_niter_bound = tree_to_shwi (diff) - 1;
>> +       else
>> +         *second_loop_niter_bound = -1;
>> +
>> +       return border;
>> +     }
>> +      else if (integer_zerop (cmp_rslt))
>> +     {
>> +       /* The split condition indicates larger iteration space than the
>> +          oiginal loop, so the second loop must not be executed.  */
> s/oiginal/original/
>
>
> Overall I like it and it looks reasonably close to being ready.   Let's
> get the answers to the few questions above and decide how to move
> forward after that.
Hi,
Sorry for being late.  I will update patch according to your comments,
but firstly I need to rework the first [01/02] patch on which this one
is based.

Thanks,
bin
>
> Thanks,
>
> Jeff

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-07-04 15:13 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-14 13:08 [PATCH GCC][2/2]Refine CFG and bound information for split loops Bin Cheng
2017-06-14 16:05 ` Bernhard Reutner-Fischer
2017-06-30 16:09 ` Jeff Law
2017-07-04 15:13   ` Bin.Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).