[PATCH] Handling == or != comparisons that may affect range test optimization.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Handling == or != comparisons that may affect range test optimization.
@ 2013-11-01  0:03 Cong Hou
  2013-11-05 20:46 ` Jeff Law
  0 siblings, 1 reply; 16+ messages in thread
From: Cong Hou @ 2013-11-01  0:03 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 6003 bytes --]

(This patch is for the bug 58728:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58728)

As in the bug report, consider the following loop:

int foo(unsigned int n)
{
  if (n != 0)
  if (n != 1)
  if (n != 2)
  if (n != 3)
  if (n != 4)
    return ++n;
  return n;
}

The range test optimization should be able to merge all those five
conditions into one in reassoc pass, but I fails to do so. The reason
is that the phi arg of n is replaced by the constant it compares to in
case of == or != comparisons (in vrp pass). GCC checks there is no
side effect on n between any two neighboring conditions by examining
if they defined the same phi arg in the join node. But as the phi arg
is replace by a constant, the check fails.

This patch deals with this situation by considering the existence of
== or != comparisons, which is attached below (a text file is also
attached with proper tabs). Bootstrap and make check both get passed.

Any comment?


thanks,
Cong




diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8a38316..9247222 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2013-10-31  Cong Hou  <congh@google.com>
+
+ PR tree-optimization/58728
+ * tree-ssa-reassoc.c (suitable_cond_bb): Consider the situtation
+ that ==/!= comparisons between a variable and a constant may lead
+ to that the later phi arg of the variable is substitued by the
+ constant from prior passes, during range test optimization.
+
 2013-10-14  David Malcolm  <dmalcolm@redhat.com>

  * dumpfile.h (gcc::dump_manager): New class, to hold state
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 075d071..44a5e70 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2013-10-31  Cong Hou  <congh@google.com>
+
+ PR tree-optimization/58728
+ * gcc.dg/tree-ssa/pr58728: New test.
+
 2013-10-14  Tobias Burnus  <burnus@net-b.de>

  PR fortran/58658
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr58728.c
b/gcc/testsuite/gcc.dg/tree-ssa/pr58728.c
new file mode 100644
index 0000000..312aebc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr58728.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc1-details" } */
+
+int foo (unsigned int n)
+{
+  if (n != 0)
+    if (n != 1)
+      return ++n;
+  return n;
+}
+
+int bar (unsigned int n)
+{
+  if (n == 0)
+    ;
+  else if (n == 1)
+    ;
+  else
+    return ++n;
+  return n;
+}
+
+
+/* { dg-final { scan-tree-dump-times "Optimizing range tests" 2
"reassoc1" } } */
+/* { dg-final { cleanup-tree-dump "reassoc1" } } */
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 6859518..bccf99f 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -2426,11 +2426,70 @@ suitable_cond_bb (basic_block bb, basic_block
test_bb, basic_block *other_bb,
   for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
     {
       gimple phi = gsi_stmt (gsi);
+      tree phi_arg = gimple_phi_arg_def (phi, e->dest_idx);
+      tree phi_arg2 = gimple_phi_arg_def (phi, e2->dest_idx);
+
       /* If both BB and TEST_BB end with GIMPLE_COND, all PHI arguments
  corresponding to BB and TEST_BB predecessor must be the same.  */
-      if (!operand_equal_p (gimple_phi_arg_def (phi, e->dest_idx),
-    gimple_phi_arg_def (phi, e2->dest_idx), 0))
- {
+      if (!operand_equal_p (phi_arg, phi_arg2, 0))
+      {
+ /* If the condition in BB or TEST_BB is an NE or EQ comparison like
+   if (n != N) or if (n == N), it is possible that the corresponding
+   def of n in the phi function is replaced by N.  We should still allow
+   range test optimization in this case.  */
+
+ tree lhs = NULL, rhs = NULL,
+     lhs2 = NULL, rhs2 = NULL;
+ bool is_eq_expr = is_cond && (gimple_cond_code (stmt) == NE_EXPR
+ || gimple_cond_code (stmt) == EQ_EXPR)
+  && TREE_CODE (phi_arg) == INTEGER_CST;
+
+ if (is_eq_expr)
+  {
+    lhs = gimple_cond_lhs (stmt);
+    rhs = gimple_cond_rhs (stmt);
+
+    if (operand_equal_p (lhs, phi_arg, 0))
+      {
+ tree t = lhs;
+ lhs = rhs;
+ rhs = t;
+      }
+    if (operand_equal_p (rhs, phi_arg, 0)
+ && operand_equal_p (lhs, phi_arg2, 0))
+      continue;
+  }
+
+ gimple stmt2 = last_stmt (test_bb);
+ bool is_eq_expr2 = gimple_code (stmt2) == GIMPLE_COND
+     && (gimple_cond_code (stmt2) == NE_EXPR
+ || gimple_cond_code (stmt2) == EQ_EXPR)
+     && TREE_CODE (phi_arg2) == INTEGER_CST;
+
+ if (is_eq_expr2)
+  {
+    lhs2 = gimple_cond_lhs (stmt2);
+    rhs2 = gimple_cond_rhs (stmt2);
+
+    if (operand_equal_p (lhs2, phi_arg2, 0))
+      {
+ tree t = lhs2;
+ lhs2 = rhs2;
+ rhs2 = t;
+      }
+    if (operand_equal_p (rhs2, phi_arg2, 0)
+ && operand_equal_p (lhs2, phi_arg, 0))
+      continue;
+  }
+
+ if (is_eq_expr && is_eq_expr2)
+  {
+    if (operand_equal_p (rhs, phi_arg, 0)
+ && operand_equal_p (rhs2, phi_arg2, 0)
+ && operand_equal_p (lhs, lhs2, 0))
+      continue;
+  }
+
   /* Otherwise, if one of the blocks doesn't end with GIMPLE_COND,
      one of the PHIs should have the lhs of the last stmt in
      that block as PHI arg and that PHI should have 0 or 1
@@ -2438,21 +2497,16 @@ suitable_cond_bb (basic_block bb, basic_block
test_bb, basic_block *other_bb,
      considered.  */
   if (!is_cond)
     {
-      if (gimple_phi_arg_def (phi, e->dest_idx)
-  == gimple_assign_lhs (stmt)
-  && (integer_zerop (gimple_phi_arg_def (phi, e2->dest_idx))
-      || integer_onep (gimple_phi_arg_def (phi,
-   e2->dest_idx))))
+      if (phi_arg == gimple_assign_lhs (stmt)
+  && (integer_zerop (phi_arg2) || integer_onep (phi_arg2)))
  continue;
     }
   else
     {
       gimple test_last = last_stmt (test_bb);
       if (gimple_code (test_last) != GIMPLE_COND
-  && gimple_phi_arg_def (phi, e2->dest_idx)
-     == gimple_assign_lhs (test_last)
-  && (integer_zerop (gimple_phi_arg_def (phi, e->dest_idx))
-      || integer_onep (gimple_phi_arg_def (phi, e->dest_idx))))
+  && phi_arg2 == gimple_assign_lhs (test_last)
+  && (integer_zerop (phi_arg) || integer_onep (phi_arg)))
  continue;
     }

[-- Attachment #2: patch-range-test.txt --]
[-- Type: text/plain, Size: 5146 bytes --]

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8a38316..9247222 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2013-10-31  Cong Hou  <congh@google.com>
+
+	PR tree-optimization/58728
+	* tree-ssa-reassoc.c (suitable_cond_bb): Consider the situtation
+	that ==/!= comparisons between a variable and a constant may lead
+	to that the later phi arg of the variable is substitued by the
+	constant from prior passes, during range test optimization.
+
 2013-10-14  David Malcolm  <dmalcolm@redhat.com>
 
 	* dumpfile.h (gcc::dump_manager): New class, to hold state
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 075d071..44a5e70 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2013-10-31  Cong Hou  <congh@google.com>
+
+	PR tree-optimization/58728
+	* gcc.dg/tree-ssa/pr58728: New test.
+
 2013-10-14  Tobias Burnus  <burnus@net-b.de>
 
 	PR fortran/58658
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr58728.c b/gcc/testsuite/gcc.dg/tree-ssa/pr58728.c
new file mode 100644
index 0000000..312aebc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr58728.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc1-details" } */
+
+int foo (unsigned int n)
+{
+  if (n != 0)
+    if (n != 1)
+      return ++n;
+  return n;
+}
+
+int bar (unsigned int n)
+{
+  if (n == 0)
+    ;
+  else if (n == 1)
+    ;
+  else
+    return ++n;
+  return n;
+}
+
+
+/* { dg-final { scan-tree-dump-times "Optimizing range tests" 2 "reassoc1" } } */
+/* { dg-final { cleanup-tree-dump "reassoc1" } } */
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 6859518..bccf99f 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -2426,11 +2426,70 @@ suitable_cond_bb (basic_block bb, basic_block test_bb, basic_block *other_bb,
   for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
     {
       gimple phi = gsi_stmt (gsi);
+      tree phi_arg = gimple_phi_arg_def (phi, e->dest_idx);
+      tree phi_arg2 = gimple_phi_arg_def (phi, e2->dest_idx);
+
       /* If both BB and TEST_BB end with GIMPLE_COND, all PHI arguments
 	 corresponding to BB and TEST_BB predecessor must be the same.  */
-      if (!operand_equal_p (gimple_phi_arg_def (phi, e->dest_idx),
-			    gimple_phi_arg_def (phi, e2->dest_idx), 0))
-	{
+      if (!operand_equal_p (phi_arg, phi_arg2, 0))
+      {
+	/* If the condition in BB or TEST_BB is an NE or EQ comparison like
+	   if (n != N) or if (n == N), it is possible that the corresponding
+	   def of n in the phi function is replaced by N.  We should still allow
+	   range test optimization in this case.  */
+
+	tree lhs = NULL, rhs = NULL,
+	     lhs2 = NULL, rhs2 = NULL;
+	bool is_eq_expr = is_cond && (gimple_cond_code (stmt) == NE_EXPR
+					|| gimple_cond_code (stmt) == EQ_EXPR)
+				  && TREE_CODE (phi_arg) == INTEGER_CST;
+
+	if (is_eq_expr)
+	  {
+	    lhs = gimple_cond_lhs (stmt);
+	    rhs = gimple_cond_rhs (stmt);
+
+	    if (operand_equal_p (lhs, phi_arg, 0))
+	      {
+		tree t = lhs;
+		lhs = rhs;
+		rhs = t;
+	      }
+	    if (operand_equal_p (rhs, phi_arg, 0)
+		&& operand_equal_p (lhs, phi_arg2, 0))
+	      continue;
+	  }
+
+	gimple stmt2 = last_stmt (test_bb);
+	bool is_eq_expr2 = gimple_code (stmt2) == GIMPLE_COND
+			     && (gimple_cond_code (stmt2) == NE_EXPR
+				 || gimple_cond_code (stmt2) == EQ_EXPR)
+			     && TREE_CODE (phi_arg2) == INTEGER_CST;
+
+	if (is_eq_expr2)
+	  {
+	    lhs2 = gimple_cond_lhs (stmt2);
+	    rhs2 = gimple_cond_rhs (stmt2);
+
+	    if (operand_equal_p (lhs2, phi_arg2, 0))
+	      {
+		tree t = lhs2;
+		lhs2 = rhs2;
+		rhs2 = t;
+	      }
+	    if (operand_equal_p (rhs2, phi_arg2, 0)
+		&& operand_equal_p (lhs2, phi_arg, 0))
+	      continue;
+	  }
+
+	if (is_eq_expr && is_eq_expr2)
+	  {
+	    if (operand_equal_p (rhs, phi_arg, 0)
+		&& operand_equal_p (rhs2, phi_arg2, 0)
+		&& operand_equal_p (lhs, lhs2, 0))
+	      continue;
+	  }
+
 	  /* Otherwise, if one of the blocks doesn't end with GIMPLE_COND,
 	     one of the PHIs should have the lhs of the last stmt in
 	     that block as PHI arg and that PHI should have 0 or 1
@@ -2438,21 +2497,16 @@ suitable_cond_bb (basic_block bb, basic_block test_bb, basic_block *other_bb,
 	     considered.  */
 	  if (!is_cond)
 	    {
-	      if (gimple_phi_arg_def (phi, e->dest_idx)
-		  == gimple_assign_lhs (stmt)
-		  && (integer_zerop (gimple_phi_arg_def (phi, e2->dest_idx))
-		      || integer_onep (gimple_phi_arg_def (phi,
-							   e2->dest_idx))))
+	      if (phi_arg == gimple_assign_lhs (stmt)
+		  && (integer_zerop (phi_arg2) || integer_onep (phi_arg2)))
 		continue;
 	    }
 	  else
 	    {
 	      gimple test_last = last_stmt (test_bb);
 	      if (gimple_code (test_last) != GIMPLE_COND
-		  && gimple_phi_arg_def (phi, e2->dest_idx)
-		     == gimple_assign_lhs (test_last)
-		  && (integer_zerop (gimple_phi_arg_def (phi, e->dest_idx))
-		      || integer_onep (gimple_phi_arg_def (phi, e->dest_idx))))
+		  && phi_arg2 == gimple_assign_lhs (test_last)
+		  && (integer_zerop (phi_arg) || integer_onep (phi_arg)))
 		continue;
 	    }
 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Handling == or != comparisons that may affect range test optimization.
  2013-11-01  0:03 [PATCH] Handling == or != comparisons that may affect range test optimization Cong Hou
@ 2013-11-05 20:46 ` Jeff Law
  2013-11-05 20:59   ` Jakub Jelinek
  2013-11-05 22:03   ` Cong Hou
  0 siblings, 2 replies; 16+ messages in thread
From: Jeff Law @ 2013-11-05 20:46 UTC (permalink / raw)
  To: Cong Hou, GCC Patches

On 10/31/13 18:03, Cong Hou wrote:
> (This patch is for the bug 58728:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58728)
>
> As in the bug report, consider the following loop:
>
> int foo(unsigned int n)
> {
>    if (n != 0)
>    if (n != 1)
>    if (n != 2)
>    if (n != 3)
>    if (n != 4)
>      return ++n;
>    return n;
> }
>
> The range test optimization should be able to merge all those five
> conditions into one in reassoc pass, but I fails to do so. The reason
> is that the phi arg of n is replaced by the constant it compares to in
> case of == or != comparisons (in vrp pass). GCC checks there is no
> side effect on n between any two neighboring conditions by examining
> if they defined the same phi arg in the join node. But as the phi arg
> is replace by a constant, the check fails.
>
> This patch deals with this situation by considering the existence of
> == or != comparisons, which is attached below (a text file is also
> attached with proper tabs). Bootstrap and make check both get passed.
>
> Any comment?

+	bool is_eq_expr = is_cond && (gimple_cond_code (stmt) == NE_EXPR
+					|| gimple_cond_code (stmt) == EQ_EXPR)
+				  && TREE_CODE (phi_arg) == INTEGER_CST;
+
+	if (is_eq_expr)
+	  {
+	    lhs = gimple_cond_lhs (stmt);
+	    rhs = gimple_cond_rhs (stmt);
+
+	    if (operand_equal_p (lhs, phi_arg, 0))
+	      {
+		tree t = lhs;
+		lhs = rhs;
+		rhs = t;
+	      }
+	    if (operand_equal_p (rhs, phi_arg, 0)
+		&& operand_equal_p (lhs, phi_arg2, 0))
+	      continue;
+	  }
+
+	gimple stmt2 = last_stmt (test_bb);
+	bool is_eq_expr2 = gimple_code (stmt2) == GIMPLE_COND
+			     && (gimple_cond_code (stmt2) == NE_EXPR
+				 || gimple_cond_code (stmt2) == EQ_EXPR)
+			     && TREE_CODE (phi_arg2) == INTEGER_CST;
+
+	if (is_eq_expr2)
+	  {
+	    lhs2 = gimple_cond_lhs (stmt2);
+	    rhs2 = gimple_cond_rhs (stmt2);
+
+	    if (operand_equal_p (lhs2, phi_arg2, 0))
+	      {
+		tree t = lhs2;
+		lhs2 = rhs2;
+		rhs2 = t;
+	      }
+	    if (operand_equal_p (rhs2, phi_arg2, 0)
+		&& operand_equal_p (lhs2, phi_arg, 0))
+	      continue;
+	  }

Can you factor those two hunks of nearly identical code into a single 
function and call it twice?  I'm also curious if you really need the 
code to swap lhs/rhs.  When can the LHS of a cond be an integer 
constant?  Don't we canonicalize it as <ssa_name> <COND> <constant>?

I'd probably write the ChangeLog as:

	* tree-ssa-reassoc.c (suitable_cond_bb): Handle constant PHI
	operands resulting from propagation of edge equivalences.


I'm also curious -- did this code show up in a particular benchmark, if 
so, which one?

jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Handling == or != comparisons that may affect range test optimization.
  2013-11-05 20:46 ` Jeff Law
@ 2013-11-05 20:59   ` Jakub Jelinek
  2013-11-05 21:49     ` Cong Hou
  2013-11-05 22:03   ` Cong Hou
  1 sibling, 1 reply; 16+ messages in thread
From: Jakub Jelinek @ 2013-11-05 20:59 UTC (permalink / raw)
  To: Jeff Law; +Cc: Cong Hou, GCC Patches

On Tue, Nov 05, 2013 at 01:23:00PM -0700, Jeff Law wrote:
> On 10/31/13 18:03, Cong Hou wrote:
> >(This patch is for the bug 58728:
> >http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58728)
> >
> >As in the bug report, consider the following loop:
> >
> >int foo(unsigned int n)
> >{
> >   if (n != 0)
> >   if (n != 1)
> >   if (n != 2)
> >   if (n != 3)
> >   if (n != 4)
> >     return ++n;
> >   return n;
> >}
> >
> >The range test optimization should be able to merge all those five
> >conditions into one in reassoc pass, but I fails to do so. The reason
> >is that the phi arg of n is replaced by the constant it compares to in
> >case of == or != comparisons (in vrp pass). GCC checks there is no
> >side effect on n between any two neighboring conditions by examining
> >if they defined the same phi arg in the join node. But as the phi arg
> >is replace by a constant, the check fails.

I can't reproduce this, at least not on x86_64-linux with -O2,
the ifcombine pass already merges those.

	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Handling == or != comparisons that may affect range test optimization.
  2013-11-05 20:59   ` Jakub Jelinek
@ 2013-11-05 21:49     ` Cong Hou
  0 siblings, 0 replies; 16+ messages in thread
From: Cong Hou @ 2013-11-05 21:49 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jeff Law, GCC Patches

It seems there are some changes in GCC. But if you change the type of
n into signed int, the issue appears again:


int foo(int n)
{
   if (n != 0)
   if (n != 1)
   if (n != 2)
   if (n != 3)
   if (n != 4)
     return ++n;
   return n;
}

Also, ifcombine also suffers from the same issue here.


thanks,
Cong


On Tue, Nov 5, 2013 at 12:53 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Nov 05, 2013 at 01:23:00PM -0700, Jeff Law wrote:
>> On 10/31/13 18:03, Cong Hou wrote:
>> >(This patch is for the bug 58728:
>> >http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58728)
>> >
>> >As in the bug report, consider the following loop:
>> >
>> >int foo(unsigned int n)
>> >{
>> >   if (n != 0)
>> >   if (n != 1)
>> >   if (n != 2)
>> >   if (n != 3)
>> >   if (n != 4)
>> >     return ++n;
>> >   return n;
>> >}
>> >
>> >The range test optimization should be able to merge all those five
>> >conditions into one in reassoc pass, but I fails to do so. The reason
>> >is that the phi arg of n is replaced by the constant it compares to in
>> >case of == or != comparisons (in vrp pass). GCC checks there is no
>> >side effect on n between any two neighboring conditions by examining
>> >if they defined the same phi arg in the join node. But as the phi arg
>> >is replace by a constant, the check fails.
>
> I can't reproduce this, at least not on x86_64-linux with -O2,
> the ifcombine pass already merges those.
>
>         Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Handling == or != comparisons that may affect range test optimization.
  2013-11-05 20:46 ` Jeff Law
  2013-11-05 20:59   ` Jakub Jelinek
@ 2013-11-05 22:03   ` Cong Hou
  2013-11-05 22:39     ` Jeff Law
  2013-11-06 17:25     ` [PATCH] Improve VRP assert creation for loops Jakub Jelinek
  1 sibling, 2 replies; 16+ messages in thread
From: Cong Hou @ 2013-11-05 22:03 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

On Tue, Nov 5, 2013 at 12:23 PM, Jeff Law <law@redhat.com> wrote:
> On 10/31/13 18:03, Cong Hou wrote:
>>
>> (This patch is for the bug 58728:
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58728)
>>
>> As in the bug report, consider the following loop:
>>
>> int foo(unsigned int n)
>> {
>>    if (n != 0)
>>    if (n != 1)
>>    if (n != 2)
>>    if (n != 3)
>>    if (n != 4)
>>      return ++n;
>>    return n;
>> }
>>
>> The range test optimization should be able to merge all those five
>> conditions into one in reassoc pass, but I fails to do so. The reason
>> is that the phi arg of n is replaced by the constant it compares to in
>> case of == or != comparisons (in vrp pass). GCC checks there is no
>> side effect on n between any two neighboring conditions by examining
>> if they defined the same phi arg in the join node. But as the phi arg
>> is replace by a constant, the check fails.
>>
>> This patch deals with this situation by considering the existence of
>> == or != comparisons, which is attached below (a text file is also
>> attached with proper tabs). Bootstrap and make check both get passed.
>>
>> Any comment?
>
>
> +       bool is_eq_expr = is_cond && (gimple_cond_code (stmt) == NE_EXPR
> +                                       || gimple_cond_code (stmt) ==
> EQ_EXPR)
> +                                 && TREE_CODE (phi_arg) == INTEGER_CST;
> +
> +       if (is_eq_expr)
> +         {
> +           lhs = gimple_cond_lhs (stmt);
> +           rhs = gimple_cond_rhs (stmt);
> +
> +           if (operand_equal_p (lhs, phi_arg, 0))
> +             {
> +               tree t = lhs;
> +               lhs = rhs;
> +               rhs = t;
> +             }
> +           if (operand_equal_p (rhs, phi_arg, 0)
> +               && operand_equal_p (lhs, phi_arg2, 0))
> +             continue;
> +         }
> +
> +       gimple stmt2 = last_stmt (test_bb);
> +       bool is_eq_expr2 = gimple_code (stmt2) == GIMPLE_COND
> +                            && (gimple_cond_code (stmt2) == NE_EXPR
> +                                || gimple_cond_code (stmt2) == EQ_EXPR)
> +                            && TREE_CODE (phi_arg2) == INTEGER_CST;
> +
> +       if (is_eq_expr2)
> +         {
> +           lhs2 = gimple_cond_lhs (stmt2);
> +           rhs2 = gimple_cond_rhs (stmt2);
> +
> +           if (operand_equal_p (lhs2, phi_arg2, 0))
> +             {
> +               tree t = lhs2;
> +               lhs2 = rhs2;
> +               rhs2 = t;
> +             }
> +           if (operand_equal_p (rhs2, phi_arg2, 0)
> +               && operand_equal_p (lhs2, phi_arg, 0))
> +             continue;
> +         }
>
> Can you factor those two hunks of nearly identical code into a single
> function and call it twice?  I'm also curious if you really need the code to
> swap lhs/rhs.  When can the LHS of a cond be an integer constant?  Don't we
> canonicalize it as <ssa_name> <COND> <constant>?


I was not aware that the comparison between a variable and a constant
will always be canonicalized as <ssa_name> <COND> <constant>. Then I
will remove the swap, and as the code is much smaller, I think it may
not be necessary to create a function for them.


>
> I'd probably write the ChangeLog as:
>
>         * tree-ssa-reassoc.c (suitable_cond_bb): Handle constant PHI
>         operands resulting from propagation of edge equivalences.
>
>

OK, much better than mine ;)


> I'm also curious -- did this code show up in a particular benchmark, if so,
> which one?

I didn't find this problem from any benchmark, but from another
concern about loop upper bound estimation. Look at the following code:

int foo(unsigned int n, int r)
{
  int i;
  if (n > 0)
    if (n < 4)
    {
      do {
         --n;
         r *= 2;
      } while (n > 0);
    }
  return r+n;
}


In order to get the upper bound of the loop in this function, GCC
traverses conditions n<4 and n>0 separately and tries to get any
useful information. But as those two conditions cannot be combined
into one due to this issue (note that n>0 will be transformed into
n!=0), when GCC sees n<4, it will consider the possibility that n may
be equal to 0, in which case the upper bound is UINT_MAX. If those two
conditions can be combined into one, which is n-1<=2, then we can get
the correct upper bound of the loop.


thanks,
Cong

>
> jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Handling == or != comparisons that may affect range test optimization.
  2013-11-05 22:03   ` Cong Hou
@ 2013-11-05 22:39     ` Jeff Law
  2013-11-06 17:25     ` [PATCH] Improve VRP assert creation for loops Jakub Jelinek
  1 sibling, 0 replies; 16+ messages in thread
From: Jeff Law @ 2013-11-05 22:39 UTC (permalink / raw)
  To: Cong Hou; +Cc: GCC Patches

On 11/05/13 15:00, Cong Hou wrote:
>>
>> Can you factor those two hunks of nearly identical code into a single
>> function and call it twice?  I'm also curious if you really need the code to
>> swap lhs/rhs.  When can the LHS of a cond be an integer constant?  Don't we
>> canonicalize it as <ssa_name> <COND> <constant>?
>
>
> I was not aware that the comparison between a variable and a constant
> will always be canonicalized as <ssa_name> <COND> <constant>. Then I
> will remove the swap, and as the code is much smaller, I think it may
> not be necessary to create a function for them.
Hmm, looking at is_gimple_condexpr, it appears to accept both forms:

/*  Return true if T is a GIMPLE condition.  */

bool
is_gimple_condexpr (tree t)
{
   return (is_gimple_val (t) || (COMPARISON_CLASS_P (t)
                                 && !tree_could_throw_p (t)
                                 && is_gimple_val (TREE_OPERAND (t, 0))
                                 && is_gimple_val (TREE_OPERAND (t, 1))));
}


Sigh.  So you probably still need to handle both forms :(


>
> I didn't find this problem from any benchmark, but from another
> concern about loop upper bound estimation. Look at the following code:
>
> int foo(unsigned int n, int r)
> {
>    int i;
>    if (n > 0)
>      if (n < 4)
>      {
>        do {
>           --n;
>           r *= 2;
>        } while (n > 0);
>      }
>    return r+n;
> }
>
>
> In order to get the upper bound of the loop in this function, GCC
> traverses conditions n<4 and n>0 separately and tries to get any
> useful information. But as those two conditions cannot be combined
> into one due to this issue (note that n>0 will be transformed into
> n!=0), when GCC sees n<4, it will consider the possibility that n may
> be equal to 0, in which case the upper bound is UINT_MAX. If those two
> conditions can be combined into one, which is n-1<=2, then we can get
> the correct upper bound of the loop.
Ah.  Ok. Thanks.

jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] Improve VRP assert creation for loops
  2013-11-05 22:03   ` Cong Hou
  2013-11-05 22:39     ` Jeff Law
@ 2013-11-06 17:25     ` Jakub Jelinek
  2013-11-06 17:32       ` [PATCH] Use get_range_info during number of iterations analysis Jakub Jelinek
                         ` (2 more replies)
  1 sibling, 3 replies; 16+ messages in thread
From: Jakub Jelinek @ 2013-11-06 17:25 UTC (permalink / raw)
  To: Richard Biener, Cong Hou; +Cc: GCC Patches

On Tue, Nov 05, 2013 at 02:00:16PM -0800, Cong Hou wrote:
> > I'm also curious -- did this code show up in a particular benchmark, if so,
> > which one?
> 
> I didn't find this problem from any benchmark, but from another
> concern about loop upper bound estimation. Look at the following code:
> 
> int foo(unsigned int n, int r)
> {
>   int i;
>   if (n > 0)
>     if (n < 4)
>     {
>       do {
>          --n;
>          r *= 2;
>       } while (n > 0);
>     }
>   return r+n;
> }
> 
> 
> In order to get the upper bound of the loop in this function, GCC
> traverses conditions n<4 and n>0 separately and tries to get any
> useful information. But as those two conditions cannot be combined
> into one due to this issue (note that n>0 will be transformed into
> n!=0), when GCC sees n<4, it will consider the possibility that n may
> be equal to 0, in which case the upper bound is UINT_MAX. If those two
> conditions can be combined into one, which is n-1<=2, then we can get
> the correct upper bound of the loop.

I've looked at the above testcase to see why we aren't able to determine
the number of iterations upper bound properly here.

That doesn't mean your patch is useless, though I must say I'm far from
being convinced it is safe ATM and also it looks like quite ugly special
case (trying to undo a VRP optimization but only one single specific case).

The first problem is VRP, we end up with:
  <bb 5>:
  # n_1 = PHI <n_5(D)(4), n_7(6)>
  # r_3 = PHI <r_6(D)(4), r_8(6)>
  # RANGE [0, 4294967295]
  n_7 = n_1 + 4294967295;
  # RANGE [-2147483648, 2147483647] NONZERO 0x000000000fffffffe
  r_8 = r_3 * 2;
  if (n_7 != 0)
    goto <bb 6>;
  else
    goto <bb 7>;
  
  <bb 6>:
  goto <bb 5>;
- missing range on n_1, extremely conservative range on n_7.  With the
attached patch we get instead:
  <bb 5>:
  # RANGE [1, 3] NONZERO 0x00000000000000003
  # n_1 = PHI <n_5(D)(4), n_7(6)>
  # r_3 = PHI <r_6(D)(4), r_8(6)>
  # RANGE [0, 2] NONZERO 0x00000000000000003
  n_7 = n_1 + 4294967295;
  # RANGE [-2147483648, 2147483647] NONZERO 0x000000000fffffffe
  r_8 = r_3 * 2;
  if (n_7 != 0)
    goto <bb 6>;
  else
    goto <bb 7>;

  <bb 6>:
  goto <bb 5>;

The issue is that we use live_on_edge to determine if it is desirable
to added ASSERT_EXPRs, but as we walk bbs in RPO order, we first enter
the loop through the bb with exit edge and thus live of the latch isn't
computed (and, generally the propagation for live ignores dfs back edges
anyway), and because in the above live_on_edge ((5)->(6), n_7) is false,
we don't add ASSERT_EXPR that n_7 is != 0 in the latch bb, so during
iteration we start with n_1 being assumed [1, 3] (that is range of the
assertion from the preceeding conditions on n_5(D)), but in the next
iteration widen it to [0, 3] because we think n_7 can be [0, 2] in the
PHI and thus end up with uselessly wide range because we think the
subtraction can wrap around.  This patch improves live_on_edge for
empty latches, by just looking at the PHIs on loop->header from the
latch -> header edge and noting which SSA_NAMEs are used there.

I had to add -fno-tree-vrp to 4 unroll_*.c tests, because they disable
various tree optimizations already and want to test unrolling of loops
iterating exactly twice, but with this VRP change VRP is smart enough
to replace the PHI argument on the i IV from
-  # i_13 = PHI <i_8(3), 0(2)>
+  # i_13 = PHI <1(3), 0(2)>
(the loop iterates exactly twice) and RTL unrolling doesn't do the
tested thing in that case.  But, this should affect only loops that roll
exactly twice and those realy should be unrolled already far before, so IMHO
it isn't something to try to optimize better at the RTL level.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2013-11-06  Jakub Jelinek  <jakub@redhat.com>

	* tree-vrp.c (live_on_edge): If e->dest is an empty latch
	of some loop, but live[e->dest->index] is not computed, look
	for SSA_NAMEs used in loop header PHIs from the latch edge.

	* gcc.dg/unroll_1.c: Add -fno-tree-vrp to dg-options.
	* gcc.dg/unroll_2.c: Likewise.
	* gcc.dg/unroll_3.c: Likewise.
	* gcc.dg/unroll_4.c: Likewise.

--- gcc/tree-vrp.c.jj	2013-11-06 08:48:01.000000000 +0100
+++ gcc/tree-vrp.c	2013-11-06 09:32:19.205104029 +0100
@@ -92,6 +92,42 @@ static sbitmap *live;
 static bool
 live_on_edge (edge e, tree name)
 {
+  if (live[e->dest->index] == NULL)
+    {
+      /* Handle edges to empty latch blocks.  If NAME appears
+	 in loop header phis on edges from latch, return true
+	 and remember those SSA_NAMEs.  */
+      basic_block bb = e->dest;
+      if (bb->loop_father
+	  && bb->loop_father->latch == bb
+	  && single_succ_p (bb)
+	  && empty_block_p (bb))
+	{
+	  gimple_stmt_iterator gsi;
+	  edge e2 = single_succ_edge (bb);
+	  for (gsi = gsi_start_phis (e2->dest);
+	       !gsi_end_p (gsi);
+	       gsi_next (&gsi))
+	    {
+	      gimple phi = gsi_stmt (gsi);
+	      tree res = gimple_phi_result (phi), arg;
+
+	      if (virtual_operand_p (res))
+		continue;
+	      arg = PHI_ARG_DEF_FROM_EDGE (phi, e2);
+	      if (TREE_CODE (arg) == SSA_NAME)
+		{
+		  if (live[e->dest->index] == NULL)
+		    {
+		      live[e->dest->index] = sbitmap_alloc (num_ssa_names);
+		      bitmap_clear (live[e->dest->index]);
+		    }
+		  bitmap_set_bit (live[e->dest->index],
+				  SSA_NAME_VERSION (arg));
+		}
+	    }
+	}
+    }
   return (live[e->dest->index]
 	  && bitmap_bit_p (live[e->dest->index], SSA_NAME_VERSION (name)));
 }
--- gcc/testsuite/gcc.dg/unroll_1.c.jj	2013-09-09 11:32:36.000000000 +0200
+++ gcc/testsuite/gcc.dg/unroll_1.c	2013-11-06 17:10:52.900722932 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-rtl-loop2_unroll=stderr -fno-peel-loops -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll" } */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll=stderr -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll" } */
 
 unsigned a[100], b[100];
 inline void bar()
--- gcc/testsuite/gcc.dg/unroll_2.c.jj	2013-08-30 14:38:39.000000000 +0200
+++ gcc/testsuite/gcc.dg/unroll_2.c	2013-11-06 17:10:30.751845504 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo -fenable-rtl-loop2_unroll" } */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo -fenable-rtl-loop2_unroll" } */
 
 unsigned a[100], b[100];
 inline void bar()
--- gcc/testsuite/gcc.dg/unroll_3.c.jj	2013-08-30 14:38:39.000000000 +0200
+++ gcc/testsuite/gcc.dg/unroll_3.c	2013-11-06 17:10:38.864800338 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo" } */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo" } */
 
 unsigned a[100], b[100];
 inline void bar()
--- gcc/testsuite/gcc.dg/unroll_4.c.jj	2013-08-30 14:38:40.000000000 +0200
+++ gcc/testsuite/gcc.dg/unroll_4.c	2013-11-06 17:11:03.816665603 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo2" } */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo2" } */
 
 unsigned a[100], b[100];
 inline void bar()


	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] Use get_range_info during number of iterations analysis
  2013-11-06 17:25     ` [PATCH] Improve VRP assert creation for loops Jakub Jelinek
@ 2013-11-06 17:32       ` Jakub Jelinek
  2013-11-06 22:23         ` Jeff Law
  2013-11-06 22:10       ` [PATCH] Improve VRP assert creation for loops Jeff Law
  2013-11-07  9:08       ` Richard Biener
  2 siblings, 1 reply; 16+ messages in thread
From: Jakub Jelinek @ 2013-11-06 17:32 UTC (permalink / raw)
  To: Richard Biener, Cong Hou; +Cc: GCC Patches

On Wed, Nov 06, 2013 at 06:13:42PM +0100, Jakub Jelinek wrote:
> I've looked at the above testcase to see why we aren't able to determine
> the number of iterations upper bound properly here.

And here is a patch that uses get_range_info during # of iterations
analysis, so that for your testcase we can actually find out that it has
at most 2 latch executions and at most 3 header executions.

On that testcase, bound_difference is actually called with n_5(D) + 0xffffffff
and 0 and we don't have range info for n_5(D), because we had better range
info only for the SSA_NAMEs temporarily created for the assertions.
But with the previously posted patch, we have reasonable range info on the
PHI of the IV at loop->header, and as that PHI has n_5(D) as one of its
arguments (the one from preheader edge), we can still use that range info
when inside of the loop for # of iters purposes, even when it might be
wider range than strictly necessary (but not in this case).
So, the patch looks at both get_range_info of var itself and of all the
results of PHIs on loop->header that have var as PHI arg from the preheader
edge, and ands all the ranges together.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2013-11-06  Jakub Jelinek  <jakub@redhat.com>

	* tree-ssa-loop-niter.c: Include tree-ssanames.h.
	(determine_value_range): Add loop argument.  Use get_range_info to
	improve range.
	(bound_difference): Adjust caller.

--- gcc/tree-ssa-loop-niter.c.jj	2013-10-24 10:19:21.000000000 +0200
+++ gcc/tree-ssa-loop-niter.c	2013-11-06 14:37:47.445220588 +0100
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.
 #include "diagnostic-core.h"
 #include "tree-inline.h"
 #include "tree-pass.h"
+#include "tree-ssanames.h"
 
 
 #define SWAP(X, Y) do { affine_iv *tmp = (X); (X) = (Y); (Y) = tmp; } while (0)
@@ -119,9 +120,12 @@ split_to_var_and_offset (tree expr, tree
    in TYPE to MIN and MAX.  */
 
 static void
-determine_value_range (tree type, tree var, mpz_t off,
+determine_value_range (struct loop *loop, tree type, tree var, mpz_t off,
 		       mpz_t min, mpz_t max)
 {
+  double_int minv, maxv;
+  enum value_range_type rtype = VR_VARYING;
+
   /* If the expression is a constant, we know its value exactly.  */
   if (integer_zerop (var))
     {
@@ -130,9 +134,73 @@ determine_value_range (tree type, tree v
       return;
     }
 
+  get_type_static_bounds (type, min, max);
+
+  /* See if we have some range info from VRP.  */
+  if (TREE_CODE (var) == SSA_NAME && INTEGRAL_TYPE_P (type))
+    {
+      edge e = loop_preheader_edge (loop);
+      gimple_stmt_iterator gsi;
+
+      /* Either for VAR itself...  */
+      rtype = get_range_info (var, &minv, &maxv);
+      /* Or for PHI results in loop->header where VAR is used as
+	 PHI argument from the loop preheader edge.  */
+      for (gsi = gsi_start_phis (loop->header); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple phi = gsi_stmt (gsi);
+	  double_int minc, maxc;
+	  if (PHI_ARG_DEF_FROM_EDGE (phi, e) == var
+	      && (get_range_info (gimple_phi_result (phi), &minc, &maxc)
+		  == VR_RANGE))
+	    {
+	      if (rtype != VR_RANGE)
+		{
+		  rtype = VR_RANGE;
+		  minv = minc;
+		  maxv = maxc;
+		}
+	      else
+		{
+		  minv = minv.max (minc, TYPE_UNSIGNED (type));
+		  maxv = maxv.min (maxc, TYPE_UNSIGNED (type));
+		  gcc_assert (minv.cmp (maxv, TYPE_UNSIGNED (type)) <= 0);
+		}
+	    }
+	}
+      if (rtype == VR_RANGE)
+	{
+	  mpz_t minm, maxm;
+	  gcc_assert (minv.cmp (maxv, TYPE_UNSIGNED (type)) <= 0);
+	  mpz_init (minm);
+	  mpz_init (maxm);
+	  mpz_set_double_int (minm, minv, TYPE_UNSIGNED (type));
+	  mpz_set_double_int (maxm, maxv, TYPE_UNSIGNED (type));
+	  mpz_add (minm, minm, off);
+	  mpz_add (maxm, maxm, off);
+	  /* If the computation may not wrap or off is zero, then this
+	     is always fine.  If off is negative and minv + off isn't
+	     smaller than type's minimum, or off is positive and
+	     maxv + off isn't bigger than type's maximum, use the more
+	     precise range too.  */
+	  if (nowrap_type_p (type)
+	      || mpz_sgn (off) == 0
+	      || (mpz_sgn (off) < 0 && mpz_cmp (minm, min) >= 0)
+	      || (mpz_sgn (off) > 0 && mpz_cmp (maxm, max) <= 0))
+	    {
+	      mpz_set (min, minm);
+	      mpz_set (max, maxm);
+	      mpz_clear (minm);
+	      mpz_clear (maxm);
+	      return;
+	    }
+	  mpz_clear (minm);
+	  mpz_clear (maxm);
+	}
+    }
+
   /* If the computation may wrap, we know nothing about the value, except for
      the range of the type.  */
-  get_type_static_bounds (type, min, max);
   if (!nowrap_type_p (type))
     return;
 
@@ -405,8 +473,8 @@ bound_difference (struct loop *loop, tre
       mpz_init (maxx);
       mpz_init (miny);
       mpz_init (maxy);
-      determine_value_range (type, varx, offx, minx, maxx);
-      determine_value_range (type, vary, offy, miny, maxy);
+      determine_value_range (loop, type, varx, offx, minx, maxx);
+      determine_value_range (loop, type, vary, offy, miny, maxy);
 
       mpz_sub (bnds->below, minx, maxy);
       mpz_sub (bnds->up, maxx, miny);


	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Improve VRP assert creation for loops
  2013-11-06 17:25     ` [PATCH] Improve VRP assert creation for loops Jakub Jelinek
  2013-11-06 17:32       ` [PATCH] Use get_range_info during number of iterations analysis Jakub Jelinek
@ 2013-11-06 22:10       ` Jeff Law
  2013-11-07  9:08       ` Richard Biener
  2 siblings, 0 replies; 16+ messages in thread
From: Jeff Law @ 2013-11-06 22:10 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener, Cong Hou; +Cc: GCC Patches

On 11/06/13 10:13, Jakub Jelinek wrote:
> On Tue, Nov 05, 2013 at 02:00:16PM -0800, Cong Hou wrote:
>>> I'm also curious -- did this code show up in a particular benchmark, if so,
>>> which one?
>>
>> I didn't find this problem from any benchmark, but from another
>> concern about loop upper bound estimation. Look at the following code:
>>
>> int foo(unsigned int n, int r)
>> {
>>    int i;
>>    if (n > 0)
>>      if (n < 4)
>>      {
>>        do {
>>           --n;
>>           r *= 2;
>>        } while (n > 0);
>>      }
>>    return r+n;
>> }
>>
>>
>> In order to get the upper bound of the loop in this function, GCC
>> traverses conditions n<4 and n>0 separately and tries to get any
>> useful information. But as those two conditions cannot be combined
>> into one due to this issue (note that n>0 will be transformed into
>> n!=0), when GCC sees n<4, it will consider the possibility that n may
>> be equal to 0, in which case the upper bound is UINT_MAX. If those two
>> conditions can be combined into one, which is n-1<=2, then we can get
>> the correct upper bound of the loop.
>
> I've looked at the above testcase to see why we aren't able to determine
> the number of iterations upper bound properly here.
>
> That doesn't mean your patch is useless, though I must say I'm far from
> being convinced it is safe ATM and also it looks like quite ugly special
> case (trying to undo a VRP optimization but only one single specific case).
>
> The first problem is VRP, we end up with:
>    <bb 5>:
>    # n_1 = PHI <n_5(D)(4), n_7(6)>
>    # r_3 = PHI <r_6(D)(4), r_8(6)>
>    # RANGE [0, 4294967295]
>    n_7 = n_1 + 4294967295;
>    # RANGE [-2147483648, 2147483647] NONZERO 0x000000000fffffffe
>    r_8 = r_3 * 2;
>    if (n_7 != 0)
>      goto <bb 6>;
>    else
>      goto <bb 7>;
>
>    <bb 6>:
>    goto <bb 5>;
> - missing range on n_1, extremely conservative range on n_7.  With the
> attached patch we get instead:
>    <bb 5>:
>    # RANGE [1, 3] NONZERO 0x00000000000000003
>    # n_1 = PHI <n_5(D)(4), n_7(6)>
>    # r_3 = PHI <r_6(D)(4), r_8(6)>
>    # RANGE [0, 2] NONZERO 0x00000000000000003
>    n_7 = n_1 + 4294967295;
>    # RANGE [-2147483648, 2147483647] NONZERO 0x000000000fffffffe
>    r_8 = r_3 * 2;
>    if (n_7 != 0)
>      goto <bb 6>;
>    else
>      goto <bb 7>;
>
>    <bb 6>:
>    goto <bb 5>;
>
> The issue is that we use live_on_edge to determine if it is desirable
> to added ASSERT_EXPRs, but as we walk bbs in RPO order, we first enter
> the loop through the bb with exit edge and thus live of the latch isn't
> computed (and, generally the propagation for live ignores dfs back edges
> anyway), and because in the above live_on_edge ((5)->(6), n_7) is false,
> we don't add ASSERT_EXPR that n_7 is != 0 in the latch bb, so during
> iteration we start with n_1 being assumed [1, 3] (that is range of the
> assertion from the preceeding conditions on n_5(D)), but in the next
> iteration widen it to [0, 3] because we think n_7 can be [0, 2] in the
> PHI and thus end up with uselessly wide range because we think the
> subtraction can wrap around.  This patch improves live_on_edge for
> empty latches, by just looking at the PHIs on loop->header from the
> latch -> header edge and noting which SSA_NAMEs are used there.
>
> I had to add -fno-tree-vrp to 4 unroll_*.c tests, because they disable
> various tree optimizations already and want to test unrolling of loops
> iterating exactly twice, but with this VRP change VRP is smart enough
> to replace the PHI argument on the i IV from
> -  # i_13 = PHI <i_8(3), 0(2)>
> +  # i_13 = PHI <1(3), 0(2)>
> (the loop iterates exactly twice) and RTL unrolling doesn't do the
> tested thing in that case.  But, this should affect only loops that roll
> exactly twice and those realy should be unrolled already far before, so IMHO
> it isn't something to try to optimize better at the RTL level.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2013-11-06  Jakub Jelinek  <jakub@redhat.com>
>
> 	* tree-vrp.c (live_on_edge): If e->dest is an empty latch
> 	of some loop, but live[e->dest->index] is not computed, look
> 	for SSA_NAMEs used in loop header PHIs from the latch edge.
>
> 	* gcc.dg/unroll_1.c: Add -fno-tree-vrp to dg-options.
> 	* gcc.dg/unroll_2.c: Likewise.
> 	* gcc.dg/unroll_3.c: Likewise.
> 	* gcc.dg/unroll_4.c: Likewise.
OK with a testcase verifying we get the tighter ranges.

I wonder a bit how often this situation occurs in practice 
(unnecessarily wide range due to missing an ASSERT_EXPR on the latch edge.

jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Use get_range_info during number of iterations analysis
  2013-11-06 17:32       ` [PATCH] Use get_range_info during number of iterations analysis Jakub Jelinek
@ 2013-11-06 22:23         ` Jeff Law
  2013-11-07 14:38           ` Jakub Jelinek
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Law @ 2013-11-06 22:23 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener, Cong Hou; +Cc: GCC Patches

On 11/06/13 10:23, Jakub Jelinek wrote:
> On Wed, Nov 06, 2013 at 06:13:42PM +0100, Jakub Jelinek wrote:
>> I've looked at the above testcase to see why we aren't able to determine
>> the number of iterations upper bound properly here.
>
> And here is a patch that uses get_range_info during # of iterations
> analysis, so that for your testcase we can actually find out that it has
> at most 2 latch executions and at most 3 header executions.
>
> On that testcase, bound_difference is actually called with n_5(D) + 0xffffffff
> and 0 and we don't have range info for n_5(D), because we had better range
> info only for the SSA_NAMEs temporarily created for the assertions.
> But with the previously posted patch, we have reasonable range info on the
> PHI of the IV at loop->header, and as that PHI has n_5(D) as one of its
> arguments (the one from preheader edge), we can still use that range info
> when inside of the loop for # of iters purposes, even when it might be
> wider range than strictly necessary (but not in this case).
> So, the patch looks at both get_range_info of var itself and of all the
> results of PHIs on loop->header that have var as PHI arg from the preheader
> edge, and ands all the ranges together.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2013-11-06  Jakub Jelinek  <jakub@redhat.com>
>
> 	* tree-ssa-loop-niter.c: Include tree-ssanames.h.
> 	(determine_value_range): Add loop argument.  Use get_range_info to
> 	improve range.
> 	(bound_difference): Adjust caller.
Clever, using the range from the PHI in the loop header.  While it may 
be too wide, it's likely going to be a lot better than nothing.

Add a testcase and this is OK.

jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Improve VRP assert creation for loops
  2013-11-06 17:25     ` [PATCH] Improve VRP assert creation for loops Jakub Jelinek
  2013-11-06 17:32       ` [PATCH] Use get_range_info during number of iterations analysis Jakub Jelinek
  2013-11-06 22:10       ` [PATCH] Improve VRP assert creation for loops Jeff Law
@ 2013-11-07  9:08       ` Richard Biener
  2013-11-07  9:42         ` Jakub Jelinek
  2 siblings, 1 reply; 16+ messages in thread
From: Richard Biener @ 2013-11-07  9:08 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Cong Hou, GCC Patches

On Wed, 6 Nov 2013, Jakub Jelinek wrote:

> On Tue, Nov 05, 2013 at 02:00:16PM -0800, Cong Hou wrote:
> > > I'm also curious -- did this code show up in a particular benchmark, if so,
> > > which one?
> > 
> > I didn't find this problem from any benchmark, but from another
> > concern about loop upper bound estimation. Look at the following code:
> > 
> > int foo(unsigned int n, int r)
> > {
> >   int i;
> >   if (n > 0)
> >     if (n < 4)
> >     {
> >       do {
> >          --n;
> >          r *= 2;
> >       } while (n > 0);
> >     }
> >   return r+n;
> > }
> > 
> > 
> > In order to get the upper bound of the loop in this function, GCC
> > traverses conditions n<4 and n>0 separately and tries to get any
> > useful information. But as those two conditions cannot be combined
> > into one due to this issue (note that n>0 will be transformed into
> > n!=0), when GCC sees n<4, it will consider the possibility that n may
> > be equal to 0, in which case the upper bound is UINT_MAX. If those two
> > conditions can be combined into one, which is n-1<=2, then we can get
> > the correct upper bound of the loop.
> 
> I've looked at the above testcase to see why we aren't able to determine
> the number of iterations upper bound properly here.
> 
> That doesn't mean your patch is useless, though I must say I'm far from
> being convinced it is safe ATM and also it looks like quite ugly special
> case (trying to undo a VRP optimization but only one single specific case).
> 
> The first problem is VRP, we end up with:
>   <bb 5>:
>   # n_1 = PHI <n_5(D)(4), n_7(6)>
>   # r_3 = PHI <r_6(D)(4), r_8(6)>
>   # RANGE [0, 4294967295]
>   n_7 = n_1 + 4294967295;
>   # RANGE [-2147483648, 2147483647] NONZERO 0x000000000fffffffe
>   r_8 = r_3 * 2;
>   if (n_7 != 0)
>     goto <bb 6>;
>   else
>     goto <bb 7>;
>   
>   <bb 6>:
>   goto <bb 5>;
> - missing range on n_1, extremely conservative range on n_7.  With the
> attached patch we get instead:
>   <bb 5>:
>   # RANGE [1, 3] NONZERO 0x00000000000000003
>   # n_1 = PHI <n_5(D)(4), n_7(6)>
>   # r_3 = PHI <r_6(D)(4), r_8(6)>
>   # RANGE [0, 2] NONZERO 0x00000000000000003
>   n_7 = n_1 + 4294967295;
>   # RANGE [-2147483648, 2147483647] NONZERO 0x000000000fffffffe
>   r_8 = r_3 * 2;
>   if (n_7 != 0)
>     goto <bb 6>;
>   else
>     goto <bb 7>;
> 
>   <bb 6>:
>   goto <bb 5>;
> 
> The issue is that we use live_on_edge to determine if it is desirable
> to added ASSERT_EXPRs, but as we walk bbs in RPO order, we first enter
> the loop through the bb with exit edge and thus live of the latch isn't
> computed (and, generally the propagation for live ignores dfs back edges
> anyway), and because in the above live_on_edge ((5)->(6), n_7) is false,
> we don't add ASSERT_EXPR that n_7 is != 0 in the latch bb, so during
> iteration we start with n_1 being assumed [1, 3] (that is range of the
> assertion from the preceeding conditions on n_5(D)), but in the next
> iteration widen it to [0, 3] because we think n_7 can be [0, 2] in the
> PHI and thus end up with uselessly wide range because we think the
> subtraction can wrap around.  This patch improves live_on_edge for
> empty latches, by just looking at the PHIs on loop->header from the
> latch -> header edge and noting which SSA_NAMEs are used there.
> 
> I had to add -fno-tree-vrp to 4 unroll_*.c tests, because they disable
> various tree optimizations already and want to test unrolling of loops
> iterating exactly twice, but with this VRP change VRP is smart enough
> to replace the PHI argument on the i IV from
> -  # i_13 = PHI <i_8(3), 0(2)>
> +  # i_13 = PHI <1(3), 0(2)>
> (the loop iterates exactly twice) and RTL unrolling doesn't do the
> tested thing in that case.  But, this should affect only loops that roll
> exactly twice and those realy should be unrolled already far before, so IMHO
> it isn't something to try to optimize better at the RTL level.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2013-11-06  Jakub Jelinek  <jakub@redhat.com>
> 
> 	* tree-vrp.c (live_on_edge): If e->dest is an empty latch
> 	of some loop, but live[e->dest->index] is not computed, look
> 	for SSA_NAMEs used in loop header PHIs from the latch edge.
> 
> 	* gcc.dg/unroll_1.c: Add -fno-tree-vrp to dg-options.
> 	* gcc.dg/unroll_2.c: Likewise.
> 	* gcc.dg/unroll_3.c: Likewise.
> 	* gcc.dg/unroll_4.c: Likewise.
> 
> --- gcc/tree-vrp.c.jj	2013-11-06 08:48:01.000000000 +0100
> +++ gcc/tree-vrp.c	2013-11-06 09:32:19.205104029 +0100
> @@ -92,6 +92,42 @@ static sbitmap *live;
>  static bool
>  live_on_edge (edge e, tree name)
>  {
> +  if (live[e->dest->index] == NULL)
> +    {

I'd rather have live[] computed "correctly" than the following
which I think is a hack ... as you say the issue is ordering
(or rather that there isn't an "order" for CFG cycles unless
we want to iterate).  For loop BB order we explicitely handle
the latch, maybe just using a different order than
RPO order, with special-casing the latch, makes more sense?

Richard.

> +      /* Handle edges to empty latch blocks.  If NAME appears
> +	 in loop header phis on edges from latch, return true
> +	 and remember those SSA_NAMEs.  */
> +      basic_block bb = e->dest;
> +      if (bb->loop_father
> +	  && bb->loop_father->latch == bb
> +	  && single_succ_p (bb)
> +	  && empty_block_p (bb))
> +	{
> +	  gimple_stmt_iterator gsi;
> +	  edge e2 = single_succ_edge (bb);
> +	  for (gsi = gsi_start_phis (e2->dest);
> +	       !gsi_end_p (gsi);
> +	       gsi_next (&gsi))
> +	    {
> +	      gimple phi = gsi_stmt (gsi);
> +	      tree res = gimple_phi_result (phi), arg;
> +
> +	      if (virtual_operand_p (res))
> +		continue;
> +	      arg = PHI_ARG_DEF_FROM_EDGE (phi, e2);
> +	      if (TREE_CODE (arg) == SSA_NAME)
> +		{
> +		  if (live[e->dest->index] == NULL)
> +		    {
> +		      live[e->dest->index] = sbitmap_alloc (num_ssa_names);
> +		      bitmap_clear (live[e->dest->index]);
> +		    }
> +		  bitmap_set_bit (live[e->dest->index],
> +				  SSA_NAME_VERSION (arg));
> +		}
> +	    }
> +	}
> +    }
>    return (live[e->dest->index]
>  	  && bitmap_bit_p (live[e->dest->index], SSA_NAME_VERSION (name)));
>  }
> --- gcc/testsuite/gcc.dg/unroll_1.c.jj	2013-09-09 11:32:36.000000000 +0200
> +++ gcc/testsuite/gcc.dg/unroll_1.c	2013-11-06 17:10:52.900722932 +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-rtl-loop2_unroll=stderr -fno-peel-loops -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll" } */
> +/* { dg-options "-O2 -fdump-rtl-loop2_unroll=stderr -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll" } */
>  
>  unsigned a[100], b[100];
>  inline void bar()
> --- gcc/testsuite/gcc.dg/unroll_2.c.jj	2013-08-30 14:38:39.000000000 +0200
> +++ gcc/testsuite/gcc.dg/unroll_2.c	2013-11-06 17:10:30.751845504 +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo -fenable-rtl-loop2_unroll" } */
> +/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo -fenable-rtl-loop2_unroll" } */
>  
>  unsigned a[100], b[100];
>  inline void bar()
> --- gcc/testsuite/gcc.dg/unroll_3.c.jj	2013-08-30 14:38:39.000000000 +0200
> +++ gcc/testsuite/gcc.dg/unroll_3.c	2013-11-06 17:10:38.864800338 +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo" } */
> +/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo" } */
>  
>  unsigned a[100], b[100];
>  inline void bar()
> --- gcc/testsuite/gcc.dg/unroll_4.c.jj	2013-08-30 14:38:40.000000000 +0200
> +++ gcc/testsuite/gcc.dg/unroll_4.c	2013-11-06 17:11:03.816665603 +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo2" } */
> +/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo2" } */
>  
>  unsigned a[100], b[100];
>  inline void bar()
> 
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Improve VRP assert creation for loops
  2013-11-07  9:08       ` Richard Biener
@ 2013-11-07  9:42         ` Jakub Jelinek
  2013-11-07  9:44           ` Richard Biener
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Jelinek @ 2013-11-07  9:42 UTC (permalink / raw)
  To: Richard Biener; +Cc: Cong Hou, GCC Patches

On Thu, Nov 07, 2013 at 09:48:46AM +0100, Richard Biener wrote:
> > --- gcc/tree-vrp.c.jj	2013-11-06 08:48:01.000000000 +0100
> > +++ gcc/tree-vrp.c	2013-11-06 09:32:19.205104029 +0100
> > @@ -92,6 +92,42 @@ static sbitmap *live;
> >  static bool
> >  live_on_edge (edge e, tree name)
> >  {
> > +  if (live[e->dest->index] == NULL)
> > +    {
> 
> I'd rather have live[] computed "correctly" than the following
> which I think is a hack ... as you say the issue is ordering
> (or rather that there isn't an "order" for CFG cycles unless
> we want to iterate).  For loop BB order we explicitely handle
> the latch, maybe just using a different order than
> RPO order, with special-casing the latch, makes more sense?

But is there any order that would help?  Sure, we could say just tweak
the rpo/bb_rpo arrays and put there latch bbs before any other bbs from the
same loop, but that would only help us in calling find_assert_locations_1
on the latch before other bbs in the loop.  But that call does nothing,
and we'd need to special case handling of the live for the latch block
anyway, just not in live_on_edge, but in find_assert_locations instead.

Is that what what you are looking for?

Because without that special casing of handling latch edges, you'd need to
process the header rather than latch first, and propagate through DFS_BACK
edges, but then you couldn't propagate through other edges, nor in this
testcase would be the latch live computed when processing the header which
has there the condition.

	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Improve VRP assert creation for loops
  2013-11-07  9:42         ` Jakub Jelinek
@ 2013-11-07  9:44           ` Richard Biener
  2013-11-07 11:32             ` Jakub Jelinek
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Biener @ 2013-11-07  9:44 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Cong Hou, GCC Patches

On Thu, 7 Nov 2013, Jakub Jelinek wrote:

> On Thu, Nov 07, 2013 at 09:48:46AM +0100, Richard Biener wrote:
> > > --- gcc/tree-vrp.c.jj	2013-11-06 08:48:01.000000000 +0100
> > > +++ gcc/tree-vrp.c	2013-11-06 09:32:19.205104029 +0100
> > > @@ -92,6 +92,42 @@ static sbitmap *live;
> > >  static bool
> > >  live_on_edge (edge e, tree name)
> > >  {
> > > +  if (live[e->dest->index] == NULL)
> > > +    {
> > 
> > I'd rather have live[] computed "correctly" than the following
> > which I think is a hack ... as you say the issue is ordering
> > (or rather that there isn't an "order" for CFG cycles unless
> > we want to iterate).  For loop BB order we explicitely handle
> > the latch, maybe just using a different order than
> > RPO order, with special-casing the latch, makes more sense?
> 
> But is there any order that would help?  Sure, we could say just tweak
> the rpo/bb_rpo arrays and put there latch bbs before any other bbs from the
> same loop, but that would only help us in calling find_assert_locations_1
> on the latch before other bbs in the loop.  But that call does nothing,
> and we'd need to special case handling of the live for the latch block
> anyway, just not in live_on_edge, but in find_assert_locations instead.
> 
> Is that what what you are looking for?
> 
> Because without that special casing of handling latch edges, you'd need to
> process the header rather than latch first, and propagate through DFS_BACK
> edges, but then you couldn't propagate through other edges, nor in this
> testcase would be the latch live computed when processing the header which
> has there the condition.

I'm looking for adjusting of live compute - either by adjusting the
algorithm or by special casing the latches.  Like for example
with the following (untested, needs cleanup - the PHI processing
can be factored out from find_assert_locations_1 and re-used):

@@ -5904,6 +5898,27 @@ find_assert_locations (void)
   for (i = 0; i < rpo_cnt; ++i)
     bb_rpo[rpo[i]] = i;
 
+  /* Pre-seed loop latch liveness from loop header PHI nodes.  Due to
+     the order we compute liveness and insert asserts we otherwise
+     fail to insert asserts into the loop latch.  */
+  loop_p loop;
+  loop_iterator li;
+  FOR_EACH_LOOP (li, loop, 0)
+    {
+      i = loop->latch->index;
+      live[i] = sbitmap_alloc (num_ssa_names);
+      bitmap_clear (live[i]);
+      for (gimple_stmt_iterator gsi = gsi_start_phis (loop->header);
+          !gsi_end_p (gsi); gsi_next (&gsi))
+       {
+         gimple phi = gsi_stmt (gsi);
+         if (virtual_operand_p (gimple_phi_result (phi)))
+           continue;
+         for (unsigned j = 0; j < gimple_phi_num_args (phi); ++j)
+           if (TREE_CODE (gimple_phi_arg_def (phi, j)) == SSA_NAME)
+             bitmap_set_bit (live[i], SSA_NAME_VERSION 
(gimple_phi_arg_def (phi, j)));
+       }
+    }

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Improve VRP assert creation for loops
  2013-11-07  9:44           ` Richard Biener
@ 2013-11-07 11:32             ` Jakub Jelinek
  2013-11-07 11:41               ` Richard Biener
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Jelinek @ 2013-11-07 11:32 UTC (permalink / raw)
  To: Richard Biener; +Cc: Cong Hou, GCC Patches

On Thu, Nov 07, 2013 at 10:32:10AM +0100, Richard Biener wrote:
> I'm looking for adjusting of live compute - either by adjusting the
> algorithm or by special casing the latches.  Like for example
> with the following (untested, needs cleanup - the PHI processing
> can be factored out from find_assert_locations_1 and re-used):
> 
> @@ -5904,6 +5898,27 @@ find_assert_locations (void)
>    for (i = 0; i < rpo_cnt; ++i)
>      bb_rpo[rpo[i]] = i;
>  
> +  /* Pre-seed loop latch liveness from loop header PHI nodes.  Due to
> +     the order we compute liveness and insert asserts we otherwise
> +     fail to insert asserts into the loop latch.  */
> +  loop_p loop;
> +  loop_iterator li;
> +  FOR_EACH_LOOP (li, loop, 0)
> +    {
> +      i = loop->latch->index;
> +      live[i] = sbitmap_alloc (num_ssa_names);
> +      bitmap_clear (live[i]);
> +      for (gimple_stmt_iterator gsi = gsi_start_phis (loop->header);
> +          !gsi_end_p (gsi); gsi_next (&gsi))
> +       {
> +         gimple phi = gsi_stmt (gsi);
> +         if (virtual_operand_p (gimple_phi_result (phi)))
> +           continue;
> +         for (unsigned j = 0; j < gimple_phi_num_args (phi); ++j)
> +           if (TREE_CODE (gimple_phi_arg_def (phi, j)) == SSA_NAME)
> +             bitmap_set_bit (live[i], SSA_NAME_VERSION 
> (gimple_phi_arg_def (phi, j)));
> +       }
> +    }

LGTM, except that I think we should only care about phi args from the
latch -> header edge, not others and IMHO it is memory waste to allocate
sbitmap even when we wouldn't add anything.  So, I'm going to
rebootstrap/regtest the following (including new testcase Jeff asked for):

2013-11-07  Richard Biener  <rguenther@suse.de>
	    Jakub Jelinek  <jakub@redhat.com>

	* tree-vrp.c (find_assert_locations): Pre-seed live bitmaps for loop
	latches from header PHI arguments from the latch edge.

        * gcc.dg/unroll_1.c: Add -fno-tree-vrp to dg-options.
        * gcc.dg/unroll_2.c: Likewise.
        * gcc.dg/unroll_3.c: Likewise.
        * gcc.dg/unroll_4.c: Likewise.
	* gcc.dg/vrp90.c: New test.

--- gcc/tree-vrp.c.jj	2013-11-06 16:58:04.675678567 +0100
+++ gcc/tree-vrp.c	2013-11-07 10:59:38.841283931 +0100
@@ -5906,6 +5906,34 @@ find_assert_locations (void)
   for (i = 0; i < rpo_cnt; ++i)
     bb_rpo[rpo[i]] = i;
 
+  /* Pre-seed loop latch liveness from loop header PHI nodes.  Due to
+     the order we compute liveness and insert asserts we otherwise
+     fail to insert asserts into the loop latch.  */
+  loop_p loop;
+  loop_iterator li;
+  FOR_EACH_LOOP (li, loop, 0)
+    {
+      i = loop->latch->index;
+      unsigned int j = find_edge (loop->latch, loop->header)->dest_idx;
+      for (gimple_stmt_iterator gsi = gsi_start_phis (loop->header);
+	   !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple phi = gsi_stmt (gsi);
+	  if (virtual_operand_p (gimple_phi_result (phi)))
+	    continue;
+	  tree arg = gimple_phi_arg_def (phi, j);
+	  if (TREE_CODE (arg) == SSA_NAME)
+	    {
+	      if (live[i] == NULL)
+		{
+		  live[i] = sbitmap_alloc (num_ssa_names);
+		  bitmap_clear (live[i]);
+		}
+	      bitmap_set_bit (live[i], SSA_NAME_VERSION (arg));
+	    }
+	}
+    }
+
   need_asserts = false;
   for (i = rpo_cnt - 1; i >= 0; --i)
     {
--- gcc/testsuite/gcc.dg/unroll_1.c.jj	2013-09-09 11:32:36.000000000 +0200
+++ gcc/testsuite/gcc.dg/unroll_1.c	2013-11-06 17:10:52.900722932 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-rtl-loop2_unroll=stderr -fno-peel-loops -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll" } */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll=stderr -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll" } */
 
 unsigned a[100], b[100];
 inline void bar()
--- gcc/testsuite/gcc.dg/unroll_2.c.jj	2013-08-30 14:38:39.000000000 +0200
+++ gcc/testsuite/gcc.dg/unroll_2.c	2013-11-06 17:10:30.751845504 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo -fenable-rtl-loop2_unroll" } */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo -fenable-rtl-loop2_unroll" } */
 
 unsigned a[100], b[100];
 inline void bar()
--- gcc/testsuite/gcc.dg/unroll_3.c.jj	2013-08-30 14:38:39.000000000 +0200
+++ gcc/testsuite/gcc.dg/unroll_3.c	2013-11-06 17:10:38.864800338 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo" } */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo" } */
 
 unsigned a[100], b[100];
 inline void bar()
--- gcc/testsuite/gcc.dg/unroll_4.c.jj	2013-08-30 14:38:40.000000000 +0200
+++ gcc/testsuite/gcc.dg/unroll_4.c	2013-11-06 17:11:03.816665603 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo2" } */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo2" } */
 
 unsigned a[100], b[100];
 inline void bar()
--- gcc/testsuite/gcc.dg/tree-ssa/vrp90.c.jj	2013-11-07 11:01:55.137535477 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/vrp90.c	2013-11-07 11:02:35.189335924 +0100
@@ -0,0 +1,36 @@
+/* { dg-do link } */
+/* { dg-options "-O2 -fdump-tree-vrp1" } */
+/* { dg-final { scan-tree-dump-not "link_error" "vrp1"} } */
+/* { dg-final { cleanup-tree-dump "vrp1" } } */
+
+extern void link_error (void);
+
+__attribute__((noinline, noclone)) int
+foo (unsigned int n, int r)
+{
+  int i;
+  if (n > 0)
+    {
+      asm ("");
+      if (n < 10)
+	{
+	  asm ("");
+	  do
+	    {
+	      --n;
+	      r *= 2;
+	      if (n >= 9)
+		link_error ();
+	    }
+	  while (n > 0);
+	}
+    }
+  return r + n;
+}
+
+int
+main ()
+{
+  foo (7, 2);
+  return 0;
+}


	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Improve VRP assert creation for loops
  2013-11-07 11:32             ` Jakub Jelinek
@ 2013-11-07 11:41               ` Richard Biener
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Biener @ 2013-11-07 11:41 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Cong Hou, GCC Patches

On Thu, 7 Nov 2013, Jakub Jelinek wrote:

> On Thu, Nov 07, 2013 at 10:32:10AM +0100, Richard Biener wrote:
> > I'm looking for adjusting of live compute - either by adjusting the
> > algorithm or by special casing the latches.  Like for example
> > with the following (untested, needs cleanup - the PHI processing
> > can be factored out from find_assert_locations_1 and re-used):
> > 
> > @@ -5904,6 +5898,27 @@ find_assert_locations (void)
> >    for (i = 0; i < rpo_cnt; ++i)
> >      bb_rpo[rpo[i]] = i;
> >  
> > +  /* Pre-seed loop latch liveness from loop header PHI nodes.  Due to
> > +     the order we compute liveness and insert asserts we otherwise
> > +     fail to insert asserts into the loop latch.  */
> > +  loop_p loop;
> > +  loop_iterator li;
> > +  FOR_EACH_LOOP (li, loop, 0)
> > +    {
> > +      i = loop->latch->index;
> > +      live[i] = sbitmap_alloc (num_ssa_names);
> > +      bitmap_clear (live[i]);
> > +      for (gimple_stmt_iterator gsi = gsi_start_phis (loop->header);
> > +          !gsi_end_p (gsi); gsi_next (&gsi))
> > +       {
> > +         gimple phi = gsi_stmt (gsi);
> > +         if (virtual_operand_p (gimple_phi_result (phi)))
> > +           continue;
> > +         for (unsigned j = 0; j < gimple_phi_num_args (phi); ++j)
> > +           if (TREE_CODE (gimple_phi_arg_def (phi, j)) == SSA_NAME)
> > +             bitmap_set_bit (live[i], SSA_NAME_VERSION 
> > (gimple_phi_arg_def (phi, j)));
> > +       }
> > +    }
> 
> LGTM, except that I think we should only care about phi args from the
> latch -> header edge, not others and IMHO it is memory waste to allocate
> sbitmap even when we wouldn't add anything.  So, I'm going to
> rebootstrap/regtest the following (including new testcase Jeff asked for):
> 
> 2013-11-07  Richard Biener  <rguenther@suse.de>
> 	    Jakub Jelinek  <jakub@redhat.com>
> 
> 	* tree-vrp.c (find_assert_locations): Pre-seed live bitmaps for loop
> 	latches from header PHI arguments from the latch edge.
> 
>         * gcc.dg/unroll_1.c: Add -fno-tree-vrp to dg-options.
>         * gcc.dg/unroll_2.c: Likewise.
>         * gcc.dg/unroll_3.c: Likewise.
>         * gcc.dg/unroll_4.c: Likewise.
> 	* gcc.dg/vrp90.c: New test.
> 
> --- gcc/tree-vrp.c.jj	2013-11-06 16:58:04.675678567 +0100
> +++ gcc/tree-vrp.c	2013-11-07 10:59:38.841283931 +0100
> @@ -5906,6 +5906,34 @@ find_assert_locations (void)
>    for (i = 0; i < rpo_cnt; ++i)
>      bb_rpo[rpo[i]] = i;
>  
> +  /* Pre-seed loop latch liveness from loop header PHI nodes.  Due to
> +     the order we compute liveness and insert asserts we otherwise
> +     fail to insert asserts into the loop latch.  */
> +  loop_p loop;
> +  loop_iterator li;
> +  FOR_EACH_LOOP (li, loop, 0)
> +    {
> +      i = loop->latch->index;
> +      unsigned int j = find_edge (loop->latch, loop->header)->dest_idx;

Actually there is only one edge from latch to header by definition,
so single_succ_edge (loop->latch) would be better?

Otherwise ok.

Thanks,
Richard.

> +      for (gimple_stmt_iterator gsi = gsi_start_phis (loop->header);
> +	   !gsi_end_p (gsi); gsi_next (&gsi))
> +	{
> +	  gimple phi = gsi_stmt (gsi);
> +	  if (virtual_operand_p (gimple_phi_result (phi)))
> +	    continue;
> +	  tree arg = gimple_phi_arg_def (phi, j);
> +	  if (TREE_CODE (arg) == SSA_NAME)
> +	    {
> +	      if (live[i] == NULL)
> +		{
> +		  live[i] = sbitmap_alloc (num_ssa_names);
> +		  bitmap_clear (live[i]);
> +		}
> +	      bitmap_set_bit (live[i], SSA_NAME_VERSION (arg));
> +	    }
> +	}
> +    }
> +
>    need_asserts = false;
>    for (i = rpo_cnt - 1; i >= 0; --i)
>      {
> --- gcc/testsuite/gcc.dg/unroll_1.c.jj	2013-09-09 11:32:36.000000000 +0200
> +++ gcc/testsuite/gcc.dg/unroll_1.c	2013-11-06 17:10:52.900722932 +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-rtl-loop2_unroll=stderr -fno-peel-loops -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll" } */
> +/* { dg-options "-O2 -fdump-rtl-loop2_unroll=stderr -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll" } */
>  
>  unsigned a[100], b[100];
>  inline void bar()
> --- gcc/testsuite/gcc.dg/unroll_2.c.jj	2013-08-30 14:38:39.000000000 +0200
> +++ gcc/testsuite/gcc.dg/unroll_2.c	2013-11-06 17:10:30.751845504 +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo -fenable-rtl-loop2_unroll" } */
> +/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo -fenable-rtl-loop2_unroll" } */
>  
>  unsigned a[100], b[100];
>  inline void bar()
> --- gcc/testsuite/gcc.dg/unroll_3.c.jj	2013-08-30 14:38:39.000000000 +0200
> +++ gcc/testsuite/gcc.dg/unroll_3.c	2013-11-06 17:10:38.864800338 +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo" } */
> +/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo" } */
>  
>  unsigned a[100], b[100];
>  inline void bar()
> --- gcc/testsuite/gcc.dg/unroll_4.c.jj	2013-08-30 14:38:40.000000000 +0200
> +++ gcc/testsuite/gcc.dg/unroll_4.c	2013-11-06 17:11:03.816665603 +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo2" } */
> +/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo2" } */
>  
>  unsigned a[100], b[100];
>  inline void bar()
> --- gcc/testsuite/gcc.dg/tree-ssa/vrp90.c.jj	2013-11-07 11:01:55.137535477 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/vrp90.c	2013-11-07 11:02:35.189335924 +0100
> @@ -0,0 +1,36 @@
> +/* { dg-do link } */
> +/* { dg-options "-O2 -fdump-tree-vrp1" } */
> +/* { dg-final { scan-tree-dump-not "link_error" "vrp1"} } */
> +/* { dg-final { cleanup-tree-dump "vrp1" } } */
> +
> +extern void link_error (void);
> +
> +__attribute__((noinline, noclone)) int
> +foo (unsigned int n, int r)
> +{
> +  int i;
> +  if (n > 0)
> +    {
> +      asm ("");
> +      if (n < 10)
> +	{
> +	  asm ("");
> +	  do
> +	    {
> +	      --n;
> +	      r *= 2;
> +	      if (n >= 9)
> +		link_error ();
> +	    }
> +	  while (n > 0);
> +	}
> +    }
> +  return r + n;
> +}
> +
> +int
> +main ()
> +{
> +  foo (7, 2);
> +  return 0;
> +}
> 
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Use get_range_info during number of iterations analysis
  2013-11-06 22:23         ` Jeff Law
@ 2013-11-07 14:38           ` Jakub Jelinek
  0 siblings, 0 replies; 16+ messages in thread
From: Jakub Jelinek @ 2013-11-07 14:38 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, Cong Hou, GCC Patches

On Wed, Nov 06, 2013 at 03:21:04PM -0700, Jeff Law wrote:
> >2013-11-06  Jakub Jelinek  <jakub@redhat.com>
> >
> >	* tree-ssa-loop-niter.c: Include tree-ssanames.h.
> >	(determine_value_range): Add loop argument.  Use get_range_info to
> >	improve range.
> >	(bound_difference): Adjust caller.
> Clever, using the range from the PHI in the loop header.  While it
> may be too wide, it's likely going to be a lot better than nothing.
> 
> Add a testcase and this is OK.

Ok, thanks, here is what I've committed after another bootstrap/regtest.

2013-11-07  Jakub Jelinek  <jakub@redhat.com>

	* tree-ssa-loop-niter.c: Include tree-ssanames.h.
	(determine_value_range): Add loop argument.  Use get_range_info to
	improve range.
	(bound_difference): Adjust caller.

	* gcc.dg/tree-ssa/loop-39.c: New test.

--- gcc/tree-ssa-loop-niter.c.jj	2013-10-24 10:19:21.000000000 +0200
+++ gcc/tree-ssa-loop-niter.c	2013-11-07 11:49:37.568083138 +0100
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.
 #include "diagnostic-core.h"
 #include "tree-inline.h"
 #include "tree-pass.h"
+#include "tree-ssanames.h"
 
 
 #define SWAP(X, Y) do { affine_iv *tmp = (X); (X) = (Y); (Y) = tmp; } while (0)
@@ -119,9 +120,12 @@ split_to_var_and_offset (tree expr, tree
    in TYPE to MIN and MAX.  */
 
 static void
-determine_value_range (tree type, tree var, mpz_t off,
+determine_value_range (struct loop *loop, tree type, tree var, mpz_t off,
 		       mpz_t min, mpz_t max)
 {
+  double_int minv, maxv;
+  enum value_range_type rtype = VR_VARYING;
+
   /* If the expression is a constant, we know its value exactly.  */
   if (integer_zerop (var))
     {
@@ -130,9 +134,73 @@ determine_value_range (tree type, tree v
       return;
     }
 
+  get_type_static_bounds (type, min, max);
+
+  /* See if we have some range info from VRP.  */
+  if (TREE_CODE (var) == SSA_NAME && INTEGRAL_TYPE_P (type))
+    {
+      edge e = loop_preheader_edge (loop);
+      gimple_stmt_iterator gsi;
+
+      /* Either for VAR itself...  */
+      rtype = get_range_info (var, &minv, &maxv);
+      /* Or for PHI results in loop->header where VAR is used as
+	 PHI argument from the loop preheader edge.  */
+      for (gsi = gsi_start_phis (loop->header); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple phi = gsi_stmt (gsi);
+	  double_int minc, maxc;
+	  if (PHI_ARG_DEF_FROM_EDGE (phi, e) == var
+	      && (get_range_info (gimple_phi_result (phi), &minc, &maxc)
+		  == VR_RANGE))
+	    {
+	      if (rtype != VR_RANGE)
+		{
+		  rtype = VR_RANGE;
+		  minv = minc;
+		  maxv = maxc;
+		}
+	      else
+		{
+		  minv = minv.max (minc, TYPE_UNSIGNED (type));
+		  maxv = maxv.min (maxc, TYPE_UNSIGNED (type));
+		  gcc_assert (minv.cmp (maxv, TYPE_UNSIGNED (type)) <= 0);
+		}
+	    }
+	}
+      if (rtype == VR_RANGE)
+	{
+	  mpz_t minm, maxm;
+	  gcc_assert (minv.cmp (maxv, TYPE_UNSIGNED (type)) <= 0);
+	  mpz_init (minm);
+	  mpz_init (maxm);
+	  mpz_set_double_int (minm, minv, TYPE_UNSIGNED (type));
+	  mpz_set_double_int (maxm, maxv, TYPE_UNSIGNED (type));
+	  mpz_add (minm, minm, off);
+	  mpz_add (maxm, maxm, off);
+	  /* If the computation may not wrap or off is zero, then this
+	     is always fine.  If off is negative and minv + off isn't
+	     smaller than type's minimum, or off is positive and
+	     maxv + off isn't bigger than type's maximum, use the more
+	     precise range too.  */
+	  if (nowrap_type_p (type)
+	      || mpz_sgn (off) == 0
+	      || (mpz_sgn (off) < 0 && mpz_cmp (minm, min) >= 0)
+	      || (mpz_sgn (off) > 0 && mpz_cmp (maxm, max) <= 0))
+	    {
+	      mpz_set (min, minm);
+	      mpz_set (max, maxm);
+	      mpz_clear (minm);
+	      mpz_clear (maxm);
+	      return;
+	    }
+	  mpz_clear (minm);
+	  mpz_clear (maxm);
+	}
+    }
+
   /* If the computation may wrap, we know nothing about the value, except for
      the range of the type.  */
-  get_type_static_bounds (type, min, max);
   if (!nowrap_type_p (type))
     return;
 
@@ -405,8 +473,8 @@ bound_difference (struct loop *loop, tre
       mpz_init (maxx);
       mpz_init (miny);
       mpz_init (maxy);
-      determine_value_range (type, varx, offx, minx, maxx);
-      determine_value_range (type, vary, offy, miny, maxy);
+      determine_value_range (loop, type, varx, offx, minx, maxx);
+      determine_value_range (loop, type, vary, offy, miny, maxy);
 
       mpz_sub (bnds->below, minx, maxy);
       mpz_sub (bnds->up, maxx, miny);
--- gcc/testsuite/gcc.dg/tree-ssa/loop-39.c.jj	2013-11-07 11:52:01.871309995 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-39.c	2013-11-07 11:54:30.549496152 +0100
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sccp-details" } */
+
+int
+foo (unsigned int n)
+{
+  int i, r = 1;
+  if (n > 0)
+    {
+      asm ("");
+      if (n < 10)
+	{
+	  asm ("");
+	  do
+	    {
+	      --n;
+	      r *= 2;
+	    }
+	  while (n > 0);
+	}
+    }
+  return r + n;
+}
+
+/* { dg-final { scan-tree-dump "# of iterations \[^\n\r]*, bounded by 8" "sccp" } } */
+/* { dg-final { cleanup-tree-dump "sccp" } } */

	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2013-11-07 14:32 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-01  0:03 [PATCH] Handling == or != comparisons that may affect range test optimization Cong Hou
2013-11-05 20:46 ` Jeff Law
2013-11-05 20:59   ` Jakub Jelinek
2013-11-05 21:49     ` Cong Hou
2013-11-05 22:03   ` Cong Hou
2013-11-05 22:39     ` Jeff Law
2013-11-06 17:25     ` [PATCH] Improve VRP assert creation for loops Jakub Jelinek
2013-11-06 17:32       ` [PATCH] Use get_range_info during number of iterations analysis Jakub Jelinek
2013-11-06 22:23         ` Jeff Law
2013-11-07 14:38           ` Jakub Jelinek
2013-11-06 22:10       ` [PATCH] Improve VRP assert creation for loops Jeff Law
2013-11-07  9:08       ` Richard Biener
2013-11-07  9:42         ` Jakub Jelinek
2013-11-07  9:44           ` Richard Biener
2013-11-07 11:32             ` Jakub Jelinek
2013-11-07 11:41               ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).