public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
@ 2012-01-23 21:27 Tom de Vries
  2012-01-24 10:40 ` Richard Guenther
  0 siblings, 1 reply; 17+ messages in thread
From: Tom de Vries @ 2012-01-23 21:27 UTC (permalink / raw)
  To: Richard Guenther, Jakub Jelinek; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1721 bytes --]

Richard,
Jakub,

the following patch fixes PR51879.

Consider the following test-case:
...
int bar (int);
void baz (int);

void
foo (int y)
{
  int a;
  if (y == 6)
    a = bar (7);
  else
    a = bar (7);
  baz (a);
}
...

after compiling at -02, the representation looks like this before tail-merging:
...
  # BLOCK 3 freq:1991
  # PRED: 2 [19.9%]  (true,exec)
  # .MEMD.1714_7 = VDEF <.MEMD.1714_6(D)>
  # USE = nonlocal
  # CLB = nonlocal
  aD.1709_3 = barD.1703 (7);
  goto <bb 5>;
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 4 freq:8009
  # PRED: 2 [80.1%]  (false,exec)
  # .MEMD.1714_8 = VDEF <.MEMD.1714_6(D)>
  # USE = nonlocal
  # CLB = nonlocal
  aD.1709_4 = barD.1703 (7);
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 5 freq:10000
  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
  # aD.1709_1 = PHI <aD.1709_3(3), aD.1709_4(4)>
  # .MEMD.1714_5 = PHI <.MEMD.1714_7(3), .MEMD.1714_8(4)>
  # .MEMD.1714_9 = VDEF <.MEMD.1714_5>
  # USE = nonlocal
  # CLB = nonlocal
  bazD.1705 (aD.1709_1);
  # VUSE <.MEMD.1714_9>
  return;
...

the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8
to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5.

The patch makes sure non-const/pure call results (gimple_vdef and
gimple_call_lhs) are properly value numbered.

Bootstrapped and reg-tested on x86_64.

ok for stage1?

Thanks,
- Tom

2012-01-23  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/51879
	tree-ssa-sccvn.c (visit_reference_op_call): Handle gimple_vdef.
	(visit_use): Handle non-pure/const calls using visit_reference_op_call.

	gcc.dg/pr51879.c: New test.
	gcc.dg/pr51879-2.c: Same.
	gcc.dg/pr51879-3.c: Same.
	gcc.dg/pr51879-4.c: Same.

[-- Attachment #2: pr51879.3.patch --]
[-- Type: text/x-patch, Size: 4047 bytes --]

Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c (revision 183325)
+++ gcc/tree-ssa-sccvn.c (working copy)
@@ -2591,6 +2591,7 @@ visit_reference_op_call (tree lhs, gimpl
   struct vn_reference_s vr1;
   tree result;
   tree vuse = gimple_vuse (stmt);
+  tree vdef = gimple_vdef (stmt);
 
   vr1.vuse = vuse ? SSA_VAL (vuse) : NULL_TREE;
   vr1.operands = valueize_shared_reference_ops_from_call (stmt);
@@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl
   result = vn_reference_lookup_1 (&vr1, NULL);
   if (result)
     {
-      changed = set_ssa_val_to (lhs, result);
+      tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result));
+      if (vdef)
+	changed |= set_ssa_val_to (vdef, result_vdef);
+      changed |= set_ssa_val_to (lhs, result);
+
       if (TREE_CODE (result) == SSA_NAME
 	  && VN_INFO (result)->has_constants)
 	VN_INFO (lhs)->has_constants = true;
@@ -2609,7 +2614,9 @@ visit_reference_op_call (tree lhs, gimpl
     {
       void **slot;
       vn_reference_t vr2;
-      changed = set_ssa_val_to (lhs, lhs);
+      if (vdef)
+	changed |= set_ssa_val_to (vdef, vdef);
+      changed |= set_ssa_val_to (lhs, lhs);
       vr2 = (vn_reference_t) pool_alloc (current_info->references_pool);
       vr2->vuse = vr1.vuse;
       vr2->operands = valueize_refs (create_reference_ops_from_call (stmt));
@@ -3359,8 +3366,10 @@ visit_use (tree use)
 	  /* ???  We should handle stores from calls.  */
 	  else if (TREE_CODE (lhs) == SSA_NAME)
 	    {
+	      tree vuse = gimple_vuse (stmt);
 	      if (!gimple_call_internal_p (stmt)
-		  && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST))
+		  && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
+		      || (vuse && SSA_VAL (vuse) != VN_TOP)))
 		changed = visit_reference_op_call (lhs, stmt);
 	      else
 		changed = defs_to_varying (stmt);
Index: gcc/testsuite/gcc.dg/pr51879-2.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-2.c (revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y)
+    baz (bar (7) + 6);
+  else
+    baz (bar (7) + 6);
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
+/* { dg-final { scan-tree-dump-times "baz \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879.c (revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y == 6)
+    a = bar (7);
+  else
+    a = bar (7);
+  baz (a);
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879-3.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-3.c (revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y == 6)
+    a = bar (7) + 6;
+  else
+    a = bar (7) + 6;
+  baz (a);
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879-4.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-4.c (revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+int foo (int y)
+{
+  int a, b;
+  a = bar (7) + 6;
+  b = bar (7) + 6;
+  return a + b;
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 2 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
  2012-01-23 21:27 [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls Tom de Vries
@ 2012-01-24 10:40 ` Richard Guenther
  2012-01-27 20:37   ` Tom de Vries
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Guenther @ 2012-01-24 10:40 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Jakub Jelinek, gcc-patches

On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> Richard,
> Jakub,
>
> the following patch fixes PR51879.
>
> Consider the following test-case:
> ...
> int bar (int);
> void baz (int);
>
> void
> foo (int y)
> {
>  int a;
>  if (y == 6)
>    a = bar (7);
>  else
>    a = bar (7);
>  baz (a);
> }
> ...
>
> after compiling at -02, the representation looks like this before tail-merging:
> ...
>  # BLOCK 3 freq:1991
>  # PRED: 2 [19.9%]  (true,exec)
>  # .MEMD.1714_7 = VDEF <.MEMD.1714_6(D)>
>  # USE = nonlocal
>  # CLB = nonlocal
>  aD.1709_3 = barD.1703 (7);
>  goto <bb 5>;
>  # SUCC: 5 [100.0%]  (fallthru,exec)
>
>  # BLOCK 4 freq:8009
>  # PRED: 2 [80.1%]  (false,exec)
>  # .MEMD.1714_8 = VDEF <.MEMD.1714_6(D)>
>  # USE = nonlocal
>  # CLB = nonlocal
>  aD.1709_4 = barD.1703 (7);
>  # SUCC: 5 [100.0%]  (fallthru,exec)
>
>  # BLOCK 5 freq:10000
>  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
>  # aD.1709_1 = PHI <aD.1709_3(3), aD.1709_4(4)>
>  # .MEMD.1714_5 = PHI <.MEMD.1714_7(3), .MEMD.1714_8(4)>
>  # .MEMD.1714_9 = VDEF <.MEMD.1714_5>
>  # USE = nonlocal
>  # CLB = nonlocal
>  bazD.1705 (aD.1709_1);
>  # VUSE <.MEMD.1714_9>
>  return;
> ...
>
> the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8
> to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5.
>
> The patch makes sure non-const/pure call results (gimple_vdef and
> gimple_call_lhs) are properly value numbered.
>
> Bootstrapped and reg-tested on x86_64.
>
> ok for stage1?

The following cannot really work:

@@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl
   result = vn_reference_lookup_1 (&vr1, NULL);
   if (result)
     {
-      changed = set_ssa_val_to (lhs, result);
+      tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result));
+      if (vdef)
+       changed |= set_ssa_val_to (vdef, result_vdef);
+      changed |= set_ssa_val_to (lhs, result);

becase 'result' may be not an SSA name.  It might also not have
a proper definition statement (if VN_INFO (result)->needs_insertion
is true).  So you at least need to guard things properly.

(On a side-note - I _did_ want to remove value-numbering virtual operands
at some point ...)

@@ -3359,8 +3366,10 @@ visit_use (tree use)
          /* ???  We should handle stores from calls.  */
          else if (TREE_CODE (lhs) == SSA_NAME)
            {
+             tree vuse = gimple_vuse (stmt);
              if (!gimple_call_internal_p (stmt)
-                 && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST))
+                 && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
+                     || (vuse && SSA_VAL (vuse) != VN_TOP)))
                changed = visit_reference_op_call (lhs, stmt);
              else
                changed = defs_to_varying (stmt);

... exactly because of the issue that a stmt has multiple defs.  Btw,
vuse should have been visited here or be part of our SCC, so, why do
you need this check?

Thanks,
Richard.

> Thanks,
> - Tom
>
> 2012-01-23  Tom de Vries  <tom@codesourcery.com>
>
>        PR tree-optimization/51879
>        tree-ssa-sccvn.c (visit_reference_op_call): Handle gimple_vdef.
>        (visit_use): Handle non-pure/const calls using visit_reference_op_call.
>
>        gcc.dg/pr51879.c: New test.
>        gcc.dg/pr51879-2.c: Same.
>        gcc.dg/pr51879-3.c: Same.
>        gcc.dg/pr51879-4.c: Same.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
  2012-01-24 10:40 ` Richard Guenther
@ 2012-01-27 20:37   ` Tom de Vries
  2012-04-14  7:26     ` Tom de Vries
  0 siblings, 1 reply; 17+ messages in thread
From: Tom de Vries @ 2012-01-27 20:37 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jakub Jelinek, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4322 bytes --]

On 24/01/12 11:40, Richard Guenther wrote:
> On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> Richard,
>> Jakub,
>>
>> the following patch fixes PR51879.
>>
>> Consider the following test-case:
>> ...
>> int bar (int);
>> void baz (int);
>>
>> void
>> foo (int y)
>> {
>>  int a;
>>  if (y == 6)
>>    a = bar (7);
>>  else
>>    a = bar (7);
>>  baz (a);
>> }
>> ...
>>
>> after compiling at -02, the representation looks like this before tail-merging:
>> ...
>>  # BLOCK 3 freq:1991
>>  # PRED: 2 [19.9%]  (true,exec)
>>  # .MEMD.1714_7 = VDEF <.MEMD.1714_6(D)>
>>  # USE = nonlocal
>>  # CLB = nonlocal
>>  aD.1709_3 = barD.1703 (7);
>>  goto <bb 5>;
>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>
>>  # BLOCK 4 freq:8009
>>  # PRED: 2 [80.1%]  (false,exec)
>>  # .MEMD.1714_8 = VDEF <.MEMD.1714_6(D)>
>>  # USE = nonlocal
>>  # CLB = nonlocal
>>  aD.1709_4 = barD.1703 (7);
>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>
>>  # BLOCK 5 freq:10000
>>  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
>>  # aD.1709_1 = PHI <aD.1709_3(3), aD.1709_4(4)>
>>  # .MEMD.1714_5 = PHI <.MEMD.1714_7(3), .MEMD.1714_8(4)>
>>  # .MEMD.1714_9 = VDEF <.MEMD.1714_5>
>>  # USE = nonlocal
>>  # CLB = nonlocal
>>  bazD.1705 (aD.1709_1);
>>  # VUSE <.MEMD.1714_9>
>>  return;
>> ...
>>
>> the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8
>> to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5.
>>
>> The patch makes sure non-const/pure call results (gimple_vdef and
>> gimple_call_lhs) are properly value numbered.
>>
>> Bootstrapped and reg-tested on x86_64.
>>
>> ok for stage1?
> 
> The following cannot really work:
> 
> @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl
>    result = vn_reference_lookup_1 (&vr1, NULL);
>    if (result)
>      {
> -      changed = set_ssa_val_to (lhs, result);
> +      tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result));
> +      if (vdef)
> +       changed |= set_ssa_val_to (vdef, result_vdef);
> +      changed |= set_ssa_val_to (lhs, result);
> 
> because 'result' may be not an SSA name.  It might also not have
> a proper definition statement (if VN_INFO (result)->needs_insertion
> is true).  So you at least need to guard things properly.
> 

Right. And that also doesn't work if the function is without lhs, such as in the
new test-case pr51879-6.c.

I fixed this by storing both lhs and vdef, such that I don't have to derive
the vdef from the lhs.

> (On a side-note - I _did_ want to remove value-numbering virtual operands
> at some point ...)
> 

Doing so willl hurt performance of tail-merging in its current form.
OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that
value numbering as used in tail-merging has its limitations too.
Do you have any ideas how to address that one?

> @@ -3359,8 +3366,10 @@ visit_use (tree use)
>           /* ???  We should handle stores from calls.  */
>           else if (TREE_CODE (lhs) == SSA_NAME)
>             {
> +             tree vuse = gimple_vuse (stmt);
>               if (!gimple_call_internal_p (stmt)
> -                 && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST))
> +                 && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
> +                     || (vuse && SSA_VAL (vuse) != VN_TOP)))
>                 changed = visit_reference_op_call (lhs, stmt);
>               else
>                 changed = defs_to_varying (stmt);
> 
> ... exactly because of the issue that a stmt has multiple defs.  Btw,
> vuse should have been visited here or be part of our SCC, so, why do
> you need this check?
> 

Removed now, that was a workaround for a bug in an earlier version of the patch,
that I didn't need anymore.

Bootstrapped and reg-tested on x86_64.

OK for stage1?

Thanks,
- Tom

> Thanks,
> Richard.
> 

2012-01-27  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/51879
	* tree-ssa-sccvn.h (struct vn_reference_s): Add vdef field.
	* tree-ssa-sccvn.c (visit_reference_op_call): Handle gimple_vdef.
	Handle case that lhs is NULL_TREE.
	(visit_use): Handle non-pure/const calls and calls without result using
	visit_reference_op_call.

	gcc.dg/pr51879.c: New test.
	gcc.dg/pr51879-2.c: Same.
	gcc.dg/pr51879-3.c: Same.
	gcc.dg/pr51879-4.c: Same.
	gcc.dg/pr51879-6.c: Same.


[-- Attachment #2: pr51879.4.patch --]
[-- Type: text/x-patch, Size: 7454 bytes --]

Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c (revision 183325)
+++ gcc/tree-ssa-sccvn.c (working copy)
@@ -2589,27 +2589,44 @@ visit_reference_op_call (tree lhs, gimpl
 {
   bool changed = false;
   struct vn_reference_s vr1;
-  tree result;
+  vn_reference_t vnresult = NULL;
   tree vuse = gimple_vuse (stmt);
+  tree vdef = gimple_vdef (stmt);
+
+  if (vdef)
+    VN_INFO (vdef)->use_processed = true;
 
   vr1.vuse = vuse ? SSA_VAL (vuse) : NULL_TREE;
   vr1.operands = valueize_shared_reference_ops_from_call (stmt);
   vr1.type = gimple_expr_type (stmt);
   vr1.set = 0;
   vr1.hashcode = vn_reference_compute_hash (&vr1);
-  result = vn_reference_lookup_1 (&vr1, NULL);
-  if (result)
+  vn_reference_lookup_1 (&vr1, &vnresult);
+
+  if (vnresult)
     {
-      changed = set_ssa_val_to (lhs, result);
-      if (TREE_CODE (result) == SSA_NAME
-	  && VN_INFO (result)->has_constants)
-	VN_INFO (lhs)->has_constants = true;
+      if (vnresult->vdef)
+	changed |= set_ssa_val_to (vdef, vnresult->vdef);
+
+      if (!vnresult->result && lhs)
+	vnresult->result = lhs;
+
+      if (vnresult->result && lhs)
+	{
+	  changed |= set_ssa_val_to (lhs, vnresult->result);
+
+	  if (VN_INFO (vnresult->result)->has_constants)
+	    VN_INFO (lhs)->has_constants = true;
+	}
     }
   else
     {
       void **slot;
       vn_reference_t vr2;
-      changed = set_ssa_val_to (lhs, lhs);
+      if (vdef)
+	changed |= set_ssa_val_to (vdef, vdef);
+      if (lhs)
+	changed |= set_ssa_val_to (lhs, lhs);
       vr2 = (vn_reference_t) pool_alloc (current_info->references_pool);
       vr2->vuse = vr1.vuse;
       vr2->operands = valueize_refs (create_reference_ops_from_call (stmt));
@@ -2617,6 +2634,7 @@ visit_reference_op_call (tree lhs, gimpl
       vr2->set = vr1.set;
       vr2->hashcode = vr1.hashcode;
       vr2->result = lhs;
+      vr2->vdef = vdef;
       slot = htab_find_slot_with_hash (current_info->references,
 				       vr2, vr2->hashcode, INSERT);
       if (*slot)
@@ -3177,8 +3195,7 @@ visit_use (tree use)
     {
       if (gimple_code (stmt) == GIMPLE_PHI)
 	changed = visit_phi (stmt);
-      else if (!gimple_has_lhs (stmt)
-	       || gimple_has_volatile_ops (stmt))
+      else if (gimple_has_volatile_ops (stmt))
 	changed = defs_to_varying (stmt);
       else if (is_gimple_assign (stmt))
 	{
@@ -3340,34 +3357,37 @@ visit_use (tree use)
 
 	  /* ???  We could try to simplify calls.  */
 
-	  if (stmt_has_constants (stmt)
-	      && TREE_CODE (lhs) == SSA_NAME)
-	    VN_INFO (lhs)->has_constants = true;
-	  else if (TREE_CODE (lhs) == SSA_NAME)
+	  if (lhs && TREE_CODE (lhs) == SSA_NAME)
 	    {
-	      /* We reset expr and constantness here because we may
-		 have been value numbering optimistically, and
-		 iterating. They may become non-constant in this case,
-		 even if they were optimistically constant. */
-	      VN_INFO (lhs)->has_constants = false;
-	      VN_INFO (lhs)->expr = NULL_TREE;
+	      if (stmt_has_constants (stmt))
+		VN_INFO (lhs)->has_constants = true;
+	      else
+		{
+		  /* We reset expr and constantness here because we may
+		     have been value numbering optimistically, and
+		     iterating. They may become non-constant in this case,
+		     even if they were optimistically constant. */
+		  VN_INFO (lhs)->has_constants = false;
+		  VN_INFO (lhs)->expr = NULL_TREE;
+		}
+
+	      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
+		{
+		  changed = defs_to_varying (stmt);
+		  goto done;
+		}
 	    }
 
-	  if (TREE_CODE (lhs) == SSA_NAME
-	      && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
-	    changed = defs_to_varying (stmt);
 	  /* ???  We should handle stores from calls.  */
-	  else if (TREE_CODE (lhs) == SSA_NAME)
-	    {
-	      if (!gimple_call_internal_p (stmt)
-		  && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST))
-		changed = visit_reference_op_call (lhs, stmt);
-	      else
-		changed = defs_to_varying (stmt);
-	    }
+	  if (!gimple_call_internal_p (stmt)
+	      && ((lhs && TREE_CODE (lhs) == SSA_NAME)
+		  || (!lhs && gimple_vdef (stmt))))
+	    changed = visit_reference_op_call (lhs, stmt);
 	  else
 	    changed = defs_to_varying (stmt);
 	}
+      else
+	changed = defs_to_varying (stmt);
     }
  done:
   return changed;
Index: gcc/tree-ssa-sccvn.h
===================================================================
--- gcc/tree-ssa-sccvn.h (revision 183325)
+++ gcc/tree-ssa-sccvn.h (working copy)
@@ -110,6 +110,7 @@ typedef struct vn_reference_s
   tree type;
   VEC (vn_reference_op_s, heap) *operands;
   tree result;
+  tree vdef;
 } *vn_reference_t;
 typedef const struct vn_reference_s *const_vn_reference_t;
 
Index: gcc/testsuite/gcc.dg/pr51879-2.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-2.c (revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y)
+    baz (bar (7) + 6);
+  else
+    baz (bar (7) + 6);
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
+/* { dg-final { scan-tree-dump-times "baz \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879-6.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-6.c (revision 0)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+
+int bar (int);
+void baz (int);
+void bla (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y == 6)
+    {
+      bla (5);
+      a = bar (7);
+    }
+  else
+    {
+      bla (5);
+      a = bar (7);
+    }
+  baz (a);
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879.c (revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y == 6)
+    a = bar (7);
+  else
+    a = bar (7);
+  baz (a);
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879-3.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-3.c (revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y == 6)
+    a = bar (7) + 6;
+  else
+    a = bar (7) + 6;
+  baz (a);
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879-4.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-4.c (revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+int foo (int y)
+{
+  int a, b;
+  a = bar (7) + 6;
+  b = bar (7) + 6;
+  return a + b;
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 2 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
  2012-01-27 20:37   ` Tom de Vries
@ 2012-04-14  7:26     ` Tom de Vries
  2012-04-17 12:25       ` Richard Guenther
  0 siblings, 1 reply; 17+ messages in thread
From: Tom de Vries @ 2012-04-14  7:26 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jakub Jelinek, gcc-patches

On 27/01/12 21:37, Tom de Vries wrote:
> On 24/01/12 11:40, Richard Guenther wrote:
>> On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> Richard,
>>> Jakub,
>>>
>>> the following patch fixes PR51879.
>>>
>>> Consider the following test-case:
>>> ...
>>> int bar (int);
>>> void baz (int);
>>>
>>> void
>>> foo (int y)
>>> {
>>>  int a;
>>>  if (y == 6)
>>>    a = bar (7);
>>>  else
>>>    a = bar (7);
>>>  baz (a);
>>> }
>>> ...
>>>
>>> after compiling at -02, the representation looks like this before tail-merging:
>>> ...
>>>  # BLOCK 3 freq:1991
>>>  # PRED: 2 [19.9%]  (true,exec)
>>>  # .MEMD.1714_7 = VDEF <.MEMD.1714_6(D)>
>>>  # USE = nonlocal
>>>  # CLB = nonlocal
>>>  aD.1709_3 = barD.1703 (7);
>>>  goto <bb 5>;
>>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>>
>>>  # BLOCK 4 freq:8009
>>>  # PRED: 2 [80.1%]  (false,exec)
>>>  # .MEMD.1714_8 = VDEF <.MEMD.1714_6(D)>
>>>  # USE = nonlocal
>>>  # CLB = nonlocal
>>>  aD.1709_4 = barD.1703 (7);
>>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>>
>>>  # BLOCK 5 freq:10000
>>>  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
>>>  # aD.1709_1 = PHI <aD.1709_3(3), aD.1709_4(4)>
>>>  # .MEMD.1714_5 = PHI <.MEMD.1714_7(3), .MEMD.1714_8(4)>
>>>  # .MEMD.1714_9 = VDEF <.MEMD.1714_5>
>>>  # USE = nonlocal
>>>  # CLB = nonlocal
>>>  bazD.1705 (aD.1709_1);
>>>  # VUSE <.MEMD.1714_9>
>>>  return;
>>> ...
>>>
>>> the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8
>>> to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5.
>>>
>>> The patch makes sure non-const/pure call results (gimple_vdef and
>>> gimple_call_lhs) are properly value numbered.
>>>
>>> Bootstrapped and reg-tested on x86_64.
>>>
>>> ok for stage1?
>>
>> The following cannot really work:
>>
>> @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl
>>    result = vn_reference_lookup_1 (&vr1, NULL);
>>    if (result)
>>      {
>> -      changed = set_ssa_val_to (lhs, result);
>> +      tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result));
>> +      if (vdef)
>> +       changed |= set_ssa_val_to (vdef, result_vdef);
>> +      changed |= set_ssa_val_to (lhs, result);
>>
>> because 'result' may be not an SSA name.  It might also not have
>> a proper definition statement (if VN_INFO (result)->needs_insertion
>> is true).  So you at least need to guard things properly.
>>
> 
> Right. And that also doesn't work if the function is without lhs, such as in the
> new test-case pr51879-6.c.
> 
> I fixed this by storing both lhs and vdef, such that I don't have to derive
> the vdef from the lhs.
> 
>> (On a side-note - I _did_ want to remove value-numbering virtual operands
>> at some point ...)
>>
> 
> Doing so willl hurt performance of tail-merging in its current form.
> OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that
> value numbering as used in tail-merging has its limitations too.
> Do you have any ideas how to address that one?
> 
>> @@ -3359,8 +3366,10 @@ visit_use (tree use)
>>           /* ???  We should handle stores from calls.  */
>>           else if (TREE_CODE (lhs) == SSA_NAME)
>>             {
>> +             tree vuse = gimple_vuse (stmt);
>>               if (!gimple_call_internal_p (stmt)
>> -                 && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST))
>> +                 && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
>> +                     || (vuse && SSA_VAL (vuse) != VN_TOP)))
>>                 changed = visit_reference_op_call (lhs, stmt);
>>               else
>>                 changed = defs_to_varying (stmt);
>>
>> ... exactly because of the issue that a stmt has multiple defs.  Btw,
>> vuse should have been visited here or be part of our SCC, so, why do
>> you need this check?
>>
> 
> Removed now, that was a workaround for a bug in an earlier version of the patch,
> that I didn't need anymore.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for stage1?
> 

Richard,

quoting you in http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00618.html:
...
I think these fixes hint at that we should
use "structural" equality as fallback if value-numbering doesn't equate
two stmt effects.  Thus, treat two stmts with exactly the same operands
and flags as equal and using value-numbering to canonicalize operands
(when they are SSA names) for that comparison, or use VN entirely
if there are no side-effects on the stmt.

Changing value-numbering of virtual operands, even if it looks correct in the
simple cases you change, doesn't look like a general solution for the missed
tail merging opportunities.
...

The test-case pr51879-6.c shows a case where improving value numbering will help
tail-merging, but structural equality comparison not:
...
  # BLOCK 3 freq:1991
  # PRED: 2 [19.9%]  (true,exec)
  # .MEMD.1717_7 = VDEF <.MEMD.1717_6(D)>
  # USE = nonlocal
  # CLB = nonlocal
  blaD.1708 (5);
  # .MEMD.1717_8 = VDEF <.MEMD.1717_7>
  # USE = nonlocal
  # CLB = nonlocal
  aD.1712_3 = barD.1704 (7);
  goto <bb 5>;
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 4 freq:8009
  # PRED: 2 [80.1%]  (false,exec)
  # .MEMD.1717_9 = VDEF <.MEMD.1717_6(D)>
  # USE = nonlocal
  # CLB = nonlocal
  blaD.1708 (5);
  # .MEMD.1717_10 = VDEF <.MEMD.1717_9>
  # USE = nonlocal
  # CLB = nonlocal
  aD.1712_4 = barD.1704 (7);
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 5 freq:10000
  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
  # aD.1712_1 = PHI <aD.1712_3(3), aD.1712_4(4)>
  # .MEMD.1717_5 = PHI <.MEMD.1717_8(3), .MEMD.1717_10(4)>
  # .MEMD.1717_11 = VDEF <.MEMD.1717_5>
  # USE = nonlocal
  # CLB = nonlocal
  bazD.1706 (aD.1712_1);
  # VUSE <.MEMD.1717_11>
  return;
...
by structurally comparing the gimples in both blocks, we can see that these are
equal.
What we can't see, is that aD.1712_3 and aD.1712_4 have the same value. For
that, we need value numbering to number the vuses the same. And if we get value
numbering to number these values equal, we don't need the structural comparison.

In this case structural comparison cannot be used as fallback for
value-numbering. So, ok for trunk?

And if not ok, do you have a suggestion of how to deal with this otherwise?

Thanks,
- Tom

> Thanks,
> - Tom
> 
>> Thanks,
>> Richard.
>>
> 
> 2012-01-27  Tom de Vries  <tom@codesourcery.com>
> 
> 	PR tree-optimization/51879
> 	* tree-ssa-sccvn.h (struct vn_reference_s): Add vdef field.
> 	* tree-ssa-sccvn.c (visit_reference_op_call): Handle gimple_vdef.
> 	Handle case that lhs is NULL_TREE.
> 	(visit_use): Handle non-pure/const calls and calls without result using
> 	visit_reference_op_call.
> 
> 	gcc.dg/pr51879.c: New test.
> 	gcc.dg/pr51879-2.c: Same.
> 	gcc.dg/pr51879-3.c: Same.
> 	gcc.dg/pr51879-4.c: Same.
> 	gcc.dg/pr51879-6.c: Same.
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
  2012-04-14  7:26     ` Tom de Vries
@ 2012-04-17 12:25       ` Richard Guenther
  2012-04-24 21:20         ` Tom de Vries
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Guenther @ 2012-04-17 12:25 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Jakub Jelinek, gcc-patches

On Sat, Apr 14, 2012 at 9:26 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 27/01/12 21:37, Tom de Vries wrote:
>> On 24/01/12 11:40, Richard Guenther wrote:
>>> On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>> Richard,
>>>> Jakub,
>>>>
>>>> the following patch fixes PR51879.
>>>>
>>>> Consider the following test-case:
>>>> ...
>>>> int bar (int);
>>>> void baz (int);
>>>>
>>>> void
>>>> foo (int y)
>>>> {
>>>>  int a;
>>>>  if (y == 6)
>>>>    a = bar (7);
>>>>  else
>>>>    a = bar (7);
>>>>  baz (a);
>>>> }
>>>> ...
>>>>
>>>> after compiling at -02, the representation looks like this before tail-merging:
>>>> ...
>>>>  # BLOCK 3 freq:1991
>>>>  # PRED: 2 [19.9%]  (true,exec)
>>>>  # .MEMD.1714_7 = VDEF <.MEMD.1714_6(D)>
>>>>  # USE = nonlocal
>>>>  # CLB = nonlocal
>>>>  aD.1709_3 = barD.1703 (7);
>>>>  goto <bb 5>;
>>>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>>>
>>>>  # BLOCK 4 freq:8009
>>>>  # PRED: 2 [80.1%]  (false,exec)
>>>>  # .MEMD.1714_8 = VDEF <.MEMD.1714_6(D)>
>>>>  # USE = nonlocal
>>>>  # CLB = nonlocal
>>>>  aD.1709_4 = barD.1703 (7);
>>>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>>>
>>>>  # BLOCK 5 freq:10000
>>>>  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
>>>>  # aD.1709_1 = PHI <aD.1709_3(3), aD.1709_4(4)>
>>>>  # .MEMD.1714_5 = PHI <.MEMD.1714_7(3), .MEMD.1714_8(4)>
>>>>  # .MEMD.1714_9 = VDEF <.MEMD.1714_5>
>>>>  # USE = nonlocal
>>>>  # CLB = nonlocal
>>>>  bazD.1705 (aD.1709_1);
>>>>  # VUSE <.MEMD.1714_9>
>>>>  return;
>>>> ...
>>>>
>>>> the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8
>>>> to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5.
>>>>
>>>> The patch makes sure non-const/pure call results (gimple_vdef and
>>>> gimple_call_lhs) are properly value numbered.
>>>>
>>>> Bootstrapped and reg-tested on x86_64.
>>>>
>>>> ok for stage1?
>>>
>>> The following cannot really work:
>>>
>>> @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl
>>>    result = vn_reference_lookup_1 (&vr1, NULL);
>>>    if (result)
>>>      {
>>> -      changed = set_ssa_val_to (lhs, result);
>>> +      tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result));
>>> +      if (vdef)
>>> +       changed |= set_ssa_val_to (vdef, result_vdef);
>>> +      changed |= set_ssa_val_to (lhs, result);
>>>
>>> because 'result' may be not an SSA name.  It might also not have
>>> a proper definition statement (if VN_INFO (result)->needs_insertion
>>> is true).  So you at least need to guard things properly.
>>>
>>
>> Right. And that also doesn't work if the function is without lhs, such as in the
>> new test-case pr51879-6.c.
>>
>> I fixed this by storing both lhs and vdef, such that I don't have to derive
>> the vdef from the lhs.
>>
>>> (On a side-note - I _did_ want to remove value-numbering virtual operands
>>> at some point ...)
>>>
>>
>> Doing so willl hurt performance of tail-merging in its current form.
>> OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that
>> value numbering as used in tail-merging has its limitations too.
>> Do you have any ideas how to address that one?
>>
>>> @@ -3359,8 +3366,10 @@ visit_use (tree use)
>>>           /* ???  We should handle stores from calls.  */
>>>           else if (TREE_CODE (lhs) == SSA_NAME)
>>>             {
>>> +             tree vuse = gimple_vuse (stmt);
>>>               if (!gimple_call_internal_p (stmt)
>>> -                 && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST))
>>> +                 && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
>>> +                     || (vuse && SSA_VAL (vuse) != VN_TOP)))
>>>                 changed = visit_reference_op_call (lhs, stmt);
>>>               else
>>>                 changed = defs_to_varying (stmt);
>>>
>>> ... exactly because of the issue that a stmt has multiple defs.  Btw,
>>> vuse should have been visited here or be part of our SCC, so, why do
>>> you need this check?
>>>
>>
>> Removed now, that was a workaround for a bug in an earlier version of the patch,
>> that I didn't need anymore.
>>
>> Bootstrapped and reg-tested on x86_64.
>>
>> OK for stage1?
>>
>
> Richard,
>
> quoting you in http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00618.html:
> ...
> I think these fixes hint at that we should
> use "structural" equality as fallback if value-numbering doesn't equate
> two stmt effects.  Thus, treat two stmts with exactly the same operands
> and flags as equal and using value-numbering to canonicalize operands
> (when they are SSA names) for that comparison, or use VN entirely
> if there are no side-effects on the stmt.
>
> Changing value-numbering of virtual operands, even if it looks correct in the
> simple cases you change, doesn't look like a general solution for the missed
> tail merging opportunities.
> ...
>
> The test-case pr51879-6.c shows a case where improving value numbering will help
> tail-merging, but structural equality comparison not:
> ...
>  # BLOCK 3 freq:1991
>  # PRED: 2 [19.9%]  (true,exec)
>  # .MEMD.1717_7 = VDEF <.MEMD.1717_6(D)>
>  # USE = nonlocal
>  # CLB = nonlocal
>  blaD.1708 (5);
>  # .MEMD.1717_8 = VDEF <.MEMD.1717_7>
>  # USE = nonlocal
>  # CLB = nonlocal
>  aD.1712_3 = barD.1704 (7);
>  goto <bb 5>;
>  # SUCC: 5 [100.0%]  (fallthru,exec)
>
>  # BLOCK 4 freq:8009
>  # PRED: 2 [80.1%]  (false,exec)
>  # .MEMD.1717_9 = VDEF <.MEMD.1717_6(D)>
>  # USE = nonlocal
>  # CLB = nonlocal
>  blaD.1708 (5);
>  # .MEMD.1717_10 = VDEF <.MEMD.1717_9>
>  # USE = nonlocal
>  # CLB = nonlocal
>  aD.1712_4 = barD.1704 (7);
>  # SUCC: 5 [100.0%]  (fallthru,exec)
>
>  # BLOCK 5 freq:10000
>  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
>  # aD.1712_1 = PHI <aD.1712_3(3), aD.1712_4(4)>
>  # .MEMD.1717_5 = PHI <.MEMD.1717_8(3), .MEMD.1717_10(4)>
>  # .MEMD.1717_11 = VDEF <.MEMD.1717_5>
>  # USE = nonlocal
>  # CLB = nonlocal
>  bazD.1706 (aD.1712_1);
>  # VUSE <.MEMD.1717_11>
>  return;
> ...
> by structurally comparing the gimples in both blocks, we can see that these are
> equal.
> What we can't see, is that aD.1712_3 and aD.1712_4 have the same value. For
> that, we need value numbering to number the vuses the same. And if we get value
> numbering to number these values equal, we don't need the structural comparison.
>
> In this case structural comparison cannot be used as fallback for
> value-numbering. So, ok for trunk?
>
> And if not ok, do you have a suggestion of how to deal with this otherwise?

Hmm.  I'm not sure we can conclude that they have the same value!

+int bar (int);
+void baz (int);
+void bla (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y == 6)
+    {
+      bla (5);
+      a = bar (7);
+    }
+  else
+    {
+      bla (5);
+      a = bar (7);
+    }
+  baz (a);
+}

I can implement bla as

void bla(int) { a = random (); }

thus a would not have the same value on both paths - but that is not what
matters - because obviously we can still merge the blocks.  That is,
we cannot be sure that the side-effects on memory of bla (5) and bla (5)
are the same.  But is that not what your value-numbering changes will assert?
(not sure how you achieve this, but it looks like you insert/lookup general
calls, thus not pure/const ones - that could easily lead to miscompiles(?)).

Richard.

Richard.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
  2012-04-17 12:25       ` Richard Guenther
@ 2012-04-24 21:20         ` Tom de Vries
  2012-04-25  9:57           ` Richard Guenther
  0 siblings, 1 reply; 17+ messages in thread
From: Tom de Vries @ 2012-04-24 21:20 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jakub Jelinek, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 10206 bytes --]

On 17/04/12 14:24, Richard Guenther wrote:
> On Sat, Apr 14, 2012 at 9:26 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> On 27/01/12 21:37, Tom de Vries wrote:
>>> On 24/01/12 11:40, Richard Guenther wrote:
>>>> On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>> Richard,
>>>>> Jakub,
>>>>>
>>>>> the following patch fixes PR51879.
>>>>>
>>>>> Consider the following test-case:
>>>>> ...
>>>>> int bar (int);
>>>>> void baz (int);
>>>>>
>>>>> void
>>>>> foo (int y)
>>>>> {
>>>>>  int a;
>>>>>  if (y == 6)
>>>>>    a = bar (7);
>>>>>  else
>>>>>    a = bar (7);
>>>>>  baz (a);
>>>>> }
>>>>> ...
>>>>>
>>>>> after compiling at -02, the representation looks like this before tail-merging:
>>>>> ...
>>>>>  # BLOCK 3 freq:1991
>>>>>  # PRED: 2 [19.9%]  (true,exec)
>>>>>  # .MEMD.1714_7 = VDEF <.MEMD.1714_6(D)>
>>>>>  # USE = nonlocal
>>>>>  # CLB = nonlocal
>>>>>  aD.1709_3 = barD.1703 (7);
>>>>>  goto <bb 5>;
>>>>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>>>>
>>>>>  # BLOCK 4 freq:8009
>>>>>  # PRED: 2 [80.1%]  (false,exec)
>>>>>  # .MEMD.1714_8 = VDEF <.MEMD.1714_6(D)>
>>>>>  # USE = nonlocal
>>>>>  # CLB = nonlocal
>>>>>  aD.1709_4 = barD.1703 (7);
>>>>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>>>>
>>>>>  # BLOCK 5 freq:10000
>>>>>  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
>>>>>  # aD.1709_1 = PHI <aD.1709_3(3), aD.1709_4(4)>
>>>>>  # .MEMD.1714_5 = PHI <.MEMD.1714_7(3), .MEMD.1714_8(4)>
>>>>>  # .MEMD.1714_9 = VDEF <.MEMD.1714_5>
>>>>>  # USE = nonlocal
>>>>>  # CLB = nonlocal
>>>>>  bazD.1705 (aD.1709_1);
>>>>>  # VUSE <.MEMD.1714_9>
>>>>>  return;
>>>>> ...
>>>>>
>>>>> the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8
>>>>> to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5.
>>>>>
>>>>> The patch makes sure non-const/pure call results (gimple_vdef and
>>>>> gimple_call_lhs) are properly value numbered.
>>>>>
>>>>> Bootstrapped and reg-tested on x86_64.
>>>>>
>>>>> ok for stage1?
>>>>
>>>> The following cannot really work:
>>>>
>>>> @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl
>>>>    result = vn_reference_lookup_1 (&vr1, NULL);
>>>>    if (result)
>>>>      {
>>>> -      changed = set_ssa_val_to (lhs, result);
>>>> +      tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result));
>>>> +      if (vdef)
>>>> +       changed |= set_ssa_val_to (vdef, result_vdef);
>>>> +      changed |= set_ssa_val_to (lhs, result);
>>>>
>>>> because 'result' may be not an SSA name.  It might also not have
>>>> a proper definition statement (if VN_INFO (result)->needs_insertion
>>>> is true).  So you at least need to guard things properly.
>>>>
>>>
>>> Right. And that also doesn't work if the function is without lhs, such as in the
>>> new test-case pr51879-6.c.
>>>
>>> I fixed this by storing both lhs and vdef, such that I don't have to derive
>>> the vdef from the lhs.
>>>
>>>> (On a side-note - I _did_ want to remove value-numbering virtual operands
>>>> at some point ...)
>>>>
>>>
>>> Doing so willl hurt performance of tail-merging in its current form.
>>> OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that
>>> value numbering as used in tail-merging has its limitations too.
>>> Do you have any ideas how to address that one?
>>>
>>>> @@ -3359,8 +3366,10 @@ visit_use (tree use)
>>>>           /* ???  We should handle stores from calls.  */
>>>>           else if (TREE_CODE (lhs) == SSA_NAME)
>>>>             {
>>>> +             tree vuse = gimple_vuse (stmt);
>>>>               if (!gimple_call_internal_p (stmt)
>>>> -                 && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST))
>>>> +                 && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
>>>> +                     || (vuse && SSA_VAL (vuse) != VN_TOP)))
>>>>                 changed = visit_reference_op_call (lhs, stmt);
>>>>               else
>>>>                 changed = defs_to_varying (stmt);
>>>>
>>>> ... exactly because of the issue that a stmt has multiple defs.  Btw,
>>>> vuse should have been visited here or be part of our SCC, so, why do
>>>> you need this check?
>>>>
>>>
>>> Removed now, that was a workaround for a bug in an earlier version of the patch,
>>> that I didn't need anymore.
>>>
>>> Bootstrapped and reg-tested on x86_64.
>>>
>>> OK for stage1?
>>>
>>
>> Richard,
>>
>> quoting you in http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00618.html:
>> ...
>> I think these fixes hint at that we should
>> use "structural" equality as fallback if value-numbering doesn't equate
>> two stmt effects.  Thus, treat two stmts with exactly the same operands
>> and flags as equal and using value-numbering to canonicalize operands
>> (when they are SSA names) for that comparison, or use VN entirely
>> if there are no side-effects on the stmt.
>>
>> Changing value-numbering of virtual operands, even if it looks correct in the
>> simple cases you change, doesn't look like a general solution for the missed
>> tail merging opportunities.
>> ...
>>
>> The test-case pr51879-6.c shows a case where improving value numbering will help
>> tail-merging, but structural equality comparison not:
>> ...
>>  # BLOCK 3 freq:1991
>>  # PRED: 2 [19.9%]  (true,exec)
>>  # .MEMD.1717_7 = VDEF <.MEMD.1717_6(D)>
>>  # USE = nonlocal
>>  # CLB = nonlocal
>>  blaD.1708 (5);
>>  # .MEMD.1717_8 = VDEF <.MEMD.1717_7>
>>  # USE = nonlocal
>>  # CLB = nonlocal
>>  aD.1712_3 = barD.1704 (7);
>>  goto <bb 5>;
>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>
>>  # BLOCK 4 freq:8009
>>  # PRED: 2 [80.1%]  (false,exec)
>>  # .MEMD.1717_9 = VDEF <.MEMD.1717_6(D)>
>>  # USE = nonlocal
>>  # CLB = nonlocal
>>  blaD.1708 (5);
>>  # .MEMD.1717_10 = VDEF <.MEMD.1717_9>
>>  # USE = nonlocal
>>  # CLB = nonlocal
>>  aD.1712_4 = barD.1704 (7);
>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>
>>  # BLOCK 5 freq:10000
>>  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
>>  # aD.1712_1 = PHI <aD.1712_3(3), aD.1712_4(4)>
>>  # .MEMD.1717_5 = PHI <.MEMD.1717_8(3), .MEMD.1717_10(4)>
>>  # .MEMD.1717_11 = VDEF <.MEMD.1717_5>
>>  # USE = nonlocal
>>  # CLB = nonlocal
>>  bazD.1706 (aD.1712_1);
>>  # VUSE <.MEMD.1717_11>
>>  return;
>> ...
>> by structurally comparing the gimples in both blocks, we can see that these are
>> equal.
>> What we can't see, is that aD.1712_3 and aD.1712_4 have the same value. For
>> that, we need value numbering to number the vuses the same. And if we get value
>> numbering to number these values equal, we don't need the structural comparison.
>>
>> In this case structural comparison cannot be used as fallback for
>> value-numbering. So, ok for trunk?
>>
>> And if not ok, do you have a suggestion of how to deal with this otherwise?
> 
> Hmm.  I'm not sure we can conclude that they have the same value!
> 
> +int bar (int);
> +void baz (int);
> +void bla (int);
> +
> +void
> +foo (int y)
> +{
> +  int a;
> +  if (y == 6)
> +    {
> +      bla (5);
> +      a = bar (7);
> +    }
> +  else
> +    {
> +      bla (5);
> +      a = bar (7);
> +    }
> +  baz (a);
> +}
> 
> I can implement bla as
> 
> void bla(int) { a = random (); }
> 
> thus a would not have the same value on both paths

I think it's hard to decide whether they have the same value or not, since we
cannot write a test-case that compares the results of calls from 2 branches of
an if statement.

I started looking at the problem in terms of subsequent calls. Consider the
imaginary attribute nosideeffect, similar to const and pure.

const:
- no effects except the return value
- the return value depends only on the parameters

pure:
- no effects except the return value
- the return value depends only on the parameters and/or global variables

nosideeffect:
- no effects except the return value

The code fragment:
...
extern int f (int) __attribute ((nosideeffect));
int g()
{
  int a;
  int b;
  a = f (5);
  b = f (5);
  return a + b;
}
...

would result in a gimple sequence more or less like this:
...
  # VUSE <.MEMD.1712_4(D)>
  aD.1707_1 = fD.1704 (5);
  # VUSE <.MEMD.1712_4(D)>
  bD.1708_2 = fD.1704 (5);
  D.1710_3 = aD.1707_1 + bD.1708_2;
  # VUSE <.MEMD.1712_6>
  return D.1710_3;
...

The value numbering modification I'm proposing would say that a and b have the
same value, which is not true. I updated the patch to ensure in this case that a
and b would be seen as different.
Note that things won't break if the attribute is missing on a function, since
that just means that the function would have a vdef, and there would be no 2
subsequent function calls with the same vuse.

> - but that is not what
> matters - because obviously we can still merge the blocks.

Right.

>  That is,
> we cannot be sure that the side-effects on memory of bla (5) and bla (5)
> are the same.  But is that not what your value-numbering changes will assert?

In the case of calls in different branches of an control flow statement, we can
aggressively assume they're the same, since we cannot write a check that would
start failing.

In the case of subsequent calls with side effects, the vuses will never be the same.

In the case of subsequent calls without side effects, but not pure or const, we
can't assume they have the same result. AFAIK, there's currently no way to
specify such a function, but the updated patch should be able handle this.

> (not sure how you achieve this, but it looks like you insert/lookup general
> calls, thus not pure/const ones

Correct.

> - that could easily lead to miscompiles(?)).
> 

I have bootstrapped and reg-tested the updated (and the previous) patch on
x86_64 (ada inclusive), and found no issues.

Can you think of a test-case that fails with the previous or the updated patch?
Otherwise, ok for trunk?

Thanks,
- Tom

> Richard.

2012-04-24  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/51879
	* tree-ssa-sccvn.h (struct vn_reference_s): Add vdef field.
	* tree-ssa-sccvn.c (visit_reference_op_call): Handle gimple_vdef.
	Handle case that lhs is NULL_TREE.
	(visit_use): Handle calls with side-effect using
	visit_reference_op_call.

	gcc.dg/pr51879.c: New test.
	gcc.dg/pr51879-2.c: Same.
	gcc.dg/pr51879-3.c: Same.
	gcc.dg/pr51879-4.c: Same.
	gcc.dg/pr51879-6.c: Same.



[-- Attachment #2: pr51879.5.patch --]
[-- Type: text/x-patch, Size: 7701 bytes --]

Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c (revision 185028)
+++ gcc/tree-ssa-sccvn.c (working copy)
@@ -2598,27 +2598,44 @@ visit_reference_op_call (tree lhs, gimpl
 {
   bool changed = false;
   struct vn_reference_s vr1;
-  tree result;
+  vn_reference_t vnresult = NULL;
   tree vuse = gimple_vuse (stmt);
+  tree vdef = gimple_vdef (stmt);
+
+  if (vdef)
+    VN_INFO (vdef)->use_processed = true;
 
   vr1.vuse = vuse ? SSA_VAL (vuse) : NULL_TREE;
   vr1.operands = valueize_shared_reference_ops_from_call (stmt);
   vr1.type = gimple_expr_type (stmt);
   vr1.set = 0;
   vr1.hashcode = vn_reference_compute_hash (&vr1);
-  result = vn_reference_lookup_1 (&vr1, NULL);
-  if (result)
+  vn_reference_lookup_1 (&vr1, &vnresult);
+
+  if (vnresult)
     {
-      changed = set_ssa_val_to (lhs, result);
-      if (TREE_CODE (result) == SSA_NAME
-	  && VN_INFO (result)->has_constants)
-	VN_INFO (lhs)->has_constants = true;
+      if (vnresult->vdef)
+	changed |= set_ssa_val_to (vdef, vnresult->vdef);
+
+      if (!vnresult->result && lhs)
+	vnresult->result = lhs;
+
+      if (vnresult->result && lhs)
+	{
+	  changed |= set_ssa_val_to (lhs, vnresult->result);
+
+	  if (VN_INFO (vnresult->result)->has_constants)
+	    VN_INFO (lhs)->has_constants = true;
+	}
     }
   else
     {
       void **slot;
       vn_reference_t vr2;
-      changed = set_ssa_val_to (lhs, lhs);
+      if (vdef)
+	changed |= set_ssa_val_to (vdef, vdef);
+      if (lhs)
+	changed |= set_ssa_val_to (lhs, lhs);
       vr2 = (vn_reference_t) pool_alloc (current_info->references_pool);
       vr2->vuse = vr1.vuse;
       vr2->operands = valueize_refs (create_reference_ops_from_call (stmt));
@@ -2626,6 +2643,7 @@ visit_reference_op_call (tree lhs, gimpl
       vr2->set = vr1.set;
       vr2->hashcode = vr1.hashcode;
       vr2->result = lhs;
+      vr2->vdef = vdef;
       slot = htab_find_slot_with_hash (current_info->references,
 				       vr2, vr2->hashcode, INSERT);
       if (*slot)
@@ -3186,8 +3204,7 @@ visit_use (tree use)
     {
       if (gimple_code (stmt) == GIMPLE_PHI)
 	changed = visit_phi (stmt);
-      else if (!gimple_has_lhs (stmt)
-	       || gimple_has_volatile_ops (stmt))
+      else if (gimple_has_volatile_ops (stmt))
 	changed = defs_to_varying (stmt);
       else if (is_gimple_assign (stmt))
 	{
@@ -3349,34 +3366,42 @@ visit_use (tree use)
 
 	  /* ???  We could try to simplify calls.  */
 
-	  if (stmt_has_constants (stmt)
-	      && TREE_CODE (lhs) == SSA_NAME)
-	    VN_INFO (lhs)->has_constants = true;
-	  else if (TREE_CODE (lhs) == SSA_NAME)
+	  if (lhs && TREE_CODE (lhs) == SSA_NAME)
 	    {
-	      /* We reset expr and constantness here because we may
-		 have been value numbering optimistically, and
-		 iterating. They may become non-constant in this case,
-		 even if they were optimistically constant. */
-	      VN_INFO (lhs)->has_constants = false;
-	      VN_INFO (lhs)->expr = NULL_TREE;
+	      if (stmt_has_constants (stmt))
+		VN_INFO (lhs)->has_constants = true;
+	      else
+		{
+		  /* We reset expr and constantness here because we may
+		     have been value numbering optimistically, and
+		     iterating.  They may become non-constant in this case,
+		     even if they were optimistically constant.  */
+		  VN_INFO (lhs)->has_constants = false;
+		  VN_INFO (lhs)->expr = NULL_TREE;
+		}
+
+	      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
+		{
+		  changed = defs_to_varying (stmt);
+		  goto done;
+		}
 	    }
 
-	  if (TREE_CODE (lhs) == SSA_NAME
-	      && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
-	    changed = defs_to_varying (stmt);
 	  /* ???  We should handle stores from calls.  */
-	  else if (TREE_CODE (lhs) == SSA_NAME)
-	    {
-	      if (!gimple_call_internal_p (stmt)
-		  && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST))
-		changed = visit_reference_op_call (lhs, stmt);
-	      else
-		changed = defs_to_varying (stmt);
-	    }
+	  if (!gimple_call_internal_p (stmt)
+	      && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
+		  /* If the call has side effects, subsequent calls won't have
+		     the same incoming vuse, so it's save to assume
+		     equality.  */
+		  || gimple_has_side_effects (stmt))
+	      && ((lhs && TREE_CODE (lhs) == SSA_NAME)
+		  || (!lhs && gimple_vdef (stmt))))
+	    changed = visit_reference_op_call (lhs, stmt);
 	  else
 	    changed = defs_to_varying (stmt);
 	}
+      else
+	changed = defs_to_varying (stmt);
     }
  done:
   return changed;
Index: gcc/tree-ssa-sccvn.h
===================================================================
--- gcc/tree-ssa-sccvn.h (revision 185028)
+++ gcc/tree-ssa-sccvn.h (working copy)
@@ -110,6 +110,7 @@ typedef struct vn_reference_s
   tree type;
   VEC (vn_reference_op_s, heap) *operands;
   tree result;
+  tree vdef;
 } *vn_reference_t;
 typedef const struct vn_reference_s *const_vn_reference_t;
 
Index: gcc/testsuite/gcc.dg/pr51879-2.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-2.c (revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y)
+    baz (bar (7) + 6);
+  else
+    baz (bar (7) + 6);
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
+/* { dg-final { scan-tree-dump-times "baz \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879-6.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-6.c (revision 0)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+
+int bar (int);
+void baz (int);
+void bla (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y == 6)
+    {
+      bla (5);
+      a = bar (7);
+    }
+  else
+    {
+      bla (5);
+      a = bar (7);
+    }
+  baz (a);
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879.c (revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y == 6)
+    a = bar (7);
+  else
+    a = bar (7);
+  baz (a);
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879-3.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-3.c (revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y == 6)
+    a = bar (7) + 6;
+  else
+    a = bar (7) + 6;
+  baz (a);
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879-4.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-4.c (revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+int foo (int y)
+{
+  int a, b;
+  a = bar (7) + 6;
+  b = bar (7) + 6;
+  return a + b;
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 2 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
  2012-04-24 21:20         ` Tom de Vries
@ 2012-04-25  9:57           ` Richard Guenther
  2012-04-25 10:10             ` Jakub Jelinek
  2012-04-25 21:57             ` Tom de Vries
  0 siblings, 2 replies; 17+ messages in thread
From: Richard Guenther @ 2012-04-25  9:57 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Jakub Jelinek, gcc-patches

On Tue, Apr 24, 2012 at 11:19 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 17/04/12 14:24, Richard Guenther wrote:
>> On Sat, Apr 14, 2012 at 9:26 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> On 27/01/12 21:37, Tom de Vries wrote:
>>>> On 24/01/12 11:40, Richard Guenther wrote:
>>>>> On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>>> Richard,
>>>>>> Jakub,
>>>>>>
>>>>>> the following patch fixes PR51879.
>>>>>>
>>>>>> Consider the following test-case:
>>>>>> ...
>>>>>> int bar (int);
>>>>>> void baz (int);
>>>>>>
>>>>>> void
>>>>>> foo (int y)
>>>>>> {
>>>>>>  int a;
>>>>>>  if (y == 6)
>>>>>>    a = bar (7);
>>>>>>  else
>>>>>>    a = bar (7);
>>>>>>  baz (a);
>>>>>> }
>>>>>> ...
>>>>>>
>>>>>> after compiling at -02, the representation looks like this before tail-merging:
>>>>>> ...
>>>>>>  # BLOCK 3 freq:1991
>>>>>>  # PRED: 2 [19.9%]  (true,exec)
>>>>>>  # .MEMD.1714_7 = VDEF <.MEMD.1714_6(D)>
>>>>>>  # USE = nonlocal
>>>>>>  # CLB = nonlocal
>>>>>>  aD.1709_3 = barD.1703 (7);
>>>>>>  goto <bb 5>;
>>>>>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>>>>>
>>>>>>  # BLOCK 4 freq:8009
>>>>>>  # PRED: 2 [80.1%]  (false,exec)
>>>>>>  # .MEMD.1714_8 = VDEF <.MEMD.1714_6(D)>
>>>>>>  # USE = nonlocal
>>>>>>  # CLB = nonlocal
>>>>>>  aD.1709_4 = barD.1703 (7);
>>>>>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>>>>>
>>>>>>  # BLOCK 5 freq:10000
>>>>>>  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
>>>>>>  # aD.1709_1 = PHI <aD.1709_3(3), aD.1709_4(4)>
>>>>>>  # .MEMD.1714_5 = PHI <.MEMD.1714_7(3), .MEMD.1714_8(4)>
>>>>>>  # .MEMD.1714_9 = VDEF <.MEMD.1714_5>
>>>>>>  # USE = nonlocal
>>>>>>  # CLB = nonlocal
>>>>>>  bazD.1705 (aD.1709_1);
>>>>>>  # VUSE <.MEMD.1714_9>
>>>>>>  return;
>>>>>> ...
>>>>>>
>>>>>> the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8
>>>>>> to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5.
>>>>>>
>>>>>> The patch makes sure non-const/pure call results (gimple_vdef and
>>>>>> gimple_call_lhs) are properly value numbered.
>>>>>>
>>>>>> Bootstrapped and reg-tested on x86_64.
>>>>>>
>>>>>> ok for stage1?
>>>>>
>>>>> The following cannot really work:
>>>>>
>>>>> @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl
>>>>>    result = vn_reference_lookup_1 (&vr1, NULL);
>>>>>    if (result)
>>>>>      {
>>>>> -      changed = set_ssa_val_to (lhs, result);
>>>>> +      tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result));
>>>>> +      if (vdef)
>>>>> +       changed |= set_ssa_val_to (vdef, result_vdef);
>>>>> +      changed |= set_ssa_val_to (lhs, result);
>>>>>
>>>>> because 'result' may be not an SSA name.  It might also not have
>>>>> a proper definition statement (if VN_INFO (result)->needs_insertion
>>>>> is true).  So you at least need to guard things properly.
>>>>>
>>>>
>>>> Right. And that also doesn't work if the function is without lhs, such as in the
>>>> new test-case pr51879-6.c.
>>>>
>>>> I fixed this by storing both lhs and vdef, such that I don't have to derive
>>>> the vdef from the lhs.
>>>>
>>>>> (On a side-note - I _did_ want to remove value-numbering virtual operands
>>>>> at some point ...)
>>>>>
>>>>
>>>> Doing so willl hurt performance of tail-merging in its current form.
>>>> OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that
>>>> value numbering as used in tail-merging has its limitations too.
>>>> Do you have any ideas how to address that one?
>>>>
>>>>> @@ -3359,8 +3366,10 @@ visit_use (tree use)
>>>>>           /* ???  We should handle stores from calls.  */
>>>>>           else if (TREE_CODE (lhs) == SSA_NAME)
>>>>>             {
>>>>> +             tree vuse = gimple_vuse (stmt);
>>>>>               if (!gimple_call_internal_p (stmt)
>>>>> -                 && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST))
>>>>> +                 && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
>>>>> +                     || (vuse && SSA_VAL (vuse) != VN_TOP)))
>>>>>                 changed = visit_reference_op_call (lhs, stmt);
>>>>>               else
>>>>>                 changed = defs_to_varying (stmt);
>>>>>
>>>>> ... exactly because of the issue that a stmt has multiple defs.  Btw,
>>>>> vuse should have been visited here or be part of our SCC, so, why do
>>>>> you need this check?
>>>>>
>>>>
>>>> Removed now, that was a workaround for a bug in an earlier version of the patch,
>>>> that I didn't need anymore.
>>>>
>>>> Bootstrapped and reg-tested on x86_64.
>>>>
>>>> OK for stage1?
>>>>
>>>
>>> Richard,
>>>
>>> quoting you in http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00618.html:
>>> ...
>>> I think these fixes hint at that we should
>>> use "structural" equality as fallback if value-numbering doesn't equate
>>> two stmt effects.  Thus, treat two stmts with exactly the same operands
>>> and flags as equal and using value-numbering to canonicalize operands
>>> (when they are SSA names) for that comparison, or use VN entirely
>>> if there are no side-effects on the stmt.
>>>
>>> Changing value-numbering of virtual operands, even if it looks correct in the
>>> simple cases you change, doesn't look like a general solution for the missed
>>> tail merging opportunities.
>>> ...
>>>
>>> The test-case pr51879-6.c shows a case where improving value numbering will help
>>> tail-merging, but structural equality comparison not:
>>> ...
>>>  # BLOCK 3 freq:1991
>>>  # PRED: 2 [19.9%]  (true,exec)
>>>  # .MEMD.1717_7 = VDEF <.MEMD.1717_6(D)>
>>>  # USE = nonlocal
>>>  # CLB = nonlocal
>>>  blaD.1708 (5);
>>>  # .MEMD.1717_8 = VDEF <.MEMD.1717_7>
>>>  # USE = nonlocal
>>>  # CLB = nonlocal
>>>  aD.1712_3 = barD.1704 (7);
>>>  goto <bb 5>;
>>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>>
>>>  # BLOCK 4 freq:8009
>>>  # PRED: 2 [80.1%]  (false,exec)
>>>  # .MEMD.1717_9 = VDEF <.MEMD.1717_6(D)>
>>>  # USE = nonlocal
>>>  # CLB = nonlocal
>>>  blaD.1708 (5);
>>>  # .MEMD.1717_10 = VDEF <.MEMD.1717_9>
>>>  # USE = nonlocal
>>>  # CLB = nonlocal
>>>  aD.1712_4 = barD.1704 (7);
>>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>>
>>>  # BLOCK 5 freq:10000
>>>  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
>>>  # aD.1712_1 = PHI <aD.1712_3(3), aD.1712_4(4)>
>>>  # .MEMD.1717_5 = PHI <.MEMD.1717_8(3), .MEMD.1717_10(4)>
>>>  # .MEMD.1717_11 = VDEF <.MEMD.1717_5>
>>>  # USE = nonlocal
>>>  # CLB = nonlocal
>>>  bazD.1706 (aD.1712_1);
>>>  # VUSE <.MEMD.1717_11>
>>>  return;
>>> ...
>>> by structurally comparing the gimples in both blocks, we can see that these are
>>> equal.
>>> What we can't see, is that aD.1712_3 and aD.1712_4 have the same value. For
>>> that, we need value numbering to number the vuses the same. And if we get value
>>> numbering to number these values equal, we don't need the structural comparison.
>>>
>>> In this case structural comparison cannot be used as fallback for
>>> value-numbering. So, ok for trunk?
>>>
>>> And if not ok, do you have a suggestion of how to deal with this otherwise?
>>
>> Hmm.  I'm not sure we can conclude that they have the same value!
>>
>> +int bar (int);
>> +void baz (int);
>> +void bla (int);
>> +
>> +void
>> +foo (int y)
>> +{
>> +  int a;
>> +  if (y == 6)
>> +    {
>> +      bla (5);
>> +      a = bar (7);
>> +    }
>> +  else
>> +    {
>> +      bla (5);
>> +      a = bar (7);
>> +    }
>> +  baz (a);
>> +}
>>
>> I can implement bla as
>>
>> void bla(int) { a = random (); }
>>
>> thus a would not have the same value on both paths
>
> I think it's hard to decide whether they have the same value or not, since we
> cannot write a test-case that compares the results of calls from 2 branches of
> an if statement.

void *foo ()
{
  return __builtin_return_address (0);
}

void *bar (_Bool b)
{
  if (b)
    return foo ();
  else
    return foo ();
}

int main()
{
  if (bar(true) == bar(false))
    abort ();
}

ok ... outside of the scope of standard "C", but we certainly _can_ do this.
Which would question tail-merging the above at all, of course.

> I started looking at the problem in terms of subsequent calls. Consider the
> imaginary attribute nosideeffect, similar to const and pure.
>
> const:
> - no effects except the return value
> - the return value depends only on the parameters
>
> pure:
> - no effects except the return value
> - the return value depends only on the parameters and/or global variables
>
> nosideeffect:
> - no effects except the return value
>
> The code fragment:
> ...
> extern int f (int) __attribute ((nosideeffect));
> int g()
> {
>  int a;
>  int b;
>  a = f (5);
>  b = f (5);
>  return a + b;
> }
> ...
>
> would result in a gimple sequence more or less like this:
> ...
>  # VUSE <.MEMD.1712_4(D)>
>  aD.1707_1 = fD.1704 (5);
>  # VUSE <.MEMD.1712_4(D)>
>  bD.1708_2 = fD.1704 (5);
>  D.1710_3 = aD.1707_1 + bD.1708_2;
>  # VUSE <.MEMD.1712_6>
>  return D.1710_3;
> ...
>
> The value numbering modification I'm proposing would say that a and b have the
> same value, which is not true. I updated the patch to ensure in this case that a
> and b would be seen as different.
> Note that things won't break if the attribute is missing on a function, since
> that just means that the function would have a vdef, and there would be no 2
> subsequent function calls with the same vuse.
>
>> - but that is not what
>> matters - because obviously we can still merge the blocks.
>
> Right.
>
>>  That is,
>> we cannot be sure that the side-effects on memory of bla (5) and bla (5)
>> are the same.  But is that not what your value-numbering changes will assert?
>
> In the case of calls in different branches of an control flow statement, we can
> aggressively assume they're the same, since we cannot write a check that would
> start failing.

Apart from the above ... which shows that even the returned values can matter.

> In the case of subsequent calls with side effects, the vuses will never be the same.
>
> In the case of subsequent calls without side effects, but not pure or const, we
> can't assume they have the same result. AFAIK, there's currently no way to
> specify such a function, but the updated patch should be able handle this.
>
>> (not sure how you achieve this, but it looks like you insert/lookup general
>> calls, thus not pure/const ones
>
> Correct.
>
>> - that could easily lead to miscompiles(?)).
>>

And to increased compile-time (not sure if it matters).

> I have bootstrapped and reg-tested the updated (and the previous) patch on
> x86_64 (ada inclusive), and found no issues.
>
> Can you think of a test-case that fails with the previous or the updated patch?

I see you do not handle

struct S { int i; };
struct S foo (void);
struct S bar (void)
{
  struct S s1, s2;
  if (...)
   s = foo ();
  else
   s = foo ();

because the calls have a LHS that is not an SSA name.  In general
value-numbering virtual operands will change what further consumers lookup.
So changing

  # MEM_2 = VUSE <MEM_3>
  foo ();
  # VUSE <MEM_2>
  ... = x;

by value-numbering MEM_2 to MEM_3 will make the lookup of ... = x use
MEM_3, not considering a clobber by foo.  This might not actually be an
issue (after all we do this for stores, too).

I see you do not handle

int bar (int);
void baz (int);
void bla (int);
int s;
void
foo (int y)
{
  int a;
  if (y == 6)
    {
      s = 1;
      bla (5);
      a = bar (7);
    }
  else
    {
      s = 1;
      bla (5);
      a = bar (7);
    }
  baz (a);
}

and I don't see how you need the new ref->vdef member - that is also
not handled consistently (not in hashing or comparing, etc.).  Can't you
always use SSA_VAL (vuse)?

Btw, what's

+  if (vdef)
+    VN_INFO (vdef)->use_processed = true;

for?  We arrive here by visiting the VDEF after all and visit_use does already
set the use processed.  Is it that we end up visiting the call twice because
of the lhs SSA name definition and the virtual operand definition?  If so then
visit_use should instead do

FOR_EACH_SSA_DEF_OPERAND (...)
   VN_INFO (def)->use_processed = true;

and defs_to_varying then no longer needs to do that.

> Otherwise, ok for trunk?

Let's give this one more iteration, but I guess the basic idea should work.

Thanks,
Richard.

> Thanks,
> - Tom
>
>> Richard.
>
> 2012-04-24  Tom de Vries  <tom@codesourcery.com>
>
>        PR tree-optimization/51879
>        * tree-ssa-sccvn.h (struct vn_reference_s): Add vdef field.
>        * tree-ssa-sccvn.c (visit_reference_op_call): Handle gimple_vdef.
>        Handle case that lhs is NULL_TREE.
>        (visit_use): Handle calls with side-effect using
>        visit_reference_op_call.
>
>        gcc.dg/pr51879.c: New test.
>        gcc.dg/pr51879-2.c: Same.
>        gcc.dg/pr51879-3.c: Same.
>        gcc.dg/pr51879-4.c: Same.
>        gcc.dg/pr51879-6.c: Same.
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
  2012-04-25  9:57           ` Richard Guenther
@ 2012-04-25 10:10             ` Jakub Jelinek
  2012-04-25 10:18               ` Tom de Vries
  2012-04-25 21:57             ` Tom de Vries
  1 sibling, 1 reply; 17+ messages in thread
From: Jakub Jelinek @ 2012-04-25 10:10 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Tom de Vries, gcc-patches

On Wed, Apr 25, 2012 at 11:57:09AM +0200, Richard Guenther wrote:
> void *foo ()
> {
>   return __builtin_return_address (0);
> }
> 
> void *bar (_Bool b)
> {
>   if (b)
>     return foo ();
>   else
>     return foo ();
> }
> 
> int main()
> {
>   if (bar(true) == bar(false))
>     abort ();
> }
> 
> ok ... outside of the scope of standard "C", but we certainly _can_ do this.
> Which would question tail-merging the above at all, of course.

I don't think we guarantee the above, after all, even pure functions may
use __builtin_return_address (0) - it doesn't modify memory, and we happily
remove pure calls, CSE the return values etc.

	Jakub

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
  2012-04-25 10:10             ` Jakub Jelinek
@ 2012-04-25 10:18               ` Tom de Vries
  0 siblings, 0 replies; 17+ messages in thread
From: Tom de Vries @ 2012-04-25 10:18 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Guenther, gcc-patches

On 25/04/12 12:09, Jakub Jelinek wrote:
> On Wed, Apr 25, 2012 at 11:57:09AM +0200, Richard Guenther wrote:
>> void *foo ()
>> {
>>   return __builtin_return_address (0);
>> }
>>
>> void *bar (_Bool b)
>> {
>>   if (b)
>>     return foo ();
>>   else
>>     return foo ();
>> }
>>
>> int main()
>> {
>>   if (bar(true) == bar(false))
>>     abort ();
>> }
>>
>> ok ... outside of the scope of standard "C", but we certainly _can_ do this.
>> Which would question tail-merging the above at all, of course.
> 
> I don't think we guarantee the above, after all, even pure functions may
> use __builtin_return_address (0) - it doesn't modify memory, and we happily
> remove pure calls, CSE the return values etc.
> 

Jakub,

pure:
- no effects except the return value
- return value depends only on the parameters and/or global variables

AFAIU, given this definition, a pure function cannot use
__builtin_return_address since it would mean that the return value depends on
something else than parameters and global variables.

Thanks,
- Tom

> 	Jakub

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
  2012-04-25  9:57           ` Richard Guenther
  2012-04-25 10:10             ` Jakub Jelinek
@ 2012-04-25 21:57             ` Tom de Vries
  2012-04-26 10:21               ` Richard Guenther
  1 sibling, 1 reply; 17+ messages in thread
From: Tom de Vries @ 2012-04-25 21:57 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jakub Jelinek, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 9332 bytes --]

On 25/04/12 11:57, Richard Guenther wrote:
<SNIP>
>>> >> Hmm.  I'm not sure we can conclude that they have the same value!
>>> >>
>>> >> +int bar (int);
>>> >> +void baz (int);
>>> >> +void bla (int);
>>> >> +
>>> >> +void
>>> >> +foo (int y)
>>> >> +{
>>> >> +  int a;
>>> >> +  if (y == 6)
>>> >> +    {
>>> >> +      bla (5);
>>> >> +      a = bar (7);
>>> >> +    }
>>> >> +  else
>>> >> +    {
>>> >> +      bla (5);
>>> >> +      a = bar (7);
>>> >> +    }
>>> >> +  baz (a);
>>> >> +}
>>> >>
>>> >> I can implement bla as
>>> >>
>>> >> void bla(int) { a = random (); }
>>> >>
>>> >> thus a would not have the same value on both paths
>> >
>> > I think it's hard to decide whether they have the same value or not, since we
>> > cannot write a test-case that compares the results of calls from 2 branches of
>> > an if statement.
> void *foo ()
> {
>   return __builtin_return_address (0);
> }
> 
> void *bar (_Bool b)
> {
>   if (b)
>     return foo ();
>   else
>     return foo ();
> }
> 
> int main()
> {
>   if (bar(true) == bar(false))
>     abort ();
> }
> 
> ok ... outside of the scope of standard "C", but we certainly _can_ do this.
> Which would question tail-merging the above at all, of course.
> 

I added noinline to make sure there's no variation from inlining.
...
extern void abort (void) __attribute__ ((__nothrow__)) __attribute__
((__noreturn__));

__attribute__((noinline))
void *foo ()
{
  return __builtin_return_address (0);
}

__attribute__((noinline))
void *bar (int b)
{
  if (b)
    return foo ();
  else
    return foo ();
}

int main()
{
  void *a, *b;
  a = bar (1);
  b = bar (0);
  if (a == b)
    abort ();
  return 0;
}
...

This test-case passes with:
...
gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fno-crossjumping
...

and fails with:
...
gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fcrossjumping
...

with the proposed patch, it also fails with:
...
gcc -O2 calls.c -fno-optimize-sibling-calls -ftree-tail-merge -fno-crossjumping
...

Tree-tail-merge performs the same transformation as cross-jumping for this
test-case. I think that the transformation is valid, and that the test-case
behaves as expected. Subsequent calls to foo may or may not return the same
value, depending on the optimizations, we can't assert a != b, that's just the
nature of __builtin_return_address.

Btw, this example is not what I meant when I said 'a test-case that compares the
results of calls from 2 branches of an if statement'. I tried to say that the
semantics of C is defined in terms of taking either 1 branch or the other,
rather than executing both in parallel, after which both results would be
available. From that point of view, this example compares subsequent calls to
foo, not parallel.

>> > I started looking at the problem in terms of subsequent calls. Consider the
>> > imaginary attribute nosideeffect, similar to const and pure.
>> >
>> > const:
>> > - no effects except the return value
>> > - the return value depends only on the parameters
>> >
>> > pure:
>> > - no effects except the return value
>> > - the return value depends only on the parameters and/or global variables
>> >
>> > nosideeffect:
>> > - no effects except the return value
>> >
>> > The code fragment:
>> > ...
>> > extern int f (int) __attribute ((nosideeffect));
>> > int g()
>> > {
>> >  int a;
>> >  int b;
>> >  a = f (5);
>> >  b = f (5);
>> >  return a + b;
>> > }
>> > ...
>> >
>> > would result in a gimple sequence more or less like this:
>> > ...
>> >  # VUSE <.MEMD.1712_4(D)>
>> >  aD.1707_1 = fD.1704 (5);
>> >  # VUSE <.MEMD.1712_4(D)>
>> >  bD.1708_2 = fD.1704 (5);
>> >  D.1710_3 = aD.1707_1 + bD.1708_2;
>> >  # VUSE <.MEMD.1712_6>
>> >  return D.1710_3;
>> > ...
>> >
>> > The value numbering modification I'm proposing would say that a and b have the
>> > same value, which is not true. I updated the patch to ensure in this case that a
>> > and b would be seen as different.
>> > Note that things won't break if the attribute is missing on a function, since
>> > that just means that the function would have a vdef, and there would be no 2
>> > subsequent function calls with the same vuse.
>> >
>>> >> - but that is not what
>>> >> matters - because obviously we can still merge the blocks.
>> >
>> > Right.
>> >
>>> >>  That is,
>>> >> we cannot be sure that the side-effects on memory of bla (5) and bla (5)
>>> >> are the same.  But is that not what your value-numbering changes will assert?
>> >
>> > In the case of calls in different branches of an control flow statement, we can
>> > aggressively assume they're the same, since we cannot write a check that would
>> > start failing.
> Apart from the above ... which shows that even the returned values can matter.
> 
>> > In the case of subsequent calls with side effects, the vuses will never be the same.
>> >
>> > In the case of subsequent calls without side effects, but not pure or const, we
>> > can't assume they have the same result. AFAIK, there's currently no way to
>> > specify such a function, but the updated patch should be able handle this.
>> >
>>> >> (not sure how you achieve this, but it looks like you insert/lookup general
>>> >> calls, thus not pure/const ones
>> >
>> > Correct.
>> >
>>> >> - that could easily lead to miscompiles(?)).
>>> >>
> And to increased compile-time (not sure if it matters).
> 
>> > I have bootstrapped and reg-tested the updated (and the previous) patch on
>> > x86_64 (ada inclusive), and found no issues.
>> >
>> > Can you think of a test-case that fails with the previous or the updated patch?
> I see you do not handle
> 

I assume you mean 'not handle' in the sense of 'handle conservatively'.

> struct S { int i; };
> struct S foo (void);
> struct S bar (void)
> {
>   struct S s1, s2;
>   if (...)
>    s = foo ();
>   else
>    s = foo ();
> 
> because the calls have a LHS that is not an SSA name.

Indeed, the gvn patch handles this example conservatively, and tree-tail-merge
fails to optimize this test-case:
...
struct S { int i; };
extern struct S foo (void);
extern int foo2 (void);
struct S s;
int bar (int c) {
  int r;
  if (c)
    {
      s = foo ();
      r = foo2 ();
    }
  else
    {
      s = foo ();
      r = foo2 ();
    }
  return r;
}
...

A todo.

>  In general
> value-numbering virtual operands will change what further consumers lookup.
> So changing
> 
>   # MEM_2 = VUSE <MEM_3>
>   foo ();
>   # VUSE <MEM_2>
>   ... = x;
> 
> by value-numbering MEM_2 to MEM_3 will make the lookup of ... = x use
> MEM_3, not considering a clobber by foo.

AFAIU, the patch doesn't do this type of value numbering.

> This might not actually be an
> issue (after all we do this for stores, too).
> 
> I see you do not handle
> 
> int bar (int);
> void baz (int);
> void bla (int);
> int s;
> void
> foo (int y)
> {
>   int a;
>   if (y == 6)
>     {
>       s = 1;
>       bla (5);
>       a = bar (7);
>     }
>   else
>     {
>       s = 1;
>       bla (5);
>       a = bar (7);
>     }
>   baz (a);
> }
> 

I think a more basic example is this one from PR52009, which is basically about
stores:
...
int z;

void
foo (int y)
{
  int a;
  if (y == 6)
    z = 5;
  else
    z = 5;
}
...

I proposed a patch for that separately:
http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01731.html. Mmmm, looking at the
patch now, I suppose I could use part of that patch as infrastructure for the
todo above.

> and I don't see how you need the new ref->vdef member

I'm regarding the result of the ref as a compound <vdef, result>. Most of the
time I could (as I did in an earlier version of the patch) deduce vdef as
gimple_vdef (SSA_NAME_DEF_STMT (result)), but not always, in particular when
result is NULL_TREE.

> - that is also
> not handled consistently (not in hashing or comparing, etc.).

I handle the vdef as a result.

> Can't you
> always use SSA_VAL (vuse)?

I'm interested in storing the vdef, not the vuse.

> 
> Btw, what's
> 
> +  if (vdef)
> +    VN_INFO (vdef)->use_processed = true;
> 
> for?  We arrive here by visiting the VDEF after all and visit_use does already
> set the use processed.  Is it that we end up visiting the call twice because
> of the lhs SSA name definition and the virtual operand definition? 

Yes.

> If so then
> visit_use should instead do
> 
> FOR_EACH_SSA_DEF_OPERAND (...)
>    VN_INFO (def)->use_processed = true;
> 
> and defs_to_varying then no longer needs to do that.
> 

factored mark_use_processed out of defs_to_varying, calling mark_use_processed
at start of visit_use.

>> > Otherwise, ok for trunk?
> Let's give this one more iteration, but I guess the basic idea should work.
> 

bootstrapped and reg-tested on x86_64 (ada inclusive).

Is this patch ok, or is the todo required?

Thanks,
 - Tom

2012-04-25  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/51879
	* tree-ssa-sccvn.h (struct vn_reference_s): Add vdef field.
	* tree-ssa-sccvn.c (mark_use_processed): New function, factored out
	of ...
	(defs_to_varying): ... here.
	(visit_reference_op_call): Handle gimple_vdef.
	Handle case that lhs is NULL_TREE.
	(visit_use): Use mark_use_processed.  Handle calls with side-effect
	using visit_reference_op_call.

	gcc.dg/pr51879.c: New test.
	gcc.dg/pr51879-2.c: Same.
	gcc.dg/pr51879-3.c: Same.
	gcc.dg/pr51879-4.c: Same.
	gcc.dg/pr51879-6.c: Same.

[-- Attachment #2: pr51879.6.patch --]
[-- Type: text/x-patch, Size: 9408 bytes --]

Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c (revision 185028)
+++ gcc/tree-ssa-sccvn.c (working copy)
@@ -2525,6 +2525,30 @@ set_ssa_val_to (tree from, tree to)
   return false;
 }
 
+/* Mark as processed all the definitions in the defining stmt of USE, or
+   the USE itself.  */
+
+static void
+mark_use_processed (tree use)
+{
+  ssa_op_iter iter;
+  def_operand_p defp;
+  gimple stmt = SSA_NAME_DEF_STMT (use);
+
+  if (!stmt || gimple_code (stmt) == GIMPLE_PHI)
+    {
+      VN_INFO (use)->use_processed = true;
+      return;
+    }
+
+  FOR_EACH_SSA_DEF_OPERAND (defp, stmt, iter, SSA_OP_ALL_DEFS)
+    {
+      tree def = DEF_FROM_PTR (defp);
+
+      VN_INFO (def)->use_processed = true;
+    }
+}
+
 /* Set all definitions in STMT to value number to themselves.
    Return true if a value number changed. */
 
@@ -2538,8 +2562,6 @@ defs_to_varying (gimple stmt)
   FOR_EACH_SSA_DEF_OPERAND (defp, stmt, iter, SSA_OP_ALL_DEFS)
     {
       tree def = DEF_FROM_PTR (defp);
-
-      VN_INFO (def)->use_processed = true;
       changed |= set_ssa_val_to (def, def);
     }
   return changed;
@@ -2598,27 +2620,41 @@ visit_reference_op_call (tree lhs, gimpl
 {
   bool changed = false;
   struct vn_reference_s vr1;
-  tree result;
+  vn_reference_t vnresult = NULL;
   tree vuse = gimple_vuse (stmt);
+  tree vdef = gimple_vdef (stmt);
 
   vr1.vuse = vuse ? SSA_VAL (vuse) : NULL_TREE;
   vr1.operands = valueize_shared_reference_ops_from_call (stmt);
   vr1.type = gimple_expr_type (stmt);
   vr1.set = 0;
   vr1.hashcode = vn_reference_compute_hash (&vr1);
-  result = vn_reference_lookup_1 (&vr1, NULL);
-  if (result)
+  vn_reference_lookup_1 (&vr1, &vnresult);
+
+  if (vnresult)
     {
-      changed = set_ssa_val_to (lhs, result);
-      if (TREE_CODE (result) == SSA_NAME
-	  && VN_INFO (result)->has_constants)
-	VN_INFO (lhs)->has_constants = true;
+      if (vnresult->vdef)
+	changed |= set_ssa_val_to (vdef, vnresult->vdef);
+
+      if (!vnresult->result && lhs)
+	vnresult->result = lhs;
+
+      if (vnresult->result && lhs)
+	{
+	  changed |= set_ssa_val_to (lhs, vnresult->result);
+
+	  if (VN_INFO (vnresult->result)->has_constants)
+	    VN_INFO (lhs)->has_constants = true;
+	}
     }
   else
     {
       void **slot;
       vn_reference_t vr2;
-      changed = set_ssa_val_to (lhs, lhs);
+      if (vdef)
+	changed |= set_ssa_val_to (vdef, vdef);
+      if (lhs)
+	changed |= set_ssa_val_to (lhs, lhs);
       vr2 = (vn_reference_t) pool_alloc (current_info->references_pool);
       vr2->vuse = vr1.vuse;
       vr2->operands = valueize_refs (create_reference_ops_from_call (stmt));
@@ -2626,6 +2662,7 @@ visit_reference_op_call (tree lhs, gimpl
       vr2->set = vr1.set;
       vr2->hashcode = vr1.hashcode;
       vr2->result = lhs;
+      vr2->vdef = vdef;
       slot = htab_find_slot_with_hash (current_info->references,
 				       vr2, vr2->hashcode, INSERT);
       if (*slot)
@@ -2795,7 +2832,6 @@ visit_reference_op_store (tree lhs, tree
 	 going to valueize the references in-place.  */
       if ((vdef = gimple_vdef (stmt)))
 	{
-	  VN_INFO (vdef)->use_processed = true;
 	  changed |= set_ssa_val_to (vdef, vdef);
 	}
 
@@ -2817,7 +2853,6 @@ visit_reference_op_store (tree lhs, tree
       def = gimple_vdef (stmt);
       use = gimple_vuse (stmt);
 
-      VN_INFO (def)->use_processed = true;
       changed |= set_ssa_val_to (def, SSA_VAL (use));
     }
 
@@ -3167,7 +3202,7 @@ visit_use (tree use)
   bool changed = false;
   gimple stmt = SSA_NAME_DEF_STMT (use);
 
-  VN_INFO (use)->use_processed = true;
+  mark_use_processed (use);
 
   gcc_assert (!SSA_NAME_IN_FREE_LIST (use));
   if (dump_file && (dump_flags & TDF_DETAILS)
@@ -3186,8 +3221,7 @@ visit_use (tree use)
     {
       if (gimple_code (stmt) == GIMPLE_PHI)
 	changed = visit_phi (stmt);
-      else if (!gimple_has_lhs (stmt)
-	       || gimple_has_volatile_ops (stmt))
+      else if (gimple_has_volatile_ops (stmt))
 	changed = defs_to_varying (stmt);
       else if (is_gimple_assign (stmt))
 	{
@@ -3349,34 +3383,44 @@ visit_use (tree use)
 
 	  /* ???  We could try to simplify calls.  */
 
-	  if (stmt_has_constants (stmt)
-	      && TREE_CODE (lhs) == SSA_NAME)
-	    VN_INFO (lhs)->has_constants = true;
-	  else if (TREE_CODE (lhs) == SSA_NAME)
+	  if (lhs && TREE_CODE (lhs) == SSA_NAME)
 	    {
-	      /* We reset expr and constantness here because we may
-		 have been value numbering optimistically, and
-		 iterating. They may become non-constant in this case,
-		 even if they were optimistically constant. */
-	      VN_INFO (lhs)->has_constants = false;
-	      VN_INFO (lhs)->expr = NULL_TREE;
+	      if (stmt_has_constants (stmt))
+		VN_INFO (lhs)->has_constants = true;
+	      else
+		{
+		  /* We reset expr and constantness here because we may
+		     have been value numbering optimistically, and
+		     iterating.  They may become non-constant in this case,
+		     even if they were optimistically constant.  */
+		  VN_INFO (lhs)->has_constants = false;
+		  VN_INFO (lhs)->expr = NULL_TREE;
+		}
+
+	      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
+		{
+		  changed = defs_to_varying (stmt);
+		  goto done;
+		}
 	    }
 
-	  if (TREE_CODE (lhs) == SSA_NAME
-	      && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
-	    changed = defs_to_varying (stmt);
 	  /* ???  We should handle stores from calls.  */
-	  else if (TREE_CODE (lhs) == SSA_NAME)
+	  if (!gimple_call_internal_p (stmt)
+	      && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
+		  /* If the call has side effects, subsequent calls won't have
+		     the same incoming vuse, so it's save to assume
+		     equality.  */
+		  || gimple_has_side_effects (stmt))
+	      && ((lhs && TREE_CODE (lhs) == SSA_NAME)
+		  || (!lhs && gimple_vdef (stmt))))
 	    {
-	      if (!gimple_call_internal_p (stmt)
-		  && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST))
-		changed = visit_reference_op_call (lhs, stmt);
-	      else
-		changed = defs_to_varying (stmt);
+	      changed = visit_reference_op_call (lhs, stmt);
 	    }
 	  else
 	    changed = defs_to_varying (stmt);
 	}
+      else
+	changed = defs_to_varying (stmt);
     }
  done:
   return changed;
Index: gcc/tree-ssa-sccvn.h
===================================================================
--- gcc/tree-ssa-sccvn.h (revision 185028)
+++ gcc/tree-ssa-sccvn.h (working copy)
@@ -110,6 +110,7 @@ typedef struct vn_reference_s
   tree type;
   VEC (vn_reference_op_s, heap) *operands;
   tree result;
+  tree vdef;
 } *vn_reference_t;
 typedef const struct vn_reference_s *const_vn_reference_t;
 
Index: gcc/testsuite/gcc.dg/pr51879-2.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-2.c (revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y)
+    baz (bar (7) + 6);
+  else
+    baz (bar (7) + 6);
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
+/* { dg-final { scan-tree-dump-times "baz \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879-6.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-6.c (revision 0)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+
+int bar (int);
+void baz (int);
+void bla (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y == 6)
+    {
+      bla (5);
+      a = bar (7);
+    }
+  else
+    {
+      bla (5);
+      a = bar (7);
+    }
+  baz (a);
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879.c (revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y == 6)
+    a = bar (7);
+  else
+    a = bar (7);
+  baz (a);
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879-3.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-3.c (revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y == 6)
+    a = bar (7) + 6;
+  else
+    a = bar (7) + 6;
+  baz (a);
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879-4.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-4.c (revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+int foo (int y)
+{
+  int a, b;
+  a = bar (7) + 6;
+  b = bar (7) + 6;
+  return a + b;
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 2 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
  2012-04-25 21:57             ` Tom de Vries
@ 2012-04-26 10:21               ` Richard Guenther
  2012-04-27  6:20                 ` Tom de Vries
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Guenther @ 2012-04-26 10:21 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Jakub Jelinek, gcc-patches

On Wed, Apr 25, 2012 at 11:56 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 25/04/12 11:57, Richard Guenther wrote:
> <SNIP>
>>>> >> Hmm.  I'm not sure we can conclude that they have the same value!
>>>> >>
>>>> >> +int bar (int);
>>>> >> +void baz (int);
>>>> >> +void bla (int);
>>>> >> +
>>>> >> +void
>>>> >> +foo (int y)
>>>> >> +{
>>>> >> +  int a;
>>>> >> +  if (y == 6)
>>>> >> +    {
>>>> >> +      bla (5);
>>>> >> +      a = bar (7);
>>>> >> +    }
>>>> >> +  else
>>>> >> +    {
>>>> >> +      bla (5);
>>>> >> +      a = bar (7);
>>>> >> +    }
>>>> >> +  baz (a);
>>>> >> +}
>>>> >>
>>>> >> I can implement bla as
>>>> >>
>>>> >> void bla(int) { a = random (); }
>>>> >>
>>>> >> thus a would not have the same value on both paths
>>> >
>>> > I think it's hard to decide whether they have the same value or not, since we
>>> > cannot write a test-case that compares the results of calls from 2 branches of
>>> > an if statement.
>> void *foo ()
>> {
>>   return __builtin_return_address (0);
>> }
>>
>> void *bar (_Bool b)
>> {
>>   if (b)
>>     return foo ();
>>   else
>>     return foo ();
>> }
>>
>> int main()
>> {
>>   if (bar(true) == bar(false))
>>     abort ();
>> }
>>
>> ok ... outside of the scope of standard "C", but we certainly _can_ do this.
>> Which would question tail-merging the above at all, of course.
>>
>
> I added noinline to make sure there's no variation from inlining.
> ...
> extern void abort (void) __attribute__ ((__nothrow__)) __attribute__
> ((__noreturn__));
>
> __attribute__((noinline))
> void *foo ()
> {
>  return __builtin_return_address (0);
> }
>
> __attribute__((noinline))
> void *bar (int b)
> {
>  if (b)
>    return foo ();
>  else
>    return foo ();
> }
>
> int main()
> {
>  void *a, *b;
>  a = bar (1);
>  b = bar (0);
>  if (a == b)
>    abort ();
>  return 0;
> }
> ...
>
> This test-case passes with:
> ...
> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fno-crossjumping
> ...
>
> and fails with:
> ...
> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fcrossjumping
> ...
>
> with the proposed patch, it also fails with:
> ...
> gcc -O2 calls.c -fno-optimize-sibling-calls -ftree-tail-merge -fno-crossjumping
> ...
>
> Tree-tail-merge performs the same transformation as cross-jumping for this
> test-case. I think that the transformation is valid, and that the test-case
> behaves as expected. Subsequent calls to foo may or may not return the same
> value, depending on the optimizations, we can't assert a != b, that's just the
> nature of __builtin_return_address.
>
> Btw, this example is not what I meant when I said 'a test-case that compares the
> results of calls from 2 branches of an if statement'. I tried to say that the
> semantics of C is defined in terms of taking either 1 branch or the other,
> rather than executing both in parallel, after which both results would be
> available. From that point of view, this example compares subsequent calls to
> foo, not parallel.

Ah, I see.  Btw, I was not suggesting that we should not optimize the above.

>>> > I started looking at the problem in terms of subsequent calls. Consider the
>>> > imaginary attribute nosideeffect, similar to const and pure.
>>> >
>>> > const:
>>> > - no effects except the return value
>>> > - the return value depends only on the parameters
>>> >
>>> > pure:
>>> > - no effects except the return value
>>> > - the return value depends only on the parameters and/or global variables
>>> >
>>> > nosideeffect:
>>> > - no effects except the return value
>>> >
>>> > The code fragment:
>>> > ...
>>> > extern int f (int) __attribute ((nosideeffect));
>>> > int g()
>>> > {
>>> >  int a;
>>> >  int b;
>>> >  a = f (5);
>>> >  b = f (5);
>>> >  return a + b;
>>> > }
>>> > ...
>>> >
>>> > would result in a gimple sequence more or less like this:
>>> > ...
>>> >  # VUSE <.MEMD.1712_4(D)>
>>> >  aD.1707_1 = fD.1704 (5);
>>> >  # VUSE <.MEMD.1712_4(D)>
>>> >  bD.1708_2 = fD.1704 (5);
>>> >  D.1710_3 = aD.1707_1 + bD.1708_2;
>>> >  # VUSE <.MEMD.1712_6>
>>> >  return D.1710_3;
>>> > ...
>>> >
>>> > The value numbering modification I'm proposing would say that a and b have the
>>> > same value, which is not true. I updated the patch to ensure in this case that a
>>> > and b would be seen as different.
>>> > Note that things won't break if the attribute is missing on a function, since
>>> > that just means that the function would have a vdef, and there would be no 2
>>> > subsequent function calls with the same vuse.
>>> >
>>>> >> - but that is not what
>>>> >> matters - because obviously we can still merge the blocks.
>>> >
>>> > Right.
>>> >
>>>> >>  That is,
>>>> >> we cannot be sure that the side-effects on memory of bla (5) and bla (5)
>>>> >> are the same.  But is that not what your value-numbering changes will assert?
>>> >
>>> > In the case of calls in different branches of an control flow statement, we can
>>> > aggressively assume they're the same, since we cannot write a check that would
>>> > start failing.
>> Apart from the above ... which shows that even the returned values can matter.
>>
>>> > In the case of subsequent calls with side effects, the vuses will never be the same.
>>> >
>>> > In the case of subsequent calls without side effects, but not pure or const, we
>>> > can't assume they have the same result. AFAIK, there's currently no way to
>>> > specify such a function, but the updated patch should be able handle this.
>>> >
>>>> >> (not sure how you achieve this, but it looks like you insert/lookup general
>>>> >> calls, thus not pure/const ones
>>> >
>>> > Correct.
>>> >
>>>> >> - that could easily lead to miscompiles(?)).
>>>> >>
>> And to increased compile-time (not sure if it matters).
>>
>>> > I have bootstrapped and reg-tested the updated (and the previous) patch on
>>> > x86_64 (ada inclusive), and found no issues.
>>> >
>>> > Can you think of a test-case that fails with the previous or the updated patch?
>> I see you do not handle
>>
>
> I assume you mean 'not handle' in the sense of 'handle conservatively'.

Yes.

>> struct S { int i; };
>> struct S foo (void);
>> struct S bar (void)
>> {
>>   struct S s1, s2;
>>   if (...)
>>    s = foo ();
>>   else
>>    s = foo ();
>>
>> because the calls have a LHS that is not an SSA name.
>
> Indeed, the gvn patch handles this example conservatively, and tree-tail-merge
> fails to optimize this test-case:
> ...
> struct S { int i; };
> extern struct S foo (void);
> extern int foo2 (void);
> struct S s;
> int bar (int c) {
>  int r;
>  if (c)
>    {
>      s = foo ();
>      r = foo2 ();
>    }
>  else
>    {
>      s = foo ();
>      r = foo2 ();
>    }
>  return r;
> }
> ...
>
> A todo.
>
>>  In general
>> value-numbering virtual operands will change what further consumers lookup.
>> So changing
>>
>>   # MEM_2 = VUSE <MEM_3>
>>   foo ();
>>   # VUSE <MEM_2>
>>   ... = x;
>>
>> by value-numbering MEM_2 to MEM_3 will make the lookup of ... = x use
>> MEM_3, not considering a clobber by foo.
>
> AFAIU, the patch doesn't do this type of value numbering.
>
>> This might not actually be an
>> issue (after all we do this for stores, too).
>>
>> I see you do not handle
>>
>> int bar (int);
>> void baz (int);
>> void bla (int);
>> int s;
>> void
>> foo (int y)
>> {
>>   int a;
>>   if (y == 6)
>>     {
>>       s = 1;
>>       bla (5);
>>       a = bar (7);
>>     }
>>   else
>>     {
>>       s = 1;
>>       bla (5);
>>       a = bar (7);
>>     }
>>   baz (a);
>> }
>>
>
> I think a more basic example is this one from PR52009, which is basically about
> stores:
> ...
> int z;
>
> void
> foo (int y)
> {
>  int a;
>  if (y == 6)
>    z = 5;
>  else
>    z = 5;
> }
> ...
>
> I proposed a patch for that separately:
> http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01731.html. Mmmm, looking at the
> patch now, I suppose I could use part of that patch as infrastructure for the
> todo above.

Yes, I think so.

>> and I don't see how you need the new ref->vdef member
>
> I'm regarding the result of the ref as a compound <vdef, result>. Most of the
> time I could (as I did in an earlier version of the patch) deduce vdef as
> gimple_vdef (SSA_NAME_DEF_STMT (result)), but not always, in particular when
> result is NULL_TREE.
>
>> - that is also
>> not handled consistently (not in hashing or comparing, etc.).
>
> I handle the vdef as a result.

Ok.  Can you please rename ->vdef to ->result_vdef then?

>> Can't you
>> always use SSA_VAL (vuse)?
>
> I'm interested in storing the vdef, not the vuse.

I meant that result_vdef will always be SSA_VAL (vuse) (the vdef of
the stmt defining vuse).  No?

>>
>> Btw, what's
>>
>> +  if (vdef)
>> +    VN_INFO (vdef)->use_processed = true;
>>
>> for?  We arrive here by visiting the VDEF after all and visit_use does already
>> set the use processed.  Is it that we end up visiting the call twice because
>> of the lhs SSA name definition and the virtual operand definition?
>
> Yes.
>
>> If so then
>> visit_use should instead do
>>
>> FOR_EACH_SSA_DEF_OPERAND (...)
>>    VN_INFO (def)->use_processed = true;
>>
>> and defs_to_varying then no longer needs to do that.
>>
>
> factored mark_use_processed out of defs_to_varying, calling mark_use_processed
> at start of visit_use.

Thanks.  Which case hits SSA_NAME_DEF_STMT == NULL btw?  The
default-def for the virtual operand?

>>> > Otherwise, ok for trunk?
>> Let's give this one more iteration, but I guess the basic idea should work.
>>
>
> bootstrapped and reg-tested on x86_64 (ada inclusive).
>
> Is this patch ok, or is the todo required?

No, you can followup with that.

Ok with s/vdef/result_vdef/

Thanks,
Richard.

> Thanks,
>  - Tom
>
> 2012-04-25  Tom de Vries  <tom@codesourcery.com>
>
>        PR tree-optimization/51879
>        * tree-ssa-sccvn.h (struct vn_reference_s): Add vdef field.
>        * tree-ssa-sccvn.c (mark_use_processed): New function, factored out
>        of ...
>        (defs_to_varying): ... here.
>        (visit_reference_op_call): Handle gimple_vdef.
>        Handle case that lhs is NULL_TREE.
>        (visit_use): Use mark_use_processed.  Handle calls with side-effect
>        using visit_reference_op_call.
>
>        gcc.dg/pr51879.c: New test.
>        gcc.dg/pr51879-2.c: Same.
>        gcc.dg/pr51879-3.c: Same.
>        gcc.dg/pr51879-4.c: Same.
>        gcc.dg/pr51879-6.c: Same.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
  2012-04-26 10:21               ` Richard Guenther
@ 2012-04-27  6:20                 ` Tom de Vries
  2012-04-27  9:02                   ` Richard Guenther
  0 siblings, 1 reply; 17+ messages in thread
From: Tom de Vries @ 2012-04-27  6:20 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jakub Jelinek, gcc-patches

On 26/04/12 12:20, Richard Guenther wrote:
> On Wed, Apr 25, 2012 at 11:56 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> On 25/04/12 11:57, Richard Guenther wrote:
>> <SNIP>
>>>>>>> Hmm.  I'm not sure we can conclude that they have the same value!
>>>>>>>
>>>>>>> +int bar (int);
>>>>>>> +void baz (int);
>>>>>>> +void bla (int);
>>>>>>> +
>>>>>>> +void
>>>>>>> +foo (int y)
>>>>>>> +{
>>>>>>> +  int a;
>>>>>>> +  if (y == 6)
>>>>>>> +    {
>>>>>>> +      bla (5);
>>>>>>> +      a = bar (7);
>>>>>>> +    }
>>>>>>> +  else
>>>>>>> +    {
>>>>>>> +      bla (5);
>>>>>>> +      a = bar (7);
>>>>>>> +    }
>>>>>>> +  baz (a);
>>>>>>> +}
>>>>>>>
>>>>>>> I can implement bla as
>>>>>>>
>>>>>>> void bla(int) { a = random (); }
>>>>>>>
>>>>>>> thus a would not have the same value on both paths
>>>>>
>>>>> I think it's hard to decide whether they have the same value or not, since we
>>>>> cannot write a test-case that compares the results of calls from 2 branches of
>>>>> an if statement.
>>> void *foo ()
>>> {
>>>   return __builtin_return_address (0);
>>> }
>>>
>>> void *bar (_Bool b)
>>> {
>>>   if (b)
>>>     return foo ();
>>>   else
>>>     return foo ();
>>> }
>>>
>>> int main()
>>> {
>>>   if (bar(true) == bar(false))
>>>     abort ();
>>> }
>>>
>>> ok ... outside of the scope of standard "C", but we certainly _can_ do this.
>>> Which would question tail-merging the above at all, of course.
>>>
>>
>> I added noinline to make sure there's no variation from inlining.
>> ...
>> extern void abort (void) __attribute__ ((__nothrow__)) __attribute__
>> ((__noreturn__));
>>
>> __attribute__((noinline))
>> void *foo ()
>> {
>>  return __builtin_return_address (0);
>> }
>>
>> __attribute__((noinline))
>> void *bar (int b)
>> {
>>  if (b)
>>    return foo ();
>>  else
>>    return foo ();
>> }
>>
>> int main()
>> {
>>  void *a, *b;
>>  a = bar (1);
>>  b = bar (0);
>>  if (a == b)
>>    abort ();
>>  return 0;
>> }
>> ...
>>
>> This test-case passes with:
>> ...
>> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fno-crossjumping
>> ...
>>
>> and fails with:
>> ...
>> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fcrossjumping
>> ...
>>
>> with the proposed patch, it also fails with:
>> ...
>> gcc -O2 calls.c -fno-optimize-sibling-calls -ftree-tail-merge -fno-crossjumping
>> ...
>>
>> Tree-tail-merge performs the same transformation as cross-jumping for this
>> test-case. I think that the transformation is valid, and that the test-case
>> behaves as expected. Subsequent calls to foo may or may not return the same
>> value, depending on the optimizations, we can't assert a != b, that's just the
>> nature of __builtin_return_address.
>>
>> Btw, this example is not what I meant when I said 'a test-case that compares the
>> results of calls from 2 branches of an if statement'. I tried to say that the
>> semantics of C is defined in terms of taking either 1 branch or the other,
>> rather than executing both in parallel, after which both results would be
>> available. From that point of view, this example compares subsequent calls to
>> foo, not parallel.
> 
> Ah, I see.  Btw, I was not suggesting that we should not optimize the above.
> 
>>>>> I started looking at the problem in terms of subsequent calls. Consider the
>>>>> imaginary attribute nosideeffect, similar to const and pure.
>>>>>
>>>>> const:
>>>>> - no effects except the return value
>>>>> - the return value depends only on the parameters
>>>>>
>>>>> pure:
>>>>> - no effects except the return value
>>>>> - the return value depends only on the parameters and/or global variables
>>>>>
>>>>> nosideeffect:
>>>>> - no effects except the return value
>>>>>
>>>>> The code fragment:
>>>>> ...
>>>>> extern int f (int) __attribute ((nosideeffect));
>>>>> int g()
>>>>> {
>>>>>  int a;
>>>>>  int b;
>>>>>  a = f (5);
>>>>>  b = f (5);
>>>>>  return a + b;
>>>>> }
>>>>> ...
>>>>>
>>>>> would result in a gimple sequence more or less like this:
>>>>> ...
>>>>>  # VUSE <.MEMD.1712_4(D)>
>>>>>  aD.1707_1 = fD.1704 (5);
>>>>>  # VUSE <.MEMD.1712_4(D)>
>>>>>  bD.1708_2 = fD.1704 (5);
>>>>>  D.1710_3 = aD.1707_1 + bD.1708_2;
>>>>>  # VUSE <.MEMD.1712_6>
>>>>>  return D.1710_3;
>>>>> ...
>>>>>
>>>>> The value numbering modification I'm proposing would say that a and b have the
>>>>> same value, which is not true. I updated the patch to ensure in this case that a
>>>>> and b would be seen as different.
>>>>> Note that things won't break if the attribute is missing on a function, since
>>>>> that just means that the function would have a vdef, and there would be no 2
>>>>> subsequent function calls with the same vuse.
>>>>>
>>>>>>> - but that is not what
>>>>>>> matters - because obviously we can still merge the blocks.
>>>>>
>>>>> Right.
>>>>>
>>>>>>>  That is,
>>>>>>> we cannot be sure that the side-effects on memory of bla (5) and bla (5)
>>>>>>> are the same.  But is that not what your value-numbering changes will assert?
>>>>>
>>>>> In the case of calls in different branches of an control flow statement, we can
>>>>> aggressively assume they're the same, since we cannot write a check that would
>>>>> start failing.
>>> Apart from the above ... which shows that even the returned values can matter.
>>>
>>>>> In the case of subsequent calls with side effects, the vuses will never be the same.
>>>>>
>>>>> In the case of subsequent calls without side effects, but not pure or const, we
>>>>> can't assume they have the same result. AFAIK, there's currently no way to
>>>>> specify such a function, but the updated patch should be able handle this.
>>>>>
>>>>>>> (not sure how you achieve this, but it looks like you insert/lookup general
>>>>>>> calls, thus not pure/const ones
>>>>>
>>>>> Correct.
>>>>>
>>>>>>> - that could easily lead to miscompiles(?)).
>>>>>>>
>>> And to increased compile-time (not sure if it matters).
>>>
>>>>> I have bootstrapped and reg-tested the updated (and the previous) patch on
>>>>> x86_64 (ada inclusive), and found no issues.
>>>>>
>>>>> Can you think of a test-case that fails with the previous or the updated patch?
>>> I see you do not handle
>>>
>>
>> I assume you mean 'not handle' in the sense of 'handle conservatively'.
> 
> Yes.
> 
>>> struct S { int i; };
>>> struct S foo (void);
>>> struct S bar (void)
>>> {
>>>   struct S s1, s2;
>>>   if (...)
>>>    s = foo ();
>>>   else
>>>    s = foo ();
>>>
>>> because the calls have a LHS that is not an SSA name.
>>
>> Indeed, the gvn patch handles this example conservatively, and tree-tail-merge
>> fails to optimize this test-case:
>> ...
>> struct S { int i; };
>> extern struct S foo (void);
>> extern int foo2 (void);
>> struct S s;
>> int bar (int c) {
>>  int r;
>>  if (c)
>>    {
>>      s = foo ();
>>      r = foo2 ();
>>    }
>>  else
>>    {
>>      s = foo ();
>>      r = foo2 ();
>>    }
>>  return r;
>> }
>> ...
>>
>> A todo.
>>
>>>  In general
>>> value-numbering virtual operands will change what further consumers lookup.
>>> So changing
>>>
>>>   # MEM_2 = VUSE <MEM_3>
>>>   foo ();
>>>   # VUSE <MEM_2>
>>>   ... = x;
>>>
>>> by value-numbering MEM_2 to MEM_3 will make the lookup of ... = x use
>>> MEM_3, not considering a clobber by foo.
>>
>> AFAIU, the patch doesn't do this type of value numbering.
>>
>>> This might not actually be an
>>> issue (after all we do this for stores, too).
>>>
>>> I see you do not handle
>>>
>>> int bar (int);
>>> void baz (int);
>>> void bla (int);
>>> int s;
>>> void
>>> foo (int y)
>>> {
>>>   int a;
>>>   if (y == 6)
>>>     {
>>>       s = 1;
>>>       bla (5);
>>>       a = bar (7);
>>>     }
>>>   else
>>>     {
>>>       s = 1;
>>>       bla (5);
>>>       a = bar (7);
>>>     }
>>>   baz (a);
>>> }
>>>
>>
>> I think a more basic example is this one from PR52009, which is basically about
>> stores:
>> ...
>> int z;
>>
>> void
>> foo (int y)
>> {
>>  int a;
>>  if (y == 6)
>>    z = 5;
>>  else
>>    z = 5;
>> }
>> ...
>>
>> I proposed a patch for that separately:
>> http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01731.html. Mmmm, looking at the
>> patch now, I suppose I could use part of that patch as infrastructure for the
>> todo above.
> 
> Yes, I think so.
> 
>>> and I don't see how you need the new ref->vdef member
>>
>> I'm regarding the result of the ref as a compound <vdef, result>. Most of the
>> time I could (as I did in an earlier version of the patch) deduce vdef as
>> gimple_vdef (SSA_NAME_DEF_STMT (result)), but not always, in particular when
>> result is NULL_TREE.
>>
>>> - that is also
>>> not handled consistently (not in hashing or comparing, etc.).
>>
>> I handle the vdef as a result.
> 
> Ok.  Can you please rename ->vdef to ->result_vdef then?
> 

Done.

>>> Can't you
>>> always use SSA_VAL (vuse)?
>>
>> I'm interested in storing the vdef, not the vuse.
> 
> I meant that result_vdef will always be SSA_VAL (vuse) (the vdef of
> the stmt defining vuse).  No?
> 

Right.

>>>
>>> Btw, what's
>>>
>>> +  if (vdef)
>>> +    VN_INFO (vdef)->use_processed = true;
>>>
>>> for?  We arrive here by visiting the VDEF after all and visit_use does already
>>> set the use processed.  Is it that we end up visiting the call twice because
>>> of the lhs SSA name definition and the virtual operand definition?
>>
>> Yes.
>>
>>> If so then
>>> visit_use should instead do
>>>
>>> FOR_EACH_SSA_DEF_OPERAND (...)
>>>    VN_INFO (def)->use_processed = true;
>>>
>>> and defs_to_varying then no longer needs to do that.
>>>
>>
>> factored mark_use_processed out of defs_to_varying, calling mark_use_processed
>> at start of visit_use.
> 
> Thanks.  Which case hits SSA_NAME_DEF_STMT == NULL btw?  The
> default-def for the virtual operand?
> 

SSA_NAME_DEF_STMT == NULL is tested by !stmt in mark_use_processed.

I suppose I better use SSA_NAME_IS_DEFAULT_DEF, as is done in visit_use. I'm
still somewhat confused as to what is the difference. I suppose we might have a
nop statement that is a default def, while it's unequal to NULL.

Updated in committed patch.

>>>>> Otherwise, ok for trunk?
>>> Let's give this one more iteration, but I guess the basic idea should work.
>>>
>>
>> bootstrapped and reg-tested on x86_64 (ada inclusive).
>>
>> Is this patch ok, or is the todo required?
> 
> No, you can followup with that.
> 
> Ok with s/vdef/result_vdef/
> 

retested and committed with this change and SSA_NAME_IS_DEFAULT_DEF instead of
'!stmt'.

Thanks.
- Tom

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
  2012-04-27  6:20                 ` Tom de Vries
@ 2012-04-27  9:02                   ` Richard Guenther
  2012-05-02 14:07                     ` Tom de Vries
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Guenther @ 2012-04-27  9:02 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Jakub Jelinek, gcc-patches

On Fri, Apr 27, 2012 at 8:20 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 26/04/12 12:20, Richard Guenther wrote:
>> On Wed, Apr 25, 2012 at 11:56 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> On 25/04/12 11:57, Richard Guenther wrote:
>>> <SNIP>
>>>>>>>> Hmm.  I'm not sure we can conclude that they have the same value!
>>>>>>>>
>>>>>>>> +int bar (int);
>>>>>>>> +void baz (int);
>>>>>>>> +void bla (int);
>>>>>>>> +
>>>>>>>> +void
>>>>>>>> +foo (int y)
>>>>>>>> +{
>>>>>>>> +  int a;
>>>>>>>> +  if (y == 6)
>>>>>>>> +    {
>>>>>>>> +      bla (5);
>>>>>>>> +      a = bar (7);
>>>>>>>> +    }
>>>>>>>> +  else
>>>>>>>> +    {
>>>>>>>> +      bla (5);
>>>>>>>> +      a = bar (7);
>>>>>>>> +    }
>>>>>>>> +  baz (a);
>>>>>>>> +}
>>>>>>>>
>>>>>>>> I can implement bla as
>>>>>>>>
>>>>>>>> void bla(int) { a = random (); }
>>>>>>>>
>>>>>>>> thus a would not have the same value on both paths
>>>>>>
>>>>>> I think it's hard to decide whether they have the same value or not, since we
>>>>>> cannot write a test-case that compares the results of calls from 2 branches of
>>>>>> an if statement.
>>>> void *foo ()
>>>> {
>>>>   return __builtin_return_address (0);
>>>> }
>>>>
>>>> void *bar (_Bool b)
>>>> {
>>>>   if (b)
>>>>     return foo ();
>>>>   else
>>>>     return foo ();
>>>> }
>>>>
>>>> int main()
>>>> {
>>>>   if (bar(true) == bar(false))
>>>>     abort ();
>>>> }
>>>>
>>>> ok ... outside of the scope of standard "C", but we certainly _can_ do this.
>>>> Which would question tail-merging the above at all, of course.
>>>>
>>>
>>> I added noinline to make sure there's no variation from inlining.
>>> ...
>>> extern void abort (void) __attribute__ ((__nothrow__)) __attribute__
>>> ((__noreturn__));
>>>
>>> __attribute__((noinline))
>>> void *foo ()
>>> {
>>>  return __builtin_return_address (0);
>>> }
>>>
>>> __attribute__((noinline))
>>> void *bar (int b)
>>> {
>>>  if (b)
>>>    return foo ();
>>>  else
>>>    return foo ();
>>> }
>>>
>>> int main()
>>> {
>>>  void *a, *b;
>>>  a = bar (1);
>>>  b = bar (0);
>>>  if (a == b)
>>>    abort ();
>>>  return 0;
>>> }
>>> ...
>>>
>>> This test-case passes with:
>>> ...
>>> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fno-crossjumping
>>> ...
>>>
>>> and fails with:
>>> ...
>>> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fcrossjumping
>>> ...
>>>
>>> with the proposed patch, it also fails with:
>>> ...
>>> gcc -O2 calls.c -fno-optimize-sibling-calls -ftree-tail-merge -fno-crossjumping
>>> ...
>>>
>>> Tree-tail-merge performs the same transformation as cross-jumping for this
>>> test-case. I think that the transformation is valid, and that the test-case
>>> behaves as expected. Subsequent calls to foo may or may not return the same
>>> value, depending on the optimizations, we can't assert a != b, that's just the
>>> nature of __builtin_return_address.
>>>
>>> Btw, this example is not what I meant when I said 'a test-case that compares the
>>> results of calls from 2 branches of an if statement'. I tried to say that the
>>> semantics of C is defined in terms of taking either 1 branch or the other,
>>> rather than executing both in parallel, after which both results would be
>>> available. From that point of view, this example compares subsequent calls to
>>> foo, not parallel.
>>
>> Ah, I see.  Btw, I was not suggesting that we should not optimize the above.
>>
>>>>>> I started looking at the problem in terms of subsequent calls. Consider the
>>>>>> imaginary attribute nosideeffect, similar to const and pure.
>>>>>>
>>>>>> const:
>>>>>> - no effects except the return value
>>>>>> - the return value depends only on the parameters
>>>>>>
>>>>>> pure:
>>>>>> - no effects except the return value
>>>>>> - the return value depends only on the parameters and/or global variables
>>>>>>
>>>>>> nosideeffect:
>>>>>> - no effects except the return value
>>>>>>
>>>>>> The code fragment:
>>>>>> ...
>>>>>> extern int f (int) __attribute ((nosideeffect));
>>>>>> int g()
>>>>>> {
>>>>>>  int a;
>>>>>>  int b;
>>>>>>  a = f (5);
>>>>>>  b = f (5);
>>>>>>  return a + b;
>>>>>> }
>>>>>> ...
>>>>>>
>>>>>> would result in a gimple sequence more or less like this:
>>>>>> ...
>>>>>>  # VUSE <.MEMD.1712_4(D)>
>>>>>>  aD.1707_1 = fD.1704 (5);
>>>>>>  # VUSE <.MEMD.1712_4(D)>
>>>>>>  bD.1708_2 = fD.1704 (5);
>>>>>>  D.1710_3 = aD.1707_1 + bD.1708_2;
>>>>>>  # VUSE <.MEMD.1712_6>
>>>>>>  return D.1710_3;
>>>>>> ...
>>>>>>
>>>>>> The value numbering modification I'm proposing would say that a and b have the
>>>>>> same value, which is not true. I updated the patch to ensure in this case that a
>>>>>> and b would be seen as different.
>>>>>> Note that things won't break if the attribute is missing on a function, since
>>>>>> that just means that the function would have a vdef, and there would be no 2
>>>>>> subsequent function calls with the same vuse.
>>>>>>
>>>>>>>> - but that is not what
>>>>>>>> matters - because obviously we can still merge the blocks.
>>>>>>
>>>>>> Right.
>>>>>>
>>>>>>>>  That is,
>>>>>>>> we cannot be sure that the side-effects on memory of bla (5) and bla (5)
>>>>>>>> are the same.  But is that not what your value-numbering changes will assert?
>>>>>>
>>>>>> In the case of calls in different branches of an control flow statement, we can
>>>>>> aggressively assume they're the same, since we cannot write a check that would
>>>>>> start failing.
>>>> Apart from the above ... which shows that even the returned values can matter.
>>>>
>>>>>> In the case of subsequent calls with side effects, the vuses will never be the same.
>>>>>>
>>>>>> In the case of subsequent calls without side effects, but not pure or const, we
>>>>>> can't assume they have the same result. AFAIK, there's currently no way to
>>>>>> specify such a function, but the updated patch should be able handle this.
>>>>>>
>>>>>>>> (not sure how you achieve this, but it looks like you insert/lookup general
>>>>>>>> calls, thus not pure/const ones
>>>>>>
>>>>>> Correct.
>>>>>>
>>>>>>>> - that could easily lead to miscompiles(?)).
>>>>>>>>
>>>> And to increased compile-time (not sure if it matters).
>>>>
>>>>>> I have bootstrapped and reg-tested the updated (and the previous) patch on
>>>>>> x86_64 (ada inclusive), and found no issues.
>>>>>>
>>>>>> Can you think of a test-case that fails with the previous or the updated patch?
>>>> I see you do not handle
>>>>
>>>
>>> I assume you mean 'not handle' in the sense of 'handle conservatively'.
>>
>> Yes.
>>
>>>> struct S { int i; };
>>>> struct S foo (void);
>>>> struct S bar (void)
>>>> {
>>>>   struct S s1, s2;
>>>>   if (...)
>>>>    s = foo ();
>>>>   else
>>>>    s = foo ();
>>>>
>>>> because the calls have a LHS that is not an SSA name.
>>>
>>> Indeed, the gvn patch handles this example conservatively, and tree-tail-merge
>>> fails to optimize this test-case:
>>> ...
>>> struct S { int i; };
>>> extern struct S foo (void);
>>> extern int foo2 (void);
>>> struct S s;
>>> int bar (int c) {
>>>  int r;
>>>  if (c)
>>>    {
>>>      s = foo ();
>>>      r = foo2 ();
>>>    }
>>>  else
>>>    {
>>>      s = foo ();
>>>      r = foo2 ();
>>>    }
>>>  return r;
>>> }
>>> ...
>>>
>>> A todo.
>>>
>>>>  In general
>>>> value-numbering virtual operands will change what further consumers lookup.
>>>> So changing
>>>>
>>>>   # MEM_2 = VUSE <MEM_3>
>>>>   foo ();
>>>>   # VUSE <MEM_2>
>>>>   ... = x;
>>>>
>>>> by value-numbering MEM_2 to MEM_3 will make the lookup of ... = x use
>>>> MEM_3, not considering a clobber by foo.
>>>
>>> AFAIU, the patch doesn't do this type of value numbering.
>>>
>>>> This might not actually be an
>>>> issue (after all we do this for stores, too).
>>>>
>>>> I see you do not handle
>>>>
>>>> int bar (int);
>>>> void baz (int);
>>>> void bla (int);
>>>> int s;
>>>> void
>>>> foo (int y)
>>>> {
>>>>   int a;
>>>>   if (y == 6)
>>>>     {
>>>>       s = 1;
>>>>       bla (5);
>>>>       a = bar (7);
>>>>     }
>>>>   else
>>>>     {
>>>>       s = 1;
>>>>       bla (5);
>>>>       a = bar (7);
>>>>     }
>>>>   baz (a);
>>>> }
>>>>
>>>
>>> I think a more basic example is this one from PR52009, which is basically about
>>> stores:
>>> ...
>>> int z;
>>>
>>> void
>>> foo (int y)
>>> {
>>>  int a;
>>>  if (y == 6)
>>>    z = 5;
>>>  else
>>>    z = 5;
>>> }
>>> ...
>>>
>>> I proposed a patch for that separately:
>>> http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01731.html. Mmmm, looking at the
>>> patch now, I suppose I could use part of that patch as infrastructure for the
>>> todo above.
>>
>> Yes, I think so.
>>
>>>> and I don't see how you need the new ref->vdef member
>>>
>>> I'm regarding the result of the ref as a compound <vdef, result>. Most of the
>>> time I could (as I did in an earlier version of the patch) deduce vdef as
>>> gimple_vdef (SSA_NAME_DEF_STMT (result)), but not always, in particular when
>>> result is NULL_TREE.
>>>
>>>> - that is also
>>>> not handled consistently (not in hashing or comparing, etc.).
>>>
>>> I handle the vdef as a result.
>>
>> Ok.  Can you please rename ->vdef to ->result_vdef then?
>>
>
> Done.
>
>>>> Can't you
>>>> always use SSA_VAL (vuse)?
>>>
>>> I'm interested in storing the vdef, not the vuse.
>>
>> I meant that result_vdef will always be SSA_VAL (vuse) (the vdef of
>> the stmt defining vuse).  No?
>>
>
> Right.
>
>>>>
>>>> Btw, what's
>>>>
>>>> +  if (vdef)
>>>> +    VN_INFO (vdef)->use_processed = true;
>>>>
>>>> for?  We arrive here by visiting the VDEF after all and visit_use does already
>>>> set the use processed.  Is it that we end up visiting the call twice because
>>>> of the lhs SSA name definition and the virtual operand definition?
>>>
>>> Yes.
>>>
>>>> If so then
>>>> visit_use should instead do
>>>>
>>>> FOR_EACH_SSA_DEF_OPERAND (...)
>>>>    VN_INFO (def)->use_processed = true;
>>>>
>>>> and defs_to_varying then no longer needs to do that.
>>>>
>>>
>>> factored mark_use_processed out of defs_to_varying, calling mark_use_processed
>>> at start of visit_use.
>>
>> Thanks.  Which case hits SSA_NAME_DEF_STMT == NULL btw?  The
>> default-def for the virtual operand?
>>
>
> SSA_NAME_DEF_STMT == NULL is tested by !stmt in mark_use_processed.
>
> I suppose I better use SSA_NAME_IS_DEFAULT_DEF, as is done in visit_use. I'm
> still somewhat confused as to what is the difference. I suppose we might have a
> nop statement that is a default def, while it's unequal to NULL.

All default defs have a GIMPLE_NOP as definition, so I was wondering for
which case you see a NULL SSA_NAME_DEF_STMT (and only came up
with a virtual operand maybe?)

> Updated in committed patch.
>
>>>>>> Otherwise, ok for trunk?
>>>> Let's give this one more iteration, but I guess the basic idea should work.
>>>>
>>>
>>> bootstrapped and reg-tested on x86_64 (ada inclusive).
>>>
>>> Is this patch ok, or is the todo required?
>>
>> No, you can followup with that.
>>
>> Ok with s/vdef/result_vdef/
>>
>
> retested and committed with this change and SSA_NAME_IS_DEFAULT_DEF instead of
> '!stmt'.

Thanks,
Richard.

> Thanks.
> - Tom

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
  2012-04-27  9:02                   ` Richard Guenther
@ 2012-05-02 14:07                     ` Tom de Vries
  2012-05-03 10:21                       ` Richard Guenther
  0 siblings, 1 reply; 17+ messages in thread
From: Tom de Vries @ 2012-05-02 14:07 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jakub Jelinek, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1770 bytes --]

On 27/04/12 11:01, Richard Guenther wrote:
<SNIP>
>>>>> I see you do not handle
<SNIP>
>>>>> struct S { int i; };
>>>>> struct S foo (void);
>>>>> struct S bar (void)
>>>>> {
>>>>>   struct S s1, s2;
>>>>>   if (...)
>>>>>    s = foo ();
>>>>>   else
>>>>>    s = foo ();
>>>>>
>>>>> because the calls have a LHS that is not an SSA name.
>>>>
>>>> Indeed, the gvn patch handles this example conservatively, and tree-tail-merge
>>>> fails to optimize this test-case:
>>>> ...
>>>> struct S { int i; };
>>>> extern struct S foo (void);
>>>> extern int foo2 (void);
>>>> struct S s;
>>>> int bar (int c) {
>>>>  int r;
>>>>  if (c)
>>>>    {
>>>>      s = foo ();
>>>>      r = foo2 ();
>>>>    }
>>>>  else
>>>>    {
>>>>      s = foo ();
>>>>      r = foo2 ();
>>>>    }
>>>>  return r;
>>>> }
>>>> ...
>>>>
>>>> A todo.
>>>>
<SNIP>
>>>> bootstrapped and reg-tested on x86_64 (ada inclusive).
>>>>
>>>> Is this patch ok, or is the todo required?
>>>
>>> No, you can followup with that.
>>>

Richard,

here is the follow-up patch, which adds value numbering of a call for which the
lhs is not an SSA_NAME.

The only thing I ended up using from the patch in
http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01731.html was the idea of using
MODIFY_EXPR.

I don't include any handling of MODIFY_EXPR in create_component_ref_by_pieces_1
because I don't think it will trigger with PRE.

bootstrapped and reg-tested on x86_64.

Ok for trunk?

Thanks,
- Tom

2012-05-02  Tom de Vries  <tom@codesourcery.com>

	* tree-ssa-sccvn.c (copy_reference_ops_from_call)
	(visit_reference_op_call): Handle case that lhs is not an SSA_NAME.
	(visit_use): Call visit_reference_op_call for calls with lhs that is not
	an SSA_NAME.
	
	* gcc.dg/pr51879-16.c: New test.
	* gcc.dg/pr51879-17.c: Same.

[-- Attachment #2: pr51879.18.1.patch --]
[-- Type: text/x-patch, Size: 2942 bytes --]

Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c (revision 186907)
+++ gcc/tree-ssa-sccvn.c (working copy)
@@ -942,6 +947,17 @@ copy_reference_ops_from_call (gimple cal
 {
   vn_reference_op_s temp;
   unsigned i;
+  tree lhs = gimple_call_lhs (call);
+
+  if (lhs && TREE_CODE (lhs) != SSA_NAME)
+    {
+      memset (&temp, 0, sizeof (temp));
+      temp.opcode = MODIFY_EXPR;
+      temp.type = TREE_TYPE (lhs);
+      temp.op0 = lhs;
+      temp.off = -1;
+      VEC_safe_push (vn_reference_op_s, heap, *result, &temp);
+    }
 
   /* Copy the type, opcode, function being called and static chain.  */
   memset (&temp, 0, sizeof (temp));
@@ -2624,6 +2641,9 @@ visit_reference_op_call (tree lhs, gimpl
   tree vuse = gimple_vuse (stmt);
   tree vdef = gimple_vdef (stmt);
 
+  if (lhs && TREE_CODE (lhs) != SSA_NAME)
+    lhs = NULL_TREE;
+
   vr1.vuse = vuse ? SSA_VAL (vuse) : NULL_TREE;
   vr1.operands = valueize_shared_reference_ops_from_call (stmt);
   vr1.type = gimple_expr_type (stmt);
@@ -3410,9 +3432,7 @@ visit_use (tree use)
 		  /* If the call has side effects, subsequent calls won't have
 		     the same incoming vuse, so it's save to assume
 		     equality.  */
-		  || gimple_has_side_effects (stmt))
-	      && ((lhs && TREE_CODE (lhs) == SSA_NAME)
-		  || (!lhs && gimple_vdef (stmt))))
+		  || gimple_has_side_effects (stmt)))
 	    {
 	      changed = visit_reference_op_call (lhs, stmt);
 	    }
Index: gcc/testsuite/gcc.dg/pr51879-16.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-16.c (revision 0)
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+struct S {
+  int i;
+};
+
+extern struct S foo (void);
+extern int foo2 (void);
+
+struct S s;
+
+int bar (int c) {
+  int r;
+
+  if (c)
+    {
+      s = foo ();
+      r = foo2 ();
+    }
+  else
+    {
+      s = foo ();
+      r = foo2 ();
+    }
+
+  return r;
+}
+
+/* { dg-final { scan-tree-dump-times "foo \\(" 1 "pre"} } */
+/* { dg-final { scan-tree-dump-times "foo2 \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879-17.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-17.c (revision 0)
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+struct S {
+  int i;
+};
+
+extern struct S foo (void);
+extern int foo2 (void);
+
+struct S s, s2;
+
+int bar (int c) {
+  int r;
+
+  if (c)
+    {
+      s = foo ();
+      r = foo2 ();
+    }
+  else
+    {
+      s2 = foo ();
+      r = foo2 ();
+    }
+
+  return r;
+}
+
+/* { dg-final { scan-tree-dump-times "foo \\(" 2 "pre"} } */
+/* { dg-final { scan-tree-dump-times "foo2 \\(" 2 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
  2012-05-02 14:07                     ` Tom de Vries
@ 2012-05-03 10:21                       ` Richard Guenther
  2012-07-05 16:45                         ` Tom de Vries
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Guenther @ 2012-05-03 10:21 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Jakub Jelinek, gcc-patches

On Wed, May 2, 2012 at 4:06 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 27/04/12 11:01, Richard Guenther wrote:
> <SNIP>
>>>>>> I see you do not handle
> <SNIP>
>>>>>> struct S { int i; };
>>>>>> struct S foo (void);
>>>>>> struct S bar (void)
>>>>>> {
>>>>>>   struct S s1, s2;
>>>>>>   if (...)
>>>>>>    s = foo ();
>>>>>>   else
>>>>>>    s = foo ();
>>>>>>
>>>>>> because the calls have a LHS that is not an SSA name.
>>>>>
>>>>> Indeed, the gvn patch handles this example conservatively, and tree-tail-merge
>>>>> fails to optimize this test-case:
>>>>> ...
>>>>> struct S { int i; };
>>>>> extern struct S foo (void);
>>>>> extern int foo2 (void);
>>>>> struct S s;
>>>>> int bar (int c) {
>>>>>  int r;
>>>>>  if (c)
>>>>>    {
>>>>>      s = foo ();
>>>>>      r = foo2 ();
>>>>>    }
>>>>>  else
>>>>>    {
>>>>>      s = foo ();
>>>>>      r = foo2 ();
>>>>>    }
>>>>>  return r;
>>>>> }
>>>>> ...
>>>>>
>>>>> A todo.
>>>>>
> <SNIP>
>>>>> bootstrapped and reg-tested on x86_64 (ada inclusive).
>>>>>
>>>>> Is this patch ok, or is the todo required?
>>>>
>>>> No, you can followup with that.
>>>>
>
> Richard,
>
> here is the follow-up patch, which adds value numbering of a call for which the
> lhs is not an SSA_NAME.
>
> The only thing I ended up using from the patch in
> http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01731.html was the idea of using
> MODIFY_EXPR.
>
> I don't include any handling of MODIFY_EXPR in create_component_ref_by_pieces_1
> because I don't think it will trigger with PRE.
>
> bootstrapped and reg-tested on x86_64.
>
> Ok for trunk?

Hmm, I wonder why

          if (!gimple_call_internal_p (stmt)
              && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
                  /* If the call has side effects, subsequent calls won't have
                     the same incoming vuse, so it's save to assume
                     equality.  */
                  || gimple_has_side_effects (stmt)))

works - I realize you added the gimple_has_side_effects () call - but
if you consider ECF_LOOPING_CONST_OR_PURE functions, which
have no VDEF, then it's odd how the comment applies.  And together
both tests turn out to let all calls pass.

+  tree lhs = gimple_call_lhs (call);
+
+  if (lhs && TREE_CODE (lhs) != SSA_NAME)
+    {
+      memset (&temp, 0, sizeof (temp));
+      temp.opcode = MODIFY_EXPR;
+      temp.type = TREE_TYPE (lhs);
+      temp.op0 = lhs;
+      temp.off = -1;
+      VEC_safe_push (vn_reference_op_s, heap, *result, &temp);
+    }

this deserves a comment - you are adding the extra operand solely for
the purpose of hashing.  You are also not doing a good job identifying
common calls.  Consider

if ()
 *p = foo ();
else
 *q = foo ();

where p and q are value-numbered the same.  You fail to properly
commonize the blocks.  That is because valueization of the ops
of the call does not work for arbitrarily complex operands - see
how we handle call operands.  Instead you should probably use
copy_reference_ops_from_ref on the lhs, similar to call operands.

Using MODIFY_EXPR as toplevel code for the vn_reference is going to
indeed disable PRE for them, likewise any other call handling in VN.

Otherwise the idea looks ok - can you change the patch like above
and add a testcase with an equal-VNed indirect store?

Thanks,
Richard.

> Thanks,
> - Tom
>
> 2012-05-02  Tom de Vries  <tom@codesourcery.com>
>
>        * tree-ssa-sccvn.c (copy_reference_ops_from_call)
>        (visit_reference_op_call): Handle case that lhs is not an SSA_NAME.
>        (visit_use): Call visit_reference_op_call for calls with lhs that is not
>        an SSA_NAME.
>
>        * gcc.dg/pr51879-16.c: New test.
>        * gcc.dg/pr51879-17.c: Same.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
  2012-05-03 10:21                       ` Richard Guenther
@ 2012-07-05 16:45                         ` Tom de Vries
  2012-07-06  9:17                           ` Richard Guenther
  0 siblings, 1 reply; 17+ messages in thread
From: Tom de Vries @ 2012-07-05 16:45 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jakub Jelinek, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4846 bytes --]

On 03/05/12 12:21, Richard Guenther wrote:
> On Wed, May 2, 2012 at 4:06 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> On 27/04/12 11:01, Richard Guenther wrote:
>> <SNIP>
>>>>>>> I see you do not handle
>> <SNIP>
>>>>>>> struct S { int i; };
>>>>>>> struct S foo (void);
>>>>>>> struct S bar (void)
>>>>>>> {
>>>>>>>   struct S s1, s2;
>>>>>>>   if (...)
>>>>>>>    s = foo ();
>>>>>>>   else
>>>>>>>    s = foo ();
>>>>>>>
>>>>>>> because the calls have a LHS that is not an SSA name.
>>>>>>
>>>>>> Indeed, the gvn patch handles this example conservatively, and tree-tail-merge
>>>>>> fails to optimize this test-case:
>>>>>> ...
>>>>>> struct S { int i; };
>>>>>> extern struct S foo (void);
>>>>>> extern int foo2 (void);
>>>>>> struct S s;
>>>>>> int bar (int c) {
>>>>>>  int r;
>>>>>>  if (c)
>>>>>>    {
>>>>>>      s = foo ();
>>>>>>      r = foo2 ();
>>>>>>    }
>>>>>>  else
>>>>>>    {
>>>>>>      s = foo ();
>>>>>>      r = foo2 ();
>>>>>>    }
>>>>>>  return r;
>>>>>> }
>>>>>> ...
>>>>>>
>>>>>> A todo.
>>>>>>
>> <SNIP>
>>>>>> bootstrapped and reg-tested on x86_64 (ada inclusive).
>>>>>>
>>>>>> Is this patch ok, or is the todo required?
>>>>>
>>>>> No, you can followup with that.
>>>>>
>>
>> Richard,
>>
>> here is the follow-up patch, which adds value numbering of a call for which the
>> lhs is not an SSA_NAME.
>>
>> The only thing I ended up using from the patch in
>> http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01731.html was the idea of using
>> MODIFY_EXPR.
>>
>> I don't include any handling of MODIFY_EXPR in create_component_ref_by_pieces_1
>> because I don't think it will trigger with PRE.
>>
>> bootstrapped and reg-tested on x86_64.
>>
>> Ok for trunk?
> 
> Hmm, I wonder why
> 
>           if (!gimple_call_internal_p (stmt)
>               && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
>                   /* If the call has side effects, subsequent calls won't have
>                      the same incoming vuse, so it's save to assume
>                      equality.  */
>                   || gimple_has_side_effects (stmt)))
> 
> works - I realize you added the gimple_has_side_effects () call - but
> if you consider ECF_LOOPING_CONST_OR_PURE functions, which
> have no VDEF, then it's odd how the comment applies.  And together
> both tests turn out to let all calls pass.
> 

Richard,

You're right, this is not correct. The test for gimple_has_side_effect should be
a test for gimple_vdef.
A ECF_LOOPING_CONST_OR_PURE function will be rejected by the updated condition.

I fixed this in the patch, and added comments describing both the const/pure
clause, and the vdef clause.

I also removed the comment 'We should handle stores from calls' since this patch
implements that.

> +  tree lhs = gimple_call_lhs (call);
> +
> +  if (lhs && TREE_CODE (lhs) != SSA_NAME)
> +    {
> +      memset (&temp, 0, sizeof (temp));
> +      temp.opcode = MODIFY_EXPR;
> +      temp.type = TREE_TYPE (lhs);
> +      temp.op0 = lhs;
> +      temp.off = -1;
> +      VEC_safe_push (vn_reference_op_s, heap, *result, &temp);
> +    }
> 
> this deserves a comment

Done.

> - you are adding the extra operand solely for
> the purpose of hashing.  You are also not doing a good job identifying
> common calls.  Consider
> 
> if ()
>  *p = foo ();
> else
>  *q = foo ();
> 
> where p and q are value-numbered the same.  You fail to properly
> commonize the blocks.  That is because valueization of the ops
> of the call does not work for arbitrarily complex operands - see
> how we handle call operands.  Instead you should probably use
> copy_reference_ops_from_ref on the lhs, similar to call operands.
> 

If p and q are value numbered, it means they're SSA_NAMEs, and that means
they're not handled by this patch which is only about handling calls for which
the lhs is not an SSA_NAME.

This example is handled by the patch I posted for pr52009. I reposted the patch
and added this test-case (http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00155.html).

So I'm not using copy_reference_ops_from_ref on the lhs, since it's not an SSA_NAME.

> Using MODIFY_EXPR as toplevel code for the vn_reference is going to
> indeed disable PRE for them, likewise any other call handling in VN.
> 
> Otherwise the idea looks ok - can you change the patch like above
> and add a testcase with an equal-VNed indirect store?
> 

I updated the patch as indicated in my comments, and added the test-case to the
patch for pr52009.

Bootstrapped and reg-tested on x86_64 (ada inclusive).

OK for trunk?

Thanks,
- Tom

2012-07-05  Tom de Vries  <tom@codesourcery.com>

	* tree-ssa-sccvn.c (copy_reference_ops_from_call)
	(visit_reference_op_call): Handle case that lhs is not an SSA_NAME.
	(visit_use): Also call visit_reference_op_call for calls with a vdef.
	
	* gcc.dg/pr51879-16.c: New test.
	* gcc.dg/pr51879-17.c: Same.

[-- Attachment #2: pr51879.19.patch --]
[-- Type: text/x-patch, Size: 4134 bytes --]

Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c (revision 189007)
+++ gcc/tree-ssa-sccvn.c (working copy)
@@ -942,6 +942,20 @@ copy_reference_ops_from_call (gimple cal
 {
   vn_reference_op_s temp;
   unsigned i;
+  tree lhs = gimple_call_lhs (call);
+
+  /* If 2 calls have a different non-ssa lhs, vdef value numbers should be
+     different.  By adding the lhs here in the vector, we ensure that the
+     hashcode is different, guaranteeing a different value number.  */
+  if (lhs && TREE_CODE (lhs) != SSA_NAME)
+    {
+      memset (&temp, 0, sizeof (temp));
+      temp.opcode = MODIFY_EXPR;
+      temp.type = TREE_TYPE (lhs);
+      temp.op0 = lhs;
+      temp.off = -1;
+      VEC_safe_push (vn_reference_op_s, heap, *result, &temp);
+    }
 
   /* Copy the type, opcode, function being called and static chain.  */
   memset (&temp, 0, sizeof (temp));
@@ -2628,6 +2642,10 @@ visit_reference_op_call (tree lhs, gimpl
   tree vuse = gimple_vuse (stmt);
   tree vdef = gimple_vdef (stmt);
 
+  /* Non-ssa lhs is handled in copy_reference_ops_from_call.  */
+  if (lhs && TREE_CODE (lhs) != SSA_NAME)
+    lhs = NULL_TREE;
+
   vr1.vuse = vuse ? SSA_VAL (vuse) : NULL_TREE;
   vr1.operands = valueize_shared_reference_ops_from_call (stmt);
   vr1.type = gimple_expr_type (stmt);
@@ -3408,18 +3426,20 @@ visit_use (tree use)
 		}
 	    }
 
-	  /* ???  We should handle stores from calls.  */
 	  if (!gimple_call_internal_p (stmt)
-	      && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
-		  /* If the call has side effects, subsequent calls won't have
-		     the same incoming vuse, so it's save to assume
-		     equality.  */
-		  || gimple_has_side_effects (stmt))
-	      && ((lhs && TREE_CODE (lhs) == SSA_NAME)
-		  || (!lhs && gimple_vdef (stmt))))
-	    {
-	      changed = visit_reference_op_call (lhs, stmt);
-	    }
+	      && (/* Calls to the same function with the same vuse
+		     and the same operands do not necessarily return the same
+		     value, unless they're pure or const.  */
+		  gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
+		  /* If calls have a vdef, subsequent calls won't have
+		     the same incoming vuse.  So, if 2 calls with vdef have the
+		     same vuse, we know they're not subsequent.
+		     We can value number 2 calls to the same function with the
+		     same vuse and the same operands which are not subsequent
+		     the same, because there is no code in the program that can
+		     compare the 2 values.  */
+		  || gimple_vdef (stmt)))
+	    changed = visit_reference_op_call (lhs, stmt);
 	  else
 	    changed = defs_to_varying (stmt);
 	}
Index: gcc/testsuite/gcc.dg/pr51879-16.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-16.c (revision 0)
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+struct S {
+  int i;
+};
+
+extern struct S foo (void);
+extern int foo2 (void);
+
+struct S s;
+
+int bar (int c) {
+  int r;
+
+  if (c)
+    {
+      s = foo ();
+      r = foo2 ();
+    }
+  else
+    {
+      s = foo ();
+      r = foo2 ();
+    }
+
+  return r;
+}
+
+/* { dg-final { scan-tree-dump-times "foo \\(" 1 "pre"} } */
+/* { dg-final { scan-tree-dump-times "foo2 \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879-17.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-17.c (revision 0)
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+struct S {
+  int i;
+};
+
+extern struct S foo (void);
+extern int foo2 (void);
+
+struct S s, s2;
+
+int bar (int c) {
+  int r;
+
+  if (c)
+    {
+      s = foo ();
+      r = foo2 ();
+    }
+  else
+    {
+      s2 = foo ();
+      r = foo2 ();
+    }
+
+  return r;
+}
+
+/* { dg-final { scan-tree-dump-times "foo \\(" 2 "pre"} } */
+/* { dg-final { scan-tree-dump-times "foo2 \\(" 2 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
  2012-07-05 16:45                         ` Tom de Vries
@ 2012-07-06  9:17                           ` Richard Guenther
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Guenther @ 2012-07-06  9:17 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Jakub Jelinek, gcc-patches

On Thu, Jul 5, 2012 at 6:44 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 03/05/12 12:21, Richard Guenther wrote:
>> On Wed, May 2, 2012 at 4:06 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> On 27/04/12 11:01, Richard Guenther wrote:
>>> <SNIP>
>>>>>>>> I see you do not handle
>>> <SNIP>
>>>>>>>> struct S { int i; };
>>>>>>>> struct S foo (void);
>>>>>>>> struct S bar (void)
>>>>>>>> {
>>>>>>>>   struct S s1, s2;
>>>>>>>>   if (...)
>>>>>>>>    s = foo ();
>>>>>>>>   else
>>>>>>>>    s = foo ();
>>>>>>>>
>>>>>>>> because the calls have a LHS that is not an SSA name.
>>>>>>>
>>>>>>> Indeed, the gvn patch handles this example conservatively, and tree-tail-merge
>>>>>>> fails to optimize this test-case:
>>>>>>> ...
>>>>>>> struct S { int i; };
>>>>>>> extern struct S foo (void);
>>>>>>> extern int foo2 (void);
>>>>>>> struct S s;
>>>>>>> int bar (int c) {
>>>>>>>  int r;
>>>>>>>  if (c)
>>>>>>>    {
>>>>>>>      s = foo ();
>>>>>>>      r = foo2 ();
>>>>>>>    }
>>>>>>>  else
>>>>>>>    {
>>>>>>>      s = foo ();
>>>>>>>      r = foo2 ();
>>>>>>>    }
>>>>>>>  return r;
>>>>>>> }
>>>>>>> ...
>>>>>>>
>>>>>>> A todo.
>>>>>>>
>>> <SNIP>
>>>>>>> bootstrapped and reg-tested on x86_64 (ada inclusive).
>>>>>>>
>>>>>>> Is this patch ok, or is the todo required?
>>>>>>
>>>>>> No, you can followup with that.
>>>>>>
>>>
>>> Richard,
>>>
>>> here is the follow-up patch, which adds value numbering of a call for which the
>>> lhs is not an SSA_NAME.
>>>
>>> The only thing I ended up using from the patch in
>>> http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01731.html was the idea of using
>>> MODIFY_EXPR.
>>>
>>> I don't include any handling of MODIFY_EXPR in create_component_ref_by_pieces_1
>>> because I don't think it will trigger with PRE.
>>>
>>> bootstrapped and reg-tested on x86_64.
>>>
>>> Ok for trunk?
>>
>> Hmm, I wonder why
>>
>>           if (!gimple_call_internal_p (stmt)
>>               && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
>>                   /* If the call has side effects, subsequent calls won't have
>>                      the same incoming vuse, so it's save to assume
>>                      equality.  */
>>                   || gimple_has_side_effects (stmt)))
>>
>> works - I realize you added the gimple_has_side_effects () call - but
>> if you consider ECF_LOOPING_CONST_OR_PURE functions, which
>> have no VDEF, then it's odd how the comment applies.  And together
>> both tests turn out to let all calls pass.
>>
>
> Richard,
>
> You're right, this is not correct. The test for gimple_has_side_effect should be
> a test for gimple_vdef.
> A ECF_LOOPING_CONST_OR_PURE function will be rejected by the updated condition.
>
> I fixed this in the patch, and added comments describing both the const/pure
> clause, and the vdef clause.
>
> I also removed the comment 'We should handle stores from calls' since this patch
> implements that.
>
>> +  tree lhs = gimple_call_lhs (call);
>> +
>> +  if (lhs && TREE_CODE (lhs) != SSA_NAME)
>> +    {
>> +      memset (&temp, 0, sizeof (temp));
>> +      temp.opcode = MODIFY_EXPR;
>> +      temp.type = TREE_TYPE (lhs);
>> +      temp.op0 = lhs;
>> +      temp.off = -1;
>> +      VEC_safe_push (vn_reference_op_s, heap, *result, &temp);
>> +    }
>>
>> this deserves a comment
>
> Done.
>
>> - you are adding the extra operand solely for
>> the purpose of hashing.  You are also not doing a good job identifying
>> common calls.  Consider
>>
>> if ()
>>  *p = foo ();
>> else
>>  *q = foo ();
>>
>> where p and q are value-numbered the same.  You fail to properly
>> commonize the blocks.  That is because valueization of the ops
>> of the call does not work for arbitrarily complex operands - see
>> how we handle call operands.  Instead you should probably use
>> copy_reference_ops_from_ref on the lhs, similar to call operands.
>>
>
> If p and q are value numbered, it means they're SSA_NAMEs, and that means
> they're not handled by this patch which is only about handling calls for which
> the lhs is not an SSA_NAME.
>
> This example is handled by the patch I posted for pr52009. I reposted the patch
> and added this test-case (http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00155.html).
>
> So I'm not using copy_reference_ops_from_ref on the lhs, since it's not an SSA_NAME.
>
>> Using MODIFY_EXPR as toplevel code for the vn_reference is going to
>> indeed disable PRE for them, likewise any other call handling in VN.
>>
>> Otherwise the idea looks ok - can you change the patch like above
>> and add a testcase with an equal-VNed indirect store?
>>
>
> I updated the patch as indicated in my comments, and added the test-case to the
> patch for pr52009.
>
> Bootstrapped and reg-tested on x86_64 (ada inclusive).
>
> OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> - Tom
>
> 2012-07-05  Tom de Vries  <tom@codesourcery.com>
>
>         * tree-ssa-sccvn.c (copy_reference_ops_from_call)
>         (visit_reference_op_call): Handle case that lhs is not an SSA_NAME.
>         (visit_use): Also call visit_reference_op_call for calls with a vdef.
>
>         * gcc.dg/pr51879-16.c: New test.
>         * gcc.dg/pr51879-17.c: Same.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2012-07-06  9:17 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-23 21:27 [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls Tom de Vries
2012-01-24 10:40 ` Richard Guenther
2012-01-27 20:37   ` Tom de Vries
2012-04-14  7:26     ` Tom de Vries
2012-04-17 12:25       ` Richard Guenther
2012-04-24 21:20         ` Tom de Vries
2012-04-25  9:57           ` Richard Guenther
2012-04-25 10:10             ` Jakub Jelinek
2012-04-25 10:18               ` Tom de Vries
2012-04-25 21:57             ` Tom de Vries
2012-04-26 10:21               ` Richard Guenther
2012-04-27  6:20                 ` Tom de Vries
2012-04-27  9:02                   ` Richard Guenther
2012-05-02 14:07                     ` Tom de Vries
2012-05-03 10:21                       ` Richard Guenther
2012-07-05 16:45                         ` Tom de Vries
2012-07-06  9:17                           ` Richard Guenther

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).