public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Fix parts of PR61607
@ 2014-06-25 14:07 Richard Biener
  2014-06-25 18:28 ` Jeff Law
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2014-06-25 14:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: law


This removes restrictions in DOM cprop_operand that inhibit
some optimizations.  The volatile pointer thing is really realy
old and no longer necessary while the loop-depth consideration
is only valid for loop-closed PHI nodes (but we're not in
loop-closed SSA in DOM) - the coalescing is handled in out-of-SSA
phase by inserting copies appropriately.

Bootstrapped on x86_64-unknown-linux-gnu, ok?

Thanks,
Richard.

2014-06-25  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/61607
	* tree-ssa-dom.c (cprop_operand): Remove restriction on
	propagating volatile pointers and on loop depth.

Index: gcc/tree-ssa-dom.c
===================================================================
--- gcc/tree-ssa-dom.c	(revision 211969)
+++ gcc/tree-ssa-dom.c	(working copy)
@@ -2247,22 +2247,6 @@ cprop_operand (gimple stmt, use_operand_
       if (!may_propagate_copy (op, val))
 	return;
 
-      /* Do not propagate addresses that point to volatiles into memory
-	 stmts without volatile operands.  */
-      if (POINTER_TYPE_P (TREE_TYPE (val))
-	  && TYPE_VOLATILE (TREE_TYPE (TREE_TYPE (val)))
-	  && gimple_has_mem_ops (stmt)
-	  && !gimple_has_volatile_ops (stmt))
-	return;
-
-      /* Do not propagate copies if the propagated value is at a deeper loop
-	 depth than the propagatee.  Otherwise, this may move loop variant
-	 variables outside of their loops and prevent coalescing
-	 opportunities.  If the value was loop invariant, it will be hoisted
-	 by LICM and exposed for copy propagation.  */
-      if (loop_depth_of_name (val) > loop_depth_of_name (op))
-	return;
-
       /* Do not propagate copies into simple IV increment statements.
          See PR23821 for how this can disturb IV analysis.  */
       if (TREE_CODE (val) != INTEGER_CST

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix parts of PR61607
  2014-06-25 14:07 [PATCH] Fix parts of PR61607 Richard Biener
@ 2014-06-25 18:28 ` Jeff Law
  2014-06-26  7:57   ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Law @ 2014-06-25 18:28 UTC (permalink / raw)
  To: Richard Biener, gcc-patches

On 06/25/14 08:05, Richard Biener wrote:
>
> This removes restrictions in DOM cprop_operand that inhibit
> some optimizations.  The volatile pointer thing is really realy
> old and no longer necessary while the loop-depth consideration
> is only valid for loop-closed PHI nodes (but we're not in
> loop-closed SSA in DOM) - the coalescing is handled in out-of-SSA
> phase by inserting copies appropriately.
>
> Bootstrapped on x86_64-unknown-linux-gnu, ok?
>
> Thanks,
> Richard.
>
> 2014-06-25  Richard Biener  <rguenther@suse.de>
>
> 	PR tree-optimization/61607
> 	* tree-ssa-dom.c (cprop_operand): Remove restriction on
> 	propagating volatile pointers and on loop depth.
The first hunk is OK.

I thought we had tests for the do not copy propagate out of a loop nest 
in the suite.  Did you check that tests in BZ 19038 still generate good 
code after this change?  If we still generate good code for those tests, 
then this hunk is fine too.

Jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix parts of PR61607
  2014-06-25 18:28 ` Jeff Law
@ 2014-06-26  7:57   ` Richard Biener
  2014-06-26  8:33     ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2014-06-26  7:57 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Wed, 25 Jun 2014, Jeff Law wrote:

> On 06/25/14 08:05, Richard Biener wrote:
> > 
> > This removes restrictions in DOM cprop_operand that inhibit
> > some optimizations.  The volatile pointer thing is really realy
> > old and no longer necessary while the loop-depth consideration
> > is only valid for loop-closed PHI nodes (but we're not in
> > loop-closed SSA in DOM) - the coalescing is handled in out-of-SSA
> > phase by inserting copies appropriately.
> > 
> > Bootstrapped on x86_64-unknown-linux-gnu, ok?
> > 
> > Thanks,
> > Richard.
> > 
> > 2014-06-25  Richard Biener  <rguenther@suse.de>
> > 
> > 	PR tree-optimization/61607
> > 	* tree-ssa-dom.c (cprop_operand): Remove restriction on
> > 	propagating volatile pointers and on loop depth.
> The first hunk is OK.
> 
> I thought we had tests for the do not copy propagate out of a loop nest in the
> suite.  Did you check that tests in BZ 19038 still generate good code after
> this change?  If we still generate good code for those tests, then this hunk
> is fine too.

I have applied the first hunk and will investigate further.  Testing 
didn't show any issue and I know how to retain the check but not
cause the missed optimization shown in PR61607.

Richard.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix parts of PR61607
  2014-06-26  7:57   ` Richard Biener
@ 2014-06-26  8:33     ` Richard Biener
  2014-06-26  9:01       ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2014-06-26  8:33 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Thu, 26 Jun 2014, Richard Biener wrote:

> On Wed, 25 Jun 2014, Jeff Law wrote:
> 
> > On 06/25/14 08:05, Richard Biener wrote:
> > > 
> > > This removes restrictions in DOM cprop_operand that inhibit
> > > some optimizations.  The volatile pointer thing is really realy
> > > old and no longer necessary while the loop-depth consideration
> > > is only valid for loop-closed PHI nodes (but we're not in
> > > loop-closed SSA in DOM) - the coalescing is handled in out-of-SSA
> > > phase by inserting copies appropriately.
> > > 
> > > Bootstrapped on x86_64-unknown-linux-gnu, ok?
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > > 2014-06-25  Richard Biener  <rguenther@suse.de>
> > > 
> > > 	PR tree-optimization/61607
> > > 	* tree-ssa-dom.c (cprop_operand): Remove restriction on
> > > 	propagating volatile pointers and on loop depth.
> > The first hunk is OK.
> > 
> > I thought we had tests for the do not copy propagate out of a loop nest in the
> > suite.  Did you check that tests in BZ 19038 still generate good code after
> > this change?  If we still generate good code for those tests, then this hunk
> > is fine too.
> 
> I have applied the first hunk and will investigate further.  Testing 
> didn't show any issue and I know how to retain the check but not
> cause the missed optimization shown in PR61607.

Let's try to summarize what the restriction is supposed to avoid.
It tries to avoid introducing uses of SSA names defined inside a
loop outside of it because if the SSA name is live over the backedge
we will then have an overlapping life-range which prevents out-of-SSA
from coalescing it to a single register.

Now, the existing test is not working in that way.

Rather the best way we have to ensure this property (all outside
uses go through a copy that is placed on exit edges rather than
possibly on the backedge) is to go into loop-closed SSA form.
This is also where the PHI nodes that confuse DOM in PR61607
come from in the first place.

Now as the existing measure is ineffective in some cases out-of-SSA
has gotten the ability to deal with this (or a subset):

  /* If elimination of a PHI requires inserting a copy on a backedge,
     then we will have to split the backedge which has numerous
     undesirable performance effects.

     A significant number of such cases can be handled here by inserting
     copies into the loop itself.  */
  insert_backedge_copies ();

now, this doesn't seem to deal with outside uses.  But eventually
the coalescing code already assigns proper cost to backedge copies
so that we choose to place copies on the exit edges rather than
the backedge ones - seems not so from looking at coalesce_cost_edge.

So I think that we should remove the copy-propagation restrictions
and instead address this in out-of-SSA.

For now the following patch retains the exact same restriction in
DOM as it is present in copyprop (but not in FRE - ok my recent fault,
or in VRP).  By avoiding to record the equivalency for PHIs
(where we know that either all or no uses should be covered by
the loop depth check) we retain the ability to record the equivalency
for the two loop exit PHI nodes and thus the threading (if only
on the false path).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

I'll try to see what happens to the PR19038 testcases (though
that PR is a mess ...)

Richard.

2014-06-26  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/61607
	* tree-ssa-copy.c (copy_prop_visit_phi_node): Adjust comment
	explaining why we restrict copies on loop depth.
	* tree-ssa-dom.c (cprop_operand): Remove restriction on
	on loop depth.
	(record_equivalences_from_phis): Instead add it here.

	* gcc.dg/tree-ssa/ssa-dom-thread-5.c: New testcase.

Index: gcc/tree-ssa-copy.c
===================================================================
--- gcc/tree-ssa-copy.c	(revision 212012)
+++ gcc/tree-ssa-copy.c	(working copy)
@@ -401,11 +401,8 @@ copy_prop_visit_phi_node (gimple phi)
 	arg_value = valueize_val (arg);
 
       /* Avoid copy propagation from an inner into an outer loop.
-	 Otherwise, this may move loop variant variables outside of
-	 their loops and prevent coalescing opportunities.  If the
-	 value was loop invariant, it will be hoisted by LICM and
-	 exposed for copy propagation.
-	 ???  The value will be always loop invariant.
+	 Otherwise, this may introduce uses of loop variant variables
+	 outside of their loops and prevent coalescing opportunities.
 	 In loop-closed SSA form do not copy-propagate through
 	 PHI nodes in blocks with a loop exit edge predecessor.  */
       if (TREE_CODE (arg_value) == SSA_NAME
Index: gcc/tree-ssa-dom.c
===================================================================
--- gcc/tree-ssa-dom.c	(revision 212013)
+++ gcc/tree-ssa-dom.c	(working copy)
@@ -1234,7 +1234,13 @@ record_equivalences_from_phis (basic_blo
 	 this, since this is a true assignment and not an equivalence
 	 inferred from a comparison.  All uses of this ssa name are dominated
 	 by this assignment, so unwinding just costs time and space.  */
-      if (i == gimple_phi_num_args (phi) && may_propagate_copy (lhs, rhs))
+      if (i == gimple_phi_num_args (phi)
+	  && may_propagate_copy (lhs, rhs)
+	  /* Do not propagate copies if the propagated value is at a deeper loop
+	     depth than the propagatee.  Otherwise, this may introduce uses
+	     of loop variant variables outside of their loops and prevent
+	     coalescing opportunities.  */
+	  && !(loop_depth_of_name (rhs) > loop_depth_of_name (lhs)))
 	set_ssa_name_value (lhs, rhs);
     }
 }
@@ -2247,14 +2253,6 @@ cprop_operand (gimple stmt, use_operand_
       if (!may_propagate_copy (op, val))
 	return;
 
-      /* Do not propagate copies if the propagated value is at a deeper loop
-	 depth than the propagatee.  Otherwise, this may move loop variant
-	 variables outside of their loops and prevent coalescing
-	 opportunities.  If the value was loop invariant, it will be hoisted
-	 by LICM and exposed for copy propagation.  */
-      if (loop_depth_of_name (val) > loop_depth_of_name (op))
-	return;
-
       /* Do not propagate copies into simple IV increment statements.
          See PR23821 for how this can disturb IV analysis.  */
       if (TREE_CODE (val) != INTEGER_CST
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-5.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-5.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-5.c	(working copy)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -fno-tree-fre -fdump-tree-dom1-details" } */
+
+void foo(int *);
+void f2(int dst[3], int R)
+{
+  int i, inter[2];
+  _Bool inter0p = 0;
+  _Bool inter1p = 0;
+  for (i = 1; i < R; i++)
+    {
+      inter0p = 1;
+      inter1p = 1;
+    }
+  if (inter0p)
+    inter[0] = 1;
+  if (inter1p)
+    inter[1] = 1;
+  foo(inter);
+}
+
+/* { dg-final { scan-tree-dump "Threaded jump" "dom1" } } */
+/* { dg-final { cleanup-tree-dump "dom1" } } */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix parts of PR61607
  2014-06-26  8:33     ` Richard Biener
@ 2014-06-26  9:01       ` Richard Biener
  2014-06-26 15:33         ` Jeff Law
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2014-06-26  9:01 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Thu, 26 Jun 2014, Richard Biener wrote:

> On Thu, 26 Jun 2014, Richard Biener wrote:
> 
> > On Wed, 25 Jun 2014, Jeff Law wrote:
> > 
> > > On 06/25/14 08:05, Richard Biener wrote:
> > > > 
> > > > This removes restrictions in DOM cprop_operand that inhibit
> > > > some optimizations.  The volatile pointer thing is really realy
> > > > old and no longer necessary while the loop-depth consideration
> > > > is only valid for loop-closed PHI nodes (but we're not in
> > > > loop-closed SSA in DOM) - the coalescing is handled in out-of-SSA
> > > > phase by inserting copies appropriately.
> > > > 
> > > > Bootstrapped on x86_64-unknown-linux-gnu, ok?
> > > > 
> > > > Thanks,
> > > > Richard.
> > > > 
> > > > 2014-06-25  Richard Biener  <rguenther@suse.de>
> > > > 
> > > > 	PR tree-optimization/61607
> > > > 	* tree-ssa-dom.c (cprop_operand): Remove restriction on
> > > > 	propagating volatile pointers and on loop depth.
> > > The first hunk is OK.
> > > 
> > > I thought we had tests for the do not copy propagate out of a loop nest in the
> > > suite.  Did you check that tests in BZ 19038 still generate good code after
> > > this change?  If we still generate good code for those tests, then this hunk
> > > is fine too.
> > 
> > I have applied the first hunk and will investigate further.  Testing 
> > didn't show any issue and I know how to retain the check but not
> > cause the missed optimization shown in PR61607.
> 
> Let's try to summarize what the restriction is supposed to avoid.
> It tries to avoid introducing uses of SSA names defined inside a
> loop outside of it because if the SSA name is live over the backedge
> we will then have an overlapping life-range which prevents out-of-SSA
> from coalescing it to a single register.
> 
> Now, the existing test is not working in that way.
> 
> Rather the best way we have to ensure this property (all outside
> uses go through a copy that is placed on exit edges rather than
> possibly on the backedge) is to go into loop-closed SSA form.
> This is also where the PHI nodes that confuse DOM in PR61607
> come from in the first place.
> 
> Now as the existing measure is ineffective in some cases out-of-SSA
> has gotten the ability to deal with this (or a subset):
> 
>   /* If elimination of a PHI requires inserting a copy on a backedge,
>      then we will have to split the backedge which has numerous
>      undesirable performance effects.
> 
>      A significant number of such cases can be handled here by inserting
>      copies into the loop itself.  */
>   insert_backedge_copies ();
> 
> now, this doesn't seem to deal with outside uses.  But eventually
> the coalescing code already assigns proper cost to backedge copies
> so that we choose to place copies on the exit edges rather than
> the backedge ones - seems not so from looking at coalesce_cost_edge.
> 
> So I think that we should remove the copy-propagation restrictions
> and instead address this in out-of-SSA.
> 
> For now the following patch retains the exact same restriction in
> DOM as it is present in copyprop (but not in FRE - ok my recent fault,
> or in VRP).  By avoiding to record the equivalency for PHIs
> (where we know that either all or no uses should be covered by
> the loop depth check) we retain the ability to record the equivalency
> for the two loop exit PHI nodes and thus the threading (if only
> on the false path).
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> I'll try to see what happens to the PR19038 testcases (though
> that PR is a mess ...)

I checked the very original one (thin6d.f from sixtrack) and the
generated assembly for -Ofast is the same without any patch
and with _all_ loop_depth_of_name restrictions removed from
both DOM and copyprop (thus making loop_depth_of_name dead).

The cost of out-of-SSA copies for backedges (or in the case
of the PR, loop latch edges causing an edge split) is dealt
with by

  /* Inserting copy on critical edge costs more than inserting it 
elsewhere.  */
  if (EDGE_CRITICAL_P (e))
    mult = 2;

in coalesce_cost_edge.

So in the end, without a testcase to investigate, I'd propose
to get rid of those restrictions.  I'm still going forward
with the patch below for now.

Richard.

> Richard.
> 
> 2014-06-26  Richard Biener  <rguenther@suse.de>
> 
> 	PR tree-optimization/61607
> 	* tree-ssa-copy.c (copy_prop_visit_phi_node): Adjust comment
> 	explaining why we restrict copies on loop depth.
> 	* tree-ssa-dom.c (cprop_operand): Remove restriction on
> 	on loop depth.
> 	(record_equivalences_from_phis): Instead add it here.
> 
> 	* gcc.dg/tree-ssa/ssa-dom-thread-5.c: New testcase.
> 
> Index: gcc/tree-ssa-copy.c
> ===================================================================
> --- gcc/tree-ssa-copy.c	(revision 212012)
> +++ gcc/tree-ssa-copy.c	(working copy)
> @@ -401,11 +401,8 @@ copy_prop_visit_phi_node (gimple phi)
>  	arg_value = valueize_val (arg);
>  
>        /* Avoid copy propagation from an inner into an outer loop.
> -	 Otherwise, this may move loop variant variables outside of
> -	 their loops and prevent coalescing opportunities.  If the
> -	 value was loop invariant, it will be hoisted by LICM and
> -	 exposed for copy propagation.
> -	 ???  The value will be always loop invariant.
> +	 Otherwise, this may introduce uses of loop variant variables
> +	 outside of their loops and prevent coalescing opportunities.
>  	 In loop-closed SSA form do not copy-propagate through
>  	 PHI nodes in blocks with a loop exit edge predecessor.  */
>        if (TREE_CODE (arg_value) == SSA_NAME
> Index: gcc/tree-ssa-dom.c
> ===================================================================
> --- gcc/tree-ssa-dom.c	(revision 212013)
> +++ gcc/tree-ssa-dom.c	(working copy)
> @@ -1234,7 +1234,13 @@ record_equivalences_from_phis (basic_blo
>  	 this, since this is a true assignment and not an equivalence
>  	 inferred from a comparison.  All uses of this ssa name are dominated
>  	 by this assignment, so unwinding just costs time and space.  */
> -      if (i == gimple_phi_num_args (phi) && may_propagate_copy (lhs, rhs))
> +      if (i == gimple_phi_num_args (phi)
> +	  && may_propagate_copy (lhs, rhs)
> +	  /* Do not propagate copies if the propagated value is at a deeper loop
> +	     depth than the propagatee.  Otherwise, this may introduce uses
> +	     of loop variant variables outside of their loops and prevent
> +	     coalescing opportunities.  */
> +	  && !(loop_depth_of_name (rhs) > loop_depth_of_name (lhs)))
>  	set_ssa_name_value (lhs, rhs);
>      }
>  }
> @@ -2247,14 +2253,6 @@ cprop_operand (gimple stmt, use_operand_
>        if (!may_propagate_copy (op, val))
>  	return;
>  
> -      /* Do not propagate copies if the propagated value is at a deeper loop
> -	 depth than the propagatee.  Otherwise, this may move loop variant
> -	 variables outside of their loops and prevent coalescing
> -	 opportunities.  If the value was loop invariant, it will be hoisted
> -	 by LICM and exposed for copy propagation.  */
> -      if (loop_depth_of_name (val) > loop_depth_of_name (op))
> -	return;
> -
>        /* Do not propagate copies into simple IV increment statements.
>           See PR23821 for how this can disturb IV analysis.  */
>        if (TREE_CODE (val) != INTEGER_CST
> Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-5.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-5.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-5.c	(working copy)
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os -fno-tree-fre -fdump-tree-dom1-details" } */
> +
> +void foo(int *);
> +void f2(int dst[3], int R)
> +{
> +  int i, inter[2];
> +  _Bool inter0p = 0;
> +  _Bool inter1p = 0;
> +  for (i = 1; i < R; i++)
> +    {
> +      inter0p = 1;
> +      inter1p = 1;
> +    }
> +  if (inter0p)
> +    inter[0] = 1;
> +  if (inter1p)
> +    inter[1] = 1;
> +  foo(inter);
> +}
> +
> +/* { dg-final { scan-tree-dump "Threaded jump" "dom1" } } */
> +/* { dg-final { cleanup-tree-dump "dom1" } } */
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix parts of PR61607
  2014-06-26  9:01       ` Richard Biener
@ 2014-06-26 15:33         ` Jeff Law
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff Law @ 2014-06-26 15:33 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On 06/26/14 02:58, Richard Biener wrote:
> On Thu, 26 Jun 2014, Richard Biener wrote:
>
>> On Thu, 26 Jun 2014, Richard Biener wrote:
>>
>>> On Wed, 25 Jun 2014, Jeff Law wrote:
>>>
>>>> On 06/25/14 08:05, Richard Biener wrote:
>>>>>
>>>>> This removes restrictions in DOM cprop_operand that inhibit
>>>>> some optimizations.  The volatile pointer thing is really realy
>>>>> old and no longer necessary while the loop-depth consideration
>>>>> is only valid for loop-closed PHI nodes (but we're not in
>>>>> loop-closed SSA in DOM) - the coalescing is handled in out-of-SSA
>>>>> phase by inserting copies appropriately.
>>>>>
>>>>> Bootstrapped on x86_64-unknown-linux-gnu, ok?
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>> 2014-06-25  Richard Biener  <rguenther@suse.de>
>>>>>
>>>>> 	PR tree-optimization/61607
>>>>> 	* tree-ssa-dom.c (cprop_operand): Remove restriction on
>>>>> 	propagating volatile pointers and on loop depth.
>>>> The first hunk is OK.
>>>>
>>>> I thought we had tests for the do not copy propagate out of a loop nest in the
>>>> suite.  Did you check that tests in BZ 19038 still generate good code after
>>>> this change?  If we still generate good code for those tests, then this hunk
>>>> is fine too.
>>>
>>> I have applied the first hunk and will investigate further.  Testing
>>> didn't show any issue and I know how to retain the check but not
>>> cause the missed optimization shown in PR61607.
>>
>> Let's try to summarize what the restriction is supposed to avoid.
>> It tries to avoid introducing uses of SSA names defined inside a
>> loop outside of it because if the SSA name is live over the backedge
>> we will then have an overlapping life-range which prevents out-of-SSA
>> from coalescing it to a single register.
>>
>> Now, the existing test is not working in that way.
>>
>> Rather the best way we have to ensure this property (all outside
>> uses go through a copy that is placed on exit edges rather than
>> possibly on the backedge) is to go into loop-closed SSA form.
>> This is also where the PHI nodes that confuse DOM in PR61607
>> come from in the first place.
>>
>> Now as the existing measure is ineffective in some cases out-of-SSA
>> has gotten the ability to deal with this (or a subset):
>>
>>    /* If elimination of a PHI requires inserting a copy on a backedge,
>>       then we will have to split the backedge which has numerous
>>       undesirable performance effects.
>>
>>       A significant number of such cases can be handled here by inserting
>>       copies into the loop itself.  */
>>    insert_backedge_copies ();
>>
>> now, this doesn't seem to deal with outside uses.  But eventually
>> the coalescing code already assigns proper cost to backedge copies
>> so that we choose to place copies on the exit edges rather than
>> the backedge ones - seems not so from looking at coalesce_cost_edge.
>>
>> So I think that we should remove the copy-propagation restrictions
>> and instead address this in out-of-SSA.
>>
>> For now the following patch retains the exact same restriction in
>> DOM as it is present in copyprop (but not in FRE - ok my recent fault,
>> or in VRP).  By avoiding to record the equivalency for PHIs
>> (where we know that either all or no uses should be covered by
>> the loop depth check) we retain the ability to record the equivalency
>> for the two loop exit PHI nodes and thus the threading (if only
>> on the false path).
>>
>> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>>
>> I'll try to see what happens to the PR19038 testcases (though
>> that PR is a mess ...)
>
> I checked the very original one (thin6d.f from sixtrack) and the
> generated assembly for -Ofast is the same without any patch
> and with _all_ loop_depth_of_name restrictions removed from
> both DOM and copyprop (thus making loop_depth_of_name dead).
>
> The cost of out-of-SSA copies for backedges (or in the case
> of the PR, loop latch edges causing an edge split) is dealt
> with by
>
>    /* Inserting copy on critical edge costs more than inserting it
> elsewhere.  */
>    if (EDGE_CRITICAL_P (e))
>      mult = 2;
>
> in coalesce_cost_edge.
>
> So in the end, without a testcase to investigate, I'd propose
> to get rid of those restrictions.  I'm still going forward
> with the patch below for now.
Sounds good.  Glad to see those hacks disappear.

Jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix parts of PR61607
  2014-06-25 14:09 Richard Biener
@ 2014-06-25 18:34 ` Jeff Law
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff Law @ 2014-06-25 18:34 UTC (permalink / raw)
  To: Richard Biener, gcc-patches

On 06/25/14 08:06, Richard Biener wrote:
>
> This removes prematurely killing loops during jump threading
> and moves it to a spot where we know we didn't cancel the
> jump thread.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> Ok?
>
> Thanks,
> Richard.
>
> 2014-06-25  Richard Biener  <rguenther@suse.de>
>
> 	PR tree-optimization/61607
> 	* tree-ssa-threadupdate.c (ssa_redirect_edges): Cancel the
> 	loop if we redirected its latch edge.
> 	(thread_block_1): Do not cancel loops prematurely.
This is fine.  Looking at how often we prematurely cancel loops has been 
on my todo list for a while.  Thanks for taking care of it.

jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] Fix parts of PR61607
@ 2014-06-25 14:09 Richard Biener
  2014-06-25 18:34 ` Jeff Law
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2014-06-25 14:09 UTC (permalink / raw)
  To: gcc-patches; +Cc: law


This removes prematurely killing loops during jump threading
and moves it to a spot where we know we didn't cancel the
jump thread.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Ok?

Thanks,
Richard.

2014-06-25  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/61607
	* tree-ssa-threadupdate.c (ssa_redirect_edges): Cancel the
	loop if we redirected its latch edge.
	(thread_block_1): Do not cancel loops prematurely.

Index: gcc/tree-ssa-threadupdate.c
===================================================================
*** gcc/tree-ssa-threadupdate.c	(revision 211978)
--- gcc/tree-ssa-threadupdate.c	(working copy)
*************** ssa_redirect_edges (struct redirection_d
*** 764,769 ****
--- 764,777 ----
  	  if ((*path)[1]->type != EDGE_COPY_SRC_JOINER_BLOCK)
  	    EDGE_SUCC (rd->dup_blocks[0], 0)->count += e->count;
  
+ 	  /* If we redirect a loop latch edge cancel its loop.  */
+ 	  if (e->src == e->src->loop_father->latch)
+ 	    {
+ 	      e->src->loop_father->header = NULL;
+ 	      e->src->loop_father->latch = NULL;
+ 	      loops_state_set (LOOPS_NEED_FIXUP);
+ 	    }
+ 
  	  /* Redirect the incoming edge (possibly to the joiner block) to the
  	     appropriate duplicate block.  */
  	  e2 = redirect_edge_and_branch (e, rd->dup_blocks[0]);
*************** thread_block_1 (basic_block bb, bool nol
*** 844,850 ****
    edge e, e2;
    edge_iterator ei;
    ssa_local_info_t local_info;
-   struct loop *loop = bb->loop_father;
  
    /* To avoid scanning a linear array for the element we need we instead
       use a hash table.  For normal code there should be no noticeable
--- 852,857 ----
*************** thread_block_1 (basic_block bb, bool nol
*** 853,884 ****
    redirection_data
      = new hash_table<struct redirection_data> (EDGE_COUNT (bb->succs));
  
-   /* If we thread the latch of the loop to its exit, the loop ceases to
-      exist.  Make sure we do not restrict ourselves in order to preserve
-      this loop.  */
-   if (loop->header == bb)
-     {
-       e = loop_latch_edge (loop);
-       vec<jump_thread_edge *> *path = THREAD_PATH (e);
- 
-       if (path
- 	  && (((*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK && joiners)
- 	      || ((*path)[1]->type == EDGE_COPY_SRC_BLOCK && !joiners)))
- 	{
- 	  for (unsigned int i = 1; i < path->length (); i++)
- 	    {
- 	      edge e2 = (*path)[i]->e;
- 
- 	      if (loop_exit_edge_p (loop, e2))
- 		{
- 		  loop->header = NULL;
- 		  loop->latch = NULL;
- 		  loops_state_set (LOOPS_NEED_FIXUP);
- 		}
- 	    }
- 	}
-     }
- 
    /* Record each unique threaded destination into a hash table for
       efficient lookups.  */
    FOR_EACH_EDGE (e, ei, bb->preds)
--- 860,865 ----

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-06-26 15:33 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-25 14:07 [PATCH] Fix parts of PR61607 Richard Biener
2014-06-25 18:28 ` Jeff Law
2014-06-26  7:57   ` Richard Biener
2014-06-26  8:33     ` Richard Biener
2014-06-26  9:01       ` Richard Biener
2014-06-26 15:33         ` Jeff Law
2014-06-25 14:09 Richard Biener
2014-06-25 18:34 ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).