public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
@ 2020-02-05  1:04 Jeff Law
  2020-02-05  6:26 ` Jakub Jelinek
  2020-02-05 13:31 ` Richard Sandiford
  0 siblings, 2 replies; 17+ messages in thread
From: Jeff Law @ 2020-02-05  1:04 UTC (permalink / raw)
  To: richard.sandiford, segher; +Cc: gcc-patches List

[-- Attachment #1: Type: text/plain, Size: 3168 bytes --]


Richard & Segher, if y'all could check my analysis here, it'd be
appreciated.

pr90275 is a P2 regression that is only triggering on ARM.  David's
testcase in c#1 is the best for this problem as it doesn't require
magic flags like -fno-dce to trigger.

The block in question:

> (code_label 89 88 90 24 15 (nil) [0 uses])
> (note 90 89 97 24 [bb 24] NOTE_INSN_BASIC_BLOCK)
> (insn 97 90 98 24 (parallel [
>             (set (reg:CC 100 cc)
>                 (compare:CC (reg:SI 131 [ d_lsm.21 ])
>                     (const_int 0 [0])))
>             (set (reg:SI 135 [ d_lsm.21 ])
>                 (reg:SI 131 [ d_lsm.21 ]))
>         ]) "pr90275.c":21:45 248 {*movsi_compare0}
>      (expr_list:REG_DEAD (reg:SI 131 [ d_lsm.21 ])
>         (nil)))
> (insn 98 97 151 24 (set (reg:SI 136 [+4 ])
>         (reg:SI 132 [ d_lsm.21+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}
>      (expr_list:REG_DEAD (reg:SI 132 [ d_lsm.21+4 ])
>         (expr_list:REG_DEAD (reg:CC 100 cc)
>             (nil))))
> (insn 151 98 152 24 (set (reg:SI 131 [ d_lsm.21 ])
>         (reg:SI 131 [ d_lsm.21 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}
>      (expr_list:REG_DEAD (reg:SI 135 [ d_lsm.21 ])
>         (nil)))
> (insn 152 151 103 24 (set (reg:SI 132 [ d_lsm.21+4 ])
>         (reg:SI 136 [+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}
>      (expr_list:REG_DEAD (reg:SI 136 [+4 ])
>         (nil)))
> 
insns 97 and 151 are the most important.

We process insn 97 which creates an equivalency between r135 and r131. 
This is expressed by putting both on on the "same_value" chain
(table_elt->{next,prev}_same_value).

When we put the REGs on the chain we'll set REG_QTY to a positive
number which indicates their values are valid.

We continue processing insns forward and run into insn 151 which is a
self-copy.

First CSE will invalidate r131 (because its set).  Invalidation is
accomplished by setting REG_QTY for r131 to a negative value.  It does
not remove r131 from the same value chains.

Then CSE will call insert_regs for r131.  The qty is not valid, so we
get into this code:

>      if (modified || ! qty_valid)
>         {
>           if (classp)
>             for (classp = classp->first_same_value;
>                  classp != 0;
>                  classp = classp->next_same_value)
>               if (REG_P (classp->exp)
>                   && GET_MODE (classp->exp) == GET_MODE (x))
>                 {
>                   unsigned c_regno = REGNO (classp->exp);
> 
>                   gcc_assert (REGNO_QTY_VALID_P (c_regno));
> [ ... ]

So we walk the chain of same values for r131.  WHen walking we run into
r131 itself.  Since r131 has been invalidated  we trip the assert.


The fix is pretty simple.  We just arrange to stop processing insns
that are nop reg->reg copies much like we already do for mem->mem
copies and (set (pc) (pc)).

This has bootstrapped and regression tested on x86_64.  I've also
verified it fixes the testcase in c#1 of pr90275, the test in pr93125
and pr92388.  Interestingly enough I couldn't trigger the original
testcase in 90275, but I'm confident this is ultimately all the same
problem.

OK for the trunk?

Thanks,
Jeff

[-- Attachment #2: P --]
[-- Type: text/x-patch, Size: 1823 bytes --]

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d6b5ded32a4..90d9f9d92d3 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2020-02-04  Jeff Law <law@redhat.com>
+
+	PR rtl-optimization/90275
+	* cse.c (cse_insn): Stop processing reg->reg nop moves early.
+
 2020-02-04  Richard Biener  <rguenther@suse.de>
 
 	PR tree-optimization/93538
diff --git a/gcc/cse.c b/gcc/cse.c
index 79ee0ce80e3..6e18bdae85f 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -5572,6 +5572,16 @@ cse_insn (rtx_insn *insn)
 	  sets[i].rtl = 0;
 	}
 
+      /* Similarly for no-op moves.  */
+      if (n_sets == 1
+	  && GET_CODE (src) == REG
+	  && src == dest)
+	{
+	  cse_cfg_altered |= delete_insn_and_edges (insn);
+	  /* No more processing for this set.  */
+	  sets[i].rtl = 0;
+	}
+
       /* If this SET is now setting PC to a label, we know it used to
 	 be a conditional or computed branch.  */
       else if (dest == pc_rtx && GET_CODE (src) == LABEL_REF
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index d2dc6648bc4..7be52bd6d2a 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2020-02-04  Jeff Law  <law@redhat.com>
+
+	PR rtl-optimization/90275
+	* gcc.c-torture/compile/pr90275.c: New test
+
 2020-02-04  David Malcolm  <dmalcolm@redhat.com>
 
 	* gcc.dg/analyzer/data-model-1.c (struct coord): Convert fields
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr90275.c b/gcc/testsuite/gcc.c-torture/compile/pr90275.c
new file mode 100644
index 00000000000..83e0df77226
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr90275.c
@@ -0,0 +1,27 @@
+a, b, c;
+
+long long d;
+
+e() {
+
+  char f;
+
+  for (;;) {
+
+    c = a = c ? 5 : 0;
+
+    if (f) {
+
+      b = a;
+
+      f = d;
+
+    }
+
+    (d || b) < (a > e) ?: (b ? 0 : f) || (d -= f);
+
+  }
+
+}
+
+

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
  2020-02-05  1:04 [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse Jeff Law
@ 2020-02-05  6:26 ` Jakub Jelinek
  2020-02-05 12:09   ` Segher Boessenkool
  2020-02-05 13:31 ` Richard Sandiford
  1 sibling, 1 reply; 17+ messages in thread
From: Jakub Jelinek @ 2020-02-05  6:26 UTC (permalink / raw)
  To: Jeff Law; +Cc: richard.sandiford, segher, gcc-patches List

On Tue, Feb 04, 2020 at 06:04:09PM -0700, Jeff Law wrote:
> --- a/gcc/cse.c
> +++ b/gcc/cse.c
> @@ -5572,6 +5572,16 @@ cse_insn (rtx_insn *insn)
>  	  sets[i].rtl = 0;
>  	}
>  
> +      /* Similarly for no-op moves.  */
> +      if (n_sets == 1
> +	  && GET_CODE (src) == REG

Just nits:
REG_P (src) ?

> +	  && src == dest)

Is pointer comparison ok?  I mean, shouldn't we instead do
rtx_equal_p (src, dest), set_noop_p (sets[i].rtl) or noop_move_p (insn)?

> +	* gcc.c-torture/compile/pr90275.c: New test

Missing full stop.

> +++ b/gcc/testsuite/gcc.c-torture/compile/pr90275.c
> @@ -0,0 +1,27 @@
> +a, b, c;

int 

> +
> +long long d;
> +
> +e() {

void

(unless the ommission of those makes it not reproduce anymore, which I
doubt).

> +
> +  char f;
> +
> +  for (;;) {
> +
> +    c = a = c ? 5 : 0;
> +
> +    if (f) {
> +
> +      b = a;
> +
> +      f = d;
> +
> +    }
> +
> +    (d || b) < (a > e) ?: (b ? 0 : f) || (d -= f);
> +
> +  }
> +
> +}


	Jakub

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
  2020-02-05  6:26 ` Jakub Jelinek
@ 2020-02-05 12:09   ` Segher Boessenkool
  0 siblings, 0 replies; 17+ messages in thread
From: Segher Boessenkool @ 2020-02-05 12:09 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jeff Law, richard.sandiford, gcc-patches List

Hi all,

On Wed, Feb 05, 2020 at 07:26:03AM +0100, Jakub Jelinek wrote:
> On Tue, Feb 04, 2020 at 06:04:09PM -0700, Jeff Law wrote:
> > --- a/gcc/cse.c
> > +++ b/gcc/cse.c
> > @@ -5572,6 +5572,16 @@ cse_insn (rtx_insn *insn)
> >  	  sets[i].rtl = 0;
> >  	}
> >  
> > +      /* Similarly for no-op moves.  */

It says "no-op MEM moves" right above, so it should say "no-op REG
moves" here?

> > +      if (n_sets == 1
> > +	  && GET_CODE (src) == REG
> 
> Just nits:
> REG_P (src) ?

Hey that is my nit!  Find your own!  ;-)

> > +	  && src == dest)
> 
> Is pointer comparison ok?

It isn't, it doesn't work for hard registers.

> I mean, shouldn't we instead do
> rtx_equal_p (src, dest),

This does not see e.g.
  (set (reg:SI 123) (subreg:SI (reg:DI 123) 0))
as no-op move.

> set_noop_p (sets[i].rtl)

This doesn't catch all such cases either.

> or noop_move_p (insn)?

And this one is plain wrong (it should be called something with "maybe"
in the name, it returns false if it thinks that may lead to better
optimisation, see the REG_EQUAL handling).

What we need here is a test whether CSE can ignore this insn, and we
will run into this problem if we don't (Jeff's analysis).  Does CSE
already have everything it uses to make that decision scribbled away
somewhere, when we get here?  It would be good if we could use the
exact same condition (same predicate function for example) as what
would lead to the problem later, or we'll be playing whack-a-mole for
a while (or CSE is completely rewritten soon, my preferred option, but
"soon" on a GCC timescale will take way too long for the PR).


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
  2020-02-05  1:04 [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse Jeff Law
  2020-02-05  6:26 ` Jakub Jelinek
@ 2020-02-05 13:31 ` Richard Sandiford
  2020-02-06 13:01   ` Jeff Law
  1 sibling, 1 reply; 17+ messages in thread
From: Richard Sandiford @ 2020-02-05 13:31 UTC (permalink / raw)
  To: Jeff Law; +Cc: segher, gcc-patches List

Jeff Law <law@redhat.com> writes:
> Richard & Segher, if y'all could check my analysis here, it'd be
> appreciated.
>
> pr90275 is a P2 regression that is only triggering on ARM.  David's
> testcase in c#1 is the best for this problem as it doesn't require
> magic flags like -fno-dce to trigger.
>
> The block in question:
>
>> (code_label 89 88 90 24 15 (nil) [0 uses])
>> (note 90 89 97 24 [bb 24] NOTE_INSN_BASIC_BLOCK)
>> (insn 97 90 98 24 (parallel [
>>             (set (reg:CC 100 cc)
>>                 (compare:CC (reg:SI 131 [ d_lsm.21 ])
>>                     (const_int 0 [0])))
>>             (set (reg:SI 135 [ d_lsm.21 ])
>>                 (reg:SI 131 [ d_lsm.21 ]))
>>         ]) "pr90275.c":21:45 248 {*movsi_compare0}
>>      (expr_list:REG_DEAD (reg:SI 131 [ d_lsm.21 ])
>>         (nil)))
>> (insn 98 97 151 24 (set (reg:SI 136 [+4 ])
>>         (reg:SI 132 [ d_lsm.21+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}
>>      (expr_list:REG_DEAD (reg:SI 132 [ d_lsm.21+4 ])
>>         (expr_list:REG_DEAD (reg:CC 100 cc)
>>             (nil))))
>> (insn 151 98 152 24 (set (reg:SI 131 [ d_lsm.21 ])
>>         (reg:SI 131 [ d_lsm.21 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}
>>      (expr_list:REG_DEAD (reg:SI 135 [ d_lsm.21 ])
>>         (nil)))
>> (insn 152 151 103 24 (set (reg:SI 132 [ d_lsm.21+4 ])
>>         (reg:SI 136 [+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}
>>      (expr_list:REG_DEAD (reg:SI 136 [+4 ])
>>         (nil)))
>> 
> insns 97 and 151 are the most important.
>
> We process insn 97 which creates an equivalency between r135 and r131. 
> This is expressed by putting both on on the "same_value" chain
> (table_elt->{next,prev}_same_value).
>
> When we put the REGs on the chain we'll set REG_QTY to a positive
> number which indicates their values are valid.
>
> We continue processing insns forward and run into insn 151 which is a
> self-copy.
>
> First CSE will invalidate r131 (because its set).  Invalidation is
> accomplished by setting REG_QTY for r131 to a negative value.  It does
> not remove r131 from the same value chains.
>
> Then CSE will call insert_regs for r131.  The qty is not valid, so we
> get into this code:
>
>>      if (modified || ! qty_valid)
>>         {
>>           if (classp)
>>             for (classp = classp->first_same_value;
>>                  classp != 0;
>>                  classp = classp->next_same_value)
>>               if (REG_P (classp->exp)
>>                   && GET_MODE (classp->exp) == GET_MODE (x))
>>                 {
>>                   unsigned c_regno = REGNO (classp->exp);
>> 
>>                   gcc_assert (REGNO_QTY_VALID_P (c_regno));
>> [ ... ]
>
> So we walk the chain of same values for r131.  WHen walking we run into
> r131 itself.  Since r131 has been invalidated  we trip the assert.
>
>
> The fix is pretty simple.  We just arrange to stop processing insns
> that are nop reg->reg copies much like we already do for mem->mem
> copies and (set (pc) (pc)).
>
> This has bootstrapped and regression tested on x86_64.  I've also
> verified it fixes the testcase in c#1 of pr90275, the test in pr93125
> and pr92388.  Interestingly enough I couldn't trigger the original
> testcase in 90275, but I'm confident this is ultimately all the same
> problem.

This looks similar to the infamous (to me):

   https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01581.html

which had to be reverted because it broke powerpc64 bootstrap.
The problem was that n_sets is misleading for calls:

   https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01858.html

That's easy to fix (and I have a fix).  But given the damage this caused
last time, I think it's probably best left to GCC 11.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
  2020-02-05 13:31 ` Richard Sandiford
@ 2020-02-06 13:01   ` Jeff Law
  2020-02-06 13:57     ` Segher Boessenkool
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Law @ 2020-02-06 13:01 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: segher, gcc-patches List

On Wed, 2020-02-05 at 13:30 +0000, Richard Sandiford wrote:
> Jeff Law <law@redhat.com> writes:
> > Richard & Segher, if y'all could check my analysis here, it'd be
> > appreciated.
> > 
> > pr90275 is a P2 regression that is only triggering on ARM.  David's
> > testcase in c#1 is the best for this problem as it doesn't require
> > magic flags like -fno-dce to trigger.
> > 
> > The block in question:
> > 
> > > (code_label 89 88 90 24 15 (nil) [0 uses])
> > > (note 90 89 97 24 [bb 24] NOTE_INSN_BASIC_BLOCK)
> > > (insn 97 90 98 24 (parallel [
> > >             (set (reg:CC 100 cc)
> > >                 (compare:CC (reg:SI 131 [ d_lsm.21 ])
> > >                     (const_int 0 [0])))
> > >             (set (reg:SI 135 [ d_lsm.21 ])
> > >                 (reg:SI 131 [ d_lsm.21 ]))
> > >         ]) "pr90275.c":21:45 248 {*movsi_compare0}
> > >      (expr_list:REG_DEAD (reg:SI 131 [ d_lsm.21 ])
> > >         (nil)))
> > > (insn 98 97 151 24 (set (reg:SI 136 [+4 ])
> > >         (reg:SI 132 [ d_lsm.21+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}
> > >      (expr_list:REG_DEAD (reg:SI 132 [ d_lsm.21+4 ])
> > >         (expr_list:REG_DEAD (reg:CC 100 cc)
> > >             (nil))))
> > > (insn 151 98 152 24 (set (reg:SI 131 [ d_lsm.21 ])
> > >         (reg:SI 131 [ d_lsm.21 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}
> > >      (expr_list:REG_DEAD (reg:SI 135 [ d_lsm.21 ])
> > >         (nil)))
> > > (insn 152 151 103 24 (set (reg:SI 132 [ d_lsm.21+4 ])
> > >         (reg:SI 136 [+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}
> > >      (expr_list:REG_DEAD (reg:SI 136 [+4 ])
> > >         (nil)))
> > > 
> > insns 97 and 151 are the most important.
> > 
> > We process insn 97 which creates an equivalency between r135 and r131. 
> > This is expressed by putting both on on the "same_value" chain
> > (table_elt->{next,prev}_same_value).
> > 
> > When we put the REGs on the chain we'll set REG_QTY to a positive
> > number which indicates their values are valid.
> > 
> > We continue processing insns forward and run into insn 151 which is a
> > self-copy.
> > 
> > First CSE will invalidate r131 (because its set).  Invalidation is
> > accomplished by setting REG_QTY for r131 to a negative value.  It does
> > not remove r131 from the same value chains.
> > 
> > Then CSE will call insert_regs for r131.  The qty is not valid, so we
> > get into this code:
> > 
> > >      if (modified || ! qty_valid)
> > >         {
> > >           if (classp)
> > >             for (classp = classp->first_same_value;
> > >                  classp != 0;
> > >                  classp = classp->next_same_value)
> > >               if (REG_P (classp->exp)
> > >                   && GET_MODE (classp->exp) == GET_MODE (x))
> > >                 {
> > >                   unsigned c_regno = REGNO (classp->exp);
> > > 
> > >                   gcc_assert (REGNO_QTY_VALID_P (c_regno));
> > > [ ... ]
> > 
> > So we walk the chain of same values for r131.  WHen walking we run into
> > r131 itself.  Since r131 has been invalidated  we trip the assert.
> > 
> > 
> > The fix is pretty simple.  We just arrange to stop processing insns
> > that are nop reg->reg copies much like we already do for mem->mem
> > copies and (set (pc) (pc)).
> > 
> > This has bootstrapped and regression tested on x86_64.  I've also
> > verified it fixes the testcase in c#1 of pr90275, the test in pr93125
> > and pr92388.  Interestingly enough I couldn't trigger the original
> > testcase in 90275, but I'm confident this is ultimately all the same
> > problem.
> 
> This looks similar to the infamous (to me):
> 
>    https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01581.html
> 
> which had to be reverted because it broke powerpc64 bootstrap.
> The problem was that n_sets is misleading for calls:
> 
>    https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01858.html
> 
> That's easy to fix (and I have a fix).  But given the damage this caused
> last time, I think it's probably best left to GCC 11.
Yea, it's closely related.  In your case you need to effectively ignore
the nop insn to get the optimization you want.  In mine that nop insn
causes an ICE.

I think we could take your cse bits + adding a !CALL_P separately from
the simplify-rtx stuff which Segher objected to.  THat'd likely solve
the ARM ICEs and take you a tiny step forward on optimizing that SVE
case.  Thoughts?

Jeff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
  2020-02-06 13:01   ` Jeff Law
@ 2020-02-06 13:57     ` Segher Boessenkool
  2020-02-07 16:01       ` Jeff Law
  0 siblings, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2020-02-06 13:57 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Sandiford, gcc-patches List

On Wed, Feb 05, 2020 at 11:48:23AM -0700, Jeff Law wrote:
> Yea, it's closely related.  In your case you need to effectively ignore
> the nop insn to get the optimization you want.  In mine that nop insn
> causes an ICE.
> 
> I think we could take your cse bits + adding a !CALL_P separately from
> the simplify-rtx stuff which Segher objected to.  THat'd likely solve
> the ARM ICEs and take you a tiny step forward on optimizing that SVE
> case.  Thoughts?

CSE should consistently keep track of what insns are no-op moves (in its
definition, all passes have a slightly different definition of this),
and use that everywhere consistently.

(Or we should rewrite CSE).


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
  2020-02-06 13:57     ` Segher Boessenkool
@ 2020-02-07 16:01       ` Jeff Law
  2020-02-08 16:41         ` Segher Boessenkool
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Law @ 2020-02-07 16:01 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Richard Sandiford, gcc-patches List

On Thu, 2020-02-06 at 07:56 -0600, Segher Boessenkool wrote:
> On Wed, Feb 05, 2020 at 11:48:23AM -0700, Jeff Law wrote:
> > Yea, it's closely related.  In your case you need to effectively ignore
> > the nop insn to get the optimization you want.  In mine that nop insn
> > causes an ICE.
> > 
> > I think we could take your cse bits + adding a !CALL_P separately from
> > the simplify-rtx stuff which Segher objected to.  THat'd likely solve
> > the ARM ICEs and take you a tiny step forward on optimizing that SVE
> > case.  Thoughts?
> 
> CSE should consistently keep track of what insns are no-op moves (in its
> definition, all passes have a slightly different definition of this),
> and use that everywhere consistently.
So does that mean you object to the cse.c portion of Richard's patch?

Jeff
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
  2020-02-07 16:01       ` Jeff Law
@ 2020-02-08 16:41         ` Segher Boessenkool
  2020-03-12 18:03           ` Jeff Law
  0 siblings, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2020-02-08 16:41 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Sandiford, gcc-patches List

On Fri, Feb 07, 2020 at 09:00:40AM -0700, Jeff Law wrote:
> On Thu, 2020-02-06 at 07:56 -0600, Segher Boessenkool wrote:
> > On Wed, Feb 05, 2020 at 11:48:23AM -0700, Jeff Law wrote:
> > > Yea, it's closely related.  In your case you need to effectively ignore
> > > the nop insn to get the optimization you want.  In mine that nop insn
> > > causes an ICE.
> > > 
> > > I think we could take your cse bits + adding a !CALL_P separately from
> > > the simplify-rtx stuff which Segher objected to.  THat'd likely solve
> > > the ARM ICEs and take you a tiny step forward on optimizing that SVE
> > > case.  Thoughts?
> > 
> > CSE should consistently keep track of what insns are no-op moves (in its
> > definition, all passes have a slightly different definition of this),
> > and use that everywhere consistently.
> So does that mean you object to the cse.c portion of Richard's patch?

It's more a "what we need to do in the future" thing, it is stage 4 now,
it is too big a change to do now.

What patch?  The "34" patch?  https://gcc.gnu.org/r278411 .

I don't think each stanza of code should use it's own "noop-ness", no.

I don't know if this patch makes matters worse or not.  It doesn't seem
suitable for stage 4 though.  And Richard said the cse.c part breaks
rs6000, if that is true, yes I do object ;-)


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
  2020-02-08 16:41         ` Segher Boessenkool
@ 2020-03-12 18:03           ` Jeff Law
  2020-03-12 18:23             ` Segher Boessenkool
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Law @ 2020-03-12 18:03 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Richard Sandiford, gcc-patches List

[-- Attachment #1: Type: text/plain, Size: 2163 bytes --]

On Sat, 2020-02-08 at 10:41 -0600, Segher Boessenkool wrote:
> On Fri, Feb 07, 2020 at 09:00:40AM -0700, Jeff Law wrote:
> > On Thu, 2020-02-06 at 07:56 -0600, Segher Boessenkool wrote:
> > > On Wed, Feb 05, 2020 at 11:48:23AM -0700, Jeff Law wrote:
> > > > Yea, it's closely related.  In your case you need to effectively ignore
> > > > the nop insn to get the optimization you want.  In mine that nop insn
> > > > causes an ICE.
> > > > 
> > > > I think we could take your cse bits + adding a !CALL_P separately from
> > > > the simplify-rtx stuff which Segher objected to.  THat'd likely solve
> > > > the ARM ICEs and take you a tiny step forward on optimizing that SVE
> > > > case.  Thoughts?
> > > 
> > > CSE should consistently keep track of what insns are no-op moves (in its
> > > definition, all passes have a slightly different definition of this),
> > > and use that everywhere consistently.
> > So does that mean you object to the cse.c portion of Richard's patch?
> 
> It's more a "what we need to do in the future" thing, it is stage 4 now,
> it is too big a change to do now.
I suspect you're referring to the simplify-rtx bits from his patch which I agree
are not appropriate for stage4.  The cse bits from that patch are are simple.

> 
> What patch?  The "34" patch?  https://gcc.gnu.org/r278411 .
> 
> I don't think each stanza of code should use it's own "noop-ness", no.
Richard's patch is actually better than mine in that regard as it handles mem and
reg nop moves in an indentical way.  I don't think refactoring the cse.c code is
advisable now though -- but I do want to fix the multiply-reported ICE on ARM and
Richard's cse changes are the cleanest way to do that that I can see.



> 
> I don't know if this patch makes matters worse or not.  It doesn't seem
> suitable for stage 4 though.  And Richard said the cse.c part breaks
> rs6000, if that is true, yes I do object ;-)
The rs6000 port breakage is trivial to fix.  In fact, I did so and ran it through
my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86 native
and more.

Concretely I'm proposing the following patch to address 90275 and its duplicates.



[-- Attachment #2: P --]
[-- Type: text/plain, Size: 2017 bytes --]

	PR rtl-optimization/90275
	* cse.c (cse_insn): Delete no-op register moves too.

	PR rtl-optimization/90275
	* gcc.c-torture/compile/pr90275.c: New test.

diff --git a/gcc/cse.c b/gcc/cse.c
index 79ee0ce80e3..1779bb9a333 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -4625,7 +4625,7 @@ cse_insn (rtx_insn *insn)
   for (i = 0; i < n_sets; i++)
     {
       bool repeat = false;
-      bool mem_noop_insn = false;
+      bool noop_insn = false;
       rtx src, dest;
       rtx src_folded;
       struct table_elt *elt = 0, *p;
@@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn)
 	    }
 
 	  /* Similarly, lots of targets don't allow no-op
-	     (set (mem x) (mem x)) moves.  */
+	     (set (mem x) (mem x)) moves.  Even (set (reg x) (reg x))
+	     might be impossible for certain registers (like CC registers).  */
 	  else if (n_sets == 1
-		   && MEM_P (trial)
+		   && ! CALL_P (insn)
+		   && (MEM_P (trial) || REG_P (trial))
 		   && MEM_P (dest)
 		   && rtx_equal_p (trial, dest)
 		   && !side_effects_p (dest)
@@ -5334,7 +5336,7 @@ cse_insn (rtx_insn *insn)
 		       || insn_nothrow_p (insn)))
 	    {
 	      SET_SRC (sets[i].rtl) = trial;
-	      mem_noop_insn = true;
+	      noop_insn = true;
 	      break;
 	    }
 
@@ -5562,8 +5564,8 @@ cse_insn (rtx_insn *insn)
 	  sets[i].rtl = 0;
 	}
 
-      /* Similarly for no-op MEM moves.  */
-      else if (mem_noop_insn)
+      /* Similarly for no-op moves.  */
+      else if (noop_insn)
 	{
 	  if (cfun->can_throw_non_call_exceptions && can_throw_internal (insn))
 	    cse_cfg_altered = true;
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr90275.c b/gcc/testsuite/gcc.c-torture/compile/pr90275.c
new file mode 100644
index 00000000000..83e0df77226
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr90275.c
@@ -0,0 +1,27 @@
+a, b, c;
+
+long long d;
+
+e() {
+
+  char f;
+
+  for (;;) {
+
+    c = a = c ? 5 : 0;
+
+    if (f) {
+
+      b = a;
+
+      f = d;
+
+    }
+
+    (d || b) < (a > e) ?: (b ? 0 : f) || (d -= f);
+
+  }
+
+}
+
+

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
  2020-03-12 18:03           ` Jeff Law
@ 2020-03-12 18:23             ` Segher Boessenkool
  2020-03-12 18:47               ` Jeff Law
  2020-03-12 22:11               ` Jeff Law
  0 siblings, 2 replies; 17+ messages in thread
From: Segher Boessenkool @ 2020-03-12 18:23 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Sandiford, gcc-patches List

On Thu, Mar 12, 2020 at 12:03:08PM -0600, Jeff Law wrote:
> On Sat, 2020-02-08 at 10:41 -0600, Segher Boessenkool wrote:
> > I don't think each stanza of code should use it's own "noop-ness", no.
> Richard's patch is actually better than mine in that regard as it handles mem and
> reg nop moves in an indentical way.  I don't think refactoring the cse.c code is
> advisable now though -- but I do want to fix the multiply-reported ICE on ARM and
> Richard's cse changes are the cleanest way to do that that I can see.

It looks pretty simple, yeah...  All of CSE is hopelessly fragile, but
this patch does not make things worse.

> > I don't know if this patch makes matters worse or not.  It doesn't seem
> > suitable for stage 4 though.  And Richard said the cse.c part breaks
> > rs6000, if that is true, yes I do object ;-)
> The rs6000 port breakage is trivial to fix.  In fact, I did so and ran it through
> my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86 native
> and more.

I don't see anything rs6000 below?  Is it just this generic code?

> @@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn)
>  	    }
>  
>  	  /* Similarly, lots of targets don't allow no-op
> -	     (set (mem x) (mem x)) moves.  */
> +	     (set (mem x) (mem x)) moves.  Even (set (reg x) (reg x))
> +	     might be impossible for certain registers (like CC registers).  */
>  	  else if (n_sets == 1
> -		   && MEM_P (trial)
> +		   && ! CALL_P (insn)
> +		   && (MEM_P (trial) || REG_P (trial))
>  		   && MEM_P (dest)
>  		   && rtx_equal_p (trial, dest)
>  		   && !side_effects_p (dest)

This adds the !CALL_P (no space btw) condition, why is that?

(Is that CCmode reg-reg move comment about rs6000?  Huh, we *do* have
patterns for that, or *should* have at least!)


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
  2020-03-12 18:23             ` Segher Boessenkool
@ 2020-03-12 18:47               ` Jeff Law
  2020-03-12 20:26                 ` Segher Boessenkool
  2020-03-12 22:11               ` Jeff Law
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff Law @ 2020-03-12 18:47 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Richard Sandiford, gcc-patches List

On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote:
> 
> > > I don't know if this patch makes matters worse or not.  It doesn't seem
> > > suitable for stage 4 though.  And Richard said the cse.c part breaks
> > > rs6000, if that is true, yes I do object ;-)
> > The rs6000 port breakage is trivial to fix.  In fact, I did so and ran it
> > through
> > my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86
> > native
> > and more.
> 
> I don't see anything rs6000 below?  Is it just this generic code?
It's just generic code.  THe rs6000 issue is fixed by the !CALL_P condition.

> 
> > @@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn)
> >  	    }
> >  
> >  	  /* Similarly, lots of targets don't allow no-op
> > -	     (set (mem x) (mem x)) moves.  */
> > +	     (set (mem x) (mem x)) moves.  Even (set (reg x) (reg x))
> > +	     might be impossible for certain registers (like CC registers).  */
> >  	  else if (n_sets == 1
> > -		   && MEM_P (trial)
> > +		   && ! CALL_P (insn)
> > +		   && (MEM_P (trial) || REG_P (trial))
> >  		   && MEM_P (dest)
> >  		   && rtx_equal_p (trial, dest)
> >  		   && !side_effects_p (dest)
> 
> This adds the !CALL_P (no space btw) condition, why is that?
Because n_sets is not valid for CALL_P insns which resulted in a failure on ppc. 
See find_sets_in_insn which ignores the set on the LHS of a call.  So imagine if
we had a nop register set in parallel with a (set (reg) (call ...)).  We'd end up
deleting the entire PARALLEL which is obviously wrong.

One could argue that find_sets_in_insn should be fixed as well.  I'd be worried
about fallout from that.

jeff


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
  2020-03-12 18:47               ` Jeff Law
@ 2020-03-12 20:26                 ` Segher Boessenkool
  2020-03-12 20:56                   ` Jeff Law
  0 siblings, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2020-03-12 20:26 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Sandiford, gcc-patches List

On Thu, Mar 12, 2020 at 12:47:04PM -0600, Jeff Law wrote:
> On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote:
> > >  	  else if (n_sets == 1
> > > -		   && MEM_P (trial)
> > > +		   && ! CALL_P (insn)
> > > +		   && (MEM_P (trial) || REG_P (trial))
> > >  		   && MEM_P (dest)
> > >  		   && rtx_equal_p (trial, dest)
> > >  		   && !side_effects_p (dest)
> > 
> > This adds the !CALL_P (no space btw) condition, why is that?
> Because n_sets is not valid for CALL_P insns which resulted in a failure on ppc. 
> See find_sets_in_insn which ignores the set on the LHS of a call.  So imagine if
> we had a nop register set in parallel with a (set (reg) (call ...)).  We'd end up
> deleting the entire PARALLEL which is obviously wrong.

Ah, I see.  So this is exposed on Power by the TOC stuff, I guess?  CSE
sees a TOC set parallel with a call as a no-op because it is set to the
same value (an unspec, not an unspec_volatile) that GCC can derive is
already in the TOC reg?  Or is this some other case?

The change sounds fine, fwiw.

> One could argue that find_sets_in_insn should be fixed as well.  I'd be worried
> about fallout from that.

Should it ignore all SETs in parallel with a call?  Or what do you want
to fix there?


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
  2020-03-12 20:26                 ` Segher Boessenkool
@ 2020-03-12 20:56                   ` Jeff Law
  2020-03-13 10:29                     ` Richard Sandiford
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Law @ 2020-03-12 20:56 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Richard Sandiford, gcc-patches List

On Thu, 2020-03-12 at 15:26 -0500, Segher Boessenkool wrote:
> On Thu, Mar 12, 2020 at 12:47:04PM -0600, Jeff Law wrote:
> > On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote:
> > > >  	  else if (n_sets == 1
> > > > -		   && MEM_P (trial)
> > > > +		   && ! CALL_P (insn)
> > > > +		   && (MEM_P (trial) || REG_P (trial))
> > > >  		   && MEM_P (dest)
> > > >  		   && rtx_equal_p (trial, dest)
> > > >  		   && !side_effects_p (dest)
> > > 
> > > This adds the !CALL_P (no space btw) condition, why is that?
> > Because n_sets is not valid for CALL_P insns which resulted in a failure on
> > ppc. 
> > See find_sets_in_insn which ignores the set on the LHS of a call.  So imagine
> > if
> > we had a nop register set in parallel with a (set (reg) (call ...)).  We'd
> > end up
> > deleting the entire PARALLEL which is obviously wrong.
> 
> Ah, I see.  So this is exposed on Power by the TOC stuff, I guess?  CSE
> sees a TOC set parallel with a call as a no-op because it is set to the
> same value (an unspec, not an unspec_volatile) that GCC can derive is
> already in the TOC reg?  Or is this some other case?
Not entirely sure.  Richard's message didn't include the precise details. 

> 
> The change sounds fine, fwiw.
> 
> > One could argue that find_sets_in_insn should be fixed as well.  I'd be
> > worried
> > about fallout from that.
> 
> Should it ignore all SETs in parallel with a call?  Or what do you want
> to fix there?
I was thinking the other way.  Make it include sets even those for the function
return value.  Having n_sets's meaning be different for CALL_INSNs vs other INSNs
just seems like a poor implementation decision because of the inconsistency.

jeff


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
  2020-03-12 18:23             ` Segher Boessenkool
  2020-03-12 18:47               ` Jeff Law
@ 2020-03-12 22:11               ` Jeff Law
  2020-03-13  8:09                 ` Christophe Lyon
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff Law @ 2020-03-12 22:11 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Richard Sandiford, gcc-patches List

On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote:
> On Thu, Mar 12, 2020 at 12:03:08PM -0600, Jeff Law wrote:
> > On Sat, 2020-02-08 at 10:41 -0600, Segher Boessenkool wrote:
> > > I don't think each stanza of code should use it's own "noop-ness", no.
> > Richard's patch is actually better than mine in that regard as it handles mem
> > and
> > reg nop moves in an indentical way.  I don't think refactoring the cse.c code
> > is
> > advisable now though -- but I do want to fix the multiply-reported ICE on ARM
> > and
> > Richard's cse changes are the cleanest way to do that that I can see.
> 
> It looks pretty simple, yeah...  All of CSE is hopelessly fragile, but
> this patch does not make things worse.
> 
> > > I don't know if this patch makes matters worse or not.  It doesn't seem
> > > suitable for stage 4 though.  And Richard said the cse.c part breaks
> > > rs6000, if that is true, yes I do object ;-)
> > The rs6000 port breakage is trivial to fix.  In fact, I did so and ran it
> > through
> > my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86
> > native
> > and more.
> 
> I don't see anything rs6000 below?  Is it just this generic code?
> 
> > @@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn)
> >  	    }
> >  
> >  	  /* Similarly, lots of targets don't allow no-op
> > -	     (set (mem x) (mem x)) moves.  */
> > +	     (set (mem x) (mem x)) moves.  Even (set (reg x) (reg x))
> > +	     might be impossible for certain registers (like CC registers).  */
> >  	  else if (n_sets == 1
> > -		   && MEM_P (trial)
> > +		   && ! CALL_P (insn)
> > +		   && (MEM_P (trial) || REG_P (trial))
> >  		   && MEM_P (dest)
> >  		   && rtx_equal_p (trial, dest)
> >  		   && !side_effects_p (dest)
> 
> This adds the !CALL_P (no space btw) condition, why is that?
> 
> (Is that CCmode reg-reg move comment about rs6000?  Huh, we *do* have
> patterns for that, or *should* have at least!)
I fixed the extraneous whitespace and committed the change.

THanks,
jeff
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
  2020-03-12 22:11               ` Jeff Law
@ 2020-03-13  8:09                 ` Christophe Lyon
  2020-03-13 21:49                   ` Jeff Law
  0 siblings, 1 reply; 17+ messages in thread
From: Christophe Lyon @ 2020-03-13  8:09 UTC (permalink / raw)
  To: Jeff Law; +Cc: Segher Boessenkool, gcc-patches List

Hi,


On Thu, 12 Mar 2020 at 23:12, Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote:
> > On Thu, Mar 12, 2020 at 12:03:08PM -0600, Jeff Law wrote:
> > > On Sat, 2020-02-08 at 10:41 -0600, Segher Boessenkool wrote:
> > > > I don't think each stanza of code should use it's own "noop-ness", no.
> > > Richard's patch is actually better than mine in that regard as it handles mem
> > > and
> > > reg nop moves in an indentical way.  I don't think refactoring the cse.c code
> > > is
> > > advisable now though -- but I do want to fix the multiply-reported ICE on ARM
> > > and
> > > Richard's cse changes are the cleanest way to do that that I can see.
> >
> > It looks pretty simple, yeah...  All of CSE is hopelessly fragile, but
> > this patch does not make things worse.
> >
> > > > I don't know if this patch makes matters worse or not.  It doesn't seem
> > > > suitable for stage 4 though.  And Richard said the cse.c part breaks
> > > > rs6000, if that is true, yes I do object ;-)
> > > The rs6000 port breakage is trivial to fix.  In fact, I did so and ran it
> > > through
> > > my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86
> > > native
> > > and more.
> >
> > I don't see anything rs6000 below?  Is it just this generic code?
> >
> > > @@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn)
> > >         }
> > >
> > >       /* Similarly, lots of targets don't allow no-op
> > > -        (set (mem x) (mem x)) moves.  */
> > > +        (set (mem x) (mem x)) moves.  Even (set (reg x) (reg x))
> > > +        might be impossible for certain registers (like CC registers).  */
> > >       else if (n_sets == 1
> > > -              && MEM_P (trial)
> > > +              && ! CALL_P (insn)
> > > +              && (MEM_P (trial) || REG_P (trial))
> > >                && MEM_P (dest)
> > >                && rtx_equal_p (trial, dest)
> > >                && !side_effects_p (dest)
> >
> > This adds the !CALL_P (no space btw) condition, why is that?
> >
> > (Is that CCmode reg-reg move comment about rs6000?  Huh, we *do* have
> > patterns for that, or *should* have at least!)
> I fixed the extraneous whitespace and committed the change.
>

The new test fails on ARM:
FAIL: gcc.c-torture/compile/pr90275.c   -O3 -g  (internal compiler error)
during RTL pass: cse_local
/gcc/testsuite/gcc.c-torture/compile/pr90275.c: In function 'e':
/gcc/testsuite/gcc.c-torture/compile/pr90275.c:25:1: internal compiler
error: in insert_regs, at cse.c:1128
0x15725bd insert_regs
        /gcc/cse.c:1128
0x1579731 cse_insn
        /gcc/cse.c:5957
0x157aff6 cse_extended_basic_block
        /gcc/cse.c:6615
0x157aff6 cse_main
        /gcc/cse.c:6794
0x157bc0d rest_of_handle_cse_after_global_opts
        /gcc/cse.c:7766
0x157bc0d execute
        /gcc/cse.c:7817
Please submit a full bug report,


Is the patch supposed to fix all the ICEs on ARM?

I see this with cross-compilers, it seems OK on native builds? (I
can't find error reports about this on gcc-testresults)

Christophe


> THanks,
> jeff
> >
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
  2020-03-12 20:56                   ` Jeff Law
@ 2020-03-13 10:29                     ` Richard Sandiford
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Sandiford @ 2020-03-13 10:29 UTC (permalink / raw)
  To: Jeff Law; +Cc: Segher Boessenkool, gcc-patches List

Jeff, thanks for picking this up.

Jeff Law <law@redhat.com> writes:
> On Thu, 2020-03-12 at 15:26 -0500, Segher Boessenkool wrote:
>> On Thu, Mar 12, 2020 at 12:47:04PM -0600, Jeff Law wrote:
>> > On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote:
>> > > >  	  else if (n_sets == 1
>> > > > -		   && MEM_P (trial)
>> > > > +		   && ! CALL_P (insn)
>> > > > +		   && (MEM_P (trial) || REG_P (trial))
>> > > >  		   && MEM_P (dest)
>> > > >  		   && rtx_equal_p (trial, dest)
>> > > >  		   && !side_effects_p (dest)
>> > > 
>> > > This adds the !CALL_P (no space btw) condition, why is that?
>> > Because n_sets is not valid for CALL_P insns which resulted in a failure on
>> > ppc. 
>> > See find_sets_in_insn which ignores the set on the LHS of a call.  So imagine
>> > if
>> > we had a nop register set in parallel with a (set (reg) (call ...)).  We'd
>> > end up
>> > deleting the entire PARALLEL which is obviously wrong.
>> 
>> Ah, I see.  So this is exposed on Power by the TOC stuff, I guess?  CSE
>> sees a TOC set parallel with a call as a no-op because it is set to the
>> same value (an unspec, not an unspec_volatile) that GCC can derive is
>> already in the TOC reg?  Or is this some other case?
> Not entirely sure.  Richard's message didn't include the precise details. 

Yeah, that was exactly it.

On the bright side, removing many calls as dead made for an easy-to-debug
bootstrap failure :-)

Richard

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
  2020-03-13  8:09                 ` Christophe Lyon
@ 2020-03-13 21:49                   ` Jeff Law
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff Law @ 2020-03-13 21:49 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Segher Boessenkool, gcc-patches List

On Fri, 2020-03-13 at 09:09 +0100, Christophe Lyon wrote:
> Hi,
> 
> 
> On Thu, 12 Mar 2020 at 23:12, Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> > On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote:
> > > On Thu, Mar 12, 2020 at 12:03:08PM -0600, Jeff Law wrote:
> > > > On Sat, 2020-02-08 at 10:41 -0600, Segher Boessenkool wrote:
> > > > > I don't think each stanza of code should use it's own "noop-ness", no.
> > > > Richard's patch is actually better than mine in that regard as it handles
> > > > mem
> > > > and
> > > > reg nop moves in an indentical way.  I don't think refactoring the cse.c
> > > > code
> > > > is
> > > > advisable now though -- but I do want to fix the multiply-reported ICE on
> > > > ARM
> > > > and
> > > > Richard's cse changes are the cleanest way to do that that I can see.
> > > 
> > > It looks pretty simple, yeah...  All of CSE is hopelessly fragile, but
> > > this patch does not make things worse.
> > > 
> > > > > I don't know if this patch makes matters worse or not.  It doesn't seem
> > > > > suitable for stage 4 though.  And Richard said the cse.c part breaks
> > > > > rs6000, if that is true, yes I do object ;-)
> > > > The rs6000 port breakage is trivial to fix.  In fact, I did so and ran it
> > > > through
> > > > my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86
> > > > native
> > > > and more.
> > > 
> > > I don't see anything rs6000 below?  Is it just this generic code?
> > > 
> > > > @@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn)
> > > >         }
> > > > 
> > > >       /* Similarly, lots of targets don't allow no-op
> > > > -        (set (mem x) (mem x)) moves.  */
> > > > +        (set (mem x) (mem x)) moves.  Even (set (reg x) (reg x))
> > > > +        might be impossible for certain registers (like CC
> > > > registers).  */
> > > >       else if (n_sets == 1
> > > > -              && MEM_P (trial)
> > > > +              && ! CALL_P (insn)
> > > > +              && (MEM_P (trial) || REG_P (trial))
> > > >                && MEM_P (dest)
> > > >                && rtx_equal_p (trial, dest)
> > > >                && !side_effects_p (dest)
> > > 
> > > This adds the !CALL_P (no space btw) condition, why is that?
> > > 
> > > (Is that CCmode reg-reg move comment about rs6000?  Huh, we *do* have
> > > patterns for that, or *should* have at least!)
> > I fixed the extraneous whitespace and committed the change.
> > 
> 
> The new test fails on ARM:
THanks.  I see what's happening, though I'm not sure *how* it happened.  Anyway,
doing some testing on the fix now.

jeff
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-03-13 21:49 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-05  1:04 [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse Jeff Law
2020-02-05  6:26 ` Jakub Jelinek
2020-02-05 12:09   ` Segher Boessenkool
2020-02-05 13:31 ` Richard Sandiford
2020-02-06 13:01   ` Jeff Law
2020-02-06 13:57     ` Segher Boessenkool
2020-02-07 16:01       ` Jeff Law
2020-02-08 16:41         ` Segher Boessenkool
2020-03-12 18:03           ` Jeff Law
2020-03-12 18:23             ` Segher Boessenkool
2020-03-12 18:47               ` Jeff Law
2020-03-12 20:26                 ` Segher Boessenkool
2020-03-12 20:56                   ` Jeff Law
2020-03-13 10:29                     ` Richard Sandiford
2020-03-12 22:11               ` Jeff Law
2020-03-13  8:09                 ` Christophe Lyon
2020-03-13 21:49                   ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).