public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.
@ 2023-10-14 22:14 Roger Sayle
  2023-10-14 23:02 ` Jeff Law
  2023-10-19 15:20 ` Jeff Law
  0 siblings, 2 replies; 8+ messages in thread
From: Roger Sayle @ 2023-10-14 22:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: 'Jeff Law'

[-- Attachment #1: Type: text/plain, Size: 1938 bytes --]


This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
to a single ZERO_EXTEND, but as shown in this PR it is possible for
combine's make_compound_operation to unintentionally generate a
non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
matched by the backend.

For the new test case:

const int table[2] = {1, 2};
int foo (char i) { return table[i]; }

compiling with -O2 -mlarge on msp430 we currently see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

which results in the following code:

foo:    AND     #0xff, R12
        RLAM.A #4, R12 { RRAM.A #4, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

With this patch, we now see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
allowing combination of insns 2 and 7
original costs 4 + 8 = 12
replacement cost 8

foo:    MOV.B   R12, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?

2023-10-14  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR rtl-optimization/91865
        * combine.cc (make_compound_operation): Avoid creating a
        ZERO_EXTEND of a ZERO_EXTEND.

gcc/testsuite/ChangeLog
        PR rtl-optimization/91865
        * gcc.target/msp430/pr91865.c: New test case.


Thanks in advance,
Roger
--


[-- Attachment #2: patchmc.txt --]
[-- Type: text/plain, Size: 965 bytes --]

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 360aa2f25e6..f47ff596782 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -8453,6 +8453,9 @@ make_compound_operation (rtx x, enum rtx_code in_code)
 					    new_rtx, GET_MODE (XEXP (x, 0)));
       if (tem)
 	return tem;
+      /* Avoid creating a ZERO_EXTEND of a ZERO_EXTEND.  */
+      if (GET_CODE (new_rtx) == ZERO_EXTEND)
+	new_rtx = XEXP (new_rtx, 0);
       SUBST (XEXP (x, 0), new_rtx);
       return x;
     }
diff --git a/gcc/testsuite/gcc.target/msp430/pr91865.c b/gcc/testsuite/gcc.target/msp430/pr91865.c
new file mode 100644
index 00000000000..8cc21c8b9e8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/pr91865.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlarge" } */
+
+const int table[2] = {1, 2};
+int foo (char i) { return table[i]; }
+
+/* { dg-final { scan-assembler-not "AND" } } */
+/* { dg-final { scan-assembler-not "RRAM" } } */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.
  2023-10-14 22:14 [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation Roger Sayle
@ 2023-10-14 23:02 ` Jeff Law
  2023-10-15  9:49   ` Roger Sayle
  2023-10-19 15:20 ` Jeff Law
  1 sibling, 1 reply; 8+ messages in thread
From: Jeff Law @ 2023-10-14 23:02 UTC (permalink / raw)
  To: Roger Sayle, gcc-patches



On 10/14/23 16:14, Roger Sayle wrote:
> 
> This patch is my proposed solution to PR rtl-optimization/91865.
> Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
> to a single ZERO_EXTEND, but as shown in this PR it is possible for
> combine's make_compound_operation to unintentionally generate a
> non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
> matched by the backend.
> 
> For the new test case:
> 
> const int table[2] = {1, 2};
> int foo (char i) { return table[i]; }
> 
> compiling with -O2 -mlarge on msp430 we currently see:
> 
> Trying 2 -> 7:
>      2: r25:HI=zero_extend(R12:QI)
>        REG_DEAD R12:QI
>      7: r28:PSI=sign_extend(r25:HI)#0
>        REG_DEAD r25:HI
> Failed to match this instruction:
> (set (reg:PSI 28 [ iD.1772 ])
>      (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
> 
> which results in the following code:
> 
> foo:    AND     #0xff, R12
>          RLAM.A #4, R12 { RRAM.A #4, R12
>          RLAM.A  #1, R12
>          MOVX.W  table(R12), R12
>          RETA
> 
> With this patch, we now see:
> 
> Trying 2 -> 7:
>      2: r25:HI=zero_extend(R12:QI)
>        REG_DEAD R12:QI
>      7: r28:PSI=sign_extend(r25:HI)#0
>        REG_DEAD r25:HI
> Successfully matched this instruction:
> (set (reg:PSI 28 [ iD.1772 ])
>      (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
> allowing combination of insns 2 and 7
> original costs 4 + 8 = 12
> replacement cost 8
> 
> foo:    MOV.B   R12, R12
>          RLAM.A  #1, R12
>          MOVX.W  table(R12), R12
>          RETA
> 
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
> 
> 2023-10-14  Roger Sayle  <roger@nextmovesoftware.com>
> 
> gcc/ChangeLog
>          PR rtl-optimization/91865
>          * combine.cc (make_compound_operation): Avoid creating a
>          ZERO_EXTEND of a ZERO_EXTEND.
> 
> gcc/testsuite/ChangeLog
>          PR rtl-optimization/91865
>          * gcc.target/msp430/pr91865.c: New test case.
Neither an ACK or NAK at this point.

The bug report includes a patch from Segher which purports to fix this 
in simplify-rtx.  Any thoughts on Segher's approach and whether or not 
it should be considered?

The BZ also indicates that removal of 2 patterns from msp430.md would 
solve this too (though it may cause regressions elsewhere?).  Any 
thoughts on that approach as well?

Thanks,
jeff


^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.
  2023-10-14 23:02 ` Jeff Law
@ 2023-10-15  9:49   ` Roger Sayle
  2023-10-16 19:27     ` Jeff Law
  0 siblings, 1 reply; 8+ messages in thread
From: Roger Sayle @ 2023-10-15  9:49 UTC (permalink / raw)
  To: 'Jeff Law', gcc-patches


Hi Jeff,
Thanks for the speedy review(s).

> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: 15 October 2023 00:03
> To: Roger Sayle <roger@nextmovesoftware.com>; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in
> make_compound_operation.
> 
> On 10/14/23 16:14, Roger Sayle wrote:
> >
> > This patch is my proposed solution to PR rtl-optimization/91865.
> > Normally RTX simplification canonicalizes a ZERO_EXTEND of a
> > ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is
> > possible for combine's make_compound_operation to unintentionally
> > generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is
> > unlikely to be matched by the backend.
> >
> > For the new test case:
> >
> > const int table[2] = {1, 2};
> > int foo (char i) { return table[i]; }
> >
> > compiling with -O2 -mlarge on msp430 we currently see:
> >
> > Trying 2 -> 7:
> >      2: r25:HI=zero_extend(R12:QI)
> >        REG_DEAD R12:QI
> >      7: r28:PSI=sign_extend(r25:HI)#0
> >        REG_DEAD r25:HI
> > Failed to match this instruction:
> > (set (reg:PSI 28 [ iD.1772 ])
> >      (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
> >
> > which results in the following code:
> >
> > foo:    AND     #0xff, R12
> >          RLAM.A #4, R12 { RRAM.A #4, R12
> >          RLAM.A  #1, R12
> >          MOVX.W  table(R12), R12
> >          RETA
> >
> > With this patch, we now see:
> >
> > Trying 2 -> 7:
> >      2: r25:HI=zero_extend(R12:QI)
> >        REG_DEAD R12:QI
> >      7: r28:PSI=sign_extend(r25:HI)#0
> >        REG_DEAD r25:HI
> > Successfully matched this instruction:
> > (set (reg:PSI 28 [ iD.1772 ])
> >      (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ]))) allowing
> > combination of insns 2 and 7 original costs 4 + 8 = 12 replacement
> > cost 8
> >
> > foo:    MOV.B   R12, R12
> >          RLAM.A  #1, R12
> >          MOVX.W  table(R12), R12
> >          RETA
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> > 2023-10-14  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >          PR rtl-optimization/91865
> >          * combine.cc (make_compound_operation): Avoid creating a
> >          ZERO_EXTEND of a ZERO_EXTEND.
> >
> > gcc/testsuite/ChangeLog
> >          PR rtl-optimization/91865
> >          * gcc.target/msp430/pr91865.c: New test case.
> Neither an ACK or NAK at this point.
> 
> The bug report includes a patch from Segher which purports to fix this in simplify-
> rtx.  Any thoughts on Segher's approach and whether or not it should be
> considered?
> 
> The BZ also indicates that removal of 2 patterns from msp430.md would solve this
> too (though it may cause regressions elsewhere?).  Any thoughts on that approach
> as well?
> 

Great questions.  I believe Segher's proposed patch (in comment #4) was an
msp430-specific proof-of-concept workaround rather than intended to be fix.
Eliminating a ZERO_EXTEND simply by changing the mode of a hard register
is not a solution that'll work on many platforms (and therefore not really suitable
for target-independent middle-end code in the RTL optimizers).

For example, zero_extend:TI (and:QI (reg:QI hard_r1) (const_int 0x0f)) can't
universally be reduced to (and:TI (reg:TI hard_r1) (const_int 0x0f)).  Notice that
Segher's code doesn't check TARGET_HARD_REGNO_MODE_OK or 
TARGET_MODES_TIEABLE_P or any of the other backend hooks necessary
to confirm such a transformation is safe/possible.

Secondly, the hard register aspect is a bit of a red herring.  This work-around
fixes the issue in the original BZ description, but not the slightly modified test
case in comment #2 (with a global variable).  This doesn't have a hard register,
but does have the dubious ZERO_EXTEND/SIGN_EXTEND of a ZERO_EXTEND.

The underlying issue, which is applicable to all targets, is that combine.cc's
make_compound_operation is expected to reverse the local transformations
made by expand_compound_operation.  Hence, if an RTL expression is
canonical going into expand_compound_operation, it is expected (hoped)
to be canonical (and equivalent) coming out of make_compound_operation.

Hence, rather than be a MSP430 specific issue, no target should expect (or
be expected to see) a ZERO_EXTEND of a ZERO_EXTEND, or a SIGN_EXTEND
of a ZERO_EXTEND in the RTL stream.  Much like a binary operator with two
CONST_INT operands, or a shift by zero, it's something the middle-end might
reasonably be expected to clean-up. [Yeah, I know... 😊]

However, if it isn't considered a problem that make_compound_operand and
simplify_rtx generate different forms of the same expression, requiring the
backend to match multiple patterns (some for the combine pass, and others
for other RTL passes), there is a simple fix for MSP430; Just add a pattern 
that matches (the slightly odd):

> > (set (reg:PSI 28 [ iD.1772 ])
> >      (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

As a rule of thumb, if the missed optimization bug report includes combine's
diagnostic "Failed to match this instruction:", things can be improved by adding
a pattern (often a define_insn_and_split) that matches the shown RTL.

In this case the perhaps reasonable assumption is that the above would/should
(normally) match the backend's existing (zero_extend:PSI (reg:QI ...)) insn pattern.
Or that's my understanding of why this PR is classified as rtl-optimization and
not target.

Finally, my patch shouldn't block a (updated corrected) variant of Segher's patch
or other solution to PR 91865.  The more (safe) solutions the merrier.

Thanks again.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.
  2023-10-15  9:49   ` Roger Sayle
@ 2023-10-16 19:27     ` Jeff Law
  2023-10-17  8:43       ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Law @ 2023-10-16 19:27 UTC (permalink / raw)
  To: Roger Sayle, gcc-patches



On 10/15/23 03:49, Roger Sayle wrote:
> 
> Hi Jeff,
> Thanks for the speedy review(s).
> 
>> From: Jeff Law <jeffreyalaw@gmail.com>
>> Sent: 15 October 2023 00:03
>> To: Roger Sayle <roger@nextmovesoftware.com>; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in
>> make_compound_operation.
>>
>> On 10/14/23 16:14, Roger Sayle wrote:
>>>
>>> This patch is my proposed solution to PR rtl-optimization/91865.
>>> Normally RTX simplification canonicalizes a ZERO_EXTEND of a
>>> ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is
>>> possible for combine's make_compound_operation to unintentionally
>>> generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is
>>> unlikely to be matched by the backend.
>>>
>>> For the new test case:
>>>
>>> const int table[2] = {1, 2};
>>> int foo (char i) { return table[i]; }
>>>
>>> compiling with -O2 -mlarge on msp430 we currently see:
>>>
>>> Trying 2 -> 7:
>>>       2: r25:HI=zero_extend(R12:QI)
>>>         REG_DEAD R12:QI
>>>       7: r28:PSI=sign_extend(r25:HI)#0
>>>         REG_DEAD r25:HI
>>> Failed to match this instruction:
>>> (set (reg:PSI 28 [ iD.1772 ])
>>>       (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
>>>
>>> which results in the following code:
>>>
>>> foo:    AND     #0xff, R12
>>>           RLAM.A #4, R12 { RRAM.A #4, R12
>>>           RLAM.A  #1, R12
>>>           MOVX.W  table(R12), R12
>>>           RETA
>>>
>>> With this patch, we now see:
>>>
>>> Trying 2 -> 7:
>>>       2: r25:HI=zero_extend(R12:QI)
>>>         REG_DEAD R12:QI
>>>       7: r28:PSI=sign_extend(r25:HI)#0
>>>         REG_DEAD r25:HI
>>> Successfully matched this instruction:
>>> (set (reg:PSI 28 [ iD.1772 ])
>>>       (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ]))) allowing
>>> combination of insns 2 and 7 original costs 4 + 8 = 12 replacement
>>> cost 8
>>>
>>> foo:    MOV.B   R12, R12
>>>           RLAM.A  #1, R12
>>>           MOVX.W  table(R12), R12
>>>           RETA
>>>
>>>
>>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>>> and make -k check, both with and without --target_board=unix{-m32}
>>> with no new failures.  Ok for mainline?
>>>
>>> 2023-10-14  Roger Sayle  <roger@nextmovesoftware.com>
>>>
>>> gcc/ChangeLog
>>>           PR rtl-optimization/91865
>>>           * combine.cc (make_compound_operation): Avoid creating a
>>>           ZERO_EXTEND of a ZERO_EXTEND.
>>>
>>> gcc/testsuite/ChangeLog
>>>           PR rtl-optimization/91865
>>>           * gcc.target/msp430/pr91865.c: New test case.
>> Neither an ACK or NAK at this point.
>>
>> The bug report includes a patch from Segher which purports to fix this in simplify-
>> rtx.  Any thoughts on Segher's approach and whether or not it should be
>> considered?
>>
>> The BZ also indicates that removal of 2 patterns from msp430.md would solve this
>> too (though it may cause regressions elsewhere?).  Any thoughts on that approach
>> as well?
>>
> 
> Great questions.  I believe Segher's proposed patch (in comment #4) was an
> msp430-specific proof-of-concept workaround rather than intended to be fix.
> Eliminating a ZERO_EXTEND simply by changing the mode of a hard register
> is not a solution that'll work on many platforms (and therefore not really suitable
> for target-independent middle-end code in the RTL optimizers).
Thanks.  I didn't really look at Segher's patch, so thanks for digging 
into it.  Certainly just flipping the mode of the hard register isn't 
correct.


> 
> The underlying issue, which is applicable to all targets, is that combine.cc's
> make_compound_operation is expected to reverse the local transformations
> made by expand_compound_operation.  Hence, if an RTL expression is
> canonical going into expand_compound_operation, it is expected (hoped)
> to be canonical (and equivalent) coming out of make_compound_operation.
In theory, correct.


> 
> Hence, rather than be a MSP430 specific issue, no target should expect (or
> be expected to see) a ZERO_EXTEND of a ZERO_EXTEND, or a SIGN_EXTEND
> of a ZERO_EXTEND in the RTL stream.  Much like a binary operator with two
> CONST_INT operands, or a shift by zero, it's something the middle-end might
> reasonably be expected to clean-up. [Yeah, I know... 😊]
Agreed.



> 
>>> (set (reg:PSI 28 [ iD.1772 ])
>>>       (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
> 
> As a rule of thumb, if the missed optimization bug report includes combine's
> diagnostic "Failed to match this instruction:", things can be improved by adding
> a pattern (often a define_insn_and_split) that matches the shown RTL.
Yes, but we also need to ponder if that's the right way to fix any given 
problem.  Sometimes we're going to be better off simplifying elsewhere 
in the pipeline.  I think we can agree this is one of the cases where 
matching the RTL in the backend is not the desired approach.

Occasionally things like those two patterns show up for various reasons. 
  Hopefully they can be removed :-)  Though the first looks awful close 
to something I did for the mn102 (not to be confused with the mn103) 
eons ago.  Partial modes aren't exactly handled well.

> 
> In this case the perhaps reasonable assumption is that the above would/should
> (normally) match the backend's existing (zero_extend:PSI (reg:QI ...)) insn pattern.
> Or that's my understanding of why this PR is classified as rtl-optimization and
> not target.
I wouldn't put a lot of faith in the classification ;-)


> 
> Finally, my patch shouldn't block a (updated corrected) variant of Segher's patch
> or other solution to PR 91865.  The more (safe) solutions the merrier.
Generally agreed.

jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.
  2023-10-16 19:27     ` Jeff Law
@ 2023-10-17  8:43       ` Richard Biener
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Biener @ 2023-10-17  8:43 UTC (permalink / raw)
  To: Jeff Law, Segher Boessenkool; +Cc: Roger Sayle, gcc-patches

On Mon, Oct 16, 2023 at 9:27 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 10/15/23 03:49, Roger Sayle wrote:
> >
> > Hi Jeff,
> > Thanks for the speedy review(s).
> >
> >> From: Jeff Law <jeffreyalaw@gmail.com>
> >> Sent: 15 October 2023 00:03
> >> To: Roger Sayle <roger@nextmovesoftware.com>; gcc-patches@gcc.gnu.org
> >> Subject: Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in
> >> make_compound_operation.
> >>
> >> On 10/14/23 16:14, Roger Sayle wrote:
> >>>
> >>> This patch is my proposed solution to PR rtl-optimization/91865.
> >>> Normally RTX simplification canonicalizes a ZERO_EXTEND of a
> >>> ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is
> >>> possible for combine's make_compound_operation to unintentionally
> >>> generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is
> >>> unlikely to be matched by the backend.
> >>>
> >>> For the new test case:
> >>>
> >>> const int table[2] = {1, 2};
> >>> int foo (char i) { return table[i]; }
> >>>
> >>> compiling with -O2 -mlarge on msp430 we currently see:
> >>>
> >>> Trying 2 -> 7:
> >>>       2: r25:HI=zero_extend(R12:QI)
> >>>         REG_DEAD R12:QI
> >>>       7: r28:PSI=sign_extend(r25:HI)#0
> >>>         REG_DEAD r25:HI
> >>> Failed to match this instruction:
> >>> (set (reg:PSI 28 [ iD.1772 ])
> >>>       (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
> >>>
> >>> which results in the following code:
> >>>
> >>> foo:    AND     #0xff, R12
> >>>           RLAM.A #4, R12 { RRAM.A #4, R12
> >>>           RLAM.A  #1, R12
> >>>           MOVX.W  table(R12), R12
> >>>           RETA
> >>>
> >>> With this patch, we now see:
> >>>
> >>> Trying 2 -> 7:
> >>>       2: r25:HI=zero_extend(R12:QI)
> >>>         REG_DEAD R12:QI
> >>>       7: r28:PSI=sign_extend(r25:HI)#0
> >>>         REG_DEAD r25:HI
> >>> Successfully matched this instruction:
> >>> (set (reg:PSI 28 [ iD.1772 ])
> >>>       (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ]))) allowing
> >>> combination of insns 2 and 7 original costs 4 + 8 = 12 replacement
> >>> cost 8
> >>>
> >>> foo:    MOV.B   R12, R12
> >>>           RLAM.A  #1, R12
> >>>           MOVX.W  table(R12), R12
> >>>           RETA
> >>>
> >>>
> >>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> >>> and make -k check, both with and without --target_board=unix{-m32}
> >>> with no new failures.  Ok for mainline?
> >>>
> >>> 2023-10-14  Roger Sayle  <roger@nextmovesoftware.com>
> >>>
> >>> gcc/ChangeLog
> >>>           PR rtl-optimization/91865
> >>>           * combine.cc (make_compound_operation): Avoid creating a
> >>>           ZERO_EXTEND of a ZERO_EXTEND.
> >>>
> >>> gcc/testsuite/ChangeLog
> >>>           PR rtl-optimization/91865
> >>>           * gcc.target/msp430/pr91865.c: New test case.
> >> Neither an ACK or NAK at this point.
> >>
> >> The bug report includes a patch from Segher which purports to fix this in simplify-
> >> rtx.  Any thoughts on Segher's approach and whether or not it should be
> >> considered?
> >>
> >> The BZ also indicates that removal of 2 patterns from msp430.md would solve this
> >> too (though it may cause regressions elsewhere?).  Any thoughts on that approach
> >> as well?
> >>
> >
> > Great questions.  I believe Segher's proposed patch (in comment #4) was an
> > msp430-specific proof-of-concept workaround rather than intended to be fix.
> > Eliminating a ZERO_EXTEND simply by changing the mode of a hard register
> > is not a solution that'll work on many platforms (and therefore not really suitable
> > for target-independent middle-end code in the RTL optimizers).
> Thanks.  I didn't really look at Segher's patch, so thanks for digging
> into it.  Certainly just flipping the mode of the hard register isn't
> correct.
>
>
> >
> > The underlying issue, which is applicable to all targets, is that combine.cc's
> > make_compound_operation is expected to reverse the local transformations
> > made by expand_compound_operation.  Hence, if an RTL expression is
> > canonical going into expand_compound_operation, it is expected (hoped)
> > to be canonical (and equivalent) coming out of make_compound_operation.
> In theory, correct.
>
>
> >
> > Hence, rather than be a MSP430 specific issue, no target should expect (or
> > be expected to see) a ZERO_EXTEND of a ZERO_EXTEND, or a SIGN_EXTEND
> > of a ZERO_EXTEND in the RTL stream.  Much like a binary operator with two
> > CONST_INT operands, or a shift by zero, it's something the middle-end might
> > reasonably be expected to clean-up. [Yeah, I know... 😊]
> Agreed.
>
>
>
> >
> >>> (set (reg:PSI 28 [ iD.1772 ])
> >>>       (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
> >
> > As a rule of thumb, if the missed optimization bug report includes combine's
> > diagnostic "Failed to match this instruction:", things can be improved by adding
> > a pattern (often a define_insn_and_split) that matches the shown RTL.
> Yes, but we also need to ponder if that's the right way to fix any given
> problem.  Sometimes we're going to be better off simplifying elsewhere
> in the pipeline.  I think we can agree this is one of the cases where
> matching the RTL in the backend is not the desired approach.
>
> Occasionally things like those two patterns show up for various reasons.
>   Hopefully they can be removed :-)  Though the first looks awful close
> to something I did for the mn102 (not to be confused with the mn103)
> eons ago.  Partial modes aren't exactly handled well.
>
> >
> > In this case the perhaps reasonable assumption is that the above would/should
> > (normally) match the backend's existing (zero_extend:PSI (reg:QI ...)) insn pattern.
> > Or that's my understanding of why this PR is classified as rtl-optimization and
> > not target.
> I wouldn't put a lot of faith in the classification ;-)
>
>
> >
> > Finally, my patch shouldn't block a (updated corrected) variant of Segher's patch
> > or other solution to PR 91865.  The more (safe) solutions the merrier.
> Generally agreed.

Looking at the patch I wonder whether handling (zero_extend (zero_extend ..))
shouldn't be done by using simplify_unary_operation instead of
simplify_const_unary_operation
here?  If that's by design then I agree the patch looks reasonable (albeit ugly)
as long as the reverse still works.

But you probably need Seghers ack here.

Thanks,
Richard.


> jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.
  2023-10-14 22:14 [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation Roger Sayle
  2023-10-14 23:02 ` Jeff Law
@ 2023-10-19 15:20 ` Jeff Law
  2023-10-25  9:21   ` [PATCH v2] " Roger Sayle
  1 sibling, 1 reply; 8+ messages in thread
From: Jeff Law @ 2023-10-19 15:20 UTC (permalink / raw)
  To: Roger Sayle, gcc-patches



On 10/14/23 16:14, Roger Sayle wrote:
> 
> This patch is my proposed solution to PR rtl-optimization/91865.
> Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
> to a single ZERO_EXTEND, but as shown in this PR it is possible for
> combine's make_compound_operation to unintentionally generate a
> non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
> matched by the backend.
> 
> For the new test case:
> 
> const int table[2] = {1, 2};
> int foo (char i) { return table[i]; }
> 
> compiling with -O2 -mlarge on msp430 we currently see:
> 
> Trying 2 -> 7:
>      2: r25:HI=zero_extend(R12:QI)
>        REG_DEAD R12:QI
>      7: r28:PSI=sign_extend(r25:HI)#0
>        REG_DEAD r25:HI
> Failed to match this instruction:
> (set (reg:PSI 28 [ iD.1772 ])
>      (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
> 
> which results in the following code:
> 
> foo:    AND     #0xff, R12
>          RLAM.A #4, R12 { RRAM.A #4, R12
>          RLAM.A  #1, R12
>          MOVX.W  table(R12), R12
>          RETA
> 
> With this patch, we now see:
> 
> Trying 2 -> 7:
>      2: r25:HI=zero_extend(R12:QI)
>        REG_DEAD R12:QI
>      7: r28:PSI=sign_extend(r25:HI)#0
>        REG_DEAD r25:HI
> Successfully matched this instruction:
> (set (reg:PSI 28 [ iD.1772 ])
>      (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
> allowing combination of insns 2 and 7
> original costs 4 + 8 = 12
> replacement cost 8
> 
> foo:    MOV.B   R12, R12
>          RLAM.A  #1, R12
>          MOVX.W  table(R12), R12
>          RETA
> 
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
> 
> 2023-10-14  Roger Sayle  <roger@nextmovesoftware.com>
> 
> gcc/ChangeLog
>          PR rtl-optimization/91865
>          * combine.cc (make_compound_operation): Avoid creating a
>          ZERO_EXTEND of a ZERO_EXTEND.
Final question.  Is there a reasonable expectation that we could get a 
similar situation with sign extensions?   If so we probably ought to try 
and handle both.

OK with the obvious change to handle nested sign extensions if you think 
it's useful to do so.  And OK as-is if you don't think handling nested 
sign extensions is useful.

jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.
  2023-10-19 15:20 ` Jeff Law
@ 2023-10-25  9:21   ` Roger Sayle
  2023-10-25 16:26     ` Jeff Law
  0 siblings, 1 reply; 8+ messages in thread
From: Roger Sayle @ 2023-10-25  9:21 UTC (permalink / raw)
  To: 'Jeff Law'; +Cc: gcc-patches, 'Richard Biener'

[-- Attachment #1: Type: text/plain, Size: 3769 bytes --]


Hi Jeff,
Many thanks for the review/approval of my fix for PR rtl-optimization/91865.
Based on your and Richard Biener's feedback, I’d like to propose a revision
calling simplify_unary_operation instead of simplify_const_unary_operation
(i.e. Richi's recommendation).  I was originally concerned that this might
potentially result in unbounded recursion, and testing for ZERO_EXTEND was
safer but "uglier", but testing hasn't shown any issues.  If we do see issues
in the future, it's easy to fall back to the previous version of this patch.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-10-25  Roger Sayle  <roger@nextmovesoftware.com>
            Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
        PR rtl-optimization/91865
        * combine.cc (make_compound_operation): Avoid creating a
        ZERO_EXTEND of a ZERO_EXTEND.

gcc/testsuite/ChangeLog
        PR rtl-optimization/91865
        * gcc.target/msp430/pr91865.c: New test case.


Thanks again,
Roger
--

> -----Original Message-----
> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: 19 October 2023 16:20
> 
> On 10/14/23 16:14, Roger Sayle wrote:
> >
> > This patch is my proposed solution to PR rtl-optimization/91865.
> > Normally RTX simplification canonicalizes a ZERO_EXTEND of a
> > ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is
> > possible for combine's make_compound_operation to unintentionally
> > generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is
> > unlikely to be matched by the backend.
> >
> > For the new test case:
> >
> > const int table[2] = {1, 2};
> > int foo (char i) { return table[i]; }
> >
> > compiling with -O2 -mlarge on msp430 we currently see:
> >
> > Trying 2 -> 7:
> >      2: r25:HI=zero_extend(R12:QI)
> >        REG_DEAD R12:QI
> >      7: r28:PSI=sign_extend(r25:HI)#0
> >        REG_DEAD r25:HI
> > Failed to match this instruction:
> > (set (reg:PSI 28 [ iD.1772 ])
> >      (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))
> >
> > which results in the following code:
> >
> > foo:    AND     #0xff, R12
> >          RLAM.A #4, R12 { RRAM.A #4, R12
> >          RLAM.A  #1, R12
> >          MOVX.W  table(R12), R12
> >          RETA
> >
> > With this patch, we now see:
> >
> > Trying 2 -> 7:
> >      2: r25:HI=zero_extend(R12:QI)
> >        REG_DEAD R12:QI
> >      7: r28:PSI=sign_extend(r25:HI)#0
> >        REG_DEAD r25:HI
> > Successfully matched this instruction:
> > (set (reg:PSI 28 [ iD.1772 ])
> >      (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ]))) allowing
> > combination of insns 2 and 7 original costs 4 + 8 = 12 replacement
> > cost 8
> >
> > foo:    MOV.B   R12, R12
> >          RLAM.A  #1, R12
> >          MOVX.W  table(R12), R12
> >          RETA
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> > 2023-10-14  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >          PR rtl-optimization/91865
> >          * combine.cc (make_compound_operation): Avoid creating a
> >          ZERO_EXTEND of a ZERO_EXTEND.
> Final question.  Is there a reasonable expectation that we could get a
> similar situation with sign extensions?   If so we probably ought to try
> and handle both.
> 
> OK with the obvious change to handle nested sign extensions if you think it's
> useful to do so.  And OK as-is if you don't think handling nested sign extensions is
> useful.
> 
> jeff

[-- Attachment #2: patchmc2.txt --]
[-- Type: text/plain, Size: 1089 bytes --]

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 360aa2f25e6..b1b16ac7bb2 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -8449,8 +8449,8 @@ make_compound_operation (rtx x, enum rtx_code in_code)
   if (code == ZERO_EXTEND)
     {
       new_rtx = make_compound_operation (XEXP (x, 0), next_code);
-      tem = simplify_const_unary_operation (ZERO_EXTEND, GET_MODE (x),
-					    new_rtx, GET_MODE (XEXP (x, 0)));
+      tem = simplify_unary_operation (ZERO_EXTEND, GET_MODE (x),
+				      new_rtx, GET_MODE (XEXP (x, 0)));
       if (tem)
 	return tem;
       SUBST (XEXP (x, 0), new_rtx);
diff --git a/gcc/testsuite/gcc.target/msp430/pr91865.c b/gcc/testsuite/gcc.target/msp430/pr91865.c
new file mode 100644
index 00000000000..8cc21c8b9e8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/pr91865.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlarge" } */
+
+const int table[2] = {1, 2};
+int foo (char i) { return table[i]; }
+
+/* { dg-final { scan-assembler-not "AND" } } */
+/* { dg-final { scan-assembler-not "RRAM" } } */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.
  2023-10-25  9:21   ` [PATCH v2] " Roger Sayle
@ 2023-10-25 16:26     ` Jeff Law
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff Law @ 2023-10-25 16:26 UTC (permalink / raw)
  To: Roger Sayle; +Cc: gcc-patches, 'Richard Biener'



On 10/25/23 03:21, Roger Sayle wrote:
> 
> Hi Jeff,
> Many thanks for the review/approval of my fix for PR rtl-optimization/91865.
> Based on your and Richard Biener's feedback, I’d like to propose a revision
> calling simplify_unary_operation instead of simplify_const_unary_operation
> (i.e. Richi's recommendation).  I was originally concerned that this might
> potentially result in unbounded recursion, and testing for ZERO_EXTEND was
> safer but "uglier", but testing hasn't shown any issues.  If we do see issues
> in the future, it's easy to fall back to the previous version of this patch.
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
> 
> 
> 2023-10-25  Roger Sayle  <roger@nextmovesoftware.com>
>              Richard Biener  <rguenther@suse.de>
> 
> gcc/ChangeLog
>          PR rtl-optimization/91865
>          * combine.cc (make_compound_operation): Avoid creating a
>          ZERO_EXTEND of a ZERO_EXTEND.
> 
> gcc/testsuite/ChangeLog
>          PR rtl-optimization/91865
>          * gcc.target/msp430/pr91865.c: New test case.
I'm not terribly worried about recursion.  For the case you want to 
handle, it's going to be picked up by the call to 
simplify_const_unary_operation at the start of simplify_unary_operation. 
  It's only if that fails that we call into simplify_unary_operation_1.

The only thing that even comes close to worrisome to me in this space is 
the asserts in do_SUBST.  But I don't think your patch is likely to make 
the problems with those asserts any worse than they already are.

OK for the trunk.

Jeff


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-10-25 16:26 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-14 22:14 [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation Roger Sayle
2023-10-14 23:02 ` Jeff Law
2023-10-15  9:49   ` Roger Sayle
2023-10-16 19:27     ` Jeff Law
2023-10-17  8:43       ` Richard Biener
2023-10-19 15:20 ` Jeff Law
2023-10-25  9:21   ` [PATCH v2] " Roger Sayle
2023-10-25 16:26     ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).