public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode
@ 2022-08-12 11:23 tnfchris at gcc dot gnu.org
  2022-08-12 12:04 ` [Bug tree-optimization/106594] " rguenth at gcc dot gnu.org
                   ` (27 more replies)
  0 siblings, 28 replies; 29+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2022-08-12 11:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

            Bug ID: 106594
           Summary: [13 Regression] sign-extensions no longer merged into
                    addressing mode
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*

Since around the 5th of August (need to bisect) we no longer generate
addressing modes on load merging the sign extends.

The following example:

extern const int constellation_64qam[64];

void foo(int nbits,
         const char *p_src,
         int *p_dst) {

  while (nbits > 0U) {
    char first = *p_src++;

    char index1 = ((first & 0x3) << 4) | (first >> 4);

    *p_dst++ = constellation_64qam[index1];

    nbits--;
  }
}

used to generate

        orr     w3, w4, w3, lsr 4
        ldr     w3, [x6, w3, sxtw 2]

and now generates:

        orr     w0, w4, w0, lsr 4
        sxtw    x0, w0
        ldr     w0, [x6, x0, lsl 2]

at -O2 (-O1 seems to still be fine).  This is causing a regression in perf in
some of our libraries.

Looks like there's a change in how the operation is expressed.  It used to be

  first_17 = *p_src_28;
  _1 = (int) first_17;
  _2 = _1 << 4;
  _3 = (signed char) _2;

where the shift is done as an int, whereas now it's

  first_16 = *p_src_27;
  first.1_1 = (signed char) first_16;
  _2 = first.1_1 << 4;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
@ 2022-08-12 12:04 ` rguenth at gcc dot gnu.org
  2022-08-12 12:44 ` tnfchris at gcc dot gnu.org
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-08-12 12:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |13.0
                 CC|                            |sayle at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I believe this was some match.pd simplification - why does this affect the
addressing mode?  Is the IVOPTs result different or does it differ later?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
  2022-08-12 12:04 ` [Bug tree-optimization/106594] " rguenth at gcc dot gnu.org
@ 2022-08-12 12:44 ` tnfchris at gcc dot gnu.org
  2022-08-12 14:01 ` roger at nextmovesoftware dot com
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2022-08-12 12:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

--- Comment #2 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> I believe this was some match.pd simplification - why does this affect the
> addressing mode?  Is the IVOPTs result different or does it differ later?

The results in IVOpts don't change aside from the changes mentioned above.  It
looks like this was previously being matched in RTL through combine.

Before we had

(insn 25 24 26 4 (set (reg:DI 118)
        (sign_extend:DI (reg:SI 117))) "/app/example.c":12:35 109
{*extendsidi2_aarch64}
     (expr_list:REG_DEAD (reg:SI 117)
        (nil)))
(insn 26 25 27 4 (set (reg:SI 119 [ constellation_64qam[_6] ])
        (mem/u:SI (plus:DI (mult:DI (reg:DI 118)
                    (const_int 4 [0x4]))
                (reg/f:DI 120)) [1 constellation_64qam[_6]+0 S4 A32]))
"/app/example.c":12:14 52 {*movsi_aarch64}

Trying 25 -> 26:
   25: r118:DI=sign_extend(r117:SI)
      REG_DEAD r117:SI
   26: r119:SI=[r118:DI*0x4+r120:DI]
      REG_DEAD r118:DI
      REG_EQUAL [r118:DI*0x4+`constellation_64qam']
Successfully matched this instruction:
(set (reg:SI 119 [ constellation_64qamD.3590[_6] ])
    (mem/u:SI (plus:DI (mult:DI (sign_extend:DI (reg:SI 117))
                (const_int 4 [0x4]))
            (reg/f:DI 120)) [1 constellation_64qamD.3590[_6]+0 S4 A32]))
allowing combination of insns 25 and 26
original costs 4 + 16 = 20
replacement cost 16
deferring deletion of insn with uid = 25.
modifying insn i3    26: r119:SI=[sign_extend(r117:SI)*0x4+r120:DI]
      REG_DEAD r117:SI
deferring rescan insn with uid = 26.

now, instead we have


(insn 24 23 25 4 (set (reg:DI 116)
        (sign_extend:DI (reg:SI 115))) "/app/example.c":12:35 126
{*extendsidi2_aarch64}
     (expr_list:REG_DEAD (reg:SI 115)
        (nil)))
(insn 25 24 26 4 (set (reg:SI 117 [ constellation_64qam[_5] ])
        (mem/u:SI (plus:DI (mult:DI (reg:DI 116)
                    (const_int 4 [0x4]))
                (reg/f:DI 118)) [1 constellation_64qam[_5]+0 S4 A32]))
"/app/example.c":12:14 52 {*movsi_aarch64}

Trying 24 -> 25:
   24: r116:DI=sign_extend(r115:SI)
      REG_DEAD r115:SI
   25: r117:SI=[r116:DI*0x4+r118:DI]
      REG_DEAD r116:DI
      REG_EQUAL [r116:DI*0x4+`constellation_64qam']
Failed to match this instruction:
(set (reg:SI 117 [ constellation_64qamD.3641[_5] ])
    (mem/u:SI (plus:DI (and:DI (mult:DI (subreg:DI (reg:SI 115) 0)
                    (const_int 4 [0x4]))
                (const_int 252 [0xfc]))
            (reg/f:DI 118)) [1 constellation_64qamD.3641[_5]+0 S4 A32]))


where the sign extend has suddenly turned into an and.

I don't know why though, the two input RTLs look identical to me.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
  2022-08-12 12:04 ` [Bug tree-optimization/106594] " rguenth at gcc dot gnu.org
  2022-08-12 12:44 ` tnfchris at gcc dot gnu.org
@ 2022-08-12 14:01 ` roger at nextmovesoftware dot com
  2022-08-12 14:48 ` tnfchris at gcc dot gnu.org
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: roger at nextmovesoftware dot com @ 2022-08-12 14:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-08-12
             Status|UNCONFIRMED                 |NEW
                 CC|                            |roger at nextmovesoftware dot com
     Ever confirmed|0                           |1

--- Comment #3 from Roger Sayle <roger at nextmovesoftware dot com> ---
Ah interesting.  Because index is a char, the tree-level optimizers realize
that the shift by 4 can be an 8-bit shift instead of an int-sized shift. 
What's interesting is that because of (x & 3) << 4, is used, the optimizers
realize that because index can never be negative, that in the array memory
reference expression constellation_64qam[index], when the 8-bit index is being
sign extended, it is effectively being zero-extended.

I think that the aarch64 backend needs to be taught that in this case (because
of the AND), the zero extension is the same as (can be implemented using) a
sign-extension, i.e. restoring the original code generation.  Practically, the
sxtw;ldr[..lsl 2] above can legitimately be optimized to ldr[..sxtw 2] (or
ldr[..uxtw 2] like LLVM) in this case [cases like this].

(sign_extend:DI (and:SI x (const_int 3))) and
(zero_extend:DI (and:SI x (const_int 3))) should (ideally) produce the exact
same code [the more efficient if two implementations are possible].

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2022-08-12 14:01 ` roger at nextmovesoftware dot com
@ 2022-08-12 14:48 ` tnfchris at gcc dot gnu.org
  2022-08-12 17:01 ` roger at nextmovesoftware dot com
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2022-08-12 14:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

--- Comment #4 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Hmm my concern here is though that we've now introduced two forms to represent
this and may cause an issue in other places where we sink extensions.

Perhaps there should be some canonization somewhere? maybe combine?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2022-08-12 14:48 ` tnfchris at gcc dot gnu.org
@ 2022-08-12 17:01 ` roger at nextmovesoftware dot com
  2022-08-13 11:34 ` [Bug rtl-optimization/106594] " roger at nextmovesoftware dot com
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: roger at nextmovesoftware dot com @ 2022-08-12 17:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

--- Comment #5 from Roger Sayle <roger at nextmovesoftware dot com> ---
Hi Tamar,
I think this is where I need to apologize.  Combine is now canonicalizing these
equivalent RTL expressions to the zero_extend form, on the assumption that zero
extension has no data dependency and is cheaper or at worst the same speed on
many targets.  Unfortunately, for aarch64 there are patterns (splitters or
peephole2s) for optimizing the sign_extend version that don't exist for the
zero_extend version [even though the instruction set is symmetric and should
handle both sxtw/uxtw].  Technically, these were just missed-optimizations
before,  but I'm guessing my changes (to both trees and RTL) lead to changes in
the form that the backend encounters, and leads to a code quality regression.

This should be easy to fix, I just need to get up to speed on the instructions
that aarch64 supports, and which zero extended forms are currently missing.
I'm sure if GCC instead canonicalized to the sign_extend form, that other
targets would show similar asymmetries (it's only when things change that
anyone notices the difference).  I'll see if I can come up with a fix over the
weekend.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2022-08-12 17:01 ` roger at nextmovesoftware dot com
@ 2022-08-13 11:34 ` roger at nextmovesoftware dot com
  2022-08-14 12:07 ` tnfchris at gcc dot gnu.org
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: roger at nextmovesoftware dot com @ 2022-08-13 11:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |rtl-optimization
           Assignee|unassigned at gcc dot gnu.org      |roger at nextmovesoftware dot com
             Status|NEW                         |ASSIGNED

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2022-08-13 11:34 ` [Bug rtl-optimization/106594] " roger at nextmovesoftware dot com
@ 2022-08-14 12:07 ` tnfchris at gcc dot gnu.org
  2022-08-17  1:05 ` hp at gcc dot gnu.org
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2022-08-14 12:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

--- Comment #6 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Hi Roger,

before you spend too much time on this, I just wanted to clarify.

If you're saying this is a target issue where we lack some symmetry on patterns
I would be happy to fix it up and don't really expect you to!

My confusion here is that I am wondering whether combine will match the and
form with zero extend as well.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2022-08-14 12:07 ` tnfchris at gcc dot gnu.org
@ 2022-08-17  1:05 ` hp at gcc dot gnu.org
  2022-10-19  7:11 ` rguenth at gcc dot gnu.org
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: hp at gcc dot gnu.org @ 2022-08-17  1:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

Hans-Peter Nilsson <hp at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hp at gcc dot gnu.org

--- Comment #7 from Hans-Peter Nilsson <hp at gcc dot gnu.org> ---
(In reply to Roger Sayle from comment #5)
> I'm sure if GCC instead canonicalized to the sign_extend form, that other
> targets would show similar asymmetries (it's only when things change that
> anyone notices the difference).  I'll see if I can come up with a fix over
> the weekend.

Maybe I'm hallucinating and I'm certainly late to the game, but perhaps a
REG_NONNEGATIVE note can be added on the insn for the target register when
doing the AND transformation and then combine can be improved to try to replace
ZERO_EXTEND with SIGN_EXTEND when seeing this note on the applicable input
operand?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2022-08-17  1:05 ` hp at gcc dot gnu.org
@ 2022-10-19  7:11 ` rguenth at gcc dot gnu.org
  2022-10-19  7:52 ` roger at nextmovesoftware dot com
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-10-19  7:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2022-10-19  7:11 ` rguenth at gcc dot gnu.org
@ 2022-10-19  7:52 ` roger at nextmovesoftware dot com
  2023-01-17  4:06 ` roger at nextmovesoftware dot com
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: roger at nextmovesoftware dot com @ 2022-10-19  7:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch
             Status|ASSIGNED                    |NEW
           Assignee|roger at nextmovesoftware dot com  |unassigned at gcc dot gnu.org

--- Comment #9 from Roger Sayle <roger at nextmovesoftware dot com> ---
My proposed patch to solve this problem was posted here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601423.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2022-10-19  7:52 ` roger at nextmovesoftware dot com
@ 2023-01-17  4:06 ` roger at nextmovesoftware dot com
  2023-02-27 10:00 ` tnfchris at gcc dot gnu.org
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-01-17  4:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

--- Comment #10 from Roger Sayle <roger at nextmovesoftware dot com> ---
Status update: The x86 backend pieces of my proposed fix have been approved and
committed, but the remaining middle-end pieces have been making slow progress:
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609286.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2023-01-17  4:06 ` roger at nextmovesoftware dot com
@ 2023-02-27 10:00 ` tnfchris at gcc dot gnu.org
  2023-02-27 10:03 ` ktkachov at gcc dot gnu.org
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-02-27 10:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org

--- Comment #11 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
This patch seems to have stalled. CC'ing the maintainers as this is still a
large regression for us.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2023-02-27 10:00 ` tnfchris at gcc dot gnu.org
@ 2023-02-27 10:03 ` ktkachov at gcc dot gnu.org
  2023-03-04 22:26 ` segher at gcc dot gnu.org
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: ktkachov at gcc dot gnu.org @ 2023-02-27 10:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ktkachov at gcc dot gnu.org

--- Comment #12 from ktkachov at gcc dot gnu.org ---
(In reply to Tamar Christina from comment #11)
> This patch seems to have stalled. CC'ing the maintainers as this is still a
> large regression for us.

Roger's latest updated patch was posted recently at
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612840.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2023-02-27 10:03 ` ktkachov at gcc dot gnu.org
@ 2023-03-04 22:26 ` segher at gcc dot gnu.org
  2023-03-05  8:06 ` roger at nextmovesoftware dot com
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: segher at gcc dot gnu.org @ 2023-03-04 22:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

--- Comment #13 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Hi!

Either this should not be P1, or the proposed patch is taking completely the
wrong direction.  P1 means there is a regression.  There is no regression in
combine, in fact the proposed patch would *cause* regressions on many targets!

I certainly welcome making the compound_operation stuff behave better, but the
key point there is *better*, random changes that have not been tested on most
archs (and will likely regress on many) are not okay.  This is stage 1 material
no matter what.

Maybe we need some new target macros saying to not use two shifts, and/or
zero_extract, or sign_extract, etc.  No machine newer than VAX (or was it PDP?)
has real hardware support for that, we are much better off expressing things
with machine instructions that *do* exist only.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2023-03-04 22:26 ` segher at gcc dot gnu.org
@ 2023-03-05  8:06 ` roger at nextmovesoftware dot com
  2023-03-05 12:00 ` roger at nextmovesoftware dot com
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-03-05  8:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

--- Comment #14 from Roger Sayle <roger at nextmovesoftware dot com> ---
This really is a regression, that points to a previously latent/unnoticed bug
in combine.

In GCC 12, combine would take the input RTL and based on target costs transform
it into the better of implementation A or B.  Now in GCC 13, the
tree-optimizers are able to perform this same optimization earlier and so
combine is now given optimal implementation A, where a latent bug always
transforms this to B without ever checking target costs.

The consensus is that performing (more) optimizations at the tree-level is a
good thing, so reverting changes to the tree optimizers (that now produce
better code) is a workaround to a glitch where combine is transforming RTL into
more expensive forms.

There's already a code path in combine that checks/compares costs, it just
isn't being reached any more.

p.s. this has nothing to do with sign_extract/zero_extract, for which hardware
support would be hypothetical, this patch only affects sign_extend/zero_extend,
such as aarch64's sxtw or x86_64's movsx or powerpc's extsw.  If a target has
this functionality, it's unlikely that a sequence of shifts or bit-wise
operations would be better.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2023-03-05  8:06 ` roger at nextmovesoftware dot com
@ 2023-03-05 12:00 ` roger at nextmovesoftware dot com
  2023-03-05 15:23 ` segher at gcc dot gnu.org
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-03-05 12:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

--- Comment #15 from Roger Sayle <roger at nextmovesoftware dot com> ---
An example of combine's temporary lapses of sanity can be seen on powerpc:
Trying 14 -> 15:
   14: %3:SI=sign_extend(r128:SI#2)*sign_extend(r127:SI#2)
      REG_DEAD r128:SI
      REG_DEAD r127:SI
   15: use %3:SI
Failed to match this instruction:
(parallel [
        (use (mult:SI (ashiftrt:SI (ashift:SI (reg:SI 128)
                        (const_int 16 [0x10]))
                    (const_int 16 [0x10]))
                (ashiftrt:SI (ashift:SI (reg:SI 127)
                        (const_int 16 [0x10]))
                    (const_int 16 [0x10]))))
        (set (reg/i:SI 3 3)
            (mult:SI (sign_extend:SI (subreg:HI (reg:SI 128) 2))
                (sign_extend:SI (subreg:HI (reg:SI 127) 2))))
    ])
with the proposed patch, we now see:
Trying 14 -> 15:
   14: %3:SI=sign_extend(r128:SI#2)*sign_extend(r127:SI#2)
      REG_DEAD r128:SI
      REG_DEAD r127:SI
   15: use %3:SI
Failed to match this instruction:
(parallel [
        (use (mult:SI (sign_extend:SI (subreg:HI (reg:SI 128) 2))
                (sign_extend:SI (subreg:HI (reg:SI 127) 2))))
        (set (reg/i:SI 3 3)
            (mult:SI (sign_extend:SI (subreg:HI (reg:SI 128) 2))
                (sign_extend:SI (subreg:HI (reg:SI 127) 2))))
    ])

i.e. the "canonical" form of the expression actually matches the RTL pattern
used/expected by the rs6000 backend.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2023-03-05 12:00 ` roger at nextmovesoftware dot com
@ 2023-03-05 15:23 ` segher at gcc dot gnu.org
  2023-03-05 19:23 ` tnfchris at gcc dot gnu.org
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: segher at gcc dot gnu.org @ 2023-03-05 15:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

--- Comment #16 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Roger Sayle from comment #14)
> This really is a regression, that points to a previously latent/unnoticed
> bug in combine.
> 
> In GCC 12, combine would take the input RTL and based on target costs
> transform it into the better of implementation A or B.

And it still does exactly that.  And your patch would *not*!  It does not
compare the cost of two alternative pieces of code; it just says "if the cost
of calculating some RTL expression as separate code is 4 or less, do not do
any other optimisation here".  It is completely ad-hoc, not an improvement,
will only cause more problems in the future.

> Now in GCC 13, the
> tree-optimizers are able to perform this same optimization earlier and so
> combine is now given optimal implementation A,

So it is not a regression in combine.

> where a latent bug always
> transforms this to B without ever checking target costs.

Latent.  It is not a regression.

> The consensus is that performing (more) optimizations at the tree-level is a
> good thing,

No.  The situation is different, not so simplistic.

At tree level the representation is higher level.  At RTL level it is more
nitty-gritty, very *low* level, very close to machine code.  It is very
important to do optimisations at that level as well.  It is pretty much
impossible to do a good job of low-level optimisations (like instruction
selection) at a high level, i.e. at great distance.  It is a bad idea to try
to such things at a high level.  What we can (and should, and *do*) do is
to in higher level optimisations keep code flexible, only have abstractions
that are easy to work with, etc.; and to have earlier passes output code
that the later passes can work with well.

> so reverting changes to the tree optimizers (that now produce
> better code) is a workaround to a glitch where combine is transforming RTL
> into more expensive forms.

But no one is talking about reverting anything?

> There's already a code path in combine that checks/compares costs, it just
> isn't being reached any more.

Now you completely lost me.

> p.s. this has nothing to do with sign_extract/zero_extract, for which
> hardware support would be hypothetical, this patch only affects
> sign_extend/zero_extend, such as aarch64's sxtw or x86_64's movsx or
> powerpc's extsw.  If a target has this functionality, it's unlikely that a
> sequence of shifts or bit-wise operations would be better.

*_extract is just an example (see various open PRs) of the problems with
compound_operation stuff.  I would just rip out all that stuff, similar to
your patch but a bit more drastic, except that causes regressions on other
targets.

Which I have tested and analysed, btw.  As I said in the patch review, this
really needs to be looked at on more targets.  And it should be done in
stage 1, not in stage 4.

(I am running such a test with your patch on all Linux targets, fwiw).

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2023-03-05 15:23 ` segher at gcc dot gnu.org
@ 2023-03-05 19:23 ` tnfchris at gcc dot gnu.org
  2023-03-06  7:51 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-03-05 19:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

--- Comment #17 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #13)
> Hi!
> 
> Either this should not be P1, or the proposed patch is taking completely the
> wrong direction.  P1 means there is a regression.  There is no regression in
> combine, in fact the proposed patch would *cause* regressions on many
> targets!
> 

It is a regression in that it causes much worse code on AArch64. I don't
particularly care what caused the codegen regression, whether latent or new
functionality.

I just care that we get much worse code now in GCC 13 and that needs to be
fixed.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2023-03-05 19:23 ` tnfchris at gcc dot gnu.org
@ 2023-03-06  7:51 ` rguenth at gcc dot gnu.org
  2023-03-06 10:38 ` rsandifo at gcc dot gnu.org
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-03-06  7:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, sorry for chiming in here ;)  I see that expand_compound_operation handles
sign_extends that can be expressed as zero_extends as the cheaper variant of

 a) the zero_extend
 b) the original sign_extend
 c) what expand_compound_operation recursively makes out of the zero_extend

while zero_extend seems to be always expanded to shifts (if possible)
as canonicalization(?) without considering costs.

So either that canonicalization isn't a canonicalization and thus at the
end of expand_compound_operation we should cost the shift sequence against
x and return x if that was cheaper _or_ the sign_extend case should not
leak non-canonical zero_extend and thus should not consider the bare
zero_extend but only the recursively treated one (supposed the sign_extend
is fine to be kept and not canonicalized to zero_extend).

Thus the problem in combines expand_compound_operation is confusion about
canonicalization vs. transform to a cheaper variant?  The comment of
the function suggests it should do canonicalization and instead
make_compound_operation would be the more appropriate place to do
costing?  (but I see none in the latter ...)

That said, the regression is caused by alternate input to the combine pass,
not by combine itself and from the looks at the IL snippets in comment#2
it might be just nonzero-bits knowledge that differs?

So for expand_compound_operation either

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 053879500b7..bb151446ef3 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -7335,6 +7335,10 @@ expand_compound_operation (rtx x)
   if (GET_CODE (tem) == CLOBBER)
     return x;

+  if (set_src_cost (tem, mode, optimize_this_for_speed_p)
+      > set_src_cost (x, mode, optimize_this_for_speed_p))
+    return x;
+
   return tem;
 }

(which should enable simplification of the sign_extend checking)

or (more radical than suggested above) pruning all costing in this
function (maybe digging in history shows that's what it was before?):

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 053879500b7..5966da55412 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -7231,20 +7231,7 @@ expand_compound_operation (rtx x)
       && ((nonzero_bits (XEXP (x, 0), inner_mode)
           & ~(((unsigned HOST_WIDE_INT) GET_MODE_MASK (inner_mode)) >> 1))
          == 0))
-    {
-      rtx temp = gen_rtx_ZERO_EXTEND (mode, XEXP (x, 0));
-      rtx temp2 = expand_compound_operation (temp);
-
-      /* Make sure this is a profitable operation.  */
-      if (set_src_cost (x, mode, optimize_this_for_speed_p)
-          > set_src_cost (temp2, mode, optimize_this_for_speed_p))
-       return temp2;
-      else if (set_src_cost (x, mode, optimize_this_for_speed_p)
-               > set_src_cost (temp, mode, optimize_this_for_speed_p))
-       return temp;
-      else
-       return x;
-    }
+    x = gen_rtx_ZERO_EXTEND (mode, XEXP (x, 0));

   /* We can optimize some special cases of ZERO_EXTEND.  */
   if (GET_CODE (x) == ZERO_EXTEND)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (18 preceding siblings ...)
  2023-03-06  7:51 ` rguenth at gcc dot gnu.org
@ 2023-03-06 10:38 ` rsandifo at gcc dot gnu.org
  2023-03-06 11:07 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2023-03-06 10:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

--- Comment #19 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
I completely agree with comment 18.  See:
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601472.html
for some more on a similar theme.

make_compound_operation_int already has code to convert
(and (subreg X) (const_int N)) into (zero_extend X) when
nonzero_bits indicates that that's safe.  I think the AArch64
regression can be fixed by extending that to:

  (and (mult (subreg x) (const_int N2)) (const_int N))
  -> (mult (sign_extend X) (const_int N2))

for power-of-N2, with the nonzero_bits test adjusted for N2.
(mult is the canonical form here, since it's part of an address.)

Testing a patch for that.  Hopefully that should be less
risky than changing the expand/make_compound_operation
dance in stage 4.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (19 preceding siblings ...)
  2023-03-06 10:38 ` rsandifo at gcc dot gnu.org
@ 2023-03-06 11:07 ` jakub at gcc dot gnu.org
  2023-03-07 11:32 ` roger at nextmovesoftware dot com
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-03-06 11:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #20 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to rsandifo@gcc.gnu.org from comment #19)
> I completely agree with comment 18.  See:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601472.html
> for some more on a similar theme.
> 
> make_compound_operation_int already has code to convert
> (and (subreg X) (const_int N)) into (zero_extend X) when
> nonzero_bits indicates that that's safe.  I think the AArch64
> regression can be fixed by extending that to:
> 
>   (and (mult (subreg x) (const_int N2)) (const_int N))
>   -> (mult (sign_extend X) (const_int N2))
> 
> for power-of-N2, with the nonzero_bits test adjusted for N2.
> (mult is the canonical form here, since it's part of an address.)
> 
> Testing a patch for that.  Hopefully that should be less
> risky than changing the expand/make_compound_operation
> dance in stage 4.

Complete agreement here, when I was seeing the gcc-patches thread during the
weekend that was exactly what I'd try to do if you haven't posted this comment.
In the nonzero_bits check for this one needs to verify that:
1) low log2(N2) bits don't really matter in N
2) bits above that if zero in N need to correspond to zero in nonzero_bits
after the shift
3) the most significant bit of the SUBREG_REG needs to be clear (so that it
doesn't
matter if it is zero or sign extension)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (20 preceding siblings ...)
  2023-03-06 11:07 ` jakub at gcc dot gnu.org
@ 2023-03-07 11:32 ` roger at nextmovesoftware dot com
  2023-03-13  9:30 ` roger at nextmovesoftware dot com
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-03-07 11:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

--- Comment #21 from Roger Sayle <roger at nextmovesoftware dot com> ---
I completely agree that Richard Sandiford's patch is a much better solution,
but I'd like to counter the claims that the change originally proposed in
comment #8 is obviously universally bad.

Segher has proposed that object code size correlates with the quality of
"combine", so I thought that I'd point out that the original patch reduces code
size on the CSiBE benchmark on x86_64 when compiled with -Os.

#bench,file,before,after,delta,cumsum
bzip2-1.0.2,bzlib,11750,11756,6,6
cg_compiler_opensrc,memory,818,825,7,13
jpeg-6b,jcsample,2814,2804,-10,3
jpeg-6b,jcphuff,3606,3609,3,6
libpng-1.2.5,pngget,3798,3792,-6,0
linux-2.4.23-pre3-testplatform,fs/nfs/nfs2xdr,6070,6069,-1,-1
linux-2.4.23-pre3-testplatform,fs/nfs/nfs3xdr,8229,8225,-4,-5
linux-2.4.23-pre3-testplatform,fs/ext3/balloc,6769,6773,4,-1
linux-2.4.23-pre3-testplatform,fs/ext3/ialloc,6063,6064,1,0
linux-2.4.23-pre3-testplatform,fs/nfsd/nfsfh,5896,5889,-7,-7
linux-2.4.23-pre3-testplatform,fs/nfsd/nfs3xdr,6145,6143,-2,-9
linux-2.4.23-pre3-testplatform,fs/lockd/xdr,5924,5916,-8,-17
linux-2.4.23-pre3-testplatform,fs/lockd/xdr4,5883,5875,-8,-25
linux-2.4.23-pre3-testplatform,fs/inode,8227,8229,2,-23
linux-2.4.23-pre3-testplatform,mm/memory,8877,8880,3,-20
linux-2.4.23-pre3-testplatform,lib/zlib_deflate/deflate,6319,6315,-4,-24
linux-2.4.23-pre3-testplatform,net/ipv4/tcp_ipv4,17587,17588,1,-23
linux-2.4.23-pre3-testplatform,net/sunrpc/auth_unix,2155,2148,-7,-30
linux-2.4.23-pre3-testplatform,net/sunrpc/svcauth,948,947,-1,-31
linux-2.4.23-pre3-testplatform,net/sunrpc/xdr,4033,4043,10,-21
linux-2.4.23-pre3-testplatform,kernel/timer,5106,5108,2,-19

The story with -O2 is more complicated, it does indeed increase code size, but
the effects are greatly inflated due to jump alignment (notice that the
majority  of deltas in the report below are multiples of 16).  If the single
pathological OpenTCP/ip is excluded, the size is reduced over all of the other
tests.

OpenTCP-1.0.4,dns/dns,1793,1825,32,32
OpenTCP-1.0.4,http/http_server,2691,2676,-15,17
OpenTCP-1.0.4,ip,3152,3312,160,177
OpenTCP-1.0.4,tcp,8823,8839,16,193
OpenTCP-1.0.4,udp,2147,2163,16,209
bzip2-1.0.2,compress,17779,17763,-16,193
jikespg-1.3,src/mkfirst,20023,20007,-16,177
jikespg-1.3,src/mkred,8993,16,193
jikespg-1.3,src/produce,14897,14961,-16,177
jikespg-1.3,src/remsp,10678,10694,16,193
jikespg-1.3,src/resolve,17542,17558,16,209
jpeg-6b,jchuff,6439,6423,-16,193
jpeg-6b,jcphuff,7921,7905,-16,177
jpeg-6b,jdmarker,9845,9893,48,225
jpeg-6b,jquant2,6785,6769,-16,209
jpeg-6b,wrtarga,1353,1369,16,225
jpeg-6b,wrbmp,2551,2567,16,241
libmspack,test/cabextract_md5,32951,32935,-16,225
libpng-1.2.5,pngget,4583,4567,-16,209
libpng-1.2.5,pngwutil,21647,21615,-32,177
libpng-1.2.5,pngrtran,26045,26109,64,241
libpng-1.2.5,pngwtran,2539,2555,16,257
linux-2.4.23-pre3-testplatform,fs/nfs/nfs3xdr,14038,14006,-32,225
linux-2.4.23-pre3-testplatform,fs/nfsd/nfs3xdr,13584,13568,-16,209
linux-2.4.23-pre3-testplatform,fs/lockd/xdr,9554,9538,-16,193
linux-2.4.23-pre3-testplatform,fs/lockd/xdr4,7903,7919,16,209
linux-2.4.23-pre3-testplatform,fs/buffer,22824,22840,16,225
linux-2.4.23-pre3-testplatform,mm/filemap,23872,23888,16,241
linux-2.4.23-pre3-testplatform,net/ipv4/ip_input,4189,4173,-16,225
linux-2.4.23-pre3-testplatform,net/ipv4/ip_fragment,7242,7226,-16,209
linux-2.4.23-pre3-testplatform,net/ipv4/ip_options,7664,7680,16,225
linux-2.4.23-pre3-testplatform,net/ipv4/ip_output,10956,10924,-32,193
linux-2.4.23-pre3-testplatform,net/ipv4/tcp_ipv4,22663,22679,16,209
linux-2.4.23-pre3-testplatform,net/ipv4/udp,10365,10349,-16,193
linux-2.4.23-pre3-testplatform,net/ipv4/icmp,8589,8573,-16,177
linux-2.4.23-pre3-testplatform,net/sunrpc/auth_unix,2782,2766,-16,161
linux-2.4.23-pre3-testplatform,net/sunrpc/svcauth,1172,1156,-16,145
linux-2.4.23-pre3-testplatform,drivers/char/raw,4860,4876,16,161
linux-2.4.23-pre3-testplatform,kernel/exit,5485,5469,-16,145
linux-2.4.23-pre3-testplatform,kernel/timer,7257,7273,16,161
lwip-0.5.3.preproc,src/core/ipv4/ip,1883,1899,16,177
lwip-0.5.3.preproc,src/core/tcp_input,5513,5497,-16,161
lwip-0.5.3.preproc,src/core/tcp_output,3290,3354,64,225
teem-1.6.0-src,src/gage/st,5248,5264,16,241
teem-1.6.0-src,src/nrrd/apply1D,10837,10789,-48,193
unrarlib-0.4.0,unrarlib/unrarlib,16682,16666,-16,177
zlib-1.1.4,deflate,8721,8689,-32,145

Picking "lwip-0.5.3.preproc,src/core/tcp_output" as an example size regression,
the first difference in code is:

Before:
83 c2 05                add    $0x5,%edx
89 d3                   mov    %edx,%ebx
c1 e3 0c                shl    $0xc,%ebx

After:
c1 e2 0c                shl    $0xc,%edx
8d 9a 00 50 00 00       lea    0x5000(%rdx),%ebx

Notice that the size has increased by a byte, but the new sequence is actually
now only two instructions compared to the original three.

Let's just say the situation is complicated (comparing code size when not
optimizing for code size may be misleading), but importantly it is possible to
do better than the current expand_compound_operation/make_compound_operation.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (21 preceding siblings ...)
  2023-03-07 11:32 ` roger at nextmovesoftware dot com
@ 2023-03-13  9:30 ` roger at nextmovesoftware dot com
  2023-04-17 11:41 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-03-13  9:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=100378

--- Comment #22 from Roger Sayle <roger at nextmovesoftware dot com> ---
This issue is related to PR 100378, which is another combine-related
regression.
In fact, PR 100378 is resolved by the (now disapproved) patch in comment #8.
Presumably a variant of Richard Sandiford's make_compound_operation fix should
be able to handle PR100378 too.  Resolving multiple (long-standing) regressions
really should be suitable for stage 4.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (22 preceding siblings ...)
  2023-03-13  9:30 ` roger at nextmovesoftware dot com
@ 2023-04-17 11:41 ` jakub at gcc dot gnu.org
  2023-04-17 12:58 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-04-17 11:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

--- Comment #23 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
This is just missed optimization and likely to be resolved only for GCC 14 and
perhaps later backported.  Downgrading to P2.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (23 preceding siblings ...)
  2023-04-17 11:41 ` jakub at gcc dot gnu.org
@ 2023-04-17 12:58 ` jakub at gcc dot gnu.org
  2023-04-26  6:56 ` [Bug rtl-optimization/106594] [13/14 " rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-04-17 12:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P2

--- Comment #24 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13/14 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (24 preceding siblings ...)
  2023-04-17 12:58 ` jakub at gcc dot gnu.org
@ 2023-04-26  6:56 ` rguenth at gcc dot gnu.org
  2023-07-27  9:23 ` rguenth at gcc dot gnu.org
  2023-10-30 16:19 ` segher at gcc dot gnu.org
  27 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-04-26  6:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|13.0                        |13.2

--- Comment #25 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 13.1 is being released, retargeting bugs to GCC 13.2.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13/14 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (25 preceding siblings ...)
  2023-04-26  6:56 ` [Bug rtl-optimization/106594] [13/14 " rguenth at gcc dot gnu.org
@ 2023-07-27  9:23 ` rguenth at gcc dot gnu.org
  2023-10-30 16:19 ` segher at gcc dot gnu.org
  27 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-27  9:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|13.2                        |13.3

--- Comment #26 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 13.2 is being released, retargeting bugs to GCC 13.3.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug rtl-optimization/106594] [13/14 Regression] sign-extensions no longer merged into addressing mode
  2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
                   ` (26 preceding siblings ...)
  2023-07-27  9:23 ` rguenth at gcc dot gnu.org
@ 2023-10-30 16:19 ` segher at gcc dot gnu.org
  27 siblings, 0 replies; 29+ messages in thread
From: segher at gcc dot gnu.org @ 2023-10-30 16:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

--- Comment #27 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Roger Sayle from comment #21)
> Segher has proposed that object code size correlates with the quality of

It isn't a proposition, it is a simple and obvious fact.  But, this isn't
exactly
what I say :-)

Code size strongly correlates with number of instructions, almost 1-1 on most
targets.  Number of instructions is exactly what combine tries to reduce.

Whether that makes the code actually better is something completely separate as
well.  If your instruction cost function (and please use insn_cost, it is much
easier to use, and thus gives better results than rtx_costs) is good, this of
course should work fine.  And there is a hook (TARGET_LEGITIMATE_COMBINED_INSN)
for the very nasty cases.

But the whole "fewer insns that do the same thing, is better" thing is not
actually true on some targets.  Such targets are incredibly hard to optimise
for.  There is no way combine can do a good job for such targets.  It is
incredibly hard for human programmers to write good machine code for such
systems
by hand as well.

I do use object code size **of a huge sample** as a quick and dirty sniff test
to see if a change to combines is good or bad.  After that I always look at the
actual changes as well.  I do realise all pitfalls associated with this :-)

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-10-30 16:19 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-12 11:23 [Bug tree-optimization/106594] New: [13 Regression] sign-extensions no longer merged into addressing mode tnfchris at gcc dot gnu.org
2022-08-12 12:04 ` [Bug tree-optimization/106594] " rguenth at gcc dot gnu.org
2022-08-12 12:44 ` tnfchris at gcc dot gnu.org
2022-08-12 14:01 ` roger at nextmovesoftware dot com
2022-08-12 14:48 ` tnfchris at gcc dot gnu.org
2022-08-12 17:01 ` roger at nextmovesoftware dot com
2022-08-13 11:34 ` [Bug rtl-optimization/106594] " roger at nextmovesoftware dot com
2022-08-14 12:07 ` tnfchris at gcc dot gnu.org
2022-08-17  1:05 ` hp at gcc dot gnu.org
2022-10-19  7:11 ` rguenth at gcc dot gnu.org
2022-10-19  7:52 ` roger at nextmovesoftware dot com
2023-01-17  4:06 ` roger at nextmovesoftware dot com
2023-02-27 10:00 ` tnfchris at gcc dot gnu.org
2023-02-27 10:03 ` ktkachov at gcc dot gnu.org
2023-03-04 22:26 ` segher at gcc dot gnu.org
2023-03-05  8:06 ` roger at nextmovesoftware dot com
2023-03-05 12:00 ` roger at nextmovesoftware dot com
2023-03-05 15:23 ` segher at gcc dot gnu.org
2023-03-05 19:23 ` tnfchris at gcc dot gnu.org
2023-03-06  7:51 ` rguenth at gcc dot gnu.org
2023-03-06 10:38 ` rsandifo at gcc dot gnu.org
2023-03-06 11:07 ` jakub at gcc dot gnu.org
2023-03-07 11:32 ` roger at nextmovesoftware dot com
2023-03-13  9:30 ` roger at nextmovesoftware dot com
2023-04-17 11:41 ` jakub at gcc dot gnu.org
2023-04-17 12:58 ` jakub at gcc dot gnu.org
2023-04-26  6:56 ` [Bug rtl-optimization/106594] [13/14 " rguenth at gcc dot gnu.org
2023-07-27  9:23 ` rguenth at gcc dot gnu.org
2023-10-30 16:19 ` segher at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).