[Bug target/109874] New: [SH] GCC 13's -Os code is 50% bigger than GCC 4's

public inbox for gcc-bugs@sourceware.org

* [Bug target/109874] New: [SH] GCC 13's -Os code is 50% bigger than GCC 4's
@ 2023-05-16 10:51 paul at crapouillou dot net
  2023-05-16 12:16 ` [Bug target/109874] " dkm at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: paul at crapouillou dot net @ 2023-05-16 10:51 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109874

            Bug ID: 109874
           Summary: [SH] GCC 13's -Os code is 50% bigger than GCC 4's
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: paul at crapouillou dot net
  Target Milestone: ---

Using the following C code snippet:

------
unsigned int CHRmask1,CHRmask2,CHRmask4,CHRmask8;

void SetupCartCHRMapping(unsigned int size)
{
#if 0
    CHRmask1 = (size >> 10) - 1;
    CHRmask2 = (size >> 11) - 1;
    CHRmask4 = (size >> 12) - 1;
    CHRmask8 = (size >> 13) - 1;
#else
    size >>= 10;

    CHRmask1 = size - 1;
    size >>= 1;
    CHRmask2 = size - 1;
    size >>= 1;
    CHRmask4 = size - 1;
    size >>= 1;
    CHRmask8 = size - 1;
#endif
}
------

Compiling with -Os, GCC 13.1 generates exactly the same code for both
cases, as it correctly detects that they are functionally the same:

------
_SetupCartCHRMapping:
        mov.l   r12,@-r15
        mova    .L3,r0
        mov.l   .L3,r12
        mov     r4,r1
        shlr8   r1
        add     r0,r12
        mov.l   .L4,r0
        shlr2   r1
        add     #-1,r1
        mov.l   r1,@(r0,r12)
        mov     r4,r1
        shlr8   r1
        mov.l   .L5,r0
        shlr    r1
        shlr2   r1
        add     #-1,r1
        mov.l   r1,@(r0,r12)
        mov     r4,r1
        shlr8   r1
        mov.l   .L6,r0
        shlr2   r1
        shlr2   r1
        shlr8   r4
        add     #-1,r1
        shlr2   r4
        mov.l   r1,@(r0,r12)
        shlr    r4
        mov.l   .L7,r0
        shlr2   r4
        add     #-1,r4
        mov.l   r4,@(r0,r12)
        rts     
        mov.l   @r15+,r12
.L3:
        .long   _GLOBAL_OFFSET_TABLE_
.L4:
        .long   _CHRmask1@GOTOFF
.L5:
        .long   _CHRmask2@GOTOFF
.L6:
        .long   _CHRmask4@GOTOFF
.L7:
        .long   _CHRmask8@GOTOFF
_CHRmask8:
        .zero   4
_CHRmask4:
        .zero   4
_CHRmask2:
        .zero   4
_CHRmask1:
        .zero   4
------

The code part (excluding labels and data fields) is 33 instructions.

GCC 4.9.4 does not detect that the two versions of the code are equivalent, and
generates different machine code for them. The second version produces the
smallest code, at only 21 instructions:

------
_SetupCartCHRMapping:
        shlr8   r4
        shlr2   r4
        mov.l   .L2,r1
        mov     r4,r2
        add     #-1,r2
        mov.l   r2,@r1
        mov     r4,r1
        mov.l   .L3,r2
        shlr    r1
        add     #-1,r1
        mov.l   r1,@r2
        shlr2   r4
        mov.l   .L4,r1
        mov     r4,r2
        add     #-1,r2
        mov.l   r2,@r1
        shlr    r4
        mov.l   .L5,r1
        add     #-1,r4
        rts     
        mov.l   r4,@r1
.L2:
        .long   _CHRmask1
.L3:
        .long   _CHRmask2
.L4:
        .long   _CHRmask4
.L5:
        .long   _CHRmask8
------

So GCC 13.1 at -Os generates code that is over 50% bigger (33 vs. 21
instructions) than what GCC 4 generates for a functionally equivalent
algorithm.
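
For reference, the claim that the two variants are functionally the same can be checked on a host machine with a small harness. This is only a sketch: `masks_direct`, `masks_incremental` and `count_mismatches` are names made up for this illustration, and the four globals are replaced by an output array so that both variants can be compared side by side.

```c
#include <stddef.h>

/* Variant from the "#if 0" branch: every mask is shifted from the original size. */
static void masks_direct(unsigned int size, unsigned int out[4])
{
    out[0] = (size >> 10) - 1;
    out[1] = (size >> 11) - 1;
    out[2] = (size >> 12) - 1;
    out[3] = (size >> 13) - 1;
}

/* Variant from the "#else" branch: each mask reuses the previous shift result. */
static void masks_incremental(unsigned int size, unsigned int out[4])
{
    size >>= 10;
    out[0] = size - 1;
    size >>= 1;
    out[1] = size - 1;
    size >>= 1;
    out[2] = size - 1;
    size >>= 1;
    out[3] = size - 1;
}

/* Count the sizes in [first, last] (first <= last assumed) for which the
   two variants disagree in at least one mask. */
static unsigned int count_mismatches(unsigned int first, unsigned int last)
{
    unsigned int mismatches = 0;
    for (unsigned int size = first; ; size++) {
        unsigned int a[4], b[4];
        masks_direct(size, a);
        masks_incremental(size, b);
        for (size_t i = 0; i < 4; i++) {
            if (a[i] != b[i]) {
                mismatches++;
                break;
            }
        }
        if (size == last)   /* checked before the increment, so last may be UINT_MAX */
            break;
    }
    return mismatches;
}
```

A call such as `count_mismatches(0, 1u << 20)` is expected to return 0, since for unsigned operands `(x >> a) >> b` equals `x >> (a + b)` as long as `a + b` is smaller than the width of the type.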

* [Bug target/109874] [SH] GCC 13's -Os code is 50% bigger than GCC 4's
From: dkm at gcc dot gnu.org @ 2023-05-16 12:16 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109874

--- Comment #1 from Marc Poulhiès <dkm at gcc dot gnu.org> ---
Forcing GCC 13 to emit non-PIC code (as in the GCC 4 output) shaves off a few
insns, bringing it down to 28:

```
_SetupCartCHRMapping:
        mov     r4,r1
        mov.l   .L3,r2
        shlr8   r1
        shlr2   r1
        add     #-1,r1
        mov.l   r1,@r2
        mov     r4,r1
        shlr8   r1
        mov.l   .L4,r2
        shlr    r1
        shlr2   r1
        add     #-1,r1
        mov.l   r1,@r2
        mov     r4,r1
        shlr8   r1
        mov.l   .L5,r2
        shlr2   r1
        shlr2   r1
        shlr8   r4
        add     #-1,r1
        shlr2   r4
        mov.l   r1,@r2
        shlr    r4
        mov.l   .L6,r1
        shlr2   r4
        add     #-1,r4
        rts     
        mov.l   r4,@r1
.L3:
        .long   _CHRmask1
.L4:
        .long   _CHRmask2
.L5:
        .long   _CHRmask4
.L6:
        .long   _CHRmask8
_CHRmask8:
        .zero   4
_CHRmask4:
        .zero   4
_CHRmask2:
        .zero   4
_CHRmask1:
        .zero   4
```
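
To put the instruction counts quoted in this thread in proportion, here is a trivial helper. The name `percent_larger` is made up for this note, and the result is truncated to whole percent.

```c
/* Size increase of `count` over `baseline`, in whole percent (truncated).
   Assumes baseline > 0. */
static int percent_larger(int count, int baseline)
{
    return (count - baseline) * 100 / baseline;
}
```

With the counts above, `percent_larger(33, 21)` gives 57 for the PIC output and `percent_larger(28, 21)` gives 33 for the non-PIC output, so the non-PIC GCC 13 code is still about a third larger than the GCC 4.9.4 code.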

* [Bug target/109874] [SH] GCC 13's -Os code is 50% bigger than GCC 4's
From: rguenth at gcc dot gnu.org @ 2023-05-17  7:06 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109874

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-05-17
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
             Target|                            |sh*

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
It looks like the target cannot do arbitrary constant shifts so it benefits
from shifting incrementally.  Even if that is exposed early enough for CSE the
optimal sequences for shifting by 10, 11, 12 and 13 could prevent CSE here.

I'm not sure if there are other targets affected but this is a "global"
optimization problem which for example also affects optimal power expansion.

Generally strength-reduction techniques apply to improve these kinds of
things, possibly in a machine dependent pass.

The regression was likely introduced when merging the shifts at the GIMPLE
level without considering the uses of the intermediate values (after the
transform the values can be computed in parallel since the dependency chains
are shortened).
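
The effect on shift counts can be illustrated with a small cost model. This is a sketch under stated assumptions, not anything taken from GCC: it assumes the only single-instruction constant shifts are by 1, 2, 8 and 16 (shlr, shlr2, shlr8, shlr16), it counts shift instructions only and ignores register copies, and the names `min_shift_insns`, `cost_independent` and `cost_incremental` are invented for the illustration.

```c
#include <limits.h>

/* Shift distances assumed to be available as single instructions;
   this list is an assumption of the sketch. */
static const int sh_steps[] = { 1, 2, 8, 16 };

/* Minimum number of single-step shifts whose distances sum to `amount`
   (0 <= amount <= 31), found with a small dynamic program. */
static int min_shift_insns(int amount)
{
    int best[32];
    best[0] = 0;
    for (int n = 1; n <= amount; n++) {
        best[n] = INT_MAX;
        for (unsigned i = 0; i < sizeof sh_steps / sizeof sh_steps[0]; i++) {
            int prev = n - sh_steps[i];
            if (prev >= 0 && best[prev] + 1 < best[n])
                best[n] = best[prev] + 1;
        }
    }
    return best[amount];
}

/* Total shifts when every amount is computed from the original value. */
static int cost_independent(const int *amounts, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++)
        total += min_shift_insns(amounts[i]);
    return total;
}

/* Total shifts when each amount (given in ascending order) continues
   from the previous result. */
static int cost_incremental(const int *amounts, int count)
{
    int total = 0, prev = 0;
    for (int i = 0; i < count; i++) {
        total += min_shift_insns(amounts[i] - prev);
        prev = amounts[i];
    }
    return total;
}
```

For the amounts 10, 11, 12 and 13 this model yields 12 shifts when each is computed independently and 5 when they are chained, which matches the number of shlr/shlr2/shlr8 instructions in the GCC 13 and GCC 4.9.4 listings respectively.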

* [Bug target/109874] [SH] GCC 13's -Os code is 50% bigger than GCC 4's
From: olegendo at gcc dot gnu.org @ 2023-07-07  6:12 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109874

Oleg Endo <olegendo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |olegendo at gcc dot gnu.org

--- Comment #3 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> It looks like the target cannot do arbitrary constant shifts so it benefits
> from shifting incrementally.  Even if that is exposed early enough for CSE
> the optimal sequences for shifting by 10, 11, 12 and 13 could prevent CSE
> here.

That's right.  SH1 and SH2 don't have a barrel shifter and need stitched
constant shifts.  In some cases we resort to a runtime library call to avoid
code bloat.

There are a couple of opportunities when sharing intermediate results of
incremental / stitched shifts.  A while ago I had the idea of writing an RTL
pass that would try to figure that out...

In this case the shifts are expanded to RTL with the constant shift amounts
already propagated and the incremental shifts removed, so it's a bit harder to
undo this at the RTL level, but not impossible.

On SH3 and SH4, dynamic shifts are available, but they require another
register plus a constant load.  Incremental / stitched shifts would always be
better on SH for this test case.
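
The core of the "sharing intermediate results" idea can be sketched in plain C. This is not how a GCC RTL pass is written; it only shows the bookkeeping on an array of constant shift amounts that all apply to the same operand, and `plan_incremental_shifts` and `compare_ints` are hypothetical names.

```c
#include <stdlib.h>

/* qsort comparator for ascending int order. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sorts `amounts` in place and writes to `deltas`
   the extra distance each result needs on top of the previous one.
   deltas[0] is relative to the unshifted input. */
static void plan_incremental_shifts(int *amounts, int *deltas, size_t count)
{
    qsort(amounts, count, sizeof amounts[0], compare_ints);
    int prev = 0;
    for (size_t i = 0; i < count; i++) {
        deltas[i] = amounts[i] - prev;
        prev = amounts[i];
    }
}
```

For the amounts in this test case the plan is a shift by 10 followed by three shifts by 1. A real pass would additionally have to prove that all shifts use the same source value and that the intermediate results are still live where they are needed.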
