public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/112573] New: Suboptimal code generation with `-fdata-sections` on aarch64
@ 2023-11-16 21:30 jwerner at chromium dot org
  2023-11-16 22:01 ` [Bug target/112573] " pinskia at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: jwerner at chromium dot org @ 2023-11-16 21:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112573

            Bug ID: 112573
           Summary: Suboptimal code generation with `-fdata-sections` on
                    aarch64
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jwerner at chromium dot org
  Target Milestone: ---
            Target: aarch64

I noticed some weird code generation behavior in aarch64 that seems to be a new
regression in GCC 13 or 12. Consider the following minimal test case:

char a[32];

void f(int x, int y)
{
        a[y + 3] = (char)x;
        a[y + 2] = (char)(x >> 8);
        a[y + 1] = (char)(x >> 16);
        a[y + 0] = (char)(x >> 24);
}

void h(int x, int y)
{
        *((a + y) + 3) = (char)x;
        *((a + y) + 2) = (char)(x >> 8);
        *((a + y) + 1) = (char)(x >> 16);
        *((a + y) + 0) = (char)(x >> 24);
}

These two functions should, of course, be 100% equivalent. Strangely enough
they still don't give the same code when compiling with -Os, but the difference
is minimal:

0000000000000000 <f> (File Offset: 0x40):
   0:   11000c23        add     w3, w1, #0x3
   4:   90000002        adrp    x2, 0 <f> (File Offset: 0x40)
                        4: R_AARCH64_ADR_PREL_PG_HI21   .bss
   8:   91000042        add     x2, x2, #0x0
                        8: R_AARCH64_ADD_ABS_LO12_NC    .bss
   c:   13087c04        asr     w4, w0, #8
  10:   3823c840        strb    w0, [x2, w3, sxtw]
  14:   11000823        add     w3, w1, #0x2
  18:   3823c844        strb    w4, [x2, w3, sxtw]
  1c:   11000423        add     w3, w1, #0x1
  20:   13107c04        asr     w4, w0, #16
  24:   13187c00        asr     w0, w0, #24
  28:   3823c844        strb    w4, [x2, w3, sxtw]
  2c:   3821c840        strb    w0, [x2, w1, sxtw]
  30:   d65f03c0        ret

0000000000000034 <h> (File Offset: 0x74):
  34:   90000002        adrp    x2, 0 <f> (File Offset: 0x40)
                        34: R_AARCH64_ADR_PREL_PG_HI21  .bss
  38:   91000042        add     x2, x2, #0x0
                        38: R_AARCH64_ADD_ABS_LO12_NC   .bss
  3c:   8b21c043        add     x3, x2, w1, sxtw
  40:   13087c04        asr     w4, w0, #8
  44:   39000864        strb    w4, [x3, #2]
  48:   13107c04        asr     w4, w0, #16
  4c:   39000464        strb    w4, [x3, #1]
  50:   39000c60        strb    w0, [x3, #3]
  54:   13187c00        asr     w0, w0, #24
  58:   3821c840        strb    w0, [x2, w1, sxtw]
  5c:   d65f03c0        ret

However, when I add the -fdata-sections flag, the code for f() remains the
same, but the code for h() becomes this weird thing:

0000000000000034 <h> (File Offset: 0x74):
  34:   93407c21        sxtw    x1, w1
  38:   90000002        adrp    x2, 0 <f> (File Offset: 0x40)
                        38: R_AARCH64_ADR_PREL_PG_HI21  a+0x3
  3c:   91000042        add     x2, x2, #0x0
                        3c: R_AARCH64_ADD_ABS_LO12_NC   a+0x3
  40:   13087c03        asr     w3, w0, #8
  44:   38216840        strb    w0, [x2, x1]
  48:   90000002        adrp    x2, 0 <f> (File Offset: 0x40)
                        48: R_AARCH64_ADR_PREL_PG_HI21  a+0x2
  4c:   91000042        add     x2, x2, #0x0
                        4c: R_AARCH64_ADD_ABS_LO12_NC   a+0x2
  50:   38216843        strb    w3, [x2, x1]
  54:   90000002        adrp    x2, 0 <f> (File Offset: 0x40)
                        54: R_AARCH64_ADR_PREL_PG_HI21  a+0x1
  58:   91000042        add     x2, x2, #0x0
                        58: R_AARCH64_ADD_ABS_LO12_NC   a+0x1
  5c:   13107c03        asr     w3, w0, #16
  60:   13187c00        asr     w0, w0, #24
  64:   38216843        strb    w3, [x2, x1]
  68:   90000002        adrp    x2, 0 <f> (File Offset: 0x40)
                        68: R_AARCH64_ADR_PREL_PG_HI21  a
  6c:   91000042        add     x2, x2, #0x0
                        6c: R_AARCH64_ADD_ABS_LO12_NC   a
  70:   38216840        strb    w0, [x2, x1]
  74:   d65f03c0        ret

There should be absolutely no reason to reload the address of `a` from scratch
for each access. `-fdata-sections` also shouldn't have any influence at all on
how the code inside a function looks like, as far as I'm aware.

When I try this on GCC 11.2.0 instead, the output for h() with -fdata-sections
is the same short code that it is without.

Also, there is of course the question why GCC doesn't just reduce this code to
the `rev` instruction (it doesn't seem to be able to figure that out on either
GCC 11 or GCC 13, but it would be able to do it if the `y` parameter was a
constant instead).

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug target/112573] Suboptimal code generation with `-fdata-sections` on aarch64
  2023-11-16 21:30 [Bug c/112573] New: Suboptimal code generation with `-fdata-sections` on aarch64 jwerner at chromium dot org
@ 2023-11-16 22:01 ` pinskia at gcc dot gnu.org
  2023-11-16 22:11 ` jwerner at chromium dot org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-11-16 22:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112573

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The rev issue is recorded seperately already.

The difference between f and h are due to reassociation .
In f, (y+CST) can overflow so it is not reassociated with the addition with &a.
While h the addition is done in pointer size (64bit) so it can be reassociated
and you get that effect.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug target/112573] Suboptimal code generation with `-fdata-sections` on aarch64
  2023-11-16 21:30 [Bug c/112573] New: Suboptimal code generation with `-fdata-sections` on aarch64 jwerner at chromium dot org
  2023-11-16 22:01 ` [Bug target/112573] " pinskia at gcc dot gnu.org
@ 2023-11-16 22:11 ` jwerner at chromium dot org
  2023-11-20 19:28 ` wilco at gcc dot gnu.org
  2024-01-16 17:12 ` cvs-commit at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: jwerner at chromium dot org @ 2023-11-16 22:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112573

--- Comment #2 from Julius Werner <jwerner at chromium dot org> ---
Sorry, I don't really know what reassociation means in this context. Are you
saying that the behavior is WAI?

Note that the problem also exists when you write the accesses in h() as:

        *(a + (y + 3)) = (char)x;

so I don't really think this is just an order-of-operations / overflow thing
(also, how would that explain why it only occurs with -fdata-sections?).
According to my understanding of C, `*(a + (y + 3))` and `a[y + 3]` are
different notations for exactly the same expression, so they shouldn't generate
different code.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug target/112573] Suboptimal code generation with `-fdata-sections` on aarch64
  2023-11-16 21:30 [Bug c/112573] New: Suboptimal code generation with `-fdata-sections` on aarch64 jwerner at chromium dot org
  2023-11-16 22:01 ` [Bug target/112573] " pinskia at gcc dot gnu.org
  2023-11-16 22:11 ` jwerner at chromium dot org
@ 2023-11-20 19:28 ` wilco at gcc dot gnu.org
  2024-01-16 17:12 ` cvs-commit at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: wilco at gcc dot gnu.org @ 2023-11-20 19:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112573

Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-11-20
     Ever confirmed|0                           |1
                 CC|                            |wilco at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW

--- Comment #3 from Wilco <wilco at gcc dot gnu.org> ---
We should reassociate the immediate last for more optimal addressing like LLVM:

        adrp    x8, a
        add     x8, x8, :lo12:a
        lsr     w9, w0, #8
        add     x8, x8, w1, sxtw
        strb    w9, [x8, #1]
        lsr     w9, w0, #16
        strb    w0, [x8, #3]
        strb    w9, [x8, #2]
        lsr     w9, w0, #24
        strb    w9, [x8]
        ret

However GCC's reassociation is incorrect - it has been for many years and
things got much worse in GCC12...

As a result we may merge the immediate offset into the base address like in
'h'. Using -fdata-sections behaves like -fno-section-anchors, so it works as
expected (and 'extern' is the same as well). We could block merging offsets to
get more address CSEs if that ends up better overall.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug target/112573] Suboptimal code generation with `-fdata-sections` on aarch64
  2023-11-16 21:30 [Bug c/112573] New: Suboptimal code generation with `-fdata-sections` on aarch64 jwerner at chromium dot org
                   ` (2 preceding siblings ...)
  2023-11-20 19:28 ` wilco at gcc dot gnu.org
@ 2024-01-16 17:12 ` cvs-commit at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-01-16 17:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112573

--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Wilco Dijkstra <wilco@gcc.gnu.org>:

https://gcc.gnu.org/g:db4e496aadf1d7ab1c5af24410394d1551ddd3f0

commit r14-7284-gdb4e496aadf1d7ab1c5af24410394d1551ddd3f0
Author: Wilco Dijkstra <wilco.dijkstra@arm.com>
Date:   Tue Jan 16 16:27:02 2024 +0000

    AArch64: Reassociate CONST in address expressions

    GCC tends to optimistically create CONST of globals with an immediate
offset.
    However it is almost always better to CSE addresses of globals and add
immediate
    offsets separately (the offset could be merged later in single-use cases).
    Splitting CONST expressions with an index in aarch64_legitimize_address
fixes
    part of PR112573.

    gcc/ChangeLog:
            PR target/112573
            * config/aarch64/aarch64.cc (aarch64_legitimize_address):
Reassociate
            badly formed CONST expressions.

    gcc/testsuite/ChangeLog:
            PR target/112573
            * gcc.target/aarch64/pr112573.c: Add new test.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-01-16 17:13 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-16 21:30 [Bug c/112573] New: Suboptimal code generation with `-fdata-sections` on aarch64 jwerner at chromium dot org
2023-11-16 22:01 ` [Bug target/112573] " pinskia at gcc dot gnu.org
2023-11-16 22:11 ` jwerner at chromium dot org
2023-11-20 19:28 ` wilco at gcc dot gnu.org
2024-01-16 17:12 ` cvs-commit at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).