public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/112573] New: Suboptimal code generation with `-fdata-sections` on aarch64
@ 2023-11-16 21:30 jwerner at chromium dot org
2023-11-16 22:01 ` [Bug target/112573] " pinskia at gcc dot gnu.org
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: jwerner at chromium dot org @ 2023-11-16 21:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112573
Bug ID: 112573
Summary: Suboptimal code generation with `-fdata-sections` on
aarch64
Product: gcc
Version: 13.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: jwerner at chromium dot org
Target Milestone: ---
Target: aarch64
I noticed some weird code generation behavior in aarch64 that seems to be a new
regression in GCC 13 or 12. Consider the following minimal test case:
char a[32];
void f(int x, int y)
{
a[y + 3] = (char)x;
a[y + 2] = (char)(x >> 8);
a[y + 1] = (char)(x >> 16);
a[y + 0] = (char)(x >> 24);
}
void h(int x, int y)
{
*((a + y) + 3) = (char)x;
*((a + y) + 2) = (char)(x >> 8);
*((a + y) + 1) = (char)(x >> 16);
*((a + y) + 0) = (char)(x >> 24);
}
These two functions should, of course, be 100% equivalent. Strangely enough
they still don't give the same code when compiling with -Os, but the difference
is minimal:
0000000000000000 <f> (File Offset: 0x40):
0: 11000c23 add w3, w1, #0x3
4: 90000002 adrp x2, 0 <f> (File Offset: 0x40)
4: R_AARCH64_ADR_PREL_PG_HI21 .bss
8: 91000042 add x2, x2, #0x0
8: R_AARCH64_ADD_ABS_LO12_NC .bss
c: 13087c04 asr w4, w0, #8
10: 3823c840 strb w0, [x2, w3, sxtw]
14: 11000823 add w3, w1, #0x2
18: 3823c844 strb w4, [x2, w3, sxtw]
1c: 11000423 add w3, w1, #0x1
20: 13107c04 asr w4, w0, #16
24: 13187c00 asr w0, w0, #24
28: 3823c844 strb w4, [x2, w3, sxtw]
2c: 3821c840 strb w0, [x2, w1, sxtw]
30: d65f03c0 ret
0000000000000034 <h> (File Offset: 0x74):
34: 90000002 adrp x2, 0 <f> (File Offset: 0x40)
34: R_AARCH64_ADR_PREL_PG_HI21 .bss
38: 91000042 add x2, x2, #0x0
38: R_AARCH64_ADD_ABS_LO12_NC .bss
3c: 8b21c043 add x3, x2, w1, sxtw
40: 13087c04 asr w4, w0, #8
44: 39000864 strb w4, [x3, #2]
48: 13107c04 asr w4, w0, #16
4c: 39000464 strb w4, [x3, #1]
50: 39000c60 strb w0, [x3, #3]
54: 13187c00 asr w0, w0, #24
58: 3821c840 strb w0, [x2, w1, sxtw]
5c: d65f03c0 ret
However, when I add the -fdata-sections flag, the code for f() remains the
same, but the code for h() becomes this weird thing:
0000000000000034 <h> (File Offset: 0x74):
34: 93407c21 sxtw x1, w1
38: 90000002 adrp x2, 0 <f> (File Offset: 0x40)
38: R_AARCH64_ADR_PREL_PG_HI21 a+0x3
3c: 91000042 add x2, x2, #0x0
3c: R_AARCH64_ADD_ABS_LO12_NC a+0x3
40: 13087c03 asr w3, w0, #8
44: 38216840 strb w0, [x2, x1]
48: 90000002 adrp x2, 0 <f> (File Offset: 0x40)
48: R_AARCH64_ADR_PREL_PG_HI21 a+0x2
4c: 91000042 add x2, x2, #0x0
4c: R_AARCH64_ADD_ABS_LO12_NC a+0x2
50: 38216843 strb w3, [x2, x1]
54: 90000002 adrp x2, 0 <f> (File Offset: 0x40)
54: R_AARCH64_ADR_PREL_PG_HI21 a+0x1
58: 91000042 add x2, x2, #0x0
58: R_AARCH64_ADD_ABS_LO12_NC a+0x1
5c: 13107c03 asr w3, w0, #16
60: 13187c00 asr w0, w0, #24
64: 38216843 strb w3, [x2, x1]
68: 90000002 adrp x2, 0 <f> (File Offset: 0x40)
68: R_AARCH64_ADR_PREL_PG_HI21 a
6c: 91000042 add x2, x2, #0x0
6c: R_AARCH64_ADD_ABS_LO12_NC a
70: 38216840 strb w0, [x2, x1]
74: d65f03c0 ret
There should be absolutely no reason to reload the address of `a` from scratch
for each access. `-fdata-sections` also shouldn't have any influence at all on
how the code inside a function looks like, as far as I'm aware.
When I try this on GCC 11.2.0 instead, the output for h() with -fdata-sections
is the same short code that it is without.
Also, there is of course the question why GCC doesn't just reduce this code to
the `rev` instruction (it doesn't seem to be able to figure that out on either
GCC 11 or GCC 13, but it would be able to do it if the `y` parameter was a
constant instead).
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug target/112573] Suboptimal code generation with `-fdata-sections` on aarch64
2023-11-16 21:30 [Bug c/112573] New: Suboptimal code generation with `-fdata-sections` on aarch64 jwerner at chromium dot org
@ 2023-11-16 22:01 ` pinskia at gcc dot gnu.org
2023-11-16 22:11 ` jwerner at chromium dot org
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-11-16 22:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112573
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The rev issue is recorded seperately already.
The difference between f and h are due to reassociation .
In f, (y+CST) can overflow so it is not reassociated with the addition with &a.
While h the addition is done in pointer size (64bit) so it can be reassociated
and you get that effect.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug target/112573] Suboptimal code generation with `-fdata-sections` on aarch64
2023-11-16 21:30 [Bug c/112573] New: Suboptimal code generation with `-fdata-sections` on aarch64 jwerner at chromium dot org
2023-11-16 22:01 ` [Bug target/112573] " pinskia at gcc dot gnu.org
@ 2023-11-16 22:11 ` jwerner at chromium dot org
2023-11-20 19:28 ` wilco at gcc dot gnu.org
2024-01-16 17:12 ` cvs-commit at gcc dot gnu.org
3 siblings, 0 replies; 5+ messages in thread
From: jwerner at chromium dot org @ 2023-11-16 22:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112573
--- Comment #2 from Julius Werner <jwerner at chromium dot org> ---
Sorry, I don't really know what reassociation means in this context. Are you
saying that the behavior is WAI?
Note that the problem also exists when you write the accesses in h() as:
*(a + (y + 3)) = (char)x;
so I don't really think this is just an order-of-operations / overflow thing
(also, how would that explain why it only occurs with -fdata-sections?).
According to my understanding of C, `*(a + (y + 3))` and `a[y + 3]` are
different notations for exactly the same expression, so they shouldn't generate
different code.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug target/112573] Suboptimal code generation with `-fdata-sections` on aarch64
2023-11-16 21:30 [Bug c/112573] New: Suboptimal code generation with `-fdata-sections` on aarch64 jwerner at chromium dot org
2023-11-16 22:01 ` [Bug target/112573] " pinskia at gcc dot gnu.org
2023-11-16 22:11 ` jwerner at chromium dot org
@ 2023-11-20 19:28 ` wilco at gcc dot gnu.org
2024-01-16 17:12 ` cvs-commit at gcc dot gnu.org
3 siblings, 0 replies; 5+ messages in thread
From: wilco at gcc dot gnu.org @ 2023-11-20 19:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112573
Wilco <wilco at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2023-11-20
Ever confirmed|0 |1
CC| |wilco at gcc dot gnu.org
Status|UNCONFIRMED |NEW
--- Comment #3 from Wilco <wilco at gcc dot gnu.org> ---
We should reassociate the immediate last for more optimal addressing like LLVM:
adrp x8, a
add x8, x8, :lo12:a
lsr w9, w0, #8
add x8, x8, w1, sxtw
strb w9, [x8, #1]
lsr w9, w0, #16
strb w0, [x8, #3]
strb w9, [x8, #2]
lsr w9, w0, #24
strb w9, [x8]
ret
However GCC's reassociation is incorrect - it has been for many years and
things got much worse in GCC12...
As a result we may merge the immediate offset into the base address like in
'h'. Using -fdata-sections behaves like -fno-section-anchors, so it works as
expected (and 'extern' is the same as well). We could block merging offsets to
get more address CSEs if that ends up better overall.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug target/112573] Suboptimal code generation with `-fdata-sections` on aarch64
2023-11-16 21:30 [Bug c/112573] New: Suboptimal code generation with `-fdata-sections` on aarch64 jwerner at chromium dot org
` (2 preceding siblings ...)
2023-11-20 19:28 ` wilco at gcc dot gnu.org
@ 2024-01-16 17:12 ` cvs-commit at gcc dot gnu.org
3 siblings, 0 replies; 5+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-01-16 17:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112573
--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Wilco Dijkstra <wilco@gcc.gnu.org>:
https://gcc.gnu.org/g:db4e496aadf1d7ab1c5af24410394d1551ddd3f0
commit r14-7284-gdb4e496aadf1d7ab1c5af24410394d1551ddd3f0
Author: Wilco Dijkstra <wilco.dijkstra@arm.com>
Date: Tue Jan 16 16:27:02 2024 +0000
AArch64: Reassociate CONST in address expressions
GCC tends to optimistically create CONST of globals with an immediate
offset.
However it is almost always better to CSE addresses of globals and add
immediate
offsets separately (the offset could be merged later in single-use cases).
Splitting CONST expressions with an index in aarch64_legitimize_address
fixes
part of PR112573.
gcc/ChangeLog:
PR target/112573
* config/aarch64/aarch64.cc (aarch64_legitimize_address):
Reassociate
badly formed CONST expressions.
gcc/testsuite/ChangeLog:
PR target/112573
* gcc.target/aarch64/pr112573.c: Add new test.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-01-16 17:13 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-16 21:30 [Bug c/112573] New: Suboptimal code generation with `-fdata-sections` on aarch64 jwerner at chromium dot org
2023-11-16 22:01 ` [Bug target/112573] " pinskia at gcc dot gnu.org
2023-11-16 22:11 ` jwerner at chromium dot org
2023-11-20 19:28 ` wilco at gcc dot gnu.org
2024-01-16 17:12 ` cvs-commit at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).