public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/114991] New: [14/15 Regression] AArch64: LDP pass does not handle some structure copies
@ 2024-05-08 17:25 wilco at gcc dot gnu.org
2024-05-08 17:25 ` [Bug target/114991] " wilco at gcc dot gnu.org
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: wilco at gcc dot gnu.org @ 2024-05-08 17:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114991
Bug ID: 114991
Summary: [14/15 Regression] AArch64: LDP pass does not handle
some structure copies
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: wilco at gcc dot gnu.org
Target Milestone: ---
The following example no longer emits LDP/STP since GCC14:
#include <string.h>
typedef struct { int arr[20]; } S;
void g (S *);
void h (S);
void f(int x)
{
S s;
g (&s);
h (s);
}
f:
stp x29, x30, [sp, -176]!
add x1, sp, 96
mov x29, sp
add x0, sp, 16
ldp q29, q31, [x1]
ldr q30, [x1, 32]
str q29, [sp, 16]
ldr q29, [x1, 48]
str q31, [x0, 16]
ldr q31, [x1, 64]
stp q30, q29, [x0, 32]
str q31, [x0, 64]
bl h
ldp x29, x30, [sp], 176
ret
The expansions for memcpy/move/memset no longer emit LDP directly in RTL and
now rely on the new LDP pass. Stack based loads/stores seem to confuse its
alias checks and it gives up.
Using -fno-schedule-insns fixes this example, but not all cases.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug target/114991] [14/15 Regression] AArch64: LDP pass does not handle some structure copies
2024-05-08 17:25 [Bug target/114991] New: [14/15 Regression] AArch64: LDP pass does not handle some structure copies wilco at gcc dot gnu.org
@ 2024-05-08 17:25 ` wilco at gcc dot gnu.org
2024-05-09 13:49 ` acoplan at gcc dot gnu.org
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: wilco at gcc dot gnu.org @ 2024-05-08 17:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114991
Wilco <wilco at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target| |aarch64-*-*
Target Milestone|--- |15.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug target/114991] [14/15 Regression] AArch64: LDP pass does not handle some structure copies
2024-05-08 17:25 [Bug target/114991] New: [14/15 Regression] AArch64: LDP pass does not handle some structure copies wilco at gcc dot gnu.org
2024-05-08 17:25 ` [Bug target/114991] " wilco at gcc dot gnu.org
@ 2024-05-09 13:49 ` acoplan at gcc dot gnu.org
2024-05-09 15:55 ` acoplan at gcc dot gnu.org
2024-05-09 15:58 ` acoplan at gcc dot gnu.org
3 siblings, 0 replies; 5+ messages in thread
From: acoplan at gcc dot gnu.org @ 2024-05-09 13:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114991
Alex Coplan <acoplan at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2024-05-09
Status|UNCONFIRMED |NEW
CC| |acoplan at gcc dot gnu.org,
| |vmakarov at gcc dot gnu.org
Ever confirmed|0 |1
Keywords| |missed-optimization, ra
--- Comment #1 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Confirmed. There is a lot to unpack here. Of course, the include isn't needed
in this testcase and the problem can be seen more clearly with a slightly
smaller array size:
typedef struct { int arr[16]; } S;
void g (S *);
void h (S);
void f(int x)
{
S s;
g (&s);
h (s);
}
In this case sizeof(S) = 64 so we should be able to do the copy with 2 LDPs + 2
STPs.
So just for clarity, the missed ldp/stp started when we turned off the early
ldp/stp formation in memcpy expansion, i.e. with
r14-9373-g19b23bf3c32df3cbb96b3d898a1d7142f7bea4a0 .
However, things already started to regress earlier for this testcase with
r14-4944-gf55cdce3f8dd8503e080e35be59c5f5390f6d95e i.e.
commit f55cdce3f8dd8503e080e35be59c5f5390f6d95e
Author: Vladimir N. Makarov <vmakarov@redhat.com>
Date: Thu Oct 26 14:50:40 2023
[RA]: Modfify cost calculation for dealing with equivalences
before that RA change we get:
f:
stp x29, x30, [sp, -144]!
mov x29, sp
add x0, sp, 80
bl g
ldp q29, q28, [sp, 80]
add x0, sp, 16
ldp q31, q30, [sp, 112]
stp q29, q28, [sp, 16]
stp q31, q30, [sp, 48]
bl h
ldp x29, x30, [sp], 144
ret
and afterwards we get:
f:
stp x29, x30, [sp, -160]!
mov x29, sp
str x19, [sp, 16]
add x19, sp, 96
mov x0, x19
bl g
add x0, sp, 32
ldp q29, q28, [x19]
ldp q31, q30, [x19, 32]
stp q29, q28, [x0]
stp q31, q30, [x0, 32]
bl h
ldr x19, [sp, 16]
ldp x29, x30, [sp], 160
ret
which is really not great as now we have a save/restore of x19 and the accesses
end up using different (non-sp) registers which I suspect doesn't help with the
ldp/stp formation (on trunk).
I will try to give a detailed analysis on what goes wrong with the ldp/stp
formation at the RTL level shortly (there are a lot of different issues), but I
think that RA change is a contributing factor.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug target/114991] [14/15 Regression] AArch64: LDP pass does not handle some structure copies
2024-05-08 17:25 [Bug target/114991] New: [14/15 Regression] AArch64: LDP pass does not handle some structure copies wilco at gcc dot gnu.org
2024-05-08 17:25 ` [Bug target/114991] " wilco at gcc dot gnu.org
2024-05-09 13:49 ` acoplan at gcc dot gnu.org
@ 2024-05-09 15:55 ` acoplan at gcc dot gnu.org
2024-05-09 15:58 ` acoplan at gcc dot gnu.org
3 siblings, 0 replies; 5+ messages in thread
From: acoplan at gcc dot gnu.org @ 2024-05-09 15:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114991
--- Comment #2 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Here is some analysis on why we miss some of these opportunities in ldp_fusion.
So initially in 267r.vregs we have some very clean RTL:
6: r101:DI=sfp:DI-0x40
7: x0:DI=r101:DI
8: call [`g'] argc:0
REG_CALL_DECL `g'
9: r102:DI=sfp:DI-0x80
10: r103:DI=sfp:DI-0x40
11: r104:V4SI=[r103:DI]
13: r105:V4SI=[r103:DI+0x10]
15: r106:V4SI=[r103:DI+0x20]
17: r107:V4SI=[r103:DI+0x30]
12: [r102:DI]=r104:V4SI
14: [r102:DI+0x10]=r105:V4SI
16: [r102:DI+0x20]=r106:V4SI
18: [r102:DI+0x30]=r107:V4SI
if were to run the ldp/stp pass on this it should form the pairs without a
problem. Of course things go downhill from here. The first slightly strange
thing is that fwprop propagates the sfp into the first of each group of
accesses (i.e. with offset 0), but not the others:
9: r102:DI=sfp:DI-0x80
11: r104:V4SI=[sfp:DI-0x40]
13: r105:V4SI=[r101:DI+0x10]
15: r106:V4SI=[r101:DI+0x20]
17: r107:V4SI=[r101:DI+0x30]
REG_DEAD r103:DI
12: [sfp:DI-0x80]=r104:V4SI
14: [r102:DI+0x10]=r105:V4SI
REG_DEAD r105:V4SI
16: [r102:DI+0x20]=r106:V4SI
REG_DEAD r106:V4SI
18: [r102:DI+0x30]=r107:V4SI
the RTL then stays mostly unchanged until sched1, where things really start to
go downhill:
11: r104:V4SI=[sfp:DI-0x40]
9: r102:DI=sfp:DI-0x80
13: r105:V4SI=[r101:DI+0x10]
20: x0:DI=r102:DI
REG_DEAD r102:DI
REG_EQUAL sfp:DI-0x80
15: r106:V4SI=[r101:DI+0x20]
12: [sfp:DI-0x80]=r104:V4SI
REG_DEAD r104:V4SI
17: r107:V4SI=[r101:DI+0x30]
REG_DEAD r101:DI
14: [r102:DI+0x10]=r105:V4SI
REG_DEAD r105:V4SI
16: [r102:DI+0x20]=r106:V4SI
REG_DEAD r106:V4SI
18: [r102:DI+0x30]=r107:V4SI
here the first of the stores (i12) has been moved up between the last pair of
loads (i15, i17). Now the interesting thing is how sched1 knows that it is
safe to perform this transformation. In the ldp_fusion1 pass we miss this pair
because we think that the loads may alias with i12:
cannot form pair (15,17) due to alias conflicts (12,12)
so it would be good to look into how our alias analysis differs from what
sched1 is doing. It's worth further noting that while the loads have MEM_EXPR
information (they point to the var_decl for s) the stores do not. Presumably
this is because the copy of s mandated by the ABI doesn't necessarily have a
tree decl representation that the MEM_EXPRs could point to.
Separately to the aliasing issue, because:
- there is no MEM_EXPR information for the stores, and
- forwprop1 substituted the sfp in for the first store
ldp_fusion fails to discover the (i12,i14) store pair opportunity. As a result
we unfortunately end up forming an stp in the middle.
Interestingly if I turn off fwprop1 then we still fail to form the
(12,14) pair due to aliasing.
So it seems the main thing to investigate is how sched1 does its alias
analysis and how that differs from what we're doing in ldp_fusion.
I have some WIP patches that should improve the pair discovery and could
potentially be extended to help with the case of the (12,14) pair here.
Another thing that could help with that is if we populated the MEM_EXPR for the
stores of the structure copy.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug target/114991] [14/15 Regression] AArch64: LDP pass does not handle some structure copies
2024-05-08 17:25 [Bug target/114991] New: [14/15 Regression] AArch64: LDP pass does not handle some structure copies wilco at gcc dot gnu.org
` (2 preceding siblings ...)
2024-05-09 15:55 ` acoplan at gcc dot gnu.org
@ 2024-05-09 15:58 ` acoplan at gcc dot gnu.org
3 siblings, 0 replies; 5+ messages in thread
From: acoplan at gcc dot gnu.org @ 2024-05-09 15:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114991
Alex Coplan <acoplan at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |acoplan at gcc dot gnu.org
--- Comment #3 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Mine for the aliasing issues/investigation, might be worth splitting off the RA
problem to track that separately.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-05-09 15:58 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-08 17:25 [Bug target/114991] New: [14/15 Regression] AArch64: LDP pass does not handle some structure copies wilco at gcc dot gnu.org
2024-05-08 17:25 ` [Bug target/114991] " wilco at gcc dot gnu.org
2024-05-09 13:49 ` acoplan at gcc dot gnu.org
2024-05-09 15:55 ` acoplan at gcc dot gnu.org
2024-05-09 15:58 ` acoplan at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).