public inbox for gcc-bugs@sourceware.org
* [Bug target/113613] New: [14 Regression] Missing ldp/stp optimization sometimes
@ 2024-01-26 6:04 pinskia at gcc dot gnu.org
2024-01-26 6:05 ` [Bug target/113613] " pinskia at gcc dot gnu.org
` (9 more replies)
0 siblings, 10 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-01-26 6:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113613
Bug ID: 113613
Summary: [14 Regression] Missing ldp/stp optimization sometimes
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
CC: acoplan at gcc dot gnu.org
Target Milestone: ---
Target: aarch64-*-*
Take:
```
typedef float __attribute__((vector_size(8))) v2sf;
v2sf a[4];
v2sf b[4];
void f()
{
  b[0] += a[0];
  b[1] += a[1];
}
```
With -O3 on the trunk we get:
```
f:
        adrp    x1, .LANCHOR0
        add     x0, x1, :lo12:.LANCHOR0
        ldr     d31, [x1, #:lo12:.LANCHOR0]
        ldr     d30, [x0, 32]
        fadd    v30.2s, v31.2s, v30.2s
        ldr     d31, [x0, 8]
        str     d30, [x1, #:lo12:.LANCHOR0]
        ldr     d30, [x0, 40]
        fadd    v30.2s, v31.2s, v30.2s
        str     d30, [x0, 8]
        ret
```
But in GCC 13 we got:
```
f:
        adrp    x1, .LANCHOR0
        add     x0, x1, :lo12:.LANCHOR0
        ldp     d1, d0, [x0]
        ldp     d3, d2, [x0, 32]
        fadd    v1.2s, v1.2s, v3.2s
        fadd    v0.2s, v0.2s, v2.2s
        stp     d1, d0, [x0]
        ret
```
* [Bug target/113613] [14 Regression] Missing ldp/stp optimization sometimes
From: pinskia at gcc dot gnu.org @ 2024-01-26 6:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113613
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |14.0
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note that we should really get v4sf here, but that is PR 95960.
* [Bug target/113613] [14 Regression] Missing ldp/stp optimization sometimes
From: pinskia at gcc dot gnu.org @ 2024-01-26 6:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113613
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note that I don't know whether this shows up in real programs, but it might
point to a missed optimization that could also affect them.
Another testcase, this time without vectors:
```
double a[4];
double b[4];
void f()
{
  b[0] += a[0];
  b[1] *= a[1];
}
```
For some reason it works with GPRs, though:
```
int a[4];
int b[4];
void f()
{
  b[0] += a[0];
  b[1] *= a[1];
}
```
* [Bug target/113613] [14 Regression] Missing ldp/stp optimization sometimes
From: acoplan at gcc dot gnu.org @ 2024-01-26 8:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113613
Alex Coplan <acoplan at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Assignee|unassigned at gcc dot gnu.org |acoplan at gcc dot gnu.org
Last reconfirmed| |2024-01-26
Status|UNCONFIRMED |ASSIGNED
--- Comment #3 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Confirmed, I'll take a look.
* [Bug target/113613] [14 Regression] Missing ldp/stp optimization since r14-6290-g9f0f7d802482a8
From: acoplan at gcc dot gnu.org @ 2024-01-26 8:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113613
Alex Coplan <acoplan at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rsandifo at gcc dot gnu.org
Summary|[14 Regression] Missing ldp/stp optimization sometimes |[14 Regression] Missing ldp/stp optimization since r14-6290-g9f0f7d802482a8
--- Comment #4 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Interestingly, we started to miss this with the introduction of aarch64
early RA, i.e. r14-6290-g9f0f7d802482a8958d6cdc72f1fe0c8549db2182.
My ldp/stp pattern rewrite was:
r14-6604-gd7ee988c491cde43d04fe25f2b3dbad9d85ded45
so we started to miss this before any of my ldp/stp patches went in.
Looking at what happens with the ldp/stp pass, I can see that in sched1 we've
already allocated hard regs to the vector load destinations:
3: NOTE_INSN_BASIC_BLOCK 2
2: NOTE_INSN_FUNCTION_BEG
13: NOTE_INSN_DELETED
5: debug begin stmt marker
6: r107:DI=high(`*.LANCHOR0')
7: r106:DI=r107:DI+low(`*.LANCHOR0')
REG_EQUAL `*.LANCHOR0'
14: v31:V2SF=[r107:DI+low(`*.LANCHOR0')]
15: v30:V2SF=[r106:DI+0x20]
16: v30:V2SF=v31:V2SF+v30:V2SF
REG_DEAD v31:V2SF
27: v31:V2SF=[r106:DI+0x8]
17: [r107:DI+low(`*.LANCHOR0')]=v30:V2SF
REG_DEAD r107:DI
REG_DEAD v30:V2SF
18: debug begin stmt marker
28: v30:V2SF=[r106:DI+0x28]
29: v30:V2SF=v31:V2SF+v30:V2SF
REG_DEAD v31:V2SF
30: [r106:DI+0x8]=v30:V2SF
REG_DEAD r106:DI
REG_DEAD v30:V2SF
33: NOTE_INSN_DELETED
and then there's nothing that the early ldp/stp pass can do because the
would-be load pair candidates already use the same (hard) transfer register due
to early RA:
merge_pairs [L=1], cand vecs (14) x (27)
analyzing pair (load=1): (14,27)
punting on ldp due to reg conflcits (14,27)
merge_pairs [L=1], cand vecs (15) x (28)
analyzing pair (load=1): (15,28)
punting on ldp due to reg conflcits (15,28)
merge_pairs [L=0], cand vecs (17) x (30)
analyzing pair (load=0): (17,30)
pair (17,30): rejecting base 106 due to dataflow hazards (28,29)
can't form pair (17,30) due to dataflow hazards
starting the processing of deferred insns
ending the processing of deferred insns
CCing Richard S for an opinion.
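To illustrate the register-conflict rejection in the dump above, here is a minimal sketch in C. It is a simplified model and not the logic of the ldp_fusion pass: the `load_insn` struct and `can_form_ldp` function are hypothetical names, the model assumes both loads are expressed against the same base register, and it ignores the dataflow hazards that the real pass also checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a load instruction.  None of these
   names come from the GCC sources.  */
struct load_insn {
  int dest_reg;   /* destination (transfer) register number */
  int base_reg;   /* base address register number */
  long offset;    /* byte offset from the base */
  int size;       /* access size in bytes */
};

/* Return true if A followed by B could become one LDP in this model:
   same base, same access size, adjacent addresses, and two different
   destination registers.  */
bool can_form_ldp (const struct load_insn *a, const struct load_insn *b)
{
  assert (a && b);
  if (a->base_reg != b->base_reg || a->size != b->size)
    return false;
  if (b->offset - a->offset != a->size)
    return false;
  /* An LDP writes both destinations at once, so they must differ.  */
  return a->dest_reg != b->dest_reg;
}
```

With the early-RA allocation shown above, insns 14 and 27 both load into v31, so a check of this kind rejects the pair. Giving the second load a different destination register, as in the GCC 13 output, lets it pass.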
* [Bug target/113613] [14 Regression] Missing ldp/stp optimization since r14-6290-g9f0f7d802482a8
From: acoplan at gcc dot gnu.org @ 2024-01-26 9:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113613
Alex Coplan <acoplan at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |NEW
Assignee|acoplan at gcc dot gnu.org |unassigned at gcc dot gnu.org
* [Bug target/113613] [14 Regression] Missing ldp/stp optimization since r14-6290-g9f0f7d802482a8
From: acoplan at gcc dot gnu.org @ 2024-01-26 9:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113613
--- Comment #5 from Alex Coplan <acoplan at gcc dot gnu.org> ---
It looks like the current ordering of passes is:
early_ra
sched1
ldp_fusion1
early_remat
ISTM that ldp_fusion1 should probably be running before early_ra, but we found
that running ldp_fusion1 before sched1 could lead to increased register
pressure. Hmm.
* [Bug target/113613] [14 Regression] Missing ldp/stp optimization since r14-6290-g9f0f7d802482a8
From: acoplan at gcc dot gnu.org @ 2024-01-26 9:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113613
--- Comment #6 from Alex Coplan <acoplan at gcc dot gnu.org> ---
FWIW, if I move ldp_fusion1 before early_ra, with:
diff --git a/gcc/config/aarch64/aarch64-passes.def b/gcc/config/aarch64/aarch64-passes.def
index 769d48f4faa..3853f6bf7a4 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -18,6 +18,7 @@
along with GCC; see the file COPYING3. If not see
<http://www.gnu.org/licenses/>. */
+INSERT_PASS_BEFORE (pass_sched, 1, pass_ldp_fusion);
INSERT_PASS_BEFORE (pass_sched, 1, pass_aarch64_early_ra);
INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
INSERT_PASS_BEFORE (pass_reorder_blocks, 1, pass_track_speculation);
@@ -25,5 +26,4 @@ INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, pass_switch_pstat
INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti);
INSERT_PASS_AFTER (pass_if_after_combine, 1, pass_cc_fusion);
-INSERT_PASS_BEFORE (pass_early_remat, 1, pass_ldp_fusion);
INSERT_PASS_BEFORE (pass_peephole2, 1, pass_ldp_fusion);
we get:
f:
.LFB0:
.cfi_startproc
adrp x0, .LANCHOR0
add x0, x0, :lo12:.LANCHOR0
ldp d31, d30, [x0]
ldp d29, d28, [x0, 32]
fadd v29.2s, v31.2s, v29.2s
fadd v28.2s, v30.2s, v28.2s
stp d29, d28, [x0]
ret
Note that this does use more registers, though, so it's not necessarily a clear
win in the general case (particularly if register pressure is already high).
* [Bug target/113613] [14 Regression] Missing ldp/stp optimization since r14-6290-g9f0f7d802482a8
From: rsandifo at gcc dot gnu.org @ 2024-01-26 10:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113613
Richard Sandiford <rsandifo at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org
--- Comment #7 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
early-ra does try to avoid reusing registers too soon, to increase scheduling
freedom. But in this case I imagine it handles the two statements as separate
regions. This should be fixable by carrying a round-robin counter across regions.
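As a rough illustration of the round-robin idea, here is a minimal sketch in C. It is not code from early-ra: `rr_state` and `rr_pick` are hypothetical names, and the model assumes a plain bitmask of free FPRs. The only point it shows is that the starting position is kept in state that outlives a region, so a later region does not restart from the same register.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_FPRS 32

/* Hypothetical allocator state.  The counter is not reset when a new
   allocation region starts.  */
struct rr_state {
  unsigned next;   /* first register to try on the next request */
};

/* Pick a free FPR from FREE_MASK (bit N set means register N is free),
   starting the search at STATE->next.  Return -1 if none is free.  */
int rr_pick (struct rr_state *state, uint32_t free_mask)
{
  assert (state);
  for (unsigned i = 0; i < NUM_FPRS; i++)
    {
      unsigned reg = (state->next + i) % NUM_FPRS;
      if (free_mask & (UINT32_C (1) << reg))
        {
          /* Remember where to resume, even across regions.  */
          state->next = (reg + 1) % NUM_FPRS;
          return (int) reg;
        }
    }
  return -1;
}
```

In this model, two picks in one region followed by two picks in the next region, with every FPR free at the start of each, yield four different registers rather than the same two twice.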
* [Bug target/113613] [14 Regression] Missing ldp/stp optimization since r14-6290-g9f0f7d802482a8
From: cvs-commit at gcc dot gnu.org @ 2024-02-23 14:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113613
--- Comment #8 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Richard Sandiford <rsandifo@gcc.gnu.org>:
https://gcc.gnu.org/g:ff442719cdb64c9df9d069af88e90d51bee6fb56
commit r14-9157-gff442719cdb64c9df9d069af88e90d51bee6fb56
Author: Richard Sandiford <richard.sandiford@arm.com>
Date: Fri Feb 23 14:12:55 2024 +0000
aarch64: Spread out FPR usage between RA regions [PR113613]
early-ra already had code to do regrename-style "broadening"
of the allocation, to promote scheduling freedom. However,
the pass divides the function into allocation regions
and this broadening only worked within a single region.
This meant that if a basic block contained one subblock
of FPR use, followed by a point at which no FPRs were live,
followed by another subblock of FPR use, the two subblocks
would tend to reuse the same registers. This in turn meant
that it wasn't possible to form LDP/STP pairs between them.
The failure to form LDPs and STPs in the testcase was a
regression from GCC 13.
The patch adds a simple heuristic to prefer less recently
used registers in the event of a tie.
gcc/
PR target/113613
* config/aarch64/aarch64-early-ra.cc
(early_ra::m_current_region): New member variable.
(early_ra::m_fpr_recency): Likewise.
(early_ra::start_new_region): Bump m_current_region.
(early_ra::allocate_colors): Prefer less recently used registers
in the event of a tie. Add a comment to explain why we prefer(ed)
higher-numbered registers.
(early_ra::find_oldest_color): Prefer less recently used registers
here too.
(early_ra::finalize_allocation): Update recency information for
allocated registers.
(early_ra::process_blocks): Initialize m_current_region and
m_fpr_recency.
gcc/testsuite/
PR target/113613
* gcc.target/aarch64/pr113613.c: New test.
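The tie-breaking heuristic described in the commit message can be sketched roughly as follows. This is a simplified C model and not the code in aarch64-early-ra.cc: the field names echo `m_current_region` and `m_fpr_recency` from the ChangeLog, but their exact meaning here (the region number in which each FPR was last allocated, with 0 meaning never) is an assumption, as are the function names and the candidate bitmask.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_FPRS 32

/* Hypothetical allocator state, loosely modelled on the ChangeLog.  */
struct recency_state {
  unsigned current_region;          /* bumped at each new region */
  unsigned fpr_recency[NUM_FPRS];   /* region of last use, 0 = never used */
};

/* Start a new allocation region.  */
void start_new_region (struct recency_state *s)
{
  s->current_region++;
}

/* Record that REG was allocated in the current region.  */
void mark_used (struct recency_state *s, int reg)
{
  assert (reg >= 0 && reg < NUM_FPRS);
  s->fpr_recency[reg] = s->current_region;
}

/* Among CANDIDATES (bit N set means register N is otherwise equally
   good), prefer the least recently used register.  Remaining ties go
   to the higher-numbered register.  Return -1 if there are none.  */
int pick_least_recent (const struct recency_state *s, uint32_t candidates)
{
  int best = -1;
  for (int reg = NUM_FPRS - 1; reg >= 0; reg--)
    {
      if (!(candidates & (UINT32_C (1) << reg)))
        continue;
      if (best < 0 || s->fpr_recency[reg] < s->fpr_recency[best])
        best = reg;
    }
  return best;
}
```

In this model, if the first region allocates registers 31 and 30, the second region picks 29 and then 28 instead of reusing 31 and 30, which leaves the loads in the two regions with distinct destination registers.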
* [Bug target/113613] [14 Regression] Missing ldp/stp optimization since r14-6290-g9f0f7d802482a8
From: rsandifo at gcc dot gnu.org @ 2024-02-23 14:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113613
Richard Sandiford <rsandifo at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution|--- |FIXED
--- Comment #9 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
Fixed.