public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/113618] New: [14 Regression] AArch64: memmove idiom regression
@ 2024-01-26 14:49 wilco at gcc dot gnu.org
2024-01-26 15:05 ` [Bug target/113618] " acoplan at gcc dot gnu.org
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: wilco at gcc dot gnu.org @ 2024-01-26 14:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113618
Bug ID: 113618
Summary: [14 Regression] AArch64: memmove idiom regression
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: wilco at gcc dot gnu.org
Target Milestone: ---
The following is often used as an idiom for memmove since GCC mid-end and most
back-ends have no support for inlining memmove:
void move64 (char *a, char *b)
{
char temp[64];
memcpy (temp, a, 64);
memcpy (b, temp, 64);
}
On trunk this generates:
ldp q30, q31, [x0]
sub sp, sp, #64
ldp q28, q29, [x0, 32]
stp q30, q31, [sp]
ldp q30, q31, [sp]
stp q28, q29, [sp, 32]
ldp q28, q29, [sp, 32]
stp q30, q31, [x1]
stp q28, q29, [x1, 32]
add sp, sp, 64
ret
This is a significant regression from GCC13 which has redundant stores but
avoids load-after-store forwarding penalties:
ldp q2, q3, [x0]
sub sp, sp, #64
ldp q0, q1, [x0, 32]
stp q2, q3, [sp]
stp q2, q3, [x1]
stp q0, q1, [sp, 32]
stp q0, q1, [x1, 32]
add sp, sp, 64
ret
LLVM avoids writing to the temporary and removes the stackframe altogether:
ldp q1, q0, [x0, #32]
ldp q2, q3, [x0]
stp q1, q0, [x1, #32]
stp q2, q3, [x1]
ret
The reason for the regression appears to be the changed RTL representation of
LDP/STP. The RTL optimizer does not understand LDP/STP, so emitting LDP/STP
early in memcpy expansion means it cannot remove the redundant stack stores.
A possible fix would be to avoid emitting LDP/STP in memcpy/memmove/memset
expansions.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/113618] [14 Regression] AArch64: memmove idiom regression
2024-01-26 14:49 [Bug target/113618] New: [14 Regression] AArch64: memmove idiom regression wilco at gcc dot gnu.org
@ 2024-01-26 15:05 ` acoplan at gcc dot gnu.org
2024-01-26 16:51 ` pinskia at gcc dot gnu.org
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: acoplan at gcc dot gnu.org @ 2024-01-26 15:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113618
Alex Coplan <acoplan at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2024-01-26
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
CC| |acoplan at gcc dot gnu.org
--- Comment #1 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Confirmed.
(In reply to Wilco from comment #0)
> A possible fix would be to avoid emitting LDP/STP in memcpy/memmove/memset
> expansions.
Yeah, so I had posted
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636855.html for that
but held off from committing it at the time as IMO there wasn't enough evidence
to show that this helps in general (and the pass could in theory miss
opportunities which would lead to regressions).
But perhaps this is a good argument for going ahead with that change (of course
it will need rebasing).
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/113618] [14 Regression] AArch64: memmove idiom regression
2024-01-26 14:49 [Bug target/113618] New: [14 Regression] AArch64: memmove idiom regression wilco at gcc dot gnu.org
2024-01-26 15:05 ` [Bug target/113618] " acoplan at gcc dot gnu.org
@ 2024-01-26 16:51 ` pinskia at gcc dot gnu.org
2024-01-29 7:55 ` rguenth at gcc dot gnu.org
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-01-26 16:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113618
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
CC| |pinskia at gcc dot gnu.org
Target Milestone|--- |14.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/113618] [14 Regression] AArch64: memmove idiom regression
2024-01-26 14:49 [Bug target/113618] New: [14 Regression] AArch64: memmove idiom regression wilco at gcc dot gnu.org
2024-01-26 15:05 ` [Bug target/113618] " acoplan at gcc dot gnu.org
2024-01-26 16:51 ` pinskia at gcc dot gnu.org
@ 2024-01-29 7:55 ` rguenth at gcc dot gnu.org
2024-01-29 11:51 ` wilco at gcc dot gnu.org
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-29 7:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113618
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
It might be good to recognize this pattern in strlenopt or a related pass.
A purely local transform would turn it into
memcpy (temp, a, 64);
memmove (b, a, 64);
relying on DSE to eliminate the copy to temp if possible. Not sure if
that possibly would be a bad transform if copying to temp is required.
stp q30, q31, [sp]
ldp q30, q31, [sp]
why is CSE not able to catch this?
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/113618] [14 Regression] AArch64: memmove idiom regression
2024-01-26 14:49 [Bug target/113618] New: [14 Regression] AArch64: memmove idiom regression wilco at gcc dot gnu.org
` (2 preceding siblings ...)
2024-01-29 7:55 ` rguenth at gcc dot gnu.org
@ 2024-01-29 11:51 ` wilco at gcc dot gnu.org
2024-01-31 16:58 ` wilco at gcc dot gnu.org
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: wilco at gcc dot gnu.org @ 2024-01-29 11:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113618
--- Comment #3 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> It might be good to recognize this pattern in strlenopt or a related pass.
>
> A purely local transform would turn it into
>
> memcpy (temp, a, 64);
> memmove (b, a, 64);
>
> relying on DSE to eliminate the copy to temp if possible. Not sure if
> that possibly would be a bad transform if copying to temp is required.
This would only be beneficial if you know memmove is inlined if memcpy is - on
almost all targets memmove becomes a library call, so the transformation would
be worse if memcpy can be inlined.
> stp q30, q31, [sp]
> ldp q30, q31, [sp]
>
> why is CSE not able to catch this?
The new RTL now has UNSPECs in them, so CSE doesn't know it is a plain
load/store:
STP:
(insn 12 11 13 2 (set (mem/c:V2x16QI (reg:DI 102) [0 +0 S32 A128])
(unspec:V2x16QI [
(reg:V4SI 104)
(reg:V4SI 105)
] UNSPEC_STP)) "/app/example.c":5:5 -1
(nil))
LDP:
(insn 16 15 17 2 (parallel [
(set (reg:V4SI 108)
(unspec:V4SI [
(mem/c:V2x16QI (reg:DI 107) [0 +0 S32 A128])
] UNSPEC_LDP_FST))
(set (reg:V4SI 109)
(unspec:V4SI [
(mem/c:V2x16QI (reg:DI 107) [0 +0 S32 A128])
] UNSPEC_LDP_SND))
]) "/app/example.c":6:5 -1
(nil))
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/113618] [14 Regression] AArch64: memmove idiom regression
2024-01-26 14:49 [Bug target/113618] New: [14 Regression] AArch64: memmove idiom regression wilco at gcc dot gnu.org
` (3 preceding siblings ...)
2024-01-29 11:51 ` wilco at gcc dot gnu.org
@ 2024-01-31 16:58 ` wilco at gcc dot gnu.org
2024-03-07 20:44 ` law at gcc dot gnu.org
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: wilco at gcc dot gnu.org @ 2024-01-31 16:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113618
Wilco <wilco at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|unassigned at gcc dot gnu.org |wilco at gcc dot gnu.org
--- Comment #4 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Alex Coplan from comment #1)
> Confirmed.
>
> (In reply to Wilco from comment #0)
> > A possible fix would be to avoid emitting LDP/STP in memcpy/memmove/memset
> > expansions.
>
> Yeah, so I had posted
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636855.html for that
> but held off from committing it at the time as IMO there wasn't enough
> evidence to show that this helps in general (and the pass could in theory
> miss opportunities which would lead to regressions).
>
> But perhaps this is a good argument for going ahead with that change (of
> course it will need rebasing).
Yes I have a patch based on current trunk + my outstanding memset cleanup
patch. It's slightly faster but causes a small codesize regression. This
appears mostly due to GCC being overly aggressive in changing loads/stores with
a zero offset into indexing, a non-zero offset or a lo_sym. This not only
blocks LDP opportunities but also increases register pressure and spilling.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/113618] [14 Regression] AArch64: memmove idiom regression
2024-01-26 14:49 [Bug target/113618] New: [14 Regression] AArch64: memmove idiom regression wilco at gcc dot gnu.org
` (4 preceding siblings ...)
2024-01-31 16:58 ` wilco at gcc dot gnu.org
@ 2024-03-07 20:44 ` law at gcc dot gnu.org
2024-03-07 21:25 ` cvs-commit at gcc dot gnu.org
2024-03-13 14:36 ` wilco at gcc dot gnu.org
7 siblings, 0 replies; 9+ messages in thread
From: law at gcc dot gnu.org @ 2024-03-07 20:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113618
Jeffrey A. Law <law at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |law at gcc dot gnu.org
Priority|P3 |P2
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/113618] [14 Regression] AArch64: memmove idiom regression
2024-01-26 14:49 [Bug target/113618] New: [14 Regression] AArch64: memmove idiom regression wilco at gcc dot gnu.org
` (5 preceding siblings ...)
2024-03-07 20:44 ` law at gcc dot gnu.org
@ 2024-03-07 21:25 ` cvs-commit at gcc dot gnu.org
2024-03-13 14:36 ` wilco at gcc dot gnu.org
7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-03-07 21:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113618
--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Wilco Dijkstra <wilco@gcc.gnu.org>:
https://gcc.gnu.org/g:19b23bf3c32df3cbb96b3d898a1d7142f7bea4a0
commit r14-9373-g19b23bf3c32df3cbb96b3d898a1d7142f7bea4a0
Author: Wilco Dijkstra <wilco.dijkstra@arm.com>
Date: Wed Feb 21 23:33:58 2024 +0000
AArch64: memcpy/memset expansions should not emit LDP/STP [PR113618]
The new RTL introduced for LDP/STP results in regressions due to use of
UNSPEC.
Given the new LDP fusion pass is good at finding LDP opportunities, change
the
memcpy, memmove and memset expansions to emit single vector loads/stores.
This fixes the regression and enables more RTL optimization on the standard
memory accesses. Handling of unaligned tail of memcpy/memmove is improved
with -mgeneral-regs-only. SPEC2017 performance improves slightly.
Codesize
is a bit worse due to missed LDP opportunities as discussed in the PR.
gcc/ChangeLog:
PR target/113618
* config/aarch64/aarch64.cc (aarch64_copy_one_block): Remove.
(aarch64_expand_cpymem): Emit single load/store only.
(aarch64_set_one_block): Emit single stores only.
gcc/testsuite/ChangeLog:
PR target/113618
* gcc.target/aarch64/pr113618.c: New test.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/113618] [14 Regression] AArch64: memmove idiom regression
2024-01-26 14:49 [Bug target/113618] New: [14 Regression] AArch64: memmove idiom regression wilco at gcc dot gnu.org
` (6 preceding siblings ...)
2024-03-07 21:25 ` cvs-commit at gcc dot gnu.org
@ 2024-03-13 14:36 ` wilco at gcc dot gnu.org
7 siblings, 0 replies; 9+ messages in thread
From: wilco at gcc dot gnu.org @ 2024-03-13 14:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113618
Wilco <wilco at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED
--- Comment #6 from Wilco <wilco at gcc dot gnu.org> ---
Fixed.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-03-13 14:36 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-26 14:49 [Bug target/113618] New: [14 Regression] AArch64: memmove idiom regression wilco at gcc dot gnu.org
2024-01-26 15:05 ` [Bug target/113618] " acoplan at gcc dot gnu.org
2024-01-26 16:51 ` pinskia at gcc dot gnu.org
2024-01-29 7:55 ` rguenth at gcc dot gnu.org
2024-01-29 11:51 ` wilco at gcc dot gnu.org
2024-01-31 16:58 ` wilco at gcc dot gnu.org
2024-03-07 20:44 ` law at gcc dot gnu.org
2024-03-07 21:25 ` cvs-commit at gcc dot gnu.org
2024-03-13 14:36 ` wilco at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).