public inbox for gcc-bugs@sourceware.org
* [Bug target/110024] New: [Bug] 5% performance drop on important benchmark after r260951.
From: d_vampile at 163 dot com @ 2023-05-29 14:42 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110024
Bug ID: 110024
Summary: [Bug] 5% performance drop on important benchmark after
r260951.
Product: gcc
Version: 10.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: d_vampile at 163 dot com
Target Milestone: ---
After r260951, the Copy kernel of the STREAM benchmark runs about 3% slower on
AArch64.
The benchmark source (stream.c) can also be obtained from
https://github.com/jeffhammond/stream/archive/master.zip.
Compiling & Running:
gcc -fopenmp -O -DSTREAM_ARRAY_SIZE=100000000 stream.c -o stream
./stream
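STREAM reports each kernel as a bandwidth (MB/s), so the 3% figure presumably refers to the Copy rate printed by ./stream. The sketch below shows that arithmetic, assuming STREAM's usual accounting of 2 * sizeof(double) * STREAM_ARRAY_SIZE bytes per Copy pass and MB = 1e6 bytes; the helper names copy_rate_mb_per_s and percent_change are hypothetical and are not part of stream.c.

```c
#include <stddef.h>

/* Hypothetical helper: bandwidth of one Copy pass, counted the way
   STREAM does (one element read from a and one written to c per index),
   expressed in MB of 1e6 bytes. */
double copy_rate_mb_per_s(size_t n_elems, size_t elem_size, double seconds)
{
    double bytes = 2.0 * (double)elem_size * (double)n_elems;
    return 1.0e-6 * bytes / seconds;
}

/* Hypothetical helper: relative change in percent (negative = slower). */
double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With -DSTREAM_ARRAY_SIZE=100000000 and double elements, each Copy pass moves 1.6e9 bytes, so copy_rate_mb_per_s(100000000, sizeof(double), t) evaluates to 1600 / t MB/s.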
Before the modification (Copy kernel):
ldr d0, [x2, x0, lsl #3]
str d0, [x3, x0, lsl #3]
add x0, x0, #0x1
cmp x1, x0
b.ne 400a00 <main._omp_fn.4+0x54>
ldr x19, [sp, #16]
ldp x29, x30, [sp], #32
ret
After the modification:
ldr x2, [x3, x0, lsl #3]
str x2, [x4, x0, lsl #3]
add x0, x0, #0x1
cmp x1, x0
b.ne 400a00 <main._omp_fn.4+0x54>
ldr x19, [sp, #16]
ldp x29, x30, [sp], #32
ret
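For context, both listings are the inner loop of STREAM's Copy kernel, which in stream.c is an OpenMP-parallel `c[j] = a[j]` loop over STREAM_TYPE (double by default); GCC outlines such a parallel loop into a separate function, here main._omp_fn.4. A minimal standalone sketch, assuming double elements; the name stream_copy is hypothetical, and restrict stands in for the fact that STREAM's a and c are distinct static arrays.

```c
#include <stddef.h>

/* Hypothetical standalone version of the STREAM Copy kernel.
   Built with -fopenmp, GCC outlines the loop into a helper named like
   stream_copy._omp_fn.0 and splits the iterations across threads;
   without -fopenmp the pragma is ignored and the loop runs serially. */
void stream_copy(double *restrict c, const double *restrict a, size_t n)
{
#pragma omp parallel for
    for (size_t j = 0; j < n; j++)
        c[j] = a[j]; /* at -O, as in the listings: one 8-byte load and one 8-byte store */
}
```

Each iteration in both listings is one 8-byte load and one 8-byte store; the only difference is whether the value passes through an FP/SIMD register (d0) or a general-purpose register (x2).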
* [Bug target/110024] [Bug] 5% performance drop on important benchmark after r260951.
From: d_vampile at 163 dot com @ 2023-05-29 14:46 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110024
--- Comment #1 from d_vampile <d_vampile at 163 dot com> ---
This shows that an FP/SIMD register (D0) is used before the modification,
and a general-purpose register (X2) is used after it.
* [Bug target/110024] [Bug] 5% performance drop on important benchmark after r260951.
From: pinskia at gcc dot gnu.org @ 2023-05-29 14:48 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110024
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |WAITING
Last reconfirmed| |2023-05-29
Ever confirmed|0 |1
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Which core is showing the difference here?
Because on some cores I know of, loading/storing using the FP registers is
actually one cycle slower than using GPRs.
* [Bug target/110024] [Bug] 5% performance drop on important benchmark after r260951.
From: d_vampile at 163 dot com @ 2023-05-29 15:09 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110024
d_vampile <d_vampile at 163 dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |RESOLVED
Resolution|--- |INVALID
* [Bug target/110024] [Bug] 5% performance drop on important benchmark after r260951.
From: d_vampile at 163 dot com @ 2023-05-29 15:19 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110024
--- Comment #3 from d_vampile <d_vampile at 163 dot com> ---
(In reply to Andrew Pinski from comment #2)
> Which core is showing the difference here?
> Because on some cores I know of, loading/storing using the FP registers is
> actually one cycle slower than using GPRs.
Yes, you're right. This report is my mistake: I carelessly posted the assembly
listings the wrong way round. Performance is better with the general-purpose
register, which is what was used before the modification. The real question is
why this modification makes the compiler select D0 and degrades performance.
I will follow up in a new report and look forward to your reply:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110026