public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/110026] New: [Bug] 5% performance drop on important benchmark after r260951.
@ 2023-05-29 15:14 d_vampile at 163 dot com
2023-05-29 15:17 ` [Bug tree-optimization/110026] " jakub at gcc dot gnu.org
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: d_vampile at 163 dot com @ 2023-05-29 15:14 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110026
Bug ID: 110026
Summary: [Bug] 5% performance drop on important benchmark after
r260951.
Product: gcc
Version: 10.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: d_vampile at 163 dot com
Target Milestone: ---
Created attachment 55184
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55184&action=edit
Open-source stream benchmark
After r260951 was committed, the Copy kernel of the STREAM benchmark runs
about 3% slower on AArch64.
The benchmark source can also be obtained from
https://github.com/jeffhammond/stream/archive/master.zip.
Compiling & Running:
gcc -fopenmp -O -DSTREAM_ARRAY_SIZE=100000000 stream.c -o stream
./stream
Before the modification (Copy kernel loop):
ldr x2, [x3, x0, lsl #3]
str x2, [x4, x0, lsl #3]
add x0, x0, #0x1
cmp x1, x0
b.ne 400a00 <main._omp_fn.4+0x54>
ldr x19, [sp, #16]
ldp x29, x30, [sp], #32
ret
After the modification:
ldr d0, [x2, x0, lsl #3]
str d0, [x3, x0, lsl #3]
add x0, x0, #0x1
cmp x1, x0
b.ne 400a00 <main._omp_fn.4+0x54>
ldr x19, [sp, #16]
ldp x29, x30, [sp], #32
ret
It can be seen that a general-purpose register (x2) holds the copied value
before the modification, and an FP/SIMD register (d0) holds it after the
modification.
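The loop in the listings above corresponds to STREAM's Copy kernel (c[j] = a[j]). The following is a minimal stand-alone sketch of that loop, assuming the default element type of double; the function name copy_kernel and its signature are hypothetical and do not appear in stream.c, which runs the loop inside an OpenMP region in main. Compiling it with the flags under test (for example gcc -O -S) may make it easier to see which register class is chosen for the copied value.

```c
#include <stddef.h>

/* Hypothetical stand-alone reduction of STREAM's Copy kernel.
   Assumes double elements, the default STREAM_TYPE. */
void copy_kernel(double *restrict c, const double *restrict a, size_t n)
{
    for (size_t j = 0; j < n; j++)
        c[j] = a[j];   /* one 8-byte load and one 8-byte store per iteration */
}
```

Because no arithmetic is performed on the values, either a general-purpose or an FP register can carry each element, so the choice is left to the compiler's cost model.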
* [Bug tree-optimization/110026] [Bug] 5% performance drop on important benchmark after r260951.
From: jakub at gcc dot gnu.org @ 2023-05-29 15:17 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110026
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Note, any benchmarking for speed with -O rather than -O2/-O3 is intentionally
missing various optimizations which can greatly improve performance.
* [Bug target/110026] [Bug] 5% performance drop on important benchmark after r260951.
From: d_vampile at 163 dot com @ 2023-05-30 14:37 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110026
--- Comment #2 from d_vampile <d_vampile at 163 dot com> ---
(In reply to Jakub Jelinek from comment #1)
> Note, any benchmarking for speed with -O rather than -O2/-O3 is
> intentionally missing various optimizations which can greatly improve
> performance.
-O does miss a lot of optimizations. However, for the problem I mentioned, GPRs
were used before the modification and FP registers are used after it. When
vectorization is not applicable, copying through an X (general-purpose)
register is faster than copying through a D (FP) register. Is it appropriate
to change this?
* [Bug target/110026] [Bug] 5% performance drop on important benchmark after r260951.
From: pinskia at gcc dot gnu.org @ 2023-05-30 14:48 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110026
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to d_vampile from comment #2)
> -O does miss a lot of optimizations. However, for the problem I mentioned,
> GPRs were used before the modification and FP registers are used after it.
> When vectorization is not applicable, copying through an X (general-purpose)
> register is faster than copying through a D (FP) register. Is it appropriate
> to change this?
Well the generic_tunings has:
  { 4, /* load_int.  */
    4, /* store_int.  */
    4, /* load_fp.  */
    4, /* store_fp.  */
    4, /* load_pred.  */
    4  /* store_pred.  */
  }, /* memmov_cost.  */
This says that a load/store of an FP register has the same cost as a
load/store of an integer register (GPR); this is the same as a53's tuning.
If anything, that is what should be changed ....
Or you should use -mcpu=* where applicable.
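To illustrate the point about the cost table, here is a small illustrative model, not GCC's actual data structure or register allocator logic. The field names are taken from the comments in the snippet above; the struct name memmov_cost_model and the helper functions are hypothetical. It only shows that when all entries are equal, a load-plus-store through a GPR and through an FP register cost the same, so the table alone gives no reason to prefer one over the other.

```c
/* Illustrative model only: NOT GCC's real structure or allocator code.
   Field names follow the comments in the generic_tunings snippet. */
struct memmov_cost_model {
    int load_int;
    int store_int;
    int load_fp;
    int store_fp;
    int load_pred;
    int store_pred;
};

/* Cost of copying one value through memory via a general-purpose register. */
int copy_cost_gpr(const struct memmov_cost_model *m)
{
    return m->load_int + m->store_int;
}

/* Cost of copying one value through memory via an FP register. */
int copy_cost_fp(const struct memmov_cost_model *m)
{
    return m->load_fp + m->store_fp;
}

/* Negative: GPR is cheaper; zero: tie; positive: FP is cheaper. */
int compare_copy_cost(const struct memmov_cost_model *m)
{
    return copy_cost_gpr(m) - copy_cost_fp(m);
}
```

With the values quoted above (all 4), compare_copy_cost returns 0, which is a tie. A tuning whose FP entries were higher than its integer entries would make the GPR path come out cheaper in this model.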