public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/107413] New: Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
@ 2022-10-26 9:16 rvmallad at amazon dot com
2022-10-26 9:20 ` [Bug tree-optimization/107413] " rvmallad at amazon dot com
` (18 more replies)
0 siblings, 19 replies; 20+ messages in thread
From: rvmallad at amazon dot com @ 2022-10-26 9:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
Bug ID: 107413
Summary: Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rvmallad at amazon dot com
Target Milestone: ---
Created attachment 53775
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53775&action=edit
Input and source files.
Below is perf data from running the 519.lbm_r benchmark on aarch64
(Graviton 3 processor). I compare the baseline
(mainline commit ID: f56d48b2471c388401174029324e1f4c4b84fcdb) against a fix
for the perf loss (reverting the code change in commit ID:
b5b33e113434be909e8a6d7b93824196fb6925c0).
Steps to compile:
$ gcc -std=c99 -mabi=lp64 -g -Ofast -mcpu=native lbm.i main.i -lm -flto -o 519_lbm_r_base
$ time ./519_lbm_r_base 3000 reference.dat 0 0 100_100_130_ldc.of
real 2m50.946s
Reverting the code changes in commit ID:
b5b33e113434be909e8a6d7b93824196fb6925c0
$ time ./519_lbm_r_fix 3000 reference.dat 0 0 100_100_130_ldc.of
real 2m27.157s
The code change reverted was:
[AArch64] PR84114: Avoid reassociating FMA
Author: Wilco Dijkstra <wdijkstr@arm.com>
Date: Mon Mar 5 14:40:55 2018 +0000
Please find attached the files to reproduce this issue and the fix.
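The two `real` timings above can be related to the "~14%" in the bug title with a little arithmetic. The following is only a sketch; the helper names are hypothetical and not part of the report. With 2m50.946s as the baseline and 2m27.157s for the revert, the run time drops by roughly 13.9%, which is the same thing as a speedup of roughly 1.16x.

```c
#include <assert.h>

/* Convert a `time`-style reading such as 2m50.946s to seconds. */
static double to_seconds(int minutes, double seconds)
{
    assert(minutes >= 0 && seconds >= 0.0);
    return minutes * 60.0 + seconds;
}

/* Percentage of the baseline run time saved by the fixed build. */
static double time_saved_pct(double base_s, double fixed_s)
{
    assert(base_s > 0.0);
    return 100.0 * (base_s - fixed_s) / base_s;
}

/* Speedup of the fixed build relative to the baseline build. */
static double speedup(double base_s, double fixed_s)
{
    assert(fixed_s > 0.0);
    return base_s / fixed_s;
}
```

Calling `time_saved_pct(to_seconds(2, 50.946), to_seconds(2, 27.157))` gives the time reduction, and `speedup` with the same arguments gives the rate-style ratio.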
* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
From: rvmallad at amazon dot com @ 2022-10-26 9:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
--- Comment #1 from Rama Malladi <rvmallad at amazon dot com> ---
$ /home/ubuntu/gccfixissue2/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/home/ubuntu/gccfixissue2/bin/gcc
COLLECT_LTO_WRAPPER=/home/ubuntu/gccfixissue2/libexec/gcc/aarch64-unknown-linux-gnu/13.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../configure --prefix=/home/ubuntu/gccfixissue2
--enable-languages=c,fortran
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.0.0 20221021 (experimental) (GCC)
* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
From: wilco at gcc dot gnu.org @ 2022-10-26 11:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
Wilco <wilco at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |wilco at gcc dot gnu.org
--- Comment #2 from Wilco <wilco at gcc dot gnu.org> ---
That's interesting - if the reassociation pass has become a bit smarter in the
last 5 years, we might no longer need this workaround. What is the effect on
the overall SPECFP score? Did you try other values like fp_reassoc_width = 2 or
3?
* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
From: rvmallad at amazon dot com @ 2022-10-26 19:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
--- Comment #3 from Rama Malladi <rvmallad at amazon dot com> ---
I will measure the effect of this revert on the overall SPEC FP score. I haven't
tried experimenting with fp_reassoc_width values yet. I will try that and update.
* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
From: mark at gcc dot gnu.org @ 2022-10-27 12:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
--- Comment #4 from Mark Wielaard <mark at gcc dot gnu.org> ---
The content of attachment 53775 has been deleted for the following reason:
https://sourceware.org/pipermail/overseers/2022q4/019048.html
* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
From: rvmallad at amazon dot com @ 2022-10-28 10:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
--- Comment #5 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to Wilco from comment #2)
> That's interesting - if the reassociation pass has become a bit smarter in
> the last 5 years, we might no longer need this workaround. What is the
> effect on the overall SPECFP score? Did you try other values like
> fp_reassoc_width = 2 or 3?
Here is SPEC cpu2017 fprate perf data for a 1-copy rate run. The runs were done
on a c7g.16xlarge AWS cloud instance.
Benchmark          Ratio w/ fix (vs. baseline)
----------------------------------------------
503.bwaves_r       0.98
507.cactuBSSN_r    NA
508.namd_r         0.97
510.parest_r       NA
511.povray_r       1.01
519.lbm_r          1.16
521.wrf_r          1.00
526.blender_r      NA
527.cam4_r         1.00
538.imagick_r      0.99
544.nab_r          1.00
549.fotonik3d_r    NA
554.roms_r         1.00
geomean            1.01
The baseline was the gcc version 12.2.0 (GCC) compiler. The fix was a revert of
the code change in commit: b5b33e113434be909e8a6d7b93824196fb6925c0.
So, it looks like we aren't impacted much overall by this commit revert.
I haven't yet tried fp_reassoc_width. Will try shortly.
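The geomean row can be cross-checked from the per-benchmark ratios. The sketch below assumes that benchmarks reported as NA are simply left out of the mean; that is an assumption about how the figure was produced, not something the message states. The `NA` sentinel and both function names are hypothetical. The root is found by bisection only so that the example builds without linking libm.

```c
#include <assert.h>
#include <stddef.h>

#define NA (-1.0) /* hypothetical sentinel for benchmarks reported as "NA" */

/* n-th root of x (x > 0, n > 0) by bisection, so no libm is needed. */
static double nth_root(double x, size_t n)
{
    double lo = 0.0, hi = x > 1.0 ? x : 1.0;
    for (int iter = 0; iter < 200; iter++) {
        double mid = 0.5 * (lo + hi);
        double p = 1.0;
        for (size_t k = 0; k < n; k++)
            p *= mid;
        if (p < x)
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}

/* Geometric mean of the positive entries; NA entries are skipped. */
static double geomean_skip_na(const double *ratios, size_t n)
{
    double product = 1.0;
    size_t used = 0;
    assert(ratios != NULL);
    for (size_t i = 0; i < n; i++) {
        if (ratios[i] <= 0.0) /* NA entry: skip it */
            continue;
        product *= ratios[i];
        used++;
    }
    return used ? nth_root(product, used) : NA;
}
```

Fed the nine numeric ratios from the table, this comes out at about 1.011, consistent with the reported geomean of 1.01.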
* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
From: rvmallad at amazon dot com @ 2022-10-28 10:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
--- Comment #6 from Rama Malladi <rvmallad at amazon dot com> ---
The compilation options were: -Ofast -mcpu=native -flto
* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: wilco at gcc dot gnu.org @ 2022-11-01 12:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
--- Comment #7 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Rama Malladi from comment #5)
> So, looks like we aren't impacted much with this commit revert.
>
> I haven't yet tried fp_reassoc_width. Will try shortly.
The revert results in about 0.5% loss on Neoverse N1, so it looks like the
reassociation pass is still splitting FMAs into separate MUL and ADD (which is
bad for narrow cores).
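To illustrate the trade-off being described, here is a small sketch of a four-term sum of products written two ways. It is not taken from lbm or from GCC; the function names are hypothetical, and whether a compiler actually emits FMAs for either form depends on the target and flags. In the chained form each step can be contracted into a multiply-add, giving one multiply plus three dependent FMAs. In the width-2 form there are two independent partial sums, which shortens the dependency chain but, assuming the same contraction, costs an extra instruction (two multiplies, two FMAs and one add).

```c
#include <assert.h>

/* Serial form: one multiply feeding a chain of three dependent
   multiply-adds; each step is a candidate for FMA contraction. */
static double dot4_chain(const double *a, const double *b)
{
    assert(a && b);
    double acc = a[0] * b[0];
    acc = acc + a[1] * b[1];
    acc = acc + a[2] * b[2];
    acc = acc + a[3] * b[3];
    return acc;
}

/* Width-2 form: two independent partial sums combined by a final add.
   Shorter critical path, but more operations to issue. */
static double dot4_width2(const double *a, const double *b)
{
    assert(a && b);
    double lo = a[0] * b[0] + a[1] * b[1];
    double hi = a[2] * b[2] + a[3] * b[3];
    return lo + hi;
}
```

On a core with few FP pipes the extra operations can outweigh the shorter chain, while a wider core can run the independent halves in parallel. For inputs that are exactly representable, such as small integers, both forms return the same value; in general the rounding may differ, which is why this kind of rewrite needs flags like -Ofast.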
* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: rvmallad at amazon dot com @ 2022-11-02 0:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
--- Comment #8 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to Wilco from comment #7)
> The revert results in about 0.5% loss on Neoverse N1, so it looks like the
> reassociation pass is still splitting FMAs into separate MUL and ADD (which
> is bad for narrow cores).
Thank you for checking on N1. Did you happen to check on V1 too, to reproduce
the perf results I had? Are there any other experiments or tests I can do to
help with this filing? Thanks again for the debug/fix.
* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: rvmallad at amazon dot com @ 2022-11-02 23:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
--- Comment #9 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to Rama Malladi from comment #8)
> (In reply to Wilco from comment #7)
> > The revert results in about 0.5% loss on Neoverse N1, so it looks like the
> > reassociation pass is still splitting FMAs into separate MUL and ADD (which
> > is bad for narrow cores).
>
> Thank you for checking on N1. Did you happen to check on V1 too to reproduce
> the perf results I had? Any other experiments/ tests I can do to help on
> this filing? Thanks again for the debug/ fix.
I ran the SPEC cpu2017 fprate 1-copy benchmark built with the patch reverted and
using option 'neoverse-n1' on the Graviton 3 processor (which has support for
SVE). Overall performance was up by 0.4%, the primary contributor being
519.lbm_r, which was up 13%.
* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: wilco at gcc dot gnu.org @ 2022-11-04 17:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
Wilco <wilco at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Last reconfirmed| |2022-11-04
Status|UNCONFIRMED |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |wilco at gcc dot gnu.org
--- Comment #10 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Rama Malladi from comment #9)
> (In reply to Rama Malladi from comment #8)
> > (In reply to Wilco from comment #7)
> > > The revert results in about 0.5% loss on Neoverse N1, so it looks like the
> > > reassociation pass is still splitting FMAs into separate MUL and ADD (which
> > > is bad for narrow cores).
> >
> > Thank you for checking on N1. Did you happen to check on V1 too to reproduce
> > the perf results I had? Any other experiments/ tests I can do to help on
> > this filing? Thanks again for the debug/ fix.
>
> I ran SPEC cpu2017 fprate 1-copy benchmark built with the patch reverted and
> using option 'neoverse-n1' on the Graviton 3 processor (which has support
> for SVE). The performance was up by 0.4%, primary contributor being
> 519.lbm_r which was up 13%.
I'm seeing about 1.5% gain on Neoverse V1 and 0.5% loss on Neoverse N1. I'll
post a patch that allows per-CPU settings for FMA reassociation, so you'll get
good performance with -mcpu=native. However reassociation really needs to be
taught about the existence of FMAs.
* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: rvmallad at amazon dot com @ 2022-11-07 7:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
--- Comment #11 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to Wilco from comment #10)
> I'm seeing about 1.5% gain on Neoverse V1 and 0.5% loss on Neoverse N1. I'll
> post a patch that allows per-CPU settings for FMA reassociation, so you'll
> get good performance with -mcpu=native. However reassociation really needs
> to be taught about the existence of FMAs.
Thank you very much Wilco.
* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: cvs-commit at gcc dot gnu.org @ 2022-11-24 13:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Wilco Dijkstra <wilco@gcc.gnu.org>:
https://gcc.gnu.org/g:0c1b0a23f1fe7db6a2e391b7cb78cff900377772
commit r13-4291-g0c1b0a23f1fe7db6a2e391b7cb78cff900377772
Author: Wilco Dijkstra <wilco.dijkstra@arm.com>
Date: Wed Nov 23 17:27:19 2022 +0000
AArch64: Add fma_reassoc_width [PR107413]
Add a reassociation width for FMA in per-CPU tuning structures. Keep
the existing setting of 1 for cores with 2 FMA pipes (this disables
reassociation), and use 4 for cores with 4 FMA pipes. This improves
SPECFP2017 on Neoverse V1 by ~1.5%.
gcc/
PR tree-optimization/107413
* config/aarch64/aarch64.cc (struct tune_params): Add
fma_reassoc_width to all CPU tuning structures.
(aarch64_reassociation_width): Use fma_reassoc_width.
* config/aarch64/aarch64-protos.h (struct tune_params): Add
fma_reassoc_width.
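The shape of this change can be sketched as follows. This is a simplified, self-contained model and not the actual GCC source: the enums, the struct name `tune_params_sketch`, the function name and the non-FMA width values are all hypothetical stand-ins. The rule that a floating-point addition uses the FMA width (because it may later be fused into an FMA) is an assumption about how the hook picks between the two FP widths; the commit message above only confirms that `aarch64_reassociation_width` uses `fma_reassoc_width`, with 1 for cores with 2 FMA pipes and 4 for cores with 4.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for GCC's tree codes and modes. */
enum op_kind { OP_PLUS, OP_MULT };
enum mode_kind { MODE_INT, MODE_FLOAT, MODE_VECTOR };

/* Per-CPU tuning widths; fma_reassoc_width is the field the commit adds. */
struct tune_params_sketch {
    int int_reassoc_width;
    int fp_reassoc_width;
    int fma_reassoc_width;
    int vec_reassoc_width;
};

/* Return how many independent chains reassociation may create. */
static int reassociation_width(const struct tune_params_sketch *t,
                               enum op_kind op, enum mode_kind mode)
{
    assert(t != NULL);
    switch (mode) {
    case MODE_VECTOR:
        return t->vec_reassoc_width;
    case MODE_INT:
        return t->int_reassoc_width;
    case MODE_FLOAT:
        /* Assumption: an FP addition may become an FMA, so it takes
           the FMA width; other FP operations take the plain FP width. */
        return op == OP_PLUS ? t->fma_reassoc_width : t->fp_reassoc_width;
    }
    return 1;
}
```

With a width of 1 the FP additions are left as a single chain, so FMA formation is undisturbed on narrow cores; with a width of 4 a wider core gets up to four independent chains.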
* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: rvmallad at amazon dot com @ 2022-11-28 8:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
--- Comment #13 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to CVS Commits from comment #12)
> The master branch has been updated by Wilco Dijkstra <wilco@gcc.gnu.org>:
>
> https://gcc.gnu.org/g:0c1b0a23f1fe7db6a2e391b7cb78cff900377772
>
> commit r13-4291-g0c1b0a23f1fe7db6a2e391b7cb78cff900377772
> Author: Wilco Dijkstra <wilco.dijkstra@arm.com>
> Date: Wed Nov 23 17:27:19 2022 +0000
>
> AArch64: Add fma_reassoc_width [PR107413]
>
> Add a reassociation width for FMA in per-CPU tuning structures. Keep
> the existing setting of 1 for cores with 2 FMA pipes (this disables
> reassociation), and use 4 for cores with 4 FMA pipes. This improves
> SPECFP2017 on Neoverse V1 by ~1.5%.
>
> gcc/
> PR tree-optimization/107413
> * config/aarch64/aarch64.cc (struct tune_params): Add
> fma_reassoc_width to all CPU tuning structures.
> (aarch64_reassociation_width): Use fma_reassoc_width.
> * config/aarch64/aarch64-protos.h (struct tune_params): Add
> fma_reassoc_width.
Thank you for this code change/ fix. I will attempt a run with this change.
* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: rvmallad at amazon dot com @ 2022-11-29 9:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
--- Comment #14 from Rama Malladi <rvmallad at amazon dot com> ---
This fix also improved performance of 538.imagick_r by 15%. Did you have a
similar observation? Thank you.
* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: wilco at gcc dot gnu.org @ 2022-11-29 12:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
--- Comment #15 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Rama Malladi from comment #14)
> This fix also improved performance of 538.imagick_r by 15%. Did you have a
> similar observation? Thank you.
No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible that
-mcpu=neoverse-v1 shows larger speedups. What gain do you get on the overall FP
score?
* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: rvmallad at amazon dot com @ 2022-11-30 4:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
--- Comment #16 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to Wilco from comment #15)
> (In reply to Rama Malladi from comment #14)
> > This fix also improved performance of 538.imagick_r by 15%. Did you have a
> > similar observation? Thank you.
>
> No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible
> -mcpu=neoverse-v1 shows larger speedups, what gain do you get on the overall
> FP score?
I was using -mcpu=native and ran on a Neoverse V1 core (Graviton3). Here are
the scores I got (relative gains of latest mainline vs. an earlier mainline).
Latest mainline: 0976b012d89e3d819d83cdaf0dab05925b3eb3a0
Earlier mainline: f896c13489d22b30d01257bc8316ab97b3359d1c
fp 1-copy rate     Ratio
503.bwaves_r       0.98
507.cactuBSSN_r    1.00
508.namd_r         0.97
510.parest_r       NA
511.povray_r       NA
519.lbm_r          1.16
521.wrf_r          1.00
526.blender_r      0.99
527.cam4_r         NA
538.imagick_r      1.17
544.nab_r          1.01
549.fotonik3d_r    NA
554.roms_r         1.00
geomean            1.03
* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: wilco at gcc dot gnu.org @ 2022-12-01 13:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
Wilco <wilco at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|ASSIGNED |RESOLVED
--- Comment #17 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Rama Malladi from comment #16)
> (In reply to Wilco from comment #15)
> > (In reply to Rama Malladi from comment #14)
> > > This fix also improved performance of 538.imagick_r by 15%. Did you have a
> > > similar observation? Thank you.
> >
> > No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible
> > -mcpu=neoverse-v1 shows larger speedups, what gain do you get on the overall
> > FP score?
>
> I was using -mcpu=native and run on a Neoverse V1 arch (Graviton3). Here are
> the scores I got (relative gains of latest mainline vs. an earlier mainline).
>
> Latest mainline: 0976b012d89e3d819d83cdaf0dab05925b3eb3a0
> Earlier mainline: f896c13489d22b30d01257bc8316ab97b3359d1c
Right, that's about 3 weeks of changes. I think
1b9a5cc9ec08e9f239dd2096edcc447b7a72f64a has improved imagick_r.
> geomean 1.03
That's a nice gain in 3 weeks!
* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: rvmallad at amazon dot com @ 2022-12-01 16:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
--- Comment #18 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to Wilco from comment #17)
> (In reply to Rama Malladi from comment #16)
> > (In reply to Wilco from comment #15)
> > > (In reply to Rama Malladi from comment #14)
> > > > This fix also improved performance of 538.imagick_r by 15%. Did you have a
> > > > similar observation? Thank you.
> > >
> > > No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible
> > > -mcpu=neoverse-v1 shows larger speedups, what gain do you get on the overall
> > > FP score?
> >
> > I was using -mcpu=native and run on a Neoverse V1 arch (Graviton3). Here are
> > the scores I got (relative gains of latest mainline vs. an earlier mainline).
> >
> > Latest mainline: 0976b012d89e3d819d83cdaf0dab05925b3eb3a0
> > Earlier mainline: f896c13489d22b30d01257bc8316ab97b3359d1c
>
> Right that's about 3 weeks of changes, I think
> 1b9a5cc9ec08e9f239dd2096edcc447b7a72f64a has improved imagick_r.
>
> > geomean 1.03
>
> That's a nice gain in 3 weeks!
Yes, indeed :-) ... Thank you.
* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: rvmallad at amazon dot com @ 2022-12-02 2:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413
--- Comment #19 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to Wilco from comment #17)
> (In reply to Rama Malladi from comment #16)
> > (In reply to Wilco from comment #15)
> > > (In reply to Rama Malladi from comment #14)
> > > > This fix also improved performance of 538.imagick_r by 15%. Did you have a
> > > > similar observation? Thank you.
> > >
> > > No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible
> > > -mcpu=neoverse-v1 shows larger speedups, what gain do you get on the overall
> > > FP score?
> >
> > I was using -mcpu=native and run on a Neoverse V1 arch (Graviton3). Here are
> > the scores I got (relative gains of latest mainline vs. an earlier mainline).
> >
> > Latest mainline: 0976b012d89e3d819d83cdaf0dab05925b3eb3a0
> > Earlier mainline: f896c13489d22b30d01257bc8316ab97b3359d1c
>
> Right that's about 3 weeks of changes, I think
> 1b9a5cc9ec08e9f239dd2096edcc447b7a72f64a has improved imagick_r.
>
> > geomean 1.03
>
> That's a nice gain in 3 weeks!
Hi Wilco, could you backport the change to the active release branches? Thanks.