[Bug tree-optimization/107413] New: Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/107413] New: Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
From: rvmallad at amazon dot com @ 2022-10-26  9:16 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

            Bug ID: 107413
           Summary: Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rvmallad at amazon dot com
  Target Milestone: ---

Created attachment 53775
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53775&action=edit
Input and source files.

Below is performance data from running the 519.lbm_r benchmark on the AArch64
architecture (Graviton 3 processor). It compares the baseline (mainline commit
ID: f56d48b2471c388401174029324e1f4c4b84fcdb) with a fix for the regression
(a revert of the code change in commit ID:
b5b33e113434be909e8a6d7b93824196fb6925c0).

Steps to compile:
$ gcc -std=c99 -mabi=lp64 -g -Ofast -mcpu=native lbm.i main.i -lm -flto -o 519_lbm_r_base

$ time ./519_lbm_r_base 3000 reference.dat 0 0 100_100_130_ldc.of
real    2m50.946s

With the code change in commit ID b5b33e113434be909e8a6d7b93824196fb6925c0
reverted:
$ time ./519_lbm_r_fix 3000 reference.dat 0 0 100_100_130_ldc.of
real    2m27.157s

The code change reverted was:
    [AArch64] PR84114: Avoid reassociating FMA

Author: Wilco Dijkstra <wdijkstr@arm.com>
Date:   Mon Mar 5 14:40:55 2018 +0000

Please find attached the files to reproduce this issue and the fix.
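As a quick arithmetic check of the headline figure, the sketch below converts the two `real` timings above into seconds and expresses the difference both ways. The helper names are hypothetical, for illustration only. With 170.946 s for the baseline and 147.157 s with the revert, the time saved is about 13.9% of the baseline run time (the ~14% in the summary), which corresponds to a SPEC-style speedup ratio of about 1.16.

```c
/* Hypothetical helpers for checking the timings reported above. */

/* Convert a `time` "real" reading such as 2m50.946s into seconds. */
static double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Time saved by the fix, as a fraction of the baseline run time. */
static double relative_loss(double base_s, double fix_s)
{
    return (base_s - fix_s) / base_s;
}

/* SPEC-style ratio: baseline time divided by new time (> 1 means faster). */
static double speedup_ratio(double base_s, double fix_s)
{
    return base_s / fix_s;
}
```

For example, `relative_loss(to_seconds(2, 50.946), to_seconds(2, 27.157))` evaluates to roughly 0.139, and `speedup_ratio` on the same inputs to roughly 1.16.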

* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
From: rvmallad at amazon dot com @ 2022-10-26  9:20 UTC
  To: gcc-bugs

--- Comment #1 from Rama Malladi <rvmallad at amazon dot com> ---
$ /home/ubuntu/gccfixissue2/bin/gcc  -v
Using built-in specs.
COLLECT_GCC=/home/ubuntu/gccfixissue2/bin/gcc
COLLECT_LTO_WRAPPER=/home/ubuntu/gccfixissue2/libexec/gcc/aarch64-unknown-linux-gnu/13.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../configure --prefix=/home/ubuntu/gccfixissue2
--enable-languages=c,fortran
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.0.0 20221021 (experimental) (GCC)

* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
From: wilco at gcc dot gnu.org @ 2022-10-26 11:47 UTC
  To: gcc-bugs

Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #2 from Wilco <wilco at gcc dot gnu.org> ---
That's interesting - if the reassociation pass has become a bit smarter in the
last 5 years, we might no longer need this workaround. What is the effect on
the overall SPECFP score? Did you try other values like fp_reassoc_width = 2 or
3?

* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
From: rvmallad at amazon dot com @ 2022-10-26 19:03 UTC
  To: gcc-bugs

--- Comment #3 from Rama Malladi <rvmallad at amazon dot com> ---
I will measure the effect of this revert on the overall SPEC FP score. I haven't
experimented with fp_reassoc_width values yet; I will try that and update.

* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
From: mark at gcc dot gnu.org @ 2022-10-27 12:19 UTC
  To: gcc-bugs

--- Comment #4 from Mark Wielaard <mark at gcc dot gnu.org> ---
The content of attachment 53775 has been deleted for the following reason:

https://sourceware.org/pipermail/overseers/2022q4/019048.html

* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
From: rvmallad at amazon dot com @ 2022-10-28 10:41 UTC
  To: gcc-bugs

--- Comment #5 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to Wilco from comment #2)
> That's interesting - if the reassociation pass has become a bit smarter in
> the last 5 years, we might no longer need this workaround. What is the
> effect on the overall SPECFP score? Did you try other values like
> fp_reassoc_width = 2 or 3?

Here is SPEC cpu2017 fprate performance data for a 1-copy rate run, measured on
a c7g.16xlarge AWS cloud instance.

Benchmark       Ratio (w/ fix vs. baseline)
----------------------
503.bwaves_r    0.98
507.cactuBSSN_r NA
508.namd_r      0.97
510.parest_r    NA
511.povray_r    1.01
519.lbm_r       1.16
521.wrf_r       1.00
526.blender_r   NA
527.cam4_r      1.00
538.imagick_r   0.99
544.nab_r       1.00
549.fotonik3d_r NA
554.roms_r      1.00
geomean         1.01

The baseline compiler was GCC 12.2.0; the fix was a revert of the code change
in commit b5b33e113434be909e8a6d7b93824196fb6925c0.

So, looks like we aren't impacted much with this commit revert.

I haven't yet tried fp_reassoc_width. Will try shortly.
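The geomean row summarizes the per-benchmark ratios. As a sketch, assuming the usual definition (the geometric mean over the benchmarks that have a value, with NA rows skipped), it can be reproduced without libm as below; `RATIO_NA`, `nth_root` and `geomean` are hypothetical names. For the nine values in this table it gives about 1.011, consistent with the reported 1.01, and for the later Ratio table in comment #16 about 1.029, consistent with 1.03. Multiplying the ratios directly is fine for a dozen values near 1.0; a much larger set would be better handled through logarithms to avoid overflow.

```c
#include <stddef.h>

/* Hypothetical sentinel for the "NA" rows (benchmark not reported). */
#define RATIO_NA (-1.0)

/* x^(1/n) for x > 0 by bisection, so no libm is needed. */
static double nth_root(double x, unsigned n)
{
    double lo = 0.0, hi = x > 1.0 ? x : 1.0;   /* root lies in [lo, hi] */
    for (int iter = 0; iter < 200; iter++) {
        double mid = 0.5 * (lo + hi), p = 1.0;
        for (unsigned k = 0; k < n; k++)
            p *= mid;                           /* p = mid^n */
        if (p < x)
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}

/* Geometric mean of the ratios, skipping NA entries; 0.0 if none remain. */
static double geomean(const double *ratios, size_t n)
{
    double product = 1.0;
    unsigned used = 0;
    for (size_t i = 0; i < n; i++) {
        if (ratios[i] <= 0.0)
            continue;                           /* NA row */
        product *= ratios[i];
        used++;
    }
    return used ? nth_root(product, used) : 0.0;
}
```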

* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
From: rvmallad at amazon dot com @ 2022-10-28 10:46 UTC
  To: gcc-bugs

--- Comment #6 from Rama Malladi <rvmallad at amazon dot com> ---
The compilation options were: -Ofast -mcpu=native -flto

* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: wilco at gcc dot gnu.org @ 2022-11-01 12:48 UTC
  To: gcc-bugs

--- Comment #7 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Rama Malladi from comment #5)

> So, looks like we aren't impacted much with this commit revert.
> 
> I haven't yet tried fp_reassoc_width. Will try shortly.

The revert results in about 0.5% loss on Neoverse N1, so it looks like the
reassociation pass is still splitting FMAs into separate MUL and ADD (which is
bad for narrow cores).
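To make "splitting FMAs into separate MUL and ADD" concrete, here is a hedged C sketch of the same four-term sum of products in two association orders. Whether either form actually becomes FMA instructions depends on the compiler and on floating-point contraction settings (for example GCC's -ffp-contract); the operation counts in the comments assume every a*b + c that can be contracted is contracted. The function names are hypothetical.

```c
/* Chained form: each step is acc + a[i]*b[i], so after the first
   multiply every step can be contracted into one FMA.
   Cost if contracted: 1 MUL + 3 FMA, dependency depth 4. */
static double sum4_chain(const double a[4], const double b[4])
{
    double acc = a[0] * b[0];
    acc = acc + a[1] * b[1];
    acc = acc + a[2] * b[2];
    acc = acc + a[3] * b[3];
    return acc;
}

/* Reassociated (tree) form: two independent pairs, then one combining
   add. More parallelism, but the final add joins two partial sums, so it
   cannot be fused. Cost if contracted: 2 MUL + 2 FMA + 1 ADD, depth 3. */
static double sum4_tree(const double a[4], const double b[4])
{
    double lo = a[0] * b[0] + a[1] * b[1];
    double hi = a[2] * b[2] + a[3] * b[3];
    return lo + hi;
}
```

The tree form trades one extra instruction for a shorter dependency chain; on a core with few FMA pipes that extra work can outweigh the gain, which fits the remark that this is bad for narrow cores. With exactly representable inputs such as a = {1, 2, 3, 4} and b = {5, 6, 7, 8}, both functions return 70.0.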

* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: rvmallad at amazon dot com @ 2022-11-02  0:29 UTC
  To: gcc-bugs

--- Comment #8 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to Wilco from comment #7)
> The revert results in about 0.5% loss on Neoverse N1, so it looks like the
> reassociation pass is still splitting FMAs into separate MUL and ADD (which
> is bad for narrow cores).

Thank you for checking on N1. Did you happen to check on V1 too to reproduce
the perf results I had? Any other experiments/tests I can do to help on this
filing? Thanks again for the debug/fix.

* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: rvmallad at amazon dot com @ 2022-11-02 23:39 UTC
  To: gcc-bugs

--- Comment #9 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to Rama Malladi from comment #8)
> Thank you for checking on N1. Did you happen to check on V1 too to reproduce
> the perf results I had? Any other experiments/tests I can do to help on
> this filing? Thanks again for the debug/fix.

I ran the SPEC cpu2017 fprate 1-copy benchmark, built with the patch reverted
and using the 'neoverse-n1' option, on the Graviton 3 processor (which supports
SVE). Overall performance was up 0.4%, with 519.lbm_r the primary contributor
at +13%.

* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: wilco at gcc dot gnu.org @ 2022-11-04 17:26 UTC
  To: gcc-bugs

Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-11-04
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |wilco at gcc dot gnu.org

--- Comment #10 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Rama Malladi from comment #9)
> I ran SPEC cpu2017 fprate 1-copy benchmark built with the patch reverted and
> using option 'neoverse-n1' on the Graviton 3 processor (which has support
> for SVE). The performance was up by 0.4%, primary contributor being
> 519.lbm_r which was up 13%.

I'm seeing about 1.5% gain on Neoverse V1 and 0.5% loss on Neoverse N1. I'll
post a patch that allows per-CPU settings for FMA reassociation, so you'll get
good performance with -mcpu=native. However reassociation really needs to be
taught about the existence of FMAs.

* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: rvmallad at amazon dot com @ 2022-11-07  7:42 UTC
  To: gcc-bugs

--- Comment #11 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to Wilco from comment #10)
> I'm seeing about 1.5% gain on Neoverse V1 and 0.5% loss on Neoverse N1. I'll
> post a patch that allows per-CPU settings for FMA reassociation, so you'll
> get good performance with -mcpu=native. However reassociation really needs
> to be taught about the existence of FMAs.

Thank you very much Wilco.

* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: cvs-commit at gcc dot gnu.org @ 2022-11-24 13:30 UTC
  To: gcc-bugs

--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Wilco Dijkstra <wilco@gcc.gnu.org>:

https://gcc.gnu.org/g:0c1b0a23f1fe7db6a2e391b7cb78cff900377772

commit r13-4291-g0c1b0a23f1fe7db6a2e391b7cb78cff900377772
Author: Wilco Dijkstra <wilco.dijkstra@arm.com>
Date:   Wed Nov 23 17:27:19 2022 +0000

    AArch64: Add fma_reassoc_width [PR107413]

    Add a reassociation width for FMA in per-CPU tuning structures. Keep
    the existing setting of 1 for cores with 2 FMA pipes (this disables
    reassociation), and use 4 for cores with 4 FMA pipes.  This improves
    SPECFP2017 on Neoverse V1 by ~1.5%.

    gcc/
            PR tree-optimization/107413
            * config/aarch64/aarch64.cc (struct tune_params): Add
            fma_reassoc_width to all CPU tuning structures.
            (aarch64_reassociation_width): Use fma_reassoc_width.
            * config/aarch64/aarch64-protos.h (struct tune_params): Add
            fma_reassoc_width.
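The effect of a reassociation width can be sketched at the source level. The following is a hypothetical illustration, not GCC's implementation: a dot product split into `width` independent partial sums. With width 1 every multiply-add depends on the previous one, forming a single serial chain; with width 4 there are four independent chains that a core with 4 FMA pipes could, in principle, execute in parallel, at the cost of a few extra adds to combine the partial sums. The function name and the cap of 4 are assumptions chosen to mirror the values in the commit message.

```c
#include <stddef.h>

#define MAX_WIDTH 4  /* assumption: mirrors the largest width in the commit message */

/* Dot product reassociated into `width` independent accumulators.
   width == 1 keeps the original serial order (no reassociation). */
static double dot_with_width(const double *a, const double *b, size_t n,
                             size_t width)
{
    double partial[MAX_WIDTH] = {0.0};
    if (width < 1)
        width = 1;
    if (width > MAX_WIDTH)
        width = MAX_WIDTH;
    for (size_t i = 0; i < n; i++)
        partial[i % width] += a[i] * b[i];  /* lane i % width is its own chain */
    double sum = partial[0];
    for (size_t k = 1; k < width; k++)
        sum += partial[k];                  /* combine the partial sums */
    return sum;
}
```

Because floating-point addition is not associative, different widths can give slightly different results in general, which is why GCC only reassociates floating-point expressions when associative math is allowed (for example via -ffast-math, which -Ofast enables). With exactly representable inputs, such as a = {1, ..., 8} and b all 1.0, every width returns 36.0.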

* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: rvmallad at amazon dot com @ 2022-11-28  8:33 UTC
  To: gcc-bugs

--- Comment #13 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to CVS Commits from comment #12)
>     AArch64: Add fma_reassoc_width [PR107413]

Thank you for this code change/fix. I will attempt a run with this change.

* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: rvmallad at amazon dot com @ 2022-11-29  9:04 UTC
  To: gcc-bugs

--- Comment #14 from Rama Malladi <rvmallad at amazon dot com> ---
This fix also improved performance of 538.imagick_r by 15%. Did you have a
similar observation? Thank you.

* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: wilco at gcc dot gnu.org @ 2022-11-29 12:55 UTC
  To: gcc-bugs

--- Comment #15 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Rama Malladi from comment #14)
> This fix also improved performance of 538.imagick_r by 15%. Did you have a
> similar observation? Thank you.

No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible
-mcpu=neoverse-v1 shows larger speedups. What gain do you get on the overall FP
score?

* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: rvmallad at amazon dot com @ 2022-11-30  4:15 UTC
  To: gcc-bugs

--- Comment #16 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to Wilco from comment #15)
> No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible
> -mcpu=neoverse-v1 shows larger speedups. What gain do you get on the overall
> FP score?

I was using -mcpu=native and running on a Neoverse V1 core (Graviton3). Here
are the scores I got (relative gains of latest mainline vs. an earlier mainline).

Latest mainline: 0976b012d89e3d819d83cdaf0dab05925b3eb3a0
Earlier mainline: f896c13489d22b30d01257bc8316ab97b3359d1c

Benchmark       Ratio (latest vs. earlier mainline, fp 1-copy rate)
----------------------
503.bwaves_r    0.98
507.cactuBSSN_r 1.00
508.namd_r      0.97
510.parest_r    NA
511.povray_r    NA
519.lbm_r       1.16
521.wrf_r       1.00
526.blender_r   0.99
527.cam4_r      NA
538.imagick_r   1.17
544.nab_r       1.01
549.fotonik3d_r NA
554.roms_r      1.00
geomean         1.03

* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: wilco at gcc dot gnu.org @ 2022-12-01 13:13 UTC
  To: gcc-bugs

Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #17 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Rama Malladi from comment #16)
> Latest mainline: 0976b012d89e3d819d83cdaf0dab05925b3eb3a0
> Earlier mainline: f896c13489d22b30d01257bc8316ab97b3359d1c

Right, that's about 3 weeks of changes. I think
1b9a5cc9ec08e9f239dd2096edcc447b7a72f64a improved imagick_r.

> geomean         1.03

That's a nice gain in 3 weeks!

* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: rvmallad at amazon dot com @ 2022-12-01 16:33 UTC
  To: gcc-bugs

--- Comment #18 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to Wilco from comment #17)
> That's a nice gain in 3 weeks!

Yes, indeed :-) ... Thank you.

* [Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
From: rvmallad at amazon dot com @ 2022-12-02  2:30 UTC
  To: gcc-bugs

--- Comment #19 from Rama Malladi <rvmallad at amazon dot com> ---
(In reply to Wilco from comment #17)
> That's a nice gain in 3 weeks!

Hi Wilco, could you backport the change to active release branches? Thanks.
