[Bug tree-optimization/110279] New: Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug tree-optimization/110279] New: Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r)
@ 2023-06-16  8:32 dizhao at os dot amperecomputing.com
  2023-06-16  8:41 ` [Bug tree-optimization/110279] [14 Regression] " rguenth at gcc dot gnu.org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: dizhao at os dot amperecomputing.com @ 2023-06-16  8:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279

            Bug ID: 110279
           Summary: Regressions on aarch64 cause by handing FMA in reassoc
                    (510.parest_r, 508.namd_r)
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dizhao at os dot amperecomputing.com
  Target Milestone: ---

Created attachment 55339
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55339&action=edit
[PATCH] Check for nested FMA chains in reassoc

After testing the recent patch "Handle FMA friendly in reassoc pass" (e5405f06)
on some of our aarch64 machines, I found regressions in a few spec2017 fprate
cases.

On ampere1, the patch introduced approximately 2% regression in 510.parest_r.
Additionally, with fp_reassoc_width changed so that reassociation actually
works on floating points additions (which brings about 1% overall benefit),
there's approximately 5% regression in 508.namd_r on ampere1, and 2.6% on
neoverse-n1.

The compile options we used is "-Ofast -mcpu=ampere1 -flto=32 --param
avoid-fma-max-bits=512" for ampere1, and "-Ofast -mcpu=neoverse-n1 -flto=32"
for neoverse-n1. The tests are single copy run.

Below is from my investigations.

1) From perf result, the regression in 510.parest_r is because the re-arranging
in rank_ops_for_fma() produced 2 FMAs in a small loop, with the last FMA's
result fed back into first one from PHI. With avoid-fma-max-bits, these
candidates are dropped in widening_mul, causing 2% regression; without the
parameter there is 1% regression.

Before the patch, the generated code looks like:
        label:  ....
               fmul v2, v2, v3
               fmla v2, v4, v5
               fadd v1, v1, v2
               ...
               b.ne  label

After the patch (without avoid-fma-max-bits):
        label:  ...
               fmla v1, v2, v3
               fmla v1, v4, v5
               ...
               b.ne  label

2) As for 508.namd_r, there are slightly fewer FMAs generated. It seems the
patch is not handling FMAs like ((a * b + c) * d + e) *... well. For example,
below is a piece of CFG before reassoc2:

  _797 = A_788 * _796;
  fast_c = _797 + _1161;
  _815 = diffa * fast_d;
  _816 = fast_c + _815;
  _817 = diffa * _816;
  fast_dir = fast_b + _817;

Before the patch, optimized code looks like:

  fast_c = .FNMA (B_790, _798, _334);
  _816 = .FMA (diffa, fast_d, fast_c);
  fast_dir = .FMA (diffa, _816, fast_b);

After the patch:

  _815 = diffa * fast_d;
  _5910 = .FMA (A_788, _796, _815);
  _816 = _5909 + _5910;
  _817 = diffa * _816;
  _5908 = .FMA (A_788, _801, _817);
  fast_dir = _5907 + _5908;

I came out with a patch to solve this. I'll also attach here.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/110279] [14 Regression] Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r)
  2023-06-16  8:32 [Bug tree-optimization/110279] New: Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r) dizhao at os dot amperecomputing.com
@ 2023-06-16  8:41 ` rguenth at gcc dot gnu.org
  2023-08-09 18:09 ` dizhao at os dot amperecomputing.com
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-06-16  8:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Target Milestone|---                         |14.0
            Summary|Regressions on aarch64      |[14 Regression] Regressions
                   |cause by handing FMA in     |on aarch64 cause by handing
                   |reassoc (510.parest_r,      |FMA in reassoc
                   |508.namd_r)                 |(510.parest_r, 508.namd_r)
             Target|                            |aarch64

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/110279] [14 Regression] Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r)
  2023-06-16  8:32 [Bug tree-optimization/110279] New: Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r) dizhao at os dot amperecomputing.com
  2023-06-16  8:41 ` [Bug tree-optimization/110279] [14 Regression] " rguenth at gcc dot gnu.org
@ 2023-08-09 18:09 ` dizhao at os dot amperecomputing.com
  2023-11-23 12:57 ` cvs-commit at gcc dot gnu.org
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: dizhao at os dot amperecomputing.com @ 2023-08-09 18:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279

--- Comment #1 from Di Zhao <dizhao at os dot amperecomputing.com> ---
Here's a small example for the issue exposed in 508.namd_r:

  #define LOOP_COUNT 800000000
  typedef double data_e;
  #include <stdio.h>

  __attribute_noinline__ data_e
  foo (data_e a, data_e b, data_e c, data_e d, data_e x, data_e y)
  {
    data_e tmp1, tmp2;
    data_e result = 0;

    for (int ic = 0; ic < LOOP_COUNT; ic++)
      {
        /*  LHS is operator of another FMA, re-writing to parallel is worse. 
*/
        tmp1 = a + c * c - d * d + x * y;

        tmp2 = x * tmp1;
        result += (a + c - d + tmp2);

        a -= 0.1;
        b += 0.9;
        c *= 1.02;
        x *= 0.1;
        y *= y;
        d *= 0.61;
      }

    return result;
  }

  int
  main (int argc, char **argv)
  {
    printf ("%f\n", foo (-1.0, 0.01, 9.8, 1e2, -1.9, 0.2));
  }

Tested on the following platforms, rewriting all the two op list is worse than
no-rewriting or only rewriting "result" (compile option I used are "-Ofast
--param tree-reassoc-width=4 -march=native"):

run         no        rewrite    rewrite   rewrite
time        rewrite   "result"   "tmp1"    both
-----------------------------------------------
Ampere1     1.80       1.93       2.04     2.10
Neoverse-n1 1.36       1.45       1.49     1.52
Intel Xeon  1.57       1.55       1.66     1.62

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/110279] [14 Regression] Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r)
  2023-06-16  8:32 [Bug tree-optimization/110279] New: Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r) dizhao at os dot amperecomputing.com
  2023-06-16  8:41 ` [Bug tree-optimization/110279] [14 Regression] " rguenth at gcc dot gnu.org
  2023-08-09 18:09 ` dizhao at os dot amperecomputing.com
@ 2023-11-23 12:57 ` cvs-commit at gcc dot gnu.org
  2023-11-27 22:45 ` pinskia at gcc dot gnu.org
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-11-23 12:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Di Zhao <dzhao@gcc.gnu.org>:

https://gcc.gnu.org/g:746344dd53807d840c29f52adba10d0ab093bd3d

commit r14-5779-g746344dd53807d840c29f52adba10d0ab093bd3d
Author: Di Zhao <dizhao@os.amperecomputing.com>
Date:   Thu Nov 9 15:06:37 2023 +0800

    swap ops in reassoc to reduce cross backedge FMA

    Previously for ops.length >= 3, when FMA is present, we don't
    rank the operands so that more FMAs can be preserved. But this
    brings more FMAs with loop dependency, which lead to worse
    performance on some targets.

    Rank the oprands (set width=2) when:
    1. avoid_fma_max_bits is set.
    2. And loop dependent FMA sequence is found.

    In this way, we don't have to discard all the FMA candidates
    in the bad shaped sequence in widening_mul, instead we can keep
    fewer FMAs without loop dependency.

    With this patch, there's about 2% improvement in 510.parest_r
    1-copy run on ampere1 (with "-Ofast -mcpu=ampere1 -flto
    --param avoid-fma-max-bits=512").

    PR tree-optimization/110279

    gcc/ChangeLog:

            * tree-ssa-reassoc.cc (get_reassociation_width): check
            for loop dependent FMAs.
            (reassociate_bb): For 3 ops, refine the condition to call
            swap_ops_for_binary_stmt.

    gcc/testsuite/ChangeLog:

            * gcc.dg/pr110279-1.c: New test.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/110279] [14 Regression] Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r)
  2023-06-16  8:32 [Bug tree-optimization/110279] New: Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r) dizhao at os dot amperecomputing.com
                   ` (2 preceding siblings ...)
  2023-11-23 12:57 ` cvs-commit at gcc dot gnu.org
@ 2023-11-27 22:45 ` pinskia at gcc dot gnu.org
  2023-11-27 23:19 ` pinskia at gcc dot gnu.org
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-11-27 22:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to GCC Commits from comment #2)
>     gcc/testsuite/ChangeLog:
>     
>             * gcc.dg/pr110279-1.c: New test.

This testcase fails on x86_64-linux-gnu ....

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/110279] [14 Regression] Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r)
  2023-06-16  8:32 [Bug tree-optimization/110279] New: Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r) dizhao at os dot amperecomputing.com
                   ` (3 preceding siblings ...)
  2023-11-27 22:45 ` pinskia at gcc dot gnu.org
@ 2023-11-27 23:19 ` pinskia at gcc dot gnu.org
  2023-12-14 19:40 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-11-27 23:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #3)
> (In reply to GCC Commits from comment #2)
> >     gcc/testsuite/ChangeLog:
> >     
> >             * gcc.dg/pr110279-1.c: New test.
> 
> This testcase fails on x86_64-linux-gnu ....

Looks like it fails on all targets that don't have FMA by default ...

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/110279] [14 Regression] Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r)
  2023-06-16  8:32 [Bug tree-optimization/110279] New: Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r) dizhao at os dot amperecomputing.com
                   ` (4 preceding siblings ...)
  2023-11-27 23:19 ` pinskia at gcc dot gnu.org
@ 2023-12-14 19:40 ` cvs-commit at gcc dot gnu.org
  2023-12-19  1:18 ` sandra at gcc dot gnu.org
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-12-14 19:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279

--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Di Zhao <dzhao@gcc.gnu.org>:

https://gcc.gnu.org/g:8afdbcdd7abe1e3c7a81e07f34c256e7f2dbc652

commit r14-6559-g8afdbcdd7abe1e3c7a81e07f34c256e7f2dbc652
Author: Di Zhao <dizhao@os.amperecomputing.com>
Date:   Fri Dec 15 03:22:32 2023 +0800

    Consider fully pipelined FMA in get_reassociation_width

    Add a new parameter param_fully_pipelined_fma. If it is non-zero,
    reassociation considers the benefit of parallelizing FMA's
    multiplication part and addition part, assuming FMUL and FMA use the
    same units that can also do FADD.

    With the patch and new option, there's ~2% improvement in spec2017
    508.namd on AmpereOne. (The other options are "-Ofast -mcpu=ampere1
     -flto".)

            PR tree-optimization/110279

    gcc/ChangeLog:

            * doc/invoke.texi: New parameter fully-pipelined-fma.
            * params.opt: New parameter fully-pipelined-fma.
            * tree-ssa-reassoc.cc (get_mult_latency_consider_fma): Return
            the latency of MULT_EXPRs that can't be hidden by the FMAs.
            (get_reassociation_width): Search for a smaller width
            considering the benefit of fully pipelined FMA.
            (rank_ops_for_fma): Return the number of MULT_EXPRs.
            (reassociate_bb): Pass the number of MULT_EXPRs to
            get_reassociation_width; avoid calling
            get_reassociation_width twice.

    gcc/testsuite/ChangeLog:

            * gcc.dg/pr110279-2.c: New test.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/110279] [14 Regression] Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r)
  2023-06-16  8:32 [Bug tree-optimization/110279] New: Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r) dizhao at os dot amperecomputing.com
                   ` (5 preceding siblings ...)
  2023-12-14 19:40 ` cvs-commit at gcc dot gnu.org
@ 2023-12-19  1:18 ` sandra at gcc dot gnu.org
  2024-01-09  7:44 ` dizhao at os dot amperecomputing.com
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: sandra at gcc dot gnu.org @ 2023-12-19  1:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279

sandra at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sandra at gcc dot gnu.org

--- Comment #6 from sandra at gcc dot gnu.org ---
Both testcases are failing on nios2-elf.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/110279] [14 Regression] Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r)
  2023-06-16  8:32 [Bug tree-optimization/110279] New: Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r) dizhao at os dot amperecomputing.com
                   ` (6 preceding siblings ...)
  2023-12-19  1:18 ` sandra at gcc dot gnu.org
@ 2024-01-09  7:44 ` dizhao at os dot amperecomputing.com
  2024-03-10  3:40 ` law at gcc dot gnu.org
  2024-05-21 14:26 ` ro at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: dizhao at os dot amperecomputing.com @ 2024-01-09  7:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279

--- Comment #7 from Di Zhao <dizhao at os dot amperecomputing.com> ---
This patch is to fix the regression test failures:
"Fix compile options of pr110279-1.c and pr110279-2.c"
(https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=6cec7b06b3c8187b36fc05cfd4dd38b42313d727)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/110279] [14 Regression] Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r)
  2023-06-16  8:32 [Bug tree-optimization/110279] New: Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r) dizhao at os dot amperecomputing.com
                   ` (7 preceding siblings ...)
  2024-01-09  7:44 ` dizhao at os dot amperecomputing.com
@ 2024-03-10  3:40 ` law at gcc dot gnu.org
  2024-05-21 14:26 ` ro at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: law at gcc dot gnu.org @ 2024-03-10  3:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |law at gcc dot gnu.org
         Resolution|---                         |FIXED

--- Comment #8 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Appears to be fixed on the trunk.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/110279] [14 Regression] Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r)
  2023-06-16  8:32 [Bug tree-optimization/110279] New: Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r) dizhao at os dot amperecomputing.com
                   ` (8 preceding siblings ...)
  2024-03-10  3:40 ` law at gcc dot gnu.org
@ 2024-05-21 14:26 ` ro at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: ro at gcc dot gnu.org @ 2024-05-21 14:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279

Rainer Orth <ro at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|FIXED                       |---
                 CC|                            |ro at gcc dot gnu.org
   Last reconfirmed|                            |2024-05-21
             Status|RESOLVED                    |REOPENED
     Ever confirmed|0                           |1

--- Comment #9 from Rainer Orth <ro at gcc dot gnu.org> ---
Not at all, the test still FAILs on sparc-sun-solaris2.11,
arm-unknown-linux-gnueabihf, pru-unknown-elf, m68k-unknown-linux-gnu.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-05-21 14:26 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-16  8:32 [Bug tree-optimization/110279] New: Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r) dizhao at os dot amperecomputing.com
2023-06-16  8:41 ` [Bug tree-optimization/110279] [14 Regression] " rguenth at gcc dot gnu.org
2023-08-09 18:09 ` dizhao at os dot amperecomputing.com
2023-11-23 12:57 ` cvs-commit at gcc dot gnu.org
2023-11-27 22:45 ` pinskia at gcc dot gnu.org
2023-11-27 23:19 ` pinskia at gcc dot gnu.org
2023-12-14 19:40 ` cvs-commit at gcc dot gnu.org
2023-12-19  1:18 ` sandra at gcc dot gnu.org
2024-01-09  7:44 ` dizhao at os dot amperecomputing.com
2024-03-10  3:40 ` law at gcc dot gnu.org
2024-05-21 14:26 ` ro at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).