public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/110279] New: Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r)
@ 2023-06-16  8:32 dizhao at os dot amperecomputing.com
  2023-06-16  8:41 ` [Bug tree-optimization/110279] [14 Regression] " rguenth at gcc dot gnu.org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: dizhao at os dot amperecomputing.com @ 2023-06-16  8:32 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279

            Bug ID: 110279
           Summary: Regressions on aarch64 cause by handing FMA in reassoc
                    (510.parest_r, 508.namd_r)
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dizhao at os dot amperecomputing.com
  Target Milestone: ---

Created attachment 55339
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55339&action=edit
[PATCH] Check for nested FMA chains in reassoc

After testing the recent patch "Handle FMA friendly in reassoc pass" (e5405f06)
on some of our aarch64 machines, I found regressions in a few spec2017 fprate
cases.

On ampere1, the patch introduced a regression of approximately 2% in
510.parest_r. Additionally, with fp_reassoc_width changed so that
reassociation actually works on floating-point additions (which brings an
overall benefit of about 1%), there is a regression of approximately 5% in
508.namd_r on ampere1, and 2.6% on neoverse-n1.

The compile options we used are "-Ofast -mcpu=ampere1 -flto=32 --param
avoid-fma-max-bits=512" for ampere1, and "-Ofast -mcpu=neoverse-n1 -flto=32"
for neoverse-n1. The tests are single-copy runs.

Below are the findings from my investigation.

1) From the perf results, the regression in 510.parest_r is because the
re-arranging in rank_ops_for_fma() produced 2 FMAs in a small loop, with the
last FMA's result fed back into the first one through a PHI. With
avoid-fma-max-bits, these candidates are dropped in widening_mul, causing a 2%
regression; without the parameter there is a 1% regression.

Before the patch, the generated code looks like:
        label:  ....
               fmul v2, v2, v3
               fmla v2, v4, v5
               fadd v1, v1, v2
               ...
               b.ne  label

After the patch (without avoid-fma-max-bits):
        label:  ...
               fmla v1, v2, v3
               fmla v1, v4, v5
               ...
               b.ne  label
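
To make the difference between the two loop shapes easier to see, here is a
minimal C sketch. It is an assumption-laden illustration, not the parest
source: the function names, the array parameters and the scalar types are all
hypothetical, and which instructions are actually emitted depends on the
compiler and its options. In the first function only the final addition
depends on the previous value of acc; in the second, both multiply-adds read
acc, so they sit back to back on the loop-carried dependency.

```c
#include <stddef.h>

/* Hypothetical shape matching the "before" listing: the two products are
   combined first, so only one addition per iteration depends on acc. */
double sum_short_chain(const double *a, const double *b,
                       const double *c, const double *d, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        double t = a[i] * b[i];   /* multiply, independent of acc */
        t = c[i] * d[i] + t;      /* FMA candidate, independent of acc */
        acc = acc + t;            /* only operation on the acc recurrence */
    }
    return acc;
}

/* Hypothetical shape matching the "after" listing: both multiply-adds
   take acc as their addend, so each one waits for the previous one. */
double sum_long_chain(const double *a, const double *b,
                      const double *c, const double *d, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc = a[i] * b[i] + acc;  /* FMA candidate, reads acc */
        acc = c[i] * d[i] + acc;  /* FMA candidate, reads the result above */
    }
    return acc;
}
```

Both functions compute the same sum when every intermediate value is exactly
representable; they differ only in how the additions are associated, which is
what reassociation is allowed to change under -Ofast.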

2) As for 508.namd_r, there are slightly fewer FMAs generated. It seems the
patch does not handle nested FMA chains like ((a * b + c) * d + e) *... well.
For example, below is a piece of the IR before reassoc2:

  _797 = A_788 * _796;
  fast_c = _797 + _1161;
  _815 = diffa * fast_d;
  _816 = fast_c + _815;
  _817 = diffa * _816;
  fast_dir = fast_b + _817;

Before the patch, optimized code looks like:

  fast_c = .FNMA (B_790, _798, _334);
  _816 = .FMA (diffa, fast_d, fast_c);
  fast_dir = .FMA (diffa, _816, fast_b);

After the patch:

  _815 = diffa * fast_d;
  _5910 = .FMA (A_788, _796, _815);
  _816 = _5909 + _5910;
  _817 = diffa * _816;
  _5908 = .FMA (A_788, _801, _817);
  fast_dir = _5907 + _5908;
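
For readers who prefer source-level code, the nested pattern in the listing
taken before reassoc2 can be sketched in C as follows. This is only an
illustration under stated assumptions: the function name is made up, and the
parameters x and y are hypothetical stand-ins for the SSA names _796 and
_1161, whose definitions are not shown here. Each addition has exactly one
multiplication feeding it, and each sum is then multiplied again, so the
expression maps naturally onto a chain of three multiply-adds.

```c
/* Hypothetical C rendering of the pre-reassoc2 listing. */
double nested_chain(double A, double x, double y,
                    double diffa, double fast_d, double fast_b)
{
    double fast_c = A * x + y;               /* _797 = A * x; fast_c = _797 + y */
    double t = fast_c + diffa * fast_d;      /* _816 = fast_c + _815 */
    return fast_b + diffa * t;               /* fast_dir = fast_b + _817 */
}
```

For instance, nested_chain(2, 3, 1, 2, 4, 1) evaluates fast_c = 7, then
t = 7 + 8 = 15, then 1 + 2 * 15 = 31.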

I came up with a patch to address this, which I have attached here.

Thread overview: 11+ messages
2023-06-16  8:32 [Bug tree-optimization/110279] New: Regressions on aarch64 cause by handing FMA in reassoc (510.parest_r, 508.namd_r) dizhao at os dot amperecomputing.com
2023-06-16  8:41 ` [Bug tree-optimization/110279] [14 Regression] " rguenth at gcc dot gnu.org
2023-08-09 18:09 ` dizhao at os dot amperecomputing.com
2023-11-23 12:57 ` cvs-commit at gcc dot gnu.org
2023-11-27 22:45 ` pinskia at gcc dot gnu.org
2023-11-27 23:19 ` pinskia at gcc dot gnu.org
2023-12-14 19:40 ` cvs-commit at gcc dot gnu.org
2023-12-19  1:18 ` sandra at gcc dot gnu.org
2024-01-09  7:44 ` dizhao at os dot amperecomputing.com
2024-03-10  3:40 ` law at gcc dot gnu.org
2024-05-21 14:26 ` ro at gcc dot gnu.org