public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/98350] New: Reassociation breaks FMA chains
@ 2020-12-17 15:24 ktkachov at gcc dot gnu.org
2021-01-05 8:31 ` [Bug tree-optimization/98350] " rguenth at gcc dot gnu.org
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: ktkachov at gcc dot gnu.org @ 2020-12-17 15:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98350
Bug ID: 98350
Summary: Reassociation breaks FMA chains
Product: gcc
Version: unknown
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
Target Milestone: ---
Consider the testcase:
#define N 1024
double a[N];
double b[N];
double c[N];
double d[N];
double e[N];
double f[N];
double g[N];
double h[N];
double j[N];
double k[N];
double l[N];
double m[N];
double o[N];
double p[N];
void
foo (void)
{
  for (int i = 0; i < N; i++)
    {
      a[i] += b[i] * c[i] + d[i] * e[i] + f[i] * g[i] + h[i] * j[i] + k[i] * l[i]
              + m[i] * o[i] + p[i];
    }
}
For -Ofast --param=tree-reassoc-width=1 GCC generates the loop:
.L2:
ldr q1, [x1, x0]
ldr q0, [x12, x0]
ldr q3, [x14, x0]
fadd v0.2d, v0.2d, v1.2d
ldr q1, [x13, x0]
ldr q2, [x11, x0]
fmla v0.2d, v3.2d, v1.2d
ldr q1, [x10, x0]
ldr q3, [x9, x0]
fmla v0.2d, v2.2d, v1.2d
ldr q1, [x8, x0]
ldr q2, [x7, x0]
fmla v0.2d, v3.2d, v1.2d
ldr q1, [x6, x0]
ldr q3, [x5, x0]
fmla v0.2d, v2.2d, v1.2d
ldr q1, [x4, x0]
ldr q2, [x3, x0]
fmla v0.2d, v3.2d, v1.2d
ldr q1, [x2, x0]
fmla v0.2d, v2.2d, v1.2d
str q0, [x1, x0]
add x0, x0, 16
cmp x0, 8192
bne .L2
With --param=tree-reassoc-width=4 it generates:
.L2:
ldr q5, [x11, x0]
ldr q4, [x7, x0]
ldr q0, [x3, x0]
ldr q3, [x12, x0]
ldr q1, [x8, x0]
ldr q2, [x4, x0]
fmul v3.2d, v3.2d, v5.2d
fmul v1.2d, v1.2d, v4.2d
fmul v2.2d, v2.2d, v0.2d
ldr q16, [x1, x0]
ldr q18, [x14, x0]
ldr q17, [x13, x0]
ldr q0, [x2, x0]
ldr q7, [x10, x0]
ldr q6, [x9, x0]
ldr q5, [x6, x0]
ldr q4, [x5, x0]
fmla v3.2d, v18.2d, v17.2d
fadd v0.2d, v0.2d, v16.2d
fmla v1.2d, v7.2d, v6.2d
fmla v2.2d, v5.2d, v4.2d
fadd v0.2d, v0.2d, v3.2d
fadd v1.2d, v1.2d, v2.2d
fadd v0.2d, v0.2d, v1.2d
str q0, [x1, x0]
add x0, x0, 16
cmp x0, 8192
bne .L2
The reassociation is evident. The problem here is that the fmla chains are
something we'd want to preserve.
Is there a way we can get the reassoc pass to handle FMAs more intelligently?
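To make the difference between the two listings concrete, here is a minimal scalar C sketch of the two association shapes. It is only an illustration: the function names and the x/y array layout are hypothetical, each `+= x * y` step stands for one fmla candidate, and only the operation counts and the shape of the tree are read off the assembly above. Which products end up paired in the real output is not something the sketch tries to reproduce.

```c
#include <assert.h>

/* Hypothetical layout: x[i] * y[i] are the six products of the testcase
   (b*c, d*e, f*g, h*j, k*l, m*o); a and p are the plain addends.  */

/* Shape of the width=1 loop: one add, then six dependent multiply-adds.
   7 floating-point operations, all on a single dependency chain.  */
double
serial_chain (double a, double p, const double x[6], const double y[6])
{
  double acc = a + p;
  for (int i = 0; i < 6; i++)
    acc += x[i] * y[i];   /* fmla candidate */
  return acc;
}

/* Shape of the width=4 loop: three plain multiplies, three multiply-adds
   and four adds.  10 operations, but a shorter critical path.  */
double
width4_tree (double a, double p, const double x[6], const double y[6])
{
  double t0 = a + p;         /* fadd */
  double t1 = x[1] * y[1];   /* fmul */
  double t2 = x[3] * y[3];   /* fmul */
  double t3 = x[5] * y[5];   /* fmul */
  t1 += x[0] * y[0];         /* fmla candidate */
  t2 += x[2] * y[2];         /* fmla candidate */
  t3 += x[4] * y[4];         /* fmla candidate */
  return (t0 + t1) + (t2 + t3);   /* three more fadd */
}

/* Self-check on small integer inputs, where every intermediate value is
   exact and both shapes must therefore agree.  */
void
check_shapes_agree (void)
{
  const double x[6] = { 1, 2, 3, 4, 5, 6 };
  const double y[6] = { 2, 3, 4, 5, 6, 7 };
  assert (serial_chain (1.0, 2.0, x, y) == 115.0);
  assert (width4_tree (1.0, 2.0, x, y) == 115.0);
}
```

The serial form needs 7 operations against 10 for the tree. The tree is only a win if the shorter dependency chain outweighs the three multiplies that no longer fuse, which is the trade-off this report is about.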
* [Bug tree-optimization/98350] Reassociation breaks FMA chains
From: rguenth at gcc dot gnu.org @ 2021-01-05 8:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98350
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Version|unknown |11.0
Ever confirmed|0 |1
Last reconfirmed| |2021-01-05
Status|UNCONFIRMED |NEW
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
There is no built-in way, and yes, reassoc-width is known to have this effect.
What could be done is move/duplicate FMA discovery from
pass_optimize_widening_mul to reassoc(*). The simplistic idea would be to
perform a separate FMA detection on the OPS array.
The question is how to handle imperfect chains where reassoc would order
after rank, like
a[i] += b[i]* c[i] + d[i] + f[i] * g[i] + h[i] + k[i] * l[i] + m[i] + p[i];
and also how to not "break" the special heuristics the current FMA formation
pass has. Alternatively altering rewrite_expr_tree_parallel only to avoid
splitting FMA chains in unwanted ways would be possible.
(*) Note: since reassoc doesn't handle signed integer arithmetic, it cannot
fully replace late FMA detection.
* [Bug tree-optimization/98350] Reassociation breaks FMA chains
From: pinskia at gcc dot gnu.org @ 2021-09-06 18:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98350
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |acsawdey at gcc dot gnu.org
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 70912 has been marked as a duplicate of this bug. ***
* [Bug tree-optimization/98350] Reassociation breaks FMA chains
From: dizhao at os dot amperecomputing.com @ 2023-03-23 4:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98350
Di Zhao <dizhao at os dot amperecomputing.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |dizhao at os dot amperecomputing.com
--- Comment #3 from Di Zhao <dizhao at os dot amperecomputing.com> ---
Created attachment 54735
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54735&action=edit
move-FLOAT_MODE_P-ahead-to-insert-more-FMAs
* [Bug tree-optimization/98350] Reassociation breaks FMA chains
From: dizhao at os dot amperecomputing.com @ 2023-03-23 4:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98350
--- Comment #4 from Di Zhao <dizhao at os dot amperecomputing.com> ---
I've found the same problem with gcc-12 and gcc-13 (trunk).
By improving the workaround in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84114, more FMAs can be inserted
for vector mode. For the testcase in this tracker, 6 "fmla" can be generated
with attachment 54735. The compile option I used is "-Ofast -mcpu=neoverse-n1".
* [Bug tree-optimization/98350] Reassociation breaks FMA chains
From: pinskia at gcc dot gnu.org @ 2023-05-19 7:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98350
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |kyukhin at gcc dot gnu.org
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 70479 has been marked as a duplicate of this bug. ***
* [Bug tree-optimization/98350] Reassociation breaks FMA chains
From: cvs-commit at gcc dot gnu.org @ 2023-05-30 6:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98350
--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Lili Cui <cuilili@gcc.gnu.org>:
https://gcc.gnu.org/g:e5405f065bace0685cb3b8878d1dfc7a6e7ef409
commit r14-1371-ge5405f065bace0685cb3b8878d1dfc7a6e7ef409
Author: Lili Cui <lili.cui@intel.com>
Date: Tue May 30 05:47:47 2023 +0000
Handle FMA friendly in reassoc pass
Make some changes in the reassoc pass to make it more friendly to the later
fma pass.
Using FMA instead of mult + add reduces register pressure and the number of
instructions retired.
There are mainly two changes:
1. Put no-mult ops and mult ops alternately at the end of the queue, which is
conducive to generating more fma and reducing the loss of FMA when breaking
the chain.
2. Rewrite the rewrite_expr_tree_parallel function to try to build parallel
chains according to the given correlation width, keeping the FMA chance as
much as possible.
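As a rough picture of what change 2 aims for, the following scalar C sketch builds two parallel chains in which every multiply can still fuse with an add. It mirrors the shape of the width = 2 output shown for TEST2 further down in this message. The function names and the x/y layout are made up for illustration, and this is not the code in tree-ssa-reassoc.cc.

```c
#include <assert.h>

/* Hypothetical layout: x[i] * y[i] are the six products of TEST2
   (b*c, d*e, f*g, h*j, k*l, m*o); a and p are the plain addends.  */

/* Two chains, each seeded with a non-multiply operand so that every
   product can be folded into a multiply-add: six FMA candidates and
   a single final add.  */
double
fma_chains_width2 (double a, double p, const double x[6], const double y[6])
{
  double chain0 = a;
  double chain1 = p;
  for (int i = 0; i < 6; i += 2)
    {
      chain0 += x[i] * y[i];           /* FMA candidate on chain 0 */
      chain1 += x[i + 1] * y[i + 1];   /* FMA candidate on chain 1 */
    }
  return chain0 + chain1;              /* one add joins the chains */
}

/* Self-check on small integer inputs, where every intermediate value is
   exact: 1 + 2 + (2 + 6 + 12 + 20 + 30 + 42) = 115.  */
void
check_fma_chains_width2 (void)
{
  const double x[6] = { 1, 2, 3, 4, 5, 6 };
  const double y[6] = { 2, 3, 4, 5, 6, 7 };
  assert (fma_chains_width2 (1.0, 2.0, x, y) == 115.0);
}
```

Seeding each chain with one of the plain addends is what lets all six products fuse; if a chain started from a bare product instead, that product would need its own multiply, as in the unpatched output.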
With the patch applied
On ICX:
507.cactuBSSN_r: Improved by 1.7% for multi-copy.
503.bwaves_r   : Improved by 0.60% for single copy.
507.cactuBSSN_r: Improved by 1.10% for single copy.
519.lbm_r      : Improved by 2.21% for single copy.
No measurable changes for other benchmarks.
On aarch64:
507.cactuBSSN_r: Improved by 1.7% for multi-copy.
503.bwaves_r   : Improved by 6.00% for single copy.
No measurable changes for other benchmarks.
TEST1:
float
foo (float a, float b, float c, float d, float *e)
{
  return *e + a * b + c * d;
}
For "-Ofast -mfpmath=sse -mfma" GCC generates:
vmulss %xmm3, %xmm2, %xmm2
vfmadd132ss %xmm1, %xmm2, %xmm0
vaddss (%rdi), %xmm0, %xmm0
ret
With this patch GCC generates:
vfmadd213ss (%rdi), %xmm1, %xmm0
vfmadd231ss %xmm2, %xmm3, %xmm0
ret
TEST2:
for (int i = 0; i < N; i++)
  {
    a[i] += b[i] * c[i] + d[i] * e[i] + f[i] * g[i] + h[i] * j[i]
            + k[i] * l[i] + m[i] * o[i] + p[i];
  }
For "-Ofast -mfpmath=sse -mfma" GCC generates:
vmovapd e(%rax), %ymm4
vmulpd d(%rax), %ymm4, %ymm3
addq $32, %rax
vmovapd c-32(%rax), %ymm5
vmovapd j-32(%rax), %ymm6
vmulpd h-32(%rax), %ymm6, %ymm2
vmovapd a-32(%rax), %ymm6
vaddpd p-32(%rax), %ymm6, %ymm0
vmovapd g-32(%rax), %ymm7
vfmadd231pd b-32(%rax), %ymm5, %ymm3
vmovapd o-32(%rax), %ymm4
vmulpd m-32(%rax), %ymm4, %ymm1
vmovapd l-32(%rax), %ymm5
vfmadd231pd f-32(%rax), %ymm7, %ymm2
vfmadd231pd k-32(%rax), %ymm5, %ymm1
vaddpd %ymm3, %ymm0, %ymm0
vaddpd %ymm2, %ymm0, %ymm0
vaddpd %ymm1, %ymm0, %ymm0
vmovapd %ymm0, a-32(%rax)
cmpq $8192, %rax
jne .L4
vzeroupper
ret
With this patch applied, GCC breaks the chain with width = 2 and generates 6
FMAs:
vmovapd a(%rax), %ymm2
vmovapd c(%rax), %ymm0
addq $32, %rax
vmovapd e-32(%rax), %ymm1
vmovapd p-32(%rax), %ymm5
vmovapd g-32(%rax), %ymm3
vmovapd j-32(%rax), %ymm6
vmovapd l-32(%rax), %ymm4
vmovapd o-32(%rax), %ymm7
vfmadd132pd b-32(%rax), %ymm2, %ymm0
vfmadd132pd d-32(%rax), %ymm5, %ymm1
vfmadd231pd f-32(%rax), %ymm3, %ymm0
vfmadd231pd h-32(%rax), %ymm6, %ymm1
vfmadd231pd k-32(%rax), %ymm4, %ymm0
vfmadd231pd m-32(%rax), %ymm7, %ymm1
vaddpd %ymm1, %ymm0, %ymm0
vmovapd %ymm0, a-32(%rax)
cmpq $8192, %rax
jne .L2
vzeroupper
ret
gcc/ChangeLog:
PR tree-optimization/98350
* tree-ssa-reassoc.cc
(rewrite_expr_tree_parallel): Rewrite this function.
(rank_ops_for_fma): New.
(reassociate_bb): Handle new function.
gcc/testsuite/ChangeLog:
PR tree-optimization/98350
* gcc.dg/pr98350-1.c: New test.
* gcc.dg/pr98350-2.c: Ditto.