The updated fix is attached. Is it OK for trunk?

Tested on aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu.

Thanks,
Di Zhao

> -----Original Message-----
> From: Di Zhao OS
> Sent: Sunday, December 17, 2023 8:31 PM
> To: Thomas Schwinge; gcc-patches@gcc.gnu.org
> Cc: Richard Biener
> Subject: RE: [PATCH v4] [tree-optimization/110279] Consider FMA in
> get_reassociation_width
>
> Hello Thomas,
>
> > -----Original Message-----
> > From: Thomas Schwinge
> > Sent: Friday, December 15, 2023 5:46 PM
> > To: Di Zhao OS; gcc-patches@gcc.gnu.org
> > Cc: Richard Biener
> > Subject: RE: [PATCH v4] [tree-optimization/110279] Consider FMA in
> > get_reassociation_width
> >
> > Hi!
> >
> > On 2023-12-13T08:14:28+0000, Di Zhao OS wrote:
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/pr110279-2.c
> > > @@ -0,0 +1,41 @@
> > > +/* PR tree-optimization/110279 */
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-Ofast --param tree-reassoc-width=4 --param fully-pipelined-fma=1 -fdump-tree-reassoc2-details -fdump-tree-optimized" } */
> > > +/* { dg-additional-options "-march=armv8.2-a" { target aarch64-*-* } } */
> > > +
> > > +#define LOOP_COUNT 800000000
> > > +typedef double data_e;
> > > +
> > > +#include
> > > +
> > > +__attribute_noinline__ data_e
> > > +foo (data_e in)
> >
> > Pushed to master branch commit 91e9e8faea4086b3b8aef2355fc12c1559d425f6
> > "Fix 'gcc.dg/pr110279-2.c' syntax error due to '__attribute_noinline__'",
> > see attached.
> >
> > However:
> >
> > > +{
> > > +  data_e a1, a2, a3, a4;
> > > +  data_e tmp, result = 0;
> > > +  a1 = in + 0.1;
> > > +  a2 = in * 0.1;
> > > +  a3 = in + 0.01;
> > > +  a4 = in * 0.59;
> > > +
> > > +  data_e result2 = 0;
> > > +
> > > +  for (int ic = 0; ic < LOOP_COUNT; ic++)
> > > +    {
> > > +      /* Test that a complete FMA chain with length=4 is not broken.  */
> > > +      tmp = a1 + a2 * a2 + a3 * a3 + a4 * a4 ;
> > > +      result += tmp - ic;
> > > +      result2 = result2 / 2 - tmp;
> > > +
> > > +      a1 += 0.91;
> > > +      a2 += 0.1;
> > > +      a3 -= 0.01;
> > > +      a4 -= 0.89;
> > > +
> > > +    }
> > > +
> > > +  return result + result2;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump-not "was chosen for reassociation" "reassoc2"} } */
> > > +/* { dg-final { scan-tree-dump-times {\.FMA } 3 "optimized"} } */
>
> Thank you for the fix.
>
> > ..., I still see these latter two tree dump scans FAIL, for GCN:
> >
> >     $ grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
> >     2 *: a3_40
> >     2 *: a2_39
> >     Width = 4 was chosen for reassociation
> >     Transforming _15 = powmult_1 + powmult_3;
> >      into _63 = powmult_1 + a1_38;
> >     $ grep -F .FMA pr110279-2.c.265t.optimized
> >       _63 = .FMA (a2_39, a2_39, a1_38);
> >       _64 = .FMA (a3_40, a3_40, powmult_5);
> >
> > ..., nvptx:
> >
> >     $ grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
> >     2 *: a3_40
> >     2 *: a2_39
> >     Width = 4 was chosen for reassociation
> >     Transforming _15 = powmult_1 + powmult_3;
> >      into _63 = powmult_1 + a1_38;
> >     $ grep -F .FMA pr110279-2.c.265t.optimized
> >       _63 = .FMA (a2_39, a2_39, a1_38);
> >       _64 = .FMA (a3_40, a3_40, powmult_5);
>
> For these 2 targets, the reassoc_width for FMUL is 1 (the default value),
> while the testcase assumes it to be 4. The bug was introduced when I
> updated the patch but forgot to update the testcase.
>
> > ..., but also x86_64-pc-linux-gnu:
> >
> >     $ grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
> >     2 *: a3_40
> >     2 *: a2_39
> >     Width = 2 was chosen for reassociation
> >     Transforming _15 = powmult_1 + powmult_3;
> >      into _63 = powmult_1 + powmult_3;
> >     $ grep -cF .FMA pr110279-2.c.265t.optimized
> >     0
>
> For x86_64 this needs "-mfma". Sorry, the compile options missed that.
> Can the change below fix these issues?
> I moved them into testsuite/gcc.target/aarch64, since they rely on tunings.
>
> Tested on aarch64-unknown-linux-gnu.
>
> >
> > Regards
> >  Thomas
> >
> >
> > -----------------
> > Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201,
> > 80634 München; limited liability company (GmbH); Managing Directors:
> > Thomas Heurung, Frank Thürauf; Registered office: München; Register
> > court: München, HRB 106955
>
> Thanks,
> Di Zhao
>
> ---
>  gcc/testsuite/{gcc.dg => gcc.target/aarch64}/pr110279-1.c | 3 +--
>  gcc/testsuite/{gcc.dg => gcc.target/aarch64}/pr110279-2.c | 3 +--
>  2 files changed, 2 insertions(+), 4 deletions(-)
>  rename gcc/testsuite/{gcc.dg => gcc.target/aarch64}/pr110279-1.c (83%)
>  rename gcc/testsuite/{gcc.dg => gcc.target/aarch64}/pr110279-2.c (78%)
>
> diff --git a/gcc/testsuite/gcc.dg/pr110279-1.c b/gcc/testsuite/gcc.target/aarch64/pr110279-1.c
> similarity index 83%
> rename from gcc/testsuite/gcc.dg/pr110279-1.c
> rename to gcc/testsuite/gcc.target/aarch64/pr110279-1.c
> index f25b6aec967..97d693f56a5 100644
> --- a/gcc/testsuite/gcc.dg/pr110279-1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr110279-1.c
> @@ -1,6 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-Ofast --param avoid-fma-max-bits=512 --param tree-reassoc-width=4 -fdump-tree-widening_mul-details" } */
> -/* { dg-additional-options "-march=armv8.2-a" { target aarch64-*-* } } */
> +/* { dg-options "-Ofast -mcpu=generic --param avoid-fma-max-bits=512 --param tree-reassoc-width=4 -fdump-tree-widening_mul-details" } */
>
>  #define LOOP_COUNT 800000000
>  typedef double data_e;
> diff --git a/gcc/testsuite/gcc.dg/pr110279-2.c b/gcc/testsuite/gcc.target/aarch64/pr110279-2.c
> similarity index 78%
> rename from gcc/testsuite/gcc.dg/pr110279-2.c
> rename to gcc/testsuite/gcc.target/aarch64/pr110279-2.c
> index b6b69969c6b..a88cb361fdc 100644
> --- a/gcc/testsuite/gcc.dg/pr110279-2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr110279-2.c
> @@ -1,7 +1,6 @@
>  /* PR tree-optimization/110279 */
>  /* { dg-do compile } */
> -/* { dg-options "-Ofast --param tree-reassoc-width=4 --param fully-pipelined-fma=1 -fdump-tree-reassoc2-details -fdump-tree-optimized" } */
> -/* { dg-additional-options "-march=armv8.2-a" { target aarch64-*-* } } */
> +/* { dg-options "-Ofast -mcpu=generic --param tree-reassoc-width=4 --param fully-pipelined-fma=1 -fdump-tree-reassoc2-details -fdump-tree-optimized" } */
>
>  #define LOOP_COUNT 800000000
>  typedef double data_e;
> --
> 2.25.1
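For reference (not part of the patch): the loop body in pr110279-2.c computes tmp = a1 + a2 * a2 + a3 * a3 + a4 * a4, and the test expects this to remain one serial chain that ends up as three .FMA statements in the optimized dump. The C sketch below is a hand-written illustration, not GCC's GIMPLE or its reassociation code. It contrasts that serial shape with a width-2 split resembling the GCN/nvptx dumps quoted in the thread, where only two .FMA statements appear. Assumptions: the helper names chain_serial and chain_width2 are hypothetical; each "x * x + acc" statement is simply the multiply-add pattern that the widening_mul pass may turn into .FMA when the target supports FMA; and pairing a3 * a3 with a4 * a4 assumes that powmult_5 in the dump holds a4 * a4.

```c
/* Hypothetical helpers illustrating two association shapes of
   a1 + a2*a2 + a3*a3 + a4*a4.  Each "x * x + acc" statement is a
   multiply-add that a compiler may contract into a single FMA.  */

/* Serial shape: every step depends on the previous one, giving a
   single dependency chain of three multiply-adds.  */
double
chain_serial (double a1, double a2, double a3, double a4)
{
  double acc = a2 * a2 + a1;  /* step 1 */
  acc = a3 * a3 + acc;        /* step 2: waits for step 1 */
  acc = a4 * a4 + acc;        /* step 3: waits for step 2 */
  return acc;
}

/* Width-2 shape: two independent multiply-adds that can execute in
   parallel, joined by a plain addition.  Assumes powmult_5 in the
   quoted dump corresponds to a4 * a4.  */
double
chain_width2 (double a1, double a2, double a3, double a4)
{
  double p = a2 * a2 + a1;       /* like _63 = .FMA (a2, a2, a1) */
  double q = a3 * a3 + a4 * a4;  /* like _64 = .FMA (a3, a3, powmult_5) */
  return p + q;                  /* final addition, not an FMA */
}
```

In exact arithmetic both functions return the same sum (for example 30 for inputs 1, 2, 3, 4). With floating-point rounding, however, the two association orders can give different results in general. This is why GCC only reassociates such expressions under options such as -Ofast, which enables -fassociative-math.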