From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hongtao Liu
Date: Tue, 6 Jul 2021 16:29:49 +0800
Subject: Re: [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs
To: Richard Biener
Cc: GCC Patches, "Liu, Hongtao"
On Tue, Jul 6, 2021 at 3:42 PM Richard Biener wrote:
>
> On Tue, 6 Jul 2021, Hongtao Liu wrote:
>
> > On Mon, Jul 5, 2021 at 10:09 PM Richard Biener wrote:
> > >
> > > This adds named expanders for vec_fmaddsub<mode>4 and
> > > vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and
> > > vfmsubaddXXXp{ds} instructions.  This complements the previous
> > > addition of ADDSUB support.
> > >
> > > x86 lacks SUBADD and the negate variants of FMA with mixed
> > > plus minus, so I did not add optabs or patterns for those, but
> > > it would not be difficult if there's a target that has them.
> > > Maybe one of the complex fma patterns matches those variants?
> > >
> > > I did not dare to rewrite the numerous patterns to the new
> > > canonical name but instead added two new expanders.  Note I
> > > did not cover AVX512 since the existing patterns are separated
> > > and I have no easy way to test things there.  Handling AVX512
> > > should be easy as a followup though.
> > >
> > > Bootstrap and testing on x86_64-unknown-linux-gnu in progress.
> > >
> > > Any comments?
> > >
> > > Thanks,
> > > Richard.
> > >
> > > 2021-07-05  Richard Biener
> > >
> > >         * doc/md.texi (vec_fmaddsub<mode>4): Document.
> > >         (vec_fmsubadd<mode>4): Likewise.
> > >         * optabs.def (vec_fmaddsub$a4): Add.
> > >         (vec_fmsubadd$a4): Likewise.
> > >         * internal-fn.def (IFN_VEC_FMADDSUB): Add.
> > >         (IFN_VEC_FMSUBADD): Likewise.
> > >         * tree-vect-slp-patterns.c (addsub_pattern::recognize):
> > >         Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD.
> > >         (addsub_pattern::build): Likewise.
> > >         * tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB
> > >         and CFN_VEC_FMSUBADD are not transparent for permutes.
> > >         * config/i386/sse.md (vec_fmaddsub<mode>4): New expander.
> > >         (vec_fmsubadd<mode>4): Likewise.
> > >
> > >         * gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase.
> > >         * gcc.target/i386/vect-fmaddsubXXXps.c: Likewise.
> > >         * gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise.
> > >         * gcc.target/i386/vect-fmsubaddXXXps.c: Likewise.
> > > ---
> > >  gcc/config/i386/sse.md                        |  19 ++
> > >  gcc/doc/md.texi                               |  14 ++
> > >  gcc/internal-fn.def                           |   3 +-
> > >  gcc/optabs.def                                |   2 +
> > >  .../gcc.target/i386/vect-fmaddsubXXXpd.c      |  34 ++++
> > >  .../gcc.target/i386/vect-fmaddsubXXXps.c      |  34 ++++
> > >  .../gcc.target/i386/vect-fmsubaddXXXpd.c      |  34 ++++
> > >  .../gcc.target/i386/vect-fmsubaddXXXps.c      |  34 ++++
> > >  gcc/tree-vect-slp-patterns.c                  | 192 +++++++++++++-----
> > >  gcc/tree-vect-slp.c                           |   2 +
> > >  10 files changed, 311 insertions(+), 57 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
> > >
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index bcf1605d147..6fc13c184bf 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -4644,6 +4644,25 @@
> > >  ;;
> > >  ;; But this doesn't seem useful in practice.
> > >
> > > +(define_expand "vec_fmaddsub<mode>4"
> > > +  [(set (match_operand:VF 0 "register_operand")
> > > +        (unspec:VF
> > > +          [(match_operand:VF 1 "nonimmediate_operand")
> > > +           (match_operand:VF 2 "nonimmediate_operand")
> > > +           (match_operand:VF 3 "nonimmediate_operand")]
> > > +          UNSPEC_FMADDSUB))]
> > > +  "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
> > > +
> > > +(define_expand "vec_fmsubadd<mode>4"
> > > +  [(set (match_operand:VF 0 "register_operand")
> > > +        (unspec:VF
> > > +          [(match_operand:VF 1 "nonimmediate_operand")
> > > +           (match_operand:VF 2 "nonimmediate_operand")
> > > +           (neg:VF
> > > +             (match_operand:VF 3 "nonimmediate_operand"))]
> > > +          UNSPEC_FMADDSUB))]
> > > +  "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
> > > +
> >
> > With a condition like
> > "TARGET_FMA || TARGET_FMA4
> >  || (<MODE_SIZE> == 64 || TARGET_AVX512VL)"?
> >
> > The original expander "fmaddsub_<mode>" is only used by builtins which
> > have their own guard for AVX512VL, so it doesn't matter that it doesn't
> > have TARGET_AVX512VL:
> > BDESC (OPTION_MASK_ISA_AVX512VL, 0,
> > CODE_FOR_avx512vl_fmaddsub_v4df_mask,
> > "__builtin_ia32_vfmaddsubpd256_mask",
> > IX86_BUILTIN_VFMADDSUBPD256_MASK, UNKNOWN, (int)
> > V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)

> OK, that seems to work!
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu - are the
> x86 backend changes OK?
>
Yes, LGTM.

> Thanks,
> Richard.
>
> This adds named expanders for vec_fmaddsub<mode>4 and
> vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and
> vfmsubaddXXXp{ds} instructions.  This complements the previous
> addition of ADDSUB support.
>
> x86 lacks SUBADD and the negate variants of FMA with mixed
> plus minus, so I did not add optabs or patterns for those, but
> it would not be difficult if there's a target that has them.
>
> 2021-07-05  Richard Biener
>
>         * doc/md.texi (vec_fmaddsub<mode>4): Document.
>         (vec_fmsubadd<mode>4): Likewise.
>         * optabs.def (vec_fmaddsub$a4): Add.
>         (vec_fmsubadd$a4): Likewise.
>         * internal-fn.def (IFN_VEC_FMADDSUB): Add.
>         (IFN_VEC_FMSUBADD): Likewise.
>         * tree-vect-slp-patterns.c (addsub_pattern::recognize):
>         Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD.
>         (addsub_pattern::build): Likewise.
>         * tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB
>         and CFN_VEC_FMSUBADD are not transparent for permutes.
>         * config/i386/sse.md (vec_fmaddsub<mode>4): New expander.
>         (vec_fmsubadd<mode>4): Likewise.
>
>         * gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase.
>         * gcc.target/i386/vect-fmaddsubXXXps.c: Likewise.
>         * gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise.
>         * gcc.target/i386/vect-fmsubaddXXXps.c: Likewise.
> ---
>  gcc/config/i386/sse.md                        |  19 ++
>  gcc/doc/md.texi                               |  14 ++
>  gcc/internal-fn.def                           |   3 +-
>  gcc/optabs.def                                |   2 +
>  .../gcc.target/i386/vect-fmaddsubXXXpd.c      |  34 ++++
>  .../gcc.target/i386/vect-fmaddsubXXXps.c      |  34 ++++
>  .../gcc.target/i386/vect-fmsubaddXXXpd.c      |  34 ++++
>  .../gcc.target/i386/vect-fmsubaddXXXps.c      |  34 ++++
>  gcc/tree-vect-slp-patterns.c                  | 192 +++++++++++++-----
>  gcc/tree-vect-slp.c                           |   2 +
>  10 files changed, 311 insertions(+), 57 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index bcf1605d147..17c9e571d5d 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -4644,6 +4644,25 @@
>  ;;
>  ;; But this doesn't seem useful in practice.
>
> +(define_expand "vec_fmaddsub<mode>4"
> +  [(set (match_operand:VF 0 "register_operand")
> +        (unspec:VF
> +          [(match_operand:VF 1 "nonimmediate_operand")
> +           (match_operand:VF 2 "nonimmediate_operand")
> +           (match_operand:VF 3 "nonimmediate_operand")]
> +          UNSPEC_FMADDSUB))]
> +  "TARGET_FMA || TARGET_FMA4 || (<MODE_SIZE> == 64 || TARGET_AVX512VL)")
> +
> +(define_expand "vec_fmsubadd<mode>4"
> +  [(set (match_operand:VF 0 "register_operand")
> +        (unspec:VF
> +          [(match_operand:VF 1 "nonimmediate_operand")
> +           (match_operand:VF 2 "nonimmediate_operand")
> +           (neg:VF
> +             (match_operand:VF 3 "nonimmediate_operand"))]
> +          UNSPEC_FMADDSUB))]
> +  "TARGET_FMA || TARGET_FMA4 || (<MODE_SIZE> == 64 || TARGET_AVX512VL)")
> +
>  (define_expand "fmaddsub_<mode>"
>    [(set (match_operand:VF 0 "register_operand")
>         (unspec:VF
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 1b918144330..cc92ebd26aa 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5688,6 +5688,20 @@ Alternating subtract, add with even lanes doing subtract and odd
>  lanes doing addition.  Operands 1 and 2 and the output operand are vectors
>  with mode @var{m}.
>
> +@cindex @code{vec_fmaddsub@var{m}4} instruction pattern
> +@item @samp{vec_fmaddsub@var{m}4}
> +Alternating multiply subtract, add with even lanes doing subtract and odd
> +lanes doing addition of the third operand to the multiplication result
> +of the first two operands.  Operands 1, 2 and 3 and the output operand are
> +vectors with mode @var{m}.
> +
> +@cindex @code{vec_fmsubadd@var{m}4} instruction pattern
> +@item @samp{vec_fmsubadd@var{m}4}
> +Alternating multiply add, subtract with even lanes doing addition and odd
> +lanes doing subtraction of the third operand to the multiplication result
> +of the first two operands.  Operands 1, 2 and 3 and the output operand are
> +vectors with mode @var{m}.
> +
>  These instructions are not allowed to @code{FAIL}.
>
>  @cindex @code{mulhisi3} instruction pattern
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index c3b8e730960..a7003d5da8e 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -282,7 +282,8 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
>  DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
>  DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
>  DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary)
> -
> +DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
> +DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
>
>  /* FP scales.  */
>  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 41ab2598eb6..51acc1be8f5 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -408,6 +408,8 @@ OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
>  OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
>  OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
>  OPTAB_D (vec_addsub_optab, "vec_addsub$a3")
> +OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4")
> +OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4")
>
>  OPTAB_D (sync_add_optab, "sync_add$I$a")
>  OPTAB_D (sync_and_optab, "sync_and$I$a")
> diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> new file mode 100644
> index 00000000000..b30d10731a7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> @@ -0,0 +1,34 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fma } */
> +/* { dg-options "-O3 -mfma -save-temps" } */
> +
> +#include "fma-check.h"
> +
> +void __attribute__((noipa))
> +check_fmaddsub (double * __restrict a, double *b, double *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0];
> +      a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1];
> +    }
> +}
> +
> +static void
> +fma_test (void)
> +{
> +  double a[4], b[4], c[4];
> +  for (int i = 0; i < 4; ++i)
> +    {
> +      a[i] = i;
> +      b[i] = 3*i;
> +      c[i] = 7*i;
> +    }
> +  check_fmaddsub (a, b, c, 2);
> +  const double d[4] = { 0., 22., 82., 192. };
> +  for (int i = 0; i < 4; ++i)
> +    if (a[i] != d[i])
> +      __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-assembler "fmaddsub...pd" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> new file mode 100644
> index 00000000000..cd2af8725a3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> @@ -0,0 +1,34 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fma } */
> +/* { dg-options "-O3 -mfma -save-temps" } */
> +
> +#include "fma-check.h"
> +
> +void __attribute__((noipa))
> +check_fmaddsub (float * __restrict a, float *b, float *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0];
> +      a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1];
> +    }
> +}
> +
> +static void
> +fma_test (void)
> +{
> +  float a[4], b[4], c[4];
> +  for (int i = 0; i < 4; ++i)
> +    {
> +      a[i] = i;
> +      b[i] = 3*i;
> +      c[i] = 7*i;
> +    }
> +  check_fmaddsub (a, b, c, 2);
> +  const float d[4] = { 0., 22., 82., 192. };
> +  for (int i = 0; i < 4; ++i)
> +    if (a[i] != d[i])
> +      __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-assembler "fmaddsub...ps" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> new file mode 100644
> index 00000000000..7ca2a275cc1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> @@ -0,0 +1,34 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fma } */
> +/* { dg-options "-O3 -mfma -save-temps" } */
> +
> +#include "fma-check.h"
> +
> +void __attribute__((noipa))
> +check_fmsubadd (double * __restrict a, double *b, double *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0];
> +      a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1];
> +    }
> +}
> +
> +static void
> +fma_test (void)
> +{
> +  double a[4], b[4], c[4];
> +  for (int i = 0; i < 4; ++i)
> +    {
> +      a[i] = i;
> +      b[i] = 3*i;
> +      c[i] = 7*i;
> +    }
> +  check_fmsubadd (a, b, c, 2);
> +  const double d[4] = { 0., 20., 86., 186. };
> +  for (int i = 0; i < 4; ++i)
> +    if (a[i] != d[i])
> +      __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-assembler "fmsubadd...pd" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
> new file mode 100644
> index 00000000000..9ddd0e423db
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
> @@ -0,0 +1,34 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fma } */
> +/* { dg-options "-O3 -mfma -save-temps" } */
> +
> +#include "fma-check.h"
> +
> +void __attribute__((noipa))
> +check_fmsubadd (float * __restrict a, float *b, float *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0];
> +      a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1];
> +    }
> +}
> +
> +static void
> +fma_test (void)
> +{
> +  float a[4], b[4], c[4];
> +  for (int i = 0; i < 4; ++i)
> +    {
> +      a[i] = i;
> +      b[i] = 3*i;
> +      c[i] = 7*i;
> +    }
> +  check_fmsubadd (a, b, c, 2);
> +  const float d[4] = { 0., 20., 86., 186. };
> +  for (int i = 0; i < 4; ++i)
> +    if (a[i] != d[i])
> +      __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-assembler "fmsubadd...ps" } } */
> diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> index 2671f91972d..f774cac4a4d 100644
> --- a/gcc/tree-vect-slp-patterns.c
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -1496,8 +1496,8 @@ complex_operations_pattern::build (vec_info * /* vinfo */)
>  class addsub_pattern : public vect_pattern
>  {
>    public:
> -    addsub_pattern (slp_tree *node)
> -      : vect_pattern (node, NULL, IFN_VEC_ADDSUB) {};
> +    addsub_pattern (slp_tree *node, internal_fn ifn)
> +      : vect_pattern (node, NULL, ifn) {};
>
>      void build (vec_info *);
>
> @@ -1510,46 +1510,68 @@ addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_)
>  {
>    slp_tree node = *node_;
>    if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
> -      || SLP_TREE_CHILDREN (node).length () != 2)
> +      || SLP_TREE_CHILDREN (node).length () != 2
> +      || SLP_TREE_LANE_PERMUTATION (node).length () % 2)
>      return NULL;
>
>    /* Match a blend of a plus and a minus op with the same number of plus and
>       minus lanes on the same operands.  */
> -  slp_tree sub = SLP_TREE_CHILDREN (node)[0];
> -  slp_tree add = SLP_TREE_CHILDREN (node)[1];
> -  bool swapped_p = false;
> -  if (vect_match_expression_p (sub, PLUS_EXPR))
> -    {
> -      std::swap (add, sub);
> -      swapped_p = true;
> -    }
> -  if (!(vect_match_expression_p (add, PLUS_EXPR)
> -        && vect_match_expression_p (sub, MINUS_EXPR)))
> +  unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first;
> +  unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first;
> +  if (l0 == l1)
> +    return NULL;
> +  bool l0add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0],
> +                                          PLUS_EXPR);
> +  if (!l0add_p
> +      && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0], MINUS_EXPR))
> +    return NULL;
> +  bool l1add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1],
> +                                          PLUS_EXPR);
> +  if (!l1add_p
> +      && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1], MINUS_EXPR))
>      return NULL;
> -  if (!((SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[0]
> -         && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[1])
> -        || (SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[1]
> -            && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[0])))
> +
> +  slp_tree l0node = SLP_TREE_CHILDREN (node)[l0];
> +  slp_tree l1node = SLP_TREE_CHILDREN (node)[l1];
> +  if (!((SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[0]
> +         && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[1])
> +        || (SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[1]
> +            && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[0])))
>      return NULL;
>
>    for (unsigned i = 0; i < SLP_TREE_LANE_PERMUTATION (node).length (); ++i)
>      {
>        std::pair<unsigned, unsigned> perm = SLP_TREE_LANE_PERMUTATION (node)[i];
> -      if (swapped_p)
> -        perm.first = perm.first == 0 ? 1 : 0;
> -      /* It has to be alternating -, +, -, ...
> +      /* It has to be alternating -, +, -,
> +         While we could permute the .ADDSUB inputs and the .ADDSUB output
> +         that's only profitable over the add + sub + blend if at least
> +         one of the permute is optimized which we can't determine here.  */
> -      if (perm.first != (i & 1)
> +      if (perm.first != ((i & 1) ? l1 : l0)
>            || perm.second != i)
>          return NULL;
>      }
>
> -  if (!vect_pattern_validate_optab (IFN_VEC_ADDSUB, node))
> -    return NULL;
> +  /* Now we have either { -, +, -, + ... } (!l0add_p) or { +, -, +, - ... }
> +     (l0add_p), see whether we have FMA variants.  */
> +  if (!l0add_p
> +      && vect_match_expression_p (SLP_TREE_CHILDREN (l0node)[0], MULT_EXPR))
> +    {
> +      /* (c * d) -+ a */
> +      if (vect_pattern_validate_optab (IFN_VEC_FMADDSUB, node))
> +        return new addsub_pattern (node_, IFN_VEC_FMADDSUB);
> +    }
> +  else if (l0add_p
> +           && vect_match_expression_p (SLP_TREE_CHILDREN (l1node)[0], MULT_EXPR))
> +    {
> +      /* (c * d) +- a */
> +      if (vect_pattern_validate_optab (IFN_VEC_FMSUBADD, node))
> +        return new addsub_pattern (node_, IFN_VEC_FMSUBADD);
> +    }
>
> -  return new addsub_pattern (node_);
> +  if (!l0add_p && vect_pattern_validate_optab (IFN_VEC_ADDSUB, node))
> +    return new addsub_pattern (node_, IFN_VEC_ADDSUB);
> +
> +  return NULL;
>  }
>
>  void
> @@ -1557,38 +1579,96 @@ addsub_pattern::build (vec_info *vinfo)
>  {
>    slp_tree node = *m_node;
>
> -  slp_tree sub = SLP_TREE_CHILDREN (node)[0];
> -  slp_tree add = SLP_TREE_CHILDREN (node)[1];
> -  if (vect_match_expression_p (sub, PLUS_EXPR))
> -    std::swap (add, sub);
> -
> -  /* Modify the blend node in-place.  */
> -  SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0];
> -  SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1];
> -  SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
> -  SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
> -
> -  /* Build IFN_VEC_ADDSUB from the sub representative operands.  */
> -  stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub);
> -  gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2,
> -                                            gimple_assign_rhs1 (rep->stmt),
> -                                            gimple_assign_rhs2 (rep->stmt));
> -  gimple_call_set_lhs (call, make_ssa_name
> -                               (TREE_TYPE (gimple_assign_lhs (rep->stmt))));
> -  gimple_call_set_nothrow (call, true);
> -  gimple_set_bb (call, gimple_bb (rep->stmt));
> -  stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep);
> -  SLP_TREE_REPRESENTATIVE (node) = new_rep;
> -  STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
> -  STMT_SLP_TYPE (new_rep) = pure_slp;
> -  STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
> -  STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
> -  STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep));
> -  SLP_TREE_CODE (node) = ERROR_MARK;
> -  SLP_TREE_LANE_PERMUTATION (node).release ();
> -
> -  vect_free_slp_tree (sub);
> -  vect_free_slp_tree (add);
> +  unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first;
> +  unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first;
> +
> +  switch (m_ifn)
> +    {
> +    case IFN_VEC_ADDSUB:
> +      {
> +        slp_tree sub = SLP_TREE_CHILDREN (node)[l0];
> +        slp_tree add = SLP_TREE_CHILDREN (node)[l1];
> +
> +        /* Modify the blend node in-place.  */
> +        SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0];
> +        SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1];
> +        SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
> +        SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
> +
> +        /* Build IFN_VEC_ADDSUB from the sub representative operands.  */
> +        stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub);
> +        gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2,
> +                                                  gimple_assign_rhs1 (rep->stmt),
> +                                                  gimple_assign_rhs2 (rep->stmt));
> +        gimple_call_set_lhs (call, make_ssa_name
> +                                     (TREE_TYPE (gimple_assign_lhs (rep->stmt))));
> +        gimple_call_set_nothrow (call, true);
> +        gimple_set_bb (call, gimple_bb (rep->stmt));
> +        stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep);
> +        SLP_TREE_REPRESENTATIVE (node) = new_rep;
> +        STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
> +        STMT_SLP_TYPE (new_rep) = pure_slp;
> +        STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
> +        STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
> +        STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep));
> +        SLP_TREE_CODE (node) = ERROR_MARK;
> +        SLP_TREE_LANE_PERMUTATION (node).release ();
> +
> +        vect_free_slp_tree (sub);
> +        vect_free_slp_tree (add);
> +        break;
> +      }
> +    case IFN_VEC_FMADDSUB:
> +    case IFN_VEC_FMSUBADD:
> +      {
> +        slp_tree sub, add;
> +        if (m_ifn == IFN_VEC_FMADDSUB)
> +          {
> +            sub = SLP_TREE_CHILDREN (node)[l0];
> +            add = SLP_TREE_CHILDREN (node)[l1];
> +          }
> +        else /* m_ifn == IFN_VEC_FMSUBADD */
> +          {
> +            sub = SLP_TREE_CHILDREN (node)[l1];
> +            add = SLP_TREE_CHILDREN (node)[l0];
> +          }
> +        slp_tree mul = SLP_TREE_CHILDREN (sub)[0];
> +        /* Modify the blend node in-place.  */
> +        SLP_TREE_CHILDREN (node).safe_grow (3, true);
> +        SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (mul)[0];
> +        SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (mul)[1];
> +        SLP_TREE_CHILDREN (node)[2] = SLP_TREE_CHILDREN (sub)[1];
> +        SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
> +        SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
> +        SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[2])++;
> +
> +        /* Build IFN_VEC_FMADDSUB from the mul/sub representative operands.  */
> +        stmt_vec_info srep = SLP_TREE_REPRESENTATIVE (sub);
> +        stmt_vec_info mrep = SLP_TREE_REPRESENTATIVE (mul);
> +        gcall *call = gimple_build_call_internal (m_ifn, 3,
> +                                                  gimple_assign_rhs1 (mrep->stmt),
> +                                                  gimple_assign_rhs2 (mrep->stmt),
> +                                                  gimple_assign_rhs2 (srep->stmt));
> +        gimple_call_set_lhs (call, make_ssa_name
> +                                     (TREE_TYPE (gimple_assign_lhs (srep->stmt))));
> +        gimple_call_set_nothrow (call, true);
> +        gimple_set_bb (call, gimple_bb (srep->stmt));
> +        stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, srep);
> +        SLP_TREE_REPRESENTATIVE (node) = new_rep;
> +        STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
> +        STMT_SLP_TYPE (new_rep) = pure_slp;
> +        STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
> +        STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
> +        STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (srep));
> +        SLP_TREE_CODE (node) = ERROR_MARK;
> +        SLP_TREE_LANE_PERMUTATION (node).release ();
> +
> +        vect_free_slp_tree (sub);
> +        vect_free_slp_tree (add);
> +        break;
> +      }
> +    default:;
> +    }
>  }
>
>  /******************************************************************************
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index f08797c2bc0..5357cd0e7a4 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -3728,6 +3728,8 @@ vect_optimize_slp (vec_info *vinfo)
>           case CFN_COMPLEX_MUL:
>           case CFN_COMPLEX_MUL_CONJ:
>           case CFN_VEC_ADDSUB:
> +         case CFN_VEC_FMADDSUB:
> +         case CFN_VEC_FMSUBADD:
>             vertices[idx].perm_in = 0;
>             vertices[idx].perm_out = 0;
>           default:;
> --
> 2.26.2

--
BR,
Hongtao