From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hongtao Liu
Date: Tue, 6 Jul 2021 16:29:49 +0800
Subject: Re: [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs
To: Richard Biener
Cc: GCC Patches, "Liu, Hongtao"
On Tue, Jul 6, 2021 at 3:42 PM Richard Biener wrote:
>
> On Tue, 6 Jul 2021, Hongtao Liu wrote:
>
> > On Mon, Jul 5, 2021 at 10:09 PM Richard Biener wrote:
> > >
> > > This adds named expanders for vec_fmaddsub<mode>4 and
> > > vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and
> > > vfmsubaddXXXp{ds} instructions.  This complements the previous
> > > addition of ADDSUB support.
> > >
> > > x86 lacks SUBADD and the negate variants of FMA with mixed
> > > plus minus, so I did not add optabs or patterns for those, but
> > > it would not be difficult if there's a target that has them.
> > > Maybe one of the complex fma patterns matches those variants?
> > >
> > > I did not dare to rewrite the numerous patterns to the new
> > > canonical name but instead added two new expanders.  Note I
> > > did not cover AVX512 since the existing patterns are separated
> > > and I have no easy way to test things there.  Handling AVX512
> > > should be easy as a followup though.
> > >
> > > Bootstrap and testing on x86_64-unknown-linux-gnu in progress.
> > >
> > > Any comments?
> > >
> > > Thanks,
> > > Richard.
> > >
> > > 2021-07-05  Richard Biener
> > >
> > >         * doc/md.texi (vec_fmaddsub<mode>4): Document.
> > >         (vec_fmsubadd<mode>4): Likewise.
> > >         * optabs.def (vec_fmaddsub$a4): Add.
> > >         (vec_fmsubadd$a4): Likewise.
> > >         * internal-fn.def (IFN_VEC_FMADDSUB): Add.
> > >         (IFN_VEC_FMSUBADD): Likewise.
> > >         * tree-vect-slp-patterns.c (addsub_pattern::recognize):
> > >         Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD.
> > >         (addsub_pattern::build): Likewise.
> > >         * tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB
> > >         and CFN_VEC_FMSUBADD are not transparent for permutes.
> > >         * config/i386/sse.md (vec_fmaddsub<mode>4): New expander.
> > >         (vec_fmsubadd<mode>4): Likewise.
> > >
> > >         * gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase.
> > >         * gcc.target/i386/vect-fmaddsubXXXps.c: Likewise.
> > >         * gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise.
> > >         * gcc.target/i386/vect-fmsubaddXXXps.c: Likewise.
> > > ---
> > >  gcc/config/i386/sse.md                        |  19 ++
> > >  gcc/doc/md.texi                               |  14 ++
> > >  gcc/internal-fn.def                           |   3 +-
> > >  gcc/optabs.def                                |   2 +
> > >  .../gcc.target/i386/vect-fmaddsubXXXpd.c      |  34 ++++
> > >  .../gcc.target/i386/vect-fmaddsubXXXps.c      |  34 ++++
> > >  .../gcc.target/i386/vect-fmsubaddXXXpd.c      |  34 ++++
> > >  .../gcc.target/i386/vect-fmsubaddXXXps.c      |  34 ++++
> > >  gcc/tree-vect-slp-patterns.c                  | 192 +++++++++++++-----
> > >  gcc/tree-vect-slp.c                           |   2 +
> > >  10 files changed, 311 insertions(+), 57 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
> > >
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index bcf1605d147..6fc13c184bf 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -4644,6 +4644,25 @@
> > >  ;;
> > >  ;; But this doesn't seem useful in practice.
> > >
> > > +(define_expand "vec_fmaddsub<mode>4"
> > > +  [(set (match_operand:VF 0 "register_operand")
> > > +        (unspec:VF
> > > +          [(match_operand:VF 1 "nonimmediate_operand")
> > > +           (match_operand:VF 2 "nonimmediate_operand")
> > > +           (match_operand:VF 3 "nonimmediate_operand")]
> > > +          UNSPEC_FMADDSUB))]
> > > +  "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
> > > +
> > > +(define_expand "vec_fmsubadd<mode>4"
> > > +  [(set (match_operand:VF 0 "register_operand")
> > > +        (unspec:VF
> > > +          [(match_operand:VF 1 "nonimmediate_operand")
> > > +           (match_operand:VF 2 "nonimmediate_operand")
> > > +           (neg:VF
> > > +             (match_operand:VF 3 "nonimmediate_operand"))]
> > > +          UNSPEC_FMADDSUB))]
> > > +  "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
> > > +
> >
> > With a condition like
> > "TARGET_FMA || TARGET_FMA4
> >  || (<MODE_SIZE> == 64 || TARGET_AVX512VL)"?
> >
> > The original expander "fmaddsub_<mode>" is only used by builtins which
> > have their own guard for AVX512VL, so it doesn't matter that it doesn't
> > have TARGET_AVX512VL:
> > BDESC (OPTION_MASK_ISA_AVX512VL, 0,
> > CODE_FOR_avx512vl_fmaddsub_v4df_mask,
> > "__builtin_ia32_vfmaddsubpd256_mask",
> > IX86_BUILTIN_VFMADDSUBPD256_MASK, UNKNOWN, (int)
> > V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)

> OK, that seems to work!
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu - are the
> x86 backend changes OK?
>
Yes, LGTM.

> Thanks,
> Richard.
>
> This adds named expanders for vec_fmaddsub<mode>4 and
> vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and
> vfmsubaddXXXp{ds} instructions.  This complements the previous
> addition of ADDSUB support.
>
> x86 lacks SUBADD and the negate variants of FMA with mixed
> plus minus, so I did not add optabs or patterns for those, but
> it would not be difficult if there's a target that has them.
>
> 2021-07-05  Richard Biener
>
>         * doc/md.texi (vec_fmaddsub<mode>4): Document.
>         (vec_fmsubadd<mode>4): Likewise.
>         * optabs.def (vec_fmaddsub$a4): Add.
>         (vec_fmsubadd$a4): Likewise.
>         * internal-fn.def (IFN_VEC_FMADDSUB): Add.
>         (IFN_VEC_FMSUBADD): Likewise.
>         * tree-vect-slp-patterns.c (addsub_pattern::recognize):
>         Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD.
>         (addsub_pattern::build): Likewise.
>         * tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB
>         and CFN_VEC_FMSUBADD are not transparent for permutes.
>         * config/i386/sse.md (vec_fmaddsub<mode>4): New expander.
>         (vec_fmsubadd<mode>4): Likewise.
>
>         * gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase.
>         * gcc.target/i386/vect-fmaddsubXXXps.c: Likewise.
>         * gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise.
>         * gcc.target/i386/vect-fmsubaddXXXps.c: Likewise.
> ---
>  gcc/config/i386/sse.md                        |  19 ++
>  gcc/doc/md.texi                               |  14 ++
>  gcc/internal-fn.def                           |   3 +-
>  gcc/optabs.def                                |   2 +
>  .../gcc.target/i386/vect-fmaddsubXXXpd.c      |  34 ++++
>  .../gcc.target/i386/vect-fmaddsubXXXps.c      |  34 ++++
>  .../gcc.target/i386/vect-fmsubaddXXXpd.c      |  34 ++++
>  .../gcc.target/i386/vect-fmsubaddXXXps.c      |  34 ++++
>  gcc/tree-vect-slp-patterns.c                  | 192 +++++++++++++-----
>  gcc/tree-vect-slp.c                           |   2 +
>  10 files changed, 311 insertions(+), 57 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index bcf1605d147..17c9e571d5d 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -4644,6 +4644,25 @@
>  ;;
>  ;; But this doesn't seem useful in practice.
>
> +(define_expand "vec_fmaddsub<mode>4"
> +  [(set (match_operand:VF 0 "register_operand")
> +        (unspec:VF
> +          [(match_operand:VF 1 "nonimmediate_operand")
> +           (match_operand:VF 2 "nonimmediate_operand")
> +           (match_operand:VF 3 "nonimmediate_operand")]
> +          UNSPEC_FMADDSUB))]
> +  "TARGET_FMA || TARGET_FMA4 || (<MODE_SIZE> == 64 || TARGET_AVX512VL)")
> +
> +(define_expand "vec_fmsubadd<mode>4"
> +  [(set (match_operand:VF 0 "register_operand")
> +        (unspec:VF
> +          [(match_operand:VF 1 "nonimmediate_operand")
> +           (match_operand:VF 2 "nonimmediate_operand")
> +           (neg:VF
> +             (match_operand:VF 3 "nonimmediate_operand"))]
> +          UNSPEC_FMADDSUB))]
> +  "TARGET_FMA || TARGET_FMA4 || (<MODE_SIZE> == 64 || TARGET_AVX512VL)")
> +
>  (define_expand "fmaddsub_<mode>"
>    [(set (match_operand:VF 0 "register_operand")
>         (unspec:VF
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 1b918144330..cc92ebd26aa 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5688,6 +5688,20 @@ Alternating subtract, add with even lanes doing subtract and odd
>  lanes doing addition.  Operands 1 and 2 and the output operand are vectors
>  with mode @var{m}.
>
> +@cindex @code{vec_fmaddsub@var{m}4} instruction pattern
> +@item @samp{vec_fmaddsub@var{m}4}
> +Alternating multiply subtract, add with even lanes doing subtract and odd
> +lanes doing addition of the third operand to the multiplication result
> +of the first two operands.  Operands 1, 2 and 3 and the output operand are
> +vectors with mode @var{m}.
> +
> +@cindex @code{vec_fmsubadd@var{m}4} instruction pattern
> +@item @samp{vec_fmsubadd@var{m}4}
> +Alternating multiply add, subtract with even lanes doing addition and odd
> +lanes doing subtraction of the third operand to the multiplication result
> +of the first two operands.  Operands 1, 2 and 3 and the output operand are
> +vectors with mode @var{m}.
> +
>  These instructions are not allowed to @code{FAIL}.
>
>  @cindex @code{mulhisi3} instruction pattern
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index c3b8e730960..a7003d5da8e 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -282,7 +282,8 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
>  DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
>  DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
>  DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary)
> -
> +DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
> +DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
>
>  /* FP scales.  */
>  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 41ab2598eb6..51acc1be8f5 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -408,6 +408,8 @@ OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
>  OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
>  OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
>  OPTAB_D (vec_addsub_optab, "vec_addsub$a3")
> +OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4")
> +OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4")
>
>  OPTAB_D (sync_add_optab, "sync_add$I$a")
>  OPTAB_D (sync_and_optab, "sync_and$I$a")
> diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> new file mode 100644
> index 00000000000..b30d10731a7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> @@ -0,0 +1,34 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fma } */
> +/* { dg-options "-O3 -mfma -save-temps" } */
> +
> +#include "fma-check.h"
> +
> +void __attribute__((noipa))
> +check_fmaddsub (double * __restrict a, double *b, double *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0];
> +      a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1];
> +    }
> +}
> +
> +static void
> +fma_test (void)
> +{
> +  double a[4], b[4], c[4];
> +  for (int i = 0; i < 4; ++i)
> +    {
> +      a[i] = i;
> +      b[i] = 3*i;
> +      c[i] = 7*i;
> +    }
> +  check_fmaddsub (a, b, c, 2);
> +  const double d[4] = { 0., 22., 82., 192. };
> +  for (int i = 0; i < 4; ++i)
> +    if (a[i] != d[i])
> +      __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-assembler "fmaddsub...pd" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> new file mode 100644
> index 00000000000..cd2af8725a3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> @@ -0,0 +1,34 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fma } */
> +/* { dg-options "-O3 -mfma -save-temps" } */
> +
> +#include "fma-check.h"
> +
> +void __attribute__((noipa))
> +check_fmaddsub (float * __restrict a, float *b, float *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0];
> +      a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1];
> +    }
> +}
> +
> +static void
> +fma_test (void)
> +{
> +  float a[4], b[4], c[4];
> +  for (int i = 0; i < 4; ++i)
> +    {
> +      a[i] = i;
> +      b[i] = 3*i;
> +      c[i] = 7*i;
> +    }
> +  check_fmaddsub (a, b, c, 2);
> +  const float d[4] = { 0., 22., 82., 192. };
> +  for (int i = 0; i < 4; ++i)
> +    if (a[i] != d[i])
> +      __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-assembler "fmaddsub...ps" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> new file mode 100644
> index 00000000000..7ca2a275cc1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> @@ -0,0 +1,34 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fma } */
> +/* { dg-options "-O3 -mfma -save-temps" } */
> +
> +#include "fma-check.h"
> +
> +void __attribute__((noipa))
> +check_fmsubadd (double * __restrict a, double *b, double *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0];
> +      a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1];
> +    }
> +}
> +
> +static void
> +fma_test (void)
> +{
> +  double a[4], b[4], c[4];
> +  for (int i = 0; i < 4; ++i)
> +    {
> +      a[i] = i;
> +      b[i] = 3*i;
> +      c[i] = 7*i;
> +    }
> +  check_fmsubadd (a, b, c, 2);
> +  const double d[4] = { 0., 20., 86., 186. };
> +  for (int i = 0; i < 4; ++i)
> +    if (a[i] != d[i])
> +      __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-assembler "fmsubadd...pd" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
> new file mode 100644
> index 00000000000..9ddd0e423db
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
> @@ -0,0 +1,34 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fma } */
> +/* { dg-options "-O3 -mfma -save-temps" } */
> +
> +#include "fma-check.h"
> +
> +void __attribute__((noipa))
> +check_fmsubadd (float * __restrict a, float *b, float *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0];
> +      a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1];
> +    }
> +}
> +
> +static void
> +fma_test (void)
> +{
> +  float a[4], b[4], c[4];
> +  for (int i = 0; i < 4; ++i)
> +    {
> +      a[i] = i;
> +      b[i] = 3*i;
> +      c[i] = 7*i;
> +    }
> +  check_fmsubadd (a, b, c, 2);
> +  const float d[4] = { 0., 20., 86., 186. };
> +  for (int i = 0; i < 4; ++i)
> +    if (a[i] != d[i])
> +      __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-assembler "fmsubadd...ps" } } */
> diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> index 2671f91972d..f774cac4a4d 100644
> --- a/gcc/tree-vect-slp-patterns.c
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -1496,8 +1496,8 @@ complex_operations_pattern::build (vec_info * /* vinfo */)
>  class addsub_pattern : public vect_pattern
>  {
>    public:
> -    addsub_pattern (slp_tree *node)
> -      : vect_pattern (node, NULL, IFN_VEC_ADDSUB) {};
> +    addsub_pattern (slp_tree *node, internal_fn ifn)
> +      : vect_pattern (node, NULL, ifn) {};
>
>      void build (vec_info *);
>
> @@ -1510,46 +1510,68 @@ addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_)
>  {
>    slp_tree node = *node_;
>    if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
> -      || SLP_TREE_CHILDREN (node).length () != 2)
> +      || SLP_TREE_CHILDREN (node).length () != 2
> +      || SLP_TREE_LANE_PERMUTATION (node).length () % 2)
>      return NULL;
>
>    /* Match a blend of a plus and a minus op with the same number of plus and
>       minus lanes on the same operands.  */
> -  slp_tree sub = SLP_TREE_CHILDREN (node)[0];
> -  slp_tree add = SLP_TREE_CHILDREN (node)[1];
> -  bool swapped_p = false;
> -  if (vect_match_expression_p (sub, PLUS_EXPR))
> -    {
> -      std::swap (add, sub);
> -      swapped_p = true;
> -    }
> -  if (!(vect_match_expression_p (add, PLUS_EXPR)
> -        && vect_match_expression_p (sub, MINUS_EXPR)))
> +  unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first;
> +  unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first;
> +  if (l0 == l1)
> +    return NULL;
> +  bool l0add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0],
> +                                          PLUS_EXPR);
> +  if (!l0add_p
> +      && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0], MINUS_EXPR))
> +    return NULL;
> +  bool l1add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1],
> +                                          PLUS_EXPR);
> +  if (!l1add_p
> +      && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1], MINUS_EXPR))
>      return NULL;
> -  if (!((SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[0]
> -         && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[1])
> -        || (SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[1]
> -            && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[0])))
> +
> +  slp_tree l0node = SLP_TREE_CHILDREN (node)[l0];
> +  slp_tree l1node = SLP_TREE_CHILDREN (node)[l1];
> +  if (!((SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[0]
> +         && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[1])
> +        || (SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[1]
> +            && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[0])))
>      return NULL;
>
>    for (unsigned i = 0; i < SLP_TREE_LANE_PERMUTATION (node).length (); ++i)
>      {
>        std::pair<unsigned, unsigned> perm = SLP_TREE_LANE_PERMUTATION (node)[i];
> -      if (swapped_p)
> -        perm.first = perm.first == 0 ? 1 : 0;
> -      /* It has to be alternating -, +, -, ...
> +      /* It has to be alternating -, +, -,
> +         While we could permute the .ADDSUB inputs and the .ADDSUB output
> +         that's only profitable over the add + sub + blend if at least
> +         one of the permute is optimized which we can't determine here.  */
> -      if (perm.first != (i & 1)
> +      if (perm.first != ((i & 1) ? l1 : l0)
>            || perm.second != i)
>          return NULL;
>      }
>
> -  if (!vect_pattern_validate_optab (IFN_VEC_ADDSUB, node))
> -    return NULL;
> +  /* Now we have either { -, +, -, + ... } (!l0add_p) or { +, -, +, - ... }
> +     (l0add_p), see whether we have FMA variants.  */
> +  if (!l0add_p
> +      && vect_match_expression_p (SLP_TREE_CHILDREN (l0node)[0], MULT_EXPR))
> +    {
> +      /* (c * d) -+ a */
> +      if (vect_pattern_validate_optab (IFN_VEC_FMADDSUB, node))
> +        return new addsub_pattern (node_, IFN_VEC_FMADDSUB);
> +    }
> +  else if (l0add_p
> +           && vect_match_expression_p (SLP_TREE_CHILDREN (l1node)[0], MULT_EXPR))
> +    {
> +      /* (c * d) +- a */
> +      if (vect_pattern_validate_optab (IFN_VEC_FMSUBADD, node))
> +        return new addsub_pattern (node_, IFN_VEC_FMSUBADD);
> +    }
>
> -  return new addsub_pattern (node_);
> +  if (!l0add_p && vect_pattern_validate_optab (IFN_VEC_ADDSUB, node))
> +    return new addsub_pattern (node_, IFN_VEC_ADDSUB);
> +
> +  return NULL;
>  }
>
>  void
> @@ -1557,38 +1579,96 @@ addsub_pattern::build (vec_info *vinfo)
>  {
>    slp_tree node = *m_node;
>
> -  slp_tree sub = SLP_TREE_CHILDREN (node)[0];
> -  slp_tree add = SLP_TREE_CHILDREN (node)[1];
> -  if (vect_match_expression_p (sub, PLUS_EXPR))
> -    std::swap (add, sub);
> -
> -  /* Modify the blend node in-place.  */
> -  SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0];
> -  SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1];
> -  SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
> -  SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
> -
> -  /* Build IFN_VEC_ADDSUB from the sub representative operands.  */
> -  stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub);
> -  gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2,
> -                                            gimple_assign_rhs1 (rep->stmt),
> -                                            gimple_assign_rhs2 (rep->stmt));
> -  gimple_call_set_lhs (call, make_ssa_name
> -                               (TREE_TYPE (gimple_assign_lhs (rep->stmt))));
> -  gimple_call_set_nothrow (call, true);
> -  gimple_set_bb (call, gimple_bb (rep->stmt));
> -  stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep);
> -  SLP_TREE_REPRESENTATIVE (node) = new_rep;
> -  STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
> -  STMT_SLP_TYPE (new_rep) = pure_slp;
> -  STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
> -  STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
> -  STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep));
> -  SLP_TREE_CODE (node) = ERROR_MARK;
> -  SLP_TREE_LANE_PERMUTATION (node).release ();
> -
> -  vect_free_slp_tree (sub);
> -  vect_free_slp_tree (add);
> +  unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first;
> +  unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first;
> +
> +  switch (m_ifn)
> +    {
> +    case IFN_VEC_ADDSUB:
> +      {
> +        slp_tree sub = SLP_TREE_CHILDREN (node)[l0];
> +        slp_tree add = SLP_TREE_CHILDREN (node)[l1];
> +
> +        /* Modify the blend node in-place.  */
> +        SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0];
> +        SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1];
> +        SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
> +        SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
> +
> +        /* Build IFN_VEC_ADDSUB from the sub representative operands.  */
> +        stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub);
> +        gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2,
> +                                                  gimple_assign_rhs1 (rep->stmt),
> +                                                  gimple_assign_rhs2 (rep->stmt));
> +        gimple_call_set_lhs (call, make_ssa_name
> +                                     (TREE_TYPE (gimple_assign_lhs (rep->stmt))));
> +        gimple_call_set_nothrow (call, true);
> +        gimple_set_bb (call, gimple_bb (rep->stmt));
> +        stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep);
> +        SLP_TREE_REPRESENTATIVE (node) = new_rep;
> +        STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
> +        STMT_SLP_TYPE (new_rep) = pure_slp;
> +        STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
> +        STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
> +        STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep));
> +        SLP_TREE_CODE (node) = ERROR_MARK;
> +        SLP_TREE_LANE_PERMUTATION (node).release ();
> +
> +        vect_free_slp_tree (sub);
> +        vect_free_slp_tree (add);
> +        break;
> +      }
> +    case IFN_VEC_FMADDSUB:
> +    case IFN_VEC_FMSUBADD:
> +      {
> +        slp_tree sub, add;
> +        if (m_ifn == IFN_VEC_FMADDSUB)
> +          {
> +            sub = SLP_TREE_CHILDREN (node)[l0];
> +            add = SLP_TREE_CHILDREN (node)[l1];
> +          }
> +        else /* m_ifn == IFN_VEC_FMSUBADD */
> +          {
> +            sub = SLP_TREE_CHILDREN (node)[l1];
> +            add = SLP_TREE_CHILDREN (node)[l0];
> +          }
> +        slp_tree mul = SLP_TREE_CHILDREN (sub)[0];
> +        /* Modify the blend node in-place.  */
> +        SLP_TREE_CHILDREN (node).safe_grow (3, true);
> +        SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (mul)[0];
> +        SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (mul)[1];
> +        SLP_TREE_CHILDREN (node)[2] = SLP_TREE_CHILDREN (sub)[1];
> +        SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
> +        SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
> +        SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[2])++;
> +
> +        /* Build IFN_VEC_FMADDSUB from the mul/sub representative operands.  */
> +        stmt_vec_info srep = SLP_TREE_REPRESENTATIVE (sub);
> +        stmt_vec_info mrep = SLP_TREE_REPRESENTATIVE (mul);
> +        gcall *call = gimple_build_call_internal (m_ifn, 3,
> +                                                  gimple_assign_rhs1 (mrep->stmt),
> +                                                  gimple_assign_rhs2 (mrep->stmt),
> +                                                  gimple_assign_rhs2 (srep->stmt));
> +        gimple_call_set_lhs (call, make_ssa_name
> +                                     (TREE_TYPE (gimple_assign_lhs (srep->stmt))));
> +        gimple_call_set_nothrow (call, true);
> +        gimple_set_bb (call, gimple_bb (srep->stmt));
> +        stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, srep);
> +        SLP_TREE_REPRESENTATIVE (node) = new_rep;
> +        STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
> +        STMT_SLP_TYPE (new_rep) = pure_slp;
> +        STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
> +        STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
> +        STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (srep));
> +        SLP_TREE_CODE (node) = ERROR_MARK;
> +        SLP_TREE_LANE_PERMUTATION (node).release ();
> +
> +        vect_free_slp_tree (sub);
> +        vect_free_slp_tree (add);
> +        break;
> +      }
> +    default:;
> +    }
>  }
>
>  /******************************************************************************
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index f08797c2bc0..5357cd0e7a4 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -3728,6 +3728,8 @@ vect_optimize_slp (vec_info *vinfo)
>           case CFN_COMPLEX_MUL:
>           case CFN_COMPLEX_MUL_CONJ:
>           case CFN_VEC_ADDSUB:
> +         case CFN_VEC_FMADDSUB:
> +         case CFN_VEC_FMSUBADD:
>             vertices[idx].perm_in = 0;
>             vertices[idx].perm_out = 0;
>           default:;
> --
> 2.26.2

--
BR,
Hongtao