From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <rguenther@suse.de>
Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29])
 by sourceware.org (Postfix) with ESMTPS id 64EAD3858031
 for <gcc-patches@gcc.gnu.org>; Mon,  5 Jul 2021 14:38:51 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 64EAD3858031
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=suse.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
 by smtp-out2.suse.de (Postfix) with ESMTP id 5FA731FE7E;
 Mon,  5 Jul 2021 14:38:50 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
 t=1625495930; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
 mime-version:mime-version:content-type:content-type:
 in-reply-to:in-reply-to:references:references;
 bh=8lWkUpgw4N3ZfmyLsDzURf4t7Wju67bTfcS99ohxoqA=;
 b=p2RbhvpR706xDejRUFEmGKlxOtA5TG0JT7OxPTd9mtgiilAij7zb99ixttrp/NHSwFblBM
 KjMPv+7XP924qK/FuEZtQdf32cRm971ng/ugn+8sL/GqdIYqsPOeVgoR92WFYB9VZOi77z
 wQCFH1cl2uWLkScWTVQAh9Vn4fPSqGI=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
 s=susede2_ed25519; t=1625495930;
 h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
 mime-version:mime-version:content-type:content-type:
 in-reply-to:in-reply-to:references:references;
 bh=8lWkUpgw4N3ZfmyLsDzURf4t7Wju67bTfcS99ohxoqA=;
 b=b4hefZUort/dB/+M7oeLb4xtHdjYFdousa5Ef6Lx1e6r6Vqp1xTfmUF/O7FOvSQorx3wdg
 NkaTjM7mCVuFPWCQ==
Received: from murzim.suse.de (murzim.suse.de [10.160.4.192])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by relay2.suse.de (Postfix) with ESMTPS id 4DED9A3B9E;
 Mon,  5 Jul 2021 14:38:50 +0000 (UTC)
Date: Mon, 5 Jul 2021 16:38:50 +0200 (CEST)
From: Richard Biener <rguenther@suse.de>
To: Richard Biener <richard.guenther@gmail.com>
cc: GCC Patches <gcc-patches@gcc.gnu.org>, Hongtao Liu <hongtao.liu@intel.com>
Subject: Re: [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns
 and optabs
In-Reply-To: <CAFiYyc2dT-a7oKKL3mWSCzxbNNqCv7=O2g95rrnC+m0_7ndm5w@mail.gmail.com>
Message-ID: <nycvar.YFH.7.76.2107051635340.10711@zhemvz.fhfr.qr>
References: <oss7q3o7-p1r4-o47n-s861-r7499sn5on61@fhfr.qr>
 <CAFiYyc2dT-a7oKKL3mWSCzxbNNqCv7=O2g95rrnC+m0_7ndm5w@mail.gmail.com>
User-Agent: Alpine 2.21 (LSU 202 2017-01-01)
MIME-Version: 1.0
X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8BIT
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Mon, 05 Jul 2021 14:38:54 -0000

On Mon, 5 Jul 2021, Richard Biener wrote:

> On Mon, Jul 5, 2021 at 4:09 PM Richard Biener <rguenther@suse.de> wrote:
> >
> > This adds named expanders for vec_fmaddsub<mode>4 and
> > vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and
> > vfmsubaddXXXp{ds} instructions.  This complements the previous
> > addition of ADDSUB support.
> >
> > x86 lacks SUBADD and the negate variants of FMA with mixed
> > plus minus so I did not add optabs or patterns for those but
> > it would not be difficult if there's a target that has them.
> > Maybe one of the complex fma patterns match those variants?
> >
> > I did not dare to rewrite the numerous patterns to the new
> > canonical name but instead added two new expanders.  Note I
> > did not cover AVX512 since the existing patterns are separated
> > and I have no easy way to test things there.  Handling AVX512
> > should be easy as followup though.
> >
> > Bootstrap and testing on x86_64-unknown-linux-gnu in progress.
> 
> FYI, building libgfortran matmul_c4 we hit
> 
> /home/rguenther/src/trunk/libgfortran/generated/matmul_c4.c:1781:1:
> error: unrecognizable insn:
>  1781 | }
>       | ^
> (insn 5408 5407 5409 213 (set (reg:V8SF 1454 [ vect__4368.5363 ])
>         (unspec:V8SF [
>                 (reg:V8SF 4391)
>                 (reg:V8SF 4398)
>                 (reg:V8SF 4415 [ vect__2005.5362 ])
>             ] UNSPEC_FMADDSUB)) -1
>      (nil))
> during RTL pass: vregs
> 
> so it looks like the existing fmaddsub_<mode> expander cannot be
> simply re-purposed?

Ah, using the VF_128_256 iterator and removing the || TARGET_AVX512F
predication fixes it.  There's an avx512f-but-not-fma target variant
of matmul, which likely lacks avx512vl for the above.  So consider the
patch changed this way.  I'm not sure whether there's a more appropriate
iterator that catches this case.
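
Concretely, the adjusted expander would look roughly like the sketch
below.  It only applies the two changes described above to the posted
vec_fmaddsub<mode>4 pattern and is not the final committed text;
vec_fmsubadd<mode>4 would change the same way, keeping its (neg ...)
on operand 3.

```lisp
;; Sketch only: posted expander with VF replaced by VF_128_256 and the
;; "|| TARGET_AVX512F" alternative dropped from the condition.
(define_expand "vec_fmaddsub<mode>4"
  [(set (match_operand:VF_128_256 0 "register_operand")
        (unspec:VF_128_256
          [(match_operand:VF_128_256 1 "nonimmediate_operand")
           (match_operand:VF_128_256 2 "nonimmediate_operand")
           (match_operand:VF_128_256 3 "nonimmediate_operand")]
          UNSPEC_FMADDSUB))]
  "TARGET_FMA || TARGET_FMA4")
```

With this the optab is only advertised for the 128-bit and 256-bit
float modes, so the vectorizer should no longer generate the IFN for
modes where no insn pattern matches the unspec.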

Richard.

> > Any comments?
> >
> > Thanks,
> > Richard.
> >
> > 2021-07-05  Richard Biener  <rguenther@suse.de>
> >
> >         * doc/md.texi (vec_fmaddsub<mode>4): Document.
> >         (vec_fmsubadd<mode>4): Likewise.
> >         * optabs.def (vec_fmaddsub$a4): Add.
> >         (vec_fmsubadd$a4): Likewise.
> >         * internal-fn.def (IFN_VEC_FMADDSUB): Add.
> >         (IFN_VEC_FMSUBADD): Likewise.
> >         * tree-vect-slp-patterns.c (addsub_pattern::recognize):
> >         Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD.
> >         (addsub_pattern::build): Likewise.
> >         * tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB
> >         and CFN_VEC_FMSUBADD are not transparent for permutes.
> >         * config/i386/sse.md (vec_fmaddsub<mode>4): New expander.
> >         (vec_fmsubadd<mode>4): Likewise.
> >
> >         * gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase.
> >         * gcc.target/i386/vect-fmaddsubXXXps.c: Likewise.
> >         * gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise.
> >         * gcc.target/i386/vect-fmsubaddXXXps.c: Likewise.
> > ---
> >  gcc/config/i386/sse.md                        |  19 ++
> >  gcc/doc/md.texi                               |  14 ++
> >  gcc/internal-fn.def                           |   3 +-
> >  gcc/optabs.def                                |   2 +
> >  .../gcc.target/i386/vect-fmaddsubXXXpd.c      |  34 ++++
> >  .../gcc.target/i386/vect-fmaddsubXXXps.c      |  34 ++++
> >  .../gcc.target/i386/vect-fmsubaddXXXpd.c      |  34 ++++
> >  .../gcc.target/i386/vect-fmsubaddXXXps.c      |  34 ++++
> >  gcc/tree-vect-slp-patterns.c                  | 192 +++++++++++++-----
> >  gcc/tree-vect-slp.c                           |   2 +
> >  10 files changed, 311 insertions(+), 57 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index bcf1605d147..6fc13c184bf 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -4644,6 +4644,25 @@
> >  ;;
> >  ;; But this doesn't seem useful in practice.
> >
> > +(define_expand "vec_fmaddsub<mode>4"
> > +  [(set (match_operand:VF 0 "register_operand")
> > +       (unspec:VF
> > +         [(match_operand:VF 1 "nonimmediate_operand")
> > +          (match_operand:VF 2 "nonimmediate_operand")
> > +          (match_operand:VF 3 "nonimmediate_operand")]
> > +         UNSPEC_FMADDSUB))]
> > +  "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
> > +
> > +(define_expand "vec_fmsubadd<mode>4"
> > +  [(set (match_operand:VF 0 "register_operand")
> > +       (unspec:VF
> > +         [(match_operand:VF 1 "nonimmediate_operand")
> > +          (match_operand:VF 2 "nonimmediate_operand")
> > +          (neg:VF
> > +            (match_operand:VF 3 "nonimmediate_operand"))]
> > +         UNSPEC_FMADDSUB))]
> > +  "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
> > +
> >  (define_expand "fmaddsub_<mode>"
> >    [(set (match_operand:VF 0 "register_operand")
> >         (unspec:VF
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 1b918144330..cc92ebd26aa 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5688,6 +5688,20 @@ Alternating subtract, add with even lanes doing subtract and odd
> >  lanes doing addition.  Operands 1 and 2 and the outout operand are vectors
> >  with mode @var{m}.
> >
> > +@cindex @code{vec_fmaddsub@var{m}4} instruction pattern
> > +@item @samp{vec_fmaddsub@var{m}4}
> > +Alternating multiply subtract, add with even lanes doing subtract and odd
> > +lanes doing addition of the third operand to the multiplication result
> > +of the first two operands.  Operands 1, 2 and 3 and the output operand are vectors
> > +with mode @var{m}.
> > +
> > +@cindex @code{vec_fmsubadd@var{m}4} instruction pattern
> > +@item @samp{vec_fmsubadd@var{m}4}
> > +Alternating multiply add, subtract with even lanes doing addition and odd
> > +lanes doing subtraction of the third operand from the multiplication result
> > +of the first two operands.  Operands 1, 2 and 3 and the output operand are vectors
> > +with mode @var{m}.
> > +
> >  These instructions are not allowed to @code{FAIL}.
> >
> >  @cindex @code{mulhisi3} instruction pattern
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index c3b8e730960..a7003d5da8e 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -282,7 +282,8 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
> >  DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
> >  DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
> >  DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary)
> > -
> > +DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
> > +DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
> >
> >  /* FP scales.  */
> >  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index 41ab2598eb6..51acc1be8f5 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -408,6 +408,8 @@ OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
> >  OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
> >  OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
> >  OPTAB_D (vec_addsub_optab, "vec_addsub$a3")
> > +OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4")
> > +OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4")
> >
> >  OPTAB_D (sync_add_optab, "sync_add$I$a")
> >  OPTAB_D (sync_and_optab, "sync_and$I$a")
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> > new file mode 100644
> > index 00000000000..b30d10731a7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> > @@ -0,0 +1,34 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target fma } */
> > +/* { dg-options "-O3 -mfma -save-temps" } */
> > +
> > +#include "fma-check.h"
> > +
> > +void __attribute__((noipa))
> > +check_fmaddsub (double * __restrict a, double *b, double *c, int n)
> > +{
> > +  for (int i = 0; i < n; ++i)
> > +    {
> > +      a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0];
> > +      a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1];
> > +    }
> > +}
> > +
> > +static void
> > +fma_test (void)
> > +{
> > +  double a[4], b[4], c[4];
> > +  for (int i = 0; i < 4; ++i)
> > +    {
> > +      a[i] = i;
> > +      b[i] = 3*i;
> > +      c[i] = 7*i;
> > +    }
> > +  check_fmaddsub (a, b, c, 2);
> > +  const double d[4] = { 0., 22., 82., 192. };
> > +  for (int i = 0; i < 4; ++i)
> > +    if (a[i] != d[i])
> > +      __builtin_abort ();
> > +}
> > +
> > +/* { dg-final { scan-assembler "fmaddsub...pd" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> > new file mode 100644
> > index 00000000000..cd2af8725a3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> > @@ -0,0 +1,34 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target fma } */
> > +/* { dg-options "-O3 -mfma -save-temps" } */
> > +
> > +#include "fma-check.h"
> > +
> > +void __attribute__((noipa))
> > +check_fmaddsub (float * __restrict a, float *b, float *c, int n)
> > +{
> > +  for (int i = 0; i < n; ++i)
> > +    {
> > +      a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0];
> > +      a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1];
> > +    }
> > +}
> > +
> > +static void
> > +fma_test (void)
> > +{
> > +  float a[4], b[4], c[4];
> > +  for (int i = 0; i < 4; ++i)
> > +    {
> > +      a[i] = i;
> > +      b[i] = 3*i;
> > +      c[i] = 7*i;
> > +    }
> > +  check_fmaddsub (a, b, c, 2);
> > +  const float d[4] = { 0., 22., 82., 192. };
> > +  for (int i = 0; i < 4; ++i)
> > +    if (a[i] != d[i])
> > +      __builtin_abort ();
> > +}
> > +
> > +/* { dg-final { scan-assembler "fmaddsub...ps" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> > new file mode 100644
> > index 00000000000..7ca2a275cc1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> > @@ -0,0 +1,34 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target fma } */
> > +/* { dg-options "-O3 -mfma -save-temps" } */
> > +
> > +#include "fma-check.h"
> > +
> > +void __attribute__((noipa))
> > +check_fmsubadd (double * __restrict a, double *b, double *c, int n)
> > +{
> > +  for (int i = 0; i < n; ++i)
> > +    {
> > +      a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0];
> > +      a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1];
> > +    }
> > +}
> > +
> > +static void
> > +fma_test (void)
> > +{
> > +  double a[4], b[4], c[4];
> > +  for (int i = 0; i < 4; ++i)
> > +    {
> > +      a[i] = i;
> > +      b[i] = 3*i;
> > +      c[i] = 7*i;
> > +    }
> > +  check_fmsubadd (a, b, c, 2);
> > +  const double d[4] = { 0., 20., 86., 186. };
> > +  for (int i = 0; i < 4; ++i)
> > +    if (a[i] != d[i])
> > +      __builtin_abort ();
> > +}
> > +
> > +/* { dg-final { scan-assembler "fmsubadd...pd" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
> > new file mode 100644
> > index 00000000000..9ddd0e423db
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
> > @@ -0,0 +1,34 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target fma } */
> > +/* { dg-options "-O3 -mfma -save-temps" } */
> > +
> > +#include "fma-check.h"
> > +
> > +void __attribute__((noipa))
> > +check_fmsubadd (float * __restrict a, float *b, float *c, int n)
> > +{
> > +  for (int i = 0; i < n; ++i)
> > +    {
> > +      a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0];
> > +      a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1];
> > +    }
> > +}
> > +
> > +static void
> > +fma_test (void)
> > +{
> > +  float a[4], b[4], c[4];
> > +  for (int i = 0; i < 4; ++i)
> > +    {
> > +      a[i] = i;
> > +      b[i] = 3*i;
> > +      c[i] = 7*i;
> > +    }
> > +  check_fmsubadd (a, b, c, 2);
> > +  const float d[4] = { 0., 20., 86., 186. };
> > +  for (int i = 0; i < 4; ++i)
> > +    if (a[i] != d[i])
> > +      __builtin_abort ();
> > +}
> > +
> > +/* { dg-final { scan-assembler "fmsubadd...ps" } } */
> > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> > index 2671f91972d..f774cac4a4d 100644
> > --- a/gcc/tree-vect-slp-patterns.c
> > +++ b/gcc/tree-vect-slp-patterns.c
> > @@ -1496,8 +1496,8 @@ complex_operations_pattern::build (vec_info * /* vinfo */)
> >  class addsub_pattern : public vect_pattern
> >  {
> >    public:
> > -    addsub_pattern (slp_tree *node)
> > -       : vect_pattern (node, NULL, IFN_VEC_ADDSUB) {};
> > +    addsub_pattern (slp_tree *node, internal_fn ifn)
> > +       : vect_pattern (node, NULL, ifn) {};
> >
> >      void build (vec_info *);
> >
> > @@ -1510,46 +1510,68 @@ addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_)
> >  {
> >    slp_tree node = *node_;
> >    if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
> > -      || SLP_TREE_CHILDREN (node).length () != 2)
> > +      || SLP_TREE_CHILDREN (node).length () != 2
> > +      || SLP_TREE_LANE_PERMUTATION (node).length () % 2)
> >      return NULL;
> >
> >    /* Match a blend of a plus and a minus op with the same number of plus and
> >       minus lanes on the same operands.  */
> > -  slp_tree sub = SLP_TREE_CHILDREN (node)[0];
> > -  slp_tree add = SLP_TREE_CHILDREN (node)[1];
> > -  bool swapped_p = false;
> > -  if (vect_match_expression_p (sub, PLUS_EXPR))
> > -    {
> > -      std::swap (add, sub);
> > -      swapped_p = true;
> > -    }
> > -  if (!(vect_match_expression_p (add, PLUS_EXPR)
> > -       && vect_match_expression_p (sub, MINUS_EXPR)))
> > +  unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first;
> > +  unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first;
> > +  if (l0 == l1)
> > +    return NULL;
> > +  bool l0add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0],
> > +                                         PLUS_EXPR);
> > +  if (!l0add_p
> > +      && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0], MINUS_EXPR))
> > +    return NULL;
> > +  bool l1add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1],
> > +                                         PLUS_EXPR);
> > +  if (!l1add_p
> > +      && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1], MINUS_EXPR))
> >      return NULL;
> > -  if (!((SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[0]
> > -        && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[1])
> > -       || (SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[1]
> > -           && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[0])))
> > +
> > +  slp_tree l0node = SLP_TREE_CHILDREN (node)[l0];
> > +  slp_tree l1node = SLP_TREE_CHILDREN (node)[l1];
> > +  if (!((SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[0]
> > +        && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[1])
> > +       || (SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[1]
> > +           && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[0])))
> >      return NULL;
> >
> >    for (unsigned i = 0; i < SLP_TREE_LANE_PERMUTATION (node).length (); ++i)
> >      {
> >        std::pair<unsigned, unsigned> perm = SLP_TREE_LANE_PERMUTATION (node)[i];
> > -      if (swapped_p)
> > -       perm.first = perm.first == 0 ? 1 : 0;
> > -      /* It has to be alternating -, +, -, ...
> > +      /* It has to be alternating -, +, -,
> >          While we could permute the .ADDSUB inputs and the .ADDSUB output
> >          that's only profitable over the add + sub + blend if at least
> >          one of the permute is optimized which we can't determine here.  */
> > -      if (perm.first != (i & 1)
> > +      if (perm.first != ((i & 1) ? l1 : l0)
> >           || perm.second != i)
> >         return NULL;
> >      }
> >
> > -  if (!vect_pattern_validate_optab (IFN_VEC_ADDSUB, node))
> > -    return NULL;
> > +  /* Now we have either { -, +, -, + ... } (!l0add_p) or { +, -, +, - ... }
> > +     (l0add_p), see whether we have FMA variants.  */
> > +  if (!l0add_p
> > +      && vect_match_expression_p (SLP_TREE_CHILDREN (l0node)[0], MULT_EXPR))
> > +    {
> > +      /* (c * d) -+ a */
> > +      if (vect_pattern_validate_optab (IFN_VEC_FMADDSUB, node))
> > +       return new addsub_pattern (node_, IFN_VEC_FMADDSUB);
> > +    }
> > +  else if (l0add_p
> > +          && vect_match_expression_p (SLP_TREE_CHILDREN (l1node)[0], MULT_EXPR))
> > +    {
> > +      /* (c * d) +- a */
> > +      if (vect_pattern_validate_optab (IFN_VEC_FMSUBADD, node))
> > +       return new addsub_pattern (node_, IFN_VEC_FMSUBADD);
> > +    }
> >
> > -  return new addsub_pattern (node_);
> > +  if (!l0add_p && vect_pattern_validate_optab (IFN_VEC_ADDSUB, node))
> > +    return new addsub_pattern (node_, IFN_VEC_ADDSUB);
> > +
> > +  return NULL;
> >  }
> >
> >  void
> > @@ -1557,38 +1579,96 @@ addsub_pattern::build (vec_info *vinfo)
> >  {
> >    slp_tree node = *m_node;
> >
> > -  slp_tree sub = SLP_TREE_CHILDREN (node)[0];
> > -  slp_tree add = SLP_TREE_CHILDREN (node)[1];
> > -  if (vect_match_expression_p (sub, PLUS_EXPR))
> > -    std::swap (add, sub);
> > -
> > -  /* Modify the blend node in-place.  */
> > -  SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0];
> > -  SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1];
> > -  SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
> > -  SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
> > -
> > -  /* Build IFN_VEC_ADDSUB from the sub representative operands.  */
> > -  stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub);
> > -  gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2,
> > -                                           gimple_assign_rhs1 (rep->stmt),
> > -                                           gimple_assign_rhs2 (rep->stmt));
> > -  gimple_call_set_lhs (call, make_ssa_name
> > -                              (TREE_TYPE (gimple_assign_lhs (rep->stmt))));
> > -  gimple_call_set_nothrow (call, true);
> > -  gimple_set_bb (call, gimple_bb (rep->stmt));
> > -  stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep);
> > -  SLP_TREE_REPRESENTATIVE (node) = new_rep;
> > -  STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
> > -  STMT_SLP_TYPE (new_rep) = pure_slp;
> > -  STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
> > -  STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
> > -  STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep));
> > -  SLP_TREE_CODE (node) = ERROR_MARK;
> > -  SLP_TREE_LANE_PERMUTATION (node).release ();
> > -
> > -  vect_free_slp_tree (sub);
> > -  vect_free_slp_tree (add);
> > +  unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first;
> > +  unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first;
> > +
> > +  switch (m_ifn)
> > +    {
> > +    case IFN_VEC_ADDSUB:
> > +      {
> > +       slp_tree sub = SLP_TREE_CHILDREN (node)[l0];
> > +       slp_tree add = SLP_TREE_CHILDREN (node)[l1];
> > +
> > +       /* Modify the blend node in-place.  */
> > +       SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0];
> > +       SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1];
> > +       SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
> > +       SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
> > +
> > +       /* Build IFN_VEC_ADDSUB from the sub representative operands.  */
> > +       stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub);
> > +       gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2,
> > +                                                 gimple_assign_rhs1 (rep->stmt),
> > +                                                 gimple_assign_rhs2 (rep->stmt));
> > +       gimple_call_set_lhs (call, make_ssa_name
> > +                            (TREE_TYPE (gimple_assign_lhs (rep->stmt))));
> > +       gimple_call_set_nothrow (call, true);
> > +       gimple_set_bb (call, gimple_bb (rep->stmt));
> > +       stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep);
> > +       SLP_TREE_REPRESENTATIVE (node) = new_rep;
> > +       STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
> > +       STMT_SLP_TYPE (new_rep) = pure_slp;
> > +       STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
> > +       STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
> > +       STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep));
> > +       SLP_TREE_CODE (node) = ERROR_MARK;
> > +       SLP_TREE_LANE_PERMUTATION (node).release ();
> > +
> > +       vect_free_slp_tree (sub);
> > +       vect_free_slp_tree (add);
> > +       break;
> > +      }
> > +    case IFN_VEC_FMADDSUB:
> > +    case IFN_VEC_FMSUBADD:
> > +      {
> > +       slp_tree sub, add;
> > +       if (m_ifn == IFN_VEC_FMADDSUB)
> > +         {
> > +           sub = SLP_TREE_CHILDREN (node)[l0];
> > +           add = SLP_TREE_CHILDREN (node)[l1];
> > +         }
> > +       else /* m_ifn == IFN_VEC_FMSUBADD */
> > +         {
> > +           sub = SLP_TREE_CHILDREN (node)[l1];
> > +           add = SLP_TREE_CHILDREN (node)[l0];
> > +         }
> > +       slp_tree mul = SLP_TREE_CHILDREN (sub)[0];
> > +       /* Modify the blend node in-place.  */
> > +       SLP_TREE_CHILDREN (node).safe_grow (3, true);
> > +       SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (mul)[0];
> > +       SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (mul)[1];
> > +       SLP_TREE_CHILDREN (node)[2] = SLP_TREE_CHILDREN (sub)[1];
> > +       SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
> > +       SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
> > +       SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[2])++;
> > +
> > +       /* Build IFN_VEC_FMADDSUB from the mul/sub representative operands.  */
> > +       stmt_vec_info srep = SLP_TREE_REPRESENTATIVE (sub);
> > +       stmt_vec_info mrep = SLP_TREE_REPRESENTATIVE (mul);
> > +       gcall *call = gimple_build_call_internal (m_ifn, 3,
> > +                                                 gimple_assign_rhs1 (mrep->stmt),
> > +                                                 gimple_assign_rhs2 (mrep->stmt),
> > +                                                 gimple_assign_rhs2 (srep->stmt));
> > +       gimple_call_set_lhs (call, make_ssa_name
> > +                            (TREE_TYPE (gimple_assign_lhs (srep->stmt))));
> > +       gimple_call_set_nothrow (call, true);
> > +       gimple_set_bb (call, gimple_bb (srep->stmt));
> > +       stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, srep);
> > +       SLP_TREE_REPRESENTATIVE (node) = new_rep;
> > +       STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
> > +       STMT_SLP_TYPE (new_rep) = pure_slp;
> > +       STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
> > +       STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
> > +       STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (srep));
> > +       SLP_TREE_CODE (node) = ERROR_MARK;
> > +       SLP_TREE_LANE_PERMUTATION (node).release ();
> > +
> > +       vect_free_slp_tree (sub);
> > +       vect_free_slp_tree (add);
> > +       break;
> > +      }
> > +    default:;
> > +    }
> >  }
> >
> >  /*******************************************************************************
> > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> > index f08797c2bc0..5357cd0e7a4 100644
> > --- a/gcc/tree-vect-slp.c
> > +++ b/gcc/tree-vect-slp.c
> > @@ -3728,6 +3728,8 @@ vect_optimize_slp (vec_info *vinfo)
> >                   case CFN_COMPLEX_MUL:
> >                   case CFN_COMPLEX_MUL_CONJ:
> >                   case CFN_VEC_ADDSUB:
> > +                 case CFN_VEC_FMADDSUB:
> > +                 case CFN_VEC_FMSUBADD:
> >                     vertices[idx].perm_in = 0;
> >                     vertices[idx].perm_out = 0;
> >                   default:;
> > --
> > 2.26.2
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)