From: Andrew Pinski
Date: Thu, 2 Nov 2023 16:26:15 -0700
Subject: Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
To: Robin Dapp
Cc: gcc-patches, Richard Biener

On Wed, Sep 20, 2023 at 6:52 AM Robin Dapp wrote:
>
> Hi,
>
> as described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both
> into a masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD during ifcvt and
> adjusting some vectorizer code to handle it.
>
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
> is true.
>
> Related question/change: We only allow PLUS_EXPR in fold_left_reduction_fn
> but have code to handle MINUS_EXPR in vectorize_fold_left_reduction.  I
> suppose that's intentional, but it "just works" on riscv and the testsuite
> doesn't change when allowing MINUS_EXPR, so I went ahead and did that.
>
> Bootstrapped and regtested on x86 and aarch64.

This caused the gcc.target/i386/avx512f-reduce-op-1.c testcase to start
failing when testing on an x86_64 machine that has avx512f (in my case an
`Intel(R) Xeon(R) D-2166NT CPU @ 2.00GHz`). I reverted the commit to
double-check it, too.

The difference I see in the optimized dump is:

  if (_40 != 3.5e+1) // working

vs

  if (_40 != 6.4e+1) // not working

It is test_epi32_ps which is failing, with the TEST_PS macro and the plus
operand that uses TESTOP:

      TESTOP (add, +, float, ps, 0.0f);				\

I have not reduced the testcase any further, though.

Thanks,
Andrew Pinski

> Regards
>  Robin
>
> gcc/ChangeLog:
>
> 	PR middle-end/111401
> 	* internal-fn.cc (cond_fn_p): New function.
> 	* internal-fn.h (cond_fn_p): Define.
> 	* tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
> 	if supported.
> 	(predicate_scalar_phi): Add whitespace.
> 	* tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
> 	(neutral_op_for_reduction): Return -0 for PLUS.
> 	(vect_is_simple_reduction): Don't count else operand in
> 	COND_ADD.
> 	(vectorize_fold_left_reduction): Add COND_ADD handling.
> 	(vectorizable_reduction): Don't count else operand in COND_ADD.
> 	(vect_transform_reduction): Add COND_ADD handling.
> 	* tree-vectorizer.h (neutral_op_for_reduction): Add default
> 	parameter.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
> 	* gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
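
To make the quoted description concrete, here is a minimal sketch (mine,
not from the patch itself) of the kind of loop it is about, shaped like
foo2 in the pr111401.c test further down; the GIMPLE in the comment is
schematic and the SSA names are invented:

  /* A conditional (masked) sum reduction.  Before the patch, ifcvt
     lowers the if to a select followed by an unconditional add, roughly:

	 _ifc_1 = cond_1 ? a_1 : 0.0;   // COND_EXPR
	 res_2 = res_1 + _ifc_1;        // PLUS_EXPR

     With the patch it emits a single conditional internal function,
     which the vectorizer can turn directly into a masked reduction:

	 res_2 = .COND_ADD (cond_1, res_1, a_1, res_1);  */
  double
  cond_sum (double *restrict a, int *restrict cond, double init, int n)
  {
    double res = init;
    for (int i = 0; i < n; i++)
      if (cond[i])
	res += a[i];
    return res;
  }

Note how the fourth .COND_ADD operand (the else value) reuses the
reduction input, so masked-off iterations leave the accumulator untouched.
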
> ---
>  gcc/internal-fn.cc                            |  38 +++++
>  gcc/internal-fn.h                             |   1 +
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 ++++++++++++++++++
>  .../riscv/rvv/autovec/cond/pr111401.c         |  61 ++++++++
>  gcc/tree-if-conv.cc                           |  63 ++++++--
>  gcc/tree-vect-loop.cc                         | 130 ++++++++++++----
>  gcc/tree-vectorizer.h                         |   2 +-
>  7 files changed, 394 insertions(+), 42 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0fd34359247..77939890f5a 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4241,6 +4241,44 @@ first_commutative_argument (internal_fn fn)
>      }
>  }
>
> +/* Return true if this CODE describes a conditional (masked) internal_fn.  */
> +
> +bool
> +cond_fn_p (code_helper code)
> +{
> +  if (!code.is_fn_code ())
> +    return false;
> +
> +  if (!internal_fn_p ((combined_fn) code))
> +    return false;
> +
> +  internal_fn fn = as_internal_fn ((combined_fn) code);
> +  switch (fn)
> +    {
> +    #undef DEF_INTERNAL_COND_FN
> +    #define DEF_INTERNAL_COND_FN(NAME, F, O, T)			\
> +    case IFN_COND_##NAME:					\
> +    case IFN_COND_LEN_##NAME:					\
> +      return true;
> +    #include "internal-fn.def"
> +    #undef DEF_INTERNAL_COND_FN
> +
> +    #undef DEF_INTERNAL_SIGNED_COND_FN
> +    #define DEF_INTERNAL_SIGNED_COND_FN(NAME, F, S, SO, UO, T)	\
> +    case IFN_COND_##NAME:					\
> +    case IFN_COND_LEN_##NAME:					\
> +      return true;
> +    #include "internal-fn.def"
> +    #undef DEF_INTERNAL_SIGNED_COND_FN
> +
> +    default:
> +      return false;
> +    }
> +
> +  return false;
> +}
> +
> +
>  /* Return true if this CODE describes an internal_fn that returns a vector with
>     elements twice as wide as the element size of the input vectors.  */
>
> diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
> index 99de13a0199..f1cc9db29c0 100644
> --- a/gcc/internal-fn.h
> +++ b/gcc/internal-fn.h
> @@ -219,6 +219,7 @@ extern bool commutative_ternary_fn_p (internal_fn);
>  extern int first_commutative_argument (internal_fn);
>  extern bool associative_binary_fn_p (internal_fn);
>  extern bool widening_fn_p (code_helper);
> +extern bool cond_fn_p (code_helper code);
>
>  extern bool set_edom_supported_p (void);
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> new file mode 100644
> index 00000000000..57c600838ee
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> @@ -0,0 +1,141 @@
> +/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-add-options ieee } */
> +/* { dg-additional-options "-std=c99 -fno-fast-math" } */
> +
> +#include "tree-vect.h"
> +
> +#include <math.h>
> +
> +#define N (VECTOR_BITS * 17)
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_plus_double (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_plus_double_ref (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_minus_double (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      res -= a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_minus_double_ref (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      res -= a[i];
> +  return res;
> +}
> +
> +int __attribute__ ((optimize (1)))
> +main ()
> +{
> +  int n = 19;
> +  double a[N];
> +  int cond1[N], cond2[N];
> +
> +  for (int i = 0; i < N; i++)
> +    {
> +      a[i] = (i * 0.1) * (i & 1 ? 1 : -1);
> +      cond1[i] = 0;
> +      cond2[i] = i & 4 ? 1 : 0;
> +      asm volatile ("" ::: "memory");
> +    }
> +
> +  double res1 = reduc_plus_double (a, -0.0, cond1, n);
> +  double ref1 = reduc_plus_double_ref (a, -0.0, cond1, n);
> +  double res2 = reduc_minus_double (a, -0.0, cond1, n);
> +  double ref2 = reduc_minus_double_ref (a, -0.0, cond1, n);
> +  double res3 = reduc_plus_double (a, -0.0, cond1, n);
> +  double ref3 = reduc_plus_double_ref (a, -0.0, cond1, n);
> +  double res4 = reduc_minus_double (a, -0.0, cond1, n);
> +  double ref4 = reduc_minus_double_ref (a, -0.0, cond1, n);
> +
> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
> +    __builtin_abort ();
> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
> +    __builtin_abort ();
> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
> +    __builtin_abort ();
> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
> +    __builtin_abort ();
> +
> +  res1 = reduc_plus_double (a, 0.0, cond1, n);
> +  ref1 = reduc_plus_double_ref (a, 0.0, cond1, n);
> +  res2 = reduc_minus_double (a, 0.0, cond1, n);
> +  ref2 = reduc_minus_double_ref (a, 0.0, cond1, n);
> +  res3 = reduc_plus_double (a, 0.0, cond1, n);
> +  ref3 = reduc_plus_double_ref (a, 0.0, cond1, n);
> +  res4 = reduc_minus_double (a, 0.0, cond1, n);
> +  ref4 = reduc_minus_double_ref (a, 0.0, cond1, n);
> +
> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
> +    __builtin_abort ();
> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
> +    __builtin_abort ();
> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
> +    __builtin_abort ();
> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
> +    __builtin_abort ();
> +
> +  res1 = reduc_plus_double (a, -0.0, cond2, n);
> +  ref1 = reduc_plus_double_ref (a, -0.0, cond2, n);
> +  res2 = reduc_minus_double (a, -0.0, cond2, n);
> +  ref2 = reduc_minus_double_ref (a, -0.0, cond2, n);
> +  res3 = reduc_plus_double (a, -0.0, cond2, n);
> +  ref3 = reduc_plus_double_ref (a, -0.0, cond2, n);
> +  res4 = reduc_minus_double (a, -0.0, cond2, n);
> +  ref4 = reduc_minus_double_ref (a, -0.0, cond2, n);
> +
> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
> +    __builtin_abort ();
> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
> +    __builtin_abort ();
> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
> +    __builtin_abort ();
> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
> +    __builtin_abort ();
> +
> +  res1 = reduc_plus_double (a, 0.0, cond2, n);
> +  ref1 = reduc_plus_double_ref (a, 0.0, cond2, n);
> +  res2 = reduc_minus_double (a, 0.0, cond2, n);
> +  ref2 = reduc_minus_double_ref (a, 0.0, cond2, n);
> +  res3 = reduc_plus_double (a, 0.0, cond2, n);
> +  ref3 = reduc_plus_double_ref (a, 0.0, cond2, n);
> +  res4 = reduc_minus_double (a, 0.0, cond2, n);
> +  ref4 = reduc_minus_double_ref (a, 0.0, cond2, n);
> +
> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
> +    __builtin_abort ();
> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
> +    __builtin_abort ();
> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
> +    __builtin_abort ();
> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
> new file mode 100644
> index 00000000000..1d559ce5391
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
> @@ -0,0 +1,61 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
> +
> +double
> +__attribute__ ((noipa))
> +foo2 (double *__restrict a, double init, int *__restrict cond, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      init += a[i];
> +  return init;
> +}
> +
> +double
> +__attribute__ ((noipa))
> +foo3 (double *__restrict a, double init, int *__restrict cond, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      init -= a[i];
> +  return init;
> +}
> +
> +#define SZ 125
> +
> +__attribute__ ((optimize ("1")))
> +int
> +main ()
> +{
> +  double res1 = 0, res2 = 0;
> +  double a1[SZ], a2[SZ];
> +  int c1[SZ], c2[SZ];
> +  for (int i = 0; i < SZ; i++)
> +    {
> +      a1[i] = i * 3 + (i & 4) - (i & 7);
> +      a2[i] = i * 3 + (i & 4) - (i & 7);
> +      c1[i] = i & 1;
> +      c2[i] = i & 1;
> +    }
> +
> +  double init1 = 2.7, init2 = 8.2;
> +  double ref1 = init1, ref2 = init2;
> +  for (int i = 0; i < SZ; i++)
> +    {
> +      if (c1[i])
> +        ref1 += a1[i];
> +      if (c2[i])
> +        ref2 -= a2[i];
> +    }
> +
> +  res1 = foo2 (a1, init1, c1, SZ);
> +  res2 = foo3 (a2, init2, c2, SZ);
> +
> +  if (res1 != ref1)
> +    __builtin_abort ();
> +  if (res2 != ref2)
> +    __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 2 "vect" } } */
> +/* { dg-final { scan-tree-dump-not "VCOND_MASK" "vect" } } */
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 799f071965e..425976b0861 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -1852,10 +1852,12 @@ convert_scalar_cond_reduction (gimple *reduc, gimple_stmt_iterator *gsi,
>    gimple *new_assign;
>    tree rhs;
>    tree rhs1 = gimple_assign_rhs1 (reduc);
> +  tree lhs = gimple_assign_lhs (reduc);
>    tree tmp = make_temp_ssa_name (TREE_TYPE (rhs1), NULL, "_ifc_");
>    tree c;
>    enum tree_code reduction_op = gimple_assign_rhs_code (reduc);
> -  tree op_nochange = neutral_op_for_reduction (TREE_TYPE (rhs1), reduction_op, NULL);
> +  tree op_nochange = neutral_op_for_reduction (TREE_TYPE (rhs1), reduction_op,
> +					       NULL, false);
>    gimple_seq stmts = NULL;
>
>    if (dump_file && (dump_flags & TDF_DETAILS))
> @@ -1864,19 +1866,52 @@ convert_scalar_cond_reduction (gimple *reduc, gimple_stmt_iterator *gsi,
>        print_gimple_stmt (dump_file, reduc, 0, TDF_SLIM);
>      }
>
> -  /* Build cond expression using COND and constant operand
> -     of reduction rhs.  */
> -  c = fold_build_cond_expr (TREE_TYPE (rhs1),
> -			    unshare_expr (cond),
> -			    swap ? op_nochange : op1,
> -			    swap ? op1 : op_nochange);
> -
> -  /* Create assignment stmt and insert it at GSI.  */
> -  new_assign = gimple_build_assign (tmp, c);
> -  gsi_insert_before (gsi, new_assign, GSI_SAME_STMT);
> -  /* Build rhs for unconditional increment/decrement/logic_operation.  */
> -  rhs = gimple_build (&stmts, reduction_op,
> -		      TREE_TYPE (rhs1), op0, tmp);
> +  /* If possible try to create an IFN_COND_ADD instead of a COND_EXPR and
> +     a PLUS_EXPR.  Don't do this if the reduction def operand itself is
> +     a vectorizable call as we can create a COND version of it directly.  */
> +  internal_fn ifn;
> +  ifn = get_conditional_internal_fn (reduction_op);
> +
> +  bool try_cond_op = true;
> +  gimple *opstmt;
> +  if (TREE_CODE (op1) == SSA_NAME
> +      && (opstmt = SSA_NAME_DEF_STMT (op1))
> +      && is_gimple_call (opstmt))
> +    {
> +      combined_fn cfn = gimple_call_combined_fn (opstmt);
> +      internal_fn ifnop;
> +      reduction_fn_for_scalar_code (cfn, &ifnop);
> +      if (vectorized_internal_fn_supported_p (ifnop, TREE_TYPE
> +					      (gimple_call_lhs (opstmt))))
> +	try_cond_op = false;
> +    }
> +
> +  if (ifn != IFN_LAST
> +      && vectorized_internal_fn_supported_p (ifn, TREE_TYPE (lhs))
> +      && try_cond_op && !swap)
> +    {
> +      gcall *cond_call = gimple_build_call_internal (ifn, 4,
> +						     unshare_expr (cond),
> +						     op0, op1, op0);
> +      gsi_insert_before (gsi, cond_call, GSI_SAME_STMT);
> +      gimple_call_set_lhs (cond_call, tmp);
> +      rhs = tmp;
> +    }
> +  else
> +    {
> +      /* Build cond expression using COND and constant operand
> +	 of reduction rhs.  */
> +      c = fold_build_cond_expr (TREE_TYPE (rhs1),
> +				unshare_expr (cond),
> +				swap ? op_nochange : op1,
> +				swap ? op1 : op_nochange);
> +      /* Create assignment stmt and insert it at GSI.  */
> +      new_assign = gimple_build_assign (tmp, c);
> +      gsi_insert_before (gsi, new_assign, GSI_SAME_STMT);
> +      /* Build rhs for unconditional increment/decrement/logic_operation.  */
> +      rhs = gimple_build (&stmts, reduction_op,
> +			  TREE_TYPE (rhs1), op0, tmp);
> +    }
>
>    if (has_nop)
>      {
> @@ -2241,7 +2276,7 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
>  	{
>  	  /* Convert reduction stmt into vectorizable form.  */
>  	  rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
> -					       swap,has_nop, nop_reduc);
> +					       swap, has_nop, nop_reduc);
>  	  redundant_ssa_names.safe_push (std::make_pair (res, rhs));
>  	}
>        new_stmt = gimple_build_assign (res, rhs);
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 23c6e8259e7..94d3cead1e6 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3672,7 +3672,7 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>  static bool
>  fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
>  {
> -  if (code == PLUS_EXPR)
> +  if (code == PLUS_EXPR || code == MINUS_EXPR)
>      {
>        *reduc_fn = IFN_FOLD_LEFT_PLUS;
>        return true;
> @@ -3751,23 +3751,29 @@ reduction_fn_for_scalar_code (code_helper code, internal_fn *reduc_fn)
>     by the introduction of additional X elements, return that X, otherwise
>     return null.  CODE is the code of the reduction and SCALAR_TYPE is type
>     of the scalar elements.  If the reduction has just a single initial value
> -   then INITIAL_VALUE is that value, otherwise it is null.  */
> +   then INITIAL_VALUE is that value, otherwise it is null.
> +   If AS_INITIAL is TRUE the value is supposed to be used as initial value.
> +   In that case no signed zero is returned.  */
>
>  tree
>  neutral_op_for_reduction (tree scalar_type, code_helper code,
> -			  tree initial_value)
> +			  tree initial_value, bool as_initial)
>  {
>    if (code.is_tree_code ())
>      switch (tree_code (code))
>        {
> -      case WIDEN_SUM_EXPR:
>        case DOT_PROD_EXPR:
>        case SAD_EXPR:
> -      case PLUS_EXPR:
>        case MINUS_EXPR:
>        case BIT_IOR_EXPR:
>        case BIT_XOR_EXPR:
>  	return build_zero_cst (scalar_type);
> +      case WIDEN_SUM_EXPR:
> +      case PLUS_EXPR:
> +	if (!as_initial && HONOR_SIGNED_ZEROS (scalar_type))
> +	  return build_real (scalar_type, dconstm0);
> +	else
> +	  return build_zero_cst (scalar_type);
>
>        case MULT_EXPR:
>  	return build_one_cst (scalar_type);
> @@ -4106,8 +4112,14 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
>  	  return NULL;
>  	}
>
> -      nphi_def_loop_uses++;
> -      phi_use_stmt = use_stmt;
> +      /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have
> +	 op1 twice (once as definition, once as else) in the same operation.
> +	 Only count it as one.  */
> +      if (use_stmt != phi_use_stmt)
> +	{
> +	  nphi_def_loop_uses++;
> +	  phi_use_stmt = use_stmt;
> +	}
>      }
>
>    tree latch_def = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
> @@ -6378,7 +6390,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>        if (REDUC_GROUP_FIRST_ELEMENT (stmt_info))
>  	initial_value = reduc_info->reduc_initial_values[0];
>        neutral_op = neutral_op_for_reduction (TREE_TYPE (vectype), code,
> -					     initial_value);
> +					     initial_value, false);
>      }
>    if (neutral_op)
>      vector_identity = gimple_build_vector_from_val (&seq, vectype,
> @@ -6860,8 +6872,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>  			       gimple_stmt_iterator *gsi,
>  			       gimple **vec_stmt, slp_tree slp_node,
>  			       gimple *reduc_def_stmt,
> -			       tree_code code, internal_fn reduc_fn,
> -			       tree ops[3], tree vectype_in,
> +			       code_helper code, internal_fn reduc_fn,
> +			       tree *ops, int num_ops, tree vectype_in,
>  			       int reduc_index, vec_loop_masks *masks,
>  			       vec_loop_lens *lens)
>  {
> @@ -6877,17 +6889,40 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>
>    gcc_assert (!nested_in_vect_loop_p (loop, stmt_info));
>    gcc_assert (ncopies == 1);
> -  gcc_assert (TREE_CODE_LENGTH (code) == binary_op);
> +
> +  bool is_cond_op = false;
> +  if (code.is_tree_code ())
> +    code = tree_code (code);
> +  else
> +    {
> +      gcc_assert (cond_fn_p (code));
> +      is_cond_op = true;
> +      code = conditional_internal_fn_code (internal_fn (code));
> +    }
> +
> +  gcc_assert (TREE_CODE_LENGTH (tree_code (code)) == binary_op);
>
>    if (slp_node)
>      gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (vectype_out),
>  			  TYPE_VECTOR_SUBPARTS (vectype_in)));
>
> -  tree op0 = ops[1 - reduc_index];
> +  /* The operands either come from a binary operation or an IFN_COND operation.
> +     The former is a gimple assign with binary rhs and the latter is a
> +     gimple call with four arguments.  */
> +  gcc_assert (num_ops == 2 || num_ops == 4);
> +  tree op0, opmask;
> +  if (!is_cond_op)
> +    op0 = ops[1 - reduc_index];
> +  else
> +    {
> +      op0 = ops[2];
> +      opmask = ops[0];
> +      gcc_assert (!slp_node);
> +    }
>
>    int group_size = 1;
>    stmt_vec_info scalar_dest_def_info;
> -  auto_vec<tree> vec_oprnds0;
> +  auto_vec<tree> vec_oprnds0, vec_opmask;
>    if (slp_node)
>      {
>        auto_vec<vec<tree>> vec_defs (2);
> @@ -6903,9 +6938,17 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>        vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
>  				     op0, &vec_oprnds0);
>        scalar_dest_def_info = stmt_info;
> +
> +      /* For an IFN_COND_OP we also need the vector mask operand.  */
> +      if (is_cond_op)
> +	vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
> +				       opmask, &vec_opmask);
>      }
>
> -  tree scalar_dest = gimple_assign_lhs (scalar_dest_def_info->stmt);
> +  gimple *sdef = scalar_dest_def_info->stmt;
> +  tree scalar_dest = is_gimple_call (sdef)
> +		       ? gimple_call_lhs (sdef)
> +		       : gimple_assign_lhs (scalar_dest_def_info->stmt);
>    tree scalar_type = TREE_TYPE (scalar_dest);
>    tree reduc_var = gimple_phi_result (reduc_def_stmt);
>
> @@ -6939,17 +6982,20 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>        tree bias = NULL_TREE;
>        if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>  	mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, i);
> +      else if (is_cond_op)
> +	mask = vec_opmask[0];
>        if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
>  	{
>  	  len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
>  				   i, 1);
>  	  signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
>  	  bias = build_int_cst (intQI_type_node, biasval);
> -	  mask = build_minus_one_cst (truth_type_for (vectype_in));
> +	  if (!is_cond_op)
> +	    mask = build_minus_one_cst (truth_type_for (vectype_in));
>  	}
>
>        /* Handle MINUS by adding the negative.  */
> -      if (reduc_fn != IFN_LAST && code == MINUS_EXPR)
> +      if (reduc_fn != IFN_LAST && tree_code (code) == MINUS_EXPR)
>  	{
>  	  tree negated = make_ssa_name (vectype_out);
>  	  new_stmt = gimple_build_assign (negated, NEGATE_EXPR, def0);
> @@ -6957,7 +7003,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>  	  def0 = negated;
>  	}
>
> -      if (mask && mask_reduc_fn == IFN_LAST)
> +      if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> +	  && mask && mask_reduc_fn == IFN_LAST)
>  	def0 = merge_with_identity (gsi, mask, vectype_out, def0,
>  				    vector_identity);
>
> @@ -6988,8 +7035,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>  	}
>        else
>  	{
> -	  reduc_var = vect_expand_fold_left (gsi, scalar_dest_var, code,
> -					     reduc_var, def0);
> +	  reduc_var = vect_expand_fold_left (gsi, scalar_dest_var,
> +					     tree_code (code), reduc_var, def0);
>  	  new_stmt = SSA_NAME_DEF_STMT (reduc_var);
>  	  /* Remove the statement, so that we can use the same code paths
>  	     as for statements that we've just created.  */
> @@ -7440,6 +7487,11 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>        if (i == STMT_VINFO_REDUC_IDX (stmt_info))
>  	continue;
>
> +      /* For an IFN_COND_OP we might hit the reduction definition operand
> +	 twice (once as definition, once as else).  */
> +      if (op.ops[i] == op.ops[STMT_VINFO_REDUC_IDX (stmt_info)])
> +	continue;
> +
>        /* There should be only one cycle def in the stmt, the one
>  	 leading to reduc_def.  */
>        if (VECTORIZABLE_CYCLE_DEF (dt))
> @@ -7640,6 +7692,13 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>       when generating the code inside the loop.  */
>
>    code_helper orig_code = STMT_VINFO_REDUC_CODE (phi_info);
> +
> +  /* If conversion might have created a conditional operation like
> +     IFN_COND_ADD already.  Use the internal code for the following checks.  */
> +  if (cond_fn_p (orig_code))
> +    orig_code = conditional_internal_fn_code
> +      (as_internal_fn(combined_fn (orig_code)));
> +
>    STMT_VINFO_REDUC_CODE (reduc_info) = orig_code;
>
>    vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
> @@ -7678,7 +7737,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>      {
>        if (dump_enabled_p ())
>  	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -			 "reduction: not commutative/associative");
> +			 "reduction: not commutative/associative\n");
>        return false;
>      }
>  }
> @@ -8213,6 +8272,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>
>    code_helper code = canonicalize_code (op.code, op.type);
>    internal_fn cond_fn = get_conditional_internal_fn (code, op.type);
> +
>    vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
>    vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
>    bool mask_by_cond_expr = use_mask_by_cond_expr_p (code, cond_fn, vectype_in);
> @@ -8231,17 +8291,21 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>    if (code == COND_EXPR)
>      gcc_assert (ncopies == 1);
>
> +  /* A COND_OP reduction must have the same definition and else value.  */
> +  if (cond_fn_p (code))
> +    gcc_assert (op.num_ops == 4 && (op.ops[1] == op.ops[3]));
> +
>    bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
>
>    vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
>    if (reduction_type == FOLD_LEFT_REDUCTION)
>      {
>        internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info);
> -      gcc_assert (code.is_tree_code ());
> +      gcc_assert (code.is_tree_code () || cond_fn_p (code));
>        return vectorize_fold_left_reduction
>  	  (loop_vinfo, stmt_info, gsi, vec_stmt, slp_node, reduc_def_phi,
> -	   tree_code (code), reduc_fn, op.ops, vectype_in, reduc_index, masks,
> -	   lens);
> +	   code, reduc_fn, op.ops, op.num_ops, vectype_in,
> +	   reduc_index, masks, lens);
>      }
>
>    bool single_defuse_cycle = STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
> @@ -8254,14 +8318,20 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>    tree scalar_dest = gimple_get_lhs (stmt_info->stmt);
>    tree vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
>
> +  /* Get NCOPIES vector definitions for all operands except the reduction
> +     definition.  */
>    vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, ncopies,
>  		     single_defuse_cycle && reduc_index == 0
>  		     ? NULL_TREE : op.ops[0], &vec_oprnds0,
>  		     single_defuse_cycle && reduc_index == 1
>  		     ? NULL_TREE : op.ops[1], &vec_oprnds1,
> -		     op.num_ops == 3
> -		     && !(single_defuse_cycle && reduc_index == 2)
> +		     op.num_ops == 4
> +		     || (op.num_ops == 3
> +			 && !(single_defuse_cycle && reduc_index == 2))
>  		     ? op.ops[2] : NULL_TREE, &vec_oprnds2);
> +
> +  /* For single def-use cycles get one copy of the vectorized reduction
> +     definition.  */
>    if (single_defuse_cycle)
>      {
>        gcc_assert (!slp_node);
> @@ -8301,7 +8371,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>  	}
>        else
>  	{
> -	  if (op.num_ops == 3)
> +	  if (op.num_ops >= 3)
>  	    vop[2] = vec_oprnds2[i];
>
>  	  if (masked_loop_p && mask_by_cond_expr)
> @@ -8314,10 +8384,16 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>  	  if (emulated_mixed_dot_prod)
>  	    new_stmt = vect_emulate_mixed_dot_prod (loop_vinfo, stmt_info, gsi,
>  						    vec_dest, vop);
> +
> -	  else if (code.is_internal_fn ())
> +	  else if (code.is_internal_fn () && !cond_fn_p (code))
>  	    new_stmt = gimple_build_call_internal (internal_fn (code),
>  						   op.num_ops,
>  						   vop[0], vop[1], vop[2]);
> +	  else if (cond_fn_p (code))
> +	    new_stmt = gimple_build_call_internal (internal_fn (code),
> +						   op.num_ops,
> +						   vop[0], vop[1], vop[2],
> +						   vop[1]);
>  	  else
>  	    new_stmt = gimple_build_assign (vec_dest, tree_code (op.code),
>  					    vop[0], vop[1], vop[2]);
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index f1d0cd79961..e22067400af 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2319,7 +2319,7 @@ extern tree vect_create_addr_base_for_vector_ref (vec_info *,
>  						  tree);
>
>  /* In tree-vect-loop.cc.  */
> -extern tree neutral_op_for_reduction (tree, code_helper, tree);
> +extern tree neutral_op_for_reduction (tree, code_helper, tree, bool = true);
>  extern widest_int vect_iv_limit_for_partial_vectors (loop_vec_info loop_vinfo);
>  bool vect_rgroup_iv_might_wrap_p (loop_vec_info, rgroup_controls *);
>  /* Used in tree-vect-loop-manip.cc */
> --
> 2.41.0
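
A closing note on the neutral_op_for_reduction change above: -0.0 rather
than +0.0 is the correct padding value for masked-off lanes when signed
zeros are honored, because x + (-0.0) == x for every x, including
x == -0.0, whereas -0.0 + (+0.0) rounds to +0.0 and would silently flip
the sign of a -0.0 accumulator. A standalone check (my sketch, not part
of the patch; compile without -ffast-math):

  #include <math.h>
  #include <stdio.h>

  int
  main (void)
  {
    volatile double neg_zero = -0.0;  /* volatile blocks constant folding */

    /* Padding with +0.0 loses the sign: (-0.0) + (+0.0) == +0.0, prints 0.  */
    printf ("signbit (-0.0 + 0.0)  = %d\n", signbit (neg_zero + 0.0) != 0);

    /* Padding with -0.0 is truly neutral: (-0.0) + (-0.0) == -0.0, prints 1.  */
    printf ("signbit (-0.0 + -0.0) = %d\n", signbit (neg_zero + -0.0) != 0);

    return 0;
  }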