From: Andrew Pinski
Date: Thu, 2 Nov 2023 16:26:15 -0700
Subject: Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
To: Robin Dapp
Cc: gcc-patches, Richard Biener

On Wed, Sep 20, 2023 at 6:52 AM Robin Dapp wrote:
>
> Hi,
>
> as described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both
> into a masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD during ifcvt and
> adjusting some vectorizer code to handle it.
>
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
> is true.
>
> Related question/change: We only allow PLUS_EXPR in fold_left_reduction_fn
> but have code to handle MINUS_EXPR in vectorize_fold_left_reduction.  I
> suppose that's intentional, but it "just works" on riscv and the testsuite
> doesn't change when allowing MINUS_EXPR, so I went ahead and did that.
>
> Bootstrapped and regtested on x86 and aarch64.

This caused the gcc.target/i386/avx512f-reduce-op-1.c testcase to start
failing when testing on an x86_64 machine that has avx512f (in my case an
`Intel(R) Xeon(R) D-2166NT CPU @ 2.00GHz`). I reverted the commit to
double-check it, too.

The difference I see in the optimized dump is:

  if (_40 != 3.5e+1) // working

vs

  if (_40 != 6.4e+1) // not working

It is test_epi32_ps which is failing, with the TEST_PS macro and the plus
operand that uses TESTOP:

      TESTOP (add, +, float, ps, 0.0f);				\

I have not reduced the testcase any further, though.

Thanks,
Andrew Pinski

> Regards
>  Robin
>
> gcc/ChangeLog:
>
> 	PR middle-end/111401
> 	* internal-fn.cc (cond_fn_p): New function.
> 	* internal-fn.h (cond_fn_p): Define.
> 	* tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
> 	if supported.
> 	(predicate_scalar_phi): Add whitespace.
> 	* tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
> 	(neutral_op_for_reduction): Return -0 for PLUS.
> 	(vect_is_simple_reduction): Don't count else operand in
> 	COND_ADD.
> 	(vectorize_fold_left_reduction): Add COND_ADD handling.
> 	(vectorizable_reduction): Don't count else operand in COND_ADD.
> 	(vect_transform_reduction): Add COND_ADD handling.
> 	* tree-vectorizer.h (neutral_op_for_reduction): Add default
> 	parameter.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
> 	* gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
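
To make the quoted description concrete, here is a minimal sketch (mine,
not from the patch itself) of the kind of loop it is about, shaped like
foo2 in the pr111401.c test further down; the GIMPLE in the comment is
schematic and the SSA names are invented:

  /* A conditional (masked) sum reduction.  Before the patch, ifcvt
     lowers the if to a select followed by an unconditional add, roughly:

	 _ifc_1 = cond_1 ? a_1 : 0.0;   // COND_EXPR
	 res_2 = res_1 + _ifc_1;        // PLUS_EXPR

     With the patch it emits a single conditional internal function,
     which the vectorizer can turn directly into a masked reduction:

	 res_2 = .COND_ADD (cond_1, res_1, a_1, res_1);  */
  double
  cond_sum (double *restrict a, int *restrict cond, double init, int n)
  {
    double res = init;
    for (int i = 0; i < n; i++)
      if (cond[i])
	res += a[i];
    return res;
  }

Note how the fourth .COND_ADD operand (the else value) reuses the
reduction input, so masked-off iterations leave the accumulator untouched.
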
> ---
>  gcc/internal-fn.cc                            |  38 +++++
>  gcc/internal-fn.h                             |   1 +
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 ++++++++++++++++++
>  .../riscv/rvv/autovec/cond/pr111401.c         |  61 ++++++++
>  gcc/tree-if-conv.cc                           |  63 ++++++--
>  gcc/tree-vect-loop.cc                         | 130 ++++++++++++----
>  gcc/tree-vectorizer.h                         |   2 +-
>  7 files changed, 394 insertions(+), 42 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0fd34359247..77939890f5a 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4241,6 +4241,44 @@ first_commutative_argument (internal_fn fn)
>      }
>  }
>
> +/* Return true if this CODE describes a conditional (masked) internal_fn.  */
> +
> +bool
> +cond_fn_p (code_helper code)
> +{
> +  if (!code.is_fn_code ())
> +    return false;
> +
> +  if (!internal_fn_p ((combined_fn) code))
> +    return false;
> +
> +  internal_fn fn = as_internal_fn ((combined_fn) code);
> +  switch (fn)
> +    {
> +    #undef DEF_INTERNAL_COND_FN
> +    #define DEF_INTERNAL_COND_FN(NAME, F, O, T)			\
> +    case IFN_COND_##NAME:					\
> +    case IFN_COND_LEN_##NAME:					\
> +      return true;
> +    #include "internal-fn.def"
> +    #undef DEF_INTERNAL_COND_FN
> +
> +    #undef DEF_INTERNAL_SIGNED_COND_FN
> +    #define DEF_INTERNAL_SIGNED_COND_FN(NAME, F, S, SO, UO, T)	\
> +    case IFN_COND_##NAME:					\
> +    case IFN_COND_LEN_##NAME:					\
> +      return true;
> +    #include "internal-fn.def"
> +    #undef DEF_INTERNAL_SIGNED_COND_FN
> +
> +    default:
> +      return false;
> +    }
> +
> +  return false;
> +}
> +
> +
>  /* Return true if this CODE describes an internal_fn that returns a vector with
>     elements twice as wide as the element size of the input vectors.  */
>
> diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
> index 99de13a0199..f1cc9db29c0 100644
> --- a/gcc/internal-fn.h
> +++ b/gcc/internal-fn.h
> @@ -219,6 +219,7 @@ extern bool commutative_ternary_fn_p (internal_fn);
>  extern int first_commutative_argument (internal_fn);
>  extern bool associative_binary_fn_p (internal_fn);
>  extern bool widening_fn_p (code_helper);
> +extern bool cond_fn_p (code_helper code);
>
>  extern bool set_edom_supported_p (void);
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> new file mode 100644
> index 00000000000..57c600838ee
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> @@ -0,0 +1,141 @@
> +/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-add-options ieee } */
> +/* { dg-additional-options "-std=c99 -fno-fast-math" } */
> +
> +#include "tree-vect.h"
> +
> +#include <math.h>
> +
> +#define N (VECTOR_BITS * 17)
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_plus_double (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_plus_double_ref (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_minus_double (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      res -= a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_minus_double_ref (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      res -= a[i];
> +  return res;
> +}
> +
> +int __attribute__ ((optimize (1)))
> +main ()
> +{
> +  int n = 19;
> +  double a[N];
> +  int cond1[N], cond2[N];
> +
> +  for (int i = 0; i < N; i++)
> +    {
> +      a[i] = (i * 0.1) * (i & 1 ? 1 : -1);
> +      cond1[i] = 0;
> +      cond2[i] = i & 4 ? 1 : 0;
> +      asm volatile ("" ::: "memory");
> +    }
> +
> +  double res1 = reduc_plus_double (a, -0.0, cond1, n);
> +  double ref1 = reduc_plus_double_ref (a, -0.0, cond1, n);
> +  double res2 = reduc_minus_double (a, -0.0, cond1, n);
> +  double ref2 = reduc_minus_double_ref (a, -0.0, cond1, n);
> +  double res3 = reduc_plus_double (a, -0.0, cond1, n);
> +  double ref3 = reduc_plus_double_ref (a, -0.0, cond1, n);
> +  double res4 = reduc_minus_double (a, -0.0, cond1, n);
> +  double ref4 = reduc_minus_double_ref (a, -0.0, cond1, n);
> +
> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
> +    __builtin_abort ();
> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
> +    __builtin_abort ();
> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
> +    __builtin_abort ();
> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
> +    __builtin_abort ();
> +
> +  res1 = reduc_plus_double (a, 0.0, cond1, n);
> +  ref1 = reduc_plus_double_ref (a, 0.0, cond1, n);
> +  res2 = reduc_minus_double (a, 0.0, cond1, n);
> +  ref2 = reduc_minus_double_ref (a, 0.0, cond1, n);
> +  res3 = reduc_plus_double (a, 0.0, cond1, n);
> +  ref3 = reduc_plus_double_ref (a, 0.0, cond1, n);
> +  res4 = reduc_minus_double (a, 0.0, cond1, n);
> +  ref4 = reduc_minus_double_ref (a, 0.0, cond1, n);
> +
> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
> +    __builtin_abort ();
> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
> +    __builtin_abort ();
> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
> +    __builtin_abort ();
> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
> +    __builtin_abort ();
> +
> +  res1 = reduc_plus_double (a, -0.0, cond2, n);
> +  ref1 = reduc_plus_double_ref (a, -0.0, cond2, n);
> +  res2 = reduc_minus_double (a, -0.0, cond2, n);
> +  ref2 = reduc_minus_double_ref (a, -0.0, cond2, n);
> +  res3 = reduc_plus_double (a, -0.0, cond2, n);
> +  ref3 = reduc_plus_double_ref (a, -0.0, cond2, n);
> +  res4 = reduc_minus_double (a, -0.0, cond2, n);
> +  ref4 = reduc_minus_double_ref (a, -0.0, cond2, n);
> +
> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
> +    __builtin_abort ();
> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
> +    __builtin_abort ();
> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
> +    __builtin_abort ();
> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
> +    __builtin_abort ();
> +
> +  res1 = reduc_plus_double (a, 0.0, cond2, n);
> +  ref1 = reduc_plus_double_ref (a, 0.0, cond2, n);
> +  res2 = reduc_minus_double (a, 0.0, cond2, n);
> +  ref2 = reduc_minus_double_ref (a, 0.0, cond2, n);
> +  res3 = reduc_plus_double (a, 0.0, cond2, n);
> +  ref3 = reduc_plus_double_ref (a, 0.0, cond2, n);
> +  res4 = reduc_minus_double (a, 0.0, cond2, n);
> +  ref4 = reduc_minus_double_ref (a, 0.0, cond2, n);
> +
> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
> +    __builtin_abort ();
> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
> +    __builtin_abort ();
> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
> +    __builtin_abort ();
> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
> new file mode 100644
> index 00000000000..1d559ce5391
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
> @@ -0,0 +1,61 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
> +
> +double
> +__attribute__ ((noipa))
> +foo2 (double *__restrict a, double init, int *__restrict cond, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      init += a[i];
> +  return init;
> +}
> +
> +double
> +__attribute__ ((noipa))
> +foo3 (double *__restrict a, double init, int *__restrict cond, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      init -= a[i];
> +  return init;
> +}
> +
> +#define SZ 125
> +
> +__attribute__ ((optimize ("1")))
> +int
> +main ()
> +{
> +  double res1 = 0, res2 = 0;
> +  double a1[SZ], a2[SZ];
> +  int c1[SZ], c2[SZ];
> +  for (int i = 0; i < SZ; i++)
> +    {
> +      a1[i] = i * 3 + (i & 4) - (i & 7);
> +      a2[i] = i * 3 + (i & 4) - (i & 7);
> +      c1[i] = i & 1;
> +      c2[i] = i & 1;
> +    }
> +
> +  double init1 = 2.7, init2 = 8.2;
> +  double ref1 = init1, ref2 = init2;
> +  for (int i = 0; i < SZ; i++)
> +    {
> +      if (c1[i])
> +        ref1 += a1[i];
> +      if (c2[i])
> +        ref2 -= a2[i];
> +    }
> +
> +  res1 = foo2 (a1, init1, c1, SZ);
> +  res2 = foo3 (a2, init2, c2, SZ);
> +
> +  if (res1 != ref1)
> +    __builtin_abort ();
> +  if (res2 != ref2)
> +    __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 2 "vect" } } */
> +/* { dg-final { scan-tree-dump-not "VCOND_MASK" "vect" } } */
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 799f071965e..425976b0861 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -1852,10 +1852,12 @@ convert_scalar_cond_reduction (gimple *reduc, gimple_stmt_iterator *gsi,
>    gimple *new_assign;
>    tree rhs;
>    tree rhs1 = gimple_assign_rhs1 (reduc);
> +  tree lhs = gimple_assign_lhs (reduc);
>    tree tmp = make_temp_ssa_name (TREE_TYPE (rhs1), NULL, "_ifc_");
>    tree c;
>    enum tree_code reduction_op = gimple_assign_rhs_code (reduc);
> -  tree op_nochange = neutral_op_for_reduction (TREE_TYPE (rhs1), reduction_op, NULL);
> +  tree op_nochange = neutral_op_for_reduction (TREE_TYPE (rhs1), reduction_op,
> +					       NULL, false);
>    gimple_seq stmts = NULL;
>
>    if (dump_file && (dump_flags & TDF_DETAILS))
> @@ -1864,19 +1866,52 @@ convert_scalar_cond_reduction (gimple *reduc, gimple_stmt_iterator *gsi,
>        print_gimple_stmt (dump_file, reduc, 0, TDF_SLIM);
>      }
>
> -  /* Build cond expression using COND and constant operand
> -     of reduction rhs.  */
> -  c = fold_build_cond_expr (TREE_TYPE (rhs1),
> -			    unshare_expr (cond),
> -			    swap ? op_nochange : op1,
> -			    swap ? op1 : op_nochange);
> -
> -  /* Create assignment stmt and insert it at GSI.  */
> -  new_assign = gimple_build_assign (tmp, c);
> -  gsi_insert_before (gsi, new_assign, GSI_SAME_STMT);
> -  /* Build rhs for unconditional increment/decrement/logic_operation.  */
> -  rhs = gimple_build (&stmts, reduction_op,
> -		      TREE_TYPE (rhs1), op0, tmp);
> +  /* If possible try to create an IFN_COND_ADD instead of a COND_EXPR and
> +     a PLUS_EXPR.  Don't do this if the reduction def operand itself is
> +     a vectorizable call as we can create a COND version of it directly.  */
> +  internal_fn ifn;
> +  ifn = get_conditional_internal_fn (reduction_op);
> +
> +  bool try_cond_op = true;
> +  gimple *opstmt;
> +  if (TREE_CODE (op1) == SSA_NAME
> +      && (opstmt = SSA_NAME_DEF_STMT (op1))
> +      && is_gimple_call (opstmt))
> +    {
> +      combined_fn cfn = gimple_call_combined_fn (opstmt);
> +      internal_fn ifnop;
> +      reduction_fn_for_scalar_code (cfn, &ifnop);
> +      if (vectorized_internal_fn_supported_p (ifnop, TREE_TYPE
> +					      (gimple_call_lhs (opstmt))))
> +	try_cond_op = false;
> +    }
> +
> +  if (ifn != IFN_LAST
> +      && vectorized_internal_fn_supported_p (ifn, TREE_TYPE (lhs))
> +      && try_cond_op && !swap)
> +    {
> +      gcall *cond_call = gimple_build_call_internal (ifn, 4,
> +						     unshare_expr (cond),
> +						     op0, op1, op0);
> +      gsi_insert_before (gsi, cond_call, GSI_SAME_STMT);
> +      gimple_call_set_lhs (cond_call, tmp);
> +      rhs = tmp;
> +    }
> +  else
> +    {
> +      /* Build cond expression using COND and constant operand
> +	 of reduction rhs.  */
> +      c = fold_build_cond_expr (TREE_TYPE (rhs1),
> +				unshare_expr (cond),
> +				swap ? op_nochange : op1,
> +				swap ? op1 : op_nochange);
> +      /* Create assignment stmt and insert it at GSI.  */
> +      new_assign = gimple_build_assign (tmp, c);
> +      gsi_insert_before (gsi, new_assign, GSI_SAME_STMT);
> +      /* Build rhs for unconditional increment/decrement/logic_operation.  */
> +      rhs = gimple_build (&stmts, reduction_op,
> +			  TREE_TYPE (rhs1), op0, tmp);
> +    }
>
>    if (has_nop)
>      {
> @@ -2241,7 +2276,7 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
>  	{
>  	  /* Convert reduction stmt into vectorizable form.  */
>  	  rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
> -					       swap,has_nop, nop_reduc);
> +					       swap, has_nop, nop_reduc);
>  	  redundant_ssa_names.safe_push (std::make_pair (res, rhs));
>  	}
>        new_stmt = gimple_build_assign (res, rhs);
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 23c6e8259e7..94d3cead1e6 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3672,7 +3672,7 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>  static bool
>  fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
>  {
> -  if (code == PLUS_EXPR)
> +  if (code == PLUS_EXPR || code == MINUS_EXPR)
>      {
>        *reduc_fn = IFN_FOLD_LEFT_PLUS;
>        return true;
> @@ -3751,23 +3751,29 @@ reduction_fn_for_scalar_code (code_helper code, internal_fn *reduc_fn)
>     by the introduction of additional X elements, return that X, otherwise
>     return null.  CODE is the code of the reduction and SCALAR_TYPE is type
>     of the scalar elements.  If the reduction has just a single initial value
> -   then INITIAL_VALUE is that value, otherwise it is null.  */
> +   then INITIAL_VALUE is that value, otherwise it is null.
> +   If AS_INITIAL is TRUE the value is supposed to be used as initial value.
> +   In that case no signed zero is returned.  */
>
>  tree
>  neutral_op_for_reduction (tree scalar_type, code_helper code,
> -			  tree initial_value)
> +			  tree initial_value, bool as_initial)
>  {
>    if (code.is_tree_code ())
>      switch (tree_code (code))
>        {
> -      case WIDEN_SUM_EXPR:
>        case DOT_PROD_EXPR:
>        case SAD_EXPR:
> -      case PLUS_EXPR:
>        case MINUS_EXPR:
>        case BIT_IOR_EXPR:
>        case BIT_XOR_EXPR:
>  	return build_zero_cst (scalar_type);
> +      case WIDEN_SUM_EXPR:
> +      case PLUS_EXPR:
> +	if (!as_initial && HONOR_SIGNED_ZEROS (scalar_type))
> +	  return build_real (scalar_type, dconstm0);
> +	else
> +	  return build_zero_cst (scalar_type);
>
>        case MULT_EXPR:
>  	return build_one_cst (scalar_type);
> @@ -4106,8 +4112,14 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
>  	  return NULL;
>  	}
>
> -      nphi_def_loop_uses++;
> -      phi_use_stmt = use_stmt;
> +      /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have
> +	 op1 twice (once as definition, once as else) in the same operation.
> +	 Only count it as one.  */
> +      if (use_stmt != phi_use_stmt)
> +	{
> +	  nphi_def_loop_uses++;
> +	  phi_use_stmt = use_stmt;
> +	}
>      }
>
>    tree latch_def = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
> @@ -6378,7 +6390,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>        if (REDUC_GROUP_FIRST_ELEMENT (stmt_info))
>  	initial_value = reduc_info->reduc_initial_values[0];
>        neutral_op = neutral_op_for_reduction (TREE_TYPE (vectype), code,
> -					     initial_value);
> +					     initial_value, false);
>      }
>    if (neutral_op)
>      vector_identity = gimple_build_vector_from_val (&seq, vectype,
> @@ -6860,8 +6872,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>  			       gimple_stmt_iterator *gsi,
>  			       gimple **vec_stmt, slp_tree slp_node,
>  			       gimple *reduc_def_stmt,
> -			       tree_code code, internal_fn reduc_fn,
> -			       tree ops[3], tree vectype_in,
> +			       code_helper code, internal_fn reduc_fn,
> +			       tree *ops, int num_ops, tree vectype_in,
>  			       int reduc_index, vec_loop_masks *masks,
>  			       vec_loop_lens *lens)
>  {
> @@ -6877,17 +6889,40 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>
>    gcc_assert (!nested_in_vect_loop_p (loop, stmt_info));
>    gcc_assert (ncopies == 1);
> -  gcc_assert (TREE_CODE_LENGTH (code) == binary_op);
> +
> +  bool is_cond_op = false;
> +  if (code.is_tree_code ())
> +    code = tree_code (code);
> +  else
> +    {
> +      gcc_assert (cond_fn_p (code));
> +      is_cond_op = true;
> +      code = conditional_internal_fn_code (internal_fn (code));
> +    }
> +
> +  gcc_assert (TREE_CODE_LENGTH (tree_code (code)) == binary_op);
>
>    if (slp_node)
>      gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (vectype_out),
>  			  TYPE_VECTOR_SUBPARTS (vectype_in)));
>
> -  tree op0 = ops[1 - reduc_index];
> +  /* The operands either come from a binary operation or an IFN_COND operation.
> +     The former is a gimple assign with binary rhs and the latter is a
> +     gimple call with four arguments.  */
> +  gcc_assert (num_ops == 2 || num_ops == 4);
> +  tree op0, opmask;
> +  if (!is_cond_op)
> +    op0 = ops[1 - reduc_index];
> +  else
> +    {
> +      op0 = ops[2];
> +      opmask = ops[0];
> +      gcc_assert (!slp_node);
> +    }
>
>    int group_size = 1;
>    stmt_vec_info scalar_dest_def_info;
> -  auto_vec<tree> vec_oprnds0;
> +  auto_vec<tree> vec_oprnds0, vec_opmask;
>    if (slp_node)
>      {
>        auto_vec<vec<tree>> vec_defs (2);
> @@ -6903,9 +6938,17 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>        vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
>  				     op0, &vec_oprnds0);
>        scalar_dest_def_info = stmt_info;
> +
> +      /* For an IFN_COND_OP we also need the vector mask operand.  */
> +      if (is_cond_op)
> +	vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
> +				       opmask, &vec_opmask);
>      }
>
> -  tree scalar_dest = gimple_assign_lhs (scalar_dest_def_info->stmt);
> +  gimple *sdef = scalar_dest_def_info->stmt;
> +  tree scalar_dest = is_gimple_call (sdef)
> +		       ? gimple_call_lhs (sdef)
> +		       : gimple_assign_lhs (scalar_dest_def_info->stmt);
>    tree scalar_type = TREE_TYPE (scalar_dest);
>    tree reduc_var = gimple_phi_result (reduc_def_stmt);
>
> @@ -6939,17 +6982,20 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>        tree bias = NULL_TREE;
>        if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>  	mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, i);
> +      else if (is_cond_op)
> +	mask = vec_opmask[0];
>        if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
>  	{
>  	  len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
>  				   i, 1);
>  	  signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
>  	  bias = build_int_cst (intQI_type_node, biasval);
> -	  mask = build_minus_one_cst (truth_type_for (vectype_in));
> +	  if (!is_cond_op)
> +	    mask = build_minus_one_cst (truth_type_for (vectype_in));
>  	}
>
>        /* Handle MINUS by adding the negative.  */
> -      if (reduc_fn != IFN_LAST && code == MINUS_EXPR)
> +      if (reduc_fn != IFN_LAST && tree_code (code) == MINUS_EXPR)
>  	{
>  	  tree negated = make_ssa_name (vectype_out);
>  	  new_stmt = gimple_build_assign (negated, NEGATE_EXPR, def0);
> @@ -6957,7 +7003,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>  	  def0 = negated;
>  	}
>
> -      if (mask && mask_reduc_fn == IFN_LAST)
> +      if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> +	  && mask && mask_reduc_fn == IFN_LAST)
>  	def0 = merge_with_identity (gsi, mask, vectype_out, def0,
>  				    vector_identity);
>
> @@ -6988,8 +7035,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>  	}
>        else
>  	{
> -	  reduc_var = vect_expand_fold_left (gsi, scalar_dest_var, code,
> -					     reduc_var, def0);
> +	  reduc_var = vect_expand_fold_left (gsi, scalar_dest_var,
> +					     tree_code (code), reduc_var, def0);
>  	  new_stmt = SSA_NAME_DEF_STMT (reduc_var);
>  	  /* Remove the statement, so that we can use the same code paths
>  	     as for statements that we've just created.  */
> @@ -7440,6 +7487,11 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>        if (i == STMT_VINFO_REDUC_IDX (stmt_info))
>  	continue;
>
> +      /* For an IFN_COND_OP we might hit the reduction definition operand
> +	 twice (once as definition, once as else).  */
> +      if (op.ops[i] == op.ops[STMT_VINFO_REDUC_IDX (stmt_info)])
> +	continue;
> +
>        /* There should be only one cycle def in the stmt, the one
>  	 leading to reduc_def.  */
>        if (VECTORIZABLE_CYCLE_DEF (dt))
> @@ -7640,6 +7692,13 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>       when generating the code inside the loop.  */
>
>    code_helper orig_code = STMT_VINFO_REDUC_CODE (phi_info);
> +
> +  /* If conversion might have created a conditional operation like
> +     IFN_COND_ADD already.  Use the internal code for the following checks.  */
> +  if (cond_fn_p (orig_code))
> +    orig_code = conditional_internal_fn_code
> +      (as_internal_fn(combined_fn (orig_code)));
> +
>    STMT_VINFO_REDUC_CODE (reduc_info) = orig_code;
>
>    vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
> @@ -7678,7 +7737,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>      {
>        if (dump_enabled_p ())
>  	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -			 "reduction: not commutative/associative");
> +			 "reduction: not commutative/associative\n");
>        return false;
>      }
>  }
> @@ -8213,6 +8272,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>
>    code_helper code = canonicalize_code (op.code, op.type);
>    internal_fn cond_fn = get_conditional_internal_fn (code, op.type);
> +
>    vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
>    vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
>    bool mask_by_cond_expr = use_mask_by_cond_expr_p (code, cond_fn, vectype_in);
> @@ -8231,17 +8291,21 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>    if (code == COND_EXPR)
>      gcc_assert (ncopies == 1);
>
> +  /* A COND_OP reduction must have the same definition and else value.  */
> +  if (cond_fn_p (code))
> +    gcc_assert (op.num_ops == 4 && (op.ops[1] == op.ops[3]));
> +
>    bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
>
>    vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
>    if (reduction_type == FOLD_LEFT_REDUCTION)
>      {
>        internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info);
> -      gcc_assert (code.is_tree_code ());
> +      gcc_assert (code.is_tree_code () || cond_fn_p (code));
>        return vectorize_fold_left_reduction
>  	  (loop_vinfo, stmt_info, gsi, vec_stmt, slp_node, reduc_def_phi,
> -	   tree_code (code), reduc_fn, op.ops, vectype_in, reduc_index, masks,
> -	   lens);
> +	   code, reduc_fn, op.ops, op.num_ops, vectype_in,
> +	   reduc_index, masks, lens);
>      }
>
>    bool single_defuse_cycle = STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
> @@ -8254,14 +8318,20 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>    tree scalar_dest = gimple_get_lhs (stmt_info->stmt);
>    tree vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
>
> +  /* Get NCOPIES vector definitions for all operands except the reduction
> +     definition.  */
>    vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, ncopies,
>  		     single_defuse_cycle && reduc_index == 0
>  		     ? NULL_TREE : op.ops[0], &vec_oprnds0,
>  		     single_defuse_cycle && reduc_index == 1
>  		     ? NULL_TREE : op.ops[1], &vec_oprnds1,
> -		     op.num_ops == 3
> -		     && !(single_defuse_cycle && reduc_index == 2)
> +		     op.num_ops == 4
> +		     || (op.num_ops == 3
> +			 && !(single_defuse_cycle && reduc_index == 2))
>  		     ? op.ops[2] : NULL_TREE, &vec_oprnds2);
> +
> +  /* For single def-use cycles get one copy of the vectorized reduction
> +     definition.  */
>    if (single_defuse_cycle)
>      {
>        gcc_assert (!slp_node);
> @@ -8301,7 +8371,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>  	}
>        else
>  	{
> -	  if (op.num_ops == 3)
> +	  if (op.num_ops >= 3)
>  	    vop[2] = vec_oprnds2[i];
>
>  	  if (masked_loop_p && mask_by_cond_expr)
> @@ -8314,10 +8384,16 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>  	  if (emulated_mixed_dot_prod)
>  	    new_stmt = vect_emulate_mixed_dot_prod (loop_vinfo, stmt_info, gsi,
>  						    vec_dest, vop);
> +
> -	  else if (code.is_internal_fn ())
> +	  else if (code.is_internal_fn () && !cond_fn_p (code))
>  	    new_stmt = gimple_build_call_internal (internal_fn (code),
>  						   op.num_ops,
>  						   vop[0], vop[1], vop[2]);
> +	  else if (cond_fn_p (code))
> +	    new_stmt = gimple_build_call_internal (internal_fn (code),
> +						   op.num_ops,
> +						   vop[0], vop[1], vop[2],
> +						   vop[1]);
>  	  else
>  	    new_stmt = gimple_build_assign (vec_dest, tree_code (op.code),
>  					    vop[0], vop[1], vop[2]);
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index f1d0cd79961..e22067400af 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2319,7 +2319,7 @@ extern tree vect_create_addr_base_for_vector_ref (vec_info *,
>  						  tree);
>
>  /* In tree-vect-loop.cc.  */
> -extern tree neutral_op_for_reduction (tree, code_helper, tree);
> +extern tree neutral_op_for_reduction (tree, code_helper, tree, bool = true);
>  extern widest_int vect_iv_limit_for_partial_vectors (loop_vec_info loop_vinfo);
>  bool vect_rgroup_iv_might_wrap_p (loop_vec_info, rgroup_controls *);
>  /* Used in tree-vect-loop-manip.cc */
> --
> 2.41.0
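
A closing note on the neutral_op_for_reduction change above: -0.0 rather
than +0.0 is the correct padding value for masked-off lanes when signed
zeros are honored, because x + (-0.0) == x for every x, including
x == -0.0, whereas -0.0 + (+0.0) rounds to +0.0 and would silently flip
the sign of a -0.0 accumulator. A standalone check (my sketch, not part
of the patch; compile without -ffast-math):

  #include <math.h>
  #include <stdio.h>

  int
  main (void)
  {
    volatile double neg_zero = -0.0;  /* volatile blocks constant folding */

    /* Padding with +0.0 loses the sign: (-0.0) + (+0.0) == +0.0, prints 0.  */
    printf ("signbit (-0.0 + 0.0)  = %d\n", signbit (neg_zero + 0.0) != 0);

    /* Padding with -0.0 is truly neutral: (-0.0) + (-0.0) == -0.0, prints 1.  */
    printf ("signbit (-0.0 + -0.0) = %d\n", signbit (neg_zero + -0.0) != 0);

    return 0;
  }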