Date: Mon, 23 Oct 2023 10:53:42 +0000 (UTC)
From: Richard Biener <rguenther@suse.de>
To: Robin Dapp
Cc: Tamar Christina, gcc-patches, richard.sandiford@arm.com
Subject: Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
On Thu, 19 Oct 2023, Robin Dapp wrote:

> Ugh, I didn't push yet because with a rebased trunk I am
> seeing different behavior for some riscv testcases.
>
> A reduction is not recognized because there is yet another
> "double use" occurrence in check_reduction_path.  I guess it's
> reasonable to loosen the restriction for conditional operations
> here as well.
>
> The only change to v4 therefore is:
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index ebab1953b9c..64654a55e4c 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -4085,7 +4094,15 @@ pop:
>  	      || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt))))
>  	  FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
>  	    cnt++;
> -      if (cnt != 1)
> +
> +      bool cond_fn_p = op.code.is_internal_fn ()
> +	&& (conditional_internal_fn_code (internal_fn (*code))
> +	    != ERROR_MARK);
> +
> +      /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have
> +	 op1 twice (once as definition, once as else) in the same operation.
> +	 Allow this.  */
> +      if ((!cond_fn_p && cnt != 1) || (opi == 1 && cond_fn_p && cnt != 2))
>
> Bootstrapped and regtested again on x86, aarch64 and power10.
> Testsuite on riscv unchanged.

Hmm, why opi == 1 only?  I think

  # _1 = PHI <.., _4>
  _3 = .COND_ADD (_1, _2, _1);
  _4 = .COND_ADD (_3, _5, _3);

would be fine as well.  I think we want to simply ignore the 'else' value
of conditional internal functions.  I suppose we have unary, binary and
ternary conditional functions - I miss an internal_fn_else_index, but I
suppose it's always the last one?

I think a single use on .COND functions is also OK, even when on the
'else' value only?  But maybe that's not too important here.

Maybe

      gimple *op_use_stmt;
      unsigned cnt = 0;
      FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op.ops[opi])
	if (.. op_use_stmt is conditional internal function ..)
	  {
	    for (unsigned j = 0; j < gimple_call_num_args (call) - 1; ++j)
	      if (gimple_call_arg (call, j) == op.ops[opi])
		cnt++;
	  }
	else if (!is_gimple_debug (op_use_stmt)
		 && (*code != ERROR_MARK
		     || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt))))
	  FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
	    cnt++;

?
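For the record, a minimal, untested sketch of the else-index helper
mentioned above - assuming the else value of a conditional internal
function really is always the last call argument, which is exactly the
open question - could look like this (the helper name is hypothetical,
nothing like it exists yet):

      /* Hypothetical helper (sketch only): return the index of the
	 "else" argument of conditional internal call CALL, or -1 if
	 CALL is not a conditional internal function.  Assumes the else
	 value is always passed as the last argument.  */
      static int
      cond_call_else_index (gcall *call)
      {
	if (!gimple_call_internal_p (call))
	  return -1;
	internal_fn ifn = gimple_call_internal_fn (call);
	if (conditional_internal_fn_code (ifn) == ERROR_MARK)
	  return -1;
	return (int) gimple_call_num_args (call) - 1;
      }

With something like that, the counting loop above could skip exactly the
use in the else position instead of special-casing opi == 1.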
> Regards
> Robin
>
> Subject: [PATCH v5] ifcvt/vect: Emit COND_OP for conditional scalar reduction.
>
> As described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both
> into a masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD/COND_OP during
> ifcvt and adjusting some vectorizer code to handle it.
>
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
> is true.
>
> gcc/ChangeLog:
>
> 	PR middle-end/111401
> 	* tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP
> 	if supported.
> 	(predicate_scalar_phi): Add whitespace.
> 	* tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP.
> 	(neutral_op_for_reduction): Return -0 for PLUS.
> 	(check_reduction_path): Don't count else operand in COND_OP.
> 	(vect_is_simple_reduction): Ditto.
> 	(vect_create_epilog_for_reduction): Fix whitespace.
> 	(vectorize_fold_left_reduction): Add COND_OP handling.
> 	(vectorizable_reduction): Don't count else operand in COND_OP.
> 	(vect_transform_reduction): Add COND_OP handling.
> 	* tree-vectorizer.h (neutral_op_for_reduction): Add default
> 	parameter.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
> 	* gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
> 	* gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
> 	* gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
> ---
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 +++++++++++++++
>  .../riscv/rvv/autovec/cond/pr111401.c         | 139 +++++++++++++++
>  .../riscv/rvv/autovec/reduc/reduc_call-2.c    |   4 +-
>  .../riscv/rvv/autovec/reduc/reduc_call-4.c    |   4 +-
>  gcc/tree-if-conv.cc                           |  49 +++--
>  gcc/tree-vect-loop.cc                         | 168 ++++++++++++++----
>  gcc/tree-vectorizer.h                         |   2 +-
>  7 files changed, 456 insertions(+), 51 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> new file mode 100644
> index 00000000000..7b46e7d8a2a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> @@ -0,0 +1,141 @@
> +/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-add-options ieee } */
> +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */
> +
> +#include "tree-vect.h"
> +
> +#include <math.h>
> +
> +#define N (VECTOR_BITS * 17)
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_plus_double (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_plus_double_ref (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_minus_double (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      res -= a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_minus_double_ref (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      res -= a[i];
> +  return res;
> +}
> +
> +int __attribute__ ((optimize (1)))
> +main ()
> +{
> +  int n = 19;
> +  double a[N];
> +  int cond1[N], cond2[N];
> +
> +  for (int i = 0; i < N; i++)
> +    {
> +      a[i] = (i * 0.1) * (i & 1 ? 1 : -1);
> +      cond1[i] = 0;
> +      cond2[i] = i & 4 ? 1 : 0;
> +      asm volatile ("" ::: "memory");
> +    }
> +
> +  double res1 = reduc_plus_double (a, -0.0, cond1, n);
> +  double ref1 = reduc_plus_double_ref (a, -0.0, cond1, n);
> +  double res2 = reduc_minus_double (a, -0.0, cond1, n);
> +  double ref2 = reduc_minus_double_ref (a, -0.0, cond1, n);
> +  double res3 = reduc_plus_double (a, -0.0, cond1, n);
> +  double ref3 = reduc_plus_double_ref (a, -0.0, cond1, n);
> +  double res4 = reduc_minus_double (a, -0.0, cond1, n);
> +  double ref4 = reduc_minus_double_ref (a, -0.0, cond1, n);
> +
> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
> +    __builtin_abort ();
> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
> +    __builtin_abort ();
> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
> +    __builtin_abort ();
> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
> +    __builtin_abort ();
> +
> +  res1 = reduc_plus_double (a, 0.0, cond1, n);
> +  ref1 = reduc_plus_double_ref (a, 0.0, cond1, n);
> +  res2 = reduc_minus_double (a, 0.0, cond1, n);
> +  ref2 = reduc_minus_double_ref (a, 0.0, cond1, n);
> +  res3 = reduc_plus_double (a, 0.0, cond1, n);
> +  ref3 = reduc_plus_double_ref (a, 0.0, cond1, n);
> +  res4 = reduc_minus_double (a, 0.0, cond1, n);
> +  ref4 = reduc_minus_double_ref (a, 0.0, cond1, n);
> +
> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
> +    __builtin_abort ();
> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
> +    __builtin_abort ();
> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
> +    __builtin_abort ();
> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
> +    __builtin_abort ();
> +
> +  res1 = reduc_plus_double (a, -0.0, cond2, n);
> +  ref1 = reduc_plus_double_ref (a, -0.0, cond2, n);
> +  res2 = reduc_minus_double (a, -0.0, cond2, n);
> +  ref2 = reduc_minus_double_ref (a, -0.0, cond2, n);
> +  res3 = reduc_plus_double (a, -0.0, cond2, n);
> +  ref3 = reduc_plus_double_ref (a, -0.0, cond2, n);
> +  res4 = reduc_minus_double (a, -0.0, cond2, n);
> +  ref4 = reduc_minus_double_ref (a, -0.0, cond2, n);
> +
> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
> +    __builtin_abort ();
> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
> +    __builtin_abort ();
> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
> +    __builtin_abort ();
> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
> +    __builtin_abort ();
> +
> +  res1 = reduc_plus_double (a, 0.0, cond2, n);
> +  ref1 = reduc_plus_double_ref (a, 0.0, cond2, n);
> +  res2 = reduc_minus_double (a, 0.0, cond2, n);
> +  ref2 = reduc_minus_double_ref (a, 0.0, cond2, n);
> +  res3 = reduc_plus_double (a, 0.0, cond2, n);
> +  ref3 = reduc_plus_double_ref (a, 0.0, cond2, n);
> +  res4 = reduc_minus_double (a, 0.0, cond2, n);
> +  ref4 = reduc_minus_double_ref (a, 0.0, cond2, n);
> +
> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
> +    __builtin_abort ();
> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
> +    __builtin_abort ();
> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
> +    __builtin_abort ();
> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
> new file mode 100644
> index 00000000000..83dbd61b3f3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
> @@ -0,0 +1,139 @@
> +/* { dg-do run { target { riscv_v } } } */
> +/* { dg-additional-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
> +
> +double
> +__attribute__ ((noipa))
> +foo2 (double *__restrict a, double init, int *__restrict cond, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      init += a[i];
> +  return init;
> +}
> +
> +double
> +__attribute__ ((noipa))
> +foo3 (double *__restrict a, double init, int *__restrict cond, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      init -= a[i];
> +  return init;
> +}
> +
> +double
> +__attribute__ ((noipa))
> +foo4 (double *__restrict a, double init, int *__restrict cond, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      init *= a[i];
> +  return init;
> +}
> +
> +int
> +__attribute__ ((noipa))
> +foo5 (int *__restrict a, int init, int *__restrict cond, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      init &= a[i];
> +  return init;
> +}
> +
> +int
> +__attribute__ ((noipa))
> +foo6 (int *__restrict a, int init, int *__restrict cond, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      init |= a[i];
> +  return init;
> +}
> +
> +int
> +__attribute__ ((noipa))
> +foo7 (int *__restrict a, int init, int *__restrict cond, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      init ^= a[i];
> +  return init;
> +}
> +
> +#define SZ 125
> +
> +int
> +main ()
> +{
> +  double res1 = 0, res2 = 0, res3 = 0;
> +  double a1[SZ], a2[SZ], a3[SZ];
> +  int c1[SZ], c2[SZ], c3[SZ];
> +
> +  int a4[SZ], a5[SZ], a6[SZ];
> +  int res4 = 0, res5 = 0, res6 = 0;
> +  int c4[SZ], c5[SZ], c6[SZ];
> +
> +  for (int i = 0; i < SZ; i++)
> +    {
> +      a1[i] = i * 3 + (i & 4) - (i & 7);
> +      a2[i] = i * 3 + (i & 4) - (i & 7);
> +      a3[i] = i * 0.05 + (i & 4) - (i & 7);
> +      a4[i] = i * 3 + (i & 4) - (i & 7);
> +      a5[i] = i * 3 + (i & 4) - (i & 7);
> +      a6[i] = i * 3 + (i & 4) - (i & 7);
> +      c1[i] = i & 1;
> +      c2[i] = i & 2;
> +      c3[i] = i & 3;
> +      c4[i] = i & 4;
> +      c5[i] = i & 5;
> +      c6[i] = i & 6;
> +      __asm__ volatile ("" : : : "memory");
> +    }
> +
> +  double init1 = 2.7, init2 = 8.2, init3 = 0.1;
> +  double ref1 = init1, ref2 = init2, ref3 = init3;
> +
> +  int init4 = 87, init5 = 11, init6 = -123894344;
> +  int ref4 = init4, ref5 = init5, ref6 = init6;
> +
> +#pragma GCC novector
> +  for (int i = 0; i < SZ; i++)
> +    {
> +      if (c1[i])
> +        ref1 += a1[i];
> +      if (c2[i])
> +        ref2 -= a2[i];
> +      if (c3[i])
> +        ref3 *= a3[i];
> +      if (c4[i])
> +        ref4 &= a4[i];
> +      if (c5[i])
> +        ref5 |= a5[i];
> +      if (c6[i])
> +        ref6 ^= a6[i];
> +    }
> +
> +  res1 = foo2 (a1, init1, c1, SZ);
> +  res2 = foo3 (a2, init2, c2, SZ);
> +  res3 = foo4 (a3, init3, c3, SZ);
> +  res4 = foo5 (a4, init4, c4, SZ);
> +  res5 = foo6 (a5, init5, c5, SZ);
> +  res6 = foo7 (a6, init6, c6, SZ);
> +
> +  if (res1 != ref1)
> +    __builtin_abort ();
> +  if (res2 != ref2)
> +    __builtin_abort ();
> +  if (res3 != ref3)
> +    __builtin_abort ();
> +  if (res4 != ref4)
> +    __builtin_abort ();
> +  if (res5 != ref5)
> +    __builtin_abort ();
> +  if (res6 != ref6)
> +    __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 5 "vect" } } */
> +/* { dg-final { scan-tree-dump-not "VCOND_MASK" "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c
> index cc07a047cd5..7be22d60bf2 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c
> @@ -3,4 +3,6 @@
>
>  #include "reduc_call-1.c"
>
> -/* { dg-final { scan-assembler-times {vfmacc\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+,v0.t} 1 } } */
> +/* { dg-final { scan-assembler-times {vfmadd\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times {vfadd\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+,v0.t} 1 } } */
> +/* { dg-final { scan-assembler-not {vmerge} } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c
> index 6d00c404d2a..83beabeff97 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c
> @@ -3,4 +3,6 @@
>
>  #include "reduc_call-1.c"
>
> -/* { dg-final { scan-assembler {vfmacc\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+,v0.t} } } */
> +/* { dg-final { scan-assembler {vfmadd\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+} } } */
> +/* { dg-final { scan-assembler {vfadd\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+,v0.t} } } */
> +/* { dg-final { scan-assembler-not {vmerge} } } */
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index c381d14b801..9571351805c 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -1856,10 +1856,12 @@ convert_scalar_cond_reduction (gimple *reduc, gimple_stmt_iterator *gsi,
>    gimple *new_assign;
>    tree rhs;
>    tree rhs1 = gimple_assign_rhs1 (reduc);
> +  tree lhs = gimple_assign_lhs (reduc);
>    tree tmp = make_temp_ssa_name (TREE_TYPE (rhs1), NULL, "_ifc_");
>    tree c;
>    enum tree_code reduction_op = gimple_assign_rhs_code (reduc);
> -  tree op_nochange = neutral_op_for_reduction (TREE_TYPE (rhs1), reduction_op, NULL);
> +  tree op_nochange = neutral_op_for_reduction (TREE_TYPE (rhs1), reduction_op,
> +					       NULL, false);
>    gimple_seq stmts = NULL;
>
>    if (dump_file && (dump_flags & TDF_DETAILS))
> @@ -1868,19 +1870,36 @@ convert_scalar_cond_reduction (gimple *reduc, gimple_stmt_iterator *gsi,
>        print_gimple_stmt (dump_file, reduc, 0, TDF_SLIM);
>      }
>
> -  /* Build cond expression using COND and constant operand
> -     of reduction rhs.  */
> -  c = fold_build_cond_expr (TREE_TYPE (rhs1),
> -			    unshare_expr (cond),
> -			    swap ? op_nochange : op1,
> -			    swap ? op1 : op_nochange);
> -
> -  /* Create assignment stmt and insert it at GSI.  */
> -  new_assign = gimple_build_assign (tmp, c);
> -  gsi_insert_before (gsi, new_assign, GSI_SAME_STMT);
> -  /* Build rhs for unconditional increment/decrement/logic_operation.  */
> -  rhs = gimple_build (&stmts, reduction_op,
> -		      TREE_TYPE (rhs1), op0, tmp);
> +  /* If possible create a COND_OP instead of a COND_EXPR and an OP_EXPR.
> +     The COND_OP will have a neutral_op else value.  */
> +  internal_fn ifn;
> +  ifn = get_conditional_internal_fn (reduction_op);
> +  if (ifn != IFN_LAST
> +      && vectorized_internal_fn_supported_p (ifn, TREE_TYPE (lhs))
> +      && !swap)
> +    {
> +      gcall *cond_call = gimple_build_call_internal (ifn, 4,
> +						     unshare_expr (cond),
> +						     op0, op1, op0);
> +      gsi_insert_before (gsi, cond_call, GSI_SAME_STMT);
> +      gimple_call_set_lhs (cond_call, tmp);
> +      rhs = tmp;
> +    }
> +  else
> +    {
> +      /* Build cond expression using COND and constant operand
> +	 of reduction rhs.  */
> +      c = fold_build_cond_expr (TREE_TYPE (rhs1),
> +				unshare_expr (cond),
> +				swap ? op_nochange : op1,
> +				swap ? op1 : op_nochange);
> +      /* Create assignment stmt and insert it at GSI.  */
> +      new_assign = gimple_build_assign (tmp, c);
> +      gsi_insert_before (gsi, new_assign, GSI_SAME_STMT);
> +      /* Build rhs for unconditional increment/decrement/logic_operation.  */
> +      rhs = gimple_build (&stmts, reduction_op,
> +			  TREE_TYPE (rhs1), op0, tmp);
> +    }
>
>    if (has_nop)
>      {
> @@ -2292,7 +2311,7 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
>  	{
>  	  /* Convert reduction stmt into vectorizable form.  */
>  	  rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
> -					       swap,has_nop, nop_reduc);
> +					       swap, has_nop, nop_reduc);
>  	  redundant_ssa_names.safe_push (std::make_pair (res, rhs));
>  	}
>        new_stmt = gimple_build_assign (res, rhs);
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index ebab1953b9c..1c455701c73 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3762,7 +3762,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>  static bool
>  fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
>  {
> -  if (code == PLUS_EXPR)
> +  /* We support MINUS_EXPR by negating the operand.  This also preserves an
> +     initial -0.0 since -0.0 - 0.0 (neutral op for MINUS_EXPR) == -0.0 +
> +     (-0.0) = -0.0.  */
> +  if (code == PLUS_EXPR || code == MINUS_EXPR)
>      {
>        *reduc_fn = IFN_FOLD_LEFT_PLUS;
>        return true;
> @@ -3841,23 +3844,29 @@ reduction_fn_for_scalar_code (code_helper code, internal_fn *reduc_fn)
>     by the introduction of additional X elements, return that X, otherwise
>     return null.  CODE is the code of the reduction and SCALAR_TYPE is type
>     of the scalar elements.  If the reduction has just a single initial value
> -   then INITIAL_VALUE is that value, otherwise it is null.  */
> +   then INITIAL_VALUE is that value, otherwise it is null.
> +   If AS_INITIAL is TRUE the value is supposed to be used as initial value.
> +   In that case no signed zero is returned.  */
>
>  tree
>  neutral_op_for_reduction (tree scalar_type, code_helper code,
> -			  tree initial_value)
> +			  tree initial_value, bool as_initial)
>  {
>    if (code.is_tree_code ())
>      switch (tree_code (code))
>        {
> -      case WIDEN_SUM_EXPR:
>        case DOT_PROD_EXPR:
>        case SAD_EXPR:
> -      case PLUS_EXPR:
>        case MINUS_EXPR:
>        case BIT_IOR_EXPR:
>        case BIT_XOR_EXPR:
>  	return build_zero_cst (scalar_type);
> +      case WIDEN_SUM_EXPR:
> +      case PLUS_EXPR:
> +	if (!as_initial && HONOR_SIGNED_ZEROS (scalar_type))
> +	  return build_real (scalar_type, dconstm0);
> +	else
> +	  return build_zero_cst (scalar_type);
>
>        case MULT_EXPR:
>  	return build_one_cst (scalar_type);
> @@ -4085,7 +4094,15 @@ pop:
>  	      || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt))))
>  	  FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
>  	    cnt++;
> -      if (cnt != 1)
> +
> +      bool cond_fn_p = op.code.is_internal_fn ()
> +	&& (conditional_internal_fn_code (internal_fn (*code))
> +	    != ERROR_MARK);
> +
> +      /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have
> +	 op1 twice (once as definition, once as else) in the same operation.
> +	 Allow this.  */
> +      if ((!cond_fn_p && cnt != 1) || (opi == 1 && cond_fn_p && cnt != 2))
>  	{
>  	  fail = true;
>  	  break;
> @@ -4187,8 +4204,14 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
>  	  return NULL;
>  	}
>
> -      nphi_def_loop_uses++;
> -      phi_use_stmt = use_stmt;
> +      /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have
> +	 op1 twice (once as definition, once as else) in the same operation.
> +	 Only count it as one.  */
> +      if (use_stmt != phi_use_stmt)
> +	{
> +	  nphi_def_loop_uses++;
> +	  phi_use_stmt = use_stmt;
> +	}
>      }
>
>    tree latch_def = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
> @@ -6122,7 +6145,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>        gcc_assert (STMT_VINFO_IN_PATTERN_P (orig_stmt_info));
>        gcc_assert (STMT_VINFO_RELATED_STMT (orig_stmt_info) == stmt_info);
>      }
> -  
> +
>    scalar_dest = gimple_get_lhs (orig_stmt_info->stmt);
>    scalar_type = TREE_TYPE (scalar_dest);
>    scalar_results.truncate (0);
> @@ -6459,7 +6482,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>        if (REDUC_GROUP_FIRST_ELEMENT (stmt_info))
>  	initial_value = reduc_info->reduc_initial_values[0];
>        neutral_op = neutral_op_for_reduction (TREE_TYPE (vectype), code,
> -					     initial_value);
> +					     initial_value, false);
>      }
>    if (neutral_op)
>      vector_identity = gimple_build_vector_from_val (&seq, vectype,
> @@ -6941,8 +6964,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>  			       gimple_stmt_iterator *gsi,
>  			       gimple **vec_stmt, slp_tree slp_node,
>  			       gimple *reduc_def_stmt,
> -			       tree_code code, internal_fn reduc_fn,
> -			       tree ops[3], tree vectype_in,
> +			       code_helper code, internal_fn reduc_fn,
> +			       tree *ops, int num_ops, tree vectype_in,
>  			       int reduc_index, vec_loop_masks *masks,
>  			       vec_loop_lens *lens)
>  {
> @@ -6958,17 +6981,48 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>
>    gcc_assert (!nested_in_vect_loop_p (loop, stmt_info));
>    gcc_assert (ncopies == 1);
> -  gcc_assert (TREE_CODE_LENGTH (code) == binary_op);
> +
> +  bool is_cond_op = false;
> +  if (!code.is_tree_code ())
> +    {
> +      code = conditional_internal_fn_code (internal_fn (code));
> +      gcc_assert (code != ERROR_MARK);
> +      is_cond_op = true;
> +    }
> +
> +  gcc_assert (TREE_CODE_LENGTH (tree_code (code)) == binary_op);
>
>    if (slp_node)
> -    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (vectype_out),
> -			  TYPE_VECTOR_SUBPARTS (vectype_in)));
> +    {
> +      if (is_cond_op)
> +	{
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			     "fold-left reduction on SLP not supported.\n");
> +	  return false;
> +	}
>
> -  tree op0 = ops[1 - reduc_index];
> +      gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (vectype_out),
> +			    TYPE_VECTOR_SUBPARTS (vectype_in)));
> +    }
> +
> +  /* The operands either come from a binary operation or an IFN_COND operation.
> +     The former is a gimple assign with binary rhs and the latter is a
> +     gimple call with four arguments.  */
> +  gcc_assert (num_ops == 2 || num_ops == 4);
> +  tree op0, opmask;
> +  if (!is_cond_op)
> +    op0 = ops[1 - reduc_index];
> +  else
> +    {
> +      op0 = ops[2];
> +      opmask = ops[0];
> +      gcc_assert (!slp_node);
> +    }
>
>    int group_size = 1;
>    stmt_vec_info scalar_dest_def_info;
> -  auto_vec<tree> vec_oprnds0;
> +  auto_vec<tree> vec_oprnds0, vec_opmask;
>    if (slp_node)
>      {
>        auto_vec<vec<tree> > vec_defs (2);
> @@ -6984,9 +7038,15 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>        vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
>  				     op0, &vec_oprnds0);
>        scalar_dest_def_info = stmt_info;
> +
> +      /* For an IFN_COND_OP we also need the vector mask operand.  */
> +      if (is_cond_op)
> +	vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
> +				       opmask, &vec_opmask);
>      }
>
> -  tree scalar_dest = gimple_assign_lhs (scalar_dest_def_info->stmt);
> +  gimple *sdef = scalar_dest_def_info->stmt;
> +  tree scalar_dest = gimple_get_lhs (sdef);
>    tree scalar_type = TREE_TYPE (scalar_dest);
>    tree reduc_var = gimple_phi_result (reduc_def_stmt);
>
> @@ -7020,13 +7080,16 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>        tree bias = NULL_TREE;
>        if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>  	mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, i);
> +      else if (is_cond_op)
> +	mask = vec_opmask[0];
>        if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
>  	{
>  	  len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
>  				   i, 1);
>  	  signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
>  	  bias = build_int_cst (intQI_type_node, biasval);
> -	  mask = build_minus_one_cst (truth_type_for (vectype_in));
> +	  if (!is_cond_op)
> +	    mask = build_minus_one_cst (truth_type_for (vectype_in));
>  	}
>
>        /* Handle MINUS by adding the negative.  */
> @@ -7038,7 +7101,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>  	  def0 = negated;
>  	}
>
> -      if (mask && mask_reduc_fn == IFN_LAST)
> +      if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> +	  && mask && mask_reduc_fn == IFN_LAST)
>  	def0 = merge_with_identity (gsi, mask, vectype_out, def0,
>  				    vector_identity);
>
> @@ -7069,8 +7133,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>  	}
>        else
>  	{
> -	  reduc_var = vect_expand_fold_left (gsi, scalar_dest_var, code,
> -					     reduc_var, def0);
> +	  reduc_var = vect_expand_fold_left (gsi, scalar_dest_var,
> +					     tree_code (code), reduc_var, def0);
>  	  new_stmt = SSA_NAME_DEF_STMT (reduc_var);
>  	  /* Remove the statement, so that we can use the same code paths
>  	     as for statements that we've just created.  */
> @@ -7521,8 +7585,13 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>        if (i == STMT_VINFO_REDUC_IDX (stmt_info))
>  	continue;
>
> +      /* For an IFN_COND_OP we might hit the reduction definition operand
> +	 twice (once as definition, once as else).  */
> +      if (op.ops[i] == op.ops[STMT_VINFO_REDUC_IDX (stmt_info)])
> +	continue;
> +
>        /* There should be only one cycle def in the stmt, the one
> -	 leading to reduc_def.  */
>        if (VECTORIZABLE_CYCLE_DEF (dt))
>  	return false;
>
> @@ -7721,6 +7790,15 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>       when generating the code inside the loop.  */
>
>    code_helper orig_code = STMT_VINFO_REDUC_CODE (phi_info);
> +
> +  /* If conversion might have created a conditional operation like
> +     IFN_COND_ADD already.  Use the internal code for the following checks.  */
> +  if (orig_code.is_internal_fn ())
> +    {
> +      tree_code new_code = conditional_internal_fn_code (internal_fn (orig_code));
> +      orig_code = new_code != ERROR_MARK ? new_code : orig_code;
> +    }
> +
>    STMT_VINFO_REDUC_CODE (reduc_info) = orig_code;
>
>    vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
> @@ -7759,7 +7837,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>  	{
>  	  if (dump_enabled_p ())
>  	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -			     "reduction: not commutative/associative");
> +			     "reduction: not commutative/associative\n");
>  	  return false;
>  	}
>      }
> @@ -8143,9 +8221,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>        LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>      }
>    else if (reduction_type == FOLD_LEFT_REDUCTION
> -	   && reduc_fn == IFN_LAST
> +	   && internal_fn_mask_index (reduc_fn) == -1
>  	   && FLOAT_TYPE_P (vectype_in)
> -	   && HONOR_SIGNED_ZEROS (vectype_in)
>  	   && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in))
>      {
>        if (dump_enabled_p ())
> @@ -8294,6 +8371,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>
>    code_helper code = canonicalize_code (op.code, op.type);
>    internal_fn cond_fn = get_conditional_internal_fn (code, op.type);
> +
>    vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
>    vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
>    bool mask_by_cond_expr = use_mask_by_cond_expr_p (code, cond_fn, vectype_in);
> @@ -8312,17 +8390,29 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>    if (code == COND_EXPR)
>      gcc_assert (ncopies == 1);
>
> +  /* A binary COND_OP reduction must have the same definition and else
> +     value.  */
> +  bool cond_fn_p = code.is_internal_fn ()
> +    && conditional_internal_fn_code (internal_fn (code)) != ERROR_MARK;
> +  if (cond_fn_p)
> +    {
> +      gcc_assert (code == IFN_COND_ADD || code == IFN_COND_SUB
> +		  || code == IFN_COND_MUL || code == IFN_COND_AND
> +		  || code == IFN_COND_IOR || code == IFN_COND_XOR);
> +      gcc_assert (op.num_ops == 4 && (op.ops[1] == op.ops[3]));
> +    }
> +
>    bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
>
>    vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
>    if (reduction_type == FOLD_LEFT_REDUCTION)
>      {
>        internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info);
> -      gcc_assert (code.is_tree_code ());
> +      gcc_assert (code.is_tree_code () || cond_fn_p);
>        return vectorize_fold_left_reduction
>  	  (loop_vinfo, stmt_info, gsi, vec_stmt, slp_node, reduc_def_phi,
> -	   tree_code (code), reduc_fn, op.ops, vectype_in, reduc_index, masks,
> -	   lens);
> +	   code, reduc_fn, op.ops, op.num_ops, vectype_in,
> +	   reduc_index, masks, lens);
>      }
>
>    bool single_defuse_cycle = STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
> @@ -8335,14 +8425,20 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>    tree scalar_dest = gimple_get_lhs (stmt_info->stmt);
>    tree vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
>
> +  /* Get NCOPIES vector definitions for all operands except the reduction
> +     definition.  */
>    vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, ncopies,
>  		     single_defuse_cycle && reduc_index == 0
>  		     ? NULL_TREE : op.ops[0], &vec_oprnds0,
>  		     single_defuse_cycle && reduc_index == 1
>  		     ? NULL_TREE : op.ops[1], &vec_oprnds1,
> -		     op.num_ops == 3
> -		     && !(single_defuse_cycle && reduc_index == 2)
> +		     op.num_ops == 4
> +		     || (op.num_ops == 3
> +			 && !(single_defuse_cycle && reduc_index == 2))
>  		     ? op.ops[2] : NULL_TREE, &vec_oprnds2);
> +
> +  /* For single def-use cycles get one copy of the vectorized reduction
> +     definition.  */
>    if (single_defuse_cycle)
>      {
>        gcc_assert (!slp_node);
> @@ -8382,7 +8478,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>  	}
>        else
>  	{
> -	  if (op.num_ops == 3)
> +	  if (op.num_ops >= 3)
>  	    vop[2] = vec_oprnds2[i];
>
>  	  if (masked_loop_p && mask_by_cond_expr)
> @@ -8395,10 +8491,16 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>        if (emulated_mixed_dot_prod)
>  	new_stmt = vect_emulate_mixed_dot_prod (loop_vinfo, stmt_info, gsi,
>  						vec_dest, vop);
> -      else if (code.is_internal_fn ())
> +
> +      else if (code.is_internal_fn () && !cond_fn_p)
>  	new_stmt = gimple_build_call_internal (internal_fn (code),
>  					       op.num_ops,
>  					       vop[0], vop[1], vop[2]);
> +      else if (code.is_internal_fn () && cond_fn_p)
> +	new_stmt = gimple_build_call_internal (internal_fn (code),
> +					       op.num_ops,
> +					       vop[0], vop[1], vop[2],
> +					       vop[1]);
>        else
>  	new_stmt = gimple_build_assign (vec_dest, tree_code (op.code),
>  					vop[0], vop[1], vop[2]);
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index a4043e4a656..254d172231d 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2350,7 +2350,7 @@ extern tree vect_create_addr_base_for_vector_ref (vec_info *,
>  						  tree);
>
>  /* In tree-vect-loop.cc.  */
> -extern tree neutral_op_for_reduction (tree, code_helper, tree);
> +extern tree neutral_op_for_reduction (tree, code_helper, tree, bool = true);
>  extern widest_int vect_iv_limit_for_partial_vectors (loop_vec_info loop_vinfo);
>  bool vect_rgroup_iv_might_wrap_p (loop_vec_info, rgroup_controls *);
>  /* Used in tree-vect-loop-manip.cc */

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)