From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: Robin Dapp <rdapp.gcc@gmail.com>,
	 Tamar Christina <Tamar.Christina@arm.com>,
	 gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
Date: Tue, 24 Oct 2023 12:11:35 +0100
Message-ID: <mptfs20p95k.fsf@arm.com>
In-Reply-To: <nycvar.YFH.7.77.849.2310231045160.5106@jbgna.fhfr.qr> (Richard Biener's message of "Mon, 23 Oct 2023 10:53:42 +0000 (UTC)")

Richard Biener <rguenther@suse.de> writes:
> On Thu, 19 Oct 2023, Robin Dapp wrote:
>
>> Ugh, I didn't push yet because with a rebased trunk I am
>> seeing different behavior for some riscv testcases.
>> 
>> A reduction is not recognized because there is yet another
>> "double use" occurrence in check_reduction_path.  I guess it's
>> reasonable to loosen the restriction for conditional operations
>> here as well.
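>> 
>> As an illustration (with made-up GIMPLE names), such a conditional
>> reduction looks like
>> 
>>   _4 = .COND_ADD (_mask, _1, _2, _1);
>> 
>> where the reduction operand _1 occurs twice in the same statement,
>> once as the accumulator and once as the "else" value, so its use
>> count is 2 rather than the 1 that check_reduction_path used to
>> require.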
>> 
>> The only change to v4 therefore is:
>> 
>> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
>> index ebab1953b9c..64654a55e4c 100644
>> --- a/gcc/tree-vect-loop.cc
>> +++ b/gcc/tree-vect-loop.cc
>> @@ -4085,7 +4094,15 @@ pop:
>>                 || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt))))
>>           FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
>>             cnt++;
>> -      if (cnt != 1)
>> +
>> +      bool cond_fn_p = op.code.is_internal_fn ()
>> +       && (conditional_internal_fn_code (internal_fn (*code))
>> +           != ERROR_MARK);
>> +
>> +      /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have
>> +        op1 twice (once as definition, once as else) in the same operation.
>> +        Allow this.  */
>> +      if ((!cond_fn_p && cnt != 1) || (opi == 1 && cond_fn_p && cnt != 2))
>> 
>> Bootstrapped and regtested again on x86, aarch64 and power10.
>> Testsuite on riscv unchanged.
>
> Hmm, why opi == 1 only?  I think
>
> # _1 = PHI <.., _4>
>  _3 = .COND_ADD (_1, _2, _1);
>  _4 = .COND_ADD (_3, _5, _3);
>
> would be fine as well.  I think we want to simply ignore the 'else' value
> of conditional internal functions.  I suppose we have unary, binary
> and ternary conditional functions - I miss an internal_fn_else_index,
> but I suppose it's always the last one?

Yeah, it was always the last one before the introduction of .COND_LEN.
I agree internal_fn_else_index would be useful now.
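
Something along these lines, perhaps (a rough sketch only -- the
.COND_LEN_* handling and the exact set of unary/ternary IFNs would
need checking against internal-fn.def):

  /* Return the index of the "else" argument of conditional internal
     function FN, or -1 if FN is not a conditional internal function.
     Argument 0 is the mask; for the .COND_LEN_* variants the length
     and bias arguments come after the else value, so the index would
     be the same as for the corresponding .COND_* function.  */
  int
  internal_fn_else_index (internal_fn fn)
  {
    switch (fn)
      {
      case IFN_COND_NEG:        /* mask, op, else  */
        return 2;
      case IFN_COND_ADD:        /* mask, op1, op2, else  */
      case IFN_COND_SUB:
      case IFN_COND_MUL:
      case IFN_COND_AND:
      case IFN_COND_IOR:
      case IFN_COND_XOR:
        return 3;
      case IFN_COND_FMA:        /* mask, op1, op2, op3, else  */
      case IFN_COND_FMS:
        return 4;
      default:
        return -1;
      }
  }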

Thanks,
Richard

>
> I think a single use on .COND functions is also OK, even when it is
> on the 'else' value only?  But maybe that's not too important here.
>
> Maybe
>
>       gimple *op_use_stmt;
>       unsigned cnt = 0;
>       FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op.ops[opi])
>         if (.. op_use_stmt is a conditional internal function call ..)
>           {
>             gcall *call = as_a <gcall *> (op_use_stmt);
>             /* Ignore the trailing 'else' value.  */
>             for (unsigned j = 0; j < gimple_call_num_args (call) - 1; ++j)
>               if (gimple_call_arg (call, j) == op.ops[opi])
>                 cnt++;
>           }
>         else if (!is_gimple_debug (op_use_stmt)
>             && (*code != ERROR_MARK
>                 || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt))))
>           FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
>             cnt++;
>
> ?
>
>> Regards
>>  Robin
>> 
>> Subject: [PATCH v5] ifcvt/vect: Emit COND_OP for conditional scalar reduction.
>> 
>> As described in PR111401 we currently emit a COND and a PLUS expression
>> for conditional reductions.  This makes it difficult to combine both
>> into a masked reduction statement later.
>> This patch improves that by directly emitting a COND_ADD/COND_OP during
>> ifcvt and adjusting some vectorizer code to handle it.
>> 
>> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
>> is true.
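>> 
>> For illustration (simplified GIMPLE with made-up SSA names), for a
>> loop body like
>> 
>>   if (cond[i])
>>     res += a[i];
>> 
>> ifcvt used to emit
>> 
>>   _ifc_1 = _cond ? _a : 0.0;
>>   res_2 = res_1 + _ifc_1;
>> 
>> and with this patch emits
>> 
>>   res_2 = .COND_ADD (_cond, res_1, _a, res_1);
>> 
>> with the reduction variable doubling as the "else" value.  The -0.0
>> neutral value matters because x + (-0.0) == x for every x including
>> x == -0.0, whereas -0.0 + 0.0 == +0.0 would lose the sign.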
>> 
>> gcc/ChangeLog:
>> 
>> 	PR middle-end/111401
>> 	* tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP
>> 	if supported.
>> 	(predicate_scalar_phi): Add whitespace.
>> 	* tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP.
>> 	(neutral_op_for_reduction): Return -0 for PLUS.
>> 	(check_reduction_path): Don't count else operand in COND_OP.
>> 	(vect_is_simple_reduction): Ditto.
>> 	(vect_create_epilog_for_reduction): Fix whitespace.
>> 	(vectorize_fold_left_reduction): Add COND_OP handling.
>> 	(vectorizable_reduction): Don't count else operand in COND_OP.
>> 	(vect_transform_reduction): Add COND_OP handling.
>> 	* tree-vectorizer.h (neutral_op_for_reduction): Add default
>> 	parameter.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> 	* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
>> 	* gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
>> 	* gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
>> 	* gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
>> ---
>>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 +++++++++++++++
>>  .../riscv/rvv/autovec/cond/pr111401.c         | 139 +++++++++++++++
>>  .../riscv/rvv/autovec/reduc/reduc_call-2.c    |   4 +-
>>  .../riscv/rvv/autovec/reduc/reduc_call-4.c    |   4 +-
>>  gcc/tree-if-conv.cc                           |  49 +++--
>>  gcc/tree-vect-loop.cc                         | 168 ++++++++++++++----
>>  gcc/tree-vectorizer.h                         |   2 +-
>>  7 files changed, 456 insertions(+), 51 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
>> 
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>> new file mode 100644
>> index 00000000000..7b46e7d8a2a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>> @@ -0,0 +1,141 @@
>> +/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
>> +/* { dg-do run } */
>> +/* { dg-require-effective-target vect_double } */
>> +/* { dg-add-options ieee } */
>> +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */
>> +
>> +#include "tree-vect.h"
>> +
>> +#include <math.h>
>> +
>> +#define N (VECTOR_BITS * 17)
>> +
>> +double __attribute__ ((noinline, noclone))
>> +reduc_plus_double (double *restrict a, double init, int *cond, int n)
>> +{
>> +  double res = init;
>> +  for (int i = 0; i < n; i++)
>> +    if (cond[i])
>> +      res += a[i];
>> +  return res;
>> +}
>> +
>> +double __attribute__ ((noinline, noclone, optimize ("0")))
>> +reduc_plus_double_ref (double *restrict a, double init, int *cond, int n)
>> +{
>> +  double res = init;
>> +  for (int i = 0; i < n; i++)
>> +    if (cond[i])
>> +      res += a[i];
>> +  return res;
>> +}
>> +
>> +double __attribute__ ((noinline, noclone))
>> +reduc_minus_double (double *restrict a, double init, int *cond, int n)
>> +{
>> +  double res = init;
>> +  for (int i = 0; i < n; i++)
>> +    if (cond[i])
>> +      res -= a[i];
>> +  return res;
>> +}
>> +
>> +double __attribute__ ((noinline, noclone, optimize ("0")))
>> +reduc_minus_double_ref (double *restrict a, double init, int *cond, int n)
>> +{
>> +  double res = init;
>> +  for (int i = 0; i < n; i++)
>> +    if (cond[i])
>> +      res -= a[i];
>> +  return res;
>> +}
>> +
>> +int __attribute__ ((optimize (1)))
>> +main ()
>> +{
>> +  int n = 19;
>> +  double a[N];
>> +  int cond1[N], cond2[N];
>> +
>> +  for (int i = 0; i < N; i++)
>> +    {
>> +      a[i] = (i * 0.1) * (i & 1 ? 1 : -1);
>> +      cond1[i] = 0;
>> +      cond2[i] = i & 4 ? 1 : 0;
>> +      asm volatile ("" ::: "memory");
>> +    }
>> +
>> +  double res1 = reduc_plus_double (a, -0.0, cond1, n);
>> +  double ref1 = reduc_plus_double_ref (a, -0.0, cond1, n);
>> +  double res2 = reduc_minus_double (a, -0.0, cond1, n);
>> +  double ref2 = reduc_minus_double_ref (a, -0.0, cond1, n);
>> +  double res3 = reduc_plus_double (a, -0.0, cond1, n);
>> +  double ref3 = reduc_plus_double_ref (a, -0.0, cond1, n);
>> +  double res4 = reduc_minus_double (a, -0.0, cond1, n);
>> +  double ref4 = reduc_minus_double_ref (a, -0.0, cond1, n);
>> +
>> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
>> +    __builtin_abort ();
>> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
>> +    __builtin_abort ();
>> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
>> +    __builtin_abort ();
>> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
>> +    __builtin_abort ();
>> +
>> +  res1 = reduc_plus_double (a, 0.0, cond1, n);
>> +  ref1 = reduc_plus_double_ref (a, 0.0, cond1, n);
>> +  res2 = reduc_minus_double (a, 0.0, cond1, n);
>> +  ref2 = reduc_minus_double_ref (a, 0.0, cond1, n);
>> +  res3 = reduc_plus_double (a, 0.0, cond1, n);
>> +  ref3 = reduc_plus_double_ref (a, 0.0, cond1, n);
>> +  res4 = reduc_minus_double (a, 0.0, cond1, n);
>> +  ref4 = reduc_minus_double_ref (a, 0.0, cond1, n);
>> +
>> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
>> +    __builtin_abort ();
>> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
>> +    __builtin_abort ();
>> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
>> +    __builtin_abort ();
>> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
>> +    __builtin_abort ();
>> +
>> +  res1 = reduc_plus_double (a, -0.0, cond2, n);
>> +  ref1 = reduc_plus_double_ref (a, -0.0, cond2, n);
>> +  res2 = reduc_minus_double (a, -0.0, cond2, n);
>> +  ref2 = reduc_minus_double_ref (a, -0.0, cond2, n);
>> +  res3 = reduc_plus_double (a, -0.0, cond2, n);
>> +  ref3 = reduc_plus_double_ref (a, -0.0, cond2, n);
>> +  res4 = reduc_minus_double (a, -0.0, cond2, n);
>> +  ref4 = reduc_minus_double_ref (a, -0.0, cond2, n);
>> +
>> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
>> +    __builtin_abort ();
>> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
>> +    __builtin_abort ();
>> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
>> +    __builtin_abort ();
>> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
>> +    __builtin_abort ();
>> +
>> +  res1 = reduc_plus_double (a, 0.0, cond2, n);
>> +  ref1 = reduc_plus_double_ref (a, 0.0, cond2, n);
>> +  res2 = reduc_minus_double (a, 0.0, cond2, n);
>> +  ref2 = reduc_minus_double_ref (a, 0.0, cond2, n);
>> +  res3 = reduc_plus_double (a, 0.0, cond2, n);
>> +  ref3 = reduc_plus_double_ref (a, 0.0, cond2, n);
>> +  res4 = reduc_minus_double (a, 0.0, cond2, n);
>> +  ref4 = reduc_minus_double_ref (a, 0.0, cond2, n);
>> +
>> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
>> +    __builtin_abort ();
>> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
>> +    __builtin_abort ();
>> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
>> +    __builtin_abort ();
>> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
>> +    __builtin_abort ();
>> +
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
>> new file mode 100644
>> index 00000000000..83dbd61b3f3
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
>> @@ -0,0 +1,139 @@
>> +/* { dg-do run { target { riscv_v } } } */
>> +/* { dg-additional-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
>> +
>> +double
>> +__attribute__ ((noipa))
>> +foo2 (double *__restrict a, double init, int *__restrict cond, int n)
>> +{
>> +  for (int i = 0; i < n; i++)
>> +    if (cond[i])
>> +      init += a[i];
>> +  return init;
>> +}
>> +
>> +double
>> +__attribute__ ((noipa))
>> +foo3 (double *__restrict a, double init, int *__restrict cond, int n)
>> +{
>> +  for (int i = 0; i < n; i++)
>> +    if (cond[i])
>> +      init -= a[i];
>> +  return init;
>> +}
>> +
>> +double
>> +__attribute__ ((noipa))
>> +foo4 (double *__restrict a, double init, int *__restrict cond, int n)
>> +{
>> +  for (int i = 0; i < n; i++)
>> +    if (cond[i])
>> +      init *= a[i];
>> +  return init;
>> +}
>> +
>> +int
>> +__attribute__ ((noipa))
>> +foo5 (int *__restrict a, int init, int *__restrict cond, int n)
>> +{
>> +  for (int i = 0; i < n; i++)
>> +    if (cond[i])
>> +      init &= a[i];
>> +  return init;
>> +}
>> +
>> +int
>> +__attribute__ ((noipa))
>> +foo6 (int *__restrict a, int init, int *__restrict cond, int n)
>> +{
>> +  for (int i = 0; i < n; i++)
>> +    if (cond[i])
>> +      init |= a[i];
>> +  return init;
>> +}
>> +
>> +int
>> +__attribute__ ((noipa))
>> +foo7 (int *__restrict a, int init, int *__restrict cond, int n)
>> +{
>> +  for (int i = 0; i < n; i++)
>> +    if (cond[i])
>> +      init ^= a[i];
>> +  return init;
>> +}
>> +
>> +#define SZ 125
>> +
>> +int
>> +main ()
>> +{
>> +  double res1 = 0, res2 = 0, res3 = 0;
>> +  double a1[SZ], a2[SZ], a3[SZ];
>> +  int c1[SZ], c2[SZ], c3[SZ];
>> +
>> +  int a4[SZ], a5[SZ], a6[SZ];
>> +  int res4 = 0, res5 = 0, res6 = 0;
>> +  int c4[SZ], c5[SZ], c6[SZ];
>> +
>> +  for (int i = 0; i < SZ; i++)
>> +    {
>> +      a1[i] = i * 3 + (i & 4) - (i & 7);
>> +      a2[i] = i * 3 + (i & 4) - (i & 7);
>> +      a3[i] = i * 0.05 + (i & 4) - (i & 7);
>> +      a4[i] = i * 3 + (i & 4) - (i & 7);
>> +      a5[i] = i * 3 + (i & 4) - (i & 7);
>> +      a6[i] = i * 3 + (i & 4) - (i & 7);
>> +      c1[i] = i & 1;
>> +      c2[i] = i & 2;
>> +      c3[i] = i & 3;
>> +      c4[i] = i & 4;
>> +      c5[i] = i & 5;
>> +      c6[i] = i & 6;
>> +      __asm__ volatile ("" : : : "memory");
>> +    }
>> +
>> +  double init1 = 2.7, init2 = 8.2, init3 = 0.1;
>> +  double ref1 = init1, ref2 = init2, ref3 = init3;
>> +
>> +  int init4 = 87, init5 = 11, init6 = -123894344;
>> +  int ref4 = init4, ref5 = init5, ref6 = init6;
>> +
>> +#pragma GCC novector
>> +  for (int i = 0; i < SZ; i++)
>> +    {
>> +      if (c1[i])
>> +        ref1 += a1[i];
>> +      if (c2[i])
>> +        ref2 -= a2[i];
>> +      if (c3[i])
>> +        ref3 *= a3[i];
>> +      if (c4[i])
>> +        ref4 &= a4[i];
>> +      if (c5[i])
>> +        ref5 |= a5[i];
>> +      if (c6[i])
>> +        ref6 ^= a6[i];
>> +    }
>> +
>> +  res1 = foo2 (a1, init1, c1, SZ);
>> +  res2 = foo3 (a2, init2, c2, SZ);
>> +  res3 = foo4 (a3, init3, c3, SZ);
>> +  res4 = foo5 (a4, init4, c4, SZ);
>> +  res5 = foo6 (a5, init5, c5, SZ);
>> +  res6 = foo7 (a6, init6, c6, SZ);
>> +
>> +  if (res1 != ref1)
>> +    __builtin_abort ();
>> +  if (res2 != ref2)
>> +    __builtin_abort ();
>> +  if (res3 != ref3)
>> +    __builtin_abort ();
>> +  if (res4 != ref4)
>> +    __builtin_abort ();
>> +  if (res5 != ref5)
>> +    __builtin_abort ();
>> +  if (res6 != ref6)
>> +    __builtin_abort ();
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 5 "vect" } } */
>> +/* { dg-final { scan-tree-dump-not "VCOND_MASK" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c
>> index cc07a047cd5..7be22d60bf2 100644
>> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c
>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c
>> @@ -3,4 +3,6 @@
>>  
>>  #include "reduc_call-1.c"
>>  
>> -/* { dg-final { scan-assembler-times {vfmacc\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+,v0.t} 1 } } */
>> +/* { dg-final { scan-assembler-times {vfmadd\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+} 1 } } */
>> +/* { dg-final { scan-assembler-times {vfadd\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+,v0.t} 1 } } */
>> +/* { dg-final { scan-assembler-not {vmerge} } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c
>> index 6d00c404d2a..83beabeff97 100644
>> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c
>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c
>> @@ -3,4 +3,6 @@
>>  
>>  #include "reduc_call-1.c"
>>  
>> -/* { dg-final { scan-assembler {vfmacc\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+,v0.t} } } */
>> +/* { dg-final { scan-assembler {vfmadd\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+} } } */
>> +/* { dg-final { scan-assembler {vfadd\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+,v0.t} } } */
>> +/* { dg-final { scan-assembler-not {vmerge} } } */
>> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
>> index c381d14b801..9571351805c 100644
>> --- a/gcc/tree-if-conv.cc
>> +++ b/gcc/tree-if-conv.cc
>> @@ -1856,10 +1856,12 @@ convert_scalar_cond_reduction (gimple *reduc, gimple_stmt_iterator *gsi,
>>    gimple *new_assign;
>>    tree rhs;
>>    tree rhs1 = gimple_assign_rhs1 (reduc);
>> +  tree lhs = gimple_assign_lhs (reduc);
>>    tree tmp = make_temp_ssa_name (TREE_TYPE (rhs1), NULL, "_ifc_");
>>    tree c;
>>    enum tree_code reduction_op  = gimple_assign_rhs_code (reduc);
>> -  tree op_nochange = neutral_op_for_reduction (TREE_TYPE (rhs1), reduction_op, NULL);
>> +  tree op_nochange = neutral_op_for_reduction (TREE_TYPE (rhs1), reduction_op,
>> +					       NULL, false);
>>    gimple_seq stmts = NULL;
>>  
>>    if (dump_file && (dump_flags & TDF_DETAILS))
>> @@ -1868,19 +1870,36 @@ convert_scalar_cond_reduction (gimple *reduc, gimple_stmt_iterator *gsi,
>>        print_gimple_stmt (dump_file, reduc, 0, TDF_SLIM);
>>      }
>>  
>> -  /* Build cond expression using COND and constant operand
>> -     of reduction rhs.  */
>> -  c = fold_build_cond_expr (TREE_TYPE (rhs1),
>> -			    unshare_expr (cond),
>> -			    swap ? op_nochange : op1,
>> -			    swap ? op1 : op_nochange);
>> -
>> -  /* Create assignment stmt and insert it at GSI.  */
>> -  new_assign = gimple_build_assign (tmp, c);
>> -  gsi_insert_before (gsi, new_assign, GSI_SAME_STMT);
>> -  /* Build rhs for unconditional increment/decrement/logic_operation.  */
>> -  rhs = gimple_build (&stmts, reduction_op,
>> -		      TREE_TYPE (rhs1), op0, tmp);
>> +  /* If possible, create a COND_OP instead of a COND_EXPR and an OP_EXPR.
>> +     The COND_OP will have a neutral_op else value.  */
>> +  internal_fn ifn;
>> +  ifn = get_conditional_internal_fn (reduction_op);
>> +  if (ifn != IFN_LAST
>> +      && vectorized_internal_fn_supported_p (ifn, TREE_TYPE (lhs))
>> +      && !swap)
>> +    {
>> +      gcall *cond_call = gimple_build_call_internal (ifn, 4,
>> +						     unshare_expr (cond),
>> +						     op0, op1, op0);
>> +      gsi_insert_before (gsi, cond_call, GSI_SAME_STMT);
>> +      gimple_call_set_lhs (cond_call, tmp);
>> +      rhs = tmp;
>> +    }
>> +  else
>> +    {
>> +      /* Build cond expression using COND and constant operand
>> +	 of reduction rhs.  */
>> +      c = fold_build_cond_expr (TREE_TYPE (rhs1),
>> +				unshare_expr (cond),
>> +				swap ? op_nochange : op1,
>> +				swap ? op1 : op_nochange);
>> +      /* Create assignment stmt and insert it at GSI.  */
>> +      new_assign = gimple_build_assign (tmp, c);
>> +      gsi_insert_before (gsi, new_assign, GSI_SAME_STMT);
>> +      /* Build rhs for unconditional increment/decrement/logic_operation.  */
>> +      rhs = gimple_build (&stmts, reduction_op,
>> +			  TREE_TYPE (rhs1), op0, tmp);
>> +    }
>>  
>>    if (has_nop)
>>      {
>> @@ -2292,7 +2311,7 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
>>  	{
>>  	  /* Convert reduction stmt into vectorizable form.  */
>>  	  rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
>> -					       swap,has_nop, nop_reduc);
>> +					       swap, has_nop, nop_reduc);
>>  	  redundant_ssa_names.safe_push (std::make_pair (res, rhs));
>>  	}
>>        new_stmt = gimple_build_assign (res, rhs);
>> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
>> index ebab1953b9c..1c455701c73 100644
>> --- a/gcc/tree-vect-loop.cc
>> +++ b/gcc/tree-vect-loop.cc
>> @@ -3762,7 +3762,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>>  static bool
>>  fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
>>  {
>> -  if (code == PLUS_EXPR)
>> +  /* We support MINUS_EXPR by negating the operand.  This also preserves an
>> +     initial -0.0 since -0.0 - 0.0 (the neutral op for MINUS_EXPR)
>> +     == -0.0 + (-0.0) == -0.0.  */
>> +  if (code == PLUS_EXPR || code == MINUS_EXPR)
>>      {
>>        *reduc_fn = IFN_FOLD_LEFT_PLUS;
>>        return true;
>> @@ -3841,23 +3844,29 @@ reduction_fn_for_scalar_code (code_helper code, internal_fn *reduc_fn)
>>     by the introduction of additional X elements, return that X, otherwise
>>     return null.  CODE is the code of the reduction and SCALAR_TYPE is type
>>     of the scalar elements.  If the reduction has just a single initial value
>> -   then INITIAL_VALUE is that value, otherwise it is null.  */
>> +   then INITIAL_VALUE is that value, otherwise it is null.
>> +   If AS_INITIAL is TRUE the value is supposed to be used as the initial
>> +   value.  In that case no signed zero is returned.  */
>>  
>>  tree
>>  neutral_op_for_reduction (tree scalar_type, code_helper code,
>> -			  tree initial_value)
>> +			  tree initial_value, bool as_initial)
>>  {
>>    if (code.is_tree_code ())
>>      switch (tree_code (code))
>>        {
>> -      case WIDEN_SUM_EXPR:
>>        case DOT_PROD_EXPR:
>>        case SAD_EXPR:
>> -      case PLUS_EXPR:
>>        case MINUS_EXPR:
>>        case BIT_IOR_EXPR:
>>        case BIT_XOR_EXPR:
>>  	return build_zero_cst (scalar_type);
>> +      case WIDEN_SUM_EXPR:
>> +      case PLUS_EXPR:
>> +	if (!as_initial && HONOR_SIGNED_ZEROS (scalar_type))
>> +	  return build_real (scalar_type, dconstm0);
>> +	else
>> +	  return build_zero_cst (scalar_type);
>>  
>>        case MULT_EXPR:
>>  	return build_one_cst (scalar_type);
>> @@ -4085,7 +4094,15 @@ pop:
>>  		|| flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt))))
>>  	  FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
>>  	    cnt++;
>> -      if (cnt != 1)
>> +
>> +      bool cond_fn_p = op.code.is_internal_fn ()
>> +	&& (conditional_internal_fn_code (internal_fn (*code))
>> +	    != ERROR_MARK);
>> +
>> +      /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have
>> +	 op1 twice (once as definition, once as else) in the same operation.
>> +	 Allow this.  */
>> +      if ((!cond_fn_p && cnt != 1) || (opi == 1 && cond_fn_p && cnt != 2))
>>  	{
>>  	  fail = true;
>>  	  break;
>> @@ -4187,8 +4204,14 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
>>            return NULL;
>>          }
>>  
>> -      nphi_def_loop_uses++;
>> -      phi_use_stmt = use_stmt;
>> +      /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have
>> +	 op1 twice (once as definition, once as else) in the same operation.
>> +	 Only count it as one.  */
>> +      if (use_stmt != phi_use_stmt)
>> +	{
>> +	  nphi_def_loop_uses++;
>> +	  phi_use_stmt = use_stmt;
>> +	}
>>      }
>>  
>>    tree latch_def = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
>> @@ -6122,7 +6145,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>>        gcc_assert (STMT_VINFO_IN_PATTERN_P (orig_stmt_info));
>>        gcc_assert (STMT_VINFO_RELATED_STMT (orig_stmt_info) == stmt_info);
>>      }
>> -  
>> +
>>    scalar_dest = gimple_get_lhs (orig_stmt_info->stmt);
>>    scalar_type = TREE_TYPE (scalar_dest);
>>    scalar_results.truncate (0);
>> @@ -6459,7 +6482,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>>  	  if (REDUC_GROUP_FIRST_ELEMENT (stmt_info))
>>  	    initial_value = reduc_info->reduc_initial_values[0];
>>  	  neutral_op = neutral_op_for_reduction (TREE_TYPE (vectype), code,
>> -						 initial_value);
>> +						 initial_value, false);
>>  	}
>>        if (neutral_op)
>>  	vector_identity = gimple_build_vector_from_val (&seq, vectype,
>> @@ -6941,8 +6964,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>>  			       gimple_stmt_iterator *gsi,
>>  			       gimple **vec_stmt, slp_tree slp_node,
>>  			       gimple *reduc_def_stmt,
>> -			       tree_code code, internal_fn reduc_fn,
>> -			       tree ops[3], tree vectype_in,
>> +			       code_helper code, internal_fn reduc_fn,
>> +			       tree *ops, int num_ops, tree vectype_in,
>>  			       int reduc_index, vec_loop_masks *masks,
>>  			       vec_loop_lens *lens)
>>  {
>> @@ -6958,17 +6981,48 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>>  
>>    gcc_assert (!nested_in_vect_loop_p (loop, stmt_info));
>>    gcc_assert (ncopies == 1);
>> -  gcc_assert (TREE_CODE_LENGTH (code) == binary_op);
>> +
>> +  bool is_cond_op = false;
>> +  if (!code.is_tree_code ())
>> +    {
>> +      code = conditional_internal_fn_code (internal_fn (code));
>> +      gcc_assert (code != ERROR_MARK);
>> +      is_cond_op = true;
>> +    }
>> +
>> +  gcc_assert (TREE_CODE_LENGTH (tree_code (code)) == binary_op);
>>  
>>    if (slp_node)
>> -    gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (vectype_out),
>> -			  TYPE_VECTOR_SUBPARTS (vectype_in)));
>> +    {
>> +      if (is_cond_op)
>> +	{
>> +	  if (dump_enabled_p ())
>> +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> +			     "fold-left reduction on SLP not supported.\n");
>> +	  return false;
>> +	}
>>  
>> -  tree op0 = ops[1 - reduc_index];
>> +      gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (vectype_out),
>> +			    TYPE_VECTOR_SUBPARTS (vectype_in)));
>> +    }
>> +
>> +  /* The operands either come from a binary operation or an IFN_COND operation.
>> +     The former is a gimple assign with binary rhs and the latter is a
>> +     gimple call with four arguments.  */
>> +  gcc_assert (num_ops == 2 || num_ops == 4);
>> +  tree op0, opmask;
>> +  if (!is_cond_op)
>> +    op0 = ops[1 - reduc_index];
>> +  else
>> +    {
>> +      op0 = ops[2];
>> +      opmask = ops[0];
>> +      gcc_assert (!slp_node);
>> +    }
>>  
>>    int group_size = 1;
>>    stmt_vec_info scalar_dest_def_info;
>> -  auto_vec<tree> vec_oprnds0;
>> +  auto_vec<tree> vec_oprnds0, vec_opmask;
>>    if (slp_node)
>>      {
>>        auto_vec<vec<tree> > vec_defs (2);
>> @@ -6984,9 +7038,15 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>>        vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
>>  				     op0, &vec_oprnds0);
>>        scalar_dest_def_info = stmt_info;
>> +
>> +      /* For an IFN_COND_OP we also need the vector mask operand.  */
>> +      if (is_cond_op)
>> +	  vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
>> +					 opmask, &vec_opmask);
>>      }
>>  
>> -  tree scalar_dest = gimple_assign_lhs (scalar_dest_def_info->stmt);
>> +  gimple *sdef = scalar_dest_def_info->stmt;
>> +  tree scalar_dest = gimple_get_lhs (sdef);
>>    tree scalar_type = TREE_TYPE (scalar_dest);
>>    tree reduc_var = gimple_phi_result (reduc_def_stmt);
>>  
>> @@ -7020,13 +7080,16 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>>        tree bias = NULL_TREE;
>>        if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>>  	mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, i);
>> +      else if (is_cond_op)
>> +	mask = vec_opmask[0];
>>        if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
>>  	{
>>  	  len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
>>  				   i, 1);
>>  	  signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
>>  	  bias = build_int_cst (intQI_type_node, biasval);
>> -	  mask = build_minus_one_cst (truth_type_for (vectype_in));
>> +	  if (!is_cond_op)
>> +	    mask = build_minus_one_cst (truth_type_for (vectype_in));
>>  	}
>>  
>>        /* Handle MINUS by adding the negative.  */
>> @@ -7038,7 +7101,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>>  	  def0 = negated;
>>  	}
>>  
>> -      if (mask && mask_reduc_fn == IFN_LAST)
>> +      if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
>> +	  && mask && mask_reduc_fn == IFN_LAST)
>>  	def0 = merge_with_identity (gsi, mask, vectype_out, def0,
>>  				    vector_identity);
>>  
>> @@ -7069,8 +7133,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>>  	}
>>        else
>>  	{
>> -	  reduc_var = vect_expand_fold_left (gsi, scalar_dest_var, code,
>> -					     reduc_var, def0);
>> +	  reduc_var = vect_expand_fold_left (gsi, scalar_dest_var,
>> +					     tree_code (code), reduc_var, def0);
>>  	  new_stmt = SSA_NAME_DEF_STMT (reduc_var);
>>  	  /* Remove the statement, so that we can use the same code paths
>>  	     as for statements that we've just created.  */
>> @@ -7521,8 +7585,13 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>>        if (i == STMT_VINFO_REDUC_IDX (stmt_info))
>>  	continue;
>>  
>> +      /* For an IFN_COND_OP we might hit the reduction definition operand
>> +	 twice (once as definition, once as else).  */
>> +      if (op.ops[i] == op.ops[STMT_VINFO_REDUC_IDX (stmt_info)])
>> +	continue;
>> +
>>        /* There should be only one cycle def in the stmt, the one
>> -         leading to reduc_def.  */
>> +	 leading to reduc_def.  */
>>        if (VECTORIZABLE_CYCLE_DEF (dt))
>>  	return false;
>>  
>> @@ -7721,6 +7790,15 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>>            when generating the code inside the loop.  */
>>  
>>    code_helper orig_code = STMT_VINFO_REDUC_CODE (phi_info);
>> +
>> +  /* If-conversion might already have created a conditional operation like
>> +     IFN_COND_ADD.  Use the internal code for the following checks.  */
>> +  if (orig_code.is_internal_fn ())
>> +    {
>> +      tree_code new_code = conditional_internal_fn_code (internal_fn (orig_code));
>> +      orig_code = new_code != ERROR_MARK ? new_code : orig_code;
>> +    }
>> +
>>    STMT_VINFO_REDUC_CODE (reduc_info) = orig_code;
>>  
>>    vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
>> @@ -7759,7 +7837,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>>  	{
>>  	  if (dump_enabled_p ())
>>  	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> -			    "reduction: not commutative/associative");
>> +			    "reduction: not commutative/associative\n");
>>  	  return false;
>>  	}
>>      }
>> @@ -8143,9 +8221,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>>  	  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>>  	}
>>        else if (reduction_type == FOLD_LEFT_REDUCTION
>> -	       && reduc_fn == IFN_LAST
>> +	       && internal_fn_mask_index (reduc_fn) == -1
>>  	       && FLOAT_TYPE_P (vectype_in)
>> -	       && HONOR_SIGNED_ZEROS (vectype_in)
>>  	       && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in))
>>  	{
>>  	  if (dump_enabled_p ())
>> @@ -8294,6 +8371,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>>  
>>    code_helper code = canonicalize_code (op.code, op.type);
>>    internal_fn cond_fn = get_conditional_internal_fn (code, op.type);
>> +
>>    vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
>>    vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
>>    bool mask_by_cond_expr = use_mask_by_cond_expr_p (code, cond_fn, vectype_in);
>> @@ -8312,17 +8390,29 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>>    if (code == COND_EXPR)
>>      gcc_assert (ncopies == 1);
>>  
>> +  /* A binary COND_OP reduction must have the same definition and else
>> +     value.  */
>> +  bool cond_fn_p = code.is_internal_fn ()
>> +    && conditional_internal_fn_code (internal_fn (code)) != ERROR_MARK;
>> +  if (cond_fn_p)
>> +    {
>> +      gcc_assert (code == IFN_COND_ADD || code == IFN_COND_SUB
>> +		  || code == IFN_COND_MUL || code == IFN_COND_AND
>> +		  || code == IFN_COND_IOR || code == IFN_COND_XOR);
>> +      gcc_assert (op.num_ops == 4 && (op.ops[1] == op.ops[3]));
>> +    }
>> +
>>    bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
>>  
>>    vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
>>    if (reduction_type == FOLD_LEFT_REDUCTION)
>>      {
>>        internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info);
>> -      gcc_assert (code.is_tree_code ());
>> +      gcc_assert (code.is_tree_code () || cond_fn_p);
>>        return vectorize_fold_left_reduction
>>  	  (loop_vinfo, stmt_info, gsi, vec_stmt, slp_node, reduc_def_phi,
>> -	   tree_code (code), reduc_fn, op.ops, vectype_in, reduc_index, masks,
>> -	   lens);
>> +	   code, reduc_fn, op.ops, op.num_ops, vectype_in,
>> +	   reduc_index, masks, lens);
>>      }
>>  
>>    bool single_defuse_cycle = STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
>> @@ -8335,14 +8425,20 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>>    tree scalar_dest = gimple_get_lhs (stmt_info->stmt);
>>    tree vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
>>  
>> +  /* Get NCOPIES vector definitions for all operands except the reduction
>> +     definition.  */
>>    vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, ncopies,
>>  		     single_defuse_cycle && reduc_index == 0
>>  		     ? NULL_TREE : op.ops[0], &vec_oprnds0,
>>  		     single_defuse_cycle && reduc_index == 1
>>  		     ? NULL_TREE : op.ops[1], &vec_oprnds1,
>> -		     op.num_ops == 3
>> -		     && !(single_defuse_cycle && reduc_index == 2)
>> +		     op.num_ops == 4
>> +		     || (op.num_ops == 3
>> +			 && !(single_defuse_cycle && reduc_index == 2))
>>  		     ? op.ops[2] : NULL_TREE, &vec_oprnds2);
>> +
>> +  /* For single def-use cycles get one copy of the vectorized reduction
>> +     definition.  */
>>    if (single_defuse_cycle)
>>      {
>>        gcc_assert (!slp_node);
>> @@ -8382,7 +8478,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>>  	}
>>        else
>>  	{
>> -	  if (op.num_ops == 3)
>> +	  if (op.num_ops >= 3)
>>  	    vop[2] = vec_oprnds2[i];
>>  
>>  	  if (masked_loop_p && mask_by_cond_expr)
>> @@ -8395,10 +8491,16 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>>  	  if (emulated_mixed_dot_prod)
>>  	    new_stmt = vect_emulate_mixed_dot_prod (loop_vinfo, stmt_info, gsi,
>>  						    vec_dest, vop);
>> -	  else if (code.is_internal_fn ())
>> +
>> +	  else if (code.is_internal_fn () && !cond_fn_p)
>>  	    new_stmt = gimple_build_call_internal (internal_fn (code),
>>  						   op.num_ops,
>>  						   vop[0], vop[1], vop[2]);
>> +	  else if (code.is_internal_fn () && cond_fn_p)
>> +	    new_stmt = gimple_build_call_internal (internal_fn (code),
>> +						   op.num_ops,
>> +						   vop[0], vop[1], vop[2],
>> +						   vop[1]);
>>  	  else
>>  	    new_stmt = gimple_build_assign (vec_dest, tree_code (op.code),
>>  					    vop[0], vop[1], vop[2]);
>> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
>> index a4043e4a656..254d172231d 100644
>> --- a/gcc/tree-vectorizer.h
>> +++ b/gcc/tree-vectorizer.h
>> @@ -2350,7 +2350,7 @@ extern tree vect_create_addr_base_for_vector_ref (vec_info *,
>>  						  tree);
>>  
>>  /* In tree-vect-loop.cc.  */
>> -extern tree neutral_op_for_reduction (tree, code_helper, tree);
>> +extern tree neutral_op_for_reduction (tree, code_helper, tree, bool = true);
>>  extern widest_int vect_iv_limit_for_partial_vectors (loop_vec_info loop_vinfo);
>>  bool vect_rgroup_iv_might_wrap_p (loop_vec_info, rgroup_controls *);
>>  /* Used in tree-vect-loop-manip.cc */
>> 

