From: Tamar Christina <Tamar.Christina@arm.com>
To: Robin Dapp <rdapp.gcc@gmail.com>,
	gcc-patches <gcc-patches@gcc.gnu.org>,
	Richard Biener <rguenther@suse.de>
Subject: RE: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
Date: Wed, 4 Oct 2023 15:12:55 +0000
Message-ID: <VI1PR08MB5325F4C648851C5A9C833E87FFCBA@VI1PR08MB5325.eurprd08.prod.outlook.com>
In-Reply-To: <671a575c-02ff-071b-967e-2e93d8986c1a@gmail.com>

Hi Robin,

> -----Original Message-----
> From: Robin Dapp <rdapp.gcc@gmail.com>
> Sent: Wednesday, October 4, 2023 8:54 AM
> To: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Richard Biener <rguenther@suse.de>
> Cc: rdapp.gcc@gmail.com
> Subject: Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
> 
> Hi Tamar,
> 
> > I can't approve but hope you don't mind the review,
> 
> Not at all, greatly appreciated.
> 
> I incorporated all your remarks apart from this:
> 
> > Isn't vec_opmask NULL for SLP? You probably need to read it from
> > vec_defs for the COND_EXPR
> 
> Above that I gcc_assert (!slp_node) for the IFN_COND case.  It doesn't seem to
> be hit during testsuite runs.  I also didn't manage to create an example that
> would trigger it.  When "conditionalizing" an SLP fold-left reduction we don't
> seem to vectorize for a different reason.
> Granted, I didn't look very closely at that reason :)

Yeah, it looks like it's failing because it can't handle the PHI node reduction
for the condition.
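
For reference, the shape I'd expect it to have to handle is a multi-lane
in-order reduction, something like this (hypothetical, since as you say it
doesn't get that far):

  double res0 = init0, res1 = init1;
  for (int i = 0; i < n; i++)
    if (cond[i])
      {
        res0 += a[2 * i];
        res1 += a[2 * i + 1];
      }

compiled with -fno-fast-math so the additions have to stay in order.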

So that's fine for now, but I think we should then bail out of
vectorize_fold_left_reduction in that case, so that we avoid the segfault
once we soon start forcing everything through SLP and add single-lane SLP
support.

So in the

  if (slp_node)
    {

add something like:

  if (is_cond_op)
    {
      if (dump_enabled_p ())
	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
			 "fold-left reduction on SLP not supported.\n");
      return false;
    }

> 
> Bootstrap and testsuite are currently running with the attached v2 on x86,
> aarch64 and powerpc.
> 
> Besides, when thinking about which COND_OPs we expect I tried to loosen the
> restrictions in if-conv by allowing MAX_EXPR and MIN_EXPR.  The emitted
> code on riscv looks correct but I hit a bootstrap ICE on x86 so omitted it for
> now.
> 
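For reference, the MIN/MAX reductions in question would be loops of roughly
this shape (illustrative only, not from the patch):

  if (cond[i])
    res = res < a[i] ? res : a[i];

i.e. a conditional MIN_EXPR that if-conv could then turn into IFN_COND_MIN.
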
> Regards
>  Robin
> 
> 
> Subject: [PATCH v2] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.
> 
> As described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both into a
> masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD during ifcvt and
> adjusting some vectorizer code to handle it.
> 
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS is
> true.
> 
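To make the transformation concrete (made-up SSA names, just to show the
shape): before, if-conversion emitted

  _ifc_1 = c_1 ? a_1 : 0.0;
  res_2 = res_1 + _ifc_1;

and now it emits

  res_2 = .COND_ADD (c_1, res_1, a_1, res_1);

The -0.0 part matters because, with HONOR_SIGNED_ZEROS, the neutral value
for a masked-off PLUS lane must be -0.0: adding 0.0 would turn a -0.0
accumulator into +0.0, while adding -0.0 leaves it alone.
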
> gcc/ChangeLog:
> 
> 	PR middle-end/111401
> 	* internal-fn.cc (cond_fn_p): New function.
> 	* internal-fn.h (cond_fn_p): Define.
> 	* tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
> 	if supported.
> 	(predicate_scalar_phi): Add whitespace.
> 	* tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
> 	(neutral_op_for_reduction): Return -0 for PLUS.
> 	(vect_is_simple_reduction): Don't count else operand in
> 	COND_ADD.
> 	(vectorize_fold_left_reduction): Add COND_ADD handling.
> 	(vectorizable_reduction): Don't count else operand in COND_ADD.
> 	(vect_transform_reduction): Add COND_ADD handling.
> 	* tree-vectorizer.h (neutral_op_for_reduction): Add default
> 	parameter.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
> 	* gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
> ---
>  gcc/internal-fn.cc                            |  17 +++
>  gcc/internal-fn.h                             |   1 +
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 ++++++++++++++++++
>  .../riscv/rvv/autovec/cond/pr111401.c         | 139 +++++++++++++++++
>  gcc/tree-if-conv.cc                           |  63 ++++++--
>  gcc/tree-vect-loop.cc                         | 129 ++++++++++++----
>  gcc/tree-vectorizer.h                         |   2 +-
>  7 files changed, 450 insertions(+), 42 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 61d5a9e4772..9b38dc0cef4 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4245,6 +4245,23 @@ first_commutative_argument (internal_fn fn)
>      }
>  }
> 
> +/* Return true if this CODE describes a conditional (masked) internal_fn.  */
> +
> +bool
> +cond_fn_p (code_helper code)
> +{
> +  if (!code.is_fn_code ())
> +    return false;
> +
> +  if (!internal_fn_p ((combined_fn) code))
> +    return false;
> +
> +  internal_fn fn = as_internal_fn ((combined_fn) code);
> +
> +  return conditional_internal_fn_code (fn) != ERROR_MARK;
> +}
> +
> +
>  /* Return true if this CODE describes an internal_fn that returns a vector with
>     elements twice as wide as the element size of the input vectors.  */
> 

The only comment I have is whether you actually need this helper function.
It looks like at all of its uses you either already have the result of
conditional_internal_fn_code or could call it directly.

e.g. in vect_transform_reduction you can replace it with

  bool cond_fn_p = code.is_internal_fn ()
		   && conditional_internal_fn_code (internal_fn (code)) != ERROR_MARK;

and in

  if (cond_fn_p (orig_code))
    orig_code = conditional_internal_fn_code (internal_fn (orig_code));

just

  tree_code new_code = conditional_internal_fn_code (internal_fn (orig_code));
  if (new_code != ERROR_MARK)
    orig_code = new_code;

which would save the repeated testing of the condition.

The patch looks good to me with those two changes, but I can't approve 😊

Thanks for working on this!
Tamar

> diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
> index 99de13a0199..f1cc9db29c0 100644
> --- a/gcc/internal-fn.h
> +++ b/gcc/internal-fn.h
> @@ -219,6 +219,7 @@ extern bool commutative_ternary_fn_p (internal_fn);
>  extern int first_commutative_argument (internal_fn);
>  extern bool associative_binary_fn_p (internal_fn);
>  extern bool widening_fn_p (code_helper);
> +extern bool cond_fn_p (code_helper code);
> 
>  extern bool set_edom_supported_p (void);
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> new file mode 100644
> index 00000000000..7b46e7d8a2a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> @@ -0,0 +1,141 @@
> +/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-add-options ieee } */
> +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */
> +
> +#include "tree-vect.h"
> +
> +#include <math.h>
> +
> +#define N (VECTOR_BITS * 17)
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_plus_double (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_plus_double_ref (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_minus_double (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      res -= a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_minus_double_ref (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      res -= a[i];
> +  return res;
> +}
> +
> +int __attribute__ ((optimize (1)))
> +main ()
> +{
> +  int n = 19;
> +  double a[N];
> +  int cond1[N], cond2[N];
> +
> +  for (int i = 0; i < N; i++)
> +    {
> +      a[i] = (i * 0.1) * (i & 1 ? 1 : -1);
> +      cond1[i] = 0;
> +      cond2[i] = i & 4 ? 1 : 0;
> +      asm volatile ("" ::: "memory");
> +    }
> +
> +  double res1 = reduc_plus_double (a, -0.0, cond1, n);
> +  double ref1 = reduc_plus_double_ref (a, -0.0, cond1, n);
> +  double res2 = reduc_minus_double (a, -0.0, cond1, n);
> +  double ref2 = reduc_minus_double_ref (a, -0.0, cond1, n);
> +  double res3 = reduc_plus_double (a, -0.0, cond1, n);
> +  double ref3 = reduc_plus_double_ref (a, -0.0, cond1, n);
> +  double res4 = reduc_minus_double (a, -0.0, cond1, n);
> +  double ref4 = reduc_minus_double_ref (a, -0.0, cond1, n);
> +
> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
> +    __builtin_abort ();
> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
> +    __builtin_abort ();
> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
> +    __builtin_abort ();
> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
> +    __builtin_abort ();
> +
> +  res1 = reduc_plus_double (a, 0.0, cond1, n);
> +  ref1 = reduc_plus_double_ref (a, 0.0, cond1, n);
> +  res2 = reduc_minus_double (a, 0.0, cond1, n);
> +  ref2 = reduc_minus_double_ref (a, 0.0, cond1, n);
> +  res3 = reduc_plus_double (a, 0.0, cond1, n);
> +  ref3 = reduc_plus_double_ref (a, 0.0, cond1, n);
> +  res4 = reduc_minus_double (a, 0.0, cond1, n);
> +  ref4 = reduc_minus_double_ref (a, 0.0, cond1, n);
> +
> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
> +    __builtin_abort ();
> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
> +    __builtin_abort ();
> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
> +    __builtin_abort ();
> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
> +    __builtin_abort ();
> +
> +  res1 = reduc_plus_double (a, -0.0, cond2, n);
> +  ref1 = reduc_plus_double_ref (a, -0.0, cond2, n);
> +  res2 = reduc_minus_double (a, -0.0, cond2, n);
> +  ref2 = reduc_minus_double_ref (a, -0.0, cond2, n);
> +  res3 = reduc_plus_double (a, -0.0, cond2, n);
> +  ref3 = reduc_plus_double_ref (a, -0.0, cond2, n);
> +  res4 = reduc_minus_double (a, -0.0, cond2, n);
> +  ref4 = reduc_minus_double_ref (a, -0.0, cond2, n);
> +
> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
> +    __builtin_abort ();
> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
> +    __builtin_abort ();
> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
> +    __builtin_abort ();
> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
> +    __builtin_abort ();
> +
> +  res1 = reduc_plus_double (a, 0.0, cond2, n);
> +  ref1 = reduc_plus_double_ref (a, 0.0, cond2, n);
> +  res2 = reduc_minus_double (a, 0.0, cond2, n);
> +  ref2 = reduc_minus_double_ref (a, 0.0, cond2, n);
> +  res3 = reduc_plus_double (a, 0.0, cond2, n);
> +  ref3 = reduc_plus_double_ref (a, 0.0, cond2, n);
> +  res4 = reduc_minus_double (a, 0.0, cond2, n);
> +  ref4 = reduc_minus_double_ref (a, 0.0, cond2, n);
> +
> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
> +    __builtin_abort ();
> +  if (res2 != ref2 || signbit (res2) != signbit (ref2))
> +    __builtin_abort ();
> +  if (res3 != ref3 || signbit (res3) != signbit (ref3))
> +    __builtin_abort ();
> +  if (res4 != ref4 || signbit (res4) != signbit (ref4))
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
> new file mode 100644
> index 00000000000..8f1cb0d68de
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
> @@ -0,0 +1,139 @@
> +/* { dg-do run { target { riscv_v } } } */
> +/* { dg-additional-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
> +
> +double
> +__attribute__ ((noipa))
> +foo2 (double *__restrict a, double init, int *__restrict cond, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      init += a[i];
> +  return init;
> +}
> +
> +double
> +__attribute__ ((noipa))
> +foo3 (double *__restrict a, double init, int *__restrict cond, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      init -= a[i];
> +  return init;
> +}
> +
> +double
> +__attribute__ ((noipa))
> +foo4 (double *__restrict a, double init, int *__restrict cond, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      init *= a[i];
> +  return init;
> +}
> +
> +int
> +__attribute__ ((noipa))
> +foo5 (int *__restrict a, int init, int *__restrict cond, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      init &= a[i];
> +  return init;
> +}
> +
> +int
> +__attribute__ ((noipa))
> +foo6 (int *__restrict a, int init, int *__restrict cond, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      init |= a[i];
> +  return init;
> +}
> +
> +int
> +__attribute__ ((noipa))
> +foo7 (int *__restrict a, int init, int *__restrict cond, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    if (cond[i])
> +      init ^= a[i];
> +  return init;
> +}
> +
> +#define SZ 125
> +
> +int
> +main ()
> +{
> +  double res1 = 0, res2 = 0, res3 = 0;
> +  double a1[SZ], a2[SZ], a3[SZ];
> +  int c1[SZ], c2[SZ], c3[SZ];
> +
> +  int a4[SZ], a5[SZ], a6[SZ], a7[SZ], a8[SZ];
> +  int res4 = 0, res5 = 0, res6 = 0;
> +  int c4[SZ], c5[SZ], c6[SZ];
> +
> +  for (int i = 0; i < SZ; i++)
> +    {
> +      a1[i] = i * 3 + (i & 4) - (i & 7);
> +      a2[i] = i * 3 + (i & 4) - (i & 7);
> +      a3[i] = i * 0.05 + (i & 4) - (i & 7);
> +      a4[i] = i * 3 + (i & 4) - (i & 7);
> +      a5[i] = i * 3 + (i & 4) - (i & 7);
> +      a6[i] = i * 3 + (i & 4) - (i & 7);
> +      c1[i] = i & 1;
> +      c2[i] = i & 2;
> +      c3[i] = i & 3;
> +      c4[i] = i & 4;
> +      c5[i] = i & 5;
> +      c6[i] = i & 6;
> +      __asm__ volatile ("" : : : "memory");
> +    }
> +
> +  double init1 = 2.7, init2 = 8.2, init3 = 0.1;
> +  double ref1 = init1, ref2 = init2, ref3 = init3;
> +
> +  int init4 = 87, init5 = 11, init6 = -123894344, init7 = -2, init8 = 854893;
> +  int ref4 = init4, ref5 = init5, ref6 = init6;
> +
> +#pragma GCC novector
> +  for (int i = 0; i < SZ; i++)
> +    {
> +      if (c1[i])
> +        ref1 += a1[i];
> +      if (c2[i])
> +        ref2 -= a2[i];
> +      if (c3[i])
> +        ref3 *= a3[i];
> +      if (c4[i])
> +        ref4 &= a4[i];
> +      if (c5[i])
> +        ref5 |= a5[i];
> +      if (c6[i])
> +        ref6 ^= a6[i];
> +    }
> +
> +  res1 = foo2 (a1, init1, c1, SZ);
> +  res2 = foo3 (a2, init2, c2, SZ);
> +  res3 = foo4 (a3, init3, c3, SZ);
> +  res4 = foo5 (a4, init4, c4, SZ);
> +  res5 = foo6 (a5, init5, c5, SZ);
> +  res6 = foo7 (a6, init6, c6, SZ);
> +
> +  if (res1 != ref1)
> +    __builtin_abort ();
> +  if (res2 != ref2)
> +    __builtin_abort ();
> +  if (res3 != ref3)
> +    __builtin_abort ();
> +  if (res4 != ref4)
> +    __builtin_abort ();
> +  if (res5 != ref5)
> +    __builtin_abort ();
> +  if (res6 != ref6)
> +    __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 5 "vect" } } */
> +/* { dg-final { scan-tree-dump-not "VCOND_MASK" "vect" } } */
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index a8c915913ae..b334173794d 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -1852,10 +1852,12 @@ convert_scalar_cond_reduction (gimple *reduc, gimple_stmt_iterator *gsi,
>    gimple *new_assign;
>    tree rhs;
>    tree rhs1 = gimple_assign_rhs1 (reduc);
> +  tree lhs = gimple_assign_lhs (reduc);
>    tree tmp = make_temp_ssa_name (TREE_TYPE (rhs1), NULL, "_ifc_");
>    tree c;
>    enum tree_code reduction_op  = gimple_assign_rhs_code (reduc);
> -  tree op_nochange = neutral_op_for_reduction (TREE_TYPE (rhs1), reduction_op, NULL);
> +  tree op_nochange = neutral_op_for_reduction (TREE_TYPE (rhs1), reduction_op,
> +					       NULL, false);
>    gimple_seq stmts = NULL;
> 
>    if (dump_file && (dump_flags & TDF_DETAILS))
> @@ -1864,19 +1866,52 @@ convert_scalar_cond_reduction (gimple *reduc, gimple_stmt_iterator *gsi,
>        print_gimple_stmt (dump_file, reduc, 0, TDF_SLIM);
>      }
> 
> -  /* Build cond expression using COND and constant operand
> -     of reduction rhs.  */
> -  c = fold_build_cond_expr (TREE_TYPE (rhs1),
> -			    unshare_expr (cond),
> -			    swap ? op_nochange : op1,
> -			    swap ? op1 : op_nochange);
> +  /* If possible try to create an IFN_COND_ADD instead of a COND_EXPR and
> +     a PLUS_EXPR.  Don't do this if the reduction def operand itself is
> +     a vectorizable call as we can create a COND version of it directly.  */
> +  internal_fn ifn;
> +  ifn = get_conditional_internal_fn (reduction_op);
> 
> -  /* Create assignment stmt and insert it at GSI.  */
> -  new_assign = gimple_build_assign (tmp, c);
> -  gsi_insert_before (gsi, new_assign, GSI_SAME_STMT);
> -  /* Build rhs for unconditional increment/decrement/logic_operation.  */
> -  rhs = gimple_build (&stmts, reduction_op,
> -		      TREE_TYPE (rhs1), op0, tmp);
> +  bool try_cond_op = true;
> +  gimple *opstmt;
> +  if (TREE_CODE (op1) == SSA_NAME
> +      && (opstmt = SSA_NAME_DEF_STMT (op1))
> +      && is_gimple_call (opstmt))
> +    {
> +      combined_fn cfn = gimple_call_combined_fn (opstmt);
> +      internal_fn ifnop;
> +      reduction_fn_for_scalar_code (cfn, &ifnop);
> +      if (vectorized_internal_fn_supported_p (ifnop, TREE_TYPE
> +					      (gimple_call_lhs (opstmt))))
> +	try_cond_op = false;
> +    }
> +
> +  if (ifn != IFN_LAST
> +      && vectorized_internal_fn_supported_p (ifn, TREE_TYPE (lhs))
> +      && try_cond_op && !swap)
> +    {
> +      gcall *cond_call = gimple_build_call_internal (ifn, 4,
> +						     unshare_expr (cond),
> +						     op0, op1, op0);
> +      gsi_insert_before (gsi, cond_call, GSI_SAME_STMT);
> +      gimple_call_set_lhs (cond_call, tmp);
> +      rhs = tmp;
> +    }
> +  else
> +    {
> +      /* Build cond expression using COND and constant operand
> +	 of reduction rhs.  */
> +      c = fold_build_cond_expr (TREE_TYPE (rhs1),
> +				unshare_expr (cond),
> +				swap ? op_nochange : op1,
> +				swap ? op1 : op_nochange);
> +      /* Create assignment stmt and insert it at GSI.  */
> +      new_assign = gimple_build_assign (tmp, c);
> +      gsi_insert_before (gsi, new_assign, GSI_SAME_STMT);
> +      /* Build rhs for unconditional increment/decrement/logic_operation.  */
> +      rhs = gimple_build (&stmts, reduction_op,
> +			  TREE_TYPE (rhs1), op0, tmp);
> +    }
> 
>    if (has_nop)
>      {
> @@ -2241,7 +2276,7 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
>  	{
>  	  /* Convert reduction stmt into vectorizable form.  */
>  	  rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
> -					       swap,has_nop, nop_reduc);
> +					       swap, has_nop, nop_reduc);
>  	  redundant_ssa_names.safe_push (std::make_pair (res, rhs));
>  	}
>        new_stmt = gimple_build_assign (res, rhs);
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 23c6e8259e7..d370793cfcb 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3672,7 +3672,7 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>  static bool
>  fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
>  {
> -  if (code == PLUS_EXPR)
> +  if (code == PLUS_EXPR || code == MINUS_EXPR)
>      {
>        *reduc_fn = IFN_FOLD_LEFT_PLUS;
>        return true;
> @@ -3751,23 +3751,29 @@ reduction_fn_for_scalar_code (code_helper code, internal_fn *reduc_fn)
>     by the introduction of additional X elements, return that X, otherwise
>     return null.  CODE is the code of the reduction and SCALAR_TYPE is type
>     of the scalar elements.  If the reduction has just a single initial value
> -   then INITIAL_VALUE is that value, otherwise it is null.  */
> +   then INITIAL_VALUE is that value, otherwise it is null.
> +   If AS_INITIAL is TRUE the value is supposed to be used as initial value.
> +   In that case no signed zero is returned.  */
> 
>  tree
>  neutral_op_for_reduction (tree scalar_type, code_helper code,
> -			  tree initial_value)
> +			  tree initial_value, bool as_initial)
>  {
>    if (code.is_tree_code ())
>      switch (tree_code (code))
>        {
> -      case WIDEN_SUM_EXPR:
>        case DOT_PROD_EXPR:
>        case SAD_EXPR:
> -      case PLUS_EXPR:
>        case MINUS_EXPR:
>        case BIT_IOR_EXPR:
>        case BIT_XOR_EXPR:
>  	return build_zero_cst (scalar_type);
> +      case WIDEN_SUM_EXPR:
> +      case PLUS_EXPR:
> +	if (!as_initial && HONOR_SIGNED_ZEROS (scalar_type))
> +	  return build_real (scalar_type, dconstm0);
> +	else
> +	  return build_zero_cst (scalar_type);
> 
>        case MULT_EXPR:
>  	return build_one_cst (scalar_type);
> @@ -4106,8 +4112,14 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
>            return NULL;
>          }
> 
> -      nphi_def_loop_uses++;
> -      phi_use_stmt = use_stmt;
> +      /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have
> +	 op1 twice (once as definition, once as else) in the same operation.
> +	 Only count it as one. */
> +      if (use_stmt != phi_use_stmt)
> +	{
> +	  nphi_def_loop_uses++;
> +	  phi_use_stmt = use_stmt;
> +	}
>      }
> 
>    tree latch_def = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
> @@ -6041,7 +6053,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>        gcc_assert (STMT_VINFO_IN_PATTERN_P (orig_stmt_info));
>        gcc_assert (STMT_VINFO_RELATED_STMT (orig_stmt_info) == stmt_info);
>      }
> -
> +
>    scalar_dest = gimple_get_lhs (orig_stmt_info->stmt);
>    scalar_type = TREE_TYPE (scalar_dest);
>    scalar_results.truncate (0);
> @@ -6378,7 +6390,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>  	  if (REDUC_GROUP_FIRST_ELEMENT (stmt_info))
>  	    initial_value = reduc_info->reduc_initial_values[0];
>  	  neutral_op = neutral_op_for_reduction (TREE_TYPE (vectype), code,
> -						 initial_value);
> +						 initial_value, false);
>  	}
>        if (neutral_op)
>  	vector_identity = gimple_build_vector_from_val (&seq, vectype,
> @@ -6860,8 +6872,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>  			       gimple_stmt_iterator *gsi,
>  			       gimple **vec_stmt, slp_tree slp_node,
>  			       gimple *reduc_def_stmt,
> -			       tree_code code, internal_fn reduc_fn,
> -			       tree ops[3], tree vectype_in,
> +			       code_helper code, internal_fn reduc_fn,
> +			       tree *ops, int num_ops, tree vectype_in,
>  			       int reduc_index, vec_loop_masks *masks,
>  			       vec_loop_lens *lens)
>  {
> @@ -6877,17 +6889,38 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
> 
>    gcc_assert (!nested_in_vect_loop_p (loop, stmt_info));
>    gcc_assert (ncopies == 1);
> -  gcc_assert (TREE_CODE_LENGTH (code) == binary_op);
> +
> +  bool is_cond_op = false;
> +  if (!code.is_tree_code ())
> +    {
> +      gcc_assert (cond_fn_p (code));
> +      is_cond_op = true;
> +      code = conditional_internal_fn_code (internal_fn (code));
> +    }
> +
> +  gcc_assert (TREE_CODE_LENGTH (tree_code (code)) == binary_op);
> 
>    if (slp_node)
>      gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (vectype_out),
>  			  TYPE_VECTOR_SUBPARTS (vectype_in)));
> 
> -  tree op0 = ops[1 - reduc_index];
> +  /* The operands either come from a binary operation or an IFN_COND operation.
> +     The former is a gimple assign with binary rhs and the latter is a
> +     gimple call with four arguments.  */
> +  gcc_assert (num_ops == 2 || num_ops == 4);
> +  tree op0, opmask;
> +  if (!is_cond_op)
> +    op0 = ops[1 - reduc_index];
> +  else
> +    {
> +      op0 = ops[2];
> +      opmask = ops[0];
> +      gcc_assert (!slp_node);
> +    }
> 
>    int group_size = 1;
>    stmt_vec_info scalar_dest_def_info;
> -  auto_vec<tree> vec_oprnds0;
> +  auto_vec<tree> vec_oprnds0, vec_opmask;
>    if (slp_node)
>      {
>        auto_vec<vec<tree> > vec_defs (2);
> @@ -6903,9 +6936,15 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>        vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
>  				     op0, &vec_oprnds0);
>        scalar_dest_def_info = stmt_info;
> +
> +      /* For an IFN_COND_OP we also need the vector mask operand.  */
> +      if (is_cond_op)
> +	  vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
> +					 opmask, &vec_opmask);
>      }
> 
> -  tree scalar_dest = gimple_assign_lhs (scalar_dest_def_info->stmt);
> +  gimple *sdef = scalar_dest_def_info->stmt;
> +  tree scalar_dest = gimple_get_lhs (sdef);
>    tree scalar_type = TREE_TYPE (scalar_dest);
>    tree reduc_var = gimple_phi_result (reduc_def_stmt);
> 
> @@ -6939,13 +6978,16 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>        tree bias = NULL_TREE;
>        if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>  	mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, i);
> +      else if (is_cond_op)
> +	mask = vec_opmask[0];
>        if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
>  	{
>  	  len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
>  				   i, 1);
>  	  signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
>  	  bias = build_int_cst (intQI_type_node, biasval);
> -	  mask = build_minus_one_cst (truth_type_for (vectype_in));
> +	  if (!is_cond_op)
> +	    mask = build_minus_one_cst (truth_type_for (vectype_in));
>  	}
> 
>        /* Handle MINUS by adding the negative.  */
> @@ -6957,7 +6999,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>  	  def0 = negated;
>  	}
> 
> -      if (mask && mask_reduc_fn == IFN_LAST)
> +      if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> +	  && mask && mask_reduc_fn == IFN_LAST)
>  	def0 = merge_with_identity (gsi, mask, vectype_out, def0,
>  				    vector_identity);
> 
> @@ -6988,8 +7031,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>  	}
>        else
>  	{
> -	  reduc_var = vect_expand_fold_left (gsi, scalar_dest_var, code,
> -					     reduc_var, def0);
> +	  reduc_var = vect_expand_fold_left (gsi, scalar_dest_var,
> +					     tree_code (code), reduc_var, def0);
>  	  new_stmt = SSA_NAME_DEF_STMT (reduc_var);
>  	  /* Remove the statement, so that we can use the same code paths
>  	     as for statements that we've just created.  */
> @@ -7440,6 +7483,11 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>        if (i == STMT_VINFO_REDUC_IDX (stmt_info))
>  	continue;
> 
> +      /* For an IFN_COND_OP we might hit the reduction definition operand
> +	 twice (once as definition, once as else).  */
> +      if (op.ops[i] == op.ops[STMT_VINFO_REDUC_IDX (stmt_info)])
> +	continue;
> +
>        /* There should be only one cycle def in the stmt, the one
>           leading to reduc_def.  */
>        if (VECTORIZABLE_CYCLE_DEF (dt))
> @@ -7640,6 +7688,12 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>            when generating the code inside the loop.  */
> 
>    code_helper orig_code = STMT_VINFO_REDUC_CODE (phi_info);
> +
> +  /* If conversion might have created a conditional operation like
> +     IFN_COND_ADD already.  Use the internal code for the following checks.  */
> +  if (cond_fn_p (orig_code))
> +    orig_code = conditional_internal_fn_code (internal_fn (orig_code));
> +
>    STMT_VINFO_REDUC_CODE (reduc_info) = orig_code;
> 
>    vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
> @@ -7678,7 +7732,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>  	{
>  	  if (dump_enabled_p ())
>  	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -			    "reduction: not commutative/associative");
> +			    "reduction: not commutative/associative\n");
>  	  return false;
>  	}
>      }
> @@ -8213,6 +8267,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
> 
>    code_helper code = canonicalize_code (op.code, op.type);
>    internal_fn cond_fn = get_conditional_internal_fn (code, op.type);
> +
>    vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
>    vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
>    bool mask_by_cond_expr = use_mask_by_cond_expr_p (code, cond_fn, vectype_in);
> @@ -8231,17 +8286,25 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>    if (code == COND_EXPR)
>      gcc_assert (ncopies == 1);
> 
> +  /* A binary COND_OP reduction must have the same definition and else
> +     value. */
> +  if (cond_fn_p (code))
> +    {
> +      gcc_assert (code == IFN_COND_ADD || code == IFN_COND_SUB);
> +      gcc_assert (op.num_ops == 4 && (op.ops[1] == op.ops[3]));
> +    }
> +
>    bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> 
>    vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
>    if (reduction_type == FOLD_LEFT_REDUCTION)
>      {
>        internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info);
> -      gcc_assert (code.is_tree_code ());
> +      gcc_assert (code.is_tree_code () || cond_fn_p (code));
>        return vectorize_fold_left_reduction
>  	  (loop_vinfo, stmt_info, gsi, vec_stmt, slp_node, reduc_def_phi,
> -	   tree_code (code), reduc_fn, op.ops, vectype_in, reduc_index, masks,
> -	   lens);
> +	   code, reduc_fn, op.ops, op.num_ops, vectype_in,
> +	   reduc_index, masks, lens);
>      }
> 
>    bool single_defuse_cycle = STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
> @@ -8254,14 +8317,20 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>    tree scalar_dest = gimple_get_lhs (stmt_info->stmt);
>    tree vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
> 
> +  /* Get NCOPIES vector definitions for all operands except the reduction
> +     definition.  */
>    vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, ncopies,
>  		     single_defuse_cycle && reduc_index == 0
>  		     ? NULL_TREE : op.ops[0], &vec_oprnds0,
>  		     single_defuse_cycle && reduc_index == 1
>  		     ? NULL_TREE : op.ops[1], &vec_oprnds1,
> -		     op.num_ops == 3
> -		     && !(single_defuse_cycle && reduc_index == 2)
> +		     op.num_ops == 4
> +		     || (op.num_ops == 3
> +			 && !(single_defuse_cycle && reduc_index == 2))
>  		     ? op.ops[2] : NULL_TREE, &vec_oprnds2);
> +
> +  /* For single def-use cycles get one copy of the vectorized reduction
> +     definition.  */
>    if (single_defuse_cycle)
>      {
>        gcc_assert (!slp_node);
> @@ -8301,7 +8370,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>  	}
>        else
>  	{
> -	  if (op.num_ops == 3)
> +	  if (op.num_ops >= 3)
>  	    vop[2] = vec_oprnds2[i];
> 
>  	  if (masked_loop_p && mask_by_cond_expr)
> @@ -8314,10 +8383,16 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>  	  if (emulated_mixed_dot_prod)
>  	    new_stmt = vect_emulate_mixed_dot_prod (loop_vinfo, stmt_info, gsi,
>  						    vec_dest, vop);
> -	  else if (code.is_internal_fn ())
> +
> +	  else if (code.is_internal_fn () && !cond_fn_p (code))
>  	    new_stmt = gimple_build_call_internal (internal_fn (code),
>  						   op.num_ops,
>  						   vop[0], vop[1], vop[2]);
> +	  else if (cond_fn_p (code))
> +	    new_stmt = gimple_build_call_internal (internal_fn (code),
> +						   op.num_ops,
> +						   vop[0], vop[1], vop[2],
> +						   vop[1]);
>  	  else
>  	    new_stmt = gimple_build_assign (vec_dest, tree_code (op.code),
>  					    vop[0], vop[1], vop[2]);
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index f1d0cd79961..e22067400af 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2319,7 +2319,7 @@ extern tree vect_create_addr_base_for_vector_ref (vec_info *,
>  						  tree);
> 
>  /* In tree-vect-loop.cc.  */
> -extern tree neutral_op_for_reduction (tree, code_helper, tree);
> +extern tree neutral_op_for_reduction (tree, code_helper, tree, bool = true);
>  extern widest_int vect_iv_limit_for_partial_vectors (loop_vec_info loop_vinfo);
>  bool vect_rgroup_iv_might_wrap_p (loop_vec_info, rgroup_controls *);
>  /* Used in tree-vect-loop-manip.cc */
> --
> 2.41.0
> 



Thread overview: 32+ messages
2023-09-20 13:51 Robin Dapp
2023-09-27  0:44 ` Tamar Christina
2023-10-04  7:54   ` Robin Dapp
2023-10-04 13:15     ` Robin Dapp
2023-10-04 15:12     ` Tamar Christina [this message]
2023-10-05  8:54       ` Robin Dapp
2023-10-05  9:02       ` Robin Dapp
2023-10-05 14:05       ` Robin Dapp
2023-10-05 14:15         ` Tamar Christina
2023-10-06  9:10         ` Richard Biener
2023-10-06 12:28           ` Robin Dapp
2023-10-06 12:30             ` Robin Dapp
2023-10-06 13:43               ` Richard Biener
2023-10-06 20:54                 ` Robin Dapp
2023-10-09  8:25                   ` Richard Biener
2023-10-09 12:54                     ` Robin Dapp
2023-10-09 13:05                       ` Richard Biener
2023-10-09  5:50         ` Richard Sandiford
2023-10-09 12:02           ` Robin Dapp
2023-10-09 14:57             ` Richard Sandiford
2023-10-11 19:15               ` Robin Dapp
2023-10-12 10:47                 ` Richard Sandiford
2023-10-12 11:11                 ` Richard Biener
2023-10-19 20:07                   ` Robin Dapp
2023-10-23 10:53                     ` Richard Biener
2023-10-24 11:11                       ` Richard Sandiford
2023-10-24 19:56                         ` Robin Dapp
2023-10-31 21:04                           ` Richard Sandiford
2023-10-31 21:19                             ` Robin Dapp
2023-11-02  7:48                               ` Richard Biener
2023-09-27 11:42 ` Richard Biener
2023-11-02 23:26 ` Andrew Pinski
