public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.
@ 2023-09-08  9:01 Robin Dapp
  2023-09-11 20:35 ` Robin Dapp
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2023-09-08  9:01 UTC (permalink / raw)
  To: gcc-patches; +Cc: rdapp.gcc, richard.sandiford

Hi,

found in slp-reduc-7.c, this patch prevents optimizing e.g.
 COND_LEN_ADD ({-1, ... }, a, 0, c, len, bias)
unconditionally into just "a".

Currently, we assume that COND_LEN operations can be optimized similarly
to COND operations.  As the length is part of the mask (and usually not
compile-time constant), we must not perform any optimization that relies
on just the mask being "true".

Bootstrap and testsuite are unchanged on aarch64 and x86.

Regards
 Robin

gcc/ChangeLog:

	* gimple-match-exports.cc (maybe_resimplify_conditional_op):
	Check for length masking.
---
 gcc/gimple-match-exports.cc | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index b36027b0bad..73be9f4f4c3 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -262,7 +262,8 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
   if (!res_op->cond.cond)
     return false;
 
-  if (!res_op->cond.else_value
+  if (!res_op->cond.len
+      && !res_op->cond.else_value
       && res_op->code.is_tree_code ())
     {
       /* The "else" value doesn't matter.  If the "then" value is a
@@ -301,9 +302,12 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
 
   /* If the "then" value is a gimple value and the "else" value matters,
      create a VEC_COND_EXPR between them, then see if it can be further
-     simplified.  */
+     simplified.
+     Don't do this if we have a COND_LEN_ as that would make us lose the
+     length masking.  */
   gimple_match_op new_op;
-  if (res_op->cond.else_value
+  if (!res_op->cond.len
+      && res_op->cond.else_value
       && VECTOR_TYPE_P (res_op->type)
       && gimple_simplified_result_is_gimple_val (res_op))
     {
@@ -314,7 +318,7 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
       return gimple_resimplify3 (seq, res_op, valueize);
     }
 
-  /* Otherwise try rewriting the operation as an IFN_COND_* call.
+  /* Otherwise try rewriting the operation as an IFN_COND_(LEN_)* call.
      Again, this isn't a simplification in itself, since it's what
      RES_OP already described.  */
   if (convert_conditional_op (res_op, &new_op))
-- 
2.41.0


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.
  2023-09-08  9:01 [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN Robin Dapp
@ 2023-09-11 20:35 ` Robin Dapp
  2023-09-18 10:22   ` Robin Dapp
  2023-10-12 13:53   ` Richard Sandiford
  0 siblings, 2 replies; 31+ messages in thread
From: Robin Dapp @ 2023-09-11 20:35 UTC (permalink / raw)
  To: gcc-patches; +Cc: rdapp.gcc, richard.sandiford

Hi,

as Juzhe noticed in gcc.dg/pr92301.c there was still something missing in
the last patch.  The attached v2 makes sure we always have a COND_LEN operation
before returning true and initializes len and bias even if they are unused.

Bootstrapped and regtested on aarch64 and x86.

Regards
 Robin

Subject: [PATCH v2] gimple-match: Do not try UNCOND optimization with
 COND_LEN.

On riscv we mis-optimize conditional (length) operations into
unconditional operations e.g. in slp-reduc-7.c and
gcc.dg/pr92301.c.

This patch prevents optimizing e.g.
 COND_LEN_ADD ({-1, ... }, a, 0, c, len, bias)
unconditionally into just "a".

Currently, we assume that COND_LEN operations can be optimized similarly
to COND operations.  As the length is part of the mask (and usually not
compile-time constant), we must not perform any optimization that relies
on just the mask being "true".  This patch ensures that we still have a
COND_LEN pattern after optimization.

gcc/ChangeLog:

	PR target/111311
	* gimple-match-exports.cc (maybe_resimplify_conditional_op):
	Check for length masking.
	(try_conditional_simplification): Check that the result is still
	length masked.
---
 gcc/gimple-match-exports.cc | 38 ++++++++++++++++++++++++++++++-------
 gcc/gimple-match.h          |  3 ++-
 2 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index b36027b0bad..d41de98a3d3 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -262,7 +262,8 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
   if (!res_op->cond.cond)
     return false;
 
-  if (!res_op->cond.else_value
+  if (!res_op->cond.len
+      && !res_op->cond.else_value
       && res_op->code.is_tree_code ())
     {
       /* The "else" value doesn't matter.  If the "then" value is a
@@ -301,9 +302,12 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
 
   /* If the "then" value is a gimple value and the "else" value matters,
      create a VEC_COND_EXPR between them, then see if it can be further
-     simplified.  */
+     simplified.
+     Don't do this if we have a COND_LEN_ as that would make us lose the
+     length masking.  */
   gimple_match_op new_op;
-  if (res_op->cond.else_value
+  if (!res_op->cond.len
+      && res_op->cond.else_value
       && VECTOR_TYPE_P (res_op->type)
       && gimple_simplified_result_is_gimple_val (res_op))
     {
@@ -314,7 +318,7 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
       return gimple_resimplify3 (seq, res_op, valueize);
     }
 
-  /* Otherwise try rewriting the operation as an IFN_COND_* call.
+  /* Otherwise try rewriting the operation as an IFN_COND_(LEN_)* call.
      Again, this isn't a simplification in itself, since it's what
      RES_OP already described.  */
   if (convert_conditional_op (res_op, &new_op))
@@ -386,9 +390,29 @@ try_conditional_simplification (internal_fn ifn, gimple_match_op *res_op,
     default:
       gcc_unreachable ();
     }
-  *res_op = cond_op;
-  maybe_resimplify_conditional_op (seq, res_op, valueize);
-  return true;
+
+  if (len)
+    {
+      /* If we had a COND_LEN before we need to ensure that it stays that
+	 way.  */
+      gimple_match_op old_op = *res_op;
+      *res_op = cond_op;
+      maybe_resimplify_conditional_op (seq, res_op, valueize);
+
+      auto cfn = combined_fn (res_op->code);
+      if (internal_fn_p (cfn)
+	  && internal_fn_len_index (as_internal_fn (cfn)) != -1)
+	return true;
+
+      *res_op = old_op;
+      return false;
+    }
+  else
+    {
+      *res_op = cond_op;
+      maybe_resimplify_conditional_op (seq, res_op, valueize);
+      return true;
+    }
 }
 
 /* Helper for the autogenerated code, valueize OP.  */
diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
index bec3ff42e3e..d192b7dae3e 100644
--- a/gcc/gimple-match.h
+++ b/gcc/gimple-match.h
@@ -56,7 +56,8 @@ public:
 
 inline
 gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in)
-  : cond (cond_in), else_value (else_value_in)
+  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
+    bias (NULL_TREE)
 {
 }
 
-- 
2.41.0



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.
  2023-09-11 20:35 ` Robin Dapp
@ 2023-09-18 10:22   ` Robin Dapp
  2023-10-04  8:11     ` Robin Dapp
  2023-10-12 13:53   ` Richard Sandiford
  1 sibling, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2023-09-18 10:22 UTC (permalink / raw)
  To: gcc-patches; +Cc: rdapp.gcc, richard.sandiford, Richard Biener, juzhe.zhong

Ping.

Regards
 Robin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.
  2023-09-18 10:22   ` Robin Dapp
@ 2023-10-04  8:11     ` Robin Dapp
  0 siblings, 0 replies; 31+ messages in thread
From: Robin Dapp @ 2023-10-04  8:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: rdapp.gcc, richard.sandiford, Richard Biener, juzhe.zhong

Ping^2.

I realize it's not very elegant as of now.  If there's a better/shorter way
to solve this feel free to suggest :) 

Regards
 Robin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.
  2023-09-11 20:35 ` Robin Dapp
  2023-09-18 10:22   ` Robin Dapp
@ 2023-10-12 13:53   ` Richard Sandiford
  2023-10-12 14:19     ` Richard Sandiford
  1 sibling, 1 reply; 31+ messages in thread
From: Richard Sandiford @ 2023-10-12 13:53 UTC (permalink / raw)
  To: Robin Dapp via Gcc-patches; +Cc: Robin Dapp

Robin Dapp via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi,
>
> as Juzhe noticed in gcc.dg/pr92301.c there was still something missing in
> the last patch.  The attached v2 makes sure we always have a COND_LEN operation
> before returning true and initializes len and bias even if they are unused.
>
> Bootstrapped and regtested on aarch64 and x86.

Sorry for the slow review.  I was hoping Richi would take it,
but I see he was hoping the same from me.

> Regards
>  Robin
>
> Subject: [PATCH v2] gimple-match: Do not try UNCOND optimization with
>  COND_LEN.
>
> On riscv we mis-optimize conditional (length) operations into
> unconditional operations e.g. in slp-reduc-7.c and
> gcc.dg/pr92301.c.
>
> This patch prevents optimizing e.g.
>  COND_LEN_ADD ({-1, ... }, a, 0, c, len, bias)
> unconditionally into just "a".
>
> Currently, we assume that COND_LEN operations can be optimized similarly
> to COND operations.  As the length is part of the mask (and usually not
> compile-time constant), we must not perform any optimization that relies
> on just the mask being "true".  This patch ensures that we still have a
> COND_LEN pattern after optimization.
>
> gcc/ChangeLog:
>
> 	PR target/111311
> 	* gimple-match-exports.cc (maybe_resimplify_conditional_op):
> 	Check for length masking.
> 	(try_conditional_simplification): Check that the result is still
> 	length masked.
> ---
>  gcc/gimple-match-exports.cc | 38 ++++++++++++++++++++++++++++++-------
>  gcc/gimple-match.h          |  3 ++-
>  2 files changed, 33 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> index b36027b0bad..d41de98a3d3 100644
> --- a/gcc/gimple-match-exports.cc
> +++ b/gcc/gimple-match-exports.cc
> @@ -262,7 +262,8 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
>    if (!res_op->cond.cond)
>      return false;
>  
> -  if (!res_op->cond.else_value
> +  if (!res_op->cond.len
> +      && !res_op->cond.else_value
>        && res_op->code.is_tree_code ())
>      {
>        /* The "else" value doesn't matter.  If the "then" value is a

Why are the contents of this if statement wrong for COND_LEN?
If the "else" value doesn't matter, then the masked form can use
the "then" value for all elements.  I would have expected the same
thing to be true of COND_LEN.

> @@ -301,9 +302,12 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
>  
>    /* If the "then" value is a gimple value and the "else" value matters,
>       create a VEC_COND_EXPR between them, then see if it can be further
> -     simplified.  */
> +     simplified.
> +     Don't do this if we have a COND_LEN_ as that would make us lose the
> +     length masking.  */
>    gimple_match_op new_op;
> -  if (res_op->cond.else_value
> +  if (!res_op->cond.len
> +      && res_op->cond.else_value
>        && VECTOR_TYPE_P (res_op->type)
>        && gimple_simplified_result_is_gimple_val (res_op))
>      {

The change LGTM, but it would be nice to phrase the comment to avoid
the "Do A.  Don't do A if B" pattern.  Maybe:

  /* If the condition represents MASK ? THEN : ELSE, where THEN is a gimple
     value and ELSE matters, create a VEC_COND_EXPR between them, then see
     if it can be further simplified.  */

> @@ -314,7 +318,7 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
>        return gimple_resimplify3 (seq, res_op, valueize);
>      }
>  
> -  /* Otherwise try rewriting the operation as an IFN_COND_* call.
> +  /* Otherwise try rewriting the operation as an IFN_COND_(LEN_)* call.
>       Again, this isn't a simplification in itself, since it's what
>       RES_OP already described.  */
>    if (convert_conditional_op (res_op, &new_op))
> @@ -386,9 +390,29 @@ try_conditional_simplification (internal_fn ifn, gimple_match_op *res_op,
>      default:
>        gcc_unreachable ();
>      }
> -  *res_op = cond_op;
> -  maybe_resimplify_conditional_op (seq, res_op, valueize);
> -  return true;
> +
> +  if (len)
> +    {
> +      /* If we had a COND_LEN before we need to ensure that it stays that
> +	 way.  */
> +      gimple_match_op old_op = *res_op;
> +      *res_op = cond_op;
> +      maybe_resimplify_conditional_op (seq, res_op, valueize);
> +
> +      auto cfn = combined_fn (res_op->code);
> +      if (internal_fn_p (cfn)
> +	  && internal_fn_len_index (as_internal_fn (cfn)) != -1)
> +	return true;

Why isn't it enough to check the result of maybe_resimplify_conditional_op?

Thanks,
Richard

> +
> +      *res_op = old_op;
> +      return false;
> +    }
> +  else
> +    {
> +      *res_op = cond_op;
> +      maybe_resimplify_conditional_op (seq, res_op, valueize);
> +      return true;
> +    }
>  }
>  
>  /* Helper for the autogenerated code, valueize OP.  */
> diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
> index bec3ff42e3e..d192b7dae3e 100644
> --- a/gcc/gimple-match.h
> +++ b/gcc/gimple-match.h
> @@ -56,7 +56,8 @@ public:
>  
>  inline
>  gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in)
> -  : cond (cond_in), else_value (else_value_in)
> +  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
> +    bias (NULL_TREE)
>  {
>  }

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.
  2023-10-12 13:53   ` Richard Sandiford
@ 2023-10-12 14:19     ` Richard Sandiford
  2023-10-13 15:50       ` Robin Dapp
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Sandiford @ 2023-10-12 14:19 UTC (permalink / raw)
  To: Robin Dapp via Gcc-patches; +Cc: Robin Dapp

Richard Sandiford <richard.sandiford@arm.com> writes:
> Robin Dapp via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> [...]
>> @@ -386,9 +390,29 @@ try_conditional_simplification (internal_fn ifn, gimple_match_op *res_op,
>>      default:
>>        gcc_unreachable ();
>>      }
>> -  *res_op = cond_op;
>> -  maybe_resimplify_conditional_op (seq, res_op, valueize);
>> -  return true;
>> +
>> +  if (len)
>> +    {
>> +      /* If we had a COND_LEN before we need to ensure that it stays that
>> +	 way.  */
>> +      gimple_match_op old_op = *res_op;
>> +      *res_op = cond_op;
>> +      maybe_resimplify_conditional_op (seq, res_op, valueize);
>> +
>> +      auto cfn = combined_fn (res_op->code);
>> +      if (internal_fn_p (cfn)
>> +	  && internal_fn_len_index (as_internal_fn (cfn)) != -1)
>> +	return true;
>
> Why isn't it enough to check the result of maybe_resimplify_conditional_op?

Sorry, ignore that part.  I get it now.

But isn't the test whether res_op->code itself is an internal_function?
In other words, shouldn't it just be:

      if (internal_fn_p (res_op->code)
	  && internal_fn_len_index (as_internal_fn (res_op->code)) != -1)
	return true;

maybe_resimplify_conditional_op should already have converted to an
internal function where possible, and if combined_fn (res_op->code)
does any extra conversion on the fly, that conversion won't be reflected
in res_op.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.
  2023-10-12 14:19     ` Richard Sandiford
@ 2023-10-13 15:50       ` Robin Dapp
  2023-10-16 21:59         ` Richard Sandiford
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2023-10-13 15:50 UTC (permalink / raw)
  To: Robin Dapp via Gcc-patches, richard.sandiford; +Cc: rdapp.gcc

> Why are the contents of this if statement wrong for COND_LEN?
> If the "else" value doesn't matter, then the masked form can use
> the "then" value for all elements.  I would have expected the same
> thing to be true of COND_LEN.

Right, that one was overly pessimistic.  Removed.

> But isn't the test whether res_op->code itself is an internal_function?
> In other words, shouldn't it just be:
> 
>       if (internal_fn_p (res_op->code)
> 	  && internal_fn_len_index (as_internal_fn (res_op->code)) != -1)
> 	return true;
> 
> maybe_resimplify_conditional_op should already have converted to an
> internal function where possible, and if combined_fn (res_op->code)
> does any extra conversion on the fly, that conversion won't be reflected
> in res_op.

I went through some of our test cases and believe most of the problems
are due to situations like the following:

In vect-cond-arith-2.c we have (on riscv)
  vect_neg_xi_14.4_23 = -vect_xi_13.3_22;
  vect_res_2.5_24 = .COND_LEN_ADD ({ -1, ... }, vect_res_1.0_17, vect_neg_xi_14.4_23, vect_res_1.0_17, _29, 0);

On aarch64 this is a situation that matches the VEC_COND_EXPR
simplification that I disabled with this patch.  We valueized
to _26 = vect_res_1.0_17 - vect_xi_13.3_22 and then create
vect_res_2.5_24 = VEC_COND_EXPR <loop_mask_22, _26, vect_res_1.0_19>;
This is later re-assembled into a COND_SUB.

As we have two masks or COND_LEN we cannot use a VEC_COND_EXPR to
achieve the same thing.  Would it be possible to create a COND_OP
directly instead, though?  I tried the following (not very polished
obviously):

-      new_op.set_op (VEC_COND_EXPR, res_op->type,
-                    res_op->cond.cond, res_op->ops[0],
-                    res_op->cond.else_value);
-      *res_op = new_op;
-      return gimple_resimplify3 (seq, res_op, valueize);
+      if (!res_op->cond.len)
+       {
+         new_op.set_op (VEC_COND_EXPR, res_op->type,
+                        res_op->cond.cond, res_op->ops[0],
+                        res_op->cond.else_value);
+         *res_op = new_op;
+         return gimple_resimplify3 (seq, res_op, valueize);
+       }
+      else if (seq && *seq && is_gimple_assign (*seq))
+       {
+         new_op.code = gimple_assign_rhs_code (*seq);
+         new_op.type = res_op->type;
+         new_op.num_ops = gimple_num_ops (*seq) - 1;
+         new_op.ops[0] = gimple_assign_rhs1 (*seq);
+         if (new_op.num_ops > 1)
+           new_op.ops[1] = gimple_assign_rhs2 (*seq);
+         if (new_op.num_ops > 2)
+           new_op.ops[2] = gimple_assign_rhs2 (*seq);
+
+         new_op.cond = res_op->cond;
+
+         gimple_match_op bla2;
+         if (convert_conditional_op (&new_op, &bla2))
+           {
+             *res_op = bla2;
+             // SEQ should now be dead.
+             return true;
+           }
+       }

This would make the other hunk (check whether it was a LEN
and try to recreate it) redundant I hope.

I don't know enough about valueization, whether it's always
safe to do that and other implications.  On riscv this seems
to work, though and the other backends never go through the LEN
path.  If, however, this is a feasible direction it could also
be done for the non-LEN targets?

Regards
 Robin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.
  2023-10-13 15:50       ` Robin Dapp
@ 2023-10-16 21:59         ` Richard Sandiford
  2023-10-17  8:47           ` Richard Biener
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Sandiford @ 2023-10-16 21:59 UTC (permalink / raw)
  To: Robin Dapp; +Cc: Robin Dapp via Gcc-patches

Robin Dapp <rdapp.gcc@gmail.com> writes:
>> Why are the contents of this if statement wrong for COND_LEN?
>> If the "else" value doesn't matter, then the masked form can use
>> the "then" value for all elements.  I would have expected the same
>> thing to be true of COND_LEN.
>
> Right, that one was overly pessimistic.  Removed.
>
>> But isn't the test whether res_op->code itself is an internal_function?
>> In other words, shouldn't it just be:
>> 
>>       if (internal_fn_p (res_op->code)
>> 	  && internal_fn_len_index (as_internal_fn (res_op->code)) != -1)
>> 	return true;
>> 
>> maybe_resimplify_conditional_op should already have converted to an
>> internal function where possible, and if combined_fn (res_op->code)
>> does any extra conversion on the fly, that conversion won't be reflected
>> in res_op.
>
> I went through some of our test cases and believe most of the problems
> are due to situations like the following:
>
> In vect-cond-arith-2.c we have (on riscv)
>   vect_neg_xi_14.4_23 = -vect_xi_13.3_22;
>   vect_res_2.5_24 = .COND_LEN_ADD ({ -1, ... }, vect_res_1.0_17, vect_neg_xi_14.4_23, vect_res_1.0_17, _29, 0);
>
> On aarch64 this is a situation that matches the VEC_COND_EXPR
> simplification that I disabled with this patch.  We valueized
> to _26 = vect_res_1.0_17 - vect_xi_13.3_22 and then create
> vect_res_2.5_24 = VEC_COND_EXPR <loop_mask_22, _26, vect_res_1.0_19>;
> This is later re-assembled into a COND_SUB.
>
> As we have two masks or COND_LEN we cannot use a VEC_COND_EXPR to
> achieve the same thing.  Would it be possible to create a COND_OP
> directly instead, though?  I tried the following (not very polished
> obviously):
>
> -      new_op.set_op (VEC_COND_EXPR, res_op->type,
> -                    res_op->cond.cond, res_op->ops[0],
> -                    res_op->cond.else_value);
> -      *res_op = new_op;
> -      return gimple_resimplify3 (seq, res_op, valueize);
> +      if (!res_op->cond.len)
> +       {
> +         new_op.set_op (VEC_COND_EXPR, res_op->type,
> +                        res_op->cond.cond, res_op->ops[0],
> +                        res_op->cond.else_value);
> +         *res_op = new_op;
> +         return gimple_resimplify3 (seq, res_op, valueize);
> +       }
> +      else if (seq && *seq && is_gimple_assign (*seq))
> +       {
> +         new_op.code = gimple_assign_rhs_code (*seq);
> +         new_op.type = res_op->type;
> +         new_op.num_ops = gimple_num_ops (*seq) - 1;
> +         new_op.ops[0] = gimple_assign_rhs1 (*seq);
> +         if (new_op.num_ops > 1)
> +           new_op.ops[1] = gimple_assign_rhs2 (*seq);
> +         if (new_op.num_ops > 2)
> +           new_op.ops[2] = gimple_assign_rhs2 (*seq);
> +
> +         new_op.cond = res_op->cond;
> +
> +         gimple_match_op bla2;
> +         if (convert_conditional_op (&new_op, &bla2))
> +           {
> +             *res_op = bla2;
> +             // SEQ should now be dead.
> +             return true;
> +           }
> +       }
>
> This would make the other hunk (check whether it was a LEN
> and try to recreate it) redundant I hope.
>
> I don't know enough about valueization, whether it's always
> safe to do that and other implications.  On riscv this seems
> to work, though and the other backends never go through the LEN
> path.  If, however, this is a feasible direction it could also
> be done for the non-LEN targets?

I don't know much about valueisation either :)  But it does feel
like we're working around the lack of a LEN form of COND_EXPR.
In other words, it seems odd that we can do:

  IFN_COND_LEN_ADD (mask, a, 0, b, len, bias)

but we can't do:

  IFN_COND_LEN (mask, a, b, len, bias)

There seems to be no way of applying a length without also finding an
operation to perform.

Does IFN_COND_LEN make conceptual sense on RVV?  If so, would defining
it solve some of these problems?

I suppose in the worst case, IFN_COND_LEN is equivalent to IFN_COND_LEN_IOR
with a zero input (and extended to floats).  So if the target can do
IFN_COND_LEN_IOR, it could implement IFN_COND_LEN using the same instruction.

Thanks,
Richard


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.
  2023-10-16 21:59         ` Richard Sandiford
@ 2023-10-17  8:47           ` Richard Biener
  2023-10-17 11:39             ` Robin Dapp
  2023-10-17 15:52             ` [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN Richard Sandiford
  0 siblings, 2 replies; 31+ messages in thread
From: Richard Biener @ 2023-10-17  8:47 UTC (permalink / raw)
  To: Robin Dapp, Robin Dapp via Gcc-patches, richard.sandiford

On Mon, Oct 16, 2023 at 11:59 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Robin Dapp <rdapp.gcc@gmail.com> writes:
> >> Why are the contents of this if statement wrong for COND_LEN?
> >> If the "else" value doesn't matter, then the masked form can use
> >> the "then" value for all elements.  I would have expected the same
> >> thing to be true of COND_LEN.
> >
> > Right, that one was overly pessimistic.  Removed.
> >
> >> But isn't the test whether res_op->code itself is an internal_function?
> >> In other words, shouldn't it just be:
> >>
> >>       if (internal_fn_p (res_op->code)
> >>        && internal_fn_len_index (as_internal_fn (res_op->code)) != -1)
> >>      return true;
> >>
> >> maybe_resimplify_conditional_op should already have converted to an
> >> internal function where possible, and if combined_fn (res_op->code)
> >> does any extra conversion on the fly, that conversion won't be reflected
> >> in res_op.
> >
> > I went through some of our test cases and believe most of the problems
> > are due to situations like the following:
> >
> > In vect-cond-arith-2.c we have (on riscv)
> >   vect_neg_xi_14.4_23 = -vect_xi_13.3_22;
> >   vect_res_2.5_24 = .COND_LEN_ADD ({ -1, ... }, vect_res_1.0_17, vect_neg_xi_14.4_23, vect_res_1.0_17, _29, 0);
> >
> > On aarch64 this is a situation that matches the VEC_COND_EXPR
> > simplification that I disabled with this patch.  We valueized
> > to _26 = vect_res_1.0_17 - vect_xi_13.3_22 and then create
> > vect_res_2.5_24 = VEC_COND_EXPR <loop_mask_22, _26, vect_res_1.0_19>;
> > This is later re-assembled into a COND_SUB.
> >
> > As we have two masks or COND_LEN we cannot use a VEC_COND_EXPR to
> > achieve the same thing.  Would it be possible to create a COND_OP
> > directly instead, though?  I tried the following (not very polished
> > obviously):
> >
> > -      new_op.set_op (VEC_COND_EXPR, res_op->type,
> > -                    res_op->cond.cond, res_op->ops[0],
> > -                    res_op->cond.else_value);
> > -      *res_op = new_op;
> > -      return gimple_resimplify3 (seq, res_op, valueize);
> > +      if (!res_op->cond.len)
> > +       {
> > +         new_op.set_op (VEC_COND_EXPR, res_op->type,
> > +                        res_op->cond.cond, res_op->ops[0],
> > +                        res_op->cond.else_value);
> > +         *res_op = new_op;
> > +         return gimple_resimplify3 (seq, res_op, valueize);
> > +       }
> > +      else if (seq && *seq && is_gimple_assign (*seq))
> > +       {
> > +         new_op.code = gimple_assign_rhs_code (*seq);
> > +         new_op.type = res_op->type;
> > +         new_op.num_ops = gimple_num_ops (*seq) - 1;
> > +         new_op.ops[0] = gimple_assign_rhs1 (*seq);
> > +         if (new_op.num_ops > 1)
> > +           new_op.ops[1] = gimple_assign_rhs2 (*seq);
> > +         if (new_op.num_ops > 2)
> > +           new_op.ops[2] = gimple_assign_rhs2 (*seq);
> > +
> > +         new_op.cond = res_op->cond;
> > +
> > +         gimple_match_op bla2;
> > +         if (convert_conditional_op (&new_op, &bla2))
> > +           {
> > +             *res_op = bla2;
> > +             // SEQ should now be dead.
> > +             return true;
> > +           }
> > +       }
> >
> > This would make the other hunk (check whether it was a LEN
> > and try to recreate it) redundant I hope.
> >
> > I don't know enough about valueization, whether it's always
> > safe to do that and other implications.  On riscv this seems
> > to work, though and the other backends never go through the LEN
> > path.  If, however, this is a feasible direction it could also
> > be done for the non-LEN targets?
>
> I don't know much about valueisation either :)  But it does feel
> like we're working around the lack of a LEN form of COND_EXPR.
> In other words, it seems odd that we can do:
>
>   IFN_COND_LEN_ADD (mask, a, 0, b, len, bias)
>
> but we can't do:
>
>   IFN_COND_LEN (mask, a, b, len, bias)
>
> There seems to be no way of applying a length without also finding an
> operation to perform.

Indeed .. maybe - _maybe_ we want to scrap VEC_COND_EXPR for
IFN_COND{,_LEN} to be more consistent here?

> Does IFN_COND_LEN make conceptual sense on RVV?  If so, would defining
> it solve some of these problems?
>
> I suppose in the worst case, IFN_COND_LEN is equivalent to IFN_COND_LEN_IOR
> with a zero input (and extended to floats).  So if the target can do
> IFN_COND_LEN_IOR, it could implement IFN_COND_LEN using the same instruction.

In principle one can construct a mask from the length via {0, 1, ... }
< len and then
AND that to the mask in a VEC_COND_EXPR but that's of course super ugly and
likely inefficient (or hard to match back on RTL land).

Richard.

> Thanks,
> Richard
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.
  2023-10-17  8:47           ` Richard Biener
@ 2023-10-17 11:39             ` Robin Dapp
  2023-10-17 13:35               ` Richard Sandiford
  2023-10-17 15:52             ` [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN Richard Sandiford
  1 sibling, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2023-10-17 11:39 UTC (permalink / raw)
  To: Richard Biener, Robin Dapp via Gcc-patches, richard.sandiford; +Cc: rdapp.gcc

>> I don't know much about valueisation either :)  But it does feel
>> like we're working around the lack of a LEN form of COND_EXPR.
>> In other words, it seems odd that we can do:
>>
>>   IFN_COND_LEN_ADD (mask, a, 0, b, len, bias)
>>
>> but we can't do:
>>
>>   IFN_COND_LEN (mask, a, b, len, bias)
>>
>> There seems to be no way of applying a length without also finding an
>> operation to perform.
> 
> Indeed .. maybe - _maybe_ we want to scrap VEC_COND_EXPR for
> IFN_COND{,_LEN} to be more consistent here?

So, yes we could define IFN_COND_LEN (or VCOND_MASK_LEN) but I'd
assume that there would be a whole lot of follow-up things to
consider.

I'm wondering if we really gain something from the the round-trip
via VEC_COND_EXPR when we eventually create a COND_(LEN_)_OP anyway?
Sure, if the target doesn't have the particular operation we would
want a VEC_COND_EXPR.  Same if SEQ is somehow more complicated.

So the IFN_COND(_LEN) =? VCOND_MASK(_LEN) discussion notwithstanding,
couldn't what I naively proposed be helpful as well?  Or do we
potentially lose optimizations during the time where e.g. a
 _foo = a BINOP b
 VEC_COND_EXPR (cond, foo, else)
has not yet been converted into a
 COND_OP?
We already create COND_OPs for the other paths
(via convert_conditional_op) so why not for this one?  Or am I missing
some interdependence with SEQ?

FWIW I did a full bootstrap and testsuite run on the usual architectures
showing no changes with the attached patch.

Regards
 Robin

Subject: [PATCH] gimple-match: Create COND_OP directly if possible.

This patch converts simplified sequences into conditional operations
instead of VEC_COND_EXPRs if the target supports them.
This helps for len-masked targets which cannot directly use a
VEC_COND_EXPR in the presence of length masking.

gcc/ChangeLog:

	* gimple-match-exports.cc (directly_supported_p): Define.
	(maybe_resimplify_conditional_op): Create COND_OP directly.
	* gimple-match.h (gimple_match_cond::gimple_match_cond):
	Initialize length and bias.
---
 gcc/gimple-match-exports.cc | 40 ++++++++++++++++++++++++++++---------
 gcc/gimple-match.h          |  7 +++++--
 2 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index b36027b0bad..ba3bd1450db 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -98,6 +98,8 @@ static bool gimple_resimplify5 (gimple_seq *, gimple_match_op *, tree (*)(tree))
 static bool gimple_resimplify6 (gimple_seq *, gimple_match_op *, tree (*)(tree));
 static bool gimple_resimplify7 (gimple_seq *, gimple_match_op *, tree (*)(tree));
 
+bool directly_supported_p (code_helper, tree, optab_subtype);
+
 /* Match and simplify the toplevel valueized operation THIS.
    Replaces THIS with a simplified and/or canonicalized result and
    returns whether any change was made.  */
@@ -299,22 +301,42 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
 	}
     }
 
-  /* If the "then" value is a gimple value and the "else" value matters,
-     create a VEC_COND_EXPR between them, then see if it can be further
-     simplified.  */
+  /* If the condition represents MASK ? THEN : ELSE, where THEN is a gimple
+     value and ELSE matters, create a VEC_COND_EXPR between them, then see
+     if it can be further simplified.
+     For COND_LEN masking, try to create a COND_LEN_OP directly in case
+     SEQ contains a supportable operation. */
   gimple_match_op new_op;
   if (res_op->cond.else_value
       && VECTOR_TYPE_P (res_op->type)
       && gimple_simplified_result_is_gimple_val (res_op))
     {
-      new_op.set_op (VEC_COND_EXPR, res_op->type,
-		     res_op->cond.cond, res_op->ops[0],
-		     res_op->cond.else_value);
-      *res_op = new_op;
-      return gimple_resimplify3 (seq, res_op, valueize);
+      /* If a previous simplification was pushed to SEQ
+	 and we can convert it to a COND_OP directly, do so
+	 in order to save a round-trip via VEC_COND_EXPR -> COND_OP.  */
+      if (seq && *seq && is_gimple_assign (*seq)
+	  && directly_supported_p (gimple_assign_rhs_code (*seq), res_op->type,
+				   optab_scalar))
+	{
+	  res_op->code = gimple_assign_rhs_code (*seq);
+	  res_op->num_ops = gimple_num_ops (*seq) - 1;
+	  res_op->ops[0] = gimple_assign_rhs1 (*seq);
+	  if (res_op->num_ops > 1)
+	    res_op->ops[1] = gimple_assign_rhs2 (*seq);
+	  if (res_op->num_ops > 2)
+	    res_op->ops[2] = gimple_assign_rhs2 (*seq);
+	}
+      else if (!res_op->cond.len)
+	{
+	  new_op.set_op (VEC_COND_EXPR, res_op->type,
+			 res_op->cond.cond, res_op->ops[0],
+			 res_op->cond.else_value);
+	  *res_op = new_op;
+	  return gimple_resimplify3 (seq, res_op, valueize);
+	}
     }
 
-  /* Otherwise try rewriting the operation as an IFN_COND_* call.
+  /* Otherwise try rewriting the operation as an IFN_COND_(LEN_)* call.
      Again, this isn't a simplification in itself, since it's what
      RES_OP already described.  */
   if (convert_conditional_op (res_op, &new_op))
diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
index bec3ff42e3e..55c771d560f 100644
--- a/gcc/gimple-match.h
+++ b/gcc/gimple-match.h
@@ -32,7 +32,9 @@ public:
   enum uncond { UNCOND };
 
   /* Build an unconditional op.  */
-  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE) {}
+  gimple_match_cond (uncond)
+    : cond (NULL_TREE), else_value (NULL_TREE), len (NULL_TREE),
+      bias (NULL_TREE) {}
   gimple_match_cond (tree, tree);
   gimple_match_cond (tree, tree, tree, tree);
 
@@ -56,7 +58,8 @@ public:
 
 inline
 gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in)
-  : cond (cond_in), else_value (else_value_in)
+  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
+    bias (NULL_TREE)
 {
 }
 
-- 
2.41.0




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.
  2023-10-17 11:39             ` Robin Dapp
@ 2023-10-17 13:35               ` Richard Sandiford
  2023-10-17 15:42                 ` Robin Dapp
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Sandiford @ 2023-10-17 13:35 UTC (permalink / raw)
  To: Robin Dapp; +Cc: Richard Biener, Robin Dapp via Gcc-patches

Robin Dapp <rdapp.gcc@gmail.com> writes:
>>> I don't know much about valueisation either :)  But it does feel
>>> like we're working around the lack of a LEN form of COND_EXPR.
>>> In other words, it seems odd that we can do:
>>>
>>>   IFN_COND_LEN_ADD (mask, a, 0, b, len, bias)
>>>
>>> but we can't do:
>>>
>>>   IFN_COND_LEN (mask, a, b, len, bias)
>>>
>>> There seems to be no way of applying a length without also finding an
>>> operation to perform.
>> 
>> Indeed .. maybe - _maybe_ we want to scrap VEC_COND_EXPR for
>> IFN_COND{,_LEN} to be more consistent here?
>
> So, yes we could define IFN_COND_LEN (or VCOND_MASK_LEN) but I'd
> assume that there would be a whole lot of follow-up things to
> consider.
>
> I'm wondering if we really gain something from the the round-trip
> via VEC_COND_EXPR when we eventually create a COND_(LEN_)_OP anyway?

The main purpose of the VEC_COND_EXPR isn't as an intermediate step,
but as an end in its own right.  E.g. it allows:

  IFN_COND_ADD (mask, cst1, cst2, else)

to be folded to:

  VEC_COND_EXPR <mask, cst1 + cst, else>

This is especially useful when vectorisation has the effect of completely
unrolling a loop.

The VEC_COND_EXPR is only used if the equivalent unconditional rule
folds to a gimple value.

> Sure, if the target doesn't have the particular operation we would
> want a VEC_COND_EXPR.  Same if SEQ is somehow more complicated.
>
> So the IFN_COND(_LEN) =? VCOND_MASK(_LEN) discussion notwithstanding,
> couldn't what I naively proposed be helpful as well?

I don't think it's independently useful, since the fold that it's
attempting is one that match.pd should be able to do.  match.pd can
also do it in a more general way, since it isn't restricted to looking
at the currenct sequence.

> Or do we
> potentially lose optimizations during the time where e.g. a
>  _foo = a BINOP b
>  VEC_COND_EXPR (cond, foo, else)
> has not yet been converted into a
>  COND_OP?

Yeah, it would miss out on that too.  

> We already create COND_OPs for the other paths
> (via convert_conditional_op) so why not for this one?  Or am I missing
> some interdependence with SEQ?

The purpose of this code is to see what happens if we apply the
usual folds for unconditional ops to the corresponding conditional forms.
E.g. for IFN_COND_ADD (mask, a, b, c) it sees what a + b would fold to,
then tries to reapply the VEC_DOND_EXPR (mask, ..., c) to the result.

If a + b folds to a gimple value, we can fold to a VEC_COND_EXPR
involving that gimple value, as discussed above.  This could happen
if a + b folds to a constant, or for things like a + 0 -> a.

If instead a + b folds to a new operation (say a + b' or a - b'),
we need to construct the equivalent conditional form of that operation,
with the same mask and else values.  This is a correctness issue rather
than an optimisation.  As the comment in:

  /* Otherwise try rewriting the operation as an IFN_COND_* call.
     Again, this isn't a simplification in itself, since it's what
     RES_OP already described.  */
  if (convert_conditional_op (res_op, &new_op))
    *res_op = new_op;

says, it's just reconstituting what RES_OP describes in gimple form.
If that isn't possible then the simplification must fail.

In some cases we could, as a follow-on, try to make a a' op b' fold
result fall back to an unconditional a' op b' followed by a VEC_COND_EXPR.
But we don't do that currently.  It isn't safe in all cases, since
IFN_COND_ADD only adds active elements, whereas an unconditional a' op b'
would operate on all elements.  I also don't know of any specific example
where this would be useful on SVE.

Thanks,
Richard

>
> FWIW I did a full bootstrap and testsuite run on the usual architectures
> showing no changes with the attached patch.
>
> Regards
>  Robin
>
> Subject: [PATCH] gimple-match: Create COND_OP directly if possible.
>
> This patch converts simplified sequences into conditional operations
> instead of VEC_COND_EXPRs if the target supports them.
> This helps for len-masked targets which cannot directly use a
> VEC_COND_EXPR in the presence of length masking.
>
> gcc/ChangeLog:
>
> 	* gimple-match-exports.cc (directly_supported_p): Define.
> 	(maybe_resimplify_conditional_op): Create COND_OP directly.
> 	* gimple-match.h (gimple_match_cond::gimple_match_cond):
> 	Initialize length and bias.
> ---
>  gcc/gimple-match-exports.cc | 40 ++++++++++++++++++++++++++++---------
>  gcc/gimple-match.h          |  7 +++++--
>  2 files changed, 36 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> index b36027b0bad..ba3bd1450db 100644
> --- a/gcc/gimple-match-exports.cc
> +++ b/gcc/gimple-match-exports.cc
> @@ -98,6 +98,8 @@ static bool gimple_resimplify5 (gimple_seq *, gimple_match_op *, tree (*)(tree))
>  static bool gimple_resimplify6 (gimple_seq *, gimple_match_op *, tree (*)(tree));
>  static bool gimple_resimplify7 (gimple_seq *, gimple_match_op *, tree (*)(tree));
>  
> +bool directly_supported_p (code_helper, tree, optab_subtype);
> +
>  /* Match and simplify the toplevel valueized operation THIS.
>     Replaces THIS with a simplified and/or canonicalized result and
>     returns whether any change was made.  */
> @@ -299,22 +301,42 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
>  	}
>      }
>  
> -  /* If the "then" value is a gimple value and the "else" value matters,
> -     create a VEC_COND_EXPR between them, then see if it can be further
> -     simplified.  */
> +  /* If the condition represents MASK ? THEN : ELSE, where THEN is a gimple
> +     value and ELSE matters, create a VEC_COND_EXPR between them, then see
> +     if it can be further simplified.
> +     For COND_LEN masking, try to create a COND_LEN_OP directly in case
> +     SEQ contains a supportable operation. */
>    gimple_match_op new_op;
>    if (res_op->cond.else_value
>        && VECTOR_TYPE_P (res_op->type)
>        && gimple_simplified_result_is_gimple_val (res_op))
>      {
> -      new_op.set_op (VEC_COND_EXPR, res_op->type,
> -		     res_op->cond.cond, res_op->ops[0],
> -		     res_op->cond.else_value);
> -      *res_op = new_op;
> -      return gimple_resimplify3 (seq, res_op, valueize);
> +      /* If a previous simplification was pushed to SEQ
> +	 and we can convert it to a COND_OP directly, do so
> +	 in order to save a round-trip via VEC_COND_EXPR -> COND_OP.  */
> +      if (seq && *seq && is_gimple_assign (*seq)
> +	  && directly_supported_p (gimple_assign_rhs_code (*seq), res_op->type,
> +				   optab_scalar))
> +	{
> +	  res_op->code = gimple_assign_rhs_code (*seq);
> +	  res_op->num_ops = gimple_num_ops (*seq) - 1;
> +	  res_op->ops[0] = gimple_assign_rhs1 (*seq);
> +	  if (res_op->num_ops > 1)
> +	    res_op->ops[1] = gimple_assign_rhs2 (*seq);
> +	  if (res_op->num_ops > 2)
> +	    res_op->ops[2] = gimple_assign_rhs2 (*seq);
> +	}
> +      else if (!res_op->cond.len)
> +	{
> +	  new_op.set_op (VEC_COND_EXPR, res_op->type,
> +			 res_op->cond.cond, res_op->ops[0],
> +			 res_op->cond.else_value);
> +	  *res_op = new_op;
> +	  return gimple_resimplify3 (seq, res_op, valueize);
> +	}
>      }
>  
> -  /* Otherwise try rewriting the operation as an IFN_COND_* call.
> +  /* Otherwise try rewriting the operation as an IFN_COND_(LEN_)* call.
>       Again, this isn't a simplification in itself, since it's what
>       RES_OP already described.  */
>    if (convert_conditional_op (res_op, &new_op))
> diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
> index bec3ff42e3e..55c771d560f 100644
> --- a/gcc/gimple-match.h
> +++ b/gcc/gimple-match.h
> @@ -32,7 +32,9 @@ public:
>    enum uncond { UNCOND };
>  
>    /* Build an unconditional op.  */
> -  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE) {}
> +  gimple_match_cond (uncond)
> +    : cond (NULL_TREE), else_value (NULL_TREE), len (NULL_TREE),
> +      bias (NULL_TREE) {}
>    gimple_match_cond (tree, tree);
>    gimple_match_cond (tree, tree, tree, tree);
>  
> @@ -56,7 +58,8 @@ public:
>  
>  inline
>  gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in)
> -  : cond (cond_in), else_value (else_value_in)
> +  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
> +    bias (NULL_TREE)
>  {
>  }

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.
  2023-10-17 13:35               ` Richard Sandiford
@ 2023-10-17 15:42                 ` Robin Dapp
  2023-10-17 16:05                   ` Richard Sandiford
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2023-10-17 15:42 UTC (permalink / raw)
  To: Richard Biener, Robin Dapp via Gcc-patches, richard.sandiford; +Cc: rdapp.gcc

Thank you for the explanation.

So, assuming I added an IFN_VCOND_MASK and IFN_VCOND_MASK_LEN along
with the respective helper and expand functions, what would be the
way forward?

Generate an IFN_VCOND_MASK(_LEN) here instead of a VEC_COND_EXPR?
How would I make sure all of match.pd's vec_cond optimizations
applied to it as well?
Right now AFAIK IFN_VCOND_MASK only gets created in isel and
everything is just a VEC_COND before.  But that does not provide
length masking so is not the way to go?

Thanks.

Regards
 Robin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.
  2023-10-17  8:47           ` Richard Biener
  2023-10-17 11:39             ` Robin Dapp
@ 2023-10-17 15:52             ` Richard Sandiford
  1 sibling, 0 replies; 31+ messages in thread
From: Richard Sandiford @ 2023-10-17 15:52 UTC (permalink / raw)
  To: Richard Biener; +Cc: Robin Dapp, Robin Dapp via Gcc-patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Oct 16, 2023 at 11:59 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Robin Dapp <rdapp.gcc@gmail.com> writes:
>> >> Why are the contents of this if statement wrong for COND_LEN?
>> >> If the "else" value doesn't matter, then the masked form can use
>> >> the "then" value for all elements.  I would have expected the same
>> >> thing to be true of COND_LEN.
>> >
>> > Right, that one was overly pessimistic.  Removed.
>> >
>> >> But isn't the test whether res_op->code itself is an internal_function?
>> >> In other words, shouldn't it just be:
>> >>
>> >>       if (internal_fn_p (res_op->code)
>> >>        && internal_fn_len_index (as_internal_fn (res_op->code)) != -1)
>> >>      return true;
>> >>
>> >> maybe_resimplify_conditional_op should already have converted to an
>> >> internal function where possible, and if combined_fn (res_op->code)
>> >> does any extra conversion on the fly, that conversion won't be reflected
>> >> in res_op.
>> >
>> > I went through some of our test cases and believe most of the problems
>> > are due to situations like the following:
>> >
>> > In vect-cond-arith-2.c we have (on riscv)
>> >   vect_neg_xi_14.4_23 = -vect_xi_13.3_22;
>> >   vect_res_2.5_24 = .COND_LEN_ADD ({ -1, ... }, vect_res_1.0_17, vect_neg_xi_14.4_23, vect_res_1.0_17, _29, 0);
>> >
>> > On aarch64 this is a situation that matches the VEC_COND_EXPR
>> > simplification that I disabled with this patch.  We valueized
>> > to _26 = vect_res_1.0_17 - vect_xi_13.3_22 and then create
>> > vect_res_2.5_24 = VEC_COND_EXPR <loop_mask_22, _26, vect_res_1.0_19>;
>> > This is later re-assembled into a COND_SUB.
>> >
>> > As we have two masks or COND_LEN we cannot use a VEC_COND_EXPR to
>> > achieve the same thing.  Would it be possible to create a COND_OP
>> > directly instead, though?  I tried the following (not very polished
>> > obviously):
>> >
>> > -      new_op.set_op (VEC_COND_EXPR, res_op->type,
>> > -                    res_op->cond.cond, res_op->ops[0],
>> > -                    res_op->cond.else_value);
>> > -      *res_op = new_op;
>> > -      return gimple_resimplify3 (seq, res_op, valueize);
>> > +      if (!res_op->cond.len)
>> > +       {
>> > +         new_op.set_op (VEC_COND_EXPR, res_op->type,
>> > +                        res_op->cond.cond, res_op->ops[0],
>> > +                        res_op->cond.else_value);
>> > +         *res_op = new_op;
>> > +         return gimple_resimplify3 (seq, res_op, valueize);
>> > +       }
>> > +      else if (seq && *seq && is_gimple_assign (*seq))
>> > +       {
>> > +         new_op.code = gimple_assign_rhs_code (*seq);
>> > +         new_op.type = res_op->type;
>> > +         new_op.num_ops = gimple_num_ops (*seq) - 1;
>> > +         new_op.ops[0] = gimple_assign_rhs1 (*seq);
>> > +         if (new_op.num_ops > 1)
>> > +           new_op.ops[1] = gimple_assign_rhs2 (*seq);
>> > +         if (new_op.num_ops > 2)
>> > +           new_op.ops[2] = gimple_assign_rhs2 (*seq);
>> > +
>> > +         new_op.cond = res_op->cond;
>> > +
>> > +         gimple_match_op bla2;
>> > +         if (convert_conditional_op (&new_op, &bla2))
>> > +           {
>> > +             *res_op = bla2;
>> > +             // SEQ should now be dead.
>> > +             return true;
>> > +           }
>> > +       }
>> >
>> > This would make the other hunk (check whether it was a LEN
>> > and try to recreate it) redundant I hope.
>> >
>> > I don't know enough about valueization, whether it's always
>> > safe to do that and other implications.  On riscv this seems
>> > to work, though and the other backends never go through the LEN
>> > path.  If, however, this is a feasible direction it could also
>> > be done for the non-LEN targets?
>>
>> I don't know much about valueisation either :)  But it does feel
>> like we're working around the lack of a LEN form of COND_EXPR.
>> In other words, it seems odd that we can do:
>>
>>   IFN_COND_LEN_ADD (mask, a, 0, b, len, bias)
>>
>> but we can't do:
>>
>>   IFN_COND_LEN (mask, a, b, len, bias)
>>
>> There seems to be no way of applying a length without also finding an
>> operation to perform.
>
> Indeed .. maybe - _maybe_ we want to scrap VEC_COND_EXPR for
> IFN_COND{,_LEN} to be more consistent here?

Yeah, sounds like it could be worthwhile.  But I suppose we still need
VEC_COND_EXPR itself because it's a generic front-end operation that
needs to be lowered.  So it might be worth starting with an ifn for the
LEN form and seeing whether the non-LEN form should switch over.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.
  2023-10-17 15:42                 ` Robin Dapp
@ 2023-10-17 16:05                   ` Richard Sandiford
       [not found]                     ` <7e083b67-f283-4e9e-ba76-24e194fa1761@gmail.com>
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Sandiford @ 2023-10-17 16:05 UTC (permalink / raw)
  To: Robin Dapp; +Cc: Richard Biener, Robin Dapp via Gcc-patches

Robin Dapp <rdapp.gcc@gmail.com> writes:
> Thank you for the explanation.
>
> So, assuming I added an IFN_VCOND_MASK and IFN_VCOND_MASK_LEN along
> with the respective helper and expand functions, what would be the
> way forward?

IMO it'd be worth starting with the _LEN form only.

> Generate an IFN_VCOND_MASK(_LEN) here instead of a VEC_COND_EXPR?
> How would I make sure all of match.pd's vec_cond optimizations
> applied to it as well?

I think the most important ones are:

/* Simplify:

     a = a1 op a2
     r = c ? a : b;

   to:

     r = c ? a1 op a2 : b;

   if the target can do it in one go.  This makes the operation conditional
   on c, so could drop potentially-trapping arithmetic, but that's a valid
   simplification if the result of the operation isn't needed.

   Avoid speculatively generating a stand-alone vector comparison
   on targets that might not support them.  Any target implementing
   conditional internal functions must support the same comparisons
   inside and outside a VEC_COND_EXPR.  */

It would be nice if there was some match.pd syntax that automatically
extended these rules to IFN_VCOND_MASK_LEN, but I don't know how easy
that would be, due to the extra two parameters.

Perhaps that itself could be done in gimple-match-exports.cc, in a similar
way to the current conditional stuff.  That is:

- for IFN_VCOND_MASK_LEN, try folding as a VEC_COND_EXPR and then "adding
  the length back"

- for IFN_COND_LEN_FOO, try folding as an IFN_COND_FOO and then
  "add the length back"

Not sure how important the second one is.

Thanks,
Richard

> Right now AFAIK IFN_VCOND_MASK only gets created in isel and
> everything is just a VEC_COND before.  But that does not provide
> length masking so is not the way to go?
>
> Thanks.
>
> Regards
>  Robin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH] internal-fn: Add VCOND_MASK_LEN.
       [not found]                       ` <mptttqmny4u.fsf@arm.com>
@ 2023-10-23 16:09                         ` Robin Dapp
  2023-10-24 21:50                           ` Richard Sandiford
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2023-10-23 16:09 UTC (permalink / raw)
  To: Richard Biener, richard.sandiford, gcc-patches; +Cc: rdapp.gcc

The attached patch introduces a VCOND_MASK_LEN, helps for the riscv cases
that were broken before and looks unchanged on x86, aarch64 and power
bootstrap and testsuites.

I only went with the minimal number of new match.pd patterns and did not
try stripping the length of a COND_LEN_OP in order to simplify the
associated COND_OP.

An important part that I'm not sure how to handle properly is -
when we have a constant immediate length of e.g. 16 and the hardware
also operates on 16 units, vector length masking is actually
redundant and the vcond_mask_len can be reduced to a vec_cond.
For those (if_then_else unsplit) we have a large number of combine
patterns that fuse instruction which do not correspond to ifns
(like widening operations but also more complex ones).

Currently I achieve this in a most likely wrong way:

      auto sz = GET_MODE_NUNITS (TYPE_MODE (res_op->type));
      bool full_len = len && known_eq (sz.coeffs[0], ilen);
      if (!len || full_len)
         "vec_cond"
      else
         "vcond_mask_len"

Another thing not done in this patch:  For vcond_mask we only expect
register operands as mask and force to a register.  For a vcond_mask_len
that results from a simplification with all-one or all-zero mask we
could allow constant immediate vectors and expand them to simple
len moves in the backend.

Regards
 Robin

From bc72e9b2f3ee46508404ee7723ca78790fa96b6b Mon Sep 17 00:00:00 2001
From: Robin Dapp <rdapp@ventanamicro.com>
Date: Fri, 13 Oct 2023 10:20:35 +0200
Subject: [PATCH] internal-fn: Add VCOND_MASK_LEN.

In order to prevent simplification of a COND_OP with degenerate mask
(all true or all zero) into just an OP in the presence of length
masking this patch introduces a length-masked analog to VEC_COND_EXPR:
IFN_VCOND_MASK_LEN.  If the to-be-simplified conditional operation has a
length that is not the full hardware vector length a simplification now
does not result int a VEC_COND but rather a VCOND_MASK_LEN.

For cases where the masks is known to be all true or all zero the patch
introduces new match patterns that allow combination of unconditional
unary, binary and ternay operations with the respective conditional
operations if the target supports it.

Similarly, if the length is known to be equal to the target hardware
length VCOND_MASK_LEN will be simplified to VEC_COND_EXPR.

gcc/ChangeLog:

	* config/riscv/autovec.md (vcond_mask_len_<mode><vm>): Add
	expander.
	* config/riscv/riscv-protos.h (enum insn_type):
	* doc/md.texi: Add vcond_mask_len.
	* gimple-match-exports.cc (maybe_resimplify_conditional_op):
	Create VCOND_MASK_LEN when
	length masking.
	* gimple-match.h (gimple_match_op::gimple_match_op): Allow
	matching of 6 and 7 parameters.
	(gimple_match_op::set_op): Ditto.
	(gimple_match_op::gimple_match_op): Always initialize len and
	bias.
	* internal-fn.cc (vec_cond_mask_len_direct): Add.
	(expand_vec_cond_mask_len_optab_fn): Add.
	(direct_vec_cond_mask_len_optab_supported_p): Add.
	(internal_fn_len_index): Add VCOND_MASK_LEN.
	(internal_fn_mask_index): Ditto.
	* internal-fn.def (VCOND_MASK_LEN): New internal function.
	* match.pd: Combine unconditional unary, binary and ternary
	operations into the respective COND_LEN operations.
	* optabs.def (OPTAB_CD): Add vcond_mask_len optab.
---
 gcc/config/riscv/autovec.md     | 20 +++++++++
 gcc/config/riscv/riscv-protos.h |  4 ++
 gcc/doc/md.texi                 |  9 ++++
 gcc/gimple-match-exports.cc     | 20 +++++++--
 gcc/gimple-match.h              | 78 ++++++++++++++++++++++++++++++++-
 gcc/internal-fn.cc              | 41 +++++++++++++++++
 gcc/internal-fn.def             |  2 +
 gcc/match.pd                    | 74 +++++++++++++++++++++++++++++++
 gcc/optabs.def                  |  1 +
 9 files changed, 244 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 80910ba3cc2..27a71bc1ef9 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -565,6 +565,26 @@ (define_insn_and_split "vcond_mask_<mode><vm>"
   [(set_attr "type" "vector")]
 )
 
+(define_expand "vcond_mask_len_<mode><vm>"
+  [(match_operand:V_VLS 0 "register_operand")
+    (match_operand:<VM> 3 "register_operand")
+    (match_operand:V_VLS 1 "nonmemory_operand")
+    (match_operand:V_VLS 2 "register_operand")
+    (match_operand 4 "autovec_length_operand")
+    (match_operand 5 "const_0_operand")]
+  "TARGET_VECTOR"
+  {
+    /* The order of vcond_mask is opposite to pred_merge.  */
+    rtx ops[] = {operands[0], operands[0], operands[2], operands[1],
+		 operands[3]};
+    riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (<MODE>mode),
+				      riscv_vector::MERGE_OP_REAL_ELSE, ops,
+				      operands[4]);
+    DONE;
+  }
+  [(set_attr "type" "vector")]
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- [BOOL] Select based on masks
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 6cb9d459ee9..025a3568566 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -337,6 +337,10 @@ enum insn_type : unsigned int
   /* For vmerge, no mask operand, no mask policy operand.  */
   MERGE_OP = __NORMAL_OP_TA2 | TERNARY_OP_P,
 
+  /* For vmerge with no vundef operand.  */
+  MERGE_OP_REAL_ELSE = HAS_DEST_P | HAS_MERGE_P | TDEFAULT_POLICY_P
+		       | TERNARY_OP_P,
+
   /* For vm<compare>, no tail policy operand.  */
   COMPARE_OP = __NORMAL_OP_MA | TERNARY_OP_P,
   COMPARE_OP_MU = __MASK_OP_MU | TERNARY_OP_P,
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index daa318ee3da..de0757f1903 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5306,6 +5306,15 @@ no need to define this instruction pattern if the others are supported.
 Similar to @code{vcond@var{m}@var{n}} but operand 3 holds a pre-computed
 result of vector comparison.
 
+@cindex @code{vcond_mask_len_@var{m}@var{n}} instruction pattern
+@item @samp{vcond_mask_@var{m}@var{n}}
+Similar to @code{vcond_mask@var{m}@var{n}} but operand 4 holds a variable
+or constant length and operand 5 holds a bias.  If the
+element index < operand 4 + operand 5 the respective element of the result is
+computed as in @code{vcond_mask_@var{m}@var{n}}.  For element indices >=
+operand 4 + operand 5 the computation is performed as if the respective mask
+element were zero.
+
 @cindex @code{maskload@var{m}@var{n}} instruction pattern
 @item @samp{maskload@var{m}@var{n}}
 Perform a masked load of vector from memory operand 1 of mode @var{m}
diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index b36027b0bad..32134dbf711 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -307,9 +307,23 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
       && VECTOR_TYPE_P (res_op->type)
       && gimple_simplified_result_is_gimple_val (res_op))
     {
-      new_op.set_op (VEC_COND_EXPR, res_op->type,
-		     res_op->cond.cond, res_op->ops[0],
-		     res_op->cond.else_value);
+      tree len = res_op->cond.len;
+      HOST_WIDE_INT ilen = -1;
+      if (len && TREE_CODE (len) == INTEGER_CST && tree_fits_uhwi_p (len))
+	ilen = tree_to_uhwi (len);at results from a simplification with all-one or all-zero mask we
could allow constant immediate vectors and expand them to simple
len moves in the backend.
+
+      auto sz = GET_MODE_NUNITS (TYPE_MODE (res_op->type));
+      bool full_len = len && known_eq (sz.coeffs[0], ilen);
+
+      if (!len || full_len)
+	new_op.set_op (VEC_COND_EXPR, res_op->type,
+		       res_op->cond.cond, res_op->ops[0],
+		       res_op->cond.else_value);
+      else
+	new_op.set_op (IFN_VCOND_MASK_LEN, res_op->type,
+		       res_op->cond.cond, res_op->ops[0],
+		       res_op->cond.else_value, res_op->cond.len,
+		       res_op->cond.bias);
       *res_op = new_op;
       return gimple_resimplify3 (seq, res_op, valueize);
     }
diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
index bec3ff42e3e..63a9f029589 100644
--- a/gcc/gimple-match.h
+++ b/gcc/gimple-match.h
@@ -32,7 +32,8 @@ public:
   enum uncond { UNCOND };
 
   /* Build an unconditional op.  */
-  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE) {}
+  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE), len
+			       (NULL_TREE), bias (NULL_TREE) {}
   gimple_match_cond (tree, tree);
   gimple_match_cond (tree, tree, tree, tree);
 
@@ -56,7 +57,8 @@ public:
 
 inline
 gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in)
-  : cond (cond_in), else_value (else_value_in)
+  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
+    bias (NULL_TREE)
 {
 }
 
@@ -92,6 +94,10 @@ public:
 		   code_helper, tree, tree, tree, tree, tree);
   gimple_match_op (const gimple_match_cond &,
 		   code_helper, tree, tree, tree, tree, tree, tree);
+  gimple_match_op (const gimple_match_cond &,
+		   code_helper, tree, tree, tree, tree, tree, tree, tree);
+  gimple_match_op (const gimple_match_cond &,
+		   code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
 
   void set_op (code_helper, tree, unsigned int);
   void set_op (code_helper, tree, tree);
@@ -100,6 +106,8 @@ public:
   void set_op (code_helper, tree, tree, tree, tree, bool);
   void set_op (code_helper, tree, tree, tree, tree, tree);
   void set_op (code_helper, tree, tree, tree, tree, tree, tree);
+  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree);
+  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
   void set_value (tree);
 
   tree op_or_null (unsigned int) const;
@@ -212,6 +220,39 @@ gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
   ops[4] = op4;
 }
 
+inline
+gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
+				  code_helper code_in, tree type_in,
+				  tree op0, tree op1, tree op2, tree op3,
+				  tree op4, tree op5)
+  : cond (cond_in), code (code_in), type (type_in), reverse (false),
+    num_ops (6)
+{
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+}
+
+inline
+gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
+				  code_helper code_in, tree type_in,
+				  tree op0, tree op1, tree op2, tree op3,
+				  tree op4, tree op5, tree op6)
+  : cond (cond_in), code (code_in), type (type_in), reverse (false),
+    num_ops (7)
+{
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+  ops[6] = op6;
+}
+
 /* Change the operation performed to CODE_IN, the type of the result to
    TYPE_IN, and the number of operands to NUM_OPS_IN.  The caller needs
    to set the operands itself.  */
@@ -299,6 +340,39 @@ gimple_match_op::set_op (code_helper code_in, tree type_in,
   ops[4] = op4;
 }
 
+inline void
+gimple_match_op::set_op (code_helper code_in, tree type_in,
+			 tree op0, tree op1, tree op2, tree op3, tree op4,
+			 tree op5)
+{
+  code = code_in;
+  type = type_in;
+  num_ops = 6;
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+}
+
+inline void
+gimple_match_op::set_op (code_helper code_in, tree type_in,
+			 tree op0, tree op1, tree op2, tree op3, tree op4,
+			 tree op5, tree op6)
+{
+  code = code_in;
+  type = type_in;
+  num_ops = 7;
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+  ops[6] = op6;
+}
+
 /* Set the "operation" to be the single value VALUE, such as a constant
    or SSA_NAME.  */
 
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 61d5a9e4772..b47c33faf85 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -170,6 +170,7 @@ init_internal_fns ()
 #define store_lanes_direct { 0, 0, false }
 #define mask_store_lanes_direct { 0, 0, false }
 #define vec_cond_mask_direct { 1, 0, false }
+#define vec_cond_mask_len_direct { 2, 0, false }
 #define vec_cond_direct { 2, 0, false }
 #define scatter_store_direct { 3, 1, false }
 #define len_store_direct { 3, 3, false }
@@ -3129,6 +3130,41 @@ expand_vec_cond_mask_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
     emit_move_insn (target, ops[0].value);
 }
 
+static void
+expand_vec_cond_mask_len_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  class expand_operand ops[6];
+
+  tree lhs = gimple_call_lhs (stmt);
+  tree op0 = gimple_call_arg (stmt, 0);
+  tree op1 = gimple_call_arg (stmt, 1);
+  tree op2 = gimple_call_arg (stmt, 2);
+  tree vec_cond_type = TREE_TYPE (lhs);
+
+  machine_mode mode = TYPE_MODE (vec_cond_type);
+  machine_mode mask_mode = TYPE_MODE (TREE_TYPE (op0));
+  enum insn_code icode = convert_optab_handler (optab, mode, mask_mode);
+  rtx rtx_op1, rtx_op2;
+
+  gcc_assert (icode != CODE_FOR_nothing);
+
+  rtx_op1 = expand_normal (op1);
+  rtx_op2 = expand_normal (op2);
+
+  rtx_op1 = force_reg (mode, rtx_op1);
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_op1, mode);
+  create_input_operand (&ops[2], rtx_op2, mode);
+
+  int opno = add_mask_and_len_args (ops, 3, stmt);
+  expand_insn (icode, opno, ops);
+
+  if (!rtx_equal_p (ops[0].value, target))
+    emit_move_insn (target, ops[0].value);
+}
+
 /* Expand VEC_SET internal functions.  */
 
 static void
@@ -4018,6 +4054,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_vec_cond_mask_optab_supported_p convert_optab_supported_p
+#define direct_vec_cond_mask_len_optab_supported_p convert_optab_supported_p
 #define direct_vec_cond_optab_supported_p convert_optab_supported_p
 #define direct_scatter_store_optab_supported_p convert_optab_supported_p
 #define direct_len_store_optab_supported_p direct_optab_supported_p
@@ -4690,6 +4727,7 @@ internal_fn_len_index (internal_fn fn)
     case IFN_MASK_LEN_STORE:
     case IFN_MASK_LEN_LOAD_LANES:
     case IFN_MASK_LEN_STORE_LANES:
+    case IFN_VCOND_MASK_LEN:
       return 3;
 
     default:
@@ -4721,6 +4759,9 @@ internal_fn_mask_index (internal_fn fn)
     case IFN_MASK_LEN_SCATTER_STORE:
       return 4;
 
+    case IFN_VCOND_MASK_LEN:
+      return 0;
+
     default:
       return (conditional_internal_fn_code (fn) != ERROR_MARK
 	      || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index a2023ab9c3d..581cc3b5140 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -221,6 +221,8 @@ DEF_INTERNAL_OPTAB_FN (VCONDU, ECF_CONST | ECF_NOTHROW, vcondu, vec_cond)
 DEF_INTERNAL_OPTAB_FN (VCONDEQ, ECF_CONST | ECF_NOTHROW, vcondeq, vec_cond)
 DEF_INTERNAL_OPTAB_FN (VCOND_MASK, ECF_CONST | ECF_NOTHROW,
 		       vcond_mask, vec_cond_mask)
+DEF_INTERNAL_OPTAB_FN (VCOND_MASK_LEN, ECF_CONST | ECF_NOTHROW,
+		       vcond_mask_len, vec_cond_mask_len)
 
 DEF_INTERNAL_OPTAB_FN (VEC_SET, ECF_CONST | ECF_NOTHROW, vec_set, vec_set)
 DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
diff --git a/gcc/match.pd b/gcc/match.pd
index ce8d159d260..f187d560fbf 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -87,6 +87,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   negate bit_not)
 (define_operator_list COND_UNARY
   IFN_COND_NEG IFN_COND_NOT)
+(define_operator_list COND_LEN_UNARY
+  IFN_COND_LEN_NEG IFN_COND_LEN_NOT)
 
 /* Binary operations and their associated IFN_COND_* function.  */
 (define_operator_list UNCOND_BINARY
@@ -103,12 +105,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   IFN_COND_FMIN IFN_COND_FMAX
   IFN_COND_AND IFN_COND_IOR IFN_COND_XOR
   IFN_COND_SHL IFN_COND_SHR)
+(define_operator_list COND_LEN_BINARY
+  IFN_COND_LEN_ADD IFN_COND_LEN_SUB
+  IFN_COND_LEN_MUL IFN_COND_LEN_DIV IFN_COND_LEN_MOD IFN_COND_LEN_RDIV
+  IFN_COND_LEN_MIN IFN_COND_LEN_MAX
+  IFN_COND_LEN_FMIN IFN_COND_LEN_FMAX
+  IFN_COND_LEN_AND IFN_COND_LEN_IOR IFN_COND_LEN_XOR
+  IFN_COND_LEN_SHL IFN_COND_LEN_SHR)
 
 /* Same for ternary operations.  */
 (define_operator_list UNCOND_TERNARY
   IFN_FMA IFN_FMS IFN_FNMA IFN_FNMS)
 (define_operator_list COND_TERNARY
   IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)
+(define_operator_list COND_LEN_TERNARY
+  IFN_COND_LEN_FMA IFN_COND_LEN_FMS IFN_COND_LEN_FNMA IFN_COND_LEN_FNMS)
 
 /* __atomic_fetch_or_*, __atomic_fetch_xor_*, __atomic_xor_fetch_*  */
 (define_operator_list ATOMIC_FETCH_OR_XOR_N
@@ -8949,6 +8960,69 @@ and,
 	&& single_use (@5))
     (view_convert (cond_op (bit_not @0) @2 @3 @4
 		  (view_convert:op_type @1)))))))
+
+/* Similar for all cond_len operations.  */
+(for uncond_op (UNCOND_UNARY)
+     cond_op (COND_LEN_UNARY)
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@3 @1)) @2 @4 @5)
+   (with { tree op_type = TREE_TYPE (@3); }
+    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+        && is_truth_type_for (op_type, TREE_TYPE (@0)))
+     (cond_op @0 @1 @2 @4 @5))))
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@3 @2)) @4 @5)
+   (with { tree op_type = TREE_TYPE (@3); }
+    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+        && is_truth_type_for (op_type, TREE_TYPE (@0)))
+     (cond_op (bit_not @0) @2 @1 @4 @5)))))
+
+(for uncond_op (UNCOND_BINARY)
+     cond_op (COND_LEN_BINARY)
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@4 @1 @2)) @3 @5 @6)
+  (with { tree op_type = TREE_TYPE (@4); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@4))
+    (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3) @5 @6)))))
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@4 @2 @3)) @5 @6)
+  (with { tree op_type = TREE_TYPE (@4); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@4))
+    (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1) @5 @6))))))
+
+(for uncond_op (UNCOND_TERNARY)
+     cond_op (COND_LEN_TERNARY)
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4 @6 @7)
+  (with { tree op_type = TREE_TYPE (@5); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@5))
+    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4) @6 @7)))))
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@5 @2 @3 @4 @6 @7)))
+  (with { tree op_type = TREE_TYPE (@5); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@5))
+    (view_convert (cond_op (bit_not @0) @2 @3 @4 (view_convert:op_type @1) @6 @7))))))
+
+/* A VCOND_MASK_LEN with a size that equals the full hardware vector size
+   is just a vec_cond.  */
+(simplify
+ (IFN_VCOND_MASK_LEN @0 @1 @2 INTEGER_CST@3 INTEGER_CST@4)
+ (with {
+      HOST_WIDE_INT len = -1;
+      if (tree_fits_uhwi_p (@3))
+	len = tree_to_uhwi (@3);
+      auto sz = GET_MODE_NUNITS (TYPE_MODE (res_op->type));
+      bool full_len = (sz.coeffs[0] == len); }
+   (if (full_len)
+    (vec_cond @0 @1 @2))))
 #endif
 
 /* Detect cases in which a VEC_COND_EXPR effectively replaces the
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 2ccbe4197b7..3cb16bd3002 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -88,6 +88,7 @@ OPTAB_CD(vcond_optab, "vcond$a$b")
 OPTAB_CD(vcondu_optab, "vcondu$a$b")
 OPTAB_CD(vcondeq_optab, "vcondeq$a$b")
 OPTAB_CD(vcond_mask_optab, "vcond_mask_$a$b")
+OPTAB_CD(vcond_mask_len_optab, "vcond_mask_len_$a$b")
 OPTAB_CD(vec_cmp_optab, "vec_cmp$a$b")
 OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b")
 OPTAB_CD(vec_cmpeq_optab, "vec_cmpeq$a$b")
-- 
2.41.0


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.
  2023-10-23 16:09                         ` [PATCH] internal-fn: Add VCOND_MASK_LEN Robin Dapp
@ 2023-10-24 21:50                           ` Richard Sandiford
  2023-10-25 19:59                             ` Robin Dapp
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Sandiford @ 2023-10-24 21:50 UTC (permalink / raw)
  To: Robin Dapp; +Cc: Richard Biener, gcc-patches

Robin Dapp <rdapp.gcc@gmail.com> writes:
> The attached patch introduces a VCOND_MASK_LEN, helps for the riscv cases
> that were broken before and looks unchanged on x86, aarch64 and power
> bootstrap and testsuites.
>
> I only went with the minimal number of new match.pd patterns and did not
> try stripping the length of a COND_LEN_OP in order to simplify the
> associated COND_OP.
>
> An important part that I'm not sure how to handle properly is -
> when we have a constant immediate length of e.g. 16 and the hardware
> also operates on 16 units, vector length masking is actually
> redundant and the vcond_mask_len can be reduced to a vec_cond.
> For those (if_then_else unsplit) we have a large number of combine
> patterns that fuse instruction which do not correspond to ifns
> (like widening operations but also more complex ones).
>
> Currently I achieve this in a most likely wrong way:
>
>       auto sz = GET_MODE_NUNITS (TYPE_MODE (res_op->type));
>       bool full_len = len && known_eq (sz.coeffs[0], ilen);
>       if (!len || full_len)
>          "vec_cond"
>       else
>          "vcond_mask_len"

At first, this seemed like an odd place to fold away the length.
AFAIK the length in res_op is inherited directly from the original
operation, and so it isn't any more redundant after the fold than
it was before.  But I suppose the reason for doing it here is that
we deliberately create IFN_COND_LEN_FOO calls that have "redundant"
lengths.  Doing that avoids the need to define an IFN_COND_FOO
equivalent of every IFN_COND_LEN_FOO optab.  Is that right?  If so,
I think it deserves a comment.

But yeah, known_eq (sz.coeffs[0], ilen) doesn't look right.
If the target knows that the length is exactly 16 at runtime,
then it should set GET_MODE_NUNITS to 16.  So I think the length
is only redundant if known_eq (sz, ilen).

The calculation should take the bias into account as well.

Any reason not to make IFN_COND_LEN_MASK a directly-mapped optab?
(I realise IFN_COND_MASK isn't, but that's used differently.)

Failing that, could the expansion use expand_fn_using_insn?

It generally looks OK to me otherwise FWIW, but it would be nice
to handle the fold programmatically in gimple-match*.cc rather
than having the explicit match.pd patterns.  Richi should review
the match.pd stuff though. ;)  (I didn't really look at it.)

Thanks,
Richard

> Another thing not done in this patch:  For vcond_mask we only expect
> register operands as mask and force to a register.  For a vcond_mask_len
> that results from a simplification with all-one or all-zero mask we
> could allow constant immediate vectors and expand them to simple
> len moves in the backend.
>
> Regards
>  Robin
>
> From bc72e9b2f3ee46508404ee7723ca78790fa96b6b Mon Sep 17 00:00:00 2001
> From: Robin Dapp <rdapp@ventanamicro.com>
> Date: Fri, 13 Oct 2023 10:20:35 +0200
> Subject: [PATCH] internal-fn: Add VCOND_MASK_LEN.
>
> In order to prevent simplification of a COND_OP with degenerate mask
> (all true or all zero) into just an OP in the presence of length
> masking this patch introduces a length-masked analog to VEC_COND_EXPR:
> IFN_VCOND_MASK_LEN.  If the to-be-simplified conditional operation has a
> length that is not the full hardware vector length a simplification now
> does not result int a VEC_COND but rather a VCOND_MASK_LEN.
>
> For cases where the masks is known to be all true or all zero the patch
> introduces new match patterns that allow combination of unconditional
> unary, binary and ternay operations with the respective conditional
> operations if the target supports it.
>
> Similarly, if the length is known to be equal to the target hardware
> length VCOND_MASK_LEN will be simplified to VEC_COND_EXPR.
>
> gcc/ChangeLog:
>
> 	* config/riscv/autovec.md (vcond_mask_len_<mode><vm>): Add
> 	expander.
> 	* config/riscv/riscv-protos.h (enum insn_type):
> 	* doc/md.texi: Add vcond_mask_len.
> 	* gimple-match-exports.cc (maybe_resimplify_conditional_op):
> 	Create VCOND_MASK_LEN when
> 	length masking.
> 	* gimple-match.h (gimple_match_op::gimple_match_op): Allow
> 	matching of 6 and 7 parameters.
> 	(gimple_match_op::set_op): Ditto.
> 	(gimple_match_op::gimple_match_op): Always initialize len and
> 	bias.
> 	* internal-fn.cc (vec_cond_mask_len_direct): Add.
> 	(expand_vec_cond_mask_len_optab_fn): Add.
> 	(direct_vec_cond_mask_len_optab_supported_p): Add.
> 	(internal_fn_len_index): Add VCOND_MASK_LEN.
> 	(internal_fn_mask_index): Ditto.
> 	* internal-fn.def (VCOND_MASK_LEN): New internal function.
> 	* match.pd: Combine unconditional unary, binary and ternary
> 	operations into the respective COND_LEN operations.
> 	* optabs.def (OPTAB_CD): Add vcond_mask_len optab.
> ---
>  gcc/config/riscv/autovec.md     | 20 +++++++++
>  gcc/config/riscv/riscv-protos.h |  4 ++
>  gcc/doc/md.texi                 |  9 ++++
>  gcc/gimple-match-exports.cc     | 20 +++++++--
>  gcc/gimple-match.h              | 78 ++++++++++++++++++++++++++++++++-
>  gcc/internal-fn.cc              | 41 +++++++++++++++++
>  gcc/internal-fn.def             |  2 +
>  gcc/match.pd                    | 74 +++++++++++++++++++++++++++++++
>  gcc/optabs.def                  |  1 +
>  9 files changed, 244 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 80910ba3cc2..27a71bc1ef9 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -565,6 +565,26 @@ (define_insn_and_split "vcond_mask_<mode><vm>"
>    [(set_attr "type" "vector")]
>  )
>  
> +(define_expand "vcond_mask_len_<mode><vm>"
> +  [(match_operand:V_VLS 0 "register_operand")
> +    (match_operand:<VM> 3 "register_operand")
> +    (match_operand:V_VLS 1 "nonmemory_operand")
> +    (match_operand:V_VLS 2 "register_operand")
> +    (match_operand 4 "autovec_length_operand")
> +    (match_operand 5 "const_0_operand")]
> +  "TARGET_VECTOR"
> +  {
> +    /* The order of vcond_mask is opposite to pred_merge.  */
> +    rtx ops[] = {operands[0], operands[0], operands[2], operands[1],
> +		 operands[3]};
> +    riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (<MODE>mode),
> +				      riscv_vector::MERGE_OP_REAL_ELSE, ops,
> +				      operands[4]);
> +    DONE;
> +  }
> +  [(set_attr "type" "vector")]
> +)
> +
>  ;; -------------------------------------------------------------------------
>  ;; ---- [BOOL] Select based on masks
>  ;; -------------------------------------------------------------------------
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 6cb9d459ee9..025a3568566 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -337,6 +337,10 @@ enum insn_type : unsigned int
>    /* For vmerge, no mask operand, no mask policy operand.  */
>    MERGE_OP = __NORMAL_OP_TA2 | TERNARY_OP_P,
>  
> +  /* For vmerge with no vundef operand.  */
> +  MERGE_OP_REAL_ELSE = HAS_DEST_P | HAS_MERGE_P | TDEFAULT_POLICY_P
> +		       | TERNARY_OP_P,
> +
>    /* For vm<compare>, no tail policy operand.  */
>    COMPARE_OP = __NORMAL_OP_MA | TERNARY_OP_P,
>    COMPARE_OP_MU = __MASK_OP_MU | TERNARY_OP_P,
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index daa318ee3da..de0757f1903 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5306,6 +5306,15 @@ no need to define this instruction pattern if the others are supported.
>  Similar to @code{vcond@var{m}@var{n}} but operand 3 holds a pre-computed
>  result of vector comparison.
>  
> +@cindex @code{vcond_mask_len_@var{m}@var{n}} instruction pattern
> +@item @samp{vcond_mask_@var{m}@var{n}}
> +Similar to @code{vcond_mask@var{m}@var{n}} but operand 4 holds a variable
> +or constant length and operand 5 holds a bias.  If the
> +element index < operand 4 + operand 5 the respective element of the result is
> +computed as in @code{vcond_mask_@var{m}@var{n}}.  For element indices >=
> +operand 4 + operand 5 the computation is performed as if the respective mask
> +element were zero.
> +
>  @cindex @code{maskload@var{m}@var{n}} instruction pattern
>  @item @samp{maskload@var{m}@var{n}}
>  Perform a masked load of vector from memory operand 1 of mode @var{m}
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> index b36027b0bad..32134dbf711 100644
> --- a/gcc/gimple-match-exports.cc
> +++ b/gcc/gimple-match-exports.cc
> @@ -307,9 +307,23 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
>        && VECTOR_TYPE_P (res_op->type)
>        && gimple_simplified_result_is_gimple_val (res_op))
>      {
> -      new_op.set_op (VEC_COND_EXPR, res_op->type,
> -		     res_op->cond.cond, res_op->ops[0],
> -		     res_op->cond.else_value);
> +      tree len = res_op->cond.len;
> +      HOST_WIDE_INT ilen = -1;
> +      if (len && TREE_CODE (len) == INTEGER_CST && tree_fits_uhwi_p (len))
> +	ilen = tree_to_uhwi (len);
> +
> +      auto sz = GET_MODE_NUNITS (TYPE_MODE (res_op->type));
> +      bool full_len = len && known_eq (sz.coeffs[0], ilen);
> +
> +      if (!len || full_len)
> +	new_op.set_op (VEC_COND_EXPR, res_op->type,
> +		       res_op->cond.cond, res_op->ops[0],
> +		       res_op->cond.else_value);
> +      else
> +	new_op.set_op (IFN_VCOND_MASK_LEN, res_op->type,
> +		       res_op->cond.cond, res_op->ops[0],
> +		       res_op->cond.else_value, res_op->cond.len,
> +		       res_op->cond.bias);
>        *res_op = new_op;
>        return gimple_resimplify3 (seq, res_op, valueize);
>      }
> diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
> index bec3ff42e3e..63a9f029589 100644
> --- a/gcc/gimple-match.h
> +++ b/gcc/gimple-match.h
> @@ -32,7 +32,8 @@ public:
>    enum uncond { UNCOND };
>  
>    /* Build an unconditional op.  */
> -  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE) {}
> +  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE), len
> +			       (NULL_TREE), bias (NULL_TREE) {}
>    gimple_match_cond (tree, tree);
>    gimple_match_cond (tree, tree, tree, tree);
>  
> @@ -56,7 +57,8 @@ public:
>  
>  inline
>  gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in)
> -  : cond (cond_in), else_value (else_value_in)
> +  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
> +    bias (NULL_TREE)
>  {
>  }
>  
> @@ -92,6 +94,10 @@ public:
>  		   code_helper, tree, tree, tree, tree, tree);
>    gimple_match_op (const gimple_match_cond &,
>  		   code_helper, tree, tree, tree, tree, tree, tree);
> +  gimple_match_op (const gimple_match_cond &,
> +		   code_helper, tree, tree, tree, tree, tree, tree, tree);
> +  gimple_match_op (const gimple_match_cond &,
> +		   code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
>  
>    void set_op (code_helper, tree, unsigned int);
>    void set_op (code_helper, tree, tree);
> @@ -100,6 +106,8 @@ public:
>    void set_op (code_helper, tree, tree, tree, tree, bool);
>    void set_op (code_helper, tree, tree, tree, tree, tree);
>    void set_op (code_helper, tree, tree, tree, tree, tree, tree);
> +  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree);
> +  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
>    void set_value (tree);
>  
>    tree op_or_null (unsigned int) const;
> @@ -212,6 +220,39 @@ gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
>    ops[4] = op4;
>  }
>  
> +inline
> +gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
> +				  code_helper code_in, tree type_in,
> +				  tree op0, tree op1, tree op2, tree op3,
> +				  tree op4, tree op5)
> +  : cond (cond_in), code (code_in), type (type_in), reverse (false),
> +    num_ops (6)
> +{
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +}
> +
> +inline
> +gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
> +				  code_helper code_in, tree type_in,
> +				  tree op0, tree op1, tree op2, tree op3,
> +				  tree op4, tree op5, tree op6)
> +  : cond (cond_in), code (code_in), type (type_in), reverse (false),
> +    num_ops (7)
> +{
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +  ops[6] = op6;
> +}
> +
>  /* Change the operation performed to CODE_IN, the type of the result to
>     TYPE_IN, and the number of operands to NUM_OPS_IN.  The caller needs
>     to set the operands itself.  */
> @@ -299,6 +340,39 @@ gimple_match_op::set_op (code_helper code_in, tree type_in,
>    ops[4] = op4;
>  }
>  
> +inline void
> +gimple_match_op::set_op (code_helper code_in, tree type_in,
> +			 tree op0, tree op1, tree op2, tree op3, tree op4,
> +			 tree op5)
> +{
> +  code = code_in;
> +  type = type_in;
> +  num_ops = 6;
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +}
> +
> +inline void
> +gimple_match_op::set_op (code_helper code_in, tree type_in,
> +			 tree op0, tree op1, tree op2, tree op3, tree op4,
> +			 tree op5, tree op6)
> +{
> +  code = code_in;
> +  type = type_in;
> +  num_ops = 7;
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +  ops[6] = op6;
> +}
> +
>  /* Set the "operation" to be the single value VALUE, such as a constant
>     or SSA_NAME.  */
>  
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 61d5a9e4772..b47c33faf85 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -170,6 +170,7 @@ init_internal_fns ()
>  #define store_lanes_direct { 0, 0, false }
>  #define mask_store_lanes_direct { 0, 0, false }
>  #define vec_cond_mask_direct { 1, 0, false }
> +#define vec_cond_mask_len_direct { 2, 0, false }
>  #define vec_cond_direct { 2, 0, false }
>  #define scatter_store_direct { 3, 1, false }
>  #define len_store_direct { 3, 3, false }
> @@ -3129,6 +3130,41 @@ expand_vec_cond_mask_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>      emit_move_insn (target, ops[0].value);
>  }
>  
> +static void
> +expand_vec_cond_mask_len_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +{
> +  class expand_operand ops[6];
> +
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree op0 = gimple_call_arg (stmt, 0);
> +  tree op1 = gimple_call_arg (stmt, 1);
> +  tree op2 = gimple_call_arg (stmt, 2);
> +  tree vec_cond_type = TREE_TYPE (lhs);
> +
> +  machine_mode mode = TYPE_MODE (vec_cond_type);
> +  machine_mode mask_mode = TYPE_MODE (TREE_TYPE (op0));
> +  enum insn_code icode = convert_optab_handler (optab, mode, mask_mode);
> +  rtx rtx_op1, rtx_op2;
> +
> +  gcc_assert (icode != CODE_FOR_nothing);
> +
> +  rtx_op1 = expand_normal (op1);
> +  rtx_op2 = expand_normal (op2);
> +
> +  rtx_op1 = force_reg (mode, rtx_op1);
> +
> +  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> +  create_output_operand (&ops[0], target, mode);
> +  create_input_operand (&ops[1], rtx_op1, mode);
> +  create_input_operand (&ops[2], rtx_op2, mode);
> +
> +  int opno = add_mask_and_len_args (ops, 3, stmt);
> +  expand_insn (icode, opno, ops);
> +
> +  if (!rtx_equal_p (ops[0].value, target))
> +    emit_move_insn (target, ops[0].value);
> +}
> +
>  /* Expand VEC_SET internal functions.  */
>  
>  static void
> @@ -4018,6 +4054,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>  #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
>  #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
>  #define direct_vec_cond_mask_optab_supported_p convert_optab_supported_p
> +#define direct_vec_cond_mask_len_optab_supported_p convert_optab_supported_p
>  #define direct_vec_cond_optab_supported_p convert_optab_supported_p
>  #define direct_scatter_store_optab_supported_p convert_optab_supported_p
>  #define direct_len_store_optab_supported_p direct_optab_supported_p
> @@ -4690,6 +4727,7 @@ internal_fn_len_index (internal_fn fn)
>      case IFN_MASK_LEN_STORE:
>      case IFN_MASK_LEN_LOAD_LANES:
>      case IFN_MASK_LEN_STORE_LANES:
> +    case IFN_VCOND_MASK_LEN:
>        return 3;
>  
>      default:
> @@ -4721,6 +4759,9 @@ internal_fn_mask_index (internal_fn fn)
>      case IFN_MASK_LEN_SCATTER_STORE:
>        return 4;
>  
> +    case IFN_VCOND_MASK_LEN:
> +      return 0;
> +
>      default:
>        return (conditional_internal_fn_code (fn) != ERROR_MARK
>  	      || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index a2023ab9c3d..581cc3b5140 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -221,6 +221,8 @@ DEF_INTERNAL_OPTAB_FN (VCONDU, ECF_CONST | ECF_NOTHROW, vcondu, vec_cond)
>  DEF_INTERNAL_OPTAB_FN (VCONDEQ, ECF_CONST | ECF_NOTHROW, vcondeq, vec_cond)
>  DEF_INTERNAL_OPTAB_FN (VCOND_MASK, ECF_CONST | ECF_NOTHROW,
>  		       vcond_mask, vec_cond_mask)
> +DEF_INTERNAL_OPTAB_FN (VCOND_MASK_LEN, ECF_CONST | ECF_NOTHROW,
> +		       vcond_mask_len, vec_cond_mask_len)
>  
>  DEF_INTERNAL_OPTAB_FN (VEC_SET, ECF_CONST | ECF_NOTHROW, vec_set, vec_set)
>  DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
> diff --git a/gcc/match.pd b/gcc/match.pd
> index ce8d159d260..f187d560fbf 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -87,6 +87,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    negate bit_not)
>  (define_operator_list COND_UNARY
>    IFN_COND_NEG IFN_COND_NOT)
> +(define_operator_list COND_LEN_UNARY
> +  IFN_COND_LEN_NEG IFN_COND_LEN_NOT)
>  
>  /* Binary operations and their associated IFN_COND_* function.  */
>  (define_operator_list UNCOND_BINARY
> @@ -103,12 +105,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    IFN_COND_FMIN IFN_COND_FMAX
>    IFN_COND_AND IFN_COND_IOR IFN_COND_XOR
>    IFN_COND_SHL IFN_COND_SHR)
> +(define_operator_list COND_LEN_BINARY
> +  IFN_COND_LEN_ADD IFN_COND_LEN_SUB
> +  IFN_COND_LEN_MUL IFN_COND_LEN_DIV IFN_COND_LEN_MOD IFN_COND_LEN_RDIV
> +  IFN_COND_LEN_MIN IFN_COND_LEN_MAX
> +  IFN_COND_LEN_FMIN IFN_COND_LEN_FMAX
> +  IFN_COND_LEN_AND IFN_COND_LEN_IOR IFN_COND_LEN_XOR
> +  IFN_COND_LEN_SHL IFN_COND_LEN_SHR)
>  
>  /* Same for ternary operations.  */
>  (define_operator_list UNCOND_TERNARY
>    IFN_FMA IFN_FMS IFN_FNMA IFN_FNMS)
>  (define_operator_list COND_TERNARY
>    IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)
> +(define_operator_list COND_LEN_TERNARY
> +  IFN_COND_LEN_FMA IFN_COND_LEN_FMS IFN_COND_LEN_FNMA IFN_COND_LEN_FNMS)
>  
>  /* __atomic_fetch_or_*, __atomic_fetch_xor_*, __atomic_xor_fetch_*  */
>  (define_operator_list ATOMIC_FETCH_OR_XOR_N
> @@ -8949,6 +8960,69 @@ and,
>  	&& single_use (@5))
>      (view_convert (cond_op (bit_not @0) @2 @3 @4
>  		  (view_convert:op_type @1)))))))
> +
> +/* Similar for all cond_len operations.  */
> +(for uncond_op (UNCOND_UNARY)
> +     cond_op (COND_LEN_UNARY)
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@3 @1)) @2 @4 @5)
> +   (with { tree op_type = TREE_TYPE (@3); }
> +    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +        && is_truth_type_for (op_type, TREE_TYPE (@0)))
> +     (cond_op @0 @1 @2 @4 @5))))
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@3 @2)) @4 @5)
> +   (with { tree op_type = TREE_TYPE (@3); }
> +    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +        && is_truth_type_for (op_type, TREE_TYPE (@0)))
> +     (cond_op (bit_not @0) @2 @1 @4 @5)))))
> +
> +(for uncond_op (UNCOND_BINARY)
> +     cond_op (COND_LEN_BINARY)
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@4 @1 @2)) @3 @5 @6)
> +  (with { tree op_type = TREE_TYPE (@4); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@4))
> +    (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3) @5 @6)))))
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@4 @2 @3)) @5 @6)
> +  (with { tree op_type = TREE_TYPE (@4); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@4))
> +    (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1) @5 @6))))))
> +
> +(for uncond_op (UNCOND_TERNARY)
> +     cond_op (COND_LEN_TERNARY)
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4 @6 @7)
> +  (with { tree op_type = TREE_TYPE (@5); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@5))
> +    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4) @6 @7)))))
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@5 @2 @3 @4 @6 @7)))
> +  (with { tree op_type = TREE_TYPE (@5); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@5))
> +    (view_convert (cond_op (bit_not @0) @2 @3 @4 (view_convert:op_type @1) @6 @7))))))
> +
> +/* A VCOND_MASK_LEN with a size that equals the full hardware vector size
> +   is just a vec_cond.  */
> +(simplify
> + (IFN_VCOND_MASK_LEN @0 @1 @2 INTEGER_CST@3 INTEGER_CST@4)
> + (with {
> +      HOST_WIDE_INT len = -1;
> +      if (tree_fits_uhwi_p (@3))
> +	len = tree_to_uhwi (@3);
> +      auto sz = GET_MODE_NUNITS (TYPE_MODE (res_op->type));
> +      bool full_len = (sz.coeffs[0] == len); }
> +   (if (full_len)
> +    (vec_cond @0 @1 @2))))
>  #endif
>  
>  /* Detect cases in which a VEC_COND_EXPR effectively replaces the
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 2ccbe4197b7..3cb16bd3002 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -88,6 +88,7 @@ OPTAB_CD(vcond_optab, "vcond$a$b")
>  OPTAB_CD(vcondu_optab, "vcondu$a$b")
>  OPTAB_CD(vcondeq_optab, "vcondeq$a$b")
>  OPTAB_CD(vcond_mask_optab, "vcond_mask_$a$b")
> +OPTAB_CD(vcond_mask_len_optab, "vcond_mask_len_$a$b")
>  OPTAB_CD(vec_cmp_optab, "vec_cmp$a$b")
>  OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b")
>  OPTAB_CD(vec_cmpeq_optab, "vec_cmpeq$a$b")

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.
  2023-10-24 21:50                           ` Richard Sandiford
@ 2023-10-25 19:59                             ` Robin Dapp
  2023-10-25 21:58                               ` Richard Sandiford
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2023-10-25 19:59 UTC (permalink / raw)
  To: Richard Biener, gcc-patches, richard.sandiford; +Cc: rdapp.gcc

> At first, this seemed like an odd place to fold away the length.
> AFAIK the length in res_op is inherited directly from the original
> operation, and so it isn't any more redundant after the fold than
> it was before.  But I suppose the reason for doing it here is that
> we deliberately create IFN_COND_LEN_FOO calls that have "redundant"
> lengths.  Doing that avoids the need to define an IFN_COND_FOO
> equivalent of every IFN_COND_LEN_FOO optab.  Is that right?  If so,
> I think it deserves a comment.

I think, generally, what I want to cover is a more fundamental thing
- in length-controlled targets the loop length doesn't change
throughout a loop and what we normally do is load the right length,
operate on the maximum length (ignoring tail elements) and store
the right length.

So, whenever the length is constant it was already determined that
we operate on exactly this length and length masking is not needed.
Only when the length is variable and not compile-time constant we need
to use length masking (and therefore the vec_cond simplification becomes
invalid).  I think we never e.g. operate on the first "half" of a
vector, leaving the second half unchanged.  As far as I know such access
patterns are always done with non-length, "conditional" masking.

Actually the only problematic cases I found were reduction-like loops
where the reduction operated on full length rather than the "right" one.
If a tail element is wrong then, obviously the reduction result is also
wrong.  From a "loop len" point of view a reduction could have a length
like len_store.  Then the simplification problem would go away.

In the attached version I removed the hunk you mentioned but added a
match.pd pattern where all constant-length vcond_mask_len are simplified
to vec_cond.

/* A VCOND_MASK_LEN with a constant length is just a vec_cond for
   our purposes.  */
(simplify
 (IFN_VCOND_MASK_LEN @0 @1 @2 INTEGER_CST@3 INTEGER_CST@4)
  (vec_cond @0 @1 @2))

This works for all of the testsuite (and is basically the same
thing we have been testing all along with the bogus simplification
still in place).  Is there any way how to formalize the
requirement?  Or am I totally wrong and this must never be done?

> Any reason not to make IFN_COND_LEN_MASK a directly-mapped optab?
> (I realise IFN_COND_MASK isn't, but that's used differently.)

Right, a conversion optab is not necessary - in the expander function
all we really do is move the condition from position 1 to 3.  Changing
the order would mean inconsistency with vec_cond.  If that's acceptable
I can change it and we can use expand_direct_optab_fn.  For now I kept
the expander function but used a direct optab.

Regards
 Robin

From 4f793b71184b3301087780ed500f798d69328fc9 Mon Sep 17 00:00:00 2001
From: Robin Dapp <rdapp@ventanamicro.com>
Date: Fri, 13 Oct 2023 10:20:35 +0200
Subject: [PATCH v2] internal-fn: Add VCOND_MASK_LEN.

In order to prevent simplification of a COND_OP with degenerate mask
(all true or all zero) into just an OP in the presence of length
masking this patch introduces a length-masked analog to VEC_COND_EXPR:
IFN_VCOND_MASK_LEN.  If the to-be-simplified conditional operation has a
length that is not the full hardware vector length a simplification now
does not result int a VEC_COND but rather a VCOND_MASK_LEN.

For cases where the masks is known to be all true or all zero the patch
introduces new match patterns that allow combination of unconditional
unary, binary and ternay operations with the respective conditional
operations if the target supports it.

Similarly, if the length is known to be equal to the target hardware
length VCOND_MASK_LEN will be simplified to VEC_COND_EXPR.

gcc/ChangeLog:

	* config/riscv/autovec.md (vcond_mask_len_<mode><vm>): Add
	expander.
	* config/riscv/riscv-protos.h (enum insn_type):
	* doc/md.texi: Add vcond_mask_len.
	* gimple-match-exports.cc (maybe_resimplify_conditional_op):
	Create VCOND_MASK_LEN when
	length masking.
	* gimple-match.h (gimple_match_op::gimple_match_op): Allow
	matching of 6 and 7 parameters.
	(gimple_match_op::set_op): Ditto.
	(gimple_match_op::gimple_match_op): Always initialize len and
	bias.
	* internal-fn.cc (vec_cond_mask_len_direct): Add.
	(expand_vec_cond_mask_len_optab_fn): Add.
	(direct_vec_cond_mask_len_optab_supported_p): Add.
	(internal_fn_len_index): Add VCOND_MASK_LEN.
	(internal_fn_mask_index): Ditto.
	* internal-fn.def (VCOND_MASK_LEN): New internal function.
	* match.pd: Combine unconditional unary, binary and ternary
	operations into the respective COND_LEN operations.
	* optabs.def (OPTAB_CD): Add vcond_mask_len optab.
---
 gcc/config/riscv/autovec.md     | 37 ++++++++++++++++
 gcc/config/riscv/riscv-protos.h |  5 +++
 gcc/doc/md.texi                 |  9 ++++
 gcc/gimple-match-exports.cc     | 13 ++++--
 gcc/gimple-match.h              | 78 ++++++++++++++++++++++++++++++++-
 gcc/internal-fn.cc              | 42 ++++++++++++++++++
 gcc/internal-fn.def             |  2 +
 gcc/match.pd                    | 67 ++++++++++++++++++++++++++++
 gcc/optabs.def                  |  1 +
 9 files changed, 249 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index c9a2cf44816..096012af401 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -565,6 +565,43 @@ (define_insn_and_split "vcond_mask_<mode><vm>"
   [(set_attr "type" "vector")]
 )
 
+(define_expand "vcond_mask_len_<mode>"
+  [(match_operand:V_VLS 0 "register_operand")
+    (match_operand:<VM> 3 "nonmemory_operand")
+    (match_operand:V_VLS 1 "nonmemory_operand")
+    (match_operand:V_VLS 2 "autovec_else_operand")
+    (match_operand 4 "autovec_length_operand")
+    (match_operand 5 "const_0_operand")]
+  "TARGET_VECTOR"
+  {
+    if (satisfies_constraint_Wc1 (operands[3]))
+      {
+	rtx ops[] = {operands[0], operands[2], operands[1]};
+	riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (<MODE>mode),
+					  riscv_vector::UNARY_OP_TUMA,
+					  ops, operands[4]);
+      }
+    else if (satisfies_constraint_Wc0 (operands[3]))
+      {
+	rtx ops[] = {operands[0], operands[2], operands[2]};
+	riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (<MODE>mode),
+					  riscv_vector::UNARY_OP_TUMA,
+					  ops, operands[4]);
+      }
+    else
+      {
+	/* The order of vcond_mask is opposite to pred_merge.  */
+	rtx ops[] = {operands[0], operands[2], operands[2], operands[1],
+		     operands[3]};
+	riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (<MODE>mode),
+					  riscv_vector::MERGE_OP_TUMA,
+					  ops, operands[4]);
+      }
+    DONE;
+  }
+  [(set_attr "type" "vector")]
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- [BOOL] Select based on masks
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index aa9ce4b70e4..3938c500839 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -302,6 +302,7 @@ enum insn_type : unsigned int
   UNARY_OP = __NORMAL_OP | UNARY_OP_P,
   UNARY_OP_TAMA = __MASK_OP_TAMA | UNARY_OP_P,
   UNARY_OP_TAMU = __MASK_OP_TAMU | UNARY_OP_P,
+  UNARY_OP_TUMA = __MASK_OP_TUMA | UNARY_OP_P,
   UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P,
   UNARY_OP_FRM_RMM = UNARY_OP | FRM_RMM_P,
   UNARY_OP_FRM_RUP = UNARY_OP | FRM_RUP_P,
@@ -337,6 +338,10 @@ enum insn_type : unsigned int
   /* For vmerge, no mask operand, no mask policy operand.  */
   MERGE_OP = __NORMAL_OP_TA2 | TERNARY_OP_P,
 
+  /* For vmerge with no vundef operand.  */
+  MERGE_OP_TUMA = HAS_DEST_P | HAS_MERGE_P | TERNARY_OP_P
+		  | TU_POLICY_P,
+
   /* For vm<compare>, no tail policy operand.  */
   COMPARE_OP = __NORMAL_OP_MA | TERNARY_OP_P,
   COMPARE_OP_MU = __MASK_OP_MU | TERNARY_OP_P,
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index daa318ee3da..de0757f1903 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5306,6 +5306,15 @@ no need to define this instruction pattern if the others are supported.
 Similar to @code{vcond@var{m}@var{n}} but operand 3 holds a pre-computed
 result of vector comparison.
 
+@cindex @code{vcond_mask_len_@var{m}@var{n}} instruction pattern
+@item @samp{vcond_mask_@var{m}@var{n}}
+Similar to @code{vcond_mask@var{m}@var{n}} but operand 4 holds a variable
+or constant length and operand 5 holds a bias.  If the
+element index < operand 4 + operand 5 the respective element of the result is
+computed as in @code{vcond_mask_@var{m}@var{n}}.  For element indices >=
+operand 4 + operand 5 the computation is performed as if the respective mask
+element were zero.
+
 @cindex @code{maskload@var{m}@var{n}} instruction pattern
 @item @samp{maskload@var{m}@var{n}}
 Perform a masked load of vector from memory operand 1 of mode @var{m}
diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index b36027b0bad..d6dac08cc2b 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -307,9 +307,16 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
       && VECTOR_TYPE_P (res_op->type)
       && gimple_simplified_result_is_gimple_val (res_op))
     {
-      new_op.set_op (VEC_COND_EXPR, res_op->type,
-		     res_op->cond.cond, res_op->ops[0],
-		     res_op->cond.else_value);
+      tree len = res_op->cond.len;
+      if (!len)
+	new_op.set_op (VEC_COND_EXPR, res_op->type,
+		       res_op->cond.cond, res_op->ops[0],
+		       res_op->cond.else_value);
+      else
+	new_op.set_op (IFN_VCOND_MASK_LEN, res_op->type,
+		       res_op->cond.cond, res_op->ops[0],
+		       res_op->cond.else_value,
+		       res_op->cond.len, res_op->cond.bias);
       *res_op = new_op;
       return gimple_resimplify3 (seq, res_op, valueize);
     }
diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
index bec3ff42e3e..63a9f029589 100644
--- a/gcc/gimple-match.h
+++ b/gcc/gimple-match.h
@@ -32,7 +32,8 @@ public:
   enum uncond { UNCOND };
 
   /* Build an unconditional op.  */
-  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE) {}
+  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE), len
+			       (NULL_TREE), bias (NULL_TREE) {}
   gimple_match_cond (tree, tree);
   gimple_match_cond (tree, tree, tree, tree);
 
@@ -56,7 +57,8 @@ public:
 
 inline
 gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in)
-  : cond (cond_in), else_value (else_value_in)
+  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
+    bias (NULL_TREE)
 {
 }
 
@@ -92,6 +94,10 @@ public:
 		   code_helper, tree, tree, tree, tree, tree);
   gimple_match_op (const gimple_match_cond &,
 		   code_helper, tree, tree, tree, tree, tree, tree);
+  gimple_match_op (const gimple_match_cond &,
+		   code_helper, tree, tree, tree, tree, tree, tree, tree);
+  gimple_match_op (const gimple_match_cond &,
+		   code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
 
   void set_op (code_helper, tree, unsigned int);
   void set_op (code_helper, tree, tree);
@@ -100,6 +106,8 @@ public:
   void set_op (code_helper, tree, tree, tree, tree, bool);
   void set_op (code_helper, tree, tree, tree, tree, tree);
   void set_op (code_helper, tree, tree, tree, tree, tree, tree);
+  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree);
+  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
   void set_value (tree);
 
   tree op_or_null (unsigned int) const;
@@ -212,6 +220,39 @@ gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
   ops[4] = op4;
 }
 
+inline
+gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
+				  code_helper code_in, tree type_in,
+				  tree op0, tree op1, tree op2, tree op3,
+				  tree op4, tree op5)
+  : cond (cond_in), code (code_in), type (type_in), reverse (false),
+    num_ops (6)
+{
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+}
+
+inline
+gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
+				  code_helper code_in, tree type_in,
+				  tree op0, tree op1, tree op2, tree op3,
+				  tree op4, tree op5, tree op6)
+  : cond (cond_in), code (code_in), type (type_in), reverse (false),
+    num_ops (7)
+{
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+  ops[6] = op6;
+}
+
 /* Change the operation performed to CODE_IN, the type of the result to
    TYPE_IN, and the number of operands to NUM_OPS_IN.  The caller needs
    to set the operands itself.  */
@@ -299,6 +340,39 @@ gimple_match_op::set_op (code_helper code_in, tree type_in,
   ops[4] = op4;
 }
 
+inline void
+gimple_match_op::set_op (code_helper code_in, tree type_in,
+			 tree op0, tree op1, tree op2, tree op3, tree op4,
+			 tree op5)
+{
+  code = code_in;
+  type = type_in;
+  num_ops = 6;
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+}
+
+inline void
+gimple_match_op::set_op (code_helper code_in, tree type_in,
+			 tree op0, tree op1, tree op2, tree op3, tree op4,
+			 tree op5, tree op6)
+{
+  code = code_in;
+  type = type_in;
+  num_ops = 7;
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+  ops[6] = op6;
+}
+
 /* Set the "operation" to be the single value VALUE, such as a constant
    or SSA_NAME.  */
 
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index f196064c195..318756b6992 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -170,6 +170,7 @@ init_internal_fns ()
 #define store_lanes_direct { 0, 0, false }
 #define mask_store_lanes_direct { 0, 0, false }
 #define vec_cond_mask_direct { 1, 0, false }
+#define vec_cond_mask_len_direct { 1, 1, false }
 #define vec_cond_direct { 2, 0, false }
 #define scatter_store_direct { 3, 1, false }
 #define len_store_direct { 3, 3, false }
@@ -3129,6 +3130,39 @@ expand_vec_cond_mask_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
     emit_move_insn (target, ops[0].value);
 }
 
+static void
+expand_vec_cond_mask_len_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  class expand_operand ops[6];
+
+  tree lhs = gimple_call_lhs (stmt);
+  tree op1 = gimple_call_arg (stmt, 1);
+  tree op2 = gimple_call_arg (stmt, 2);
+  tree vec_cond_type = TREE_TYPE (lhs);
+
+  machine_mode mode = TYPE_MODE (vec_cond_type);
+  enum insn_code icode = direct_optab_handler (optab, mode);
+  rtx rtx_op1, rtx_op2;
+
+  gcc_assert (icode != CODE_FOR_nothing);
+
+  rtx_op1 = expand_normal (op1);
+  rtx_op2 = expand_normal (op2);
+
+  rtx_op1 = force_reg (mode, rtx_op1);
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_op1, mode);
+  create_input_operand (&ops[2], rtx_op2, mode);
+
+  int opno = add_mask_and_len_args (ops, 3, stmt);
+  expand_insn (icode, opno, ops);
+
+  if (!rtx_equal_p (ops[0].value, target))
+    emit_move_insn (target, ops[0].value);
+}
+
 /* Expand VEC_SET internal functions.  */
 
 static void
@@ -3931,6 +3965,9 @@ expand_convert_optab_fn (internal_fn fn, gcall *stmt, convert_optab optab,
 #define expand_vec_extract_optab_fn(FN, STMT, OPTAB) \
   expand_convert_optab_fn (FN, STMT, OPTAB, 2)
 
+#define expand_vec_cond_mask_len_optab_fn(FN, STMT, OPTAB) \
+  expand_vec_cond_mask_len_optab_fn (FN, STMT, OPTAB)
+
 /* RETURN_TYPE and ARGS are a return type and argument list that are
    in principle compatible with FN (which satisfies direct_internal_fn_p).
    Return the types that should be used to determine whether the
@@ -4022,6 +4059,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_vec_cond_mask_optab_supported_p convert_optab_supported_p
+#define direct_vec_cond_mask_len_optab_supported_p direct_optab_supported_p
 #define direct_vec_cond_optab_supported_p convert_optab_supported_p
 #define direct_scatter_store_optab_supported_p convert_optab_supported_p
 #define direct_len_store_optab_supported_p direct_optab_supported_p
@@ -4694,6 +4732,7 @@ internal_fn_len_index (internal_fn fn)
     case IFN_MASK_LEN_STORE:
     case IFN_MASK_LEN_LOAD_LANES:
     case IFN_MASK_LEN_STORE_LANES:
+    case IFN_VCOND_MASK_LEN:
       return 3;
 
     default:
@@ -4783,6 +4822,9 @@ internal_fn_mask_index (internal_fn fn)
     case IFN_MASK_LEN_SCATTER_STORE:
       return 4;
 
+    case IFN_VCOND_MASK_LEN:
+      return 0;
+
     default:
       return (conditional_internal_fn_code (fn) != ERROR_MARK
 	      || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index a2023ab9c3d..581cc3b5140 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -221,6 +221,8 @@ DEF_INTERNAL_OPTAB_FN (VCONDU, ECF_CONST | ECF_NOTHROW, vcondu, vec_cond)
 DEF_INTERNAL_OPTAB_FN (VCONDEQ, ECF_CONST | ECF_NOTHROW, vcondeq, vec_cond)
 DEF_INTERNAL_OPTAB_FN (VCOND_MASK, ECF_CONST | ECF_NOTHROW,
 		       vcond_mask, vec_cond_mask)
+DEF_INTERNAL_OPTAB_FN (VCOND_MASK_LEN, ECF_CONST | ECF_NOTHROW,
+		       vcond_mask_len, vec_cond_mask_len)
 
 DEF_INTERNAL_OPTAB_FN (VEC_SET, ECF_CONST | ECF_NOTHROW, vec_set, vec_set)
 DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
diff --git a/gcc/match.pd b/gcc/match.pd
index f725a685863..33532776288 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -87,6 +87,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   negate bit_not)
 (define_operator_list COND_UNARY
   IFN_COND_NEG IFN_COND_NOT)
+(define_operator_list COND_LEN_UNARY
+  IFN_COND_LEN_NEG IFN_COND_LEN_NOT)
 
 /* Binary operations and their associated IFN_COND_* function.  */
 (define_operator_list UNCOND_BINARY
@@ -103,12 +105,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   IFN_COND_FMIN IFN_COND_FMAX
   IFN_COND_AND IFN_COND_IOR IFN_COND_XOR
   IFN_COND_SHL IFN_COND_SHR)
+(define_operator_list COND_LEN_BINARY
+  IFN_COND_LEN_ADD IFN_COND_LEN_SUB
+  IFN_COND_LEN_MUL IFN_COND_LEN_DIV IFN_COND_LEN_MOD IFN_COND_LEN_RDIV
+  IFN_COND_LEN_MIN IFN_COND_LEN_MAX
+  IFN_COND_LEN_FMIN IFN_COND_LEN_FMAX
+  IFN_COND_LEN_AND IFN_COND_LEN_IOR IFN_COND_LEN_XOR
+  IFN_COND_LEN_SHL IFN_COND_LEN_SHR)
 
 /* Same for ternary operations.  */
 (define_operator_list UNCOND_TERNARY
   IFN_FMA IFN_FMS IFN_FNMA IFN_FNMS)
 (define_operator_list COND_TERNARY
   IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)
+(define_operator_list COND_LEN_TERNARY
+  IFN_COND_LEN_FMA IFN_COND_LEN_FMS IFN_COND_LEN_FNMA IFN_COND_LEN_FNMS)
 
 /* __atomic_fetch_or_*, __atomic_fetch_xor_*, __atomic_xor_fetch_*  */
 (define_operator_list ATOMIC_FETCH_OR_XOR_N
@@ -8949,6 +8960,62 @@ and,
 	&& single_use (@5))
     (view_convert (cond_op (bit_not @0) @2 @3 @4
 		  (view_convert:op_type @1)))))))
+
+/* Similar for all cond_len operations.  */
+(for uncond_op (UNCOND_UNARY)
+     cond_op (COND_LEN_UNARY)
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@3 @1)) @2 @4 @5)
+   (with { tree op_type = TREE_TYPE (@3); }
+    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+        && is_truth_type_for (op_type, TREE_TYPE (@0)))
+     (cond_op @0 @1 @2 @4 @5))))
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@3 @2)) @4 @5)
+   (with { tree op_type = TREE_TYPE (@3); }
+    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+        && is_truth_type_for (op_type, TREE_TYPE (@0)))
+     (cond_op (bit_not @0) @2 @1 @4 @5)))))
+
+(for uncond_op (UNCOND_BINARY)
+     cond_op (COND_LEN_BINARY)
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@4 @1 @2)) @3 @5 @6)
+  (with { tree op_type = TREE_TYPE (@4); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@4))
+    (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3) @5 @6)))))
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@4 @2 @3)) @5 @6)
+  (with { tree op_type = TREE_TYPE (@4); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@4))
+    (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1) @5 @6))))))
+
+(for uncond_op (UNCOND_TERNARY)
+     cond_op (COND_LEN_TERNARY)
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4 @6 @7)
+  (with { tree op_type = TREE_TYPE (@5); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@5))
+    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4) @6 @7)))))
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@5 @2 @3 @4 @6 @7)))
+  (with { tree op_type = TREE_TYPE (@5); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@5))
+    (view_convert (cond_op (bit_not @0) @2 @3 @4 (view_convert:op_type @1) @6 @7))))))
+
+/* A VCOND_MASK_LEN with a constant length is just a vec_cond for our
+   purposes.  */
+(simplify
+ (IFN_VCOND_MASK_LEN @0 @1 @2 INTEGER_CST@3 INTEGER_CST@4)
+  (vec_cond @0 @1 @2))
 #endif
 
 /* Detect cases in which a VEC_COND_EXPR effectively replaces the
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 2ccbe4197b7..8d5ceeb8710 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -282,6 +282,7 @@ OPTAB_D (cond_len_fnma_optab, "cond_len_fnma$a")
 OPTAB_D (cond_len_fnms_optab, "cond_len_fnms$a")
 OPTAB_D (cond_len_neg_optab, "cond_len_neg$a")
 OPTAB_D (cond_len_one_cmpl_optab, "cond_len_one_cmpl$a")
+OPTAB_D (vcond_mask_len_optab, "vcond_mask_len_$a")
 OPTAB_D (cmov_optab, "cmov$a6")
 OPTAB_D (cstore_optab, "cstore$a4")
 OPTAB_D (ctrap_optab, "ctrap$a4")
-- 
2.41.0


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.
  2023-10-25 19:59                             ` Robin Dapp
@ 2023-10-25 21:58                               ` Richard Sandiford
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Sandiford @ 2023-10-25 21:58 UTC (permalink / raw)
  To: Robin Dapp; +Cc: Richard Biener, gcc-patches

Robin Dapp <rdapp.gcc@gmail.com> writes:
>> At first, this seemed like an odd place to fold away the length.
>> AFAIK the length in res_op is inherited directly from the original
>> operation, and so it isn't any more redundant after the fold than
>> it was before.  But I suppose the reason for doing it here is that
>> we deliberately create IFN_COND_LEN_FOO calls that have "redundant"
>> lengths.  Doing that avoids the need to define an IFN_COND_FOO
>> equivalent of every IFN_COND_LEN_FOO optab.  Is that right?  If so,
>> I think it deserves a comment.
>
> I think, generally, what I want to cover is a more fundamental thing
> - in length-controlled targets the loop length doesn't change
> throughout a loop and what we normally do is load the right length,
> operate on the maximum length (ignoring tail elements) and store
> the right length.
>
> So, whenever the length is constant it was already determined that
> we operate on exactly this length and length masking is not needed.
> Only when the length is variable and not compile-time constant we need
> to use length masking (and therefore the vec_cond simplification becomes
> invalid).  I think we never e.g. operate on the first "half" of a
> vector, leaving the second half unchanged.  As far as I know such access
> patterns are always done with non-length, "conditional" masking.

In that case, I think we need to nail down what the semantics of
these LEN functions actually are.  There seems to be a discrepancy
between the optab documentation and the internal-fn.cc documentation.

The optab documentation says:

for (i = 0; i < ops[5] + ops[6]; i++)
  op0[i] = op1[i] ? op2[i] @var{op} op3[i] : op4[i];

which leaves trailing elements of op0 in an undefined state.
But internal-fn.cc says:

     for (int i = 0; i < NUNITS; i++)
      {
	if (i < LEN + BIAS && COND[i])
	  LHS[i] = A[i] CODE B[i];
	else
	  LHS[i] = ELSE[i];
      }

which leaves all lanes in a well-defined state.  Which one is right?

If the first one is right, then it doesn't seem to matter whether
the length is constant or variable.  We can simplify:

  IFN_COND_LEN_IOR (mask, a, 0, b, len, bias)

to:

  VEC_COND_EXPR <mask, a, b>

regardless of the values of len and bias.  We wouldn't then need a
VCOND_MASK_LEN after all.

If the second one is right, then we cannot get rid of the length
unless it is known to be equal to the number of lanes, at least
according to gimple semantics.  Any knowledge about which lanes
"exist" is only exposed in target-dependent code (presumably by
the VSETVL pass).

Thanks,
Richard

> Actually the only problematic cases I found were reduction-like loops
> where the reduction operated on full length rather than the "right" one.
> If a tail element is wrong then, obviously the reduction result is also
> wrong.  From a "loop len" point of view a reduction could have a length
> like len_store.  Then the simplification problem would go away.
>
> In the attached version I removed the hunk you mentioned but added a
> match.pd pattern where all constant-length vcond_mask_len are simplified
> to vec_cond.
>
> /* A VCOND_MASK_LEN with a constant length is just a vec_cond for
>    our purposes.  */
> (simplify
>  (IFN_VCOND_MASK_LEN @0 @1 @2 INTEGER_CST@3 INTEGER_CST@4)
>   (vec_cond @0 @1 @2))
>
> This works for all of the testsuite (and is basically the same
> thing we have been testing all along with the bogus simplification
> still in place).  Is there any way how to formalize the
> requirement?  Or am I totally wrong and this must never be done?
>
>> Any reason not to make IFN_COND_LEN_MASK a directly-mapped optab?
>> (I realise IFN_COND_MASK isn't, but that's used differently.)
>
> Right, a conversion optab is not necessary - in the expander function
> all we really do is move the condition from position 1 to 3.  Changing
> the order would mean inconsistency with vec_cond.  If that's acceptable
> I can change it and we can use expand_direct_optab_fn.  For now I kept
> the expander function but used a direct optab.
>
> Regards
>  Robin
>
> From 4f793b71184b3301087780ed500f798d69328fc9 Mon Sep 17 00:00:00 2001
> From: Robin Dapp <rdapp@ventanamicro.com>
> Date: Fri, 13 Oct 2023 10:20:35 +0200
> Subject: [PATCH v2] internal-fn: Add VCOND_MASK_LEN.
>
> In order to prevent simplification of a COND_OP with degenerate mask
> (all true or all zero) into just an OP in the presence of length
> masking this patch introduces a length-masked analog to VEC_COND_EXPR:
> IFN_VCOND_MASK_LEN.  If the to-be-simplified conditional operation has a
> length that is not the full hardware vector length a simplification now
> does not result int a VEC_COND but rather a VCOND_MASK_LEN.
>
> For cases where the masks is known to be all true or all zero the patch
> introduces new match patterns that allow combination of unconditional
> unary, binary and ternay operations with the respective conditional
> operations if the target supports it.
>
> Similarly, if the length is known to be equal to the target hardware
> length VCOND_MASK_LEN will be simplified to VEC_COND_EXPR.
>
> gcc/ChangeLog:
>
> 	* config/riscv/autovec.md (vcond_mask_len_<mode><vm>): Add
> 	expander.
> 	* config/riscv/riscv-protos.h (enum insn_type):
> 	* doc/md.texi: Add vcond_mask_len.
> 	* gimple-match-exports.cc (maybe_resimplify_conditional_op):
> 	Create VCOND_MASK_LEN when
> 	length masking.
> 	* gimple-match.h (gimple_match_op::gimple_match_op): Allow
> 	matching of 6 and 7 parameters.
> 	(gimple_match_op::set_op): Ditto.
> 	(gimple_match_op::gimple_match_op): Always initialize len and
> 	bias.
> 	* internal-fn.cc (vec_cond_mask_len_direct): Add.
> 	(expand_vec_cond_mask_len_optab_fn): Add.
> 	(direct_vec_cond_mask_len_optab_supported_p): Add.
> 	(internal_fn_len_index): Add VCOND_MASK_LEN.
> 	(internal_fn_mask_index): Ditto.
> 	* internal-fn.def (VCOND_MASK_LEN): New internal function.
> 	* match.pd: Combine unconditional unary, binary and ternary
> 	operations into the respective COND_LEN operations.
> 	* optabs.def (OPTAB_CD): Add vcond_mask_len optab.
> ---
>  gcc/config/riscv/autovec.md     | 37 ++++++++++++++++
>  gcc/config/riscv/riscv-protos.h |  5 +++
>  gcc/doc/md.texi                 |  9 ++++
>  gcc/gimple-match-exports.cc     | 13 ++++--
>  gcc/gimple-match.h              | 78 ++++++++++++++++++++++++++++++++-
>  gcc/internal-fn.cc              | 42 ++++++++++++++++++
>  gcc/internal-fn.def             |  2 +
>  gcc/match.pd                    | 67 ++++++++++++++++++++++++++++
>  gcc/optabs.def                  |  1 +
>  9 files changed, 249 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index c9a2cf44816..096012af401 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -565,6 +565,43 @@ (define_insn_and_split "vcond_mask_<mode><vm>"
>    [(set_attr "type" "vector")]
>  )
>  
> +(define_expand "vcond_mask_len_<mode>"
> +  [(match_operand:V_VLS 0 "register_operand")
> +    (match_operand:<VM> 3 "nonmemory_operand")
> +    (match_operand:V_VLS 1 "nonmemory_operand")
> +    (match_operand:V_VLS 2 "autovec_else_operand")
> +    (match_operand 4 "autovec_length_operand")
> +    (match_operand 5 "const_0_operand")]
> +  "TARGET_VECTOR"
> +  {
> +    if (satisfies_constraint_Wc1 (operands[3]))
> +      {
> +	rtx ops[] = {operands[0], operands[2], operands[1]};
> +	riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (<MODE>mode),
> +					  riscv_vector::UNARY_OP_TUMA,
> +					  ops, operands[4]);
> +      }
> +    else if (satisfies_constraint_Wc0 (operands[3]))
> +      {
> +	rtx ops[] = {operands[0], operands[2], operands[2]};
> +	riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (<MODE>mode),
> +					  riscv_vector::UNARY_OP_TUMA,
> +					  ops, operands[4]);
> +      }
> +    else
> +      {
> +	/* The order of vcond_mask is opposite to pred_merge.  */
> +	rtx ops[] = {operands[0], operands[2], operands[2], operands[1],
> +		     operands[3]};
> +	riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (<MODE>mode),
> +					  riscv_vector::MERGE_OP_TUMA,
> +					  ops, operands[4]);
> +      }
> +    DONE;
> +  }
> +  [(set_attr "type" "vector")]
> +)
> +
>  ;; -------------------------------------------------------------------------
>  ;; ---- [BOOL] Select based on masks
>  ;; -------------------------------------------------------------------------
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index aa9ce4b70e4..3938c500839 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -302,6 +302,7 @@ enum insn_type : unsigned int
>    UNARY_OP = __NORMAL_OP | UNARY_OP_P,
>    UNARY_OP_TAMA = __MASK_OP_TAMA | UNARY_OP_P,
>    UNARY_OP_TAMU = __MASK_OP_TAMU | UNARY_OP_P,
> +  UNARY_OP_TUMA = __MASK_OP_TUMA | UNARY_OP_P,
>    UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P,
>    UNARY_OP_FRM_RMM = UNARY_OP | FRM_RMM_P,
>    UNARY_OP_FRM_RUP = UNARY_OP | FRM_RUP_P,
> @@ -337,6 +338,10 @@ enum insn_type : unsigned int
>    /* For vmerge, no mask operand, no mask policy operand.  */
>    MERGE_OP = __NORMAL_OP_TA2 | TERNARY_OP_P,
>  
> +  /* For vmerge with no vundef operand.  */
> +  MERGE_OP_TUMA = HAS_DEST_P | HAS_MERGE_P | TERNARY_OP_P
> +		  | TU_POLICY_P,
> +
>    /* For vm<compare>, no tail policy operand.  */
>    COMPARE_OP = __NORMAL_OP_MA | TERNARY_OP_P,
>    COMPARE_OP_MU = __MASK_OP_MU | TERNARY_OP_P,
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index daa318ee3da..de0757f1903 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5306,6 +5306,15 @@ no need to define this instruction pattern if the others are supported.
>  Similar to @code{vcond@var{m}@var{n}} but operand 3 holds a pre-computed
>  result of vector comparison.
>  
> +@cindex @code{vcond_mask_len_@var{m}@var{n}} instruction pattern
> +@item @samp{vcond_mask_@var{m}@var{n}}
> +Similar to @code{vcond_mask@var{m}@var{n}} but operand 4 holds a variable
> +or constant length and operand 5 holds a bias.  If the
> +element index < operand 4 + operand 5 the respective element of the result is
> +computed as in @code{vcond_mask_@var{m}@var{n}}.  For element indices >=
> +operand 4 + operand 5 the computation is performed as if the respective mask
> +element were zero.
> +
>  @cindex @code{maskload@var{m}@var{n}} instruction pattern
>  @item @samp{maskload@var{m}@var{n}}
>  Perform a masked load of vector from memory operand 1 of mode @var{m}
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> index b36027b0bad..d6dac08cc2b 100644
> --- a/gcc/gimple-match-exports.cc
> +++ b/gcc/gimple-match-exports.cc
> @@ -307,9 +307,16 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
>        && VECTOR_TYPE_P (res_op->type)
>        && gimple_simplified_result_is_gimple_val (res_op))
>      {
> -      new_op.set_op (VEC_COND_EXPR, res_op->type,
> -		     res_op->cond.cond, res_op->ops[0],
> -		     res_op->cond.else_value);
> +      tree len = res_op->cond.len;
> +      if (!len)
> +	new_op.set_op (VEC_COND_EXPR, res_op->type,
> +		       res_op->cond.cond, res_op->ops[0],
> +		       res_op->cond.else_value);
> +      else
> +	new_op.set_op (IFN_VCOND_MASK_LEN, res_op->type,
> +		       res_op->cond.cond, res_op->ops[0],
> +		       res_op->cond.else_value,
> +		       res_op->cond.len, res_op->cond.bias);
>        *res_op = new_op;
>        return gimple_resimplify3 (seq, res_op, valueize);
>      }
> diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
> index bec3ff42e3e..63a9f029589 100644
> --- a/gcc/gimple-match.h
> +++ b/gcc/gimple-match.h
> @@ -32,7 +32,8 @@ public:
>    enum uncond { UNCOND };
>  
>    /* Build an unconditional op.  */
> -  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE) {}
> +  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE), len
> +			       (NULL_TREE), bias (NULL_TREE) {}
>    gimple_match_cond (tree, tree);
>    gimple_match_cond (tree, tree, tree, tree);
>  
> @@ -56,7 +57,8 @@ public:
>  
>  inline
>  gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in)
> -  : cond (cond_in), else_value (else_value_in)
> +  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
> +    bias (NULL_TREE)
>  {
>  }
>  
> @@ -92,6 +94,10 @@ public:
>  		   code_helper, tree, tree, tree, tree, tree);
>    gimple_match_op (const gimple_match_cond &,
>  		   code_helper, tree, tree, tree, tree, tree, tree);
> +  gimple_match_op (const gimple_match_cond &,
> +		   code_helper, tree, tree, tree, tree, tree, tree, tree);
> +  gimple_match_op (const gimple_match_cond &,
> +		   code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
>  
>    void set_op (code_helper, tree, unsigned int);
>    void set_op (code_helper, tree, tree);
> @@ -100,6 +106,8 @@ public:
>    void set_op (code_helper, tree, tree, tree, tree, bool);
>    void set_op (code_helper, tree, tree, tree, tree, tree);
>    void set_op (code_helper, tree, tree, tree, tree, tree, tree);
> +  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree);
> +  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
>    void set_value (tree);
>  
>    tree op_or_null (unsigned int) const;
> @@ -212,6 +220,39 @@ gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
>    ops[4] = op4;
>  }
>  
> +inline
> +gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
> +				  code_helper code_in, tree type_in,
> +				  tree op0, tree op1, tree op2, tree op3,
> +				  tree op4, tree op5)
> +  : cond (cond_in), code (code_in), type (type_in), reverse (false),
> +    num_ops (6)
> +{
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +}
> +
> +inline
> +gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
> +				  code_helper code_in, tree type_in,
> +				  tree op0, tree op1, tree op2, tree op3,
> +				  tree op4, tree op5, tree op6)
> +  : cond (cond_in), code (code_in), type (type_in), reverse (false),
> +    num_ops (7)
> +{
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +  ops[6] = op6;
> +}
> +
>  /* Change the operation performed to CODE_IN, the type of the result to
>     TYPE_IN, and the number of operands to NUM_OPS_IN.  The caller needs
>     to set the operands itself.  */
> @@ -299,6 +340,39 @@ gimple_match_op::set_op (code_helper code_in, tree type_in,
>    ops[4] = op4;
>  }
>  
> +inline void
> +gimple_match_op::set_op (code_helper code_in, tree type_in,
> +			 tree op0, tree op1, tree op2, tree op3, tree op4,
> +			 tree op5)
> +{
> +  code = code_in;
> +  type = type_in;
> +  num_ops = 6;
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +}
> +
> +inline void
> +gimple_match_op::set_op (code_helper code_in, tree type_in,
> +			 tree op0, tree op1, tree op2, tree op3, tree op4,
> +			 tree op5, tree op6)
> +{
> +  code = code_in;
> +  type = type_in;
> +  num_ops = 7;
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +  ops[6] = op6;
> +}
> +
>  /* Set the "operation" to be the single value VALUE, such as a constant
>     or SSA_NAME.  */
>  
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index f196064c195..318756b6992 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -170,6 +170,7 @@ init_internal_fns ()
>  #define store_lanes_direct { 0, 0, false }
>  #define mask_store_lanes_direct { 0, 0, false }
>  #define vec_cond_mask_direct { 1, 0, false }
> +#define vec_cond_mask_len_direct { 1, 1, false }
>  #define vec_cond_direct { 2, 0, false }
>  #define scatter_store_direct { 3, 1, false }
>  #define len_store_direct { 3, 3, false }
> @@ -3129,6 +3130,39 @@ expand_vec_cond_mask_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>      emit_move_insn (target, ops[0].value);
>  }
>  
> +static void
> +expand_vec_cond_mask_len_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +{
> +  class expand_operand ops[6];
> +
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree op1 = gimple_call_arg (stmt, 1);
> +  tree op2 = gimple_call_arg (stmt, 2);
> +  tree vec_cond_type = TREE_TYPE (lhs);
> +
> +  machine_mode mode = TYPE_MODE (vec_cond_type);
> +  enum insn_code icode = direct_optab_handler (optab, mode);
> +  rtx rtx_op1, rtx_op2;
> +
> +  gcc_assert (icode != CODE_FOR_nothing);
> +
> +  rtx_op1 = expand_normal (op1);
> +  rtx_op2 = expand_normal (op2);
> +
> +  rtx_op1 = force_reg (mode, rtx_op1);
> +
> +  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> +  create_output_operand (&ops[0], target, mode);
> +  create_input_operand (&ops[1], rtx_op1, mode);
> +  create_input_operand (&ops[2], rtx_op2, mode);
> +
> +  int opno = add_mask_and_len_args (ops, 3, stmt);
> +  expand_insn (icode, opno, ops);
> +
> +  if (!rtx_equal_p (ops[0].value, target))
> +    emit_move_insn (target, ops[0].value);
> +}
> +
>  /* Expand VEC_SET internal functions.  */
>  
>  static void
> @@ -3931,6 +3965,9 @@ expand_convert_optab_fn (internal_fn fn, gcall *stmt, convert_optab optab,
>  #define expand_vec_extract_optab_fn(FN, STMT, OPTAB) \
>    expand_convert_optab_fn (FN, STMT, OPTAB, 2)
>  
> +#define expand_vec_cond_mask_len_optab_fn(FN, STMT, OPTAB) \
> +  expand_vec_cond_mask_len_optab_fn (FN, STMT, OPTAB)
> +
>  /* RETURN_TYPE and ARGS are a return type and argument list that are
>     in principle compatible with FN (which satisfies direct_internal_fn_p).
>     Return the types that should be used to determine whether the
> @@ -4022,6 +4059,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>  #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
>  #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
>  #define direct_vec_cond_mask_optab_supported_p convert_optab_supported_p
> +#define direct_vec_cond_mask_len_optab_supported_p direct_optab_supported_p
>  #define direct_vec_cond_optab_supported_p convert_optab_supported_p
>  #define direct_scatter_store_optab_supported_p convert_optab_supported_p
>  #define direct_len_store_optab_supported_p direct_optab_supported_p
> @@ -4694,6 +4732,7 @@ internal_fn_len_index (internal_fn fn)
>      case IFN_MASK_LEN_STORE:
>      case IFN_MASK_LEN_LOAD_LANES:
>      case IFN_MASK_LEN_STORE_LANES:
> +    case IFN_VCOND_MASK_LEN:
>        return 3;
>  
>      default:
> @@ -4783,6 +4822,9 @@ internal_fn_mask_index (internal_fn fn)
>      case IFN_MASK_LEN_SCATTER_STORE:
>        return 4;
>  
> +    case IFN_VCOND_MASK_LEN:
> +      return 0;
> +
>      default:
>        return (conditional_internal_fn_code (fn) != ERROR_MARK
>  	      || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index a2023ab9c3d..581cc3b5140 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -221,6 +221,8 @@ DEF_INTERNAL_OPTAB_FN (VCONDU, ECF_CONST | ECF_NOTHROW, vcondu, vec_cond)
>  DEF_INTERNAL_OPTAB_FN (VCONDEQ, ECF_CONST | ECF_NOTHROW, vcondeq, vec_cond)
>  DEF_INTERNAL_OPTAB_FN (VCOND_MASK, ECF_CONST | ECF_NOTHROW,
>  		       vcond_mask, vec_cond_mask)
> +DEF_INTERNAL_OPTAB_FN (VCOND_MASK_LEN, ECF_CONST | ECF_NOTHROW,
> +		       vcond_mask_len, vec_cond_mask_len)
>  
>  DEF_INTERNAL_OPTAB_FN (VEC_SET, ECF_CONST | ECF_NOTHROW, vec_set, vec_set)
>  DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
> diff --git a/gcc/match.pd b/gcc/match.pd
> index f725a685863..33532776288 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -87,6 +87,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    negate bit_not)
>  (define_operator_list COND_UNARY
>    IFN_COND_NEG IFN_COND_NOT)
> +(define_operator_list COND_LEN_UNARY
> +  IFN_COND_LEN_NEG IFN_COND_LEN_NOT)
>  
>  /* Binary operations and their associated IFN_COND_* function.  */
>  (define_operator_list UNCOND_BINARY
> @@ -103,12 +105,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    IFN_COND_FMIN IFN_COND_FMAX
>    IFN_COND_AND IFN_COND_IOR IFN_COND_XOR
>    IFN_COND_SHL IFN_COND_SHR)
> +(define_operator_list COND_LEN_BINARY
> +  IFN_COND_LEN_ADD IFN_COND_LEN_SUB
> +  IFN_COND_LEN_MUL IFN_COND_LEN_DIV IFN_COND_LEN_MOD IFN_COND_LEN_RDIV
> +  IFN_COND_LEN_MIN IFN_COND_LEN_MAX
> +  IFN_COND_LEN_FMIN IFN_COND_LEN_FMAX
> +  IFN_COND_LEN_AND IFN_COND_LEN_IOR IFN_COND_LEN_XOR
> +  IFN_COND_LEN_SHL IFN_COND_LEN_SHR)
>  
>  /* Same for ternary operations.  */
>  (define_operator_list UNCOND_TERNARY
>    IFN_FMA IFN_FMS IFN_FNMA IFN_FNMS)
>  (define_operator_list COND_TERNARY
>    IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)
> +(define_operator_list COND_LEN_TERNARY
> +  IFN_COND_LEN_FMA IFN_COND_LEN_FMS IFN_COND_LEN_FNMA IFN_COND_LEN_FNMS)
>  
>  /* __atomic_fetch_or_*, __atomic_fetch_xor_*, __atomic_xor_fetch_*  */
>  (define_operator_list ATOMIC_FETCH_OR_XOR_N
> @@ -8949,6 +8960,62 @@ and,
>  	&& single_use (@5))
>      (view_convert (cond_op (bit_not @0) @2 @3 @4
>  		  (view_convert:op_type @1)))))))
> +
> +/* Similar for all cond_len operations.  */
> +(for uncond_op (UNCOND_UNARY)
> +     cond_op (COND_LEN_UNARY)
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@3 @1)) @2 @4 @5)
> +   (with { tree op_type = TREE_TYPE (@3); }
> +    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +        && is_truth_type_for (op_type, TREE_TYPE (@0)))
> +     (cond_op @0 @1 @2 @4 @5))))
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@3 @2)) @4 @5)
> +   (with { tree op_type = TREE_TYPE (@3); }
> +    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +        && is_truth_type_for (op_type, TREE_TYPE (@0)))
> +     (cond_op (bit_not @0) @2 @1 @4 @5)))))
> +
> +(for uncond_op (UNCOND_BINARY)
> +     cond_op (COND_LEN_BINARY)
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@4 @1 @2)) @3 @5 @6)
> +  (with { tree op_type = TREE_TYPE (@4); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@4))
> +    (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3) @5 @6)))))
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@4 @2 @3)) @5 @6)
> +  (with { tree op_type = TREE_TYPE (@4); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@4))
> +    (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1) @5 @6))))))
> +
> +(for uncond_op (UNCOND_TERNARY)
> +     cond_op (COND_LEN_TERNARY)
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4 @6 @7)
> +  (with { tree op_type = TREE_TYPE (@5); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@5))
> +    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4) @6 @7)))))
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@5 @2 @3 @4 @6 @7)))
> +  (with { tree op_type = TREE_TYPE (@5); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@5))
> +    (view_convert (cond_op (bit_not @0) @2 @3 @4 (view_convert:op_type @1) @6 @7))))))
> +
> +/* A VCOND_MASK_LEN with a constant length is just a vec_cond for our
> +   purposes.  */
> +(simplify
> + (IFN_VCOND_MASK_LEN @0 @1 @2 INTEGER_CST@3 INTEGER_CST@4)
> +  (vec_cond @0 @1 @2))
>  #endif
>  
>  /* Detect cases in which a VEC_COND_EXPR effectively replaces the
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 2ccbe4197b7..8d5ceeb8710 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -282,6 +282,7 @@ OPTAB_D (cond_len_fnma_optab, "cond_len_fnma$a")
>  OPTAB_D (cond_len_fnms_optab, "cond_len_fnms$a")
>  OPTAB_D (cond_len_neg_optab, "cond_len_neg$a")
>  OPTAB_D (cond_len_one_cmpl_optab, "cond_len_one_cmpl$a")
> +OPTAB_D (vcond_mask_len_optab, "vcond_mask_len_$a")
>  OPTAB_D (cmov_optab, "cmov$a6")
>  OPTAB_D (cstore_optab, "cstore$a4")
>  OPTAB_D (ctrap_optab, "ctrap$a4")

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.
  2023-11-05 20:28                     ` Richard Sandiford
@ 2023-11-06  7:22                       ` Richard Biener
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Biener @ 2023-11-06  7:22 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Robin Dapp, gcc-patches, juzhe.zhong

On Sun, 5 Nov 2023, Richard Sandiford wrote:

> Robin Dapp <rdapp.gcc@gmail.com> writes:
> >> Ah, OK.  IMO it's better to keep the optab operands the same as the IFN
> >> operands, even if that makes things inconsistent with vcond_mask.
> >> vcond_mask isn't really a good example to follow, since the operand
> >> order is not only inconsistent with the IFN, it's also inconsistent
> >> with the natural if_then_else order.
> >
> > v4 attached with that changed,  match.pd patterns interleaved as well
> > as scratch-handling added and VLS modes removed.  Lehua has since pushed
> > another patch that extends gimple_match_op to 6/7 operands already so
> > that could be removed as well making the patch even smaller now.
> >
> > Testsuite on riscv looks good (apart from the mentioned cond_widen...),
> > still running on aarch64 and x86.  OK if those pass?
> >
> > Regards
> >  Robin
> >
> > Subject: [PATCH v4] internal-fn: Add VCOND_MASK_LEN.
> >
> > In order to prevent simplification of a COND_OP with degenerate mask
> > (CONSTM1_RTX) into just an OP in the presence of length masking this
> > patch introduces a length-masked analog to VEC_COND_EXPR:
> > IFN_VCOND_MASK_LEN.
> >
> > It also adds new match patterns that allow the combination of
> > unconditional unary, binary and ternay operations with the
> > VCOND_MASK_LEN into a conditional operation if the target supports it.
> >
> > gcc/ChangeLog:
> >
> > 	PR tree-optimization/111760
> >
> > 	* config/riscv/autovec.md (vcond_mask_len_<mode><vm>): Add
> > 	expander.
> > 	* config/riscv/riscv-protos.h (enum insn_type): Add.
> > 	* config/riscv/riscv-v.cc (needs_fp_rounding): Add !pred_mov.
> > 	* doc/md.texi: Add vcond_mask_len.
> > 	* gimple-match-exports.cc (maybe_resimplify_conditional_op):
> > 	Create VCOND_MASK_LEN when length masking.
> > 	* gimple-match.h (gimple_match_op::gimple_match_op): Always
> > 	initialize len and bias.
> > 	* internal-fn.cc (vec_cond_mask_len_direct): Add.
> > 	(direct_vec_cond_mask_len_optab_supported_p): Add.
> > 	(internal_fn_len_index): Add VCOND_MASK_LEN.
> > 	(internal_fn_mask_index): Ditto.
> > 	* internal-fn.def (VCOND_MASK_LEN): New internal function.
> > 	* match.pd: Combine unconditional unary, binary and ternary
> > 	operations into the respective COND_LEN operations.
> > 	* optabs.def (OPTAB_D): Add vcond_mask_len optab.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* gcc.dg/vect/vect-cond-arith-2.c: No vect cost model for
> > 	riscv_v.
> > ---
> >  gcc/config/riscv/autovec.md                   | 26 ++++++++++
> >  gcc/config/riscv/riscv-protos.h               |  3 ++
> >  gcc/config/riscv/riscv-v.cc                   |  3 +-
> >  gcc/doc/md.texi                               |  9 ++++
> >  gcc/gimple-match-exports.cc                   | 13 +++--
> >  gcc/gimple-match.h                            |  6 ++-
> >  gcc/internal-fn.cc                            |  5 ++
> >  gcc/internal-fn.def                           |  2 +
> >  gcc/match.pd                                  | 51 +++++++++++++++++++
> >  gcc/optabs.def                                |  1 +
> >  gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c |  1 +
> >  11 files changed, 114 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> > index cc4c9596bbf..0a5e4ccb54e 100644
> > --- a/gcc/config/riscv/autovec.md
> > +++ b/gcc/config/riscv/autovec.md
> > @@ -565,6 +565,32 @@ (define_insn_and_split "vcond_mask_<mode><vm>"
> >    [(set_attr "type" "vector")]
> >  )
> >  
> > +(define_expand "vcond_mask_len_<mode>"
> > +  [(match_operand:V 0 "register_operand")
> > +    (match_operand:<VM> 1 "nonmemory_operand")
> > +    (match_operand:V 2 "nonmemory_operand")
> > +    (match_operand:V 3 "autovec_else_operand")
> > +    (match_operand 4 "autovec_length_operand")
> > +    (match_operand 5 "const_0_operand")]
> > +  "TARGET_VECTOR"
> > +  {
> > +    if (satisfies_constraint_Wc1 (operands[1]))
> > +      riscv_vector::expand_cond_len_unop (code_for_pred_mov (<MODE>mode),
> > +					  operands);
> > +    else
> > +      {
> > +	/* The order of then and else is opposite to pred_merge.  */
> > +	rtx ops[] = {operands[0], operands[3], operands[3], operands[2],
> > +		     operands[1]};
> > +	riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (<MODE>mode),
> > +					  riscv_vector::MERGE_OP_TU,
> > +					  ops, operands[4]);
> > +      }
> > +    DONE;
> > +  }
> > +  [(set_attr "type" "vector")]
> > +)
> > +
> >  ;; -------------------------------------------------------------------------
> >  ;; ---- [BOOL] Select based on masks
> >  ;; -------------------------------------------------------------------------
> > diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> > index a1be731c28e..0d0ee5effea 100644
> > --- a/gcc/config/riscv/riscv-protos.h
> > +++ b/gcc/config/riscv/riscv-protos.h
> > @@ -359,6 +359,9 @@ enum insn_type : unsigned int
> >    /* For vmerge, no mask operand, no mask policy operand.  */
> >    MERGE_OP = __NORMAL_OP_TA2 | TERNARY_OP_P,
> >  
> > +  /* For vmerge with TU policy.  */
> > +  MERGE_OP_TU = HAS_DEST_P | HAS_MERGE_P | TERNARY_OP_P | TU_POLICY_P,
> > +
> >    /* For vm<compare>, no tail policy operand.  */
> >    COMPARE_OP = __NORMAL_OP_MA | TERNARY_OP_P,
> >    COMPARE_OP_MU = __MASK_OP_MU | TERNARY_OP_P,
> > diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> > index b489ce08775..d2dde1897c4 100644
> > --- a/gcc/config/riscv/riscv-v.cc
> > +++ b/gcc/config/riscv/riscv-v.cc
> > @@ -3214,7 +3214,8 @@ needs_fp_rounding (unsigned icode, machine_mode mode)
> >  	 && icode != maybe_code_for_pred_widen (FLOAT, mode)
> >  	 && icode != maybe_code_for_pred_widen (UNSIGNED_FLOAT, mode)
> >  	 /* vfsgnj */
> > -	 && icode != maybe_code_for_pred (UNSPEC_VCOPYSIGN, mode);
> > +	 && icode != maybe_code_for_pred (UNSPEC_VCOPYSIGN, mode)
> > +	 && icode != maybe_code_for_pred_mov (mode);
> >  }
> >  
> >  /* Subroutine to expand COND_LEN_* patterns.  */
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index fab2513105a..10f971749bc 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5306,6 +5306,15 @@ no need to define this instruction pattern if the others are supported.
> >  Similar to @code{vcond@var{m}@var{n}} but operand 3 holds a pre-computed
> >  result of vector comparison.
> >  
> > +@cindex @code{vcond_mask_len_@var{m}@var{n}} instruction pattern
> > +@item @samp{vcond_mask_@var{m}@var{n}}
> > +Similar to @code{vcond_mask@var{m}@var{n}} but operand 4 holds a variable
> > +or constant length and operand 5 holds a bias.  If the
> > +element index < operand 4 + operand 5 the respective element of the result is
> > +computed as in @code{vcond_mask_@var{m}@var{n}}.  For element indices >=
> > +operand 4 + operand 5 the computation is performed as if the respective mask
> > +element were zero.
> > +
> 
> There is no computation here, it's just a selection between two values.
> We should also mention the different operand order.  How about:
> 
> ----------------
> Set each element of operand 0 to the corresponding element of operand 2
> or operand 3.  Choose operand 2 if both the element index is less than
> operand 4 plus operand 5 and the corresponding element of operand 1
> is nonzero:
> 
> @smallexample
> for (i = 0; i < GET_MODE_NUNITS (@var{m}); i++)
>   op0[i] = i < op4 + op5 && op1[i] ? op2[i] : op3[i];
> @end smallexample
> 
> Operands 0, 2 and 3 have mode @var{m}.  Operand 1 has mode @var{n}.
> Operands 4 and 5 have a target-dependent scalar integer mode.
> ----------------
> 
> OK for the non-match.pd target-independent parts with that change.

OK for the rest as well.

Richard.

> Thanks,
> Richard
> 
> >  @cindex @code{maskload@var{m}@var{n}} instruction pattern
> >  @item @samp{maskload@var{m}@var{n}}
> >  Perform a masked load of vector from memory operand 1 of mode @var{m}
> > diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> > index b36027b0bad..d6dac08cc2b 100644
> > --- a/gcc/gimple-match-exports.cc
> > +++ b/gcc/gimple-match-exports.cc
> > @@ -307,9 +307,16 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
> >        && VECTOR_TYPE_P (res_op->type)
> >        && gimple_simplified_result_is_gimple_val (res_op))
> >      {
> > -      new_op.set_op (VEC_COND_EXPR, res_op->type,
> > -		     res_op->cond.cond, res_op->ops[0],
> > -		     res_op->cond.else_value);
> > +      tree len = res_op->cond.len;
> > +      if (!len)
> > +	new_op.set_op (VEC_COND_EXPR, res_op->type,
> > +		       res_op->cond.cond, res_op->ops[0],
> > +		       res_op->cond.else_value);
> > +      else
> > +	new_op.set_op (IFN_VCOND_MASK_LEN, res_op->type,
> > +		       res_op->cond.cond, res_op->ops[0],
> > +		       res_op->cond.else_value,
> > +		       res_op->cond.len, res_op->cond.bias);
> >        *res_op = new_op;
> >        return gimple_resimplify3 (seq, res_op, valueize);
> >      }
> > diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
> > index 9892c142285..63a9f029589 100644
> > --- a/gcc/gimple-match.h
> > +++ b/gcc/gimple-match.h
> > @@ -32,7 +32,8 @@ public:
> >    enum uncond { UNCOND };
> >  
> >    /* Build an unconditional op.  */
> > -  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE) {}
> > +  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE), len
> > +			       (NULL_TREE), bias (NULL_TREE) {}
> >    gimple_match_cond (tree, tree);
> >    gimple_match_cond (tree, tree, tree, tree);
> >  
> > @@ -56,7 +57,8 @@ public:
> >  
> >  inline
> >  gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in)
> > -  : cond (cond_in), else_value (else_value_in)
> > +  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
> > +    bias (NULL_TREE)
> >  {
> >  }
> >  
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index c7d3564faef..5a998e794ad 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -170,6 +170,7 @@ init_internal_fns ()
> >  #define store_lanes_direct { 0, 0, false }
> >  #define mask_store_lanes_direct { 0, 0, false }
> >  #define vec_cond_mask_direct { 1, 0, false }
> > +#define vec_cond_mask_len_direct { 1, 1, false }
> >  #define vec_cond_direct { 2, 0, false }
> >  #define scatter_store_direct { 3, 1, false }
> >  #define len_store_direct { 3, 3, false }
> > @@ -4690,6 +4691,7 @@ internal_fn_len_index (internal_fn fn)
> >      case IFN_MASK_LEN_STORE:
> >      case IFN_MASK_LEN_LOAD_LANES:
> >      case IFN_MASK_LEN_STORE_LANES:
> > +    case IFN_VCOND_MASK_LEN:
> >        return 3;
> >  
> >      default:
> > @@ -4782,6 +4784,9 @@ internal_fn_mask_index (internal_fn fn)
> >      case IFN_MASK_LEN_SCATTER_STORE:
> >        return 4;
> >  
> > +    case IFN_VCOND_MASK_LEN:
> > +      return 0;
> > +
> >      default:
> >        return (conditional_internal_fn_code (fn) != ERROR_MARK
> >  	      || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index a2023ab9c3d..7f0e3759615 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -221,6 +221,8 @@ DEF_INTERNAL_OPTAB_FN (VCONDU, ECF_CONST | ECF_NOTHROW, vcondu, vec_cond)
> >  DEF_INTERNAL_OPTAB_FN (VCONDEQ, ECF_CONST | ECF_NOTHROW, vcondeq, vec_cond)
> >  DEF_INTERNAL_OPTAB_FN (VCOND_MASK, ECF_CONST | ECF_NOTHROW,
> >  		       vcond_mask, vec_cond_mask)
> > +DEF_INTERNAL_OPTAB_FN (VCOND_MASK_LEN, ECF_CONST | ECF_NOTHROW,
> > +		       vcond_mask_len, cond_len_unary)
> >  
> >  DEF_INTERNAL_OPTAB_FN (VEC_SET, ECF_CONST | ECF_NOTHROW, vec_set, vec_set)
> >  DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 424bbd02233..dbc811b2b38 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -87,6 +87,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >    negate bit_not)
> >  (define_operator_list COND_UNARY
> >    IFN_COND_NEG IFN_COND_NOT)
> > +(define_operator_list COND_LEN_UNARY
> > +  IFN_COND_LEN_NEG IFN_COND_LEN_NOT)
> >  
> >  /* Binary operations and their associated IFN_COND_* function.  */
> >  (define_operator_list UNCOND_BINARY
> > @@ -8961,6 +8963,21 @@ and,
> >          && is_truth_type_for (op_type, TREE_TYPE (@0)))
> >       (cond_op (bit_not @0) @2 @1)))))
> >  
> > +(for uncond_op (UNCOND_UNARY)
> > +     cond_op (COND_LEN_UNARY)
> > + (simplify
> > +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@3 @1)) @2 @4 @5)
> > +   (with { tree op_type = TREE_TYPE (@3); }
> > +    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> > +        && is_truth_type_for (op_type, TREE_TYPE (@0)))
> > +     (cond_op @0 @1 @2 @4 @5))))
> > + (simplify
> > +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@3 @2)) @4 @5)
> > +   (with { tree op_type = TREE_TYPE (@3); }
> > +    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> > +        && is_truth_type_for (op_type, TREE_TYPE (@0)))
> > +     (cond_op (bit_not @0) @2 @1 @4 @5)))))
> > +
> >  /* `(a ? -1 : 0) ^ b` can be converted into a conditional not.  */
> >  (simplify
> >   (bit_xor:c (vec_cond @0 uniform_integer_cst_p@1 uniform_integer_cst_p@2) @3)
> > @@ -9007,6 +9024,23 @@ and,
> >  	&& single_use (@4))
> >      (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1)))))))
> >  
> > +(for uncond_op (UNCOND_BINARY)
> > +     cond_op (COND_LEN_BINARY)
> > + (simplify
> > +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@4 @1 @2)) @3 @5 @6)
> > +  (with { tree op_type = TREE_TYPE (@4); }
> > +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> > +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> > +	&& single_use (@4))
> > +    (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3) @5 @6)))))
> > + (simplify
> > +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@4 @2 @3)) @5 @6)
> > +  (with { tree op_type = TREE_TYPE (@4); }
> > +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> > +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> > +	&& single_use (@4))
> > +    (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1) @5 @6))))))
> > +
> >  /* Same for ternary operations.  */
> >  (for uncond_op (UNCOND_TERNARY)
> >       cond_op (COND_TERNARY)
> > @@ -9025,6 +9059,23 @@ and,
> >  	&& single_use (@5))
> >      (view_convert (cond_op (bit_not @0) @2 @3 @4
> >  		  (view_convert:op_type @1)))))))
> > +
> > +(for uncond_op (UNCOND_TERNARY)
> > +     cond_op (COND_LEN_TERNARY)
> > + (simplify
> > +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4 @6 @7)
> > +  (with { tree op_type = TREE_TYPE (@5); }
> > +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> > +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> > +	&& single_use (@5))
> > +    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4) @6 @7)))))
> > + (simplify
> > +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@5 @2 @3 @4 @6 @7)))
> > +  (with { tree op_type = TREE_TYPE (@5); }
> > +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> > +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> > +	&& single_use (@5))
> > +    (view_convert (cond_op (bit_not @0) @2 @3 @4 (view_convert:op_type @1) @6 @7))))))
> >  #endif
> >  
> >  /* Detect cases in which a VEC_COND_EXPR effectively replaces the
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index 2ccbe4197b7..8d5ceeb8710 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -282,6 +282,7 @@ OPTAB_D (cond_len_fnma_optab, "cond_len_fnma$a")
> >  OPTAB_D (cond_len_fnms_optab, "cond_len_fnms$a")
> >  OPTAB_D (cond_len_neg_optab, "cond_len_neg$a")
> >  OPTAB_D (cond_len_one_cmpl_optab, "cond_len_one_cmpl$a")
> > +OPTAB_D (vcond_mask_len_optab, "vcond_mask_len_$a")
> >  OPTAB_D (cmov_optab, "cmov$a6")
> >  OPTAB_D (cstore_optab, "cstore$a4")
> >  OPTAB_D (ctrap_optab, "ctrap$a4")
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
> > index 7e165977e2b..7b3d73acb88 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
> > @@ -1,5 +1,6 @@
> >  /* { dg-do compile } */
> >  /* { dg-additional-options "-fgimple -fdump-tree-optimized -ffast-math" } */
> > +/* { dg-additional-options "-fno-vect-cost-model" { target { riscv_v } } } */
> >  
> >  double __GIMPLE (ssa, startwith("loop"))
> >  neg_xi (double *x)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.
  2023-11-03 22:02                   ` Robin Dapp
@ 2023-11-05 20:28                     ` Richard Sandiford
  2023-11-06  7:22                       ` Richard Biener
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Sandiford @ 2023-11-05 20:28 UTC (permalink / raw)
  To: Robin Dapp; +Cc: Richard Biener, gcc-patches, juzhe.zhong

Robin Dapp <rdapp.gcc@gmail.com> writes:
>> Ah, OK.  IMO it's better to keep the optab operands the same as the IFN
>> operands, even if that makes things inconsistent with vcond_mask.
>> vcond_mask isn't really a good example to follow, since the operand
>> order is not only inconsistent with the IFN, it's also inconsistent
>> with the natural if_then_else order.
>
> v4 attached with that changed,  match.pd patterns interleaved as well
> as scratch-handling added and VLS modes removed.  Lehua has since pushed
> another patch that extends gimple_match_op to 6/7 operands already so
> that could be removed as well making the patch even smaller now.
>
> Testsuite on riscv looks good (apart from the mentioned cond_widen...),
> still running on aarch64 and x86.  OK if those pass?
>
> Regards
>  Robin
>
> Subject: [PATCH v4] internal-fn: Add VCOND_MASK_LEN.
>
> In order to prevent simplification of a COND_OP with degenerate mask
> (CONSTM1_RTX) into just an OP in the presence of length masking this
> patch introduces a length-masked analog to VEC_COND_EXPR:
> IFN_VCOND_MASK_LEN.
>
> It also adds new match patterns that allow the combination of
> unconditional unary, binary and ternay operations with the
> VCOND_MASK_LEN into a conditional operation if the target supports it.
>
> gcc/ChangeLog:
>
> 	PR tree-optimization/111760
>
> 	* config/riscv/autovec.md (vcond_mask_len_<mode><vm>): Add
> 	expander.
> 	* config/riscv/riscv-protos.h (enum insn_type): Add.
> 	* config/riscv/riscv-v.cc (needs_fp_rounding): Add !pred_mov.
> 	* doc/md.texi: Add vcond_mask_len.
> 	* gimple-match-exports.cc (maybe_resimplify_conditional_op):
> 	Create VCOND_MASK_LEN when length masking.
> 	* gimple-match.h (gimple_match_op::gimple_match_op): Always
> 	initialize len and bias.
> 	* internal-fn.cc (vec_cond_mask_len_direct): Add.
> 	(direct_vec_cond_mask_len_optab_supported_p): Add.
> 	(internal_fn_len_index): Add VCOND_MASK_LEN.
> 	(internal_fn_mask_index): Ditto.
> 	* internal-fn.def (VCOND_MASK_LEN): New internal function.
> 	* match.pd: Combine unconditional unary, binary and ternary
> 	operations into the respective COND_LEN operations.
> 	* optabs.def (OPTAB_D): Add vcond_mask_len optab.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/vect/vect-cond-arith-2.c: No vect cost model for
> 	riscv_v.
> ---
>  gcc/config/riscv/autovec.md                   | 26 ++++++++++
>  gcc/config/riscv/riscv-protos.h               |  3 ++
>  gcc/config/riscv/riscv-v.cc                   |  3 +-
>  gcc/doc/md.texi                               |  9 ++++
>  gcc/gimple-match-exports.cc                   | 13 +++--
>  gcc/gimple-match.h                            |  6 ++-
>  gcc/internal-fn.cc                            |  5 ++
>  gcc/internal-fn.def                           |  2 +
>  gcc/match.pd                                  | 51 +++++++++++++++++++
>  gcc/optabs.def                                |  1 +
>  gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c |  1 +
>  11 files changed, 114 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index cc4c9596bbf..0a5e4ccb54e 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -565,6 +565,32 @@ (define_insn_and_split "vcond_mask_<mode><vm>"
>    [(set_attr "type" "vector")]
>  )
>  
> +(define_expand "vcond_mask_len_<mode>"
> +  [(match_operand:V 0 "register_operand")
> +    (match_operand:<VM> 1 "nonmemory_operand")
> +    (match_operand:V 2 "nonmemory_operand")
> +    (match_operand:V 3 "autovec_else_operand")
> +    (match_operand 4 "autovec_length_operand")
> +    (match_operand 5 "const_0_operand")]
> +  "TARGET_VECTOR"
> +  {
> +    if (satisfies_constraint_Wc1 (operands[1]))
> +      riscv_vector::expand_cond_len_unop (code_for_pred_mov (<MODE>mode),
> +					  operands);
> +    else
> +      {
> +	/* The order of then and else is opposite to pred_merge.  */
> +	rtx ops[] = {operands[0], operands[3], operands[3], operands[2],
> +		     operands[1]};
> +	riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (<MODE>mode),
> +					  riscv_vector::MERGE_OP_TU,
> +					  ops, operands[4]);
> +      }
> +    DONE;
> +  }
> +  [(set_attr "type" "vector")]
> +)
> +
>  ;; -------------------------------------------------------------------------
>  ;; ---- [BOOL] Select based on masks
>  ;; -------------------------------------------------------------------------
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index a1be731c28e..0d0ee5effea 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -359,6 +359,9 @@ enum insn_type : unsigned int
>    /* For vmerge, no mask operand, no mask policy operand.  */
>    MERGE_OP = __NORMAL_OP_TA2 | TERNARY_OP_P,
>  
> +  /* For vmerge with TU policy.  */
> +  MERGE_OP_TU = HAS_DEST_P | HAS_MERGE_P | TERNARY_OP_P | TU_POLICY_P,
> +
>    /* For vm<compare>, no tail policy operand.  */
>    COMPARE_OP = __NORMAL_OP_MA | TERNARY_OP_P,
>    COMPARE_OP_MU = __MASK_OP_MU | TERNARY_OP_P,
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index b489ce08775..d2dde1897c4 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -3214,7 +3214,8 @@ needs_fp_rounding (unsigned icode, machine_mode mode)
>  	 && icode != maybe_code_for_pred_widen (FLOAT, mode)
>  	 && icode != maybe_code_for_pred_widen (UNSIGNED_FLOAT, mode)
>  	 /* vfsgnj */
> -	 && icode != maybe_code_for_pred (UNSPEC_VCOPYSIGN, mode);
> +	 && icode != maybe_code_for_pred (UNSPEC_VCOPYSIGN, mode)
> +	 && icode != maybe_code_for_pred_mov (mode);
>  }
>  
>  /* Subroutine to expand COND_LEN_* patterns.  */
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index fab2513105a..10f971749bc 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5306,6 +5306,15 @@ no need to define this instruction pattern if the others are supported.
>  Similar to @code{vcond@var{m}@var{n}} but operand 3 holds a pre-computed
>  result of vector comparison.
>  
> +@cindex @code{vcond_mask_len_@var{m}@var{n}} instruction pattern
> +@item @samp{vcond_mask_@var{m}@var{n}}
> +Similar to @code{vcond_mask@var{m}@var{n}} but operand 4 holds a variable
> +or constant length and operand 5 holds a bias.  If the
> +element index < operand 4 + operand 5 the respective element of the result is
> +computed as in @code{vcond_mask_@var{m}@var{n}}.  For element indices >=
> +operand 4 + operand 5 the computation is performed as if the respective mask
> +element were zero.
> +

There is no computation here, it's just a selection between two values.
We should also mention the different operand order.  How about:

----------------
Set each element of operand 0 to the corresponding element of operand 2
or operand 3.  Choose operand 2 if both the element index is less than
operand 4 plus operand 5 and the corresponding element of operand 1
is nonzero:

@smallexample
for (i = 0; i < GET_MODE_NUNITS (@var{m}); i++)
  op0[i] = i < op4 + op5 && op1[i] ? op2[i] : op3[i];
@end smallexample

Operands 0, 2 and 3 have mode @var{m}.  Operand 1 has mode @var{n}.
Operands 4 and 5 have a target-dependent scalar integer mode.
----------------

OK for the non-match.pd target-independent parts with that change.

Thanks,
Richard

>  @cindex @code{maskload@var{m}@var{n}} instruction pattern
>  @item @samp{maskload@var{m}@var{n}}
>  Perform a masked load of vector from memory operand 1 of mode @var{m}
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> index b36027b0bad..d6dac08cc2b 100644
> --- a/gcc/gimple-match-exports.cc
> +++ b/gcc/gimple-match-exports.cc
> @@ -307,9 +307,16 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
>        && VECTOR_TYPE_P (res_op->type)
>        && gimple_simplified_result_is_gimple_val (res_op))
>      {
> -      new_op.set_op (VEC_COND_EXPR, res_op->type,
> -		     res_op->cond.cond, res_op->ops[0],
> -		     res_op->cond.else_value);
> +      tree len = res_op->cond.len;
> +      if (!len)
> +	new_op.set_op (VEC_COND_EXPR, res_op->type,
> +		       res_op->cond.cond, res_op->ops[0],
> +		       res_op->cond.else_value);
> +      else
> +	new_op.set_op (IFN_VCOND_MASK_LEN, res_op->type,
> +		       res_op->cond.cond, res_op->ops[0],
> +		       res_op->cond.else_value,
> +		       res_op->cond.len, res_op->cond.bias);
>        *res_op = new_op;
>        return gimple_resimplify3 (seq, res_op, valueize);
>      }
> diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
> index 9892c142285..63a9f029589 100644
> --- a/gcc/gimple-match.h
> +++ b/gcc/gimple-match.h
> @@ -32,7 +32,8 @@ public:
>    enum uncond { UNCOND };
>  
>    /* Build an unconditional op.  */
> -  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE) {}
> +  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE), len
> +			       (NULL_TREE), bias (NULL_TREE) {}
>    gimple_match_cond (tree, tree);
>    gimple_match_cond (tree, tree, tree, tree);
>  
> @@ -56,7 +57,8 @@ public:
>  
>  inline
>  gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in)
> -  : cond (cond_in), else_value (else_value_in)
> +  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
> +    bias (NULL_TREE)
>  {
>  }
>  
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index c7d3564faef..5a998e794ad 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -170,6 +170,7 @@ init_internal_fns ()
>  #define store_lanes_direct { 0, 0, false }
>  #define mask_store_lanes_direct { 0, 0, false }
>  #define vec_cond_mask_direct { 1, 0, false }
> +#define vec_cond_mask_len_direct { 1, 1, false }
>  #define vec_cond_direct { 2, 0, false }
>  #define scatter_store_direct { 3, 1, false }
>  #define len_store_direct { 3, 3, false }
> @@ -4690,6 +4691,7 @@ internal_fn_len_index (internal_fn fn)
>      case IFN_MASK_LEN_STORE:
>      case IFN_MASK_LEN_LOAD_LANES:
>      case IFN_MASK_LEN_STORE_LANES:
> +    case IFN_VCOND_MASK_LEN:
>        return 3;
>  
>      default:
> @@ -4782,6 +4784,9 @@ internal_fn_mask_index (internal_fn fn)
>      case IFN_MASK_LEN_SCATTER_STORE:
>        return 4;
>  
> +    case IFN_VCOND_MASK_LEN:
> +      return 0;
> +
>      default:
>        return (conditional_internal_fn_code (fn) != ERROR_MARK
>  	      || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index a2023ab9c3d..7f0e3759615 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -221,6 +221,8 @@ DEF_INTERNAL_OPTAB_FN (VCONDU, ECF_CONST | ECF_NOTHROW, vcondu, vec_cond)
>  DEF_INTERNAL_OPTAB_FN (VCONDEQ, ECF_CONST | ECF_NOTHROW, vcondeq, vec_cond)
>  DEF_INTERNAL_OPTAB_FN (VCOND_MASK, ECF_CONST | ECF_NOTHROW,
>  		       vcond_mask, vec_cond_mask)
> +DEF_INTERNAL_OPTAB_FN (VCOND_MASK_LEN, ECF_CONST | ECF_NOTHROW,
> +		       vcond_mask_len, cond_len_unary)
>  
>  DEF_INTERNAL_OPTAB_FN (VEC_SET, ECF_CONST | ECF_NOTHROW, vec_set, vec_set)
>  DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 424bbd02233..dbc811b2b38 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -87,6 +87,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    negate bit_not)
>  (define_operator_list COND_UNARY
>    IFN_COND_NEG IFN_COND_NOT)
> +(define_operator_list COND_LEN_UNARY
> +  IFN_COND_LEN_NEG IFN_COND_LEN_NOT)
>  
>  /* Binary operations and their associated IFN_COND_* function.  */
>  (define_operator_list UNCOND_BINARY
> @@ -8961,6 +8963,21 @@ and,
>          && is_truth_type_for (op_type, TREE_TYPE (@0)))
>       (cond_op (bit_not @0) @2 @1)))))
>  
> +(for uncond_op (UNCOND_UNARY)
> +     cond_op (COND_LEN_UNARY)
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@3 @1)) @2 @4 @5)
> +   (with { tree op_type = TREE_TYPE (@3); }
> +    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +        && is_truth_type_for (op_type, TREE_TYPE (@0)))
> +     (cond_op @0 @1 @2 @4 @5))))
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@3 @2)) @4 @5)
> +   (with { tree op_type = TREE_TYPE (@3); }
> +    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +        && is_truth_type_for (op_type, TREE_TYPE (@0)))
> +     (cond_op (bit_not @0) @2 @1 @4 @5)))))
> +
>  /* `(a ? -1 : 0) ^ b` can be converted into a conditional not.  */
>  (simplify
>   (bit_xor:c (vec_cond @0 uniform_integer_cst_p@1 uniform_integer_cst_p@2) @3)
> @@ -9007,6 +9024,23 @@ and,
>  	&& single_use (@4))
>      (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1)))))))
>  
> +(for uncond_op (UNCOND_BINARY)
> +     cond_op (COND_LEN_BINARY)
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@4 @1 @2)) @3 @5 @6)
> +  (with { tree op_type = TREE_TYPE (@4); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@4))
> +    (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3) @5 @6)))))
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@4 @2 @3)) @5 @6)
> +  (with { tree op_type = TREE_TYPE (@4); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@4))
> +    (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1) @5 @6))))))
> +
>  /* Same for ternary operations.  */
>  (for uncond_op (UNCOND_TERNARY)
>       cond_op (COND_TERNARY)
> @@ -9025,6 +9059,23 @@ and,
>  	&& single_use (@5))
>      (view_convert (cond_op (bit_not @0) @2 @3 @4
>  		  (view_convert:op_type @1)))))))
> +
> +(for uncond_op (UNCOND_TERNARY)
> +     cond_op (COND_LEN_TERNARY)
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4 @6 @7)
> +  (with { tree op_type = TREE_TYPE (@5); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@5))
> +    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4) @6 @7)))))
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@5 @2 @3 @4 @6 @7)))
> +  (with { tree op_type = TREE_TYPE (@5); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@5))
> +    (view_convert (cond_op (bit_not @0) @2 @3 @4 (view_convert:op_type @1) @6 @7))))))
>  #endif
>  
>  /* Detect cases in which a VEC_COND_EXPR effectively replaces the
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 2ccbe4197b7..8d5ceeb8710 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -282,6 +282,7 @@ OPTAB_D (cond_len_fnma_optab, "cond_len_fnma$a")
>  OPTAB_D (cond_len_fnms_optab, "cond_len_fnms$a")
>  OPTAB_D (cond_len_neg_optab, "cond_len_neg$a")
>  OPTAB_D (cond_len_one_cmpl_optab, "cond_len_one_cmpl$a")
> +OPTAB_D (vcond_mask_len_optab, "vcond_mask_len_$a")
>  OPTAB_D (cmov_optab, "cmov$a6")
>  OPTAB_D (cstore_optab, "cstore$a4")
>  OPTAB_D (ctrap_optab, "ctrap$a4")
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
> index 7e165977e2b..7b3d73acb88 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-additional-options "-fgimple -fdump-tree-optimized -ffast-math" } */
> +/* { dg-additional-options "-fno-vect-cost-model" { target { riscv_v } } } */
>  
>  double __GIMPLE (ssa, startwith("loop"))
>  neg_xi (double *x)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.
  2023-11-03  9:11                 ` Richard Sandiford
@ 2023-11-03 22:02                   ` Robin Dapp
  2023-11-05 20:28                     ` Richard Sandiford
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2023-11-03 22:02 UTC (permalink / raw)
  To: Richard Biener, gcc-patches, juzhe.zhong, richard.sandiford; +Cc: rdapp.gcc

> Ah, OK.  IMO it's better to keep the optab operands the same as the IFN
> operands, even if that makes things inconsistent with vcond_mask.
> vcond_mask isn't really a good example to follow, since the operand
> order is not only inconsistent with the IFN, it's also inconsistent
> with the natural if_then_else order.

v4 attached with that changed,  match.pd patterns interleaved as well
as scratch-handling added and VLS modes removed.  Lehua has since pushed
another patch that extends gimple_match_op to 6/7 operands already so
that could be removed as well making the patch even smaller now.

Testsuite on riscv looks good (apart from the mentioned cond_widen...),
still running on aarch64 and x86.  OK if those pass?

Regards
 Robin

Subject: [PATCH v4] internal-fn: Add VCOND_MASK_LEN.

In order to prevent simplification of a COND_OP with degenerate mask
(CONSTM1_RTX) into just an OP in the presence of length masking this
patch introduces a length-masked analog to VEC_COND_EXPR:
IFN_VCOND_MASK_LEN.

It also adds new match patterns that allow the combination of
unconditional unary, binary and ternay operations with the
VCOND_MASK_LEN into a conditional operation if the target supports it.

gcc/ChangeLog:

	PR tree-optimization/111760

	* config/riscv/autovec.md (vcond_mask_len_<mode><vm>): Add
	expander.
	* config/riscv/riscv-protos.h (enum insn_type): Add.
	* config/riscv/riscv-v.cc (needs_fp_rounding): Add !pred_mov.
	* doc/md.texi: Add vcond_mask_len.
	* gimple-match-exports.cc (maybe_resimplify_conditional_op):
	Create VCOND_MASK_LEN when length masking.
	* gimple-match.h (gimple_match_op::gimple_match_op): Always
	initialize len and bias.
	* internal-fn.cc (vec_cond_mask_len_direct): Add.
	(direct_vec_cond_mask_len_optab_supported_p): Add.
	(internal_fn_len_index): Add VCOND_MASK_LEN.
	(internal_fn_mask_index): Ditto.
	* internal-fn.def (VCOND_MASK_LEN): New internal function.
	* match.pd: Combine unconditional unary, binary and ternary
	operations into the respective COND_LEN operations.
	* optabs.def (OPTAB_D): Add vcond_mask_len optab.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-cond-arith-2.c: No vect cost model for
	riscv_v.
---
 gcc/config/riscv/autovec.md                   | 26 ++++++++++
 gcc/config/riscv/riscv-protos.h               |  3 ++
 gcc/config/riscv/riscv-v.cc                   |  3 +-
 gcc/doc/md.texi                               |  9 ++++
 gcc/gimple-match-exports.cc                   | 13 +++--
 gcc/gimple-match.h                            |  6 ++-
 gcc/internal-fn.cc                            |  5 ++
 gcc/internal-fn.def                           |  2 +
 gcc/match.pd                                  | 51 +++++++++++++++++++
 gcc/optabs.def                                |  1 +
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c |  1 +
 11 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cc4c9596bbf..0a5e4ccb54e 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -565,6 +565,32 @@ (define_insn_and_split "vcond_mask_<mode><vm>"
   [(set_attr "type" "vector")]
 )
 
+(define_expand "vcond_mask_len_<mode>"
+  [(match_operand:V 0 "register_operand")
+    (match_operand:<VM> 1 "nonmemory_operand")
+    (match_operand:V 2 "nonmemory_operand")
+    (match_operand:V 3 "autovec_else_operand")
+    (match_operand 4 "autovec_length_operand")
+    (match_operand 5 "const_0_operand")]
+  "TARGET_VECTOR"
+  {
+    if (satisfies_constraint_Wc1 (operands[1]))
+      riscv_vector::expand_cond_len_unop (code_for_pred_mov (<MODE>mode),
+					  operands);
+    else
+      {
+	/* The order of then and else is opposite to pred_merge.  */
+	rtx ops[] = {operands[0], operands[3], operands[3], operands[2],
+		     operands[1]};
+	riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (<MODE>mode),
+					  riscv_vector::MERGE_OP_TU,
+					  ops, operands[4]);
+      }
+    DONE;
+  }
+  [(set_attr "type" "vector")]
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- [BOOL] Select based on masks
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index a1be731c28e..0d0ee5effea 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -359,6 +359,9 @@ enum insn_type : unsigned int
   /* For vmerge, no mask operand, no mask policy operand.  */
   MERGE_OP = __NORMAL_OP_TA2 | TERNARY_OP_P,
 
+  /* For vmerge with TU policy.  */
+  MERGE_OP_TU = HAS_DEST_P | HAS_MERGE_P | TERNARY_OP_P | TU_POLICY_P,
+
   /* For vm<compare>, no tail policy operand.  */
   COMPARE_OP = __NORMAL_OP_MA | TERNARY_OP_P,
   COMPARE_OP_MU = __MASK_OP_MU | TERNARY_OP_P,
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index b489ce08775..d2dde1897c4 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3214,7 +3214,8 @@ needs_fp_rounding (unsigned icode, machine_mode mode)
 	 && icode != maybe_code_for_pred_widen (FLOAT, mode)
 	 && icode != maybe_code_for_pred_widen (UNSIGNED_FLOAT, mode)
 	 /* vfsgnj */
-	 && icode != maybe_code_for_pred (UNSPEC_VCOPYSIGN, mode);
+	 && icode != maybe_code_for_pred (UNSPEC_VCOPYSIGN, mode)
+	 && icode != maybe_code_for_pred_mov (mode);
 }
 
 /* Subroutine to expand COND_LEN_* patterns.  */
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index fab2513105a..10f971749bc 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5306,6 +5306,15 @@ no need to define this instruction pattern if the others are supported.
 Similar to @code{vcond@var{m}@var{n}} but operand 3 holds a pre-computed
 result of vector comparison.
 
+@cindex @code{vcond_mask_len_@var{m}@var{n}} instruction pattern
+@item @samp{vcond_mask_@var{m}@var{n}}
+Similar to @code{vcond_mask@var{m}@var{n}} but operand 4 holds a variable
+or constant length and operand 5 holds a bias.  If the
+element index < operand 4 + operand 5 the respective element of the result is
+computed as in @code{vcond_mask_@var{m}@var{n}}.  For element indices >=
+operand 4 + operand 5 the computation is performed as if the respective mask
+element were zero.
+
 @cindex @code{maskload@var{m}@var{n}} instruction pattern
 @item @samp{maskload@var{m}@var{n}}
 Perform a masked load of vector from memory operand 1 of mode @var{m}
diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index b36027b0bad..d6dac08cc2b 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -307,9 +307,16 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
       && VECTOR_TYPE_P (res_op->type)
       && gimple_simplified_result_is_gimple_val (res_op))
     {
-      new_op.set_op (VEC_COND_EXPR, res_op->type,
-		     res_op->cond.cond, res_op->ops[0],
-		     res_op->cond.else_value);
+      tree len = res_op->cond.len;
+      if (!len)
+	new_op.set_op (VEC_COND_EXPR, res_op->type,
+		       res_op->cond.cond, res_op->ops[0],
+		       res_op->cond.else_value);
+      else
+	new_op.set_op (IFN_VCOND_MASK_LEN, res_op->type,
+		       res_op->cond.cond, res_op->ops[0],
+		       res_op->cond.else_value,
+		       res_op->cond.len, res_op->cond.bias);
       *res_op = new_op;
       return gimple_resimplify3 (seq, res_op, valueize);
     }
diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
index 9892c142285..63a9f029589 100644
--- a/gcc/gimple-match.h
+++ b/gcc/gimple-match.h
@@ -32,7 +32,8 @@ public:
   enum uncond { UNCOND };
 
   /* Build an unconditional op.  */
-  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE) {}
+  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE), len
+			       (NULL_TREE), bias (NULL_TREE) {}
   gimple_match_cond (tree, tree);
   gimple_match_cond (tree, tree, tree, tree);
 
@@ -56,7 +57,8 @@ public:
 
 inline
 gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in)
-  : cond (cond_in), else_value (else_value_in)
+  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
+    bias (NULL_TREE)
 {
 }
 
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index c7d3564faef..5a998e794ad 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -170,6 +170,7 @@ init_internal_fns ()
 #define store_lanes_direct { 0, 0, false }
 #define mask_store_lanes_direct { 0, 0, false }
 #define vec_cond_mask_direct { 1, 0, false }
+#define vec_cond_mask_len_direct { 1, 1, false }
 #define vec_cond_direct { 2, 0, false }
 #define scatter_store_direct { 3, 1, false }
 #define len_store_direct { 3, 3, false }
@@ -4690,6 +4691,7 @@ internal_fn_len_index (internal_fn fn)
     case IFN_MASK_LEN_STORE:
     case IFN_MASK_LEN_LOAD_LANES:
     case IFN_MASK_LEN_STORE_LANES:
+    case IFN_VCOND_MASK_LEN:
       return 3;
 
     default:
@@ -4782,6 +4784,9 @@ internal_fn_mask_index (internal_fn fn)
     case IFN_MASK_LEN_SCATTER_STORE:
       return 4;
 
+    case IFN_VCOND_MASK_LEN:
+      return 0;
+
     default:
       return (conditional_internal_fn_code (fn) != ERROR_MARK
 	      || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index a2023ab9c3d..7f0e3759615 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -221,6 +221,8 @@ DEF_INTERNAL_OPTAB_FN (VCONDU, ECF_CONST | ECF_NOTHROW, vcondu, vec_cond)
 DEF_INTERNAL_OPTAB_FN (VCONDEQ, ECF_CONST | ECF_NOTHROW, vcondeq, vec_cond)
 DEF_INTERNAL_OPTAB_FN (VCOND_MASK, ECF_CONST | ECF_NOTHROW,
 		       vcond_mask, vec_cond_mask)
+DEF_INTERNAL_OPTAB_FN (VCOND_MASK_LEN, ECF_CONST | ECF_NOTHROW,
+		       vcond_mask_len, cond_len_unary)
 
 DEF_INTERNAL_OPTAB_FN (VEC_SET, ECF_CONST | ECF_NOTHROW, vec_set, vec_set)
 DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
diff --git a/gcc/match.pd b/gcc/match.pd
index 424bbd02233..dbc811b2b38 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -87,6 +87,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   negate bit_not)
 (define_operator_list COND_UNARY
   IFN_COND_NEG IFN_COND_NOT)
+(define_operator_list COND_LEN_UNARY
+  IFN_COND_LEN_NEG IFN_COND_LEN_NOT)
 
 /* Binary operations and their associated IFN_COND_* function.  */
 (define_operator_list UNCOND_BINARY
@@ -8961,6 +8963,21 @@ and,
         && is_truth_type_for (op_type, TREE_TYPE (@0)))
      (cond_op (bit_not @0) @2 @1)))))
 
+(for uncond_op (UNCOND_UNARY)
+     cond_op (COND_LEN_UNARY)
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@3 @1)) @2 @4 @5)
+   (with { tree op_type = TREE_TYPE (@3); }
+    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+        && is_truth_type_for (op_type, TREE_TYPE (@0)))
+     (cond_op @0 @1 @2 @4 @5))))
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@3 @2)) @4 @5)
+   (with { tree op_type = TREE_TYPE (@3); }
+    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+        && is_truth_type_for (op_type, TREE_TYPE (@0)))
+     (cond_op (bit_not @0) @2 @1 @4 @5)))))
+
 /* `(a ? -1 : 0) ^ b` can be converted into a conditional not.  */
 (simplify
  (bit_xor:c (vec_cond @0 uniform_integer_cst_p@1 uniform_integer_cst_p@2) @3)
@@ -9007,6 +9024,23 @@ and,
 	&& single_use (@4))
     (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1)))))))
 
+(for uncond_op (UNCOND_BINARY)
+     cond_op (COND_LEN_BINARY)
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@4 @1 @2)) @3 @5 @6)
+  (with { tree op_type = TREE_TYPE (@4); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@4))
+    (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3) @5 @6)))))
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@4 @2 @3)) @5 @6)
+  (with { tree op_type = TREE_TYPE (@4); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@4))
+    (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1) @5 @6))))))
+
 /* Same for ternary operations.  */
 (for uncond_op (UNCOND_TERNARY)
      cond_op (COND_TERNARY)
@@ -9025,6 +9059,23 @@ and,
 	&& single_use (@5))
     (view_convert (cond_op (bit_not @0) @2 @3 @4
 		  (view_convert:op_type @1)))))))
+
+(for uncond_op (UNCOND_TERNARY)
+     cond_op (COND_LEN_TERNARY)
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4 @6 @7)
+  (with { tree op_type = TREE_TYPE (@5); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@5))
+    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4) @6 @7)))))
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@5 @2 @3 @4 @6 @7)))
+  (with { tree op_type = TREE_TYPE (@5); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@5))
+    (view_convert (cond_op (bit_not @0) @2 @3 @4 (view_convert:op_type @1) @6 @7))))))
 #endif
 
 /* Detect cases in which a VEC_COND_EXPR effectively replaces the
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 2ccbe4197b7..8d5ceeb8710 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -282,6 +282,7 @@ OPTAB_D (cond_len_fnma_optab, "cond_len_fnma$a")
 OPTAB_D (cond_len_fnms_optab, "cond_len_fnms$a")
 OPTAB_D (cond_len_neg_optab, "cond_len_neg$a")
 OPTAB_D (cond_len_one_cmpl_optab, "cond_len_one_cmpl$a")
+OPTAB_D (vcond_mask_len_optab, "vcond_mask_len_$a")
 OPTAB_D (cmov_optab, "cmov$a6")
 OPTAB_D (cstore_optab, "cstore$a4")
 OPTAB_D (ctrap_optab, "ctrap$a4")
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
index 7e165977e2b..7b3d73acb88 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-additional-options "-fgimple -fdump-tree-optimized -ffast-math" } */
+/* { dg-additional-options "-fno-vect-cost-model" { target { riscv_v } } } */
 
 double __GIMPLE (ssa, startwith("loop"))
 neg_xi (double *x)
-- 
2.41.0



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.
  2023-11-03  9:03               ` Robin Dapp
@ 2023-11-03  9:11                 ` Richard Sandiford
  2023-11-03 22:02                   ` Robin Dapp
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Sandiford @ 2023-11-03  9:11 UTC (permalink / raw)
  To: Robin Dapp; +Cc: Richard Biener, gcc-patches, juzhe.zhong

Robin Dapp <rdapp.gcc@gmail.com> writes:
>> Could you explain why a special expansion is needed?  (Sorry if you already
>> have and I missed it, bit overloaded ATM.)  What does it do that is
>> different from what expand_fn_using_insn would do?
>
> All it does (in excess) is shuffle the arguments - vcond_mask_len has the
> mask as third operand similar to vcond_mask while vec_cond has the mask
> first.  I can swap them in the IFN already but when not swapping we will
> either be inconsistent with vec_cond or with vcond_mask.

Ah, OK.  IMO it's better to keep the optab operands the same as the IFN
operands, even if that makes things inconsistent with vcond_mask.
vcond_mask isn't really a good example to follow, since the operand
order is not only inconsistent with the IFN, it's also inconsistent
with the natural if_then_else order.

Thanks,
Richard


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.
  2023-11-02 23:49             ` Richard Sandiford
@ 2023-11-03  9:03               ` Robin Dapp
  2023-11-03  9:11                 ` Richard Sandiford
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2023-11-03  9:03 UTC (permalink / raw)
  To: Richard Biener, gcc-patches, juzhe.zhong, richard.sandiford; +Cc: rdapp.gcc

> Could you explain why a special expansion is needed?  (Sorry if you already
> have and I missed it, bit overloaded ATM.)  What does it do that is
> different from what expand_fn_using_insn would do?

All it does (in excess) is shuffle the arguments - vcond_mask_len has the
mask as third operand similar to vcond_mask while vec_cond has the mask
first.  I can swap them in the IFN already but when not swapping we will
either be inconsistent with vec_cond or with vcond_mask.

Regards
 Robin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.
  2023-11-02 13:48           ` Robin Dapp
@ 2023-11-02 23:49             ` Richard Sandiford
  2023-11-03  9:03               ` Robin Dapp
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Sandiford @ 2023-11-02 23:49 UTC (permalink / raw)
  To: Robin Dapp; +Cc: Richard Biener, gcc-patches, juzhe.zhong

Robin Dapp <rdapp.gcc@gmail.com> writes:
>> Looks reasonable overall.  The new match patterns are 1:1 the
>> same as the COND_ ones.  That's a bit awkward, but I don't see
>> a good way to "macroize" stuff further there.  Can you at least
>> interleave the COND_LEN_* ones with the other ones instead of
>> putting them all at the end?
>
> Yes, no problem.  It's supposed to be only temporary anyway (FWIW)
> as I didn't manage with the "stripping _LEN" way on the first few tries.
> Still on the todo list but unlikely to be done before stage 1 closes.
>
> I believe Richard "kind of" LGTM'ed the rest minus the spurious
> pattern (which is gone now) but there is still the direct optab change
> that he didn't comment on so I think we should wait for his remarks
> still.

Could you explain why a special expansion is needed?  (Sorry if you already
have and I missed it, bit overloaded ATM.)  What does it do that is
different from what expand_fn_using_insn would do?

Thanks,
Richard


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.
  2023-11-02 13:35         ` Richard Biener
@ 2023-11-02 13:48           ` Robin Dapp
  2023-11-02 23:49             ` Richard Sandiford
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2023-11-02 13:48 UTC (permalink / raw)
  To: Richard Biener; +Cc: rdapp.gcc, richard.sandiford, gcc-patches, juzhe.zhong

> Looks reasonable overall.  The new match patterns are 1:1 the
> same as the COND_ ones.  That's a bit awkward, but I don't see
> a good way to "macroize" stuff further there.  Can you at least
> interleave the COND_LEN_* ones with the other ones instead of
> putting them all at the end?

Yes, no problem.  It's supposed to be only temporary anyway (FWIW)
as I didn't manage with the "stripping _LEN" way on the first few tries.
Still on the todo list but unlikely to be done before stage 1 closes.

I believe Richard "kind of" LGTM'ed the rest minus the spurious
pattern (which is gone now) but there is still the direct optab change
that he didn't comment on so I think we should wait for his remarks
still.

Regards
 Robin


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.
  2023-10-26 14:02       ` Robin Dapp
  2023-10-26 14:10         ` 钟居哲
@ 2023-11-02 13:35         ` Richard Biener
  2023-11-02 13:48           ` Robin Dapp
  1 sibling, 1 reply; 31+ messages in thread
From: Richard Biener @ 2023-11-02 13:35 UTC (permalink / raw)
  To: Robin Dapp; +Cc: richard.sandiford, gcc-patches, juzhe.zhong

On Thu, 26 Oct 2023, Robin Dapp wrote:

> Ok, next try.  Now without dubious pattern and with direct optab
> but still dedicated expander function.
> 
> This will cause one riscv regression in cond_widen_reduc-2.c that
> we can deal with later.  It is just a missed optimization where
> we do not combine something that we used to because of the
> now-present length masking.
> 
> I'd also like to postpone handling vcond_mask_len simplifications
> via stripping the length and falling back to vec_cond and its fold
> patterns to a later time.  As is, this helps us avoid execution
> failures in at least five test cases.
> 
> Bootstrap et al. running on x86, aarch64 and power10.

Looks reasonable overall.  The new match patterns are 1:1 the
same as the COND_ ones.  That's a bit awkward, but I don't see
a good way to "macroize" stuff further there.  Can you at least
interleave the COND_LEN_* ones with the other ones instead of
putting them all at the end?

Thanks,
Richard.


> Regards
>  Robin
> 
> From 7acdebb5b13b71331621af08da6649fe08476fe8 Mon Sep 17 00:00:00 2001
> From: Robin Dapp <rdapp@ventanamicro.com>
> Date: Wed, 25 Oct 2023 22:19:43 +0200
> Subject: [PATCH v3] internal-fn: Add VCOND_MASK_LEN.
> 
> In order to prevent simplification of a COND_OP with degenerate mask
> (all true or all zero) into just an OP in the presence of length
> masking this patch introduces a length-masked analog to VEC_COND_EXPR:
> IFN_VCOND_MASK_LEN.
> 
> It also adds new match patterns that allow the combination of
> unconditional unary, binary and ternay operations with the
> VCOND_MASK_LEN into a conditional operation if the target supports it.
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/111760
> 
> 	* config/riscv/autovec.md (vcond_mask_len_<mode><vm>): Add
> 	expander.
> 	* config/riscv/riscv-protos.h (enum insn_type): Add.
> 	* doc/md.texi: Add vcond_mask_len.
> 	* gimple-match-exports.cc (maybe_resimplify_conditional_op):
> 	Create VCOND_MASK_LEN when
> 	length masking.
> 	* gimple-match.h (gimple_match_op::gimple_match_op): Allow
> 	matching of 6 and 7 parameters.
> 	(gimple_match_op::set_op): Ditto.
> 	(gimple_match_op::gimple_match_op): Always initialize len and
> 	bias.
> 	* internal-fn.cc (vec_cond_mask_len_direct): Add.
> 	(expand_vec_cond_mask_len_optab_fn): Add.
> 	(direct_vec_cond_mask_len_optab_supported_p): Add.
> 	(internal_fn_len_index): Add VCOND_MASK_LEN.
> 	(internal_fn_mask_index): Ditto.
> 	* internal-fn.def (VCOND_MASK_LEN): New internal function.
> 	* match.pd: Combine unconditional unary, binary and ternary
> 	operations into the respective COND_LEN operations.
> 	* optabs.def (OPTAB_D): Add vcond_mask_len optab.
> ---
>  gcc/config/riscv/autovec.md     | 37 ++++++++++++++++
>  gcc/config/riscv/riscv-protos.h |  5 +++
>  gcc/doc/md.texi                 |  9 ++++
>  gcc/gimple-match-exports.cc     | 13 ++++--
>  gcc/gimple-match.h              | 78 ++++++++++++++++++++++++++++++++-
>  gcc/internal-fn.cc              | 42 ++++++++++++++++++
>  gcc/internal-fn.def             |  2 +
>  gcc/match.pd                    | 61 ++++++++++++++++++++++++++
>  gcc/optabs.def                  |  1 +
>  9 files changed, 243 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 80910ba3cc2..dadb71c1165 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -565,6 +565,43 @@ (define_insn_and_split "vcond_mask_<mode><vm>"
>    [(set_attr "type" "vector")]
>  )
>  
> +(define_expand "vcond_mask_len_<mode>"
> +  [(match_operand:V_VLS 0 "register_operand")
> +    (match_operand:<VM> 3 "nonmemory_operand")
> +    (match_operand:V_VLS 1 "nonmemory_operand")
> +    (match_operand:V_VLS 2 "autovec_else_operand")
> +    (match_operand 4 "autovec_length_operand")
> +    (match_operand 5 "const_0_operand")]
> +  "TARGET_VECTOR"
> +  {
> +    if (satisfies_constraint_Wc1 (operands[3]))
> +      {
> +	rtx ops[] = {operands[0], operands[2], operands[1]};
> +	riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (<MODE>mode),
> +					  riscv_vector::UNARY_OP_TUMA,
> +					  ops, operands[4]);
> +      }
> +    else if (satisfies_constraint_Wc0 (operands[3]))
> +      {
> +	rtx ops[] = {operands[0], operands[2], operands[2]};
> +	riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (<MODE>mode),
> +					  riscv_vector::UNARY_OP_TUMA,
> +					  ops, operands[4]);
> +      }
> +    else
> +      {
> +	/* The order of vcond_mask is opposite to pred_merge.  */
> +	rtx ops[] = {operands[0], operands[2], operands[2], operands[1],
> +		     operands[3]};
> +	riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (<MODE>mode),
> +					  riscv_vector::MERGE_OP_TUMA,
> +					  ops, operands[4]);
> +      }
> +    DONE;
> +  }
> +  [(set_attr "type" "vector")]
> +)
> +
>  ;; -------------------------------------------------------------------------
>  ;; ---- [BOOL] Select based on masks
>  ;; -------------------------------------------------------------------------
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 668d75043ca..0a54e4ff022 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -302,6 +302,7 @@ enum insn_type : unsigned int
>    UNARY_OP = __NORMAL_OP | UNARY_OP_P,
>    UNARY_OP_TAMA = __MASK_OP_TAMA | UNARY_OP_P,
>    UNARY_OP_TAMU = __MASK_OP_TAMU | UNARY_OP_P,
> +  UNARY_OP_TUMA = __MASK_OP_TUMA | UNARY_OP_P,
>    UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P,
>    UNARY_OP_FRM_RMM = UNARY_OP | FRM_RMM_P,
>    UNARY_OP_FRM_RUP = UNARY_OP | FRM_RUP_P,
> @@ -337,6 +338,10 @@ enum insn_type : unsigned int
>    /* For vmerge, no mask operand, no mask policy operand.  */
>    MERGE_OP = __NORMAL_OP_TA2 | TERNARY_OP_P,
>  
> +  /* For vmerge with no vundef operand.  */
> +  MERGE_OP_TUMA = HAS_DEST_P | HAS_MERGE_P | TERNARY_OP_P
> +		  | TU_POLICY_P,
> +
>    /* For vm<compare>, no tail policy operand.  */
>    COMPARE_OP = __NORMAL_OP_MA | TERNARY_OP_P,
>    COMPARE_OP_MU = __MASK_OP_MU | TERNARY_OP_P,
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index daa318ee3da..de0757f1903 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5306,6 +5306,15 @@ no need to define this instruction pattern if the others are supported.
>  Similar to @code{vcond@var{m}@var{n}} but operand 3 holds a pre-computed
>  result of vector comparison.
>  
> +@cindex @code{vcond_mask_len_@var{m}@var{n}} instruction pattern
> +@item @samp{vcond_mask_@var{m}@var{n}}
> +Similar to @code{vcond_mask@var{m}@var{n}} but operand 4 holds a variable
> +or constant length and operand 5 holds a bias.  If the
> +element index < operand 4 + operand 5 the respective element of the result is
> +computed as in @code{vcond_mask_@var{m}@var{n}}.  For element indices >=
> +operand 4 + operand 5 the computation is performed as if the respective mask
> +element were zero.
> +
>  @cindex @code{maskload@var{m}@var{n}} instruction pattern
>  @item @samp{maskload@var{m}@var{n}}
>  Perform a masked load of vector from memory operand 1 of mode @var{m}
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> index b36027b0bad..d6dac08cc2b 100644
> --- a/gcc/gimple-match-exports.cc
> +++ b/gcc/gimple-match-exports.cc
> @@ -307,9 +307,16 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
>        && VECTOR_TYPE_P (res_op->type)
>        && gimple_simplified_result_is_gimple_val (res_op))
>      {
> -      new_op.set_op (VEC_COND_EXPR, res_op->type,
> -		     res_op->cond.cond, res_op->ops[0],
> -		     res_op->cond.else_value);
> +      tree len = res_op->cond.len;
> +      if (!len)
> +	new_op.set_op (VEC_COND_EXPR, res_op->type,
> +		       res_op->cond.cond, res_op->ops[0],
> +		       res_op->cond.else_value);
> +      else
> +	new_op.set_op (IFN_VCOND_MASK_LEN, res_op->type,
> +		       res_op->cond.cond, res_op->ops[0],
> +		       res_op->cond.else_value,
> +		       res_op->cond.len, res_op->cond.bias);
>        *res_op = new_op;
>        return gimple_resimplify3 (seq, res_op, valueize);
>      }
> diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
> index bec3ff42e3e..63a9f029589 100644
> --- a/gcc/gimple-match.h
> +++ b/gcc/gimple-match.h
> @@ -32,7 +32,8 @@ public:
>    enum uncond { UNCOND };
>  
>    /* Build an unconditional op.  */
> -  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE) {}
> +  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE), len
> +			       (NULL_TREE), bias (NULL_TREE) {}
>    gimple_match_cond (tree, tree);
>    gimple_match_cond (tree, tree, tree, tree);
>  
> @@ -56,7 +57,8 @@ public:
>  
>  inline
>  gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in)
> -  : cond (cond_in), else_value (else_value_in)
> +  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
> +    bias (NULL_TREE)
>  {
>  }
>  
> @@ -92,6 +94,10 @@ public:
>  		   code_helper, tree, tree, tree, tree, tree);
>    gimple_match_op (const gimple_match_cond &,
>  		   code_helper, tree, tree, tree, tree, tree, tree);
> +  gimple_match_op (const gimple_match_cond &,
> +		   code_helper, tree, tree, tree, tree, tree, tree, tree);
> +  gimple_match_op (const gimple_match_cond &,
> +		   code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
>  
>    void set_op (code_helper, tree, unsigned int);
>    void set_op (code_helper, tree, tree);
> @@ -100,6 +106,8 @@ public:
>    void set_op (code_helper, tree, tree, tree, tree, bool);
>    void set_op (code_helper, tree, tree, tree, tree, tree);
>    void set_op (code_helper, tree, tree, tree, tree, tree, tree);
> +  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree);
> +  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
>    void set_value (tree);
>  
>    tree op_or_null (unsigned int) const;
> @@ -212,6 +220,39 @@ gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
>    ops[4] = op4;
>  }
>  
> +inline
> +gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
> +				  code_helper code_in, tree type_in,
> +				  tree op0, tree op1, tree op2, tree op3,
> +				  tree op4, tree op5)
> +  : cond (cond_in), code (code_in), type (type_in), reverse (false),
> +    num_ops (6)
> +{
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +}
> +
> +inline
> +gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
> +				  code_helper code_in, tree type_in,
> +				  tree op0, tree op1, tree op2, tree op3,
> +				  tree op4, tree op5, tree op6)
> +  : cond (cond_in), code (code_in), type (type_in), reverse (false),
> +    num_ops (7)
> +{
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +  ops[6] = op6;
> +}
> +
>  /* Change the operation performed to CODE_IN, the type of the result to
>     TYPE_IN, and the number of operands to NUM_OPS_IN.  The caller needs
>     to set the operands itself.  */
> @@ -299,6 +340,39 @@ gimple_match_op::set_op (code_helper code_in, tree type_in,
>    ops[4] = op4;
>  }
>  
> +inline void
> +gimple_match_op::set_op (code_helper code_in, tree type_in,
> +			 tree op0, tree op1, tree op2, tree op3, tree op4,
> +			 tree op5)
> +{
> +  code = code_in;
> +  type = type_in;
> +  num_ops = 6;
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +}
> +
> +inline void
> +gimple_match_op::set_op (code_helper code_in, tree type_in,
> +			 tree op0, tree op1, tree op2, tree op3, tree op4,
> +			 tree op5, tree op6)
> +{
> +  code = code_in;
> +  type = type_in;
> +  num_ops = 7;
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +  ops[6] = op6;
> +}
> +
>  /* Set the "operation" to be the single value VALUE, such as a constant
>     or SSA_NAME.  */
>  
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 018175261b9..ed83fa8112e 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -170,6 +170,7 @@ init_internal_fns ()
>  #define store_lanes_direct { 0, 0, false }
>  #define mask_store_lanes_direct { 0, 0, false }
>  #define vec_cond_mask_direct { 1, 0, false }
> +#define vec_cond_mask_len_direct { 1, 1, false }
>  #define vec_cond_direct { 2, 0, false }
>  #define scatter_store_direct { 3, 1, false }
>  #define len_store_direct { 3, 3, false }
> @@ -3129,6 +3130,39 @@ expand_vec_cond_mask_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>      emit_move_insn (target, ops[0].value);
>  }
>  
> +static void
> +expand_vec_cond_mask_len_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +{
> +  class expand_operand ops[6];
> +
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree op1 = gimple_call_arg (stmt, 1);
> +  tree op2 = gimple_call_arg (stmt, 2);
> +  tree vec_cond_type = TREE_TYPE (lhs);
> +
> +  machine_mode mode = TYPE_MODE (vec_cond_type);
> +  enum insn_code icode = direct_optab_handler (optab, mode);
> +  rtx rtx_op1, rtx_op2;
> +
> +  gcc_assert (icode != CODE_FOR_nothing);
> +
> +  rtx_op1 = expand_normal (op1);
> +  rtx_op2 = expand_normal (op2);
> +
> +  rtx_op1 = force_reg (mode, rtx_op1);
> +
> +  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> +  create_output_operand (&ops[0], target, mode);
> +  create_input_operand (&ops[1], rtx_op1, mode);
> +  create_input_operand (&ops[2], rtx_op2, mode);
> +
> +  int opno = add_mask_and_len_args (ops, 3, stmt);
> +  expand_insn (icode, opno, ops);
> +
> +  if (!rtx_equal_p (ops[0].value, target))
> +    emit_move_insn (target, ops[0].value);
> +}
> +
>  /* Expand VEC_SET internal functions.  */
>  
>  static void
> @@ -3927,6 +3961,9 @@ expand_convert_optab_fn (internal_fn fn, gcall *stmt, convert_optab optab,
>  #define expand_vec_extract_optab_fn(FN, STMT, OPTAB) \
>    expand_convert_optab_fn (FN, STMT, OPTAB, 2)
>  
> +#define expand_vec_cond_mask_len_optab_fn(FN, STMT, OPTAB) \
> +  expand_vec_cond_mask_len_optab_fn (FN, STMT, OPTAB)
> +
>  /* RETURN_TYPE and ARGS are a return type and argument list that are
>     in principle compatible with FN (which satisfies direct_internal_fn_p).
>     Return the types that should be used to determine whether the
> @@ -4018,6 +4055,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>  #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
>  #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
>  #define direct_vec_cond_mask_optab_supported_p convert_optab_supported_p
> +#define direct_vec_cond_mask_len_optab_supported_p direct_optab_supported_p
>  #define direct_vec_cond_optab_supported_p convert_optab_supported_p
>  #define direct_scatter_store_optab_supported_p convert_optab_supported_p
>  #define direct_len_store_optab_supported_p direct_optab_supported_p
> @@ -4690,6 +4728,7 @@ internal_fn_len_index (internal_fn fn)
>      case IFN_MASK_LEN_STORE:
>      case IFN_MASK_LEN_LOAD_LANES:
>      case IFN_MASK_LEN_STORE_LANES:
> +    case IFN_VCOND_MASK_LEN:
>        return 3;
>  
>      default:
> @@ -4779,6 +4818,9 @@ internal_fn_mask_index (internal_fn fn)
>      case IFN_MASK_LEN_SCATTER_STORE:
>        return 4;
>  
> +    case IFN_VCOND_MASK_LEN:
> +      return 0;
> +
>      default:
>        return (conditional_internal_fn_code (fn) != ERROR_MARK
>  	      || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index a2023ab9c3d..581cc3b5140 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -221,6 +221,8 @@ DEF_INTERNAL_OPTAB_FN (VCONDU, ECF_CONST | ECF_NOTHROW, vcondu, vec_cond)
>  DEF_INTERNAL_OPTAB_FN (VCONDEQ, ECF_CONST | ECF_NOTHROW, vcondeq, vec_cond)
>  DEF_INTERNAL_OPTAB_FN (VCOND_MASK, ECF_CONST | ECF_NOTHROW,
>  		       vcond_mask, vec_cond_mask)
> +DEF_INTERNAL_OPTAB_FN (VCOND_MASK_LEN, ECF_CONST | ECF_NOTHROW,
> +		       vcond_mask_len, vec_cond_mask_len)
>  
>  DEF_INTERNAL_OPTAB_FN (VEC_SET, ECF_CONST | ECF_NOTHROW, vec_set, vec_set)
>  DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
> diff --git a/gcc/match.pd b/gcc/match.pd
> index f725a685863..0c21c29694d 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -87,6 +87,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    negate bit_not)
>  (define_operator_list COND_UNARY
>    IFN_COND_NEG IFN_COND_NOT)
> +(define_operator_list COND_LEN_UNARY
> +  IFN_COND_LEN_NEG IFN_COND_LEN_NOT)
>  
>  /* Binary operations and their associated IFN_COND_* function.  */
>  (define_operator_list UNCOND_BINARY
> @@ -103,12 +105,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    IFN_COND_FMIN IFN_COND_FMAX
>    IFN_COND_AND IFN_COND_IOR IFN_COND_XOR
>    IFN_COND_SHL IFN_COND_SHR)
> +(define_operator_list COND_LEN_BINARY
> +  IFN_COND_LEN_ADD IFN_COND_LEN_SUB
> +  IFN_COND_LEN_MUL IFN_COND_LEN_DIV IFN_COND_LEN_MOD IFN_COND_LEN_RDIV
> +  IFN_COND_LEN_MIN IFN_COND_LEN_MAX
> +  IFN_COND_LEN_FMIN IFN_COND_LEN_FMAX
> +  IFN_COND_LEN_AND IFN_COND_LEN_IOR IFN_COND_LEN_XOR
> +  IFN_COND_LEN_SHL IFN_COND_LEN_SHR)
>  
>  /* Same for ternary operations.  */
>  (define_operator_list UNCOND_TERNARY
>    IFN_FMA IFN_FMS IFN_FNMA IFN_FNMS)
>  (define_operator_list COND_TERNARY
>    IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)
> +(define_operator_list COND_LEN_TERNARY
> +  IFN_COND_LEN_FMA IFN_COND_LEN_FMS IFN_COND_LEN_FNMA IFN_COND_LEN_FNMS)
>  
>  /* __atomic_fetch_or_*, __atomic_fetch_xor_*, __atomic_xor_fetch_*  */
>  (define_operator_list ATOMIC_FETCH_OR_XOR_N
> @@ -8949,6 +8960,56 @@ and,
>  	&& single_use (@5))
>      (view_convert (cond_op (bit_not @0) @2 @3 @4
>  		  (view_convert:op_type @1)))))))
> +
> +/* Similar for all cond_len operations.  */
> +(for uncond_op (UNCOND_UNARY)
> +     cond_op (COND_LEN_UNARY)
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@3 @1)) @2 @4 @5)
> +   (with { tree op_type = TREE_TYPE (@3); }
> +    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +        && is_truth_type_for (op_type, TREE_TYPE (@0)))
> +     (cond_op @0 @1 @2 @4 @5))))
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@3 @2)) @4 @5)
> +   (with { tree op_type = TREE_TYPE (@3); }
> +    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +        && is_truth_type_for (op_type, TREE_TYPE (@0)))
> +     (cond_op (bit_not @0) @2 @1 @4 @5)))))
> +
> +(for uncond_op (UNCOND_BINARY)
> +     cond_op (COND_LEN_BINARY)
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@4 @1 @2)) @3 @5 @6)
> +  (with { tree op_type = TREE_TYPE (@4); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@4))
> +    (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3) @5 @6)))))
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@4 @2 @3)) @5 @6)
> +  (with { tree op_type = TREE_TYPE (@4); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@4))
> +    (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1) @5 @6))))))
> +
> +(for uncond_op (UNCOND_TERNARY)
> +     cond_op (COND_LEN_TERNARY)
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4 @6 @7)
> +  (with { tree op_type = TREE_TYPE (@5); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@5))
> +    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4) @6 @7)))))
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@5 @2 @3 @4 @6 @7)))
> +  (with { tree op_type = TREE_TYPE (@5); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@5))
> +    (view_convert (cond_op (bit_not @0) @2 @3 @4 (view_convert:op_type @1) @6 @7))))))
>  #endif
>  
>  /* Detect cases in which a VEC_COND_EXPR effectively replaces the
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 2ccbe4197b7..8d5ceeb8710 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -282,6 +282,7 @@ OPTAB_D (cond_len_fnma_optab, "cond_len_fnma$a")
>  OPTAB_D (cond_len_fnms_optab, "cond_len_fnms$a")
>  OPTAB_D (cond_len_neg_optab, "cond_len_neg$a")
>  OPTAB_D (cond_len_one_cmpl_optab, "cond_len_one_cmpl$a")
> +OPTAB_D (vcond_mask_len_optab, "vcond_mask_len_$a")
>  OPTAB_D (cmov_optab, "cmov$a6")
>  OPTAB_D (cstore_optab, "cstore$a4")
>  OPTAB_D (ctrap_optab, "ctrap$a4")
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.
  2023-10-26 14:10         ` 钟居哲
@ 2023-10-26 20:32           ` Robin Dapp
  0 siblings, 0 replies; 31+ messages in thread
From: Robin Dapp @ 2023-10-26 20:32 UTC (permalink / raw)
  To: 钟居哲, richard.sandiford
  Cc: rdapp.gcc, gcc-patches, rguenther

> +(define_expand "vcond_mask_len_<mode>"
> +  [(match_operand:V_VLS 0 "register_operand")
> +    (match_operand:<VM> 3 "nonmemory_operand")
> +    (match_operand:V_VLS 1 "nonmemory_operand")
> +    (match_operand:V_VLS 2 "autovec_else_operand")
> +    (match_operand 4 "autovec_length_operand")
> +    (match_operand 5 "const_0_operand")]
> 
> I think you should change V_VLS into V since we never apply partial vectorization (predicated by length)
> on VLSmodes.  VLSmodes are the modes used on GNU vector/SLP/SIMD vectorizations.

Right, thanks.  That was likely copy and paste from vcond_mask.
Changed it and re-tested but not going to send another version
before more changes are requested.

Regards
 Robin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.
  2023-10-26  8:41     ` Robin Dapp
@ 2023-10-26 14:02       ` Robin Dapp
  2023-10-26 14:10         ` 钟居哲
  2023-11-02 13:35         ` Richard Biener
  0 siblings, 2 replies; 31+ messages in thread
From: Robin Dapp @ 2023-10-26 14:02 UTC (permalink / raw)
  To: richard.sandiford; +Cc: rdapp.gcc, gcc-patches, rguenther, juzhe.zhong

Ok, next try.  Now without dubious pattern and with direct optab
but still dedicated expander function.

This will cause one riscv regression in cond_widen_reduc-2.c that
we can deal with later.  It is just a missed optimization where
we do not combine something that we used to because of the
now-present length masking.

I'd also like to postpone handling vcond_mask_len simplifications
via stripping the length and falling back to vec_cond and its fold
patterns to a later time.  As is, this helps us avoid execution
failures in at least five test cases.

Bootstrap et al. running on x86, aarch64 and power10.

Regards
 Robin

From 7acdebb5b13b71331621af08da6649fe08476fe8 Mon Sep 17 00:00:00 2001
From: Robin Dapp <rdapp@ventanamicro.com>
Date: Wed, 25 Oct 2023 22:19:43 +0200
Subject: [PATCH v3] internal-fn: Add VCOND_MASK_LEN.

In order to prevent simplification of a COND_OP with degenerate mask
(all true or all zero) into just an OP in the presence of length
masking this patch introduces a length-masked analog to VEC_COND_EXPR:
IFN_VCOND_MASK_LEN.

It also adds new match patterns that allow the combination of
unconditional unary, binary and ternay operations with the
VCOND_MASK_LEN into a conditional operation if the target supports it.

gcc/ChangeLog:

	PR tree-optimization/111760

	* config/riscv/autovec.md (vcond_mask_len_<mode><vm>): Add
	expander.
	* config/riscv/riscv-protos.h (enum insn_type): Add.
	* doc/md.texi: Add vcond_mask_len.
	* gimple-match-exports.cc (maybe_resimplify_conditional_op):
	Create VCOND_MASK_LEN when
	length masking.
	* gimple-match.h (gimple_match_op::gimple_match_op): Allow
	matching of 6 and 7 parameters.
	(gimple_match_op::set_op): Ditto.
	(gimple_match_op::gimple_match_op): Always initialize len and
	bias.
	* internal-fn.cc (vec_cond_mask_len_direct): Add.
	(expand_vec_cond_mask_len_optab_fn): Add.
	(direct_vec_cond_mask_len_optab_supported_p): Add.
	(internal_fn_len_index): Add VCOND_MASK_LEN.
	(internal_fn_mask_index): Ditto.
	* internal-fn.def (VCOND_MASK_LEN): New internal function.
	* match.pd: Combine unconditional unary, binary and ternary
	operations into the respective COND_LEN operations.
	* optabs.def (OPTAB_D): Add vcond_mask_len optab.
---
 gcc/config/riscv/autovec.md     | 37 ++++++++++++++++
 gcc/config/riscv/riscv-protos.h |  5 +++
 gcc/doc/md.texi                 |  9 ++++
 gcc/gimple-match-exports.cc     | 13 ++++--
 gcc/gimple-match.h              | 78 ++++++++++++++++++++++++++++++++-
 gcc/internal-fn.cc              | 42 ++++++++++++++++++
 gcc/internal-fn.def             |  2 +
 gcc/match.pd                    | 61 ++++++++++++++++++++++++++
 gcc/optabs.def                  |  1 +
 9 files changed, 243 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 80910ba3cc2..dadb71c1165 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -565,6 +565,43 @@ (define_insn_and_split "vcond_mask_<mode><vm>"
   [(set_attr "type" "vector")]
 )
 
+(define_expand "vcond_mask_len_<mode>"
+  [(match_operand:V_VLS 0 "register_operand")
+    (match_operand:<VM> 3 "nonmemory_operand")
+    (match_operand:V_VLS 1 "nonmemory_operand")
+    (match_operand:V_VLS 2 "autovec_else_operand")
+    (match_operand 4 "autovec_length_operand")
+    (match_operand 5 "const_0_operand")]
+  "TARGET_VECTOR"
+  {
+    if (satisfies_constraint_Wc1 (operands[3]))
+      {
+	rtx ops[] = {operands[0], operands[2], operands[1]};
+	riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (<MODE>mode),
+					  riscv_vector::UNARY_OP_TUMA,
+					  ops, operands[4]);
+      }
+    else if (satisfies_constraint_Wc0 (operands[3]))
+      {
+	rtx ops[] = {operands[0], operands[2], operands[2]};
+	riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (<MODE>mode),
+					  riscv_vector::UNARY_OP_TUMA,
+					  ops, operands[4]);
+      }
+    else
+      {
+	/* The order of vcond_mask is opposite to pred_merge.  */
+	rtx ops[] = {operands[0], operands[2], operands[2], operands[1],
+		     operands[3]};
+	riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (<MODE>mode),
+					  riscv_vector::MERGE_OP_TUMA,
+					  ops, operands[4]);
+      }
+    DONE;
+  }
+  [(set_attr "type" "vector")]
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- [BOOL] Select based on masks
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 668d75043ca..0a54e4ff022 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -302,6 +302,7 @@ enum insn_type : unsigned int
   UNARY_OP = __NORMAL_OP | UNARY_OP_P,
   UNARY_OP_TAMA = __MASK_OP_TAMA | UNARY_OP_P,
   UNARY_OP_TAMU = __MASK_OP_TAMU | UNARY_OP_P,
+  UNARY_OP_TUMA = __MASK_OP_TUMA | UNARY_OP_P,
   UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P,
   UNARY_OP_FRM_RMM = UNARY_OP | FRM_RMM_P,
   UNARY_OP_FRM_RUP = UNARY_OP | FRM_RUP_P,
@@ -337,6 +338,10 @@ enum insn_type : unsigned int
   /* For vmerge, no mask operand, no mask policy operand.  */
   MERGE_OP = __NORMAL_OP_TA2 | TERNARY_OP_P,
 
+  /* For vmerge with no vundef operand.  */
+  MERGE_OP_TUMA = HAS_DEST_P | HAS_MERGE_P | TERNARY_OP_P
+		  | TU_POLICY_P,
+
   /* For vm<compare>, no tail policy operand.  */
   COMPARE_OP = __NORMAL_OP_MA | TERNARY_OP_P,
   COMPARE_OP_MU = __MASK_OP_MU | TERNARY_OP_P,
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index daa318ee3da..de0757f1903 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5306,6 +5306,15 @@ no need to define this instruction pattern if the others are supported.
 Similar to @code{vcond@var{m}@var{n}} but operand 3 holds a pre-computed
 result of vector comparison.
 
+@cindex @code{vcond_mask_len_@var{m}@var{n}} instruction pattern
+@item @samp{vcond_mask_@var{m}@var{n}}
+Similar to @code{vcond_mask@var{m}@var{n}} but operand 4 holds a variable
+or constant length and operand 5 holds a bias.  If the
+element index < operand 4 + operand 5 the respective element of the result is
+computed as in @code{vcond_mask_@var{m}@var{n}}.  For element indices >=
+operand 4 + operand 5 the computation is performed as if the respective mask
+element were zero.
+
 @cindex @code{maskload@var{m}@var{n}} instruction pattern
 @item @samp{maskload@var{m}@var{n}}
 Perform a masked load of vector from memory operand 1 of mode @var{m}
diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index b36027b0bad..d6dac08cc2b 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -307,9 +307,16 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
       && VECTOR_TYPE_P (res_op->type)
       && gimple_simplified_result_is_gimple_val (res_op))
     {
-      new_op.set_op (VEC_COND_EXPR, res_op->type,
-		     res_op->cond.cond, res_op->ops[0],
-		     res_op->cond.else_value);
+      tree len = res_op->cond.len;
+      if (!len)
+	new_op.set_op (VEC_COND_EXPR, res_op->type,
+		       res_op->cond.cond, res_op->ops[0],
+		       res_op->cond.else_value);
+      else
+	new_op.set_op (IFN_VCOND_MASK_LEN, res_op->type,
+		       res_op->cond.cond, res_op->ops[0],
+		       res_op->cond.else_value,
+		       res_op->cond.len, res_op->cond.bias);
       *res_op = new_op;
       return gimple_resimplify3 (seq, res_op, valueize);
     }
diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
index bec3ff42e3e..63a9f029589 100644
--- a/gcc/gimple-match.h
+++ b/gcc/gimple-match.h
@@ -32,7 +32,8 @@ public:
   enum uncond { UNCOND };
 
   /* Build an unconditional op.  */
-  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE) {}
+  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE), len
+			       (NULL_TREE), bias (NULL_TREE) {}
   gimple_match_cond (tree, tree);
   gimple_match_cond (tree, tree, tree, tree);
 
@@ -56,7 +57,8 @@ public:
 
 inline
 gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in)
-  : cond (cond_in), else_value (else_value_in)
+  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
+    bias (NULL_TREE)
 {
 }
 
@@ -92,6 +94,10 @@ public:
 		   code_helper, tree, tree, tree, tree, tree);
   gimple_match_op (const gimple_match_cond &,
 		   code_helper, tree, tree, tree, tree, tree, tree);
+  gimple_match_op (const gimple_match_cond &,
+		   code_helper, tree, tree, tree, tree, tree, tree, tree);
+  gimple_match_op (const gimple_match_cond &,
+		   code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
 
   void set_op (code_helper, tree, unsigned int);
   void set_op (code_helper, tree, tree);
@@ -100,6 +106,8 @@ public:
   void set_op (code_helper, tree, tree, tree, tree, bool);
   void set_op (code_helper, tree, tree, tree, tree, tree);
   void set_op (code_helper, tree, tree, tree, tree, tree, tree);
+  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree);
+  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
   void set_value (tree);
 
   tree op_or_null (unsigned int) const;
@@ -212,6 +220,39 @@ gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
   ops[4] = op4;
 }
 
+inline
+gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
+				  code_helper code_in, tree type_in,
+				  tree op0, tree op1, tree op2, tree op3,
+				  tree op4, tree op5)
+  : cond (cond_in), code (code_in), type (type_in), reverse (false),
+    num_ops (6)
+{
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+}
+
+inline
+gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
+				  code_helper code_in, tree type_in,
+				  tree op0, tree op1, tree op2, tree op3,
+				  tree op4, tree op5, tree op6)
+  : cond (cond_in), code (code_in), type (type_in), reverse (false),
+    num_ops (7)
+{
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+  ops[6] = op6;
+}
+
 /* Change the operation performed to CODE_IN, the type of the result to
    TYPE_IN, and the number of operands to NUM_OPS_IN.  The caller needs
    to set the operands itself.  */
@@ -299,6 +340,39 @@ gimple_match_op::set_op (code_helper code_in, tree type_in,
   ops[4] = op4;
 }
 
+inline void
+gimple_match_op::set_op (code_helper code_in, tree type_in,
+			 tree op0, tree op1, tree op2, tree op3, tree op4,
+			 tree op5)
+{
+  code = code_in;
+  type = type_in;
+  num_ops = 6;
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+}
+
+inline void
+gimple_match_op::set_op (code_helper code_in, tree type_in,
+			 tree op0, tree op1, tree op2, tree op3, tree op4,
+			 tree op5, tree op6)
+{
+  code = code_in;
+  type = type_in;
+  num_ops = 7;
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+  ops[6] = op6;
+}
+
 /* Set the "operation" to be the single value VALUE, such as a constant
    or SSA_NAME.  */
 
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 018175261b9..ed83fa8112e 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -170,6 +170,7 @@ init_internal_fns ()
 #define store_lanes_direct { 0, 0, false }
 #define mask_store_lanes_direct { 0, 0, false }
 #define vec_cond_mask_direct { 1, 0, false }
+#define vec_cond_mask_len_direct { 1, 1, false }
 #define vec_cond_direct { 2, 0, false }
 #define scatter_store_direct { 3, 1, false }
 #define len_store_direct { 3, 3, false }
@@ -3129,6 +3130,39 @@ expand_vec_cond_mask_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
     emit_move_insn (target, ops[0].value);
 }
 
+static void
+expand_vec_cond_mask_len_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  class expand_operand ops[6];
+
+  tree lhs = gimple_call_lhs (stmt);
+  tree op1 = gimple_call_arg (stmt, 1);
+  tree op2 = gimple_call_arg (stmt, 2);
+  tree vec_cond_type = TREE_TYPE (lhs);
+
+  machine_mode mode = TYPE_MODE (vec_cond_type);
+  enum insn_code icode = direct_optab_handler (optab, mode);
+  rtx rtx_op1, rtx_op2;
+
+  gcc_assert (icode != CODE_FOR_nothing);
+
+  rtx_op1 = expand_normal (op1);
+  rtx_op2 = expand_normal (op2);
+
+  rtx_op1 = force_reg (mode, rtx_op1);
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_op1, mode);
+  create_input_operand (&ops[2], rtx_op2, mode);
+
+  int opno = add_mask_and_len_args (ops, 3, stmt);
+  expand_insn (icode, opno, ops);
+
+  if (!rtx_equal_p (ops[0].value, target))
+    emit_move_insn (target, ops[0].value);
+}
+
 /* Expand VEC_SET internal functions.  */
 
 static void
@@ -3927,6 +3961,9 @@ expand_convert_optab_fn (internal_fn fn, gcall *stmt, convert_optab optab,
 #define expand_vec_extract_optab_fn(FN, STMT, OPTAB) \
   expand_convert_optab_fn (FN, STMT, OPTAB, 2)
 
+#define expand_vec_cond_mask_len_optab_fn(FN, STMT, OPTAB) \
+  expand_vec_cond_mask_len_optab_fn (FN, STMT, OPTAB)
+
 /* RETURN_TYPE and ARGS are a return type and argument list that are
    in principle compatible with FN (which satisfies direct_internal_fn_p).
    Return the types that should be used to determine whether the
@@ -4018,6 +4055,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_vec_cond_mask_optab_supported_p convert_optab_supported_p
+#define direct_vec_cond_mask_len_optab_supported_p direct_optab_supported_p
 #define direct_vec_cond_optab_supported_p convert_optab_supported_p
 #define direct_scatter_store_optab_supported_p convert_optab_supported_p
 #define direct_len_store_optab_supported_p direct_optab_supported_p
@@ -4690,6 +4728,7 @@ internal_fn_len_index (internal_fn fn)
     case IFN_MASK_LEN_STORE:
     case IFN_MASK_LEN_LOAD_LANES:
     case IFN_MASK_LEN_STORE_LANES:
+    case IFN_VCOND_MASK_LEN:
       return 3;
 
     default:
@@ -4779,6 +4818,9 @@ internal_fn_mask_index (internal_fn fn)
     case IFN_MASK_LEN_SCATTER_STORE:
       return 4;
 
+    case IFN_VCOND_MASK_LEN:
+      return 0;
+
     default:
       return (conditional_internal_fn_code (fn) != ERROR_MARK
 	      || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index a2023ab9c3d..581cc3b5140 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -221,6 +221,8 @@ DEF_INTERNAL_OPTAB_FN (VCONDU, ECF_CONST | ECF_NOTHROW, vcondu, vec_cond)
 DEF_INTERNAL_OPTAB_FN (VCONDEQ, ECF_CONST | ECF_NOTHROW, vcondeq, vec_cond)
 DEF_INTERNAL_OPTAB_FN (VCOND_MASK, ECF_CONST | ECF_NOTHROW,
 		       vcond_mask, vec_cond_mask)
+DEF_INTERNAL_OPTAB_FN (VCOND_MASK_LEN, ECF_CONST | ECF_NOTHROW,
+		       vcond_mask_len, vec_cond_mask_len)
 
 DEF_INTERNAL_OPTAB_FN (VEC_SET, ECF_CONST | ECF_NOTHROW, vec_set, vec_set)
 DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
diff --git a/gcc/match.pd b/gcc/match.pd
index f725a685863..0c21c29694d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -87,6 +87,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   negate bit_not)
 (define_operator_list COND_UNARY
   IFN_COND_NEG IFN_COND_NOT)
+(define_operator_list COND_LEN_UNARY
+  IFN_COND_LEN_NEG IFN_COND_LEN_NOT)
 
 /* Binary operations and their associated IFN_COND_* function.  */
 (define_operator_list UNCOND_BINARY
@@ -103,12 +105,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   IFN_COND_FMIN IFN_COND_FMAX
   IFN_COND_AND IFN_COND_IOR IFN_COND_XOR
   IFN_COND_SHL IFN_COND_SHR)
+(define_operator_list COND_LEN_BINARY
+  IFN_COND_LEN_ADD IFN_COND_LEN_SUB
+  IFN_COND_LEN_MUL IFN_COND_LEN_DIV IFN_COND_LEN_MOD IFN_COND_LEN_RDIV
+  IFN_COND_LEN_MIN IFN_COND_LEN_MAX
+  IFN_COND_LEN_FMIN IFN_COND_LEN_FMAX
+  IFN_COND_LEN_AND IFN_COND_LEN_IOR IFN_COND_LEN_XOR
+  IFN_COND_LEN_SHL IFN_COND_LEN_SHR)
 
 /* Same for ternary operations.  */
 (define_operator_list UNCOND_TERNARY
   IFN_FMA IFN_FMS IFN_FNMA IFN_FNMS)
 (define_operator_list COND_TERNARY
   IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)
+(define_operator_list COND_LEN_TERNARY
+  IFN_COND_LEN_FMA IFN_COND_LEN_FMS IFN_COND_LEN_FNMA IFN_COND_LEN_FNMS)
 
 /* __atomic_fetch_or_*, __atomic_fetch_xor_*, __atomic_xor_fetch_*  */
 (define_operator_list ATOMIC_FETCH_OR_XOR_N
@@ -8949,6 +8960,56 @@ and,
 	&& single_use (@5))
     (view_convert (cond_op (bit_not @0) @2 @3 @4
 		  (view_convert:op_type @1)))))))
+
+/* Similar for all cond_len operations.  */
+(for uncond_op (UNCOND_UNARY)
+     cond_op (COND_LEN_UNARY)
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@3 @1)) @2 @4 @5)
+   (with { tree op_type = TREE_TYPE (@3); }
+    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+        && is_truth_type_for (op_type, TREE_TYPE (@0)))
+     (cond_op @0 @1 @2 @4 @5))))
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@3 @2)) @4 @5)
+   (with { tree op_type = TREE_TYPE (@3); }
+    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+        && is_truth_type_for (op_type, TREE_TYPE (@0)))
+     (cond_op (bit_not @0) @2 @1 @4 @5)))))
+
+(for uncond_op (UNCOND_BINARY)
+     cond_op (COND_LEN_BINARY)
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@4 @1 @2)) @3 @5 @6)
+  (with { tree op_type = TREE_TYPE (@4); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@4))
+    (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3) @5 @6)))))
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@4 @2 @3)) @5 @6)
+  (with { tree op_type = TREE_TYPE (@4); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@4))
+    (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1) @5 @6))))))
+
+(for uncond_op (UNCOND_TERNARY)
+     cond_op (COND_LEN_TERNARY)
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4 @6 @7)
+  (with { tree op_type = TREE_TYPE (@5); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@5))
+    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4) @6 @7)))))
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@5 @2 @3 @4 @6 @7)))
+  (with { tree op_type = TREE_TYPE (@5); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@5))
+    (view_convert (cond_op (bit_not @0) @2 @3 @4 (view_convert:op_type @1) @6 @7))))))
 #endif
 
 /* Detect cases in which a VEC_COND_EXPR effectively replaces the
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 2ccbe4197b7..8d5ceeb8710 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -282,6 +282,7 @@ OPTAB_D (cond_len_fnma_optab, "cond_len_fnma$a")
 OPTAB_D (cond_len_fnms_optab, "cond_len_fnms$a")
 OPTAB_D (cond_len_neg_optab, "cond_len_neg$a")
 OPTAB_D (cond_len_one_cmpl_optab, "cond_len_one_cmpl$a")
+OPTAB_D (vcond_mask_len_optab, "vcond_mask_len_$a")
 OPTAB_D (cmov_optab, "cmov$a6")
 OPTAB_D (cstore_optab, "cstore$a4")
 OPTAB_D (ctrap_optab, "ctrap$a4")
-- 
2.41.0


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.
  2023-10-25 22:35   ` 钟居哲
@ 2023-10-26  8:41     ` Robin Dapp
  2023-10-26 14:02       ` Robin Dapp
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Dapp @ 2023-10-26  8:41 UTC (permalink / raw)
  To: 钟居哲, richard.sandiford
  Cc: rdapp.gcc, gcc-patches, rguenther

> Yeah. I think Robin may need this :
> 
> TREE_CODE (else_val) == SSA_NAAME
>   && SSA_NAME_IS_DEFAULT_DEF (else_val)
>   && VAR_P (SSA_NAME_VAR (else_val))
> 
> to differentiate whether the ELSE VALUE is uninitialized SSA or not.

I think we are talking about a different simplification now.
This one we could still add as a match.pd pattern simplifying every
conditional operation with an undefined else value.

I just re-checked - without my pattern that turns
VCOND_MASK_LEN into VEC_COND there is only one additional fail.
(cond_widen_reduc-2.c where we scan for vfwreduc).
I guess I can just change the combine pattern to combine cond
as well as length masking (merge + if_then_else) when the else
value is similar in both.  Then we would avoid my dubious
simplification and still get rid of the execution failures.

Surely Richard is right in that we cannot "unconditionally" fold
away the length but my naive hunch is that we currently never
create situations where this really leads to errors.

Regards
 Robin


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.
  2023-10-25 22:10 [PATCH] internal-fn: Add VCOND_MASK_LEN 钟居哲
@ 2023-10-25 22:32 ` Richard Sandiford
  2023-10-25 22:35   ` 钟居哲
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Sandiford @ 2023-10-25 22:32 UTC (permalink / raw)
  To: 钟居哲; +Cc: gcc-patches, rdapp.gcc, rguenther

钟居哲 <juzhe.zhong@rivai.ai> writes:
>>> Which one is right?
> Hi, Richard. Let me explain this situation.
>
> Both situations are possible. It's depending on the 'ELSE' value whether it is unitialized value.
>
> For reduction case:
>
> for (int i = 0; i < n; i++)
>   result += a[i]
>
> The trailing elements should be well-defined, keep the original value. Otherwise, it will cause run-time issue.
>
> For integer DIV operation:
>
> for (int i = 0; i < n; i++)
>   a[i] = a[i] / b[i];
>
> The trailling elements are DON'T care (or undefined), I will use unitialized value in 'ELSE' value.
> Then later 'expand' stage will expand it into "clobber scratch" RTL.

OK, in that case it sounds like we're talking about PR110751.
The gimple semantics are that the COND_LEN operates on all lanes
of the mode (and the md.texi documentation should be fixed).
But if the else value is undefined, we can simplify:

  IFN_COND_LEN_IOR (mask, a, 0, undef, len, bias)

to "a".

That's not really a property of COND_LEN though.  The same thing
applies to plain IFN_COND_IOR.  If the fold reduces to a selection
between two values, and one of them is undefined, we can pick the other.

(Although we'd need to think a little carefully about that,
c.f. llvm's distinction between undef and poison.)

Thanks,
Richard

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH] internal-fn: Add VCOND_MASK_LEN.
@ 2023-10-25 22:10 钟居哲
  2023-10-25 22:32 ` Richard Sandiford
  0 siblings, 1 reply; 31+ messages in thread
From: 钟居哲 @ 2023-10-25 22:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: rdapp.gcc, richard.sandiford, rguenther

[-- Attachment #1: Type: text/plain, Size: 654 bytes --]


>> Which one is right?
Hi, Richard. Let me explain this situation.

Both situations are possible. It's depending on the 'ELSE' value whether it is unitialized value.

For reduction case:

for (int i = 0; i < n; i++)
  result += a[i]

The trailing elements should be well-defined, keep the original value. Otherwise, it will cause run-time issue.

For integer DIV operation:

for (int i = 0; i < n; i++)
  a[i] = a[i] / b[i];

The trailling elements are DON'T care (or undefined), I will use unitialized value in 'ELSE' value.
Then later 'expand' stage will expand it into "clobber scratch" RTL.

 Thanks.


juzhe.zhong@rivai.ai

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2023-11-06  7:22 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-08  9:01 [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN Robin Dapp
2023-09-11 20:35 ` Robin Dapp
2023-09-18 10:22   ` Robin Dapp
2023-10-04  8:11     ` Robin Dapp
2023-10-12 13:53   ` Richard Sandiford
2023-10-12 14:19     ` Richard Sandiford
2023-10-13 15:50       ` Robin Dapp
2023-10-16 21:59         ` Richard Sandiford
2023-10-17  8:47           ` Richard Biener
2023-10-17 11:39             ` Robin Dapp
2023-10-17 13:35               ` Richard Sandiford
2023-10-17 15:42                 ` Robin Dapp
2023-10-17 16:05                   ` Richard Sandiford
     [not found]                     ` <7e083b67-f283-4e9e-ba76-24e194fa1761@gmail.com>
     [not found]                       ` <mptttqmny4u.fsf@arm.com>
2023-10-23 16:09                         ` [PATCH] internal-fn: Add VCOND_MASK_LEN Robin Dapp
2023-10-24 21:50                           ` Richard Sandiford
2023-10-25 19:59                             ` Robin Dapp
2023-10-25 21:58                               ` Richard Sandiford
2023-10-17 15:52             ` [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN Richard Sandiford
2023-10-25 22:10 [PATCH] internal-fn: Add VCOND_MASK_LEN 钟居哲
2023-10-25 22:32 ` Richard Sandiford
2023-10-25 22:35   ` 钟居哲
2023-10-26  8:41     ` Robin Dapp
2023-10-26 14:02       ` Robin Dapp
2023-10-26 14:10         ` 钟居哲
2023-10-26 20:32           ` Robin Dapp
2023-11-02 13:35         ` Richard Biener
2023-11-02 13:48           ` Robin Dapp
2023-11-02 23:49             ` Richard Sandiford
2023-11-03  9:03               ` Robin Dapp
2023-11-03  9:11                 ` Richard Sandiford
2023-11-03 22:02                   ` Robin Dapp
2023-11-05 20:28                     ` Richard Sandiford
2023-11-06  7:22                       ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).