public inbox for gcc-patches@gcc.gnu.org
* [PATCH] gimple-isel: handle x CMP y ? z : 0
@ 2022-05-04 10:15 Richard Earnshaw
  2022-05-04 11:14 ` Richard Biener
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Earnshaw @ 2022-05-04 10:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 522 bytes --]


Gimple isel already handles x CMP y ? -1 : 0 when lowering vector cond
operations, but this can be generalized further when the comparison
forms a natural mask so that we can also handle x CMP y ? z : 0 by
transforming it into (x CMP y) & z.  This will, in most cases, save
having to load a register with the zero vector.
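The transform relies on the vector comparison producing an all-ones mask
for true lanes and all-zeros for false lanes.  A scalar sketch of the
per-lane identity (the helper names are illustrative, not part of the
patch):

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one vector lane: a true comparison yields an
   all-ones mask (-1), a false one yields all-zeros (0).  */
static int32_t lane_cmp_eq (int32_t x, int32_t y)
{
  return x == y ? -1 : 0;
}

/* x CMP y ? z : 0 is equivalent to (x CMP y) & z when the comparison
   is such a natural mask, so no zero vector needs to be materialized.  */
static int32_t select_or_zero (int32_t x, int32_t y, int32_t z)
{
  return lane_cmp_eq (x, y) & z;
}
```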

gcc/ChangeLog:
	* gimple-isel.cc (gimple_expand_vec_cond_expr): Handle
	x CMP y ? z : 0.
---
 gcc/gimple-isel.cc | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)


[-- Attachment #2: 0001-gimple-isel-handle-x-CMP-y-z-0.patch --]
[-- Type: text/x-patch; name="0001-gimple-isel-handle-x-CMP-y-z-0.patch", Size: 1655 bytes --]

diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index a8f7a0d25d0..5bf4a4eccc1 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -190,16 +190,33 @@ gimple_expand_vec_cond_expr (struct function *fun, gimple_stmt_iterator *gsi,
 	    can_compute_op0 = expand_vec_cmp_expr_p (op0a_type, op0_type,
 						     tcode);
 
-	  /* Try to fold x CMP y ? -1 : 0 to x CMP y.  */
+	  /* Try to fold x CMP y ? z : 0.  */
 	  if (can_compute_op0
-	      && integer_minus_onep (op1)
 	      && integer_zerop (op2)
 	      && TYPE_MODE (TREE_TYPE (lhs)) == TYPE_MODE (TREE_TYPE (op0)))
 	    {
-	      tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
-	      gassign *new_stmt = gimple_build_assign (lhs, conv_op);
-	      gsi_replace (gsi, new_stmt, true);
-	      return new_stmt;
+	      /* If Z is -1, then the result is just the comparison.  */
+	      if (integer_minus_onep (op1))
+		{
+		  tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs),
+					 op0);
+		  gassign *new_stmt = gimple_build_assign (lhs, conv_op);
+		  gsi_replace (gsi, new_stmt, true);
+		  return new_stmt;
+		}
+	      /* Otherwise, use the comparison as a mask for Z.  */
+	      else
+		{
+		  gimple_seq stmts = NULL;
+		  tree type = TREE_TYPE (lhs);
+		  location_t loc = gimple_location (stmt);
+		  tree tem0 = gimple_build (&stmts, loc, VIEW_CONVERT_EXPR,
+					    type, op0);
+		  tree tem1 = gimple_build (&stmts, loc, BIT_AND_EXPR, type,
+					    tem0, op1);
+		  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+		  return gimple_build_assign (lhs, tem1);
+		}
 	    }
 
 	  /* When the compare has EH we do not want to forward it when

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] gimple-isel: handle x CMP y ? z : 0
  2022-05-04 10:15 [PATCH] gimple-isel: handle x CMP y ? z : 0 Richard Earnshaw
@ 2022-05-04 11:14 ` Richard Biener
  2022-05-04 13:45   ` Richard Earnshaw
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Biener @ 2022-05-04 11:14 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: GCC Patches, Richard Biener

On Wed, May 4, 2022 at 12:16 PM Richard Earnshaw via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
> Gimple isel already handles x CMP y ? -1 : 0 when lowering vector cond
> operations, but this can be generalized further when the comparison
> forms a natural mask so that we can also handle x CMP y ? z : 0 by
> transforming it into (x CMP y) & z.  This will, in most cases, save
> having to load a register with the zero vector.

Hmm, I'm not sure why exactly the existing case is in ISEL, but if
we want to extend it there we might also want to handle ? 0 : -1
as BIT_NOT.

For the case in question it looks like it might fit better as an optimization
in match.pd?  It also lacks a query for whether and<mode>3 support is present.
For match.pd it might be nice to handle x CMP y ? 0 : z as well
if we can invert x CMP y (and it has only a single use).  When in
ISEL we might as well check for andn<mode>3 support and go with
BIT_NOT + BIT_AND ...
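A scalar sketch of the two inverted cases suggested above (helper names
are illustrative, not from the patch): x CMP y ? 0 : -1 is the bitwise
NOT of the mask, and x CMP y ? 0 : z is the complemented mask ANDed with
z, which is what an andn<mode>3 pattern would compute in one instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Comparison result modelled as an all-ones/all-zeros mask.  */
static int32_t lane_mask (int x_cmp_y)
{
  return x_cmp_y ? -1 : 0;
}

/* x CMP y ? 0 : -1  ==>  ~mask  (a single BIT_NOT).  */
static int32_t select_not (int x_cmp_y)
{
  return ~lane_mask (x_cmp_y);
}

/* x CMP y ? 0 : z  ==>  ~mask & z  (BIT_NOT + BIT_AND, i.e. andn).  */
static int32_t select_zero_or_z (int x_cmp_y, int32_t z)
{
  return ~lane_mask (x_cmp_y) & z;
}
```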

Do you have a testcase where it improves code generation btw?

Richard.

> gcc/ChangeLog:
>         * gimple-isel.cc (gimple_expand_vec_cond_expr): Handle
>         x CMP y ? z : 0.
> ---
>  gcc/gimple-isel.cc | 29 +++++++++++++++++++++++------
>  1 file changed, 23 insertions(+), 6 deletions(-)
>


* Re: [PATCH] gimple-isel: handle x CMP y ? z : 0
  2022-05-04 11:14 ` Richard Biener
@ 2022-05-04 13:45   ` Richard Earnshaw
  2022-05-05  6:53     ` Richard Biener
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Earnshaw @ 2022-05-04 13:45 UTC (permalink / raw)
  To: Richard Biener, Richard Earnshaw; +Cc: GCC Patches, Richard Biener



On 04/05/2022 12:14, Richard Biener wrote:
> On Wed, May 4, 2022 at 12:16 PM Richard Earnshaw via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>>
>> Gimple isel already handles x CMP y ? -1 : 0 when lowering vector cond
>> operations, but this can be generalized further when the comparison
>> forms a natural mask so that we can also handle x CMP y ? z : 0 by
>> transforming it into (x CMP y) & z.  This will, in most cases, save
>> having to load a register with the zero vector.
> 
> Hmm, I'm not sure why exactly the existing case is in ISEL, but if
> we want to extend it there we might also want to handle ? 0 : -1
> as BIT_NOT.

I was assuming that there had already been some level of 
canonicalization at this point, but maybe that's an incorrect 
assumption.  There are quite a few transforms that would avoid a select 
operation, but at some point the bit manipulations may well become more 
expensive than the select itself.

> 
> For the case in question it looks like it might fit better as an optimization
> in match.pd?  It also lacks a query for whether and<mode>3 support is present.
> For match.pd it might be nice to handle x CMP y ? 0 : z as well
> if we can invert x CMP y (and it has only a single use).  When in
> ISEL we might as well check for andn<mode>3 support and go with
> BIT_NOT + BIT_AND ...

I'll have to think about that a bit; the approach was inspired by the
hunk just a bit earlier that starts:

   /* Lower mask typed, non-vector mode VEC_COND_EXPRs to bitwise operations.
      Those can end up generated by folding and at least for integer mode masks
      we cannot expect vcond expanders to exist.  We lower a ? b : c
      to (b & a) | (c & ~a).  */
   if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (lhs))
       && !VECTOR_MODE_P (mode))
     {
       gcc_assert (types_compatible_p (TREE_TYPE (op0), TREE_TYPE (op1)));
       gimple_seq stmts = NULL;
       tree type = TREE_TYPE (lhs);
       location_t loc = gimple_location (stmt);
       tree tem0 = gimple_build (&stmts, loc, BIT_AND_EXPR, type, op1, op0);

There's no test for and<mode>3 support there, but I guess that's working 
on integral modes rather than vectors.
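The (b & a) | (c & ~a) lowering in that hunk can be checked with a
scalar model (the helper name is illustrative): when a is an
all-ones/all-zeros mask, the expression picks b where the mask is set
and c where it is clear.

```c
#include <assert.h>
#include <stdint.h>

/* a ? b : c lowered to bitwise operations; valid only when A is an
   all-ones or all-zeros mask.  */
static int32_t mask_select (int32_t a, int32_t b, int32_t c)
{
  return (b & a) | (c & ~a);
}
```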

> 
> Do you have a testcase where it improves code generation btw?

It was originally inspired by:

void f (int * __restrict__ dest, int *a, int *b)
{
   int i;
   for (i=0; i<4; i++) {
     dest[i] = a[i] == b[i];
   }
}

which with Neon on Arm was producing a vsel sequence rather than a mask 
operation because the backend doesn't handle this case directly during 
expand (unlike some other targets); but it seemed wrong for every target 
to have to handle this in the back-end when it's a pretty generic 
optimization.

A more generalized case would be something like

void f (int * __restrict__ dest, int *a, int *b)
{
   int i;
   for (i=0; i<4; i++) {
     dest[i] = a[i] ? b[i] : 0;
   }
}
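For reference, a scalar model of what the masked form of that second
loop computes per element (illustrative only; this is the identity the
transform relies on, not the generated vector code):

```c
#include <assert.h>
#include <stdint.h>

/* dest[i] = a[i] ? b[i] : 0, rewritten as a mask-and per element:
   -(a[i] != 0) is -1 (all ones) when a[i] is nonzero, else 0.  */
static void f_masked (int32_t *dest, const int32_t *a, const int32_t *b)
{
  for (int i = 0; i < 4; i++)
    dest[i] = -(a[i] != 0) & b[i];
}
```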

R.

> 
> Richard.
> 
>> gcc/ChangeLog:
>>         * gimple-isel.cc (gimple_expand_vec_cond_expr): Handle
>>         x CMP y ? z : 0.
>> ---
>>  gcc/gimple-isel.cc | 29 +++++++++++++++++++++++------
>>  1 file changed, 23 insertions(+), 6 deletions(-)
>>


* Re: [PATCH] gimple-isel: handle x CMP y ? z : 0
  2022-05-04 13:45   ` Richard Earnshaw
@ 2022-05-05  6:53     ` Richard Biener
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Biener @ 2022-05-05  6:53 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: Richard Biener, Richard Earnshaw, GCC Patches

On Wed, 4 May 2022, Richard Earnshaw wrote:

> 
> 
> On 04/05/2022 12:14, Richard Biener wrote:
> > On Wed, May 4, 2022 at 12:16 PM Richard Earnshaw via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >>
> >> Gimple isel already handles x CMP y ? -1 : 0 when lowering vector cond
> >> operations, but this can be generalized further when the comparison
> >> forms a natural mask so that we can also handle x CMP y ? z : 0 by
> >> transforming it into (x CMP y) & z.  This will, in most cases, save
> >> having to load a register with the zero vector.
> > 
> > Hmm, I'm not sure why exactly the existing case is in ISEL, but if
> > we want to extend it there we might also want to handle ? 0 : -1
> > as BIT_NOT.
> 
> I was assuming that there had already been some level of canonicalization at
> this point, but maybe that's an incorrect assumption.  There are quite a few
> transforms that would avoid a select operation, but at some point the bit
> manipulations may well become more expensive than the select itself.
> 
> > 
> > For the case in question it looks like it might fit better as an optimization
> > in match.pd?  It also lacks a query for whether and<mode>3 support is present.
> > For match.pd it might be nice to handle x CMP y ? 0 : z as well
> > if we can invert x CMP y (and it has only a single use).  When in
> > ISEL we might as well check for andn<mode>3 support and go with
> > BIT_NOT + BIT_AND ...
> 
> I'll have to think about that a bit, the approach was inspired by the hunk
> just a bit earlier that starts:
> 
>   /* Lower mask typed, non-vector mode VEC_COND_EXPRs to bitwise operations.
>      Those can end up generated by folding and at least for integer mode masks
>      we cannot expect vcond expanders to exist.  We lower a ? b : c
>      to (b & a) | (c & ~a).  */
>   if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (lhs))
>       && !VECTOR_MODE_P (mode))
>     {
>       gcc_assert (types_compatible_p (TREE_TYPE (op0), TREE_TYPE (op1)));
>       gimple_seq stmts = NULL;
>       tree type = TREE_TYPE (lhs);
>       location_t loc = gimple_location (stmt);
>       tree tem0 = gimple_build (&stmts, loc, BIT_AND_EXPR, type, op1, op0);
> 
> There's no test for and<mode>3 support there, but I guess that's working on
> integral modes rather than vectors.

Yes, that's for word_mode or AVX512 mask modes (QImode or HImode).

> > 
> > Do you have a testcase where it improves code generation btw?
> 
> It was originally inspired by:
> 
> void f (int * __restrict__ dest, int *a, int *b)
> {
>   int i;
>   for (i=0; i<4; i++) {
>     dest[i] = a[i] == b[i];
>   }
> }
> 
> which with Neon on Arm was producing a vsel sequence rather than a mask
> operation because the backend doesn't handle this case directly during expand
> (unlike some other targets); but it seemed wrong for every target to have to
> handle this in the back-end when it's a pretty generic optimization.

But of course whether the AND or the VSEL is more efficient depends
on the target so in match.pd it would be canonicalization while in
ISEL (aka instruction selection) we should look at what the target
prefers.  ISEL is supposed to become a place where we do all the
non-trivial RTL expansion tricks.

The existing trick just drops an operation, which should always be
profitable, evading the problem of deciding.  If we add things
here we should either reason that by all means profitability
is guaranteed (with a comment) or somehow query the target
(I'm not sure how - how does RTL expansion do it?  IIRC that
might also just pick the first "working" expansion?)

I understand your target can do the vcond as well, but for
the testcases it seems we have used_vec_cond_exprs == 1
and thus the vcond would also do the compare; so to me
it looks like the single .VCOND is going to be cheaper
than a compare to produce the mask followed by the AND?

If the VSEL doesn't actually perform the comparison then I think
you shouldn't have vcond{[u],eq} patterns but just comparison
patterns and vcond_mask?  For that I'd then say the target
could choose to expand vcond_mask with same mode (not
mixed FP/INT modes) as AND if profitable?

Richard.

> A more generalized case would be something like
> 
> void f (int * __restrict__ dest, int *a, int *b)
> {
>   int i;
>   for (i=0; i<4; i++) {
>     dest[i] = a[i] ? b[i] : 0;
>   }
> }
> 
> R.
> 
> > 
> > Richard.
> > 
> >> gcc/ChangeLog:
> >>         * gimple-isel.cc (gimple_expand_vec_cond_expr): Handle
> >>         x CMP y ? z : 0.
> >> ---
> >>  gcc/gimple-isel.cc | 29 +++++++++++++++++++++++------
> >>  1 file changed, 23 insertions(+), 6 deletions(-)
> >>
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)

