public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 1/2] Improve do_store_flag for single bit comparison against 0
@ 2023-05-19  2:14 Andrew Pinski
  2023-05-19  2:14 ` [PATCH 2/2] Improve do_store_flag for comparing single bit against that bit Andrew Pinski
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Andrew Pinski @ 2023-05-19  2:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andrew Pinski

While working something else, I noticed we could improve
the following function code generation:
```
unsigned f(unsigned t)
{
  if (t & ~(1<<30)) __builtin_unreachable();
  return t != 0;
}
```
Right know we just emit a comparison against 0 instead
of just a shift right by 30.
There is code in do_store_flag which already optimizes
`(t & 1<<30) != 0` to `(t >> 30) & 1`. This patch
extends it to handle the case where we know t has a
nonzero of just one bit set.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

	* expr.cc (do_store_flag): Extend the one bit checking case
	to handle the case where we don't have an and but rather still
	one bit is known to be non-zero.
---
 gcc/expr.cc | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 5ede094e705..91528e734e7 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -13083,15 +13083,30 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
       && integer_zerop (arg1)
       && (TYPE_PRECISION (ops->type) != 1 || TYPE_UNSIGNED (ops->type)))
     {
-      gimple *srcstmt = get_def_for_expr (arg0, BIT_AND_EXPR);
-      if (srcstmt
-	  && integer_pow2p (gimple_assign_rhs2 (srcstmt)))
+      wide_int nz = tree_nonzero_bits (arg0);
+
+      if (wi::popcount (nz) == 1)
 	{
+	  tree op0;
+	  tree op1;
+	  gimple *srcstmt = get_def_for_expr (arg0, BIT_AND_EXPR);
+	  /* If the defining statement was (x & POW2), then remove the and
+	     as we are going to add it back. */
+	  if (srcstmt
+	      && integer_pow2p (gimple_assign_rhs2 (srcstmt)))
+	    {
+	      op0 = gimple_assign_rhs1 (srcstmt);
+	      op1 = gimple_assign_rhs2 (srcstmt);
+	    }
+	  else
+	    {
+	      op0 = arg0;
+	      op1 = wide_int_to_tree (TREE_TYPE (op0), nz);
+	    }
 	  enum tree_code tcode = code == NE ? NE_EXPR : EQ_EXPR;
 	  type = lang_hooks.types.type_for_mode (mode, unsignedp);
-	  tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (arg1),
-				       gimple_assign_rhs1 (srcstmt),
-				       gimple_assign_rhs2 (srcstmt));
+	  tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (op0),
+				       op0, op1);
 	  temp = fold_single_bit_test (loc, tcode, temp, arg1, type);
 	  if (temp)
 	    return expand_expr (temp, target, VOIDmode, EXPAND_NORMAL);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 2/2] Improve do_store_flag for comparing single bit against that bit
  2023-05-19  2:14 [PATCH 1/2] Improve do_store_flag for single bit comparison against 0 Andrew Pinski
@ 2023-05-19  2:14 ` Andrew Pinski
  2023-05-19 16:45   ` Jeff Law
  2023-05-19 16:38 ` [PATCH 1/2] Improve do_store_flag for single bit comparison against 0 Jeff Law
  2023-05-22 11:55 ` Richard Biener
  2 siblings, 1 reply; 8+ messages in thread
From: Andrew Pinski @ 2023-05-19  2:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andrew Pinski

This is a case which I noticed while working on the previous patch.
Sometimes we end up with `a == CST` instead of comparing against 0.
This happens in the following code:
```
unsigned f(unsigned t)
{
  if (t & ~(1<<30)) __builtin_unreachable();
  t ^= (1<<30);
  return t != 0;
}
```

We should handle the case where the nonzero bits is the same as the
comparison operand.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

	* expr.cc (do_store_flag): Improve for single bit testing
	not against zero but against that single bit.
---
 gcc/expr.cc | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 91528e734e7..a4628c51c0c 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -13080,12 +13080,15 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
      so we just call into the folder and expand its result.  */
 
   if ((code == NE || code == EQ)
-      && integer_zerop (arg1)
+      && (integer_zerop (arg1)
+	  || integer_pow2p (arg1))
       && (TYPE_PRECISION (ops->type) != 1 || TYPE_UNSIGNED (ops->type)))
     {
       wide_int nz = tree_nonzero_bits (arg0);
 
-      if (wi::popcount (nz) == 1)
+      if (wi::popcount (nz) == 1
+	  && (integer_zerop (arg1)
+	      || wi::to_wide (arg1) == nz))
 	{
 	  tree op0;
 	  tree op1;
@@ -13103,11 +13106,13 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
 	      op0 = arg0;
 	      op1 = wide_int_to_tree (TREE_TYPE (op0), nz);
 	    }
-	  enum tree_code tcode = code == NE ? NE_EXPR : EQ_EXPR;
+	  enum tree_code tcode = EQ_EXPR;
+	  if ((code == NE) ^ !integer_zerop (arg1))
+	    tcode = NE_EXPR;
 	  type = lang_hooks.types.type_for_mode (mode, unsignedp);
 	  tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (op0),
 				       op0, op1);
-	  temp = fold_single_bit_test (loc, tcode, temp, arg1, type);
+	  temp = fold_single_bit_test (loc, tcode, temp, build_zero_cst (type), type);
 	  if (temp)
 	    return expand_expr (temp, target, VOIDmode, EXPAND_NORMAL);
 	}
-- 
2.31.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] Improve do_store_flag for single bit comparison against 0
  2023-05-19  2:14 [PATCH 1/2] Improve do_store_flag for single bit comparison against 0 Andrew Pinski
  2023-05-19  2:14 ` [PATCH 2/2] Improve do_store_flag for comparing single bit against that bit Andrew Pinski
@ 2023-05-19 16:38 ` Jeff Law
  2023-05-19 22:04   ` Andrew Pinski
  2023-05-22 11:55 ` Richard Biener
  2 siblings, 1 reply; 8+ messages in thread
From: Jeff Law @ 2023-05-19 16:38 UTC (permalink / raw)
  To: Andrew Pinski, gcc-patches



On 5/18/23 20:14, Andrew Pinski via Gcc-patches wrote:
> While working something else, I noticed we could improve
> the following function code generation:
> ```
> unsigned f(unsigned t)
> {
>    if (t & ~(1<<30)) __builtin_unreachable();
>    return t != 0;
> }
> ```
> Right know we just emit a comparison against 0 instead
> of just a shift right by 30.
> There is code in do_store_flag which already optimizes
> `(t & 1<<30) != 0` to `(t >> 30) & 1`. This patch
> extends it to handle the case where we know t has a
> nonzero of just one bit set.
> 
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> 
> gcc/ChangeLog:
> 
> 	* expr.cc (do_store_flag): Extend the one bit checking case
> 	to handle the case where we don't have an and but rather still
> 	one bit is known to be non-zero.
So as we touched on in IRC, the concern is targets where the cost of the 
shift depends on the number of bits shifted.  Can we look at costing 
here to determine the initial RTL generation approach?

Another approach that would work for some targets is a single bit 
extract.  In theory we should be discovering the extract idiom from the 
shift+and form, but I'm always concerned that it's going to be missed 
for one or more oddball reasons.

jeff


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] Improve do_store_flag for comparing single bit against that bit
  2023-05-19  2:14 ` [PATCH 2/2] Improve do_store_flag for comparing single bit against that bit Andrew Pinski
@ 2023-05-19 16:45   ` Jeff Law
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff Law @ 2023-05-19 16:45 UTC (permalink / raw)
  To: Andrew Pinski, gcc-patches



On 5/18/23 20:14, Andrew Pinski via Gcc-patches wrote:
> This is a case which I noticed while working on the previous patch.
> Sometimes we end up with `a == CST` instead of comparing against 0.
> This happens in the following code:
> ```
> unsigned f(unsigned t)
> {
>    if (t & ~(1<<30)) __builtin_unreachable();
>    t ^= (1<<30);
>    return t != 0;
> }
> ```
> 
> We should handle the case where the nonzero bits is the same as the
> comparison operand.
> 
> OK? Bootstrapped and tested on x86_64-linux-gnu.
> 
> gcc/ChangeLog:
> 
> 	* expr.cc (do_store_flag): Improve for single bit testing
> 	not against zero but against that single bit.
This looks like it can/should go forward independently of 1/2 and 
touches on my earlier comment about using bit extractions  ;-)

So OK by me.

jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] Improve do_store_flag for single bit comparison against 0
  2023-05-19 16:38 ` [PATCH 1/2] Improve do_store_flag for single bit comparison against 0 Jeff Law
@ 2023-05-19 22:04   ` Andrew Pinski
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Pinski @ 2023-05-19 22:04 UTC (permalink / raw)
  To: Jeff Law; +Cc: Andrew Pinski, gcc-patches

On Fri, May 19, 2023 at 9:40 AM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 5/18/23 20:14, Andrew Pinski via Gcc-patches wrote:
> > While working something else, I noticed we could improve
> > the following function code generation:
> > ```
> > unsigned f(unsigned t)
> > {
> >    if (t & ~(1<<30)) __builtin_unreachable();
> >    return t != 0;
> > }
> > ```
> > Right know we just emit a comparison against 0 instead
> > of just a shift right by 30.
> > There is code in do_store_flag which already optimizes
> > `(t & 1<<30) != 0` to `(t >> 30) & 1`. This patch
> > extends it to handle the case where we know t has a
> > nonzero of just one bit set.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > gcc/ChangeLog:
> >
> >       * expr.cc (do_store_flag): Extend the one bit checking case
> >       to handle the case where we don't have an and but rather still
> >       one bit is known to be non-zero.
> So as we touched on in IRC, the concern is targets where the cost of the
> shift depends on the number of bits shifted.  Can we look at costing
> here to determine the initial RTL generation approach?
>
> Another approach that would work for some targets is a single bit
> extract.  In theory we should be discovering the extract idiom from the
> shift+and form, but I'm always concerned that it's going to be missed
> for one or more oddball reasons.

I now have a patch set which does the extraction directly rather than having
combine try to combine it later on. This actually fixes an issue with avr target
which expands out the shift by doing a loop. Since we are using
extract_bit_field,
if a target does not have an extract pattern, it will expand using
shift+and form instead.
I will resubmit this and the other patch after this new patch set is completed.

Thanks,
Andrew Pinski

>
> jeff
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] Improve do_store_flag for single bit comparison against 0
  2023-05-19  2:14 [PATCH 1/2] Improve do_store_flag for single bit comparison against 0 Andrew Pinski
  2023-05-19  2:14 ` [PATCH 2/2] Improve do_store_flag for comparing single bit against that bit Andrew Pinski
  2023-05-19 16:38 ` [PATCH 1/2] Improve do_store_flag for single bit comparison against 0 Jeff Law
@ 2023-05-22 11:55 ` Richard Biener
  2023-05-22 15:21   ` Andrew Pinski
  2 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2023-05-22 11:55 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: gcc-patches

On Fri, May 19, 2023 at 4:15 AM Andrew Pinski via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> While working something else, I noticed we could improve
> the following function code generation:
> ```
> unsigned f(unsigned t)
> {
>   if (t & ~(1<<30)) __builtin_unreachable();
>   return t != 0;
> }
> ```
> Right know we just emit a comparison against 0 instead
> of just a shift right by 30.
> There is code in do_store_flag which already optimizes
> `(t & 1<<30) != 0` to `(t >> 30) & 1`. This patch
> extends it to handle the case where we know t has a
> nonzero of just one bit set.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> gcc/ChangeLog:
>
>         * expr.cc (do_store_flag): Extend the one bit checking case
>         to handle the case where we don't have an and but rather still
>         one bit is known to be non-zero.
> ---
>  gcc/expr.cc | 27 +++++++++++++++++++++------
>  1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 5ede094e705..91528e734e7 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -13083,15 +13083,30 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
>        && integer_zerop (arg1)
>        && (TYPE_PRECISION (ops->type) != 1 || TYPE_UNSIGNED (ops->type)))
>      {
> -      gimple *srcstmt = get_def_for_expr (arg0, BIT_AND_EXPR);
> -      if (srcstmt
> -         && integer_pow2p (gimple_assign_rhs2 (srcstmt)))
> +      wide_int nz = tree_nonzero_bits (arg0);
> +
> +      if (wi::popcount (nz) == 1)
>         {
> +         tree op0;
> +         tree op1;
> +         gimple *srcstmt = get_def_for_expr (arg0, BIT_AND_EXPR);
> +         /* If the defining statement was (x & POW2), then remove the and
> +            as we are going to add it back. */
> +         if (srcstmt
> +             && integer_pow2p (gimple_assign_rhs2 (srcstmt)))
> +           {
> +             op0 = gimple_assign_rhs1 (srcstmt);
> +             op1 = gimple_assign_rhs2 (srcstmt);
> +           }
> +         else
> +           {
> +             op0 = arg0;
> +             op1 = wide_int_to_tree (TREE_TYPE (op0), nz);
> +           }
>           enum tree_code tcode = code == NE ? NE_EXPR : EQ_EXPR;
>           type = lang_hooks.types.type_for_mode (mode, unsignedp);
> -         tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (arg1),
> -                                      gimple_assign_rhs1 (srcstmt),
> -                                      gimple_assign_rhs2 (srcstmt));
> +         tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (op0),
> +                                      op0, op1);
>           temp = fold_single_bit_test (loc, tcode, temp, arg1, type);
>           if (temp)
>             return expand_expr (temp, target, VOIDmode, EXPAND_NORMAL);

I wonder if, instead of expanding expand with these kind of tricks we
want to instead
add to ISEL and use direct optab IFNs for things we matched?  In
particular I think
we do want to get rid of TER but the above adds another use of get_def_for_expr.

As Jeff says the above doesn't look like it includes costing so that would be an
argument to make it a generic match.pd transform (it appears to be "simpler")?

Richard.

> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] Improve do_store_flag for single bit comparison against 0
  2023-05-22 11:55 ` Richard Biener
@ 2023-05-22 15:21   ` Andrew Pinski
  2023-05-23  6:07     ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Pinski @ 2023-05-22 15:21 UTC (permalink / raw)
  To: Richard Biener; +Cc: Andrew Pinski, gcc-patches

On Mon, May 22, 2023 at 4:56 AM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Fri, May 19, 2023 at 4:15 AM Andrew Pinski via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > While working something else, I noticed we could improve
> > the following function code generation:
> > ```
> > unsigned f(unsigned t)
> > {
> >   if (t & ~(1<<30)) __builtin_unreachable();
> >   return t != 0;
> > }
> > ```
> > Right know we just emit a comparison against 0 instead
> > of just a shift right by 30.
> > There is code in do_store_flag which already optimizes
> > `(t & 1<<30) != 0` to `(t >> 30) & 1`. This patch
> > extends it to handle the case where we know t has a
> > nonzero of just one bit set.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > gcc/ChangeLog:
> >
> >         * expr.cc (do_store_flag): Extend the one bit checking case
> >         to handle the case where we don't have an and but rather still
> >         one bit is known to be non-zero.
> > ---
> >  gcc/expr.cc | 27 +++++++++++++++++++++------
> >  1 file changed, 21 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > index 5ede094e705..91528e734e7 100644
> > --- a/gcc/expr.cc
> > +++ b/gcc/expr.cc
> > @@ -13083,15 +13083,30 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
> >        && integer_zerop (arg1)
> >        && (TYPE_PRECISION (ops->type) != 1 || TYPE_UNSIGNED (ops->type)))
> >      {
> > -      gimple *srcstmt = get_def_for_expr (arg0, BIT_AND_EXPR);
> > -      if (srcstmt
> > -         && integer_pow2p (gimple_assign_rhs2 (srcstmt)))
> > +      wide_int nz = tree_nonzero_bits (arg0);
> > +
> > +      if (wi::popcount (nz) == 1)
> >         {
> > +         tree op0;
> > +         tree op1;
> > +         gimple *srcstmt = get_def_for_expr (arg0, BIT_AND_EXPR);
> > +         /* If the defining statement was (x & POW2), then remove the and
> > +            as we are going to add it back. */
> > +         if (srcstmt
> > +             && integer_pow2p (gimple_assign_rhs2 (srcstmt)))
> > +           {
> > +             op0 = gimple_assign_rhs1 (srcstmt);
> > +             op1 = gimple_assign_rhs2 (srcstmt);
> > +           }
> > +         else
> > +           {
> > +             op0 = arg0;
> > +             op1 = wide_int_to_tree (TREE_TYPE (op0), nz);
> > +           }
> >           enum tree_code tcode = code == NE ? NE_EXPR : EQ_EXPR;
> >           type = lang_hooks.types.type_for_mode (mode, unsignedp);
> > -         tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (arg1),
> > -                                      gimple_assign_rhs1 (srcstmt),
> > -                                      gimple_assign_rhs2 (srcstmt));
> > +         tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (op0),
> > +                                      op0, op1);
> >           temp = fold_single_bit_test (loc, tcode, temp, arg1, type);
> >           if (temp)
> >             return expand_expr (temp, target, VOIDmode, EXPAND_NORMAL);
>
> I wonder if, instead of expanding expand with these kind of tricks we
> want to instead
> add to ISEL and use direct optab IFNs for things we matched?  In
> particular I think
> we do want to get rid of TER but the above adds another use of get_def_for_expr.

The above does not add another at all. It was there before, it just
moves it around slightly. Instead we depend on the non-zero bits to be
correct before even trying get_def_for_expr .
The get_def_for_expr is there to remove the & if it can be ter'ed.

>
> As Jeff says the above doesn't look like it includes costing so that would be an
> argument to make it a generic match.pd transform (it appears to be "simpler")?

For the TER case, it would be same number of gimple instructions so
that can happen if we want
t = a & CST
result = t != 0
vs:
t1 = BIT_FIELD_REF <a, 1, N>
result = (bool)t1

For the non-TER case (which is what this patch is trying to solve).
we just have `t != 0` (where t has a non-zero value of CST) so it might increase
the number of gimple instructions by 1.

Is that ok? Or should that still happen in expand only.

The cost issue between a != 0 vs bit_extraction (for the non-ter case)
is something which I will be solving next weekend.

>
> Richard.
>
> > --
> > 2.31.1
> >

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] Improve do_store_flag for single bit comparison against 0
  2023-05-22 15:21   ` Andrew Pinski
@ 2023-05-23  6:07     ` Richard Biener
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Biener @ 2023-05-23  6:07 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Andrew Pinski, gcc-patches

On Mon, May 22, 2023 at 5:21 PM Andrew Pinski <pinskia@gmail.com> wrote:
>
> On Mon, May 22, 2023 at 4:56 AM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Fri, May 19, 2023 at 4:15 AM Andrew Pinski via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > While working something else, I noticed we could improve
> > > the following function code generation:
> > > ```
> > > unsigned f(unsigned t)
> > > {
> > >   if (t & ~(1<<30)) __builtin_unreachable();
> > >   return t != 0;
> > > }
> > > ```
> > > Right know we just emit a comparison against 0 instead
> > > of just a shift right by 30.
> > > There is code in do_store_flag which already optimizes
> > > `(t & 1<<30) != 0` to `(t >> 30) & 1`. This patch
> > > extends it to handle the case where we know t has a
> > > nonzero of just one bit set.
> > >
> > > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> > >
> > > gcc/ChangeLog:
> > >
> > >         * expr.cc (do_store_flag): Extend the one bit checking case
> > >         to handle the case where we don't have an and but rather still
> > >         one bit is known to be non-zero.
> > > ---
> > >  gcc/expr.cc | 27 +++++++++++++++++++++------
> > >  1 file changed, 21 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > > index 5ede094e705..91528e734e7 100644
> > > --- a/gcc/expr.cc
> > > +++ b/gcc/expr.cc
> > > @@ -13083,15 +13083,30 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
> > >        && integer_zerop (arg1)
> > >        && (TYPE_PRECISION (ops->type) != 1 || TYPE_UNSIGNED (ops->type)))
> > >      {
> > > -      gimple *srcstmt = get_def_for_expr (arg0, BIT_AND_EXPR);
> > > -      if (srcstmt
> > > -         && integer_pow2p (gimple_assign_rhs2 (srcstmt)))
> > > +      wide_int nz = tree_nonzero_bits (arg0);
> > > +
> > > +      if (wi::popcount (nz) == 1)
> > >         {
> > > +         tree op0;
> > > +         tree op1;
> > > +         gimple *srcstmt = get_def_for_expr (arg0, BIT_AND_EXPR);
> > > +         /* If the defining statement was (x & POW2), then remove the and
> > > +            as we are going to add it back. */
> > > +         if (srcstmt
> > > +             && integer_pow2p (gimple_assign_rhs2 (srcstmt)))
> > > +           {
> > > +             op0 = gimple_assign_rhs1 (srcstmt);
> > > +             op1 = gimple_assign_rhs2 (srcstmt);
> > > +           }
> > > +         else
> > > +           {
> > > +             op0 = arg0;
> > > +             op1 = wide_int_to_tree (TREE_TYPE (op0), nz);
> > > +           }
> > >           enum tree_code tcode = code == NE ? NE_EXPR : EQ_EXPR;
> > >           type = lang_hooks.types.type_for_mode (mode, unsignedp);
> > > -         tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (arg1),
> > > -                                      gimple_assign_rhs1 (srcstmt),
> > > -                                      gimple_assign_rhs2 (srcstmt));
> > > +         tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (op0),
> > > +                                      op0, op1);
> > >           temp = fold_single_bit_test (loc, tcode, temp, arg1, type);
> > >           if (temp)
> > >             return expand_expr (temp, target, VOIDmode, EXPAND_NORMAL);
> >
> > I wonder if, instead of expanding expand with these kind of tricks we
> > want to instead
> > add to ISEL and use direct optab IFNs for things we matched?  In
> > particular I think
> > we do want to get rid of TER but the above adds another use of get_def_for_expr.
>
> The above does not add another at all. It was there before, it just
> moves it around slightly. Instead we depend on the non-zero bits to be
> correct before even trying get_def_for_expr .
> The get_def_for_expr is there to remove the & if it can be ter'ed.
>
> >
> > As Jeff says the above doesn't look like it includes costing so that would be an
> > argument to make it a generic match.pd transform (it appears to be "simpler")?
>
> For the TER case, it would be same number of gimple instructions so
> that can happen if we want
> t = a & CST
> result = t != 0
> vs:
> t1 = BIT_FIELD_REF <a, 1, N>
> result = (bool)t1
>
> For the non-TER case (which is what this patch is trying to solve).
> we just have `t != 0` (where t has a non-zero value of CST) so it might increase
> the number of gimple instructions by 1.
>
> Is that ok? Or should that still happen in expand only.

I was looking at the `(t & 1<<30) != 0` to `(t >> 30) & 1` case mostly
(there's conversions involved at both sides possibly).  I don't think we
want BIT_FIELD_REF for bit extraction as canonical variant on GIMPLE.

> The cost issue between a != 0 vs bit_extraction (for the non-ter case)
> is something which I will be solving next weekend.

Ah, good - if costs are involved expand is the correct place (or pre-expand
ISEL).

Richard.

> >
> > Richard.
> >
> > > --
> > > 2.31.1
> > >

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-05-23  6:07 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-19  2:14 [PATCH 1/2] Improve do_store_flag for single bit comparison against 0 Andrew Pinski
2023-05-19  2:14 ` [PATCH 2/2] Improve do_store_flag for comparing single bit against that bit Andrew Pinski
2023-05-19 16:45   ` Jeff Law
2023-05-19 16:38 ` [PATCH 1/2] Improve do_store_flag for single bit comparison against 0 Jeff Law
2023-05-19 22:04   ` Andrew Pinski
2023-05-22 11:55 ` Richard Biener
2023-05-22 15:21   ` Andrew Pinski
2023-05-23  6:07     ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).