[PATCH] Some additional zero-extension related optimizations in simplify-rtx.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Some additional zero-extension related optimizations in simplify-rtx.
@ 2022-07-27 13:42 Roger Sayle
  2022-07-27 20:23 ` Segher Boessenkool
  2022-08-02  9:38 ` Richard Sandiford
  0 siblings, 2 replies; 9+ messages in thread
From: Roger Sayle @ 2022-07-27 13:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: 'Segher Boessenkool'

[-- Attachment #1: Type: text/plain, Size: 2935 bytes --]

This patch implements some additional zero-extension and sign-extension
related optimizations in simplify-rtx.cc.  The original motivation comes
from PR rtl-optimization/71775, where in comment #2 Andrew Pinski sees:

Failed to match this instruction:
(set (reg:DI 88 [ _1 ])
    (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))

On many platforms the result of DImode CTZ is constrained to be a
small unsigned integer (between 0 and 64), hence the truncation to
32-bits (using a SUBREG) and the following sign extension back to
64-bits are effectively a no-op, so the above should ideally (often)
be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".

To implement this, and some closely related transformations, we build
upon the existing val_signbit_known_clear_p predicate.  In the first
chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit
bit set, so the simplification of of ABS (ABS (x)) and ABS (FFS (x))
can itself be simplified.  The second transformation is that we can
canonicalized SIGN_EXTEND to ZERO_EXTEND (as in the PR 71775 case above)
when the operand's sign-bit is known to be clear.  The final two chunks
are for SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a
truncating SUBREG respectively.  The nonzero_bits of a truncating
SUBREG pessimistically thinks that the upper bits may have an
arbitrary value (by taking the SUBREG), so we need look deeper at the
SUBREG's operand to confirm that the high bits are known to be zero.

Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with
default architecture options is undefined at zero, so we can't be sure
the upper bits of reg:DI 88 will be sign extended (all zeros or all ones).
nonzero_bits knows this, so the above transformations don't trigger,
but the transformations themselves are perfectly valid for other
operations such as FFS, POPCOUNT and PARITY, and on other targets/-march
settings where CTZ is defined at zero.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Testing with CSiBE shows these transformations
trigger on several source files (and with -Os reduces the size of the
code).  Ok for mainline?

2022-07-27  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * simplify_rtx.cc (simplify_unary_operation_1) <ABS>: Simplify
        test as both FFS and ABS result in nonzero_bits returning a
        mask that satisfies val_signbit_known_clear_p.
        <SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
        val_signbit_known_clear_p is true of the operand.
        Simplify sign extensions of SUBREG truncations of operands
        that are already suitably (zero) extended.
        <ZERO_EXTEND>: Simplify zero extensions of SUBREG truncations
        of operands that are already suitably zero extended.

Thanks in advance,
Roger
--

[-- Attachment #2: patchxt.txt --]
[-- Type: text/plain, Size: 2701 bytes --]

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index fa20665..e62bf56 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -1366,9 +1366,8 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
 	break;

       /* If operand is something known to be positive, ignore the ABS.  */
-      if (GET_CODE (op) == FFS || GET_CODE (op) == ABS
-	  || val_signbit_known_clear_p (GET_MODE (op),
-					nonzero_bits (op, GET_MODE (op))))
+      if (val_signbit_known_clear_p (GET_MODE (op),
+				     nonzero_bits (op, GET_MODE (op))))
 	return op;

       /* If operand is known to be only -1 or 0, convert ABS to NEG.  */
@@ -1615,6 +1614,24 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
 	    }
 	}

+      /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when
+         we know the sign bit of OP must be clear.  */
+      if (val_signbit_known_clear_p (GET_MODE (op),
+				     nonzero_bits (op, GET_MODE (op))))
+	return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
+
+      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
+      if (GET_CODE (op) == SUBREG
+	  && subreg_lowpart_p (op)
+	  && GET_MODE (SUBREG_REG (op)) == mode
+	  && is_a <scalar_int_mode> (mode, &int_mode)
+	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
+	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
+	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
+	  && (nonzero_bits (SUBREG_REG (op), mode)
+	      & ~(GET_MODE_MASK (op_mode)>>1)) == 0)
+	return SUBREG_REG (op);
+
 #if defined(POINTERS_EXTEND_UNSIGNED)
       /* As we do not know which address space the pointer is referring to,
 	 we can do this only if the target does not support different pointer
@@ -1765,6 +1782,18 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
 				     op0_mode);
 	}

+      /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
+      if (GET_CODE (op) == SUBREG
+	  && subreg_lowpart_p (op)
+	  && GET_MODE (SUBREG_REG (op)) == mode
+	  && is_a <scalar_int_mode> (mode, &int_mode)
+	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
+	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
+	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
+	  && (nonzero_bits (SUBREG_REG (op), mode)
+	      & ~GET_MODE_MASK (op_mode)) == 0)
+	return SUBREG_REG (op);
+
 #if defined(POINTERS_EXTEND_UNSIGNED)
       /* As we do not know which address space the pointer is referring to,
 	 we can do this only if the target does not support different pointer

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Some additional zero-extension related optimizations in simplify-rtx.
  2022-07-27 13:42 [PATCH] Some additional zero-extension related optimizations in simplify-rtx Roger Sayle
@ 2022-07-27 20:23 ` Segher Boessenkool
  2022-07-29  6:57   ` Roger Sayle
  2022-07-29  7:28   ` Roger Sayle
  2022-08-02  9:38 ` Richard Sandiford
  1 sibling, 2 replies; 9+ messages in thread
From: Segher Boessenkool @ 2022-07-27 20:23 UTC (permalink / raw)
  To: Roger Sayle; +Cc: gcc-patches

Hi!

On Wed, Jul 27, 2022 at 02:42:25PM +0100, Roger Sayle wrote:
> This patch implements some additional zero-extension and sign-extension
> related optimizations in simplify-rtx.cc.  The original motivation comes
> from PR rtl-optimization/71775, where in comment #2 Andrew Pinski sees:
> 
> Failed to match this instruction:
> (set (reg:DI 88 [ _1 ])
>     (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
> 
> On many platforms the result of DImode CTZ is constrained to be a
> small unsigned integer (between 0 and 64), hence the truncation to
> 32-bits (using a SUBREG) and the following sign extension back to
> 64-bits are effectively a no-op, so the above should ideally (often)
> be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".

And you can also do that if ctz is undefined for a zero argument!

> To implement this, and some closely related transformations, we build
> upon the existing val_signbit_known_clear_p predicate.  In the first
> chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit
> bit set,

Is that guaranteed in all cases?  Also at -O0, also for args bigger than
64 bits?

> Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with
> default architecture options is undefined at zero, so we can't be sure
> the upper bits of reg:DI 88 will be sign extended (all zeros or all ones).
> nonzero_bits knows this, so the above transformations don't trigger,
> but the transformations themselves are perfectly valid for other
> operations such as FFS, POPCOUNT and PARITY, and on other targets/-march
> settings where CTZ is defined at zero.

It is valid to do whatever you want if the result of CTZ or CLZ is
undefined, so the sign_extend of c[lt]z is a nop.

However if C[LT]Z_DEFINED_VALUE_AT_ZERO is non-zero you have to check
if the returned value (the macro's second arg) survives sign-extending.

>        /* If operand is something known to be positive, ignore the ABS.  */
> -      if (GET_CODE (op) == FFS || GET_CODE (op) == ABS
> -	  || val_signbit_known_clear_p (GET_MODE (op),
> -					nonzero_bits (op, GET_MODE (op))))
> +      if (val_signbit_known_clear_p (GET_MODE (op),
> +				     nonzero_bits (op, GET_MODE (op))))
>  	return op;

I don't think val_signbit_known_clear_p is true in all cases, as I said
above, but we do not care about generated code quality for these cases.
OK.

> +      /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when
> +         we know the sign bit of OP must be clear.  */
> +      if (val_signbit_known_clear_p (GET_MODE (op),
> +				     nonzero_bits (op, GET_MODE (op))))
> +	return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));

OK.

> +      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> +      if (GET_CODE (op) == SUBREG
> +	  && subreg_lowpart_p (op)
> +	  && GET_MODE (SUBREG_REG (op)) == mode
> +	  && is_a <scalar_int_mode> (mode, &int_mode)
> +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
> +	  && (nonzero_bits (SUBREG_REG (op), mode)
> +	      & ~(GET_MODE_MASK (op_mode)>>1)) == 0)

(spaces around >> please)

Please use val_signbit_known_{set,clear}_p?

> +	return SUBREG_REG (op);

Also, this is not correct for C[LT]Z_DEFINED_VALUE_AT_ZERO non-zero if
the value it returns in its second arg does not survive sign extending
unmodified (if it is 0xffffffff for an extend from SI to DI for
exxample).

> +      /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> +      if (GET_CODE (op) == SUBREG
> +	  && subreg_lowpart_p (op)
> +	  && GET_MODE (SUBREG_REG (op)) == mode
> +	  && is_a <scalar_int_mode> (mode, &int_mode)
> +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
> +	  && (nonzero_bits (SUBREG_REG (op), mode)
> +	      & ~GET_MODE_MASK (op_mode)) == 0)
> +	return SUBREG_REG (op);

This has that same problem.


Segher

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH] Some additional zero-extension related optimizations in simplify-rtx.
  2022-07-27 20:23 ` Segher Boessenkool
@ 2022-07-29  6:57   ` Roger Sayle
  2022-08-02 20:07     ` Segher Boessenkool
  2022-07-29  7:28   ` Roger Sayle
  1 sibling, 1 reply; 9+ messages in thread
From: Roger Sayle @ 2022-07-29  6:57 UTC (permalink / raw)
  To: 'Segher Boessenkool'; +Cc: gcc-patches

Hi Segher,

> On Wed, Jul 27, 2022 at 02:42:25PM +0100, Roger Sayle wrote:
> > This patch implements some additional zero-extension and
> > sign-extension related optimizations in simplify-rtx.cc.  The original
> > motivation comes from PR rtl-optimization/71775, where in comment #2
> Andrew Pinski sees:
> >
> > Failed to match this instruction:
> > (set (reg:DI 88 [ _1 ])
> >     (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
> >
> > On many platforms the result of DImode CTZ is constrained to be a
> > small unsigned integer (between 0 and 64), hence the truncation to
> > 32-bits (using a SUBREG) and the following sign extension back to
> > 64-bits are effectively a no-op, so the above should ideally (often)
> > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".
> 
> And you can also do that if ctz is undefined for a zero argument!

Forgive my perhaps poor use of terminology.  The case of ctz 0 on
x64_64 isn't "undefined behaviour" (UB) in the C/C++ sense that
would allow us to do anything, but implementation defined (which
Intel calls "undefined" in their documentation).  Hence, we don't
know which DI value is placed in the result register.  In this case,
truncating to SI mode, then sign extending the result is not a no-op,
as the top bits will/must now all be the same [though admittedly to an
unknown undefined signbit].  Hence the above optimization would 
be invalid, as it doesn't guarantee the result would be sign-extended.

> > To implement this, and some closely related transformations, we build
> > upon the existing val_signbit_known_clear_p predicate.  In the first
> > chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit
> > bit set,
> 
> Is that guaranteed in all cases?  Also at -O0, also for args bigger than
> 64 bits?

val_signbit_known_clear_p should work for any size/precision arg.
I'm not sure if the results are affected by -O0, but even if they are, this
will
not affect correctness only whether these optimizations are performed,
which is precisely what -O0 controls.

> > +      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> > +      if (GET_CODE (op) == SUBREG
> > +	  && subreg_lowpart_p (op)
> > +	  && GET_MODE (SUBREG_REG (op)) == mode
> > +	  && is_a <scalar_int_mode> (mode, &int_mode)
> > +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> > +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> > +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION
> (int_mode)
> > +	  && (nonzero_bits (SUBREG_REG (op), mode)
> > +	      & ~(GET_MODE_MASK (op_mode)>>1)) == 0)
> 
> (spaces around >> please)

Doh! Good catch, thanks.

> Please use val_signbit_known_{set,clear}_p?

Alas, it's not just the SI mode's signbit that we care about, but all of the
bits above it in the DImode operand/result.  These all need to be zero,
for the operand to already be zero-extended/sign_extended.

> > +	return SUBREG_REG (op);
> 
> Also, this is not correct for C[LT]Z_DEFINED_VALUE_AT_ZERO non-zero if the
> value it returns in its second arg does not survive sign extending
unmodified (if it
> is 0xffffffff for an extend from SI to DI for example).

Fortunately, C[LT]Z_DEFINED_VALUE_AT_ZERO being defined to return a negative
result, such as -1 is already handled (accounted for) in nonzero_bits.  The
relevant
code in rtlanal.cc's nonzero_bits1 is:

    case CTZ:
      /* If CTZ has a known value at zero, then the nonzero bits are
         that value, plus the number of bits in the mode minus one.  */
      if (CTZ_DEFINED_VALUE_AT_ZERO (mode, nonzero))
        nonzero
          |= (HOST_WIDE_INT_1U << (floor_log2 (mode_width))) - 1;
      else
        nonzero = -1;
      break;

Hence, any bits set by the constant returned by the target's
DEFINED_VALUE_AT_ZERO will be set in the result of nonzero_bits.
So if this is negative, say -1, then val_signbit_known_clear_p (or the
more complex tests above) will return false.

I'm currently bootstrapping and regression testing the whitespace 
change/correction suggested above.

Thanks,
Roger
--

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH] Some additional zero-extension related optimizations in simplify-rtx.
  2022-07-27 20:23 ` Segher Boessenkool
  2022-07-29  6:57   ` Roger Sayle
@ 2022-07-29  7:28   ` Roger Sayle
  1 sibling, 0 replies; 9+ messages in thread
From: Roger Sayle @ 2022-07-29  7:28 UTC (permalink / raw)
  To: 'Segher Boessenkool'; +Cc: gcc-patches


Hi Segher,
> > > To implement this, and some closely related transformations, we
> > > build upon the existing val_signbit_known_clear_p predicate.  In the
> > > first chunk, nonzero_bits knows that FFS and ABS can't leave the
> > > sign-bit bit set,
> >
> > Is that guaranteed in all cases?  Also at -O0, also for args bigger
> > than 64 bits?
> 
> val_signbit_known_clear_p should work for any size/precision arg.

No, you're right!  Please forgive/excuse me.  Neither val_signbit_p nor
nonzero_bits have yet been updated to use "wide_int", so don't work
for TImode or wider modes.  Doh!

I'm shocked.

Roger
--



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Some additional zero-extension related optimizations in simplify-rtx.
  2022-07-27 13:42 [PATCH] Some additional zero-extension related optimizations in simplify-rtx Roger Sayle
  2022-07-27 20:23 ` Segher Boessenkool
@ 2022-08-02  9:38 ` Richard Sandiford
  2022-08-02 11:55   ` [PATCH take #2] " Roger Sayle
  1 sibling, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2022-08-02  9:38 UTC (permalink / raw)
  To: Roger Sayle; +Cc: gcc-patches, 'Segher Boessenkool'

"Roger Sayle" <roger@nextmovesoftware.com> writes:
> This patch implements some additional zero-extension and sign-extension
> related optimizations in simplify-rtx.cc.  The original motivation comes
> from PR rtl-optimization/71775, where in comment #2 Andrew Pinski sees:
>
> Failed to match this instruction:
> (set (reg:DI 88 [ _1 ])
>     (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
>
> On many platforms the result of DImode CTZ is constrained to be a
> small unsigned integer (between 0 and 64), hence the truncation to
> 32-bits (using a SUBREG) and the following sign extension back to
> 64-bits are effectively a no-op, so the above should ideally (often)
> be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".
>
> To implement this, and some closely related transformations, we build
> upon the existing val_signbit_known_clear_p predicate.  In the first
> chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit
> bit set, so the simplification of of ABS (ABS (x)) and ABS (FFS (x))
> can itself be simplified.

I think I misunderstood, but just in case: RTL ABS is well-defined
for the minimum integer (giving back the minimum integer), so we
can't assume that ABS leaves the sign bit clear.

Thanks,
Richard

> The second transformation is that we can
> canonicalized SIGN_EXTEND to ZERO_EXTEND (as in the PR 71775 case above)
> when the operand's sign-bit is known to be clear.  The final two chunks
> are for SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a
> truncating SUBREG respectively.  The nonzero_bits of a truncating
> SUBREG pessimistically thinks that the upper bits may have an
> arbitrary value (by taking the SUBREG), so we need look deeper at the
> SUBREG's operand to confirm that the high bits are known to be zero.
>
> Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with
> default architecture options is undefined at zero, so we can't be sure
> the upper bits of reg:DI 88 will be sign extended (all zeros or all ones).
> nonzero_bits knows this, so the above transformations don't trigger,
> but the transformations themselves are perfectly valid for other
> operations such as FFS, POPCOUNT and PARITY, and on other targets/-march
> settings where CTZ is defined at zero.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Testing with CSiBE shows these transformations
> trigger on several source files (and with -Os reduces the size of the
> code).  Ok for mainline?
>
>
> 2022-07-27  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * simplify_rtx.cc (simplify_unary_operation_1) <ABS>: Simplify
>         test as both FFS and ABS result in nonzero_bits returning a
>         mask that satisfies val_signbit_known_clear_p.
>         <SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
>         val_signbit_known_clear_p is true of the operand.
>         Simplify sign extensions of SUBREG truncations of operands
>         that are already suitably (zero) extended.
>         <ZERO_EXTEND>: Simplify zero extensions of SUBREG truncations
>         of operands that are already suitably zero extended.
>
>
> Thanks in advance,
> Roger
> --
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index fa20665..e62bf56 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -1366,9 +1366,8 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
>  	break;
>  
>        /* If operand is something known to be positive, ignore the ABS.  */
> -      if (GET_CODE (op) == FFS || GET_CODE (op) == ABS
> -	  || val_signbit_known_clear_p (GET_MODE (op),
> -					nonzero_bits (op, GET_MODE (op))))
> +      if (val_signbit_known_clear_p (GET_MODE (op),
> +				     nonzero_bits (op, GET_MODE (op))))
>  	return op;
>  
>        /* If operand is known to be only -1 or 0, convert ABS to NEG.  */
> @@ -1615,6 +1614,24 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
>  	    }
>  	}
>  
> +      /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when
> +         we know the sign bit of OP must be clear.  */
> +      if (val_signbit_known_clear_p (GET_MODE (op),
> +				     nonzero_bits (op, GET_MODE (op))))
> +	return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
> +
> +      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> +      if (GET_CODE (op) == SUBREG
> +	  && subreg_lowpart_p (op)
> +	  && GET_MODE (SUBREG_REG (op)) == mode
> +	  && is_a <scalar_int_mode> (mode, &int_mode)
> +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
> +	  && (nonzero_bits (SUBREG_REG (op), mode)
> +	      & ~(GET_MODE_MASK (op_mode)>>1)) == 0)
> +	return SUBREG_REG (op);
> +
>  #if defined(POINTERS_EXTEND_UNSIGNED)
>        /* As we do not know which address space the pointer is referring to,
>  	 we can do this only if the target does not support different pointer
> @@ -1765,6 +1782,18 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
>  				     op0_mode);
>  	}
>  
> +      /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> +      if (GET_CODE (op) == SUBREG
> +	  && subreg_lowpart_p (op)
> +	  && GET_MODE (SUBREG_REG (op)) == mode
> +	  && is_a <scalar_int_mode> (mode, &int_mode)
> +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
> +	  && (nonzero_bits (SUBREG_REG (op), mode)
> +	      & ~GET_MODE_MASK (op_mode)) == 0)
> +	return SUBREG_REG (op);
> +
>  #if defined(POINTERS_EXTEND_UNSIGNED)
>        /* As we do not know which address space the pointer is referring to,
>  	 we can do this only if the target does not support different pointer

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH take #2] Some additional zero-extension related optimizations in simplify-rtx.
  2022-08-02  9:38 ` Richard Sandiford
@ 2022-08-02 11:55   ` Roger Sayle
  2022-08-02 13:39     ` Richard Sandiford
  0 siblings, 1 reply; 9+ messages in thread
From: Roger Sayle @ 2022-08-02 11:55 UTC (permalink / raw)
  To: 'GCC Patches'
  Cc: 'Segher Boessenkool', 'Richard Sandiford'

[-- Attachment #1: Type: text/plain, Size: 8375 bytes --]


Many thanks to Segher and Richard for pointing out that my removal
of optimizations of ABS(ABS(x)) and ABS(FFS(x)) in the original version
of this patch was incorrect, and my assumption that these would be 
subsumed by val_signbit_known_clear_p was mistaken.  That the
tests for ABS and FFS looked out of place, was not an indication that
they were not required, but that we were missing simplifications for
the related SS_ABS, PARITY, POPCOUNT, CLRSB, CLZ and CTZ etc.
To make up for this mistake, in this revised patch I've not only restored
the tests for ABS and FFS, but also added the many sibling RTX codes
that I'd also expect to see optimized here, such as ABS(PARITY(x)).

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-08-02  Roger Sayle  <roger@nextmovesoftware.com>
            Segher Boessenkool  <segher@kernel.crashing.org>
            Richard Sandiford  <richard.sandiford@arm.com>

gcc/ChangeLog
        * simplify_rtx.cc (simplify_unary_operation_1) <ABS>: Add
        optimizations for CLRSB, PARITY, POPCOUNT, SS_ABS, CLZ, CTZ
        and LSHIFTRT that are all positive to complement the existing
        FFS and (idempotent) ABS simplifications.
        <SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
        val_signbit_known_clear_p is true of the operand.
        Simplify sign extensions of SUBREG truncations of operands
        that are already suitably (zero) extended.
        <ZERO_EXTEND>: Simplify zero extensions of SUBREG truncations
        of operands that are already suitably zero extended.

Thanks in advance,
Roger
--

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: 02 August 2022 10:39
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: gcc-patches@gcc.gnu.org; 'Segher Boessenkool'
> <segher@kernel.crashing.org>
> Subject: Re: [PATCH] Some additional zero-extension related optimizations
in
> simplify-rtx.
> 
> "Roger Sayle" <roger@nextmovesoftware.com> writes:
> > This patch implements some additional zero-extension and
> > sign-extension related optimizations in simplify-rtx.cc.  The original
> > motivation comes from PR rtl-optimization/71775, where in comment #2
> Andrew Pinski sees:
> >
> > Failed to match this instruction:
> > (set (reg:DI 88 [ _1 ])
> >     (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
> >
> > On many platforms the result of DImode CTZ is constrained to be a
> > small unsigned integer (between 0 and 64), hence the truncation to
> > 32-bits (using a SUBREG) and the following sign extension back to
> > 64-bits are effectively a no-op, so the above should ideally (often)
> > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".
> >
> > To implement this, and some closely related transformations, we build
> > upon the existing val_signbit_known_clear_p predicate.  In the first
> > chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit
> > bit set, so the simplification of of ABS (ABS (x)) and ABS (FFS (x))
> > can itself be simplified.
> 
> I think I misunderstood, but just in case: RTL ABS is well-defined for the
minimum
> integer (giving back the minimum integer), so we can't assume that ABS
leaves
> the sign bit clear.
> 
> Thanks,
> Richard
> 
> > The second transformation is that we can canonicalized SIGN_EXTEND to
> > ZERO_EXTEND (as in the PR 71775 case above) when the operand's
> > sign-bit is known to be clear.  The final two chunks are for
> > SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a truncating
> > SUBREG respectively.  The nonzero_bits of a truncating SUBREG
> > pessimistically thinks that the upper bits may have an arbitrary value
> > (by taking the SUBREG), so we need look deeper at the SUBREG's operand
> > to confirm that the high bits are known to be zero.
> >
> > Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with
> > default architecture options is undefined at zero, so we can't be sure
> > the upper bits of reg:DI 88 will be sign extended (all zeros or all
ones).
> > nonzero_bits knows this, so the above transformations don't trigger,
> > but the transformations themselves are perfectly valid for other
> > operations such as FFS, POPCOUNT and PARITY, and on other
> > targets/-march settings where CTZ is defined at zero.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Testing with CSiBE shows these transformations
> > trigger on several source files (and with -Os reduces the size of the
> > code).  Ok for mainline?
> >
> >
> > 2022-07-27  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         * simplify_rtx.cc (simplify_unary_operation_1) <ABS>: Simplify
> >         test as both FFS and ABS result in nonzero_bits returning a
> >         mask that satisfies val_signbit_known_clear_p.
> >         <SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
> >         val_signbit_known_clear_p is true of the operand.
> >         Simplify sign extensions of SUBREG truncations of operands
> >         that are already suitably (zero) extended.
> >         <ZERO_EXTEND>: Simplify zero extensions of SUBREG truncations
> >         of operands that are already suitably zero extended.
> >
> >
> > Thanks in advance,
> > Roger
> > --
> >
> > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index
> > fa20665..e62bf56 100644
> > --- a/gcc/simplify-rtx.cc
> > +++ b/gcc/simplify-rtx.cc
> > @@ -1366,9 +1366,8 @@ simplify_context::simplify_unary_operation_1
> (rtx_code code, machine_mode mode,
> >  	break;
> >
> >        /* If operand is something known to be positive, ignore the ABS.
*/
> > -      if (GET_CODE (op) == FFS || GET_CODE (op) == ABS
> > -	  || val_signbit_known_clear_p (GET_MODE (op),
> > -					nonzero_bits (op, GET_MODE (op))))
> > +      if (val_signbit_known_clear_p (GET_MODE (op),
> > +				     nonzero_bits (op, GET_MODE (op))))
> >  	return op;
> >
> >        /* If operand is known to be only -1 or 0, convert ABS to NEG.
> > */ @@ -1615,6 +1614,24 @@ simplify_context::simplify_unary_operation_1
> (rtx_code code, machine_mode mode,
> >  	    }
> >  	}
> >
> > +      /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when
> > +         we know the sign bit of OP must be clear.  */
> > +      if (val_signbit_known_clear_p (GET_MODE (op),
> > +				     nonzero_bits (op, GET_MODE (op))))
> > +	return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
> > +
> > +      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> > +      if (GET_CODE (op) == SUBREG
> > +	  && subreg_lowpart_p (op)
> > +	  && GET_MODE (SUBREG_REG (op)) == mode
> > +	  && is_a <scalar_int_mode> (mode, &int_mode)
> > +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> > +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> > +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION
> (int_mode)
> > +	  && (nonzero_bits (SUBREG_REG (op), mode)
> > +	      & ~(GET_MODE_MASK (op_mode)>>1)) == 0)
> > +	return SUBREG_REG (op);
> > +
> >  #if defined(POINTERS_EXTEND_UNSIGNED)
> >        /* As we do not know which address space the pointer is referring
to,
> >  	 we can do this only if the target does not support different
> > pointer @@ -1765,6 +1782,18 @@
> simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode
> mode,
> >  				     op0_mode);
> >  	}
> >
> > +      /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> > +      if (GET_CODE (op) == SUBREG
> > +	  && subreg_lowpart_p (op)
> > +	  && GET_MODE (SUBREG_REG (op)) == mode
> > +	  && is_a <scalar_int_mode> (mode, &int_mode)
> > +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> > +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> > +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION
> (int_mode)
> > +	  && (nonzero_bits (SUBREG_REG (op), mode)
> > +	      & ~GET_MODE_MASK (op_mode)) == 0)
> > +	return SUBREG_REG (op);
> > +
> >  #if defined(POINTERS_EXTEND_UNSIGNED)
> >        /* As we do not know which address space the pointer is referring
to,
> >  	 we can do this only if the target does not support different
> > pointer

[-- Attachment #2: patchxt3.txt --]
[-- Type: text/plain, Size: 4053 bytes --]

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index fa20665..b53272b 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -1366,11 +1366,57 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
 	break;
 
       /* If operand is something known to be positive, ignore the ABS.  */
-      if (GET_CODE (op) == FFS || GET_CODE (op) == ABS
-	  || val_signbit_known_clear_p (GET_MODE (op),
-					nonzero_bits (op, GET_MODE (op))))
+      if (val_signbit_known_clear_p (GET_MODE (op),
+				     nonzero_bits (op, GET_MODE (op))))
 	return op;
 
+      /* Using nonzero_bits doesn't (currently) work for modes wider than
+	 HOST_WIDE_INT, so the following transformations help simplify
+	 ABS for TImode and wider.  */
+      switch (GET_CODE (op))
+	{
+	case ABS:
+	case CLRSB:
+	case FFS:
+	case PARITY:
+	case POPCOUNT:
+	case SS_ABS:
+	  return op;
+
+	case CLZ:
+	  if (is_a <scalar_int_mode> (mode, &int_mode)
+	      && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT)
+	    {
+	      HOST_WIDE_INT val0;
+	      if (CLZ_DEFINED_VALUE_AT_ZERO (int_mode, val0)
+		  && (val0 >> (GET_MODE_PRECISION (int_mode) - 1)) == 0)
+		return op;
+	    }
+	  break;
+
+	case CTZ:
+	  if (is_a <scalar_int_mode> (mode, &int_mode)
+	      && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT)
+	    {
+	      HOST_WIDE_INT val0;
+	      if (CTZ_DEFINED_VALUE_AT_ZERO (int_mode, val0)
+		  && (val0 >> (GET_MODE_PRECISION (int_mode) - 1)) == 0)
+		return op;
+	    }
+	  break;
+
+	case LSHIFTRT:
+	  if (CONST_INT_P (XEXP (op, 1))
+	      && INTVAL (XEXP (op, 1)) > 0
+	      && is_a <scalar_int_mode> (mode, &int_mode)
+	      && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (int_mode))
+	    return op;
+	  break;
+
+	default:
+	  break;
+	}
+
       /* If operand is known to be only -1 or 0, convert ABS to NEG.  */
       if (is_a <scalar_int_mode> (mode, &int_mode)
 	  && (num_sign_bit_copies (op, int_mode)
@@ -1615,6 +1661,24 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
 	    }
 	}
 
+      /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when
+         we know the sign bit of OP must be clear.  */
+      if (val_signbit_known_clear_p (GET_MODE (op),
+				     nonzero_bits (op, GET_MODE (op))))
+	return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
+
+      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
+      if (GET_CODE (op) == SUBREG
+	  && subreg_lowpart_p (op)
+	  && GET_MODE (SUBREG_REG (op)) == mode
+	  && is_a <scalar_int_mode> (mode, &int_mode)
+	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
+	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
+	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
+	  && (nonzero_bits (SUBREG_REG (op), mode)
+	      & ~(GET_MODE_MASK (op_mode) >> 1)) == 0)
+	return SUBREG_REG (op);
+
 #if defined(POINTERS_EXTEND_UNSIGNED)
       /* As we do not know which address space the pointer is referring to,
 	 we can do this only if the target does not support different pointer
@@ -1765,6 +1829,18 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
 				     op0_mode);
 	}
 
+      /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
+      if (GET_CODE (op) == SUBREG
+	  && subreg_lowpart_p (op)
+	  && GET_MODE (SUBREG_REG (op)) == mode
+	  && is_a <scalar_int_mode> (mode, &int_mode)
+	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
+	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
+	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
+	  && (nonzero_bits (SUBREG_REG (op), mode)
+	      & ~GET_MODE_MASK (op_mode)) == 0)
+	return SUBREG_REG (op);
+
 #if defined(POINTERS_EXTEND_UNSIGNED)
       /* As we do not know which address space the pointer is referring to,
 	 we can do this only if the target does not support different pointer

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH take #2] Some additional zero-extension related optimizations in simplify-rtx.
  2022-08-02 11:55   ` [PATCH take #2] " Roger Sayle
@ 2022-08-02 13:39     ` Richard Sandiford
  2022-08-02 16:46       ` Richard Sandiford
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2022-08-02 13:39 UTC (permalink / raw)
  To: Roger Sayle; +Cc: 'GCC Patches', 'Segher Boessenkool'

"Roger Sayle" <roger@nextmovesoftware.com> writes:
> Many thanks to Segher and Richard for pointing out that my removal
> of optimizations of ABS(ABS(x)) and ABS(FFS(x)) in the original version
> of this patch was incorrect, and my assumption that these would be 
> subsumed by val_signbit_known_clear_p was mistaken.  That the
> tests for ABS and FFS looked out of place, was not an indication that
> they were not required, but that we were missing simplifications for
> the related SS_ABS, PARITY, POPCOUNT, CLRSB, CLZ and CTZ etc.
> To make up for this mistake, in this revised patch I've not only restored
> the tests for ABS and FFS, but also added the many sibling RTX codes
> that I'd also expect to see optimized here, such as ABS(PARITY(x)).
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-08-02  Roger Sayle  <roger@nextmovesoftware.com>
>             Segher Boessenkool  <segher@kernel.crashing.org>
>             Richard Sandiford  <richard.sandiford@arm.com>

Thanks, but I didn't actually write anything. :-)

> gcc/ChangeLog
>         * simplify_rtx.cc (simplify_unary_operation_1) <ABS>: Add
>         optimizations for CLRSB, PARITY, POPCOUNT, SS_ABS, CLZ, CTZ
>         and LSHIFTRT that are all positive to complement the existing
>         FFS and (idempotent) ABS simplifications.
>         <SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
>         val_signbit_known_clear_p is true of the operand.
>         Simplify sign extensions of SUBREG truncations of operands
>         that are already suitably (zero) extended.
>         <ZERO_EXTEND>: Simplify zero extensions of SUBREG truncations
>         of operands that are already suitably zero extended.
>
> Thanks in advance,
> Roger
> --
>
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: 02 August 2022 10:39
>> To: Roger Sayle <roger@nextmovesoftware.com>
>> Cc: gcc-patches@gcc.gnu.org; 'Segher Boessenkool'
>> <segher@kernel.crashing.org>
>> Subject: Re: [PATCH] Some additional zero-extension related optimizations
> in
>> simplify-rtx.
>> 
>> "Roger Sayle" <roger@nextmovesoftware.com> writes:
>> > This patch implements some additional zero-extension and
>> > sign-extension related optimizations in simplify-rtx.cc.  The original
>> > motivation comes from PR rtl-optimization/71775, where in comment #2
>> Andrew Pinski sees:
>> >
>> > Failed to match this instruction:
>> > (set (reg:DI 88 [ _1 ])
>> >     (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
>> >
>> > On many platforms the result of DImode CTZ is constrained to be a
>> > small unsigned integer (between 0 and 64), hence the truncation to
>> > 32-bits (using a SUBREG) and the following sign extension back to
>> > 64-bits are effectively a no-op, so the above should ideally (often)
>> > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".
>> >
>> > To implement this, and some closely related transformations, we build
>> > upon the existing val_signbit_known_clear_p predicate.  In the first
>> > chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit
>> > bit set, so the simplification of of ABS (ABS (x)) and ABS (FFS (x))
>> > can itself be simplified.
>> 
>> I think I misunderstood, but just in case: RTL ABS is well-defined for the
> minimum
>> integer (giving back the minimum integer), so we can't assume that ABS
> leaves
>> the sign bit clear.
>> 
>> Thanks,
>> Richard
>> 
>> > The second transformation is that we can canonicalized SIGN_EXTEND to
>> > ZERO_EXTEND (as in the PR 71775 case above) when the operand's
>> > sign-bit is known to be clear.  The final two chunks are for
>> > SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a truncating
>> > SUBREG respectively.  The nonzero_bits of a truncating SUBREG
>> > pessimistically thinks that the upper bits may have an arbitrary value
>> > (by taking the SUBREG), so we need look deeper at the SUBREG's operand
>> > to confirm that the high bits are known to be zero.
>> >
>> > Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with
>> > default architecture options is undefined at zero, so we can't be sure
>> > the upper bits of reg:DI 88 will be sign extended (all zeros or all
> ones).
>> > nonzero_bits knows this, so the above transformations don't trigger,
>> > but the transformations themselves are perfectly valid for other
>> > operations such as FFS, POPCOUNT and PARITY, and on other
>> > targets/-march settings where CTZ is defined at zero.
>> >
>> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>> > and make -k check, both with and without --target_board=unix{-m32},
>> > with no new failures.  Testing with CSiBE shows these transformations
>> > trigger on several source files (and with -Os reduces the size of the
>> > code).  Ok for mainline?
>> >
>> >
>> > 2022-07-27  Roger Sayle  <roger@nextmovesoftware.com>
>> >
>> > gcc/ChangeLog
>> >         * simplify_rtx.cc (simplify_unary_operation_1) <ABS>: Simplify
>> >         test as both FFS and ABS result in nonzero_bits returning a
>> >         mask that satisfies val_signbit_known_clear_p.
>> >         <SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
>> >         val_signbit_known_clear_p is true of the operand.
>> >         Simplify sign extensions of SUBREG truncations of operands
>> >         that are already suitably (zero) extended.
>> >         <ZERO_EXTEND>: Simplify zero extensions of SUBREG truncations
>> >         of operands that are already suitably zero extended.
>> >
>> >
>> > Thanks in advance,
>> > Roger
>> > --
>> >
>> > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index
>> > fa20665..e62bf56 100644
>> > --- a/gcc/simplify-rtx.cc
>> > +++ b/gcc/simplify-rtx.cc
>> > @@ -1366,9 +1366,8 @@ simplify_context::simplify_unary_operation_1
>> (rtx_code code, machine_mode mode,
>> >  	break;
>> >
>> >        /* If operand is something known to be positive, ignore the ABS.
> */
>> > -      if (GET_CODE (op) == FFS || GET_CODE (op) == ABS
>> > -	  || val_signbit_known_clear_p (GET_MODE (op),
>> > -					nonzero_bits (op, GET_MODE (op))))
>> > +      if (val_signbit_known_clear_p (GET_MODE (op),
>> > +				     nonzero_bits (op, GET_MODE (op))))
>> >  	return op;
>> >
>> >        /* If operand is known to be only -1 or 0, convert ABS to NEG.
>> > */ @@ -1615,6 +1614,24 @@ simplify_context::simplify_unary_operation_1
>> (rtx_code code, machine_mode mode,
>> >  	    }
>> >  	}
>> >
>> > +      /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when
>> > +         we know the sign bit of OP must be clear.  */
>> > +      if (val_signbit_known_clear_p (GET_MODE (op),
>> > +				     nonzero_bits (op, GET_MODE (op))))
>> > +	return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
>> > +
>> > +      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
>> > +      if (GET_CODE (op) == SUBREG
>> > +	  && subreg_lowpart_p (op)
>> > +	  && GET_MODE (SUBREG_REG (op)) == mode
>> > +	  && is_a <scalar_int_mode> (mode, &int_mode)
>> > +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
>> > +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
>> > +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION
>> (int_mode)
>> > +	  && (nonzero_bits (SUBREG_REG (op), mode)
>> > +	      & ~(GET_MODE_MASK (op_mode)>>1)) == 0)
>> > +	return SUBREG_REG (op);
>> > +
>> >  #if defined(POINTERS_EXTEND_UNSIGNED)
>> >        /* As we do not know which address space the pointer is referring
> to,
>> >  	 we can do this only if the target does not support different
>> > pointer @@ -1765,6 +1782,18 @@
>> simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode
>> mode,
>> >  				     op0_mode);
>> >  	}
>> >
>> > +      /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
>> > +      if (GET_CODE (op) == SUBREG
>> > +	  && subreg_lowpart_p (op)
>> > +	  && GET_MODE (SUBREG_REG (op)) == mode
>> > +	  && is_a <scalar_int_mode> (mode, &int_mode)
>> > +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
>> > +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
>> > +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION
>> (int_mode)
>> > +	  && (nonzero_bits (SUBREG_REG (op), mode)
>> > +	      & ~GET_MODE_MASK (op_mode)) == 0)
>> > +	return SUBREG_REG (op);
>> > +
>> >  #if defined(POINTERS_EXTEND_UNSIGNED)
>> >        /* As we do not know which address space the pointer is referring
> to,
>> >  	 we can do this only if the target does not support different
>> > pointer
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index fa20665..b53272b 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -1366,11 +1366,57 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
>  	break;
>  
>        /* If operand is something known to be positive, ignore the ABS.  */
> -      if (GET_CODE (op) == FFS || GET_CODE (op) == ABS
> -	  || val_signbit_known_clear_p (GET_MODE (op),
> -					nonzero_bits (op, GET_MODE (op))))
> +      if (val_signbit_known_clear_p (GET_MODE (op),
> +				     nonzero_bits (op, GET_MODE (op))))
>  	return op;
>  
> +      /* Using nonzero_bits doesn't (currently) work for modes wider than
> +	 HOST_WIDE_INT, so the following transformations help simplify
> +	 ABS for TImode and wider.  */
> +      switch (GET_CODE (op))
> +	{
> +	case ABS:
> +	case CLRSB:
> +	case FFS:
> +	case PARITY:
> +	case POPCOUNT:
> +	case SS_ABS:
> +	  return op;
> +
> +	case CLZ:
> +	  if (is_a <scalar_int_mode> (mode, &int_mode)
> +	      && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT)
> +	    {
> +	      HOST_WIDE_INT val0;
> +	      if (CLZ_DEFINED_VALUE_AT_ZERO (int_mode, val0)
> +		  && (val0 >> (GET_MODE_PRECISION (int_mode) - 1)) == 0)
> +		return op;

Shifts right of negative numbers are implementation-defined,
so I guess this should be unsigned HOST_WIDE_INT instead.
(Or use (val0 & (GET_MODE_MASK (int_mode) >> 1)) == 0,
like you do below.)

Same for CTZ.

OK with that change, thanks.

Richard

> +	    }
> +	  break;
> +
> +	case CTZ:
> +	  if (is_a <scalar_int_mode> (mode, &int_mode)
> +	      && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT)
> +	    {
> +	      HOST_WIDE_INT val0;
> +	      if (CTZ_DEFINED_VALUE_AT_ZERO (int_mode, val0)
> +		  && (val0 >> (GET_MODE_PRECISION (int_mode) - 1)) == 0)
> +		return op;
> +	    }
> +	  break;
> +
> +	case LSHIFTRT:
> +	  if (CONST_INT_P (XEXP (op, 1))
> +	      && INTVAL (XEXP (op, 1)) > 0
> +	      && is_a <scalar_int_mode> (mode, &int_mode)
> +	      && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (int_mode))
> +	    return op;
> +	  break;
> +
> +	default:
> +	  break;
> +	}
> +
>        /* If operand is known to be only -1 or 0, convert ABS to NEG.  */
>        if (is_a <scalar_int_mode> (mode, &int_mode)
>  	  && (num_sign_bit_copies (op, int_mode)
> @@ -1615,6 +1661,24 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
>  	    }
>  	}
>  
> +      /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when
> +         we know the sign bit of OP must be clear.  */
> +      if (val_signbit_known_clear_p (GET_MODE (op),
> +				     nonzero_bits (op, GET_MODE (op))))
> +	return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
> +
> +      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> +      if (GET_CODE (op) == SUBREG
> +	  && subreg_lowpart_p (op)
> +	  && GET_MODE (SUBREG_REG (op)) == mode
> +	  && is_a <scalar_int_mode> (mode, &int_mode)
> +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
> +	  && (nonzero_bits (SUBREG_REG (op), mode)
> +	      & ~(GET_MODE_MASK (op_mode) >> 1)) == 0)
> +	return SUBREG_REG (op);
> +
>  #if defined(POINTERS_EXTEND_UNSIGNED)
>        /* As we do not know which address space the pointer is referring to,
>  	 we can do this only if the target does not support different pointer
> @@ -1765,6 +1829,18 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
>  				     op0_mode);
>  	}
>  
> +      /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> +      if (GET_CODE (op) == SUBREG
> +	  && subreg_lowpart_p (op)
> +	  && GET_MODE (SUBREG_REG (op)) == mode
> +	  && is_a <scalar_int_mode> (mode, &int_mode)
> +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
> +	  && (nonzero_bits (SUBREG_REG (op), mode)
> +	      & ~GET_MODE_MASK (op_mode)) == 0)
> +	return SUBREG_REG (op);
> +
>  #if defined(POINTERS_EXTEND_UNSIGNED)
>        /* As we do not know which address space the pointer is referring to,
>  	 we can do this only if the target does not support different pointer

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH take #2] Some additional zero-extension related optimizations in simplify-rtx.
  2022-08-02 13:39     ` Richard Sandiford
@ 2022-08-02 16:46       ` Richard Sandiford
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Sandiford @ 2022-08-02 16:46 UTC (permalink / raw)
  To: Roger Sayle; +Cc: 'GCC Patches', 'Segher Boessenkool'

Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> "Roger Sayle" <roger@nextmovesoftware.com> writes:
>> Many thanks to Segher and Richard for pointing out that my removal
>> of optimizations of ABS(ABS(x)) and ABS(FFS(x)) in the original version
>> of this patch was incorrect, and my assumption that these would be 
>> subsumed by val_signbit_known_clear_p was mistaken.  That the
>> tests for ABS and FFS looked out of place, was not an indication that
>> they were not required, but that we were missing simplifications for
>> the related SS_ABS, PARITY, POPCOUNT, CLRSB, CLZ and CTZ etc.
>> To make up for this mistake, in this revised patch I've not only restored
>> the tests for ABS and FFS, but also added the many sibling RTX codes
>> that I'd also expect to see optimized here, such as ABS(PARITY(x)).
>>
>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>> and make -k check, both with and without --target_board=unix{-m32},
>> with no new failures.  Ok for mainline?
>>
>>
>> 2022-08-02  Roger Sayle  <roger@nextmovesoftware.com>
>>             Segher Boessenkool  <segher@kernel.crashing.org>
>>             Richard Sandiford  <richard.sandiford@arm.com>
>
> Thanks, but I didn't actually write anything. :-)
>
>> gcc/ChangeLog
>>         * simplify_rtx.cc (simplify_unary_operation_1) <ABS>: Add
>>         optimizations for CLRSB, PARITY, POPCOUNT, SS_ABS, CLZ, CTZ
>>         and LSHIFTRT that are all positive to complement the existing
>>         FFS and (idempotent) ABS simplifications.
>>         <SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
>>         val_signbit_known_clear_p is true of the operand.
>>         Simplify sign extensions of SUBREG truncations of operands
>>         that are already suitably (zero) extended.
>>         <ZERO_EXTEND>: Simplify zero extensions of SUBREG truncations
>>         of operands that are already suitably zero extended.
>>
>> Thanks in advance,
>> Roger
>> --
>>
>>> -----Original Message-----
>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>> Sent: 02 August 2022 10:39
>>> To: Roger Sayle <roger@nextmovesoftware.com>
>>> Cc: gcc-patches@gcc.gnu.org; 'Segher Boessenkool'
>>> <segher@kernel.crashing.org>
>>> Subject: Re: [PATCH] Some additional zero-extension related optimizations
>> in
>>> simplify-rtx.
>>> 
>>> "Roger Sayle" <roger@nextmovesoftware.com> writes:
>>> > This patch implements some additional zero-extension and
>>> > sign-extension related optimizations in simplify-rtx.cc.  The original
>>> > motivation comes from PR rtl-optimization/71775, where in comment #2
>>> Andrew Pinski sees:
>>> >
>>> > Failed to match this instruction:
>>> > (set (reg:DI 88 [ _1 ])
>>> >     (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
>>> >
>>> > On many platforms the result of DImode CTZ is constrained to be a
>>> > small unsigned integer (between 0 and 64), hence the truncation to
>>> > 32-bits (using a SUBREG) and the following sign extension back to
>>> > 64-bits are effectively a no-op, so the above should ideally (often)
>>> > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".
>>> >
>>> > To implement this, and some closely related transformations, we build
>>> > upon the existing val_signbit_known_clear_p predicate.  In the first
>>> > chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit
>>> > bit set, so the simplification of of ABS (ABS (x)) and ABS (FFS (x))
>>> > can itself be simplified.
>>> 
>>> I think I misunderstood, but just in case: RTL ABS is well-defined for the
>> minimum
>>> integer (giving back the minimum integer), so we can't assume that ABS
>> leaves
>>> the sign bit clear.
>>> 
>>> Thanks,
>>> Richard
>>> 
>>> > The second transformation is that we can canonicalized SIGN_EXTEND to
>>> > ZERO_EXTEND (as in the PR 71775 case above) when the operand's
>>> > sign-bit is known to be clear.  The final two chunks are for
>>> > SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a truncating
>>> > SUBREG respectively.  The nonzero_bits of a truncating SUBREG
>>> > pessimistically thinks that the upper bits may have an arbitrary value
>>> > (by taking the SUBREG), so we need look deeper at the SUBREG's operand
>>> > to confirm that the high bits are known to be zero.
>>> >
>>> > Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with
>>> > default architecture options is undefined at zero, so we can't be sure
>>> > the upper bits of reg:DI 88 will be sign extended (all zeros or all
>> ones).
>>> > nonzero_bits knows this, so the above transformations don't trigger,
>>> > but the transformations themselves are perfectly valid for other
>>> > operations such as FFS, POPCOUNT and PARITY, and on other
>>> > targets/-march settings where CTZ is defined at zero.
>>> >
>>> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>>> > and make -k check, both with and without --target_board=unix{-m32},
>>> > with no new failures.  Testing with CSiBE shows these transformations
>>> > trigger on several source files (and with -Os reduces the size of the
>>> > code).  Ok for mainline?
>>> >
>>> >
>>> > 2022-07-27  Roger Sayle  <roger@nextmovesoftware.com>
>>> >
>>> > gcc/ChangeLog
>>> >         * simplify_rtx.cc (simplify_unary_operation_1) <ABS>: Simplify
>>> >         test as both FFS and ABS result in nonzero_bits returning a
>>> >         mask that satisfies val_signbit_known_clear_p.
>>> >         <SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
>>> >         val_signbit_known_clear_p is true of the operand.
>>> >         Simplify sign extensions of SUBREG truncations of operands
>>> >         that are already suitably (zero) extended.
>>> >         <ZERO_EXTEND>: Simplify zero extensions of SUBREG truncations
>>> >         of operands that are already suitably zero extended.
>>> >
>>> >
>>> > Thanks in advance,
>>> > Roger
>>> > --
>>> >
>>> > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index
>>> > fa20665..e62bf56 100644
>>> > --- a/gcc/simplify-rtx.cc
>>> > +++ b/gcc/simplify-rtx.cc
>>> > @@ -1366,9 +1366,8 @@ simplify_context::simplify_unary_operation_1
>>> (rtx_code code, machine_mode mode,
>>> >  	break;
>>> >
>>> >        /* If operand is something known to be positive, ignore the ABS.
>> */
>>> > -      if (GET_CODE (op) == FFS || GET_CODE (op) == ABS
>>> > -	  || val_signbit_known_clear_p (GET_MODE (op),
>>> > -					nonzero_bits (op, GET_MODE (op))))
>>> > +      if (val_signbit_known_clear_p (GET_MODE (op),
>>> > +				     nonzero_bits (op, GET_MODE (op))))
>>> >  	return op;
>>> >
>>> >        /* If operand is known to be only -1 or 0, convert ABS to NEG.
>>> > */ @@ -1615,6 +1614,24 @@ simplify_context::simplify_unary_operation_1
>>> (rtx_code code, machine_mode mode,
>>> >  	    }
>>> >  	}
>>> >
>>> > +      /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when
>>> > +         we know the sign bit of OP must be clear.  */
>>> > +      if (val_signbit_known_clear_p (GET_MODE (op),
>>> > +				     nonzero_bits (op, GET_MODE (op))))
>>> > +	return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
>>> > +
>>> > +      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
>>> > +      if (GET_CODE (op) == SUBREG
>>> > +	  && subreg_lowpart_p (op)
>>> > +	  && GET_MODE (SUBREG_REG (op)) == mode
>>> > +	  && is_a <scalar_int_mode> (mode, &int_mode)
>>> > +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
>>> > +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
>>> > +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION
>>> (int_mode)
>>> > +	  && (nonzero_bits (SUBREG_REG (op), mode)
>>> > +	      & ~(GET_MODE_MASK (op_mode)>>1)) == 0)
>>> > +	return SUBREG_REG (op);
>>> > +
>>> >  #if defined(POINTERS_EXTEND_UNSIGNED)
>>> >        /* As we do not know which address space the pointer is referring
>> to,
>>> >  	 we can do this only if the target does not support different
>>> > pointer @@ -1765,6 +1782,18 @@
>>> simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode
>>> mode,
>>> >  				     op0_mode);
>>> >  	}
>>> >
>>> > +      /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
>>> > +      if (GET_CODE (op) == SUBREG
>>> > +	  && subreg_lowpart_p (op)
>>> > +	  && GET_MODE (SUBREG_REG (op)) == mode
>>> > +	  && is_a <scalar_int_mode> (mode, &int_mode)
>>> > +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
>>> > +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
>>> > +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION
>>> (int_mode)
>>> > +	  && (nonzero_bits (SUBREG_REG (op), mode)
>>> > +	      & ~GET_MODE_MASK (op_mode)) == 0)
>>> > +	return SUBREG_REG (op);
>>> > +
>>> >  #if defined(POINTERS_EXTEND_UNSIGNED)
>>> >        /* As we do not know which address space the pointer is referring
>> to,
>>> >  	 we can do this only if the target does not support different
>>> > pointer
>>
>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>> index fa20665..b53272b 100644
>> --- a/gcc/simplify-rtx.cc
>> +++ b/gcc/simplify-rtx.cc
>> @@ -1366,11 +1366,57 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
>>  	break;
>>  
>>        /* If operand is something known to be positive, ignore the ABS.  */
>> -      if (GET_CODE (op) == FFS || GET_CODE (op) == ABS
>> -	  || val_signbit_known_clear_p (GET_MODE (op),
>> -					nonzero_bits (op, GET_MODE (op))))
>> +      if (val_signbit_known_clear_p (GET_MODE (op),
>> +				     nonzero_bits (op, GET_MODE (op))))
>>  	return op;
>>  
>> +      /* Using nonzero_bits doesn't (currently) work for modes wider than
>> +	 HOST_WIDE_INT, so the following transformations help simplify
>> +	 ABS for TImode and wider.  */
>> +      switch (GET_CODE (op))
>> +	{
>> +	case ABS:
>> +	case CLRSB:
>> +	case FFS:
>> +	case PARITY:
>> +	case POPCOUNT:
>> +	case SS_ABS:
>> +	  return op;
>> +
>> +	case CLZ:
>> +	  if (is_a <scalar_int_mode> (mode, &int_mode)
>> +	      && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT)
>> +	    {
>> +	      HOST_WIDE_INT val0;
>> +	      if (CLZ_DEFINED_VALUE_AT_ZERO (int_mode, val0)
>> +		  && (val0 >> (GET_MODE_PRECISION (int_mode) - 1)) == 0)
>> +		return op;
>
> Shifts right of negative numbers are implementation-defined,
> so I guess this should be unsigned HOST_WIDE_INT instead.
> (Or use (val0 & (GET_MODE_MASK (int_mode) >> 1)) == 0,

Oops, missing ~ :)

> like you do below.)
>
> Same for CTZ.
>
> OK with that change, thanks.
>
> Richard
>
>> +	    }
>> +	  break;
>> +
>> +	case CTZ:
>> +	  if (is_a <scalar_int_mode> (mode, &int_mode)
>> +	      && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT)
>> +	    {
>> +	      HOST_WIDE_INT val0;
>> +	      if (CTZ_DEFINED_VALUE_AT_ZERO (int_mode, val0)
>> +		  && (val0 >> (GET_MODE_PRECISION (int_mode) - 1)) == 0)
>> +		return op;
>> +	    }
>> +	  break;
>> +
>> +	case LSHIFTRT:
>> +	  if (CONST_INT_P (XEXP (op, 1))
>> +	      && INTVAL (XEXP (op, 1)) > 0
>> +	      && is_a <scalar_int_mode> (mode, &int_mode)
>> +	      && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (int_mode))
>> +	    return op;
>> +	  break;
>> +
>> +	default:
>> +	  break;
>> +	}
>> +
>>        /* If operand is known to be only -1 or 0, convert ABS to NEG.  */
>>        if (is_a <scalar_int_mode> (mode, &int_mode)
>>  	  && (num_sign_bit_copies (op, int_mode)
>> @@ -1615,6 +1661,24 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
>>  	    }
>>  	}
>>  
>> +      /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when
>> +         we know the sign bit of OP must be clear.  */
>> +      if (val_signbit_known_clear_p (GET_MODE (op),
>> +				     nonzero_bits (op, GET_MODE (op))))
>> +	return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
>> +
>> +      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
>> +      if (GET_CODE (op) == SUBREG
>> +	  && subreg_lowpart_p (op)
>> +	  && GET_MODE (SUBREG_REG (op)) == mode
>> +	  && is_a <scalar_int_mode> (mode, &int_mode)
>> +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
>> +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
>> +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
>> +	  && (nonzero_bits (SUBREG_REG (op), mode)
>> +	      & ~(GET_MODE_MASK (op_mode) >> 1)) == 0)
>> +	return SUBREG_REG (op);
>> +
>>  #if defined(POINTERS_EXTEND_UNSIGNED)
>>        /* As we do not know which address space the pointer is referring to,
>>  	 we can do this only if the target does not support different pointer
>> @@ -1765,6 +1829,18 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
>>  				     op0_mode);
>>  	}
>>  
>> +      /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
>> +      if (GET_CODE (op) == SUBREG
>> +	  && subreg_lowpart_p (op)
>> +	  && GET_MODE (SUBREG_REG (op)) == mode
>> +	  && is_a <scalar_int_mode> (mode, &int_mode)
>> +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
>> +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
>> +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
>> +	  && (nonzero_bits (SUBREG_REG (op), mode)
>> +	      & ~GET_MODE_MASK (op_mode)) == 0)
>> +	return SUBREG_REG (op);
>> +
>>  #if defined(POINTERS_EXTEND_UNSIGNED)
>>        /* As we do not know which address space the pointer is referring to,
>>  	 we can do this only if the target does not support different pointer

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Some additional zero-extension related optimizations in simplify-rtx.
  2022-07-29  6:57   ` Roger Sayle
@ 2022-08-02 20:07     ` Segher Boessenkool
  0 siblings, 0 replies; 9+ messages in thread
From: Segher Boessenkool @ 2022-08-02 20:07 UTC (permalink / raw)
  To: Roger Sayle; +Cc: gcc-patches

Hi!

On Fri, Jul 29, 2022 at 07:57:51AM +0100, Roger Sayle wrote:
> > On Wed, Jul 27, 2022 at 02:42:25PM +0100, Roger Sayle wrote:
> > > This patch implements some additional zero-extension and
> > > sign-extension related optimizations in simplify-rtx.cc.  The original
> > > motivation comes from PR rtl-optimization/71775, where in comment #2
> > Andrew Pinski sees:
> > >
> > > Failed to match this instruction:
> > > (set (reg:DI 88 [ _1 ])
> > >     (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
> > >
> > > On many platforms the result of DImode CTZ is constrained to be a
> > > small unsigned integer (between 0 and 64), hence the truncation to
> > > 32-bits (using a SUBREG) and the following sign extension back to
> > > 64-bits are effectively a no-op, so the above should ideally (often)
> > > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".
> > 
> > And you can also do that if ctz is undefined for a zero argument!
> 
> Forgive my perhaps poor use of terminology.  The case of ctz 0 on
> x64_64 isn't "undefined behaviour" (UB) in the C/C++ sense that
> would allow us to do anything, but implementation defined (which
> Intel calls "undefined" in their documentation).

This is about CTZ in RTL, in GCC.  CTZ_DEFINED_VALUE_AT_ZERO is 0 here,
which means a zero argument gives an undefined result.

> Hence, we don't
> know which DI value is placed in the result register.  In this case,
> truncating to SI mode, then sign extending the result is not a no-op,
> as the top bits will/must now all be the same [though admittedly to an
> unknown undefined signbit].

And any value is valid.

> Hence the above optimization would 
> be invalid, as it doesn't guarantee the result would be sign-extended.

It does not have to be!  Truncating an undefined DImode value to SIMode
gives an undefined SImode value.  On most architectures (including x86
afaik) you do not need to do any machine insn for that (the top 32 bits
in the register are just ignored for a SImode value).

> > Also, this is not correct for C[LT]Z_DEFINED_VALUE_AT_ZERO non-zero if the
> > value it returns in its second arg does not survive sign extending
> unmodified (if it
> > is 0xffffffff for an extend from SI to DI for example).
> 
> Fortunately, C[LT]Z_DEFINED_VALUE_AT_ZERO being defined to return a negative
> result, such as -1 is already handled (accounted for) in nonzero_bits.  The
> relevant
> code in rtlanal.cc's nonzero_bits1 is:

A negative result, yes.  But that was not my example.


Segher

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-08-02 20:08 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-27 13:42 [PATCH] Some additional zero-extension related optimizations in simplify-rtx Roger Sayle
2022-07-27 20:23 ` Segher Boessenkool
2022-07-29  6:57   ` Roger Sayle
2022-08-02 20:07     ` Segher Boessenkool
2022-07-29  7:28   ` Roger Sayle
2022-08-02  9:38 ` Richard Sandiford
2022-08-02 11:55   ` [PATCH take #2] " Roger Sayle
2022-08-02 13:39     ` Richard Sandiford
2022-08-02 16:46       ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).