From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 935003858012
 for <gcc-patches@gcc.gnu.org>; Tue,  2 Aug 2022 16:46:06 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 935003858012
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id EB29F1477;
 Tue,  2 Aug 2022 09:46:06 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.37])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id AA6453F70D;
 Tue,  2 Aug 2022 09:46:05 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: "Roger Sayle" <roger@nextmovesoftware.com>
Mail-Followup-To: "Roger Sayle" <roger@nextmovesoftware.com>,
 "'GCC Patches'" <gcc-patches@gcc.gnu.org>,
 "'Segher Boessenkool'" <segher@kernel.crashing.org>, richard.sandiford@arm.com
Cc: "'GCC Patches'" <gcc-patches@gcc.gnu.org>,
 "'Segher Boessenkool'" <segher@kernel.crashing.org>
Subject: Re: [PATCH take #2] Some additional zero-extension related
 optimizations in simplify-rtx.
References: <009501d8a1be$b6199e20$224cda60$@nextmovesoftware.com>
 <mpto7x3q928.fsf@arm.com>
 <01c401d8a666$d48c9e50$7da5daf0$@nextmovesoftware.com>
 <mptv8rapxx3.fsf@arm.com>
Date: Tue, 02 Aug 2022 17:46:04 +0100
In-Reply-To: <mptv8rapxx3.fsf@arm.com> (Richard Sandiford via Gcc-patches's
 message of "Tue, 02 Aug 2022 14:39:20 +0100")
Message-ID: <mptpmhipp9v.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-52.8 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, SPF_HELO_NONE,
 SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 02 Aug 2022 16:46:09 -0000

Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> "Roger Sayle" <roger@nextmovesoftware.com> writes:
>> Many thanks to Segher and Richard for pointing out that my removal
>> of optimizations of ABS(ABS(x)) and ABS(FFS(x)) in the original version
>> of this patch was incorrect, and my assumption that these would be 
>> subsumed by val_signbit_known_clear_p was mistaken.  That the
>> tests for ABS and FFS looked out of place, was not an indication that
>> they were not required, but that we were missing simplifications for
>> the related SS_ABS, PARITY, POPCOUNT, CLRSB, CLZ and CTZ etc.
>> To make up for this mistake, in this revised patch I've not only restored
>> the tests for ABS and FFS, but also added the many sibling RTX codes
>> that I'd also expect to see optimized here, such as ABS(PARITY(x)).
>>
>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>> and make -k check, both with and without --target_board=unix{-m32},
>> with no new failures.  Ok for mainline?
>>
>>
>> 2022-08-02  Roger Sayle  <roger@nextmovesoftware.com>
>>             Segher Boessenkool  <segher@kernel.crashing.org>
>>             Richard Sandiford  <richard.sandiford@arm.com>
>
> Thanks, but I didn't actually write anything. :-)
>
>> gcc/ChangeLog
>>         * simplify_rtx.cc (simplify_unary_operation_1) <ABS>: Add
>>         optimizations for CLRSB, PARITY, POPCOUNT, SS_ABS, CLZ, CTZ
>>         and LSHIFTRT that are all positive to complement the existing
>>         FFS and (idempotent) ABS simplifications.
>>         <SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
>>         val_signbit_known_clear_p is true of the operand.
>>         Simplify sign extensions of SUBREG truncations of operands
>>         that are already suitably (zero) extended.
>>         <ZERO_EXTEND>: Simplify zero extensions of SUBREG truncations
>>         of operands that are already suitably zero extended.
>>
>> Thanks in advance,
>> Roger
>> --
>>
>>> -----Original Message-----
>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>> Sent: 02 August 2022 10:39
>>> To: Roger Sayle <roger@nextmovesoftware.com>
>>> Cc: gcc-patches@gcc.gnu.org; 'Segher Boessenkool'
>>> <segher@kernel.crashing.org>
>>> Subject: Re: [PATCH] Some additional zero-extension related optimizations
>> in
>>> simplify-rtx.
>>> 
>>> "Roger Sayle" <roger@nextmovesoftware.com> writes:
>>> > This patch implements some additional zero-extension and
>>> > sign-extension related optimizations in simplify-rtx.cc.  The original
>>> > motivation comes from PR rtl-optimization/71775, where in comment #2
>>> Andrew Pinski sees:
>>> >
>>> > Failed to match this instruction:
>>> > (set (reg:DI 88 [ _1 ])
>>> >     (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
>>> >
>>> > On many platforms the result of DImode CTZ is constrained to be a
>>> > small unsigned integer (between 0 and 64), hence the truncation to
>>> > 32-bits (using a SUBREG) and the following sign extension back to
>>> > 64-bits are effectively a no-op, so the above should ideally (often)
>>> > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".
>>> >
>>> > To implement this, and some closely related transformations, we build
>>> > upon the existing val_signbit_known_clear_p predicate.  In the first
>>> > chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit
>>> > bit set, so the simplification of of ABS (ABS (x)) and ABS (FFS (x))
>>> > can itself be simplified.
>>> 
>>> I think I misunderstood, but just in case: RTL ABS is well-defined for the
>> minimum
>>> integer (giving back the minimum integer), so we can't assume that ABS
>> leaves
>>> the sign bit clear.
>>> 
>>> Thanks,
>>> Richard
>>> 
>>> > The second transformation is that we can canonicalized SIGN_EXTEND to
>>> > ZERO_EXTEND (as in the PR 71775 case above) when the operand's
>>> > sign-bit is known to be clear.  The final two chunks are for
>>> > SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a truncating
>>> > SUBREG respectively.  The nonzero_bits of a truncating SUBREG
>>> > pessimistically thinks that the upper bits may have an arbitrary value
>>> > (by taking the SUBREG), so we need look deeper at the SUBREG's operand
>>> > to confirm that the high bits are known to be zero.
>>> >
>>> > Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with
>>> > default architecture options is undefined at zero, so we can't be sure
>>> > the upper bits of reg:DI 88 will be sign extended (all zeros or all
>> ones).
>>> > nonzero_bits knows this, so the above transformations don't trigger,
>>> > but the transformations themselves are perfectly valid for other
>>> > operations such as FFS, POPCOUNT and PARITY, and on other
>>> > targets/-march settings where CTZ is defined at zero.
>>> >
>>> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>>> > and make -k check, both with and without --target_board=unix{-m32},
>>> > with no new failures.  Testing with CSiBE shows these transformations
>>> > trigger on several source files (and with -Os reduces the size of the
>>> > code).  Ok for mainline?
>>> >
>>> >
>>> > 2022-07-27  Roger Sayle  <roger@nextmovesoftware.com>
>>> >
>>> > gcc/ChangeLog
>>> >         * simplify_rtx.cc (simplify_unary_operation_1) <ABS>: Simplify
>>> >         test as both FFS and ABS result in nonzero_bits returning a
>>> >         mask that satisfies val_signbit_known_clear_p.
>>> >         <SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
>>> >         val_signbit_known_clear_p is true of the operand.
>>> >         Simplify sign extensions of SUBREG truncations of operands
>>> >         that are already suitably (zero) extended.
>>> >         <ZERO_EXTEND>: Simplify zero extensions of SUBREG truncations
>>> >         of operands that are already suitably zero extended.
>>> >
>>> >
>>> > Thanks in advance,
>>> > Roger
>>> > --
>>> >
>>> > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index
>>> > fa20665..e62bf56 100644
>>> > --- a/gcc/simplify-rtx.cc
>>> > +++ b/gcc/simplify-rtx.cc
>>> > @@ -1366,9 +1366,8 @@ simplify_context::simplify_unary_operation_1
>>> (rtx_code code, machine_mode mode,
>>> >  	break;
>>> >
>>> >        /* If operand is something known to be positive, ignore the ABS.
>> */
>>> > -      if (GET_CODE (op) == FFS || GET_CODE (op) == ABS
>>> > -	  || val_signbit_known_clear_p (GET_MODE (op),
>>> > -					nonzero_bits (op, GET_MODE (op))))
>>> > +      if (val_signbit_known_clear_p (GET_MODE (op),
>>> > +				     nonzero_bits (op, GET_MODE (op))))
>>> >  	return op;
>>> >
>>> >        /* If operand is known to be only -1 or 0, convert ABS to NEG.
>>> > */ @@ -1615,6 +1614,24 @@ simplify_context::simplify_unary_operation_1
>>> (rtx_code code, machine_mode mode,
>>> >  	    }
>>> >  	}
>>> >
>>> > +      /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when
>>> > +         we know the sign bit of OP must be clear.  */
>>> > +      if (val_signbit_known_clear_p (GET_MODE (op),
>>> > +				     nonzero_bits (op, GET_MODE (op))))
>>> > +	return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
>>> > +
>>> > +      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
>>> > +      if (GET_CODE (op) == SUBREG
>>> > +	  && subreg_lowpart_p (op)
>>> > +	  && GET_MODE (SUBREG_REG (op)) == mode
>>> > +	  && is_a <scalar_int_mode> (mode, &int_mode)
>>> > +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
>>> > +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
>>> > +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION
>>> (int_mode)
>>> > +	  && (nonzero_bits (SUBREG_REG (op), mode)
>>> > +	      & ~(GET_MODE_MASK (op_mode)>>1)) == 0)
>>> > +	return SUBREG_REG (op);
>>> > +
>>> >  #if defined(POINTERS_EXTEND_UNSIGNED)
>>> >        /* As we do not know which address space the pointer is referring
>> to,
>>> >  	 we can do this only if the target does not support different
>>> > pointer @@ -1765,6 +1782,18 @@
>>> simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode
>>> mode,
>>> >  				     op0_mode);
>>> >  	}
>>> >
>>> > +      /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
>>> > +      if (GET_CODE (op) == SUBREG
>>> > +	  && subreg_lowpart_p (op)
>>> > +	  && GET_MODE (SUBREG_REG (op)) == mode
>>> > +	  && is_a <scalar_int_mode> (mode, &int_mode)
>>> > +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
>>> > +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
>>> > +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION
>>> (int_mode)
>>> > +	  && (nonzero_bits (SUBREG_REG (op), mode)
>>> > +	      & ~GET_MODE_MASK (op_mode)) == 0)
>>> > +	return SUBREG_REG (op);
>>> > +
>>> >  #if defined(POINTERS_EXTEND_UNSIGNED)
>>> >        /* As we do not know which address space the pointer is referring
>> to,
>>> >  	 we can do this only if the target does not support different
>>> > pointer
>>
>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>> index fa20665..b53272b 100644
>> --- a/gcc/simplify-rtx.cc
>> +++ b/gcc/simplify-rtx.cc
>> @@ -1366,11 +1366,57 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
>>  	break;
>>  
>>        /* If operand is something known to be positive, ignore the ABS.  */
>> -      if (GET_CODE (op) == FFS || GET_CODE (op) == ABS
>> -	  || val_signbit_known_clear_p (GET_MODE (op),
>> -					nonzero_bits (op, GET_MODE (op))))
>> +      if (val_signbit_known_clear_p (GET_MODE (op),
>> +				     nonzero_bits (op, GET_MODE (op))))
>>  	return op;
>>  
>> +      /* Using nonzero_bits doesn't (currently) work for modes wider than
>> +	 HOST_WIDE_INT, so the following transformations help simplify
>> +	 ABS for TImode and wider.  */
>> +      switch (GET_CODE (op))
>> +	{
>> +	case ABS:
>> +	case CLRSB:
>> +	case FFS:
>> +	case PARITY:
>> +	case POPCOUNT:
>> +	case SS_ABS:
>> +	  return op;
>> +
>> +	case CLZ:
>> +	  if (is_a <scalar_int_mode> (mode, &int_mode)
>> +	      && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT)
>> +	    {
>> +	      HOST_WIDE_INT val0;
>> +	      if (CLZ_DEFINED_VALUE_AT_ZERO (int_mode, val0)
>> +		  && (val0 >> (GET_MODE_PRECISION (int_mode) - 1)) == 0)
>> +		return op;
>
> Shifts right of negative numbers are implementation-defined,
> so I guess this should be unsigned HOST_WIDE_INT instead.
> (Or use (val0 & (GET_MODE_MASK (int_mode) >> 1)) == 0,

Oops, missing ~ :)

> like you do below.)
>
> Same for CTZ.
>
> OK with that change, thanks.
>
> Richard
>
>> +	    }
>> +	  break;
>> +
>> +	case CTZ:
>> +	  if (is_a <scalar_int_mode> (mode, &int_mode)
>> +	      && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT)
>> +	    {
>> +	      HOST_WIDE_INT val0;
>> +	      if (CTZ_DEFINED_VALUE_AT_ZERO (int_mode, val0)
>> +		  && (val0 >> (GET_MODE_PRECISION (int_mode) - 1)) == 0)
>> +		return op;
>> +	    }
>> +	  break;
>> +
>> +	case LSHIFTRT:
>> +	  if (CONST_INT_P (XEXP (op, 1))
>> +	      && INTVAL (XEXP (op, 1)) > 0
>> +	      && is_a <scalar_int_mode> (mode, &int_mode)
>> +	      && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (int_mode))
>> +	    return op;
>> +	  break;
>> +
>> +	default:
>> +	  break;
>> +	}
>> +
>>        /* If operand is known to be only -1 or 0, convert ABS to NEG.  */
>>        if (is_a <scalar_int_mode> (mode, &int_mode)
>>  	  && (num_sign_bit_copies (op, int_mode)
>> @@ -1615,6 +1661,24 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
>>  	    }
>>  	}
>>  
>> +      /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when
>> +         we know the sign bit of OP must be clear.  */
>> +      if (val_signbit_known_clear_p (GET_MODE (op),
>> +				     nonzero_bits (op, GET_MODE (op))))
>> +	return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
>> +
>> +      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
>> +      if (GET_CODE (op) == SUBREG
>> +	  && subreg_lowpart_p (op)
>> +	  && GET_MODE (SUBREG_REG (op)) == mode
>> +	  && is_a <scalar_int_mode> (mode, &int_mode)
>> +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
>> +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
>> +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
>> +	  && (nonzero_bits (SUBREG_REG (op), mode)
>> +	      & ~(GET_MODE_MASK (op_mode) >> 1)) == 0)
>> +	return SUBREG_REG (op);
>> +
>>  #if defined(POINTERS_EXTEND_UNSIGNED)
>>        /* As we do not know which address space the pointer is referring to,
>>  	 we can do this only if the target does not support different pointer
>> @@ -1765,6 +1829,18 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
>>  				     op0_mode);
>>  	}
>>  
>> +      /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
>> +      if (GET_CODE (op) == SUBREG
>> +	  && subreg_lowpart_p (op)
>> +	  && GET_MODE (SUBREG_REG (op)) == mode
>> +	  && is_a <scalar_int_mode> (mode, &int_mode)
>> +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
>> +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
>> +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
>> +	  && (nonzero_bits (SUBREG_REG (op), mode)
>> +	      & ~GET_MODE_MASK (op_mode)) == 0)
>> +	return SUBREG_REG (op);
>> +
>>  #if defined(POINTERS_EXTEND_UNSIGNED)
>>        /* As we do not know which address space the pointer is referring to,
>>  	 we can do this only if the target does not support different pointer