From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <segher@kernel.crashing.org>
Received: from gate.crashing.org (gate.crashing.org [63.228.1.57])
 by sourceware.org (Postfix) with ESMTP id A4D7B385800E
 for <gcc-patches@gcc.gnu.org>; Wed, 27 Jul 2022 20:24:37 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A4D7B385800E
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=kernel.crashing.org
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=kernel.crashing.org
Received: from gate.crashing.org (localhost.localdomain [127.0.0.1])
 by gate.crashing.org (8.14.1/8.14.1) with ESMTP id 26RKNbBL009170;
 Wed, 27 Jul 2022 15:23:37 -0500
Received: (from segher@localhost)
 by gate.crashing.org (8.14.1/8.14.1/Submit) id 26RKNapQ009169;
 Wed, 27 Jul 2022 15:23:36 -0500
X-Authentication-Warning: gate.crashing.org: segher set sender to
 segher@kernel.crashing.org using -f
Date: Wed, 27 Jul 2022 15:23:36 -0500
From: Segher Boessenkool <segher@kernel.crashing.org>
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Some additional zero-extension related optimizations in
 simplify-rtx.
Message-ID: <20220727202336.GE25951@gate.crashing.org>
References: <009501d8a1be$b6199e20$224cda60$@nextmovesoftware.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <009501d8a1be$b6199e20$224cda60$@nextmovesoftware.com>
User-Agent: Mutt/1.4.2.3i
X-Spam-Status: No, score=-3.1 required=5.0 tests=BAYES_00, JMQ_SPF_NEUTRAL,
 KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Jul 2022 20:24:40 -0000

Hi!

On Wed, Jul 27, 2022 at 02:42:25PM +0100, Roger Sayle wrote:
> This patch implements some additional zero-extension and sign-extension
> related optimizations in simplify-rtx.cc.  The original motivation comes
> from PR rtl-optimization/71775, where in comment #2 Andrew Pinski sees:
> 
> Failed to match this instruction:
> (set (reg:DI 88 [ _1 ])
>     (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
> 
> On many platforms the result of DImode CTZ is constrained to be a
> small unsigned integer (between 0 and 64), hence the truncation to
> 32-bits (using a SUBREG) and the following sign extension back to
> 64-bits are effectively a no-op, so the above should ideally (often)
> be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".

And you can also do that if ctz is undefined for a zero argument!

> To implement this, and some closely related transformations, we build
> upon the existing val_signbit_known_clear_p predicate.  In the first
> chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit
> bit set,

Is that guaranteed in all cases?  Also at -O0, also for args bigger than
64 bits?

> Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with
> default architecture options is undefined at zero, so we can't be sure
> the upper bits of reg:DI 88 will be sign extended (all zeros or all ones).
> nonzero_bits knows this, so the above transformations don't trigger,
> but the transformations themselves are perfectly valid for other
> operations such as FFS, POPCOUNT and PARITY, and on other targets/-march
> settings where CTZ is defined at zero.

It is valid to do whatever you want if the result of CTZ or CLZ is
undefined, so the sign_extend of c[lt]z is a nop.

However if C[LT]Z_DEFINED_VALUE_AT_ZERO is non-zero you have to check
if the returned value (the macro's second arg) survives sign-extending.

>        /* If operand is something known to be positive, ignore the ABS.  */
> -      if (GET_CODE (op) == FFS || GET_CODE (op) == ABS
> -	  || val_signbit_known_clear_p (GET_MODE (op),
> -					nonzero_bits (op, GET_MODE (op))))
> +      if (val_signbit_known_clear_p (GET_MODE (op),
> +				     nonzero_bits (op, GET_MODE (op))))
>  	return op;

I don't think val_signbit_known_clear_p is true in all cases, as I said
above, but we do not care about generated code quality for these cases.
OK.

> +      /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when
> +         we know the sign bit of OP must be clear.  */
> +      if (val_signbit_known_clear_p (GET_MODE (op),
> +				     nonzero_bits (op, GET_MODE (op))))
> +	return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));

OK.

> +      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> +      if (GET_CODE (op) == SUBREG
> +	  && subreg_lowpart_p (op)
> +	  && GET_MODE (SUBREG_REG (op)) == mode
> +	  && is_a <scalar_int_mode> (mode, &int_mode)
> +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
> +	  && (nonzero_bits (SUBREG_REG (op), mode)
> +	      & ~(GET_MODE_MASK (op_mode)>>1)) == 0)

(spaces around >> please)

Please use val_signbit_known_{set,clear}_p?

> +	return SUBREG_REG (op);

Also, this is not correct for C[LT]Z_DEFINED_VALUE_AT_ZERO non-zero if
the value it returns in its second arg does not survive sign extending
unmodified (if it is 0xffffffff for an extend from SI to DI for
exxample).

> +      /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> +      if (GET_CODE (op) == SUBREG
> +	  && subreg_lowpart_p (op)
> +	  && GET_MODE (SUBREG_REG (op)) == mode
> +	  && is_a <scalar_int_mode> (mode, &int_mode)
> +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
> +	  && (nonzero_bits (SUBREG_REG (op), mode)
> +	      & ~GET_MODE_MASK (op_mode)) == 0)
> +	return SUBREG_REG (op);

This has that same problem.


Segher