From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from gate.crashing.org (gate.crashing.org [63.228.1.57]) by sourceware.org (Postfix) with ESMTP id A4D7B385800E for ; Wed, 27 Jul 2022 20:24:37 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A4D7B385800E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=kernel.crashing.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=kernel.crashing.org Received: from gate.crashing.org (localhost.localdomain [127.0.0.1]) by gate.crashing.org (8.14.1/8.14.1) with ESMTP id 26RKNbBL009170; Wed, 27 Jul 2022 15:23:37 -0500 Received: (from segher@localhost) by gate.crashing.org (8.14.1/8.14.1/Submit) id 26RKNapQ009169; Wed, 27 Jul 2022 15:23:36 -0500 X-Authentication-Warning: gate.crashing.org: segher set sender to segher@kernel.crashing.org using -f Date: Wed, 27 Jul 2022 15:23:36 -0500 From: Segher Boessenkool To: Roger Sayle Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Some additional zero-extension related optimizations in simplify-rtx. Message-ID: <20220727202336.GE25951@gate.crashing.org> References: <009501d8a1be$b6199e20$224cda60$@nextmovesoftware.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <009501d8a1be$b6199e20$224cda60$@nextmovesoftware.com> User-Agent: Mutt/1.4.2.3i X-Spam-Status: No, score=-3.1 required=5.0 tests=BAYES_00, JMQ_SPF_NEUTRAL, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2022 20:24:40 -0000 Hi! On Wed, Jul 27, 2022 at 02:42:25PM +0100, Roger Sayle wrote: > This patch implements some additional zero-extension and sign-extension > related optimizations in simplify-rtx.cc. The original motivation comes > from PR rtl-optimization/71775, where in comment #2 Andrew Pinski sees: > > Failed to match this instruction: > (set (reg:DI 88 [ _1 ]) > (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0))) > > On many platforms the result of DImode CTZ is constrained to be a > small unsigned integer (between 0 and 64), hence the truncation to > 32-bits (using a SUBREG) and the following sign extension back to > 64-bits are effectively a no-op, so the above should ideally (often) > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))". And you can also do that if ctz is undefined for a zero argument! > To implement this, and some closely related transformations, we build > upon the existing val_signbit_known_clear_p predicate. In the first > chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit > bit set, Is that guaranteed in all cases? Also at -O0, also for args bigger than 64 bits? > Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with > default architecture options is undefined at zero, so we can't be sure > the upper bits of reg:DI 88 will be sign extended (all zeros or all ones). > nonzero_bits knows this, so the above transformations don't trigger, > but the transformations themselves are perfectly valid for other > operations such as FFS, POPCOUNT and PARITY, and on other targets/-march > settings where CTZ is defined at zero. It is valid to do whatever you want if the result of CTZ or CLZ is undefined, so the sign_extend of c[lt]z is a nop. However if C[LT]Z_DEFINED_VALUE_AT_ZERO is non-zero you have to check if the returned value (the macro's second arg) survives sign-extending. > /* If operand is something known to be positive, ignore the ABS. */ > - if (GET_CODE (op) == FFS || GET_CODE (op) == ABS > - || val_signbit_known_clear_p (GET_MODE (op), > - nonzero_bits (op, GET_MODE (op)))) > + if (val_signbit_known_clear_p (GET_MODE (op), > + nonzero_bits (op, GET_MODE (op)))) > return op; I don't think val_signbit_known_clear_p is true in all cases, as I said above, but we do not care about generated code quality for these cases. OK. > + /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when > + we know the sign bit of OP must be clear. */ > + if (val_signbit_known_clear_p (GET_MODE (op), > + nonzero_bits (op, GET_MODE (op)))) > + return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op)); OK. > + /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...). */ > + if (GET_CODE (op) == SUBREG > + && subreg_lowpart_p (op) > + && GET_MODE (SUBREG_REG (op)) == mode > + && is_a (mode, &int_mode) > + && is_a (GET_MODE (op), &op_mode) > + && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT > + && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode) > + && (nonzero_bits (SUBREG_REG (op), mode) > + & ~(GET_MODE_MASK (op_mode)>>1)) == 0) (spaces around >> please) Please use val_signbit_known_{set,clear}_p? > + return SUBREG_REG (op); Also, this is not correct for C[LT]Z_DEFINED_VALUE_AT_ZERO non-zero if the value it returns in its second arg does not survive sign extending unmodified (if it is 0xffffffff for an extend from SI to DI for exxample). > + /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...). */ > + if (GET_CODE (op) == SUBREG > + && subreg_lowpart_p (op) > + && GET_MODE (SUBREG_REG (op)) == mode > + && is_a (mode, &int_mode) > + && is_a (GET_MODE (op), &op_mode) > + && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT > + && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode) > + && (nonzero_bits (SUBREG_REG (op), mode) > + & ~GET_MODE_MASK (op_mode)) == 0) > + return SUBREG_REG (op); This has that same problem. Segher