From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id DB93D385800D for ; Tue, 2 Aug 2022 09:38:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DB93D385800D Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 275E0139F; Tue, 2 Aug 2022 02:38:42 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.37]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id DAD9D3F67D; Tue, 2 Aug 2022 02:38:40 -0700 (PDT) From: Richard Sandiford To: "Roger Sayle" Mail-Followup-To: "Roger Sayle" , , 'Segher Boessenkool' , richard.sandiford@arm.com Cc: , 'Segher Boessenkool' Subject: Re: [PATCH] Some additional zero-extension related optimizations in simplify-rtx. References: <009501d8a1be$b6199e20$224cda60$@nextmovesoftware.com> Date: Tue, 02 Aug 2022 10:38:39 +0100 In-Reply-To: <009501d8a1be$b6199e20$224cda60$@nextmovesoftware.com> (Roger Sayle's message of "Wed, 27 Jul 2022 14:42:25 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-53.0 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Aug 2022 09:38:44 -0000 "Roger Sayle" writes: > This patch implements some additional zero-extension and sign-extension > related optimizations in simplify-rtx.cc. The original motivation comes > from PR rtl-optimization/71775, where in comment #2 Andrew Pinski sees: > > Failed to match this instruction: > (set (reg:DI 88 [ _1 ]) > (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0))) > > On many platforms the result of DImode CTZ is constrained to be a > small unsigned integer (between 0 and 64), hence the truncation to > 32-bits (using a SUBREG) and the following sign extension back to > 64-bits are effectively a no-op, so the above should ideally (often) > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))". > > To implement this, and some closely related transformations, we build > upon the existing val_signbit_known_clear_p predicate. In the first > chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit > bit set, so the simplification of of ABS (ABS (x)) and ABS (FFS (x)) > can itself be simplified. I think I misunderstood, but just in case: RTL ABS is well-defined for the minimum integer (giving back the minimum integer), so we can't assume that ABS leaves the sign bit clear. Thanks, Richard > The second transformation is that we can > canonicalized SIGN_EXTEND to ZERO_EXTEND (as in the PR 71775 case above) > when the operand's sign-bit is known to be clear. The final two chunks > are for SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a > truncating SUBREG respectively. The nonzero_bits of a truncating > SUBREG pessimistically thinks that the upper bits may have an > arbitrary value (by taking the SUBREG), so we need look deeper at the > SUBREG's operand to confirm that the high bits are known to be zero. > > Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with > default architecture options is undefined at zero, so we can't be sure > the upper bits of reg:DI 88 will be sign extended (all zeros or all ones). > nonzero_bits knows this, so the above transformations don't trigger, > but the transformations themselves are perfectly valid for other > operations such as FFS, POPCOUNT and PARITY, and on other targets/-march > settings where CTZ is defined at zero. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32}, > with no new failures. Testing with CSiBE shows these transformations > trigger on several source files (and with -Os reduces the size of the > code). Ok for mainline? > > > 2022-07-27 Roger Sayle > > gcc/ChangeLog > * simplify_rtx.cc (simplify_unary_operation_1) : Simplify > test as both FFS and ABS result in nonzero_bits returning a > mask that satisfies val_signbit_known_clear_p. > : Canonicalize SIGN_EXTEND to ZERO_EXTEND when > val_signbit_known_clear_p is true of the operand. > Simplify sign extensions of SUBREG truncations of operands > that are already suitably (zero) extended. > : Simplify zero extensions of SUBREG truncations > of operands that are already suitably zero extended. > > > Thanks in advance, > Roger > -- > > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc > index fa20665..e62bf56 100644 > --- a/gcc/simplify-rtx.cc > +++ b/gcc/simplify-rtx.cc > @@ -1366,9 +1366,8 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode, > break; > > /* If operand is something known to be positive, ignore the ABS. */ > - if (GET_CODE (op) == FFS || GET_CODE (op) == ABS > - || val_signbit_known_clear_p (GET_MODE (op), > - nonzero_bits (op, GET_MODE (op)))) > + if (val_signbit_known_clear_p (GET_MODE (op), > + nonzero_bits (op, GET_MODE (op)))) > return op; > > /* If operand is known to be only -1 or 0, convert ABS to NEG. */ > @@ -1615,6 +1614,24 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode, > } > } > > + /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when > + we know the sign bit of OP must be clear. */ > + if (val_signbit_known_clear_p (GET_MODE (op), > + nonzero_bits (op, GET_MODE (op)))) > + return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op)); > + > + /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...). */ > + if (GET_CODE (op) == SUBREG > + && subreg_lowpart_p (op) > + && GET_MODE (SUBREG_REG (op)) == mode > + && is_a (mode, &int_mode) > + && is_a (GET_MODE (op), &op_mode) > + && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT > + && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode) > + && (nonzero_bits (SUBREG_REG (op), mode) > + & ~(GET_MODE_MASK (op_mode)>>1)) == 0) > + return SUBREG_REG (op); > + > #if defined(POINTERS_EXTEND_UNSIGNED) > /* As we do not know which address space the pointer is referring to, > we can do this only if the target does not support different pointer > @@ -1765,6 +1782,18 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode, > op0_mode); > } > > + /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...). */ > + if (GET_CODE (op) == SUBREG > + && subreg_lowpart_p (op) > + && GET_MODE (SUBREG_REG (op)) == mode > + && is_a (mode, &int_mode) > + && is_a (GET_MODE (op), &op_mode) > + && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT > + && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode) > + && (nonzero_bits (SUBREG_REG (op), mode) > + & ~GET_MODE_MASK (op_mode)) == 0) > + return SUBREG_REG (op); > + > #if defined(POINTERS_EXTEND_UNSIGNED) > /* As we do not know which address space the pointer is referring to, > we can do this only if the target does not support different pointer