From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 935003858012 for ; Tue, 2 Aug 2022 16:46:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 935003858012 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id EB29F1477; Tue, 2 Aug 2022 09:46:06 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.37]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id AA6453F70D; Tue, 2 Aug 2022 09:46:05 -0700 (PDT) From: Richard Sandiford To: "Roger Sayle" Mail-Followup-To: "Roger Sayle" , "'GCC Patches'" , "'Segher Boessenkool'" , richard.sandiford@arm.com Cc: "'GCC Patches'" , "'Segher Boessenkool'" Subject: Re: [PATCH take #2] Some additional zero-extension related optimizations in simplify-rtx. References: <009501d8a1be$b6199e20$224cda60$@nextmovesoftware.com> <01c401d8a666$d48c9e50$7da5daf0$@nextmovesoftware.com> Date: Tue, 02 Aug 2022 17:46:04 +0100 In-Reply-To: (Richard Sandiford via Gcc-patches's message of "Tue, 02 Aug 2022 14:39:20 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-52.8 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Aug 2022 16:46:09 -0000 Richard Sandiford via Gcc-patches writes: > "Roger Sayle" writes: >> Many thanks to Segher and Richard for pointing out that my removal >> of optimizations of ABS(ABS(x)) and ABS(FFS(x)) in the original version >> of this patch was incorrect, and my assumption that these would be >> subsumed by val_signbit_known_clear_p was mistaken. That the >> tests for ABS and FFS looked out of place, was not an indication that >> they were not required, but that we were missing simplifications for >> the related SS_ABS, PARITY, POPCOUNT, CLRSB, CLZ and CTZ etc. >> To make up for this mistake, in this revised patch I've not only restored >> the tests for ABS and FFS, but also added the many sibling RTX codes >> that I'd also expect to see optimized here, such as ABS(PARITY(x)). >> >> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap >> and make -k check, both with and without --target_board=unix{-m32}, >> with no new failures. Ok for mainline? >> >> >> 2022-08-02 Roger Sayle >> Segher Boessenkool >> Richard Sandiford > > Thanks, but I didn't actually write anything. :-) > >> gcc/ChangeLog >> * simplify_rtx.cc (simplify_unary_operation_1) : Add >> optimizations for CLRSB, PARITY, POPCOUNT, SS_ABS, CLZ, CTZ >> and LSHIFTRT that are all positive to complement the existing >> FFS and (idempotent) ABS simplifications. >> : Canonicalize SIGN_EXTEND to ZERO_EXTEND when >> val_signbit_known_clear_p is true of the operand. >> Simplify sign extensions of SUBREG truncations of operands >> that are already suitably (zero) extended. >> : Simplify zero extensions of SUBREG truncations >> of operands that are already suitably zero extended. >> >> Thanks in advance, >> Roger >> -- >> >>> -----Original Message----- >>> From: Richard Sandiford >>> Sent: 02 August 2022 10:39 >>> To: Roger Sayle >>> Cc: gcc-patches@gcc.gnu.org; 'Segher Boessenkool' >>> >>> Subject: Re: [PATCH] Some additional zero-extension related optimizations >> in >>> simplify-rtx. >>> >>> "Roger Sayle" writes: >>> > This patch implements some additional zero-extension and >>> > sign-extension related optimizations in simplify-rtx.cc. The original >>> > motivation comes from PR rtl-optimization/71775, where in comment #2 >>> Andrew Pinski sees: >>> > >>> > Failed to match this instruction: >>> > (set (reg:DI 88 [ _1 ]) >>> > (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0))) >>> > >>> > On many platforms the result of DImode CTZ is constrained to be a >>> > small unsigned integer (between 0 and 64), hence the truncation to >>> > 32-bits (using a SUBREG) and the following sign extension back to >>> > 64-bits are effectively a no-op, so the above should ideally (often) >>> > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))". >>> > >>> > To implement this, and some closely related transformations, we build >>> > upon the existing val_signbit_known_clear_p predicate. In the first >>> > chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit >>> > bit set, so the simplification of of ABS (ABS (x)) and ABS (FFS (x)) >>> > can itself be simplified. >>> >>> I think I misunderstood, but just in case: RTL ABS is well-defined for the >> minimum >>> integer (giving back the minimum integer), so we can't assume that ABS >> leaves >>> the sign bit clear. >>> >>> Thanks, >>> Richard >>> >>> > The second transformation is that we can canonicalized SIGN_EXTEND to >>> > ZERO_EXTEND (as in the PR 71775 case above) when the operand's >>> > sign-bit is known to be clear. The final two chunks are for >>> > SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a truncating >>> > SUBREG respectively. The nonzero_bits of a truncating SUBREG >>> > pessimistically thinks that the upper bits may have an arbitrary value >>> > (by taking the SUBREG), so we need look deeper at the SUBREG's operand >>> > to confirm that the high bits are known to be zero. >>> > >>> > Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with >>> > default architecture options is undefined at zero, so we can't be sure >>> > the upper bits of reg:DI 88 will be sign extended (all zeros or all >> ones). >>> > nonzero_bits knows this, so the above transformations don't trigger, >>> > but the transformations themselves are perfectly valid for other >>> > operations such as FFS, POPCOUNT and PARITY, and on other >>> > targets/-march settings where CTZ is defined at zero. >>> > >>> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap >>> > and make -k check, both with and without --target_board=unix{-m32}, >>> > with no new failures. Testing with CSiBE shows these transformations >>> > trigger on several source files (and with -Os reduces the size of the >>> > code). Ok for mainline? >>> > >>> > >>> > 2022-07-27 Roger Sayle >>> > >>> > gcc/ChangeLog >>> > * simplify_rtx.cc (simplify_unary_operation_1) : Simplify >>> > test as both FFS and ABS result in nonzero_bits returning a >>> > mask that satisfies val_signbit_known_clear_p. >>> > : Canonicalize SIGN_EXTEND to ZERO_EXTEND when >>> > val_signbit_known_clear_p is true of the operand. >>> > Simplify sign extensions of SUBREG truncations of operands >>> > that are already suitably (zero) extended. >>> > : Simplify zero extensions of SUBREG truncations >>> > of operands that are already suitably zero extended. >>> > >>> > >>> > Thanks in advance, >>> > Roger >>> > -- >>> > >>> > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index >>> > fa20665..e62bf56 100644 >>> > --- a/gcc/simplify-rtx.cc >>> > +++ b/gcc/simplify-rtx.cc >>> > @@ -1366,9 +1366,8 @@ simplify_context::simplify_unary_operation_1 >>> (rtx_code code, machine_mode mode, >>> > break; >>> > >>> > /* If operand is something known to be positive, ignore the ABS. >> */ >>> > - if (GET_CODE (op) == FFS || GET_CODE (op) == ABS >>> > - || val_signbit_known_clear_p (GET_MODE (op), >>> > - nonzero_bits (op, GET_MODE (op)))) >>> > + if (val_signbit_known_clear_p (GET_MODE (op), >>> > + nonzero_bits (op, GET_MODE (op)))) >>> > return op; >>> > >>> > /* If operand is known to be only -1 or 0, convert ABS to NEG. >>> > */ @@ -1615,6 +1614,24 @@ simplify_context::simplify_unary_operation_1 >>> (rtx_code code, machine_mode mode, >>> > } >>> > } >>> > >>> > + /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when >>> > + we know the sign bit of OP must be clear. */ >>> > + if (val_signbit_known_clear_p (GET_MODE (op), >>> > + nonzero_bits (op, GET_MODE (op)))) >>> > + return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op)); >>> > + >>> > + /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...). */ >>> > + if (GET_CODE (op) == SUBREG >>> > + && subreg_lowpart_p (op) >>> > + && GET_MODE (SUBREG_REG (op)) == mode >>> > + && is_a (mode, &int_mode) >>> > + && is_a (GET_MODE (op), &op_mode) >>> > + && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT >>> > + && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION >>> (int_mode) >>> > + && (nonzero_bits (SUBREG_REG (op), mode) >>> > + & ~(GET_MODE_MASK (op_mode)>>1)) == 0) >>> > + return SUBREG_REG (op); >>> > + >>> > #if defined(POINTERS_EXTEND_UNSIGNED) >>> > /* As we do not know which address space the pointer is referring >> to, >>> > we can do this only if the target does not support different >>> > pointer @@ -1765,6 +1782,18 @@ >>> simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode >>> mode, >>> > op0_mode); >>> > } >>> > >>> > + /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...). */ >>> > + if (GET_CODE (op) == SUBREG >>> > + && subreg_lowpart_p (op) >>> > + && GET_MODE (SUBREG_REG (op)) == mode >>> > + && is_a (mode, &int_mode) >>> > + && is_a (GET_MODE (op), &op_mode) >>> > + && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT >>> > + && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION >>> (int_mode) >>> > + && (nonzero_bits (SUBREG_REG (op), mode) >>> > + & ~GET_MODE_MASK (op_mode)) == 0) >>> > + return SUBREG_REG (op); >>> > + >>> > #if defined(POINTERS_EXTEND_UNSIGNED) >>> > /* As we do not know which address space the pointer is referring >> to, >>> > we can do this only if the target does not support different >>> > pointer >> >> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc >> index fa20665..b53272b 100644 >> --- a/gcc/simplify-rtx.cc >> +++ b/gcc/simplify-rtx.cc >> @@ -1366,11 +1366,57 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode, >> break; >> >> /* If operand is something known to be positive, ignore the ABS. */ >> - if (GET_CODE (op) == FFS || GET_CODE (op) == ABS >> - || val_signbit_known_clear_p (GET_MODE (op), >> - nonzero_bits (op, GET_MODE (op)))) >> + if (val_signbit_known_clear_p (GET_MODE (op), >> + nonzero_bits (op, GET_MODE (op)))) >> return op; >> >> + /* Using nonzero_bits doesn't (currently) work for modes wider than >> + HOST_WIDE_INT, so the following transformations help simplify >> + ABS for TImode and wider. */ >> + switch (GET_CODE (op)) >> + { >> + case ABS: >> + case CLRSB: >> + case FFS: >> + case PARITY: >> + case POPCOUNT: >> + case SS_ABS: >> + return op; >> + >> + case CLZ: >> + if (is_a (mode, &int_mode) >> + && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT) >> + { >> + HOST_WIDE_INT val0; >> + if (CLZ_DEFINED_VALUE_AT_ZERO (int_mode, val0) >> + && (val0 >> (GET_MODE_PRECISION (int_mode) - 1)) == 0) >> + return op; > > Shifts right of negative numbers are implementation-defined, > so I guess this should be unsigned HOST_WIDE_INT instead. > (Or use (val0 & (GET_MODE_MASK (int_mode) >> 1)) == 0, Oops, missing ~ :) > like you do below.) > > Same for CTZ. > > OK with that change, thanks. > > Richard > >> + } >> + break; >> + >> + case CTZ: >> + if (is_a (mode, &int_mode) >> + && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT) >> + { >> + HOST_WIDE_INT val0; >> + if (CTZ_DEFINED_VALUE_AT_ZERO (int_mode, val0) >> + && (val0 >> (GET_MODE_PRECISION (int_mode) - 1)) == 0) >> + return op; >> + } >> + break; >> + >> + case LSHIFTRT: >> + if (CONST_INT_P (XEXP (op, 1)) >> + && INTVAL (XEXP (op, 1)) > 0 >> + && is_a (mode, &int_mode) >> + && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (int_mode)) >> + return op; >> + break; >> + >> + default: >> + break; >> + } >> + >> /* If operand is known to be only -1 or 0, convert ABS to NEG. */ >> if (is_a (mode, &int_mode) >> && (num_sign_bit_copies (op, int_mode) >> @@ -1615,6 +1661,24 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode, >> } >> } >> >> + /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when >> + we know the sign bit of OP must be clear. */ >> + if (val_signbit_known_clear_p (GET_MODE (op), >> + nonzero_bits (op, GET_MODE (op)))) >> + return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op)); >> + >> + /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...). */ >> + if (GET_CODE (op) == SUBREG >> + && subreg_lowpart_p (op) >> + && GET_MODE (SUBREG_REG (op)) == mode >> + && is_a (mode, &int_mode) >> + && is_a (GET_MODE (op), &op_mode) >> + && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT >> + && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode) >> + && (nonzero_bits (SUBREG_REG (op), mode) >> + & ~(GET_MODE_MASK (op_mode) >> 1)) == 0) >> + return SUBREG_REG (op); >> + >> #if defined(POINTERS_EXTEND_UNSIGNED) >> /* As we do not know which address space the pointer is referring to, >> we can do this only if the target does not support different pointer >> @@ -1765,6 +1829,18 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode, >> op0_mode); >> } >> >> + /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...). */ >> + if (GET_CODE (op) == SUBREG >> + && subreg_lowpart_p (op) >> + && GET_MODE (SUBREG_REG (op)) == mode >> + && is_a (mode, &int_mode) >> + && is_a (GET_MODE (op), &op_mode) >> + && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT >> + && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode) >> + && (nonzero_bits (SUBREG_REG (op), mode) >> + & ~GET_MODE_MASK (op_mode)) == 0) >> + return SUBREG_REG (op); >> + >> #if defined(POINTERS_EXTEND_UNSIGNED) >> /* As we do not know which address space the pointer is referring to, >> we can do this only if the target does not support different pointer