From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 6B3DA3856242 for ; Tue, 2 Aug 2022 13:39:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6B3DA3856242 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id B42D0D6E; Tue, 2 Aug 2022 06:39:22 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.37]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 6E5253F67D; Tue, 2 Aug 2022 06:39:21 -0700 (PDT) From: Richard Sandiford To: "Roger Sayle" Mail-Followup-To: "Roger Sayle" , "'GCC Patches'" , "'Segher Boessenkool'" , richard.sandiford@arm.com Cc: "'GCC Patches'" , "'Segher Boessenkool'" Subject: Re: [PATCH take #2] Some additional zero-extension related optimizations in simplify-rtx. References: <009501d8a1be$b6199e20$224cda60$@nextmovesoftware.com> <01c401d8a666$d48c9e50$7da5daf0$@nextmovesoftware.com> Date: Tue, 02 Aug 2022 14:39:20 +0100 In-Reply-To: <01c401d8a666$d48c9e50$7da5daf0$@nextmovesoftware.com> (Roger Sayle's message of "Tue, 2 Aug 2022 12:55:56 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-52.8 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Aug 2022 13:39:25 -0000 "Roger Sayle" writes: > Many thanks to Segher and Richard for pointing out that my removal > of optimizations of ABS(ABS(x)) and ABS(FFS(x)) in the original version > of this patch was incorrect, and my assumption that these would be > subsumed by val_signbit_known_clear_p was mistaken. That the > tests for ABS and FFS looked out of place, was not an indication that > they were not required, but that we were missing simplifications for > the related SS_ABS, PARITY, POPCOUNT, CLRSB, CLZ and CTZ etc. > To make up for this mistake, in this revised patch I've not only restored > the tests for ABS and FFS, but also added the many sibling RTX codes > that I'd also expect to see optimized here, such as ABS(PARITY(x)). > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32}, > with no new failures. Ok for mainline? > > > 2022-08-02 Roger Sayle > Segher Boessenkool > Richard Sandiford Thanks, but I didn't actually write anything. :-) > gcc/ChangeLog > * simplify_rtx.cc (simplify_unary_operation_1) : Add > optimizations for CLRSB, PARITY, POPCOUNT, SS_ABS, CLZ, CTZ > and LSHIFTRT that are all positive to complement the existing > FFS and (idempotent) ABS simplifications. > : Canonicalize SIGN_EXTEND to ZERO_EXTEND when > val_signbit_known_clear_p is true of the operand. > Simplify sign extensions of SUBREG truncations of operands > that are already suitably (zero) extended. > : Simplify zero extensions of SUBREG truncations > of operands that are already suitably zero extended. > > Thanks in advance, > Roger > -- > >> -----Original Message----- >> From: Richard Sandiford >> Sent: 02 August 2022 10:39 >> To: Roger Sayle >> Cc: gcc-patches@gcc.gnu.org; 'Segher Boessenkool' >> >> Subject: Re: [PATCH] Some additional zero-extension related optimizations > in >> simplify-rtx. >> >> "Roger Sayle" writes: >> > This patch implements some additional zero-extension and >> > sign-extension related optimizations in simplify-rtx.cc. The original >> > motivation comes from PR rtl-optimization/71775, where in comment #2 >> Andrew Pinski sees: >> > >> > Failed to match this instruction: >> > (set (reg:DI 88 [ _1 ]) >> > (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0))) >> > >> > On many platforms the result of DImode CTZ is constrained to be a >> > small unsigned integer (between 0 and 64), hence the truncation to >> > 32-bits (using a SUBREG) and the following sign extension back to >> > 64-bits are effectively a no-op, so the above should ideally (often) >> > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))". >> > >> > To implement this, and some closely related transformations, we build >> > upon the existing val_signbit_known_clear_p predicate. In the first >> > chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit >> > bit set, so the simplification of of ABS (ABS (x)) and ABS (FFS (x)) >> > can itself be simplified. >> >> I think I misunderstood, but just in case: RTL ABS is well-defined for the > minimum >> integer (giving back the minimum integer), so we can't assume that ABS > leaves >> the sign bit clear. >> >> Thanks, >> Richard >> >> > The second transformation is that we can canonicalized SIGN_EXTEND to >> > ZERO_EXTEND (as in the PR 71775 case above) when the operand's >> > sign-bit is known to be clear. The final two chunks are for >> > SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a truncating >> > SUBREG respectively. The nonzero_bits of a truncating SUBREG >> > pessimistically thinks that the upper bits may have an arbitrary value >> > (by taking the SUBREG), so we need look deeper at the SUBREG's operand >> > to confirm that the high bits are known to be zero. >> > >> > Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with >> > default architecture options is undefined at zero, so we can't be sure >> > the upper bits of reg:DI 88 will be sign extended (all zeros or all > ones). >> > nonzero_bits knows this, so the above transformations don't trigger, >> > but the transformations themselves are perfectly valid for other >> > operations such as FFS, POPCOUNT and PARITY, and on other >> > targets/-march settings where CTZ is defined at zero. >> > >> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap >> > and make -k check, both with and without --target_board=unix{-m32}, >> > with no new failures. Testing with CSiBE shows these transformations >> > trigger on several source files (and with -Os reduces the size of the >> > code). Ok for mainline? >> > >> > >> > 2022-07-27 Roger Sayle >> > >> > gcc/ChangeLog >> > * simplify_rtx.cc (simplify_unary_operation_1) : Simplify >> > test as both FFS and ABS result in nonzero_bits returning a >> > mask that satisfies val_signbit_known_clear_p. >> > : Canonicalize SIGN_EXTEND to ZERO_EXTEND when >> > val_signbit_known_clear_p is true of the operand. >> > Simplify sign extensions of SUBREG truncations of operands >> > that are already suitably (zero) extended. >> > : Simplify zero extensions of SUBREG truncations >> > of operands that are already suitably zero extended. >> > >> > >> > Thanks in advance, >> > Roger >> > -- >> > >> > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index >> > fa20665..e62bf56 100644 >> > --- a/gcc/simplify-rtx.cc >> > +++ b/gcc/simplify-rtx.cc >> > @@ -1366,9 +1366,8 @@ simplify_context::simplify_unary_operation_1 >> (rtx_code code, machine_mode mode, >> > break; >> > >> > /* If operand is something known to be positive, ignore the ABS. > */ >> > - if (GET_CODE (op) == FFS || GET_CODE (op) == ABS >> > - || val_signbit_known_clear_p (GET_MODE (op), >> > - nonzero_bits (op, GET_MODE (op)))) >> > + if (val_signbit_known_clear_p (GET_MODE (op), >> > + nonzero_bits (op, GET_MODE (op)))) >> > return op; >> > >> > /* If operand is known to be only -1 or 0, convert ABS to NEG. >> > */ @@ -1615,6 +1614,24 @@ simplify_context::simplify_unary_operation_1 >> (rtx_code code, machine_mode mode, >> > } >> > } >> > >> > + /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when >> > + we know the sign bit of OP must be clear. */ >> > + if (val_signbit_known_clear_p (GET_MODE (op), >> > + nonzero_bits (op, GET_MODE (op)))) >> > + return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op)); >> > + >> > + /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...). */ >> > + if (GET_CODE (op) == SUBREG >> > + && subreg_lowpart_p (op) >> > + && GET_MODE (SUBREG_REG (op)) == mode >> > + && is_a (mode, &int_mode) >> > + && is_a (GET_MODE (op), &op_mode) >> > + && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT >> > + && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION >> (int_mode) >> > + && (nonzero_bits (SUBREG_REG (op), mode) >> > + & ~(GET_MODE_MASK (op_mode)>>1)) == 0) >> > + return SUBREG_REG (op); >> > + >> > #if defined(POINTERS_EXTEND_UNSIGNED) >> > /* As we do not know which address space the pointer is referring > to, >> > we can do this only if the target does not support different >> > pointer @@ -1765,6 +1782,18 @@ >> simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode >> mode, >> > op0_mode); >> > } >> > >> > + /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...). */ >> > + if (GET_CODE (op) == SUBREG >> > + && subreg_lowpart_p (op) >> > + && GET_MODE (SUBREG_REG (op)) == mode >> > + && is_a (mode, &int_mode) >> > + && is_a (GET_MODE (op), &op_mode) >> > + && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT >> > + && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION >> (int_mode) >> > + && (nonzero_bits (SUBREG_REG (op), mode) >> > + & ~GET_MODE_MASK (op_mode)) == 0) >> > + return SUBREG_REG (op); >> > + >> > #if defined(POINTERS_EXTEND_UNSIGNED) >> > /* As we do not know which address space the pointer is referring > to, >> > we can do this only if the target does not support different >> > pointer > > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc > index fa20665..b53272b 100644 > --- a/gcc/simplify-rtx.cc > +++ b/gcc/simplify-rtx.cc > @@ -1366,11 +1366,57 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode, > break; > > /* If operand is something known to be positive, ignore the ABS. */ > - if (GET_CODE (op) == FFS || GET_CODE (op) == ABS > - || val_signbit_known_clear_p (GET_MODE (op), > - nonzero_bits (op, GET_MODE (op)))) > + if (val_signbit_known_clear_p (GET_MODE (op), > + nonzero_bits (op, GET_MODE (op)))) > return op; > > + /* Using nonzero_bits doesn't (currently) work for modes wider than > + HOST_WIDE_INT, so the following transformations help simplify > + ABS for TImode and wider. */ > + switch (GET_CODE (op)) > + { > + case ABS: > + case CLRSB: > + case FFS: > + case PARITY: > + case POPCOUNT: > + case SS_ABS: > + return op; > + > + case CLZ: > + if (is_a (mode, &int_mode) > + && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT) > + { > + HOST_WIDE_INT val0; > + if (CLZ_DEFINED_VALUE_AT_ZERO (int_mode, val0) > + && (val0 >> (GET_MODE_PRECISION (int_mode) - 1)) == 0) > + return op; Shifts right of negative numbers are implementation-defined, so I guess this should be unsigned HOST_WIDE_INT instead. (Or use (val0 & (GET_MODE_MASK (int_mode) >> 1)) == 0, like you do below.) Same for CTZ. OK with that change, thanks. Richard > + } > + break; > + > + case CTZ: > + if (is_a (mode, &int_mode) > + && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT) > + { > + HOST_WIDE_INT val0; > + if (CTZ_DEFINED_VALUE_AT_ZERO (int_mode, val0) > + && (val0 >> (GET_MODE_PRECISION (int_mode) - 1)) == 0) > + return op; > + } > + break; > + > + case LSHIFTRT: > + if (CONST_INT_P (XEXP (op, 1)) > + && INTVAL (XEXP (op, 1)) > 0 > + && is_a (mode, &int_mode) > + && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (int_mode)) > + return op; > + break; > + > + default: > + break; > + } > + > /* If operand is known to be only -1 or 0, convert ABS to NEG. */ > if (is_a (mode, &int_mode) > && (num_sign_bit_copies (op, int_mode) > @@ -1615,6 +1661,24 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode, > } > } > > + /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when > + we know the sign bit of OP must be clear. */ > + if (val_signbit_known_clear_p (GET_MODE (op), > + nonzero_bits (op, GET_MODE (op)))) > + return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op)); > + > + /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...). */ > + if (GET_CODE (op) == SUBREG > + && subreg_lowpart_p (op) > + && GET_MODE (SUBREG_REG (op)) == mode > + && is_a (mode, &int_mode) > + && is_a (GET_MODE (op), &op_mode) > + && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT > + && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode) > + && (nonzero_bits (SUBREG_REG (op), mode) > + & ~(GET_MODE_MASK (op_mode) >> 1)) == 0) > + return SUBREG_REG (op); > + > #if defined(POINTERS_EXTEND_UNSIGNED) > /* As we do not know which address space the pointer is referring to, > we can do this only if the target does not support different pointer > @@ -1765,6 +1829,18 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode, > op0_mode); > } > > + /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...). */ > + if (GET_CODE (op) == SUBREG > + && subreg_lowpart_p (op) > + && GET_MODE (SUBREG_REG (op)) == mode > + && is_a (mode, &int_mode) > + && is_a (GET_MODE (op), &op_mode) > + && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT > + && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode) > + && (nonzero_bits (SUBREG_REG (op), mode) > + & ~GET_MODE_MASK (op_mode)) == 0) > + return SUBREG_REG (op); > + > #if defined(POINTERS_EXTEND_UNSIGNED) > /* As we do not know which address space the pointer is referring to, > we can do this only if the target does not support different pointer