From: "Roger Sayle"
To: "'Segher Boessenkool'"
References: <009501d8a1be$b6199e20$224cda60$@nextmovesoftware.com> <20220727202336.GE25951@gate.crashing.org>
In-Reply-To: <20220727202336.GE25951@gate.crashing.org>
Subject: RE: [PATCH] Some additional zero-extension related optimizations in simplify-rtx.
Date: Fri, 29 Jul 2022 07:57:51 +0100
Message-ID: <041201d8a318$8850d110$98f27330$@nextmovesoftware.com>
List-Id: Gcc-patches mailing list

Hi Segher,

> On Wed, Jul 27, 2022 at 02:42:25PM +0100, Roger Sayle wrote:
> > This patch implements some additional zero-extension and
> > sign-extension related optimizations in simplify-rtx.cc.
> > The original motivation comes from PR rtl-optimization/71775, where
> > in comment #2 Andrew Pinski sees:
> >
> > Failed to match this instruction:
> > (set (reg:DI 88 [ _1 ])
> >     (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
> >
> > On many platforms the result of DImode CTZ is constrained to be a
> > small unsigned integer (between 0 and 64), hence the truncation to
> > 32-bits (using a SUBREG) and the following sign extension back to
> > 64-bits are effectively a no-op, so the above should ideally (often)
> > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ])))".
>
> And you can also do that if ctz is undefined for a zero argument!

Forgive my perhaps poor use of terminology.  The case of ctz 0 on
x86_64 isn't "undefined behaviour" (UB) in the C/C++ sense that would
allow us to do anything, but implementation defined (which Intel calls
"undefined" in their documentation).  Hence, we don't know which DI
value is placed in the result register.  In this case, truncating to
SI mode and then sign extending the result is not a no-op, as the top
bits will/must now all be the same [though admittedly equal to an
unknown, undefined signbit].  Hence the above optimization would be
invalid, as it doesn't guarantee the result would be sign-extended.

> > To implement this, and some closely related transformations, we build
> > upon the existing val_signbit_known_clear_p predicate.  In the first
> > chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit
> > bit set,
>
> Is that guaranteed in all cases?  Also at -O0, also for args bigger than
> 64 bits?

val_signbit_known_clear_p should work for any size/precision arg.
I'm not sure if the results are affected by -O0, but even if they are,
this will not affect correctness, only whether these optimizations are
performed, which is precisely what -O0 controls.

> > +      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> > +      if (GET_CODE (op) == SUBREG
> > +	  && subreg_lowpart_p (op)
> > +	  && GET_MODE (SUBREG_REG (op)) == mode
> > +	  && is_a <scalar_int_mode> (mode, &int_mode)
> > +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> > +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> > +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
> > +	  && (nonzero_bits (SUBREG_REG (op), mode)
> > +	      & ~(GET_MODE_MASK (op_mode)>>1)) == 0)
>
> (spaces around >> please)

Doh!  Good catch, thanks.

> Please use val_signbit_known_{set,clear}_p?

Alas, it's not just the SI mode's signbit that we care about, but all
of the bits above it in the DImode operand/result.  These all need to
be zero for the operand to already be zero-extended/sign-extended.

> > +	return SUBREG_REG (op);
>
> Also, this is not correct for C[LT]Z_DEFINED_VALUE_AT_ZERO non-zero if the
> value it returns in its second arg does not survive sign extending
> unmodified (if it is 0xffffffff for an extend from SI to DI for example).

Fortunately, C[LT]Z_DEFINED_VALUE_AT_ZERO being defined to return a
negative result, such as -1, is already handled (accounted for) in
nonzero_bits.  The relevant code in rtlanal.cc's nonzero_bits1 is:

    case CTZ:
      /* If CTZ has a known value at zero, then the nonzero bits are
	 that value, plus the number of bits in the mode minus one.  */
      if (CTZ_DEFINED_VALUE_AT_ZERO (mode, nonzero))
	nonzero
	  |= (HOST_WIDE_INT_1U << (floor_log2 (mode_width))) - 1;
      else
	nonzero = -1;
      break;

Hence, any bits set by the constant returned by the target's
DEFINED_VALUE_AT_ZERO will be set in the result of nonzero_bits.
So if this is negative, say -1, then val_signbit_known_clear_p (or
the more complex tests above) will return false.

I'm currently bootstrapping and regression testing the whitespace
change/correction suggested above.

Thanks,
Roger
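
For readers following along, here is a rough standalone C sketch of the argument above.  It is not part of the patch and does not use GCC's internal APIs: the names SI_MASK, sext_lowpart_si, sext_is_noop, ctz_nonzero_bits and check_examples are all hypothetical, and the model assumes a 64-bit (DImode-like) outer mode and a 32-bit (SImode-like) inner mode.  It only mirrors the mask test from the quoted hunk and the CTZ case quoted from nonzero_bits1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for GET_MODE_MASK (SImode).  */
#define SI_MASK UINT64_C(0xffffffff)

/* Model of (sign_extend:DI (subreg:SI x 0)): keep the low 32 bits of X,
   then sign extend them back to 64 bits using unsigned arithmetic.  */
uint64_t sext_lowpart_si(uint64_t x)
{
    uint64_t low = x & SI_MASK;
    return (low ^ UINT64_C(0x80000000)) - UINT64_C(0x80000000);
}

/* Mirror of the mask test in the quoted hunk: the extension is redundant
   when no possibly-nonzero bit sits at or above the 32-bit sign bit.  */
bool sext_is_noop(uint64_t nonzero)
{
    return (nonzero & ~(SI_MASK >> 1)) == 0;
}

/* Simplified model of the CTZ case quoted from nonzero_bits1, assuming a
   64-bit mode, so floor_log2 (mode_width) is 6.  */
uint64_t ctz_nonzero_bits(bool defined_at_zero, uint64_t value_at_zero)
{
    if (!defined_at_zero)
        return ~UINT64_C(0);                        /* nonzero = -1 */
    return value_at_zero | ((UINT64_C(1) << 6) - 1);
}

/* A few illustrative cases.  */
void check_examples(void)
{
    /* Value at zero is 64: every possible result fits below bit 31.  */
    assert(sext_is_noop(ctz_nonzero_bits(true, 64)));
    /* Value at zero is -1: all bits may be set, so no simplification.  */
    assert(!sext_is_noop(ctz_nonzero_bits(true, ~UINT64_C(0))));
    /* No defined value at zero: nothing is known about the result.  */
    assert(!sext_is_noop(ctz_nonzero_bits(false, 0)));
    /* Small values survive truncation plus sign extension unchanged.  */
    assert(sext_lowpart_si(63) == 63);
}
```

Calling check_examples () from a small test driver exercises the three situations discussed in this thread: a small defined value at zero, a defined value of -1, and no defined value at all.  Only the first passes the mask test.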