From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id F14E8385AC34; Mon, 12 Sep 2022 14:56:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org F14E8385AC34 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id CECBE113E; Mon, 12 Sep 2022 07:56:52 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.62]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 7EEA93F73D; Mon, 12 Sep 2022 07:56:45 -0700 (PDT) From: Richard Sandiford To: "Roger Sayle" Mail-Followup-To: "Roger Sayle" ,"'GCC Patches'" , "'Tamar Christina'" , , richard.sandiford@arm.com Cc: "'GCC Patches'" , "'Tamar Christina'" , Subject: Re: [PATCH] PR rtl-optimization/106594: Preserve zero_extend when cheap. References: <007201d8c637$e2c3bfd0$a84b3f70$@nextmovesoftware.com> <009601d8c6b2$41557eb0$c4007c10$@nextmovesoftware.com> Date: Mon, 12 Sep 2022 15:56:44 +0100 In-Reply-To: <009601d8c6b2$41557eb0$c4007c10$@nextmovesoftware.com> (Roger Sayle's message of "Mon, 12 Sep 2022 15:16:28 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-42.5 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: "Roger Sayle" writes: > Hi Richard, > >> "Roger Sayle" writes: >> > This patch addresses PR rtl-optimization/106594, a significant >> > performance regression affecting aarch64 recently introduced (exposed) >> > by one of my recent RTL simplification improvements. Firstly many >> > thanks to Tamar Christina for confirming that the core of this patch >> > provides ~5% performance improvement on some on his benchmarks. >> > >> > GCC's combine pass uses the function expand_compound_operation to >> > conceptually simplify/canonicalize SIGN_EXTEND and ZERO_EXTEND as a >> > pair of shift operations, as not all targets support extension >> > instructions [technically ZERO_EXTEND may potentially be simplified/ >> > canonicalized to an AND operation, but the theory remains the same]. >> >> Are you sure the reason is that not all targets support extension instructions? >> I thought in practice this routine would only tend to see ZERO_EXTEND etc. if >> those codes appeared in the original rtl insns. > > Excellent question. My current understanding is that this subroutine is used > for both SIGN_EXTEND/ZERO_EXTEND (for which may processors have instructions > and even addressing mode support) and also SIGN_EXTRACT/ZERO_EXTRACT for > which many platforms, really do need a pair of shift instructions. (or an AND). > > The bit (to me) that that's suspicious is the exact wording of the comment... > >> > /* Convert sign extension to zero extension, if we know that the high >> > bit is not set, as this is easier to optimize. It will be converted >> > back to cheaper alternative in make_extraction. */ > > Notice that things are converted back in "make_extraction", and may not be > getting converted back (based on empirical evidence) for non-extractions, > i.e. extensions. This code is being called for ZERO_EXTEND on a path that > doesn't subsequently call make_compound. As shown in the PR, there are > code generation differences (that impact performance), but I agree there's > some ambiguity around the intent of the original code. > > My personal preference is to write backend patterns that contain ZERO_EXTEND > (or SIGN_EXTEND) rather than a pair of shifts, or an AND of a paradoxical SUBREG. > For example, the new patterns added to i386.md look (to me) "cleaner" than the > forms that they complement/replace. The burden is then on the middle-end to > simplify {ZERO,SIGN}_EXTEND forms as efficiently as it would shifts or ANDs. Yeah. I think the intention is still that .md patterns match the "compound" forms like zero_extend/sign_extend where possible. AIUI the "expanded" from is just an internal thing, to reduce the number of simplification rules needed, and combine is supposed to call make_compound_operation before recog()nising the result. But given that the simplification rules in simplify-rtx.cc (as opposed to combine.cc itself) have to cope with passes that *don't* do this canonicalisation, I'm not sure how much it really helps in practice. And the cost of make_compound_operation failing to recognise the (possibly quite convoluted) results of substituting into an expanded compound operation can be quite severe. > Interestingly, this is sometimes already the case, for example, we simplify > (ffs (zero_extend ...)) in cases where we wouldn't simplify the equivalent > (ffs (and ... 255)) [but these are perhaps also just missed optimizations]. Yeah, sounds like it. Thanks, Richard > Thoughts? I'll also dig more into the history of these (combine) functions. > > Cheers, > Roger > --