From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id F14E8385AC34;
	Mon, 12 Sep 2022 14:56:46 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org F14E8385AC34
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id CECBE113E;
	Mon, 12 Sep 2022 07:56:52 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.62])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 7EEA93F73D;
	Mon, 12 Sep 2022 07:56:45 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: "Roger Sayle" <roger@nextmovesoftware.com>
Mail-Followup-To: "Roger Sayle" <roger@nextmovesoftware.com>,"'GCC Patches'" <gcc-patches@gcc.gnu.org>,  "'Tamar Christina'" <tnfchris@gcc.gnu.org>,  <segher@kernel.crashing.org>, richard.sandiford@arm.com
Cc: "'GCC Patches'" <gcc-patches@gcc.gnu.org>,  "'Tamar Christina'" <tnfchris@gcc.gnu.org>,  <segher@kernel.crashing.org>
Subject: Re: [PATCH] PR rtl-optimization/106594: Preserve zero_extend when cheap.
References: <007201d8c637$e2c3bfd0$a84b3f70$@nextmovesoftware.com>
	<mpttu5c7uka.fsf@arm.com>
	<009601d8c6b2$41557eb0$c4007c10$@nextmovesoftware.com>
Date: Mon, 12 Sep 2022 15:56:44 +0100
In-Reply-To: <009601d8c6b2$41557eb0$c4007c10$@nextmovesoftware.com> (Roger
	Sayle's message of "Mon, 12 Sep 2022 15:16:28 +0100")
Message-ID: <mptv8ps63pv.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-42.5 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

"Roger Sayle" <roger@nextmovesoftware.com> writes:
> Hi Richard,
>
>> "Roger Sayle" <roger@nextmovesoftware.com> writes:
>> > This patch addresses PR rtl-optimization/106594, a significant
>> > performance regression affecting aarch64 recently introduced (exposed)
>> > by one of my recent RTL simplification improvements.  Firstly many
>> > thanks to Tamar Christina for confirming that the core of this patch
>> > provides ~5% performance improvement on some on his benchmarks.
>> >
>> > GCC's combine pass uses the function expand_compound_operation to
>> > conceptually simplify/canonicalize SIGN_EXTEND and ZERO_EXTEND as a
>> > pair of shift operations, as not all targets support extension
>> > instructions [technically ZERO_EXTEND may potentially be simplified/
>> > canonicalized to an AND operation, but the theory remains the same].
>> 
>> Are you sure the reason is that not all targets support extension instructions?
>> I thought in practice this routine would only tend to see ZERO_EXTEND etc. if
>> those codes appeared in the original rtl insns.
>
> Excellent question.  My current understanding is that this subroutine is used
> for both SIGN_EXTEND/ZERO_EXTEND (for which may processors have instructions
> and even addressing mode support) and also SIGN_EXTRACT/ZERO_EXTRACT for
> which many platforms, really do need a pair of shift instructions. (or an AND).
>
> The bit (to me) that that's suspicious is the exact wording of the comment...
>
>> >   /* Convert sign extension to zero extension, if we know that the high
>> >      bit is not set, as this is easier to optimize.  It will be converted
>> >      back to cheaper alternative in make_extraction.  */
>
> Notice that things are converted back in "make_extraction", and may not be
> getting converted back (based on empirical evidence) for non-extractions,
> i.e. extensions.  This code is being called for ZERO_EXTEND on a path that
> doesn't subsequently call make_compound.  As shown in the PR, there are
> code generation differences (that impact performance), but I agree there's
> some ambiguity around the intent of the original code.
>
> My personal preference is to write backend patterns that contain ZERO_EXTEND
> (or SIGN_EXTEND) rather than a pair of shifts, or an AND of a paradoxical SUBREG.
> For example, the new patterns added to i386.md look (to me) "cleaner" than the
> forms that they complement/replace.  The burden is then on the middle-end to
> simplify {ZERO,SIGN}_EXTEND forms as efficiently as it would shifts or ANDs.

Yeah.  I think the intention is still that .md patterns match the
"compound" forms like zero_extend/sign_extend where possible.
AIUI the "expanded" from is just an internal thing, to reduce the
number of simplification rules needed, and combine is supposed to call
make_compound_operation before recog()nising the result.

But given that the simplification rules in simplify-rtx.cc (as opposed
to combine.cc itself) have to cope with passes that *don't* do this
canonicalisation, I'm not sure how much it really helps in practice.
And the cost of make_compound_operation failing to recognise the
(possibly quite convoluted) results of substituting into an expanded
compound operation can be quite severe.

> Interestingly, this is sometimes already the case, for example, we simplify
> (ffs (zero_extend ...)) in cases where we wouldn't simplify the equivalent
> (ffs (and ... 255)) [but these are perhaps also just missed optimizations].

Yeah, sounds like it.

Thanks,
Richard

> Thoughts?  I'll also dig more into the history of these (combine) functions.
>
> Cheers,
> Roger
> --