Re: [PATCH] PR rtl-optimization/106594: Preserve zero_extend when cheap.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Sandiford <richard.sandiford@arm.com>
To: "Roger Sayle" <roger@nextmovesoftware.com>
Cc: "'GCC Patches'" <gcc-patches@gcc.gnu.org>,
	 "'Tamar Christina'" <tnfchris@gcc.gnu.org>,
	 <segher@kernel.crashing.org>
Subject: Re: [PATCH] PR rtl-optimization/106594: Preserve zero_extend when cheap.
Date: Mon, 12 Sep 2022 15:56:44 +0100	[thread overview]
Message-ID: <mptv8ps63pv.fsf@arm.com> (raw)
In-Reply-To: <009601d8c6b2$41557eb0$c4007c10$@nextmovesoftware.com> (Roger Sayle's message of "Mon, 12 Sep 2022 15:16:28 +0100")

"Roger Sayle" <roger@nextmovesoftware.com> writes:
> Hi Richard,
>
>> "Roger Sayle" <roger@nextmovesoftware.com> writes:
>> > This patch addresses PR rtl-optimization/106594, a significant
>> > performance regression affecting aarch64 recently introduced (exposed)
>> > by one of my recent RTL simplification improvements.  Firstly many
>> > thanks to Tamar Christina for confirming that the core of this patch
>> > provides ~5% performance improvement on some on his benchmarks.
>> >
>> > GCC's combine pass uses the function expand_compound_operation to
>> > conceptually simplify/canonicalize SIGN_EXTEND and ZERO_EXTEND as a
>> > pair of shift operations, as not all targets support extension
>> > instructions [technically ZERO_EXTEND may potentially be simplified/
>> > canonicalized to an AND operation, but the theory remains the same].
>> 
>> Are you sure the reason is that not all targets support extension instructions?
>> I thought in practice this routine would only tend to see ZERO_EXTEND etc. if
>> those codes appeared in the original rtl insns.
>
> Excellent question.  My current understanding is that this subroutine is used
> for both SIGN_EXTEND/ZERO_EXTEND (for which may processors have instructions
> and even addressing mode support) and also SIGN_EXTRACT/ZERO_EXTRACT for
> which many platforms, really do need a pair of shift instructions. (or an AND).
>
> The bit (to me) that that's suspicious is the exact wording of the comment...
>
>> >   /* Convert sign extension to zero extension, if we know that the high
>> >      bit is not set, as this is easier to optimize.  It will be converted
>> >      back to cheaper alternative in make_extraction.  */
>
> Notice that things are converted back in "make_extraction", and may not be
> getting converted back (based on empirical evidence) for non-extractions,
> i.e. extensions.  This code is being called for ZERO_EXTEND on a path that
> doesn't subsequently call make_compound.  As shown in the PR, there are
> code generation differences (that impact performance), but I agree there's
> some ambiguity around the intent of the original code.
>
> My personal preference is to write backend patterns that contain ZERO_EXTEND
> (or SIGN_EXTEND) rather than a pair of shifts, or an AND of a paradoxical SUBREG.
> For example, the new patterns added to i386.md look (to me) "cleaner" than the
> forms that they complement/replace.  The burden is then on the middle-end to
> simplify {ZERO,SIGN}_EXTEND forms as efficiently as it would shifts or ANDs.

Yeah.  I think the intention is still that .md patterns match the
"compound" forms like zero_extend/sign_extend where possible.
AIUI the "expanded" from is just an internal thing, to reduce the
number of simplification rules needed, and combine is supposed to call
make_compound_operation before recog()nising the result.

But given that the simplification rules in simplify-rtx.cc (as opposed
to combine.cc itself) have to cope with passes that *don't* do this
canonicalisation, I'm not sure how much it really helps in practice.
And the cost of make_compound_operation failing to recognise the
(possibly quite convoluted) results of substituting into an expanded
compound operation can be quite severe.

> Interestingly, this is sometimes already the case, for example, we simplify
> (ffs (zero_extend ...)) in cases where we wouldn't simplify the equivalent
> (ffs (and ... 255)) [but these are perhaps also just missed optimizations].

Yeah, sounds like it.

Thanks,
Richard

> Thoughts?  I'll also dig more into the history of these (combine) functions.
>
> Cheers,
> Roger
> --

next prev parent reply	other threads:[~2022-09-12 14:56 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-11 23:40 Roger Sayle
2022-09-12 10:31 ` Richard Sandiford
2022-09-12 14:16   ` Roger Sayle
2022-09-12 14:56     ` Richard Sandiford [this message]
2022-09-12 16:47 ` Segher Boessenkool
2022-09-12 18:09   ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mptv8ps63pv.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=roger@nextmovesoftware.com \
    --cc=segher@kernel.crashing.org \
    --cc=tnfchris@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).