public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Manolis Tsamis <manolis.tsamis@vrull.eu>
To: Manolis Tsamis <manolis.tsamis@vrull.eu>,
	gcc-patches@gcc.gnu.org,
	 Philipp Tomsich <philipp.tomsich@vrull.eu>,
	Jakub Jelinek <jakub@redhat.com>,
	 Andrew Pinski <apinski@marvell.com>,
	Robin Dapp <rdapp@linux.ibm.com>,
	richard.sandiford@arm.com
Subject: Re: [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets
Date: Wed, 30 Aug 2023 13:30:37 +0300	[thread overview]
Message-ID: <CAM3yNXpRAapjRHp91z-VdUeTVE7zvjCYMfL4jccqe0kdpRDD+w@mail.gmail.com> (raw)
In-Reply-To: <mpt7cqxm5eq.fsf@arm.com>

On Tue, Jul 18, 2023 at 9:38 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Manolis Tsamis <manolis.tsamis@vrull.eu> writes:
> > On Tue, Jul 18, 2023 at 1:12 AM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Manolis Tsamis <manolis.tsamis@vrull.eu> writes:
> >> > noce_convert_multiple_sets has been introduced and extended over time to handle
> >> > if conversion for blocks with multiple sets. Currently this is focused on
> >> > register moves and rejects any sort of arithmetic operations.
> >> >
> >> > This series is an extension to allow more sequences to take part in if
> >> > conversion. The first patch is a required change to emit correct code and the
> >> > second patch whitelists a larger number of operations through
> >> > bb_ok_for_noce_convert_multiple_sets.
> >> >
> >> > For targets that have a rich selection of conditional instructions,
> >> > like aarch64, I have seen an ~5x increase of profitable if conversions for
> >> > multiple set blocks in SPEC benchmarks. Also tested with a wide variety of
> >> > benchmarks and I have not seen performance regressions on either x64 / aarch64.
> >>
> >> Interesting results.  Are you free to say which target you used for aarch64?
> >>
> >> If I've understood the cost heuristics correctly, we'll allow a "predictable"
> >> branch to be replaced by up to 5 simple conditional instructions and an
> >> "unpredictable" branch to be replaced by up to 10 simple conditional
> >> instructions.  That seems pretty high.  And I'm not sure how well we
> >> guess predictability in the absence of real profile information.
> >>
> >> So my gut instinct was that the limitations of the current code might
> >> be saving us from overly generous limits.  It sounds from your results
> >> like that might not be the case though.
> >>
> >> Still, if it does turn out to be the case in future, I agree we should
> >> fix the costs rather than hamstring the code.
> >>
> >
> > My writing may have been confusing, but with "~5x increase of
> > profitable if conversions" I just meant that ifcvt considers these
> > profitable, not that they actually are when executed in particular
> > hardware.
>
> Yeah, sorry, I'd read that part as measuring the number of if-converisons.
> But...
>
> > But at the same time I haven't yet seen any obvious performance
> > regressions in some benchmarks that I have ran.
>
> ...it was a pleasant surprise that doing so much more if-conversion
> didn't make things noticeably worse. :)
>
> > In any case it could be interesting to microbenchmark branches vs
> > conditional instructions and see how sane these numbers are.
>
> I think for this we really do need the real workload, since it's
> hard to measure realistic branch mispredict penalties with a
> microbenchmark.
>
Yes indeed. I'm still trying to get properly analyze the effects of this change.
I'll share when I have something interesting on the benchmarks side.

> > [...]
> >> (2) Don't you also need to update the "rewiring" mechanism, to cope
> >>     with cases where the then block has something like:
> >>
> >>       if (a == 0) {
> >>         a = b op c;       ->    a' = a == 0 ? b op c : a;
> >>         d = a op b;       ->    d = a == 0 ? a' op b : d;
> >>       }                         a = a'
> >>
> >>     At the moment the code only handles regs and subregs, whereas but IIUC
> >>     it should now iterate over all the regs in the SET_SRC.  And I suppose
> >>     that creates the need for multiple possible rewirings in the same insn,
> >>     so that it isn't a simple insn -> index mapping any more.
> >>
> >
> > Indeed, I believe this current patch cannot properly handle these. I
> > will create testcases for this and see what changes need to be done in
> > the next iteration so that correct code is generated.
>
> Perhaps we should change the way that the rewiring is done.
> At the moment, need_cmov_or_rewire detects the renumbering
> ahead of time.  But it might be easier to:
>
> - have noce_convert_multiple_sets_1 keep track of which
>   SET_DESTs it has replaced with temporaries.
>
> - for each subsequent instruction, go through that list in order
>   and use insn_propagation (from recog.h) to apply each replacement.
>
> That might be simpler, and should also be more robust, since the
> insn_propagation routines return false on failure.
>
Thanks, I've tried various designs with these ideas, including moving
the rewiring code to noce_convert_multiple_sets_1  but there was
always some issue.
For example we cannot remove the need_cmov_or_rewire function because
the part that calculates need_no_cmov needs to iterate once before the
conversion, otherwise it wouldn't work.
Also noce_convert_multiple_sets_1 is ran twice each time so doing the
new rewiring logic which is somewhat expensive two times felt like a
regression without making things much easier.

In the end I opted to keep need_cmov_or_rewire (albait renamed) and
introduce a new struct noce_multiple_sets_info, which I thing made
things much nicer.
That includes taking care of the very long signatures of the helper
functions and improving efficiency by removing a lot of vectors/hash
set/hash maps.

I've also added a SCALAR_INT_MODE_P check that takes care of the
force_reg x86-64 issue.

These changes can be found in the two last commits of the new series.
Any feedback on the new implementation would be welcome:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628820.html

> Thanks,
> Richard
>

Thanks,
Manolis

      reply	other threads:[~2023-08-30 10:31 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-13 14:09 Manolis Tsamis
2023-07-13 14:09 ` [PATCH v2 1/2] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets Manolis Tsamis
2023-07-13 14:09 ` [PATCH v2 2/2] ifcvt: Allow more operations in multiple set if conversion Manolis Tsamis
2023-07-17 22:12 ` [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Richard Sandiford
2023-07-18 17:03   ` Manolis Tsamis
2023-07-18 18:38     ` Richard Sandiford
2023-08-30 10:30       ` Manolis Tsamis [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAM3yNXpRAapjRHp91z-VdUeTVE7zvjCYMfL4jccqe0kdpRDD+w@mail.gmail.com \
    --to=manolis.tsamis@vrull.eu \
    --cc=apinski@marvell.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=philipp.tomsich@vrull.eu \
    --cc=rdapp@linux.ibm.com \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).