From: Richard Biener <rguenther@suse.de>
To: Michael Matz <matz@suse.de>
Cc: Jakub Jelinek <jakub@redhat.com>, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] middle-end/114480 - IDF compute is slow
Date: Thu, 28 Mar 2024 09:12:49 +0100 (CET) [thread overview]
Message-ID: <63s447o4-pn43-noqp-rr6s-qpq7onsso813@fhfr.qr> (raw)
In-Reply-To: <f919e0e6-9024-37cb-6a30-04f2f232b098@suse.de>
On Wed, 27 Mar 2024, Michael Matz wrote:
> Hey,
>
> On Wed, 27 Mar 2024, Jakub Jelinek wrote:
>
> > > @@ -1712,12 +1711,9 @@ compute_idf (bitmap def_blocks, bitmap_head *dfs)
> > > gcc_checking_assert (bb_index
> > > < (unsigned) last_basic_block_for_fn (cfun));
> > >
> > > - EXECUTE_IF_AND_COMPL_IN_BITMAP (&dfs[bb_index], phi_insertion_points,
> > > - 0, i, bi)
> > > - {
> > > + EXECUTE_IF_SET_IN_BITMAP (&dfs[bb_index], 0, i, bi)
> > > + if (bitmap_set_bit (phi_insertion_points, i))
> > > bitmap_set_bit (work_set, i);
> > > - bitmap_set_bit (phi_insertion_points, i);
> > > - }
> > > }
> >
> > I don't understand why the above is better.
> > Wouldn't it be best to do
> > bitmap_ior_and_compl_into (work_set, &dfs[bb_index],
> > phi_insertion_points);
> > bitmap_ior_into (phi_insertion_points, &dfs[bb_index]);
> > ?
>
> I had the same hunch, but:
>
> 1) One would have to make work_set be non-tree-view again (which with the
> current structure is a wash anyway, and that makes sense as accesses to
> work_set aren't heavily random here).
The tree-view is a wash indeed (I tried many things).
> 2) But doing that and using bitmap_ior.._into is still measurably slower:
> on a reduced testcase with -O0 -fno-checking, proposed structure
> (tree-view or not-tree-view workset doesn't matter):
>
> tree SSA rewrite : 14.93 ( 12%) 0.01 ( 2%) 14.95 (
> 12%) 27M ( 8%)
>
> with non-tree-view, and your suggestion:
>
> tree SSA rewrite : 20.68 ( 12%) 0.02 ( 4%) 20.75 (
> 12%) 27M ( 8%)
>
> I can only speculate that the usually extreme sparsity of the bitmaps in
> question make the setup costs of the two bitmap_ior calls actually more
> expensive than the often skipped second call to bitmap_set_bit in Richis
> proposed structure. (That or cache effects)
So slightly "better" than Jakubs variant would be
if (bitmap_ior_and_compl_into (work_set, &dfs[bb_index],
phi_insertion_points))
bitmap_ior_into (phi_insertion_points, &dfs[bb_index]);
since phi_insertion_points grows that IOR becomes more expensive over
time.
The above for me (today Zen2, yesterday Zen4) is
tree SSA rewrite : 181.02 ( 37%)
with unconditiona ior_into:
tree SSA rewrite : 180.93 ( 36%)
while my patch is
tree SSA rewrite : 22.04 ( 6%)
not sure what uarch Micha tested on. I think the testcase has simply
many variables we write into SSA (man compute_idf calls), many BBs
but very low popcount DFS[] so iterating over DFS[] only is very
beneficial here as opposed to also walk phi_insertion_points
and work_set. I think low popcount DFS[] is quite typical
for a CFG - but for sure popcount of DFS[] is going to be lower
than popcount of the IDF (phi_insertion_points).
Btw, with my patch compute_idf is completely off the profile so it's
hard to improve further (we do process blocks possibly twice for
example, but that doesn't make a difference here)
Indeed doing statistics shows the maximum popcount of a dominance
frontier is 8 but 99% have just a single block. But the popcount
of the final IDF is more than 10000 half of the time and more
than 1000 90% of the time.
I have pushed the patch now.
Richard.
next prev parent reply other threads:[~2024-03-28 8:12 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <96606.124032711422300395@us-mta-154.us.mimecast.lan>
2024-03-27 16:02 ` Jakub Jelinek
2024-03-27 18:44 ` Michael Matz
2024-03-27 18:57 ` Jakub Jelinek
2024-03-28 8:12 ` Richard Biener [this message]
2024-03-27 15:42 Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=63s447o4-pn43-noqp-rr6s-qpq7onsso813@fhfr.qr \
--to=rguenther@suse.de \
--cc=gcc-patches@gcc.gnu.org \
--cc=jakub@redhat.com \
--cc=matz@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).