From: Jeff Law <law@redhat.com>
To: Jan Hubicka <hubicka@ucw.cz>, gcc-patches@gcc.gnu.org, rguenther@suse.de
Subject: Re: Early jump threading
Date: Thu, 18 Aug 2016 19:55:00 -0000 [thread overview]
Message-ID: <fe4c6cfd-34ef-23f7-543a-a6c07ac7157f@redhat.com> (raw)
In-Reply-To: <20160811140235.GA68714@kam.mff.cuni.cz>
On 08/11/2016 08:02 AM, Jan Hubicka wrote:
> Hi,
> this patch adds early jump threading pass. Jump threading is one of most
> common cases where estimated profile becomes corrupted, because the branches
> are predicted independently beforehand. This patch simply adds early mode to
> jump threader that does not permit code growth and thus only win/win threads
> are done before profile is constructed.
>
> Naturally this also improves IPA decisions because code size/time is estimated
> more acurately.
>
> It is not very cool to add another pass and the jump threader is already
> run 5 times. I think incrementally we could drop one of late threaders at least.
> I tried to measure compile time effects but they are in wash. Moreover the patch
> pays for itself in cc1plus code size:
>
> Before patch to tweak size estimates: 27779964
> Current mainline: 27748900
> With patch applied: 27716173
>
> So despite adding few functions the code size effect is positive which I think
> is quite nice.
>
> Given the fact that jump threading seems quite lightweit, I wonder why it is
> guarded by flag_expensive_optimizations? Is it expensive in terms of code
> size?
>
> The effectivity of individual threading passes is as follows (for tramp3d)
>
> mainline with patch
> pass thread count profile mismatches thread count profile mismatch
> early 525
> 1 1853 1900 316 337
> 2 4 812 4 288
So the real question here is what did VRP1 threading do between the
standalone thread1/thread2 passes. ie, it may look tempting to
eliminate the standalone thread2 pass, but threading done by VRP1 is
expected to be pushed into the thread{1,2} passes. So removing thread2
is premature at this point.
>
> The patch distorts testusite somewhat, in most cases we only want to update
> dump files or disable early threading:
>
> +XPASS: gcc.dg/uninit-15.c (test for warnings, line 13)
> +XPASS: gcc.dg/uninit-15.c (test for warnings, line 23)
> +FAIL: gcc.dg/uninit-15.c (test for warnings, line 24)
This seems like a step backwards to me. We're warning about "j", but in
the context in which its used, we really ought to be warning about "i".
And we lost a later warning.
> +FAIL: gcc.dg/tree-ssa/pr68198.c scan-tree-dump-times thread1 "Registering FSM" 1
Moved into early-threading, so adjusting test for that seems appropriate.
> +FAIL: gcc.dg/tree-ssa/pr69196-1.c scan-tree-dump thread1 "FSM did not thread around loop and would copy too many statements"
I'm not sure how to best check this one. Threading this test too
aggressively results in something like a 2X bloat in the resulting code
on the sparc.
> +FAIL: gcc.dg/tree-ssa/ssa-dom-thread-2b.c scan-tree-dump-times thread1 "Jumps threaded: 1" 1
Moved into early threading, so adjusting the test for that seems
appropriate to me.
> +FAIL: gcc.dg/tree-ssa/ssa-thread-13.c scan-tree-dump thread1 "FSM"
Moved into early threading, so adjusting the test for that seems
appropriate to me.
> +FAIL: gcc.dg/tree-ssa/vrp01.c scan-tree-dump-times vrp1 "Folding predicate p_.*to 1" 1
So the optimization moved into fre1 as a result of early jump threading
which is good. We should probably check for that, even if the test is
badly mis-named now.
But this is showing one of the cases where the backwards threader is
deficient. It's not using the conditional as an implied set. It's a
known limitation and I'm hoping Andrew's work makes that an easier
problem to solve.
> +FAIL: gcc.dg/tree-ssa/vrp56.c scan-tree-dump thread1 "Jumps threaded: 1"
Moved into early threading, so adjusting the test for that seems
appropriate to me.
> +FAIL: gcc.dg/tree-ssa/vrp92.c scan-tree-dump vrp1 "res_.: \\\\[1, 1\\\\]"
Some results adjustment seems to be in order here. I think verifying
that early threading finds the obvious jump threads, then vrp is able to
eliminate the 2nd conditional and the result is a collapsed return i for
the entire function.
>
> This testcase is the now probably unnecesary heuristics (it may still be
> relevant in cases we do not thread because of size bounds but its effectivity
> may not be worth the maintenance cost):
>
> +FAIL: g++.dg/predict-loop-exit-1.C -std=gnu++11 scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
> +FAIL: g++.dg/predict-loop-exit-1.C -std=gnu++11 scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
> +FAIL: g++.dg/predict-loop-exit-1.C -std=gnu++14 scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
> +FAIL: g++.dg/predict-loop-exit-1.C -std=gnu++14 scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
> +FAIL: g++.dg/predict-loop-exit-1.C -std=gnu++98 scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
> +FAIL: g++.dg/predict-loop-exit-1.C -std=gnu++98 scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
> +FAIL: g++.dg/predict-loop-exit-2.C -std=gnu++11 scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 1
> +FAIL: g++.dg/predict-loop-exit-2.C -std=gnu++11 scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 2
> +FAIL: g++.dg/predict-loop-exit-2.C -std=gnu++14 scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 1
> +FAIL: g++.dg/predict-loop-exit-2.C -std=gnu++14 scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 2
> +FAIL: g++.dg/predict-loop-exit-2.C -std=gnu++98 scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 1
> +FAIL: g++.dg/predict-loop-exit-2.C -std=gnu++98 scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 2
> +FAIL: g++.dg/predict-loop-exit-3.C -std=gnu++11 scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
> +FAIL: g++.dg/predict-loop-exit-3.C -std=gnu++11 scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
> +FAIL: g++.dg/predict-loop-exit-3.C -std=gnu++14 scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
> +FAIL: g++.dg/predict-loop-exit-3.C -std=gnu++14 scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
> +FAIL: g++.dg/predict-loop-exit-3.C -std=gnu++98 scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
> +FAIL: g++.dg/predict-loop-exit-3.C -std=gnu++98 scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
You're the best judge on how to deal with these. I leave that to your
discretion.
> If the patch seems acceptable, I will do the updates. One option why I did
> not do that is that it seems to be now posisble to pass parameters to passes
> from passes.def, so perhaps we do not need early_thread_jumps, but doing so is
> consistent with way we handle other early passes.
Your call on this. One of the things in the related literature is a
limiter on how far back the use-def chains we go. I'd envisioned adding
that limiter and having early jump threading use a different (much
smaller) limiter. But I don't think it's critical at this point.
>
> Bootstrapped/regtested x86_64-linux
> Honza
>
> * passes.def (pass_early_thread_jumps): Schedule after forwprop.
> * tree-pass.h (make_pass_early_thread_jumps): Declare.
> * tree-ssa-threadbackward.c (fsm_find_thread_path,
> fsm_find_thread_path, profitable_jump_thread_path,
> fsm_find_control_statement_thread_paths,
> find_jump_threads_backwards): Add speed_p parameter.
> (pass_data_early_thread_jumps): New pass.
> (make_pass_early_thread_jumps): New function.
LGTM. OK after fixing up the tests.
Jeff
prev parent reply other threads:[~2016-08-18 19:55 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-08-11 14:02 Jan Hubicka
2016-08-11 14:06 ` Richard Biener
2016-08-11 14:27 ` Jan Hubicka
2016-08-11 15:50 ` Richard Biener
2016-08-11 18:42 ` Jeff Law
2016-09-18 19:30 ` Jan Hubicka
2016-09-19 6:53 ` Andrew Pinski
2016-09-19 9:29 ` Jan Hubicka
2016-10-18 15:56 ` James Greenhalgh
2016-12-15 11:54 ` [Patch] Undermine the jump threading cost model to fix PR77445 James Greenhalgh
2016-12-16 12:34 ` Richard Biener
2016-09-19 7:59 ` Early jump threading Christophe Lyon
2016-09-20 8:41 ` Thomas Preudhomme
2016-08-11 18:41 ` Jeff Law
2016-08-12 6:02 ` Richard Biener
2016-08-15 17:06 ` Jeff Law
2016-08-11 18:30 ` Jeff Law
2016-08-11 18:29 ` Jeff Law
2016-08-16 11:02 ` Jan Hubicka
2016-08-16 15:42 ` Jeff Law
2016-08-12 8:02 ` Richard Biener
2016-08-12 11:27 ` Jan Hubicka
2016-08-15 17:25 ` Jeff Law
2016-08-16 16:52 ` Jeff Law
2016-08-18 19:55 ` Jeff Law [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fe4c6cfd-34ef-23f7-543a-a6c07ac7157f@redhat.com \
--to=law@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=hubicka@ucw.cz \
--cc=rguenther@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).