public inbox for gcc-patches@gcc.gnu.org
From: Jeff Law <law@redhat.com>
To: Richard Biener <rguenther@suse.de>, Jan Hubicka <hubicka@ucw.cz>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: Early jump threading
Date: Tue, 16 Aug 2016 16:52:00 -0000	[thread overview]
Message-ID: <c79c8532-b40e-ea74-d491-9851d03d9c1d@redhat.com> (raw)
In-Reply-To: <alpine.LSU.2.11.1608120951170.26629@t29.fhfr.qr>

On 08/12/2016 02:02 AM, Richard Biener wrote:
> On Thu, 11 Aug 2016, Jan Hubicka wrote:
>
>> Hi,
>> this patch adds an early jump threading pass.  Jump threading is one of the
>> most common cases where the estimated profile becomes corrupted, because the
>> branches are predicted independently beforehand.  This patch simply adds an
>> early mode to the jump threader that does not permit code growth, so only
>> win/win threads are done before the profile is constructed.
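To make the win/win case concrete, here is a minimal hand-written sketch (not
taken from the patch or its testcases; the names are made up) of the kind of
thread that needs no code duplication:

  extern void bar (void);
  extern void baz (void);

  void
  foo (int a)
  {
    int x;
    if (a > 0)
      x = 1;
    else
      x = 2;
    /* The joiner block below contains nothing beyond the test, so threading
       each incoming edge straight to bar () or baz () duplicates no
       statements: the value of x, and hence the branch outcome, is already
       known on both paths.  */
    if (x == 1)
      bar ();
    else
      baz ();
  }

Both predecessors determine the outcome of the second test, so both edges can
be threaded with zero growth, which is exactly the case the size-restricted
early mode limits itself to.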
>>
>> Naturally this also improves IPA decisions because code size/time is
>> estimated more accurately.
>>
>> It is not very cool to add another pass, and the jump threader is already
>> run 5 times.  I think incrementally we could drop at least one of the late
>> threaders.  I tried to measure compile-time effects, but they are in the
>> wash.  Moreover, the patch pays for itself in cc1plus code size:
>>
>> Before patch to tweak size estimates: 27779964
>> Current mainline:                     27748900
>> With patch applied:                   27716173
>>
>> So despite adding a few functions, the code size effect is positive, which
>> I think is quite nice.
>>
>> Given that jump threading seems quite lightweight, I wonder why it is
>> guarded by flag_expensive_optimizations.  Is it expensive in terms of code
>> size?
>>
>> The effectiveness of the individual threading passes is as follows (for tramp3d):
>>
>>       mainline                              with patch
>> pass  thread count     profile mismatches   thread count    profile mismatches
>> early                                       525
>> 1     1853             1900                 316             337
>> 2     4                812                  4               288
>> 3     24               1450                 32              947
>> 4     76               1457                 75              975
>>
>> So at least the tramp3d data seem to suggest that we can drop the second
>> occurrence of jump threading and that most of the job thread1 does can be
>> done by the size-restricted early version (the lower absolute counts are
>> caused by the fact that threadable paths get duplicated by the inliner).
>> A 50% drop in profile distortion is not bad.  I wonder why basically every
>> threaded path seems to introduce a mismatch.
>>
>> The patch disturbs the testsuite somewhat; in most cases we only need to
>> update dump files or disable early threading:
>>
>> +XPASS: gcc.dg/uninit-15.c  (test for warnings, line 13)
>> +XPASS: gcc.dg/uninit-15.c  (test for warnings, line 23)
>> +FAIL: gcc.dg/uninit-15.c  (test for warnings, line 24)
>> +FAIL: gcc.dg/tree-ssa/pr68198.c scan-tree-dump-times thread1 "Registering FSM" 1
>> +FAIL: gcc.dg/tree-ssa/pr69196-1.c scan-tree-dump thread1 "FSM did not thread around loop and would copy too many statements"
>> +FAIL: gcc.dg/tree-ssa/ssa-dom-thread-2b.c scan-tree-dump-times thread1 "Jumps threaded: 1" 1
>> +FAIL: gcc.dg/tree-ssa/ssa-thread-13.c scan-tree-dump thread1 "FSM"
>> +FAIL: gcc.dg/tree-ssa/vrp01.c scan-tree-dump-times vrp1 "Folding predicate p_.*to 1" 1
>> +FAIL: gcc.dg/tree-ssa/vrp56.c scan-tree-dump thread1 "Jumps threaded: 1"
>> +FAIL: gcc.dg/tree-ssa/vrp92.c scan-tree-dump vrp1 "res_.: \\\\[1, 1\\\\]"
>>
>> This testcase covers the now probably unnecessary heuristic (it may still
>> be relevant in cases we do not thread because of size bounds, but its
>> effectiveness may not be worth the maintenance cost):
>>
>> +FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++11  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
>> +FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++11  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
>> +FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++14  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
>> +FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++14  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
>> +FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++98  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
>> +FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++98  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
>> +FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++11  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 1
>> +FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++11  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 2
>> +FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++14  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 1
>> +FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++14  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 2
>> +FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++98  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 1
>> +FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++98  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 2
>> +FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++11  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
>> +FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++11  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
>> +FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++14  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
>> +FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++14  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
>> +FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++98  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
>> +FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++98  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
>>
>> If the patch seems acceptable, I will do the updates.  One reason I have
>> not done so yet is that it now seems possible to pass parameters to passes
>> from passes.def, so perhaps we do not need early_thread_jumps at all; on
>> the other hand, a separate pass is consistent with the way we handle other
>> early passes.
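For what it's worth, a hedged sketch of what the two alternatives might look
like in passes.def; the NEXT_PASS_WITH_ARG line, its argument name, and the
exact pass names are illustrative guesses rather than quotes from the patch:

  /* gcc/passes.def (sketch only): either a dedicated early pass ...  */
  NEXT_PASS (pass_early_thread_jumps);

  /* ... or, if the parameter mechanism mentioned above were used, a single
     pass cloned with a flag (hypothetical name) that forbids code growth
     when it runs in the early pipeline:  */
  NEXT_PASS_WITH_ARG (pass_thread_jumps, false /* allow_growth_p */);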
>
> A few comments on the patch itself below.  Note that currently the pass is
> enabled at -O[s2]+ only, which I think is fine.  We certainly do not
> want it at -Og (and nobody seems to be trying to make -O1 sane anyway).
>
> Given your statistics, we want to remove thread2 and thread3 (I guess
> tracer immediately before thread_jumps is odd as well; I'd at least
> swap them).  The fact that thread4 threads more than thread3 argues
> for DOM being required as an FSM thread enabler rather than the
> other way around.  So I'd try generally moving the threader
> _after_ the CSE-like optimizations.
What we want to be doing is threading *before* CSE-like optimizations 
and removing the DOM/VRP jump threaders.  Based on what I've seen for 
the last 15 years, threading helps CSE far more often than the opposite.
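A tiny hand-written example (not from this thread) of threading exposing a 
redundancy that CSE can then remove:

  int
  f (int *p, int *q)
  {
    int r = 0;
    if (p == q)
      r = *p;
    /* The outcome of this second test is known on both incoming paths, so a
       backwards threader can thread it away.  On the straightened p == q
       path the loads *p and *q access the same memory, which a later
       CSE/DOM pass can then eliminate.  */
    if (p == q)
      r += *q;
    return r;
  }

Without threading, the block loading *q is reached from both paths and there 
is no dominating load of *p for CSE to reuse; thread first and the redundancy 
becomes visible.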

The fact that we do CSE first, then threading, is an artifact of the 
threading implementation wanting to exploit the CSE tables.  But that's 
not necessary in a backwards-walking threader.

So what we need to be doing is figuring out which key things aren't being 
handled so that we can drop the DOM/VRP threaders.  Once that's done, we 
will want to look at whether any of the standalone threading 
passes continue to make sense.

jeff

Thread overview: 25+ messages
2016-08-11 14:02 Jan Hubicka
2016-08-11 14:06 ` Richard Biener
2016-08-11 14:27   ` Jan Hubicka
2016-08-11 15:50     ` Richard Biener
2016-08-11 18:42       ` Jeff Law
2016-09-18 19:30         ` Jan Hubicka
2016-09-19  6:53           ` Andrew Pinski
2016-09-19  9:29             ` Jan Hubicka
2016-10-18 15:56               ` James Greenhalgh
2016-12-15 11:54                 ` [Patch] Undermine the jump threading cost model to fix PR77445 James Greenhalgh
2016-12-16 12:34                   ` Richard Biener
2016-09-19  7:59           ` Early jump threading Christophe Lyon
2016-09-20  8:41             ` Thomas Preudhomme
2016-08-11 18:41     ` Jeff Law
2016-08-12  6:02       ` Richard Biener
2016-08-15 17:06         ` Jeff Law
2016-08-11 18:30   ` Jeff Law
2016-08-11 18:29 ` Jeff Law
2016-08-16 11:02   ` Jan Hubicka
2016-08-16 15:42     ` Jeff Law
2016-08-12  8:02 ` Richard Biener
2016-08-12 11:27   ` Jan Hubicka
2016-08-15 17:25     ` Jeff Law
2016-08-16 16:52   ` Jeff Law [this message]
2016-08-18 19:55 ` Jeff Law
