From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 6572 invoked by alias); 19 Sep 2016 09:22:42 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 6562 invoked by uid 89); 19 Sep 2016 09:22:41 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-3.0 required=5.0 tests=AWL,BAYES_00,KAM_ASCII_DIVIDERS,KAM_LAZY_DOMAIN_SECURITY,RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=3-4, hubicka@ucw.cz, D*ucw.cz, hubickaucwcz X-HELO: nikam.ms.mff.cuni.cz Received: from nikam.ms.mff.cuni.cz (HELO nikam.ms.mff.cuni.cz) (195.113.20.16) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 19 Sep 2016 09:22:30 +0000 Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202) id F41E3542B3B; Mon, 19 Sep 2016 11:22:27 +0200 (CEST) Date: Mon, 19 Sep 2016 09:29:00 -0000 From: Jan Hubicka To: Andrew Pinski Cc: Jan Hubicka , Jeff Law , Richard Biener , GCC Patches Subject: Re: Early jump threading Message-ID: <20160919092227.GB354@kam.mff.cuni.cz> References: <20160811140235.GA68714@kam.mff.cuni.cz> <20160811142700.GA57561@kam.mff.cuni.cz> <9B0D3D88-1653-4CB6-89BB-948064048D8F@suse.de> <20160918184850.GB58387@kam.mff.cuni.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-SW-Source: 2016-09/txt/msg01151.txt.bz2 > On Mon, Sep 19, 2016 at 2:48 AM, Jan Hubicka wrote: > > Hi, > > this is the patch compensating testsuite I commited after re-testing > > on x86_64-linux. > > > > Other placements of early_thread_jumps does not work veyr well (at least in > > current implementation). Putting it before forwprop disables about 15% of > > threadings. Placing it after DCE makes inliner to not see much of benefits > > because threading requires a cleanup propagation+DCE after itself. > > So unless we extend threader to be smarter or add extra DCE cleanup, i think > > this is best placement. > > > This caused (another) 3-4% degradation in coremarks on ThunderX. Hmm, this is interesting. The patch should have "fixed" the previous degradation by making the profile correct (backward threader still doe not update it, but because most threading now happens early and profile is built afterwards this should be less of issue). I am now looking into the profile update issues and will try to check why coremarks degrade again. Honza > > Thanks, > Andrew > > > > > Honza > > > > * passes.def (pass_early_thread_jumps): Schedule after forwprop. > > * tree-pass.h (make_pass_early_thread_jumps): Declare. > > * tree-ssa-threadbackward.c (fsm_find_thread_path, > > fsm_find_thread_path, profitable_jump_thread_path, > > fsm_find_control_statement_thread_paths, > > find_jump_threads_backwards): Add speed_p parameter. > > (pass_data_early_thread_jumps): New pass. > > (make_pass_early_thread_jumps): New function. > > > > * g++.dg/predict-loop-exit-1.C: Disable early jump threading. > > * g++.dg/predict-loop-exit-2.C: Disable early jump threading. > > * g++.dg/predict-loop-exit-3.C: Disable early jump threading. > > * gcc.dg/tree-ssa/pr69196-1.c: Disable early jump threading. > > * gcc.dg/tree-ssa/vrp01.c: Disable early jump threading. > > * gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Disable early jump threading. > > * gcc.dg/tree-ssa/pr68198.c: Scan ethread dump. > > * gcc.dg/tree-ssa/ssa-thread-13.c: Scan ethread dump. > > * gcc.dg/tree-ssa/vrp56.c: Scan ethread dump. > > * gcc.dg/tree-ssa/vrp92.c: Scan ethread dump. > > * gcc.dg/uninit-15.c: Swap xfailed and non-xfailed alternative. > > > > Index: passes.def > > =================================================================== > > --- passes.def (revision 240109) > > +++ passes.def (working copy) > > @@ -84,6 +84,7 @@ along with GCC; see the file COPYING3. > > /* After CCP we rewrite no longer addressed locals into SSA > > form if possible. */ > > NEXT_PASS (pass_forwprop); > > + NEXT_PASS (pass_early_thread_jumps); > > NEXT_PASS (pass_sra_early); > > /* pass_build_ealias is a dummy pass that ensures that we > > execute TODO_rebuild_alias at this point. */ > > Index: testsuite/g++.dg/predict-loop-exit-1.C > > =================================================================== > > --- testsuite/g++.dg/predict-loop-exit-1.C (revision 240109) > > +++ testsuite/g++.dg/predict-loop-exit-1.C (working copy) > > @@ -1,5 +1,5 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-O2 -fdump-tree-profile_estimate" } */ > > +/* { dg-options "-O2 -fdump-tree-profile_estimate -fdisable-tree-ethread" } */ > > > > int g; > > int foo(); > > Index: testsuite/g++.dg/predict-loop-exit-2.C > > =================================================================== > > --- testsuite/g++.dg/predict-loop-exit-2.C (revision 240109) > > +++ testsuite/g++.dg/predict-loop-exit-2.C (working copy) > > @@ -1,5 +1,5 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-O2 -fdump-tree-profile_estimate" } */ > > +/* { dg-options "-O2 -fdump-tree-profile_estimate -fdisable-tree-ethread" } */ > > > > int g; > > int foo(); > > Index: testsuite/g++.dg/predict-loop-exit-3.C > > =================================================================== > > --- testsuite/g++.dg/predict-loop-exit-3.C (revision 240109) > > +++ testsuite/g++.dg/predict-loop-exit-3.C (working copy) > > @@ -1,5 +1,5 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-O2 -fdump-tree-profile_estimate" } */ > > +/* { dg-options "-O2 -fdump-tree-profile_estimate -fdisable-tree-ethread" } */ > > > > int g; > > int foo(); > > Index: testsuite/gcc.dg/tree-ssa/pr68198.c > > =================================================================== > > --- testsuite/gcc.dg/tree-ssa/pr68198.c (revision 240109) > > +++ testsuite/gcc.dg/tree-ssa/pr68198.c (working copy) > > @@ -1,5 +1,5 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-O2 -fdump-tree-thread1-details" } */ > > +/* { dg-options "-O2 -fdump-tree-thread1-details -fdisable-tree-ethread" } */ > > > > extern void abort (void); > > > > @@ -40,4 +40,4 @@ c_finish_omp_clauses (tree clauses) > > /* There are 3 FSM jump threading opportunities, two of which will > > get filtered out. */ > > /* { dg-final { scan-tree-dump-times "Registering FSM" 1 "thread1"} } */ > > -/* { dg-final { scan-tree-dump-times "FSM Thread through multiway branch without threading a multiway branch" 2 "thread1"} } */ > > +/* { dg-final { scan-tree-dump-times "FSM Thread through multiway branch without threading a multiway branch" 2 "ethread"} } */ > > Index: testsuite/gcc.dg/tree-ssa/pr69196-1.c > > =================================================================== > > --- testsuite/gcc.dg/tree-ssa/pr69196-1.c (revision 240109) > > +++ testsuite/gcc.dg/tree-ssa/pr69196-1.c (working copy) > > @@ -1,5 +1,5 @@ > > /* { dg-do compile { target sparc*-*-* x86_64-*-* } } */ > > -/* { dg-options "-O2 -fdump-tree-thread1-details" } */ > > +/* { dg-options "-O2 -fdump-tree-thread1-details -fdisable-tree-ethread" } */ > > > > /* { dg-final { scan-tree-dump "FSM did not thread around loop and would copy too many statements" "thread1" } } */ > > > > Index: testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c > > =================================================================== > > --- testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c (revision 240109) > > +++ testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c (working copy) > > @@ -1,5 +1,5 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-dom2-stats" } */ > > +/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */ > > > > void foo(); > > void bla(); > > Index: testsuite/gcc.dg/tree-ssa/ssa-thread-13.c > > =================================================================== > > --- testsuite/gcc.dg/tree-ssa/ssa-thread-13.c (revision 240109) > > +++ testsuite/gcc.dg/tree-ssa/ssa-thread-13.c (working copy) > > @@ -1,6 +1,6 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-O2 -fdump-tree-thread1-details" } */ > > -/* { dg-final { scan-tree-dump "FSM" "thread1" } } */ > > +/* { dg-options "-O2 -fdump-tree-ethread-details" } */ > > +/* { dg-final { scan-tree-dump "FSM" "ethread" } } */ > > > > typedef struct rtx_def *rtx; > > typedef const struct rtx_def *const_rtx; > > Index: testsuite/gcc.dg/tree-ssa/vrp01.c > > =================================================================== > > --- testsuite/gcc.dg/tree-ssa/vrp01.c (revision 240109) > > +++ testsuite/gcc.dg/tree-ssa/vrp01.c (working copy) > > @@ -1,5 +1,5 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-O2 -fdump-tree-vrp1" } */ > > +/* { dg-options "-O2 -fdump-tree-vrp1 -fdisable-tree-ethread" } */ > > > > int > > foo (int *p, int i) > > Index: testsuite/gcc.dg/tree-ssa/vrp56.c > > =================================================================== > > --- testsuite/gcc.dg/tree-ssa/vrp56.c (revision 240109) > > +++ testsuite/gcc.dg/tree-ssa/vrp56.c (working copy) > > @@ -1,5 +1,5 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-O2 -fdump-tree-thread1-stats" } */ > > +/* { dg-options "-O2 -fdump-tree-ethread-stats" } */ > > typedef struct basic_block_def *basic_block; > > struct basic_block_def; > > struct edge_def; > > @@ -38,5 +38,5 @@ cleanup_empty_eh (basic_block bb) > > foo (); > > } > > } > > -/* { dg-final { scan-tree-dump "Jumps threaded: 1" "thread1"} } */ > > +/* { dg-final { scan-tree-dump "Jumps threaded: 1" "ethread"} } */ > > > > Index: testsuite/gcc.dg/tree-ssa/vrp92.c > > =================================================================== > > --- testsuite/gcc.dg/tree-ssa/vrp92.c (revision 240109) > > +++ testsuite/gcc.dg/tree-ssa/vrp92.c (working copy) > > @@ -1,5 +1,5 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-O2 -fdump-tree-vrp1-details" } */ > > +/* { dg-options "-O2 -fdump-tree-vrp1-details -fdisable-tree-ethread" } */ > > > > void bar (void); > > int foo (int i, int j) > > Index: tree-pass.h > > =================================================================== > > --- tree-pass.h (revision 240109) > > +++ tree-pass.h (working copy) > > @@ -399,6 +399,7 @@ extern gimple_opt_pass *make_pass_cd_dce > > extern gimple_opt_pass *make_pass_call_cdce (gcc::context *ctxt); > > extern gimple_opt_pass *make_pass_merge_phi (gcc::context *ctxt); > > extern gimple_opt_pass *make_pass_thread_jumps (gcc::context *ctxt); > > +extern gimple_opt_pass *make_pass_early_thread_jumps (gcc::context *ctxt); > > extern gimple_opt_pass *make_pass_split_crit_edges (gcc::context *ctxt); > > extern gimple_opt_pass *make_pass_laddress (gcc::context *ctxt); > > extern gimple_opt_pass *make_pass_pre (gcc::context *ctxt); > > Index: tree-ssa-threadbackward.c > > =================================================================== > > --- tree-ssa-threadbackward.c (revision 240109) > > +++ tree-ssa-threadbackward.c (working copy) > > @@ -61,12 +61,14 @@ get_gimple_control_stmt (basic_block bb) > > /* Return true if the CFG contains at least one path from START_BB to END_BB. > > When a path is found, record in PATH the blocks from END_BB to START_BB. > > VISITED_BBS is used to make sure we don't fall into an infinite loop. Bound > > - the recursion to basic blocks belonging to LOOP. */ > > + the recursion to basic blocks belonging to LOOP. > > + SPEED_P indicate that we could increase code size to improve the code path */ > > > > static bool > > fsm_find_thread_path (basic_block start_bb, basic_block end_bb, > > vec *&path, > > - hash_set *visited_bbs, loop_p loop) > > + hash_set *visited_bbs, loop_p loop, > > + bool speed_p) > > { > > if (loop != start_bb->loop_father) > > return false; > > @@ -82,7 +84,8 @@ fsm_find_thread_path (basic_block start_ > > edge e; > > edge_iterator ei; > > FOR_EACH_EDGE (e, ei, start_bb->succs) > > - if (fsm_find_thread_path (e->dest, end_bb, path, visited_bbs, loop)) > > + if (fsm_find_thread_path (e->dest, end_bb, path, visited_bbs, loop, > > + speed_p)) > > { > > vec_safe_push (path, start_bb); > > return true; > > @@ -101,11 +104,13 @@ fsm_find_thread_path (basic_block start_ > > value on PATH. ARG is the value of that SSA_NAME. > > > > BBI will be appended to PATH when we have a profitable jump threading > > - path. Callers are responsible for removing BBI from PATH in that case. */ > > + path. Callers are responsible for removing BBI from PATH in that case. > > + > > + SPEED_P indicate that we could increase code size to improve the code path */ > > > > static edge > > profitable_jump_thread_path (vec *&path, > > - basic_block bbi, tree name, tree arg) > > + basic_block bbi, tree name, tree arg, bool speed_p) > > { > > /* Note BBI is not in the path yet, hence the +1 in the test below > > to make sure BBI is accounted for in the path length test. */ > > @@ -307,7 +312,7 @@ profitable_jump_thread_path (vec > return NULL; > > } > > > > - if (optimize_edge_for_speed_p (taken_edge)) > > + if (speed_p && optimize_edge_for_speed_p (taken_edge)) > > { > > if (n_insns >= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS)) > > { > > @@ -422,13 +427,15 @@ convert_and_register_jump_thread_path (v > > > > /* We trace the value of the SSA_NAME NAME back through any phi nodes looking > > for places where it gets a constant value and save the path. Stop after > > - having recorded MAX_PATHS jump threading paths. */ > > + having recorded MAX_PATHS jump threading paths. > > + > > + SPEED_P indicate that we could increase code size to improve the code path */ > > > > static void > > fsm_find_control_statement_thread_paths (tree name, > > hash_set *visited_bbs, > > vec *&path, > > - bool seen_loop_phi) > > + bool seen_loop_phi, bool speed_p) > > { > > /* If NAME appears in an abnormal PHI, then don't try to trace its > > value back through PHI nodes. */ > > @@ -496,7 +503,7 @@ fsm_find_control_statement_thread_paths > > hash_set *visited_bbs = new hash_set; > > > > if (fsm_find_thread_path (var_bb, e->src, next_path, visited_bbs, > > - e->src->loop_father)) > > + e->src->loop_father, speed_p)) > > ++e_count; > > > > delete visited_bbs; > > @@ -562,7 +569,7 @@ fsm_find_control_statement_thread_paths > > /* Recursively follow SSA_NAMEs looking for a constant > > definition. */ > > fsm_find_control_statement_thread_paths (arg, visited_bbs, path, > > - seen_loop_phi); > > + seen_loop_phi, speed_p); > > > > path->pop (); > > continue; > > @@ -573,7 +580,8 @@ fsm_find_control_statement_thread_paths > > > > /* If this is a profitable jump thread path, then convert it > > into the canonical form and register it. */ > > - edge taken_edge = profitable_jump_thread_path (path, bbi, name, arg); > > + edge taken_edge = profitable_jump_thread_path (path, bbi, name, arg, > > + speed_p); > > if (taken_edge) > > { > > if (bb_loop_depth (taken_edge->src) > > @@ -589,7 +597,7 @@ fsm_find_control_statement_thread_paths > > > > if (TREE_CODE (arg) == SSA_NAME) > > fsm_find_control_statement_thread_paths (arg, visited_bbs, > > - path, seen_loop_phi); > > + path, seen_loop_phi, speed_p); > > > > else > > { > > @@ -599,7 +607,7 @@ fsm_find_control_statement_thread_paths > > path->pop (); > > > > edge taken_edge = profitable_jump_thread_path (path, var_bb, > > - name, arg); > > + name, arg, speed_p); > > if (taken_edge) > > { > > if (bb_loop_depth (taken_edge->src) > > @@ -623,10 +631,11 @@ fsm_find_control_statement_thread_paths > > is a constant. Record such paths for jump threading. > > > > It is assumed that BB ends with a control statement and that by > > - finding a path where NAME is a constant, we can thread the path. */ > > + finding a path where NAME is a constant, we can thread the path. > > + SPEED_P indicate that we could increase code size to improve the code path */ > > > > void > > -find_jump_threads_backwards (basic_block bb) > > +find_jump_threads_backwards (basic_block bb, bool speed_p) > > { > > gimple *stmt = get_gimple_control_stmt (bb); > > if (!stmt) > > @@ -656,7 +665,8 @@ find_jump_threads_backwards (basic_block > > hash_set *visited_bbs = new hash_set; > > > > max_threaded_paths = PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATHS); > > - fsm_find_control_statement_thread_paths (name, visited_bbs, bb_path, false); > > + fsm_find_control_statement_thread_paths (name, visited_bbs, bb_path, false, > > + speed_p); > > > > delete visited_bbs; > > vec_free (bb_path); > > @@ -706,7 +716,7 @@ pass_thread_jumps::execute (function *fu > > FOR_EACH_BB_FN (bb, fun) > > { > > if (EDGE_COUNT (bb->succs) > 1) > > - find_jump_threads_backwards (bb); > > + find_jump_threads_backwards (bb, true); > > } > > bool changed = thread_through_all_blocks (true); > > > > @@ -721,3 +731,59 @@ make_pass_thread_jumps (gcc::context *ct > > { > > return new pass_thread_jumps (ctxt); > > } > > + > > +namespace { > > + > > +const pass_data pass_data_early_thread_jumps = > > +{ > > + GIMPLE_PASS, > > + "ethread", > > + OPTGROUP_NONE, > > + TV_TREE_SSA_THREAD_JUMPS, > > + ( PROP_cfg | PROP_ssa ), > > + 0, > > + 0, > > + 0, > > + ( TODO_cleanup_cfg | TODO_update_ssa ), > > +}; > > + > > +class pass_early_thread_jumps : public gimple_opt_pass > > +{ > > +public: > > + pass_early_thread_jumps (gcc::context *ctxt) > > + : gimple_opt_pass (pass_data_early_thread_jumps, ctxt) > > + {} > > + > > + opt_pass * clone (void) { return new pass_early_thread_jumps (m_ctxt); } > > + virtual bool gate (function *); > > + virtual unsigned int execute (function *); > > +}; > > + > > +bool > > +pass_early_thread_jumps::gate (function *fun ATTRIBUTE_UNUSED) > > +{ > > + return true; > > +} > > + > > + > > +unsigned int > > +pass_early_thread_jumps::execute (function *fun) > > +{ > > + /* Try to thread each block with more than one successor. */ > > + basic_block bb; > > + FOR_EACH_BB_FN (bb, fun) > > + { > > + if (EDGE_COUNT (bb->succs) > 1) > > + find_jump_threads_backwards (bb, false); > > + } > > + thread_through_all_blocks (true); > > + return 0; > > +} > > + > > +} > > + > > +gimple_opt_pass * > > +make_pass_early_thread_jumps (gcc::context *ctxt) > > +{ > > + return new pass_early_thread_jumps (ctxt); > > +} > > Index: testsuite/gcc.dg/uninit-15.c > > =================================================================== > > --- testsuite/gcc.dg/uninit-15.c (revision 240109) > > +++ testsuite/gcc.dg/uninit-15.c (working copy) > > @@ -1,16 +1,16 @@ > > /* PR tree-optimization/17506 > > We issue an uninitialized variable warning at a wrong location at > > line 11, which is very confusing. Make sure we print out a note to > > - make it less confusing. (xfailed alternative) > > + make it less confusing. (not xfailed alternative) > > But it is of course ok if we warn in bar about uninitialized use > > - of j. (not xfailed alternative) */ > > + of j. (xfailed alternative) */ > > /* { dg-do compile } */ > > /* { dg-options "-O1 -Wuninitialized" } */ > > > > inline int > > foo (int i) > > { > > - if (i) /* { dg-warning "used uninitialized in this function" "" { xfail *-*-* } } */ > > + if (i) /* { dg-warning "used uninitialized in this function" } */ > > return 1; > > return 0; > > } > > @@ -20,7 +20,7 @@ void baz (void); > > void > > bar (void) > > { > > - int j; /* { dg-message "note: 'j' was declared here" "" { xfail *-*-* } } */ > > - for (; foo (j); ++j) /* { dg-warning "'j' is used uninitialized" } */ > > + int j; /* { dg-message "note: 'j' was declared here" } */ > > + for (; foo (j); ++j) /* { dg-warning "'j' is used uninitialized" "" { xfail *-*-* } } */ > > baz (); > > }