public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Early jump threading
@ 2016-08-11 14:02 Jan Hubicka
  2016-08-11 14:06 ` Richard Biener
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Jan Hubicka @ 2016-08-11 14:02 UTC (permalink / raw)
  To: gcc-patches, law, rguenther

Hi,
this patch adds early jump threading pass.  Jump threading is one of most
common cases where estimated profile becomes corrupted, because the branches
are predicted independently beforehand. This patch simply adds early mode to
jump threader that does not permit code growth and thus only win/win threads
are done before profile is constructed.

Naturally this also improves IPA decisions because code size/time is estimated
more acurately.

It is not very cool to add another pass and the jump threader is already
run 5 times. I think incrementally we could drop one of late threaders at least.
I tried to measure compile time effects but they are in wash. Moreover the patch
pays for itself in cc1plus code size:

Before patch to tweak size estimates: 27779964  
Current mainline:                     27748900  
With patch applied:                   27716173  

So despite adding few functions the code size effect is positive which I think
is quite nice.

Given the fact that jump threading seems quite lightweit, I wonder why it is
guarded by flag_expensive_optimizations? Is it expensive in terms of code
size?

The effectivity of individual threading passes is as follows (for tramp3d)

      mainline                              with patch
pass  thread count     profile mismatches   thread count    profile mismatch
early                                       525
1     1853             1900                 316             337
2     4                812                  4               288
3     24               1450                 32              947
4     76               1457                 75              975

So at least tramp3d data seems to suggest that we can drop the second occurence
of jump threading and that most of the job thread1 does can be done by the
size restricted early version (the lower absolute counts are caused by the
fact that threadable paths gets duplicated by the inliner). 50% drop in
profile distortion is not bad. I wonder why basically every threaded paths seems
to introduce a mismatch.

The patch distorts testusite somewhat, in most cases we only want to update
dump files or disable early threading:

+XPASS: gcc.dg/uninit-15.c  (test for warnings, line 13)
+XPASS: gcc.dg/uninit-15.c  (test for warnings, line 23)
+FAIL: gcc.dg/uninit-15.c  (test for warnings, line 24)
+FAIL: gcc.dg/tree-ssa/pr68198.c scan-tree-dump-times thread1 "Registering FSM" 1
+FAIL: gcc.dg/tree-ssa/pr69196-1.c scan-tree-dump thread1 "FSM did not thread around loop and would copy too many statements"
+FAIL: gcc.dg/tree-ssa/ssa-dom-thread-2b.c scan-tree-dump-times thread1 "Jumps threaded: 1" 1
+FAIL: gcc.dg/tree-ssa/ssa-thread-13.c scan-tree-dump thread1 "FSM"
+FAIL: gcc.dg/tree-ssa/vrp01.c scan-tree-dump-times vrp1 "Folding predicate p_.*to 1" 1
+FAIL: gcc.dg/tree-ssa/vrp56.c scan-tree-dump thread1 "Jumps threaded: 1"
+FAIL: gcc.dg/tree-ssa/vrp92.c scan-tree-dump vrp1 "res_.: \\\\[1, 1\\\\]"

This testcase is the now probably unnecesary heuristics (it may still be
relevant in cases we do not thread because of size bounds but its effectivity
may not be worth the maintenance cost):

+FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++11  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
+FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++11  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
+FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++14  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
+FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++14  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
+FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++98  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
+FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++98  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
+FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++11  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 1
+FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++11  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 2
+FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++14  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 1
+FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++14  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 2
+FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++98  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 1
+FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++98  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 2
+FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++11  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
+FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++11  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
+FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++14  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
+FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++14  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
+FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++98  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
+FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++98  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3

If the patch seems acceptable, I will do the updates. One option why I did
not do that is that it seems to be now posisble to pass parameters to passes
from passes.def, so perhaps we do not need early_thread_jumps, but doing so is
consistent with way we handle other early passes.

Bootstrapped/regtested x86_64-linux
Honza

	* passes.def (pass_early_thread_jumps): Schedule after forwprop.
	* tree-pass.h (make_pass_early_thread_jumps): Declare.
	* tree-ssa-threadbackward.c (fsm_find_thread_path,
	fsm_find_thread_path, profitable_jump_thread_path,
	fsm_find_control_statement_thread_paths,
	find_jump_threads_backwards): Add speed_p parameter.
	(pass_data_early_thread_jumps): New pass.
	(make_pass_early_thread_jumps): New function.
	
Index: passes.def
===================================================================
--- passes.def	(revision 239218)
+++ passes.def	(working copy)
@@ -84,6 +84,7 @@ along with GCC; see the file COPYING3.
 	  /* After CCP we rewrite no longer addressed locals into SSA
 	     form if possible.  */
 	  NEXT_PASS (pass_forwprop);
+          NEXT_PASS (pass_early_thread_jumps);
 	  NEXT_PASS (pass_sra_early);
 	  /* pass_build_ealias is a dummy pass that ensures that we
 	     execute TODO_rebuild_alias at this point.  */
Index: tree-pass.h
===================================================================
--- tree-pass.h	(revision 239218)
+++ tree-pass.h	(working copy)
@@ -399,6 +399,7 @@ extern gimple_opt_pass *make_pass_cd_dce
 extern gimple_opt_pass *make_pass_call_cdce (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_merge_phi (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_thread_jumps (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_early_thread_jumps (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_split_crit_edges (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_laddress (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_pre (gcc::context *ctxt);
Index: tree-ssa-threadbackward.c
===================================================================
--- tree-ssa-threadbackward.c	(revision 239219)
+++ tree-ssa-threadbackward.c	(working copy)
@@ -61,12 +61,14 @@ get_gimple_control_stmt (basic_block bb)
 /* Return true if the CFG contains at least one path from START_BB to END_BB.
    When a path is found, record in PATH the blocks from END_BB to START_BB.
    VISITED_BBS is used to make sure we don't fall into an infinite loop.  Bound
-   the recursion to basic blocks belonging to LOOP.  */
+   the recursion to basic blocks belonging to LOOP.
+   SPEED_P indicate that we could increase code size to improve the code path */
 
 static bool
 fsm_find_thread_path (basic_block start_bb, basic_block end_bb,
 		      vec<basic_block, va_gc> *&path,
-		      hash_set<basic_block> *visited_bbs, loop_p loop)
+		      hash_set<basic_block> *visited_bbs, loop_p loop,
+		      bool speed_p)
 {
   if (loop != start_bb->loop_father)
     return false;
@@ -82,7 +84,8 @@ fsm_find_thread_path (basic_block start_
       edge e;
       edge_iterator ei;
       FOR_EACH_EDGE (e, ei, start_bb->succs)
-	if (fsm_find_thread_path (e->dest, end_bb, path, visited_bbs, loop))
+	if (fsm_find_thread_path (e->dest, end_bb, path, visited_bbs, loop,
+				  speed_p))
 	  {
 	    vec_safe_push (path, start_bb);
 	    return true;
@@ -101,11 +104,13 @@ fsm_find_thread_path (basic_block start_
    value on PATH.  ARG is the value of that SSA_NAME.
 
    BBI will be appended to PATH when we have a profitable jump threading
-   path.  Callers are responsible for removing BBI from PATH in that case. */
+   path.  Callers are responsible for removing BBI from PATH in that case.
+
+   SPEED_P indicate that we could increase code size to improve the code path */
 
 static edge
 profitable_jump_thread_path (vec<basic_block, va_gc> *&path,
-			     basic_block bbi, tree name, tree arg)
+			     basic_block bbi, tree name, tree arg, bool speed_p)
 {
   /* Note BBI is not in the path yet, hence the +1 in the test below
      to make sure BBI is accounted for in the path length test.  */
@@ -306,7 +311,7 @@ profitable_jump_thread_path (vec<basic_b
       return NULL;
     }
 
-  if (optimize_edge_for_speed_p (taken_edge))
+  if (speed_p && optimize_edge_for_speed_p (taken_edge))
     {
       if (n_insns >= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS))
 	{
@@ -421,13 +426,15 @@ convert_and_register_jump_thread_path (v
 
 /* We trace the value of the SSA_NAME NAME back through any phi nodes looking
    for places where it gets a constant value and save the path.  Stop after
-   having recorded MAX_PATHS jump threading paths.  */
+   having recorded MAX_PATHS jump threading paths.
+
+   SPEED_P indicate that we could increase code size to improve the code path */
 
 static void
 fsm_find_control_statement_thread_paths (tree name,
 					 hash_set<basic_block> *visited_bbs,
 					 vec<basic_block, va_gc> *&path,
-					 bool seen_loop_phi)
+					 bool seen_loop_phi, bool speed_p)
 {
   /* If NAME appears in an abnormal PHI, then don't try to trace its
      value back through PHI nodes.  */
@@ -495,7 +502,7 @@ fsm_find_control_statement_thread_paths
 	  hash_set<basic_block> *visited_bbs = new hash_set<basic_block>;
 
 	  if (fsm_find_thread_path (var_bb, e->src, next_path, visited_bbs,
-				    e->src->loop_father))
+				    e->src->loop_father, speed_p))
 	    ++e_count;
 
 	  delete visited_bbs;
@@ -561,7 +568,7 @@ fsm_find_control_statement_thread_paths
 	      /* Recursively follow SSA_NAMEs looking for a constant
 		 definition.  */
 	      fsm_find_control_statement_thread_paths (arg, visited_bbs, path,
-						       seen_loop_phi);
+						       seen_loop_phi, speed_p);
 
 	      path->pop ();
 	      continue;
@@ -572,7 +579,8 @@ fsm_find_control_statement_thread_paths
 
 	  /* If this is a profitable jump thread path, then convert it
 	     into the canonical form and register it.  */
-	  edge taken_edge = profitable_jump_thread_path (path, bbi, name, arg);
+	  edge taken_edge = profitable_jump_thread_path (path, bbi, name, arg,
+							 speed_p);
 	  if (taken_edge)
 	    {
 	      if (bb_loop_depth (taken_edge->src)
@@ -588,7 +596,7 @@ fsm_find_control_statement_thread_paths
 
       if (TREE_CODE (arg) == SSA_NAME)
 	fsm_find_control_statement_thread_paths (arg, visited_bbs,
-						 path, seen_loop_phi);
+						 path, seen_loop_phi, speed_p);
 
       else
 	{
@@ -598,7 +606,7 @@ fsm_find_control_statement_thread_paths
 	  path->pop ();
 
 	  edge taken_edge = profitable_jump_thread_path (path, var_bb,
-						     name, arg);
+						         name, arg, speed_p);
 	  if (taken_edge)
 	    {
 	      if (bb_loop_depth (taken_edge->src)
@@ -622,10 +630,11 @@ fsm_find_control_statement_thread_paths
    is a constant.  Record such paths for jump threading.
 
    It is assumed that BB ends with a control statement and that by
-   finding a path where NAME is a constant, we can thread the path.  */
+   finding a path where NAME is a constant, we can thread the path.
+   SPEED_P indicate that we could increase code size to improve the code path */
 
 void  
-find_jump_threads_backwards (basic_block bb)
+find_jump_threads_backwards (basic_block bb, bool speed_p)
 {     
   gimple *stmt = get_gimple_control_stmt (bb);
   if (!stmt)
@@ -655,7 +664,8 @@ find_jump_threads_backwards (basic_block
   hash_set<basic_block> *visited_bbs = new hash_set<basic_block>;
 
   max_threaded_paths = PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATHS);
-  fsm_find_control_statement_thread_paths (name, visited_bbs, bb_path, false);
+  fsm_find_control_statement_thread_paths (name, visited_bbs, bb_path, false,
+					   speed_p);
 
   delete visited_bbs;
   vec_free (bb_path);
@@ -703,7 +713,7 @@ pass_thread_jumps::execute (function *fu
   FOR_EACH_BB_FN (bb, fun)
     {
       if (EDGE_COUNT (bb->succs) > 1)
-	find_jump_threads_backwards (bb);
+	find_jump_threads_backwards (bb, true);
     }
   thread_through_all_blocks (true);
   return 0;
@@ -716,3 +726,59 @@ make_pass_thread_jumps (gcc::context *ct
 {
   return new pass_thread_jumps (ctxt);
 }
+
+namespace {
+
+const pass_data pass_data_early_thread_jumps =
+{
+  GIMPLE_PASS,
+  "early_thread",
+  OPTGROUP_NONE,
+  TV_TREE_SSA_THREAD_JUMPS,
+  ( PROP_cfg | PROP_ssa ),
+  0,
+  0,
+  0,
+  ( TODO_cleanup_cfg | TODO_update_ssa ),
+};
+
+class pass_early_thread_jumps : public gimple_opt_pass
+{
+public:
+  pass_early_thread_jumps (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_early_thread_jumps, ctxt)
+  {}
+
+  opt_pass * clone (void) { return new pass_early_thread_jumps (m_ctxt); }
+  virtual bool gate (function *);
+  virtual unsigned int execute (function *);
+};
+
+bool
+pass_early_thread_jumps::gate (function *fun ATTRIBUTE_UNUSED)
+{
+  return true;
+}
+
+
+unsigned int
+pass_early_thread_jumps::execute (function *fun)
+{
+  /* Try to thread each block with more than one successor.  */
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      if (EDGE_COUNT (bb->succs) > 1)
+	find_jump_threads_backwards (bb, false);
+    }
+  thread_through_all_blocks (true);
+  return 0;
+}
+
+}
+
+gimple_opt_pass *
+make_pass_early_thread_jumps (gcc::context *ctxt)
+{
+  return new pass_early_thread_jumps (ctxt);
+}

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2016-12-16 12:25 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-11 14:02 Early jump threading Jan Hubicka
2016-08-11 14:06 ` Richard Biener
2016-08-11 14:27   ` Jan Hubicka
2016-08-11 15:50     ` Richard Biener
2016-08-11 18:42       ` Jeff Law
2016-09-18 19:30         ` Jan Hubicka
2016-09-19  6:53           ` Andrew Pinski
2016-09-19  9:29             ` Jan Hubicka
2016-10-18 15:56               ` James Greenhalgh
2016-12-15 11:54                 ` [Patch] Undermine the jump threading cost model to fix PR77445 James Greenhalgh
2016-12-16 12:34                   ` Richard Biener
2016-09-19  7:59           ` Early jump threading Christophe Lyon
2016-09-20  8:41             ` Thomas Preudhomme
2016-08-11 18:41     ` Jeff Law
2016-08-12  6:02       ` Richard Biener
2016-08-15 17:06         ` Jeff Law
2016-08-11 18:30   ` Jeff Law
2016-08-11 18:29 ` Jeff Law
2016-08-16 11:02   ` Jan Hubicka
2016-08-16 15:42     ` Jeff Law
2016-08-12  8:02 ` Richard Biener
2016-08-12 11:27   ` Jan Hubicka
2016-08-15 17:25     ` Jeff Law
2016-08-16 16:52   ` Jeff Law
2016-08-18 19:55 ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).