public inbox for gcc-patches@gcc.gnu.org
* Re: [Bug tree-optimization/15524] [4.0 Regression] jump threading on trees is slow with switch statements with large # of cases
@ 2004-10-30  1:03 Jeffrey A Law
  0 siblings, 0 replies; 18+ messages in thread
From: Jeffrey A Law @ 2004-10-30  1:03 UTC (permalink / raw)
  To: gcc-patches; +Cc: gcc-bugzilla

[-- Attachment #1: Type: text/plain, Size: 1483 bytes --]

This patch drastically reduces the worst-case cost of finding
equivalences associated with outgoing edges from switch statements.

Previously, upon entry into a block, we would search for any
equivalences associated with the edge we took into the block.  If that
edge was an outgoing edge from a switch statement, we had to walk the
entire switch vector to determine whether traversing the edge created
an equivalence.  Thus searching for these equivalences was effectively
O(N^2) overall, where N is the number of outgoing edges from a switch
statement.

With this patch we reduce that to O(N) by scanning the switch vector
once and recording any/all equivalences during that single pass over
the switch vector.  We can then query the recorded information as
we enter blocks during the dominator walk.
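
To make the complexity change concrete, here is a minimal,
self-contained sketch of the single-pass scheme in C.  It uses plain
arrays instead of GCC's tree and edge types, so everything in it is
hypothetical and simplified; the real implementation is
record_edge_info in the patch below.

  /* One pass over a switch's cases records at most one equivalence
     per destination block.  A second case label reaching the same
     block invalidates the equivalence for that block (the real code
     also rejects case ranges).  Querying a block's equivalence
     afterwards is O(1) instead of another walk over the whole case
     vector.  */

  #define NO_EQUIV   -1
  #define MULTI_CASE -2

  static void
  record_switch_equivalences (int n_cases, const int *case_dest,
                              const int *case_val, int n_blocks,
                              int *equiv)
  {
    int i;

    for (i = 0; i < n_blocks; i++)
      equiv[i] = NO_EQUIV;

    /* equiv[b] ends up as the unique case value reaching block B,
       or NO_EQUIV / MULTI_CASE when no equivalence exists.  */
    for (i = 0; i < n_cases; i++)
      {
        int d = case_dest[i];
        equiv[d] = (equiv[d] == NO_EQUIV ? case_val[i] : MULTI_CASE);
      }
  }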

To give you some idea of how this affects worst-case behavior, the
dominator optimizer previously took 41.76 seconds (19% of total
compilation time) for the testcase in PR 15524.  After this patch it
takes 3.86 seconds (2% of total compilation time).

I've seen no measurable impact on more normal codes from this version
of the patch.

It turns out that it's fairly simple to record equivalences from
COND_EXPRs in the same data structure.  This has the nice side
effect of reducing the number of calls to invert_truthvalue and
thus reducing the amount of GC garbage we create.  I don't have
data for that handy though.
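
For reference, the cond_equivalences array added by this patch stores
(expression, value) pairs.  As a sketch, here is the layout that
record_conditions below builds for the true edge of "if (a < b)"
(writing the inverted condition as "a >= b" for simplicity):

  /* cond_equivalences for the true edge of 'if (a < b)':
       [0]  a < b            [1]  boolean_true_node
       [2]  a >= b           [3]  boolean_false_node
       [4]  a <= b           [5]  boolean_true_node
       [6]  ORDERED (a, b)   [7]  boolean_true_node
       [8]  a != b           [9]  boolean_true_node
       [10] LTGT (a, b)      [11] boolean_true_node  */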

Bootstrapped and regression tested on i686-pc-linux-gnu.





[-- Attachment #2: ZZZ --]
[-- Type: text/plain, Size: 43722 bytes --]

	* tree-ssa-dom.c (struct edge_info): New structure holding
	edge equivalences and edge redirection information.
	(get_eq_expr_value, record_dominating_conditions): Kill.
	(propagate_to_outgoing_edges): Renamed from cprop_into_phis.
	Call record_edge_info.
	(allocate_edge_info, free_edge_info): New.
	(tree_ssa_dominator_optimize): Use propagate_to_outgoing_edges
	rather than cprop_into_phis.  Free all edge infos before threading
	jumps.
	(thread_across_edge): Allocate new edge info structures as needed
	and store the redirection target into the edge info structure
	instead of the edge's AUX field.
	(dom_opt_initialize_block): Mark unused argument with ATTRIBUTE_UNUSED.
	(record_equivalence_from_incoming_edge): Lose unnecessary argument.
	Revamp code which finds and records equivalences associated with
	edges to use saved data in the edge_info structure.
	(record_equivalences_from_phis): Similarly.
	(dom_opt_finalize_block): Revamp code which finds and records
	equivalences associated with edges to use saved data in the
	edge_info structure.
	(build_and_record_new_cond): New function.
	(record_conditions): Use build_and_record_new_cond to record
	dominating conditions.
	(record_edge_info): New function.
	(record_range): Tighten test for conditions which create
	useful range records.


Index: tree-ssa-dom.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-dom.c,v
retrieving revision 2.63
diff -c -p -r2.63 tree-ssa-dom.c
*** tree-ssa-dom.c	29 Oct 2004 08:41:03 -0000	2.63
--- tree-ssa-dom.c	30 Oct 2004 00:04:20 -0000
*************** Boston, MA 02111-1307, USA.  */
*** 45,50 ****
--- 45,84 ----
  
  /* This file implements optimizations on the dominator tree.  */
  
+ 
+ /* Structure for recording edge equivalences as well as any pending
+    edge redirections during the dominator optimizer.
+ 
+    Computing and storing the edge equivalences instead of creating
+    them on-demand can save significant amounts of time, particularly
+    for pathological cases involving switch statements.  
+ 
+    These structures live for a single iteration of the dominator
+    optimizer in the edge's AUX field.  At the end of an iteration we
+    free each of these structures and update the AUX field to point
+    to any requested redirection target (the code for updating the
+    CFG and SSA graph for edge redirection expects redirection edge
+    targets to be in the AUX field for each edge).  */
+ 
+ struct edge_info
+ {
+   /* If this edge creates a simple equivalence, the LHS and RHS of
+      the equivalence will be stored here.  */
+   tree lhs;
+   tree rhs;
+ 
+   /* Traversing an edge may also indicate one or more particular conditions
+      are true or false.  The number of recorded conditions can vary, but
+      can be determined by the condition's code.  So we have an array
+      and its maximum index rather than using a varray.  */
+   tree *cond_equivalences;
+   unsigned int max_cond_equivalences;
+ 
+   /* If we can thread this edge, this field records the new target.  */
+   edge redirection_target;
+ };
+ 
+ 
  /* Hash table with expressions made available during the renaming process.
     When an assignment of the form X_i = EXPR is found, the statement is
     stored in this table.  If the same expression EXPR is later found on the
*************** static void optimize_stmt (struct dom_wa
*** 231,237 ****
  			   basic_block bb,
  			   block_stmt_iterator);
  static tree lookup_avail_expr (tree, bool);
- static struct eq_expr_value get_eq_expr_value (tree, int, basic_block);
  static hashval_t vrp_hash (const void *);
  static int vrp_eq (const void *, const void *);
  static hashval_t avail_expr_hash (const void *);
--- 265,270 ----
*************** static hashval_t real_avail_expr_hash (c
*** 239,245 ****
  static int avail_expr_eq (const void *, const void *);
  static void htab_statistics (FILE *, htab_t);
  static void record_cond (tree, tree);
- static void record_dominating_conditions (tree);
  static void record_const_or_copy (tree, tree);
  static void record_equality (tree, tree);
  static tree update_rhs_and_lookup_avail_expr (tree, tree, bool);
--- 272,277 ----
*************** static tree simplify_switch_and_lookup_a
*** 250,265 ****
  static tree find_equivalent_equality_comparison (tree);
  static void record_range (tree, basic_block);
  static bool extract_range_from_cond (tree, tree *, tree *, int *);
! static void record_equivalences_from_phis (struct dom_walk_data *, basic_block);
! static void record_equivalences_from_incoming_edge (struct dom_walk_data *,
! 						    basic_block);
  static bool eliminate_redundant_computations (struct dom_walk_data *,
  					      tree, stmt_ann_t);
  static void record_equivalences_from_stmt (tree, int, stmt_ann_t);
  static void thread_across_edge (struct dom_walk_data *, edge);
  static void dom_opt_finalize_block (struct dom_walk_data *, basic_block);
  static void dom_opt_initialize_block (struct dom_walk_data *, basic_block);
! static void cprop_into_phis (struct dom_walk_data *, basic_block);
  static void remove_local_expressions_from_table (void);
  static void restore_vars_to_original_value (void);
  static void restore_currdefs_to_original_value (void);
--- 282,296 ----
  static tree find_equivalent_equality_comparison (tree);
  static void record_range (tree, basic_block);
  static bool extract_range_from_cond (tree, tree *, tree *, int *);
! static void record_equivalences_from_phis (basic_block);
! static void record_equivalences_from_incoming_edge (basic_block);
  static bool eliminate_redundant_computations (struct dom_walk_data *,
  					      tree, stmt_ann_t);
  static void record_equivalences_from_stmt (tree, int, stmt_ann_t);
  static void thread_across_edge (struct dom_walk_data *, edge);
  static void dom_opt_finalize_block (struct dom_walk_data *, basic_block);
  static void dom_opt_initialize_block (struct dom_walk_data *, basic_block);
! static void propagate_to_outgoing_edges (struct dom_walk_data *, basic_block);
  static void remove_local_expressions_from_table (void);
  static void restore_vars_to_original_value (void);
  static void restore_currdefs_to_original_value (void);
*************** local_fold (tree t)
*** 283,288 ****
--- 314,363 ----
    return t;
  }
  
+ /* Allocate an EDGE_INFO for edge E and attach it to E.
+    Return the new EDGE_INFO structure.  */
+ 
+ static struct edge_info *
+ allocate_edge_info (edge e)
+ {
+   struct edge_info *edge_info;
+ 
+   edge_info = xcalloc (1, sizeof (struct edge_info));
+ 
+   e->aux = edge_info;
+   return edge_info;
+ }
+ 
+ /* Free all EDGE_INFO structures associated with edges in the CFG.
+    If a particular edge can be threaded, copy the redirection
+    target from the EDGE_INFO structure into the edge's AUX field
+    as required by code to update the CFG and SSA graph for
+    jump threading.  */
+ 
+ static void
+ free_all_edge_infos (void)
+ {
+   basic_block bb;
+   edge_iterator ei;
+   edge e;
+ 
+   FOR_EACH_BB (bb)
+     {
+       FOR_EACH_EDGE (e, ei, bb->preds)
+         {
+ 	  struct edge_info *edge_info = e->aux;
+ 
+ 	  if (edge_info)
+ 	    {
+ 	      e->aux = edge_info->redirection_target;
+ 	      if (edge_info->cond_equivalences)
+ 		free (edge_info->cond_equivalences);
+ 	      free (edge_info);
+ 	    }
+ 	}
+     }
+ }
+ 
  /* Jump threading, redundancy elimination and const/copy propagation. 
  
     This pass may expose new symbols that need to be renamed into SSA.  For
*************** tree_ssa_dominator_optimize (void)
*** 323,329 ****
    walk_data.initialize_block_local_data = NULL;
    walk_data.before_dom_children_before_stmts = dom_opt_initialize_block;
    walk_data.before_dom_children_walk_stmts = optimize_stmt;
!   walk_data.before_dom_children_after_stmts = cprop_into_phis;
    walk_data.after_dom_children_before_stmts = NULL;
    walk_data.after_dom_children_walk_stmts = NULL;
    walk_data.after_dom_children_after_stmts = dom_opt_finalize_block;
--- 398,404 ----
    walk_data.initialize_block_local_data = NULL;
    walk_data.before_dom_children_before_stmts = dom_opt_initialize_block;
    walk_data.before_dom_children_walk_stmts = optimize_stmt;
!   walk_data.before_dom_children_after_stmts = propagate_to_outgoing_edges;
    walk_data.after_dom_children_before_stmts = NULL;
    walk_data.after_dom_children_walk_stmts = NULL;
    walk_data.after_dom_children_after_stmts = dom_opt_finalize_block;
*************** tree_ssa_dominator_optimize (void)
*** 362,367 ****
--- 437,444 ----
  	  bitmap_clear (vars_to_rename);
  	}
  
+       free_all_edge_infos ();
+ 
        /* Thread jumps, creating duplicate blocks as needed.  */
        cfg_altered = thread_through_all_blocks ();
  
*************** thread_across_edge (struct dom_walk_data
*** 697,705 ****
  	     bypass the conditional at our original destination.  */
  	  if (dest)
  	    {
  	      update_bb_profile_for_threading (e->dest, EDGE_FREQUENCY (e),
  					       e->count, taken_edge);
! 	      e->aux = taken_edge;
  	      bb_ann (e->dest)->incoming_edge_threaded = true;
  	    }
  	}
--- 774,788 ----
  	     bypass the conditional at our original destination.  */
  	  if (dest)
  	    {
+ 	      struct edge_info *edge_info;
+ 
  	      update_bb_profile_for_threading (e->dest, EDGE_FREQUENCY (e),
  					       e->count, taken_edge);
! 	      if (e->aux)
! 		edge_info = e->aux;
! 	      else
! 		edge_info = allocate_edge_info (e);
! 	      edge_info->redirection_target = taken_edge;
  	      bb_ann (e->dest)->incoming_edge_threaded = true;
  	    }
  	}
*************** thread_across_edge (struct dom_walk_data
*** 712,718 ****
     reach BB or they may come from PHI nodes at the start of BB.  */
  
  static void
! dom_opt_initialize_block (struct dom_walk_data *walk_data, basic_block bb)
  {
    if (dump_file && (dump_flags & TDF_DETAILS))
      fprintf (dump_file, "\n\nOptimizing block #%d\n\n", bb->index);
--- 795,802 ----
     reach BB or they may come from PHI nodes at the start of BB.  */
  
  static void
! dom_opt_initialize_block (struct dom_walk_data *walk_data ATTRIBUTE_UNUSED,
! 			  basic_block bb)
  {
    if (dump_file && (dump_flags & TDF_DETAILS))
      fprintf (dump_file, "\n\nOptimizing block #%d\n\n", bb->index);
*************** dom_opt_initialize_block (struct dom_wal
*** 725,734 ****
    VARRAY_PUSH_TREE (nonzero_vars_stack, NULL_TREE);
    VARRAY_PUSH_TREE (vrp_variables_stack, NULL_TREE);
  
!   record_equivalences_from_incoming_edge (walk_data, bb);
  
    /* PHI nodes can create equivalences too.  */
!   record_equivalences_from_phis (walk_data, bb);
  }
  
  /* Given an expression EXPR (a relational expression or a statement), 
--- 809,818 ----
    VARRAY_PUSH_TREE (nonzero_vars_stack, NULL_TREE);
    VARRAY_PUSH_TREE (vrp_variables_stack, NULL_TREE);
  
!   record_equivalences_from_incoming_edge (bb);
  
    /* PHI nodes can create equivalences too.  */
!   record_equivalences_from_phis (bb);
  }
  
  /* Given an expression EXPR (a relational expression or a statement), 
*************** dom_opt_finalize_block (struct dom_walk_
*** 900,921 ****
  	   && (EDGE_SUCC (bb, 1)->flags & EDGE_ABNORMAL) == 0)
      {
        edge true_edge, false_edge;
-       tree cond, inverted = NULL;
-       enum tree_code cond_code;
  
        extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
  
-       cond = COND_EXPR_COND (last);
-       cond_code = TREE_CODE (cond);
- 
-       if (TREE_CODE_CLASS (cond_code) == tcc_comparison)
- 	inverted = invert_truthvalue (cond);
- 
        /* If the THEN arm is the end of a dominator tree or has PHI nodes,
  	 then try to thread through its edge.  */
        if (get_immediate_dominator (CDI_DOMINATORS, true_edge->dest) != bb
  	  || phi_nodes (true_edge->dest))
  	{
  	  /* Push a marker onto the available expression stack so that we
  	     unwind any expressions related to the TRUE arm before processing
  	     the false arm below.  */
--- 984,1000 ----
  	   && (EDGE_SUCC (bb, 1)->flags & EDGE_ABNORMAL) == 0)
      {
        edge true_edge, false_edge;
  
        extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
  
        /* If the THEN arm is the end of a dominator tree or has PHI nodes,
  	 then try to thread through its edge.  */
        if (get_immediate_dominator (CDI_DOMINATORS, true_edge->dest) != bb
  	  || phi_nodes (true_edge->dest))
  	{
+ 	  struct edge_info *edge_info;
+ 	  unsigned int i;
+ 
  	  /* Push a marker onto the available expression stack so that we
  	     unwind any expressions related to the TRUE arm before processing
  	     the false arm below.  */
*************** dom_opt_finalize_block (struct dom_walk_
*** 923,937 ****
  	  VARRAY_PUSH_TREE (block_defs_stack, NULL_TREE);
  	  VARRAY_PUSH_TREE (const_and_copies_stack, NULL_TREE);
  
! 	  /* Record any equivalences created by following this edge.  */
! 	  if (TREE_CODE_CLASS (cond_code) == tcc_comparison)
! 	    {
! 	      record_cond (cond, boolean_true_node);
! 	      record_dominating_conditions (cond);
! 	      record_cond (inverted, boolean_false_node);
  	    }
- 	  else if (cond_code == SSA_NAME)
- 	    record_const_or_copy (cond, boolean_true_node);
  
  	  /* Now thread the edge.  */
  	  thread_across_edge (walk_data, true_edge);
--- 1002,1039 ----
  	  VARRAY_PUSH_TREE (block_defs_stack, NULL_TREE);
  	  VARRAY_PUSH_TREE (const_and_copies_stack, NULL_TREE);
  
! 	  edge_info = true_edge->aux;
! 
! 	  /* If we have info associated with this edge, record it into
! 	     our equivalency tables.  */
! 	  if (edge_info)
! 	    {
! 	      tree *cond_equivalences = edge_info->cond_equivalences;
! 	      tree lhs = edge_info->lhs;
! 	      tree rhs = edge_info->rhs;
! 
! 	      /* If we have a simple NAME = VALUE equivalency, record it.
! 		 Until the jump threading selection code improves, only
! 		 do this if both the name and value are SSA_NAMEs with
! 		 the same underlying variable to avoid missing threading
! 		 opportunities.  */
! 	      if (lhs
! 		  && TREE_CODE (COND_EXPR_COND (last)) == SSA_NAME
! 		  && TREE_CODE (edge_info->rhs) == SSA_NAME
! 		  && SSA_NAME_VAR (lhs) == SSA_NAME_VAR (rhs))
! 		record_const_or_copy (lhs, rhs);
! 
! 	      /* If we have 0 = COND or 1 = COND equivalences, record them
! 		 into our expression hash tables.  */
! 	      if (cond_equivalences)
! 		for (i = 0; i < edge_info->max_cond_equivalences; i += 2)
! 		  {
! 		    tree expr = cond_equivalences[i];
! 		    tree value = cond_equivalences[i + 1];
! 
! 		    record_cond (expr, value);
! 		  }
  	    }
  
  	  /* Now thread the edge.  */
  	  thread_across_edge (walk_data, true_edge);
*************** dom_opt_finalize_block (struct dom_walk_
*** 947,961 ****
        if (get_immediate_dominator (CDI_DOMINATORS, false_edge->dest) != bb
  	  || phi_nodes (false_edge->dest))
  	{
! 	  /* Record any equivalences created by following this edge.  */
! 	  if (TREE_CODE_CLASS (cond_code) == tcc_comparison)
! 	    {
! 	      record_cond (cond, boolean_false_node);
! 	      record_cond (inverted, boolean_true_node);
! 	      record_dominating_conditions (inverted);
  	    }
- 	  else if (cond_code == SSA_NAME)
- 	    record_const_or_copy (cond, boolean_false_node);
  
  	  thread_across_edge (walk_data, false_edge);
  
--- 1049,1087 ----
        if (get_immediate_dominator (CDI_DOMINATORS, false_edge->dest) != bb
  	  || phi_nodes (false_edge->dest))
  	{
! 	  struct edge_info *edge_info;
! 	  unsigned int i;
! 
! 	  edge_info = false_edge->aux;
! 
! 	  /* If we have info associated with this edge, record it into
! 	     our equivalency tables.  */
! 	  if (edge_info)
! 	    {
! 	      tree *cond_equivalences = edge_info->cond_equivalences;
! 	      tree lhs = edge_info->lhs;
! 	      tree rhs = edge_info->rhs;
! 
! 	      /* If we have a simple NAME = VALUE equivalency, record it.
! 		 Until the jump threading selection code improves, only
! 		 do this if both the name and value are SSA_NAMEs with
! 		 the same underlying variable to avoid missing threading
! 		 opportunities.  */
! 	      if (lhs
! 		  && TREE_CODE (COND_EXPR_COND (last)) == SSA_NAME)
! 		record_const_or_copy (lhs, rhs);
! 
! 	      /* If we have 0 = COND or 1 = COND equivalences, record them
! 		 into our expression hash tables.  */
! 	      if (cond_equivalences)
! 		for (i = 0; i < edge_info->max_cond_equivalences; i += 2)
! 		  {
! 		    tree expr = cond_equivalences[i];
! 		    tree value = cond_equivalences[i + 1];
! 
! 		    record_cond (expr, value);
! 		  }
  	    }
  
  	  thread_across_edge (walk_data, false_edge);
  
*************** dom_opt_finalize_block (struct dom_walk_
*** 1040,1047 ****
     even if we do not know its exact value.  */
  
  static void
! record_equivalences_from_phis (struct dom_walk_data *walk_data ATTRIBUTE_UNUSED,
! 			       basic_block bb)
  {
    tree phi;
  
--- 1166,1172 ----
     even if we do not know its exact value.  */
  
  static void
! record_equivalences_from_phis (basic_block bb)
  {
    tree phi;
  
*************** single_incoming_edge_ignoring_loop_edges
*** 1138,1247 ****
     has more than one incoming edge, then no equivalence is created.  */
  
  static void
! record_equivalences_from_incoming_edge (struct dom_walk_data *walk_data ATTRIBUTE_UNUSED,
! 					basic_block bb)
  {
!   int edge_flags;
    basic_block parent;
!   struct eq_expr_value eq_expr_value;
!   tree parent_block_last_stmt = NULL;
  
  /* If our parent block ended with a control statement, then we may be
       able to record some equivalences based on which outgoing edge from
       the parent was followed.  */
    parent = get_immediate_dominator (CDI_DOMINATORS, bb);
-   if (parent)
-     {
-       parent_block_last_stmt = last_stmt (parent);
-       if (parent_block_last_stmt && !is_ctrl_stmt (parent_block_last_stmt))
- 	parent_block_last_stmt = NULL;
-     }
  
!   eq_expr_value.src = NULL;
!   eq_expr_value.dst = NULL;
  
!   /* If we have a single predecessor (ignoring loop backedges), then extract
!      EDGE_FLAGS from the single incoming edge.  Otherwise just return as
!      there is nothing to do.  */
!   if (EDGE_COUNT (bb->preds) >= 1
!       && parent_block_last_stmt)
      {
!       edge e = single_incoming_edge_ignoring_loop_edges (bb);
!       if (e && bb_for_stmt (parent_block_last_stmt) == e->src)
! 	edge_flags = e->flags;
!       else
! 	return;
!     }
!   else
!     return;
  
!   /* If our parent block ended in a COND_EXPR, add any equivalences
!      created by the COND_EXPR to the hash table and initialize
!      EQ_EXPR_VALUE appropriately.
! 
!      EQ_EXPR_VALUE is an assignment expression created when BB's immediate
!      dominator ends in a COND_EXPR statement whose predicate is of the form
!      'VAR == VALUE', where VALUE may be another variable or a constant.
!      This is used to propagate VALUE on the THEN_CLAUSE of that
!      conditional. This assignment is inserted in CONST_AND_COPIES so that
!      the copy and constant propagator can find more propagation
!      opportunities.  */
!   if (TREE_CODE (parent_block_last_stmt) == COND_EXPR
!       && (edge_flags & (EDGE_TRUE_VALUE | EDGE_FALSE_VALUE)))
!     eq_expr_value = get_eq_expr_value (parent_block_last_stmt,
! 				       (edge_flags & EDGE_TRUE_VALUE) != 0,
! 				       bb);
!   /* Similarly when the parent block ended in a SWITCH_EXPR.
!      We can only know the value of the switch's condition if the dominator
!      parent is also the only predecessor of this block.  */
!   else if (EDGE_PRED (bb, 0)->src == parent
! 	   && TREE_CODE (parent_block_last_stmt) == SWITCH_EXPR)
!     {
!       tree switch_cond = SWITCH_COND (parent_block_last_stmt);
! 
!       /* If the switch's condition is an SSA variable, then we may
! 	 know its value at each of the case labels.  */
!       if (TREE_CODE (switch_cond) == SSA_NAME)
! 	{
! 	  tree switch_vec = SWITCH_LABELS (parent_block_last_stmt);
! 	  size_t i, n = TREE_VEC_LENGTH (switch_vec);
! 	  int case_count = 0;
! 	  tree match_case = NULL_TREE;
! 
! 	  /* Search the case labels for those whose destination is
! 	     the current basic block.  */
! 	  for (i = 0; i < n; ++i)
  	    {
! 	      tree elt = TREE_VEC_ELT (switch_vec, i);
! 	      if (label_to_block (CASE_LABEL (elt)) == bb)
  		{
! 		  if (++case_count > 1 || CASE_HIGH (elt))
! 		    break;
! 		  match_case = elt;
! 		}
! 	    }
  
! 	  /* If we encountered precisely one CASE_LABEL_EXPR and it
! 	     was not the default case, or a case range, then we know
! 	     the exact value of SWITCH_COND which caused us to get to
! 	     this block.  Record that equivalence in EQ_EXPR_VALUE.  */
! 	  if (case_count == 1
! 	      && match_case
! 	      && CASE_LOW (match_case)
! 	      && !CASE_HIGH (match_case))
! 	    {
! 	      eq_expr_value.dst = switch_cond;
! 	      eq_expr_value.src = fold_convert (TREE_TYPE (switch_cond),
! 						CASE_LOW (match_case));
  	    }
  	}
      }
- 
-   /* If EQ_EXPR_VALUE (VAR == VALUE) is given, register the VALUE as a
-      new value for VAR, so that occurrences of VAR can be replaced with
-      VALUE while re-writing the THEN arm of a COND_EXPR.  */
-   if (eq_expr_value.src && eq_expr_value.dst)
-     record_equality (eq_expr_value.dst, eq_expr_value.src);
  }
  
  /* Dump SSA statistics on FILE.  */
--- 1263,1324 ----
     has more than one incoming edge, then no equivalence is created.  */
  
  static void
! record_equivalences_from_incoming_edge (basic_block bb)
  {
!   edge e;
    basic_block parent;
!   struct edge_info *edge_info;
  
  /* If our parent block ended with a control statement, then we may be
       able to record some equivalences based on which outgoing edge from
       the parent was followed.  */
    parent = get_immediate_dominator (CDI_DOMINATORS, bb);
  
!   e = single_incoming_edge_ignoring_loop_edges (bb);
  
!   /* If we had a single incoming edge from our parent block, then enter
!      any data associated with the edge into our tables.  */
!   if (e && e->src == parent)
      {
!       unsigned int i;
! 
!       edge_info = e->aux;
! 
!       if (edge_info)
! 	{
! 	  tree lhs = edge_info->lhs;
! 	  tree rhs = edge_info->rhs;
! 	  tree *cond_equivalences = edge_info->cond_equivalences;
  
! 	  if (lhs)
! 	    record_equality (lhs, rhs);
! 
! 	  if (cond_equivalences)
  	    {
! 	      bool recorded_range = false;
! 	      for (i = 0; i < edge_info->max_cond_equivalences; i += 2)
  		{
! 		  tree expr = cond_equivalences[i];
! 		  tree value = cond_equivalences[i + 1];
  
! 		  record_cond (expr, value);
! 
! 		  /* For the first true equivalence, record range
! 		     information.  We only do this for the first
! 		     true equivalence as it should dominate any
! 		     later true equivalences.  */
! 		  if (! recorded_range 
! 		      && COMPARISON_CLASS_P (expr)
! 		      && value == boolean_true_node
! 		      && TREE_CONSTANT (TREE_OPERAND (expr, 1)))
! 		    {
! 		      record_range (expr, bb);
! 		      recorded_range = true;
! 		    }
! 		}
  	    }
  	}
      }
  }
  
  /* Dump SSA statistics on FILE.  */
*************** record_cond (tree cond, tree value)
*** 1333,1482 ****
      free (element);
  }
  
! /* COND is a condition which is known to be true.   Record variants of
!    COND which must also be true.
  
     For example, if a < b is true, then a <= b must also be true.  */
  
  static void
! record_dominating_conditions (tree cond)
  {
    switch (TREE_CODE (cond))
      {
      case LT_EXPR:
-       record_cond (build2 (LE_EXPR, boolean_type_node,
- 			   TREE_OPERAND (cond, 0),
- 			   TREE_OPERAND (cond, 1)),
- 		   boolean_true_node);
-       record_cond (build2 (ORDERED_EXPR, boolean_type_node,
- 			   TREE_OPERAND (cond, 0),
- 			   TREE_OPERAND (cond, 1)),
- 		   boolean_true_node);
-       record_cond (build2 (NE_EXPR, boolean_type_node,
- 			   TREE_OPERAND (cond, 0),
- 			   TREE_OPERAND (cond, 1)),
- 		   boolean_true_node);
-       record_cond (build2 (LTGT_EXPR, boolean_type_node,
- 			   TREE_OPERAND (cond, 0),
- 			   TREE_OPERAND (cond, 1)),
- 		   boolean_true_node);
-       break;
- 
      case GT_EXPR:
!       record_cond (build2 (GE_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
!       record_cond (build2 (ORDERED_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
!       record_cond (build2 (NE_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
!       record_cond (build2 (LTGT_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
        break;
  
      case GE_EXPR:
      case LE_EXPR:
!       record_cond (build2 (ORDERED_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
        break;
  
      case EQ_EXPR:
!       record_cond (build2 (ORDERED_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
!       record_cond (build2 (LE_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
!       record_cond (build2 (GE_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
        break;
  
      case UNORDERED_EXPR:
!       record_cond (build2 (NE_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
!       record_cond (build2 (UNLE_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
!       record_cond (build2 (UNGE_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
!       record_cond (build2 (UNEQ_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
!       record_cond (build2 (UNLT_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
!       record_cond (build2 (UNGT_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
        break;
  
      case UNLT_EXPR:
-       record_cond (build2 (UNLE_EXPR, boolean_type_node,
- 			   TREE_OPERAND (cond, 0),
- 			   TREE_OPERAND (cond, 1)),
- 		   boolean_true_node);
-       record_cond (build2 (NE_EXPR, boolean_type_node,
- 			   TREE_OPERAND (cond, 0),
- 			   TREE_OPERAND (cond, 1)),
- 		   boolean_true_node);
-       break;
- 
      case UNGT_EXPR:
!       record_cond (build2 (UNGE_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
!       record_cond (build2 (NE_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
        break;
  
      case UNEQ_EXPR:
!       record_cond (build2 (UNLE_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
!       record_cond (build2 (UNGE_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
        break;
  
      case LTGT_EXPR:
!       record_cond (build2 (NE_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
!       record_cond (build2 (ORDERED_EXPR, boolean_type_node,
! 			   TREE_OPERAND (cond, 0),
! 			   TREE_OPERAND (cond, 1)),
! 		   boolean_true_node);
  
      default:
        break;
      }
  }
  
  /* A helper function for record_const_or_copy and record_equality.
--- 1410,1538 ----
      free (element);
  }
  
! /* Build a new conditional using NEW_CODE, OP0 and OP1 and store
!    the new conditional into *p, then store a boolean_true_node
!    into *(p + 1).  */
!    
! static void
! build_and_record_new_cond (enum tree_code new_code, tree op0, tree op1, tree *p)
! {
!   *p = build2 (new_code, boolean_type_node, op0, op1);
!   p++;
!   *p = boolean_true_node;
! }
! 
! /* Record that COND is true and INVERTED is false into the edge information
!    structure.  Also record that any conditions dominated by COND are true
!    as well.
  
     For example, if a < b is true, then a <= b must also be true.  */
  
  static void
! record_conditions (struct edge_info *edge_info, tree cond, tree inverted)
  {
+   tree op0, op1;
+ 
+   if (!COMPARISON_CLASS_P (cond))
+     return;
+ 
+   op0 = TREE_OPERAND (cond, 0);
+   op1 = TREE_OPERAND (cond, 1);
+ 
    switch (TREE_CODE (cond))
      {
      case LT_EXPR:
      case GT_EXPR:
!       edge_info->max_cond_equivalences = 12;
!       edge_info->cond_equivalences = xmalloc (12 * sizeof (tree));
!       build_and_record_new_cond ((TREE_CODE (cond) == LT_EXPR
! 				  ? LE_EXPR : GE_EXPR),
! 				 op0, op1, &edge_info->cond_equivalences[4]);
!       build_and_record_new_cond (ORDERED_EXPR, op0, op1,
! 				 &edge_info->cond_equivalences[6]);
!       build_and_record_new_cond (NE_EXPR, op0, op1,
! 				 &edge_info->cond_equivalences[8]);
!       build_and_record_new_cond (LTGT_EXPR, op0, op1,
! 				 &edge_info->cond_equivalences[10]);
        break;
  
      case GE_EXPR:
      case LE_EXPR:
!       edge_info->max_cond_equivalences = 6;
!       edge_info->cond_equivalences = xmalloc (6 * sizeof (tree));
!       build_and_record_new_cond (ORDERED_EXPR, op0, op1,
! 				 &edge_info->cond_equivalences[4]);
        break;
  
      case EQ_EXPR:
!       edge_info->max_cond_equivalences = 10;
!       edge_info->cond_equivalences = xmalloc (10 * sizeof (tree));
!       build_and_record_new_cond (ORDERED_EXPR, op0, op1,
! 				 &edge_info->cond_equivalences[4]);
!       build_and_record_new_cond (LE_EXPR, op0, op1,
! 				 &edge_info->cond_equivalences[6]);
!       build_and_record_new_cond (GE_EXPR, op0, op1,
! 				 &edge_info->cond_equivalences[8]);
        break;
  
      case UNORDERED_EXPR:
!       edge_info->max_cond_equivalences = 16;
!       edge_info->cond_equivalences = xmalloc (16 * sizeof (tree));
!       build_and_record_new_cond (NE_EXPR, op0, op1,
! 				 &edge_info->cond_equivalences[4]);
!       build_and_record_new_cond (UNLE_EXPR, op0, op1,
! 				 &edge_info->cond_equivalences[6]);
!       build_and_record_new_cond (UNGE_EXPR, op0, op1,
! 				 &edge_info->cond_equivalences[8]);
!       build_and_record_new_cond (UNEQ_EXPR, op0, op1,
! 				 &edge_info->cond_equivalences[10]);
!       build_and_record_new_cond (UNLT_EXPR, op0, op1,
! 				 &edge_info->cond_equivalences[12]);
!       build_and_record_new_cond (UNGT_EXPR, op0, op1,
! 				 &edge_info->cond_equivalences[14]);
        break;
  
      case UNLT_EXPR:
      case UNGT_EXPR:
!       edge_info->max_cond_equivalences = 8;
!       edge_info->cond_equivalences = xmalloc (8 * sizeof (tree));
!       build_and_record_new_cond ((TREE_CODE (cond) == UNLT_EXPR
! 				  ? UNLE_EXPR : UNGE_EXPR),
! 				 op0, op1, &edge_info->cond_equivalences[4]);
!       build_and_record_new_cond (NE_EXPR, op0, op1,
! 				 &edge_info->cond_equivalences[6]);
        break;
  
      case UNEQ_EXPR:
!       edge_info->max_cond_equivalences = 8;
!       edge_info->cond_equivalences = xmalloc (8 * sizeof (tree));
!       build_and_record_new_cond (UNLE_EXPR, op0, op1,
! 				 &edge_info->cond_equivalences[4]);
!       build_and_record_new_cond (UNGE_EXPR, op0, op1,
! 				 &edge_info->cond_equivalences[6]);
        break;
  
      case LTGT_EXPR:
!       edge_info->max_cond_equivalences = 8;
!       edge_info->cond_equivalences = xmalloc (8 * sizeof (tree));
!       build_and_record_new_cond (NE_EXPR, op0, op1,
! 				 &edge_info->cond_equivalences[4]);
!       build_and_record_new_cond (ORDERED_EXPR, op0, op1,
! 				 &edge_info->cond_equivalences[6]);
!       break;
  
      default:
+       edge_info->max_cond_equivalences = 4;
+       edge_info->cond_equivalences = xmalloc (4 * sizeof (tree));
        break;
      }
+ 
+   /* Now store the original true and false conditions into the first
+      two slots.  */
+   edge_info->cond_equivalences[0] = cond;
+   edge_info->cond_equivalences[1] = boolean_true_node;
+   edge_info->cond_equivalences[2] = inverted;
+   edge_info->cond_equivalences[3] = boolean_false_node;
  }
  
  /* A helper function for record_const_or_copy and record_equality.
*************** cprop_into_successor_phis (basic_block b
*** 2311,2324 ****
      }
  }
  
  
! /* Propagate known constants/copies into PHI nodes of BB's successor
!    blocks.  */
  
  static void
! cprop_into_phis (struct dom_walk_data *walk_data ATTRIBUTE_UNUSED,
! 		 basic_block bb)
  {
    cprop_into_successor_phis (bb, nonzero_vars);
  }
  
--- 2367,2564 ----
      }
  }
  
+ /* We have finished optimizing BB.  Record any information implied by
+    taking a specific outgoing edge from BB.  */
+ 
+ static void
+ record_edge_info (basic_block bb)
+ {
+   block_stmt_iterator bsi = bsi_last (bb);
+   struct edge_info *edge_info;
+ 
+   if (! bsi_end_p (bsi))
+     {
+       tree stmt = bsi_stmt (bsi);
+ 
+       if (stmt && TREE_CODE (stmt) == SWITCH_EXPR)
+ 	{
+ 	  tree cond = SWITCH_COND (stmt);
+ 
+ 	  if (TREE_CODE (cond) == SSA_NAME)
+ 	    {
+ 	      tree labels = SWITCH_LABELS (stmt);
+ 	      int i, n_labels = TREE_VEC_LENGTH (labels);
+ 	      tree *info = xcalloc (n_basic_blocks, sizeof (tree));
+ 	      edge e;
+ 	      edge_iterator ei;
+ 
+ 	      for (i = 0; i < n_labels; i++)
+ 		{
+ 		  tree label = TREE_VEC_ELT (labels, i);
+ 		  basic_block target_bb = label_to_block (CASE_LABEL (label));
+ 
+ 		  if (CASE_HIGH (label)
+ 		      || !CASE_LOW (label)
+ 		      || info[target_bb->index])
+ 		    info[target_bb->index] = error_mark_node;
+ 		  else
+ 		    info[target_bb->index] = label;
+ 		}
  
! 	      FOR_EACH_EDGE (e, ei, bb->succs)
! 		{
! 		  basic_block target_bb = e->dest;
! 		  tree node = info[target_bb->index];
! 
! 		  if (node != NULL && node != error_mark_node)
! 		    {
! 		      tree x = fold_convert (TREE_TYPE (cond), CASE_LOW (node));
! 		      edge_info = allocate_edge_info (e);
! 		      edge_info->lhs = cond;
! 		      edge_info->rhs = x;
! 		    }
! 		}
! 	      free (info);
! 	    }
! 	}
! 
!       /* A COND_EXPR may create equivalences too.  */
!       if (stmt && TREE_CODE (stmt) == COND_EXPR)
! 	{
! 	  tree cond = COND_EXPR_COND (stmt);
! 	  edge true_edge;
! 	  edge false_edge;
! 
! 	  extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
! 
! 	  /* If the conditional is a single variable 'X', record 'X = 1'
! 	     for the true edge and 'X = 0' on the false edge.  */
! 	  if (SSA_VAR_P (cond))
! 	    {
! 	      struct edge_info *edge_info;
! 
! 	      edge_info = allocate_edge_info (true_edge);
! 	      edge_info->lhs = cond;
! 	      edge_info->rhs = constant_boolean_node (1, TREE_TYPE (cond));
! 
! 	      edge_info = allocate_edge_info (false_edge);
! 	      edge_info->lhs = cond;
! 	      edge_info->rhs = constant_boolean_node (0, TREE_TYPE (cond));
! 	    }
! 	  /* Equality tests may create one or two equivalences.  */
! 	  else if (COMPARISON_CLASS_P (cond))
! 	    {
! 	      tree op0 = TREE_OPERAND (cond, 0);
! 	      tree op1 = TREE_OPERAND (cond, 1);
! 
! 	      /* Special case comparing booleans against a constant as we
! 		 know the value of OP0 on both arms of the branch.  i.e., we
! 		 can record an equivalence for OP0 rather than COND.  */
! 	      if ((TREE_CODE (cond) == EQ_EXPR || TREE_CODE (cond) == NE_EXPR)
! 		  && TREE_CODE (op0) == SSA_NAME
! 		  && TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE
! 		  && is_gimple_min_invariant (op1))
! 		{
! 		  if (TREE_CODE (cond) == EQ_EXPR)
! 		    {
! 		      edge_info = allocate_edge_info (true_edge);
! 		      edge_info->lhs = op0;
! 		      edge_info->rhs = (integer_zerop (op1)
! 					    ? boolean_false_node
! 					    : boolean_true_node);
! 
! 		      edge_info = allocate_edge_info (false_edge);
! 		      edge_info->lhs = op0;
! 		      edge_info->rhs = (integer_zerop (op1)
! 					    ? boolean_true_node
! 					    : boolean_false_node);
! 		    }
! 		  else
! 		    {
! 		      edge_info = allocate_edge_info (true_edge);
! 		      edge_info->lhs = op0;
! 		      edge_info->rhs = (integer_zerop (op1)
! 					    ? boolean_true_node
! 					    : boolean_false_node);
! 
! 		      edge_info = allocate_edge_info (false_edge);
! 		      edge_info->lhs = op0;
! 		      edge_info->rhs = (integer_zerop (op1)
! 					    ? boolean_false_node
! 					    : boolean_true_node);
! 		    }
! 		}
! 
! 	      if (is_gimple_min_invariant (op0)
! 		  && (TREE_CODE (op1) == SSA_NAME
! 		       || is_gimple_min_invariant (op1)))
! 		{
! 		  tree inverted = invert_truthvalue (cond);
! 		  struct edge_info *edge_info;
! 
! 		  edge_info = allocate_edge_info (true_edge);
! 		  record_conditions (edge_info, cond, inverted);
! 
! 		  if (TREE_CODE (cond) == EQ_EXPR)
! 		    {
! 		      edge_info->lhs = op1;
! 		      edge_info->rhs = op0;
! 		    }
! 
! 		  edge_info = allocate_edge_info (false_edge);
! 		  record_conditions (edge_info, inverted, cond);
! 
! 		  if (TREE_CODE (cond) == NE_EXPR)
! 		    {
! 		      edge_info->lhs = op1;
! 		      edge_info->rhs = op0;
! 		    }
! 		}
! 
! 	      if (TREE_CODE (op0) == SSA_NAME
! 		  && (is_gimple_min_invariant (op1)
! 		      || TREE_CODE (op1) == SSA_NAME))
! 		{
! 		  tree inverted = invert_truthvalue (cond);
! 		  struct edge_info *edge_info;
! 
! 		  edge_info = allocate_edge_info (true_edge);
! 		  record_conditions (edge_info, cond, inverted);
! 
! 		  if (TREE_CODE (cond) == EQ_EXPR)
! 		    {
! 		      edge_info->lhs = op0;
! 		      edge_info->rhs = op1;
! 		    }
! 
! 		  edge_info = allocate_edge_info (false_edge);
! 		  record_conditions (edge_info, inverted, cond);
! 
! 		  if (TREE_CODE (cond) == NE_EXPR)
! 		    {
! 		      edge_info->lhs = op0;
! 		      edge_info->rhs = op1;
! 		    }
! 		}
! 	    }
! 
! 	  /* ??? TRUTH_NOT_EXPR can create an equivalence too.  */
! 	}
!     }
! }
! 
! /* Propagate information from BB to its outgoing edges.
! 
!    This can include equivalency information implied by control statements
!    at the end of BB and const/copy propagation into PHIs in BB's
!    successor blocks.  */
  
  static void
! propagate_to_outgoing_edges (struct dom_walk_data *walk_data ATTRIBUTE_UNUSED,
! 			     basic_block bb)
  {
+   
+   record_edge_info (bb);
    cprop_into_successor_phis (bb, nonzero_vars);
  }
  
*************** extract_range_from_cond (tree cond, tree
*** 3037,3046 ****
  static void
  record_range (tree cond, basic_block bb)
  {
!   /* We explicitly ignore NE_EXPRs.  They rarely allow for meaningful
!      range optimizations and significantly complicate the implementation.  */
!   if (COMPARISON_CLASS_P (cond)
!       && TREE_CODE (cond) != NE_EXPR
        && TREE_CODE (TREE_TYPE (TREE_OPERAND (cond, 1))) == INTEGER_TYPE)
      {
        struct vrp_hash_elt *vrp_hash_elt;
--- 3277,3289 ----
  static void
  record_range (tree cond, basic_block bb)
  {
!   enum tree_code code = TREE_CODE (cond);
! 
!   /* We explicitly ignore NE_EXPRs and all the unordered comparisons.
!      They rarely allow for meaningful range optimizations and significantly
!      complicate the implementation.  */
!   if ((code == LT_EXPR || code == LE_EXPR || code == GT_EXPR
!        || code == GE_EXPR || code == EQ_EXPR)
        && TREE_CODE (TREE_TYPE (TREE_OPERAND (cond, 1))) == INTEGER_TYPE)
      {
        struct vrp_hash_elt *vrp_hash_elt;
*************** record_range (tree cond, basic_block bb)
*** 3076,3205 ****
      }
  }
  
- /* Given a conditional statement IF_STMT, return the assignment 'X = Y'
-    known to be true depending on which arm of IF_STMT is taken.
- 
-    Not all conditional statements will result in a useful assignment.
-    Return NULL_TREE in that case.
- 
-    Also enter into the available expression table statements of
-    the form:
- 
-      TRUE ARM		FALSE ARM
-      1 = cond		1 = cond'
-      0 = cond'		0 = cond
- 
-    This allows us to lookup the condition in a dominated block and
-    get back a constant indicating if the condition is true.  */
- 
- static struct eq_expr_value
- get_eq_expr_value (tree if_stmt,
- 		   int true_arm,
- 		   basic_block bb)
- {
-   tree cond;
-   struct eq_expr_value retval;
- 
-   cond = COND_EXPR_COND (if_stmt);
-   retval.src = NULL;
-   retval.dst = NULL;
- 
-   /* If the conditional is a single variable 'X', return 'X = 1' for
-      the true arm and 'X = 0' on the false arm.  */
-   if (TREE_CODE (cond) == SSA_NAME)
-     {
-       retval.dst = cond;
-       retval.src = constant_boolean_node (true_arm, TREE_TYPE (cond));
-       return retval;
-     }
- 
-   /* If we have a comparison expression, then record its result into
-      the available expression table.  */
-   if (COMPARISON_CLASS_P (cond))
-     {
-       tree op0 = TREE_OPERAND (cond, 0);
-       tree op1 = TREE_OPERAND (cond, 1);
- 
-       /* Special case comparing booleans against a constant as we know
- 	 the value of OP0 on both arms of the branch.  i.e., we can record
- 	 an equivalence for OP0 rather than COND.  */
-       if ((TREE_CODE (cond) == EQ_EXPR || TREE_CODE (cond) == NE_EXPR)
- 	  && TREE_CODE (op0) == SSA_NAME
- 	  && TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE
- 	  && is_gimple_min_invariant (op1))
- 	{
- 	  if ((TREE_CODE (cond) == EQ_EXPR && true_arm)
- 	      || (TREE_CODE (cond) == NE_EXPR && ! true_arm))
- 	    {
- 	      retval.src = op1;
- 	    }
- 	  else
- 	    {
- 	      if (integer_zerop (op1))
- 		retval.src = boolean_true_node;
- 	      else
- 		retval.src = boolean_false_node;
- 	    }
- 	  retval.dst = op0;
- 	  return retval;
- 	}
- 
-       if (TREE_CODE (op0) == SSA_NAME
- 	  && (is_gimple_min_invariant (op1) || TREE_CODE (op1) == SSA_NAME))
- 	{
- 	  tree inverted = invert_truthvalue (cond);
- 
- 	  /* When we find an available expression in the hash table, we replace
- 	     the expression with the LHS of the statement in the hash table.
- 
- 	     So, we want to build statements such as "1 = <condition>" on the
- 	     true arm and "0 = <condition>" on the false arm.  That way if we
- 	     find the expression in the table, we will replace it with its
- 	     known constant value.  Also insert inversions of the result and
- 	     condition into the hash table.  */
- 	  if (true_arm)
- 	    {
- 	      record_cond (cond, boolean_true_node);
- 	      record_dominating_conditions (cond);
- 	      record_cond (inverted, boolean_false_node);
- 
- 	      if (TREE_CONSTANT (op1))
- 		record_range (cond, bb);
- 
- 		/* If the conditional is of the form 'X == Y', return 'X = Y'
- 		   for the true arm.  */
- 	      if (TREE_CODE (cond) == EQ_EXPR)
- 		{
- 		  retval.dst = op0;
- 		  retval.src = op1;
- 		  return retval;
- 		}
- 	    }
- 	  else
- 	    {
- 
- 	      record_cond (inverted, boolean_true_node);
- 	      record_dominating_conditions (inverted);
- 	      record_cond (cond, boolean_false_node);
- 
- 	      if (TREE_CONSTANT (op1))
- 		record_range (inverted, bb);
- 
- 		/* If the conditional is of the form 'X != Y', return 'X = Y'
- 		   for the false arm.  */
- 	      if (TREE_CODE (cond) == NE_EXPR)
- 		{
- 		  retval.dst = op0;
- 		  retval.src = op1;
- 		  return retval;
- 		}
- 	    }
- 	}
-     }
- 
-   return retval;
- }
- 
  /* Hashing and equality functions for VRP_DATA.
  
     Since this hash table is addressed by SSA_NAMEs, we can hash on
--- 3319,3324 ----


* Re: [Bug tree-optimization/15524] [4.0 Regression] jump threading on trees is slow with switch statements with large # of cases
@ 2004-11-22 17:52 Jeffrey A Law
  0 siblings, 0 replies; 18+ messages in thread
From: Jeffrey A Law @ 2004-11-22 17:52 UTC (permalink / raw)
  To: gcc-patches; +Cc: gcc-bugzilla

[-- Attachment #1: Type: text/plain, Size: 842 bytes --]


It's clearly getting more difficult to find compile-time improvements
for 15524.  But that doesn't mean we're done :-)

As it turns out, we've got more code which is just inlined versions
of find_edge, but without the ability to select which list (src->succ
or dest->pred) to walk depending on their lengths.

This patch replaces those inlined instances of find_edge with calls
to find_edge.  The big winner is tree CFG construction, which goes
from 0.40 seconds to around 0.04 seconds.  There are other wins
scattered throughout the various passes.  In all we go from 11.63
seconds of total time to 10.97 seconds -- a net improvement of around
5%.

[ Please don't close this PR.  I still think there's another 4-5%
  that we'll be able to get without major surgery. ]



Bootstrapped and regression tested on i686-pc-linux-gnu.



[-- Attachment #2: PPP --]
[-- Type: text/plain, Size: 13947 bytes --]

	* cfg.c (cached_make_edge): Use find_edge rather than an inlined
	variant.
	* cfgbuild.c (make_edges): Likewise.
	* cfghooks.c (can_duplicate_block_p): Likewise.
	* cfgloop.c (loop_latch_edge): Likewise.
	* cfgloopmanip.c (force_single_succ_latches): Likewise.
	* cfgrtl.c (rtl_flow_call_edges_add): Likewise.
	* predict.c (predict_loops, propagate_freq): Likewise. 
	* tracer.c (tail_duplicate): Likewise.
	* tree-cfg.c (disband_implicit_edges): Likewise.
	(tree_forwarder_block_p, tree_flow_call_edges_add): Likewise.

Index: cfg.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfg.c,v
retrieving revision 1.76
diff -c -p -r1.76 cfg.c
*** cfg.c	22 Nov 2004 12:23:43 -0000	1.76
--- cfg.c	22 Nov 2004 16:57:55 -0000
*************** cached_make_edge (sbitmap *edge_cache, b
*** 288,294 ****
  {
    int use_edge_cache;
    edge e;
-   edge_iterator ei;
  
    /* Don't bother with edge cache for ENTRY or EXIT, if there aren't that
       many edges to them, or we didn't allocate memory for it.  */
--- 288,293 ----
*************** cached_make_edge (sbitmap *edge_cache, b
*** 309,320 ****
  
        /* Fall through.  */
      case 0:
!       FOR_EACH_EDGE (e, ei, src->succs)
! 	if (e->dest == dst)
! 	  {
! 	    e->flags |= flags;
! 	    return NULL;
! 	  }
        break;
      }
  
--- 308,319 ----
  
        /* Fall through.  */
      case 0:
!       e = find_edge (src, dst);
!       if (e)
! 	{
! 	  e->flags |= flags;
! 	  return NULL;
! 	}
        break;
      }
  
Index: cfgbuild.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfgbuild.c,v
retrieving revision 1.55
diff -c -p -r1.55 cfgbuild.c
*** cfgbuild.c	28 Sep 2004 07:59:42 -0000	1.55
--- cfgbuild.c	22 Nov 2004 16:57:55 -0000
*************** make_edges (basic_block min, basic_block
*** 271,277 ****
        enum rtx_code code;
        int force_fallthru = 0;
        edge e;
-       edge_iterator ei;
  
        if (LABEL_P (BB_HEAD (bb))
  	  && LABEL_ALT_ENTRY_P (BB_HEAD (bb)))
--- 271,276 ----
*************** make_edges (basic_block min, basic_block
*** 390,401 ****
  
        /* Find out if we can drop through to the next block.  */
        insn = NEXT_INSN (insn);
!       FOR_EACH_EDGE (e, ei, bb->succs)
! 	if (e->dest == EXIT_BLOCK_PTR && e->flags & EDGE_FALLTHRU)
! 	  {
! 	    insn = 0;
! 	    break;
! 	  }
        while (insn
  	     && NOTE_P (insn)
  	     && NOTE_LINE_NUMBER (insn) != NOTE_INSN_BASIC_BLOCK)
--- 389,398 ----
  
        /* Find out if we can drop through to the next block.  */
        insn = NEXT_INSN (insn);
!       e = find_edge (bb, EXIT_BLOCK_PTR);
!       if (e && e->flags & EDGE_FALLTHRU)
! 	insn = NULL;
! 
        while (insn
  	     && NOTE_P (insn)
  	     && NOTE_LINE_NUMBER (insn) != NOTE_INSN_BASIC_BLOCK)
Index: cfghooks.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfghooks.c,v
retrieving revision 1.19
diff -c -p -r1.19 cfghooks.c
*** cfghooks.c	20 Nov 2004 05:02:28 -0000	1.19
--- cfghooks.c	22 Nov 2004 16:57:55 -0000
*************** bool
*** 675,681 ****
  can_duplicate_block_p (basic_block bb)
  {
    edge e;
-   edge_iterator ei;
  
    if (!cfg_hooks->can_duplicate_block_p)
      internal_error ("%s does not support can_duplicate_block_p.",
--- 675,680 ----
*************** can_duplicate_block_p (basic_block bb)
*** 686,694 ****
  
    /* Duplicating fallthru block to exit would require adding a jump
       and splitting the real last BB.  */
!   FOR_EACH_EDGE (e, ei, bb->succs)
!     if (e->dest == EXIT_BLOCK_PTR && e->flags & EDGE_FALLTHRU)
!        return false;
  
    return cfg_hooks->can_duplicate_block_p (bb);
  }
--- 685,693 ----
  
    /* Duplicating fallthru block to exit would require adding a jump
       and splitting the real last BB.  */
!   e = find_edge (bb, EXIT_BLOCK_PTR);
!   if (e && (e->flags & EDGE_FALLTHRU))
!     return false;
  
    return cfg_hooks->can_duplicate_block_p (bb);
  }
Index: cfgloop.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfgloop.c,v
retrieving revision 1.43
diff -c -p -r1.43 cfgloop.c
*** cfgloop.c	22 Nov 2004 12:23:44 -0000	1.43
--- cfgloop.c	22 Nov 2004 16:57:56 -0000
*************** verify_loop_structure (struct loops *loo
*** 1499,1512 ****
  edge
  loop_latch_edge (const struct loop *loop)
  {
!   edge e;
!   edge_iterator ei;
! 
!   FOR_EACH_EDGE (e, ei, loop->header->preds)
!     if (e->src == loop->latch)
!       break;
! 
!   return e;
  }
  
  /* Returns preheader edge of LOOP.  */
--- 1499,1505 ----
  edge
  loop_latch_edge (const struct loop *loop)
  {
!   return find_edge (loop->latch, loop->header);
  }
  
  /* Returns preheader edge of LOOP.  */
Index: cfgloopmanip.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfgloopmanip.c,v
retrieving revision 1.37
diff -c -p -r1.37 cfgloopmanip.c
*** cfgloopmanip.c	22 Nov 2004 12:23:45 -0000	1.37
--- cfgloopmanip.c	22 Nov 2004 16:57:56 -0000
*************** force_single_succ_latches (struct loops 
*** 1221,1234 ****
  
    for (i = 1; i < loops->num; i++)
      {
-       edge_iterator ei;
        loop = loops->parray[i];
        if (loop->latch != loop->header && EDGE_COUNT (loop->latch->succs) == 1)
  	continue;
  
!       FOR_EACH_EDGE (e, ei, loop->header->preds)
! 	if (e->src == loop->latch)
! 	  break;
  
        loop_split_edge_with (e, NULL_RTX);
      }
--- 1221,1231 ----
  
    for (i = 1; i < loops->num; i++)
      {
        loop = loops->parray[i];
        if (loop->latch != loop->header && EDGE_COUNT (loop->latch->succs) == 1)
  	continue;
  
!       e = find_edge (loop->latch, loop->header);
  
        loop_split_edge_with (e, NULL_RTX);
      }
Index: cfgrtl.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfgrtl.c,v
retrieving revision 1.147
diff -c -p -r1.147 cfgrtl.c
*** cfgrtl.c	22 Nov 2004 12:23:45 -0000	1.147
--- cfgrtl.c	22 Nov 2004 16:57:57 -0000
*************** rtl_flow_call_edges_add (sbitmap blocks)
*** 2972,2986 ****
        if (need_fake_edge_p (insn))
  	{
  	  edge e;
- 	  edge_iterator ei;
  
! 	  FOR_EACH_EDGE (e, ei, bb->succs)
! 	    if (e->dest == EXIT_BLOCK_PTR)
! 	      {
! 		insert_insn_on_edge (gen_rtx_USE (VOIDmode, const0_rtx), e);
! 		commit_edge_insertions ();
! 		break;
! 	      }
  	}
      }
  
--- 2972,2984 ----
        if (need_fake_edge_p (insn))
  	{
  	  edge e;
  
! 	  e = find_edge (bb, EXIT_BLOCK_PTR);
! 	  if (e)
! 	    {
! 	      insert_insn_on_edge (gen_rtx_USE (VOIDmode, const0_rtx), e);
! 	      commit_edge_insertions ();
! 	    }
  	}
      }
  
*************** rtl_flow_call_edges_add (sbitmap blocks)
*** 3023,3031 ****
  #ifdef ENABLE_CHECKING
  	      if (split_at_insn == BB_END (bb))
  		{
! 		  edge_iterator ei;
! 		  FOR_EACH_EDGE (e, ei, bb->succs)
! 		    gcc_assert (e->dest != EXIT_BLOCK_PTR);
  		}
  #endif
  
--- 3021,3028 ----
  #ifdef ENABLE_CHECKING
  	      if (split_at_insn == BB_END (bb))
  		{
! 		  e = find_edge (bb, EXIT_BLOCK_PTR);
! 		  gcc_assert (e == NULL);
  		}
  #endif
  
Index: predict.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/predict.c,v
retrieving revision 1.134
diff -c -p -r1.134 predict.c
*** predict.c	19 Nov 2004 00:03:14 -0000	1.134
--- predict.c	22 Nov 2004 16:57:58 -0000
*************** predict_loops (struct loops *loops_info,
*** 669,681 ****
  
  	  /* Loop branch heuristics - predict an edge back to a
  	     loop's head as taken.  */
! 	  FOR_EACH_EDGE (e, ei, bb->succs)
! 	    if (e->dest == loop->header
! 		&& e->src == loop->latch)
! 	      {
! 		header_found = 1;
! 		predict_edge_def (e, PRED_LOOP_BRANCH, TAKEN);
! 	      }
  
  	  /* Loop exit heuristics - predict an edge exiting the loop if the
  	     conditional has no loop header successors as not taken.  */
--- 669,683 ----
  
  	  /* Loop branch heuristics - predict an edge back to a
  	     loop's head as taken.  */
! 	  if (bb == loop->latch)
! 	    {
! 	      e = find_edge (loop->latch, loop->header);
! 	      if (e)
! 		{
! 		  header_found = 1;
! 		  predict_edge_def (e, PRED_LOOP_BRANCH, TAKEN);
! 		}
! 	    }
  
  	  /* Loop exit heuristics - predict an edge exiting the loop if the
  	     conditional has no loop header successors as not taken.  */
*************** propagate_freq (struct loop *loop, bitma
*** 1660,1680 ****
  
        bitmap_clear_bit (tovisit, bb->index);
  
!       /* Compute back edge frequencies.  */
!       FOR_EACH_EDGE (e, ei, bb->succs)
! 	if (e->dest == head)
! 	  {
! 	    sreal tmp;
  	    
! 	    /* EDGE_INFO (e)->back_edge_prob
! 	       = ((e->probability * BLOCK_INFO (bb)->frequency)
! 	       / REG_BR_PROB_BASE); */
  	    
! 	    sreal_init (&tmp, e->probability, 0);
! 	    sreal_mul (&tmp, &tmp, &BLOCK_INFO (bb)->frequency);
! 	    sreal_mul (&EDGE_INFO (e)->back_edge_prob,
! 		       &tmp, &real_inv_br_prob_base);
! 	  }
  
        /* Propagate to successor blocks.  */
        FOR_EACH_EDGE (e, ei, bb->succs)
--- 1662,1681 ----
  
        bitmap_clear_bit (tovisit, bb->index);
  
!       e = find_edge (bb, head);
!       if (e)
! 	{
! 	  sreal tmp;
  	    
! 	  /* EDGE_INFO (e)->back_edge_prob
! 	     = ((e->probability * BLOCK_INFO (bb)->frequency)
! 	     / REG_BR_PROB_BASE); */
  	    
! 	  sreal_init (&tmp, e->probability, 0);
! 	  sreal_mul (&tmp, &tmp, &BLOCK_INFO (bb)->frequency);
! 	  sreal_mul (&EDGE_INFO (e)->back_edge_prob,
! 		     &tmp, &real_inv_br_prob_base);
! 	}
  
        /* Propagate to successor blocks.  */
        FOR_EACH_EDGE (e, ei, bb->succs)
Index: tracer.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tracer.c,v
retrieving revision 1.22
diff -c -p -r1.22 tracer.c
*** tracer.c	28 Sep 2004 07:59:50 -0000	1.22
--- tracer.c	22 Nov 2004 16:57:58 -0000
*************** tail_duplicate (void)
*** 275,286 ****
  	      && can_duplicate_block_p (bb2))
  	    {
  	      edge e;
- 	      edge_iterator ei;
  	      basic_block old = bb2;
  
! 	      FOR_EACH_EDGE (e, ei, bb2->preds)
! 		if (e->src == bb)
! 		  break;
  
  	      nduplicated += counts [bb2->index];
  	      bb2 = duplicate_block (bb2, e);
--- 275,283 ----
  	      && can_duplicate_block_p (bb2))
  	    {
  	      edge e;
  	      basic_block old = bb2;
  
! 	      e = find_edge (bb, bb2);
  
  	      nduplicated += counts [bb2->index];
  	      bb2 = duplicate_block (bb2, e);
Index: tree-cfg.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-cfg.c,v
retrieving revision 2.112
diff -c -p -r2.112 tree-cfg.c
*** tree-cfg.c	19 Nov 2004 22:14:35 -0000	2.112
--- tree-cfg.c	22 Nov 2004 16:57:59 -0000
*************** disband_implicit_edges (void)
*** 2649,2659 ****
  	     from cfg_remove_useless_stmts here since it violates the
  	     invariants for tree--cfg correspondence and thus fits better
  	     here where we do it anyway.  */
! 	  FOR_EACH_EDGE (e, ei, bb->succs)
  	    {
- 	      if (e->dest != bb->next_bb)
- 		continue;
- 
  	      if (e->flags & EDGE_TRUE_VALUE)
  		COND_EXPR_THEN (stmt) = build_empty_stmt ();
  	      else if (e->flags & EDGE_FALSE_VALUE)
--- 2649,2657 ----
  	     from cfg_remove_useless_stmts here since it violates the
  	     invariants for tree--cfg correspondence and thus fits better
  	     here where we do it anyway.  */
! 	  e = find_edge (bb, bb->next_bb);
! 	  if (e)
  	    {
  	      if (e->flags & EDGE_TRUE_VALUE)
  		COND_EXPR_THEN (stmt) = build_empty_stmt ();
  	      else if (e->flags & EDGE_FALSE_VALUE)
*************** static bool
*** 3892,3899 ****
  tree_forwarder_block_p (basic_block bb)
  {
    block_stmt_iterator bsi;
-   edge e;
-   edge_iterator ei;
  
    /* BB must have a single outgoing edge.  */
    if (EDGE_COUNT (bb->succs) != 1
--- 3890,3895 ----
*************** tree_forwarder_block_p (basic_block bb)
*** 3911,3920 ****
    gcc_assert (bb != ENTRY_BLOCK_PTR);
  #endif
  
!   /* Successors of the entry block are not forwarders.  */
!   FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR->succs)
!     if (e->dest == bb)
!       return false;
  
    /* Now walk through the statements.  We can ignore labels, anything else
       means this is not a forwarder block.  */
--- 3907,3914 ----
    gcc_assert (bb != ENTRY_BLOCK_PTR);
  #endif
  
!   if (find_edge (ENTRY_BLOCK_PTR, bb))
!     return false;
  
    /* Now walk through the statements.  We can ignore labels, anything else
       means this is not a forwarder block.  */
*************** tree_flow_call_edges_add (sbitmap blocks
*** 5206,5212 ****
       Handle this by adding a dummy instruction in a new last basic block.  */
    if (check_last_block)
      {
-       edge_iterator ei;
        basic_block bb = EXIT_BLOCK_PTR->prev_bb;
        block_stmt_iterator bsi = bsi_last (bb);
        tree t = NULL_TREE;
--- 5200,5205 ----
*************** tree_flow_call_edges_add (sbitmap blocks
*** 5217,5229 ****
  	{
  	  edge e;
  
! 	  FOR_EACH_EDGE (e, ei, bb->succs)
! 	    if (e->dest == EXIT_BLOCK_PTR)
! 	      {
! 		bsi_insert_on_edge (e, build_empty_stmt ());
! 		bsi_commit_edge_inserts ();
! 		break;
! 	      }
  	}
      }
  
--- 5210,5221 ----
  	{
  	  edge e;
  
! 	  e = find_edge (bb, EXIT_BLOCK_PTR);
! 	  if (e)
! 	    {
! 	      bsi_insert_on_edge (e, build_empty_stmt ());
! 	      bsi_commit_edge_inserts ();
! 	    }
  	}
      }
  
*************** tree_flow_call_edges_add (sbitmap blocks
*** 5260,5268 ****
  #ifdef ENABLE_CHECKING
  		  if (stmt == last_stmt)
  		    {
! 		      edge_iterator ei;
! 		      FOR_EACH_EDGE (e, ei, bb->succs)
! 			gcc_assert (e->dest != EXIT_BLOCK_PTR);
  		    }
  #endif
  
--- 5252,5259 ----
  #ifdef ENABLE_CHECKING
  		  if (stmt == last_stmt)
  		    {
! 		      e = find_edge (bb, EXIT_BLOCK_PTR);
! 		      gcc_assert (e == NULL);
  		    }
  #endif
  

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug tree-optimization/15524] [4.0 Regression] jump threading on trees is slow with switch statements with large # of cases
@ 2004-11-21 21:18 Jeffrey A Law
  0 siblings, 0 replies; 18+ messages in thread
From: Jeffrey A Law @ 2004-11-21 21:18 UTC (permalink / raw)
  To: gcc-patches; +Cc: gcc-bugzilla

[-- Attachment #1: Type: text/plain, Size: 718 bytes --]

And nearly another 20% improvement.

As mentioned in a previous message, the most expensive code right now
for pr15524 is the code to rescale edge probabilities.  This patch
makes two changes to improve the situation.

First, if the scale factor is 1, then the probabilities will not change
and thus there is no work to do.  This is the common case for pr15524.

When the scale factor is not 1, we compute it once outside the loop,
which saves us a division on every iteration.
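
In outline, the change looks like this (a minimal sketch, not the
patch itself; the prob[] array and scale_denom are stand-ins for the
real successor-edge probabilities and the computed denominator):

  /* Before: one division on every iteration.  */
  for (i = 0; i < n; i++)
    prob[i] = (prob[i] * REG_BR_PROB_BASE) / scale_denom;

  /* After: no work at all in the common case; otherwise the
     division is hoisted out of the loop.  */
  if (scale_denom != REG_BR_PROB_BASE)
    {
      int scale = REG_BR_PROB_BASE / scale_denom;

      for (i = 0; i < n; i++)
        prob[i] *= scale;
    }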

Before:

dominator optimization:   3.50 (24%)
TOTAL                 :  14.53

After:

dominator optimization:   0.61 ( 5%)
TOTAL                 :  11.84

Bootstrapped and regression tested on i686-pc-linux-gnu.

[-- Attachment #2: PPP --]
[-- Type: text/plain, Size: 1108 bytes --]

	* cfg.c (update_bb_profile_for_threading): Do not rescale the
	successor probabilities if they are not going to change.  Pull
	division out of loop if we do need to rescale successor probabilities.
	
Index: cfg.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfg.c,v
retrieving revision 1.74
diff -c -p -r1.74 cfg.c
*** cfg.c	20 Nov 2004 05:02:28 -0000	1.74
--- cfg.c	21 Nov 2004 20:39:19 -0000
*************** update_bb_profile_for_threading (basic_b
*** 941,949 ****
        for (; (c = ei_safe_edge (ei)); ei_next (&ei))
  	c->probability = 0;
      }
!   else
!     FOR_EACH_EDGE (c, ei, bb->succs)
!       c->probability = ((c->probability * REG_BR_PROB_BASE) / (double) prob);
  
    if (bb != taken_edge->src)
      abort ();
--- 941,953 ----
        for (; (c = ei_safe_edge (ei)); ei_next (&ei))
  	c->probability = 0;
      }
!   else if (prob != REG_BR_PROB_BASE)
!     {
!       int scale = REG_BR_PROB_BASE / prob;
! 
!       FOR_EACH_EDGE (c, ei, bb->succs)
! 	c->probability *= scale;
!     }
  
    if (bb != taken_edge->src)
      abort ();

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug tree-optimization/15524] [4.0 Regression] jump threading on trees is slow with switch statements with large # of cases
  2004-11-20 20:49   ` Jeffrey A Law
@ 2004-11-21 14:17     ` Eric Botcazou
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Botcazou @ 2004-11-21 14:17 UTC (permalink / raw)
  To: law; +Cc: gcc-patches, gcc-bugzilla

> I haven't seen the comparison failure on i686 -- if I had I never
> would have checked in the patch.  PPC is the only other kind of box
> I have here locally.

Right, the problem seems to be very sensitive to the environment.  H.J. saw it 
on AMD64 while I didn't.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug tree-optimization/15524] [4.0 Regression] jump threading on trees is slow with switch statements with large # of cases
  2004-11-20 20:41 ` Eric Botcazou
@ 2004-11-20 20:49   ` Jeffrey A Law
  2004-11-21 14:17     ` Eric Botcazou
  0 siblings, 1 reply; 18+ messages in thread
From: Jeffrey A Law @ 2004-11-20 20:49 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches, gcc-bugzilla

On Sat, 2004-11-20 at 21:33 +0100, Eric Botcazou wrote:
> > [ Yes, I got the message regarding the bootstrap comparison failure
> >   on darwin.  I'm going to try and reproduce it on a relatively old
> >   PPC box here.  It'll take quite some time though.  I do plan on
> >   flushing out one more patch while my PPC bootstrap is running. ]
> 
> There are problems on x86_64, SPARC, PowerPC and i686, see the PR.
I haven't seen the comparison failure on i686 -- if I had I never
would have checked in the patch.  PPC is the only other kind of box
I have here locally.

There are other approaches I can (and will) try while the PPC
box is doing its thing.
jeff


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug tree-optimization/15524] [4.0 Regression] jump threading on trees is slow with switch statements with large # of cases
  2004-11-20 20:32 Jeffrey A Law
@ 2004-11-20 20:41 ` Eric Botcazou
  2004-11-20 20:49   ` Jeffrey A Law
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Botcazou @ 2004-11-20 20:41 UTC (permalink / raw)
  To: law; +Cc: gcc-patches, gcc-bugzilla

> [ Yes, I got the message regarding the bootstrap comparison failure
>   on darwin.  I'm going to try and reproduce it on a relatively old
>   PPC box here.  It'll take quite some time though.  I do plan on
>   flushing out one more patch while my PPC bootstrap is running. ]

There are problems on x86_64, SPARC, PowerPC and i686, see the PR.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug tree-optimization/15524] [4.0 Regression] jump threading on trees is slow with switch statements with large # of cases
@ 2004-11-20 20:32 Jeffrey A Law
  2004-11-20 20:41 ` Eric Botcazou
  0 siblings, 1 reply; 18+ messages in thread
From: Jeffrey A Law @ 2004-11-20 20:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: gcc-bugzilla

[-- Attachment #1: Type: text/plain, Size: 1779 bytes --]



This gets regrename off the radar.

Look at this code from regrename.c and tell me if you can find the part
that isn't particularly intelligent:



  FOR_EACH_BB (bb)
    {
      /* If a block has a single predecessor, that we've already
         processed, begin with the value data that was live at
         the end of the predecessor block.  */
      /* ??? Ought to use more intelligent queuing of blocks.  */
      if (EDGE_COUNT (bb->preds) > 0)
        for (bbp = bb; bbp && bbp != EDGE_PRED (bb, 0)->src; bbp = bbp->prev_bb);
      if (EDGE_COUNT (bb->preds) == 1
          && ! (EDGE_PRED (bb, 0)->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
          && EDGE_PRED (bb, 0)->src != ENTRY_BLOCK_PTR
          && bbp)
        all_vd[bb->index] = all_vd[EDGE_PRED (bb, 0)->src->index];
      else
        init_value_data (all_vd + bb->index);

      if (copyprop_hardreg_forward_1 (bb, all_vd + bb->index))
        need_refresh = true;
    }


OK.  A hint.  bbp isn't used/set anywhere outside this loop...

OK.  Another hint.  If BB has more than one predecessor, is the loop
walking through the prev_bb blocks to set bbp even necessary?

That's right.  We do that silly loop setting bbp even in cases where
it's totally unnecessary.  
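
In sketch form, using the names from the snippet above, the fix is to
guard the walk so it only runs when its result can actually be used:

  /* Only search backwards for the predecessor when BB has exactly
     one predecessor -- the only case in which BBP is consumed.  */
  if (EDGE_COUNT (bb->preds) == 1)
    for (bbp = bb; bbp && bbp != EDGE_PRED (bb, 0)->src; bbp = bbp->prev_bb)
      ;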

Before:

rename registers      :  3.27 (18%) 
TOTAL                 : 17.77


After:
rename registers      :  0.19 ( 1%) 
TOTAL                 : 14.53


Yup, that's nearly a net 20% improvement for pr15524.  I would expect
a small improvement for more normal codes.

[ Yes, I got the message regarding the bootstrap comparison failure
  on darwin.  I'm going to try and reproduce it on a relatively old
  PPC box here.  It'll take quite some time though.  I do plan on
  flushing out one more patch while my PPC bootstrap is running. ]



[-- Attachment #2: PPP --]
[-- Type: text/plain, Size: 1289 bytes --]

	* regrename.c (copyprop_hardreg_forward): Only search for a
	previously processed block if the current block only has one
	predecessor.


Index: regrename.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/regrename.c,v
retrieving revision 1.90
diff -c -p -r1.90 regrename.c
*** regrename.c	4 Nov 2004 14:07:58 -0000	1.90
--- regrename.c	20 Nov 2004 20:14:54 -0000
*************** copyprop_hardreg_forward (void)
*** 1757,1763 ****
  	 processed, begin with the value data that was live at
  	 the end of the predecessor block.  */
        /* ??? Ought to use more intelligent queuing of blocks.  */
!       if (EDGE_COUNT (bb->preds) > 0)
  	for (bbp = bb; bbp && bbp != EDGE_PRED (bb, 0)->src; bbp = bbp->prev_bb);
        if (EDGE_COUNT (bb->preds) == 1
  	  && ! (EDGE_PRED (bb, 0)->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
--- 1757,1763 ----
  	 processed, begin with the value data that was live at
  	 the end of the predecessor block.  */
        /* ??? Ought to use more intelligent queuing of blocks.  */
!       if (EDGE_COUNT (bb->preds) == 1)
  	for (bbp = bb; bbp && bbp != EDGE_PRED (bb, 0)->src; bbp = bbp->prev_bb);
        if (EDGE_COUNT (bb->preds) == 1
  	  && ! (EDGE_PRED (bb, 0)->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug tree-optimization/15524] [4.0 Regression] jump threading on trees is slow with switch statements with large # of cases
@ 2004-11-20  0:28 Jeffrey A Law
  0 siblings, 0 replies; 18+ messages in thread
From: Jeffrey A Law @ 2004-11-20  0:28 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1141 bytes --]

And the fun never stops....

Now that we've pushed down the SWITCH_EXPR problems with this PR, the
dominator optimizer has bubbled back up to the top of the compile time
hogs list.

There are two significant issues that stick out like a sore thumb.
The first, and clearly the most problematic, is updating the profile
data.  I'm not attacking that problem just yet.

The second is the code to update the CFG and SSA graph in response
to jump threading -- it's doing some linear searches.  Those linear
searches are typically OK, but they do get to be relatively expensive
in this case.

Building a hash table for the unique threaded destinations and a list of
the incoming edges for each of those destinations allows us to eliminate
the annoyingly slow linear searches completely and improve 15524 by a
couple percent.  Unfortunately, this showed no measurable improvement
on more normal codes.
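
The core of the change is a mapping from each unique outgoing
(threaded-to) edge to the list of incoming edges which thread to it.
Edges are unique objects, so hashing and comparing the pointer itself
suffices.  A condensed sketch of what the patch below implements:

  struct el { edge e; struct el *next; };

  struct redirection_data
  {
    basic_block dup_block;      /* Duplicate of BB for this destination.  */
    edge outgoing_edge;         /* Hash key: unique threaded destination.  */
    struct el *incoming_edges;  /* Incoming edges threaded there.  */
  };

  /* Hash on the edge pointer itself.  */
  static hashval_t
  redirection_data_hash (const void *p)
  {
    return htab_hash_pointer (((struct redirection_data *) p)->outgoing_edge);
  }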

Before:
dominator optimization:   4.02 (22%) usr
TOTAL                 :  18.21

After:
dominator optimization:   3.80 (21%) usr
TOTAL                 :  18.00

Not huge, but it's noticeable.

Bootstrapped and regression tested on i686-pc-linux-gnu.



[-- Attachment #2: PPP --]
[-- Type: text/plain, Size: 23809 bytes --]

	* tree-ssa-threadupdate.c: Replace REDIRECTION_DATA varray with
	a hash table.  Extensive modifications throughout to support
	that change.
	(struct el): New.
	(struct local_info): New.
	(struct redirection_data): Add new INCOMING_EDGES and DO_NOT_DUPLICATE
	fields.
	(redirection_data): Now a hashtable.
	(redirection_data_hash, redirection_data_eq): New.
	(lookup_redirection_data, create_duplicates): New.
	(create_edge_and_update_destination_phis): New.
	(fixup_template_block, redirect_edges): New.
	(thread_block): Use hash table traversals instead of loops over
	varray entries or incoming edge vectors.


Index: tree-ssa-threadupdate.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-threadupdate.c,v
retrieving revision 2.13
diff -c -p -r2.13 tree-ssa-threadupdate.c
*** tree-ssa-threadupdate.c	9 Nov 2004 19:25:04 -0000	2.13
--- tree-ssa-threadupdate.c	19 Nov 2004 22:53:26 -0000
*************** Boston, MA 02111-1307, USA.  */
*** 73,86 ****
  
   Note that block duplication can be minimized by first collecting
     the set of unique destination blocks that the incoming edges should
!    be threaded to.  Block duplication can be further minimized by using 
     B instead of creating B' for one destination if all edges into B are
!    going to be threaded to a successor of B.  */
  
  
  /* Main data structure recording information regarding B's duplicate
     blocks.  */
  
  struct redirection_data
  {
    /* A duplicate of B with the trailing control statement removed and which
--- 73,113 ----
  
   Note that block duplication can be minimized by first collecting
     the set of unique destination blocks that the incoming edges should
!    be threaded to.  Block duplication can be further minimized by using
     B instead of creating B' for one destination if all edges into B are
!    going to be threaded to a successor of B.
  
+    We further reduce the number of edges and statements we create by
+    not copying all the outgoing edges and the control statement in
+    step #1.  We instead create a template block without the outgoing
+    edges and duplicate the template.  */
+ 
+ 
+ /* Steps #5 and #6 of the above algorithm are best implemented by walking
+    all the incoming edges which thread to the same destination edge at
+    the same time.  That avoids lots of table lookups to get information
+    for the destination edge.
+ 
+    To realize that implementation we create a list of incoming edges
+    which thread to the same outgoing edge.  Thus to implement steps
+    #5 and #6 we traverse our hash table of outgoing edge information.
+    For each entry we walk the list of incoming edges which thread to
+    the current outgoing edge.  */
+ 
+ struct el
+ {
+   edge e;
+   struct el *next;
+ };
  
  /* Main data structure recording information regarding B's duplicate
     blocks.  */
  
+ /* We need to efficiently record the unique thread destinations of this
+    block and specific information associated with those destinations.  We
+    may have many incoming edges threaded to the same outgoing edge.  This
+    can be naturally implemented with a hash table.  */
+ 
  struct redirection_data
  {
    /* A duplicate of B with the trailing control statement removed and which
*************** struct redirection_data
*** 90,99 ****
    /* An outgoing edge from B.  DUP_BLOCK will have OUTGOING_EDGE->dest as
       its single successor.  */
    edge outgoing_edge;
  };
  
  /* Main data structure to hold information for duplicates of BB.  */
! static varray_type redirection_data;
  
  /* Remove the last statement in block BB if it is a control statement
     Also remove all outgoing edges except the edge which reaches DEST_BB.
--- 117,146 ----
    /* An outgoing edge from B.  DUP_BLOCK will have OUTGOING_EDGE->dest as
       its single successor.  */
    edge outgoing_edge;
+ 
+   /* A list of incoming edges which we want to thread to
+      OUTGOING_EDGE->dest.  */
+   struct el *incoming_edges;
+ 
+   /* Flag indicating whether or not we should create a duplicate block
+      for this thread destination.  This is only true if we are threading
+      all incoming edges and thus are using BB itself as a duplicate block.  */
+   bool do_not_duplicate;
  };
  
  /* Main data structure to hold information for duplicates of BB.  */
! static htab_t redirection_data;
! 
! /* Data structure of information to pass to hash table traversal routines.  */
! struct local_info
! {
!   /* The current block we are working on.  */
!   basic_block bb;
! 
!   /* A template copy of BB with no outgoing edges or control statement that
!      we use for creating copies.  */
!   basic_block template_block;
! };
  
  /* Remove the last statement in block BB if it is a control statement
     Also remove all outgoing edges except the edge which reaches DEST_BB.
*************** create_block_for_threading (basic_block 
*** 151,161 ****
    remove_ctrl_stmt_and_useless_edges (rd->dup_block, NULL);
  }
  
  /* BB is a block which ends with a COND_EXPR or SWITCH_EXPR and when BB
     is reached via one or more specific incoming edges, we know which
     outgoing edge from BB will be traversed.
  
!    We want to redirect those incoming edges to the target of the 
     appropriate outgoing edge.  Doing so avoids a conditional branch
     and may expose new optimization opportunities.  Note that we have
     to update dominator tree and SSA graph after such changes.
--- 198,434 ----
    remove_ctrl_stmt_and_useless_edges (rd->dup_block, NULL);
  }
  
+ /* Hashing and equality routines for our hash table.  */
+ static hashval_t
+ redirection_data_hash (const void *p)
+ {
+   edge e = ((struct redirection_data *)p)->outgoing_edge;
+   return htab_hash_pointer (e);
+ }
+ 
+ static int
+ redirection_data_eq (const void *p1, const void *p2)
+ {
+   edge e1 = ((struct redirection_data *)p1)->outgoing_edge;
+   edge e2 = ((struct redirection_data *)p2)->outgoing_edge;
+ 
+   return e1 == e2;
+ }
+ 
+ /* Given an outgoing edge E lookup and return its entry in our hash table.
+ 
+    If INSERT is true, then we insert the entry into the hash table if
+    it is not already present.  INCOMING_EDGE is added to the list of incoming
+    edges associated with E in the hash table.  */
+ 
+ static struct redirection_data *
+ lookup_redirection_data (edge e, edge incoming_edge, bool insert)
+ {
+   void **slot;
+   struct redirection_data *elt;
+ 
+  /* Build a hash table element so we can see if E is already
+      in the table.  */
+   elt = xmalloc (sizeof (struct redirection_data));
+   elt->outgoing_edge = e;
+   elt->dup_block = NULL;
+   elt->do_not_duplicate = false;
+   elt->incoming_edges = NULL;
+ 
+   slot = htab_find_slot (redirection_data, elt, insert);
+ 
+   /* This will only happen if INSERT is false and the entry is not
+      in the hash table.  */
+   if (slot == NULL)
+     {
+       free (elt);
+       return NULL;
+     }
+ 
+   /* This will only happen if E was not in the hash table and
+      INSERT is true.  */
+   if (*slot == NULL)
+     {
+       *slot = (void *)elt;
+       elt->incoming_edges = xmalloc (sizeof (struct el));
+       elt->incoming_edges->e = incoming_edge;
+       elt->incoming_edges->next = NULL;
+       return elt;
+     }
+   /* E was in the hash table.  */
+   else
+     {
+       /* Free ELT as we do not need it anymore, we will extract the
+ 	 relevant entry from the hash table itself.  */
+       free (elt);
+ 
+       /* Get the entry stored in the hash table.  */
+       elt = (struct redirection_data *) *slot;
+ 
+       /* If insertion was requested, then we need to add INCOMING_EDGE
+ 	 to the list of incoming edges associated with E.  */
+       if (insert)
+ 	{
+           struct el *el = xmalloc (sizeof (struct el));
+ 	  el->next = elt->incoming_edges;
+ 	  el->e = incoming_edge;
+ 	  elt->incoming_edges = el;
+ 	}
+ 
+       return elt;
+     }
+ }
+ 
+ /* Given a duplicate block and its single destination (both stored
+    in RD), create an edge between the duplicate and its single
+    destination.
+ 
+    Add an additional argument to any PHI nodes at the single
+    destination.  */
+ 
+ static void
+ create_edge_and_update_destination_phis (struct redirection_data *rd)
+ {
+   edge e = make_edge (rd->dup_block, rd->outgoing_edge->dest, EDGE_FALLTHRU);
+   tree phi;
+ 
+   /* If there are any PHI nodes at the destination of the outgoing edge
+      from the duplicate block, then we will need to add a new argument
+      to them.  The argument should have the same value as the argument
+      associated with the outgoing edge stored in RD.  */
+   for (phi = phi_nodes (e->dest); phi; phi = PHI_CHAIN (phi))
+     {
+       int indx = phi_arg_from_edge (phi, rd->outgoing_edge);
+       add_phi_arg (&phi, PHI_ARG_DEF_TREE (phi, indx), e);
+     }
+ }
+ 
+ /* Hash table traversal callback routine to create duplicate blocks.  */
+ 
+ static int
+ create_duplicates (void **slot, void *data)
+ {
+   struct redirection_data *rd = (struct redirection_data *) *slot;
+   struct local_info *local_info = (struct local_info *)data;
+ 
+   /* If this entry should not have a duplicate created, then there's
+      nothing to do.  */
+   if (rd->do_not_duplicate)
+     return 1;
+ 
+   /* Create a template block if we have not done so already.  Otherwise
+      use the template to create a new block.  */
+   if (local_info->template_block == NULL)
+     {
+       create_block_for_threading (local_info->bb, rd);
+       local_info->template_block = rd->dup_block;
+ 
+       /* We do not create any outgoing edges for the template.  We will
+ 	 take care of that in a later traversal.  That way we do not
+ 	 create edges that are going to just be deleted.  */
+     }
+   else
+     {
+       create_block_for_threading (local_info->template_block, rd);
+ 
+       /* Go ahead and wire up outgoing edges and update PHIs for the duplicate
+          block.  */
+       create_edge_and_update_destination_phis (rd);
+     }
+ 
+   /* Keep walking the hash table.  */
+   return 1;
+ }
+ 
+ /* We did not create any outgoing edges for the template block during
+    block creation.  This hash table traversal callback creates the
+    outgoing edge for the template block.  */
+ 
+ static int
+ fixup_template_block (void **slot, void *data)
+ {
+   struct redirection_data *rd = (struct redirection_data *) *slot;
+   struct local_info *local_info = (struct local_info *)data;
+ 
+   /* If this is the template block, then create its outgoing edges
+      and halt the hash table traversal.  */
+   if (rd->dup_block && rd->dup_block == local_info->template_block)
+     {
+       create_edge_and_update_destination_phis (rd);
+       return 0;
+     }
+ 
+   return 1;
+ }
+ 
+ /* Hash table traversal callback to redirect each incoming edge
+    associated with this hash table element to its new destination.  */
+ 
+ static int
+ redirect_edges (void **slot, void *data)
+ {
+   struct redirection_data *rd = (struct redirection_data *) *slot;
+   struct local_info *local_info = (struct local_info *)data;
+   struct el *next, *el;
+ 
+   /* Walk over all the incoming edges associated with this
+      hash table entry.  */
+   for (el = rd->incoming_edges; el; el = next)
+     {
+       edge e = el->e;
+ 
+       /* Go ahead and free this element from the list.  Doing this now
+ 	 avoids the need for another list walk when we destroy the hash
+ 	 table.  */
+       next = el->next;
+       free (el);
+ 
+       /* Go ahead and clear E->aux.  It's not needed anymore and failure
+          to clear it will cause all kinds of unpleasant problems later.  */
+       e->aux = NULL;
+ 
+       if (rd->dup_block)
+ 	{
+ 	  edge e2;
+ 
+ 	  if (dump_file && (dump_flags & TDF_DETAILS))
+ 	    fprintf (dump_file, "  Threaded jump %d --> %d to %d\n",
+ 		     e->src->index, e->dest->index, rd->dup_block->index);
+ 
+ 	  /* Redirect the incoming edge to the appropriate duplicate
+ 	     block.  */
+ 	  e2 = redirect_edge_and_branch (e, rd->dup_block);
+ 	  flush_pending_stmts (e2);
+ 
+ 	  if ((dump_file && (dump_flags & TDF_DETAILS))
+ 	      && e->src != e2->src)
+ 	    fprintf (dump_file, "    basic block %d created\n", e2->src->index);
+ 	}
+       else
+ 	{
+ 	  if (dump_file && (dump_flags & TDF_DETAILS))
+ 	    fprintf (dump_file, "  Threaded jump %d --> %d to %d\n",
+ 		     e->src->index, e->dest->index, local_info->bb->index);
+ 
+ 	  /* We are using BB as the duplicate.  Remove the unnecessary
+ 	     outgoing edges and statements from BB.  */
+ 	  remove_ctrl_stmt_and_useless_edges (local_info->bb,
+ 					      rd->outgoing_edge->dest);
+ 
+ 	  /* And fixup the flags on the single remaining edge.  */
+ 	  EDGE_SUCC (local_info->bb, 0)->flags
+ 	    &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
+ 	  EDGE_SUCC (local_info->bb, 0)->flags |= EDGE_FALLTHRU;
+ 	}
+     }
+   return 1;
+ }
+ 
  /* BB is a block which ends with a COND_EXPR or SWITCH_EXPR and when BB
     is reached via one or more specific incoming edges, we know which
     outgoing edge from BB will be traversed.
  
!    We want to redirect those incoming edges to the target of the
     appropriate outgoing edge.  Doing so avoids a conditional branch
     and may expose new optimization opportunities.  Note that we have
     to update dominator tree and SSA graph after such changes.
*************** thread_block (basic_block bb)
*** 187,205 ****
       redirect to a duplicate of BB.  */
    edge e;
    edge_iterator ei;
!   basic_block template_block;
  
    /* ALL indicates whether or not all incoming edges into BB should
       be threaded to a duplicate of BB.  */
    bool all = true;
  
!   unsigned int i;
! 
!   VARRAY_GENERIC_PTR_INIT (redirection_data, 2, "redirection data");
  
!   /* Look at each incoming edge into BB.  Record each unique outgoing
!      edge that we want to thread an incoming edge to.  Also note if
!      all incoming edges are threaded or not.  */
    FOR_EACH_EDGE (e, ei, bb->preds)
      {
        if (!e->aux)
--- 460,482 ----
       redirect to a duplicate of BB.  */
    edge e;
    edge_iterator ei;
!   struct local_info local_info;
  
    /* ALL indicates whether or not all incoming edges into BB should
       be threaded to a duplicate of BB.  */
    bool all = true;
  
!   /* To avoid scanning a linear array for the element we need, we instead
!      use a hash table.  For normal code there should be no noticeable
!      difference.  However, if we have a block with a large number of
!      incoming and outgoing edges, such linear searches can get expensive.  */
!   redirection_data = htab_create (EDGE_COUNT (bb->succs),
! 				  redirection_data_hash,
! 				  redirection_data_eq,
! 				  free);
  
!   /* Record each unique threaded destination into a hash table for
!      efficient lookups.  */
    FOR_EACH_EDGE (e, ei, bb->preds)
      {
        if (!e->aux)
*************** thread_block (basic_block bb)
*** 208,246 ****
  	}
        else
  	{
! 	  unsigned int i;
  
! 	  /* See if we can find an entry for the destination of this
! 	     threaded edge that has already been recorded.  */
! 	  for (i = 0; i < VARRAY_ACTIVE_SIZE (redirection_data); i++)
! 	    {
! 	      struct redirection_data *rd;
! 	      edge e2;
! 
! 	      rd = VARRAY_GENERIC_PTR (redirection_data, i);
! 	      e2 = e->aux;
! 
! 	      if (e2->dest == rd->outgoing_edge->dest)
! 		break;
! 	    }
! 
! 	  /* If the loop did not terminate early, then we have a new
! 	     destination for the incoming threaded edges.  Record it.  */
! 	  if (i == VARRAY_ACTIVE_SIZE (redirection_data))
! 	    {
! 	      struct redirection_data *rd;
! 
! 	      rd = ggc_alloc_cleared (sizeof (struct redirection_data));
! 	      rd->outgoing_edge = e->aux;
! 	      VARRAY_PUSH_GENERIC_PTR (redirection_data, rd);
! 	    }
  	}
      }
  
!   /* Now create duplicates of BB.  Note that if all incoming edges are
!      threaded, then BB is going to become unreachable.  In that case
!      we use BB for one of the duplicates rather than wasting memory
!      duplicating BB.  Thus the odd starting condition for the loop.
  
       Note that for a block with a high outgoing degree we can waste
       a lot of time and memory creating and destroying useless edges.
--- 485,509 ----
  	}
        else
  	{
! 	  edge e2 = e->aux;
  
! 	  /* Insert the outgoing edge into the hash table if it is not
! 	     already in the hash table.  */
! 	  lookup_redirection_data (e2, e, true);
  	}
      }
  
!   /* If we are going to thread all incoming edges to an outgoing edge, then
!      BB will become unreachable.  Rather than just throwing it away, use
!      it for one of the duplicates.  Mark the first incoming edge with the
!      DO_NOT_DUPLICATE attribute.  */
!   if (all)
!     {
!       edge e = EDGE_PRED (bb, 0)->aux;
!       lookup_redirection_data (e, NULL, false)->do_not_duplicate = true;
!     }
! 
!   /* Now create duplicates of BB.
  
       Note that for a block with a high outgoing degree we can waste
       a lot of time and memory creating and destroying useless edges.
*************** thread_block (basic_block bb)
*** 249,382 ****
       tail of the duplicate as well as all outgoing edges from the
       duplicate.  We then use that duplicate block as a template for
       the rest of the duplicates.  */
!   template_block = NULL;
!   for (i = (all ? 1 : 0); i < VARRAY_ACTIVE_SIZE (redirection_data); i++)
!     {
!       struct redirection_data *rd = VARRAY_GENERIC_PTR (redirection_data, i);
! 
!       if (template_block == NULL)
! 	{
! 	  create_block_for_threading (bb, rd);
! 	  template_block = rd->dup_block;
! 	}
!       else
! 	{
! 	  create_block_for_threading (template_block, rd);
! 	}
!     }
! 
!   /* Now created up edges from the duplicate blocks to their new
!      destinations.  Doing this as a separate loop after block creation
!      allows us to avoid creating lots of useless edges.  */
!   for (i = (all ? 1 : 0); i < VARRAY_ACTIVE_SIZE (redirection_data); i++)
!     {
!       struct redirection_data *rd = VARRAY_GENERIC_PTR (redirection_data, i);
!       tree phi;
!       edge e;
! 
!       e = make_edge (rd->dup_block, rd->outgoing_edge->dest, EDGE_FALLTHRU);
! 
!       /* If there are any PHI nodes at the destination of the outgoing edge
! 	 from the duplicate block, then we will need to add a new argument
! 	 to them.  The argument should have the same value as the argument
! 	 associated with the outgoing edge stored in RD.  */
!       for (phi = phi_nodes (e->dest); phi; phi = PHI_CHAIN (phi))
! 	{
! 	  int indx = phi_arg_from_edge (phi, rd->outgoing_edge);
! 	  add_phi_arg (&phi, PHI_ARG_DEF_TREE (phi, indx), e);
! 	}
!     }
! 
!   /* The loop above created the duplicate blocks (and the statements
!      within the duplicate blocks).  This loop creates PHI nodes for the
!      duplicated blocks and redirects the incoming edges into BB to reach
!      the duplicates of BB.
! 
!      Note that redirecting the edge will change e->pred_next, so we have
!      to hold e->pred_next in a temporary. 
! 
!      If this turns out to be a performance problem, then we could create
!      a list of incoming edges associated with each entry in 
!      REDIRECTION_DATA and walk over that list of edges instead.  */
!   for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
!     {
!       edge new_dest = e->aux;
! 
!       /* E was not threaded, then there is nothing to do.  */
!       if (!new_dest)
! 	{
! 	  ei_next (&ei);
! 	  continue;
! 	}
! 
!       /* Go ahead and clear E->aux.  It's not needed anymore and failure
!          to clear it will cause all kinds of unpleasant problems later.  */
!       e->aux = NULL;
! 
!       /* We know E is an edge we want to thread.  Find the entry associated
!          with E's new destination in the REDIRECTION_DATA array.  */
!       for (i = 0; i < VARRAY_ACTIVE_SIZE (redirection_data); i++)
! 	{
! 	  struct redirection_data *rd;
! 
! 	  rd = VARRAY_GENERIC_PTR (redirection_data, i);
! 
! 	  /* We have found the right entry if the outgoing edge in this
! 	     entry matches E's new destination.  Note that if we have not
! 	     created a duplicate block (rd->dup_block is NULL), then we
! 	     are going to re-use BB as a duplicate and we do not need
! 	     to create PHI nodes or redirect the edge.  */
! 	  if (rd->outgoing_edge == new_dest && rd->dup_block)
! 	    {
! 	      edge e2;
! 
! 	      if (dump_file && (dump_flags & TDF_DETAILS))
! 		fprintf (dump_file, "  Threaded jump %d --> %d to %d\n",
! 			 e->src->index, e->dest->index, rd->dup_block->index);
! 
! 	      e2 = redirect_edge_and_branch (e, rd->dup_block);
! 	      flush_pending_stmts (e2);
! 
! 	      if ((dump_file && (dump_flags & TDF_DETAILS))
! 		  && e->src != e2->src)
! 	      fprintf (dump_file, "    basic block %d created\n",
! 		       e2->src->index);
! 	      break;
! 	    }
! 	}
!     }
! 
!   /* If all the incoming edges were threaded, then we used BB as one
!      of the duplicate blocks.  We need to fixup BB in that case so that
!      it no longer has a COND_EXPR or SWITCH_EXPR and reaches one destination
!      unconditionally.  */
!   if (all)
!     {
!       struct redirection_data *rd;
! 
!       rd = VARRAY_GENERIC_PTR (redirection_data, 0);
! 
!       if (dump_file && (dump_flags & TDF_DETAILS))
! 	fprintf (dump_file, "  Threaded jump %d --> %d to %d\n",
! 		 EDGE_PRED (bb, 0)->src->index, bb->index,
! 		 EDGE_SUCC (bb, 0)->dest->index);
! 
!       remove_ctrl_stmt_and_useless_edges (bb, rd->outgoing_edge->dest);
!       EDGE_SUCC (bb, 0)->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
!       EDGE_SUCC (bb, 0)->flags |= EDGE_FALLTHRU;
!     }
  
    /* Done with this block.  Clear REDIRECTION_DATA.  */
!   VARRAY_CLEAR (redirection_data);
  }
  
! /* Walk through all blocks and thread incoming edges to the block's 
     destinations as requested.  This is the only entry point into this
     file.
  
     Blocks which have one or more incoming edges have INCOMING_EDGE_THREADED
     set in the block's annotation.
-    this routine.
  
     Each edge that should be threaded has the new destination edge stored in
     the original edge's AUX field.
--- 512,545 ----
       tail of the duplicate as well as all outgoing edges from the
       duplicate.  We then use that duplicate block as a template for
       the rest of the duplicates.  */
!   local_info.template_block = NULL;
!   local_info.bb = bb;
!   htab_traverse (redirection_data, create_duplicates, &local_info);
! 
!   /* The template does not have an outgoing edge.  Create that outgoing
!      edge and update PHI nodes as the edge's target as necessary.
! 
!      We do this after creating all the duplicates to avoid creating
!      unnecessary edges.  */
!   htab_traverse (redirection_data, fixup_template_block, &local_info);
! 
!   /* The hash table traversals above created the duplicate blocks (and the
!      statements within the duplicate blocks).  This loop creates PHI nodes for
!      the duplicated blocks and redirects the incoming edges into BB to reach
!      the duplicates of BB.  */
!   htab_traverse (redirection_data, redirect_edges, &local_info);
  
    /* Done with this block.  Clear REDIRECTION_DATA.  */
!   htab_delete (redirection_data);
!   redirection_data = NULL;
  }
  
! /* Walk through all blocks and thread incoming edges to the block's
     destinations as requested.  This is the only entry point into this
     file.
  
     Blocks which have one or more incoming edges have INCOMING_EDGE_THREADED
     set in the block's annotation.
  
     Each edge that should be threaded has the new destination edge stored in
     the original edge's AUX field.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug tree-optimization/15524] [4.0 Regression] jump threading on trees is slow with switch statements with large # of cases
@ 2004-11-17 21:40 Jeffrey A Law
  0 siblings, 0 replies; 18+ messages in thread
From: Jeffrey A Law @ 2004-11-17 21:40 UTC (permalink / raw)
  To: gcc-patches; +Cc: gcc-bugzilla

[-- Attachment #1: Type: text/plain, Size: 1695 bytes --]

OK.  This ought to fix 18478, 18521 and any other breakages caused
by the introduction of CASE_LEADER _and_ speed up 15524 considerably.

As Kazu determined, sharing nodes within the case vector has some
rather unpleasant effects and IMHO fixing them was going to create
an unmaintainable mess in the tree copying code.

This patch kills the concept of a CASE_LEADER; however, it introduces
an edge to cases hash table which allows us to map from an edge to
the CASE_LABEL_EXPRs which use that edge.

Because we do not have all the hooks in place to be notified of
CFG changes, the hash table is not persistent.  Instead we create it
on-demand in response to requests to redirect jumps out of SWITCH_EXPRs.

The code is designed so that it will work with or without the hash
table.  That way we don't have to worry about correctness issues
with passes that may redirect edges without creating the hash table.
[ Such passes will simply run slower if they redirect a lot of
  edges out of SWITCH_EXPRs.  ]
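
The lifetime of the table is easiest to see in cleanup_tree_cfg
(condensed from the patch below):

  start_recording_case_labels ();  /* Build/enable the edge-to-cases map.  */
  retval |= thread_jumps ();       /* Redirections take the fast path.  */
  end_recording_case_labels ();    /* Tear it down, clearing TREE_CHAINs.  */

and each redirection degrades gracefully when the table is absent:

  tree cases = get_cases_for_edge (e, stmt);
  if (cases)
    {
      /* Fast path: update just the chained CASE_LABEL_EXPRs.  */
    }
  else
    {
      /* Slow path: walk the entire SWITCH_LABELS vector as before.  */
    }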

The effect on 15524 is dramatic.  Before this change:

tree CFG cleanup      :  15.14 (38%) usr   0.00 ( 0%) sys  15.16 (37%)
tree split crit edges :   7.50 (19%) usr   0.00 ( 0%) sys   7.50 (18%)
TOTAL                 :  40.27             0.80            41.09


After this change:


tree CFG cleanup      :   0.46 ( 3%) usr   0.00 ( 0%) sys   0.47 ( 2%)
tree split crit edges :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%)
TOTAL                 :  18.35             0.81            19.18


FWIW, if we could have a persistent hash table which was built during
CFG construction, the time for CFG cleanup would be effectively zero.

Bootstrapped and regression tested on i686-pc-linux-gnu.


[-- Attachment #2: PPP --]
[-- Type: text/plain, Size: 18013 bytes --]

	* tree-cfg.c (edge_to_cases): Renamed from edge_to_case_leader.
	(edge_to_cases_elt): Renamed from edge_to_case_leader_elt.
	(edge_to_cases_hash): Renamed from edge_to_case_leader_hash.
	(edge_to_cases_eq): Renamed from edge_to_case_leader_eq.
	(edge_to_cases_cleanup, recording_case_labels_p): New functions.
	(get_cases_for_edge): New function.
	(start_recording_case_labels, end_recording_case_labels): Similarly.
	(record_switch_edge): Don't muck with the CASE_LABEL.  Instead
	chain equivalent CASE_LABEL_EXPRs together.
	(get_case_leader_for_edge, get_case_leader_for_edge_hash): Kill.
	(make_switch_expr_edges): Do not record edge/cases here.
	(cleanup_tree_cfg): Record cases around the call to thread_jumps.
	(split_critical_edges): Record cases around the edge splitting code.
	(cleanup_dead_labels): Use CASE_LABEL again.
	(tree_redirect_edge_and_branch): If we have a mapping from edge
	to cases, use it to handle redirections.  Else do it the slow way.
	* tree.h (CASE_LEADER_OR_LABEL): Kill.
	(CASE_LABEL): Revert to just looking at the tree's second operand.
	* tree.c (get_case_label): Kill.

Index: tree-cfg.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-cfg.c,v
retrieving revision 2.108
diff -c -p -r2.108 tree-cfg.c
*** tree-cfg.c	14 Nov 2004 04:08:04 -0000	2.108
--- tree-cfg.c	17 Nov 2004 19:09:50 -0000
*************** static const int initial_cfg_capacity = 
*** 58,86 ****
     building of the CFG in code with lots of gotos.  */
  static GTY(()) varray_type label_to_block_map;
  
! /* This hash table allows us to efficiently lookup the one and only one
!    CASE_LABEL_EXPR which contains the LABEL_DECL for the target block
!    of one or more case statements.  Efficient access to this node
!    allows us to efficiently update the case vector in response to
!    edge redirections and similar operations. 
! 
!    Right now this is only used to set up case label leaders.  In the
!    future we hope to make this table more persistent and use it to
!    more efficiently update case labels.  */
  
! struct edge_to_case_leader_elt
  {
    /* The edge itself.  Necessary for hashing and equality tests.  */
    edge e;
  
!   /* The "leader" for all the CASE_LABEL_EXPRs which transfer control
!      to E->dest.  When we change the destination of E, we will need to
!      update the CASE_LEADER_OR_LABEL of this CASE_LABEL_EXPR (and no
!      others).  */
!   tree case_label;
  };
  
! static htab_t edge_to_case_leader;
  
  /* CFG statistics.  */
  struct cfg_stats_d
--- 58,89 ----
     building of the CFG in code with lots of gotos.  */
  static GTY(()) varray_type label_to_block_map;
  
! /* This hash table allows us to efficiently lookup all CASE_LABEL_EXPRs
!    which use a particular edge.  The CASE_LABEL_EXPRs are chained together
!    via their TREE_CHAIN field, which we clear after we're done with the
!    hash table to prevent problems with duplication of SWITCH_EXPRs.
! 
!    Access to this list of CASE_LABEL_EXPRs allows us to efficiently
!    update the case vector in response to edge redirections.
! 
!    Right now this table is set up and torn down at key points in the
!    compilation process.  It would be nice if we could make the table
!    more persistent.  The key is getting notification of changes to
!    the CFG (particularly edge removal, creation and redirection).  */
  
! struct edge_to_cases_elt
  {
    /* The edge itself.  Necessary for hashing and equality tests.  */
    edge e;
  
!   /* The case labels associated with this edge.  We link these up via
!      their TREE_CHAIN field, then we wipe out the TREE_CHAIN fields
!      when we destroy the hash table.  This prevents problems when copying
!      SWITCH_EXPRs.  */
!   tree case_labels;
  };
  
! static htab_t edge_to_cases;
  
  /* CFG statistics.  */
  struct cfg_stats_d
*************** make_cond_expr_edges (basic_block bb)
*** 601,644 ****
    make_edge (bb, else_bb, EDGE_FALSE_VALUE);
  }
  
! /* Hashing routine for EDGE_TO_CASE_LEADER.  */
  
  static hashval_t
! edge_to_case_leader_hash (const void *p)
  {
!   edge e = ((struct edge_to_case_leader_elt *)p)->e;
  
    /* Hash on the edge itself (which is a pointer).  */
    return htab_hash_pointer (e);
  }
  
! /* Equality routine for EDGE_TO_CASE_LEADER, edges are unique, so testing
     for equality is just a pointer comparison.  */
  
  static int
! edge_to_case_leader_eq (const void *p1, const void *p2)
  {
!   edge e1 = ((struct edge_to_case_leader_elt *)p1)->e;
!   edge e2 = ((struct edge_to_case_leader_elt *)p2)->e;
  
    return e1 == e2;
  }
  
  /* Record that CASE_LABEL (a CASE_LABEL_EXPR) references edge E.  */
  
  static void
  record_switch_edge (edge e, tree case_label)
  {
!   struct edge_to_case_leader_elt *elt;
    void **slot;
  
    /* Build a hash table element so we can see if E is already
       in the table.  */
!   elt = xmalloc (sizeof (struct edge_to_case_leader_elt));
    elt->e = e;
!   elt->case_label = case_label;
  
!   slot = htab_find_slot (edge_to_case_leader, elt, INSERT);
  
    if (*slot == NULL)
      {
--- 604,698 ----
    make_edge (bb, else_bb, EDGE_FALSE_VALUE);
  }
  
! /* Hashing routine for EDGE_TO_CASES.  */
  
  static hashval_t
! edge_to_cases_hash (const void *p)
  {
!   edge e = ((struct edge_to_cases_elt *)p)->e;
  
    /* Hash on the edge itself (which is a pointer).  */
    return htab_hash_pointer (e);
  }
  
! /* Equality routine for EDGE_TO_CASES, edges are unique, so testing
     for equality is just a pointer comparison.  */
  
  static int
! edge_to_cases_eq (const void *p1, const void *p2)
  {
!   edge e1 = ((struct edge_to_cases_elt *)p1)->e;
!   edge e2 = ((struct edge_to_cases_elt *)p2)->e;
  
    return e1 == e2;
  }
  
+ /* Called for each element in the hash table (P) as we delete the
+    edge to cases hash table.
+ 
+    Clear all the TREE_CHAINs to prevent problems with copying of 
+    SWITCH_EXPRs and structure sharing rules, then free the hash table
+    element.  */
+ 
+ static void
+ edge_to_cases_cleanup (void *p)
+ {
+   struct edge_to_cases_elt *elt = p;
+   tree t, next;
+ 
+   for (t = elt->case_labels; t; t = next)
+     {
+       next = TREE_CHAIN (t);
+       TREE_CHAIN (t) = NULL;
+     }
+   free (p);
+ }
+ 
+ /* Start recording information mapping edges to case labels.  */
+ 
+ static void
+ start_recording_case_labels (void)
+ {
+   gcc_assert (edge_to_cases == NULL);
+ 
+   edge_to_cases = htab_create (37,
+ 			       edge_to_cases_hash,
+ 			       edge_to_cases_eq,
+ 			       edge_to_cases_cleanup);
+ }
+ 
+ /* Return nonzero if we are recording information for case labels.  */
+ 
+ static bool
+ recording_case_labels_p (void)
+ {
+   return (edge_to_cases != NULL);
+ }
+ 
+ /* Stop recording information mapping edges to case labels and
+    remove any information we have recorded.  */
+ static void
+ end_recording_case_labels (void)
+ {
+   htab_delete (edge_to_cases);
+   edge_to_cases = NULL;
+ }
+ 
  /* Record that CASE_LABEL (a CASE_LABEL_EXPR) references edge E.  */
  
  static void
  record_switch_edge (edge e, tree case_label)
  {
!   struct edge_to_cases_elt *elt;
    void **slot;
  
    /* Build a hash table element so we can see if E is already
       in the table.  */
!   elt = xmalloc (sizeof (struct edge_to_cases_elt));
    elt->e = e;
!   elt->case_labels = case_label;
  
!   slot = htab_find_slot (edge_to_cases, elt, INSERT);
  
    if (*slot == NULL)
      {
*************** record_switch_edge (edge e, tree case_la
*** 652,721 ****
        free (elt);
  
        /* Get the entry stored in the hash table.  */
!       elt = (struct edge_to_case_leader_elt *) *slot;
  
!       /* Make ELT->case_label the leader for CASE_LABEL.  */
!       CASE_LEADER_OR_LABEL (case_label) = elt->case_label;
      }
  }
  
! /* Subroutine of get_case_leader_for_edge; returns the case leader for the
!    chain of CASE_LABEL_EXPRs associated with E using a hash table lookup.  */
  
  static tree
! get_case_leader_for_edge_hash (edge e)
  {
!   struct edge_to_case_leader_elt elt, *elt_p;
    void **slot;
  
    elt.e = e;
!   elt.case_label = NULL;
!   slot = htab_find_slot (edge_to_case_leader, &elt, NO_INSERT);
  
    if (slot)
      {
!       tree t;
! 
!       elt_p = (struct edge_to_case_leader_elt *)*slot;
!       t = elt_p->case_label;
! 
!       while (TREE_CODE (CASE_LEADER_OR_LABEL (t)) == CASE_LABEL_EXPR)
! 	t = CASE_LEADER_OR_LABEL (t);
!       return t;
      }
  
!   return NULL;
! }
! 
! /* Given an edge E, return the case leader for the chain of CASE_LABEL_EXPRs
!    which use E.  */
! 
! static tree
! get_case_leader_for_edge (edge e)
! {
!   tree vec, stmt;
!   size_t i, n;
  
!   /* If we have a hash table, then use it as it's significantly faster.  */
!   if (edge_to_case_leader)
!     return get_case_leader_for_edge_hash (e);
! 
!   /* No hash table.  We have to walk the case vector.  */
!   stmt = bsi_stmt (bsi_last (e->src));
!   vec = SWITCH_LABELS (stmt);
    n = TREE_VEC_LENGTH (vec);
- 
    for (i = 0; i < n; i++)
      {
!       tree elt = TREE_VEC_ELT (vec, i);
!       tree t = CASE_LEADER_OR_LABEL (elt);
! 
!       if (TREE_CODE (t) == LABEL_DECL
! 	  && label_to_block (t) == e->dest)
! 	return elt;
      }
! 
!   abort ();
  }
  
  /* Create the edges for a SWITCH_EXPR starting at block BB.
--- 706,761 ----
        free (elt);
  
        /* Get the entry stored in the hash table.  */
!       elt = (struct edge_to_cases_elt *) *slot;
  
!       /* Add it to the chain of CASE_LABEL_EXPRs referencing E.  */
!       TREE_CHAIN (case_label) = elt->case_labels;
!       elt->case_labels = case_label;
      }
  }
  
! /* If we are inside a {start,end}_recording_cases block, then return
!    a chain of CASE_LABEL_EXPRs from T which reference E.
! 
!    Otherwise return NULL.  */
  
  static tree
! get_cases_for_edge (edge e, tree t)
  {
!   struct edge_to_cases_elt elt, *elt_p;
    void **slot;
+   size_t i, n;
+   tree vec;
  
+   /* If we are not recording cases, then we do not have CASE_LABEL_EXPR
+      chains available.  Return NULL so the caller can detect this case.  */
+   if (!recording_case_labels_p ())
+     return NULL;
+   
+ restart:
    elt.e = e;
!   elt.case_labels = NULL;
!   slot = htab_find_slot (edge_to_cases, &elt, NO_INSERT);
  
    if (slot)
      {
!       elt_p = (struct edge_to_cases_elt *)*slot;
!       return elt_p->case_labels;
      }
  
!   /* If we did not find E in the hash table, then this must be the first
!      time we have been queried for information about E & T.  Add all the
!      elements from T to the hash table then perform the query again.  */
  
!   vec = SWITCH_LABELS (t);
    n = TREE_VEC_LENGTH (vec);
    for (i = 0; i < n; i++)
      {
!       tree lab = CASE_LABEL (TREE_VEC_ELT (vec, i));
!       basic_block label_bb = label_to_block (lab);
!       record_switch_edge (find_edge (e->src, label_bb), TREE_VEC_ELT (vec, i));
      }
!   goto restart;
  }
  
  /* Create the edges for a SWITCH_EXPR starting at block BB.
*************** make_switch_expr_edges (basic_block bb)
*** 732,753 ****
    vec = SWITCH_LABELS (entry);
    n = TREE_VEC_LENGTH (vec);
  
-   edge_to_case_leader
-     = htab_create (n, edge_to_case_leader_hash, edge_to_case_leader_eq, free);
- 
    for (i = 0; i < n; ++i)
      {
        tree lab = CASE_LABEL (TREE_VEC_ELT (vec, i));
        basic_block label_bb = label_to_block (lab);
!       edge e = make_edge (bb, label_bb, 0);
! 
!       if (!e)
! 	e = find_edge (bb, label_bb);
! 
!       record_switch_edge (e, TREE_VEC_ELT (vec, i));
      }
-   htab_delete (edge_to_case_leader);
-   edge_to_case_leader = NULL;
  }
  
  
--- 772,783 ----
    vec = SWITCH_LABELS (entry);
    n = TREE_VEC_LENGTH (vec);
  
    for (i = 0; i < n; ++i)
      {
        tree lab = CASE_LABEL (TREE_VEC_ELT (vec, i));
        basic_block label_bb = label_to_block (lab);
!       make_edge (bb, label_bb, 0);
      }
  }
  
  
*************** cleanup_tree_cfg (void)
*** 869,875 ****
--- 899,911 ----
  
    retval = cleanup_control_flow ();
    retval |= delete_unreachable_blocks ();
+ 
+   /* thread_jumps can redirect edges out of SWITCH_EXPRs, which can get
+      expensive.  So we want to enable recording of edge to CASE_LABEL_EXPR
+      mappings around the call to thread_jumps.  */
+   start_recording_case_labels ();
    retval |= thread_jumps ();
+   end_recording_case_labels ();
  
  #ifdef ENABLE_CHECKING
    if (retval)
*************** cleanup_dead_labels (void)
*** 1019,1025 ****
  	      {
  		tree elt = TREE_VEC_ELT (vec, i);
  		tree label = main_block_label (CASE_LABEL (elt));
! 		CASE_LEADER_OR_LABEL (elt) = label;
  	      }
  	    break;
  	  }
--- 1055,1061 ----
  	      {
  		tree elt = TREE_VEC_ELT (vec, i);
  		tree label = main_block_label (CASE_LABEL (elt));
! 		CASE_LABEL (elt) = label;
  	      }
  	    break;
  	  }
*************** tree_redirect_edge_and_branch (edge e, b
*** 4290,4319 ****
  
      case SWITCH_EXPR:
        {
! 	edge e2;
! 
!         /* We need to update the LABEL_DECL in the switch vector to
! 	   reflect the edge redirection.
  
! 	   There is precisely one CASE_LABEL_EXPR in the switch vector
! 	   which needs updating.  Either its label needs to be updated
! 	   or it needs to be directed to a new case leader.  */
! 	e2 = find_edge (e->src, dest);
! 	if (e2)
  	  {
! 	    /* In this case we need to change the case leader for the
! 	       current leader of E to be the case leader for E2.  */
! 	    tree e_leader = get_case_leader_for_edge (e);
! 	    tree e2_leader = get_case_leader_for_edge (e2);
! 	    CASE_LEADER_OR_LABEL (e_leader) = e2_leader;
  	  }
  	else
  	  {
! 	    /* No edge exists from E->src to DEST, so we will simply
! 	       change E->dest.  The case leader does not change, but
! 	       the LABEL_DECL for the leader does change.  */
! 	    CASE_LEADER_OR_LABEL (get_case_leader_for_edge (e)) = label;
  	  }
  	break;
        }
  
--- 4326,4372 ----
  
      case SWITCH_EXPR:
        {
!         tree cases = get_cases_for_edge (e, stmt);
! 	edge e2 = find_edge (e->src, dest);
  
! 	/* If we have a list of cases associated with E, then use it
! 	   as it's a lot faster than walking the entire case vector.  */
! 	if (cases)
  	  {
! 	    tree last, first;
! 
! 	    first = cases;
! 	    while (cases)
! 	      {
! 		last = cases;
! 		CASE_LABEL (cases) = label;
! 		cases = TREE_CHAIN (cases);
! 	      }
! 
! 	    /* If there was already an edge in the CFG, then we need
! 	       to move all the cases associated with E to E2.  */
! 	    if (e2)
! 	      {
! 		tree cases2 = get_cases_for_edge (e2, stmt);
! 
! 		TREE_CHAIN (last) = TREE_CHAIN (cases2);
! 		TREE_CHAIN (cases2) = first;
! 	      }
  	  }
  	else
  	  {
! 	    tree vec = SWITCH_LABELS (stmt);
! 	    size_t i, n = TREE_VEC_LENGTH (vec);
! 
! 	    for (i = 0; i < n; i++)
! 	      {
! 		tree elt = TREE_VEC_ELT (vec, i);
! 
! 		if (label_to_block (CASE_LABEL (elt)) == e->dest)
! 		  CASE_LABEL (elt) = label;
! 	      }
  	  }
+ 
  	break;
        }
  
*************** split_critical_edges (void)
*** 5329,5334 ****
--- 5382,5391 ----
    edge e;
    edge_iterator ei;
  
+   /* split_edge can redirect edges out of SWITCH_EXPRs, which can get
+      expensive.  So we want to enable recording of edge to CASE_LABEL_EXPR
+      mappings around the calls to split_edge.  */
+   start_recording_case_labels ();
    FOR_ALL_BB (bb)
      {
        FOR_EACH_EDGE (e, ei, bb->succs)
*************** split_critical_edges (void)
*** 5337,5342 ****
--- 5394,5400 ----
  	    split_edge (e);
  	  }
      }
+   end_recording_case_labels ();
  }
  
  struct tree_opt_pass pass_split_crit_edges = 
Index: tree.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree.c,v
retrieving revision 1.448
diff -c -p -r1.448 tree.c
*** tree.c	15 Nov 2004 00:18:33 -0000	1.448
--- tree.c	17 Nov 2004 19:09:53 -0000
*************** signed_type_for (tree type)
*** 6061,6079 ****
    return lang_hooks.types.signed_type (type);
  }
  
- /* Return the LABEL_DECL associated with T, which must be a 
-    CASE_LABEL_EXPR.  This will walk through any CASE_LABEL_EXPRs
-    appearing in operand 2 until it finds a CASE_LABEL_EXPR with
-    a LABEL_DECL in operand 2.  */
- 
- tree
- get_case_label (tree t)
- {
-   while (TREE_CODE (CASE_LEADER_OR_LABEL (t)) == CASE_LABEL_EXPR)
-     t = CASE_LEADER_OR_LABEL (t);
-   return CASE_LEADER_OR_LABEL (t);
- }
- 
  /* Returns the largest value obtainable by casting something in INNER type to
     OUTER type.  */
  
--- 6061,6066 ----
Index: tree.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree.h,v
retrieving revision 1.655
diff -c -p -r1.655 tree.h
*** tree.h	15 Nov 2004 00:18:33 -0000	1.655
--- tree.h	17 Nov 2004 19:09:54 -0000
*************** struct tree_vec GTY(())
*** 1231,1245 ****
     of a case label, respectively.  */
  #define CASE_LOW(NODE)          	TREE_OPERAND ((NODE), 0)
  #define CASE_HIGH(NODE)         	TREE_OPERAND ((NODE), 1)
! 
! /* Operand 2 has two uses, it may either be a LABEL_DECL node or
!    another CASE_LABEL_EXPR node.  This accessor gets direct access
!    to that operand.  Use it when you want to assign a value to
!    operand 2 or when you want to conditionalize actions based on
!    whether operand 2 is a LABEL_DECL or CASE_LABEL_EXPR.  */
! #define CASE_LEADER_OR_LABEL(NODE)	TREE_OPERAND ((NODE), 2)
! 
! #define CASE_LABEL(NODE) get_case_label (NODE)
  
  /* The operands of a BIND_EXPR.  */
  #define BIND_EXPR_VARS(NODE) (TREE_OPERAND (BIND_EXPR_CHECK (NODE), 0))
--- 1231,1237 ----
     of a case label, respectively.  */
  #define CASE_LOW(NODE)          	TREE_OPERAND ((NODE), 0)
  #define CASE_HIGH(NODE)         	TREE_OPERAND ((NODE), 1)
! #define CASE_LABEL(NODE)		TREE_OPERAND ((NODE), 2)
  
  /* The operands of a BIND_EXPR.  */
  #define BIND_EXPR_VARS(NODE) (TREE_OPERAND (BIND_EXPR_CHECK (NODE), 0))

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug tree-optimization/15524] [4.0 Regression] jump threading on trees is slow with switch statements with large # of cases
@ 2004-11-13  5:24 Jeffrey A Law
  0 siblings, 0 replies; 18+ messages in thread
From: Jeffrey A Law @ 2004-11-13  5:24 UTC (permalink / raw)
  To: gcc-patches; +Cc: gcc-bugzilla

[-- Attachment #1: Type: text/plain, Size: 1801 bytes --]


And the saga continues....

This patch introduces the concept of a case leader to CASE_LABEL_EXPR.

As Kazu outlined, the case leader is the "representative value" for
the partition of cases which reach the same destination block.   The
case leader is the only CASE_LABEL_EXPR which contains a LABEL_DECL;
other members of the partition point to the case leader.
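
In code form, looking up the label for an arbitrary CASE_LABEL_EXPR
is just a walk down the leader chain.  This sketch mirrors the new
get_case_label in the tree.c hunk below:

  /* Chase the leader chain: a non-leader stores another
     CASE_LABEL_EXPR in operand 2; only the leader stores the
     actual LABEL_DECL.  */
  while (TREE_CODE (CASE_LEADER_OR_LABEL (t)) == CASE_LABEL_EXPR)
    t = CASE_LEADER_OR_LABEL (t);
  return CASE_LEADER_OR_LABEL (t);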

We convert to the case leader representation as we create edges for
each SWITCH_EXPR.

In this iteration of the change we still have to do a search of the
case vector to find the case leader when we redirect an edge out of
the SWITCH_EXPR.  A future version is expected to use a hash table,
but some kinks in that code still need to be worked out.

Even though we're still doing a search of the case vector, this
patch represents a big improvement for 15524.  Instead of having
to search the entire case vector when we redirect a jump, we
(on average) search only half the case vector to find the leader.


To give you an idea of how this affects compile time, here are the
critical timevars before this patch:
 tree CFG cleanup      :  28.20 (49%) usr   0.06 ( 9%) sys  29.81 (48%) 
 tree split crit edges :  14.55 (25%) usr   0.02 ( 3%) sys  14.83 (24%)
 TOTAL                 :  57.76             0.70            62.18


After this patch:
 tree CFG cleanup      :  12.29 (36%) usr   0.01 ( 2%) sys  12.35 (34%)
 tree split crit edges :   5.91 (17%) usr   0.00 ( 0%) sys   5.92 (16%)
 TOTAL                 :  33.89             0.57            36.65

That's a pretty good improvement.

FWIW, a version which uses a persistent hash table for
edge-to-case-leader lookups brings the CFG cleanup and critical edge
splitting phases down into the noise with a total time of < 18
seconds.

Bootstrapped and regression tested on i686-pc-linux-gnu.



[-- Attachment #2: PPP --]
[-- Type: text/plain, Size: 12083 bytes --]

	* tree-cfg.c (hashtab.h): Include.
	(struct edge_to_case_leader_elt): New structure.
	(edge_to_case_leader): New.
	(edge_to_case_leader_hash): New hashtable hashing function.
	(edge_to_case_leader_eq): New hashtable equality function.
	(record_switch_edge): New function.
	(get_case_leader_for_edge_hash, get_case_leader_for_edge): New functions.
	(make_switch_expr_edges): Build the edge-to-case-leader
	hash table.  Tear down the hash table when we're done.
	(cleanup_dead_labels): Use CASE_LEADER_OR_LABEL instead of
	CASE_LABEL.
	(tree_node_can_be_shared): Allow sharing of CASE_LABEL_EXPR nodes.
	(tree_redirect_edge_and_branch, case SWITCH_EXPR): Update
	to use new concept of case leaders to reduce overhead of
	redirecting outgoing edges from switch statements.
	* tree.c (get_case_label): New function.
	* tree.h (CASE_LABEL): Define in terms of get_case_label.
	(CASE_LEADER_OR_LABEL): Define.
Index: tree-cfg.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-cfg.c,v
retrieving revision 2.105
diff -c -p -r2.105 tree-cfg.c
*** tree-cfg.c	9 Nov 2004 19:33:58 -0000	2.105
--- tree-cfg.c	13 Nov 2004 04:00:32 -0000
*************** Boston, MA 02111-1307, USA.  */
*** 44,49 ****
--- 44,50 ----
  #include "except.h"
  #include "cfgloop.h"
  #include "cfglayout.h"
+ #include "hashtab.h"
  
  /* This file contains functions for building the Control Flow Graph (CFG)
     for a function tree.  */
*************** static const int initial_cfg_capacity = 
*** 57,62 ****
--- 58,87 ----
     building of the CFG in code with lots of gotos.  */
  static GTY(()) varray_type label_to_block_map;
  
+ /* This hash table allows us to efficiently lookup the one and only one
+    CASE_LABEL_EXPR which contains the LABEL_DECL for the target block
+    of one or more case statements.  Efficient access to this node
+    allows us to efficiently update the case vector in response to
+    edge redirections and similar operations. 
+ 
+    Right now this is only used to set up case label leaders.  In the
+    future we hope to make this table more persistent and use it to
+    more efficiently update case labels.  */
+ 
+ struct edge_to_case_leader_elt
+ {
+   /* The edge itself.  Necessary for hashing and equality tests.  */
+   edge e;
+ 
+   /* The "leader" for all the CASE_LABEL_EXPRs which transfer control
+      to E->dest.  When we change the destination of E, we will need to
+      update the CASE_LEADER_OR_LABEL of this CASE_LABEL_EXPR (and no
+      others).  */
+   tree case_label;
+ };
+ 
+ static htab_t edge_to_case_leader;
+ 
  /* CFG statistics.  */
  struct cfg_stats_d
  {
*************** make_cond_expr_edges (basic_block bb)
*** 576,581 ****
--- 601,722 ----
    make_edge (bb, else_bb, EDGE_FALSE_VALUE);
  }
  
+ /* Hashing routine for EDGE_TO_CASE_LEADER.  */
+ 
+ static hashval_t
+ edge_to_case_leader_hash (const void *p)
+ {
+   edge e = ((struct edge_to_case_leader_elt *)p)->e;
+ 
+   /* Hash on the edge itself (which is a pointer).  */
+   return htab_hash_pointer (e);
+ }
+ 
+ /* Equality routine for EDGE_TO_CASE_LEADER, edges are unique, so testing
+    for equality is just a pointer comparison.  */
+ 
+ static int
+ edge_to_case_leader_eq (const void *p1, const void *p2)
+ {
+   edge e1 = ((struct edge_to_case_leader_elt *)p1)->e;
+   edge e2 = ((struct edge_to_case_leader_elt *)p2)->e;
+ 
+   return e1 == e2;
+ }
+ 
+ /* Record that CASE_LABEL (a CASE_LABEL_EXPR) references edge E.  */
+ 
+ static void
+ record_switch_edge (edge e, tree case_label)
+ {
+   struct edge_to_case_leader_elt *elt;
+   void **slot;
+ 
+   /* Build a hash table element so we can see if E is already
+      in the table.  */
+   elt = xmalloc (sizeof (struct edge_to_case_leader_elt));
+   elt->e = e;
+   elt->case_label = case_label;
+ 
+   slot = htab_find_slot (edge_to_case_leader, elt, INSERT);
+ 
+   if (*slot == NULL)
+     {
+       /* E was not in the hash table.  Install E into the hash table.  */
+       *slot = (void *)elt;
+     }
+   else
+     {
+       /* E was already in the hash table.  Free ELT as we do not need it
+ 	 anymore.  */
+       free (elt);
+ 
+       /* Get the entry stored in the hash table.  */
+       elt = (struct edge_to_case_leader_elt *) *slot;
+ 
+       /* Make ELT->case_label the leader for CASE_LABEL.  */
+       CASE_LEADER_OR_LABEL (case_label) = elt->case_label;
+     }
+ }
+ 
+ /* Subroutine of get_case_leader_for_edge; returns the case leader for the
+    chain of CASE_LABEL_EXPRs associated with E using a hash table lookup.  */
+ 
+ static tree
+ get_case_leader_for_edge_hash (edge e)
+ {
+   struct edge_to_case_leader_elt elt, *elt_p;
+   void **slot;
+ 
+   elt.e = e;
+   elt.case_label = NULL;
+   slot = htab_find_slot (edge_to_case_leader, &elt, NO_INSERT);
+ 
+   if (slot)
+     {
+       tree t;
+ 
+       elt_p = (struct edge_to_case_leader_elt *)*slot;
+       t = elt_p->case_label;
+ 
+       while (TREE_CODE (CASE_LEADER_OR_LABEL (t)) == CASE_LABEL_EXPR)
+ 	t = CASE_LEADER_OR_LABEL (t);
+       return t;
+     }
+ 
+   return NULL;
+ }
+ 
+ /* Given an edge E, return the case leader for the chain of CASE_LABEL_EXPRs
+    which use E.  */
+ 
+ static tree
+ get_case_leader_for_edge (edge e)
+ {
+   tree vec, stmt;
+   size_t i, n;
+ 
+   /* If we have a hash table, then use it as it's significantly faster.  */
+   if (edge_to_case_leader)
+     return get_case_leader_for_edge_hash (e);
+ 
+   /* No hash table.  We have to walk the case vector.  */
+   stmt = bsi_stmt (bsi_last (e->src));
+   vec = SWITCH_LABELS (stmt);
+   n = TREE_VEC_LENGTH (vec);
+ 
+   for (i = 0; i < n; i++)
+     {
+       tree elt = TREE_VEC_ELT (vec, i);
+       tree t = CASE_LEADER_OR_LABEL (elt);
+ 
+       if (TREE_CODE (t) == LABEL_DECL
+ 	  && label_to_block (t) == e->dest)
+ 	return elt;
+     }
+ 
+   abort ();
+ }
  
  /* Create the edges for a SWITCH_EXPR starting at block BB.
     At this point, the switch body has been lowered and the
*************** make_switch_expr_edges (basic_block bb)
*** 591,602 ****
    vec = SWITCH_LABELS (entry);
    n = TREE_VEC_LENGTH (vec);
  
    for (i = 0; i < n; ++i)
      {
        tree lab = CASE_LABEL (TREE_VEC_ELT (vec, i));
        basic_block label_bb = label_to_block (lab);
!       make_edge (bb, label_bb, 0);
      }
  }
  
  
--- 732,753 ----
    vec = SWITCH_LABELS (entry);
    n = TREE_VEC_LENGTH (vec);
  
+   edge_to_case_leader
+     = htab_create (n, edge_to_case_leader_hash, edge_to_case_leader_eq, free);
+ 
    for (i = 0; i < n; ++i)
      {
        tree lab = CASE_LABEL (TREE_VEC_ELT (vec, i));
        basic_block label_bb = label_to_block (lab);
!       edge e = make_edge (bb, label_bb, 0);
! 
!       if (!e)
! 	e = find_edge (bb, label_bb);
! 
!       record_switch_edge (e, TREE_VEC_ELT (vec, i));
      }
+   htab_delete (edge_to_case_leader);
+   edge_to_case_leader = NULL;
  }
  
  
*************** cleanup_dead_labels (void)
*** 865,873 ****
    
  	    /* Replace all destination labels.  */
  	    for (i = 0; i < n; ++i)
! 	      CASE_LABEL (TREE_VEC_ELT (vec, i))
! 		= main_block_label (CASE_LABEL (TREE_VEC_ELT (vec, i)));
!   
  	    break;
  	  }
  
--- 1016,1026 ----
    
  	    /* Replace all destination labels.  */
  	    for (i = 0; i < n; ++i)
! 	      {
! 		tree elt = TREE_VEC_ELT (vec, i);
! 		tree label = main_block_label (CASE_LABEL (elt));
! 		CASE_LEADER_OR_LABEL (elt) = label;
! 	      }
  	    break;
  	  }
  
*************** tree_node_can_be_shared (tree t)
*** 3246,3251 ****
--- 3399,3407 ----
        || TREE_CODE (t) == SSA_NAME)
      return true;
  
+   if (TREE_CODE (t) == CASE_LABEL_EXPR)
+     return true;
+ 
    while (((TREE_CODE (t) == ARRAY_REF || TREE_CODE (t) == ARRAY_RANGE_REF)
  	  /* We check for constants explicitly since they are not considered
  	     gimple invariants if they overflowed.  */
*************** tree_redirect_edge_and_branch (edge e, b
*** 4134,4150 ****
  
      case SWITCH_EXPR:
        {
! 	tree vec = SWITCH_LABELS (stmt);
! 	size_t i, n = TREE_VEC_LENGTH (vec);
  
! 	for (i = 0; i < n; ++i)
  	  {
! 	    tree elt = TREE_VEC_ELT (vec, i);
! 	    if (label_to_block (CASE_LABEL (elt)) == e->dest)
! 	      CASE_LABEL (elt) = label;
  	  }
        }
-       break;
  
      case RETURN_EXPR:
        bsi_remove (&bsi);
--- 4290,4321 ----
  
      case SWITCH_EXPR:
        {
! 	edge e2;
! 
!         /* We need to update the LABEL_DECL in the switch vector to
! 	   reflect the edge redirection.
  
! 	   There is precisely one CASE_LABEL_EXPR in the switch vector
! 	   which needs updating.  Either its label needs to be updated
! 	   or it needs to be directed to a new case leader.   */
! 	e2 = find_edge (e->src, dest);
! 	if (e2)
  	  {
! 	    /* In this case we need to change the case leader for the
! 	       current leader of E to be the case leader for E2.   */
! 	    tree e_leader = get_case_leader_for_edge (e);
! 	    tree e2_leader = get_case_leader_for_edge (e2);
! 	    CASE_LEADER_OR_LABEL (e_leader) = e2_leader;
  	  }
+ 	else
+ 	  {
+ 	    /* No edge exists from E->src to DEST, so we will simply
+ 	       change E->dest.  The case leader does not change, but
+ 	       the LABEL_DECL for the leader does change.  */
+ 	    CASE_LEADER_OR_LABEL (get_case_leader_for_edge (e)) = label;
+ 	  }
+ 	break;
        }
  
      case RETURN_EXPR:
        bsi_remove (&bsi);
Index: tree.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree.c,v
retrieving revision 1.445
diff -c -p -r1.445 tree.c
*** tree.c	10 Nov 2004 17:34:39 -0000	1.445
--- tree.c	13 Nov 2004 04:00:34 -0000
*************** signed_type_for (tree type)
*** 6061,6064 ****
--- 6061,6077 ----
    return lang_hooks.types.signed_type (type);
  }
  
+ /* Return the LABEL_DECL associated with T, which must be a 
+    CASE_LABEL_EXPR.  This will walk through any CASE_LABEL_EXPRs
+    appearing in operand 2 until it finds a CASE_LABEL_EXPR with
+    a LABEL_DECL in operand 2.  */
+ 
+ tree
+ get_case_label (tree t)
+ {
+   while (TREE_CODE (CASE_LEADER_OR_LABEL (t)) == CASE_LABEL_EXPR)
+     t = CASE_LEADER_OR_LABEL (t);
+   return CASE_LEADER_OR_LABEL (t);
+ }
+ 
  #include "gt-tree.h"
Index: tree.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree.h,v
retrieving revision 1.651
diff -c -p -r1.651 tree.h
*** tree.h	11 Nov 2004 23:15:48 -0000	1.651
--- tree.h	13 Nov 2004 04:00:36 -0000
*************** struct tree_vec GTY(())
*** 1229,1237 ****
  
  /* CASE_LABEL_EXPR accessors. These give access to the high and low values
     of a case label, respectively.  */
! #define CASE_LOW(NODE)          TREE_OPERAND ((NODE), 0)
! #define CASE_HIGH(NODE)         TREE_OPERAND ((NODE), 1)
! #define CASE_LABEL(NODE)        TREE_OPERAND ((NODE), 2)
  
  /* The operands of a BIND_EXPR.  */
  #define BIND_EXPR_VARS(NODE) (TREE_OPERAND (BIND_EXPR_CHECK (NODE), 0))
--- 1229,1245 ----
  
  /* CASE_LABEL_EXPR accessors. These give access to the high and low values
     of a case label, respectively.  */
! #define CASE_LOW(NODE)          	TREE_OPERAND ((NODE), 0)
! #define CASE_HIGH(NODE)         	TREE_OPERAND ((NODE), 1)
! 
! /* Operand 2 has two uses, it may either be a LABEL_DECL node or
!    another CASE_LABEL_EXPR node.  This accessor gets direct access
!    to that operand.  Use it when you want to assign a value to
!    operand 2 or when you want to conditionalize actions based on
!    whether operand 2 is a LABEL_DECL or CASE_LABEL_EXPR.  */
! #define CASE_LEADER_OR_LABEL(NODE)	TREE_OPERAND ((NODE), 2)
! 
! #define CASE_LABEL(NODE) get_case_label (NODE)
  
  /* The operands of a BIND_EXPR.  */
  #define BIND_EXPR_VARS(NODE) (TREE_OPERAND (BIND_EXPR_CHECK (NODE), 0))
*************** extern void change_decl_assembler_name (
*** 3455,3460 ****
--- 3463,3469 ----
  extern int type_num_arguments (tree);
  extern bool associative_tree_code (enum tree_code);
  extern bool commutative_tree_code (enum tree_code);
+ extern tree get_case_label (tree);
  
  \f
  /* In stmt.c */

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug tree-optimization/15524] [4.0 Regression] jump threading on trees is slow with switch statements with large # of cases
@ 2004-11-10  5:23 Jeffrey A Law
  0 siblings, 0 replies; 18+ messages in thread
From: Jeffrey A Law @ 2004-11-10  5:23 UTC (permalink / raw)
  To: gcc-patches; +Cc: gcc-bugzilla

[-- Attachment #1: Type: text/plain, Size: 2449 bytes --]


Whee, almost another 30% improvement...

In this installment we're attacking the insane amount of time we're
taking to insert fake edges in the CFG to connect infinite loops
to the exit block (which we do 3 times during the course of compiling
this testcase).

The testcase in pr15524 is somewhat unusual in that it has a huge 
number of infinite loops.  In fact, that's about all the testcase
is once jumps are threaded.

I agree with Steven that it is very unlikely that any normal-looking
code will have similar numbers of unreachable loops and a large
enough CFG to trigger the insane compile-time behavior we see
with pr15524.

However, I do think it is worth fixing this problem, particularly
since we're doing unnecessary work anytime we encounter a function
with an infinite loop and it's pretty trivial to avoid that work.

The fundamental problem is after DFS walking the reverse CFG we start
searching blocks from EXIT to ENTRY looking for the first one which
has not been visited.

We perform that search each time the reverse CFG DFS walk stops
(i.e., each time we find an infinite loop), and each time we start
the search at the EXIT block.

For any function with an infinite loop we're wasting compilation
time searching blocks multiple times.  In 15524 we happen to have
enough blocks and enough infinite loops to make the lameness of
our implementation obvious.

Luckily fixing this is quite trivial.  All we have to do is pass
in the unvisited block from the previous search and use that
as the starting point for the loop instead of EXIT_BLOCK.
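
Concretely, the driver loop in connect_infinite_loops_to_exit now
looks like this (a condensed sketch; the full patch is below):

  basic_block unvisited_block = EXIT_BLOCK_PTR;

  while (1)
    {
      /* Resume the unvisited-block scan where the previous search
	 stopped instead of restarting at EXIT_BLOCK_PTR.  */
      unvisited_block = flow_dfs_compute_reverse_execute (&dfs_ds,
							  unvisited_block);
      if (!unvisited_block)
	break;

      /* ... add a fake edge and continue the reverse DFS ...  */
    }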

This basically takes PRE and the branch predictors off the radar for
this testcase.

Before the fix:
 tree PRE              :  12.69 (15%) usr   0.01 ( 1%) sys  12.73 (15%) 
 branch prediction     :  11.36 (14%) usr   0.00 ( 0%) sys  11.37 (13%)
 TOTAL                 :  83.16             0.79            84.45


After the fix:

 tree PRE              :   0.12 ( 0%) usr   0.02 ( 3%) sys   0.14 ( 0%)
 branch prediction     :   0.41 ( 1%) usr   0.01 ( 1%) sys   0.41 ( 1%)
 TOTAL                 :  59.54             0.80            61.29

[ At this point our SWITCH_EXPR issues account for roughly 70% of
  the remaining compilation time.  Yup, that's right, I'd expect
  a reduction from ~60 seconds to around 15-25 seconds depending
  on how well we solve our SWITCH_EXPR issues. ]


Bootstrapped and regression tested on i686-pc-linux-gnu.



[-- Attachment #2: PPP --]
[-- Type: text/plain, Size: 3351 bytes --]

	* cfganal.c (flow_dfs_compute_reverse_execute): Accept new
	argument holding last unvisited block.  Start search for
	unvisited blocks at LAST_UNVISITED rather than EXIT_BLOCK.
	(connect_infinite_loops_to_exit): Supply last unvisited block
	to flow_dfs_compute_reverse_execute.

Index: cfganal.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfganal.c,v
retrieving revision 1.53
diff -c -p -r1.53 cfganal.c
*** cfganal.c	9 Nov 2004 04:21:49 -0000	1.53
--- cfganal.c	10 Nov 2004 00:27:52 -0000
*************** typedef struct depth_first_search_dsS *d
*** 50,56 ****
  static void flow_dfs_compute_reverse_init (depth_first_search_ds);
  static void flow_dfs_compute_reverse_add_bb (depth_first_search_ds,
  					     basic_block);
! static basic_block flow_dfs_compute_reverse_execute (depth_first_search_ds);
  static void flow_dfs_compute_reverse_finish (depth_first_search_ds);
  static bool flow_active_insn_p (rtx);
  \f
--- 50,57 ----
  static void flow_dfs_compute_reverse_init (depth_first_search_ds);
  static void flow_dfs_compute_reverse_add_bb (depth_first_search_ds,
  					     basic_block);
! static basic_block flow_dfs_compute_reverse_execute (depth_first_search_ds,
! 						     basic_block);
  static void flow_dfs_compute_reverse_finish (depth_first_search_ds);
  static bool flow_active_insn_p (rtx);
  \f
*************** add_noreturn_fake_exit_edges (void)
*** 613,619 ****
  void
  connect_infinite_loops_to_exit (void)
  {
!   basic_block unvisited_block;
    struct depth_first_search_dsS dfs_ds;
  
    /* Perform depth-first search in the reverse graph to find nodes
--- 614,620 ----
  void
  connect_infinite_loops_to_exit (void)
  {
!   basic_block unvisited_block = EXIT_BLOCK_PTR;
    struct depth_first_search_dsS dfs_ds;
  
    /* Perform depth-first search in the reverse graph to find nodes
*************** connect_infinite_loops_to_exit (void)
*** 624,630 ****
    /* Repeatedly add fake edges, updating the unreachable nodes.  */
    while (1)
      {
!       unvisited_block = flow_dfs_compute_reverse_execute (&dfs_ds);
        if (!unvisited_block)
  	break;
  
--- 625,632 ----
    /* Repeatedly add fake edges, updating the unreachable nodes.  */
    while (1)
      {
!       unvisited_block = flow_dfs_compute_reverse_execute (&dfs_ds,
! 							  unvisited_block);
        if (!unvisited_block)
  	break;
  
*************** flow_dfs_compute_reverse_add_bb (depth_f
*** 847,853 ****
     available.  */
  
  static basic_block
! flow_dfs_compute_reverse_execute (depth_first_search_ds data)
  {
    basic_block bb;
    edge e;
--- 849,856 ----
     available.  */
  
  static basic_block
! flow_dfs_compute_reverse_execute (depth_first_search_ds data,
! 				  basic_block last_unvisited)
  {
    basic_block bb;
    edge e;
*************** flow_dfs_compute_reverse_execute (depth_
*** 865,871 ****
      }
  
    /* Determine if there are unvisited basic blocks.  */
!   FOR_BB_BETWEEN (bb, EXIT_BLOCK_PTR, NULL, prev_bb)
      if (!TEST_BIT (data->visited_blocks, bb->index - (INVALID_BLOCK + 1)))
        return bb;
  
--- 868,874 ----
      }
  
    /* Determine if there are unvisited basic blocks.  */
!   FOR_BB_BETWEEN (bb, last_unvisited, NULL, prev_bb)
      if (!TEST_BIT (data->visited_blocks, bb->index - (INVALID_BLOCK + 1)))
        return bb;
  

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug tree-optimization/15524] [4.0 Regression] jump threading on trees is slow with switch statements with large # of cases
@ 2004-11-09  5:39 Jeffrey A Law
  0 siblings, 0 replies; 18+ messages in thread
From: Jeffrey A Law @ 2004-11-09  5:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: gcc-bugzilla

[-- Attachment #1: Type: text/plain, Size: 1925 bytes --]

Ho hum.  Another 12% improvement. 

There are two closely related changes in this patch.

First, we can drastically speed up find_edge for code with large
switch statements.  Right now we call find_edge with a src/dst
edge pair.  find_edge walks src->succs to find a successor
with a matching destination block.

That is usually a pretty efficient scheme as most blocks have one
or two successors and thus the search terminates quickly.

However, when handling large switch statements that scheme can 
perform poorly.  Instead it is sometimes better to walk dst->preds
searching for the right src/dst pair.

Previously we had no good way to determine which list to search
through.  However, with the recent work on our pred/succ list
representations we can trivially check which of the two
important lists is shorter.  We (of course) search the shorter list :-)

redirect_edge_succ_nodup actually implements find_edge inline and thus
it does not perform well on blocks with lots of successors.  Clearly
this code can (and does) benefit from calling the significantly 
improved find_edge.
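
The heart of the find_edge change is simply choosing the shorter of
the two adjacency lists (a sketch, condensed from the patch below):

  edge e;
  edge_iterator ei;

  if (EDGE_COUNT (pred->succs) <= EDGE_COUNT (succ->preds))
    {
      /* Few successors: scan PRED's successor list.  */
      FOR_EACH_EDGE (e, ei, pred->succs)
	if (e->dest == succ)
	  return e;
    }
  else
    {
      /* Few predecessors: scan SUCC's predecessor list instead.  */
      FOR_EACH_EDGE (e, ei, succ->preds)
	if (e->src == pred)
	  return e;
    }

  return NULL;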

Key timevars before the change:

 cfg cleanup           :   2.80 ( 3%) usr   0.00 ( 0%) sys   2.78 ( 3%)
 tree CFG cleanup      :  34.23 (36%) usr   0.00 ( 0%) sys  34.26 (35%)
 dominator optimization:   5.11 ( 5%) usr   0.02 ( 2%) sys   5.13 ( 5%)
 tree split crit edges :  17.36 (18%) usr   0.00 ( 0%) sys  17.36 (18%)
 TOTAL                 :  96.21             0.82            97.53


Key timevars after the change:

 cfg cleanup           :   0.62 ( 1%) usr   0.00 ( 0%) sys   0.62 ( 1%)
 tree CFG cleanup      :  28.27 (33%) usr   0.00 ( 0%) sys  28.33 (32%)
 dominator optimization:   3.77 ( 4%) usr   0.01 ( 1%) sys   3.80 ( 4%)
 tree split crit edges :  15.58 (18%) usr   0.00 ( 0%) sys  15.57 (18%)
 TOTAL                 :  84.81             0.81            87.92

Bootstrapped and regression tested on i686-pc-linux-gnu.



[-- Attachment #2: PPP --]
[-- Type: text/plain, Size: 1857 bytes --]

	* cfg.c (redirect_edge_succ_nodup): Use find_edge rather than
	implementing it inline.

	* cfganal.c (find_edge): Search pred->succs or succ->preds,
	whichever is shorter.

Index: cfg.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfg.c,v
retrieving revision 1.72
diff -c -p -r1.72 cfg.c
*** cfg.c	25 Oct 2004 21:48:25 -0000	1.72
--- cfg.c	9 Nov 2004 04:09:12 -0000
*************** edge
*** 428,441 ****
  redirect_edge_succ_nodup (edge e, basic_block new_succ)
  {
    edge s;
-   edge_iterator ei;
- 
-   /* Check whether the edge is already present.  */
-   FOR_EACH_EDGE (s, ei, e->src->succs)
-     if (s->dest == new_succ && s != e)
-       break;
  
!   if (s)
      {
        s->flags |= e->flags;
        s->probability += e->probability;
--- 428,436 ----
  redirect_edge_succ_nodup (edge e, basic_block new_succ)
  {
    edge s;
  
!   s = find_edge (e->src, new_succ);
!   if (s && s != e)
      {
        s->flags |= e->flags;
        s->probability += e->probability;
Index: cfganal.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfganal.c,v
retrieving revision 1.52
diff -c -p -r1.52 cfganal.c
*** cfganal.c	4 Nov 2004 08:40:56 -0000	1.52
--- cfganal.c	9 Nov 2004 04:09:12 -0000
*************** find_edge (basic_block pred, basic_block
*** 478,486 ****
    edge e;
    edge_iterator ei;
  
!   FOR_EACH_EDGE (e, ei, pred->succs)
!     if (e->dest == succ)
!       return e;
  
    return NULL;
  }
--- 478,495 ----
    edge e;
    edge_iterator ei;
  
!   if (EDGE_COUNT (pred->succs) <= EDGE_COUNT (succ->preds))
!     {
!       FOR_EACH_EDGE (e, ei, pred->succs)
! 	if (e->dest == succ)
! 	  return e;
!     }
!   else
!     {
!       FOR_EACH_EDGE (e, ei, succ->preds)
! 	if (e->src == pred)
! 	  return e;
!     }
  
    return NULL;
  }

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug tree-optimization/15524] [4.0 Regression] jump threading on trees is slow with switch statements with large # of cases
@ 2004-11-04  0:22 Jeffrey A Law
  0 siblings, 0 replies; 18+ messages in thread
From: Jeffrey A Law @ 2004-11-04  0:22 UTC (permalink / raw)
  To: gcc-patches; +Cc: gcc-bugzilla

[-- Attachment #1: Type: text/plain, Size: 1345 bytes --]


We've known for a little while that the prediction code compile time
behavior for code like pr15524 is, err, poor at best.


The first problem is that the prediction code, much like the code
to record loop bounds, was walking all the blocks in the function and
performing a member test on them.

To give you an idea of how expensive this style of coding is, here's
the timevar info for pr15524 before my patch:

branch prediction     :  44.17 (35%) usr   0.01 ( 1%) sys  44.28 (32%)

OUCH.  Over 1/3 of our total compilation time is spent in the branch
prediction code.

I changed the code to record the blocks to visit in a bitmap and
walk it with EXECUTE_IF_SET_IN_BITMAP.
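
In sketch form (condensed from the patch below), the setup in
estimate_loops_at_level and the walk in propagate_freq become:

  /* Mark the blocks in LOOP's body that need visiting.  */
  bbs = get_loop_body (loop);
  for (i = 0; i < loop->num_nodes; i++)
    bitmap_set_bit (tovisit, bbs[i]->index);
  free (bbs);

  /* Then walk only the marked blocks.  */
  EXECUTE_IF_SET_IN_BITMAP (tovisit, 0, i, bi)
    {
      /* ... count unvisited predecessors of block I ...  */
    }

After that change we get: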


branch prediction     :  11.23 (12%) usr   0.00 ( 0%) sys  11.25 (12%)

Which is a nice improvement.  However, there's clearly an algorithmic
problem remaining in this code.  If I'm reading the code correctly,
the nodes in the inner loops are processed multiple times (once per
loop nest).  It would seem to me we could
get a nice speedup by aggregating and recording the information from
inner loops, then not walking them again and instead using the
cached information.

Anyway, it's a pretty straightforward change and gives us a nice
speedup on the 15524 code.

Bootstrapped and regression tested on i686-pc-linux-gnu.



[-- Attachment #2: ZZZ --]
[-- Type: text/plain, Size: 8454 bytes --]

	* predict.c (struct block_info_def): Kill "tovisit" field.
	(propagate_freq): Accept new "tovisit" parameter.  Change
	read/write access methods for "tovisit" to check the "tovisit"
	bitmap instead of a bit in block_info_def.
	(estimate_loops_at_level): Allocate "tovisit" bitmap.  Pass
	it to propagate_freq.

Index: predict.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/predict.c,v
retrieving revision 1.130
diff -c -p -r1.130 predict.c
*** predict.c	24 Oct 2004 01:46:38 -0000	1.130
--- predict.c	4 Nov 2004 00:14:34 -0000
*************** static sreal real_zero, real_one, real_a
*** 73,80 ****
  
  static void combine_predictions_for_insn (rtx, basic_block);
  static void dump_prediction (FILE *, enum br_predictor, int, basic_block, int);
! static void estimate_loops_at_level (struct loop *loop);
! static void propagate_freq (struct loop *);
  static void estimate_bb_frequencies (struct loops *);
  static void predict_paths_leading_to (basic_block, int *, enum br_predictor, enum prediction);
  static bool last_basic_block_p (basic_block);
--- 73,80 ----
  
  static void combine_predictions_for_insn (rtx, basic_block);
  static void dump_prediction (FILE *, enum br_predictor, int, basic_block, int);
! static void estimate_loops_at_level (struct loop *, bitmap);
! static void propagate_freq (struct loop *, bitmap);
  static void estimate_bb_frequencies (struct loops *);
  static void predict_paths_leading_to (basic_block, int *, enum br_predictor, enum prediction);
  static bool last_basic_block_p (basic_block);
*************** typedef struct block_info_def
*** 1530,1538 ****
    /* To keep queue of basic blocks to process.  */
    basic_block next;
  
-   /* True if block needs to be visited in propagate_freq.  */
-   unsigned int tovisit:1;
- 
    /* Number of predecessors we need to visit first.  */
    int npredecessors;
  } *block_info;
--- 1530,1535 ----
*************** typedef struct edge_info_def
*** 1555,1585 ****
     Propagate the frequencies for LOOP.  */
  
  static void
! propagate_freq (struct loop *loop)
  {
    basic_block head = loop->header;
    basic_block bb;
    basic_block last;
    edge e;
    basic_block nextbb;
  
    /* For each basic block we need to visit count number of his predecessors
       we need to visit first.  */
!   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
      {
!       if (BLOCK_INFO (bb)->tovisit)
  	{
! 	  edge_iterator ei;
! 	  int count = 0;
  
! 	  FOR_EACH_EDGE (e, ei, bb->preds)
! 	    if (BLOCK_INFO (e->src)->tovisit && !(e->flags & EDGE_DFS_BACK))
! 	      count++;
! 	    else if (BLOCK_INFO (e->src)->tovisit
! 		     && dump_file && !EDGE_INFO (e)->back_edge)
! 	      fprintf (dump_file,
! 		       "Irreducible region hit, ignoring edge to %i->%i\n",
! 		       e->src->index, bb->index);
  	  BLOCK_INFO (bb)->npredecessors = count;
  	}
      }
--- 1552,1594 ----
     Propagate the frequencies for LOOP.  */
  
  static void
! propagate_freq (struct loop *loop, bitmap tovisit)
  {
    basic_block head = loop->header;
    basic_block bb;
    basic_block last;
+   int i;
    edge e;
    basic_block nextbb;
+   bitmap_iterator bi;
  
    /* For each basic block we need to visit count number of his predecessors
       we need to visit first.  */
!   EXECUTE_IF_SET_IN_BITMAP (tovisit, 0, i, bi)
      {
!       edge_iterator ei;
!       int count = 0;
! 
!       /* The outermost "loop" includes the exit block, which we can not
! 	 look up via BASIC_BLOCK.  Detect this and use EXIT_BLOCK_PTR
! 	 directly.  Do the same for the entry block just to be safe.  */
!       if (i == ENTRY_BLOCK)
! 	bb = ENTRY_BLOCK_PTR;
!       else if (i == EXIT_BLOCK)
! 	bb = EXIT_BLOCK_PTR;
!       else
! 	bb = BASIC_BLOCK (i);
! 
!       FOR_EACH_EDGE (e, ei, bb->preds)
  	{
! 	  bool visit = bitmap_bit_p (tovisit, e->src->index);
  
! 	  if (visit && !(e->flags & EDGE_DFS_BACK))
! 	    count++;
! 	  else if (visit && dump_file && !EDGE_INFO (e)->back_edge)
! 	    fprintf (dump_file,
! 		     "Irreducible region hit, ignoring edge to %i->%i\n",
! 		     e->src->index, bb->index);
  	  BLOCK_INFO (bb)->npredecessors = count;
  	}
      }
*************** propagate_freq (struct loop *loop)
*** 1602,1608 ****
  	{
  #ifdef ENABLE_CHECKING
  	  FOR_EACH_EDGE (e, ei, bb->preds)
! 	    if (BLOCK_INFO (e->src)->tovisit && !(e->flags & EDGE_DFS_BACK))
  	      abort ();
  #endif
  
--- 1611,1618 ----
  	{
  #ifdef ENABLE_CHECKING
  	  FOR_EACH_EDGE (e, ei, bb->preds)
! 	    if (bitmap_bit_p (tovisit, e->src->index)
! 		&& !(e->flags & EDGE_DFS_BACK))
  	      abort ();
  #endif
  
*************** propagate_freq (struct loop *loop)
*** 1648,1654 ****
  	    }
  	}
  
!       BLOCK_INFO (bb)->tovisit = 0;
  
        /* Compute back edge frequencies.  */
        FOR_EACH_EDGE (e, ei, bb->succs)
--- 1658,1664 ----
  	    }
  	}
  
!       bitmap_clear_bit (tovisit, bb->index);
  
        /* Compute back edge frequencies.  */
        FOR_EACH_EDGE (e, ei, bb->succs)
*************** propagate_freq (struct loop *loop)
*** 1688,1694 ****
  /* Estimate probabilities of loopback edges in loops at same nest level.  */
  
  static void
! estimate_loops_at_level (struct loop *first_loop)
  {
    struct loop *loop;
  
--- 1698,1704 ----
  /* Estimate probabilities of loopback edges in loops at same nest level.  */
  
  static void
! estimate_loops_at_level (struct loop *first_loop, bitmap tovisit)
  {
    struct loop *loop;
  
*************** estimate_loops_at_level (struct loop *fi
*** 1698,1704 ****
        basic_block *bbs;
        unsigned i;
  
!       estimate_loops_at_level (loop->inner);
  
        /* Do not do this for dummy function loop.  */
        if (EDGE_COUNT (loop->latch->succs) > 0)
--- 1708,1714 ----
        basic_block *bbs;
        unsigned i;
  
!       estimate_loops_at_level (loop->inner, tovisit);
  
        /* Do not do this for dummy function loop.  */
        if (EDGE_COUNT (loop->latch->succs) > 0)
*************** estimate_loops_at_level (struct loop *fi
*** 1710,1718 ****
  
        bbs = get_loop_body (loop);
        for (i = 0; i < loop->num_nodes; i++)
! 	BLOCK_INFO (bbs[i])->tovisit = 1;
        free (bbs);
!       propagate_freq (loop);
      }
  }
  
--- 1720,1728 ----
  
        bbs = get_loop_body (loop);
        for (i = 0; i < loop->num_nodes; i++)
! 	bitmap_set_bit (tovisit, bbs[i]->index);
        free (bbs);
!       propagate_freq (loop, tovisit);
      }
  }
  
*************** estimate_bb_frequencies (struct loops *l
*** 1787,1792 ****
--- 1797,1803 ----
    if (!flag_branch_probabilities || !counts_to_freqs ())
      {
        static int real_values_initialized = 0;
+       bitmap tovisit;
  
        if (!real_values_initialized)
          {
*************** estimate_bb_frequencies (struct loops *l
*** 1805,1810 ****
--- 1816,1822 ----
        EDGE_SUCC (ENTRY_BLOCK_PTR, 0)->probability = REG_BR_PROB_BASE;
  
        /* Set up block info for each basic block.  */
+       tovisit = BITMAP_XMALLOC ();
        alloc_aux_for_blocks (sizeof (struct block_info_def));
        alloc_aux_for_edges (sizeof (struct edge_info_def));
        FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
*************** estimate_bb_frequencies (struct loops *l
*** 1812,1818 ****
  	  edge e;
  	  edge_iterator ei;
  
- 	  BLOCK_INFO (bb)->tovisit = 0;
  	  FOR_EACH_EDGE (e, ei, bb->succs)
  	    {
  	      sreal_init (&EDGE_INFO (e)->back_edge_prob, e->probability, 0);
--- 1824,1829 ----
*************** estimate_bb_frequencies (struct loops *l
*** 1824,1830 ****
  
        /* First compute probabilities locally for each loop from innermost
           to outermost to examine probabilities for back edges.  */
!       estimate_loops_at_level (loops->tree_root);
  
        memcpy (&freq_max, &real_zero, sizeof (real_zero));
        FOR_EACH_BB (bb)
--- 1835,1841 ----
  
        /* First compute probabilities locally for each loop from innermost
           to outermost to examine probabilities for back edges.  */
!       estimate_loops_at_level (loops->tree_root, tovisit);
  
        memcpy (&freq_max, &real_zero, sizeof (real_zero));
        FOR_EACH_BB (bb)
*************** estimate_bb_frequencies (struct loops *l
*** 1843,1848 ****
--- 1854,1860 ----
  
        free_aux_for_blocks ();
        free_aux_for_edges ();
+       BITMAP_XFREE (tovisit);
      }
    compute_function_frequency ();
    if (flag_reorder_functions)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug tree-optimization/15524] [4.0 Regression] jump threading on trees is slow with switch statements with large # of cases
  2004-11-03 15:08 ` Andrew Pinski
@ 2004-11-03 15:44   ` Richard Guenther
  0 siblings, 0 replies; 18+ messages in thread
From: Richard Guenther @ 2004-11-03 15:44 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: law, gcc-patches, gcc-bugzilla

On Wed, 3 Nov 2004 10:07:58 -0500, Andrew Pinski <pinskia@physics.uc.edu> wrote:
> 
> On Nov 3, 2004, at 10:03 AM, Jeffrey A Law wrote:
> 
> >
> > With loop bounds recording no longer charged to the expander it's time
> > to deal with the inefficiencies which are in the switch expander.
> >
> > Basically we have code which does
> >
> > for each case
> >   for each case
> >     see if the label in the outer loop matches any of the cases in the
> >     inner loop (up to the current case in the outer loop).
> 
> We don't really need some of this code anyways because the gimplifier
> now reorders the case statements so that you don't have to do the loop
> at all.

I believe I have seen patches from Kazu that address this.

Richard.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug tree-optimization/15524] [4.0 Regression] jump threading on trees is slow with switch statements with large # of cases
  2004-11-03 15:03 Jeffrey A Law
@ 2004-11-03 15:08 ` Andrew Pinski
  2004-11-03 15:44   ` Richard Guenther
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Pinski @ 2004-11-03 15:08 UTC (permalink / raw)
  To: law; +Cc: gcc-patches, gcc-bugzilla


On Nov 3, 2004, at 10:03 AM, Jeffrey A Law wrote:

>
> With loop bounds recording no longer charged to the expander it's time
> to deal with the inefficiencies which are in the switch expander.
>
> Basically we have code which does
>
> for each case
>   for each case
>     see if the label in the outer loop matches any of the cases in the
>     inner loop (up to the current case in the outer loop).

We don't really need some of this code anyways because the gimplifier
now reorders the case statements so that you don't have to do the loop
at all.

Thanks,
Andrew Pinski

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug tree-optimization/15524] [4.0 Regression] jump threading on trees is slow with switch statements with large # of cases
@ 2004-11-03 15:03 Jeffrey A Law
  2004-11-03 15:08 ` Andrew Pinski
  0 siblings, 1 reply; 18+ messages in thread
From: Jeffrey A Law @ 2004-11-03 15:03 UTC (permalink / raw)
  To: gcc-patches; +Cc: gcc-bugzilla

[-- Attachment #1: Type: text/plain, Size: 966 bytes --]


With loop bounds recording no longer charged to the expander it's time
to deal with the inefficiencies which are in the switch expander.

Basically we have code which does

for each case
  for each case
    see if the label in the outer loop matches any of the cases in the
    inner loop (up to the current case in the outer loop).

Needless to say this can get rather expensive.

This patch avoids the inner loop completely by keeping track of the
target labels we've seen in a bitmap and only incrementing the unique
target count when we encounter a target which hasn't been seen.
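
The new counting scheme boils down to one bitmap membership test per
case instead of a rescan of all the preceding cases (a sketch,
condensed from the patch below):

  /* If we have not seen this label yet, count a new unique target.  */
  lab = label_rtx (n->code_label);
  if (!bitmap_bit_p (label_bitmap, CODE_LABEL_NUMBER (lab)))
    {
      bitmap_set_bit (label_bitmap, CODE_LABEL_NUMBER (lab));
      uniq++;
    }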

Before this patch we got the following times for expand in pr15524.


expand          :   6.53 ( 5%) usr   0.03 ( 4%) sys   6.56 ( 5%)

After this patch we see:

expand          :   0.71 ( 1%) usr   0.02 ( 3%) sys   0.75 ( 1%)


Not bad.  Unfortunately, no measurable difference on insn-attrtab.i
and friends.  Oh well.

Bootstrapped and regression tested on i686-pc-linux-gnu.




[-- Attachment #2: ZZZ --]
[-- Type: text/plain, Size: 2593 bytes --]

	* stmt.c (expand_case): Speed up code to detect duplicate case
	label targets and count unique case label targets.

Index: stmt.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/stmt.c,v
retrieving revision 1.406
diff -c -p -r1.406 stmt.c
*** stmt.c	26 Oct 2004 17:25:32 -0000	1.406
--- stmt.c	3 Nov 2004 08:03:15 -0000
*************** expand_case (tree exp)
*** 2317,2323 ****
  {
    tree minval = NULL_TREE, maxval = NULL_TREE, range = NULL_TREE;
    rtx default_label = 0;
!   struct case_node *n, *m;
    unsigned int count, uniq;
    rtx index;
    rtx table_label;
--- 2317,2323 ----
  {
    tree minval = NULL_TREE, maxval = NULL_TREE, range = NULL_TREE;
    rtx default_label = 0;
!   struct case_node *n;
    unsigned int count, uniq;
    rtx index;
    rtx table_label;
*************** expand_case (tree exp)
*** 2354,2359 ****
--- 2354,2360 ----
    if (index_type != error_mark_node)
      {
        tree elt;
+       bitmap label_bitmap;
  
        /* cleanup_tree_cfg removes all SWITCH_EXPR with their index
  	 expressions being INTEGER_CST.  */
*************** expand_case (tree exp)
*** 2392,2397 ****
--- 2393,2399 ----
  
        uniq = 0;
        count = 0;
+       label_bitmap = BITMAP_XMALLOC ();
        for (n = case_list; n; n = n->right)
  	{
  	  /* Count the elements and track the largest and smallest
*************** expand_case (tree exp)
*** 2412,2428 ****
  	  if (! tree_int_cst_equal (n->low, n->high))
  	    count++;
  
! 	  /* Count the number of unique case node targets.  */
!           uniq++;
  	  lab = label_rtx (n->code_label);
!           for (m = case_list; m != n; m = m->right)
!             if (label_rtx (m->code_label) == lab)
!               {
!                 uniq--;
!                 break;
!               }
  	}
  
        /* cleanup_tree_cfg removes all SWITCH_EXPR with a single
  	 destination, such as one with a default case only.  */
        gcc_assert (count != 0);
--- 2414,2431 ----
  	  if (! tree_int_cst_equal (n->low, n->high))
  	    count++;
  
! 	  /* If we have not seen this label yet, then increase the
! 	     number of unique case node targets seen.  */
  	  lab = label_rtx (n->code_label);
! 	  if (!bitmap_bit_p (label_bitmap, CODE_LABEL_NUMBER (lab)))
! 	    {
! 	      bitmap_set_bit (label_bitmap, CODE_LABEL_NUMBER (lab));
! 	      uniq++;
! 	    }
  	}
  
+       BITMAP_XFREE (label_bitmap);
+ 
        /* cleanup_tree_cfg removes all SWITCH_EXPR with a single
  	 destination, such as one with a default case only.  */
        gcc_assert (count != 0);

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug tree-optimization/15524] [4.0 Regression] jump threading on trees is slow with switch statements with large # of cases
  2004-11-01  3:24 Jeffrey A Law
@ 2004-11-01  3:51 ` Daniel Berlin
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Berlin @ 2004-11-01  3:51 UTC (permalink / raw)
  To: Jeffrey A Law; +Cc: gcc-patches, gcc-bugzilla



On Sun, 31 Oct 2004, Jeffrey A Law wrote:

>
> More work to speed up 15524.  I was rather puzzled that "expand" was
> being charged with 30% of the compilation time.  I originally thought
> there was some lameness in the switch statement expanders.  However,
> it turned out I was totally wrong.
>
> The culprit is actually the code to record the number of iterations
> of each loop.  We never pushed a timevar for that pass and thus the
> time was charged to whatever was on the top of the timevar stack
> (which happens to be TV_EXPAND).
>
> The code to record the number of loop iterations was being rather
> dumb.  It basically did something like:
>
> foreach loop
>  foreach block
>     if block is in loop
>        code
>
>
> Clearly the problem is second loop -- it's walking *all* the blocks
> and testing for loop membership.  Ouch.

I actually had fixed this as part of my last patch, but was waiting till 
Monday to commit it.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug tree-optimization/15524] [4.0 Regression] jump threading on trees is slow with switch statements with large # of cases
@ 2004-11-01  3:24 Jeffrey A Law
  2004-11-01  3:51 ` Daniel Berlin
  0 siblings, 1 reply; 18+ messages in thread
From: Jeffrey A Law @ 2004-11-01  3:24 UTC (permalink / raw)
  To: gcc-patches; +Cc: gcc-bugzilla

[-- Attachment #1: Type: text/plain, Size: 1496 bytes --]


More work to speed up 15524.  I was rather puzzled that "expand" was
being charged with 30% of the compilation time.  I originally thought
there was some lameness in the switch statement expanders.  However,
it turned out I was totally wrong.

The culprit is actually the code to record the number of iterations
of each loop.  We never pushed a timevar for that pass and thus the
time was charged to whatever was on the top of the timevar stack
(which happens to be TV_EXPAND).

The code to record the number of loop iterations was being rather
dumb.  It basically did something like:

foreach loop
  foreach block
     if block is in loop
        code


Clearly the problem is the second loop -- it's walking *all* the blocks
and testing for loop membership.  Ouch.

It's far more efficient to use get_loop_body (which does a DFS walk
starting at the beginning of the loop to the end of the loop).
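
In sketch form (condensed from the patch below):

  /* Walk just the blocks in LOOP's body instead of every block
     in the function.  */
  bbs = get_loop_body (loop);
  for (i = 0; i < loop->num_nodes; i++)
    {
      bb = bbs[i];
      /* ... examine the statements in BB ...  */
    }
  free (bbs);

To give you some idea of the impact, here's some data: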

The original time without my patch is:

 expand                :  54.55 (30%) usr   0.03 ( 4%) sys  55.21 (30%)
 TOTAL                 : 181.76             0.85           185.17

With my patch:

 expand                :   6.78 ( 5%) usr   0.05 ( 6%) sys   7.20 ( 5%)
 TOTAL                 : 134.20             0.83           137.19

Not bad for 10 minutes of work.

I took the liberty of adding a new timevar for the loop bounds
recording pass and killing an unused field within the loop structure.

Bootstrapped and regression tested on i686-pc-linux-gnu.



[-- Attachment #2: ZZZ --]
[-- Type: text/plain, Size: 3947 bytes --]

	* cfgloop.h (struct loop): Remove unused "nodes" field.
	* timevar.def (TV_TREE_LOOP_BOUNDS): New.
	* tree-data-ref.c (find_data_references_in_loop): Use get_loop_body
	instead of calling flow_bb_inside_loop_p for every basic block
	in the function.
	* tree-ssa-loop.c (pass_record_bounds): Use TV_TREE_LOOP_BOUNDS.
	
Index: cfgloop.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfgloop.h,v
retrieving revision 1.34
diff -c -p -r1.34 cfgloop.h
*** cfgloop.h	25 Oct 2004 21:46:17 -0000	1.34
--- cfgloop.h	1 Nov 2004 03:18:03 -0000
*************** struct loop
*** 97,105 ****
       the loop latch.  */
    basic_block last;
  
-   /* Bitmap of blocks contained within the loop.  */
-   sbitmap nodes;
- 
    /* Number of blocks contained within the loop.  */
    unsigned num_nodes;
  
--- 97,102 ----
Index: timevar.def
===================================================================
RCS file: /cvs/gcc/gcc/gcc/timevar.def,v
retrieving revision 1.38
diff -c -p -r1.38 timevar.def
*** timevar.def	25 Oct 2004 21:18:15 -0000	1.38
--- timevar.def	1 Nov 2004 03:18:03 -0000
*************** DEFTIMEVAR (TV_TREE_DCE		     , "tree co
*** 83,88 ****
--- 83,89 ----
  DEFTIMEVAR (TV_TREE_CD_DCE	     , "tree aggressive DCE")
  DEFTIMEVAR (TV_TREE_DSE		     , "tree DSE")
  DEFTIMEVAR (TV_TREE_LOOP	     , "tree loop optimization")
+ DEFTIMEVAR (TV_TREE_LOOP_BOUNDS	     , "tree record loop bounds")
  DEFTIMEVAR (TV_LIM                   , "loop invariant motion")
  DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv creation")
  DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
Index: tree-data-ref.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-data-ref.c,v
retrieving revision 2.13
diff -c -p -r2.13 tree-data-ref.c
*** tree-data-ref.c	22 Oct 2004 17:05:06 -0000	2.13
--- tree-data-ref.c	1 Nov 2004 03:18:04 -0000
*************** tree 
*** 2204,2217 ****
  find_data_references_in_loop (struct loop *loop, varray_type *datarefs)
  {
    bool dont_know_node_not_inserted = true;
!   basic_block bb;
    block_stmt_iterator bsi;
  
!   FOR_EACH_BB (bb)
      {
!       if (!flow_bb_inside_loop_p (loop, bb))
! 	continue;
!       
        for (bsi = bsi_start (bb); !bsi_end_p (bsi); bsi_next (&bsi))
          {
  	  tree stmt = bsi_stmt (bsi);
--- 2204,2219 ----
  find_data_references_in_loop (struct loop *loop, varray_type *datarefs)
  {
    bool dont_know_node_not_inserted = true;
!   basic_block bb, *bbs;
!   unsigned int i;
    block_stmt_iterator bsi;
  
!   bbs = get_loop_body (loop);
! 
!   for (i = 0; i < loop->num_nodes; i++)
      {
!       bb = bbs[i];
! 
        for (bsi = bsi_start (bb); !bsi_end_p (bsi); bsi_next (&bsi))
          {
  	  tree stmt = bsi_stmt (bsi);
*************** find_data_references_in_loop (struct loo
*** 2264,2269 ****
--- 2266,2273 ----
  	compute_estimated_nb_iterations (bb->loop_father);
      }
  
+   free (bbs);
+ 
    return dont_know_node_not_inserted ? NULL_TREE : chrec_dont_know;
  }
  
Index: tree-ssa-loop.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop.c,v
retrieving revision 2.20
diff -c -p -r2.20 tree-ssa-loop.c
*** tree-ssa-loop.c	28 Sep 2004 20:39:46 -0000	2.20
--- tree-ssa-loop.c	1 Nov 2004 03:20:37 -0000
*************** struct tree_opt_pass pass_record_bounds 
*** 317,323 ****
    NULL,					/* sub */
    NULL,					/* next */
    0,					/* static_pass_number */
!   0,			  		/* tv_id */
    PROP_cfg | PROP_ssa,			/* properties_required */
    0,					/* properties_provided */
    0,					/* properties_destroyed */
--- 317,323 ----
    NULL,					/* sub */
    NULL,					/* next */
    0,					/* static_pass_number */
!   TV_TREE_LOOP_BOUNDS,	  		/* tv_id */
    PROP_cfg | PROP_ssa,			/* properties_required */
    0,					/* properties_provided */
    0,					/* properties_destroyed */

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2004-11-22 17:15 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-10-30  1:03 [Bug tree-optimization/15524] [4.0 Regression] jump threading on trees is slow with switch statements with large # of cases Jeffrey A Law
2004-11-01  3:24 Jeffrey A Law
2004-11-01  3:51 ` Daniel Berlin
2004-11-03 15:03 Jeffrey A Law
2004-11-03 15:08 ` Andrew Pinski
2004-11-03 15:44   ` Richard Guenther
2004-11-04  0:22 Jeffrey A Law
2004-11-09  5:39 Jeffrey A Law
2004-11-10  5:23 Jeffrey A Law
2004-11-13  5:24 Jeffrey A Law
2004-11-17 21:40 Jeffrey A Law
2004-11-20  0:28 Jeffrey A Law
2004-11-20 20:32 Jeffrey A Law
2004-11-20 20:41 ` Eric Botcazou
2004-11-20 20:49   ` Jeffrey A Law
2004-11-21 14:17     ` Eric Botcazou
2004-11-21 21:18 Jeffrey A Law
2004-11-22 17:52 Jeffrey A Law
