public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH v3] bb-reorder: Improve compgotos pass (PR71785)
@ 2016-11-17 20:48 Segher Boessenkool
  2016-11-18  8:09 ` Richard Biener
  2016-11-19 20:16 ` Andreas Schwab
  0 siblings, 2 replies; 8+ messages in thread
From: Segher Boessenkool @ 2016-11-17 20:48 UTC (permalink / raw)
  To: gcc-patches; +Cc: stevenb.gcc, rguenther, law, Segher Boessenkool

For code like the testcase in PR71785 GCC factors all the indirect branches
to a single dispatcher that then everything jumps to.  This is because
having many indirect branches with each many jump targets does not scale
in large parts of the compiler.  Very late in the pass pipeline (right
before peephole2) the indirect branches are then unfactored again, by
the duplicate_computed_gotos pass.

This pass works by replacing branches to such a common dispatcher by a
copy of the dispatcher.  For code like this testcase this does not work
so well: most cases do a single addition instruction right before the
dispatcher, but not all, and we end up with only two indirect jumps: the
one without the addition, and the one with the addition in its own basic
block, and now everything else jumps _there_.

This patch rewrites the algorithm to deal with this.  It also makes it
simpler: it does not need the "candidates" array anymore, it does not
need RTL layout mode, it does not need cleanup_cfg, and it does not
need to keep track of what blocks it already visited.

Tested on powerpc64-linux {-m32,-m64}, and on the testcase, and on a version
of the testcase that has 2000 cases instead of 4.  Is this okay for trunk?


Segher


2016-11-17  Segher Boessenkool  <segher@kernel.crashing.org>

	PR rtl-optimization/71785
	* bb-reorder.c (maybe_duplicate_computed_goto): New function.
	(duplicate_computed_gotos): New function.
	(pass_duplicate_computed_gotos::execute): Rewrite.

---
 gcc/bb-reorder.c | 214 ++++++++++++++++++++++++-------------------------------
 1 file changed, 93 insertions(+), 121 deletions(-)

diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index 85bc569..90de2de 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -2587,11 +2587,100 @@ make_pass_reorder_blocks (gcc::context *ctxt)
   return new pass_reorder_blocks (ctxt);
 }
 
+/* Duplicate a block (that we already know ends in a computed jump) into its
+   predecessors, where possible.  Return whether anything is changed.  */
+static bool
+maybe_duplicate_computed_goto (basic_block bb, int max_size)
+{
+  if (single_pred_p (bb))
+    return false;
+
+  /* Make sure that the block is small enough.  */
+  rtx_insn *insn;
+  FOR_BB_INSNS (bb, insn)
+    if (INSN_P (insn))
+      {
+	max_size -= get_attr_min_length (insn);
+	if (max_size < 0)
+	   return false;
+      }
+
+  bool changed = false;
+  edge e;
+  edge_iterator ei;
+  for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
+    {
+      basic_block pred = e->src;
+
+      /* Do not duplicate BB into PRED if that is the last predecessor, or if
+	 we cannot merge a copy of BB with PRED.  */
+      if (single_pred_p (bb)
+	  || !single_succ_p (pred)
+	  || e->flags & EDGE_COMPLEX
+	  || pred->index < NUM_FIXED_BLOCKS
+	  || (JUMP_P (BB_END (pred)) && CROSSING_JUMP_P (BB_END (pred))))
+	{
+	  ei_next (&ei);
+	  continue;
+	}
+
+      if (dump_file)
+	fprintf (dump_file, "Duplicating computed goto bb %d into bb %d\n",
+		 bb->index, e->src->index);
+
+      /* Remember if PRED can be duplicated; if so, the copy of BB merged
+	 with PRED can be duplicated as well.  */
+      bool can_dup_more = can_duplicate_block_p (pred);
+
+      /* Make a copy of BB, merge it into PRED.  */
+      basic_block copy = duplicate_block (bb, e, NULL);
+      emit_barrier_after_bb (copy);
+      reorder_insns_nobb (BB_HEAD (copy), BB_END (copy), BB_END (pred));
+      merge_blocks (pred, copy);
+
+      changed = true;
+
+      /* Try to merge the resulting merged PRED into further predecessors.  */
+      if (can_dup_more)
+	maybe_duplicate_computed_goto (pred, max_size);
+    }
+
+  return changed;
+}
+
 /* Duplicate the blocks containing computed gotos.  This basically unfactors
    computed gotos that were factored early on in the compilation process to
-   speed up edge based data flow.  We used to not unfactoring them again,
-   which can seriously pessimize code with many computed jumps in the source
-   code, such as interpreters.  See e.g. PR15242.  */
+   speed up edge based data flow.  We used to not unfactor them again, which
+   can seriously pessimize code with many computed jumps in the source code,
+   such as interpreters.  See e.g. PR15242.  */
+static void
+duplicate_computed_gotos (function *fun)
+{
+  /* We are estimating the length of uncond jump insn only once
+     since the code for getting the insn length always returns
+     the minimal length now.  */
+  if (uncond_jump_length == 0)
+    uncond_jump_length = get_uncond_jump_length ();
+
+  /* Never copy a block larger than this.  */
+  int max_size
+    = uncond_jump_length * PARAM_VALUE (PARAM_MAX_GOTO_DUPLICATION_INSNS);
+
+  bool changed = false;
+
+  /* Try to duplicate all blocks that end in a computed jump and that
+     can be duplicated at all.  */
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    if (computed_jump_p (BB_END (bb)) && can_duplicate_block_p (bb))
+      changed |= maybe_duplicate_computed_goto (bb, max_size);
+
+  /* Duplicating blocks will redirect edges and may cause hot blocks
+    previously reached by both hot and cold blocks to become dominated
+    only by cold blocks.  */
+  if (changed)
+    fixup_partitions ();
+}
 
 namespace {
 
@@ -2634,125 +2723,8 @@ pass_duplicate_computed_gotos::gate (function *fun)
 unsigned int
 pass_duplicate_computed_gotos::execute (function *fun)
 {
-  basic_block bb, new_bb;
-  bitmap candidates;
-  int max_size;
-  bool changed = false;
+  duplicate_computed_gotos (fun);
 
-  if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
-    return 0;
-
-  clear_bb_flags ();
-  cfg_layout_initialize (0);
-
-  /* We are estimating the length of uncond jump insn only once
-     since the code for getting the insn length always returns
-     the minimal length now.  */
-  if (uncond_jump_length == 0)
-    uncond_jump_length = get_uncond_jump_length ();
-
-  max_size
-    = uncond_jump_length * PARAM_VALUE (PARAM_MAX_GOTO_DUPLICATION_INSNS);
-  candidates = BITMAP_ALLOC (NULL);
-
-  /* Look for blocks that end in a computed jump, and see if such blocks
-     are suitable for unfactoring.  If a block is a candidate for unfactoring,
-     mark it in the candidates.  */
-  FOR_EACH_BB_FN (bb, fun)
-    {
-      rtx_insn *insn;
-      edge e;
-      edge_iterator ei;
-      int size, all_flags;
-
-      /* Build the reorder chain for the original order of blocks.  */
-      if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun))
-	bb->aux = bb->next_bb;
-
-      /* Obviously the block has to end in a computed jump.  */
-      if (!computed_jump_p (BB_END (bb)))
-	continue;
-
-      /* Only consider blocks that can be duplicated.  */
-      if (CROSSING_JUMP_P (BB_END (bb))
-	  || !can_duplicate_block_p (bb))
-	continue;
-
-      /* Make sure that the block is small enough.  */
-      size = 0;
-      FOR_BB_INSNS (bb, insn)
-	if (INSN_P (insn))
-	  {
-	    size += get_attr_min_length (insn);
-	    if (size > max_size)
-	       break;
-	  }
-      if (size > max_size)
-	continue;
-
-      /* Final check: there must not be any incoming abnormal edges.  */
-      all_flags = 0;
-      FOR_EACH_EDGE (e, ei, bb->preds)
-	all_flags |= e->flags;
-      if (all_flags & EDGE_COMPLEX)
-	continue;
-
-      bitmap_set_bit (candidates, bb->index);
-    }
-
-  /* Nothing to do if there is no computed jump here.  */
-  if (bitmap_empty_p (candidates))
-    goto done;
-
-  /* Duplicate computed gotos.  */
-  FOR_EACH_BB_FN (bb, fun)
-    {
-      if (bb->flags & BB_VISITED)
-	continue;
-
-      bb->flags |= BB_VISITED;
-
-      /* BB must have one outgoing edge.  That edge must not lead to
-	 the exit block or the next block.
-	 The destination must have more than one predecessor.  */
-      if (!single_succ_p (bb)
-	  || single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (fun)
-	  || single_succ (bb) == bb->next_bb
-	  || single_pred_p (single_succ (bb)))
-	continue;
-
-      /* The successor block has to be a duplication candidate.  */
-      if (!bitmap_bit_p (candidates, single_succ (bb)->index))
-	continue;
-
-      /* Don't duplicate a partition crossing edge, which requires difficult
-         fixup.  */
-      if (JUMP_P (BB_END (bb)) && CROSSING_JUMP_P (BB_END (bb)))
-	continue;
-
-      new_bb = duplicate_block (single_succ (bb), single_succ_edge (bb), bb);
-      new_bb->aux = bb->aux;
-      bb->aux = new_bb;
-      new_bb->flags |= BB_VISITED;
-      changed = true;
-    }
-
- done:
-  if (changed)
-    {
-      /* Duplicating blocks above will redirect edges and may cause hot
-	 blocks previously reached by both hot and cold blocks to become
-	 dominated only by cold blocks.  */
-      fixup_partitions ();
-
-      /* Merge the duplicated blocks into predecessors, when possible.  */
-      cfg_layout_finalize ();
-      cleanup_cfg (0);
-    }
-  else
-    cfg_layout_finalize ();
-
-  BITMAP_FREE (candidates);
   return 0;
 }
 
-- 
1.9.3

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] bb-reorder: Improve compgotos pass (PR71785)
  2016-11-17 20:48 [PATCH v3] bb-reorder: Improve compgotos pass (PR71785) Segher Boessenkool
@ 2016-11-18  8:09 ` Richard Biener
  2016-11-18  9:07   ` Segher Boessenkool
  2016-11-18 12:46   ` Segher Boessenkool
  2016-11-19 20:16 ` Andreas Schwab
  1 sibling, 2 replies; 8+ messages in thread
From: Richard Biener @ 2016-11-18  8:09 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, stevenb.gcc, law

On Thu, 17 Nov 2016, Segher Boessenkool wrote:

> For code like the testcase in PR71785 GCC factors all the indirect branches
> to a single dispatcher that then everything jumps to.  This is because
> having many indirect branches with each many jump targets does not scale
> in large parts of the compiler.  Very late in the pass pipeline (right
> before peephole2) the indirect branches are then unfactored again, by
> the duplicate_computed_gotos pass.
> 
> This pass works by replacing branches to such a common dispatcher by a
> copy of the dispatcher.  For code like this testcase this does not work
> so well: most cases do a single addition instruction right before the
> dispatcher, but not all, and we end up with only two indirect jumps: the
> one without the addition, and the one with the addition in its own basic
> block, and now everything else jumps _there_.
> 
> This patch rewrites the algorithm to deal with this.  It also makes it
> simpler: it does not need the "candidates" array anymore, it does not
> need RTL layout mode, it does not need cleanup_cfg, and it does not
> need to keep track of what blocks it already visited.
> 
> Tested on powerpc64-linux {-m32,-m64}, and on the testcase, and on a version
> of the testcase that has 2000 cases instead of 4.  Is this okay for trunk?

Looks good to me - a single question below:

> 
> Segher
> 
> 
> 2016-11-17  Segher Boessenkool  <segher@kernel.crashing.org>
> 
> 	PR rtl-optimization/71785
> 	* bb-reorder.c (maybe_duplicate_computed_goto): New function.
> 	(duplicate_computed_gotos): New function.
> 	(pass_duplicate_computed_gotos::execute): Rewrite.
> 
> ---
>  gcc/bb-reorder.c | 214 ++++++++++++++++++++++++-------------------------------
>  1 file changed, 93 insertions(+), 121 deletions(-)
> 
> diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
> index 85bc569..90de2de 100644
> --- a/gcc/bb-reorder.c
> +++ b/gcc/bb-reorder.c
> @@ -2587,11 +2587,100 @@ make_pass_reorder_blocks (gcc::context *ctxt)
>    return new pass_reorder_blocks (ctxt);
>  }
>  
> +/* Duplicate a block (that we already know ends in a computed jump) into its
> +   predecessors, where possible.  Return whether anything is changed.  */
> +static bool
> +maybe_duplicate_computed_goto (basic_block bb, int max_size)
> +{
> +  if (single_pred_p (bb))
> +    return false;
> +
> +  /* Make sure that the block is small enough.  */
> +  rtx_insn *insn;
> +  FOR_BB_INSNS (bb, insn)
> +    if (INSN_P (insn))
> +      {
> +	max_size -= get_attr_min_length (insn);
> +	if (max_size < 0)
> +	   return false;
> +      }
> +
> +  bool changed = false;
> +  edge e;
> +  edge_iterator ei;
> +  for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
> +    {
> +      basic_block pred = e->src;
> +
> +      /* Do not duplicate BB into PRED if that is the last predecessor, or if
> +	 we cannot merge a copy of BB with PRED.  */
> +      if (single_pred_p (bb)
> +	  || !single_succ_p (pred)
> +	  || e->flags & EDGE_COMPLEX
> +	  || pred->index < NUM_FIXED_BLOCKS
> +	  || (JUMP_P (BB_END (pred)) && CROSSING_JUMP_P (BB_END (pred))))
> +	{
> +	  ei_next (&ei);
> +	  continue;
> +	}
> +
> +      if (dump_file)
> +	fprintf (dump_file, "Duplicating computed goto bb %d into bb %d\n",
> +		 bb->index, e->src->index);
> +
> +      /* Remember if PRED can be duplicated; if so, the copy of BB merged
> +	 with PRED can be duplicated as well.  */
> +      bool can_dup_more = can_duplicate_block_p (pred);
> +
> +      /* Make a copy of BB, merge it into PRED.  */
> +      basic_block copy = duplicate_block (bb, e, NULL);
> +      emit_barrier_after_bb (copy);
> +      reorder_insns_nobb (BB_HEAD (copy), BB_END (copy), BB_END (pred));
> +      merge_blocks (pred, copy);

there is can_merge_blocks_p and I would have expected the RTL CFG hooks
to do the insn "merging" (you use reorder_insns_nobb for this which is
actually deprecated?).  I suppose if you'd specify pred as third arg
to duplicate_block the CFG hook would have done its work 
(can_merge_blocks_p checks a->next_bb == b)?  I'm not that familiar
with CFG-RTL mode.

> +
> +      changed = true;
> +
> +      /* Try to merge the resulting merged PRED into further predecessors.  */
> +      if (can_dup_more)
> +	maybe_duplicate_computed_goto (pred, max_size);
> +    }
> +
> +  return changed;
> +}
> +
>  /* Duplicate the blocks containing computed gotos.  This basically unfactors
>     computed gotos that were factored early on in the compilation process to
> -   speed up edge based data flow.  We used to not unfactoring them again,
> -   which can seriously pessimize code with many computed jumps in the source
> -   code, such as interpreters.  See e.g. PR15242.  */
> +   speed up edge based data flow.  We used to not unfactor them again, which
> +   can seriously pessimize code with many computed jumps in the source code,
> +   such as interpreters.  See e.g. PR15242.  */
> +static void
> +duplicate_computed_gotos (function *fun)
> +{
> +  /* We are estimating the length of uncond jump insn only once
> +     since the code for getting the insn length always returns
> +     the minimal length now.  */
> +  if (uncond_jump_length == 0)
> +    uncond_jump_length = get_uncond_jump_length ();
> +
> +  /* Never copy a block larger than this.  */
> +  int max_size
> +    = uncond_jump_length * PARAM_VALUE (PARAM_MAX_GOTO_DUPLICATION_INSNS);
> +
> +  bool changed = false;
> +
> +  /* Try to duplicate all blocks that end in a computed jump and that
> +     can be duplicated at all.  */
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, fun)
> +    if (computed_jump_p (BB_END (bb)) && can_duplicate_block_p (bb))
> +      changed |= maybe_duplicate_computed_goto (bb, max_size);
> +
> +  /* Duplicating blocks will redirect edges and may cause hot blocks
> +    previously reached by both hot and cold blocks to become dominated
> +    only by cold blocks.  */
> +  if (changed)
> +    fixup_partitions ();
> +}
>  
>  namespace {
>  
> @@ -2634,125 +2723,8 @@ pass_duplicate_computed_gotos::gate (function *fun)
>  unsigned int
>  pass_duplicate_computed_gotos::execute (function *fun)
>  {
> -  basic_block bb, new_bb;
> -  bitmap candidates;
> -  int max_size;
> -  bool changed = false;
> +  duplicate_computed_gotos (fun);
>  
> -  if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
> -    return 0;
> -
> -  clear_bb_flags ();
> -  cfg_layout_initialize (0);
> -
> -  /* We are estimating the length of uncond jump insn only once
> -     since the code for getting the insn length always returns
> -     the minimal length now.  */
> -  if (uncond_jump_length == 0)
> -    uncond_jump_length = get_uncond_jump_length ();
> -
> -  max_size
> -    = uncond_jump_length * PARAM_VALUE (PARAM_MAX_GOTO_DUPLICATION_INSNS);
> -  candidates = BITMAP_ALLOC (NULL);
> -
> -  /* Look for blocks that end in a computed jump, and see if such blocks
> -     are suitable for unfactoring.  If a block is a candidate for unfactoring,
> -     mark it in the candidates.  */
> -  FOR_EACH_BB_FN (bb, fun)
> -    {
> -      rtx_insn *insn;
> -      edge e;
> -      edge_iterator ei;
> -      int size, all_flags;
> -
> -      /* Build the reorder chain for the original order of blocks.  */
> -      if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun))
> -	bb->aux = bb->next_bb;
> -
> -      /* Obviously the block has to end in a computed jump.  */
> -      if (!computed_jump_p (BB_END (bb)))
> -	continue;
> -
> -      /* Only consider blocks that can be duplicated.  */
> -      if (CROSSING_JUMP_P (BB_END (bb))
> -	  || !can_duplicate_block_p (bb))
> -	continue;
> -
> -      /* Make sure that the block is small enough.  */
> -      size = 0;
> -      FOR_BB_INSNS (bb, insn)
> -	if (INSN_P (insn))
> -	  {
> -	    size += get_attr_min_length (insn);
> -	    if (size > max_size)
> -	       break;
> -	  }
> -      if (size > max_size)
> -	continue;
> -
> -      /* Final check: there must not be any incoming abnormal edges.  */
> -      all_flags = 0;
> -      FOR_EACH_EDGE (e, ei, bb->preds)
> -	all_flags |= e->flags;
> -      if (all_flags & EDGE_COMPLEX)
> -	continue;
> -
> -      bitmap_set_bit (candidates, bb->index);
> -    }
> -
> -  /* Nothing to do if there is no computed jump here.  */
> -  if (bitmap_empty_p (candidates))
> -    goto done;
> -
> -  /* Duplicate computed gotos.  */
> -  FOR_EACH_BB_FN (bb, fun)
> -    {
> -      if (bb->flags & BB_VISITED)
> -	continue;
> -
> -      bb->flags |= BB_VISITED;
> -
> -      /* BB must have one outgoing edge.  That edge must not lead to
> -	 the exit block or the next block.
> -	 The destination must have more than one predecessor.  */
> -      if (!single_succ_p (bb)
> -	  || single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (fun)
> -	  || single_succ (bb) == bb->next_bb
> -	  || single_pred_p (single_succ (bb)))
> -	continue;
> -
> -      /* The successor block has to be a duplication candidate.  */
> -      if (!bitmap_bit_p (candidates, single_succ (bb)->index))
> -	continue;
> -
> -      /* Don't duplicate a partition crossing edge, which requires difficult
> -         fixup.  */
> -      if (JUMP_P (BB_END (bb)) && CROSSING_JUMP_P (BB_END (bb)))
> -	continue;
> -
> -      new_bb = duplicate_block (single_succ (bb), single_succ_edge (bb), bb);
> -      new_bb->aux = bb->aux;
> -      bb->aux = new_bb;
> -      new_bb->flags |= BB_VISITED;
> -      changed = true;
> -    }
> -
> - done:
> -  if (changed)
> -    {
> -      /* Duplicating blocks above will redirect edges and may cause hot
> -	 blocks previously reached by both hot and cold blocks to become
> -	 dominated only by cold blocks.  */
> -      fixup_partitions ();
> -
> -      /* Merge the duplicated blocks into predecessors, when possible.  */
> -      cfg_layout_finalize ();
> -      cleanup_cfg (0);
> -    }
> -  else
> -    cfg_layout_finalize ();
> -
> -  BITMAP_FREE (candidates);
>    return 0;
>  }
>  
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] bb-reorder: Improve compgotos pass (PR71785)
  2016-11-18  8:09 ` Richard Biener
@ 2016-11-18  9:07   ` Segher Boessenkool
  2016-11-18 11:12     ` Segher Boessenkool
  2016-11-18 12:46   ` Segher Boessenkool
  1 sibling, 1 reply; 8+ messages in thread
From: Segher Boessenkool @ 2016-11-18  9:07 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, stevenb.gcc, law

On Fri, Nov 18, 2016 at 09:09:40AM +0100, Richard Biener wrote:
> > +      /* Make a copy of BB, merge it into PRED.  */
> > +      basic_block copy = duplicate_block (bb, e, NULL);
> > +      emit_barrier_after_bb (copy);
> > +      reorder_insns_nobb (BB_HEAD (copy), BB_END (copy), BB_END (pred));
> > +      merge_blocks (pred, copy);
> 
> there is can_merge_blocks_p and I would have expected the RTL CFG hooks
> to do the insn "merging" (you use reorder_insns_nobb for this which is
> actually deprecated?).  I suppose if you'd specify pred as third arg
> to duplicate_block the CFG hook would have done its work 
> (can_merge_blocks_p checks a->next_bb == b)?  I'm not that familiar
> with CFG-RTL mode.

The third arg to duplicate_block ("after") doesn't do anything in either
RTL mode: it only is passed to move_block_after, but that is a noop for
the RTL modes (and its result is not checked by duplicate_block, ouch).

can_merge_blocks_p will always return true here (except I forgot to check
for simplejump_p -- I'll add that).

rtl_merge_blocks does not check rtl_can_merge_blocks_p (and you get a
quite spectacular ICE when you get it wrong: everything between the two
blocks is deleted :-) ).  I'll make a separate patch that checks it
(layout mode already does).

reorder_insns_nobb is deprecated but it does an important job for which
there is no replacement.  In many cases you can do better than doing
manual surgery on the insn stream of course.

Thanks for the review,


Segher

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] bb-reorder: Improve compgotos pass (PR71785)
  2016-11-18  9:07   ` Segher Boessenkool
@ 2016-11-18 11:12     ` Segher Boessenkool
  0 siblings, 0 replies; 8+ messages in thread
From: Segher Boessenkool @ 2016-11-18 11:12 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, stevenb.gcc, law

On Fri, Nov 18, 2016 at 03:07:27AM -0600, Segher Boessenkool wrote:
> rtl_merge_blocks does not check rtl_can_merge_blocks_p (and you get a
> quite spectacular ICE when you get it wrong: everything between the two
> blocks is deleted :-) ).  I'll make a separate patch that checks it
> (layout mode already does).

This fails in cfgcleanup.c (in merge_blocks_move_successor_nojumps),
on the "a->next_bb == b" test.  Oh well.


Segher

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] bb-reorder: Improve compgotos pass (PR71785)
  2016-11-18  8:09 ` Richard Biener
  2016-11-18  9:07   ` Segher Boessenkool
@ 2016-11-18 12:46   ` Segher Boessenkool
  2016-11-21 15:19     ` Segher Boessenkool
  1 sibling, 1 reply; 8+ messages in thread
From: Segher Boessenkool @ 2016-11-18 12:46 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, stevenb.gcc, law

On Fri, Nov 18, 2016 at 09:09:40AM +0100, Richard Biener wrote:
> > This patch rewrites the algorithm to deal with this.  It also makes it
> > simpler: it does not need the "candidates" array anymore, it does not
> > need RTL layout mode, it does not need cleanup_cfg, and it does not
> > need to keep track of what blocks it already visited.
> > 
> > Tested on powerpc64-linux {-m32,-m64}, and on the testcase, and on a version
> > of the testcase that has 2000 cases instead of 4.  Is this okay for trunk?
> 
> Looks good to me - a single question below:

And here is a testcase, okay as well?


Segher


2016-11-18  Segher Boessenkool  <segher@kernel.crashing.org>

gcc/testsuite/
	PR rtl-optimization/71785
	* gcc.target/powerpc/pr71785.c: New file.

---
 gcc/testsuite/gcc.target/powerpc/pr71785.c | 52 ++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr71785.c

diff --git a/gcc/testsuite/gcc.target/powerpc/pr71785.c b/gcc/testsuite/gcc.target/powerpc/pr71785.c
new file mode 100644
index 0000000..613dcd1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr71785.c
@@ -0,0 +1,52 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not {\mb\M} } } */
+
+/* Check that all computed gotos in this testcase end up unfactored completely.
+   If some is not there will be a unconditional jump left; if all works fine,
+   all are gone.  */
+
+typedef enum opcode
+{
+	OP_A,
+	OP_B,
+	OP_END
+} opcode;
+
+typedef struct op
+{
+	opcode opcode;
+	int arg;
+} op;
+
+extern void do_stuff_b(int arg);
+extern void do_stuff_c(int arg);
+
+extern int someglobal;
+
+void
+eval(op *op)
+{
+	static const void *dispatch_table[] = {
+		&&CASE_OP_A,
+		&&CASE_OP_B,
+		&&CASE_OP_C,
+		&&CASE_OP_END
+	};
+
+	goto *dispatch_table[op->opcode];
+CASE_OP_A:
+	someglobal++;
+	op++;
+	goto *dispatch_table[op->opcode];
+CASE_OP_B:
+	do_stuff_b(op->arg);
+	op++;
+	goto *dispatch_table[op->opcode];
+CASE_OP_C:
+	do_stuff_c(op->arg);
+	op++;
+	goto *dispatch_table[op->opcode];
+CASE_OP_END:
+	return;
+}
-- 
1.9.3

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] bb-reorder: Improve compgotos pass (PR71785)
  2016-11-17 20:48 [PATCH v3] bb-reorder: Improve compgotos pass (PR71785) Segher Boessenkool
  2016-11-18  8:09 ` Richard Biener
@ 2016-11-19 20:16 ` Andreas Schwab
  2016-11-25  8:53   ` Jakub Jelinek
  1 sibling, 1 reply; 8+ messages in thread
From: Andreas Schwab @ 2016-11-19 20:16 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, stevenb.gcc, rguenther, law

On Nov 17 2016, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> 	PR rtl-optimization/71785
> 	* bb-reorder.c (maybe_duplicate_computed_goto): New function.
> 	(duplicate_computed_gotos): New function.
> 	(pass_duplicate_computed_gotos::execute): Rewrite.

This breaks gcc.c-torture/execute/comp-goto-1.c execution test with -O2
-flto on ia64 and m68k.  The label sim_base_addr has been placed after
the L_LOAD32_RR and L_METAOP_DONE labels, so that
op_map[program[i].f1.offset] - base_addr becomes negative.  That can be
fixed by making .offset a signed bitfield.  Installed as obvious.

Andreas.

	* gcc.c-torture/execute/comp-goto-1.c (insn_t): Change offset to
	signed int.

diff --git a/gcc/testsuite/gcc.c-torture/execute/comp-goto-1.c b/gcc/testsuite/gcc.c-torture/execute/comp-goto-1.c
index 3bf9a26f65..4c41b71c34 100644
--- a/gcc/testsuite/gcc.c-torture/execute/comp-goto-1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/comp-goto-1.c
@@ -14,7 +14,7 @@ typedef union
 {
   struct
     {
-      unsigned int	offset:18;
+      signed int	offset:18;
       unsigned int	ignore:4;
       unsigned int	s1:8;
       int		:2;
-- 
2.10.2


-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] bb-reorder: Improve compgotos pass (PR71785)
  2016-11-18 12:46   ` Segher Boessenkool
@ 2016-11-21 15:19     ` Segher Boessenkool
  0 siblings, 0 replies; 8+ messages in thread
From: Segher Boessenkool @ 2016-11-21 15:19 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, stevenb.gcc, law

On Fri, Nov 18, 2016 at 06:46:21AM -0600, Segher Boessenkool wrote:
> 2016-11-18  Segher Boessenkool  <segher@kernel.crashing.org>
> 
> gcc/testsuite/
> 	PR rtl-optimization/71785
> 	* gcc.target/powerpc/pr71785.c: New file.

I have committed this now.


Segher

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] bb-reorder: Improve compgotos pass (PR71785)
  2016-11-19 20:16 ` Andreas Schwab
@ 2016-11-25  8:53   ` Jakub Jelinek
  0 siblings, 0 replies; 8+ messages in thread
From: Jakub Jelinek @ 2016-11-25  8:53 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Segher Boessenkool, gcc-patches, stevenb.gcc, rguenther, law

On Sat, Nov 19, 2016 at 09:15:42PM +0100, Andreas Schwab wrote:
> 	* gcc.c-torture/execute/comp-goto-1.c (insn_t): Change offset to
> 	signed int.

The same testcase is copied to gcc.dg/tree-prof/, just with extra dg-
directives.  Committed as obvious:

2016-11-25  Jakub Jelinek  <jakub@redhat.com>
	    Andreas Schwab  <schwab@linux-m68k.org>

	PR gcov-profile/78467
	* gcc.dg/tree-prof/comp-goto-1.c (insn_t): Change offset to
	signed int.

--- gcc/testsuite/gcc.dg/tree-prof/comp-goto-1.c.jj	2013-06-07 13:17:15.000000000 +0200
+++ gcc/testsuite/gcc.dg/tree-prof/comp-goto-1.c	2016-11-25 09:46:58.734509541 +0100
@@ -16,7 +16,7 @@ typedef union
 {
   struct
     {
-      unsigned int	offset:18;
+      signed int	offset:18;
       unsigned int	ignore:4;
       unsigned int	s1:8;
       int		:2;


	Jakub

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-11-25  8:53 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-17 20:48 [PATCH v3] bb-reorder: Improve compgotos pass (PR71785) Segher Boessenkool
2016-11-18  8:09 ` Richard Biener
2016-11-18  9:07   ` Segher Boessenkool
2016-11-18 11:12     ` Segher Boessenkool
2016-11-18 12:46   ` Segher Boessenkool
2016-11-21 15:19     ` Segher Boessenkool
2016-11-19 20:16 ` Andreas Schwab
2016-11-25  8:53   ` Jakub Jelinek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).