public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] bb-reorder: Improve compgotos pass (PR71785)
@ 2016-10-30 19:11 Segher Boessenkool
  2016-10-31 15:10 ` Steven Bosscher
  0 siblings, 1 reply; 11+ messages in thread
From: Segher Boessenkool @ 2016-10-30 19:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool

For code like the testcase in PR71785 GCC factors all the indirect branches
to a single dispatcher that then everything jumps to.  This is because
having many indirect branches with each many jump targets does not scale
in large parts of the compiler.  Very late in the pass pipeline (right
before peephole2) the indirect branches are then unfactored again, by
the duplicate_computed_gotos pass.

This pass works by replacing branches to such a common dispatcher by a
copy of the dispatcher.  For code like this testcase this does not work
so well: most cases do a single addition instruction right before the
dispatcher, but not all, and we end up with only two indirect jumps: the
one without the addition, and the one with the addition in its own basic
block, and now everything else jumps _there_.

This patch solves this problem by simply running the duplicate_computed_gotos
pass again, as long as it does any work.  The patch looks much bigger than
it is, because I factored out two routines to simplify the control flow.

Tested on powerpc64-linux {-m32,-m64}, and on the testcase, and on a version
of the testcase that has 2000 cases instead of 4.  Is this okay for trunk?


Segher


2016-10-30  Segher Boessenkool  <segher@kernel.crashing.org>

	PR rtl-optimization/71785
	* bb-reorder.c (duplicate_computed_gotos_find_candidates,
	duplicate_computed_gotos_do_duplicate): New functions, factored out
	from...
	(pass_duplicate_computed_gotos::execute): Here.  Rerun the pass as
	long as it makes changes.

---
 gcc/bb-reorder.c | 216 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 119 insertions(+), 97 deletions(-)

diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index 85bc569..b69c3b2 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -2593,6 +2593,97 @@ make_pass_reorder_blocks (gcc::context *ctxt)
    which can seriously pessimize code with many computed jumps in the source
    code, such as interpreters.  See e.g. PR15242.  */
 
+/* Look for blocks that end in a computed jump in function FUN, and see if
+   such blocks are suitable for unfactoring.  If a block is a candidate for
+   unfactoring, mark it in the CANDIDATES.  A block bigger than MAX_SIZE is
+   not suitable.  */
+static void
+duplicate_computed_gotos_find_candidates (function *fun, bitmap candidates,
+					  int max_size)
+{
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      /* Obviously the block has to end in a computed jump.  */
+      if (!computed_jump_p (BB_END (bb)))
+	continue;
+
+      /* Only consider blocks that can be duplicated.  */
+      if (CROSSING_JUMP_P (BB_END (bb))
+	  || !can_duplicate_block_p (bb))
+	continue;
+
+      /* Make sure that the block is small enough.  */
+      int size = 0;
+      rtx_insn *insn;
+      FOR_BB_INSNS (bb, insn)
+	if (INSN_P (insn))
+	  {
+	    size += get_attr_min_length (insn);
+	    if (size > max_size)
+	       break;
+	  }
+      if (size > max_size)
+	continue;
+
+      /* Final check: there must not be any incoming abnormal edges.  */
+      int all_flags = 0;
+      edge e;
+      edge_iterator ei;
+      FOR_EACH_EDGE (e, ei, bb->preds)
+	all_flags |= e->flags;
+      if (all_flags & EDGE_COMPLEX)
+	continue;
+
+      bitmap_set_bit (candidates, bb->index);
+    }
+}
+
+/* For every jump in FUN to a block in CANDIDATES try to unfactor that block
+   (i.e. duplicate it and jump to the copy instead).  Return whether any
+   change is made.  */
+static bool
+duplicate_computed_gotos_do_duplicate (function *fun, bitmap candidates)
+{
+  bool changed = false;
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      if (bb->flags & BB_VISITED)
+	continue;
+
+      bb->flags |= BB_VISITED;
+
+      /* BB must have one outgoing edge.  That edge must not lead to
+	 the exit block or the next block.
+	 The destination must have more than one predecessor.  */
+      if (!single_succ_p (bb)
+	  || single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (fun)
+	  || single_succ (bb) == bb->next_bb
+	  || single_pred_p (single_succ (bb)))
+	continue;
+
+      /* The successor block has to be a duplication candidate.  */
+      if (!bitmap_bit_p (candidates, single_succ (bb)->index))
+	continue;
+
+      /* Don't duplicate a partition crossing edge, which requires difficult
+         fixup.  */
+      if (JUMP_P (BB_END (bb)) && CROSSING_JUMP_P (BB_END (bb)))
+	continue;
+
+      basic_block new_bb = duplicate_block (single_succ (bb),
+					    single_succ_edge (bb), bb);
+      new_bb->aux = bb->aux;
+      bb->aux = new_bb;
+      new_bb->flags |= BB_VISITED;
+      changed = true;
+    }
+
+  return changed;
+}
+
 namespace {
 
 const pass_data pass_data_duplicate_computed_gotos =
@@ -2634,125 +2725,56 @@ pass_duplicate_computed_gotos::gate (function *fun)
 unsigned int
 pass_duplicate_computed_gotos::execute (function *fun)
 {
-  basic_block bb, new_bb;
-  bitmap candidates;
-  int max_size;
-  bool changed = false;
-
-  if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
-    return 0;
-
-  clear_bb_flags ();
-  cfg_layout_initialize (0);
-
   /* We are estimating the length of uncond jump insn only once
      since the code for getting the insn length always returns
      the minimal length now.  */
   if (uncond_jump_length == 0)
     uncond_jump_length = get_uncond_jump_length ();
 
-  max_size
+  int max_size
     = uncond_jump_length * PARAM_VALUE (PARAM_MAX_GOTO_DUPLICATION_INSNS);
-  candidates = BITMAP_ALLOC (NULL);
 
-  /* Look for blocks that end in a computed jump, and see if such blocks
-     are suitable for unfactoring.  If a block is a candidate for unfactoring,
-     mark it in the candidates.  */
-  FOR_EACH_BB_FN (bb, fun)
+  bitmap candidates = BITMAP_ALLOC (NULL);
+
+  for (;;)
     {
-      rtx_insn *insn;
-      edge e;
-      edge_iterator ei;
-      int size, all_flags;
+      if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
+	return 0;
 
-      /* Build the reorder chain for the original order of blocks.  */
-      if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun))
-	bb->aux = bb->next_bb;
+      clear_bb_flags ();
+      cfg_layout_initialize (0);
 
-      /* Obviously the block has to end in a computed jump.  */
-      if (!computed_jump_p (BB_END (bb)))
-	continue;
+      basic_block bb;
+      FOR_EACH_BB_FN (bb, fun)
+	{
+	  /* Build the reorder chain for the original order of blocks.  */
+	  if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun))
+	    bb->aux = bb->next_bb;
+	}
 
-      /* Only consider blocks that can be duplicated.  */
-      if (CROSSING_JUMP_P (BB_END (bb))
-	  || !can_duplicate_block_p (bb))
-	continue;
+      duplicate_computed_gotos_find_candidates (fun, candidates, max_size);
 
-      /* Make sure that the block is small enough.  */
-      size = 0;
-      FOR_BB_INSNS (bb, insn)
-	if (INSN_P (insn))
-	  {
-	    size += get_attr_min_length (insn);
-	    if (size > max_size)
-	       break;
-	  }
-      if (size > max_size)
-	continue;
+      bool changed = false;
+      if (!bitmap_empty_p (candidates))
+	changed = duplicate_computed_gotos_do_duplicate (fun, candidates);
 
-      /* Final check: there must not be any incoming abnormal edges.  */
-      all_flags = 0;
-      FOR_EACH_EDGE (e, ei, bb->preds)
-	all_flags |= e->flags;
-      if (all_flags & EDGE_COMPLEX)
-	continue;
-
-      bitmap_set_bit (candidates, bb->index);
-    }
-
-  /* Nothing to do if there is no computed jump here.  */
-  if (bitmap_empty_p (candidates))
-    goto done;
-
-  /* Duplicate computed gotos.  */
-  FOR_EACH_BB_FN (bb, fun)
-    {
-      if (bb->flags & BB_VISITED)
-	continue;
-
-      bb->flags |= BB_VISITED;
-
-      /* BB must have one outgoing edge.  That edge must not lead to
-	 the exit block or the next block.
-	 The destination must have more than one predecessor.  */
-      if (!single_succ_p (bb)
-	  || single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (fun)
-	  || single_succ (bb) == bb->next_bb
-	  || single_pred_p (single_succ (bb)))
-	continue;
-
-      /* The successor block has to be a duplication candidate.  */
-      if (!bitmap_bit_p (candidates, single_succ (bb)->index))
-	continue;
-
-      /* Don't duplicate a partition crossing edge, which requires difficult
-         fixup.  */
-      if (JUMP_P (BB_END (bb)) && CROSSING_JUMP_P (BB_END (bb)))
-	continue;
-
-      new_bb = duplicate_block (single_succ (bb), single_succ_edge (bb), bb);
-      new_bb->aux = bb->aux;
-      bb->aux = new_bb;
-      new_bb->flags |= BB_VISITED;
-      changed = true;
-    }
-
- done:
-  if (changed)
-    {
       /* Duplicating blocks above will redirect edges and may cause hot
 	 blocks previously reached by both hot and cold blocks to become
 	 dominated only by cold blocks.  */
-      fixup_partitions ();
+      if (changed)
+	fixup_partitions ();
+
+      cfg_layout_finalize ();
 
       /* Merge the duplicated blocks into predecessors, when possible.  */
-      cfg_layout_finalize ();
-      cleanup_cfg (0);
+      if (changed)
+	cleanup_cfg (0);
+      else
+	break;
     }
-  else
-    cfg_layout_finalize ();
 
   BITMAP_FREE (candidates);
+
   return 0;
 }
 
-- 
1.9.3

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)
  2016-10-30 19:11 [PATCH] bb-reorder: Improve compgotos pass (PR71785) Segher Boessenkool
@ 2016-10-31 15:10 ` Steven Bosscher
  2016-10-31 15:35   ` Segher Boessenkool
  0 siblings, 1 reply; 11+ messages in thread
From: Steven Bosscher @ 2016-10-31 15:10 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: GCC Patches

On Sun, Oct 30, 2016 at 8:10 PM, Segher Boessenkool wrote:
> This patch solves this problem by simply running the duplicate_computed_gotos
> pass again, as long as it does any work.  The patch looks much bigger than
> it is, because I factored out two routines to simplify the control flow.

It's made the patch a bit difficult to read. Condensing it a bit...

> +  for (;;)
>      {
> +      if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
> +       return 0;

This test should not be needed in the loop. This pass can never
collapse the function to a single basic block.


> +      clear_bb_flags ();
> +      cfg_layout_initialize (0);

See comment below...


> +      basic_block bb;
> +      FOR_EACH_BB_FN (bb, fun)
> +       {
> +         /* Build the reorder chain for the original order of blocks.  */
> +         if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun))
> +           bb->aux = bb->next_bb;
> +       }
>
> +      duplicate_computed_gotos_find_candidates (fun, candidates, max_size);
>
> +      bool changed = false;
> +      if (!bitmap_empty_p (candidates))
> +       changed = duplicate_computed_gotos_do_duplicate (fun, candidates);
>
> +      if (changed)
> +       fixup_partitions ();
> +
> +      cfg_layout_finalize ();

I don't think you have to go into/out-of cfglayout mode for each iteration.


>        /* Merge the duplicated blocks into predecessors, when possible.  */
> +      if (changed)
> +       cleanup_cfg (0);
> +      else
> +       break;
>      }

Maybe a gcc_assert that the loop doesn't iterate more often than num_edges?

Ciao!
Steven

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)
  2016-10-31 15:10 ` Steven Bosscher
@ 2016-10-31 15:35   ` Segher Boessenkool
  2016-11-02  9:02     ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Segher Boessenkool @ 2016-10-31 15:35 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: GCC Patches

On Mon, Oct 31, 2016 at 04:09:48PM +0100, Steven Bosscher wrote:
> On Sun, Oct 30, 2016 at 8:10 PM, Segher Boessenkool wrote:
> > This patch solves this problem by simply running the duplicate_computed_gotos
> > pass again, as long as it does any work.  The patch looks much bigger than
> > it is, because I factored out two routines to simplify the control flow.
> 
> It's made the patch a bit difficult to read. Condensing it a bit...

Well, it would have a goto crossing a loop, or two gotos crossing each
other, otherwise.  I should have done it as two patches I guess (first
factor, then change).

> > +  for (;;)
> >      {
> > +      if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
> > +       return 0;
> 
> This test should not be needed in the loop. This pass can never
> collapse the function to a single basic block.

Yeah maybe, but that relies on quite a few assumptions.  This is strictly
an optimisation anyway, will move it outside the loop.

> > +      basic_block bb;
> > +      FOR_EACH_BB_FN (bb, fun)
> > +       {
> > +         /* Build the reorder chain for the original order of blocks.  */
> > +         if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun))
> > +           bb->aux = bb->next_bb;
> > +       }
> >
> > +      duplicate_computed_gotos_find_candidates (fun, candidates, max_size);
> >
> > +      bool changed = false;
> > +      if (!bitmap_empty_p (candidates))
> > +       changed = duplicate_computed_gotos_do_duplicate (fun, candidates);
> >
> > +      if (changed)
> > +       fixup_partitions ();
> > +
> > +      cfg_layout_finalize ();
> 
> I don't think you have to go into/out-of cfglayout mode for each iteration.

Yeah probably.  I was too lazy :-)  It needs the cleanup_cfg every
iteration though, I was afraid that interacts.

> >        /* Merge the duplicated blocks into predecessors, when possible.  */
> > +      if (changed)
> > +       cleanup_cfg (0);
> > +      else
> > +       break;
> >      }
> 
> Maybe a gcc_assert that the loop doesn't iterate more often than num_edges?

Good plan (num blocks even).

Thanks,


Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)
  2016-10-31 15:35   ` Segher Boessenkool
@ 2016-11-02  9:02     ` Richard Biener
  2016-11-02 10:40       ` Steven Bosscher
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2016-11-02  9:02 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Steven Bosscher, GCC Patches

On Mon, Oct 31, 2016 at 4:35 PM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Mon, Oct 31, 2016 at 04:09:48PM +0100, Steven Bosscher wrote:
>> On Sun, Oct 30, 2016 at 8:10 PM, Segher Boessenkool wrote:
>> > This patch solves this problem by simply running the duplicate_computed_gotos
>> > pass again, as long as it does any work.  The patch looks much bigger than
>> > it is, because I factored out two routines to simplify the control flow.
>>
>> It's made the patch a bit difficult to read. Condensing it a bit...
>
> Well, it would have a goto crossing a loop, or two gotos crossing each
> other, otherwise.  I should have done it as two patches I guess (first
> factor, then change).
>
>> > +  for (;;)
>> >      {
>> > +      if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
>> > +       return 0;
>>
>> This test should not be needed in the loop. This pass can never
>> collapse the function to a single basic block.
>
> Yeah maybe, but that relies on quite a few assumptions.  This is strictly
> an optimisation anyway, will move it outside the loop.
>
>> > +      basic_block bb;
>> > +      FOR_EACH_BB_FN (bb, fun)
>> > +       {
>> > +         /* Build the reorder chain for the original order of blocks.  */
>> > +         if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun))
>> > +           bb->aux = bb->next_bb;
>> > +       }
>> >
>> > +      duplicate_computed_gotos_find_candidates (fun, candidates, max_size);
>> >
>> > +      bool changed = false;
>> > +      if (!bitmap_empty_p (candidates))
>> > +       changed = duplicate_computed_gotos_do_duplicate (fun, candidates);
>> >
>> > +      if (changed)
>> > +       fixup_partitions ();
>> > +
>> > +      cfg_layout_finalize ();
>>
>> I don't think you have to go into/out-of cfglayout mode for each iteration.
>
> Yeah probably.  I was too lazy :-)  It needs the cleanup_cfg every
> iteration though, I was afraid that interacts.

Ick.  Why would it need a cfg-cleanup every iteration?  I fear this is quadratic
complexity in the number of edges to the compgoto block (and the size of the
function).  Can't the unfactoring perform the "cleanup" we rely on here?

>> >        /* Merge the duplicated blocks into predecessors, when possible.  */
>> > +      if (changed)
>> > +       cleanup_cfg (0);
>> > +      else
>> > +       break;
>> >      }
>>
>> Maybe a gcc_assert that the loop doesn't iterate more often than num_edges?
>
> Good plan (num blocks even).
>
> Thanks,
>
>
> Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)
  2016-11-02  9:02     ` Richard Biener
@ 2016-11-02 10:40       ` Steven Bosscher
  2016-11-02 13:41         ` Segher Boessenkool
  0 siblings, 1 reply; 11+ messages in thread
From: Steven Bosscher @ 2016-11-02 10:40 UTC (permalink / raw)
  To: Richard Biener; +Cc: Segher Boessenkool, GCC Patches

 On Wed, Nov 2, 2016 at 10:02 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, Oct 31, 2016 at 4:35 PM, Segher Boessenkool wrote:
>> On Mon, Oct 31, 2016 at 04:09:48PM +0100, Steven Bosscher wrote:
>>> On Sun, Oct 30, 2016 at 8:10 PM, Segher Boessenkool wrote:
>>> > +      cfg_layout_finalize ();
>>>
>>> I don't think you have to go into/out-of cfglayout mode for each iteration.
>>
>> Yeah probably.  I was too lazy :-)  It needs the cleanup_cfg every
>> iteration though, I was afraid that interacts.
>
> Ick.  Why would it need a cfg-cleanup every iteration?

I don't believe it needs a cleanup on every iteration. One cleanup at
the end should work fine.

Ciao!
Steven

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)
  2016-11-02 10:40       ` Steven Bosscher
@ 2016-11-02 13:41         ` Segher Boessenkool
  2016-11-02 13:51           ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Segher Boessenkool @ 2016-11-02 13:41 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Richard Biener, GCC Patches

On Wed, Nov 02, 2016 at 11:39:20AM +0100, Steven Bosscher wrote:
>  On Wed, Nov 2, 2016 at 10:02 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
> > On Mon, Oct 31, 2016 at 4:35 PM, Segher Boessenkool wrote:
> >> On Mon, Oct 31, 2016 at 04:09:48PM +0100, Steven Bosscher wrote:
> >>> On Sun, Oct 30, 2016 at 8:10 PM, Segher Boessenkool wrote:
> >>> > +      cfg_layout_finalize ();
> >>>
> >>> I don't think you have to go into/out-of cfglayout mode for each iteration.
> >>
> >> Yeah probably.  I was too lazy :-)  It needs the cleanup_cfg every
> >> iteration though, I was afraid that interacts.
> >
> > Ick.  Why would it need a cfg-cleanup every iteration?
> 
> I don't believe it needs a cleanup on every iteration. One cleanup at
> the end should work fine.

But as the comment there says:

      /* Merge the duplicated blocks into predecessors, when possible.  */
      cleanup_cfg (0);

(this is not a new comment), and without merging blocks this whole
patch does zilch.

There is no need to create new basic blocks at all (can replace the
final branch in a block with a copy of the whole block it jumps to,
instead); and then it is painfully obvious that switching to layout
mode here is pointless, too.

Do you want me to do that?

Btw, this isn't quadratic anyway; it is a constant number (the maximal
length allowed of the block to copy) linear.  Unless there are unboundedly
many forwarder blocks, which there shouldn't be, but I cannot prove that.

And on a testcase with 2000 cases (instead of the 4 in the testcase in
the PR) this pass takes less than 1% of the compiler runtime; and in
the normal cases (no computed gotos to unfactor) it does the same as
before.


Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)
  2016-11-02 13:41         ` Segher Boessenkool
@ 2016-11-02 13:51           ` Richard Biener
  2016-11-02 15:27             ` Segher Boessenkool
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2016-11-02 13:51 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Steven Bosscher, GCC Patches

On Wed, Nov 2, 2016 at 2:40 PM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Wed, Nov 02, 2016 at 11:39:20AM +0100, Steven Bosscher wrote:
>>  On Wed, Nov 2, 2016 at 10:02 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>> > On Mon, Oct 31, 2016 at 4:35 PM, Segher Boessenkool wrote:
>> >> On Mon, Oct 31, 2016 at 04:09:48PM +0100, Steven Bosscher wrote:
>> >>> On Sun, Oct 30, 2016 at 8:10 PM, Segher Boessenkool wrote:
>> >>> > +      cfg_layout_finalize ();
>> >>>
>> >>> I don't think you have to go into/out-of cfglayout mode for each iteration.
>> >>
>> >> Yeah probably.  I was too lazy :-)  It needs the cleanup_cfg every
>> >> iteration though, I was afraid that interacts.
>> >
>> > Ick.  Why would it need a cfg-cleanup every iteration?
>>
>> I don't believe it needs a cleanup on every iteration. One cleanup at
>> the end should work fine.
>
> But as the comment there says:
>
>       /* Merge the duplicated blocks into predecessors, when possible.  */
>       cleanup_cfg (0);
>
> (this is not a new comment), and without merging blocks this whole
> patch does zilch.
>
> There is no need to create new basic blocks at all (can replace the
> final branch in a block with a copy of the whole block it jumps to,
> instead); and then it is painfully obvious that switching to layout
> mode here is pointless, too.
>
> Do you want me to do that?
>
> Btw, this isn't quadratic anyway; it is a constant number (the maximal
> length allowed of the block to copy) linear.  Unless there are unboundedly
> many forwarder blocks, which there shouldn't be, but I cannot prove that.

Well, you iterate calling functions (find candidates, merge, cleanup-cfg) that
walk the whole function.  So unless the number of iterations is bound
with a constant I call this quadratic (in function size).

> And on a testcase with 2000 cases (instead of the 4 in the testcase in
> the PR) this pass takes less than 1% of the compiler runtime; and in
> the normal cases (no computed gotos to unfactor) it does the same as
> before.
>
>
> Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)
  2016-11-02 13:51           ` Richard Biener
@ 2016-11-02 15:27             ` Segher Boessenkool
  2016-11-03  8:29               ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Segher Boessenkool @ 2016-11-02 15:27 UTC (permalink / raw)
  To: Richard Biener; +Cc: Steven Bosscher, GCC Patches

On Wed, Nov 02, 2016 at 02:51:41PM +0100, Richard Biener wrote:
> >> I don't believe it needs a cleanup on every iteration. One cleanup at
> >> the end should work fine.
> >
> > But as the comment there says:
> >
> >       /* Merge the duplicated blocks into predecessors, when possible.  */
> >       cleanup_cfg (0);
> >
> > (this is not a new comment), and without merging blocks this whole
> > patch does zilch.
> >
> > There is no need to create new basic blocks at all (can replace the
> > final branch in a block with a copy of the whole block it jumps to,
> > instead); and then it is painfully obvious that switching to layout
> > mode here is pointless, too.
> >
> > Do you want me to do that?
> >
> > Btw, this isn't quadratic anyway; it is a constant number (the maximal
> > length allowed of the block to copy) linear.  Unless there are unboundedly
> > many forwarder blocks, which there shouldn't be, but I cannot prove that.
> 
> Well, you iterate calling functions (find candidates, merge, cleanup-cfg) that
> walk the whole function.  So unless the number of iterations is bound
> with a constant I call this quadratic (in function size).

It is bounded (with that caveat above): new candidates will be bigger than
the block merged into it, so there won't be more than
PARAM_MAX_GOTO_DUPLICATION_INSNS passes.

But I can make it all work simpler, in non-layout mode, if you prefer.


Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)
  2016-11-02 15:27             ` Segher Boessenkool
@ 2016-11-03  8:29               ` Richard Biener
  2016-11-03 15:47                 ` Segher Boessenkool
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2016-11-03  8:29 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Steven Bosscher, GCC Patches

On Wed, Nov 2, 2016 at 4:27 PM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Wed, Nov 02, 2016 at 02:51:41PM +0100, Richard Biener wrote:
>> >> I don't believe it needs a cleanup on every iteration. One cleanup at
>> >> the end should work fine.
>> >
>> > But as the comment there says:
>> >
>> >       /* Merge the duplicated blocks into predecessors, when possible.  */
>> >       cleanup_cfg (0);
>> >
>> > (this is not a new comment), and without merging blocks this whole
>> > patch does zilch.
>> >
>> > There is no need to create new basic blocks at all (can replace the
>> > final branch in a block with a copy of the whole block it jumps to,
>> > instead); and then it is painfully obvious that switching to layout
>> > mode here is pointless, too.
>> >
>> > Do you want me to do that?
>> >
>> > Btw, this isn't quadratic anyway; it is a constant number (the maximal
>> > length allowed of the block to copy) linear.  Unless there are unboundedly
>> > many forwarder blocks, which there shouldn't be, but I cannot prove that.
>>
>> Well, you iterate calling functions (find candidates, merge, cleanup-cfg) that
>> walk the whole function.  So unless the number of iterations is bound
>> with a constant I call this quadratic (in function size).
>
> It is bounded (with that caveat above): new candidates will be bigger than
> the block merged into it, so there won't be more than
> PARAM_MAX_GOTO_DUPLICATION_INSNS passes.
>
> But I can make it all work simpler, in non-layout mode, if you prefer.

Iteration is fine but we should restrict where we look for new
candidates.  Likewise
block merging opportunities should be easier to exploit than by
running cfg-cleanup...

I'm just thinking of those gigantic machine-generated state machines where we
ATM do quite well (at least at -O1 ...) with respect to compile-time.

Richard.

>
> Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)
  2016-11-03  8:29               ` Richard Biener
@ 2016-11-03 15:47                 ` Segher Boessenkool
  2016-11-04  8:42                   ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Segher Boessenkool @ 2016-11-03 15:47 UTC (permalink / raw)
  To: Richard Biener; +Cc: Steven Bosscher, GCC Patches

On Thu, Nov 03, 2016 at 09:29:07AM +0100, Richard Biener wrote:
> On Wed, Nov 2, 2016 at 4:27 PM, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > On Wed, Nov 02, 2016 at 02:51:41PM +0100, Richard Biener wrote:
> >> >> I don't believe it needs a cleanup on every iteration. One cleanup at
> >> >> the end should work fine.
> >> >
> >> > But as the comment there says:
> >> >
> >> >       /* Merge the duplicated blocks into predecessors, when possible.  */
> >> >       cleanup_cfg (0);
> >> >
> >> > (this is not a new comment), and without merging blocks this whole
> >> > patch does zilch.
> >> >
> >> > There is no need to create new basic blocks at all (can replace the
> >> > final branch in a block with a copy of the whole block it jumps to,
> >> > instead); and then it is painfully obvious that switching to layout
> >> > mode here is pointless, too.
> >> >
> >> > Do you want me to do that?
> >> >
> >> > Btw, this isn't quadratic anyway; it is a constant number (the maximal
> >> > length allowed of the block to copy) linear.  Unless there are unboundedly
> >> > many forwarder blocks, which there shouldn't be, but I cannot prove that.
> >>
> >> Well, you iterate calling functions (find candidates, merge, cleanup-cfg) that
> >> walk the whole function.  So unless the number of iterations is bound
> >> with a constant I call this quadratic (in function size).
> >
> > It is bounded (with that caveat above): new candidates will be bigger than
> > the block merged into it, so there won't be more than
> > PARAM_MAX_GOTO_DUPLICATION_INSNS passes.
> >
> > But I can make it all work simpler, in non-layout mode, if you prefer.
> 
> Iteration is fine but we should restrict where we look for new
> candidates.  Likewise
> block merging opportunities should be easier to exploit than by
> running cfg-cleanup...

Yes, but that was what the original code does already, so I didn't look
deeper.  "It was just a simple bugfix".

> I'm just thinking of those gigantic machine-generated state machines where we
> ATM do quite well (at least at -O1 ...) with respect to compile-time.

Like I said, I tested it on a 2000 node VM, very artificial so almost
no code except the threading itself, and the compgotos pass takes less
than 1% time both before and after the patch.  I ony tested at -O2 though.

I'll get back to it, but first I have some things that need handling during
stage 1.


Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)
  2016-11-03 15:47                 ` Segher Boessenkool
@ 2016-11-04  8:42                   ` Richard Biener
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Biener @ 2016-11-04  8:42 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Steven Bosscher, GCC Patches

On Thu, Nov 3, 2016 at 4:46 PM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Thu, Nov 03, 2016 at 09:29:07AM +0100, Richard Biener wrote:
>> On Wed, Nov 2, 2016 at 4:27 PM, Segher Boessenkool
>> <segher@kernel.crashing.org> wrote:
>> > On Wed, Nov 02, 2016 at 02:51:41PM +0100, Richard Biener wrote:
>> >> >> I don't believe it needs a cleanup on every iteration. One cleanup at
>> >> >> the end should work fine.
>> >> >
>> >> > But as the comment there says:
>> >> >
>> >> >       /* Merge the duplicated blocks into predecessors, when possible.  */
>> >> >       cleanup_cfg (0);
>> >> >
>> >> > (this is not a new comment), and without merging blocks this whole
>> >> > patch does zilch.
>> >> >
>> >> > There is no need to create new basic blocks at all (can replace the
>> >> > final branch in a block with a copy of the whole block it jumps to,
>> >> > instead); and then it is painfully obvious that switching to layout
>> >> > mode here is pointless, too.
>> >> >
>> >> > Do you want me to do that?
>> >> >
>> >> > Btw, this isn't quadratic anyway; it is a constant number (the maximal
>> >> > length allowed of the block to copy) linear.  Unless there are unboundedly
>> >> > many forwarder blocks, which there shouldn't be, but I cannot prove that.
>> >>
>> >> Well, you iterate calling functions (find candidates, merge, cleanup-cfg) that
>> >> walk the whole function.  So unless the number of iterations is bound
>> >> with a constant I call this quadratic (in function size).
>> >
>> > It is bounded (with that caveat above): new candidates will be bigger than
>> > the block merged into it, so there won't be more than
>> > PARAM_MAX_GOTO_DUPLICATION_INSNS passes.
>> >
>> > But I can make it all work simpler, in non-layout mode, if you prefer.
>>
>> Iteration is fine but we should restrict where we look for new
>> candidates.  Likewise
>> block merging opportunities should be easier to exploit than by
>> running cfg-cleanup...
>
> Yes, but that was what the original code does already, so I didn't look
> deeper.  "It was just a simple bugfix".
>
>> I'm just thinking of those gigantic machine-generated state machines where we
>> ATM do quite well (at least at -O1 ...) with respect to compile-time.
>
> Like I said, I tested it on a 2000 node VM, very artificial so almost
> no code except the threading itself, and the compgotos pass takes less
> than 1% time both before and after the patch.  I ony tested at -O2 though.

I see.

> I'll get back to it, but first I have some things that need handling during
> stage 1.

Fair enough.  Don't consider my comments blocking the patch.

Richard.

>
> Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-11-04  8:42 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-30 19:11 [PATCH] bb-reorder: Improve compgotos pass (PR71785) Segher Boessenkool
2016-10-31 15:10 ` Steven Bosscher
2016-10-31 15:35   ` Segher Boessenkool
2016-11-02  9:02     ` Richard Biener
2016-11-02 10:40       ` Steven Bosscher
2016-11-02 13:41         ` Segher Boessenkool
2016-11-02 13:51           ` Richard Biener
2016-11-02 15:27             ` Segher Boessenkool
2016-11-03  8:29               ` Richard Biener
2016-11-03 15:47                 ` Segher Boessenkool
2016-11-04  8:42                   ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).