Re: [PATCH 4/5] shrink-wrap: Shrink-wrapping for separate components

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Jeff Law <law@redhat.com>
To: Segher Boessenkool <segher@kernel.crashing.org>, gcc-patches@gcc.gnu.org
Cc: dje.gcc@gmail.com
Subject: Re: [PATCH 4/5] shrink-wrap: Shrink-wrapping for separate components
Date: Tue, 27 Sep 2016 21:25:00 -0000	[thread overview]
Message-ID: <cf28e01b-eb41-acfd-8d4b-7cc724006054@redhat.com> (raw)
In-Reply-To: <cd8a358e576756d1611ca2d7862d4c718ded8bf5.1474616087.git.segher@kernel.crashing.org>

On 09/23/2016 02:21 AM, Segher Boessenkool wrote:
> Deciding what blocks should run with a certain component active so that
> the total cost of executing the prologues (and epilogues) is optimal, is
> not a computationally feasible problem.  Instead, for each basic block,
> we estimate the cost of putting a prologue right before the block, and
> if that is cheaper than the total cost of putting prologues optimally
> (according to the estimated cost) in the dominator subtrees strictly
> dominated by this first block, place it at the first block instead.
> This simple procedure places the components optimally for any dominator
> sub tree where the root node's cost does not depend on anything outside
> its subtree.
>
> The cost is the execution frequency of all edges into the block coming
> from blocks that do not have this component active.  The estimated cost
> is the execution frequency of the block, minus the execution frequency
> of any backedges (which by definition are coming from subtrees, so if
> the "head" block gets a prologue, the source block of any backedge has
> that component active as well).
>
> Currently, the epilogues are placed as late as possible, given the
> constraints.  This does not matter for execution cost, but we could
> save a little bit of code size by placing the epilogues in a smarter
> way.  This is a possible future optimisation.
>
> Now all that is left is inserting prologues and epilogues on all edges
> that jump into resp. out of the "active" set of blocks.  Often we need
> to insert some components' prologues (or epilogues) on all edges into
> (or out of) a block.  In theory cross-jumping can unify all such, but
> in practice that often fails; besides, that is a lot of work.  So in
> this case we insert the prologue and epilogue components at the "head"
> or "tail" of a block, instead.
>
> As a final optimisation, if a block needs a prologue and its immediate
> dominator has the block as a post-dominator, that immediate dominator
> gets the prologue as well.
>
>
> 2016-09-23  Segher Boessenkool  <segher@kernel.crashing.org>
>
> 	* function.c (thread_prologue_and_epilogue_insns): Recompute the
> 	live info.  Call try_shrink_wrapping_separate.  Compute the
> 	prologue_seq afterwards, if it has possibly changed.  Compute the
> 	split_prologue_seq and epilogue_seq later, too.
> 	* shrink-wrap.c: #include cfgbuild.h.
> 	(dump_components): New function.
> 	(struct sw): New struct.
> 	(SW): New function.
> 	(init_separate_shrink_wrap): New function.
> 	(fini_separate_shrink_wrap): New function.
> 	(place_prologue_for_one_component): New function.
> 	(spread_components): New function.
> 	(disqualify_problematic_components): New function.
> 	(emit_common_heads_for_components): New function.
> 	(emit_common_tails_for_components): New function.
> 	(insert_prologue_epilogue_for_components): New function.
> 	(try_shrink_wrapping_separate): New function.
> 	* shrink-wrap.h: Declare try_shrink_wrapping_separate.
>
> ---
>  gcc/function.c    |   9 +-
>  gcc/shrink-wrap.c | 729 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  gcc/shrink-wrap.h |   1 +
>  3 files changed, 737 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/function.c b/gcc/function.c
> index 53bad87..79e7b4f 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -5920,9 +5920,7 @@ thread_prologue_and_epilogue_insns (void)
>    edge entry_edge = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun));
>    edge orig_entry_edge = entry_edge;
>
> -  rtx_insn *split_prologue_seq = make_split_prologue_seq ();
>    rtx_insn *prologue_seq = make_prologue_seq ();
> -  rtx_insn *epilogue_seq = make_epilogue_seq ();
>
>    /* Try to perform a kind of shrink-wrapping, making sure the
>       prologue/epilogue is emitted only around those parts of the
> @@ -5930,6 +5928,13 @@ thread_prologue_and_epilogue_insns (void)
>
>    try_shrink_wrapping (&entry_edge, prologue_seq);
>
> +  try_shrink_wrapping_separate (entry_edge->dest);
> +
> +  if (crtl->shrink_wrapped_separate)
> +    prologue_seq = make_prologue_seq ();
Note this implies that make_prologue_seq (and consequently the target 
specific bits for prologue generation) can be safely called more than 
once.  That's probably OK, but might be worth a comment -- your decision 
whether or not to add the comment.


> diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
> index b85b1c3..0dced22 100644
> --- a/gcc/shrink-wrap.c
> +++ b/gcc/shrink-wrap.c

> +/* Place the prologue for component WHICH, in the basic blocks dominated
> +   by HEAD.  Do a DFS over the dominator tree, and set bit WHICH in the
> +   HAS_COMPONENTS of a block if either the block has that bit set in
> +   NEEDS_COMPONENTS, or it is cheaper to place the prologue here than in all
> +   dominator subtrees separately.  */
> +static void
> +place_prologue_for_one_component (unsigned int which, basic_block head)
> +{
> +  /* The block we are currently dealing with.  */
> +  basic_block bb = head;
> +  /* Is this the first time we visit this block, i.e. have we just gone
> +     down the tree.  */
> +  bool first_visit = true;
> +
> +  /* Walk the dominator tree, visit one block per iteration of this loop.
> +     Each basic block is visited twice: once before visiting any children
> +     of the block, and once after visiting all of them (leaf nodes are
> +     visited only once).  As an optimization, we do not visit subtrees
> +     that can no longer influence the prologue placement.  */
> +  for (;;)
Is there some reason you wrote this as a loop rather than recursion? 
IMHO it makes this function (and spread_components) more difficult to 
reason about than it needs to be.



> +
> +/* Mark HAS_COMPONENTS for every block dominated by at least one block with
> +   PRO_COMPONENTS set for the respective components, starting at HEAD.  */
> +static void
> +spread_components (basic_block head)
I don't see any references to PRO_COMPONENTS in the actual code.  If I'm 
reading the code correctly, you're just accumulating/propagating 
HAS_COMPONENTS from parents to children via a DFS walk of the dominator 
tree, right?





> +
> +/* Place code for prologues and epilogues for COMPONENTS where we can put
> +   that code at the end of basic blocks.  */
> +static void
> +emit_common_tails_for_components (sbitmap components)
[ Snip. ]
> +
> +      /* Put the code at the end of the BB, but before any final jump.  */
> +      if (!bitmap_empty_p (epi))
So what if the final jump uses hard registers in one way or another?   I 
don't immediately see anything that verifies it is safe to transpose the 
epilogue and the final jump.

Conceptually want the epilogue to occur on the outgoing edge(s).  But 
you want to actually transpose the epilogue and the control flow insn so 
that you can insert the epilogue in one place.

But naive transposition isn't safe.  The control flow insn may use one 
or more registers that you were restoring or you may be on a cc0 target. 
   I think you need to handle the former more cleanly.  The latter I'd 
be comfortable filtering out in try_shrink_wrapping_separate.

This has similarities to some of the problems we've had in the past with 
rtl-gcse insertion as well as output reloads on insns with control flow.

With transposition issue addressed, the only blocker I see are some 
simple testcases we can add to the suite.  They don't have to be real 
extensive.  And one motivating example for the list archives, ideally 
the glibc malloc case.

Jeff

next prev parent reply	other threads:[~2016-09-27 21:15 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-23  8:22 [PATCH v3 0/5] Separate shrink-wrapping Segher Boessenkool
2016-09-23  8:22 ` [PATCH 1/5] separate shrink-wrap: New command-line flag, status flag, hooks, and doc Segher Boessenkool
2016-09-26 17:02   ` Jeff Law
2016-09-23  8:22 ` [PATCH 2/5] dce: Don't dead-code delete separately wrapped restores Segher Boessenkool
2016-09-26 16:55   ` Jeff Law
2016-09-23  8:23 ` [PATCH 3/5] regrename: Don't rename restores Segher Boessenkool
2016-09-26 16:44   ` Jeff Law
2016-09-23  8:33 ` [PATCH 4/5] shrink-wrap: Shrink-wrapping for separate components Segher Boessenkool
2016-09-27 21:25   ` Jeff Law [this message]
2016-09-28  9:26     ` Segher Boessenkool
2016-09-28 16:36       ` Jeff Law
2016-09-30 10:14     ` Segher Boessenkool
     [not found]     ` <20160930102908.GB14933@gate.crashing.org>
2016-09-30 10:52       ` Segher Boessenkool
2016-10-10 21:21         ` Jeff Law
2016-10-10 22:23           ` Segher Boessenkool
2016-09-23  8:44 ` [PATCH 5/5] rs6000: Separate shrink-wrapping Segher Boessenkool
2016-09-26 16:39   ` Jeff Law
2016-09-26 18:16   ` David Edelsohn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cf28e01b-eb41-acfd-8d4b-7cc724006054@redhat.com \
    --to=law@redhat.com \
    --cc=dje.gcc@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).