From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-437139-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 87576 invoked by alias); 28 Sep 2016 09:04:13 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 86477 invoked by uid 89); 28 Sep 2016 09:04:13 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-4.7 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=caps, fun, thousand, visit
X-HELO: gate.crashing.org
Received: from gate.crashing.org (HELO gate.crashing.org) (63.228.1.57) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 28 Sep 2016 09:04:11 +0000
Received: from gate.crashing.org (localhost.localdomain [127.0.0.1])	by gate.crashing.org (8.14.1/8.13.8) with ESMTP id u8S948rg030292;	Wed, 28 Sep 2016 04:04:08 -0500
Received: (from segher@localhost)	by gate.crashing.org (8.14.1/8.14.1/Submit) id u8S947mO030291;	Wed, 28 Sep 2016 04:04:07 -0500
Date: Wed, 28 Sep 2016 09:26:00 -0000
From: Segher Boessenkool <segher@kernel.crashing.org>
To: Jeff Law <law@redhat.com>
Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com
Subject: Re: [PATCH 4/5] shrink-wrap: Shrink-wrapping for separate components
Message-ID: <20160928090407.GH30477@gate.crashing.org>
References: <cover.1474616087.git.segher@kernel.crashing.org> <cd8a358e576756d1611ca2d7862d4c718ded8bf5.1474616087.git.segher@kernel.crashing.org> <cf28e01b-eb41-acfd-8d4b-7cc724006054@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <cf28e01b-eb41-acfd-8d4b-7cc724006054@redhat.com>
User-Agent: Mutt/1.4.2.3i
X-IsSubscribed: yes
X-SW-Source: 2016-09/txt/msg02102.txt.bz2

On Tue, Sep 27, 2016 at 03:14:51PM -0600, Jeff Law wrote:
> On 09/23/2016 02:21 AM, Segher Boessenkool wrote:
> >--- a/gcc/function.c
> >+++ b/gcc/function.c
> >@@ -5920,9 +5920,7 @@ thread_prologue_and_epilogue_insns (void)
> >   edge entry_edge = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> >   edge orig_entry_edge = entry_edge;
> >
> >-  rtx_insn *split_prologue_seq = make_split_prologue_seq ();
> >   rtx_insn *prologue_seq = make_prologue_seq ();
> >-  rtx_insn *epilogue_seq = make_epilogue_seq ();
> >
> >   /* Try to perform a kind of shrink-wrapping, making sure the
> >      prologue/epilogue is emitted only around those parts of the
> >@@ -5930,6 +5928,13 @@ thread_prologue_and_epilogue_insns (void)
> >
> >   try_shrink_wrapping (&entry_edge, prologue_seq);
> >
> >+  try_shrink_wrapping_separate (entry_edge->dest);
> >+
> >+  if (crtl->shrink_wrapped_separate)
> >+    prologue_seq = make_prologue_seq ();
> Note this implies that make_prologue_seq (and consequently the target 
> specific bits for prologue generation) can be safely called more than 
> once.  That's probably OK, but might be worth a comment -- your decision 
> whether or not to add the comment.

It is only called twice if separately shrink-wrapping (and that did
anything), so it won't change current behaviour.

I'll add a comment.

> >+static void
> >+place_prologue_for_one_component (unsigned int which, basic_block head)
> >+{
> >+  /* The block we are currently dealing with.  */
> >+  basic_block bb = head;
> >+  /* Is this the first time we visit this block, i.e. have we just gone
> >+     down the tree.  */
> >+  bool first_visit = true;
> >+
> >+  /* Walk the dominator tree, visit one block per iteration of this loop.
> >+     Each basic block is visited twice: once before visiting any children
> >+     of the block, and once after visiting all of them (leaf nodes are
> >+     visited only once).  As an optimization, we do not visit subtrees
> >+     that can no longer influence the prologue placement.  */
> >+  for (;;)
> Is there some reason you wrote this as a loop rather than recursion? 
> IMHO it makes this function (and spread_components) more difficult to 
> reason about than it needs to be.

It would recurse a few thousand frames deep on not terribly big testcases
(i.e. I've seen it happen).  This uses a lot of stack space on some ABIs,
and is very slow as well (this is the slowest function here by far).
Unlimited recursion is bad.

> >+/* Mark HAS_COMPONENTS for every block dominated by at least one block 
> >with
> >+   PRO_COMPONENTS set for the respective components, starting at HEAD.  */
> >+static void
> >+spread_components (basic_block head)
> I don't see any references to PRO_COMPONENTS in the actual code.  If I'm 
> reading the code correctly, you're just accumulating/propagating 
> HAS_COMPONENTS from parents to children via a DFS walk of the dominator 
> tree, right?

Yeah, s/PRO_COMPONENTS/HAS_COMPONENTS/.  I have renamed things a few
time but missed this one due to the ALL VARS IN ALL CAPS style :-)

> >+/* Place code for prologues and epilogues for COMPONENTS where we can put
> >+   that code at the end of basic blocks.  */
> >+static void
> >+emit_common_tails_for_components (sbitmap components)
> [ Snip. ]
> >+
> >+      /* Put the code at the end of the BB, but before any final jump.  */
> >+      if (!bitmap_empty_p (epi))
> So what if the final jump uses hard registers in one way or another?   I 
> don't immediately see anything that verifies it is safe to transpose the 
> epilogue and the final jump.

Whoops.  Thanks for catching this.

> Conceptually want the epilogue to occur on the outgoing edge(s).  But 
> you want to actually transpose the epilogue and the control flow insn so 
> that you can insert the epilogue in one place.

The same problem happens with prologues, too.

> But naive transposition isn't safe.  The control flow insn may use one 
> or more registers that you were restoring or you may be on a cc0 target. 

A cc0 target can not use separate shrink-wrapping *anyway* if any of the
components would clobber cc0, so that is no problem here.

>   I think you need to handle the former more cleanly.  The latter I'd 
> be comfortable filtering out in try_shrink_wrapping_separate.

I'm thinking to not do the common tail optimisation if BB_END is a
JUMP_INSN but not simplejump_p (or a return or a sibling call).  Do
you see any problem with that?

> This has similarities to some of the problems we've had in the past with 
> rtl-gcse insertion as well as output reloads on insns with control flow.

Oh, I had so much fun recently with the latter :-/

> With transposition issue addressed, the only blocker I see are some 
> simple testcases we can add to the suite.  They don't have to be real 
> extensive.

Yeah, working on it.

> And one motivating example for the list archives, ideally 
> the glibc malloc case.

Okay.

Thanks for the reviews,


Segher