From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 87576 invoked by alias); 28 Sep 2016 09:04:13 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 86477 invoked by uid 89); 28 Sep 2016 09:04:13 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-4.7 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=caps, fun, thousand, visit X-HELO: gate.crashing.org Received: from gate.crashing.org (HELO gate.crashing.org) (63.228.1.57) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 28 Sep 2016 09:04:11 +0000 Received: from gate.crashing.org (localhost.localdomain [127.0.0.1]) by gate.crashing.org (8.14.1/8.13.8) with ESMTP id u8S948rg030292; Wed, 28 Sep 2016 04:04:08 -0500 Received: (from segher@localhost) by gate.crashing.org (8.14.1/8.14.1/Submit) id u8S947mO030291; Wed, 28 Sep 2016 04:04:07 -0500 Date: Wed, 28 Sep 2016 09:26:00 -0000 From: Segher Boessenkool To: Jeff Law Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com Subject: Re: [PATCH 4/5] shrink-wrap: Shrink-wrapping for separate components Message-ID: <20160928090407.GH30477@gate.crashing.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-IsSubscribed: yes X-SW-Source: 2016-09/txt/msg02102.txt.bz2 On Tue, Sep 27, 2016 at 03:14:51PM -0600, Jeff Law wrote: > On 09/23/2016 02:21 AM, Segher Boessenkool wrote: > >--- a/gcc/function.c > >+++ b/gcc/function.c > >@@ -5920,9 +5920,7 @@ thread_prologue_and_epilogue_insns (void) > > edge entry_edge = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)); > > edge orig_entry_edge = entry_edge; > > > >- rtx_insn *split_prologue_seq = make_split_prologue_seq (); > > rtx_insn *prologue_seq = make_prologue_seq (); > >- rtx_insn *epilogue_seq = make_epilogue_seq (); > > > > /* Try to perform a kind of shrink-wrapping, making sure the > > prologue/epilogue is emitted only around those parts of the > >@@ -5930,6 +5928,13 @@ thread_prologue_and_epilogue_insns (void) > > > > try_shrink_wrapping (&entry_edge, prologue_seq); > > > >+ try_shrink_wrapping_separate (entry_edge->dest); > >+ > >+ if (crtl->shrink_wrapped_separate) > >+ prologue_seq = make_prologue_seq (); > Note this implies that make_prologue_seq (and consequently the target > specific bits for prologue generation) can be safely called more than > once. That's probably OK, but might be worth a comment -- your decision > whether or not to add the comment. It is only called twice if separately shrink-wrapping (and that did anything), so it won't change current behaviour. I'll add a comment. > >+static void > >+place_prologue_for_one_component (unsigned int which, basic_block head) > >+{ > >+ /* The block we are currently dealing with. */ > >+ basic_block bb = head; > >+ /* Is this the first time we visit this block, i.e. have we just gone > >+ down the tree. */ > >+ bool first_visit = true; > >+ > >+ /* Walk the dominator tree, visit one block per iteration of this loop. > >+ Each basic block is visited twice: once before visiting any children > >+ of the block, and once after visiting all of them (leaf nodes are > >+ visited only once). As an optimization, we do not visit subtrees > >+ that can no longer influence the prologue placement. */ > >+ for (;;) > Is there some reason you wrote this as a loop rather than recursion? > IMHO it makes this function (and spread_components) more difficult to > reason about than it needs to be. It would recurse a few thousand frames deep on not terribly big testcases (i.e. I've seen it happen). This uses a lot of stack space on some ABIs, and is very slow as well (this is the slowest function here by far). Unlimited recursion is bad. > >+/* Mark HAS_COMPONENTS for every block dominated by at least one block > >with > >+ PRO_COMPONENTS set for the respective components, starting at HEAD. */ > >+static void > >+spread_components (basic_block head) > I don't see any references to PRO_COMPONENTS in the actual code. If I'm > reading the code correctly, you're just accumulating/propagating > HAS_COMPONENTS from parents to children via a DFS walk of the dominator > tree, right? Yeah, s/PRO_COMPONENTS/HAS_COMPONENTS/. I have renamed things a few time but missed this one due to the ALL VARS IN ALL CAPS style :-) > >+/* Place code for prologues and epilogues for COMPONENTS where we can put > >+ that code at the end of basic blocks. */ > >+static void > >+emit_common_tails_for_components (sbitmap components) > [ Snip. ] > >+ > >+ /* Put the code at the end of the BB, but before any final jump. */ > >+ if (!bitmap_empty_p (epi)) > So what if the final jump uses hard registers in one way or another? I > don't immediately see anything that verifies it is safe to transpose the > epilogue and the final jump. Whoops. Thanks for catching this. > Conceptually want the epilogue to occur on the outgoing edge(s). But > you want to actually transpose the epilogue and the control flow insn so > that you can insert the epilogue in one place. The same problem happens with prologues, too. > But naive transposition isn't safe. The control flow insn may use one > or more registers that you were restoring or you may be on a cc0 target. A cc0 target can not use separate shrink-wrapping *anyway* if any of the components would clobber cc0, so that is no problem here. > I think you need to handle the former more cleanly. The latter I'd > be comfortable filtering out in try_shrink_wrapping_separate. I'm thinking to not do the common tail optimisation if BB_END is a JUMP_INSN but not simplejump_p (or a return or a sibling call). Do you see any problem with that? > This has similarities to some of the problems we've had in the past with > rtl-gcse insertion as well as output reloads on insns with control flow. Oh, I had so much fun recently with the latter :-/ > With transposition issue addressed, the only blocker I see are some > simple testcases we can add to the suite. They don't have to be real > extensive. Yeah, working on it. > And one motivating example for the list archives, ideally > the glibc malloc case. Okay. Thanks for the reviews, Segher