From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 32489 invoked by alias); 28 Sep 2016 16:33:20 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 32470 invoked by uid 89); 28 Sep 2016 16:33:19 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-4.9 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=thousand, life, visit X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 28 Sep 2016 16:33:18 +0000 Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 0A4DE65727; Wed, 28 Sep 2016 16:33:17 +0000 (UTC) Received: from localhost.localdomain (ovpn-116-142.phx2.redhat.com [10.3.116.142]) by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u8SGXGOt012237; Wed, 28 Sep 2016 12:33:16 -0400 Subject: Re: [PATCH 4/5] shrink-wrap: Shrink-wrapping for separate components To: Segher Boessenkool References: <20160928090407.GH30477@gate.crashing.org> Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com From: Jeff Law Message-ID: <2effb66f-132a-be2d-dcd0-038709eea201@redhat.com> Date: Wed, 28 Sep 2016 16:36:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0 MIME-Version: 1.0 In-Reply-To: <20160928090407.GH30477@gate.crashing.org> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2016-09/txt/msg02164.txt.bz2 On 09/28/2016 03:04 AM, Segher Boessenkool wrote: > >>> +static void >>> +place_prologue_for_one_component (unsigned int which, basic_block head) >>> +{ >>> + /* The block we are currently dealing with. */ >>> + basic_block bb = head; >>> + /* Is this the first time we visit this block, i.e. have we just gone >>> + down the tree. */ >>> + bool first_visit = true; >>> + >>> + /* Walk the dominator tree, visit one block per iteration of this loop. >>> + Each basic block is visited twice: once before visiting any children >>> + of the block, and once after visiting all of them (leaf nodes are >>> + visited only once). As an optimization, we do not visit subtrees >>> + that can no longer influence the prologue placement. */ >>> + for (;;) >> Is there some reason you wrote this as a loop rather than recursion? >> IMHO it makes this function (and spread_components) more difficult to >> reason about than it needs to be. > > It would recurse a few thousand frames deep on not terribly big testcases > (i.e. I've seen it happen). This uses a lot of stack space on some ABIs, > and is very slow as well (this is the slowest function here by far). > Unlimited recursion is bad. I'm surprised the recursion was that deep. Such is life. Thanks for clarifying. I won't object to the iterative version. :-) >>> +/* Place code for prologues and epilogues for COMPONENTS where we can put >>> + that code at the end of basic blocks. */ >>> +static void >>> +emit_common_tails_for_components (sbitmap components) >> [ Snip. ] >>> + >>> + /* Put the code at the end of the BB, but before any final jump. */ >>> + if (!bitmap_empty_p (epi)) >> So what if the final jump uses hard registers in one way or another? I >> don't immediately see anything that verifies it is safe to transpose the >> epilogue and the final jump. > > Whoops. Thanks for catching this. I missed it the first time though the code too. > >> Conceptually want the epilogue to occur on the outgoing edge(s). But >> you want to actually transpose the epilogue and the control flow insn so >> that you can insert the epilogue in one place. > > The same problem happens with prologues, too. Yea. I guess if a had a jump with an embedded side effect (such as movb or addb on the PA), then transposing the control flow insn with the prologue would be problematical as well. > > A cc0 target can not use separate shrink-wrapping *anyway* if any of the > components would clobber cc0, so that is no problem here. True, but I'd be more comfortable if we filtered out cc0 targets explicitly. > >> I think you need to handle the former more cleanly. The latter I'd >> be comfortable filtering out in try_shrink_wrapping_separate. > > I'm thinking to not do the common tail optimisation if BB_END is a > JUMP_INSN but not simplejump_p (or a return or a sibling call). Do > you see any problem with that? Seems reasonable. Jeff