From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 53993 invoked by alias); 23 May 2018 09:40:24 -0000 Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-owner@gcc.gnu.org Received: (qmail 53978 invoked by uid 89); 23 May 2018 09:40:23 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.9 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy= X-HELO: mail-io0-f182.google.com Received: from mail-io0-f182.google.com (HELO mail-io0-f182.google.com) (209.85.223.182) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 23 May 2018 09:40:21 +0000 Received: by mail-io0-f182.google.com with SMTP id r9-v6so21998721iod.6 for ; Wed, 23 May 2018 02:40:21 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=kwFVvGik4WLU6s+qdi/e0Lv/m/I2vWldEyyFeSg1fQE=; b=j/lWt73+7M3zj3HN9ZQIDW/fLtfrxdy6Ndp6Tm++97XnI+wEBahvNrSbEyloayDnre JvYsIV6PIwaCncuLAT5z8DG1a8ZQVRhv/UoE0aLDdE35F7qH/nsnPWM+ayKXS6eSkElD 4OKbV4j0ZwO8W5C2wHpsBwFINeY1aOwrpayHtbKO2X+qEL0vyjKIx6W1wN6TGU1ZinwC QEZdle2OLNWVTik0/yNDjjKknhFkwGAgtxLTpbLMLkDr6Qv3vUH5E2UlqCbX0mxopPP1 4eGSM/f4AuVXpJLN9MIoOrO99FTiIkzJwCWzIv29lmY3FFfkheioA9JbHpvQwYh7d4TN CS8A== X-Gm-Message-State: ALKqPwdJ/y2CUp2CCvSeCselOY0QNg2iiDc7IHz+Gf7xnuxyKl6W9ULc PjJ3GwBtrutrmjua5cNAs8MowhgqH5iCAMGhlAM= X-Google-Smtp-Source: ADUXVKK12Dxs57fLU5WjuZWPO24vzNxPqIK5cN+sWmBX7uVd7Z93hQT9bghULBUdJvinDjSbgxz1W3VHQ4WfTwwEMDA= X-Received: by 2002:a6b:d208:: with SMTP id q8-v6mr251740iob.248.1527068419834; Wed, 23 May 2018 02:40:19 -0700 (PDT) MIME-Version: 1.0 Received: by 2002:a02:b40e:0:0:0:0:0 with HTTP; Wed, 23 May 2018 02:40:19 -0700 (PDT) In-Reply-To: References: From: "Bin.Cheng" Date: Wed, 23 May 2018 09:40:00 -0000 Message-ID: Subject: Re: PR80155: Code hoisting and register pressure To: Richard Biener Cc: Prathamesh Kulkarni , GCC Development , Bin Cheng , Thomas Preudhomme Content-Type: text/plain; charset="UTF-8" X-IsSubscribed: yes X-SW-Source: 2018-05/txt/msg00202.txt.bz2 On Wed, May 23, 2018 at 9:28 AM, Richard Biener wrote: > On Wed, 23 May 2018, Prathamesh Kulkarni wrote: > >> Hi, >> I am trying to work on PR80155, which exposes a problem with code >> hoisting and register pressure on a leading embedded benchmark for ARM >> cortex-m7, where code-hoisting causes an extra register spill. >> >> I have attached two test-cases which (hopefully) are representative of >> the original test-case. >> The first one (trans_dfa.c) is bigger and somewhat similar to the >> original test-case and trans_dfa_2.c is hand-reduced version of >> trans_dfa.c. There's 2 spills caused with trans_dfa.c >> and one spill with trans_dfa_2.c due to lesser amount of cases. >> The test-cases in the PR are probably not relevant. >> >> Initially I thought the spill was happening because of "too many >> hoistings" taking place in original test-case thus increasing the >> register pressure, but it seems the spill is possibly caused because >> expression gets hoisted out of a block that is on loop exit. >> >> For example, the following hoistings take place with trans_dfa_2.c: >> >> (1) Inserting expression in block 4 for code hoisting: >> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) >> >> (2) Inserting expression in block 4 for code hoisting: {plus_expr,_4,1} (0006) >> >> (3) Inserting expression in block 4 for code hoisting: >> {pointer_plus_expr,s_33,1} (0023) >> >> (4) Inserting expression in block 3 for code hoisting: >> {pointer_plus_expr,s_33,1} (0023) >> >> The issue seems to be hoisting of (*tab + 1) which consists of first >> two hoistings in block 4 >> from blocks 5 and 9, which causes the extra spill. I verified that by >> disabling hoisting into block 4, >> which resulted in no extra spills. >> >> I wonder if that's because the expression (*tab + 1) is getting >> hoisted from blocks 5 and 9, >> which are on loop exit ? So the expression that was previously >> computed in a block on loop exit, gets hoisted outside that block >> which possibly makes the allocator more defensive ? Similarly >> disabling hoisting of expressions which appeared in blocks on loop >> exit in original test-case prevented the extra spill. The other >> hoistings didn't seem to matter. > > I think that's simply co-incidence. The only thing that makes > a block that also exits from the loop special is that an > expression could be sunk out of the loop and hoisting (commoning > with another path) could prevent that. But that isn't what is > happening here and it would be a pass ordering issue as > the sinking pass runs only after hoisting (no idea why exactly > but I guess there are cases where we want to prefer CSE over > sinking). So you could try if re-ordering PRE and sinking helps > your testcase. > > What I do see is a missed opportunity to merge the successors > of BB 4. After PRE we have > > [local count: 159303558]: > : > pretmp_123 = *tab_37(D); > _87 = pretmp_123 + 1; > if (c_36 == 65) > goto ; [34.00%] > else > goto ; [66.00%] > > [local count: 54163210]: > *tab_37(D) = _87; > _96 = MEM[(char *)s_57 + 1B]; > if (_96 != 0) > goto ; [89.00%] > else > goto ; [11.00%] > > [local count: 105140348]: > *tab_37(D) = _87; > _56 = MEM[(char *)s_57 + 1B]; > if (_56 != 0) > goto ; [89.00%] > else > goto ; [11.00%] > > here at least the stores and loads can be hoisted. Note this > may also point at the real issue of the code hoisting which is > tearing apart the RMW operation? > >> If that's the case, would it make sense to add another constraint to >> hoisting to not hoist expression if it appears in a block that is on >> loop exit (exception is if block has only single successor), at-least >> for targets like cortex-m7 where the effect of spill is more >> pronounced ? >> >> I tried to write an untested prototype patch (approach-8.diff) on >> these lines, to refuse hoisting if an expression appears in block on >> loop exit. The patch adds a new map pre_expr_block_d, that maps >> pre_expr to the set of blocks (as a bitmap) it appears in and are on >> loop exit, which is computed during compute_avail. >> During do_hoist_insertion, it simply checks if the bitmap of blocks is >> not empty. >> It works for the attached test-cases and passes ssa-pre-*.c and >> ssa-hoist-*.c tests. > > As said to me the heuristic doesn't make much sense. There is > btw loop_exits_from_bb_p (). > > If the issue in the end is a RA one (that is, there _is_ a better > allocation possible?) then you may also want to look at out-of-SSA > coalescing. > > So overall I'm not convinced the hoisting decision is wrong. > It doesn't increase register pressure at all. It does expose Not quite. There are two hoisting in case of trans_dfa_2.c For the first one: Inserting expression in block 4 for code hoisting: {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005) Inserted pretmp_30 = *tab_20(D); in predecessor 4 (0005) Inserting expression in block 4 for code hoisting: {plus_expr,_4,1} (0006) Inserted _12 = pretmp_30 + 1; in predecessor 4 (0006) I agree hoisting separates RM from W, but for the second one: Inserting expression in block 4 for code hoisting: {pointer_plus_expr,s_33,1} (0023) Inserted _14 = s_33 + 1; in predecessor 4 (0023) Inserting expression in block 3 for code hoisting: {pointer_plus_expr,s_33,1} (0023) Inserted _62 = s_33 + 1; in predecessor 3 (0023) It does increase register pressure because _62(originally s_34 and s_27) is extended and s_33 is still alive. Thanks, bin > a missed store hoisting and load CSE opportunity (but we don't > have a way to "PRE" or hoist stores at the moment). Stores > do not fit data flow problems well given they need to be kept > in order with respect to other stores and loads and appearantly > *tab aliases *s (yeah, s is char *... make tab restrict and I > guess things will work much smoother). > > Richard. > >> Alternatively, we could restrict replacing expression by it's leader >> in eliminate_dom_walker::before_dom_children if the expression appears >> in a block on loop exit. >> In principle this is more general than hoisting, but I have restricted >> it to only hoisted expressions to avoid being overly conservative. >> Similarly, this constrained could be made conditional, only for >> targets like cortex-m7. I have attached a prototype patch based on >> this approach (approach-9.diff). Although it works for attached >> test-cases, unfortunately it ICE's with the original test-case in >> tail-merge, I am working on that. >> >> I am not sure though if either of these approaches are in the correct >> direction and would >> be grateful for suggestions on the issue! >> >> Thanks, >> Prathamesh >> > > -- > Richard Biener > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)