From: Richard Biener
Date: Fri, 25 May 2018 17:54:00 -0000
Subject: Re: PR80155: Code hoisting and register pressure
To: Jeff Law, "Bin.Cheng", Prathamesh Kulkarni
CC: GCC Development, Thomas Preudhomme
In-Reply-To: <66d75fc6-5553-212d-1aca-d92b3c6193b2@redhat.com>
References: <014f7b2a-3c64-4144-37a4-4cc7bdff3d47@redhat.com> <66d75fc6-5553-212d-1aca-d92b3c6193b2@redhat.com>
Message-ID: <3D61699E-73EC-4936-9FF3-494DF6816616@suse.de>

On May 25, 2018 6:57:13 PM GMT+02:00, Jeff Law wrote:
>On 05/25/2018 03:49 AM, Bin.Cheng wrote:
>> On Fri, May 25, 2018 at 10:23 AM, Prathamesh Kulkarni wrote:
>>> On 23 May 2018 at 18:37, Jeff Law wrote:
>>>> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote:
>>>>> On 23 May 2018 at 13:58, Richard Biener wrote:
>>>>>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> I am trying to work on PR80155, which exposes a problem with code
>>>>>>> hoisting and register pressure on a leading embedded benchmark for
>>>>>>> ARM cortex-m7, where code hoisting causes an extra register spill.
>>>>>>>
>>>>>>> I have attached two test-cases which (hopefully) are representative
>>>>>>> of the original test-case.
>>>>>>> The first one (trans_dfa.c) is bigger and somewhat similar to the
>>>>>>> original test-case, and trans_dfa_2.c is a hand-reduced version of
>>>>>>> trans_dfa.c. There are two spills with trans_dfa.c and one spill
>>>>>>> with trans_dfa_2.c, which has fewer cases.
>>>>>>> The test-cases in the PR are probably not relevant.
>>>>>>>
>>>>>>> Initially I thought the spill was happening because of "too many
>>>>>>> hoistings" taking place in the original test-case, thus increasing
>>>>>>> the register pressure, but it seems the spill is possibly caused
>>>>>>> because the expression gets hoisted out of a block that is on a
>>>>>>> loop exit.
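
The test-case attachments are not part of the archived message. As a rough
stand-in for the shape described above -- not the actual trans_dfa_2.c; the
function name, the characters and the transition structure below are
invented -- the pattern is a scanning loop in which the same increment of
*tab is computed on paths that also leave the loop:

/* Invented approximation of the kind of test-case discussed above
   (not the real trans_dfa.c/trans_dfa_2.c).  The same expression
   *tab + 1 appears in several blocks that also exit the scanning
   loop, which is what makes it a candidate for code hoisting.  */
int
scan (const char *s, int *tab)
{
  int state = 0;
  while (*s)
    {
      int c = *s++;
      switch (state)
        {
        case 0:
          if (c == 'A')
            {
              *tab = *tab + 1;      /* hoisting candidate ...        */
              return state;         /* ... on a loop-exit path       */
            }
          state = 1;
          break;
        default:
          if (c == 'B')
            {
              *tab = *tab + 1;      /* same expression, another exit */
              return state;
            }
          state = 0;
          break;
        }
    }
  return state;
}
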
>>>>>>>
>>>>>>> For example, the following hoistings take place with trans_dfa_2.c:
>>>>>>>
>>>>>>> (1) Inserting expression in block 4 for code hoisting:
>>>>>>> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005)
>>>>>>>
>>>>>>> (2) Inserting expression in block 4 for code hoisting:
>>>>>>> {plus_expr,_4,1} (0006)
>>>>>>>
>>>>>>> (3) Inserting expression in block 4 for code hoisting:
>>>>>>> {pointer_plus_expr,s_33,1} (0023)
>>>>>>>
>>>>>>> (4) Inserting expression in block 3 for code hoisting:
>>>>>>> {pointer_plus_expr,s_33,1} (0023)
>>>>>>>
>>>>>>> The issue seems to be the hoisting of (*tab + 1), i.e. the first
>>>>>>> two hoistings above, into block 4 from blocks 5 and 9, which
>>>>>>> causes the extra spill. I verified that by disabling hoisting
>>>>>>> into block 4, which resulted in no extra spills.
>>>>>>>
>>>>>>> I wonder if that's because the expression (*tab + 1) is getting
>>>>>>> hoisted from blocks 5 and 9, which are on loop exit? So the
>>>>>>> expression that was previously computed in a block on loop exit
>>>>>>> gets hoisted outside that block, which possibly makes the
>>>>>>> allocator more defensive? Similarly, disabling hoisting of
>>>>>>> expressions which appeared in blocks on loop exit in the original
>>>>>>> test-case prevented the extra spill. The other hoistings didn't
>>>>>>> seem to matter.
>>>>>>
>>>>>> I think that's simply coincidence. The only thing that makes
>>>>>> a block that also exits from the loop special is that an
>>>>>> expression could be sunk out of the loop and hoisting (commoning
>>>>>> with another path) could prevent that. But that isn't what is
>>>>>> happening here, and it would be a pass ordering issue as
>>>>>> the sinking pass runs only after hoisting (no idea why exactly,
>>>>>> but I guess there are cases where we want to prefer CSE over
>>>>>> sinking). So you could try whether re-ordering PRE and sinking
>>>>>> helps your testcase.
>>>>> Thanks for the suggestions. Placing the sink pass before PRE works
>>>>> for both these test-cases! Sadly it still causes the spill for the
>>>>> benchmark -:(
>>>>> I will try to create a better approximation of the original
>>>>> test-case.
>>>>>>
>>>>>> What I do see is a missed opportunity to merge the successors
>>>>>> of BB 4. After PRE we have
>>>>>>
>>>>>>   [local count: 159303558]:
>>>>>>   pretmp_123 = *tab_37(D);
>>>>>>   _87 = pretmp_123 + 1;
>>>>>>   if (c_36 == 65)
>>>>>>     goto ; [34.00%]
>>>>>>   else
>>>>>>     goto ; [66.00%]
>>>>>>
>>>>>>   [local count: 54163210]:
>>>>>>   *tab_37(D) = _87;
>>>>>>   _96 = MEM[(char *)s_57 + 1B];
>>>>>>   if (_96 != 0)
>>>>>>     goto ; [89.00%]
>>>>>>   else
>>>>>>     goto ; [11.00%]
>>>>>>
>>>>>>   [local count: 105140348]:
>>>>>>   *tab_37(D) = _87;
>>>>>>   _56 = MEM[(char *)s_57 + 1B];
>>>>>>   if (_56 != 0)
>>>>>>     goto ; [89.00%]
>>>>>>   else
>>>>>>     goto ; [11.00%]
>>>>>>
>>>>>> here at least the stores and loads can be hoisted. Note this
>>>>>> may also point at the real issue with the code hoisting, which is
>>>>>> tearing apart the RMW operation?
>>>>> Indeed, this possibility seems much more likely than the block
>>>>> being on loop exit.
>>>>> I will try to "hardcode" the load/store hoists into block 4 for
>>>>> this specific test-case to check if that prevents the spill.
>>>> Even if it prevents the spill in this case, it's likely a good
>>>> thing to do. The statements prior to the conditional in bb5 and bb8
>>>> should be hoisted, leaving bb5 and bb8 with just their conditionals.
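
A hand-written C approximation of the dump above (not compiler output;
the function names and parameters are invented) shows what "tearing
apart the RMW operation" means for register pressure: before hoisting,
each branch performs the whole read-modify-write itself; after hoisting
the load and the add, only the stores remain in the branches and the
incremented value (_87 in the dump) is live across the conditional.

/* Before hoisting: each successor block does its own load+add+store,
   so nothing related to *tab lives across the branch.  */
void
before (int *tab, int c, const char *s, int *out)
{
  if (c == 65)
    {
      *tab = *tab + 1;
      *out = s[1];
    }
  else
    {
      *tab = *tab + 1;
      *out = s[1];
    }
}

/* After hoisting the load and the add into the common predecessor
   (block 4 in the dump): only the stores are left in the branches,
   and tmp (_87 in the dump) must stay live across the conditional.  */
void
after (int *tab, int c, const char *s, int *out)
{
  int tmp = *tab + 1;
  if (c == 65)
    {
      *tab = tmp;
      *out = s[1];
    }
  else
    {
      *tab = tmp;
      *out = s[1];
    }
}

Whether the second form is cheaper depends on what else is live across
the branch -- which is exactly the inter-block register-pressure
information the thread suggests the GIMPLE passes currently lack.
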
>>> Hi,
>>> It seems disabling forwprop somehow results in no extra spills
>>> on the original test-case.
>>>
>>> For instance,
>>> Hoisting without forwprop:
>>>
>>> bb 3:
>>> _1 = tab_1(D) + 8
>>> pretmp_268 = MEM[tab_1(D) + 8B];
>>> _2 = pretmp_268 + 1;
>>> goto or
>>>
>>> bb 4:
>>> *_1 = _2
>>>
>>> bb 5:
>>> *_1 = _2
>>>
>>> Hoisting with forwprop:
>>>
>>> bb 3:
>>> pretmp_164 = MEM[tab_1(D) + 8B];
>>> _2 = pretmp_164 + 1
>>> goto or
>>>
>>> bb 4:
>>> MEM[tab_1(D) + 8] = _2;
>>>
>>> bb 5:
>>> MEM[tab_1(D) + 8] = _2;
>>>
>>> Although in both cases we aren't hoisting stores, the issue with
>>> forwprop in this case seems to be the folding of
>>> *_1 = _2
>>> into
>>> MEM[tab_1(D) + 8] = _2 ?
>>
>> This isn't an issue, right? IIUC, tab_1(D) is used all over the loop,
>> thus propagating _1 using (tab_1(D) + 8) actually removes one live
>> range.
>>
>>>
>>> Disabling folding to mem_ref[base + offset] in forwprop "works" in
>>> the sense that it creates the same set of hoistings as without
>>> forwprop, however it still results in additional spills (albeit in
>>> different registers).
>>>
>>> That's because forwprop seems to be increasing the live range of
>>> prephitmp_217 by substituting
>>> _221 + 1 with prephitmp_217 + 2 (_221 is defined as prephitmp_217 + 1).
>> Hmm, it's hard to discuss private benchmarks; I am not sure in which
>> dump I should find the prephitmp_221/prephitmp_217 stuff.
>>
>>> On the other hand, Bin pointed out to me in private that forwprop
>>> also helps to restrict register pressure by propagating
>>> "tab + const_int" for the same test-case.
>>>
>>> So I am not really sure if there's an easier fix than having
>>> heuristics for estimating register pressure at TREE level? I would be
>> Easy fix, maybe not. OTOH, I am more convinced that passes like
>> forwprop/sink/hoisting can be improved by taking live ranges into
>> consideration. Specifically, to direct such passes when moving code
>> around different basic blocks, because inter-block register pressure
>> is hard to resolve afterwards.
>>
>> As suggested by Jeff and Richi, I guess the first step would be doing
>> experiments, collecting more benchmark data for reordering sink before
>> pre? It enables code sinking as well as decreasing register pressure
>> in the original reduced cases IIRC.
>We might even consider re-evaluating Bernd's work on what is effectively
>a gimple scheduler to minimize register pressure.

Sure. The main issue here I see is the interaction with TER, which we
unfortunately still rely on. Enough GIMPLE instruction selection might
help to get rid of the remaining pieces...

>Or we could look to extend your work into a generalized pressure
>reducing pass that we could run near the gimple/rtl border.
>
>The final possibility would be Click's algorithm from '95 adjusted to
>just do pressure reduction.
>
>jeff
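
As a closing illustration, a hand-written C approximation of the
forwprop difference discussed above (assuming tab points to 4-byte int,
so the 8-byte offset in the dumps corresponds to tab + 2; the function
names and the cond parameter are invented):

/* Without the forwprop folding the address is materialized once, so
   only the pointer p (_1 = tab_1(D) + 8 in the dump) needs to stay
   live at the two stores.  */
void
hoisted_without_forwprop (int *tab, int cond)
{
  int *p = tab + 2;            /* _1 = tab_1(D) + 8        */
  int v = tab[2] + 1;          /* pretmp_268 + 1           */
  if (cond)
    *p = v;                    /* *_1 = _2 in bb 4         */
  else
    *p = v;                    /* *_1 = _2 in bb 5         */
}

/* With the folding, each store becomes MEM[tab_1(D) + 8] = _2, i.e. the
   offset is folded back into the store and tab itself has to be live
   there instead of p -- which, as Bin notes, is not worse here since
   tab is live throughout the loop anyway.  */
void
hoisted_with_forwprop (int *tab, int cond)
{
  int v = tab[2] + 1;          /* _2 = pretmp_164 + 1      */
  if (cond)
    tab[2] = v;                /* MEM[tab_1(D) + 8] = _2   */
  else
    tab[2] = v;
}
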