From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Subject: Re: PR80155: Code hoisting and register pressure
To: "Bin.Cheng", Prathamesh Kulkarni
Cc: Richard Biener, GCC Development, Thomas Preudhomme
References: <014f7b2a-3c64-4144-37a4-4cc7bdff3d47@redhat.com>
From: Jeff Law
Message-ID: <66d75fc6-5553-212d-1aca-d92b3c6193b2@redhat.com>
Date: Fri, 25 May 2018 16:57:00 -0000

On 05/25/2018 03:49 AM, Bin.Cheng wrote:
> On Fri, May 25, 2018 at 10:23 AM, Prathamesh Kulkarni wrote:
>> On 23 May 2018 at 18:37, Jeff Law wrote:
>>> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote:
>>>> On 23 May 2018 at 13:58, Richard Biener wrote:
>>>>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote:
>>>>>
>>>>>> Hi,
>>>>>> I am trying to work on PR80155, which exposes a problem with code
>>>>>> hoisting and register
>>>>>> pressure on a leading embedded benchmark for ARM
>>>>>> cortex-m7, where code-hoisting causes an extra register spill.
>>>>>>
>>>>>> I have attached two test-cases which (hopefully) are representative of
>>>>>> the original test-case.
>>>>>> The first one (trans_dfa.c) is bigger and somewhat similar to the
>>>>>> original test-case and trans_dfa_2.c is hand-reduced version of
>>>>>> trans_dfa.c. There's 2 spills caused with trans_dfa.c
>>>>>> and one spill with trans_dfa_2.c due to lesser amount of cases.
>>>>>> The test-cases in the PR are probably not relevant.
>>>>>>
>>>>>> Initially I thought the spill was happening because of "too many
>>>>>> hoistings" taking place in original test-case thus increasing the
>>>>>> register pressure, but it seems the spill is possibly caused because
>>>>>> expression gets hoisted out of a block that is on loop exit.
>>>>>>
>>>>>> For example, the following hoistings take place with trans_dfa_2.c:
>>>>>>
>>>>>> (1) Inserting expression in block 4 for code hoisting:
>>>>>> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005)
>>>>>>
>>>>>> (2) Inserting expression in block 4 for code hoisting: {plus_expr,_4,1} (0006)
>>>>>>
>>>>>> (3) Inserting expression in block 4 for code hoisting:
>>>>>> {pointer_plus_expr,s_33,1} (0023)
>>>>>>
>>>>>> (4) Inserting expression in block 3 for code hoisting:
>>>>>> {pointer_plus_expr,s_33,1} (0023)
>>>>>>
>>>>>> The issue seems to be hoisting of (*tab + 1) which consists of first
>>>>>> two hoistings in block 4
>>>>>> from blocks 5 and 9, which causes the extra spill. I verified that by
>>>>>> disabling hoisting into block 4,
>>>>>> which resulted in no extra spills.
>>>>>>
>>>>>> I wonder if that's because the expression (*tab + 1) is getting
>>>>>> hoisted from blocks 5 and 9,
>>>>>> which are on loop exit ? So the expression that was previously
>>>>>> computed in a block on loop exit, gets hoisted outside that block
>>>>>> which possibly makes the allocator more defensive ?
>>>>>> Similarly disabling hoisting of expressions which appeared in blocks
>>>>>> on loop exit in original test-case prevented the extra spill. The other
>>>>>> hoistings didn't seem to matter.
>>>>>
>>>>> I think that's simply co-incidence. The only thing that makes
>>>>> a block that also exits from the loop special is that an
>>>>> expression could be sunk out of the loop and hoisting (commoning
>>>>> with another path) could prevent that. But that isn't what is
>>>>> happening here and it would be a pass ordering issue as
>>>>> the sinking pass runs only after hoisting (no idea why exactly
>>>>> but I guess there are cases where we want to prefer CSE over
>>>>> sinking). So you could try if re-ordering PRE and sinking helps
>>>>> your testcase.
>>>> Thanks for the suggestions. Placing sink pass before PRE works
>>>> for both these test-cases! Sadly it still causes the spill for the benchmark -:(
>>>> I will try to create a better approximation of the original test-case.
>>>>>
>>>>> What I do see is a missed opportunity to merge the successors
>>>>> of BB 4. After PRE we have
>>>>>
>>>>> [local count: 159303558]:
>>>>> :
>>>>> pretmp_123 = *tab_37(D);
>>>>> _87 = pretmp_123 + 1;
>>>>> if (c_36 == 65)
>>>>>   goto ; [34.00%]
>>>>> else
>>>>>   goto ; [66.00%]
>>>>>
>>>>> [local count: 54163210]:
>>>>> *tab_37(D) = _87;
>>>>> _96 = MEM[(char *)s_57 + 1B];
>>>>> if (_96 != 0)
>>>>>   goto ; [89.00%]
>>>>> else
>>>>>   goto ; [11.00%]
>>>>>
>>>>> [local count: 105140348]:
>>>>> *tab_37(D) = _87;
>>>>> _56 = MEM[(char *)s_57 + 1B];
>>>>> if (_56 != 0)
>>>>>   goto ; [89.00%]
>>>>> else
>>>>>   goto ; [11.00%]
>>>>>
>>>>> here at least the stores and loads can be hoisted. Note this
>>>>> may also point at the real issue of the code hoisting which is
>>>>> tearing apart the RMW operation?
>>>> Indeed, this possibility seems much more likely than block being on loop exit.
>>>> I will try to "hardcode" the load/store hoists into block 4 for this
>>>> specific test-case to check
>>>> if that prevents the spill.
>>> Even if it prevents the spill in this case, it's likely a good thing to
>>> do. The statements prior to the conditional in bb5 and bb8 should be
>>> hoisted, leaving bb5 and bb8 with just their conditionals.
>> Hi,
>> It seems disabling forwprop somehow works for causing no extra spills
>> on the original test-case.
>>
>> For instance,
>> Hoisting without forwprop:
>>
>> bb 3:
>> _1 = tab_1(D) + 8
>> pretmp_268 = MEM[tab_1(D) + 8B];
>> _2 = pretmp_268 + 1;
>> goto or
>>
>> bb 4:
>> *_1 = _2
>>
>> bb 5:
>> *_1 = _2
>>
>> Hoisting with forwprop:
>>
>> bb 3:
>> pretmp_164 = MEM[tab_1(D) + 8B];
>> _2 = pretmp_164 + 1
>> goto or
>>
>> bb 4:
>> MEM[tab_1(D) + 8] = _2;
>>
>> bb 5:
>> MEM[tab_1(D) + 8] = _2;
>>
>> Although in both cases, we aren't hoisting stores, the issues with forwprop
>> for this case seems to be the folding of
>> *_1 = _2
>> into
>> MEM[tab_1(D) + 8] = _2 ?
>
> This isn't an issue, right? IIUC, tab_1(D) used all over the loop
> thus propagating _1 using (tab_1(D) + 8) actually removes one live
> range.
>
>>
>> Disabling folding to mem_ref[base + offset] in forwprop "works" in the
>> sense it created same set of hoistings as without forwprop, however it
>> still results in additional spills (albeit different registers).
>>
>> That's because forwprop seems to be increasing live range of
>> prephitmp_217 by substituting
>> _221 + 1 with prephitmp_217 + 2 (_221 is defined as prephitmp_217 + 1).
> Hmm, it's hard to discuss private benchmarks, not sure which dump
> shall I find prephitmp_221/prephitmp_217 stuff.
>
>> On the other hand, Bin pointed out to me in private that forwprop also
>> helps to restrict register pressure by propagating "tab + const_int"
>> for same test-case.
>>
>> So I am not really sure if there's an easier fix than having
>> heuristics for estimating register pressure at TREE level ? I would be
I would be > Easy fix, maybe not. OTOH, I am more convinced passes like > forwprop/sink/hoisting can be improved by taking live range into > consideration. Specifically, to direct such passes when moving code > around different basic blocks, because inter-block register pressure > is hard to resolve afterwards. > > As suggested by Jeff and Richi, I guess the first step would be doing > experiments, collecting more benchmark data for reordering sink before > pre? It enables code sink as well as decreases register pressure in > the original reduced cases IIRC. We might even consider re-evaluating Bernd's work on what is effectively a gimple scheduler to minimize register pressure. Or we could look to extend your work into a generalized pressure reducing pass that we could run near the gimple/rtl border. The final possibility would be Click's algorithm from '95 adjusted to just do pressure reduction. jeff