Re: PR80155: Code hoisting and register pressure

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

From: Jeff Law <law@redhat.com>
To: "Bin.Cheng" <amker.cheng@gmail.com>,
	Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Cc: Richard Biener <rguenther@suse.de>,
	GCC Development <gcc@gcc.gnu.org>,
	Thomas Preudhomme <thomas.preudhomme@linaro.org>
Subject: Re: PR80155: Code hoisting and register pressure
Date: Fri, 25 May 2018 16:57:00 -0000	[thread overview]
Message-ID: <66d75fc6-5553-212d-1aca-d92b3c6193b2@redhat.com> (raw)
In-Reply-To: <CAHFci2_5gZ4vVOkxFoRDhPJH-VaGYGTjt4DZgwLCK_weLQefKQ@mail.gmail.com>

On 05/25/2018 03:49 AM, Bin.Cheng wrote:
> On Fri, May 25, 2018 at 10:23 AM, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
>> On 23 May 2018 at 18:37, Jeff Law <law@redhat.com> wrote:
>>> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote:
>>>> On 23 May 2018 at 13:58, Richard Biener <rguenther@suse.de> wrote:
>>>>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote:
>>>>>
>>>>>> Hi,
>>>>>> I am trying to work on PR80155, which exposes a problem with code
>>>>>> hoisting and register pressure on a leading embedded benchmark for ARM
>>>>>> cortex-m7, where code-hoisting causes an extra register spill.
>>>>>>
>>>>>> I have attached two test-cases which (hopefully) are representative of
>>>>>> the original test-case.
>>>>>> The first one (trans_dfa.c) is bigger and somewhat similar to the
>>>>>> original test-case and trans_dfa_2.c is hand-reduced version of
>>>>>> trans_dfa.c. There's 2 spills caused with trans_dfa.c
>>>>>> and one spill with trans_dfa_2.c due to lesser amount of cases.
>>>>>> The test-cases in the PR are probably not relevant.
>>>>>>
>>>>>> Initially I thought the spill was happening because of "too many
>>>>>> hoistings" taking place in original test-case thus increasing the
>>>>>> register pressure, but it seems the spill is possibly caused because
>>>>>> expression gets hoisted out of a block that is on loop exit.
>>>>>>
>>>>>> For example, the following hoistings take place with trans_dfa_2.c:
>>>>>>
>>>>>> (1) Inserting expression in block 4 for code hoisting:
>>>>>> {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005)
>>>>>>
>>>>>> (2) Inserting expression in block 4 for code hoisting: {plus_expr,_4,1} (0006)
>>>>>>
>>>>>> (3) Inserting expression in block 4 for code hoisting:
>>>>>> {pointer_plus_expr,s_33,1} (0023)
>>>>>>
>>>>>> (4) Inserting expression in block 3 for code hoisting:
>>>>>> {pointer_plus_expr,s_33,1} (0023)
>>>>>>
>>>>>> The issue seems to be hoisting of (*tab + 1) which consists of first
>>>>>> two hoistings in block 4
>>>>>> from blocks 5 and 9, which causes the extra spill. I verified that by
>>>>>> disabling hoisting into block 4,
>>>>>> which resulted in no extra spills.
>>>>>>
>>>>>> I wonder if that's because the expression (*tab + 1) is getting
>>>>>> hoisted from blocks 5 and 9,
>>>>>> which are on loop exit ? So the expression that was previously
>>>>>> computed in a block on loop exit, gets hoisted outside that block
>>>>>> which possibly makes the allocator more defensive ? Similarly
>>>>>> disabling hoisting of expressions which appeared in blocks on loop
>>>>>> exit in original test-case prevented the extra spill. The other
>>>>>> hoistings didn't seem to matter.
>>>>>
>>>>> I think that's simply co-incidence.  The only thing that makes
>>>>> a block that also exits from the loop special is that an
>>>>> expression could be sunk out of the loop and hoisting (commoning
>>>>> with another path) could prevent that.  But that isn't what is
>>>>> happening here and it would be a pass ordering issue as
>>>>> the sinking pass runs only after hoisting (no idea why exactly
>>>>> but I guess there are cases where we want to prefer CSE over
>>>>> sinking).  So you could try if re-ordering PRE and sinking helps
>>>>> your testcase.
>>>> Thanks for the suggestions. Placing sink pass before PRE works
>>>> for both these test-cases! Sadly it still causes the spill for the benchmark -:(
>>>> I will try to create a better approximation of the original test-case.
>>>>>
>>>>> What I do see is a missed opportunity to merge the successors
>>>>> of BB 4.  After PRE we have
>>>>>
>>>>> <bb 4> [local count: 159303558]:
>>>>> <L1>:
>>>>> pretmp_123 = *tab_37(D);
>>>>> _87 = pretmp_123 + 1;
>>>>> if (c_36 == 65)
>>>>>   goto <bb 5>; [34.00%]
>>>>> else
>>>>>   goto <bb 8>; [66.00%]
>>>>>
>>>>> <bb 5> [local count: 54163210]:
>>>>> *tab_37(D) = _87;
>>>>> _96 = MEM[(char *)s_57 + 1B];
>>>>> if (_96 != 0)
>>>>>   goto <bb 7>; [89.00%]
>>>>> else
>>>>>   goto <bb 6>; [11.00%]
>>>>>
>>>>> <bb 8> [local count: 105140348]:
>>>>> *tab_37(D) = _87;
>>>>> _56 = MEM[(char *)s_57 + 1B];
>>>>> if (_56 != 0)
>>>>>   goto <bb 10>; [89.00%]
>>>>> else
>>>>>   goto <bb 9>; [11.00%]
>>>>>
>>>>> here at least the stores and loads can be hoisted.  Note this
>>>>> may also point at the real issue of the code hoisting which is
>>>>> tearing apart the RMW operation?
>>>> Indeed, this possibility seems much more likely than block being on loop exit.
>>>> I will try to "hardcode" the load/store hoists into block 4 for this
>>>> specific test-case to check
>>>> if that prevents the spill.
>>> Even if it prevents the spill in this case, it's likely a good thing to
>>> do.  The statements prior to the conditional in bb5 and bb8 should be
>>> hoisted, leaving bb5 and bb8 with just their conditionals.
>> Hi,
>> It seems disabling forwprop somehow works for causing no extra spills
>> on the original test-case.
>>
>> For instance,
>> Hoisting without forwprop:
>>
>> bb 3:
>> _1 = tab_1(D) + 8
>> pretmp_268 = MEM[tab_1(D) + 8B];
>> _2 = pretmp_268 + 1;
>> goto <bb 4> or <bb 5>
>>
>> bb 4:
>>  *_1 = _ 2
>>
>> bb 5:
>> *_1 = _2
>>
>> Hoisting with forwprop:
>>
>> bb 3:
>> pretmp_164 = MEM[tab_1(D) + 8B];
>> _2 = pretmp_164 + 1
>> goto <bb 4> or <bb 5>
>>
>> bb 4:
>> MEM[tab_1(D) + 8] = _2;
>>
>> bb 5:
>> MEM[tab_1(D) + 8] = _2;
>>
>> Although in both cases, we aren't hoisting stores, the issues with forwprop
>> for this case seems to be the folding of
>> *_1 = _2
>> into
>> MEM[tab_1(D) + 8] = _2  ?
> 
> This isn't an issue, right?  IIUC, tab_1(D) used all over the loop
> thus propagating _1 using (tab_1(D) + 8) actually removes one live
> range.
> 
>>
>> Disabling folding to mem_ref[base + offset] in forwprop "works" in the
>> sense it created same set of hoistings as without forwprop, however it
>> still results in additional spills (albeit different registers).
>>
>> That's because forwprop seems to be increasing live range of
>> prephitmp_217 by substituting
>> _221 + 1 with prephitmp_217 + 2 (_221 is defined as prephitmp_217 + 1).
> Hmm, it's hard to discuss private benchmarks, not sure which dump
> shall I find prephitmp_221/prephitmp_217 stuff.
> 
>> On the other hand, Bin pointed out to me in private that forwprop also
>> helps to restrict register pressure by propagating "tab + const_int"
>> for same test-case.
>>
>> So I am not really sure if there's an easier fix than having
>> heuristics for estimating register pressure at TREE level ? I would be
> Easy fix, maybe not.  OTOH, I am more convinced passes like
> forwprop/sink/hoisting can be improved by taking live range into
> consideration.  Specifically, to direct such passes when moving code
> around different basic blocks, because inter-block register pressure
> is hard to resolve afterwards.
> 
> As suggested by Jeff and Richi, I guess the first step would be doing
> experiments, collecting more benchmark data for reordering sink before
> pre?  It enables code sink as well as decreases register pressure in
> the original reduced cases IIRC.
We might even consider re-evaluating Bernd's work on what is effectively
a gimple scheduler to minimize register pressure.

Or we could look to extend your work into a generalized pressure
reducing pass that we could run near the gimple/rtl border.

The final possibility would be Click's algorithm from '95 adjusted to
just do pressure reduction.

jeff

next prev parent reply	other threads:[~2018-05-25 16:57 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-23  7:27 Prathamesh Kulkarni
2018-05-23  8:29 ` Richard Biener
2018-05-23  9:20   ` Prathamesh Kulkarni
2018-05-23 13:07     ` Jeff Law
2018-05-25  9:23       ` Prathamesh Kulkarni
2018-05-25  9:49         ` Bin.Cheng
2018-05-25  9:58           ` Richard Biener
2018-05-25 16:57           ` Jeff Law [this message]
2018-05-25 17:54             ` Richard Biener
2018-05-25 19:26               ` Jeff Law
2018-05-26  6:10                 ` Richard Biener
2018-05-26 18:07               ` Bin.Cheng
2018-05-28 15:22                 ` Richard Biener
2018-05-23  9:40   ` Bin.Cheng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=66d75fc6-5553-212d-1aca-d92b3c6193b2@redhat.com \
    --to=law@redhat.com \
    --cc=amker.cheng@gmail.com \
    --cc=gcc@gcc.gnu.org \
    --cc=prathamesh.kulkarni@linaro.org \
    --cc=rguenther@suse.de \
    --cc=thomas.preudhomme@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).