From: Prathamesh Kulkarni
To: Jeff Law
Cc: Richard Biener, gcc@gcc.gnu.org, Bin Cheng, Thomas Preudhomme
Date: Fri, 25 May 2018 09:23:00 -0000
Subject: Re: PR80155: Code hoisting and register pressure

On 23 May 2018 at 18:37, Jeff Law wrote:
> On 05/23/2018 03:20 AM, Prathamesh Kulkarni wrote:
>> On 23 May 2018 at 13:58, Richard Biener wrote:
>>> On Wed, 23 May 2018, Prathamesh Kulkarni wrote:
>>>
>>>> Hi,
>>>> I am trying to work on PR80155, which exposes a problem with code
>>>> hoisting and register pressure on a leading embedded benchmark for
>>>> ARM cortex-m7, where code hoisting causes an extra register spill.
>>>>
>>>> I have attached two test-cases which (hopefully) are representative
>>>> of the original test-case.
>>>> The first one (trans_dfa.c) is bigger and somewhat similar to the
>>>> original test-case, and trans_dfa_2.c is a hand-reduced version of
>>>> trans_dfa.c. There are two spills with trans_dfa.c and one spill
>>>> with trans_dfa_2.c, since it has fewer cases.
>>>> The test-cases in the PR are probably not relevant.
>>>>
>>>> Initially I thought the spill was happening because of "too many
>>>> hoistings" taking place in the original test-case, thus increasing
>>>> the register pressure, but it seems the spill is possibly caused
>>>> because the expression gets hoisted out of a block that is on a
>>>> loop exit.
>>>>
>>>> For example, the following hoistings take place with trans_dfa_2.c:
>>>>
>>>> (1) Inserting expression in block 4 for code hoisting:
>>>>     {mem_ref<0B>,tab_20(D)}@.MEM_45 (0005)
>>>>
>>>> (2) Inserting expression in block 4 for code hoisting:
>>>>     {plus_expr,_4,1} (0006)
>>>>
>>>> (3) Inserting expression in block 4 for code hoisting:
>>>>     {pointer_plus_expr,s_33,1} (0023)
>>>>
>>>> (4) Inserting expression in block 3 for code hoisting:
>>>>     {pointer_plus_expr,s_33,1} (0023)
>>>>
>>>> The issue seems to be the hoisting of (*tab + 1), which consists of
>>>> the first two hoistings into block 4 from blocks 5 and 9, and this
>>>> causes the extra spill. I verified that by disabling hoisting into
>>>> block 4, which resulted in no extra spills.
>>>>
>>>> I wonder if that's because the expression (*tab + 1) is getting
>>>> hoisted from blocks 5 and 9, which are on the loop exit? So the
>>>> expression that was previously computed in a block on the loop exit
>>>> gets hoisted outside that block, which possibly makes the allocator
>>>> more defensive? Similarly, disabling hoisting of expressions that
>>>> appeared in blocks on the loop exit in the original test-case
>>>> prevented the extra spill. The other hoistings didn't seem to
>>>> matter.
>>>
>>> I think that's simply coincidence. The only thing that makes
>>> a block that also exits from the loop special is that an
>>> expression could be sunk out of the loop and hoisting (commoning
>>> with another path) could prevent that. But that isn't what is
>>> happening here, and it would be a pass ordering issue as
>>> the sinking pass runs only after hoisting (no idea why exactly,
>>> but I guess there are cases where we want to prefer CSE over
>>> sinking). So you could try whether re-ordering PRE and sinking
>>> helps your testcase.
>> Thanks for the suggestions. Placing the sink pass before PRE works
>> for both these test-cases! Sadly it still causes the spill for the
>> benchmark :-(
>> I will try to create a better approximation of the original test-case.
>>>
>>> What I do see is a missed opportunity to merge the successors
>>> of BB 4. After PRE we have
>>>
>>>   [local count: 159303558]:
>>>   pretmp_123 = *tab_37(D);
>>>   _87 = pretmp_123 + 1;
>>>   if (c_36 == 65)
>>>     goto ; [34.00%]
>>>   else
>>>     goto ; [66.00%]
>>>
>>>   [local count: 54163210]:
>>>   *tab_37(D) = _87;
>>>   _96 = MEM[(char *)s_57 + 1B];
>>>   if (_96 != 0)
>>>     goto ; [89.00%]
>>>   else
>>>     goto ; [11.00%]
>>>
>>>   [local count: 105140348]:
>>>   *tab_37(D) = _87;
>>>   _56 = MEM[(char *)s_57 + 1B];
>>>   if (_56 != 0)
>>>     goto ; [89.00%]
>>>   else
>>>     goto ; [11.00%]
>>>
>>> here at least the stores and loads can be hoisted. Note this
>>> may also point at the real issue of the code hoisting, which is
>>> tearing apart the RMW operation?
>> Indeed, this possibility seems much more likely than the block being
>> on a loop exit.
>> I will try to "hardcode" the load/store hoists into block 4 for this
>> specific test-case to check if that prevents the spill.
> Even if it prevents the spill in this case, it's likely a good thing to
> do. The statements prior to the conditional in bb5 and bb8 should be
> hoisted, leaving bb5 and bb8 with just their conditionals.
Hi,
It seems that disabling forwprop somehow avoids the extra spills on the
original test-case.
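(As an aside, since the test-cases themselves are attachments: the shape
of the code being discussed is roughly as below. This is only a
hand-written sketch and NOT the actual trans_dfa_2.c; the function name,
the 'A' character and the control flow are guessed from the dumps quoted
above. Both paths perform the same read-modify-write of *tab, which is
what code hoisting tears apart into a commoned load/add in the shared
predecessor, with the stores left duplicated in the branches.)

  int
  scan (const char *s, int *tab)
  {
    int state = 0;

    while (*s)
      {
        if (*s == 'A')
          {
            (*tab)++;   /* read-modify-write on one path */
            state = 1;
          }
        else
          {
            (*tab)++;   /* identical read-modify-write on the other path */
            state = 2;
          }
        s++;
      }

    return state;
  }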
For instance, hoisting without forwprop:

bb 3:
  _1 = tab_1(D) + 8
  pretmp_268 = MEM[tab_1(D) + 8B];
  _2 = pretmp_268 + 1;
  goto <bb 4> or <bb 5>

bb 4:
  *_1 = _2

bb 5:
  *_1 = _2

Hoisting with forwprop:

bb 3:
  pretmp_164 = MEM[tab_1(D) + 8B];
  _2 = pretmp_164 + 1
  goto <bb 4> or <bb 5>

bb 4:
  MEM[tab_1(D) + 8] = _2;

bb 5:
  MEM[tab_1(D) + 8] = _2;

Although in both cases we aren't hoisting the stores, the issue with
forwprop for this case seems to be the folding of *_1 = _2 into
MEM[tab_1(D) + 8] = _2?

Disabling the folding to mem_ref[base + offset] in forwprop "works" in
the sense that it creates the same set of hoistings as without forwprop;
however, it still results in additional spills (albeit of different
registers). That's because forwprop seems to increase the live range of
prephitmp_217 by substituting _221 + 1 with prephitmp_217 + 2 (_221 is
defined as prephitmp_217 + 1).

On the other hand, Bin pointed out to me in private that forwprop also
helps to restrict register pressure by propagating "tab + const_int" for
the same test-case. So I am not really sure if there's an easier fix
than having heuristics for estimating register pressure at the TREE
level? I would be grateful for suggestions on how to proceed from here.

Thanks!

Regards,
Prathamesh
>
> Jeff
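P.S. To spell out the live-range point with a made-up example (the names
below are invented; they are not taken from the actual dump):

  extern void use (int);

  int
  f (int a)
  {
    int b = a + 1;   /* before forwprop, this is the last use of a */
    use (b);
    use (b);         /* ...a stretch of code that only needs b... */
    int c = b + 1;   /* forwprop folds this into a + 2, so a now stays
                        live across the stretch above, together with b */
    return b + c;
  }

With c = b + 1, a is dead after the first statement; once forwprop
substitutes the definition of b and turns it into c = a + 2, both a and b
stay live through the middle of the function. That is the kind of
live-range growth that seems to be hitting prephitmp_217 here.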