From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Subject: Re: On-Demand range technology [2/5] - Major Components : How it works
To: Richard Biener
Cc: GCC , Jeff Law , Aldy Hernandez
References: <908dbc60-b7b9-c609-f999-cc7264af0c6e@redhat.com> <91bfd08e-1431-d8ba-f9b5-e8822aaf62e2@redhat.com>
From: Andrew MacLeod
Date: Fri, 31 May 2019 15:40:00 -0000
X-SW-Source: 2019-05/txt/msg00280.txt.bz2

On 5/29/19 7:15 AM, Richard Biener wrote:
> On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod wrote:
>> On 5/27/19 9:02 AM, Richard Biener wrote:
>>> On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod wrote:
>>>>> The above suggests that iff this
is done at all it is not in GORI because
>>>>> those are not conditional stmts or ranges from feeding those. The
>>>>> machinery doing the use-def walking from stmt context also cannot
>>>>> come along these so I have the suspicion that Ranger cannot handle
>>>>> telling us that for the stmt following above, for example
>>>>>
>>>>> if (_5 != 0)
>>>>>
>>>>> that _5 is not zero?
>>>>>
>>>>> Can you clarify?
>>>> So there are 2 aspects to this. The range-ops code for DIV_EXPR, if
>>>> asked for the range of op2 (), would return ~[0,0] for _5.
>>>> But you are also correct in that the walk backwards would not find this.
>>>>
>>>> This is similar functionality to how null_derefs are currently handled,
>>>> and in fact could probably be done simultaneously using the same code
>>>> base. I didn't bring null derefs up, but this is a good time :-)
>>>>
>>>> There is a separate class used by the gori-cache which tracks the
>>>> non-nullness property at the block level. It has a single API:
>>>> non_null_deref_p (name, bb), which determines whether there is a
>>>> dereference of NAME in BB, which indicates whether the range has an
>>>> implicit ~[0,0] range in that basic block or not.
>>> So when we then have
>>>
>>> _1 = *_2; // after this _2 is non-NULL
>>> _3 = _1 + 1; // _3 is non-NULL
>>> _4 = *_3;
>>> ...
>>>
>>> when an on-demand user asks whether _3 is non-NULL at the
>>> point of _4 = *_3 we don't have this information? Since the
>>> per-BB caching will only say _1 is non-NULL after the BB.
>>> I'm also not sure whether _3 ever gets non-NULL during
>>> non-NULL processing of the block since walking immediate uses
>>> doesn't really help here?
>> Presumably _3 is globally non-null due to the definition being (pointer
>> + x) ... ie, _3 has a global range of ~[0,0] ?
> No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and
> you cannot arrive at NULL by pointer arithmetic from a non-NULL pointer.

I'm confused.
_1 was loaded from _2 (thus asserting _2 is non-NULL), but we have no idea
what the range of _1 is, so how do you assert _1 is ~[0,0]? The only way I
see to determine _3 is non-NULL is through the _4 = *_3 statement.

>>> So this seems to be a fundamental limitation [to the caching scheme],
>>> not sure if it is bad in practice.
>>>
>>> Or am I missing something?
>>>
>> Not missing anything. The non-nullness property is maintained globally
>> at the basic block level. Both _1 and _3 are flagged as being non-null
>> in the block. Upon exit, it's a bit check. If the global information
>> does not have the non-nullness property, then when a request is made
>> for non-nullness and the def and the use are both within the same
>> block, and it's flagged as being non-null in that block, then the
>> request is forced back to a quick walk between the def and the use to
>> see if there is any non-nullness introduced in between. Yes, that makes
>> it a linear walk, but it's infrequent, and often short... to the best
>> of our knowledge at this point anyway :-)
> So with the clarification above do we ever see that _3 is non-NULL?
> I suppose the worker processing _3 = _1 + 1 would ask for
> _1 non-nullness but we do not record any non-NULL-ness of _1 in
> this basic-block (but only at its end). Consider stmts
>
> _4 = (uintptr_t) _2;
> _5 = _6 / _4;
> _1 = *_2;
> ...
>
> here at _1 we know _2 is not NULL. But if we ask for non-NULLness
> of _2 at the definition of _4 we may not compute ~[0, 0] and thus
> conclude that _6 / _4 does not trap.

EVRP must look backwards to figure this out, since the forward walk will
process _5 = _6 / _4 before it sees the dereference of _2... so how does
it know that _4 is non-zero without looking backwards at things after it
sees the dereference? Does it actually do this?

> stmt-level tracking of ranges is sometimes important. This is
> something the machinery cannot provide - correct?
> At least not optimistically enough with ranges derived about uses.

Maybe I'm the one missing something, but in the absence of statement-level
exception throwing via 'can_throw_non_call_exceptions' being true, any
assertion made anywhere in the block about an ssa_name applies to the
entire block, does it not? ie, it doesn't matter if the dereference
happens first thing in the block or last thing, it's not going to change
its value within the block... it's going to be non-null throughout the
entire block.

So if one statement in the block asserts that references to _2 are
non-null, we can assert that all references to _2 in the block are
non-null. Meaning we get all these cases by knowing that the specified
name is non-zero throughout the block. This also means we could know
things earlier in the block than a forward walk would provide.

So with the 2 examples:

_1 = *_2; // after this _2 is non-NULL
_3 = _1 + 1;
_4 = *_3;

both _2 and _3 are flagged as non-null in the block due to the
dereferences. I'm not sure what we know about _1 from above, but we do
know ~[0,0] = _1 + 1, so whatever that tells you, if anything, about _1
we know. It seems to me _1 is ~[-1,-1] based on that... Regardless, I
think we know all the same things EVRP does.

Likewise:

_4 = (uintptr_t) _2;
_5 = _6 / _4;
_1 = *_2;

_2 will be non-null in the entire block, so _4 must also be non-zero and
we can conclude that the divide does not trap.

Now, when we set the 'can_throw_non_call_exceptions' flag, then we'd have
to resort to statement walks, and we cannot determine that _5 does not
trap anyway. EVRP is in the same boat... it doesn't know it's not going
to trap either, because we may never get to the *_2.

>>>> yes, compile-time complexity is from empirical speed timings and
>>>> theory-crafting from the algorithms, and that the on-entry cache
>>>> prevents multiple passes over things.
>>>>
>>>> we have not done a memory analysis yet, not done anything to see if we
>>>> can improve it.
>>>> It makes very heavy use of bitmaps, which are typically quite sparse.
>>>> The on-entry cache is a vector of pointers to a range, initially 0, and
>>>> we allocate ranges as needed. There will be an on-entry range entry
>>>> for any ssa-name which has been queried between the query point and the
>>>> definition.
>>> So that's similar to LIVE-on-entry, thus a SSA name with a range
>>> will have an on-entry range entry on each BB it dominates?
>>> That makes storage requirement quadratic in the worst case.
>> Yes, assuming a request has been made for all ssa-names everywhere they
>> are live.
> You did propose to replace [E]VRP with a walk over the whole function
> querying ranges and simplifying stmts, did you?

Yes, for the case of EVRP. But not all use cases query every range
everywhere.

It's also a bit lower than that, since we cache certain types of ranges.
We cache a range for varying (or range for type, if you prefer) of
whatever the ssa-name type is. So if the range is varying everywhere, we
actually only have one instance of the range rather than N of them. So
any name that doesn't have a range reduction anywhere will only create a
single range instance for the entire CFG. I think that's the most common
value, so that should reduce a large number of them. I've also considered
caching ranges like we cache tree constants... but I haven't delved into
that. I figured if memory turns out to be a problem, then we'll look at
it then.

Andrew
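For illustration, the block-level non-nullness tracking discussed above can
be sketched as follows. This is a toy model, not GCC code: the statement
encoding, `compute_non_null`, and the blocks are invented for the example;
only the `non_null_deref_p (name, bb)` query mirrors the API named in the
thread.

```python
# Minimal model of per-block non-nullness tracking (not GCC code).
# A statement is a tuple; ('deref', name) marks a pointer dereference.

def compute_non_null(blocks):
    """Scan each basic block once, recording every ssa-name that is
    dereferenced somewhere in that block.  A dereference implies the
    name carries an implicit ~[0,0] range throughout the block (in the
    absence of non-call exceptions)."""
    non_null = {}
    for bb, stmts in blocks.items():
        non_null[bb] = {name for op, name in stmts if op == 'deref'}
    return non_null

def non_null_deref_p(non_null, name, bb):
    """Single query API: is there a dereference of `name` in block `bb`?"""
    return name in non_null.get(bb, set())

# Block 2 models:  _1 = *_2;  _3 = _1 + 1;  _4 = *_3;
blocks = {2: [('deref', '_2'), ('add', '_3'), ('deref', '_3')]}
nn = compute_non_null(blocks)
print(non_null_deref_p(nn, '_2', 2))  # True: *_2 occurs in block 2
print(non_null_deref_p(nn, '_3', 2))  # True: *_3 occurs in block 2
print(non_null_deref_p(nn, '_1', 2))  # False: _1 is never dereferenced
```

Because the property is kept per block rather than per statement, a query
for _2 at the earlier `_4 = (uintptr_t) _2` statement succeeds even though
the dereference appears later in the block, matching the argument above.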
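The "quick walk between the def and the use" fallback mentioned above,
used when `can_throw_non_call_exceptions` forces statement-level
precision, can be sketched the same way. Again the statement encoding and
function name are invented for illustration.

```python
# Sketch of the def-to-use linear walk (not GCC code).  When the global
# bit says a name is dereferenced somewhere in the block and both def and
# use sit in that block, walk the statements in between to see whether a
# dereference actually occurs before the use.

def non_null_between(stmts, name, def_idx, use_idx):
    """Return True if some statement strictly between the definition
    (index def_idx) and the use (index use_idx) dereferences `name`."""
    for op, n in stmts[def_idx + 1:use_idx]:
        if op == 'deref' and n == name:
            return True
    return False

# Models:  _2 = ...;  _1 = *_2;  _5 = _6 / _4;  (query _2 at the divide)
stmts = [('def', '_2'), ('deref', '_2'), ('div', '_5')]
print(non_null_between(stmts, '_2', 0, 2))  # True: *_2 lies between def and use
print(non_null_between(stmts, '_2', 0, 1))  # False: nothing in between
```

As the thread notes, this walk is linear in the distance between def and
use, but it only runs when the cheap per-block bit check is inconclusive.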
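The memory-saving point about caching a single "varying" range per type,
so that names with no range reduction anywhere share one instance instead
of N per-block copies, can be modelled like this. The class and its
members are invented for the sketch; the real on-entry cache is a vector
of range pointers as described in the quoted text.

```python
class RangeCache:
    """Toy model of sharing one 'varying' range per type: names that
    never get a reduced range all point at the same object instead of
    receiving a fresh copy in every block's on-entry vector."""
    def __init__(self):
        self._varying = {}   # type name -> the single shared varying range
        self._on_entry = {}  # (ssa name, bb) -> pointer to a range

    def varying(self, type_name):
        # Allocate the [type-min, type-max] range once per type.
        if type_name not in self._varying:
            self._varying[type_name] = ('range', type_name, 'MIN', 'MAX')
        return self._varying[type_name]

    def set_on_entry(self, name, bb, rng):
        self._on_entry[(name, bb)] = rng

cache = RangeCache()
# _7 is varying in every block: all on-entry slots share one instance.
for bb in range(1, 4):
    cache.set_on_entry('_7', bb, cache.varying('int'))
shared = cache._on_entry[('_7', 1)] is cache._on_entry[('_7', 3)]
print(shared)  # True: one range object serves the whole CFG
```

This is why the worst-case quadratic storage conceded above is expected to
be much smaller in practice: varying is the most common cached value.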