From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Subject: Re: On-Demand range technology [2/5] - Major Components : How it works
To: Richard Biener
Cc: GCC , Jeff Law , Aldy Hernandez
References: <908dbc60-b7b9-c609-f999-cc7264af0c6e@redhat.com> <91bfd08e-1431-d8ba-f9b5-e8822aaf62e2@redhat.com>
From: Andrew MacLeod
Date: Fri, 31 May 2019 15:40:00 -0000
X-SW-Source: 2019-05/txt/msg00280.txt.bz2

On 5/29/19 7:15 AM, Richard Biener wrote:
> On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod wrote:
>> On 5/27/19 9:02 AM, Richard Biener wrote:
>>> On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod wrote:
>>>>> The above suggests that iff this
is done at all it is not in GORI because
>>>>> those are not conditional stmts or ranges from feeding those. The
>>>>> machinery doing the use-def walking from stmt context also cannot
>>>>> come along these so I have the suspicion that Ranger cannot handle
>>>>> telling us that for the stmt following above, for example
>>>>>
>>>>> if (_5 != 0)
>>>>>
>>>>> that _5 is not zero?
>>>>>
>>>>> Can you clarify?
>>>> So there are 2 aspects to this. The range-ops code for DIV_EXPR, if
>>>> asked for the range of op2 (), would return ~[0,0] for _5.
>>>> But you are also correct in that the walk backwards would not find this.
>>>>
>>>> This is similar functionality to how null_derefs are currently handled,
>>>> and in fact could probably be done simultaneously using the same code
>>>> base. I didn't bring null derefs up, but this is a good time :-)
>>>>
>>>> There is a separate class used by the gori-cache which tracks the
>>>> non-nullness property at the block level. It has a single API:
>>>> non_null_deref_p (name, bb), which determines whether there is a
>>>> dereference of NAME in BB, which indicates whether the range has an
>>>> implicit ~[0,0] range in that basic block or not.
>>> So when we then have
>>>
>>> _1 = *_2; // after this _2 is non-NULL
>>> _3 = _1 + 1; // _3 is non-NULL
>>> _4 = *_3;
>>> ...
>>>
>>> when an on-demand user asks whether _3 is non-NULL at the
>>> point of _4 = *_3 we don't have this information? Since the
>>> per-BB caching will only say _1 is non-NULL after the BB.
>>> I'm also not sure whether _3 ever gets non-NULL during
>>> non-NULL processing of the block since walking immediate uses
>>> doesn't really help here?
>> Presumably _3 is globally non-null due to the definition being (pointer
>> + x) ... ie, _3 has a global range of ~[0,0] ?
> No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and
> you cannot arrive at NULL by pointer arithmetic from a non-NULL pointer.

I'm confused.
_1 was loaded from _2 (thus asserting _2 is non-NULL), but we have no idea
what the range of _1 is, so how do you assert _1 is ~[0,0]? The only way I
see to determine _3 is non-NULL is through the _4 = *_3 statement.

>>> So this seems to be a fundamental limitation [to the caching scheme],
>>> not sure if it is bad in practice.
>>>
>>> Or am I missing something?
>>>
>> Not missing anything. The non-nullness property is maintained globally
>> at the basic block level. Both _1 and _3 are flagged as being non-null
>> in the block. Upon exit, it's a bit check. If the global information
>> does not have the non-nullness property, then when a request is made
>> for non-nullness and the def and the use are both within the same
>> block, and it's flagged as being non-null in that block, then the
>> request is forced back to a quick walk between the def and the use to
>> see if there is any non-nullness introduced in between. Yes, that makes
>> it a linear walk, but it's infrequent, and often short... to the best
>> of our knowledge at this point anyway :-)
> So with the clarification above do we ever see that _3 is non-NULL?
> I suppose the worker processing _3 = _1 + 1 would ask for
> _1 non-nullness but we do not record any non-NULL-ness of _1 in
> this basic-block (but only at its end). Consider stmts
>
> _4 = (uintptr_t) _2;
> _5 = _6 / _4;
> _1 = *_2;
> ...
>
> here at _1 we know _2 is not NULL. But if we ask for non-NULLness
> of _2 at the definition of _4 we may not compute ~[0, 0] and thus
> conclude that _6 / _4 does not trap.

EVRP must look backwards to figure this out, since the forward walk will
process _5 = _6 / _4 before it sees the dereference of _2... so how does
it know that _4 is non-zero without looking backwards at things after it
sees the dereference? Does it actually do this?

> stmt-level tracking of ranges is sometimes important. This is
> something the machinery cannot provide - correct?
> At least not optimistically enough with ranges derived about uses.

Maybe I'm the one missing something, but in the absence of statement-level
exception throwing via 'can_throw_non_call_exceptions' being true, any
assertion made anywhere in the block about an ssa_name applies to the
entire block, does it not? ie, it doesn't matter if the dereference
happens first thing in the block or last thing, it's not going to change
its value within the block... it's going to be non-null throughout the
entire block.

So if one statement in the block asserts that references to _2 are
non-null, we can assert that all references to _2 in the block are
non-null. Meaning we get all these cases by knowing that the specified
name is non-zero throughout the block. This also means we could know
things earlier in the block than a forward walk would provide.

So with the 2 examples:

_1 = *_2; // after this _2 is non-NULL
_3 = _1 + 1;
_4 = *_3;

both _2 and _3 are flagged as non-null in the block due to the
dereferences. I'm not sure what we know about _1 from above, but we do
know ~[0,0] = _1 + 1, so whatever that tells you, if anything, about _1
we know. It seems to me _1 is ~[-1,-1] based on that... Regardless, I
think we know all the same things EVRP does.

Likewise:

_4 = (uintptr_t) _2;
_5 = _6 / _4;
_1 = *_2;

_2 will be non-null in the entire block, so _4 must also be non-zero and
we can conclude that the divide does not trap.

Now, when we set the 'can_throw_non_call_exceptions' flag, then we'd have
to resort to statement walks, and we cannot determine that _5 does not
trap anyway. EVRP is in the same boat... it doesn't know it's not going
to trap either, because we may never get to the *_2.

>>>> yes, compile-time complexity is from empirical speed timings and
>>>> theory-crafting from the algorithms, and that the on-entry cache
>>>> prevents multiple passes over things.
>>>>
>>>> we have not done a memory analysis yet, not done anything to see if we
>>>> can improve it.
>>>> It makes very heavy use of bitmaps, which are typically quite sparse.
>>>> The on-entry cache is a vector of pointers to a range, initially 0, and
>>>> we allocate ranges as needed. There will be an on-entry range entry
>>>> for any ssa-name which has been queried between the query point and the
>>>> definition.
>>> So that's similar to LIVE-on-entry, thus a SSA name with a range
>>> will have an on-entry range entry on each BB it dominates?
>>> That makes storage requirement quadratic in the worst case.
>> Yes, assuming a request has been made for all ssa-names everywhere they
>> are live.
> You did propose to replace [E]VRP with a walk over the whole function
> querying ranges and simplifying stmts, did you?

Yes, for the case of EVRP. But not all use cases query every range
everywhere.

It's also a bit lower than that, since we cache certain types of ranges.
We cache a range for varying (or range for type, if you prefer) of
whatever the ssa-name type is. So if the range is varying everywhere, we
actually only have one instance of the range rather than N of them. So
any name that doesn't have a range reduction anywhere will only create a
single range instance for the entire CFG. I think that's the most common
value, so that should reduce a large number of them. I've also considered
caching ranges like we cache tree constants... but I haven't delved into
that. I figured if memory turns out to be a problem, then we'll look at
it then.

Andrew
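For illustration, the block-level non-nullness tracking discussed above can
be sketched as follows. This is a toy model, not GCC code: the statement
encoding, `compute_non_null`, and the blocks are invented for the example;
only the `non_null_deref_p (name, bb)` query mirrors the API named in the
thread.

```python
# Minimal model of per-block non-nullness tracking (not GCC code).
# A statement is a tuple; ('deref', name) marks a pointer dereference.

def compute_non_null(blocks):
    """Scan each basic block once, recording every ssa-name that is
    dereferenced somewhere in that block.  A dereference implies the
    name carries an implicit ~[0,0] range throughout the block (in the
    absence of non-call exceptions)."""
    non_null = {}
    for bb, stmts in blocks.items():
        non_null[bb] = {name for op, name in stmts if op == 'deref'}
    return non_null

def non_null_deref_p(non_null, name, bb):
    """Single query API: is there a dereference of `name` in block `bb`?"""
    return name in non_null.get(bb, set())

# Block 2 models:  _1 = *_2;  _3 = _1 + 1;  _4 = *_3;
blocks = {2: [('deref', '_2'), ('add', '_3'), ('deref', '_3')]}
nn = compute_non_null(blocks)
print(non_null_deref_p(nn, '_2', 2))  # True: *_2 occurs in block 2
print(non_null_deref_p(nn, '_3', 2))  # True: *_3 occurs in block 2
print(non_null_deref_p(nn, '_1', 2))  # False: _1 is never dereferenced
```

Because the property is kept per block rather than per statement, a query
for _2 at the earlier `_4 = (uintptr_t) _2` statement succeeds even though
the dereference appears later in the block, matching the argument above.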
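The "quick walk between the def and the use" fallback mentioned above,
used when `can_throw_non_call_exceptions` forces statement-level
precision, can be sketched the same way. Again the statement encoding and
function name are invented for illustration.

```python
# Sketch of the def-to-use linear walk (not GCC code).  When the global
# bit says a name is dereferenced somewhere in the block and both def and
# use sit in that block, walk the statements in between to see whether a
# dereference actually occurs before the use.

def non_null_between(stmts, name, def_idx, use_idx):
    """Return True if some statement strictly between the definition
    (index def_idx) and the use (index use_idx) dereferences `name`."""
    for op, n in stmts[def_idx + 1:use_idx]:
        if op == 'deref' and n == name:
            return True
    return False

# Models:  _2 = ...;  _1 = *_2;  _5 = _6 / _4;  (query _2 at the divide)
stmts = [('def', '_2'), ('deref', '_2'), ('div', '_5')]
print(non_null_between(stmts, '_2', 0, 2))  # True: *_2 lies between def and use
print(non_null_between(stmts, '_2', 0, 1))  # False: nothing in between
```

As the thread notes, this walk is linear in the distance between def and
use, but it only runs when the cheap per-block bit check is inconclusive.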
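The memory-saving point about caching a single "varying" range per type,
so that names with no range reduction anywhere share one instance instead
of N per-block copies, can be modelled like this. The class and its
members are invented for the sketch; the real on-entry cache is a vector
of range pointers as described in the quoted text.

```python
class RangeCache:
    """Toy model of sharing one 'varying' range per type: names that
    never get a reduced range all point at the same object instead of
    receiving a fresh copy in every block's on-entry vector."""
    def __init__(self):
        self._varying = {}   # type name -> the single shared varying range
        self._on_entry = {}  # (ssa name, bb) -> pointer to a range

    def varying(self, type_name):
        # Allocate the [type-min, type-max] range once per type.
        if type_name not in self._varying:
            self._varying[type_name] = ('range', type_name, 'MIN', 'MAX')
        return self._varying[type_name]

    def set_on_entry(self, name, bb, rng):
        self._on_entry[(name, bb)] = rng

cache = RangeCache()
# _7 is varying in every block: all on-entry slots share one instance.
for bb in range(1, 4):
    cache.set_on_entry('_7', bb, cache.varying('int'))
shared = cache._on_entry[('_7', 1)] is cache._on_entry[('_7', 3)]
print(shared)  # True: one range object serves the whole CFG
```

This is why the worst-case quadratic storage conceded above is expected to
be much smaller in practice: varying is the most common cached value.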