public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Project Ranger
@ 2018-05-29 23:53 Andrew MacLeod
  2018-05-30  7:49 ` Eric Botcazou
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Andrew MacLeod @ 2018-05-29 23:53 UTC (permalink / raw)
  To: GCC; +Cc: Aldy Hernandez, Jeff Law

I'd like to introduce a project we've been working on for the past year 
and a half.

The original project goal was to see if we could derive accurate range 
information from the IL without requiring much effort on the client 
side. The idea was that a pass could simply ask "what is the range of 
this ssa_name on this statement?" and the compiler would go figure it out.

After lots of experimenting and prototyping the project evolved into 
what we are introducing here. I call it the Ranger.

Existing range infrastructure in the compiler works from the top down. 
It walks through the IL computing all ranges and propagates these values 
forward in case they are needed.  For the most part, other passes are 
required to either use global information, or process things in 
dominator order and work lockstep with EVRP to get more context 
sensitive ranges.

The Ranger's model is purely on-demand, and designed to have minimal 
overhead.   When a range is requested, the Ranger walks backwards 
through use-def chains to determine what ranges it can find relative to 
the name being requested.  This means it only looks at statements which 
are deemed necessary to evaluate a range.  This can result in some 
significant speedups when a pass is only interested in a few specific 
cases, as is demonstrated in some of the pass conversions we have 
performed. We have also implemented a "quick and dirty" VRP-like pass 
using the Ranger to demonstrate that it can also handle much heavier 
duty range work and still perform well.

The code is located on an svn branch *ssa-range*.  It is based on trunk 
at revision 259405, circa mid April 2018.  The branch currently 
bootstraps with no regressions.  The top level ranger class is called 
'path_ranger' and is found in the file ssa-range.h.  It has 4 primary APIs:

  * bool path_range_edge (irange& r, tree name, edge e);
  * bool path_range_entry (irange& r, tree name, basic_block bb);
  * bool path_range_on_stmt (irange& r, tree name, gimple *g);
  * bool path_range_stmt (irange& r, gimple *g);

This allows queries for a range on an edge, on entry to a block, as an 
operand on a specific statement, or to calculate the range of the 
result of a statement.  There are no prerequisites to use it; simply 
create a path ranger and start using the API.   There is even an 
available function which can be lightly called and doesn't require 
knowledge of the ranger:

    static inline bool
    on_demand_get_range_on_stmt (irange &r, tree ssa, gimple *stmt)
    {
        path_ranger ranger;
        return ranger.path_range_on_stmt (r, ssa, stmt);
    }

The Ranger consists of 3 primary components:

  * range.[ch] - A new range representation based purely on wide_int,
    which allows ranges to consist of multiple non-overlapping sub-ranges.
  * range-op.[ch] - Implements centralized tree-code operations on the
    irange class (allowing adding, masking, multiplying, etc).
  * ssa-range*.[ch]  - Files containing a set of classes which implement
    the Ranger.

We have set up a project page on the wiki which contains documentation 
for the approach as well as some converted pass info and a to-do list here:

https://gcc.gnu.org/wiki/AndrewMacLeod/Ranger

We would like to include the ranger in GCC for this release, and realize 
there are still numerous things to be done before it's ready for 
integration. It has been in prototype mode until now, so we have not 
prepared the code for a merge yet.  No real performance analysis has 
been done on it either, but there is an integration page where you will 
find information about the 4 passes that have been converted to use the 
Ranger and their performance:

https://gcc.gnu.org/wiki/AndrewMacLeod/RangerIntegration

One of the primary tasks over the next month or two is to improve the 
sharing of operation code between the VRPs and the Ranger. We haven't 
done a very good job of that so far.   This is included along with a 
list of known issues we need to look at on the to-do page:

https://gcc.gnu.org/wiki/AndrewMacLeod/RangerToDo .

The Ranger is far enough along now that we have confidence in both its 
approach and ability to perform, and would like to solicit feedback on 
what you think of it,  any questions, possible uses,  as well as 
potential requirements to integrate with trunk later this stage.

Please visit the project page and have a look.  We've put as much 
documentation, comments, and to-dos there as we could think of.  We will 
try to keep it up-to-date.

Andrew, Aldy and Jeff










^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Project Ranger
  2018-05-29 23:53 Project Ranger Andrew MacLeod
@ 2018-05-30  7:49 ` Eric Botcazou
  2018-05-30 14:03   ` Andrew MacLeod
  2018-05-30 14:39 ` David Malcolm
  2018-06-01  9:49 ` Richard Biener
  2 siblings, 1 reply; 14+ messages in thread
From: Eric Botcazou @ 2018-05-30  7:49 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: gcc, Aldy Hernandez, Jeff Law

> The Ranger is far enough along now that we have confidence in both its
> approach and ability to perform, and would like to solicit feedback on
> what you think of it,  any questions, possible uses,  as well as
> potential requirements to integrate with trunk later this stage.

The PDF document mentions that you first intended to support symbolic ranges 
but eventually dropped them as "too complex, and ultimately not necessary".

I don't entirely disagree with the former part, but I'm curious about the 
latter part: how do you intend to deal in the long term with cases that do 
require symbolic information to optimize things?  The TODO page seems to 
acknowledge the loophole but only mentions a plan to deal with equivalences, 
which is not sufficient in the general case (as acknowledged too on the page).

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Project Ranger
  2018-05-30  7:49 ` Eric Botcazou
@ 2018-05-30 14:03   ` Andrew MacLeod
  2018-06-01  8:16     ` Eric Botcazou
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew MacLeod @ 2018-05-30 14:03 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc, Aldy Hernandez, Jeff Law

On 05/30/2018 03:41 AM, Eric Botcazou wrote:
>> The Ranger is far enough along now that we have confidence in both its
>> approach and ability to perform, and would like to solicit feedback on
>> what you think of it,  any questions, possible uses,  as well as
>> potential requirements to integrate with trunk later this stage.
> The PDF document mentions that you first intended to support symbolic ranges
> but eventually dropped them as "too complex, and ultimately not necessary".
>
> I don't entirely disagree with the former part, but I'm curious about the
> latter part: how do you intend to deal in the long term with cases that do
> require symbolic information to optimize things?  The TODO page seems to
> acknowledge the loophole but only mentions a plan to deal with equivalences,
> which is not sufficient in the general case (as acknowledged too on the page).
>
First, we'll collect the cases that demonstrate a unique situation we 
care about.  I have 4 very specific cases that show current 
shortcomings, not just with the Ranger, but a couple we don't handle 
with VRP today.  I'll eventually get those put onto the wiki so the 
list can be updated.

I think most of these cases that care about symbolics are not so much 
range related, but rather an algorithmic layer on top. Any follow-on 
optimization to either enhance or replace VRP or anything similar will 
simply use the ranger as a client.  If it turns out there are cases 
where we *have* to remember the end point of a range as a symbolic, then 
the algorithm needs to track that symbolic along with the range, and 
request a re-evaluation of the range when the value of that symbolic changes.

That's the high-level view.  I'm not convinced the symbolic has to be in 
the range in order to solve problems, for 2 reasons:

  1)  The Ranger maintains some definition chains internally and has a 
clear idea of which ssa_names can affect the outcome of a range. It 
tracks all these dependencies on a per-bb basis in the gori-map 
structure as imports and exports.  The iterative approach I mentioned 
in the document would use this info to decide that ranges in a block 
need to be re-evaluated because an input to the block has changed.  
This is similar to the way VRP and other passes iterate until nothing 
changes.  If we get a better range for an ssa_name than we had before, 
we push it on the stack to look for potential re-evaluation, and keep going.

This is what I referred to as the Level 4 ranger in the document. I kind 
of glossed over it because I didn't want to get into the full-blown 
design I originally had, nor to rework it based on the current 
incarnation of the Ranger.  I wanted to primarily focus on what we 
currently have working so we can move forward with it.

I don't think we need to track the symbolic name in the range because 
the Ranger tracks these dependencies for each ssa_name and can indicate 
when we may need to re-evaluate them.  There is an exported routine from 
the block ranger:
       tree single_import (tree name);
If there is a sole import, it will return the ssa_name that NAME is 
dependent on and that can affect the range of NAME.  We added that API so 
Aldy's new threading code could utilize this ability to a small degree.

Bottom line: The ranger has information that a pass can use to decide 
that a range could benefit from being reevaluated. This identifies any 
symbolic component of a range from that block.

  2) This entire approach is modeled on walking the IL to evaluate a 
range.  If we put symbolics and expressions in the range, we are really 
duplicating information that is already in the IL, and we have to make a 
choice of exactly what we store and how we do it:
BB3:
    j_1 = q_6 / 2
    i_2 = j_1 + 3
    if ( i_2 < k_4)

we could store the range of k_4 as [i_2 + 1, MAX] (which seems the 
obvious one);
we could also store it as [j_1 + 4, MAX];
or even as [q_6 / 2 + 4, MAX].  But we have to decide in advance, and we 
have extra work to do if it turns out to be one of the other names we 
ended up wanting.

At some point later on we decide we either don't know anything about i_2 
(or j_1, or q_6), or we have found a range for it, and now need to take 
that value and evaluate the expression stashed in the range in order to 
get the final result.   Note that whatever algorithm is doing this must 
also keep track of this range somehow in order to use it later.

With the Ranger model, the same algorithm gets a range, and if it thinks 
the range might need to be re-evaluated for whatever reason, it can just 
track an extra bit of info (like i_2, for instance) alongside the range 
(rather than in it).  If it thinks the range needs to be re-evaluated, 
it can simply request a new range from the ranger.

You also don't have to decide whether to track the range with i_2 or j_1 
(or even q_6). The Ranger can tell you that the range it gives you for 
k_4 is accurate unless you get a new value for q_6. That is really what 
you want to track.  You might later want to re-evaluate the range only if 
q_6 changes.  If it doesn't, you are done.

Bottom line: the Ranger indicates what the symbolic aspect of the range 
is with the import. The net effect is that the symbolic expression using 
that import is also the longest possible expression available in the 
block; it just picks it up from the IL rather than storing it in 
the range.

I would also note that we track multiple imports; they just aren't 
exposed as yet since they aren't really being used by anyone.  
k_4 is also tagged as an import to that block, and if you ask for the 
range of i_2, you'd get a range, and k_4 would be listed as the import.

Also note that more complexity is available.  Once we hit statements with 
multiple ssa_names, we currently stop tracking, but we do note the 
imports at that point:
BB4:
   z_4 = a_3 + c_2
   z_5 = z_4 + 3
   if (  q_8 < z_5)
we can get a range for q_8, and the ranger does know that a_3 and c_2 
are both imports to defining z_5.  By using the import information, we 
effectively get a "symbolic" range of
[MIN, a_3 + c_2 + 3] for q_8 in this case.

Which means I think the import approach of the Ranger has the benefit of 
being simpler in many ways,  yet more powerful should we wish to explore 
that route.


The one place this falls down is if you get a range back from a call and 
you have no idea where it came from, but want to be able to re-evaluate 
it later.  I am not sure what this use case looks like (if it exists 
:-), but I would be surprised if it wasn't something that could be 
handled with an algorithm change.  I know in discussions with Aldy and 
Jeff, as we went through various use cases, this model does sometimes 
require a bit of rethinking of how you approach using the information, 
since a lot of things we're used to worrying about just happen under the 
covers.


Does that help?   If it does, I'll add this to the coverage in the wiki 
page.

Andrew


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Project Ranger
  2018-05-29 23:53 Project Ranger Andrew MacLeod
  2018-05-30  7:49 ` Eric Botcazou
@ 2018-05-30 14:39 ` David Malcolm
  2018-05-30 14:45   ` Andrew MacLeod
  2018-05-30 14:51   ` Andreas Schwab
  2018-06-01  9:49 ` Richard Biener
  2 siblings, 2 replies; 14+ messages in thread
From: David Malcolm @ 2018-05-30 14:39 UTC (permalink / raw)
  To: Andrew MacLeod, GCC; +Cc: Aldy Hernandez, Jeff Law

On Tue, 2018-05-29 at 19:53 -0400, Andrew MacLeod wrote:

[...snip...]

> The code is located on an svn branch *ssa-range*.  It is based on
> trunk 
> at revision *259405***circa mid April 2018. 

Is this svn branch mirrored on gcc's git mirror?  I tried to clone it
from there, but failed.

[...snip...]

Thanks
Dave

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Project Ranger
  2018-05-30 14:39 ` David Malcolm
@ 2018-05-30 14:45   ` Andrew MacLeod
  2018-05-30 14:51   ` Andreas Schwab
  1 sibling, 0 replies; 14+ messages in thread
From: Andrew MacLeod @ 2018-05-30 14:45 UTC (permalink / raw)
  To: David Malcolm, GCC; +Cc: Aldy Hernandez, Jeff Law

On 05/30/2018 10:39 AM, David Malcolm wrote:
> On Tue, 2018-05-29 at 19:53 -0400, Andrew MacLeod wrote:
>
> [...snip...]
>
>> The code is located on an svn branch *ssa-range*.  It is based on
>> trunk
>> at revision *259405***circa mid April 2018.
> Is this svn branch mirrored on gcc's git mirror?  I tried to clone it
> from there, but failed.
>
> [...snip...]
>
> Thanks
> Dave

I don't know :-)    I know that 'svn diff -r 259405' worked and 
appeared to give me a diff of everything.   Aldy uses git; maybe he can 
tell you.

That was a merge from trunk into an existing branch I had 
cut a month earlier.  The ACTUAL original branch cut was revision 
258524, if that makes any difference.

Andrew

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Project Ranger
  2018-05-30 14:39 ` David Malcolm
  2018-05-30 14:45   ` Andrew MacLeod
@ 2018-05-30 14:51   ` Andreas Schwab
  2018-05-30 14:53     ` Aldy Hernandez
  1 sibling, 1 reply; 14+ messages in thread
From: Andreas Schwab @ 2018-05-30 14:51 UTC (permalink / raw)
  To: David Malcolm; +Cc: Andrew MacLeod, GCC, Aldy Hernandez, Jeff Law

On May 30 2018, David Malcolm <dmalcolm@redhat.com> wrote:

> On Tue, 2018-05-29 at 19:53 -0400, Andrew MacLeod wrote:
>
> [...snip...]
>
>> The code is located on an svn branch *ssa-range*.  It is based on
>> trunk 
>> at revision *259405***circa mid April 2018. 
>
> Is this svn branch mirrored on gcc's git mirror?  I tried to clone it
> from there, but failed.

It's in refs/remotes/ssa-range, which isn't fetched by default.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Project Ranger
  2018-05-30 14:51   ` Andreas Schwab
@ 2018-05-30 14:53     ` Aldy Hernandez
  0 siblings, 0 replies; 14+ messages in thread
From: Aldy Hernandez @ 2018-05-30 14:53 UTC (permalink / raw)
  To: Andreas Schwab, David Malcolm; +Cc: Andrew MacLeod, GCC, Jeff Law



On 05/30/2018 10:51 AM, Andreas Schwab wrote:
> On May 30 2018, David Malcolm <dmalcolm@redhat.com> wrote:
> 
>> On Tue, 2018-05-29 at 19:53 -0400, Andrew MacLeod wrote:
>>
>> [...snip...]
>>
>>> The code is located on an svn branch *ssa-range*.  It is based on
>>> trunk
>>> at revision *259405***circa mid April 2018.
>>
>> Is this svn branch mirrored on gcc's git mirror?  I tried to clone it
>> from there, but failed.
> 
> It's in refs/remotes/ssa-range, which isn't fetched by default.

Right.

 From my tree:

$ git branch -a |grep svn.ssa.range
   remotes/svn/ssa-range

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Project Ranger
  2018-05-30 14:03   ` Andrew MacLeod
@ 2018-06-01  8:16     ` Eric Botcazou
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Botcazou @ 2018-06-01  8:16 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: gcc, Aldy Hernandez, Jeff Law

> First, we'll collect the cases that demonstrate a unique situation we
> care about.  I have 4 very specific cases that show current
> shortcomings, not just with the Ranger, but a couple we don't handle
> with VRP today.  I'll eventually get those put onto the wiki so the
> list can be updated.

I'm not specifically worried here; there are already a couple of testcases in 
the testsuite that require symbolic information to be properly optimized.

> I think most of these cases that care about symbolics are not so much
> range related, but rather an algorithmic layer on top. Any follow-on
> optimization to either enhance or replace VRP or anything similar will
> simply use the ranger as a client.  If it turns out there are cases
> where we *have* to remember the end point of a range as a symbolic, then
> the algorithm needs to track that symbolic along with the range, and
> request a re-evaluation of the range when the value of that symbolic changes.
>
> [...]
>
> Does that help?   If it does, I'll add this to the coverage in the wiki
> page.

Yes, thanks for the detailed explanation.  This approach of setting aside the 
handling of symbolic information might end up being a good compromise between 
the necessary minimal[*] handling of this information and the complexity of 
doing it directly in the Ranger.

[*] The implicit assumption here is that a VRP implementation with full-blown 
support of symbolic ranges is not worth the hassle in practice.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Project Ranger
  2018-05-29 23:53 Project Ranger Andrew MacLeod
  2018-05-30  7:49 ` Eric Botcazou
  2018-05-30 14:39 ` David Malcolm
@ 2018-06-01  9:49 ` Richard Biener
  2018-06-01 20:38   ` Andrew MacLeod
  2 siblings, 1 reply; 14+ messages in thread
From: Richard Biener @ 2018-06-01  9:49 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: GCC Development, Aldy Hernandez, Jeff Law

On Wed, May 30, 2018 at 1:53 AM Andrew MacLeod <amacleod@redhat.com> wrote:
>
> I'd like to introduce a project we've been working on for the past year
> and a half.
>
> The original project goal was to see if we could derive accurate range
> information from the IL without requiring much effort on the client
> side. The idea was that a pass could simply ask "what is the range of
> this ssa_name on this statement?" and the compiler would go figure it out.
>
> After lots of experimenting and prototyping the project evolved into
> what we are introducing here. I call it the Ranger.
>
> Existing range infrastructure in the compiler works from the top down.
> It walks through the IL computing all ranges and propagates these values
> forward in case they are needed.  For the most part, other passes are
> required to either use global information, or process things in
> dominator order and work lockstep with EVRP to get more context
> sensitive ranges.
>
> The Ranger's model is purely on-demand, and designed to have minimal
> overhead.   When a range is requested, the Ranger walks backwards
> through use-def chains to determine what ranges it can find relative to
> the name being requested.  This means it only looks at statements which
> are deemed necessary to evaluate a range.  This can result in some
> significant speedups when a pass is only interested in a few specific
> cases, as is demonstrated in some of the pass conversions we have
> performed. We have also implemented a "quick and dirty" VRP-like pass
> using the Ranger to demonstrate that it can also handle much heavier
> duty range work and still perform well.
>
> The code is located on an svn branch *ssa-range*.  It is based on trunk
> at revision 259405, circa mid April 2018.  The branch currently
> bootstraps with no regressions.  The top level ranger class is called
> 'path_ranger' and is found in the file ssa-range.h.  It has 4 primary APIs:
>
>   * bool path_range_edge (irange& r, tree name, edge e);
>   * bool path_range_entry (irange& r, tree name, basic_block bb);
>   * bool path_range_on_stmt (irange& r, tree name, gimple *g);
>   * bool path_range_stmt (irange& r, gimple *g);
>
> This allows queries for a range on an edge, on entry to a block, as an
> operand on a specific statement, or to calculate the range of the
> result of a statement.  There are no prerequisites to use it; simply
> create a path ranger and start using the API.   There is even an
> available function which can be lightly called and doesn't require
> knowledge of the ranger:

So what the wiki pages do not show is how contextual information
is handled (and how that's done w/o dominators as the wiki page
claims).  That is, path_range_on_edge suggests that range information
provided by (all) the controlling stmts of that edge is used when
determining the range for 'name'.

That's probably true for the 'stmt' variant as well.

This leads me to guess that either the cache has to be
O(number of SSA uses) in size, or a simple VRP pass implemented
using the Ranger, querying ranges of all SSA ops for each stmt,
will require O(number of SSA uses) times analysis?

One of the advantages of a DOM-style or SSA-with-ASSERT_EXPR
pass is that the number of evaluations and the size of the cache
isn't that way "unbound".  On the WIKI page you mention edge
info isn't cached yet - whatever that means ;)

So I wonder what's the speed for doing

  FOR_EACH_BB_FN (bb)
     FOR_EACH_STMT (stmt)
       FOR_EACH_SSA_USE (stmt)
           path_range_on_stmt (..., SSA-USE, stmt);

assuming that it takes into account controlling predicates.
That's actually what EVRP does (within a DOM walk) given
it tries to simplify each stmt with based on range info of the ops.

>     static inline bool
>     on_demand_get_range_on_stmt (irange &r, tree ssa, gimple *stmt)
>     {
>         path_ranger ranger;
>         return ranger.path_range_on_stmt (r, ssa, stmt);
>     }
>
> The Ranger consists of 3 primary components:
>
>   * range.[ch] - A new range representation purely based on wide-int ,
>     and allows ranges to consist of multiple non-overlapping sub-ranges.
>   * range-op.[ch] - Implements centralized tree-code operations on the
>     irange class (allowing adding, masking, multiplying, etc).
>   * ssa-range*.[ch]  - Files containing a set of classes which implement
>     the Ranger.
>
> We have set up a project page on the wiki which contains documentation
> for the approach as well as some converted pass info and a to-do list here:
>
> https://gcc.gnu.org/wiki/AndrewMacLeod/Ranger
>
> We would like to include the ranger in GCC for this release, and realize
> there are still numerous things to be done before it's ready for
> integration. It has been in prototype mode until now, so we have not
> prepared the code for a merge yet.  No real performance analysis has
> been done on it either, but there is an integration page where you will
> find information about the 4 passes that have been converted to use the
> Ranger and their performance:
>
> https://gcc.gnu.org/wiki/AndrewMacLeod/RangerIntegration
>
> One of the primary tasks over the next month or two is to improve the
> sharing of operation code between the VRPs and the Ranger. We haven't
> done a very good job of that so far.   This is included along with a
> list of known issues we need to look at on the to-do page:
>
> https://gcc.gnu.org/wiki/AndrewMacLeod/RangerToDo .
>
> The Ranger is far enough along now that we have confidence in both its
> approach and ability to perform, and would like to solicit feedback on
> what you think of it,  any questions, possible uses,  as well as
> potential requirements to integrate with trunk later this stage.
>
> Please visit the project page and have a look.  We've put as much
> documentation, comments, and to-dos there as we could think of.  We will
> try to keep it up-to-date.

More generally my remarks are still the same as with the irange introduction
attempt.

It's a no-no to add yet another set of range op primitives given we already have
them implemented (with trees, yes...) in tree-vrp.c.  The existing ones are
already factored out to some extent, so I expect you will continue that and
split out the wide-int cases from those existing routines.  Yes, that will not
be very C++-ish, but who cares.  Simply implement C-style workers; for example,
be inspired by the tree-ssa-ccp.c:bit_value_* routines implementing a kind of
non-zero-bits operation primitives on top of wide_ints (and also wrapping those
with tree variants).

For most of your on-demand uses I would expect a _much_ simpler and
less heavy-weight approach to be extending the recently merged
determine_value_range (I've pointed you to this multiple times in the
past) to walk SSA def-use chains using the existing SSA on-the-side
info as cache (yes, w/o use-granular contextual information).

I also think that purely restricting yourself to wide-ints is a mistake - we
do very well want to have range information for REAL_CSTs (or composite
integers / reals).  How do you think of extending Ranger to handle different
kind of types?  Eric already raised the issue of symbolic ranges.

Yes, I do like having wide-int workers for ranges.

Yes, I do like the idea of making ranges not restricted to single sub-ranges
(didn't review the implementation).

But those things should have been done on the existing code, not sticking
sth completely new into GCC that cannot subsume the existing stuff.  That's
the road to maintainance hell which we unfortunately walk way too often
with GCC :/

Btw, the original goal of EVRP was to get rid of the ASSERT_EXPR-style
VRP pass once the threading bits it has are subsumed by the backwards
threader.  Does ranger at least allow that - get rid of the threading inside
VRP without regressions?

Just as a final note - open source development doesn't work like this,
citing you: "I'd like to introduce a project we've been working on for
the past year and a half."  This simply provokes "reviews" like the above.

Thanks,
Richard.

> Andrew, Aldy and Jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Project Ranger
  2018-06-01  9:49 ` Richard Biener
@ 2018-06-01 20:38   ` Andrew MacLeod
  2018-06-04 15:17     ` Richard Biener
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew MacLeod @ 2018-06-01 20:38 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Development, Aldy Hernandez, Jeff Law

On 06/01/2018 05:48 AM, Richard Biener wrote:
> On Wed, May 30, 2018 at 1:53 AM Andrew MacLeod <amacleod@redhat.com> wrote:
>
>> This allows queries for a range on an edge, on entry to a block, as an
>> operand on an specific statement, or to calculate the range of the
>> result of a statement.  There are no prerequisites to use it, simply
>> create a path ranger and start using the API.   There is even an
>> available function which can be lightly called and doesn't require
>> knowledge of the ranger:
> So what the wiki pages do not show is how contextual information
> is handled (and how that's done w/o dominators as the wiki page
> claims).  That is, path_range_on_edge suggests that range information
> provided by the (all) controlling stmts of that edge are used when
> determining the range for 'name'.
The wiki doesn't talk about it, but the document itself goes into great 
detail about how the dynamic process works.

> That's probably true for the 'stmt' variant as well.
>
> This leads me to guess that either the cache has to be
> O(number of SSA uses) size or a simple VRP pass implemented
> using Ranger querying ranges of all SSA ops for each stmt
> will require O(number of SSA uses) times analysis?
The current cache is not particularly efficient; its size is #ssa_names * 
#BBs, assuming you actually go and look at every basic block, which many 
passes do not.  It also only puts a range in there when it has to.    I'm 
not sure why it really matters unless we find a pathological case that 
kills compile time which cannot be addressed.  This would all fall under 
performance analysis and tweaking.  It's the end performance that matters.

When a range request is made for NAME in a block, it requests a range 
on each incoming edge, then unions those ranges and caches the result. 
This is the range-on-entry cache.   The next time a range request for NAME 
is made on entry to that block, it is simply picked up from the cache.

A request for a range on an edge uses the gori_map summary to decide 
whether that block creates a range for NAME or not.
If the block does not generate a range, it simply requests the 
range-on-entry to that block.
If it does generate a range, then it examines the statements defining 
the condition, gets ranges for those, and calculates the range that 
would be generated on that edge.  It also requests ranges for any of 
those names in the defining chain which originate outside this block.

So it queries all the names which feed other names and so on. The 
queries end when a statement is found that doesn't generate a useful 
range. Sometimes it has to go hunt quite a way, but usually they are 
nearby.  And caching the range-on-entry values prevents the lookups from 
doing the same work over and over.

Once a range has been calculated, we don't spend much more time 
calculating it again.   Any other ssa_name which uses that name also 
doesn't need to recalculate it.  For the most part, we walk the def 
chains for any given ssa_name and its feeding names once, and future 
requests pretty much come from the cache.

So yes, there is also a lot of recursion in here at the moment: calls 
for ranges spawning other calls before they can be resolved. I suspect 
sometimes this call chain is deep. I don't know; we haven't performed 
any analysis on it yet.

> One of the advantages of a DOM-style or SSA-with-ASSERT_EXPR
> pass is that the number of evaluations and the size of the cache
> isn't that way "unbound".  On the WIKI page you mention edge
> info isn't cached yet - whatever that means ;)
>
> So I wonder what's the speed for doing
The on-entry caches prevent it from being "unbound", and the 
unioning of the incoming edge ranges performs the equivalent 
merging of ranges that DOM-based processing gives you.

    if (x > 10)
        A
    else
        B
    C

Block A has x = [11, MAX].
Block B has x = [MIN, 10].
In block C, at the bottom of the diamond, the 2 incoming edges union those 2 
ranges back to [MIN, MAX] and we know nothing about the range of x 
again.  This accomplishes the same thing dominators would tell you.

The edge info that "isn't cached yet" (and may never be) is simply the
'static' info calculated by evaluating the few statements at the bottom
of the block feeding the condition.  Any values coming into the block
are cached.  In this case it's simply looking at x > 10 and calculating
the range on whichever edge.  One of the original design elements
called for this to be cached, but in practice I don't think it needs to
be.  We haven't analyzed the performance yet... so it can only get
quicker :-)


>   
>
>    FOR_EACH_BB_FN (bb)
>       FOR_EACH_STMT (stmt)
>         FOR_EACH_SSA_USE (stmt)
>             path_range_on_stmt (..., SSA-USE, stmt);
>
> assuming that it takes into account controlling predicates.
> That's actually what EVRP does (within a DOM walk) given
> it tries to simplify each stmt with based on range info of the ops.

   FOR_EACH_BB_FN (bb)
      FOR_EACH_STMT (stmt)
         path_range_on_stmt (..., stmt);

would be sufficient.  In order to calculate the range for the definition of the statement, the Ranger will query the range of each SSA operand.

I had Aldy run his cumulative .ii times with a patch added to EVRP
which adds a ranger and makes a call to path_range_stmt on each PHI and
every stmt as EVRP processes it.

evrp original time: 2025055 usec
evrp + ranger:      3787678 usec
                     -------
       ranger time:  1762623

So to be fair, we aren't actually doing anything and EVRP is; but the
actual folding of statements, which is the only additional thing EVRP
is doing, is not a big time killer.  This indicates that for a full-on
stmt-by-stmt walk and process, we're certainly in the same time
ballpark.

A significant point is that the stmt-by-stmt walk is also not the
intended target.  That's old school now :-)  Many clients only care
about a few ranges.

The "quick and dirty" RVRP pass only looks at the last condition in the
block, and catches the vast majority of what the stmt-by-stmt walk
accomplishes.  If timings for the RVRP pass are any indication, I would
say it walks/calculates 60% of the statements in the program just to
calculate flow conditions.  Processing a lot of arithmetic statements
that can never result in a fold seems kinda pointless; leave those to
constant propagation (which could use the ranger as a client).  The
extra stmt-by-stmt opportunities found mostly appear to be PHIs that
can be folded and the odd condition that doesn't feed a branch.

EVRP also *has* to walk each statement because that is the model it
uses to find ranges: build them from the top down.  The ranger model
does not have this need.  If uses of the ranger were more prolific in
other passes, I think a lot of the things the existing VRP passes get
on these statements would be caught in the normal processing the other
passes do.  We did put a stmt-by-stmt RVRP on the todo list, mostly so
we could get a better apples-to-apples time comparison, and to make
sure we would eventually have full coverage of VRP's abilities.

I would also note again that we have made no attempt to optimize things
yet.  We don't really know how much extra work we're doing that doesn't
need to be done.  We just know the current speed is a pretty good
starting point.  We're seeing big improvements in the passes that have
been changed to use it.

We will get to finding the inefficiencies and try to address them.
  
  

> More generally my remarks are still the same as with the irange introduction
> attempt.
>
> It's a no-no to add yet another set of range op primitives given we already have
> them implemented (with trees, yes...) in tre-vrp.c.  The existing ones
> are already
> factored out to some extent so I expect you will continue that and split out
> the wide-int cases from those existing routines.  Yes - that will not be very
> C++-ish - but who cares.  Simply implement C-style workers, for example
> be inspired by the tree-ssa-ccp.c:bit_value_* routines implementing
> a kind of non-zero-bits operation primitives ontop of wide_ints (and also
> wrapping those with tree variants).
Sharing code that does the same work was covered as something we will
get to.  Aldy is working on that, so there is no dual maintenance.  I
had started it with the wide-int-aux.[ch] file in the branch but never
got very far.  Yeah, it's not super pretty, but it is what it is.

It won't be perfect, however, as many of those routines are written
with the assumption that ranges are either a single range or an
anti-range.  We no longer have that restriction, and we will diverge in
those cases.  Extricating the code from the symbolic handling is also
not always straightforward, as there are big switches which mix and
match various bits of each case.

We will however try to distill it down to the wide-int aspects that are 
common.

I would also argue that long term I see value_range disappearing
completely... and then there won't be any dual maintenance :-)


>
> For most of your on-demand uses I would expect a _much_ simpler and
> less heavy-weight approach to be extending the recently merged
> determine_value_range (I've pointed you to this multiple times in the
> past) to walk SSA def-use chains using the existing SSA on-the-side
> info as cache (yes, w/o use-granular contextual information).
I think what we have is pretty lightweight, and more powerful than that
would be.  I started from scratch so we could get all the ideas in one
place working together, and not suffer from limits imposed by existing
assumptions or information that isn't easily available.  I think our
results confirm those benefits.  It's fast, easy to use, and it's
already starting to get things EVRP does not get.  It is also showing
signs of finding things VRP itself has trouble with.  When we handle
equivalences, I would expect we will exceed current capabilities.

>
> I also think that purely restricting yourself to wide-ints is a mistake - we
> do very well want to have range information for REAL_CSTs (or composite
> integers / reals).  How do you think of extending Ranger to handle different
> kind of types?  Eric already raised the issue of symbolic ranges.
This is in fact one of the main reasons we wrote everything new,
including a distinct range class: we expected to want to make this work
on other types.

Irange and range-ops are classes to handle integer ranges.  We have a
very well defined API for ranges now.  The range engine itself simply
uses this API.

If we write a class 'real_range' to manipulate and store real ranges,
and implement 'real_range_ops' to teach the compiler how real ranges
are extracted from expressions, the entire ranger engine will work on
reals.

There are two options for proceeding with this:

a) template the ranger to use <class Range> instead of class irange, or
at least extract out the common code and template that.  Then we'd have
the same on-demand ranger for "real" ranges, or "complex", or whatever.
So instead of invoking it with 'path_ranger r', it'd be something like
'path_ranger<irange> r' or 'path_ranger<real_range> r'.

I like that less because I hate templates, but it is an option.  The 
more likely case I planned for is:

b) adjust class irange to derive from a base range class with the API
as virtual functions, and inherit irange and real_range from that for
the ranger to call.  We'd have some tweaking to do where the ranger
currently checks that the type of things is integral, but nothing too
serious.  The entire structure of the range engine would still apply,
and we could process both kinds simultaneously.

The second model was also intended from the beginning to allow us to
adjust the number of sub-ranges dynamically.  Right now we set it to 3
sub-ranges at compile time, but the intention is to allow this to vary
if we want.  There are cases where some optimization might be willing
to spend extra time/memory in order to get really accurate range info.
Maybe switch analysis could use an irange which allowed up to 1000
sub-ranges, so it could very precisely map what the value is at any
given point.  The general case doesn't need that extra memory usage,
however.

It just gives us flexibility we don't currently have.  We also expect
to do some tests and find out what the "optimum" default number of
sub-ranges is.  I chose 3 because it was handling cases I wanted that
we couldn't get before.  Maybe a different number is better; we'd have
to run some experiments to see how often we could use more ranges.


> Yes, I do like having wide-int workers for ranges.
>
> Yes, I do like the idea of making ranges not restricted to single sub-ranges
> (didn't review the implementation).
>
> But those things should have been done on the existing code, not sticking
> sth completely new into GCC that cannot subsume the existing stuff.  That's
> the road to maintainance hell which we unfortunately walk way too often
> with GCC :/
I argue that this will subsume all the existing code related to it.
Sure, we cannot do that right this moment; it's not a small task.  I
think our pathway out of maintenance hell involves removing the
existing value_range code and the various incarnations of VRP we have,
and using something modern designed from scratch to solve these
problems.  I would also predict we could do this in the release after
this next one.  So we'd have one release with the ranger and the VRP
brothers both present, and after that it would be just the Ranger.
> Btw, the original goal of EVRP was to get rid of the ASSERT_EXRP-style
> VRP pass once the threading bits it has are subsumed by the backwards
> threader.  Does ranger at least allow that - get rid of the threading inside
> VRP without regressions?
The ranger calculates ranges.  It does not implement a complete VRP
replacement.  We will implement a full VRP in time.  One of the
original goals was to get this working so we can remove all the
micro-optimizations that VRP does because range info isn't available
anywhere else.  The clean goal is that all the threading-related range
work should be done in the threader via calls to the ranger.

So, to more directly answer the question: I do not know the state
today.  I'm not the threader dude :-)  Aldy enhanced the backward
threader to do some things we couldn't do before, and converted those
other 3 passes.  I think Jeff has on his plate to remove the threading
parts from VRP and put them back in the threader, so he can clarify
this.  By the end of the stage I would expect that threader work to be
complete and out of VRP.


> Just as a final note - open source development doesn't work like this,
> citing you "I'd like to introduce a project we've been working on for
> the past year
> an a half." - this simply provokes "reviews" like the above.
Well, it is what it is.  I see nothing wrong with the "review"; quite
frankly, it's pretty much as expected.  I think it would have been
almost identical if I'd talked about it last year.

And why can't it work like this?  We aren't suggesting we shove this
into trunk right now.  We've been experimenting with the technology,
and now we think it's developed to the point where it can be of
benefit, and we can demonstrate it works.  It's on a branch, and if
anyone else wants to help, awesome.  We have a to-do list.  It'll
probably be months before we're even ready to be considered for a
merge.  It has now advanced to the point where we have tangible things
to talk about for the first time.

Sure, we could replace irange with value_range, and range-ops with some
serious reconfiguration of the range extraction code, and the ranger
would still work; it just needs the same range API.  I happen to think
irange and range-ops offer benefits that value_range can't, and so
doing that would be a step backwards.

I chose to create irange so we had the flexibility to enhance its
abilities in the future while offering some improvements now.
Range-ops was a desire to unite the range processing code in one place,
and I needed to add the ability to do algebraic rearrangements that we
currently don't do in VRP: the op1_irange and op2_irange code that
derives ranges for operands given a LHS and the other operand.  Yeah,
it's not in one place yet, but it will be.

So at what point do we decide community involvement is appropriate?  As
soon as I get the idea?  Maybe after I had the third prototype sort-of
working, indicating the approach was feasible?  It was pretty ugly
then; I wouldn't want anyone to have seen that :-)  All the exact same
questions would have come up then as now: we need to handle symbolics;
it's too expensive; reuse the existing range code.  Rather than waving
my hands to answer and not really accomplishing anything, I can now
point to tangible things (mostly :-).  I would have still pushed for
irange and range-ops because I think they are the right strategic
solution.  Maybe someone else could have helped advance things a little
faster, but I was a pretty big bottleneck on design and reworking
things.  Ask Aldy.

The only thing we ever had working that was even worth considering
submitting was the irange work last July, and that did not get very
far.  We could also have waited another year until we had a complete
VRP replacement working.  Then it's a drop-in solution that removes a
lot of the arguments about code sharing and not re-using value_range.
That might have been even easier!  I do think it's reached the point
where there is a lot of benefit to be derived now, so here it is; let's
figure out what we can do with it.

Anyway, the "lateness" of the introduction is purely on me.  I like to
get things working before talking about them, especially when they
start with half-baked ideas.

I think this is a good strategic direction: it demonstrates benefit
now, and has a path to a better and easier-to-maintain VRP with a
compiler-wide range resource.

Andrew







^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Project Ranger
  2018-06-01 20:38   ` Andrew MacLeod
@ 2018-06-04 15:17     ` Richard Biener
  2018-06-04 15:22       ` Richard Biener
  2018-06-06  2:42       ` Andrew MacLeod
  0 siblings, 2 replies; 14+ messages in thread
From: Richard Biener @ 2018-06-04 15:17 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: GCC Development, Aldy Hernandez, Jeff Law

On Fri, Jun 1, 2018 at 10:38 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>
> On 06/01/2018 05:48 AM, Richard Biener wrote:
>
> On Wed, May 30, 2018 at 1:53 AM Andrew MacLeod <amacleod@redhat.com> wrote:

bah, gmail now completely mangles quoting :/

>
> This allows queries for a range on an edge, on entry to a block, as an
> operand on an specific statement, or to calculate the range of the
> result of a statement.  There are no prerequisites to use it, simply
> create a path ranger and start using the API.   There is even an
> available function which can be lightly called and doesn't require
> knowledge of the ranger:
>
> So what the wiki pages do not show is how contextual information
> is handled (and how that's done w/o dominators as the wiki page
> claims).  That is, path_range_on_edge suggests that range information
> provided by the (all) controlling stmts of that edge are used when
> determining the range for 'name'.
>
> the wiki doesn't talk about it, but the document itself goes into great detail of how the dynamic process works.
>
> That's probably true for the 'stmt' variant as well.
>
> This leads me to guess that either the cache has to be
> O(number of SSA uses) size or a simple VRP pass implemented
> using Ranger querying ranges of all SSA ops for each stmt
> will require O(number of SSA uses) times analysis?
>
> The current cache is not particularly efficient, its size is  #ssa_name * #BB, assuming you actually go and look at every basic block, which many passes do not.  It also only puts a range in there when it has to.    Im not sure why it really matters unless we find a pathological case that kills compile time which cannot be addressed.  This would all fall under performance analysis and tweaking.  Its the end performance that matters.
>
> When a  range request is made for NAME in a block, it requests a range on each incoming edge, and then unions those ranges, and caches it. This is the range-on-entry cache.   The next time a range request for  NAME is made for on-entry to that block, it simply picks it up from the cache.
>
> The requests for range on an edge uses the gori_map summary to decide whether that block creates a range for NAME or not.
> If the block does not generate a range, it simply requests the range-on-entry to that block.
> If it does generate a range, then it examines the statements defining the condition, gets ranges for those,  and calculates the range that would be generated on that edge.  It also requests ranges for any of those names in the defining chain which originate outside this block.
>
> So it queries all the names which feed other names and so on. The queries end when a statement is found that doesn't generate a useful range. Sometimes it has to go hunt quite a way, but usually they are nearby.  And caching the range-on-entry values prevents the lookups from doing the same work over and over.
>
> Once a range has been calculated, we don't spend much more time calculating it again.   Any other ssa-name which uses that name also doesnt need to recalculate it.  For the most part, we walk the def chains for any given ssa-name and its feeding names once and future requests pretty much come from the cache.
>
> So yes, there is also a lot of recursion in here at the moment. calls for ranges spawning other calls before they can be resolved.  I suspect sometimes this call chain is deep. I don't know, we haven't performed any analysis on it yet.
>
> One of the advantages of a DOM-style or SSA-with-ASSERT_EXPR
> pass is that the number of evaluations and the size of the cache
> isn't that way "unbound".  On the WIKI page you mention edge
> info isn't cached yet - whatever that means ;)
>
> So I wonder what's the speed for doing
>
> The on-entry caches prevent it from being "unbound". . and the intersection of incoming edge calculation performs the equivalent merging of ranges that DOM based processing give you.
>
> if (x > 10)
>    A
> else
>    B
> C
>
> block A has x [11, MAX]
> block B has x [MIN, 10]
> block C, at the bottom of a diamond, the 2 incoming edges union those 2 ranges back to [MIN, MAX] and we know nothing about the range of X again.  Accomplished the same thing dominators would tell you.
>
> The edge info that "isnt cached yet", (and may not ever be), is simply the 'static' info calculated by evaluating the few statements at the bottom of the block feeding the condition.  any values coming into the block are cached.    I this case its simply looking at x > 10 and calculating the range on whichever edge.    One of the original design elements called for this to be cached, but in practice I don't think it needs to be.  But we haven't analyzed the performance yet.. so it can only get quicker :-)
>
>
>
>
>   FOR_EACH_BB_FN (bb)
>      FOR_EACH_STMT (stmt)
>        FOR_EACH_SSA_USE (stmt)
>            path_range_on_stmt (..., SSA-USE, stmt);
>
> assuming that it takes into account controlling predicates.
> That's actually what EVRP does (within a DOM walk) given
> it tries to simplify each stmt with based on range info of the ops.
>
>
>   FOR_EACH_BB_FN (bb)
>      FOR_EACH_STMT (stmt)
>         path_range_on_stmt (..., stmt);
>
> would be sufficient.  In order to calculate the range for the definition of the statement, the Ranger will query the range of each SSA operand.
>
> I had Aldy run his cumulative .ii  times with a patch added to EVRP which adds a ranger and makes a call to path_range_stmt on each PHI and every stmt as  evrp processes it.
>
> evrp original time: 2025055 usec
> evrp + ranger:      3787678 usec
>                     -------
>       ranger time:  1762623
>
> So to be fair we aren't actually doing anything and evrp is, but the actual folding of statement is not a big time killer which is the only additional thing EVRP is doing.  This indicates that for a full-on stmt-by-stmt walk and process, we're certainly in the same time ballpark.
>
> A significant point is that the stmt-by-stmt walk is also not the intended target. Thats old school now :-)   Many clients only care about a few ranges.
>
> The "quick and dirty" RVRP pass only looks at the last condition in the block and catches the vast majority of what the stmt-by-stmt walk accomplishes. If timings for the RVRP pass are any indication, I would say it walks/calculates 60% of the statements in the program to calculate flow conditions.  Processing a lot of arithmetic statements that can never result in a fold seems kinda pointless. leave those to constant prop, (which could use the ranger as client).   The extra stmt-by-stmt opprtunities found mostly appear to find PHIs that can be folded and the odd condition that doesn't feed a branch.
>
> EVRP also *has* to walk each statement because that is the the model it uses to find ranges. build them from the top down. The ranger model does not have this need.   If uses of the ranger were more prolific in other passes, I think a lot of things the existing VRP passes get on these statements would be caught in the normal processing the other passes do.  We did put a stmt-by-stmt RVRP on the todo list mostly so we could get a better apple-to-apple time comparison and to make sure we would eventually have full coverage of its VRP's abilities.
>
> I would also note again that we have made no attempt to optimize things yet. We don't really know how much extra work we're doing that doesn't need to be redone. We just know the current speed is a pretty good starting point.  We're seeing big improvements in the passes that are changed to use it.
>
> we will get to finding the inefficiencies and try to address them.
>
>
>
> More generally my remarks are still the same as with the irange introduction
> attempt.
>
> It's a no-no to add yet another set of range op primitives given we already have
> them implemented (with trees, yes...) in tre-vrp.c.  The existing ones
> are already
> factored out to some extent so I expect you will continue that and split out
> the wide-int cases from those existing routines.  Yes - that will not be very
> C++-ish - but who cares.  Simply implement C-style workers, for example
> be inspired by the tree-ssa-ccp.c:bit_value_* routines implementing
> a kind of non-zero-bits operation primitives ontop of wide_ints (and also
> wrapping those with tree variants).
>
> Sharing of code to do the same work was covered as something we will get to. Aldy is working on that so there is not dual maintenance. I had started it with the wide-int-aux.[ch] file in the branch but never got very far.   Yeah, its not super pretty, but it is what it is.
>
> It wont be perfect however as many of those routines are written with the assumption that ranges are either a single range or an anti range. we no longer have that restriction and we will diverge in those cases.

But having 1 single range or N ranges isn't different when it comes to
the core working routines, because if you have N ranges you simply
apply the op N times and union the results.  Yes, the special case is
now that OP (range) can result in multiple ranges, but currently all
the interesting cases you likely implement are simply covered by
returning either a range or an anti-range.  More generally you'd not
want N ranges but some affine thing, so you can handle even more cases
without blowing up storage.

> Extricating the code from the symbolic handling is also not straightforward some times as there are big switches which mix and match various bits of each case.

Yes.  Which is why I never really got far with dumping the symbolic
stuff, because it's hard to come up with a replacement.  I guess in
reality you want to model symbolic ranges as constraints on affine
expressions and then have some constraint solver to answer queries --
at least for symbolic ranges the interesting transforms involve
comparison simplifications (aka jump threading) only.

> We will however try to distill it down to the wide-int aspects that are common.
>
>  I would also argue that long term I see value_range disappearing completely... and then there wont be any dual maintenance :-)
>
>
>
> For most of your on-demand uses I would expect a _much_ simpler and
> less heavy-weight approach to be extending the recently merged
> determine_value_range (I've pointed you to this multiple times in the
> past) to walk SSA def-use chains using the existing SSA on-the-side
> info as cache (yes, w/o use-granular contextual information).
>
> I think what we have is pretty lightweight and more powerful than that would be.  I started from scratch so we could get all the ideas in one place working together and not suffer from limits imposed by existing assumptions or information not easily available.    I think our results confirm those benefits. Its fast, easy to use, and its already starting to get things EVRP does not get. It is also showing signs of finding things VRP itself has trouble with.  When we handle equivalencies I would expect we will exceed current capabilities.
>
>
> I also think that purely restricting yourself to wide-ints is a mistake - we
> do very well want to have range information for REAL_CSTs (or composite
> integers / reals).  How do you think of extending Ranger to handle different
> kind of types?  Eric already raised the issue of symbolic ranges.
>
> This is in fact one of the main reasons we wrote everything new, including a distinct range class. We expected to want to make this work on other types.
>
> Irange and range-ops is a class to handle integer ranges.  We have a very well defined API for ranges now.  The range engine itself simply uses this API.
>
> If we write class 'real_range' to manipulate & store real ranges and implement 'real_range_ops' to teach the compiler  how real ranges are extracted from expressions, the entire ranger engine will work on reals.
>
> There are 2 options to proceed with this
>
> a) we'd either template the ranger to use <class Range> instead of class irange, or at least extract out the common code and template that.  And then we'll have the same on-demand ranger  for "real" ranges, or "complex" or whatever.   so instead of invoking it with 'path_ranger r', it'd be something like 'path_ranger<irange> r',  or path_ranger<real_range> r;
>
> I like that less because I hate templates, but it is an option.  The more likely case I planned for is:
>
> b) adjust class irange to derive from a base range class with the API as virtual functions, and inherit irange and real_range from that  for the ranger to call. Then it would handle both simultaneously.  We'd have some tweaking to do where the ranger currently checks the type of things to be integral, but nothing too serious.  The entire structure of the range engine will still apply, and we could process both simultaneously.
>
> The second model was also intended from the beginning to also allow us to adjust the number of sub-ranges dynamically. Right now we set it to 3 sub-ranges at compile time, but he intention is to allow this to vary if we want.  There are cases where some optimization might be willing to spend extra time/memory in order to get really accurate range info. Maybe switch analysis could use an irange  which allowed up to 1000 subranges so it could very precisely map what the value is at any given point.  The general case doesn't need that extra memory usage however.
>
> It just gives us flexibility we don't currently have.  We also expect to do some tests and find out what the "optimum" default number of ranges are.  I chose 3 because it was handling cases I wanted to get we couldn't get before.  Maybe a different number is better, we'd have to run some experiments to see how often we could use more ranges.
>
>
>
> Yes, I do like having wide-int workers for ranges.
>
> Yes, I do like the idea of making ranges not restricted to single sub-ranges
> (didn't review the implementation).
>
> But those things should have been done on the existing code, not sticking
> sth completely new into GCC that cannot subsume the existing stuff.  That's
> the road to maintainance hell which we unfortunately walk way too often
> with GCC :/
>
> I argue that this will subsume all the existing code related to it.   Sure we cannot right this moment, its not a small task. I think our pathway out of maintenance hell involves removing the existing value_range code and the various incarnations of VRP we have and using something modern designed from scratch to solve these problems.  I would also predict we could do in the release after this next one. So we'd have one release with the ranger and VRP brothers both present, and after that it would be just the Ranger.
>
> Btw, the original goal of EVRP was to get rid of the ASSERT_EXRP-style
> VRP pass once the threading bits it has are subsumed by the backwards
> threader.  Does ranger at least allow that - get rid of the threading inside
> VRP without regressions?
>
> The ranger calculates ranges. It does not implement a complete VRP replacement.  We will implement a full VRP in time. One of the original goals was to get this working so we can remove all the micro-optimizations that VRP does because range info isn't available anywhere else.   The clean goal is all the threading related range work should be done in threading via calls to the ranger.
>
> So to more directly answer the question.   I do not know the state today.  I'm not the threader dude :-)   Aldy enhanced the backward threader to do some things we couldn't do before, and converted those other 3 passes.  I think Jeff has on his plate to remove the threading parts from VRP and put them back in the threader.  So he can clarify this.   By the end of the stage I would expect that threader work to be complete and out of VRP.
>
>
> Just as a final note - open source development doesn't work like this,
> citing you "I'd like to introduce a project we've been working on for
> the past year
> an a half." - this simply provokes "reviews" like the above.
>
> well, it is what it is.   I see nothing wrong with the "review".   quite frankly, it's pretty much as expected.  I think it would have been almost identical if I'd talked about it last year.
>
> And why can't it can work like this?  We aren't suggesting we shove this into trunk right now.  We've been experimenting with the technology, and now we think its developed to the point where we think it can be of benefit, and we can demonstrate it works.  Its on a branch and if anyone else wants to help.  awesome.  We have a to do list.   It'll probably be months before we're even ready to be considered for a merge.  It has now advanced to the point where we have tangible things to talk about for the first time.
>
> Sure, we could replace irange with value_range and range-ops with some serious reconfiguration of the range extraction code, and the ranger would still work. It just needs the same range API.  I happen to think irange and range-ops offers benefits that value_range can't, and so to do that would be a step backwards.
>
> I chose to create irange so we had the flexibility to enhance its ability in the future while offering some improvements now.
> Range-ops was a desire to unite the range processing code in one place, and I needed to add the ability to do algebraic re-arrangements that we currently dont do in VRP... the op1_irange and op2_irange code that derives ranges for operands given a LHS and the other operand.   Yeah, its not in one place yet, but it will be.
>
> So at what point do we decide community involvement is appropriate?  As soon as I get the idea?  Maybe after I had the third prototype sort-of working, indicating the approach was feasible?  It was pretty ugly then; I wouldn't want anyone to have seen that :-)    All the exact same questions would come up then as now: we need to handle symbolics; too expensive; reuse the existing range code.  Rather than waving my hands to answer and not really accomplishing anything, I can now point to tangible things (mostly :-).  I would have still pushed for irange and range-ops because I think they are the right strategic solution.  Maybe someone else could have helped advance things a little faster, but I was a pretty big bottleneck on design and reworking things.  Ask Aldy.
>
> The only thing we ever had working that was even worth considering submitting was the irange work last July, and that did not get very far.  We could also have waited another year until we had a complete VRP replacement working.  Then it's a drop-in solution that removes a lot of the arguments about code sharing and not re-using value_range.  That might have been even easier!  I do think it's reached the point where there is a lot of benefit to be derived now, so here it is; let's figure out what we can do with it.
>
> Anyway, the "lateness" of the introduction is purely on me.  I like to get things working before talking about them, especially when they start as half-baked ideas.

I would have liked to see factoring out from the current code first, and
actually enhancing the current value-range representation by removing
the restriction on single ranges.

 I don't very much believe in the "start from scratch" approach,
because history shows we always end up with both the old and the new
afterwards.

Richard.

> I think this is a good strategic direction to go, demonstrates benefit now, and has a path to a better/easier to maintain VRP with a compiler-wide range resource.
>
> Andrew

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Project Ranger
  2018-06-04 15:17     ` Richard Biener
@ 2018-06-04 15:22       ` Richard Biener
  2018-06-06  6:33         ` Andrew MacLeod
  2018-06-06  2:42       ` Andrew MacLeod
  1 sibling, 1 reply; 14+ messages in thread
From: Richard Biener @ 2018-06-04 15:22 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: GCC Development, Aldy Hernandez, Jeff Law

On Mon, Jun 4, 2018 at 5:17 PM Richard Biener
<richard.guenther@gmail.com> wrote:>
> On Fri, Jun 1, 2018 at 10:38 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> >
> > On 06/01/2018 05:48 AM, Richard Biener wrote:
> >
> > On Wed, May 30, 2018 at 1:53 AM Andrew MacLeod <amacleod@redhat.com> wrote:
>
> bah, gmail now completely mangles quoting :/

Ah no, you sent a multipart mail and it showed the html variant.  Replying in
plain-text mode messes things up then.  And your mail likely got rejected
by the list anyway.

> >
> > This allows queries for a range on an edge, on entry to a block, as an
> > operand on a specific statement, or to calculate the range of the
> > result of a statement.  There are no prerequisites to use it, simply
> > create a path ranger and start using the API.   There is even an
> > available function which can be lightly called and doesn't require
> > knowledge of the ranger:
> >
> > So what the wiki pages do not show is how contextual information
> > is handled (and how that's done w/o dominators as the wiki page
> > claims).  That is, path_range_on_edge suggests that range information
> > provided by the (all) controlling stmts of that edge are used when
> > determining the range for 'name'.
> >
> > the wiki doesn't talk about it, but the document itself goes into great detail of how the dynamic process works.
> >
> > That's probably true for the 'stmt' variant as well.
> >
> > This leads me to guess that either the cache has to be
> > O(number of SSA uses) size or a simple VRP pass implemented
> > using Ranger querying ranges of all SSA ops for each stmt
> > will require O(number of SSA uses) times analysis?
> >
> > The current cache is not particularly efficient; its size is #ssa_names * #BBs, assuming you actually go and look at every basic block, which many passes do not.  It also only puts a range in there when it has to.  I'm not sure why it really matters unless we find a pathological case that kills compile time which cannot be addressed.  This would all fall under performance analysis and tweaking; it's the end performance that matters.
> >
> > When a range request is made for NAME in a block, it requests a range on each incoming edge, unions those ranges, and caches the result.  This is the range-on-entry cache.  The next time a range request for NAME is made on entry to that block, it is simply picked up from the cache.
> >
> > The requests for range on an edge uses the gori_map summary to decide whether that block creates a range for NAME or not.
> > If the block does not generate a range, it simply requests the range-on-entry to that block.
> > If it does generate a range, then it examines the statements defining the condition, gets ranges for those,  and calculates the range that would be generated on that edge.  It also requests ranges for any of those names in the defining chain which originate outside this block.
> >
> > So it queries all the names which feed other names and so on. The queries end when a statement is found that doesn't generate a useful range. Sometimes it has to go hunt quite a way, but usually they are nearby.  And caching the range-on-entry values prevents the lookups from doing the same work over and over.
> >
> > Once a range has been calculated, we don't spend much more time calculating it again.  Any other ssa-name which uses that name also doesn't need to recalculate it.  For the most part, we walk the def chains for any given ssa-name and its feeding names once, and future requests pretty much come from the cache.
> >
> > So yes, there is also a lot of recursion in here at the moment: calls for ranges spawning other calls before they can be resolved.  I suspect sometimes this call chain is deep.  I don't know; we haven't performed any analysis on it yet.
> >
> > One of the advantages of a DOM-style or SSA-with-ASSERT_EXPR
> > pass is that the number of evaluations and the size of the cache
> > isn't that way "unbound".  On the WIKI page you mention edge
> > info isn't cached yet - whatever that means ;)
> >
> > So I wonder what's the speed for doing
> >
> > The on-entry caches prevent it from being "unbound", and the combining of incoming edge calculations performs the equivalent merging of ranges that DOM-based processing gives you.
> >
> > if (x > 10)
> >    A
> > else
> >    B
> > C
> >
> > block A has x [11, MAX]
> > block B has x [MIN, 10]
> > In block C, at the bottom of the diamond, the 2 incoming edges union those 2 ranges back to [MIN, MAX], and we know nothing about the range of x again.  That accomplishes the same thing dominators would tell you.
> >
> > The edge info that "isn't cached yet" (and may never be) is simply the 'static' info calculated by evaluating the few statements at the bottom of the block feeding the condition; any values coming into the block are cached.  In this case it's simply looking at x > 10 and calculating the range on whichever edge.  One of the original design elements called for this to be cached, but in practice I don't think it needs to be.  But we haven't analyzed the performance yet... so it can only get quicker :-)
> >
> >
> >
> >
> >   FOR_EACH_BB_FN (bb)
> >      FOR_EACH_STMT (stmt)
> >        FOR_EACH_SSA_USE (stmt)
> >            path_range_on_stmt (..., SSA-USE, stmt);
> >
> > assuming that it takes into account controlling predicates.
> > That's actually what EVRP does (within a DOM walk) given
> > it tries to simplify each stmt with based on range info of the ops.
> >
> >
> >   FOR_EACH_BB_FN (bb)
> >      FOR_EACH_STMT (stmt)
> >         path_range_on_stmt (..., stmt);
> >
> > would be sufficient.  In order to calculate the range for the definition of the statement, the Ranger will query the range of each SSA operand.
> >
> > I had Aldy run his cumulative .ii  times with a patch added to EVRP which adds a ranger and makes a call to path_range_stmt on each PHI and every stmt as  evrp processes it.
> >
> > evrp original time: 2025055 usec
> > evrp + ranger:      3787678 usec
> >                     -------
> >       ranger time:  1762623
> >
> > So to be fair, we aren't actually doing anything and evrp is; but the actual folding of statements, which is the only additional thing EVRP is doing, is not a big time killer.  This indicates that for a full-on stmt-by-stmt walk and process, we're certainly in the same time ballpark.
> >
> > A significant point is that the stmt-by-stmt walk is also not the intended target. Thats old school now :-)   Many clients only care about a few ranges.
> >
> > The "quick and dirty" RVRP pass only looks at the last condition in the block and catches the vast majority of what the stmt-by-stmt walk accomplishes. If timings for the RVRP pass are any indication, I would say it walks/calculates 60% of the statements in the program to calculate flow conditions.  Processing a lot of arithmetic statements that can never result in a fold seems kinda pointless. leave those to constant prop, (which could use the ranger as client).   The extra stmt-by-stmt opprtunities found mostly appear to find PHIs that can be folded and the odd condition that doesn't feed a branch.
> >
> > EVRP also *has* to walk each statement because that is the model it uses to find ranges: build them from the top down.  The ranger model does not have this need.  If uses of the ranger were more prolific in other passes, I think a lot of the things the existing VRP passes get on these statements would be caught in the normal processing the other passes do.  We did put a stmt-by-stmt RVRP on the todo list, mostly so we could get a better apples-to-apples time comparison and make sure we would eventually have full coverage of VRP's abilities.
> >
> > I would also note again that we have made no attempt to optimize things yet. We don't really know how much extra work we're doing that doesn't need to be redone. We just know the current speed is a pretty good starting point.  We're seeing big improvements in the passes that are changed to use it.
> >
> > we will get to finding the inefficiencies and try to address them.
> >
> >
> >
> > More generally my remarks are still the same as with the irange introduction
> > attempt.
> >
> > It's a no-no to add yet another set of range op primitives given we already have
> > them implemented (with trees, yes...) in tree-vrp.c.  The existing ones
> > are already
> > factored out to some extent so I expect you will continue that and split out
> > the wide-int cases from those existing routines.  Yes - that will not be very
> > C++-ish - but who cares.  Simply implement C-style workers, for example
> > be inspired by the tree-ssa-ccp.c:bit_value_* routines implementing
> > a kind of non-zero-bits operation primitives on top of wide_ints (and also
> > wrapping those with tree variants).
> >
> > Sharing of code to do the same work was covered as something we will get to; Aldy is working on that, so there is no dual maintenance.  I had started it with the wide-int-aux.[ch] file on the branch but never got very far.  Yeah, it's not super pretty, but it is what it is.
> >
> > It won't be perfect, however, as many of those routines are written with the assumption that ranges are either a single range or an anti-range.  We no longer have that restriction, and we will diverge in those cases.
>
> But having 1 single range or N ranges isn't different when it comes to
> the core working routines.  Because if you have N ranges it's simply
> apply the op N times and union the result.  Yes - the special case is
> now that OP (range) can result in multiple ranges but currently
> all interesting cases you likely implement are simply covered by
> returning either a range or an anti-range.  Because more generally
> you'd not want N ranges but some affine thing so you can handle even
> more cases without blowing up storage.
>
> > Extricating the code from the symbolic handling is also not straightforward sometimes, as there are big switches which mix and match various bits of each case.
>
> Yes.  Which is why I never really got far with dumping the symbolic
> stuff because it's hard to come up with a replacement.  I guess
> in reality you want to model symbolic ranges as constraints on affine
> expressions and then have some constraint solver to
> answer queries -- at least for symbolic ranges the interesting
> transforms involve comparison simplifications (aka jump threading)
> only.
>
> > We will however try to distill it down to the wide-int aspects that are common.
> >
> >  I would also argue that long term I see value_range disappearing completely... and then there won't be any dual maintenance :-)
> >
> >
> >
> > For most of your on-demand uses I would expect a _much_ simpler and
> > less heavy-weight approach to be extending the recently merged
> > determine_value_range (I've pointed you to this multiple times in the
> > past) to walk SSA def-use chains using the existing SSA on-the-side
> > info as cache (yes, w/o use-granular contextual information).
> >
> > I think what we have is pretty lightweight and more powerful than that would be.  I started from scratch so we could get all the ideas in one place, working together, and not suffer from limits imposed by existing assumptions or information not easily available.  I think our results confirm those benefits: it's fast, easy to use, and it's already starting to get things EVRP does not get.  It is also showing signs of finding things VRP itself has trouble with.  When we handle equivalencies, I would expect we will exceed current capabilities.
> >
> >
> > I also think that purely restricting yourself to wide-ints is a mistake - we
> > do very well want to have range information for REAL_CSTs (or composite
> > integers / reals).  How do you think of extending Ranger to handle different
> > kind of types?  Eric already raised the issue of symbolic ranges.
> >
> > This is in fact one of the main reasons we wrote everything new, including a distinct range class. We expected to want to make this work on other types.
> >
> > irange and range-ops handle integer ranges.  We have a very well-defined API for ranges now; the range engine itself simply uses this API.
> >
> > If we write class 'real_range' to manipulate & store real ranges and implement 'real_range_ops' to teach the compiler  how real ranges are extracted from expressions, the entire ranger engine will work on reals.
> >
> > There are 2 options to proceed with this
> >
> > a) we'd either template the ranger to use <class Range> instead of class irange, or at least extract out the common code and template that.  And then we'll have the same on-demand ranger  for "real" ranges, or "complex" or whatever.   so instead of invoking it with 'path_ranger r', it'd be something like 'path_ranger<irange> r',  or path_ranger<real_range> r;
> >
> > I like that less because I hate templates, but it is an option.  The more likely case I planned for is:
> >
> > b) adjust class irange to derive from a base range class with the API as virtual functions, and inherit irange and real_range from that  for the ranger to call. Then it would handle both simultaneously.  We'd have some tweaking to do where the ranger currently checks the type of things to be integral, but nothing too serious.  The entire structure of the range engine will still apply, and we could process both simultaneously.
> >
> > The second model was also intended from the beginning to allow us to adjust the number of sub-ranges dynamically.  Right now we set it to 3 sub-ranges at compile time, but the intention is to allow this to vary if we want.  There are cases where some optimization might be willing to spend extra time/memory in order to get really accurate range info.  Maybe switch analysis could use an irange which allowed up to 1000 sub-ranges so it could very precisely map what the value is at any given point.  The general case doesn't need that extra memory usage, however.
> >
> > It just gives us flexibility we don't currently have.  We also expect to do some tests and find out what the "optimum" default number of ranges is.  I chose 3 because it was handling cases I wanted that we couldn't get before.  Maybe a different number is better; we'd have to run some experiments to see how often we could use more ranges.
> >
> >
> >
> > Yes, I do like having wide-int workers for ranges.
> >
> > Yes, I do like the idea of making ranges not restricted to single sub-ranges
> > (didn't review the implementation).
> >
> > But those things should have been done on the existing code, not sticking
> > sth completely new into GCC that cannot subsume the existing stuff.  That's
> > the road to maintainance hell which we unfortunately walk way too often
> > with GCC :/
> >
> > I argue that this will subsume all the existing code related to it.  Sure, we can't do that right this moment; it's not a small task.  I think our pathway out of maintenance hell involves removing the existing value_range code and the various incarnations of VRP we have, and using something modern, designed from scratch to solve these problems.  I would also predict we could do it in the release after this next one.  So we'd have one release with the ranger and the VRP brothers both present, and after that it would be just the Ranger.
> >
> > Btw, the original goal of EVRP was to get rid of the ASSERT_EXPR-style
> > VRP pass once the threading bits it has are subsumed by the backwards
> > threader.  Does ranger at least allow that - get rid of the threading inside
> > VRP without regressions?
> >
> > The ranger calculates ranges. It does not implement a complete VRP replacement.  We will implement a full VRP in time. One of the original goals was to get this working so we can remove all the micro-optimizations that VRP does because range info isn't available anywhere else.   The clean goal is all the threading related range work should be done in threading via calls to the ranger.
> >
> > So to more directly answer the question.   I do not know the state today.  I'm not the threader dude :-)   Aldy enhanced the backward threader to do some things we couldn't do before, and converted those other 3 passes.  I think Jeff has on his plate to remove the threading parts from VRP and put them back in the threader.  So he can clarify this.   By the end of the stage I would expect that threader work to be complete and out of VRP.
> >
> >
> > Just as a final note - open source development doesn't work like this,
> > citing you "I'd like to introduce a project we've been working on for
> > the past year
> > an a half." - this simply provokes "reviews" like the above.
> >
> > Well, it is what it is.  I see nothing wrong with the "review"; quite frankly, it's pretty much as expected.  I think it would have been almost identical if I'd talked about it last year.
> >
> > And why can't it work like this?  We aren't suggesting we shove this into trunk right now.  We've been experimenting with the technology, and now we think it's developed to the point where it can be of benefit, and we can demonstrate it works.  It's on a branch, and if anyone else wants to help, awesome.  We have a to-do list.  It'll probably be months before we're even ready to be considered for a merge.  It has now advanced to the point where we have tangible things to talk about for the first time.
> >
> > Sure, we could replace irange with value_range and range-ops with some serious reconfiguration of the range extraction code, and the ranger would still work; it just needs the same range API.  I happen to think irange and range-ops offer benefits that value_range can't, so doing that would be a step backwards.
> >
> > I chose to create irange so we had the flexibility to enhance its ability in the future while offering some improvements now.
> > Range-ops was a desire to unite the range processing code in one place, and I needed to add the ability to do algebraic rearrangements that we currently don't do in VRP: the op1_irange and op2_irange code that derives ranges for operands given a LHS and the other operand.  Yeah, it's not in one place yet, but it will be.
> >
> > So at what point do we decide community involvement is appropriate?  As soon as I get the idea?  Maybe after I had the third prototype sort-of working, indicating the approach was feasible?  It was pretty ugly then; I wouldn't want anyone to have seen that :-)    All the exact same questions would come up then as now: we need to handle symbolics; too expensive; reuse the existing range code.  Rather than waving my hands to answer and not really accomplishing anything, I can now point to tangible things (mostly :-).  I would have still pushed for irange and range-ops because I think they are the right strategic solution.  Maybe someone else could have helped advance things a little faster, but I was a pretty big bottleneck on design and reworking things.  Ask Aldy.
> >
> > The only thing we ever had working that was even worth considering submitting was the irange work last July, and that did not get very far.  We could also have waited another year until we had a complete VRP replacement working.  Then it's a drop-in solution that removes a lot of the arguments about code sharing and not re-using value_range.  That might have been even easier!  I do think it's reached the point where there is a lot of benefit to be derived now, so here it is; let's figure out what we can do with it.
> >
> > Anyway, the "lateness" of the introduction is purely on me.  I like to get things working before talking about them, especially when they start as half-baked ideas.
>
> I would have liked to see factoring out from the current code first, and
> actually enhancing the current value-range representation by removing
> the restriction on single ranges.
>
>  I don't very much believe in the "start from scratch" approach,
> because history shows we always end up with both the old and the new
> afterwards.
>
> Richard.
>
> > I think this is a good strategic direction to go, demonstrates benefit now, and has a path to a better/easier to maintain VRP with a compiler-wide range resource.
> >
> > Andrew


* Re: Project Ranger
  2018-06-04 15:17     ` Richard Biener
  2018-06-04 15:22       ` Richard Biener
@ 2018-06-06  2:42       ` Andrew MacLeod
  1 sibling, 0 replies; 14+ messages in thread
From: Andrew MacLeod @ 2018-06-06  2:42 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Development, Aldy Hernandez, Jeff Law

On 06/04/2018 11:17 AM, Richard Biener wrote:
>
>> It won't be perfect, however, as many of those routines are written with the assumption that ranges are either a single range or an anti-range.  We no longer have that restriction, and we will diverge in those cases.
> But having 1 single range or N ranges isn't different when it comes to
> the core working routines.  Because if you have N ranges it's simply
> apply the op N times and union the result.  Yes - the special case is


> now that OP (range) can result in multiple ranges but currently
> all interesting cases you likely implement are simply covered by
> returning either a range or an anti-range.  Because more generally
Yeah, this applies to most of the arithmetic operations in particular.
The range code indeed loops through the sub-ranges, applying the
operation to each one.
And yes, most cases do result in a single range, which is why it is ripe
for sharing.  But we also aren't limited to that.  There are times where
we can provide different results, such as with AND.  For signed char X,
(X & 0xF0): VRP produces [-128, 112] because it has to be a range or an
anti-range, and has to include 0.  With the 3 sub-ranges we use right
now, we generate [-128, -16][0, 0][16, 112].
Even with 2 ranges, we could generate [-128, 0][16, 112] (or
[-128, -16][0, 112]), both of which are somewhat better.

And we can also do better in cases where ranges are combined; for
signed char,  if (x != 20 && x != 30)  produces
[-128, 19][21, 29][31, 127].
> you'd not want N ranges but some affine thing so you can handle even
> more cases without blowing up storage.
Why don't we want N ranges?  When we exceed the number of supported
sub-ranges in irange, it simply starts merging them.  In the AND example
with 3 sub-ranges supported, we produce [-128, -16][0, 0][16, 112].  If
we only supported 2 ranges instead of 3, the AND code would end up
producing [-128, 0][16, 112] or [-128, -16][0, 112], both of which are
correct, just not as precise as if irange supported more.
Class irange takes care of managing the storage and the exact number of
ranges, so none of the code in range-ops changes to deal with this; it
just works on whatever it is given to produce the best result it can.

>> Extricating the code from the symbolic handling is also not straightforward some times as there are big switches which mix and match various bits of each case.
> Yes.  Which is why I never really got far with dumping the symbolic
> stuff because it's hard to come up with a replacement.  I guess
> in reality you want to model symbolic ranges as constraints on affine
> expressions and then have some constraint solver to
> answer queries -- at least for symbolic ranges the interesting
> transforms involve comparison simplifications (aka jump threading)
> only.
>
Yeah, it's not trivial.  The model I'm pursuing tracks the symbolic
relationships separately from the range constraints.  It can use the
range constraints if they are useful, but only if they help.

It focuses on the relationships between symbols generated by statements,
and combines sequentially encountered relationships to determine a new
relationship and make decisions.  And yes, those are primarily
comparison points where we'll try to simplify the condition.

for instance
   i = j + 1
   if (i == j)

we don't need a range for 'i' or 'j' to determine this is never true;
the relationship established by i = j + 1 is either (i != j) or (i < j),
depending on the type and range of 'i' and 'j'.  Either result, applied
to the following (i == j) relationship, makes it clear that the second
condition can never be true.  We'll do it in reverse order, like the
ranger's current model: see the condition, decide it merits looking at
for simplification, and look back to see if there are any relationships
feeding it which can simplify it.

Anyway, that's the general direction I'm working in to resolve that set
of problems.  And I plan to integrate it with a copy/cast tracker which
will allow us to see through copies and "equivalent" equivalencies so we
don't get tricked :-)


>>
>> Anyway, the "lateness" of the introduction is purely on me.  I like to get things working before talking about them, especially when they start as half-baked ideas.
> I would have liked to see factoring out from the current code first and
> actually enhancing the current value-range representation by removing
> the restriction on single ranges.

We started with something like that in mind, but the presence of
symbolics embedded in the range made it incredibly awkward to add
support for multiple sub-ranges in the existing representation.

Factoring out the constant-bounds aspects of the code is what Aldy is
currently working on.  When he's finished, the existing functions should
contain just the symbolic processing part of the code, and they will
call a common base to handle the constant-bounds cases.

>
>   I don't very much believe in the "start from scratch" approach.
> Because history shows we always end up with both, the old and the new
> afterwards.

I will try very, very hard to make sure we don't get stuck with both.
Think of it this way... this is a path to being able to hand off all 
range and vrp-related problems to someone else :-)

Andrew

PS. I tried manually overriding thunderbird settings on this to make 
this plain text. Hope it worked :-P.


* Re: Project Ranger
  2018-06-04 15:22       ` Richard Biener
@ 2018-06-06  6:33         ` Andrew MacLeod
  0 siblings, 0 replies; 14+ messages in thread
From: Andrew MacLeod @ 2018-06-06  6:33 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Development, Aldy Hernandez, Jeff Law

On 06/04/2018 11:21 AM, Richard Biener wrote:
> On Mon, Jun 4, 2018 at 5:17 PM Richard Biener
> <richard.guenther@gmail.com> wrote:>
>> On Fri, Jun 1, 2018 at 10:38 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>>> On 06/01/2018 05:48 AM, Richard Biener wrote:
>>>
>>> On Wed, May 30, 2018 at 1:53 AM Andrew MacLeod <amacleod@redhat.com> wrote:
>> bah, gmail now completely mangles quoting :/
> Ah no, you sent a multipart mail and it showed the html variant.  Replying in
> plain-text mode messes things up then.  And your mail likely got rejected
> by the list anyway.
>
No mail got rejected.

Sorry.  I don't know what my mailer does sometimes.  I use Thunderbird,
and gmail is the underlying service.
Anything to the mailing list is supposed to go as plain text only,
according to my Thunderbird settings.  Maybe it sends different things
to different recipients?

About once a year the mailing lists suddenly start rejecting my emails,
and then I go poking through the settings trying to understand why, and
then it just as suddenly stops :-P
I don't have the patience to understand.  Maybe it's a bug.  It was
probably compiled with llvm. :-)

ooo. cheap shot.

Andrew



end of thread, other threads:[~2018-06-06  2:42 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-29 23:53 Project Ranger Andrew MacLeod
2018-05-30  7:49 ` Eric Botcazou
2018-05-30 14:03   ` Andrew MacLeod
2018-06-01  8:16     ` Eric Botcazou
2018-05-30 14:39 ` David Malcolm
2018-05-30 14:45   ` Andrew MacLeod
2018-05-30 14:51   ` Andreas Schwab
2018-05-30 14:53     ` Aldy Hernandez
2018-06-01  9:49 ` Richard Biener
2018-06-01 20:38   ` Andrew MacLeod
2018-06-04 15:17     ` Richard Biener
2018-06-04 15:22       ` Richard Biener
2018-06-06  6:33         ` Andrew MacLeod
2018-06-06  2:42       ` Andrew MacLeod
