* On-Demand range technology [2/5] - Major Components : How it works
@ 2019-05-23 1:28 Andrew MacLeod
2019-05-23 12:55 ` Richard Biener
0 siblings, 1 reply; 21+ messages in thread
From: Andrew MacLeod @ 2019-05-23 1:28 UTC (permalink / raw)
To: GCC; +Cc: Jeff Law, Aldy Hernandez
This note will talk about the 4 major components of the prototype and
explain how they work together.  I will be fairly light on detail, just
to give an overview; we can delve into whatever details are needed.
- Range-ops : Range operations at the statement level
- GORI - Generates Outgoing Range Info : Ranges generated by basic blocks
- GORI Cache - combining and caching basic-block range info
- Ranger - API and query control
1 * Range-ops
--------------------
The first major component is the centralized range operation database.
This is where range operations for tree-codes are implemented. The
compiler currently has code which can fold ranges, but the new mechanism
is a general purpose solver which can solve for other operands. If there
are N input/output operands, and we have ranges for N-1, it is often
possible to derive the missing range, e.g.
   lhs = op1 + op2
The existing code in the compiler can solve for LHS given ranges for OP1
and OP2. This has been extended so that we can also sometimes solve for
either op1 or op2, e.g.
   [20,40] = op1 + [5, 10]
...can calculate that op1 has the range [15, 30]
This ability to solve for the other operands provides the ability to
calculate ranges in the reverse order we are accustomed to, and is key
to enabling the on-demand range approach.
   a_2 = b_1 - 20
   if (a_2 < 40)
 A conditional jump has an implicit boolean LHS depending on which edge
is taken. To evaluate ranges on the TRUE edge of the branch, the LHS is
[1,1]. To calculate the range for a_2 we simply solve the equation:
   [1,1] = a_2 < 40
which provides the answer as [0,39]. Furthermore, substituting this
range for a_2 as the LHS of its definition statement:
   [0,39] = b_1 - 20
The same evaluation mechanism can calculate a result for b_1 on the
true edge as [20,59]. This is the key feature which allows the rest of
the algorithm to work in a general way.
All operations are driven from this component, and the only thing
required to add support for another tree code is to implement one or
more methods for it here, and the rest of the range mechanism will
simply work with it.
2 * GORI
------------
The second component is the "Generates Outgoing Range Info" engine.
This is a basic-block oriented component which determines what ssa-names
have ranges created on outgoing edges, and how to calculate those ranges.
The last statement in the block is examined to see if it is a branch
which generates range info, and then determines which ssa-names in the
block can have ranges calculated. It quickly walks the use-def chain
from the branch to find other definitions within the block that can
impact the branch and could also have their ranges calculated. This
summary is then cached for future use.
The GORI component also uses this summary info to perform the
calculations necessary to determine the outgoing range for any ssa_name
which can be determined. For example:
   c_3 = foo ()
   a_2 = b_1 - 20
   if (a_2 < 40)
The summary information would indicate that b_1 and a_2 can have their
outgoing ranges calculated for this block, and uses the cached
information to quickly calculate them when required.
The API consists of 2 basic methods, query and calculate:
 - bool has_edge_range_p (edge, name);
 - range outgoing_edge_range (edge, name);
If a query is made for any other ssa-name, it simply returns false,
saying this block does not generate a range for that name. This is a key
rationale for the summary: we only spend time processing names for
blocks that actually have ranges generated.
3 * GORI cache
---------------------
The third component builds on the basic block GORI component, and adds
the ability to traverse the CFG and combine the various ranges provided
by each basic block into a cache.
It provides both a global-range cache and a range-on-entry cache
summarizing the range of an ssa_name on entry to the block.
The range-on-entry cache is self-filling, and iteratively evaluates back
edges. There is a single API entry point which asks for the range on
entry; if the data has not been calculated/cached already, it spawns
the following process to calculate it:
   * Walk the CFG from the current block to the definition block,
and/or any block which has range-on-entry information already
calculated. These end points are the source blocks.
   * Any non-source blocks visited are added to a worklist to be updated.
   * Starting with the source blocks, push the known outgoing range
from each block to each successor and update their live-on-entry values
when they are visited. If the incoming range changes, mark this block's
successors as requiring updating.
   * Continue iterating until no more updates are required.
The ordering is planned by the ranger such that if there are no back
edges, it is typically a straightforward single pass. When back edges
are involved, the amount of iterating is typically very small. The
updating is restricted to a single ssa-name, meaning it doesn't get into
feedback loops with other names nor PHIs. It usually converges very
quickly.
It is important to note that this works exclusively with static ranges
of only a single ssa-name at a time; ie, ranges which are implicitly
exposed in the IL, with only that name being examined. The values
returned by these queries are not dependent on changes in other
ssa-names, which is why the iteration process never gets out of control.
The ranger making the calls to fill this cache has a higher level
overview, and requests these ranges in definition order such that any
ssa-names feeding the definition of a name having its cache filled are
resolved first, providing the best possible results the first time.
   a_2 = b_1 - 20
   if (a_2 < 40)
If the range for a_2 is requested on the true side, the ranger will
first calculate the range of b_1 on entry to the block. Then use this to
calculate the global range of a_2, and finally for the outgoing range on
the desired edge.
If at some later point, it is discovered that the incoming range of b_1
has changed in such a way that it has an impact on the outgoing range of
a_2, the iterative update process can be reapplied by the ranger to
update the relevant cache entries. This is usually only required in
cases where multiple ssa-names are affected by back edges and feed each
other.
4 * Ranger
----------------
The final component is the Ranger which provides a simple API to clients
where they can make various simple requests:
   - Range of an ssa-name at a statement location
   - Range of the result of a stmt
   - Range of an ssa-name on an edge
   - Range of an ssa-name after the last statement in a block
   - Range of an ssa name on entry to a block
The ranger is responsible for processing the IL, walking use/def chains
and coordinating the various calls into the GORI components to
pre-satisfy as many conditions as possible before any cache filling is
performed. It is also responsible for triggering any additional updates
which may be required due to newly discovered ranges. We in fact don't
do this yet, but it turns out to be rarely required.
Summary
---------------
All evaluations are done on-demand. If no queries are made, there is no
cost. The Ranger has no prerequisites other than a valid IL and CFG. It
does not need dominators, nor does it require any other changes within
an optimization pass in order to be used.
When queries are made, only the minimum effort required is made to
satisfy the request. This is a big win when it comes to passes which
only need occasional range information such as warning passes. Some of
these passes actually instantiate a new ranger each time they make a
request, as a demonstration of the low overhead.
As for passes such as VRP which require complete range information, the
caching process prevents the same information from being calculated more
than once. This means a similar amount of work is done with this
approach, as with the traditional top-down approach currently being
used, except we also process the back edges and iteratively solve for them.
Comments and feedback always welcome!  Thanks,
Andrew
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-05-23 1:28 On-Demand range technology [2/5] - Major Components : How it works Andrew MacLeod
@ 2019-05-23 12:55 ` Richard Biener
2019-05-24 15:50 ` Andrew MacLeod
0 siblings, 1 reply; 21+ messages in thread
From: Richard Biener @ 2019-05-23 12:55 UTC (permalink / raw)
To: Andrew MacLeod; +Cc: GCC, Jeff Law, Aldy Hernandez
On Thu, May 23, 2019 at 3:28 AM Andrew MacLeod <amacleod@redhat.com> wrote:
>
> *This note will talk about the 4 major components of the prototype and
> explain how they work together. I will be fairly light on detail just
> to give an overview, we can delve into whatever details are needed.
> - Range-ops : Range operations at the statement level
> - GORI - Generates Outgoing Range Info : Ranges generated by basic blocks
> - GORI Cache - combining and caching basic-block range info
> - Ranger - API and query control
>
> 1 * Range-ops
> --------------------
> The first major component is the centralized range operation database.
> This is where range operations for tree-codes are implemented. The
> compiler currently has code which can fold ranges, but the new mechanism
> is a general purpose solver which can solve for other operands. If there
> are N input/output operands, and we have ranges for N-1, It is often
> possible to derive the missing range. ie
> lhs = op1 + op2
> The existing code in the compiler can solve for LHS given ranges for OP1
> and OP2. This has been extended so that we can also sometimes solve for
> either op1 or op2 e, ie
> [20,40] = op1 + [5, 10]
> ...can calculate that op1 has the range [15, 30]
>
> This ability to solve for the other operands provides the ability to
> calculate ranges in the reverse order we are accustomed to, and is key
> to enabling the on-demand range approach.
> a_2 = b_1 - 20
> if (a_2 < 40)
> A conditional jump has an implicit boolean LHS depending on which edge
> is taken. To evaluate ranges on the TRUE edge of the branch, the LHS is
> [1,1]. To calculate the range for a_2 we simply solve the equation:
> [1,1] = a_2 < 40
> which provides the answer as [0,39]. Furthermore, substituting this
> range for a_2 as the LHS of its definition statement:
> [0,39] = b_1 - 20
> The same evaluation mechanism can calculate a result for b_1 on the
> true edge as [20,59]. This is the key feature which allows the rest of
> the algorithm to work in a general way.
>
> All operations are driven from this component, and the only thing
> required to add support for another tree code is to implement one or
> more methods for it here, and the rest of the range mechanism will
> simply work with it.
>
>
> 2 * GORI
> ------------
> The second component is the “Generates Outgoing Range Info” engine.
> This is a basic-block oriented component which determines what ssa-names
> have ranges created on outgoing edges, and how to calculate those ranges.
>
> The last statement in the block is examined to see if it is a branch
> which generates range info, and then determines which ssa-names in the
> block can have ranges calculated. It quickly walks the use-def chain
> from the branch to find other definitions within the block that can
> impact the branch and could also have their ranges calculated. This
> summary is then cached for future use.
>
> The GORI component also uses this summary info to perform the
> calculations necessary to determine the outgoing range for any ssa_name
> which can be determined. For example:
> c_3 = foo ()
> a_2 = b_1 - 20
> If (a_2 < 40)
> The summary information would indicate that b_1 and a_2 can have their
> outgoing ranges calculated for this block, and uses the cached
> information to quickly calculate them when required.
> The API consists of 2 basic methods, query and calculate:
> - bool has_edge_range_p (edge, name);
> - range outgoing_edge_range (edge, name);
> If a query is made for any other ssa-name, it simply returns false and
> says this block does not generate a range for that name. This is a key
> rationale for the summary so we only spend time processing names for
> blocks that actually have ranges generated.
So the current code (in a few cases) derives ranges for SSA uses when
they see for example
_3 = _4 / _5;
here _5 != 0 for the stmts following this one (OK, this is a bad example
because for divisions we don't do this, but we do it for derefs and ~[0,0]).
The above suggests that iff this is done at all it is not in GORI because
those are not conditional stmts or ranges from feeding those. The
machinery doing the use-def walking from stmt context also cannot
come along these so I have the suspicion that Ranger cannot handle
telling us that for the stmt following above, for example
if (_5 != 0)
that _5 is not zero?
Can you clarify?
>
> 3 * GORI cache
> ---------------------
> The third component builds on the basic block GORI component, and adds
> the ability to traverse the CFG and combine the various ranges provided
> by each basic block into a cache.
>
> It provides both a global-range cache and a range-on-entry cache
> summarizing the range of an ssa_name on entry to the block.
>
> The range-on-entry cache is self filling, and iteratively evaluates back
> edges. There is a single API entry point which asks for the range on
> entry, and if the data has not been calculated/cached already, it spawns
> the following process to calculate the data. :
> * Walk the CFG from the current block to the definition block,
> and/or any block which has range-on-entry information already
> calculated. These end points are the source blocks.
> * Any non-source blocks visited are added to a worklist to be updated.
> * Starting with the source blocks, push the known outgoing range
> from each block to each successor and update their live-on-entry values
> when they are visited. If the incoming range changes, mark this block’s
> successors as requiring updating.
> * Continue iterating until no more updates are required.
>
> The ordering is planned by the ranger such that if there are no back
> edges, it is typically a straightforward single pass. When back edges
> are involved, the amount of iterating is typically very small. The
> updating is restricted to a single ssa-name, meaning it doesn’t get into
> feedback loops with other names nor PHIs. . . It usually converges very
> quickly.
>
> It is important to note that this works exclusively with static ranges
> of only a single ssa-name at a time. Ie, ranges which are implicitly
> exposed in the IL, and only name being examined. The values returned by
> these queries are not dependent on changes in other ssa-names, which is
> how the iteration process never gets out of control.
>
> The ranger making the calls to fill this cache has a higher level
> overview, and requests these ranges in definition order such that any
> ssa-names feeding the definition of a name having its cache filled are
> resolved first, providing the best possible results the first time.
> a_2 = b_1 - 20
> If (a_2 < 40)
> If the range for a_2 is requested on the true side, the ranger will
> first calculate the range of b_1 on entry to the block. Then use this to
> calculate the global range of a_2, and finally for the outgoing range on
> the desired edge.
>
> If at some later point, it is discovered that the incoming range of b_1
> has changed in such a way that it has an impact on the outgoing range of
> a_2, the iterative update process can be reapplied by the ranger to
> update the relevant cache entries. This is usually only required in
> cases where multiple ssa-names are affected by back edges and feed each
> other.
>
> 4 * Ranger
> ----------------
> The final component is the Ranger which provides a simple API to clients
> where they can make various simple requests:
> - Range of an ssa-name at a statement location
> - Range of the result of a stmt
> - Range of an ssa-name on an edge
> - Range of an ssa-name after the last statement in a block
> - Range of an ssa name on entry to a block
>
> The ranger is responsible for processing the IL, walking use/def chains
> and coordinating the various calls into the GORI components to
> pre-satisfy as many conditions as possible before any cache filling is
> performed. It is also responsible for triggering any additional updates
> which may be required due to newly discovered ranges. We in fact don’t
> do this yet, but is rarely required as it turns out.
>
> Summary
> ---------------
> All evaluations are done on-demand. If no queries are made, there is no
> cost. The Ranger has no prerequisites other than a valid IL and CFG. It
> does not need dominators, nor does it require any other changes within
> an optimization pass in order to be used.
Does it need fully up-to-date SSA form or does it only rely on appropriate
SSA uses on stmts in the use-def chain of the SSA name we query
range-info for? Does it use immediate uses or stmt operands at all
or just use-def chains?
I'm asking because some passes leave parts of the CFG not updated
while working on other parts to reduce the number of update_ssa calls
(ideally to a single one).
> When queries are made, only the minimum effort required is made to
> satisfy the request. This is a big win when it comes to passes which
> only need occasional range information such as warning passes. Some of
> these passes actually instantiate a new ranger each time they make a
> request, as a demonstration of the low overhead.
>
> As for passes such as VRP which require complete range information, the
> caching process prevents the same information from being calculated more
> than once. This means a similar amount of work is done with this
> approach, as with the traditional top-down approach currently being
> used, except we also process the back edges and iteratively solve for them.
What's the storage requirement for the cache when doing VRP pass like
processing? What's the compile-time complexity? I read you are saying
"similar amount of work" but I guess that's from empirical data comparing
to the existing VRP implementations (SSA VRP and EVRP) rather than
theoretical bounds?
Not sure where to put it so let me put it here: I'd like to see the SSA
based VRP pass to die (in fact I'd like all ssa-propagator based
passes to die...).
EVRP is the future here and I can see EVRP making use of the
reverse part of Range-Ops to replace the ad-hoc pattern matching
in register_edge_assert_for (badly named function that now gets you
back a list of assertions hold on an edge). So I hope to see VRP
yanked (it performing jump threading is the major obstacle) and
Range-Ops being used as common infrastructure for ranges (that leaves
EVRP). I don't really want something "additional" leaving things for
other people to cleanup / merge - honestly you don't have the very best
track record in this area (it might be your main area of work is not GCC,
still the @redhat tag on your mail is an obligation here).
Now I've said the above let me continue with 3/n.
Richard.
> Comments and feedback always welcome!  Thanks,
> Andrew
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-05-23 12:55 ` Richard Biener
@ 2019-05-24 15:50 ` Andrew MacLeod
2019-05-27 13:03 ` Richard Biener
0 siblings, 1 reply; 21+ messages in thread
From: Andrew MacLeod @ 2019-05-24 15:50 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC, Jeff Law, Aldy Hernandez
On 5/23/19 8:55 AM, Richard Biener wrote:
> On Thu, May 23, 2019 at 3:28 AM Andrew MacLeod <amacleod@redhat.com> wrote:
>>
>> 2 * GORI
>> ------------
>> The second component is the "Generates Outgoing Range Info" engine.
>> This is a basic-block oriented component which determines what ssa-names
>> have ranges created on outgoing edges, and how to calculate those ranges.
>>
>> The last statement in the block is examined to see if it is a branch
>> which generates range info, and then determines which ssa-names in the
>> block can have ranges calculated. It quickly walks the use-def chain
>> from the branch to find other definitions within the block that can
>> impact the branch and could also have their ranges calculated. This
>> summary is then cached for future use.
>>
>> The GORI component also uses this summary info to perform the
>> calculations necessary to determine the outgoing range for any ssa_name
>> which can be determined. For example:
>> c_3 = foo ()
>> a_2 = b_1 - 20
>> If (a_2 < 40)
>> The summary information would indicate that b_1 and a_2 can have their
>> outgoing ranges calculated for this block, and uses the cached
>> information to quickly calculate them when required.
>> The API consists of 2 basic methods, query and calculate:
>> - bool has_edge_range_p (edge, name);
>> - range outgoing_edge_range (edge, name);
>> If a query is made for any other ssa-name, it simply returns false and
>> says this block does not generate a range for that name. This is a key
>> rationale for the summary so we only spend time processing names for
>> blocks that actually have ranges generated.
> So the current code (in a few cases) derives ranges for SSA uses when
> they see for example
>
> _3 = _4 / _5;
>
> here _5 != 0 for the stmts following this one (OK, this is a bad example
> because for divisions we don't do this, but we do it for derefs and ~[0,0]).
>
> The above suggests that iff this is done at all it is not in GORI because
> those are not conditional stmts or ranges from feeding those. The
> machinery doing the use-def walking from stmt context also cannot
> come along these so I have the suspicion that Ranger cannot handle
> telling us that for the stmt following above, for example
>
> if (_5 != 0)
>
> that _5 is not zero?
>
> Can you clarify?
So there are 2 aspects to this.  The range-ops code for DIV_EXPR, if
asked for the range of op2, would return ~[0,0] for _5.
But you are also correct in that the walk backwards would not find this.
This is similar functionality to how null_derefs are currently handled,
and in fact could probably be done simultaneously using the same code
base.  I didn't bring null derefs up, but this is a good time :-)
There is a separate class used by the gori-cache which tracks the
non-nullness property at the block level.  It has a single API:
non_null_deref_p (name, bb), which determines whether there is a
dereference of NAME in a given BB, indicating whether the range has an
implicit ~[0,0] range in that basic block or not.
It is implemented by caching a bitmap which has a 1 set for any block
which contains a deref of that name, so after the first call, it is
simply a matter of returning that bit.  The first time the API is
called, it does an immediate-use walk for name and sets the bit for any
block which contains a statement for which we can infer a non-null result.
We'd want the exact same functionality for divide by zero: we can infer
~[0,0] from the DIV_EXPR for the block that statement is in.  The call
is already made; it just returns false right now if it isn't a pointer
type.  I'll add that and make sure it works.
>
>> 3 * GORI cache
>> ---------------------
>> The third component builds on the basic block GORI component, and adds
>> the ability to traverse the CFG and combine the various ranges provided
>> by each basic block into a cache.
>>
>> It provides both a global-range cache and a range-on-entry cache
>> summarizing the range of an ssa_name on entry to the block.
>>
>> The range-on-entry cache is self filling, and iteratively evaluates back
>> edges. There is a single API entry point which asks for the range on
>> entry, and if the data has not been calculated/cached already, it spawns
>> the following process to calculate the data. :
>> * Walk the CFG from the current block to the definition block,
>> and/or any block which has range-on-entry information already
>> calculated. These end points are the source blocks.
>> * Any non-source blocks visited are added to a worklist to be updated.
>> * Starting with the source blocks, push the known outgoing range
>> from each block to each successor and update their live-on-entry values
>> when they are visited. If the incoming range changes, mark this block's
>> successors as requiring updating.
>> * Continue iterating until no more updates are required.
>>
>> The ordering is planned by the ranger such that if there are no back
>> edges, it is typically a straightforward single pass. When back edges
>> are involved, the amount of iterating is typically very small. The
>> updating is restricted to a single ssa-name, meaning it doesn't get into
>> feedback loops with other names nor PHIs. . . It usually converges very
>> quickly.
>>
>> It is important to note that this works exclusively with static ranges
>> of only a single ssa-name at a time. Ie, ranges which are implicitly
>> exposed in the IL, and only name being examined. The values returned by
>> these queries are not dependent on changes in other ssa-names, which is
>> how the iteration process never gets out of control.
>>
>> The ranger making the calls to fill this cache has a higher level
>> overview, and requests these ranges in definition order such that any
>> ssa-names feeding the definition of a name having its cache filled are
>> resolved first, providing the best possible results the first time.
>> a_2 = b_1 - 20
>> If (a_2 < 40)
>> If the range for a_2 is requested on the true side, the ranger will
>> first calculate the range of b_1 on entry to the block. Then use this to
>> calculate the global range of a_2, and finally for the outgoing range on
>> the desired edge.
>>
>> If at some later point, it is discovered that the incoming range of b_1
>> has changed in such a way that it has an impact on the outgoing range of
>> a_2, the iterative update process can be reapplied by the ranger to
>> update the relevant cache entries. This is usually only required in
>> cases where multiple ssa-names are affected by back edges and feed each
>> other.
>>
>> 4 * Ranger
>> ----------------
>> The final component is the Ranger which provides a simple API to clients
>> where they can make various simple requests:
>> - Range of an ssa-name at a statement location
>> - Range of the result of a stmt
>> - Range of an ssa-name on an edge
>> - Range of an ssa-name after the last statement in a block
>> - Range of an ssa name on entry to a block
>>
>> The ranger is responsible for processing the IL, walking use/def chains
>> and coordinating the various calls into the GORI components to
>> pre-satisfy as many conditions as possible before any cache filling is
>> performed. It is also responsible for triggering any additional updates
>> which may be required due to newly discovered ranges. We in fact don't
>> do this yet, but is rarely required as it turns out.
>>
>> Summary
>> ---------------
>> All evaluations are done on-demand. If no queries are made, there is no
>> cost. The Ranger has no prerequisites other than a valid IL and CFG. It
>> does not need dominators, nor does it require any other changes within
>> an optimization pass in order to be used.
> Does it need fully up-to-date SSA form or does it only rely on appropriate
> SSA uses on stmts in the use-def chain of the SSA name we query
> range-info for? Does it use immediate uses or stmt operands at all
> or just use-def chains?
The range-ops component works purely on stmt operands.
The gori component, which calculates ranges coming out of the block,
requires SSA_NAME_DEF_STMT to be correct in order to look at the
previous stmt in the def chain and to determine whether it is in the
same basic block or not.
Immediate uses are used for the non-null/zero property.  If they are
not up to date, this information would be stale unless it were
calculated up front.
>
> I'm asking because some passes leave parts of the CFG not updated
> while working on other parts to reduce the number of update_ssa calls
> (ideally to a single one).
Due to the on-demand nature, any changes made would be reflected
immediately in any lookup, for better or worse. At the moment, we don't
encourage changing the CFG while working with ranges. Any CFG changes
made may invalidate some of the caches, or require them to grow... and
we'd need to hook into the CFG machinery in order to make sure we
either look at, or clear, any affected caches.
It's clearly safer to queue up the changes in that sort of environment,
but we'd have to identify which kinds of operations are not safe. Our
implementation of RVRP does everything on the fly... simplifying and
eliminating statements as it goes. To the best of my knowledge we are
not currently using the on-demand engine in places where we take some
things out of SSA and then put them back in, so I have no practical
experience with that.  Aldy can correct me if he's doing anything in
the threader, but I doubt it.
So other than the non-null/zero property, we don't need much SSA
infrastructure, just "correct" IL/CFG.
>> When queries are made, only the minimum effort required is made to
>> satisfy the request. This is a big win when it comes to passes which
>> only need occasional range information such as warning passes. Some of
>> these passes actually instantiate a new ranger each time they make a
>> request, as a demonstration of the low overhead.
>>
>> As for passes such as VRP which require complete range information, the
>> caching process prevents the same information from being calculated more
>> than once. This means a similar amount of work is done with this
>> approach, as with the traditional top-down approach currently being
>> used, except we also process the back edges and iteratively solve for them.
> What's the storage requirement for the cache when doing VRP pass like
> processing? What's the compile-time complexity? I read you are saying
> "similar amount of work" but I guess that's from empirical data comparing
> to the existing VRP implementations (SSA VRP and EVRP) rather than
> theoretical bounds?
Yes, compile-time complexity is from empirical speed timings and
theory-crafting from the algorithms, plus the fact that the on-entry
cache prevents multiple passes over things.
We have not done a memory analysis yet, nor done anything to see if we
can improve it.
It makes very heavy use of bitmaps, which are typically quite sparse.
The on-entry cache is a vector of pointers to a range, initially 0, and
we allocate ranges as needed.  There will be an on-entry range entry
for any ssa-name which has been queried between the query point and the
definition.  My belief is this is typically not large, but we have done
no analysis as yet.  This does mean that once an ssa-name goes
"out-of-scope", ie no more uses in the program, it will not be in the
cache for any of those blocks simply because it is never queried past
its end of life.
>
> Not sure where to put it so let me put it here: I'd like to see the SSA
> based VRP pass to die (in fact I'd like all ssa-propagator based
> passes to die...).
> EVRP is the future here and I can see EVRP making use of the
> reverse part of Range-Ops to replace the ad-hoc pattern matching
> in register_edge_assert_for (badly named function that now gets you
> back a list of assertions hold on an edge). So I hope to see VRP
Does register_edge_assert cache anything?  Or walk an assert chain?  Or
is it basically just a "range-on-this-edge" call, with the result
combined with what is known at that point?
> yanked (it performing jump threading is the major obstacle) and
> Range-Ops being used as common infrastructure for ranges (that leaves
> EVRP). I don't really want something "additional" leaving things for
Although EVRP can be modified to use the range-ops work, and we can
probably make VRP go away, I'm not convinced that EVRP has to be the
future. It's merely an alternate implementation that has its own set of
benefits and drawbacks.
> other people to cleanup / merge - honestly you don't have the very best
> track record in this area (it might be your main area of work is not GCC,
> still the @redhat tag on your mail is an obligation here).
That's a curious assertion over a 20 year span. To what are you referring?
> Now I've said the above let me continue with 3/n.
>
> Richard.
>
>> Comments and feedback always welcome! Thanks
>> Andrew
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-05-24 15:50 ` Andrew MacLeod
@ 2019-05-27 13:03 ` Richard Biener
2019-05-28 14:17 ` Andrew MacLeod
0 siblings, 1 reply; 21+ messages in thread
From: Richard Biener @ 2019-05-27 13:03 UTC (permalink / raw)
To: Andrew MacLeod; +Cc: GCC, Jeff Law, Aldy Hernandez
On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>
> On 5/23/19 8:55 AM, Richard Biener wrote:
> > On Thu, May 23, 2019 at 3:28 AM Andrew MacLeod <amacleod@redhat.com> wrote:
> >>
> >> 2 * GORI
> >> ------------
> >> The second component is the “Generates Outgoing Range Info” engine.
> >> This is a basic-block oriented component which determines what ssa-names
> >> have ranges created on outgoing edges, and how to calculate those ranges.
> >>
> >> The last statement in the block is examined to see if it is a branch
> >> which generates range info, and then determines which ssa-names in the
> >> block can have ranges calculated. It quickly walks the use-def chain
> >> from the branch to find other definitions within the block that can
> >> impact the branch and could also have their ranges calculated. This
> >> summary is then cached for future use.
> >>
> >> The GORI component also uses this summary info to perform the
> >> calculations necessary to determine the outgoing range for any ssa_name
> >> which can be determined. For example:
> >> c_3 = foo ()
> >> a_2 = b_1 - 20
> >> If (a_2 < 40)
> >> The summary information would indicate that b_1 and a_2 can have their
> >> outgoing ranges calculated for this block, and uses the cached
> >> information to quickly calculate them when required.
> >> The API consists of 2 basic methods, query and calculate:
> >> - bool has_edge_range_p (edge, name);
> >> - range outgoing_edge_range (edge, name);
> >> If a query is made for any other ssa-name, it simply returns false and
> >> says this block does not generate a range for that name. This is a key
> >> rationale for the summary so we only spend time processing names for
> >> blocks that actually have ranges generated.
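The reverse evaluation GORI performs for the block above can be sketched in a few lines (illustrative Python only, not GCC code; the Range class, the solver helpers, and the unsigned-style lower bound of 0 are all assumptions made for the example):

```python
# Sketch of reverse-solving the block's branch, assuming simple integer
# intervals and an unsigned-style lower bound of 0 (hypothetical code,
# not the GCC implementation).

class Range:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

UMIN = 0  # assumed lower bound for the unsigned-like types in the example

def solve_lt_op1(cond_range, c):
    # If "op1 < c" is known true ([1,1]), then op1 is [UMIN, c-1].
    assert (cond_range.lo, cond_range.hi) == (1, 1)
    return Range(UMIN, c - 1)

def solve_minus_op1(lhs, c):
    # lhs = op1 - c  =>  op1 = lhs + c.
    return Range(lhs.lo + c, lhs.hi + c)

# On the TRUE edge of "if (a_2 < 40)" where "a_2 = b_1 - 20":
a_2 = solve_lt_op1(Range(1, 1), 40)   # a_2 = [0, 39]
b_1 = solve_minus_op1(a_2, 20)        # b_1 = [20, 59]
print(a_2, b_1)                       # [0, 39] [20, 59]
```

This mirrors the summary above: both a_2 and b_1 have outgoing ranges on the edge, each computed only when a query asks for it.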
> > So the current code (in a few cases) derives ranges for SSA uses when
> > they see for example
> >
> > _3 = _4 / _5;
> >
> > here _5 != 0 for the stmts following this one (OK, this is a bad example
> > because for divisions we don't do this, but we do it for derefs and ~[0,0]).
> >
> > The above suggests that iff this is done at all it is not in GORI because
> > those are not conditional stmts or ranges from feeding those. The
> > machinery doing the use-def walking from stmt context also cannot
> > come along these so I have the suspicion that Ranger cannot handle
> > telling us that for the stmt following above, for example
> >
> > if (_5 != 0)
> >
> > that _5 is not zero?
> >
> > Can you clarify?
>
> So there are 2 aspects to this. The range-ops code for DIV_EXPR, if
> asked for the range of op2 () would return ~[0,0] for _5.
> But you are also correct in that the walk backwards would not find this.
>
> This is similar functionality to how null_derefs are currently handled,
> and in fact could probably be done simultaneously using the same code
> base. I didn't bring null derefs up, but this is a good time :-)
>
> There is a separate class used by the gori-cache which tracks the
> non-nullness property at the block level. It has a single API:
> non_null_deref_p (name, bb) which determines whether there is a
> dereference in any BB for NAME, which indicates whether the name has an
> implicit ~[0,0] range in that basic block or not.
So when we then have
_1 = *_2; // after this _2 is non-NULL
_3 = _1 + 1; // _3 is non-NULL
_4 = *_3;
...
when an on-demand user asks whether _3 is non-NULL at the
point of _4 = *_3 we don't have this information? Since the
per-BB caching will only say _1 is non-NULL after the BB.
I'm also not sure whether _3 ever gets non-NULL during
non-NULL processing of the block since walking immediate uses
doesn't really help here?
So this seems to be a fundamental limitation [to the caching scheme],
not sure if it is bad in practice.
Or am I missing something?
> It is implemented by caching a bitmap which has a 1 set for any block
> which contains a deref of that name, so after the first call, it is
> simply a matter of returning that bit. The first time the API is
> called, it does an immediate-use walk for name and sets the bit for any
> block which contains a statement for which we can infer a non-null result.
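The lazily filled bit-per-block scheme described here can be sketched as follows (illustrative Python, not the GCC class; the `uses` table stands in for an immediate-use walk over the IL and is invented for the example):

```python
# Sketch of the lazily filled per-block non-null "bitmap"
# (illustrative only; the uses table below stands in for an
# immediate-use walk over the IL, and is invented for the example).

class NonNullCache:
    def __init__(self, uses):
        # uses: ssa-name -> list of (block_id, infers_nonnull) pairs.
        self.uses = uses
        self.bits = {}   # ssa-name -> set of block ids with a deref

    def non_null_deref_p(self, name, bb):
        if name not in self.bits:
            # First query for NAME: one walk over its uses fills the set;
            # every later query is just a membership ("bit") check.
            self.bits[name] = {b for b, nn in self.uses.get(name, ()) if nn}
        return bb in self.bits[name]

cache = NonNullCache({"_2": [(3, True), (5, False)]})
print(cache.non_null_deref_p("_2", 3))  # True: "*_2" seen in block 3
print(cache.non_null_deref_p("_2", 5))  # False: no deref in block 5
```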
>
> We'd want the exact same functionality for divide by zero: we can infer
> ~[0,0] for op2 of a DIV_EXPR in the block that statement is in. The call is
> already made; it just returns false right now if it isn't a pointer type.
>
>
> I'll add that and make sure it works.
>
>
>
> >
> >> 3 * GORI cache
> >> ---------------------
> >> The third component builds on the basic block GORI component, and adds
> >> the ability to traverse the CFG and combine the various ranges provided
> >> by each basic block into a cache.
> >>
> >> It provides both a global-range cache and a range-on-entry cache
> >> summarizing the range of an ssa_name on entry to the block.
> >>
> >> The range-on-entry cache is self filling, and iteratively evaluates back
> >> edges. There is a single API entry point which asks for the range on
> >> entry, and if the data has not been calculated/cached already, it spawns
> >> the following process to calculate the data. :
> >> * Walk the CFG from the current block to the definition block,
> >> and/or any block which has range-on-entry information already
> >> calculated. These end points are the source blocks.
> >> * Any non-source blocks visited are added to a worklist to be updated.
> >> * Starting with the source blocks, push the known outgoing range
> >> from each block to each successor and update their live-on-entry values
> >> when they are visited. If the incoming range changes, mark this block’s
> >> successors as requiring updating.
> >> * Continue iterating until no more updates are required.
> >>
> >> The ordering is planned by the ranger such that if there are no back
> >> edges, it is typically a straightforward single pass. When back edges
> >> are involved, the amount of iterating is typically very small. The
> >> updating is restricted to a single ssa-name, meaning it doesn’t get into
> >> feedback loops with other names nor PHIs. It usually converges very
> >> quickly.
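A minimal sketch of that iteration, assuming plain (lo, hi) pairs and a toy diamond CFG (both invented for the example; none of this is the GCC data structure):

```python
# Sketch of the range-on-entry fixpoint for a single ssa-name
# (illustrative only; ranges are (lo, hi) pairs, blocks are ints, and
# the CFG below is invented for the example).

def hull(a, b):
    # Union of two ranges as a single enclosing range.
    if a is None:
        return b
    return (min(a[0], b[0]), max(a[1], b[1]))

def range_on_entry(succs, edge_clip, seed_block, seed_range):
    """Push seed_range from seed_block through the CFG, clipping on
    edges that generate a range, until no on-entry range changes."""
    on_entry = {}
    worklist = [(seed_block, seed_range)]
    while worklist:
        blk, rng = worklist.pop()
        for dst in succs.get(blk, []):
            clip = edge_clip.get((blk, dst))
            if clip is None:
                out = rng
            else:
                out = (max(rng[0], clip[0]), min(rng[1], clip[1]))
            merged = hull(on_entry.get(dst), out)
            if merged != on_entry.get(dst):   # grew: revisit successors
                on_entry[dst] = merged
                worklist.append((dst, merged))
    return on_entry

# Diamond: block 1 branches on "x < 5" to 2 (true) and 3 (false),
# both rejoining at 4; x is [0, 100] leaving its definition in 1.
succs = {1: [2, 3], 2: [4], 3: [4]}
clips = {(1, 2): (0, 4), (1, 3): (5, 100)}
entries = range_on_entry(succs, clips, 1, (0, 100))
print(entries)  # {2: (0, 4), 3: (5, 100), 4: (0, 100)}
```

With a back edge, a re-queued block simply gets revisited until its on-entry range stops growing, which is the iteration the text describes.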
> >>
> >> It is important to note that this works exclusively with static ranges
> >> of only a single ssa-name at a time, ie ranges which are implicitly
> >> exposed in the IL, with only the name being examined. The values returned by
> >> these queries are not dependent on changes in other ssa-names, which is
> >> how the iteration process never gets out of control.
> >>
> >> The ranger making the calls to fill this cache has a higher level
> >> overview, and requests these ranges in definition order such that any
> >> ssa-names feeding the definition of a name having its cache filled are
> >> resolved first, providing the best possible results the first time.
> >> a_2 = b_1 - 20
> >> If (a_2 < 40)
> >> If the range for a_2 is requested on the true side, the ranger will
> >> first calculate the range of b_1 on entry to the block. Then use this to
> >> calculate the global range of a_2, and finally for the outgoing range on
> >> the desired edge.
> >>
> >> If at some later point, it is discovered that the incoming range of b_1
> >> has changed in such a way that it has an impact on the outgoing range of
> >> a_2, the iterative update process can be reapplied by the ranger to
> >> update the relevant cache entries. This is usually only required in
> >> cases where multiple ssa-names are affected by back edges and feed each
> >> other.
> >>
> >> 4 * Ranger
> >> ----------------
> >> The final component is the Ranger which provides a simple API to clients
> >> where they can make various simple requests:
> >> - Range of an ssa-name at a statement location
> >> - Range of the result of a stmt
> >> - Range of an ssa-name on an edge
> >> - Range of an ssa-name after the last statement in a block
> >> - Range of an ssa name on entry to a block
> >>
> >> The ranger is responsible for processing the IL, walking use/def chains
> >> and coordinating the various calls into the GORI components to
> >> pre-satisfy as many conditions as possible before any cache filling is
> >> performed. It is also responsible for triggering any additional updates
> >> which may be required due to newly discovered ranges. We in fact don’t
> >> do this yet, but it is rarely required as it turns out.
> >>
> >> Summary
> >> ---------------
> >> All evaluations are done on-demand. If no queries are made, there is no
> >> cost. The Ranger has no prerequisites other than a valid IL and CFG. It
> >> does not need dominators, nor does it require any other changes within
> >> an optimization pass in order to be used.
> > Does it need fully up-to-date SSA form or does it only rely on appropriate
> > SSA uses on stmts in the use-def chain of the SSA name we query
> > range-info for? Does it use immediate uses or stmt operands at all
> > or just use-def chains?
>
> the range-ops component works purely on stmt operands.
> The gori component which calculates ranges coming out of the block
> requires the SSA_NAME_DEF_STMT to be correct in order to look at the
> previous stmt in the def chain and to determine if it is in the same
> basic block or not.
> Immediate uses are used for the non-null/zero property. If they are not
> up to date, this information would be stale unless it were calculated
> up front.
>
>
> >
> > I'm asking because some passes leave parts of the CFG not updated
> > while working on other parts to reduce the number of update_ssa calls
> > (ideally to a single one).
> Due to the on-demand nature, any changes made would be reflected
> immediately in any lookup, for better or worse. At the moment, we don't
> encourage changing the CFG while working with ranges. Any CFG changes
> made may invalidate some of the caches, or require them to grow... and
> we'd need to hook into the CFG machinery in order to make sure we either
> look at, or clear any affected caches.
>
> It's clearly safer to queue up the changes in that sort of environment,
> but we'd have to identify which kinds of operations are not safe. Our
> implementation of RVRP does everything on the fly... simplifying and
> eliminating statements as it goes. To the best of my knowledge we are
> not currently using the on-demand engine in places where we are taking
> some things out of SSA and then putting them back in. So I have no
> practical experience with that. Aldy can correct me if he's doing
> anything in the threader, but I doubt it.
>
> so other than the non-null/zero property, we don't need much SSA
> infrastructure, just "correct" IL/CFG.
>
> >> When queries are made, only the minimum effort required is made to
> >> satisfy the request. This is a big win when it comes to passes which
> >> only need occasional range information such as warning passes. Some of
> >> these passes actually instantiate a new ranger each time they make a
> >> request, as a demonstration of the low overhead.
> >>
> >> As for passes such as VRP which require complete range information, the
> >> caching process prevents the same information from being calculated more
> >> than once. This means a similar amount of work is done with this
> >> approach, as with the traditional top-down approach currently being
> >> used, except we also process the back edges and iteratively solve for them.
> > What's the storage requirement for the cache when doing VRP pass like
> > processing? What's the compile-time complexity? I read you are saying
> > "similar amount of work" but I guess that's from empirical data comparing
> > to the existing VRP implementations (SSA VRP and EVRP) rather than
> > theoretical bounds?
> Yes, the compile-time complexity estimate comes from empirical speed
> timings and theory-crafting from the algorithms, plus the fact that the
> on-entry cache prevents multiple passes over things.
>
> we have not done a memory analysis yet, not done anything to see if we
> can improve it.
> It makes very heavy use of bitmaps, which are typically quite sparse.
> The on-entry cache is a vector of pointers to a range, initially 0, and
> we allocate ranges as needed. There will be an on-entry range entry
> for any ssa-name which has been queried between the query point and the
> definition.
So that's similar to LIVE-on-entry, thus an SSA name with a range
will have an on-entry range entry on each BB it dominates?
That makes the storage requirement quadratic in the worst case.
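For a concrete feel of that worst case: in a straight-line chain of n blocks where name i is defined in block i and everything is queried at the end, name i needs an on-entry entry in each of blocks i+1..n, giving n(n-1)/2 entries in total. A counting sketch (hypothetical arithmetic, not measured GCC data):

```python
# Hypothetical worst case: name i defined in block i, all names queried
# in the last block, so name i is cached in blocks i+1..n.  Total
# entries are (n-1) + (n-2) + ... + 1 = n(n-1)/2, i.e. quadratic.

def cache_entries(n_blocks):
    return sum(n_blocks - i for i in range(1, n_blocks))

print(cache_entries(10), cache_entries(100))  # 45 4950
```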
> My belief is this is typically not large, but we have done
> no analysis as yet. This does mean that once an ssa-name goes
> "out-of-scope", ie no more uses in the program, it will not be in the
> cache for any of those blocks simply because its never queried past its
> end of life.
> >
> > Not sure where to put it so let me put it here: I'd like to see the SSA
> > based VRP pass to die (in fact I'd like all ssa-propagator based
> > passes to die...).
> > EVRP is the future here and I can see EVRP making use of the
> > reverse part of Range-Ops to replace the ad-hoc pattern matching
> > in register_edge_assert_for (badly named function that now gets you
> > back a list of assertions that hold on an edge). So I hope to see VRP
> Does register_edge_assert cache anything? or walk an assert chain? or is
> it basically just a "range-on-this-edge" call? and the result combined
> with what is known at that point?
register_edge_assert pattern-matches the stmts feeding a conditional
stmt on an edge. It doesn't cache anything (so does quite some redundant
work when processing all outgoing edges). It's not intended to be called
multiple times but rather when the DOM walker enters a block and we
look at the controlling statement to derive conditional ranges. For
example for
_1 = (unsigned) _2;
if (_1 > 1)
...
it derives a vector of two relations on the true edge:
_1 > 1 == true
(unsigned) _2 > 1 == true
this is actually what we end up building ASSERT_EXPRs for in traditional
VRP, and what EVRP extracts ranges from, valid only in the dominated context.
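Turning such a relation into a concrete range is mechanical; a sketch of the true-edge case (illustrative only; the 32-bit unsigned width is an assumption for the example, not GCC code):

```python
# Sketch of mapping the TRUE-edge relation "(unsigned) _2 > 1 == true"
# to the concrete range [2, UINT_MAX] (illustrative only; 32-bit
# unsigned is an assumption for the example).

UINT_MAX = 2**32 - 1

def range_from_gt_true(c):
    # If "x > c" is known true for unsigned x, then x is [c+1, UINT_MAX].
    return (c + 1, UINT_MAX)

r = range_from_gt_true(1)
print(r)  # (2, 4294967295)
```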
> > yanked (it performing jump threading is the major obstacle) and
> > Range-Ops being used as common infrastructure for ranges (that leaves
> > EVRP). I don't really want something "additional" leaving things for
>
> Although EVRP can be modified to use the range-ops work, and we can
> probably make VRP go away, I'm not convinced that EVRP has to be the
> future. It's merely an alternate implementation that has its own set of
> benefits and drawbacks.
>
> > other people to cleanup / merge - honestly you don't have the very best
> > track record in this area (it might be your main area of work is not GCC,
> > still the @redhat tag on your mail is an obligation here).
> That's a curious assertion over a 20 year span. To what are you referring?
>
> > Now I've said the above let me continue with 3/n.
> >
> > Richard.
> >
> >> Comments and feedback always welcome! Thanks
> >> Andrew
>
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-05-27 13:03 ` Richard Biener
@ 2019-05-28 14:17 ` Andrew MacLeod
2019-05-29 11:16 ` Richard Biener
0 siblings, 1 reply; 21+ messages in thread
From: Andrew MacLeod @ 2019-05-28 14:17 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC, Jeff Law, Aldy Hernandez
On 5/27/19 9:02 AM, Richard Biener wrote:
> On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>>
>>> The above suggests that iff this is done at all it is not in GORI because
>>> those are not conditional stmts or ranges from feeding those. The
>>> machinery doing the use-def walking from stmt context also cannot
>>> come along these so I have the suspicion that Ranger cannot handle
>>> telling us that for the stmt following above, for example
>>>
>>> if (_5 != 0)
>>>
>>> that _5 is not zero?
>>>
>>> Can you clarify?
>> So there are 2 aspects to this. the range-ops code for DIV_EXPR, if
>> asked for the range of op2 () would return ~[0,0] for _5.
>> But you are also correct in that the walk backwards would not find this.
>>
>> This is similar functionality to how null_derefs are currently handled,
>> and in fact could probably be done simultaneously using the same code
>> base. I didn't bring null derefs up, but this is a good time :-)
>>
>> There is a separate class used by the gori-cache which tracks the
>> non-nullness property at the block level. It has a single API:
>> non_null_deref_p (name, bb) which determines whether there is a
>> dereference in any BB for NAME, which indicates whether the name has an
>> implicit ~[0,0] range in that basic block or not.
> So when we then have
>
> _1 = *_2; // after this _2 is non-NULL
> _3 = _1 + 1; // _3 is non-NULL
> _4 = *_3;
> ...
>
> when an on-demand user asks whether _3 is non-NULL at the
> point of _4 = *_3 we don't have this information? Since the
> per-BB caching will only say _1 is non-NULL after the BB.
> I'm also not sure whether _3 ever gets non-NULL during
> non-NULL processing of the block since walking immediate uses
> doesn't really help here?
presumably _3 is globally non-null due to the definition being (pointer
+ x) ... ie, _3 has a global range of ~[0,0] ?
>
> So this seems to be a fundamental limitation [to the caching scheme],
> not sure if it is bad in practice.
>
> Or am I missing something?
>
Not missing anything. The non-nullness property is maintained globally at
the basic block level; both _1 and _3 are flagged as being non-null in
the block. Upon exit, it's a bit check. If the global information does
not have the non-nullness property, then when a request is made for
non-nullness and the def and the use are both within the same block,
and it's flagged as being non-null in that block, the request is
forced back to a quick walk between the def and the use to see if there
is any non-nullness introduced in between. Yes, that makes it a linear
walk, but it's infrequent, and often short, to the best of our knowledge
at this point anyway :-)
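That fallback walk might look something like the following (illustrative Python, not the GCC code; the statement encoding is invented for the example):

```python
# Sketch of the intra-block fallback walk (illustrative only; the
# statement encoding is invented for the example).

def nonnull_between(stmts, name, def_idx, use_idx):
    # Walk the statements strictly between the def and the use; a
    # dereference of NAME anywhere in that span implies NAME is
    # non-null by the time the use executes.
    for kind, n in stmts[def_idx + 1:use_idx]:
        if kind == "deref" and n == name:
            return True
    return False

# _3 defined, then "*_3", an unrelated stmt, then the queried use.
block = [("def", "_3"), ("deref", "_3"), ("add", "_7"), ("use", "_3")]
print(nonnull_between(block, "_3", 0, 3))  # True
```

The walk is linear in the distance between the def and the use, which is why it is only taken when the block's bit says a dereference exists somewhere in the block.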
>>
>> yes, compile-time complexity is from empirical speed timings and
>> theory-crafting from the algorithms, and that the on-entry cache
>> prevents multiple passes over things.
>>
>> we have not done a memory analysis yet, not done anything to see if we
>> can improve it.
>> It makes very heavy use of bitmaps, which are typically quite sparse.
>> The on-entry cache is a vector of pointers to a range, initially 0, and
>> we allocate ranges as needed. There will be an on-entry range entry
>> for any ssa-name which has been queried between the query point and the
>> definition.
> So that's similar to LIVE-on-entry thus a SSA name with a range
> will have an on-entry range entry on each BB it dominates?
> That makes storage requirement quadratic in the worst case.
Yes, assuming a request has been made for all ssa-names everywhere they
are live.
>
>> My belief is this is typically not large, but we have done
>> no analysis as yet. This does mean that once an ssa-name goes
>> "out-of-scope", ie no more uses in the program, it will not be in the
>> cache for any of those blocks simply because its never queried past its
>> end of life.
>>> Not sure where to put it so let me put it here: I'd like to see the SSA
>>> based VRP pass to die (in fact I'd like all ssa-propagator based
>>> passes to die...).
>>> EVRP is the future here and I can see EVRP making use of the
>>> reverse part of Range-Ops to replace the ad-hoc pattern matching
>>> in register_edge_assert_for (badly named function that now gets you
>>> back a list of assertions that hold on an edge). So I hope to see VRP
>> Does register_edge_assert cache anything? or walk an assert chain? or is
>> it basically just a "range-on-this-edge" call? and the result combined
>> with what is known at that point?
> register_edge_assert pattern-matches the stmts feeding a conditional
> stmt on an edge. It doesn't cache anything (so does quite some redundant
> work when processing all outgoing edges). It's not intended to be called
> multiple times but rather when the DOM walker enters a block and we
> look at the controlling statement to derive conditional ranges. For
> example for
>
> _1 = (unsigned) _2;
> if (_1 > 1)
> ...
>
> it derives a vector of two relations on the true edge:
>
> _1 > 1 == true
> (unsigned) _2 > 1 == true
>
> this is actually what we end up building ASSERT_EXPRs for in traditional
> VRP and what we extract ranges from only valid in dominated context from EVRP.
and that gets mapped to the actual range [2, MAX] when? That is what
range_on_edge will produce for both _1 and _2 on this edge.
If we were to go to the effort of handling symbolics, and the expression was
  if (_1 > _2)
then we'd get ranges _1 = [_2 + 1, MAX] and _2 = [MIN, _1 - 1] from
range_on_edge
Andrew
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-05-28 14:17 ` Andrew MacLeod
@ 2019-05-29 11:16 ` Richard Biener
2019-05-31 15:40 ` Andrew MacLeod
0 siblings, 1 reply; 21+ messages in thread
From: Richard Biener @ 2019-05-29 11:16 UTC (permalink / raw)
To: Andrew MacLeod; +Cc: GCC, Jeff Law, Aldy Hernandez
On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>
> On 5/27/19 9:02 AM, Richard Biener wrote:
> > On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> >>
> >>> The above suggests that iff this is done at all it is not in GORI because
> >>> those are not conditional stmts or ranges from feeding those. The
> >>> machinery doing the use-def walking from stmt context also cannot
> >>> come along these so I have the suspicion that Ranger cannot handle
> >>> telling us that for the stmt following above, for example
> >>>
> >>> if (_5 != 0)
> >>>
> >>> that _5 is not zero?
> >>>
> >>> Can you clarify?
> >> So there are 2 aspects to this. the range-ops code for DIV_EXPR, if
> >> asked for the range of op2 () would return ~[0,0] for _5.
> >> But you are also correct in that the walk backwards would not find this.
> >>
> >> This is similar functionality to how null_derefs are currently handled,
> >> and in fact could probably be done simultaneously using the same code
> >> base. I didn't bring null derefs up, but this is a good time :-)
> >>
> >> There is a separate class used by the gori-cache which tracks the
> >> non-nullness property at the block level. It has a single API:
> >> non_null_deref_p (name, bb) which determines whether there is a
> >> dereference in any BB for NAME, which indicates whether the name has an
> >> implicit ~[0,0] range in that basic block or not.
> > So when we then have
> >
> > _1 = *_2; // after this _2 is non-NULL
> > _3 = _1 + 1; // _3 is non-NULL
> > _4 = *_3;
> > ...
> >
> > when an on-demand user asks whether _3 is non-NULL at the
> > point of _4 = *_3 we don't have this information? Since the
> > per-BB caching will only say _1 is non-NULL after the BB.
> > I'm also not sure whether _3 ever gets non-NULL during
> > non-NULL processing of the block since walking immediate uses
> > doesn't really help here?
> presumably _3 is globally non-null due to the definition being (pointer
> + x) ... ie, _3 has a global range of ~[0,0] ?
No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and
you cannot arrive at NULL by pointer arithmetic from a non-NULL pointer.
> >
> > So this seems to be a fundamental limitation [to the caching scheme],
> > not sure if it is bad in practice.
> >
> > Or am I missing something?
> >
>
> Not missing anything. The non-nullness property is maintained globally at
> the basic block level; both _1 and _3 are flagged as being non-null in
> the block. Upon exit, it's a bit check. If the global information does
> not have the non-nullness property, then when a request is made for
> non-nullness and the def and the use are both within the same block,
> and it's flagged as being non-null in that block, the request is
> forced back to a quick walk between the def and the use to see if there
> is any non-nullness introduced in between. Yes, that makes it a linear
> walk, but it's infrequent, and often short, to the best of our knowledge
> at this point anyway :-)
So with the clarification above do we ever see that _3 is non-NULL?
I suppose the worker processing _3 = _1 + 1 would ask for
_1 non-nullness but we do not record any non-NULL-ness of _1 in
this basic-block (but only at its end). Consider stmts
_4 = (uintptr_t) _2;
_5 = _6 / _4;
_1 = *_2;
...
here at _1 we know _2 is not NULL. But if we ask for non-NULLness
of _2 at the definition of _4 we may not compute ~[0, 0] and thus
conclude that _6 / _4 does not trap.
stmt-level tracking of ranges is sometimes important. This is
something the machinery cannot provide - correct? At least not
optimistically enough with ranges derived about uses.
> >>
> >> yes, compile-time complexity is from empirical speed timings and
> >> theory-crafting from the algorithms, and that the on-entry cache
> >> prevents multiple passes over things.
> >>
> >> we have not done a memory analysis yet, not done anything to see if we
> >> can improve it.
> >> It makes very heavy use of bitmaps, which are typically quite sparse.
> >> The on-entry cache is a vector of pointers to a range, initially 0, and
> >> we allocate ranges as needed. There will be an on-entry range entry
> >> for any ssa-name which has been queried between the query point and the
> >> definition.
> > So that's similar to LIVE-on-entry thus a SSA name with a range
> > will have an on-entry range entry on each BB it dominates?
> > That makes storage requirement quadratic in the worst case.
> Yes, assuming a request has been made for all ssa-names everywhere they
> are live.
You did propose to replace [E]VRP with a walk over the whole function
querying ranges and simplifying stmts, didn't you?
> >> My belief is this is typically not large, but we have done
> >> no analysis as yet. This does mean that once an ssa-name goes
> >> "out-of-scope", ie no more uses in the program, it will not be in the
> >> cache for any of those blocks simply because its never queried past its
> >> end of life.
> >>> Not sure where to put it so let me put it here: I'd like to see the SSA
> >>> based VRP pass to die (in fact I'd like all ssa-propagator based
> >>> passes to die...).
> >>> EVRP is the future here and I can see EVRP making use of the
> >>> reverse part of Range-Ops to replace the ad-hoc pattern matching
> >>> in register_edge_assert_for (badly named function that now gets you
> >>> back a list of assertions that hold on an edge). So I hope to see VRP
> >> Does register_edge_assert cache anything? or walk an assert chain? or is
> >> it basically just a "range-on-this-edge" call? and the result combined
> >> with what is known at that point?
> > register_edge_assert pattern-matches the stmts feeding a conditional
> > stmt on an edge. It doesn't cache anything (so does quite some redundant
> > work when processing all outgoing edges). It's not intended to be called
> > multiple times but rather when the DOM walker enters a block and we
> > look at the controlling statement to derive conditional ranges. For
> > example for
> >
> > _1 = (unsigned) _2;
> > if (_1 > 1)
> > ...
> >
> > it derives a vector of two relations on the true edge:
> >
> > _1 > 1 == true
> > (unsigned) _2 > 1 == true
> >
> > this is actually what we end up building ASSERT_EXPRs for in traditional
> > VRP and what we extract ranges from only valid in dominated context from EVRP.
>
> and that gets mapped to the actual range [2, MAX] when? That is what
> range_on_edge will produce for both _1 and _2 on this edge.
In EVRP immediately. In VRP when the later propagation stage visits the
newly inserted ASSERT_EXPRs.
> If we were to go to the effort of handling symbolics, and the expression was
> if (_1 > _2)
> then we'd get ranges _1 = [_2 + 1, MAX] and _2 = [MIN, _1 - 1] from
> range_on_edge
>
>
> Andrew
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-05-29 11:16 ` Richard Biener
@ 2019-05-31 15:40 ` Andrew MacLeod
2019-05-31 18:16 ` Jeff Law
` (2 more replies)
0 siblings, 3 replies; 21+ messages in thread
From: Andrew MacLeod @ 2019-05-31 15:40 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC, Jeff Law, Aldy Hernandez
On 5/29/19 7:15 AM, Richard Biener wrote:
> On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>> On 5/27/19 9:02 AM, Richard Biener wrote:
>>> On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>>>>> The above suggests that iff this is done at all it is not in GORI because
>>>>> those are not conditional stmts or ranges from feeding those. The
>>>>> machinery doing the use-def walking from stmt context also cannot
>>>>> come along these so I have the suspicion that Ranger cannot handle
>>>>> telling us that for the stmt following above, for example
>>>>>
>>>>> if (_5 != 0)
>>>>>
>>>>> that _5 is not zero?
>>>>>
>>>>> Can you clarify?
>>>> So there are 2 aspects to this. the range-ops code for DIV_EXPR, if
>>>> asked for the range of op2 () would return ~[0,0] for _5.
>>>> But you are also correct in that the walk backwards would not find this.
>>>>
>>>> This is similar functionality to how null_derefs are currently handled,
>>>> and in fact could probably be done simultaneously using the same code
>>>> base. I didn't bring null derefs up, but this is a good time :-)
>>>>
>>>> There is a separate class used by the gori-cache which tracks the
>>>> non-nullness property at the block level. It has a single API:
>>>> non_null_deref_p (name, bb) which determines whether there is a
>>>> dereference in any BB for NAME, which indicates whether the name has an
>>>> implicit ~[0,0] range in that basic block or not.
>>> So when we then have
>>>
>>> _1 = *_2; // after this _2 is non-NULL
>>> _3 = _1 + 1; // _3 is non-NULL
>>> _4 = *_3;
>>> ...
>>>
>>> when an on-demand user asks whether _3 is non-NULL at the
>>> point of _4 = *_3 we don't have this information? Since the
>>> per-BB caching will only say _1 is non-NULL after the BB.
>>> I'm also not sure whether _3 ever gets non-NULL during
>>> non-NULL processing of the block since walking immediate uses
>>> doesn't really help here?
>> presumably _3 is globally non-null due to the definition being (pointer
>> + x) ... ie, _3 has a global range of ~[0,0] ?
> No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and
> you cannot arrive at NULL by pointer arithmetic from a non-NULL pointer.
I'm confused.
_1 was loaded from _2 (thus asserting _2 is non-NULL), but we have no
idea what the range of _1 is, so how do you assert _1 is ~[0,0] ?
The only way I see to determine _3 is non-NULL is through the _4 = *_3
statement.
>
>>> So this seems to be a fundamental limitation [to the caching scheme],
>>> not sure if it is bad in practice.
>>>
>>> Or am I missing something?
>>>
>> Not missing anything. The non-nullness property is maintained globally at
>> the basic block level; both _1 and _3 are flagged as being non-null in
>> the block. Upon exit, it's a bit check. If the global information does
>> not have the non-nullness property, then when a request is made for
>> non-nullness and the def and the use are both within the same block,
>> and it's flagged as being non-null in that block, the request is
>> forced back to a quick walk between the def and the use to see if there
>> is any non-nullness introduced in between. Yes, that makes it a linear
>> walk, but it's infrequent, and often short, to the best of our knowledge
>> at this point anyway :-)
> So with the clarification above do we ever see that _3 is non-NULL?
> I suppose the worker processing _3 = _1 + 1 would ask for
> _1 non-nullness but we do not record any non-NULL-ness of _1 in
> this basic-block (but only at its end). Consider stmts
>
> _4 = (uintptr_t) _2;
> _5 = _6 / _4;
> _1 = *_2;
> ...
>
> here at _1 we know _2 is not NULL. But if we ask for non-NULLness
> of _2 at the definition of _4 we may not compute ~[0, 0] and thus
> conclude that _6 / _4 does not trap.
EVRP must look backwards to figure this out since the forward walk will
process _5 = _6 / _4 before it sees the dereference to _2... so how does
it know that _4 is non-zero without looking backwards at things after it
sees the dereference?? Does it actually do this?
>
> stmt-level tracking of ranges are sometimes important. This is
> something the machinery cannot provide - correct? At least not
> optimistically enough with ranges derived about uses.
Maybe I'm the one missing something, but in the absence of statement
level exception throwing via 'can_throw_non_call_exceptions' being true,
any assertion made anywhere in the block to an ssa_name applies to the
entire block, does it not? ie, it doesn't matter if the dereference
happens first thing in the block or last thing, it's not going to change
its value within the block; it's going to be non-null throughout the
entire block.
so if one statement in the block asserts that references to _2 are
non-null, we can assert that all references to _2 in the block are
non-null. Meaning we get all these cases by knowing that the specified
name is non-zero through-out the block. This also means we could know
things earlier in the block than a forward walk would provide.
So with the 2 examples:
_1 = *_2; // after this _2 is non-NULL
_3 = _1 + 1;
_4 = *_3;
both _2 and _3 are flagged as non-null in the block due to the
dereferences. I'm not sure what we know about _1 from above, but we do
know
 ~[0,0] = _1 + 1, so whatever that tells you, if anything, about
_1 we know. It seems to me _1 is ~[-1,-1] based on that...
Regardless, I think we know all the same things EVRP does.
likewise
 _4 = (uintptr_t) _2;
 _5 = _6 / _4;
 _1 = *_2;
_2 will be non-null in the entire block, so _4 must also be non-null
and we can conclude that the divide does not trap.
now, when we set the 'can_throw_non_call_exceptions' flag, then we'd
have to resort to statement walks, and we cannot determine that _5 does
not trap anyway. EVRP is in the same boat. It doesn't know it's not
going to trap either, because we may never get to the *_2.
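For concreteness, the block-level property described above could be modeled something like the following. This is a minimal sketch with invented types and names (int ids standing in for SSA names and basic blocks), not the actual gori-cache class:

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical, simplified sketch: per-block tracking of which SSA
// names are dereferenced somewhere in the block, giving them an
// implicit ~[0,0] range throughout that block.
struct non_null_tracker
{
  std::map<int, std::set<int> > deref_in_bb;  // bb index -> set of name ids

  void record_deref (int name, int bb) { deref_in_bb[bb].insert (name); }

  // Mirrors the single API discussed above: does NAME carry an implicit
  // ~[0,0] range in BB because some statement in BB dereferences it?
  bool non_null_deref_p (int name, int bb) const
  {
    std::map<int, std::set<int> >::const_iterator it = deref_in_bb.find (bb);
    return it != deref_in_bb.end () && it->second.count (name) != 0;
  }
};
```

With this, both _2 and _3 in the first example would be recorded for their block, and the exit-time query is a simple set membership check.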
>>>> yes, compile-time complexity is from empirical speed timings and
>>>> theory-crafting from the algorithms, and that the on-entry cache
>>>> prevents multiple passes over things.
>>>>
>>>> we have not done a memory analysis yet, not done anything to see if we
>>>> can improve it.
>>>> It makes very heavy use of bitmaps, which are typically quite sparse.
>>>> The on-entry cache is a vector of pointers to a range, initially 0, and
>>>> we allocate ranges as needed. There will be an on-entry range entry
>>>> for any ssa-name which has been queried between the query point and the
>>>> definition.
>>> So that's similar to LIVE-on-entry thus a SSA name with a range
>>> will have an on-entry range entry on each BB it dominates?
>>> That makes storage requirement quadratic in the worst case.
>> Yes, assuming a request has been made for all ssa-names everywhere they
>> are live.
> You did propose to replace [E]VRP with a walk over the whole function
> querying ranges and simplifying stmts, did you?
yes, for the case of EVRP. But not all use cases query every range
everywhere.
It's also a bit lower than that since we cache certain types of ranges.
We cache a range for varying (or range for type if you prefer) of
whatever the ssa-name type is. So if the range is varying everywhere,
we actually only have one instance of the range rather than N of them.
So any name that doesn't have a range reduction anywhere will only
create a single range instance for the entire CFG. I think that's the
most common value, so that should reduce a large number of them. I've
also considered caching ranges like we cache tree constants... but I
haven't delved into that. I figured if memory turns out to be a
problem, then we'll look at it then.
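That sharing idea can be sketched as follows. The types and names here are invented for illustration (a toy range struct and a string-keyed cache), not the actual ranger cache:

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of a value range and a per-type cache of the single shared
// "varying" (range-for-type) instance.
struct range { long lo, hi; };

struct range_cache
{
  std::map<std::string, range *> varying;

  // Return the shared varying range for TYPE, creating it on first use.
  // Every name with no range reduction points at this one instance,
  // rather than each allocating its own copy per block.
  range *varying_for_type (const std::string &type, long lo, long hi)
  {
    std::map<std::string, range *>::iterator it = varying.find (type);
    if (it != varying.end ())
      return it->second;
    range *r = new range;
    r->lo = lo;
    r->hi = hi;
    varying[type] = r;
    return r;
  }
};
```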
Andrew
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-05-31 15:40 ` Andrew MacLeod
@ 2019-05-31 18:16 ` Jeff Law
2019-05-31 20:27 ` Andrew MacLeod
2019-06-04 15:27 ` Richard Biener
2019-06-04 20:04 ` Martin Sebor
2 siblings, 1 reply; 21+ messages in thread
From: Jeff Law @ 2019-05-31 18:16 UTC (permalink / raw)
To: Andrew MacLeod, Richard Biener; +Cc: GCC, Aldy Hernandez
On 5/31/19 9:40 AM, Andrew MacLeod wrote:
> On 5/29/19 7:15 AM, Richard Biener wrote:
>> On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod <amacleod@redhat.com>
>> wrote:
>>> On 5/27/19 9:02 AM, Richard Biener wrote:
>>>> On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod <amacleod@redhat.com>
>>>> wrote:
>>>>>> The above suggests that iff this is done at all it is not in GORI
>>>>>> because
>>>>>> those are not conditional stmts or ranges from feeding those. The
>>>>>> machinery doing the use-def walking from stmt context also cannot
>>>>>> come along these so I have the suspicion that Ranger cannot handle
>>>>>> telling us that for the stmt following above, for example
>>>>>>
>>>>>>   if (_5 != 0)
>>>>>>
>>>>>> that _5 is not zero?
>>>>>>
>>>>>> Can you clarify?
>>>>> So there are 2 aspects to this. The range-ops code for DIV_EXPR, if
>>>>> asked for the range of op2 () would return ~[0,0] for _5.
>>>>> But you are also correct in that the walk backwards would not find
>>>>> this.
>>>>>
>>>>> This is similar functionality to how null_derefs are currently
>>>>> handled,
>>>>> and in fact could probably be done simultaneously using the same code
>>>>> base. I didn't bring null derefs up, but this is a good time :-)
>>>>>
>>>>> There is a separate class used by the gori-cache which tracks the
>>>>> non-nullness property at the block level. It has a single API:
>>>>> non_null_deref_p (name, bb) which determines whether there is a
>>>>> dereference of NAME in BB, which indicates whether the range
>>>>> has an
>>>>> implicit ~[0,0] range in that basic block or not.
>>>> So when we then have
>>>>
>>>>   _1 = *_2; // after this _2 is non-NULL
>>>>   _3 = _1 + 1; // _3 is non-NULL
>>>>   _4 = *_3;
>>>> ...
>>>>
>>>> when a on-demand user asks whether _3 is non-NULL at the
>>>> point of _4 = *_3 we don't have this information? Since the
>>>> per-BB caching will only say _1 is non-NULL after the BB.
>>>> I'm also not sure whether _3 ever gets non-NULL during
>>>> non-NULL processing of the block since walking immediate uses
>>>> doesn't really help here?
>>> presumably _3 is globally non-null due to the definition being (pointer
>>> + x) ... ie, _3 has a global range of ~[0,0] ?
>> No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and
>> you cannot arrive at NULL by pointer arithmetic from a non-NULL pointer.
>
> I'm confused.
>
> _1 was loaded from _2 (thus asserting _2 is non-NULL), but we have no
> idea what the range of _1 is, so how do you assert _1 is ~[0,0]?
> The only way I see to determine _3 is non-NULL is through the _4 = *_3
> statement.
Likewise. I don't see how we get ~[0,0] for _1, except at the point
after the dereference of _3.
>>>> So this seems to be a fundamental limitation [to the caching scheme],
>>>> not sure if it is bad in practice.
>>>>
>>>> Or am I missing something?
>>>>
>>> Not missing anything. The non-nullness property is maintained globally at
>>> the basic block level. Both _1 and _3 are flagged as being non-null in
>>> the block. Upon exit, it's a bit check. If the global information does
>>> not have the non-nullness property, then when a request is made for
>>> non-nullness and the def and the use are both within the same block,
>>> and it's flagged as being non-null in that block, then the request is
>>> forced back to a quick walk between the def and the use to see if there
>>> is any non-nullness introduced in between. Yes, that makes it a linear
>>> walk, but it's infrequent, and often short, to the best of our knowledge
>>> at this point anyway :-)
>> So with the clarification above do we ever see that _3 is non-NULL?
>> I suppose the worker processing _3 = _1 + 1 would ask for
>> _1 non-nullness but we do not record any non-NULL-ness of _1 in
>> this basic-block (but only at its end). Consider stmts
>>
>>   _4 = (uintptr_t) _2;
>>   _5 = _6 / _4;
>>   _1 = *_2;
>> ...
>>
>> here at _1 we know _2 is not NULL. But if we ask for non-NULLness
>> of _2 at the definition of _4 we may not compute ~[0, 0] and thus
>> conclude that _6 / _4 does not trap.
> EVRP must look backwards to figure this out since the forward walk will
> process _5 = _6 / _4 before it sees the dereference to _2... so how does
> it know that _4 is non-zero without looking backwards at things after it
> sees the dereference?? Does it actually do this?
During the forward walk we process the assignment to _5, which is _6 /
_4. We can infer that _4 is nonzero because division by zero is
undefined behavior. But I'm not sure how EVRP would go back and then
make a determination about _4 unless it's doing so via an equivalence.
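The inference step itself is straightforward; a minimal sketch (invented names and a toy anti-range struct, not the EVRP code) of what recording a range from undefined behavior looks like:

```cpp
#include <cassert>

// Illustrative sketch: a tiny anti-range model. After _5 = _6 / _4
// executes, division by zero being undefined behavior lets us record
// the anti-range ~[0,0] for the divisor _4, valid for statements
// following the division.
struct range
{
  long lo, hi;
  bool anti;  // true means "everything except [lo,hi]"
};

// Range the divisor is known to have after the division executed.
range
infer_divisor_range ()
{
  range r;
  r.lo = 0;
  r.hi = 0;
  r.anti = true;  // ~[0,0]: the divisor was not zero
  return r;
}

bool
range_contains (const range &r, long v)
{
  bool in = v >= r.lo && v <= r.hi;
  return r.anti ? !in : in;
}
```

Note the range only holds after the division has executed; going back from it to statements before the division is exactly the problem being discussed here.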
>
>>
>> stmt-level tracking of ranges are sometimes important. This is
>> something the machinery cannot provide - correct? At least not
>> optimistically enough with ranges derived about uses.
>
> Maybe I'm the one missing something, but in the absence of statement
> level exception throwing via 'can_throw_non_call_exceptions' being true,
> any assertion made anywhere in the block to an ssa_name applies to the
> entire block, does it not? ie, it doesn't matter if the dereference
> happens first thing in the block or last thing, it's not going to change
> its value within the block; it's going to be non-null throughout the
> entire block.
No, I don't think it can hold for the entire block.
Consider
x = p ? 10 : 20;
foo (x)
*p = whatever
We don't know p is non-null because foo might not return. If by way of
the dereference we were to assert p is non-null for the entire block,
then we'd pass the wrong value to foo().
Jeff
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-05-31 18:16 ` Jeff Law
@ 2019-05-31 20:27 ` Andrew MacLeod
2019-05-31 22:00 ` Jeff Law
0 siblings, 1 reply; 21+ messages in thread
From: Andrew MacLeod @ 2019-05-31 20:27 UTC (permalink / raw)
To: Jeff Law, Richard Biener; +Cc: GCC, Aldy Hernandez
On 5/31/19 2:16 PM, Jeff Law wrote:
> On 5/31/19 9:40 AM, Andrew MacLeod wrote:
>> On 5/29/19 7:15 AM, Richard Biener wrote:
>>> On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod <amacleod@redhat.com>
>>> wrote:
>>>> On 5/27/19 9:02 AM, Richard Biener wrote:
>>>>> On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod <amacleod@redhat.com>
>>>>> wrote:
>>>>>>> The above suggests that iff this is done at all it is not in GORI
>>>>>>> because
>>>>>>> those are not conditional stmts or ranges from feeding those. The
>>>>>>> machinery doing the use-def walking from stmt context also cannot
>>>>>>> come along these so I have the suspicion that Ranger cannot handle
>>>>>>> telling us that for the stmt following above, for example
>>>>>>>
>>>>>>>   if (_5 != 0)
>>>>>>>
>>>>>>> that _5 is not zero?
>>>>>>>
>>>>>>> Can you clarify?
>>>>>> So there are 2 aspects to this. The range-ops code for DIV_EXPR, if
>>>>>> asked for the range of op2 () would return ~[0,0] for _5.
>>>>>> But you are also correct in that the walk backwards would not find
>>>>>> this.
>>>>>>
>>>>>> This is similar functionality to how null_derefs are currently
>>>>>> handled,
>>>>>> and in fact could probably be done simultaneously using the same code
>>>>>> base. I didn't bring null derefs up, but this is a good time :-)
>>>>>>
>>>>>> There is a separate class used by the gori-cache which tracks the
>>>>>> non-nullness property at the block level. It has a single API:
>>>>>> non_null_deref_p (name, bb) which determines whether there is a
>>>>>> dereference of NAME in BB, which indicates whether the range
>>>>>> has an
>>>>>> implicit ~[0,0] range in that basic block or not.
>>>>> So when we then have
>>>>>
>>>>>   _1 = *_2; // after this _2 is non-NULL
>>>>>   _3 = _1 + 1; // _3 is non-NULL
>>>>>   _4 = *_3;
>>>>> ...
>>>>>
>>>>> when a on-demand user asks whether _3 is non-NULL at the
>>>>> point of _4 = *_3 we don't have this information? Since the
>>>>> per-BB caching will only say _1 is non-NULL after the BB.
>>>>> I'm also not sure whether _3 ever gets non-NULL during
>>>>> non-NULL processing of the block since walking immediate uses
>>>>> doesn't really help here?
>>>> presumably _3 is globally non-null due to the definition being (pointer
>>>> + x) ... ie, _3 has a global range of ~[0,0] ?
>>> No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and
>>> you cannot arrive at NULL by pointer arithmetic from a non-NULL pointer.
>> I'm confused.
>>
>> _1 was loaded from _2 (thus asserting _2 is non-NULL), but we have no
>> idea what the range of _1 is, so how do you assert _1 is ~[0,0]?
>> The only way I see to determine _3 is non-NULL is through the _4 = *_3
>> statement.
> Likewise. I don't see how we get ~[0,0] for _1, except at the point
> after the dereference of _3.
>
>
>>>>> So this seems to be a fundamental limitation [to the caching scheme],
>>>>> not sure if it is bad in practice.
>>>>>
>>>>> Or am I missing something?
>>>>>
>>>> Not missing anything. The non-nullness property is maintained globally at
>>>> the basic block level. Both _1 and _3 are flagged as being non-null in
>>>> the block. Upon exit, it's a bit check. If the global information does
>>>> not have the non-nullness property, then when a request is made for
>>>> non-nullness and the def and the use are both within the same block,
>>>> and it's flagged as being non-null in that block, then the request is
>>>> forced back to a quick walk between the def and the use to see if there
>>>> is any non-nullness introduced in between. Yes, that makes it a linear
>>>> walk, but it's infrequent, and often short, to the best of our knowledge
>>>> at this point anyway :-)
>>> So with the clarification above do we ever see that _3 is non-NULL?
>>> I suppose the worker processing _3 = _1 + 1 would ask for
>>> _1 non-nullness but we do not record any non-NULL-ness of _1 in
>>> this basic-block (but only at its end). Consider stmts
>>>
>>>   _4 = (uintptr_t) _2;
>>>   _5 = _6 / _4;
>>>   _1 = *_2;
>>> ...
>>>
>>> here at _1 we know _2 is not NULL. But if we ask for non-NULLness
>>> of _2 at the definition of _4 we may not compute ~[0, 0] and thus
>>> conclude that _6 / _4 does not trap.
>> EVRP must look backwards to figure this out since the forward walk will
>> process _5 = _6 / _4 before it sees the dereference to _2... so how does
>> it know that _4 is non-zero without looking backwards at things after it
>> sees the dereference?? Does it actually do this?
> During the forward walk we process the assignment to _5, which is _6 /
> _4. We can infer that _4 is nonzero because division by zero is
> undefined behavior. But I'm not sure how EVRP would go back and then
> make a determination about _4 unless it's doing so via an equivalence.
>
>
>
>>> stmt-level tracking of ranges are sometimes important. This is
>>> something the machinery cannot provide - correct? At least not
>>> optimistically enough with ranges derived about uses.
>> Maybe I'm the one missing something, but in the absence of statement
>> level exception throwing via 'can_throw_non_call_exceptions' being true,
>> any assertion made anywhere in the block to an ssa_name applies to the
>> entire block, does it not? ie, it doesn't matter if the dereference
>> happens first thing in the block or last thing, it's not going to change
>> its value within the block; it's going to be non-null throughout the
>> entire block.
> No, I don't think it can hold for the entire block.
>
> Consider
>
> x = p ? 10 : 20;
> foo (x)
> *p = whatever
>
> We don't know p is non-null because foo might not return. If by way of
> the dereference we were to assert p is non-null for the entire block,
> then we'd pass the wrong value to foo().
>
> Jeff
>
Interesting. I started wondering about calls when I was talking about
non-call exceptions, but couldn't think of an example off the top of my
head.
OK, so the statement holds true for any section of code without calls in
it. If the block is marked as having a non-null deref in it, I need to
look at the statements in between the relevant def and use to see if there
is an impact rather than just checking the flag. If the flag is false, we
need to do nothing else.
I don't think any example above affects this. If we adjust the first
example to be
  _1 = *_2; // after this _2 is non-NULL
  _3 = _2 + 1; // _3 is non-NULL
  _4 = *_3;
then when we are stepping back and evaluating
 _3 = _2 + 1, if _2 has a range of varying, and the
"non-null_in_block" flag is set, it would have to trigger a walk up the
block looking to see if we find a *_2.
The walk terminates when we see
 a) a call, resulting in _2 = varying
 b) the start of the block, resulting in _2 = varying, or
 c) *_2, resulting in _2 = ~[0,0]
Now, this only triggers when there is *both* a dereference and a
'normal' use in the same block. If both conditions aren't met, then no
walk is required.
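The walk with its three termination conditions can be sketched directly. This is an illustrative model with invented types (a flat statement vector standing in for a GIMPLE block), not the real implementation:

```cpp
#include <cassert>
#include <vector>

// Illustrative sketch of the in-block walk described above: scan
// backwards from a use of NAME toward the block start, terminating at
// (a) a call, (b) the block start, or (c) a dereference *NAME.
enum stmt_kind { STMT_CALL, STMT_DEREF, STMT_OTHER };
struct stmt { stmt_kind kind; int name; };  // name: pointer operand of a DEREF

enum nonnull_result { VARYING, NON_NULL };

nonnull_result
walk_back_for_non_null (const std::vector<stmt> &bb, int use_idx, int name)
{
  for (int i = use_idx - 1; i >= 0; --i)
    {
      if (bb[i].kind == STMT_CALL)
        return VARYING;                           // (a) call: give up
      if (bb[i].kind == STMT_DEREF && bb[i].name == name)
        return NON_NULL;                          // (c) *name seen: ~[0,0]
    }
  return VARYING;                                 // (b) reached block start
}
```

A call terminates the walk per the earlier observation that a non-returning call means the dereference may never execute.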
There may also be a better approach, but I'd wait until we measured and
saw some problem with this.
If it turns out to be some sort of real issue, then we could always fall
back to having the engine recognize that "oh, this block has both a
deref and a use, let's evaluate the entire block before answering the
question". The results are all cached, and any additional queries for
that block are then already calculated.
I wonder how frequently it comes up. Maybe I'll try some measurements.
Best case, we do the minimum like we do today; worst case, we end up
evaluating everything in every block (if every block has this
situation). But we'd be doing that without the on-demand approach
anyway, so it'd just be break even in that case.
Andrew
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-05-31 20:27 ` Andrew MacLeod
@ 2019-05-31 22:00 ` Jeff Law
2019-05-31 23:08 ` Andrew MacLeod
0 siblings, 1 reply; 21+ messages in thread
From: Jeff Law @ 2019-05-31 22:00 UTC (permalink / raw)
To: Andrew MacLeod, Richard Biener; +Cc: GCC, Aldy Hernandez
On 5/31/19 2:26 PM, Andrew MacLeod wrote:
> On 5/31/19 2:16 PM, Jeff Law wrote:
>> On 5/31/19 9:40 AM, Andrew MacLeod wrote:
>>>> stmt-level tracking of ranges are sometimes important. This is
>>>> something the machinery cannot provide - correct? At least not
>>>> optimistically enough with ranges derived about uses.
>>> Maybe I'm the one missing something, but in the absence of statement
>>> level exception throwing via 'can_throw_non_call_exceptions' being true,
>>> any assertion made anywhere in the block to an ssa_name applies to the
>>> entire block, does it not? ie, it doesn't matter if the dereference
>>> happens first thing in the block or last thing, it's not going to change
>>> its value within the block; it's going to be non-null throughout the
>>> entire block.
>> No, I don't think it can hold for the entire block.
>>
>> Consider
>>
>>   x = p ? 10 : 20;
>>   foo (x)
>>   *p = whatever
>>
>> We don't know p is non-null because foo might not return. If by way of
>> the dereference we were to assert p is non-null for the entire block,
>> then we'd pass the wrong value to foo().
>>
>> Jeff
>>
>
> Interesting. I started wondering about calls when I was talking about
> non-call exceptions, but couldn't think of an example off the top of my
> head.
>
> OK, so the statement holds true for any section of code without calls in
> it. If the block is marked as having a non-null deref in it, I need to
> look at the statements in between the relevant def and use to see if there
> is an impact rather than just checking the flag. If the flag is false, we
> need to do nothing else.
What about in a multi-threaded application? No calls here....
x = p ? 10 : 20;
sharedvar = x;
[ bunch of straightline code]
*p = <whatever>
Our thread gets preempted and for some reason never restarts? Another
thread could read sharedvar and get the wrong value?
But one might argue we have similar problems in other places where we
"back propagate" non-nullness. I'm thinking about cases where the
pointer is passed to a function in an argument slot marked as non-null.
We mark the name as currently having a non-null value and it can get
back propagated around in rather surprising ways and we've been called
out on it.
Dunno... Probably worth mulling over some more...
Jeff
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-05-31 22:00 ` Jeff Law
@ 2019-05-31 23:08 ` Andrew MacLeod
0 siblings, 0 replies; 21+ messages in thread
From: Andrew MacLeod @ 2019-05-31 23:08 UTC (permalink / raw)
To: Jeff Law, Richard Biener; +Cc: GCC, Aldy Hernandez
On 5/31/19 6:00 PM, Jeff Law wrote:
> On 5/31/19 2:26 PM, Andrew MacLeod wrote:
>> On 5/31/19 2:16 PM, Jeff Law wrote:
>>> On 5/31/19 9:40 AM, Andrew MacLeod wrote:
>>>>> stmt-level tracking of ranges are sometimes important. This is
>>>>> something the machinery cannot provide - correct? At least not
>>>>> optimistically enough with ranges derived about uses.
>>>> Maybe I'm the one missing something, but in the absence of statement
>>>> level exception throwing via 'can_throw_non_call_exceptions' being true,
>>>> any assertion made anywhere in the block to an ssa_name applies to the
>>>> entire block, does it not? ie, it doesn't matter if the dereference
>>>> happens first thing in the block or last thing, it's not going to change
>>>> its value within the block; it's going to be non-null throughout the
>>>> entire block.
>>> No, I don't think it can hold for the entire block.
>>>
>>> Consider
>>>
>>>   x = p ? 10 : 20;
>>>   foo (x)
>>>   *p = whatever
>>>
>>> We don't know p is non-null because foo might not return. If by way of
>>> the dereference we were to assert p is non-null for the entire block,
>>> then we'd pass the wrong value to foo().
>>>
>>> Jeff
>>>
>> Interesting. I started wondering about calls when I was talking about
>> non-call exceptions, but couldn't think of an example off the top of my
>> head.
>>
>> OK, so the statement holds true for any section of code without calls in
>> it. If the block is marked as having a non-null deref in it, I need to
>> look at the statements in between the relevant def and use to see if there
>> is an impact rather than just checking the flag. If the flag is false, we
>> need to do nothing else.
> What about in a multi-threaded application? No calls here....
>
> x = p ? 10 : 20;
> sharedvar = x;
> [ bunch of straightline code]
> *p = <whatever>
>
> Our thread gets preempted and for some reason never restarts? Another
> thread could read sharedvar and get the wrong value?
oh we're getting into the weeds :-)
the program dies if *p ever gets executed and p is NULL. So if we
write sharedvar = 10 because we know this, does that make the program
wrong if p is NULL? I dunno. Maybe I suppose some other thread starts
doing something when it sees 20 and knows this one is going to die, and
now it won't know it's going to die. What a screwed up program :-).
>
> But one might argue we have similar problems in other places where we
> "back propagate" non-nullness. I'm thinking about cases where the
> pointer is passed to a function in an argument slot marked as non-null.
> We mark the name as currently having a non-null value and it can get
> back propagated around in rather surprising ways and we've been called
> out on it.
>
> Dunno... Probably worth mulling over some more...
> Jeff
yeah, we may not be able to optimistically back propagate. I was
thinking about it, and what happens if you tried to push that
non-nullness back into previous blocks? You can get into trouble quickly
:-) So it does seem like it might be important to maintain the temporal
integrity of a range when any kind of inferred knowledge like *p or /x
is applied to it.
This is not a pressing issue yet since we aren't trying to immediately
integrate the on-demand engine. I'll mull it over some more, but we may
be able to do some processing for blocks in which there is an issue with
non-zeroness as needed. And the fallback to fully enumerate and
forward process any block affected should certainly work and not cost
more than the current mechanism which does the same thing.
Andrew
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-05-31 15:40 ` Andrew MacLeod
2019-05-31 18:16 ` Jeff Law
@ 2019-06-04 15:27 ` Richard Biener
2019-06-04 15:38 ` Richard Biener
2019-06-04 16:49 ` Andrew MacLeod
2019-06-04 20:04 ` Martin Sebor
2 siblings, 2 replies; 21+ messages in thread
From: Richard Biener @ 2019-06-04 15:27 UTC (permalink / raw)
To: Andrew MacLeod; +Cc: GCC, Jeff Law, Aldy Hernandez
On Fri, May 31, 2019 at 5:40 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>
> On 5/29/19 7:15 AM, Richard Biener wrote:
> > On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> >> On 5/27/19 9:02 AM, Richard Biener wrote:
> >>> On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> >>>>> The above suggests that iff this is done at all it is not in GORI because
> >>>>> those are not conditional stmts or ranges from feeding those. The
> >>>>> machinery doing the use-def walking from stmt context also cannot
> >>>>> come along these so I have the suspicion that Ranger cannot handle
> >>>>> telling us that for the stmt following above, for example
> >>>>>
> >>>>> if (_5 != 0)
> >>>>>
> >>>>> that _5 is not zero?
> >>>>>
> >>>>> Can you clarify?
> >>>> So there are 2 aspects to this. the range-ops code for DIV_EXPR, if
> >>>> asked for the range of op2 () would return ~[0,0] for _5.
> >>>> But you are also correct in that the walk backwards would not find this.
> >>>>
> >>>> This is similar functionality to how null_derefs are currently handled,
> >>>> and in fact could probably be done simultaneously using the same code
> >>>> base. I didn't bring null derefs up, but this is a good time :-)
> >>>>
> >>>> There is a separate class used by the gori-cache which tracks the
> >>>> non-nullness property at the block level. It has a single API:
> >>>> non_null_deref_p (name, bb) which determines whether there is a
> >>>> dereference of NAME in BB, which indicates whether the range has an
> >>>> implicit ~[0,0] range in that basic block or not.
> >>> So when we then have
> >>>
> >>> _1 = *_2; // after this _2 is non-NULL
> >>> _3 = _1 + 1; // _3 is non-NULL
> >>> _4 = *_3;
> >>> ...
> >>>
> >>> when a on-demand user asks whether _3 is non-NULL at the
> >>> point of _4 = *_3 we don't have this information? Since the
> >>> per-BB caching will only say _1 is non-NULL after the BB.
> >>> I'm also not sure whether _3 ever gets non-NULL during
> >>> non-NULL processing of the block since walking immediate uses
> >>> doesn't really help here?
> >> presumably _3 is globally non-null due to the definition being (pointer
> >> + x) ... ie, _3 has a global range of ~[0,0] ?
> > No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and
> > you cannot arrive at NULL by pointer arithmetic from a non-NULL pointer.
>
> I'm confused.
>
> _1 was loaded from _2 (thus asserting _2 is non-NULL), but we have no
> idea what the range of _1 is, so how do you assert _1 is ~[0,0]?
Bug on my part - it should offset _2:
_1 = *_2; // after this _2 is non-NULL
_3 = _2 + 1; // _3 is non-NULL
_4 = *_3;
> The only way I see to determine _3 is non-NULL is through the _4 = *_3
> statement.
So how do you derive a ~[0,0] _2 and thus know that *_3 does not trap?
>
> >
> >>> So this seems to be a fundamental limitation [to the caching scheme],
> >>> not sure if it is bad in practice.
> >>>
> >>> Or am I missing something?
> >>>
> >> Not missing anything. The non-nullness property is maintained globally at
> >> the basic block level. Both _1 and _3 are flagged as being non-null in
> >> the block. Upon exit, it's a bit check. If the global information does
> >> not have the non-nullness property, then when a request is made for
> >> non-nullness and the def and the use are both within the same block,
> >> and it's flagged as being non-null in that block, then the request is
> >> forced back to a quick walk between the def and the use to see if there
> >> is any non-nullness introduced in between. Yes, that makes it a linear
> >> walk, but it's infrequent, and often short, to the best of our knowledge
> >> at this point anyway :-)
> > So with the clarification above do we ever see that _3 is non-NULL?
> > I suppose the worker processing _3 = _1 + 1 would ask for
> > _1 non-nullness but we do not record any non-NULL-ness of _1 in
> > this basic-block (but only at its end). Consider stmts
> >
> > _4 = (uintptr_t) _2;
> > _5 = _6 / _4;
> > _1 = *_2;
> > ...
> >
> > here at _1 we know _2 is not NULL. But if we ask for non-NULLness
> > of _2 at the definition of _4 we may not compute ~[0, 0] and thus
> > conclude that _6 / _4 does not trap.
> EVRP must look backwards to figure this out since the forward walk will
> process _5 = _6 / _4 before it sees the dereference to _2... so how does
> it know that _4 is non-zero without looking backwards at things after it
> sees the dereference?? Does it actually do this?
It actually doesn't do this for divisions (but I have a patch lying somewhere).
During the forward walk when coming along _5 = _6 / _4; it would
in evrp_range_analyzer::record_ranges_from_stmt via infer_value_range
push a new value-range for _4. To infer the same for _2 it would need to
look backward which I think it doesn't do right now but it easily could
(I'm just trying to make up examples to understand ranger better).
When visiting _1 = *_2 we then know _2 is not NULL.
There may be more "useful" cases, for example we could derive
ranges for shift operands based on undefinedness and their range
may be useful for simplifying adjacent stmts.
I'm trying to discover how ranger apparently recording ranges only
at BB boundaries interacts with ranges that become "live" at
some point inside the BB.
> >
> > stmt-level tracking of ranges are sometimes important. This is
> > something the machinery cannot provide - correct? At least not
> > optimistically enough with ranges derived about uses.
>
> Maybe I'm the one missing something, but in the absence of statement
> level exception throwing via 'can_throw_non_call_exceptions' being true,
> any assertion made anywhere in the block to an ssa_name applies to the
> entire block, does it not? ie, it doesn't matter if the dereference
> happens first thing in the block or last thing, it's not going to change
> its value within the block; it's going to be non-null throughout the
> entire block.
Well, one complication I had with the division by zero patch is that
for _2 = _1 / _3 you can derive that _3 is not zero _after_ the stmt
but you may of course not assume it is not zero during evaluation
of the stmt because that may lead you to assume it may not trap
and thus move it to a place it was not executed previously.
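The ordering constraint Richard describes here can be modelled in a few lines. This is an illustrative Python sketch, not GCC code; the statement encoding and names are invented. A range inferred from a statement's own trapping behaviour (divisor nonzero) holds only for points strictly after that statement.

```python
def known_nonzero(stmts, name, point):
    """Return True if `name` is known nonzero at statement index `point`.

    A division by `name` lets us infer ~[0,0], but only for statements
    strictly AFTER the divide: on the divide itself the inference would
    wrongly license hoisting it above a guard.
    """
    for i, (op, operand) in enumerate(stmts):
        if i >= point:
            break                      # inferences from later stmts don't apply
        if op == "div" and operand == name:
            return True                # the divide executed, so name != 0 here
    return False

stmts = [("div", "_3"), ("use", "_3")]
assert known_nonzero(stmts, "_3", 0) == False   # not ON the divide itself
assert known_nonzero(stmts, "_3", 1) == True    # valid after the divide
```

The asymmetry between index 0 and index 1 is exactly why the inferred range cannot justify moving the division earlier.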
> so if one statement in the block asserts that references to _2 are
> non-null, we can assert that all references to _2 in the block are
> non-null. Meaning we get all these cases by knowing that the specified
> name is non-zero throughout the block. This also means we could know
> things earlier in the block than a forward walk would provide.
>
> So with the 2 examples:
>
> _1 = *_2; // after this _2 is non-NULL
> _3 = _1 + 1;
> _4 = *_3;
>
>
>
> both _2 and _3 are flagged as non-null in the block due to the
> dereferences. I'm not sure what we know about _1 from above, but we do
> know
> ~[0,0] = _1 + 1, so whatever that tells us, if anything, about
> _1 we know. It seems to me _1 is ~[-1,-1] based on that...
>
> Regardless, I think we know all the same things EVRP does.
>
> likewise
>
> _4 = (uintptr_t) _2;
> _5 = _6 / _4;
> _1 = *_2;
>
> _2 will be non-null in the entire block, so _4 must also be non-null
> and we can conclude that the divide does not trap.
And move the divide up like so?
_4 = (uintptr_t) _2;
_5 = _6 / _4;
if (flag)
{
_1 = *_2;
...
}
most definitely not.
>
>
> now, when we set the 'can_throw_non_call_exceptions' flag, then we'd
> have to resort to statement walks, and we cannot determine that _5 does
> not trap anyway. EVRP is in the same boat: it doesn't know it's not
> going to trap either, because we may never get to the *_2.
The point is not that we know *_2 does not trap - we don't! We just know
that if the program reaches the next stmt it did not and thus _2 was not NULL
_for the stmts following the dereference_.
Note I'm not sure it will make a difference in practice since any transform
on a stmt after *_2 relying on the above creates constraints on stmt ordering
which we cannot represent in the IL and thus the result is likely going to be
very fragile.
As said, I was just wondering if the non-NULL machinery you have
integrates with the range machinery or is really independent
(thus the pointer derived from the non-NULL one due to the dereference
case above).
> >>>> yes, compile-time complexity is from empirical speed timings and
> >>>> theory-crafting from the algorithms, and that the on-entry cache
> >>>> prevents multiple passes over things.
> >>>>
> >>>> we have not done a memory analysis yet, not done anything to see if we
> >>>> can improve it.
> >>>> It makes very heavy use of bitmaps, which are typically quite sparse.
> >>>> The on-entry cache is a vector of pointers to a range, initially 0, and
> >>>> we allocate ranges as needed. There will be an on-entry range entry
> >>>> for any ssa-name which has been queried between the query point and the
> >>>> definition.
> >>> So that's similar to LIVE-on-entry thus a SSA name with a range
> >>> will have an on-entry range entry on each BB it dominates?
> >>> That makes storage requirement quadratic in the worst case.
> >> Yes, assuming a request has been made for all ssa-names everywhere they
> >> are live.
> > You did propose to replace [E]VRP with a walk over the whole function
> > querying ranges and simplifying stmts, did you?
> yes, for the case of EVRP. But not all use cases query every range
> everywhere.
>
> it's also a bit lower than that, since we cache certain types of ranges.
> We cache a range for varying (or range for type, if you prefer) of
> whatever the ssa-name type is. So if the range is varying everywhere,
> we actually only have one instance of the range rather than N of them.
> So any name that doesn't have a range reduction anywhere will only
> create a single range instance for the entire CFG.
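The sharing Andrew describes can be sketched in miniature. This is an illustrative Python model with invented names, not the actual GCC cache: every block whose on-entry range for a name is simply "varying" points at one shared range object per type instead of holding its own copy.

```python
class RangeCache:
    """Toy on-entry cache: 'varying' ranges are shared per type instead
    of being duplicated once per basic block."""

    def __init__(self):
        self._varying = {}      # type name -> the single shared range
        self._on_entry = {}     # (ssa_name, bb) -> range

    def varying_range(self, type_name, bounds):
        # setdefault stores bounds on the first call and returns the
        # already-stored object on every later call
        return self._varying.setdefault(type_name, bounds)

    def set_on_entry(self, name, bb, rng):
        self._on_entry[(name, bb)] = rng

cache = RangeCache()
r1 = cache.varying_range("int", (-(2**31), 2**31 - 1))
r2 = cache.varying_range("int", (-(2**31), 2**31 - 1))
assert r1 is r2    # one instance serves every block that needs it
```

Blocks that never narrow a name's range thus cost one shared object in total, not one per block.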
But you still have a reference to the range in every BB dominated by the
definition?
> I think that's the
> most common value, so that should reduce a large number of them. I've
> also considered caching ranges like we cache tree constants... but I
> haven't delved into that. I figured if memory turns out to be a
> problem, then we'll look at it then.
>
> Andrew
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-06-04 15:27 ` Richard Biener
@ 2019-06-04 15:38 ` Richard Biener
2019-06-04 16:50 ` Andrew MacLeod
2019-06-04 16:49 ` Andrew MacLeod
1 sibling, 1 reply; 21+ messages in thread
From: Richard Biener @ 2019-06-04 15:38 UTC (permalink / raw)
To: Andrew MacLeod; +Cc: GCC, Jeff Law, Aldy Hernandez
On Tue, Jun 4, 2019 at 5:26 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, May 31, 2019 at 5:40 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> >
> > On 5/29/19 7:15 AM, Richard Biener wrote:
> > > On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> > >> On 5/27/19 9:02 AM, Richard Biener wrote:
> > >>> On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> > >>>>> The above suggests that iff this is done at all it is not in GORI because
> > >>>>> those are not conditional stmts or ranges from feeding those. The
> > >>>>> machinery doing the use-def walking from stmt context also cannot
> > >>>>> come along these so I have the suspicion that Ranger cannot handle
> > >>>>> telling us that for the stmt following above, for example
> > >>>>>
> > >>>>> if (_5 != 0)
> > >>>>>
> > >>>>> that _5 is not zero?
> > >>>>>
> > >>>>> Can you clarify?
> > >>>> So there are 2 aspects to this. the range-ops code for DIV_EXPR, if
> > >>>> asked for the range of op2 () would return ~[0,0] for _5.
> > >>>> But you are also correct in that the walk backwards would not find this.
> > >>>>
> > >>>> This is similar functionality to how null_derefs are currently handled,
> > >>>> and in fact could probably be done simultaneously using the same code
> > >>>> base. I didn't bring null derefs up, but this is a good time :-)
> > >>>>
> > >>>> There is a separate class used by the gori-cache which tracks the
> > >>>> non-nullness property at the block level. It has a single API:
> > >>>> non_null_deref_p (name, bb), which determines whether there is a
> > >>>> dereference of NAME in BB, which indicates whether the range has an
> > >>>> implicit ~[0,0] range in that basic block or not.
> > >>> So when we then have
> > >>>
> > >>> _1 = *_2; // after this _2 is non-NULL
> > >>> _3 = _1 + 1; // _3 is non-NULL
> > >>> _4 = *_3;
> > >>> ...
> > >>>
> > >>> when an on-demand user asks whether _3 is non-NULL at the
> > >>> point of _4 = *_3, we don't have this information? Since the
> > >>> per-BB caching will only say _1 is non-NULL after the BB.
> > >>> I'm also not sure whether _3 ever gets non-NULL during
> > >>> non-NULL processing of the block since walking immediate uses
> > >>> doesn't really help here?
> > >> presumably _3 is globally non-null due to the definition being (pointer
> > >> + x) ... i.e., _3 has a global range of ~[0,0] ?
> > > No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and
> > > you cannot arrive at NULL by pointer arithmetic from a non-NULL pointer.
> >
> > I'm confused.
> >
> > _1 was loaded from _2 (thus asserting _2 is non-NULL). but we have no
> > idea what the range of _1 is, so how do you assert _1 is [~0,0] ?
>
> Bug on my part - it should offset _2:
>
> _1 = *_2; // after this _2 is non-NULL
> _3 = _2 + 1; // _3 is non-NULL
> _4 = *_3;
>
> > The only way I see to determine _3 is non-NULL is through the _4 = *_3
> > statement.
>
> So how do you derive a ~[0,0] _2 and thus know that *_3 does not trap?
>
> >
> > >
> > >>> So this seems to be a fundamental limitation [to the caching scheme],
> > >>> not sure if it is bad in practice.
> > >>>
> > >>> Or am I missing something?
> > >>>
> > >> Not missing anything. The non-nullness property is maintained globally at
> > >> the basic block level. Both _1 and _3 are flagged as being non-null in
> > >> the block. Upon exit, it's a bit check. If the global information does
> > >> not have the non-nullness property, then when a request is made for
> > >> non-nullness, and the def and the use are both within the same block,
> > >> and it's flagged as being non-null in that block, then the request is
> > >> forced back to a quick walk between the def and the use to see if there
> > >> is any non-nullness introduced in between. Yes, that makes it a linear
> > >> walk, but it's infrequent, and often short. To the best of our knowledge
> > >> at this point anyway :-)
> > > So with the clarification above do we ever see that _3 is non-NULL?
> > > I suppose the worker processing _3 = _1 + 1 would ask for
> > > _1 non-nullness but we do not record any non-NULL-ness of _1 in
> > > this basic-block (but only at its end). Consider stmts
> > >
> > > _4 = (uintptr_t) _2;
> > > _5 = _6 / _4;
> > > _1 = *_2;
> > > ...
> > >
> > > here at _1 we know _2 is not NULL. But if we ask for non-NULLness
> > > of _2 at the definition of _4 we may not compute ~[0, 0] and thus
> > > conclude that _6 / _4 does not trap.
> > EVRP must look backwards to figure this out since the forward walk will
> > process _5 = _6 / _4 before it sees the dereference to _2... so how does
> > it know that _4 is non-zero without looking backwards at things after it
> > sees the dereference?? Does it actually do this?
>
> It actually doesn't do this for divisions (but I have a patch lying somewhere).
> During the forward walk when coming along _5 = _6 / _4; it would
> in evrp_range_analyzer::record_ranges_from_stmt via infer_value_range
> push a new value-range for _4. To infer the same for _2 it would need to
> look backward which I think it doesn't do right now but it easily could
> (I'm just trying to make up examples to understand ranger better).
> When visiting _1 = *_2 we then know _2 is not NULL.
>
> There may be more "useful" cases, for example we could derive
> ranges for shift operands based on undefinedness and their range
> may be useful for simplifying adjacent stmts.
>
> I'm trying to discover how ranger, apparently recording ranges only
> at BB boundaries, interacts with ranges that become "live" at
> some point inside the BB.
>
> > >
> > > stmt-level tracking of ranges are sometimes important. This is
> > > something the machinery cannot provide - correct? At least not
> > > optimistically enough with ranges derived about uses.
> >
> > Maybe I'm the one missing something, but in the absence of statement
> > level exception throwing via 'can_throw_non_call_exceptions' being true,
> > any assertion made anywhere in the block about an ssa_name applies to the
> > entire block, does it not? i.e. it doesn't matter if the dereference
> > happens first thing in the block or last thing; it's not going to change
> > its value within the block.. it's going to be non-null throughout the
> > entire block.
>
> Well, one complication I had with the division by zero patch is that
> for _2 = _1 / _3 you can derive that _3 is not zero _after_ the stmt
> but you may of course not assume it is not zero during evaluation
> of the stmt because that may lead you to assume it may not trap
> and thus move it to a place it was not executed previously.
>
> > so if one statement in the block asserts that references to _2 are
> > non-null, we can assert that all references to _2 in the block are
> > non-null. Meaning we get all these cases by knowing that the specified
> > name is non-zero throughout the block. This also means we could know
> > things earlier in the block than a forward walk would provide.
> >
> > So with the 2 examples:
> >
> > _1 = *_2; // after this _2 is non-NULL
> > _3 = _1 + 1;
> > _4 = *_3;
> >
> >
> >
> > both _2 and _3 are flagged as non-null in the block due to the
> > dereferences. I'm not sure what we know about _1 from above, but we do
> > know
> > ~[0,0] = _1 + 1, so whatever that tells us, if anything, about
> > _1 we know. It seems to me _1 is ~[-1,-1] based on that...
> >
> > Regardless, I think we know all the same things EVRP does.
> >
> > likewise
> >
> > _4 = (uintptr_t) _2;
> > _5 = _6 / _4;
> > _1 = *_2;
> >
> > _2 will be non-null in the entire block, so _4 must also be non-null
> > and we can conclude that the divide does not trap.
>
> And move the divide up like so?
>
> _4 = (uintptr_t) _2;
> _5 = _6 / _4;
> if (flag)
> {
> _1 = *_2;
> ...
> }
>
> most definitely not.
>
> >
> >
> > now, when we set the 'can_throw_non_call_exceptions' flag, then we'd
> > have to resort to statement walks, and we cannot determine that _5 does
> > not trap anyway. EVRP is in the same boat: it doesn't know it's not
> > going to trap either, because we may never get to the *_2.
>
> The point is not that we know *_2 does not trap - we don't! We just know
> that if the program reaches the next stmt it did not and thus _2 was not NULL
> _for the stmts following the dereference_.
>
> Note I'm not sure it will make a difference in practice since any transform
> on a stmt after *_2 relying on the above creates constraints on stmt ordering
> which we cannot represent in the IL and thus the result is likely going to be
> very fragile.
>
> As said, I was just wondering if the non-NULL machinery you have
> integrates with the range machinery or is really independent
> (thus the pointer derived from the non-NULL one due to the dereference
> case above).
>
> > >>>> yes, compile-time complexity is from empirical speed timings and
> > >>>> theory-crafting from the algorithms, and that the on-entry cache
> > >>>> prevents multiple passes over things.
> > >>>>
> > >>>> we have not done a memory analysis yet, not done anything to see if we
> > >>>> can improve it.
> > >>>> It makes very heavy use of bitmaps, which are typically quite sparse.
> > >>>> The on-entry cache is a vector of pointers to a range, initially 0, and
> > >>>> we allocate ranges as needed. There will be an on-entry range entry
> > >>>> for any ssa-name which has been queried between the query point and the
> > >>>> definition.
> > >>> So that's similar to LIVE-on-entry thus a SSA name with a range
> > >>> will have an on-entry range entry on each BB it dominates?
> > >>> That makes storage requirement quadratic in the worst case.
> > >> Yes, assuming a request has been made for all ssa-names everywhere they
> > >> are live.
> > > You did propose to replace [E]VRP with a walk over the whole function
> > > querying ranges and simplifying stmts, did you?
> > yes, for the case of EVRP. But not all use cases query every range
> > everywhere.
> >
> > it's also a bit lower than that, since we cache certain types of ranges.
> > We cache a range for varying (or range for type, if you prefer) of
> > whatever the ssa-name type is. So if the range is varying everywhere,
> > we actually only have one instance of the range rather than N of them.
> > So any name that doesn't have a range reduction anywhere will only
> > create a single range instance for the entire CFG.
>
> But you still have a reference to the range in every BB dominated by the
> definition?
Btw, I was thinking of

unsigned char foo(long);
void baz(unsigned char c)
{
  try {
      long i = c;
      i += foo(i);
      i += foo(i);
      i += foo(i);
      i += foo(i);
      ... repeat N times ...
      alloca(i); // query range of i
  }
  catch (...)
  {
  }
}

where the single(!) query of the range of i on the alloca call
will populate N BBs' caches, with the Nth having ranges for
all SSA defs of i, running into quadraticness in N.
If you actually try this, you will probably have to find a persisting
stmt that some user of the on-demand query looks for.
You converted some of the warning passes, so choose
a stmt that triggers it from there.
Note we've run into "interesting" testcases, mostly from
machine-generated code, which we still want to be able
to optimize at -O1, at least with reasonable use of
time and memory (and quadraticness is to be avoided
like the plague - not that we don't have some).
> > I think that's the
> > most common value, so that should reduce a large number of them. I've
> > also considered caching ranges like we cache tree constants... but I
> > haven't delved into that. I figured if memory turns out to be a
> > problem, then we'll look at it then.
> >
> > Andrew
> >
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-06-04 15:27 ` Richard Biener
2019-06-04 15:38 ` Richard Biener
@ 2019-06-04 16:49 ` Andrew MacLeod
1 sibling, 0 replies; 21+ messages in thread
From: Andrew MacLeod @ 2019-06-04 16:49 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC, Jeff Law, Aldy Hernandez
On 6/4/19 11:26 AM, Richard Biener wrote:
> On Fri, May 31, 2019 at 5:40 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>> On 5/29/19 7:15 AM, Richard Biener wrote:
>>> On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>>>> On 5/27/19 9:02 AM, Richard Biener wrote:
>>>>> On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>>>>>>> The above suggests that iff this is done at all it is not in GORI because
>>>>>>> those are not conditional stmts or ranges from feeding those. The
>>>>>>> machinery doing the use-def walking from stmt context also cannot
>>>>>>> come along these so I have the suspicion that Ranger cannot handle
>>>>>>> telling us that for the stmt following above, for example
>>>>>>>
>>>>>>> if (_5 != 0)
>>>>>>>
>>>>>>> that _5 is not zero?
>>>>>>>
>>>>>>> Can you clarify?
>>>>>> So there are 2 aspects to this. the range-ops code for DIV_EXPR, if
>>>>>> asked for the range of op2 () would return ~[0,0] for _5.
>>>>>> But you are also correct in that the walk backwards would not find this.
>>>>>>
>>>>>> This is similar functionality to how null_derefs are currently handled,
>>>>>> and in fact could probably be done simultaneously using the same code
>>>>>> base. I didn't bring null derefs up, but this is a good time :-)
>>>>>>
>>>>>> There is a separate class used by the gori-cache which tracks the
>>>>>> non-nullness property at the block level. It has a single API:
>>>>>> non_null_deref_p (name, bb), which determines whether there is a
>>>>>> dereference of NAME in BB, which indicates whether the range has an
>>>>>> implicit ~[0,0] range in that basic block or not.
>>>>> So when we then have
>>>>>
>>>>> _1 = *_2; // after this _2 is non-NULL
>>>>> _3 = _1 + 1; // _3 is non-NULL
>>>>> _4 = *_3;
>>>>> ...
>>>>>
>>>>> when an on-demand user asks whether _3 is non-NULL at the
>>>>> point of _4 = *_3, we don't have this information? Since the
>>>>> per-BB caching will only say _1 is non-NULL after the BB.
>>>>> I'm also not sure whether _3 ever gets non-NULL during
>>>>> non-NULL processing of the block since walking immediate uses
>>>>> doesn't really help here?
>>>> presumably _3 is globally non-null due to the definition being (pointer
>>>> + x) ... i.e., _3 has a global range of ~[0,0] ?
>>> No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and
>>> you cannot arrive at NULL by pointer arithmetic from a non-NULL pointer.
>> I'm confused.
>>
>> _1 was loaded from _2 (thus asserting _2 is non-NULL). but we have no
>> idea what the range of _1 is, so how do you assert _1 is [~0,0] ?
> Bug on my part - it should offset _2:
>
> _1 = *_2; // after this _2 is non-NULL
> _3 = _2 + 1; // _3 is non-NULL
> _4 = *_3;
>
>> The only way I see to determine _3 is non-NULL is through the _4 = *_3
>> statement.
> So how do you derive a ~[0,0] _2 and thus know that *_3 does not trap?
Well, we know _2 is non-null on exit from the block (because of *_2). We
wind backwards, so we know it's non-null when we get to _3 = _2 + 1.
But as I will say later in this note, and in the earlier discussion with
Jeff in this thread, the non-null property needs some work, as I can
rewrite this to make it wrong. :-P
There is an interesting difference between what we know about things in
a forward walk, what we know in a backward walk, and what we know in a
def-chain walk :-)
>
>>>>> So this seems to be a fundamental limitation [to the caching scheme],
>>>>> not sure if it is bad in practice.
>>>>>
>>>>> Or am I missing something?
>>>>>
>>>> Not missing anything. The non-nullness property is maintained globally at
>>>> the basic block level. Both _1 and _3 are flagged as being non-null in
>>>> the block. Upon exit, it's a bit check. If the global information does
>>>> not have the non-nullness property, then when a request is made for
>>>> non-nullness, and the def and the use are both within the same block,
>>>> and it's flagged as being non-null in that block, then the request is
>>>> forced back to a quick walk between the def and the use to see if there
>>>> is any non-nullness introduced in between. Yes, that makes it a linear
>>>> walk, but it's infrequent, and often short. To the best of our knowledge
>>>> at this point anyway :-)
>>> So with the clarification above do we ever see that _3 is non-NULL?
>>> I suppose the worker processing _3 = _1 + 1 would ask for
>>> _1 non-nullness but we do not record any non-NULL-ness of _1 in
>>> this basic-block (but only at its end). Consider stmts
>>>
>>> _4 = (uintptr_t) _2;
>>> _5 = _6 / _4;
>>> _1 = *_2;
>>> ...
>>>
>>> here at _1 we know _2 is not NULL. But if we ask for non-NULLness
>>> of _2 at the definition of _4 we may not compute ~[0, 0] and thus
>>> conclude that _6 / _4 does not trap.
>> EVRP must look backwards to figure this out since the forward walk will
>> process _5 = _6 / _4 before it sees the dereference to _2... so how does
>> it know that _4 is non-zero without looking backwards at things after it
>> sees the dereference?? Does it actually do this?
> It actually doesn't do this for divisions (but I have a patch lying somewhere).
> During the forward walk when coming along _5 = _6 / _4; it would
> in evrp_range_analyzer::record_ranges_from_stmt via infer_value_range
> push a new value-range for _4. To infer the same for _2 it would need to
> look backward which I think it doesn't do right now but it easily could
> (I'm just trying to make up examples to understand ranger better).
> When visiting _1 = *_2 we then know _2 is not NULL.
>
> There may be more "useful" cases, for example we could derive
> ranges for shift operands based on undefinedness and their range
> may be useful for simplifying adjacent stmts.
>
> I'm trying to discover how ranger, apparently recording ranges only
> at BB boundaries, interacts with ranges that become "live" at
> some point inside the BB.
The ranger records that a name becomes non-zero somewhere within a
block; this is then propagated to the following blocks.
The case where there is work to do is when a name is tagged as having a
non-zero reference in the block, AND
the use being queried is not the non-zero reference, AND
  a) the range did contain zero before the block, OR
  b) the definition is within the block.
Then the only way it can determine whether any given use is non-zero is to
walk from the use back towards the def/start of the block, looking to see
whether we encounter the non-zero reference along the way.
This thread contains a longer discussion with Jeff about the block-level
info in the presence of threading as well, where we concluded it needs
some extra thought.
At the moment, there is not a perfect solution for this. Values inferred
from unrelated statements are clearly not the strongest aspect of
the backwards-walking ranger, which is why it needs something like the
non-null infrastructure to get these cases.
It's clearly a far more frequent thing for pointers than for other
integrals, and maybe there is something better that can be done for them.
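The in-block query Andrew outlines can be sketched as a tiny model (illustrative Python with an invented statement encoding, not the actual gori-cache code): the block is flagged when it contains any dereference of a name, and a query for a particular use resolves by walking backward from that use looking for an earlier dereference.

```python
def nonnull_at_use(stmts, name, use_index):
    """stmts: list of (kind, operand) tuples in block order; kind
    'deref' means the statement dereferences `operand`.

    Walk backward from the queried use toward the start of the block:
    if an earlier statement dereferenced `name`, then `name` is
    already known non-null (~[0,0]) at this use."""
    for kind, operand in reversed(stmts[:use_index]):
        if kind == "deref" and operand == name:
            return True
    return False

block = [("deref", "_2"),   # _1 = *_2  -> _2 non-null afterward
         ("add",   "_2"),   # _3 = _2 + 1
         ("deref", "_3")]   # _4 = *_3

assert nonnull_at_use(block, "_2", 1) == True    # after *_2
assert nonnull_at_use(block, "_2", 0) == False   # not before/on the deref itself
assert nonnull_at_use(block, "_3", 2) == False   # *_3 itself proves nothing earlier
```

The walk is linear in the distance between def and use, matching the "infrequent, and often short" characterization above.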
>>> stmt-level tracking of ranges are sometimes important. This is
>>> something the machinery cannot provide - correct? At least not
>>> optimistically enough with ranges derived about uses.
>> Maybe I'm the one missing something, but in the absence of statement
>> level exception throwing via 'can_throw_non_call_exceptions' being true,
>> any assertion made anywhere in the block about an ssa_name applies to the
>> entire block, does it not? i.e. it doesn't matter if the dereference
>> happens first thing in the block or last thing; it's not going to change
>> its value within the block.. it's going to be non-null throughout the
>> entire block.
> Well, one complication I had with the division by zero patch is that
> for _2 = _1 / _3 you can derive that _3 is not zero _after_ the stmt
> but you may of course not assume it is not zero during evaluation
> of the stmt because that may lead you to assume it may not trap
> and thus move it to a place it was not executed previously.
Yeah, during the reverse walk we can assume that when we reach this
point it's non-zero, but inferred values like this and *p cannot be
assumed ON the statement itself.
I guess that would have to be a property of inferred values.
>
>> so if one statement in the block asserts that references to _2 are
>> non-null, we can assert that all references to _2 in the block are
>> non-null. Meaning we get all these cases by knowing that the specified
>> name is non-zero throughout the block. This also means we could know
>> things earlier in the block than a forward walk would provide.
>>
>> So with the 2 examples:
>>
>> _1 = *_2; // after this _2 is non-NULL
>> _3 = _1 + 1;
>> _4 = *_3;
>>
>>
>>
>> both _2 and _3 are flagged as non-null in the block due to the
>> dereferences. I'm not sure what we know about _1 from above, but we do
>> know
>> ~[0,0] = _1 + 1, so whatever that tells us, if anything, about
>> _1 we know. It seems to me _1 is ~[-1,-1] based on that...
>>
>> Regardless, I think we know all the same things EVRP does.
>>
>> likewise
>>
>> _4 = (uintptr_t) _2;
>> _5 = _6 / _4;
>> _1 = *_2;
>>
>> _2 will be non-null in the entire block, so _4 must also be non-null
>> and we can conclude that the divide does not trap.
> And move the divide up like so?
>
> _4 = (uintptr_t) _2;
> _5 = _6 / _4;
> if (flag)
> {
> _1 = *_2;
> ...
> }
>
> most definitely not.
Well, you are in a different block, and there is nothing in the first
block that says it's non-null now, just the TRUE side of the if block...
but I get the point I think you are trying to make, and going back to the
discussion with Jeff, I think we've concluded that the in-block
non-null-ref tracking needs some work to be fully viable :-)
>>
>> now, when we set the 'can_throw_non_call_exceptions' flag, then we'd
>> have to resort to statement walks, and we cannot determine that _5 does
>> not trap anyway. EVRP is in the same boat: it doesn't know it's not
>> going to trap either, because we may never get to the *_2.
> The point is not that we know *_2 does not trap - we don't! We just know
> that if the program reaches the next stmt it did not and thus _2 was not NULL
> _for the stmts following the dereference_.
>
> Note I'm not sure it will make a difference in practice since any transform
> on a stmt after *_2 relying on the above creates constraints on stmt ordering
> which we cannot represent in the IL and thus the result is likely going to be
> very fragile.
>
> As said, I was just wondering if the non-NULL machinery you have
> integrates with the range machinery or is really independent
> (thus the pointer derived from the non-NULL one due to the dereference
> case above).
It is currently an independent class that is invoked by the ranger when
evaluating the range on an outgoing edge, to determine whether non-null
should be applied to the range.
It looks like I disabled any checks within the block, so at the moment
it won't apply any non-nullness within the block unless it was live on
entry. I see some potential issues even with the live-on-exit value
being propagated back up into the block if the value wasn't also non-zero
on entry. I think my current implementation of non-nullness is only
safe to use when the value is non-zero throughout the block.
>>>>>> yes, compile-time complexity is from empirical speed timings and
>>>>>> theory-crafting from the algorithms, and that the on-entry cache
>>>>>> prevents multiple passes over things.
>>>>>>
>>>>>> we have not done a memory analysis yet, not done anything to see if we
>>>>>> can improve it.
>>>>>> It makes very heavy use of bitmaps, which are typically quite sparse.
>>>>>> The on-entry cache is a vector of pointers to a range, initially 0, and
>>>>>> we allocate ranges as needed. There will be an on-entry range entry
>>>>>> for any ssa-name which has been queried between the query point and the
>>>>>> definition.
>>>>> So that's similar to LIVE-on-entry thus a SSA name with a range
>>>>> will have an on-entry range entry on each BB it dominates?
>>>>> That makes storage requirement quadratic in the worst case.
>>>> Yes, assuming a request has been made for all ssa-names everywhere they
>>>> are live.
>>> You did propose to replace [E]VRP with a walk over the whole function
>>> querying ranges and simplifying stmts, did you?
>> yes, for the case of EVRP. But not all use cases query every range
>> everywhere.
>>
>> it's also a bit lower than that, since we cache certain types of ranges.
>> We cache a range for varying (or range for type, if you prefer) of
>> whatever the ssa-name type is. So if the range is varying everywhere,
>> we actually only have one instance of the range rather than N of them.
>> So any name that doesn't have a range reduction anywhere will only
>> create a single range instance for the entire CFG.
> But you still have a reference to the range in every BB dominated by the
> definition?
Every BB dominated by the definition that leads to a use, yes. Since
we work bottom-up, the live-on-exit property is implicit, so once a name
goes out of liveness there are no more references. So anything that is
defined and used in the block, and never used again, will never be in the
cache.
>> I think that's the
>> most common value, so that should reduce a large number of them. I've
>> also considered caching ranges like we cache tree constants... but I
>> haven't delved into that. I figured if memory turns out to be a
>> problem, then we'll look at it then.
>>
>> Andrew
>>
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-06-04 15:38 ` Richard Biener
@ 2019-06-04 16:50 ` Andrew MacLeod
2019-06-04 17:07 ` Richard Biener
0 siblings, 1 reply; 21+ messages in thread
From: Andrew MacLeod @ 2019-06-04 16:50 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC, Jeff Law, Aldy Hernandez
On 6/4/19 11:37 AM, Richard Biener wrote:
> On Tue, Jun 4, 2019 at 5:26 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
>>
>> But you still have a reference to the range in every BB dominated by the
>> definition?
> Btw, I was thinking of
>
> unsigned char foo(long);
> void baz(unsigned char c)
> {
>   try {
>       long i = c;
>       i += foo(i);
>       i += foo(i);
>       i += foo(i);
>       i += foo(i);
>       ... repeat N times ...
>       alloca(i); // query range of i
>   }
>   catch (...)
>   {
>   }
> }
>
>
> where the single(!) query of the range of i on the alloca call
> will populate N BBs' caches, with the Nth having ranges for
> all SSA defs of i, running into quadraticness in N.
Well, not really. It's still linear, since the 'i' used in each foo()
will be dead, so the next block will contain no range for it:

try {
    long i_2 = _1;
    i_3 = i_2 + foo(i_2);
    i_4 = i_3 + foo(i_3);
    i_5 = i_4 + foo(i_4);
    i_6 = i_5 + foo(i_5);
    ... repeat N times ...
    alloca(i_6); // query range of i_6
Once i_2 is 'dead' after the first call to foo(), it will no longer be
in the on-entry cache of any other block.
The walk back never sees i_2 until the first call to foo(), so none of
the blocks after that need its range, and it will not be in their caches.
The on-entry cache will only get a single range entry for each of those
blocks: the incoming value of i_x, and that's it. The implicitly known
live-on-exit attribute is handy here.
Now, that's not to say we cant construct cases where there is a lot of
ranges being populated.. If we find the cache is really a problem, we
can start caching ranges.. so that if all these i_x's were live somehow,
there would only be one range for each in this case, and the cache's
would simply all point to the same range.
so far we havent really run across anything that appears to trigger
significant concerns.
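As a minimal sketch of the idea described above (hypothetical data
structures, not the actual GCC cache classes): an on-entry cache that
only ever stores ranges for names that are live on entry to a block
never accumulates entries for the dead i_x names, which is what keeps
the walk linear.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>

struct Range { long lo, hi; };

struct OnEntryCache {
  // (bb, ssa-name) -> range; dead names are simply never stored.
  std::map<std::pair<int, int>, Range> cache;
  // bb -> set of names live on entry to that block.
  std::map<int, std::set<int> > live_on_entry;

  // Record a range only if NAME is live on entry to BB.
  void set (int bb, int name, Range r) {
    if (live_on_entry[bb].count (name))
      cache[std::make_pair (bb, name)] = r;
  }

  // Retrieve the cached on-entry range, if any.
  bool get (int bb, int name, Range &r) {
    std::map<std::pair<int, int>, Range>::iterator it
      = cache.find (std::make_pair (bb, name));
    if (it == cache.end ())
      return false;
    r = it->second;
    return true;
  }
};
```

The point of the sketch is only the filtering in set(): a query for a
name that died earlier costs nothing in every downstream block.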
>
> If you actually try you probably will have to find a persisting
> stmt that some user of on-demand query looks for.
> You converted some of the warning passes so choose
> a stmt that triggers it from there.
>
> Note we've run into "interesting" testcases mostly from
> machine generated code which we still want to be able
> to optimize at -O1 at least with reasonable use of
> time and memory (and quadraticnesses are to be avoided
> like a plague - not that we don't have some).
>
>>> I think thats the
>>> most common value, so that should reduce a large number of them. I've
>>> also considering caching ranges like we cache tree constants... but I
>>> haven't delved into that. I figured if memory turns out to be a
>>> problem, then we'll look at it then.
>>>
>>> Andrew
>>>
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-06-04 16:50 ` Andrew MacLeod
@ 2019-06-04 17:07 ` Richard Biener
2019-06-04 19:55 ` Andrew MacLeod
0 siblings, 1 reply; 21+ messages in thread
From: Richard Biener @ 2019-06-04 17:07 UTC (permalink / raw)
To: Andrew MacLeod; +Cc: GCC, Jeff Law, Aldy Hernandez
On June 4, 2019 6:50:07 PM GMT+02:00, Andrew MacLeod <amacleod@redhat.com> wrote:
>On 6/4/19 11:37 AM, Richard Biener wrote:
>> On Tue, Jun 4, 2019 at 5:26 PM Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>>
>>> But you still have a reference to the range in evry BB dominated by
>the
>>> definition?
>> Btw, I was thinking of
>>
>> unsigned char foo(long);
>> void baz(unsigned char c)
>> {
>> try{
>> long i = c;
>> i += foo(i);
>> i += foo(i);
>> i += foo(i);
>> i += foo(i);
>> ... repeat N times ...
>> alloca(i); // query range of i
>> }
>> catch (...)
>> {
>> }
>> }
>>
>>
>> where the single(!) query of the range of i on the alloca call
>> will populate N BBs caches with the Nth having ranges for
>> all SSA defs of i running into quadraticness in N.
>well, not really. its still linear since the 'i' used in each foo()
>will be dead, so the next block will contain no range for it
>
>
> try{
> long i_2 = _1;
> i_3 += foo(i_2);
> i_4 += foo(i_3);
> i_5 += foo(i_4);
> i_6 += foo(i_5);
>... repeat N times ...
> alloca(i_6); // query range of i
>
>once i_2 is 'dead' after the first call to foo(), it will no longer be
>in the on-entry cache to any other block.
>The walk back never see's i_2 until the first call to foo(), so none of
It's actually
utemp_2 = foo(i_1);
BB:
i_3 = i_1 + (long) utemp_2;
utemp_4 = foo(i_3);
eliding the separate stmt for the conversion.  From your description of the cache it sounded like getting incoming values is a take-all operation.  Otherwise you'd need negative entries as well (I thought the cache might not contain varying ranges, for example).
>blocks after that have a need for its range, so it will not in their
>caches.
>
>The on-entry cache will only get a single range entry for each of those
>
>blocks.. the incoming value of i_x and thats it. The implicitly known
>live-on-exit attribute is handy here.
>
>Now, that's not to say we cant construct cases where there is a lot of
>ranges being populated.. If we find the cache is really a problem, we
>can start caching ranges.. so that if all these i_x's were live
>somehow,
>there would only be one range for each in this case, and the cache's
>would simply all point to the same range.
I suppose using i in the catch block would create something like this, a PHI with a large in-degree, and thus the switch case you already know about?
>so far we havent really run across anything that appears to trigger
>significant concerns.
Sure.  It's still worth thinking about worst-case behavior and how you'd deal with it, given that once the ranger is actually used we will eventually run into these cases.  When
a forward EVRP walk is faster than a single on-demand query, that would be concerning.
Richard.
>
>
>>
>> If you actually try you probably will have to find a persisting
>> stmt that some user of on-demand query looks for.
>> You converted some of the warning passes so choose
>> a stmt that triggers it from there.
>>
>> Note we've run into "interesting" testcases mostly from
>> machine generated code which we still want to be able
>> to optimize at -O1 at least with reasonable use of
>> time and memory (and quadraticnesses are to be avoided
>> like a plague - not that we don't have some).
>>
>>>> I think thats the
>>>> most common value, so that should reduce a large number of them.
>I've
>>>> also considering caching ranges like we cache tree constants... but
>I
>>>> haven't delved into that. I figured if memory turns out to be a
>>>> problem, then we'll look at it then.
>>>>
>>>> Andrew
>>>>
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-06-04 17:07 ` Richard Biener
@ 2019-06-04 19:55 ` Andrew MacLeod
2019-06-05 9:59 ` Richard Biener
0 siblings, 1 reply; 21+ messages in thread
From: Andrew MacLeod @ 2019-06-04 19:55 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC, Jeff Law, Aldy Hernandez
On 6/4/19 1:07 PM, Richard Biener wrote:
> On June 4, 2019 6:50:07 PM GMT+02:00, Andrew MacLeod <amacleod@redhat.com> wrote:
>> On 6/4/19 11:37 AM, Richard Biener wrote:
>>
>>> where the single(!) query of the range of i on the alloca call
>>> will populate N BBs caches with the Nth having ranges for
>>> all SSA defs of i running into quadraticness in N.
>> well, not really.  its still linear since the 'i' used in each foo()
>> will be dead, so the next block will contain no range for it
>>
>>
>> try{
>> long i_2 = _1;
>> i_3 += foo(i_2);
>> i_4 += foo(i_3);
>> i_5 += foo(i_4);
>> i_6 += foo(i_5);
>> ... repeat N times ...
>> alloca(i_6); // query range of i
>>
>> once i_2 is 'dead' after the first call to foo(), it will no longer be
>> in the on-entry cache to any other block.
>> The walk back never see's i_2 until the first call to foo(), so none of
> It's actually
>
> Utemp_2 = foo(i_1);
>
> Bb:
> i_3 = i_1 + (long) utemp_2;
> Utemp_4 = foo(i_3);
>
> Eliding the separate stmt for the conversion. From you description of the cache it sounded like getting incoming values is a take-all operation. Otherwise you'd need negative entries as well (I thought the cache might not contain varying ranges for example).
OK, utemp_2 and i_1 will be live, so there are 2 ranges on entry.  The
next block will have only utemp_4 and i_3 live on entry, so it'll be 2
ranges per block.
It currently does have varying in the table, but the varying value is
cached once for the entire ssa-name in the cache, so 500 BBs with a
varying range will only consume a single range.
By negative entries I assume you mean something like the bottom of a
diamond?
if (x_3 > 20)
  blah1 (x_3);
else
  blah2 (x_3);
blah3 (x_3);
It does nothing special to come up with the range of x_3 at blah3 ().
It simply unions all the incoming ranges, which is [21,MAX] union
[MIN,20], to produce the value [MIN, MAX], which is then put into the
cache.  We don't use anything special to represent varying.
varying_p() merely checks if the range is [MIN, MAX].  And when putting
a value into the cache, if it is varying_p() the cache simply uses its
unique copy of [MIN, MAX] rather than creating a new one every time it
is needed.
I had originally considered not caching ranges spanning blocks in which
the range never changes... the trade-off is an increase in calculation
time as you walk the CFG to find the last block in which it did change.
First I figured we'd see whether the less error-prone full cache causes
any real problems or not.  It's always an option to implement later,
since the cache is self-contained and can resolve its queries
internally however it wants.
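The shared-varying trick described above amounts to something like this
sketch (hypothetical Range type; the real cache classes differ):

```cpp
#include <cassert>
#include <climits>

struct Range { long lo, hi; };

// varying_p() merely checks whether the range is [MIN, MAX].
static bool varying_p (const Range &r)
{
  return r.lo == LONG_MIN && r.hi == LONG_MAX;
}

// One unique copy of [MIN, MAX]; the cache hands this out for every
// varying entry instead of allocating a fresh range each time, so
// 500 blocks with a varying range share a single object.
static const Range varying_range = { LONG_MIN, LONG_MAX };

static const Range *cache_entry (const Range *r)
{
  return varying_p (*r) ? &varying_range : r;
}
```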
>
>> blocks after that have a need for its range, so it will not in their
>> caches.
>>
>> The on-entry cache will only get a single range entry for each of those
>>
>> blocks.. the incoming value of i_x and thats it. The implicitly known
>> live-on-exit attribute is handy here.
>>
>> Now, that's not to say we cant construct cases where there is a lot of
>> ranges being populated.. If we find the cache is really a problem, we
>> can start caching ranges.. so that if all these i_x's were live
>> somehow,
>> there would only be one range for each in this case, and the cache's
>> would simply all point to the same range.
> I suppose using i in the catch block would create sth like this, a phi with a large in degree and thus the switch case you already know about?
It would be a PHI.  None of the ranges coming into the block are
considered live on entry, so they don't get into the cache.  The PHI
calculation asks for the incoming range on the edge for each argument,
and we get a global range for the PHI definition calculated, and that is
it.
However, that's not actually the switch problem :-)  The switch problem
isn't due to a large degree of incoming edges, but rather the difficulty
in calculating case ranges on the outgoing edges from the switch itself.
Stepping back slightly, branches are handled generically the same way
any expression is: the outgoing edge you want information about provides
a constant range for the implicit LHS of the branch.
So for an if:
  c_2 = x_3 > 20
  if (c_2 != 0)
the TRUE edge has a range of bool [1,1], fed back as the implicit LHS
of the branch expression:
[1,1] = (c_2 != 0),  which range-ops for '!=' says c_2 must be [1,1] in
order to satisfy the equation.
[1,1] = x_3 > 20,  which range-ops for '>' says x_3 must have a range of
[21, MAX].
If you want the range on the FALSE edge, the edge starts with the range
[0,0] into the equation solver, and out pops [MIN, 20].
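The backwards solve for '>' described above can be sketched concretely
(a hypothetical standalone function; the real range-ops classes are far
more general and work on arbitrary ranges, not just a boolean constant):

```cpp
#include <cassert>
#include <climits>

struct Range { long lo, hi; };

// Solve for x in  [b,b] = (x > c),  i.e. run the '>' operation
// "backwards" from a known boolean LHS on an outgoing edge.
static Range op1_range_gt (bool lhs, long c)
{
  if (lhs)
    // TRUE edge: [1,1] = x > c  implies  x in [c+1, MAX].
    return (Range) { c + 1, LONG_MAX };
  // FALSE edge: [0,0] = x > c  implies  x in [MIN, c].
  return (Range) { LONG_MIN, c };
}
```

Feeding [1,1] for the TRUE edge of `if (x_3 > 20)` yields [21, MAX];
feeding [0,0] for the FALSE edge yields [MIN, 20], matching the text.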
switch (i_3)
For a switch, it works the same way, except the edge has an integral
value, not a boolean one.  This range is fed back into the implicit
branch expression:
[edge range] = i_3,  so i_3 has whatever value the edge has, since it
amounts to a copy.  Calculating this edge constant value is the problem.
BB2:
switch (i_3) {
BB3:
  case 4:
  case 5:
  case 9:
  case 10:
    foo (i_3)
In order to evaluate the case values on the edge from BB2 to BB3 as
[4,5][9,10], we have to loop over all the cases that have a label at
the beginning of the block and union them together.  Calculating the
default value is even worse: start with varying and intersect out each
and every case.
The gimple representation does not make evaluating this easy nor cheap,
since it merely has a list of case LOW/HIGHs with labels... so you have
to loop through that entire list for each edge being evaluated, to see
if the label is associated with this edge, and then union together all
those that match.  This is linear and can't be shortcut.  Multiply it
by the number of edges, since you probably want a range on each edge,
and it's suddenly quadratic.  The result is that it can be *very* time
consuming if the switch is very, very large.
And it's a constant value that never changes, so it really could be
cheap.  We've just never needed to ask this exact question before.
We've considered multiple ways to address this, but we won't worry
about any of them just yet.  All in time :-)
>
>> so far we havent really run across anything that appears to trigger
>> significant concerns.
> Sure. It's still worth thinking about worst case behavior and how you'd deal with that given once ranger is actually used we will eventually run into these cases. When
> A forward evrp walk is then faster than a single on demand query that would be concerning.
>
> Richard.
Absolutely.  I've been concerned about the worst case since the get-go.
That's how it evolved to this design.  We're trying to make sure the
worst-case scenario is no worse than doing a normal walk.
We can currently run the ranger version of RVRP across blocks in
dominator order, reverse dominator order, linearly BB1 -> BBN, and
finally BBN -> BB1.  There are fluctuations in timings, but nothing of
too much significance.  They all run in similar times and pass all the
tests.
I have thoughts on how the various pieces can fit together with EVRP;
I'll stick that in another email.
Andrew
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-05-31 15:40 ` Andrew MacLeod
2019-05-31 18:16 ` Jeff Law
2019-06-04 15:27 ` Richard Biener
@ 2019-06-04 20:04 ` Martin Sebor
2019-06-04 20:22 ` Marc Glisse
2 siblings, 1 reply; 21+ messages in thread
From: Martin Sebor @ 2019-06-04 20:04 UTC (permalink / raw)
To: Andrew MacLeod, Richard Biener; +Cc: GCC, Jeff Law, Aldy Hernandez
On 5/31/19 9:40 AM, Andrew MacLeod wrote:
> On 5/29/19 7:15 AM, Richard Biener wrote:
>> On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod <amacleod@redhat.com>
>> wrote:
>>> On 5/27/19 9:02 AM, Richard Biener wrote:
>>>> On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod <amacleod@redhat.com>
>>>> wrote:
>>>>>> The above suggests that iff this is done at all it is not in GORI
>>>>>> because
>>>>>> those are not conditional stmts or ranges from feeding those. The
>>>>>> machinery doing the use-def walking from stmt context also cannot
>>>>>> come along these so I have the suspicion that Ranger cannot handle
>>>>>> telling us that for the stmt following above, for example
>>>>>>
>>>>>>    if (_5 != 0)
>>>>>>
>>>>>> that _5 is not zero?
>>>>>>
>>>>>> Can you clarify?
>>>>> So there are 2 aspects to this.   the range-ops code for DIV_EXPR, if
>>>>> asked for the range of op2 () would return ~[0,0] for _5.
>>>>> But you are also correct in that the walk backwards would not find
>>>>> this.
>>>>>
>>>>> This is similar functionality to how null_derefs are currently
>>>>> handled,
>>>>> and in fact could probably be done simultaneously using the same code
>>>>> base.  I didn't bring null derefs up, but this is a good time :-)
>>>>>
>>>>> There is a separate class used by the gori-cache which tracks the
>>>>> non-nullness property at the block level.   It has a single API:
>>>>> non_null_deref_p (name, bb)    which determines whether there is a
>>>>> dereference in any BB for NAME, which indicates whether the range
>>>>> has an
>>>>> implicit ~[0,0] range in that basic block or not.
>>>> So when we then have
>>>>
>>>>    _1 = *_2; // after this _2 is non-NULL
>>>>    _3 = _1 + 1; // _3 is non-NULL
>>>>    _4 = *_3;
>>>> ...
>>>>
>>>> when a on-demand user asks whether _3 is non-NULL at the
>>>> point of _4 = *_3 we don't have this information? Since the
>>>> per-BB caching will only say _1 is non-NULL after the BB.
>>>> I'm also not sure whether _3 ever gets non-NULL during
>>>> non-NULL processing of the block since walking immediate uses
>>>> doesn't really help here?
>>> presumably _3 is globally non-null due to the definition being (pointer
>>> + x) ... i.e., _3 has a global range of ~[0,0] ?
>> No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and
>> you cannot arrive at NULL by pointer arithmetic from a non-NULL pointer.
>
> I'm confused.
>
> _1 was loaded from _2 (thus asserting _2 is non-NULL).  But we have no
> idea what the range of _1 is, so how do you assert _1 is ~[0,0] ?
> The only way I see to determine _3 is non-NULL is through the _4 = *_3
> statement.
In the first two statements from the above (where _1 is a pointer):
_1 = *_2;
_3 = _1 + 1;
_1 must be non-null because C/C++ define pointer addition only for
non-null pointers, and therefore so must _3.
Or does the middle-end allow arithmetic on null pointers?
Martin
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-06-04 20:04 ` Martin Sebor
@ 2019-06-04 20:22 ` Marc Glisse
2019-06-04 21:40 ` Andrew MacLeod
0 siblings, 1 reply; 21+ messages in thread
From: Marc Glisse @ 2019-06-04 20:22 UTC (permalink / raw)
To: Martin Sebor
Cc: Andrew MacLeod, Richard Biener, GCC, Jeff Law, Aldy Hernandez
On Tue, 4 Jun 2019, Martin Sebor wrote:
> On 5/31/19 9:40 AM, Andrew MacLeod wrote:
>> On 5/29/19 7:15 AM, Richard Biener wrote:
>>> On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod <amacleod@redhat.com>
>>> wrote:
>>>> On 5/27/19 9:02 AM, Richard Biener wrote:
>>>>> On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod <amacleod@redhat.com>
>>>>> wrote:
>>>>>>> The above suggests that iff this is done at all it is not in GORI
>>>>>>> because
>>>>>>> those are not conditional stmts or ranges from feeding those. The
>>>>>>> machinery doing the use-def walking from stmt context also cannot
>>>>>>> come along these so I have the suspicion that Ranger cannot handle
>>>>>>> telling us that for the stmt following above, for example
>>>>>>>
>>>>>>>    if (_5 != 0)
>>>>>>>
>>>>>>> that _5 is not zero?
>>>>>>>
>>>>>>> Can you clarify?
>>>>>> So there are 2 aspects to this.   the range-ops code for DIV_EXPR, if
>>>>>> asked for the range of op2 () would return ~[0,0] for _5.
>>>>>> But you are also correct in that the walk backwards would not find
>>>>>> this.
>>>>>>
>>>>>> This is similar functionality to how null_derefs are currently handled,
>>>>>> and in fact could probably be done simultaneously using the same code
>>>>>> base.  I didn't bring null derefs up, but this is a good time :-)
>>>>>>
>>>>>> There is a separate class used by the gori-cache which tracks the
>>>>>> non-nullness property at the block level.   It has a single API:
>>>>>> non_null_deref_p (name, bb)    which determines whether there is a
>>>>>> dereference in any BB for NAME, which indicates whether the range has
>>>>>> an
>>>>>> implicit ~[0,0] range in that basic block or not.
>>>>> So when we then have
>>>>>
>>>>>    _1 = *_2; // after this _2 is non-NULL
>>>>>    _3 = _1 + 1; // _3 is non-NULL
>>>>>    _4 = *_3;
>>>>> ...
>>>>>
>>>>> when a on-demand user asks whether _3 is non-NULL at the
>>>>> point of _4 = *_3 we don't have this information? Since the
>>>>> per-BB caching will only say _1 is non-NULL after the BB.
>>>>> I'm also not sure whether _3 ever gets non-NULL during
>>>>> non-NULL processing of the block since walking immediate uses
>>>>> doesn't really help here?
>>>> presumably _3 is globally non-null due to the definition being (pointer
>>>> + x) ... i.e., _3 has a global range of ~[0,0] ?
>>> No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and
>>> you cannot arrive at NULL by pointer arithmetic from a non-NULL pointer.
>>
>> I'm confused.
>>
>> _1 was loaded from _2 (thus asserting _2 is non-NULL).  But we have no idea
>> what the range of _1 is, so how do you assert _1 is ~[0,0] ?
>> The only way I see to determine _3 is non-NULL is through the _4 = *_3
>> statement.
>
> In the first two statements from the above (where _1 is a pointer):
>
> _1 = *_2;
> _3 = _1 + 1;
>
> _1 must be non-null because C/C++ define pointer addition only for
> non-null pointers, and therefore so must _3.
(int*)0+0 is well-defined, so this uses the fact that 1 is non-null. This
is all well done in extract_range_from_binary_expr already, although it
seems to miss the (dangerous) optimization NULL + unknown == NULL.
Just in case, a quote:
"When an expression J that has integral type is added to or subtracted
from an expression P of pointer type, the result has the type of P.
(4.1) — If P evaluates to a null pointer value and J evaluates to 0, the
result is a null pointer value.
(4.2) — Otherwise, if P points to element x[i] of an array object x with
n elements, 80 the expressions P + J and J + P (where J has the value j)
point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n
and the expression P - J points to the (possibly-hypothetical) element
x[i − j] if 0 ≤ i − j ≤ n.
(4.3) — Otherwise, the behavior is undefined"
> Or does the middle-end allow arithmetic on null pointers?
When people use -fno-delete-null-pointer-checks because their (embedded)
platform has important stuff at address 0, they also want to be able to do
arithmetic there.
--
Marc Glisse
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-06-04 20:22 ` Marc Glisse
@ 2019-06-04 21:40 ` Andrew MacLeod
0 siblings, 0 replies; 21+ messages in thread
From: Andrew MacLeod @ 2019-06-04 21:40 UTC (permalink / raw)
To: gcc, Martin Sebor; +Cc: Richard Biener, Jeff Law, Aldy Hernandez
On 6/4/19 4:21 PM, Marc Glisse wrote:
> On Tue, 4 Jun 2019, Martin Sebor wrote:
>
>> On 5/31/19 9:40 AM, Andrew MacLeod wrote:
>>> On 5/29/19 7:15 AM, Richard Biener wrote:
>>>> On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod
>>>> <amacleod@redhat.com> wrote:
>>>>> On 5/27/19 9:02 AM, Richard Biener wrote:
>>>>>> On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod
>>>>>> <amacleod@redhat.com> wrote:
>>>>>>>> The above suggests that iff this is done at all it is not in
>>>>>>>> GORI because
>>>>>>>> those are not conditional stmts or ranges from feeding those. The
>>>>>>>> machinery doing the use-def walking from stmt context also cannot
>>>>>>>> come along these so I have the suspicion that Ranger cannot handle
>>>>>>>> telling us that for the stmt following above, for example
>>>>>>>>
>>>>>>>>    if (_5 != 0)
>>>>>>>>
>>>>>>>> that _5 is not zero?
>>>>>>>>
>>>>>>>> Can you clarify?
>>>>>>> So there are 2 aspects to this.   the range-ops code for
>>>>>>> DIV_EXPR, if
>>>>>>> asked for the range of op2 () would return ~[0,0] for _5.
>>>>>>> But you are also correct in that the walk backwards would not
>>>>>>> find this.
>>>>>>>
>>>>>>> This is similar functionality to how null_derefs are currently
>>>>>>> handled,
>>>>>>> and in fact could probably be done simultaneously using the same
>>>>>>> code
>>>>>>> base.  I didn't bring null derefs up, but this is a good time :-)
>>>>>>>
>>>>>>> There is a separate class used by the gori-cache which tracks the
>>>>>>> non-nullness property at the block level.   It has a single API:
>>>>>>> non_null_deref_p (name, bb)    which determines whether there is a
>>>>>>> dereference in any BB for NAME, which indicates whether the
>>>>>>> range has an
>>>>>>> implicit ~[0,0] range in that basic block or not.
>>>>>> So when we then have
>>>>>>
>>>>>>    _1 = *_2; // after this _2 is non-NULL
>>>>>>    _3 = _1 + 1; // _3 is non-NULL
>>>>>>    _4 = *_3;
>>>>>> ...
>>>>>>
>>>>>> when a on-demand user asks whether _3 is non-NULL at the
>>>>>> point of _4 = *_3 we don't have this information? Since the
>>>>>> per-BB caching will only say _1 is non-NULL after the BB.
>>>>>> I'm also not sure whether _3 ever gets non-NULL during
>>>>>> non-NULL processing of the block since walking immediate uses
>>>>>> doesn't really help here?
>>>>> presumably _3 is globally non-null due to the definition being
>>>>> (pointer
>>>>> + x) ... i.e., _3 has a global range of ~[0,0] ?
>>>> No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and
>>>> you cannot arrive at NULL by pointer arithmetic from a non-NULL
>>>> pointer.
>>>
>>> I'm confused.
>>>
>>> _1 was loaded from _2 (thus asserting _2 is non-NULL).  But we have
>>> no idea what the range of _1 is, so how do you assert _1 is ~[0,0] ?
>>> The only way I see to determine _3 is non-NULL is through the _4 =
>>> *_3 statement.
>>
>> In the first two statements from the above (where _1 is a pointer):
>>
>>  _1 = *_2;
>>  _3 = _1 + 1;
>>
>> _1 must be non-null because C/C++ define pointer addition only for
>> non-null pointers, and therefore so must _3.
>
> (int*)0+0 is well-defined, so this uses the fact that 1 is non-null.
> This is all well done in extract_range_from_binary_expr already,
> although it seems to miss the (dangerous) optimization NULL + unknown
> == NULL.
>
> Just in case, a quote:
>
> "When an expression J that has integral type is added to or subtracted
> from an expression P of pointer type, the result has the type of P.
> (4.1) — If P evaluates to a null pointer value and J evaluates to 0, the
> result is a null pointer value.
> (4.2) — Otherwise, if P points to element x[i] of an array object x with
> n elements, 80 the expressions P + J and J + P (where J has the value j)
> point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n
> and the expression P - J points to the (possibly-hypothetical) element
> x[i − j] if 0 ≤ i − j ≤ n.
> (4.3) — Otherwise, the behavior is undefined"
>
>> Or does the middle-end allow arithmetic on null pointers?
>
> When people use -fno-delete-null-pointer-checks because their
> (embedded) platform has important stuff at address 0, they also want
> to be able to do arithmetic there.
>
Plus, with the new range-ops code, we want to be able to solve for any
operand given ranges for the other 2, not just for the LHS given op1
and op2.  So exact details are good :-)
Andrew
* Re: On-Demand range technology [2/5] - Major Components : How it works
2019-06-04 19:55 ` Andrew MacLeod
@ 2019-06-05 9:59 ` Richard Biener
0 siblings, 0 replies; 21+ messages in thread
From: Richard Biener @ 2019-06-05 9:59 UTC (permalink / raw)
To: Andrew MacLeod; +Cc: GCC, Jeff Law, Aldy Hernandez
On Tue, Jun 4, 2019 at 9:55 PM Andrew MacLeod <amacleod@redhat.com> wrote:>
> On 6/4/19 1:07 PM, Richard Biener wrote:
> > On June 4, 2019 6:50:07 PM GMT+02:00, Andrew MacLeod <amacleod@redhat.com> wrote:
> >> On 6/4/19 11:37 AM, Richard Biener wrote:
> >>
> >>> where the single(!) query of the range of i on the alloca call
> >>> will populate N BBs caches with the Nth having ranges for
> >>> all SSA defs of i running into quadraticness in N.
> >> well, not really. its still linear since the 'i' used in each foo()
> >> will be dead, so the next block will contain no range for it
> >>
> >>
> >> try{
> >> long i_2 = _1;
> >> i_3 += foo(i_2);
> >> i_4 += foo(i_3);
> >> i_5 += foo(i_4);
> >> i_6 += foo(i_5);
> >> ... repeat N times ...
> >> alloca(i_6); // query range of i
> >>
> >> once i_2 is 'dead' after the first call to foo(), it will no longer be
> >> in the on-entry cache to any other block.
> >> The walk back never see's i_2 until the first call to foo(), so none of
> > It's actually
> >
> > Utemp_2 = foo(i_1);
> >
> > Bb:
> > i_3 = i_1 + (long) utemp_2;
> > Utemp_4 = foo(i_3);
> >
> > Eliding the separate stmt for the conversion. From you description of the cache it sounded like getting incoming values is a take-all operation. Otherwise you'd need negative entries as well (I thought the cache might not contain varying ranges for example).
>
> ok, Utemp2 and i_1 will be live, so there are 2 ranges on entry . the
> next block will have only utemp_4 and i_3 live on entry. So it'll be 2
> ranges per block.
>
>
> It currently does have varying in the table, but the varying value is
> cached for the entire ssa_name in the cache. so 500 BB's with varying
> range will only consume a single range.
>
> By negative entries I assume you mean something like the bottom of a
> diamond?
>
> if (x_3 > 20)
> blah1 (x_3)
> else
> blah 2(x)_3;
> blah3 (x_3);
>
> It does nothing special to come up with the range of x_3 at blah3 ().
> It simply unions all the incoming ranges which is [21,MAX] union
> [MIN,20] to produce the value [MIN, MAX] which is then put into the
> cache. We don't use anything special to represent varying.
> varying_p() merely checks if the range is [MIN, MAX]. And when putting
> a value into the cache, if it is varying_p() the cache simply uses it's
> unique copy of [MIN, MAX] rather than create a new one every time it is
> needed.
>
>
> I had originally considered not caching ranges spanning blocks in which
> the range never changes... the trade off is you have an increase in
> calculation time as you walk the CFG to find the last block in which it
> did change. First I figured we'd first see if the less error prone full
> cache causes any real problems or not. Its always an option to
> implement later since the cache is self contained and can resolve its
> queries internally however it wants.
>
> >
> >> blocks after that have a need for its range, so it will not in their
> >> caches.
> >>
> >> The on-entry cache will only get a single range entry for each of those
> >>
> >> blocks.. the incoming value of i_x and thats it. The implicitly known
> >> live-on-exit attribute is handy here.
> >>
> >> Now, that's not to say we cant construct cases where there is a lot of
> >> ranges being populated.. If we find the cache is really a problem, we
> >> can start caching ranges.. so that if all these i_x's were live
> >> somehow,
> >> there would only be one range for each in this case, and the cache's
> >> would simply all point to the same range.
> > I suppose using i in the catch block would create sth like this, a phi with a large in degree and thus the switch case you already know about?
>
> It would be a PHI. None of the ranges coming into the block are
> considered live on entry, so they dont get into the cache. The PHI
> calculation asks for the incoming range on the edge for each parameter,
> and we get a global range for the PHI definition calculated, and that is
> it.
Thanks for all the clarifications.
> However, Thats not actually the switch problem :-) the switch problem
> isn't due to a large degree of incoming edges, but rather the difficulty
> in calculating case ranges on the outgoing edges from the switch itself.
>
> stepping back slightly, Branches are handled generically the same way
> any expression is, the outgoing edge you want information about provides
> a constant range for the implicit LHS of the branch.
>
> so for an if
> c_2 = x_3 > 20
> if (c_2 != 0)
>
> the TRUE edge has a range of bool [1,1] , fed back as the implicit LHS
> of the branch expression
> [1,1] = (c_2 != 0) which range-ops for '!=' says c_2 must be [1,1] in
> order to satisfy the equation.
> [1,1] = x_3 > 20 which range-ops for '>' says x_3 must have a range of
> [21, MAX]
>
> if you want the range on the FALSE edge, the edge starts with the range
> [0,0] into the equation solver, and out pops [MIN, 20].
>
> switch (i_3)
>
> for a switch, it works the same way, except the edge has an integral
> value, not a boolean one. This range is fed back into the implicit
> branch expression:
> [edge range] = i_3 , so i_3 has whatever value the edge has since
> it amounts to a copy. Calculating this edge constant value is the problem.
>
> BB2:
> switch (i_3) {
> BB3:
> case 4:
> case 5:
> case 9:
> case 10:
> foo (i_3)
>
> In order to evaluate the case values on the edge from BB2 to BB3 as
> [4,5][9,10], we have to loop over all the cases that have a label at
> the beginning of the block, and union them together. Calculating the
> default value is even worse, start with varying and intersect out each
> and every case.
>
> The gimple representation does not make evaluating this easy nor cheap
> since it merely has a list of case LOW/HIGHS with labels... so you have
> to loop thru that entire list for each edge being evaluated to see if
> the label is associated with this edge and then union together all those
> that match.  This is linear and can't be shortcut.  Multiply it by the
> number of edges, since you probably want a range on each edge, and it's
> suddenly quadratic.  The result is that it can be *very* time consuming
> if the switch is very, very large.
>
> And it's a constant value that never changes, so it really could be
> cheap. We've just never needed to ask this exact question before. We've
> considered multiple ways to address this, but we wont worry about any
> of them just yet. All in time :-)
Hmm, shouldn't it be
  range edge_ranges[number_of_succs];
  for (int i = 0; i < gimple_switch_num_labels (s); ++i)
    {
      tree casel = gimple_switch_label (s, i);
      basic_block target = label_to_block (CASE_LABEL (casel));
      edge e = find_edge (gimple_bb (s), target); // OK, this needs to be optimized, linear
      union_range (edge_ranges[e->src_index], casel_range); // similarly, we don't have src_index
    }
where the missing detail is precomputing the find_edge + src index things
in a hash_map <dest-block-index, src-index> map. That's possible by
a simple linear walk of switch successor edges.
So computing the edge ranges is linear. The default case needs
special-casing, and its edge-range computation must be delayed until we
know the union of all case ranges.
But yes, I agree the GIMPLE switch representation isn't exactly helpful...
But I didn't figure out a good improvement yet. One possibility is to
do something similar to PHIs and split the case labels into groups
associated with edges, indexed by the index of the edge in the
src->succs vector (there's "conveniently" CASE_CHAIN which I think is
unused). Doing this requires CFG hook adjustments similar to how we
handle PHIs, since we have to adjust the edge-index-based case group
vector.
Btw, there's the function-level {start,end}_recording_cases
functionality, which gives you get_cases_for_edge; that is overall
still quadratic since it uses find_edge as in my pseudo-code above.
Re-implementing it to compute everything in advance would solve this.
Richard.
>
>
>
>
> >
> >> so far we havent really run across anything that appears to trigger
> >> significant concerns.
> > Sure. It's still worth thinking about worst-case behavior and how you'd deal with it, given that once the ranger is actually used we will eventually run into these cases. If a
> > forward EVRP walk is then faster than a single on-demand query, that would be concerning.
> >
> > Richard.
> Absolutely. I've been concerned about the worst case since the get-go.
> That's how it's evolved into this design. We're trying to make sure the
> worst-case scenario is no worse than doing a normal walk.
>
> We can currently run the ranger version of RVRP across blocks in
> dominator order, reverse dominator order, linearly BB1 -> BBN, and
> finally BBN -> BB1. There are fluctuations in timings, but nothing of
> much significance. They all run in similar times and pass all the
> tests.
>
> I have thoughts on how the various pieces can fit together with EVRP,
> I'll stick that in another email.
>
> Andrew
Thread overview: 21+ messages
2019-05-23 1:28 On-Demand range technology [2/5] - Major Components : How it works Andrew MacLeod
2019-05-23 12:55 ` Richard Biener
2019-05-24 15:50 ` Andrew MacLeod
2019-05-27 13:03 ` Richard Biener
2019-05-28 14:17 ` Andrew MacLeod
2019-05-29 11:16 ` Richard Biener
2019-05-31 15:40 ` Andrew MacLeod
2019-05-31 18:16 ` Jeff Law
2019-05-31 20:27 ` Andrew MacLeod
2019-05-31 22:00 ` Jeff Law
2019-05-31 23:08 ` Andrew MacLeod
2019-06-04 15:27 ` Richard Biener
2019-06-04 15:38 ` Richard Biener
2019-06-04 16:50 ` Andrew MacLeod
2019-06-04 17:07 ` Richard Biener
2019-06-04 19:55 ` Andrew MacLeod
2019-06-05 9:59 ` Richard Biener
2019-06-04 16:49 ` Andrew MacLeod
2019-06-04 20:04 ` Martin Sebor
2019-06-04 20:22 ` Marc Glisse
2019-06-04 21:40 ` Andrew MacLeod