Subject: On-Demand range technology [2/5] - Major Components : How it works
From: Andrew MacLeod @ 2019-05-23  1:28 UTC
  To: GCC; +Cc: Jeff Law, Aldy Hernandez

This note will talk about the 4 major components of the prototype and
explain how they work together.  I will be fairly light on detail, just
enough to give an overview; we can delve into whatever details are needed.
- Range-ops : Range operations at the statement level
- GORI - Generates Outgoing Range Info : Ranges generated by basic blocks
- GORI Cache - combining and caching basic-block range info
- Ranger - API and query control

1 * Range-ops
--------------------
The first major component is the centralized range operation database.
This is where range operations for tree-codes are implemented.  The
compiler currently has code which can fold ranges, but the new
mechanism is a general-purpose solver which can solve for other
operands.  If there are N input/output operands and we have ranges for
N-1 of them, it is often possible to derive the missing range, e.g.
     lhs = op1 + op2
The existing code in the compiler can solve for LHS given ranges for
OP1 and OP2.  This has been extended so that we can also sometimes
solve for either op1 or op2, e.g.
     [20,40] = op1 + [5, 10]
...can calculate that op1 has the range [10, 35].
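
To make the reverse solve concrete, here is a minimal standalone
sketch; the interval type and function name are invented for
illustration and are not the prototype's actual code:

#include <cstdio>

/* Toy closed interval standing in for a real value range.  */
struct interval { long lo, hi; };

/* Solve for op1 in "lhs = op1 + op2": op1 = lhs - op2, using
   interval arithmetic.  */
static interval
op1_range_plus (interval lhs, interval op2)
{
  return { lhs.lo - op2.hi, lhs.hi - op2.lo };
}

int
main ()
{
  /* [20,40] = op1 + [5,10]  ==>  op1 is [10, 35].  */
  interval op1 = op1_range_plus ({20, 40}, {5, 10});
  printf ("op1 = [%ld, %ld]\n", op1.lo, op1.hi);
  return 0;
}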

This ability to solve for the other operands allows us to calculate
ranges in the reverse of the order we are accustomed to, and is key to
enabling the on-demand range approach.
     a_2 = b_1 - 20
     if (a_2 < 40)
  A conditional jump has an implicit boolean LHS depending on which edge 
is taken. To evaluate ranges on the TRUE edge of the branch, the LHS is 
[1,1]. To calculate the range for a_2 we simply solve the equation:
     [1,1] = a_2 < 40
which provides the answer [0, 39] (this assumes a_2 is unsigned; a
signed type would yield [-INF, 39]).  Furthermore, substituting this
range for a_2 as the LHS of its definition statement:
     [0,39] = b_1 - 20
The same evaluation mechanism can calculate a result for b_1 on the
true edge as [20,59].  This is the key feature which allows the rest of
the algorithm to work in a general way.
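
The two reverse solves can be chained mechanically.  Below is a
standalone sketch of that chain, again with invented types and helpers
rather than the prototype's code, and assuming a_2 and b_1 are unsigned
as in the prose above:

#include <cstdio>

/* Toy unsigned interval standing in for an unsigned ssa-name's range.  */
struct urange { unsigned long lo, hi; };

/* Solve for op1 in "[1,1] = op1 < c" (the TRUE edge of the branch):
   op1 must be below the constant c.  */
static urange
op1_range_lt_true (unsigned long c)
{
  return { 0, c - 1 };
}

/* Solve for op1 in "lhs = op1 - c": op1 = lhs + c.  */
static urange
op1_range_minus_const (urange lhs, unsigned long c)
{
  return { lhs.lo + c, lhs.hi + c };
}

int
main ()
{
  /* TRUE edge of "if (a_2 < 40)":  [1,1] = a_2 < 40  ==>  a_2 is [0,39].  */
  urange a_2 = op1_range_lt_true (40);
  /* Substitute into a_2's definition:  [0,39] = b_1 - 20  ==>  b_1 is [20,59].  */
  urange b_1 = op1_range_minus_const (a_2, 20);
  printf ("a_2 = [%lu,%lu]  b_1 = [%lu,%lu]\n",
          a_2.lo, a_2.hi, b_1.lo, b_1.hi);
  return 0;
}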

All operations are driven from this component.  The only thing
required to add support for another tree code is to implement one or
more methods for it here, and the rest of the range mechanism will
simply work with it.
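
Structurally, each supported tree code gets a handler which provides
the forward fold plus the reverse solvers.  The sketch below is
illustrative only, with invented class and method names rather than the
prototype's actual hierarchy; it shows what such a handler might look
like for MINUS_EXPR:

/* Illustrative sketch only: one handler per tree code, each providing
   the forward fold plus the reverse solvers.  Names are invented.  */
struct interval { long lo, hi; };

class range_op_sketch
{
public:
  virtual ~range_op_sketch () = default;
  /* lhs = op1 OP op2 : compute lhs.  */
  virtual interval fold_range (interval op1, interval op2) const = 0;
  /* Solve for op1 given lhs and op2.  */
  virtual interval op1_range (interval lhs, interval op2) const = 0;
  /* Solve for op2 given lhs and op1.  */
  virtual interval op2_range (interval lhs, interval op1) const = 0;
};

/* Handler for MINUS_EXPR: lhs = op1 - op2.  */
class minus_op_sketch : public range_op_sketch
{
public:
  interval fold_range (interval op1, interval op2) const override
  { return { op1.lo - op2.hi, op1.hi - op2.lo }; }
  /* op1 = lhs + op2.  */
  interval op1_range (interval lhs, interval op2) const override
  { return { lhs.lo + op2.lo, lhs.hi + op2.hi }; }
  /* op2 = op1 - lhs.  */
  interval op2_range (interval lhs, interval op1) const override
  { return { op1.lo - lhs.hi, op1.hi - lhs.lo }; }
};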


2 * GORI
------------
The second component is the “Generates Outgoing Range Info” engine.  
This is a basic-block oriented component which determines what ssa-names 
have ranges created on outgoing edges, and how to calculate those ranges.

The last statement in the block is examined to see if it is a branch
which generates range info; if so, GORI determines which ssa-names in
the block can have ranges calculated.  It quickly walks the use-def
chains from the branch to find other definitions within the block that
can impact the branch and could also have their ranges calculated.
This summary is then cached for future use.

The GORI component also uses this summary info to perform the
calculations necessary to produce the outgoing range of any ssa-name
covered by the summary.  For example:
     c_3 = foo ()
     a_2 = b_1 - 20
     if (a_2 < 40)
The summary information indicates that b_1 and a_2 can have their
outgoing ranges calculated for this block, and GORI uses this cached
information to quickly calculate them when required.
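
As a rough standalone model of that summary walk (the toy types are
invented for illustration and are not GORI's data structures), starting
from the name used in the branch and following definitions that stay
within the block collects exactly a_2 and b_1, while c_3 is never
visited:

#include <cstdio>
#include <map>
#include <set>
#include <string>
#include <vector>

/* Toy statement: the name it defines and the ssa-names it uses.  */
struct toy_stmt
{
  std::string lhs;
  std::vector<std::string> uses;
};

/* Starting from the names used by the block's final branch, walk def
   chains that stay within the block and collect every name whose range
   the branch can expose on the outgoing edges.  */
static std::set<std::string>
exports_for_block (const std::map<std::string, toy_stmt> &defs_in_block,
                   const std::vector<std::string> &branch_uses)
{
  std::set<std::string> exports;
  std::vector<std::string> worklist = branch_uses;
  while (!worklist.empty ())
    {
      std::string name = worklist.back ();
      worklist.pop_back ();
      if (!exports.insert (name).second)
        continue;                       /* Already processed.  */
      auto it = defs_in_block.find (name);
      if (it == defs_in_block.end ())
        continue;                       /* Defined outside the block.  */
      for (const std::string &use : it->second.uses)
        worklist.push_back (use);
    }
  return exports;
}

int
main ()
{
  /* c_3 = foo ();  a_2 = b_1 - 20;  if (a_2 < 40)  */
  std::map<std::string, toy_stmt> defs
    = { { "c_3", { "c_3", {} } },
        { "a_2", { "a_2", { "b_1" } } } };
  for (const std::string &name : exports_for_block (defs, { "a_2" }))
    printf ("%s\n", name.c_str ());     /* Prints a_2 and b_1, not c_3.  */
  return 0;
}
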
The API consists of 2 basic methods, query and calculate:
  - bool has_edge_range_p (edge, name);
  - range outgoing_edge_range (edge, name);
If a query is made for any other ssa-name, has_edge_range_p simply
returns false: this block does not generate a range for that name.
This is a key rationale for the summary, so that we only spend time
processing names in blocks which actually generate ranges for them.


3 * GORI cache
---------------------
The third component builds on the basic block GORI component, and adds 
the ability to traverse the CFG and combine the various ranges provided 
by each basic block into a cache.

It provides both a global-range cache and a range-on-entry cache 
summarizing the range of an ssa_name on entry to the block.

The range-on-entry cache is self-filling and iteratively evaluates
back edges.  There is a single API entry point which asks for the range
on entry; if the data has not already been calculated and cached, it
spawns the following process to calculate it (a standalone sketch of
this fill follows the list):
     * Walk the CFG from the current block to the definition block, 
and/or any block which has range-on-entry information already 
calculated. These end points are the source blocks.
     * Any non-source blocks visited are added to a worklist to be updated.
     * Starting with the source blocks, push the known outgoing range 
from each block to each successor and update their live-on-entry values 
when they are visited.  If the incoming range changes, mark this block’s 
successors as requiring updating.
     * Continue iterating until no more updates are required.
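
Here is a deliberately simplified standalone model of that fill
(invented types; it tracks a single ssa-name only and omits details
such as intersecting an edge's range with the range carried through the
block):

#include <algorithm>
#include <cstdio>
#include <deque>
#include <map>
#include <utility>
#include <vector>

/* Toy interval; plain ints stand in for basic blocks.  */
struct interval
{
  long lo, hi;
  bool operator== (const interval &o) const
  { return lo == o.lo && hi == o.hi; }
};

static interval
range_union (interval a, interval b)
{
  return { std::min (a.lo, b.lo), std::max (a.hi, b.hi) };
}

struct cfg_sketch
{
  /* Successors of each block.  */
  std::map<int, std::vector<int>> succ;
  /* Range the name has on a specific edge (e.g. because the source
     block ends in a branch on it); other edges pass the value through.  */
  std::map<std::pair<int, int>, interval> edge_range;
};

/* Fill the range-on-entry cache for a single ssa-name, starting from
   its definition block SRC seeded with DEF_RANGE, and iterating until
   no on-entry range changes any more.  */
static std::map<int, interval>
fill_on_entry (const cfg_sketch &cfg, int src, interval def_range)
{
  std::map<int, interval> on_entry;
  std::map<int, interval> outgoing;   /* Value each block pushes out.  */
  std::deque<int> worklist = { src };
  outgoing[src] = def_range;

  while (!worklist.empty ())
    {
      int bb = worklist.front ();
      worklist.pop_front ();
      auto s = cfg.succ.find (bb);
      if (s == cfg.succ.end ())
        continue;
      for (int dest : s->second)
        {
          /* Outgoing range on this edge: refined by a recorded edge
             range if there is one, otherwise the block's value.  */
          interval out = outgoing[bb];
          auto e = cfg.edge_range.find ({bb, dest});
          if (e != cfg.edge_range.end ())
            out = e->second;

          /* Union into the successor's on-entry range.  Only a change
             requires revisiting the successor, which is what keeps
             back-edge iteration small.  */
          interval merged = on_entry.count (dest)
                            ? range_union (on_entry[dest], out) : out;
          if (!on_entry.count (dest) || !(merged == on_entry[dest]))
            {
              on_entry[dest] = merged;
              outgoing[dest] = merged;
              worklist.push_back (dest);
            }
        }
    }
  return on_entry;
}

int
main ()
{
  /* bb2 -> bb3 -> bb4, plus a back edge bb4 -> bb3.  The branch ending
     bb2 makes the name [0,39] on the edge into bb3.  */
  cfg_sketch cfg;
  cfg.succ[2] = { 3 };
  cfg.succ[3] = { 4 };
  cfg.succ[4] = { 3 };
  cfg.edge_range[{ 2, 3 }] = { 0, 39 };
  for (const auto &p : fill_on_entry (cfg, 2, { -100, 100 }))
    printf ("bb%d on-entry [%ld, %ld]\n", p.first, p.second.lo, p.second.hi);
  return 0;
}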

The ordering is planned by the ranger such that if there are no back
edges, it is typically a straightforward single pass.  When back edges
are involved, the amount of iterating is typically very small.  The
updating is restricted to a single ssa-name, meaning it doesn't get
into feedback loops with other names or PHIs, and it usually converges
very quickly.

It is important to note that this works exclusively with static ranges
of a single ssa-name at a time, i.e., ranges which are implicitly
exposed in the IL, and only for the name being examined.  The values
returned by these queries do not depend on changes in other ssa-names,
which is why the iteration process never gets out of control.

The ranger making the calls to fill this cache has a higher-level
overview, and requests these ranges in definition order so that any
ssa-names feeding the definition of the name whose cache is being
filled are resolved first, providing the best possible results the
first time.
     a_2 = b_1 - 20
     if (a_2 < 40)
If the range for a_2 is requested on the true side, the ranger will
first calculate the range of b_1 on entry to the block, then use this
to calculate the global range of a_2, and finally the outgoing range of
a_2 on the desired edge.

If at some later point, it is discovered that the incoming range of b_1 
has changed in such a way that it has an impact on the outgoing range of 
a_2, the iterative update process can be reapplied by the ranger to 
update the relevant cache entries. This is usually only required in 
cases where multiple ssa-names are affected by back edges and feed each 
other.

4 * Ranger
----------------
The final component is the Ranger, which provides a simple API through
which clients can make various simple requests (the shape of this
interface is sketched after the list):
     - Range of an ssa-name at a statement location
     - Range of the result of a stmt
     - Range of an ssa-name on an edge
     - Range of an ssa-name after the last statement in a block
     - Range of an ssa name on entry to a block
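
Sketched as an interface with invented names and placeholder types so
it stands alone (the prototype's actual class, method names, and GCC
types will differ), those five requests map to something like:

/* Placeholder types so the sketch is self-contained; in the compiler
   these would be ssa-names, gimple statements, edges, basic blocks and
   the real range class.  All names here are invented.  */
struct ssa_name_t {};
struct stmt_t {};
struct edge_t {};
struct block_t {};
struct interval { long lo, hi; };

class ranger_sketch
{
public:
  virtual ~ranger_sketch () = default;
  /* Range of an ssa-name at a statement location.  */
  virtual interval range_of_name_at_stmt (ssa_name_t, stmt_t) = 0;
  /* Range of the result of a statement.  */
  virtual interval range_of_stmt (stmt_t) = 0;
  /* Range of an ssa-name on an edge.  */
  virtual interval range_on_edge (ssa_name_t, edge_t) = 0;
  /* Range of an ssa-name after the last statement in a block.  */
  virtual interval range_on_exit (ssa_name_t, block_t) = 0;
  /* Range of an ssa-name on entry to a block.  */
  virtual interval range_on_entry (ssa_name_t, block_t) = 0;
};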

The ranger is responsible for processing the IL, walking use-def
chains, and coordinating the various calls into the GORI components to
pre-satisfy as many conditions as possible before any cache filling is
performed.  It is also responsible for triggering any additional
updates which may be required due to newly discovered ranges.  We in
fact don't do this yet, but it turns out to be rarely required.

Summary
---------------
All evaluations are done on-demand. If no queries are made, there is no 
cost. The Ranger has no prerequisites other than a valid IL and CFG. It 
does not need dominators, nor does it require any other changes within 
an optimization pass in order to be used.

When queries are made, only the minimum effort required to satisfy the
request is expended.  This is a big win for passes which only need
occasional range information, such as the warning passes.  Some of
these passes actually instantiate a new ranger each time they make a
request, as a demonstration of the low overhead.

For passes such as VRP which require complete range information, the
caching process prevents the same information from being calculated
more than once.  This means a similar amount of work is done with this
approach as with the traditional top-down approach currently in use,
except we also process the back edges and iteratively solve for them.

Comments and feedback always welcome!

Thanks,
Andrew
