Higher level RTL issues

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Higher level RTL issues
@ 2001-10-09 19:46 law
  2001-10-09 20:54 ` Diego Novillo
                   ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: law @ 2001-10-09 19:46 UTC (permalink / raw)
  To: gcc

I know folks have heard me mention the idea of a higher level RTL from
time to time.  I must apologize for not posting this earlier -- oh well,
life goes on.  I must also apologize as this isn't 100% complete, but
given that I haven't posted anything about it already and the fact that
I'm going on vacation tomorrow, I figured I'd at least get this info out
and let folks chew on it while I'm gone.

I don't think there's any significant disagreement that a higher level
IL would be useful.  What is a subject of debate is whether we have
a lower tree form, higher RTL form or some other form.  I think if we
look at the set of problems we want to solve we can get a good feel for
which of the 3 choices makes the most sense.

I'm looking at the new IL as a way to enable an SSA optimization path.  On
that optimization path I'm initially looking at the following optimizations:

	* Sparse conditional constant propagation (ssa-ccp.c)

	* Traditional dead code elimination (ssa-dce.c)

	* Aggressive dead code elimination (ssa-dce.c)

	* Register coalescing and renaming (ssa.c)

	* Dominator based value numbering (will eventually be in ssa.c)

Certainly there will be more optimizations in the future, but I think this
set gives us a good idea of the kinds of optimization problems I want to 
solve.

Let's first consider Tree SSA work Diego's doing vs our existing SSA
translator.  The SSA naming discipline in Diego's work is merely a
conceptual model -- we do not actually rewrite the tree structures to
conform to SSA naming rules.  The nice thing about this model is you
don't actually have to convert the IL into and out of SSA form.

That model works fine for a class of optimization problems, but breaks down
with any optimization which potentially has two SSA names for the same 
variable live at the same point in the program.  GVN and dominator based
value numbering can create situations where we have two SSA names live at
the same time.  It's probably true that other interesting optimizations
could create similar situations.  I believe this argues that either the
tree SSA needs to do rewriting or that tree SSA is not suitable for the
class of optimizations I'm trying to implement.

Now let's consider writing a new IL to sit between trees and RTL.  It can
certainly be done, no doubt about that.  However, that means we have to
come up with a new class of routines to analyze and manipulate this new IL
as well as translators in/out of this new IL.  It means we as developers have
to familiarize ourselves with a whole new IL.  It means that we can't use any
of the existing work in the SSA translator and optimization passes.  In short
I just don't think it makes a lot of sense.

So that leaves the 3rd alternative and my preferred solution -- a higher level
RTL.  The biggest benefits of this direction are that we already have code
which can analyze and manipulate RTL which would likely only need minor
extensions.  Similarly I believe the learning curve for us as developers is
a whole lot smaller since we're still working with RTL.

So, given the set of optimization problems I want to solve and the possible
solutions using a higher level RTL seems like the natural choice to me.

--

If we now switch gears and assume that we're going to have a higher level
RTL representation we have the interesting problem of defining the higher
level RTL representation and means by which we lower from the higher level
RTL representation into RTL as we know it today.

Given that I'm looking to have this new RTL layer be suitable for a class
of SSA optimizers, I've largely let the needs of those optimizers define
the properties of the new RTL layer.

As a mental model, we have a set of standard patterns that are defined
unconditionally for all targets.   Those patterns handle standard
operations -- data movement, arithmetic, logical, flow control, etc. 
The operands within those patterns have the property that registers and
constants are interchangable -- ie, if we have a register, we can 
replace it with a constant and the result is a valid insn that does not
need re-recognition.  Furthermore, we have a set of standard addressing
modes (TBD, but I'm considering reg, reg + disp, constant).  MEMs can
be replaced by registers and thus by constants under the same rules.
[ It's not clear where we're going to allow MEMs, but anywhere we allow
  a MEM will also allow a REG or constant. ]

There are no libcalls, subregs, cc0, etc which cause us so much trouble
in SSA form.

If we have this higher level RTL we have the interesting problem of 
lowering from the high level RTL to RTL as we know it today.  Clearly
we want to avoid having to rewrite each backend.  In an ideal world
we wouldn't have to twiddle backends at all, but I'm not sure if
that's feasible in practice.  So it is a goal to not have to twiddle
backends, but it is not an absolute requirement.

I believe we want to structure the lowering phase in the following
manner (many thanks to Richard for help in this space):

  a. First try to match the high level RTL to a pattern in the
     target's backend.  If we get a match, then no lowering was
     needed.

  b. See if the RTL matches a define_split in the backend, if so
     call the target backend's splitter to perform the lowering.
     The insns resulting from the split must match patterns in the
     target's backend.

  c. Extract the operator and its operands and call back into the
     expand_foo routines we currently have (expr.c optabs.c).  Those
     will use the target's define_expands and the generic lowering code we've
     already got in expr.c, optabs.c and friends to generate target
     specific RTL.

Once the lowering phase is started, the patterns for the higher level
RTL are no longer allowed to match.  

--

This design has come from my experiences in trying to build out the
fledgling SSA optimization path mentioned at the start of this message.

I've actually implemented this stuff by hand on one embedded target using
state variables and define_splits for lowering as a proof of concept.  It
worked well enough to build libgcc, newlib, libstdc++ and run the various
testsuites (including the proprietary testsuites Red Hat uses internally).

I've also had this up and running on IA-64 linux -- which included successful
bootstrapping, spec runs, etc.

Anyway, chew on it for a while and I'll join in the discussion when I return
from vacation.

jeff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-09 19:46 Higher level RTL issues law
@ 2001-10-09 20:54 ` Diego Novillo
  2001-10-09 21:27   ` Daniel Berlin
  2001-10-19 12:19   ` law
  2001-10-09 21:18 ` Daniel Berlin
  2001-10-13  2:27 ` Jan Hubicka
  2 siblings, 2 replies; 34+ messages in thread
From: Diego Novillo @ 2001-10-09 20:54 UTC (permalink / raw)
  To: law; +Cc: gcc

On Tue, 09 Oct 2001, Jeff Law wrote:

> I don't think there's any significant disagreement that a higher level
> IL would be useful.  What is a subject of debate is whether we have
> a lower tree form, higher RTL form or some other form.  I think if we
>
Or both.  The tree form is still too high-level for some
transformations.  We can still lower the tree representation
without actually creating a new IL.  This lowering would preserve
full symbolic and array information while simplifying the
expression trees to remove most of the language semantics
(side-effects, sequence points, etc).

I'm thinking of trees that ressemble three-address ILs.  They
would look just like trees to the tree-to-RTL pass and more
importantly they would provide a high-level IL that is more
language-independent than the current trees.  

> That model works fine for a class of optimization problems, but breaks down
> with any optimization which potentially has two SSA names for the same 
> variable live at the same point in the program.  GVN and dominator based
> value numbering can create situations where we have two SSA names live at
> the same time.  
>
Sorry, you missed me here.  Would you have an example?  I agree
that we might find the FUD chains representation restricted for
some optimizations, but I'm not sure I follow this particular
case.  We can at least support most/all interesting loop
transformations with FUD chains.

> Now let's consider writing a new IL to sit between trees and RTL.  It can
> certainly be done, no doubt about that.  However, that means we have to
> come up with a new class of routines to analyze and manipulate this new IL
> as well as translators in/out of this new IL.  It means we as developers have
> to familiarize ourselves with a whole new IL.  It means that we can't use any
> of the existing work in the SSA translator and optimization passes.  In short
> I just don't think it makes a lot of sense.
> 
Agreed, it needs to be an IL that we can evolve from an existing
one.  Either a lower-level tree representation as I outlined
above or a higher-level RTL, or both.  Each representation
supports orthogonal classes of transformations.  Anything closer
to the source is probably easier to do at some form of tree IL.


Diego.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-09 19:46 Higher level RTL issues law
  2001-10-09 20:54 ` Diego Novillo
@ 2001-10-09 21:18 ` Daniel Berlin
  2001-10-09 21:33   ` Diego Novillo
  2001-10-19 14:37   ` law
  2001-10-13  2:27 ` Jan Hubicka
  2 siblings, 2 replies; 34+ messages in thread
From: Daniel Berlin @ 2001-10-09 21:18 UTC (permalink / raw)
  To: law; +Cc: gcc

On Tuesday, October 9, 2001, at 10:48  PM, law@redhat.com wrote:

>
> I know folks have heard me mention the idea of a higher level RTL from
> time to time.  I must apologize for not posting this earlier -- oh well,
> life goes on.  I must also apologize as this isn't 100% complete, but
> given that I haven't posted anything about it already and the fact that
> I'm going on vacation tomorrow, I figured I'd at least get this info out
> and let folks chew on it while I'm gone.
>
> I don't think there's any significant disagreement that a higher level
> IL would be useful.
>  What is a subject of debate is whether we have
> a lower tree form, higher RTL form or some other form.  I think if we
> look at the set of problems we want to solve we can get a good feel for
> which of the 3 choices makes the most sense.
>
>
> I'm looking at the new IL as a way to enable an SSA optimization path.
Okey dokey.

>  On
> that optimization path I'm initially looking at the following 
> optimizations:
>
> 	* Sparse conditional constant propagation (ssa-ccp.c)
>
> 	* Traditional dead code elimination (ssa-dce.c)
>
> 	* Aggressive dead code elimination (ssa-dce.c)
>
> 	* Register coalescing and renaming (ssa.c)
>
> 	* Dominator based value numbering (will eventually be in ssa.c)
>
> Certainly there will be more optimizations in the future, but I think 
> this
> set gives us a good idea of the kinds of optimization problems I want to
> solve.
I don't think that these are representative of the type of problems we 
*should* be looking to solve.
These are pretty basic ssa optimizations, not something to look at when 
determining the type of IL you need.
If we only use these as the things to base requirements on, you'll never 
get much improvement in performance.
At least, as they exist now.
Why?
Because the main optimizations needed right now relate to memory and 
loops.
You can perform very interesting loop and memory optimizations in SSA 
form.
However, having simply a higher level RTL would not help them 
significantly (>5%, i'd guesstimate), unless you tackle the issues of 
array indices and memory.
So to not take this into account at all when designing an IR better for 
SSA (whether it be a higher level RTL or whatever), would be a shame.

>
> Let's first consider Tree SSA work Diego's doing vs our existing SSA
> translator.  The SSA naming discipline in Diego's work is merely a
> conceptual model -- we do not actually rewrite the tree structures to
> conform to SSA naming rules.  The nice thing about this model is you
> don't actually have to convert the IL into and out of SSA form.
>
> That model works fine for a class of optimization problems, but breaks 
> down
> with any optimization which potentially has two SSA names for the same
> variable live at the same point in the program.
>
>
Sorry, this isn't correrct.
Factored use-def chains are more powerful than traditional SSA form, 
it's an extended variant of SSA, because it can handle backwards 
problems as well.
To get the equivalent, you'd need to take traditional SSA form,and give 
unique names to uses as well.
Anything you can do in traditional SSA, you can do with fud chains. You 
can link defs->uses just as well as you can link the uses->defs.
It's absolutely trivial.
The point of FUD is to allow not doing phyiscal renaming (saving 
memory), and to give use-def chains as well, without the cost of the 
renaming, so you can solve backwards dataflow problems, as demand driven 
optimizations often require.
I.E. It allows you to do things like demand-driven constant propagation.
http://www.acm.org/pubs/articles/proceedings/sac/326619/p400-stoltz/p400-stoltz .
pdf

In short, use  def-use chains rather than use-def chains (we aren't 
currently creating them in tree-ssa, but this can be rectified very 
easily, i've already done it for SSA loop optimizations), and your 
problem is solved.

>   GVN and dominator based
> value numbering can create situations where we have two SSA names live 
> at
> the same time.
>  It's probably true that other interesting optimizations
> could create similar situations.  I believe this argues that either the
> tree SSA needs to do rewriting or that tree SSA is not suitable for the
> class of optimizations I'm trying to implement.
>
I believe you are incorrect on both counts.
It's trivial to extend the tree SSA to do what you want, and it 
*shouldn't* do rewriting
>
> jeff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-09 20:54 ` Diego Novillo
@ 2001-10-09 21:27   ` Daniel Berlin
  2001-10-09 21:49     ` Diego Novillo
  2001-10-19 14:39     ` law
  2001-10-19 12:19   ` law
  1 sibling, 2 replies; 34+ messages in thread
From: Daniel Berlin @ 2001-10-09 21:27 UTC (permalink / raw)
  To: Diego Novillo; +Cc: law, gcc

On Tuesday, October 9, 2001, at 11:53  PM, Diego Novillo wrote:

> On Tue, 09 Oct 2001, Jeff Law wrote:
>
>> I don't think there's any significant disagreement that a higher level
>> IL would be useful.  What is a subject of debate is whether we have
>> a lower tree form, higher RTL form or some other form.  I think if we
>>
> Or both.  The tree form is still too high-level for some
> transformations.  We can still lower the tree representation
> without actually creating a new IL.  This lowering would preserve
> full symbolic and array information while simplifying the
> expression trees to remove most of the language semantics
> (side-effects, sequence points, etc).
>
> I'm thinking of trees that ressemble three-address ILs.  They
> would look just like trees to the tree-to-RTL pass and more
> importantly they would provide a high-level IL that is more
> language-independent than the current trees.
This is a much better idea than a high level rtl, IMHO, having had to 
implement it for the vectorization stuff i did.

>
>> That model works fine for a class of optimization problems, but breaks 
>> down
>> with any optimization which potentially has two SSA names for the same
>> variable live at the same point in the program.  GVN and dominator 
>> based
>> value numbering can create situations where we have two SSA names live 
>> at
>> the same time.
>>
> Sorry, you missed me here.  Would you have an example?  I agree
> that we might find the FUD chains representation restricted for
> some optimizations, but I'm not sure I follow this particular
> case.  We can at least support most/all interesting loop
> transformations with FUD chains.

There can exist no optimization that can be performed in a renamed SSA 
form (with both *uses* and *defs* renamed to be unique) that can't be 
performed on a factored representation where names point to the reaching 
def/use.
They are provably equivalent.

The perceived difference could be caused by the fact that Wolfe's paper 
covers Factored UD chains, while traditional SSA is only renaming defs, 
giving you *explicit* def-use chains.

You can just as easily do factored DU chains, and have an SSA form with 
explicit use-def chains through renaming.
The main advantage to factored chains is that it doesn't require 
renaming.
This matters a heck of a lot more for uses than defs, since there are a 
lot more uses than defs.

There is a picture of what factored du chains would look like in wolfe's 
paper. It's just FUD with the pointers reversed (obviously).

Factored chains tend to use recursive algorithms (walking the chains for 
the variables), rather than iterative ones (walking the explicitly 
renamed variables), but you could do it either way.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-09 21:18 ` Daniel Berlin
@ 2001-10-09 21:33   ` Diego Novillo
  2001-10-19 14:37   ` law
  1 sibling, 0 replies; 34+ messages in thread
From: Diego Novillo @ 2001-10-09 21:33 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: law, gcc

On Wed, 10 Oct 2001, Daniel Berlin wrote:

> In short, use  def-use chains rather than use-def chains (we aren't 
> currently creating them in tree-ssa, but this can be rectified very 
>
Yes, we are.  We have both DU and UD chains now.


Diego.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-09 21:27   ` Daniel Berlin
@ 2001-10-09 21:49     ` Diego Novillo
  2001-10-09 22:23       ` Daniel Berlin
  2001-10-19 14:39     ` law
  1 sibling, 1 reply; 34+ messages in thread
From: Diego Novillo @ 2001-10-09 21:49 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: law, gcc

On Wed, 10 Oct 2001, Daniel Berlin wrote:

> 
> On Tuesday, October 9, 2001, at 11:53  PM, Diego Novillo wrote:
> 
> > On Tue, 09 Oct 2001, Jeff Law wrote:
> >
> >> That model works fine for a class of optimization problems, but breaks 
> >> down
> >> with any optimization which potentially has two SSA names for the same
> >> variable live at the same point in the program.  GVN and dominator 
> >> based
> >> value numbering can create situations where we have two SSA names live 
> >> at
> >> the same time.
> >>
> > Sorry, you missed me here.  Would you have an example?  I agree
> > that we might find the FUD chains representation restricted for
> > some optimizations, but I'm not sure I follow this particular
> > case.  We can at least support most/all interesting loop
> > transformations with FUD chains.
> 
> There can exist no optimization that can be performed in a renamed SSA 
> form (with both *uses* and *defs* renamed to be unique) that can't be 
> performed on a factored representation where names point to the reaching 
                                               ^^^^^
					       references
> def/use.
> They are provably equivalent.
> 
I think Jeff was referring to ease of use, rather than actual
limitation.  The FUD chains camp usually goes on and on about the
advantages of not blowing up your symbol table to hell and back,
and also they dislike the idea of the compiler actually
re-writing the code to do analysis.  The renaming camp, argues
that you have to jump through hoops to maintain the two views of
the program. Both POVs seem valid to me.

I never really decided which one I like best.  I do find the FUD
chains less intrusive, but I'm not sure they are so easy to deal
with once you start doing incremental updates of your SSA form.
I never dealt with incremental updates, I just re-ran SSA all
over again.  Then again, I never had to worry about compile-time
performance :)

> You can just as easily do factored DU chains, and have an SSA
> form with explicit use-def chains through renaming. The main
> advantage to factored chains is that it doesn't require
> renaming. This matters a heck of a lot more for uses than defs,
> since there are a lot more uses than defs.
> 
We already have def-use chains and pretty soon we will also have
def-def chains for non-killing definitions like array assignments
and pointer dereferences.


Diego.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-09 21:49     ` Diego Novillo
@ 2001-10-09 22:23       ` Daniel Berlin
  0 siblings, 0 replies; 34+ messages in thread
From: Daniel Berlin @ 2001-10-09 22:23 UTC (permalink / raw)
  To: Diego Novillo; +Cc: law, gcc

On Wednesday, October 10, 2001, at 12:48  AM, Diego Novillo wrote:

> On Wed, 10 Oct 2001, Daniel Berlin wrote:
>
>>
>> On Tuesday, October 9, 2001, at 11:53  PM, Diego Novillo wrote:
>>
>>> On Tue, 09 Oct 2001, Jeff Law wrote:
>>>
>>>> That model works fine for a class of optimization problems, but 
>>>> breaks
>>>> down
>>>> with any optimization which potentially has two SSA names for the 
>>>> same
>>>> variable live at the same point in the program.  GVN and dominator
>>>> based
>>>> value numbering can create situations where we have two SSA names 
>>>> live
>>>> at
>>>> the same time.
>>>>
>>> Sorry, you missed me here.  Would you have an example?  I agree
>>> that we might find the FUD chains representation restricted for
>>> some optimizations, but I'm not sure I follow this particular
>>> case.  We can at least support most/all interesting loop
>>> transformations with FUD chains.
>>
>> There can exist no optimization that can be performed in a renamed SSA
>> form (with both *uses* and *defs* renamed to be unique) that can't be
>> performed on a factored representation where names point to the 
>> reaching
>                                                ^^^^^
> 					       references
>> def/use.
>> They are provably equivalent.
>>
> I think Jeff was referring to ease of use, rather than actual
> limitation.
I'm not so sure, given what he said about value numbering, so i thought 
i'd just point it out. It certainly would not require a rewrite of the 
tree-ssa stuff.
And nascent has examples of most of the algorithms he wants to 
implement, implemented on factored representations, and they are 
comparable in implementation ease to the traditional versions.

> The FUD chains camp usually goes on and on about the
> advantages of not blowing up your symbol table to hell and back,
> and also they dislike the idea of the compiler actually
> re-writing the code to do analysis.
Yup.
>  The renaming camp, argues
> that you have to jump through hoops to maintain the two views of
> the program. Both POVs seem valid to me.
>
Same here.
> I never really decided which one I like best.  I do find the FUD
> chains less intrusive, but I'm not sure they are so easy to deal
> with once you start doing incremental updates of your SSA form.
It's just as easy to relink a bunch of pointers, as  it is to rename a 
bunch of variables.
At least, it should be.
It's a data structures problem more than anything else.
> I never dealt with incremental updates, I just re-ran SSA all
> over again.  Then again, I never had to worry about compile-time
> performance :)
>
:)
In reality, the real problem with the two camps is that the "best" 
representation depends on external factors.
For ease of algorithms, speed, and memory usage, fud is probably better 
for backwards problems, traditional for forwards problems.
However, using both is not really an option, because who the heck wants 
two ssa representations to maintain?  Even if you were wacky, and 
decided you were going to fake one from the other, in your nice C++ 
compiler than only walked the IR using iterators, you'd still have a 
mess (probably even more so).

It's around this point in trying to weigh each that i usually give up on 
SSA and move on to looking at sparse evaluation graphs.

Trying to argue which of factored vs explicit SSA forms is best is 
comparable to arguing whether sparse bitmaps are better than dense ones, 
or adjacency lists are better for graphs than an adjacency matrix.
It's rather silly trying to make generalizations about it.
However, for the explicit purpose of SSA on trees, since you deal mostly 
with recursive algorithms already, a factored representation makes more 
sense, since the algorithms used on them tend to be recursive as well.
For an RTL form, where you tend to walk insns iteratively, an explicit 
representation makes more sense, but if we implement optimizations 
needing renamed uses, it might use too much memory.
If for_each_rtx becomes the preferred walking method, than a factored 
representation would be right there too.
Unless the plan was to only do SSA at the new IL level (IE not do 
tree-ssa, not do rtl-ssa), in which case it becomes a case of looking 
into a crystal ball and using tea leaves.

>> You can just as easily do factored DU chains, and have an SSA
>> form with explicit use-def chains through renaming. The main
>> advantage to factored chains is that it doesn't require
>> renaming. This matters a heck of a lot more for uses than defs,
>> since there are a lot more uses than defs.
>>
> We already have def-use chains and pretty soon we will also have
> def-def chains for non-killing definitions like array assignments
> and pointer dereferences.
>
Cool.

>
> Diego.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-09 19:46 Higher level RTL issues law
  2001-10-09 20:54 ` Diego Novillo
  2001-10-09 21:18 ` Daniel Berlin
@ 2001-10-13  2:27 ` Jan Hubicka
  2001-10-19 12:03   ` law
  2 siblings, 1 reply; 34+ messages in thread
From: Jan Hubicka @ 2001-10-13  2:27 UTC (permalink / raw)
  To: law; +Cc: gcc

> 
> I know folks have heard me mention the idea of a higher level RTL from
> time to time.  I must apologize for not posting this earlier -- oh well,
> life goes on.  I must also apologize as this isn't 100% complete, but
> given that I haven't posted anything about it already and the fact that
> I'm going on vacation tomorrow, I figured I'd at least get this info out
> and let folks chew on it while I'm gone.
Nice to hear that there is progress on mid-RTL side.
Would be possible, after the vacation, to tell us more details about
the implementation (or even better commit it to the branch)?  
If I think about the concept, there are some parts that appears dificult
to me, such as how are the calls represented, or the string operations...

Honza

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-13  2:27 ` Jan Hubicka
@ 2001-10-19 12:03   ` law
  2001-10-21 10:40     ` Jan Hubicka
  2001-10-22 16:22     ` Joern Rennecke
  0 siblings, 2 replies; 34+ messages in thread
From: law @ 2001-10-19 12:03 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc

  In message < 20011013112552.E29652@atrey.karlin.mff.cuni.cz >you write:
  > > 
  > > I know folks have heard me mention the idea of a higher level RTL from
  > > time to time.  I must apologize for not posting this earlier -- oh well,
  > > life goes on.  I must also apologize as this isn't 100% complete, but
  > > given that I haven't posted anything about it already and the fact that
  > > I'm going on vacation tomorrow, I figured I'd at least get this info out
  > > and let folks chew on it while I'm gone.
  > Nice to hear that there is progress on mid-RTL side.
  > Would be possible, after the vacation, to tell us more details about
  > the implementation (or even better commit it to the branch)?  
The details of the proof of concept are pretty simple.  Remember, we don't
have the automatic lowering code, just proof of concepts at this point.

For the embedded target I have a series of state variables which indicate
whether or not we're working with generic RTL, lowering from generic to
target RTL or working with target RTL only.

The backend has a set of generic patterns which are only available when the
state variable indicates that we are working with generic RTL.

The backend has a set of splitters which are only available when lowering
from generic to target RTL.

The original backend's patterns (target specific) are basically unchanged.

The state variables are twiddled in rest_of_compilation and there's a wee
bit of code to call split_insns to actually call the lowering splitters.

Finally there's a small amount of code to scan the RTL chain looking for
RTL constructs that we can't currently deal with in the SSA path 
(libcalls, assignments to subregs, etc).  If none of the "bad" things are
found, then we automatically run the SSA path.

There's really not enough interesting bits for the proof of concept
embedded port to post (not to mention the details of the chip are still
under NDA).

For IA64 things are a bit more interesting.  The port works as-is.  Why?

Well, first, ssa-ccp isn't quite as aggressive as it could be in terms of
replacing registers with constants.  The cases where it is aggressive happen
to be the cases that ia64 handles without modifications (similarly for the
dominator optimizations).  Removing the restrictions in those optimizers is
trivial once we've got automatic lowering code.

Second, comparisons/jumps are in a reasonably well defined form (no cc0 for
example and patterns don't have extra uses/clobbers).

So all you'd really need to play with this stuff is the code which scans
the RTL for certain problem constructs and automatically uses the SSA path
if no such problem constructs are found.

  > If I think about the concept, there are some parts that appears dificult
  > to me, such as how are the calls represented, or the string operations...
Calls are one of the interesting issues.  There's a whole rats nest of things
that need to be improved, but none which are strictly required at this point.
ie, we can tackle how calls are represented independently of the rest of the
lowering work.

First, there should be a phase which unnests calls at the tree level.  Nested
calls make the code in calls.c a freaking nightmare (all the stack slot
saving crap).

Second, we want a model where arguments are attached to the call and eventually
lowered to assignments to stack slots and argument registers.  I'm not sure
where it makes sense to lower these assignments yet.  The same applies for
the return value.

Third, we want to delay exposure and lowering of libcalls.  libcalls sequences
are a disgusting hack and cause as many problems as they solve.  The tricky
part is that we need to unnest potential libcalls at the tree level so that
we avoid all the stack slot saving BS when we lower from a simple "div"
instruction to a __divsi3 libcall (as an example).

I haven't tacked string operations.  But my gut feeling is that we leave them
as simple string operations at the generic RTL stage, then either lower them
to the target specific string op, a series of simple insns, or a libcall
as part of the overall lowering process.

jeff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-09 20:54 ` Diego Novillo
  2001-10-09 21:27   ` Daniel Berlin
@ 2001-10-19 12:19   ` law
  1 sibling, 0 replies; 34+ messages in thread
From: law @ 2001-10-19 12:19 UTC (permalink / raw)
  To: Diego Novillo; +Cc: gcc

  In message < 20011009235348.A2183@tornado.cygnus.com >you write:
  > On Tue, 09 Oct 2001, Jeff Law wrote:
  > 
  > > I don't think there's any significant disagreement that a higher level
  > > IL would be useful.  What is a subject of debate is whether we have
  > > a lower tree form, higher RTL form or some other form.  I think if we
  > >
  > Or both.
Right.  Sorry if I wasn't explicit about my belief that there is a set of
problems that are best solved by Diego's approach and some that are best
solved by my approach.  Furthermore, I believe that Diego's code and my
code can peacefully co-exist in the compiler together.


  > We can still lower the tree representation without actually creating a
  > new IL.  This lowering would preserve full symbolic and array information
  > while simplifying the expression trees to remove most of the language
  > semantics (side-effects, sequence points, etc).
Right.  In fact one of the things I want to do (and touched upon briefly in
a previous message) is un-nesting of calls at the tree level and one day
have a more gradual lowering of memory addresses (the way we drop from,
ARRAY_REF, COMPONENT_REF, etc down to target specific addressing modes in 
tree->rtl translation right now is incredibly silly).  Similarly for how 
we handle any kind of partial-word/bitfield arithmetic (you pretend they
are full-word then lower to partial-word/bitfields later, my guess is at
least part of this will be best handled at the tree level).


  > I'm thinking of trees that ressemble three-address ILs.  They
  > would look just like trees to the tree-to-RTL pass and more
  > importantly they would provide a high-level IL that is more
  > language-independent than the current trees.  
Right.

  > > with any optimization which potentially has two SSA names for the same 
  > > variable live at the same point in the program.  GVN and dominator based
  > > value numbering can create situations where we have two SSA names live at
  > > the same time.  
  > >
  > Sorry, you missed me here.  Would you have an example? 
Yes.  Do you recall our phone conversation about the inner loop of compress
from several months ago?  When you apply value numbering (either GVN or 
dominator optimizations ala Morgan) on that loop in SSA form for the IA64 
port you'll find a redundant sign extension.

Removal of that sign extension causes two distinct SSA names for the same
original variable to be live at the same point and have different runtime
values.  Thus, we can not simply throw away the SSA names after removing
the useless sign extension.

I'll have to fire up my ia64 box and get you the RTL dumps which show what's
happening (it's supposed to get cold in SLC Sunday, so that might be a good
time to crank up the ia64 machine :-)

[ discussion of a whole new IL ... ]
  > Agreed, it needs to be an IL that we can evolve from an existing
  > one.  Either a lower-level tree representation as I outlined
  > above or a higher-level RTL, or both.  Each representation
  > supports orthogonal classes of transformations.  Anything closer
  > to the source is probably easier to do at some form of tree IL.
I think we're in violent agreement here :-)  I strongly feel that we'll
need both -- I just happen to have more of an interest in the RTL side
of things :-)


jeff


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-09 21:18 ` Daniel Berlin
  2001-10-09 21:33   ` Diego Novillo
@ 2001-10-19 14:37   ` law
  2001-10-19 15:53     ` Daniel Berlin
  1 sibling, 1 reply; 34+ messages in thread
From: law @ 2001-10-19 14:37 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc

  In message < CCB1DD56-BD35-11D5-867A-0030657B5340@cgsoftware.com >you write:
  > 
  > On Tuesday, October 9, 2001, at 10:48  PM, law@redhat.com wrote:
  > 
  > >
  > > I know folks have heard me mention the idea of a higher level RTL from
  > > time to time.  I must apologize for not posting this earlier -- oh well,
  > > life goes on.  I must also apologize as this isn't 100% complete, but
  > > given that I haven't posted anything about it already and the fact that
  > > I'm going on vacation tomorrow, I figured I'd at least get this info out
  > > and let folks chew on it while I'm gone.
  > >
  > > I don't think there's any significant disagreement that a higher level
  > > IL would be useful.
  > >  What is a subject of debate is whether we have
  > > a lower tree form, higher RTL form or some other form.  I think if we
  > > look at the set of problems we want to solve we can get a good feel for
  > > which of the 3 choices makes the most sense.
  > >
  > >
  > > I'm looking at the new IL as a way to enable an SSA optimization path.
  > Okey dokey.
  > 
  > >  On
  > > that optimization path I'm initially looking at the following 
  > > optimizations:
  > >
  > > 	* Sparse conditional constant propagation (ssa-ccp.c)
  > >
  > > 	* Traditional dead code elimination (ssa-dce.c)
  > >
  > > 	* Aggressive dead code elimination (ssa-dce.c)
  > >
  > > 	* Register coalescing and renaming (ssa.c)
  > >
  > > 	* Dominator based value numbering (will eventually be in ssa.c)
  > >
  > > Certainly there will be more optimizations in the future, but I think 
  > > this
  > > set gives us a good idea of the kinds of optimization problems I want to
  > > solve.
  > I don't think that these are representative of the type of problems we 
  > *should* be looking to solve.
  > These are pretty basic ssa optimizations, not something to look at when 
  > determining the type of IL you need.
I disagree strongly.  One of the goals of moving towards an SSA optimization
path is to slowly, but surely build an optimization path that is stronger,
more efficient and more understandable than the mess we have now (particularly
cse.c and loop.c).  The first step in that direction is to build a path which
can perform the same things that the old path can perform and show that the
new path is better across the various axis mentioned above.

By not means should the list I mentioned above be considered exhaustive, I
very much want to build a path that can be used to solve more interesting
problems than the ones mentioned above.

By attacking the problems noted above I believe we can incrementally build
a new optimizer path in GCC without completely rewriting the compiler from
the ground up (meaning there's actually a remote chance we'll do it).

  > If we only use these as the things to base requirements on, you'll never 
  > get much improvement in performance.
  > At least, as they exist now.
  > Why?
  > Because the main optimizations needed right now relate to memory and 
  > loops.
  > You can perform very interesting loop and memory optimizations in SSA 
  > form.
  > However, having simply a higher level RTL would not help them 
  > significantly (>5%, i'd guesstimate), unless you tackle the issues of 
  > array indices and memory.
  > So to not take this into account at all when designing an IR better for 
  > SSA (whether it be a higher level RTL or whatever), would be a shame.
The class of problems in this space is going to require major rethinking
at both the tree level and the RTL level.  It should be sufficient to say that
I have had extensive discussions with folks about what we need to do to
one day be able to do these kinds of optimizations.

  > Sorry, this isn't correrct.
  > Factored use-def chains are more powerful than traditional SSA form, 
  > it's an extended variant of SSA, because it can handle backwards 
  > problems as well.
  > To get the equivalent, you'd need to take traditional SSA form,and give 
  > unique names to uses as well.
  > Anything you can do in traditional SSA, you can do with fud chains. You 
  > can link defs->uses just as well as you can link the uses->defs.
  > It's absolutely trivial.
  > The point of FUD is to allow not doing phyiscal renaming (saving 
  > memory), and to give use-def chains as well, without the cost of the 
  > renaming, so you can solve backwards dataflow problems, as demand driven 
  > optimizations often require.
  > I.E. It allows you to do things like demand-driven constant propagation.
  > http://www.acm.org/pubs/articles/proceedings/sac/326619/p400-stoltz/p400-st
  > oltz.
I'll have to show you an example where you do need renaming to perform 
relatively simple redundancy elimination problems that are found by any
reasonable value numbering scheme.

jeff


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-09 21:27   ` Daniel Berlin
  2001-10-09 21:49     ` Diego Novillo
@ 2001-10-19 14:39     ` law
  1 sibling, 0 replies; 34+ messages in thread
From: law @ 2001-10-19 14:39 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: Diego Novillo, gcc

  In message < 1B5A94CB-BD37-11D5-867A-0030657B5340@cgsoftware.com >you write:
  > > I'm thinking of trees that ressemble three-address ILs.  They
  > > would look just like trees to the tree-to-RTL pass and more
  > > importantly they would provide a high-level IL that is more
  > > language-independent than the current trees.
  > This is a much better idea than a high level rtl, IMHO, having had to 
  > implement it for the vectorization stuff i did.
I don't disagree.  I never intended for the higher level RTL concepts to
solve all the vectorization problems.  I believe Diego, Richard and others
can probably back me up on that since I've discussed vectorization issues
with them in the past.

Attacking vectorization (IMHO) is mostly going to be in the realm of a
tree loop optimizer.  As will be most of the loop restructuring 
transformations.

jeff


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-19 14:37   ` law
@ 2001-10-19 15:53     ` Daniel Berlin
  2001-10-22  9:31       ` law
  0 siblings, 1 reply; 34+ messages in thread
From: Daniel Berlin @ 2001-10-19 15:53 UTC (permalink / raw)
  To: law; +Cc: gcc

>
<snip the rest, for various reasons, i'm not going to reply>

> I'll have to show you an example where you do need renaming to perform
> relatively simple redundancy elimination problems that are found by any
> reasonable value numbering scheme.
Please, do.
Any updating you'd have to do to def-use chains in tree-ssa, you'd have 
to do in the explicitly renamed version.
If you have to at all, which you don't in reasonable value numbering 
schemes
Take dominator based value numbering, for instance.
It's explicitly computing which expressions are dominated by expressions 
that compute the same value.
It then removes the extra computations, and rewrites uses of them to use 
the original computations.
Where's the renaming needed?
SCC based value numbering is the same way.
Nowhere do you insert *new* computations of a value.
In fact, nowhere in any normal redundancy elimination problem do you 
insert new computations of values, only partial redundancy problems.

The only reason we do it now in some cases is because RTL effectively 
requires us to, in some cases.
Doing it at the tree SSA level, for instance, has no such problem.
I.E. It's not a FUD-chain problem, it's an RTL-SSA problem.

>
> jeff
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-19 12:03   ` law
@ 2001-10-21 10:40     ` Jan Hubicka
  2001-10-22  8:28       ` law
  2001-10-22 16:22     ` Joern Rennecke
  1 sibling, 1 reply; 34+ messages in thread
From: Jan Hubicka @ 2001-10-21 10:40 UTC (permalink / raw)
  To: law; +Cc: Jan Hubicka, gcc

> The details of the proof of concept are pretty simple.  Remember, we don't
> have the automatic lowering code, just proof of concepts at this point.
> 
> For the embedded target I have a series of state variables which indicate
> whether or not we're working with generic RTL, lowering from generic to
> target RTL or working with target RTL only.
> 
> The backend has a set of generic patterns which are only available when the
> state variable indicates that we are working with generic RTL.

I've imaginated that w/o patterns at all - just have some easy enought
definition of "correct RTL", to get rid of the optimizer trying to hit into
existing pattern, but this sounds sane too.
> 
> The backend has a set of splitters which are only available when lowering
> from generic to target RTL.
> 
> The original backend's patterns (target specific) are basically unchanged.
Good.
> 
[snip]
>   > If I think about the concept, there are some parts that appears dificult
>   > to me, such as how are the calls represented, or the string operations...
> Calls are one of the interesting issues.  There's a whole rats nest of things
> that need to be improved, but none which are strictly required at this point.
> ie, we can tackle how calls are represented independently of the rest of the
> lowering work.
> 
> First, there should be a phase which unnests calls at the tree level.  Nested
Actually the current calls.c code can do unnesting, but only for "real calls" it
can see in the tree, not libcalls that can happend suddenly during tree generation.

But for unnesting you don't need to modify the tree structure, just always generate
in a way to compute operands first and set to proper places later.
This unfortunately won't work well for structures that are better returned from
nested calls directly where they should resist when calling the outer functions.

This works on targets w/o fixed stack pointer, as i386 is very well.
> calls make the code in calls.c a freaking nightmare (all the stack slot
> saving crap).
> 
> Second, we want a model where arguments are attached to the call and eventually
> lowered to assignments to stack slots and argument registers.  I'm not sure
> where it makes sense to lower these assignments yet.  The same applies for
> the return value.
> 
> Third, we want to delay exposure and lowering of libcalls.  libcalls sequences
> are a disgusting hack and cause as many problems as they solve.  The tricky
> part is that we need to unnest potential libcalls at the tree level so that
> we avoid all the stack slot saving BS when we lower from a simple "div"
> instruction to a __divsi3 libcall (as an example).
Agreed, but we need to track the nesting of libcalls inside calls somehow.
> 
> I haven't tacked string operations.  But my gut feeling is that we leave them
> as simple string operations at the generic RTL stage, then either lower them
> to the target specific string op, a series of simple insns, or a libcall
> as part of the overall lowering process.
Sounds easy - it is not that dificult to represent them just approximately
in the RTL.  With generic patterns and splitters it can be easy to write
the pattern and splitter calling the expand_builtin beast.

Honza
> 
> jeff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-21 10:40     ` Jan Hubicka
@ 2001-10-22  8:28       ` law
  2001-10-22  8:36         ` Daniel Berlin
  0 siblings, 1 reply; 34+ messages in thread
From: law @ 2001-10-22  8:28 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc

  In message < 20011021194030.G12034@atrey.karlin.mff.cuni.cz >you write:
  > > The details of the proof of concept are pretty simple.  Remember, we don'
  > t
  > > have the automatic lowering code, just proof of concepts at this point.
  > > 
  > > For the embedded target I have a series of state variables which indicate
  > > whether or not we're working with generic RTL, lowering from generic to
  > > target RTL or working with target RTL only.
  > > 
  > > The backend has a set of generic patterns which are only available when t
  > he
  > > state variable indicates that we are working with generic RTL.
  > 
  > I've imaginated that w/o patterns at all - just have some easy enought
  > definition of "correct RTL", to get rid of the optimizer trying to hit into
  > existing pattern, but this sounds sane too.
What Richard and I discussed as a generic set of patterns that would be
included by all ports via a #include or similar mechanism in the md file.

jeff


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-22  8:28       ` law
@ 2001-10-22  8:36         ` Daniel Berlin
  2001-10-22  8:56           ` law
  2001-10-22  9:07           ` Jan Hubicka
  0 siblings, 2 replies; 34+ messages in thread
From: Daniel Berlin @ 2001-10-22  8:36 UTC (permalink / raw)
  To: law; +Cc: Jan Hubicka, gcc

law@redhat.com writes:

>   In message < 20011021194030.G12034@atrey.karlin.mff.cuni.cz >you write:
>   > > The details of the proof of concept are pretty simple.  Remember, we don'
>   > t
>   > > have the automatic lowering code, just proof of concepts at this point.
>   > > 
>   > > For the embedded target I have a series of state variables which indicate
>   > > whether or not we're working with generic RTL, lowering from generic to
>   > > target RTL or working with target RTL only.
>   > > 
>   > > The backend has a set of generic patterns which are only available when t
>   > he
>   > > state variable indicates that we are working with generic RTL.
>   > 
>   > I've imaginated that w/o patterns at all - just have some easy enought
>   > definition of "correct RTL", to get rid of the optimizer trying to hit into
>   > existing pattern, but this sounds sane too.
> What Richard and I discussed as a generic set of patterns that would be
> included by all ports via a #include or similar mechanism in the md file.
> 
A minor issue i'm sure, but how much does this increase the running
time of the gen* programs on the md files?
Or has it not been done enough yet to be able to tell?

> jeff
> 

-- 
"My friend has a baby.  I'm recording all the noises he makes so
later I can ask him what he meant.
"-Steven Wright

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-22  8:36         ` Daniel Berlin
@ 2001-10-22  8:56           ` law
  2001-10-22  9:07           ` Jan Hubicka
  1 sibling, 0 replies; 34+ messages in thread
From: law @ 2001-10-22  8:56 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: Jan Hubicka, gcc

  In message < 87elnv3cvv.fsf@cgsoftware.com >you write:
  > A minor issue i'm sure, but how much does this increase the running
  > time of the gen* programs on the md files?
  > Or has it not been done enough yet to be able to tell?
Unknown at this time.
jeff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-22  8:36         ` Daniel Berlin
  2001-10-22  8:56           ` law
@ 2001-10-22  9:07           ` Jan Hubicka
  2001-10-22 11:32             ` law
  1 sibling, 1 reply; 34+ messages in thread
From: Jan Hubicka @ 2001-10-22  9:07 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: law, Jan Hubicka, gcc

> law@redhat.com writes:
> 
> >   In message < 20011021194030.G12034@atrey.karlin.mff.cuni.cz >you write:
> >   > > The details of the proof of concept are pretty simple.  Remember, we don'
> >   > t
> >   > > have the automatic lowering code, just proof of concepts at this point.
> >   > > 
> >   > > For the embedded target I have a series of state variables which indicate
> >   > > whether or not we're working with generic RTL, lowering from generic to
> >   > > target RTL or working with target RTL only.
> >   > > 
> >   > > The backend has a set of generic patterns which are only available when t
> >   > he
> >   > > state variable indicates that we are working with generic RTL.
> >   > 
> >   > I've imaginated that w/o patterns at all - just have some easy enought
> >   > definition of "correct RTL", to get rid of the optimizer trying to hit into
> >   > existing pattern, but this sounds sane too.
> > What Richard and I discussed as a generic set of patterns that would be
> > included by all ports via a #include or similar mechanism in the md file.
> > 
> A minor issue i'm sure, but how much does this increase the running
> time of the gen* programs on the md files?
> Or has it not been done enough yet to be able to tell?

Perhaps this can be done at higher level and teach genrecog to generate separate
functions for the generic stuff as they should be never intermixed together
and we avoid expenses of increasing number of instructions attrtab and ohters
needs to handle (ignore).

Honza
> 
> > jeff
> > 
> 
> -- 
> "My friend has a baby.  I'm recording all the noises he makes so
> later I can ask him what he meant.
> "-Steven Wright

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-19 15:53     ` Daniel Berlin
@ 2001-10-22  9:31       ` law
  2001-10-22 11:49         ` Daniel Berlin
  0 siblings, 1 reply; 34+ messages in thread
From: law @ 2001-10-22  9:31 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc

  In message < 071E8F7C-C4E4-11D5-B9EF-0030657B5340@cgsoftware.com >you write:
  > >
  > <snip the rest, for various reasons, i'm not going to reply>
  > 
  > > I'll have to show you an example where you do need renaming to perform
  > > relatively simple redundancy elimination problems that are found by any
  > > reasonable value numbering scheme.
  > Please, do.
I'm going to try to get it to y'all today.

FWIW, I'm not the only person to recognize this issue with using SSA without
actually rewriting the IL.  Here's one thread from comp.compilers on the
subject:

  http://compilers.iecc.com/comparch/article/01-01-147
  http://compilers.iecc.com/comparch/article/01-02-004
  http://compilers.iecc.com/comparch/article/01-02-006
  http://compilers.iecc.com/comparch/article/01-02-007
  http://compilers.iecc.com/comparch/article/01-02-043





^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-22  9:07           ` Jan Hubicka
@ 2001-10-22 11:32             ` law
  2001-10-22 15:07               ` Richard Henderson
  2001-10-23 15:16               ` Joern Rennecke
  0 siblings, 2 replies; 34+ messages in thread
From: law @ 2001-10-22 11:32 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Daniel Berlin, gcc

  In message < 20011022180647.C4658@atrey.karlin.mff.cuni.cz >you write:
  > Perhaps this can be done at higher level and teach genrecog to generate sep
  > arate
  > functions for the generic stuff as they should be never intermixed together
  > and we avoid expenses of increasing number of instructions attrtab and ohte
  > rs
  > needs to handle (ignore).
If you give up the ability to go ahead and match target specific patterns
while dealing with generic RTL.  

The idea behind being able to match target specific patterns was to avoid
even trying to lower insns which are already known to match a pattern 
specific to the target.

I don't think you'd need to expose both generic and target RTL patterns
to genattrtab since attributes are generally related to low level target
specific stuff (scheduling, insn lengths, delay slots, etc).

jeff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-22  9:31       ` law
@ 2001-10-22 11:49         ` Daniel Berlin
  0 siblings, 0 replies; 34+ messages in thread
From: Daniel Berlin @ 2001-10-22 11:49 UTC (permalink / raw)
  To: law; +Cc: gcc

On Monday, October 22, 2001, at 12:34  PM, law@redhat.com wrote:

>   In message < 071E8F7C-C4E4-11D5-B9EF-0030657B5340@cgsoftware.com >you 
> write:
>>>
>> <snip the rest, for various reasons, i'm not going to reply>
>>
>>> I'll have to show you an example where you do need renaming to perform
>>> relatively simple redundancy elimination problems that are found by 
>>> any
>>> reasonable value numbering scheme.
>> Please, do.
> I'm going to try to get it to y'all today.
>
Okey dokey.
 From the thread, bob morgan says: " However, other
transformations, such as eliminating copy operations, can cause two
variants of the same register to be live at the same time. In that
case the easy thing to do is to actually rename the registers as new
registers."
I'll take his word for it.
However, note that he says the "easy" thing.
He doesn't say it's impossible.
*If*  two variants of the same register became live at the same time, 
It's a matter of updating definitions in the case of FUD chains, while 
you usually can get away with doing nothing in explicitly renamed ssa 
form.
Of course, what *we* should do, if we only had to select one thing,  (in 
terms of renaming or fud chains) depends on which type of problem we 
have more, forwards or backwards.
I'd rather *allow* us to do both.
If part of the point of the SSA path is to be fast and efficient, and 
cut down the amount of rtl we have to deal with in later, slower, 
passes, then what makes most sense is something like:

Convert to FUD chains
SSA-CCP (including simple dead-code removal)
Aggressive DCE
<other ssa optimizations that are faster or easier on fud chains>
Convert to explict renamed form
<Other SSA optimizations that are faster or easier with renaming>

This is because fud-chains are going to be faster to work with for 
things that remove a lot of code quickly. For instance, while SSA-CCP on 
explicit renamed form will require up to two evaluations per operand of 
an operation, on fud-chains, you can guarantee to evaluate each thing 
only once.
--Dan



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-22 11:32             ` law
@ 2001-10-22 15:07               ` Richard Henderson
  2001-10-23 15:16               ` Joern Rennecke
  1 sibling, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2001-10-22 15:07 UTC (permalink / raw)
  To: law; +Cc: Jan Hubicka, Daniel Berlin, gcc

On Mon, Oct 22, 2001 at 12:35:14PM -0600, law@redhat.com wrote:
>   In message < 20011022180647.C4658@atrey.karlin.mff.cuni.cz >you write:
>   > Perhaps this can be done at higher level and teach genrecog to generate
>   > separate functions for the generic stuff ...

This is what I had in mind.

> If you give up the ability to go ahead and match target specific patterns
> while dealing with generic RTL.  

We *never* want this.  Generic rtl is generic rtl.

What you are remembering is the desire to match target specific
patterns during the lowering process.  Which we can do by having
the lowering code call into the target specific match routines.



r~

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-19 12:03   ` law
  2001-10-21 10:40     ` Jan Hubicka
@ 2001-10-22 16:22     ` Joern Rennecke
  2001-10-23  3:02       ` Jan Hubicka
  1 sibling, 1 reply; 34+ messages in thread
From: Joern Rennecke @ 2001-10-22 16:22 UTC (permalink / raw)
  To: law; +Cc: Jan Hubicka, gcc

> Third, we want to delay exposure and lowering of libcalls.  libcalls sequences
> are a disgusting hack and cause as many problems as they solve.  The tricky
> part is that we need to unnest potential libcalls at the tree level so that
> we avoid all the stack slot saving BS when we lower from a simple "div"
> instruction to a __divsi3 libcall (as an example).

How about commoning of libcall addresses and other special constants?
Are you going to rely on post-lowering phases to do that, or do you
plan to display this stuff in the CALL_FUSAGE or some now field of
CALL_INSNs?

--
Joern Rennecke                  |            gcc expert for hire
amylaar@onetel.net.uk           |  send enquiries to: jwr_jobs@onetel.net.uk

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-22 16:22     ` Joern Rennecke
@ 2001-10-23  3:02       ` Jan Hubicka
  2001-10-23 15:28         ` law
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Hubicka @ 2001-10-23  3:02 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: law, Jan Hubicka, gcc

> > Third, we want to delay exposure and lowering of libcalls.  libcalls sequences
> > are a disgusting hack and cause as many problems as they solve.  The tricky
> > part is that we need to unnest potential libcalls at the tree level so that
> > we avoid all the stack slot saving BS when we lower from a simple "div"
> > instruction to a __divsi3 libcall (as an example).
> 
> How about commoning of libcall addresses and other special constants?
> Are you going to rely on post-lowering phases to do that, or do you
> plan to display this stuff in the CALL_FUSAGE or some now field of
> CALL_INSNs?
I believe we still must do (g)cse after lowering, as at almost each riscy
architecture this is going to introduce dozen of new constants and temporaries
that can be cseed, but we probably don't need the "full strength cse+gcse"
we've somehow failed to reach, as for instance load/store motion and friends
can be done only on midlevel.

Another pass that may make sense at lowlevel is strength reduction, but maybe
we can do it at midlevel just by knowing the addressing modes of target
machine.

Honza
> 
> --
> Joern Rennecke                  |            gcc expert for hire
> amylaar@onetel.net.uk           |  send enquiries to: jwr_jobs@onetel.net.uk

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-22 11:32             ` law
  2001-10-22 15:07               ` Richard Henderson
@ 2001-10-23 15:16               ` Joern Rennecke
  1 sibling, 0 replies; 34+ messages in thread
From: Joern Rennecke @ 2001-10-23 15:16 UTC (permalink / raw)
  To: law; +Cc: Jan Hubicka, Daniel Berlin, gcc

> I don't think you'd need to expose both generic and target RTL patterns
> to genattrtab since attributes are generally related to low level target
> specific stuff (scheduling, insn lengths, delay slots, etc).

Well, you might want to provide default definitions for high level rtl insn
attributes that are different from the defaults of the machine description.
For example, you might want to default most md patterns to have a 'normal'
type, for standard length and one-cycle execution.  Obviously, high-level
rtl patterns should default to something else.
So we could start with having an automatically provided attribute that
says if a pattern is high-level rtl, i.e. not described in the md file,
and the default attribute definitions can then test this attribute.

Or - that should make the build machinery simpler - we could have a new
(optional) slot in attribute definitions, which provides the value for
high-level rtl patterns.

-- 
Joern Rennecke                  |            gcc expert for hire
amylaar@onetel.net.uk           |  send enquiries to: jwr_jobs@onetel.net.uk

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-23  3:02       ` Jan Hubicka
@ 2001-10-23 15:28         ` law
  2001-10-24  7:59           ` Jan Hubicka
  0 siblings, 1 reply; 34+ messages in thread
From: law @ 2001-10-23 15:28 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Joern Rennecke, gcc

  In message < 20011023120240.G12992@atrey.karlin.mff.cuni.cz >you write:
  > > How about commoning of libcall addresses and other special constants?
  > > Are you going to rely on post-lowering phases to do that, or do you
  > > plan to display this stuff in the CALL_FUSAGE or some now field of
  > > CALL_INSNs?
  > I believe we still must do (g)cse after lowering, as at almost each riscy
  > architecture this is going to introduce dozen of new constants and
  > temporaries that can be cseed, but we probably don't need the "full
  > strength cse+gcse" we've somehow failed to reach, as for instance
  > load/store motion and friends> can be done only on midlevel.
It is likely that we will continue to have a [g]cse pass of some kind that
runs after lowering from generic to target specific RTL.  It is *hoped* that
it can be a simpler, more compile time efficient pass than we currently have
with cse or gcse.

What makes you think we can only do lsm on target rtl?  There's no reason
they can not be done on the generic RTL.

  > Another pass that may make sense at lowlevel is strength reduction, but
  > maybe we can do it at midlevel just by knowing the addressing modes of
  > target machine.
I don't really want to get into the loop optimizer right now or better stated
I think solving our loop optimizer problems needs to be handled independently
of adding a generic RTL IR to GCC and I personally prefer to focus on the
generic RTL issues right now.

jeff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-23 15:28         ` law
@ 2001-10-24  7:59           ` Jan Hubicka
  0 siblings, 0 replies; 34+ messages in thread
From: Jan Hubicka @ 2001-10-24  7:59 UTC (permalink / raw)
  To: law; +Cc: Jan Hubicka, Joern Rennecke, gcc

>   In message < 20011023120240.G12992@atrey.karlin.mff.cuni.cz >you write:
>   > > How about commoning of libcall addresses and other special constants?
>   > > Are you going to rely on post-lowering phases to do that, or do you
>   > > plan to display this stuff in the CALL_FUSAGE or some now field of
>   > > CALL_INSNs?
>   > I believe we still must do (g)cse after lowering, as at almost each riscy
>   > architecture this is going to introduce dozen of new constants and
>   > temporaries that can be cseed, but we probably don't need the "full
>   > strength cse+gcse" we've somehow failed to reach, as for instance
>   > load/store motion and friends> can be done only on midlevel.
> It is likely that we will continue to have a [g]cse pass of some kind that
> runs after lowering from generic to target specific RTL.  It is *hoped* that
> it can be a simpler, more compile time efficient pass than we currently have
> with cse or gcse.
> 
> What makes you think we can only do lsm on target rtl?  There's no reason
> they can not be done on the generic RTL.
Thats what I wanted to say - that we can drop some of complexity of current
gcse on lowlevel tree, as lsm is, since it don't make sense to rerun it after
midlevel->lowlevel generation.
> 
>   > Another pass that may make sense at lowlevel is strength reduction, but
>   > maybe we can do it at midlevel just by knowing the addressing modes of
>   > target machine.
> I don't really want to get into the loop optimizer right now or better stated
> I think solving our loop optimizer problems needs to be handled independently
> of adding a generic RTL IR to GCC and I personally prefer to focus on the
> generic RTL issues right now.

Sure.  I am trying to get some of bits for writing new loop optimizer ready
in meantime, so I am focusion on the oposite side of this problem.

Honza
> 
> jeff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-22 11:30 law
  2001-10-22 13:10 ` Daniel Berlin
@ 2001-11-13 13:41 ` Diego Novillo
  1 sibling, 0 replies; 34+ messages in thread
From: Diego Novillo @ 2001-11-13 13:41 UTC (permalink / raw)
  To: law; +Cc: dan, gcc

[ Apologies for joining so late.  I've been away for a month. ]

On Mon, 22 Oct 2001, Jeff Law wrote:

> So how is this done when SSA names are not reflected in the IR?  I have
> ideas, but I'd like to hear from someone with more experience in the
> non-rewriting camp.  You certainly can't do the naive thing and drop the
> SSA names in favor of the original names.
> 
In the non-rewriting SSA there are no names to drop.  There is no
SSA->normal translation because the SSA data structure is simply
laid on top of the flow graph.

However, eliminating redundancy does force you to update the SSA
web.  The work you have to do is similar to the work you would do
manipulating the SSA names.

> When we reach insn 458, we lookup (sign_extend:DI (reg:SI 343)) and we
> get (reg:DI 444).  Cool.  That means insn 458 is redundant.  Delete it.
> Do some manipulation of our renaming stacks to arrange for uses of 
> (reg:DI 617) to be rewritten as (reg:DI 444).  The only uses of 
> (reg:DI 617) are at insn 765.
>
The above transformation in the non-rewriting SSA goes like this:

When you decide to eliminate insn 458, instead of doing a
manipulation of the renaming stacks, you just manipulate the
use-def and def-use links between the def of r344 at bb 18 and
the phi term for r344 at bb 23.

This is the original phi term:

(insn 765 744 276 (set (reg:DI 603)
        (phi[
                (reg:DI 617)
                (const_int 22 [0x16])
                (reg:DI 617)
                (const_int 21 [0x15])
                (reg:DI 617)
                (const_int 20 [0x14])
                (reg:DI 617)
                (const_int 19 [0x13])
                (reg:DI 614)
                (const_int 16 [0x10])
                (reg:DI 607)
                (const_int 9 [0x9])
                (reg/v:DI 344)
                (const_int 7 [0x7])
            ] )) -1 (nil)
    (nil))

In this case, all the arguments for r617 are actually pointers to
the definition for r344 at insn 458.  Since insn 458 is being
removed, you simply make those use-def links point to the
definition for r444 in insn 305.  This operation is akin to
manipulating the renaming stacks you have to do in the rewriting
form.

Diego.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-22 14:28       ` Daniel Berlin
@ 2001-10-22 14:48         ` law
  0 siblings, 0 replies; 34+ messages in thread
From: law @ 2001-10-22 14:48 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: dnovillo, gcc

  In message < B7299F68-C733-11D5-9F25-0030657B5340@cgsoftware.com >you write:
  > > None that you wouldn't have to do anyway, since you need to update the 
  > > use's chain to point to the new def anyway.  The ID number is part of 
  > > the def/use structure, so doing the required update makes it work.
  > > Though i think this is what you meant, i'm just having a bit of trouble 
  > > parsing your sentence. Criminal law class does it to me.
  > > Remember, we have all the uses for a def, too, so when you remove the 
  > > def, you can just automatically update all the uses to the new def 
  > > pretty simply (since, in the case of dominator optimizations, we know 
  > > which def is the new one).
  > 
  > Scratch that, i forgot you are talking about dominator optimizations 
  > done *before* we build the links.
Precisely.

  > I was thinking of doing it as a value numbering pass once we *had* ssa 
  > form.
Understood.  And your scheme may make sense in that space, I haven't
worked through GVN in enough detail to have any feel one way or the
other.

jeff


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-22 14:25     ` Daniel Berlin
@ 2001-10-22 14:28       ` Daniel Berlin
  2001-10-22 14:48         ` law
  0 siblings, 1 reply; 34+ messages in thread
From: Daniel Berlin @ 2001-10-22 14:28 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: law, dnovillo, gcc

On Monday, October 22, 2001, at 05:25  PM, Daniel Berlin wrote:

>
> On Monday, October 22, 2001, at 05:04  PM, law@redhat.com wrote:
>
>>   In message < BF0CB0D7-C728-11D5-9F25-0030657B5340@cgsoftware.com >you 
>> write:
>>>> So how is this done when SSA names are not reflected in the IR?
>>> You can do the same thing.
>>> You just need names.
>>
>>> If you look at the ssa-ccp patch i submitted, you'll note i added a
>>> unique id number to each ref, and use it in the following way to give 
>>> an
>>> ssa name:
>>> For a phi or a def, the ssa name is the id number.
>>> For a use, the ssa name is the id number of the associated def.
>>> This gives you the same "name" an explicit renamed representation 
>>> would
>>> give you, but we still haven't rewritten any code.
>> Right, but whenever you make this kind of transformation you have to 
>> scrurry
>> around and find all the uses the update the id number to point to the 
>> new
>> def.
> None that you wouldn't have to do anyway, since you need to update the 
> use's chain to point to the new def anyway.  The ID number is part of 
> the def/use structure, so doing the required update makes it work.
> Though i think this is what you meant, i'm just having a bit of trouble 
> parsing your sentence. Criminal law class does it to me.
> Remember, we have all the uses for a def, too, so when you remove the 
> def, you can just automatically update all the uses to the new def 
> pretty simply (since, in the case of dominator optimizations, we know 
> which def is the new one).

Scratch that, i forgot you are talking about dominator optimizations 
done *before* we build the links.
I was thinking of doing it as a value numbering pass once we *had* ssa 
form.
In the case of doing it coming into ssa form, it would certainly be a 
pain in the ass.

>
>
>
>> Ugh.  How unpleasant.  The beauty of a rewriting SSA is it just works.
>>
> It's not actually as unpleasant as one might think, because all the 
> updating can be done in replace_expr_in_tree, without *too* much 
> trouble.
> So from a programming perspective, it's quite possible to make it so it 
> is never seen by the ssa passes.
>
>
>>> I'm assuming you keep the definitions/uses/phi links updated
>>> approriately, which is where the real pain lies.
>> No need for this kind of bookkeeping since the dominator opts happen
>> before we build those links.
>>
>> jeff
>>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-22 14:02   ` law
@ 2001-10-22 14:25     ` Daniel Berlin
  2001-10-22 14:28       ` Daniel Berlin
  0 siblings, 1 reply; 34+ messages in thread
From: Daniel Berlin @ 2001-10-22 14:25 UTC (permalink / raw)
  To: law; +Cc: dnovillo, gcc

On Monday, October 22, 2001, at 05:04  PM, law@redhat.com wrote:

>   In message < BF0CB0D7-C728-11D5-9F25-0030657B5340@cgsoftware.com >you 
> write:
>>> So how is this done when SSA names are not reflected in the IR?
>> You can do the same thing.
>> You just need names.
>
>> If you look at the ssa-ccp patch i submitted, you'll note i added a
>> unique id number to each ref, and use it in the following way to give 
>> an
>> ssa name:
>> For a phi or a def, the ssa name is the id number.
>> For a use, the ssa name is the id number of the associated def.
>> This gives you the same "name" an explicit renamed representation would
>> give you, but we still haven't rewritten any code.
> Right, but whenever you make this kind of transformation you have to 
> scrurry
> around and find all the uses the update the id number to point to the 
> new
> def.
None that you wouldn't have to do anyway, since you need to update the 
use's chain to point to the new def anyway.  The ID number is part of 
the def/use structure, so doing the required update makes it work.
Though i think this is what you meant, i'm just having a bit of trouble 
parsing your sentence. Criminal law class does it to me.
Remember, we have all the uses for a def, too, so when you remove the 
def, you can just automatically update all the uses to the new def 
pretty simply (since, in the case of dominator optimizations, we know 
which def is the new one).



> Ugh.  How unpleasant.  The beauty of a rewriting SSA is it just works.
>
It's not actually as unpleasant as one might think, because all the 
updating can be done in replace_expr_in_tree, without *too* much trouble.
So from a programming perspective, it's quite possible to make it so it 
is never seen by the ssa passes.


>> I'm assuming you keep the definitions/uses/phi links updated
>> approriately, which is where the real pain lies.
> No need for this kind of bookkeeping since the dominator opts happen
> before we build those links.
>
> jeff
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-22 13:10 ` Daniel Berlin
@ 2001-10-22 14:02   ` law
  2001-10-22 14:25     ` Daniel Berlin
  0 siblings, 1 reply; 34+ messages in thread
From: law @ 2001-10-22 14:02 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: dnovillo, gcc

  In message < BF0CB0D7-C728-11D5-9F25-0030657B5340@cgsoftware.com >you write:
  > > So how is this done when SSA names are not reflected in the IR?
  > You can do the same thing.
  > You just need names.

  > If you look at the ssa-ccp patch i submitted, you'll note i added a 
  > unique id number to each ref, and use it in the following way to give an 
  > ssa name:
  > For a phi or a def, the ssa name is the id number.
  > For a use, the ssa name is the id number of the associated def.
  > This gives you the same "name" an explicit renamed representation would 
  > give you, but we still haven't rewritten any code.
Right, but whenever you make this kind of transformation you have to scrurry
around and find all the uses the update the id number to point to the new
def.  Ugh.  How unpleasant.  The beauty of a rewriting SSA is it just works.

  > I'm assuming you keep the definitions/uses/phi links updated 
  > approriately, which is where the real pain lies.
No need for this kind of bookkeeping since the dominator opts happen 
before we build those links.

jeff


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Higher level RTL issues
  2001-10-22 11:30 law
@ 2001-10-22 13:10 ` Daniel Berlin
  2001-10-22 14:02   ` law
  2001-11-13 13:41 ` Diego Novillo
  1 sibling, 1 reply; 34+ messages in thread
From: Daniel Berlin @ 2001-10-22 13:10 UTC (permalink / raw)
  To: law; +Cc: dnovillo, gcc

> So, how do you turn this back into normal form?  Well, for the camp that
> reflects the SSA names in the IR, it's the standard SSA->normal 
> translation
> and everything just works.
>
> So how is this done when SSA names are not reflected in the IR?
You can do the same thing.
You just need names.
If you look at the ssa-ccp patch i submitted, you'll note i added a 
unique id number to each ref, and use it in the following way to give an 
ssa name:
For a phi or a def, the ssa name is the id number.
For a use, the ssa name is the id number of the associated def.
This gives you the same "name" an explicit renamed representation would 
give you, but we still haven't rewritten any code.
Now, just insert real copies for the fake phis, and you are golden. (IE 
converting *out* of fud chains of course, could require real code 
insertion).
I'm assuming you keep the definitions/uses/phi links updated 
approriately, which is where the real pain lies.

You could actually probably get away with only inserting copies for 
modified phis (making sure you remove arguments for defs that no longer 
exist).


> I have
> ideas, but I'd like to hear from someone with more experience in the
> non-rewriting camp.  You certainly can't do the naive thing and drop the
> SSA names in favor of the original names.

>
> FWIW, k.c.04.ssa is the same as j.c.04.ssa, except that dominator opts 
> were
> applied during translation into SSA form.  It may or may not be useful 
> to
> folks that want to poke around at this issue.
>
>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Higher level RTL issues
@ 2001-10-22 11:30 law
  2001-10-22 13:10 ` Daniel Berlin
  2001-11-13 13:41 ` Diego Novillo
  0 siblings, 2 replies; 34+ messages in thread
From: law @ 2001-10-22 11:30 UTC (permalink / raw)
  To: dan; +Cc: dnovillo, gcc

As promised, here's an example where optimizations done on the SSA form
ultimately result in having two SSA names for the same original temporary
are live at the same time.

As I've mentioned before, when that happens it is not possible to simply
throw away the SSA names and use the original variable/temporary names.

This is not a trivial example.  I recommend you start by first taking
the info from blah.cfg and scribble out of CFG for this function.

Note carefully the loop at starting at block 23 and the backedges from
blocks 9, 16, 19, 20, 21 and 22 to block 23.

I've saved the RTL just before we convert into SSA in j.c.pressa.  Mostly
so that we can recover the original register #s from the RTL.  j.c.04.ssa
shows how we convert into SSA form without applying dominator based
optimizations.

We're concerned with the assignments and uses of two registers.

(reg 344) Defs:
  insn 209, bb5	  SSA reg 344
  insn 338, bb9	  SSA reg 607
  insn 407, bb16  SSA reg 614
  insn 458, bb18  SSA reg 617	Note this occurs after the def at insn 438

(reg 344) Uses:
  insn 312, bb8
  insn 317, bb8
  insn 438, bb18  
  insn 550, bb24

(reg 444) Defs:
  insn 305, bb8

(reg 444) Uses:
  insn 311, bb8

The PHI node at the start of bb23 looks like:

(insn 765 744 276 (set (reg:DI 603)
        (phi[
                (reg:DI 617)
                (const_int 22 [0x16])
                (reg:DI 617)
                (const_int 21 [0x15])
                (reg:DI 617)
                (const_int 20 [0x14])
                (reg:DI 617)
                (const_int 19 [0x13])
                (reg:DI 614)
                (const_int 16 [0x10])
                (reg:DI 607)
                (const_int 9 [0x9])
                (reg/v:DI 344)
                (const_int 7 [0x7])
            ] )) -1 (nil)
    (nil))

Verification that this makes sense is left to the reader.  Basically
(reg 344) reaches from outside the loop.  The rest reach from inside the loop.
The lifetimes of regs 344, 607, 614, 617 and 603 are disjoint.

There is no PHI node related to the definition of (reg 444) as (reg 444) is
only assigned to once in the original RTL.

Those are the key items from the translation from normal into SSA form.  If
we made no transformations to the code we could throw away the SSA register
numbers and just use the original register numbers.

Now look in j.c.04.ssa for insn 305 in block 8 and insn 458 in block 18.
They should look like:

(insn 305 303 307 (set (reg:DI 444)
        (sign_extend:DI (reg/v:SI 343))) -1 (nil)
    (nil))

(insn 458 456 461 (set (reg:DI 617)
        (sign_extend:DI (reg/v:SI 343))) -1 (nil)
    (nil))

From the CFG we know that block 8 dominates block 18 and thus insn 305 
dominates insn 458.  Since we're in SSA form we know that (reg 343) is
only assigned to once and its set dominates its uses.  Therefore, we
know that insn 458 is redundant and with a little work can be removed.

Armed with that information, let's pretend that we're doing dominator
optimizations as Morgan has described while translating from normal
form into SSA form.

When we encounter insn 305, we record that (sign_extend:DI (reg:SI 343))
can be found in (reg:DI 444).  We proceed to follow paths through the
dominator tree and eventually reach block 18 with insn 458.  Remember
block 8 (which contained insn 305) dominates block 18, so entries we
added to the hash table for block 8 are still available.

When we reach insn 458, we lookup (sign_extend:DI (reg:SI 343)) and we
get (reg:DI 444).  Cool.  That means insn 458 is redundant.  Delete it.
Do some manipulation of our renaming stacks to arrange for uses of 
(reg:DI 617) to be rewritten as (reg:DI 444).  The only uses of 
(reg:DI 617) are at insn 765.

After finishing the translation into SSA form the PHI node at the start
of block 23 would look like:

(insn 765 744 276 (set (reg:DI 603)
        (phi[
                (reg:DI 444)
                (const_int 22 [0x16])
                (reg:DI 444)
                (const_int 21 [0x15])
                (reg:DI 444)
                (const_int 20 [0x14])
                (reg:DI 444)
                (const_int 19 [0x13])
                (reg:DI 614)
                (const_int 16 [0x10])
                (reg:DI 607)
                (const_int 9 [0x9])
                (reg/v:DI 344)
                (const_int 7 [0x7])
            ] )) -1 (nil)
    (nil))

So far so good?  If you want, you can verify that the changes we made were
all valid and we're still in SSA form.

However, note carefully that SSA name (reg:DI 444) is live at the same time
as SSA name (reg:DI 603).  Specifically they are both live starting after
insn 305 in block 8.  

This happened because redundancy elimination extended the lifetime of one
register in order to eliminate a redundancy.

So, how do you turn this back into normal form?  Well, for the camp that
reflects the SSA names in the IR, it's the standard SSA->normal translation
and everything just works.

So how is this done when SSA names are not reflected in the IR?  I have
ideas, but I'd like to hear from someone with more experience in the
non-rewriting camp.  You certainly can't do the naive thing and drop the
SSA names in favor of the original names.

FWIW, k.c.04.ssa is the same as j.c.04.ssa, except that dominator opts were
applied during translation into SSA form.  It may or may not be useful to
folks that want to poke around at this issue.

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2001-11-23  3:56 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-10-09 19:46 Higher level RTL issues law
2001-10-09 20:54 ` Diego Novillo
2001-10-09 21:27   ` Daniel Berlin
2001-10-09 21:49     ` Diego Novillo
2001-10-09 22:23       ` Daniel Berlin
2001-10-19 14:39     ` law
2001-10-19 12:19   ` law
2001-10-09 21:18 ` Daniel Berlin
2001-10-09 21:33   ` Diego Novillo
2001-10-19 14:37   ` law
2001-10-19 15:53     ` Daniel Berlin
2001-10-22  9:31       ` law
2001-10-22 11:49         ` Daniel Berlin
2001-10-13  2:27 ` Jan Hubicka
2001-10-19 12:03   ` law
2001-10-21 10:40     ` Jan Hubicka
2001-10-22  8:28       ` law
2001-10-22  8:36         ` Daniel Berlin
2001-10-22  8:56           ` law
2001-10-22  9:07           ` Jan Hubicka
2001-10-22 11:32             ` law
2001-10-22 15:07               ` Richard Henderson
2001-10-23 15:16               ` Joern Rennecke
2001-10-22 16:22     ` Joern Rennecke
2001-10-23  3:02       ` Jan Hubicka
2001-10-23 15:28         ` law
2001-10-24  7:59           ` Jan Hubicka
2001-10-22 11:30 law
2001-10-22 13:10 ` Daniel Berlin
2001-10-22 14:02   ` law
2001-10-22 14:25     ` Daniel Berlin
2001-10-22 14:28       ` Daniel Berlin
2001-10-22 14:48         ` law
2001-11-13 13:41 ` Diego Novillo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).