public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Higher level RTL issues
@ 2001-10-09 19:46 law
  2001-10-09 20:54 ` Diego Novillo
                   ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: law @ 2001-10-09 19:46 UTC (permalink / raw)
  To: gcc

I know folks have heard me mention the idea of a higher level RTL from
time to time.  I must apologize for not posting this earlier -- oh well,
life goes on.  I must also apologize as this isn't 100% complete, but
given that I haven't posted anything about it already and the fact that
I'm going on vacation tomorrow, I figured I'd at least get this info out
and let folks chew on it while I'm gone.

I don't think there's any significant disagreement that a higher level
IL would be useful.  What is a subject of debate is whether we have
a lower tree form, higher RTL form or some other form.  I think if we
look at the set of problems we want to solve we can get a good feel for
which of the 3 choices makes the most sense.


I'm looking at the new IL as a way to enable an SSA optimization path.  On
that optimization path I'm initially looking at the following optimizations:

	* Sparse conditional constant propagation (ssa-ccp.c)

	* Traditional dead code elimination (ssa-dce.c)

	* Aggressive dead code elimination (ssa-dce.c)

	* Register coalescing and renaming (ssa.c)

	* Dominator based value numbering (will eventually be in ssa.c)

Certainly there will be more optimizations in the future, but I think this
set gives us a good idea of the kinds of optimization problems I want to 
solve.

Let's first consider Tree SSA work Diego's doing vs our existing SSA
translator.  The SSA naming discipline in Diego's work is merely a
conceptual model -- we do not actually rewrite the tree structures to
conform to SSA naming rules.  The nice thing about this model is you
don't actually have to convert the IL into and out of SSA form.

That model works fine for a class of optimization problems, but breaks down
with any optimization which potentially has two SSA names for the same 
variable live at the same point in the program.  GVN and dominator based
value numbering can create situations where we have two SSA names live at
the same time.  It's probably true that other interesting optimizations
could create similar situations.  I believe this argues that either the
tree SSA needs to do rewriting or that tree SSA is not suitable for the
class of optimizations I'm trying to implement.

Now let's consider writing a new IL to sit between trees and RTL.  It can
certainly be done, no doubt about that.  However, that means we have to
come up with a new class of routines to analyze and manipulate this new IL
as well as translators in/out of this new IL.  It means we as developers have
to familiarize ourselves with a whole new IL.  It means that we can't use any
of the existing work in the SSA translator and optimization passes.  In short
I just don't think it makes a lot of sense.

So that leaves the 3rd alternative and my preferred solution -- a higher level
RTL.  The biggest benefits of this direction are that we already have code
which can analyze and manipulate RTL which would likely only need minor
extensions.  Similarly I believe the learning curve for us as developers is
a whole lot smaller since we're still working with RTL.


So, given the set of optimization problems I want to solve and the possible
solutions using a higher level RTL seems like the natural choice to me.

--

If we now switch gears and assume that we're going to have a higher level
RTL representation we have the interesting problem of defining the higher
level RTL representation and means by which we lower from the higher level
RTL representation into RTL as we know it today.

Given that I'm looking to have this new RTL layer be suitable for a class
of SSA optimizers, I've largely let the needs of those optimizers define
the properties of the new RTL layer.

As a mental model, we have a set of standard patterns that are defined
unconditionally for all targets.   Those patterns handle standard
operations -- data movement, arithmetic, logical, flow control, etc. 
The operands within those patterns have the property that registers and
constants are interchangable -- ie, if we have a register, we can 
replace it with a constant and the result is a valid insn that does not
need re-recognition.  Furthermore, we have a set of standard addressing
modes (TBD, but I'm considering reg, reg + disp, constant).  MEMs can
be replaced by registers and thus by constants under the same rules.
[ It's not clear where we're going to allow MEMs, but anywhere we allow
  a MEM will also allow a REG or constant. ]

There are no libcalls, subregs, cc0, etc which cause us so much trouble
in SSA form.


If we have this higher level RTL we have the interesting problem of 
lowering from the high level RTL to RTL as we know it today.  Clearly
we want to avoid having to rewrite each backend.  In an ideal world
we wouldn't have to twiddle backends at all, but I'm not sure if
that's feasible in practice.  So it is a goal to not have to twiddle
backends, but it is not an absolute requirement.

I believe we want to structure the lowering phase in the following
manner (many thanks to Richard for help in this space):

  a. First try to match the high level RTL to a pattern in the
     target's backend.  If we get a match, then no lowering was
     needed.

  b. See if the RTL matches a define_split in the backend, if so
     call the target backend's splitter to perform the lowering.
     The insns resulting from the split must match patterns in the
     target's backend.

  c. Extract the operator and its operands and call back into the
     expand_foo routines we currently have (expr.c optabs.c).  Those
     will use the target's define_expands and the generic lowering code we've
     already got in expr.c, optabs.c and friends to generate target
     specific RTL.

Once the lowering phase is started, the patterns for the higher level
RTL are no longer allowed to match.  


--
 


This design has come from my experiences in trying to build out the
fledgling SSA optimization path mentioned at the start of this message.

I've actually implemented this stuff by hand on one embedded target using
state variables and define_splits for lowering as a proof of concept.  It
worked well enough to build libgcc, newlib, libstdc++ and run the various
testsuites (including the proprietary testsuites Red Hat uses internally).

I've also had this up and running on IA-64 linux -- which included successful
bootstrapping, spec runs, etc.


Anyway, chew on it for a while and I'll join in the discussion when I return
from vacation.

jeff

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Higher level RTL issues
@ 2001-10-22 11:30 law
  2001-10-22 13:10 ` Daniel Berlin
  2001-11-13 13:41 ` Diego Novillo
  0 siblings, 2 replies; 34+ messages in thread
From: law @ 2001-10-22 11:30 UTC (permalink / raw)
  To: dan; +Cc: dnovillo, gcc

As promised, here's an example where optimizations done on the SSA form
ultimately result in having two SSA names for the same original temporary
are live at the same time.

As I've mentioned before, when that happens it is not possible to simply
throw away the SSA names and use the original variable/temporary names.

This is not a trivial example.  I recommend you start by first taking
the info from blah.cfg and scribble out of CFG for this function.

Note carefully the loop at starting at block 23 and the backedges from
blocks 9, 16, 19, 20, 21 and 22 to block 23.


I've saved the RTL just before we convert into SSA in j.c.pressa.  Mostly
so that we can recover the original register #s from the RTL.  j.c.04.ssa
shows how we convert into SSA form without applying dominator based
optimizations.

We're concerned with the assignments and uses of two registers.

(reg 344) Defs:
  insn 209, bb5	  SSA reg 344
  insn 338, bb9	  SSA reg 607
  insn 407, bb16  SSA reg 614
  insn 458, bb18  SSA reg 617	Note this occurs after the def at insn 438

(reg 344) Uses:
  insn 312, bb8
  insn 317, bb8
  insn 438, bb18  
  insn 550, bb24

(reg 444) Defs:
  insn 305, bb8

(reg 444) Uses:
  insn 311, bb8

  
The PHI node at the start of bb23 looks like:

(insn 765 744 276 (set (reg:DI 603)
        (phi[
                (reg:DI 617)
                (const_int 22 [0x16])
                (reg:DI 617)
                (const_int 21 [0x15])
                (reg:DI 617)
                (const_int 20 [0x14])
                (reg:DI 617)
                (const_int 19 [0x13])
                (reg:DI 614)
                (const_int 16 [0x10])
                (reg:DI 607)
                (const_int 9 [0x9])
                (reg/v:DI 344)
                (const_int 7 [0x7])
            ] )) -1 (nil)
    (nil))

Verification that this makes sense is left to the reader.  Basically
(reg 344) reaches from outside the loop.  The rest reach from inside the loop.
The lifetimes of regs 344, 607, 614, 617 and 603 are disjoint.


There is no PHI node related to the definition of (reg 444) as (reg 444) is
only assigned to once in the original RTL.

Those are the key items from the translation from normal into SSA form.  If
we made no transformations to the code we could throw away the SSA register
numbers and just use the original register numbers.

Now look in j.c.04.ssa for insn 305 in block 8 and insn 458 in block 18.
They should look like:

(insn 305 303 307 (set (reg:DI 444)
        (sign_extend:DI (reg/v:SI 343))) -1 (nil)
    (nil))


(insn 458 456 461 (set (reg:DI 617)
        (sign_extend:DI (reg/v:SI 343))) -1 (nil)
    (nil))


From the CFG we know that block 8 dominates block 18 and thus insn 305 
dominates insn 458.  Since we're in SSA form we know that (reg 343) is
only assigned to once and its set dominates its uses.  Therefore, we
know that insn 458 is redundant and with a little work can be removed.

Armed with that information, let's pretend that we're doing dominator
optimizations as Morgan has described while translating from normal
form into SSA form.

When we encounter insn 305, we record that (sign_extend:DI (reg:SI 343))
can be found in (reg:DI 444).  We proceed to follow paths through the
dominator tree and eventually reach block 18 with insn 458.  Remember
block 8 (which contained insn 305) dominates block 18, so entries we
added to the hash table for block 8 are still available.

When we reach insn 458, we lookup (sign_extend:DI (reg:SI 343)) and we
get (reg:DI 444).  Cool.  That means insn 458 is redundant.  Delete it.
Do some manipulation of our renaming stacks to arrange for uses of 
(reg:DI 617) to be rewritten as (reg:DI 444).  The only uses of 
(reg:DI 617) are at insn 765.



After finishing the translation into SSA form the PHI node at the start
of block 23 would look like:

(insn 765 744 276 (set (reg:DI 603)
        (phi[
                (reg:DI 444)
                (const_int 22 [0x16])
                (reg:DI 444)
                (const_int 21 [0x15])
                (reg:DI 444)
                (const_int 20 [0x14])
                (reg:DI 444)
                (const_int 19 [0x13])
                (reg:DI 614)
                (const_int 16 [0x10])
                (reg:DI 607)
                (const_int 9 [0x9])
                (reg/v:DI 344)
                (const_int 7 [0x7])
            ] )) -1 (nil)
    (nil))


So far so good?  If you want, you can verify that the changes we made were
all valid and we're still in SSA form.

However, note carefully that SSA name (reg:DI 444) is live at the same time
as SSA name (reg:DI 603).  Specifically they are both live starting after
insn 305 in block 8.  

This happened because redundancy elimination extended the lifetime of one
register in order to eliminate a redundancy.


So, how do you turn this back into normal form?  Well, for the camp that
reflects the SSA names in the IR, it's the standard SSA->normal translation
and everything just works.

So how is this done when SSA names are not reflected in the IR?  I have
ideas, but I'd like to hear from someone with more experience in the
non-rewriting camp.  You certainly can't do the naive thing and drop the
SSA names in favor of the original names.

FWIW, k.c.04.ssa is the same as j.c.04.ssa, except that dominator opts were
applied during translation into SSA form.  It may or may not be useful to
folks that want to poke around at this issue.


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2001-11-23  3:56 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-10-09 19:46 Higher level RTL issues law
2001-10-09 20:54 ` Diego Novillo
2001-10-09 21:27   ` Daniel Berlin
2001-10-09 21:49     ` Diego Novillo
2001-10-09 22:23       ` Daniel Berlin
2001-10-19 14:39     ` law
2001-10-19 12:19   ` law
2001-10-09 21:18 ` Daniel Berlin
2001-10-09 21:33   ` Diego Novillo
2001-10-19 14:37   ` law
2001-10-19 15:53     ` Daniel Berlin
2001-10-22  9:31       ` law
2001-10-22 11:49         ` Daniel Berlin
2001-10-13  2:27 ` Jan Hubicka
2001-10-19 12:03   ` law
2001-10-21 10:40     ` Jan Hubicka
2001-10-22  8:28       ` law
2001-10-22  8:36         ` Daniel Berlin
2001-10-22  8:56           ` law
2001-10-22  9:07           ` Jan Hubicka
2001-10-22 11:32             ` law
2001-10-22 15:07               ` Richard Henderson
2001-10-23 15:16               ` Joern Rennecke
2001-10-22 16:22     ` Joern Rennecke
2001-10-23  3:02       ` Jan Hubicka
2001-10-23 15:28         ` law
2001-10-24  7:59           ` Jan Hubicka
2001-10-22 11:30 law
2001-10-22 13:10 ` Daniel Berlin
2001-10-22 14:02   ` law
2001-10-22 14:25     ` Daniel Berlin
2001-10-22 14:28       ` Daniel Berlin
2001-10-22 14:48         ` law
2001-11-13 13:41 ` Diego Novillo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).