Re: Need advice on interface to non-gcc scheduler

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Re: Need advice on interface to non-gcc scheduler
@ 2000-03-31  5:59 Steven Roos
  2000-03-31 13:50 ` Damjan Lampret
  0 siblings, 1 reply; 6+ messages in thread
From: Steven Roos @ 2000-03-31  5:59 UTC (permalink / raw)
  To: gcc

"Damjan Lampret" <lampret@opencores.org> wrote:
> Hi Steven,
> 
> is your architecture similar to something like
> http://www.opencores.org/cores/or2k/ ?

The philosophy behind the architecture is more or less the same, but
on the implementation level there are some differences. The three most
important differences are:
1. We use a one-dimensional array instead of a two-dimensional matrix
   because we think that would map better onto hardware.
2. We allow only Ri (the only non-locally visible register on a unit)
   as the destination of an operation.
3. Our FUs are not all equivalent because we think the hardware
   required for some opcodes like multiply is too expensive to be
   included in each unit.

However, even with these differences the similarities are remarkable.
Unfortunately I do not have a recent architecture description. The
paper http://cardit.et.tudelft.nl/MOVE/papers/Roos99a.ps.gz gives
a good introduction, but is in many ways outdated. For example, it
does not yet include the concept of a local register file in each
unit.

> I've tried to model OpenRISC 2000 in
> GCC but GCC doesn't understand very good concept of distributed register
> file. I've tried this with register classes defining a new register class
> for each functional unit and its registers. Of course since GCC was designed
> for traditional CPUs where you have one central register file it does not
> generate efficient code for CPU similar to OR2K. I would be interested if
> anyone has better idea how to model OR2K in GCC so that GCC would generate
> efficient code?
> 
> regards, Damjan

I haven't found any compiler that can effectively handle the
non-homogeneous and distributed register files as found in OR2K and my 
architecture. Most compilers (including gcc) try to assign a single
register to the complete liverange of a variable and fail to
distribute instructions and data evenly over the processor.

Therefore I have put down a completely new scheduling algorithm that
is capable of handling such register file topology. I think it can
also schedule for OR2K with very limited modifications. It requires a
completely new scheduler module, with its own internal representation.
My idea is to use gcc with a RISC-like target as a front-end and
switch my own module for the scheduling and register allocation.

Regards,
Steven

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Need advice on interface to non-gcc scheduler
  2000-03-31  5:59 Need advice on interface to non-gcc scheduler Steven Roos
@ 2000-03-31 13:50 ` Damjan Lampret
  2000-04-01  2:17   ` Marko Mlinar
  0 siblings, 1 reply; 6+ messages in thread
From: Damjan Lampret @ 2000-03-31 13:50 UTC (permalink / raw)
  To: gcc; +Cc: Marko Mlinar

Steven,

I am also surprised with similarities between both architectures. Our
approach would be similar to yours except in our case we were thinking to
use assembly output for some generic RISC as input to "optimizer" for OR2K.
BTW FUs in OR2K doesn't need to be equal since some instructions are more
common than others.

regards, Damjan

----- Original Message -----
From: Steven Roos <S.Roos@its.tudelft.nl>
To: <gcc@gcc.gnu.org>
Sent: Friday, March 31, 2000 3:59 PM
Subject: Re: Need advice on interface to non-gcc scheduler


> "Damjan Lampret" <lampret@opencores.org> wrote:
> > Hi Steven,
> >
> > is your architecture similar to something like
> > http://www.opencores.org/cores/or2k/ ?
>
> The philosophy behind the architecture is more or less the same, but
> on the implementation level there are some differences. The three most
> important differences are:
> 1. We use a one-dimensional array instead of a two-dimensional matrix
>    because we think that would map better onto hardware.
> 2. We allow only Ri (the only non-locally visible register on a unit)
>    as the destination of an operation.
> 3. Our FUs are not all equivalent because we think the hardware
>    required for some opcodes like multiply is too expensive to be
>    included in each unit.
>
> However, even with these differences the similarities are remarkable.
> Unfortunately I do not have a recent architecture description. The
> paper http://cardit.et.tudelft.nl/MOVE/papers/Roos99a.ps.gz gives
> a good introduction, but is in many ways outdated. For example, it
> does not yet include the concept of a local register file in each
> unit.
>
> > I've tried to model OpenRISC 2000 in
> > GCC but GCC doesn't understand very good concept of distributed register
> > file. I've tried this with register classes defining a new register
class
> > for each functional unit and its registers. Of course since GCC was
designed
> > for traditional CPUs where you have one central register file it does
not
> > generate efficient code for CPU similar to OR2K. I would be interested
if
> > anyone has better idea how to model OR2K in GCC so that GCC would
generate
> > efficient code?
> >
> > regards, Damjan
>
> I haven't found any compiler that can effectively handle the
> non-homogeneous and distributed register files as found in OR2K and my
> architecture. Most compilers (including gcc) try to assign a single
> register to the complete liverange of a variable and fail to
> distribute instructions and data evenly over the processor.
>
> Therefore I have put down a completely new scheduling algorithm that
> is capable of handling such register file topology. I think it can
> also schedule for OR2K with very limited modifications. It requires a
> completely new scheduler module, with its own internal representation.
> My idea is to use gcc with a RISC-like target as a front-end and
> switch my own module for the scheduling and register allocation.
>
> Regards,
> Steven
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Need advice on interface to non-gcc scheduler
  2000-03-31 13:50 ` Damjan Lampret
@ 2000-04-01  2:17   ` Marko Mlinar
  2000-04-03  5:59     ` Steven Roos
  0 siblings, 1 reply; 6+ messages in thread
From: Marko Mlinar @ 2000-04-01  2:17 UTC (permalink / raw)
  To: S.Roos; +Cc: Damjan Lampret, gcc

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 3528 bytes --]

Steven,

I am Marko Mlinar, co-developer of OR2k, mentioned before.
Let me say that I am surprised too, that our designs have so much
similarities. Besides I don't think those three points
you mentioned aren't really differences except second one.
You even used 4 as a number of neigbours like we did.
And optimization algorithm is really similar to one we used.
(even heuristics proposed in article; well - later we used more
complex ones, that produced better results).
Have you made any simulations? We have already made several.

I wonder - do you intend to use such arhitecture for general purpose
computing or just for applications that needs high computation power?
How do you intend to apply jumps/calls to your arhitecture?
Major differences I see are:
1. your design uses (more) centralised register file, which may cause
    problems, especially with jumps and calls; maybe localised register
    file (as you mentioned it) and as we use it may put us to same ground
2. your use more 'localised' connections this may cause problems with
    larger number of units, but on the other hand dataflow graph may be
    much more easily extracted.

I suggest that you use more specialised algorithm based on dataflow tree
extraction (place small parts of df tree into 2D (unit,time) space), which
could have much lower complexity and could yield better results.
(I am just speculating based on my experiences - I have nothing concrete
in mind yet).

I belive (based on my simulations), that algorithms doesn't work so
well for large number of units (like >10).


best regards,
    Marko

----- Izvorno sporoÃ¨ilo -----
Od: Damjan Lampret <lampret@opencores.org>
Za: <gcc@gcc.gnu.org>
Kp: Marko Mlinar <Marko.Mlinar@campus.fri.uni-lj.si>
Poslano: 31. marec 2000 23:54
Zadeva: Re: Need advice on interface to non-gcc scheduler


> Steven,
>
> I am also surprised with similarities between both architectures. Our
> approach would be similar to yours except in our case we were thinking to
> use assembly output for some generic RISC as input to "optimizer" for
OR2K.
> BTW FUs in OR2K doesn't need to be equal since some instructions are more
> common than others.
>
> regards, Damjan
>
> ----- Original Message -----
> From: Steven Roos <        >
> To: <gcc@gcc.gnu.org>
> Sent: Friday, March 31, 2000 3:59 PM
> Subject: Re: Need advice on interface to non-gcc scheduler
>
>
> > "Damjan Lampret" <lampret@opencores.org> wrote:
> > > Hi Steven,
> > >
> > > is your architecture similar to something like
> > > http://www.opencores.org/cores/or2k/ ?
> >
> > The philosophy behind the architecture is more or less the same, but
> > on the implementation level there are some differences. The three most
> > important differences are:
> > 1. We use a one-dimensional array instead of a two-dimensional matrix
> >    because we think that would map better onto hardware.
> > 2. We allow only Ri (the only non-locally visible register on a unit)
> >    as the destination of an operation.
> > 3. Our FUs are not all equivalent because we think the hardware
> >    required for some opcodes like multiply is too expensive to be
> >    included in each unit.
> >
> > However, even with these differences the similarities are remarkable.
> > Unfortunately I do not have a recent architecture description. The
> > paper http://cardit.et.tudelft.nl/MOVE/papers/Roos99a.ps.gz gives
> > a good introduction, but is in many ways outdated. For example, it
> > does not yet include the concept of a local register file in each
> > unit.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Need advice on interface to non-gcc scheduler
  2000-04-01  2:17   ` Marko Mlinar
@ 2000-04-03  5:59     ` Steven Roos
  0 siblings, 0 replies; 6+ messages in thread
From: Steven Roos @ 2000-04-03  5:59 UTC (permalink / raw)
  To: Marko Mlinar; +Cc: Damjan Lampret, gcc

Marko Mlinar wrote:
> Steven,
> 
> I am Marko Mlinar, co-developer of OR2k, mentioned before.
> Let me say that I am surprised too, that our designs have so much
> similarities. Besides I don't think those three points
> you mentioned aren't really differences except second one.
> You even used 4 as a number of neigbours like we did.

Hi Marko,

The number of 4 (2 left, 2 right) was used as an example. Real
implementations may use any convenient number. We think that 4 may
be rather small, and that 6 or 8 may give a better balance between
hardware costs and communication latency (at least for a 1D
topology).

> And optimization algorithm is really similar to one we used.
> (even heuristics proposed in article; well - later we used more
> complex ones, that produced better results).
> Have you made any simulations? We have already made several.

We changed the scheduling algorithm recently. The algorithm described
in the paper has some serious problems when choosing a local register
file for long-living variables. It has to choose a file long before it 
has any idea of where the variable is likely going to be used.

Our new algorithm tries to postpone that decision until it has to
schedule a use,so it can better decide where to move the variable to.
It is also better at deciding when to split a variable, for example
when it has to be used in remote parts of the processor.

> I wonder - do you intend to use such arhitecture for general purpose
> computing or just for applications that needs high computation power?
> How do you intend to apply jumps/calls to your arhitecture?

We think that the architecture is mostly suited for embedded
applications with lots of parallelism and a high need of computation 
power. If we want to use it as a general purpose processor we have to
include support for floating point, exceptions, memory management,
etc. While it may be possible, these features would highly increase
the complexity of the architecture and would conflict with the idea of 
fast, simple and cheap hardware.

The implementation of jumps and calls is really nothing special. One
of the units gets some extra opcodes for (conditional) jumps and
calls. They will have some delay slots; two is probably enough.

> Major differences I see are:
> 1. your design uses (more) centralised register file, which may cause
>     problems, especially with jumps and calls; maybe localised register
>     file (as you mentioned it) and as we use it may put us to same ground

We left the idea of register file units shortly after we submitted the 
paper. The only two remaining differences in the register architecture 
are 
1: We allow only Ri as the destination.
2: Both our operands may come from a neighbor unit (the predicate
   operand shown in the paper has been removed).

> 2. your use more 'localised' connections this may cause problems with
>     larger number of units, but on the other hand dataflow graph may be
>     much more easily extracted.
> 
> I suggest that you use more specialised algorithm based on dataflow tree
> extraction (place small parts of df tree into 2D (unit,time) space), which
> could have much lower complexity and could yield better results.
> (I am just speculating based on my experiences - I have nothing concrete
> in mind yet).

I would really like to have an algorithm that could map a dataflow
graph into 2D as it solves a large part of the problem of deciding
on which unit an operation must be scheduled. When I have that info,
the rest of the schedule (transports, register allocation) is fairly
straightforward. The problem is: what is a good mapping, and how do I
find it.

> I belive (based on my simulations), that algorithms doesn't work so
> well for large number of units (like >10).
> 
> 
> best regards,
>     Marko

Greetings,
Steven

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Need advice on interface to non-gcc scheduler
  2000-03-31  2:58 Steven Roos
@ 2000-03-31  3:39 ` Damjan Lampret
  0 siblings, 0 replies; 6+ messages in thread
From: Damjan Lampret @ 2000-03-31  3:39 UTC (permalink / raw)
  To: gcc; +Cc: Marko Mlinar

Hi Steven,

is your architecture similar to something like
http://www.opencores.org/cores/or2k/ ? I've tried to model OpenRISC 2000 in
GCC but GCC doesn't understand very good concept of distributed register
file. I've tried this with register classes defining a new register class
for each functional unit and its registers. Of course since GCC was designed
for traditional CPUs where you have one central register file it does not
generate efficient code for CPU similar to OR2K. I would be interested if
anyone has better idea how to model OR2K in GCC so that GCC would generate
efficient code?

regards, Damjan

----- Original Message -----
From: Steven Roos <S.Roos@its.tudelft.nl>
To: <gcc@gcc.gnu.org>
Sent: Friday, March 31, 2000 12:58 PM
Subject: Need advice on interface to non-gcc scheduler


> Hi,
>
> Currently I am working on an instruction scheduler for a new computer
> architecture. This new architecture is not like any current
> architecture and I guess there is no way to make gcc generate code
> for it.
>
> The architecture looks like an extremely wide VLIW with limited
> connectivity. To avoid long wires wherever possible, all resources
> are distributed over all units. The concept of a centralized register
> file does not exist. When scheduling an instruction all related data
> transports have to be scheduled explicitly too.
>
> My idea is to use as much of gcc as possible and send the gcc RTL code
> to a dedicated scheduler just before gcc would do its own scheduling.
>
> I would like some advice on how to implement this interface. How hard is
> it to generate such an interface? Can I get all sorts of analysis info
> (mem alias, data flow, control flow) through this interface? How about
> profiling info and debug info?
>
> Greetings,
> Steven Roos
> Delft University of Technology
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Need advice on interface to non-gcc scheduler
@ 2000-03-31  2:58 Steven Roos
  2000-03-31  3:39 ` Damjan Lampret
  0 siblings, 1 reply; 6+ messages in thread
From: Steven Roos @ 2000-03-31  2:58 UTC (permalink / raw)
  To: gcc

Hi,

Currently I am working on an instruction scheduler for a new computer
architecture. This new architecture is not like any current
architecture and I guess there is no way to make gcc generate code
for it. 

The architecture looks like an extremely wide VLIW with limited
connectivity. To avoid long wires wherever possible, all resources
are distributed over all units. The concept of a centralized register
file does not exist. When scheduling an instruction all related data
transports have to be scheduled explicitly too.

My idea is to use as much of gcc as possible and send the gcc RTL code
to a dedicated scheduler just before gcc would do its own scheduling.

I would like some advice on how to implement this interface. How hard is 
it to generate such an interface? Can I get all sorts of analysis info 
(mem alias, data flow, control flow) through this interface? How about 
profiling info and debug info?

Greetings,
Steven Roos
Delft University of Technology

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2000-04-03  5:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-03-31  5:59 Need advice on interface to non-gcc scheduler Steven Roos
2000-03-31 13:50 ` Damjan Lampret
2000-04-01  2:17   ` Marko Mlinar
2000-04-03  5:59     ` Steven Roos
  -- strict thread matches above, loose matches on Subject: below --
2000-03-31  2:58 Steven Roos
2000-03-31  3:39 ` Damjan Lampret

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).