* Dropping of old loop optimizer
  From: Zdenek Dvorak @ 2003-02-27  1:04 UTC
  To: gcc; +Cc: m.hayes, rth, law, dan, jh

Hello,

I would like to know your opinion on this subject. Why I believe the old
loop optimizer should be dropped as soon as possible:

1) It destroys the CFG, and thus profile feedback/estimation must be run
   after it. It would be cool to have it available at least in gcse, and
   perhaps other earlier passes too. This could probably be solved by
   just dropping the part that manipulates the CFG (the unroller) and a
   relatively simple modification of the rest.

2) It requires the loop notes. The need to preserve them limits and
   complicates some pre-loop passes (cfg_cleanups, copy_loop_headers,
   ...) and enables the loop optimizer to work only for a limited class
   of loops. This does not seem to be fixable without a radical rewrite
   of the optimizer.

What the loop optimizer consists of:
-- unroller -- already replaced by the new one.
-- doloop optimization -- can be trivially converted to use the new loop
   infrastructure
-- loop invariant motion -- should be unnecessary with gcse; I am
   currently working on improving gcse so that it indeed is
-- prefetching, strength reduction, induction variable elimination --
   this is a problem. They are way too magic for me to be able to just
   modify them somehow; I would basically have to rewrite them from
   scratch (of course some parts could be used, but it would still be
   quite a lot of work)

Any ideas?

Zdenek

^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Dropping of old loop optimizer
  From: tm_gccmail @ 2003-02-27  1:12 UTC
  To: Zdenek Dvorak; +Cc: gcc, m.hayes, rth, law, dan, jh

On Thu, 27 Feb 2003, Zdenek Dvorak wrote:

> I would like to know your opinion on this subject. Why I believe the
> old loop optimizer should be dropped as soon as possible:
...
> -- prefetching, strength reduction, induction variable elimination --
>    this is a problem. They are way too magic for me to be able to just
>    modify them somehow; I would basically have to rewrite them from
>    scratch (of course some parts could be used, but it would still be
>    quite a lot of work)

The "old" loop optimizer does BIV-to-GIV flattening, which is important
on processors without double-register addressing modes such as the SH,
MIPS16, Thumb, and a few others.

Does the new loop optimizer support this feature?

> Any ideas?
>
> Zdenek

Toshi
* Re: Dropping of old loop optimizer
  From: Zdenek Dvorak @ 2003-02-27  2:27 UTC
  To: tm_gccmail; +Cc: gcc, m.hayes, rth, law, dan, jh

Hello,

> The "old" loop optimizer does BIV-to-GIV flattening, which is important
> on processors without double-register addressing modes such as the SH,
> MIPS16, Thumb, and a few others.
>
> Does the new loop optimizer support this feature?

No; as I said, anything related to ivs is a problem.

Zdenek
* Re: Dropping of old loop optimizer
  From: tm_gccmail @ 2003-02-27  4:07 UTC
  To: Zdenek Dvorak; +Cc: gcc, m.hayes, rth, law, dan, jh

On Thu, 27 Feb 2003, Zdenek Dvorak wrote:

> > The "old" loop optimizer does BIV-to-GIV flattening, which is
> > important on processors without double-register addressing modes such
> > as the SH, MIPS16, Thumb, and a few others.
> >
> > Does the new loop optimizer support this feature?
>
> No; as I said, anything related to ivs is a problem.

Then I personally (and probably most users of the SH) would prefer that
the old loop optimizer be retained until the new loop optimizer supports
BIV flattening.

This one optimization is extremely important; if we don't have it, the
compiler will generate a move/shift/add operation for each IV-related
load/store within the loop, which kills performance.

Toshi
* Re: Dropping of old loop optimizer
  From: tm_gccmail @ 2003-02-27  4:24 UTC
  To: Zdenek Dvorak; +Cc: gcc, m.hayes, rth, law, dan, jh

> Then I personally (and probably most users of the SH) would prefer that
> the old loop optimizer be retained until the new loop optimizer
> supports BIV flattening.
>
> This one optimization is extremely important; if we don't have it, the
> compiler will generate a move/shift/add operation for each IV-related
> load/store within the loop, which kills performance.

I forgot to mention that the old loop optimizer also knows not to hoist
address inheritance out of the loop, which would increase register
pressure dramatically. If the loop optimizer can leave the address
calculations inside the loop, then regmove can simplify them
properly... this is very important.

Toshi
* Re: Dropping of old loop optimizer
  From: Zdenek Dvorak @ 2003-02-27 11:47 UTC
  To: tm_gccmail; +Cc: gcc, m.hayes, rth, law, dan, jh

Hello,

> Then I personally (and probably most users of the SH) would prefer that
> the old loop optimizer be retained until the new loop optimizer
> supports BIV flattening.

You perhaps misunderstood what I wrote. I have never suggested dropping
the loop optimizer now; I am asking for the best way to replace its
functionality, so that it can be dropped.

Zdenek
* Re: Dropping of old loop optimizer
  From: Pop Sébastian @ 2003-02-27 15:48 UTC
  To: Zdenek Dvorak; +Cc: tm_gccmail, gcc, m.hayes, rth, law, dan, jh

Hi,

On Thu, Feb 27, 2003 at 02:04:14AM +0100, Zdenek Dvorak wrote:
> > Does the new loop optimizer support this feature?
>
> No; as I said, anything related to ivs is a problem.

Right now I'm working on a pass that enriches the SSA representation
with information about the IV functions. IVs are represented by chains
of recurrences as described in:

http://citeseer.nj.nec.com/485503.html
http://citeseer.nj.nec.com/vanengelen00symbolic.html

In contrast to these papers I'll not transform the underlying
representation, but enrich SSA with symbolic information that analyzers,
such as the dependence tester, or optimizers will use.

As long as you have an SSA representation over RTL it should be
straightforward to implement it from the tree-SSA version.

Right now I have a working multivariate and periodic CR system. I have
extended the CR system to flip-flop operations, which introduce
periodicity effects.

I am still working on extending the CR system to operations that neither
Zima nor van Engelen defined (for example the multiplication of
exponential CRs by polynomial CRs, or the addition of exponential CRs).
I will perform these operations symbolically, without expecting to have
a code generator at the end as is the case in the previous papers. I
think that analyzers and optimizers should be implemented without mixing
the two.

Interfacing with the existing tree-SSA is the next step on my todo list.
Then I'll look at how it could be adapted to RTL.

Sebastian
* Re: Dropping of old loop optimizer
  From: Daniel Berlin @ 2003-02-27 16:06 UTC
  To: Pop Sébastian; +Cc: Zdenek Dvorak, tm_gccmail, gcc, m.hayes, rth, law, dan, jh

On Thursday, February 27, 2003, at 10:18 AM, Pop Sébastian wrote:

> Right now I'm working on a pass that enriches the SSA representation
> with information about the IV functions. IVs are represented by chains
> of recurrences as described in: [...]

I had started to work on this method, but decided it was too complex,
and moved to monotonic evolution.

Are you sure CRs are going to be fast enough?

--Dan
* Re: Dropping of old loop optimizer
  From: Pop Sébastian @ 2003-02-27 16:32 UTC
  To: Daniel Berlin; +Cc: Zdenek Dvorak, tm_gccmail, gcc, m.hayes, rth, law, dan, jh

On Thu, Feb 27, 2003 at 10:43:28AM -0500, Daniel Berlin wrote:

> I had started to work on this method, but decided it was too complex,
> and moved to monotonic evolution.

In contrast to the CR method, monotonic evolution (ME) gives approximate
data dependences: you'll have the direction but not the distance vector
of a dependence. Fact: not all optimizers can use approximate vectors.
On this point you cannot compare these two passes, but anyway ...

> Are you sure CRs are going to be fast enough?

..., I agree CR is slower than ME. ME was designed to be faster. We
could speed up CRs by generating them on demand; this way we'd limit
this "expensive" ("not so-" in my opinion, since I haven't tested it yet
:-) pass to a single loop nest.

Sebastian
* Re: Dropping of old loop optimizer
  From: Daniel Berlin @ 2003-02-27 19:43 UTC
  To: Pop Sébastian; +Cc: Zdenek Dvorak, tm_gccmail, gcc, m.hayes, rth, law, dan, jh

On Thursday, February 27, 2003, at 11:14 AM, Pop Sébastian wrote:

> In contrast to the CR method, monotonic evolution (ME) gives
> approximate data dependences: you'll have the direction but not the
> distance vector of a dependence. Fact: not all optimizers can use
> approximate vectors.

Um, actually, all the papers I have on ME describe a distance-extended
lattice to solve this problem. I wouldn't have considered ME if it
couldn't do distance information.
* Re: Dropping of old loop optimizer
  From: Pop Sébastian @ 2003-02-28 13:06 UTC
  To: Daniel Berlin; +Cc: gcc

On Thu, Feb 27, 2003 at 02:15:29PM -0500, Daniel Berlin wrote:

> Um, actually, all the papers I have on ME describe a distance-extended
> lattice to solve this problem. I wouldn't have considered ME if it
> couldn't do distance information.

You're right. I had just read the old version of their paper; in the new
one they extend ME to distance intervals. However, I'm not yet convinced
that ME is faster than CR when ME also computes closed forms...

Sebastian
* Re: Dropping of old loop optimizer
  From: Pop Sébastian @ 2003-02-27 17:13 UTC
  To: Daniel Berlin; +Cc: Zdenek Dvorak, tm_gccmail, gcc, m.hayes, rth, law, dan, jh

On Thu, Feb 27, 2003 at 04:14:48PM +0000, Pop Sébastian wrote:

> In contrast to the CR method, monotonic evolution (ME) gives
> approximate data dependences: you'll have the direction but not the
> distance vector of a dependence.

I forgot to mention the conclusion of one of the papers on ME,
"Monotonic Evolution: an Alternative to Induction Variable Substitution
for Dependence Analysis" by Peng Wu, Albert Cohen, Jay Hoeflinger, and
David Padua:

"Closed form expression computation [...] is also critical for removing
dependences due to computation of induction variables themselves."

In other words, if you generate code for IV functions at loop-phi-nodes,
then you can parallelize these loops.

Sebastian
* Re: Dropping of old loop optimizer
  From: Zdenek Dvorak @ 2003-02-27 16:59 UTC
  To: Pop Sébastian; +Cc: tm_gccmail, gcc, m.hayes, rth, law, dan, jh

Hello,

> Right now I'm working on a pass that enriches the SSA representation
> with information about the IV functions. IVs are represented by chains
> of recurrences as described in:
>
> http://citeseer.nj.nec.com/485503.html
> http://citeseer.nj.nec.com/vanengelen00symbolic.html
...
> As long as you have an SSA representation over RTL it should be
> straightforward to implement it from the tree-SSA version.

The problem I see is that rtl ssa is broken (or never worked?); but
perhaps we could do this kind of optimization just on trees (which
unfortunately also aren't ready for serious usage yet, right?)?

> Interfacing with the existing tree-SSA is the next step on my todo
> list. Then I'll look at how it could be adapted to RTL.

How long do you think it will take before the result is usable? (I
wonder whether it makes sense to work on some temporary, not so clean
but faster solution.)

Zdenek
* Re: Dropping of old loop optimizer
  From: Pop Sébastian @ 2003-02-27 18:30 UTC
  To: Zdenek Dvorak; +Cc: tm_gccmail, gcc, m.hayes, rth, law, dan, jh

On Thu, Feb 27, 2003 at 05:39:17PM +0100, Zdenek Dvorak wrote:

> The problem I see is that rtl ssa is broken (or never worked?); but

Last time I had a close look at rtl was about two years ago :-) Dan is
the expert to ask about the status of SSA at the rtl level (he adapted
the CCP code from rtl to tree-ssa).

> perhaps we could do this kind of optimization just on trees (which
> unfortunately also aren't ready for serious usage yet, right?)?

CCP (conditional constant propagation) and DCE (dead code elimination)
seem to work fine on tree-SSA.

> How long do you think it will take before the result is usable?

Since this is an analyzer, the work could go faster than for an
optimization pass, because difficult cases can be delayed to a later
refinement. In these difficult cases the analyzer will just reply "don't
know".

> (I wonder whether it makes sense to work on some temporary, not so
> clean but faster solution.)

Seems possible, with successive refinements.

Sebastian
* tree-ssa status (was: Re: Dropping of old loop optimizer)
  From: Steven Bosscher @ 2003-02-27 19:15 UTC
  To: Pop Sébastian, dnovillo; +Cc: gcc

On Thu 27-02-2003, at 18:13, Pop Sébastian wrote:

> CCP (conditional constant propagation) and DCE (dead code elimination)
> seem to work fine on tree-SSA.

There is a difference between things that "work fine" and things that
"are ready". What, for example, is the compile-time performance on the
branch? SPEC results compared to mainline? If I correctly interpret
Diego's SPEC tester results, it does not quite look like it's there yet.

Diego, you suggested last month that you would like to merge the branch
in the near future (on the gomp list IIRC). What are the show stoppers
you know about, other than that it will not be ready before stage1 ends?
The branch has moved so fast over the last two months that a status
update would be nice...

(I can send you a patch to update the web page if you like. It still
claims you're using FUD chains :-)

Greetz
Steven
* Re: tree-ssa status (was: Re: Dropping of old loop optimizer)
  From: Diego Novillo @ 2003-02-27 20:06 UTC
  To: Steven Bosscher; +Cc: gcc

On Thu, 27 Feb 2003, Steven Bosscher wrote:

> There is a difference between things that "work fine" and things that
> "are ready".

Yup.

> What, for example, is the compile-time performance on the branch? SPEC
> results compared to mainline? If I correctly interpret Diego's SPEC
> tester results, it does not quite look like it's there yet.

Yes, we are not quite there yet.

Since you brought it up, I got curious and gathered a few stats. This is
how the branch looks today in terms of compile time compared with
mainline:

- All times are in seconds.
- All the compilers are built with ENABLE_CHECKING.
- Bootstrap times are for all default languages (C, C++, Java, Fortran
  and ObjC).

------------------------------------------------------------------------
Build times        mainline     tree-ssa     Difference
------------------------------------------------------------------------
SPEC95                  375          418           +11%
SPEC2000                517          586           +13%
Bootstrap              3247         3588           +10%
------------------------------------------------------------------------

In terms of runtime performance, the branch is ~3% slower on SPECint2000
and ~11% slower on SPECfp2000. There are a couple of individual tests
where the branch is marginally faster than mainline. The branch also
fails to compile vpr, eon and perlbmk, so we need to look at those
failures eventually. Boring details at the end of this message.

For SPEC95, the branch is about 7% slower than mainline on SPECint95
(SPECfp95 is Fortran only, so the results are virtually identical). The
branch fails to compile m88ksim and produces better results for
compress.

> Diego, you suggested last month that you would like to merge the
> branch in the near future (on the gomp list IIRC). What are the show
> stoppers you know about, other than that it will not be ready before
> stage1 ends?

My idea is to propose a merge only when we meet the following:

- Bootstrap times similar to or better than mainline.
- Comparable SPEC95 and SPEC2000 results.
- Number of testsuite failures comparable with mainline.
- Tree optimizers enabled for C and C++.

The branch may have to comply with additional requirements from the
community; that's just my own set of criteria, and I may have missed
something. The usual battery of cross and native builds will have to be
done as well.

In terms of things left to do, the more important ones that come to mind
are:

- A real SSA->normal pass. The current unSSA pass simply drops the SSA
  version numbers. If two versions of the same variable have been moved
  across life boundaries, we silently generate wrong code.

- Support for try/catch in the flowgraph. With the addition of C++, we
  now support try/catch in GIMPLE. The flowgraph still doesn't know
  that.

- Investigate what can be modified/removed/simplified in the RTL backend
  due to the different "flavour" of RTL that the tree optimizers emit.
  I suspect that we could get a big compile-time boost if we could get
  rid of unnecessary work in the backend. We could also find that we
  need some additional passes at the tree level that we still haven't
  thought of.

So, it's a lot of work. Will it be ready for 3.5's stage1? I don't know,
particularly if the list of requirements grows bigger. The integration
work will also be interesting. I diff'd mainline and the branch a few
days ago:

307 files changed, 80994 insertions(+), 4342 deletions(-)

> (I can send you a patch to update the web page if you like. It still
> claims you're using FUD chains :-)

Oops. Patches appreciated!

Diego.

---------------------------------------------------------------------------
SPECint2000

Benchmark        Mainline    tree-ssa      % diff
164.gzip           669.33      646.19    -  3.46%
175.vpr            430.87        0.00    -100.00%
176.gcc            678.95      649.51    -  4.34%
181.mcf            418.57      425.79    +  1.73%
186.crafty         689.60      647.80    -  6.06%
197.parser         561.97      539.01    -  4.09%
252.eon            583.56        0.00    -100.00%
253.perlbmk        868.12        0.00    -100.00%
254.gap            736.19      711.74    -  3.32%
255.vortex         832.92      837.57    +  0.56%
256.bzip2          545.44      513.55    -  5.85%
300.twolf          538.90      520.49    -  3.42%
mean               614.52      599.10    -  2.51%

SPECfp2000

Benchmark        Mainline    tree-ssa      % diff
168.wupwise        824.20      808.10    -  1.95%
171.swim           506.86      502.08    -  0.94%
172.mgrid          485.86      515.41    +  6.08%
173.applu          555.19      536.63    -  3.34%
177.mesa           731.30      678.08    -  7.28%
178.galgel           0.00        0.00    INF
179.art            373.37      191.51    - 48.71%
183.equake         819.99      820.92    +  0.11%
187.facerec          0.00        0.00    INF
188.ammp           359.64      351.84    -  2.17%
189.lucas            0.00        0.00    INF
191.fma3d            0.00        0.00    INF
200.sixtrack       314.80      304.86    -  3.16%
301.apsi             0.00      340.96    INF
mean               521.57      461.42    - 11.53%
---------------------------------------------------------------------------

---------------------------------------------------------------------------
SPECint95

Benchmark        Mainline    tree-ssa      % diff
099.go              43.11       42.86    -  0.59%
124.m88ksim         30.64        0.00    -100.00%
126.gcc             33.48       31.72    -  5.26%
129.compress        22.39       23.34    +  4.26%
130.li              32.30       30.40    -  5.88%
132.ijpeg           34.91       30.85    - 11.62%
134.perl            46.71        0.00    -100.00%
147.vortex          32.95       32.14    -  2.45%
mean                33.84       31.39    -  7.23%
---------------------------------------------------------------------------
* Re: tree-ssa status (was: Re: Dropping of old loop optimizer)
  From: Steven Bosscher @ 2003-02-27 20:36 UTC
  To: Diego Novillo; +Cc: gcc

On Thu 27-02-2003, at 20:46, Diego Novillo wrote:

> ------------------------------------------------------------------------
> Build times        mainline     tree-ssa     Difference
> ------------------------------------------------------------------------
> SPEC95                  375          418           +11%
> SPEC2000                517          586           +13%
> Bootstrap              3247         3588           +10%
> ------------------------------------------------------------------------

Mwah, that is not as bad as I thought. And comparing the branch to
mainline just like that is not really fair IMO (which is why I didn't do
it), because eventually there will be optimizations we still do on RTL
that have virtually no effect anymore while they may ruin compile time.
If you take that into account, it really doesn't look that bad. It would
be interesting to know the total time spent in the RTL optimizers for
mainline and the branch...

> In terms of runtime performance, the branch is ~3% slower on
> SPECint2000 and ~11% slower on SPECfp2000. There are a couple of
> individual tests where the branch is marginally faster than mainline.
> The branch also fails to compile vpr, eon and perlbmk.

Ooof ouch, 11%! The worst slowdown is obviously in 179.art. What's so
special about it that makes the branch twice as slow?

> - Investigate what can be modified/removed/simplified in the RTL
>   backend due to the different "flavour" of RTL that the tree
>   optimizers emit. I suspect that we could get a big compile-time
>   boost if we could get rid of unnecessary work in the backend.

Define "unnecessary"... Ideally we'd have all front ends use the tree
infrastructure. That would allow us to blow away big chunks of the RTL
optimizers, and indeed one would expect a potentially significant win in
compile time from that. But it seems that we'll be stuck with at least
one front end that will probably never be able to do function-at-a-time
compilation (g77). It was said here once that Ada will do
function-at-a-time??

> So, it's a lot of work. Will it be ready for 3.5's stage1? I don't
> know. [...] I diff'd mainline and the branch a few days ago:
>
> 307 files changed, 80994 insertions(+), 4342 deletions(-)

How much of those insertions are in new files?

> > (I can send you a patch to update the web page if you like. It still
> > claims you're using FUD chains :-)
>
> Oops. Patches appreciated!

I'll make one.

Greetz
Steven
* Re: tree-ssa status 2003-02-27 20:36 ` Steven Bosscher @ 2003-02-27 21:17 ` Zack Weinberg 2003-02-27 21:23 ` Paul Brook 2003-02-27 22:47 ` Steven Bosscher 2003-02-27 21:23 ` tree-ssa status (was: Re: Dropping of old loop optimizer) Diego Novillo 1 sibling, 2 replies; 33+ messages in thread From: Zack Weinberg @ 2003-02-27 21:17 UTC (permalink / raw) To: Steven Bosscher; +Cc: Diego Novillo, gcc Steven Bosscher <s.bosscher@student.tudelft.nl> writes: > But it seems that we'll be stuck with at least one front end that will > probably never be able to do function-at-a-time compilation (g77). Isn't g95 targeted for the 3.5 time frame as well? If so, we ought to be able to scrap g77 entirely. zw ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: tree-ssa status 2003-02-27 21:17 ` tree-ssa status Zack Weinberg @ 2003-02-27 21:23 ` Paul Brook 2003-02-28 21:15 ` Toon Moene 2003-02-27 22:47 ` Steven Bosscher 1 sibling, 1 reply; 33+ messages in thread From: Paul Brook @ 2003-02-27 21:23 UTC (permalink / raw) To: Zack Weinberg, Steven Bosscher; +Cc: Diego Novillo, gcc On Thursday 27 February 2003 8:57 pm, Zack Weinberg wrote: > Steven Bosscher <s.bosscher@student.tudelft.nl> writes: > > But it seems that we'll be stuck with at least one front end that will > > probably never be able to do function-at-a-time compilation (g77). > > Isn't g95 targeted for the 3.5 time frame as well? If so, we ought to > be able to scrap g77 entirely. While g95 may be nominally targeted for this timeframe, I very much doubt it will be considered an acceptable replacement for g77 until some time afterwards. A working g95 and a g95 which is as good as (or better than) g77 for fortran 77 code are two totally different things. Paul Brook ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: tree-ssa status 2003-02-27 21:23 ` Paul Brook @ 2003-02-28 21:15 ` Toon Moene 0 siblings, 0 replies; 33+ messages in thread From: Toon Moene @ 2003-02-28 21:15 UTC (permalink / raw) To: Paul Brook; +Cc: Zack Weinberg, Steven Bosscher, Diego Novillo, gcc Paul Brook wrote: > On Thursday 27 February 2003 8:57 pm, Zack Weinberg wrote: > >>Steven Bosscher <s.bosscher@student.tudelft.nl> writes: >> >>>But it seems that we'll be stuck with at least one front end that will >>>probably never be able to do function-at-a-time compilation (g77). >> >>Isn't g95 targeted for the 3.5 time frame as well? If so, we ought to >>be able to scrap g77 entirely. > While g95 may be nominally targeted for this timeframe, I very much doubt it > will be considered an acceptable replacement for g77 until some time > afterwards. A working g95 and a g95 which is as good as (or better than) > g77 for fortran 77 code are two totally different things. For the benefit of those not intimately involved in the Fortran business: most closed-source Fortran compilers consisted of two separate executables for about a decade before the Fortran 90/95 compiler was "good enough" at the Fortran 77 stuff to be the only offering. We hope to work faster than that, but don't hold your breath. -- Toon Moene - mailto:toon@moene.indiv.nluug.nl - phoneto: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html GNU Fortran 95: http://gcc-g95.sourceforge.net/ (under construction) ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: tree-ssa status 2003-02-27 21:17 ` tree-ssa status Zack Weinberg 2003-02-27 21:23 ` Paul Brook @ 2003-02-27 22:47 ` Steven Bosscher 1 sibling, 0 replies; 33+ messages in thread From: Steven Bosscher @ 2003-02-27 22:47 UTC (permalink / raw) To: Zack Weinberg; +Cc: Diego Novillo, gcc, Paul Brook On Thu 2003-02-27, at 21:57, Zack Weinberg wrote: > Steven Bosscher <s.bosscher@student.tudelft.nl> writes: > > > But it seems that we'll be stuck with at least one front end that will > > probably never be able to do function-at-a-time compilation (g77). > > Isn't g95 targeted for the 3.5 time frame as well? Yeah, right. I wish! :-\ Right now we can't even generate code for all of F77. And that is supposed to be the *easy* part. Nope, there is still a long way to go. We're getting closer to being able to generate code for the whole language, but the code quality is not all that great. And that is just the compiler side of things. The library is another issue that needs to be addressed, and while bits and pieces are in place, it will still take a huge effort to finish it. Greetz Steven ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: tree-ssa status (was: Re: Dropping of old loop optimizer) 2003-02-27 20:36 ` Steven Bosscher 2003-02-27 21:17 ` tree-ssa status Zack Weinberg @ 2003-02-27 21:23 ` Diego Novillo 2003-02-27 23:09 ` law 1 sibling, 1 reply; 33+ messages in thread From: Diego Novillo @ 2003-02-27 21:23 UTC (permalink / raw) To: Steven Bosscher; +Cc: gcc On Thu, 2003-02-27 at 15:31, Steven Bosscher wrote: > It would be interesting to know the total time spent in the RTL > optimizers for mainline and branch... > Yes, it would. This is something that Jeff Law has started to do recently. Most of the recent compile time improvements come from profiling the implementation and finding obvious hot spots. We will be doing lots of that in the coming weeks/months. > The worst slowdown is obviously in 179.art. What's so special about it > that makes the branch twice as slow? > No idea yet. This is part of what still needs to be done. > > - Investigate what can be modified/removed/simplified in the RTL > > backend due to the different "flavour" of RTL that the tree > > optimizers emit. I suspect that we could get a big compile > > time boost if we could get rid of unnecessary work in the > > backend. We could also find that we need some more additional > > passes at the tree level that we still haven't thought of. > > Define "unnecessary"... > If we can make simplifying assumptions in the RTL optimizers, we could make them run faster. This of course would need to be predicated. As you point out, not all the front ends go through tree-ssa. It is still unclear to me whether this is impossible or merely difficult. > > So, it's a lot of work. Will it be ready for 3.5's stage1? I > > don't know. Particularly if the list of requirements grows > > bigger. The integration work will also be interesting. I diff'd > > mainline and the branch a few days ago: > > > > 307 files changed, 80994 insertions(+), 4342 deletions(-) > > How much of those insertions are in new files? > Good point. About 70000. 
Diego. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: tree-ssa status (was: Re: Dropping of old loop optimizer) 2003-02-27 21:23 ` tree-ssa status (was: Re: Dropping of old loop optimizer) Diego Novillo @ 2003-02-27 23:09 ` law 2003-02-27 23:36 ` tree-ssa status Zack Weinberg 0 siblings, 1 reply; 33+ messages in thread From: law @ 2003-02-27 23:09 UTC (permalink / raw) To: Diego Novillo; +Cc: Steven Bosscher, gcc In message <1046379746.3100.23.camel@shadowfax>, Diego Novillo writes: >On Thu, 2003-02-27 at 15:31, Steven Bosscher wrote: > >> It would be interesting to know the total time spent in the RTL >> optimizers for mainline and branch... >> >Yes, it would. This is something that Jeff Law has started to do >recently. Most of the recent compile time improvements come from >profiling the implementation and finding obvious hot spots. We will be >doing lots of that in the coming weeks/months. More correctly, I've been measuring time in the SSA path independently from the rest of the compiler. This makes it a lot easier to see where the SSA path is wasting time (ie, it isn't hidden by something like cse or gcse going crazy). I haven't tried to measure and compare the time the two branches spend in the RTL optimizers. We'll probably get to that point in the not too terribly distant future. >> The worst slowdown is obviously in 179.art. What's so special about it >> that makes the branch twice as slow? >> >No idea yet. This is part of what still needs to be done. Right. Our focus so far has been on getting the underlying infrastructure in place and more recently making sure that infrastructure runs reasonably quickly. We haven't started looking at the generated code yet. It won't make a lot of sense to do that until we have a real translator out of SSA. >> Define "unnecessary"... >> >If we can make simplifying assumptions in the RTL optimizers, we could >make them run faster. This of course would need to be predicated. As >you point out, not all the front ends go through tree-ssa. 
>It is still unclear to me whether this is impossible or merely difficult. One canonical example I use is null pointer check elimination, which can be completely subsumed by a tree-ssa version. We already know that null pointer check elimination is relatively slow and memory intensive (that's why it blocks the bitvectors rather than doing everything in parallel). The other canonical example is all the path following performed by cse1. I believe there is an excellent chance we'll be able to have cse1 do a block-local CSE once the tree-ssa code is doing some basic value numbering and copy prop on the dominator tree. This code is also known to be a major CPU hog. In both cases the SSA equivalent optimizations can be made bloody fast and memory efficient. >> > So, it's a lot of work. Will it be ready for 3.5's stage1? I >> > don't know. Particularly if the list of requirements grows >> > bigger. The integration work will also be interesting. I diff'd >> > mainline and the branch a few days ago: >> > >> > 307 files changed, 80994 insertions(+), 4342 deletions(-) >> >> How much of those insertions are in new files? >> >Good point. About 70000. Also keep in mind the change to carry file/line information in a common location touched a lot of code in basically mindless ways. Jeff ^ permalink raw reply [flat|nested] 33+ messages in thread
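[Editorial sketch, not part of the original thread.] For readers unfamiliar with the block-local CSE Jeff mentions, here is a minimal sketch of local value numbering in Python; it is hypothetical illustration code, not GCC's implementation, and it assumes single-assignment (SSA-like) input so table entries never go stale. Each computation is keyed by its operator plus the value keys of its operands; recomputing an already-available value becomes a copy from the register that first produced it.

```python
def value_number(block):
    """block: list of (dest, op, arg1, arg2) quads for one basic block.
    Assumes each dest is assigned exactly once (SSA-like input).
    Returns the block with redundant computations turned into copies."""
    table = {}   # (op, value-of-arg1, value-of-arg2) -> register holding it
    regvn = {}   # register -> the value key it currently holds
    out = []
    for dest, op, a1, a2 in block:
        # Look the operands up through regvn so copies/redundancies chain.
        key = (op, regvn.get(a1, a1), regvn.get(a2, a2))
        if key in table:
            # Same op applied to the same values: reuse the earlier result.
            out.append((dest, "copy", table[key], None))
        else:
            table[key] = dest
            out.append((dest, op, a1, a2))
        regvn[dest] = key
    return out
```

On a block like `t1 = a + b; t2 = a + b; t3 = t2 * c`, the second addition is replaced by a copy from `t1`, while the multiply is kept because its value key is new; this is the "bloody fast" single-pass, hash-table flavor of CSE, as opposed to cse1's path following.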
* Re: tree-ssa status 2003-02-27 23:09 ` law @ 2003-02-27 23:36 ` Zack Weinberg 2003-02-27 23:46 ` law 0 siblings, 1 reply; 33+ messages in thread From: Zack Weinberg @ 2003-02-27 23:36 UTC (permalink / raw) To: law; +Cc: Diego Novillo, Steven Bosscher, gcc law@redhat.com writes: > Also keep in mind the change to carry file/line information in a common > location touched a lot of code in basically mindless ways. That might be a candidate for a separate merge to mainline (he says, not having looked at the code at all). zw ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: tree-ssa status 2003-02-27 23:36 ` tree-ssa status Zack Weinberg @ 2003-02-27 23:46 ` law 0 siblings, 0 replies; 33+ messages in thread From: law @ 2003-02-27 23:46 UTC (permalink / raw) To: Zack Weinberg; +Cc: Diego Novillo, Steven Bosscher, gcc In message <87lm015pgo.fsf@egil.codesourcery.com>, Zack Weinberg writes: >law@redhat.com writes: > >> Also keep in mind the change to carry file/line information in a common >> location touched a lot of code in basically mindless ways. > >That might be a candidate for a separate merge to mainline (he says, >not having looked at the code at all). I wouldn't recommend it. The tree-ssa code has very different characteristics in terms of how file/line information is carried around when compared to the mainline sources. As a result, the tradeoffs for which scheme is more memory efficient are very different. I did the algebra for the mainline a while back and concluded that the scheme used by tree-ssa won't make sense for the mainline if/until tree-ssa is the mainline. Jeff ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: tree-ssa status (was: Re: Dropping of old loop optimizer) 2003-02-27 20:06 ` Diego Novillo 2003-02-27 20:36 ` Steven Bosscher @ 2003-02-27 20:45 ` Daniel Berlin 2003-02-27 20:57 ` Diego Novillo 1 sibling, 1 reply; 33+ messages in thread From: Daniel Berlin @ 2003-02-27 20:45 UTC (permalink / raw) To: Diego Novillo; +Cc: Steven Bosscher, gcc On Thursday, February 27, 2003, at 02:46 PM, Diego Novillo wrote: > On Thu, 27 Feb 2003, Steven Bosscher wrote: > >> There is a difference between things that "work fine" and things that >> "are ready". >> > Yup. > >> What, for example, is the compile time performance on the branch? >> SPEC >> results compared to mainline? If I correctly interpret Diego's SPEC >> tester results, it does not quite look like it's there yet. >> > Yes, we are not quite there yet. > > Since you brought it up, I got curious and gathered a few stats. > This is how the branch looks today in terms of compile time > compared with mainline: > > - All times are in seconds. > - All the compilers are built with ENABLE_CHECKING. > - Bootstrap times are for all default languages (C, C++, Java, > Fortran and ObjC). Is this still without copy propagation, or is that in a tree you used here? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: tree-ssa status (was: Re: Dropping of old loop optimizer) 2003-02-27 20:45 ` tree-ssa status (was: Re: Dropping of old loop optimizer) Daniel Berlin @ 2003-02-27 20:57 ` Diego Novillo 0 siblings, 0 replies; 33+ messages in thread From: Diego Novillo @ 2003-02-27 20:57 UTC (permalink / raw) To: Daniel Berlin; +Cc: Steven Bosscher, gcc On Thu, 2003-02-27 at 15:37, Daniel Berlin wrote: > > On Thu, 27 Feb 2003, Steven Bosscher wrote: > > > >> There is a difference between things that "work fine" and things that > >> "are ready". > >> > > Yup. > > > >> What, for example, is the compile time performance on the branch? > >> SPEC > >> results compared to mainline? If I correctly interpret Diego's SPEC > >> tester results, it does not quite look like it's there yet. > >> > > Yes, we are not quite there yet. > > > > Since you brought it up, I got curious and gathered a few stats. > > This is how the branch looks today in terms of compile time > > compared with mainline: > > > > - All times are in seconds. > > - All the compilers are built with ENABLE_CHECKING. > > - Bootstrap times are for all default languages (C, C++, Java, > > Fortran and ObjC). > > Is this still without copy propagation, or is that in a tree you used > here? > We still don't have an implementation of copy propagation in the branch. Copyprop is one of the transformations that needs the real unSSA conversion. All the figures I posted are taken with pristine copies of tree-ssa and mainline as of 2003-02-26. Diego. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Dropping of old loop optimizer 2003-02-27 18:30 ` Pop Sébastian 2003-02-27 19:15 ` tree-ssa status (was: Re: Dropping of old loop optimizer) Steven Bosscher @ 2003-02-27 19:59 ` Daniel Berlin 2003-02-27 20:49 ` Michael Matz 2003-02-28 10:46 ` Jan Hubicka 1 sibling, 2 replies; 33+ messages in thread From: Daniel Berlin @ 2003-02-27 19:59 UTC (permalink / raw) To: Pop Sébastian Cc: Zdenek Dvorak, tm_gccmail, gcc, m.hayes, rth, law, dan, jh On Thursday, February 27, 2003, at 12:13 PM, Pop Sébastian wrote: > On Thu, Feb 27, 2003 at 05:39:17PM +0100, Zdenek Dvorak wrote: >> >> The problem I see is that rtl ssa is broken (or never worked?); but > Last time I had a close look at rtl was about two years ago :-) > Dan is the expert to ask about the status of SSA at rtl level (he > adapted > the CCP code from rtl to tree-ssa). The problem with RTL SSA is hard registers and subregs. There are hard registers assigned in certain cases during initial RTL generation. These can't be renamed, obviously. You run into difficulties with subregs as well (in relation to liveness and renaming). Thus, one has the following options: 1. Ignore them while creating SSA, and make sure optimizers don't touch them. (not clean) 2. Perform magic to save and restore the pieces of code using them. (not clean, not easy) 3. Don't generate hard registers until after SSA is done with (not all that easy, but could be done), and don't ever generate subregs (not really possible without severely rearchitecting RTL). The reason we need subregs is because of how we do registers. Most compilers do the reverse of what we do in relation to registers. We have a single operator, like SET, and multiple register modes/sizes that can be operands. So to get a full register copy, one simply does something like (set (reg:SI 50) (reg:SI 52)) And to get a piece of a register, one needs a subregister operator. 
like (set (reg:SI 50) (subreg:SI (reg:DI 51) 0)) Most compilers have multiple operators (one for each size), and no register sizes. So to get a full register copy, they do: (set:DIDI (reg 50) (reg 51)) and to get a piece of a register, they do (set:SIDI (reg 50) (reg 51)) Now consider trying to set a piece of a register we would do: (set (subreg:SI (reg:DI 51)) (reg:SI 50)) which is very hard to think about how to rename into SSA (because it's a partial assignment, and the subreg itself has no name) while they would do: (set:DISI (reg 51) (reg 50)) which is still a full def, and easy to rename. Get it? > >> perhaps we could do this kind of optimizations just on trees (that >> unfortunately also aren't ready for serious usage yet, right?)? >> > CCP (Conditional constant propagation) and > DCE (Dead code elimination) > seem to work fine on tree-SSA. > and PRE *will* work fine just as soon as Andrew finishes and posts the insertion code. --Dan ^ permalink raw reply [flat|nested] 33+ messages in thread
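[Editorial sketch, not part of the original thread.] To make the renaming problem Daniel describes concrete, here is a minimal sketch of SSA renaming over straight-line code; this is hypothetical Python, not GCC's renamer. A full definition simply gets a fresh version number, but a partial (subreg-style) definition must additionally be modeled as a use of the destination's previous version, because the bits it does not write flow through unchanged.

```python
def rename(insns):
    """insns: list of (dest, is_partial, srcs) triples for one basic block.
    Returns instructions with each register renamed to a (reg, version) pair."""
    version = {}  # register -> current SSA version (0 = incoming/undefined)
    out = []
    for dest, is_partial, srcs in insns:
        uses = [(r, version.get(r, 0)) for r in srcs]
        if is_partial:
            # A subreg store overwrites only part of dest, so the previous
            # value is live into this definition: record it as a hidden use.
            uses.append((dest, version.get(dest, 0)))
        version[dest] = version.get(dest, 0) + 1
        out.append(((dest, version[dest]), uses))
    return out
```

A full copy of reg 51 into reg 50 followed by a partial store into reg 50 renames to a def of version 2 of r50 that also *uses* version 1 of r50: the partial def is really a read-modify-write, which is exactly the kind of assignment that has no clean name in plain SSA form.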
* Re: Dropping of old loop optimizer 2003-02-27 19:59 ` Dropping of old loop optimizer Daniel Berlin @ 2003-02-27 20:49 ` Michael Matz 2003-02-28 10:46 ` Jan Hubicka 1 sibling, 0 replies; 33+ messages in thread From: Michael Matz @ 2003-02-27 20:49 UTC (permalink / raw) To: Daniel Berlin Cc: Pop Sébastian, Zdenek Dvorak, tm_gccmail, gcc, m.hayes, rth, law, dan, jh Hi, On Thu, 27 Feb 2003, Daniel Berlin wrote: > Now consider trying to set a piece of a register > > we would do: > > (set (subreg:SI (reg:DI 51)) (reg:SI 50)) > > while they would do: > (set:DISI (reg 51) (reg 50)) > > which is still a full def, and easy to rename. But those compilers would need to do magic for anything other than the lowest part, i.e. the difference between (subreg:SI (reg:DI) 0) and (subreg:SI (reg:DI) 4). And especially when setting a full DImode reg really piecewise (i.e. when the sets indeed are just partial and do not clobber the whole underlying reg). They basically need to set two SImode regs, and then do a parallel copy of those two regs into one DImode reg (one in the low, one in the high part), and then rely on something like coalescing and the reg allocator to allocate one of the SImode regs to the low and the other to the high part of a register (i.e. a limited form of bitwidth-aware allocation). It's not clear to me if one approach is really better than the other; they both have their issues. Ciao, Michael. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Dropping of old loop optimizer 2003-02-27 19:59 ` Dropping of old loop optimizer Daniel Berlin 2003-02-27 20:49 ` Michael Matz @ 2003-02-28 10:46 ` Jan Hubicka 2003-02-28 13:15 ` Pop Sébastian 2003-02-28 15:29 ` law 1 sibling, 2 replies; 33+ messages in thread From: Jan Hubicka @ 2003-02-28 10:46 UTC (permalink / raw) To: Daniel Berlin Cc: Pop Sébastian, Zdenek Dvorak, tm_gccmail, gcc, m.hayes, rth, law, dan, jh > > On Thursday, February 27, 2003, at 12:13 PM, Pop Sébastian wrote: > > >On Thu, Feb 27, 2003 at 05:39:17PM +0100, Zdenek Dvorak wrote: > >> > >>The problem I see is that rtl ssa is broken (or never worked?); but > >Last time I had a close look at rtl was about two years ago :-) > >Dan is the expert to ask about the status of SSA at rtl level (he > >adapted > >the CCP code from rtl to tree-ssa). > The problem with RTL SSA is hard registers and subregs. > There are hard registers assigned in certain cases during initial RTL > generation. > These can't be renamed, obviously. The tree-SSA branch represents the SSA form as a data structure on top of the trees, without modifying the trees themselves, right? Why is that not feasible for RTL (i.e. you wouldn't need to rename the registers)? Honza ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Dropping of old loop optimizer 2003-02-28 10:46 ` Jan Hubicka @ 2003-02-28 13:15 ` Pop Sébastian 2003-02-28 15:29 ` law 1 sibling, 0 replies; 33+ messages in thread From: Pop Sébastian @ 2003-02-28 13:15 UTC (permalink / raw) To: Jan Hubicka; +Cc: gcc On Fri, Feb 28, 2003 at 10:54:45AM +0100, Jan Hubicka wrote: > > The tree-SSA represents the SSA form as datastructure on the top of the > trees not modifying the trees themselves, right? Trees are modified and contain SSA nodes: http://gcc.gnu.org/ml/gcc-patches/2003-01/msg02182.html ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Dropping of old loop optimizer 2003-02-28 10:46 ` Jan Hubicka 2003-02-28 13:15 ` Pop Sébastian @ 2003-02-28 15:29 ` law 2003-02-28 21:56 ` Daniel Berlin 1 sibling, 1 reply; 33+ messages in thread From: law @ 2003-02-28 15:29 UTC (permalink / raw) To: Jan Hubicka Cc: Daniel Berlin, Pop Sébastian, Zdenek Dvorak, tm_gccmail, gcc, m.hayes, rth, dan In message <20030228095445.GD5431@kam.mff.cuni.cz>, Jan Hubicka writes: >> >> On Thursday, February 27, 2003, at 12:13 PM, Pop Sébastian wrote: >> >> >On Thu, Feb 27, 2003 at 05:39:17PM +0100, Zdenek Dvorak wrote: >> >> >> >>The problem I see is that rtl ssa is broken (or never worked?); but >> >Last time I had a close look at rtl was about two years ago :-) >> >Dan is the expert to ask about the status of SSA at rtl level (he >> >adapted >> >the CCP code from rtl to tree-ssa). >> The problem with RTL SSA is hard registers and subregs. >> There are hard registers assigned in certain cases during initial RTL >> generation. >> These can't be renamed, obviously. > >The tree-SSA represents the SSA form as datastructure on the top of the >trees not modifying the trees themselves, right? Why it is not feasible >for RTL (ie you won't need to rename the registers) hard-regs and subregs aren't really the problem with rtl-ssa; they're both solvable. The first by ignoring them, the second with some annoying, but not terribly difficult work in the ssa->normal translator. What is far more of a problem for rtl-ssa is the lack of any kind of generic-level rtl. The tree-SSA branch is using a rewriting SSA form now. Jeff ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Dropping of old loop optimizer 2003-02-28 15:29 ` law @ 2003-02-28 21:56 ` Daniel Berlin 0 siblings, 0 replies; 33+ messages in thread From: Daniel Berlin @ 2003-02-28 21:56 UTC (permalink / raw) To: law Cc: Jan Hubicka, Pop Sébastian, Zdenek Dvorak, tm_gccmail, gcc, m.hayes, rth, dan On Friday, February 28, 2003, at 09:29 AM, law@redhat.com wrote: > In message <20030228095445.GD5431@kam.mff.cuni.cz>, Jan Hubicka writes: >>> >>> On Thursday, February 27, 2003, at 12:13 PM, Pop Sébastian wrote: >>> >>>> On Thu, Feb 27, 2003 at 05:39:17PM +0100, Zdenek Dvorak wrote: >>>>> >>>>> The problem I see is that rtl ssa is broken (or never worked?); but >>>> Last time I had a close look at rtl was about two years ago :-) >>>> Dan is the expert to ask about the status of SSA at rtl level (he >>>> adapted >>>> the CCP code from rtl to tree-ssa). >>> The problem with RTL SSA is hard registers and subregs. >>> There are hard registers assigned in certain cases during initial RTL >>> generation. >>> These can't be renamed, obviously. >> >> The tree-SSA represents the SSA form as datastructure on the top of >> the >> trees not modifying the trees themselves, right? Why it is not >> feasible >> for RTL (ie you won't need to rename the registers) > hard-regs and subregs really aren't really the problem with rtl-ssa, > they're both solvable. However, last I looked, your solution was going to be to just not do an SSA path when subregs existed. And if they were all that easy to solve, we would be doing register allocation on an SSA form, like every other compiler in existence. > The first by ignoring them, the second with > some annoying, but not terribly difficult work in the ssa->normal > translator. Both of which suck, to be honest, and *are* difficult enough that nobody has *done* it yet. > > What is far more of a problem for rtl-ssa is the lack of any kind of > generic level rtl. > With what properties, exactly? > The tree-SSA branch using a rewriting SSA form now. 
> > Jeff > ^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2003-02-28 20:58 UTC | newest] Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-02-27 1:04 Dropping of old loop optimizer Zdenek Dvorak 2003-02-27 1:12 ` tm_gccmail 2003-02-27 2:27 ` Zdenek Dvorak 2003-02-27 4:07 ` tm_gccmail 2003-02-27 4:24 ` tm_gccmail 2003-02-27 11:47 ` Zdenek Dvorak 2003-02-27 15:48 ` Pop Sébastian 2003-02-27 16:06 ` Daniel Berlin 2003-02-27 16:32 ` Pop Sébastian 2003-02-27 19:43 ` Daniel Berlin 2003-02-28 13:06 ` Pop Sébastian 2003-02-27 17:13 ` Pop Sébastian 2003-02-27 16:59 ` Zdenek Dvorak 2003-02-27 18:30 ` Pop Sébastian 2003-02-27 19:15 ` tree-ssa status (was: Re: Dropping of old loop optimizer) Steven Bosscher 2003-02-27 20:06 ` Diego Novillo 2003-02-27 20:36 ` Steven Bosscher 2003-02-27 21:17 ` tree-ssa status Zack Weinberg 2003-02-27 21:23 ` Paul Brook 2003-02-28 21:15 ` Toon Moene 2003-02-27 22:47 ` Steven Bosscher 2003-02-27 21:23 ` tree-ssa status (was: Re: Dropping of old loop optimizer) Diego Novillo 2003-02-27 23:09 ` law 2003-02-27 23:36 ` tree-ssa status Zack Weinberg 2003-02-27 23:46 ` law 2003-02-27 20:45 ` tree-ssa status (was: Re: Dropping of old loop optimizer) Daniel Berlin 2003-02-27 20:57 ` Diego Novillo 2003-02-27 19:59 ` Dropping of old loop optimizer Daniel Berlin 2003-02-27 20:49 ` Michael Matz 2003-02-28 10:46 ` Jan Hubicka 2003-02-28 13:15 ` Pop Sébastian 2003-02-28 15:29 ` law 2003-02-28 21:56 ` Daniel Berlin