public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* problems with the ipa pass manager.
@ 2004-11-19 21:02 Kenneth Zadeck
  2004-11-20 14:22 ` Jan Hubicka
  0 siblings, 1 reply; 5+ messages in thread
From: Kenneth Zadeck @ 2004-11-19 21:02 UTC (permalink / raw)
  To: Jan Hubicka, Steven Bosscher, Dale Johannesen, Stuart Hastings,
	GCC Mailing List, Berlin, Daniel, Novillo, Diego

The current design (or at least implementation) of the ipa pass manager 
seems to have a serious design flaw.  The passes are called vertically 
rather than horizontally.  What I mean by this is that

for each function F in the compilation unit
    build the cfg for F
   for each pass P in the ipa pass manager
       call P (F);

This design has several serious flaws:
   
1) The cfg is only built for the functions that have already been 
processed.  They have not been built for the functions to be processed.

2) Many axillary functions have been booby-trapped to abort if the 
function that they are called on does not have a cfg.
consider
   cgraph_function_body_availability calls
   tree_inlinable_function_p calls
   inlinable_function_p calls
   inline_forbidden_p which has been booby-trapped. 

so with the current design, it is impossible ask important questions 
about the functions that have not yet been processed.

3) With this current design it is impossible to implement a series of 
ipa-based improvers since the output of one ipa is not seen by the 
analysis routines of the next pass. 

Consider the case where it would be desirable to find and mark the set 
of module level static which are readonly and do not have their address 
taken and then using this analysis to do a better job of identifying 
const and pure functions.  Unless you can do this in a single pass, you 
are screwed by this ipa pass manager architecture.

4) This ipa pass manager design will make it impossible to do true cfg 
based inlining whose results can be seen by ipa passes since the actual 
inlining, by design, must be done after all the ipa passes have finished 
(this is the only time when the entire cfg has been built).

The proper design for this needs to be (where building the cfg is the 
first ipa pass):
for each pass manager pass P:
  {
     for each function F
        call P.analyze_function(F)

     for each global variable V
        call P analyze_variable (V)
    
     call P.execute()

     for each function F
        call P.transform_function(F)

    for each global variable V
        call P transform_variable (V)
  }


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: problems with the ipa pass manager.
  2004-11-19 21:02 problems with the ipa pass manager Kenneth Zadeck
@ 2004-11-20 14:22 ` Jan Hubicka
  2004-11-20 19:07   ` Kenneth Zadeck
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Hubicka @ 2004-11-20 14:22 UTC (permalink / raw)
  To: Kenneth Zadeck
  Cc: Jan Hubicka, Steven Bosscher, Dale Johannesen, Stuart Hastings,
	GCC Mailing List, Berlin, Daniel, Novillo, Diego

> The current design (or at least implementation) of the ipa pass manager 
> seems to have a serious design flaw.  The passes are called vertically 
> rather than horizontally.  What I mean by this is that
> 
> for each function F in the compilation unit
>    build the cfg for F
>   for each pass P in the ipa pass manager
>       call P (F);
> 
> This design has several serious flaws:
>   
> 1) The cfg is only built for the functions that have already been 
> processed.  They have not been built for the functions to be processed.

This is going to change with CFG inlining
> 
> 2) Many axillary functions have been booby-trapped to abort if the 
> function that they are called on does not have a cfg.
> consider
>   cgraph_function_body_availability calls
>   tree_inlinable_function_p calls
>   inlinable_function_p calls
>   inline_forbidden_p which has been booby-trapped. 
> 
> so with the current design, it is impossible ask important questions 
> about the functions that have not yet been processed.

These functions should in general work for function passed to the
analyze hook, but not for the functions being called from it since those
are not analyzed yet..
> 
> 3) With this current design it is impossible to implement a series of 
> ipa-based improvers since the output of one ipa is not seen by the 
> analysis routines of the next pass. 
> 
> Consider the case where it would be desirable to find and mark the set 
> of module level static which are readonly and do not have their address 
> taken and then using this analysis to do a better job of identifying 
> const and pure functions.  Unless you can do this in a single pass, you 
> are screwed by this ipa pass manager architecture.

Well, you should be able to collect the analysis at analysis time and
set the flags at execution time even if this is split in between passes.
> 
> 4) This ipa pass manager design will make it impossible to do true cfg 
> based inlining whose results can be seen by ipa passes since the actual 
> inlining, by design, must be done after all the ipa passes have finished 
> (this is the only time when the entire cfg has been built).
> 
> The proper design for this needs to be (where building the cfg is the 
> first ipa pass):
> for each pass manager pass P:
>  {
>     for each function F
>        call P.analyze_function(F)
> 
>     for each global variable V
>        call P analyze_variable (V)
>    
>     call P.execute()
> 
>     for each function F
>        call P.transform_function(F)
> 
>    for each global variable V
>        call P transform_variable (V)
>  }
> 

Hi,
we seem to hit again the problem on whether we want to do the analysis
first and modification later to allow real IPA later too that would
allow to do the analysis at compilation time.

I quite see that for the IPA passes development it is more convenient to
see them executed in the sequence, but I don't quite see how this can
scale up.  It seems to me that we perhaps want to go with combination of
these two approaches and have early compilation unit specific passes and
later the whole program passes, but even in this scenario the inlining
won't fit very well as we really want to have (most? of) inlining done
at whole program phase.

I would quite welcome some references here into how other production
compilers cope with this.  So far I was just looking into the SGI's
implementation and had dificulties to find anything usefull about the
others...

I can definitly move the analysis into separate loop so the cgraph and
basic flags are fully built before the analysis hooks are executed for
the current source level compilation unit, but I am still bit affraid to
go further to the requirement for everything to be readilly available
at each time.

Honza

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: problems with the ipa pass manager.
  2004-11-20 14:22 ` Jan Hubicka
@ 2004-11-20 19:07   ` Kenneth Zadeck
  2004-11-20 19:28     ` Jan Hubicka
  0 siblings, 1 reply; 5+ messages in thread
From: Kenneth Zadeck @ 2004-11-20 19:07 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Jan Hubicka, Steven Bosscher, Dale Johannesen, Stuart Hastings,
	GCC Mailing List, Berlin, Daniel, Novillo, Diego



>Well, you should be able to collect the analysis at analysis time and
>set the flags at execution time even if this is split in between passes.
>  
>
This is just not good enough.

>>4) This ipa pass manager design will make it impossible to do true cfg 
>>based inlining whose results can be seen by ipa passes since the actual 
>>inlining, by design, must be done after all the ipa passes have finished 
>>(this is the only time when the entire cfg has been built).
>>
>>The proper design for this needs to be (where building the cfg is the 
>>first ipa pass):
>>for each pass manager pass P:
>> {
>>    for each function F
>>       call P.analyze_function(F)
>>
>>    for each global variable V
>>       call P analyze_variable (V)
>>   
>>    call P.execute()
>>
>>    for each function F
>>       call P.transform_function(F)
>>
>>   for each global variable V
>>       call P transform_variable (V)
>> }
>>
>>    
>>
>
>Hi,
>we seem to hit again the problem on whether we want to do the analysis
>first and modification later to allow real IPA later too that would
>allow to do the analysis at compilation time.
>
>I quite see that for the IPA passes development it is more convenient to
>see them executed in the sequence, but I don't quite see how this can
>scale up.  It seems to me that we perhaps want to go with combination of
>these two approaches and have early compilation unit specific passes and
>later the whole program passes, but even in this scenario the inlining
>won't fit very well as we really want to have (most? of) inlining done
>at whole program phase.
>
>I would quite welcome some references here into how other production
>compilers cope with this.  So far I was just looking into the SGI's
>implementation and had dificulties to find anything usefull about the
>others...
>
>I can definitly move the analysis into separate loop so the cgraph and
>basic flags are fully built before the analysis hooks are executed for
>the current source level compilation unit, but I am still bit affraid to
>go further to the requirement for everything to be readilly available
>at each time.
>
>  
>
There is certainly the question of scaling.  There always is and there 
always will be.  I have not and would not advocate using the same 
algorithms at the module level as I would with the full program in front 
of me.  As the context gets larger the algorithms applied in that 
context must get weaker and weaker, or else the compilation will never 
finish.  However, for the level of a single compilation unit, it is not 
only possible but completely practical to use the same techniques that 
one uses on the for a single function. 

It was my understanding, that for the whole program compilation that 
when we compile each file to produce the .o file, we do a fairly good 
job of transforming that module before writing out the il and only do 
simple things that depend on the whole program at link time.  From that 
point of view, I believe that the plan is to output the ssa level il, 
and it is my plan to have done a significant amount of processing on 
that.  Given that a substantial number of calls in a c program are 
local, we should be able to extract a fair number of benefits early.

kenny




>Honza
>  
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: problems with the ipa pass manager.
  2004-11-20 19:07   ` Kenneth Zadeck
@ 2004-11-20 19:28     ` Jan Hubicka
  2004-11-25 10:46       ` Jan Hubicka
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Hubicka @ 2004-11-20 19:28 UTC (permalink / raw)
  To: Kenneth Zadeck
  Cc: Jan Hubicka, Jan Hubicka, Steven Bosscher, Dale Johannesen,
	Stuart Hastings, GCC Mailing List, Berlin, Daniel, Novillo,
	Diego

> 
> 
> >Well, you should be able to collect the analysis at analysis time and
> >set the flags at execution time even if this is split in between passes.
> > 
> >
> This is just not good enough.
> 
> >>4) This ipa pass manager design will make it impossible to do true cfg 
> >>based inlining whose results can be seen by ipa passes since the actual 
> >>inlining, by design, must be done after all the ipa passes have finished 
> >>(this is the only time when the entire cfg has been built).
> >>
> >>The proper design for this needs to be (where building the cfg is the 
> >>first ipa pass):
> >>for each pass manager pass P:
> >>{
> >>   for each function F
> >>      call P.analyze_function(F)
> >>
> >>   for each global variable V
> >>      call P analyze_variable (V)
> >>  
> >>   call P.execute()
> >>
> >>   for each function F
> >>      call P.transform_function(F)
> >>
> >>  for each global variable V
> >>      call P transform_variable (V)
> >>}
> >>
> >>   
> >>
> >
> >Hi,
> >we seem to hit again the problem on whether we want to do the analysis
> >first and modification later to allow real IPA later too that would
> >allow to do the analysis at compilation time.
> >
> >I quite see that for the IPA passes development it is more convenient to
> >see them executed in the sequence, but I don't quite see how this can
> >scale up.  It seems to me that we perhaps want to go with combination of
> >these two approaches and have early compilation unit specific passes and
> >later the whole program passes, but even in this scenario the inlining
> >won't fit very well as we really want to have (most? of) inlining done
> >at whole program phase.
> >
> >I would quite welcome some references here into how other production
> >compilers cope with this.  So far I was just looking into the SGI's
> >implementation and had dificulties to find anything usefull about the
> >others...
> >
> >I can definitly move the analysis into separate loop so the cgraph and
> >basic flags are fully built before the analysis hooks are executed for
> >the current source level compilation unit, but I am still bit affraid to
> >go further to the requirement for everything to be readilly available
> >at each time.
> >
> > 
> >
> There is certainly the question of scaling.  There always is and there 
> always will be.  I have not and would not advocate using the same 
> algorithms at the module level as I would with the full program in front 
> of me.  As the context gets larger the algorithms applied in that 
> context must get weaker and weaker, or else the compilation will never 
> finish.  However, for the level of a single compilation unit, it is not 
> only possible but completely practical to use the same techniques that 
> one uses on the for a single function. 
> 
> It was my understanding, that for the whole program compilation that 
> when we compile each file to produce the .o file, we do a fairly good 
> job of transforming that module before writing out the il and only do 
> simple things that depend on the whole program at link time.  From that 
> point of view, I believe that the plan is to output the ssa level il, 
> and it is my plan to have done a significant amount of processing on 
> that.  Given that a substantial number of calls in a c program are 
> local, we should be able to extract a fair number of benefits early.

OK,
I guess I can start pushing bits to make the passmanager flexible and
allow combination of both approaches (this will need some work in the
inliner bit that is good cleanup anyway).
With that we can work out how much good we can gain fron applying the
IPA passes incrementally and how much we want (can) share with the local
IPA and the whole program IPA compilation.
Still I would like to take care to rely on the particular ordering of
the passes only when there are really good reasons for doing so...

Honza
> 
> kenny
> 
> 
> 
> 
> >Honza
> > 
> >

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: problems with the ipa pass manager.
  2004-11-20 19:28     ` Jan Hubicka
@ 2004-11-25 10:46       ` Jan Hubicka
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Hubicka @ 2004-11-25 10:46 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Kenneth Zadeck, Jan Hubicka, Steven Bosscher, Dale Johannesen,
	Stuart Hastings, GCC Mailing List, Berlin, Daniel, Novillo,
	Diego

> > There is certainly the question of scaling.  There always is and there 
> > always will be.  I have not and would not advocate using the same 
> > algorithms at the module level as I would with the full program in front 
> > of me.  As the context gets larger the algorithms applied in that 
> > context must get weaker and weaker, or else the compilation will never 
> > finish.  However, for the level of a single compilation unit, it is not 
> > only possible but completely practical to use the same techniques that 
> > one uses on the for a single function. 
> > 
> > It was my understanding, that for the whole program compilation that 
> > when we compile each file to produce the .o file, we do a fairly good 
> > job of transforming that module before writing out the il and only do 
> > simple things that depend on the whole program at link time.  From that 
> > point of view, I believe that the plan is to output the ssa level il, 
> > and it is my plan to have done a significant amount of processing on 
> > that.  Given that a substantial number of calls in a c program are 
> > local, we should be able to extract a fair number of benefits early.

I was thinking about this a bit more.  How much of algorithms for IPA
optimizations you would expect to exist in two versions (for single unit
and for whole program)?  It seems to me that most of the major IPA
transformations are fairly scalable and I don't know about any
implementation that would use more powerfull algorithms for single unit
first followed by traditional link time IPA.

It also seems to me that we don't want the compiler to be too much
dependent on the compilation unit choice as this is pretty much
artifical and depends on programming style...

But, of course, for some cases it makes sense to have unit scale
specific optimizers, just I wonder how much of the stuff will need to be
redesigned when (if) we get into link time IPA busyness.

Honza

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2004-11-25  9:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-11-19 21:02 problems with the ipa pass manager Kenneth Zadeck
2004-11-20 14:22 ` Jan Hubicka
2004-11-20 19:07   ` Kenneth Zadeck
2004-11-20 19:28     ` Jan Hubicka
2004-11-25 10:46       ` Jan Hubicka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).