Richard,

Just as Matt posted his findings about the effect of iterating early optimizations, I've got the new patch ready.  This patch is essentially a complete rewrite and addresses the comments you made.

On 18/10/2011, at 9:56 PM, Richard Guenther wrote:

>>> 
>>> If we'd want to iterate early optimizations we'd want to do it by iterating
>>> an IPA pass so that we benefit from more precise size estimates
>>> when trying to inline a function the second time.
>> 
>> Could you elaborate on this a bit?  Early optimizations are gimple passes, so I'm missing your point here.
> 
> pass_early_local_passes is an IPA pass, you want to iterate
> fn1, fn2, fn1, fn2, ..., not fn1, fn1 ..., fn2, fn2 ... precisely for better
> inlining.  Thus you need to split pass_early_local_passes into pieces
> so you can iterate one of the IPA pieces.

pass_early_local_passes is now split into _main, _iter and _late parts.  To avoid changing the default case, the _late part is merged into _main when no iterative optimizations are requested.
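
To make the resulting order concrete, here is a toy model of the schedule. It is not the pass-manager code from the patch: the enum, the trace array and run_early_pipeline are hypothetical. It assumes n_iters counts how many times the _iter part runs, with 0 meaning iterative optimizations are off. It only records which part ran on which function, so the fn1, fn2, fn1, fn2 order Richard asked for is visible.

```c
#include <assert.h>

/* Hypothetical model of the split early pipeline: records which part ran
   on which function, in execution order.  Not GCC's pass manager API.  */
enum early_part { PART_MAIN, PART_ITER, PART_LATE };

struct trace_entry
{
  enum early_part part;
  int fn;  /* index of the function in call-graph order */
};

#define MAX_TRACE 64
static struct trace_entry trace[MAX_TRACE];
static int trace_len;

/* Stand-in for running one sub-pipeline on one function body.  */
static void run_part (enum early_part part, int fn)
{
  assert (trace_len < MAX_TRACE);
  trace[trace_len].part = part;
  trace[trace_len].fn = fn;
  trace_len++;
}

/* Drive the early pipeline over N_FNS functions.  N_ITERS is assumed to be
   the number of times the _iter part executes (0 = not iterating).  */
static void run_early_pipeline (int n_fns, int n_iters)
{
  trace_len = 0;

  if (n_iters == 0)
    {
      /* Default case: _late is merged into _main, so each function gets
         both parts before the next one starts, as in the unsplit pass.  */
      for (int fn = 0; fn < n_fns; fn++)
        {
          run_part (PART_MAIN, fn);
          run_part (PART_LATE, fn);
        }
      return;
    }

  for (int fn = 0; fn < n_fns; fn++)
    run_part (PART_MAIN, fn);

  /* The _iter part is re-run from the driver as a whole IPA step, giving
     fn1, fn2, fn1, fn2, ... rather than fn1, fn1, ..., fn2, fn2, ...  */
  for (int iter = 0; iter < n_iters; iter++)
    for (int fn = 0; fn < n_fns; fn++)
      run_part (PART_ITER, fn);

  for (int fn = 0; fn < n_fns; fn++)
    run_part (PART_LATE, fn);
}
```

For two functions and n_iters == 2, the trace is: main on both, then iter on the first, second, first, second, then late on both. With n_iters == 0 it is main and late on the first function, then main and late on the second. That is the same per-function order as the unsplit pass.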

> 
>>> Also statically
>>> scheduling the passes will mess up dump files and you have no
>>> chance of say, noticing that nothing changed for function f and its
>>> callees in iteration N and thus you can skip processing them in
>>> iteration N + 1.
>> 
>> Yes, these are the shortcomings.  The dump file name changes can be fixed, e.g., by adding a suffix to the passes on iterations after the first one.  The analysis to avoid unnecessary iterations is a more complex problem.

To keep the existing dump file names unchanged, the patch appends an "_iter" suffix to the dumps of the iterated passes.
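
A hypothetical helper illustrating the naming rule. make_dump_suffix is not a GCC function. Real dump file names also carry the source file name and pass number, which are left out here. In this sketch, passes outside the iterated part keep their usual name, so the default dumps are unaffected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper (not GCC API): write the dump name for PASS_NAME into
   BUF, appending "_iter" when the pass belongs to the iterated part of the
   early pipeline.  Returns the length snprintf reports for the full name.  */
static int make_dump_suffix (char *buf, size_t size, const char *pass_name,
                             bool in_iter_part)
{
  assert (buf != NULL && pass_name != NULL);
  return snprintf (buf, size, "%s%s", pass_name,
                   in_iter_part ? "_iter" : "");
}
```

For example, in this sketch a pass named einline keeps the name einline when scheduled in _main and becomes einline_iter when scheduled in the iterated part.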

> 
> Sure.  I analyzed early passes by manually duplicating them and
> tested that they do nothing for tramp3d, which they pretty much all did
> at some point.
> 
>>> 
>>> So, at least you should split the pass_early_local_passes IPA pass
>>> into three, you'd iterate over the 2nd (definitely not over pass_split_functions
>>> though), the third would be pass_profile and pass_split_functions only.
>>> And you'd iterate from the place the 2nd IPA pass is executed, not
>>> by scheduling them N times.
>> 
>> OK, I will look into this.

Done.

>> 
>>> 
>>> Then you'd have to analyze the compile-time impact of the IPA
>>> splitting on its own when not iterating.  

I decided to sidestep this by keeping the pass pipeline effectively unchanged when iterative optimizations are not enabled.  This is achieved by scheduling pass_early_optimizations_late at different points in the pipeline, depending on whether iterative optimizations are enabled.
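
Here is a minimal sketch of that placement decision on a simplified pass tree. struct pass, append_pass and schedule_late are hypothetical; they are not GCC's opt_pass machinery. The sketch assumes passes form sibling lists with optional per-function sub-passes, and that _iter is left out entirely when not iterating. With iterations off, the late group is appended as the last sub-pass of _main, so each function still goes through the whole early pipeline in one go. With iterations on, the late group becomes a separate top-level pass after _iter, which the driver re-runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified pass tree (not GCC's struct opt_pass):
   each pass has an optional list of sub-passes and a sibling link.  */
struct pass
{
  const char *name;
  struct pass *sub;   /* first sub-pass, run per function */
  struct pass *next;  /* next pass at the same level */
};

/* Append P to the end of the sibling list starting at *LIST.  */
static void append_pass (struct pass **list, struct pass *p)
{
  while (*list)
    list = &(*list)->next;
  p->next = NULL;
  *list = p;
}

/* Build the top-level early pipeline in *TOP and decide where LATE_P goes.  */
static void schedule_late (struct pass **top, struct pass *main_p,
                           struct pass *iter_p, struct pass *late_p,
                           int n_iters)
{
  assert (main_p && iter_p && late_p);
  *top = NULL;
  append_pass (top, main_p);
  if (n_iters > 0)
    {
      /* Iterating: _iter (re-run by the driver) and _late become
         separate top-level passes after _main.  */
      append_pass (top, iter_p);
      append_pass (top, late_p);
    }
  else
    /* Not iterating: nest _late inside _main so the pipeline keeps its
       old shape.  */
    append_pass (&main_p->sub, late_p);
}
```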

The patch bootstraps and passes regtest on i686-pc-linux-gnu {-m32/-m64} with 3 iterations enabled by default.  The only failures are 5 scan-dump tests that are due to more functions being inlined than expected.  With iterative optimizations disabled there is no change.

I've kicked off SPEC2000/SPEC2006 benchmark runs to measure the performance effect of the patch; the results will be posted to the same Google Docs spreadsheet in a few days.

OK for trunk?

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics