Richard,

Just as Matt posted his findings about the effect of iterating early optimizations, I've got the new patch ready.  This patch is essentially a complete rewrite and addresses the comments you made.

On 18/10/2011, at 9:56 PM, Richard Guenther wrote:

>>>
>>> If we'd want to iterate early optimizations we'd want to do it by iterating
>>> an IPA pass so that we benefit from more precise size estimates
>>> when trying to inline a function the second time.
>>
>> Could you elaborate on this a bit?  Early optimizations are gimple passes, so I'm missing your point here.
>
> pass_early_local_passes is an IPA pass, you want to iterate
> fn1, fn2, fn1, fn2, ..., not fn1, fn1 ..., fn2, fn2 ... precisely for better
> inlining.  Thus you need to split pass_early_local_passes into pieces
> so you can iterate one of the IPA pieces.

Early_local_passes are now split into _main, _iter and _late parts.  To avoid changing the default case, the _late part is merged into _main when no iterative optimizations are requested.

>
>>> Also statically
>>> scheduling the passes will mess up dump files and you have no
>>> chance of say, noticing that nothing changed for function f and its
>>> callees in iteration N and thus you can skip processing them in
>>> iteration N + 1.
>>
>> Yes, these are the shortcomings.  The dump files name changes can be fixed, e.g., by adding a suffix to the passes on iterations after the first one.  The analysis to avoid unnecessary iterations is more complex problem.

To avoid changing the dump file names, the patch appends an "_iter" suffix to the dumps of iterative passes.

>
> Sure.  I analyzed early passes by manually duplicating them and
> test that they do nothing for tramp3d, which they pretty much all did
> at some point.
>
>>>
>>> So, at least you should split the pass_early_local_passes IPA pass
>>> into three, you'd iterate over the 2nd (definitely not over pass_split_functions
>>> though), the third would be pass_profile and pass_split_functions only.
>>> And you'd iterate from the place the 2nd IPA pass is executed, not
>>> by scheduling them N times.
>>
>> OK, I will look into this.

Done.

>>
>>>
>>> Then you'd have to analyze the compile-time impact of the IPA
>>> splitting on its own when not iterating.

I decided to avoid this and keep the pass pipeline effectively the same when not running iterative optimizations.  This is achieved by scheduling pass_early_optimizations_late in different places in the pipeline depending on whether iterative optimizations are enabled or not.

The patch bootstraps and passes regtest on i686-pc-linux-gnu {-m32/-m64} with 3 iterations enabled by default.  The only failures are 5 scan-dump tests that are due to more functions being inlined than expected.  With iterative optimizations disabled there is no change.

I've kicked off SPEC2000/SPEC2006 benchmark runs to see the performance effect of the patch, and those will be posted in the same Google Docs spreadsheet in several days.

OK for trunk?

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics