Richard Guenther wrote: > On Sat, Oct 17, 2009 at 5:34 AM, Vladimir Makarov wrote: > >> Richard Guenther wrote: >> >>> On Wed, Oct 14, 2009 at 6:27 PM, Vladimir Makarov >>> wrote: >>> >>> >>>> Zdenek Dvorak wrote: >>>> >>>> >>>>> Hi, >>>>> >>>>> >>>>> >>>>>>>> + if (i < ira_reg_class_cover_size) >>>>>>>> + size_cost = comp_cost + 10; >>>>>>>> + else >>>>>>>> + size_cost = 0; >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> Including comp_cost in size_cost makes no sense (this would prevent us >>>>>>> from >>>>>>> moving even very costly invariants out of the loop if we run out of >>>>>>> registers). >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> That is exactly what I intended. As I wrote above, I tried a lot of >>>>>> heuristics with different parameters which decided to move loop >>>>>> invariant >>>>>> depending on spill cost and loop invariant cost. But they don't work >>>>>> well >>>>>> at least for x86/x86_64 and power6. I have some speculation for this. >>>>>> x86/x86_64 is OOO processors these days. And costly invariant will >>>>>> be >>>>>> hidden because usually the invariant has a lot of freedom to be >>>>>> executed >>>>>> out-of-order. For power6, long latency is hidden by insn scheduling. >>>>>> It >>>>>> is hard to me find a processor where it will be important. Another >>>>>> reason >>>>>> for this, it is very hard to evaluate accurately spill cost at this >>>>>> stage. >>>>>> So I decided not to use combination of register pressure and >>>>>> invariant >>>>>> cost in my approach. >>>>>> >>>>>> >>>>> could you please add this reasoning to the comment? Another reason why >>>>> preventing the invariant motion does not hurt might be that all >>>>> expensive >>>>> invariants were already moved out of the loop by PRE and gimple >>>>> invariant >>>>> motion pass. >>>>> >>>>> >>>>> >>>>>> + for (i = 0; i < ira_reg_class_cover_size; i++) >>>>>> + { >>>>>> + cover_class = ira_reg_class_cover[i]; >>>>>> + if ((int) new_regs[cover_class] >>>>>> + + (int) regs_needed[cover_class] >>>>>> + + LOOP_DATA (curr_loop)->max_reg_pressure[cover_class] >>>>>> + + IRA_LOOP_RESERVED_REGS >>>>>> + - ira_available_class_regs[cover_class] > 0) >>>>>> + break; >>>>>> + } >>>>>> >>>>>> >>>>> It might be clearer to write this as ... > >>>>> ira_available_class_regs[cover_class] instead >>>>> of ... - ira_available_class_regs[cover_class] > 0. Otherwise, the >>>>> patch >>>>> is OK. >>>>> >>>>> >>>>> >>>> Zdenek, thanks for the additional comments. I incorporated them into the >>>> patch just before committing. Here is the affected patch part: >>>> >>>> >>> I think this consistently regressed both compile-time and runtime for >>> Polyhedron on x86_64. For Itanium the story isn't clear, but effects >>> are seen there as well (it's also the only one I see off-noise effects >>> on SPEC 2000 - significant ups and downs). >>> >>> >>> >> Yes, it is expensive optimization (at least 3 additional passes >> through RTL insns one for calculating register pressure and two very >> expensive passes for finding register classes for pseudos). It is >> clearly seen from SPEC compilation time graphs on >> >> http://vmakarov.fedorapeople.org/spec >> >> for 2 last benchmarking. Therefore I proposed it only for -O3. >> >> Overall SPEC2000 scores are practically the same on x86/x86_64. >> >> As for Polyhedron benchmarks, here is my results on Core I7: >> >> first: -ffast-math -funroll-loops -O3 -fno-ira-loop-pressure >> second: -ffast-math -funroll-loops -O3 -fira-loop-pressure >> >> x86: >> Geometric Mean Execution Time = 12.84 seconds >> Geometric Mean Execution Time = 12.82 seconds >> >> x86_64: >> Geometric Mean Execution Time = 9.89 seconds >> Geometric Mean Execution Time = 9.91 seconds >> >> On power6: >> first: -mtune=power6 -ffast-math -funroll-loops -O3 -fno-ira-loop-pressure >> second: -mtune=power6 -ffast-math -funroll-loops -O3 -fira-loop-pressure >> >> Geometric Mean Execution Time = 19.22 seconds >> Geometric Mean Execution Time = 19.04 seconds >> >> As I wrote earlier the winner of the optimization usage will be >> loops with pressure lower (but not too lower) than #registers. For >> x86/x86_64, practically all loops have pressure more than #registers. >> For such loops, evaluation of invariant cost vs spill cost would be >> important. But at this stage, spill cost is impossible to evaluate >> accurately. So usage of old and new loop invariant motion criteria on >> processors similar x86/x86_64 will give different results for particular >> tests (some tests better, some worse) but overall score will be >> practically the same. >> >> Probably, there is no sense to use IRA-based register pressure calculation >> for all targets (including x86/x86_64) but for power it is a clear win as it >> is seen from polyhedron and as I reported for SPEC2000. >> >> So we could switch it off by default for -O3. What do you think about this >> solution, Richard? >> > > I think we could switch it on by default at -O3 for a selected group of > targets. Itanium overall also improves with the new heuristics. That would > make it power and Itanium. The patch is below. Ok to commit? > Did you try restricting the heuristics to certain > register classes, like SSE registers on x86_64? > > No, I did not try. I am not sure it is worth to do it. 2009-10-19 Vladimir Makarov * doc/invoke.texi (fira-loop-pressure): Update default value. * opts.c (decode_options): Remove default value setting for flag_ira_loop_pressure. * config/ia64/ia64.c (ia64_override_options): Set flag_ira_loop_pressure up for -O3. * config/rs6000/rs6000.c (rs6000_override_options): Ditto.