From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 22729 invoked by alias); 17 Oct 2009 11:10:07 -0000
Received: (qmail 22721 invoked by uid 22791); 17 Oct 2009 11:10:06 -0000
X-SWARE-Spam-Status: No, hits=-1.8 required=5.0 tests=AWL,BAYES_00,SARE_MSGID_LONG40,SPF_PASS
X-Spam-Check-By: sourceware.org
Received: from mail-vw0-f178.google.com (HELO mail-vw0-f178.google.com) (209.85.212.178) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sat, 17 Oct 2009 11:10:01 +0000
Received: by vws8 with SMTP id 8so1984044vws.0 for ; Sat, 17 Oct 2009 04:09:59 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.220.69.83 with SMTP id y19mr5400059vci.64.1255777798764; Sat, 17 Oct 2009 04:09:58 -0700 (PDT)
In-Reply-To: <4AD93B5D.40902@redhat.com>
References: <4AC41EE0.8010000@redhat.com> <20091001084947.GA5640@kam.mff.cuni.cz> <4AC4BD0D.2010400@redhat.com> <20091014152122.GA30067@kam.mff.cuni.cz> <4AD5FBF7.6040301@redhat.com> <84fc9c000910161434ue66ab0cve4ac0f08850c0e10@mail.gmail.com> <4AD93B5D.40902@redhat.com>
Date: Sat, 17 Oct 2009 11:17:00 -0000
Message-ID: <84fc9c000910170409r876afe9nf86986ffb1e698d3@mail.gmail.com>
Subject: Re: Ping: IRA-based register pressure calculation for RTL loop invariant motion
From: Richard Guenther
To: Vladimir Makarov
Cc: Zdenek Dvorak, gcc-patches
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2009-10/txt/msg01120.txt.bz2

On Sat, Oct 17, 2009 at 5:34 AM, Vladimir Makarov wrote:
> Richard Guenther wrote:
>>
>> On Wed, Oct 14, 2009 at 6:27 PM, Vladimir Makarov wrote:
>>
>>> Zdenek Dvorak wrote:
>>>
>>>> Hi,
>>>>
>>>>>>> +      if (i < ira_reg_class_cover_size)
>>>>>>> +       size_cost = comp_cost + 10;
>>>>>>> +      else
>>>>>>> +       size_cost = 0;
>>>>>>>
>>>>>> Including comp_cost in size_cost makes no sense (this would
>>>>>> prevent us from moving even very costly invariants out of the
>>>>>> loop if we run out of registers).
>>>>>>
>>>>> That is exactly what I intended.  As I wrote above, I tried a lot
>>>>> of heuristics with different parameters which decided to move loop
>>>>> invariant depending on spill cost and loop invariant cost.  But
>>>>> they don't work well at least for x86/x86_64 and power6.  I have
>>>>> some speculation for this.  x86/x86_64 is OOO processors these
>>>>> days.  And costly invariant will be hidden because usually the
>>>>> invariant has a lot of freedom to be executed out-of-order.  For
>>>>> power6, long latency is hidden by insn scheduling.  It is hard to
>>>>> me find a processor where it will be important.  Another reason
>>>>> for this, it is very hard to evaluate accurately spill cost at
>>>>> this stage.  So I decided not to use combination of register
>>>>> pressure and invariant cost in my approach.
>>>>>
>>>> could you please add this reasoning to the comment?  Another reason
>>>> why preventing the invariant motion does not hurt might be that all
>>>> expensive invariants were already moved out of the loop by PRE and
>>>> gimple invariant motion pass.
>>>>
>>>>> +      for (i = 0; i < ira_reg_class_cover_size; i++)
>>>>> +       {
>>>>> +         cover_class = ira_reg_class_cover[i];
>>>>> +         if ((int) new_regs[cover_class]
>>>>> +             + (int) regs_needed[cover_class]
>>>>> +             + LOOP_DATA (curr_loop)->max_reg_pressure[cover_class]
>>>>> +             + IRA_LOOP_RESERVED_REGS
>>>>> +             - ira_available_class_regs[cover_class] > 0)
>>>>> +           break;
>>>>> +       }
>>>>>
>>>> It might be clearer to write this as
>>>> ... > ira_available_class_regs[cover_class] instead of
>>>> ... - ira_available_class_regs[cover_class] > 0.  Otherwise, the
>>>> patch is OK.
>>>>
>>> Zdenek, thanks for the additional comments.  I incorporated them into
>>> the patch just before committing.  Here is the affected patch part:
>>>
>> I think this consistently regressed both compile-time and runtime for
>> Polyhedron on x86_64.  For Itanium the story isn't clear, but effects
>> are seen there as well (it's also the only one I see off-noise effects
>> on SPEC 2000 - significant ups and downs).
>>
> Yes, it is expensive optimization (at least 3 additional passes
> through RTL insns one for calculating register pressure and two very
> expensive passes for finding register classes for pseudos).  It is
> clearly seen from SPEC compilation time graphs on
>
> http://vmakarov.fedorapeople.org/spec
>
> for 2 last benchmarking.  Therefore I proposed it only for -O3.
>
> Overall SPEC2000 scores are practically the same on x86/x86_64.
>
> As for Polyhedron benchmarks, here is my results on Core I7:
>
> first:  -ffast-math -funroll-loops -O3 -fno-ira-loop-pressure
> second: -ffast-math -funroll-loops -O3 -fira-loop-pressure
>
> x86:
> Geometric Mean Execution Time =      12.84 seconds
> Geometric Mean Execution Time =      12.82 seconds
>
> x86_64:
> Geometric Mean Execution Time =       9.89 seconds
> Geometric Mean Execution Time =       9.91 seconds
>
> On power6:
> first:  -mtune=power6 -ffast-math -funroll-loops -O3 -fno-ira-loop-pressure
> second: -mtune=power6 -ffast-math -funroll-loops -O3 -fira-loop-pressure
>
> Geometric Mean Execution Time =      19.22 seconds
> Geometric Mean Execution Time =      19.04 seconds
>
> As I wrote earlier the winner of the optimization usage will be
> loops with pressure lower (but not too lower) than #registers.  For
> x86/x86_64, practically all loops have pressure more than #registers.
> For such loops, evaluation of invariant cost vs spill cost would be
> important.  But at this stage, spill cost is impossible to evaluate
> accurately.  So usage of old and new loop invariant motion criteria on
> processors similar x86/x86_64 will give different results for particular
> tests (some tests better, some worse) but overall score will be
> practically the same.
>
> Probably, there is no sense to use IRA-based register pressure
> calculation for all targets (including x86/x86_64) but for power it is
> a clear win as it is seen from polyhedron and as I reported for
> SPEC2000.
>
> So we could switch it off by default for -O3.  What do you think about
> this solution, Richard?

I think we could switch it on by default at -O3 for a selected group
of targets.  Itanium overall also improves with the new heuristics.
That would make it power and Itanium.  Did you try restricting the
heuristics to certain register classes, like SSE registers on x86_64?

Thanks,
Richard.
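
For readers following the thread, the pressure check quoted above can be
sketched in isolation.  This is only a minimal standalone sketch, not the
committed GCC code: N_CLASSES, RESERVED_REGS and both function names are
hypothetical stand-ins (for ira_reg_class_cover_size,
IRA_LOOP_RESERVED_REGS and the surrounding loop-invariant code), the
arrays are indexed directly rather than through ira_reg_class_cover, and
the comparison is written in the form Zdenek suggested.

```c
#include <assert.h>

#define N_CLASSES 2       /* hypothetical: e.g. integer and FP classes */
#define RESERVED_REGS 2   /* hypothetical stand-in for IRA_LOOP_RESERVED_REGS */

/* Return the index of the first register class whose pressure would
   exceed the available registers if the invariant were moved, or
   N_CLASSES if every class still fits.  */
static int
first_overflowing_class (const int new_regs[], const int regs_needed[],
                         const int max_pressure[], const int available[])
{
  int i;

  for (i = 0; i < N_CLASSES; i++)
    if (new_regs[i] + regs_needed[i] + max_pressure[i] + RESERVED_REGS
        > available[i])
      break;
  return i;
}

/* Mirror the quoted heuristic: if any class overflows, make the size
   cost exceed the computation cost so the move never pays off;
   otherwise the move costs nothing in terms of registers.  */
static int
size_cost_for_move (int comp_cost, const int new_regs[],
                    const int regs_needed[], const int max_pressure[],
                    const int available[])
{
  int i;

  assert (comp_cost >= 0);
  i = first_overflowing_class (new_regs, regs_needed, max_pressure,
                               available);
  return i < N_CLASSES ? comp_cost + 10 : 0;
}
```

With 16 available registers per class and a loop pressure of 10 in the
first class, an invariant needing one more register still fits, while
one needing five does not and is charged comp_cost + 10, which is what
keeps even a costly invariant inside the loop.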