In-Reply-To: <4AD93B5D.40902@redhat.com>
References: <4AC41EE0.8010000@redhat.com> 	 <20091001084947.GA5640@kam.mff.cuni.cz> <4AC4BD0D.2010400@redhat.com> 	 <20091014152122.GA30067@kam.mff.cuni.cz> <4AD5FBF7.6040301@redhat.com> 	 <84fc9c000910161434ue66ab0cve4ac0f08850c0e10@mail.gmail.com> 	 <4AD93B5D.40902@redhat.com>
Date: Sat, 17 Oct 2009 11:17:00 -0000
Message-ID: <84fc9c000910170409r876afe9nf86986ffb1e698d3@mail.gmail.com>
Subject: Re: Ping: IRA-based register pressure calculation for RTL loop invariant motion
From: Richard Guenther <richard.guenther@gmail.com>
To: Vladimir Makarov <vmakarov@redhat.com>
Cc: Zdenek Dvorak <rakdver@kam.mff.cuni.cz>, gcc-patches <gcc-patches@gcc.gnu.org>
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>

On Sat, Oct 17, 2009 at 5:34 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> Richard Guenther wrote:
>>
>> On Wed, Oct 14, 2009 at 6:27 PM, Vladimir Makarov <vmakarov@redhat.com>
>> wrote:
>>
>>>
>>> Zdenek Dvorak wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>>
>>>>>>>
>>>>>>> +      if (i < ira_reg_class_cover_size)
>>>>>>> +        size_cost = comp_cost + 10;
>>>>>>> +      else
>>>>>>> +        size_cost = 0;
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Including comp_cost in size_cost makes no sense (this would prevent us
>>>>>> from moving even very costly invariants out of the loop if we run out of
>>>>>> registers).
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> That is exactly what I intended.  As I wrote above, I tried a lot of
>>>>> heuristics with different parameters which decided whether to move a
>>>>> loop invariant depending on spill cost and loop invariant cost.  But
>>>>> they don't work well, at least for x86/x86_64 and power6.  I have some
>>>>> speculation about this.  x86/x86_64 processors are out-of-order (OOO)
>>>>> these days, and a costly invariant will be hidden because the invariant
>>>>> usually has a lot of freedom to be executed out of order.  For power6,
>>>>> long latency is hidden by insn scheduling.  It is hard for me to find a
>>>>> processor where it will be important.  Another reason is that it is very
>>>>> hard to evaluate spill cost accurately at this stage.  So I decided not
>>>>> to use a combination of register pressure and invariant cost in my
>>>>> approach.
>>>>>
>>>>
>>>> could you please add this reasoning to the comment?  Another reason why
>>>> preventing the invariant motion does not hurt might be that all expensive
>>>> invariants were already moved out of the loop by PRE and gimple invariant
>>>> motion pass.
>>>>
>>>>
>>>>>
>>>>> +      for (i = 0; i < ira_reg_class_cover_size; i++)
>>>>> +        {
>>>>> +          cover_class = ira_reg_class_cover[i];
>>>>> +          if ((int) new_regs[cover_class]
>>>>> +              + (int) regs_needed[cover_class]
>>>>> +              + LOOP_DATA (curr_loop)->max_reg_pressure[cover_class]
>>>>> +              + IRA_LOOP_RESERVED_REGS
>>>>> +              - ira_available_class_regs[cover_class] > 0)
>>>>> +            break;
>>>>> +        }
>>>>>
>>>>
>>>> It might be clearer to write this as
>>>>   ... > ira_available_class_regs[cover_class]
>>>> instead of
>>>>   ... - ira_available_class_regs[cover_class] > 0.
>>>> Otherwise, the patch is OK.
>>>>
>>>>
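
To make the two spellings of the pressure test concrete, here is a minimal
sketch, not the code from the patch.  The function names and the value of
RESERVED_REGS are hypothetical stand-ins (the real IRA_LOOP_RESERVED_REGS
value is not given in this thread), and plain int arguments replace the
per-class arrays.

```c
#include <stdbool.h>

/* Hypothetical stand-in for IRA_LOOP_RESERVED_REGS; the value 2 is an
   assumption made only for this illustration.  */
#define RESERVED_REGS 2

/* Form used in the quoted patch: subtract the available registers and
   compare the result with zero.  */
bool
exceeds_subtract_form (int new_regs, int regs_needed, int max_pressure,
                       int available)
{
  return new_regs + regs_needed + max_pressure + RESERVED_REGS
         - available > 0;
}

/* Form suggested in the review: compare register demand directly with
   the number of available registers.  */
bool
exceeds_compare_form (int new_regs, int regs_needed, int max_pressure,
                      int available)
{
  return new_regs + regs_needed + max_pressure + RESERVED_REGS > available;
}
```

Both functions return the same answer for any inputs that do not overflow;
the second simply reads as "demand exceeds supply".
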
>>>
>>> Zdenek, thanks for the additional comments.  I incorporated them into the
>>> patch just before committing.  Here is the affected patch part:
>>>
>>
>> I think this consistently regressed both compile-time and runtime for
>> Polyhedron on x86_64.  For Itanium the story isn't clear, but effects
>> are seen there as well (it's also the only one I see off-noise effects
>> on SPEC 2000 - significant ups and downs).
>>
>>
>
> Yes, it is an expensive optimization (at least 3 additional passes
> through RTL insns: one for calculating register pressure and two very
> expensive passes for finding register classes for pseudos).  This is
> clearly seen in the SPEC compilation time graphs at
>
> http://vmakarov.fedorapeople.org/spec
>
> for the last 2 benchmarking runs.  Therefore I proposed it only for -O3.
>
> Overall SPEC2000 scores are practically the same on x86/x86_64.
>
> As for Polyhedron benchmarks, here are my results on a Core i7:
>
> first:  -ffast-math -funroll-loops -O3 -fno-ira-loop-pressure
> second: -ffast-math -funroll-loops -O3 -fira-loop-pressure
>
> x86:
> Geometric Mean Execution Time =      12.84 seconds
> Geometric Mean Execution Time =      12.82 seconds
>
> x86_64:
> Geometric Mean Execution Time =       9.89 seconds
> Geometric Mean Execution Time =       9.91 seconds
>
> On power6:
> first:  -mtune=power6 -ffast-math -funroll-loops -O3 -fno-ira-loop-pressure
> second: -mtune=power6 -ffast-math -funroll-loops -O3 -fira-loop-pressure
>
> Geometric Mean Execution Time =      19.22 seconds
> Geometric Mean Execution Time =      19.04 seconds
>
> As I wrote earlier, the loops that benefit from this optimization are
> those with pressure lower (but not much lower) than #registers.  For
> x86/x86_64, practically all loops have pressure higher than #registers.
> For such loops, evaluation of invariant cost vs spill cost would be
> important.  But at this stage, spill cost is impossible to evaluate
> accurately.  So using the old and new loop invariant motion criteria on
> processors similar to x86/x86_64 will give different results for
> particular tests (some tests better, some worse) but the overall score
> will be practically the same.
>
> Probably it makes no sense to use IRA-based register pressure calculation
> for all targets (including x86/x86_64), but for power it is a clear win, as
> seen from Polyhedron and as I reported for SPEC2000.
>
> So we could switch it off by default for -O3.  What do you think about this
> solution, Richard?
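
For scale, the geometric means quoted above can be turned into relative
changes.  The helper below is only an illustrative sketch; the function name
is made up here and is not part of any benchmark tooling.  Applied to the
figures above it gives roughly -0.16% on x86, +0.20% on x86_64 and -0.94%
on power6, where a negative value means the -fira-loop-pressure run was
faster.

```c
/* Relative change of the second timing against the first, in percent.
   Negative means the second configuration ran faster.  */
double
percent_change (double first, double second)
{
  return (second - first) / first * 100.0;
}
```

For example, percent_change (19.22, 19.04) covers the power6 pair.
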

I think we could switch it on by default at -O3 for a selected group of
targets.  Itanium overall also improves with the new heuristics.  That would
make it power and Itanium.  Did you try restricting the heuristics to certain
register classes, like SSE registers on x86_64?
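
A rough sketch of what such a restriction could look like, purely as an
assumption for discussion: the enum, the function name and the flat arrays
below are hypothetical and do not correspond to GCC's cover-class data
structures.  The idea is only that classes not selected never block
invariant motion.

```c
#include <stdbool.h>

/* Hypothetical register classes; real cover classes are target-specific.  */
enum reg_class_id { GENERAL_CLASS, SSE_CLASS, NUM_CLASSES };

/* Return true if any class selected in ENABLED needs more registers than
   it has available.  Classes that are not selected are ignored.  */
bool
pressure_blocks_motion (const int demand[NUM_CLASSES],
                        const int available[NUM_CLASSES],
                        const bool enabled[NUM_CLASSES])
{
  for (int i = 0; i < NUM_CLASSES; i++)
    if (enabled[i] && demand[i] > available[i])
      return true;
  return false;
}
```

With only SSE_CLASS enabled, high pressure on general registers would no
longer stop an invariant from being moved.
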

Thanks,
Richard.