From: Maxim Kuvyrkov
To: Richard Guenther
Cc: GCC Patches
Subject: Re: [PATCH] Add capability to run several iterations of early optimizations
Date: Tue, 18 Oct 2011 03:00:00 -0000
Message-Id: <01F22181-5EA1-46B1-95F6-0F24B92E5FC9@codesourcery.com>

On 13/10/2011, at 12:58 AM, Richard Guenther wrote:

> On Wed, Oct 12, 2011 at 8:50 AM, Maxim Kuvyrkov wrote:
>> The following patch adds a new knob to make GCC perform several iterations of early optimizations and inlining.
>>
>> This is for dont-care-about-compile-time-optimize-all-you-can scenarios. Performing several iterations of optimizations does significantly improve code speed on a certain proprietary source base. Some hand-tuning of the parameter value is required to get optimum performance. Another good use for this option is searching for, and ad-hoc analysis of, cases where GCC misses optimization opportunities.
>>
>> With the default setting of '1', nothing changes from the current status quo.
>>
>> The patch was bootstrapped and regtested with 3 iterations set by default on i686-linux-gnu. The only failures in the regression testsuite were due to latent bugs in the handling of EH information, which are being discussed in a different thread.
>>
>> The performance impact on the standard benchmarks is not conclusive: there are improvements in SPEC2000 of up to 4% and regressions down to -2%, see [*]. The SPEC2006 benchmarks will take another day or two to complete, and I will update the spreadsheet then. The benchmarks were run on a Core2 system for all combinations of {-m32/-m64}{-O2/-O3}.
>>
>> The effect on compilation time is fairly predictable: about a 10% compile-time increase with 3 iterations.
>>
>> OK for trunk?
>
> I don't think this is a good idea, especially in the form you implemented it.
>
> If we'd want to iterate early optimizations we'd want to do it by iterating an IPA pass so that we benefit from more precise size estimates when trying to inline a function the second time.

Could you elaborate on this a bit? Early optimizations are gimple passes, so I'm missing your point here.

> Also, statically scheduling the passes will mess up dump files, and you have no chance of, say, noticing that nothing changed for function f and its callees in iteration N so that you can skip processing them in iteration N + 1.

Yes, these are the shortcomings. The dump file name changes can be fixed, e.g., by adding a suffix to the passes on iterations after the first one. The analysis to avoid unnecessary iterations is a more complex problem.
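To make the skipping idea concrete, here is a rough, self-contained toy model of what driving the iterations from one place could look like. This is not GCC code: the types, names, and convergence test are made up for illustration, and a real implementation would also have to revisit callers whose callees changed.

#include <stdbool.h>
#include <stdio.h>

struct function_info
{
  const char *name;
  bool changed;		/* was this function changed the last time it was processed?  */
};

/* Stand-in for running the early-optimization sub-pipeline once on FN.
   Here it simply pretends every function stops changing after the
   second iteration.  */
static bool
run_early_opts (struct function_info *fn, int iter)
{
  printf ("iteration %d: optimizing %s\n", iter, fn->name);
  return iter < 2;
}

/* Drive up to MAX_ITERS iterations from one place, skipping functions
   that did not change in the previous iteration.  Revisiting callers
   whose callees changed is omitted here for brevity.  */
static void
iterate_early_opts (struct function_info *fns, int nfns, int max_iters)
{
  for (int iter = 1; iter <= max_iters; iter++)
    {
      bool any_change = false;
      for (int i = 0; i < nfns; i++)
	{
	  if (iter > 1 && !fns[i].changed)
	    continue;		/* already at a fixed point */
	  fns[i].changed = run_early_opts (&fns[i], iter);
	  if (fns[i].changed)
	    any_change = true;
	}
      if (!any_change)
	break;			/* everything converged, stop early */
    }
}

int
main (void)
{
  /* Start everything as 'changed' so each function is processed at least once.  */
  struct function_info fns[] = { { "f", true }, { "g", true } };
  iterate_early_opts (fns, 2, 3);
  return 0;
}

Whether tracking per-function convergence like this pays for itself in compile time is, of course, exactly the question raised below.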
> So, at least you should split the pass_early_local_passes IPA pass into three; you'd iterate over the 2nd (definitely not over pass_split_functions though), and the third would be pass_profile and pass_split_functions only. And you'd iterate from the place the 2nd IPA pass is executed, not by scheduling them N times.

OK, I will look into this.

> Then you'd have to analyze the compile-time impact of the IPA splitting on its own when not iterating. Then you should look at which optimizations were actually performed that led to the improvement (I can see some indirect inlining happening, but everything else would be a bug in the present optimizers in the early pipeline - they are all designed to be roughly independent of each other and _not_ expose new opportunities by iteration). Thus - testcases?

The initial motivation for the patch was to enable more indirect inlining and devirtualization opportunities. Since then I have found the patch helpful for searching out optimization opportunities and bugs. E.g., SPEC2006's 471.omnetpp drops 20% with 2 additional iterations of early optimizations [*]. Given that applying more optimizations should, theoretically, not decrease performance, there is likely a very real bug or deficiency behind that.

Thank you,

[*] https://docs.google.com/spreadsheet/ccc?key=0AvK0Y-Pgj7bNdFBQMEJ6d3laeFdvdk9lQ1p0LUFkVFE&hl=en_US

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics