From: Maxim Kuvyrkov
To: Richard Guenther
Cc: GCC Patches
Subject: Re: [PATCH] Add capability to run several iterations of early optimizations
Date: Tue, 18 Oct 2011 03:00:00 -0000
Message-Id: <01F22181-5EA1-46B1-95F6-0F24B92E5FC9@codesourcery.com>

On 13/10/2011, at 12:58 AM, Richard Guenther wrote:

> On Wed, Oct 12, 2011 at 8:50 AM, Maxim Kuvyrkov wrote:
>> The following patch adds a new knob to make GCC perform several iterations of early optimizations and inlining.
>>
>> This is for dont-care-about-compile-time-optimize-all-you-can scenarios. Performing several iterations of optimizations does significantly improve code speed on a certain proprietary source base. Some hand-tuning of the parameter value is required to get optimum performance. Another good use for this option is searching for, and ad-hoc analysis of, cases where GCC misses optimization opportunities.
>>
>> With the default setting of '1', nothing changes from the current status quo.
>>
>> The patch was bootstrapped and regtested with 3 iterations set by default on i686-linux-gnu. The only failures in the regression testsuite were due to latent bugs in the handling of EH information, which are being discussed in a different thread.
>>
>> The performance impact on the standard benchmarks is not conclusive: there are improvements in SPEC2000 of up to 4% and regressions down to -2%, see [*]. The SPEC2006 benchmarks will take another day or two to complete, and I will update the spreadsheet then. The benchmarks were run on a Core2 system for all combinations of {-m32/-m64}{-O2/-O3}.
>>
>> The effect on compilation time is fairly predictable: about a 10% compile-time increase with 3 iterations.
>>
>> OK for trunk?
>
> I don't think this is a good idea, especially in the form you implemented it.
>
> If we'd want to iterate early optimizations we'd want to do it by iterating an IPA pass so that we benefit from more precise size estimates when trying to inline a function the second time.

Could you elaborate on this a bit? Early optimizations are gimple passes, so I'm missing your point here.

> Also, statically scheduling the passes will mess up dump files, and you have no chance of, say, noticing that nothing changed for function f and its callees in iteration N so that you can skip processing them in iteration N + 1.

Yes, these are the shortcomings. The dump file name changes can be fixed, e.g., by adding a suffix to the passes on iterations after the first one. The analysis to avoid unnecessary iterations is a more complex problem.
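To make the skipping idea concrete, here is a rough, self-contained toy model of what driving the iterations from one place could look like. This is not GCC code: the types, names, and convergence test are made up for illustration, and a real implementation would also have to revisit callers whose callees changed.

#include <stdbool.h>
#include <stdio.h>

struct function_info
{
  const char *name;
  bool changed;		/* was this function changed the last time it was processed?  */
};

/* Stand-in for running the early-optimization sub-pipeline once on FN.
   Here it simply pretends every function stops changing after the
   second iteration.  */
static bool
run_early_opts (struct function_info *fn, int iter)
{
  printf ("iteration %d: optimizing %s\n", iter, fn->name);
  return iter < 2;
}

/* Drive up to MAX_ITERS iterations from one place, skipping functions
   that did not change in the previous iteration.  Revisiting callers
   whose callees changed is omitted here for brevity.  */
static void
iterate_early_opts (struct function_info *fns, int nfns, int max_iters)
{
  for (int iter = 1; iter <= max_iters; iter++)
    {
      bool any_change = false;
      for (int i = 0; i < nfns; i++)
	{
	  if (iter > 1 && !fns[i].changed)
	    continue;		/* already at a fixed point */
	  fns[i].changed = run_early_opts (&fns[i], iter);
	  if (fns[i].changed)
	    any_change = true;
	}
      if (!any_change)
	break;			/* everything converged, stop early */
    }
}

int
main (void)
{
  /* Start everything as 'changed' so each function is processed at least once.  */
  struct function_info fns[] = { { "f", true }, { "g", true } };
  iterate_early_opts (fns, 2, 3);
  return 0;
}

Whether tracking per-function convergence like this pays for itself in compile time is, of course, exactly the question raised below.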
> So, at least you should split the pass_early_local_passes IPA pass into three; you'd iterate over the 2nd (definitely not over pass_split_functions though), and the third would be pass_profile and pass_split_functions only. And you'd iterate from the place the 2nd IPA pass is executed, not by scheduling them N times.

OK, I will look into this.

> Then you'd have to analyze the compile-time impact of the IPA splitting on its own when not iterating. Then you should look at which optimizations were actually performed that led to the improvement (I can see some indirect inlining happening, but everything else would be a bug in the present optimizers in the early pipeline - they are all designed to be roughly independent of each other and _not_ expose new opportunities by iteration). Thus - testcases?

The initial motivation for the patch was to enable more indirect inlining and devirtualization opportunities. Since then I have found the patch helpful for searching out optimization opportunities and bugs. E.g., SPEC2006's 471.omnetpp drops 20% with 2 additional iterations of early optimizations [*]. Given that applying more optimizations should, theoretically, not decrease performance, there is likely a very real bug or deficiency behind that.

Thank you,

[*] https://docs.google.com/spreadsheet/ccc?key=0AvK0Y-Pgj7bNdFBQMEJ6d3laeFdvdk9lQ1p0LUFkVFE&hl=en_US

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics