Subject: Re: [PATCH] Add capability to run several iterations of early optimizations
From: Richard Guenther
To: Maxim Kuvyrkov
Cc: GCC Patches
Date: Tue, 18 Oct 2011 09:09:00 -0000

On Tue, Oct 18, 2011 at 1:45 AM, Maxim Kuvyrkov wrote:
> On 13/10/2011, at 12:58 AM, Richard Guenther wrote:
>
>> On Wed, Oct 12, 2011 at 8:50 AM, Maxim Kuvyrkov wrote:
>>> The following patch adds a new knob to make GCC perform several iterations of early optimizations and inlining.
>>>
>>> This is for dont-care-about-compile-time-optimize-all-you-can scenarios.  Performing several iterations of optimizations does significantly improve code speed on a certain proprietary source base.  Some hand-tuning of the parameter value is required to get optimum performance.  Another good use for this option is in searching for and analyzing cases where GCC misses optimization opportunities.
>>>
>>> With the default setting of '1', nothing is changed from the current status quo.
>>>
>>> The patch was bootstrapped and regtested with 3 iterations set by default on i686-linux-gnu.  The only failures in the regression testsuite were due to latent bugs in handling of EH information, which are being discussed in a different thread.
>>>
>>> Performance impact on the standard benchmarks is not conclusive: there are improvements in SPEC2000 of up to 4% and regressions down to -2%, see [*].  SPEC2006 benchmarks will take another day or two to complete and I will update the spreadsheet then.  The benchmarks were run on a Core2 system for all combinations of {-m32/-m64}{-O2/-O3}.
>>>
>>> The effect on compilation time is fairly predictable: about a 10% compile-time increase with 3 iterations.
>>>
>>> OK for trunk?
>>
>> I don't think this is a good idea, especially in the form you implemented it.
>>
>> If we'd want to iterate early optimizations we'd want to do it by iterating
>> an IPA pass so that we benefit from more precise size estimates
>> when trying to inline a function the second time.
>
> Could you elaborate on this a bit?  Early optimizations are gimple passes, so I'm missing your point here.

pass_early_local_passes is an IPA pass; you want to iterate fn1, fn2, fn1, fn2, ..., not fn1, fn1, ..., fn2, fn2, ..., precisely for better inlining.  Thus you need to split pass_early_local_passes into pieces so you can iterate one of the IPA pieces.
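To make the ordering point concrete, here is a minimal sketch in plain C of the two iteration orders being contrasted.  It is illustrative only: fn_node and early_local_passes are hypothetical stand-ins, not GCC's actual call-graph or pass-manager interfaces.

/* Hypothetical stand-ins for a call-graph node and for running the
   early local pass pipeline on one function.  */
struct fn_node { const char *name; };
static void early_local_passes (struct fn_node *fn) { (void) fn; }

/* Statically scheduling the pass pipeline N times gives
   fn1, fn1, ..., fn2, fn2, ...: each function is re-optimized in
   isolation, so the early inliner never sees updated callee sizes.  */
static void
iterate_per_function (struct fn_node *fns, int nfns, int niter)
{
  for (int f = 0; f < nfns; f++)
    for (int i = 0; i < niter; i++)
      early_local_passes (&fns[f]);
}

/* Iterating the IPA piece instead gives fn1, fn2, ..., fn1, fn2, ...:
   a later iteration revisits each caller only after its callees were
   cleaned up in the previous one, so the size estimates used when
   deciding whether to inline are more precise.  */
static void
iterate_ipa_piece (struct fn_node *fns, int nfns, int niter)
{
  for (int i = 0; i < niter; i++)
    for (int f = 0; f < nfns; f++)
      early_local_passes (&fns[f]);
}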
>> Also statically
>> scheduling the passes will mess up dump files and you have no
>> chance of, say, noticing that nothing changed for function f and its
>> callees in iteration N and thus you can skip processing them in
>> iteration N + 1.
>
> Yes, these are the shortcomings.  The dump file name changes can be fixed, e.g., by adding a suffix to the passes on iterations after the first one.  The analysis to avoid unnecessary iterations is a more complex problem.

Sure.  I analyzed early passes by manually duplicating them and testing that they do nothing for tramp3d, which they pretty much all did at some point.

>>
>> So, at least you should split the pass_early_local_passes IPA pass
>> into three: you'd iterate over the 2nd (definitely not over pass_split_functions
>> though), and the third would be pass_profile and pass_split_functions only.
>> And you'd iterate from the place the 2nd IPA pass is executed, not
>> by scheduling them N times.
>
> OK, I will look into this.
>
>>
>> Then you'd have to analyze the compile-time impact of the IPA
>> splitting on its own when not iterating.  Then you should look
>> at what the optimizations actually were that were performed
>> and led to the improvement (I can see some indirect inlining
>> happening, but everything else would be a bug in the present
>> optimizers in the early pipeline - they are all designed to be
>> roughly independent of each other and _not_ expose new
>> opportunities by iteration).  Thus - testcases?
>
> The initial motivation for the patch was to enable more indirect inlining and devirtualization opportunities.

Hm.

> Since then I have found the patch to be helpful in searching for optimization opportunities and bugs.  E.g., SPEC2006's 471.omnetpp drops 20% with 2 additional iterations of early optimizations [*].  Given that applying more optimizations should, theoretically, not decrease performance, there is likely a very real bug or deficiency behind that.

It is likely early SRA that messes up, or maybe convert switch.  Early passes should really be restricted to always-profitable cleanups.

Your experiment looks useful for tracking down these bugs, but in general I don't think we want to expose iterating early passes.

Richard.

> Thank you,
>
> [*] https://docs.google.com/spreadsheet/ccc?key=0AvK0Y-Pgj7bNdFBQMEJ6d3laeFdvdk9lQ1p0LUFkVFE&hl=en_US
>
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
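As a rough illustration of the split Richard describes above (three IPA pieces, with only the second one iterated, starting from the place the 2nd IPA pass is executed), a sketch along these lines might look as follows.  This is a hypothetical outline, not the posted patch or actual GCC pass-manager code; run_ipa_piece, run_early_local_passes_split, and the exact piece contents are assumptions for illustration.

/* Hypothetical outline of pass_early_local_passes split into three
   IPA pieces, with only the middle piece iterated.  */
enum early_piece
{
  EARLY_PIECE_SETUP,      /* 1st piece: run once before iterating.  */
  EARLY_PIECE_OPTIMIZE,   /* 2nd piece: early inlining and scalar
                             cleanups; the only piece iterated.  */
  EARLY_PIECE_FINISH      /* 3rd piece: pass_profile and
                             pass_split_functions only, run once.  */
};

/* Stub standing in for executing one IPA piece over all functions.  */
static void run_ipa_piece (enum early_piece piece) { (void) piece; }

static void
run_early_local_passes_split (int niter)
{
  run_ipa_piece (EARLY_PIECE_SETUP);

  /* Iterate from the place the 2nd IPA piece is executed, rather than
     statically scheduling the whole pipeline N times.  */
  for (int i = 0; i < niter; i++)
    run_ipa_piece (EARLY_PIECE_OPTIMIZE);

  run_ipa_piece (EARLY_PIECE_FINISH);
}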