Date: Sat, 29 Oct 2011 00:10:00 -0000
From: Matt
To: Maxim Kuvyrkov
Cc: Richard Guenther, GCC Patches
Subject: Re: [PATCH] Add capability to run several iterations of early optimizations

On Sat, 29 Oct 2011, Maxim Kuvyrkov wrote:

>> I like this variant a lot better than the last one - still it lacks any
>> analysis-based justification for iteration (see my reply to Matt on
>> what I discussed with Honza).
>
> Yes, having a way to tell whether a function has significantly changed
> would be awesome.  My approach here would be to make inline_parameters
> output feedback on how much the size/time metrics have changed for a
> function since the previous run.  If the change is above X%, then queue
> the function's callers for more optimizations.  Similarly, Martin's
> rebuild_cgraph_edges_and_devirt (when that goes into trunk) could queue
> new direct callees and the current function for another iteration if
> new direct edges were resolved.

Figuring out the heuristic will need decent testing on a few projects to
find the "sweet spot" (smallest binary for the time/passes spent) for a
given codebase.  With a few such data points, a reasonable first stab at
the metrics you mention can be made without terminating the iterations
before the known optimal number of passes.  Without those data points, it
will be difficult to make sure the metrics still allow those "sweet spots"
to be reached.

>> Thus, I don't think we want to
>> merge this in its current form or in this stage1.
>
> What is the benefit of pushing this to a later release?  If anything,
> merging the support for iterative optimizations now will allow us to
> consider adding the wonderful smartness to it later.  In the meantime,
> substituting that smartness with a knob is still a great alternative.

I agree (of course).  Having the knob will be very useful for testing and
for determining the acceptance criteria for the later "smartness".  While
terminating early would be a nice optimization, the feature is still
intrinsically useful and deployable without it.

In addition, with LTO enabled, 3+ passes were productive on nearly all of
the projects/modules I tested.  To be fair, without LTO, going beyond 2-3
passes rarely produced improvements unless individual compilation units
were enormous.
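To make the metric-feedback idea above a bit more concrete, here is a
rough sketch of the kind of re-queueing described there.  The types and
names are made up for illustration (this is not actual GCC internals),
and "X%" is just a threshold parameter:

/* Rough sketch only -- hypothetical types and names, not GCC code.
   Idea: after an early-opt iteration, compare a function's estimated
   size/time against the previous iteration; if either moved by more
   than THRESHOLD_PCT percent, requeue its callers for another pass.  */

#include <cmath>
#include <queue>
#include <vector>

struct func_summary
{
  double size, time;            /* estimates from the current iteration */
  double prev_size, prev_time;  /* estimates from the previous iteration */
  std::vector<func_summary *> callers;
};

static bool
changed_significantly (const func_summary &f, double threshold_pct)
{
  double dsize = 100.0 * std::fabs (f.size - f.prev_size)
                 / (f.prev_size != 0.0 ? f.prev_size : 1.0);
  double dtime = 100.0 * std::fabs (f.time - f.prev_time)
                 / (f.prev_time != 0.0 ? f.prev_time : 1.0);
  return dsize > threshold_pct || dtime > threshold_pct;
}

/* Called once per function after the size/time analysis; WORKLIST
   collects the functions to revisit in the next iteration.  */
static void
maybe_requeue_callers (func_summary &f, double threshold_pct,
                       std::queue<func_summary *> &worklist)
{
  if (!changed_significantly (f, threshold_pct))
    return;
  for (size_t i = 0; i < f.callers.size (); ++i)
    worklist.push (f.callers[i]);
}

The iteration would then stop on its own once the worklist comes back
empty, which is exactly the kind of termination criterion the knob would
let us tune and validate.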
There was also the question of whether some of the improvements seen with
multiple passes are indicative of deficiencies in early inlining, CFG, SRA,
etc.  If the knob is available, I'm happy to continue testing on the same
projects I've filed recent LTO/graphite bugs against (glib, zlib, openssl,
scummvm, binutils, etc.) and write a report on what I observe as
"suspicious" improvements that perhaps should be caught/made in a single
pass.

It's worth noting again that while this is a useful feature in and of
itself (especially when combined with LTO), it's *extremely* useful when
coupled with the de-virtualization improvements submitted in other threads.
The examples submitted for inclusion in the test suite aren't academic --
they are reductions of real-world performance issues from a mature (and
shipping) C++-based networking product.  Any C++ codebase that employs
physical separation in its design via Factory patterns, Interface
Segregation, and/or Dependency Inversion will likely see improvements (see
the small example at the end of this mail).  To me, these enhancements
combine to form one of the biggest leaps I've seen in C++ code
optimization -- code that can be clean, OO, *and* fast.

Richard: If there's any additional testing or information I can reasonably
provide to help get this in for this stage1, let me know.  Thanks!

--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt
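P.S.  To make "physical separation via a factory" a bit more concrete,
here is a tiny hypothetical reduction of the pattern I mean.  All names
are made up; this is not one of the actual submitted testcases:

/* Hypothetical reduction, for illustration only.  */

struct Codec                  /* interface used for physical separation */
{
  virtual ~Codec () {}
  virtual int encode (int v) const = 0;
};

struct FastCodec : Codec
{
  virtual int encode (int v) const { return v << 1; }
};

static Codec *
make_codec ()                 /* trivial factory hiding the concrete type */
{
  return new FastCodec ();
}

int
run (int v)
{
  Codec *c = make_codec ();
  /* Once make_codec() has been inlined, the dynamic type of *c is known
     and the call below can be devirtualized -- but inlining and folding
     the devirtualized call needs another round of early optimizations.  */
  int r = c->encode (v);
  delete c;
  return r;
}

With a single early-opt pass the call through Codec typically stays
indirect; iterate once more after the factory is inlined and the call is
devirtualized, and encode() can collapse to a plain shift.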