Date: Sat, 29 Oct 2011 00:10:00 -0000
From: Matt
To: Maxim Kuvyrkov
Cc: Richard Guenther, GCC Patches
Subject: Re: [PATCH] Add capability to run several iterations of early optimizations

On Sat, 29 Oct 2011, Maxim Kuvyrkov wrote:

>> I like this variant a lot better than the last one - still it lacks any
>> analysis-based justification for iteration (see my reply to Matt on
>> what I discussed with Honza).
>
> Yes, having a way to tell whether a function has significantly changed
> would be awesome.  My approach here would be to make inline_parameters
> output feedback on how much the size/time metrics have changed for a
> function since the previous run.  If the change is above X%, then queue
> the function's callers for more optimizations.  Similarly, Martin's
> rebuild_cgraph_edges_and_devirt (when that goes into trunk) could queue
> new direct callees and the current function for another iteration if
> new direct edges were resolved.

Figuring out the heuristic will need decent testing on a few projects to
find the "sweet spot" (smallest binary for the time/passes spent) for a
given codebase.  With a few such data points, a reasonable first stab at
the metrics you mention can be made without terminating the iterations
before the known optimal number of passes.  Without those data points, it
will be difficult to make sure the metrics still allow those "sweet spots"
to be reached.

>> Thus, I don't think we want to
>> merge this in its current form or in this stage1.
>
> What is the benefit of pushing this to a later release?  If anything,
> merging the support for iterative optimizations now will allow us to
> consider adding the wonderful smartness to it later.  In the meantime,
> substituting that smartness with a knob is still a great alternative.

I agree (of course).  Having the knob will be very useful for testing and
for determining the acceptance criteria for the later "smartness".  While
terminating early would be a nice optimization, the feature is still
intrinsically useful and deployable without it.

In addition, with LTO enabled, 3+ passes were productive on nearly all of
the projects/modules I tested.  To be fair, without LTO, going beyond 2-3
passes rarely produced improvements unless individual compilation units
were enormous.
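To make the metric-feedback idea above a bit more concrete, here is a
rough sketch of the kind of re-queueing described there.  The types and
names are made up for illustration (this is not actual GCC internals),
and "X%" is just a threshold parameter:

/* Rough sketch only -- hypothetical types and names, not GCC code.
   Idea: after an early-opt iteration, compare a function's estimated
   size/time against the previous iteration; if either moved by more
   than THRESHOLD_PCT percent, requeue its callers for another pass.  */

#include <cmath>
#include <queue>
#include <vector>

struct func_summary
{
  double size, time;            /* estimates from the current iteration */
  double prev_size, prev_time;  /* estimates from the previous iteration */
  std::vector<func_summary *> callers;
};

static bool
changed_significantly (const func_summary &f, double threshold_pct)
{
  double dsize = 100.0 * std::fabs (f.size - f.prev_size)
                 / (f.prev_size != 0.0 ? f.prev_size : 1.0);
  double dtime = 100.0 * std::fabs (f.time - f.prev_time)
                 / (f.prev_time != 0.0 ? f.prev_time : 1.0);
  return dsize > threshold_pct || dtime > threshold_pct;
}

/* Called once per function after the size/time analysis; WORKLIST
   collects the functions to revisit in the next iteration.  */
static void
maybe_requeue_callers (func_summary &f, double threshold_pct,
                       std::queue<func_summary *> &worklist)
{
  if (!changed_significantly (f, threshold_pct))
    return;
  for (size_t i = 0; i < f.callers.size (); ++i)
    worklist.push (f.callers[i]);
}

The iteration would then stop on its own once the worklist comes back
empty, which is exactly the kind of termination criterion the knob would
let us tune and validate.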
There was also the question of whether some of the improvements seen with
multiple passes are indicative of deficiencies in early inlining, CFG, SRA,
etc.  If the knob is available, I'm happy to continue testing on the same
projects I've filed recent LTO/graphite bugs against (glib, zlib, openssl,
scummvm, binutils, etc.) and write a report on what I observe as
"suspicious" improvements that perhaps should be caught/made in a single
pass.

It's worth noting again that while this is a useful feature in and of
itself (especially when combined with LTO), it's *extremely* useful when
coupled with the de-virtualization improvements submitted in other threads.
The examples submitted for inclusion in the test suite aren't academic --
they are reductions of real-world performance issues from a mature (and
shipping) C++-based networking product.  Any C++ codebase that employs
physical separation in its design via Factory patterns, Interface
Segregation, and/or Dependency Inversion will likely see improvements (see
the small example at the end of this mail).  To me, these enhancements
combine to form one of the biggest leaps I've seen in C++ code
optimization -- code that can be clean, OO, *and* fast.

Richard: If there's any additional testing or information I can reasonably
provide to help get this in for this stage1, let me know.  Thanks!

--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt
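P.S.  To make "physical separation via a factory" a bit more concrete,
here is a tiny hypothetical reduction of the pattern I mean.  All names
are made up; this is not one of the actual submitted testcases:

/* Hypothetical reduction, for illustration only.  */

struct Codec                  /* interface used for physical separation */
{
  virtual ~Codec () {}
  virtual int encode (int v) const = 0;
};

struct FastCodec : Codec
{
  virtual int encode (int v) const { return v << 1; }
};

static Codec *
make_codec ()                 /* trivial factory hiding the concrete type */
{
  return new FastCodec ();
}

int
run (int v)
{
  Codec *c = make_codec ();
  /* Once make_codec() has been inlined, the dynamic type of *c is known
     and the call below can be devirtualized -- but inlining and folding
     the devirtualized call needs another round of early optimizations.  */
  int r = c->encode (v);
  delete c;
  return r;
}

With a single early-opt pass the call through Codec typically stays
indirect; iterate once more after the factory is inlined and the call is
devirtualized, and encode() can collapse to a plain shift.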