From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 01 Nov 2011 20:48:00 -0000
From: Martin Jambor
To: Matt
Cc: Maxim Kuvyrkov, Richard Guenther, GCC Patches
Subject: Re: [PATCH] Add capability to run several iterations of early optimizations
Message-ID: <20111101202236.GB13544@virgil.arch.suse.de>
References: <01F22181-5EA1-46B1-95F6-0F24B92E5FC9@codesourcery.com> <2047F9D7-5DE8-42C3-8E6E-B20A2752AB46@codesourcery.com>
X-SW-Source: 2011-11/txt/msg00085.txt.bz2

Hi,

On Fri, Oct 28, 2011 at 04:06:20PM -0700, Matt wrote:
> ...
>
> I agree (of course). Having the knob will be very useful for testing
> and determining the acceptance criteria for the later "smartness".
> While terminating early would be a nice optimization, the feature is
> still intrinsically useful and deployable without it. In addition,
> when using LTO on nearly all the projects/modules I tested on, 3+
> passes were always productive. To be fair, when not using LTO,
> going beyond 2-3 passes did not often produce improvements unless
> individual compilation units were enormous.

I'm quite surprised you get extra benefit with LTO, since the early
optimizations are exactly the part of the middle end that should
produce the same results whether LTO is used or not.  The only way I
can imagine this happening is that the inlining analysis somehow gets
much better input and can then make much better use of it.  If that
is because of extra early inlining, we might try to catch these
situations when making the IPA inlining decisions, which would work
regardless of any iteration-count cut-off.  If it is because of
something else, it's probably better to (at least try to) tweak the
passes and/or the inlining analysis so that they understand each
other straight away.

> There was also the question of whether some of the improvements seen
> with multiple passes were indicative of deficiencies in early
> inlining, CFG, SRA,

SRA, because it is not flow-sensitive in any way, unfortunately
sometimes produces useless statements which then need to be cleaned
up by forwprop (and possibly dse and others).  We've already talked
with Richi about this and agreed that the early SRA pass should be
dumbed down a little so that it produces far fewer of them.  I'm
afraid I won't be able to submit a patch doing that during this
stage 1, though.
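Just to illustrate the kind of code shape involved (a made-up example,
not taken from any real bug report, and the exact GIMPLE of course
depends on the revision): early SRA replaces the aggregate copy
through "tmp" below with per-field scalar copies, and those copies
only become obviously redundant once forwprop (and possibly dse and
others) have cleaned up after it.

struct point { int x; int y; };

static int
dist2 (struct point p)
{
  struct point tmp = p;          /* aggregate copy; early SRA turns this  */
  int dx = tmp.x;                /* into scalar copies of the two fields, */
  int dy = tmp.y;                /* which forwprop then propagates into   */
  return dx * dx + dy * dy;      /* the uses and dse/DCE cleans up.       */
}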
> etc. If the knob is available, I'm happy to continue
> testing on the same projects I've filed recent LTO/graphite bugs
> against (glib, zlib, openssl, scummvm, binutils, etc.) and write a
> report on what I observe as "suspicious" improvements that perhaps
> should be caught/made in a single pass.
>
> It's worth noting again that while this is a useful feature in and
> of itself (especially when combined with LTO), it's *extremely*
> useful when coupled with the de-virtualization improvements
> submitted in other threads.  The examples submitted for inclusion in
> the test suite aren't academic -- they are reductions of real-world
> performance issues from a mature (and shipping) C++-based networking
> product.  Any C++ codebase that employs physical separation in its
> design via Factory patterns, Interface Segregation, and/or
> Dependency Inversion will likely see improvements.  To me, these
> enhancements combine to form one of the biggest leaps I've seen in
> C++ code optimization -- code that can be clean, OO, *and* fast.

Well, while I understand that trying early inlining again whenever a
new direct call-graph edge appears might help, or might save some work
for the full inliner, I think we should rather enhance the current IPA
infrastructure than grow another one out of the early optimizations,
especially if we aim at LTO: iterating the early optimizations cannot
remove abstraction that is spread across a number of compilation
units.

Martin
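P.S.  Purely as an illustration of the code shape we are talking about
(the names below are invented, this is not one of the submitted
testcases): once make_codec() is early-inlined into run(), the type of
the object behind "c" is known, so the indirect call can become a
direct call-graph edge and the later passes can devirtualize and
inline it.

struct Codec                        // the interface the client code sees
{
  virtual int encode (int v) const = 0;
  virtual ~Codec () {}
};

struct Identity : Codec             // concrete implementation
{
  int encode (int v) const { return v; }
};

static Codec *
make_codec ()                       // trivial factory
{
  static Identity impl;
  return &impl;
}

int
run (int v)
{
  Codec *c = make_codec ();         // after early inlining, "c" is known to
  return c->encode (v);             //  point to an Identity, so the virtual
}                                   //  call can become a direct one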