From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 01 Nov 2011 20:48:00 -0000
From: Martin Jambor
To: Matt
Cc: Maxim Kuvyrkov, Richard Guenther, GCC Patches
Subject: Re: [PATCH] Add capability to run several iterations of early optimizations
Message-ID: <20111101202236.GB13544@virgil.arch.suse.de>
References: <01F22181-5EA1-46B1-95F6-0F24B92E5FC9@codesourcery.com> <2047F9D7-5DE8-42C3-8E6E-B20A2752AB46@codesourcery.com>
X-SW-Source: 2011-11/txt/msg00085.txt.bz2

Hi,

On Fri, Oct 28, 2011 at 04:06:20PM -0700, Matt wrote:
> ...
>
> I agree (of course). Having the knob will be very useful for testing
> and determining the acceptance criteria for the later "smartness".
> While terminating early would be a nice optimization, the feature is
> still intrinsically useful and deployable without it. In addition,
> when using LTO on nearly all the projects/modules I tested on, 3+
> passes were always productive. To be fair, when not using LTO,
> going beyond 2-3 passes did not often produce improvements unless
> individual compilation units were enormous.

I'm quite surprised you get extra benefit with LTO, since the early
optimizations are exactly the part of the middle end that should
produce the same results whether LTO is used or not.  The only way I
can imagine this happening is that the inlining analysis somehow gets
much better input and can then make much better use of it.  If that
is because of extra early inlining, we might try to catch these
situations when making the IPA inlining decisions, which would work
regardless of any iteration-count cut-off.  If it is because of
something else, it's probably better to (at least try to) tweak the
passes and/or the inlining analysis so that they understand each
other straight away.

> There was also the question of whether some of the improvements seen
> with multiple passes were indicative of deficiencies in early
> inlining, CFG, SRA,

SRA, because it is not flow-sensitive in any way, unfortunately
sometimes produces useless statements which then need to be cleaned
up by forwprop (and possibly dse and others).  We've already talked
with Richi about this and agreed that the early SRA pass should be
dumbed down a little so that it produces far fewer of them.  I'm
afraid I won't be able to submit a patch doing that during this
stage 1, though.
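Just to illustrate the kind of code shape involved (a made-up example,
not taken from any real bug report, and the exact GIMPLE of course
depends on the revision): early SRA replaces the aggregate copy
through "tmp" below with per-field scalar copies, and those copies
only become obviously redundant once forwprop (and possibly dse and
others) have cleaned up after it.

struct point { int x; int y; };

static int
dist2 (struct point p)
{
  struct point tmp = p;          /* aggregate copy; early SRA turns this  */
  int dx = tmp.x;                /* into scalar copies of the two fields, */
  int dy = tmp.y;                /* which forwprop then propagates into   */
  return dx * dx + dy * dy;      /* the uses and dse/DCE cleans up.       */
}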
> etc. If the knob is available, I'm happy to continue
> testing on the same projects I've filed recent LTO/graphite bugs
> against (glib, zlib, openssl, scummvm, binutils, etc.) and write a
> report on what I observe as "suspicious" improvements that perhaps
> should be caught/made in a single pass.
>
> It's worth noting again that while this is a useful feature in and
> of itself (especially when combined with LTO), it's *extremely*
> useful when coupled with the de-virtualization improvements
> submitted in other threads.  The examples submitted for inclusion in
> the test suite aren't academic -- they are reductions of real-world
> performance issues from a mature (and shipping) C++-based networking
> product.  Any C++ codebase that employs physical separation in its
> design via Factory patterns, Interface Segregation, and/or
> Dependency Inversion will likely see improvements.  To me, these
> enhancements combine to form one of the biggest leaps I've seen in
> C++ code optimization -- code that can be clean, OO, *and* fast.

Well, while I understand that trying early inlining again whenever a
new direct call-graph edge appears might help, or might save some work
for the full inliner, I think we should rather enhance the current IPA
infrastructure than grow another one out of the early optimizations,
especially if we aim at LTO: iterating the early optimizations cannot
remove abstraction that is spread across a number of compilation
units.

Martin
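P.S.  Purely as an illustration of the code shape we are talking about
(the names below are invented, this is not one of the submitted
testcases): once make_codec() is early-inlined into run(), the type of
the object behind "c" is known, so the indirect call can become a
direct call-graph edge and the later passes can devirtualize and
inline it.

struct Codec                        // the interface the client code sees
{
  virtual int encode (int v) const = 0;
  virtual ~Codec () {}
};

struct Identity : Codec             // concrete implementation
{
  int encode (int v) const { return v; }
};

static Codec *
make_codec ()                       // trivial factory
{
  static Identity impl;
  return &impl;
}

int
run (int v)
{
  Codec *c = make_codec ();         // after early inlining, "c" is known to
  return c->encode (v);             //  point to an Identity, so the virtual
}                                   //  call can become a direct one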