From: "rguenth at tat dot physik dot uni-tuebingen dot de"
To: gcc-bugs@gcc.gnu.org
Date: Mon, 06 Dec 2004 13:18:00 -0000
Subject: [Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression

------- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de  2004-12-06 13:18 -------
Subject: Re: [4.0 Regression] Inlining limits cause 340% performance regression

On 6 Dec 2004, hubicka at ucw dot cz wrote:

> The cfg inliner per se is not too interesting.  What matters here is the
> code size estimation and profitability estimation.  I am playing with
> this now and trying to get profile-based inlining working.

Yes, I guess the cfg inliner and some early dead code removal passes
should improve code size metrics for stuff like

  template <class X>
  struct Foo {
    enum { val = X::val };
    void foo() { if (val) ... else ... }
  };

with val being const.

> For -n10 and tramp3d.cc I need 2m14s on mainline, 1m31s on the current
> tree-profiling.  With my new implementation I need 0m27s with profile
> feedback and 2m53s without.  I wonder what makes the new heuristics work
> worse without profiling, but just increasing the inline-unit-growth very
> slightly (to 155) I get 0m42s.  This might be just a little instability in

Note that inline-unit-growth is 50 by default, so 155 is not a slight
increase.

> the order of inlining decisions affecting this.  I would be curious how
> those results compare to leafify and whether the 0m27s is not caused by
> misoptimization.

You can check for misoptimization by looking at the final output, i.e.
the rh, vx, vy and vz sums should be nearly zero; the T sum will increase
with the number of iterations.

With mainline, -O2 -fpeel-loops -march=pentium4 -ffast-math
-D__NO_MATH_INLINES (we still need explicit -fpeel-loops to unroll
for (i=0;i<3;++i) a[i]=0;), I need 0m17s for -n 10 with leafification
turned on; with it turned off, runtime increases to 0m31s with
--param inline-unit-growth=175.

> Unless I observe otherwise (on SPEC with intermodule), I will
> apply my current patch and try to improve the profitability analysis
> without profiling incrementally.  Ideally we ought to build an estimated
> profile and use it, but that needs some work, so for the moment I guess I
> will try to experiment with making loop depth available to the cgraph
> code.

Yes, loops could be "auto-leafified", but it will be difficult to
statically check if that is worthwhile.

Richard.

--
Richard Guenther
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
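
For readers outside the thread, a minimal, self-contained sketch of the
pattern Richard describes above (the TrueTag/FalseTag names are
illustrative only and do not come from tramp3d): once foo() is inlined
and val is a compile-time constant, the untaken branch is dead, so an
early dead code removal pass should make the inlined body much smaller
than a naive statement count would suggest.

  // Hypothetical illustration of the pattern discussed in the comment above.
  struct TrueTag  { enum { val = 1 }; };
  struct FalseTag { enum { val = 0 }; };

  template <class X>
  struct Foo {
    enum { val = X::val };          // constant after instantiation
    int foo(int a, int b) {
      if (val)
        return a + b;               // only branch left for Foo<TrueTag>
      else
        return a - b;               // only branch left for Foo<FalseTag>
    }
  };

  int use_true(int a, int b)  { Foo<TrueTag>  f; return f.foo(a, b); }
  int use_false(int a, int b) { Foo<FalseTag> f; return f.foo(a, b); }

After inlining and constant propagation, each use_* function reduces to a
single arithmetic operation, which is the kind of shrinkage the size
estimation would ideally anticipate.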