From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Berlin
To: Gerald Pfeifer
Cc: Mark Mitchell, Joe Buck, "gcc@gcc.gnu.org"
Subject: Re: C++ compile-time regressions
Date: Thu, 02 Aug 2001 13:53:00 -0000
Message-id: <87y9p2rxny.fsf@cgsoftware.com>
References:
X-SW-Source: 2001-08/msg00139.html

Gerald Pfeifer writes:

> [ Includes patch. Okay for branch and mainline? ]
>
> On Wed, 25 Jul 2001, Gerald Pfeifer wrote:
>> Yes, I've been working on this now. To be honest, I don't have time to
>> do this, but I'll try to have something in time for 3.0.1 -- somehow.
>
> So, here we go. (The first column is the value of PARAM_MAX_INLINE_INSNS.)
>
>          -O2    -O3      -O2      -O3
>  100    8:29   8:48   4000228  3990276
>  500    8:24   8:53   4136996  4126148
>  600    8:33   8:59   4158820  4156068
>  700    8:52   9:32   4169028  4222436
>  800    8:34?  10:27  4179652  4315396
> 1000    9:09  11:27   4239076  4425860
> 1500    9:49  14:05   4336260  4637060
> 2000   10:47  23:47   4435428  4758052
>
> To me, 600 seems like a definite and affordable improvement here; I'd
> be a bit hesitant to go over 700.
>
>>> Realistically, I think we have to be willing to compromise here; the
>>> 3.0.1 compiler is going to be slower *and* probably generate slower
>>> code than 2.95, which is too bad, but that seems to be where we're at.
>>> If we could get to 10-25% on both figures that would be better than
>>> having one figure small and the other massive.
>> The problem is, on both ends of the scale (that is, either slower code
>> or slower generation) the *better* value is already around 25%, so a
>> compromise will be worse than that for *both* values.
>
> While I still see what I wrote as quoted above as a problem, here is the
> patch I had promised.

BTW, I've gotten the performance problem down using a slightly modified
heuristic from integrate.c. On the last run, the compile times were about
the same as at 200 insns, but the performance was *much* better (we're down
to about a 10% speed loss).
When your performance gets shot to hell, it's always being caused by not
inlining things; i.e., at 100 insns, *::begin and *::end are taking >50%
of the runtime, because they aren't being inlined.

With a fixed store motion, we can turn off cse-skip-blocks and
cse-follow-jumps. They buy us absolutely no gain, but cost a lot of time
(in compiling your app, Gerald, CSE accounts for >20% of the compile time
on the files that take the longest to compile). I've got statistics to
back this up.

However, even with cse-skip-blocks and cse-follow-jumps turned off, CSE is
still >15% of the compile, mainly because it's trying to eliminate memory
loads and stores, which PRE and store motion do much faster than it does
(since they don't modify the hash table when a store/load is killed, they
just set a bit or two in a bitvector), and on a global scale.

I'm just completing some benchmark runs to see whether our performance
actually changes if I tell CSE to stop caring about memory (and run store
motion after reload). I sincerely doubt it will, now that load and store
motion should be working. If it does, then PRE and store motion need to be
improved.

--Dan

-- 
"When I was a kid, I went to the store and asked the guy, "Do you have any
toy train schedules?""
        -Steven Wright