From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Berlin
To: Gerald Pfeifer
Cc: Mark Mitchell, Joe Buck, "gcc@gcc.gnu.org"
Subject: Re: C++ compile-time regressions
Date: Thu, 02 Aug 2001 13:53:00 -0000
Message-id: <87y9p2rxny.fsf@cgsoftware.com>
References:
X-SW-Source: 2001-08/msg00139.html

Gerald Pfeifer writes:

> [ Includes patch. Okay for branch and mainline? ]
>
> On Wed, 25 Jul 2001, Gerald Pfeifer wrote:
>> Yes, I've been working on this now. To be honest, I don't have time to
>> do this, but I'll try to have something in time for 3.0.1 -- somehow.
>
> So, here we go. (The first column is the value of PARAM_MAX_INLINE_INSNS.)
>
>          -O2    -O3      -O2      -O3
>  100    8:29   8:48   4000228  3990276
>  500    8:24   8:53   4136996  4126148
>  600    8:33   8:59   4158820  4156068
>  700    8:52   9:32   4169028  4222436
>  800    8:34?  10:27  4179652  4315396
> 1000    9:09  11:27   4239076  4425860
> 1500    9:49  14:05   4336260  4637060
> 2000   10:47  23:47   4435428  4758052
>
> To me, 600 seems like a definite and affordable improvement here; I'd
> be a bit hesitant to go over 700.
>
>>> Realistically, I think we have to be willing to compromise here; the
>>> 3.0.1 compiler is going to be slower *and* probably generate slower
>>> code than 2.95, which is too bad, but that seems to be where we're at.
>>> If we could get to 10-25% on both figures that would be better than
>>> having one figure small and the other massive.
>> The problem is, on both ends of the scale (that is, either slower code
>> or slower generation) the *better* value is already around 25%, so a
>> compromise will be worse than that for *both* values.
>
> While I still see what I wrote as quoted above as a problem, here is the
> patch I had promised.

BTW, I've gotten the performance problem down using a slightly modified
heuristic from integrate.c. On the last run, the compile times were about
the same as at 200 insns, but the performance was *much* better (we're down
to about a 10% speed loss).
When your performance gets shot to hell, it's always being caused by not
inlining things; i.e., at 100 insns, *::begin and *::end are taking >50%
of the runtime, because they aren't being inlined.

With a fixed store motion, we can turn off cse-skip-blocks and
cse-follow-jumps. They buy us absolutely no gain, but cost a lot of time
(in compiling your app, Gerald, CSE accounts for >20% of the compile time
on the files that take the longest to compile). I've got statistics to
back this up.

However, even with cse-skip-blocks and cse-follow-jumps turned off, CSE is
still >15% of the compile, mainly because it's trying to eliminate memory
loads and stores, which PRE and store motion do much faster than it does
(since they don't modify the hash table when a store/load is killed, they
just set a bit or two in a bitvector), and on a global scale.

I'm just completing some benchmark runs to see whether our performance
actually changes if I tell CSE to stop caring about memory (and run store
motion after reload). I sincerely doubt it will, now that load and store
motion should be working. If it does, then PRE and store motion need to be
improved.

--Dan

-- 
"When I was a kid, I went to the store and asked the guy, "Do you have any
toy train schedules?""
        -Steven Wright