From mboxrd@z Thu Jan  1 00:00:00 1970
From: Cyrille Chepelov <cyrille@chepelov.org>
To: gcc@gcc.gnu.org
Subject: Re: Faster compilation speed
Date: Sat, 10 Aug 2002 18:28:00 -0000
Message-id: <20020811012851.GD23533@chepelov.org>
References: <20020810212553.GA22959@chepelov.org> <Pine.LNX.4.44.0208102031550.8641-100000@dberlin.org>
X-SW-Source: 2002-08/msg00609.html

On Sat, Aug 10, 2002, at 08:33:53PM -0400, Daniel Berlin wrote:

> On Sat, 10 Aug 2002, Cyrille Chepelov wrote:
> > I have tried on a grand total of three files, two from today's mainline CVS
> > (updated from anonymous about four hours ago), and one from Linux 2.5.30; as
> > my machine is not exactly the dual-multi-gigahertz, "HT"-interconnected
> > (HyperTransport?) with gobs of memory bandwidth (and what else? 64 bits?) 

(Some brave soul pointed out to me that HT more probably means HyperThreading.
I stand corrected (though being LT surely entitles one to getting cooler toys
than mere mortals).)

> The numbers I get on a p4 with cachegrind are *much* worse in all cases.
> 
> The miss rates are all >2%, which is a far cry from 0.1% and 0.0%.

Aha! This is interesting... Did you run on the same sample files as I did,
or on others? Can you reproduce my numbers if you set --I1=65536,2,64
--D1=65536,2,64 --L2=65536,8,64?
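
For reference, each of those options takes a "size,associativity,line size"
triple, with size and line size in bytes. Here is a minimal sketch of what the
three settings above describe; the helper name cache_geometry is mine, not
anything from Valgrind:

```python
def cache_geometry(spec):
    """Parse a 'size,assoc,line_size' triple (sizes in bytes).

    Hypothetical helper for illustration only; it is not part of Valgrind.
    """
    size, assoc, line_size = (int(field) for field in spec.split(","))
    if size % (assoc * line_size) != 0:
        raise ValueError(f"size is not a multiple of assoc * line_size: {spec}")
    return {
        "size": size,
        "assoc": assoc,
        "line_size": line_size,
        # Number of sets = total size / (ways per set * bytes per line).
        "sets": size // (assoc * line_size),
    }


# The three cache descriptions from the command line above.
for name, spec in (("I1", "65536,2,64"),
                   ("D1", "65536,2,64"),
                   ("L2", "65536,8,64")):
    g = cache_geometry(spec)
    print(f"{name}: {g['size'] // 1024} KB, {g['assoc']}-way, "
          f"{g['line_size']}-byte lines, {g['sets']} sets")
```

So the L1 caches come out at 512 sets each and the forced L2 at 128 sets.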

> Are you sure you have valgrind configured right for your cache?

Sure? No. The cache spec numbers did look about rig... D'oh! Looks like
Cachegrind trusts a little too faithfully what this old (A0-stepping) Duron
says. CG believes L2 is 1 KB, whereas in fact it is 64 KB.

I've just re-run the java/parser.c test, forcing --L2=65536,8,64, and
uploaded the results (same place).

What are the first lines of output from vg_annotate on your system?
It certainly sounds unbelievable that a Duron's cache design would beat a P4's.

(There is something curious about the L2 lines in the initial output (the
last three). Reporting 355266 misses for 365721 refs as a 0.0% miss rate
certainly looks strange; I've got to ask Julian about the logic there. It
looks to me as if L2 failed 97% of its mission.)
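
The arithmetic behind that 97% is easy to check. The sketch below computes
the miss rate relative to the L2 refs, and also how large a denominator would
have to be for the same miss count to print as 0.0%. That second part is only
a guess at what might be going on: it assumes the rate is printed with one
decimal and rounded to nearest, and that it might be computed against some
much larger count than the L2 refs. The function names are mine.

```python
def miss_rate_percent(misses, refs):
    """Miss rate as a percentage of the given reference count."""
    return 100.0 * misses / refs


def min_denominator_to_print_zero(misses):
    """Denominator the reference count must exceed for the rate to show 0.0%.

    Assumes one-decimal, round-to-nearest output:
    rate% < 0.05  <=>  misses / refs < 0.0005  <=>  refs > misses * 2000
    """
    return misses * 2000


# Figures quoted above from the initial Cachegrind output.
L2_MISSES = 355266
L2_REFS = 365721

print(f"{miss_rate_percent(L2_MISSES, L2_REFS):.1f}%")   # 97.1%
print(min_denominator_to_print_zero(L2_MISSES))          # 710532000
```

In other words, a 0.0% figure would only be consistent with these misses if
they were being divided by something upwards of 710 million, not by the
365721 L2 refs.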

> I'm going to do this the *real* way, using the performance monitoring 
> counters on my p4, and get *real* numbers.

It would be very interesting to see how far off CG is... CG makes the
implicit assumption that the process runs uninterrupted (I tried welding
cachegrind into UML, but that didn't get me far). The real CPU will
certainly give you a more lively picture... (The performance monitoring
counters are not per-process on Linux, are they?)

	-- Cyrille

-- 
Grumpf.