From: Cyrille Chepelov
To: gcc@gcc.gnu.org
Subject: Re: Faster compilation speed
Date: Sat, 10 Aug 2002 18:28:00 -0000
Message-id: <20020811012851.GD23533@chepelov.org>
References: <20020810212553.GA22959@chepelov.org>
X-SW-Source: 2002-08/msg00609.html

On Sat, Aug 10, 2002, at 08:33:53PM -0400, Daniel Berlin wrote:
> On Sat, 10 Aug 2002, Cyrille Chepelov wrote:
> > I have tried on a grand total of three files, two from today's mainline CVS
> > (updated from anonymous about four hours ago), and one from Linux 2.5.30; as
> > my machine is not exactly the dual-multi-gigahertz, "HT"-interconnected
> > (HyperTransport?) with gobs of memory bandwidth (and what else? 64 bits?)

(Some brave soul pointed out to me that HT is more probably HyperThreading. I stand corrected (though being LT surely entitles one to cooler toys than mere mortals get).)

> The numbers I get on a p4 with cachegrind are *much* worse in all cases.
>
> The miss rates are all >2%, which is a far cry from 0.1% and 0.0%.

a-ha! This is interesting... Did you run on the same sample files as I did, or on others? Can you reproduce my numbers if you set --I1=65536,2,64 --D1=65536,2,64 --L2=65536,8,64?

> Are you sure you have valgrind configured right for your cache?

Sure? No. The cache spec numbers did look about rig... D'oh! Looks like Cachegrind trusts a little too faithfully what this old (A0-stepping) Duron says. CG believes L2 is 1 KB, whereas in fact it is 64 KB. I've just re-run the java/parser.c test, forcing --L2=65536,8,64, and uploaded the results (same place).

What are the first lines of output from vg_annotate on your system? It certainly sounds unbelievable that a Duron's cache design beats a P4's.

(There is something curious about the L2 lines from the initial output (the last three ones). Saying that 355266 misses for 365721 refs means a 0.0% miss rate certainly sounds strange; I've got to ask Julian about the logic there.
Looks to me like L2 failed 97% of its mission.)

> I'm going to do this the *real* way, using the performance monitoring
> counters on my p4, and get *real* numbers.

It would be very interesting to see how far off CG falls... CG does make the implicit assumption that the process runs uninterrupted (I tried welding cachegrind into UML, but that didn't get me far). The real CPU will certainly give you a more lively picture. (The performance monitoring counters are not per-process on Linux, are they?)

-- 
Cyrille

-- 
Grumpf.
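A short Python sketch of the arithmetic behind the cache numbers discussed above. It is not from the thread; the names `CacheSpec`, `pct` and `zero_display_bound` are hypothetical. It shows the set count implied by a size,associativity,line-size spec like the ones passed to --I1/--D1/--L2, the L2 miss ratio measured against L2 references (the "97%" above), and how large the denominator would have to be for the same 355266 misses to print as 0.0%. That last part only illustrates one possible reading of the output: it assumes, without confirming it, that the printed L2 rate is taken over all memory references rather than over L2 references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheSpec:
    """Cache geometry as 'size,assoc,line_size' (bytes, ways, bytes)."""
    size: int
    assoc: int
    line_size: int

    @classmethod
    def parse(cls, spec: str) -> "CacheSpec":
        size, assoc, line_size = (int(x) for x in spec.split(","))
        return cls(size, assoc, line_size)

    @property
    def sets(self) -> int:
        # size = sets * assoc * line_size, so sets = size / (assoc * line_size)
        way_bytes = self.assoc * self.line_size
        if self.size % way_bytes:
            raise ValueError("size must be a multiple of assoc * line_size")
        return self.size // way_bytes


def pct(num: int, den: int) -> float:
    """Percentage num/den, e.g. misses over references."""
    return 100.0 * num / den


def zero_display_bound(misses: int) -> int:
    # A rate formatted with one decimal shows "0.0" only below 0.05%,
    # i.e. when the denominator exceeds misses / 0.0005 = misses * 2000.
    return misses * 2000


if __name__ == "__main__":
    print(CacheSpec.parse("65536,2,64").sets)   # 512 sets (I1/D1 spec above)
    print(CacheSpec.parse("65536,8,64").sets)   # 128 sets (L2 spec above)
    l2_refs, l2_misses = 365721, 355266          # figures quoted in the mail
    print(f"{pct(l2_misses, l2_refs):.1f}%")     # 97.1%
    print(zero_display_bound(l2_misses))         # 710532000
```

In other words, for "355266 misses" to be shown as 0.0%, the rate would have to be computed over more than about 710 million references, far more than the 365721 L2 references; whether Cachegrind's denominator really works that way is exactly the question for Julian.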