From: Jeff Sturm
To: Richard.Earnshaw@arm.com
Cc: David Edelsohn, Richard Henderson, "David S. Miller"
Subject: Re: Faster compilation speed
Date: Fri, 23 Aug 2002 15:39:00 -0000
References: <200208220853.JAA29295@cam-mail2.cambridge.arm.com>

On Thu, 22 Aug 2002, Richard Earnshaw wrote:

> OK, now consider it this way. Each cache line miss will cause N bytes to
> be fetched from memory -- I don't know the details, but let's assume that's
> 32 bytes, a typical value. Each tlb entry will address one page -- again
> I don't know the details but 4K is common on many machines.
>
> So, with gcc 2.95.3 we have
>
> -O2 dcache_miss/tlb_miss = 2488 / 26.31 ~= 95
> -O0 dcache_miss/tlb_miss = 3306 / 26.30 ~= 127
>
> Since each dcache miss represents 32 bytes of memory we have 3040 (95 *
> 32) and 4064 bytes fetched per tlb miss we have very nearly 75% and 100%
> of each page being accessed for each miss (it will be lower than this in
> practice, since some lines in a page will probably be fetched more than
> once and others not at all).
>
> However, for gcc 3 we have 1440 and 1920 bytes; that is, we *at best*
> access less than half the memory in each page we touch.

Interesting analysis; thanks. It's actually worse than you say, since
Alpha has 8K pages: 1440 and 1920 bytes per TLB miss cover less than a
quarter of each page. I looked up the EV56 specs and found there are just
64 TLB entries, so for any working set larger than 512K (64 * 8K) some
thrashing would be expected.

For another experiment I installed one of the superpage patches available
for Linux; it enables the granularity hint bits on Alpha to support pages
of up to 4MB. Then I modified ggc-page.c to allocate 4MB chunks by
anonymous mmap. I measured 70% fewer DTB misses for cc1, although
wall-clock time dropped by only about 5%.

So it would appear that TLB misses are indeed important, but not the
overwhelming concern in gcc's performance.

Jeff