From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jeff Sturm <jsturm@one-point.com>
To: Richard.Earnshaw@arm.com
Cc: David Edelsohn <dje@watson.ibm.com>, Richard Henderson <rth@redhat.com>, "David S. Miller" <davem@redhat.com>, <gcc@gcc.gnu.org>
Subject: Re: Faster compilation speed 
Date: Fri, 23 Aug 2002 15:39:00 -0000
Message-id: <Pine.LNX.4.44.0208231822110.28519-100000@ops2.one-point.com>
References: <200208220853.JAA29295@cam-mail2.cambridge.arm.com>
X-SW-Source: 2002-08/msg01486.html

On Thu, 22 Aug 2002, Richard Earnshaw wrote:
> OK, now consider it this way.  Each cache line miss will cause N bytes to
> be fetched from memory -- I don't know the details, but lets assume that's
> 32 bytes, a typical value.  Each tlb entry will address one page -- again
> I don't know the details but 4K is common on many machines.
>
> So, with gcc 2.95.3 we have
>
> -O2 dcache_miss/tlb_miss = 2488 / 26.31 ~= 95
> -O0 dcache_miss/tlb_miss = 3306 / 26.30 ~= 127
>
> Since each dcache miss represents 32 bytes of memory we have 3040 (95 *
> 32) and 4064 bytes fetched per tlb miss we have very nearly 75% and 100%
> of each page being accessed for each miss (it will be lower than this in
> practice, since some lines in a page will probably be fetched more than
> once and others not at all).
>
> However, for gcc 3 we have 1440 and 1920 bytes; that is, we *at best*
> access less than half the memory in each page we touch.

Interesting analysis; thanks.  It's actually worse than you say since
Alpha has 8k pages.
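
As a minimal sketch of that point, the arithmetic from the quoted message
can be redone for both page sizes.  The byte counts are the ones quoted
above (which already assume a 32-byte cache line); the function names
page_fraction and print_page_fractions are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Upper bound on the fraction of a page touched per TLB miss, given the
   number of bytes fetched from memory per TLB miss. */
double page_fraction(unsigned bytes_per_tlb_miss, unsigned page_size)
{
    assert(page_size > 0);
    return (double)bytes_per_tlb_miss / (double)page_size;
}

/* Print the bound for 4k and 8k pages, using the figures quoted above. */
void print_page_fractions(void)
{
    static const struct { const char *label; unsigned bytes; } rows[] = {
        { "gcc 2.95.3 -O2", 3040 },
        { "gcc 2.95.3 -O0", 4064 },
        { "gcc 3 (lower)",  1440 },
        { "gcc 3 (higher)", 1920 },
    };
    size_t i;

    for (i = 0; i < sizeof rows / sizeof rows[0]; i++)
        printf("%-16s 4k: %5.1f%%  8k: %5.1f%%\n", rows[i].label,
               100.0 * page_fraction(rows[i].bytes, 4096),
               100.0 * page_fraction(rows[i].bytes, 8192));
}
```

With 8k pages every figure is half its 4k value: at most about 37% and
50% of each page for gcc 2.95.3, and about 18% and 23% for gcc 3.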

I looked up the ev56 specs and found there are just 64 TLB entries.  With
8k pages they cover only 512k (64 x 8k), so for any working set larger
than that some thrashing would be expected.
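
The 512k figure is just entries times page size.  A small sketch of that
calculation follows; the names tlb_reach and fits_in_tlb are made up for
illustration, and the "fits" test is an idealization that ignores
alignment and how the working set is scattered across pages.

```c
#include <assert.h>

/* Bytes of address space covered when every TLB entry maps one page. */
unsigned long long tlb_reach(unsigned entries, unsigned long long page_size)
{
    assert(entries > 0 && page_size > 0);
    return (unsigned long long)entries * page_size;
}

/* Nonzero if a densely packed working set of the given size could be
   mapped by the TLB without evicting any entry. */
int fits_in_tlb(unsigned long long working_set, unsigned entries,
                unsigned long long page_size)
{
    return working_set <= tlb_reach(entries, page_size);
}
```

For example, tlb_reach(64, 8192) gives 524288 bytes, i.e. 512k, and
fits_in_tlb() is false for anything larger.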

For another experiment I installed one of the superpage patches available
for Linux; this enables the granularity hint bits for Alpha to support
pages up to 4MB.  Then I modified ggc-page.c to allocate 4MB chunks by
anonymous mmap.
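
For readers unfamiliar with the technique, allocating a 4MB chunk with
anonymous mmap looks roughly like the sketch below.  This is not the
actual ggc-page.c change; CHUNK_SIZE, alloc_chunk and free_chunk are
hypothetical names.  Whether the kernel backs such a mapping with a
superpage depends on the patch in use, and plain mmap does not by itself
guarantee 4MB alignment.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define CHUNK_SIZE ((size_t)4 << 20)   /* 4MB, matching the superpage size */

/* Map one zero-filled, private, anonymous chunk.  Returns NULL on failure. */
void *alloc_chunk(void)
{
    void *p = mmap(NULL, CHUNK_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Return a chunk obtained from alloc_chunk() to the system. */
void free_chunk(void *p)
{
    if (p != NULL) {
        int rc = munmap(p, CHUNK_SIZE);
        assert(rc == 0);
        (void)rc;                /* silence unused warning under NDEBUG */
    }
}
```

An allocator would then carve its own pages out of each chunk rather than
asking the kernel for them one at a time.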

I then measured 70% fewer dtb misses for cc1, although wall clock time was
reduced by only ~5%.  So it would appear that TLB misses are indeed
important but not the overwhelming concern in gcc's performance.
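
A back-of-envelope reading of those two numbers, as a sketch only: if one
assumes that time spent on dtb misses scales linearly with the miss count
and that the superpage change affected nothing else (neither of which was
measured here), the share of the original wall clock time attributable to
dtb misses can be estimated as below.  The function name is hypothetical.

```c
#include <assert.h>

/* Estimated fraction of the original run time spent on misses, given the
   fraction of time saved and the fraction of misses removed.  Assumes
   miss cost is proportional to miss count. */
double implied_miss_time_share(double time_saved, double misses_removed)
{
    assert(misses_removed > 0.0 && misses_removed <= 1.0);
    assert(time_saved >= 0.0 && time_saved <= 1.0);
    return time_saved / misses_removed;
}
```

implied_miss_time_share(0.05, 0.70) comes out at about 0.07, i.e. roughly
7% of wall clock time under those assumptions.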

Jeff