Date: Thu, 05 Sep 2013 08:04:00 -0000
From: Ondřej Bílka
To: "Ryan S. Arnold"
Cc: Siddhesh Poyarekar, Carlos O'Donell, Will Newton, "libc-ports@sourceware.org", Patch Tracking
Subject: Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
Message-ID: <20130905080421.GA5401@domone.kolej.mff.cuni.cz>
References: <5220F1F0.80501@redhat.com> <52260BD0.6090805@redhat.com>
 <20130903173710.GA2028@domone.kolej.mff.cuni.cz> <522621E2.6020903@redhat.com>
 <20130903185721.GA3876@domone.kolej.mff.cuni.cz> <5226354D.8000006@redhat.com>
 <20130904073008.GA4306@spoyarek.pnq.redhat.com>
 <20130904110333.GA6216@domone.kolej.mff.cuni.cz>

On Wed, Sep 04, 2013 at 12:37:33PM -0500, Ryan S. Arnold wrote:
> On Wed, Sep 4, 2013 at 6:03 AM, Ondřej Bílka wrote:
> > On Wed, Sep 04, 2013 at 01:00:09PM +0530, Siddhesh Poyarekar wrote:
> >> 4. Measure the effect of dcache pressure on function performance
> >> 5. Measure effect of icache pressure on function performance.
> >>
> > Here you really need to base weights on function usage patterns.
> > A bigger code size is acceptable for functions that are called more
> > often. You need to see distribution of how are calls clustered to get
> > full picture. A strcmp is least sensitive to icache concerns, as when it
> > is called it's mostly 100 times over in tight loop so size is not big issue.
> > If same number of call is uniformly spread through program we need
> > stricter criteria.
>
> Icache pressure is probably one of the more difficult things to
> measure with a benchmark. I suppose it'd be easier with a pipeline
> analyzer.
>
> Can you explain how usage pattern analysis might reveal icache pressure?
>
With a profiler it is simple. I profiled Firefox for a while; the results
are here:
http://kam.mff.cuni.cz/~ondra/benchmark_string/strcmp_profile_firefox/result.html

If you look at the 'Delays between calls' graph, you will see a peak that
is likely caused by strcmp being called in a loop. From the graph, about
2/3 of calls happen less than 128 cycles after the previous one.
As only a limited number of cache lines can be accessed in the 128 cycles
between calls, the icache impact per call is smaller.

> I'm not sure how useful 'usage pattern' are when considering dcache
> pressure. On Power we have data-cache prefetch instructions and since
> we know that dcache pressure is a reality, we will prefetch if our
> data sizes are large enough to out-weigh the overhead of prefetching,
> e.g., when the data size exceeds the cacheline size.
>
They are very useful, as the overhead of prefetching is determined by the
usage pattern. You can have two applications that often call memset with
size 16000. The first one uses memset to refresh one static array which is
entirely in L1 cache, so prefetching is harmful. The second one does random
access over 1GB of memory, so prefetching would help.

Switching to prefetching when the size exceeds the cache size has the
advantage of certainty that it will help. The real threshold is lower, as it
is unlikely that the large array passed as an argument is the only thing
that occupies the cache.
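
To make the size-only policy concrete, here is a minimal sketch in C. It is
not the glibc or Power implementation. The names should_prefetch and
memset_sketch, the l1d_bytes parameter, the 64-byte line size, the prefetch
distance, and the "half of L1" threshold are all assumptions chosen for
illustration. It relies on the GCC/Clang builtin __builtin_prefetch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tunables for illustration; not taken from glibc or this thread. */
#define CACHE_LINE     64                 /* assumed line size in bytes */
#define PREFETCH_AHEAD (4 * CACHE_LINE)   /* assumed prefetch distance */

/* Size-only policy: prefetch once the buffer is at least half of the L1
   data cache.  The factor 1/2 is an assumption standing in for "the real
   threshold is lower than the cache size".  */
static int should_prefetch(size_t n, size_t l1d_bytes)
{
    return n >= l1d_bytes / 2;
}

/* memset variant that issues write prefetches only above the threshold. */
void *memset_sketch(void *dst, int c, size_t n, size_t l1d_bytes)
{
    unsigned char *p = dst;
    size_t i = 0;

    /* Small buffers: assume the data is already cached, skip prefetching. */
    if (!should_prefetch(n, l1d_bytes))
        return memset(dst, c, n);

    /* Large buffers: fill one line at a time, prefetching a few lines ahead
       while the target address is still inside the buffer.  */
    for (; i + CACHE_LINE <= n; i += CACHE_LINE) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(p + i + PREFETCH_AHEAD, 1, 0);
        memset(p + i, c, CACHE_LINE);
    }
    memset(p + i, c, n - i);   /* tail shorter than one line */
    return dst;
}
```

With an assumed 32 KiB L1 data cache, a call with size 16000 stays on the
plain path. Note that a policy based on size alone cannot tell the two
applications above apart; for that you need the usage data from profiling.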