From: "Ryan S. Arnold"
To: Ondřej Bílka
Cc: Siddhesh Poyarekar, "Carlos O'Donell", Will Newton, "libc-ports@sourceware.org", Patch Tracking
Date: Wed, 04 Sep 2013 17:37:00 -0000
Subject: Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.

On Wed, Sep 4, 2013 at 6:03 AM, Ondřej Bílka wrote:
> On Wed, Sep 04, 2013 at 01:00:09PM +0530, Siddhesh Poyarekar wrote:
>> 2. Scale with size
> Not very important for several reasons. One is that big sizes are cold
> (just look in oprofile output: the loops are less frequent than the headers.)
>
> The second reason is that if we look at the callers, large sizes are
> unlikely to be the bottleneck.

From my experience, extremely large data sizes are not very common, and
optimizing for them gets diminishing returns. I believe that at very
large sizes the pressure is all on the hardware anyway. Prefetching
large amounts of data in a loop takes a fixed amount of time, and given
a large enough amount of data, the overhead introduced by most other
factors is negligible.

>> 4. Measure the effect of dcache pressure on function performance.
>> 5. Measure the effect of icache pressure on function performance.
>>
> Here you really need to base weights on function usage patterns.
> A bigger code size is acceptable for functions that are called more
> often. You need to see the distribution of how calls are clustered to
> get the full picture. A strcmp is least sensitive to icache concerns,
> as when it is called it's mostly 100 times over in a tight loop, so
> size is not a big issue. If the same number of calls is uniformly
> spread through a program, we need stricter criteria.

Icache pressure is probably one of the more difficult things to measure
with a benchmark. I suppose it'd be easier with a pipeline analyzer.
Can you explain how usage pattern analysis might reveal icache pressure?

I'm not sure how useful 'usage patterns' are when considering dcache
pressure.
On Power we have data-cache prefetch instructions, and since we know
that dcache pressure is a reality, we will prefetch if our data sizes
are large enough to outweigh the overhead of prefetching, e.g., when
the data size exceeds the cache-line size.

Ryan