From: "Ryan S. Arnold"
To: Ondřej Bílka
Cc: Siddhesh Poyarekar, "Carlos O'Donell", Will Newton, "libc-ports@sourceware.org", Patch Tracking
Date: Wed, 04 Sep 2013 17:37:00 -0000
Subject: Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.

On Wed, Sep 4, 2013 at 6:03 AM, Ondřej Bílka wrote:
> On Wed, Sep 04, 2013 at 01:00:09PM +0530, Siddhesh Poyarekar wrote:
>> 2. Scale with size
> Not very important for several reasons. One is that big sizes are cold
> (just look in oprofile output: the loops are less frequent than the headers.)
>
> The second reason is that if we look at the callers, large sizes are
> unlikely to be the bottleneck.

From my experience, extremely large data sizes are not very common, and
optimizing for them gets diminishing returns. I believe that at very
large sizes the pressure is all on the hardware anyway. Prefetching
large amounts of data in a loop takes a fixed amount of time, and given
a large enough amount of data, the overhead introduced by most other
factors is negligible.

>> 4. Measure the effect of dcache pressure on function performance.
>> 5. Measure the effect of icache pressure on function performance.
>>
> Here you really need to base weights on function usage patterns.
> A bigger code size is acceptable for functions that are called more
> often. You need to see the distribution of how calls are clustered to
> get the full picture. A strcmp is least sensitive to icache concerns,
> as when it is called it's mostly 100 times over in a tight loop, so
> size is not a big issue. If the same number of calls is uniformly
> spread through a program, we need stricter criteria.

Icache pressure is probably one of the more difficult things to measure
with a benchmark. I suppose it'd be easier with a pipeline analyzer.
Can you explain how usage pattern analysis might reveal icache pressure?

I'm not sure how useful 'usage patterns' are when considering dcache
pressure.
On Power we have data-cache prefetch instructions, and since we know
that dcache pressure is a reality, we will prefetch if our data sizes
are large enough to outweigh the overhead of prefetching, e.g., when
the data size exceeds the cache-line size.

Ryan