Message-ID: <522621E2.6020903@redhat.com>
Date: Tue, 03 Sep 2013 17:52:00 -0000
From: "Carlos O'Donell"
To: Ondřej Bílka
CC: Will Newton, "libc-ports@sourceware.org", Patch Tracking, Siddhesh Poyarekar
Subject: Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
References: <520894D5.7060207@linaro.org> <5220D30B.9080306@redhat.com> <5220F1F0.80501@redhat.com> <52260BD0.6090805@redhat.com> <20130903173710.GA2028@domone.kolej.mff.cuni.cz>
In-Reply-To: <20130903173710.GA2028@domone.kolej.mff.cuni.cz>

On 09/03/2013 01:37 PM, Ondřej Bílka wrote:
>> We have one, it's the glibc microbenchmark, and we want to expand it,
>> otherwise when ACME comes with their patch for ARM and breaks
>> performance for targets that Linaro cares about I have no way to
>> reject the patch objectively :-)
>>
> Carlos, you are asking for the impossible. When you publish a benchmark
> people will try to maximize the benchmark number. After a certain point
> this becomes possible only by employing shady accounting: move part of
> the time to a place where it will not be measured by the benchmark (for
> example by having a function that is 4kb large; on benchmarks it will
> fit into the instruction cache, but that does not happen in reality).

What is it that I'm asking that is impossible?

> Taking care of the common factors that can cause that is about ten
> times more complex than whole-system benchmarking, and the analysis
> will be quite difficult, as you will get twenty numbers and you will
> need to decide which ones could make a real impact and which won't.

Sorry, could you clarify this a bit more: exactly what is ten times
more complex?

If we have N tests and they produce N numbers, then for a given target,
a given device, and a given workload, there is a set of importance
weights on those N numbers that should give you some kind of relevance.
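To make that concrete, here is a rough sketch of the kind of scoring I
have in mind. The test names, timings and weights below are entirely
made up for illustration; this is not glibc benchtest output, and the
helper is not a proposed interface:

# Hypothetical sketch: fold per-test memcpy timings into one score.
# All numbers and names are invented for illustration only.
import math

# (test, baseline_ns, candidate_ns, weight) -- the weight says how much
# a given size/alignment class matters for the workload we care about.
results = [
    ("memcpy-8B-aligned",      12.0,  11.5, 0.10),
    ("memcpy-64B-aligned",     20.0,  18.0, 0.30),
    ("memcpy-4KB-aligned",    310.0, 290.0, 0.40),
    ("memcpy-4KB-misaligned", 340.0, 360.0, 0.20),
]

def weighted_score(results):
    """Weighted geometric mean of speedups; > 1.0 means the candidate wins."""
    total_w = sum(w for _, _, _, w in results)
    log_sum = sum(w * math.log(base / cand) for _, base, cand, w in results)
    return math.exp(log_sum / total_w)

score = weighted_score(results)
print("composite speedup: %.3f" % score)
print("accept" if score > 1.0 else "reject")

Once the per-test weights for a workload are written down like that,
"better" reduces to a number we can argue about and record on this
list, rather than an impression taken from a graph.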
We should be able to come up with some kind of framework from which we
can clearly say "this patch is better than this other patch." Even if
it is not automated, it should be possible to reason from the results,
and to record that reasoning as a discussion on this list.

>>> The key advantage of the cortex-strings framework is that it allows
>>> graphing the results of benchmarks. Often changes to string function
>>> performance can only really be analysed graphically, as otherwise
>>> you end up with a huge soup of numbers, some going up, some going
>>> down, and it is very hard to separate the signal from the noise.
>>
>> I disagree strongly. You *must* come up with a measurable answer and
>> looking at a graph is never a solution I'm going to accept.
>>
> You can have that opinion.
>
> Looking at performance graphs is the most powerful technique for
> understanding performance. I got most of my improvements from
> analyzing them.

That is a different use for the graphs. I do not disagree that graphing
is a *powerful* way to display information, and that using that
information to produce a new routine is useful.

What I disagree with is using such graphs to argue qualitatively that
your patch is better than the existing implementation. There is always
a quantitative way to say X is better than Y, but it requires breaking
down your expectations and documenting them, e.g. "should be faster
with X alignment on sizes from N bytes to M bytes," and then ranking
based on those criteria.

>> You need to statistically analyze the numbers, assign weights to
>> ranges, and come up with some kind of number that evaluates the
>> results based on *some* formula. That is the only way we are going
>> to keep moving performance forward (against some kind of criteria).
>>
> Accurately assigning these weights is best done by taking a program,
> running it and measuring the time. Without taking this into account
> the weights will not tell you much, as you will likely just optimize
> cold code at the expense of hot code.

I don't disagree with you here.

Cheers,
Carlos.