public inbox for libc-ports@sourceware.org
 help / color / mirror / Atom feed
From: "Ondřej Bílka" <neleai@seznam.cz>
To: Will Newton <will.newton@linaro.org>
Cc: Carlos O'Donell <carlos@redhat.com>,
	"libc-ports@sourceware.org" <libc-ports@sourceware.org>,
	Patch Tracking <patches@linaro.org>,
	Siddhesh Poyarekar <siddhesh@redhat.com>
Subject: Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
Date: Mon, 02 Sep 2013 19:57:00 -0000	[thread overview]
Message-ID: <20130902195746.GA13759@domone.kolej.mff.cuni.cz> (raw)
In-Reply-To: <CANu=DmhA9QvSe6RS72Db2P=yyjC72fsE8d4QZKHEcNiwqxNMvw@mail.gmail.com>

On Mon, Sep 02, 2013 at 02:58:23PM +0100, Will Newton wrote:
> On 30 August 2013 20:26, Carlos O'Donell <carlos@redhat.com> wrote:
> > On 08/30/2013 02:48 PM, Will Newton wrote:
> >> On 30 August 2013 18:14, Carlos O'Donell <carlos@redhat.com> wrote:
> >>>> Ping?
> >>>
> >>> How did you test the performance?
> >>>
> >>> glibc has a performance microbenchmark, did you use that?
> >>
> >> No, I used the cortex-strings package developed by Linaro for
> >> benchmarking various string functions against one another[1].
> >>
> >> I haven't checked the glibc benchmarks but I'll look into that. It's
> >> quite a specific case that shows the problem so it may not be obvious
> >> which one is better however.
> >
> > If it's not obvious how is someone supposed to review this patch? :-)
> 
> With difficulty. ;-)
> 
> Joseph has raised some good points about the comments and I'll go back
> through the code and make sure everything is correct in that regard.
> The change was actually made to the copy of the code in cortex-strings
> some time ago but I delayed pushing the patch due to the 2.18 release
> so I have to refresh my memory somewhat.
> 
> Ideally we would have an agreed upon benchmark with which everyone
> could analyse the performance of the code on their systems, however
> that does not seem to exist as far as I can tell.
>
Well, for measuring performance about only way that everybody will agree
with is compile implementations as old.so and new.so and then use

LD_PRELOAD=old.so time cmd
LD_PRELOAD=new.so time cmd

in loop until you calculate that there is statistically significant
difference (provided that commands you use are representative enough).

For any other somebody will argue that its opposite because you
forgotten to take some factor into account.

Even when you change LD_PRELOAD=old.so implementation that to
accurately measure time spend in function it need not be enough.

You could have implementation that will be 5 cycles faster on that
benchmark but slower in reality because

1) Its code is 1000 bytes bigger than alternative. Gains in function
itself will be eaten by instruction cache misses outside function. 
Or
2) Function agressively prefetches data (say loop that prefetches 
lines 512 bytes after current buffer position). This makes benchmark
performance better but cache will be littered by data after buffer end
buffers, real performance suffers.
Or
3) For malloc saving metadata at same cache line as start of allocated
memory could make benchmark look bad due to cache misses. But it will
improve performance as user will write there and metadata write serves
as prefetch.
or
...


> > e.g.
> > http://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/view/head:/benchmarks/multi/harness.c
> >
> > I would not call `multi' exhaustive, and while neither is the glibc performance
> > benchmark tests the glibc tests have received review from the glibc community
> > and are our preferred way of demonstrating performance gains when posting
> > performance patches.
> 
> The key advantage of the cortex-strings framework is that it allows
> graphing the results of benchmarks. Often changes to string function
> performance can only really be analysed graphically as otherwise you
> end up with a huge soup of numbers, some going up, some going down and
> it is very hard to separate the signal from the noise.
> 
Like following? http://kam.mff.cuni.cz/~ondra/benchmark_string/i7_ivy_bridge/memcpy_profile_loop/results_rand/result.html

On real workloads of memcpy it is still bit hard to see what is going on:
http://kam.mff.cuni.cz/~ondra/benchmark_string/i7_ivy_bridge/memcpy_profile_loop/results_gcc/result.html


> The glibc benchmarks also have some other weaknesses that should
> really be addressed, hopefully I'll have some time to write patches
> for some of this work.
> 
How will you fix measuring in tight loop with same arguments and only 32 times?

  parent reply	other threads:[~2013-09-02 19:57 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-12  7:55 Will Newton
2013-08-27  7:46 ` Will Newton
2013-08-30 17:14   ` Carlos O'Donell
2013-08-30 18:48     ` Will Newton
2013-08-30 19:26       ` Carlos O'Donell
2013-09-02 14:18         ` Will Newton
2013-09-03 16:14           ` Carlos O'Donell
     [not found]         ` <CANu=DmhA9QvSe6RS72Db2P=yyjC72fsE8d4QZKHEcNiwqxNMvw@mail.gmail.com>
2013-09-02 14:18           ` benchmark improvements (Was: Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.) Siddhesh Poyarekar
2013-09-03 13:46             ` Will Newton
2013-09-03 17:48               ` Ondřej Bílka
2013-09-02 19:57           ` Ondřej Bílka [this message]
2013-09-03 16:18           ` [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance Carlos O'Donell
2013-09-03 17:37             ` Ondřej Bílka
2013-09-03 17:52               ` Carlos O'Donell
2013-09-03 18:57                 ` Ondřej Bílka
2013-09-03 19:15                   ` Carlos O'Donell
2013-09-04  7:27                     ` Siddhesh Poyarekar
2013-09-04 11:03                       ` Ondřej Bílka
2013-09-04 11:43                         ` Siddhesh Poyarekar
2013-09-04 17:37                         ` Ryan S. Arnold
2013-09-05  8:04                           ` Ondřej Bílka
2013-09-04 15:30                       ` Carlos O'Donell
2013-09-04 17:35                       ` Ryan S. Arnold
2013-09-05 11:07                         ` Ondřej Bílka
2013-09-05 11:54                         ` Joseph S. Myers
2013-09-03 19:34               ` Ryan S. Arnold
2013-09-07 11:55                 ` Ondřej Bílka
2013-09-03 19:31             ` Ryan S. Arnold
2013-09-03 19:54               ` Carlos O'Donell
2013-09-03 20:56                 ` Ryan S. Arnold
2013-09-03 23:29                   ` Ondřej Bílka
2013-09-03 23:31                   ` Carlos O'Donell
2013-09-03 22:27               ` Ondřej Bílka
2013-08-29 23:58 ` Joseph S. Myers
2013-08-30 14:56   ` Will Newton
2013-08-30 15:18     ` Joseph S. Myers
2013-08-30 18:46       ` Will Newton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130902195746.GA13759@domone.kolej.mff.cuni.cz \
    --to=neleai@seznam.cz \
    --cc=carlos@redhat.com \
    --cc=libc-ports@sourceware.org \
    --cc=patches@linaro.org \
    --cc=siddhesh@redhat.com \
    --cc=will.newton@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).