Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.

public inbox for libc-ports@sourceware.org

From: Siddhesh Poyarekar <siddhesh@redhat.com>
To: "Carlos O'Donell" <carlos@redhat.com>
Cc: "Ondřej Bílka" <neleai@seznam.cz>,
	"Will Newton" <will.newton@linaro.org>,
	"libc-ports@sourceware.org" <libc-ports@sourceware.org>,
	"Patch Tracking" <patches@linaro.org>
Subject: Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
Date: Wed, 04 Sep 2013 07:27:00 -0000
Message-ID: <20130904073008.GA4306@spoyarek.pnq.redhat.com>
In-Reply-To: <5226354D.8000006@redhat.com>

On Tue, Sep 03, 2013 at 03:15:25PM -0400, Carlos O'Donell wrote:
> I agree. The eventual goal of the project is to have some kind of
> whole system benchmarking that allows users to feed in their profiles
> and allow us as developers to see what users are doing with our library.
> 
> Just like CPU designers feed in a whole distribution of applications
> and look at the probability of instruction selection and tweak instruction
> to microcode mappings.
> 
> I am willing to accept a certain error in the process as long as I know
> we are headed in the right direction. If we all disagree about the
> direction we are going in then we should talk about it.
> 
> I see:
> 
> microbenchmarks -> whole system benchmarks -> profile driven optimizations

I've mentioned this before: microbenchmarks are not a stepping stone
to whole system benchmarks, and neither replaces the other.  We need
to work on both in parallel because they have different goals.

A microbenchmark would have parameters such as alignment, size and
cache pressure to determine how an implementation scales.  These are
generic numbers (i.e. they're not tied to specific high level
workloads) that a developer can use to design their programs.
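
As a rough sketch of such a microbenchmark, the helper below times
memcpy for one combination of size and buffer offsets.  The name
bench_memcpy and its parameters are illustrative and not taken from
glibc's benchtests; it assumes a POSIX clock_gettime and a GCC/Clang
style compiler barrier, and it does not model cache pressure at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper, not part of glibc: average nanoseconds per
   memcpy call for one combination of size and buffer offsets.
   Returns a negative value on bad arguments or allocation failure.  */
static double
bench_memcpy (size_t size, size_t src_off, size_t dst_off, unsigned iters)
{
  if (iters == 0)
    return -1.0;

  /* Over-allocate so the offsets can misalign the start of each copy
     relative to whatever alignment malloc provides.  */
  unsigned char *src = malloc (size + src_off + 1);
  unsigned char *dst = malloc (size + dst_off + 1);
  if (src == NULL || dst == NULL)
    {
      free (src);
      free (dst);
      return -1.0;
    }
  memset (src, 0x5a, size + src_off + 1);
  memset (dst, 0, size + dst_off + 1);

  struct timespec t0, t1;
  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (unsigned i = 0; i < iters; i++)
    {
      memcpy (dst + dst_off, src + src_off, size);
      /* GCC/Clang compiler barrier so the copy is not optimized out.  */
      __asm__ volatile ("" : : "r" (dst) : "memory");
    }
  clock_gettime (CLOCK_MONOTONIC, &t1);

  /* Sanity check: the timed copy must have produced the right bytes.  */
  assert (memcmp (dst + dst_off, src + src_off, size) == 0);

  double ns = (double) (t1.tv_sec - t0.tv_sec) * 1e9
              + (double) (t1.tv_nsec - t0.tv_nsec);
  free (src);
  free (dst);
  return ns / iters;
}
```

A driver would call this over a grid of sizes and offsets (for
example bench_memcpy (256, 1, 3, 1000)) and report one number per
combination, which is what makes the results independent of any
particular workload.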

Whole system benchmarks, however, work at a different level.  They
would give an average case number that describes how a specific
recipe impacts the performance of a set of programs.  An
administrator would use these to tweak the system for the workload.

> I would be happy to accept a patch that does:
> * Shows the benchmark numbers.
> * Explains relevant factors not caught by the benchmark that affect
>   performance, what they are, and why the patch should go in.
> 
> My goal is to increase the quality of the written rationales for
> performance related submissions.

Agreed.  In fact, this should go in as a large comment in the
implementation itself.  Someone had mentioned in the past (was it
Torvald?) that every assembly implementation we write should be as
verbose in comments as it can possibly be so that there is no
ambiguity about the rationale for selection of specific instruction
sequences over others.

> >> If we have N tests and they produce N numbers, for a given target,
> >> for a given device, for a given workload, there is a set of importance
> >> weights on N that should give you some kind of relevance.
> >>
> > You are jumping to case when we will have these weights. Problematic
> > part is getting those.
> 
> I agree.
> 
> It's hard to know the weights without having an intuitive understanding
> of the applications you're running on your system and what's relevant
> for their performance.

1. Assume aligned input.  Nothing should take (any noticeable)
   performance away from aligned copies/moves.
2. Scale with size.
3. Provide acceptable performance for unaligned sizes without
   penalizing the aligned case.
4. Measure the effect of dcache pressure on function performance.
5. Measure the effect of icache pressure on function performance.

The weight given to 4 and 5 would be higher or lower depending on the
actual cost of icache/dcache misses on a given processor.  For 1-3,
I'd go in that order of priority, with little concern for unaligned
cases.
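
To make the idea of weights concrete, here is a minimal sketch that
folds N per-test numbers into one score.  Everything in it is an
assumption for illustration: the struct test_result layout, the use
of a candidate/current time ratio per test, and the weighted
arithmetic mean are not something the thread has agreed on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result for one test: time of the candidate relative
   to the current implementation (below 1.0 is faster), and the
   importance weight assigned to that test.  */
struct test_result
{
  const char *name;
  double ratio;
  double weight;
};

/* Weighted mean of the ratios.  Returns -1.0 if any weight is
   negative or if the weights do not sum to a positive value.  */
static double
weighted_score (const struct test_result *r, size_t n)
{
  assert (r != NULL || n == 0);

  double sum = 0.0, wsum = 0.0;
  for (size_t i = 0; i < n; i++)
    {
      if (r[i].weight < 0.0)
        return -1.0;
      sum += r[i].ratio * r[i].weight;
      wsum += r[i].weight;
    }
  return wsum > 0.0 ? sum / wsum : -1.0;
}
```

With the ordering above, the aligned and size-scaling tests would
carry the largest weights, the unaligned tests a small one, and the
dcache/icache tests a weight chosen per processor.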

Siddhesh
