public inbox for libc-ports@sourceware.org
From: Richard Earnshaw <rearnsha@arm.com>
To: Carlos O'Donell <carlos@redhat.com>
Cc: "Joseph S. Myers" <joseph@codesourcery.com>,
	 "Shih-Yuan Lee (FourDollars)" <sylee@canonical.com>,
	"patches@eglibc.org" <patches@eglibc.org>,
	 "libc-ports@sourceware.org" <libc-ports@sourceware.org>,
	"rex.tsai@canonical.com" <rex.tsai@canonical.com>,
	 "jesse.sung@canonical.com" <jesse.sung@canonical.com>,
	"yc.cheng@canonical.com" <yc.cheng@canonical.com>,
	 Shih-Yuan Lee <fourdollars@gmail.com>
Subject: Re: [PATCH] ARM: NEON detected memcpy.
Date: Tue, 09 Apr 2013 15:00:00 -0000
Message-ID: <51642CF3.2040506@arm.com>
In-Reply-To: <51641077.4000102@redhat.com>

On 09/04/13 13:58, Carlos O'Donell wrote:
> On 04/09/2013 05:04 AM, Richard Earnshaw wrote:
>> On 03/04/13 16:08, Joseph S. Myers wrote:
>>> I was previously told by people at ARM that NEON memcpy wasn't a good idea
>>> in practice because of raised power consumption, context switch costs etc.
>>> from using NEON in processes that otherwise didn't use it, even if it
>>> appeared superficially beneficial in benchmarks.
>>
>> What really matters is system power increase vs performance gain and
>> what you might be able to save if you finish sooner.  If a 10%
>> improvement to memcpy performance comes at a 12% increase in CPU
>> power, then that might seem like a net loss.  But if the CPU is only
>> 50% of the system power, then the increase in system power
>> is just half of that (i.e. 6%), but the performance improvement will
>> still be 10%.  Note that 50% is just an example to make the figures
>> easier here, I've no idea what the real numbers are, and they will be
>> highly dependent on the other components in the system: a back-lit
>> display, in particular, will use a significant amount of power.
>>
>> It's also necessary to think about how the Neon unit in the processor
>> is managed.  Is it power gated or simply clock gated?  Power gated
>> regions are likely to have long power-up times (relative to normal
>> CPU operations), but clock-gated regions are typically
>> instantaneously available.
>>
>> Finally, you need to consider whether the unit is likely to be
>> already in use.  With the increasing trend to using the hard-float
>> ABI, VFP (and Neon) are generally much more widely used in code now
>> than they were, so the other potential cost of using Neon (lazy
>> context switching) is also likely to be less of an issue than if the
>> unit is almost never touched.
>
> My expectation here is that downstream integrators run the
> glibc microbenchmarks, or their own benchmarks, measure power,
> and engage the community to discuss alternate runtime tunings
> for their systems.
>
> The project lacks any generalized whole-system benchmarking,
> but my opinion is that microbenchmarks are the best "first step"
> towards achieving measurable performance goals (since whole-system
> benchmarking is much more complicated).
>
> At present the only policy we have as a community is that faster
> is always better.
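
To make the arithmetic in the quoted example concrete, here is a minimal
sketch of the energy trade-off.  It assumes a simple model that is not stated
in the thread: energy is power multiplied by time, a performance gain of p
shortens the run time to 1/(1+p), and only the CPU's share of system power
rises.  The function name and parameters are hypothetical, and the inputs are
the illustrative figures from the example, not measurements.

```python
def system_energy_ratio(speedup, cpu_power_increase, cpu_share):
    """Relative system energy for a fixed amount of work (1.0 = unchanged).

    Assumed model: energy = power * time, and only the CPU's share of
    the system power budget is affected by the faster routine.
    """
    # System power rises only by the CPU's fraction of the total budget.
    system_power = 1.0 + cpu_power_increase * cpu_share
    # A speedup of p means the same work takes 1 / (1 + p) of the time.
    run_time = 1.0 / (1.0 + speedup)
    return system_power * run_time


if __name__ == "__main__":
    # Illustrative figures from the example: 10% faster, 12% more CPU power.
    print(system_energy_ratio(0.10, 0.12, cpu_share=1.0))  # CPU is the whole system
    print(system_energy_ratio(0.10, 0.12, cpu_share=0.5))  # CPU is half the system
```

Under this model the ratio is above 1.0 when the CPU accounts for all of the
system power and below 1.0 when it accounts for half, which is the point of
the example.  The model ignores idle power after the work finishes and any
power-up cost for the Neon unit.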


You still have to be careful how you measure 'faster'.  Repeatedly
running the same fragment of code under the same boundary conditions
will only ever give you the 'warm caches' number (I-cache, D-cache and
branch target cache), but if the code is called cold (or, in the case of
the branch target cache, with different boundary conditions) most of the
time in real life, that number is unlikely to be very meaningful.
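
As a rough illustration of the warm-versus-cold distinction, the sketch below
times the same copy twice: once back to back, and once after streaming through
a large buffer between iterations to push the data out of cache.  Everything
here is an assumption for illustration: the helper name, the buffer sizes, and
the eviction method are hypothetical.  It only approximates a cold data cache;
it does not cool the instruction cache or the branch predictor, and
interpreter overhead adds noise.  It is not the glibc benchmark harness.

```python
import time


def time_copy(dst, src, evict=None, rounds=30):
    """Return the median time in nanoseconds of one dst[:] = src copy.

    If `evict` is given, it is read and copied before every timed copy so
    that src and dst are likely to have been pushed out of the data caches.
    """
    samples = []
    for _ in range(rounds):
        if evict is not None:
            # Stream through a buffer assumed to be larger than the
            # last-level cache; the result is discarded immediately.
            bytes(evict)
        t0 = time.perf_counter_ns()
        dst[:] = src  # same-length slice assignment: a bulk in-place copy
        samples.append(time.perf_counter_ns() - t0)
    samples.sort()
    return samples[len(samples) // 2]


if __name__ == "__main__":
    src = bytearray(b"x" * (64 * 1024))
    dst = bytearray(len(src))
    big = bytearray(32 * 1024 * 1024)  # assumed to exceed the cache size
    print("warm median ns:", time_copy(dst, src))
    print("cold median ns:", time_copy(dst, src, evict=big))
```

The two medians can differ substantially on the same machine, which is why a
single 'warm' figure says little about a routine that is mostly called cold.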

R.

