From: "Ondřej Bílka" <neleai@seznam.cz>
To: Richard Earnshaw <rearnsha@arm.com>
Cc: Carlos O'Donell <carlos@redhat.com>,
"Joseph S. Myers" <joseph@codesourcery.com>,
"Shih-Yuan Lee (FourDollars)" <sylee@canonical.com>,
"patches@eglibc.org" <patches@eglibc.org>,
"libc-ports@sourceware.org" <libc-ports@sourceware.org>,
"rex.tsai@canonical.com" <rex.tsai@canonical.com>,
"jesse.sung@canonical.com" <jesse.sung@canonical.com>,
"yc.cheng@canonical.com" <yc.cheng@canonical.com>,
Shih-Yuan Lee <fourdollars@gmail.com>
Subject: Re: [PATCH] ARM: NEON detected memcpy.
Date: Tue, 09 Apr 2013 15:54:00 -0000
Message-ID: <20130409155344.GA8760@domone.kolej.mff.cuni.cz>
In-Reply-To: <51642CF3.2040506@arm.com>
On Tue, Apr 09, 2013 at 04:00:03PM +0100, Richard Earnshaw wrote:
> On 09/04/13 13:58, Carlos O'Donell wrote:
> >On 04/09/2013 05:04 AM, Richard Earnshaw wrote:
> >>On 03/04/13 16:08, Joseph S. Myers wrote:
> >>>I was previously told by people at ARM that NEON memcpy wasn't a good idea
> >>>in practice because of raised power consumption, context switch costs etc.
> >>>from using NEON in processes that otherwise didn't use it, even if it
> >>>appeared superficially beneficial in benchmarks.
> >>
> >>What really matters is system power increase vs performance gain and
> >>what you might be able to save if you finish sooner. If a 10%
> >>improvement to memcpy performance comes at a 12% increase in CPU
> >>power, then that might seem like a net loss. But if the CPU is only
> >>50% of the system power, then the increase in system power
> >>is just half of that (i.e. 6%), but the performance improvement will
> >>still be 10%. Note that these figures are just examples to make the
> >>arithmetic easier here; I've no idea what the real numbers are, and they
> >>will be highly dependent on the other components in the system: a back-lit
> >>display, in particular, will use a significant amount of power.
> >>
> >>It's also necessary to think about how the Neon unit in the processor
> >>is managed. Is it power gated or simply clock gated? Power gated
> >>regions are likely to have long power-up times (relative to normal
> >>CPU operations), but clock-gated regions are typically
> >>instantaneously available.
> >>
> >>Finally, you need to consider whether the unit is likely to be
> >>already in use. With the increasing trend to using the hard-float
> >>ABI, VFP (and Neon) are generally much more widely used in code now
> >>than they were, so the other potential cost of using Neon (lazy
> >>context switching) is also likely to be much less of an issue than if
> >>the unit is almost never touched.
> >
> >My expectation here is that downstream integrators run the
> >glibc microbenchmarks, or their own benchmarks, measure power,
> >and engage the community to discuss alternate runtime tunings
> >for their systems.
> >
> >The project lacks any generalized whole-system benchmarking,
> >but my opinion is that microbenchmarks are the best "first step"
> >towards achieving measurable performance goals (since whole-system
> >benchmarking is much more complicated).
> >
> >At present the only policy we have as a community is that faster
> >is always better.
>
I am rewriting my whole-system benchmarks to be more generic.
Still, measuring performance would be time-consuming: the benchmarks need
at least an hour to gather enough data.
I also cannot replicate the exact measurement conditions, since they
depend on what I am doing on the computer at the time, which varies.
There is also a problem with representativeness. I know the conditions
for popular programs (gcc, firefox), and most other programs show very
similar characteristics, but I do not know anything about the tail.
To get more direct feedback I also run a record/replay benchmark; see my
previous mail.
>
> You still have to be careful how you measure 'faster'. Repeatedly
> running the same fragment of code under the same boundary conditions
> will only ever give you the 'warm caches' number (I, D and branch
> target), but if the code is called cold (or with different boundary
> conditions in the case of the Branch target cache) most of the time
> in real life, that's unlikely to be very meaningful.
>
> R.
>