From: "Shih-Yuan Lee (FourDollars)" <sylee@canonical.com>
To: "Ondřej Bílka" <neleai@seznam.cz>
Cc: "Joseph S. Myers" <joseph@codesourcery.com>,
libc-ports@sourceware.org, Jesse Sung <jesse.sung@canonical.com>,
patches@eglibc.org, YC Cheng <yc.cheng@canonical.com>,
rex.tsai@canonical.com
Subject: Re: [Patches] [PATCH] ARM: NEON detected memcpy.
Date: Thu, 04 Apr 2013 04:15:00 -0000
Message-ID: <CAAT15mMZgtfcUr3rgz3BiY-v14-DW9u1LHP+5jp2rD3uxA+=sw@mail.gmail.com>
In-Reply-To: <20130403161949.GA6759@domone.kolej.mff.cuni.cz>
Hi Ondrej,
I do have some benchmark data.
--- Running benchmarks (average case/perfect alignment case) ---
very small data test:
memcpy_arm : (3 bytes copy) = 86.2 MB/s / 88.3 MB/s
memcpy_neon : (3 bytes copy) = 53.4 MB/s / 54.5 MB/s
memcpy_arm : (4 bytes copy) = 79.8 MB/s / 62.9 MB/s
memcpy_neon : (4 bytes copy) = 72.5 MB/s / 73.9 MB/s
memcpy_arm : (5 bytes copy) = 91.0 MB/s / 78.7 MB/s
memcpy_neon : (5 bytes copy) = 90.2 MB/s / 91.0 MB/s
memcpy_arm : (7 bytes copy) = 109.5 MB/s / 104.7 MB/s
memcpy_neon : (7 bytes copy) = 122.1 MB/s / 126.6 MB/s
memcpy_arm : (8 bytes copy) = 122.4 MB/s / 122.4 MB/s
memcpy_neon : (8 bytes copy) = 142.0 MB/s / 148.2 MB/s
memcpy_arm : (11 bytes copy) = 157.8 MB/s / 161.3 MB/s
memcpy_neon : (11 bytes copy) = 193.8 MB/s / 196.2 MB/s
memcpy_arm : (12 bytes copy) = 170.1 MB/s / 172.7 MB/s
memcpy_neon : (12 bytes copy) = 206.8 MB/s / 212.5 MB/s
memcpy_arm : (15 bytes copy) = 204.0 MB/s / 209.6 MB/s
memcpy_neon : (15 bytes copy) = 247.5 MB/s / 270.3 MB/s
memcpy_arm : (16 bytes copy) = 212.2 MB/s / 225.6 MB/s
memcpy_neon : (16 bytes copy) = 175.3 MB/s / 252.2 MB/s
memcpy_arm : (24 bytes copy) = 274.6 MB/s / 326.5 MB/s
memcpy_neon : (24 bytes copy) = 244.7 MB/s / 367.8 MB/s
memcpy_arm : (31 bytes copy) = 333.3 MB/s / 399.2 MB/s
memcpy_neon : (31 bytes copy) = 304.3 MB/s / 463.5 MB/s
L1 cached data:
memcpy_arm : (4096 bytes copy) = 1295.5 MB/s / 2691.8 MB/s
memcpy_neon : (4096 bytes copy) = 1826.3 MB/s / 2021.8 MB/s
memcpy_arm : (6144 bytes copy) = 1306.5 MB/s / 2724.1 MB/s
memcpy_neon : (6144 bytes copy) = 1857.8 MB/s / 2053.2 MB/s
L2 cached data:
memcpy_arm : (65536 bytes copy) = 1291.5 MB/s / 2304.8 MB/s
memcpy_neon : (65536 bytes copy) = 1866.5 MB/s / 2441.7 MB/s
memcpy_arm : (98304 bytes copy) = 1285.6 MB/s / 2283.8 MB/s
memcpy_neon : (98304 bytes copy) = 1860.7 MB/s / 2454.7 MB/s
SDRAM:
memcpy_arm : (2097152 bytes copy) = 466.7 MB/s / 736.5 MB/s
memcpy_neon : (2097152 bytes copy) = 727.5 MB/s / 868.8 MB/s
memcpy_arm : (3145728 bytes copy) = 507.9 MB/s / 854.7 MB/s
memcpy_neon : (3145728 bytes copy) = 852.9 MB/s / 1038.0 MB/s
(*) 1 MB = 1000000 bytes
(*) 'memcpy_arm' - an implementation for older ARM cores from glibc-ports
A similar benchmark is at
http://sourceware.org/ml/libc-ports/2009-07/msg00000.html .
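For readers who want to reproduce figures in the same units, the following is a minimal sketch of how a MB/s number with 1 MB = 1000000 bytes could be computed. It is not the harness that produced the numbers above; the names `copy_fn`, `mb_per_sec` and `bench_copy` are hypothetical, and the timing loop assumes a POSIX system that provides `clock_gettime`.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by the copy routines under test (hypothetical name). */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Convert bytes and elapsed seconds to MB/s, using 1 MB = 1000000 bytes. */
static double mb_per_sec(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / 1e6 / seconds;
}

/* Time `iters` copies of `n` bytes and return the throughput in MB/s. */
static double bench_copy(copy_fn fn, void *dst, const void *src,
                         size_t n, long iters)
{
    /* Calling through a volatile pointer keeps the compiler from
       removing the copies as dead code. */
    copy_fn volatile f = fn;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        f(dst, src, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        secs = 1e-9;  /* guard against a timer too coarse for the run */
    return mb_per_sec((double)n * (double)iters, secs);
}
```

Calling `bench_copy(memcpy, dst, src, 4096, 100000)` would time the C library's own `memcpy`; substituting another routine with the same signature gives a comparable figure. A sketch like this does not control alignment or cache state, so it cannot separate the average case from the perfectly aligned case.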
Regards,
$4
On Thu, Apr 4, 2013 at 12:19 AM, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Wed, Apr 03, 2013 at 11:47:36PM +0800, Shih-Yuan Lee (FourDollars) wrote:
>> Hi Joseph,
>>
> ...
>> > I was previously told by people at ARM that NEON memcpy wasn't a good idea
>> > in practice because of raised power consumption, context switch costs etc.
>> > from using NEON in processes that otherwise didn't use it, even if it
>> > appeared superficially beneficial in benchmarks.
>> >
>> Regarding the raised power consumption and context switch costs, I may be
>> able to add a configure option so that users can decide whether they want
>> to use this feature.
>> What do you think?
>>
> A configure option is a bit of overkill.
>
> You need to compare the speed of the NEON implementation against the
> others, then determine the size at which NEON is faster once energy cost
> and context switches are included.
> My first estimate is to use NEON for copies larger than 4096 bytes.
>
> However, to determine the context switch cost of NEON you must account
> for the network effect.
>
> If you use NEON in one function that is called sufficiently often (so
> that the registers are always saved), then adding NEON implementations of
> additional functions does not increase the cost.
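
The size cutoff suggested in the quoted text can be sketched as below. This is only an illustration under stated assumptions, not code from the patch: `NEON_MIN_BYTES`, `choose_copy`, `copy_dispatch`, `copy_arm` and `copy_neon` are hypothetical names, the 4096-byte value is the first estimate from the thread and would need tuning, and both stand-in routines simply call `memcpy` so that the sketch builds on any machine. The caller is assumed to supply `have_neon` from whatever hardware-capability check the platform offers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First estimate from the thread; to be tuned against measurements. */
#define NEON_MIN_BYTES 4096

enum copy_kind { COPY_ARM, COPY_NEON };

/* Stand-ins: a real build would link the two assembly routines here. */
static void *copy_arm(void *d, const void *s, size_t n)
{
    return memcpy(d, s, n);
}

static void *copy_neon(void *d, const void *s, size_t n)
{
    return memcpy(d, s, n);
}

/* Pick NEON only when it is available and the copy is larger than the
   cutoff, so that short copies never touch the NEON register file. */
static enum copy_kind choose_copy(size_t n, int have_neon)
{
    return (have_neon && n > NEON_MIN_BYTES) ? COPY_NEON : COPY_ARM;
}

/* Copy n bytes with the routine selected for this size. */
static void *copy_dispatch(void *d, const void *s, size_t n, int have_neon)
{
    assert(d != NULL && s != NULL);
    switch (choose_copy(n, have_neon)) {
    case COPY_NEON:
        return copy_neon(d, s, n);
    default:
        return copy_arm(d, s, n);
    }
}
```

A per-call branch like this costs a compare on every copy. It also does not capture the network effect described above: if some other frequently called function already uses NEON, the registers are saved anyway and a lower cutoff may be appropriate.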