From: "Ondřej Bílka" <neleai@seznam.cz>
To: Will Newton <will.newton@linaro.org>
Cc: "Shih-Yuan Lee (FourDollars)" <sylee@canonical.com>,
patches@eglibc.org, libc-ports@sourceware.org,
rex.tsai@canonical.com, jesse.sung@canonical.com,
yc.cheng@canonical.com, Shih-Yuan Lee <fourdollars@gmail.com>
Subject: Re: [PATCH] ARM: NEON detected memcpy.
Date: Wed, 03 Apr 2013 09:19:00 -0000
Message-ID: <20130403091855.GA3467@domone.kolej.mff.cuni.cz>
In-Reply-To: <CANu=DmjOZBWu2D=+0BZxyeGSRbH-heZ+e1ofS8bOWM7yG1hPsw@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1426 bytes --]
On Wed, Apr 03, 2013 at 09:15:46AM +0100, Will Newton wrote:
> On 3 April 2013 08:58, Shih-Yuan Lee (FourDollars) <sylee@canonical.com> wrote:
> > Hi,
> >
> > I am working on a NEON-detecting memcpy.
> > This is based on what Siarhei Siamashka did in 2009 [1].
> >
> > The idea is to read HWCAP and check the NEON bit.
> > If the NEON bit is set, use the NEON-optimized memcpy.
> > If not, use the original memcpy instead.
> >
> > With the NEON-optimized memcpy, memcpy performance improves
> > by about 50% [2].
> >
> > What do you think about this idea? Any comment is welcome.
>
> Hi,
>
> I am working on a similar project within Linaro, which is to add the
> NEON/VFP capable memcpy from cortex-strings[1] to glibc. However I am
> looking at enabling it at runtime via indirect functions which makes
> it slightly more complex than just importing the cortex strings code,
> so I don't have any patches to show you just yet.
>
> [1] https://launchpad.net/cortex-strings
Hi,
You need to optimize the header (the entry path that handles small sizes),
because a typical call copies fewer than 128 bytes.
My measurements of how many 16-byte blocks are used are here:
http://kam.mff.cuni.cz/~ondra/benchmark_string/profile/result.html
If I had code to read the cycle count from a performance counter, I could
provide a tool to show memcpy performance in an arbitrary binary.
On x64 I used overlapping loads/stores to minimize branches. Try how the
attached memcpy works on small inputs.
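
For readers unfamiliar with the overlapping load/store trick mentioned above, here is a minimal sketch for one size class. The helper name copy_8_to_16 is hypothetical and not part of the attachment. It also differs from the attachment in one respect: it uses fixed-size memcpy calls into temporaries, which compilers typically lower to single unaligned moves, instead of casting to uint64_t pointers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy N bytes, where 8 <= N <= 16, without
   branching on the exact length.  The two 8-byte moves overlap when
   N < 16, so every byte in the range is written at least once. */
static void copy_8_to_16(void *dest, const void *src, size_t n)
{
  uint64_t head, tail;

  /* Load both words before storing anything. */
  memcpy(&head, src, 8);
  memcpy(&tail, (const char *)src + n - 8, 8);

  /* Store the first 8 bytes and the last 8 bytes of the range. */
  memcpy(dest, &head, 8);
  memcpy((char *)dest + n - 8, &tail, 8);
}
```

The same pattern with 4-byte and 2-byte moves covers the smaller size classes, which is why a handful of bit tests on the length can replace a byte-by-byte loop.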
[-- Attachment #2: memcpy_generic.c --]
[-- Type: text/plain, Size: 2048 bytes --]
#include <stdint.h>
#include <stdlib.h>
/* Align VALUE down to a multiple of ALIGN bytes.
   ALIGN must be a power of two.  */
#define ALIGN_DOWN(value, align) \
  ALIGN_DOWN_M1(value, (align) - 1)
/* Align VALUE down to a multiple of ALIGN_M1 + 1 bytes.
   Useful if you have precomputed ALIGN - 1.  */
#define ALIGN_DOWN_M1(value, align_m1) \
  (void *)((uintptr_t)(value) \
           & ~(uintptr_t)(align_m1))
/* Align VALUE up to a multiple of ALIGN bytes.
   ALIGN must be a power of two.  */
#define ALIGN_UP(value, align) \
  ALIGN_UP_M1(value, (align) - 1)
/* Align VALUE up to a multiple of ALIGN_M1 + 1 bytes.
   Useful if you have precomputed ALIGN - 1.  */
#define ALIGN_UP_M1(value, align_m1) \
  (void *)(((uintptr_t)(value) + (uintptr_t)(align_m1)) \
           & ~(uintptr_t)(align_m1))
/* Copy 16 bytes from Y to X as two 64-bit words.  This generic version
   uses the same code for aligned and unaligned accesses.  */
#define STOREU(x,y) STORE(x,y)
#define STORE(x,y) \
  do \
    { \
      ((uint64_t*)(x))[0]=((uint64_t*)(y))[0]; \
      ((uint64_t*)(x))[1]=((uint64_t*)(y))[1]; \
    } \
  while (0)
#define LOAD(x) x
#define LOADU(x) x
static char *memcpy_small (char *dest, char *src, size_t no, char *ret);
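void *memcpy_new_u(char *dest, char *src, size_t n)
{
  char *from, *to;
  char *ret = dest;
  if (n < 16)
    {
      return memcpy_small(dest, src, n, dest);
    }
  else
    {
      /* Copy the first and the last 16 bytes; they may overlap each
         other and the blocks copied by the loop below.  */
      STOREU(dest, LOADU(src));
      STOREU(dest + n - 16, LOADU(src + n - 16));
      /* Walk the 16-byte aligned source blocks in between.  */
      to = ALIGN_DOWN(src + n, 16);
      from = ALIGN_DOWN(src + 16, 16);
      /* Advance DEST by the same distance as the source.  */
      dest += from - src;
      while (from != to)
        {
          STOREU(dest, LOAD(from));
          from += 16;
          dest += 16;
        }
    }
  /* memcpy returns the original destination pointer.  */
  return ret;
}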
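/* Copy NO < 16 bytes using at most two possibly overlapping moves per
   size class, then return RET.  */
static char *memcpy_small (char *dest, char *src, size_t no, char *ret)
{
  if (no & (8 + 16))
    {
      /* 8..15 bytes: first and last 8 bytes.  */
      ((uint64_t *) dest)[0] = ((uint64_t *) src)[0];
      ((uint64_t *)(dest + no - 8))[0] = ((uint64_t *)(src + no - 8))[0];
      return ret;
    }
  if (no & 4)
    {
      /* 4..7 bytes: first and last 4 bytes.  */
      ((uint32_t *) dest)[0] = ((uint32_t *) src)[0];
      ((uint32_t *)(dest + no - 4))[0] = ((uint32_t *)(src + no - 4))[0];
      return ret;
    }
  /* 0..3 bytes: copy the leading byte only when the length is odd, so
     that a zero-length copy touches nothing.  */
  if (no & 1)
    dest[0] = src[0];
  if (no & 2)
    {
      ((uint16_t *)(dest + no - 2))[0] = ((uint16_t *)(src + no - 2))[0];
    }
  return ret;
}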
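
To try a memcpy candidate on small inputs, a small correctness harness along the following lines may help. It is a sketch, not part of the attachment: check_small_copies, copy_fn, MAX_LEN and PAD are hypothetical names. It checks every length up to MAX_LEN at several source and destination offsets, and verifies the return value, the copied bytes, and one guard byte on each side of the destination.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LEN 128   /* largest length tested */
#define PAD 16        /* range of offsets tested */

/* Hypothetical type for the routine under test. */
typedef void *(*copy_fn)(void *dest, const void *src, size_t n);

/* Return 0 if COPY behaves like memcpy for all tested cases, else -1. */
static int check_small_copies(copy_fn copy)
{
  unsigned char src[MAX_LEN + 2 * PAD];
  unsigned char dst[MAX_LEN + 2 * PAD];
  size_t i, n, s_off, d_off;

  for (i = 0; i < sizeof src; i++)
    src[i] = (unsigned char)(i * 7 + 1);

  for (n = 0; n <= MAX_LEN; n++)
    for (s_off = 0; s_off < PAD; s_off++)
      for (d_off = 1; d_off < PAD; d_off++)
        {
          memset(dst, 0xA5, sizeof dst);
          if (copy(dst + d_off, src + s_off, n) != dst + d_off)
            return -1;   /* wrong return value */
          if (memcmp(dst + d_off, src + s_off, n) != 0)
            return -1;   /* copied bytes differ */
          if (dst[d_off - 1] != 0xA5 || dst[d_off + n] != 0xA5)
            return -1;   /* wrote outside the destination range */
        }
  return 0;
}
```

Since memcpy_new_u takes char * arguments, it would need a thin wrapper with the copy_fn signature before being passed to this harness.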