* [PATCH v2] ARM: Improve armv7 memcpy performance.
From: Will Newton @ 2013-08-30 15:09 UTC (permalink / raw)
To: libc-ports; +Cc: patches
Only enter the aligned copy loop with buffers that can be 8-byte
aligned. This improves performance slightly on Cortex-A9 and
Cortex-A15 cores for large copies with buffers that are 4-byte
aligned but not 8-byte aligned.
ports/ChangeLog.arm:
2013-08-30 Will Newton <will.newton@linaro.org>
* sysdeps/arm/armv7/multiarch/memcpy_impl.S: Tighten check
on entry to aligned copy loop to improve performance.
---
ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Changes in v2:
- Improved description
diff --git a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
index 3decad6..6e84173 100644
--- a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
+++ b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
@@ -369,8 +369,8 @@ ENTRY(memcpy)
cfi_adjust_cfa_offset (FRAME_SIZE)
cfi_rel_offset (tmp2, 0)
cfi_remember_state
- and tmp2, src, #3
- and tmp1, dst, #3
+ and tmp2, src, #7
+ and tmp1, dst, #7
cmp tmp1, tmp2
bne .Lcpy_notaligned
--
1.8.1.4
* Re: [PATCH v2] ARM: Improve armv7 memcpy performance.
From: Carlos O'Donell @ 2013-08-30 17:16 UTC (permalink / raw)
To: Will Newton; +Cc: libc-ports, patches
On 08/30/2013 11:09 AM, Will Newton wrote:
>
> Only enter the aligned copy loop with buffers that can be 8-byte
> aligned. This improves performance slightly on Cortex-A9 and
> Cortex-A15 cores for large copies with buffers that are 4-byte
> aligned but not 8-byte aligned.
>
> ports/ChangeLog.arm:
>
> 2013-08-30 Will Newton <will.newton@linaro.org>
>
> * sysdeps/arm/armv7/multiarch/memcpy_impl.S: Tighten check
> on entry to aligned copy loop to improve performance.
How did you test this?
Did you use the glibc performance microbenchmark?
Does the microbenchmark show gains with this change? What are the numbers?
Cheers,
Carlos.
> ---
> ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> Changes in v2:
> - Improved description
>
> diff --git a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
> index 3decad6..6e84173 100644
> --- a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
> +++ b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
> @@ -369,8 +369,8 @@ ENTRY(memcpy)
> cfi_adjust_cfa_offset (FRAME_SIZE)
> cfi_rel_offset (tmp2, 0)
> cfi_remember_state
> - and tmp2, src, #3
> - and tmp1, dst, #3
> + and tmp2, src, #7
> + and tmp1, dst, #7
> cmp tmp1, tmp2
> bne .Lcpy_notaligned
>