* [PATCH v2] ARM: Improve armv7 memcpy performance.
From: Will Newton @ 2013-08-30 15:09 UTC (permalink / raw)
To: libc-ports; +Cc: patches
Only enter the aligned copy loop with buffers that can be 8-byte
aligned. This improves performance slightly on Cortex-A9 and
Cortex-A15 cores for large copies with buffers that are 4-byte
aligned but not 8-byte aligned.
ports/ChangeLog.arm:
2013-08-30 Will Newton <will.newton@linaro.org>
* sysdeps/arm/armv7/multiarch/memcpy_impl.S: Tighten check
on entry to aligned copy loop to improve performance.
---
ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Changes in v2:
- Improved description
diff --git a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
index 3decad6..6e84173 100644
--- a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
+++ b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
@@ -369,8 +369,8 @@ ENTRY(memcpy)
cfi_adjust_cfa_offset (FRAME_SIZE)
cfi_rel_offset (tmp2, 0)
cfi_remember_state
- and tmp2, src, #3
- and tmp1, dst, #3
+ and tmp2, src, #7
+ and tmp1, dst, #7
cmp tmp1, tmp2
bne .Lcpy_notaligned
--
1.8.1.4
* Re: [PATCH v2] ARM: Improve armv7 memcpy performance.
From: Carlos O'Donell @ 2013-08-30 17:16 UTC (permalink / raw)
To: Will Newton; +Cc: libc-ports, patches
On 08/30/2013 11:09 AM, Will Newton wrote:
>
> Only enter the aligned copy loop with buffers that can be 8-byte
> aligned. This improves performance slightly on Cortex-A9 and
> Cortex-A15 cores for large copies with buffers that are 4-byte
> aligned but not 8-byte aligned.
>
> ports/ChangeLog.arm:
>
> 2013-08-30 Will Newton <will.newton@linaro.org>
>
> * sysdeps/arm/armv7/multiarch/memcpy_impl.S: Tighten check
> on entry to aligned copy loop to improve performance.
How did you test this?
Did you use the glibc performance microbenchmark?
Does the microbenchmark show gains with this change? What are the numbers?
Cheers,
Carlos.
> ---
> ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> Changes in v2:
> - Improved description
>
> diff --git a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
> index 3decad6..6e84173 100644
> --- a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
> +++ b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
> @@ -369,8 +369,8 @@ ENTRY(memcpy)
> cfi_adjust_cfa_offset (FRAME_SIZE)
> cfi_rel_offset (tmp2, 0)
> cfi_remember_state
> - and tmp2, src, #3
> - and tmp1, dst, #3
> + and tmp2, src, #7
> + and tmp1, dst, #7
> cmp tmp1, tmp2
> bne .Lcpy_notaligned
>