From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact libc-ports-help@sourceware.org; run by ezmlm
Message-ID: <5220D385.4040501@redhat.com>
Date: Fri, 30 Aug 2013 17:16:00 -0000
From: "Carlos O'Donell"
To: Will Newton
CC: libc-ports@sourceware.org, patches@linaro.org
Subject: Re: [PATCH v2] ARM: Improve armv7 memcpy performance.
References: <5220B5A1.40903@linaro.org>
In-Reply-To: <5220B5A1.40903@linaro.org>
Content-Type: text/plain; charset=ISO-8859-1
X-SW-Source: 2013-08/txt/msg00088.txt.bz2

On 08/30/2013 11:09 AM, Will Newton wrote:
>
> Only enter the aligned copy loop with buffers that can be 8-byte
> aligned. This improves performance slightly on Cortex-A9 and
> Cortex-A15 cores for large copies with buffers that are 4-byte
> aligned but not 8-byte aligned.
>
> ports/ChangeLog.arm:
>
> 2013-08-30  Will Newton
>
> 	* sysdeps/arm/armv7/multiarch/memcpy_impl.S: Tighten check
> 	on entry to aligned copy loop to improve performance.

How did you test this? Did you use the glibc performance
microbenchmark? Does the microbenchmark show gains with this
change? What are the numbers?

Cheers,
Carlos.

> ---
>  ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Changes in v2:
>  - Improved description
>
> diff --git a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
> index 3decad6..6e84173 100644
> --- a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
> +++ b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
> @@ -369,8 +369,8 @@ ENTRY(memcpy)
> 	cfi_adjust_cfa_offset (FRAME_SIZE)
> 	cfi_rel_offset (tmp2, 0)
> 	cfi_remember_state
> -	and	tmp2, src, #3
> -	and	tmp1, dst, #3
> +	and	tmp2, src, #7
> +	and	tmp1, dst, #7
> 	cmp	tmp1, tmp2
> 	bne	.Lcpy_notaligned
>