From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 24672 invoked by alias); 30 Aug 2013 19:26:55 -0000
Mailing-List: contact libc-ports-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: libc-ports-owner@sourceware.org
Received: (qmail 24583 invoked by uid 89); 30 Aug 2013 19:26:54 -0000
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28)
 by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
 Fri, 30 Aug 2013 19:26:54 +0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-5.6 required=5.0
 tests=AWL,BAYES_00,KHOP_THREADED,RP_MATCHES_RCVD autolearn=ham version=3.3.2
X-HELO: mx1.redhat.com
Received: from int-mx09.intmail.prod.int.phx2.redhat.com
 (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22])
 by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r7UJQgBH012976
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
 Fri, 30 Aug 2013 15:26:42 -0400
Received: from [10.3.113.141] (ovpn-113-141.phx2.redhat.com [10.3.113.141])
 by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4)
 with ESMTP id r7UJQf22007774; Fri, 30 Aug 2013 15:26:41 -0400
Message-ID: <5220F1F0.80501@redhat.com>
Date: Fri, 30 Aug 2013 19:26:00 -0000
From: "Carlos O'Donell"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130805 Thunderbird/17.0.8
MIME-Version: 1.0
To: Will Newton
CC: "libc-ports@sourceware.org", Patch Tracking,
 =?UTF-8?B?T25kxZllaiBCw61sa2E=?=, Siddhesh Poyarekar
Subject: Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
References: <520894D5.7060207@linaro.org> <5220D30B.9080306@redhat.com>
In-Reply-To:
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
X-SW-Source: 2013-08/txt/msg00097.txt.bz2

On 08/30/2013 02:48 PM, Will Newton wrote:
> On 30 August 2013 18:14, Carlos O'Donell wrote:
>
> Hi Carlos,
>
>>>> A small change to the entry to the aligned copy loop improves
>>>> performance slightly on A9 and A15 cores for certain copies.
>>>>
>>>> ports/ChangeLog.arm:
>>>>
>>>> 2013-08-07 Will Newton
>>>>
>>>> * sysdeps/arm/armv7/multiarch/memcpy_impl.S: Tighten check
>>>> on entry to aligned copy loop for improved performance.
>>>> ---
>>>> ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S | 4 ++--
>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> Ping?
>>
>> How did you test the performance?
>>
>> glibc has a performance microbenchmark, did you use that?
>
> No, I used the cortex-strings package developed by Linaro for
> benchmarking various string functions against one another[1].
>
> I haven't checked the glibc benchmarks but I'll look into that. It's
> quite a specific case that shows the problem so it may not be obvious
> which one is better however.

If it's not obvious, how is someone supposed to review this patch? :-)

> [1] https://launchpad.net/cortex-strings

There are 2 benchmarks. One appears to be Dhrystone 2.1, which isn't a
string test in and of itself and should not be used for benchmarking or
changing string functions. The other is called "multi" and appears to
run some functions in a loop and take the time.

e.g.
http://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/view/head:/benchmarks/multi/harness.c

I would not call `multi' exhaustive. The glibc performance benchmark
tests are not exhaustive either, but they have received review from the
glibc community and are our preferred way of demonstrating performance
gains when posting performance patches.
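
For illustration, the kind of "run a function in a loop and take the
time" measurement described above can be sketched as below. This is a
minimal sketch under stated assumptions, not the cortex-strings harness
and not the glibc benchmark code: the names time_memcpy, BUF_SIZE and
MAX_OFFSET, the buffer sizes, and the choice of CLOCK_MONOTONIC are all
hypothetical, and the empty asm statement is a GCC/Clang extension.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime on strict compilers */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical limits, chosen only for this illustration. */
#define BUF_SIZE   4096
#define MAX_OFFSET 64

/* Aligned buffers, so that the offsets below set the actual misalignment. */
static _Alignas(64) unsigned char src_buf[BUF_SIZE + MAX_OFFSET];
static _Alignas(64) unsigned char dst_buf[BUF_SIZE + MAX_OFFSET];

/* Copy `len` bytes `iters` times with the given source and destination
   offsets, and return the average elapsed time per call in nanoseconds. */
double time_memcpy(size_t len, size_t src_off, size_t dst_off,
                   unsigned long iters)
{
    struct timespec start, end;

    assert(len <= BUF_SIZE);
    assert(src_off <= MAX_OFFSET && dst_off <= MAX_OFFSET);
    assert(iters > 0);

    /* Fill the source with a known pattern before timing starts. */
    memset(src_buf, 0xA5, sizeof src_buf);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (unsigned long i = 0; i < iters; i++) {
        memcpy(dst_buf + dst_off, src_buf + src_off, len);
        /* Compiler barrier so the repeated copy is not optimized away. */
        __asm__ volatile("" : : "r"(dst_buf) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ns = (double)(end.tv_sec - start.tv_sec) * 1e9
              + (double)(end.tv_nsec - start.tv_nsec);
    return ns / (double)iters;
}
```

A caller would sweep len and the two offsets and compare the old and new
memcpy builds on the same machine. A single loop like this says nothing
about cache state or run-to-run variance, which is part of why numbers
from a reviewed benchmark are easier to evaluate.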

I would really like to see you post the results of running your new
implementation with the glibc benchmark and show the numbers that
support the claim that it is faster. Is that possible?

Cheers,
Carlos.