Date: Mon, 15 Apr 2013 13:38:00 -0000
From: Ondřej Bílka
To: Will Newton
Cc: libc-ports@sourceware.org, Patch Tracking
Subject: Re: [PATCH] ARM: Add Cortex-A15 optimized NEON and VFP memcpy routines, with IFUNC.
Message-ID: <20130415133829.GA14170@domone.kolej.mff.cuni.cz>
References: <516BCEE5.9070809@linaro.org> <20130415102327.GA7032@domone.kolej.mff.cuni.cz>

On Mon, Apr 15, 2013 at 11:59:27AM +0100, Will Newton wrote:
> On 15 April 2013 11:23, Ondřej Bílka wrote:
> > On Mon, Apr 15, 2013 at 11:01:37AM +0100, Will Newton wrote:
> >> Attached are a set of benchmarks of the new code versus the existing
> >> memcpy implementation on a Cortex-A15 platform.
> >>
> >
> > As I wrote at previous thread:
> >
> > On Thu, Apr 04, 2013 at 08:37:01AM +0200, Ondřej Bílka wrote:
> >> Try also benchmark with real world data (20MB). I put it on
> >> http://kam.mff.cuni.cz/~ondra/dryrun_memcpy.tar.bz2
> >>
> >> To add neon copy test_generic.c file and add compiling neon
> >> implementation to benchmark script.
> >>
> >> It now only measures total time.
> >> I would need something like timestamp counter for more detailed
> >> results.
> >
> > How good it fares on my benchmark?
>
> It wasn't clear to me how to integrate my code and run the tests - I
> built a version of replay.c with each memcpy implementation and the
> new one ran in 20% less time, but I don't know if I did that
> correctly.
>

Nice, this looks correct; an improvement that large cannot happen by
chance.

Do you plan to improve memset in the same way? A first step would be to
take the memcpy code and replace its loads with a zeroed register.

Ondra
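
To illustrate the memset suggestion, here is a minimal sketch in C. It is
not the code from the patch, which is presumably hand-written assembly; the
names sketch_memcpy and sketch_memset are hypothetical, and a 32-bit word
stands in for whatever register width the real routine uses. The second
function is the first one with the load dropped and the store fed from a
value that stays in a register.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical word-at-a-time copy loop standing in for the real memcpy. */
static void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, s, sizeof w);    /* load one word from the source */
        memcpy(d, &w, sizeof w);    /* store it to the destination */
        s += sizeof w;
        d += sizeof w;
        n -= sizeof w;
    }
    while (n--)                     /* byte-wise tail */
        *d++ = *s++;
    return dst;
}

/* Same loop shape; the load is replaced by a value held in a register. */
static void *sketch_memset(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
    /* Replicate the fill byte across the word; this is zero when c == 0. */
    uint32_t w = (uint32_t)(unsigned char)c * 0x01010101u;

    while (n >= sizeof w) {
        memcpy(d, &w, sizeof w);    /* store only, nothing is loaded */
        d += sizeof w;
        n -= sizeof w;
    }
    while (n--)                     /* byte-wise tail */
        *d++ = (unsigned char)c;
    return dst;
}
```

The fixed-size memcpy calls are only a portable way to express unaligned
word loads and stores; a compiler normally turns them into single
instructions. An optimized routine would also have to handle alignment and
large-block paths, which this sketch ignores.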