From mboxrd@z Thu Jan  1 00:00:00 1970
Mailing-List: contact libc-ports-help@sourceware.org; run by ezmlm
Sender: libc-ports-owner@sourceware.org
Date: Thu, 04 Apr 2013 06:37:00 -0000
From: Ondřej Bílka
To: "Shih-Yuan Lee (FourDollars)"
Cc: "Joseph S. Myers", libc-ports@sourceware.org, Jesse Sung, patches@eglibc.org, YC Cheng, rex.tsai@canonical.com
Subject: Re: [Patches] [PATCH] ARM: NEON detected memcpy.
Message-ID: <20130404063701.GA6324@domone.kolej.mff.cuni.cz>
References: <20130403161949.GA6759@domone.kolej.mff.cuni.cz>
User-Agent: Mutt/1.5.20 (2009-06-14)
X-SW-Source: 2013-04/txt/msg00015.txt.bz2

On Thu, Apr 04, 2013 at 12:15:17PM +0800, Shih-Yuan Lee (FourDollars) wrote:
> Hi Ondrej,
>
> I do have some benchmark data.
>
Hi,

Please also try a benchmark with real-world data (20MB). I put it at
http://kam.mff.cuni.cz/~ondra/dryrun_memcpy.tar.bz2

To add the NEON implementation, copy the test_generic.c file and add a
step that compiles the NEON implementation to the benchmark script.

It currently measures only total time. I would need something like a
timestamp counter for more detailed results.

> --- Running benchmarks (average case/perfect alignment case) ---
>
> very small data test:
> memcpy_arm  : (3 bytes copy) = 86.2 MB/s / 88.3 MB/s
> memcpy_neon : (3 bytes copy) = 53.4 MB/s / 54.5 MB/s
> memcpy_arm  : (4 bytes copy) = 79.8 MB/s / 62.9 MB/s
> memcpy_neon : (4 bytes copy) = 72.5 MB/s / 73.9 MB/s
> memcpy_arm  : (5 bytes copy) = 91.0 MB/s / 78.7 MB/s
> memcpy_neon : (5 bytes copy) = 90.2 MB/s / 91.0 MB/s
> memcpy_arm  : (7 bytes copy) = 109.5 MB/s / 104.7 MB/s
> memcpy_neon : (7 bytes copy) = 122.1 MB/s / 126.6 MB/s
> memcpy_arm  : (8 bytes copy) = 122.4 MB/s / 122.4 MB/s
> memcpy_neon : (8 bytes copy) = 142.0 MB/s / 148.2 MB/s
> memcpy_arm  : (11 bytes copy) = 157.8 MB/s / 161.3 MB/s
> memcpy_neon : (11 bytes copy) = 193.8 MB/s / 196.2 MB/s
> memcpy_arm  : (12 bytes copy) = 170.1 MB/s / 172.7 MB/s
> memcpy_neon : (12 bytes copy) = 206.8 MB/s / 212.5 MB/s
> memcpy_arm  : (15 bytes copy) = 204.0 MB/s / 209.6 MB/s
> memcpy_neon : (15 bytes copy) = 247.5 MB/s / 270.3 MB/s
> memcpy_arm  : (16 bytes copy) = 212.2 MB/s / 225.6 MB/s
> memcpy_neon : (16 bytes copy) = 175.3 MB/s / 252.2 MB/s
> memcpy_arm  : (24 bytes copy) = 274.6 MB/s / 326.5 MB/s
> memcpy_neon : (24 bytes copy) = 244.7 MB/s / 367.8 MB/s
> memcpy_arm  : (31 bytes copy) = 333.3 MB/s / 399.2 MB/s
> memcpy_neon : (31 bytes copy) = 304.3 MB/s / 463.5 MB/s
>
> L1 cached data:
> memcpy_arm  : (4096 bytes copy) = 1295.5 MB/s / 2691.8 MB/s
> memcpy_neon : (4096 bytes copy) = 1826.3 MB/s / 2021.8 MB/s
> memcpy_arm  : (6144 bytes copy) = 1306.5 MB/s / 2724.1 MB/s
> memcpy_neon : (6144 bytes copy) = 1857.8 MB/s / 2053.2 MB/s
>
> L2 cached data:
> memcpy_arm  : (65536 bytes copy) = 1291.5 MB/s / 2304.8 MB/s
> memcpy_neon : (65536 bytes copy) = 1866.5 MB/s / 2441.7 MB/s
> memcpy_arm  : (98304 bytes copy) = 1285.6 MB/s / 2283.8 MB/s
> memcpy_neon : (98304 bytes copy) = 1860.7 MB/s / 2454.7 MB/s
>
> SDRAM:
> memcpy_arm  : (2097152 bytes copy) = 466.7 MB/s / 736.5 MB/s
> memcpy_neon : (2097152 bytes copy) = 727.5 MB/s / 868.8 MB/s
> memcpy_arm  : (3145728 bytes copy) = 507.9 MB/s / 854.7 MB/s
> memcpy_neon : (3145728 bytes copy) = 852.9 MB/s / 1038.0 MB/s
>
> (*) 1 MB = 1000000 bytes
> (*) 'memcpy_arm' - an implementation for older ARM cores from glibc-ports
>
> The similar benchmark is at
> http://sourceware.org/ml/libc-ports/2009-07/msg00000.html .
>
> Regards,
> $4
>
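
On measuring more than total time: as a rough illustration only, timing per copy size could look something like the sketch below. It uses POSIX clock_gettime(CLOCK_MONOTONIC) as a stand-in for a hardware timestamp counter. The helper names (elapsed_ns, time_copy_ns, throughput_mb_s) are hypothetical and are not taken from the dryrun_memcpy benchmark. The throughput helper assumes the same convention as the quoted numbers, 1 MB = 1000000 bytes.

```c
/* Sketch only: time repeated calls of a memcpy-like function.
   All helper names here are hypothetical, not from dryrun_memcpy. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Signature shared by memcpy and any replacement under test. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Nanoseconds elapsed between two clock samples (end must not precede start). */
static uint64_t elapsed_ns(struct timespec start, struct timespec end)
{
    int64_t sec = (int64_t)end.tv_sec - (int64_t)start.tv_sec;
    int64_t nsec = (int64_t)end.tv_nsec - (int64_t)start.tv_nsec;
    return (uint64_t)(sec * 1000000000LL + nsec);
}

/* Run `reps` copies of `len` bytes through `fn`; return total nanoseconds. */
static uint64_t time_copy_ns(copy_fn fn, void *dst, const void *src,
                             size_t len, unsigned reps)
{
    /* Calling through a volatile pointer keeps the compiler from
       folding the loop into a single builtin copy. */
    volatile copy_fn call = fn;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < reps; i++)
        call(dst, src, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return elapsed_ns(t0, t1);
}

/* Throughput in MB/s, with 1 MB = 1000000 bytes. */
static double throughput_mb_s(uint64_t bytes, uint64_t ns)
{
    if (ns == 0)
        return 0.0;                       /* interval too short to resolve */
    return (double)bytes * 1000.0 / (double)ns;  /* (bytes/ns) * 1e9 / 1e6 */
}
```

A caller would pass memcpy or the NEON variant as fn and feed len * reps together with the returned nanoseconds into throughput_mb_s. For very small sizes a single call is far below the clock resolution, so reps has to be large, and the result is still an average rather than a per-call timestamp.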