From: Richard Earnshaw
Date: Tue, 09 Apr 2013 08:45:00 -0000
To: "Shih-Yuan Lee (FourDollars)"
CC: Ondřej Bílka, "Joseph S. Myers", "libc-ports@sourceware.org", Jesse Sung, "patches@eglibc.org", YC Cheng, "rex.tsai@canonical.com"
Subject: Re: [Patches] [PATCH] ARM: NEON detected memcpy.
Message-ID: <5163D517.6020703@arm.com>
References: <20130403161949.GA6759@domone.kolej.mff.cuni.cz>

On 04/04/13 05:15, Shih-Yuan Lee (FourDollars) wrote:
> Hi Ondrej,
>
> I do have some benchmark data.
>
> --- Running benchmarks (average case/perfect alignment case) ---
>
> very small data test:
> memcpy_arm : (3 bytes copy) = 86.2 MB/s / 88.3 MB/s
> memcpy_neon : (3 bytes copy) = 53.4 MB/s / 54.5 MB/s
> memcpy_arm : (4 bytes copy) = 79.8 MB/s / 62.9 MB/s
> memcpy_neon : (4 bytes copy) = 72.5 MB/s / 73.9 MB/s
> memcpy_arm : (5 bytes copy) = 91.0 MB/s / 78.7 MB/s
> memcpy_neon : (5 bytes copy) = 90.2 MB/s / 91.0 MB/s
> memcpy_arm : (7 bytes copy) = 109.5 MB/s / 104.7 MB/s
> memcpy_neon : (7 bytes copy) = 122.1 MB/s / 126.6 MB/s
> memcpy_arm : (8 bytes copy) = 122.4 MB/s / 122.4 MB/s
> memcpy_neon : (8 bytes copy) = 142.0 MB/s / 148.2 MB/s
> memcpy_arm : (11 bytes copy) = 157.8 MB/s / 161.3 MB/s
> memcpy_neon : (11 bytes copy) = 193.8 MB/s / 196.2 MB/s
> memcpy_arm : (12 bytes copy) = 170.1 MB/s / 172.7 MB/s
> memcpy_neon : (12 bytes copy) = 206.8 MB/s / 212.5 MB/s
> memcpy_arm : (15 bytes copy) = 204.0 MB/s / 209.6 MB/s
> memcpy_neon : (15 bytes copy) = 247.5 MB/s / 270.3 MB/s
> memcpy_arm : (16 bytes copy) = 212.2 MB/s / 225.6 MB/s
> memcpy_neon : (16 bytes copy) = 175.3 MB/s / 252.2 MB/s
> memcpy_arm : (24 bytes copy) = 274.6 MB/s / 326.5 MB/s
> memcpy_neon : (24 bytes copy) = 244.7 MB/s / 367.8 MB/s
> memcpy_arm : (31 bytes copy) = 333.3 MB/s / 399.2 MB/s
> memcpy_neon : (31 bytes copy) = 304.3 MB/s / 463.5 MB/s
>
> L1 cached data:
> memcpy_arm : (4096 bytes copy) = 1295.5 MB/s / 2691.8 MB/s
> memcpy_neon : (4096 bytes copy) = 1826.3 MB/s / 2021.8 MB/s
> memcpy_arm : (6144 bytes copy) = 1306.5 MB/s / 2724.1 MB/s
> memcpy_neon : (6144 bytes copy) = 1857.8 MB/s / 2053.2 MB/s
>
> L2 cached data:
> memcpy_arm : (65536 bytes copy) = 1291.5 MB/s / 2304.8 MB/s
> memcpy_neon : (65536 bytes copy) = 1866.5 MB/s / 2441.7 MB/s
> memcpy_arm : (98304 bytes copy) = 1285.6 MB/s / 2283.8 MB/s
> memcpy_neon : (98304 bytes copy) = 1860.7 MB/s / 2454.7 MB/s
>
> SDRAM:
> memcpy_arm : (2097152 bytes copy) = 466.7 MB/s / 736.5 MB/s
> memcpy_neon : (2097152 bytes copy) = 727.5 MB/s / 868.8 MB/s
> memcpy_arm : (3145728 bytes copy) = 507.9 MB/s / 854.7 MB/s
> memcpy_neon : (3145728 bytes copy) = 852.9 MB/s / 1038.0 MB/s
>
> (*) 1 MB = 1000000 bytes
> (*) 'memcpy_arm' - an implementation for older ARM cores from glibc-ports

You don't say what this is measured on. Without knowing the hardware
it's impossible to really argue whether this is generally a good thing
or not.

R.
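
For anyone comparing the two columns of the quoted table, here is a minimal Python sketch of the arithmetic. It computes NEON-to-ARM throughput ratios for a handful of rows copied from the benchmark above. The names `ARM`, `NEON`, `speedup` and `report` are illustrative only and do not come from the thread, and the row selection is an arbitrary subset. The ratios carry no information about the hardware that produced them.

```python
# Figures copied from the quoted benchmark: size in bytes ->
# (average-case MB/s, perfect-alignment MB/s). Only a subset of rows
# is included; the hardware is not stated in the thread.
ARM = {
    3: (86.2, 88.3),
    16: (212.2, 225.6),
    4096: (1295.5, 2691.8),
    65536: (1291.5, 2304.8),
    3145728: (507.9, 854.7),
}
NEON = {
    3: (53.4, 54.5),
    16: (175.3, 252.2),
    4096: (1826.3, 2021.8),
    65536: (1866.5, 2441.7),
    3145728: (852.9, 1038.0),
}


def speedup(size):
    """Return (average, aligned) NEON/ARM throughput ratios for one copy size."""
    arm_avg, arm_aligned = ARM[size]
    neon_avg, neon_aligned = NEON[size]
    return neon_avg / arm_avg, neon_aligned / arm_aligned


def report():
    """Format one line per copy size; a ratio above 1 means NEON was faster."""
    lines = []
    for size in sorted(ARM):
        avg, aligned = speedup(size)
        lines.append(f"{size:>8} bytes: average x{avg:.2f}, aligned x{aligned:.2f}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

On these rows the ratio falls on both sides of 1 depending on copy size and alignment, so the figures alone do not settle whether the NEON version is a general win.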