Subject: Re: [PATCH] Optimize MIPS memcpy
From: Maxim Kuvyrkov
Date: Tue, 30 Oct 2012 21:56:00 -0000
To: Steve Ellcey
CC: Andrew Pinski, "Joseph S. Myers"
Message-ID: <1F5F6910-1831-47F9-BB49-889C5F16299E@codesourcery.com>
In-Reply-To: <1351619149.15035.110.camel@ubuntu-sellcey>

On 31/10/2012, at 6:45 AM, Steve Ellcey wrote:

> On Tue, 2012-10-30 at 20:16 +1300, Maxim Kuvyrkov wrote:
>>
...
>> I have tested your latest version. Good news: there are no correctness issues. Bad news: it underperforms compared to my patch by 2-3 times on both N32 and N64 (didn't test O32) on the benchmark that I used. I've run the benchmark several times and results are consistent. I use oprofile on libc.so to determine how much time is spent in memcpy.
>>
>> Would you please confirm that your current implementation is faster on YOUR benchmark than my patch in http://sourceware.org/ml/libc-ports/2012-09/msg00000.html ? Please make sure that PREFETCH macro in ports/sysdeps/mips/sys/asm.h gets defined to "pref", not "nop", in your build.
>>
>> Thanks,
>>
>> --
>> Maxim Kuvyrkov
>> CodeSourcery / Mentor Graphics
>
> Maxim, With O32 ABI I am seeing my version as slightly faster for large
> memcpy's and slightly slower for small memcpy's compared to yours.
>
> With N32 and 64 ABI's I see my version as slightly faster across the
> board (a couple of percentage points). I am definitely not seeing
> anything like a 2X difference. Are you sure prefetch is defined when
> you tested my version? How about using double loads and stores? They
> should both get set by default.

It turns out I was benchmarking my patch against the original glibc implementation, not yours (I had patched the files in ports/ instead of libc/ports). With the patch applied correctly, the performance is virtually the same on my benchmark. I've also checked the assembly dump of libc.so and confirmed that prefetch instructions and 8-byte loads/stores are used where appropriate.

Given that your patch provides on par or better performance than mine, and it also unifies MIPS memcpy for all ABIs (as well as between glibc and Bionic!) -- I am all for your patch.

I've reviewed your patch -- the code is clean and well-documented. Please apply the patch if sufficient testing has been done: big- and little-endian for o32/n32/n64 ABIs. I've tested your patch for all big-endian ABIs, so you just need to cover little-endian (which, I think, you may have done already).

Thanks for bearing with me through the whole debugging process!

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics
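
For readers who want to repeat the disassembly check mentioned in the message (confirming that prefetch instructions and 8-byte loads/stores are present), one possible approach is sketched below. This is a minimal sketch and not the procedure used in the thread. It assumes text in the tab-separated layout produced by GNU `objdump -d` (address, encoding, instruction). The function name `count_mnemonics` and the sample disassembly are hypothetical and made up for illustration.

```python
def count_mnemonics(disassembly, wanted):
    """Count selected instruction mnemonics in `objdump -d` style text.

    Assumption: instruction lines are tab-separated as
    "<address>:\t<encoding>\t<mnemonic>\t<operands>".
    """
    counts = {name: 0 for name in wanted}
    for line in disassembly.splitlines():
        fields = line.split("\t")
        # Skip symbol labels, blank lines, and anything that is not an
        # instruction line.
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        parts = fields[2].split()
        if parts and parts[0] in counts:
            counts[parts[0]] += 1
    return counts


# Hypothetical sample, invented for illustration; it is NOT taken from a
# real libc.so and the encodings are not meant to be authoritative.
SAMPLE = (
    "00000000 <memcpy>:\n"
    "   0:\tcca00000 \tpref\t0x0,0(a1)\n"
    "   4:\tdca80000 \tld\tt0,0(a1)\n"
    "   8:\tfc880000 \tsd\tt0,0(a0)\n"
    "   c:\tdca90008 \tld\tt1,8(a1)\n"
    "  10:\t00000000 \tnop\n"
)

if __name__ == "__main__":
    print(count_mnemonics(SAMPLE, ["pref", "ld", "sd", "nop"]))
```

In practice the input would be the disassembly of the memcpy symbol only, since counts taken over the whole of libc.so say little about one function. A nonzero `pref` count would suggest that PREFETCH expanded to "pref" rather than "nop" in that build.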