From: Agner Fog
To: dclarke@opensolaris.org
CC: gcc@gcc.gnu.org, TimothyPrince@sbcglobal.net
Date: Thu, 24 Jul 2008 09:41:00 -0000
Message-ID: <4888375A.30601@agner.org>
Subject: Re: gcc will become the best optimizing x86 compiler

Dennis Clarke wrote:
> The Sun Studio 12 compiler with Solaris 10 on AMD Opteron or
> UltraSparc beats GCC in almost every single test case that I have
> seen.

This is memcpy on Solaris:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/i386/gen/memcpy.s
It uses exactly the same method as memcpy in glibc, with only minor differences that have no influence on performance.

> Also, you have provided no data at all.

I linked to the data rather than copying it here, to save space on the mailing list. Here is the link again:
http://www.agner.org/optimize/optimizing_cpp.pdf, section 2.6, page 12.

> So your assertions are those of a marketing person at the moment.
Who sounds like a marketing person, you or me? :-)

> Please post some code that can be compiled and then tested with high resolution timers and perhaps
> we can compare notes.

Here is my code, again:
http://www.agner.org/optimize/asmlib.zip
My test results, referred to above, use the "core clock cycles" performance counter on Intel and RDTSC on AMD. That is the highest resolution you can get. Feel free to do your own tests; it's as simple as linking my library into your test program.

Tim Prince wrote:
> you identify the library you tested only as "ubuntu g++ 4.2.3."

Where can I see the libc version?

> The corresponding 64-bit linux will see vastly different levels of performance, depending on the
> glibc version, as it doesn't use a builtin string move.

Yes, this is exactly what my tests show. 64-bit libc is better than 32-bit libc, but still 3-4 times slower than the best library for unaligned operands on an Intel CPU.

> Certain newer CPUs aim to improve performance of the 32-bit gcc builtin string moves, but don't
> entirely eliminate the situations where it isn't optimum.

The Intel manuals are not clear about this. The Intel Optimization Reference Manual says: "In most cases, applications should take advantage of the default memory routines provided by Intel compilers." What excellent advice: the Intel compiler puts in a library with an automatic run-slowly-on-AMD feature! The Intel library does not use rep movs when running on an Intel CPU.

The AMD software optimization guide mentions specific situations where rep movs is optimal. However, my tests on an Opteron (K8) show that rep movs is never optimal on AMD either. I have no access to the new AMD K10 to test it, but I expect the XMM register code to run much faster on K10 than on K8, because K10 has 128-bit data paths where K8 has only 64-bit.

Evidently, the problem with memcpy has been ignored for years, see
http://softwarecommunity.intel.com/Wiki/Linux/719.htm
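For readers who want something small to compile and time, here is a minimal C sketch (GCC/Clang, x86) of the two strategies discussed in this thread: a `rep movsb` copy (one form of `rep movs`) and a copy that moves 16 bytes at a time through an XMM register, timed on deliberately misaligned operands. It is not asmlib and not the code behind the published results; the names `timestamp`, `copy_rep_movsb`, `copy_xmm` and `time_copy` are hypothetical. Note that `__rdtsc()` reads the time-stamp counter, which on many CPUs ticks at a fixed rate rather than counting core clock cycles, so it is only a rough stand-in for the performance counter mentioned above. Off x86 the sketch falls back to `memcpy`, a byte loop and `clock_gettime`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime on non-x86 builds */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>            /* __rdtsc() (GCC/Clang) */
#endif
#if defined(__SSE2__)
#include <emmintrin.h>            /* _mm_loadu_si128, _mm_storeu_si128 */
#endif

/* Tick counter: RDTSC on x86, otherwise nanoseconds from CLOCK_MONOTONIC. */
static uint64_t timestamp(void) {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Strategy 1: the string instruction rep movsb (memcpy fallback off x86).
   The ABI guarantees the direction flag is clear on function entry. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n) {
    void *ret = dst;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);
#endif
    return ret;
}

/* Strategy 2: 16 bytes per iteration through an XMM register. The unaligned
   load/store intrinsics accept any address; leftover bytes are copied singly. */
static void *copy_xmm(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
#if defined(__SSE2__)
    for (; n >= 16; n -= 16, d += 16, s += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, v);
    }
#endif
    while (n--) *d++ = *s++;
    return dst;
}

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Best (minimum) tick count over `reps` runs of copying n bytes from
   src + src_off to dst + dst_off; odd offsets make the operands unaligned. */
static uint64_t time_copy(copy_fn fn, unsigned char *dst, size_t dst_off,
                          const unsigned char *src, size_t src_off,
                          size_t n, int reps) {
    assert(reps > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < reps; i++) {
        uint64_t t0 = timestamp();
        fn(dst + dst_off, src + src_off, n);
        uint64_t t1 = timestamp();
        if (t1 - t0 < best) best = t1 - t0;
    }
    return best;
}
```

For example, `time_copy(copy_xmm, dst, 1, src, 2, 4096, 100)` gives the best of 100 runs for a 4096-byte copy with destination and source offset by 1 and 2 bytes; the same call with `copy_rep_movsb` or `memcpy` allows a rough comparison. Taking the minimum filters out interrupts and cold caches. The sketch deliberately leaves out what a tuned library does (aligning the destination first, using aligned stores, choosing a method per CPU), so its numbers should not be read as reproducing the results linked above.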