From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-334195-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
MIME-Version: 1.0
In-Reply-To: <20121213062128.GK2315@tucnak.redhat.com>
References: <CAAkRFZLMofkNZs9NUkfUDnMzVd5YsVhbx0xsb8jZuXy_eqEj6w@mail.gmail.com>	<20121212163722.GA21037@atrey.karlin.mff.cuni.cz>	<CAAkRFZKBa3GtEh=mmWiAmy-oGffYFxrmWetpaz+pKYSG1zSvSw@mail.gmail.com>	<20121212183036.GB5303@atrey.karlin.mff.cuni.cz>	<CAAkRFZLAe0CuO+-sBps9pCBDVi5k2ti8cBgL9Ukw4fBmrnpUeg@mail.gmail.com>	<20121213011933.GB21037@atrey.karlin.mff.cuni.cz>	<CAAkRFZ+gf-733+7djNrEyi6t_Sx6UzH3xSXsAGAMH+oWwCjN1Q@mail.gmail.com>	<20121213062128.GK2315@tucnak.redhat.com>
Date: Thu, 13 Dec 2012 19:43:00 -0000
Message-ID: <CAMe9rOri3djrv29rQKMLS8jdYvJ8xxs+7xDaL6U-iKa=ojOjrw@mail.gmail.com>
Subject: Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs
From: "H.J. Lu" <hjl.tools@gmail.com>
To: Jakub Jelinek <jakub@redhat.com>
Cc: Xinliang David Li <davidxl@google.com>, Jan Hubicka <hubicka@ucw.cz>, 	GCC Patches <gcc-patches@gcc.gnu.org>, Teresa Johnson <tejohnson@google.com>
Content-Type: text/plain; charset=ISO-8859-1
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2012-12/txt/msg00937.txt.bz2

On Wed, Dec 12, 2012 at 10:21 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Dec 12, 2012 at 10:09:14PM -0800, Xinliang David Li wrote:
>> On Wed, Dec 12, 2012 at 5:19 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >> > libcall is not faster than the rep sequence up to 8KB, and the rep
>> >> > sequence is better for regalloc/code cache than a full-blown
>> >> > function call.
>> >>
>> >> Be careful with this. My recollection is that REP sequence is good for
>> >> any size -- for smaller size, the REP initial set up cost is too high
>> >> (10s of cycles), while for large size copy, it is less efficient
>> >> compared with library version.
>> >
>> > Well this is based on the data from the memtest script.
>> > Core has good REP implementation - it is a win from rather small blocks (16
>> > bytes if I recall) and it does not need alignment.
>> > Library version starts to be interesting with caching hints, but I think till 80KB
>> > it is still not a win for my setup (glibc-2.15)
>>
>> A simple test shows that -mstringop-strategy=libcall always beats
>> -mstringop-strategy=rep_8byte (on core2 and corei7) except for size
>> smaller than 8 where the rep_8byte strategy simply bypasses REP movs.
>> Can you share your memtest ?
>
> I can't believe that, say, a 16-byte or 32-byte memcpy can ever be faster
> using a libcall.  The PLT call overhead is simply too high.
>

The x86 string/memory functions in the current glibc are
extremely fast and tuned for Core 2/Core i7.  GCC has a very
hard time beating them with inlining:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
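
For anyone who wants to try the 16-byte case discussed above, a
minimal timing sketch could look like the following.  It is not the
memtest script and not the test case from the bug report.  The names
libcall_memcpy, inline_copy16, seconds_now and time_copies are
hypothetical helpers made up for this illustration, and it assumes
GCC or Clang on a POSIX system.  The call through a volatile function
pointer is there only to keep the compiler from expanding memcpy
inline, so that one path really is a library call.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: reach memcpy through a volatile function pointer
   so the compiler cannot expand it inline and must emit a real call. */
static void *(*volatile libcall_memcpy)(void *, const void *, size_t) = memcpy;

/* Hypothetical helper: fixed-size copy the compiler is free to expand
   inline according to its own string-operation strategy. */
static inline void inline_copy16(void *dst, const void *src)
{
    __builtin_memcpy(dst, src, 16);
}

/* Monotonic wall-clock time in seconds. */
static double seconds_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Time `iters` 16-byte copies.  A nonzero use_libcall selects the
   library call, zero selects the inline-expandable copy.  Returns the
   elapsed time in seconds. */
static double time_copies(int use_libcall, long iters)
{
    static const char src[16] = "0123456789abcde";
    char dst[16] = {0};
    double start = seconds_now();

    for (long i = 0; i < iters; i++) {
        if (use_libcall)
            libcall_memcpy(dst, src, sizeof dst);
        else
            inline_copy16(dst, src);
        /* Compiler barrier: keeps the copy from being hoisted or dropped. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }

    double elapsed = seconds_now() - start;
    /* Sanity check that the copy really happened (skipped for iters == 0). */
    assert(iters <= 0 || memcmp(dst, src, sizeof dst) == 0);
    return elapsed;
}
```

Calling time_copies(1, n) and time_copies(0, n) from a small main and
printing both values gives a rough comparison.  The result depends on
the CPU, the glibc version, the optimization level and whether the
call goes through the PLT, so treat any single number with caution.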

-- 
H.J.