From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 13 Dec 2012 06:21:00 -0000
From: Jakub Jelinek <jakub@redhat.com>
To: Xinliang David Li <davidxl@google.com>
Cc: Jan Hubicka <hubicka@ucw.cz>, GCC Patches <gcc-patches@gcc.gnu.org>,        Teresa Johnson <tejohnson@google.com>
Subject: Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs
Message-ID: <20121213062128.GK2315@tucnak.redhat.com>
Reply-To: Jakub Jelinek <jakub@redhat.com>
References: <CAAkRFZLMofkNZs9NUkfUDnMzVd5YsVhbx0xsb8jZuXy_eqEj6w@mail.gmail.com> <20121212163722.GA21037@atrey.karlin.mff.cuni.cz> <CAAkRFZKBa3GtEh=mmWiAmy-oGffYFxrmWetpaz+pKYSG1zSvSw@mail.gmail.com> <20121212183036.GB5303@atrey.karlin.mff.cuni.cz> <CAAkRFZLAe0CuO+-sBps9pCBDVi5k2ti8cBgL9Ukw4fBmrnpUeg@mail.gmail.com> <20121213011933.GB21037@atrey.karlin.mff.cuni.cz> <CAAkRFZ+gf-733+7djNrEyi6t_Sx6UzH3xSXsAGAMH+oWwCjN1Q@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAAkRFZ+gf-733+7djNrEyi6t_Sx6UzH3xSXsAGAMH+oWwCjN1Q@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)

On Wed, Dec 12, 2012 at 10:09:14PM -0800, Xinliang David Li wrote:
> On Wed, Dec 12, 2012 at 5:19 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> > libcall is not faster than the rep sequence up to 8KB, and the rep sequence
> >> > is better for regalloc/code cache than a fully blown function call.
> >>
> >> Be careful with this. My recollection is that REP sequence is good for
> >> any size -- for smaller size, the REP initial set up cost is too high
> >> (10s of cycles), while for large size copy, it is less efficient
> >> compared with library version.
> >
> > Well this is based on the data from the memtest script.
> > Core has good REP implementation - it is a win from rather small blocks (16
> > bytes if I recall) and it does not need alignment.
> > Library version starts to be interesting with caching hints, but I think till 80KB
> > it is still not a win for my setup (glibc-2.15)
> 
> A simple test shows that -mstringop-strategy=libcall always beats
> -mstringop-strategy=rep_8byte (on core2 and corei7) except for size
> smaller than 8 where the rep_8byte strategy simply bypasses REP movs.
> Can you share your memtest ?

I can't believe that, say, a 16-byte or 32-byte memcpy can ever be faster
using a libcall.  The PLT call overhead is simply too high.
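
For anyone who wants to check this on their own machine, here is a minimal
sketch of such a comparison, not the memtest script discussed above.  The
names copy_fixed16, copy_libcall, ns_per_copy and COPY_BENCH_MAIN are
hypothetical and exist only for illustration.  It assumes GCC or Clang (for
the empty asm barrier) and assumes the compiler expands a constant-size
memcpy inline at -O1 and above.  The out-of-line path calls memcpy through a
volatile function pointer, which only approximates the cost of a call through
the PLT; it is not the same code path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Volatile pointer: the compiler must reload it and emit a real call on
   every use, so it cannot expand the copy inline.  This approximates a
   libcall; it is an indirect call, not a call through the PLT stub. */
static memcpy_fn volatile libcall_memcpy = memcpy;

/* Fixed 16-byte copy.  With a constant size the compiler is expected
   (an assumption) to expand this inline when optimizing. */
static void copy_fixed16(void *dst, const void *src)
{
    memcpy(dst, src, 16);
}

/* Copy that always goes through an out-of-line call. */
static void copy_libcall(void *dst, const void *src, size_t n)
{
    libcall_memcpy(dst, src, n);
}

/* Average CPU time in nanoseconds per 16-byte copy over iters iterations.
   use_libcall selects the out-of-line path when nonzero. */
static double ns_per_copy(int use_libcall, unsigned long iters)
{
    unsigned char src[16], dst[16];

    assert(iters > 0);
    memset(src, 0xA5, sizeof src);

    clock_t start = clock();
    for (unsigned long i = 0; i < iters; i++) {
        if (use_libcall)
            copy_libcall(dst, src, sizeof dst);
        else
            copy_fixed16(dst, src);
        /* Compiler barrier (GCC/Clang syntax) so the copy is neither
           hoisted out of the loop nor deleted as dead. */
        __asm__ volatile("" : : "r"(dst), "r"(src) : "memory");
    }
    clock_t end = clock();

    return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / (double)iters;
}

#ifdef COPY_BENCH_MAIN
int main(void)
{
    unsigned long iters = 100000000UL;

    printf("inline : %.2f ns/copy\n", ns_per_copy(0, iters));
    printf("libcall: %.2f ns/copy\n", ns_per_copy(1, iters));
    return 0;
}
#endif
```

Building with something like "gcc -O2 -DCOPY_BENCH_MAIN" and running it
prints one figure per path.  The numbers depend heavily on the CPU, the libc
version and whether the binary is dynamically linked, so treat them as a
rough indication only.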

	Jakub