From: Jakub Jelinek
To: Xinliang David Li
Cc: Jan Hubicka, GCC Patches, Teresa Johnson
Date: Thu, 13 Dec 2012 06:21:00 -0000
Subject: Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs
Message-ID: <20121213062128.GK2315@tucnak.redhat.com>
References: <20121212163722.GA21037@atrey.karlin.mff.cuni.cz>
 <20121212183036.GB5303@atrey.karlin.mff.cuni.cz>
 <20121213011933.GB21037@atrey.karlin.mff.cuni.cz>

On Wed, Dec 12, 2012 at 10:09:14PM -0800, Xinliang David Li wrote:
> On Wed, Dec 12, 2012 at 5:19 PM, Jan Hubicka wrote:
> >> > libcall is not faster than the rep sequence up to 8KB, and the rep
> >> > sequence is better for regalloc/code cache than a fully blown function
> >> > call.
> >>
> >> Be careful with this. My recollection is that the REP sequence is not
> >> good for any size: for small sizes, the REP initial setup cost is too
> >> high (10s of cycles), while for large copies it is less efficient than
> >> the library version.
> >
> > Well, this is based on data from the memtest script.
> > Core has a good REP implementation: it is a win from rather small blocks
> > (16 bytes, if I recall) and it does not need alignment.
> > The library version starts to be interesting with caching hints, but I
> > think up to 80KB it is still not a win on my setup (glibc-2.15).
>
> A simple test shows that -mstringop-strategy=libcall always beats
> -mstringop-strategy=rep_8byte (on core2 and corei7) except for sizes
> smaller than 8 bytes, where the rep_8byte strategy simply bypasses REP movs.
>

Can you share your memtest? I can't believe that, say, a 16-byte or
32-byte memcpy can ever be faster using a libcall. The PLT call overhead
is simply too high.

	Jakub
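The memtest script itself is not included in this thread. As a hedged illustration of the kind of measurement being argued about, the sketch below times a fixed-size memcpy so that one file can be built once with -mstringop-strategy=libcall and once with -mstringop-strategy=rep_8byte, and the per-copy times compared. It is not Jan's script; the names copy_fixed, bench_copy, copy_matches and COPY_SIZE are hypothetical. It assumes an x86-64 GCC (these strategy options are x86-specific) and a POSIX clock_gettime.

```c
/* Hypothetical microbenchmark sketch, not the memtest script from this thread.
   Build the same file with different strategies and compare, e.g.:
     gcc -O2 -mstringop-strategy=libcall   -DCOPY_SIZE=32 bench.c
     gcc -O2 -mstringop-strategy=rep_8byte -DCOPY_SIZE=32 bench.c
   Check the assembly (-S) to see what copy_fixed actually compiled to. */
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#ifndef COPY_SIZE
#define COPY_SIZE 32 /* bytes; a compile-time constant such as 16 or 32 */
#endif

static unsigned char src_buf[COPY_SIZE];
static unsigned char dst_buf[COPY_SIZE];

/* Monotonic time in nanoseconds via POSIX clock_gettime. */
static double now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* With a constant size, GCC expands this memcpy according to the selected
   -mstringop-strategy (e.g. inline REP MOVS or a call into the C library).
   noinline keeps each copy a separate call so copies are not merged away. */
__attribute__((noinline)) void copy_fixed(void *dst, const void *src) {
    memcpy(dst, src, COPY_SIZE);
}

/* Nonzero if the destination buffer mirrors the source buffer. */
int copy_matches(void) {
    return memcmp(dst_buf, src_buf, COPY_SIZE) == 0;
}

/* Average nanoseconds per COPY_SIZE-byte copy over iters iterations. */
double bench_copy(size_t iters) {
    double start = now_ns();
    for (size_t i = 0; i < iters; i++) {
        src_buf[0] = (unsigned char)i; /* change the source every iteration */
        copy_fixed(dst_buf, src_buf);
    }
    double elapsed = now_ns() - start;
    assert(copy_matches()); /* the last copy must have landed */
    return iters ? elapsed / (double)iters : 0.0;
}
```

Both builds pay the same direct call into copy_fixed, so the difference between runs should mostly reflect what the memcpy inside it became. For small constant sizes GCC may still emit plain moves instead of REP MOVS (as noted above for sizes below 8 bytes with rep_8byte), which is why inspecting the assembly matters. Timings depend on the CPU and glibc version, so no expected numbers are given here.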