From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 26926 invoked by alias); 7 Nov 2007 09:15:29 -0000
Received: (qmail 26870 invoked by uid 48); 7 Nov 2007 09:15:20 -0000
Date: Wed, 07 Nov 2007 09:15:00 -0000
Message-ID: <20071107091520.26869.qmail@sourceware.org>
X-Bugzilla-Reason: CC
References:
Subject: [Bug target/26658] [4.0/4.1/4.2/4.3 Regression] memcpy/memset are not inlining with -march=athlon-xp and size of 128
In-Reply-To:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "jakub at gcc dot gnu dot org"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2007-11/txt/msg00568.txt.bz2

------- Comment #8 from jakub at gcc dot gnu dot org 2007-11-07 09:15 -------
I'd stress that this is an extremely worthless "benchmark", because it makes
no attempt to ensure the calls are really done and not optimized away, which
is what happens in the 3.4.x -march=athlon-xp case.

At expand time GCC decides which of the forms of memcpy/memset is fastest.
4.x believes that for -march=athlon-xp it is rep; stosl (for memset) resp.
rep; movsl (for memcpy), while 3.4.x believed it is 32 individual stores
resp. 32 reads + 32 stores. Another alternative is calling an optimized
memcpy library routine.

Try changing the definition of T to

#define T memcpy(mb1, mb2, Block_Size); memset(mb2, i, Block_Size); \
          asm volatile ("" : : "r" (mb1), "r" (mb2) : "memory");

which makes sure the memcpy/memsets can't be optimized away, and you'll see
very different results. The thing is just that we are only able to DSE (dead
store eliminate) the memcpy/memset when they are expanded to individual
instructions.

What we perhaps should have is a tree pass which analyzes all the usual
string operations, knows exactly what they are doing and will track what
they do with memory (track e.g.
how long a zero-terminated string in some buffer is, what values it
contains: these len1 bytes are copied from bufx, these len2 bytes are 0) and
change, say, calls like strcat, where we know where the destination string
ends, into strcpy (or memcpy if we even know the length, etc.). Plus perhaps
teach tree DSE about memcpy/memset.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26658
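
For illustration only, the barrier suggested in the comment above could be
wrapped up roughly as follows. This is a minimal sketch, not the original
benchmark (its source is not part of this message): the names BLOCK_SIZE,
KEEP and run_blocks are made up here, the buffers are assumed to be plain
static char arrays, and the asm statement relies on the GCC extended asm
extension.

```c
#include <string.h>

/* Hypothetical stand-in for the benchmark's Block_Size; 128 is the size
   named in the bug title. */
#define BLOCK_SIZE 128

/* Assumed buffers; the real benchmark's declarations are not shown. */
static char mb1[BLOCK_SIZE], mb2[BLOCK_SIZE];

/* Empty asm that takes both pointers as inputs and clobbers "memory".
   The compiler must assume the asm may read the buffers, so the stores
   done by the preceding memcpy/memset cannot be removed as dead. */
#define KEEP(p, q) __asm__ volatile ("" : : "r" (p), "r" (q) : "memory")

/* Run the copy/fill pair iters times, with the barrier after each pair. */
static void run_blocks(int iters)
{
    for (int i = 0; i < iters; i++) {
        memcpy(mb1, mb2, BLOCK_SIZE);   /* mb1 gets the previous fill value */
        memset(mb2, i, BLOCK_SIZE);     /* mb2 is filled with the low byte of i */
        KEEP(mb1, mb2);
    }
}
```

Timing a call such as run_blocks(N) with and without the KEEP line, at the
same -march setting, should show whether the earlier numbers measured the
string operations themselves or just an empty loop.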