Date: Mon, 22 Apr 2002 23:30:00 -0000
From: Michel LESPINASSE
To: Roger Sayle
Cc: gcc@gcc.gnu.org, Richard Henderson, Jan Hubicka
Subject: Re: GCC performance regression - its memset!
Message-ID: <20020423060709.GA21922@zoy.org>

On Mon, Apr 22, 2002 at 11:13:09PM -0600, Roger Sayle wrote:
>
> I think it's one of Jan's changes.  I can reproduce the problem, and
> fix it using "-minline-all-stringops" which forces 3.1 to inline the
> memset on i686.  I was concerned that it was a middle-end bug with
> builtins, but it now appears to be an ia32 back-end issue.
>
> Michel, does "-minline-all-stringops" fix the problem for you?

This option actually generates invalid code for me. Here is a test case:

------------------- cut here -----------------
#include <string.h>

short table[64];

int main (void)
{
    int i;

    for (i = 0; i < 64; i++)
        table[i] = 1234;

    memset (table, 0, 63 * sizeof(short));

    return (table[62] != 0);
}
------------------- cut here -----------------

This code should return 0 (table[62] is inside the memset'd region);
however, it returns 1 when compiled with -O3 -minline-all-stringops.

Here is an extract from the generated asm (the memset part of it):

        movl $table, %edi
        testl $1, %edi          <- test 1-byte alignment (hmmm, isn't table
                                   already two-byte aligned, being a short ?)
        movl $126, %eax         <- we want to clear 126 bytes
        je .L7
        movb $0, table
        movl $table+1, %edi     <- now edi is guaranteed two-byte-aligned
        movl $125, %eax
.L7:
        testl $2, %edi          <- test 4-byte alignment
        je .L8
        movw $0, (%edi)
        subl $2, %eax
        addl $2, %edi           <- now edi is guaranteed four-byte-aligned
.L8:
        cld
        movl %eax, %ecx
        xorl %eax, %eax
        shrl $2, %ecx           <- number of 4-byte words remaining
        rep stosl
        testl $2, %edi          <- oops, this is really meant to test the
                                   remainder, not the address !!! so the
                                   test will always fail.
        je .L9
        movw $0, (%edi)
        addl $2, %edi
.L9:
        testl $1, %edi          <- that one too.
        je .L10
        movb $0, (%edi)
.L10:

2.95 was generating simpler code:

        movl $table,%edi
        xorl %eax,%eax
        cld
        movl $31,%ecx
        rep stosl
        stosw

This did not take care of alignment issues, but it was simpler and
actually faster on my Athlon.

Hope this helps,

-- 
Michel "Walken" LESPINASSE
Is this the best that god can do ? Then I'm not impressed.
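
For reference, a rough sketch of how the tail of the inlined memset could
look if it tested the remaining byte count instead of the destination
address. Using %edx as a scratch register to preserve the count is only an
assumption for illustration, not the actual back-end fix:

        movl %eax, %ecx
        movl %eax, %edx         # assumed scratch: save the byte count before
                                # %eax is reused as the fill value
        xorl %eax, %eax
        shrl $2, %ecx           # number of 4-byte words
        rep stosl
        testl $2, %edx          # test the remaining count, not the address
        je .L9
        movw $0, (%edi)
        addl $2, %edi
.L9:
        testl $1, %edx
        je .L10
        movb $0, (%edi)
.L10: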