Date: Mon, 22 Apr 2002 23:30:00 -0000
From: Michel LESPINASSE
To: Roger Sayle
Cc: gcc@gcc.gnu.org, Richard Henderson, Jan Hubicka
Subject: Re: GCC performance regression - its memset!
Message-ID: <20020423060709.GA21922@zoy.org>

On Mon, Apr 22, 2002 at 11:13:09PM -0600, Roger Sayle wrote:
>
> I think it's one of Jan's changes.  I can reproduce the problem, and
> fix it using "-minline-all-stringops" which forces 3.1 to inline the
> memset on i686.  I was concerned that it was a middle-end bug with
> builtins, but it now appears to be an ia32 back-end issue.
>
> Michel, does "-minline-all-stringops" fix the problem for you?

This option actually generates invalid code for me. Here is a test case:

------------------- cut here -----------------
#include <string.h>

short table[64];

int main (void)
{
    int i;

    for (i = 0; i < 64; i++)
        table[i] = 1234;

    memset (table, 0, 63 * sizeof(short));

    return (table[62] != 0);
}
------------------- cut here -----------------

This code should return 0 (table[62] is inside the memset'd region);
however, it returns 1 when compiled with -O3 -minline-all-stringops.

Here is an extract from the generated asm (the memset part of it):

        movl $table, %edi
        testl $1, %edi          <- test 1-byte alignment (hmmm, isn't table
                                   already two-byte aligned, being a short ?)
        movl $126, %eax         <- we want to clear 126 bytes
        je .L7
        movb $0, table
        movl $table+1, %edi     <- now edi is guaranteed two-byte-aligned
        movl $125, %eax
.L7:
        testl $2, %edi          <- test 4-byte alignment
        je .L8
        movw $0, (%edi)
        subl $2, %eax
        addl $2, %edi           <- now edi is guaranteed four-byte-aligned
.L8:
        cld
        movl %eax, %ecx
        xorl %eax, %eax
        shrl $2, %ecx           <- number of 4-byte words remaining
        rep stosl
        testl $2, %edi          <- oops, this is really meant to test the
                                   remainder, not the address !!! so the
                                   test will always fail.
        je .L9
        movw $0, (%edi)
        addl $2, %edi
.L9:
        testl $1, %edi          <- that one too.
        je .L10
        movb $0, (%edi)
.L10:

2.95 was generating simpler code:

        movl $table,%edi
        xorl %eax,%eax
        cld
        movl $31,%ecx
        rep stosl
        stosw

This did not take care of alignment issues, but it was simpler and
actually faster on my Athlon.

Hope this helps,

-- 
Michel "Walken" LESPINASSE
Is this the best that god can do ? Then I'm not impressed.
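
For reference, a rough sketch of how the tail of the inlined memset could
look if it tested the remaining byte count instead of the destination
address. Using %edx as a scratch register to preserve the count is only an
assumption for illustration, not the actual back-end fix:

        movl %eax, %ecx
        movl %eax, %edx         # assumed scratch: save the byte count before
                                # %eax is reused as the fill value
        xorl %eax, %eax
        shrl $2, %ecx           # number of 4-byte words
        rep stosl
        testl $2, %edx          # test the remaining count, not the address
        je .L9
        movw $0, (%edi)
        addl $2, %edi
.L9:
        testl $1, %edx
        je .L10
        movb $0, (%edi)
.L10: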