From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jamie Lokier <egcs@tantalophile.demon.co.uk>
To: Ã´ÃÃÃÃÃÃ Ã·ÃÃÃÃÃÃÃ <medtekh@orc.ru>, egcs@egcs.cygnus.com
Subject: Re: gcc-2.7 creates faster code than pgcc-1.1.1
Date: Thu, 04 Mar 1999 13:20:00 -0000
Message-id: <19990304222018.A21939@pcep-jamie.cern.ch>
In-reply-to: <001401be6633$fed21a60$a18330d4@main.medtech.ru>; from Ã´ÃÃÃÃÃÃ Ã·ÃÃÃÃÃÃÃ on Thu, Mar 04, 1999 at 02:40:08PM +0300
References: <001401be6633$fed21a60$a18330d4@main.medtech.ru>
X-SW-Source: 1999-03/msg00185.html

Ã´ÃÃÃÃÃÃ Ã·ÃÃÃÃÃÃÃ wrote:
> After several days of searching I finally found the offending
> instruction that slows down gzip compiled with egcs-1.1.1/pgcc-1.1.1
> on a PentiumPro 180MHz (132MB RAM), but the result seems crazy to me.
> 
> This instruction is:
> andl $255, %eax
> in the body of flush_window (util.c); it is inlined from updcrc.
> 
> If you manually replace it with
> movzbl %al, %eax
> decompression gets about 20% faster.

In the past I have written hand-optimised assembly language, tuned for
the different x86 families, and I found movzbl to be a very effective
instruction on the Pentium Pro.  So what you describe sounds correct.

Another effective trick is to do xorl %eax,%eax just before loading a
byte into %al.  That is fast on the PPro too.

-- Jamie
