From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jamie Lokier
To: Терехин Вячеслав, egcs@egcs.cygnus.com
Subject: Re: gcc-2.7 creates faster code than pgcc-1.1.1
Date: Thu, 04 Mar 1999 13:20:00 -0000
Message-id: <19990304222018.A21939@pcep-jamie.cern.ch>
In-reply-to: <001401be6633$fed21a60$a18330d4@main.medtech.ru>; from Терехин Вячеслав on Thu, Mar 04, 1999 at 02:40:08PM +0300
References: <001401be6633$fed21a60$a18330d4@main.medtech.ru>
X-SW-Source: 1999-03/msg00185.html

Терехин Вячеслав wrote:
> After several days of searching I finally found the offending
> instruction that slows down gzip compiled with egcs-1.1.1/pgcc-1.1.1
> on PentiumPro 180MHz (132MB RAM), but the result seems crazy to me.
>
> This instruction is:
>     andl $255, %eax
> in the flush_window (util.c) function body (it is inlined from updcrc).
>
> If you manually replace it with
>     movzbl %al, %eax
> this will boost decompression by 20%.

In the past I have written hand-optimised assembly language, tuned for
the different x86 families, and I found movzbl to be a very effective
instruction on the Pentium Pro. So what you describe sounds correct.

Another option is to do xorl %eax,%eax just before loading something
into %al. That is fast on the PPro too.

--
Jamie
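
For readers who want to see the two forms side by side at the C level, here is a minimal sketch of a table-driven CRC-32 byte step written two ways: once with an explicit mask, once with a narrowing cast to an 8-bit type. The names (make_crc_table, crc_step_mask, crc_step_cast, crc32_buf) are hypothetical and this is not gzip's actual updcrc source. Whether a given compiler emits andl $255 for the first and movzbl for the second depends on the compiler version, target, and flags, so treat that mapping as an assumption to check in the generated assembly. The sketch shows only that both forms compute the same result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration; not taken from gzip's util.c. */
static uint32_t crc_table[256];

/* Build the standard reflected CRC-32 table (polynomial 0xEDB88320). */
void make_crc_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        crc_table[n] = c;
    }
}

/* Table index taken with an explicit mask.
 * Assumption: a compiler may translate this to "andl $255, %eax". */
uint32_t crc_step_mask(uint32_t crc, unsigned char byte)
{
    uint32_t idx = (crc ^ byte) & 0xffu;
    return crc_table[idx] ^ (crc >> 8);
}

/* Same index taken by narrowing to an 8-bit type and widening again.
 * Assumption: a compiler may translate this to "movzbl %al, %eax". */
uint32_t crc_step_cast(uint32_t crc, unsigned char byte)
{
    uint8_t idx = (uint8_t)(crc ^ byte);
    return crc_table[idx] ^ (crc >> 8);
}

/* CRC-32 of a buffer, using either step function. */
uint32_t crc32_buf(const unsigned char *p, size_t n, int use_cast)
{
    uint32_t crc = 0xffffffffu;
    while (n--) {
        unsigned char b = *p++;
        crc = use_cast ? crc_step_cast(crc, b) : crc_step_mask(crc, b);
    }
    return crc ^ 0xffffffffu;
}
```

Call make_crc_table() once, then crc32_buf() with use_cast set to 0 or 1. Compiling with -S and comparing the two step functions shows what a particular compiler actually generates for each form.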