From mboxrd@z Thu Jan  1 00:00:00 1970
From: "ÃÃ¥Ã°Ã¥ÃµÃ¨Ã­ ÃÃ¿Ã·Ã¥Ã±Ã«Ã Ã¢" <medtekh@orc.ru>
To: "Richard Henderson" <rth@cygnus.com>, "Alfred Perlstein" <bright@cygnus.rush.net>
Cc: "H.J. Lu" <hjl@varesearch.com>, <egcs@egcs.cygnus.com>
Subject: Re: gcc-2.7 creates faster code than pgcc-1.1.1
Date: Wed, 31 Mar 1999 23:46:00 -0000
Message-ID: <001401be6a0e$2734a300$a18330d4@main.medtech.ru>
X-SW-Source: 1999-03n/msg00344.html
Message-ID: <19990331234600.zyB6C4XsIfRALxPBunWqAaUpUx2t2JLdS1q8m8fxDDQ@z>

>On Fri, Mar 05, 1999 at 12:23:46PM -0500, Alfred Perlstein wrote:
>> > Anyway, "movzb? %al,%?ax" and "and? $255,%?ax" both take 1 tick.
>> > So this is a kind of mystery with these instructions.
>>
>> I think the magic lies in that with register renaming, instruction
>> caches and all the 'behind the scenes' optimizations PPro and later
>> versions of x86 chips can do.  It really should be investigated more.
>
>It has nothing to do with register renaming.
>
>It is most likely to be related to instruction alignment -- some
>important insn in the loop is straddling a 16-byte boundary, which
>requires an extra cycle to decode.
>
>I've seen such create up to a 20% difference in runtime on a small loop.
>


It has nothing to do with the paragraph (16-byte) boundary. In the movz
case the xorb insn crosses a paragraph boundary, while with andl no insn
crosses one.
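
For anyone who wants to repeat the check on their own disassembly, a minimal
sketch in C of the boundary test follows. The function name crosses_para is
hypothetical, and it assumes you already have each insn's start address and
length in bytes (for example from an objdump listing); it is not taken from
the loop discussed here.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if an instruction of `len` bytes that
 * starts at `addr` spans a 16-byte (paragraph) boundary, 0 otherwise.
 * The instruction crosses a boundary when its first and last bytes lie
 * in different 16-byte blocks.
 */
static int crosses_para(uintptr_t addr, unsigned len)
{
    if (len == 0)
        return 0;                       /* empty range crosses nothing */
    return (addr >> 4) != ((addr + len - 1) >> 4);
}
```

With made-up offsets, crosses_para(0x1e, 3) returns 1 because the bytes
0x1e..0x20 span the boundary at 0x20, while crosses_para(0x10, 5) returns 0.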

Sincerely Yours, Eugene.

P.S. For H.J. Lu -- I do not claim that things go slower with movz. The
slowdown I got was 1% (this may be statistical error). Nevertheless, there
is no speedup in most cases either (and nothing like the huge speedup seen
with decompression).
We should try to find out more about why and how this happens.
BTW, I have a PPro 180MHz.