From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Терехин Вячеслав"
To: "Richard Henderson", "Alfred Perlstein"
Cc: "H.J. Lu"
Subject: Re: gcc-2.7 creates faster code than pgcc-1.1.1
Date: Wed, 31 Mar 1999 23:46:00 -0000
Message-ID: <001401be6a0e$2734a300$a18330d4@main.medtech.ru>
X-SW-Source: 1999-03n/msg00344.html

>On Fri, Mar 05, 1999 at 12:23:46PM -0500, Alfred Perlstein wrote:
>> > I any way "movzb? %al,%?ax" and "and? $255,%?ax" takes 1 tick both.
>> > So this is a kind of mistery with this instructions.
>>
>> I think the magic lies in that with register renaming, instruction
>> caches and all the 'behind the scenes' optimizations PPro and later
>> versions of x86 chips can do. It really should be investigated more.
>
>It has nothing to do with register renaming.
>
>It is most likely to be related to instruction alignment -- some
>important insn in the loop is straddling a 16-byte boundary, which
>requires an extra cycle to decode.
>
>I've seen such create up to a 20% difference in runtime on a small loop.
>

It has nothing to do with the paragraph (16-byte) boundary. In the movz
case the xorb insn crosses a paragraph boundary, while with andl no insn
crosses one.

Sincerely Yours,
Eugene.

P.S. For H.J. Lu -- I am not claiming that things go slower with movz.
The slowdown I measured was 1% (this may be statistical error).
Nevertheless, in most cases there is no speedup either (and nothing like
the huge speedup seen with decompression). We should try to find out
more about why and how this happens. BTW, I have a 180 MHz PPro.
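
For anyone who wants to check the boundary question on their own build, here is a minimal sketch in C of the test involved: given an insn's start address and its length in bytes (both would have to be read off a disassembly, e.g. from objdump -d), it reports whether the insn straddles a 16-byte boundary. The helper name crosses_para is hypothetical and is not part of gcc or binutils.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if an insn of `len` bytes starting at
   `addr` occupies bytes in two different 16-byte blocks, 0 otherwise.
   A zero-length range never crosses anything. */
int crosses_para(uintptr_t addr, unsigned len)
{
    if (len == 0)
        return 0;
    /* Compare the 16-byte block index of the first and the last byte. */
    return (addr >> 4) != ((addr + len - 1) >> 4);
}
```

With made-up offsets (not taken from the real loop), crosses_para(0x1e, 3) is 1, because the insn covers bytes 0x1e..0x20, while crosses_para(0x1d, 3) is 0, because bytes 0x1d..0x1f stay inside one block.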