From mboxrd@z Thu Jan  1 00:00:00 1970
From: Sylvain Pion <Sylvain.Pion@sophia.inria.fr>
To: Richard Henderson <rth@cygnus.com>
Cc: law@cygnus.com, Jason Merrill <jason@cygnus.com>, egcs@egcs.cygnus.com
Subject: Re: C++ default copy ctor not optimal
Date: Sun, 28 Feb 1999 22:53:00 -0000
Message-ID: <19990216091048.A13581@rigel.inria.fr>
References: <19990215171524.A19063@cygnus.com> <14555.919137968@upchuck> <19990215213927.A19254@cygnus.com>
X-SW-Source: 1999-02n/msg00699.html
Message-ID: <19990228225300.s_JAAAuQaYI7WkLtBo_RG0L-KSZAw8yqz6t7gyQyiSw@z>

On Mon, Feb 15, 1999 at 09:39:27PM -0800, Richard Henderson wrote:
> On Mon, Feb 15, 1999 at 09:06:08PM -0700, Jeffrey A Law wrote:
> >   > The x86 fpu can load DImode values without faulting, and since
> >   > the fractional part of the extended double register is 64-bits
> >   > wide, we don't lose bits.

"can"?  Does that mean it depends on some flags in the FPCW?
What if the FPU is in MMX mode?  I guess it won't work, will it?
In MMX mode we can use MMX insns, but the compiler doesn't know
which mode we are in.
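
For concreteness, here is a minimal sketch of the kind of copy under
discussion: each 8-byte unit is loaded as a 64-bit integer with fild and
stored back with fistp.  It assumes GCC-style inline asm on x86, and it
assumes the x87 stack has a free slot, which is exactly what is not
guaranteed if MMX insns have been executed without a following emms.
The name fpu_copy64 is made up for illustration; this is not the code
the compiler would emit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes, moving 8 bytes at a time through
 * the x87 stack.  Every 64-bit integer fits exactly in the 64-bit
 * significand of an extended-precision register, so the fild/fistp
 * round trip preserves all bits. */
static void fpu_copy64(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (d != NULL && s != NULL));

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    while (n >= 8) {
        /* Assumption: the FPU is in x87 (not MMX) state, so the push
         * done by fildll cannot overflow the register stack. */
        __asm__ volatile ("fildll (%1)\n\t"
                          "fistpll (%0)"
                          :
                          : "r" (d), "r" (s)
                          : "memory", "st");
        d += 8;
        s += 8;
        n -= 8;
    }
#endif
    /* Tail bytes, or the whole copy on targets without x87. */
    memcpy(d, s, n);
}
```

The loads are integer loads, so a bit pattern that would be a
signalling NaN as a double is copied like any other value.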

> > But is it profitable?  Particularly in cases where the addresses are
> > not 64bit aligned?
> 
> Certainly not when alignment is not to be had.  But on Pentiums,
> it can speed things up quite a bit. 

Yes.  The speedup is noticeable for my stuff, so I guess that using it more
widely is a good idea, if it's feasible.
The speed difference is also very large when the alignment is not
correct.
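
To illustrate the alignment point, a small sketch in portable C of how
a copy routine might first bring the destination to an 8-byte boundary
and then move 8-byte units.  The function names are hypothetical, and
the inner 8-byte move is written with memcpy as a stand-in for whatever
64-bit load/store pair is actually used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of bytes to skip before p reaches an 8-byte boundary (0..7). */
static size_t bytes_to_align8(const void *p)
{
    return (size_t)(-(uintptr_t)p & 7u);
}

/* Hypothetical helper: copy a short head so that the destination is
 * 8-byte aligned, then copy 8-byte units, then the remaining tail. */
static void copy_with_aligned_body(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t head = bytes_to_align8(d);

    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Either the destination is now aligned or nothing is left. */
    assert(n == 0 || ((uintptr_t)d & 7u) == 0);

    while (n >= 8) {
        uint64_t w;
        memcpy(&w, s, 8);   /* the source may still be misaligned */
        memcpy(d, &w, 8);   /* aligned 8-byte store */
        d += 8;
        s += 8;
        n -= 8;
    }
    memcpy(d, s, n);
}
```

Only one of the two pointers can be aligned this way in general; when
source and destination differ in their low three address bits, one side
of each 8-byte move stays misaligned.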

> I'm not sure what effect it has on p2.  Probably still a good thing
> in small doses.  Larger copies should use rep movsl, as the microcode
> does neat cache tricks.

I don't know, but the FP memcpy() patch for the Linux kernel worked very well
(at least on Pentiums), and it was for large areas.
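
For comparison, the rep movsl alternative mentioned above could be
sketched as follows.  This assumes GCC-style inline asm on x86 and a
clear direction flag, as the usual calling conventions require; the
name rep_movsl_copy is made up for illustration, and nothing here says
at which size one approach overtakes the other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes, using "rep movsl" for the 4-byte
 * units and memcpy for the last 0..3 bytes. */
static void rep_movsl_copy(void *dst, const void *src, size_t n)
{
    size_t words = n / 4;

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* The string insn advances the destination, source and count
     * registers itself, hence the read-write operands. */
    __asm__ volatile ("rep movsl"
                      : "+D" (dst), "+S" (src), "+c" (words)
                      :
                      : "memory");
    assert(words == 0);   /* rep stops when the count reaches zero */
#else
    /* Fallback for targets without the x86 string insns. */
    memcpy(dst, src, words * 4);
    dst = (unsigned char *)dst + words * 4;
    src = (const unsigned char *)src + words * 4;
#endif
    memcpy(dst, src, n % 4);
}
```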

-- 
Sylvain