From mboxrd@z Thu Jan  1 00:00:00 1970
From: Sylvain Pion <Sylvain.Pion@sophia.inria.fr>
To: EGCS list <egcs@egcs.cygnus.com>
Subject: Re: C++ default copy ctor not optimal
Date: Fri, 12 Feb 1999 04:46:00 -0000
Message-id: <19990212134607.F13091@rigel.inria.fr>
In-reply-to: <19990212120037.C13091@rigel.inria.fr>; from Sylvain Pion on Fri, Feb 12, 1999 at 12:00:37PM +0100
References: <19990212120037.C13091@rigel.inria.fr>
X-SW-Source: 1999-02/msg00436.html

On Fri, Feb 12, 1999 at 12:00:37PM +0100, Sylvain Pion wrote:
> I've got a class with 2 "double" data members (like a complex).
> The default copy ctor should be a memberwise copy, but it is slower (on x86)
> than when I declare it explicitly.  In fact, mine generates copies using the
> FPU, whereas the default does something like a memcopy, using more 32bits
> "mov"s, and thus is slower.

In case I was not explicit enough, the test case is the following C++ program:

----------------
struct IA { 
  double i,s;
#ifdef TEST
  IA () {}
  IA (const IA & d) : i(d.i), s(d.s) {}

  IA & operator=(const IA & d)
  {i = d.i; s = d.s; return *this; }
#endif
};

int main() { IA a; IA b = a; }
----------------


If you compile with "g++ -O2", you get:

main:
.LFB1:
        pushl %ebp
.LCFI0:
        movl %esp,%ebp
.LCFI1:
        subl $32,%esp
.LCFI2:
        movl -16(%ebp),%eax
        movl %eax,-32(%ebp)
        movl -12(%ebp),%eax
        movl %eax,-28(%ebp)
        movl -8(%ebp),%eax
        movl %eax,-24(%ebp)
        movl -4(%ebp),%eax
        movl %eax,-20(%ebp)
        xorl %eax,%eax
        movl %ebp,%esp
        popl %ebp
        ret


And if you compile with "g++ -DTEST -O2":

main:
.LFB1:
        pushl %ebp
.LCFI0:
        movl %esp,%ebp
.LCFI1:
        subl $32,%esp
.LCFI2:
        fldl -16(%ebp)
        fstpl -32(%ebp)
        fldl -8(%ebp)
        fstpl -24(%ebp)
        xorl %eax,%eax
        movl %ebp,%esp
        popl %ebp
        ret


I clearly prefer the second one :).  This was tested with the 19990208
snapshot, but the result is basically the same with all versions I've tried
so far.  G++ 2.8.1 doesn't use the FPU even in the second case.  The platform
is Linux-2.0/libc5/x86; I've not tested other architectures.
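
For anyone who wants to measure the difference rather than read the assembly,
here is a rough sketch that puts both variants in one program.  The names
IA_default, IA_explicit, make_data and time_copies are made up for this
example, and it relies on <chrono>, which is newer than the compilers
discussed here.  An optimizer may remove or merge the copy loops, so the
timings are only meaningful after checking the generated code.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Same layout as IA without -DTEST: the compiler generates the copy.
struct IA_default {
  double i, s;
};

// Same as IA with -DTEST: the copy is spelled out member by member.
struct IA_explicit {
  double i, s;
  IA_explicit() : i(0), s(0) {}
  IA_explicit(const IA_explicit& d) : i(d.i), s(d.s) {}
  IA_explicit& operator=(const IA_explicit& d) {
    i = d.i;
    s = d.s;
    return *this;
  }
};

// Build n elements with distinct, initialized values.
template <class T>
std::vector<T> make_data(std::size_t n) {
  std::vector<T> v(n);
  for (std::size_t k = 0; k < n; ++k) {
    v[k].i = double(k);
    v[k].s = double(k) + 0.5;
  }
  return v;
}

// Assign src to dst element by element, `rounds` times; return seconds taken.
template <class T>
double time_copies(const std::vector<T>& src, std::vector<T>& dst, int rounds) {
  assert(dst.size() >= src.size());
  auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < rounds; ++r)
    for (std::size_t k = 0; k < src.size(); ++k)
      dst[k] = src[k];
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

Calling time_copies once with vectors of IA_default and once with vectors of
IA_explicit gives two numbers to compare.  Note that this exercises
operator= rather than the copy ctor used in the test case above.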

I can understand that it may not be safe to copy a 64-bit memory area via the
FPU when its precision mode is set to single precision (it might truncate the
value, or raise an exception if it's a NaN or something similar?).
However, if we are allowed to copy doubles via the FPU, then it might be a
valid "optimisation" to apply it to the default copy ctor as well.
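
The safety question can be probed with a small check, sketched below under
some assumptions: double is a 64-bit IEEE type, and the helper
copy_preserves_bits and the three constants are names invented for this
example.  It writes a raw bit pattern into a member with memcpy, copies the
object member by member, and compares the bits afterwards.  Whether a
signaling NaN survives is the interesting case; I would expect an x87
load/store pair to turn it into a quiet NaN, but that is something to verify
on the target, not a result shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes a 64-bit double");

// Bit patterns to try (IEEE 754 double encoding assumed).
const std::uint64_t kOnePointFive = 0x3FF8000000000000ULL;  // 1.5
const std::uint64_t kQuietNaN     = 0x7FF8000000000000ULL;
const std::uint64_t kSignalingNaN = 0x7FF0000000000001ULL;

// The -DTEST variant of the class: copies go through the double members.
struct IA {
  double i, s;
  IA() : i(0), s(0) {}
  IA(const IA& d) : i(d.i), s(d.s) {}
};

// Store `pattern` in a.i as raw bytes, copy the object with the
// member-wise copy ctor, and report whether the copy has the same bits.
bool copy_preserves_bits(std::uint64_t pattern) {
  IA a;
  std::memcpy(&a.i, &pattern, sizeof pattern);
  IA b = a;  // the copy under test
  std::uint64_t out;
  std::memcpy(&out, &b.i, sizeof out);
  return out == pattern;
}
```

Calling copy_preserves_bits(kSignalingNaN) on the target of interest, and
looking at the code generated for the copy, tells whether the FPU path is
bit-exact there.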

That said, I can live with the two additional lines of code in my class to get
this optimisation.  I was just curious why it's done this way.
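
One plausible part of the answer, offered as an assumption and not as a
statement about the compiler's internals: an implicitly generated copy of
such a struct is "trivial", so the compiler may treat it as a plain block of
bytes, whereas a user-written copy ctor makes it copy each double as a
double.  The sketch below only shows that the language distinguishes the two
cases.  It uses std::is_trivially_copyable from <type_traits>, which is a
later addition to the standard library, and the names IA_default,
IA_explicit and copy_is_trivial are made up for the example.

```cpp
#include <cassert>
#include <type_traits>

// No user-declared copy operations: the implicit ones are trivial.
struct IA_default {
  double i, s;
};

// User-provided copy operations: the class is no longer trivially copyable.
struct IA_explicit {
  double i, s;
  IA_explicit() {}
  IA_explicit(const IA_explicit& d) : i(d.i), s(d.s) {}
  IA_explicit& operator=(const IA_explicit& d) {
    i = d.i;
    s = d.s;
    return *this;
  }
};

// True when copying a T is allowed to be a plain byte-wise copy.
template <class T>
bool copy_is_trivial() {
  return std::is_trivially_copyable<T>::value;
}

// Runtime check of the two expectations stated above.
inline void check_assumptions() {
  assert(copy_is_trivial<IA_default>());
  assert(!copy_is_trivial<IA_explicit>());
}
```

This says nothing about which instructions are faster; it only shows where
the two versions of the class differ as far as the language is concerned.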

-- 
Sylvain
