From mboxrd@z Thu Jan  1 00:00:00 1970
From: Sylvain Pion
To: EGCS list
Subject: Re: C++ default copy ctor not optimal
Date: Fri, 12 Feb 1999 04:46:00 -0000
Message-id: <19990212134607.F13091@rigel.inria.fr>
In-reply-to: <19990212120037.C13091@rigel.inria.fr>; from Sylvain Pion on Fri, Feb 12, 1999 at 12:00:37PM +0100
References: <19990212120037.C13091@rigel.inria.fr> <19990212120037.C13091@rigel.inria.fr>
X-SW-Source: 1999-02/msg00436.html

On Fri, Feb 12, 1999 at 12:00:37PM +0100, Sylvain Pion wrote:
> I've got a class with 2 "double" data members (like a complex).
> The default copy ctor should be a memberwise copy, but it is slower (on x86)
> than when I declare it explicitly.  In fact, mine generates copies using the
> FPU, whereas the default does something like a memcopy, using more 32bits
> "mov"s, and thus is slower.

In case I was not explicit enough, the test case is the following C++ program:

----------------
struct IA {
  double i,s;
#ifdef TEST
  IA () {}
  IA (const IA & d) : i(d.i), s(d.s) {}
  IA & operator=(const IA & d) {i = d.i; s = d.s; return *this; }
#endif
};

int main()
{
  IA a;
  IA b = a;
}
----------------

If you compile with "g++ -O2", you get:

main:
.LFB1:
	pushl %ebp
.LCFI0:
	movl %esp,%ebp
.LCFI1:
	subl $32,%esp
.LCFI2:
	movl -16(%ebp),%eax
	movl %eax,-32(%ebp)
	movl -12(%ebp),%eax
	movl %eax,-28(%ebp)
	movl -8(%ebp),%eax
	movl %eax,-24(%ebp)
	movl -4(%ebp),%eax
	movl %eax,-20(%ebp)
	xorl %eax,%eax
	movl %ebp,%esp
	popl %ebp
	ret

And if you compile with "g++ -DTEST -O2":

main:
.LFB1:
	pushl %ebp
.LCFI0:
	movl %esp,%ebp
.LCFI1:
	subl $32,%esp
.LCFI2:
	fldl -16(%ebp)
	fstpl -32(%ebp)
	fldl -8(%ebp)
	fstpl -24(%ebp)
	xorl %eax,%eax
	movl %ebp,%esp
	popl %ebp
	ret

It is clearly the second one I prefer :).

It is tested with the 19990208 snapshot, but it's basically the same with all
versions I've tried so far.  G++ 2.8.1 doesn't use the FPU even in the second
case.  And it's on Linux-2.0/libc5/x86.  I've not tested other archs.
I can understand that it may not be safe to copy a 64-bit memory area via the
FPU when its precision mode is set to single float (it might truncate the
value, or raise an exception if it's a NaN, or something similar?).  However,
if we are allowed to copy doubles via the FPU, then it might be a valid
"optimisation" to propagate it to such cases.

However, I can live with the 2 additional lines of code in my class to get
this optimisation.  I was just curious why it's done this way.

-- 
Sylvain