* slow complex<double>'s with g++
@ 2006-03-03 17:21 Greg Buchholz
2006-03-03 17:37 ` Brian Budge
0 siblings, 1 reply; 8+ messages in thread
From: Greg Buchholz @ 2006-03-03 17:21 UTC (permalink / raw)
To: gcc-help
/*
While writing a C++ version of the Mandelbrot benchmark over at the
"The Great Computer Language Shootout"...
http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=all
...I've come across the issue that complex<double>'s seem painfully slow
unless compiled with -ffast-math. Of course doing that results in
incorrect answers because of rounding issues. The speed difference for
the program below is between 5x-8x depending on the version of g++. It
is also about 5 times slower than the corresponding gcc version at...
http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=gcc&id=2
...I'd be interesting in learning the reason for the speed difference.
Does it have something to do with temporaries not being optimized away,
or somesuch? A limitation of the x87 instruction set? Is it inherent
in the way the C++ Standard requires complex<double>'s to be calculated?
My bad coding style?
Curious,
Greg Buchholz
*/
// Takes an integer argument "n" on the command line and generates a
// PBM bitmap of the Mandelbrot set on stdout.
// see also: ( http://sleepingsquirrel.org/cpp/mandelbrot.cpp.html )
#include<iostream>
#include<complex>
int main (int argc, char **argv)
{
char bit_num = 0, byte_acc = 0;
const int iter = 50;
const double limit_sqr = 2.0 * 2.0;
std::ios_base::sync_with_stdio(false);
int n = atoi(argv[1]);
std::cout << "P4\n" << n << " " << n << std::endl;
for(int y=0; y<n; ++y)
for(int x=0; x<n; ++x)
{
std::complex<double> Z(0.0,0.0);
std::complex<double> C(2*(double)x/n - 1.5, 2*(double)y/n - 1.0);
for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) Z = Z*Z + C;
byte_acc = (byte_acc << 1) | ((norm(Z) > limit_sqr) ? 0x00:0x01);
if(++bit_num == 8){ std::cout << byte_acc; bit_num = byte_acc = 0; }
else if(x == n-1) { byte_acc <<= (8-n%8);
std::cout << byte_acc;
bit_num = byte_acc = 0; }
}
}
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: slow complex<double>'s with g++
2006-03-03 17:21 slow complex<double>'s with g++ Greg Buchholz
@ 2006-03-03 17:37 ` Brian Budge
2006-03-03 17:46 ` Greg Buchholz
2006-03-03 19:22 ` slow complex<double>'s with g++ Greg Buchholz
0 siblings, 2 replies; 8+ messages in thread
From: Brian Budge @ 2006-03-03 17:37 UTC (permalink / raw)
To: Greg Buchholz; +Cc: gcc-help
Which command line arguments are you using? Can you make use of sse2?
Brian
On 3/3/06, Greg Buchholz <greg@sleepingsquirrel.org> wrote:
> /*
> While writing a C++ version of the Mandelbrot benchmark over at the
> "The Great Computer Language Shootout"...
>
> http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=all
>
> ...I've come across the issue that complex<double>'s seem painfully slow
> unless compiled with -ffast-math. Of course doing that results in
> incorrect answers because of rounding issues. The speed difference for
> the program below is between 5x-8x depending on the version of g++. It
> is also about 5 times slower than the corresponding gcc version at...
>
> http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=gcc&id=2
>
> ...I'd be interesting in learning the reason for the speed difference.
> Does it have something to do with temporaries not being optimized away,
> or somesuch? A limitation of the x87 instruction set? Is it inherent
> in the way the C++ Standard requires complex<double>'s to be calculated?
> My bad coding style?
>
> Curious,
>
> Greg Buchholz
> */
>
> // Takes an integer argument "n" on the command line and generates a
> // PBM bitmap of the Mandelbrot set on stdout.
> // see also: ( http://sleepingsquirrel.org/cpp/mandelbrot.cpp.html )
>
> #include<iostream>
> #include<complex>
>
> int main (int argc, char **argv)
> {
> char bit_num = 0, byte_acc = 0;
> const int iter = 50;
> const double limit_sqr = 2.0 * 2.0;
>
> std::ios_base::sync_with_stdio(false);
> int n = atoi(argv[1]);
>
> std::cout << "P4\n" << n << " " << n << std::endl;
>
> for(int y=0; y<n; ++y)
> for(int x=0; x<n; ++x)
> {
> std::complex<double> Z(0.0,0.0);
> std::complex<double> C(2*(double)x/n - 1.5, 2*(double)y/n - 1.0);
>
> for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) Z = Z*Z + C;
>
> byte_acc = (byte_acc << 1) | ((norm(Z) > limit_sqr) ? 0x00:0x01);
>
> if(++bit_num == 8){ std::cout << byte_acc; bit_num = byte_acc = 0; }
> else if(x == n-1) { byte_acc <<= (8-n%8);
> std::cout << byte_acc;
> bit_num = byte_acc = 0; }
> }
> }
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: slow complex<double>'s with g++
2006-03-03 17:37 ` Brian Budge
@ 2006-03-03 17:46 ` Greg Buchholz
2006-03-03 18:28 ` Brian Budge
2006-03-03 19:22 ` slow complex<double>'s with g++ Greg Buchholz
1 sibling, 1 reply; 8+ messages in thread
From: Greg Buchholz @ 2006-03-03 17:46 UTC (permalink / raw)
To: gcc-help
Brian Budge wrote:
> Which command line arguments are you using?
Mostly "-O3", but I've tried lots of combinations of different
options and I haven't yet found the magic incantation.
>Can you make use of sse2?
Well, there is a C version that I wrote which uses gcc's vector
extensions and SSE2...
http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=gcc&id=3
...but right now I'm more interested in why normal complex<double>'s are
so slow.
Greg Buchholz
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: slow complex<double>'s with g++
2006-03-03 17:46 ` Greg Buchholz
@ 2006-03-03 18:28 ` Brian Budge
2006-03-03 18:59 ` Greg Buchholz
0 siblings, 1 reply; 8+ messages in thread
From: Brian Budge @ 2006-03-03 18:28 UTC (permalink / raw)
To: gcc-help
It may be that I'm missing something, but you appear to be computing
different things...
Zi = 2.0*Zr*Zi + Ci;
Zr = Tr - Ti + Cr;
vs
Z = Z*Z + C;
No doubt the 2.0 makes a difference...
There are several minor optimizations also that might speed up your
code... if you're interested, email me offline.
Brian
On 3/3/06, Greg Buchholz <greg@sleepingsquirrel.org> wrote:
> Brian Budge wrote:
> > Which command line arguments are you using?
>
> Mostly "-O3", but I've tried lots of combinations of different
> options and I haven't yet found the magic incantation.
>
> >Can you make use of sse2?
>
> Well, there is a C version that I wrote which uses gcc's vector
> extensions and SSE2...
>
> http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=gcc&id=3
>
> ...but right now I'm more interested in why normal complex<double>'s are
> so slow.
>
> Greg Buchholz
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: slow complex<double>'s with g++
2006-03-03 18:28 ` Brian Budge
@ 2006-03-03 18:59 ` Greg Buchholz
2006-03-03 19:25 ` Brian Budge
0 siblings, 1 reply; 8+ messages in thread
From: Greg Buchholz @ 2006-03-03 18:59 UTC (permalink / raw)
To: gcc-help
Brian Budge wrote:
> It may be that I'm missing something, but you appear to be computing
> different things...
>
> Zi = 2.0*Zr*Zi + Ci;
> Zr = Tr - Ti + Cr;
>
> vs
>
> Z = Z*Z + C;
>
> No doubt the 2.0 makes a difference...
Hmm. I don't get what you are saying. Squaring a complex number
like a+ib you get...
(a+ib) * (a+ib)
a*a + iab + iab - b*b
(a*a - b*b) + i2ab
...Are you saying a multiplication instead of an addition is the cause?
Yes, the C version is a little more optimized, but I was surprised at
the factor of 8.5 time difference for gcc-4.1pre021006. I guess I
expecting something small, say a 20-50% slowdown. And you get all of
the speed back if you just use "-ffast-math". Maybe I'm just surprised
that the optimizations that "-ffast-math" turns on do such a good job.
Greg Buchholz
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: slow complex<double>'s with g++
2006-03-03 17:37 ` Brian Budge
2006-03-03 17:46 ` Greg Buchholz
@ 2006-03-03 19:22 ` Greg Buchholz
1 sibling, 0 replies; 8+ messages in thread
From: Greg Buchholz @ 2006-03-03 19:22 UTC (permalink / raw)
To: gcc-help
Brian Budge wrote:
> Which command line arguments are you using? Can you make use of sse2?
BTW, if you're an SSE2 expert, do you have any insight on this problem...
http://gcc.gnu.org/ml/gcc-help/2006-02/msg00098.html
Thanks,
Greg Buchholz
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: slow complex<double>'s with g++
2006-03-03 18:59 ` Greg Buchholz
@ 2006-03-03 19:25 ` Brian Budge
2006-03-06 18:03 ` slow complex<double>'s with g++ [Solved] Greg Buchholz
0 siblings, 1 reply; 8+ messages in thread
From: Brian Budge @ 2006-03-03 19:25 UTC (permalink / raw)
To: gcc-help
Hmmmm, yeah... I guess it shouldn't make an 8.5 time difference.
You're right, it should be a smaller difference. The squaring is 2
muls and an add vs 4 muls and 2 adds, so it should be less than twice
as fast when you include the other portion.
The instruction ordering might be more optimal also.
8.5 times though... weird.
Brian
On 3/3/06, Greg Buchholz <greg@sleepingsquirrel.org> wrote:
> Brian Budge wrote:
> > It may be that I'm missing something, but you appear to be computing
> > different things...
> >
> > Zi = 2.0*Zr*Zi + Ci;
> > Zr = Tr - Ti + Cr;
> >
> > vs
> >
> > Z = Z*Z + C;
> >
> > No doubt the 2.0 makes a difference...
>
> Hmm. I don't get what you are saying. Squaring a complex number
> like a+ib you get...
>
> (a+ib) * (a+ib)
> a*a + iab + iab - b*b
> (a*a - b*b) + i2ab
>
> ...Are you saying a multiplication instead of an addition is the cause?
> Yes, the C version is a little more optimized, but I was surprised at
> the factor of 8.5 time difference for gcc-4.1pre021006. I guess I
> expecting something small, say a 20-50% slowdown. And you get all of
> the speed back if you just use "-ffast-math". Maybe I'm just surprised
> that the optimizations that "-ffast-math" turns on do such a good job.
>
> Greg Buchholz
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: slow complex<double>'s with g++ [Solved]
2006-03-03 19:25 ` Brian Budge
@ 2006-03-06 18:03 ` Greg Buchholz
0 siblings, 0 replies; 8+ messages in thread
From: Greg Buchholz @ 2006-03-06 18:03 UTC (permalink / raw)
To: gcc-help
Brian Budge wrote:
> Hmmmm, yeah... I guess it shouldn't make an 8.5 time difference.
>
> You're right, it should be a smaller difference. The squaring is 2
> muls and an add vs 4 muls and 2 adds, so it should be less than twice
> as fast when you include the other portion.
>
> The instruction ordering might be more optimal also.
>
> 8.5 times though... weird.
>
Looks like the problem can be solved by manually inlining the
definition of "norm"...
//manually inlining "norm" results in a 5x-7x speedup on g++
for(int i=0; i<iter and
(Z.real()*Z.real() + Z.imag()*Z.imag()) <= limit_sqr; ++i)
Z = Z*Z + C;
...For some reason g++ must not have been able to inline it (or does so
after common subexpression elimination or somesuch).
Greg Buchholz
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2006-03-06 18:03 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-03-03 17:21 slow complex<double>'s with g++ Greg Buchholz
2006-03-03 17:37 ` Brian Budge
2006-03-03 17:46 ` Greg Buchholz
2006-03-03 18:28 ` Brian Budge
2006-03-03 18:59 ` Greg Buchholz
2006-03-03 19:25 ` Brian Budge
2006-03-06 18:03 ` slow complex<double>'s with g++ [Solved] Greg Buchholz
2006-03-03 19:22 ` slow complex<double>'s with g++ Greg Buchholz
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).