public inbox for gcc-help@gcc.gnu.org
* slow complex<double>'s with g++
@ 2006-03-03 17:21 Greg Buchholz
  2006-03-03 17:37 ` Brian Budge
  0 siblings, 1 reply; 8+ messages in thread
From: Greg Buchholz @ 2006-03-03 17:21 UTC
  To: gcc-help

/* 
    While writing a C++ version of the Mandelbrot benchmark over at the
"The Great Computer Language Shootout"...

  http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=all

...I've come across the issue that complex<double>'s seem painfully slow
unless compiled with -ffast-math. Of course doing that results in
incorrect answers because of rounding issues.  The speed difference for
the program below is between 5x-8x depending on the version of g++.  It
is also about 5 times slower than the corresponding gcc version at...

http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=gcc&id=2

...I'd be interested in learning the reason for the speed difference.
Does it have something to do with temporaries not being optimized away,
or somesuch?  A limitation of the x87 instruction set?  Is it inherent
in the way the C++ Standard requires complex<double>'s to be calculated?
My bad coding style?

Curious,

Greg Buchholz
*/

// Takes an integer argument "n" on the command line and generates a
// PBM bitmap of the Mandelbrot set on stdout.
// see also: ( http://sleepingsquirrel.org/cpp/mandelbrot.cpp.html )

#include<iostream>
#include<complex>
#include<cstdlib>   // for atoi()

int main (int argc, char **argv)
{
  char  bit_num = 0, byte_acc = 0;
  const int iter = 50;
  const double limit_sqr = 2.0 * 2.0;
  
  std::ios_base::sync_with_stdio(false);
  int n = atoi(argv[1]);

  std::cout << "P4\n" << n << " " << n << std::endl;

  for(int y=0; y<n; ++y) 
    for(int x=0; x<n; ++x)
    {
       std::complex<double> Z(0.0,0.0);
       std::complex<double> C(2*(double)x/n - 1.5, 2*(double)y/n - 1.0);
        
       for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i)  Z = Z*Z + C;
        
       byte_acc = (byte_acc << 1) | ((norm(Z) > limit_sqr) ? 0x00:0x01);

       if(++bit_num == 8){ std::cout << byte_acc; bit_num = byte_acc = 0; }
       else if(x == n-1) { byte_acc  <<= (8-n%8);
                           std::cout << byte_acc;
                           bit_num = byte_acc = 0; }
    }
}


* Re: slow complex<double>'s with g++
  2006-03-03 17:21 slow complex<double>'s with g++ Greg Buchholz
@ 2006-03-03 17:37 ` Brian Budge
  2006-03-03 17:46   ` Greg Buchholz
  2006-03-03 19:22   ` slow complex<double>'s with g++ Greg Buchholz
  0 siblings, 2 replies; 8+ messages in thread
From: Brian Budge @ 2006-03-03 17:37 UTC
  To: Greg Buchholz; +Cc: gcc-help

Which command line arguments are you using?  Can you make use of sse2?
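
(On x86 that would mean building with something along the lines of

    g++ -O3 -march=pentium4 -mfpmath=sse -msse2 mandelbrot.cpp

where the file name is just a placeholder and -march should match the
CPU.)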

  Brian

On 3/3/06, Greg Buchholz <greg@sleepingsquirrel.org> wrote:
> /*
>     While writing a C++ version of the Mandelbrot benchmark over at the
> "The Great Computer Language Shootout"...
>
>   http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=all
>
> ...I've come across the issue that complex<double>'s seem painfully slow
> unless compiled with -ffast-math. Of course doing that results in
> incorrect answers because of rounding issues.  The speed difference for
> the program below is between 5x-8x depending on the version of g++.  It
> is also about 5 times slower than the corresponding gcc version at...
>
> http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=gcc&id=2
>
> ...I'd be interested in learning the reason for the speed difference.
> Does it have something to do with temporaries not being optimized away,
> or somesuch?  A limitation of the x87 instruction set?  Is it inherent
> in the way the C++ Standard requires complex<double>'s to be calculated?
> My bad coding style?
>
> Curious,
>
> Greg Buchholz
> */
>
> // Takes an integer argument "n" on the command line and generates a
> // PBM bitmap of the Mandelbrot set on stdout.
> // see also: ( http://sleepingsquirrel.org/cpp/mandelbrot.cpp.html )
>
> #include<iostream>
> #include<complex>
> #include<cstdlib>   // for atoi()
>
> int main (int argc, char **argv)
> {
>   char  bit_num = 0, byte_acc = 0;
>   const int iter = 50;
>   const double limit_sqr = 2.0 * 2.0;
>
>   std::ios_base::sync_with_stdio(false);
>   int n = atoi(argv[1]);
>
>   std::cout << "P4\n" << n << " " << n << std::endl;
>
>   for(int y=0; y<n; ++y)
>     for(int x=0; x<n; ++x)
>     {
>        std::complex<double> Z(0.0,0.0);
>        std::complex<double> C(2*(double)x/n - 1.5, 2*(double)y/n - 1.0);
>
>        for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i)  Z = Z*Z + C;
>
>        byte_acc = (byte_acc << 1) | ((norm(Z) > limit_sqr) ? 0x00:0x01);
>
>        if(++bit_num == 8){ std::cout << byte_acc; bit_num = byte_acc = 0; }
>        else if(x == n-1) { byte_acc  <<= (8-n%8);
>                            std::cout << byte_acc;
>                            bit_num = byte_acc = 0; }
>     }
> }
>


* Re: slow complex<double>'s with g++
  2006-03-03 17:37 ` Brian Budge
@ 2006-03-03 17:46   ` Greg Buchholz
  2006-03-03 18:28     ` Brian Budge
  2006-03-03 19:22   ` slow complex<double>'s with g++ Greg Buchholz
  1 sibling, 1 reply; 8+ messages in thread
From: Greg Buchholz @ 2006-03-03 17:46 UTC
  To: gcc-help

Brian Budge wrote:
> Which command line arguments are you using?  

    Mostly "-O3", but I've tried lots of combinations of different
options and I haven't yet found the magic incantation.

>Can you make use of sse2?

    Well, there is a C version that I wrote which uses gcc's vector
extensions and SSE2...

http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=gcc&id=3

...but right now I'm more interested in why normal complex<double>'s are
so slow.
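
    (For reference, the comparison is roughly between builds like

    g++ -O3 mandelbrot.cpp -o mandelbrot
    g++ -O3 -ffast-math mandelbrot.cpp -o mandelbrot

with the file name standing in for wherever the posted program is saved.)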

Greg Buchholz


* Re: slow complex<double>'s with g++
  2006-03-03 17:46   ` Greg Buchholz
@ 2006-03-03 18:28     ` Brian Budge
  2006-03-03 18:59       ` Greg Buchholz
  0 siblings, 1 reply; 8+ messages in thread
From: Brian Budge @ 2006-03-03 18:28 UTC
  To: gcc-help

It may be that I'm missing something, but you appear to be computing
different things...

Zi = 2.0*Zr*Zi + Ci;
Zr = Tr - Ti + Cr;

vs

 Z = Z*Z + C;

No doubt the 2.0 makes a difference...
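
For reference, the C entry's inner loop is essentially this hand-rolled
update (a rough sketch, not the exact shootout source; Cr/Ci stand for
C.real()/C.imag(), and iter/limit_sqr are the same constants as in your
program):

    double Zr = 0.0, Zi = 0.0, Tr = 0.0, Ti = 0.0;
    for (int i = 0; i < iter && (Tr + Ti) <= limit_sqr; ++i)
    {
        Zi = 2.0*Zr*Zi + Ci;   // new imaginary part: 2ab + Ci
        Zr = Tr - Ti + Cr;     // new real part: a*a - b*b + Cr
        Tr = Zr*Zr;            // the squares also feed the escape test
        Ti = Zi*Zi;
    }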

There are several minor optimizations also that might speed up your
code... if you're interested, email me offline.

  Brian

On 3/3/06, Greg Buchholz <greg@sleepingsquirrel.org> wrote:
> Brian Budge wrote:
> > Which command line arguments are you using?
>
>     Mostly "-O3", but I've tried lots of combinations of different
> options and I haven't yet found the magic incantation.
>
> >Can you make use of sse2?
>
>     Well, there is a C version that I wrote which uses gcc's vector
> extensions and SSE2...
>
> http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=gcc&id=3
>
> ...but right now I'm more interested in why normal complex<double>'s are
> so slow.
>
> Greg Buchholz
>
>


* Re: slow complex<double>'s with g++
  2006-03-03 18:28     ` Brian Budge
@ 2006-03-03 18:59       ` Greg Buchholz
  2006-03-03 19:25         ` Brian Budge
  0 siblings, 1 reply; 8+ messages in thread
From: Greg Buchholz @ 2006-03-03 18:59 UTC
  To: gcc-help

Brian Budge wrote:
> It may be that I'm missing something, but you appear to be computing
> different things...
> 
> Zi = 2.0*Zr*Zi + Ci;
> Zr = Tr - Ti + Cr;
> 
> vs
> 
>  Z = Z*Z + C;
> 
> No doubt the 2.0 makes a difference...

    Hmm.  I don't get what you are saying.  Squaring a complex number
like a+ib you get...

(a+ib) * (a+ib)
a*a + iab + iab - b*b
(a*a - b*b) + i2ab

...Are you saying a multiplication instead of an addition is the cause?
Yes, the C version is a little more optimized, but I was surprised at
the 8.5x time difference with gcc-4.1pre021006.  I guess I was
expecting something small, say a 20-50% slowdown.  And you get all of
the speed back if you just use "-ffast-math".  Maybe I'm just surprised
that the optimizations that "-ffast-math" turns on do such a good job.

Greg Buchholz


* Re: slow complex<double>'s with g++
  2006-03-03 17:37 ` Brian Budge
  2006-03-03 17:46   ` Greg Buchholz
@ 2006-03-03 19:22   ` Greg Buchholz
  1 sibling, 0 replies; 8+ messages in thread
From: Greg Buchholz @ 2006-03-03 19:22 UTC
  To: gcc-help

Brian Budge wrote:
> Which command line arguments are you using?  Can you make use of sse2?

BTW, if you're an SSE2 expert, do you have any insight on this problem...

    http://gcc.gnu.org/ml/gcc-help/2006-02/msg00098.html

Thanks,

Greg Buchholz


* Re: slow complex<double>'s with g++
  2006-03-03 18:59       ` Greg Buchholz
@ 2006-03-03 19:25         ` Brian Budge
  2006-03-06 18:03           ` slow complex<double>'s with g++ [Solved] Greg Buchholz
  0 siblings, 1 reply; 8+ messages in thread
From: Brian Budge @ 2006-03-03 19:25 UTC
  To: gcc-help

Hmmmm, yeah... I guess it shouldn't make an 8.5x difference.

You're right, it should be a smaller difference.  The squaring is 2
muls and an add vs 4 muls and 2 adds, so it should be less than twice
as fast when you include the other portion.
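
Roughly what I'm counting (variable names made up, just for the
bookkeeping):

    // squaring a+ib via the generic complex product (a+ib)*(c+id) with c=a, d=b:
    re = a*a - b*b;        // real part
    im = a*b + b*a;        // imaginary part

    // hand-rolled squaring, where a*a and b*b also feed the escape test:
    re = Tr - Ti;          // Tr = a*a, Ti = b*b
    im = 2.0*a*b;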

The instruction ordering might be more optimal also.

8.5 times though... weird.

  Brian



On 3/3/06, Greg Buchholz <greg@sleepingsquirrel.org> wrote:
> Brian Budge wrote:
> > It may be that I'm missing something, but you appear to be computing
> > different things...
> >
> > Zi = 2.0*Zr*Zi + Ci;
> > Zr = Tr - Ti + Cr;
> >
> > vs
> >
> >  Z = Z*Z + C;
> >
> > No doubt the 2.0 makes a difference...
>
>     Hmm.  I don't get what you are saying.  Squaring a complex number
> like a+ib you get...
>
> (a+ib) * (a+ib)
> a*a + iab + iab - b*b
> (a*a - b*b) + i2ab
>
> ...Are you saying a multiplication instead of an addition is the cause?
> Yes, the C version is a little more optimized, but I was surprised at
> the 8.5x time difference with gcc-4.1pre021006.  I guess I was
> expecting something small, say a 20-50% slowdown.  And you get all of
> the speed back if you just use "-ffast-math".  Maybe I'm just surprised
> that the optimizations that "-ffast-math" turns on do such a good job.
>
> Greg Buchholz
>
>


* Re: slow complex<double>'s with g++ [Solved]
  2006-03-03 19:25         ` Brian Budge
@ 2006-03-06 18:03           ` Greg Buchholz
  0 siblings, 0 replies; 8+ messages in thread
From: Greg Buchholz @ 2006-03-06 18:03 UTC
  To: gcc-help

Brian Budge wrote:
> Hmmmm, yeah... I guess it shouldn't make an 8.5x difference.
> 
> You're right, it should be a smaller difference.  The squaring is 2
> muls and an add vs 4 muls and 2 adds, so it should be less than twice
> as fast when you include the other portion.
> 
> The instruction ordering might be more optimal also.
> 
> 8.5 times though... weird.
>
    Looks like the problem can be solved by manually inlining the
definition of "norm"...

     //manually inlining "norm" results in a 5x-7x speedup on g++
     for(int i=0; i<iter and
           (Z.real()*Z.real() + Z.imag()*Z.imag()) <= limit_sqr; ++i)
       Z = Z*Z + C;

...For some reason g++ must not have been able to inline it (or does so
after common subexpression elimination or somesuch).
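
    A plausible explanation (I haven't dug through the library headers
to confirm it) is that without -ffast-math, norm() is computed by way
of abs(), i.e. a square root that then gets squared again, presumably
to avoid intermediate overflow; the hand-inlined expression sidesteps
all of that.  A reusable form of the workaround would be a small helper
along these lines (the name is made up, it isn't a standard function):

     // |z|^2 without going through abs()/sqrt -- a sketch, not library code
     inline double norm_sqr(const std::complex<double>& z)
     {
         return z.real()*z.real() + z.imag()*z.imag();
     }

     // the inner loop then stays readable:
     for (int i = 0; i < iter and norm_sqr(Z) <= limit_sqr; ++i)
         Z = Z*Z + C;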

Greg Buchholz
 


Thread overview: 8 messages
2006-03-03 17:21 slow complex<double>'s with g++ Greg Buchholz
2006-03-03 17:37 ` Brian Budge
2006-03-03 17:46   ` Greg Buchholz
2006-03-03 18:28     ` Brian Budge
2006-03-03 18:59       ` Greg Buchholz
2006-03-03 19:25         ` Brian Budge
2006-03-06 18:03           ` slow complex<double>'s with g++ [Solved] Greg Buchholz
2006-03-03 19:22   ` slow complex<double>'s with g++ Greg Buchholz
