public inbox for gcc-help@gcc.gnu.org
From: Marc Glisse <marc.glisse@inria.fr>
To: Boris Hollas <borish@gmx.de>
Cc: gcc-help@gcc.gnu.org
Subject: Re: SSE SIMD enhanced code 4x slower than regular code
Date: Wed, 18 Jan 2012 11:23:00 -0000
Message-ID: <alpine.DEB.2.02.1201180919420.2311@laptop-mg.saclay.inria.fr>
In-Reply-To: <33159404.post@talk.nabble.com>

On Tue, 17 Jan 2012, Boris Hollas wrote:

> I have a function iter1 that iterates a sequence of complex numbers. I
> redesigned this function, using SSE intrinsics such as _mm_mul_pd, to obtain
> iter0. Nonetheless, iter0 is 4x slower than iter1:

This is not surprising at all; it happens with most code using double when 
people try to convert it to SSE.

> $ gcc -O -march=core2 t.c && time ./a.out

Maybe use at least -O2?

> The size of a.out is 7.1K in both cases. I use gcc version 4.4.5 and the
> CPU is an Intel Core 2 Duo.

You may want to try newer versions of gcc (note the plural: results can 
vary a lot between 4.4, 4.5, 4.6 and the upcoming 4.7, and not always in 
the right direction).

> #include <pmmintrin.h>
> #include <stdio.h>
> #define sqr(x) ((x)*(x))
>
> typedef union {
>  __m128d m;
>  double v[2]; // v[0] low, v[1] up
> } v2df;
>
> int iter0(v2df z, v2df c, int n, int bound) {
>  v2df z2, z2r, z2r_addsub, z_;
>  z2.m = _mm_mul_pd(z.m, z.m);  // z_re^2, z_im^2
>  z2r.v[1] = z2.v[0];
>  z2r.v[0] = z2.v[1];

You may want to try _mm_shuffle_pd or __builtin_shuffle to swap the two 
halves, instead of copying them one at a time through the union.
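A minimal sketch of the swap with _mm_shuffle_pd, assuming only SSE2 (the quoted code already needs SSE3 for _mm_addsub_pd). The helper name swap_halves is hypothetical. With an immediate of 1, _mm_shuffle_pd(a, b, 1) puts the high lane of a in the low position and the low lane of b in the high position, so passing the same vector twice swaps its halves with a single shuffle rather than two scalar copies through memory:

```c
#include <emmintrin.h> /* SSE2: __m128d, _mm_shuffle_pd */

/* Hypothetical helper: return v with its low and high doubles swapped.
   _mm_shuffle_pd(a, b, 1) yields (low, high) = (a[1], b[0]); with a == b
   this is (v[1], v[0]). */
static inline __m128d swap_halves(__m128d v)
{
    return _mm_shuffle_pd(v, v, 1);
}
```

In iter0 this would replace the two assignments to z2r.v[1] and z2r.v[0] with z2r.m = swap_halves(z2.m);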

>  z2r_addsub.m = _mm_addsub_pd(z2r.m, z2.m); // z_re^2 + z_im^2, z_re^2 - z_im^2
>
>  if(z2r_addsub.v[1] > 4.0 || n == bound) return n;
>  else {
>    z_.v[1] = z2r_addsub.v[0];
>    z_.v[0] = 2.0 * z.v[1] * z.v[0];
>    z_.m = _mm_add_pd(z_.m, c.m); // z_re^2 - z_im^2 + c_re, 2 * z_re * z_im + c_im
>    return iter0(z_, c, n+1, bound);
>  }
> }

Did you look at the generated code (compile with -S and read the resulting 
t.s)? Going back and forth between packed and scalar values through a 
union often generates plenty of extra mov instructions. If you extract the 
lanes manually with _mm_cvtsd_f64 (low lane) and _mm_unpackhi_pd (to bring 
the high lane down first), you may be able to save a bit. Note that with 
the latest gcc, you can also use the [] notation directly on your __m128d.
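A sketch of iter0 rewritten without the union, assuming the lane convention of the quoted code (high lane = real part, low lane = imaginary part). The names lo_lane, hi_lane and iter0_nounion are hypothetical. To stay within SSE2, it computes z_re^2 - z_im^2 with scalar arithmetic instead of _mm_addsub_pd. Whether this actually removes the extra mov instructions depends on the gcc version, so check the -S output:

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: read the low lane of v as a scalar double. */
static inline double lo_lane(__m128d v)
{
    return _mm_cvtsd_f64(v);
}

/* Hypothetical helper: read the high lane by first copying it into the
   low position with _mm_unpackhi_pd, then reading the low lane. */
static inline double hi_lane(__m128d v)
{
    return _mm_cvtsd_f64(_mm_unpackhi_pd(v, v));
}

/* Same iteration as iter0, same lane convention:
   high lane = real part, low lane = imaginary part. */
int iter0_nounion(__m128d z, __m128d c, int n, int bound)
{
    __m128d z2 = _mm_mul_pd(z, z);   /* (low, high) = (z_im^2, z_re^2) */
    double re2 = hi_lane(z2);
    double im2 = lo_lane(z2);

    if (re2 + im2 > 4.0 || n == bound)
        return n;

    double re = hi_lane(z);
    double im = lo_lane(z);
    /* _mm_set_pd takes its arguments as (high, low). */
    __m128d next = _mm_set_pd(re2 - im2, 2.0 * re * im);
    return iter0_nounion(_mm_add_pd(next, c), c, n + 1, bound);
}
```

For example, iter0_nounion(_mm_setzero_pd(), _mm_set_pd(2.0, 0.0), 0, 1000) follows the point c = 2 and returns 2, the same as iter1(0.0, 0.0, 2.0, 0.0, 0, 1000).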

> int iter1(double z_re, double z_im, double c_re, double c_im, int n,
>           int bound) {
>  double zre2 = sqr(z_re);
>  double zim2 = sqr(z_im);
>
>  if(zre2 + zim2 > 4.0 || n == bound) return n;
>  else return iter1(zre2 - zim2 + c_re, 2.0 * z_re * z_im + c_im,
>                    c_re, c_im, n+1, bound);
> }
>
> #define sse
>
> int main() {
>  v2df z, c;
>  long n = 0;
>  z.v[1] = 0.0; z.v[0] = 0.0;
>
>  for(c.v[1] = -2.0; c.v[1] < 1.0; c.v[1] += 3.0/1000.0) {
>    for(c.v[0] = -1.0; c.v[0] < 1.0; c.v[0] += 2.0/1000.0) {
> #ifdef sse
>      n += iter0(z, c, 0, 1000);
> #else
>      n += iter1(0.0, 0.0, c.v[1], c.v[0], 0, 1000);
> #endif
>    }
>  }
>  printf("%ld\n", n);
>  return 0;
> }

I'd be surprised if you managed any gain on this code by using __m128d.

-- 
Marc Glisse


Thread overview: 3+ messages
2012-01-18 11:13 Boris Hollas
2012-01-18 11:23 ` Marc Glisse [this message]
2012-01-20 17:30   ` Boris Hollas
