public inbox for gcc-help@gcc.gnu.org
From: "Dario Bahena Tapia" <dario.mx@gmail.com>
To: "Zuxy Meng" <zuxy.meng@gmail.com>
Cc: gcc-help@gcc.gnu.org
Subject: Re: Why worse performace in euclidean distance with SSE2?
Date: Tue, 08 Apr 2008 15:57:00 -0000
Message-ID: <3d104d6f0804080755w2a90760cgc9159fdedc894fae@mail.gmail.com>
In-Reply-To: <ftf5j3$g1j$1@ger.gmane.org>

Hello,

Yeah, others have also suggested changing the way I process them in
order to allow for that. Working on it ;-|

I will consider the other suggestions as well!!!

Thanks.


On Tue, Apr 8, 2008 at 2:07 AM, Zuxy Meng <zuxy.meng@gmail.com> wrote:
> Hi,
>
>  "Dario Bahena Tapia" <dario.mx@gmail.com>
> wrote in message news:3d104d6f0804070617u47213cc8nbc697dab9dc262b5@mail.gmail.com...
>
>
>
> > Hello,
> >
> > I have just begun to play with SSE2 and gcc intrinsics. Indeed, maybe
> > this question is not exactly about gcc... but I think the gcc lists are
> > a very good place to find help from hardcore assembler hackers ;-)
> >
> > I have a program which makes heavy usage of euclidean distance
> > function. The serial version is:
> >
> > inline static double dist(int i,int j)
> > {
> >  double xd = C[i][X] - C[j][X];
> >  double yd = C[i][Y] - C[j][Y];
> >  return rint(sqrt(xd*xd + yd*yd));
> > }
> >
> > As you can see each C[i] is an array of two double which represents a
> > 2D vector (indexes 0 and 1 are coordinates X,Y respectively). I tried
> > to vectorize the function using SSE2 and gcc intrinsics, here is the
> > result:
> >
> > inline static double dist_sse(int i,int j)
> > {
> >  double d;
> >  __m128d xmm0,xmm1;
> >  xmm0 = _mm_load_pd(C[i]);
> >  xmm1 = _mm_load_pd(C[j]);
> >  xmm0 = _mm_sub_pd(xmm0,xmm1);
> >  xmm1 = xmm0;
> >  xmm0 = _mm_mul_pd(xmm0,xmm1);
> >  xmm1 = _mm_shuffle_pd(xmm0, xmm0, _MM_SHUFFLE2(1, 1));
> >  xmm0 = _mm_add_pd(xmm0,xmm1);
> >  xmm0 = _mm_sqrt_pd(xmm0);
> >  _mm_store_sd(&d,xmm0);
> >  return rint(d);
> > }
> >
> > Of course each C[i] was aligned as SSE2 expects:
> >
> > for(i=0; i<D; i++)
> > C[i] = (double *) _mm_malloc(2 * sizeof(double), 16);
> >
> > And in order to activate the SSE2 features, I am using the following
> > flags for gcc (my computer is a laptop):
> >
> > CFLAGS = -O -Wall -march=pentium-m -msse2
> >
> > The vectorized version of the function seems to be correct, given that it
> > provides the same results as its serial counterpart. However, the performance
> > is poor: the program's execution time increases by approximately 50% (for
> > example, when calculating the distance between every pair of points in a
> > set of 10,000, the serial version takes around 8 seconds while the
> > vectorized version takes around 12).
> >
> > I was expecting a better time given that:
> >
> > 1. The X and Y differences are computed in parallel
> > 2. Each coordinate difference is squared in parallel as well
> > 3. The sqrt function used is implemented in hardware (although the serial
> > sqrt implementation could also take advantage of the hardware)
> >
> > I suppose the problem here is my lack of experience programming in
> > assembler in general, and in particular with SSE2. Therefore, I am
> > looking for advice.
> >
>
>  1. First of all, you didn't extract the parallelism in your algorithm. SSE2
> won't help you if all you want is to pick up two points at random indices
> and calculate the distance. However it will help you a lot when you
> calculate the distances between a given point and 1 million others whose
> indices are sequential.
>
>  2. Unroll the loop to hide the latency of square root as much as possible.
>
>  3. Since the final result is an integer, you may consider using "float"
> instead of "double". That'll give you a performance boost even without SSE2.
> And rsqrtps comes in handy too, if its precision is acceptable.
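A sketch of point 3 under the same assumed structure-of-arrays layout, now in float, so that one SSE instruction processes four points. dist_ps_exact uses the correctly rounded sqrtps; dist_ps_rsqrt uses rsqrtps (roughly 12 bits of precision) followed by one Newton-Raphson step, which is a common refinement rather than something suggested in the thread. All names are hypothetical, and for brevity n is assumed to be a multiple of 4 with all arrays 16-byte aligned. Whether the approximate version is precise enough depends on the data: a distance lying very close to x.5 may still round to a different integer.

```c
#include <xmmintrin.h> /* SSE single-precision: _mm_sqrt_ps, _mm_rsqrt_ps */
#include <stddef.h>

/* Squared distances from (px, py) to points k..k+3 (hypothetical helper). */
static inline __m128 dist2_ps(const float *xs, const float *ys, size_t k,
                              __m128 px, __m128 py)
{
    __m128 dx = _mm_sub_ps(_mm_load_ps(xs + k), px);
    __m128 dy = _mm_sub_ps(_mm_load_ps(ys + k), py);
    return _mm_add_ps(_mm_mul_ps(dx, dx), _mm_mul_ps(dy, dy));
}

/* Four distances per sqrtps, correctly rounded in single precision. */
static void dist_ps_exact(float px, float py, const float *xs,
                          const float *ys, float *out, size_t n)
{
    const __m128 vpx = _mm_set1_ps(px), vpy = _mm_set1_ps(py);
    for (size_t k = 0; k < n; k += 4)
        _mm_store_ps(out + k, _mm_sqrt_ps(dist2_ps(xs, ys, k, vpx, vpy)));
}

/* Approximate: sqrt(d2) = d2 * r with r ~ 1/sqrt(d2) from rsqrtps, refined
   once by Newton-Raphson: r = r * (1.5 - 0.5 * d2 * r * r).
   For d2 == 0, rsqrtps returns +inf and the product becomes NaN, so those
   lanes are masked back to 0. */
static void dist_ps_rsqrt(float px, float py, const float *xs,
                          const float *ys, float *out, size_t n)
{
    const __m128 vpx = _mm_set1_ps(px), vpy = _mm_set1_ps(py);
    const __m128 half = _mm_set1_ps(0.5f), three_halves = _mm_set1_ps(1.5f);
    const __m128 zero = _mm_setzero_ps();
    for (size_t k = 0; k < n; k += 4) {
        __m128 d2 = dist2_ps(xs, ys, k, vpx, vpy);
        __m128 r = _mm_rsqrt_ps(d2);
        __m128 hd2rr = _mm_mul_ps(_mm_mul_ps(half, d2), _mm_mul_ps(r, r));
        r = _mm_mul_ps(r, _mm_sub_ps(three_halves, hd2rr));
        __m128 d = _mm_mul_ps(d2, r);
        __m128 nonzero = _mm_cmpneq_ps(d2, zero); /* all-ones where d2 != 0 */
        _mm_store_ps(out + k, _mm_and_ps(d, nonzero));
    }
}
```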
>
>  --
>  Zuxy
>
>


Thread overview: 10+ messages
2008-04-07 14:09 Dario Bahena Tapia
2008-04-07 15:23 ` Dario Saccavino
2008-04-07 16:09   ` Dario Bahena Tapia
2008-04-07 16:41     ` Dario Bahena Tapia
2008-04-07 16:05 ` jlh
2008-04-07 17:02   ` Dario Bahena Tapia
2008-04-07 23:42     ` Brian Budge
2008-04-08  2:15       ` Dario Bahena Tapia
2008-04-08  8:34 ` Zuxy Meng
2008-04-08 15:57   ` Dario Bahena Tapia [this message]
