From: "Dario Bahena Tapia"
To: "Brian Budge"
Cc: gcc-help@gcc.gnu.org
Date: Tue, 08 Apr 2008 02:15:00 -0000
Subject: Re: Why worse performance in euclidean distance with SSE2?
Message-ID: <3d104d6f0804071602u2cf25d61w3cdf13f3e7ac1f51@mail.gmail.com>
In-Reply-To: <5b7094580804071551m67759fb0r84b018de3c4a4267@mail.gmail.com>

Hello,

I think I concur; indeed, the original program had a structure-of-arrays
layout (each coordinate in a separate array).  I will try to use SSE2 on
that flavor, although I think sqrt will still be the bottleneck ... maybe
I could also use another norm (such as maximum or taxicab).

Thanks.

On Mon, Apr 7, 2008 at 5:51 PM, Brian Budge wrote:
> In my experience, SSE is generally more useful when you can organize
> your structures as SOA (struct of arrays) rather than AOS (array of
> structs).  If you expect a speedup from operating on individual pairs
> of doubles, I doubt you'll see much improvement except in extreme
> situations, or when the compiler detects a pattern in your code.
> Also, shuffles and the like are killers.
>
> It would be much better if you had 10000 of these things to compute
> distances for at once, so you could lay out the data in an
> SSE-friendly way (SOA).
>
> Brian
>
> On Mon, Apr 7, 2008 at 9:08 AM, Dario Bahena Tapia wrote:
> > Hello,
> >
> > I tried your options, but they seem to make no difference.  In
> > another email it was suggested to use _mm_sqrt_sd, because I only
> > need one sqrt calculation.  That improved the time and indeed almost
> > reaches the serial version (now it runs up to 1 second slower on the
> > 10,000-point example, hehe).
> >
> > But of course I would want/expect the vector version to run faster
> > ... I am still unsure how to achieve that.
> >
> > Thanks
> >
> > On Mon, Apr 7, 2008 at 10:23 AM, jlh wrote:
> > > Dario Bahena Tapia wrote:
> > > >
> > > > inline static double dist(int i, int j)
> > > > {
> > > >     double xd = C[i][X] - C[j][X];
> > > >     double yd = C[i][Y] - C[j][Y];
> > > >     return rint(sqrt(xd*xd + yd*yd));
> > > > }
> > > > [...]
> > > >
> > > > And in order to activate the SSE2 features, I am using the
> > > > following flags for gcc (my computer is a laptop):
> > > >
> > > > CFLAGS = -O -Wall -march=pentium-m -msse2
> > >
> > > These options do not make dist() use any SSE for me.
> > > Have you tried compiling with this?
> > >
> > > CFLAGS = -O2 -Wall -march=pentium-m -mfpmath=sse
> > >
> > > I think -msse2 is redundant if you say -march=pentium-m.  I don't
> > > have an SSE2 machine to try this on, though.
> > >
> > > jlh
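
For concreteness, here is a minimal sketch of the SOA approach Brian
describes (not code from the thread): the coordinates live in two
hypothetical 16-byte-aligned arrays xs[] and ys[], n is assumed even,
and the intrinsics come from <emmintrin.h>.  _mm_sqrt_pd amortizes the
sqrt cost by producing two square roots per instruction.

    #include <emmintrin.h>   /* SSE2 intrinsics */

    /* Distances from a query point (px, py) to n points stored
       SOA-style in xs[] and ys[] (hypothetical arrays, 16-byte
       aligned, n even).  Two distances per loop iteration. */
    static void dist_all(const double *xs, const double *ys,
                         double px, double py, double *out, int n)
    {
        int i;
        __m128d vpx = _mm_set1_pd(px);
        __m128d vpy = _mm_set1_pd(py);
        for (i = 0; i < n; i += 2) {
            __m128d xd = _mm_sub_pd(_mm_load_pd(xs + i), vpx);
            __m128d yd = _mm_sub_pd(_mm_load_pd(ys + i), vpy);
            __m128d d2 = _mm_add_pd(_mm_mul_pd(xd, xd),
                                    _mm_mul_pd(yd, yd));
            _mm_store_pd(out + i, _mm_sqrt_pd(d2));  /* two sqrts */
        }
    }

Built with, e.g., gcc -O2 -msse2, each iteration computes two distances
with no shuffles, which is exactly the layout win described above.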
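Similarly, a sketch of how the single-pair dist() quoted above might
use _mm_sqrt_sd, the intrinsic Dario says was suggested to him.  It
assumes the same C[][] coordinate array from the original post (with
X = 0 and Y = 1) and uses unaligned loads to stay safe:

    #include <emmintrin.h>
    #include <math.h>        /* rint */

    extern double C[][2];    /* coordinate array from the original post */

    inline static double dist(int i, int j)
    {
        __m128d d, s;
        d = _mm_sub_pd(_mm_loadu_pd(C[i]), _mm_loadu_pd(C[j]));
        d = _mm_mul_pd(d, d);                      /* {xd*xd, yd*yd} */
        s = _mm_add_sd(d, _mm_unpackhi_pd(d, d));  /* sum in low lane */
        s = _mm_sqrt_sd(s, s);                     /* one scalar sqrt */
        return rint(_mm_cvtsd_f64(s));
    }

Note the horizontal add (_mm_unpackhi_pd plus _mm_add_sd) is exactly
the kind of shuffle overhead Brian warns about, which is why a single
pair rarely beats the scalar code.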
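Finally, if the application can tolerate a different metric, the
maximum norm Dario mentions avoids sqrt entirely.  A sketch, under the
same C[][] assumption (for the taxicab norm, replace _mm_max_sd with
_mm_add_sd):

    #include <emmintrin.h>

    extern double C[][2];

    /* Maximum (Chebyshev) norm: max(|xd|, |yd|), no sqrt needed.
       Masking with _mm_andnot_pd clears the sign bits. */
    inline static double dist_max(int i, int j)
    {
        __m128d d;
        d = _mm_sub_pd(_mm_loadu_pd(C[i]), _mm_loadu_pd(C[j]));
        d = _mm_andnot_pd(_mm_set1_pd(-0.0), d);   /* {|xd|, |yd|} */
        d = _mm_max_sd(d, _mm_unpackhi_pd(d, d));  /* max in low lane */
        return _mm_cvtsd_f64(d);
    }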