From: "Dario Bahena Tapia"
To: "Brian Budge"
Cc: gcc-help@gcc.gnu.org
Date: Tue, 08 Apr 2008 02:15:00 -0000
Subject: Re: Why worse performance in euclidean distance with SSE2?
Message-ID: <3d104d6f0804071602u2cf25d61w3cdf13f3e7ac1f51@mail.gmail.com>
In-Reply-To: <5b7094580804071551m67759fb0r84b018de3c4a4267@mail.gmail.com>

Hello,

I think I concur; indeed, the original program had a structure-of-arrays
layout (each coordinate in a separate array).  I will try to use SSE2 on
that flavor, although I think sqrt will still be the bottleneck ... maybe
I could also use another norm (such as maximum or taxicab).

Thanks.

On Mon, Apr 7, 2008 at 5:51 PM, Brian Budge wrote:
> In my experience, SSE is generally more useful when you can organize
> your structures as SOA (struct of arrays) rather than AOS (array of
> structs).  If you expect a speedup from operating on individual pairs
> of doubles, I doubt you'll see much improvement except in extreme
> situations, or when the compiler detects a pattern in your code.
> Also, shuffles and the like are killers.
>
> It would be much better if you had 10000 of these things to compute
> distances for at once, so you could lay out the data in an
> SSE-friendly way (SOA).
>
> Brian
>
> On Mon, Apr 7, 2008 at 9:08 AM, Dario Bahena Tapia wrote:
> > Hello,
> >
> > I tried your options, but they seem to make no difference.  In
> > another email it was suggested to use _mm_sqrt_sd, because I only
> > need one sqrt calculation.  That improved the time and indeed almost
> > reaches the serial version (now it runs up to 1 second slower on the
> > 10,000-point example, hehe).
> >
> > But of course I would want/expect the vector version to run faster
> > ... I am still unsure how to achieve that.
> >
> > Thanks
> >
> > On Mon, Apr 7, 2008 at 10:23 AM, jlh wrote:
> > > Dario Bahena Tapia wrote:
> > > >
> > > > inline static double dist(int i, int j)
> > > > {
> > > >     double xd = C[i][X] - C[j][X];
> > > >     double yd = C[i][Y] - C[j][Y];
> > > >     return rint(sqrt(xd*xd + yd*yd));
> > > > }
> > > > [...]
> > > >
> > > > And in order to activate the SSE2 features, I am using the
> > > > following flags for gcc (my computer is a laptop):
> > > >
> > > > CFLAGS = -O -Wall -march=pentium-m -msse2
> > >
> > > These options do not make dist() use any SSE for me.
> > > Have you tried compiling with this?
> > >
> > > CFLAGS = -O2 -Wall -march=pentium-m -mfpmath=sse
> > >
> > > I think -msse2 is redundant if you say -march=pentium-m.  I don't
> > > have an SSE2 machine to try this on, though.
> > >
> > > jlh
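
For concreteness, here is a minimal sketch of the SOA approach Brian
describes (not code from the thread): the coordinates live in two
hypothetical 16-byte-aligned arrays xs[] and ys[], n is assumed even,
and the intrinsics come from <emmintrin.h>.  _mm_sqrt_pd amortizes the
sqrt cost by producing two square roots per instruction.

    #include <emmintrin.h>   /* SSE2 intrinsics */

    /* Distances from a query point (px, py) to n points stored
       SOA-style in xs[] and ys[] (hypothetical arrays, 16-byte
       aligned, n even).  Two distances per loop iteration. */
    static void dist_all(const double *xs, const double *ys,
                         double px, double py, double *out, int n)
    {
        int i;
        __m128d vpx = _mm_set1_pd(px);
        __m128d vpy = _mm_set1_pd(py);
        for (i = 0; i < n; i += 2) {
            __m128d xd = _mm_sub_pd(_mm_load_pd(xs + i), vpx);
            __m128d yd = _mm_sub_pd(_mm_load_pd(ys + i), vpy);
            __m128d d2 = _mm_add_pd(_mm_mul_pd(xd, xd),
                                    _mm_mul_pd(yd, yd));
            _mm_store_pd(out + i, _mm_sqrt_pd(d2));  /* two sqrts */
        }
    }

Built with, e.g., gcc -O2 -msse2, each iteration computes two distances
with no shuffles, which is exactly the layout win described above.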
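Similarly, a sketch of how the single-pair dist() quoted above might
use _mm_sqrt_sd, the intrinsic Dario says was suggested to him.  It
assumes the same C[][] coordinate array from the original post (with
X = 0 and Y = 1) and uses unaligned loads to stay safe:

    #include <emmintrin.h>
    #include <math.h>        /* rint */

    extern double C[][2];    /* coordinate array from the original post */

    inline static double dist(int i, int j)
    {
        __m128d d, s;
        d = _mm_sub_pd(_mm_loadu_pd(C[i]), _mm_loadu_pd(C[j]));
        d = _mm_mul_pd(d, d);                      /* {xd*xd, yd*yd} */
        s = _mm_add_sd(d, _mm_unpackhi_pd(d, d));  /* sum in low lane */
        s = _mm_sqrt_sd(s, s);                     /* one scalar sqrt */
        return rint(_mm_cvtsd_f64(s));
    }

Note the horizontal add (_mm_unpackhi_pd plus _mm_add_sd) is exactly
the kind of shuffle overhead Brian warns about, which is why a single
pair rarely beats the scalar code.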
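Finally, if the application can tolerate a different metric, the
maximum norm Dario mentions avoids sqrt entirely.  A sketch, under the
same C[][] assumption (for the taxicab norm, replace _mm_max_sd with
_mm_add_sd):

    #include <emmintrin.h>

    extern double C[][2];

    /* Maximum (Chebyshev) norm: max(|xd|, |yd|), no sqrt needed.
       Masking with _mm_andnot_pd clears the sign bits. */
    inline static double dist_max(int i, int j)
    {
        __m128d d;
        d = _mm_sub_pd(_mm_loadu_pd(C[i]), _mm_loadu_pd(C[j]));
        d = _mm_andnot_pd(_mm_set1_pd(-0.0), d);   /* {|xd|, |yd|} */
        d = _mm_max_sd(d, _mm_unpackhi_pd(d, d));  /* max in low lane */
        return _mm_cvtsd_f64(d);
    }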