From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 23236 invoked by alias); 12 Mar 2007 08:16:49 -0000
Received: (qmail 23203 invoked by uid 48); 12 Mar 2007 08:16:39 -0000
Date: Mon, 12 Mar 2007 08:16:00 -0000
Message-ID: <20070312081639.23202.qmail@sourceware.org>
X-Bugzilla-Reason: CC
Subject: [Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "burnus at gcc dot gnu dot org"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2007-03/txt/msg01039.txt.bz2

------- Comment #6 from burnus at gcc dot gnu dot org  2007-03-12 08:16 -------
> Can someone try instead of doing "__real__ a += w[j] *__real__ mfi[*index];"
> Use "a+= xxx* yyy" and also use -std=c99 to get the correct multiplication?

Well, -std=c99 was already used, and the "real(!) * complex" calculation was
already correct. "c_cmplx" below now uses:
    a += w[j  ] * mfi[*index++];

Compiled with:
  gcc -std=c99 -O3 -funroll-loops -ftree-vectorize -march=opteron -msse3 -ffast-math -m64
  gfortran -O3 -funroll-loops -ftree-vectorize -march=opteron -msse3 -ffast-math -m64

Timings (two runs):
  Fortran:   0.4360271     Fortran:   0.4280267
  c_nosse:   0.2440166     c_nosse:   0.2320151
  c_sse:     0.2320137     c_sse:     0.2400150
  c_struct:  0.2320151     c_struct:  0.2320147
  c_cmplx:   0.2360163     c_cmplx:   0.2320147

And using a non-manually unrolled version: 0.3760242, 0.3760242

  for(i = 0; i < np ; i++) {
    for(j = 1; j < n; j++)
      a += w[j  ] * mfi[*index++];
    fo[i] = a;
  }

Thus the manual unrolling seems to account for most of the speed-up.

With -funroll-all-loops, the timings of the Fortran and the non-unrolled C
version remain the same.


-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139
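
For illustration only, here is a minimal sketch of what "manually unrolled" can mean for this kind of real * complex gather loop. It is not the code attached to the bug: the function names, the unroll factor of 4 and the use of separate partial sums are all assumptions. Note that splitting the sum into partial accumulators changes the order of the floating-point additions, which a compiler will normally only do by itself under something like -ffast-math.

```c
#include <complex.h>
#include <stddef.h>

/* Plain version: real weights times complex values picked through an
   index array, accumulated into one sum. Names are hypothetical. */
double complex dot_gather(const double *w, const double complex *mfi,
                          const int *index, size_t n)
{
    double complex a = 0.0;
    for (size_t j = 0; j < n; j++)
        a += w[j] * mfi[index[j]];
    return a;
}

/* Same sum, unrolled by hand (factor 4 is an assumption). The four
   accumulators are independent of each other, so the additions do not
   form one long dependency chain. */
double complex dot_gather_unrolled4(const double *w, const double complex *mfi,
                                    const int *index, size_t n)
{
    double complex a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    size_t j = 0;
    for (; j + 4 <= n; j += 4) {
        a0 += w[j]     * mfi[index[j]];
        a1 += w[j + 1] * mfi[index[j + 1]];
        a2 += w[j + 2] * mfi[index[j + 2]];
        a3 += w[j + 3] * mfi[index[j + 3]];
    }
    /* Remainder loop for n not divisible by 4. */
    for (; j < n; j++)
        a0 += w[j] * mfi[index[j]];
    return (a0 + a1) + (a2 + a3);
}
```

Both functions compute the same sum up to rounding; timing them with and without -funroll-loops would be one way to see how much of the difference the compiler recovers on its own.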