From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-288936-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 20407 invoked by alias); 15 Jul 2009 20:27:57 -0000
Received: (qmail 20355 invoked by uid 48); 15 Jul 2009 20:27:44 -0000
Date: Wed, 15 Jul 2009 20:27:00 -0000
Message-ID: <20090715202744.20354.qmail@sourceware.org>
X-Bugzilla-Reason: CC
References: <bug-40766-15353@http.gcc.gnu.org/bugzilla/>
Subject: [Bug fortran/40766] this fortran program is too slow
In-Reply-To: <bug-40766-15353@http.gcc.gnu.org/bugzilla/>
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "burnus at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org>
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2009-07/txt/msg01317.txt.bz2


------- Comment #6 from burnus at gcc dot gnu dot org  2009-07-15 20:27 -------
You should also add -march=native to the command line; it probably does not
help much, but it should help a bit. I also recall that the standard GLIBC
lacks optimized versions of some math routines on x86-64, while AMD provides
patches for those (applied by default on SUSE Linux). However, I am not sure
whether this is still an issue.

With openSUSE Factory (x86_64, glibc 2.10.1, GCC 4.5.0) I get on an AMD Athlon
64 x2 4800+ the following timings, which do not look too bad:

$ ifort -O3 -xHost aa.f90; time ./a.out
real  1m59.997s    user  1m59.651s   sys   0m0.252s

$ gfortran -O3 -ffast-math -march=native aa.f90; time ./a.out
real  2m29.711s    user  2m28.841s   sys   0m0.236s

$ gfortran -O3 -ffast-math  -mveclibabi=acml -march=native aa.f90 \
  -L /opt/acml4.2.0/gfortran64_mp/lib/ -lacml_mv; \
  time ./a.out   # (Note: the current ACML release is 4.3)
real  2m29.693s    user  2m29.373s   sys   0m0.192s

$ gfortran -O3 -ffast-math  -mveclibabi=svml -march=native aa.f90 \
  -L /opt/intel/Compiler/11.1/038/lib/intel64 -lsvml -limf -lintlc; \
  time ./a.out
real  3m56.189s    user  3m55.839s   sys   0m0.200s

Thus with GLIBC (with the AMD patches) or with ACML, gfortran is only about
25% slower than ifort, which is still acceptable. Why the Intel routines are
so slow on my AMD, I do not know.
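
For reference, the percentages can be recomputed from the "real" times quoted
above. This is only a small sketch: slowdown is a hypothetical helper name, and
the times are the wall-clock values above converted to seconds by hand.

```shell
# Relative slowdown (in percent) of a run against a baseline, both in seconds.
# "slowdown" is a hypothetical helper, not part of any tool mentioned here.
slowdown() {
    LC_ALL=C awk -v base="$1" -v run="$2" \
        'BEGIN { printf "%.1f\n", (run / base - 1) * 100 }'
}

# Baseline is the ifort run (1m59.997s = 119.997 s).
slowdown 119.997 149.711   # gfortran with GLIBC      (2m29.711s)
slowdown 119.997 149.693   # gfortran with ACML       (2m29.693s)
slowdown 119.997 236.189   # gfortran with Intel SVML (3m56.189s)
```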

With -mveclibabi=svml, sincosf and tanf are linked; with -mveclibabi=acml and
with no -mvec* option, it is sincosf and tanf@@GLIBC_2.2.5. ifort, by
contrast, calls:
vmlsSinCos4 vmlsTan4

Thus the question is really: Why is neither vmlsSinCos4 nor vmlsTan4 called,
nor, for ACML, vrs4_sincosf/vrsa_sincosf (vrs*_tan* does not exist)?
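
One way to check which math routines a binary references is to list its
undefined symbols with nm. The following is a rough sketch, not necessarily
how the lists above were obtained: math_syms is a hypothetical helper, its
pattern is only a guess at the interesting names, and ./a.out is assumed to be
the binary from the runs above.

```shell
# Filter `nm -u` style output on stdin down to the math routines of interest.
# "math_syms" is a hypothetical helper; the pattern is deliberately loose.
math_syms() {
    grep -E -i 'sincos|tan|vmls|vrs' | awk '{ print $NF }' | sort -u
}

# List the undefined (externally resolved) symbols of the binary, if present.
if [ -x ./a.out ]; then
    nm -u ./a.out | math_syms
fi
```

If only the scalar sincosf and tanf show up, the loop was not vectorized with
calls into the vector library.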


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40766