From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 20407 invoked by alias); 15 Jul 2009 20:27:57 -0000
Received: (qmail 20355 invoked by uid 48); 15 Jul 2009 20:27:44 -0000
Date: Wed, 15 Jul 2009 20:27:00 -0000
Message-ID: <20090715202744.20354.qmail@sourceware.org>
X-Bugzilla-Reason: CC
References:
Subject: [Bug fortran/40766] this fortran program is too slow
In-Reply-To:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "burnus at gcc dot gnu dot org"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2009-07/txt/msg01317.txt.bz2

------- Comment #6 from burnus at gcc dot gnu dot org  2009-07-15 20:27 -------
You should also add -march=native to the command line; it probably does not
help much, but it should help a bit.

I also recall that the standard GLIBC lacks optimized versions of some math
functions on x86-64, while AMD provides patches for those (applied by default
on SUSE Linux). I am not sure whether this is still an issue, though.

With openSUSE Factory (x86_64, glibc 2.10.1, GCC 4.5.0) I get the following
timings on an AMD Athlon 64 x2 4800+, which do not look too bad:

$ ifort -O3 -xHost aa.f90; time ./a.out
real    1m59.997s
user    1m59.651s
sys     0m0.252s

$ gfortran -O3 -ffast-math -march=native aa.f90; time ./a.out
real    2m29.711s
user    2m28.841s
sys     0m0.236s

$ gfortran -O3 -ffast-math -mveclibabi=acml -march=native aa.f90 \
    -L /opt/acml4.2.0/gfortran64_mp/lib/ -lacml_mv  #(Note: current is ACML 4.3)
real    2m29.693s
user    2m29.373s
sys     0m0.192s

$ gfortran -O3 -ffast-math -mveclibabi=svml -march=native aa.f90 \
    -L /opt/intel/Compiler/11.1/038/lib/intel64 -lsvml -limf -lintlc; \
    time ./a.out
real    3m56.189s
user    3m55.839s
sys     0m0.200s

Thus with the GLIBC (with AMD patches) or with ACML, gfortran is only about
25% slower than ifort, which is still acceptable. Why the Intel routines are
so slow on my AMD, I do not know.
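As a cross-check of the "25%" figure, here is a minimal Python sketch that recomputes the slowdown relative to ifort from the "real" times quoted above. Only the four time strings come from this comment; the dictionary labels and the helper name to_seconds are made up for illustration.

```python
# Recompute the slowdowns from the wall-clock ("real") times quoted above.
# The labels and the helper name are illustrative, not part of the bug report.

def to_seconds(stamp):
    """Convert a `time`-style stamp such as '1m59.997s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)

real_times = {
    "ifort -xHost": "1m59.997s",
    "gfortran (glibc)": "2m29.711s",
    "gfortran -mveclibabi=acml": "2m29.693s",
    "gfortran -mveclibabi=svml": "3m56.189s",
}

baseline = to_seconds(real_times["ifort -xHost"])

# Slowdown as a fraction of the ifort run time (0.25 means 25% slower).
slowdown = {
    name: to_seconds(stamp) / baseline - 1.0
    for name, stamp in real_times.items()
}

for name, frac in slowdown.items():
    print(f"{name:<28} {frac * 100:6.1f}% slower than ifort")
```

On these numbers the GLIBC and ACML builds both come out just under 25% slower, while the svml build takes roughly twice as long as ifort.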

With -mveclibabi=svml, sincosf and tanf are linked; with -mveclibabi=acml and
with no -mvec* option, it is sincosf and tanf@@GLIBC_2.2.5. ifort, by
contrast, calls:
  vmlsSinCos4
  vmlsTan4

Thus the question is really: why is neither vmlsSinCos4 nor vmlsTan4 called,
nor, for ACML, vrs4_sincosf/vrsa_sincosf (vrs*_tan* does not exist)?


--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40766