From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 2504 invoked by alias); 7 Mar 2007 10:20:39 -0000
Received: (qmail 2475 invoked by uid 48); 7 Mar 2007 10:20:29 -0000
Date: Wed, 07 Mar 2007 10:20:00 -0000
Subject: [Bug fortran/31067]  New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
X-Bugzilla-Reason: CC
Message-ID:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "fxcoudert at gcc dot gnu dot org"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2007-03/txt/msg00464.txt.bz2

[see http://www.polyhedron.co.uk/pb05/linux/f90bench_AMD.html for the original
polyhedron benchmark results, an explanation of what the benchmark is and the
source code]

Typical timings for the gas_dyn.f90 benchmark on my AMD64/linux system are:

  * ifort -O3 -xW -ipo -static -V gas_dyn.f90 -o gas_dyn.intel
    => ./gas_dyn.intel  10.53s user 0.43s system 99% cpu 10.976 total

  * gfortran -static -ftree-vectorize -march=opteron -ffast-math -funroll-loops -O3 gas_dyn.f90 -o gas_dyn.gfortran
    => ./gas_dyn.gfortran  15.92s user 0.05s system 99% cpu 15.969 total

Experimenting a bit with Intel options to understand why it is so fast, I found
that:

  * disabling inlining doesn't change the execution time
  * disabling vectorization slows it down to roughly the same execution time
    as gfortran

Following an analysis by Tobias Burnus, and noting that 22.16% of the total
time is spent in the MINLOC library routine, I modified the source by replacing
a call to MINLOC by inline code:

--- gas_dyn.f90 2007-03-07 09:36:23.000000000 +0100
+++ gas_dyn.modified.f90        2007-03-07 10:44:14.000000000 +0100
@@ -234,12 +234,23 @@ end module ints
 !-----------------------------------------------
 !   L o c a l   V a r i a b l e s
 !-----------------------------------------------
-      INTEGER :: ISET(1)
-      REAL :: VSET, SSET
+      INTEGER :: ISET(1), I
+      REAL :: VSET, SSET, T
       REAL, DIMENSION (NODES) :: DTEMP
 !-----------------------------------------------
       DTEMP = DX/(ABS(VEL) + SOUND)
-      ISET = MINLOC (DTEMP)
+! FXC replace this:
+!      ISET = MINLOC (DTEMP)
+! by this:
+      ISET(1) = 0
+      T = HUGE(T)
+      DO I = 1, NODES
+        IF (DTEMP(I) < T) THEN
+          T = DTEMP(I)
+          ISET(1) = I
+        END IF
+      END DO
+! end of modification
       DT = DTEMP(ISET(1))
       VSET = VEL(ISET(1))
       SSET = SOUND(ISET(1))

This makes the code faster by 14%:

  ./gas_dyn.modified.gfortran  13.56s user 0.05s system 99% cpu 13.614 total

Maybe we should have MINLOC inlined when there is no mask and the array is
one-dimensional with stride 1?

PS: Other hot spots are:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
 29.13      4.18     4.18                             eos_ (gas_dyn.f90:410 @ 413386)
 14.22      6.22     2.04                             chozdt_ (gas_dyn.f90:241 @ 4152b3)

Both lines are whole-array operations, corresponding to:

      CS(:NODES) = SQRT(CGAMMA*PRES(:NODES)/DENS(:NODES))

and

      DTEMP = DX/(ABS(VEL) + SOUND)

I filed PR31066, which I think is a small reproducer for the two lines above.


-- 
           Summary: MINLOC should sometimes be inlined (gas_dyn is sooooo
                    sloooow)
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: fortran
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: fxcoudert at gcc dot gnu dot org
 BugsThisDependsOn: 31066


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067
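
For anyone who wants to check the replacement loop in isolation, here is a minimal standalone sketch comparing it with the MINLOC intrinsic for the simple case discussed above (rank 1, no mask, contiguous). It is not part of gas_dyn.f90: the program name, the array size and the test data are made up for illustration.

```fortran
! Sketch only: compares the MINLOC intrinsic with the hand-written loop from
! the patch. Program name, array size and test data are illustrative
! assumptions, not taken from gas_dyn.f90.
program minloc_check
  implicit none
  integer, parameter :: nodes = 8          ! assumed size, for illustration
  real :: dtemp(nodes), t
  integer :: iset(1), iloop, i

  ! Test data with a repeated minimum (0.5 at positions 3 and 6).
  dtemp = (/ 4.0, 2.0, 0.5, 3.0, 1.0, 0.5, 9.0, 7.0 /)

  ! Library version: rank-1 array, no MASK, no DIM.
  iset = minloc(dtemp)

  ! Inline version, as in the patch: the strict "<" keeps the first minimum.
  iloop = 0
  t = huge(t)
  do i = 1, nodes
    if (dtemp(i) < t) then
      t = dtemp(i)
      iloop = i
    end if
  end do

  print *, 'minloc:', iset(1), ' loop:', iloop

  ! Non-zero exit status if the two versions disagree.
  if (iset(1) /= iloop) stop 1
end program minloc_check
```

With this data both versions should report position 3, the first occurrence of the minimum. Note that the loop leaves the index at 0 when no element compares below HUGE(T), which may not match what MINLOC returns in that case, so an inlined version in the compiler would need to handle it explicitly.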