* 10 to 20% speedup with -m64 on Intel Core2Duo
From: Dominique Dhumieres @ 2007-12-28 11:22 UTC (permalink / raw)
To: fortran; +Cc: gcc
Some time ago I had a look at PR30388 and got the following results:
             g77 -O2   g95 -O2   gfc -O2   gfc -m64 -O2
MFLOPS:         1063      1061       858           1129
ref. g77                            -19%            +6%
Since the evening is quite calm I decided to check if this speedup with
-m64 is generic or not and I got the following timings for the Polyhedron
test suite:
================================================================================
Date & Time : 27 Dec 2007 22:24:03
Test Name : pbharness
Compile Command : gfc %n.f90 -m64 -O3 -ffast-math -funroll-loops -finline-limit=600 --param min-vect-loop-bound=2 -o %n
Benchmarks : ac aermod air capacita channel doduc fatigue gas_dyn induct linpk mdbx nf protein rnflow test_fpu tfft
Maximum Times : 300.0
Target Error % : 0.200
Minimum Repeats : 2
Maximum Repeats : 5
Benchmark   Compile  Executable   Ave Run   Number   Estim
Name         (secs)     (bytes)    (secs)  Repeats   Err %
---------   -------  ----------   -------  -------  ------
ac             4.27       50712     13.10        2  0.0420
aermod       100.72     1200712     30.19        2  0.0066
air            6.68       73204      9.37        2  0.0267
capacita       3.92       64520     56.49        2  0.0628
channel        2.43       42752      2.29        2  0.0437
doduc         14.42      179504     48.66        2  0.0021
fatigue        5.69       76696     11.17        5  0.3700
gas_dyn        6.32      700392     10.24        5  0.7605
induct        12.79      160672     66.27        2  0.0053
linpk          1.53       38400     27.54        2  0.0000
mdbx           3.77       68856     15.16        2  0.0099
nf            11.69      112312     31.63        2  0.0174
protein       10.71      110048     46.78        2  0.0064
rnflow        10.95      163144     37.28        2  0.0268
test_fpu      10.08      150080     12.72        2  0.0314
tfft           1.37       30488      2.79        2  0.1074
Geometric Mean Execution Time = 18.20 seconds
================================================================================
Date & Time : 27 Dec 2007 22:44:36
Test Name : pbharness
Compile Command : gfc %n.f90 -O3 -ffast-math -funroll-loops -finline-limit=600 --param min-vect-loop-bound=2 -o %n
Benchmarks : ac aermod air capacita channel doduc fatigue gas_dyn induct linpk mdbx nf protein rnflow test_fpu tfft
Maximum Times : 300.0
Target Error % : 0.200
Minimum Repeats : 2
Maximum Repeats : 5
Benchmark   Compile  Executable   Ave Run   Number   Estim
Name         (secs)     (bytes)    (secs)  Repeats   Err %
---------   -------  ----------   -------  -------  ------
ac             4.48       46532     16.88        2  0.0207
aermod       104.92     1288460     37.09        2  0.0081
air            6.67       80956     11.36        5  0.0849
capacita       3.79       68332     62.40        2  0.0048
channel        2.65       50780      2.51        4  0.1828
doduc         14.27      183264     57.41        2  0.0009
fatigue        6.11       84564     14.02        2  0.0642
gas_dyn        5.93      699872     12.01        5  0.2754
induct        11.83      160132     73.59        2  0.0177
linpk          1.67       46512     27.57        2  0.0145
mdbx           3.84       72672     16.78        2  0.0149
nf            16.73      157220     31.86        2  0.0016
protein       11.62      113868     54.90        2  0.0337
rnflow        11.87      187316     45.56        2  0.0889
test_fpu      11.38      182544     14.56        2  0.0653
tfft           1.44       34420      3.03        5  0.2973
Geometric Mean Execution Time = 20.86 seconds
================================================================================
Polyhedron Benchmark Validator
Copyright (C) Polyhedron Software Ltd - 2004 - All rights reserved
The results were obtained on an Intel Core2Duo 2.16 GHz with 2 GB of RAM
under Darwin9.1 with gfortran 4.3 at revision 131206.
Is this 10 to 20% speedup with -m64 expected, and how generic is it?
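For anyone who wants to check how evenly the gain is spread, here is a minimal Python sketch that compares the two reports benchmark by benchmark. The "Ave Run" times are copied from the two pbharness tables in this message; the names TIMES, geometric_mean, ratios, gm64 and gm32 are only illustrative, and treating "speedup" as the ratio of the time without -m64 to the time with -m64 is an assumption of this sketch.

```python
from math import exp, log

# "Ave Run (secs)" values copied from the two pbharness reports in this message:
# benchmark -> (seconds with -m64, seconds without -m64)
TIMES = {
    "ac": (13.10, 16.88),
    "aermod": (30.19, 37.09),
    "air": (9.37, 11.36),
    "capacita": (56.49, 62.40),
    "channel": (2.29, 2.51),
    "doduc": (48.66, 57.41),
    "fatigue": (11.17, 14.02),
    "gas_dyn": (10.24, 12.01),
    "induct": (66.27, 73.59),
    "linpk": (27.54, 27.57),
    "mdbx": (15.16, 16.78),
    "nf": (31.63, 31.86),
    "protein": (46.78, 54.90),
    "rnflow": (37.28, 45.56),
    "test_fpu": (12.72, 14.56),
    "tfft": (2.79, 3.03),
}


def geometric_mean(values):
    """Geometric mean of a sequence of positive numbers."""
    values = list(values)
    return exp(sum(log(v) for v in values) / len(values))


# Assumption: "speedup" is taken here as t32 / t64 (1.00 means no change).
ratios = {name: t32 / t64 for name, (t64, t32) in TIMES.items()}

gm64 = geometric_mean(t64 for t64, _ in TIMES.values())
gm32 = geometric_mean(t32 for _, t32 in TIMES.values())

# Print the benchmarks from largest to smallest ratio, then the summary line.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<9} {ratio:5.2f}x")
print(f"geometric mean: {gm64:.2f} s (-m64) vs {gm32:.2f} s, ratio {gm32 / gm64:.2f}x")
```

The two geometric means should agree with the values printed by the harness, and the per-benchmark list makes it easy to see which tests gain the most from -m64 and which (linpk and nf, for instance) are essentially unchanged.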
In the assembly code of the inner loop of the test case in PR30388,
the main differences I can see are in the registers used for addressing:
%eax, %ebp, ... in 32-bit mode and %rn, ... in 64-bit mode.
TIA
Dominique