public inbox for gcc-bugs@sourceware.org
From: "dominiq at lps dot ens.fr" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libfortran/51119] MATMUL slow for large matrices
Date: Sat, 31 Oct 2015 14:15:00 -0000
Message-ID: <bug-51119-4-ybaXb6fdj4@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-51119-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119
--- Comment #12 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
Some new numbers for a four-core Core i7 (2.8 GHz, Turbo Boost 3.8 GHz, 1.6 GHz
DDR3) on x86_64-apple-darwin14.5, for the following test:
program t2
implicit none
REAL time_begin, time_end
integer, parameter :: n=2000;
integer(8) :: ts, te, rate8, cmax8
real(8) :: elapsed
REAL(8) :: a(n,n), b(n,n), c(n,n)
integer, parameter :: m = 100
integer :: i
call RANDOM_NUMBER(a)
call RANDOM_NUMBER(b)
call cpu_time(time_begin)
call SYSTEM_CLOCK (ts, rate8, cmax8)
do i = 1,m
a(1,1) = a(1,1) + 0.1
c = MATMUL(a,b)
enddo
call SYSTEM_CLOCK (te, rate8, cmax8)
call cpu_time(time_end)
elapsed = real(te-ts, kind=8)/real(rate8, kind=8)
PRINT *, 'Time, MATMUL: ',time_end-time_begin, elapsed , 2*m*real(n, kind=8)**3/(10**9*elapsed)
call cpu_time(time_begin)
call SYSTEM_CLOCK (ts, rate8, cmax8)
do i = 1,m
a(1,1) = a(1,1) + 0.1
call dgemm('n','n',n, n, n, dble(1.0), a, n, b, n, dble(0.0), c, n)
enddo
call SYSTEM_CLOCK (te, rate8, cmax8)
call cpu_time(time_end)
elapsed = real(te-ts, kind=8)/real(rate8, kind=8)
PRINT *, 'Time, MATMUL: ',time_end-time_begin, elapsed , 2*m*real(n, kind=8)**3/(10**9*elapsed)
end program
borrowed from
http://groups.google.com/group/comp.lang.fortran/browse_thread/thread/1cba8e6ce5080197
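The third number printed by the test is a rate in GFlops, derived from the usual estimate of about 2*n**3 floating-point operations per n x n matrix product. A minimal sketch of that arithmetic follows. It assumes the same n and m as the test and takes the wall-clock time of the first run reported below as input; the program name and the `dp` kind parameter are made up for illustration.

```fortran
program gflops_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)     ! illustrative kind name
  integer, parameter :: n = 2000, m = 100    ! sizes used in the test above
  real(dp) :: flops, elapsed

  ! One n x n product costs about 2*n**3 floating-point operations
  ! (roughly n multiplies and n adds for each of the n*n results),
  ! and the test repeats the product m times.
  flops = 2.0_dp * real(m, dp) * real(n, dp)**3

  ! Wall-clock seconds copied from the first run in the log below.
  elapsed = 374.028899_dp

  ! Same expression as in the test: operations / (10**9 * seconds).
  print '(a, f8.2)', 'GFlops: ', flops / (1.0e9_dp * elapsed)
end program gflops_check
```

With these inputs the result should come out close to the 4.28 printed by the first run, since 2*100*2000**3 is 1.6e12 operations.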
[Book15] f90/bug% gfc -Ofast timing/matmul_tst_sys.f90 -framework Accelerate -fno-frontend-optimize
[Book15] f90/bug% time a.out
Time, MATMUL: 374.027161 374.02889900000002 4.2777443247774283
Time, MATMUL: 172.823853 23.073034000000000 69.345019818373260
546.427u 0.542s 6:37.24 137.6% 0+0k 1+0io 41pf+0w
[Book15] f90/bug% gfc -Ofast timing/matmul_tst_sys.f90 -framework Accelerate
[Book15] f90/bug% time a.out
Time, MATMUL: 391.495880 391.49403500000000 4.0869077353886123
Time, MATMUL: 169.313202 22.781099000000001 70.233661685944114
560.384u 0.544s 6:54.39 135.3% 0+0k 0+0io 0pf+0w
[Book15] f90/bug% gfc -Ofast timing/matmul_tst_sys.f90 -framework Accelerate -march=native
[Book15] f90/bug% time a.out
Time, MATMUL: 367.570374 367.56880500000000 4.3529265221514102
Time, MATMUL: 170.150818 22.837544000000001 70.060073009602078
537.306u 0.534s 6:30.53 137.7% 0+0k 0+0io 0pf+0w
where the last column is the speed in GFlops. These numbers show that the
library MATMUL (first run, ~4.28 GFlops) is slightly faster than the inlined
version (second run, ~4.09 GFlops) unless -march=native is used (third run,
~4.35 GFlops). AVX should be twice as fast unless limited by the memory
bandwidth.
[Book15] f90/bug% gfc -Ofast -fexternal-blas timing/matmul_tst_sys.f90 -framework Accelerate
[Book15] f90/bug% time a.out
Time, MATMUL: 159.000992 21.450851000000000 74.589115368896088
Time, MATMUL: 172.616943 23.029487000000000 69.476145951492541
331.281u 0.453s 0:44.60 743.7% 0+0k 0+0io 3pf+0w
... repeated several times in order to heat up the CPU
[Book15] f90/bug% time a.out
Time, MATMUL: 179.624268 23.935708999999999 66.845732457726655
Time, MATMUL: 178.685364 23.898668000000001 66.949337929628541
357.978u 0.447s 0:47.95 747.4% 0+0k 0+0io 0pf+0w
Thus the BLAS provided by darwin gets ~67 GFlops out of the ~90 GFlops peak
(AVX, 4 cores), while the inlined MATMUL gets ~4 GFlops out of the ~15 GFlops
peak (no AVX, one core with Turbo Boost), with little gain when using AVX
(~30 GFlops peak).
I suppose most modern OSes provide such an optimized BLAS and, if not, one can
install a library such as ATLAS. So I wonder whether it would be more effective
to allow configuring with something such as --with-blas="magic incantation"
and to use -fexternal-blas as the default, rather than reinventing the wheel.
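The peak figures quoted above can be reproduced with a back-of-the-envelope calculation. The sketch below assumes that each core can retire one packed add and one packed multiply per cycle, that is 4 double-precision flops per cycle with SSE2 and 8 with AVX. Those per-cycle counts are an assumption about this CPU generation and are not stated in the comment; the clock rates and core count are the ones given at the top.

```fortran
program peak_estimate
  implicit none
  integer, parameter :: dp = kind(1.0d0)    ! illustrative kind name
  ! Assumed double-precision flops per cycle per core (not from the comment):
  ! one packed add plus one packed multiply on 2 doubles (SSE2) or 4 (AVX).
  real(dp), parameter :: sse_flops = 4.0_dp
  real(dp), parameter :: avx_flops = 8.0_dp
  ! Clock rates and core count as quoted for the test machine.
  real(dp), parameter :: base_ghz = 2.8_dp, turbo_ghz = 3.8_dp
  integer, parameter :: cores = 4

  ! flops/cycle * cycles/ns * cores gives GFlops.
  print '(a, f6.1)', 'AVX, 4 cores, base clock: ', avx_flops * base_ghz * cores
  print '(a, f6.1)', 'SSE2, 1 core, turbo     : ', sse_flops * turbo_ghz
  print '(a, f6.1)', 'AVX, 1 core, turbo      : ', avx_flops * turbo_ghz
end program peak_estimate
```

Under these assumptions the three estimates are 89.6, 15.2 and 30.4 GFlops, consistent with the ~90, ~15 and ~30 GFlops peaks quoted above.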
More than three years ago, Janne Blomqvist (comment 7) wrote:
> IIRC I reached about 30-40 % of peak flops which was a bit disappointing.
Would it be possible to have the patch to play with?