From: "Dominique d'Humières" <dominiq@lps.ens.fr>
To: Thomas Koenig <tkoenig@netcologne.de>
Cc: gcc-patches@gcc.gnu.org, fortran@gcc.gnu.org, jvdelisle@charter.com
Subject: Re: [patch, fortran, RFC] First steps towards inlining matmul
Date: Sun, 05 Apr 2015 18:25:00 -0000 [thread overview]
Message-ID: <4DA1047C-92D3-485A-9457-61655ED14681@lps.ens.fr> (raw)
In-Reply-To: <552147A3.7000603@netcologne.de>
I have done some timings
(1) with the test given below, before the patch I get (last column in Gflops)
[Book15] f90/bug% gfc -Ofast timing/matmul_tst_sys.f90 -framework Accelerate
[Book15] f90/bug% time a.out
Time, MATMUL: 373.708008 373.69497100000001 4.2815668504139435
Time, MATMUL: 172.086609 23.117919000000001 69.210381782201068
545.374u 0.537s 6:36.93 137.5% 0+0k 0+0io 0pf+0w
[Book15] f90/bug% gfc -Ofast -fexternal-blas timing/matmul_tst_sys.f90 -framework Accelerate
[Book15] f90/bug% time a.out
Time, MATMUL: 176.327881 23.855111999999998 67.071577781735002
Time, MATMUL: 182.086746 24.453551000000001 65.430170039516952
358.059u 0.471s 0:48.43 740.2% 0+0k 0+0io 0pf+0w
after the patch
[Book15] f90/bug% time a.out
Time, MATMUL: 392.415436 392.41728799999999 4.0772923337669056
Time, MATMUL: 171.690399 22.905118000000002 69.853383859450091
563.671u 0.551s 6:55.44 135.8% 0+0k 0+0io 0pf+0w
[Book15] f90/bug% gfc -Ofast -fexternal-blas timing/matmul_tst_sys.f90 -framework Accelerate
[Book15] f90/bug% time a.out
Time, MATMUL: 392.850342 392.84190799999999 4.0728852177349673
Time, MATMUL: 174.534821 23.784797000000001 67.269861500184334
566.824u 0.674s 6:56.74 136.1% 0+0k 0+0io 0pf+0w
which means that -fexternal-blas should disable the inlining.
program t2
implicit none
REAL time_begin, time_end
integer, parameter :: n=2000;
integer(8) :: ts, te, rate8, cmax8
real(8) :: elapsed
REAL(8) :: a(n,n), b(n,n), c(n,n)
integer, parameter :: m = 100
integer :: i
call RANDOM_NUMBER(a)
call RANDOM_NUMBER(b)
call cpu_time(time_begin)
call SYSTEM_CLOCK (ts, rate8, cmax8)
do i = 1,m
a(1,1) = a(1,1) + 0.1
c = MATMUL(a,b)
enddo
call SYSTEM_CLOCK (te, rate8, cmax8)
call cpu_time(time_end)
elapsed = real(te-ts, kind=8)/real(rate8, kind=8)
PRINT *, 'Time, MATMUL: ',time_end-time_begin, elapsed , 2*m*real(n, kind=8)**3/(10**9*elapsed)
call cpu_time(time_begin)
call SYSTEM_CLOCK (ts, rate8, cmax8)
do i = 1,m
a(1,1) = a(1,1) + 0.1
call dgemm('n','n',n, n, n, dble(1.0), a, n, b, n, dble(0.0), c, n)
enddo
call SYSTEM_CLOCK (te, rate8, cmax8)
call cpu_time(time_end)
elapsed = real(te-ts, kind=8)/real(rate8, kind=8)
PRINT *, 'Time, MATMUL: ',time_end-time_begin, elapsed , 2*m*real(n, kind=8)**3/(10**9*elapsed)
end program
! http://groups.google.com/group/comp.lang.fortran/browse_thread/thread/1cba8e6ce5080197
(2) The polyhedron test compiled with -fprotect-parens -Ofast -funroll-loops -ftree-loop-linear -fomit-frame-pointer -fwhole-program -flto runs in ~6.5s. After applying the patch below, it runs in ~66s with/without the patch.
Dominique
--- induct.f90 2005-10-11 22:53:32.000000000 +0200
+++ induct_vmc.f90 2015-04-05 19:06:30.000000000 +0200
@@ -1644,18 +1644,17 @@ contains
coil_tmp_vector(1) = -sin(theta)
coil_tmp_vector(2) = cos(theta)
coil_tmp_vector(3) = 0.0_longreal
- coil_current_vec(1) = dot_product(rotate_coil(1,:),coil_tmp_vector(:))
- coil_current_vec(2) = dot_product(rotate_coil(2,:),coil_tmp_vector(:))
- coil_current_vec(3) = dot_product(rotate_coil(3,:),coil_tmp_vector(:))
+ coil_current_vec = matmul(rotate_coil,coil_tmp_vector)
!
do j = 1, 9
c_vector(3) = 0.5 * h_coil * z1gauss(j)
!
! rotate coil vector into the global coordinate system and translate it
!
- rot_c_vector(1) = dot_product(rotate_coil(1,:),c_vector(:)) + dx
- rot_c_vector(2) = dot_product(rotate_coil(2,:),c_vector(:)) + dy
- rot_c_vector(3) = dot_product(rotate_coil(3,:),c_vector(:)) + dz
+ rot_c_vector = matmul(rotate_coil,c_vector)
+ rot_c_vector(1) = rot_c_vector(1) + dx
+ rot_c_vector(2) = rot_c_vector(2) + dy
+ rot_c_vector(3) = rot_c_vector(3) + dz
!
do k = 1, 9
q_vector(1) = 0.5_longreal * a * (x2gauss(k) + 1.0_longreal)
@@ -1664,9 +1663,7 @@ contains
!
! rotate quad vector into the global coordinate system
!
- rot_q_vector(1) = dot_product(rotate_quad(1,:),q_vector(:))
- rot_q_vector(2) = dot_product(rotate_quad(2,:),q_vector(:))
- rot_q_vector(3) = dot_product(rotate_quad(3,:),q_vector(:))
+ rot_q_vector = matmul(rotate_quad,q_vector)
!
! compute and add in quadrature term
!
@@ -1756,18 +1753,17 @@ contains
coil_tmp_vector(1) = -sin(theta)
coil_tmp_vector(2) = cos(theta)
coil_tmp_vector(3) = 0.0_longreal
- coil_current_vec(1) = dot_product(rotate_coil(1,:),coil_tmp_vector(:))
- coil_current_vec(2) = dot_product(rotate_coil(2,:),coil_tmp_vector(:))
- coil_current_vec(3) = dot_product(rotate_coil(3,:),coil_tmp_vector(:))
+ coil_current_vec = matmul(rotate_coil,coil_tmp_vector)
!
do j = 1, 9
c_vector(3) = 0.5 * h_coil * z1gauss(j)
!
! rotate coil vector into the global coordinate system and translate it
!
- rot_c_vector(1) = dot_product(rotate_coil(1,:),c_vector(:)) + dx
- rot_c_vector(2) = dot_product(rotate_coil(2,:),c_vector(:)) + dy
- rot_c_vector(3) = dot_product(rotate_coil(3,:),c_vector(:)) + dz
+ rot_c_vector = matmul(rotate_coil,c_vector)
+ rot_c_vector(1) = rot_c_vector(1) + dx
+ rot_c_vector(2) = rot_c_vector(2) + dy
+ rot_c_vector(3) = rot_c_vector(3) + dz
!
do k = 1, 9
q_vector(1) = 0.5_longreal * a * (x2gauss(k) + 1.0_longreal)
@@ -1776,9 +1772,7 @@ contains
!
! rotate quad vector into the global coordinate system
!
- rot_q_vector(1) = dot_product(rotate_quad(1,:),q_vector(:))
- rot_q_vector(2) = dot_product(rotate_quad(2,:),q_vector(:))
- rot_q_vector(3) = dot_product(rotate_quad(3,:),q_vector(:))
+ rot_q_vector = matmul(rotate_quad,q_vector)
!
! compute and add in quadrature term
!
@@ -2061,18 +2055,17 @@ contains
!
! compute current vector for the coil in the global coordinate system
!
- coil_current_vec(1) = dot_product(rotate_coil(1,:),coil_tmp_vector(:))
- coil_current_vec(2) = dot_product(rotate_coil(2,:),coil_tmp_vector(:))
- coil_current_vec(3) = dot_product(rotate_coil(3,:),coil_tmp_vector(:))
+ coil_current_vec = matmul(rotate_coil,coil_tmp_vector)
!
do j = 1, 9
c_vector(3) = 0.5 * h_coil * z1gauss(j)
!
! rotate coil vector into the global coordinate system and translate it
!
- rot_c_vector(1) = dot_product(rotate_coil(1,:),c_vector(:)) + dx
- rot_c_vector(2) = dot_product(rotate_coil(2,:),c_vector(:)) + dy
- rot_c_vector(3) = dot_product(rotate_coil(3,:),c_vector(:)) + dz
+ rot_c_vector = matmul(rotate_coil,c_vector)
+ rot_c_vector(1) = rot_c_vector(1) + dx
+ rot_c_vector(2) = rot_c_vector(2) + dy
+ rot_c_vector(3) = rot_c_vector(3) + dz
!
do k = 1, 9
q_vector(1) = 0.5_longreal * a * (x2gauss(k) + 1.0_longreal)
@@ -2081,9 +2074,7 @@ contains
!
! rotate quad vector into the global coordinate system
!
- rot_q_vector(1) = dot_product(rotate_quad(1,:),q_vector(:))
- rot_q_vector(2) = dot_product(rotate_quad(2,:),q_vector(:))
- rot_q_vector(3) = dot_product(rotate_quad(3,:),q_vector(:))
+ rot_q_vector = matmul(rotate_quad,q_vector)
!
! compute and add in quadrature term
!
@@ -2204,18 +2195,17 @@ contains
!
! compute current vector for the coil in the global coordinate system
!
- coil_current_vec(1) = dot_product(rotate_coil(1,:),coil_tmp_vector(:))
- coil_current_vec(2) = dot_product(rotate_coil(2,:),coil_tmp_vector(:))
- coil_current_vec(3) = dot_product(rotate_coil(3,:),coil_tmp_vector(:))
+ coil_current_vec = matmul(rotate_coil,coil_tmp_vector)
!
do j = 1, 9
c_vector(3) = 0.5 * h_coil * z1gauss(j)
!
! rotate coil vector into the global coordinate system and translate it
!
- rot_c_vector(1) = dot_product(rotate_coil(1,:),c_vector(:)) + dx
- rot_c_vector(2) = dot_product(rotate_coil(2,:),c_vector(:)) + dy
- rot_c_vector(3) = dot_product(rotate_coil(3,:),c_vector(:)) + dz
+ rot_c_vector = matmul(rotate_coil,c_vector)
+ rot_c_vector(1) = rot_c_vector(1) + dx
+ rot_c_vector(2) = rot_c_vector(2) + dy
+ rot_c_vector(3) = rot_c_vector(3) + dz
!
do k = 1, 9
q_vector(1) = 0.5_longreal * a * (x2gauss(k) + 1.0_longreal)
@@ -2224,9 +2214,7 @@ contains
!
! rotate quad vector into the global coordinate system
!
- rot_q_vector(1) = dot_product(rotate_quad(1,:),q_vector(:))
- rot_q_vector(2) = dot_product(rotate_quad(2,:),q_vector(:))
- rot_q_vector(3) = dot_product(rotate_quad(3,:),q_vector(:))
+ rot_q_vector = matmul(rotate_quad,q_vector)
!
! compute and add in quadrature term
!
> Le 5 avr. 2015 à 16:33, Thomas Koenig <tkoenig@netcologne.de> a écrit :
>
> Hi Dominique,
>
>> IMO the inlining of MATMUL should be restricted to small matrices (less than 4x4, 9x9
>> or 16x16 depending of your field!-)
>
> The problem with the library function we have is that it is quite
> general; it can deal with all the complexity of assumed-shape array
> arguments. Inlining allows the compiler to take advantage of
> contiguous memory, compile-time knowledge of array sizes, etc.
> to allow for vectorization or even complete unrolling.
>
> Of course, if you use c=matmul(a,b) where all three are assumed-shape
> arrays, your advantage over the library function will be minimal at
> best.
>
> It will be interesting to see at what size calling BLAS directly
> will be better.
>
> Thomas
next prev parent reply other threads:[~2015-04-05 18:25 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-05 13:55 Dominique Dhumieres
2015-04-05 14:33 ` Thomas Koenig
2015-04-05 18:25 ` Dominique d'Humières [this message]
2015-04-05 20:37 ` Thomas Koenig
2015-04-05 23:15 ` Dominique d'Humières
2015-04-06 12:18 ` Dominique d'Humières
2015-04-09 22:18 ` Thomas Koenig
2015-04-10 14:15 ` Dominique d'Humières
2015-04-10 16:57 ` Dominique d'Humières
2015-04-11 12:24 ` Thomas Koenig
2015-04-11 13:43 ` Mikael Morin
2015-04-11 17:01 ` Thomas Koenig
-- strict thread matches above, loose matches on Subject: below --
2015-04-05 12:32 Thomas Koenig
2015-04-05 13:20 ` Jerry DeLisle
2015-04-05 13:48 ` Thomas Koenig
2015-04-07 19:24 ` David Malcolm
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4DA1047C-92D3-485A-9457-61655ED14681@lps.ens.fr \
--to=dominiq@lps.ens.fr \
--cc=fortran@gcc.gnu.org \
--cc=gcc-patches@gcc.gnu.org \
--cc=jvdelisle@charter.com \
--cc=tkoenig@netcologne.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).