public inbox for gcc@gcc.gnu.org
* Loop fusion.
@ 2018-04-22 15:23 Toon Moene
  2018-04-23 11:00 ` Bin.Cheng
  2018-04-23 11:02 ` Richard Biener
  0 siblings, 2 replies; 13+ messages in thread
From: Toon Moene @ 2018-04-22 15:23 UTC
  To: gcc mailing list

A few days ago there was a rant on the Fortran Standardization 
Committee's e-mail list about Fortran's "whole array arithmetic" being 
unoptimizable.

An example picked at random from our weather forecasting code:

     ZQICE(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YI%MP)
     ZQLI(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YL%MP)
     ZQRAIN(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YR%MP)
     ZQSNOW(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YS%MP)

The reaction from one of the members of the committee (about "their" 
compiler):

'And multiple consecutive array statements with the same shape are 
“fused” exactly so that the compiler can generate good cache use. This 
sort of optimization is pretty low hanging fruit.'
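
To make the quoted remark concrete, here is a minimal sketch of what "fusing" the four assignments above would amount to if done by hand. The array sizes and the plain integer indices JI, JL, JR, JS are made-up stand-ins for NPROMA, NFLEVG and the YI%MP, YL%MP, YR%MP, YS%MP components, which are not defined in this message. It only shows the shape of the transformation and says nothing about what any particular compiler does internally.

```fortran
! Sketch only: sizes and the field indices JI, JL, JR, JS are hypothetical
! stand-ins for NPROMA, NFLEVG and YI%MP, YL%MP, YR%MP, YS%MP.
PROGRAM FUSED_COPY
  IMPLICIT NONE
  INTEGER, PARAMETER :: NPROMA = 8, NFLEVG = 4, NFIELDS = 4
  INTEGER, PARAMETER :: JI = 1, JL = 2, JR = 3, JS = 4
  REAL :: PGFL(NPROMA, NFLEVG, NFIELDS)
  REAL, DIMENSION(NPROMA, NFLEVG) :: ZQICE, ZQLI, ZQRAIN, ZQSNOW
  INTEGER :: JLON, JLEV

  CALL RANDOM_NUMBER(PGFL)

  ! One loop nest instead of four: every (JLON, JLEV) point is visited once
  ! and all four targets are written in the same iteration.
  DO JLEV = 1, NFLEVG
    DO JLON = 1, NPROMA
      ZQICE(JLON, JLEV)  = PGFL(JLON, JLEV, JI)
      ZQLI(JLON, JLEV)   = PGFL(JLON, JLEV, JL)
      ZQRAIN(JLON, JLEV) = PGFL(JLON, JLEV, JR)
      ZQSNOW(JLON, JLEV) = PGFL(JLON, JLEV, JS)
    ENDDO
  ENDDO

  ! Check against what the four whole-array assignments would produce.
  IF (ANY(ZQICE /= PGFL(:, :, JI)) .OR. ANY(ZQLI /= PGFL(:, :, JL)) .OR. &
      ANY(ZQRAIN /= PGFL(:, :, JR)) .OR. ANY(ZQSNOW /= PGFL(:, :, JS))) THEN
    ERROR STOP 'fused copy differs from array assignments'
  ENDIF
  PRINT *, 'fused copy matches array assignments'
END PROGRAM FUSED_COPY
```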

As far as I can see, loop fusion as a stand-alone optimization is not yet 
supported in GCC, although it is mentioned in the context of Graphite.

Is this something that should be pursued ?

Kind regards,

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news

* Loop fusion.
@ 2015-04-22 19:19 Toon Moene
  2015-04-22 20:05 ` Steven Bosscher
  0 siblings, 1 reply; 13+ messages in thread
From: Toon Moene @ 2015-04-22 19:19 UTC
  To: gcc mailing list

L.S.,

Last week, a colleague of mine from Meteo France held a talk at the 
yearly meeting of all researchers working on HARMONIE (see 
http://hirlam.org) discussing the performance of our code when compiled 
with each of the supported compilers on the Cray XC30 at ECMWF 
(http://www.ecmwf.int/en/computing/our-facilities).

In the context of GCC this is relevant, because one of the three 
compilers is gfortran (version 4.9.2).

One of his slides discussed the differences in optimizations that the 
three compilers offer; I was surprised to learn that GCC/gfortran 
doesn't do loop fusion *at all*. Note, I discussed loop fusion (among 
other optimizations) at LinuxExpo 99 (http://moene.org/~toon/nwp.ps) 
which, unsurprisingly, was held 16 years ago :-)

Why is loop fusion important, especially in Fortran 90 and later programs ?

Because without it, every array assignment is a single loop nest, 
isolated from related, same-shape assignments.

Consider this (artificial, but typical) example [updating atmospheric 
quantities after the computation of the rate of change during a time 
step of the integration]:

SUBROUTINE UPDATE_DT(T, U, V, Q, DTDT, DUDT, DVDT, DQDT, &
    & NLON, NLAT, NLEV, TSTEP)
...
REAL, DIMENSION(NLON, NLAT, NLEV) :: T, U, V, Q, DTDT, DUDT, DVDT, DQDT
...
T = T + TSTEP*DTDT ! Update temperature
U = U + TSTEP*DUDT ! Update east-west wind component
V = V + TSTEP*DVDT ! Update north-south wind component
Q = Q + TSTEP*DQDT ! Update specific humidity
...
END

This generates four consecutive three-deep loop nests over NLEV, NLAT, NLON.
Of course, it would be much more efficient if this were just one loop 
nest, as Fortran 77 programmers would write it:

DO JLEV = 1, NLEV
  DO JLAT = 1, NLAT
    DO JLON = 1, NLON
      T(JLON, JLAT, JLEV) = T(JLON, JLAT, JLEV) + TSTEP*DTDT(JLON, JLAT, JLEV)
      U(JLON, JLAT, JLEV) = U(JLON, JLAT, JLEV) + TSTEP*DUDT(JLON, JLAT, JLEV)
      V(JLON, JLAT, JLEV) = V(JLON, JLAT, JLEV) + TSTEP*DVDT(JLON, JLAT, JLEV)
      Q(JLON, JLAT, JLEV) = Q(JLON, JLAT, JLEV) + TSTEP*DQDT(JLON, JLAT, JLEV)
    ENDDO
  ENDDO
ENDDO

After a loop fusion optimization pass the Fortran 90 and the Fortran 77 
code should result in the same assembler output.
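
The equivalence of the two forms can be checked with a small stand-alone program. What follows is a sketch under assumptions: the grid sizes, the value of TSTEP and the random input data are made up, and only T and U are updated to keep it short. It compares results only; whether the generated assembler matches would have to be inspected separately (for example with gfortran -S).

```fortran
! Sketch only: grid sizes, TSTEP and the random data are hypothetical, and
! only T and U are updated to keep the example short.
PROGRAM FUSION_CHECK
  IMPLICIT NONE
  INTEGER, PARAMETER :: NLON = 16, NLAT = 8, NLEV = 4
  REAL, PARAMETER :: TSTEP = 0.5
  REAL, DIMENSION(NLON, NLAT, NLEV) :: T, U, DTDT, DUDT, TF, UF
  INTEGER :: JLON, JLAT, JLEV

  CALL RANDOM_NUMBER(T)
  CALL RANDOM_NUMBER(U)
  CALL RANDOM_NUMBER(DTDT)
  CALL RANDOM_NUMBER(DUDT)
  TF = T   ! copies that the hand-fused loop will update
  UF = U

  ! Fortran 90 form: two array assignments, i.e. two separate loop nests
  ! unless the compiler fuses them.
  T = T + TSTEP*DTDT
  U = U + TSTEP*DUDT

  ! Hand-fused form: one loop nest updating both fields.
  DO JLEV = 1, NLEV
    DO JLAT = 1, NLAT
      DO JLON = 1, NLON
        TF(JLON, JLAT, JLEV) = TF(JLON, JLAT, JLEV) + TSTEP*DTDT(JLON, JLAT, JLEV)
        UF(JLON, JLAT, JLEV) = UF(JLON, JLAT, JLEV) + TSTEP*DUDT(JLON, JLAT, JLEV)
      ENDDO
    ENDDO
  ENDDO

  ! A small tolerance allows for differences in how the multiply-add is
  ! evaluated in the two forms.
  IF (MAXVAL(ABS(T - TF)) > 1.0E-6 .OR. &
      MAXVAL(ABS(U - UF)) > 1.0E-6) THEN
    ERROR STOP 'array syntax and fused loop differ'
  ENDIF
  PRINT *, 'array syntax and fused loop agree'
END PROGRAM FUSION_CHECK
```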

Is this something the Graphite infrastructure could help with ? From the 
wiki documentation I get the impression that it only works on single 
loop nests, but I must confess that I am not familiar with the 
nomenclature in its description ...

Would it be hard to write a loop fusion pass otherwise ?

Kind regards,

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Thread overview: 13+ messages
2018-04-22 15:23 Loop fusion Toon Moene
2018-04-23 11:00 ` Bin.Cheng
2018-04-23 12:31   ` Richard Biener
2018-04-23 12:47     ` Janne Blomqvist
2018-04-23 14:11       ` Richard Biener
2018-04-23 11:02 ` Richard Biener
2018-04-24  2:22   ` Toon Moene
2018-04-24 12:58     ` Richard Biener
2018-04-25  8:06       ` Toon Moene
  -- strict thread matches above, loose matches on Subject: below --
2015-04-22 19:19 Toon Moene
2015-04-22 20:05 ` Steven Bosscher
2015-04-23  4:58   ` Toon Moene
2015-04-23 17:17     ` Richard Biener
