Message-ID: <5537D377.1000603@moene.org>
Date: Wed, 22 Apr 2015 19:19:00 -0000
From: Toon Moene
To: gcc mailing list
Subject: Loop fusion.

L.S.,

Last week, a colleague of mine from Meteo France held a talk at the
yearly meeting of all researchers working on HARMONIE (see
http://hirlam.org) discussing the performance of our code when compiled
with each of the supported compilers on the Cray XC30 at ECMWF
(http://www.ecmwf.int/en/computing/our-facilities).

In the context of GCC this is relevant, because one of the three
compilers is gfortran (version 4.9.2).

One of his slides discussed the differences in optimizations that the
three compilers offer; I was surprised to learn that GCC/gfortran
doesn't do loop fusion *at all*.

Note, I discussed loop fusion (among other optimizations) at LinuxExpo 99
(http://moene.org/~toon/nwp.ps) which, unsurprisingly, was held 16 years
ago :-)

Why is loop fusion important, especially in Fortran 90 and later
programs? Because without it, every array assignment is a single loop
nest, isolated from related, same-shape assignments.

Consider this (artificial, but typical) example [updating atmospheric
quantities after the computation of the rate of change during a time
step of the integration]:

      SUBROUTINE UPDATE_DT(T, U, V, Q, DTDT, DUDT, DVDT, DQDT, &
     &                     NLON, NLAT, NLEV, TSTEP)
      ...
      REAL, DIMENSION(NLON, NLAT, NLEV) :: T, U, V, Q, DTDT, DUDT, DVDT, DQDT
      ...
      T = T + TSTEP*DTDT ! Update temperature
      U = U + TSTEP*DUDT ! Update east-west wind component
      V = V + TSTEP*DVDT ! Update north-south wind component
      Q = Q + TSTEP*DQDT ! Update specific humidity
      ...
      END

This generates four consecutive three-deep loop nests over NLEV, NLAT,
NLON. Of course, it would be much more efficient if this were just one
loop nest, as Fortran 77 programmers would write it:

      DO JLEV = 1, NLEV
         DO JLAT = 1, NLAT
            DO JLON = 1, NLON
               T(JLON, JLAT, JLEV) = T(JLON, JLAT, JLEV) + TSTEP*DTDT(JLON, JLAT, JLEV)
               U(JLON, JLAT, JLEV) = U(JLON, JLAT, JLEV) + TSTEP*DUDT(JLON, JLAT, JLEV)
               V(JLON, JLAT, JLEV) = V(JLON, JLAT, JLEV) + TSTEP*DVDT(JLON, JLAT, JLEV)
               Q(JLON, JLAT, JLEV) = Q(JLON, JLAT, JLEV) + TSTEP*DQDT(JLON, JLAT, JLEV)
            ENDDO
         ENDDO
      ENDDO

After a loop fusion optimization pass, the Fortran 90 and the Fortran 77
code should result in the same assembler output.

Is this something the Graphite infrastructure could help with? From the
wiki documentation I get the impression that it only works on single
loop nests, but I must confess that I am not familiar with the
nomenclature in its description ...

Would it be hard to write a loop fusion pass otherwise?

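For anyone who wants to experiment, the two forms above can be put side by side in a small stand-alone program. This is only a sketch under stated assumptions: the program name FUSION_CHECK, the toy array sizes, the value of TSTEP, the random fill values and the copies T2, U2, V2, Q2 are all made up for illustration and are not part of the HARMONIE code. It applies the four array assignments to one set of arrays and the hand-fused loop nest to a copy, then prints the largest difference between the two results, which should be zero or at rounding level.

```fortran
! Sketch only: sizes, time step and fill values are illustrative assumptions.
PROGRAM FUSION_CHECK
  IMPLICIT NONE
  INTEGER, PARAMETER :: NLON = 8, NLAT = 6, NLEV = 4   ! assumed toy dimensions
  REAL, PARAMETER :: TSTEP = 0.5                       ! assumed time step
  REAL, DIMENSION(NLON, NLAT, NLEV) :: T, U, V, Q, DTDT, DUDT, DVDT, DQDT
  REAL, DIMENSION(NLON, NLAT, NLEV) :: T2, U2, V2, Q2  ! copies for the fused form
  INTEGER :: JLON, JLAT, JLEV

  ! Fill the state and the tendencies with arbitrary values.
  CALL RANDOM_NUMBER(T)
  CALL RANDOM_NUMBER(U)
  CALL RANDOM_NUMBER(V)
  CALL RANDOM_NUMBER(Q)
  CALL RANDOM_NUMBER(DTDT)
  CALL RANDOM_NUMBER(DUDT)
  CALL RANDOM_NUMBER(DVDT)
  CALL RANDOM_NUMBER(DQDT)

  ! Keep a copy of the initial state for the hand-fused version.
  T2 = T
  U2 = U
  V2 = V
  Q2 = Q

  ! Fortran 90 form: four separate array assignments.
  T = T + TSTEP*DTDT
  U = U + TSTEP*DUDT
  V = V + TSTEP*DVDT
  Q = Q + TSTEP*DQDT

  ! Fortran 77 form: one loop nest updating all four fields.
  DO JLEV = 1, NLEV
    DO JLAT = 1, NLAT
      DO JLON = 1, NLON
        T2(JLON, JLAT, JLEV) = T2(JLON, JLAT, JLEV) + TSTEP*DTDT(JLON, JLAT, JLEV)
        U2(JLON, JLAT, JLEV) = U2(JLON, JLAT, JLEV) + TSTEP*DUDT(JLON, JLAT, JLEV)
        V2(JLON, JLAT, JLEV) = V2(JLON, JLAT, JLEV) + TSTEP*DVDT(JLON, JLAT, JLEV)
        Q2(JLON, JLAT, JLEV) = Q2(JLON, JLAT, JLEV) + TSTEP*DQDT(JLON, JLAT, JLEV)
      ENDDO
    ENDDO
  ENDDO

  ! Report the largest element-wise difference between the two forms.
  PRINT *, 'max |T - T2| =', MAXVAL(ABS(T - T2))
  PRINT *, 'max |U - U2| =', MAXVAL(ABS(U - U2))
  PRINT *, 'max |V - V2| =', MAXVAL(ABS(V - V2))
  PRINT *, 'max |Q - Q2| =', MAXVAL(ABS(Q - Q2))
END PROGRAM FUSION_CHECK
```

Saved under a hypothetical name such as fusion_check.f90, this should build with something like "gfortran -O2 fusion_check.f90". Adding -fdump-tree-optimized writes out the optimized intermediate code, which is one way to see whether the four array assignments still end up as separate loop nests. With sizes this small and known at compile time the compiler may unroll or otherwise transform the loops, so larger or run-time dimensions are probably needed for a meaningful comparison.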
Kind regards,

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news