Message-ID: <4A280562.6050407@moene.org>
Date: Thu, 04 Jun 2009 17:33:00 -0000
From: Toon Moene
To: gcc mailing list
Subject: GCC Summit 2010 topic (potentially).

L.S.,

This year I'm unable to attend the GCC Summit (due to both time and
money constraints).

In 2008, I considered talking about the effect of link time optimization
on typical Fortran programs, that is, until my attention got hijacked by
the geo-politically more pressing question of Coarrays in Fortran.

However, the issue still stands. So I'm thinking ahead to next year
(assuming LTO will work by that time for most front-end languages):

What will LTO bring for Fortran?

Here's a run-of-the-mill example from our code:

      SUBROUTINE VERINT (
     I   KLON   , KLAT   , KLEV   , KINT   , KHALO
     I , KLON1  , KLON2  , KLAT1  , KLAT2
     I , KP     , KQ     , KR
     R , PARG   , PRES
     R , PALFH  , PBETH
     R , PALFA  , PBETA  , PGAMA   )
      ...
      DO JY = KLAT1,KLAT2
      DO JX = KLON1,KLON2
         IDX  = KP(JX,JY)
         IDY  = KQ(JX,JY)
         ILEV = KR(JX,JY)
C
         PRES(JX,JY) = PGAMA(JX,JY,1)*(
C
     +   PBETA(JX,JY,1)*( PALFA(JX,JY,1)*PARG(IDX-1,IDY-1,ILEV-1)
     +                  + PALFA(JX,JY,2)*PARG(IDX  ,IDY-1,ILEV-1) )
     + + PBETA(JX,JY,2)*( PALFA(JX,JY,1)*PARG(IDX-1,IDY  ,ILEV-1)
     +                  + PALFA(JX,JY,2)*PARG(IDX  ,IDY  ,ILEV-1) ) )
C
     + +
     +   PGAMA(JX,JY,2)*(
C
     + + PBETA(JX,JY,1)*( PALFA(JX,JY,1)*PARG(IDX-1,IDY-1,ILEV  )
     +                  + PALFA(JX,JY,2)*PARG(IDX  ,IDY-1,ILEV  ) )
     + + PBETA(JX,JY,2)*( PALFA(JX,JY,1)*PARG(IDX-1,IDY  ,ILEV  )
     +                  + PALFA(JX,JY,2)*PARG(IDX  ,IDY  ,ILEV  ) ) )
      ENDDO
      ENDDO
      ...
      RETURN
      END

There are several issues a link time optimization pass could determine:

1. Whether or not the arrays PALFA, PARG, ... are suitably aligned for
   vectorization (forgoing a run time check for that).

2. Whether KLON{1,2}, KLAT{1,2} are actually invariant throughout an
   invocation of the executable (as they are in our case) (CSE of
   vectorization criteria).

However, with a little bit of extra effort (instrumentation outside the
program), the following can be determined:

3. KLON{1,2}, KLAT{1,2} are in fact known constants, which only happen
   to be variables because the executable is built to accommodate
   arbitrary grid sizes.

Would it help to provide GCC with knowledge about KLON, KLAT (and
thereby, KLON{1,2}, KLAT{1,2})?

Note that this question is less academic than it seems. We often run on
the same grid for years without changing an executable, so this
optimization makes sense.

Kind regards,

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.4/changes.html