From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 1210 invoked by alias); 11 May 2010 17:49:30 -0000 Received: (qmail 1198 invoked by uid 22791); 11 May 2010 17:49:28 -0000 X-SWARE-Spam-Status: No, hits=-2.0 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,SPF_HELO_PASS,TW_EG,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from smtp-out.google.com (HELO smtp-out.google.com) (74.125.121.35) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 11 May 2010 17:49:21 +0000 Received: from wpaz13.hot.corp.google.com (wpaz13.hot.corp.google.com [172.24.198.77]) by smtp-out.google.com with ESMTP id o4BHnHLd024295 for ; Tue, 11 May 2010 10:49:18 -0700 Received: from vws8 (vws8.prod.google.com [10.241.21.136]) by wpaz13.hot.corp.google.com with ESMTP id o4BHn48f032168 for ; Tue, 11 May 2010 10:49:16 -0700 Received: by vws8 with SMTP id 8so178908vws.36 for ; Tue, 11 May 2010 10:49:16 -0700 (PDT) MIME-Version: 1.0 Received: by 10.220.124.194 with SMTP id v2mr376970vcr.125.1273600156061; Tue, 11 May 2010 10:49:16 -0700 (PDT) Received: by 10.220.17.8 with HTTP; Tue, 11 May 2010 10:49:15 -0700 (PDT) In-Reply-To: <4BE99178.6030007@moene.org> References: <4BE99178.6030007@moene.org> Date: Tue, 11 May 2010 17:49:00 -0000 Message-ID: Subject: Re: IVOPT improvement patch From: Xinliang David Li To: Toon Moene Cc: GCC Patches Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-System-Of-Record: true X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2010-05/txt/msg00777.txt.bz2 On Tue, May 11, 2010 at 10:18 AM, Toon Moene wrote: > On 05/11/2010 08:35 AM, Xinliang David Li wrote: > >> Hi, IVOPT has been one of the main area of complaints from gcc users >> and it is often shutdown or user is forced to use inline assembly to >> write key kernel loops. The following (resulting from the >> investigation of many user complaints) summarize some of the key >> problems: > >> 6) IN MEM_REF creation, loop variant and invariants may be assigned to >> the same part -- which is essentially a re-association blocking LIM > > On the other hand, some recombination of induction variables is necessary= to > prevent excessive register pressure (and the resulting spills). > > From my slides at the May, 1999 Linux Expo: > > "Let's turn our attention to the kinetic energy loop again: > \begin{verbatim} > =A0 =A0 =A0DO 810 I=3DILONP2,ILNLT > =A0 =A0 =A0 =A0 ZEK(I) =3D 0.25 * > =A0 =A0 + =A0 =A0 =A0 =A0( ( PUZ(I-1 =A0 ,K)*PUZ(I-1 =A0 ,K) > =A0 =A0 + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 *HYU(I-1 =A0 ) > =A0 =A0 + =A0 =A0 =A0 =A0 =A0+ PUZ(I =A0 =A0 ,K)*PUZ(I =A0 =A0 ,K) > =A0 =A0 + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 *HYU(I =A0 =A0 ))*RHYV (I) > =A0 =A0 + =A0 =A0 =A0 =A0+ ( PVZ(I-ILON,K)*PVZ(I-ILON,K) > =A0 =A0 + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 *HXV(I-ILON) > =A0 =A0 + =A0 =A0 =A0 =A0 =A0+ PVZ(I =A0 =A0 ,K)*PVZ(I =A0 =A0 ,K) > =A0 =A0 + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 *HXV(I =A0 =A0 ))*RHXU (I) ) > =A0810 =A0CONTINUE > \end{verbatim} > If we strength reduce all induction variables and move > all loop invariant code out of the loop, we need > 11 registers to hold the addresses needed to step through > the arrays. > \end{slide} > \begin{slide}{} > We can do better, by noting that > \begin{verbatim} > { PUZ(I-1 =A0 ,K), PUZ(I =A0 =A0 ,K) } > { PVZ(I-ILON,K), PVZ(I =A0 =A0 ,K) } > { HYU(I-1 =A0 ) =A0, HYU(I =A0 =A0 ) =A0 } > { HXV(I-ILON) =A0, HXV(I =A0 =A0 ) =A0 } > \end{verbatim} > form 4 {\em equivalence classes} of induction variables > that differ only by a constant - which means they > can be written in the form of address-register-with-offset. > Finding the optimal partition and iv selection/assignment is the core part of the IVOPT. GCC IVOPT uses a cost based approach to achieve that, however there are some issues that prevent the above optimal solution above. One of the intention of this patch is to address it. The problem described in 6 is different. Assuming we have the following linear address expressions: base + iv + inv*coeff, where iv is the induction variable, and inv is loop invariant and coeff is constant 2/4/8, the synthesized mem ref (without the fix) can be: MEM_REF(base + iv, inv, coeff) With the fix, MEM_REF(base + inv*coeff, iv, 1) Where base+inv*coeff can be later hoisted out of the loop. Of course the added cost is the increased register pressure -- but this is modelled by the so call pseudo invariant. Thanks, David > In doing so, we save 4 registers and need only 7 registers > for addressing. > > Richard Henderson implemented this optimization, which > will be part of egcs-1.2 ... I mean gcc-2.95." > > That was '99, so it was discussing code compiled by g77 and using RTL > optimization passes only - but the idea is the same. > > Kind regards, > > -- > Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290 > Saturnushof 14, 3738 XG =A0Maartensdijk, The Netherlands > At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/ > Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html#Fortran >