From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-376585-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 22376 invoked by alias); 11 Dec 2011 14:08:31 -0000
Received: (qmail 22366 invoked by uid 22791); 11 Dec 2011 14:08:29 -0000
X-SWARE-Spam-Status: No, hits=-2.8 required=5.0	tests=ALL_TRUSTED,AWL,BAYES_00,TW_PM
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sun, 11 Dec 2011 14:08:16 +0000
From: "dominiq at lps dot ens.fr" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug lto/51497] [4.7 Regression] The run time for the polyhedron test nf.f90 is ~10% slower with -flto after revision 182107
Date: Sun, 11 Dec 2011 14:14:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: lto
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: dominiq at lps dot ens.fr
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID: <bug-51497-4-U22k7smkZK@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-51497-4@http.gcc.gnu.org/bugzilla/>
References: <bug-51497-4@http.gcc.gnu.org/bugzilla/>
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2011-12/txt/msg01151.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51497
--- Comment #2 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-12-11 14:07:59 UTC ---
After looking further at the assembly, I found that all seven loops in
spmmult are vectorized without -flto, while none of them is with -flto.

For nf2dprecon after trisolve inlining, the code looks like

subroutine NF2DPrecon(x,gi,au1,au2,i1,i2,nx)       ! 2D NF Preconditioning matrix
implicit none
integer :: i1,i2,nx
real(8),dimension(i2)::x,t,gi,au1,au2
integer :: i,j
do i = i1 , i2 , nx
   if ( i>i1 ) x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1)
   x(i) = gi(i)* x(i)
   do j = i+1 , i+nx-1
      x(j) = gi(j)*(x(j)-au1(j-1)*x(j-1))
   enddo
   do j = i+nx-2 , i , -1
      x(j) = x(j) - gi(j)*au1(j)*x(j+1)
   enddo
enddo 
do i = i2-2*nx+1 , i1 , -nx
   t(i:i+nx-1) = au2(i:i+nx-1)*x(i+nx:i+2*nx-1)
   t(i) = gi(i)* t(i)
   do j = i+1 , i+nx-1
      t(j) = gi(j)*(t(j)-au1(j-1)*t(j-1))
   enddo
   do j = i+nx-2 , i , -1
      t(j) = t(j) - gi(j)*au1(j)*t(j+1)
   enddo
   x(i:i+nx-1) = x(i:i+nx-1) - t(i:i+nx-1)
enddo
end subroutine NF2DPrecon            !=========================================

where none of the explicit 'do j' loops is vectorized ("possible dependence
between data-refs"). The three implicit loops (the array assignments) are
all vectorized without -flto, but only the last two are vectorized with
-flto. The first implicit loop, which is not vectorized with -flto:

x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1)

is vectorized without -flto, with the message "created 1 versioning for
alias checks." (Is the alias check between au2 and x? If so, it should not
be needed: valid Fortran code guarantees that there is no aliasing.)
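
To look at that first implicit loop in isolation, something like the
following reduced test could be used. It is only a sketch and is not taken
from the polyhedron sources: the names update_block and check_update, the
extra argument n and the array sizes are made up for illustration, and I
have not checked that it reproduces the difference seen with nf.f90.

```fortran
! Hypothetical reduced test case (names and sizes are assumptions).
! One possible way to compare the vectorizer reports, depending on the
! GCC version (newer releases use -fopt-info-vec instead):
!   gfortran -O3 -ftree-vectorizer-verbose=2 reduced.f90
!   gfortran -O3 -flto -ftree-vectorizer-verbose=2 reduced.f90
subroutine update_block(x, au2, i, nx, n)
   implicit none
   integer, intent(in) :: i, nx, n
   real(8), dimension(n), intent(inout) :: x
   real(8), dimension(n), intent(in)    :: au2
   ! Same array expression as the first implicit loop in NF2DPrecon.
   x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1)
end subroutine update_block

program check_update
   implicit none
   integer, parameter :: nx = 4, n = 2*nx
   real(8) :: x(n), au2(n), expected(n)
   integer :: k
   ! Fill the arrays with simple, exactly representable values.
   do k = 1, n
      x(k)   = real(k, 8)
      au2(k) = 0.5d0
   end do
   ! Scalar reference result for the second block of nx elements.
   expected = x
   do k = nx+1, n
      expected(k) = x(k) - au2(k-nx)*x(k-nx)
   end do
   call update_block(x, au2, nx+1, nx, n)
   ! Exit with a non-zero status if the array assignment disagrees.
   if (any(abs(x - expected) > 1.0d-12)) stop 1
   print *, 'ok'
end program check_update
```

The main program only checks that the array assignment gives the same
result as the scalar loop; the point of interest is what the vectorizer
reports for update_block with and without -flto.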