From: "Joost.VandeVondele at pci dot uzh.ch"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/47298] New: -O3 destroys beautifully vectorized code obtained at -O2
Date: Fri, 14 Jan 2011 20:53:00 -0000

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47298

           Summary: -O3 destroys beautifully vectorized code obtained at -O2
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: Joost.VandeVondele@pci.uzh.ch

Current trunk generates really fast vectorized code for the following
testcase (a 12x12x12 matrix multiply, c=c+a*b, benchmarked with a, b, c in
cache), as can be seen from the assembly:

> cat compare.f90
SUBROUTINE HARD_NN_12_12_12(C,A,B)
  REAL(KIND=8), INTENT(INOUT) :: C(12,*)
  REAL(KIND=8), INTENT(IN) :: B(12,*), A(12,*)
  INTEGER :: i,j,l
  DO j=1,12 ; DO i=1,12 ; DO l=1,12
    C(i,j)=C(i,j)+A(i,l)*B(l,j)
  ENDDO ; ENDDO ; ENDDO
END SUBROUTINE HARD_NN_12_12_12

However, this only happens with:

gfortran-trunk -O2 -funroll-loops -ftree-vectorize -ffast-math -march=corei7 -msse4.2 compare.f90

while switching -O2 to -O3 produces 'bad' code:

gfortran-trunk -O3 -funroll-loops -ftree-vectorize -ffast-math -march=corei7 -msse4.2 compare.f90

With the tester below, -O2 runs in about 4.4s and -O3 runs in about 7.0s.

> cat test_compare.f90
REAL(KIND=8), DIMENSION(12,12) :: A,B,C
A=0 ; B=0 ; C=0
DO I=1,10000000
  CALL HARD_NN_12_12_12(C,A,B)
ENDDO
END
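
For anyone who wants the timing without an external `time` command, here is a rough sketch of a self-timing variant of the tester. It is not part of the original testcase. The names time_compare, NITER, t0, t1, rate and n are made up for illustration, and A and B are set to 1 instead of 0 only so that the printed C(1,1) can be checked against 12*NITER. The subroutine is repeated in the same file to keep the sketch self-contained; to reproduce the numbers above it is probably safer to keep it in compare.f90 as a separate compilation unit, since otherwise the compiler may inline the call and measure something different.

```fortran
! Sketch only: a self-timing variant of test_compare.f90.
! Assumption: kernel and driver share one file for convenience;
! the original report compiles the kernel from compare.f90.
SUBROUTINE HARD_NN_12_12_12(C,A,B)
  REAL(KIND=8), INTENT(INOUT) :: C(12,*)
  REAL(KIND=8), INTENT(IN) :: B(12,*), A(12,*)
  INTEGER :: i,j,l
  DO j=1,12 ; DO i=1,12 ; DO l=1,12
    C(i,j)=C(i,j)+A(i,l)*B(l,j)
  ENDDO ; ENDDO ; ENDDO
END SUBROUTINE HARD_NN_12_12_12

PROGRAM time_compare
  IMPLICIT NONE
  ! Hypothetical name; same iteration count as the original tester.
  INTEGER, PARAMETER :: NITER = 10000000
  REAL(KIND=8), DIMENSION(12,12) :: A, B, C
  INTEGER(KIND=8) :: t0, t1, rate
  INTEGER :: n

  ! Assumption: ones instead of zeros, so each call adds 12 to every C(i,j).
  A = 1 ; B = 1 ; C = 0

  CALL SYSTEM_CLOCK(t0, rate)        ! start tick and ticks per second
  DO n = 1, NITER
    CALL HARD_NN_12_12_12(C, A, B)
  ENDDO
  CALL SYSTEM_CLOCK(t1)              ! end tick

  WRITE(*,*) 'seconds: ', REAL(t1 - t0, KIND=8) / REAL(rate, KIND=8)
  ! Printing a result keeps the loop from being optimized away.
  WRITE(*,*) 'C(1,1) = ', C(1,1), ' expected ', 12.0D0 * NITER
END PROGRAM time_compare
```

Building this file once with the -O2 command line and once with the -O3 command line given above, then running both binaries, should show whether the gap is still there.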