From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 7217 invoked by alias); 13 Jan 2012 13:40:25 -0000
Received: (qmail 7201 invoked by uid 22791); 13 Jan 2012 13:40:23 -0000
X-SWARE-Spam-Status: No, hits=-2.8 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00,TW_AV,TW_DD,TW_SV
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 13 Jan 2012 13:40:10 +0000
From: "venkataramanan.kumar at amd dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/51848] New: GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
Date: Fri, 13 Jan 2012 14:08:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: venkataramanan.kumar at amd dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2012-01/txt/msg01516.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848

Bug #: 51848
Summary: GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: venkataramanan.kumar@amd.com

The test case below is modeled on the "air.f90" benchmark of Polyhedron.
What I see is that vectorization makes "air" run about 16% faster with ICC than with GCC, but I am not sure whether all of that comes from vectorization alone. While analysing the assembly differences, I found that GCC does not vectorize the case below, whereas ICC does.

(Snip)
      DIMENSION NPX(30) , NPY(30)
      COMMON /XD1 / MXPy, NDX
      COMMON /XD2 / MXPx
      MXPx = 0
      DO i = 1 , NDX
         MXPx = MXPx + NPX(i)+1
      ENDDO
!
      END
(Snip)

Machine: x86_64-unknown-linux-gnu
GCC revision: 183151
ICC revision: 12.1.0.233 Build 2

gcc -Ofast -march=corei7-avx -limf -lsvml -L /tool/intel/lib/intel64/ -mveclibabi=svml pattern1.f90 -ftree-vectorizer-verbose=2 -S

Analyzing loop at pattern1.f90:5
5: not vectorized: unsupported use in stmt.
5: not vectorized: unsupported use in stmt.
pattern1.f90:9: note: vectorized 0 loops in function.

ifort -march=corei7-avx -O3 -limf -lsvml -L /tool/intel/lib/intel64/ pattern1.f90 -vec-report -S -fsource-asm

pattern1.f90(5): (col. 7) remark: LOOP WAS VECTORIZED.

For the expression

      MXPx = MXPx + NPX(i)+1

the constant "1" is converted to a vector packet, as shown below:

.L_2il0floatpacket.0:
        .long   0x00000001,0x00000001,0x00000001,0x00000001

The whole expression then becomes vectorizable. The vectorized portion of the ICC assembly looks like this:

        vmovdqu   .L_2il0floatpacket.0(%rip), %xmm0
..B1.5:                         # Preds ..B1.5 ..B1.4
        vpaddd    _unnamed_main$_$NPX.0.1(,%rax,4), %xmm0, %xmm2  #6.10
        addq      $4, %rax                                        #5.7
        vpaddd    %xmm2, %xmm1, %xmm1                             #6.22
        cmpq      %rdx, %rax                                      #5.7
        jb        ..B1.5        # Prob 96%                        #5.7

Please share your thoughts on this and on a possible vectorization improvement in GCC for this pattern.
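
To make the ICC loop body above easier to follow, here is a rough scalar model of it in C. This is only a sketch under assumptions, not ICC's or GCC's actual code generation: the function names are hypothetical, the array is taken to hold 32-bit integers (Fortran default INTEGER), and the final lane reduction and the remainder loop are guesses, since the ICC listing above shows only the main vector loop.

```c
#include <stddef.h>

/* Scalar reference: C rendering of  MXPx = MXPx + NPX(i) + 1. */
int sum_plus_one_scalar(const int *npx, size_t n)
{
    int acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += npx[i] + 1;
    return acc;
}

/* Four-lane model of the ICC loop (hypothetical helper). */
int sum_plus_one_lanes(const int *npx, size_t n)
{
    const int ones[4] = {1, 1, 1, 1}; /* the constant 1 splatted, like %xmm0 */
    int acc[4] = {0, 0, 0, 0};        /* lane accumulators, like %xmm1 */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        for (int lane = 0; lane < 4; lane++) {
            int tmp = npx[i + lane] + ones[lane]; /* first vpaddd  (#6.10) */
            acc[lane] += tmp;                     /* second vpaddd (#6.22) */
        }
    }

    /* Assumed epilogue: horizontal add of the four lanes. */
    int total = acc[0] + acc[1] + acc[2] + acc[3];

    /* Assumed remainder loop for trip counts not divisible by 4. */
    for (; i < n; i++)
        total += npx[i] + 1;

    return total;
}
```

Both functions return the same value for any n as long as the sum does not overflow, which is the property a vectorizer relies on when it reassociates this integer reduction.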