From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-379841-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 7217 invoked by alias); 13 Jan 2012 13:40:25 -0000
Received: (qmail 7201 invoked by uid 22791); 13 Jan 2012 13:40:23 -0000
X-SWARE-Spam-Status: No, hits=-2.8 required=5.0	tests=ALL_TRUSTED,AWL,BAYES_00,TW_AV,TW_DD,TW_SV
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 13 Jan 2012 13:40:10 +0000
From: "venkataramanan.kumar at amd dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/51848] New: GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
Date: Fri, 13 Jan 2012 14:08:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: venkataramanan.kumar at amd dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID: <bug-51848-4@http.gcc.gnu.org/bugzilla/>
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2012-01/txt/msg01516.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848

             Bug #: 51848
           Summary: GCC is not able to vectorize when a constant value is
                    also added to the sum of array expression inside a
                    loop.
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: venkataramanan.kumar@amd.com


The test case below is modeled on the "air.f90" benchmark of the Polyhedron suite.

"air" runs about 16% faster when built with ICC than with GCC, but I am not
sure whether all of that difference comes from vectorization.

While analysing the assembly differences, I found that GCC does not vectorize
the case below, whereas ICC does.

(Snip)
      DIMENSION NPX(30) , NPY(30)
      COMMON /XD1   / MXPy, NDX
      COMMON /XD2  / MXPx
      MXPx = 0
      DO i = 1 , NDX
         MXPx = MXPx + NPX(i)+1
      ENDDO
!
      END
(Snip)


Machine: x86_64-unknown-linux-gnu
GCC revision: 183151
ICC revision: 12.1.0.233 Build 2

gcc -Ofast -march=corei7-avx  -limf -lsvml -L /tool/intel/lib/intel64/
-mveclibabi=svml   pattern1.f90 -ftree-vectorizer-verbose=2 -S

Analyzing loop at pattern1.f90:5

5: not vectorized: unsupported use in stmt.
5: not vectorized: unsupported use in stmt.
pattern1.f90:9: note: vectorized 0 loops in function.


ifort -march=corei7-avx  -O3  -limf -lsvml -L /tool/intel/lib/intel64/ 
pattern1.f90  -vec-report -S -fsource-asm

pattern1.f90(5): (col. 7) remark: LOOP WAS VECTORIZED.


For the expression: 

MXPx = MXPx + NPX(i)+1


ICC converts the constant "1" into a vector packet, as shown below:

 .L_2il0floatpacket.0:
        .long   0x00000001,0x00000001,0x00000001,0x00000001

The whole expression then becomes vectorizable. The vectorized portion of the
ICC assembly is shown below:

vmovdqu   .L_2il0floatpacket.0(%rip), %xmm0

..B1.5:                         # Preds ..B1.5 ..B1.4
        vpaddd    _unnamed_main$_$NPX.0.1(,%rax,4), %xmm0, %xmm2 #6.10
        addq      $4, %rax                                      #5.7
        vpaddd    %xmm2, %xmm1, %xmm1                           #6.22
        cmpq      %rdx, %rax                                    #5.7
        jb        ..B1.5        # Prob 96%                      #5.7
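
To make the transformation easier to discuss, here is a rough sketch in C
(not Fortran, and not taken from either compiler) of what the ICC loop above
appears to compute. The function names sum_scalar and sum_lanes, the 4-lane
width, and the remainder loop are my own assumptions for illustration; the
sketch also assumes the sum does not overflow. The first function is the
scalar reduction from the test case; the second keeps four partial sums, adds
a splatted vector of ones to each group of four elements, and reduces the
partial sums at the end.

```c
#include <stddef.h>

/* Scalar form of the reported loop: MXPx = MXPx + NPX(i) + 1. */
int sum_scalar(const int *npx, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = sum + npx[i] + 1;
    return sum;
}

/* Hypothetical hand-written 4-lane equivalent of the vectorized loop.
 * ones[] plays the role of the constant packet {1,1,1,1}; acc[] plays the
 * role of the vector accumulator (%xmm1 in the ICC listing). */
int sum_lanes(const int *npx, size_t n)
{
    const int ones[4] = {1, 1, 1, 1};
    int acc[4] = {0, 0, 0, 0};
    size_t i = 0;

    /* Main loop: four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        for (int lane = 0; lane < 4; lane++) {
            int tmp = npx[i + lane] + ones[lane]; /* like the first vpaddd */
            acc[lane] += tmp;                     /* like the second vpaddd */
        }
    }

    /* Horizontal reduction of the four partial sums. */
    int sum = acc[0] + acc[1] + acc[2] + acc[3];

    /* Scalar remainder for trip counts that are not a multiple of 4. */
    for (; i < n; i++)
        sum += npx[i] + 1;

    return sum;
}
```

Both functions return the same value for any input that does not overflow,
which is why adding the constant inside the loop should not by itself prevent
the reduction from being vectorized.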

Please share your thoughts on this and on a possible vectorization improvement
in GCC for this pattern.