* [Bug middle-end/51848] New: GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
@ 2012-01-13 14:08 venkataramanan.kumar at amd dot com
  2012-01-13 16:02 ` [Bug middle-end/51848] " dominiq at lps dot ens.fr
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: venkataramanan.kumar at amd dot com @ 2012-01-13 14:08 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848

             Bug #: 51848
           Summary: GCC is not able to vectorize when a constant value is
                    also added to the sum of array expression inside a
                    loop.
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: venkataramanan.kumar@amd.com


The test case below is derived from the "air.f90" benchmark of the Polyhedron
suite.

With vectorization enabled, "air" runs about 16% faster when built with ICC
than with GCC, but I am not sure whether all of that difference comes from
vectorization alone.

While analysing the assembly differences, I found that GCC does not vectorize
the case below, whereas ICC does.

(Snip)
      DIMENSION NPX(30) , NPY(30)
      COMMON /XD1   / MXPy, NDX
      COMMON /XD2  / MXPx
      MXPx = 0
      DO i = 1 , NDX
         MXPx = MXPx + NPX(i)+1
      ENDDO
!
      END
(Snip)


Machine: x86_64-unknown-linux-gnu
GCC revision: 183151
ICC revision: 12.1.0.233 Build 2

gcc -Ofast -march=corei7-avx  -limf -lsvml -L /tool/intel/lib/intel64/
-mveclibabi=svml   pattern1.f90 -ftree-vectorizer-verbose=2 -S

Analyzing loop at pattern1.f90:5

5: not vectorized: unsupported use in stmt.
5: not vectorized: unsupported use in stmt.
pattern1.f90:9: note: vectorized 0 loops in function.


ifort -march=corei7-avx  -O3  -limf -lsvml -L /tool/intel/lib/intel64/ 
pattern1.f90  -vec-report -S -fsource-asm

pattern1.f90(5): (col. 7) remark: LOOP WAS VECTORIZED.


For the expression: 

MXPx = MXPx + NPX(i)+1


ICC converts the constant "1" to a vector packet, as shown below:

 .L_2il0floatpacket.0:
        .long   0x00000001,0x00000001,0x00000001,0x00000001

With the constant held in a vector, the whole expression becomes vectorizable.
The vectorized portion of the ICC assembly looks like this:

vmovdqu   .L_2il0floatpacket.0(%rip), %xmm0

..B1.5:                         # Preds ..B1.5 ..B1.4
        vpaddd    _unnamed_main$_$NPX.0.1(,%rax,4), %xmm0, %xmm2 #6.10
        addq      $4, %rax                                      #5.7
        vpaddd    %xmm2, %xmm1, %xmm1                           #6.22
        cmpq      %rdx, %rax                                    #5.7
        jb        ..B1.5        # Prob 96%                      #5.7
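
For readers who do not read AVX assembly, the loop above can be approximated by
the following C sketch. It is an illustration only, not ICC's or GCC's actual
transformation: the function names sum_scalar and sum_lanes are hypothetical,
the lane arrays stand in for %xmm0 and %xmm1, and the final lane sum and the
scalar tail are assumed, since the quoted assembly shows only the loop body.

```c
#include <stddef.h>

/* Scalar form of the reported loop: MXPx = MXPx + NPX(i) + 1. */
int sum_scalar(const int *a, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = sum + a[i] + 1;
    return sum;
}

/* Hand-written emulation of a 4-lane integer reduction (assumed shape). */
int sum_lanes(const int *a, size_t n)
{
    const int ones[4] = {1, 1, 1, 1};  /* plays the role of the constant packet */
    int acc[4] = {0, 0, 0, 0};         /* plays the role of the accumulator register */
    size_t i = 0;

    /* Main loop: four elements per iteration, like "addq $4, %rax". */
    for (; i + 4 <= n; i += 4) {
        for (int lane = 0; lane < 4; lane++) {
            int t = a[i + lane] + ones[lane];  /* first vpaddd: NPX + 1 */
            acc[lane] += t;                    /* second vpaddd: accumulate */
        }
    }

    /* Horizontal sum of the lanes (not shown in the quoted assembly). */
    int sum = acc[0] + acc[1] + acc[2] + acc[3];

    /* Scalar tail for trip counts that are not a multiple of four. */
    for (; i < n; i++)
        sum += a[i] + 1;
    return sum;
}
```

Both functions return the same value whenever the sum does not overflow,
because integer addition can be freely reordered in that case.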

Please share your thoughts on this pattern and on whether GCC's vectorizer
could be improved to handle it.



end of thread, other threads:[~2024-02-16  8:22 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-13 14:08 [Bug middle-end/51848] New: GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop venkataramanan.kumar at amd dot com
2012-01-13 16:02 ` [Bug middle-end/51848] " dominiq at lps dot ens.fr
2012-01-15  8:49 ` irar at il dot ibm.com
2012-01-16 10:16 ` [Bug tree-optimization/51848] " rguenth at gcc dot gnu.org
2012-07-13  9:00 ` rguenth at gcc dot gnu.org
2015-06-16 16:54 ` alalaw01 at gcc dot gnu.org
2024-02-16  8:22 ` rguenth at gcc dot gnu.org
