public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug middle-end/51848] New: GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
@ 2012-01-13 14:08 venkataramanan.kumar at amd dot com
  2012-01-13 16:02 ` [Bug middle-end/51848] " dominiq at lps dot ens.fr
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: venkataramanan.kumar at amd dot com @ 2012-01-13 14:08 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848

             Bug #: 51848
           Summary: GCC is not able to vectorize when a constant value is
                    also added to the sum of array expression inside a
                    loop.
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: venkataramanan.kumar@amd.com


This below test case is simulated from "air.f90" benchmark of polyhedren. 

What I see is vectorization makes "air" run faster with ICC than GCC by about
16%,
but I am not sure if all that comes from vectorization alone.

While analysing the assembly differences, found that GCC is not vectorizing the
below case wheres ICC does vectorize.

(Snip)
      DIMENSION NPX(30) , NPY(30)
      COMMON /XD1   / MXPy, NDX
      COMMON /XD2  / MXPx
      MXPx = 0
      DO i = 1 , NDX
         MXPx = MXPx + NPX(i)+1
      ENDDO
!
      END
(Snip)


Machine: x86_64-unknown-linux-gnu
GCC revison: 183151 
ICC revision: 12.1.0.233 Build 2

gcc -Ofast -march=corei7-avx  -limf -lsvml -L /tool/intel/lib/intel64/
-mveclibabi=svml   pattern1.f90 -ftree-vectorizer-verbose=2 -S

Analyzing loop at pattern1.f90:5

5: not vectorized: unsupported use in stmt.
5: not vectorized: unsupported use in stmt.
pattern1.f90:9: note: vectorized 0 loops in function.


ifort -march=corei7-avx  -O3  -limf -lsvml -L /tool/intel/lib/intel64/ 
pattern1.f90  -vec-report -S -fsource-asm

pattern1.f90(5): (col. 7) remark: LOOP WAS VECTORIZED.


For the expression: 

MXPx = MXPx + NPX(i)+1


The constant "1" is converted to a vector packet as shown below

 .L_2il0floatpacket.0:
        .long   0x00000001,0x00000001,0x00000001,0x00000001

The assembly pattern for the vectorization portion in ICC looks like as shown
below:


The total expression now becomes vectorizable. 

vmovdqu   .L_2il0floatpacket.0(%rip), %xmm0

..B1.5:                         # Preds ..B1.5 ..B1.4
        vpaddd    _unnamed_main$_$NPX.0.1(,%rax,4), %xmm0, %xmm2 #6.10
        addq      $4, %rax                                      #5.7
        vpaddd    %xmm2, %xmm1, %xmm1                           #6.22
        cmpq      %rdx, %rax                                    #5.7
        jb        ..B1.5        # Prob 96%                      #5.7

Please provide your thoughts on this and possible vectorization improvement in
GCC for this pattern.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug middle-end/51848] GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
  2012-01-13 14:08 [Bug middle-end/51848] New: GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop venkataramanan.kumar at amd dot com
@ 2012-01-13 16:02 ` dominiq at lps dot ens.fr
  2012-01-15  8:49 ` irar at il dot ibm.com
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: dominiq at lps dot ens.fr @ 2012-01-13 16:02 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848

Dominique d'Humieres <dominiq at lps dot ens.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-01-13
                 CC|                            |grosser at gcc dot gnu.org,
                   |                            |irar at gcc dot gnu.org
     Ever Confirmed|0                           |1

--- Comment #1 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-01-13 16:00:20 UTC ---
I confirm that the loop in the codelet is not vectorized (it is if the "+1" is
removed and taken into account as "MXPx=NDX").

> but I am not sure if all that comes from vectorization alone.

Doing the above change improves the vectorization, but not the timing. One
"property" of the test air.f90 is to have several nested loops with "bad"
nesting (slow index first). I don't know if this was done to test compilers,
but if I reverse the loops manually as in

--- air.f90    2009-08-28 14:22:26.000000000 +0200
+++ air_v1.f90    2005-11-09 17:33:12.000000000 +0100
@@ -400,8 +400,8 @@
 !
 ! COMPUTE THE FLUX TERMS
 !
-      DO i = 1 , MXPx
-         DO j = 1 , MXPy
+      DO j = 1 , MXPy
+         DO i = 1 , MXPx
 !
 ! compute vanleer fluxes
 !
@@ -657,8 +657,8 @@
       ENDDO
 !
 ! COMPUTE THE FLUX TERMS
-      DO i = 1 , MXPx
-         DO j = 1 , MXPy
+      DO j = 1 , MXPy
+         DO i = 1 , MXPx
 !
 ! compute vanleer fluxes
 !
@@ -838,8 +838,8 @@
 ! FIND THE LOCAL TIME STEPS
 !
       dt = 100
-      DO i = 1 , MXPx
-         DO j = 1 , MXPy
+      DO j = 1 , MXPy
+         DO i = 1 , MXPx
             as = DSQRT(P(i,j)/RHO(i,j)*GMA)
             rdltx = RHO(i,j)*DABS(U(i,j))*ddx(i,j)/xmu(i,j)
             rdlty = RHO(i,j)*DABS(V(i,j))*ddy(i,j)/xmu(i,j)
@@ -880,13 +880,13 @@
          DO iy = 1 , NDY
             maxy = maxy + NPY(iy) + 1
             dtd = 100.0
-            DO i = minx , maxx
-               DO j = miny , maxy
+            DO j = miny , maxy
+               DO i = minx , maxx
                   IF ( dtt(i,j).LE.dtd ) dtd = dtt(i,j)
                ENDDO
             ENDDO
-            DO i = minx , maxx
-               DO j = miny , maxy
+            DO j = miny , maxy
+               DO i = minx , maxx
                   dtt(i,j) = dtd
                ENDDO
             ENDDO
@@ -958,8 +958,8 @@
       con2 = 0.0
       con3 = 0.0
       con4 = 0.0
-      DO i = 1 , MXPx
-         DO j = 1 , MXPy
+      DO j = 1 , MXPy
+         DO i = 1 , MXPx
             con1 = con1 + DABS(u1(i,j)-u1o(i,j))/dtt(i,j)
             con2 = con2 + DABS(u2(i,j)-u2o(i,j))/dtt(i,j)
             con3 = con3 + DABS(u3(i,j)-u3o(i,j))/dtt(i,j)

the timing goes from

[macbook] lin/test% time a.out > /dev/null
7.233u 0.023s 0:07.25 100.0%    0+0k 0+8io 0pf+0w

to

[macbook] lin/test% time a.out > /dev/null
6.353u 0.021s 0:06.37 100.0%    0+0k 0+8io 0pf+0w

I have made a few attempt to obtain gfortran to do these loops interchange
using graphite without success so far.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug middle-end/51848] GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
  2012-01-13 14:08 [Bug middle-end/51848] New: GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop venkataramanan.kumar at amd dot com
  2012-01-13 16:02 ` [Bug middle-end/51848] " dominiq at lps dot ens.fr
@ 2012-01-15  8:49 ` irar at il dot ibm.com
  2012-01-16 10:16 ` [Bug tree-optimization/51848] " rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: irar at il dot ibm.com @ 2012-01-15  8:49 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848

Ira Rosen <irar at il dot ibm.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |irar at il dot ibm.com

--- Comment #2 from Ira Rosen <irar at il dot ibm.com> 2012-01-15 08:06:09 UTC ---
We don't recognize this as reduction because we look for:

     a1 = phi < a0, a2 >
     a3 = ...
     a2 = operation (a3, a1)

and here we have

     a1 = phi < a0, a2 >
     a3 = ...
     a4 = operation (a3, a1)
     a2 = a4 + 1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/51848] GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
  2012-01-13 14:08 [Bug middle-end/51848] New: GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop venkataramanan.kumar at amd dot com
  2012-01-13 16:02 ` [Bug middle-end/51848] " dominiq at lps dot ens.fr
  2012-01-15  8:49 ` irar at il dot ibm.com
@ 2012-01-16 10:16 ` rguenth at gcc dot gnu.org
  2012-07-13  9:00 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-01-16 10:16 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
                 CC|                            |rguenth at gcc dot gnu.org
          Component|middle-end                  |tree-optimization

--- Comment #3 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-01-16 10:14:08 UTC ---
(In reply to comment #2)
> We don't recognize this as reduction because we look for:
> 
>      a1 = phi < a0, a2 >
>      a3 = ...
>      a2 = operation (a3, a1)
> 
> and here we have
> 
>      a1 = phi < a0, a2 >
>      a3 = ...
>      a4 = operation (a3, a1)
>      a2 = a4 + 1

It seems reassociation should "fix" this.  IIRC we had some special code
in there that was supposed to handle this.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/51848] GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
  2012-01-13 14:08 [Bug middle-end/51848] New: GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop venkataramanan.kumar at amd dot com
                   ` (2 preceding siblings ...)
  2012-01-16 10:16 ` [Bug tree-optimization/51848] " rguenth at gcc dot gnu.org
@ 2012-07-13  9:00 ` rguenth at gcc dot gnu.org
  2015-06-16 16:54 ` alalaw01 at gcc dot gnu.org
  2024-02-16  8:22 ` rguenth at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-07-13  9:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947

--- Comment #4 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-07-13 08:59:26 UTC ---
Link to vectorizer missed-optimization meta-bug.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/51848] GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
  2012-01-13 14:08 [Bug middle-end/51848] New: GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop venkataramanan.kumar at amd dot com
                   ` (3 preceding siblings ...)
  2012-07-13  9:00 ` rguenth at gcc dot gnu.org
@ 2015-06-16 16:54 ` alalaw01 at gcc dot gnu.org
  2024-02-16  8:22 ` rguenth at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: alalaw01 at gcc dot gnu.org @ 2015-06-16 16:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848

alalaw01 at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alalaw01 at gcc dot gnu.org

--- Comment #5 from alalaw01 at gcc dot gnu.org ---
(In reply to Richard Biener from comment #3)
> (In reply to comment #2)
> > We don't recognize this as reduction because we look for:
> > 
> >      a1 = phi < a0, a2 >
> >      a3 = ...
> >      a2 = operation (a3, a1)
> > 
> > and here we have
> > 
> >      a1 = phi < a0, a2 >
> >      a3 = ...
> >      a4 = operation (a3, a1)
> >      a2 = a4 + 1
> 
> It seems reassociation should "fix" this.  IIRC we had some special code
> in there that was supposed to handle this.

We do reassociate equivalent functions in C, for example

float x[30];

float bar() {
  float s = 0;
  for (int i = 0; i < 30; i++)
    {
      s += x[i];
      s += 1;
    }
  return s;
}

which vectorizes just fine.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/51848] GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
  2012-01-13 14:08 [Bug middle-end/51848] New: GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop venkataramanan.kumar at amd dot com
                   ` (4 preceding siblings ...)
  2015-06-16 16:54 ` alalaw01 at gcc dot gnu.org
@ 2024-02-16  8:22 ` rguenth at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-16  8:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-bisection             |
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
      Known to work|                            |11.4.1

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
long fixed.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-02-16  8:22 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-13 14:08 [Bug middle-end/51848] New: GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop venkataramanan.kumar at amd dot com
2012-01-13 16:02 ` [Bug middle-end/51848] " dominiq at lps dot ens.fr
2012-01-15  8:49 ` irar at il dot ibm.com
2012-01-16 10:16 ` [Bug tree-optimization/51848] " rguenth at gcc dot gnu.org
2012-07-13  9:00 ` rguenth at gcc dot gnu.org
2015-06-16 16:54 ` alalaw01 at gcc dot gnu.org
2024-02-16  8:22 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).