public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/51848] New: GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
From: venkataramanan.kumar at amd dot com @ 2012-01-13 14:08 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848

             Bug #: 51848
           Summary: GCC is not able to vectorize when a constant value is
                    also added to the sum of array expression inside a loop.
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: venkataramanan.kumar@amd.com

The test case below is reduced from the "air.f90" benchmark of the Polyhedron suite.

With ICC, "air" runs about 16% faster than with GCC, which appears to be due to vectorization, but I am not sure if all that comes from vectorization alone.

While analysing the assembly differences, I found that GCC does not vectorize the case below, whereas ICC does.

(Snip)
      DIMENSION NPX(30) , NPY(30)
      COMMON /XD1 / MXPy, NDX
      COMMON /XD2 / MXPx
      MXPx = 0
      DO i = 1 , NDX
         MXPx = MXPx + NPX(i)+1
      ENDDO
!
      END
(Snip)

Machine: x86_64-unknown-linux-gnu
GCC revision: 183151
ICC revision: 12.1.0.233 Build 2

gcc -Ofast -march=corei7-avx -limf -lsvml -L /tool/intel/lib/intel64/ -mveclibabi=svml pattern1.f90 -ftree-vectorizer-verbose=2 -S

Analyzing loop at pattern1.f90:5

5: not vectorized: unsupported use in stmt.
5: not vectorized: unsupported use in stmt.
pattern1.f90:9: note: vectorized 0 loops in function.

ifort -march=corei7-avx -O3 -limf -lsvml -L /tool/intel/lib/intel64/ pattern1.f90 -vec-report -S -fsource-asm

pattern1.f90(5): (col. 7) remark: LOOP WAS VECTORIZED.

For the expression:

         MXPx = MXPx + NPX(i)+1

ICC converts the constant "1" to a vector packet, as shown below:

.L_2il0floatpacket.0:
        .long   0x00000001,0x00000001,0x00000001,0x00000001

The whole expression then becomes vectorizable. The vectorized portion of the ICC assembly looks like this:

        vmovdqu   .L_2il0floatpacket.0(%rip), %xmm0
..B1.5:                         # Preds ..B1.5 ..B1.4
        vpaddd    _unnamed_main$_$NPX.0.1(,%rax,4), %xmm0, %xmm2  #6.10
        addq      $4, %rax                                         #5.7
        vpaddd    %xmm2, %xmm1, %xmm1                              #6.22
        cmpq      %rdx, %rax                                       #5.7
        jb        ..B1.5        # Prob 96%                         #5.7

Please share your thoughts on this and on a possible vectorization improvement in GCC for this pattern.

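For readers who do not read AVX assembly, the strategy in the ICC listing can be emulated lane by lane in plain C. This is only an illustrative sketch, not ICC's or GCC's actual code generation: the function names, the `LANES` constant and the scalar tail loop are assumptions added here, and the report does not show how ICC handles leftover iterations or the final horizontal sum.

```c
#include <assert.h>
#include <stddef.h>

enum { LANES = 4 };  /* four 32-bit integers fit in one 128-bit XMM register */

/* Scalar reference: the loop from the report, MXPx = MXPx + NPX(i) + 1. */
int sum_plus_one_scalar(const int *npx, size_t n)
{
    assert(npx != NULL || n == 0);
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total = total + npx[i] + 1;
    return total;
}

/* Lane-wise emulation of the vectorized loop (hypothetical helper). */
int sum_plus_one_lanes(const int *npx, size_t n)
{
    assert(npx != NULL || n == 0);
    const int ones[LANES] = {1, 1, 1, 1};  /* the broadcast constant packet */
    int acc[LANES] = {0, 0, 0, 0};         /* plays the role of %xmm1 */
    size_t i = 0;

    for (; i + LANES <= n; i += LANES) {
        for (int l = 0; l < LANES; l++) {
            int tmp = npx[i + l] + ones[l];  /* first vpaddd: NPX + 1 */
            acc[l] += tmp;                   /* second vpaddd: accumulate */
        }
    }

    /* Horizontal sum of the lanes, then a scalar tail for leftovers. */
    int total = acc[0] + acc[1] + acc[2] + acc[3];
    for (; i < n; i++)
        total += npx[i] + 1;
    return total;
}
```

Both functions return the same value for any input that does not overflow `int`, which is why the transformation is legal for this integer reduction.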
* [Bug middle-end/51848] GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
From: dominiq at lps dot ens.fr @ 2012-01-13 16:02 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848

Dominique d'Humieres <dominiq at lps dot ens.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-01-13
                 CC|                            |grosser at gcc dot gnu.org,
                   |                            |irar at gcc dot gnu.org
     Ever Confirmed|0                           |1

--- Comment #1 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-01-13 16:00:20 UTC ---
I confirm that the loop in the codelet is not vectorized (it is if the "+1" is removed and taken into account as "MXPx=NDX").

> but I am not sure if all that comes from vectorization alone.

The above change improves the vectorization, but not the timing.

One "property" of the air.f90 test is that it has several nested loops with "bad" nesting (slow index first). I don't know if this was done to test compilers, but if I reverse the loops manually as in

--- air.f90     2009-08-28 14:22:26.000000000 +0200
+++ air_v1.f90  2005-11-09 17:33:12.000000000 +0100
@@ -400,8 +400,8 @@
 !
 ! COMPUTE THE FLUX TERMS
 !
-      DO i = 1 , MXPx
-         DO j = 1 , MXPy
+      DO j = 1 , MXPy
+         DO i = 1 , MXPx
 !
 ! compute vanleer fluxes
 !
@@ -657,8 +657,8 @@
       ENDDO
 !
 ! COMPUTE THE FLUX TERMS
-      DO i = 1 , MXPx
-         DO j = 1 , MXPy
+      DO j = 1 , MXPy
+         DO i = 1 , MXPx
 !
 ! compute vanleer fluxes
 !
@@ -838,8 +838,8 @@
 ! FIND THE LOCAL TIME STEPS
 !
       dt = 100
-      DO i = 1 , MXPx
-         DO j = 1 , MXPy
+      DO j = 1 , MXPy
+         DO i = 1 , MXPx
             as = DSQRT(P(i,j)/RHO(i,j)*GMA)
             rdltx = RHO(i,j)*DABS(U(i,j))*ddx(i,j)/xmu(i,j)
             rdlty = RHO(i,j)*DABS(V(i,j))*ddy(i,j)/xmu(i,j)
@@ -880,13 +880,13 @@
          DO iy = 1 , NDY
             maxy = maxy + NPY(iy) + 1
             dtd = 100.0
-            DO i = minx , maxx
-               DO j = miny , maxy
+            DO j = miny , maxy
+               DO i = minx , maxx
                   IF ( dtt(i,j).LE.dtd ) dtd = dtt(i,j)
                ENDDO
             ENDDO
-            DO i = minx , maxx
-               DO j = miny , maxy
+            DO j = miny , maxy
+               DO i = minx , maxx
                   dtt(i,j) = dtd
                ENDDO
             ENDDO
@@ -958,8 +958,8 @@
       con2 = 0.0
       con3 = 0.0
       con4 = 0.0
-      DO i = 1 , MXPx
-         DO j = 1 , MXPy
+      DO j = 1 , MXPy
+         DO i = 1 , MXPx
             con1 = con1 + DABS(u1(i,j)-u1o(i,j))/dtt(i,j)
             con2 = con2 + DABS(u2(i,j)-u2o(i,j))/dtt(i,j)
             con3 = con3 + DABS(u3(i,j)-u3o(i,j))/dtt(i,j)

the timing goes from

[macbook] lin/test% time a.out > /dev/null
7.233u 0.023s 0:07.25 100.0%    0+0k 0+8io 0pf+0w

to

[macbook] lin/test% time a.out > /dev/null
6.353u 0.021s 0:06.37 100.0%    0+0k 0+8io 0pf+0w

I have made a few attempts to get gfortran to perform these loop interchanges using graphite, without success so far.

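The "slow index first" nesting described in Comment #1 can be sketched in C, with one caveat: C arrays are row-major while Fortran arrays are column-major, so Fortran's `dtt(i,j)` corresponds to `dtt[j][i]` below and `i` is the contiguous index. The grid extents `NX` and `NY` and both function names are hypothetical, chosen only for illustration; the loop body mirrors the `dtd` minimum search from the diff.

```c
#include <assert.h>
#include <stddef.h>

enum { NX = 64, NY = 48 };  /* hypothetical grid extents */

/* Original nesting: i outermost, so the inner loop strides by NX doubles. */
double min_i_outer(double dtt[NY][NX])
{
    assert(dtt != NULL);
    double dtd = 100.0;
    for (size_t i = 0; i < NX; i++)
        for (size_t j = 0; j < NY; j++)
            if (dtt[j][i] <= dtd)
                dtd = dtt[j][i];
    return dtd;
}

/* Interchanged nesting: i innermost, so the inner loop walks contiguous memory. */
double min_j_outer(double dtt[NY][NX])
{
    assert(dtt != NULL);
    double dtd = 100.0;
    for (size_t j = 0; j < NY; j++)
        for (size_t i = 0; i < NX; i++)
            if (dtt[j][i] <= dtd)
                dtd = dtt[j][i];
    return dtd;
}
```

The two functions compute the same minimum; only the memory access order differs, which is the kind of change that produced the timing difference reported above. How much it helps on a given machine depends on array size and cache behaviour.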
* [Bug middle-end/51848] GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
From: irar at il dot ibm.com @ 2012-01-15 8:49 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848

Ira Rosen <irar at il dot ibm.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |irar at il dot ibm.com

--- Comment #2 from Ira Rosen <irar at il dot ibm.com> 2012-01-15 08:06:09 UTC ---
We don't recognize this as reduction because we look for:

a1 = phi < a0, a2 >
a3 = ...
a2 = operation (a3, a1)

and here we have

a1 = phi < a0, a2 >
a3 = ...
a4 = operation (a3, a1)
a2 = a4 + 1

* [Bug tree-optimization/51848] GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
From: rguenth at gcc dot gnu.org @ 2012-01-16 10:16 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
                 CC|                            |rguenth at gcc dot gnu.org
          Component|middle-end                  |tree-optimization

--- Comment #3 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-01-16 10:14:08 UTC ---
(In reply to comment #2)
> We don't recognize this as reduction because we look for:
>
> a1 = phi < a0, a2 >
> a3 = ...
> a2 = operation (a3, a1)
>
> and here we have
>
> a1 = phi < a0, a2 >
> a3 = ...
> a4 = operation (a3, a1)
> a2 = a4 + 1

It seems reassociation should "fix" this. IIRC we had some special code in there that was supposed to handle this.

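The two statement shapes from Comment #2, and what reassociation would need to produce, can be written out as source-level C. This is a hand-written sketch of the idea, not a dump of GCC's internal representation; the function names are hypothetical, and the third variant is the manual workaround from Comment #1 (start the accumulator at the trip count and drop the "+1").

```c
#include <assert.h>
#include <stddef.h>

/* Shape from the report: two chained statements sit between the phi
   and the value fed back to it (a4 = op(a3, a1); a2 = a4 + 1). */
int reduce_chained(const int *x, size_t n)
{
    assert(x != NULL || n == 0);
    int a = 0;
    for (size_t i = 0; i < n; i++) {
        int a4 = x[i] + a;
        a = a4 + 1;
    }
    return a;
}

/* Reassociated shape: the constant is folded into the non-reduction
   operand, leaving a single operation on the accumulator per iteration. */
int reduce_reassociated(const int *x, size_t n)
{
    assert(x != NULL || n == 0);
    int a = 0;
    for (size_t i = 0; i < n; i++) {
        int a3 = x[i] + 1;
        a = a3 + a;
    }
    return a;
}

/* Workaround from Comment #1: account for the n increments up front. */
int reduce_hoisted(const int *x, size_t n)
{
    assert(x != NULL || n == 0);
    int a = (int)n;
    for (size_t i = 0; i < n; i++)
        a = x[i] + a;
    return a;
}
```

For integers the three variants agree whenever no overflow occurs. For floating-point data the regrouping changes rounding, so a compiler may only apply it under relaxed math options such as the `-Ofast` used in the report.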
* [Bug tree-optimization/51848] GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
From: rguenth at gcc dot gnu.org @ 2012-07-13 9:00 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947

--- Comment #4 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-07-13 08:59:26 UTC ---
Link to vectorizer missed-optimization meta-bug.

* [Bug tree-optimization/51848] GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
From: alalaw01 at gcc dot gnu.org @ 2015-06-16 16:54 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848

alalaw01 at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alalaw01 at gcc dot gnu.org

--- Comment #5 from alalaw01 at gcc dot gnu.org ---
(In reply to Richard Biener from comment #3)
> (In reply to comment #2)
> > We don't recognize this as reduction because we look for:
> >
> > a1 = phi < a0, a2 >
> > a3 = ...
> > a2 = operation (a3, a1)
> >
> > and here we have
> >
> > a1 = phi < a0, a2 >
> > a3 = ...
> > a4 = operation (a3, a1)
> > a2 = a4 + 1
>
> It seems reassociation should "fix" this. IIRC we had some special code
> in there that was supposed to handle this.

We do reassociate equivalent functions in C, for example

float x[30];

float bar()
{
  float s = 0;
  for (int i = 0; i < 30; i++)
    {
      s += x[i];
      s += 1;
    }
  return s;
}

which vectorizes just fine.

* [Bug tree-optimization/51848] GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
From: rguenth at gcc dot gnu.org @ 2024-02-16 8:22 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-bisection             |
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
      Known to work|                            |11.4.1

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
long fixed.