* [Bug tree-optimization/31079] New: 300% difference between ifort/gfortran
From: jv244 at cam dot ac dot uk @ 2007-03-08 9:46 UTC
To: gcc-bugs
I'm still trying to find a reduced testcase (or better source) for PR 31021,
but I'm not sure the code below is really the same issue. However, it
illustrates a rather small program with a very significant slowdown in gfortran
relative to ifort.
vondele@pcihpc13:/data/vondele/extracted_collocate/test> ifort -O2 -xT test.f90
test.f90(17) : (col. 7) remark: LOOP WAS VECTORIZED.
test.f90(20) : (col. 7) remark: LOOP WAS VECTORIZED.
test.f90(24) : (col. 4) remark: BLOCK WAS VECTORIZED.
vondele@pcihpc13:/data/vondele/extracted_collocate/test> ./a.out
3.544221
vondele@pcihpc13:/data/vondele/extracted_collocate/test> gfortran -O3 -march=native -ftree-vectorize -ffast-math test.f90
vondele@pcihpc13:/data/vondele/extracted_collocate/test> ./a.out
11.84874
vondele@pcihpc13:/data/vondele/extracted_collocate/test> gfortran -O2 -march=native -ftree-vectorize -ffast-math test.f90
vondele@pcihpc13:/data/vondele/extracted_collocate/test> ./a.out
11.84474
vondele@pcihpc13:/data/vondele/extracted_collocate/test> cat test.f90
SUBROUTINE collocate_core_2_2_0_0(jg,cmax)
IMPLICIT NONE
integer, INTENT(IN) :: jg,cmax
INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND ( 14, 200 )
INTEGER, PARAMETER :: N=1000
TYPE vec
real(wp) :: a(2)
END TYPE vec
TYPE(vec) :: dpy(1000)
TYPE(vec) :: pxy(1000)
real(wp) s(04)
integer :: i
CALL USE(dpy,pxy,s)
DO i=1,N
pxy(i)%a=0.0_wp
ENDDO
DO i=1,N
dpy(i)%a=0.0_wp
ENDDO
s(01)=0.0_wp
s(02)=0.0_wp
s(03)=0.0_wp
s(04)=0.0_wp
DO i=1,N
s(01)=s(01)+pxy(i)%a(1)*dpy(i)%a(1)
s(02)=s(02)+pxy(i)%a(2)*dpy(i)%a(1)
s(03)=s(03)+pxy(i)%a(1)*dpy(i)%a(2)
s(04)=s(04)+pxy(i)%a(2)*dpy(i)%a(2)
ENDDO
CALL USE(dpy,pxy,s)
END SUBROUTINE
SUBROUTINE USE(a,b,c)
INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND ( 14, 200 )
REAL(kind=wp) :: a(*),b(*),c(*)
END SUBROUTINE USE
PROGRAM TEST
integer, parameter :: cmax=5
integer*8 :: t1,t2,tbest
real :: time1,time2
jg=0
CALL cpu_time(time1)
tbest=huge(tbest)
DO i=1,1000000
! t1=nanotime_ia32()
CALL collocate_core_2_2_0_0(0,cmax)
! t2=nanotime_ia32()
! if(t2-t1>0 .AND. t2-t1<tbest) tbest=t2-t1
ENDDO
CALL cpu_time(time2)
! write(6,*) tbest,time2-time1
write(6,*) time2-time1
END PROGRAM TEST
--
Summary: 300% difference between ifort/gfortran
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jv244 at cam dot ac dot uk
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079
* [Bug tree-optimization/31079] 300% difference between ifort/gfortran
From: jv244 at cam dot ac dot uk @ 2007-03-08 11:11 UTC
To: gcc-bugs
------- Comment #1 from jv244 at cam dot ac dot uk 2007-03-08 11:11 -------
The following is (for me) an even more interesting example, as it times only
the loop that does the actual multiply / add, and it also tricks my version of
ifort into generating the expected asm. Ifort is about twice as fast as
gfortran on it.
SUBROUTINE collocate_core_2_2_0_0(jg,cmax)
IMPLICIT NONE
integer, INTENT(IN) :: jg,cmax
INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND ( 14, 200 )
INTEGER, PARAMETER :: N=10,Nit=100000000
TYPE vec
real(wp) :: a(2)
END TYPE vec
TYPE(vec) :: dpy(1000)
TYPE(vec) :: pxy(1000)
TYPE(vec) :: s(02)
integer :: i,j
DO i=1,N
pxy(i)%a=0.0_wp
ENDDO
DO i=1,N
dpy(i)%a=0.0_wp
ENDDO
s(01)%a(1)=0.0_wp
s(01)%a(2)=0.0_wp
s(02)%a(1)=0.0_wp
s(02)%a(2)=0.0_wp
CALL USE(dpy,pxy,s)
DO j=1,Nit
DO i=1,N
s(01)%a(:)=s(01)%a(:)+pxy(i)%a(:)*dpy(i)%a(1)
s(02)%a(:)=s(02)%a(:)+pxy(i)%a(:)*dpy(i)%a(2)
ENDDO
ENDDO
CALL USE(dpy,pxy,s)
END SUBROUTINE
vondele@pcihpc13:/data/vondele/extracted_collocate/test> gfortran -O2 -march=native -ftree-vectorize -ffast-math test.f90
vondele@pcihpc13:/data/vondele/extracted_collocate/test> ./a.out
4.288268
vondele@pcihpc13:/data/vondele/extracted_collocate/test> ifort -O2 -xT test.f90
test.f90(16) : (col. 8) remark: LOOP WAS VECTORIZED.
test.f90(19) : (col. 8) remark: LOOP WAS VECTORIZED.
test.f90(31) : (col. 6) remark: LOOP WAS VECTORIZED.
test.f90(31) : (col. 6) remark: LOOP WAS VECTORIZED.
test.f90(32) : (col. 6) remark: LOOP WAS VECTORIZED.
test.f90(32) : (col. 6) remark: LOOP WAS VECTORIZED.
vondele@pcihpc13:/data/vondele/extracted_collocate/test> ./a.out
1.944121
With ifort, the inner loop asm also looks the way I was hoping it would:
.B2.7: # Preds ..B2.7 ..B2.6
movddup -16+collocate_core_2_2_0_0_$DPY.0.0(%rcx), %xmm2 #31.41
movddup -8+collocate_core_2_2_0_0_$DPY.0.0(%rcx), %xmm3 #32.41
addq $16, %rdx #33.4
movapd collocate_core_2_2_0_0_$PXY.0.0(%rdx), %xmm4 #31.6
mulpd %xmm4, %xmm2 #31.39
mulpd %xmm3, %xmm4 #32.39
addpd %xmm2, %xmm1 #31.7
addpd %xmm4, %xmm0 #32.7
addq $16, %rcx #33.5
cmpq $160, %rcx #33.4
jle ..B2.7 # Prob 90% #33.4
# LOE rdx rcx rbx rbp r12 r13 r14 r15 eax xmm0 xmm1
* [Bug tree-optimization/31079] 300% difference between ifort/gfortran
From: fxcoudert at gcc dot gnu dot org @ 2007-06-20 21:00 UTC
To: gcc-bugs
------- Comment #2 from fxcoudert at gcc dot gnu dot org 2007-06-20 20:59 -------
I see a smaller difference, but a difference nonetheless.
--
fxcoudert at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |fxcoudert at gcc dot gnu dot
| |org
Status|UNCONFIRMED |NEW
Ever Confirmed|0 |1
GCC build triplet| |x86_64-unknown-linux-gnu
GCC host triplet| |x86_64-unknown-linux-gnu
GCC target triplet| |x86_64-unknown-linux-gnu
Last reconfirmed|0000-00-00 00:00:00 |2007-06-20 20:59:50
date| |
* [Bug tree-optimization/31079] 300% difference between ifort/gfortran
From: jv244 at cam dot ac dot uk @ 2007-06-21 4:16 UTC
To: gcc-bugs
------- Comment #3 from jv244 at cam dot ac dot uk 2007-06-21 04:16 -------
(In reply to comment #2)
> I see a smaller difference, but a difference nonetheless.
yes, looks like better code is now generated, current timings are down to a
200% difference
ifort: 1.988124
gfortran: 3.900243
* [Bug tree-optimization/31079] 300% difference between ifort/gfortran
From: jv244 at cam dot ac dot uk @ 2008-01-07 22:58 UTC
To: gcc-bugs
------- Comment #4 from jv244 at cam dot ac dot uk 2008-01-07 22:00 -------
Timings have improved a lot with a recent gfortran, at least on an Opteron: I
now get 3.7s for ifort and 4.5s for gfortran (only about 20% slower) for the
following code:
SUBROUTINE collocate_core_2_2_0_0(jg,cmax)
IMPLICIT NONE
integer, INTENT(IN) :: jg,cmax
INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND ( 14, 200 )
INTEGER, PARAMETER :: N=10,Nit=100000000
TYPE vec
real(wp) :: a(2)
END TYPE vec
TYPE(vec) :: dpy(1000)
TYPE(vec) :: pxy(1000)
TYPE(vec) :: s(02)
integer :: i,j
DO i=1,N
pxy(i)%a=0.0_wp
ENDDO
DO i=1,N
dpy(i)%a=0.0_wp
ENDDO
s(01)%a(1)=0.0_wp
s(01)%a(2)=0.0_wp
s(02)%a(1)=0.0_wp
s(02)%a(2)=0.0_wp
CALL USE(dpy,pxy,s)
! this is the hot loop
DO j=1,Nit
DO i=1,N
s(01)%a(:)=s(01)%a(:)+pxy(i)%a(:)*dpy(i)%a(1)
s(02)%a(:)=s(02)%a(:)+pxy(i)%a(:)*dpy(i)%a(2)
ENDDO
ENDDO
CALL USE(dpy,pxy,s)
END SUBROUTINE
SUBROUTINE USE(a,b,c)
INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND ( 14, 200 )
REAL(kind=wp) :: a(*),b(*),c(*)
END SUBROUTINE USE
PROGRAM TEST
integer, parameter :: cmax=5
integer*8 :: t1,t2,tbest
real :: time1,time2
jg=0
CALL cpu_time(time1)
tbest=huge(tbest)
DO i=1,1
! t1=nanotime_ia32()
CALL collocate_core_2_2_0_0(0,cmax)
! t2=nanotime_ia32()
! if(t2-t1>0 .AND. t2-t1<tbest) tbest=t2-t1
ENDDO
CALL cpu_time(time2)
! write(6,*) tbest,time2-time1
write(6,*) time2-time1
END PROGRAM TEST
using
ifort -xW -O3 test.f90
gfortran -march=native -O3 -ffast-math test.f90
gfortran's inner loop asm looks like:
.L8:
movlpd (%rbp,%rax), %xmm0
movsd %xmm0, %xmm1
mulsd (%rbx,%rax), %xmm1
addsd %xmm1, %xmm2
movsd %xmm2, 32000(%rsp)
mulsd 8(%rbx,%rax), %xmm0
addsd %xmm0, %xmm5
movsd %xmm5, 32008(%rsp)
movlpd 8(%rbp,%rax), %xmm0
movsd %xmm0, %xmm1
mulsd (%rbx,%rax), %xmm1
addsd %xmm1, %xmm4
movsd %xmm4, 32016(%rsp)
mulsd 8(%rbx,%rax), %xmm0
addq $16, %rax
cmpq $160, %rax
addsd %xmm0, %xmm3
movsd %xmm3, 32024(%rsp)
jne .L8
while ifort's loop looks like:
..B3.7: # Preds ..B3.7 ..B3.6
movsd collocate_core_2_2_0_0_$DPY.0.0(%rdx), %xmm2 #31.41
movsd 8+collocate_core_2_2_0_0_$DPY.0.0(%rdx), %xmm3 #32.41
movaps collocate_core_2_2_0_0_$PXY.0.0(%rdx), %xmm4 #31.7
unpcklpd %xmm2, %xmm2 #31.41
mulpd %xmm4, %xmm2 #31.40
addpd %xmm2, %xmm1 #31.7
unpcklpd %xmm3, %xmm3 #32.41
mulpd %xmm3, %xmm4 #32.40
addpd %xmm4, %xmm0 #32.7
addq $16, %rdx #30.5
cmpq $160, %rdx #30.5
jl ..B3.7 # Prob 90% #30.5
so I guess ifort vectorizes where gfortran does not.
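To make the comparison concrete, here is a minimal sketch (hypothetical module
and procedure names, not part of the testcase) of the two shapes of the same
inner loop. The scalar form performs four separate multiply/adds per iteration,
like the mulsd/addsd sequence in the gfortran loop; the paired form groups them
into two 2-element updates, like the mulpd/addpd pairs in the ifort loop. The
accumulator s is flattened to a four-element array here, which is an assumption
made for brevity.

```fortran
! Sketch only: module and procedure names are hypothetical, not from the PR.
module pr31079_forms_sketch
  implicit none
  integer, parameter :: wp = selected_real_kind(14, 200)
  type vec
    real(wp) :: a(2)
  end type vec
contains

  ! Scalar form: four independent multiply/adds per iteration.
  subroutine scalar_form(pxy, dpy, n, s)
    integer, intent(in) :: n
    type(vec), intent(in) :: pxy(n), dpy(n)
    real(wp), intent(out) :: s(4)
    integer :: i
    s = 0.0_wp
    do i = 1, n
      s(1) = s(1) + pxy(i)%a(1) * dpy(i)%a(1)
      s(2) = s(2) + pxy(i)%a(2) * dpy(i)%a(1)
      s(3) = s(3) + pxy(i)%a(1) * dpy(i)%a(2)
      s(4) = s(4) + pxy(i)%a(2) * dpy(i)%a(2)
    end do
  end subroutine scalar_form

  ! Paired form: the same arithmetic as two 2-element updates. Each line
  ! needs one broadcast of a dpy element, one 2-wide multiply and one
  ! 2-wide add.
  subroutine paired_form(pxy, dpy, n, s)
    integer, intent(in) :: n
    type(vec), intent(in) :: pxy(n), dpy(n)
    real(wp), intent(out) :: s(4)
    integer :: i
    s = 0.0_wp
    do i = 1, n
      s(1:2) = s(1:2) + pxy(i)%a(:) * dpy(i)%a(1)
      s(3:4) = s(3:4) + pxy(i)%a(:) * dpy(i)%a(2)
    end do
  end subroutine paired_form

end module pr31079_forms_sketch
```

Both routines perform the same operations on each element in the same order,
so they should return the same s; the only difference is how the work is
grouped.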
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
From: jv244 at cam dot ac dot uk @ 2008-01-08 10:22 UTC
To: gcc-bugs
------- Comment #5 from jv244 at cam dot ac dot uk 2008-01-08 09:52 -------
Updated the summary after the analysis in comment #4, and CCed Dorit for
the vectorization issue.
--
jv244 at cam dot ac dot uk changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |dorit at il dot ibm dot com
Summary|300% difference between |20% difference between
|ifort/gfortran |ifort/gfortran, missed
| |vectorization
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
From: rguenth at gcc dot gnu dot org @ 2008-08-18 15:21 UTC
To: gcc-bugs
------- Comment #6 from rguenth at gcc dot gnu dot org 2008-08-18 15:20 -------
The problem for the GCC vectorizer is that there are no loads or stores left
in the loop and it doesn't handle vectorizing "registers" only. This is a
case where real vectorization of straight-line code would be necessary.
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
From: rguenth at gcc dot gnu dot org @ 2008-08-18 15:23 UTC
To: gcc-bugs
------- Comment #7 from rguenth at gcc dot gnu dot org 2008-08-18 15:22 -------
That is, GCCs inner loop is
.L6:
addl $1, %eax
addsd %xmm12, %xmm11
cmpl $100000000, %eax
addsd %xmm14, %xmm3
addsd %xmm15, %xmm2
addsd %xmm13, %xmm1
jne .L6
which doesn't necessarily look slower than ICCs.
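As a rough illustration of why that loop has no loads or stores left: the
comparison against 100000000 matches Nit in the comment #4 testcase, and the
four addsd instructions add values that do not change inside the loop. That is
consistent with the i loop having been fully unrolled and its sums hoisted out
of the j loop, although this reading is not confirmed in the thread. The sketch
below uses hypothetical names and stores the vec arrays as plain 2 x n arrays
for brevity, which is an assumption made here and not part of the testcase.

```fortran
! Sketch only: module and procedure names are hypothetical, not from the PR.
module pr31079_hoist_sketch
  implicit none
  integer, parameter :: wp = selected_real_kind(14, 200)
contains

  ! Reference form: the doubly nested loop, with s flattened to s(1:4).
  subroutine nested(pxy, dpy, n, nit, s)
    integer, intent(in) :: n, nit
    real(wp), intent(in) :: pxy(2, n), dpy(2, n)
    real(wp), intent(out) :: s(4)
    integer :: i, j
    s = 0.0_wp
    do j = 1, nit
      do i = 1, n
        s(1:2) = s(1:2) + pxy(:, i) * dpy(1, i)
        s(3:4) = s(3:4) + pxy(:, i) * dpy(2, i)
      end do
    end do
  end subroutine nested

  ! Hoisted form: the i loop does not depend on j, so its four sums are
  ! computed once. The j loop is left with four additions of values that
  ! can stay in registers, with no memory traffic.
  subroutine hoisted(pxy, dpy, n, nit, s)
    integer, intent(in) :: n, nit
    real(wp), intent(in) :: pxy(2, n), dpy(2, n)
    real(wp), intent(out) :: s(4)
    real(wp) :: t(4)
    integer :: i, j
    t = 0.0_wp
    do i = 1, n
      t(1:2) = t(1:2) + pxy(:, i) * dpy(1, i)
      t(3:4) = t(3:4) + pxy(:, i) * dpy(2, i)
    end do
    s = 0.0_wp
    do j = 1, nit
      s = s + t
    end do
  end subroutine hoisted

end module pr31079_hoist_sketch
```

With exactly representable inputs the two routines return identical results.
In general the hoisted form reassociates the sums and can round differently,
so a compiler would only be allowed to do this under options such as
-ffast-math.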
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
From: jv244 at cam dot ac dot uk @ 2008-08-19 5:45 UTC
To: gcc-bugs
------- Comment #8 from jv244 at cam dot ac dot uk 2008-08-19 05:43 -------
(In reply to comment #7)
> That is, GCCs inner loop is
>
> .L6:
> addl $1, %eax
> addsd %xmm12, %xmm11
> cmpl $100000000, %eax
> addsd %xmm14, %xmm3
> addsd %xmm15, %xmm2
> addsd %xmm13, %xmm1
> jne .L6
>
> which doesn't necessarily look slower than ICCs.
>
Right... checked trunk, and it now does something very smart with the testcase
from comment 4 ... it is now about 8 times faster than ifort (9.1 /11.0)
> gfortran -O3 -ftree-vectorize -ffast-math -march=native -S PR31079_4.f90
> ./a.out
0.25201499
> ifort -xT -O2 PR31079_4.f90
> ./a.out
2.040127
I'll see if there is a way to get the testcase somewhat smarter. I checked the
very first program (comment #0), and this is still slower with gfortran (intel
3.51 vs gfortran 4.1). Just for completeness, I attach the Fortran source and
the intel assembly.
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
From: jv244 at cam dot ac dot uk @ 2008-08-19 5:45 UTC
To: gcc-bugs
------- Comment #9 from jv244 at cam dot ac dot uk 2008-08-19 05:44 -------
Created an attachment (id=16093)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16093&action=view)
comment #0 source
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
From: jv244 at cam dot ac dot uk @ 2008-08-19 5:46 UTC
To: gcc-bugs
------- Comment #10 from jv244 at cam dot ac dot uk 2008-08-19 05:45 -------
Created an attachment (id=16094)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16094&action=view)
comment #0 intel's assembly (ifort 9.1 at -O2 -xT)
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
From: jv244 at cam dot ac dot uk @ 2008-08-19 6:11 UTC
To: gcc-bugs
------- Comment #11 from jv244 at cam dot ac dot uk 2008-08-19 06:09 -------
Created an attachment (id=16095)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16095&action=view)
new testcase
This (PR31079_11.f90) should be a replacement for comment #4, and illustrates
the vectorizer issue.
> gfortran -O3 -ftree-vectorize -ffast-math -march=native PR31079_11.f90
> ./a.out
4.0282512
> ifort -O3 -xT PR31079_11.f90
PR31079_11.f90(52): (col. 13) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(52): (col. 13) remark: BLOCK WAS VECTORIZED.
PR31079_11.f90(52): (col. 13) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(52): (col. 13) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(17): (col. 8) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(24): (col. 5) remark: BLOCK WAS VECTORIZED.
PR31079_11.f90(30): (col. 7) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(31): (col. 7) remark: LOOP WAS VECTORIZED.
> ./a.out
2.640165
The inner loop looks like:
DO i=1,N
s(1:2)=s(1:2)+pxy(i)%a(:)*dpy(i)%a(1)
s(3:4)=s(3:4)+pxy(i)%a(:)*dpy(i)%a(2)
ENDDO
which ifort vectorizes (I will attach the full asm):
..B3.4: # Preds ..B3.4 ..B3.3
movddup collocate_core_2_2_0_0_$DPY.0.1(%rax), %xmm2 #30.33
movddup 8+collocate_core_2_2_0_0_$DPY.0.1(%rax), %xmm4 #31.33
movaps collocate_core_2_2_0_0_$PXY.0.1(%rax), %xmm3 #30.7
mulpd %xmm3, %xmm2 #30.32
incq %rdx #29.5
addq $16, %rax #29.5
addpd %xmm2, %xmm1 #30.7
cmpq $1000, %rdx #29.5
mulpd %xmm3, %xmm4 #31.32
addpd %xmm4, %xmm0 #31.7
jl ..B3.4 # Prob 99% #29.5
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
From: jv244 at cam dot ac dot uk @ 2008-08-19 6:12 UTC
To: gcc-bugs
------- Comment #12 from jv244 at cam dot ac dot uk 2008-08-19 06:11 -------
Created an attachment (id=16096)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16096&action=view)
ifort's asm for PR31079_11.f90 at -O3 -xT -S
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
From: jv244 at cam dot ac dot uk @ 2008-08-19 13:33 UTC
To: gcc-bugs
------- Comment #13 from jv244 at cam dot ac dot uk 2008-08-19 13:31 -------
(In reply to comment #11)
> This (PR31079_11.f90) should be a replacement for comment #4, and illustrates
> the vectorizer issue.
The patch Richard posted in PR37150 also improves this PR31079_11.f90 testcase
a lot:
ifort : 2.54
gfortran (unpatched): 4.00
gfortran (patched) : 2.96