[Bug tree-optimization/31079] New: 300% difference between ifort/gfortran

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug tree-optimization/31079]  New: 300% difference between ifort/gfortran
@ 2007-03-08  9:46 jv244 at cam dot ac dot uk
  2007-03-08 11:11 ` [Bug tree-optimization/31079] " jv244 at cam dot ac dot uk
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: jv244 at cam dot ac dot uk @ 2007-03-08  9:46 UTC (permalink / raw)
  To: gcc-bugs

I'm still trying to find a reduced testcase (or better source) for PR 31021,
but I'm not sure the code below is really the same issue. However, it
illustrates a rather small program with a very significant slowdown in gfortran
relative to ifort.

vondele@pcihpc13:/data/vondele/extracted_collocate/test> ifort -O2 -xT test.f90
test.f90(17) : (col. 7) remark: LOOP WAS VECTORIZED.
test.f90(20) : (col. 7) remark: LOOP WAS VECTORIZED.
test.f90(24) : (col. 4) remark: BLOCK WAS VECTORIZED.
vondele@pcihpc13:/data/vondele/extracted_collocate/test> ./a.out
   3.544221
vondele@pcihpc13:/data/vondele/extracted_collocate/test> gfortran -O3
-march=native -ftree-vectorize  -ffast-math  test.f90
vondele@pcihpc13:/data/vondele/extracted_collocate/test> ./a.out
   11.84874
vondele@pcihpc13:/data/vondele/extracted_collocate/test> gfortran -O2
-march=native -ftree-vectorize  -ffast-math  test.f90
vondele@pcihpc13:/data/vondele/extracted_collocate/test> ./a.out
   11.84474
vondele@pcihpc13:/data/vondele/extracted_collocate/test> cat test.f90
SUBROUTINE collocate_core_2_2_0_0(jg,cmax)
    IMPLICIT NONE
    integer, INTENT(IN)  :: jg,cmax
    INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND ( 14, 200 )
    INTEGER, PARAMETER :: N=1000
    TYPE vec
      real(wp) :: a(2)
    END TYPE vec
    TYPE(vec) :: dpy(1000)
    TYPE(vec) ::  pxy(1000)
    real(wp) s(04)
    integer :: i

    CALL USE(dpy,pxy,s)

    DO i=1,N
       pxy(i)%a=0.0_wp
    ENDDO
    DO i=1,N
       dpy(i)%a=0.0_wp
    ENDDO


    s(01)=0.0_wp
    s(02)=0.0_wp
    s(03)=0.0_wp
    s(04)=0.0_wp

    DO i=1,N
      s(01)=s(01)+pxy(i)%a(1)*dpy(i)%a(1)
      s(02)=s(02)+pxy(i)%a(2)*dpy(i)%a(1)
      s(03)=s(03)+pxy(i)%a(1)*dpy(i)%a(2)
      s(04)=s(04)+pxy(i)%a(2)*dpy(i)%a(2)
    ENDDO

    CALL USE(dpy,pxy,s)

END SUBROUTINE

SUBROUTINE USE(a,b,c)
 INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND ( 14, 200 )
 REAL(kind=wp) :: a(*),b(*),c(*)
END SUBROUTINE USE

PROGRAM TEST
    integer, parameter :: cmax=5
    integer*8 :: t1,t2,tbest
    real :: time1,time2
    jg=0
    CALL cpu_time(time1)
    tbest=huge(tbest)
    DO i=1,1000000
     ! t1=nanotime_ia32()
       CALL collocate_core_2_2_0_0(0,cmax)
     ! t2=nanotime_ia32()
     ! if(t2-t1>0 .AND. t2-t1<tbest) tbest=t2-t1
    ENDDO
    CALL cpu_time(time2)
    ! write(6,*) tbest,time2-time1
    write(6,*) time2-time1
END PROGRAM TEST


-- 
           Summary: 300% difference between ifort/gfortran
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jv244 at cam dot ac dot uk


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/31079] 300% difference between ifort/gfortran
  2007-03-08  9:46 [Bug tree-optimization/31079] New: 300% difference between ifort/gfortran jv244 at cam dot ac dot uk
@ 2007-03-08 11:11 ` jv244 at cam dot ac dot uk
  2007-06-20 21:00 ` fxcoudert at gcc dot gnu dot org
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jv244 at cam dot ac dot uk @ 2007-03-08 11:11 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from jv244 at cam dot ac dot uk  2007-03-08 11:11 -------
The following is (for me) an even more interesting example, as it times only
the loop that thus the actual multiply / add but also tricks my version of
ifort into generating the expected asm. Ifort is about twice as fast as
gfortran on it.

SUBROUTINE collocate_core_2_2_0_0(jg,cmax)
    IMPLICIT NONE
    integer, INTENT(IN)  :: jg,cmax
    INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND ( 14, 200 )
    INTEGER, PARAMETER :: N=10,Nit=100000000
    TYPE vec
      real(wp) :: a(2)
    END TYPE vec
    TYPE(vec) :: dpy(1000)
    TYPE(vec) ::  pxy(1000)
    TYPE(vec) :: s(02)
    integer :: i,j


    DO i=1,N
        pxy(i)%a=0.0_wp
    ENDDO
    DO i=1,N
        dpy(i)%a=0.0_wp
    ENDDO

    s(01)%a(1)=0.0_wp
    s(01)%a(2)=0.0_wp
    s(02)%a(1)=0.0_wp
    s(02)%a(2)=0.0_wp

    CALL USE(dpy,pxy,s)

    DO j=1,Nit
    DO i=1,N
      s(01)%a(:)=s(01)%a(:)+pxy(i)%a(:)*dpy(i)%a(1)
      s(02)%a(:)=s(02)%a(:)+pxy(i)%a(:)*dpy(i)%a(2)
    ENDDO
    ENDDO

    CALL USE(dpy,pxy,s)

END SUBROUTINE

vondele@pcihpc13:/data/vondele/extracted_collocate/test> gfortran -O2
-march=native -ftree-vectorize  -ffast-math  test.f90
vondele@pcihpc13:/data/vondele/extracted_collocate/test> ./a.out
   4.288268
vondele@pcihpc13:/data/vondele/extracted_collocate/test> ifort -O2 -xT test.f90
test.f90(16) : (col. 8) remark: LOOP WAS VECTORIZED.
test.f90(19) : (col. 8) remark: LOOP WAS VECTORIZED.
test.f90(31) : (col. 6) remark: LOOP WAS VECTORIZED.
test.f90(31) : (col. 6) remark: LOOP WAS VECTORIZED.
test.f90(32) : (col. 6) remark: LOOP WAS VECTORIZED.
test.f90(32) : (col. 6) remark: LOOP WAS VECTORIZED.
vondele@pcihpc13:/data/vondele/extracted_collocate/test> ./a.out
   1.944121

The inner loop asm looks, with ifort, also the way I was hoping it to look
like:

.B2.7:                         # Preds ..B2.7 ..B2.6
        movddup   -16+collocate_core_2_2_0_0_$DPY.0.0(%rcx), %xmm2 #31.41
        movddup   -8+collocate_core_2_2_0_0_$DPY.0.0(%rcx), %xmm3 #32.41
        addq      $16, %rdx                                     #33.4
        movapd    collocate_core_2_2_0_0_$PXY.0.0(%rdx), %xmm4  #31.6
        mulpd     %xmm4, %xmm2                                  #31.39
        mulpd     %xmm3, %xmm4                                  #32.39
        addpd     %xmm2, %xmm1                                  #31.7
        addpd     %xmm4, %xmm0                                  #32.7
        addq      $16, %rcx                                     #33.5
        cmpq      $160, %rcx                                    #33.4
        jle       ..B2.7        # Prob 90%                      #33.4
                                # LOE rdx rcx rbx rbp r12 r13 r14 r15 eax xmm0
xmm1


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/31079] 300% difference between ifort/gfortran
  2007-03-08  9:46 [Bug tree-optimization/31079] New: 300% difference between ifort/gfortran jv244 at cam dot ac dot uk
  2007-03-08 11:11 ` [Bug tree-optimization/31079] " jv244 at cam dot ac dot uk
@ 2007-06-20 21:00 ` fxcoudert at gcc dot gnu dot org
  2007-06-21  4:16 ` jv244 at cam dot ac dot uk
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: fxcoudert at gcc dot gnu dot org @ 2007-06-20 21:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from fxcoudert at gcc dot gnu dot org  2007-06-20 20:59 -------
I see a smaller difference, but a difference nonetheless.


-- 

fxcoudert at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fxcoudert at gcc dot gnu dot
                   |                            |org
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
  GCC build triplet|                            |x86_64-unknown-linux-gnu
   GCC host triplet|                            |x86_64-unknown-linux-gnu
 GCC target triplet|                            |x86_64-unknown-linux-gnu
   Last reconfirmed|0000-00-00 00:00:00         |2007-06-20 20:59:50
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/31079] 300% difference between ifort/gfortran
  2007-03-08  9:46 [Bug tree-optimization/31079] New: 300% difference between ifort/gfortran jv244 at cam dot ac dot uk
  2007-03-08 11:11 ` [Bug tree-optimization/31079] " jv244 at cam dot ac dot uk
  2007-06-20 21:00 ` fxcoudert at gcc dot gnu dot org
@ 2007-06-21  4:16 ` jv244 at cam dot ac dot uk
  2008-01-07 22:58 ` jv244 at cam dot ac dot uk
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jv244 at cam dot ac dot uk @ 2007-06-21  4:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from jv244 at cam dot ac dot uk  2007-06-21 04:16 -------
(In reply to comment #2)
> I see a smaller difference, but a difference nonetheless.

yes, looks like better code is now generated, current timings are down to a
200% difference

ifort: 1.988124
gfortran: 3.900243


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/31079] 300% difference between ifort/gfortran
  2007-03-08  9:46 [Bug tree-optimization/31079] New: 300% difference between ifort/gfortran jv244 at cam dot ac dot uk
                   ` (2 preceding siblings ...)
  2007-06-21  4:16 ` jv244 at cam dot ac dot uk
@ 2008-01-07 22:58 ` jv244 at cam dot ac dot uk
  2008-01-08 10:22 ` [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization jv244 at cam dot ac dot uk
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-01-07 22:58 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from jv244 at cam dot ac dot uk  2008-01-07 22:00 -------
timings have improved a lot with a recent gfortran, at least on an opteron, I
have now for ifort 3.7s for gfortran 4.5s (20% slower only) for the following
code:

SUBROUTINE collocate_core_2_2_0_0(jg,cmax)
    IMPLICIT NONE
    integer, INTENT(IN)  :: jg,cmax
    INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND ( 14, 200 )
    INTEGER, PARAMETER :: N=10,Nit=100000000
    TYPE vec
      real(wp) :: a(2)
    END TYPE vec
    TYPE(vec) :: dpy(1000)
    TYPE(vec) ::  pxy(1000)
    TYPE(vec) :: s(02)
    integer :: i,j


    DO i=1,N
        pxy(i)%a=0.0_wp
    ENDDO
    DO i=1,N
        dpy(i)%a=0.0_wp
    ENDDO

    s(01)%a(1)=0.0_wp
    s(01)%a(2)=0.0_wp
    s(02)%a(1)=0.0_wp
    s(02)%a(2)=0.0_wp

    CALL USE(dpy,pxy,s)

    ! this is the hot loop
    DO j=1,Nit
    DO i=1,N
      s(01)%a(:)=s(01)%a(:)+pxy(i)%a(:)*dpy(i)%a(1)
      s(02)%a(:)=s(02)%a(:)+pxy(i)%a(:)*dpy(i)%a(2)
    ENDDO
    ENDDO

    CALL USE(dpy,pxy,s)

END SUBROUTINE

SUBROUTINE USE(a,b,c)
 INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND ( 14, 200 )
 REAL(kind=wp) :: a(*),b(*),c(*)
END SUBROUTINE USE

PROGRAM TEST
    integer, parameter :: cmax=5
    integer*8 :: t1,t2,tbest
    real :: time1,time2
    jg=0
    CALL cpu_time(time1)
    tbest=huge(tbest)
    DO i=1,1
     ! t1=nanotime_ia32()
       CALL collocate_core_2_2_0_0(0,cmax)
     ! t2=nanotime_ia32()
     ! if(t2-t1>0 .AND. t2-t1<tbest) tbest=t2-t1
    ENDDO
    CALL cpu_time(time2)
    ! write(6,*) tbest,time2-time1
    write(6,*) time2-time1
END PROGRAM TEST

using 

ifort -xW -O3 test.f90
gfortran -march=native -O3 -ffast-math test.f90

gfortran's inner loop asm looks like:

.L8:
        movlpd  (%rbp,%rax), %xmm0
        movsd   %xmm0, %xmm1
        mulsd   (%rbx,%rax), %xmm1
        addsd   %xmm1, %xmm2
        movsd   %xmm2, 32000(%rsp)
        mulsd   8(%rbx,%rax), %xmm0
        addsd   %xmm0, %xmm5
        movsd   %xmm5, 32008(%rsp)
        movlpd  8(%rbp,%rax), %xmm0
        movsd   %xmm0, %xmm1
        mulsd   (%rbx,%rax), %xmm1
        addsd   %xmm1, %xmm4
        movsd   %xmm4, 32016(%rsp)
        mulsd   8(%rbx,%rax), %xmm0
        addq    $16, %rax
        cmpq    $160, %rax
        addsd   %xmm0, %xmm3
        movsd   %xmm3, 32024(%rsp)
        jne     .L8

while ifort's loop looks like:

..B3.7:                         # Preds ..B3.7 ..B3.6
        movsd     collocate_core_2_2_0_0_$DPY.0.0(%rdx), %xmm2  #31.41
        movsd     8+collocate_core_2_2_0_0_$DPY.0.0(%rdx), %xmm3 #32.41
        movaps    collocate_core_2_2_0_0_$PXY.0.0(%rdx), %xmm4  #31.7
        unpcklpd  %xmm2, %xmm2                                  #31.41
        mulpd     %xmm4, %xmm2                                  #31.40
        addpd     %xmm2, %xmm1                                  #31.7
        unpcklpd  %xmm3, %xmm3                                  #32.41
        mulpd     %xmm3, %xmm4                                  #32.40
        addpd     %xmm4, %xmm0                                  #32.7
        addq      $16, %rdx                                     #30.5
        cmpq      $160, %rdx                                    #30.5
        jl        ..B3.7        # Prob 90%                      #30.5

so I guess ifort vectorizes where gfortran does not.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
  2007-03-08  9:46 [Bug tree-optimization/31079] New: 300% difference between ifort/gfortran jv244 at cam dot ac dot uk
                   ` (3 preceding siblings ...)
  2008-01-07 22:58 ` jv244 at cam dot ac dot uk
@ 2008-01-08 10:22 ` jv244 at cam dot ac dot uk
  2008-08-18 15:21 ` rguenth at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-01-08 10:22 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from jv244 at cam dot ac dot uk  2008-01-08 09:52 -------
updated the summary after the analysis in comment #4, and and CCed Dorit for
the vectorization issue.


-- 

jv244 at cam dot ac dot uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dorit at il dot ibm dot com
            Summary|300% difference between     |20% difference between
                   |ifort/gfortran              |ifort/gfortran, missed
                   |                            |vectorization


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
  2007-03-08  9:46 [Bug tree-optimization/31079] New: 300% difference between ifort/gfortran jv244 at cam dot ac dot uk
                   ` (4 preceding siblings ...)
  2008-01-08 10:22 ` [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization jv244 at cam dot ac dot uk
@ 2008-08-18 15:21 ` rguenth at gcc dot gnu dot org
  2008-08-18 15:23 ` rguenth at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-08-18 15:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from rguenth at gcc dot gnu dot org  2008-08-18 15:20 -------
The problem for the GCC vectorizer is that there are no loads or stores left
in the loop and it doesn't handle vectorizing "registers" only.  This is a
case where real vectorization of straight-line code would be necessary.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
  2007-03-08  9:46 [Bug tree-optimization/31079] New: 300% difference between ifort/gfortran jv244 at cam dot ac dot uk
                   ` (5 preceding siblings ...)
  2008-08-18 15:21 ` rguenth at gcc dot gnu dot org
@ 2008-08-18 15:23 ` rguenth at gcc dot gnu dot org
  2008-08-19  5:45 ` jv244 at cam dot ac dot uk
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-08-18 15:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from rguenth at gcc dot gnu dot org  2008-08-18 15:22 -------
That is, GCCs inner loop is

.L6:
        addl    $1, %eax
        addsd   %xmm12, %xmm11
        cmpl    $100000000, %eax
        addsd   %xmm14, %xmm3
        addsd   %xmm15, %xmm2
        addsd   %xmm13, %xmm1
        jne     .L6

which doesn't necessarily look slower than ICCs.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
  2007-03-08  9:46 [Bug tree-optimization/31079] New: 300% difference between ifort/gfortran jv244 at cam dot ac dot uk
                   ` (6 preceding siblings ...)
  2008-08-18 15:23 ` rguenth at gcc dot gnu dot org
@ 2008-08-19  5:45 ` jv244 at cam dot ac dot uk
  2008-08-19  5:45 ` jv244 at cam dot ac dot uk
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-08-19  5:45 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from jv244 at cam dot ac dot uk  2008-08-19 05:43 -------
(In reply to comment #7)
> That is, GCCs inner loop is
> 
> .L6:
>         addl    $1, %eax
>         addsd   %xmm12, %xmm11
>         cmpl    $100000000, %eax
>         addsd   %xmm14, %xmm3
>         addsd   %xmm15, %xmm2
>         addsd   %xmm13, %xmm1
>         jne     .L6
> 
> which doesn't necessarily look slower than ICCs.
> 

Right... checked trunk, and it now does something very smart with the testcase
from comment 4 ... it is now about 10 times faster than ifort (9.1 /11.0)

> gfortran -O3 -ftree-vectorize -ffast-math -march=native -S PR31079_4.f90
> ./a.out
  0.25201499

> ifort -xT -O2 PR31079_4.f90
> ./a.out
   2.040127

I'll see if there is a way to get the testcase somewhat smarter. I checked the
very first program (comment #0), and this is still slower with gfortran (intel
3.51 vs gfortran 4.1). Just for completeness, I attach the Fortran source and
the intel assembly. 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
  2007-03-08  9:46 [Bug tree-optimization/31079] New: 300% difference between ifort/gfortran jv244 at cam dot ac dot uk
                   ` (7 preceding siblings ...)
  2008-08-19  5:45 ` jv244 at cam dot ac dot uk
@ 2008-08-19  5:45 ` jv244 at cam dot ac dot uk
  2008-08-19  5:46 ` jv244 at cam dot ac dot uk
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-08-19  5:45 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from jv244 at cam dot ac dot uk  2008-08-19 05:44 -------
Created an attachment (id=16093)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16093&action=view)
comment #0 source


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
  2007-03-08  9:46 [Bug tree-optimization/31079] New: 300% difference between ifort/gfortran jv244 at cam dot ac dot uk
                   ` (8 preceding siblings ...)
  2008-08-19  5:45 ` jv244 at cam dot ac dot uk
@ 2008-08-19  5:46 ` jv244 at cam dot ac dot uk
  2008-08-19  6:11 ` jv244 at cam dot ac dot uk
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-08-19  5:46 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from jv244 at cam dot ac dot uk  2008-08-19 05:45 -------
Created an attachment (id=16094)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16094&action=view)
comment #0 intel's assembly (ifort 9.1 at -O2 -xT)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
  2007-03-08  9:46 [Bug tree-optimization/31079] New: 300% difference between ifort/gfortran jv244 at cam dot ac dot uk
                   ` (9 preceding siblings ...)
  2008-08-19  5:46 ` jv244 at cam dot ac dot uk
@ 2008-08-19  6:11 ` jv244 at cam dot ac dot uk
  2008-08-19  6:12 ` jv244 at cam dot ac dot uk
  2008-08-19 13:33 ` jv244 at cam dot ac dot uk
  12 siblings, 0 replies; 14+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-08-19  6:11 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from jv244 at cam dot ac dot uk  2008-08-19 06:09 -------
Created an attachment (id=16095)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16095&action=view)
new testcase 

This (PR31079_11.f90) should be a replacement for comment #4, and illustrates
the vectorizer issue.

> gfortran -O3 -ftree-vectorize -ffast-math -march=native PR31079_11.f90
> ./a.out
   4.0282512

> ifort -O3 -xT PR31079_11.f90
PR31079_11.f90(52): (col. 13) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(52): (col. 13) remark: BLOCK WAS VECTORIZED.
PR31079_11.f90(52): (col. 13) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(52): (col. 13) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(17): (col. 8) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(24): (col. 5) remark: BLOCK WAS VECTORIZED.
PR31079_11.f90(30): (col. 7) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(31): (col. 7) remark: LOOP WAS VECTORIZED.
> ./a.out
   2.640165

The inner loop looks like:

    DO i=1,N
      s(1:2)=s(1:2)+pxy(i)%a(:)*dpy(i)%a(1)
      s(3:4)=s(3:4)+pxy(i)%a(:)*dpy(i)%a(2)
    ENDDO

which ifort vectorizes (I will attach the full asm):

..B3.4:                         # Preds ..B3.4 ..B3.3
        movddup   collocate_core_2_2_0_0_$DPY.0.1(%rax), %xmm2  #30.33
        movddup   8+collocate_core_2_2_0_0_$DPY.0.1(%rax), %xmm4 #31.33
        movaps    collocate_core_2_2_0_0_$PXY.0.1(%rax), %xmm3  #30.7
        mulpd     %xmm3, %xmm2                                  #30.32
        incq      %rdx                                          #29.5
        addq      $16, %rax                                     #29.5
        addpd     %xmm2, %xmm1                                  #30.7
        cmpq      $1000, %rdx                                   #29.5
        mulpd     %xmm3, %xmm4                                  #31.32
        addpd     %xmm4, %xmm0                                  #31.7
        jl        ..B3.4        # Prob 99%                      #29.5


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
  2007-03-08  9:46 [Bug tree-optimization/31079] New: 300% difference between ifort/gfortran jv244 at cam dot ac dot uk
                   ` (10 preceding siblings ...)
  2008-08-19  6:11 ` jv244 at cam dot ac dot uk
@ 2008-08-19  6:12 ` jv244 at cam dot ac dot uk
  2008-08-19 13:33 ` jv244 at cam dot ac dot uk
  12 siblings, 0 replies; 14+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-08-19  6:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from jv244 at cam dot ac dot uk  2008-08-19 06:11 -------
Created an attachment (id=16096)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16096&action=view)
ifort's asm for PR31079_11.f90 at -O3 -xT -S


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
  2007-03-08  9:46 [Bug tree-optimization/31079] New: 300% difference between ifort/gfortran jv244 at cam dot ac dot uk
                   ` (11 preceding siblings ...)
  2008-08-19  6:12 ` jv244 at cam dot ac dot uk
@ 2008-08-19 13:33 ` jv244 at cam dot ac dot uk
  12 siblings, 0 replies; 14+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-08-19 13:33 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from jv244 at cam dot ac dot uk  2008-08-19 13:31 -------
(In reply to comment #11)

> This (PR31079_11.f90) should be a replacement for comment #4, and illustrates
> the vectorizer issue.

The patch Richard posted in PR37150 also improves this PR31079_11.f90 testcase
a lot:

ifort               : 2.54
gfortran (unpatched): 4.00
gfortran (patched)  : 2.96


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2008-08-19 13:33 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-08  9:46 [Bug tree-optimization/31079] New: 300% difference between ifort/gfortran jv244 at cam dot ac dot uk
2007-03-08 11:11 ` [Bug tree-optimization/31079] " jv244 at cam dot ac dot uk
2007-06-20 21:00 ` fxcoudert at gcc dot gnu dot org
2007-06-21  4:16 ` jv244 at cam dot ac dot uk
2008-01-07 22:58 ` jv244 at cam dot ac dot uk
2008-01-08 10:22 ` [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization jv244 at cam dot ac dot uk
2008-08-18 15:21 ` rguenth at gcc dot gnu dot org
2008-08-18 15:23 ` rguenth at gcc dot gnu dot org
2008-08-19  5:45 ` jv244 at cam dot ac dot uk
2008-08-19  5:45 ` jv244 at cam dot ac dot uk
2008-08-19  5:46 ` jv244 at cam dot ac dot uk
2008-08-19  6:11 ` jv244 at cam dot ac dot uk
2008-08-19  6:12 ` jv244 at cam dot ac dot uk
2008-08-19 13:33 ` jv244 at cam dot ac dot uk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).