public inbox for fortran@gcc.gnu.org
* Difference in assembly code generated (gfortran vs ifort) (optimization flags missing?)
@ 2018-04-12 13:56 Laércio LIMA PILLA
  2018-04-12 14:00 ` Richard Biener
  0 siblings, 1 reply; 5+ messages in thread
From: Laércio LIMA PILLA @ 2018-04-12 13:56 UTC (permalink / raw)
  To: fortran

Dear all,

TL;DR version: I have been noticing very extreme performance differences
(up to a factor of 3) between ifort and gfortran.
As I checked the assembly code, I noticed that the compilers are using
different instructions (e.g., ifort uses 'vbroadcastsd').
Am I missing any special optimization flags (besides -march and -mtune
native) or is this expected?

Original version:

I have been working on the optimization of a Fortran 95 [and over]
application that makes use of several matrix-vector multiplication kernels.
As the sizes of the matrices are well-known, the original developers
generated different kernels for different sizes.
An example for size 4 is given below.

  USE ISO_C_BINDING
  !...
  subroutine mv_mult_4_4(mat,vec,res)
    REAL(C_DOUBLE), INTENT(IN),  DIMENSION(4,4) :: mat
    REAL(C_DOUBLE), INTENT(IN),  DIMENSION(4)   :: vec
    REAL(C_DOUBLE), INTENT(OUT), DIMENSION(4)   :: res
    INTEGER(C_INT) :: iRow, iCol

    res = 0.0

    do iCol=1,4
       do iRow=1,4
          res(iRow) = res(iRow) + mat(iRow,iCol)*vec(iCol)
       end do
    end do

  end subroutine mv_mult_4_4

I have been noticing very significant performance differences in my tests
with gfortran, ifort, and different optimizations on my local system.
On the special case for a 20x20 matrix, ifort provides a code that reduces
the execution time by a factor of 3 for the same optimization flags.
I started checking the assembly code generated by the different compilers
and noticed some differences.
For the code snippet above, the assembly versions from ifort and gfortran
are presented below.
We can notice that ifort is using some instructions (vbroadcastsd) that are
not used by gfortran even though I am telling the compiler the specific
architecture of my processor.
As the general users of the application use gfortran, I would like to know:

1) Is this difference in instructions used expected?
2) Am I missing any additional optimization flag (besides -march and
-mtune) that could change that?
3) Are there any directives (besides OpenMP ones) that could help in this
case?

Assembly:
CPU: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz

ifort w/ -O3 -march=native -mtune=native -autodouble -S:
# mark_description "Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 17.0.3.191 Build 2017";
# mark_description "0404";
# mark_description "-O3 -march=native -mtune=native -autodouble -S";
# -- Begin  mv_mult_4_4_
.text
# mark_begin;
       .align    16,0x90
.globl mv_mult_4_4_
mv_mult_4_4_:
# parameter 1: %rdi
# parameter 2: %rsi
# parameter 3: %rdx
#...
        vbroadcastsd (%rsi), %ymm0                              #157.50
        vbroadcastsd 8(%rsi), %ymm2                             #157.50
        vbroadcastsd 16(%rsi), %ymm3                            #157.50
        vbroadcastsd 24(%rsi), %ymm4                            #157.50
        vmulpd    (%rdi), %ymm0, %ymm1                          #157.11
        vfmadd132pd 32(%rdi), %ymm1, %ymm2                      #157.11
        vfmadd132pd 64(%rdi), %ymm2, %ymm3                      #157.11
        vfmadd132pd 96(%rdi), %ymm3, %ymm4                      #157.11
        vmovupd   %ymm4, (%rdx)                                 #157.11
        vzeroupper                                              #165.3
        ret                                                     #165.3
        .align    16,0x90
                                # LOE
.cfi_endproc
# mark_end;

---

gfortran (GNU Fortran (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0) w/ -O3
-march=native -mtune=native -fdefault-double-8 -fdefault-real-8 -S:
.p2align 4,,15
.globl __mv_mult_4_4
.type __mv_mult_4_4, @function
__mv_mult_4_4:
.LFB12:
.cfi_startproc
vpxor %xmm0, %xmm0, %xmm0
vmovups %xmm0, (%rdx)
vmovups %xmm0, 16(%rdx)
vmovupd (%rsi), %ymm0
vmovupd (%rdx), %ymm4
vpermpd $0, %ymm0, %ymm3
vfmadd132pd (%rdi), %ymm4, %ymm3
vpermpd $85, %ymm0, %ymm2
vfmadd132pd 32(%rdi), %ymm3, %ymm2
vpermpd $170, %ymm0, %ymm1
vpermpd $255, %ymm0, %ymm0
vfmadd132pd 64(%rdi), %ymm2, %ymm1
vfmadd132pd 96(%rdi), %ymm1, %ymm0
vmovupd %ymm0, (%rdx)
vzeroupper
ret
.cfi_endproc
.LFE12:
.size __mv_mult_4_4, .-__mv_mult_4_4
.p2align 4,,15

---

Best regards,

Laércio LIMA PILLA
Postdoctoral Researcher @ Inria Grenoble - Rhône-Alpes, CORSE project-team
Associate Professor @ UFSC, Brazil


* Re: Difference in assembly code generated (gfortran vs ifort) (optimization flags missing?)
  2018-04-12 13:56 Difference in assembly code generated (gfortran vs ifort) (optimization flags missing?) Laércio LIMA PILLA
@ 2018-04-12 14:00 ` Richard Biener
  2018-04-12 14:48   ` Steve Kargl
  2018-04-12 15:00   ` Laércio LIMA PILLA
  0 siblings, 2 replies; 5+ messages in thread
From: Richard Biener @ 2018-04-12 14:00 UTC (permalink / raw)
  To: Laércio LIMA PILLA; +Cc: fortran

On Thu, Apr 12, 2018 at 3:55 PM, Laércio LIMA PILLA
<laercio.lima@inria.fr> wrote:
> Dear all,
>
> TL;DR version: I have been noticing very extreme performance differences
> (up to a factor of 3) between ifort and gfortran.
> As I checked the assembly code, I noticed that the compilers are using
> different instructions (e.g., ifort uses 'vbroadcastsd').
> Am I missing any special optimization flags (besides -march and -mtune
> native) or is this expected?
>
> Original version:
>
> I have been working on the optimization of a Fortran 95 [and over]
> application that makes use of several matrix-vector multiplication kernels.
> As the sizes of the matrices are well-known, the original developers
> generated different kernels for different sizes.
> An example for size 4 is given below.
>
>   USE ISO_C_BINDING
>   !...
>   subroutine mv_mult_4_4(mat,vec,res)
>     REAL(C_DOUBLE), INTENT(IN),  DIMENSION(4,4) :: mat
>     REAL(C_DOUBLE), INTENT(IN),  DIMENSION(4)   :: vec
>     REAL(C_DOUBLE), INTENT(OUT), DIMENSION(4)   :: res
>     INTEGER(C_INT) :: iRow, iCol
>
>     res = 0.0
>
>     do iCol=1,4
>        do iRow=1,4
>           res(iRow) = res(iRow) + mat(iRow,iCol)*vec(iCol)
>        end do
>     end do
>
>   end subroutine mv_mult_4_4

This doesn't seem complete as it doesn't compile for me...

> I have been noticing very significant performance differences in my tests
> with gfortran, ifort, and different optimizations on my local system.
> On the special case for a 20x20 matrix, ifort provides a code that reduces
> the execution time by a factor of 3 for the same optimization flags.
> I started checking the assembly code generated by the different compilers
> and noticed some differences.
> For the code snippet above, the assembly versions from ifort and gfortran
> are presented below.
> We can notice that ifort is using some instructions (vbroadcastsd) that are
> not used by gfortran even though I am telling the compiler the specific
> architecture of my processor.
> As the general users of the application use gfortran, I would like to know:
>
> 1) Is this difference in instructions used expected?
> 2) Am I missing any additional optimization flag (besides -march and
> -mtune) that could change that?
> 3) Are there any directives (besides OpenMP ones) that could help in this
> case?

It looks like ifort does loop vectorization on the inner loop while GCC
most certainly unrolls that fully and vectorizes the outer loop which in turn
requires all the shuffling.  You can see if -fdisable-tree-cunrolli solves this
(just for debugging!).
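
As a source-level illustration of the inner-loop view described above, here is a minimal sketch in which the iRow loop is written as a whole-column array operation. The module name example_colwise and the routine name mv_mult_4_4_cols are hypothetical and not part of the original application; whether this form changes the instructions gfortran emits is not established here and would need to be checked with -S.

```fortran
module example_colwise
  use iso_c_binding, only: c_double, c_int
  implicit none
contains
  ! Hypothetical variant of mv_mult_4_4 (name is an assumption):
  ! the inner iRow loop becomes one array operation per column.
  subroutine mv_mult_4_4_cols(mat, vec, res)
    real(c_double), intent(in),  dimension(4,4) :: mat
    real(c_double), intent(in),  dimension(4)   :: vec
    real(c_double), intent(out), dimension(4)   :: res
    integer(c_int) :: iCol

    res = 0.0_c_double
    do iCol = 1, 4
       ! Column iCol of mat scaled by the scalar vec(iCol)
       res = res + mat(:, iCol) * vec(iCol)
    end do
  end subroutine mv_mult_4_4_cols
end module example_colwise
```

Each iteration multiplies one contiguous column by one scalar, which is the access pattern that a scalar broadcast followed by a fused multiply-add would serve.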

Richard.



* Re: Difference in assembly code generated (gfortran vs ifort) (optimization flags missing?)
  2018-04-12 14:00 ` Richard Biener
@ 2018-04-12 14:48   ` Steve Kargl
  2018-04-12 20:10     ` Thomas Koenig
  2018-04-12 15:00   ` Laércio LIMA PILLA
  1 sibling, 1 reply; 5+ messages in thread
From: Steve Kargl @ 2018-04-12 14:48 UTC (permalink / raw)
  To: Richard Biener; +Cc: Laércio LIMA PILLA, fortran

On Thu, Apr 12, 2018 at 04:00:46PM +0200, Richard Biener wrote:
> >
> >   USE ISO_C_BINDING

Move the above statement to ...

> >   !...
> >   subroutine mv_mult_4_4(mat,vec,res)

here.

> >     REAL(C_DOUBLE), INTENT(IN),  DIMENSION(4,4) :: mat
> >     REAL(C_DOUBLE), INTENT(IN),  DIMENSION(4)   :: vec
> >     REAL(C_DOUBLE), INTENT(OUT), DIMENSION(4)   :: res
> >     INTEGER(C_INT) :: iRow, iCol
> >
> >     res = 0.0
> >
> >     do iCol=1,4
> >        do iRow=1,4
> >           res(iRow) = res(iRow) + mat(iRow,iCol)*vec(iCol)
> >        end do
> >     end do
> >
> >   end subroutine mv_mult_4_4
> 
> This doesn't seem complete as it doesn't compile for me...

See above.

It would also be interesting to see the result of replacing
the loops with


   res = matmul(mat, vec)

as tkoenig (and jerryd?) works on optimizing matmul.
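
A minimal sketch of that replacement, assuming the same argument declarations as the posted kernel. The module name example_matmul and the routine name mv_mult_4_4_matmul are hypothetical, chosen only so this version can sit next to the loop version for comparison.

```fortran
module example_matmul
  use iso_c_binding, only: c_double
  implicit none
contains
  ! Hypothetical counterpart of mv_mult_4_4 (name is an assumption):
  ! the explicit DO loops are replaced by the MATMUL intrinsic.
  subroutine mv_mult_4_4_matmul(mat, vec, res)
    real(c_double), intent(in),  dimension(4,4) :: mat
    real(c_double), intent(in),  dimension(4)   :: vec
    real(c_double), intent(out), dimension(4)   :: res

    ! No explicit zeroing of res is needed; the assignment sets every element.
    res = matmul(mat, vec)
  end subroutine mv_mult_4_4_matmul
end module example_matmul
```

Compiling both versions with the same flags and -S would allow the two assembly listings to be compared directly.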

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow


* Re: Difference in assembly code generated (gfortran vs ifort) (optimization flags missing?)
  2018-04-12 14:00 ` Richard Biener
  2018-04-12 14:48   ` Steve Kargl
@ 2018-04-12 15:00   ` Laércio LIMA PILLA
  1 sibling, 0 replies; 5+ messages in thread
From: Laércio LIMA PILLA @ 2018-04-12 15:00 UTC (permalink / raw)
  To: Richard Biener; +Cc: fortran

Thank you for the quick reply.

2018-04-12 16:00 GMT+02:00 Richard Biener <richard.guenther@gmail.com>:

> On Thu, Apr 12, 2018 at 3:55 PM, Laércio LIMA PILLA
> <laercio.lima@inria.fr> wrote:
> > Dear all,
> >
> > TL;DR version: I have been noticing very extreme performance differences
> > (up to a factor of 3) between ifort and gfortran.
> > As I checked the assembly code, I noticed that the compilers are using
> > different instructions (e.g., ifort uses 'vbroadcastsd').
> > Am I missing any special optimization flags (besides -march and -mtune
> > native) or is this expected?
> >
> > Original version:
> >
> > I have been working on the optimization of a Fortran 95 [and over]
> > application that makes use of several matrix-vector multiplication
> kernels.
> > As the sizes of the matrices are well-known, the original developers
> > generated different kernels for different sizes.
> > An example for size 4 is given below.
> >
> >   USE ISO_C_BINDING
> >   !...
> >   subroutine mv_mult_4_4(mat,vec,res)
> >     REAL(C_DOUBLE), INTENT(IN),  DIMENSION(4,4) :: mat
> >     REAL(C_DOUBLE), INTENT(IN),  DIMENSION(4)   :: vec
> >     REAL(C_DOUBLE), INTENT(OUT), DIMENSION(4)   :: res
> >     INTEGER(C_INT) :: iRow, iCol
> >
> >     res = 0.0
> >
> >     do iCol=1,4
> >        do iRow=1,4
> >           res(iRow) = res(iRow) + mat(iRow,iCol)*vec(iCol)
> >        end do
> >     end do
> >
> >   end subroutine mv_mult_4_4
>
> This doesn't seem complete as it doesn't compile for me...
>

Yes. My fault. When I took this part out of the code, I forgot to add the
module information. Here is a more complete version:

module example

  USE ISO_C_BINDING

contains

  subroutine mv_mult_4_4(mat,vec,res)
    REAL(C_DOUBLE), INTENT(IN),  DIMENSION(4,4) :: mat
    REAL(C_DOUBLE), INTENT(IN),  DIMENSION(4)   :: vec
    REAL(C_DOUBLE), INTENT(OUT), DIMENSION(4)   :: res
    INTEGER(C_INT) :: iRow, iCol

    res = 0.0

    do iCol=1,4
       do iRow=1,4
          res(iRow) = res(iRow) + mat(iRow,iCol)*vec(iCol)
       end do
    end do

  end subroutine mv_mult_4_4

end module example

>
> > I have been noticing very significant performance differences in my tests
> > with gfortran, ifort, and different optimizations on my local system.
> > On the special case for a 20x20 matrix, ifort provides a code that
> reduces
> > the execution time by a factor of 3 for the same optimization flags.
> > I started checking the assembly code generated by the different compilers
> > and noticed some differences.
> > For the code snippet above, the assembly versions from ifort and gfortran
> > are presented below.
> > We can notice that ifort is using some instructions (vbroadcastsd) that
> are
> > not used by gfortran even though I am telling the compiler the specific
> > architecture of my processor.
> > As the general users of the application use gfortran, I would like to
> know:
> >
> > 1) Is this difference in instructions used expected?
> > 2) Am I missing any additional optimization flag (besides -march and
> > -mtune) that could change that?
> > 3) Are there any directives (besides OpenMP ones) that could help in this
> > case?
>
> It looks like ifort does loop vectorization on the inner loop while GCC
> most certainly unrolls that fully and vectorizes the outer loop which in
> turn
> requires all the shuffling.  You can see if -fdisable-tree-cunrolli solves
> this
> (just for debugging!).
>

I followed your suggestion and added that flag. The result is better code
that even uses vbroadcastsd:

.file "example.f90"
.text
.p2align 4,,15
.globl __example_MOD_mv_mult_4_4
.type __example_MOD_mv_mult_4_4, @function
__example_MOD_mv_mult_4_4:
.LFB0:
.cfi_startproc
vpxor %xmm0, %xmm0, %xmm0
vbroadcastsd (%rsi), %ymm1
vmovups %xmm0, (%rdx)
vmovups %xmm0, 16(%rdx)
vmovupd (%rdi), %ymm0
vfmadd213pd (%rdx), %ymm1, %ymm0
vbroadcastsd 8(%rsi), %ymm1
vfmadd132pd 32(%rdi), %ymm0, %ymm1
vbroadcastsd 16(%rsi), %ymm0
vfmadd231pd 64(%rdi), %ymm0, %ymm1
vbroadcastsd 24(%rsi), %ymm0
vfmadd132pd 96(%rdi), %ymm1, %ymm0
vmovupd %ymm0, (%rdx)
vzeroupper
ret
.cfi_endproc
.LFE0:
.size __example_MOD_mv_mult_4_4, .-__example_MOD_mv_mult_4_4
.ident "GCC: (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0"
.section .note.GNU-stack,"",@progbits

I also experimented with loop permutation, which also led to better
assembly:

.file "example.f90"
.text
.p2align 4,,15
.globl __example_MOD_mv_mult_4_4
.type __example_MOD_mv_mult_4_4, @function
__example_MOD_mv_mult_4_4:
.LFB0:
.cfi_startproc
vpxor %xmm0, %xmm0, %xmm0
vbroadcastsd (%rsi), %ymm3
vbroadcastsd 8(%rsi), %ymm2
vmovups %xmm0, (%rdx)
vbroadcastsd 16(%rsi), %ymm1
vmovups %xmm0, 16(%rdx)
vmovupd (%rdx), %ymm4
vfmadd132pd (%rdi), %ymm4, %ymm3
vfmadd132pd 32(%rdi), %ymm3, %ymm2
vbroadcastsd 24(%rsi), %ymm0
vfmadd132pd 64(%rdi), %ymm2, %ymm1
vfmadd132pd 96(%rdi), %ymm1, %ymm0
vmovupd %ymm0, (%rdx)
vzeroupper
ret
.cfi_endproc
.LFE0:
.size __example_MOD_mv_mult_4_4, .-__example_MOD_mv_mult_4_4
.ident "GCC: (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0"
.section .note.GNU-stack,"",@progbits
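
The permuted source is not shown in this message. One plausible reading, given here only as an assumption, is that the two DO loops were interchanged so that iRow becomes the outer loop. The module name example_permuted and the routine name mv_mult_4_4_perm are hypothetical.

```fortran
module example_permuted
  use iso_c_binding, only: c_double, c_int
  implicit none
contains
  ! Assumed form of the loop permutation (not confirmed by the thread):
  ! iRow is now the outer loop and iCol the inner one.
  subroutine mv_mult_4_4_perm(mat, vec, res)
    real(c_double), intent(in),  dimension(4,4) :: mat
    real(c_double), intent(in),  dimension(4)   :: vec
    real(c_double), intent(out), dimension(4)   :: res
    integer(c_int) :: iRow, iCol

    res = 0.0_c_double
    do iRow = 1, 4
       do iCol = 1, 4
          res(iRow) = res(iRow) + mat(iRow,iCol)*vec(iCol)
       end do
    end do
  end subroutine mv_mult_4_4_perm
end module example_permuted
```

The arithmetic is the same as in the original kernel; only the order in which the (iRow, iCol) pairs are visited changes.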

Still, neither change seems to improve the generated code for the 20x20
matrix case.
I will try some more things in the next few days.
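
For repeating the 20x20 comparison, a small timing harness along the following lines might help. It is only a sketch: the 20x20 kernel is assumed to follow the same pattern as the 4x4 one, and the names bench_sketch, mv_mult_20_20 and time_kernel are hypothetical. No timings are claimed here.

```fortran
module bench_sketch
  use iso_c_binding,   only: c_double
  use iso_fortran_env, only: int64
  implicit none
contains
  ! Assumed 20x20 kernel, written by analogy with mv_mult_4_4.
  subroutine mv_mult_20_20(mat, vec, res)
    real(c_double), intent(in)  :: mat(20,20), vec(20)
    real(c_double), intent(out) :: res(20)
    integer :: iRow, iCol

    res = 0.0_c_double
    do iCol = 1, 20
       do iRow = 1, 20
          res(iRow) = res(iRow) + mat(iRow,iCol)*vec(iCol)
       end do
    end do
  end subroutine mv_mult_20_20

  ! Calls the kernel reps times and reports wall-clock seconds.
  ! The checksum keeps the results live so the calls are not optimized away.
  subroutine time_kernel(reps, seconds, checksum)
    integer,        intent(in)  :: reps
    real(c_double), intent(out) :: seconds, checksum
    real(c_double) :: mat(20,20), vec(20), res(20)
    integer(int64) :: t0, t1, rate
    integer :: k

    mat = 1.0_c_double
    vec = 1.0_c_double
    checksum = 0.0_c_double
    call system_clock(t0, rate)
    do k = 1, reps
       vec(1) = real(k, c_double)   ! vary the input between calls
       call mv_mult_20_20(mat, vec, res)
       checksum = checksum + res(1)
    end do
    call system_clock(t1)
    seconds = real(t1 - t0, c_double) / real(rate, c_double)
  end subroutine time_kernel
end module bench_sketch
```

Building the same file with each compiler and calling time_kernel with a large reps value would give a like-for-like comparison, provided the kernel is not inlined differently by the two compilers.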

Best regards,



Laércio LIMA PILLA
Postdoctoral Researcher @ Inria Grenoble - Rhône-Alpes, CORSE project-team
Associate Professor @ UFSC, Brazil


* Re: Difference in assembly code generated (gfortran vs ifort) (optimization flags missing?)
  2018-04-12 14:48   ` Steve Kargl
@ 2018-04-12 20:10     ` Thomas Koenig
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Koenig @ 2018-04-12 20:10 UTC (permalink / raw)
  To: sgk, Richard Biener; +Cc: Laércio LIMA PILLA, fortran

Steve wrote:

> It would also be interesting to see the result of replacing
> the loops with
> 
> 
>     res = matmul(mat, vec)
> 
> as tkoenig (and jerryd?) works on optimizing matmul.

For a 4*4 matrix and a 4 vector, with optimization, the code
will be inlined to equivalent DO loops. It should result in the
same speed as the explicit DO loops.

Regards

	Thomas

