public inbox for fortran@gcc.gnu.org
From: Richard Biener <richard.guenther@gmail.com>
To: "Laércio LIMA PILLA" <laercio.lima@inria.fr>
Cc: "fortran@gcc.gnu.org" <fortran@gcc.gnu.org>
Subject: Re: Difference in assembly code generated (gfortran vs ifort) (optimization flags missing?)
Date: Thu, 12 Apr 2018 14:00:00 -0000
Message-ID: <CAFiYyc3P-vvVe3L6uRkamgEpOZQ8yE53tGpp_HphD5AJOwTNgw@mail.gmail.com>
In-Reply-To: <CAF1HNd+tjDQjx-NM9mDpvRn=dDSZjCcent5y=XG35jJWJE2z2A@mail.gmail.com>

On Thu, Apr 12, 2018 at 3:55 PM, Laércio LIMA PILLA
<laercio.lima@inria.fr> wrote:
> Dear all,
>
> TL;DR version: I have been noticing very extreme performance differences
> (up to a factor of 3) between ifort and gfortran.
> As I checked the assembly code, I noticed that the compilers are using
> different instructions (e.g., ifort uses 'vbroadcastsd').
> Am I missing any special optimization flags (besides -march and -mtune
> native) or is this expected?
>
> Original version:
>
> I have been working on the optimization of a Fortran 95 [and over]
> application that makes use of several matrix-vector multiplication kernels.
> As the sizes of the matrices are well-known, the original developers
> generated different kernels for different sizes.
> An example for size 4 is given below.
>
>   USE ISO_C_BINDING
>   !...
>   subroutine mv_mult_4_4(mat,vec,res)
>     REAL(C_DOUBLE), INTENT(IN),  DIMENSION(4,4) :: mat
>     REAL(C_DOUBLE), INTENT(IN),  DIMENSION(4)   :: vec
>     REAL(C_DOUBLE), INTENT(OUT), DIMENSION(4)   :: res
>     INTEGER(C_INT) :: iRow, iCol
>
>     res = 0.0
>
>     do iCol=1,4
>        do iRow=1,4
>           res(iRow) = res(iRow) + mat(iRow,iCol)*vec(iCol)
>        end do
>     end do
>
>   end subroutine mv_mult_4_4

This snippet seems incomplete: it doesn't compile for me as posted...
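
For reference, a minimal sketch of how the quoted kernel could be made
self-contained. The enclosing module and its name (mv_kernels) are
assumptions made only so that the snippet compiles on its own; the
original application's module structure is not shown in the thread.

```fortran
! Hypothetical wrapper: the module name mv_kernels is an assumption,
! added only so that the quoted subroutine compiles stand-alone.
module mv_kernels
  use iso_c_binding, only: c_double, c_int
  implicit none
contains
  ! res = mat * vec for a fixed 4x4 matrix, as in the quoted kernel.
  subroutine mv_mult_4_4(mat, vec, res)
    real(c_double), intent(in),  dimension(4,4) :: mat
    real(c_double), intent(in),  dimension(4)   :: vec
    real(c_double), intent(out), dimension(4)   :: res
    integer(c_int) :: iRow, iCol

    res = 0.0_c_double

    ! Column-major traversal: the inner loop walks down one column of mat.
    do iCol = 1, 4
       do iRow = 1, 4
          res(iRow) = res(iRow) + mat(iRow,iCol)*vec(iCol)
       end do
    end do
  end subroutine mv_mult_4_4
end module mv_kernels
```

Something like "gfortran -O3 -march=native -S" on this file should then
produce an assembly listing to compare, although the symbol names will
differ from the ones quoted because of the assumed module.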

> I have been noticing very significant performance differences in my tests
> with gfortran, ifort, and different optimizations on my local system.
> On the special case for a 20x20 matrix, ifort provides a code that reduces
> the execution time by a factor of 3 for the same optimization flags.
> I started checking the assembly code generated by the different compilers
> and noticed some differences.
> For the code snippet above, the assembly versions from ifort and gfortran
> are presented below.
> We can notice that ifort is using some instructions (vbroadcastsd) that are
> not used by gfortran even though I am telling the compiler the specific
> architecture of my processor.
> As the general users of the application use gfortran, I would like to know:
>
> 1) Is this difference in instructions used expected?
> 2) Am I missing any additional optimization flag (besides -march and
> -mtune) that could change that?
> 3) Are there any directives (besides OpenMP ones) that could help in this
> case?

It looks like ifort vectorizes the inner loop, while GCC most likely
unrolls the inner loop completely and then vectorizes the outer loop,
which in turn requires all the shuffling (the vpermpd instructions in
your listing).  You can check whether -fdisable-tree-cunrolli changes
this (just for debugging!).
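
To illustrate what vectorizing the inner loop amounts to, here is a
sketch that writes the inner loop as a whole-column array operation:
each column of mat is scaled by the scalar vec(iCol) and accumulated
into res, which is the broadcast-plus-FMA pattern in the ifort listing.
The module and subroutine names (mv_kernels_colwise, mv_mult_4_4_cols)
are hypothetical, and whether gfortran emits vbroadcastsd for this form
is an untested assumption, not a claim.

```fortran
! Hypothetical names: mv_kernels_colwise and mv_mult_4_4_cols are
! illustrative only and do not come from the original application.
module mv_kernels_colwise
  use iso_c_binding, only: c_double, c_int
  implicit none
contains
  ! res = mat * vec, with the inner (row) loop expressed as array syntax.
  subroutine mv_mult_4_4_cols(mat, vec, res)
    real(c_double), intent(in),  dimension(4,4) :: mat
    real(c_double), intent(in),  dimension(4)   :: vec
    real(c_double), intent(out), dimension(4)   :: res
    integer(c_int) :: iCol

    res = 0.0_c_double

    ! One column per iteration: scalar vec(iCol) times column mat(:,iCol).
    do iCol = 1, 4
       res = res + mat(:,iCol)*vec(iCol)
    end do
  end subroutine mv_mult_4_4_cols
end module mv_kernels_colwise
```

Comparing the assembly of this variant with that of the original loop
nest, with and without -fdisable-tree-cunrolli, would show whether the
early complete unrolling is what leads to the permutes.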

Richard.

> Assembly:
> CPU: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
>
> ifort w/ -O3 -march=native -mtune=native -autodouble -S:
> # mark_description "Intel(R) Fortran Intel(R) 64 Compiler for applications
> running on Intel(R) 64, Version 17.0.3.191 Build 2017";
> # mark_description "0404";
> # mark_description "-O3 -march=native -mtune=native -autodouble -S";
> # -- Begin  mv_mult_4_4_
> .text
> # mark_begin;
>        .align    16,0x90
> .globl mv_mult_4_4_
> mv_mult_4_4_:
> # parameter 1: %rdi
> # parameter 2: %rsi
> # parameter 3: %rdx
> #...
>         vbroadcastsd (%rsi), %ymm0                              #157.50
>         vbroadcastsd 8(%rsi), %ymm2                             #157.50
>         vbroadcastsd 16(%rsi), %ymm3                            #157.50
>         vbroadcastsd 24(%rsi), %ymm4                            #157.50
>         vmulpd    (%rdi), %ymm0, %ymm1                          #157.11
>         vfmadd132pd 32(%rdi), %ymm1, %ymm2                      #157.11
>         vfmadd132pd 64(%rdi), %ymm2, %ymm3                      #157.11
>         vfmadd132pd 96(%rdi), %ymm3, %ymm4                      #157.11
>         vmovupd   %ymm4, (%rdx)                                 #157.11
>         vzeroupper                                              #165.3
>         ret                                                     #165.3
>         .align    16,0x90
>                                 # LOE
> .cfi_endproc
> # mark_end;
>
> ---
>
> gfortran (GNU Fortran (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0) w/ -O3
> -march=native -mtune=native -fdefault-double-8 -fdefault-real-8 -S:
> .p2align 4,,15
> .globl __mv_mult_4_4
> .type __mv_mult_4_4, @function
> __mv_mult_4_4:
> .LFB12:
> .cfi_startproc
> vpxor %xmm0, %xmm0, %xmm0
> vmovups %xmm0, (%rdx)
> vmovups %xmm0, 16(%rdx)
> vmovupd (%rsi), %ymm0
> vmovupd (%rdx), %ymm4
> vpermpd $0, %ymm0, %ymm3
> vfmadd132pd (%rdi), %ymm4, %ymm3
> vpermpd $85, %ymm0, %ymm2
> vfmadd132pd 32(%rdi), %ymm3, %ymm2
> vpermpd $170, %ymm0, %ymm1
> vpermpd $255, %ymm0, %ymm0
> vfmadd132pd 64(%rdi), %ymm2, %ymm1
> vfmadd132pd 96(%rdi), %ymm1, %ymm0
> vmovupd %ymm0, (%rdx)
> vzeroupper
> ret
> .cfi_endproc
> .LFE12:
> .size __mv_mult_4_4, .-__mv_mult_4_4
> .p2align 4,,15
>
> ---
>
> Best regards,
>
> Laércio LIMA PILLA
> Postdoctoral Researcher @ Inria Grenoble - Rhône-Alpes, CORSE project-team
> Associate Professor @ UFSC, Brazil

Thread overview: 5+ messages
2018-04-12 13:56 Laércio LIMA PILLA
2018-04-12 14:00 ` Richard Biener [this message]
2018-04-12 14:48   ` Steve Kargl
2018-04-12 20:10     ` Thomas Koenig
2018-04-12 15:00   ` Laércio LIMA PILLA
