[Bug middle-end/41969] New: [LTO] 23% slow-down with -flto -fwhole-program

public inbox for gcc-bugs@sourceware.org

* [Bug middle-end/41969]  New: [LTO] 23% slow-down with -flto -fwhole-program
@ 2009-11-06 13:27 burnus at gcc dot gnu dot org
  2009-11-06 16:42 ` [Bug middle-end/41969] " rguenth at gcc dot gnu dot org
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: burnus at gcc dot gnu dot org @ 2009-11-06 13:27 UTC (permalink / raw)
  To: gcc-bugs

The following test case takes

  9.100s with gfortran -O3
 11.113s with gfortran -flto -fwhole-program -O3
  9.582s with ifort -O3

  5.205s with gfortran -ffast-math -march=native -funroll-loops -O3
  7.414s with gfortran -ffast-math -march=native -funroll-loops -flto -fwhole-program -O3
  9.624s with ifort -xHost -O3

on an AMD Athlon(tm) 64 X2 Dual Core Processor 4800+
running openSUSE 11.2 RC 1 (x86_64).


Thus the LTO version takes 23% longer; with unrolled loops, fast math, and
-march=native, the LTO version is even 42% slower.


Without the "print" statement, the "ifort" version takes only 0.002s, which
means that it optimizes the complete loop away; GCC does not (why?).
With the "print" statement kept, gfortran wins the race by 5% to 46%.
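
As a cross-check of the percentages quoted above, here is a minimal sketch that recomputes them from the reported timings. The helper names (`extra_time_pct`, `time_saved_pct`) are made up for illustration and are not part of the report. It assumes "slower" means extra run time relative to the faster build and "wins by" means run time saved relative to ifort. Under that reading the plain -O3 pair comes out at roughly 22%, close to the 23% quoted.

```python
# Timings in seconds, copied from the report above.
GFORTRAN_O3 = 9.100
GFORTRAN_O3_LTO = 11.113
IFORT_O3 = 9.582
GFORTRAN_FAST = 5.205
GFORTRAN_FAST_LTO = 7.414
IFORT_XHOST = 9.624


def extra_time_pct(slow, fast):
    """Extra run time of `slow` relative to `fast`, in percent."""
    return (slow / fast - 1.0) * 100.0


def time_saved_pct(fast, slow):
    """Run time saved by `fast` relative to `slow`, in percent."""
    return (1.0 - fast / slow) * 100.0


# LTO slow-down for the two gfortran flag sets.
print("LTO vs. plain -O3:      %.1f%%" % extra_time_pct(GFORTRAN_O3_LTO, GFORTRAN_O3))
print("LTO vs. fast-math set:  %.1f%%" % extra_time_pct(GFORTRAN_FAST_LTO, GFORTRAN_FAST))

# gfortran (without LTO) compared with ifort.
print("gfortran vs. ifort -O3:    %.1f%%" % time_saved_pct(GFORTRAN_O3, IFORT_O3))
print("gfortran vs. ifort -xHost: %.1f%%" % time_saved_pct(GFORTRAN_FAST, IFORT_XHOST))
```

The two helpers are kept separate because "x% slower" and "wins by x%" use different baselines, which is easy to mix up when comparing such numbers.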


module vector_ops
contains
  FUNCTION vector_function(v1,v2) RESULT(v3)
    IMPLICIT NONE
    INTEGER, PARAMETER :: ndim = 100
    REAL*8, DIMENSION(ndim), INTENT(IN) :: v1, v2
    REAL*8, DIMENSION(ndim) :: v3
    INTEGER :: i

    DO i = 1, ndim
      v3(i) = 0.5D0 * v1(i) + 0.5D0 * v2(i)
    END DO
  END FUNCTION vector_function
end module vector_ops

PROGRAM vectorsyntax
  USE vector_ops
  IMPLICIT NONE
  INTEGER, PARAMETER :: ndim = 100
  REAL*8, DIMENSION(ndim) :: u, v, w
  INTEGER :: i, iter

  DO i = 1, ndim
    CALL random_number( u(i) )
    CALL random_number( v(i) )
  END DO

  iter = 50000000
  DO i = 1, iter
    w = vector_function( u, v )
  END DO
  print *, w(1),w(50),w(100)
END PROGRAM


-- 
           Summary: [LTO] 23% slow-down with -flto -fwhole-program
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Keywords: lto
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: burnus at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41969



* [Bug middle-end/41969] [LTO] 23% slow-down with -flto -fwhole-program
  2009-11-06 13:27 [Bug middle-end/41969] New: [LTO] 23% slow-down with -flto -fwhole-program burnus at gcc dot gnu dot org
@ 2009-11-06 16:42 ` rguenth at gcc dot gnu dot org
  2009-11-06 16:46 ` rguenth at gcc dot gnu dot org
  2009-11-06 16:50 ` rguenth at gcc dot gnu dot org
  2 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-11-06 16:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2009-11-06 16:42 -------
I'd be interested in -O3 [-fwhole-program] -fwhole-file numbers.  This
testcase shouldn't need any LTO to be optimized properly.



* [Bug middle-end/41969] [LTO] 23% slow-down with -flto -fwhole-program
  2009-11-06 13:27 [Bug middle-end/41969] New: [LTO] 23% slow-down with -flto -fwhole-program burnus at gcc dot gnu dot org
  2009-11-06 16:42 ` [Bug middle-end/41969] " rguenth at gcc dot gnu dot org
@ 2009-11-06 16:46 ` rguenth at gcc dot gnu dot org
  2009-11-06 16:50 ` rguenth at gcc dot gnu dot org
  2 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-11-06 16:46 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from rguenth at gcc dot gnu dot org  2009-11-06 16:46 -------
For me it's all the same speed apart from plain -O3, which is slower.  Basically
the theory is that once the function is inlined, we know the alignment of the
arrays and thus can do aligned loads/stores in the vectorized code (and avoid
some runtime tests).



* [Bug middle-end/41969] [LTO] 23% slow-down with -flto -fwhole-program
  2009-11-06 13:27 [Bug middle-end/41969] New: [LTO] 23% slow-down with -flto -fwhole-program burnus at gcc dot gnu dot org
  2009-11-06 16:42 ` [Bug middle-end/41969] " rguenth at gcc dot gnu dot org
  2009-11-06 16:46 ` rguenth at gcc dot gnu dot org
@ 2009-11-06 16:50 ` rguenth at gcc dot gnu dot org
  2 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-11-06 16:50 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from rguenth at gcc dot gnu dot org  2009-11-06 16:50 -------
It's btw another case where we do not try to force alignment of the
stack locals :/



* [Bug middle-end/41969] [LTO] 23% slow-down with -flto -fwhole-program
       [not found] <bug-41969-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2012-04-29 18:49 ` mikolajmm at gmail dot com
@ 2012-04-30  0:39 ` hubicka at ucw dot cz
  3 siblings, 0 replies; 9+ messages in thread
From: hubicka at ucw dot cz @ 2012-04-30  0:39 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41969

--- Comment #7 from Jan Hubicka <hubicka at ucw dot cz> 2012-04-30 00:39:08 UTC ---
It doesn't help to attach this to a closed bug report.  LTO allows GCC to do a
lot more transformations, and sometimes the transformations hurt the final
performance.  This is a bug and we will try to fix it.
From the information you gave, however, it is impossible to guess what may
possibly go wrong.
If you provide some more analysis of why your program gets slower and attach
it to a (new) PR, we will try to take a look.

Honza



* [Bug middle-end/41969] [LTO] 23% slow-down with -flto -fwhole-program
       [not found] <bug-41969-4@http.gcc.gnu.org/bugzilla/>
  2012-03-13 23:52 ` pinskia at gcc dot gnu.org
  2012-03-18 16:32 ` burnus at gcc dot gnu.org
@ 2012-04-29 18:49 ` mikolajmm at gmail dot com
  2012-04-30  0:39   ` Jan Hubicka
  2012-04-30  0:39 ` hubicka at ucw dot cz
  3 siblings, 1 reply; 9+ messages in thread
From: mikolajmm at gmail dot com @ 2012-04-29 18:49 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41969

travnick <mikolajmm at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mikolajmm at gmail dot com

--- Comment #6 from travnick <mikolajmm at gmail dot com> 2012-04-29 18:48:49 UTC ---
I have the same issue with these flags:
-fwhole-program
-flto (plain -flto is even slower; -flto=2 is faster for me and gives the result below)
-fuse-linker-plugin

In my project, https://github.com/travnick/GKiO-Projekt-GK,
the default scene renders in 30-32s on my machine after applying these flags.
Without them it takes only 19-22s, so these flags make my project slower by 50%.
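
Since both timings are given as ranges, the 50% figure can be sanity-checked with a small sketch. The function names are hypothetical, and reading "30-32s" and "19-22s" as (low, high) bounds is an assumption.

```python
def slowdown_bounds(with_flags, without_flags):
    """Smallest and largest relative slow-down (percent) for two (low, high) ranges."""
    low_with, high_with = with_flags
    low_without, high_without = without_flags
    smallest = (low_with / high_without - 1.0) * 100.0   # fastest flagged run vs. slowest plain run
    largest = (high_with / low_without - 1.0) * 100.0    # slowest flagged run vs. fastest plain run
    return smallest, largest


def midpoint_slowdown(with_flags, without_flags):
    """Relative slow-down (percent) between the midpoints of the two ranges."""
    mid_with = sum(with_flags) / 2.0
    mid_without = sum(without_flags) / 2.0
    return (mid_with / mid_without - 1.0) * 100.0


WITH_FLAGS = (30.0, 32.0)      # seconds, render time with the LTO flags
WITHOUT_FLAGS = (19.0, 22.0)   # seconds, render time without them

smallest, largest = slowdown_bounds(WITH_FLAGS, WITHOUT_FLAGS)
print("slow-down between %.0f%% and %.0f%%, midpoint %.0f%%"
      % (smallest, largest, midpoint_slowdown(WITH_FLAGS, WITHOUT_FLAGS)))
```

The midpoint estimate lands near the quoted 50%, while the bounds show how wide the uncertainty is when only ranges are reported.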

gcc version 4.7.0 20120414 (prerelease) (GCC)

Linux 3.3-pf #2 SMP PREEMPT Sat Apr 28 02:09:06 EEST 2012 x86_64 Intel(R)
Core(TM)2 Duo CPU T6600 @ 2.20GHz GenuineIntel GNU/Linux



* [Bug middle-end/41969] [LTO] 23% slow-down with -flto -fwhole-program
       [not found] <bug-41969-4@http.gcc.gnu.org/bugzilla/>
  2012-03-13 23:52 ` pinskia at gcc dot gnu.org
@ 2012-03-18 16:32 ` burnus at gcc dot gnu.org
  2012-04-29 18:49 ` mikolajmm at gmail dot com
  2012-04-30  0:39 ` hubicka at ucw dot cz
  3 siblings, 0 replies; 9+ messages in thread
From: burnus at gcc dot gnu.org @ 2012-03-18 16:32 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41969

Tobias Burnus <burnus at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |burnus at gcc dot gnu.org
         Resolution|                            |FIXED

--- Comment #5 from Tobias Burnus <burnus at gcc dot gnu.org> 2012-03-18 16:23:12 UTC ---
(In reply to comment #4)
> Is this still true?

It seems to be FIXED - and the performance is now better as well!

(In reply to comment #0)
> The following test case takes
>   9.100s with gfortran -O3
>  11.113s with gfortran -flto -fwhole-program -O3

Now with GCC 4.8: 6.780s (For completeness, with -fno-whole-file: 11.106s)

>   5.205s with gfortran -ffast-math -march=native -funroll-loops -O3
>   7.414s with gfortran -ffast-math -march=native -funroll-loops -flto
> -fwhole-program -O3

Now: 4.765s



* [Bug middle-end/41969] [LTO] 23% slow-down with -flto -fwhole-program
       [not found] <bug-41969-4@http.gcc.gnu.org/bugzilla/>
@ 2012-03-13 23:52 ` pinskia at gcc dot gnu.org
  2012-03-18 16:32 ` burnus at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2012-03-13 23:52 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41969

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-03-13 23:49:58 UTC ---
Is this still true?


