public inbox for gcc-bugs@sourceware.org
* [Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
       [not found] <bug-38306-4@http.gcc.gnu.org/bugzilla/>
@ 2010-10-01 12:05 ` jakub at gcc dot gnu.org
  2011-02-20 15:33 ` steven at gcc dot gnu.org
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: jakub at gcc dot gnu.org @ 2010-10-01 12:05 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.5                       |4.4.6



* [Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
       [not found] <bug-38306-4@http.gcc.gnu.org/bugzilla/>
  2010-10-01 12:05 ` [Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures jakub at gcc dot gnu.org
@ 2011-02-20 15:33 ` steven at gcc dot gnu.org
  2011-02-20 16:24 ` Joost.VandeVondele at pci dot uzh.ch
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: steven at gcc dot gnu.org @ 2011-02-20 15:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING
                 CC|                            |steven at gcc dot gnu.org

--- Comment #18 from Steven Bosscher <steven at gcc dot gnu.org> 2011-02-20 15:22:26 UTC ---
Hello Joost, could you please check if this is still a problem in GCC 4.6?



* [Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
       [not found] <bug-38306-4@http.gcc.gnu.org/bugzilla/>
  2010-10-01 12:05 ` [Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures jakub at gcc dot gnu.org
  2011-02-20 15:33 ` steven at gcc dot gnu.org
@ 2011-02-20 16:24 ` Joost.VandeVondele at pci dot uzh.ch
  2011-02-20 16:32 ` Joost.VandeVondele at pci dot uzh.ch
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Joost.VandeVondele at pci dot uzh.ch @ 2011-02-20 16:24 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

--- Comment #19 from Joost VandeVondele <Joost.VandeVondele at pci dot uzh.ch> 2011-02-20 16:17:33 UTC ---
(In reply to comment #18)
> Hello Joost, could you please check if this is still a problem in GCC 4.6?

I think it is still a minor problem, but (without -fschedule-insns) somewhat
less pronounced (the old hardware is gone, which might make a difference):

4.3 branch

> gfortran -O3 -march=native -funroll-loops  -ffast-math   -fschedule-insns test.f90 ; ./a.out 
Time for evaluation [s]:                        3.478
> gfortran -O3 -march=native -funroll-loops  -ffast-math   test.f90 ; ./a.out 
Time for evaluation [s]:                        4.367

4.5 branch

> gfortran -O3 -march=native -funroll-loops  -ffast-math   -fschedule-insns test.f90 ; ./a.out 
Time for evaluation [s]:                        4.839
> gfortran -O3 -march=native -funroll-loops  -ffast-math  test.f90 ; ./a.out 
Time for evaluation [s]:                        4.524

4.6 branch
> gfortran -O3 -march=native -funroll-loops  -ffast-math   -fschedule-insns test.f90 ; ./a.out 
Time for evaluation [s]:                        4.997
> gfortran -O3 -march=native -funroll-loops  -ffast-math   test.f90 ; ./a.out 
Time for evaluation [s]:                        4.547

FYI: -march=amdfam10 -mcx16 -msahf -mpopcnt -mabm
model name      : AMD Opteron(tm) Processor 6176 SE
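
For reference, the relative slowdowns implied by the timings above can be worked out as follows. This is only a sketch: the dictionary layout and the helper name `slowdown_pct` are illustrative assumptions, not part of the original report; the numbers are copied from the runs above.

```python
# Timings in seconds, copied from the runs above (AMD Opteron 6176 SE).
# Keys: (branch, whether -fschedule-insns was passed).
timings = {
    ("4.3", True): 3.478,
    ("4.3", False): 4.367,
    ("4.5", True): 4.839,
    ("4.5", False): 4.524,
    ("4.6", True): 4.997,
    ("4.6", False): 4.547,
}


def slowdown_pct(branch, sched, baseline="4.3"):
    """Percent increase in run time of `branch` over `baseline` for the same flag set."""
    return 100.0 * (timings[(branch, sched)] / timings[(baseline, sched)] - 1.0)


for branch in ("4.5", "4.6"):
    for sched in (True, False):
        label = "with" if sched else "without"
        print(f"{branch} {label} -fschedule-insns: "
              f"{slowdown_pct(branch, sched):+.1f}% vs 4.3")
```

With -fschedule-insns the 4.5 and 4.6 branches come out roughly 39% and 44% slower than 4.3; without it the gap is about 4%.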



* [Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
       [not found] <bug-38306-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2011-02-20 16:24 ` Joost.VandeVondele at pci dot uzh.ch
@ 2011-02-20 16:32 ` Joost.VandeVondele at pci dot uzh.ch
  2011-02-20 16:50 ` Joost.VandeVondele at pci dot uzh.ch
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Joost.VandeVondele at pci dot uzh.ch @ 2011-02-20 16:32 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

--- Comment #20 from Joost VandeVondele <Joost.VandeVondele at pci dot uzh.ch> 2011-02-20 16:28:00 UTC ---
Additionally, for trunk, -flto and -fprofile-use do not seem to help:

> gfortran -O3 -march=native -funroll-loops  -ffast-math  -flto -fprofile-use test.f90 ; ./a.out 
Time for evaluation [s]:                        4.664

> gfortran -O3 -march=native -funroll-loops  -ffast-math   -fprofile-use test.f90 ; ./a.out 
Time for evaluation [s]:                        4.665



* [Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
       [not found] <bug-38306-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2011-02-20 16:32 ` Joost.VandeVondele at pci dot uzh.ch
@ 2011-02-20 16:50 ` Joost.VandeVondele at pci dot uzh.ch
  2011-02-20 18:59 ` steven at gcc dot gnu.org
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Joost.VandeVondele at pci dot uzh.ch @ 2011-02-20 16:50 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

--- Comment #21 from Joost VandeVondele <Joost.VandeVondele at pci dot uzh.ch> 2011-02-20 16:32:38 UTC ---
... however, the following works great:

> gfortran -O2 -march=native -funroll-loops  -ffast-math  -ftree-vectorize test.f90 ; ./a.out 
Time for evaluation [s]:                        2.700

(notice -O2 instead of -O3; with -ftree-vectorize, -O2 is thus roughly 1.7 times as fast as the -O3 runs above)



* [Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
       [not found] <bug-38306-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2011-02-20 16:50 ` Joost.VandeVondele at pci dot uzh.ch
@ 2011-02-20 18:59 ` steven at gcc dot gnu.org
  2011-02-21  8:29 ` bonzini at gnu dot org
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: steven at gcc dot gnu.org @ 2011-02-20 18:59 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW



* [Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
       [not found] <bug-38306-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2011-02-20 18:59 ` steven at gcc dot gnu.org
@ 2011-02-21  8:29 ` bonzini at gnu dot org
  2011-02-21 12:56 ` Joost.VandeVondele at pci dot uzh.ch
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: bonzini at gnu dot org @ 2011-02-21  8:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

--- Comment #22 from Paolo Bonzini <bonzini at gnu dot org> 2011-02-21 07:55:35 UTC ---
What is the performance with 4.3 -O2?  A regression that is limited to -O3 is
(a bit) less important since -O3 is still a "mixed bag" of optimizations that
might or might not be profitable.



* [Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
       [not found] <bug-38306-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2011-02-21  8:29 ` bonzini at gnu dot org
@ 2011-02-21 12:56 ` Joost.VandeVondele at pci dot uzh.ch
  2011-04-16 11:17 ` [Bug target/38306] [4.4/4.5/4.6/4.7 " jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Joost.VandeVondele at pci dot uzh.ch @ 2011-02-21 12:56 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

--- Comment #23 from Joost VandeVondele <Joost.VandeVondele at pci dot uzh.ch> 2011-02-21 12:53:30 UTC ---
(In reply to comment #22)
> What is the performance with 4.3 -O2?  

4.3:
> gfortran -O2 -march=native -funroll-loops -ffast-math test.f90 ; ./a.out
Time for evaluation [s]:                        4.373

4.6:
>  gfortran -O2 -march=native -funroll-loops -ffast-math test.f90 ; ./a.out
Time for evaluation [s]:                        4.347

so, same performance. 

Given that vectorization is only enabled by default at -O3, it is an important
optimization level for numerical codes. Nevertheless, I would propose to remove
the regression tag, and instead refocus the bug on what current trunk does at
-O3 vs -O2 -ftree-vectorize, as noted in comment #21:

> gfortran -O2 -march=native -funroll-loops  -ffast-math  -ftree-vectorize test.f90 ; ./a.out
Time for evaluation [s]:                        2.694

> gfortran -O3 -march=native -funroll-loops  -ffast-math  -ftree-vectorize test.f90 ; ./a.out
Time for evaluation [s]:                        4.536
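
The size of the gap between the two runs above can be expressed as a simple ratio. A small sketch; the function name `speedup` is an illustrative choice and the two timings are the ones just quoted.

```python
def speedup(slow_seconds, fast_seconds):
    """How many times faster the fast run is than the slow run."""
    return slow_seconds / fast_seconds


o2_vectorized = 2.694  # -O2 ... -ftree-vectorize, seconds
o3_vectorized = 4.536  # -O3 ... -ftree-vectorize, seconds

print(f"-O2 -ftree-vectorize is {speedup(o3_vectorized, o2_vectorized):.2f}x "
      "as fast as -O3 -ftree-vectorize")
```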



* [Bug target/38306] [4.4/4.5/4.6/4.7 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
       [not found] <bug-38306-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2011-02-21 12:56 ` Joost.VandeVondele at pci dot uzh.ch
@ 2011-04-16 11:17 ` jakub at gcc dot gnu.org
  2011-09-09 19:11 ` Joost.VandeVondele at pci dot uzh.ch
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-04-16 11:17 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.6                       |4.4.7



* [Bug target/38306] [4.4/4.5/4.6/4.7 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
       [not found] <bug-38306-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2011-04-16 11:17 ` [Bug target/38306] [4.4/4.5/4.6/4.7 " jakub at gcc dot gnu.org
@ 2011-09-09 19:11 ` Joost.VandeVondele at pci dot uzh.ch
  2011-09-10  9:52 ` manu at gcc dot gnu.org
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Joost.VandeVondele at pci dot uzh.ch @ 2011-09-09 19:11 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

Joost VandeVondele <Joost.VandeVondele at pci dot uzh.ch> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2011-02-20 19:01:16         |2011-09-09 19:01:16

--- Comment #24 from Joost VandeVondele <Joost.VandeVondele at pci dot uzh.ch> 2011-09-09 19:06:50 UTC ---
Checked current trunk again; the situation remains that -O2 is much faster than
-O3:

> gfortran -O2 -march=native -funroll-loops  -ffast-math  -ftree-vectorize pr38306.f90  ; ./a.out
Time for evaluation [s]:                        2.830

> gfortran -O3 -march=native -funroll-loops  -ffast-math  -ftree-vectorize pr38306.f90  ; ./a.out
Time for evaluation [s]:                        4.593

The issue is that at -O3 the subroutine PD2VAL is not vectorized, while it is
at -O2.



* [Bug target/38306] [4.4/4.5/4.6/4.7 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
       [not found] <bug-38306-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2011-09-09 19:11 ` Joost.VandeVondele at pci dot uzh.ch
@ 2011-09-10  9:52 ` manu at gcc dot gnu.org
  2011-09-10 13:11 ` ubizjak at gmail dot com
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: manu at gcc dot gnu.org @ 2011-09-10  9:52 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

Manuel López-Ibáñez <manu at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |manu at gcc dot gnu.org

--- Comment #25 from Manuel López-Ibáñez <manu at gcc dot gnu.org> 2011-09-10 09:43:58 UTC ---
(In reply to comment #24)
> 
> The issue is that at -O3 the subroutine PD2VAL is not vectorized, while it is
> at -O2.

If you are interested in investigating why this is so by yourself, I would
suggest that you use the various -fdump- options to check what GCC is doing
differently between the two variants. 

1) Dump everything you can dump.

2) Then find the earliest optimization pass where they differ (you may even use
diff to make this faster).

3) Check subsequent dumps to see if that difference is actually what makes -O3
not vectorize. (At this point you can play with -f* -fno-* to reduce the
differences further and isolate the trigger).
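
Step 2 above can be automated along these lines. This is a sketch under stated assumptions, not an established tool: it assumes the two variants were each compiled with -fdump-tree-all in separate directories, that tree dumps are named `<source>.<NNN>t.<pass>`, and that comparing whole files is good enough (it does not isolate a single function). The names `tree_dumps`, `pass_number` and `first_differing_dump` are hypothetical.

```python
import filecmp
import re
from pathlib import Path

# Assumed tree-dump naming: <source>.<pass number>t.<pass name>
TREE_DUMP = re.compile(r"\.(\d+)t\.[^.]+$")


def tree_dumps(directory):
    """Map dump file name -> path for every tree dump found in `directory`."""
    return {p.name: p for p in Path(directory).iterdir()
            if TREE_DUMP.search(p.name)}


def pass_number(name):
    """Numeric pass index embedded in a dump file name."""
    return int(TREE_DUMP.search(name).group(1))


def first_differing_dump(dir_a, dir_b):
    """Name of the lowest-numbered tree dump whose contents differ, or None."""
    a, b = tree_dumps(dir_a), tree_dumps(dir_b)
    for name in sorted(a.keys() & b.keys(), key=pass_number):
        # shallow=False forces a byte-by-byte comparison of the contents.
        if not filecmp.cmp(a[name], b[name], shallow=False):
            return name
    return None
```

A call such as `first_differing_dump("dumps_O2", "dumps_O3")` (directory names are placeholders) returns the file to start reading; dumps that exist in only one directory are ignored by this sketch.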



* [Bug target/38306] [4.4/4.5/4.6/4.7 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
       [not found] <bug-38306-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2011-09-10  9:52 ` manu at gcc dot gnu.org
@ 2011-09-10 13:11 ` ubizjak at gmail dot com
  2011-09-13  8:30 ` Joost.VandeVondele at pci dot uzh.ch
  2012-01-16 12:41 ` rguenth at gcc dot gnu.org
  13 siblings, 0 replies; 15+ messages in thread
From: ubizjak at gmail dot com @ 2011-09-10 13:11 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

--- Comment #26 from Uros Bizjak <ubizjak at gmail dot com> 2011-09-10 12:31:23 UTC ---
At -O3, vectorizer says:

pr38306.f90:246: note: not vectorized: the size of group of strided accesses is not a power of 2
D.2258_518 = *c0_193(D)[D.2257_517];
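
As a rough illustration of the condition named in that message (this is not GCC's implementation, and the group sizes below are made-up examples, since the actual group size is not stated in the report), a power-of-two test can be sketched as:

```python
def is_power_of_two(n):
    """True when n is a positive integer with exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


# Hypothetical group sizes, for illustration only.
for group_size in (2, 3, 4, 6, 8):
    verdict = "power of 2" if is_power_of_two(group_size) else "not a power of 2"
    print(group_size, verdict)
```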



* [Bug target/38306] [4.4/4.5/4.6/4.7 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
       [not found] <bug-38306-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2011-09-10 13:11 ` ubizjak at gmail dot com
@ 2011-09-13  8:30 ` Joost.VandeVondele at pci dot uzh.ch
  2012-01-16 12:41 ` rguenth at gcc dot gnu.org
  13 siblings, 0 replies; 15+ messages in thread
From: Joost.VandeVondele at pci dot uzh.ch @ 2011-09-13  8:30 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

--- Comment #27 from Joost VandeVondele <Joost.VandeVondele at pci dot uzh.ch> 2011-09-13 07:59:06 UTC ---
(In reply to comment #25)
> 2) Then find the earliest optimization pass where they differ (you may even use
> diff to make this faster).

The first point where things differ for PD2VAL is 

pr38306_xxx.f90.057t.cunrolli

After that, everything seems completely different.



* [Bug target/38306] [4.4/4.5/4.6/4.7 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
       [not found] <bug-38306-4@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2011-09-13  8:30 ` Joost.VandeVondele at pci dot uzh.ch
@ 2012-01-16 12:41 ` rguenth at gcc dot gnu.org
  13 siblings, 0 replies; 15+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-01-16 12:41 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WORKSFORME

--- Comment #28 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-01-16 12:38:17 UTC ---
Using original flags (-O3 -march=native -funroll-loops -ffast-math) on
an AMD Athlon(tm) 64 X2 (that's close enough to "Opteron") (same family 15):

4.2.4: 5.78s
4.3.6: 5.77s
4.4.6: 5.84s
4.5.3: 5.77s
4.6.2: 5.85s
trunk: 5.75s

seems to be a wash for me and I cannot reproduce even the originally
reported slowdown in 4.4.  There are measurements with very many different
flag combinations in this report, which makes it unlikely that this bug will
ever be properly triaged (or even "fixed").
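
To put a number on "a wash", the spread of the six timings above can be computed as below. A minimal sketch; the variable and function names are illustrative, and the values are copied from the list above.

```python
# Run times in seconds from the list above (AMD Athlon 64 X2).
timings = {
    "4.2.4": 5.78,
    "4.3.6": 5.77,
    "4.4.6": 5.84,
    "4.5.3": 5.77,
    "4.6.2": 5.85,
    "trunk": 5.75,
}


def pct_over(value, reference):
    """Percent by which `value` exceeds `reference`."""
    return 100.0 * (value - reference) / reference


# Slowest versus fastest release, and 4.4 versus 4.3.
spread_pct = pct_over(max(timings.values()), min(timings.values()))
regression_pct = pct_over(timings["4.4.6"], timings["4.3.6"])

print(f"slowest vs fastest: {spread_pct:.1f}%")
print(f"4.4.6 vs 4.3.6:     {regression_pct:.1f}%")
```

Both figures stay below 2%, well short of the 15% in the bug title.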

Note that in general we'd like to concentrate on monitoring performance
with standard flags (thus, not including -fschedule-insns if it is not enabled
by default).  -Ofast -funroll-loops is a reasonable flag set, as we (still)
do not enable loop unrolling by default at -O3.

I'm closing this as worksforme.  Please try to not make too much of a mess
out of regression bug reports ;)



* [Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
@ 2010-04-30  9:00 ` jakub at gcc dot gnu dot org
  0 siblings, 0 replies; 15+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-04-30  9:00 UTC (permalink / raw)
  To: gcc-bugs



-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.4                       |4.4.5


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306



Thread overview: 15+ messages
     [not found] <bug-38306-4@http.gcc.gnu.org/bugzilla/>
2010-10-01 12:05 ` [Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures jakub at gcc dot gnu.org
2011-02-20 15:33 ` steven at gcc dot gnu.org
2011-02-20 16:24 ` Joost.VandeVondele at pci dot uzh.ch
2011-02-20 16:32 ` Joost.VandeVondele at pci dot uzh.ch
2011-02-20 16:50 ` Joost.VandeVondele at pci dot uzh.ch
2011-02-20 18:59 ` steven at gcc dot gnu.org
2011-02-21  8:29 ` bonzini at gnu dot org
2011-02-21 12:56 ` Joost.VandeVondele at pci dot uzh.ch
2011-04-16 11:17 ` [Bug target/38306] [4.4/4.5/4.6/4.7 " jakub at gcc dot gnu.org
2011-09-09 19:11 ` Joost.VandeVondele at pci dot uzh.ch
2011-09-10  9:52 ` manu at gcc dot gnu.org
2011-09-10 13:11 ` ubizjak at gmail dot com
2011-09-13  8:30 ` Joost.VandeVondele at pci dot uzh.ch
2012-01-16 12:41 ` rguenth at gcc dot gnu.org
2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
2010-04-30  9:00 ` [Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures jakub at gcc dot gnu dot org
