public inbox for gcc-bugs@sourceware.org
* [Bug target/38306]  New: [4.4 Regression] 15% slowdown of computational kernel
@ 2008-11-28 16:02 jv244 at cam dot ac dot uk
  2008-11-28 16:03 ` [Bug target/38306] " jv244 at cam dot ac dot uk
                   ` (23 more replies)
  0 siblings, 24 replies; 25+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-11-28 16:02 UTC (permalink / raw)
  To: gcc-bugs

The (to be) attached code runs about 15% slower (4.4 vs 4.2) when compiled with:
gfortran -O3 -march=native -funroll-loops  -ffast-math test.f90

4.4:  5.060s
4.3:  4.376s
4.2:  4.316s

Most of the time should be spent in PD2VAL.
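
The slowdown can be recomputed from the three timings above. A minimal sketch in C; `slowdown_percent` is a hypothetical helper introduced only for illustration and is not part of the testcase:

```c
/* Relative slowdown of a new run against a baseline, in percent.
   Hypothetical helper; both arguments are wall-clock times in seconds. */
double slowdown_percent(double new_seconds, double base_seconds)
{
    return (new_seconds / base_seconds - 1.0) * 100.0;
}
```

With the numbers above, `slowdown_percent(5.060, 4.376)` comes out at roughly 15.6 (4.4 vs 4.3) and `slowdown_percent(5.060, 4.316)` at roughly 17.2 (4.4 vs 4.2).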

FYI, the cpu is:

cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 8218
stepping        : 2
cpu MHz         : 2612.084
cache size      : 1024 KB
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext
3dnow pni cx16 lahf_lm cmp_legacy svm cr8_legacy

(-march=native -> -march=k8-sse3 -mcx16 -msahf --param l1-cache-size=64 --param
l1-cache-line-size=64 --param l2-cache-size=1024 -mtune=k8)

On Core2, 4.4 is actually faster:

4.4: 4.236s
4.3.0: 4.572s

(-march=native -> -march=core2 -mcx16 -msahf --param l1-cache-size=32 --param
l1-cache-line-size=64 -mtune=core2)


-- 
           Summary: [4.4 Regression] 15% slowdown of computational kernel
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jv244 at cam dot ac dot uk
  GCC host triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4 Regression] 15% slowdown of computational kernel
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
@ 2008-11-28 16:03 ` jv244 at cam dot ac dot uk
  2008-11-28 16:25 ` pinskia at gcc dot gnu dot org
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-11-28 16:03 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from jv244 at cam dot ac dot uk  2008-11-28 16:01 -------
Created an attachment (id=16788)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16788&action=view)
testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4 Regression] 15% slowdown of computational kernel
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
  2008-11-28 16:03 ` [Bug target/38306] " jv244 at cam dot ac dot uk
@ 2008-11-28 16:25 ` pinskia at gcc dot gnu dot org
  2008-11-30 11:40 ` rguenth at gcc dot gnu dot org
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-11-28 16:25 UTC (permalink / raw)
  To: gcc-bugs



-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu dot
                   |                            |org
   Target Milestone|---                         |4.4.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4 Regression] 15% slowdown of computational kernel
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
  2008-11-28 16:03 ` [Bug target/38306] " jv244 at cam dot ac dot uk
  2008-11-28 16:25 ` pinskia at gcc dot gnu dot org
@ 2008-11-30 11:40 ` rguenth at gcc dot gnu dot org
  2008-11-30 11:49 ` rguenth at gcc dot gnu dot org
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-11-30 11:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from rguenth at gcc dot gnu dot org  2008-11-30 11:38 -------
Due to the high density of branches in the code this is easily a code layout
and/or padding issue.  Different architectures have different constraints on
their decoders and branch predictors related to branch density.  Core
introduces other branch limitations for loops that engage the loop stream
detector.

We do not at all try to properly optimize (or even model) this apart
from inserting nops.  YMMV with -fschedule-insns.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4 Regression] 15% slowdown of computational kernel
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (2 preceding siblings ...)
  2008-11-30 11:40 ` rguenth at gcc dot gnu dot org
@ 2008-11-30 11:49 ` rguenth at gcc dot gnu dot org
  2008-11-30 16:18 ` jv244 at cam dot ac dot uk
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-11-30 11:49 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from rguenth at gcc dot gnu dot org  2008-11-30 11:48 -------
Oh, maybe try -fno-tree-reassoc as well.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4 Regression] 15% slowdown of computational kernel
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (3 preceding siblings ...)
  2008-11-30 11:49 ` rguenth at gcc dot gnu dot org
@ 2008-11-30 16:18 ` jv244 at cam dot ac dot uk
  2008-11-30 16:27 ` jv244 at cam dot ac dot uk
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-11-30 16:18 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from jv244 at cam dot ac dot uk  2008-11-30 16:17 -------
(In reply to comment #2)
> Due to the high density of branches in the code this is easily a code layout
> and/or padding issue.  Different architectures have different constraints on
> their decoders and branch predictors related to branch density.  Core
> introduces other branch limitations for loops that engage the loop stream
> detector.
> We do not at all try to properly optimize (or even model) this apart
> from inserting nops.  YMMV with -fschedule-insns.

I'm not expert enough to understand this, but you have it right. However, it
remains a regression (on Opteron).

4.4: 
-O3 -march=native -funroll-loops  -ffast-math                  ==> 5.064s
-O3 -march=native -funroll-loops  -ffast-math -fschedule-insns ==> 4.396

4.3:
-O3 -march=native -funroll-loops  -ffast-math                  ==> 4.376
-O3 -march=native -funroll-loops  -ffast-math -fschedule-insns ==> 3.372

-fno-tree-reassoc has no effect.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4 Regression] 15% slowdown of computational kernel
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (4 preceding siblings ...)
  2008-11-30 16:18 ` jv244 at cam dot ac dot uk
@ 2008-11-30 16:27 ` jv244 at cam dot ac dot uk
  2008-11-30 16:40 ` rguenth at gcc dot gnu dot org
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-11-30 16:27 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from jv244 at cam dot ac dot uk  2008-11-30 16:26 -------
(In reply to comment #4)
> 4.3:
> -O3 -march=native -funroll-loops  -ffast-math                  ==> 4.376
> -O3 -march=native -funroll-loops  -ffast-math -fschedule-insns ==> 3.372

strangely:

http://gcc.gnu.org/onlinedocs/gcc-4.3.2/gcc/Optimize-Options.html#Optimize-Options
suggests -fschedule-insns is enabled by default at -O3 ?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4 Regression] 15% slowdown of computational kernel
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (5 preceding siblings ...)
  2008-11-30 16:27 ` jv244 at cam dot ac dot uk
@ 2008-11-30 16:40 ` rguenth at gcc dot gnu dot org
  2008-12-03 19:04 ` [Bug target/38306] [4.4 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures steven at gcc dot gnu dot org
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-11-30 16:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from rguenth at gcc dot gnu dot org  2008-11-30 16:39 -------
Not on all targets though.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (6 preceding siblings ...)
  2008-11-30 16:40 ` rguenth at gcc dot gnu dot org
@ 2008-12-03 19:04 ` steven at gcc dot gnu dot org
  2008-12-03 21:29 ` hjl dot tools at gmail dot com
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: steven at gcc dot gnu dot org @ 2008-12-03 19:04 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from steven at gcc dot gnu dot org  2008-12-03 19:01 -------
But a regression at least on some targets.  Confirmed.


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2008-12-03 19:01:16
               date|                            |
            Summary|[4.4 Regression] 15%        |[4.4 Regression] 15%
                   |slowdown of computational   |slowdown w.r.t. 4.3 of
                   |kernel                      |computational kernel on some
                   |                            |architectures


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (7 preceding siblings ...)
  2008-12-03 19:04 ` [Bug target/38306] [4.4 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures steven at gcc dot gnu dot org
@ 2008-12-03 21:29 ` hjl dot tools at gmail dot com
  2008-12-04 16:12 ` jv244 at cam dot ac dot uk
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: hjl dot tools at gmail dot com @ 2008-12-03 21:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from hjl dot tools at gmail dot com  2008-12-03 21:28 -------
(In reply to comment #5)
> (In reply to comment #4)
> > 4.3:
> > -O3 -march=native -funroll-loops  -ffast-math                  ==> 4.376
> > -O3 -march=native -funroll-loops  -ffast-math -fschedule-insns ==> 3.372
> 
> strangely:
> 
> http://gcc.gnu.org/onlinedocs/gcc-4.3.2/gcc/Optimize-Options.html#Optimize-Options
> suggests -fschedule-insns is enabled by default at -O3 ?
> 

This may be related to PR 37565. i386.c has

void
optimization_options (int level, int size ATTRIBUTE_UNUSED)
{
  /* For -O2 and beyond, turn off -fschedule-insns by default.  It tends to
     make the problem with not enough registers even worse.  */
#ifdef INSN_SCHEDULING
  if (level > 1)
    flag_schedule_insns = 0;
#endif    
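
A minimal sketch of how such a hook could explain the timings reported earlier in the thread. It assumes, without confirmation from this excerpt, that the generic defaults are applied first, the target hook runs next, and an explicit -fschedule-insns on the command line is processed last. `opt_state`, `generic_defaults` and `effective_schedule_insns` are hypothetical names, not GCC's:

```c
#include <stdbool.h>

/* Hypothetical stand-in for GCC's global option flags. */
struct opt_state {
    int flag_schedule_insns;
};

/* Assumed generic default: scheduling is on at -O2 and above,
   as the 4.3 documentation cited earlier suggests. */
void generic_defaults(struct opt_state *s, int level)
{
    s->flag_schedule_insns = (level >= 2);
}

/* Mirrors the condition in the quoted i386.c hook. */
void i386_optimization_options(struct opt_state *s, int level)
{
    if (level > 1)
        s->flag_schedule_insns = 0;
}

/* Assumed ordering: defaults, then target hook, then explicit user flag. */
int effective_schedule_insns(int level, bool user_passed_flag)
{
    struct opt_state s;
    generic_defaults(&s, level);
    i386_optimization_options(&s, level);
    if (user_passed_flag)
        s.flag_schedule_insns = 1;
    return s.flag_schedule_insns;
}
```

Under these assumptions, -O3 alone leaves the flag off on this target, while -O3 -fschedule-insns turns it back on, which would be consistent with the two command lines producing different timings.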


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (8 preceding siblings ...)
  2008-12-03 21:29 ` hjl dot tools at gmail dot com
@ 2008-12-04 16:12 ` jv244 at cam dot ac dot uk
  2008-12-05 12:18 ` rguenth at gcc dot gnu dot org
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-12-04 16:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from jv244 at cam dot ac dot uk  2008-12-04 16:11 -------
I tried -fschedule-insns on CP2K, which led to an ICE (now PR38403).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (9 preceding siblings ...)
  2008-12-04 16:12 ` jv244 at cam dot ac dot uk
@ 2008-12-05 12:18 ` rguenth at gcc dot gnu dot org
  2008-12-06 15:38 ` steven at gcc dot gnu dot org
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-12-05 12:18 UTC (permalink / raw)
  To: gcc-bugs



-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           Priority|P3                          |P2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (10 preceding siblings ...)
  2008-12-05 12:18 ` rguenth at gcc dot gnu dot org
@ 2008-12-06 15:38 ` steven at gcc dot gnu dot org
  2008-12-06 18:56 ` jv244 at cam dot ac dot uk
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: steven at gcc dot gnu dot org @ 2008-12-06 15:38 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from steven at gcc dot gnu dot org  2008-12-06 15:37 -------
If the code layout (see comment #2) is indeed causing the slow-down, this
problem might have been fixed along with bug 38074.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (11 preceding siblings ...)
  2008-12-06 15:38 ` steven at gcc dot gnu dot org
@ 2008-12-06 18:56 ` jv244 at cam dot ac dot uk
  2009-02-11 19:00 ` bonzini at gnu dot org
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-12-06 18:56 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from jv244 at cam dot ac dot uk  2008-12-06 18:54 -------
(In reply to comment #10)
> If the code layout (see comment #2) is indeed causing the slow-down, this
> problem might have been fixed along with bug 38074.

No, timings are essentially unchanged:

gcc version 4.4.0 20081206 (experimental) [trunk revision 142525] (GCC)
Time for evaluation [s]:                        5.028
gcc version 4.3.3 20080912 (prerelease) (GCC)
Time for evaluation [s]:                        4.376

(Note that the regression is on Opteron.)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (12 preceding siblings ...)
  2008-12-06 18:56 ` jv244 at cam dot ac dot uk
@ 2009-02-11 19:00 ` bonzini at gnu dot org
  2009-02-11 19:25 ` jv244 at cam dot ac dot uk
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: bonzini at gnu dot org @ 2009-02-11 19:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from bonzini at gnu dot org  2009-02-11 19:00 -------
  /* For -O2 and beyond, turn off -fschedule-insns by default.  It tends to
     make the problem with not enough registers even worse.  */

As risky as this may be (for performance, not correctness), what about changing
"if (level > 1)" to "if (level == 2)"?  And what about enabling it on x86-64?

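The difference between the two conditions can be sketched as follows. This is only a model: `generic_default` encodes the assumption that -fschedule-insns is otherwise on at -O2 and above, all function names are hypothetical, and the x86-64 question is not modelled:

```c
#include <stdbool.h>

/* Assumed generic default: -fschedule-insns is on at -O2 and above. */
bool generic_default(int level)
{
    return level >= 2;
}

/* Default seen by the user with the condition currently in i386.c. */
bool default_with_current_hook(int level)
{
    bool flag = generic_default(level);
    if (level > 1)          /* condition quoted from i386.c */
        flag = false;
    return flag;
}

/* Default seen by the user with the condition proposed above. */
bool default_with_proposed_hook(int level)
{
    bool flag = generic_default(level);
    if (level == 2)         /* proposed replacement condition */
        flag = false;
    return flag;
}
```

In this model the only behavioural change is at -O3 and above, where the proposed condition would leave the first scheduling pass enabled; -O2 stays as it is today.
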

-- 

bonzini at gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bonzini at gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (13 preceding siblings ...)
  2009-02-11 19:00 ` bonzini at gnu dot org
@ 2009-02-11 19:25 ` jv244 at cam dot ac dot uk
  2009-04-21 15:59 ` [Bug target/38306] [4.4/4.5 " jakub at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: jv244 at cam dot ac dot uk @ 2009-02-11 19:25 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from jv244 at cam dot ac dot uk  2009-02-11 19:25 -------
(In reply to comment #12)
>   /* For -O2 and beyond, turn off -fschedule-insns by default.  It tends to
>      make the problem with not enough registers even worse.  */
> 
> As risky as this may be (for performance, not correctness), what about changing
> "if (level > 1)" to "if (level == 2)"?  And what about enabling it on x86-64?

But even on x86-64 this seems to lead to ICEs (see PR38403). 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4/4.5 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (14 preceding siblings ...)
  2009-02-11 19:25 ` jv244 at cam dot ac dot uk
@ 2009-04-21 15:59 ` jakub at gcc dot gnu dot org
  2009-07-22 10:35 ` jakub at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: jakub at gcc dot gnu dot org @ 2009-04-21 15:59 UTC (permalink / raw)
  To: gcc-bugs



-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.0                       |4.4.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4/4.5 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (15 preceding siblings ...)
  2009-04-21 15:59 ` [Bug target/38306] [4.4/4.5 " jakub at gcc dot gnu dot org
@ 2009-07-22 10:35 ` jakub at gcc dot gnu dot org
  2009-09-01  6:56 ` jv244 at cam dot ac dot uk
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: jakub at gcc dot gnu dot org @ 2009-07-22 10:35 UTC (permalink / raw)
  To: gcc-bugs



-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.1                       |4.4.2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4/4.5 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (16 preceding siblings ...)
  2009-07-22 10:35 ` jakub at gcc dot gnu dot org
@ 2009-09-01  6:56 ` jv244 at cam dot ac dot uk
  2009-09-01  8:54 ` bonzini at gnu dot org
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: jv244 at cam dot ac dot uk @ 2009-09-01  6:56 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from jv244 at cam dot ac dot uk  2009-09-01 06:56 -------
I wanted to try Vladimir Makarov's new patch for this testcase, but on an
unpatched trunk I notice a serious runtime regression with '-fschedule-insns'
with respect to 4.3.3

Using as base options (for the attached testcase)

gfortran -O3 -march=native -funroll-loops  -ffast-math test.f90

4.3.3 w   -fschedule-insns : 3.372s
4.3.3 w/o -fschedule-insns : 4.384s

4.4.2 w   -fschedule-insns : 4.748s
4.4.2 w/o -fschedule-insns : 4.408s

4.5.0 w   -fschedule-insns : 4.712s
4.5.0 w/o -fschedule-insns : 4.408s

So with -fschedule-insns, 4.3 is about 40% faster than 4.5 (3.372s vs 4.712s).
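
The effect of the flag per compiler version can be recomputed from the timings listed above. A minimal sketch in C; the struct and function names are hypothetical and the numbers are copied from this comment:

```c
#include <stddef.h>

/* Timings in seconds, copied from the list above. */
struct timing {
    const char *version;
    double with_flag;     /* with -fschedule-insns */
    double without_flag;  /* without -fschedule-insns */
};

static const struct timing timings[] = {
    { "4.3.3", 3.372, 4.384 },
    { "4.4.2", 4.748, 4.408 },
    { "4.5.0", 4.712, 4.408 },
};

/* Percent change in run time caused by adding the flag (i must be 0..2);
   a negative value means the flag made the run faster. */
double flag_effect_percent(size_t i)
{
    return (timings[i].with_flag / timings[i].without_flag - 1.0) * 100.0;
}

/* Run-time ratio of version j to version i, both with the flag. */
double with_flag_ratio(size_t i, size_t j)
{
    return timings[j].with_flag / timings[i].with_flag;
}
```

The flag changes run time by roughly -23% on 4.3.3 but by roughly +8% and +7% on 4.4.2 and 4.5.0, and `with_flag_ratio(0, 2)` is about 1.40, which is where the 40% figure comes from.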

I guess this is pretty target specific. I'm running this on an Opteron; this is
what -v reports:

Target: x86_64-unknown-linux-gnu
Configured with: /data03/vondele/gcc_trunk/gcc/configure --disable-bootstrap
--prefix=/data03/vondele/gcc_trunk/build --enable-languages=c,c++,fortran
--disable-multilib --with-ppl=/data03/vondele/gcc_trunk/build/
--with-cloog=/data03/vondele/gcc_trunk/build/
Thread model: posix
gcc version 4.5.0 20090830 (experimental) [trunk revision 151229] (GCC)
COLLECT_GCC_OPTIONS='-O3'  '-funroll-loops' '-ffast-math' '-fschedule-insns'
'-v' '-shared-libgcc'

/data03/vondele/gcc_trunk/build/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/f951
test.f90 -march=k8-sse3 -mcx16 -msahf --param l1-cache-size=64 --param
l1-cache-line-size=64 --param l2-cache-size=1024 -mtune=k8 -quiet -dumpbase
test.f90 -auxbase test -O3 -version -funroll-loops -ffast-math -fschedule-insns
-fintrinsic-modules-path
/data03/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/finclude
-o /tmp/ccvGq2CO.s


-- 

jv244 at cam dot ac dot uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at redhat dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4/4.5 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (17 preceding siblings ...)
  2009-09-01  6:56 ` jv244 at cam dot ac dot uk
@ 2009-09-01  8:54 ` bonzini at gnu dot org
  2009-09-01  9:13 ` jv244 at cam dot ac dot uk
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: bonzini at gnu dot org @ 2009-09-01  8:54 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from bonzini at gnu dot org  2009-09-01 08:54 -------
Please try -O2 and -O2 -funroll-loops too, since -O3 is not always good for
speed.  (It would be even better if -O2 is not slower and you can find out what
the culprit is at -O3; this is not necessarily possible though).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4/4.5 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (18 preceding siblings ...)
  2009-09-01  8:54 ` bonzini at gnu dot org
@ 2009-09-01  9:13 ` jv244 at cam dot ac dot uk
  2009-09-01  9:18 ` jv244 at cam dot ac dot uk
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: jv244 at cam dot ac dot uk @ 2009-09-01  9:13 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from jv244 at cam dot ac dot uk  2009-09-01 09:13 -------
(In reply to comment #15)
> Please try -O2 and -O2 -funroll-loops too, since -O3 is not always good for
> speed.  (It would be even better if -O2 is not slower and you can find out what
> the culprit is at -O3; this is not necessarily possible though).

You're right that with -funroll-loops and without -fschedule-insns, -O2 is faster
than -O3 in this case, but nothing comes close to 4.3 performance. Adding
'-fschedule-insns' to the fastest -O2 choice makes it about 17% slower.

All numbers with trunk:

 -O2 -march=native -funroll-loops  -ffast-math: 4.032
 -O2 -march=native -funroll-loops  -ffast-math -fschedule-insns: 4.712
 -O3 -march=native -funroll-loops  -ffast-math: 4.408
 -O2 -march=native -ffast-math: 11.373
 -O2 -march=native -ffast-math -fschedule-insns: 11.409
 -O3 -march=native -ffast-math: 4.296
 -O3 -march=native -ffast-math -fschedule-insns: 4.656

I can test other flags if you have a suggestion.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4/4.5 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (19 preceding siblings ...)
  2009-09-01  9:13 ` jv244 at cam dot ac dot uk
@ 2009-09-01  9:18 ` jv244 at cam dot ac dot uk
  2009-10-15 12:50 ` jakub at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: jv244 at cam dot ac dot uk @ 2009-09-01  9:18 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from jv244 at cam dot ac dot uk  2009-09-01 09:17 -------
(In reply to comment #16)
> All numbers with trunk:
With 4.3 there is no significant difference between -O2 and -O3:

-O2 -march=native -funroll-loops  -ffast-math: 4.388
-O2 -march=native -funroll-loops  -ffast-math -fschedule-insns: 3.352
-O3 -march=native -funroll-loops  -ffast-math: 4.380
-O3 -march=native -funroll-loops  -ffast-math -fschedule-insns: 3.372


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4/4.5 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (20 preceding siblings ...)
  2009-09-01  9:18 ` jv244 at cam dot ac dot uk
@ 2009-10-15 12:50 ` jakub at gcc dot gnu dot org
  2010-01-21 13:19 ` jakub at gcc dot gnu dot org
  2010-04-30  9:00 ` [Bug target/38306] [4.4/4.5/4.6 " jakub at gcc dot gnu dot org
  23 siblings, 0 replies; 25+ messages in thread
From: jakub at gcc dot gnu dot org @ 2009-10-15 12:50 UTC (permalink / raw)
  To: gcc-bugs



-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.2                       |4.4.3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4/4.5 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (21 preceding siblings ...)
  2009-10-15 12:50 ` jakub at gcc dot gnu dot org
@ 2010-01-21 13:19 ` jakub at gcc dot gnu dot org
  2010-04-30  9:00 ` [Bug target/38306] [4.4/4.5/4.6 " jakub at gcc dot gnu dot org
  23 siblings, 0 replies; 25+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-01-21 13:19 UTC (permalink / raw)
  To: gcc-bugs



-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.3                       |4.4.4


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
  2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
                   ` (22 preceding siblings ...)
  2010-01-21 13:19 ` jakub at gcc dot gnu dot org
@ 2010-04-30  9:00 ` jakub at gcc dot gnu dot org
  23 siblings, 0 replies; 25+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-04-30  9:00 UTC (permalink / raw)
  To: gcc-bugs



-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.4                       |4.4.5


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2010-04-30  8:54 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-11-28 16:02 [Bug target/38306] New: [4.4 Regression] 15% slowdown of computational kernel jv244 at cam dot ac dot uk
2008-11-28 16:03 ` [Bug target/38306] " jv244 at cam dot ac dot uk
2008-11-28 16:25 ` pinskia at gcc dot gnu dot org
2008-11-30 11:40 ` rguenth at gcc dot gnu dot org
2008-11-30 11:49 ` rguenth at gcc dot gnu dot org
2008-11-30 16:18 ` jv244 at cam dot ac dot uk
2008-11-30 16:27 ` jv244 at cam dot ac dot uk
2008-11-30 16:40 ` rguenth at gcc dot gnu dot org
2008-12-03 19:04 ` [Bug target/38306] [4.4 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures steven at gcc dot gnu dot org
2008-12-03 21:29 ` hjl dot tools at gmail dot com
2008-12-04 16:12 ` jv244 at cam dot ac dot uk
2008-12-05 12:18 ` rguenth at gcc dot gnu dot org
2008-12-06 15:38 ` steven at gcc dot gnu dot org
2008-12-06 18:56 ` jv244 at cam dot ac dot uk
2009-02-11 19:00 ` bonzini at gnu dot org
2009-02-11 19:25 ` jv244 at cam dot ac dot uk
2009-04-21 15:59 ` [Bug target/38306] [4.4/4.5 " jakub at gcc dot gnu dot org
2009-07-22 10:35 ` jakub at gcc dot gnu dot org
2009-09-01  6:56 ` jv244 at cam dot ac dot uk
2009-09-01  8:54 ` bonzini at gnu dot org
2009-09-01  9:13 ` jv244 at cam dot ac dot uk
2009-09-01  9:18 ` jv244 at cam dot ac dot uk
2009-10-15 12:50 ` jakub at gcc dot gnu dot org
2010-01-21 13:19 ` jakub at gcc dot gnu dot org
2010-04-30  9:00 ` [Bug target/38306] [4.4/4.5/4.6 " jakub at gcc dot gnu dot org
