public inbox for gcc-bugs@sourceware.org
* [Bug fortran/20945] New: about 2x perfomance regression in comparision with 3.4.2
@ 2005-04-11 13:17 denis dot nagorny at intel dot com
  2005-04-11 13:20 ` [Bug fortran/20945] " denis dot nagorny at intel dot com
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: denis dot nagorny at intel dot com @ 2005-04-11 13:17 UTC
  To: gcc-bugs

gfortran 4.0 shows a performance regression (with the -O2 option) compared with
g77 from 3.4.2 on IA32 with the attached test. The test is derived from the
cpu2000/mgrid test.
It consists of calls to two functions: PSINV and RESID.
Profiling (gprof) shows that most of the time is spent in the RESID
function.
One thing seems strange to me. If I remove the call to the PSINV function, the test
(compiled by g77) becomes slower than it was with the call.
gfortran from gcc 4.0 behaves more predictably.
It looks like g77 from gcc 3.4.2 does some interprocedural optimization for better
cache usage, which gfortran from gcc 4.0 cannot do.

You can reproduce my results with the attached test.

Timing results:
With PSINV call:
g77 sample.f -O2 -static
0m0.693s 0m0.685s 0m0.008s
0m0.694s 0m0.685s 0m0.009s
0m0.690s 0m0.683s 0m0.007s

With PSINV call:
gfortran sample.f -O2 -static
0m1.293s 0m1.279s 0m0.015s
0m1.320s 0m1.306s 0m0.014s
0m1.303s 0m1.294s 0m0.008s

Without PSINV call:
g77 sample1.f -O2 -static -o z342s
time ./z342s 
0m0.902s 0m0.893s 0m0.007s
0m0.930s 0m0.923s 0m0.008s
0m0.894s 0m0.889s 0m0.005s

Without PSINV call:
gfortran sample1.f -O2 -static -o z40s
time ./z40s 
0m0.758s 0m0.752s 0m0.006s
0m0.762s 0m0.758s 0m0.004s
0m0.759s 0m0.757s 0m0.004s
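
For reference, the ratios implied by the timings above can be worked out as follows. This is a minimal sketch, assuming the first column of each row is the wall-clock (real) time reported by `time`; the dictionary labels and the `slowdown` helper are names chosen here for illustration.

```python
from statistics import mean

# First-column times in seconds, copied from the runs above
# (assumed to be the "real" column of `time`).
real_times = {
    "g77 with PSINV": [0.693, 0.694, 0.690],
    "gfortran with PSINV": [1.293, 1.320, 1.303],
    "g77 without PSINV": [0.902, 0.930, 0.894],
    "gfortran without PSINV": [0.758, 0.762, 0.759],
}

averages = {label: mean(times) for label, times in real_times.items()}


def slowdown(label_a, label_b):
    """Return how many times slower label_a is than label_b."""
    return averages[label_a] / averages[label_b]


with_psinv = slowdown("gfortran with PSINV", "g77 with PSINV")
without_psinv = slowdown("gfortran without PSINV", "g77 without PSINV")

print(f"with PSINV:    gfortran/g77 = {with_psinv:.2f}")
print(f"without PSINV: gfortran/g77 = {without_psinv:.2f}")
```

With these numbers gfortran comes out at about 1.89 times the g77 time with the PSINV call, and about 0.84 times (that is, faster) without it.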

cat /proc/cpuinfo:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 2.40GHz
stepping        : 7
cpu MHz         : 2400.858
cache size      : 512 KB

-- 
           Summary: about 2x perfomance regression in comparision with 3.4.2
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: fortran
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: denis dot nagorny at intel dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i586-suse-linux
  GCC host triplet: i586-suse-linux
GCC target triplet: i586-suse-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20945



* [Bug fortran/20945] about 2x perfomance regression in comparision with 3.4.2
From: denis dot nagorny at intel dot com @ 2005-04-11 13:20 UTC
  To: gcc-bugs


------- Additional Comments From denis dot nagorny at intel dot com  2005-04-11 13:20 -------
Created an attachment (id=8591)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=8591&action=view)
Test for reproducing the results



* [Bug fortran/20945] about 2x perfomance regression in comparision with 3.4.2
From: pinskia at gcc dot gnu dot org @ 2005-04-11 13:28 UTC
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-04-11 13:28 -------
I know that gfortran has some issues with inlining.


* [Bug fortran/20945] about 2x perfomance regression in comparision with 3.4.2
From: pinskia at gcc dot gnu dot org @ 2005-04-11 20:08 UTC
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-04-11 20:08 -------
Note I think this is just the register allocator being stupid.
See PR 18048 for another bug report about this.

Though I note on ppc-darwin, I cannot reproduce the problem you see with your testcase.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |18048



* [Bug fortran/20945] about 2x perfomance regression in comparision with 3.4.2
From: hjl at lucon dot org @ 2005-04-18 15:46 UTC
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl at lucon dot org



* [Bug fortran/20945] about 2x perfomance regression in comparision with 3.4.2
From: fxcoudert at gcc dot gnu dot org @ 2005-06-19  7:34 UTC
  To: gcc-bugs


------- Additional Comments From fxcoudert at gcc dot gnu dot org  2005-06-19 07:34 -------
On a Pentium III (Coppermine), 864 MHz, 256 KB cache, running Linux. Timings in seconds:

g77: 9.52s
gfortran: 9.49s
g77 -O2: 3.11s
gfortran -O2: 3.39s
g77 -O3 -ffast-math: 3.09s
gfortran -O3 -ffast-math: 3.37s

So, I can't confirm your timings. You should perhaps use longer loops (since
your timings are less than a second, you could see the effect of something
external to the floating-point performance itself).
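
The advice above could be followed with a small harness like the one below: it runs a command several times and reports the best wall-clock time, which tends to be less sensitive to external noise than a single sub-second run. This is only a sketch; `best_wall_time` is a hypothetical helper name, and the binary mentioned in the comment (`./z40s`) is the one named in the original report.

```python
import subprocess
import sys
import time


def best_wall_time(command, repeats=5):
    """Run `command` `repeats` times; return the smallest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the benchmarked program exits with an error.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Demonstration with a trivial process; for the test case in this report one
# would pass something like ["./z40s"] instead.
demo = best_wall_time([sys.executable, "-c", "pass"], repeats=3)
print(f"best of 3: {demo:.3f}s")
```

Taking the minimum rather than the mean is a design choice: external disturbances can only make a run slower, so the fastest run is usually the closest to the undisturbed cost.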


* [Bug fortran/20945] about 2x perfomance regression in comparision with 3.4.2
From: denis dot nagorny at intel dot com @ 2005-06-22 14:34 UTC
  To: gcc-bugs


------- Additional Comments From denis dot nagorny at intel dot com  2005-06-22 14:34 -------
OK. It seems this issue is mostly fixed now. I increased the NIT counter to
200 and obtained the following results:
3.4.2 ~ 3.4s
old 4.0 ~ 6.4s
mainline ~ 4.0s



* [Bug rtl-optimization/20945] about 2x perfomance regression in comparision with 3.4.2
From: pinskia at gcc dot gnu dot org @ 2005-06-22 14:37 UTC
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-06-22 14:37 -------
I think this is the normal register pressure issue in GCC's register allocator (RA).

See <http://www.gccsummit.org/2005/view_abstract.php?content_key=1> for a discussion which will
happen later today.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|fortran                     |rtl-optimization
           Keywords|                            |missed-optimization, ra



* [Bug rtl-optimization/20945] about 2x perfomance regression in comparision with 3.4.2
From: steven at gcc dot gnu dot org @ 2005-06-25 11:36 UTC
  To: gcc-bugs


------- Additional Comments From steven at gcc dot gnu dot org  2005-06-25 11:36 -------
Speculation is not going to help anyone.  What does the generated code 
look like for 3.4.2, 4.0.0 and CVS HEAD?  Bonus points for annotated 
assembly output so that it is easier to interpret the results. 
 


* [Bug rtl-optimization/20945] about 2x perfomance regression in comparision with 3.4.2
From: steven at gcc dot gnu dot org @ 2005-06-25 11:38 UTC
  To: gcc-bugs


------- Additional Comments From steven at gcc dot gnu dot org  2005-06-25 11:38 -------
And FWIW, yes, there are a number of known issues with optimizing for mgrid
on ia32.  Try e.g. -fno-tree-pre; this used to be a major win for mgrid.
What can you expect when the hot loop has 11 integer register candidates,
all of which GCC puts in GIMPLE registers, but your silly target only has 6
registers?  Long live AMD64!
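
The arithmetic behind this complaint can be sketched as follows. The live ranges below are made up for illustration (they are not taken from the mgrid loop); the point is only that when more values are live at the same time than there are registers, the excess has to be kept in memory.

```python
def max_pressure(live_ranges):
    """Return the largest number of simultaneously live values.

    Each live range is a (start, end) pair of instruction indices, end exclusive.
    """
    events = []
    for start, end in live_ranges:
        events.append((start, 1))   # value becomes live
        events.append((end, -1))    # value dies
    # Tuples sort by position first, so at equal positions a death (-1)
    # is processed before a birth (+1).
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


# Hypothetical loop body: 11 integer values, all live across the whole body.
hot_loop = [(0, 100)] * 11
available_registers = 6  # figure quoted in the comment above

pressure = max_pressure(hot_loop)
min_spilled = max(0, pressure - available_registers)
print(pressure, min_spilled)
```

Under these assumed live ranges the peak pressure is 11, so at least 5 values cannot be held in the 6 registers at that point.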
 


* [Bug rtl-optimization/20945] about 2x perfomance regression in comparision with 3.4.2
From: pinskia at gcc dot gnu dot org @ 2005-07-09 16:35 UTC
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
  GCC build triplet|i586-suse-linux             |
   GCC host triplet|i586-suse-linux             |



* [Bug rtl-optimization/20945] [4.0/4.1 Regresson] about 2x perfomance regression in comparision with 3.4.2
From: pinskia at gcc dot gnu dot org @ 2005-07-10 19:55 UTC
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |18427
            Summary|about 2x perfomance         |[4.0/4.1 Regresson] about 2x
                   |regression in comparision   |perfomance regression in
                   |with 3.4.2                  |comparision with 3.4.2
   Target Milestone|---                         |4.1.0



* [Bug rtl-optimization/20945] [4.0/4.1 Regresson] about 2x perfomance regression in comparision with 3.4.2
From: pinskia at gcc dot gnu dot org @ 2005-09-29  4:08 UTC
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
 GCC target triplet|i586-suse-linux             |i586-*-*-*



Thread overview: 13+ messages
2005-04-11 13:17 [Bug fortran/20945] New: about 2x perfomance regression in comparision with 3.4.2 denis dot nagorny at intel dot com
2005-04-11 13:20 ` [Bug fortran/20945] " denis dot nagorny at intel dot com
2005-04-11 13:28 ` pinskia at gcc dot gnu dot org
2005-04-11 20:08 ` pinskia at gcc dot gnu dot org
2005-04-18 15:46 ` hjl at lucon dot org
2005-06-19  7:34 ` fxcoudert at gcc dot gnu dot org
2005-06-22 14:34 ` denis dot nagorny at intel dot com
2005-06-22 14:37 ` [Bug rtl-optimization/20945] " pinskia at gcc dot gnu dot org
2005-06-25 11:36 ` steven at gcc dot gnu dot org
2005-06-25 11:38 ` steven at gcc dot gnu dot org
2005-07-09 16:35 ` pinskia at gcc dot gnu dot org
2005-07-10 19:55 ` [Bug rtl-optimization/20945] [4.0/4.1 Regresson] " pinskia at gcc dot gnu dot org
2005-09-29  4:08 ` pinskia at gcc dot gnu dot org
