[Bug optimization/8126] [3.3/3.4 regression] Floating point computation far slower in 3.2 than in 2.95

public inbox for gcc-bugs@sourceware.org

* [Bug optimization/8126] [3.3/3.4 regression] Floating point computation far slower in 3.2 than in 2.95
From: pinskia@physics.uc.edu @ 2003-06-11 22:41 UTC
  To: gcc-bugs

PLEASE REPLY TO gcc-bugzilla@gcc.gnu.org ONLY, *NOT* gcc-bugs@gcc.gnu.org.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8126


pinskia@physics.uc.edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.4                         |3.3.1


------- Additional Comments From pinskia@physics.uc.edu  2003-06-11 22:41 -------
Does using -fnew-ra get back to 2.95 speed?



* [Bug optimization/8126] [3.3/3.4 regression] Floating point computation far slower in 3.2 than in 2.95
From: dhazeghi at yahoo dot com @ 2003-06-21  1:44 UTC
  To: gcc-bugs


dhazeghi at yahoo dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  GCC build triplet|                            |i686-pc-cygwin
   GCC host triplet|                            |i686-pc-cygwin
 GCC target triplet|                            |i686-pc-cygwin
           Priority|P3                          |P2



* [Bug optimization/8126] [3.3/3.4 regression] Floating point computation far slower in 3.2 than in 2.95
From: o dot lauffenburger at topsolid dot com @ 2003-06-24 13:23 UTC
  To: gcc-bugs


------- Additional Comments From o dot lauffenburger at topsolid dot com  2003-06-24 12:35 -------
I have tested the -fnew-ra option with version 3.3, together with the other
options (-O3 -ffast-math -fomit-frame-pointer).

Without -fnew-ra : 4746 ms
With -fnew-ra : 9063 ms
(With gcc 2.95 : 2914 ms)

So -fnew-ra apparently makes things worse.



* [Bug optimization/8126] [3.3/3.4 regression] Floating point computation far slower in 3.2 than in 2.95
From: pinskia at physics dot uc dot edu @ 2003-06-24 14:38 UTC
  To: gcc-bugs


------- Additional Comments From pinskia at physics dot uc dot edu  2003-06-24 13:28 -------
Then there are two bugs here: a general regression, and one due to -fnew-ra.
-fnew-ra should be at least as fast as compiling without it.



* [Bug optimization/8126] [3.3/3.4 regression] Floating point computation far slower in 3.2 than in 2.95
From: mmitchel at gcc dot gnu dot org @ 2003-07-23  7:02 UTC
  To: gcc-bugs


mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.3.1                       |3.3.2


------- Additional Comments From mmitchel at gcc dot gnu dot org  2003-07-23 07:02 -------
Jan says, via private email, that this is a "random" slowdown. In other words,
regstack makes some decisions that are easily perturbed, so sometimes it gets
lucky and sometimes it doesn't.

We should fix that, but not before 3.3.1, so I've postponed this bug until GCC
3.3.2.



* [Bug optimization/8126] [3.3/3.4 regression] Floating point computation far slower in 3.2 than in 2.95
From: mmitchel at gcc dot gnu dot org @ 2003-10-16  2:38 UTC
  To: gcc-bugs


mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.3.2                       |3.4


------- Additional Comments From mmitchel at gcc dot gnu dot org  2003-10-16 02:38 -------
Postponed until GCC 3.4; this doesn't sound like it's going to have an easy fix.



* [Bug optimization/8126] [3.3/3.4 regression] Floating point computation far slower in 3.2 than in 2.95
From: uros at kss-loka dot si @ 2003-10-30  6:26 UTC
  To: gcc-bugs


------- Additional Comments From uros at kss-loka dot si  2003-10-30 06:23 -------
I have some measurements with RedHat 7.3 gcc-2.96 and gcc-3.3. The results on
a 166 MHz Pentium MMX are quite interesting. They show that for the attached
testcase (test.c in attachments, modified to a plain .c file) gcc-3.3 is faster
than gcc-2.96. Also of interest is gcc 3.3 with the -fnew-ra and
-funroll-all-loops switches. This combination is the fastest one; however,
-fnew-ra alone is the worst one.

[uros@localhost test]$ gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-110)
===
[uros@localhost test]$ gcc -ffast-math -fomit-frame-pointer -O3 test.c
[uros@localhost test]$ time ./a.out
Start?
Stop!
Result = 0.000000, 0.000000, 1.000000

real	0m22.352s
user	0m22.310s
sys	0m0.010s
[uros@localhost test]$ gcc -ffast-math -fomit-frame-pointer -funroll-all-loops -O3 test.c
[uros@localhost test]$ time ./a.out
Start?
Stop!
Result = 0.000000, 0.000000, 1.000000

real	0m19.831s
user	0m19.780s
sys	0m0.020s
===
===
[uros@localhost test]$ gcc -v
Reading specs from /usr/local/lib/gcc-lib/i586-pc-linux-gnu/3.3/specs
Configured with: ../gcc-3.3/configure 
Thread model: posix
gcc version 3.3
===
[uros@localhost test]$ gcc -ffast-math -fomit-frame-pointer -O3 test.c
[uros@localhost test]$ time ./a.out
Start?
Stop!
Result = 0.000000, 0.000000, 1.000000

real	0m19.408s
user	0m19.320s
sys	0m0.010s
===
[uros@localhost test]$ gcc -ffast-math -fomit-frame-pointer -funroll-all-loops -O3 test.c
[uros@localhost test]$ time ./a.out
Start?
Stop!
Result = 0.000000, 0.000000, 1.000000

real	0m14.518s
user	0m14.470s
sys	0m0.010s
===
[uros@localhost test]$ gcc -ffast-math -fomit-frame-pointer -funroll-all-loops -fnew-ra -O3 test.c
[uros@localhost test]$ time ./a.out
Start?
Stop!
Result = 0.000000, 0.000000, 1.000000

real	0m13.540s
user	0m13.520s
sys	0m0.010s
===
[uros@localhost test]$ gcc -ffast-math -fomit-frame-pointer -fnew-ra -O3 test.c
[uros@localhost test]$ time ./a.out
Start?
Stop!
Result = 0.000000, 0.000000, 1.000000

real	0m27.185s
user	0m27.140s
sys	0m0.010s



* [Bug optimization/8126] [3.3/3.4 regression] Floating point computation far slower in 3.2 than in 2.95
From: pinskia at gcc dot gnu dot org @ 2004-01-01  4:11 UTC
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-01-01 04:11 -------
What is weird is that -march=i386 is faster than -march=i686 on a pentium3:
grendel:~/src/gnu/gcctest>gcc -O3 -ffast-math -fomit-frame-pointer pr8126.c -march=i386
grendel:~/src/gnu/gcctest>time ./a.out
Start?
Stop!
Result = 0.000000, 0.000000, 1.000000
2.726u 0.000s 0:02.74 99.2%     0+0k 0+0io 2pf+0w
grendel:~/src/gnu/gcctest>time ./a.out
Start?
Stop!
Result = 0.000000, 0.000000, 1.000000
2.710u 0.000s 0:02.74 98.9%     0+0k 0+0io 0pf+0w
grendel:~/src/gnu/gcctest>gcc -O3 -ffast-math -fomit-frame-pointer pr8126.c -march=i686
grendel:~/src/gnu/gcctest>time ./a.out
Start?
Stop!
Result = 0.000000, 0.000000, 1.000000
2.843u 0.007s 0:02.87 98.9%     0+0k 0+0io 2pf+0w
grendel:~/src/gnu/gcctest>gcc -O3 -ffast-math -fomit-frame-pointer pr8126.c -march=i586
grendel:~/src/gnu/gcctest>time ./a.out
Start?
Stop!
Result = 0.000000, 0.000000, 1.000000
2.703u 0.000s 0:02.72 99.2%     0+0k 0+0io 2pf+0w
grendel:~/src/gnu/gcctest>gcc -O3 -ffast-math -fomit-frame-pointer pr8126.c -march=pentium3
grendel:~/src/gnu/gcctest>time ./a.out
Start?
Stop!
Result = 0.000000, 0.000000, 1.000000
2.843u 0.007s 0:02.87 98.9%     0+0k 0+0io 2pf+0w

It looks like it is choosing the wrong instructions for the Pentium 3. (The Pentium 4 is
different and does not matter that much.)

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|0000-00-00 00:00:00         |2004-01-01 04:11:34
               date|                            |



* [Bug optimization/8126] [3.3/3.4 regression] Floating point computation far slower in 3.2 than in 2.95
From: hubicka at ucw dot cz @ 2004-01-01 10:25 UTC
  To: gcc-bugs


------- Additional Comments From hubicka at ucw dot cz  2004-01-01 10:25 -------
Subject: Re:  [3.3/3.4 regression] Floating point computation far slower in 3.2 than in 2.95

> 
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2004-01-01 04:11 -------
> What is weird is that -march=i386 is faster than -march=i686 on a pentium3:
> [...]
> 
> It looks like it is choosing the wrong instructions for the Pentium 3. (The Pentium 4 is
> different and does not matter that much.)

No, it is the scheduler (you will likely reproduce similar results via
-fno-schedule-insns2). The scheduler does not take the stack register file
into account, and reg-stack does not reorder; it works by blindly inserting
exchange operations wherever the code does not match the stack nature, so
performance-wise we get completely random results out of the backend.
Unscheduled code usually fares slightly better, as the structure of the
original expression trees is still somewhat preserved, but it is still
far from optimal. There is not much to do on this front in the short term,
unfortunately.

I've had limited luck with a patch teaching the scheduler that two
consecutive FP operations are cheaper when the second uses the destination
of the first as an operand, but it does not fit very well into the current
scheduler model (which is misdesigned). The proper solution is to
reorganize the scheduler core into a kind of library and make reg-stack use
it to fix the ordering as needed. I am not planning to dig into it anytime
soon, though; I hope that the importance of x87 will fade.

Honza

* [Bug optimization/8126] [3.3/3.4 regression] Floating point computation far slower in 3.2 than in 2.95
From: hubicka at gcc dot gnu dot org @ 2004-01-03 18:39 UTC
  To: gcc-bugs


------- Additional Comments From hubicka at gcc dot gnu dot org  2004-01-03 18:39 -------
We are unlikely to redesign reg-stack for this release :(
Hope for the best in the future.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.4.0                       |3.5.0



* [Bug optimization/8126] [3.3/3.4/3.5 regression] Floating point computation far slower in 3.2 than in 2.95
From: dhazeghi at yahoo dot com @ 2004-01-23 16:58 UTC
  To: gcc-bugs


------- Additional Comments From dhazeghi at yahoo dot com  2004-01-23 16:58 -------
Are you currently working on this, Jan, or should we unassign it? Thanks.


* [Bug rtl-optimization/8126] [3.3/3.4/4.0 regression] Floating point computation far slower in 3.2 than in 2.95
From: pinskia at gcc dot gnu dot org @ 2004-09-30 16:28 UTC
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-09-30 16:28 -------
Hmm, 4.0.0 is faster and smaller at least on a pentium4.


* [Bug rtl-optimization/8126] [3.3/3.4/4.0 regression] Floating point computation far slower in 3.2 than in 2.95
From: uros at gcc dot gnu dot org @ 2004-11-26  9:38 UTC
  To: gcc-bugs


------- Additional Comments From uros at gcc dot gnu dot org  2004-11-26 09:38 -------
(In reply to comment #15)
> Hmm, 4.0.0 is faster and smaller at least on a pentium4.

The faster and smaller code is produced because the scheduler is disabled for pentium4.


* [Bug rtl-optimization/8126] [3.3/3.4/4.0 regression] Floating point computation far slower in 3.2 than in 2.95
From: hubicka at gcc dot gnu dot org @ 2005-01-05 21:53 UTC
  To: gcc-bugs


------- Additional Comments From hubicka at gcc dot gnu dot org  2005-01-05 21:53 -------
I don't see much to do without regstack reorg and I don't have time for that :(

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|hubicka at gcc dot gnu dot  |unassigned at gcc dot gnu
                   |org                         |dot org
             Status|ASSIGNED                    |NEW



* [Bug rtl-optimization/8126] [3.3/3.4/4.0 regression] Floating point computation far slower in 3.2 than in 2.95
From: ian at airs dot com @ 2005-01-16  3:35 UTC
  To: gcc-bugs


------- Additional Comments From ian at airs dot com  2005-01-16 03:35 -------
If we're going to mark this as a regression, can somebody pin down the cases
where mainline gcc is slower than gcc 2.95?

On my system it is about 35% faster.  But that is on a Pentium 4.

I know that Roger Sayle did some work on reg-stack shuffling, but I don't know
how much that affects this PR, if at all.


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ian at airs dot com



* [Bug rtl-optimization/8126] [3.3/3.4/4.0 regression] Floating point computation far slower in 3.2 than in 2.95
From: steven at gcc dot gnu dot org @ 2005-01-16 13:52 UTC
  To: gcc-bugs

------- Additional Comments From steven at gcc dot gnu dot org  2005-01-16 13:52 -------
I think this is a WONTFIX regression.

As Honza pointed out, interactions between regstack and sched2 can sometimes
produce really odd results.  I don't see us producing a "sched3"-like pass for
x87 any time soon.  It should not be hard to teach regstack to use the DFA
interface, but realistically I don't think anyone is interested in doing so,
except for Roger Sayle maybe...?

IMHO this is a BS bug, because overall we are not worse for FP at all, and
compared to 2.95.x we are in fact *much* better overall.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sayle at gcc dot gnu dot org


* [Bug rtl-optimization/8126] [3.3/3.4/4.0 regression] Floating point computation far slower in 3.2 than in 2.95
From: uros at kss-loka dot si @ 2005-01-27  9:08 UTC
  To: gcc-bugs


------- Additional Comments From uros at kss-loka dot si  2005-01-27 09:07 -------
I don't think that this has anything to do with regstack and sched2. The fact
is that for FP-intensive applications, 8 FP registers (either stack-based x87
or flat SSE) are not enough. When there is a shortage of registers, gcc
starts to spill registers to and from memory.

Please note that reg/reg and reg/mem FP operations have the same
latency/throughput on the P4, but moving FP registers to and from memory
introduces a big performance penalty, and these moves should be minimised as
much as possible.

There are some measurements to prove this (-O2 only, to avoid -ffast-math
intrinsic shortcuts; P4-3.2 timings):

a) -march=pentium -mfpmath=387: scheduling and reg-stack interactions:
real    0m34.073s
user    0m33.756s
sys     0m0.018s

b) -march=pentium -msse2 -mfpmath=sse: scheduling and no reg-stack:
real    0m35.063s
user    0m34.674s
sys     0m0.076s

c) -march=pentium4 -mfpmath=387: no scheduling with reg-stack:
real    0m33.720s
user    0m33.348s
sys     0m0.037s

d) -march=pentium4 -mfpmath=sse: no scheduling and no reg-stack:
real    0m35.399s
user    0m35.016s
sys     0m0.035s

The question I would like to ask: is there functionality in gcc to optimise
register moves, considering the cost of reg/reg vs. reg/mem FP operations and
the cost of register<->memory moves?

