public inbox for gcc-bugs@sourceware.org
* [Bug optimization/15187] New: Inefficient if optimization with -O2 -ffast-math
@ 2004-04-28 7:26 uros at kss-loka dot si
2004-04-28 9:50 ` [Bug optimization/15187] " zack at codesourcery dot com
` (11 more replies)
0 siblings, 12 replies; 15+ messages in thread
From: uros at kss-loka dot si @ 2004-04-28 7:26 UTC (permalink / raw)
To: gcc-bugs
This testcase compiles with '-O2 -ffast-math' into extremely inefficient asm code:
#include <math.h>

double test(double x) {
    if (x > 0.0)
        return cos(x);
    else
        return sin(x);
}
--cut here--
test:
        pushl   %ebp
        movl    %esp, %ebp
        fldl    8(%ebp)
        fcoml   .LC1
        fnstsw  %ax
        fld     %st(0)
        fcos
        sahf
        ja      .L6
        fstp    %st(0)
        fsin
        jmp     .L1
        .p2align 4,,7
.L6:
        fstp    %st(1)
.L1:
        popl    %ebp
        ret
--cut here--
It will _always_ execute the fcos instruction and, depending on the input,
overwrite the output of fcos with the output of fsin. This problem is not
specific to fsin/fcos.
The problem is in the find_if_case_1() function in ifcvt.c. Around line 2889,
there is a condition:
  /* THEN is small.  */
  if (count_bb_insns (then_bb) > BRANCH_COST)
    return FALSE;
This condition prevents moving the 'else' BB before the 'if' when then_bb is not
small. 'Small' means a couple of instructions with the default BRANCH_COST (= 1).
However, if an instruction is UNSPEC_*, it can take hundreds of cycles (as is
the case with fsin or fcos), yet it is still _one_ RTL instruction.
So the check above is not triggered, and fcos is moved before the 'if'.
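The mismatch between instruction count and instruction latency can be
illustrated with a toy model. The sketch below is not GCC code: toy_insn,
toy_count_insns, toy_total_cost, the two toy_small_* helpers and all cycle
figures are hypothetical names and values chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for an RTL insn: only its latency is modelled. */
struct toy_insn {
    const char *name;
    unsigned cycles;   /* assumed latency, illustrative only */
};

/* Count-based size: every insn counts as 1, regardless of latency. */
size_t toy_count_insns(const struct toy_insn *bb, size_t n)
{
    (void)bb;
    return n;
}

/* Cost-based size: sum of the assumed latencies. */
unsigned toy_total_cost(const struct toy_insn *bb, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += bb[i].cycles;
    return total;
}

/* "Small" according to a count-only test, like the condition quoted above. */
int toy_small_by_count(const struct toy_insn *bb, size_t n, size_t limit)
{
    return toy_count_insns(bb, n) <= limit;
}

/* "Small" according to a latency-aware test. */
int toy_small_by_cost(const struct toy_insn *bb, size_t n, unsigned limit)
{
    return toy_total_cost(bb, n) <= limit;
}
```

For a block holding a single insn with an assumed latency of 100 cycles,
toy_small_by_count(bb, 1, 1) accepts the block, while
toy_small_by_cost(bb, 1, 4) rejects it.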
The situation is even worse with '-O2 -ffast-math -march=i686': both fsin and
fcos are executed every time:
test:
        pushl   %ebp
        movl    %esp, %ebp
        fldl    8(%ebp)
        fldz
        fld     %st(1)
        fld     %st(2)
        fxch    %st(1)
        fcos
        fxch    %st(3)
        popl    %ebp
        fcomip  %st(2), %st
        fstp    %st(1)
        fsin
        fcmovnbe %st(1), %st
        fstp    %st(1)
        ret
--
Summary: Inefficient if optimization with -O2 -ffast-math
Product: gcc
Version: 3.5.0
Status: UNCONFIRMED
Severity: critical
Priority: P2
Component: optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at kss-loka dot si
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15187
* [Bug optimization/15187] Inefficient if optimization with -O2 -ffast-math
From: zack at codesourcery dot com @ 2004-04-28 9:50 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From zack at codesourcery dot com 2004-04-28 07:44 -------
Subject: Re: New: Inefficient if optimization with
-O2 -ffast-math
Ideal code here would be to call fsincos and then pick one of the
outputs, yes?
zw
* [Bug optimization/15187] Inefficient if optimization with -O2 -ffast-math
From: uros at kss-loka dot si @ 2004-04-28 11:22 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at kss-loka dot si 2004-04-28 09:52 -------
Yes, fsincos would be the best solution in this particular case, where both
sin() and cos() are involved. However, with a more general testcase, for example:
#include <math.h>

double test(double x) {
    if (x > 0.0)
        return x + 1.0;
    else
        return sqrt(x); // or sin(x), cos(x), etc...
}
the square root is _always_ computed, even if x > 0.0. If we return sin(x)
[which cannot be combined with sqrt in any way] instead of (x + 1.0), both sin()
and sqrt() will be computed when x > 0.0. In this case, -march=i686 (which has
the cmov instruction) will produce code that _always_ computes sin() and sqrt()
and then uses cmov to pick one of the results. Given that the x87 fsqrt
instruction takes 70 cycles and the x87 fsin instruction takes 65-100 cycles to
execute, this is not a win in any case.
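To put rough numbers on this argument, the following sketch compares the two
strategies using the cycle counts quoted above (70 for fsqrt, 65 as the lower
bound for fsin). It is only a toy model: the names arm_costs,
branch_worst_case and if_converted_cost, as well as the misprediction penalty
and select cost supplied by the caller, are assumptions made for illustration,
not measurements.

```c
/* Toy cost model; every cycle count here is an illustrative assumption. */
struct arm_costs {
    unsigned then_cycles;   /* cost of the THEN arm */
    unsigned else_cycles;   /* cost of the ELSE arm */
};

/* Branchy code executes only one arm. Its worst case is the more
   expensive arm plus an assumed misprediction penalty. */
unsigned branch_worst_case(struct arm_costs c, unsigned mispredict_penalty)
{
    unsigned worst = c.then_cycles > c.else_cycles ? c.then_cycles
                                                   : c.else_cycles;
    return worst + mispredict_penalty;
}

/* If-converted code always executes both arms and then selects one
   result (for example with a conditional move). */
unsigned if_converted_cost(struct arm_costs c, unsigned select_cycles)
{
    return c.then_cycles + c.else_cycles + select_cycles;
}
```

With then_cycles = 65, else_cycles = 70, an assumed misprediction penalty of
20 cycles and an assumed select cost of 2 cycles, the branchy version costs at
most 90 cycles, while the if-converted version always costs 137.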
* [Bug optimization/15187] Inefficient if optimization with -O2 -ffast-math
From: pinskia at gcc dot gnu dot org @ 2004-04-28 12:52 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-04-28 11:41 -------
Confirmed. I think this has to do with the branch probability being 50%, so GCC
merges both arms together. If you use GCC's profile feedback, it should work
correctly when the branch is not taken 50% of the time.
--
What |Removed |Added
----------------------------------------------------------------------------
Severity|critical |normal
Status|UNCONFIRMED |NEW
Ever Confirmed| |1
Keywords| |pessimizes-code
Last reconfirmed|0000-00-00 00:00:00 |2004-04-28 11:41:56
date| |
* [Bug optimization/15187] Inefficient if optimization with -O2 -ffast-math
From: uros at kss-loka dot si @ 2004-04-28 13:30 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at kss-loka dot si 2004-04-28 12:04 -------
Andrew: I don't think that profiled feedback will help. fcos will be computed
every time, but in 50% of cases that result will be thrown away, fsin will be
calculated, and its result will be used. With -mbranch-cost=0, the generated code
looks a lot better, exactly what one would expect (and with only one conditional jump):
--cut here--
test:
        pushl   %ebp
        movl    %esp, %ebp
        fldl    8(%ebp)
        fcoml   .LC1
        fnstsw  %ax
        sahf
        jbe     .L2
        fcos
        popl    %ebp
        ret
        .p2align 4,,7
.L2:
        fsin
        popl    %ebp
        ret
--cut here--
-mbranch-cost=0 does not help with -march=i686; the resulting code is the same
as without -mbranch-cost.
* [Bug optimization/15187] Inefficient if optimization with -O2 -ffast-math
From: pinskia at gcc dot gnu dot org @ 2004-04-28 13:42 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-04-28 12:52 -------
Note that branch cost and branch probabilities are two separate things. I tried
to find which flag is causing this but could not; I would need to look into the
RTL dumps to see what is going on.
* [Bug optimization/15187] Inefficient if optimization with -O2 -ffast-math
From: falk at debian dot org @ 2004-04-28 14:45 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From falk at debian dot org 2004-04-28 13:35 -------
It seems to me that cases where this is a bad decision are really rare. Basically
it is only bad if both branches use the same functional unit, which additionally
isn't fully pipelined. This seems like a rare situation, and in the future CPUs
will have even more pipelined units, which will usually not have to be
represented with UNSPEC, and even more expensive branches. IMHO we should leave
it just as it is and maybe try to mitigate it with target specific hacks if
it is really an issue in real programs.
* [Bug rtl-optimization/15187] Inefficient if optimization with -O2 -ffast-math
From: pinskia at gcc dot gnu dot org @ 2004-06-04 6:15 UTC (permalink / raw)
To: gcc-bugs
--
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
* [Bug rtl-optimization/15187] Inefficient if optimization with -O2 -ffast-math
From: roger at eyesopen dot com @ 2004-07-07 13:34 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From roger at eyesopen dot com 2004-07-07 13:34 -------
There's a relevant discussion of these issues posted to gcc-patches at:
http://gcc.gnu.org/ml/gcc-patches/2004-07/msg00597.html
* [Bug rtl-optimization/15187] Inefficient if optimization with -O2 -ffast-math
From: cvs-commit at gcc dot gnu dot org @ 2004-09-17 5:32 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From cvs-commit at gcc dot gnu dot org 2004-09-17 05:32 -------
Subject: Bug 15187
CVSROOT: /cvs/gcc
Module name: gcc
Changes by: uros@gcc.gnu.org 2004-09-17 05:32:37
Modified files:
gcc : ChangeLog ifcvt.c
Log message:
PR rtl-optimization/15187
* ifcvt.c (noce_try_cmove_arith): Exit early if total
insn_rtx_cost of both branches > BRANCH_COST
Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.5487&r2=2.5488
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ifcvt.c.diff?cvsroot=gcc&r1=1.164&r2=1.165
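The early exit described in the log message can be pictured roughly as follows.
This is a hypothetical sketch, not the committed patch: toy_should_try_cmove and
its plain integer cost units are invented for illustration, whereas the real
code works on RTL through insn_rtx_cost.

```c
/* Decide whether converting both arms into straight-line code plus a
   conditional move is worth attempting. Returns 1 to go ahead, 0 to
   give up early. All costs are in the same (assumed) abstract unit. */
int toy_should_try_cmove(unsigned cost_a, unsigned cost_b,
                         unsigned branch_cost)
{
    /* Both arms would always execute, so their costs add up. */
    if (cost_a + cost_b > branch_cost)
        return 0;
    return 1;
}
```

Under this model, two cheap arms with a combined cost of 2 are still converted
when the branch cost is 2, while a pair of arms costing 100 and 65 is left as a
branch.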
* [Bug rtl-optimization/15187] Inefficient if optimization with -O2 -ffast-math
From: uros at kss-loka dot si @ 2004-09-17 5:41 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at kss-loka dot si 2004-09-17 05:41 -------
Fixed by:
http://gcc.gnu.org/ml/gcc-patches/2004-07/msg00654.html
http://gcc.gnu.org/ml/gcc-patches/2004-07/msg01107.html
(TARGET_CMOV case by):
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01667.html
* [Bug rtl-optimization/15187] Inefficient if optimization with -O2 -ffast-math
From: uros at gcc dot gnu dot org @ 2004-09-17 5:47 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at gcc dot gnu dot org 2004-09-17 05:47 -------
I forgot to mark this bug as RESOLVED FIXED in the previous comment.
--
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
* [Bug rtl-optimization/15187] Inefficient if optimization with -O2 -ffast-math
From: pinskia at gcc dot gnu dot org @ 2004-09-17 8:44 UTC (permalink / raw)
To: gcc-bugs
--
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |4.0.0
* [Bug rtl-optimization/15187] Inefficient if optimization with -O2 -ffast-math
From: uros at kss-loka dot si @ 2006-03-29 14:08 UTC (permalink / raw)
To: gcc-bugs
------- Comment #12 from uros at kss-loka dot si 2006-03-29 14:08 -------
(In reply to comment #11)
> It looks like 4.1.1 and 4.2.0 still produce suboptimal code.
> test: pushl %ebp
> movl %esp, %ebp
> fldl 8(%ebp)
> fldz
> fcomip %st(1), %st
> jae .L2
> popl %ebp
> fcos
> ret
>
> .L2: popl %ebp
> fsin
> ret
No, this code is optimal. Please compare the code above to the code in the
description, where fcos is calculated even if x <= 0.0.
* [Bug rtl-optimization/15187] Inefficient if optimization with -O2 -ffast-math
From: pluto at agmk dot net @ 2006-03-28 10:34 UTC (permalink / raw)
To: gcc-bugs
------- Comment #11 from pluto at agmk dot net 2006-03-28 10:34 -------
It looks like 4.1.1 and 4.2.0 still produce suboptimal code.
#include <math.h>
double test(double x)
{
    if (x > 0.0)
        return cos(x);
    else
        return sin(x);
}
[ 4.1.1-0.20060322 / rev.112277 /x86-64 ]
$ gcc bug.c -O2 -ffast-math -m32 -march=i686 -S
test: pushl %ebp
movl %esp, %ebp
fldl 8(%ebp)
fldz
fcomip %st(1), %st
jae .L2
popl %ebp
fcos
ret
.L2: popl %ebp
fsin
ret
[ 4.2.0-20060323 / rev.112317 / x86-64 ]
$ ./xgcc -B. bug.c -O2 -ffast-math -m32 -march=i686 -S
bug.c: In function ‘test’:
bug.c:7: internal compiler error: in bsi_last, at tree-flow-inline.h:760
--
pluto at agmk dot net changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |pluto at agmk dot net
Thread overview: 15+ messages
2004-04-28 7:26 [Bug optimization/15187] New: Inefficient if optimization with -O2 -ffast-math uros at kss-loka dot si
2004-04-28 9:50 ` [Bug optimization/15187] " zack at codesourcery dot com
2004-04-28 11:22 ` uros at kss-loka dot si
2004-04-28 12:52 ` pinskia at gcc dot gnu dot org
2004-04-28 13:30 ` uros at kss-loka dot si
2004-04-28 13:42 ` pinskia at gcc dot gnu dot org
2004-04-28 14:45 ` falk at debian dot org
2004-06-04 6:15 ` [Bug rtl-optimization/15187] " pinskia at gcc dot gnu dot org
2004-07-07 13:34 ` roger at eyesopen dot com
2004-09-17 5:32 ` cvs-commit at gcc dot gnu dot org
2004-09-17 5:41 ` uros at kss-loka dot si
2004-09-17 5:47 ` uros at gcc dot gnu dot org
2004-09-17 8:44 ` pinskia at gcc dot gnu dot org
[not found] <bug-15187-1649@http.gcc.gnu.org/bugzilla/>
2006-03-28 10:34 ` pluto at agmk dot net
2006-03-29 14:08 ` uros at kss-loka dot si