* [Bug middle-end/31249]  New: pseudo-optimzation with sincos/cexpi
@ 2007-03-17 21:11 dominiq at lps dot ens dot fr
  2007-03-18  9:50 ` [Bug middle-end/31249] " pinskia at gcc dot gnu dot org
                   ` (18 more replies)
  0 siblings, 19 replies; 21+ messages in thread
From: dominiq at lps dot ens dot fr @ 2007-03-17 21:11 UTC (permalink / raw)
  To: gcc-bugs

With gfortran and g++, the computation of cos(x) and sin(x) is "optimized" by
taking the real and imaginary parts of cexpi(x) (at least that is what I
understand).  This works if and only if the computation of cexpi(x) is faster
than the sum of the separate computations of cos(x) and sin(x).

Now consider the following code:

integer, parameter :: n=1000000
integer :: i
real(8) :: pi, ss, sc, t, dt
pi = acos(-1.0d0)
dt=pi/n
sc=0
ss=0
t=0
do i= 1, 100*n
  sc = sc + cos(t-dt)
  ss = ss + sin(t)
  t = t+dt
end do
print *, sc, ss
end

the result is (G5 1.8 GHz, OSX 10.3.9):

[karma] bug/timing% gfc -O3 sincos.f90 
[karma] bug/timing% time a.out 
 -6.324121638644320E-002 -2.934958087315009E-003
13.020u 0.050s 0:13.59 96.1%    0+0k 0+2io 0pf+0w

It is easy to see that I have fooled the optimizer with the line

  sc = sc + cos(t-dt)

If I replace it by:

  sc = sc + cos(t)

the result is now (over a 50% increase of the CPU time):

[karma] bug/timing% gfc -O3 sincos_o.f90
[karma] bug/timing% time a.out
 -6.324121573032526E-002 -2.934958087315009E-003
21.740u 0.080s 0:22.18 98.3%    0+0k 0+2io 0pf+0w

to be compared with the result of the code:

integer, parameter :: n=1000000
integer :: i
real(8) :: pi, ss, sc, t, dt
complex(8) :: z, dz
pi = acos(-1.0d0)
dt=pi/n
dz=cmplx(0.0d0,dt,8)
sc=0
ss=0
z=0
do i= 1, 100*n
  sc = sc + real(exp(z))
  ss = ss + aimag(exp(z))
  z = z+dz
end do
print *, sc, ss
end

is

[karma] bug/timing% gfc -O3 cexp.f90
[karma] bug/timing% time a.out
 -6.324121573032526E-002 -2.934958087315009E-003
20.850u 0.110s 0:21.45 97.7%    0+0k 0+2io 0pf+0w

Following the comments in PR 30969, 30980, and 31161, I have understood that
on OSX cexpi "falls back" to cexp, in perfect agreement with the above timings.

So it would probably be nice to disable the sincos "optimisation" on platforms
that do not support a fast cexpi, such as OSX (as presently configured).

Note that on Sat, 30 Sep 2006 in

http://gcc.gnu.org/ml/fortran/2006-09/msg00454.html

I have reported (in vain) a timing regression for the fatigue.f90 polyhedron
test case.
Is this related to this pseudo-optimization or to another change?


-- 
           Summary: pseudo-optimzation with sincos/cexpi
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dominiq at lps dot ens dot fr
GCC target triplet: powerpc-apple-darwin7


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31249



* [Bug middle-end/31249] pseudo-optimzation with sincos/cexpi
  2007-03-17 21:11 [Bug middle-end/31249] New: pseudo-optimzation with sincos/cexpi dominiq at lps dot ens dot fr
@ 2007-03-18  9:50 ` pinskia at gcc dot gnu dot org
  2007-03-18 10:20 ` dominiq at lps dot ens dot fr
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2007-03-18  9:50 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from pinskia at gcc dot gnu dot org  2007-03-18 09:49 -------
The only reason why cexp is slow on PPC darwin is because the ABI is stupid. 
Complex float arguments are passed via the GPR and returned also the same way
instead of via the FPRs.  So you will get a transfer of registers.  This is
also true of PPC64 darwin, why they made the same mistake twice I have no idea,
guess they did not expect people to use complex that much.



* [Bug middle-end/31249] pseudo-optimzation with sincos/cexpi
  2007-03-17 21:11 [Bug middle-end/31249] New: pseudo-optimzation with sincos/cexpi dominiq at lps dot ens dot fr
  2007-03-18  9:50 ` [Bug middle-end/31249] " pinskia at gcc dot gnu dot org
@ 2007-03-18 10:20 ` dominiq at lps dot ens dot fr
  2007-03-19  9:28 ` dominiq at lps dot ens dot fr
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: dominiq at lps dot ens dot fr @ 2007-03-18 10:20 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from dominiq at lps dot ens dot fr  2007-03-18 10:20 -------
Andrew,

Thanks for the answer. Additional timings for an AMD Opteron(tm) Processor 250,
2.4 GHz:

Target: x86_64-unknown-linux-gnu
...
gcc version 4.3.0 20061231 (experimental)

[tocata] test/fortran> gfc -O3 sincos.f90 
[tocata] test/fortran> time a.out 
 -6.324121691031215E-002 -2.934957388823078E-003
19.847u 0.001s 0:20.41 97.2%    0+0k 0+0io 0pf+0w

[tocata] test/fortran> gfc -O3 sincos_o.f90
[tocata] test/fortran> time a.out
 -6.324121619598655E-002 -2.934957388823078E-003
19.793u 0.000s 0:19.80 99.9%    0+0k 0+0io 0pf+0w

[tocata] test/fortran> gfc -O3 cexp.f90 
[tocata] test/fortran> time a.out
 -6.324121619598655E-002 -2.934957388823078E-003
15.613u 0.000s 0:15.63 99.8%    0+0k 0+0io 0pf+0w

sin+cos is not optimized as cexpi.

Target: i386-pc-linux-gnu
...
gcc version 4.3.0 20070225 (experimental)

[tocata] test/fortran> gfc32 -Wa,-32 -O3 -fdump-tree-optimized sincos.f90
[tocata] test/fortran> time a.out
 -6.324122144403047E-002 -2.934963088285132E-003
10.757u 0.000s 0:10.76 99.9%    0+0k 0+0io 0pf+0w

[tocata] test/fortran> gfc32 -Wa,-32 -O3 -fdump-tree-optimized sincos_o.f90
[tocata] test/fortran> time a.out 
 -6.324122124732012E-002 -2.934963117388848E-003
7.291u 0.001s 0:07.47 97.5%     0+0k 0+0io 4pf+0w

[tocata] test/fortran> gfc32 -Wa,-32 -O3 -fdump-tree-optimized cexp.f90
[tocata] test/fortran> time a.out
 -6.324122124732012E-002 -2.934963117388848E-003
11.412u 0.000s 0:11.41 100.0%   0+0k 0+0io 0pf+0w

sin+cos is optimized as cexpi which is faster than cexp -> real optimization!
The i386 code is almost twice as fast as the x86_64 one.



* [Bug middle-end/31249] pseudo-optimzation with sincos/cexpi
  2007-03-17 21:11 [Bug middle-end/31249] New: pseudo-optimzation with sincos/cexpi dominiq at lps dot ens dot fr
  2007-03-18  9:50 ` [Bug middle-end/31249] " pinskia at gcc dot gnu dot org
  2007-03-18 10:20 ` dominiq at lps dot ens dot fr
@ 2007-03-19  9:28 ` dominiq at lps dot ens dot fr
  2007-03-19 10:43 ` rguenth at gcc dot gnu dot org
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: dominiq at lps dot ens dot fr @ 2007-03-19  9:28 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from dominiq at lps dot ens dot fr  2007-03-19 09:28 -------
BTW, did I miss an option to turn this optimization off?



* [Bug middle-end/31249] pseudo-optimzation with sincos/cexpi
  2007-03-17 21:11 [Bug middle-end/31249] New: pseudo-optimzation with sincos/cexpi dominiq at lps dot ens dot fr
                   ` (2 preceding siblings ...)
  2007-03-19  9:28 ` dominiq at lps dot ens dot fr
@ 2007-03-19 10:43 ` rguenth at gcc dot gnu dot org
  2007-03-19 12:44 ` dominiq at lps dot ens dot fr
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2007-03-19 10:43 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from rguenth at gcc dot gnu dot org  2007-03-19 10:43 -------
There is no option to turn it off.  But for !TARGET_C99_FUNCTIONS and
!TARGET_HAS_SINCOS targets it's off.  Usually (in fact, for every libm I looked
into), cexp is implemented as

complex double cexp (complex double x)
{
  double cos = cos (imag(x));
  double sin = sin (imag(x));
  double e = 1;
  if (real(x) != 0)
    e = exp (real(x));
...

possibly computing cos and sin in an efficient way (using sincos).  So
cexp () should never be slower than calling sin () and cos ().  If the ABI
were not stupid of course ;)

Does darwin have a sincos() library function?



* [Bug middle-end/31249] pseudo-optimzation with sincos/cexpi
  2007-03-17 21:11 [Bug middle-end/31249] New: pseudo-optimzation with sincos/cexpi dominiq at lps dot ens dot fr
                   ` (3 preceding siblings ...)
  2007-03-19 10:43 ` rguenth at gcc dot gnu dot org
@ 2007-03-19 12:44 ` dominiq at lps dot ens dot fr
  2007-03-19 17:52   ` Andrew Pinski
  2007-03-19 17:53 ` pinskia at gmail dot com
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 21+ messages in thread
From: dominiq at lps dot ens dot fr @ 2007-03-19 12:44 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from dominiq at lps dot ens dot fr  2007-03-19 12:43 -------
> There is no option to turn it off.  But for !TARGET_C99_FUNCTIONS and
> !TARGET_HAS_SINCOS targets it's off.  

From my understanding of the thread

http://gcc.gnu.org/ml/gcc/2007-03/msg00639.html

if !TARGET_64BIT, then TARGET_C99_FUNCTIONS depends on
darwin_macosx_version_min, which presently seems to default to 10.1.
So TARGET_C99_FUNCTIONS seems to be set to 0, at least for a G5 under
OSX 10.3 and a G4 under 10.4.

> Does darwin have a sincos() library function?

I don't know. If there is no answer in this PR, I can ask the 
question on gcc@gcc.gnu.org.

From the behavior reported in PR30969, PR30980, and PR31161, it seems
that the optimization is off for gcc, but on for g++ and gfortran,
though I cannot figure out why.

> If the ABI were not stupid of course ;)

Since sin() and cos() are non-trivial functions, I am very surprised
that a wrong ABI makes a 50% difference.

If the ABI is so time consuming, why not inline sin() and cos()?
In addition, a decent optimizer should be able to eliminate the redundant
parts of the code, making the use of a sincos() function unnecessary,
or am I too naive?

What is the best way to collect data on the different platforms
to evaluate how this optimization is really working?
I only have access to OSX and to Intel or AMD64 under Linux.



* Re: [Bug middle-end/31249] pseudo-optimzation with sincos/cexpi
  2007-03-19 12:44 ` dominiq at lps dot ens dot fr
@ 2007-03-19 17:52   ` Andrew Pinski
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Pinski @ 2007-03-19 17:52 UTC (permalink / raw)
  To: gcc-bugzilla; +Cc: gcc-bugs

On 19 Mar 2007 12:43:49 -0000, dominiq at lps dot ens dot fr
<gcc-bugzilla@gcc.gnu.org> wrote:
>
> Since sin() and cos() are non trivial functions, I am very surprised
> that a wrong ABI makes a 50% difference.

Well, here is how it can make a 50% difference (at least on the Cell;
the 970 has less of a restriction and only the dispatch group is
rejected).  Modern PowerPC processors do not like storing stuff to the
stack and then loading it again within a number of cycles (on the Cell
it is around 50 cycles, while on the 970 it is just within a dispatch
group).  Transferring between the integer register set and the floating
point register set can only be done via memory, so you will get an LHS
or an LRU reject (depending on which processor you are on).  This can
either cause a 50 cycle delay or a reject of the dispatch group (the
latter can cause multiple rejects).  The number of cycles used up by this
issue can add up, with both sides of the function call having this hazard.
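
As a source-level analogy only, the round trip through memory looks roughly like the following portable C. It assumes a 64-bit IEEE 754 double; the function names are invented, and a compiler for a target with direct register moves may keep everything in registers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: view the bits of a double as an integer.  On a
   target without direct FPR-to-GPR moves this becomes a store followed
   by a load of the same location. */
uint64_t double_to_gpr_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Hypothetical helper: the reverse transfer, as needed on the other
   side of the call. */
double gpr_bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

A complex double argument needs this for both components on the way in, and again for both components of the result on the way out.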

Thanks,
Andrew Pinski



* [Bug middle-end/31249] pseudo-optimzation with sincos/cexpi
  2007-03-17 21:11 [Bug middle-end/31249] New: pseudo-optimzation with sincos/cexpi dominiq at lps dot ens dot fr
                   ` (5 preceding siblings ...)
  2007-03-19 17:53 ` pinskia at gmail dot com
@ 2007-03-20 11:04 ` rguenth at gcc dot gnu dot org
  2007-03-20 13:57 ` dominiq at lps dot ens dot fr
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2007-03-20 11:04 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from rguenth at gcc dot gnu dot org  2007-03-20 11:04 -------
I agree it's surprising to get user-visible effects from the
TARGET_C99_FUNCTIONS difference between the frontends, but they are
(supposed to be) providing C99 runtime completion through their runtime
libraries.  And they rely on full C99 support.



* [Bug middle-end/31249] pseudo-optimzation with sincos/cexpi
  2007-03-17 21:11 [Bug middle-end/31249] New: pseudo-optimzation with sincos/cexpi dominiq at lps dot ens dot fr
                   ` (6 preceding siblings ...)
  2007-03-20 11:04 ` rguenth at gcc dot gnu dot org
@ 2007-03-20 13:57 ` dominiq at lps dot ens dot fr
  2007-03-20 14:03 ` dominiq at lps dot ens dot fr
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: dominiq at lps dot ens dot fr @ 2007-03-20 13:57 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from dominiq at lps dot ens dot fr  2007-03-20 13:57 -------
> The only reason why cexp is slow on PPC darwin is because the ABI is stupid. 
> Complex float arguments are passed via the GPR and returned also the same way
> instead of via the FPRs.  So you will get a transfer of registers.  This is
> also true of PPC64 darwin, why they made the same mistake twice I have no idea,
> guess they did not expect people to use complex that much.

Is this also true for complex double on 32 bit architectures (i.e., 4 GPRs)
or do you mean the GPR is used to pass a pointer?

> ... .  The number of cycles used up by this issue
> can add up with both sides of the function having this hazard.

You are reinforcing my prejudice against using procedures in critical loops.

Now if you cannot convince the darwin people to fix the problem, I cannot see
how I could.  My short term interests are:

(1) to understand how this reverse optimization is triggered.

(2) to know which non-Linux platforms are affected besides
Darwin.

(3) to get a workaround less hackish than what I did in my first example.



* [Bug middle-end/31249] pseudo-optimzation with sincos/cexpi
  2007-03-17 21:11 [Bug middle-end/31249] New: pseudo-optimzation with sincos/cexpi dominiq at lps dot ens dot fr
                   ` (7 preceding siblings ...)
  2007-03-20 13:57 ` dominiq at lps dot ens dot fr
@ 2007-03-20 14:03 ` dominiq at lps dot ens dot fr
  2007-03-20 14:26 ` rguenth at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: dominiq at lps dot ens dot fr @ 2007-03-20 14:03 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from dominiq at lps dot ens dot fr  2007-03-20 14:03 -------
> I agree it's surprising to get user-visible effects with the
> TARGET_C99_FUNCTIONS difference between the frontends, 
> but they are (supposed to) providing C99 runtime completion 
> by their runtime libraries.  And they rely on full C99 support.

Do you mean that g++ and gfortran set TARGET_C99_FUNCTIONS on
their own?

If yes, the cexpi optimization should probably have another condition:
what is the point of replacing sin+cos by a call to a function
calling sin+cos?



* [Bug middle-end/31249] pseudo-optimzation with sincos/cexpi
  2007-03-17 21:11 [Bug middle-end/31249] New: pseudo-optimzation with sincos/cexpi dominiq at lps dot ens dot fr
                   ` (8 preceding siblings ...)
  2007-03-20 14:03 ` dominiq at lps dot ens dot fr
@ 2007-03-20 14:26 ` rguenth at gcc dot gnu dot org
  2007-03-20 14:58 ` dominiq at lps dot ens dot fr
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2007-03-20 14:26 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from rguenth at gcc dot gnu dot org  2007-03-20 14:26 -------
That sin+cos is practically sincos (so you get one for free).  Just not every
library exports that sincos.



* [Bug middle-end/31249] pseudo-optimzation with sincos/cexpi
  2007-03-17 21:11 [Bug middle-end/31249] New: pseudo-optimzation with sincos/cexpi dominiq at lps dot ens dot fr
                   ` (9 preceding siblings ...)
  2007-03-20 14:26 ` rguenth at gcc dot gnu dot org
@ 2007-03-20 14:58 ` dominiq at lps dot ens dot fr
  2007-03-20 15:07 ` rguenth at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: dominiq at lps dot ens dot fr @ 2007-03-20 14:58 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from dominiq at lps dot ens dot fr  2007-03-20 14:57 -------
Subject: Re:  pseudo-optimzation with sincos/cexpi

> That sin+cos is practically sincos (so you get one for free).  Just not every
> library exports that sincos.

Doesn't this assume that a real sincos(x) exists, and not a faked one
through cexp((0,x))?



* [Bug middle-end/31249] pseudo-optimzation with sincos/cexpi
  2007-03-17 21:11 [Bug middle-end/31249] New: pseudo-optimzation with sincos/cexpi dominiq at lps dot ens dot fr
                   ` (10 preceding siblings ...)
  2007-03-20 14:58 ` dominiq at lps dot ens dot fr
@ 2007-03-20 15:07 ` rguenth at gcc dot gnu dot org
  2007-03-20 16:09 ` dominiq at lps dot ens dot fr
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2007-03-20 15:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from rguenth at gcc dot gnu dot org  2007-03-20 15:06 -------
Depends on how you name it ;)  You can propose that we only enable sincos
transformation if TARGET_HAS_SINCOS is set, I wouldn't necessarily object to
that.
(The targets I care for have a sincos)



* [Bug middle-end/31249] pseudo-optimzation with sincos/cexpi
  2007-03-17 21:11 [Bug middle-end/31249] New: pseudo-optimzation with sincos/cexpi dominiq at lps dot ens dot fr
                   ` (11 preceding siblings ...)
  2007-03-20 15:07 ` rguenth at gcc dot gnu dot org
@ 2007-03-20 16:09 ` dominiq at lps dot ens dot fr
  2007-03-20 16:12 ` rguenth at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: dominiq at lps dot ens dot fr @ 2007-03-20 16:09 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from dominiq at lps dot ens dot fr  2007-03-20 16:08 -------
> You can propose that we only enable sincos transformation 
> if TARGET_HAS_SINCOS is set, I wouldn't necessarily object to
> that. (The targets I care for have a sincos)

Sounds reasonable: replacing

  return (TARGET_HAS_SINCOS
          || TARGET_C99_FUNCTIONS)
         && optimize;

by

  return TARGET_HAS_SINCOS
         && optimize;

in gcc/tree-ssa-math-opts.c, isn't it?
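
To spell out what the proposed change does, here is a small C sketch of both conditions. The real code tests target macros and a global flag; here they are passed as plain parameters with made-up names so the two variants can be compared side by side.

```c
#include <stdbool.h>

/* Current condition: the pass runs if the target has sincos OR claims
   C99 math functions (and optimization is enabled). */
bool gate_before(bool has_sincos, bool c99_functions, bool optimize)
{
    return (has_sincos || c99_functions) && optimize;
}

/* Proposed condition: only a real sincos enables the pass. */
bool gate_after(bool has_sincos, bool c99_functions, bool optimize)
{
    (void)c99_functions;   /* no longer consulted */
    return has_sincos && optimize;
}
```

The two differ in exactly one case: a target without sincos but with C99 functions, where the pass is currently enabled and would be disabled by the change.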

I can even do a preliminary test to check that it
does not break anything.

What bothers me is that I suspect the problem is present
on all non-Linux platforms, and I have no feedback from them.
If you have some idea about how to trigger their interest,
it would be nice.



* [Bug middle-end/31249] pseudo-optimzation with sincos/cexpi
  2007-03-17 21:11 [Bug middle-end/31249] New: pseudo-optimzation with sincos/cexpi dominiq at lps dot ens dot fr
                   ` (12 preceding siblings ...)
  2007-03-20 16:09 ` dominiq at lps dot ens dot fr
@ 2007-03-20 16:12 ` rguenth at gcc dot gnu dot org
  2007-03-20 16:14 ` pinskia at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2007-03-20 16:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from rguenth at gcc dot gnu dot org  2007-03-20 16:12 -------
The recommended way is to post a message to gcc@gcc.gnu.org or
gcc-patches@gcc.gnu.org



* [Bug middle-end/31249] pseudo-optimzation with sincos/cexpi
  2007-03-17 21:11 [Bug middle-end/31249] New: pseudo-optimzation with sincos/cexpi dominiq at lps dot ens dot fr
                   ` (13 preceding siblings ...)
  2007-03-20 16:12 ` rguenth at gcc dot gnu dot org
@ 2007-03-20 16:14 ` pinskia at gcc dot gnu dot org
  2007-03-21 15:36 ` dominiq at lps dot ens dot fr
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2007-03-20 16:14 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from pinskia at gcc dot gnu dot org  2007-03-20 16:14 -------
> Is this also true for complex double on 32 bit architectures (i.e., 4 GPRs)
> or do you mean the GPR is used to pass a pointer?

4 GPRS

Yes, this was a stupid decision on Apple's part, not looking at fixing GCC
before setting an ABI.

And really this problem is only with PPC; no other target has this stupid ABI
issue.



* [Bug middle-end/31249] pseudo-optimzation with sincos/cexpi
  2007-03-17 21:11 [Bug middle-end/31249] New: pseudo-optimzation with sincos/cexpi dominiq at lps dot ens dot fr
                   ` (14 preceding siblings ...)
  2007-03-20 16:14 ` pinskia at gcc dot gnu dot org
@ 2007-03-21 15:36 ` dominiq at lps dot ens dot fr
  2007-03-21 15:57 ` rguenth at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: dominiq at lps dot ens dot fr @ 2007-03-21 15:36 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from dominiq at lps dot ens dot fr  2007-03-21 15:36 -------
> The recommended way is to post a message to gcc@gcc.gnu.org or
> gcc-patches@gcc.gnu.org

I'll follow your advice, but before that I'd like some feedback about what follows.

I have applied the following patch

--- gcc-4.3-20070316/gcc/tree-ssa-math-opts.c   Thu Mar  8 20:02:51 2007
+++ gcc-4.3-20070317/gcc/tree-ssa-math-opts.c   Tue Mar 20 17:21:16 2007
@@ -704,9 +704,7 @@
 gate_cse_sincos (void)
 {
   /* Make sure we have either sincos or cexp.  */
-  return (TARGET_HAS_SINCOS
-         || TARGET_C99_FUNCTIONS)
-        && optimize;
+  return TARGET_HAS_SINCOS && optimize;
 }

 struct tree_opt_pass pass_cse_sincos =

I have regtested it with no change in the reports for gcc, g++, gfortran, 
and objc. The timings before and after are

                              before                   after
                        -O0     -O1      %      -O0     -O1      %

g++-4 sincos_o.c        6.2     9.6     +55     6.3     5.6     -11
gfc sincos_o.f90        6.3     9.6     +52     6.4     5.5     -14

for the following C and Fortran tests:

[karma] bug/timing% cat sincos_o.c
#include <math.h>
#include <stdio.h>

int main()
{

  long    n = 1000000;
  long    i;
  double mo = -1.0;
  double pi = acos(mo);
  double sc = 0.0;
  double ss = 0.0;
  double  t = 0.0;
  double dt = pi/n;

  printf("%.17g \n", pi);
  printf("%.17g \n", dt);
  for (i=0; i< 40*n; i++) {
    sc += cos(t);
    ss += sin(t);
    t += dt;
  }
  printf("%.17g %.17g \n", sc, ss);
}

[karma] bug/timing% cat sincos_o.f90 
integer, parameter :: n=1000000
integer :: i
real(8) :: pi, ss, sc, t, dt
pi = acos(-1.0d0)
dt=pi/n
sc=0
ss=0
t=0
do i= 1, 40*n
  sc = sc + cos(t)
  ss = ss + sin(t)
  t = t+dt
end do
print *, sc, ss
end

So from the PPC Darwin point of view, everything is working as expected.
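
The "%" columns in the timing table above are the relative change of the -O1 time with respect to the -O0 time. A minimal sketch of that arithmetic, with an invented helper name:

```c
/* Hypothetical helper: relative change of the -O1 time with respect to
   the -O0 time, in percent.  Positive means -O1 is slower. */
double percent_change(double t_O0, double t_O1)
{
    return 100.0 * (t_O1 - t_O0) / t_O0;
}
```

For the g++ row, percent_change(6.2, 9.6) is about +54.8 before the patch and percent_change(6.3, 5.6) is about -11.1 after it, which round to the +55 and -11 shown in the table.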

I have done a search on the regtest list based on

FAIL: gcc.dg/builtins-59.c scan-tree-dump __builtin_cexpi

assuming that platforms that do not pass it are likely not to have
__builtin_cexpi, and are thus exposed to the same bad optimization.
I have found the following list of platforms tested on a regular basis:

powerpc-apple-darwin8.5.0
hppa2.0w-hp-hpux11.11
v850-unknown-elf
sparc-unknown-elf
sh-unknown-elf
powerpc-unknown-eabisim
mips-unknown-elf
m32r-unknown-elf
m32c-unknown-elf
avr-unknown-none
frv-unknown-elf
arm-unknown-elf
cris-axis-elf
arm-none-eabi

As far as I can tell, only the first two are tested against g++ and 
gfortran.

Note that the list does not include powerpc64-apple-darwin8.8.0.
So it seems that it has __builtin_cexpi.

I don't know what will be the final decision about the proposed patch,
but there is no "emergency" since I can use it for my coming weekly builds.
I would prefer to have some feedback from the listed platforms before
seeing the patch applied.



* [Bug middle-end/31249] pseudo-optimzation with sincos/cexpi
  2007-03-17 21:11 [Bug middle-end/31249] New: pseudo-optimzation with sincos/cexpi dominiq at lps dot ens dot fr
                   ` (15 preceding siblings ...)
  2007-03-21 15:36 ` dominiq at lps dot ens dot fr
@ 2007-03-21 15:57 ` rguenth at gcc dot gnu dot org
  2007-03-21 16:09 ` dominiq at lps dot ens dot fr
  2009-12-04 17:10 ` [Bug middle-end/31249] pseudo-optimization " dominiq at lps dot ens dot fr
  18 siblings, 0 replies; 21+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2007-03-21 15:57 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from rguenth at gcc dot gnu dot org  2007-03-21 15:57 -------
It would be nice to know whether darwin does not implement cexp in an optimal
way (special casing zero real part and dispatching to a sincos equivalent for
the imaginary part) or if the issue is only a bad ABI for complex values. 
Otherwise the proposal looks ok, those other targets are mostly embedded ones
and as such don't care too much about optimized sincos probably.



* [Bug middle-end/31249] pseudo-optimzation with sincos/cexpi
  2007-03-17 21:11 [Bug middle-end/31249] New: pseudo-optimzation with sincos/cexpi dominiq at lps dot ens dot fr
                   ` (16 preceding siblings ...)
  2007-03-21 15:57 ` rguenth at gcc dot gnu dot org
@ 2007-03-21 16:09 ` dominiq at lps dot ens dot fr
  2009-12-04 17:10 ` [Bug middle-end/31249] pseudo-optimization " dominiq at lps dot ens dot fr
  18 siblings, 0 replies; 21+ messages in thread
From: dominiq at lps dot ens dot fr @ 2007-03-21 16:09 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #18 from dominiq at lps dot ens dot fr  2007-03-21 16:09 -------
> It would be nice to know whether darwin does not implement cexp in an optimal way ...

I forgot to mention that I did some (quick) profiling: cexp seems to be
trivially implemented.  It calls sin, cos and exp, plus the ABI problem
mentioned by Andrew.  It seems that any implementation of cexp calling sin
and cos would at best lead to a draw, and more likely to a regression in
timing.  Now I cannot rule out that on some platforms cexp is implemented
in a clever way, using some kind of sincos algorithm (this is what I would
like to know).



* [Bug middle-end/31249] pseudo-optimization with sincos/cexpi
  2007-03-17 21:11 [Bug middle-end/31249] New: pseudo-optimzation with sincos/cexpi dominiq at lps dot ens dot fr
                   ` (17 preceding siblings ...)
  2007-03-21 16:09 ` dominiq at lps dot ens dot fr
@ 2009-12-04 17:10 ` dominiq at lps dot ens dot fr
  18 siblings, 0 replies; 21+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-12-04 17:10 UTC (permalink / raw)
  To: gcc-bugs



-- 

dominiq at lps dot ens dot fr changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |minor
            Summary|pseudo-optimzation with     |pseudo-optimization with
                   |sincos/cexpi                |sincos/cexpi

