[Bug middle-end/33780] different results between O3 and O0

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug middle-end/33780] different results between O3 and O0
       [not found] <bug-33780-4@http.gcc.gnu.org/bugzilla/>
@ 2024-02-16  5:15 ` pinskia at gcc dot gnu.org
  0 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-02-16  5:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33780

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I can only reproduce this these days on 32bit x86 when x87 is in use ...

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug middle-end/33780] different results between O3 and O0
  2007-10-15 14:49 [Bug middle-end/33780] New: " jv244 at cam dot ac dot uk
                   ` (4 preceding siblings ...)
  2008-02-05 11:09 ` rguenth at gcc dot gnu dot org
@ 2008-02-11 11:26 ` jv244 at cam dot ac dot uk
  5 siblings, 0 replies; 7+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-02-11 11:26 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from jv244 at cam dot ac dot uk  2008-02-11 11:25 -------
(In reply to comment #5)
> Or are you complaining that we constant fold integer powers as if they were
> non-integer powers

Richard,

I think I'm suggesting more a enhancement than a bug fix. Some good quality
compiler like IBM's xlf90 guarantee bit-wise identical results at all
optimisation levels, unless one specifies -qno-strict (which is included by
default). My idea was/is that optimization independent results are (on the
right hardware, i.e. not x87) also provided by gcc, unless one specifies
-ffast-math. The above testcase is the only one I found in our 500kLOC
simulation package that changed results as I changed optimisation levels, so I
think it is valuable to make the underlying optimisation depending on a compile
time flag (which is included in -ffast-math).

>From a Fortran language point of view the compiler is free to use any
implementation to compute a**i, including one that varies with optimisation
level or context.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33780


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug middle-end/33780] different results between O3 and O0
  2007-10-15 14:49 [Bug middle-end/33780] New: " jv244 at cam dot ac dot uk
                   ` (3 preceding siblings ...)
  2007-10-18 15:23 ` dominiq at lps dot ens dot fr
@ 2008-02-05 11:09 ` rguenth at gcc dot gnu dot org
  2008-02-11 11:26 ` jv244 at cam dot ac dot uk
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-02-05 11:09 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from rguenth at gcc dot gnu dot org  2008-02-05 11:08 -------
The difference with your last testcase is indeed that we unroll the loop
setting c[] and constant-fold the result by computing the integral powers
with 0.5ulp precision (and a single rounding(!)).  But all your other
computations involve intermediate roundings, so this is still invalid IMHO.

Or are you complaining that we constant fold integer powers as if they were
non-integer powers (not using the fact that Fortran would allow expansion
to multiple FLOPs)?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33780


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug middle-end/33780] different results between O3 and O0
  2007-10-15 14:49 [Bug middle-end/33780] New: " jv244 at cam dot ac dot uk
                   ` (2 preceding siblings ...)
  2007-10-15 15:51 ` dominiq at lps dot ens dot fr
@ 2007-10-18 15:23 ` dominiq at lps dot ens dot fr
  2008-02-05 11:09 ` rguenth at gcc dot gnu dot org
  2008-02-11 11:26 ` jv244 at cam dot ac dot uk
  5 siblings, 0 replies; 7+ messages in thread
From: dominiq at lps dot ens dot fr @ 2007-10-18 15:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from dominiq at lps dot ens dot fr  2007-10-18 15:23 -------
> while we expand a constant integral power to an optimal (as of the count of
> multiplications / additions) series of multiplications and additions.

It seems the difference is coming from something else. I'll bet that with -O3
the powers are computed at compile time with some extra precision. At least it
is what I conclude from the following code:

  integer :: ia(16) = (/ 1, 1, 2, 2, 3, 3, 4, 4, 6, 5, 6, 6, 10, 7, 9, 8 /)
  real(8) :: a(16), b(16), c(16), d(16)
  real(16) :: e(16)
  real(8), parameter :: r=4.51556282882533D0
  a(1) = r
  a(2) = a(1)*a(1)
  a(3:4) = a(2)*a(1:2)
  a(5:8) = a(4)*a(1:4)
  a(9:16) = a(8)*a(1:8)
  b(1) = r
  do i = 2, 16
     b(i) = b(ia(i))*b(i-ia(i))
  end do
  e(1) = r
  e(2) = e(1)*e(1)
  e(3:4) = e(2)*e(1:2)
  e(5:8) = e(4)*e(1:4)
  e(9:16) = e(8)*e(1:8)
  d = real(e,8)
  do i=1, 16
     c(i) = r**i
  end do
  do i=1, 16
     print *, a(i)/d(i)-1.0d0, b(i)/d(i)-1.0d0, c(i)/d(i)-1.0d0
  end do
  print *
  print *, sum(abs(a/d-1.0d0)), sum(abs(b/d-1.0d0)), sum(abs(c/d-1.0d0)),
epsilon(r)
end

a stores the powers computed from the usual shift, b those computed from the
"optimal" tree (from gcc/builtins.c, c those computed from the power, and d the
values rounded from values with higher precision. At -O2 the output is:

   0.0000000000000000        0.0000000000000000        0.0000000000000000     
   0.0000000000000000        0.0000000000000000        0.0000000000000000     
   0.0000000000000000        0.0000000000000000        0.0000000000000000     
  2.22044604925031308E-016  2.22044604925031308E-016  2.22044604925031308E-016
  2.22044604925031308E-016  2.22044604925031308E-016  2.22044604925031308E-016
  2.22044604925031308E-016   0.0000000000000000       2.22044604925031308E-016
  2.22044604925031308E-016  2.22044604925031308E-016  2.22044604925031308E-016
  4.44089209850062616E-016  4.44089209850062616E-016  4.44089209850062616E-016
  2.22044604925031308E-016   0.0000000000000000       2.22044604925031308E-016
  4.44089209850062616E-016  2.22044604925031308E-016  4.44089209850062616E-016
  4.44089209850062616E-016  2.22044604925031308E-016  4.44089209850062616E-016
  4.44089209850062616E-016   0.0000000000000000       4.44089209850062616E-016
  4.44089209850062616E-016  4.44089209850062616E-016  4.44089209850062616E-016
  4.44089209850062616E-016  2.22044604925031308E-016  4.44089209850062616E-016
  6.66133814775093924E-016   0.0000000000000000       6.66133814775093924E-016
  6.66133814775093924E-016  6.66133814775093924E-016  6.66133814775093924E-016

  5.10702591327572009E-015  2.88657986402540701E-015  5.10702591327572009E-015 
2.22044604925031308E-016

a and c are equal, while at -O3 the output is

   0.0000000000000000        0.0000000000000000        0.0000000000000000     
   0.0000000000000000        0.0000000000000000        0.0000000000000000     
   0.0000000000000000        0.0000000000000000        0.0000000000000000     
  2.22044604925031308E-016  2.22044604925031308E-016   0.0000000000000000     
  2.22044604925031308E-016  2.22044604925031308E-016   0.0000000000000000     
  2.22044604925031308E-016   0.0000000000000000        0.0000000000000000     
  2.22044604925031308E-016  2.22044604925031308E-016   0.0000000000000000     
  4.44089209850062616E-016  4.44089209850062616E-016   0.0000000000000000     
  2.22044604925031308E-016   0.0000000000000000        0.0000000000000000     
  4.44089209850062616E-016  2.22044604925031308E-016   0.0000000000000000     
  4.44089209850062616E-016  2.22044604925031308E-016   0.0000000000000000     
  4.44089209850062616E-016   0.0000000000000000        0.0000000000000000     
  4.44089209850062616E-016  4.44089209850062616E-016   0.0000000000000000     
  4.44089209850062616E-016  2.22044604925031308E-016   0.0000000000000000     
  6.66133814775093924E-016   0.0000000000000000        0.0000000000000000     
  6.66133814775093924E-016  6.66133814775093924E-016   0.0000000000000000     

  5.10702591327572009E-015  2.88657986402540701E-015   0.0000000000000000      
2.22044604925031308E-016

c and d are equal. If the difference were coming only from the "optimal" tree,
c should be equal to b.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33780


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug middle-end/33780] different results between O3 and O0
  2007-10-15 14:49 [Bug middle-end/33780] New: " jv244 at cam dot ac dot uk
  2007-10-15 15:01 ` [Bug middle-end/33780] " rguenth at gcc dot gnu dot org
  2007-10-15 15:30 ` jv244 at cam dot ac dot uk
@ 2007-10-15 15:51 ` dominiq at lps dot ens dot fr
  2007-10-18 15:23 ` dominiq at lps dot ens dot fr
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: dominiq at lps dot ens dot fr @ 2007-10-15 15:51 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from dominiq at lps dot ens dot fr  2007-10-15 15:51 -------
> that causes numerical results with CP2K to change going from -O0 to -O3. 

If you do expect that optimization optimizes your computation, you should
expect some change of the numerical results, so put some tolerance in the
comparisons.
How small can be the tolerance depends on your problem. I have modified your
code to compute the polynomial in three different ways (yours, plus two others,
the third one being the recommended one for "numerical stability", if I did not
make some errors):

FUNCTION F(r,a,e,pf,qf)
 REAL*8 :: a(2:15)
 REAL*8 :: r,f,e, pr,pf,qf,rr
 f=0
 pf=0
 pr=r
 rr=1.0d0/r
 DO i = 2, 15
    f = f + a(i)/(r**(i-1)*REAL(i-1,8))
    pf = pf + a(i)/(pr*REAL(i-1,8))
    pr = r*pr
 END DO
 f=f-e
 pf=pf-e
 qf=a(15)/REAL(14,8)
 do i = 14, 2, -1
    qf=rr*qf+a(i)/REAL(i-1,8)
 end do
 qf=rr*qf-e
END FUNCTION F

PROGRAM TEST

  REAL*8 :: a(2:15)=(/-195.771601327700D0,  15343.7861339500D0, &
                      -530864.458651600D0,  10707934.3905800D0, &
                      -140099704.789000D0,  1250943273.78500D0, &
                      -7795458330.67600D0,  33955897217.3100D0, &
                      -101135640744.000D0,  193107995718.700D0, &
                      -193440560940.000D0, -4224406093.91800D0, &
                       217192386506.500D0, -157581228915.500D0/)

  REAL*8 :: r=4.51556282882533D0
  REAL*8 :: e=-2.21199635966809471000D0
  REAL*8 :: pf, qf
  REAL*8 :: f

  write(6,*)  f(r,a,e,pf,qf)
  write(6,*)  pf, qf

END PROGRAM TEST

[karma] f90/bug% gfc pr33780_red.f90
[karma] f90/bug% a.out
   0.0000000000000000     
  1.36957112317759311E-012  7.66053886991358013E-012
[karma] f90/bug% gfc -O3 pr33780_red.f90
[karma] f90/bug% a.out
 -1.14397380457376130E-012
  1.36957112317759311E-012  1.04861186749364478E-011

So for your problem the tolerance should be ~1.0E-11, and since pf and qf don't
use powers, this is not a question of inlining, but how the operations are
shuffled to "optimize" your computation.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33780


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug middle-end/33780] different results between O3 and O0
  2007-10-15 14:49 [Bug middle-end/33780] New: " jv244 at cam dot ac dot uk
  2007-10-15 15:01 ` [Bug middle-end/33780] " rguenth at gcc dot gnu dot org
@ 2007-10-15 15:30 ` jv244 at cam dot ac dot uk
  2007-10-15 15:51 ` dominiq at lps dot ens dot fr
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: jv244 at cam dot ac dot uk @ 2007-10-15 15:30 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from jv244 at cam dot ac dot uk  2007-10-15 15:30 -------
(In reply to comment #1)
> As Fortran allows open-coding of integral powers this is not a bug.  But
> I'll leave final closing as INVALID to more fortran-knowing people.

I agree that Fortran-wise this is not a bug. However, gcc tends to do some of
the things that Fortran allows by default (x/y -> x*(1/y) ; x+Y+z -> (x=z)+y ;
..) only at -ffast-math. AFAIK, this seems to be the only optimisation in gcc
that causes numerical results with CP2K to change going from -O0 to -O3. 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33780


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug middle-end/33780] different results between O3 and O0
  2007-10-15 14:49 [Bug middle-end/33780] New: " jv244 at cam dot ac dot uk
@ 2007-10-15 15:01 ` rguenth at gcc dot gnu dot org
  2007-10-15 15:30 ` jv244 at cam dot ac dot uk
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2007-10-15 15:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2007-10-15 15:01 -------
Confirmed.  It changes for me -O2 vs. -O2 -fpeel-loops which causes the loop
in f to be completely unrolled (which at the first sight doesn't look wrong).
And we expand __powidf2 inline because the exponent is now a constant for all
calls.  This indeed might explain the difference as __powidf2 is implemented
as

TYPE
NAME (TYPE x, int m)
{
  unsigned int n = m < 0 ? -m : m;
  TYPE y = n % 2 ? x : 1;
  while (n >>= 1)
    {
      x = x * x;
      if (n % 2)
        y = y * x;
    }
  return m < 0 ? 1/y : y;
}  

while we expand a constant integral power to an optimal (as of the count of
multiplications / additions) series of multiplications and additions.

As Fortran allows open-coding of integral powers this is not a bug.  But
I'll leave final closing as INVALID to more fortran-knowing people.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2007-10-15 15:01:08
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33780


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-02-16  5:15 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-33780-4@http.gcc.gnu.org/bugzilla/>
2024-02-16  5:15 ` [Bug middle-end/33780] different results between O3 and O0 pinskia at gcc dot gnu.org
2007-10-15 14:49 [Bug middle-end/33780] New: " jv244 at cam dot ac dot uk
2007-10-15 15:01 ` [Bug middle-end/33780] " rguenth at gcc dot gnu dot org
2007-10-15 15:30 ` jv244 at cam dot ac dot uk
2007-10-15 15:51 ` dominiq at lps dot ens dot fr
2007-10-18 15:23 ` dominiq at lps dot ens dot fr
2008-02-05 11:09 ` rguenth at gcc dot gnu dot org
2008-02-11 11:26 ` jv244 at cam dot ac dot uk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).