public inbox for gcc-bugs@sourceware.org
* [Bug target/34702]  New: 1.0 is not the inverse of 1.0 with -mrecip on x86
@ 2008-01-07 15:08 dominiq at lps dot ens dot fr
  2008-01-07 22:39 ` [Bug target/34702] [4.3 Regression] " pinskia at gcc dot gnu dot org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: dominiq at lps dot ens dot fr @ 2008-01-07 15:08 UTC (permalink / raw)
  To: gcc-bugs

The following test

integer :: i, n
real :: x, y(10)
x =1.0
n = 10
do i = 1, n
  y(i) =  1.0/x
  x = 2.0*x
end do
print *, y
end

when compiled with -O -ffast-math -mrecip, gives

  0.99999994      0.49999997      0.24999999      0.12499999     
6.24999963E-02  3.12499981E-02  1.56249991E-02  7.81249953E-03  3.90624977E-03 
1.95312488E-03

the inverse of an integer power of 2 is not an integer power of 2 with -mrecip
on i686-apple-darwin9 and x86_64-unknown-linux-gnu.

Since x0*(2.0-x*x0) does not have round-off errors when x=2.0**i and
x0=2.0**(-i), I assume that rcp* don't return an integer power of 2 if the
input is an integer power of 2.
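
The assumption above can be checked by hand. One Newton-Raphson step
y1 = y0*(2.0-x*y0) squares the relative error: if y0 = (1-e)/x, then
y1 = (1-e**2)/x, so an estimate that is off by e never lands exactly on 1/x
and ends up slightly below it. A minimal sketch, assuming a hypothetical
estimate y0 = 1 - 2**(-12) for x = 1.0 (chosen for illustration only; the
value returned by a real rcpss may differ):

```fortran
program nr_from_inexact_estimate
  implicit none
  real :: x, y0, y1

  x  = 1.0
  ! Hypothetical reciprocal estimate, about 12 bits accurate.
  ! This is an assumption for the sketch, not a value read from hardware.
  y0 = 1.0 - 2.0**(-12)

  ! One Newton-Raphson step in the form discussed in this report.
  y1 = y0*(2.0 - x*y0)

  ! (1 - 2**(-12))*(1 + 2**(-12)) = 1 - 2**(-24),
  ! which is the largest single-precision number below 1.0.
  print *, y1
  print *, y1 == nearest(1.0, -1.0)
end program nr_from_inexact_estimate
```

With these values every intermediate is exactly representable in single
precision, so the program should print 0.99999994 followed by T, matching the
first entry of the output above.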

In addition, the result of a Newton-Raphson iteration is not the same when
starting from above or from below:

real :: x, y
x = 1.0
y = nearest(x,x)
print *, y*(2.0-x*y)
y = nearest(x,-x)
print *, y*(2.0-x*y)
end

gives

  1.00000000    
  0.99999994    

This result is quite unfortunate since the integer powers of 2 are the only
floating point numbers having an exact inverse.  This numerical error probably
cannot be fixed in an effective way, but should probably be documented.

As a side note, I stumbled on the problem while trying to run the aermod.f90
polyhedron test with -mrecip. I chased the resulting bus error at
execution for a while until I found the culprit at line 35369 of the subroutine
NUMRISE:

      IF ( FLOAT(NNP/NP).EQ.FLOAT(NNP)/XNP ) THEN

for which the comparison fails even if NP=1 and XNP=1.0 with -mrecip. It
follows that the variable NN, initialized within this IF block, is not
initialized, which in some cases leads to a variable DELN negative or bigger
than 1 at line 35514, and hence to an out-of-bounds access.  Note also that there
is no point in discussing (at least with me) the way the program is written: I am
well aware of the dangers of testing floating-point values for equality!
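
The failing comparison can be emulated without -mrecip. The sketch below is
only an illustration: it assumes that the division is carried out as a
multiplication by a refined reciprocal, and it uses nearest(1.0,-1.0) as a
stand-in for the reciprocal of 1.0 observed above. The value of nnp and the
names recip and quotient are hypothetical.

```fortran
program compare_with_inexact_reciprocal
  implicit none
  integer :: nnp, np
  real :: xnp, recip, quotient

  nnp = 3
  np  = 1
  xnp = 1.0

  ! Assumption: stand-in for the refined reciprocal of xnp = 1.0,
  ! i.e. the 0.99999994 reported above, not a value computed by rcpss.
  recip = nearest(1.0, -1.0)

  ! Emulates FLOAT(NNP)/XNP evaluated as a multiplication by the reciprocal.
  quotient = float(nnp)*recip

  print *, float(nnp/np), quotient
  print *, float(nnp/np) == quotient
end program compare_with_inexact_reciprocal
```

Under these assumptions the product rounds to the single-precision number just
below 3.0, so the equality test should print F even though NP=1 and XNP=1.0.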

A way to limit this kind of problem while letting people use -mrecip if it
speeds up their code could be (if possible) to use exact division within
IF expressions.


-- 
           Summary: 1.0 is not the inverse of 1.0 with -mrecip on x86
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dominiq at lps dot ens dot fr
 GCC build triplet: i686-apple-darwin9
  GCC host triplet: i686-apple-darwin9
GCC target triplet: i686-apple-darwin9


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34702



* [Bug target/34702] [4.3 Regression] 1.0 is not the inverse of 1.0 with -mrecip on x86
  2008-01-07 15:08 [Bug target/34702] New: 1.0 is not the inverse of 1.0 with -mrecip on x86 dominiq at lps dot ens dot fr
@ 2008-01-07 22:39 ` pinskia at gcc dot gnu dot org
  2008-01-08  9:23 ` ubizjak at gmail dot com
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-01-07 22:39 UTC (permalink / raw)
  To: gcc-bugs



-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  GCC build triplet|i686-apple-darwin9          |
   GCC host triplet|i686-apple-darwin9          |
 GCC target triplet|i686-apple-darwin9          |i?86-*-*
           Keywords|                            |wrong-code
            Summary|1.0 is not the inverse of   |[4.3 Regression] 1.0 is not
                   |1.0 with -mrecip on x86     |the inverse of 1.0 with -
                   |                            |mrecip on x86
   Target Milestone|---                         |4.3.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34702



* [Bug target/34702] [4.3 Regression] 1.0 is not the inverse of 1.0 with -mrecip on x86
  2008-01-07 15:08 [Bug target/34702] New: 1.0 is not the inverse of 1.0 with -mrecip on x86 dominiq at lps dot ens dot fr
  2008-01-07 22:39 ` [Bug target/34702] [4.3 Regression] " pinskia at gcc dot gnu dot org
@ 2008-01-08  9:23 ` ubizjak at gmail dot com
  2008-01-08 10:17 ` dominiq at lps dot ens dot fr
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: ubizjak at gmail dot com @ 2008-01-08  9:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from ubizjak at gmail dot com  2008-01-08 08:42 -------
Hm, I don't see why this is a regression. What this PR shows is the
limited precision of the rcpss instruction.

Actually, this 2 ulp precision is the reason why -mrecip is not enabled by
default with -ffast-math.
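
To put the ulp figure in perspective, the following sketch prints the spacing
of single-precision numbers around 1.0 using standard Fortran intrinsics. It
only illustrates the size of the error being discussed and assumes that the
default real kind is IEEE single precision.

```fortran
program ulp_around_one
  implicit none
  real :: below

  ! Largest single-precision number below 1.0 (printed as 0.99999994).
  below = nearest(1.0, -1.0)

  print *, below
  print *, 1.0 - below     ! gap just below 1.0: 2**(-24), about 5.96E-08
  print *, spacing(below)  ! the same gap, obtained from the intrinsic
  print *, epsilon(1.0)    ! gap just above 1.0: 2**(-23), about 1.19E-07
end program ulp_around_one
```

The value 0.99999994 reported in this PR is therefore the immediate neighbour
of 1.0 from below.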


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34702



* [Bug target/34702] [4.3 Regression] 1.0 is not the inverse of 1.0 with -mrecip on x86
  2008-01-07 15:08 [Bug target/34702] New: 1.0 is not the inverse of 1.0 with -mrecip on x86 dominiq at lps dot ens dot fr
  2008-01-07 22:39 ` [Bug target/34702] [4.3 Regression] " pinskia at gcc dot gnu dot org
  2008-01-08  9:23 ` ubizjak at gmail dot com
@ 2008-01-08 10:17 ` dominiq at lps dot ens dot fr
  2008-01-08 10:28 ` dominiq at lps dot ens dot fr
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: dominiq at lps dot ens dot fr @ 2008-01-08 10:17 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from dominiq at lps dot ens dot fr  2008-01-08 09:46 -------
I don't think either that this is a regression, only a bad side effect. One
possibility to overcome it would be to change the way the Newton-Raphson
iteration is computed. Presently it seems to be x1=x0*(2.0-x*x0), which is bad
when x*x0=nearest(1.0,-1.0) because the result of (2.0-x*x0) is then rounded
to 1.0. I see two ways to improve the accuracy: x1=2.0*x0-(x*x0*x0) and
x1=x0+(x0*(1.0-x*x0)) (assuming the parentheses are obeyed).
The first form adds a multiply, but should not increase the latency if the
multiply in 2.0*x0 is inserted between the first and the second multiplies of
x*x0*x0. The second form would add the 'add' latency to the original one, but
has a better balance between adds and multiplies and is probably the most
accurate.
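
The three forms can be compared on the worst case named above. This is a
sketch, not the sequence GCC emits: it assumes that each operation is rounded
to single precision (as with SSE arithmetic) and that the parentheses are
obeyed. The variable names a, b and c are hypothetical.

```fortran
program nr_variants
  implicit none
  real :: x, x0, a, b, c

  x  = 1.0
  ! Worst case from the discussion: x*x0 = nearest(1.0,-1.0).
  x0 = nearest(1.0, -1.0)

  ! Form that seems to be used at present: 2.0 - x*x0 rounds to 1.0,
  ! so the step returns x0 unchanged.
  a = x0*(2.0 - x*x0)

  ! First alternative: one extra multiply.
  b = 2.0*x0 - (x*x0*x0)

  ! Second alternative: residual 1.0 - x*x0 is computed exactly,
  ! then added back as a correction.
  c = x0 + (x0*(1.0 - x*x0))

  print *, a, b, c
end program nr_variants
```

Working the arithmetic by hand under these assumptions, the first value should
print as 0.99999994 and the two alternatives as 1.00000000. With x87 extended
precision intermediates the first form may behave differently.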

Since I am not familiar enough with x86, I cannot guess precisely what the
other effects of these implementations are: extra moves, register pressure,
etc. Naively I'd say that the first one would be better in code having a deficit
in multiplies while the second one would be better for long sequences of
divisions.

If anyone is interested in digging further into this issue, I can test patches
and timings on a Core 2 Duo. Anyway, I think something should be said about this
"feature" in the manual, and there may be a need for some (better?) "cost
model" for replacing a division by "recip+NR", as I read in a previous post.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34702



* [Bug target/34702] [4.3 Regression] 1.0 is not the inverse of 1.0 with -mrecip on x86
  2008-01-07 15:08 [Bug target/34702] New: 1.0 is not the inverse of 1.0 with -mrecip on x86 dominiq at lps dot ens dot fr
                   ` (2 preceding siblings ...)
  2008-01-08 10:17 ` dominiq at lps dot ens dot fr
@ 2008-01-08 10:28 ` dominiq at lps dot ens dot fr
  2008-01-08 10:32 ` [Bug target/34702] " rguenth at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: dominiq at lps dot ens dot fr @ 2008-01-08 10:28 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from dominiq at lps dot ens dot fr  2008-01-08 10:17 -------
I just had a look at the doc patch in
http://gcc.gnu.org/ml/gcc-patches/2008-01/msg00273.html:

...
+Note that while the throughput of the sequence is higher than the throughput
+of the non-reciprocal instruction, the precision of the sequence can be
+decreased up to 2 ulp.

It looks nice, but could you add (after converting my frenglish) something
along the line "in particular the inverse of 1.0 is no longer 1.0, but
0.99999994" (or nearest(1.0,-1.0) or its C* equivalent).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34702



* [Bug target/34702] 1.0 is not the inverse of 1.0 with -mrecip on x86
  2008-01-07 15:08 [Bug target/34702] New: 1.0 is not the inverse of 1.0 with -mrecip on x86 dominiq at lps dot ens dot fr
                   ` (3 preceding siblings ...)
  2008-01-08 10:28 ` dominiq at lps dot ens dot fr
@ 2008-01-08 10:32 ` rguenth at gcc dot gnu dot org
  2008-01-08 14:21 ` uros at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-01-08 10:32 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from rguenth at gcc dot gnu dot org  2008-01-08 10:17 -------
I agree, this is not a regression, but a documentation issue (patch was
posted).


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
           Keywords|wrong-code                  |
   Last reconfirmed|0000-00-00 00:00:00         |2008-01-08 10:17:57
               date|                            |
            Summary|[4.3 Regression] 1.0 is not |1.0 is not the inverse of
                   |the inverse of 1.0 with -   |1.0 with -mrecip on x86
                   |mrecip on x86               |
   Target Milestone|4.3.0                       |---


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34702



* [Bug target/34702] 1.0 is not the inverse of 1.0 with -mrecip on x86
  2008-01-07 15:08 [Bug target/34702] New: 1.0 is not the inverse of 1.0 with -mrecip on x86 dominiq at lps dot ens dot fr
                   ` (4 preceding siblings ...)
  2008-01-08 10:32 ` [Bug target/34702] " rguenth at gcc dot gnu dot org
@ 2008-01-08 14:21 ` uros at gcc dot gnu dot org
  2008-01-08 14:37 ` ubizjak at gmail dot com
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: uros at gcc dot gnu dot org @ 2008-01-08 14:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from uros at gcc dot gnu dot org  2008-01-08 13:51 -------
Subject: Bug 34702

Author: uros
Date: Tue Jan  8 13:50:14 2008
New Revision: 131394

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131394
Log:
        PR target/34702
        * doc/invoke.texi (i386 and x86-64 Options) [mrecip]: Document
        limitations of reciprocal sequences on x86 targets.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/doc/invoke.texi


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34702



* [Bug target/34702] 1.0 is not the inverse of 1.0 with -mrecip on x86
  2008-01-07 15:08 [Bug target/34702] New: 1.0 is not the inverse of 1.0 with -mrecip on x86 dominiq at lps dot ens dot fr
                   ` (5 preceding siblings ...)
  2008-01-08 14:21 ` uros at gcc dot gnu dot org
@ 2008-01-08 14:37 ` ubizjak at gmail dot com
  2008-01-08 15:14 ` dominiq at lps dot ens dot fr
  2009-09-17  9:46 ` ubizjak at gmail dot com
  8 siblings, 0 replies; 10+ messages in thread
From: ubizjak at gmail dot com @ 2008-01-08 14:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from ubizjak at gmail dot com  2008-01-08 14:02 -------
(In reply to comment #3)

> It looks nice, but could you add (after converting my frenglish) something
> along the line "in particular the inverse of 1.0 is no longer 1.0, but
> 0.99999994" (or nearest(1.0,-1.0) or its C* equivalent).

I have added a small example you provided to the text:

--cut here--
... are generated only when @option{-funsafe-math-optimizations} is enabled
together with @option{-finite-math-only} and @option{-fno-trapping-math}.
Note that while the throughput of the sequence is higher than the throughput
of the non-reciprocal instruction, the precision of the sequence can be
decreased up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).
--cut here--

The point of -mrecip is to create the sequence with the highest throughput
possible at the expense of precision. Please note that we operate on
SFmode here, so precision should be of no concern. While plain rcpss & friends
are not precise enough to be even remotely usable (and they can be accessed
using builtins anyway), the NR enhanced reciprocals provide adequate precision
with great speedup. However, it looks like none of them are good enough for
plain -ffast-math, mainly due to direct comparisons of FP values as in the
example you provided.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34702



* [Bug target/34702] 1.0 is not the inverse of 1.0 with -mrecip on x86
  2008-01-07 15:08 [Bug target/34702] New: 1.0 is not the inverse of 1.0 with -mrecip on x86 dominiq at lps dot ens dot fr
                   ` (6 preceding siblings ...)
  2008-01-08 14:37 ` ubizjak at gmail dot com
@ 2008-01-08 15:14 ` dominiq at lps dot ens dot fr
  2009-09-17  9:46 ` ubizjak at gmail dot com
  8 siblings, 0 replies; 10+ messages in thread
From: dominiq at lps dot ens dot fr @ 2008-01-08 15:14 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from dominiq at lps dot ens dot fr  2008-01-08 14:20 -------
> the NR enhanced reciprocals provide adequate precision with great speedup

Two quick remarks along this line:

(1) On Core2Duo the speedup is not obvious for the polyhedron testsuite,
except for gas_dyn 7.7s vs. 8.5s (also capacita 59.6s vs. 57s).

(2) ifort seems to make an extensive use of this optimization (gas_dyn 4.1s),
in general with "packed" code, but seems to use 'div' also in other places.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34702



* [Bug target/34702] 1.0 is not the inverse of 1.0 with -mrecip on x86
  2008-01-07 15:08 [Bug target/34702] New: 1.0 is not the inverse of 1.0 with -mrecip on x86 dominiq at lps dot ens dot fr
                   ` (7 preceding siblings ...)
  2008-01-08 15:14 ` dominiq at lps dot ens dot fr
@ 2009-09-17  9:46 ` ubizjak at gmail dot com
  8 siblings, 0 replies; 10+ messages in thread
From: ubizjak at gmail dot com @ 2009-09-17  9:46 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from ubizjak at gmail dot com  2009-09-17 09:46 -------
This is just expected behavior, documented for -mrecip.


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34702



Thread overview: 10+ messages
2008-01-07 15:08 [Bug target/34702] New: 1.0 is not the inverse of 1.0 with -mrecip on x86 dominiq at lps dot ens dot fr
2008-01-07 22:39 ` [Bug target/34702] [4.3 Regression] " pinskia at gcc dot gnu dot org
2008-01-08  9:23 ` ubizjak at gmail dot com
2008-01-08 10:17 ` dominiq at lps dot ens dot fr
2008-01-08 10:28 ` dominiq at lps dot ens dot fr
2008-01-08 10:32 ` [Bug target/34702] " rguenth at gcc dot gnu dot org
2008-01-08 14:21 ` uros at gcc dot gnu dot org
2008-01-08 14:37 ` ubizjak at gmail dot com
2008-01-08 15:14 ` dominiq at lps dot ens dot fr
2009-09-17  9:46 ` ubizjak at gmail dot com
