From mboxrd@z Thu Jan 1 00:00:00 1970
From: Linus Torvalds
To: Tim Hollebeek
Cc: Theodore Papadopoulo, Gabriel Dos Reis
Subject: Re: What is acceptable for -ffast-math? (Was: associative law in combine)
Date: Wed, 01 Aug 2001 09:54:00 -0000
Message-id:
References: <20010801122417.A2509@cj44686-b.reston1.va.home.com>
X-SW-Source: 2001-08/msg00052.html

On Wed, 1 Aug 2001, Tim Hollebeek wrote:
>
> Rewriting the programmer's program in such a way that he gets
> significantly less accuracy than he could reasonably expect is just
> silly.

Ehh.. "significantly less accuracy"?

Now, show me a case where a/b/c -> a/(b*c) is significantly less
accurate?

Let's confine this to x86 - it is the most common case by far, and hey,
especially when it comes to games etc., I doubt it matters what HP-PA
does, for example. We can easily enable the optimizations on a
per-architecture basis.

The one obvious case is when (b*c) overflows and turns into Inf, at
which point the end result will be +-0 with the optimization.

Now, in order for the original result to _not_ have been close to zero,
we're talking about 'a' being at the very limits of the number space.
And remember that in x86 register arithmetic - regardless of precision
setting - the exponent is 15 bits.

So in order to get the overflow, AND to get an original value that
wasn't zero (i.e. to get an error), 'a' would have had to be on the
order of 2**(16383-64), which won't even _fit_ in a double anyway (the
-64 is to take even denormals into account, but others have said that
flush-to-zero would have been acceptable by default anyway).

Did I mis-do my calculations? (Repeat the argument for underflow.)

So the overflow case actually doesn't exist on x86 if we keep the
values in registers or flush to the stack in extended real format (and
if we do NOT flush to the stack in extended real format, x86 has other
problems, like the fact that the results depend on register flushes
etc).

Do we have other problems with the conversion? Sure, it might be 1 ULP
off even in the non-overflow case, but as we've seen, others do that
optimization by default, never mind with -ffast-math.

So tell me, what makes the optimization so dangerous? It doesn't seem
to do anything that we don't already do. Did I miss some noteworthy
case?

		Linus
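
For concreteness, a minimal C sketch of the rewrite discussed above
(illustrative only; the inputs are arbitrary and not taken from the
thread). It compares a/b/c against a/(b*c) for ordinary values and for
values where b*c overflows the double range:

/* Minimal illustration of the a/b/c -> a/(b*c) rewrite.
 * The inputs are arbitrary, chosen only to show the ordinary case
 * and the corner case where b*c overflows the double range. */
#include <stdio.h>

static double div_twice(double a, double b, double c)
{
	return a / b / c;	/* original form */
}

static double div_by_product(double a, double b, double c)
{
	return a / (b * c);	/* rewritten form */
}

int main(void)
{
	/* Ordinary inputs: the two forms typically agree to within
	 * about 1 ULP. */
	printf("%.17g  vs  %.17g\n",
	       div_twice(1.0, 3.0, 7.0),
	       div_by_product(1.0, 3.0, 7.0));

	/* Corner case: b*c is 1e400, which overflows double.  If the
	 * product is rounded to double, a/(b*c) becomes 1e300/Inf = 0,
	 * while a/b/c stays at 1e-100.  If the product is kept in an
	 * 80-bit x87 register (15-bit exponent), it does not overflow
	 * and the two forms agree - which is the point made above. */
	printf("%.17g  vs  %.17g\n",
	       div_twice(1e300, 1e200, 1e200),
	       div_by_product(1e300, 1e200, 1e200));
	return 0;
}

Whether the second line prints the same number for both forms depends
on whether the compiler keeps b*c in an extended-precision register or
rounds it to double (e.g. with SSE2 arithmetic).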