From mboxrd@z Thu Jan  1 00:00:00 1970
From: hjstein@bfr.co.il (Harvey J. Stein)
To: "Geert Bosch" <bosch@gnat.com>
Cc: hjstein@bfr.co.il
Subject: Re: FWD: FLOATING-POINT CONSISTENCY, -FFLOAT-STORE, AND X86
Date: Tue, 15 Dec 1998 03:34:00 -0000
Message-id: <m2vhjd8x0x.fsf@blinky.bfr.co.il>
References: <9812150945.AA19177@nile.gnat.com>
X-SW-Source: 1998-12/msg00510.html

"Geert Bosch" <bosch@gnat.com> writes:

 > On 14 Dec 1998 11:51:23 +0200, Harvey J. Stein wrote:
 > 
 >   Reasonable floating point code should expect that reordering
 >   operations will produce slightly different results due to round off
 >   error, and should be tolerant of the optimizer doing such.  Especially
 >   given how little control the programmer has over exactly how
 >   computations are ordered.
 > 
 > Many useful fpt algorithms rely on ordering of operations to be honored, 
 > and a compiler evaluating  B + (A - B) as (B + A) - B or even as A 
 > is seriously broken for numerical stuff.
 > 
 > Having spills to memory retain full precision is very useful as this allows
 > one to prove much more about fpt code. Here is an example of what I mean,
 > using a decimal fpt type with 4 digits for extended precision in registers 
 > and 3 for the in-memory precision of a variable. (Examples using binary
 > 64-bit and 80-bit fpt types are similar but harder to read.)
 >
 > Calculate  S = (10.0 + 0.454) - (0.454 + 10.0), spilling one partial sum to T.

<snip>

 > Case 1 does not use extended registers and rounds at every operation.
 >   This is completely IEEE conformant behaviour.
 > 
 > Case 2 uses extended registers and same precision for spilled value.
 >   This is not IEEE-conformant, but guarantees consistent rounding behaviour.
 >   In particular the relative error is never more than that of case 1.
 >   For most algorithms this will work fine, double rounding will only
 >   occur on the final assignment. This is not ideal, but now worst-case
 >   is one double rounding per statement instead of one per operation.
 >   If assignments are forced to go to memory (using volatile var's for example),
 >   fpt behaviour is independent of optimization level. 
 > 
 > Case 3 uses extended registers, but lower precision for spilled value.
 >   This is the worst case and is what is causing problems right now.
 >   The intermediate values while evaluating the expression may be subject 
 >   to double rounding errors. People who care about right answers often 
 >   turn off optimization, but ironically this makes the problems only worse! 

I was thinking more along the lines of complex sequences of
computations with subexpression elimination, etc.

But, I'm a little unclear on register spilling.  Exactly when do
values enter & leave FP registers?  Does everything stay in FP
registers for the useful lifetime of the value except when a spill
occurs, or can things move in and out of variables more freely?

For example, suppose I have code like:

    x = a*b;
    y = c*d;
    z = x+y;

I've been working under the (worst-case) assumption that any
combination of a, b, c & d might be in FP registers, that the
multiplies might be done using register/register or memory/register
multiplies, and that x and y might be read either from memory or from
registers.

Is this the case?

Or is it the case that values always go into FP registers first, and
are always manipulated from FP registers except if we run out, in
which case a spill is done?  In particular, is it never the case that
something would get stored back into memory (freeing up an FP
register), and then later loaded back into an FP register?  For
example, in the
above (after adding enough computations), could x get computed, stored
back into &x and then later loaded from &x to compute z?
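
To make the two scenarios concrete, here is a rough sketch in C.  The
function names are made up for illustration, and long double is only a
stand-in for values that stay in the x87 registers; volatile forces the
round trip through a 64-bit memory slot.  On platforms where long double
is the same as double, the two functions will simply agree.

```c
#include <float.h>

/* Hypothetical helper: x and y make a round trip through 64-bit memory.
   volatile keeps the compiler from holding them in FP registers. */
double sum_via_memory(double a, double b, double c, double d)
{
    volatile double x = a * b;   /* rounded to double when stored */
    volatile double y = c * d;   /* rounded to double when stored */
    return x + y;                /* both operands reloaded from memory */
}

/* Hypothetical helper: long double models values that never leave the
   extended-precision registers (an assumption, not what any particular
   compiler is guaranteed to do). */
long double sum_in_registers(double a, double b, double c, double d)
{
    long double x = (long double)a * b;   /* product kept at full width */
    long double y = (long double)c * d;
    return x + y;
}
```

With a = b = 1 + 2^-30, c = -1 and d = 1 + 2^-29, the exact value of
a*b + c*d is 2^-60.  The memory version loses that bit when x is stored,
so it returns 0; the extended version keeps it when long double has at
least a 64-bit mantissa.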

If my original assumption is correct, then I think my objections still
hold - spilling in extended precision will help somewhat, but it will
not fix the problem completely.

If not - if it's really the case that everything always stays in FP
registers except for spills, then I agree that doing 80-bit spills
will largely prevent surprising numerical results.  It would
effectively make 80-bit precision contagious, which should be
sufficient even for comparisons to act reasonably (assuming constants
are also computed in 80 bits, to prevent 1.0/3.0 from not equalling
x/y after x=1.0; y=3.0).
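
The comparison problem in that parenthetical can be sketched like this.
Again the function name is hypothetical, volatile double models a value
that was rounded to 64 bits (as a constant folded at double precision
would be), and long double is an assumed stand-in for an 80-bit
register value.

```c
#include <float.h>

/* Hypothetical helper: returns 1 if the quotient rounded to double
   equals the same quotient kept at long double precision. */
int rounded_equals_extended(double x, double y)
{
    volatile double rounded = x / y;             /* forced down to 64 bits */
    long double extended = (long double)x / y;   /* kept at full width */
    return (long double)rounded == extended;
}
```

For x=1.0, y=4.0 the quotient is exact at either width, so the two
agree.  For x=1.0, y=3.0 they differ whenever long double carries more
mantissa bits than double, which is the mismatch that computing
constants in 80 bits would avoid.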

-- 
Harvey J. Stein
BFM Financial Research
hjstein@bfr.co.il