From: N8TM@aol.com
To: tprince@cat.e-mail.com, burley@gnu.org, egcs@cygnus.com
Subject: Re: FLOATING-POINT CONSISTENCY, -FFLOAT-STORE, AND X86
Date: Thu, 03 Dec 1998 06:34:00 -0000
Message-id: <32db0c93.3666999f@aol.com>
X-SW-Source: 1998-12/msg00097.html

In a message dated 12/2/98 burley@gnu.org writes:

C: Craig Burley
T: Tim

>>C: In other words, the default for x86 code generation should
>>apparently be that, when the compiler generates an intermediate result,
>>it *always* uses maximum available precision for that result, even
>>if it has to spill the result to memory.  (I *think* it can do this while
>>obeying the current FP mode, but don't have time to check right
>>now.)
>>[...]
>
>T: In the case where e is used in a subsequent calculation, we
>don't want to force a store and reload unless -ffloat-store is
>invoked.

>C: Correct, AFAIK.

T: There's some uncertainty here, where the desire to maintain
performance causes us to keep the extra precision, although the
programmer might conceivably not want it.  In order to turn it off in a
"fine-grained" manner, the programmer must program in a "float-store",
which I do by invoking an external function which returns the
rounded-off value (it can't be in-lined); a sketch of such a helper
appears below.

>T: But I'm not sure you can always apply the same rules to
>storage to a named variable (it might be stored in a structure or
>COMMON block) as to register spills, which aren't visible in the
>source code.

>C: No, I don't think you can, and that's what my proposal and email
>were trying to clarify (less than successfully, I gather!).

>C: That is, I was trying to focus my proposal on only the
>compiler-generated temporaries that get spilled and chopped down to
>"size" at the same time.

>T: This is a more difficult question to solve and I'm confused about
>what connection you are making between that and the spilled
>temporaries.

>C: In my proposal, essentially none, except that it used to confuse
>me, and I believe it still confuses others, that there are pretty
>bright-line distinctions between compiler-generated temporaries and
>user-named variables, in terms of the precisions the compiler is, or
>should be, permitted to employ for each class.  (But not all the
>distinctions are so clear, it seems.)

>C: With compiler-generated temporaries, it is, again, normally
>permitted (whether helpful or hurtful) for the compiler to employ
>*more* than the implicit precision of the operation.  But the problem
>with the gcc back end, on the x86 at least, is that it (apparently)
>sometimes employs *less*, specifically, when spilling those
>temporaries.  (That is, when the temporary needs to be copied from the
>register in which it "lives" to a memory location, the gcc back end
>apparently is happy to chop the temporary down to fit into a smaller
>memory location.)

>C: My proposal deals only with this latter deficiency (as I now think
>it is); that is, it recommends that precision *reduction* of
>compiler-generated temporaries no longer happen (at least not by
>default).

>C: - The compiler provides no way to "force" available excess
>precision to be reliably used for programmer-named variables anyplace
>that is possible (say, within a module).  Some compilers offer
>explicit extended type declarations (REAL*10 in Fortran; `long double'
>in C?), but g77 doesn't yet.  So, whether a named variable carries the
>(possible) excess precision of its computed value into subsequent
>calculations is at the whim of the compiler's optimization phases.
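T: For reference, a minimal sketch of the sort of external
"float-store" helper mentioned above (the name DCHOP is invented here;
it has to live in its own source file so it cannot be in-lined):

      DOUBLE PRECISION FUNCTION DCHOP(X)
*     Identity function used as a "float-store": g77 passes X by
*     reference, so the caller must first store the actual argument
*     into a 64-bit DOUBLE PRECISION temporary, and that store is
*     what discards the extra x87 bits.  Keep this routine in its
*     own source file so it cannot be in-lined away.
      DOUBLE PRECISION X
      DCHOP = X
      END

A call like E = DCHOP(A*B + C*D) (with DOUBLE PRECISION DCHOP declared
in the caller) then chops just that one value to declared precision,
without paying for -ffloat-store across the whole program.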
T: I think what you are getting at is that it's usually acceptable for
the results to be calculated in the declared precision; extra precision
is usually desirable, but unpredictable combinations of extra precision
and no extra precision may be disastrous.  See Kahan's writings about
the quadratic formula.  Your proposal would make an improvement here.

>C: REAL*16 seems to be asked for fairly often.)

T: Probably by people who don't recognize how much of a performance hit
the Intel processors will take going from REAL*10 to REAL*16.  If the
Lahey/Fuji f95 compiler gets the alignment problems fixed so that
REAL(kind=8) returns to good performance, I think this will become more
evident.

>T: I suspect the 96 bits must be written to a 128-bit aligned storage
>location to minimize the performance hit.

>C: Probably.  But we're not even at 64-bit aligned storage for stack
>variables (which is where spills must happen, for the most part) yet,
>and IMO code that requires FP spills, on the x86 anyway, is probably
>not going to notice the lack of alignment due to its complexity.

T: I believe that i686-pc-linux-gnulibc1 is trying, with some success,
to do aligned spills, and that that's the reason why -O2 often runs
faster than -Os on that target, while -O2 is slower than -Os on the
same code on the targets which don't have double alignment on the
stack.

>T: If someone does manage to implement this, I would like to study
>the effect on the complex math functions of libF77, using Cody's
>CELEFUNT test suite.  I have demonstrated already that the
>extended double facility shows to good advantage in the double
>complex functions.  The single complex functions already
>accomplish what we are talking about by using double
>declarations for all locals, and that gives them a big advantage
>over certain vendors' libraries.

>C: Right now, my impression is that the effect would be nil *unless*
>these codes are complicated enough to cause spills of temporaries in
>the first place.

T: The improvement in accuracy depends on getting extended-precision
results from built-in math functions, so it would require a math-inline
option as well as the 80-bit register spills.  I don't know whether it
can be done effectively, say by taking care to make the math-inline
headers of libc6 more reliable.

>C: First, the main goal of my proposal is to reduce unpredictable loss
>of precision on machines like x86, where programmers should be aware
>their code will often employ extended precision (and thus might depend
>on it).

>C: However, if -ffloat-store is not used, then perhaps this reduction
>would not be complete, and lead to rarer, yet even more obscure and
>hard-to-find, bugs, unless we indeed make sure that even spills of
>named variables carry, and never chop, the values of those variables
>(which might be in extended precision).

T: That might be too much to expect.  It's true that there could be
situations where adding code might cause a named variable to be spilled
to its declared precision where a simpler version used extended
precision, but I doubt it's feasible to prevent that.  I'll suggest a
less ambitious goal: that the recognition of common sub-expressions
should not lead to reduced precision:

      a = b*c + d*e
      f = d*e*g + h

If the compiler decides to treat d*e as a common sub-expression, in
order to save an operation, but then finds that this expression needs
to spill, that spill and restore should be full precision.  Otherwise,
we get back to the unpredictable situations.
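T: As a small (and contrived) illustration of the quadratic-formula
point above, here is a test that uses the DCHOP helper sketched
earlier (link its file in as well); the coefficients are chosen so
that B*B - 4*A*C cancels almost completely:

      PROGRAM DISCR
*     Discriminant of a nearly degenerate quadratic: the true value
*     of B*B - 4*A*C is exactly 2**(-52), but A*C rounds to 1.0 in
*     DOUBLE PRECISION, so the cancellation loses it unless the
*     products are carried in extended precision.
      DOUBLE PRECISION A, B, C, D1, D2
      DOUBLE PRECISION DCHOP
      EXTERNAL DCHOP
      A = 1D0 + 2D0**(-27)
      C = 1D0 - 2D0**(-27)
      B = 2D0
      D1 = B*B - 4D0*A*C
      D2 = DCHOP(B*B) - DCHOP(4D0*A*C)
      WRITE (*,*) D1, D2
      END

On an x86 where the products stay in the x87 registers, D1 can come
out as 2.22E-16 (the exact answer) while D2 comes out as 0.0; with
-ffloat-store, or on a target without extended registers, both print
0.0.  Which of those the programmer actually wanted is exactly the
question under discussion.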
tq vm,
(burley)

>C: P.S.  Most, if not all, of this is the result of widespread
>disagreement over what a simple type declaration like `REAL*8 A' or
>`double a;' really means.  The simple view is "it means that the
>variable must be capable of holding the specified precision", but so
>many people really expect it to mean so much more, in terms of whether
>operations on the variable may, might, or must involve more precision,
>etc.  And, since the predominant languages give those people no
>straightforward way to express what they *do* really want, how
>surprising is it that they "overload" the "simple" view of what a type
>definition really means?

T: This is getting off-topic.  I might think that f90 declarations like

      REAL(selected_real_kind(15)) :: a
      REAL(selected_real_kind(18)) :: b

could allow the programmer to express intent in more detail while
retaining portability, but I don't think any existing compilers
implement this in a useful way.
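Spelled out a little more fully, the idea might look like this (kind
numbers kept in named constants; on a compiler that has no 18-digit
kind, SELECTED_REAL_KIND(18) simply returns -1 and the declaration of
B is rejected):

      PROGRAM KINDS
      INTEGER, PARAMETER :: DP = SELECTED_REAL_KIND(15)  ! >= 15 digits: IEEE double
      INTEGER, PARAMETER :: EP = SELECTED_REAL_KIND(18)  ! >= 18 digits: x86 extended, where supported
      REAL(DP) :: A
      REAL(EP) :: B   ! rejected at compile time where EP comes back as -1
      A = 1.0_DP
      B = 1.0_EP
      WRITE (*,*) PRECISION(A), PRECISION(B)
      END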