From mboxrd@z Thu Jan 1 00:00:00 1970
From: tprince@cat.e-mail.com
To: egcs@cygnus.com, tprince@computer.org
Subject: /internet
Date: Fri, 04 Dec 1998 17:41:00 -0000
Message-id: <4.19981204.20.41.11.410443@cat.e-mail.com>
X-SW-Source: 1998-12/msg00139.html

>T: I think what you are getting at is that it's usually acceptable for
>the results to be calculated in the declared precision; extra precision
>is usually desirable, but unpredictable combinations of extra precision
>and no extra precision may be disastrous.  See Kahan's writings about
>the quadratic formula.  Your proposal would make an improvement here.

>C: That feedback is helpful, and does seem to reflect what I was trying
>to say originally.  (I haven't seen Kahan's writings, or at least very
>little of them, at this point.)

T: Look at http://http.cs.berkeley.edu/~wkahan/ieee754status/ieee754.ps

I figured out how to do his quadratic algorithm in C with volatiles on
the R8K and PowerPC (neither of which I use any more), but it needed
function calls to do it in g77.  If someone gets the fused MACs going
for hppa2.0, those issues will come up there.  It really is possible to
find these fused MACs introducing NaNs in a program which works
correctly otherwise, and I've not been able to track them down in any
program bigger than that quadratic formula thing.  The R10K has a
gentler form of fused MAC, whose behavior has been made generally the
same as that of individual IEEE-754 compliant operations.

Kahan didn't analyze what might happen with unfavorable combinations of
two rounding modes on an Intel-like processor; I suspect it would not be
good news.  Has anyone looked into this?

>>C: I think a substantial portion of the audience asking for REAL*16 is
>>*non-Intel*.  SPARC and Alpha people come to mind.
>>I agree that those who want enough extra precision to more reliably
>>compute 64-bit results from 64-bit inputs would likely prefer the
>>faster, native support provided by REAL*10 on Intel, and ideally "we"
>>(g77/egcs/whatever) would be able to provide REAL*10 somewhat faster
>>than REAL*16 on other machines as well, even though, unlike on Intels,
>>the REAL*10 would be emulated.

T: There are 2 major varieties of REAL*16.  The one which HP (and, I
believe, Sun, Lahey, and DEC) use is the more accurate and slower one,
which conforms nominally to IEEE P854 and has roughly the same exponent
range as the Intel REAL*10.  SGI and IBM use a faster version, which is
facilitated by the fused multiply-accumulate instructions; it has
roughly 6 fewer bits of precision, a range less than that of double
precision, and doesn't conform to IEEE P854.

T: In both the HP and SGI libraries, the math functions give up accuracy
so as not to lose as much speed, so it is possible in either case to
wind up with little more accuracy than you would get with a carefully
implemented REAL*10.  I don't know about the other vendors' libraries.
On the Pentiums, some of the math functions inherently take advantage of
the full precision (log() but not log10(), sqrt(), sin()/cos(), tan(),
atan()), while a few require more of the style of programming found in
non-Intel math libraries, but with asm() mixed in, putting the proper
usage of clobbers to the test.

>>C: I don't think aligned spills happen reliably at all on any
>>*released* version of egcs or gcc yet (well, except maybe for old
>>versions of gcc patched with those big g77 patches that *seemed* to do
>>most of the aligned-double thing).  But it looks like egcs 1.2 or 1.3
>>will align doubles on the stack, covering spills, at or near a
>>rock-solid level of reliability.

T: Treatment of spills in general seems to be one area where gnu has
some room for improvement, in comparison to commercial compilers,
particularly for Intel.
I'm amazed that Lahey lost track of their alignments for lf95, but they
otherwise seem able to avoid spill performance problems.

>>C: if the compiler decides to call library (or inline) functions for
>>constructs not explicitly, in the code, involving such calls, and
>>those functions are not 80-bit, the result might indeed be similar to
>>spilling to 64-bit values in that the programmer doesn't expect a
>>sudden loss of precision there.

>>C: I'm thinking, for example, of complex divides, which g77 implements
>>by avoiding the back end's version and going straight for c_div (or
>>whatever) in libF77, to support a larger domain of inputs with greater
>>accuracy.

T: There, of course, the straightforward use of extended precision takes
care of the situation more effectively, where special-case coding is
needed otherwise.  But that can be done by using conditional compilation
inside c_div, according to whether the target architecture has a long
double of greater precision and range than double.

>>C: Though, in this example, the loss of precision is a bit easier to
>>predict: it currently happens for complex divides.  Someday, though,
>>we might decide to have it apply to complex multiplies, and/or it
>>might be desirable to have the compiler choose, based on less visible
>>data (than the source code), to do a call rather than in-line the
>>code.  It's important to preserve the precision in such cases.

T: It's more a question of avoiding unexpected exceptions.  The overhead
of the function call is not a serious matter for c_div, but it could be
for multiplication.  I looked up some of the implementations when you
brought this up over a year ago, and the only one I found which takes
special precautions on complex multiplication was VAX/VMS.  It's needed
more on VAX floating point, as even with the precautions, the range of
working operands is less than with IEEE floating point and no special
precautions.

Dr. Timothy C.
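The conditional-compilation idea for c_div might look like the sketch
below (the function name and layout are mine, not libF77's; the only
assumptions are the standard <float.h> macros).  Where long double has
greater range and precision than double, the naive quotient formula
computed in long double avoids the premature overflow/underflow of
|b|^2; otherwise the Smith-style scaled form is the special-case coding
referred to above.

```c
#include <float.h>
#include <math.h>

/* Complex divide (ar + i*ai) / (br + i*bi), result in *cr, *ci. */
static void c_div_sketch(double ar, double ai, double br, double bi,
                         double *cr, double *ci)
{
#if LDBL_MAX_EXP > DBL_MAX_EXP && LDBL_MANT_DIG > DBL_MANT_DIG
    /* Extended range and precision available: the straightforward
     * formula, with |b|^2 held in long double so it cannot overflow
     * or underflow prematurely for double operands. */
    long double d = (long double)br * br + (long double)bi * bi;
    *cr = (double)(((long double)ar * br + (long double)ai * bi) / d);
    *ci = (double)(((long double)ai * br - (long double)ar * bi) / d);
#else
    /* Smith's scaling: divide through by the larger of |br|, |bi|
     * so intermediate magnitudes stay near the result's. */
    if (fabs(br) >= fabs(bi)) {
        double t = bi / br, d = br + bi * t;
        *cr = (ar + ai * t) / d;
        *ci = (ai - ar * t) / d;
    } else {
        double t = br / bi, d = bi + br * t;
        *cr = (ar * t + ai) / d;
        *ci = (ai * t - ar) / d;
    }
#endif
}
```

Either branch gives the same answers on benign inputs; the difference
shows up only near the overflow and underflow thresholds, which is where
the unexpected exceptions mentioned above would otherwise occur.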
Prince
Consulting Engineer
Solar Turbines, a Caterpillar Company
alternate e-mail: tprince@computer.org
To: INTERNET - IBMMAIL N3356140 - IBMMAIL