public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* /internet
@ 1998-12-15 15:06 tprince
  1998-12-15 19:12 ` /internet Stephen L Moshier
  0 siblings, 1 reply; 26+ messages in thread
From: tprince @ 1998-12-15 15:06 UTC (permalink / raw)
  To: bosch, burley, egcs, hjstein, jbuck, moshier

The question of possible over/underflow of intermediate results is more relevant for floating point than
for integer multiplication.  However, I believe most people consider the programmer obligated
to use parentheses to enforce safe grouping, if such a grouping is known.  If the program was
originally tested on Intel, this is not likely to have been considered!  Reassociation is commonly
required for adequate performance of a series of additions, and there of course accuracy is more
likely to be an issue than over/underflow.

I try to make a practice of using appropriate parentheses in an expression such as

(a-b) + (c-d) + (e-f)

where that may improve accuracy (given that the differences are known to be relatively small) as well
as suggesting an effective pipelined grouping.  Possibly this would deserve an FAQ treatment if
egcs/gnu begins to do reassociations.

Treatment of reassociation is by no means uniform.  I just accepted Arnaud Desitter's suggestion of a
usage of Kahan summation to improve the portability of the check-summing in Livermore Fortran
Kernel.  It turns out that the Irix (MipsPro7.2) compiler requires the option -OPT:fold_reassociate=OFF
in conjunction with optimization, in order for this to work.  They treat disregarding parentheses and the
order of assignments, as well as reversing the sense of comparisons, as a normal consequence of full
optimization.  Of course, in that case, the intended data dependencies require 4 times as many cycles
as the "optimized" code, but that's still faster than REAL*16 on some machines.

This optimization doesn't matter on Intel in extended precision mode, but one of the compilers I tested
apparently sets double precision mode.  I tried declaring "real(selected_real_kind(18)) sum" to make
a strong suggestion that the compiler should use extended precision or REAL*16, but certain
compilers rejected this entirely, and g77 has not adopted this syntax.
____________________________________________

>>For FP, we would like the ability to reassociate some expressions.  Take
(a * b * c * d) * e

>>Right now we'll generate

t1 = a * b;
t2 = t1 * c;
t3 = t2 * d;
t4 = t3 * e;

>>Note the dependency of each insn on the previous insn.  This can be a major
performance penalty -- especially on targets which have dual FP units or where
an fpmul isn't incredibly fast (data dependency stalls at each step).

t1 = a * b;
t2 = c * d;
t3 = t1 * t2;
t4 = t3 * e;


>>Is a much better (and safe as far as I know) sequence.  The first two insns
are totally independent, which at the minimum reduces one of the 3 stall
conditions due to data dependency.  For a target with a pipelined FPU or
dual FPUs the second sequence will be significantly faster.


>>For integer, we need to know where the parens are to preserve integer overflow
semantics in languages like Ada for similar transformations.


jeff<<

-
Dr. Timothy C. Prince
Consulting Engineer
Solar Turbines, a Caterpillar Company
alternate e-mail: tprince@computer.org


^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: /internet
@ 1998-12-16 12:34 Geert Bosch
  1998-12-16 13:02 ` /internet Harvey J. Stein
  1998-12-16 16:25 ` /internet Jeffrey A Law
  0 siblings, 2 replies; 26+ messages in thread
From: Geert Bosch @ 1998-12-16 12:34 UTC (permalink / raw)
  To: Joe Buck, moshier; +Cc: burley, egcs, hjstein, jbuck, law, tim, tprince

On Wed, 16 Dec 98 11:58:08 PST, Joe Buck wrote:

  Amazing.  These guys are trying to turn C into Ada.

If that's true, they are doing it all wrong! :-)
In Ada95 the rules have been precisely chosen to allow 
useful optimizations like these. See my previous message 
where I explain how checks (and as a result operations) 
may be removed. 

Another useful difference is that in Ada evaluations of
functions in "pure" packages are guaranteed to be free of 
side-effects and may reuse results of earlier invocations 
with the same parameters. GCC takes full advantage of this 
by moving them out of loops etc. Of course all elementary 
fpt functions (including complex ones) are defined in pure 
packages and this leads to nice optimizations without sacrificing 
one bit of accuracy.

Those guys should copy these features instead of ones they make up!

Regards,
   Geert


* /internet
@ 1998-12-04 17:41 tprince
  0 siblings, 0 replies; 26+ messages in thread
From: tprince @ 1998-12-04 17:41 UTC (permalink / raw)
  To: egcs, tprince

>T: I think what you are getting at is that it's usually acceptable for the
>results to be calculated in the declared precision; extra precision is usually
>desirable, but unpredictable combinations of extra precision and no extra
>precision may be disastrous.  See Kahan's writings about the quadratic
>formula.  Your proposal would make an improvement here.

>C:That feedback is helpful, and does seem to reflect what I was trying to
>say originally.  (I haven't seen Kahan's writings, or at least very little
>of them, at this point.)

T:  Look at http://http.cs.berkeley.edu/~wkahan/ieee754status/ieee754.ps
I figured out how to do his quadratic algorithm in C with volatiles on the
R8K and PowerPC (neither of which I use any more) but it needed function
calls to do it in g77.  If someone gets the fused MACs going for hppa2.0,
those issues will come up there.  It really is possible to find these fused
MACs introducing NaNs in a program which works correctly otherwise,
and I've not been able to track them down in a program bigger than that
quadratic formula thing.  The R10K has a gentler form of fused MAC
where the behavior has been made to be generally the same as with
individual IEEE-754 compliant operations.  Kahan didn't analyze what
might happen with unfavorable combinations of two rounding modes
on an Intel-like processor; I'm suspecting it would not be good news.
Has anyone looked into this?

>>C: I think a substantial portion of the audience asking for REAL*16 is
>*non-Intel*.  SPARC and Alpha people come to mind.  I agree that those
>who want enough extra precision to more reliably compute 64-bit results
>from 64-bit inputs would likely prefer the faster, native support
>provided by REAL*10 on Intel, and ideally "we" (g77/egcs/whatever) would
>be able to provide REAL*10 somewhat faster than REAL*16 on other machines
>as well, even though, unlike on Intels, the REAL*10 would be emulated.

T:  There are 2 major varieties of REAL*16.  The one which HP (and, I believe,
Sun, Lahey, and DEC) use is the more accurate and slower one, which conforms
nominally to IEEE P854 and has roughly the same exponent range as the
Intel REAL*10.  SGI and IBM use a faster version, facilitated by the fused
multiply-accumulate instructions; it has roughly 6 fewer bits of precision,
a range less than that of double precision, and does not conform to
IEEE P854.

T:  In both the HP and SGI libraries, the math functions give up
accuracy so as not to lose as much speed, so it is possible in either case to
wind up with little more accuracy than you would get with a carefully
implemented REAL*10.  I don't know about the other vendors' libraries.  On
the pentiums, some of the math functions inherently take advantage of the
full precision (log() but not log10(), sqrt(), sin()/cos(), tan(), atan()), while a
few require more of the style of programming found in non-Intel math
libraries, but with asm() mixed in, putting the proper usage of clobbers to
the test.

>>C:I don't think aligned spills happen reliably at all on any *released*
>version of egcs or gcc yet (well, except maybe for old versions of
>gcc patched with those big g77 patches that *seemed* to do most of the
>aligned-double thing).  But it looks like egcs 1.2 or 1.3 will align
>doubles on the stack, covering spills, at or near a rock-solid level
>of reliability.

T: Treatment of spills in general seems to be one area where gnu has some
room for improvement, in comparison to commercial compilers, particularly
for Intel.  I'm amazed that Lahey lost track of their alignments for lf95,
but they seem otherwise to be able to avoid spill performance problems.

>>C: if the
>compiler decides to call library (or inline) functions for constructs
>not explicitly, in the code, involving such calls, and those functions
>are not 80-bit, the result might indeed be similar to spilling to 64-bit
>values in that the programmer doesn't expect a sudden loss of precision
>there.

>>C:I'm thinking, for example, of complex divides, which g77 implements
>by avoiding the back end's version and going straight for c_div (or
>whatever) in libF77, to support a larger domain of inputs with
>greater accuracy.

T:  There, of course, the straightforward use of extended precision takes
care of the situation more effectively than the special-case coding needed
otherwise.  But that can be done by using conditional compilation inside
c_div, according to whether the target architecture has long double of
greater precision and range than double.

>>C:Though, in this example, the loss of precision is a bit easier to
>predict: it currently happens for complex divides.  Someday, though,
>we might decide to have it apply to complex multiplies, and/or it
>might be desirable to have the compiler choose, based on less visible
>data (than the source code) to do a call rather than in-line the code.
>It's important to preserve the precision in such cases.

T:  It's more a question of avoiding unexpected exceptions.  The overhead
of the function call is not a serious matter for c_div, but it could be for
multiplication.  I looked up some of the implementations when you
brought this up over a year ago, and the only one I found which takes
special precautions on complex multiplication was VAX/VMS.  It's
needed more on VAX floating point: even with the precautions, the range
of working operands is smaller than IEEE floating point allows with no
precautions at all.

Dr. Timothy C. Prince
Consulting Engineer
Solar Turbines, a Caterpillar Company
alternate e-mail: tprince@computer.org



end of thread, other threads:[~1998-12-17 20:20 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1998-12-15 15:06 /internet tprince
1998-12-15 19:12 ` /internet Stephen L Moshier
1998-12-15 19:21   ` /internet Joe Buck
1998-12-15 19:37     ` /internet Jeffrey A Law
1998-12-16  7:58       ` /internet Tim Hollebeek
1998-12-16  8:41         ` /internet Joe Buck
1998-12-16 11:45           ` /internet Stephen L Moshier
1998-12-16 11:59             ` /internet Joe Buck
1998-12-16 13:19               ` /internet Chip Salzenberg
1998-12-16 16:20                 ` /internet Jeffrey A Law
1998-12-16 17:42                   ` /internet Joern Rennecke
1998-12-17  9:46                     ` /internet Horst von Brand
1998-12-16 16:37               ` /internet Jeffrey A Law
1998-12-16 16:56                 ` /internet Per Bothner
1998-12-17 20:20                   ` /internet Jeffrey A Law
1998-12-16 17:52                 ` /internet Joern Rennecke
1998-12-17  4:43                 ` /internet Sylvain Pion
1998-12-17 10:26               ` /internet Craig Burley
1998-12-15 23:08     ` /internet Matthias Urlichs
1998-12-16  9:33       ` /internet Craig Burley
1998-12-16  5:44     ` /internet Stephen L Moshier
1998-12-16  9:37   ` /internet Craig Burley
  -- strict thread matches above, loose matches on Subject: below --
1998-12-16 12:34 /internet Geert Bosch
1998-12-16 13:02 ` /internet Harvey J. Stein
1998-12-16 16:25 ` /internet Jeffrey A Law
1998-12-04 17:41 /internet tprince
