From mboxrd@z Thu Jan 1 00:00:00 1970
From: Craig Burley
To: toon@moene.indiv.nluug.nl
Cc: burley@gnu.org
Subject: Re: FWD: FLOATING-POINT CONSISTENCY, -FFLOAT-STORE, AND X86
Date: Wed, 16 Dec 1998 10:05:00 -0000
Message-id: <199812161805.NAA14993@melange.gnu.org>
References: <3676C529.FE8214F@moene.indiv.nluug.nl>
X-SW-Source: 1998-12/msg00588.html

>Am I the only one - apart from Harvey J. Stein and Tim Prince - who
>finds this whole discussion unreal ?  Surely, 80 bit temporaries might
>seem a neat hack to a numerical analyst like Dr. Kahan, but the
>ordinary computational physicist or chemist knows better than to
>choose "poorly conditioned" algorithms.

Then we have an awful lot of "extraordinary" people sending email
asking why FP doesn't work as expected in g77, gcc, egcs, and so on,
and campaigning for various corrections to Java, IEEE 854, x86, or
whatever!

>My main concern is that there is a grid spacing that will render the
>basic equation of geostrophy badly approximated in 32-bit arithmetic:
>
>     1  dp
>    --- -- = f v
>    rho dx
>
>p    pressure
>rho  mass of air per unit volume (1 kg / m^3)
>x    distance
>f    Coriolis parameter (10^-4 s^-1)
>v    wind speed
>
>You can do the math (p ~ 10^5 kg m^-1 s^-2, v ~ 10 m/s, what dx will
>make dp < 10^-3 p ?)

I can't figure out what you're saying.  How will *not* randomly
spilling a computed 80-bit intermediate value to a chopped-down 64-bit
result make your code stop working, exactly?

>If that's the case we have to rethink our finite difference code for
>the first time in 13 years and use a trick like subtracting a basic
>state from the equations - big deal.

I still don't know what you're saying.  How does fixing this
longstanding bug in gcc/g77 break your code, exactly, in terms I, as
*not* a math expert, can understand?

>The last thing I need is to have egcs slowed down to a crawl by having
>it spill unaligned 80-bit temporaries for something that shouldn't be
>larger than 32 bits in the first place.

We don't *know* that it'll slow down to a crawl.  Spilling outside of
function-call return values seems to be rather rare; spilling return
values probably happens less when optimization is turned on, and,
besides, you're doing a *call* already!  I don't like that stack
frames will get somewhat larger, though.

>Please make this and other "accuracy" options a "-pedantic-numerics"
>one.

Sounds like you're arguing in favor of the default being extreme speed
at the expense of correct, consistent behavior.

IMO, your experience represents that now-rare breed of programmers:
People Who Know What They're Doing.  And, if shops like yours can
rewrite all your code, at huge expense, to no longer depend on
language support for 64-bit integers, to get it to run on the fastest
multi-million-dollar supercomputer you could get ahold of...

...then you can darn well use the `-fchop-fp-spills' option I've
proposed (though, in my original proposal, I hadn't yet proposed a
name for it) when you decide you want your working code to run faster.

I think that's pretty fair to ask of you, rather than ask the millions
of people who we want to use gcc, g77, and so on, over the next few
years, to use a special option to get their fast code to start working
in the first place!
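[A minimal worked sketch, not from either message, of the grid-spacing
question quoted near the top: it simply plugs Toon's numbers into the
geostrophic relation, reading it as dp ~= rho * f * v * dx.  The
program and its names are illustrative only.]

  /* At what grid spacing dx does the pressure difference between
     neighbouring grid points drop to 10^-3 of the pressure itself?
     From (1/rho) dp/dx = f v we get dp ~= rho * f * v * dx.  */
  #include <stdio.h>

  int main (void)
  {
    double p   = 1.0e5;   /* pressure, kg m^-1 s^-2 (Pa)  */
    double rho = 1.0;     /* air density, kg m^-3         */
    double f   = 1.0e-4;  /* Coriolis parameter, s^-1     */
    double v   = 10.0;    /* wind speed, m s^-1           */

    /* dp < 1e-3*p  <=>  rho*f*v*dx < 1e-3*p  <=>  dx < 1e-3*p/(rho*f*v) */
    double dx = 1.0e-3 * p / (rho * f * v);

    printf ("dp reaches 1e-3 * p at dx = %g m (about %g km)\n",
            dx, dx / 1000.0);
    return 0;
  }

[This prints dx = 100000 m, i.e. roughly 100 km; on grids finer than
that, the difference p(x+dx) - p(x) retains only a few of float's
roughly seven decimal digits, which seems to be the conditioning worry
being raised.]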
In particular, I'd rather people who use g77 to do their numerical
work be able to remain experts in their fields, rather than have to
become experts in compiler code generation, which they'll have to be
to know what options to use.  They might have to increase their
computer expertise to get things to run *faster*, but, generally, I
think we should not require people to be experts on deep-down details
of how particular pieces of software do their job just to get their
straightforwardly-written code to *work* in the first place.  I know,
"late answers are wrong answers", but wrong answers remain wrong
answers no matter how quickly one obtains them.

And since most people have far less expertise and resources than shops
like yours, Toon, we don't want to burden them by telling them all the
options they must use to get code to work "as expected", all of which
carry a red flag saying "this will slow down your code, but if you
know what you're doing you can avoid it" -- which will cause *most* of
these programmers to say "of course I know what I'm doing", forget the
option, and get wrong results.

I think we'd be far better off, thinking globally and into the future,
if the defaults tended towards correctness and consistency, and the
options that changed behavior generally said things like "if you know
your code never depends on ..., you can use this option to possibly
get better performance".  We're less likely to get spurious bug
reports using this approach, at least -- compare the number of
spurious bug reports we've gotten from people *using* -ffast-math
versus those *forgetting* to use -ffloat-store, for example!

Ideally, the philosophy I'm promoting above would extend to ensuring
much more consistency across *all* GNU platforms.  As I've said
before, this'd mean defaulting to -mieee (or even -mieee-with-inexact)
on Alphas, completely emulating IEEE arithmetic in software on a few
older machines, and so on, and I don't think the industry would
welcome the kind of performance drop we'd get as a worthwhile tradeoff
for the small amount of extra consistency...at least, not right now,
and, besides, if users want Java, they know where to get it (at least
for the moment, in theory, when the moon is just overhead ;-).

But there's a *clear*, widespread lack of understanding among today's
programmers that f(x) < f(y) does not imply f(y) > f(x) even when f
has no side effects or external references and x and y are constants.
I think it's easier for us to at least try to live with this lack of
understanding by fixing the compiler to meet the expectations of this
huge audience than to try to teach them all to at least use some
option like -pedantic-numerics, much less teach them all about why
internal compiler code generation can produce such amazing results.

From my compiler-internals perspective, I'm flummoxed as to why
*anyone* with knowledge of the issues would claim that 80-bit values
should be randomly chopped down to 64 bits by the compiler as a
*default*.  From that perspective, performance simply isn't an issue
-- if it were, we could simply not spill *anything*, just re-use
random data, and get *great* performance, if consistent results were
so unimportant to us.  The FP register stack contains 80-bit
registers.  *Exactly* why should we not spill them *correctly*?

        tq vm,
        (burley)
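[A minimal C sketch, not from the original message, of the kind of
excess-precision surprise behind the f(x) < f(y) remark above,
assuming classic x87 code generation; whether the "unequal" branch is
actually taken depends on the compiler, target, optimization level,
and whether -ffloat-store is used, and the function names here are
illustrative only.]

  /* On x86, f's result can be computed in an 80-bit x87 register;
     'stored' gets rounded to a 64-bit double in memory, while the
     recomputed f(1.0) may still carry excess precision when the two
     are compared -- so "the same" value can compare unequal.  */
  #include <stdio.h>

  static double f (double x)
  {
    return x / 3.0;             /* inexact in binary floating point */
  }

  int main (void)
  {
    double stored = f (1.0);    /* spilled/stored as 64 bits */

    if (stored != f (1.0))
      printf ("f(1.0) != f(1.0): excess precision leaked through\n");
    else
      printf ("consistent this time (no visible excess precision)\n");
    return 0;
  }

[Rebuilding with -ffloat-store, or at a different optimization level,
may flip which message prints -- the sort of inconsistency being
argued about here.]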