From: N8TM@aol.com
To: tprince@cat.e-mail.com, burley@gnu.org, egcs@cygnus.com
Subject: Re: FLOATING-POINT CONSISTENCY, -FFLOAT-STORE, AND X86
Date: Thu, 03 Dec 1998 06:34:00 -0000
Message-id: <32db0c93.3666999f@aol.com>
X-SW-Source: 1998-12/msg00097.html

In a message dated 12/2/98 burley@gnu.org writes:

C: Craig Burley
T: Tim

>>C: In other words, the default for x86 code generation should
>>apparently be that, when the compiler generates an intermediate result,
>>it *always* uses maximum available precision for that result, even
>>if it has to spill the result to memory.  (I *think* it can do this while
>>obeying the current FP mode, but don't have time to check right
>>now.)
>>[...]
>
>T: In the case where e is used in a subsequent calculation, we
>don't want to force a store and reload unless -ffloat-store is
>invoked.

>C: Correct, AFAIK.

T: There's some uncertainty here, where the desire to maintain
performance causes us to keep the extra precision, although the
programmer might conceivably not want it.  In order to turn it off in a
"fine-grained" manner, the programmer must program in a "float-store",
which I do by invoking an external function which returns the
rounded-off value (it can't be in-lined); a sketch of such a helper
appears below.

>T: But I'm not sure you can always apply the same rules to
>storage to a named variable (it might be stored in a structure or
>COMMON block) as to register spills, which aren't visible in the
>source code.

>C: No, I don't think you can, and that's what my proposal and email
>were trying to clarify (less than successfully, I gather!).

>C: That is, I was trying to focus my proposal on only the
>compiler-generated temporaries that get spilled and chopped down to
>"size" at the same time.

>T: This is a more difficult question to solve and I'm confused about
>what connection you are making between that and the spilled
>temporaries.

>C: In my proposal, essentially none, except that it used to confuse
>me, and I believe it still confuses others, that there are pretty
>bright-line distinctions between compiler-generated temporaries and
>user-named variables, in terms of the precisions the compiler is, or
>should be, permitted to employ for each class.  (But not all the
>distinctions are so clear, it seems.)

>C: With compiler-generated temporaries, it is, again, normally
>permitted (whether helpful or hurtful) for the compiler to employ
>*more* than the implicit precision of the operation.  But the problem
>with the gcc back end, on the x86 at least, is that it (apparently)
>sometimes employs *less*, specifically, when spilling those
>temporaries.  (That is, when the temporary needs to be copied from the
>register in which it "lives" to a memory location, the gcc back end
>apparently is happy to chop the temporary down to fit into a smaller
>memory location.)

>C: My proposal deals only with this latter deficiency (as I now think
>it is); that is, it recommends that precision *reduction* of
>compiler-generated temporaries no longer happen (at least not by
>default).

>C: - The compiler provides no way to "force" available excess
>precision to be reliably used for programmer-named variables anyplace
>that is possible (say, within a module).  Some compilers offer
>explicit extended type declarations (REAL*10 in Fortran; `long double'
>in C?), but g77 doesn't yet.  So, whether a named variable carries the
>(possible) excess precision of its computed value into subsequent
>calculations is at the whim of the compiler's optimization phases.
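T: For reference, a minimal sketch of the sort of external
"float-store" helper mentioned above (the name DCHOP is invented here;
it has to live in its own source file so it cannot be in-lined):

      DOUBLE PRECISION FUNCTION DCHOP(X)
*     Identity function used as a "float-store": g77 passes X by
*     reference, so the caller must first store the actual argument
*     into a 64-bit DOUBLE PRECISION temporary, and that store is
*     what discards the extra x87 bits.  Keep this routine in its
*     own source file so it cannot be in-lined away.
      DOUBLE PRECISION X
      DCHOP = X
      END

A call like E = DCHOP(A*B + C*D) (with DOUBLE PRECISION DCHOP declared
in the caller) then chops just that one value to declared precision,
without paying for -ffloat-store across the whole program.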
T: I think what you are getting at is that it's usually acceptable for
the results to be calculated in the declared precision; extra precision
is usually desirable, but unpredictable combinations of extra precision
and no extra precision may be disastrous.  See Kahan's writings about
the quadratic formula.  Your proposal would make an improvement here.

>C: REAL*16 seems to be asked for fairly often.)

T: Probably by people who don't recognize how much of a performance hit
the Intel processors will take going from REAL*10 to REAL*16.  If the
Lahey/Fuji f95 compiler gets the alignment problems fixed so that
REAL(kind=8) returns to good performance, I think this will become more
evident.

>T: I suspect the 96 bits must be written to a 128-bit aligned storage
>location to minimize the performance hit.

>C: Probably.  But we're not even at 64-bit aligned storage for stack
>variables (which is where spills must happen, for the most part) yet,
>and IMO code that requires FP spills, on the x86 anyway, is probably
>not going to notice the lack of alignment due to its complexity.

T: I believe that i686-pc-linux-gnulibc1 is trying, with some success,
to do aligned spills, and that that's the reason why -O2 often runs
faster than -Os on that target, while -O2 is slower than -Os on the
same code on the targets which don't have double alignment on the
stack.

>T: If someone does manage to implement this, I would like to study
>the effect on the complex math functions of libF77, using Cody's
>CELEFUNT test suite.  I have demonstrated already that the
>extended double facility shows to good advantage in the double
>complex functions.  The single complex functions already
>accomplish what we are talking about by using double
>declarations for all locals, and that gives them a big advantage
>over certain vendors' libraries.

>C: Right now, my impression is that the effect would be nil *unless*
>these codes are complicated enough to cause spills of temporaries in
>the first place.

T: The improvement in accuracy depends on getting extended-precision
results from built-in math functions, so it would require a math-inline
option as well as the 80-bit register spills.  I don't know whether it
can be done effectively, say by taking care to make the math-inline
headers of libc6 more reliable.

>C: First, the main goal of my proposal is to reduce unpredictable loss
>of precision on machines like x86, where programmers should be aware
>their code will often employ extended precision (and thus might depend
>on it).

>C: However, if -ffloat-store is not used, then perhaps this reduction
>would not be complete, and lead to rarer, yet even more obscure and
>hard-to-find, bugs, unless we indeed make sure that even spills of
>named variables carry, and never chop, the values of those variables
>(which might be in extended precision).

T: That might be too much to expect.  It's true that there could be
situations where adding code might cause a named variable to be spilled
to its declared precision where a simpler version used extended
precision, but I doubt it's feasible to prevent that.  I'll suggest a
less ambitious goal: that the recognition of common sub-expressions
should not lead to reduced precision:

      a = b*c + d*e
      f = d*e*g + h

If the compiler decides to treat d*e as a common sub-expression, in
order to save an operation, but then finds that this expression needs
to spill, that spill and restore should be full precision.  Otherwise,
we get back to the unpredictable situations.
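T: As a small (and contrived) illustration of the quadratic-formula
point above, here is a test that uses the DCHOP helper sketched
earlier (link its file in as well); the coefficients are chosen so
that B*B - 4*A*C cancels almost completely:

      PROGRAM DISCR
*     Discriminant of a nearly degenerate quadratic: the true value
*     of B*B - 4*A*C is exactly 2**(-52), but A*C rounds to 1.0 in
*     DOUBLE PRECISION, so the cancellation loses it unless the
*     products are carried in extended precision.
      DOUBLE PRECISION A, B, C, D1, D2
      DOUBLE PRECISION DCHOP
      EXTERNAL DCHOP
      A = 1D0 + 2D0**(-27)
      C = 1D0 - 2D0**(-27)
      B = 2D0
      D1 = B*B - 4D0*A*C
      D2 = DCHOP(B*B) - DCHOP(4D0*A*C)
      WRITE (*,*) D1, D2
      END

On an x86 where the products stay in the x87 registers, D1 can come
out as 2.22E-16 (the exact answer) while D2 comes out as 0.0; with
-ffloat-store, or on a target without extended registers, both print
0.0.  Which of those the programmer actually wanted is exactly the
question under discussion.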
tq vm,
(burley)

>C: P.S.  Most, if not all, of this is the result of widespread
>disagreement over what a simple type declaration like `REAL*8 A' or
>`double a;' really means.  The simple view is "it means that the
>variable must be capable of holding the specified precision", but so
>many people really expect it to mean so much more, in terms of whether
>operations on the variable may, might, or must involve more precision,
>etc.  And, since the predominant languages give those people no
>straightforward way to express what they *do* really want, how
>surprising is it that they "overload" the "simple" view of what a type
>definition really means?

T: This is getting off-topic.  I might think that f90 declarations like

      REAL(selected_real_kind(15)) :: a
      REAL(selected_real_kind(18)) :: b

could allow the programmer to express intent in more detail while
retaining portability, but I don't think any existing compilers
implement this in a useful way.
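Spelled out a little more fully, the idea might look like this (kind
numbers kept in named constants; on a compiler that has no 18-digit
kind, SELECTED_REAL_KIND(18) simply returns -1 and the declaration of
B is rejected):

      PROGRAM KINDS
      INTEGER, PARAMETER :: DP = SELECTED_REAL_KIND(15)  ! >= 15 digits: IEEE double
      INTEGER, PARAMETER :: EP = SELECTED_REAL_KIND(18)  ! >= 18 digits: x86 extended, where supported
      REAL(DP) :: A
      REAL(EP) :: B   ! rejected at compile time where EP comes back as -1
      A = 1.0_DP
      B = 1.0_EP
      WRITE (*,*) PRECISION(A), PRECISION(B)
      END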