From mboxrd@z Thu Jan 1 00:00:00 1970
From: craig@jcb-sc.com
To: d.love@dl.ac.uk
Cc: craig@jcb-sc.com
Subject: Re: Bug with g77 and -mieee on Alpha Linux
Date: Sun, 11 Jul 1999 23:54:00 -0000
Message-id: <19990712065253.28570.qmail@deer>
References: <199907062042.WAA00509@keksy.linux.provi.de> <19990707140435.1429.qmail@deer> <19990707194012.A291@keksy.linux.provi.de> <3783B4B1.89DC2124@moene.indiv.nluug.nl> <19990708135500.12573.qmail@deer> <19990710033831.20409.qmail@deer>
X-SW-Source: 1999-07/msg00429.html
List-Id:

>>>>>> "JCB" == writes:
>
> JCB> Hardly. it's a *precise* description of what Toon constantly
> JCB> advocates: the idea that any program written to exploit the full
> JCB> range of IEEE 754 *values* (not even using *features* like
> JCB> signalling NaNs, trapping on inexact, etc.) is inherently
> JCB> *wrong*,
>
>I don't recall so, though perhaps that's what he thinks; it's
>obviously silly.

When it was asserted that even -mno-ieee didn't behave reasonably in the presence of denormals, that fact *was* hand-waved, effectively as not being evidence of our failure to provide a sufficiently clean, consistent environment.

So, to me, it doesn't matter whether the *assertion* was wrong: it's the *hand-waving* that's wrong, and I will no longer put up with it (especially as done by management) on projects with which I'm associated.

>Most of the time they are, and I say ``read `Working Programs'''. The
>one in question is likely buggy, given that it's
>optimization-dependent, regardless of the problem printing the
>results.
I wonder what percentage of people who are told "read `Working Programs'", pertaining to correct accommodation of FP behaviors across the present bunch of compiler/CPU combinations, actually *modify* their programs to meet the requirements; and what percentage of people find they can simply abandon g77 in favor of a compiler (even on a problematic CPU, like IA32) that is *sufficiently* predictable, FP-wise, that they need make many fewer changes, if any?

Have you ever tried to find out, e.g. by following up with people a month or two after the fact?

(I haven't seen a quote like the now-years-old, paraphrased, "g77 always comes up smelling of roses" in a *long* time, and I think this whole discussion explains why -- because, against my wishes, we've collectively allowed g77 to acquire a real stench about it, of unpredictability, instability, and lack of proper attention to important issues, like ensuring the viability of testing using standard test suites, of component testing, and so on.)

The problem being that, while I don't like changing g77 to support gratuitously buggy programs, I have little faith that these sorts of bugs are *exclusively* gratuitous -- and, in the meantime, we lose out on that latter group of users helping test g77 on codes it doesn't otherwise see, at least not on the problematic CPUs.

> JCB> -- my impression is, users are concluding *g77* is buggy, in
> JCB> droves.
>
>They've always done so. To minimize it, you'll want defaults taken
>from `Working Programs', and then they'll say it's just slow.

I wonder what percentage of users we'd *lose* as a result of it being "slow", despite offering options (like -ffast-math, -mno-ieee, -ftruncate-fp-spills, the latter being the only one we don't already have) for "expert" users to use to speed things up?
I wonder what percentage of users we'd *gain*, because, even though their *production* code *starts* out as slow, they are able to gain confidence by running their own code that *tests* for reasonable FP behavior, even if just to validate the compiler's own integrity, before turning on options to gain speed at the expense of some of that reasonable behavior?

I wonder if anyone ever paid attention to these issues before making the pertinent decisions, and if anyone ever will, when it comes to GNU/GCC?

(Well, I know from experience, we sometimes try, but the more effect a decision has, the more objective analysis that needs to be done up front. I see nearly *zero* evidence that such analysis is favored, even *now*, with all the controversy that has ensued, with all the users who've complained, some of whom have *not* been so easily hand-waved as having "broken code", e.g. they're simply trying to test IEEE compliance on systems that, using *other* compilers, actually offer it!)

> JCB> I'm coming to the same conclusion myself, that g77 is fatally
> JCB> buggy, by design, because of its refusal to offer even basic,
> JCB> predictable behavior.
>
>I've certainly wanted a framework for controlling the FP behaviour
>across platforms as consistently as possible, as have other users, but
>the work I did on it was rejected and it was clear that working on the
>f2000 features would be a waste of time.

What do you mean by "was rejected"?

If you mean I didn't add the code you'd wanted added to libg2c to set up exceptions, then please say so, but I assure you, that wasn't a *rejection* in the sense I think others might take it as.

That is, you're leaving the impression that you tried to make things *better* by offering such patches for inclusion in g77/libg2c, and that your *desire* to improve things was rejected.

That was hardly the case.
The problem was the method you chose, which, while perhaps best for a non-GNU-like vendor (which could easily include another OSS effort, so I don't mean "proprietary", though they often are able to expend up-front resources to make sure things work), involved changing the way the *system* behaved, by default.

Now, *libf2c* already changes that by setting up exception handlers, but g77 already "swallows" whatever is "special" about libf2c, with a few exceptions.

And, we've already gotten somewhat burnt by our (mostly my) feeling we're able to maintain our own libg2c, a libf2c with *source* (not just configury/build) modifications. Modifications I, myself, can at least fairly easily reason about because they're in straightforward C.

Now, add on top of those problems modifications to the signal/exception environment, or that make use of pre-main()-invocation initialization, which *I* don't really know anything about (except that I've gotten the impression it isn't supported so uniformly as straight C code, might have bugs, etc.), and perhaps you can understand why I didn't just throw the code in -- especially back during the gcc 2.7 -> 2.8 fiasco, which included the g77 0.5.20 fiasco, both of which went for a *long* time and made for *substantial* changes of attitude among many of us (well, at least me) regarding how to properly maintain g77.

So, I've (slowly) learned to minimize the changes g77 makes to the fundamental environments in which it operates:

- The underlying CPU, whose default FP behavior and *recommended use* (which, frankly, includes 80-bit spills, though I can't do anything about that in g77) should be honored by g77, since we can neither take a big performance hit via emulation nor deploy resources to do "fast FP" our own way. Also, this honors the CPU choice of the user, to a bit of an extent (though g77 tries to cater to a wide audience here -- e.g.
those effectively *forced* to do useful work on IA32, as well as those who buy a T3 or whatever, so that particular argument applies most weakly to the CPU, I would think).

- The underlying OS, whose choices for the signal/exception environment, initial FP state, and so on, should be honored, again, because we don't want to emulate our way into a perfectly consistent environment, nor do we have the resources to go our own way, and, again, because it respects the OS choice of the user (thus encouraging users to complain to OS vendors, rather than us, about defaults they don't like, or at least read the docs for *their* environment *themselves* and thus discover how to override the defaults, rather than relying on us to guess at all sorts of configury hacks to install working patches that reset the defaults for them -- and for *other* unsuspecting users of those same systems).

- Netlib libf2c, whose choices regarding changing the signal/exception environment, FP support, and so on, should be honored, again, ditto the reasons stated elsewhere (though we currently don't give g77 users a choice by supporting another run-time library, we will someday), plus we get a *more* robust product to the extent we "join in" with the testing people do on the f2c/libf2c combo. (And we've indeed taken hits, one way or another, by going our own way via libg2c.)

- The gcc back end, which could be argued makes the *worst* choices for g77/FP work of the items under discussion, though it *is* improving. (After all, we've been avoiding its complex divide for about 99% of g77's history, avoiding *all* of its complex arithmetic for most of g77's public life, etc.) But, the more we can simply "fall in line" with gcc -- e.g.
via Toon's patches to offer a better complex divide -- the better (though Toon's work is not yet a great example of this, as it involves simply switching among methods based on the front end, and the g77 method is "new" code, but at least that new method can be used by other front ends as-is, which was not the case for using c_div/z_div).

So while I rejected some proposed changes to g77 or libg2c that *might* have made things somewhat better, I certainly have *never* rejected the ideas that lay behind them, to the extent they were preferably implemented in the gcc back end, in libf2c, in the underlying OS, or in the CPU.

And, keep in mind, Toon has been, in effect, arguing *against* changes like yours, because, as he points out, we're not going to get "perfect" consistent behavior *anyway*, so why do anything that might slow down code, which adding a signal handler (or changing an FP mode) might? (Though I don't recall having specifically objected to *that* -- he probably wouldn't, since such code wouldn't be in a loop!)

For myself, not having had, prior to a year or so ago, a particularly clear picture of the issues, I had to rely on advice as to how to generally handle these issues, combined with my own need to know, to at least some extent, what was going on. I was probably, back when you submitted the patches you're talking about, in the mode Toon still is, which is "if it involves FP, you're on your own", loosely stated, and doing everything I could to *avoid* having g77 adjust the underlying components (libf2c, gcc, OS, and CPU), especially system-specific changes (which can easily mushroom into bazillions of #ifdef's, as most everyone who does GNU-like development realizes by now).

Also, I thought I made it perfectly clear that I was in constant-apology mode regarding my lack of being up to snuff wrt Internet access, ability to upload/download bugs, investigate stuff, etc.
Until early this year, that was a *huge* problem, making investigating anything "peripheral" to my fundamental job as g77 front-end maintainer be poorly treated by me.

Add to that the fact that, also until very recently (starting with the EGCS project, but, more practically, ramping up through early this year), nobody could really get anything into g77 without going through me, in the sense that I had to be the one to distribute not only *releases* of g77, but *alpha* versions.

So, if I added a patch that would, even in practice, improve g77's behavior on a particular system, and went to the (significant!) trouble of distributing that version of g77 for alpha-testing, and it blew up *other* systems, that'd be a *huge* waste of various resources, which were in such short supply at that time, I was even ignoring some pretty mainline g77 front-end issues.

(Heck, until I got my PII and except when using my Alpha, I avoided making significant changes to f/com.c and f/expr.c, because Emacs C-mode couldn't keep up with my thought/typing process -- seemed like its indentation handling, which gets invoked when typing `}', had slowed down at some point or something, making my poor little 486 sputter for a few seconds every few keystrokes.)

That, and other circumstances, conspired to make me *very* reticent to add any changes in which I didn't *personally* have a high degree of confidence, that didn't come from a source that suggested a high degree of confidence was due the changes (e.g. I accepted changes by dmg to netlib libf2c for libg2c pretty much blindly), and that could easily break lots of things, *especially* in subtle, not-discovered-until-way-down-the-road things.

Further, on many of the occasions where I *did* confidently forge ahead with similarly risky changes to g77, whether made by me or incorporated on someone else's say-so, we got severely burnt.
(Some of what made g77 circa 0.5.21, I forget exactly which versions, buggy was because I'd pretty much blindly folded in most, or maybe all, of the GNAT patches to the gcc back end, on kenner's say-so. For all I know, *they* might have been all correct -- bugs might have been all due to the interactions they had with g77's patches -- but the upshot was, if I hadn't integrated them, we *might* have had a more stable g77 for awhile there.)

To sum up my reasons for not immediately integrating whatever such patches you might submit as "rejection" does both me and my reasoning for not taking them as-is a disservice.

>Complete consistency across
>even IEEE-ish targets is surely doomed, though.

I'm not quite ready to say *that*, but I certainly wouldn't encourage GCC to attempt it, especially not as a default. Certainly the current crop of g77/gcc users is not interested in it.

What I would *love* to see is a full-court-press by the number-crunching industry to effectively *mandate* strict IEEE 754 conformance (to the "range" of the standard appropriate, e.g. not bothering with trapping/exceptions or extra rounding modes in FORTRAN-77-based languages, but still getting all the precision and consistency exactly right, for the *default* types, with no excess precision, thus no spill issues, etc.).

Not that I think that highly of IEEE 754 per se (how can I, not being a numerical analyst?), but if the industry did just make every Fortran compiler do whatever it took to conform to that standard by default, the benefits would be enormous, in better, more widespread, testing at component/unit levels, in reducing complexity for programmers, allowing them to focus on *their* problems rather than peripheral issues (like "which floating-point behavior do I get *today*?")...

...and, most importantly, it would probably result in whatever Intel shipped as the next IA32 "upgrade" finally delivering IEEE-754-conforming *performance*, instead of *punishment*.
> JCB> So other systems crash on overflow (instead of generating Inf)?
>
>Of course they do, though they're decreasing in importance in my game,
>if you mean the basic hardware. However, I'd almost always want to
>set up the FP system on an IEEE box to do that anyway to find the
>bugs. I distributed the crystallographic suite that way for all the
>systems I knew how. I got grief from users who seemed to want bogus
>output, but I fixed plenty of bugs.

Excellent!! The better a job we can do communicating these potential means for finding problems to users, the better.

Yes, a completely uniform g77 environment across platforms that, e.g., defaulted to crashing on overflow rather than returning Inf might be ideal, but, failing that, I think we should a) go with whatever the (four-part) environment, described above, decides and b) document as best we can how to override that environment.

(Though, I have had some people tell me that a default of crashing instead of returning Inf is wrong. I am not prepared to make that sort of decision myself.)

>Please spare me the diatribes when I report how things actually behave.

What in the world prompted you to say that??

>$ uname -a
>OSF1 pxsv6.dl.ac.uk V4.0 1091 alpha
>$ cat >z.f
>      a=exp(-100.)
>      print *, a
>      end
>$ f77 -O0 -ieee_with_no_inexact z.f && ./a.out
>  3.7835059E-44
>$ f77 -O0 z.f && ./a.out
>  0.0000000E+00
>$ g77 z.f && ./a.out
>  0.

Well, *that's* good news!

How does this behave?

      CHARACTER*50 A
      EQUIVALENCE (I, R)
      A = '1E-40'
      READ (UNIT=A, FMT=*) R
      PRINT '(Z8)', I
      END

(It should print "116C2", modulo endianness, if my IA32 system is any guide.)

If it crashes, how does it behave if "1E-40" is replaced by "1E-400", and by "1E-4"?

> JCB> the *actual* behavior appears to be that,
>
>I refer to the behaviour I actually observe. I can now check how an
>Alpha works rather than just being told I don't understand it without
>useful explanation.

Who told you you didn't understand it? Could you provide a reference?
Keep in mind *I* have pretty consistently pointed out that I don't have my Alpha up and running yet (as my web site makes pretty clear).

>The issue with libI77 is trivially fixed.

By compiling it with -mieee? But we won't be doing that for gcc 2.95, so if doing so exposes bugs not caught by *our* limited testing, but that might be caught by the wider base of users who limit their testing to releases (or prereleases) of g77, that's going to be painful.

Alternatively, we could make a modification to libg2c's pertinent routines. Not the approach I want to take anymore.

Or, we could convince dmg to change netlib libf2c, which, if we can agree that the change makes sense within the context of a consistent numerical environment, is the option I'd prefer. (More and wider testing, etc.)

> >> Note that gcc's default is the same as Digital's.
>
> JCB> If only that were the case, we'd have few of the problems we
> JCB> have now.
>
>I'm using the Digital compiler and I checked before saying so. They
>both default to the non-IEEE mode, as documented.

Did you not bother reading the rest of what I wrote?

I *know* they chose what we call -mno-ieee as the default *compiler* option, but they did *not* choose failing to properly *research*, *design*, and *implement* it as a default, did they?

I've explained that *many* times now. Why are you insisting on twisting my words around, just to make a point? Do you really have so little respect for my efforts here?

> JCB> Digital Fortran offers a fully-working environment based on what
> JCB> we call -mno-ieee *and* offers a fully-working environment based on
> JCB> what we call -mieee as well. It happens to offer the first as a
> JCB> default.
>
> JCB> g77 offers neither choice.
>
>I don't understand what `fully-working' means -- bug free? (g77
>clearly doesn't provide a fully-working environment on any system;
>particularly because debugging doesn't work properly.)
No, I've *explained* what fully-working means before -- and probably even in that email. Didn't you bother reading it?

>The DEC default is to crash on overflow, for instance, which it sounds
>as though you think is wrong. I get a segv from attempted i/o of a
>subnormal in default mode compiling with it. I've verified that gcc
>and g77 pass paranoia perfectly with gcc -mieee and appropriately
>multilibbed libg2c.

No, I don't think crashing on overflow is wrong *per se*.

I think g77-compiled code crashing on overflow, denormal, underflow, or even divide by zero is wrong if the underlying system (e.g. the native compiler who we're perhaps trying to emulate to some degree, or the way the OS normally behaves under, say, f2c/libf2c) does *not* do so.

The idea I keep trying to get across is that, to the extent we don't tightly adhere to *standards*, we lose opportunities for testing, code re-use, user understanding, etc. (Some of those "standards" are really ad-hoc, "this is how this machine behaves" sort of things, and that makes consistency really difficult to achieve in some cases, I realize. Each issue must be carefully thought through.)

(I had, up until several months ago, been hoping/assuming people would just magically do the right testing during development and early prerelease. It's now clear that's not the case, so I support, with more understanding now rather than just faith or appreciation for attempts to DTRT, the "longish" release schedules adopted by the EGCS project. I now think they may well not be long enough, or allow for enough show-stopper bugs to be discovered, tracked down, fixed, and respun.)

The next choice is to adhere to what a particular *system* normally does, and then implement that properly.

My impression was, based on submissions *to this list*, that we did *not* do that for g77 vis-a-vis Digital's compilers.
Now, you can claim that, in fact, those submissions were in error, that we do, in fact, offer (fundamentally, aside from typical sorts of bugs) the same environment Digital does (at least for -mno-ieee; clearly that's not the case for -mieee, so we cannot really "capture" users who want to *start* with -mieee and then selectively "migrate" codes to -mno-ieee).

But, my problem with this whole debate has been how people like Toon and yourself have responded to these mere *allegations* of incorrect behavior:

"It's not incorrect, fix your code."

"Any code that generates a denormal or computes an underflow is wrong, so it's better to crash."

(Even if *Digital* offerings *don't* crash in the same circumstances!!)

And we've seen the same attitude manifested before as regards 80-bit spills.

Further, this attitude is now being copped by people upon whom I previously relied to support my efforts to make g77 a great product, and who were not even the people who made these short-sighted decisions in the first place (meaning there are even *more* people who have these attitudes, and have, or at least had, the power to impose them on others).

So I'll no longer be working on new g77 stuff, since I have concluded that I won't have the support I need to do it right, nor to withstand assaults on my attempts to make it *work* first, *then* make it fast, as defined by *me* using *my* 25 years of experience in the field.

>Given the value of `offers neither choice', perhaps someone can sort
>it out. I'd fix and test the multilibbing if I thought it stood a
>better chance of being accepted than similar stuff I've wasted time
>on.

Please explain what you mean by "similar stuff", and give examples, as the only things I can guess at what you *might* be talking about aren't *remotely* similar in ways that pertain to this discussion.

Particularly disturbing is the fact that you're the *only* other person to whom I can go for help maintaining libg2c; that you and I took *lots* of heat (e.g.
from HJ Lu) trying to make libg2c work with multilibs (work mostly done by Robert Lipe, IIRC); and, here you are, implying that I'd actively resist attempts you make trying to *use* that multilib facility to extract the exact same benefits I've clearly said I *assumed* had been present all along, and would have fixed *myself* if I'd discovered, on my own, had been omitted (by virtue, for example, of having a running Alpha).

All because I didn't accept patches from you to change how libg2c sets up the exception environment, or some such thing?

If you didn't before, now you (and others) can understand why I no longer see my working on g77 as particularly productive.

You'd rather argue with me than let me work, especially if the alternative is that I might make g77 more robust for more users, but perhaps a little slower for a few.

Whether it's entirely my fault, I've now lost the support of the very people who've been most important to g77's success in the past (other than myself): Dave Love, who made libU77 and libg2c happen, but who now appears to claim I actively resist his attempts to improve g77; Toon Moene, almost *the* biggest supporter of my g77 work over the years, who takes up a huge, long argument with me (while I *could* have been working on the rewrite) because he didn't agree with me that the hassle users have to deal with to use -mieee was just too much, and who has previously (among other things) dismissed my proposal to do 80-bit spills as a default nearly out-of-hand; and Richard Henderson, who also dismissed that same proposal nearly out of hand, yet without whom g77's current performance would be *abysmal* on some machines (like Alphas).

I mean, yes, it's been wonderful to get the statements of support I *have* gotten on these issues, but they've all come from people who I don't recognize as active participants in improving g77 (or the gcc back end).
There's nobody who has consistently spoken up to support my view that we should choose defaults that tend to lead to an overall more robust environment *and* to whom I can look to be taken seriously by others who are doing the actual *work* on these projects. (Certainly their statements of support, in these public discussions, have apparently accomplished nothing vis-a-vis the opinions expressed by those who *do* directly influence g77's evolution.)

I can't carry the water for doing things right anymore vis-a-vis g77. There's no way the architecture I intended to use for my rewrite will be tolerated in this environment, and it's highly unlikely g77 will offer reasonable, consistent-with-vendor-practice stable numerics in the timeframe that *I* require it to, to make that rewrite (and the features I expected to add via it) worthwhile anyway.

I'm not going to fight anymore. And, despite Toon's attempts to get me to do so, I'm not just going to be a code-boy who obediently (and *voluntarily*) "improves" g77 exactly as directed by others, when I *know* those directions are wrong-headed and short-sighted, as well as that the results don't speak well for my *overall* abilities (which include product *architecting* and *design*, not just implementation and debugging, which appear to be the only areas, if any, for which I'm respected by you).

Best of all, by stepping aside, I make it *much* easier for you, Toon, and others to designate a replacement g77 maintainer, someone for whose opinions and experience you'll have much more respect *or* who will happily make whatever changes you suggest, regardless of how little actual R&D has been put into them.

(I sincerely hope the former occurs, but don't hold out much hope for it.
I'll certainly have plenty of email-archive URLs ready to offer to anyone who asks me "what sorts of things happen on this project when somebody stands up and says, here's the *right* way to solve this problem, when the right way isn't the convenient, cheap, or fast way?")

> JCB> Remember, my original statement was to the effect that -mno-ieee
> JCB> as a default was a poor choice because we didn't bother to
> JCB> properly implement it, whereas, at least with -mieee, we'd have
> JCB> had plenty of *existing* code (coming from other,
> JCB> IEEE-754-conforming, environments), such as library routines,
> JCB> test suites, and so on, to just "plug in" for ordinary use.
>
>I don't understand what that's about. How is it not properly
>implemented (modulo general bugs with 64-bit targets and other bugs
>not directly related to the fP model)? Why the confidence that -mieee
>is properly implemented in contrast? (libf2c was developed on VAXen,
>presumably along with Berkeley libm stuff originally.)

We didn't make sure we implemented -mno-ieee with *nearly* as much attention to detail as *Digital* did when *it* chose to go out-of-step with the industry (IEEE 754). (A choice I gather it now regrets, or at least no longer sticks with, if my impressions about the 21264 are correct. I'd love to find out, for sure, what the top Digital/Compaq technical gurus now think about making -mno-ieee the default, assuming that they now, on 21264, effectively offer -mieee as the default.)

And, by not choosing -mieee, we've clearly chosen to not ensure that we offer a consistent IEEE 754 environment on Alphas, a choice Digital *did* make, because -mieee doesn't even work under g77.

So all the testing that *could* have gone on, using codes that assumed IEEE 754 (even if just codes whose *only purpose* was to test IEEE 754 compliance, i.e. *not* codes subject to Toon's numerical-analysis-expertise-based rejection), has *never* gone on for g77/Alpha, or gcc/Alpha, or g++/Alpha.
To put it coarsely: the decision GNU made regarding -mieee was made mainly to avoid the hard work of getting FP right in gcc (and, by extension, unfortunately, g77).

That hard work might have included doing difficult optimizations to make -mieee perform well (as it could, I gather, perform *much* better than it does today, since there are "trap shadows" we could account for and thereby avoid some uses of TRAPB).

And/or, it might have included making sure all the libraries supported it properly.

And/or, it might have included making sure most of the code out there that helps people test IEEE 754 conformance was run by volunteers using -mieee.

And/or, it might have included thoroughly documenting these issues up front, so that not only end users, but "middle" people, like myself, would have been made aware of them up front -- to make better decisions regarding testing, for example.

We didn't do any of that, AFAICT.

I assert that Digital did *all* of it, up front.

Taking that into account, saying that "we made the same choice as Digital" would probably be litigable. It certainly hides the truth. And it hides it in a way that is consistent with likely *future* blunders like this, i.e. where someone just says "well, it's okay for us to not do this right, because some other vendor doesn't", and everybody goes "uh-huh, okay, sounds great", and nobody asks (or they're yelled at for doing so) questions like "well, what *else* might that vendor have done to mitigate *their* choice, that *we* had better consider doing?".

(Remember, for example, that it might often be the case that $100K worth of test codes might be publicly available that test against some *standard*, and therefore available to GNU products, while any *vendor* that chooses to *not* follow that standard can pony up $100K - $1M, or more, to obtain a *proprietary* set of test codes, equivalent to the public ones, but that test *their* choice of protocols, formats, or whatever.
If GNU copies the *vendor* choice, it locks itself out of that free $100K-worth of test codes, and probably won't have reasonable access to that $1M worth of *proprietary* codes, either.

If this theory is correct, and if it applies here, I wonder just how many bugs we would have flushed out of `-mno-ieee' as the default if we'd been able to run all the tests Digital designed and ran on its Alpha product line, from the beginning? I wouldn't be surprised if we *still* fail more in *their* test suite than *they* failed as of their first public, non-beta release. But we have no easy way of finding out.

Even if not specifically the case with -mno-ieee and Digital, this, as a hypothetical situation, illustrates the hazards we *must* account for every time we decide to go our own way.)

> JCB> The assertions so far amount to "but any code that doesn't work
> JCB> with g77 is inherently broken", or some Gatesian approximation
> JCB> thereof.
>
>I fail to see how, at least from me. g77 on Alphas has clearly had
>serious problems unrelated to IEEE FP behaviour.

Not from you, but you've tended to agree with Toon's statements concerning the irrelevance of IEEE 754 support in Fortran codes. Toon certainly made the sharpest statements in regard to this issue, and he was *not* alone in doing so back in December during the 80-bit-spill debacle.

>I don't get the rant about DEC's FP models.

What rant? I was *applauding* Digital's decision to do these things *right* -- make sure the industry-standard stuff works, and make sure their non-standard choices work.

(And their VAX FP models *predated* IEEE 754, I'm pretty sure -- I think VAX came out in early 1978 or so, and I would assume its F_floating, D_floating, and G_floating formats were present, or at least all specified, from the beginning, whereas I'm under the impression IEEE 754 came out years later.)

So I was hardly criticizing them for creating *those* formats.
My point was that Digital goes, or went, to a *lot* of trouble to make sure it supports, with a minimum of surprise to its end users, any decisions it makes that involve being outside the boundaries of standards, de facto or otherwise.

(Heck, when they designed the STRUCTURE facility, they actually tried hard to make sure it *wouldn't* look like whatever future facility standard Fortran might want to offer, so there wouldn't be collisions in the "space" of language design. Exactly the opposite sort of behavior, especially as regards ethics, towards the industry, *and* its customer base, over the long haul, exhibited by MS, and some others as well.)

Now, it's true that Digital didn't become what MS became, and that one could claim it was their very attention to detail and elegance that doomed them. I might agree.

I *don't* agree that GNU should copy the worst aspects of MS' impact on the industry -- the general unwillingness to appreciate and execute proper design and engineering, as well as the refusal to stick as closely as possible to standards and then deviate only *properly* from them, when necessary, in ways that at least don't hinder the long-term prospects of the industry and/or its customer base (*even* if they decide to move to a different vendor!).

>VAX FP is the major
>reason that most code doesn't require IEEE conformance, for historical
>reasons if nothing else. If DEC decided (for such reasons?) that that
>a non-IEEE default was appropriate, and they're so good at such
>things, I'm baffled why it was so wrong for kenner to follow the
>lead.

Because kenner, or really the *rest* of us, didn't bother to actually *follow* the lead.

We merely appropriated the stance that IEEE 754 conformance wasn't worthwhile as a default.

Further, we decided (by fiat) that properly *following through* on that decision, e.g.
by ensuring we produced a product designed *properly* despite being
non-standard out of the box, wasn't worthwhile either, a stance Digital
surely did *not* take.

As a result, we continue to play whack-a-mole with g77-, gcc-, and,
generally, GNU-wide choices to "march to a different drummer" and how
those choices affect our ability to attract a wide user base, maintain
products, and so on.

(I fully recognize that some of these issues make good arguments for
going *against* current/standard practice, so I'm not making a blanket
statement about the *appropriateness* of all those choices, merely
about their *effects*.)

        tq vm, (burley)

>>From craig@jcb-sc.com Mon Jul 12 00:24:00 1999
From: craig@jcb-sc.com
To: law@cygnus.com
Cc: craig@jcb-sc.com
Subject: Re: Bug with g77 and -mieee on Alpha Linux
Date: Mon, 12 Jul 1999 00:24:00 -0000
Message-id: <19990712072342.28771.qmail@deer>
References: <4048.931760012@upchuck.cygnus.com>
X-SW-Source: 1999-07/msg00430.html
Content-length: 2422

>The difference between these two cases is in the -f[no]emulate-complex we
>aren't even sure *what* the compiler should be doing, much less *how* to
>get it done.

Ah, okay, I didn't realize that the problem went that deep.

>There's a rather serious design issue that needs to be investigated before we
>can even begin to look at solutions. Worse yet, the investigation phase will
>probably require looking at a variety of other Fortran compilers to see how
>they handle passing of complex values -- we should not insert a gratuitous ABI
>incompatibility for passing complex values.

Agreed. Though, for g77's purposes, the ABI for complex is currently
*always* the ABI for struct { float; float; };. I'd be interested in
knowing about any ABI's for which that was not the case, because they'd
be systems on which g77 -fno-emulate-complex might not even *work*,
if implemented to follow the native ABI.
(That's because g77 would be telling the back end to pass/accept
__complex__ across calls, but the other end might be f2c-compiled, or
g77-emulating-complex, or other, code that uses the struct method.)

Put another way, g77 is presently architected (in terms of options it
provides, the way it works internally, etc.) to assume the ABI doesn't
make __complex__ different from the equivalent struct. We'd at least
need to understand and document such differences, e.g. explain that
users of pertinent systems should not combine g77 -fno-emulate-complex
with code from certain types of other sources. (-fno-f2c is related to
this, as another example.)

>Letting the release out with -fno-emulate-complex without resolving these
>issues
>makes it much more likely that we will need to break binary compatibility
>again for the Fortran compiler once we figure out how to properly pass complex
>values.

Note that I'm not aware there were ever any problems, in *this* area at
least, back when -fno-emulate-complex was the only choice, though that
was a long time ago, and I might have forgotten.

>I haven't made a final decision on the complex stuff, but I'll have to make one
>in the very near future (next 24hrs).

Whew, well, good luck! I *still* don't know what to recommend, other
than to say that, if we change the default again and don't respin/retest
for at least a month, we're risking some pretty serious regressions.
Wish I could put a number on that risk, as that might help you.

        tq vm, (burley)

>>From law@cygnus.com Mon Jul 12 00:59:00 1999
From: Jeffrey A Law
To: craig@jcb-sc.com
Cc: egcs-bugs@egcs.cygnus.com
Subject: Re: Bug with g77 and -mieee on Alpha Linux
Date: Mon, 12 Jul 1999 00:59:00 -0000
Message-id: <4241.931766333@upchuck.cygnus.com>
References: <19990712072342.28771.qmail@deer>
X-SW-Source: 1999-07/msg00431.html
Content-length: 2171

  In message <19990712072342.28771.qmail@deer>you write:
  > >The difference between these two cases is in the -f[no]emulate-complex we
  > Agreed.
  > Though, for g77's purposes, the ABI for complex is currently
  > *always* the ABI for struct { float; float; };.

OK. That's good to know. If this is indeed the ABI we need to use after
investigating vendor compilers, then we've probably got some work to do in
the backends which pass parameters in registers since I do not believe all
are prepared to handle complex modes in the various FUNCTION_ARG macros.

Having it as a struct does help with some issues since structs are usually
not passed in FP registers, even when their members are strictly floats,
strictly doubles or strictly long doubles.

Treating it as an aggregate has some performance impacts, so I wouldn't be
totally surprised if some vendor treated it as two independent float args
for parameter passing purposes. But that's something we'll have to
investigate.

  > I'd be interested in
  > knowing about any ABI's for which that was not the case, because they'd
  > be systems on which g77 -fno-emulate-complex might not even *work*,

Yup. I doubt we've ever done any serious work at looking at how vendor
compilers handle complex and interoperability between g77 and vendor
compiled fortran libraries (or even libf2c/libg2c compiled with the vendor
compiler).

On systems like HPs incompatibilities in this area could also be hidden by
the linker for double complex values -- in the event of a caller/callee
mismatch for register types, the linker will insert a trampoline to shuffle
arguments from one register file to the other. Ick.

  > Whew, well, good luck! I *still* don't know what to recommend, other
  > than to say that, if we change the default again and don't respin/retest
  > for at least a month, we're risking some pretty serious regressions.
  > Wish I could put a number on that risk, as that might help you.

I doubt we'd need a month to respin/retest. I bet we could get all the
testing we needed by lapack in a week. lapack looks to stress the complex
stuff a hell of a lot more than our regression testsuite.

jeff
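
A minimal C sketch of the struct-versus-__complex__ distinction discussed
above, under stated assumptions: `f2c_complex` and all function names here
are hypothetical, and C99 `float _Complex` stands in for GCC's
`__complex__ float`. It checks only that the two types share size and byte
layout in *memory*. Whether they are passed the same way across a call (FP
registers versus aggregate rules) is decided by the platform ABI and cannot
be observed portably from C, which is the open question in the thread.

```c
#include <assert.h>
#include <complex.h>
#include <string.h>

/* Hypothetical stand-in for the struct an f2c-style caller uses for COMPLEX. */
struct f2c_complex { float r, i; };

/* Returns 1 if the built-in complex type and the struct have the same size
   and the same bytes for the same value (real part first, then imaginary). */
int same_memory_layout(void)
{
    float _Complex z = 1.5f + 2.5f * I;
    struct f2c_complex s = { 1.5f, 2.5f };

    if (sizeof z != sizeof s)
        return 0;
    return memcmp(&z, &s, sizeof z) == 0;
}

/* Callee written against the struct convention. */
float sum_parts_struct(struct f2c_complex s)
{
    return s.r + s.i;
}

/* Callee written against the built-in complex convention.  The argument may
   travel in different registers than the struct version on some ABIs, even
   though the in-memory layout is identical. */
float sum_parts_complex(float _Complex z)
{
    float parts[2];
    memcpy(parts, &z, sizeof parts);   /* parts[0] = real, parts[1] = imag */
    return parts[0] + parts[1];
}

/* Aborts if the in-memory layouts differ on this platform. */
void check_layout(void)
{
    assert(same_memory_layout());
}
```

Both callees compute the same result when each is called with its own
argument type. The hazard described in the messages arises only when a
caller built for one convention is linked against a callee built for the
other, on an ABI that passes the two types differently.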