From mboxrd@z Thu Jan  1 00:00:00 1970
From: craig@jcb-sc.com
To: d.love@dl.ac.uk
Cc: craig@jcb-sc.com
Subject: Re: Bug with g77 and -mieee on Alpha Linux
Date: Sun, 11 Jul 1999 23:54:00 -0000
Message-id: <19990712065253.28570.qmail@deer>
References: <199907062042.WAA00509@keksy.linux.provi.de> <19990707140435.1429.qmail@deer> <19990707194012.A291@keksy.linux.provi.de> <3783B4B1.89DC2124@moene.indiv.nluug.nl> <19990708135500.12573.qmail@deer> <rzqoghls9j6.fsf@djlvig.dl.ac.uk> <19990710033831.20409.qmail@deer> <rzqu2rbqb8f.fsf@djlvig.dl.ac.uk>
X-SW-Source: 1999-07/msg00429.html
List-Id: <gcc-bugs.sourceware.org>

>>>>>> "JCB" ==   <craig@jcb-sc.com> writes:
>
> JCB> Hardly.  it's a *precise* description of what Toon constantly
> JCB> advocates: the idea that any program written to exploit the full
> JCB> range of IEEE 754 *values* (not even using *features* like
> JCB> signalling NaNs, trapping on inexact, etc.) is inherently
> JCB> *wrong*,
>
>I don't recall so, though perhaps that's what he thinks; it's
>obviously silly.

When it was asserted that even -mno-ieee didn't behave reasonably
in the presence of denormals, that fact *was* hand-waved, effectively
as not being evidence of our failure to provide a sufficiently
clean, consistent environment.

So, to me, it doesn't matter whether the *assertion* was wrong:
it's the *hand-waving* that's wrong, and I will no longer put up
with it (especially as done by management) on projects with which
I'm associated.

>Most of the time they are, and I say ``read `Working Programs'''.  The
>one in question is likely buggy, given that it's
>optimization-dependent, regardless of the problem printing the
>results.

I wonder what percentage of people who are told "read `Working Programs'",
pertaining to correct accommodation of FP behaviors across the present
bunch of compiler/CPU combinations, actually *modify* their programs
to meet the requirements; and what percentage of people find they can
simply abandon g77 in favor of a compiler (even on a problematic CPU,
like IA32) that is *sufficiently* predictable, FP-wise, that they need
make many fewer changes, if any?  Have you ever tried to find out,
e.g. by following up with people a month or two after the fact?  (I haven't
seen a quote like the now-years-old, paraphrased, "g77 always comes up
smelling of roses" in a *long* time, and I think this whole discussion
explains why -- because, against my wishes, we've collectively allowed
g77 to acquire a real stench about it, of unpredictability, instability,
and lack of proper attention to important issues, like ensuring the
viability of testing using standard test suites, of component testing,
and so on.)

The problem being that, while I don't like changing g77 to support
gratuitously buggy programs, I have little faith that these sorts of
bugs are *exclusively* gratuitous -- and, in the meantime, we lose out
on that latter group of users helping test g77 on codes it doesn't
otherwise see, at least not on the problematic CPUs.

> JCB>  -- my impression is, users are concluding *g77* is buggy, in
> JCB> droves.  
>
>They've always done so.  To minimize it, you'll want defaults taken
>from `Working Programs', and then they'll say it's just slow.

I wonder what percentage of users we'd *lose* as a result of it being
"slow", despite offering options (like -ffast-math, -mno-ieee,
-ftruncate-fp-spills, the latter being the only one we don't already
have) for "expert" users to use to speed things up?

I wonder what percentage of users we'd *gain*, because, even though
their *production* code *starts* out as slow, they are able to gain
confidence by running their own code that *tests* for reasonable
FP behavior, even if just to validate the compiler's own integrity,
before turning on options to gain speed at the expense of some of
that reasonable behavior?

I wonder if anyone ever paid attention to these issues before making
the pertinent decisions, and if anyone ever will, when it comes to
GNU/GCC?  (Well, I know from experience, we sometimes try, but the more
effect a decision has, the more objective analysis needs to be done
up front.  I see nearly *zero* evidence that such analysis is favored,
even *now*, with all the controversy that has ensued, with all the
users who've complained, some of whom have *not* been so easily
hand-waved as having "broken code", e.g. they're simply trying to test
IEEE compliance on systems that, using *other* compilers, actually
offer it!)

> JCB> I'm coming to the same conclusion myself, that g77 is fatally
> JCB> buggy, by design, because of its refusal to offer even basic,
> JCB> predictable behavior.
>
>I've certainly wanted a framework for controlling the FP behaviour
>across platforms as consistently as possible, as have other users, but
>the work I did on it was rejected and it was clear that working on the
>f2000 features would be a waste of time.

What do you mean by "was rejected"?  If you mean I didn't add the code
you'd wanted added to libg2c to set up exceptions, then please say so,
but I assure you, that wasn't a *rejection* in the sense I think others
might take it as.

That is, you're leaving the impression that you tried to make things
*better* by offering such patches for inclusion in g77/libg2c, and
that your *desire* to improve things was rejected.

That was hardly the case.  The problem was the method you chose, which,
while perhaps best for a non-GNU-like vendor (which could easily include
another OSS effort, so I don't mean "proprietary", though they often are
able to expend up-front resources to make sure things work), involved
changing the way the *system* behaved, by default.

Now, *libf2c* already changes that by setting up exception handlers, but
g77 already "swallows" whatever is "special" about libf2c, with a few
exceptions.

And, we've already gotten somewhat burnt by our (mostly my) feeling
we're able to maintain our own libg2c, a libf2c with *source* (not
just configury/build) modifications.  Modifications I, myself, can
at least fairly easily reason about because they're in straightforward C.

Now, add on top of those problems modifications to the signal/exception
environment, or that make use of pre-main()-invocation initialization,
which *I* don't really know anything about (except that I've gotten
the impression it isn't supported so uniformly as straight C code,
might have bugs, etc.), and perhaps you can understand why I didn't
just throw the code in -- especially back during the gcc 2.7 -> 2.8
fiasco, which included the g77 0.5.20 fiasco, both of which went for
a *long* time and made for *substantial* changes of attitude among
many of us (well, at least me) regarding how to properly maintain g77.

So, I've (slowly) learned to minimize the changes g77 makes to the
fundamental environments in which it operates:

  -  The underlying CPU, whose default FP behavior and *recommended use*
     (which, frankly, includes 80-bit spills, though I can't do anything
     about that in g77) should be honored by g77, since we can neither take
     a big performance hit via emulation nor deploy resources to do "fast
     FP" our own way.  Also, this honors the CPU choice of the user, to
     a bit of an extent (though g77 tries to cater to a wide audience
     here -- e.g. those effectively *forced* to do useful work on IA32,
     as well as those who buy a T3 or whatever, so that particular argument
     applies most weakly to the CPU, I would think).

  -  The underlying OS, whose choices for the signal/exception environment,
     initial FP state, and so on, should be honored, again, because we
     don't want to emulate our way into a perfectly consistent environment,
     nor do we have the resources to go our own way, and, again, because
     it respects the OS choice of the user (thus encouraging users to
     complain to OS vendors, rather than us, about defaults they don't like,
     or at least read the docs for *their* environment *themselves* and
     thus discover how to override the defaults, rather than relying on
     us to guess at all sorts of configury hacks to install working
     patches that reset the defaults for them -- and for *other*
     unsuspecting users of those same systems).

  -  Netlib libf2c, whose choices regarding changing the signal/exception
     environment, FP support, and so on, should be honored, again, ditto
     the reasons stated elsewhere (though we currently don't give g77 users
     a choice by supporting another run-time library, we will someday),
     plus we get a *more* robust product to the extent we "join in" with
     the testing people do on the f2c/libf2c combo.  (And we've indeed
     taken hits, one way or another, by going our own way via libg2c.)

  -  The gcc back end, which could be argued makes the *worst* choices
     for g77/FP work of the items under discussion, though it *is*
     improving.  (After all, we've been avoiding its complex divide
     for about 99% of g77's history, avoiding *all* of its complex
     arithmetic for most of g77's public life, etc.)  But, the more we
     can simply "fall in line" with gcc -- e.g. via Toon's patches to
     offer a better complex divide -- the better (though Toon's work is
     not yet a great example of this, as it involves simply switching
     among methods based on the front end, and the g77 method is "new"
     code, but at least that new method can be used by other front
     ends as-is, which was not the case for using c_div/z_div).
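
To illustrate why the choice of complex-divide method matters: the
textbook formula squares the divisor's components and so can overflow
even when the quotient is perfectly representable, whereas a scaled
formulation (Smith's method) avoids that.  The sketch below is in Python
purely for illustration; `naive_div` and `smith_div` are hypothetical
names, and nothing here is claimed to be the code in gcc, in Toon's
patch, or in c_div/z_div.

```python
import math

def naive_div(a, b, c, d):
    """(a + bi) / (c + di) via the textbook formula; c*c + d*d can overflow."""
    den = c * c + d * d
    return ((a * c + b * d) / den, (b * c - a * d) / den)

def smith_div(a, b, c, d):
    """(a + bi) / (c + di), scaling by the larger divisor component first."""
    if abs(c) >= abs(d):
        r = d / c
        den = c + d * r
        return ((a + b * r) / den, (b - a * r) / den)
    r = c / d
    den = c * r + d
    return ((a * r + b) / den, (b * r - a) / den)

# The quotient is exactly 1 + 0i, but the naive intermediates overflow.
big = 1e200
print(naive_div(big, big, big, big))   # both parts come out as NaN
print(smith_div(big, big, big, big))   # (1.0, 0.0)
```

The scaled version costs an extra divide and a branch, which is the
kind of speed-versus-robustness trade-off at issue in this thread.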

So while I rejected some proposed changes to g77 or libg2c that *might*
have made things somewhat better, I certainly have *never* rejected
the ideas that lay behind them, to the extent they were preferably
implemented in the gcc back end, in libf2c, in the underlying OS, or
in the CPU.

And, keep in mind, Toon has been, in effect, arguing *against* changes
like yours, because, as he points out, we're not going to get "perfect"
consistent behavior *anyway*, so why do anything that might slow down
code, which adding a signal handler (or changing an FP mode) might?
(Though I don't recall his having specifically objected to *that* -- he
probably wouldn't, since such code wouldn't be in a loop!)

For myself, not having had, prior to a year or so ago, a particularly
clear picture of the issues, I had to rely on advice as to how to
generally handle these issues, combined with my own need to know, to
at least some extent, what was going on.  I was probably, back when
you submitted the patches you're talking about, in the mode Toon still
is, which is "if it involves FP, you're on your own", loosely stated,
and doing everything I could to *avoid* having g77 adjust the underlying
components (libf2c, gcc, OS, and CPU), especially system-specific
changes (which can easily mushroom into bazillions of #ifdef's, as
most everyone who does GNU-like development realizes by now).

Also, I thought I made it perfectly clear that I was in constant-
apology mode regarding my lack of being up to snuff wrt Internet
access, ability to upload/download bugs, investigate stuff, etc.
Until early this year, that was a *huge* problem, which meant that
anything "peripheral" to my fundamental job as g77 front-end maintainer
was poorly treated by me.

Add to that the fact that, also until very recently (starting with the
EGCS project, but, more practically, ramping up through early this year),
nobody could really get anything into g77 without going through me,
in the sense that I had to be the one to distribute not only *releases*
of g77, but *alpha* versions.

So, if I added a patch that would, even in practice, improve g77's
behavior on a particular system, and went to the (significant!) trouble
of distributing that version of g77 for alpha-testing, and it blew
up *other* systems, that'd be a *huge* waste of various resources,
which were in such short supply at that time, I was even ignoring some
pretty mainline g77 front-end issues.  (Heck, until I got my PII and
except when using my Alpha, I avoided making significant changes to
f/com.c and f/expr.c, because Emacs C-mode couldn't keep up with my
thought/typing process -- seemed like its indentation handling, which
gets invoked when typing `}', had slowed down at some point or something,
making my poor little 486 sputter for a few seconds every few keystrokes.)

That, and other circumstances, conspired to make me *very* reluctant to
add any changes in which I didn't *personally* have a high degree of
confidence, that didn't come from a source that suggested a high degree
of confidence was due the changes (e.g. I accepted changes by dmg to
netlib libf2c for libg2c pretty much blindly), and that could easily
break lots of things, *especially* in subtle,
not-discovered-until-way-down-the-road ways.

Further, on many of the occasions where I *did* confidently forge ahead
with similarly risky changes to g77, whether made by me or incorporated
on someone else's say-so, we got severely burnt.  (Some of what made
g77 circa 0.5.21, I forget exactly which versions, buggy was because
I'd pretty much blindly folded in most, or maybe all, of the GNAT patches
to the gcc back end, on kenner's say-so.  For all I know, *they* might
have been all correct -- bugs might have been all due to the interactions
they had with g77's patches -- but the upshot was, if I hadn't integrated
them, we *might* have had a more stable g77 for awhile there.)

To sum up my reasons for not immediately integrating whatever such patches
you might submit as "rejection" does both me and my reasoning for not
taking them as-is a disservice.

>Complete consistency across
>even IEEE-ish targets is surely doomed, though.

I'm not quite ready to say *that*, but I certainly wouldn't encourage
GCC to attempt it, especially not as a default.  Certainly the current
crop of g77/gcc users is not interested in it.

What I would *love* to see is a full-court-press by the number-crunching
industry to effectively *mandate* strict IEEE 754 conformance (to the
"range" of the standard appropriate, e.g. not bothering with trapping/
exceptions or extra rounding modes in FORTRAN-77-based languages, but
still getting all the precision and consistency exactly right, for the
*default* types, with no excess precision, thus no spill issues, etc.).

Not that I think that highly of IEEE 754 per se (how can I, not being
a numerical analyst?), but if the industry did just make every Fortran
compiler do whatever it took to conform to that standard by default,
the benefits would be enormous, in better, more widespread, testing
at component/unit levels, in reducing complexity for programmers,
allowing them to focus on *their* problems rather than peripheral issues
(like "which floating-point behavior do I get *today*?")...

...and, most importantly, it would probably result in whatever Intel
shipped as the next IA32 "upgrade" finally delivering IEEE-754-
conforming *performance*, instead of *punishment*.

> JCB> So other systems crash on overflow (instead of generating Inf)?
>
>Of course they do, though they're decreasing in importance in my game,
>if you mean the basic hardware.  However, I'd almost always want to
>set up the FP system on an IEEE box to do that anyway to find the
>bugs.  I distributed the crystallographic suite that way for all the
>systems I knew how.  I got grief from users who seemed to want bogus
>output, but I fixed plenty of bugs.

Excellent!!  The better a job we can do communicating these potential
means for finding problems to users, the better.  Yes, a completely
uniform g77 environment across platforms that, e.g., defaulted to
crashing on overflow rather than returning Inf might be ideal, but,
failing that, I think we should a) go with whatever the (four-part)
environment, described above, decides and b) document as best we can
how to override that environment.

(Though, I have had some people tell me that a default of crashing
instead of returning Inf is wrong.  I am not prepared to make that
sort of decision myself.)

>Please spare me the diatribes when I report how things actually behave.

What in the world prompted you to say that??

>$ uname -a
>OSF1 pxsv6.dl.ac.uk V4.0 1091 alpha
>$ cat >z.f
>        a=exp(-100.)
>        print *, a
>        end
>$ f77 -O0 -ieee_with_no_inexact z.f  && ./a.out
>  3.7835059E-44
>$ f77 -O0 z.f && ./a.out
>  0.0000000E+00
>$ g77 z.f && ./a.out
>  0.

Well, *that's* good news!
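
For anyone wanting to check the two f77 results quoted above without an
Alpha at hand: exp(-100.) lies below the smallest normal
single-precision value, so an IEEE 754 system that keeps denormals
should produce the 27th denormal step, while a mode that flushes tiny
results to zero should print zero.  The following is a minimal sketch in
Python, not anything g77 or Digital's compiler runs; `to_single` and
`flush_to_zero` are names invented for this illustration, and
`flush_to_zero` is only a crude stand-in for the non-IEEE default.

```python
import math
import struct

FLT_MIN_NORMAL = 2.0 ** -126    # smallest normal single-precision value
FLT_DENORMAL_STEP = 2.0 ** -149 # spacing between single-precision denormals

def to_single(x):
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def flush_to_zero(x):
    """Crude model of a non-IEEE mode: results below the normal range become 0."""
    return 0.0 if abs(x) < FLT_MIN_NORMAL else x

a = to_single(math.exp(-100.0))
print(a)                      # about 3.78e-44, matching the -ieee_with_no_inexact run
print(a / FLT_DENORMAL_STEP)  # 27.0
print(flush_to_zero(a))       # 0.0, matching the default-mode runs
```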

How does this behave?

      CHARACTER*50 A
      EQUIVALENCE (I, R)
      A = '1E-40'
      READ (UNIT=A, FMT=*) R
      PRINT '(Z8)', I
      END

(It should print "116C2", modulo endianness, if my IA32 system is any guide.)

If it crashes, how does it behave if "1E-40" is replaced by "1E-400",
and by "1E-4"?
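
The expected "116C2" can be checked independently of libI77.  The sketch
below, in Python and assuming an IEEE 754 host that keeps denormals,
converts each of the three decimal strings to single precision and shows
the resulting 32-bit pattern in hex; `single_bits` is a hypothetical
helper name.  It says nothing about whether the g77-compiled program
traps on an Alpha, which is the actual question being asked.

```python
import struct

def single_bits(text):
    """Return the 32-bit pattern of the IEEE 754 single nearest to `text`."""
    value = float(text)                  # decimal string -> double
    packed = struct.pack("<f", value)    # double -> single, round to nearest
    return struct.unpack("<I", packed)[0]

for s in ("1E-40", "1E-400", "1E-4"):
    print(s, format(single_bits(s), "X"))
# 1E-40 prints 116C2 (a denormal); 1E-400 underflows all the way to 0.
```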

> JCB> the *actual* behavior appears to be that, 
>
>I refer to the behaviour I actually observe.  I can now check how an
>Alpha works rather than just being told I don't understand it without
>useful explanation.

Who told you you didn't understand it?  Could you provide a
reference?  Keep in mind *I* have pretty consistently pointed out
that I don't have my Alpha up and running yet (as my web site makes
pretty clear).

>The issue with libI77 is trivially fixed.

By compiling it with -mieee?  But we won't be doing that for gcc 2.95,
so if doing so exposes bugs not caught by *our* limited testing, but
that might be caught by the wider base of users who limit their testing
to releases (or prereleases) of g77, that's going to be painful.

Alternatively, we could make a modification to libg2c's pertinent
routines.  Not the approach I want to take anymore.

Or, we could convince dmg to change netlib libf2c, which, if we can
agree that the change makes sense within the context of a consistent
numerical environment, is the option I'd prefer.  (More and wider
testing, etc.)

> >> Note that gcc's default is the same as Digital's.
>
> JCB> If only that were the case, we'd have few of the problems we
> JCB> have now.
>
>I'm using the Digital compiler and I checked before saying so.  They
>both default to the non-IEEE mode, as documented.

Did you not bother reading the rest of what I wrote?  I *know* they
chose what we call -mno-ieee, as a *compiler* option, as the default, but
they did *not* choose failing to properly *research*, *design*, and
*implement* it as a default, did they?  I've explained that *many*
times now.  Why are you insisting on twisting my words around, just to
make a point?  Do you really have so little respect for my efforts here?

> JCB>   Digital Fortran offers a fully-working environment based on what
> JCB>   we call -mno-ieee *and* offers a fully-working environment based on
> JCB>   what we call -mieee as well.  It happens to offer the first as a
> JCB>   default.
>
> JCB>   g77 offers neither choice.
>
>I don't understand what `fully-working' means -- bug free?  (g77
>clearly doesn't provide a fully-working environment on any system;
>particularly because debugging doesn't work properly.)

No, I've *explained* what fully-working means before -- and probably
even in that email.  Didn't you bother reading it?

>The DEC default is to crash on overflow, for instance, which it sounds
>as though you think is wrong.  I get a segv from attempted i/o of a
>subnormal in default mode compiling with it.  I've verified that gcc
>and g77 pass paranoia perfectly with gcc -mieee and appropriately
>multilibbed libg2c.

No, I don't think crashing on overflow is wrong *per se*.  I think
g77-compiled code crashing on overflow, denormal, underflow, or
even divide by zero is wrong if the underlying system (e.g. the native
compiler who we're perhaps trying to emulate to some degree, or the
way the OS normally behaves under, say, f2c/libf2c) does *not* do so.

The idea I keep trying to get across is that, to the extent we don't
tightly adhere to *standards*, we lose opportunities for testing, code
re-use, user understanding, etc.  (Some of those "standards" are really
ad-hoc, "this is how this machine behaves" sort of things, and that
makes consistency really difficult to achieve in some cases, I realize.
Each issue must be carefully thought through.)

(I had, up until several months ago, been hoping/assuming people would
just magically do the right testing during development and early
prerelease.  It's now clear that's not the case, so I support, with
more understanding now rather than just faith or appreciation for
attempts to DTRT, the "longish" release schedules adopted by the EGCS
project.  I now think they may well not be long enough, or allow for
enough show-stopper bugs to be discovered, tracked down, fixed, and
respun.)

The next choice is to adhere to what a particular *system* normally
does, and then implement that properly.

My impression was, based on submissions *to this list*, that we did *not*
do that for g77 vis-a-vis Digital's compilers.

Now, you can claim that, in fact, those submissions were in error, that
we do, in fact, offer (fundamentally, aside from typical sorts of bugs)
the same environment Digital does (at least for -mno-ieee; clearly that's
not the case for -mieee, so we cannot really "capture" users who want
to *start* with -mieee and then selectively "migrate" codes to -mno-ieee).

But, my problem with this whole debate has been how people like Toon and
yourself have responded to these mere *allegations* of incorrect
behavior:

  "It's not incorrect, fix your code."

  "Any code that generates a denormal or computes an underflow is wrong,
  so it's better to crash."  (Even if *Digital* offerings *don't* crash
  in the same circumstances!!)

And we've seen the same attitude manifested before as regards 80-bit
spills.

Further, this attitude is now being copped by people upon whom I previously
relied to support my efforts to make g77 a great product, and who were
not even the people who made these short-sighted decisions in the first
place (meaning there are even *more* people who have these attitudes,
and have, or at least had, the power to impose them on others).

So I'll no longer be working on new g77 stuff, since I have concluded
that I won't have the support I need to do it right, nor to withstand
assaults on my attempts to make it *work* first, *then* make it fast,
as defined by *me* using *my* 25 years of experience in the field.

>Given the value of `offers neither choice', perhaps someone can sort
>it out.  I'd fix and test the multilibbing if I thought it stood a
>better chance of being accepted than similar stuff I've wasted time
>on.

Please explain what you mean by "similar stuff", and give examples,
as the only things I can guess at what you *might* be talking about
aren't *remotely* similar in ways that pertain to this discussion.

Particularly disturbing is the fact that you're the *only* other person
to whom I can go for help maintaining libg2c; that you and I took
*lots* of heat (e.g. from HJ Lu) trying to make libg2c work with
multilibs (work mostly done by Robert Lipe, IIRC); and, here you are,
implying that I'd actively resist attempts you make trying to *use*
that multilib facility to extract the exact same benefits I've clearly
said I *assumed* had been present all along, and would have fixed
*myself* if I'd discovered, on my own, had been omitted (by virtue,
for example, of having a running Alpha).

All because I didn't accept patches from you to change how libg2c
sets up the exception environment, or some such thing?

If you didn't before, now you (and others) can understand why
I no longer see my working on g77 as particularly productive.  You'd
rather argue with me than let me work, especially if the alternative
is that I might make g77 more robust for more users, but perhaps a little
slower for a few.

Whether it's entirely my fault, I've now lost the support of the very
people who've been most important to g77's success in the past (other
than myself): Dave Love, who made libU77 and libg2c happen, but who
now appears to claim I actively resist his attempts to improve g77;
Toon Moene, almost *the* biggest supporter of my g77 work over the years,
who takes up a huge, long argument with me (while I *could*
have been working on the rewrite) because he didn't agree with me that
the hassle users have to deal with to use -mieee was just too much, and
who has previously (among other things) dismissed my proposal to
do 80-bit spills as a default nearly out-of-hand; and Richard Henderson,
who's also dismissed that same proposal nearly out of hand, yet without
whom g77's current performance would be *abysmal* on some machines (like
Alphas).

I mean, yes, it's been wonderful to get the statements of support I
*have* gotten on these issues, but they've all come from people who
I don't recognize as active participants in improving g77 (or the
gcc back end).  There's nobody who has consistently spoken up to
support my view that we should choose defaults that tend to lead to
an overall more robust environment *and* to whom I can look to be
taken seriously by others who are doing the actual *work* on these
projects.  (Certainly their statements of support, in these public
discussions, have apparently accomplished nothing vis-a-vis the
opinions expressed by those who *do* directly influence g77's
evolution.)

I can't carry the water for doing things right anymore vis-a-vis
g77.  There's no way the architecture I intended to use for my rewrite
will be tolerated in this environment, and it's highly unlikely
g77 will offer reasonable, consistent-with-vendor-practice stable
numerics in the timeframe that *I* require to make that rewrite
(and the features I expected to add via it) worthwhile anyway.

I'm not going to fight anymore.  And, despite Toon's attempts to get
me to do so, I'm not just going to be a code-boy who obediently
(and *voluntarily*) "improves" g77 exactly as directed by others,
when I *know* those directions are wrong-headed and short-sighted,
as well as that the results don't speak well for my *overall*
abilities (which include product *architecting* and *design*, not
just implementation and debugging, which appear to be the only areas,
if any, for which I'm respected by you).

Best of all, by stepping aside, I make it *much* easier for you, Toon,
and others to designate a replacement g77 maintainer, someone for whose
opinions and experience you'll have much more respect *or* who will
happily make whatever changes you suggest, regardless of how little
actual R&D has been put into them.  (I sincerely hope the former occurs,
but don't hold out much hope for it.  I'll certainly have plenty of
email-archive URLs ready to offer to anyone who asks me "what sorts
of things happen on this project when somebody stands up and says,
here's the *right* way to solve this problem, when the right way isn't
the convenient, cheap, or fast way?")

> JCB> Remember, my original statement was to the effect that -mno-ieee
> JCB> as a default was a poor choice because we didn't bother to
> JCB> properly implement it, whereas, at least with -mieee, we'd have
> JCB> had plenty of *existing* code (coming from other,
> JCB> IEEE-754-conforming, environments), such as library routines,
> JCB> test suites, and so on, to just "plug in" for ordinary use.
>
>I don't understand what that's about.  How is it not properly
>implemented (modulo general bugs with 64-bit targets and other bugs
>not directly related to the fP model)?  Why the confidence that -mieee
>is properly implemented in contrast?  (libf2c was developed on VAXen,
>presumably along with Berkeley libm stuff originally.)

We didn't make sure we implemented -mno-ieee with *nearly* as much
attention to detail as *Digital* did when *it* chose to go out-of-step
with the industry (IEEE 754).  (A choice I gather it now regrets, or at
least no longer sticks with, if my impressions about the 21264 are
correct.  I'd love to find out, for sure, what the top Digital/Compaq
technical gurus now think about making -mno-ieee the default, assuming
that they now, on 21264, effectively offer -mieee as the default.)

And, by not choosing -mieee, we've clearly chosen to not ensure that
we offer a consistent IEEE 754 environment on Alphas, a choice Digital
*did* make, because -mieee doesn't even work under g77.  So all the
testing that *could* have gone on, using codes that assumed IEEE 754
(even if just codes whose *only purpose* was to test IEEE 754 compliance,
i.e. *not* codes subject to Toon's numerical-analysis-expertise-based
rejection), has *never* gone on for g77/Alpha, or gcc/Alpha, or g++/Alpha.

To put it coarsely: the decision GNU made regarding -mieee was made
mainly to avoid the hard work of getting FP right in gcc (and, by
extension, unfortunately, g77).

That hard work might have included doing difficult optimizations to
make -mieee perform well (as it could, I gather, perform *much* better
than it does today, since there are "trap shadows" we could account
for and thereby avoid some uses of TRAPB).

And/or, it might have included making sure all the libraries supported
it properly.

And/or, it might have included making sure most of the code out there that
helps people test IEEE 754 conformance was run by volunteers using -mieee.

And/or, it might have included thoroughly documenting these issues up
front, so that not only end users, but "middle" people, like myself,
would have been made aware of them up front -- to make better decisions
regarding testing, for example.

We didn't do any of that, AFAICT.  I assert that Digital did *all* of it,
up front.

Taking that into account, saying that "we made the same choice as Digital"
would probably be litigable.  It certainly hides the truth.  And it hides
it in a way that is consistent with likely *future* blunders like this,
i.e. where someone just says "well, it's okay for us to not do this right,
because some other vendor doesn't", and everybody goes "uh-huh, okay,
sounds great", and nobody asks (or they're yelled at for doing so)
questions like "well, what *else* might that vendor have done to mitigate
*their* choice, that *we* had better consider doing?".

(Remember, for example, that it might often be the case that $100K worth
of test codes might be publicly available that test against some *standard*,
and therefore available to GNU products, while any *vendor* that chooses
to *not* follow that standard can pony up $100K - $1M, or more, to obtain
a *proprietary* set of test codes, equivalent to the public ones, but
that test *their* choice of protocols, formats, or whatever.  If GNU
copies the *vendor* choice, it locks itself out of that free $100K-worth
of test codes, and probably won't have reasonable access to that $1M
worth of *proprietary* codes, either.  If this theory is correct, and
if it applies here, I wonder just how many bugs we would have flushed
out of `-mno-ieee' as the default if we'd been able to run all the
tests Digital designed and ran on its Alpha product line, from the
beginning?  I wouldn't be surprised if we *still* fail more in *their*
test suite than *they* failed as of their first public, non-beta release.
But we have no easy way of finding out.  Even if not specifically the
case with -mno-ieee and Digital, this, as a hypothetical situation,
illustrates the hazards we *must* account for every time we decide to
go our own way.)

> JCB> The assertions so far amount to "but any code that doesn't work
> JCB> with g77 is inherently broken", or some Gatesian approximation
> JCB> thereof.
>
>I fail to see how, at least from me.  g77 on Alphas has clearly had
>serious problems unrelated to IEEE FP behaviour.

Not from you, but you've tended to agree with Toon's statements concerning
the irrelevance of IEEE 754 support in Fortran codes.  Toon certainly
made the sharpest statements in regard to this issue, and he was *not*
alone in doing so back in December during the 80-bit-spill debacle.

>I don't get the rant about DEC's FP models.

What rant?  I was *applauding* Digital's decision to do these things
*right* -- make sure the industry-standard stuff works, and make sure
their non-standard choices work.  (And their VAX FP models *predated*
IEEE 754, I'm pretty sure -- I think VAX came out in early 1978 or so,
and I would assume its F_floating, D_floating, and G_floating formats
were present, or at least all specified, from the beginning, whereas
I'm under the impression IEEE 754 came out years later.)

So I was hardly criticizing them for creating *those* formats.  My point
was that Digital goes, or went, to a *lot* of trouble to make sure it
supports, with a minimum of surprise to its end users, any decisions it
makes that involve being outside the boundaries of standards, de facto
or otherwise.  (Heck, when they designed the STRUCTURE facility, they
actually tried hard to make sure it *wouldn't* look like whatever future
facility standard Fortran might want to offer, so there wouldn't be
collisions in the "space" of language design.  Exactly the opposite of the
sort of behavior, especially as regards ethics, towards the industry, *and*
its customer base, over the long haul, exhibited by MS, and some others
as well.)

Now, it's true that Digital didn't become what MS became, and that
one could claim it was their very attention to detail and elegance
that doomed them.  I might agree.

I *don't* agree that GNU should copy the worst aspects of MS' impact on
the industry -- the general unwillingness to appreciate and execute proper
design and engineering, as well as the refusal to stick as closely as
possible to standards and then deviate only *properly* from them, when
necessary, in ways that at least don't hinder the long-term prospects of
the industry and/or its customer base (*even* if they decide to move to a
different vendor!).

>VAX FP is the major
>reason that most code doesn't require IEEE conformance, for historical
>reasons if nothing else.  If DEC decided (for such reasons?) that
>a non-IEEE default was appropriate, and they're so good at such
>things, I'm baffled why it was so wrong for kenner to follow the
>lead.

Because kenner, or really the *rest* of us, didn't bother to actually *follow*
the lead.  We merely appropriated the stance that IEEE 754 conformance
wasn't worthwhile as a default.  Further, we decided (by fiat) that properly
*following through* on that decision, e.g. by ensuring we produced a product
designed *properly* despite being non-standard out of the box, wasn't
worthwhile either, a stance Digital surely did *not* take.

As a result, we continue to play whack-a-mole with g77-, gcc-, and,
generally, GNU-wide choices to "march to a different drummer" and
how those choices affect our ability to attract a wide user base,
maintain products, and so on.

(I fully recognize that some of these issues make good arguments for
going *against* current/standard practice, so I'm not making a blanket
statement about the *appropriateness* of all those choices, merely about
their *effects*.)

        tq vm, (burley)
>>From craig@jcb-sc.com Mon Jul 12 00:24:00 1999
From: craig@jcb-sc.com
To: law@cygnus.com
Cc: craig@jcb-sc.com
Subject: Re: Bug with g77 and -mieee on Alpha Linux
Date: Mon, 12 Jul 1999 00:24:00 -0000
Message-id: <19990712072342.28771.qmail@deer>
References: <4048.931760012@upchuck.cygnus.com>
X-SW-Source: 1999-07/msg00430.html
Content-length: 2422

>The difference between these two cases is that, in the -f[no]emulate-complex
>case, we aren't even sure *what* the compiler should be doing, much less
>*how* to get it done.

Ah, okay, I didn't realize that the problem went that deep.

>There's a rather serious design issue that needs to be investigated before we
>can even begin to look at solutions.  Worse yet, the investigation phase will
>probably require looking at a variety of other Fortran compilers to see how
>they handle passing of complex values -- we should not insert a gratuitous ABI
>incompatibility for passing complex values.

Agreed.  Though, for g77's purposes, the ABI for complex is currently
*always* the ABI for struct { float; float; };.  I'd be interested in
knowing about any ABI's for which that was not the case, because they'd
be systems on which g77 -fno-emulate-complex might not even *work*,
if implemented to follow the native ABI.  (That's because g77 would be
telling the back end to pass/accept __complex__ across calls, but the
other end might be f2c-compiled, or g77-emulating-complex, or other,
code that uses the struct method.)

Put another way, g77 is presently architected (in terms of options it
provides, the way it works internally, etc.) to assume the ABI doesn't
make __complex__ different from the equivalent struct.

We'd at least need to understand and document such differences, e.g.
explain that users of pertinent systems should not combine g77
-fno-emulate-complex with code from certain types of other sources.
(-fno-f2c is related to this, as another example.)

>Letting the release out with -fno-emulate-complex without resolving these
>issues makes it much more likely that we will need to break binary
>compatibility again for the Fortran compiler once we figure out how to
>properly pass complex values.

Note that I'm not aware there were ever any problems, in *this* area at
least, back when -fno-emulate-complex was the only choice, though that
was a long time ago, and I might have forgotten.

>I haven't made a final decision on the complex stuff, but I'll have to make one
>in the very near future (next 24hrs).  

Whew, well, good luck!  I *still* don't know what to recommend, other
than to say that, if we change the default again and don't respin/retest
for at least a month, we're risking some pretty serious regressions.
Wish I could put a number on that risk, as that might help you.

        tq vm, (burley)
>>From law@cygnus.com Mon Jul 12 00:59:00 1999
From: Jeffrey A Law <law@cygnus.com>
To: craig@jcb-sc.com
Cc: egcs-bugs@egcs.cygnus.com
Subject: Re: Bug with g77 and -mieee on Alpha Linux 
Date: Mon, 12 Jul 1999 00:59:00 -0000
Message-id: <4241.931766333@upchuck.cygnus.com>
References: <19990712072342.28771.qmail@deer>
X-SW-Source: 1999-07/msg00431.html
Content-length: 2171

  In message < 19990712072342.28771.qmail@deer >you write:
  > >The difference between these two cases is in the -f[no]emulate-complex we
  > Agreed.  Though, for g77's purposes, the ABI for complex is currently
  > *always* the ABI for struct { float; float; };.
OK.  That's good to know.   If this is indeed the ABI we need to use after
investigating vendor compilers, then we've probably got some work to do in
the backends which pass parameters in registers since I do not believe all 
are prepared to handle complex modes in the various FUNCTION_ARG macros.

Having it as a struct does help with some issues since structs are usually
not passed in FP registers, even when their members are strictly floats,
strictly doubles or strictly long doubles.

Treating it as an aggregate has some performance impacts, so I wouldn't be
totally surprised if some vendor treated it as two independent float args for
parameter passing purposes.  But that's something we'll have to investigate.

  > I'd be interested in
  > knowing about any ABI's for which that was not the case, because they'd
  > be systems on which g77 -fno-emulate-complex might not even *work*,
Yup.  I doubt we've ever done any serious work looking at how vendor
compilers handle complex, or at interoperability between g77 and
vendor-compiled Fortran libraries (or even libf2c/libg2c compiled with the
vendor compiler).

On systems like HP's, incompatibilities in this area could also be hidden by
the linker for double complex values -- in the event of a caller/callee
mismatch for register types, the linker will insert a trampoline to shuffle
arguments from one register file to the other.  Ick.


  > Whew, well, good luck!  I *still* don't know what to recommend, other
  > than to say that, if we change the default again and don't respin/retest
  > for at least a month, we're risking some pretty serious regressions.
  > Wish I could put a number on that risk, as that might help you.
I doubt we'd need a month to respin/retest.  I bet we could get all the testing
we needed from lapack in a week.  lapack looks to stress the complex stuff a
hell of a lot more than our regression testsuite.

jeff