From mboxrd@z Thu Jan 1 00:00:00 1970
From: craig@jcb-sc.com
To: jvickers@dial.pipex.com, jbuck@synopsys.COM, law@cygnus.com, mark@codesourcery.com, gcc@gcc.gnu.org, rms@gnu.org
Cc: craig@jcb-sc.com
Subject: Re: type based aliasing again
Date: Thu, 30 Sep 1999 18:02:00 -0000
Message-ID: <19990910151259.9117.qmail@deer>
X-SW-Source: 1999-09n/msg00414.html
Message-ID: <19990930180200.XXup0YfMnzG0qbg6lU9NXHzha_G3zYwvFwYfn51ffDE@z>

RMS wrote:
> In C, we cannot divide all user code into "right" and "wrong" in this
> kind of simple way, and certainly not based on the ISO standard. That
> standard is just the decisions of a certain committee (which I was a
> member of) about what cases conforming compilers should commit to
> support. We must not let ourselves start thinking that C code is
> "wrong", just because it is not conforming ISO C code.

Saying the standard is "just the decisions of a certain committee" is
like saying C is "just a language made up by bored hackers".

If you're prepared to offer substantial replacements for the care,
consideration, expertise, thoughtfulness, expectation of public
participation in the process, and so on, that the C standards committee
(presumably) provides, fine.

But, I've watched (reasonably carefully) how well this attitude of
"ignore the standards bodies except when we find them convenient" has
worked since I first heard about GNU using it so many years ago (at
which point I found it *very* persuasive). To put it simply, it's a
failure when it comes to the C language, as far as I can tell.
The very people who seem most enthusiastic about their ability and
willingness to substitute their designs for those of a standards body,
while insisting on using the same name that body goes to substantial
effort to preserve as a going concern, are the ones who strike me as
least willing to spend substantial time and effort designing their
variant of the language, specifying it, patiently calling for public
comment and *listening* to it, ensuring it's widely understood in terms
of its differences vis-a-vis its "standard" namesake, and documenting
it.

(If you think documenting and specifying are the same, you're wrong --
the C standard is a specification; all those books describing how to
write C code are documentation. If you replace the standards body, you
not only must write the specification yourself, you must write the
first documentation yourself, and, further, give authors of
documentation some incentive to write documents for your language. I
don't think "C minus the silly aliasing restriction" is enough
incentive, since I don't believe anybody's published serious
end-programmer documentation on the GNU C language, despite all its
extensions.)

So, yes, we support many extensions in GCC, and that's nice, when
looked at in isolation.

But when I think of all the bug reports I've seen, all the confusion
over how these features are supposed to work, the lack of clear
documentation *or* specifications regarding these extensions and their
interactions with each other, and the lack of adequate testing of many
of them, I think that the attitude that we can do things our own way
has led to an overall failure to provide a C compiler that users can
rely upon *now* and in the *future* to correctly compile the (GNU C,
not just ISO C) code they write.

Particularly insidious is this practice of redefining constructs that
aren't blatantly non-standard in a small chunk of code.
Which is what RMS (and Linus) appears to want, without users having to
specify -fno-strict-aliasing -- an option that at least gives users
*some* heads-up about the code being in some special dialect, not C.

It's one thing for an expert in ISO C to recognize that
"... __complex__ float x; ..." is obviously not ISO C code. The
"__complex__", or similarly undefined, keyword gives such things away.

It's entirely another thing for such an expert to see some code making
references via pointers, see *nothing* wrong with the code vis-a-vis
ISO C, and thus reasonably assume that, since it works on a wide
variety of architectures, that code is playing *no* aliasing
games...and then be proven wrong, because what he *didn't* know is that
the code requires an RMS C compiler to work, because it *is* playing
aliasing games.

By which time, he's devoted many hours, perhaps days, to making changes
to that code (just a portion of a program, so the *other* portions had
to work together with it to violate the aliasing rules), changes that
are then recognized as a huge waste of time.

If you, the reader, don't grasp what I'm saying there, that's okay, all
it means is you don't understand some basic language-design principles.
It doesn't mean you're not a great programmer, it just means you
shouldn't be doing language design (e.g. proposing changes to a
language or dialect, especially not pushing for them from a position of
power).

But I'll illustrate (in Fortran, since it's actually clearer) this
concept with an example.

Suppose Barney Super Fortran Programmer is looking at code like this:

      SUBROUTINE X (A, B)
      ...
      A = B * 2
      ...
      B = B + 9
      ...
      END

No matter what the "call tree" from X down looks like, he *knows* that,
assuming the code is considered "correct" according to the Fortran
standards (and there are no command-line options used to compile it to
say otherwise), e.g. it runs on a wide variety of platforms, A *cannot*
alias B.
Given that knowledge, he need *not* examine *any* callers of X to see
whether A actually *might* alias B.

That means he can, *legitimately*, change that code to read:

      SUBROUTINE X (A, B)
      ...
      TMPB = B
      B = B + 9
      A = TMPB * 2
      ...
      ...
      END

(All he has to do is ensure that the "..." following the assignment to
A doesn't explicitly reference B...the usual sorts of things that
programmers check for.)

Never mind why he might want to make that change. Maybe it's for
performance reasons. The point is, he *can* make that change because he
*knows* A and B don't alias each other. Period. End of story.

But wait! Turns out this code is always compiled by RMS Fortran. RMS
decided this aliasing prohibition, put in by the Fortran standards
committee back in the '70s and preserved ever since, was just silly, so
he eliminated it from his dialect. (Maybe he only *partially*
eliminated it, by saying "let's do a cost/performance analysis each
time we consider breaking code like the above in a given instance", and
the net result was the same -- most RMS Fortran users thinking there
were no alias restrictions in the language, because they always got
away with that, except perhaps in "extremely obvious" instances such as
small self-contained DO loops.)

But he called his dialect "Fortran" anyway, called his compiler "RMS
Fortran" (as opposed to his operating system, which is called "...*not*
Unix"), and thus set the stage for Barney, our super-programmer,
wasting his time by adding bugs to working code.

How? Because it turns out X *does* get called with A and B aliased,
the caller *relies* on the changing of A happening before B is
modified, and the caller is thus *highly* sensitive to the particular
ordering of the modifications.
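
For readers more at home in C, the same hazard can be sketched roughly
as below. This is only an illustration and not part of the Fortran
example: the names x_original, x_reordered, aliased_original,
aliased_reordered and distinct_agree are hypothetical. Note one
difference from Fortran: in C, two float pointers are permitted to
refer to the same object, so the aliased calls here are well-defined
and simply show that the two statement orders disagree only when the
arguments alias.

```c
/* Hypothetical C analog of SUBROUTINE X, original statement order. */
void x_original(float *a, float *b)
{
    *a = *b * 2.0f;   /* A = B * 2 */
    *b = *b + 9.0f;   /* B = B + 9 */
}

/* The reordered version: equivalent to x_original only if a and b
   never point at the same object. */
void x_reordered(float *a, float *b)
{
    float tmpb = *b;  /* TMPB = B */
    *b = *b + 9.0f;   /* B = B + 9 */
    *a = tmpb * 2.0f; /* A = TMPB * 2 */
}

/* Call each version with both arguments naming the same variable. */
float aliased_original(float v)  { x_original(&v, &v);  return v; }
float aliased_reordered(float v) { x_reordered(&v, &v); return v; }

/* Returns 1 when the two versions agree for distinct arguments. */
int distinct_agree(float b0)
{
    float a1 = 0.0f, b1 = b0, a2 = 0.0f, b2 = b0;
    x_original(&a1, &b1);
    x_reordered(&a2, &b2);
    return a1 == a2 && b1 == b2;
}
```

Starting from the value 3, aliased_original yields 15 while
aliased_reordered yields 6; with distinct arguments the two versions
agree. A caller that passes the same variable twice is therefore
sensitive to exactly the ordering change described above.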
In other words, while X looks like it is written in standard Fortran,
and while every single one of X's callers, *in isolation* (without
looking at X itself), might look like *they* are written in standard
Fortran, the reality is, the whole *program* is, in fact, *not*
standard Fortran. It is RMS Fortran.

The original programmers of that code got away with it because the
compiler happened to avoid ever reordering things like that, except in
instances "everybody" -- meaning only those RMS Fortran users truly
paying attention to these issues -- would agree such reordering was
desirable.

And the result is buggy code. All of the usual responses -- "well, if
Barney was so good, why didn't he read the docs", for example -- would
have applied just as well if RMS hadn't changed the dialect in this way
and left it up to *Fortran* programmers to read the industry-standard
*Fortran* specification and the widely published books on *Fortran*
programming.

Further, the *overall* quality of the RMS Fortran product plus RMS
Fortran code is *lower*, because it is less of a good language. That
is, it plays tricks -- appearing to mean one thing, when it means
another, leading people who misunderstand it to break code written in
it.

So, to cater to the mediocre programmers who won't read *any* docs, not
even the widely published ones on the general language they're using,
we *do*, in fact, punish the *great* programmers who *do* read the
*right* docs, but to whom we've given no adequate notice, in instances
such as the above, that there are some *other* docs they need to go
read and study.

(Yes, the quality of those industry-wide specifications and docs might
be low. That's a separately addressed problem, to be fixed by offering
better ones, *not* by changing the language being specified.)
If you're going to insist that we continue to chart our own course for
the language GCC compiles vis-a-vis ISO C, then I suggest you consider
renaming *that* language to something else, like "GNU P", so as to
avoid confusing users.

After all, you insisted the EGCS project be named EGCS, not GCC3, to
avoid confusing users.

I suggest you (RMS) apply the Golden Rule here -- if you were so
concerned about EGCS polluting the "good name" of GCC back then, I
suggest you consider how the ISO C people feel about this continued
willingness of the GCC people to say they compile code written in
something called "C", when they're at the same time so willing to say
"ISO C is just a committee" to excuse ignoring, whenever convenient,
what ISO C says about what C "means" as a language name.

Then, when GNU P has proven to be a better C language than ISO C, such
that the *overall* community prefers it just as it ended up preferring
EGCS to GCC2, we can rename everything back.

No, I'm not arguing for "aggressively breaking bad code", because I
don't think that's wise, taken literally -- and because I know it's
pretty hard to do, and would actually make code *slower*.

I agree with what I think Mark *meant* by that phrase, though, which is
"let's not bother trying to make all sorts of cost/benefit analyses for
every possible conflict between some optimization and whether that
might break some existing bad code out there".

We have better things to do with our time, like make sure GCC generates
*fast* code and that, when compiled with -fno-strict-aliasing, it
generates code that accommodates the aliasing bug in user code.

        tq vm, (burley)