From mboxrd@z Thu Jan  1 00:00:00 1970
From: craig@jcb-sc.com
To: jvickers@dial.pipex.com, jbuck@synopsys.COM, law@cygnus.com, mark@codesourcery.com, gcc@gcc.gnu.org, rms@gnu.org
Cc: craig@jcb-sc.com
Subject: Re: type based aliasing again
Date: Thu, 30 Sep 1999 18:02:00 -0000
Message-ID: <19990910151259.9117.qmail@deer>
X-SW-Source: 1999-09n/msg00414.html
Message-ID: <19990930180200.XXup0YfMnzG0qbg6lU9NXHzha_G3zYwvFwYfn51ffDE@z>

RMS wrote:
> In C, we cannot divide all user code into "right" and "wrong" in this
> kind of simple way, and certainly not based on the ISO standard.  That
> standard is just the decisions of a certain committee (which I was a
> member of) about what cases conforming compilers should commit to
> support.  We must not let ourselves start thinking that C code is
> "wrong", just because it is not conforming ISO C code.

Saying the standard is "just the decisions of a certain committee" is
like saying C is "just a language made up by bored hackers".

If you're prepared to offer substantial replacements for the care,
consideration, expertise, thoughtfulness, expectation of public
participation in the process, and so on, that the C standards
committee (presumably) provides, fine.

But I've watched (reasonably carefully) how well this attitude
of "ignore the standards bodies except when we find them convenient"
has worked since I first heard about GNU using it so many years ago
(at which point I found it *very* persuasive).

To put it simply, it's a failure when it comes to the C language,
as far as I can tell.  The very people who seem most
enthusiastic about their ability and willingness to substitute
their designs for those of a standards body, while insisting on
using the same name that body goes to substantial effort to preserve
as a going concern, are the ones who strike me as least willing
to spend substantial time and effort designing their variant of
the language, specifying it, patiently calling for public comment
and *listening* to it, ensuring it's widely understood in terms of
its differences vis-a-vis its "standard" namesake, and documenting
it.  (If you think documenting and specifying are the same, you're
wrong -- the C standard is a specification; all those books
describing how to write C code are documentation.  If you replace
the standards body, you not only must write the specification yourself,
you must write the first documentation yourself, and, further, give
authors of documentation some incentive to write documents for your
language.  I don't think "C minus the silly aliasing restriction"
is enough incentive, since I don't believe anybody's published
serious end-programmer documentation on the GNU C language, despite
all its extensions.)

So, yes, we support many extensions in GCC, and that's nice, when
looked at in isolation.  But when I think of all the bug reports
I've seen, all the confusion over how these features are supposed
to work, the lack of clear documentation *or* specifications regarding
these extensions and their interactions with each other, and the lack
of adequate testing of many of them, I think that the attitude
that we can do things our own way has led to an overall failure to
provide a C compiler that users can rely upon *now* and in the *future*
to correctly compile the (GNU C, not just ISO C) code they write.

Particularly insidious is this practice of redefining constructs
that aren't blatantly non-standard in a small chunk of code.  Which
is what RMS (and Linus) appear to want, without users having to specify
-fno-strict-aliasing -- an option that at least gives users *some*
heads-up about the code being in some special dialect, not C.

It's one thing for an expert in ISO C to recognize that
"... __complex__ float x; ..." is obviously not ISO C code.
The "__complex__", or similarly undefined, keyword, gives such
things away.

It's entirely another thing for such an expert to see some code
making references via pointers, see *nothing* wrong with the code
vis-a-vis ISO C, and thus reasonably assume that, since it works
on a wide variety of architectures, that code is playing *no*
aliasing games...and then be proven wrong, because what he *didn't*
know is that the code requires an RMS C compiler to work, because
it *is* playing aliasing games.  By which time, he's devoted many
hours, perhaps days, to making changes to that code (just a portion
of a program, so the *other* portions had to work together with it
to violate the aliasing rules), changes that are then recognized
as a huge waste of time.
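
(For anyone who hasn't seen what an "aliasing game" looks like in C,
here is a minimal sketch.  It is only an illustration, not code from
any program discussed in this thread.  The name float_bits is made
up, and the sketch assumes float and uint32_t have the same size.
The commented-out line is the kind of access the ISO C aliasing rules
don't promise will work; the memcpy version is a conforming way to
read the same bits.)

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the object representation of a float.
 *
 * The "aliasing game" version would read the float through a pointer
 * to an unrelated type, which ISO C does not promise will work:
 *
 *     return *(uint32_t *)&f;
 *
 * Copying the bytes with memcpy is a conforming alternative. */
uint32_t float_bits(float f)
{
    uint32_t bits;

    /* Assumption: float and uint32_t are the same size here. */
    assert(sizeof bits == sizeof f);

    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```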

If you, the reader, don't grasp what I'm saying there, that's okay,
all it means is you don't understand some basic language-design
principles.  It doesn't mean you're not a great programmer, it just
means you shouldn't be doing language design (e.g. proposing changes
to a language or dialect, especially not pushing for them from
a position of power).

But I'll illustrate (in Fortran, since it's actually clearer) this
concept with an example.  Suppose Barney Super Fortran Programmer
is looking at code like this:

   SUBROUTINE X (A, B)
   ...
   A = B * 2
   ...
   B = B + 9
   ...
   END

No matter what the "call tree" from X down looks like, he *knows*
that, assuming the code is considered "correct" according to the
Fortran standards (and there are no command-line options used to
compile it to say otherwise), e.g. it runs on a wide variety
of platforms, that A *cannot* alias B.

Given that knowledge, he need *not* examine *any* callers of X
to see whether A actually *might* alias B.

That means he can, *legitimately*, change that code to read:

   SUBROUTINE X (A, B)
   ...
   TMPB = B
   B = B + 9
   A = TMPB * 2
   ...
   ...
   END

(All he has to do is ensure that the "..." following the assignment
to A don't explicitly reference B...the usual sorts of things that
programmers check for.)

Never mind why he might want to make that change.  Maybe it's for
performance reasons.

The point is, he *can* make that change because he *knows* A and B don't
alias each other.  Period.  End of story.

But wait!  Turns out this code is always compiled by RMS Fortran.  RMS
decided this aliasing prohibition, put in by the Fortran standards
committee back in the '70s and preserved ever since, was just silly,
so he eliminated it from his dialect.  (Maybe he only *partially*
eliminated it, by saying "let's do a cost/performance analysis each
time we consider breaking code like the above in a given instance",
and the net result was the same -- most RMS Fortran users thinking
there were no alias restrictions in the language, because they always
got away with that, except perhaps in "extremely obvious" instances
such as small self-contained DO loops.)

But he called his dialect "Fortran" anyway, called his compiler
"RMS Fortran" (as opposed to his operating system, which is called
"...*not* Unix"), and thus set the stage for Barney, our super-programmer,
wasting his time by adding bugs to working code.

How?  Because it turns out X *does* get called with A and B aliased,
that the caller *relies* on the changing of A happening before B is
modified, and that the caller is thus *highly* sensitive to the particular
ordering of the modifications.
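
A rough sketch of that breakage, in C rather than Fortran so it can
be compiled as-is.  The function names are hypothetical stand-ins for
the two versions of X.  Unlike Fortran dummy arguments, two int
pointers in C *are* allowed to alias, so the aliased call below is
well-defined C; it just shows how the two statement orders diverge
once A and B name the same variable.

```c
#include <assert.h>

/* Original statement order: A = B * 2, then B = B + 9. */
void x_original(int *a, int *b)
{
    *a = *b * 2;
    *b = *b + 9;
}

/* Barney's reordering: save B, update B, then assign A. */
void x_reordered(int *a, int *b)
{
    int tmpb = *b;

    *b = *b + 9;
    *a = tmpb * 2;
}

/* Call f with both arguments naming the same variable, as the
 * hypothetical caller does, and return that variable's final value. */
int call_aliased(void (*f)(int *, int *), int start)
{
    int v = start;

    f(&v, &v);
    return v;
}

/* Return 1 if both versions agree when A and B are distinct. */
int same_when_distinct(int start)
{
    int a1 = 0, b1 = start;
    int a2 = 0, b2 = start;

    x_original(&a1, &b1);
    x_reordered(&a2, &b2);
    return a1 == a2 && b1 == b2;
}

/* Checks for the two cases, starting from the value 5. */
void demo(void)
{
    assert(same_when_distinct(5));
    assert(call_aliased(x_original, 5) == 19);   /* 5*2 = 10, 10+9 */
    assert(call_aliased(x_reordered, 5) == 10);  /* 5+9 = 14, then 5*2 */
}
```

With distinct arguments the two versions leave the same values in A
and B.  With aliased arguments starting from 5, the original order
ends with 19 and the reordered one with 10, which is exactly the
kind of silent change the caller was never written to tolerate.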

In other words, while X looks like it is written in standard Fortran,
and while every single one of X's callers, *in isolation* (without
looking at X itself), might look like *they* are written in standard
Fortran, the reality is, the whole *program* is, in fact, *not* standard
Fortran.  It is RMS Fortran.  The original programmers of that
code got away with it because the compiler happened to avoid ever
reordering things like that, except in instances "everybody" -- meaning
only those RMS Fortran users truly paying attention to these issues --
would agree such reordering was desirable.

And the result is buggy code.

All of the usual responses -- "well, if Barney was so good, why didn't
he read the docs", for example -- would have applied just as well
if RMS hadn't changed the dialect in this way and left it up to
*Fortran* programmers to read the industry-standard *Fortran* specification
and the widely published books on *Fortran* programming.

Further, the *overall* quality of the RMS Fortran product plus RMS
Fortran code is *lower*, because it is less of a good language.
That is, it plays tricks -- appearing to mean one thing, when it
means another, leading people who misunderstand it to break code
written in it.

So, to cater to the mediocre programmers who won't read *any* docs,
not even the widely published ones on the general language they're
using, we *do*, in fact, punish the *great* programmers who *do*
read the *right* docs, but to whom we've given no adequate notice,
in instances such as the above, that there are some *other* docs
they need to go read and study.

(Yes, the quality of those industry-wide specifications and docs
might be low.  That's a separately addressed problem, to be
fixed by offering better ones, *not* by changing the language being
specified.)

If you're going to insist that we continue to chart our own course
for the language GCC compiles vis-a-vis ISO C, then I suggest you
consider renaming *that* language to something else, like "GNU P",
so as to avoid confusing users.

After all, you insisted the EGCS project be named EGCS, not GCC3,
to avoid confusing users.  I suggest you (RMS) apply the Golden Rule
here -- if you were so concerned about EGCS polluting the "good name"
of GCC back then, I suggest you consider how the ISO C people feel
about this continued willingness of the GCC people to say they
compile code written in something called "C", when they're at the
same time so willing to say "ISO C is just a committee" to excuse
ignoring, whenever convenient, what ISO C says about what C "means"
as a language name.

Then, when GNU P has proven to be a better C language than ISO C,
such that the *overall* community prefers it just as it ended up
preferring EGCS to GCC2, we can rename everything back.

No, I'm not arguing for "aggressively breaking bad code", because
I don't think that's wise, taken literally -- and because I
know it's pretty hard to do, and would actually make code *slower*.

I agree with what I think Mark *meant* by that phrase, though, which
is "let's not bother trying to make all sorts of cost/benefit analyses
for every possible conflict between some optimization and whether
that might break some existing bad code out there".  We have better things
to do with our time, like make sure GCC generates *fast* code and that,
when compiled with -fno-strict-aliasing, it generates code that accommodates
the aliasing bug in user code.

        tq vm, (burley)