From mboxrd@z Thu Jan 1 00:00:00 1970
From: Fergus Henderson
To: Zack Weinberg
Cc: Gabriel Dos Reis, gcc@gcc.gnu.org
Subject: Re: A FrontEnd in C++?
Date: Fri, 23 Aug 2002 02:20:00 -0000
Message-id: <20020823092008.GA2439@ceres.cs.mu.oz.au>
References: <20020819074617.GA2010@earth.cs.mu.oz.au> <20020819080229.GE14079@codesourcery.com> <20020821174040.GF2803@codesourcery.com> <20020822211611.GE24984@codesourcery.com>
X-SW-Source: 2002-08/msg01382.html

On 22-Aug-2002, Zack Weinberg wrote:
> Please, everyone, join me in a thought experiment. Imagine that you
> are writing a compiler. It doesn't matter what language(s) you are
> writing in. You're working under the following constraint: You must
> assume that no other component of the system works as specified,
> unless you have no choice. Thus, you trust the operating system
> kernel, since you have no choice there. You just barely trust the
> bootstrap compiler(s) to generate correct machine code, when no
> optimization is enabled and no clever features of the language(s) are
> used. (Clever features include everything that doesn't have a trivial
> translation to machine code.) You do not trust the language(s)'
> runtime librar(ies) to work at all.
>
> Under this constraint, I hope you'll agree with me that the thing to
> do is pick just one language, and write your compiler using the most
> minimal subset of that language that is practical, also avoiding as
> much of the runtime library as is practical.

No, I don't agree at all.

If you are working under those constraints, then rather than writing
a whole compiler in the untrusted bootstrap language (e.g. C), it would
be better to just write a simple bytecode interpreter in the untrusted
bootstrap language. A bytecode interpreter is going to be simpler than
a compiler, so it is less likely to trigger bugs in the bootstrap
compiler.

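To give a rough sense of scale for the point above, here is a minimal
sketch of a stack-based bytecode interpreter. The opcode names (PUSH,
ADD, MUL, PRINT, HALT) and the flat integer encoding are assumptions
made up for illustration. In the scenario described, the interpreter
would be written in the bootstrap language (e.g. C) rather than Python,
but the structure would be the same dispatch loop.

```python
# Hypothetical opcodes, invented for this sketch; not a real bytecode format.
PUSH, ADD, MUL, PRINT, HALT = range(5)


def run(code):
    """Execute a flat list of integers and return the values it printed."""
    stack, out, pc = [], [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:
            # The operand is stored inline, right after the opcode.
            stack.append(code[pc])
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == PRINT:
            # Collect output instead of writing it, to keep the sketch testable.
            out.append(stack.pop())
        elif op == HALT:
            return out
        else:
            raise ValueError(f"unknown opcode {op} at offset {pc - 1}")


# Bytecode for (2 + 3) * 4, followed by PRINT and HALT.
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, PRINT, HALT]
result = run(program)
```

The whole thing is one loop over a handful of cases, which is the sense
in which it exercises far less of the bootstrap compiler than a full
compiler would.
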
Then, you write your compiler with two back-ends, one that compiles
to this bytecode, and another that compiles to whatever it was that
you really wanted to compile to (e.g. native code). The compiler can
be written in its own language, and bootstrapped using a trusted
implementation of that language (or by some other means, e.g. see
Robert Dewar's suggestions). Then you ship the sources plus the
compiled bytecode for the compiler. When installing on untrusted
systems, you first compile the bytecode interpreter, and then use that
to run the compiled bytecode for the compiler, which you then use to
compile your compiler to your chosen target language (e.g. native
code).

This approach
	(1) minimizes the amount of code which needs to depend on
	    the bootstrap compiler (only a bytecode interpreter),
	(2) allows you to use your language of choice for implementing
	    the compiler, and
	(3) because of (2), reduces the overall complexity of the system
	    compared to your suggested approach.

On the down side, it requires adding an additional target language
(the bytecode), but that drawback is minor because the system will
already be designed to support multiple target languages, and the
additional target language is easy to compile to and doesn't need an
optimizing back-end.

> Now. Based on historical evidence, I argue that that's not a thought
> experiment at all. It is instead a slight exaggeration of the
> situation we are in, developing GCC. We have a five-year track record
> of having to work around bugs in the C runtime and the bootstrap
> compiler. I see no reason to expect this situation ever to change.

GCC already violates the constraint that "You must assume that no other
component of the system works as specified, unless you have no choice.",
since it depends on the bootstrap compiler being able to correctly
compile a whole C compiler, rather than just a bytecode interpreter.

(Indeed, if you only use the default three-stage bootstrap, then the
bootstrap compiler needs to correctly compile a whole *optimizing*
C compiler. With the three-stage bootstrap process, the stage 1
compiler is built without optimization, but the stage 2 compiler is
built with optimization enabled. This depends on the bootstrap
compiler having correctly compiled the stage 1 optimizer. Using a
four-stage bootstrap process, so that the stage 2 compiler is built
without optimization, and is then used to build a stage 3 compiler
with optimization which is then used to build a stage 4 compiler to
check against the stage 3 compiler, avoids this, thus reducing the
dependencies on the correctness of the bootstrap compiler.)

> Restricting ourselves to a safe subset of C and C only is a sensible
> way to insulate ourselves from these problems as best we can.

Here again I disagree. There's nothing wrong with using C++ or Java,
since GCC already includes C++ and Java compilers written in C.
Using C++ or Java would not increase the dependencies on the
correctness of the bootstrap compiler(s), because this C++ or Java
code can be compiled using the trusted stage 3 GCC rather than relying
on any C++ or Java compilers already installed on the system.
(Or equivalently, it can be compiled using the not-yet-trusted stage 2
GCC, so long as part of the process includes checking that the stage 2
GCC is bit-for-bit identical with the stage 3 GCC.)

Using Mercury to implement the Mercury front-end does not add a
dependency on another bootstrap compiler, since the Mercury compiler
can generate portable C code which you can ship. Furthermore, it does
not even increase the dependencies on the correctness of the bootstrap
compiler(s), since the Mercury-generated C code can be compiled with
the trusted stage 3 GCC rather than with the untrusted bootstrap
compiler.

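The bit-for-bit check between stage 2 and stage 3 mentioned above can
be sketched roughly as follows. This is only an illustration: the
directory layout, the `*.o` pattern, and the function name
`differing_objects` are assumptions, not GCC's actual comparison
machinery.

```python
from pathlib import Path


def differing_objects(stage2_dir, stage3_dir, pattern="*.o"):
    """Return relative paths of files that differ between the two stage
    directories, or that exist in only one of them."""
    stage2_dir, stage3_dir = Path(stage2_dir), Path(stage3_dir)

    # Collect every matching file name seen in either stage.
    names = {p.relative_to(stage2_dir) for p in stage2_dir.rglob(pattern)}
    names |= {p.relative_to(stage3_dir) for p in stage3_dir.rglob(pattern)}

    mismatches = []
    for name in sorted(names):
        a, b = stage2_dir / name, stage3_dir / name
        # A file missing on one side counts as a mismatch; otherwise
        # compare the raw bytes.
        if not (a.is_file() and b.is_file()) or a.read_bytes() != b.read_bytes():
            mismatches.append(name)
    return mismatches
```

An empty result is the condition under which the stage 2 compiler can
be treated as equivalent to the stage 3 one for the purposes of the
argument above.
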
Using Ada to implement the Ada front-end does increase the dependencies
on the correctness of the bootstrap compilers, since it requires a
correct Ada compiler to bootstrap with, not just a correct C compiler.
But that would not be the case for front-ends written in C++ or Java,
or for front-ends written in any language for which there is a compiler
to C, C++, or Java. Nor would it be the case for front-ends written in
any language for which there is a compiler to a bytecode language whose
interpreter is written in C, C++, or Java.

(Actually, I should also include Fortran in that list -- but who'd want
to write compilers or interpreters in Fortran? ;-)

Even though writing the Ada front-end in Ada increases the dependencies
on bootstrap compilers, this drawback is IMHO *far* outweighed by the
benefits of writing the compiler in the same language.

-- 
Fergus Henderson                    |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.