From mboxrd@z Thu Jan  1 00:00:00 1970
From: Fergus Henderson <fjh@cs.mu.OZ.AU>
To: Zack Weinberg <zack@codesourcery.com>
Cc: Gabriel Dos Reis <gdr@integrable-solutions.net>, gcc@gcc.gnu.org
Subject: Re: A FrontEnd in C++?
Date: Fri, 23 Aug 2002 02:20:00 -0000
Message-id: <20020823092008.GA2439@ceres.cs.mu.oz.au>
References: <F37crCgXAC98D6B8MK9000097e3@hotmail.com> <20020819074617.GA2010@earth.cs.mu.oz.au> <20020819080229.GE14079@codesourcery.com> <m38z338b33.fsf@soliton.integrable-solutions.net> <20020821174040.GF2803@codesourcery.com> <m3sn16677c.fsf@soliton.integrable-solutions.net> <20020822211611.GE24984@codesourcery.com>
X-SW-Source: 2002-08/msg01382.html

On 22-Aug-2002, Zack Weinberg <zack@codesourcery.com> wrote:
> Please, everyone, join me in a thought experiment.  Imagine that you
> are writing a compiler.  It doesn't matter what language(s) you are
> writing in.  You're working under the following constraint: You must
> assume that no other component of the system works as specified,
> unless you have no choice.  Thus, you trust the operating system
> kernel, since you have no choice there.  You just barely trust the
> bootstrap compiler(s) to generate correct machine code, when no
> optimization is enabled and no clever features of the language(s) are
> used.  (Clever features include everything that doesn't have a trivial
> translation to machine code.)  You do not trust the language(s)'
> runtime librar(ies) to work at all.
> 
> Under this constraint, I hope you'll agree with me that the thing to
> do is pick just one language, and write your compiler using the most
> minimal subset of that language that is practical, also avoiding as
> much of the runtime library as is practical.

No, I don't agree at all.

If you are working under those constraints, then rather than writing a
whole compiler in the untrusted bootstrap language (e.g. C), it would
be better to just write a simple bytecode interpreter in the untrusted
bootstrap language.  A bytecode interpreter is going to be simpler than a
compiler, so it is less likely to trigger bugs in the bootstrap compiler.
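
To give a feel for how little code that is, here is a minimal sketch of
such an interpreter.  The opcode names and the run() function are made up
for illustration, not taken from any real system.  It deliberately sticks
to integer arithmetic, arrays and plain control flow, and calls nothing
from the C runtime library.

```c
/* Hypothetical opcodes for a tiny stack machine (illustration only). */
enum { OP_HALT, OP_PUSH, OP_ADD, OP_MUL };

#define STACK_MAX 64

/* Runs code[0..len-1].  On OP_HALT stores the top of the stack in
   *result and returns 0; returns 1 for any malformed program. */
int run(const long *code, int len, long *result)
{
    long stack[STACK_MAX];
    int sp = 0;   /* number of values currently on the stack */
    int pc = 0;   /* index of the next instruction */

    while (pc < len) {
        long op = code[pc];
        pc = pc + 1;
        if (op == OP_HALT) {
            if (sp < 1) return 1;
            *result = stack[sp - 1];
            return 0;
        } else if (op == OP_PUSH) {
            /* The operand follows the opcode in the code array. */
            if (pc >= len || sp >= STACK_MAX) return 1;
            stack[sp] = code[pc];
            sp = sp + 1;
            pc = pc + 1;
        } else if (op == OP_ADD || op == OP_MUL) {
            if (sp < 2) return 1;
            if (op == OP_ADD)
                stack[sp - 2] = stack[sp - 2] + stack[sp - 1];
            else
                stack[sp - 2] = stack[sp - 2] * stack[sp - 1];
            sp = sp - 1;
        } else {
            return 1;   /* unknown opcode */
        }
    }
    return 1;   /* ran off the end without reaching OP_HALT */
}
```

Given the program PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT, run() returns 0
and stores 20 in *result.  A real interpreter would need more opcodes
(branches, calls, I/O), but the shape stays the same.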

Then, you write your compiler with two back-ends, one that compiles
to this bytecode, and another that compiles to whatever it was that
you really wanted to compile to (e.g. native code).  The compiler
can be written in its own language, and bootstrapped using a trusted
implementation of that language (or by some other means, e.g. see Robert
Dewar's suggestions).
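
As a rough sketch of the "one compiler, two back-ends" structure, assume a
toy expression language.  Everything here (struct expr, the BC_ opcodes,
emit_bytecode, emit_c) is hypothetical, and emitting C text merely stands
in for whatever the real target would be.  Written in C only so that it
can be tried out directly; the point above is that the real compiler
could be in its own language.

```c
#include <stdio.h>

/* Hypothetical expression tree shared by both back-ends. */
enum kind { NUM, ADD, MUL };
struct expr {
    enum kind kind;
    long value;                    /* used when kind == NUM */
    const struct expr *lhs, *rhs;  /* used when kind is ADD or MUL */
};

enum { BC_PUSH = 1, BC_ADD = 2, BC_MUL = 3 };  /* made-up bytecode */

/* Back-end 1: appends postfix bytecode to out[] starting at index n.
   Returns the new length, or -1 if out[] (capacity cap) is too small. */
int emit_bytecode(const struct expr *e, long *out, int n, int cap)
{
    if (e->kind == NUM) {
        if (n + 2 > cap) return -1;
        out[n] = BC_PUSH;
        out[n + 1] = e->value;
        return n + 2;
    }
    n = emit_bytecode(e->lhs, out, n, cap);
    if (n < 0) return -1;
    n = emit_bytecode(e->rhs, out, n, cap);
    if (n < 0) return -1;
    if (n + 1 > cap) return -1;
    out[n] = (e->kind == ADD) ? BC_ADD : BC_MUL;
    return n + 1;
}

/* Back-end 2: appends a parenthesized C expression to buf at offset pos.
   Returns the new offset, or -1 if buf (capacity cap) is too small. */
int emit_c(const struct expr *e, char *buf, int pos, int cap)
{
    int w;
    if (pos >= cap) return -1;
    if (e->kind == NUM) {
        w = snprintf(buf + pos, (size_t)(cap - pos), "%ld", e->value);
        if (w < 0 || w >= cap - pos) return -1;
        return pos + w;
    }
    if (pos + 1 >= cap) return -1;
    buf[pos++] = '(';
    pos = emit_c(e->lhs, buf, pos, cap);
    if (pos < 0) return -1;
    w = snprintf(buf + pos, (size_t)(cap - pos), " %c ",
                 e->kind == ADD ? '+' : '*');
    if (w < 0 || w >= cap - pos) return -1;
    pos += w;
    pos = emit_c(e->rhs, buf, pos, cap);
    if (pos < 0) return -1;
    if (pos + 1 >= cap) return -1;
    buf[pos++] = ')';
    buf[pos] = '\0';
    return pos;
}
```

Both functions walk the same tree, so the front-end is shared and only the
last step differs.  The bytecode back-end is the shorter of the two and
needs no optimizer, which is why adding it costs little.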

Then you ship the sources plus the compiled bytecode for the compiler.
When installing on untrusted systems, you first compile the bytecode
interpreter, and then use that to run the compiled bytecode for the
compiler, which you then use to compile your compiler to your chosen
target language (e.g. native code).

This approach
	(1) minimizes the amount of code which needs to depend on
	    the bootstrap compiler (only a bytecode interpreter),
	(2) allows you to use your language of choice for implementing
	    the compiler, and
	(3) because of (2), reduces the overall complexity of the system
	    compared to your suggested approach.

On the down side, it requires adding an additional target language
(the bytecode), but that drawback is minor because the system will
already be designed to support multiple target languages, and the
additional target language is easy to compile to and doesn't need an
optimizing back-end.

> Now.  Based on historical evidence, I argue that that's not a thought
> experiment at all.  It is instead a slight exaggeration of the
> situation we are in, developing GCC.  We have a five-year track record
> of having to work around bugs in the C runtime and the bootstrap
> compiler.  I see no reason to expect this situation ever to change.

GCC already violates the constraint that "You must assume that no other
component of the system works as specified, unless you have no choice",
since it depends on the bootstrap compiler being able to correctly
compile a whole C compiler, rather than just a bytecode interpreter.

(Indeed, if you only use the default three-stage bootstrap, then the
bootstrap compiler needs to correctly compile a whole *optimizing* C
compiler.  With the three-stage bootstrap process, the stage 1 compiler
is built without optimization, but the stage 2 compiler is built with
optimization enabled.  This depends on the bootstrap compiler having
correctly compiled the stage 1 optimizer.  A four-stage bootstrap
process avoids this: the stage 2 compiler is built without optimization,
and is then used to build a stage 3 compiler with optimization, which in
turn builds a stage 4 compiler to check against the stage 3 compiler.
This reduces the dependencies on the correctness of the bootstrap
compiler.)

> Restricting ourselves to a safe subset of C and C only is a sensible
> way to insulate ourselves from these problems as best we can.

Here again I disagree.  There's nothing wrong with using C++ or Java,
since GCC already includes C++ and Java compilers written in C.
Using C++ or Java would not increase the dependencies on the
correctness of the bootstrap compiler(s), because this C++ or Java
code can be compiled using the trusted stage 3 GCC rather than
relying on any C++ or Java compilers already installed on the system.
(Or equivalently, it can be compiled using the not-yet-trusted
stage 2 GCC, so long as part of the process includes checking
that the stage 2 GCC is bit-for-bit identical with the stage 3 GCC.)
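
The bit-for-bit check itself is simple.  Here is a minimal sketch, assuming
the two compilers are available as open streams; same_bytes() is a
hypothetical helper of mine, not the comparison step that GCC's own build
uses.

```c
#include <stdio.h>

/* Compares two streams from their current positions to end-of-file.
   Returns 1 if every byte matches and both end together, 0 if they
   differ, and -1 if a read error occurred on either stream. */
int same_bytes(FILE *a, FILE *b)
{
    int ca, cb;

    do {
        ca = getc(a);
        cb = getc(b);
        if (ca != cb)
            break;          /* mismatch, or one stream ended early */
    } while (ca != EOF);

    if (ferror(a) || ferror(b))
        return -1;
    return ca == cb;        /* equal here means both reached EOF */
}
```

A caller would open the stage 2 and stage 3 binaries in binary mode ("rb")
and treat anything other than a return value of 1 as a failed bootstrap.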

Using Mercury to implement the Mercury front-end does not add a
dependency on another bootstrap compiler, since the Mercury compiler
can generate portable C code which you can ship.  Furthermore, it does
not even increase the dependencies on the correctness of the bootstrap
compiler(s), since the Mercury-generated C code can be compiled with
the trusted stage 3 GCC rather than with the untrusted bootstrap compiler.

Using Ada to implement the Ada front-end does increase the dependencies
on the correctness of the bootstrap compilers, since it requires a
correct Ada compiler to bootstrap with, not just a correct C compiler.
But that would not be the case for front-ends written in C++ or Java,
or for front-ends written in any language for which there is a compiler
to C, C++, or Java.  Nor would it be the case for front-ends written
in any language for which there is a compiler to a bytecode language
whose interpreter is written in C, C++, or Java.  (Actually, I should
also include Fortran in that list -- but who'd want to write compilers
or interpreters in Fortran? ;-)

Even though writing the Ada front-end in Ada increases the dependencies
on bootstrap compilers, this drawback is IMHO *far* outweighed by the
benefits of writing the compiler in the same language.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: < http://www.cs.mu.oz.au/~fjh >  |     -- the last words of T. S. Garp.