From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-173787-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
MIME-Version: 1.0
In-Reply-To: <CAFiYyc1jArwgQxGb06+x58v4KMwJZeXyU6j4YXKAAL29Ro0wEQ@mail.gmail.com>
References: <4F7B356E.9080003@google.com>	<CAGWvnymDHXtN1AR9hdrYpV7UVw-rkk5ZiM0kS9DbVLY24xQ-6Q@mail.gmail.com>	<CAAiZkiA29bnrEHg3jHyOtmdFo1HewaW-rp3KYOKC+gfEQ1pXzA@mail.gmail.com>	<CAFiYyc0wG3ha4B4BgA6g4NPnBG6Pj3iuMZ+_B+3AOgBkKvXpLg@mail.gmail.com>	<4F7C35A3.3080207@codesourcery.com>	<CAFiYyc0knheu7jRBUd5Vtva5Bj7GBypzCQ9BFmFYGXzLcFBYGA@mail.gmail.com>	<CAAkRFZ+4RfHszKh50DW1wKSis0wo3516Hy8626FNpYwgGWdABQ@mail.gmail.com>	<20120410084614.GJ6148@sunsite.ms.mff.cuni.cz>	<CAAkRFZKFu234Q7+Rm+DRpDJPe9Rr0jtxh26sZWVVeEE1mCDfwg@mail.gmail.com>	<20120410163905.GK6148@sunsite.ms.mff.cuni.cz>	<CAGqM8fYoo9=mEjCJeY92y9FGLqoBqHg4KStkyEGUvV18My9YpA@mail.gmail.com>	<CAFiYyc1jArwgQxGb06+x58v4KMwJZeXyU6j4YXKAAL29Ro0wEQ@mail.gmail.com>
Date: Wed, 11 Apr 2012 22:34:00 -0000
Message-ID: <CAGqM8fb6o=mJ_at1TDicQ2F2yKj_XLkD-2F1nt5aVPqUb8navw@mail.gmail.com>
Subject: Re: Switching to C++ by default in 4.8
From: Lawrence Crowl <crowl@google.com>
To: Richard Guenther <richard.guenther@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>, Xinliang David Li <davidxl@google.com>, 	Bernd Schmidt <bernds@codesourcery.com>, Gabriel Dos Reis <gdr@integrable-solutions.net>, 	David Edelsohn <dje.gcc@gmail.com>, Diego Novillo <dnovillo@google.com>, gcc <gcc@gcc.gnu.org>
Content-Type: text/plain; charset=ISO-8859-1
X-IsSubscribed: yes
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2012-04/txt/msg00464.txt.bz2

On 4/11/12, Richard Guenther <richard.guenther@gmail.com> wrote:
> On Apr 11, 2012 Lawrence Crowl <crowl@google.com> wrote:
> > On 4/10/12, Jakub Jelinek <jakub@redhat.com> wrote:
> > > That when stepping through code in the debugger you keep
> > > entering/exiting these one liner inlines, most of them
> > > really should be at least by default considered just as normal
> > > statements (e.g. glibc heavily uses artificial attribute for
> > > those, still gdb doesn't hide those by default).
> >
> > You do want to step into those inline functions, except when
> > you do.  In the short term, we can make the debugger behave
> > as though they did not exist.  In the longer term, we really
> > want debugging tools that help C++ programmers.  One way to
> > get there is to use C++ ourselves.
>
> Fix the debugger first please.

And when the debugger says "show us you're using C++ first", what do
we do?  Based on discussions that I have had, this problem is real.

> > > > The above is just quickly cooked up examples. A carefully
> > > > designed C++ based API can be self documenting and make
> > > > the client code very readable. It is hard to believe that
> > > > there is no room for improvement in GCC.
> > >
> > > Do you have examples?  E.g. I haven't touched gold, because,
> > > while it is a new C++ codebase, looks completely unreadable to
> > > me, similarly libdw C++ stuff.  A carefully designed C based
> > > API can be self documenting and make the code very readable
> > > as well, often more so.
> >
> > If you just look at any decently sized code base, it'll look
> > pretty much unreadable.  The question is how quickly
> > someone who learns the base vocabulary can produce reasonable
> > modifications.
> >
> > There are many places where C++ can help substantially.
> > For example:
> >
> > () The C++ postfix member function call syntax means that
> > following a chain of attributes is a linear read of the
> > expression.  With C function call syntax, you need to read the
> > expression inside out.
>
> It's a matter of what you are used to (consider LISP).

Certainly.  When I was learning to ride horses, every time I would
get comfortable, my instructor would say, now do it this other way.
It was very uncomfortable, but I got over that and improved my
riding.  I went through that same transition when I was switching
to C++.

> > () C++ has both overloaded functions and member functions,
> > so you can use the same verb to talk about several different
> > kinds of objects.  With C function names, we have to invent
> > a new function name for each type.  Such names are longer and
> > burden both the author and the reader of the code.
>
> Agreed.  Function overloading is one of the nice things that
> does not automatically make the code-base look "partial C++".
> Likewise operator overloading can make things like
>
> bit_offset = double_int_add (bit_offset,
>     tree_to_double_int (DECL_FIELD_BIT_OFFSET (field)));
>
> be just
>
> bit_offset = bit_offset + DECL_FIELD_BIT_OFFSET (field);
>
> it still looks like C but with some C++ "magic".
>
> > () Standard C++ idioms enable mashing program components
> > with ease.  The C++ standard library is based on mixing and
> > matching algorithms and data structures, via the common idiom
> > of iterators.
>
> Sort-of agreed.  Though iterator-style (and more so functor style)
> was never one of my favorites.
>
> > () The overloadable operator new means that memory can be
> > _implicitly_ allocated in the right place.
>
> Implicit allocation is bad.  In a compiler you want to _see_
> where you spend memory.

The operator new is explicit, but the source of the memory for
that allocation is implicit.  You want to be able to _change_
where you allocate memory without touching half the source base.
Operator new overloads enable that precisely because you do not
have to say where the memory comes from each time you allocate.
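A minimal sketch of that point, assuming C++11.  The names Arena,
node_arena, Node and make_chain are hypothetical and are not taken
from GCC.  The class-level operator new is the only place that names
the memory source; the call sites just say "new Node".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical bump allocator, standing in for whatever pool a
// compiler might really use.
class Arena {
public:
  explicit Arena(std::size_t bytes)
      : base_(static_cast<char*>(std::malloc(bytes))),
        size_(bytes), used_(0) {}
  ~Arena() { std::free(base_); }
  Arena(const Arena&) = delete;
  Arena& operator=(const Arena&) = delete;

  void* allocate(std::size_t n) {
    // Round the start of each block up to the strictest alignment.
    const std::size_t align = alignof(std::max_align_t);
    const std::size_t start = (used_ + align - 1) / align * align;
    if (base_ == nullptr || start + n > size_) throw std::bad_alloc();
    used_ = start + n;
    return base_ + start;
  }
  std::size_t used() const { return used_; }

private:
  char* base_;
  std::size_t size_;
  std::size_t used_;
};

// Hypothetical accessor for the arena that backs Node objects.
Arena& node_arena() {
  static Arena arena(1 << 16);
  return arena;
}

struct Node {
  int code;
  Node* next;
  Node(int c, Node* n) : code(c), next(n) {}

  // The single place that decides where Node memory comes from.
  static void* operator new(std::size_t n) {
    return node_arena().allocate(n);
  }
  // Individual deletes are no-ops; the arena releases everything
  // at once when it is destroyed.
  static void operator delete(void*) noexcept {}
};

// Call sites use plain "new" and never mention the arena.
Node* make_chain(int count) {
  Node* head = nullptr;
  for (int i = 0; i < count; ++i) head = new Node(i, head);
  return head;
}
```

Moving Node to a different pool, or back to the global heap, means
editing the two operator functions; make_chain and every other
caller stay as they are.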

> > () Constructors and destructors reduce the number of places in
> > the code where you need to do explicit memory management. Without
> > garbage collection, leaks are less frequent.  With garbage
> > collection, you have much less active garbage, and can run
> > longer between collection runs.  Indeed, a conservative collector
> > would be sufficient.
>
> Time will tell.
>
> > () Constructors and destructors also neatly handle actions that
> > must occur in pairs.  The classic example is mutex lock and
> > unlock.  Within GCC, timevar operations need to happen in pairs.
>
> Agreed.
>
> > () Class hierarchies (even without virtual functions) can
> > directly represent type relationships, which means that a
> > debugger dump of a C++ type has little unnecessary information,
> > as opposed to the present union of structs approach with
> > GCC trees.
>
> In GCC trees only the "base" is a union, and it is so as
> implementation detail.  That gdb does not grok a 'tree' well is
> because gdb is stupid.  All the information is there.

It is an implementation detail that causes friction with the
programming environment.
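To illustrate the class-hierarchy point quoted above, here is a
small sketch, assuming C++11.  The types tree_node, decl_node and
const_node are hypothetical and are not GCC's actual tree
representation.  There are no virtual functions; the hierarchy only
records which kind of node a pointer may refer to.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical node kinds for illustration only.
struct tree_node {
  int code;
  explicit tree_node(int c) : code(c) {}
};

struct decl_node : tree_node {
  std::string name;
  explicit decl_node(const std::string& n) : tree_node(1), name(n) {}
};

struct const_node : tree_node {
  long value;
  explicit const_node(long v) : tree_node(2), value(v) {}
};

// Accepts any node.
int node_code(const tree_node* t) { return t->code; }

// Accepts only declarations.
const std::string& decl_name(const decl_node* d) { return d->name; }

// Any derived node converts to the base pointer type...
static_assert(std::is_convertible<decl_node*, const tree_node*>::value,
              "a decl is a tree");
// ...but a constant does not convert to a declaration, so calling
// decl_name with a const_node* is rejected at compile time.
static_assert(!std::is_convertible<const_node*, const decl_node*>::value,
              "a constant is not a decl");
```

In this sketch a decl_node holds only its code and its name, so a
debugger printing one has no unrelated union members to show.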

> > () Class hierarchies also mean that programmers can distinguish
> > in the pointer types that a function needs a decl parameter,
> > without having to say 'all trees' versus 'a very specific tree'.
> > The static type checking avoids run-time bugs.
>
> True.  In a very limited set of cases.  C++ is not powerful enough to
> express pointer-to-everything-that-would-be-considered-a-gimple-val.
> Maybe C++ is not the right choice after all?  (I suppose C++ concepts
> would have helped here?  pointer-to-tree-that-fulfils-is_gimple_val
> ...  though is_gimple_val is not a static property.)
>
> > I have written compilers in both C and C++.  I much prefer
> > the latter.
>
> Did you ever try to convert an existing large C codebase to C++?
> I would not expect a very good result and rather start from
> scratch.  So I don't see that we ever arrive (or want to arrive)
> at a pure C++-style GCC.  Instead I expect we end up (and desire
> to end up) with GCC compiled with a C++ compiler that uses C++
> features to make the existing style more readable and maintainable.

While I didn't start the process, I have worked on a C++ compiler
that was in transition from a C source base to a C++ source base.
The parts of the compiler that didn't need much attention still
had a C style.  The parts that did need attention, or provided
immediate benefit, changed to a C++ style fairly rapidly.  Even so,
after more than a decade, the compiler had a mix of styles.  For all
I know, it may still have a mix.

One of the changes I made was to convert an enum into a class
with member functions, etc.  The functional change required more
information than the enum could represent.  The enum was passed in
a single register, while the class was copied for each parameter.
Assignment changed from register-to-register into memcpy.  So,
the instruction overhead for this type jumped substantially.
After modifying 20,000 lines for this change, I benchmarked the
compiler and it was 1% faster.  Yes, faster.  The reason is that in
the process I reorganized the associated parsing and error checking.
I took a micro-optimization hit, but won a bigger macro-optimization.
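A rough sketch of the kind of change described, with entirely
hypothetical names (StorageKind, Storage, conflicts); it is not the
code from that compiler.  The enum becomes a class that carries
extra information, and the checks that callers used to spell out
become member functions.

```cpp
#include <cassert>

// Before: a plain enum that fits in a register.
enum StorageKind { SK_NONE, SK_STATIC, SK_EXTERN };

// After: a small class holding the kind plus information the enum
// could not represent (here, a hypothetical source line number).
class Storage {
public:
  Storage(StorageKind kind, int line) : kind_(kind), line_(line) {}

  bool is_static() const { return kind_ == SK_STATIC; }
  bool has_specifier() const { return kind_ != SK_NONE; }
  int line() const { return line_; }

private:
  StorageKind kind_;
  int line_;
};

// A check written once against the abstraction instead of being
// repeated at each use of the raw enum value.
bool conflicts(const Storage& a, const Storage& b) {
  return a.has_specifier() && b.has_specifier() &&
         a.is_static() != b.is_static();
}
```

Passing a Storage costs more than passing the enum did, which is
the micro-optimization hit; the gain comes from concentrating
checks such as conflicts in one place where they can be reorganized.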

The essential benefit of C++ is that it is easier to write and
use good abstractions, which enables higher-level changes for
higher-level effects.

-- 
Lawrence Crowl