From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-173805-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 24268 invoked by alias); 12 Apr 2012 10:30:29 -0000
Received: (qmail 24258 invoked by uid 22791); 12 Apr 2012 10:30:28 -0000
X-SWARE-Spam-Status: No, hits=-4.8 required=5.0	tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,KHOP_RCVD_TRUST,KHOP_THREADED,RCVD_IN_DNSWL_LOW,RCVD_IN_HOSTKARMA_YE
X-Spam-Check-By: sourceware.org
Received: from mail-iy0-f175.google.com (HELO mail-iy0-f175.google.com) (209.85.210.175)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 12 Apr 2012 10:30:06 +0000
Received: by iaag37 with SMTP id g37so2761260iaa.20        for <gcc@gcc.gnu.org>; Thu, 12 Apr 2012 03:30:05 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.50.183.193 with SMTP id eo1mr1831720igc.20.1334226605677; Thu, 12 Apr 2012 03:30:05 -0700 (PDT)
Received: by 10.42.228.200 with HTTP; Thu, 12 Apr 2012 03:30:05 -0700 (PDT)
In-Reply-To: <CAPOVtOvvZK43aCqEKbcoaC=97EWL14N-b+Q2sUYjT8pjow2ySg@mail.gmail.com>
References: <4F7B356E.9080003@google.com>	<CAGWvnymDHXtN1AR9hdrYpV7UVw-rkk5ZiM0kS9DbVLY24xQ-6Q@mail.gmail.com>	<CAAiZkiA29bnrEHg3jHyOtmdFo1HewaW-rp3KYOKC+gfEQ1pXzA@mail.gmail.com>	<CAFiYyc0wG3ha4B4BgA6g4NPnBG6Pj3iuMZ+_B+3AOgBkKvXpLg@mail.gmail.com>	<4F7C35A3.3080207@codesourcery.com>	<CAFiYyc0knheu7jRBUd5Vtva5Bj7GBypzCQ9BFmFYGXzLcFBYGA@mail.gmail.com>	<CAAkRFZ+4RfHszKh50DW1wKSis0wo3516Hy8626FNpYwgGWdABQ@mail.gmail.com>	<20120410084614.GJ6148@sunsite.ms.mff.cuni.cz>	<CAAkRFZKFu234Q7+Rm+DRpDJPe9Rr0jtxh26sZWVVeEE1mCDfwg@mail.gmail.com>	<20120410163905.GK6148@sunsite.ms.mff.cuni.cz>	<CAGqM8fYoo9=mEjCJeY92y9FGLqoBqHg4KStkyEGUvV18My9YpA@mail.gmail.com>	<CAPOVtOvvZK43aCqEKbcoaC=97EWL14N-b+Q2sUYjT8pjow2ySg@mail.gmail.com>
Date: Thu, 12 Apr 2012 10:30:00 -0000
Message-ID: <CAFiYyc2npQNerfv2NemZpsKVhxm0Pp8Ca5iVsF+G4+p45bktog@mail.gmail.com>
Subject: Re: Switching to C++ by default in 4.8
From: Richard Guenther <richard.guenther@gmail.com>
To: Chiheng Xu <chiheng.xu@gmail.com>
Cc: Lawrence Crowl <crowl@google.com>, Jakub Jelinek <jakub@redhat.com>, 	Xinliang David Li <davidxl@google.com>, Bernd Schmidt <bernds@codesourcery.com>, 	Gabriel Dos Reis <gdr@integrable-solutions.net>, David Edelsohn <dje.gcc@gmail.com>, 	Diego Novillo <dnovillo@google.com>, gcc <gcc@gcc.gnu.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2012-04/txt/msg00482.txt.bz2

On Thu, Apr 12, 2012 at 11:28 AM, Chiheng Xu <chiheng.xu@gmail.com> wrote:
>
> The reason why GCC's code is very hard to hack is not simple. In part,
> this is because GCC uses a very old, extremely hard to understand build
> system. In part, this is because GCC developers are more focused on
> fixing bugs or adding new features than on refactoring GCC's
> code itself. For example, for a .c file that is 15 years old,
> people tend to keep fixing its bugs, making it more and more ugly,
> rather than rewrite it.
>
> But I think the big reason is that GCC tends to have extremely large
> .c files, typically > 6000 LOC. If you look at LLVM, there are
> rarely source code files that are > 2000 LOC. Typical LLVM source code
> files have 1000~2000 LOC. Just splitting a source code file of 6000
> LOC into several small files or file sections of 1000 LOC can improve
> the code significantly. Why has this not been done before? GCC
> developers being reluctant to refactor their code may be the reason.
> And, as a .c file grows, it becomes even harder to refactor.
> Thinking in C++ can help you write smaller, easier to understand,
> easier to maintain code (C or C++), which has high cohesion and low
> coupling.
>
> And I think the file names of GCC's source could also be made more
> friendly to newbies; using some notion of FQN (fully qualified name)
> may be good.

I think one of the reasons is a tools deficiency - at least subversion (which
we use) is not able to track code motion, so if you dig in the revision
history you will need more intermediate steps and, more importantly, have to
rely on second-level information (like the ChangeLog entry) to tell where a
function was moved from.

Still, some refactoring happens (I think mostly trying to remove APIs is
important).  But yes, I think we never renamed files ... I suppose when we
start moving things into sub-directories that would be a good time to re-think
names.  At least subversion can handle file-renames just fine ;)

Yes, files are too big - but splitting them is not easy unless you can figure
out a hierarchy that you can expose.  The largest file is dwarf2out.c with
22825 lines, but the average is more like 2000 (just looking at gcc/*.c
files).  There are only 23 files bigger than 6000 lines (out of 356), so the
situation is not as bad as you paint it.  But yes, looking at a filename
hardly tells you about the file's contents anymore.
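
For anyone who wants to check figures like these against their own checkout,
here is a rough sketch of one way to count them.  It is not the method used
for the numbers above; the function name line_count_stats and the default
directory "gcc" are made up for illustration, and it only looks at files
directly inside the given directory (no recursion), matching the gcc/*.c
scope.

```python
from pathlib import Path
from statistics import mean


def line_count_stats(directory, pattern="*.c", threshold=6000):
    """Summarise line counts of files matching `pattern` directly in `directory`.

    Hypothetical helper for illustration only; returns None if nothing matches.
    """
    counts = {}
    for path in Path(directory).glob(pattern):
        if path.is_file():
            # Read as bytes so odd encodings in old sources cannot raise.
            with path.open("rb") as handle:
                counts[path.name] = sum(1 for _ in handle)
    if not counts:
        return None
    largest = max(counts, key=counts.get)
    return {
        "files": len(counts),
        "average": mean(counts.values()),
        "largest": (largest, counts[largest]),
        "over_threshold": sum(1 for n in counts.values() if n > threshold),
    }


if __name__ == "__main__":
    import sys

    # Assumed default: run from the top of a source tree with a gcc/ directory.
    print(line_count_stats(sys.argv[1] if len(sys.argv) > 1 else "gcc"))
```

The result will differ from revision to revision, so treat any output as a
snapshot of the tree you ran it on.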

Richard.