Date: Tue, 05 Jul 2011 16:59:00 -0000
From: Dimitrios Apostolou
To: Philip Herron
cc: gcc@gcc.gnu.org
Subject: Re: GSOC - Student Roundup

Hi Philip,

thanks for writing up your experiences, I found them very useful. I certainly like the idea of having such a thread every once in a while, just to keep everyone updated about our projects. I'm also curious to learn about the experiences of other students who are writing code for GCC for the first time. Here goes mine:

I am Dimitrios Apostolou (jimis on IRC) and I live in Heraklion-Crete, Greece. My project concerns making GCC leaner and faster.

Reading the GCC codebase has been a hard exercise for me. In fact it's the only project I know of that becomes more and more difficult as time passes... I will try to describe some of the major hurdles I've faced so far.

I started by profiling the execution of cc1, the C compiler.
In general I could find no big hot spot; it was in good shape, but I could see 3-4 areas that could make some difference if improved (for example hash tables, assembly output, the C parser, bitmaps). But diving in and trying to change things is a completely different story. Minor tweaks are easy to make, but usually have minor impact. If you want to see a bigger speedup you have to break the interface of functions that are used in hundreds of places, and that is hard. Sometimes it was impossible for me: I was getting crashes in places far away from the code I had changed, so I ended up reverting to the original versions.

Spending some time with a specific part of GCC's codebase gives you the ability to dive deeper and work more efficiently. But that is the point when I have usually achieved something and must move on to some other part. And the whole GCC codebase is so huge that understanding one part means nothing when you move to another. My advice here is that, if your project permits it, touch as little code as possible in GCC and become really proficient with that. Treat the rest as a black box, or you'll spend too much time trying to understand everything.

Another hurdle is the use of too many macros. Even if they exist to make the code easier to read, I can't see how they achieve this in a few extreme cases. I have had gdb expand macros to 20 full lines on a wide screen. Plus the profiler can't actually profile code inside macros, so the impact of some data structures on performance is hidden that way. My moments of greatest awe/horror so far have been while changing things in vectors (vec.[ch]), which is actually a fully templated structure implemented in CPP (the C preprocessor)!

Finally, I believe that some parts of the compiler should have a big NO-ENTRY flag for beginners. In my case, after having improved little stuff in assembly output and hash tables, I decided, driven by the profiler's output, to try improving things in the dataflow analysis part of GCC.
It's true that there is much to be improved there, but it requires a good understanding of this complex part. Three weeks later I am still striving to change simple stuff and jump to the next part, but regressions I've introduced don't allow me to do so yet. My understanding of this part is still basic; I've only just scratched the surface of dataflow analysis. If I had had this knowledge at the beginning, I'd probably have left that part for the end of the summer, if at all. My plans included visiting IRA (the register allocator) next, but I think I'll skip directly to the C parser, which I understand better.

These are my major difficulties with GCC, and I'm curious to learn about other students' experiences so far. Of course, don't get the wrong impression: my general feeling about GCC development is positive, and the community is helpful and really friendly in spite of my daily spamming on IRC. :-p In the end I feel that the fact that GCC is a multi-headed monster makes it even more exciting to try and tame it.

Good luck to everyone with their projects,
Dimitris