Date: Fri, 14 Dec 2018 14:15:00 -0000
From: Giuliano Belinassi
To: "Bin.Cheng"
Cc: Richard Guenther, GCC Development, kernel-usp@googlegroups.com, gold@ime.usp.br, alfredo.goldman@gmail.com
Subject: Re: Parallelize the compilation using Threads
Message-ID: <20181214141518.get7oqqqpjmm7cnk@smtp.gmail.com>

Hi,

See comments inline.

On 12/13, Bin.Cheng wrote:
> On Wed, Dec 12, 2018 at 11:46 PM Giuliano Augusto Faulin Belinassi wrote:
> >
> > Hi, I have some news. :-)
> >
> > I replicated Martin Liška's experiment [1] on a 64-core machine for
> > gcc [2] and the Linux kernel [3] (the Linux kernel build was fully
> > parallelized), and I am excited to dive into this problem. As a
> > result, I want to propose a GSoC project on this issue, starting
> > with something like:
> > 1- Systematically create a benchmark for easy information gathering.
> > Martin Liška already made the first version of it, but I need to
> > improve it.
> > 2- Find and document the global states (and try to reduce GCC's
> > global states as well).
> > 3- Define the parallelization strategy.
> > 4- First parallelization attempt.
>
> Hi Giuliano,
>
> Thanks very much for working on this. It could be very useful; for
> example, one bottleneck we have is slow compilation of big single
> source files after intensively using distributed compilation. Of
> course, a good parallelization strategy is needed.

Interesting. How many lines does the generated file have? Does it use
C++ templates?
The generated gimple-match.c file, for example, has 98786 lines and
takes about 30s to compile.

> Thanks,
> bin
> >
> > I also proposed this issue as a research project to my advisor and
> > he supported me on this idea. So I can work for at least one year
> > on this, and other things related to it.
> >
> > Would anyone be willing to mentor me on this?
> >
> > [1] https://gcc.gnu.org/bugzilla/attachment.cgi?id=43440
> > [2] https://www.ime.usp.br/~belinass/64cores-experiment.svg
> > [3] https://www.ime.usp.br/~belinass/64cores-kernel-experiment.svg
> >
> > On Mon, Nov 19, 2018 at 8:53 AM Richard Biener wrote:
> > >
> > > On Fri, Nov 16, 2018 at 8:00 PM Giuliano Augusto Faulin Belinassi wrote:
> > > >
> > > > Hi! Sorry for the late reply again :P
> > > >
> > > > On Thu, Nov 15, 2018 at 8:29 AM Richard Biener wrote:
> > > > >
> > > > > On Wed, Nov 14, 2018 at 10:47 PM Giuliano Augusto Faulin Belinassi wrote:
> > > > > >
> > > > > > As a brief introduction, I am a graduate student that got
> > > > > > interested in the "Parallelize the compilation using threads"
> > > > > > project (GSoC 2018 [1]). I am a newcomer in GCC, but I have
> > > > > > already sent some patches, some of which have already been
> > > > > > accepted [2].
> > > > > >
> > > > > > I brought this subject up on IRC, but maybe here is a proper
> > > > > > place to discuss this topic.
> > > > > >
> > > > > > From my point of view, parallelizing GCC itself will only
> > > > > > speed up the compilation of projects which have a big file
> > > > > > that creates a bottleneck in the whole project compilation
> > > > > > (note: by big, I mean the amount of code to generate).
> > > > >
> > > > > That's true. During GCC bootstrap there are some of those (see
> > > > > PR84402).
> > > > >
> > > > > One way to improve parallelism is to use link-time optimization
> > > > > where even single source files can be split up into multiple
> > > > > link-time units. But then there's the serial whole-program
> > > > > analysis part.
> > > >
> > > > Did you mean this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 ?
> > > > That is a lot of data :-)
> > > >
> > > > It seems that 'phase opt and generate' is the most time-consuming
> > > > part. Is that the 'GIMPLE optimization pipeline' you were talking
> > > > about in this thread:
> > > > https://gcc.gnu.org/ml/gcc/2018-03/msg00202.html
> > >
> > > It's everything that comes after the frontend parsing bits, thus
> > > this includes in particular RTL optimization and early GIMPLE
> > > optimizations.
> > >
> > > > > > Additionally, I know that GCC must not change the project
> > > > > > layout, but from the software engineering perspective, this
> > > > > > may be a bad smell that indicates that the file should be
> > > > > > broken into smaller files. Finally, the Makefiles will take
> > > > > > care of the parallelization task.
> > > > >
> > > > > What do you mean by GCC must not change the project layout?
> > > > > GCC happily re-orders functions and link-time optimization
> > > > > will reorder TUs (well, linking may as well).
> > > >
> > > > That was a response to a comment made on IRC:
> > > >
> > > > On Thu, Nov 15, 2018 at 9:44 AM Jonathan Wakely wrote:
> > > > > I think this is in response to a comment I made on IRC.
> > > > > Giuliano said that if a project has a very large file that
> > > > > dominates the total build time, the file should be split up
> > > > > into smaller pieces. I said "GCC can't restructure people's
> > > > > code. It can only try to compile it faster".
> > > > > We weren't referring to code transformations in the compiler
> > > > > like re-ordering functions, but physically refactoring the
> > > > > source code.
> > > >
> > > > Yes. But from one of the attachments from PR84402, it seems that
> > > > such files exist in GCC:
> > > > https://gcc.gnu.org/bugzilla/attachment.cgi?id=43440
> > > >
> > > > > > My questions are:
> > > > > >
> > > > > > 1. Is there any project whose compilation will be
> > > > > > significantly improved if GCC runs in parallel? Does anyone
> > > > > > have data about something related to that? How about the
> > > > > > Linux kernel? If not, I can try to bring some.
> > > > >
> > > > > We do not have any data about this apart from experiments
> > > > > with splitting up source files for PR84402.
> > > > >
> > > > > > 2. Did I correctly understand the goal of the
> > > > > > parallelization? Can anyone provide extra details to me?
> > > > >
> > > > > You may want to search the mailing list archives since we had
> > > > > a student application (later revoked) for the task with some
> > > > > discussion.
> > > > >
> > > > > In my view (I proposed the thing) the most interesting parts
> > > > > are getting GCC's global state documented and reduced. The
> > > > > parallelization itself is an interesting experiment but
> > > > > whether there will be any substantial improvement for builds
> > > > > that can already benefit from make parallelism remains a
> > > > > question.
> > > >
> > > > While I agree that documenting GCC's global states is good for
> > > > the community and the development of GCC, I really don't think
> > > > this is a good motivation for parallelizing a compiler from a
> > > > research standpoint.
> > >
> > > True ;) Note that my suggestions to the other GSoC student were
> > > purely based on where it's easiest to experiment with
> > > parallelization and not where it would be most beneficial.
> > >
> > > > There must be something or someone that could take advantage of
> > > > the fine-grained parallelism. But that data from PR84402 seems
> > > > to have the answer to it. :-)
> > > >
> > > > On Thu, Nov 15, 2018 at 4:07 PM Szabolcs Nagy wrote:
> > > > >
> > > > > On 15/11/18 10:29, Richard Biener wrote:
> > > > > > In my view (I proposed the thing) the most interesting parts
> > > > > > are getting GCC's global state documented and reduced. The
> > > > > > parallelization itself is an interesting experiment but
> > > > > > whether there will be any substantial improvement for builds
> > > > > > that can already benefit from make parallelism remains a
> > > > > > question.
> > > > >
> > > > > In the common case (a project with many small files, many more
> > > > > than the core count) I'd expect a regression:
> > > > >
> > > > > If gcc itself tries to parallelize, that introduces
> > > > > inter-thread synchronization and potential false sharing in
> > > > > gcc (e.g. malloc locks) that does not exist with make
> > > > > parallelism (glibc can avoid some atomic instructions when a
> > > > > process is single threaded).
> > > >
> > > > That is what I am mostly worried about. Or that the most costly
> > > > part is not parallelizable at all. Also, I would expect a
> > > > regression on very small files, which probably could be avoided
> > > > by implementing this feature as a flag?
> > >
> > > I think the issue should be avoided by avoiding fine-grained
> > > parallelism. Which might be somewhat hard given there are core
> > > data structures that are shared (the memory allocator for a
> > > start).
> > >
> > > The other issue I am more worried about is that we probably have
> > > to interact with make somehow so that we do not end up with 64
> > > threads when one does -j8 on an 8-core machine. That's basically
> > > the same issue we run into with -flto and its threaded WPA
> > > writeout or recursive invocation of make.
> > >
> > > > On Fri, Nov 16, 2018 at 11:05 AM Martin Jambor wrote:
> > > > >
> > > > > Hi Giuliano,
> > > > >
> > > > > On Thu, Nov 15 2018, Richard Biener wrote:
> > > > > > You may want to search the mailing list archives since we
> > > > > > had a student application (later revoked) for the task with
> > > > > > some discussion.
> > > > >
> > > > > Specifically, the whole thread beginning with
> > > > > https://gcc.gnu.org/ml/gcc/2018-03/msg00179.html
> > > > >
> > > > > Martin
> > > >
> > > > Yes, I will research this carefully ;-)
> > > >
> > > > Thank you
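
[Editorial sketch, appended to this archived message] To make the point
above about cooperating with "make -j" concrete: GNU make advertises its
job-token pipe to child processes through MAKEFLAGS, and a threaded
compiler process could acquire one token per extra worker thread and
return the tokens when it finishes, so "make -j8" plus in-process threads
never oversubscribe the machine. This is not GCC code and not the
project's chosen design; the helper name, the worker cap, and the
fallback behaviour are assumptions for illustration, and only the classic
"--jobserver-auth=R,W" pipe form is handled (make 4.4 may instead pass a
"fifo:PATH" form, which this sketch simply ignores).

/* Illustrative sketch only (not GCC code): one way a threaded compiler
   process could cooperate with the GNU make jobserver so that "make -j8"
   plus in-process worker threads do not oversubscribe the machine.  */

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Find the jobserver pipe fds in MAKEFLAGS, e.g. "--jobserver-auth=3,4"
   (older makes use --jobserver-fds=3,4).  Returns 1 on success.  */
static int jobserver_fds (int *rfd, int *wfd)
{
  const char *flags = getenv ("MAKEFLAGS");
  const char *p;

  if (!flags)
    return 0;
  p = strstr (flags, "--jobserver-auth=");
  if (!p)
    p = strstr (flags, "--jobserver-fds=");
  if (!p)
    return 0;
  return sscanf (strchr (p, '=') + 1, "%d,%d", rfd, wfd) == 2;
}

int main (void)
{
  int rfd, wfd, extra = 0, i;
  char tokens[7];               /* arbitrary cap on extra workers */

  if (!jobserver_fds (&rfd, &wfd))
    {
      puts ("no jobserver: compile with a single thread");
      return 0;
    }

  /* Simplification: make the read non-blocking so we only take tokens
     that are free right now.  (The fd is shared with make and its other
     children, so a real implementation would be more careful here.)  */
  fcntl (rfd, F_SETFL, fcntl (rfd, F_GETFL) | O_NONBLOCK);

  /* One job slot is implicitly owned by this process; each *extra*
     worker thread needs one token (one byte) from the pipe.  */
  while (extra < (int) sizeof tokens && read (rfd, &tokens[extra], 1) == 1)
    extra++;

  printf ("would run with 1 + %d worker threads\n", extra);

  /* ... the actual parallel compilation work would go here ... */

  /* Every acquired token must be written back when the work is done.  */
  for (i = 0; i < extra; i++)
    write (wfd, &tokens[i], 1);

  return 0;
}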