Re: [GSoC 2020] Automatic Detection of Parallel Compilation Viability

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Biener <rguenther@suse.de>
To: Giuliano Belinassi <giuliano.belinassi@usp.br>
Cc: gcc@gcc.gnu.org, mjambor@suse.cz, hubicka@ucw.cz
Subject: Re: [GSoC 2020] Automatic Detection of Parallel Compilation Viability
Date: Tue, 24 Mar 2020 08:20:43 +0100 (CET)	[thread overview]
Message-ID: <nycvar.YFH.7.76.2003240820340.5137@zhemvz.fhfr.qr> (raw)
In-Reply-To: <20200324003712.6mskkje7dw32isz2@smtp.gmail.com>

On Mon, 23 Mar 2020, Giuliano Belinassi wrote:

> Hi, Richi
> 
> On 03/18, Richard Biener wrote:
> > On Tue, 17 Mar 2020, Giuliano Belinassi wrote:
> > 
> > > Hi, all
> > > 
> > > I have applied some revews to the project. Please see the new proposal
> > > here:
> > 
> > Looks good, some editorial changes below
> > 
> > > https://www.ime.usp.br/~belinass/Automatic_Detection_of_Parallel_Compilation_Viability.pdf
> > > 
> > > **Automatic Detection of Parallel Compilation Viability**
> > > 
> > > [Giuliano Belinassi]{style="color: darkgreen"}\
> > > Timezone: GMT$-$3:00\
> > > University of São Paulo -- Brazil\
> > > IRC: giulianob in \#gcc\
> > > Email: [`giuliano.belinassi@usp.br`](mailto:giuliano.belinassi@usp.br)\
> > > Github: <https://github.com/giulianobelinassi/>\
> > > Date:
> > > 
> > > About Me Computer Science Bachelor (University of São Paulo), currently
> > > pursuing a Masters Degree in Computer Science at the same institution.
> > > I've always been fascinated by topics such as High-Performance Computing
> > > and Code Optimization, having worked with a parallel implementation of a
> > > Boundary Elements Method software in GPU. I am currently conducting
> > > research on compiler parallelization and developing the
> > > [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc) project, having
> > > already presented it in [GNU Cauldron
> > > 2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).
> > > 
> > > **Skills**: Strong knowledge in C, Concurrency, Shared Memory
> > > Parallelism, Multithreaded Debugging and other typical programming
> > > tools.
> > > 
> > > Brief Introduction
> > > 
> > > In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that
> > > parallelizing the Intra Procedural optimizations improves speed when
> > > compiling huge files by a factor of 1.8x in a 4 cores machine, and also
> > > showed that this takes 75% of compilation time.
> > > 
> > > In this project we plan to use the LTO infrastructure to improve
> > > compilation performance in the non-LTO case, with a tradeoff of
> > > generating a binary as good as if LTO is disabled. Here, we will
> > > automatically detect when a single file will benefit from parallelism,
> > > and procceed with the compilation in parallel if so.
> > > 
> > > Use of LTO
> > > 
> > > The Link Time Optimization (LTO) is a compilation technique that allows
> > > the compiler to analyse the program as a whole, instead of analysing and
> > > compiling one file at time. Therefore, LTO is able to collect more
> > > information about the program and generate a better optimization plan.
> > > LTO is divided in three parts:
> > > 
> > > -   *LGEN (Local Generation)*: Each file is translated to GIMPLE. This
> > >     stage runs sequentially in each file and, therefore, in parallel in
> > >     the project compilation.
> > > 
> > > -   *WPA (Whole Program Analysis)*: Run the Inter Procedural Analysis
> > >     (IPA) in the entire program. This state runs serially in the
> > >     project.
> > > 
> > > -   *LTRANS (Local Transformation)*: Execute all Intra Procedural
> > >     Optimizations in each partition. This stage runs in parallel.
> > > 
> > > Since WPA can bottleneck the compilation because it runs serially in the
> > > entire project, LTO was designed to produce faster binaries, not to
> > > produce binaries fast.
> > > 
> > > Here, the proposed use of LTO to address this problem is to run the IPA
> > > for each Translation Unit (TU), instead in the Whole Program, and
> > 
> > This proposal is to use LTO to produce binaries fast by running
> > the IPA phase separately for each Translation Unit (TU), instead of on the 
> > Whole Program and ...
> > 
> > > automatically detect when to partition the TU into multiple LTRANS to
> > > improve compilation performance. The advantage of this approach is:
> > > 
> > > -   It can generate binaries as good as when LTO is disabled.
> > > 
> > > -   It is faster, as we can partition big files into multiple partitions
> > >     and compile these partitions in parallel.
> > > 
> > > -   It can interact with GNU Make Jobserver, improving CPU utilization.
> > 
> > This reads a bit odd, regular compilation already interacts with the
> > GNU Make Jobserver.  I'd reorder and reword it w/o dashes like
> > 
> > We can partition big files into multiple partitions and compile these 
> > partitions in parallel which should improve CPU utilization by exposing
> > smaller chunks to the GNU Make Jobserver.  Code generation quality
> > should be unaffected by this.
> 
> How about:
> 
> ```
> The advantage of this approach is: by partitioning big files into
> multiple partitions, we can improve the compilation performance by
> exposing these partitions to the Jobserver. Therefore, it can improve
> CPU utilization in manycore machines.  Generated code quality should be
> unaffected by this procedure, which means that it should run as fast as
> when LTO is disabled.
> ```
> ?

Sounds great.

Richard.

> > 
> > Thanks,
> > Richard.
> > 
> > > Planned Tasks
> > > 
> > > I plan to use the GSoC time to develop the following topics:
> > > 
> > > -   Week \[1, 3\] -- April 27 to May 15:\
> > >     Update `cc1`, `cc1plus`, `f771`, ..., to partition the Compilation
> > >     Unit (CU) after IPA analysis directly into multiple LTRANS
> > >     partitions, instead of generating a temporary GIMPLE file, and to
> > >     accept a additional parameter `-fsplit-outputs=<tempfile>`, in which
> > >     the generated ASM filenames will be written to.
> > > 
> > >     There are two possible cases in which I could work on:
> > > 
> > >     1.  *Fork*: After the CU is partitioned into multiple LTRANS, then
> > >         `cc1` will fork and compile these partitions, each of them
> > >         generating a ASM file, and write the generated asm name into
> > >         `<tempfile>`. Note that if the number of partitions is one, then
> > >         this part is not necessary.
> > > 
> > >     2.  *Stream LTRANS IR*: After CU is partitionated into multiple
> > >         LTRANS, then `cc1` will write these partitions into disk so that
> > >         LTO can read these files and proceed as a standard LTO operation
> > >         in order to generate a partially linked object file.
> > > 
> > >     1\. Has the advantage of having less overhead than 2., as there is less
> > >     IO operations, however it may be hard to implement as the assembler file
> > >     may be already opened before forking, so caution is necessary to make
> > >     sure that there are a 1 - 1 relationship between assembler file and the
> > >     compilation process. 2. on the other hand can easily interact with the
> > >     GNU jobserver.
> > > 
> > > -   Week \[4, 7\] -- May 18 to June 12:\
> > >     Update the `gcc` driver to take each file in `<tempfile>`, then
> > >     assemble and partially link them together. Here, an important
> > >     optimization is to use a named pipe in `<tempfile>` to avoid having
> > >     to wait every partition to end its compilation before assembling the
> > >     files.
> > > 
> > > -   Week 8 -- June 15 to 19: **First Evaluation**\
> > >     Deliver a non-optimized version of the project. Some programs ought
> > >     to be compiled correctly, but probably there will be a huge overhead
> > >     because so far there is no way of interacting with GNU Jobserver.
> > > 
> > > -   Week \[9, 11\] -- June 22 to July 10:\
> > >     Work on GNU Make Jobserver integration. A way of doing this is to
> > >     adapt the LTO WPA -> LTRANS way of interacting with
> > >     Jobserver. Another way is to make the forked `cc1` consume Jobserver
> > >     tokens until the compilation finishes, then return the token when
> > >     done.
> > > 
> > > -   Week 12 -- July 13 to 17: **Second Evaluation**\
> > >     Deliver a more optimized version of the project. Here we should
> > >     filter files that would compile fast from files that would require
> > >     partitioning, and interact with GNU Jobserver. Therefore we should
> > >     see some speedup.
> > > 
> > > -   Week \[13, 15\] -- July 20 to August 10:\
> > >     Develop adequate tests coverage and address unexpected issues so
> > >     that this feature can be merged to trunk for the next GCC release.
> > > 
> > > -   Week 16: **Final evaluation**\
> > >     Deliver the final product as a series of patches for trunk.
> > > 
> > > On 03/13, Giuliano Belinassi wrote:
> > > > Hi, all
> > > > 
> > > > I want to propose and apply for the following GSoC project: Automatic
> > > > Detection of Parallel Compilation Viability.
> > > > 
> > > > https://www.ime.usp.br/~belinass/Automatic_Detection_of_Parallel_Compilation_Viability.pdf
> > > > 
> > > > Feedback is welcome :)
> > > > 
> > > > Here is a markdown version of it:
> > > > 
> > > > **Automatic Detection of Parallel Compilation Viability**
> > > > 
> > > > [Giuliano Belinassi]{style="color: darkgreen"}\
> > > > Timezone: GMT$-$3:00\
> > > > University of São Paulo -- Brazil\
> > > > IRC: giulianob in \#gcc\
> > > > Email: [`giuliano.belinassi@usp.br`](mailto:giuliano.belinassi@usp.br)\
> > > > Github: <https://github.com/giulianobelinassi/>\
> > > > Date:
> > > > 
> > > > About Me Computer Science Bachelor (University of São Paulo), currently
> > > > pursuing a Masters Degree in Computer Science at the same institution.
> > > > I've always been fascinated by topics such as High-Performance Computing
> > > > and Code Optimization, having worked with a parallel implementation of a
> > > > Boundary Elements Method software in GPU. I am currently conducting
> > > > research on compiler parallelization and developing the
> > > > [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc) project, having
> > > > already presented it in [GNU Cauldron
> > > > 2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).
> > > > 
> > > > **Skills**: Strong knowledge in C, Concurrency, Shared Memory
> > > > Parallelism, Multithreaded Debugging and other typical programming
> > > > tools.
> > > > 
> > > > Brief Introduction
> > > > 
> > > > In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that
> > > > parallelizing the Intra Procedural optimizations improves speed when
> > > > compiling huge files by a factor of 1.8x in a 4 cores machine, and also
> > > > showed that this takes 75% of compilation time.
> > > > 
> > > > In this project we plan to use the LTO infrastructure to improve
> > > > compilation performance in the non-LTO case, with a tradeoff of
> > > > generating a binary as good as if LTO is disabled. Here, we will
> > > > automatically detect when a single file will benefit from parallelism,
> > > > and proceed with the compilation in parallel if so.
> > > > 
> > > > Use of LTO
> > > > 
> > > > The Link Time Optimization (LTO) is a compilation technique that allows
> > > > the compiler to analyse the program as a whole, instead of analysing and
> > > > compiling one file at time. Therefore, LTO is able to collect more
> > > > information about the program and generate a better optimization plan.
> > > > LTO is divided in three parts:
> > > > 
> > > > -   *LGEN (Local Generation)*: Each file is translated to GIMPLE. This
> > > >     stage runs sequentially in each file and, therefore, in parallel in
> > > >     the project compilation.
> > > > 
> > > > -   *WPA (Whole Program Analysis)*: Run the Inter Procedural Analysis
> > > >     (IPA) in the entire program. This state runs serially in the
> > > >     project.
> > > > 
> > > > -   *LTRANS (Local Transformation)*: Execute all Intra Procedural
> > > >     Optimizations in each partition. This stage runs in parallel.
> > > > 
> > > > Since WPA can bottleneck the compilation because it runs serially in the
> > > > entire project, LTO was designed to produce faster binaries, not to
> > > > produce binaries fast.
> > > > 
> > > > Here, the proposed use of LTO to address this problem is to run the IPA
> > > > for each Translation Unit (TU), instead in the Whole Program, and
> > > > automatically detect when to partition the TU into multiple LTRANS to
> > > > improve performance. The advantage of this approach is:
> > > > 
> > > > -   It can generate binaries as good as when LTO is disabled.
> > > > 
> > > > -   It is faster, as we can partition big files into multiple partitions
> > > >     and compile these partitions in parallel
> > > > 
> > > > -   It can interact with GNU Make Jobserver, improving CPU utilization.
> > > > 
> > > > Planned Tasks
> > > > 
> > > > I plan to use the GSoC time to develop the following topics:
> > > > 
> > > > -   Week \[1, 3\] -- April 27 to May 15:\
> > > >     Update `cc1`, `cc1plus`, `f771`, ..., to partition the data after
> > > >     IPA analysis directly into multiple LTRANS partitions, instead of
> > > >     generating a temporary GIMPLE file.
> > > > 
> > > > -   Week \[4, 7\] -- May 18 to June 12:\
> > > >     Update the `gcc` driver to take these multiple LTRANS partitions,
> > > >     then call the compiler and assembler for each of them, and merge the
> > > >     results into one object file. Here I will use the LTO LTRANS object
> > > >     streaming, therefore it should interact with GNU Make Jobserver.
> > > > 
> > > > -   Week 8 -- June 15 to 19: **First Evaluation**\
> > > >     Deliver a non-optimized version of the project. Some programs ought
> > > >     to be compiled correctly, but probably there will be a huge overhead
> > > >     because so far there will not be any criteria about when to
> > > >     partition. Some tests are also planned for this evaluation.
> > > > 
> > > > -   Week \[9, 11\] -- June 22 to July 10:\
> > > >     Implement a criteria about when to partition, and interactively
> > > >     improve it based on data.
> > > > 
> > > > -   Week 12 --- July 13 to 17: **Second Evaluation**\
> > > >     Deliver a more optimized version of the project. Here we should
> > > >     filter files that would compile fast from files that would require
> > > >     partitioning, and therefore we should see some speedup.
> > > > 
> > > > -   Week \[13, 15\] --- July 20 to August 10:\
> > > >     Develop adequate tests coverage and address unexpected issues so
> > > >     that this feature can be merged to trunk for the next GCC release.
> > > > 
> > > > -   Week 16: **Final evaluation**\
> > > >     Deliver the final product as a series of patches for trunk.
> > > > 
> > > 
> > 
> > -- 
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 
> Thank you,
> Giuliano.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

next prev parent reply	other threads:[~2020-03-24  7:20 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-13 20:15 Giuliano Belinassi
2020-03-17 20:24 ` Giuliano Belinassi
2020-03-18 14:27   ` Richard Biener
2020-03-24  0:37     ` Giuliano Belinassi
2020-03-24  7:20       ` Richard Biener [this message]
2020-03-24 20:54         ` Giuliano Belinassi
     [not found] <20200313200551.viqhqgjw3gixjarw@smtp.gmail.com>
2020-03-16 13:08 ` Richard Biener
2020-03-17 20:04   ` Giuliano Belinassi
2020-03-18 11:44     ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.YFH.7.76.2003240820340.5137@zhemvz.fhfr.qr \
    --to=rguenther@suse.de \
    --cc=gcc@gcc.gnu.org \
    --cc=giuliano.belinassi@usp.br \
    --cc=hubicka@ucw.cz \
    --cc=mjambor@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).