public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: Giuliano Belinassi <giuliano.belinassi@usp.br>
To: gcc@gcc.gnu.org
Subject: [GSoC 2020] Automatic Detection of Parallel Compilation Viability
Date: Fri, 13 Mar 2020 17:15:35 -0300	[thread overview]
Message-ID: <20200313201535.4hguknuxjejpt6f3@smtp.gmail.com> (raw)

Hi, all

I want to propose and apply for the following GSoC project: Automatic
Detection of Parallel Compilation Viability.

https://www.ime.usp.br/~belinass/Automatic_Detection_of_Parallel_Compilation_Viability.pdf

Feedback is welcome :)

Here is a markdown version of it:

**Automatic Detection of Parallel Compilation Viability**

[Giuliano Belinassi]{style="color: darkgreen"}\
Timezone: GMT$-$3:00\
University of São Paulo -- Brazil\
IRC: giulianob in \#gcc\
Email: [`giuliano.belinassi@usp.br`](mailto:giuliano.belinassi@usp.br)\
Github: <https://github.com/giulianobelinassi/>\
Date:

About Me Computer Science Bachelor (University of São Paulo), currently
pursuing a Masters Degree in Computer Science at the same institution.
I've always been fascinated by topics such as High-Performance Computing
and Code Optimization, having worked with a parallel implementation of a
Boundary Elements Method software in GPU. I am currently conducting
research on compiler parallelization and developing the
[ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc) project, having
already presented it in [GNU Cauldron
2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).

**Skills**: Strong knowledge in C, Concurrency, Shared Memory
Parallelism, Multithreaded Debugging and other typical programming
tools.

Brief Introduction

In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that
parallelizing the Intra Procedural optimizations improves speed when
compiling huge files by a factor of 1.8x in a 4 cores machine, and also
showed that this takes 75% of compilation time.

In this project we plan to use the LTO infrastructure to improve
compilation performance in the non-LTO case, with a tradeoff of
generating a binary as good as if LTO is disabled. Here, we will
automatically detect when a single file will benefit from parallelism,
and proceed with the compilation in parallel if so.

Use of LTO

The Link Time Optimization (LTO) is a compilation technique that allows
the compiler to analyse the program as a whole, instead of analysing and
compiling one file at time. Therefore, LTO is able to collect more
information about the program and generate a better optimization plan.
LTO is divided in three parts:

-   *LGEN (Local Generation)*: Each file is translated to GIMPLE. This
    stage runs sequentially in each file and, therefore, in parallel in
    the project compilation.

-   *WPA (Whole Program Analysis)*: Run the Inter Procedural Analysis
    (IPA) in the entire program. This state runs serially in the
    project.

-   *LTRANS (Local Transformation)*: Execute all Intra Procedural
    Optimizations in each partition. This stage runs in parallel.

Since WPA can bottleneck the compilation because it runs serially in the
entire project, LTO was designed to produce faster binaries, not to
produce binaries fast.

Here, the proposed use of LTO to address this problem is to run the IPA
for each Translation Unit (TU), instead in the Whole Program, and
automatically detect when to partition the TU into multiple LTRANS to
improve performance. The advantage of this approach is:

-   It can generate binaries as good as when LTO is disabled.

-   It is faster, as we can partition big files into multiple partitions
    and compile these partitions in parallel

-   It can interact with GNU Make Jobserver, improving CPU utilization.

Planned Tasks

I plan to use the GSoC time to develop the following topics:

-   Week \[1, 3\] -- April 27 to May 15:\
    Update `cc1`, `cc1plus`, `f771`, ..., to partition the data after
    IPA analysis directly into multiple LTRANS partitions, instead of
    generating a temporary GIMPLE file.

-   Week \[4, 7\] -- May 18 to June 12:\
    Update the `gcc` driver to take these multiple LTRANS partitions,
    then call the compiler and assembler for each of them, and merge the
    results into one object file. Here I will use the LTO LTRANS object
    streaming, therefore it should interact with GNU Make Jobserver.

-   Week 8 -- June 15 to 19: **First Evaluation**\
    Deliver a non-optimized version of the project. Some programs ought
    to be compiled correctly, but probably there will be a huge overhead
    because so far there will not be any criteria about when to
    partition. Some tests are also planned for this evaluation.

-   Week \[9, 11\] -- June 22 to July 10:\
    Implement a criteria about when to partition, and interactively
    improve it based on data.

-   Week 12 --- July 13 to 17: **Second Evaluation**\
    Deliver a more optimized version of the project. Here we should
    filter files that would compile fast from files that would require
    partitioning, and therefore we should see some speedup.

-   Week \[13, 15\] --- July 20 to August 10:\
    Develop adequate tests coverage and address unexpected issues so
    that this feature can be merged to trunk for the next GCC release.

-   Week 16: **Final evaluation**\
    Deliver the final product as a series of patches for trunk.


             reply	other threads:[~2020-03-13 20:15 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-13 20:15 Giuliano Belinassi [this message]
2020-03-17 20:24 ` Giuliano Belinassi
2020-03-18 14:27   ` Richard Biener
2020-03-24  0:37     ` Giuliano Belinassi
2020-03-24  7:20       ` Richard Biener
2020-03-24 20:54         ` Giuliano Belinassi
     [not found] <20200313200551.viqhqgjw3gixjarw@smtp.gmail.com>
2020-03-16 13:08 ` Richard Biener
2020-03-17 20:04   ` Giuliano Belinassi
2020-03-18 11:44     ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200313201535.4hguknuxjejpt6f3@smtp.gmail.com \
    --to=giuliano.belinassi@usp.br \
    --cc=gcc@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).