From: Giuliano Belinassi <giuliano.belinassi@usp.br>
To: Richard Biener <rguenther@suse.de>
Cc: gcc@gcc.gnu.org, mjambor@suse.cz, hubicka@ucw.cz
Subject: Re: [GSoC 2020] Automatic Detection of Parallel Compilation Viability
Date: Tue, 17 Mar 2020 17:04:50 -0300 [thread overview]
Message-ID: <20200317200449.mdgono6renx3ssra@smtp.gmail.com> (raw)
In-Reply-To: <nycvar.YFH.7.76.2003161351130.5137@zhemvz.fhfr.qr>
Hi, Richi
Thank you for your review!
On 03/16, Richard Biener wrote:
> On Fri, 13 Mar 2020, Giuliano Belinassi wrote:
>
> > Hi, all
> >
> > I want to propose and apply for the following GSoC project: Automatic
> > Detection of Parallel Compilation Viability.
> >
> > Here is the proposal, and I am attaching a pdf file for better
> > readability:
> >
> > **Automatic Detection of Parallel Compilation Viability**
> >
> > [Giuliano Belinassi]{style="color: darkgreen"}\
> > Timezone: GMT$-$3:00\
> > University of São Paulo -- Brazil\
> > IRC: giulianob in \#gcc\
> > Email: [`giuliano.belinassi@usp.br`](mailto:giuliano.belinassi@usp.br)\
> > Github: <https://github.com/giulianobelinassi/>\
> >
> > About Me: Computer Science Bachelor (University of São Paulo), currently
> > pursuing a Masters Degree in Computer Science at the same institution.
> > I've always been fascinated by topics such as High-Performance Computing
> > and Code Optimization, having worked with a parallel implementation of a
> > Boundary Elements Method software in GPU. I am currently conducting
> > research on compiler parallelization and developing the
> > [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc) project, having
> > already presented it in [GNU Cauldron
> > 2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).
> >
> > **Skills**: Strong knowledge in C, Concurrency, Shared Memory
> > Parallelism, Multithreaded Debugging and other typical programming
> > tools.
> >
> > Brief Introduction
> >
> > In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that
> > parallelizing the Intra Procedural optimizations improves speed when
> > compiling huge files by a factor of 1.8x in a 4 cores machine, and also
> > showed that this takes 75% of compilation time.
> >
> > In this project we plan to use the LTO infrastructure to improve
> > compilation performance in the non-LTO case, with a tradeoff of
> > generating a binary as good as if LTO is disabled. Here, we will
> > automatically detect when a single file will benefit from parallelism,
> > and proceed with the compilation in parallel if so.
> >
> > Use of LTO
> >
> > The Link Time Optimization (LTO) is a compilation technique that allows
> > the compiler to analyse the program as a whole, instead of analysing and
> > compiling one file at time. Therefore, LTO is able to collect more
> > information about the program and generate a better optimization plan.
> > LTO is divided in three parts:
> >
> > - *LGEN (Local Generation)*: Each file is translated to GIMPLE. This
> > stage runs sequentially in each file and, therefore, in parallel in
> > the project compilation.
> >
> > - *WPA (Whole Program Analysis)*: Run the Inter Procedural Analysis
> > (IPA) in the entire program. This state runs serially in the
> > project.
> >
> > - *LTRANS (Local Transformation)*: Execute all Intra Procedural
> > Optimizations in each partition. This stage runs in parallel.
> >
> > Since WPA can bottleneck the compilation because it runs serially in the
> > entire project, LTO was designed to produce faster binaries, not to
> > produce binaries fast.
> >
> > Here, the proposed use of LTO to address this problem is to run the IPA
> > for each Translation Unit (TU), instead in the Whole Program, and
> > automatically detect when to partition the TU into multiple LTRANS to
> > improve performance. The advantage of this approach is:
>
> "to improve compilation performance"
>
> > - It can generate binaries as good as when LTO is disabled.
> >
> > - It is faster, as we can partition big files into multiple partitions
> > and compile these partitions in parallel
> >
> > - It can interact with GNU Make Jobserver, improving CPU utilization.
>
> The previous already improves CPU utilization, I guess GNU make jobserver
> integration avoids CPU overcommit.
>
> > Planned Tasks
> >
> > I plan to use the GSoC time to develop the following topics:
> >
> > - Week \[1, 3\] -- April 27 to May 15:\
> > Update `cc1`, `cc1plus`, `f771`, ..., to partition the data after
> > IPA analysis directly into multiple LTRANS partitions, instead of
> > generating a temporary GIMPLE file.
>
> To summarize in my own words:
>
> After IPA analysis partition the CU into possibly multiple LTRANS
> partitions even for non-LTO compilations. Invoke LTRANS compilation
> for partitions 2..n without writing intermediate IL through mechanisms
> like forking.
>
> I might say that you could run into "issues" here with asm_out_file
> already opened and partially written to. Possibly easier (but harder
> on the driver side) would be to stream LTO LTRANS IL for partitions
> 2..n and handle those like with regular LTO operation. But I guess
> I'd try w/o writing IL first and only if it turns out too difficult
> go the IL writing way.
Ok. I changed the application text based on that.
>
> > - Week \[4, 7\] -- May 18 to June 12:\
> > Update the `gcc` driver to take these multiple LTRANS partitions,
> > then call the compiler and assembler for each of them, and merge the
> > results into one object file. Here I will use the LTO LTRANS object
> > streaming, therefore it should interact with GNU Make Jobserver.
>
> Hmm, so if you indeed want to do that as second step the first step
> would still need driver modifications to invoke the assembler. I think
> in previous discussions I suggested to have the driver signal cc1 and
> friends via a special -fsplit-tu-to-asm-outputs=<tempfile> argument that
> splitting is desirable and that the used output assembler files should
> be written to <tempfile> so the driver can pick them up for assembling
> and linking.
>
> You also miss the fact that the driver also needs to invoke the linker
> to merge the N LTRANS objects back to one.
Actually I tried to avoid entering in such technical detail here, but
it indeed makes the proposal more concrete.
>
> I suggest you first ignore the jobserver and try doing without
> LTRANS IL streaming. I think meanwhile lto1 got jobserver support
> for the WPA -> LTRANS streaming so you can reuse that for jobserver
> aware "forking" (and later assembling in the driver). Using
> a named pipe or some other mechanism might also allow to pick up
> assembler output for the individual units as it becomes ready rather
> than waiting for the slowest LTRANS unit to finish compiling.
Ok. I updated the proposal with this information.
>
> > - Week 8 -- June 15 to 19: **First Evaluation**\
> > Deliver a non-optimized version of the project. Some programs ought
> > to be compiled correctly, but probably there will be a huge overhead
> > because so far there will not be any criteria about when to
> > partition. Some tests are also planned for this evaluation.
> >
> > - Week \[9, 11\] -- June 22 to July 10:\
> > Implement a criteria about when to partition, and interactively
> > improve it based on data.
>
> I think this should be already there (though we error on the side of
> generating "more" partitions). For non-LTO parallelizing operation
> we maybe want to tune the various --params that are available
> though (lto-min-partition and lto-partitions).
This is interesting. Here at the Lab we have a student which developed
experiment design model to predict how some parameters can impact in the
final result. Could you please give me more details about these
parameters?
>
> So I'd suggest to concentrate on the jobserver integration for the
> second phase?
I just changed the proposal to focus in jobserver integration at this
stage.
>
> Otherwise the proposal looks good and I'm confident we can deliver
> something that will be ready for real-world usage for GCC 11!
Thank you :)
Giuliano.
>
> Thanks,
> Richard.
>
> > - Week 12 -- July 13 to 17: **Second Evaluation**\
> > Deliver a more optimized version of the project. Here we should
> > filter files that would compile fast from files that would require
> > partitioning, and therefore we should see some speedup.
> >
> > - Week \[13, 15\] -- July 20 to August 10:\
> > Develop adequate tests coverage and address unexpected issues so
> > that this feature can be merged to trunk for the next GCC release.
> >
> > - Week 16: **Final evaluation**\
> > Deliver the final product as a series of patches for trunk.
> >
> >
> > Thank you
> > Giuliano.
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
next prev parent reply other threads:[~2020-03-17 20:04 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20200313200551.viqhqgjw3gixjarw@smtp.gmail.com>
2020-03-16 13:08 ` Richard Biener
2020-03-17 20:04 ` Giuliano Belinassi [this message]
2020-03-18 11:44 ` Richard Biener
2020-03-13 20:15 Giuliano Belinassi
2020-03-17 20:24 ` Giuliano Belinassi
2020-03-18 14:27 ` Richard Biener
2020-03-24 0:37 ` Giuliano Belinassi
2020-03-24 7:20 ` Richard Biener
2020-03-24 20:54 ` Giuliano Belinassi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200317200449.mdgono6renx3ssra@smtp.gmail.com \
--to=giuliano.belinassi@usp.br \
--cc=gcc@gcc.gnu.org \
--cc=hubicka@ucw.cz \
--cc=mjambor@suse.cz \
--cc=rguenther@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).