public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* [GSoC 2020] Automatic Detection of Parallel Compilation Viability
@ 2020-03-13 20:15 Giuliano Belinassi
  2020-03-17 20:24 ` Giuliano Belinassi
  0 siblings, 1 reply; 9+ messages in thread
From: Giuliano Belinassi @ 2020-03-13 20:15 UTC (permalink / raw)
  To: gcc

Hi, all

I want to propose and apply for the following GSoC project: Automatic
Detection of Parallel Compilation Viability.

https://www.ime.usp.br/~belinass/Automatic_Detection_of_Parallel_Compilation_Viability.pdf

Feedback is welcome :)

Here is a markdown version of it:

**Automatic Detection of Parallel Compilation Viability**

[Giuliano Belinassi]{style="color: darkgreen"}\
Timezone: GMT$-$3:00\
University of São Paulo -- Brazil\
IRC: giulianob in \#gcc\
Email: [`giuliano.belinassi@usp.br`](mailto:giuliano.belinassi@usp.br)\
Github: <https://github.com/giulianobelinassi/>\
Date:

About Me

Bachelor's degree in Computer Science (University of São Paulo),
currently pursuing a Master's degree in Computer Science at the same
institution. I've always been fascinated by topics such as
High-Performance Computing and Code Optimization, having worked on a
parallel GPU implementation of Boundary Element Method software. I am
currently conducting research on compiler parallelization and
developing the [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc)
project, having already presented it at [GNU Cauldron
2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).

**Skills**: Strong knowledge of C, Concurrency, Shared-Memory
Parallelism, Multithreaded Debugging, and typical programming tools.

Brief Introduction

In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that
parallelizing the Intra Procedural optimizations speeds up the
compilation of huge files by a factor of 1.8x on a 4-core machine, and
also that these optimizations account for about 75% of compilation
time.
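
As a sanity check on these numbers, Amdahl's law gives the theoretical ceiling for this setup: with roughly 75% of compilation time parallelizable, 4 cores bound the speedup at about 2.29x, so the observed 1.8x sits plausibly below that once synchronization overhead is accounted for. A minimal sketch (the function name is ours, not from GCC):

```python
def amdahl_speedup(parallel_fraction, workers):
    """Theoretical speedup when only `parallel_fraction` of the work
    can be spread over `workers` cores (Amdahl's law)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)

# With ~75% of compilation time in the Intra Procedural optimizations
# and 4 cores: 1 / (0.25 + 0.75/4) ~= 2.29x upper bound.
print(round(amdahl_speedup(0.75, 4), 2))
```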

In this project we plan to use the LTO infrastructure to improve
compilation performance in the non-LTO case, while still generating
binaries as good as those produced with LTO disabled. We will
automatically detect when a single file will benefit from parallelism
and, if so, proceed with the compilation in parallel.

Use of LTO

Link Time Optimization (LTO) is a compilation technique that allows the
compiler to analyse the program as a whole, instead of analysing and
compiling one file at a time. Therefore, LTO is able to collect more
information about the program and generate a better optimization plan.
LTO is divided into three parts:

-   *LGEN (Local Generation)*: Each file is translated to GIMPLE. This
    stage runs sequentially within each file and, therefore, in
    parallel across the project compilation.

-   *WPA (Whole Program Analysis)*: Run the Inter Procedural Analysis
    (IPA) on the entire program. This stage runs serially in the
    project.

-   *LTRANS (Local Transformation)*: Execute all Intra Procedural
    Optimizations in each partition. This stage runs in parallel.

Since WPA runs serially on the entire project, it can bottleneck the
compilation: LTO was designed to produce faster binaries, not to
produce binaries fast.

Here, the proposed use of LTO to address this problem is to run the IPA
on each Translation Unit (TU) separately, instead of on the whole
program, and to automatically detect when to partition the TU into
multiple LTRANS partitions to improve performance. The advantages of
this approach are:

-   It can generate binaries as good as when LTO is disabled.

-   It is faster, as we can partition big files into multiple partitions
    and compile these partitions in parallel.

-   It can interact with GNU Make Jobserver, improving CPU utilization.

Planned Tasks

I plan to use the GSoC time to develop the following topics:

-   Week \[1, 3\] -- April 27 to May 15:\
    Update `cc1`, `cc1plus`, `f771`, ..., to partition the data after
    IPA analysis directly into multiple LTRANS partitions, instead of
    generating a temporary GIMPLE file.

-   Week \[4, 7\] -- May 18 to June 12:\
    Update the `gcc` driver to take these multiple LTRANS partitions,
    then call the compiler and assembler for each of them, and merge the
    results into one object file. Here I will use the LTO LTRANS object
    streaming, therefore it should interact with GNU Make Jobserver.
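
The driver-side flow described above can be sketched as a dispatch-and-merge loop. Real compiler and assembler invocations are replaced here by stub workers, and all names (`compile_partition`, `merge_objects`, `drive`) are illustrative, not GCC API:

```python
from concurrent.futures import ThreadPoolExecutor

def compile_partition(part):
    # Stand-in for invoking the compiler and assembler on one LTRANS
    # partition; returns the object file it would produce.
    return part + ".o"

def merge_objects(objects):
    # Stand-in for partially linking the per-partition objects into a
    # single object file (roughly what `ld -r` would do).
    return "+".join(objects)

def drive(partitions):
    # Fan the partitions out to workers (the real driver would spawn
    # separate processes), then merge the results serially at the end.
    with ThreadPoolExecutor() as pool:
        objects = list(pool.map(compile_partition, partitions))
    return merge_objects(objects)
```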

-   Week 8 -- June 15 to 19: **First Evaluation**\
    Deliver a non-optimized version of the project. Some programs ought
    to compile correctly, but there will probably be a huge overhead
    because, so far, there will be no criterion for when to partition.
    Some tests are also planned for this evaluation.

-   Week \[9, 11\] -- June 22 to July 10:\
    Implement a criterion for when to partition, and iteratively
    improve it based on data.
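
One plausible shape for such a criterion, purely illustrative (the thresholds and the size metric are assumptions, not measured GCC data), is to partition only when the estimated work is large enough to amortize the overhead:

```python
def plan_partitions(function_sizes, min_total=10000, target=4000):
    """Decide whether (and how finely) to split a translation unit.

    function_sizes: estimated size per function (e.g. a GIMPLE
    statement count). Returns 1 partition for cheap files, otherwise
    roughly one partition per `target` units of work.
    """
    total = sum(function_sizes)
    if total < min_total:
        return 1  # too small: partitioning overhead would dominate
    return max(2, -(-total // target))  # ceiling division
```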

-   Week 12 -- July 13 to 17: **Second Evaluation**\
    Deliver a more optimized version of the project. Here we should
    separate files that would compile fast from files that would
    require partitioning, and therefore we should see some speedup.

-   Week \[13, 15\] -- July 20 to August 10:\
    Develop adequate test coverage and address unexpected issues so
    that this feature can be merged into trunk for the next GCC release.

-   Week 16: **Final evaluation**\
    Deliver the final product as a series of patches for trunk.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GSoC 2020] Automatic Detection of Parallel Compilation Viability
  2020-03-13 20:15 [GSoC 2020] Automatic Detection of Parallel Compilation Viability Giuliano Belinassi
@ 2020-03-17 20:24 ` Giuliano Belinassi
  2020-03-18 14:27   ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Giuliano Belinassi @ 2020-03-17 20:24 UTC (permalink / raw)
  To: gcc; +Cc: rguenther, mjambor, hubicka

Hi, all

I have applied some revisions to the project. Please see the new
proposal here:

https://www.ime.usp.br/~belinass/Automatic_Detection_of_Parallel_Compilation_Viability.pdf

**Automatic Detection of Parallel Compilation Viability**

[Giuliano Belinassi]{style="color: darkgreen"}\
Timezone: GMT$-$3:00\
University of São Paulo -- Brazil\
IRC: giulianob in \#gcc\
Email: [`giuliano.belinassi@usp.br`](mailto:giuliano.belinassi@usp.br)\
Github: <https://github.com/giulianobelinassi/>\
Date:

About Me

Bachelor's degree in Computer Science (University of São Paulo),
currently pursuing a Master's degree in Computer Science at the same
institution. I've always been fascinated by topics such as
High-Performance Computing and Code Optimization, having worked on a
parallel GPU implementation of Boundary Element Method software. I am
currently conducting research on compiler parallelization and
developing the [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc)
project, having already presented it at [GNU Cauldron
2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).

**Skills**: Strong knowledge of C, Concurrency, Shared-Memory
Parallelism, Multithreaded Debugging, and typical programming tools.

Brief Introduction

In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that
parallelizing the Intra Procedural optimizations speeds up the
compilation of huge files by a factor of 1.8x on a 4-core machine, and
also that these optimizations account for about 75% of compilation
time.

In this project we plan to use the LTO infrastructure to improve
compilation performance in the non-LTO case, while still generating
binaries as good as those produced with LTO disabled. We will
automatically detect when a single file will benefit from parallelism
and, if so, proceed with the compilation in parallel.

Use of LTO

Link Time Optimization (LTO) is a compilation technique that allows the
compiler to analyse the program as a whole, instead of analysing and
compiling one file at a time. Therefore, LTO is able to collect more
information about the program and generate a better optimization plan.
LTO is divided into three parts:

-   *LGEN (Local Generation)*: Each file is translated to GIMPLE. This
    stage runs sequentially within each file and, therefore, in
    parallel across the project compilation.

-   *WPA (Whole Program Analysis)*: Run the Inter Procedural Analysis
    (IPA) on the entire program. This stage runs serially in the
    project.

-   *LTRANS (Local Transformation)*: Execute all Intra Procedural
    Optimizations in each partition. This stage runs in parallel.

Since WPA runs serially on the entire project, it can bottleneck the
compilation: LTO was designed to produce faster binaries, not to
produce binaries fast.

Here, the proposed use of LTO to address this problem is to run the IPA
on each Translation Unit (TU) separately, instead of on the whole
program, and to automatically detect when to partition the TU into
multiple LTRANS partitions to improve compilation performance. The
advantages of this approach are:

-   It can generate binaries as good as when LTO is disabled.

-   It is faster, as we can partition big files into multiple partitions
    and compile these partitions in parallel.

-   It can interact with GNU Make Jobserver, improving CPU utilization.

Planned Tasks

I plan to use the GSoC time to develop the following topics:

-   Week \[1, 3\] -- April 27 to May 15:\
    Update `cc1`, `cc1plus`, `f771`, ..., to partition the Compilation
    Unit (CU) after IPA analysis directly into multiple LTRANS
    partitions, instead of generating a temporary GIMPLE file, and to
    accept an additional parameter `-fsplit-outputs=<tempfile>`, to
    which the generated assembler filenames will be written.

    There are two possible approaches I could work on:

    1.  *Fork*: After the CU is partitioned into multiple LTRANS
        partitions, `cc1` will fork and compile these partitions, each
        of them generating an assembler file, and write the generated
        filenames into `<tempfile>`. Note that if the number of
        partitions is one, this step is not necessary.

    2.  *Stream LTRANS IR*: After the CU is partitioned into multiple
        LTRANS partitions, `cc1` will write these partitions to disk so
        that LTO can read these files and proceed as in a standard LTO
        operation, in order to generate a partially linked object file.

    Approach 1 has the advantage of lower overhead than approach 2, as
    it performs fewer IO operations; however, it may be harder to
    implement, since the assembler file may already be open before
    forking, so caution is necessary to ensure a one-to-one
    relationship between each assembler file and its compilation
    process. Approach 2, on the other hand, can easily interact with
    the GNU Jobserver.
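
A minimal sketch of the *fork* approach, with the actual compilation replaced by writing a dummy assembler file. All filenames and helpers here are illustrative, not part of `cc1`:

```python
import os

def compile_in_children(partitions, outdir):
    """Fork one child per partition. Each child 'compiles' its
    partition into its own assembler file (preserving the one-to-one
    relationship between assembler file and process that the text
    requires); the parent collects the filenames after waiting."""
    names, pids = [], []
    for i, part in enumerate(partitions):
        asm = os.path.join(outdir, "part%d.s" % i)
        names.append(asm)
        pid = os.fork()
        if pid == 0:
            # Child: stand-in for running the LTRANS pipeline.
            with open(asm, "w") as f:
                f.write("; asm for %s\n" % part)
            os._exit(0)
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
    return names
```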

-   Week \[4, 7\] -- May 18 to June 12:\
    Update the `gcc` driver to take each file in `<tempfile>`, then
    assemble and partially link them together. Here, an important
    optimization is to use a named pipe as `<tempfile>`, to avoid
    waiting for every partition to finish compiling before assembling
    the files.
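
The named-pipe optimization can be sketched as a producer/consumer pair: each compilation writes its assembler filename as soon as it finishes, and the driver processes names as they arrive instead of waiting for the whole set. Filenames and helpers are illustrative:

```python
import os
import tempfile
import threading

def producer(fifo_path, asm_names):
    # Each compilation process would write its output name when done.
    with open(fifo_path, "w") as fifo:
        for name in asm_names:
            fifo.write(name + "\n")

def consume(fifo_path, handled):
    # The driver assembles each file as soon as its name shows up,
    # overlapping assembly with the remaining compilations.
    with open(fifo_path) as fifo:
        for line in fifo:
            handled.append(line.strip())  # stand-in for running `as`

def demo():
    d = tempfile.mkdtemp()
    fifo = os.path.join(d, "outputs")
    os.mkfifo(fifo)
    handled = []
    t = threading.Thread(target=producer, args=(fifo, ["p0.s", "p1.s"]))
    t.start()          # blocks in open() until the reader attaches
    consume(fifo, handled)
    t.join()
    return handled
```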

-   Week 8 -- June 15 to 19: **First Evaluation**\
    Deliver a non-optimized version of the project. Some programs ought
    to compile correctly, but there will probably be a huge overhead
    because, so far, there is no way of interacting with the GNU
    Jobserver.

-   Week \[9, 11\] -- June 22 to July 10:\
    Work on GNU Make Jobserver integration. One way of doing this is to
    adapt the way the LTO WPA -> LTRANS handoff interacts with the
    Jobserver. Another way is to make each forked `cc1` consume a
    Jobserver token until its compilation finishes, then return the
    token when done.
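
The token discipline mentioned above follows the GNU Make jobserver protocol: a process implicitly owns one job slot, reads one byte from the inherited jobserver pipe for each extra job it wants to run, and writes the byte back when that job finishes. A self-contained simulation, with an ordinary pipe standing in for the descriptors make would pass down:

```python
import os

def make_jobserver(extra_slots):
    # make preloads the pipe with one token per job slot beyond the
    # implicit one; children inherit both ends of the pipe.
    r, w = os.pipe()
    os.write(w, b"+" * extra_slots)
    return r, w

def acquire(r):
    # Blocks until a token is available, like a forked cc1 waiting
    # for permission to compile one more partition.
    return os.read(r, 1)

def release(w, token):
    # Return the token so other jobs (or make itself) can reuse it.
    os.write(w, token)
```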

-   Week 12 -- July 13 to 17: **Second Evaluation**\
    Deliver a more optimized version of the project. Here we should
    separate files that would compile fast from files that would
    require partitioning, and interact with the GNU Jobserver;
    therefore we should see some speedup.

-   Week \[13, 15\] -- July 20 to August 10:\
    Develop adequate test coverage and address unexpected issues so
    that this feature can be merged into trunk for the next GCC release.

-   Week 16: **Final evaluation**\
    Deliver the final product as a series of patches for trunk.

On 03/13, Giuliano Belinassi wrote:
> Hi, all
> 
> I want to propose and apply for the following GSoC project: Automatic
> Detection of Parallel Compilation Viability.
> 
> https://www.ime.usp.br/~belinass/Automatic_Detection_of_Parallel_Compilation_Viability.pdf
> 
> Feedback is welcome :)
> 
> Here is a markdown version of it:
> 
> **Automatic Detection of Parallel Compilation Viability**
> 
> [Giuliano Belinassi]{style="color: darkgreen"}\
> Timezone: GMT$-$3:00\
> University of São Paulo -- Brazil\
> IRC: giulianob in \#gcc\
> Email: [`giuliano.belinassi@usp.br`](mailto:giuliano.belinassi@usp.br)\
> Github: <https://github.com/giulianobelinassi/>\
> Date:
> 
> About Me Computer Science Bachelor (University of São Paulo), currently
> pursuing a Masters Degree in Computer Science at the same institution.
> I've always been fascinated by topics such as High-Performance Computing
> and Code Optimization, having worked with a parallel implementation of a
> Boundary Elements Method software in GPU. I am currently conducting
> research on compiler parallelization and developing the
> [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc) project, having
> already presented it in [GNU Cauldron
> 2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).
> 
> **Skills**: Strong knowledge in C, Concurrency, Shared Memory
> Parallelism, Multithreaded Debugging and other typical programming
> tools.
> 
> Brief Introduction
> 
> In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that
> parallelizing the Intra Procedural optimizations improves speed when
> compiling huge files by a factor of 1.8x in a 4 cores machine, and also
> showed that this takes 75% of compilation time.
> 
> In this project we plan to use the LTO infrastructure to improve
> compilation performance in the non-LTO case, with a tradeoff of
> generating a binary as good as if LTO is disabled. Here, we will
> automatically detect when a single file will benefit from parallelism,
> and proceed with the compilation in parallel if so.
> 
> Use of LTO
> 
> The Link Time Optimization (LTO) is a compilation technique that allows
> the compiler to analyse the program as a whole, instead of analysing and
> compiling one file at time. Therefore, LTO is able to collect more
> information about the program and generate a better optimization plan.
> LTO is divided in three parts:
> 
> -   *LGEN (Local Generation)*: Each file is translated to GIMPLE. This
>     stage runs sequentially in each file and, therefore, in parallel in
>     the project compilation.
> 
> -   *WPA (Whole Program Analysis)*: Run the Inter Procedural Analysis
>     (IPA) in the entire program. This state runs serially in the
>     project.
> 
> -   *LTRANS (Local Transformation)*: Execute all Intra Procedural
>     Optimizations in each partition. This stage runs in parallel.
> 
> Since WPA can bottleneck the compilation because it runs serially in the
> entire project, LTO was designed to produce faster binaries, not to
> produce binaries fast.
> 
> Here, the proposed use of LTO to address this problem is to run the IPA
> for each Translation Unit (TU), instead in the Whole Program, and
> automatically detect when to partition the TU into multiple LTRANS to
> improve performance. The advantage of this approach is:
> 
> -   It can generate binaries as good as when LTO is disabled.
> 
> -   It is faster, as we can partition big files into multiple partitions
>     and compile these partitions in parallel
> 
> -   It can interact with GNU Make Jobserver, improving CPU utilization.
> 
> Planned Tasks
> 
> I plan to use the GSoC time to develop the following topics:
> 
> -   Week \[1, 3\] -- April 27 to May 15:\
>     Update `cc1`, `cc1plus`, `f771`, ..., to partition the data after
>     IPA analysis directly into multiple LTRANS partitions, instead of
>     generating a temporary GIMPLE file.
> 
> -   Week \[4, 7\] -- May 18 to June 12:\
>     Update the `gcc` driver to take these multiple LTRANS partitions,
>     then call the compiler and assembler for each of them, and merge the
>     results into one object file. Here I will use the LTO LTRANS object
>     streaming, therefore it should interact with GNU Make Jobserver.
> 
> -   Week 8 -- June 15 to 19: **First Evaluation**\
>     Deliver a non-optimized version of the project. Some programs ought
>     to be compiled correctly, but probably there will be a huge overhead
>     because so far there will not be any criteria about when to
>     partition. Some tests are also planned for this evaluation.
> 
> -   Week \[9, 11\] -- June 22 to July 10:\
>     Implement a criteria about when to partition, and interactively
>     improve it based on data.
> 
> -   Week 12 --- July 13 to 17: **Second Evaluation**\
>     Deliver a more optimized version of the project. Here we should
>     filter files that would compile fast from files that would require
>     partitioning, and therefore we should see some speedup.
> 
> -   Week \[13, 15\] --- July 20 to August 10:\
>     Develop adequate tests coverage and address unexpected issues so
>     that this feature can be merged to trunk for the next GCC release.
> 
> -   Week 16: **Final evaluation**\
>     Deliver the final product as a series of patches for trunk.
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GSoC 2020] Automatic Detection of Parallel Compilation Viability
  2020-03-17 20:24 ` Giuliano Belinassi
@ 2020-03-18 14:27   ` Richard Biener
  2020-03-24  0:37     ` Giuliano Belinassi
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Biener @ 2020-03-18 14:27 UTC (permalink / raw)
  To: Giuliano Belinassi; +Cc: gcc, mjambor, hubicka

On Tue, 17 Mar 2020, Giuliano Belinassi wrote:

> Hi, all
> 
> I have applied some revews to the project. Please see the new proposal
> here:

Looks good, some editorial changes below

> https://www.ime.usp.br/~belinass/Automatic_Detection_of_Parallel_Compilation_Viability.pdf
> 
> **Automatic Detection of Parallel Compilation Viability**
> 
> [Giuliano Belinassi]{style="color: darkgreen"}\
> Timezone: GMT$-$3:00\
> University of São Paulo -- Brazil\
> IRC: giulianob in \#gcc\
> Email: [`giuliano.belinassi@usp.br`](mailto:giuliano.belinassi@usp.br)\
> Github: <https://github.com/giulianobelinassi/>\
> Date:
> 
> About Me Computer Science Bachelor (University of São Paulo), currently
> pursuing a Masters Degree in Computer Science at the same institution.
> I've always been fascinated by topics such as High-Performance Computing
> and Code Optimization, having worked with a parallel implementation of a
> Boundary Elements Method software in GPU. I am currently conducting
> research on compiler parallelization and developing the
> [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc) project, having
> already presented it in [GNU Cauldron
> 2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).
> 
> **Skills**: Strong knowledge in C, Concurrency, Shared Memory
> Parallelism, Multithreaded Debugging and other typical programming
> tools.
> 
> Brief Introduction
> 
> In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that
> parallelizing the Intra Procedural optimizations improves speed when
> compiling huge files by a factor of 1.8x in a 4 cores machine, and also
> showed that this takes 75% of compilation time.
> 
> In this project we plan to use the LTO infrastructure to improve
> compilation performance in the non-LTO case, with a tradeoff of
> generating a binary as good as if LTO is disabled. Here, we will
> automatically detect when a single file will benefit from parallelism,
> and procceed with the compilation in parallel if so.
> 
> Use of LTO
> 
> The Link Time Optimization (LTO) is a compilation technique that allows
> the compiler to analyse the program as a whole, instead of analysing and
> compiling one file at time. Therefore, LTO is able to collect more
> information about the program and generate a better optimization plan.
> LTO is divided in three parts:
> 
> -   *LGEN (Local Generation)*: Each file is translated to GIMPLE. This
>     stage runs sequentially in each file and, therefore, in parallel in
>     the project compilation.
> 
> -   *WPA (Whole Program Analysis)*: Run the Inter Procedural Analysis
>     (IPA) in the entire program. This state runs serially in the
>     project.
> 
> -   *LTRANS (Local Transformation)*: Execute all Intra Procedural
>     Optimizations in each partition. This stage runs in parallel.
> 
> Since WPA can bottleneck the compilation because it runs serially in the
> entire project, LTO was designed to produce faster binaries, not to
> produce binaries fast.
> 
> Here, the proposed use of LTO to address this problem is to run the IPA
> for each Translation Unit (TU), instead in the Whole Program, and

This proposal is to use LTO to produce binaries fast by running
the IPA phase separately for each Translation Unit (TU), instead of on the 
Whole Program and ...

> automatically detect when to partition the TU into multiple LTRANS to
> improve compilation performance. The advantage of this approach is:
> 
> -   It can generate binaries as good as when LTO is disabled.
> 
> -   It is faster, as we can partition big files into multiple partitions
>     and compile these partitions in parallel.
> 
> -   It can interact with GNU Make Jobserver, improving CPU utilization.

This reads a bit odd, regular compilation already interacts with the
GNU Make Jobserver.  I'd reorder and reword it w/o dashes like

We can partition big files into multiple partitions and compile these 
partitions in parallel which should improve CPU utilization by exposing
smaller chunks to the GNU Make Jobserver.  Code generation quality
should be unaffected by this.

Thanks,
Richard.

> Planned Tasks
> 
> I plan to use the GSoC time to develop the following topics:
> 
> -   Week \[1, 3\] -- April 27 to May 15:\
>     Update `cc1`, `cc1plus`, `f771`, ..., to partition the Compilation
>     Unit (CU) after IPA analysis directly into multiple LTRANS
>     partitions, instead of generating a temporary GIMPLE file, and to
>     accept a additional parameter `-fsplit-outputs=<tempfile>`, in which
>     the generated ASM filenames will be written to.
> 
>     There are two possible cases in which I could work on:
> 
>     1.  *Fork*: After the CU is partitioned into multiple LTRANS, then
>         `cc1` will fork and compile these partitions, each of them
>         generating a ASM file, and write the generated asm name into
>         `<tempfile>`. Note that if the number of partitions is one, then
>         this part is not necessary.
> 
>     2.  *Stream LTRANS IR*: After CU is partitionated into multiple
>         LTRANS, then `cc1` will write these partitions into disk so that
>         LTO can read these files and proceed as a standard LTO operation
>         in order to generate a partially linked object file.
> 
>     1\. Has the advantage of having less overhead than 2., as there is less
>     IO operations, however it may be hard to implement as the assembler file
>     may be already opened before forking, so caution is necessary to make
>     sure that there are a 1 - 1 relationship between assembler file and the
>     compilation process. 2. on the other hand can easily interact with the
>     GNU jobserver.
> 
> -   Week \[4, 7\] -- May 18 to June 12:\
>     Update the `gcc` driver to take each file in `<tempfile>`, then
>     assemble and partially link them together. Here, an important
>     optimization is to use a named pipe in `<tempfile>` to avoid having
>     to wait every partition to end its compilation before assembling the
>     files.
> 
> -   Week 8 -- June 15 to 19: **First Evaluation**\
>     Deliver a non-optimized version of the project. Some programs ought
>     to be compiled correctly, but probably there will be a huge overhead
>     because so far there is no way of interacting with GNU Jobserver.
> 
> -   Week \[9, 11\] -- June 22 to July 10:\
>     Work on GNU Make Jobserver integration. A way of doing this is to
>     adapt the LTO WPA -> LTRANS way of interacting with
>     Jobserver. Another way is to make the forked `cc1` consume Jobserver
>     tokens until the compilation finishes, then return the token when
>     done.
> 
> -   Week 12 -- July 13 to 17: **Second Evaluation**\
>     Deliver a more optimized version of the project. Here we should
>     filter files that would compile fast from files that would require
>     partitioning, and interact with GNU Jobserver. Therefore we should
>     see some speedup.
> 
> -   Week \[13, 15\] -- July 20 to August 10:\
>     Develop adequate tests coverage and address unexpected issues so
>     that this feature can be merged to trunk for the next GCC release.
> 
> -   Week 16: **Final evaluation**\
>     Deliver the final product as a series of patches for trunk.
> 
> On 03/13, Giuliano Belinassi wrote:
> > Hi, all
> > 
> > I want to propose and apply for the following GSoC project: Automatic
> > Detection of Parallel Compilation Viability.
> > 
> > https://www.ime.usp.br/~belinass/Automatic_Detection_of_Parallel_Compilation_Viability.pdf
> > 
> > Feedback is welcome :)
> > 
> > Here is a markdown version of it:
> > 
> > **Automatic Detection of Parallel Compilation Viability**
> > 
> > [Giuliano Belinassi]{style="color: darkgreen"}\
> > Timezone: GMT$-$3:00\
> > University of São Paulo -- Brazil\
> > IRC: giulianob in \#gcc\
> > Email: [`giuliano.belinassi@usp.br`](mailto:giuliano.belinassi@usp.br)\
> > Github: <https://github.com/giulianobelinassi/>\
> > Date:
> > 
> > About Me Computer Science Bachelor (University of São Paulo), currently
> > pursuing a Masters Degree in Computer Science at the same institution.
> > I've always been fascinated by topics such as High-Performance Computing
> > and Code Optimization, having worked with a parallel implementation of a
> > Boundary Elements Method software in GPU. I am currently conducting
> > research on compiler parallelization and developing the
> > [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc) project, having
> > already presented it in [GNU Cauldron
> > 2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).
> > 
> > **Skills**: Strong knowledge in C, Concurrency, Shared Memory
> > Parallelism, Multithreaded Debugging and other typical programming
> > tools.
> > 
> > Brief Introduction
> > 
> > In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that
> > parallelizing the Intra Procedural optimizations improves speed when
> > compiling huge files by a factor of 1.8x in a 4 cores machine, and also
> > showed that this takes 75% of compilation time.
> > 
> > In this project we plan to use the LTO infrastructure to improve
> > compilation performance in the non-LTO case, with a tradeoff of
> > generating a binary as good as if LTO is disabled. Here, we will
> > automatically detect when a single file will benefit from parallelism,
> > and proceed with the compilation in parallel if so.
> > 
> > Use of LTO
> > 
> > The Link Time Optimization (LTO) is a compilation technique that allows
> > the compiler to analyse the program as a whole, instead of analysing and
> > compiling one file at time. Therefore, LTO is able to collect more
> > information about the program and generate a better optimization plan.
> > LTO is divided in three parts:
> > 
> > -   *LGEN (Local Generation)*: Each file is translated to GIMPLE. This
> >     stage runs sequentially in each file and, therefore, in parallel in
> >     the project compilation.
> > 
> > -   *WPA (Whole Program Analysis)*: Run the Inter Procedural Analysis
> >     (IPA) in the entire program. This state runs serially in the
> >     project.
> > 
> > -   *LTRANS (Local Transformation)*: Execute all Intra Procedural
> >     Optimizations in each partition. This stage runs in parallel.
> > 
> > Since WPA can bottleneck the compilation because it runs serially in the
> > entire project, LTO was designed to produce faster binaries, not to
> > produce binaries fast.
> > 
> > Here, the proposed use of LTO to address this problem is to run the IPA
> > for each Translation Unit (TU), instead in the Whole Program, and
> > automatically detect when to partition the TU into multiple LTRANS to
> > improve performance. The advantage of this approach is:
> > 
> > -   It can generate binaries as good as when LTO is disabled.
> > 
> > -   It is faster, as we can partition big files into multiple partitions
> >     and compile these partitions in parallel
> > 
> > -   It can interact with GNU Make Jobserver, improving CPU utilization.
> > 
> > Planned Tasks
> > 
> > I plan to use the GSoC time to develop the following topics:
> > 
> > -   Week \[1, 3\] -- April 27 to May 15:\
> >     Update `cc1`, `cc1plus`, `f771`, ..., to partition the data after
> >     IPA analysis directly into multiple LTRANS partitions, instead of
> >     generating a temporary GIMPLE file.
> > 
> > -   Week \[4, 7\] -- May 18 to June 12:\
> >     Update the `gcc` driver to take these multiple LTRANS partitions,
> >     then call the compiler and assembler for each of them, and merge the
> >     results into one object file. Here I will use the LTO LTRANS object
> >     streaming, therefore it should interact with GNU Make Jobserver.
> > 
> > -   Week 8 -- June 15 to 19: **First Evaluation**\
> >     Deliver a non-optimized version of the project. Some programs ought
> >     to be compiled correctly, but probably there will be a huge overhead
> >     because so far there will not be any criteria about when to
> >     partition. Some tests are also planned for this evaluation.
> > 
> > -   Week \[9, 11\] -- June 22 to July 10:\
> >     Implement a criteria about when to partition, and interactively
> >     improve it based on data.
> > 
> > -   Week 12 --- July 13 to 17: **Second Evaluation**\
> >     Deliver a more optimized version of the project. Here we should
> >     filter files that would compile fast from files that would require
> >     partitioning, and therefore we should see some speedup.
> > 
> > -   Week \[13, 15\] --- July 20 to August 10:\
> >     Develop adequate tests coverage and address unexpected issues so
> >     that this feature can be merged to trunk for the next GCC release.
> > 
> > -   Week 16: **Final evaluation**\
> >     Deliver the final product as a series of patches for trunk.
> > 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GSoC 2020] Automatic Detection of Parallel Compilation Viability
  2020-03-18 14:27   ` Richard Biener
@ 2020-03-24  0:37     ` Giuliano Belinassi
  2020-03-24  7:20       ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Giuliano Belinassi @ 2020-03-24  0:37 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc, mjambor, hubicka

Hi, Richi

On 03/18, Richard Biener wrote:
> On Tue, 17 Mar 2020, Giuliano Belinassi wrote:
> 
> > Hi, all
> > 
> > I have applied some revews to the project. Please see the new proposal
> > here:
> 
> Looks good, some editorial changes below
> 
> > https://www.ime.usp.br/~belinass/Automatic_Detection_of_Parallel_Compilation_Viability.pdf
> > 
> > **Automatic Detection of Parallel Compilation Viability**
> > 
> > [Giuliano Belinassi]{style="color: darkgreen"}\
> > Timezone: GMT$-$3:00\
> > University of São Paulo -- Brazil\
> > IRC: giulianob in \#gcc\
> > Email: [`giuliano.belinassi@usp.br`](mailto:giuliano.belinassi@usp.br)\
> > Github: <https://github.com/giulianobelinassi/>\
> > Date:
> > 
> > About Me Computer Science Bachelor (University of São Paulo), currently
> > pursuing a Masters Degree in Computer Science at the same institution.
> > I've always been fascinated by topics such as High-Performance Computing
> > and Code Optimization, having worked with a parallel implementation of a
> > Boundary Elements Method software in GPU. I am currently conducting
> > research on compiler parallelization and developing the
> > [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc) project, having
> > already presented it in [GNU Cauldron
> > 2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).
> > 
> > **Skills**: Strong knowledge in C, Concurrency, Shared Memory
> > Parallelism, Multithreaded Debugging and other typical programming
> > tools.
> > 
> > Brief Introduction
> > 
> > In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that
> > parallelizing the Intra Procedural optimizations improves speed when
> > compiling huge files by a factor of 1.8x on a 4-core machine, and also
> > showed that these optimizations take 75% of compilation time.
> > 
> > In this project we plan to use the LTO infrastructure to improve
> > compilation performance in the non-LTO case, with a tradeoff of
> > generating a binary as good as if LTO is disabled. Here, we will
> > automatically detect when a single file will benefit from parallelism,
> > and proceed with the compilation in parallel if so.
> > 
> > Use of LTO
> > 
> > The Link Time Optimization (LTO) is a compilation technique that allows
> > the compiler to analyse the program as a whole, instead of analysing and
> > compiling one file at a time. Therefore, LTO is able to collect more
> > information about the program and generate a better optimization plan.
> > LTO is divided in three parts:
> > 
> > -   *LGEN (Local Generation)*: Each file is translated to GIMPLE. This
> >     stage runs sequentially in each file and, therefore, in parallel in
> >     the project compilation.
> > 
> > -   *WPA (Whole Program Analysis)*: Run the Inter Procedural Analysis
> >     (IPA) in the entire program. This stage runs serially in the
> >     project.
> > 
> > -   *LTRANS (Local Transformation)*: Execute all Intra Procedural
> >     Optimizations in each partition. This stage runs in parallel.
> > 
> > Since WPA can bottleneck the compilation because it runs serially in the
> > entire project, LTO was designed to produce faster binaries, not to
> > produce binaries fast.
> > 
> > Here, the proposed use of LTO to address this problem is to run the IPA
> > for each Translation Unit (TU), instead in the Whole Program, and
> 
> This proposal is to use LTO to produce binaries fast by running
> the IPA phase separately for each Translation Unit (TU), instead of on the 
> Whole Program and ...
> 
> > automatically detect when to partition the TU into multiple LTRANS to
> > improve compilation performance. The advantage of this approach is:
> > 
> > -   It can generate binaries as good as when LTO is disabled.
> > 
> > -   It is faster, as we can partition big files into multiple partitions
> >     and compile these partitions in parallel.
> > 
> > -   It can interact with GNU Make Jobserver, improving CPU utilization.
> 
> This reads a bit odd, regular compilation already interacts with the
> GNU Make Jobserver.  I'd reorder and reword it w/o dashes like
> 
> We can partition big files into multiple partitions and compile these 
> partitions in parallel which should improve CPU utilization by exposing
> smaller chunks to the GNU Make Jobserver.  Code generation quality
> should be unaffected by this.

How about:

```
The advantage of this approach is: by partitioning big files into
multiple partitions, we can improve the compilation performance by
exposing these partitions to the Jobserver. Therefore, it can improve
CPU utilization in manycore machines.  Generated code quality should be
unaffected by this procedure, which means that it should run as fast as
when LTO is disabled.
```
?

> 
> Thanks,
> Richard.
> 
> > Planned Tasks
> > 
> > I plan to use the GSoC time to develop the following topics:
> > 
> > -   Week \[1, 3\] -- April 27 to May 15:\
> >     Update `cc1`, `cc1plus`, `f771`, ..., to partition the Compilation
> >     Unit (CU) after IPA analysis directly into multiple LTRANS
> >     partitions, instead of generating a temporary GIMPLE file, and to
> >     accept an additional parameter `-fsplit-outputs=<tempfile>`, to which
> >     the generated ASM filenames will be written.
> > 
> >     There are two possible cases in which I could work on:
> > 
> >     1.  *Fork*: After the CU is partitioned into multiple LTRANS, then
> >         `cc1` will fork and compile these partitions, each of them
> >         generating an ASM file, and write the generated ASM name into
> >         `<tempfile>`. Note that if the number of partitions is one, then
> >         this part is not necessary.
> > 
> >     2.  *Stream LTRANS IR*: After the CU is partitioned into multiple
> >         LTRANS, `cc1` will write these partitions to disk so that
> >         LTO can read these files and proceed as a standard LTO operation
> >         in order to generate a partially linked object file.
> > 
> >     Option 1 has the advantage of lower overhead than option 2, as it
> >     performs fewer IO operations; however, it may be harder to implement,
> >     as the assembler file may already be open before forking, so caution
> >     is necessary to ensure a 1-to-1 relationship between the assembler
> >     file and the compilation process. Option 2, on the other hand, can
> >     easily interact with the GNU Jobserver.
> > 
> > -   Week \[4, 7\] -- May 18 to June 12:\
> >     Update the `gcc` driver to take each file in `<tempfile>`, then
> >     assemble and partially link them together. Here, an important
> >     optimization is to use a named pipe in `<tempfile>` to avoid having
> >     to wait for every partition to finish its compilation before
> >     assembling the files.
> > 
> > -   Week 8 -- June 15 to 19: **First Evaluation**\
> >     Deliver a non-optimized version of the project. Some programs ought
> >     to be compiled correctly, but there will probably be a large overhead
> >     because, so far, there is no way of interacting with the GNU Jobserver.
> > 
> > -   Week \[9, 11\] -- June 22 to July 10:\
> >     Work on GNU Make Jobserver integration. A way of doing this is to
> >     adapt the LTO WPA -> LTRANS way of interacting with
> >     Jobserver. Another way is to make the forked `cc1` consume Jobserver
> >     tokens until the compilation finishes, then return the token when
> >     done.
> > 
> > -   Week 12 -- July 13 to 17: **Second Evaluation**\
> >     Deliver a more optimized version of the project. Here we should
> >     filter files that would compile fast from files that would require
> >     partitioning, and interact with GNU Jobserver. Therefore we should
> >     see some speedup.
> > 
> > -   Week \[13, 15\] -- July 20 to August 10:\
> >     Develop adequate test coverage and address unexpected issues so
> >     that this feature can be merged to trunk for the next GCC release.
> > 
> > -   Week 16: **Final evaluation**\
> >     Deliver the final product as a series of patches for trunk.
> > 
> > On 03/13, Giuliano Belinassi wrote:
> > > Hi, all
> > > 
> > > I want to propose and apply for the following GSoC project: Automatic
> > > Detection of Parallel Compilation Viability.
> > > 
> > > https://www.ime.usp.br/~belinass/Automatic_Detection_of_Parallel_Compilation_Viability.pdf
> > > 
> > > Feedback is welcome :)
> > > 
> > > Here is a markdown version of it:
> > > 
> > > **Automatic Detection of Parallel Compilation Viability**
> > > 
> > > [Giuliano Belinassi]{style="color: darkgreen"}\
> > > Timezone: GMT$-$3:00\
> > > University of São Paulo -- Brazil\
> > > IRC: giulianob in \#gcc\
> > > Email: [`giuliano.belinassi@usp.br`](mailto:giuliano.belinassi@usp.br)\
> > > Github: <https://github.com/giulianobelinassi/>\
> > > Date:
> > > 
> > > About Me Computer Science Bachelor (University of São Paulo), currently
> > > pursuing a Masters Degree in Computer Science at the same institution.
> > > I've always been fascinated by topics such as High-Performance Computing
> > > and Code Optimization, having worked with a parallel implementation of a
> > > Boundary Elements Method software in GPU. I am currently conducting
> > > research on compiler parallelization and developing the
> > > [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc) project, having
> > > already presented it in [GNU Cauldron
> > > 2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).
> > > 
> > > **Skills**: Strong knowledge in C, Concurrency, Shared Memory
> > > Parallelism, Multithreaded Debugging and other typical programming
> > > tools.
> > > 
> > > Brief Introduction
> > > 
> > > In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that
> > > parallelizing the Intra Procedural optimizations improves speed when
> > > compiling huge files by a factor of 1.8x on a 4-core machine, and also
> > > showed that these optimizations take 75% of compilation time.
> > > 
> > > In this project we plan to use the LTO infrastructure to improve
> > > compilation performance in the non-LTO case, with a tradeoff of
> > > generating a binary as good as if LTO is disabled. Here, we will
> > > automatically detect when a single file will benefit from parallelism,
> > > and proceed with the compilation in parallel if so.
> > > 
> > > Use of LTO
> > > 
> > > The Link Time Optimization (LTO) is a compilation technique that allows
> > > the compiler to analyse the program as a whole, instead of analysing and
> > > compiling one file at a time. Therefore, LTO is able to collect more
> > > information about the program and generate a better optimization plan.
> > > LTO is divided in three parts:
> > > 
> > > -   *LGEN (Local Generation)*: Each file is translated to GIMPLE. This
> > >     stage runs sequentially in each file and, therefore, in parallel in
> > >     the project compilation.
> > > 
> > > -   *WPA (Whole Program Analysis)*: Run the Inter Procedural Analysis
> > >     (IPA) in the entire program. This stage runs serially in the
> > >     project.
> > > 
> > > -   *LTRANS (Local Transformation)*: Execute all Intra Procedural
> > >     Optimizations in each partition. This stage runs in parallel.
> > > 
> > > Since WPA can bottleneck the compilation because it runs serially in the
> > > entire project, LTO was designed to produce faster binaries, not to
> > > produce binaries fast.
> > > 
> > > Here, the proposed use of LTO to address this problem is to run the IPA
> > > for each Translation Unit (TU), instead in the Whole Program, and
> > > automatically detect when to partition the TU into multiple LTRANS to
> > > improve performance. The advantage of this approach is:
> > > 
> > > -   It can generate binaries as good as when LTO is disabled.
> > > 
> > > -   It is faster, as we can partition big files into multiple partitions
> > >     and compile these partitions in parallel
> > > 
> > > -   It can interact with GNU Make Jobserver, improving CPU utilization.
> > > 
> > > Planned Tasks
> > > 
> > > I plan to use the GSoC time to develop the following topics:
> > > 
> > > -   Week \[1, 3\] -- April 27 to May 15:\
> > >     Update `cc1`, `cc1plus`, `f771`, ..., to partition the data after
> > >     IPA analysis directly into multiple LTRANS partitions, instead of
> > >     generating a temporary GIMPLE file.
> > > 
> > > -   Week \[4, 7\] -- May 18 to June 12:\
> > >     Update the `gcc` driver to take these multiple LTRANS partitions,
> > >     then call the compiler and assembler for each of them, and merge the
> > >     results into one object file. Here I will use the LTO LTRANS object
> > >     streaming, therefore it should interact with GNU Make Jobserver.
> > > 
> > > -   Week 8 -- June 15 to 19: **First Evaluation**\
> > >     Deliver a non-optimized version of the project. Some programs ought
> > >     to be compiled correctly, but probably there will be a huge overhead
> > >     because so far there will not be any criterion for when to
> > >     partition. Some tests are also planned for this evaluation.
> > > 
> > > -   Week \[9, 11\] -- June 22 to July 10:\
> > >     Implement a criterion for when to partition, and iteratively
> > >     improve it based on data.
> > > 
> > > -   Week 12 --- July 13 to 17: **Second Evaluation**\
> > >     Deliver a more optimized version of the project. Here we should
> > >     filter files that would compile fast from files that would require
> > >     partitioning, and therefore we should see some speedup.
> > > 
> > > -   Week \[13, 15\] --- July 20 to August 10:\
> > >     Develop adequate test coverage and address unexpected issues so
> > >     that this feature can be merged to trunk for the next GCC release.
> > > 
> > > -   Week 16: **Final evaluation**\
> > >     Deliver the final product as a series of patches for trunk.
> > > 
> > 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Thank you,
Giuliano.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GSoC 2020] Automatic Detection of Parallel Compilation Viability
  2020-03-24  0:37     ` Giuliano Belinassi
@ 2020-03-24  7:20       ` Richard Biener
  2020-03-24 20:54         ` Giuliano Belinassi
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Biener @ 2020-03-24  7:20 UTC (permalink / raw)
  To: Giuliano Belinassi; +Cc: gcc, mjambor, hubicka

On Mon, 23 Mar 2020, Giuliano Belinassi wrote:

> Hi, Richi
> 
> On 03/18, Richard Biener wrote:
> > On Tue, 17 Mar 2020, Giuliano Belinassi wrote:
> > 
> > > Hi, all
> > > 
> > > I have applied some reviews to the project. Please see the new proposal
> > > here:
> > 
> > Looks good, some editorial changes below
> > 
> > > https://www.ime.usp.br/~belinass/Automatic_Detection_of_Parallel_Compilation_Viability.pdf
> > > 
> > > **Automatic Detection of Parallel Compilation Viability**
> > > 
> > > [Giuliano Belinassi]{style="color: darkgreen"}\
> > > Timezone: GMT$-$3:00\
> > > University of São Paulo -- Brazil\
> > > IRC: giulianob in \#gcc\
> > > Email: [`giuliano.belinassi@usp.br`](mailto:giuliano.belinassi@usp.br)\
> > > Github: <https://github.com/giulianobelinassi/>\
> > > Date:
> > > 
> > > About Me Computer Science Bachelor (University of São Paulo), currently
> > > pursuing a Masters Degree in Computer Science at the same institution.
> > > I've always been fascinated by topics such as High-Performance Computing
> > > and Code Optimization, having worked with a parallel implementation of a
> > > Boundary Elements Method software in GPU. I am currently conducting
> > > research on compiler parallelization and developing the
> > > [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc) project, having
> > > already presented it in [GNU Cauldron
> > > 2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).
> > > 
> > > **Skills**: Strong knowledge in C, Concurrency, Shared Memory
> > > Parallelism, Multithreaded Debugging and other typical programming
> > > tools.
> > > 
> > > Brief Introduction
> > > 
> > > In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that
> > > parallelizing the Intra Procedural optimizations improves speed when
> > > compiling huge files by a factor of 1.8x in a 4 cores machine, and also
> > > showed that this takes 75% of compilation time.
> > > 
> > > In this project we plan to use the LTO infrastructure to improve
> > > compilation performance in the non-LTO case, with a tradeoff of
> > > generating a binary as good as if LTO is disabled. Here, we will
> > > automatically detect when a single file will benefit from parallelism,
> > > and proceed with the compilation in parallel if so.
> > > 
> > > Use of LTO
> > > 
> > > The Link Time Optimization (LTO) is a compilation technique that allows
> > > the compiler to analyse the program as a whole, instead of analysing and
> > > compiling one file at time. Therefore, LTO is able to collect more
> > > information about the program and generate a better optimization plan.
> > > LTO is divided in three parts:
> > > 
> > > -   *LGEN (Local Generation)*: Each file is translated to GIMPLE. This
> > >     stage runs sequentially in each file and, therefore, in parallel in
> > >     the project compilation.
> > > 
> > > -   *WPA (Whole Program Analysis)*: Run the Inter Procedural Analysis
> > >     (IPA) in the entire program. This state runs serially in the
> > >     project.
> > > 
> > > -   *LTRANS (Local Transformation)*: Execute all Intra Procedural
> > >     Optimizations in each partition. This stage runs in parallel.
> > > 
> > > Since WPA can bottleneck the compilation because it runs serially in the
> > > entire project, LTO was designed to produce faster binaries, not to
> > > produce binaries fast.
> > > 
> > > Here, the proposed use of LTO to address this problem is to run the IPA
> > > for each Translation Unit (TU), instead in the Whole Program, and
> > 
> > This proposal is to use LTO to produce binaries fast by running
> > the IPA phase separately for each Translation Unit (TU), instead of on the 
> > Whole Program and ...
> > 
> > > automatically detect when to partition the TU into multiple LTRANS to
> > > improve compilation performance. The advantage of this approach is:
> > > 
> > > -   It can generate binaries as good as when LTO is disabled.
> > > 
> > > -   It is faster, as we can partition big files into multiple partitions
> > >     and compile these partitions in parallel.
> > > 
> > > -   It can interact with GNU Make Jobserver, improving CPU utilization.
> > 
> > This reads a bit odd, regular compilation already interacts with the
> > GNU Make Jobserver.  I'd reorder and reword it w/o dashes like
> > 
> > We can partition big files into multiple partitions and compile these 
> > partitions in parallel which should improve CPU utilization by exposing
> > smaller chunks to the GNU Make Jobserver.  Code generation quality
> > should be unaffected by this.
> 
> How about:
> 
> ```
> The advantage of this approach is: by partitioning big files into
> multiple partitions, we can improve the compilation performance by
> exposing these partitions to the Jobserver. Therefore, it can improve
> CPU utilization in manycore machines.  Generated code quality should be
> unaffected by this procedure, which means that it should run as fast as
> when LTO is disabled.
> ```
> ?

Sounds great.

Richard.

> > 
> > Thanks,
> > Richard.
> > 
> > > Planned Tasks
> > > 
> > > I plan to use the GSoC time to develop the following topics:
> > > 
> > > -   Week \[1, 3\] -- April 27 to May 15:\
> > >     Update `cc1`, `cc1plus`, `f771`, ..., to partition the Compilation
> > >     Unit (CU) after IPA analysis directly into multiple LTRANS
> > >     partitions, instead of generating a temporary GIMPLE file, and to
> > >     accept an additional parameter `-fsplit-outputs=<tempfile>`, to which
> > >     the generated ASM filenames will be written.
> > > 
> > >     There are two possible cases in which I could work on:
> > > 
> > >     1.  *Fork*: After the CU is partitioned into multiple LTRANS, then
> > >         `cc1` will fork and compile these partitions, each of them
> > >         generating an ASM file, and write the generated ASM name into
> > >         `<tempfile>`. Note that if the number of partitions is one, then
> > >         this part is not necessary.
> > > 
> > >     2.  *Stream LTRANS IR*: After the CU is partitioned into multiple
> > >         LTRANS, `cc1` will write these partitions to disk so that
> > >         LTO can read these files and proceed as a standard LTO operation
> > >         in order to generate a partially linked object file.
> > > 
> > >     Option 1 has the advantage of lower overhead than option 2, as it
> > >     performs fewer IO operations; however, it may be harder to implement,
> > >     as the assembler file may already be open before forking, so caution
> > >     is necessary to ensure a 1-to-1 relationship between the assembler
> > >     file and the compilation process. Option 2, on the other hand, can
> > >     easily interact with the GNU Jobserver.
> > > 
> > > -   Week \[4, 7\] -- May 18 to June 12:\
> > >     Update the `gcc` driver to take each file in `<tempfile>`, then
> > >     assemble and partially link them together. Here, an important
> > >     optimization is to use a named pipe in `<tempfile>` to avoid having
> > >     to wait for every partition to finish its compilation before
> > >     assembling the files.
> > > 
> > > -   Week 8 -- June 15 to 19: **First Evaluation**\
> > >     Deliver a non-optimized version of the project. Some programs ought
> > >     to be compiled correctly, but probably there will be a huge overhead
> > >     because so far there is no way of interacting with GNU Jobserver.
> > > 
> > > -   Week \[9, 11\] -- June 22 to July 10:\
> > >     Work on GNU Make Jobserver integration. A way of doing this is to
> > >     adapt the LTO WPA -> LTRANS way of interacting with
> > >     Jobserver. Another way is to make the forked `cc1` consume Jobserver
> > >     tokens until the compilation finishes, then return the token when
> > >     done.
> > > 
> > > -   Week 12 -- July 13 to 17: **Second Evaluation**\
> > >     Deliver a more optimized version of the project. Here we should
> > >     filter files that would compile fast from files that would require
> > >     partitioning, and interact with GNU Jobserver. Therefore we should
> > >     see some speedup.
> > > 
> > > -   Week \[13, 15\] -- July 20 to August 10:\
> > >     Develop adequate test coverage and address unexpected issues so
> > >     that this feature can be merged to trunk for the next GCC release.
> > > 
> > > -   Week 16: **Final evaluation**\
> > >     Deliver the final product as a series of patches for trunk.
> > > 
> > > On 03/13, Giuliano Belinassi wrote:
> > > > Hi, all
> > > > 
> > > > I want to propose and apply for the following GSoC project: Automatic
> > > > Detection of Parallel Compilation Viability.
> > > > 
> > > > https://www.ime.usp.br/~belinass/Automatic_Detection_of_Parallel_Compilation_Viability.pdf
> > > > 
> > > > Feedback is welcome :)
> > > > 
> > > > Here is a markdown version of it:
> > > > 
> > > > **Automatic Detection of Parallel Compilation Viability**
> > > > 
> > > > [Giuliano Belinassi]{style="color: darkgreen"}\
> > > > Timezone: GMT$-$3:00\
> > > > University of São Paulo -- Brazil\
> > > > IRC: giulianob in \#gcc\
> > > > Email: [`giuliano.belinassi@usp.br`](mailto:giuliano.belinassi@usp.br)\
> > > > Github: <https://github.com/giulianobelinassi/>\
> > > > Date:
> > > > 
> > > > About Me Computer Science Bachelor (University of São Paulo), currently
> > > > pursuing a Masters Degree in Computer Science at the same institution.
> > > > I've always been fascinated by topics such as High-Performance Computing
> > > > and Code Optimization, having worked with a parallel implementation of a
> > > > Boundary Elements Method software in GPU. I am currently conducting
> > > > research on compiler parallelization and developing the
> > > > [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc) project, having
> > > > already presented it in [GNU Cauldron
> > > > 2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).
> > > > 
> > > > **Skills**: Strong knowledge in C, Concurrency, Shared Memory
> > > > Parallelism, Multithreaded Debugging and other typical programming
> > > > tools.
> > > > 
> > > > Brief Introduction
> > > > 
> > > > In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that
> > > > parallelizing the Intra Procedural optimizations improves speed when
> > > > compiling huge files by a factor of 1.8x in a 4 cores machine, and also
> > > > showed that this takes 75% of compilation time.
> > > > 
> > > > In this project we plan to use the LTO infrastructure to improve
> > > > compilation performance in the non-LTO case, with a tradeoff of
> > > > generating a binary as good as if LTO is disabled. Here, we will
> > > > automatically detect when a single file will benefit from parallelism,
> > > > and proceed with the compilation in parallel if so.
> > > > 
> > > > Use of LTO
> > > > 
> > > > The Link Time Optimization (LTO) is a compilation technique that allows
> > > > the compiler to analyse the program as a whole, instead of analysing and
> > > > compiling one file at time. Therefore, LTO is able to collect more
> > > > information about the program and generate a better optimization plan.
> > > > LTO is divided in three parts:
> > > > 
> > > > -   *LGEN (Local Generation)*: Each file is translated to GIMPLE. This
> > > >     stage runs sequentially in each file and, therefore, in parallel in
> > > >     the project compilation.
> > > > 
> > > > -   *WPA (Whole Program Analysis)*: Run the Inter Procedural Analysis
> > > >     (IPA) in the entire program. This state runs serially in the
> > > >     project.
> > > > 
> > > > -   *LTRANS (Local Transformation)*: Execute all Intra Procedural
> > > >     Optimizations in each partition. This stage runs in parallel.
> > > > 
> > > > Since WPA can bottleneck the compilation because it runs serially in the
> > > > entire project, LTO was designed to produce faster binaries, not to
> > > > produce binaries fast.
> > > > 
> > > > Here, the proposed use of LTO to address this problem is to run the IPA
> > > > for each Translation Unit (TU), instead in the Whole Program, and
> > > > automatically detect when to partition the TU into multiple LTRANS to
> > > > improve performance. The advantage of this approach is:
> > > > 
> > > > -   It can generate binaries as good as when LTO is disabled.
> > > > 
> > > > -   It is faster, as we can partition big files into multiple partitions
> > > >     and compile these partitions in parallel
> > > > 
> > > > -   It can interact with GNU Make Jobserver, improving CPU utilization.
> > > > 
> > > > Planned Tasks
> > > > 
> > > > I plan to use the GSoC time to develop the following topics:
> > > > 
> > > > -   Week \[1, 3\] -- April 27 to May 15:\
> > > >     Update `cc1`, `cc1plus`, `f771`, ..., to partition the data after
> > > >     IPA analysis directly into multiple LTRANS partitions, instead of
> > > >     generating a temporary GIMPLE file.
> > > > 
> > > > -   Week \[4, 7\] -- May 18 to June 12:\
> > > >     Update the `gcc` driver to take these multiple LTRANS partitions,
> > > >     then call the compiler and assembler for each of them, and merge the
> > > >     results into one object file. Here I will use the LTO LTRANS object
> > > >     streaming, therefore it should interact with GNU Make Jobserver.
> > > > 
> > > > -   Week 8 -- June 15 to 19: **First Evaluation**\
> > > >     Deliver a non-optimized version of the project. Some programs ought
> > > >     to be compiled correctly, but probably there will be a huge overhead
> > > >     because so far there will not be any criterion for when to
> > > >     partition. Some tests are also planned for this evaluation.
> > > > 
> > > > -   Week \[9, 11\] -- June 22 to July 10:\
> > > >     Implement a criterion for when to partition, and iteratively
> > > >     improve it based on data.
> > > > 
> > > > -   Week 12 --- July 13 to 17: **Second Evaluation**\
> > > >     Deliver a more optimized version of the project. Here we should
> > > >     filter files that would compile fast from files that would require
> > > >     partitioning, and therefore we should see some speedup.
> > > > 
> > > > -   Week \[13, 15\] --- July 20 to August 10:\
> > > >     Develop adequate test coverage and address unexpected issues so
> > > >     that this feature can be merged to trunk for the next GCC release.
> > > > 
> > > > -   Week 16: **Final evaluation**\
> > > >     Deliver the final product as a series of patches for trunk.
> > > > 
> > > 
> > 
> > -- 
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 
> Thank you,
> Giuliano.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GSoC 2020] Automatic Detection of Parallel Compilation Viability
  2020-03-24  7:20       ` Richard Biener
@ 2020-03-24 20:54         ` Giuliano Belinassi
  0 siblings, 0 replies; 9+ messages in thread
From: Giuliano Belinassi @ 2020-03-24 20:54 UTC (permalink / raw)
  To: gcc; +Cc: mjambor, hubicka, rguenther

Hi, all.

I am updating the timeline, since it was shifted due to SARS-CoV-2. Here
is the updated version:

-   Week \[1, 4\] -- May 4 to May 27:\
    Update `cc1`, `cc1plus`, `f771`, ..., to partition the Compilation
    Unit (CU) after IPA analysis directly into multiple LTRANS
    partitions, instead of generating a temporary GIMPLE file, and to
    accept an additional parameter `-fsplit-outputs=<tempfile>`, to which
    the generated ASM filenames will be written.

    There are two possible cases in which I could work on:

    1.  *Fork*: After the CU is partitioned into multiple LTRANS, then
        `cc1` will fork and compile these partitions, each of them
        generating an ASM file, and write the generated ASM name into
        `<tempfile>`. Note that if the number of partitions is one, then
        this part is not necessary.

    2.  *Stream LTRANS IR*: After the CU is partitioned into multiple
        LTRANS, `cc1` will write these partitions to disk so that
        LTO can read these files and proceed as a standard LTO operation
        in order to generate a partially linked object file.

    Option 1 has the advantage of lower overhead than option 2, as it
    performs fewer IO operations; however, it may be harder to implement,
    as the assembler file may already be open before forking, so caution is
    necessary to ensure a 1-to-1 relationship between the assembler file
    and the compilation process. Option 2, on the other hand, can easily
    interact with the GNU Jobserver.

-   Week \[5, 8\] -- June 1 to June 26:\
    Update the `gcc` driver to take each file in `<tempfile>`, then
    assemble and partially link them together. Here, an important
    optimization is to use a named pipe in `<tempfile>` to avoid having
    to wait for every partition to finish its compilation before assembling
    the files.

-   Week 9 -- June 29 to July 3: **First Evaluation**\
    Deliver a non-optimized version of the project. Some programs ought
    to be compiled correctly, but there will probably be a large overhead
    because, so far, there is no way of interacting with the GNU Jobserver.

-   Week \[10, 12\] -- July 6 to July 24:\
    Work on GNU Make Jobserver integration. A way of doing this is to
    adapt the LTO WPA $\rightarrow$ LTRANS way of interacting with
    Jobserver. Another way is to make the forked `cc1` consume Jobserver
    tokens until the compilation finishes, then return the token when
    done.

-   Week 13 -- July 27 to 31: **Second Evaluation**\
    Deliver a more optimized version of the project. Here we should
    filter files that would compile fast from files that would require
    partitioning, and interact with GNU Jobserver. Therefore we should
    see some speedup.

-   Week \[14, 16\] -- August 3 to 21:\
    Develop adequate test coverage and address unexpected issues so
    that this feature can be merged to trunk for the next GCC release.

-   Week 17 -- August 24 to 31: **Final evaluation**\
    Deliver the final product as a series of patches for trunk.

Thank you,
Giuliano.

On 03/24, Richard Biener wrote:
> On Mon, 23 Mar 2020, Giuliano Belinassi wrote:
> 
> > Hi, Richi
> > 
> > On 03/18, Richard Biener wrote:
> > > On Tue, 17 Mar 2020, Giuliano Belinassi wrote:
> > > 
> > > > Hi, all
> > > > 
> > > > I have applied some revews to the project. Please see the new proposal
> > > > here:
> > > 
> > > Looks good, some editorial changes below
> > > 
> > > > https://www.ime.usp.br/~belinass/Automatic_Detection_of_Parallel_Compilation_Viability.pdf
> > > > 
> > > > **Automatic Detection of Parallel Compilation Viability**
> > > > 
> > > > [Giuliano Belinassi]{style="color: darkgreen"}\
> > > > Timezone: GMT$-$3:00\
> > > > University of São Paulo -- Brazil\
> > > > IRC: giulianob in \#gcc\
> > > > Email: [`giuliano.belinassi@usp.br`](mailto:giuliano.belinassi@usp.br)\
> > > > Github: <https://github.com/giulianobelinassi/>\
> > > > Date:
> > > > 
> > > > About Me Computer Science Bachelor (University of São Paulo), currently
> > > > pursuing a Masters Degree in Computer Science at the same institution.
> > > > I've always been fascinated by topics such as High-Performance Computing
> > > > and Code Optimization, having worked with a parallel implementation of a
> > > > Boundary Elements Method software in GPU. I am currently conducting
> > > > research on compiler parallelization and developing the
> > > > [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc) project, having
> > > > already presented it in [GNU Cauldron
> > > > 2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).
> > > > 
> > > > **Skills**: Strong knowledge in C, Concurrency, Shared Memory
> > > > Parallelism, Multithreaded Debugging and other typical programming
> > > > tools.
> > > > 
> > > > Brief Introduction
> > > > 
> > > > In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that
> > > > parallelizing the Intra Procedural optimizations improves speed when
> > > > compiling huge files by a factor of 1.8x in a 4 cores machine, and also
> > > > showed that this takes 75% of compilation time.
> > > > 
> > > > In this project we plan to use the LTO infrastructure to improve
> > > > compilation performance in the non-LTO case, with a tradeoff of
> > > > generating a binary as good as if LTO is disabled. Here, we will
> > > > automatically detect when a single file will benefit from parallelism,
> > > > and procceed with the compilation in parallel if so.
> > > > 
> > > > Use of LTO
> > > > 
> > > > The Link Time Optimization (LTO) is a compilation technique that allows
> > > > the compiler to analyse the program as a whole, instead of analysing and
> > > > compiling one file at time. Therefore, LTO is able to collect more
> > > > information about the program and generate a better optimization plan.
> > > > LTO is divided in three parts:
> > > > 
> > > > -   *LGEN (Local Generation)*: Each file is translated to GIMPLE. This
> > > >     stage runs sequentially in each file and, therefore, in parallel in
> > > >     the project compilation.
> > > > 
> > > > -   *WPA (Whole Program Analysis)*: Run the Inter Procedural Analysis
> > > >     (IPA) in the entire program. This state runs serially in the
> > > >     project.
> > > > 
> > > > -   *LTRANS (Local Transformation)*: Execute all Intra Procedural
> > > >     Optimizations in each partition. This stage runs in parallel.
> > > > 
> > > > Since WPA can bottleneck the compilation because it runs serially in the
> > > > entire project, LTO was designed to produce faster binaries, not to
> > > > produce binaries fast.
> > > > 
> > > > Here, the proposed use of LTO to address this problem is to run the IPA
> > > > for each Translation Unit (TU), instead in the Whole Program, and
> > > 
> > > This proposal is to use LTO to produce binaries fast by running
> > > the IPA phase separately for each Translation Unit (TU), instead of on the 
> > > Whole Program and ...
> > > 
> > > > automatically detect when to partition the TU into multiple LTRANS to
> > > > improve compilation performance. The advantage of this approach is:
> > > > 
> > > > -   It can generate binaries as good as when LTO is disabled.
> > > > 
> > > > -   It is faster, as we can partition big files into multiple partitions
> > > >     and compile these partitions in parallel.
> > > > 
> > > > -   It can interact with GNU Make Jobserver, improving CPU utilization.
> > > 
> > > This reads a bit odd, regular compilation already interacts with the
> > > GNU Make Jobserver.  I'd reorder and reword it w/o dashes like
> > > 
> > > We can partition big files into multiple partitions and compile these 
> > > partitions in parallel which should improve CPU utilization by exposing
> > > smaller chunks to the GNU Make Jobserver.  Code generation quality
> > > should be unaffected by this.
> > 
> > How about:
> > 
> > ```
> > The advantage of this approach is: by partitioning big files into
> > multiple partitions, we can improve the compilation performance by
> > exposing these partitions to the Jobserver. Therefore, it can improve
> > CPU utilization in manycore machines.  Generated code quality should be
> > unaffected by this procedure, which means that it should run as fast as
> > when LTO is disabled.
> > ```
> > ?
> 
> Sounds great.
> 
> Richard.
> 
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > > > Planned Tasks
> > > > 
> > > > I plan to use the GSoC time to develop the following topics:
> > > > 
> > > > -   Week \[1, 3\] -- April 27 to May 15:\
> > > >     Update `cc1`, `cc1plus`, `f771`, ..., to partition the Compilation
> > > >     Unit (CU) after IPA analysis directly into multiple LTRANS
> > > >     partitions, instead of generating a temporary GIMPLE file, and to
> > > >     accept a additional parameter `-fsplit-outputs=<tempfile>`, in which
> > > >     the generated ASM filenames will be written to.
> > > > 
> > > >     There are two possible cases in which I could work on:
> > > > 
> > > >     1.  *Fork*: After the CU is partitioned into multiple LTRANS, then
> > > >         `cc1` will fork and compile these partitions, each of them
> > > >         generating a ASM file, and write the generated asm name into
> > > >         `<tempfile>`. Note that if the number of partitions is one, then
> > > >         this part is not necessary.
> > > > 
> > > >     2.  *Stream LTRANS IR*: After CU is partitionated into multiple
> > > >         LTRANS, then `cc1` will write these partitions into disk so that
> > > >         LTO can read these files and proceed as a standard LTO operation
> > > >         in order to generate a partially linked object file.
> > > > 
> > > >     1\. Has the advantage of having less overhead than 2., as there is less
> > > >     IO operations, however it may be hard to implement as the assembler file
> > > >     may be already opened before forking, so caution is necessary to make
> > > >     sure that there are a 1 - 1 relationship between assembler file and the
> > > >     compilation process. 2. on the other hand can easily interact with the
> > > >     GNU jobserver.
> > > > 
> > > > -   Week \[4, 7\] -- May 18 to June 12:\
> > > >     Update the `gcc` driver to take each file in `<tempfile>`, then
> > > >     assemble and partially link them together. Here, an important
> > > >     optimization is to use a named pipe in `<tempfile>` to avoid having
> > > >     to wait every partition to end its compilation before assembling the
> > > >     files.
> > > > 
> > > > -   Week 8 -- June 15 to 19: **First Evaluation**\
> > > >     Deliver a non-optimized version of the project. Some programs ought
> > > >     to be compiled correctly, but probably there will be a huge overhead
> > > >     because so far there is no way of interacting with GNU Jobserver.
> > > > 
> > > > -   Week \[9, 11\] -- June 22 to July 10:\
> > > >     Work on GNU Make Jobserver integration. A way of doing this is to
> > > >     adapt the LTO WPA -> LTRANS way of interacting with
> > > >     Jobserver. Another way is to make the forked `cc1` consume Jobserver
> > > >     tokens until the compilation finishes, then return the token when
> > > >     done.
> > > > 
> > > > -   Week 12 -- July 13 to 17: **Second Evaluation**\
> > > >     Deliver a more optimized version of the project. Here we should
> > > >     filter files that would compile fast from files that would require
> > > >     partitioning, and interact with GNU Jobserver. Therefore we should
> > > >     see some speedup.
> > > > 
> > > > -   Week \[13, 15\] -- July 20 to August 10:\
> > > >     Develop adequate tests coverage and address unexpected issues so
> > > >     that this feature can be merged to trunk for the next GCC release.
> > > > 
> > > > -   Week 16: **Final evaluation**\
> > > >     Deliver the final product as a series of patches for trunk.
> > > > 
> > > > On 03/13, Giuliano Belinassi wrote:
> > > > > Hi, all
> > > > > 
> > > > > I want to propose and apply for the following GSoC project: Automatic
> > > > > Detection of Parallel Compilation Viability.
> > > > > 
> > > > > https://www.ime.usp.br/~belinass/Automatic_Detection_of_Parallel_Compilation_Viability.pdf
> > > > > 
> > > > > Feedback is welcome :)
> > > > > 
> > > > > Here is a markdown version of it:
> > > > > 
> > > > > **Automatic Detection of Parallel Compilation Viability**
> > > > > 
> > > > > [Giuliano Belinassi]{style="color: darkgreen"}\
> > > > > Timezone: GMT$-$3:00\
> > > > > University of São Paulo -- Brazil\
> > > > > IRC: giulianob in \#gcc\
> > > > > Email: [`giuliano.belinassi@usp.br`](mailto:giuliano.belinassi@usp.br)\
> > > > > Github: <https://github.com/giulianobelinassi/>\
> > > > > Date:
> > > > > 
> > > > > About Me Computer Science Bachelor (University of São Paulo), currently
> > > > > pursuing a Masters Degree in Computer Science at the same institution.
> > > > > I've always been fascinated by topics such as High-Performance Computing
> > > > > and Code Optimization, having worked with a parallel implementation of a
> > > > > Boundary Elements Method software in GPU. I am currently conducting
> > > > > research on compiler parallelization and developing the
> > > > > [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc) project, having
> > > > > already presented it in [GNU Cauldron
> > > > > 2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).
> > > > > 
> > > > > **Skills**: Strong knowledge in C, Concurrency, Shared Memory
> > > > > Parallelism, Multithreaded Debugging and other typical programming
> > > > > tools.
> > > > > 
> > > > > Brief Introduction
> > > > > 
> > > > > In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that
> > > > > parallelizing the Intra Procedural optimizations improves speed when
> > > > > compiling huge files by a factor of 1.8x in a 4 cores machine, and also
> > > > > showed that this takes 75% of compilation time.
> > > > > 
> > > > > In this project we plan to use the LTO infrastructure to improve
> > > > > compilation performance in the non-LTO case, with a tradeoff of
> > > > > generating a binary as good as if LTO is disabled. Here, we will
> > > > > automatically detect when a single file will benefit from parallelism,
> > > > > and proceed with the compilation in parallel if so.
> > > > > 
> > > > > Use of LTO
> > > > > 
> > > > > The Link Time Optimization (LTO) is a compilation technique that allows
> > > > > the compiler to analyse the program as a whole, instead of analysing and
> > > > > compiling one file at time. Therefore, LTO is able to collect more
> > > > > information about the program and generate a better optimization plan.
> > > > > LTO is divided in three parts:
> > > > > 
> > > > > -   *LGEN (Local Generation)*: Each file is translated to GIMPLE. This
> > > > >     stage runs sequentially in each file and, therefore, in parallel in
> > > > >     the project compilation.
> > > > > 
> > > > > -   *WPA (Whole Program Analysis)*: Run the Inter Procedural Analysis
> > > > >     (IPA) in the entire program. This state runs serially in the
> > > > >     project.
> > > > > 
> > > > > -   *LTRANS (Local Transformation)*: Execute all Intra Procedural
> > > > >     Optimizations in each partition. This stage runs in parallel.
> > > > > 
> > > > > Since WPA can bottleneck the compilation because it runs serially in the
> > > > > entire project, LTO was designed to produce faster binaries, not to
> > > > > produce binaries fast.
> > > > > 
> > > > > Here, the proposed use of LTO to address this problem is to run the IPA
> > > > > for each Translation Unit (TU), instead in the Whole Program, and
> > > > > automatically detect when to partition the TU into multiple LTRANS to
> > > > > improve performance. The advantage of this approach is:
> > > > > 
> > > > > -   It can generate binaries as good as when LTO is disabled.
> > > > > 
> > > > > -   It is faster, as we can partition big files into multiple partitions
> > > > >     and compile these partitions in parallel
> > > > > 
> > > > > -   It can interact with GNU Make Jobserver, improving CPU utilization.
> > > > > 
> > > > > Planned Tasks
> > > > > 
> > > > > I plan to use the GSoC time to develop the following topics:
> > > > > 
> > > > > -   Week \[1, 3\] -- April 27 to May 15:\
> > > > >     Update `cc1`, `cc1plus`, `f771`, ..., to partition the data after
> > > > >     IPA analysis directly into multiple LTRANS partitions, instead of
> > > > >     generating a temporary GIMPLE file.
> > > > > 
> > > > > -   Week \[4, 7\] -- May 18 to June 12:\
> > > > >     Update the `gcc` driver to take these multiple LTRANS partitions,
> > > > >     then call the compiler and assembler for each of them, and merge the
> > > > >     results into one object file. Here I will use the LTO LTRANS object
> > > > >     streaming, therefore it should interact with GNU Make Jobserver.
> > > > > 
> > > > > -   Week 8 -- June 15 to 19: **First Evaluation**\
> > > > >     Deliver a non-optimized version of the project. Some programs ought
> > > > >     to be compiled correctly, but probably there will be a huge overhead
> > > > >     because so far there will not be any criteria about when to
> > > > >     partition. Some tests are also planned for this evaluation.
> > > > > 
> > > > > -   Week \[9, 11\] -- June 22 to July 10:\
> > > > >     Implement a criteria about when to partition, and interactively
> > > > >     improve it based on data.
> > > > > 
> > > > > -   Week 12 --- July 13 to 17: **Second Evaluation**\
> > > > >     Deliver a more optimized version of the project. Here we should
> > > > >     filter files that would compile fast from files that would require
> > > > >     partitioning, and therefore we should see some speedup.
> > > > > 
> > > > > -   Week \[13, 15\] --- July 20 to August 10:\
> > > > >     Develop adequate tests coverage and address unexpected issues so
> > > > >     that this feature can be merged to trunk for the next GCC release.
> > > > > 
> > > > > -   Week 16: **Final evaluation**\
> > > > >     Deliver the final product as a series of patches for trunk.
> > > > > 
> > > > 
> > > 
> > > -- 
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > 
> > Thank you,
> > Giuliano.
> > 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)



* Re: [GSoC 2020] Automatic Detection of Parallel Compilation Viability
  2020-03-17 20:04   ` Giuliano Belinassi
@ 2020-03-18 11:44     ` Richard Biener
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Biener @ 2020-03-18 11:44 UTC (permalink / raw)
  To: Giuliano Belinassi; +Cc: gcc, mjambor, hubicka

On Tue, 17 Mar 2020, Giuliano Belinassi wrote:

> Hi, Richi
> 
> Thank you for your review!
> 
> On 03/16, Richard Biener wrote:
> > On Fri, 13 Mar 2020, Giuliano Belinassi wrote:
> > 
> > > Hi, all
> > > 
> > > I want to propose and apply for the following GSoC project: Automatic
> > > Detection of Parallel Compilation Viability.
> > > 
> > > Here is the proposal, and I am attaching a pdf file for better
> > > readability:
> > > 
> > > **Automatic Detection of Parallel Compilation Viability**
> > > 
> > > [Giuliano Belinassi]{style="color: darkgreen"}\
> > > Timezone: GMT$-$3:00\
> > > University of São Paulo -- Brazil\
> > > IRC: giulianob in \#gcc\
> > > Email: [`giuliano.belinassi@usp.br`](mailto:giuliano.belinassi@usp.br)\
> > > Github: <https://github.com/giulianobelinassi/>\
> > > 
> > > About Me: Computer Science Bachelor (University of São Paulo), currently
> > > pursuing a Masters Degree in Computer Science at the same institution.
> > > I've always been fascinated by topics such as High-Performance Computing
> > > and Code Optimization, having worked with a parallel implementation of a
> > > Boundary Elements Method software in GPU. I am currently conducting
> > > research on compiler parallelization and developing the
> > > [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc) project, having
> > > already presented it in [GNU Cauldron
> > > 2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).
> > > 
> > > **Skills**: Strong knowledge in C, Concurrency, Shared Memory
> > > Parallelism, Multithreaded Debugging and other typical programming
> > > tools.
> > > 
> > > Brief Introduction
> > > 
> > > In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that
> > > parallelizing the Intra Procedural optimizations improves speed when
> > > compiling huge files by a factor of 1.8x in a 4 cores machine, and also
> > > showed that this takes 75% of compilation time.
> > > 
> > > In this project we plan to use the LTO infrastructure to improve
> > > compilation performance in the non-LTO case, with a tradeoff of
> > > generating a binary as good as if LTO is disabled. Here, we will
> > > automatically detect when a single file will benefit from parallelism,
> > > and proceed with the compilation in parallel if so.
> > > 
> > > Use of LTO
> > > 
> > > The Link Time Optimization (LTO) is a compilation technique that allows
> > > the compiler to analyse the program as a whole, instead of analysing and
> > > compiling one file at time. Therefore, LTO is able to collect more
> > > information about the program and generate a better optimization plan.
> > > LTO is divided in three parts:
> > > 
> > > -   *LGEN (Local Generation)*: Each file is translated to GIMPLE. This
> > >     stage runs sequentially in each file and, therefore, in parallel in
> > >     the project compilation.
> > > 
> > > -   *WPA (Whole Program Analysis)*: Run the Inter Procedural Analysis
> > >     (IPA) in the entire program. This state runs serially in the
> > >     project.
> > > 
> > > -   *LTRANS (Local Transformation)*: Execute all Intra Procedural
> > >     Optimizations in each partition. This stage runs in parallel.
> > > 
> > > Since WPA can bottleneck the compilation because it runs serially in the
> > > entire project, LTO was designed to produce faster binaries, not to
> > > produce binaries fast.
> > > 
> > > Here, the proposed use of LTO to address this problem is to run the IPA
> > > for each Translation Unit (TU), instead in the Whole Program, and
> > > automatically detect when to partition the TU into multiple LTRANS to
> > > improve performance. The advantage of this approach is:
> > 
> > "to improve compilation performance"
> > 
> > > -   It can generate binaries as good as when LTO is disabled.
> > > 
> > > -   It is faster, as we can partition big files into multiple partitions
> > >     and compile these partitions in parallel
> > > 
> > > -   It can interact with GNU Make Jobserver, improving CPU utilization.
> > 
> > The previous already improves CPU utilization, I guess GNU make jobserver
> > integration avoids CPU overcommit.
> > 
> > > Planned Tasks
> > > 
> > > I plan to use the GSoC time to develop the following topics:
> > > 
> > > -   Week \[1, 3\] -- April 27 to May 15:\
> > >     Update `cc1`, `cc1plus`, `f771`, ..., to partition the data after
> > >     IPA analysis directly into multiple LTRANS partitions, instead of
> > >     generating a temporary GIMPLE file.
> > 
> > To summarize in my own words:
> > 
> >   After IPA analysis partition the CU into possibly multiple LTRANS 
> >   partitions even for non-LTO compilations. Invoke LTRANS compilation
> >   for partitions 2..n without writing intermediate IL through mechanisms
> >   like forking.
> > 
> > I might say that you could run into "issues" here with asm_out_file
> > already opened and partially written to.  Possibly easier (but harder
> > on the driver side) would be to stream LTO LTRANS IL for partitions
> > 2..n and handle those like with regular LTO operation.  But I guess
> > I'd try w/o writing IL first and only if it turns out too difficult
> > go the IL writing way.
> 
> Ok. I changed the application text based on that.
> 
> > 
> > > -   Week \[4, 7\] -- May 18 to June 12:\
> > >     Update the `gcc` driver to take these multiple LTRANS partitions,
> > >     then call the compiler and assembler for each of them, and merge the
> > >     results into one object file. Here I will use the LTO LTRANS object
> > >     streaming, therefore it should interact with GNU Make Jobserver.
> > 
> > Hmm, so if you indeed want to do that as second step the first step
> > would still need driver modifications to invoke the assembler.  I think
> > in previous discussions I suggested to have the driver signal cc1 and 
> > friends via a special -fsplit-tu-to-asm-outputs=<tempfile> argument that 
> > splitting is desirable and that the used output assembler files should
> > be written to <tempfile> so the driver can pick them up for assembling
> > and linking.
> > 
> > You also miss the fact that the driver also needs to invoke the linker
> > to merge the N LTRANS objects back to one.
> 
> Actually, I tried to avoid going into such technical detail here, but
> it indeed makes the proposal more concrete.
> 
> > 
> > I suggest you first ignore the jobserver and try doing without
> > LTRANS IL streaming.  I think meanwhile lto1 got jobserver support
> > for the WPA -> LTRANS streaming so you can reuse that for jobserver
> > aware "forking" (and later assembling in the driver).  Using
> > a named pipe or some other mechanism might also allow to pick up
> > assembler output for the individual units as it becomes ready rather
> > than waiting for the slowest LTRANS unit to finish compiling.
> 
> Ok. I updated the proposal with this information.
> 
> > 
> > > -   Week 8 -- June 15 to 19: **First Evaluation**\
> > >     Deliver a non-optimized version of the project. Some programs ought
> > >     to be compiled correctly, but probably there will be a huge overhead
> > >     because so far there will not be any criteria about when to
> > >     partition. Some tests are also planned for this evaluation.
> > > 
> > > -   Week \[9, 11\] -- June 22 to July 10:\
> > >     Implement a criteria about when to partition, and interactively
> > >     improve it based on data.
> > 
> > I think this should be already there (though we error on the side of
> > generating "more" partitions).  For non-LTO parallelizing operation
> > we maybe want to tune the various --params that are available
> > though (lto-min-partition and lto-partitions).
> 
> This is interesting. Here at the lab we have a student who developed an
> experiment design model to predict how certain parameters can impact the
> final result. Could you please give me more details about these
> parameters?

--param lto-partitions specifies the maximum number of partitions
to create and --param lto-min-partition specifies the minimum size
a partition needs to have to be considered for splitting.
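
A hypothetical invocation showing where these two knobs sit today (an
illustration of the existing LTO flags, not the proposed non-LTO mode;
file names are made up):

```shell
# Compile with LTO bytecode, then link with a cap of 16 partitions and a
# minimum partition size (in GCC's internal size estimate) of 10000.
gcc -O2 -flto -c a.c b.c
gcc -O2 -flto --param lto-partitions=16 \
    --param lto-min-partition=10000 a.o b.o -o app
```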

> > 
> > So I'd suggest to concentrate on the jobserver integration for the
> > second phase?
> 
> I just changed the proposal to focus on jobserver integration at this
> stage.
> 
> > 
> > Otherwise the proposal looks good and I'm confident we can deliver
> > something that will be ready for real-world usage for GCC 11!
> 
> Thank you :)
> Giuliano.
> 
> > 
> > Thanks,
> > Richard.
> > 
> > > -   Week 12 -- July 13 to 17: **Second Evaluation**\
> > >     Deliver a more optimized version of the project. Here we should
> > >     filter files that would compile fast from files that would require
> > >     partitioning, and therefore we should see some speedup.
> > > 
> > > -   Week \[13, 15\] -- July 20 to August 10:\
> > >     Develop adequate tests coverage and address unexpected issues so
> > >     that this feature can be merged to trunk for the next GCC release.
> > >
> > > -   Week 16: **Final evaluation**\
> > >     Deliver the final product as a series of patches for trunk.
> > > 
> > > 
> > > Thank you
> > > Giuliano.
> > > 
> > 
> > -- 
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


* Re: [GSoC 2020] Automatic Detection of Parallel Compilation Viability
  2020-03-16 13:08 ` Richard Biener
@ 2020-03-17 20:04   ` Giuliano Belinassi
  2020-03-18 11:44     ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Giuliano Belinassi @ 2020-03-17 20:04 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc, mjambor, hubicka

Hi, Richi

Thank you for your review!

On 03/16, Richard Biener wrote:
> On Fri, 13 Mar 2020, Giuliano Belinassi wrote:
> 
> > Hi, all
> > 
> > I want to propose and apply for the following GSoC project: Automatic
> > Detection of Parallel Compilation Viability.
> > 
> > Here is the proposal, and I am attaching a pdf file for better
> > readability:
> > 
> > **Automatic Detection of Parallel Compilation Viability**
> > 
> > [Giuliano Belinassi]{style="color: darkgreen"}\
> > Timezone: GMT$-$3:00\
> > University of São Paulo -- Brazil\
> > IRC: giulianob in \#gcc\
> > Email: [`giuliano.belinassi@usp.br`](mailto:giuliano.belinassi@usp.br)\
> > Github: <https://github.com/giulianobelinassi/>\
> > 
> > About Me: Computer Science Bachelor (University of São Paulo), currently
> > pursuing a Masters Degree in Computer Science at the same institution.
> > I've always been fascinated by topics such as High-Performance Computing
> > and Code Optimization, having worked with a parallel implementation of a
> > Boundary Elements Method software in GPU. I am currently conducting
> > research on compiler parallelization and developing the
> > [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc) project, having
> > already presented it in [GNU Cauldron
> > 2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).
> > 
> > **Skills**: Strong knowledge in C, Concurrency, Shared Memory
> > Parallelism, Multithreaded Debugging and other typical programming
> > tools.
> > 
> > Brief Introduction
> > 
> > In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that
> > parallelizing the Intra Procedural optimizations improves speed when
> > compiling huge files by a factor of 1.8x in a 4 cores machine, and also
> > showed that this takes 75% of compilation time.
> > 
> > In this project we plan to use the LTO infrastructure to improve
> > compilation performance in the non-LTO case, with a tradeoff of
> > generating a binary as good as if LTO is disabled. Here, we will
> > automatically detect when a single file will benefit from parallelism,
> > and proceed with the compilation in parallel if so.
> > 
> > Use of LTO
> > 
> > The Link Time Optimization (LTO) is a compilation technique that allows
> > the compiler to analyse the program as a whole, instead of analysing and
> > compiling one file at time. Therefore, LTO is able to collect more
> > information about the program and generate a better optimization plan.
> > LTO is divided in three parts:
> > 
> > -   *LGEN (Local Generation)*: Each file is translated to GIMPLE. This
> >     stage runs sequentially in each file and, therefore, in parallel in
> >     the project compilation.
> > 
> > -   *WPA (Whole Program Analysis)*: Run the Inter Procedural Analysis
> >     (IPA) in the entire program. This state runs serially in the
> >     project.
> > 
> > -   *LTRANS (Local Transformation)*: Execute all Intra Procedural
> >     Optimizations in each partition. This stage runs in parallel.
> > 
> > Since WPA can bottleneck the compilation because it runs serially in the
> > entire project, LTO was designed to produce faster binaries, not to
> > produce binaries fast.
> > 
> > Here, the proposed use of LTO to address this problem is to run the IPA
> > for each Translation Unit (TU), instead in the Whole Program, and
> > automatically detect when to partition the TU into multiple LTRANS to
> > improve performance. The advantage of this approach is:
> 
> "to improve compilation performance"
> 
> > -   It can generate binaries as good as when LTO is disabled.
> > 
> > -   It is faster, as we can partition big files into multiple partitions
> >     and compile these partitions in parallel
> > 
> > -   It can interact with GNU Make Jobserver, improving CPU utilization.
> 
> The previous already improves CPU utilization, I guess GNU make jobserver
> integration avoids CPU overcommit.
> 
> > Planned Tasks
> > 
> > I plan to use the GSoC time to develop the following topics:
> > 
> > -   Week \[1, 3\] -- April 27 to May 15:\
> >     Update `cc1`, `cc1plus`, `f771`, ..., to partition the data after
> >     IPA analysis directly into multiple LTRANS partitions, instead of
> >     generating a temporary GIMPLE file.
> 
> To summarize in my own words:
> 
>   After IPA analysis partition the CU into possibly multiple LTRANS 
>   partitions even for non-LTO compilations. Invoke LTRANS compilation
>   for partitions 2..n without writing intermediate IL through mechanisms
>   like forking.
> 
> I might say that you could run into "issues" here with asm_out_file
> already opened and partially written to.  Possibly easier (but harder
> on the driver side) would be to stream LTO LTRANS IL for partitions
> 2..n and handle those like with regular LTO operation.  But I guess
> I'd try w/o writing IL first and only if it turns out too difficult
> go the IL writing way.

Ok. I changed the application text based on that.

> 
> > -   Week \[4, 7\] -- May 18 to June 12:\
> >     Update the `gcc` driver to take these multiple LTRANS partitions,
> >     then call the compiler and assembler for each of them, and merge the
> >     results into one object file. Here I will use the LTO LTRANS object
> >     streaming, therefore it should interact with GNU Make Jobserver.
> 
> Hmm, so if you indeed want to do that as second step the first step
> would still need driver modifications to invoke the assembler.  I think
> in previous discussions I suggested to have the driver signal cc1 and 
> friends via a special -fsplit-tu-to-asm-outputs=<tempfile> argument that 
> splitting is desirable and that the used output assembler files should
> be written to <tempfile> so the driver can pick them up for assembling
> and linking.
> 
> You also miss the fact that the driver also needs to invoke the linker
> to merge the N LTRANS objects back to one.

Actually, I tried to avoid going into such technical detail here, but
it indeed makes the proposal more concrete.

> 
> I suggest you first ignore the jobserver and try doing without
> LTRANS IL streaming.  I think meanwhile lto1 got jobserver support
> for the WPA -> LTRANS streaming so you can reuse that for jobserver
> aware "forking" (and later assembling in the driver).  Using
> a named pipe or some other mechanism might also allow to pick up
> assembler output for the individual units as it becomes ready rather
> than waiting for the slowest LTRANS unit to finish compiling.

Ok. I updated the proposal with this information.

> 
> > -   Week 8 -- June 15 to 19: **First Evaluation**\
> >     Deliver a non-optimized version of the project. Some programs ought
> >     to be compiled correctly, but probably there will be a huge overhead
> >     because so far there will not be any criterion for when to
> >     partition. Some tests are also planned for this evaluation.
> > 
> > -   Week \[9, 11\] -- June 22 to July 10:\
> >     Implement a criterion for when to partition, and iteratively
> >     improve it based on data.
> 
> I think this should already be there (though we err on the side of
> generating "more" partitions).  For non-LTO parallelizing operation
> we maybe want to tune the various --params that are available
> though (lto-min-partition and lto-partitions).

This is interesting. Here at the lab we have a student who developed an
experiment-design model to predict how certain parameters impact the
final result. Could you please give me more details about these
parameters?

> 
> So I'd suggest concentrating on the jobserver integration for the
> second phase?

I just changed the proposal to focus on jobserver integration at this
stage.

> 
> Otherwise the proposal looks good and I'm confident we can deliver
> something that will be ready for real-world usage for GCC 11!

Thank you :)
Giuliano.

> 
> Thanks,
> Richard.
> 
> > -   Week 12 -- July 13 to 17: **Second Evaluation**\
> >     Deliver a more optimized version of the project. Here we should
> >     filter files that would compile fast from files that would require
> >     partitioning, and therefore we should see some speedup.
> > 
> > -   Week \[13, 15\] -- July 20 to August 10:\
> >     Develop adequate test coverage and address unexpected issues so
> >     that this feature can be merged to trunk for the next GCC release.
> >
> > -   Week 16: **Final evaluation**\
> >     Deliver the final product as a series of patches for trunk.
> > 
> > 
> > Thank you
> > Giuliano.
> > 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)



* Re: [GSoC 2020] Automatic Detection of Parallel Compilation Viability
       [not found] <20200313200551.viqhqgjw3gixjarw@smtp.gmail.com>
@ 2020-03-16 13:08 ` Richard Biener
  2020-03-17 20:04   ` Giuliano Belinassi
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Biener @ 2020-03-16 13:08 UTC (permalink / raw)
  To: Giuliano Belinassi; +Cc: gcc, mjambor, hubicka

On Fri, 13 Mar 2020, Giuliano Belinassi wrote:

> Hi, all
> 
> I want to propose and apply for the following GSoC project: Automatic
> Detection of Parallel Compilation Viability.
> 
> Here is the proposal, and I am attaching a pdf file for better
> readability:
> 
> **Automatic Detection of Parallel Compilation Viability**
> 
> [Giuliano Belinassi]{style="color: darkgreen"}\
> Timezone: GMT-3:00\
> University of São Paulo -- Brazil\
> IRC: giulianob in \#gcc\
> Email: [`giuliano.belinassi@usp.br`](mailto:giuliano.belinassi@usp.br)\
> Github: <https://github.com/giulianobelinassi/>\
> 
> About Me: Computer Science Bachelor's degree (University of São Paulo), currently
> pursuing a Master's degree in Computer Science at the same institution.
> I've always been fascinated by topics such as High-Performance Computing
> and Code Optimization, having worked with a parallel implementation of a
> Boundary Elements Method software in GPU. I am currently conducting
> research on compiler parallelization and developing the
> [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc) project, having
> already presented it in [GNU Cauldron
> 2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).
> 
> **Skills**: Strong knowledge in C, Concurrency, Shared Memory
> Parallelism, Multithreaded Debugging and other typical programming
> tools.
> 
> Brief Introduction
> 
> In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that
> parallelizing the Intra Procedural optimizations improves speed when
> compiling huge files by a factor of 1.8x on a 4-core machine, and also
> showed that these optimizations take 75% of compilation time.
> 
> In this project we plan to use the LTO infrastructure to improve
> compilation performance in the non-LTO case, while still generating a
> binary as good as when LTO is disabled. Here, we will automatically
> detect when a single file will benefit from parallelism and, if so,
> proceed with the compilation in parallel.
> 
> Use of LTO
> 
> The Link Time Optimization (LTO) is a compilation technique that allows
> the compiler to analyse the program as a whole, instead of analysing and
> compiling one file at a time. Therefore, LTO is able to collect more
> information about the program and generate a better optimization plan.
> LTO is divided into three parts:
> 
> -   *LGEN (Local Generation)*: Each file is translated to GIMPLE. This
>     stage runs sequentially in each file and, therefore, in parallel in
>     the project compilation.
> 
> -   *WPA (Whole Program Analysis)*: Run the Inter Procedural Analysis
>     (IPA) on the entire program. This stage runs serially in the
>     project.
> 
> -   *LTRANS (Local Transformation)*: Execute all Intra Procedural
>     Optimizations in each partition. This stage runs in parallel.
> 
> Since WPA runs serially over the entire project and can therefore
> bottleneck the compilation, LTO was designed to produce faster
> binaries, not to produce binaries fast.
> 
> Here, the proposed use of LTO to address this problem is to run the IPA
> on each Translation Unit (TU) separately, instead of on the whole
> program, and to automatically detect when to partition the TU into
> multiple LTRANS partitions to improve performance. The advantages of
> this approach are:

"to improve compilation performance"

> -   It can generate binaries as good as when LTO is disabled.
> 
> -   It is faster, as we can partition big files into multiple partitions
>     and compile these partitions in parallel.
> 
> -   It can interact with the GNU Make jobserver, improving CPU utilization.

The previous already improves CPU utilization, I guess GNU make jobserver
integration avoids CPU overcommit.

> Planned Tasks
> 
> I plan to use the GSoC time to develop the following topics:
> 
> -   Week \[1, 3\] -- April 27 to May 15:\
>     Update `cc1`, `cc1plus`, `f771`, ..., to partition the data after
>     IPA analysis directly into multiple LTRANS partitions, instead of
>     generating a temporary GIMPLE file.

To summarize in my own words:

  After IPA analysis partition the CU into possibly multiple LTRANS 
  partitions even for non-LTO compilations. Invoke LTRANS compilation
  for partitions 2..n without writing intermediate IL through mechanisms
  like forking.

I might say that you could run into "issues" here with asm_out_file
already opened and partially written to.  Possibly easier (but harder
on the driver side) would be to stream LTO LTRANS IL for partitions
2..n and handle those like with regular LTO operation.  But I guess
I'd try w/o writing IL first and only if it turns out too difficult
go the IL writing way.

> -   Week \[4, 7\] -- May 18 to June 12:\
>     Update the `gcc` driver to take these multiple LTRANS partitions,
>     then call the compiler and assembler for each of them, and merge the
>     results into one object file. Here I will use the LTO LTRANS object
>     streaming; therefore, it should interact with the GNU Make jobserver.

Hmm, so if you indeed want to do that as second step the first step
would still need driver modifications to invoke the assembler.  I think
in previous discussions I suggested to have the driver signal cc1 and 
friends via a special -fsplit-tu-to-asm-outputs=<tempfile> argument that 
splitting is desirable and that the used output assembler files should
be written to <tempfile> so the driver can pick them up for assembling
and linking.

You also miss the fact that the driver also needs to invoke the linker
to merge the N LTRANS objects back to one.

I suggest you first ignore the jobserver and try doing without
LTRANS IL streaming.  I think meanwhile lto1 got jobserver support
for the WPA -> LTRANS streaming so you can reuse that for jobserver
aware "forking" (and later assembling in the driver).  Using
a named pipe or some other mechanism might also allow to pick up
assembler output for the individual units as it becomes ready rather
than waiting for the slowest LTRANS unit to finish compiling.

> -   Week 8 -- June 15 to 19: **First Evaluation**\
>     Deliver a non-optimized version of the project. Some programs ought
>     to be compiled correctly, but probably there will be a huge overhead
>     because so far there will not be any criterion for when to
>     partition. Some tests are also planned for this evaluation.
> 
> -   Week \[9, 11\] -- June 22 to July 10:\
>     Implement a criterion for when to partition, and iteratively
>     improve it based on data.

I think this should already be there (though we err on the side of
generating "more" partitions).  For non-LTO parallelizing operation
we maybe want to tune the various --params that are available
though (lto-min-partition and lto-partitions).

So I'd suggest concentrating on the jobserver integration for the
second phase?

Otherwise the proposal looks good and I'm confident we can deliver
something that will be ready for real-world usage for GCC 11!

Thanks,
Richard.

> -   Week 12 -- July 13 to 17: **Second Evaluation**\
>     Deliver a more optimized version of the project. Here we should
>     filter files that would compile fast from files that would require
>     partitioning, and therefore we should see some speedup.
> 
> -   Week \[13, 15\] -- July 20 to August 10:\
>     Develop adequate test coverage and address unexpected issues so
>     that this feature can be merged to trunk for the next GCC release.
>
> -   Week 16: **Final evaluation**\
>     Deliver the final product as a series of patches for trunk.
> 
> 
> Thank you
> Giuliano.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


end of thread, other threads:[~2020-03-24 20:54 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
2020-03-13 20:15 [GSoC 2020] Automatic Detection of Parallel Compilation Viability Giuliano Belinassi
2020-03-17 20:24 ` Giuliano Belinassi
2020-03-18 14:27   ` Richard Biener
2020-03-24  0:37     ` Giuliano Belinassi
2020-03-24  7:20       ` Richard Biener
2020-03-24 20:54         ` Giuliano Belinassi
     [not found] <20200313200551.viqhqgjw3gixjarw@smtp.gmail.com>
2020-03-16 13:08 ` Richard Biener
2020-03-17 20:04   ` Giuliano Belinassi
2020-03-18 11:44     ` Richard Biener
