public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* GSoC topic: Implement hot cold splitting at GIMPLE IR level
@ 2020-03-03  5:51 Aditya K
  2020-03-03  8:47 ` Martin Liška
  0 siblings, 1 reply; 9+ messages in thread
From: Aditya K @ 2020-03-03  5:51 UTC (permalink / raw)
  To: gcc

Hi Everyone,
I was one of the original authors of hot cold splitting optimization in LLVM. I was wondering if implementing
a region based hot cold splitting optimization would be useful in GCC? We already have optimal implementation of SESE region detection in GCC (https://github.com/gcc-mirror/gcc/blob/master/gcc/sese.h) so implementation of hot cold splitting at IR level could leverage some of that.

Motivation:
With the increasing popularity of RISC-V architecture, where most applications are constrained for code-size. I assume those applications would (soon) be suffering from app launch time and page faults. I don't have numbers for this sorry, just a guess. Having an IR level hot cold splitting pass would benefit applications deployed on such devices by reducing their startup working set. I'd be happy to mentor a GSoC candidate if we chose to list this as one of the projects.

Description of the project: Region based Hot Cold Splitting is an IR level function splitting transformation. The goal of hot/cold splitting is to improve the memory locality of code and helps reduce startup working set. The splitting pass does this by identifying cold blocks and moving them into separate functions. Because it is implemented at the IR level all the back end target benefit from it. It is a relatively new optimization and it was recently presented at the LLVM Dev Meeting in 2019 and the slides are here: https://llvm.org/devmtg/2019-10/talk-abstracts.html#tech8. There are fast algorithms to detect SESE regions as illustrated in (http://impact.gforge.inria.fr/impact2016/papers/impact2016-kumar.pdf), we can leverage that to detect regions. 

Deliverables:
- Implement hot cold splitting pass at GIMPLE level
- Detect maximal cold region in a function and outline it as a separate function
- Use static as well as dynamic profile information to mark cold edges
- Write unit tests to show variety of regions outlined


Thanks,
-Aditya

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: GSoC topic: Implement hot cold splitting at GIMPLE IR level
  2020-03-03  5:51 GSoC topic: Implement hot cold splitting at GIMPLE IR level Aditya K
@ 2020-03-03  8:47 ` Martin Liška
       [not found]   ` <BYAPR08MB42320E829A37607C0AC4F967B6E40@BYAPR08MB4232.namprd08.prod.outlook.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Martin Liška @ 2020-03-03  8:47 UTC (permalink / raw)
  To: Aditya K, gcc; +Cc: Jan Hubicka

Hello.

Thank you for idea. I would like to provide some comments about what GCC can currently
do and I'm curious we need something extra on top of what we do.
Right now, GCC can do hot/cold partitioning based on functions and basic blocks. With
a PGO profile, the optimization is quite aggressive and can save quite some code
being placed into a cold partitioning and being optimized for size. Without a profile,
we do a static profile guess (predict.c), where we also propagate information about cold
blocks (determine_unlikely_bbs). Later in RTL, we utilize the information and make
the real reordering (bb-reorder.c).

Martin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level
       [not found]   ` <BYAPR08MB42320E829A37607C0AC4F967B6E40@BYAPR08MB4232.namprd08.prod.outlook.com>
@ 2020-03-03 20:42     ` Aditya K
  2020-03-03 21:25       ` Jan Hubicka
  0 siblings, 1 reply; 9+ messages in thread
From: Aditya K @ 2020-03-03 20:42 UTC (permalink / raw)
  To: gcc


Hi Martin,
Thank you for explaining the status quo. After reading the code of bb-reorder.c,
 it looks pretty good and seems it doesn't need any significant improvements.
In that case, the only value GIMPLE level hot/cold splitting could bring is to enable aggressive code-size optimization
by merging of similar/identical functions: after outlining cold regions, they may be suitable candidates for function merging.
ipa-split might be enabling some of that, having a region based function splitting could improve ipa-split.

-Aditya


--
From: Martin Liška <mliska@suse.cz>
Sent: Tuesday, March 3, 2020 2:47 AM
To: Aditya K <hiraditya@msn.com>; gcc@gcc.gnu.org <gcc@gcc.gnu.org>
Cc: Jan Hubicka <hubicka@ucw.cz>
Subject: Re: GSoC topic: Implement hot cold splitting at GIMPLE IR level

Hello.
Thank you for idea. I would like to provide some comments about what GCC can currently
do and I'm curious we need something extra on top of what we do.
Right now, GCC can do hot/cold partitioning based on functions and basic blocks. With
a PGO profile, the optimization is quite aggressive and can save quite some code
being placed into a cold partitioning and being optimized for size. Without a profile,
we do a static profile guess (predict.c), where we also propagate information about cold
blocks (determine_unlikely_bbs). Later in RTL, we utilize the information and make
the real reordering (bb-reorder.c).

Martin



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level
  2020-03-03 20:42     ` Fw: " Aditya K
@ 2020-03-03 21:25       ` Jan Hubicka
  2020-03-05 15:50         ` Segher Boessenkool
  2020-03-16 23:11         ` Aditya K
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Hubicka @ 2020-03-03 21:25 UTC (permalink / raw)
  To: Aditya K; +Cc: gcc

Aditya,
the hot/cold partitioning is currently organized as folows:

1) we have static branch prediction code in predict.c and profile
   feedback which we store into cfg and callgraph.
2) we have predicates

   optimize_*_for_speed_p/size_p

   where * can be function, basic block, cfg edge, callgraph edge, loop
   or loop nest
3) all optimization passes trading code size for speed should
   the corresponding predicaes about whether to do the transform.
4) ipa-split pass is mostly there to enable partial inlining
5) hot-cold partition in bb-reorder offlines stores
6) we have ipa-icf pass to identify functions and some code in FRE to
   identify code within single function. 
7) we do shrink wrapping to mitigate register pressure problems caused
   in cold regions of code

I think this is bit stronger to what llvm does currently which relies on
outlining SESE regions earlier rather than going through the pain of
implementing support for partitioning.  

Building clang10 with GCC and FDO leads to 37MB .text section
Building clang10 with GCC and LTO+FDO leads to 34MB .text section
Building clang10 with clang10 and FDO leads to 53MB .text section
Building clang10 with clang10 and thinlto+FDO leads to 67MB .text section

GCC built clang is about 2-3% faster building Firefox.

There are many things which I think could/should imporve in our
framework.
 1) our size optimization is very agressive (llvms -Oz) and enaling it
    based on heuristics may lead to huge regressions.  We probably want
    to have optimize_for_size_p predicate to have two levels and do less
    agressive mode in place we are not sure the code is really very
    cold.
 2) ipa-split is very simplistic and only splits when there is no value
    computed in header of function used in the tail.  We should support
    adding extra parameters for values computed and do more general SESE
    outlining

    Note that we do SESE outlining for openMP but this code is not
    interfaced very generically to be easilly used by ipa-split.

    Original implementation of ipa-split was kind of "first cut" trying
    to clean up interfaces to rest of the compiler and implement more
    fancy features later. This never happened so there is certainly
    space for imrovements here.

    We also do all splitting before actual IPA optimization while it may
    be more reasonable to identify potential split points and make IPA
    optimization to decide on transforms (currently we rely on inliner
    to inline back useless splits).
 3) function partitioning is enabled only for x86. I never had time to
    get it working on other targets and no-one picked up this task
 4) ipa-icf and in-function code merging is currently very conservative
    (I plan to work on this next stage1) comparing metadata like type
    based aliasing info.
 5) we have only very limited logic to detect cold regions without
    profile feedback and thus amount of offlined code is very small
    (this also is because of 1).
    We basically know that code leading to abort/exception handling etc
    is cold and consider everything else hot.
 6) We lack code placement pass (though Martin has WIP implementation of
    it)
 7) We do no partitioning of data segment which may be also interesting
    to do.
 8) Most of non-x86 backends do not implement very well the hot/cold
    code models and instruction choice.
 9) Shrink-wrapping and register allocation is not always able to move
    spilling to code paths but this is generally very hard problem to
    track.

So there are a lot of place for improvmeent (and I am sure more can be
found) and I would be happy to help you with them.

Honza
> 
> Hi Martin,
> Thank you for explaining the status quo. After reading the code of bb-reorder.c,
> \xA0it looks pretty good and seems it doesn't need any significant improvements.
> In that case, the only value GIMPLE level hot/cold splitting could bring is to enable aggressive code-size optimization
> by merging of similar/identical functions: after outlining cold regions, they may be suitable candidates for function merging.
> ipa-split might be enabling some of that, having a region based function splitting could improve ipa-split.
> 
> -Aditya
> 
> 
> --
> From: Martin Li\xB9ka <mliska@suse.cz>
> Sent: Tuesday, March 3, 2020 2:47 AM
> To: Aditya K <hiraditya@msn.com>; gcc@gcc.gnu.org <gcc@gcc.gnu.org>
> Cc: Jan Hubicka <hubicka@ucw.cz>
> Subject: Re: GSoC topic: Implement hot cold splitting at GIMPLE IR level
> 
> Hello.
> Thank you for idea. I would like to provide some comments about what GCC can currently
> do and I'm curious we need something extra on top of what we do.
> Right now, GCC can do hot/cold partitioning based on functions and basic blocks. With
> a PGO profile, the optimization is quite aggressive and can save quite some code
> being placed into a cold partitioning and being optimized for size. Without a profile,
> we do a static profile guess (predict.c), where we also propagate information about cold
> blocks (determine_unlikely_bbs). Later in RTL, we utilize the information and make
> the real reordering (bb-reorder.c).
> 
> Martin
> 
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level
  2020-03-03 21:25       ` Jan Hubicka
@ 2020-03-05 15:50         ` Segher Boessenkool
  2020-03-16 23:11         ` Aditya K
  1 sibling, 0 replies; 9+ messages in thread
From: Segher Boessenkool @ 2020-03-05 15:50 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Aditya K, gcc

Hi!

On Tue, Mar 03, 2020 at 10:25:32PM +0100, Jan Hubicka wrote:
> I think this is bit stronger to what llvm does currently which relies on
> outlining SESE regions earlier rather than going through the pain of
> implementing support for partitioning.  

OTOH outlining can result in improved code, too (it will naturally keep
all "cold" variables in the cold outlined function, because of SESE).

>  3) function partitioning is enabled only for x86. I never had time to
>     get it working on other targets and no-one picked up this task

What makes this not completely generic in the first place?

>  8) Most of non-x86 backends do not implement very well the hot/cold
>     code models and instruction choice.

I'm not sure what this means, could you expand?

>  9) Shrink-wrapping and register allocation is not always able to move
>     spilling to code paths but this is generally very hard problem to
>     track.

Separate shrink-wrapping should handle this quite well.  But, currently
shrink-wrapping does not move existing code very much, which could
improve this further (doing some kind of rematerialisation, essentially).


Segher

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level
  2020-03-03 21:25       ` Jan Hubicka
  2020-03-05 15:50         ` Segher Boessenkool
@ 2020-03-16 23:11         ` Aditya K
  2020-03-16 23:19           ` Jakub Jelinek
  1 sibling, 1 reply; 9+ messages in thread
From: Aditya K @ 2020-03-16 23:11 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc

> 
> 2) ipa-split is very simplistic and only splits when there is no value
>    computed in header of function used in the tail.  We should support
>   adding extra parameters for values computed and do more general SESE
>    outlining
>  Note that we do SESE outlining for openMP but this code is not
>  interfaced very generically to be easilly used by ipa-split.

This sounds like a good GSoC project to work on. We could have a SESE/SEME based ipa-split, that
could help with function splitting as well as openMP.

>    Original implementation of ipa-split was kind of "first cut" trying
>    to clean up interfaces to rest of the compiler and implement more
>    fancy features later. This never happened so there is certainly
>    space for imrovements here.

I see ` TODO: We might support multiple return blocks. ' in ipa-split.c
which could achieved once we have SEME based function splitting. 

>    We also do all splitting before actual IPA optimization while it may
>    be more reasonable to identify potential split points and make IPA
>    optimization to decide on transforms (currently we rely on inliner
>    to inline back useless splits).

If a student has enough bandwidth, we could allow them to work on this.
Thanks for showing various places of improvement. I'd be happy to mentor on the above project.
Please let me know your thoughts.

Sorry I could not reply earlier because of various reasons.

Best,
-Aditya
 




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level
  2020-03-16 23:11         ` Aditya K
@ 2020-03-16 23:19           ` Jakub Jelinek
  2020-03-17 14:31             ` Aditya K
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Jelinek @ 2020-03-16 23:19 UTC (permalink / raw)
  To: Aditya K; +Cc: Jan Hubicka, gcc

On Mon, Mar 16, 2020 at 11:11:14PM +0000, Aditya K via Gcc wrote:
> > 
> > 2) ipa-split is very simplistic and only splits when there is no value
> >    computed in header of function used in the tail.  We should support
> >   adding extra parameters for values computed and do more general SESE
> >    outlining
> >  Note that we do SESE outlining for openMP but this code is not
> >  interfaced very generically to be easilly used by ipa-split.
> 
> This sounds like a good GSoC project to work on. We could have a SESE/SEME based ipa-split, that
> could help with function splitting as well as openMP.

No, OpenMP region outlining needs to be done where it is done currently,
ipa-split is way too late for that.

	Jakub


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level
  2020-03-16 23:19           ` Jakub Jelinek
@ 2020-03-17 14:31             ` Aditya K
  2020-03-17 14:47               ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Aditya K @ 2020-03-17 14:31 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jan Hubicka, gcc

As I understand the openmp outliner is also at the tree level. A region based outliner could be reused there. I’m not particular about the outliner being specific to ipa-split. A GSoC project can help us get the coding+testing done.  Any pass that needs a function splitting at tree level can reuse them.

-Aditya
________________________________
From: Jakub Jelinek <jakub@redhat.com>
Sent: Monday, March 16, 2020 5:19:16 PM
To: Aditya K <hiraditya@msn.com>
Cc: Jan Hubicka <hubicka@ucw.cz>; gcc@gcc.gnu.org <gcc@gcc.gnu.org>
Subject: Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level

On Mon, Mar 16, 2020 at 11:11:14PM +0000, Aditya K via Gcc wrote:
> >
> > 2) ipa-split is very simplistic and only splits when there is no value
> >    computed in header of function used in the tail.  We should support
> >   adding extra parameters for values computed and do more general SESE
> >    outlining
> >  Note that we do SESE outlining for openMP but this code is not
> >  interfaced very generically to be easilly used by ipa-split.
>
> This sounds like a good GSoC project to work on. We could have a SESE/SEME based ipa-split, that
> could help with function splitting as well as openMP.

No, OpenMP region outlining needs to be done where it is done currently,
ipa-split is way too late for that.

        Jakub


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level
  2020-03-17 14:31             ` Aditya K
@ 2020-03-17 14:47               ` Richard Biener
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Biener @ 2020-03-17 14:47 UTC (permalink / raw)
  To: Aditya K; +Cc: Jakub Jelinek, gcc, Jan Hubicka

On Tue, Mar 17, 2020 at 3:33 PM Aditya K via Gcc <gcc@gcc.gnu.org> wrote:
>
> As I understand the openmp outliner is also at the tree level. A region based outliner could be reused there. I’m not particular about the outliner being specific to ipa-split. A GSoC project can help us get the coding+testing done.  Any pass that needs a function splitting at tree level can reuse them.

There's a SESE region outliner (it also handles SEME regions with
alternate exits exiting the function), in move_sese_region_to_fn.
Probably exactly what would be needed for this.  It's also used by
OpenMP outlining (but not IPA split as Honza said).

Richard.

> -Aditya
> ________________________________
> From: Jakub Jelinek <jakub@redhat.com>
> Sent: Monday, March 16, 2020 5:19:16 PM
> To: Aditya K <hiraditya@msn.com>
> Cc: Jan Hubicka <hubicka@ucw.cz>; gcc@gcc.gnu.org <gcc@gcc.gnu.org>
> Subject: Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level
>
> On Mon, Mar 16, 2020 at 11:11:14PM +0000, Aditya K via Gcc wrote:
> > >
> > > 2) ipa-split is very simplistic and only splits when there is no value
> > >    computed in header of function used in the tail.  We should support
> > >   adding extra parameters for values computed and do more general SESE
> > >    outlining
> > >  Note that we do SESE outlining for openMP but this code is not
> > >  interfaced very generically to be easilly used by ipa-split.
> >
> > This sounds like a good GSoC project to work on. We could have a SESE/SEME based ipa-split, that
> > could help with function splitting as well as openMP.
>
> No, OpenMP region outlining needs to be done where it is done currently,
> ipa-split is way too late for that.
>
>         Jakub
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-03-17 14:47 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-03  5:51 GSoC topic: Implement hot cold splitting at GIMPLE IR level Aditya K
2020-03-03  8:47 ` Martin Liška
     [not found]   ` <BYAPR08MB42320E829A37607C0AC4F967B6E40@BYAPR08MB4232.namprd08.prod.outlook.com>
2020-03-03 20:42     ` Fw: " Aditya K
2020-03-03 21:25       ` Jan Hubicka
2020-03-05 15:50         ` Segher Boessenkool
2020-03-16 23:11         ` Aditya K
2020-03-16 23:19           ` Jakub Jelinek
2020-03-17 14:31             ` Aditya K
2020-03-17 14:47               ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).