* GSoC topic: Implement hot cold splitting at GIMPLE IR level @ 2020-03-03 5:51 Aditya K 2020-03-03 8:47 ` Martin Liška 0 siblings, 1 reply; 9+ messages in thread From: Aditya K @ 2020-03-03 5:51 UTC (permalink / raw) To: gcc Hi Everyone, I was one of the original authors of hot cold splitting optimization in LLVM. I was wondering if implementing a region based hot cold splitting optimization would be useful in GCC? We already have optimal implementation of SESE region detection in GCC (https://github.com/gcc-mirror/gcc/blob/master/gcc/sese.h) so implementation of hot cold splitting at IR level could leverage some of that. Motivation: With the increasing popularity of RISC-V architecture, where most applications are constrained for code-size. I assume those applications would (soon) be suffering from app launch time and page faults. I don't have numbers for this sorry, just a guess. Having an IR level hot cold splitting pass would benefit applications deployed on such devices by reducing their startup working set. I'd be happy to mentor a GSoC candidate if we chose to list this as one of the projects. Description of the project: Region based Hot Cold Splitting is an IR level function splitting transformation. The goal of hot/cold splitting is to improve the memory locality of code and helps reduce startup working set. The splitting pass does this by identifying cold blocks and moving them into separate functions. Because it is implemented at the IR level all the back end target benefit from it. It is a relatively new optimization and it was recently presented at the LLVM Dev Meeting in 2019 and the slides are here: https://llvm.org/devmtg/2019-10/talk-abstracts.html#tech8. There are fast algorithms to detect SESE regions as illustrated in (http://impact.gforge.inria.fr/impact2016/papers/impact2016-kumar.pdf), we can leverage that to detect regions. Deliverables: - Implement hot cold splitting pass at GIMPLE level - Detect maximal cold region in a function and outline it as a separate function - Use static as well as dynamic profile information to mark cold edges - Write unit tests to show variety of regions outlined Thanks, -Aditya ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: GSoC topic: Implement hot cold splitting at GIMPLE IR level 2020-03-03 5:51 GSoC topic: Implement hot cold splitting at GIMPLE IR level Aditya K @ 2020-03-03 8:47 ` Martin Liška [not found] ` <BYAPR08MB42320E829A37607C0AC4F967B6E40@BYAPR08MB4232.namprd08.prod.outlook.com> 0 siblings, 1 reply; 9+ messages in thread From: Martin Liška @ 2020-03-03 8:47 UTC (permalink / raw) To: Aditya K, gcc; +Cc: Jan Hubicka Hello. Thank you for idea. I would like to provide some comments about what GCC can currently do and I'm curious we need something extra on top of what we do. Right now, GCC can do hot/cold partitioning based on functions and basic blocks. With a PGO profile, the optimization is quite aggressive and can save quite some code being placed into a cold partitioning and being optimized for size. Without a profile, we do a static profile guess (predict.c), where we also propagate information about cold blocks (determine_unlikely_bbs). Later in RTL, we utilize the information and make the real reordering (bb-reorder.c). Martin ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <BYAPR08MB42320E829A37607C0AC4F967B6E40@BYAPR08MB4232.namprd08.prod.outlook.com>]
* Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level [not found] ` <BYAPR08MB42320E829A37607C0AC4F967B6E40@BYAPR08MB4232.namprd08.prod.outlook.com> @ 2020-03-03 20:42 ` Aditya K 2020-03-03 21:25 ` Jan Hubicka 0 siblings, 1 reply; 9+ messages in thread From: Aditya K @ 2020-03-03 20:42 UTC (permalink / raw) To: gcc Hi Martin, Thank you for explaining the status quo. After reading the code of bb-reorder.c, it looks pretty good and seems it doesn't need any significant improvements. In that case, the only value GIMPLE level hot/cold splitting could bring is to enable aggressive code-size optimization by merging of similar/identical functions: after outlining cold regions, they may be suitable candidates for function merging. ipa-split might be enabling some of that, having a region based function splitting could improve ipa-split. -Aditya -- From: Martin Liška <mliska@suse.cz> Sent: Tuesday, March 3, 2020 2:47 AM To: Aditya K <hiraditya@msn.com>; gcc@gcc.gnu.org <gcc@gcc.gnu.org> Cc: Jan Hubicka <hubicka@ucw.cz> Subject: Re: GSoC topic: Implement hot cold splitting at GIMPLE IR level Hello. Thank you for idea. I would like to provide some comments about what GCC can currently do and I'm curious we need something extra on top of what we do. Right now, GCC can do hot/cold partitioning based on functions and basic blocks. With a PGO profile, the optimization is quite aggressive and can save quite some code being placed into a cold partitioning and being optimized for size. Without a profile, we do a static profile guess (predict.c), where we also propagate information about cold blocks (determine_unlikely_bbs). Later in RTL, we utilize the information and make the real reordering (bb-reorder.c). Martin ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level 2020-03-03 20:42 ` Fw: " Aditya K @ 2020-03-03 21:25 ` Jan Hubicka 2020-03-05 15:50 ` Segher Boessenkool 2020-03-16 23:11 ` Aditya K 0 siblings, 2 replies; 9+ messages in thread From: Jan Hubicka @ 2020-03-03 21:25 UTC (permalink / raw) To: Aditya K; +Cc: gcc Aditya, the hot/cold partitioning is currently organized as folows: 1) we have static branch prediction code in predict.c and profile feedback which we store into cfg and callgraph. 2) we have predicates optimize_*_for_speed_p/size_p where * can be function, basic block, cfg edge, callgraph edge, loop or loop nest 3) all optimization passes trading code size for speed should the corresponding predicaes about whether to do the transform. 4) ipa-split pass is mostly there to enable partial inlining 5) hot-cold partition in bb-reorder offlines stores 6) we have ipa-icf pass to identify functions and some code in FRE to identify code within single function. 7) we do shrink wrapping to mitigate register pressure problems caused in cold regions of code I think this is bit stronger to what llvm does currently which relies on outlining SESE regions earlier rather than going through the pain of implementing support for partitioning. Building clang10 with GCC and FDO leads to 37MB .text section Building clang10 with GCC and LTO+FDO leads to 34MB .text section Building clang10 with clang10 and FDO leads to 53MB .text section Building clang10 with clang10 and thinlto+FDO leads to 67MB .text section GCC built clang is about 2-3% faster building Firefox. There are many things which I think could/should imporve in our framework. 1) our size optimization is very agressive (llvms -Oz) and enaling it based on heuristics may lead to huge regressions. We probably want to have optimize_for_size_p predicate to have two levels and do less agressive mode in place we are not sure the code is really very cold. 2) ipa-split is very simplistic and only splits when there is no value computed in header of function used in the tail. We should support adding extra parameters for values computed and do more general SESE outlining Note that we do SESE outlining for openMP but this code is not interfaced very generically to be easilly used by ipa-split. Original implementation of ipa-split was kind of "first cut" trying to clean up interfaces to rest of the compiler and implement more fancy features later. This never happened so there is certainly space for imrovements here. We also do all splitting before actual IPA optimization while it may be more reasonable to identify potential split points and make IPA optimization to decide on transforms (currently we rely on inliner to inline back useless splits). 3) function partitioning is enabled only for x86. I never had time to get it working on other targets and no-one picked up this task 4) ipa-icf and in-function code merging is currently very conservative (I plan to work on this next stage1) comparing metadata like type based aliasing info. 5) we have only very limited logic to detect cold regions without profile feedback and thus amount of offlined code is very small (this also is because of 1). We basically know that code leading to abort/exception handling etc is cold and consider everything else hot. 6) We lack code placement pass (though Martin has WIP implementation of it) 7) We do no partitioning of data segment which may be also interesting to do. 8) Most of non-x86 backends do not implement very well the hot/cold code models and instruction choice. 9) Shrink-wrapping and register allocation is not always able to move spilling to code paths but this is generally very hard problem to track. So there are a lot of place for improvmeent (and I am sure more can be found) and I would be happy to help you with them. Honza > > Hi Martin, > Thank you for explaining the status quo. After reading the code of bb-reorder.c, > \xA0it looks pretty good and seems it doesn't need any significant improvements. > In that case, the only value GIMPLE level hot/cold splitting could bring is to enable aggressive code-size optimization > by merging of similar/identical functions: after outlining cold regions, they may be suitable candidates for function merging. > ipa-split might be enabling some of that, having a region based function splitting could improve ipa-split. > > -Aditya > > > -- > From: Martin Li\xB9ka <mliska@suse.cz> > Sent: Tuesday, March 3, 2020 2:47 AM > To: Aditya K <hiraditya@msn.com>; gcc@gcc.gnu.org <gcc@gcc.gnu.org> > Cc: Jan Hubicka <hubicka@ucw.cz> > Subject: Re: GSoC topic: Implement hot cold splitting at GIMPLE IR level > > Hello. > Thank you for idea. I would like to provide some comments about what GCC can currently > do and I'm curious we need something extra on top of what we do. > Right now, GCC can do hot/cold partitioning based on functions and basic blocks. With > a PGO profile, the optimization is quite aggressive and can save quite some code > being placed into a cold partitioning and being optimized for size. Without a profile, > we do a static profile guess (predict.c), where we also propagate information about cold > blocks (determine_unlikely_bbs). Later in RTL, we utilize the information and make > the real reordering (bb-reorder.c). > > Martin > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level 2020-03-03 21:25 ` Jan Hubicka @ 2020-03-05 15:50 ` Segher Boessenkool 2020-03-16 23:11 ` Aditya K 1 sibling, 0 replies; 9+ messages in thread From: Segher Boessenkool @ 2020-03-05 15:50 UTC (permalink / raw) To: Jan Hubicka; +Cc: Aditya K, gcc Hi! On Tue, Mar 03, 2020 at 10:25:32PM +0100, Jan Hubicka wrote: > I think this is bit stronger to what llvm does currently which relies on > outlining SESE regions earlier rather than going through the pain of > implementing support for partitioning. OTOH outlining can result in improved code, too (it will naturally keep all "cold" variables in the cold outlined function, because of SESE). > 3) function partitioning is enabled only for x86. I never had time to > get it working on other targets and no-one picked up this task What makes this not completely generic in the first place? > 8) Most of non-x86 backends do not implement very well the hot/cold > code models and instruction choice. I'm not sure what this means, could you expand? > 9) Shrink-wrapping and register allocation is not always able to move > spilling to code paths but this is generally very hard problem to > track. Separate shrink-wrapping should handle this quite well. But, currently shrink-wrapping does not move existing code very much, which could improve this further (doing some kind of rematerialisation, essentially). Segher ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level 2020-03-03 21:25 ` Jan Hubicka 2020-03-05 15:50 ` Segher Boessenkool @ 2020-03-16 23:11 ` Aditya K 2020-03-16 23:19 ` Jakub Jelinek 1 sibling, 1 reply; 9+ messages in thread From: Aditya K @ 2020-03-16 23:11 UTC (permalink / raw) To: Jan Hubicka; +Cc: gcc > > 2) ipa-split is very simplistic and only splits when there is no value > computed in header of function used in the tail. We should support > adding extra parameters for values computed and do more general SESE > outlining > Note that we do SESE outlining for openMP but this code is not > interfaced very generically to be easilly used by ipa-split. This sounds like a good GSoC project to work on. We could have a SESE/SEME based ipa-split, that could help with function splitting as well as openMP. > Original implementation of ipa-split was kind of "first cut" trying > to clean up interfaces to rest of the compiler and implement more > fancy features later. This never happened so there is certainly > space for imrovements here. I see ` TODO: We might support multiple return blocks. ' in ipa-split.c which could achieved once we have SEME based function splitting. > We also do all splitting before actual IPA optimization while it may > be more reasonable to identify potential split points and make IPA > optimization to decide on transforms (currently we rely on inliner > to inline back useless splits). If a student has enough bandwidth, we could allow them to work on this. Thanks for showing various places of improvement. I'd be happy to mentor on the above project. Please let me know your thoughts. Sorry I could not reply earlier because of various reasons. Best, -Aditya ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level 2020-03-16 23:11 ` Aditya K @ 2020-03-16 23:19 ` Jakub Jelinek 2020-03-17 14:31 ` Aditya K 0 siblings, 1 reply; 9+ messages in thread From: Jakub Jelinek @ 2020-03-16 23:19 UTC (permalink / raw) To: Aditya K; +Cc: Jan Hubicka, gcc On Mon, Mar 16, 2020 at 11:11:14PM +0000, Aditya K via Gcc wrote: > > > > 2) ipa-split is very simplistic and only splits when there is no value > > computed in header of function used in the tail. We should support > > adding extra parameters for values computed and do more general SESE > > outlining > > Note that we do SESE outlining for openMP but this code is not > > interfaced very generically to be easilly used by ipa-split. > > This sounds like a good GSoC project to work on. We could have a SESE/SEME based ipa-split, that > could help with function splitting as well as openMP. No, OpenMP region outlining needs to be done where it is done currently, ipa-split is way too late for that. Jakub ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level 2020-03-16 23:19 ` Jakub Jelinek @ 2020-03-17 14:31 ` Aditya K 2020-03-17 14:47 ` Richard Biener 0 siblings, 1 reply; 9+ messages in thread From: Aditya K @ 2020-03-17 14:31 UTC (permalink / raw) To: Jakub Jelinek; +Cc: Jan Hubicka, gcc As I understand the openmp outliner is also at the tree level. A region based outliner could be reused there. I’m not particular about the outliner being specific to ipa-split. A GSoC project can help us get the coding+testing done. Any pass that needs a function splitting at tree level can reuse them. -Aditya ________________________________ From: Jakub Jelinek <jakub@redhat.com> Sent: Monday, March 16, 2020 5:19:16 PM To: Aditya K <hiraditya@msn.com> Cc: Jan Hubicka <hubicka@ucw.cz>; gcc@gcc.gnu.org <gcc@gcc.gnu.org> Subject: Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level On Mon, Mar 16, 2020 at 11:11:14PM +0000, Aditya K via Gcc wrote: > > > > 2) ipa-split is very simplistic and only splits when there is no value > > computed in header of function used in the tail. We should support > > adding extra parameters for values computed and do more general SESE > > outlining > > Note that we do SESE outlining for openMP but this code is not > > interfaced very generically to be easilly used by ipa-split. > > This sounds like a good GSoC project to work on. We could have a SESE/SEME based ipa-split, that > could help with function splitting as well as openMP. No, OpenMP region outlining needs to be done where it is done currently, ipa-split is way too late for that. Jakub ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level 2020-03-17 14:31 ` Aditya K @ 2020-03-17 14:47 ` Richard Biener 0 siblings, 0 replies; 9+ messages in thread From: Richard Biener @ 2020-03-17 14:47 UTC (permalink / raw) To: Aditya K; +Cc: Jakub Jelinek, gcc, Jan Hubicka On Tue, Mar 17, 2020 at 3:33 PM Aditya K via Gcc <gcc@gcc.gnu.org> wrote: > > As I understand the openmp outliner is also at the tree level. A region based outliner could be reused there. I’m not particular about the outliner being specific to ipa-split. A GSoC project can help us get the coding+testing done. Any pass that needs a function splitting at tree level can reuse them. There's a SESE region outliner (it also handles SEME regions with alternate exits exiting the function), in move_sese_region_to_fn. Probably exactly what would be needed for this. It's also used by OpenMP outlining (but not IPA split as Honza said). Richard. > -Aditya > ________________________________ > From: Jakub Jelinek <jakub@redhat.com> > Sent: Monday, March 16, 2020 5:19:16 PM > To: Aditya K <hiraditya@msn.com> > Cc: Jan Hubicka <hubicka@ucw.cz>; gcc@gcc.gnu.org <gcc@gcc.gnu.org> > Subject: Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level > > On Mon, Mar 16, 2020 at 11:11:14PM +0000, Aditya K via Gcc wrote: > > > > > > 2) ipa-split is very simplistic and only splits when there is no value > > > computed in header of function used in the tail. We should support > > > adding extra parameters for values computed and do more general SESE > > > outlining > > > Note that we do SESE outlining for openMP but this code is not > > > interfaced very generically to be easilly used by ipa-split. > > > > This sounds like a good GSoC project to work on. We could have a SESE/SEME based ipa-split, that > > could help with function splitting as well as openMP. > > No, OpenMP region outlining needs to be done where it is done currently, > ipa-split is way too late for that. > > Jakub > ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2020-03-17 14:47 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-03-03 5:51 GSoC topic: Implement hot cold splitting at GIMPLE IR level Aditya K 2020-03-03 8:47 ` Martin Liška [not found] ` <BYAPR08MB42320E829A37607C0AC4F967B6E40@BYAPR08MB4232.namprd08.prod.outlook.com> 2020-03-03 20:42 ` Fw: " Aditya K 2020-03-03 21:25 ` Jan Hubicka 2020-03-05 15:50 ` Segher Boessenkool 2020-03-16 23:11 ` Aditya K 2020-03-16 23:19 ` Jakub Jelinek 2020-03-17 14:31 ` Aditya K 2020-03-17 14:47 ` Richard Biener
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).