From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id CA35C39F0827; Fri, 5 Feb 2021 12:02:05 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org CA35C39F0827 From: "tnfchris at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug rtl-optimization/98782] [11 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies Date: Fri, 05 Feb 2021 12:02:05 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: rtl-optimization X-Bugzilla-Version: 11.0 X-Bugzilla-Keywords: missed-optimization, ra X-Bugzilla-Severity: normal X-Bugzilla-Who: tnfchris at gcc dot gnu.org X-Bugzilla-Status: NEW X-Bugzilla-Resolution: X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc short_desc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: gcc-bugs@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-bugs mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Feb 2021 12:02:05 -0000 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D98782 Tamar Christina changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |hubicka at gcc dot gnu.org Summary|IRA artificially creating |[11 Regression] Bad |spills due to BB |interaction between IPA |frequencies |frequences and IRA | |resulting in spills due to | |changes in BB frequencies --- Comment #3 from Tamar Christina --- Hi, Since we are in stage-4 I'd like to put all our ducks in a row and see what= the options are at this point. IRA as you can imagine is huge and quite complex, the more I investigate the problem the more I realize that there isn't a spot fix for this issue. It will require a lot more work in = IRA and understanding parts of it that I don't fully understand yet. But one thing is clear, there is a severe interaction between IPA predictio= ns and IRA under conditions where there is high register pressure *and* a function call. The problem is that the changes introduced in g:1118a3ff9d3ad6a64bba25dc01e7703325e23d92 make local changes. i.e. they effect only some BB and not others. The problem is any spot fix = in IRA would be a globally scoped. I was investigating whether the issue could be solved by having IRA treat t= he recursive inlined function in exchange2 as one region instead of going live range splitting. And yes using -fira-region=3Done does make a difference, but only a small difference of about 33% of the regression. How= ever doing this has some disadvantage in that regions that before would not count in the live range of the call a= re now counted, so you regress spilling in those cases. This is why this flag can only recover 33% of the regression, it introduces some of it's own. The second alternative I tried as a spot fix is to be able to specify a wei= ght for the CALL_FREQ for use during situations of high reg pressure and call live ranges. The "hack" looks like this: index 4fe019b2367..674e6ca7a48 100644 --- a/gcc/caller-save.c +++ b/gcc/caller-save.c @@ -425,6 +425,7 @@ setup_save_areas (void) || find_reg_note (insn, REG_NORETURN, NULL)) continue; freq =3D REG_FREQ_FROM_BB (BLOCK_FOR_INSN (insn)); + freq =3D freq * (param_ira_call_freq_weight / 100.f); REG_SET_TO_HARD_REG_SET (hard_regs_to_save, &chain->live_throughout); used_regs =3D insn_callee_abi (insn).full_reg_clobbers (); diff --git a/gcc/ira-lives.c b/gcc/ira-lives.c index 4ba29dcadf4..6e2699e5a7d 100644 --- a/gcc/ira-lives.c +++ b/gcc/ira-lives.c @@ -1392,7 +1392,7 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node) it was saved on previous call in the same basic block and the hard register was not mentioned between the two calls. */ - ALLOCNO_CALL_FREQ (a) +=3D freq / 3; + ALLOCNO_CALL_FREQ (a) +=3D (freq * (param_ira_call_freq_weight / 100.0f)); diff --git a/gcc/params.opt b/gcc/params.opt index cfed980a4d2..39d5cae9f31 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -321,6 +321,11 @@ Max size of conflict table in MB. Common Joined UInteger Var(param_ira_max_loops_num) Init(100) Param Optimization Max loops number for regional RA. +-param=3Dira-call-freq-weight=3D +Common Joined UInteger Var(param_ira_call_freq_weight) Init(100) Param Optimization +Scale to be applied to the weighting of the frequencies of allocations live across +a call. + -param=3Div-always-prune-cand-set-bound=3D Common Joined UInteger Var(param_iv_always_prune_cand_set_bound) Init(10) Param Optimization If number of candidates in the set is smaller, we always try to remove unu= sed ivs during its optimization. And if we look at the changes in the frequency between the good and bad case the prediction changes approx 40%. So using the value of --param ira-call-freq-weight=3D40 recovers about 60% = of the regression. The issue this global change introduce is however that IRA seems to start preferring callee-saves. Which is in itself not an issue, but at the boundary of a region it will then emit moves from temp to callee-sav= es to carry live values to the next region. This is completely unneeded, enabling the late register renaming pass (-frename-registers) removes these superfluous moves and we recover 66% of the regression. But this is just a big hack. The obv= ious disadvantage here, since again it's a global change is that it pushes all caller saves to be spilled before the function call. And indeed, before the recursive call there now is a massive amount of spilling happening. But it is something that would be "safe" to do at this point in the GCC development cycle. The last and preferred approach, if you agree Honza, is if we can revert g:1118a3ff9d3ad6a64bba25dc01e7703325e23d92 for GCC 11 and commit it back first things GCC 12. This would give us a release cycle to focus on the interaction with IRA. If we can fix it we fix it, if we can't, we'll have to live with it, but it gives us time to try to look into alternate optimizations. A possibility would also be to introduce a flag that would restore the GCC = 11 behavior, which would give you the optimization for GCC 11 but buy us again time to properly fix in GCC 12. The downside i= s, well, there's a flag now. But from looking at the patch this would be easy to do, it's just ignoring the already predicted check.=20 I would really appreciate some feedback from you two, Honza and Vlad. For now I am also marking this as a 11 regression as I want to reach some k= ind of judgement on what the options are. To summerize, the options I see are in order of preference: 1) temporarily revert g:1118a3ff9d3ad6a64bba25dc01e7703325e23d92 2) Add a parameter to restore GCC 10 behavior 3) Add a partial workaround in IRA using the parameter 4) Do nothing and use -fira-region=3Done In all cases we'll have to look at RA in GCC 12. Thanks, Tamar=