From mboxrd@z Thu Jan 1 00:00:00 1970
Received: by sourceware.org (Postfix, from userid 48)
	id E7FB6385843B; Fri, 31 Dec 2021 17:28:19 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org E7FB6385843B
From: "rsandifo at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/98782] [11/12 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies
Date: Fri, 31 Dec 2021 17:28:19 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rsandifo at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: rsandifo at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 12.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_status assigned_to attachments.created
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
X-List-Received-Date: Fri, 31 Dec 2021 17:28:20 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

rsandifo at gcc dot gnu.org changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org

--- Comment #32 from rsandifo at gcc dot gnu.org ---
Created attachment 52102
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52102&action=edit
Alternative patch

This patch is a squash of several ira tweaks that together recover the pre-GCC11
exchange2 performance on aarch64. It isn't ready for trunk yet (hence the
lack of comments and changelog). It would be great to hear whether/how
it works on other targets though.

The patch bootstraps on aarch64-linux-gnu and x86_64-linux-gnu, but
there are some new scan-assembler failures that need looking at.

Quoting from the covering message:

The main changes are:

(1) Add ira_loop_border_costs to simplify spill cost calculations
    (NFC intended)

(2) Avoid freeing updated costs until the loop node has been fully
    allocated. This in turn allows:

(3) Make improve_allocation work exclusively on updated costs,
    rather than using a mixture of updated and original costs.
    One reason this matters is that the register costs only make
    sense relative to the memory costs, so in some cases, a common
    register cost is subtracted from the updated memory cost instead
    of being added to each individual updated register cost.

(4) If a child allocno has a hard register conflict, allow the parent
    allocno to handle the conflict by spilling to memory throughout
    the child allocno's loop. This carries the child allocno's full
    memory cost plus the cost of spilling to memory on entry to the
    loop and restoring it on exit, but this can still be cheaper than
    spilling the entire parent allocno. In particular, it helps for
    allocnos that are live across a loop but not referenced within it,
    since the child allocno's memory cost is 0 in that case.

(5) Extend (4) to cases in which the child allocno is live across
    a call. The parent then has a free choice between spilling
    call-clobbered registers around each call (as normal) or spilling
    them on entry to the loop, keeping the allocno in memory throughout
    the loop, and restoring them on exit from the loop.

(6) Detect “soft conflicts” in which:

    - one allocno (A1) is a cap whose (transitive) “real” allocno
      is A1'

    - A1' occurs in loop L1'

    - the other allocno (A2) is a non-cap allocno

    - the equivalent of A2 is live across L1' (hence the conflict)
      but has no references in L1'

    In this case we can spill A2 around L1' (or perhaps some parent
    loop) and reuse the same register for A1'. A1 and A2 can then
    use the same hard register, provided that we make sure not to
    propagate A1's allocation to A1'.
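
As a rough illustration of the trade-off described in (4), the sketch below compares the cost of spilling only around the child allocno's loop with the cost of spilling the whole parent allocno. It is not taken from the patch: the class `LoopBorderCosts`, both function names, and all the numbers are hypothetical, and the real IRA cost model has more inputs than this.

```python
from dataclasses import dataclass


@dataclass
class LoopBorderCosts:
    """Hypothetical stand-in for the loop border costs mentioned in (1)."""
    spill_on_entry: int   # cost of storing the value to memory at loop entry
    restore_on_exit: int  # cost of reloading the value at loop exit


def spill_around_loop_cost(child_memory_cost: int, border: LoopBorderCosts) -> int:
    # Per (4): the child allocno's full memory cost, plus the spill on
    # entry to the loop and the restore on exit.
    return child_memory_cost + border.spill_on_entry + border.restore_on_exit


def cheaper_to_spill_around_loop(parent_spill_cost: int,
                                 child_memory_cost: int,
                                 border: LoopBorderCosts) -> bool:
    # Assumption: the alternative is spilling the entire parent allocno,
    # summarised here as a single made-up number.
    return spill_around_loop_cost(child_memory_cost, border) < parent_spill_cost


border = LoopBorderCosts(spill_on_entry=2, restore_on_exit=2)

# Live across the loop but not referenced in it: child memory cost is 0,
# so only the border costs are paid.
unreferenced = cheaper_to_spill_around_loop(40, 0, border)

# Heavily referenced inside the loop: the child memory cost dominates.
referenced = cheaper_to_spill_around_loop(52, 50, border)
```

With these made-up numbers, the unreferenced case favours spilling around the loop (4 against 40), while the heavily referenced case does not (54 against 52).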