From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id CA6A83858032; Tue, 7 Dec 2021 19:44:42 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org CA6A83858032 From: "rsandifo at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug rtl-optimization/98782] [11/12 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies Date: Tue, 07 Dec 2021 19:44:42 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: rtl-optimization X-Bugzilla-Version: 11.0 X-Bugzilla-Keywords: missed-optimization, ra X-Bugzilla-Severity: normal X-Bugzilla-Who: rsandifo at gcc dot gnu.org X-Bugzilla-Status: NEW X-Bugzilla-Resolution: X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org X-Bugzilla-Target-Milestone: 12.0 X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: gcc-bugs@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-bugs mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Dec 2021 19:44:42 -0000 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D98782 --- Comment #14 from rsandifo at gcc dot gnu.org --- Thanks for the nice cut-down example. In the original comment and the covering note to patch 1, the highlighted spilled value is the final (terminating) value of foo + 1024. Let's start with the SPILLED=3D0.51 case and consider that register ("R") in isolation without changing anything else. If we spill R to memory (as for the current SPILLED=3D0.51 code), we generate: - a store of R outside the loop (low execution count) - a load of R inside the loop (after the call) with freq 0.51 * loop iters - a load of R inside the loop with freq 0.49 * loop iters If we force R to be allocated a call-clobbered register instead of being spilled (and changing nothing else, via a hack to ira-color.c:improve_allocation) then we generate: - a store of R inside the loop (before the call) with freq 0.51 * loop iters - a load of R inside the loop (after the call) with freq 0.51 * loop iters So the in-loop cost of the second (allocated) version is higher than the in-loop cost of the first (spilled) version. As the value of SPILLED increases, the difference between the two also increases: spilling R gets progressively =E2=80=9Ccheaper=E2=80=9D than the allocating= a call-clobbered register to R. That is, we're trading two memory accesses in the call branch (if we allocate R) against one memory access in both branches (if we spill R). As the call branch gets more likely, the cost of doing two memory accesses there gets higher relative to the cost of doing one memory access in both branches. And that seems like the right behaviour in principle. >>From that point of view, it doesn't look like the memory and register costs of R are too wrong here. The things being costed are the store and load around the call (which do exist if we allocate a call-clobbered register) and the loads at each use site (which do exist if we spill R). Like Feng Xue says in comment 1, I think the main missed optimisation opportunity here is that foo + 1024 is invariant, so if we allocate a call-clobbered register, we could save R once outside the loop and reload it after each call. That would give: - a store of R outside the loop (low execution count) - a load of R inside the loop (after the call) with freq 0.51 * loop iters which is cheaper than both the current approaches. We don't do that optimisation yet though, so the current costing seems to reflect what we currently generate. I don't know how well the above translates to the original example though. Are the some of the spilled values in exchange loop-invariant as well?=