From: "tnfchris at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/98782] [11/12 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies
Date: Wed, 08 Dec 2021 14:31:38 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

--- Comment #17 from Tamar Christina ---
> On “CALL_FREQ grows much quicker than BB_FREQ”: for r104, the
> ALLOCNO_FREQ ought in principle to be fixed for a given loop iteration
> count. It shouldn't grow or shrink based on the value of SPILLED.
> That's because every execution of the loop body involves exactly one
> reference to r104.
> SPILLED specifies the probability that that single
> reference is the “call” use rather than the “non-call” use, but it doesn't
> change the total number of references per iteration.
>
> So I think the only reason we see the different ALLOCNO_FREQs in:
>
>    ALLOCNO_FREQ 989, …
>
> vs:
>
>    ALLOCNO_FREQ 990, …
>
> is round-off error. If the values had more precision, I think we'd
> have a fixed ALLOCNO_FREQ and a varying ALLOCNO_CALL_FREQ.

Yeah, that's plausible. As far as I can tell, the FREQs are always scaled by
REG_FREQ_FROM_EDGE_FREQ into [0, BB_FREQ_MAX], and that does indeed do an
integer division. The general problem is that the IPA frequencies don't seem
to have any bounded range, so they always need to be scaled.

So I think you're always going to have this error one way or another, and it
may or may not work to your advantage on any given program.

Maybe we need a way to be a bit more tolerant of this rounding error instead?

> > Instead I've chosen a middle ground here (same as yours but done in
> > ira_tune_allocno_costs instead), which is to store and load only inside
> > the loop, but to do so only in the BB which contains the call.
> I don't think you were saying otherwise, but just FTR: I wasn't
> proposing a solution, I was just describing a hack. It seemed
> to me like IRA was making the right decision for r104 in isolation,
> for the given SPILLED value and target costs. My hack to force
> an allocation for r104 made things worse.

Ah ok, fair enough :)

>
> > > which is cheaper than both the current approaches. We don't do that
> > > optimisation yet though, so the current costing seems to reflect what we
> > > currently generate.
> >
> > In many (if not most) Arches stores are significantly cheaper than the loads
> > though. So the store before the call doesn't end up making that much of a
> > difference, but yes it adds up if you have many of them.
> Yeah.
> Could we fix the problem that way instead? The only reason IRA is
> treating loads and stores as equal cost is because aarch64 asked it to :-)

I tried a quick check, and it does fix the testcase but not the benchmark.
Thinking about it, that's not entirely unexpected, because x86 does correctly
model the store costs.

I can try fixing the costs properly and reducing the testcase again. It looks
like IRA still thinks spilling to memory is cheaper than caller-save reloads.