From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 574303858D20; Fri, 14 Apr 2023 17:44:49 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 574303858D20 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1681494289; bh=QHQ9ryT+oKGnEEoXroiZ6UGiqIIswEgNzg4op2jVOn0=; h=From:To:Subject:Date:In-Reply-To:References:From; b=PXLkKZVDvNhHmUj9YckaGaG/ZWs8A2W+b0uW4F/X8Pmbk5+iuF66SUPRIIkEhJ3y4 kcRv3QP7ZlZ4KTTH/Muf91tNwG2jutdLKOdhYqEpnO8c5dFokZlG3fgzujdOfGGKEr ogep5HuYbgObTe0JzpY8X58Z9GpJTfGp+sR0cvNY= From: "jskumari at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug rtl-optimization/109009] Shrink Wrap missed opportunity Date: Fri, 14 Apr 2023 17:44:47 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: rtl-optimization X-Bugzilla-Version: unknown X-Bugzilla-Keywords: missed-optimization X-Bugzilla-Severity: normal X-Bugzilla-Who: jskumari at gcc dot gnu.org X-Bugzilla-Status: UNCONFIRMED X-Bugzilla-Resolution: X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: jskumari at gcc dot gnu.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 List-Id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D109009 --- Comment #5 from Surya Kumari Jangala --- I was analysing and comparing the following test cases: Test1 (shrink wrapped) long foo (long i, long cond) { i =3D i + 1; if (cond) bar (); return i; } Test2 (not shrink wrapped) long foo (long i, long cond) { if (cond) bar (); return i+1; } There is a difference in register allocation by IRA in the two cases. Input RTL to IRA (Test1: passing case) BB2: set r123, r4 set r122, r3 set r120, compare(r123, 0) set r117, r122 + 1 if r120 jump BB4 else jump BB3 BB3: call bar() BB4: set r3, r117 return r3 Input RTL to IRA (Test2: failing case) BB2: set r123, r4 set r122, r3 set r120, compare(r123, 0) set r118, r122 if r120 jump BB4 else jump BB3 BB3: call bar() BB4: set r3, r118+1 return r3 There is a difference in registers allocated for r117 (passing case) and r1= 18 (failing case) by IRA. r117 is allocated r3 while r118 is allocated r31. Since r117 is allocated r3, r3 is spilled across the call to bar() by LRA. = And so only BB3 requires a prolog and shrink wrap is successful. In the failing case, since r31 is assigned to r118, BB2 requires a prolog a= nd shrink wrap fails. In the IRA pass, after graph coloring, both r117 and r118 get assigned to r= 3. The routine improve_allocation() is called after graph coloring. In this routine, IRA checks for each allocno if spilling any conflicting allocnos c= an improve the=20 allocation of this allocno. Going into more detail, improve_allocation() does the following: 1. We first compute the cost improvement for usage of each profitable hard register for a given allocno A. The cost improvement is computed as follows: costs[regno] =3D A->hard_reg_costs[regno] // =E2=80=98hard_reg_costs=E2= =80=99 is an array of usage=20 costs for each hard register costs[regno] -=3D allocno_copy_cost_saving (A, regno); costs[regno] -=3D base_cost; //Say, =E2=80=98reg=E2=80=99 is assigned to = A. Then =E2=80=98base_cost=E2=80=99 is=20 the usage cost of =E2=80=98reg=E2=80=99 for = A. 2. Then we process each conflicting allocno of A and update the cost improvement for the profitable hard registers of A. Basically, we compute t= he spill costs of the conflicting allocnos and update the cost (for A) of the register that was assigned to the conflicting allocno.=20 3. We then find the best register among the profitable registers, spill the conflicting allocno that uses this best register and assign the best regist= er to A. However, the initial hard register costs for some of the profitable hard registers is different in the passing and failing cases. More specifically,= the costs in hard_reg_costs[] array are 0 for regs 14-31 in the failing case. A zero cost seems incorrect. If using a reg in the set [14..31] has zero cost, then why wasn=E2=80=99t such a reg chosen for r118? In the passing case, the costs in hard_reg_costs[] for regs 14-31 is 2000. At the end of step 1, costs[r31] is -390 for failing case(for allocno r118)= and 1610 for passing case (for allocno r117). Another issue(?) is that in step 2, the only conficting allocno for r118 is= the allocno for r120 which is used to hold the value of the condition check. The pseudo r120 has been assigned to r100 by the graph coloring step. But r100 = is not in the set of profitable hard registers for r118. (The profitable hard = regs are: [0, 3-12, 14-31]). So the allocno for r120 is not considered for spill= ing. And finally in step 3, r31 is assigned to r118, though r31 has not been assigned to any conflicting allocno. Perhaps improve_allocation() should on= ly consider registers that have been assigned to conflicting allocnos, and not other registers, since it=E2=80=99s stated aim is to see if spilling confli= cting allocnos can result in a better allocation. I am investigating why hard_reg_costs[] has 0 cost for r14..r31.=