From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 574303858D20; Fri, 14 Apr 2023 17:44:49 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 574303858D20
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1681494289;
	bh=QHQ9ryT+oKGnEEoXroiZ6UGiqIIswEgNzg4op2jVOn0=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=PXLkKZVDvNhHmUj9YckaGaG/ZWs8A2W+b0uW4F/X8Pmbk5+iuF66SUPRIIkEhJ3y4
	 kcRv3QP7ZlZ4KTTH/Muf91tNwG2jutdLKOdhYqEpnO8c5dFokZlG3fgzujdOfGGKEr
	 ogep5HuYbgObTe0JzpY8X58Z9GpJTfGp+sR0cvNY=
From: "jskumari at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/109009] Shrink Wrap missed opportunity
Date: Fri, 14 Apr 2023 17:44:47 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: unknown
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: jskumari at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: jskumari at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-109009-4-xRSMQckCLs@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-109009-4@http.gcc.gnu.org/bugzilla/>
References: <bug-109009-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D109009
--- Comment #5 from Surya Kumari Jangala <jskumari at gcc dot gnu.org> ---
I was analysing and comparing the following test cases:

Test1 (shrink wrapped)

long
foo (long i, long cond)
{
  i =3D i + 1;
  if (cond)
    bar ();
  return i;
}


Test2 (not shrink wrapped)

long
foo (long i, long cond)
{
  if (cond)
    bar ();
  return i+1;
}


There is a difference in register allocation by IRA in the two cases.

Input RTL to IRA (Test1: passing case)
BB2:
  set r123, r4
  set r122, r3
  set r120, compare(r123, 0)
  set r117, r122 + 1
  if r120 jump BB4 else jump BB3
BB3:
  call bar()
BB4:
  set r3, r117
  return r3


Input RTL to IRA (Test2: failing case)

BB2:
  set r123, r4
  set r122, r3
  set r120, compare(r123, 0)
  set r118, r122
  if r120 jump BB4 else jump BB3
BB3:
  call bar()
BB4:
  set r3, r118+1
  return r3


There is a difference in registers allocated for r117 (passing case) and r1=
18
(failing case) by IRA.
r117 is allocated r3 while r118 is allocated r31.
Since r117 is allocated r3, r3 is spilled across the call to bar() by LRA. =
And
so only BB3 requires a prolog and shrink wrap is successful.
In the failing case, since r31 is assigned to r118, BB2 requires a prolog a=
nd
shrink wrap fails.

In the IRA pass, after graph coloring, both r117 and r118 get assigned to r=
3.
The routine improve_allocation() is called after graph coloring. In this
routine, IRA checks for each allocno if spilling any conflicting allocnos c=
an
improve the=20
allocation of this allocno.

Going into more detail, improve_allocation() does the following:
1. We first compute the cost improvement for usage of each profitable hard
register for a given allocno A. The cost improvement is computed as follows:

costs[regno] =3D A->hard_reg_costs[regno]   // =E2=80=98hard_reg_costs=E2=
=80=99 is an array of
usage=20
                                             costs for each hard register
costs[regno] -=3D allocno_copy_cost_saving (A, regno);
costs[regno] -=3D base_cost;   //Say, =E2=80=98reg=E2=80=99 is assigned to =
A. Then =E2=80=98base_cost=E2=80=99 is=20
                               the usage cost of =E2=80=98reg=E2=80=99 for =
A.

2. Then we process each conflicting allocno of A and update the cost
improvement for the profitable hard registers of A. Basically, we compute t=
he
spill costs of the conflicting allocnos and update the cost (for A) of the
register that was assigned to the conflicting allocno.=20
3. We then find the best register among the profitable registers, spill the
conflicting allocno that uses this best register and assign the best regist=
er
to A.


However, the initial hard register costs for some of the profitable hard
registers is different in the passing and failing cases. More specifically,=
 the
costs in hard_reg_costs[] array are 0 for regs 14-31 in the failing case. A
zero cost seems incorrect. If using a reg in the set [14..31] has zero cost,
then why wasn=E2=80=99t such a reg chosen for r118?
In the passing case, the costs in hard_reg_costs[] for regs 14-31 is 2000.
At the end of step 1, costs[r31] is -390 for failing case(for allocno r118)=
 and
1610 for passing case (for allocno r117).

Another issue(?) is that in step 2, the only conficting allocno for r118 is=
 the
allocno for r120 which is used to hold the value of the condition check. The
pseudo r120 has been assigned to r100 by the graph coloring step. But r100 =
is
not in the set of profitable hard registers for r118. (The profitable hard =
regs
are: [0, 3-12, 14-31]). So the allocno for r120 is not considered for spill=
ing.
 And finally in step 3, r31 is assigned to r118, though r31 has not been
assigned to any conflicting allocno. Perhaps improve_allocation() should on=
ly
consider registers that have been assigned to conflicting allocnos, and not
other registers, since it=E2=80=99s stated aim is to see if spilling confli=
cting
allocnos can result in a better allocation.

I am investigating why hard_reg_costs[] has 0 cost for r14..r31.=