[IRA] Query about improve_allocation() routine in IRA

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* [IRA] Query about improve_allocation() routine in IRA
@ 2023-04-19 18:53 Surya Kumari Jangala
  2023-04-20 13:56 ` Vladimir Makarov
  0 siblings, 1 reply; 2+ messages in thread
From: Surya Kumari Jangala @ 2023-04-19 18:53 UTC (permalink / raw)
  To: GCC Development, vmakarov, Peter Bergner

Hi Vladimir,
I have been analyzing a test case for which shrink wrapping fails 
(on powerpc, 64bit LE). But if the same test case is slightly modified,
shrink wrapping kicks in.

Here are the two tests:

Test1 (shrink wrapped)

long
foo (long i, long cond)
{
  i = i + 1;
  if (cond)
    bar ();
  return i;
}

Test2 (not shrink wrapped)

long
foo (long i, long cond)
{
  if (cond)
    bar ();
  return i+1;
}

The difference between the two tests is when ‘i’ is incremented.

There is a difference in register allocation by IRA in the two cases.

Input RTL to IRA (Test1: passing case)

BB2:
  set r123, r4
  set r122, r3
  set r120, compare(r123, 0)
  set r117, r122 + 1
  if r120 jump BB4 else jump BB3
BB3:
  call bar()
BB4:
  set r3, r117
  return r3

Input RTL to IRA (Test2: failing case)

BB2:
  set r123, r4
  set r122, r3
  set r120, compare(r123, 0)
  set r118, r122
  if r120 jump BB4 else jump BB3
BB3:
  call bar()
BB4:
  set r3, r118+1
  return r3

There is a difference in registers allocated for the pseudo register
r117 (passing case) and pseudo register r118 (failing case) by IRA.
r117 is allocated r3 while r118 is allocated r31.
Since r117 is allocated r3, r3 is spilled across the call to bar() by LRA.
And so only BB3 requires a prolog and shrink wrap is successful.
In the failing case, since r31 is assigned to r118, BB2 requires a prolog
and shrink wrap fails.

In the IRA pass, after graph coloring, both r117 and r118 get assigned
to r3. However, the routine improve_allocation() assigns r31 to r118.
The routine improve_allocation() is called after graph coloring. In this
routine, IRA checks for each allocno if spilling any conflicting allocnos
can improve the allocation of this allocno.

Going into more detail, improve_allocation() does the following:
Step 1: We first compute the cost improvement for usage of each profitable
hard register for a given allocno A. The cost improvement is computed as
follows:

costs[regno] = A->hard_reg_costs[regno]   // ‘hard_reg_costs’ is an array
                                      of usage costs for each hard register
costs[regno] -= allocno_copy_cost_saving (A, regno);
costs[regno] -= base_cost;   //Say, ‘reg’ is assigned to A. Then
                              ‘base_cost’ is the usage cost of ‘reg’ for A.

Step 2: Then we process each conflicting allocno of A and update the cost
improvement for the profitable hard registers of A. Basically, we compute
the spill costs of the conflicting allocnos and update the cost (for A) of
the register that was assigned to the conflicting allocno. 

Step 3: We then find the best register among the profitable registers, spill
the conflicting allocno that uses this best register and assign the best
register to A.

However, the initial hard register costs for some of the profitable hard
registers is different in the passing and failing test cases. More
specifically, the costs in hard_reg_costs[] array are 0 for regs 14-31
in the failing case. In the passing case, the costs in hard_reg_costs[]
for regs 14-31 is 2000. 

At the end of step 1, costs[r31] is -390 for failing case(for allocno r118)
and 1610 for passing case (for allocno r117).

In step 2 for the failing test, the only conflicting allocno for r118 is
the allocno for r120 which is used to hold the value of the compare operation.
The pseudo r120 has been assigned to r100 by the graph coloring step. But
r100 is not in the set of profitable hard registers for r118. (The profitable
hard regs are: [0, 3-12, 14-31]). So the allocno for r120 is not considered
for spilling. And finally in step 3, r31 is assigned to r118.

I have a few queries:

1. A zero cost seems strange for the regs r14-r31. If using a reg in the
set [14..31] has zero cost, then why wasn’t such a reg chosen for r118
in the first place, instead of r3? 

2. In step 3, shouldn’t we restrict the register chosen to be a register
that has been assigned to a conflicting allocno? This is not the case for
the failing test. The allocno for r118 is assigned r31, but there is no
conflicting allocno that has been assigned r31.

3. In steps 1 & 2, shouldn’t we consider the register save and restore
cost too? ’r31’ being a callee-save (non-volatile) register has to be
saved before being used, whereas this is not required for r3 which is a
caller-save (volatile) register. 

Thanks in advance,
Surya

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [IRA] Query about improve_allocation() routine in IRA
  2023-04-19 18:53 [IRA] Query about improve_allocation() routine in IRA Surya Kumari Jangala
@ 2023-04-20 13:56 ` Vladimir Makarov
  0 siblings, 0 replies; 2+ messages in thread
From: Vladimir Makarov @ 2023-04-20 13:56 UTC (permalink / raw)
  To: Surya Kumari Jangala, GCC Development, Peter Bergner


On 4/19/23 14:53, Surya Kumari Jangala wrote:
> ...

> I have a few queries:
>
> 1. A zero cost seems strange for the regs r14-r31. If using a reg in the
> set [14..31] has zero cost, then why wasn’t such a reg chosen for r118
> in the first place, instead of r3?
I guess it is because assign_hard_reg (see also calculate_saved_nregs) 
takes into account that r31 not used yet needs insns to save/restore it 
in prolog/epilog when we assign r31 the first time to any pseudos.  All 
pseudos assigned to r31 after that will not be punished.  It is not 
reflected in the hard reg costs as we do not know will be this hard 
register assigned.   Also an order of hard register assignment can be 
used for some targets. It works when all assignment costs are the same.
> 2. In step 3, shouldn’t we restrict the register chosen to be a register
> that has been assigned to a conflicting allocno? This is not the case for
> the failing test. The allocno for r118 is assigned r31, but there is no
> conflicting allocno that has been assigned r31.

Yes in this case, this approach could be used.  But what if after 
spilling conflicting pseudos for one pseudo, we can simply use hard 
registers of the pseudos spilled before for free for another pseudo 
conflicting with the spilled pseudos.  So I think we should constrain to 
any hard regs used before improve_allocation, whether they are used by 
conflicting pseudos or not.


> 3. In steps 1 & 2, shouldn’t we consider the register save and restore
> cost too? ’r31’ being a callee-save (non-volatile) register has to be
> saved before being used, whereas this is not required for r3 which is a
> caller-save (volatile) register.
>
I guess we could.  I am not against improving the code. Especially as 
IRA was developed for many years. For example cost calculations in 
assign_hard_reg were improved before and after adding improve_allocation 
and it might make some parts of code are less important.  But any new 
change would be nice to be proved by benchmarking (SPEC rates or size or 
executed spilled insns etc). When improve_allocation was added, it 
improved SPEC2000 for x86 but I don't remember by what margin.

Thank you for reporting this case and your questions, proposals and 
interest in RA. I am open to review your patches for RA.



^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2023-04-20 13:56 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-19 18:53 [IRA] Query about improve_allocation() routine in IRA Surya Kumari Jangala
2023-04-20 13:56 ` Vladimir Makarov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).