[Bug rtl-optimization/109009] New: Shrink Wrap missed opportunity

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug rtl-optimization/109009] New: Shrink Wrap missed opportunity
@ 2023-03-03 14:04 jskumari at gcc dot gnu.org
  2023-03-03 16:40 ` [Bug rtl-optimization/109009] " segher at gcc dot gnu.org
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: jskumari at gcc dot gnu.org @ 2023-03-03 14:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109009

            Bug ID: 109009
           Summary: Shrink Wrap missed opportunity
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jskumari at gcc dot gnu.org
  Target Milestone: ---

The following test case does not get shrink wrapped:

void bar (void);

long
foo (long i, long cond)
{
  if (cond)
    bar ();
  return i+1;
}

However, if we change the return statement from
   return i+1;
to
   return i;

then shrink wrapping kicks in.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug rtl-optimization/109009] Shrink Wrap missed opportunity
  2023-03-03 14:04 [Bug rtl-optimization/109009] New: Shrink Wrap missed opportunity jskumari at gcc dot gnu.org
@ 2023-03-03 16:40 ` segher at gcc dot gnu.org
  2023-03-05  5:23 ` jskumari at gcc dot gnu.org
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: segher at gcc dot gnu.org @ 2023-03-03 16:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109009

--- Comment #1 from Segher Boessenkool <segher at gcc dot gnu.org> ---
This is very target-specific.  Could you please attach a test case (with any
significant compiler flags as well, and specific target mentioned, etc.) that
shows the problem?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug rtl-optimization/109009] Shrink Wrap missed opportunity
  2023-03-03 14:04 [Bug rtl-optimization/109009] New: Shrink Wrap missed opportunity jskumari at gcc dot gnu.org
  2023-03-03 16:40 ` [Bug rtl-optimization/109009] " segher at gcc dot gnu.org
@ 2023-03-05  5:23 ` jskumari at gcc dot gnu.org
  2023-03-05 12:19 ` jskumari at gcc dot gnu.org
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: jskumari at gcc dot gnu.org @ 2023-03-05  5:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109009

--- Comment #2 from Surya Kumari Jangala <jskumari at gcc dot gnu.org> ---
For the working testcase:

long
foo (long i, long cond)
{
  if (cond)
    bar ();
  return i;
}

The input RTL to the shrink wrap pass is:

BB2:
   set r100, compare(r4, 0)
   if r100 jump BB4 else jump BB3

BB3:
   set mem(r1+32), r3
   call bar()
   set r3, mem(r1+32)

BB4:
   return r3

The shrink wrap pass concludes that only BB3 requires a prolog and successfully
does shrink wrapping.

For the non-working case:
long
foo (long i, long cond)
{
  if (cond)
    bar ();
  return i+1;
}

The input rtl to the shrink wrap pass is:

BB2:
   set r100, compare(r4, 0)
   set r31, r3
   if r100 jump BB4 else jump BB3

BB3:
   call bar()

BB4:
   set r3, r31+1
   return r3

The shrink wrap pass decides that both BB2 and BB3 require a prolog, and
finally the prolog is placed in BB2. Thus shrink wrapping fails for this test.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug rtl-optimization/109009] Shrink Wrap missed opportunity
  2023-03-03 14:04 [Bug rtl-optimization/109009] New: Shrink Wrap missed opportunity jskumari at gcc dot gnu.org
  2023-03-03 16:40 ` [Bug rtl-optimization/109009] " segher at gcc dot gnu.org
  2023-03-05  5:23 ` jskumari at gcc dot gnu.org
@ 2023-03-05 12:19 ` jskumari at gcc dot gnu.org
  2023-03-05 15:01 ` segher at gcc dot gnu.org
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: jskumari at gcc dot gnu.org @ 2023-03-05 12:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109009

--- Comment #3 from Surya Kumari Jangala <jskumari at gcc dot gnu.org> ---
For the working case:

 * Input RTL to the IRA pass:
BB2:
  set r123, r4
  set r122, r3
  set r120, compare(r123, 0)
  set r118, r122
  if r120 jump BB4 else jump BB3
BB3:
  call bar()
BB4:
  set r3, r118
  return


 * RTL after the IRA pass:
same as the input

 * RTL after reload pass:
BB2:
  set r100, compare(r4, 0)
  if r100 jump BB4 else jump BB3
BB3:
  set mem(r1+32), r3
  call bar()
  set r3, mem(r1+32)

-----------

For the failing case:

 * Input RTL to the IRA pass:
BB2:
  set r123, r4
  set r122, r3
  set r120, compare(r123, 0)
  set r118, r122
  if r120 jump BB4 else jump BB3
BB3:
  call bar()
BB4:
  set r3, r118+1
  return r3

  * RTL after IRA pass
same as the input

  * RTL after reload pass
BB2:
  set r100, compare(r4, 0)
  set r31, r3
  if r100 jump BB4 else jump BB3
BB3:
  call bar()
BB4:
  set r3, r31+1
  return r3

--------

After the IRA pass, the IR looks very similar in both the working and failing
testcases. But the reload pass produces different IR. 
The live ranges seem to be properly split after IRA. I will be checking why the
reload/LRA pass produces different IR.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug rtl-optimization/109009] Shrink Wrap missed opportunity
  2023-03-03 14:04 [Bug rtl-optimization/109009] New: Shrink Wrap missed opportunity jskumari at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-03-05 12:19 ` jskumari at gcc dot gnu.org
@ 2023-03-05 15:01 ` segher at gcc dot gnu.org
  2023-04-14 17:44 ` jskumari at gcc dot gnu.org
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: segher at gcc dot gnu.org @ 2023-03-05 15:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109009

--- Comment #4 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Alternatively (or in addition), you can look how to make the shrink-wrap pass
transform the code with some simple added move instructions, maybe even
involving an extra pseudo, so that it can shrink-wrap more.  A very simple
(and because of that, not very effective) version of that is done in
prepare_shrink_wrapping already.

The problem with this is that such transformations are not free: the extra
insns can often be optimised away (just by register allocation), and even if
not, if it causes more / better shrink-wrapping it is a win anyway.  But it
has to be done only if it improves shrink-wrapping (or it likely improves it),
or it isn't a net win.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug rtl-optimization/109009] Shrink Wrap missed opportunity
  2023-03-03 14:04 [Bug rtl-optimization/109009] New: Shrink Wrap missed opportunity jskumari at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-03-05 15:01 ` segher at gcc dot gnu.org
@ 2023-04-14 17:44 ` jskumari at gcc dot gnu.org
  2023-05-10 11:51 ` jskumari at gcc dot gnu.org
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: jskumari at gcc dot gnu.org @ 2023-04-14 17:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109009

--- Comment #5 from Surya Kumari Jangala <jskumari at gcc dot gnu.org> ---
I was analysing and comparing the following test cases:

Test1 (shrink wrapped)

long
foo (long i, long cond)
{
  i = i + 1;
  if (cond)
    bar ();
  return i;
}

Test2 (not shrink wrapped)

long
foo (long i, long cond)
{
  if (cond)
    bar ();
  return i+1;
}

There is a difference in register allocation by IRA in the two cases.

Input RTL to IRA (Test1: passing case)
BB2:
  set r123, r4
  set r122, r3
  set r120, compare(r123, 0)
  set r117, r122 + 1
  if r120 jump BB4 else jump BB3
BB3:
  call bar()
BB4:
  set r3, r117
  return r3

Input RTL to IRA (Test2: failing case)

BB2:
  set r123, r4
  set r122, r3
  set r120, compare(r123, 0)
  set r118, r122
  if r120 jump BB4 else jump BB3
BB3:
  call bar()
BB4:
  set r3, r118+1
  return r3

There is a difference in registers allocated for r117 (passing case) and r118
(failing case) by IRA.
r117 is allocated r3 while r118 is allocated r31.
Since r117 is allocated r3, r3 is spilled across the call to bar() by LRA. And
so only BB3 requires a prolog and shrink wrap is successful.
In the failing case, since r31 is assigned to r118, BB2 requires a prolog and
shrink wrap fails.

In the IRA pass, after graph coloring, both r117 and r118 get assigned to r3.
The routine improve_allocation() is called after graph coloring. In this
routine, IRA checks for each allocno if spilling any conflicting allocnos can
improve the 
allocation of this allocno.

Going into more detail, improve_allocation() does the following:
1. We first compute the cost improvement for usage of each profitable hard
register for a given allocno A. The cost improvement is computed as follows:

costs[regno] = A->hard_reg_costs[regno]   // ‘hard_reg_costs’ is an array of
usage 
                                             costs for each hard register
costs[regno] -= allocno_copy_cost_saving (A, regno);
costs[regno] -= base_cost;   //Say, ‘reg’ is assigned to A. Then ‘base_cost’ is 
                               the usage cost of ‘reg’ for A.

2. Then we process each conflicting allocno of A and update the cost
improvement for the profitable hard registers of A. Basically, we compute the
spill costs of the conflicting allocnos and update the cost (for A) of the
register that was assigned to the conflicting allocno. 
3. We then find the best register among the profitable registers, spill the
conflicting allocno that uses this best register and assign the best register
to A.

However, the initial hard register costs for some of the profitable hard
registers is different in the passing and failing cases. More specifically, the
costs in hard_reg_costs[] array are 0 for regs 14-31 in the failing case. A
zero cost seems incorrect. If using a reg in the set [14..31] has zero cost,
then why wasn’t such a reg chosen for r118?
In the passing case, the costs in hard_reg_costs[] for regs 14-31 is 2000.
At the end of step 1, costs[r31] is -390 for failing case(for allocno r118) and
1610 for passing case (for allocno r117).

Another issue(?) is that in step 2, the only conficting allocno for r118 is the
allocno for r120 which is used to hold the value of the condition check. The
pseudo r120 has been assigned to r100 by the graph coloring step. But r100 is
not in the set of profitable hard registers for r118. (The profitable hard regs
are: [0, 3-12, 14-31]). So the allocno for r120 is not considered for spilling.
 And finally in step 3, r31 is assigned to r118, though r31 has not been
assigned to any conflicting allocno. Perhaps improve_allocation() should only
consider registers that have been assigned to conflicting allocnos, and not
other registers, since it’s stated aim is to see if spilling conflicting
allocnos can result in a better allocation.

I am investigating why hard_reg_costs[] has 0 cost for r14..r31.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug rtl-optimization/109009] Shrink Wrap missed opportunity
  2023-03-03 14:04 [Bug rtl-optimization/109009] New: Shrink Wrap missed opportunity jskumari at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2023-04-14 17:44 ` jskumari at gcc dot gnu.org
@ 2023-05-10 11:51 ` jskumari at gcc dot gnu.org
  2023-05-11  9:49 ` jskumari at gcc dot gnu.org
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: jskumari at gcc dot gnu.org @ 2023-05-10 11:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109009

--- Comment #6 from Surya Kumari Jangala <jskumari at gcc dot gnu.org> ---
Continuing with the analysis of the test cases specified in comment 5, here are
some findings:

After graph colouring, when we do improve_allocation(), we find that in the
failing test case, the hard_reg_cost[r31] for allocno r118 is 0. (Note: not
just r31, for several other registers the cost is 0). The cost for r3 is 2390.

I spent some time investigating why the hard_reg_cost is 0. In the failing
test, the hard_reg_cost[] array is initialized to ALLOCNO_CLASS_COST in the
routine setup_allocno_class_and_costs(). ALLOCNO_CLASS_COST for allocno r118 is
0.

In the passing test, the hard_reg_cost[] array is initialized to
ALLOCNO_CLASS_COST in the routine process_bb_node_for_hard_reg_moves().
ALLOCNO_CLASS_COST is 2000 for allocno r117.

The reason the initialization happens in a different routine in the passing and
failing tests is because the preferred register class is different from the
allocno class in the failing case (the former is BASE_REGS while the latter is
GENERAL_REGS), while in the passing case both are same (GENERAL_REGS).

ALLOCNO_CLASS_COST is minimal accumulated register class cost of the allocno
class.

In the failing case, while the initial value of hard_reg_cost[r3] is 0, it is
changed in the following routines:
1. ira_tune_allocno_costs() : 0->2640
 (this routine checks if the register is caller save and if it is live across a
call. r3 is caller save while r31 is not. So, only cost for r3 is changed.)
2. process_regs_for_copy() : 2640->2390
 (this routine processes the 'set r3, r118+1' insn and reduces the cost of r3
in order to make it more preferential for r118. Again, cost of r31 is not
changed as there is no insn involving r31 & r118)

Even though r31 has 0 cost, r3 is chosen to be assigned to r118 because r31 is
a callee save register and it's use will have save/restore cost.

In the passing case, both r31 and r3 initially have a cost of 2000. 

The cost for r3 is then changed in the following routines:
1. process_bb_node_for_hard_reg_moves() : 2000->0
     (the cost is reduced in order to give preference to r3 for allocno r117.
This is due to the 'set r3, r117' insn)
2. ira_tune_allocno_costs () : 0->2640
3. process_regs_for_copy() : 2640->640

r3 is chosen for allocno r117 as it has lesser cost than r31.

Next, I checked why ALLOCNO_CLASS_COST is 0 in the failing case while in the
passing case it is 2000.

In the routine find_costs_and_classes(), we compute the cost of each register
class for each allocno. First we go thru all the insns in the routine. Then for
each insn which involves a copy from/to a hard reg to/from a pseudo reg, we
compute the cost of the 'move' if the pseudo is assigned a register from a
specific register class. This cost is the register class cost of this allocno
for this specific insn. We add up all such costs to get the register class cost
for an allocno.

In the passing case, we have the insn 'set r3, r117' which is processed in
find_costs_and_classes(). For this insn, we check the cost of the move if r117
is assigned to different register classes. The minimum of these costs is the
ALLOCNO_CLASS_COST for r117.

But in the failing case, there is no 'set' insn involving a hard register and
the pseudo register r118. So the ALLOCNO_CLASS_COST is 0.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug rtl-optimization/109009] Shrink Wrap missed opportunity
  2023-03-03 14:04 [Bug rtl-optimization/109009] New: Shrink Wrap missed opportunity jskumari at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2023-05-10 11:51 ` jskumari at gcc dot gnu.org
@ 2023-05-11  9:49 ` jskumari at gcc dot gnu.org
  2023-06-23 15:03 ` jskumari at gcc dot gnu.org
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: jskumari at gcc dot gnu.org @ 2023-05-11  9:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109009

--- Comment #7 from Surya Kumari Jangala <jskumari at gcc dot gnu.org> ---
There are a couple of issues in IRA:

1. In improve_allocation() routine, we are not considering save/restore cost of
using a callee save register (r31 in the failing case). Due to this, r31 is
being chosen instead of r3 for allocno r118.

2. In find_costs_and_classes(), we are not computing the cost of register moves
when we have a 'set' insn that copies from one pseudo reg to another pseudo
reg. This is resulting in r118 having ALLOCNO_CLASS_COST of 0.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug rtl-optimization/109009] Shrink Wrap missed opportunity
  2023-03-03 14:04 [Bug rtl-optimization/109009] New: Shrink Wrap missed opportunity jskumari at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2023-05-11  9:49 ` jskumari at gcc dot gnu.org
@ 2023-06-23 15:03 ` jskumari at gcc dot gnu.org
  2023-06-23 19:57 ` bergner at gcc dot gnu.org
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: jskumari at gcc dot gnu.org @ 2023-06-23 15:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109009

--- Comment #8 from Surya Kumari Jangala <jskumari at gcc dot gnu.org> ---
(In reply to Surya Kumari Jangala from comment #7)
> There are a couple of issues in IRA:
> 
> 1. In improve_allocation() routine, we are not considering save/restore cost
> of using a callee save register (r31 in the failing case). Due to this, r31
> is being chosen instead of r3 for allocno r118.
> 

I made changes in the improve_allocation() routine to consider the save/restore
cost, but this still results in r31 being chosen for allocno r118.

Save/restore cost is computed by using the costs in the "ira_memory_move_cost"
array. But the memory move costs are quite small, it is 4. So even adding this
to r31 does not make the cost of r31 large enough for it to be not chosen.

However, while computing the save/restore cost, we are considering only the
memory move cost but not the BB frequency. I think it is important to consider
the frequency too. We factor in the frequency when we compute the savings on
removed copies in allocno_copy_cost_saving(). In this routine, we multiply the
frequency with ira_register_move_cost. So why not factor in the frequency for
memory move cost?

For the testcase we are considering:

BB2:
  set r123, r4
  set r122, r3
  set r120, compare(r123, 0)
  set r118, r122
  if r120 jump BB4 else jump BB3
BB3:
  call bar()
BB4:
  set r3, r118+1
  return r3

In improve_allocation(), we compute the cost improvement of each hard register
for allocno r118. And for each hard register we compute the
allocno_copy_cost_saving() if that hard reg is assigned to r118. In
allocno_copy_cost_saving(), we check if there are copies involving r118. And we
have one copy:
    set r118, r122

Since r122 is a copy of r3, so the copy cost saving if r3 is assigned to r118
is:
(freq of the copy insn) * ira_register_move_cost

Here, freq of the copy insn is taken as 1000 and ira_register_move_cost is 2.
So the copy cost saving is 2000. Note that BB2 in which the copy insn occurs is
the first BB.

The save insn for r31 will be placed in the prolog, so the freq of prolog needs
to be taken into consideration.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug rtl-optimization/109009] Shrink Wrap missed opportunity
  2023-03-03 14:04 [Bug rtl-optimization/109009] New: Shrink Wrap missed opportunity jskumari at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2023-06-23 15:03 ` jskumari at gcc dot gnu.org
@ 2023-06-23 19:57 ` bergner at gcc dot gnu.org
  2023-06-23 20:04 ` bergner at gcc dot gnu.org
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bergner at gcc dot gnu.org @ 2023-06-23 19:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109009

--- Comment #9 from Peter Bergner <bergner at gcc dot gnu.org> ---
(In reply to Surya Kumari Jangala from comment #8)
> However, while computing the save/restore cost, we are considering only the
> memory move cost but not the BB frequency. I think it is important to
> consider the frequency too. 

Yes, you'll need to factor in the BB frequency.  Since the save/restore code
will go into (at this point modulo shrink-wrapping later) the prologue and
epilogue, you'll want something like: <prologue/epilogue freq> * 2 *
"ira_memory_move_cost".
I think the issue here, is that the "frequency" of the entry block isn't '1',
but some larger value.  I'm not 100% sure, but maybe you can use:
  REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun))
for the BB frequency of the prologue/epilogue?

> We factor in the frequency when we compute the
> savings on removed copies in allocno_copy_cost_saving(). In this routine, we
> multiply the frequency with ira_register_move_cost. So why not factor in the
> frequency for memory move cost?

allocno_copy_cost_saving() is dealing with an actual copy instruction(s), so a
real instruction in a specific BB, so it knows the frequency of the copy(ies). 
The ira_memory_move_cost is more of a HW cost of a generic load/store and it
isn't tied to a specific instruction, so there is no frequency to scale by, so
you'll need to do that manually here.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug rtl-optimization/109009] Shrink Wrap missed opportunity
  2023-03-03 14:04 [Bug rtl-optimization/109009] New: Shrink Wrap missed opportunity jskumari at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2023-06-23 19:57 ` bergner at gcc dot gnu.org
@ 2023-06-23 20:04 ` bergner at gcc dot gnu.org
  2023-06-27 13:18 ` jskumari at gcc dot gnu.org
  2023-06-27 13:19 ` jskumari at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: bergner at gcc dot gnu.org @ 2023-06-23 20:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109009

--- Comment #10 from Peter Bergner <bergner at gcc dot gnu.org> ---
(In reply to Peter Bergner from comment #9)
> Yes, you'll need to factor in the BB frequency.  Since the save/restore code
> will go into (at this point modulo shrink-wrapping later) the prologue and
> epilogue, you'll want something like: <prologue/epilogue freq> * 2 *
> "ira_memory_move_cost".

Thinking more, depending on the allocno/mode, you might also need to factor in
calculate_saved_nregs(), in case the allocno represents a register pair or
larger.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug rtl-optimization/109009] Shrink Wrap missed opportunity
  2023-03-03 14:04 [Bug rtl-optimization/109009] New: Shrink Wrap missed opportunity jskumari at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2023-06-23 20:04 ` bergner at gcc dot gnu.org
@ 2023-06-27 13:18 ` jskumari at gcc dot gnu.org
  2023-06-27 13:19 ` jskumari at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: jskumari at gcc dot gnu.org @ 2023-06-27 13:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109009

--- Comment #11 from Surya Kumari Jangala <jskumari at gcc dot gnu.org> ---
(In reply to Peter Bergner from comment #9)
> (In reply to Surya Kumari Jangala from comment #8)
> > However, while computing the save/restore cost, we are considering only the
> > memory move cost but not the BB frequency. I think it is important to
> > consider the frequency too. 
> 
> Yes, you'll need to factor in the BB frequency.  Since the save/restore code
> will go into (at this point modulo shrink-wrapping later) the prologue and
> epilogue, you'll want something like: <prologue/epilogue freq> * 2 *
> "ira_memory_move_cost".

Thanks for confirming that we have to factor in the BB frequency. 

> I think the issue here, is that the "frequency" of the entry block isn't
> '1', but some larger value.  I'm not 100% sure, but maybe you can use:
>   REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun))

This works. It gives the frequency of the entry block.

> for the BB frequency of the prologue/epilogue?
> 
> 
> > We factor in the frequency when we compute the
> > savings on removed copies in allocno_copy_cost_saving(). In this routine, we
> > multiply the frequency with ira_register_move_cost. So why not factor in the
> > frequency for memory move cost?
> 
> allocno_copy_cost_saving() is dealing with an actual copy instruction(s), so
> a real instruction in a specific BB, so it knows the frequency of the
> copy(ies).  The ira_memory_move_cost is more of a HW cost of a generic
> load/store and it isn't tied to a specific instruction, so there is no
> frequency to scale by, so you'll need to do that manually here.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug rtl-optimization/109009] Shrink Wrap missed opportunity
  2023-03-03 14:04 [Bug rtl-optimization/109009] New: Shrink Wrap missed opportunity jskumari at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2023-06-27 13:18 ` jskumari at gcc dot gnu.org
@ 2023-06-27 13:19 ` jskumari at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: jskumari at gcc dot gnu.org @ 2023-06-27 13:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109009

--- Comment #12 from Surya Kumari Jangala <jskumari at gcc dot gnu.org> ---
(In reply to Peter Bergner from comment #10)
> (In reply to Peter Bergner from comment #9)
> > Yes, you'll need to factor in the BB frequency.  Since the save/restore code
> > will go into (at this point modulo shrink-wrapping later) the prologue and
> > epilogue, you'll want something like: <prologue/epilogue freq> * 2 *
> > "ira_memory_move_cost".
> 
> Thinking more, depending on the allocno/mode, you might also need to factor
> in calculate_saved_nregs(), in case the allocno represents a register pair
> or larger.

Yes, I am taking calculate_saved_nregs() into consideration.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-06-27 13:19 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-03 14:04 [Bug rtl-optimization/109009] New: Shrink Wrap missed opportunity jskumari at gcc dot gnu.org
2023-03-03 16:40 ` [Bug rtl-optimization/109009] " segher at gcc dot gnu.org
2023-03-05  5:23 ` jskumari at gcc dot gnu.org
2023-03-05 12:19 ` jskumari at gcc dot gnu.org
2023-03-05 15:01 ` segher at gcc dot gnu.org
2023-04-14 17:44 ` jskumari at gcc dot gnu.org
2023-05-10 11:51 ` jskumari at gcc dot gnu.org
2023-05-11  9:49 ` jskumari at gcc dot gnu.org
2023-06-23 15:03 ` jskumari at gcc dot gnu.org
2023-06-23 19:57 ` bergner at gcc dot gnu.org
2023-06-23 20:04 ` bergner at gcc dot gnu.org
2023-06-27 13:18 ` jskumari at gcc dot gnu.org
2023-06-27 13:19 ` jskumari at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).