Code bloat due to silly IRA cost model?

public inbox for gcc@gcc.gnu.org

* Code bloat due to silly IRA cost model?
@ 2019-10-25 11:07 Georg-Johann Lay
  2019-12-10 20:16 ` Georg-Johann Lay
  0 siblings, 1 reply; 14+ messages in thread
From: Georg-Johann Lay @ 2019-10-25 11:07 UTC
  To: gcc

Hi,

I am trying to track down a code bloat issue and am stuck because I do
not understand IRA's cost model.

The test case is as simple as it gets:

float func (float);

float call (float f)
{
     return func (f);
}

IRA dump shows the following insns:


(insn 14 4 2 2 (set (reg:SF 44)
         (reg:SF 22 r22 [ f ])) "bloat.c":4:1 85 {*movsf}
      (expr_list:REG_DEAD (reg:SF 22 r22 [ f ])
         (nil)))
(insn 2 14 3 2 (set (reg/v:SF 43 [ f ])
         (reg:SF 44)) "bloat.c":4:1 85 {*movsf}
      (expr_list:REG_DEAD (reg:SF 44)
         (nil)))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:SF 22 r22)
         (reg/v:SF 43 [ f ])) "bloat.c":5:12 85 {*movsf}
      (expr_list:REG_DEAD (reg/v:SF 43 [ f ])
         (nil)))
(call_insn/j 7 6 8 2 (parallel [

#14 sets pseudo 44 from argument register R22.
#2 moves it to pseudo 43.
#6 moves it back to R22 to prepare for call_insn #7.

There are 2 allocnos, with these costs:

Pass 0 for finding pseudo/allocno costs

     a1 (r44,l0) best NO_REGS, allocno NO_REGS
     a0 (r43,l0) best NO_REGS, allocno NO_REGS

  a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000 NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
  a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000 NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000

which is quite odd because MEM is way more expensive here than any REG.

Okay, so let's boost the MEM cost (TARGET_MEMORY_MOVE_COST) by a factor 
of 100:

     a1 (r44,l0) best NO_REGS, allocno NO_REGS
     a0 (r43,l0) best NO_REGS, allocno NO_REGS

  a0(r43,l0) costs: ADDW_REGS:3200000 SIMPLE_LD_REGS:3200000 LD_REGS:3200000 NO_LD_REGS:3200000 GENERAL_REGS:3200000 MEM:801000
  a1(r44,l0) costs: ADDW_REGS:3200000 SIMPLE_LD_REGS:3200000 LD_REGS:3200000 NO_LD_REGS:3200000 GENERAL_REGS:3200000 MEM:801000

What??? The REG costs are 100 times higher, and still higher than the
MEM costs.  What the heck is going on?

Setting both TARGET_REGISTER_MOVE_COST and TARGET_MEMORY_MOVE_COST to 0
yields:

  a0(r43,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0 GENERAL_REGS:0 MEM:0
  a1(r44,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0 GENERAL_REGS:0 MEM:0

as expected, i.e. IRA considers no other hidden source of costs.  And
even TARGET_REGISTER_MOVE_COST = 0 and TARGET_MEMORY_MOVE_COST = original
gives:

  a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000 NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
  a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000 NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000

How the heck do I tell ira-costs that registers are way cheaper than MEM?

Johann


p.s.

test case compiled with

$ avr-gcc bloat.c -S -Os -dp -da -fsplit-wide-types-early -v

Target: avr
Configured with: ../../gcc.gnu.org/trunk/configure --target=avr 
--prefix=/local/gnu/install/gcc-10 --disable-shared --disable-nls 
--with-dwarf2 --enable-target-optspace=yes --with-gnu-as --with-gnu-ld 
--enable-checking=release --enable-languages=c,c++ --disable-gcov
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 10.0.0 20191021 (experimental) (GCC)







* Re: Code bloat due to silly IRA cost model?
  2019-10-25 11:07 Code bloat due to silly IRA cost model? Georg-Johann Lay
@ 2019-12-10 20:16 ` Georg-Johann Lay
  2019-12-11 17:55   ` Richard Sandiford
  0 siblings, 1 reply; 14+ messages in thread
From: Georg-Johann Lay @ 2019-12-10 20:16 UTC
  To: gcc

Hi, does anybody actually know how to make memory more expensive
than registers when it comes to allocating registers?

Whatever I am trying for TARGET_MEMORY_MOVE_COST and 
TARGET_REGISTER_MOVE_COST, ira-costs.c always makes registers more 
expensive than mem and therefore allocates values to stack slots instead 
of keeping them in registers.

Test case (for avr) is as simple as it gets:

float func (float);

float call (float f)
{
     return func (f);
}

What am I missing?

Johann





* Re: Code bloat due to silly IRA cost model?
  2019-12-10 20:16 ` Georg-Johann Lay
@ 2019-12-11 17:55   ` Richard Sandiford
  2019-12-13 11:58     ` Georg-Johann Lay
  2019-12-16 13:52     ` Georg-Johann Lay
  0 siblings, 2 replies; 14+ messages in thread
From: Richard Sandiford @ 2019-12-11 17:55 UTC
  To: Georg-Johann Lay; +Cc: gcc

Georg-Johann Lay <gjl@gcc.gnu.org> writes:
> Hi, does anybody actually know how to make memory more expensive
> than registers when it comes to allocating registers?
>
> Whatever I am trying for TARGET_MEMORY_MOVE_COST and 
> TARGET_REGISTER_MOVE_COST, ira-costs.c always makes registers more 
> expensive than mem and therefore allocates values to stack slots instead 
> of keeping them in registers.
>
> Test case (for avr) is as simple as it gets:
>
> float func (float);
>
> float call (float f)
> {
>      return func (f);
> }
>
> What am I missing?
>
> Johann
>
>
> Georg-Johann Lay wrote:
>> [...]
>> How the heck do I tell ira-costs that registers are way cheaper than MEM?

I think this is coming from:

  /* FIXME: Ideally, the following test is not needed.
        However, it turned out that it can reduce the number
        of spill fails.  AVR and it's poor endowment with
        address registers is extreme stress test for reload.  */

  if (GET_MODE_SIZE (mode) >= 4
      && regno >= REG_X)
    return false;

in avr_hard_regno_mode_ok.  This forbids SFmode in r26+ and means that
moves between pointer registers and general registers have the highest
possible cost (65535) to prevent them from being used for SFmode.  So:

   ira_register_move_cost[SFmode][POINTER_REGS][GENERAL_REGS] = 65535;

The cost for a union class is the maximum (worst-case) cost over
all of its subclasses, so this means that:

   ira_register_move_cost[SFmode][GENERAL_REGS][GENERAL_REGS] = 65535;

as well.
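As a rough illustration of the max-over-subclasses rule, here is a toy C
model.  It is a sketch under stated assumptions, not IRA's code: the two
"leaf" groups LEAF_LOW (r0-r25) and LEAF_PTR (r26-r31), the ordinary move
cost CHEAP_COST of 2, the TOY_* class names and the helper functions are
all hypothetical; only the 65535 value comes from the explanation above.

```c
#include <assert.h>

/* Hypothetical leaves: LEAF_LOW is r0-r25 (not rejected by the FIXME
   check), LEAF_PTR is r26-r31 (POINTER_REGS, rejected for SFmode).  */
#define LEAF_LOW 1u
#define LEAF_PTR 2u

/* Toy classes expressed as sets of leaves.  */
#define TOY_POINTER_REGS LEAF_PTR
#define TOY_GENERAL_REGS (LEAF_LOW | LEAF_PTR)
#define TOY_R0_R25 LEAF_LOW /* the suggested extra class, untested */

#define IMPOSSIBLE_COST 65535 /* highest possible cost */
#define CHEAP_COST 2          /* hypothetical cost of an ordinary move */

/* SFmode move cost between two single leaves: any move involving a
   register that cannot hold SFmode gets the "impossible" cost.  */
static int leaf_move_cost (unsigned from, unsigned to)
{
  return (from == LEAF_PTR || to == LEAF_PTR) ? IMPOSSIBLE_COST : CHEAP_COST;
}

/* Move cost between two classes (bitmasks of leaves): the worst case
   over every (from, to) leaf pair that the classes contain.  */
static int class_move_cost (unsigned from_class, unsigned to_class)
{
  assert (from_class != 0 && to_class != 0);
  int worst = 0;
  for (unsigned f = LEAF_LOW; f <= LEAF_PTR; f <<= 1)
    for (unsigned t = LEAF_LOW; t <= LEAF_PTR; t <<= 1)
      if ((from_class & f) && (to_class & t))
        {
          int c = leaf_move_cost (f, t);
          if (c > worst)
            worst = c;
        }
  return worst;
}
```

With these definitions, class_move_cost (TOY_POINTER_REGS,
TOY_GENERAL_REGS) and class_move_cost (TOY_GENERAL_REGS,
TOY_GENERAL_REGS) both come out as 65535, while class_move_cost
(TOY_R0_R25, TOY_R0_R25) stays at 2, because that class never contains
the pointer leaf.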

Removing the code above fixes it.  If you don't want to do that, an
alternative might be to add a class for r0-r25 (but I've not tested that).

Thanks,
Richard



* Re: Code bloat due to silly IRA cost model?
  2019-12-11 17:55   ` Richard Sandiford
@ 2019-12-13 11:58     ` Georg-Johann Lay
  2019-12-13 12:46       ` Richard Sandiford
  2019-12-16 13:52     ` Georg-Johann Lay
  1 sibling, 1 reply; 14+ messages in thread
From: Georg-Johann Lay @ 2019-12-13 11:58 UTC
  To: gcc; +Cc: richard.sandiford

On 11.12.19 at 18:55, Richard Sandiford wrote:
> Georg-Johann Lay <gjl@gcc.gnu.org> writes:
>> Hi, does anybody actually know how to make memory more expensive
>> than registers when it comes to allocating registers?
>>
>> Whatever I am trying for TARGET_MEMORY_MOVE_COST and
>> TARGET_REGISTER_MOVE_COST, ira-costs.c always makes registers more
>> expensive than mem and therefore allocates values to stack slots instead
>> of keeping them in registers.
>>
>> Test case (for avr) is as simple as it gets:
>>
>> float func (float);
>>
>> float call (float f)
>> {
>>       return func (f);
>> }
>>
>> What am I missing?
>>
>> Johann
>>
>>
>> Georg-Johann Lay wrote:
>>> [...]
>>> How the heck do I tell ira-costs that registers are way cheaper than MEM?
> 
> I think this is coming from:
> 
>    /* FIXME: Ideally, the following test is not needed.
>          However, it turned out that it can reduce the number
>          of spill fails.  AVR and it's poor endowment with
>          address registers is extreme stress test for reload.  */
> 
>    if (GET_MODE_SIZE (mode) >= 4
>        && regno >= REG_X)
>      return false;

This was introduced to "fix" an "unable to find a register to spill" ICE.

What I do not understand is why the code with long (which is SImode on
avr) is fine:

long lunc (long);

long callL (long f)
{
     return lunc (f);
}

callL:
	rjmp lunc	 ;  7	[c=24 l=1]  call_value_insn/3


> in avr_hard_regno_mode_ok.  This forbids SFmode in r26+ and means that
> moves between pointer registers and general registers have the highest
> possible cost (65535) to prevent them from being used for SFmode.  So:
> 
>     ira_register_move_cost[SFmode][POINTER_REGS][GENERAL_REGS] = 65535;
> 
> The cost for a union class is the maximum (worst-case) cost over
> all of its subclasses, so this means that:
> 
>     ira_register_move_cost[SFmode][GENERAL_REGS][GENERAL_REGS] = 65535;
> 
> as well.

This means that, when there is an expensive class (because it only 
contains one register for example), then it will blow the cost of 
GENERAL_REGS to crazy values no matter what?

What's also strange is that the register allocator would not need to 
allocate a register at all:  The incoming parameter comes in SI:22 and 
is just be passed through to the callee, which also receives the value 
in SI:22.  Why would one move that value to memory?  Even if memory was 
cheaper, moving the value to mem just to load it again to the same 
register is not very sensible...  because in almost any case, /no/ 
instruction is cheaper than /some/ instructions?

> Removing the code above fixes it.  If you don't want to do that, an
> alternative might be to add a class for r0-r25 (but I've not tested that).

Is there a way to make it take the same path as SImode?
> 
> Thanks,
> Richard
> 



* Re: Code bloat due to silly IRA cost model?
  2019-12-13 11:58     ` Georg-Johann Lay
@ 2019-12-13 12:46       ` Richard Sandiford
  2019-12-13 16:04         ` Segher Boessenkool
  2020-01-09  9:52         ` Georg-Johann Lay
  0 siblings, 2 replies; 14+ messages in thread
From: Richard Sandiford @ 2019-12-13 12:46 UTC
  To: Georg-Johann Lay; +Cc: gcc

Georg-Johann Lay <gjl@gcc.gnu.org> writes:
> On 11.12.19 at 18:55, Richard Sandiford wrote:
>> Georg-Johann Lay <gjl@gcc.gnu.org> writes:
>>> Hi, does anybody actually know how to make memory more expensive
>>> than registers when it comes to allocating registers?
>>>
>>> Whatever I am trying for TARGET_MEMORY_MOVE_COST and
>>> TARGET_REGISTER_MOVE_COST, ira-costs.c always makes registers more
>>> expensive than mem and therefore allocates values to stack slots instead
>>> of keeping them in registers.
>>>
>>> Test case (for avr) is as simple as it gets:
>>>
>>> float func (float);
>>>
>>> float call (float f)
>>> {
>>>       return func (f);
>>> }
>>>
>>> What am I missing?
>>>
>>> Johann
>>>
>>>
>>> Georg-Johann Lay wrote:
>>>> [...]
>>>> How the heck do I tell ira-costs that registers are way cheaper than MEM?
>> 
>> I think this is coming from:
>> 
>>    /* FIXME: Ideally, the following test is not needed.
>>          However, it turned out that it can reduce the number
>>          of spill fails.  AVR and it's poor endowment with
>>          address registers is extreme stress test for reload.  */
>> 
>>    if (GET_MODE_SIZE (mode) >= 4
>>        && regno >= REG_X)
>>      return false;
>
> This was introduced to "fix" an "unable to find a register to spill" ICE.
>
> What I do not understand is why the code with long (which is SImode on
> avr) is fine:
>
> long lunc (long);
>
> long callL (long f)
> {
>      return lunc (f);
> }
>
> callL:
> 	rjmp lunc	 ;  7	[c=24 l=1]  call_value_insn/3

This is due to differences in the way that lower-subreg.c lowers
SF moves vs. SI moves.  For SI it generates pure QI moves and so
gets rid of the SI entirely.  For SF it still builds the QI values
back into an SF:

      || (!SCALAR_INT_MODE_P (dest_mode)
	  && !resolve_reg_p (dest)
	  && !resolve_subreg_p (dest)))

I imagine this is because non-int modes are held in FPRs rather than
GPRs on most targets, but TBH I'm not sure.  I couldn't see a comment
that explains the above decision.

With -fno-split-wide-types I see the same RA behaviour for both SI and SF
(i.e. both spill to memory).

>> in avr_hard_regno_mode_ok.  This forbids SFmode in r26+ and means that
>> moves between pointer registers and general registers have the highest
>> possible cost (65535) to prevent them from being used for SFmode.  So:
>> 
>>     ira_register_move_cost[SFmode][POINTER_REGS][GENERAL_REGS] = 65535;
>> 
>> The cost for a union class is the maximum (worst-case) cost over
>> all of its subclasses, so this means that:
>> 
>>     ira_register_move_cost[SFmode][GENERAL_REGS][GENERAL_REGS] = 65535;
>> 
>> as well.
>
> This means that, when there is an expensive class (because it only 
> contains one register for example),

Having one register doesn't automatically make it expensive.
E.g. there's only one "c" register on x86, but it's not more expensive
than other registers because of that.

Move costs aren't a good way of deterring unnecessary uses of small
classes.  The costs should just describe the actual size or speed
overhead of moving the register.

> then it will blow the cost of GENERAL_REGS to crazy values no matter
> what?

Yeah.  This is because (with the above intended use of costs) the
worst-case cost of a superclass X can't be less than the worst-case cost
of one of its subclasses Y.  If the RA decides to allocate an X, it might
get unlucky and be forced to use a register in Y.

If a class X - Y exists then it won't be affected by the Y costs.
So taking Y's cost into account when calculating X's cost means that
the RA will prefer X - Y over X, which is exactly what making Y
expensive should achieve.

FWIW, that's why I suggested seeing what would happen if you added a new
class for GENERAL_REGS - POINTER_REGS.
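A second toy sketch, again hypothetical and much simpler than
ira-costs.c, shows why such a class would help.  Each register's cost
for an SFmode value is derived from toy_mode_ok, which mirrors only the
quoted FIXME check in avr_hard_regno_mode_ok; a class's worst-case cost
is the maximum over its registers, and the cheapest class wins.  The
bitmask encoding of classes over r0-r31, CHEAP_COST, the TOY_* names and
all function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define REG_X 26              /* first pointer register (r26) on AVR */
#define SF_SIZE 4             /* size of SFmode in bytes on AVR */
#define IMPOSSIBLE_COST 65535 /* highest possible cost */
#define CHEAP_COST 2          /* hypothetical ordinary cost */

/* Toy classes as bitmasks over r0-r31.  */
#define TOY_GENERAL_REGS UINT32_C (0xFFFFFFFF) /* r0-r31 */
#define TOY_POINTER_REGS UINT32_C (0xFC000000) /* r26-r31 */

/* Mirrors only the quoted FIXME test; everything else is "ok".  */
static int toy_mode_ok (int regno, int size)
{
  return !(size >= 4 && regno >= REG_X);
}

/* Worst-case cost of holding an SFmode value in some register of CLS.  */
static int class_worst_cost (uint32_t cls)
{
  assert (cls != 0);
  int worst = 0;
  for (int r = 0; r < 32; r++)
    if (cls & (UINT32_C (1) << r))
      {
        int c = toy_mode_ok (r, SF_SIZE) ? CHEAP_COST : IMPOSSIBLE_COST;
        if (c > worst)
          worst = c;
      }
  return worst;
}

/* Index of the class in CLASSES[0..N-1] with the lowest worst-case cost.  */
static int cheapest_class (const uint32_t *classes, int n)
{
  assert (n > 0);
  int best = 0;
  for (int i = 1; i < n; i++)
    if (class_worst_cost (classes[i]) < class_worst_cost (classes[best]))
      best = i;
  return best;
}
```

For example, class_worst_cost (TOY_GENERAL_REGS) is 65535, while
class_worst_cost (TOY_GENERAL_REGS & ~TOY_POINTER_REGS) is 2, so
cheapest_class prefers the X - Y class over X, as described above.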

> What's also strange is that the register allocator would not need to 
> allocate a register at all:  The incoming parameter comes in SI:22 and 
> is just passed through to the callee, which also receives the value
> in SI:22.  Why would one move that value to memory?  Even if memory was 
> cheaper, moving the value to mem just to load it again to the same 
> register is not very sensible...  because in almost any case, /no/ 
> instruction is cheaper than /some/ instructions?

Earlier passes could perhaps propagate the pseudo registers away in very
simple cases like this.  It would be a very special-case optimisation
though.  If there was anything other than "move register X to register X"
between the calls, getting rid of the pseudo registers before RA could
introduce spill failures.
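To make the special case concrete, here is a toy sketch (hypothetical
names, not a GCC pass) that propagates register copies through the chain
r22 -> pseudo 44 -> pseudo 43 -> r22 from the dump.  It assumes
straight-line code in which no hard register is written before the final
copy, which is exactly the restriction mentioned above; FIRST_PSEUDO and
MAX_REGS are illustrative values, not AVR's real numbers.

```c
#include <assert.h>
#include <stddef.h>

#define FIRST_PSEUDO 32 /* hypothetical: numbers below this are hard regs */
#define MAX_REGS 64     /* toy upper bound on register numbers */

/* One register copy "dest = src" in straight-line code.  */
struct copy
{
  int dest;
  int src;
};

/* Rewrite each copy's source through earlier copies into pseudos, so
   every source names the register the value originally came from.
   Returns how many copies became "hard reg X = hard reg X".  Assumes no
   hard register is written before the final copy; otherwise the
   recorded origins would be stale.  */
static size_t count_noop_copies (struct copy *insns, size_t n)
{
  int origin[MAX_REGS];
  for (int r = 0; r < MAX_REGS; r++)
    origin[r] = r;

  size_t noops = 0;
  for (size_t i = 0; i < n; i++)
    {
      assert (insns[i].dest >= 0 && insns[i].dest < MAX_REGS);
      assert (insns[i].src >= 0 && insns[i].src < MAX_REGS);
      insns[i].src = origin[insns[i].src];
      if (insns[i].dest >= FIRST_PSEUDO)
        origin[insns[i].dest] = insns[i].src;
      else if (insns[i].dest == insns[i].src)
        noops++;
    }
  return noops;
}
```

Running count_noop_copies on { {44, 22}, {43, 44}, {22, 43} } rewrites
the last copy to 22 = 22 and returns 1, i.e. the copy back into r22
becomes a no-op and the two pseudo copies would then be dead.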

combine's to blame for the fact that we have two pseudo registers rather
than one.  See the comments about the avr-elf results in:

   https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02150.html

for more details.

>> Removing the code above fixes it.  If you don't want to do that, an
>> alternative might be to add a class for r0-r25 (but I've not tested that).
>
> Is there a way to make it take the same path as SImode?

AFAICT the SI and SF costs are the same.  The difference is coming
from -fsplit-wide-types rather than RA.

Thanks,
Richard


* Re: Code bloat due to silly IRA cost model?
  2019-12-13 12:46       ` Richard Sandiford
@ 2019-12-13 16:04         ` Segher Boessenkool
  2019-12-13 16:22           ` Richard Sandiford
  2020-01-09  9:52         ` Georg-Johann Lay
  1 sibling, 1 reply; 14+ messages in thread
From: Segher Boessenkool @ 2019-12-13 16:04 UTC
  To: Georg-Johann Lay, gcc, richard.sandiford

On Fri, Dec 13, 2019 at 12:45:47PM +0000, Richard Sandiford wrote:
> combine's to blame for the fact that we have two pseudo registers rather
> than one.  See the comments about the avr-elf results in:
> 
>    https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02150.html
> 
> for more details.

It's not combine's fault if register allocation does a bad job.  And we
should *not* generate worse code in combine just because it exposes a
problem in RA (with 2-2 and make_more_copies we generate better code on
average, on all targets I tested, 50 or so).

If having two pseudos here is not an advantage, then RA should optimise
one away.  It does usually, why not here?

Segher


* Re: Code bloat due to silly IRA cost model?
  2019-12-13 16:04         ` Segher Boessenkool
@ 2019-12-13 16:22           ` Richard Sandiford
  2019-12-13 18:59             ` Segher Boessenkool
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Sandiford @ 2019-12-13 16:22 UTC
  To: Segher Boessenkool; +Cc: Georg-Johann Lay, gcc

Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Fri, Dec 13, 2019 at 12:45:47PM +0000, Richard Sandiford wrote:
>> combine's to blame for the fact that we have two pseudo registers rather
>> than one.  See the comments about the avr-elf results in:
>> 
>>    https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02150.html
>> 
>> for more details.
>
> It's not combine's fault if register allocation does a bad job.  And we
> should *not* generate worse code in combine just because it exposes a
> problem in RA (with 2-2 and make_more_copies we generate better code on
> average, on all targets I tested, 50 or so).
>
> If having two pseudos here is not an advantage, then RA should optimise
> one away.  It does usually, why not here?

I didn't say it was combine's fault that RA was bad.  I said it was
combine's fault that we have two pseudos rather than one.

Richard


* Re: Code bloat due to silly IRA cost model?
  2019-12-13 16:22           ` Richard Sandiford
@ 2019-12-13 18:59             ` Segher Boessenkool
  2019-12-13 22:31               ` Richard Sandiford
  0 siblings, 1 reply; 14+ messages in thread
From: Segher Boessenkool @ 2019-12-13 18:59 UTC
  To: Georg-Johann Lay, gcc, richard.sandiford

On Fri, Dec 13, 2019 at 04:22:11PM +0000, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > On Fri, Dec 13, 2019 at 12:45:47PM +0000, Richard Sandiford wrote:
> >> combine's to blame for the fact that we have two pseudo registers rather
> >> than one.  See the comments about the avr-elf results in:
> >> 
> >>    https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02150.html
> >> 
> >> for more details.
> >
> > It's not combine's fault if register allocation does a bad job.  And we
> > should *not* generate worse code in combine just because it exposes a
> > problem in RA (with 2-2 and make_more_copies we generate better code on
> > average, on all targets I tested, 50 or so).
> >
> > If having two pseudos here is not an advantage, then RA should optimise
> > one away.  It does usually, why not here?
> 
> I didn't say it was combine's fault that RA was bad.  I said it was
> combine's fault that we have two pseudos rather than one.

But that is not a fault, that is on purpose.

Before this change, combine would forward hard registers into pseudos
greedily.  RA can do a better job than that.  If you found a case where
RA does not do a good job, let's fix that?

(And combine does get rid of two pseudos, if that is a good idea to do.
If instructions do not properly combine, it can not, of course).


Segher


* Re: Code bloat due to silly IRA cost model?
  2019-12-13 18:59             ` Segher Boessenkool
@ 2019-12-13 22:31               ` Richard Sandiford
  2019-12-18 15:29                 ` Segher Boessenkool
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Sandiford @ 2019-12-13 22:31 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Georg-Johann Lay, gcc

Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Fri, Dec 13, 2019 at 04:22:11PM +0000, Richard Sandiford wrote:
>> Segher Boessenkool <segher@kernel.crashing.org> writes:
>> > On Fri, Dec 13, 2019 at 12:45:47PM +0000, Richard Sandiford wrote:
>> >> combine's to blame for the fact that we have two pseudo registers rather
>> >> than one.  See the comments about the avr-elf results in:
>> >> 
>> >>    https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02150.html
>> >> 
>> >> for more details.
>> >
>> > It's not combine's fault if register allocation does a bad job.  And we
>> > should *not* generate worse code in combine just because it exposes a
>> > problem in RA (with 2-2 and make_more_copies we generate better code on
>> > average, on all targets I tested, 50 or so).
>> >
>> > If having two pseudos here is not an advantage, then RA should optimise
>> > one away.  It does usually, why not here?
>> 
>> I didn't say it was combine's fault that RA was bad.  I said it was
>> combine's fault that we have two pseudos rather than one.
>
> But that is not a fault, that is on purpose.
>
> Before this change, combine would forward hard registers into pseudos
> greedily.  RA can do a better job than that.

I don't think anyone's disputing that.  You quoted the initial text
above out of context.  Johann had asked why the RA even needed to do
anything for the posted testcase, where we have the equivalent of
"foo (bar ())", bar returns a value in register X and foo takes an
argument in register X.  I was trying to explain that we still need:

  (set pseudo X)
  ...
  (set X pseudo)

in order to avoid spill failures in all but trivial cases, and that we
rely on the RA to make a good allocation choice for the pseudo.  So I
think what you said above is basically explaining back to me what I'd
said in the context that was snipped.

But we only need one temporary pseudo register to avoid the spill
failures, whereas in Johann's case the RA sees two:

  (set pseudo2 X)
  (set pseudo pseudo2)
  ...
  (set X pseudo)

My point was the extra pseudo<-pseudo2 move is created by combine for
its own internal purposes and pseudo2 isn't something *the RA* needs to
avoid spill failures.  But in this case combine fails to fold the extra
move with anything and so the move "leaks out" to later passes, including
RA.  The snipped context linked to the message where we'd discussed this,
including why combine fails to restore the original X<-pseudo<-X chain
for avr-elf.  It also shows that avr-elf code improved significantly
if we *did* restore that original chain (which the new combine pass
happened to do, although that just fell out in the wash rather than
being a specific aim).

In a perfect world, we could keep adding more and more pseudos to a
move chain without affecting the output of the RA.  But it's not too
surprising if that isn't always true in practice.  After all, the
point of adding pseudo2 in the first place is that *combine* handles
pseudo<-pseudo2<-X differently from just pseudo<-X. ;-)

Thanks,
Richard

> If you found a case where
> RA does not do a good job, let's fix that?
>
> (And combine does get rid of two pseudos, if that is a good idea to do.
> If instructions do not properly combine, it can not, of course).
>
>
> Segher


* Re: Code bloat due to silly IRA cost model?
  2019-12-13 22:31               ` Richard Sandiford
@ 2019-12-18 15:29                 ` Segher Boessenkool
  2019-12-18 15:43                   ` Richard Sandiford
  0 siblings, 1 reply; 14+ messages in thread
From: Segher Boessenkool @ 2019-12-18 15:29 UTC (permalink / raw)
  To: Georg-Johann Lay, gcc, richard.sandiford

Hi Richard,

On Fri, Dec 13, 2019 at 10:31:54PM +0000, Richard Sandiford wrote:
> >> I didn't say it was combine's fault that RA was bad.  I said it was
> >> combine's fault that we have two pseudos rather than one.

See below.

> My point was the extra pseudo<-pseudo2 move is created by combine for
> its own internal purposes

And my point is that it is *not* internal purposes :-)  This is done
because we no longer combine with the hard register, but combining with
just a register move is quite beneficial for many targets.

We could (and probably should) do a 1->1 combine first, i.e. just
simplification for every single insn, but that causes other problems
right now.  GCC 11, I hope.

What happens is we have this:

insn_cost 4 for    14: r44:SF=r22:SF
      REG_DEAD r22:SF
insn_cost 4 for     2: r43:SF=r44:SF
      REG_DEAD r44:SF
insn_cost 4 for     6: r22:SF=r43:SF
      REG_DEAD r43:SF
insn_cost 0 for     7: r22:SF=call [`g'] argc:0
      REG_CALL_DECL `g'

(where insn 14 and r44 are created by make_more_copies).

Now, insn 14 would normally be combined into insn 2.  But this doesn't
happen because the target prohibits it, via the
targetm.class_likely_spilled_p check in cant_combine_insn_p.

I wonder if we still need that at all?

Segher


* Re: Code bloat due to silly IRA cost model?
  2019-12-18 15:29                 ` Segher Boessenkool
@ 2019-12-18 15:43                   ` Richard Sandiford
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Sandiford @ 2019-12-18 15:43 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Georg-Johann Lay, gcc

Segher Boessenkool <segher@kernel.crashing.org> writes:
>> My point was the extra pseudo<-pseudo2 move is created by combine for
>> its own internal purposes
>
> And my point is that it is *not* internal purposes :-)  This is done
> because we no longer combine with the hard register, but combining with
> just a register move is quite beneficial for many targets.

But that's what I meant by "its own internal purposes".  It's something
one part of combine does to make other parts of combine work better.

Richard


* Re: Code bloat due to silly IRA cost model?
  2019-12-13 12:46       ` Richard Sandiford
  2019-12-13 16:04         ` Segher Boessenkool
@ 2020-01-09  9:52         ` Georg-Johann Lay
  1 sibling, 0 replies; 14+ messages in thread
From: Georg-Johann Lay @ 2020-01-09  9:52 UTC (permalink / raw)
  To: gcc; +Cc: richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 10425 bytes --]

On 13.12.19 at 13:45, Richard Sandiford wrote:
> Georg-Johann Lay <gjl@gcc.gnu.org> writes:
>> On 11.12.19 at 18:55, Richard Sandiford wrote:
>>> Georg-Johann Lay <gjl@gcc.gnu.org> writes:
>>>> Hi, doesn't anybody actually know how to make memory more expensive
>>>> than registers when it comes to allocating registers?
>>>>
>>>> Whatever I am trying for TARGET_MEMORY_MOVE_COST and
>>>> TARGET_REGISTER_MOVE_COST, ira-costs.c always makes registers more
>>>> expensive than mem and therefore allocates values to stack slots instead
>>>> of keeping them in registers.
>>>>
>>>> Test case (for avr) is as simple as it gets:
>>>>
>>>> float func (float);
>>>>
>>>> float call (float f)
>>>> {
>>>>        return func (f);
>>>> }
>>>>
>>>> What am I missing?
>>>>
>>>> Johann
>>>>
>>>>
>>>> Georg-Johann Lay wrote:
>>>>> Hi,
>>>>>
>>>>> I am trying to track down a code bloat issue and am stuck because I do
>>>>> not understand IRA's cost model.
>>>>>
>>>>> The test case is as simple as it gets:
>>>>>
>>>>> float func (float);
>>>>>
>>>>> float call (float f)
>>>>> {
>>>>>       return func (f);
>>>>> }
>>>>>
>>>>> IRA dump shows the following insns:
>>>>>
>>>>>
>>>>> (insn 14 4 2 2 (set (reg:SF 44)
>>>>>           (reg:SF 22 r22 [ f ])) "bloat.c":4:1 85 {*movsf}
>>>>>        (expr_list:REG_DEAD (reg:SF 22 r22 [ f ])
>>>>>           (nil)))
>>>>> (insn 2 14 3 2 (set (reg/v:SF 43 [ f ])
>>>>>           (reg:SF 44)) "bloat.c":4:1 85 {*movsf}
>>>>>        (expr_list:REG_DEAD (reg:SF 44)
>>>>>           (nil)))
>>>>> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
>>>>> (insn 6 3 7 2 (set (reg:SF 22 r22)
>>>>>           (reg/v:SF 43 [ f ])) "bloat.c":5:12 85 {*movsf}
>>>>>        (expr_list:REG_DEAD (reg/v:SF 43 [ f ])
>>>>>           (nil)))
>>>>> (call_insn/j 7 6 8 2 (parallel [
>>>>>
>>>>> #14 sets pseudo 44 from arg register R22.
>>>>> #2 moves it to pseudo 43.
>>>>> #6 moves it to R22 as it prepares for call_insn #7.
>>>>>
>>>>> There are 2 allocnos with these costs:
>>>>>
>>>>> Pass 0 for finding pseudo/allocno costs
>>>>>
>>>>>       a1 (r44,l0) best NO_REGS, allocno NO_REGS
>>>>>       a0 (r43,l0) best NO_REGS, allocno NO_REGS
>>>>>
>>>>>     a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
>>>>> NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
>>>>>     a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
>>>>> NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
>>>>>
>>>>> which is quite odd because MEM is way more expensive here than any REG.
>>>>>
>>>>> Okay, so let's boost the MEM cost (TARGET_MEMORY_MOVE_COST) by a factor
>>>>> of 100:
>>>>>
>>>>>       a1 (r44,l0) best NO_REGS, allocno NO_REGS
>>>>>       a0 (r43,l0) best NO_REGS, allocno NO_REGS
>>>>>
>>>>>     a0(r43,l0) costs: ADDW_REGS:3200000 SIMPLE_LD_REGS:3200000
>>>>> LD_REGS:3200000 NO_LD_REGS:3200000 GENERAL_REGS:3200000 MEM:801000
>>>>>     a1(r44,l0) costs: ADDW_REGS:3200000 SIMPLE_LD_REGS:3200000
>>>>> LD_REGS:3200000 NO_LD_REGS:3200000 GENERAL_REGS:3200000 MEM:801000
>>>>>
>>>>> What??? The REG costs are 100 times higher, and still higher than the
>>>>> MEM costs.  What the heck is going on?
>>>>>
>>>>> Setting TARGET_REGISTER_MOVE_COST and also TARGET_MEMORY_MOVE_COST to 0
>>>>> yields:
>>>>>
>>>>>     a0(r43,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0
>>>>> GENERAL_REGS:0 MEM:0
>>>>>     a1(r44,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0
>>>>> GENERAL_REGS:0 MEM:0
>>>>>
>>>>> as expected, i.e. there is no other hidden source of costs considered by
>>>>> IRA.  And even TARGET_REGISTER_MOVE_COST = 0  and
>>>>> TARGET_MEMORY_MOVE_COST = original gives:
>>>>>
>>>>>     a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
>>>>> NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
>>>>>     a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
>>>>> NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
>>>>>
>>>>> How the heck do I tell ira-costs that registers are way cheaper than MEM?
>>>
>>> I think this is coming from:
>>>
>>>     /* FIXME: Ideally, the following test is not needed.
>>>           However, it turned out that it can reduce the number
>>>           of spill fails.  AVR and its poor endowment with
>>>           address registers is an extreme stress test for reload.  */
>>>
>>>     if (GET_MODE_SIZE (mode) >= 4
>>>         && regno >= REG_X)
>>>       return false;
>>
>> This was introduced to "fix" an "unable to find a register to spill" ICE.
>>
>> What I do not understand is that the code with long (which is SImode on
>> avr) is fine:
>>
>> long lunc (long);
>>
>> long callL (long f)
>> {
>>       return lunc (f);
>> }
>>
>> callL:
>> 	rjmp lunc	 ;  7	[c=24 l=1]  call_value_insn/3
> 
> This is due to differences in the way that lower-subreg.c lowers
> SF moves vs. SI moves.  For SI it generates pure QI moves and so
> gets rid of the SI entirely.  For SF it still builds the QI values
> back into an SF:
> 
>        || (!SCALAR_INT_MODE_P (dest_mode)
> 	  && !resolve_reg_p (dest)
> 	  && !resolve_subreg_p (dest)))
> 
> I imagine this is because non-int modes are held in FPRs rather than
> GPRs on most targets, but TBH I'm not sure.  I couldn't see a comment
> that explains the above decision.
> 
> With -fno-split-wide-types I see the same RA behaviour for both SI and SF
> (i.e. both spill to memory).
> 
>>> in avr_hard_regno_mode_ok.  This forbids SFmode in r26+ and means that
>>> moves between pointer registers and general registers have the highest
>>> possible cost (65535) to prevent them from being used for SFmode.  So:
>>>
>>>      ira_register_move_cost[SFmode][POINTER_REGS][GENERAL_REGS] = 65535;
>>>
>>> The costs for union classes are the maximum (worst-case) cost of
>>> any subclass, so this means that:
>>>
>>>      ira_register_move_cost[SFmode][GENERAL_REGS][GENERAL_REGS] = 65535;
>>>
>>> as well.
>>
>> This means that, when there is an expensive class (because it only
>> contains one register for example),
> 
> Having one register doesn't automatically make it expensive.
> E.g. there's only one "c" register on x86, but it's not more expensive
> than other registers because of that.
> 
> Move costs aren't a good way of deterring unnecessary uses of small
> classes.  The costs should just describe the actual size or speed
> overhead of moving the register.
> 
>> then it will blow the cost of GENERAL_REGS to crazy values no matter
>> what?
> 
> Yeah.  This is because (with the above intended use of costs) the
> worst-case cost of a superclass X can't be less than the worst-case cost
> of one of its subclasses Y.  If the RA decides to allocate an X, it might
> get unlucky and be forced to use a register in Y.
> 
> If a class X - Y exists then it won't be affected by the Y costs.
> So taking Y's cost into account when calculating X's cost means that
> the RA will prefer X - Y over X, which is exactly what making Y
> expensive should achieve.
> 
> FWIW, that's why I suggested seeing what would happen if you added a new
> class for GENERAL_REGS - POINTER_REGS.

Hi, I have tried that.  It didn't fix the issue.

For reference, the test case compiled with -Os -fsplit-wide-types-early:

float func (float);

float call (float f)
{
     return func (f);
}

with the attached delta.

There are still 16 superfluous instructions: a dead store, and a frame
setup even though no frame is needed.


call:
	push r28		 ;  17	[c=4 l=1]  pushqi1/0
	push r29		 ;  18	[c=4 l=1]  pushqi1/0
	 ; SP -= 4	 ;  22	[c=4 l=2]  *addhi3_sp
	rcall .	
	rcall .	
	in r28,__SP_L__	 ;  23	[c=4 l=2]  *movhi/7
	in r29,__SP_H__
/* prologue: function */
/* frame size = 4 */
/* stack size = 6 */
.L__stack_usage = 6
	std Y+1,r22	 ;  14	[c=4 l=4]  *movsf/3
	std Y+2,r23
	std Y+3,r24
	std Y+4,r25
/* epilogue start */
	 ; SP += 4	 ;  34	[c=4 l=4]  *addhi3_sp
	pop __tmp_reg__
	pop __tmp_reg__
	pop __tmp_reg__
	pop __tmp_reg__
	pop r29		 ;  35	[c=4 l=1]  popqi
	pop r28		 ;  36	[c=4 l=1]  popqi
	rjmp func	 ;  7	[c=24 l=1]  call_value_insn/3

In the IRA dump, it's still the case that REGs are consistently more 
expensive than MEM:

Pass 0 for finding pseudo/allocno costs

     a1 (r44,l0) best NO_REGS, allocno NO_REGS
     a0 (r43,l0) best NO_REGS, allocno NO_REGS

   a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000 
NO_LD_REGS:32000 NO_POINTER_REGS:32000 MEM:9000
   a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000 
NO_LD_REGS:32000 NO_POINTER_REGS:32000 MEM:9000


Pass 1 for finding pseudo/allocno costs

     r44: preferred NO_REGS, alternative NO_REGS, allocno NO_REGS
     r43: preferred NO_REGS, alternative NO_REGS, allocno NO_REGS

   a0(r43,l0) costs: ADDW_REGS:48000 SIMPLE_LD_REGS:48000 LD_REGS:48000 
NO_LD_REGS:48000 NO_POINTER_REGS:48000 MEM:17000
   a1(r44,l0) costs: ADDW_REGS:40008 SIMPLE_LD_REGS:40008 LD_REGS:40008 
NO_LD_REGS:40008 NO_POINTER_REGS:40008 MEM:17000

Johann

P.S.: I also added a register class for the intersection of r0..r25 and
r24..r31 (i.e. r24..r25).  I don't know whether the register class layout
requires that, though.


>> What's also strange is that the register allocator would not need to
>> allocate a register at all:  The incoming parameter comes in SI:22 and
>> is just passed through to the callee, which also receives the value
>> in SI:22.  Why would one move that value to memory?  Even if memory was
>> cheaper, moving the value to mem just to load it again to the same
>> register is not very sensible...  because in almost any case, /no/
>> instruction is cheaper than /some/ instructions?
> 
> Earlier passes could perhaps propagate the pseudo registers away in very
> simple cases like this.  It would be a very special-case optimisation
> though.  If there was anything other than "move register X to register X"
> between the calls, getting rid of the pseudo registers before RA could
> introduce spill failures.
> 
> combine's to blame for the fact that we have two pseudo registers rather
> than one.  See the comments about the avr-elf results in:
> 
>     https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02150.html
> 
> for more details.
> 
>>> Removing the code above fixes it.  If you don't want to do that, an
>>> alternative might be to add a class for r0-r25 (but I've not tested that).
>>
>> Is there a way to make it take a path similar to SImode?
> 
> AFAICT the SI and SF costs are the same.  The difference is coming
> from -fsplit-wide-types rather than RA.
> 
> Thanks,
> Richard
> 


[-- Attachment #2: x.diff --]
[-- Type: text/x-patch, Size: 4255 bytes --]

Index: config/avr/avr.c
===================================================================
--- config/avr/avr.c	(revision 279994)
+++ config/avr/avr.c	(working copy)
@@ -850,7 +850,7 @@ avr_regno_reg_class (int r)
       SIMPLE_LD_REGS, SIMPLE_LD_REGS, SIMPLE_LD_REGS, SIMPLE_LD_REGS,
       SIMPLE_LD_REGS, SIMPLE_LD_REGS, SIMPLE_LD_REGS, SIMPLE_LD_REGS,
       /* r24, r25 */
-      ADDW_REGS, ADDW_REGS,
+      R24_R25_REGS, R24_R25_REGS,
       /* X: r26, 27 */
       POINTER_X_REGS, POINTER_X_REGS,
       /* Y: r28, r29 */
@@ -12704,6 +12704,7 @@ avr_conditional_register_usage (void)
 
       CLEAR_HARD_REG_SET (reg_class_contents[(int) ADDW_REGS]);
       CLEAR_HARD_REG_SET (reg_class_contents[(int) NO_LD_REGS]);
+	  reg_class_contents[NO_POINTER_REGS] &= reg_class_contents[LD_REGS];
     }
 }
 
Index: config/avr/avr.h
===================================================================
--- config/avr/avr.h	(revision 279994)
+++ config/avr/avr.h	(working copy)
@@ -219,6 +219,7 @@ These two properties are reflected by bu
 enum reg_class {
   NO_REGS,
   R0_REG,			/* r0 */
+  R24_R25_REGS,			/* r24 - r25 */
   POINTER_X_REGS,		/* r26 - r27 */
   POINTER_Y_REGS,		/* r28 - r29 */
   POINTER_Z_REGS,		/* r30 - r31 */
@@ -229,6 +230,7 @@ enum reg_class {
   SIMPLE_LD_REGS,		/* r16 - r23 */
   LD_REGS,			/* r16 - r31 */
   NO_LD_REGS,			/* r0 - r15 */
+  NO_POINTER_REGS,		/* r0 - r25 */
   GENERAL_REGS,			/* r0 - r31 */
   ALL_REGS, LIM_REG_CLASSES
 };
@@ -236,25 +238,28 @@ enum reg_class {
 
 #define N_REG_CLASSES (int)LIM_REG_CLASSES
 
-#define REG_CLASS_NAMES {					\
-		 "NO_REGS",					\
-		   "R0_REG",	/* r0 */                        \
-		   "POINTER_X_REGS", /* r26 - r27 */		\
-		   "POINTER_Y_REGS", /* r28 - r29 */		\
-		   "POINTER_Z_REGS", /* r30 - r31 */		\
-		   "STACK_REG",	/* STACK */			\
-		   "BASE_POINTER_REGS",	/* r28 - r31 */		\
-		   "POINTER_REGS", /* r26 - r31 */		\
-		   "ADDW_REGS",	/* r24 - r31 */			\
-                   "SIMPLE_LD_REGS", /* r16 - r23 */            \
-		   "LD_REGS",	/* r16 - r31 */			\
-                   "NO_LD_REGS", /* r0 - r15 */                 \
-		   "GENERAL_REGS", /* r0 - r31 */		\
-		   "ALL_REGS" }
+#define REG_CLASS_NAMES {			\
+	"NO_REGS",				\
+	"R0_REG",	     /* r0 */		\
+	"R24_R25_REGS",	     /* r24 - r25 */	\
+	"POINTER_X_REGS",    /* r26 - r27 */	\
+	"POINTER_Y_REGS",    /* r28 - r29 */	\
+	"POINTER_Z_REGS",    /* r30 - r31 */	\
+	"STACK_REG",	     /* STACK */	\
+	"BASE_POINTER_REGS", /* r28 - r31 */	\
+	"POINTER_REGS",	     /* r26 - r31 */	\
+	"ADDW_REGS",	     /* r24 - r31 */	\
+	"SIMPLE_LD_REGS",    /* r16 - r23 */	\
+	"LD_REGS",	     /* r16 - r31 */	\
+	"NO_LD_REGS",	     /* r0 - r15 */	\
+	"NO_POINTER_REGS",   /* r0 - r25 */	\
+	"GENERAL_REGS",	     /* r0 - r31 */	\
+	"ALL_REGS" }
 
 #define REG_CLASS_CONTENTS {						\
   {0x00000000,0x00000000},	/* NO_REGS */				\
   {0x00000001,0x00000000},	/* R0_REG */                            \
+  {3u << 24,0x00000000},        /* R24_R25_REGS,   r24 - r25 */		\
   {3u << REG_X,0x00000000},     /* POINTER_X_REGS, r26 - r27 */		\
   {3u << REG_Y,0x00000000},     /* POINTER_Y_REGS, r28 - r29 */		\
   {3u << REG_Z,0x00000000},     /* POINTER_Z_REGS, r30 - r31 */		\
@@ -269,6 +274,7 @@ enum reg_class {
   {(3u << REG_X)|(3u << REG_Y)|(3u << REG_Z)|(3u << REG_W)|(0xffu << 16),\
      0x00000000},	/* LD_REGS, r16 - r31 */			\
   {0x0000ffff,0x00000000},	/* NO_LD_REGS  r0 - r15 */              \
+  {0x03ffffff,0x00000000},	/* NO_POINTER_REGS  r0 - r25 */         \
   {0xffffffff,0x00000000},	/* GENERAL_REGS, r0 - r31 */		\
   {0xffffffff,0x00000003}	/* ALL_REGS */				\
 }
Index: config/avr/constraints.md
===================================================================
--- config/avr/constraints.md	(revision 279992)
+++ config/avr/constraints.md	(working copy)
@@ -53,6 +53,12 @@ (define_register_constraint "z" "POINTER
 (define_register_constraint "q" "STACK_REG"
   "Stack pointer register (SPH:SPL).")
 
+(define_register_constraint "R24R25" "R24_R25_REGS"
+  "Register pair r25:r24.")
+
+(define_register_constraint "R00R25" "NO_POINTER_REGS"
+  "Non-pointer registers r0 to r25.")
+
 (define_constraint "I"
   "Integer constant in the range 0 @dots{} 63."
   (and (match_code "const_int")


* Re: Code bloat due to silly IRA cost model?
  2019-12-11 17:55   ` Richard Sandiford
  2019-12-13 11:58     ` Georg-Johann Lay
@ 2019-12-16 13:52     ` Georg-Johann Lay
  2019-12-16 14:12       ` Richard Sandiford
  1 sibling, 1 reply; 14+ messages in thread
From: Georg-Johann Lay @ 2019-12-16 13:52 UTC (permalink / raw)
  To: richard.sandiford; +Cc: gcc

On 11.12.19 at 18:55, Richard Sandiford wrote:
> Georg-Johann Lay <gjl@gcc.gnu.org> writes:
>> Hi, doesn't anybody actually know how to make memory more expensive
>> than registers when it comes to allocating registers?
>>
>> Whatever I am trying for TARGET_MEMORY_MOVE_COST and
>> TARGET_REGISTER_MOVE_COST, ira-costs.c always makes registers more
>> expensive than mem and therefore allocates values to stack slots instead
>> of keeping them in registers.
>>
>> Test case (for avr) is as simple as it gets:
>>
>> float func (float);
>>
>> float call (float f)
>> {
>>       return func (f);
>> }
>>
>> What am I missing?
>>
>> Johann
>>
>>
>> Georg-Johann Lay wrote:
>>> Hi,
>>>
>>> I am trying to track down a code bloat issue and am stuck because I do
>>> not understand IRA's cost model.
>>>
>>> The test case is as simple as it gets:
>>>
>>> float func (float);
>>>
>>> float call (float f)
>>> {
>>>      return func (f);
>>> }
>>>
>>> IRA dump shows the following insns:
>>>
>>>
>>> (insn 14 4 2 2 (set (reg:SF 44)
>>>          (reg:SF 22 r22 [ f ])) "bloat.c":4:1 85 {*movsf}
>>>       (expr_list:REG_DEAD (reg:SF 22 r22 [ f ])
>>>          (nil)))
>>> (insn 2 14 3 2 (set (reg/v:SF 43 [ f ])
>>>          (reg:SF 44)) "bloat.c":4:1 85 {*movsf}
>>>       (expr_list:REG_DEAD (reg:SF 44)
>>>          (nil)))
>>> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
>>> (insn 6 3 7 2 (set (reg:SF 22 r22)
>>>          (reg/v:SF 43 [ f ])) "bloat.c":5:12 85 {*movsf}
>>>       (expr_list:REG_DEAD (reg/v:SF 43 [ f ])
>>>          (nil)))
>>> (call_insn/j 7 6 8 2 (parallel [
>>>
>>> #14 sets pseudo 44 from arg register R22.
>>> #2 moves it to pseudo 43.
>>> #6 moves it to R22 as it prepares for call_insn #7.
>>>
>>> There are 2 allocnos with these costs:
>>>
>>> Pass 0 for finding pseudo/allocno costs
>>>
>>>      a1 (r44,l0) best NO_REGS, allocno NO_REGS
>>>      a0 (r43,l0) best NO_REGS, allocno NO_REGS
>>>
>>>    a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
>>> NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
>>>    a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
>>> NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
>>>
>>> which is quite odd because MEM is way more expensive here than any REG.
>>>
>>> Okay, so let's boost the MEM cost (TARGET_MEMORY_MOVE_COST) by a factor
>>> of 100:
>>>
>>>      a1 (r44,l0) best NO_REGS, allocno NO_REGS
>>>      a0 (r43,l0) best NO_REGS, allocno NO_REGS
>>>
>>>    a0(r43,l0) costs: ADDW_REGS:3200000 SIMPLE_LD_REGS:3200000
>>> LD_REGS:3200000 NO_LD_REGS:3200000 GENERAL_REGS:3200000 MEM:801000
>>>    a1(r44,l0) costs: ADDW_REGS:3200000 SIMPLE_LD_REGS:3200000
>>> LD_REGS:3200000 NO_LD_REGS:3200000 GENERAL_REGS:3200000 MEM:801000
>>>
>>> What??? The REG costs are 100 times higher, and still higher than the
>>> MEM costs.  What the heck is going on?
>>>
>>> Setting TARGET_REGISTER_MOVE_COST and also TARGET_MEMORY_MOVE_COST to 0
>>> yields:
>>>
>>>    a0(r43,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0
>>> GENERAL_REGS:0 MEM:0
>>>    a1(r44,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0
>>> GENERAL_REGS:0 MEM:0
>>>
>>> as expected, i.e. there is no other hidden source of costs considered by
>>> IRA.  And even TARGET_REGISTER_MOVE_COST = 0  and
>>> TARGET_MEMORY_MOVE_COST = original gives:
>>>
>>>    a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
>>> NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
>>>    a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
>>> NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
>>>
>>> How the heck do I tell ira-costs that registers are way cheaper than MEM?
> 
> I think this is coming from:
> 
>    /* FIXME: Ideally, the following test is not needed.
>          However, it turned out that it can reduce the number
>          of spill fails.  AVR and its poor endowment with
>          address registers is an extreme stress test for reload.  */
> 
>    if (GET_MODE_SIZE (mode) >= 4
>        && regno >= REG_X)
>      return false;
> 
> in avr_hard_regno_mode_ok.  This forbids SFmode in r26+ and means that
> moves between pointer registers and general registers have the highest
> possible cost (65535) to prevent them from being used for SFmode.  So:
> 
>     ira_register_move_cost[SFmode][POINTER_REGS][GENERAL_REGS] = 65535;
> 
> The costs for union classes are the maximum (worst-case) cost of
> any subclass, so this means that:
> 
>     ira_register_move_cost[SFmode][GENERAL_REGS][GENERAL_REGS] = 65535;
> 
> as well.
> 
> Removing the code above fixes it.  If you don't want to do that, an
> alternative might be to add a class for r0-r25 (but I've not tested that).

I am still having some trouble understanding that...

For example, currently R26 is forbidden for SFmode, but the same applies 
to R25 or any odd registers (modes >= 2 regs have to start in even 
registers).

Then this would imply, even after the condition regno >= 26 was removed, 
the costs would still be astronomically high because HI:21 is refused 
and SI:23 is refused etc, and due to that the cost of that class will be 
0x10000 for modes >= 2 regs?

How can the register allocator tell apart whether a register is rejected 
due to its mode or due to the register number?  AFAIK there is no other 
way than rejecting odd registers in that hook, because register classes
must not have holes.  Or did that change meanwhile?

Johann


> Thanks,
> Richard


* Re: Code bloat due to silly IRA cost model?
  2019-12-16 13:52     ` Georg-Johann Lay
@ 2019-12-16 14:12       ` Richard Sandiford
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Sandiford @ 2019-12-16 14:12 UTC (permalink / raw)
  To: Georg-Johann Lay; +Cc: gcc

Georg-Johann Lay <gjl@gcc.gnu.org> writes:
> On 11.12.19 at 18:55, Richard Sandiford wrote:
>> Georg-Johann Lay <gjl@gcc.gnu.org> writes:
>>> Hi, doesn't anybody actually know how to make memory more expensive
>>> than registers when it comes to allocating registers?
>>>
>>> Whatever I am trying for TARGET_MEMORY_MOVE_COST and
>>> TARGET_REGISTER_MOVE_COST, ira-costs.c always makes registers more
>>> expensive than mem and therefore allocates values to stack slots instead
>>> of keeping them in registers.
>>>
>>> Test case (for avr) is as simple as it gets:
>>>
>>> float func (float);
>>>
>>> float call (float f)
>>> {
>>>       return func (f);
>>> }
>>>
>>> What am I missing?
>>>
>>> Johann
>>>
>>>
>>> Georg-Johann Lay wrote:
>>>> Hi,
>>>>
>>>> I am trying to track down a code bloat issue and am stuck because I do
>>>> not understand IRA's cost model.
>>>>
>>>> The test case is as simple as it gets:
>>>>
>>>> float func (float);
>>>>
>>>> float call (float f)
>>>> {
>>>>      return func (f);
>>>> }
>>>>
>>>> IRA dump shows the following insns:
>>>>
>>>>
>>>> (insn 14 4 2 2 (set (reg:SF 44)
>>>>          (reg:SF 22 r22 [ f ])) "bloat.c":4:1 85 {*movsf}
>>>>       (expr_list:REG_DEAD (reg:SF 22 r22 [ f ])
>>>>          (nil)))
>>>> (insn 2 14 3 2 (set (reg/v:SF 43 [ f ])
>>>>          (reg:SF 44)) "bloat.c":4:1 85 {*movsf}
>>>>       (expr_list:REG_DEAD (reg:SF 44)
>>>>          (nil)))
>>>> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
>>>> (insn 6 3 7 2 (set (reg:SF 22 r22)
>>>>          (reg/v:SF 43 [ f ])) "bloat.c":5:12 85 {*movsf}
>>>>       (expr_list:REG_DEAD (reg/v:SF 43 [ f ])
>>>>          (nil)))
>>>> (call_insn/j 7 6 8 2 (parallel [
>>>>
>>>> #14 sets pseudo 44 from arg register R22.
>>>> #2 moves it to pseudo 43.
>>>> #6 moves it to R22 as it prepares for call_insn #7.
>>>>
>>>> There are 2 allocnos with these costs:
>>>>
>>>> Pass 0 for finding pseudo/allocno costs
>>>>
>>>>      a1 (r44,l0) best NO_REGS, allocno NO_REGS
>>>>      a0 (r43,l0) best NO_REGS, allocno NO_REGS
>>>>
>>>>    a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
>>>> NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
>>>>    a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
>>>> NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
>>>>
>>>> which is quite odd because MEM is way more expensive here than any REG.
>>>>
>>>> Okay, so let's boost the MEM cost (TARGET_MEMORY_MOVE_COST) by a factor
>>>> of 100:
>>>>
>>>>      a1 (r44,l0) best NO_REGS, allocno NO_REGS
>>>>      a0 (r43,l0) best NO_REGS, allocno NO_REGS
>>>>
>>>>    a0(r43,l0) costs: ADDW_REGS:3200000 SIMPLE_LD_REGS:3200000
>>>> LD_REGS:3200000 NO_LD_REGS:3200000 GENERAL_REGS:3200000 MEM:801000
>>>>    a1(r44,l0) costs: ADDW_REGS:3200000 SIMPLE_LD_REGS:3200000
>>>> LD_REGS:3200000 NO_LD_REGS:3200000 GENERAL_REGS:3200000 MEM:801000
>>>>
>>>> What??? The REG costs are 100 times higher, and still higher than the
>>>> MEM costs.  What the heck is going on?
>>>>
>>>> Setting TARGET_REGISTER_MOVE_COST and also TARGET_MEMORY_MOVE_COST to 0
>>>> yields:
>>>>
>>>>    a0(r43,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0
>>>> GENERAL_REGS:0 MEM:0
>>>>    a1(r44,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0
>>>> GENERAL_REGS:0 MEM:0
>>>>
>>>> as expected, i.e. there is no other hidden source of costs considered by
>>>> IRA.  And even TARGET_REGISTER_MOVE_COST = 0  and
>>>> TARGET_MEMORY_MOVE_COST = original gives:
>>>>
>>>>    a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
>>>> NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
>>>>    a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
>>>> NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
>>>>
>>>> How the heck do I tell ira-costs that registers are way cheaper than MEM?
>> 
>> I think this is coming from:
>> 
>>    /* FIXME: Ideally, the following test is not needed.
>>          However, it turned out that it can reduce the number
>>          of spill fails.  AVR and its poor endowment with
>>          address registers is an extreme stress test for reload.  */
>> 
>>    if (GET_MODE_SIZE (mode) >= 4
>>        && regno >= REG_X)
>>      return false;
>> 
>> in avr_hard_regno_mode_ok.  This forbids SFmode in r26+ and means that
>> moves between pointer registers and general registers have the highest
>> possible cost (65535) to prevent them from being used for SFmode.  So:
>> 
>>     ira_register_move_cost[SFmode][POINTER_REGS][GENERAL_REGS] = 65535;
>> 
>> The cost for a union class is the maximum (worst-case) cost
>> over its subclasses, so this means that:
>> 
>>     ira_register_move_cost[SFmode][GENERAL_REGS][GENERAL_REGS] = 65535;
>> 
>> as well.
>> 
>> Removing the code above fixes it.  If you don't want to do that, an
>> alternative might be to add a class for r0-r25 (but I've not tested that).
>
> I still have some trouble understanding that...
>
> For example, currently R26 is forbidden for SFmode, but the same applies 
> to R25 or any odd registers (modes >= 2 regs have to start in even 
> registers).
>
> Would this then imply that, even after the condition regno >= 26 was
> removed, the costs would still be astronomically high because HI:21 is
> refused, SI:23 is refused, etc., so that the cost of that class will be
> 0x10000 for modes >= 2 regs?

No, that will be OK.  The above happens at the class level, not at the
level of individual registers.  All classes that contain HI:21 (21-22)
or SI:23 (23-26) also contain valid HI and SI registers, so the costs
will be based on the valid registers.

The problem in the case above is that POINTER_REGS is big enough to hold
SFmode but isn't allowed to do so.
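As a rough illustration of the class-level reasoning above, here is a toy
model in C.  It is a sketch under stated assumptions, not GCC's actual
implementation: the classes are approximated as contiguous ranges of
r0..r31, `mode_ok` encodes only the even-start rule and the optional
`regno >= REG_X` test discussed in this thread, a class counts as able to
hold a value if it is big enough and contains at least one valid start
register, and `NOMINAL_COST` is a made-up cost for a legal move.  All names
(`mode_ok`, `class_can_hold`, `move_cost`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only, not GCC source.  All constants are hypothetical,
   except that r26 (X) is the first AVR pointer register.  */
enum { N_HARD_REGS = 32, REG_X = 26, BAD_COST = 65535, NOMINAL_COST = 4 };

/* Rough approximation of some AVR register classes as ranges.  */
enum { NO_LD_REGS, LD_REGS, ADDW_REGS, POINTER_REGS, GENERAL_REGS, N_CLASSES };

struct reg_range { int first, last; };

static const struct reg_range class_regs[N_CLASSES] = {
  [NO_LD_REGS]   = { 0, 15 },
  [LD_REGS]      = { 16, 31 },
  [ADDW_REGS]    = { 24, 31 },
  [POINTER_REGS] = { 26, 31 },
  [GENERAL_REGS] = { 0, 31 },
};

/* Mirrors only the two rules discussed in the thread: a value of SIZE
   bytes occupies SIZE consecutive 8-bit registers and, if SIZE >= 2,
   must start at an even register; optionally, values of 4 or more bytes
   may not start at r26 or above.  */
static bool
mode_ok (int regno, int size, bool reject_ptr)
{
  if (regno + size > N_HARD_REGS)
    return false;
  if (size >= 2 && (regno & 1) != 0)
    return false;
  if (reject_ptr && size >= 4 && regno >= REG_X)
    return false;
  return true;
}

static int
class_size (int cl)
{
  return class_regs[cl].last - class_regs[cl].first + 1;
}

/* Class-level check: the class must be big enough for the value and
   contain at least one register where the value may start.  */
static bool
class_can_hold (int cl, int size, bool reject_ptr)
{
  if (class_size (cl) < size)
    return false;
  for (int r = class_regs[cl].first; r <= class_regs[cl].last; r++)
    if (mode_ok (r, size, reject_ptr))
      return true;
  return false;
}

static bool
is_subclass (int sub, int super)
{
  return class_regs[sub].first >= class_regs[super].first
         && class_regs[sub].last <= class_regs[super].last;
}

/* Cost of a move between two classes, ignoring their subclasses.  */
static int
base_cost (int cl1, int cl2, int size, bool reject_ptr)
{
  if (!class_can_hold (cl1, size, reject_ptr)
      || !class_can_hold (cl2, size, reject_ptr))
    return BAD_COST;
  return NOMINAL_COST;
}

/* Worst case over every pair of subclasses (including CL1 and CL2
   themselves) that is big enough for the value: one subclass that is
   big enough but holds no valid start register makes every class
   containing it cost BAD_COST.  */
static int
move_cost (int cl1, int cl2, int size, bool reject_ptr)
{
  assert (cl1 >= 0 && cl1 < N_CLASSES && cl2 >= 0 && cl2 < N_CLASSES);
  int cost = base_cost (cl1, cl2, size, reject_ptr);
  for (int s1 = 0; s1 < N_CLASSES; s1++)
    for (int s2 = 0; s2 < N_CLASSES; s2++)
      if (is_subclass (s1, cl1) && is_subclass (s2, cl2)
          && class_size (s1) >= size && class_size (s2) >= size)
        {
          int c = base_cost (s1, s2, size, reject_ptr);
          if (c > cost)
            cost = c;
        }
  return cost;
}
```

With the test enabled, `move_cost (GENERAL_REGS, GENERAL_REGS, 4, true)`
returns `BAD_COST`, because POINTER_REGS (r26..r31) is big enough for a
4-byte value but has no valid start register.  Disabling the test, or
asking about 2-byte values, gives `NOMINAL_COST`, even though odd start
registers such as r21 are still rejected, because every class still
contains a valid even register.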

> How can the register allocator tell whether a register is rejected
> because of its mode or because of its register number?  AFAIK there is no
> other way than rejecting odd registers in that hook, because register
> classes must not have holes.  Or has that changed in the meantime?

This hook is still the right way of rejecting odd registers.  The RA
doesn't really need to know whether something was rejected because of
its register number, mode, or both.

Thanks,
Richard


Thread overview: 14+ messages
2019-10-25 11:07 Code bloat due to silly IRA cost model? Georg-Johann Lay
2019-12-10 20:16 ` Georg-Johann Lay
2019-12-11 17:55   ` Richard Sandiford
2019-12-13 11:58     ` Georg-Johann Lay
2019-12-13 12:46       ` Richard Sandiford
2019-12-13 16:04         ` Segher Boessenkool
2019-12-13 16:22           ` Richard Sandiford
2019-12-13 18:59             ` Segher Boessenkool
2019-12-13 22:31               ` Richard Sandiford
2019-12-18 15:29                 ` Segher Boessenkool
2019-12-18 15:43                   ` Richard Sandiford
2020-01-09  9:52         ` Georg-Johann Lay
2019-12-16 13:52     ` Georg-Johann Lay
2019-12-16 14:12       ` Richard Sandiford
