* Code bloat due to silly IRA cost model?
From: Georg-Johann Lay @ 2019-10-25 11:07 UTC
To: gcc

Hi,

I am trying to track down a code bloat issue and am stuck because I do
not understand IRA's cost model.

The test case is as simple as it gets:

    float func (float);

    float call (float f)
    {
        return func (f);
    }

IRA dump shows the following insns:

    (insn 14 4 2 2 (set (reg:SF 44)
            (reg:SF 22 r22 [ f ])) "bloat.c":4:1 85 {*movsf}
         (expr_list:REG_DEAD (reg:SF 22 r22 [ f ])
            (nil)))
    (insn 2 14 3 2 (set (reg/v:SF 43 [ f ])
            (reg:SF 44)) "bloat.c":4:1 85 {*movsf}
         (expr_list:REG_DEAD (reg:SF 44)
            (nil)))
    (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
    (insn 6 3 7 2 (set (reg:SF 22 r22)
            (reg/v:SF 43 [ f ])) "bloat.c":5:12 85 {*movsf}
         (expr_list:REG_DEAD (reg/v:SF 43 [ f ])
            (nil)))
    (call_insn/j 7 6 8 2 (parallel [

#14 sets pseudo 44 from arg register R22.
#2 moves it to pseudo 43.
#6 moves it to R22 as it prepares for call_insn #7.

There are 2 allocnos, with these costs:

    Pass 0 for finding pseudo/allocno costs

        a1 (r44,l0) best NO_REGS, allocno NO_REGS
        a0 (r43,l0) best NO_REGS, allocno NO_REGS

      a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
        NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
      a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
        NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000

which is quite odd, because in reality MEM is way more expensive here
than any REG.

Okay, so let's boost the MEM cost (TARGET_MEMORY_MOVE_COST) by a factor
of 100:

        a1 (r44,l0) best NO_REGS, allocno NO_REGS
        a0 (r43,l0) best NO_REGS, allocno NO_REGS

      a0(r43,l0) costs: ADDW_REGS:3200000 SIMPLE_LD_REGS:3200000
        LD_REGS:3200000 NO_LD_REGS:3200000 GENERAL_REGS:3200000 MEM:801000
      a1(r44,l0) costs: ADDW_REGS:3200000 SIMPLE_LD_REGS:3200000
        LD_REGS:3200000 NO_LD_REGS:3200000 GENERAL_REGS:3200000 MEM:801000

What??? The REG costs are now 100 times higher as well, and still higher
than the MEM costs. What the heck is going on?

Setting TARGET_REGISTER_MOVE_COST and also TARGET_MEMORY_MOVE_COST to 0
yields:

      a0(r43,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0
        GENERAL_REGS:0 MEM:0
      a1(r44,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0
        GENERAL_REGS:0 MEM:0

as expected, i.e. there is no other hidden source of costs considered by
IRA. And even TARGET_REGISTER_MOVE_COST = 0 together with
TARGET_MEMORY_MOVE_COST = original gives:

      a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
        NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
      a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
        NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000

How the heck do I tell ira-costs that registers are way cheaper than MEM?

Johann

p.s.

The test case was compiled with

    $ avr-gcc bloat.c -S -Os -dp -da -fsplit-wide-types-early -v

    Target: avr
    Configured with: ../../gcc.gnu.org/trunk/configure --target=avr
      --prefix=/local/gnu/install/gcc-10 --disable-shared --disable-nls
      --with-dwarf2 --enable-target-optspace=yes --with-gnu-as --with-gnu-ld
      --enable-checking=release --enable-languages=c,c++ --disable-gcov
    Thread model: single
    Supported LTO compression algorithms: zlib
    gcc version 10.0.0 20191021 (experimental) (GCC)
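The "best NO_REGS" lines in the dump can be read as a plain comparison: every register class costs 32000 while MEM costs 9000, so memory wins. The C sketch below is only a toy model of that comparison, assuming the allocator takes the cheapest class and falls back to NO_REGS when memory is strictly cheaper. The names toy_best_class and toy_is_spilled are hypothetical, and the real logic in ira-costs.c is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One "CLASS:cost" entry as printed in the IRA dump. */
struct class_cost {
    const char *name;
    int cost;
};

/* Register class costs for a0(r43,l0), copied from the first dump. */
static const struct class_cost r43_costs[] = {
    { "ADDW_REGS", 32000 },
    { "SIMPLE_LD_REGS", 32000 },
    { "LD_REGS", 32000 },
    { "NO_LD_REGS", 32000 },
    { "GENERAL_REGS", 32000 },
};
enum { R43_NUM_COSTS = sizeof r43_costs / sizeof r43_costs[0] };

/* Toy rule (an assumption, not IRA's real code): pick the cheapest
   register class, but return "NO_REGS" when memory is strictly cheaper
   than that class.  */
static const char *toy_best_class(const struct class_cost *costs, size_t n,
                                  int mem_cost)
{
    assert(costs != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (costs[i].cost < costs[best].cost)
            best = i;
    return mem_cost < costs[best].cost ? "NO_REGS" : costs[best].name;
}

/* Nonzero when the toy rule would leave the pseudo in memory. */
static int toy_is_spilled(const struct class_cost *costs, size_t n,
                          int mem_cost)
{
    return strcmp(toy_best_class(costs, n, mem_cost), "NO_REGS") == 0;
}
```

With the numbers from the dump, toy_is_spilled (r43_costs, R43_NUM_COSTS, 9000) is nonzero, which matches "best NO_REGS". Under this toy rule the outcome only changes once the register class costs drop below the MEM cost, which is what the thread goes on to investigate.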
* Re: Code bloat due to silly IRA cost model?
From: Georg-Johann Lay @ 2019-12-10 20:16 UTC
To: gcc

Hi, doesn't anybody actually know how to make memory more expensive
than registers when it comes to allocating registers?

Whatever I try for TARGET_MEMORY_MOVE_COST and
TARGET_REGISTER_MOVE_COST, ira-costs.c always makes registers more
expensive than mem and therefore allocates values to stack slots instead
of keeping them in registers.

Test case (for avr) is as simple as it gets:

    float func (float);

    float call (float f)
    {
        return func (f);
    }

What am I missing?

Johann
* Re: Code bloat due to silly IRA cost model?
From: Richard Sandiford @ 2019-12-11 17:55 UTC
To: Georg-Johann Lay
Cc: gcc

Georg-Johann Lay <gjl@gcc.gnu.org> writes:
> Whatever I am trying for TARGET_MEMORY_MOVE_COST and
> TARGET_REGISTER_MOVE_COST, ira-costs.c always makes registers more
> expensive than mem and therefore allocates values to stack slots instead
> of keeping them in registers.
> [...]
> How the heck do I tell ira-costs that registers are way cheaper than MEM?

I think this is coming from:

    /* FIXME: Ideally, the following test is not needed.
       However, it turned out that it can reduce the number
       of spill fails.  AVR and it's poor endowment with
       address registers is extreme stress test for reload.  */

    if (GET_MODE_SIZE (mode) >= 4
        && regno >= REG_X)
      return false;

in avr_hard_regno_mode_ok. This forbids SFmode in r26+ and means that
moves between pointer registers and general registers have the highest
possible cost (65535) to prevent them from being used for SFmode. So:

    ira_register_move_cost[SFmode][POINTER_REGS][GENERAL_REGS] = 65535;

The cost for a union class is the maximum (worst-case) cost of its
subclasses, so this means that:

    ira_register_move_cost[SFmode][GENERAL_REGS][GENERAL_REGS] = 65535;

as well.

Removing the code above fixes it. If you don't want to do that, an
alternative might be to add a class for r0-r25 (but I've not tested that).

Thanks,
Richard
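The worst-case propagation described above can be sketched with a small toy model in C. This is not GCC code: the toy_* names, the bitmask register sets and the base cost of 2 are assumptions made purely for illustration, and the maximum is taken over register pairs rather than over subclasses as IRA does. It only aims to show why a single forbidden register range can drag the cost of the whole union class up to 65535, and why a hypothetical class covering only r0-r25 would not be affected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { TOY_NUM_REGS = 32, TOY_REG_X = 26, TOY_MAX_COST = 65535 };

/* Bit i set means hard register r<i> belongs to the class. */
typedef uint32_t toy_regset;

#define TOY_GENERAL_REGS ((toy_regset) 0xFFFFFFFFu)  /* r0..r31 */
#define TOY_POINTER_REGS ((toy_regset) 0xFC000000u)  /* r26..r31 */
/* Hypothetical extra class for r0..r25, i.e. GENERAL minus POINTER. */
#define TOY_LOW_REGS (TOY_GENERAL_REGS & ~TOY_POINTER_REGS)

/* Simplified stand-in for the quoted test in avr_hard_regno_mode_ok:
   values of 4 or more bytes may not live in r26 and above.  */
static bool toy_mode_ok(int regno, int mode_size)
{
    assert(regno >= 0 && regno < TOY_NUM_REGS);
    return !(mode_size >= 4 && regno >= TOY_REG_X);
}

/* Cost of one register-to-register move; a forbidden register makes
   the move maximally expensive.  */
static int toy_reg_move_cost(int from, int to, int mode_size, int base_cost)
{
    if (!toy_mode_ok(from, mode_size) || !toy_mode_ok(to, mode_size))
        return TOY_MAX_COST;
    return base_cost;
}

/* Worst-case cost over all register pairs of the two classes.  */
static int toy_class_move_cost(toy_regset from, toy_regset to,
                               int mode_size, int base_cost)
{
    int worst = 0;
    for (int f = 0; f < TOY_NUM_REGS; f++) {
        if (!((from >> f) & 1u))
            continue;
        for (int t = 0; t < TOY_NUM_REGS; t++) {
            if (!((to >> t) & 1u))
                continue;
            int c = toy_reg_move_cost(f, t, mode_size, base_cost);
            if (c > worst)
                worst = c;
        }
    }
    return worst;
}
```

In this model a 4-byte move within TOY_GENERAL_REGS costs TOY_MAX_COST because the class contains r26 and above, while the same move within TOY_LOW_REGS keeps the base cost. A 1-byte move within TOY_GENERAL_REGS is unaffected.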
* Re: Code bloat due to silly IRA cost model?
From: Georg-Johann Lay @ 2019-12-13 11:58 UTC
To: gcc
Cc: richard.sandiford

On 11.12.19 at 18:55, Richard Sandiford wrote:
> I think this is coming from:
>
>     /* FIXME: Ideally, the following test is not needed.
>        However, it turned out that it can reduce the number
>        of spill fails.  AVR and it's poor endowment with
>        address registers is extreme stress test for reload.  */
>
>     if (GET_MODE_SIZE (mode) >= 4
>         && regno >= REG_X)
>       return false;

This was introduced to "fix" an "unable to find a register to spill" ICE.

What I do not understand is that the code with long (which is SImode on
avr) is fine:

    long lunc (long);

    long callL (long f)
    {
        return lunc (f);
    }

    callL:
        rjmp lunc    ;  7    [c=24 l=1]  call_value_insn/3

> in avr_hard_regno_mode_ok. This forbids SFmode in r26+ and means that
> moves between pointer registers and general registers have the highest
> possible cost (65535) to prevent them for being used for SFmode. So:
>
>     ira_register_move_cost[SFmode][POINTER_REGS][GENERAL_REGS] = 65535;
>
> The costs for union classes are the maximum (worst-case) cost of
> for each subclass, so this means that:
>
>     ira_register_move_cost[SFmode][GENERAL_REGS][GENERAL_REGS] = 65535;
>
> as well.

This means that, when there is an expensive class (because it only
contains one register, for example), then it will blow the cost of
GENERAL_REGS up to crazy values no matter what?

What's also strange is that the register allocator would not need to
allocate a register at all: the incoming parameter comes in SI:22 and
is just passed through to the callee, which also receives the value
in SI:22. Why would one move that value to memory? Even if memory was
cheaper, moving the value to mem just to load it again to the same
register is not very sensible... because in almost any case, /no/
instruction is cheaper than /some/ instructions?

> Removing the code above fixes it. If you don't want to do that, an
> alternative might be to add a class for r0-r25 (but I've not tested that).

Is there a way to make it take a path similar to the one for SImode?

Johann
* Re: Code bloat due to silly IRA cost model?
From: Richard Sandiford @ 2019-12-13 12:46 UTC
To: Georg-Johann Lay
Cc: gcc

Georg-Johann Lay <gjl@gcc.gnu.org> writes:
> This was introduced to "fix" unable to find a register to spill ICE.
>
> What I do not understand is that the code with long (which is SImode on
> avr) is fine:
>
>     long lunc (long);
>
>     long callL (long f)
>     {
>         return lunc (f);
>     }
>
>     callL:
>         rjmp lunc    ;  7    [c=24 l=1]  call_value_insn/3

This is due to differences in the way that lower-subreg.c lowers SF moves
vs. SI moves. For SI it generates pure QI moves and so gets rid of the
SI entirely. For SF it still builds the QI values back into an SF:

          || (!SCALAR_INT_MODE_P (dest_mode)
              && !resolve_reg_p (dest)
              && !resolve_subreg_p (dest)))

I imagine this is because non-int modes are held in FPRs rather than GPRs
on most targets, but TBH I'm not sure. I couldn't see a comment that
explains the above decision.

With -fno-split-wide-types I see the same RA behaviour for both SI and SF
(i.e. both spill to memory).

> This means that, when there is an expensive class (because it only
> contains one register for example),

Having one register doesn't automatically make it expensive. E.g. there's
only one "c" register on x86, but it's not more expensive than other
registers because of that. Move costs aren't a good way of deterring
unnecessary uses of small classes. The costs should just describe the
actual size or speed overhead of moving the register.

> then it will blow the cost of GENERAL_REGS to crazy values no matter
> what?

Yeah. This is because (with the above intended use of costs) the
worst-case cost of a superclass X can't be less than the worst-case cost
of one of its subclasses Y. If the RA decides to allocate an X, it might
get unlucky and be forced to use a register in Y.

If a class X - Y exists then it won't be affected by the Y costs.
So taking Y's cost into account when calculating X's cost means that
the RA will prefer X - Y over X, which is exactly what making Y expensive
should achieve.

FWIW, that's why I suggested seeing what would happen if you added a new
class for GENERAL_REGS - POINTER_REGS.

> What's also strange is that the register allocator would not need to
> allocate a register at all: The incoming parameter comes in SI:22 and
> is just be passed through to the callee, which also receives the value
> in SI:22. Why would one move that value to memory? Even if memory was
> cheaper, moving the value to mem just to load it again to the same
> register is not very sensible... because in almost any case, /no/
> instruction is cheaper than /some/ instructions?

Earlier passes could perhaps propagate the pseudo registers away in very
simple cases like this. It would be a very special-case optimisation
though. If there was anything other than "move register X to register X"
between the calls, getting rid of the pseudo registers before RA could
introduce spill failures.

combine's to blame for the fact that we have two pseudo registers rather
than one. See the comments about the avr-elf results in:

    https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02150.html

for more details.

> Is there a way that it would use a similar path like SImode?

AFAICT the SI and SF costs are the same. The difference is coming from
-fsplit-wide-types rather than RA.

Thanks,
Richard
* Re: Code bloat due to silly IRA cost model?
From: Segher Boessenkool @ 2019-12-13 16:04 UTC
To: Georg-Johann Lay, gcc, richard.sandiford

On Fri, Dec 13, 2019 at 12:45:47PM +0000, Richard Sandiford wrote:
> combine's to blame for the fact that we have two pseudo registers rather
> than one. See the comments about the avr-elf results in:
>
>     https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02150.html
>
> for more details.

It's not combine's fault if register allocation does a bad job. And we
should *not* generate worse code in combine just because it exposes a
problem in RA (with 2-2 and make_more_copies we generate better code on
average, on all targets I tested, 50 or so).

If having two pseudos here is not an advantage, then RA should optimise
one away. It does usually, why not here?


Segher
* Re: Code bloat due to silly IRA cost model?
From: Richard Sandiford @ 2019-12-13 16:22 UTC
To: Segher Boessenkool
Cc: Georg-Johann Lay, gcc

Segher Boessenkool <segher@kernel.crashing.org> writes:
> It's not combine's fault if register allocation does a bad job. And we
> should *not* generate worse code in combine just because it exposes a
> problem in RA (with 2-2 and make_more_copies we generate better code on
> average, on all targets I tested, 50 or so).
>
> If having two pseudos here is not an advantage, then RA should optimise
> one away. It does usually, why not here?

I didn't say it was combine's fault that RA was bad. I said it was
combine's fault that we have two pseudos rather than one.

Richard
* Re: Code bloat due to silly IRA cost model?
From: Segher Boessenkool @ 2019-12-13 18:59 UTC
To: Georg-Johann Lay, gcc, richard.sandiford

On Fri, Dec 13, 2019 at 04:22:11PM +0000, Richard Sandiford wrote:
> I didn't say it was combine's fault that RA was bad. I said it was
> combine's fault that we have two pseudos rather than one.

But that is not a fault, that is on purpose.

Before this change, combine would forward hard registers into pseudos
greedily. RA can do a better job than that. If you found a case where
RA does not do a good job, let's fix that?

(And combine does get rid of two pseudos, if that is a good idea to do.
If instructions do not properly combine, it can not, of course).


Segher
* Re: Code bloat due to silly IRA cost model?
  2019-12-13 18:59 ` Segher Boessenkool
@ 2019-12-13 22:31   ` Richard Sandiford
  2019-12-18 15:29     ` Segher Boessenkool
From: Richard Sandiford @ 2019-12-13 22:31 UTC
To: Segher Boessenkool; +Cc: Georg-Johann Lay, gcc

Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Fri, Dec 13, 2019 at 04:22:11PM +0000, Richard Sandiford wrote:
>> Segher Boessenkool <segher@kernel.crashing.org> writes:
>> > On Fri, Dec 13, 2019 at 12:45:47PM +0000, Richard Sandiford wrote:
>> >> combine's to blame for the fact that we have two pseudo registers rather
>> >> than one.  See the comments about the avr-elf results in:
>> >>
>> >>   https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02150.html
>> >>
>> >> for more details.
>> >
>> > It's not combine's fault if register allocation does a bad job.  And we
>> > should *not* generate worse code in combine just because it exposes a
>> > problem in RA (with 2-2 and make_more_copies we generate better code on
>> > average, on all targets I tested, 50 or so).
>> >
>> > If having two pseudos here is not an advantage, then RA should optimise
>> > one away.  It does usually, why not here?
>>
>> I didn't say it was combine's fault that RA was bad.  I said it was
>> combine's fault that we have two pseudos rather than one.
>
> But that is not a fault, that is on purpose.
>
> Before this change, combine would forward hard registers into pseudos
> greedily.  RA can do a better job than that.

I don't think anyone's disputing that.  You quoted the initial text
above out of context.  Johann had asked why the RA even needed to do
anything for the posted testcase, where we have the equivalent of
"foo (bar ())", bar returns a value in register X and foo takes an
argument in register X.  I was trying to explain that we still need:

   (set pseudo X)
   ...
   (set X pseudo)

in order to avoid spill failures in all but trivial cases, and that
we rely on the RA to make a good allocation choice for the pseudo.
So I think what you said above is basically explaining back to me
what I'd said in the context that was snipped.

But we only need one temporary pseudo register to avoid the spill
failures, whereas in Johann's case the RA sees two:

   (set pseudo2 X)
   (set pseudo pseudo2)
   ...
   (set X pseudo)

My point was the extra pseudo<-pseudo2 move is created by combine for
its own internal purposes and pseudo2 isn't something *the RA* needs
to avoid spill failures.  But in this case combine fails to fold the
extra move with anything and so the move "leaks out" to later passes,
including RA.

The snipped context linked to the message where we'd discussed this,
including why combine fails to restore the original X<-pseudo<-X chain
for avr-elf.  It also shows that avr-elf code improved significantly
if we *did* restore that original chain (which the new combine pass
happened to do, although that just fell out in the wash rather than
being a specific aim).

In a perfect world, we could keep adding more and more pseudos to a move
chain without affecting the output of the RA.  But it's not too surprising
if that isn't always true in practice.  After all, the point of adding
pseudo2 in the first place is that *combine* handles pseudo<-pseudo2<-X
differently from just pseudo<-X. ;-)

Thanks,
Richard

> If you found a case where
> RA does not do a good job, let's fix that?
>
> (And combine does get rid of two pseudos, if that is a good idea to do.
> If instructions do not properly combine, it can not, of course).
>
>
> Segher
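The "perfect world" case described in the message above can be sketched as a toy model. This is only an illustration, not how IRA works: the `struct move` type, the function names, and the convention that register numbers below 32 are hard registers are all assumptions made for the sketch, and it ignores whatever happens in the "..." between the copies (which is exactly why a real allocator needs the pseudo at all).

```c
#include <assert.h>

#define FIRST_PSEUDO 32   /* toy convention: numbers below this are hard registers */
#define MAX_REGS 64

/* One register-to-register copy: dest <- src. */
struct move { int dest; int src; };

/* Give every pseudo in a straight-line copy chain the hard register that
   feeds it.  assignment[r] is the hard register chosen for register r.
   Assumes moves[] is in program order and each pseudo is set before use. */
static void assign_chain(const struct move *moves, int n,
                         int assignment[MAX_REGS])
{
    for (int r = 0; r < MAX_REGS; r++)
        assignment[r] = r < FIRST_PSEUDO ? r : -1;
    for (int i = 0; i < n; i++) {
        assert(moves[i].dest >= 0 && moves[i].dest < MAX_REGS);
        assert(moves[i].src >= 0 && moves[i].src < MAX_REGS);
        assert(assignment[moves[i].src] != -1);
        if (moves[i].dest >= FIRST_PSEUDO)
            assignment[moves[i].dest] = assignment[moves[i].src];
    }
}

/* Count the copies that still move data after the assignment. */
static int remaining_moves(const struct move *moves, int n,
                           const int assignment[MAX_REGS])
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (assignment[moves[i].dest] != assignment[moves[i].src])
            count++;
    return count;
}
```

With the chain from the test case (r44 <- r22, r43 <- r44, r22 <- r43) both pseudos end up in r22 and no copy remains, the same result as for the shorter chain r43 <- r22, r22 <- r43.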
* Re: Code bloat due to silly IRA cost model?
  2019-12-13 22:31 ` Richard Sandiford
@ 2019-12-18 15:29   ` Segher Boessenkool
  2019-12-18 15:43     ` Richard Sandiford
From: Segher Boessenkool @ 2019-12-18 15:29 UTC
To: Georg-Johann Lay, gcc, richard.sandiford

Hi Richard,

On Fri, Dec 13, 2019 at 10:31:54PM +0000, Richard Sandiford wrote:
> >> I didn't say it was combine's fault that RA was bad.  I said it was
> >> combine's fault that we have two pseudos rather than one.

See below.

> My point was the extra pseudo<-pseudo2 move is created by combine for
> its own internal purposes

And my point is that it is *not* internal purposes :-)  This is done
because we no longer combine with the hard register, but combining with
just a register move is quite beneficial for many targets.

We could (and probably should) do a 1->1 combine first, i.e. just
simplification for every single insn, but that causes other problems
right now.  GCC 11, I hope.

What happens is we have this:

insn_cost 4 for    14: r44:SF=r22:SF
      REG_DEAD r22:SF
insn_cost 4 for     2: r43:SF=r44:SF
      REG_DEAD r44:SF
insn_cost 4 for     6: r22:SF=r43:SF
      REG_DEAD r43:SF
insn_cost 0 for     7: r22:SF=call [`g'] argc:0
      REG_CALL_DECL `g'

(where insn 14 and r44 are created by make_more_copies).

Now, insn 14 would normally be combined into insn 2.  But this doesn't
happen because the target prohibits it, with the
targetm.class_likely_spilled_p in cant_combine_insn_p.  I wonder if we
still need that at all?


Segher
* Re: Code bloat due to silly IRA cost model?
  2019-12-18 15:29 ` Segher Boessenkool
@ 2019-12-18 15:43   ` Richard Sandiford
From: Richard Sandiford @ 2019-12-18 15:43 UTC
To: Segher Boessenkool; +Cc: Georg-Johann Lay, gcc

Segher Boessenkool <segher@kernel.crashing.org> writes:
>> My point was the extra pseudo<-pseudo2 move is created by combine for
>> its own internal purposes
>
> And my point is that it is *not* internal purposes :-)  This is done
> because we no longer combine with the hard register, but combining with
> just a register move is quite beneficial for many targets.

But that's what I meant by "its own internal purposes".  It's something
one part of combine does to make other parts of combine work better.

Richard
* Re: Code bloat due to silly IRA cost model?
  2019-12-13 12:46 ` Richard Sandiford
  2019-12-13 16:04   ` Segher Boessenkool
@ 2020-01-09  9:52   ` Georg-Johann Lay
From: Georg-Johann Lay @ 2020-01-09 9:52 UTC
To: gcc; +Cc: richard.sandiford

On 13.12.19 at 13:45, Richard Sandiford wrote:
> Georg-Johann Lay <gjl@gcc.gnu.org> writes:
>> On 11.12.19 at 18:55, Richard Sandiford wrote:
>>> Georg-Johann Lay <gjl@gcc.gnu.org> writes:
>>>> Hi, doesn't anybody actually know how to make memory more expensive
>>>> than registers when it comes to allocating registers?
>>>>
>>>> Whatever I am trying for TARGET_MEMORY_MOVE_COST and
>>>> TARGET_REGISTER_MOVE_COST, ira-costs.c always makes registers more
>>>> expensive than mem and therefore allocates values to stack slots instead
>>>> of keeping them in registers.
>>>>
>>>> Test case (for avr) is as simple as it gets:
>>>>
>>>> float func (float);
>>>>
>>>> float call (float f)
>>>> {
>>>>      return func (f);
>>>> }
>>>>
>>>> What am I missing?
>>>>
>>>> Johann
>>>>
>>>>
>>>> Georg-Johann Lay wrote:
>>>>> Hi,
>>>>>
>>>>> I am trying to track down a code bloat issue and am stuck because I do
>>>>> not understand IRA's cost model.
>>>>>
>>>>> The test case is as simple as it gets:
>>>>>
>>>>> float func (float);
>>>>>
>>>>> float call (float f)
>>>>> {
>>>>>      return func (f);
>>>>> }
>>>>>
>>>>> IRA dump shows the following insns:
>>>>>
>>>>>
>>>>> (insn 14 4 2 2 (set (reg:SF 44)
>>>>>          (reg:SF 22 r22 [ f ])) "bloat.c":4:1 85 {*movsf}
>>>>>       (expr_list:REG_DEAD (reg:SF 22 r22 [ f ])
>>>>>          (nil)))
>>>>> (insn 2 14 3 2 (set (reg/v:SF 43 [ f ])
>>>>>          (reg:SF 44)) "bloat.c":4:1 85 {*movsf}
>>>>>       (expr_list:REG_DEAD (reg:SF 44)
>>>>>          (nil)))
>>>>> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
>>>>> (insn 6 3 7 2 (set (reg:SF 22 r22)
>>>>>          (reg/v:SF 43 [ f ])) "bloat.c":5:12 85 {*movsf}
>>>>>       (expr_list:REG_DEAD (reg/v:SF 43 [ f ])
>>>>>          (nil)))
>>>>> (call_insn/j 7 6 8 2 (parallel [
>>>>>
>>>>> #14 sets pseudo 44 from arg register R22.
>>>>> #2 moves it to pseudo 43
>>>>> #6 moves it to R22 as it prepares for call_insn #7.
>>>>>
>>>>> There are 2 allocnos and cost:
>>>>>
>>>>> Pass 0 for finding pseudo/allocno costs
>>>>>
>>>>>      a1 (r44,l0) best NO_REGS, allocno NO_REGS
>>>>>      a0 (r43,l0) best NO_REGS, allocno NO_REGS
>>>>>
>>>>>    a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
>>>>> NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
>>>>>    a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
>>>>> NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
>>>>>
>>>>> which is quite odd because MEM is way more expensive here than any REG.
>>>>>
>>>>> Okay, so let's boost the MEM cost (TARGET_MEMORY_MOVE_COST) by a factor
>>>>> of 100:
>>>>>
>>>>>      a1 (r44,l0) best NO_REGS, allocno NO_REGS
>>>>>      a0 (r43,l0) best NO_REGS, allocno NO_REGS
>>>>>
>>>>>    a0(r43,l0) costs: ADDW_REGS:3200000 SIMPLE_LD_REGS:3200000
>>>>> LD_REGS:3200000 NO_LD_REGS:3200000 GENERAL_REGS:3200000 MEM:801000
>>>>>    a1(r44,l0) costs: ADDW_REGS:3200000 SIMPLE_LD_REGS:3200000
>>>>> LD_REGS:3200000 NO_LD_REGS:3200000 GENERAL_REGS:3200000 MEM:801000
>>>>>
>>>>> What???  The REG costs are 100 times higher, and still higher than the
>>>>> MEM costs.  What the heck is going on?
>>>>>
>>>>> Setting TARGET_REGISTER_MOVE_COST and also TARGET_MEMORY_MOVE_COST to 0
>>>>> yields:
>>>>>
>>>>>    a0(r43,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0
>>>>> GENERAL_REGS:0 MEM:0
>>>>>    a1(r44,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0
>>>>> GENERAL_REGS:0 MEM:0
>>>>>
>>>>> as expected, i.e. there is no other hidden source of costs considered by
>>>>> IRA.  And even TARGET_REGISTER_MOVE_COST = 0 and
>>>>> TARGET_MEMORY_MOVE_COST = original gives:
>>>>>
>>>>>    a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
>>>>> NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
>>>>>    a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
>>>>> NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
>>>>>
>>>>> How the heck do I tell ira-costs that registers are way cheaper than MEM?
>>>
>>> I think this is coming from:
>>>
>>>     /* FIXME: Ideally, the following test is not needed.
>>>           However, it turned out that it can reduce the number
>>>           of spill fails.  AVR and it's poor endowment with
>>>           address registers is extreme stress test for reload.  */
>>>
>>>     if (GET_MODE_SIZE (mode) >= 4
>>>         && regno >= REG_X)
>>>       return false;
>>
>> This was introduced to "fix" an "unable to find a register to spill" ICE.
>>
>> What I do not understand is that the code with long (which is SImode on
>> avr) is fine:
>>
>> long lunc (long);
>>
>> long callL (long f)
>> {
>>      return lunc (f);
>> }
>>
>> callL:
>> 	rjmp lunc	 ;  7	[c=24 l=1]  call_value_insn/3
>
> This is due to differences in the way that lower-subreg.c lowers
> SF moves vs. SI moves.  For SI it generates pure QI moves and so
> gets rid of the SI entirely.  For SF it still builds the QI values
> back into an SF:
>
>        || (!SCALAR_INT_MODE_P (dest_mode)
> 	  && !resolve_reg_p (dest)
> 	  && !resolve_subreg_p (dest)))
>
> I imagine this is because non-int modes are held in FPRs rather than
> GPRs on most targets, but TBH I'm not sure.  I couldn't see a comment
> that explains the above decision.
>
> With -fno-split-wide-types I see the same RA behaviour for both SI and SF
> (i.e. both spill to memory).
>
>>> in avr_hard_regno_mode_ok.  This forbids SFmode in r26+ and means that
>>> moves between pointer registers and general registers have the highest
>>> possible cost (65535) to prevent them from being used for SFmode.  So:
>>>
>>>      ira_register_move_cost[SFmode][POINTER_REGS][GENERAL_REGS] = 65535;
>>>
>>> The costs for union classes are the maximum (worst-case) cost of
>>> each subclass, so this means that:
>>>
>>>      ira_register_move_cost[SFmode][GENERAL_REGS][GENERAL_REGS] = 65535;
>>>
>>> as well.
>>
>> This means that, when there is an expensive class (because it only
>> contains one register for example),
>
> Having one register doesn't automatically make it expensive.
> E.g. there's only one "c" register on x86, but it's not more expensive
> than other registers because of that.
>
> Move costs aren't a good way of deterring unnecessary uses of small
> classes.  The costs should just describe the actual size or speed
> overhead of moving the register.
>
>> then it will blow the cost of GENERAL_REGS to crazy values no matter
>> what?
>
> Yeah.  This is because (with the above intended use of costs) the
> worst-case cost of a superclass X can't be less than the worst-case cost
> of one of its subclasses Y.  If the RA decides to allocate an X, it might
> get unlucky and be forced to use a register in Y.
>
> If a class X - Y exists then it won't be affected by the Y costs.
> So taking Y's cost into account when calculating X's cost means that
> the RA will prefer X - Y over X, which is exactly what making Y
> expensive should achieve.
>
> FWIW, that's why I suggested seeing what would happen if you added a new
> class for GENERAL_REGS - POINTER_REGS.

Hi, I have tried that.  It didn't fix the issue.

For reference, here is the test case, compiled with
-Os -fsplit-wide-types-early and with the attached delta applied:

float func (float);

float call (float f)
{
     return func (f);
}

There are still 16 superfluous instructions: a dead store, and the
setup of a frame that is not needed.

call:
	push r28		 ;  17	[c=4 l=1]  pushqi1/0
	push r29		 ;  18	[c=4 l=1]  pushqi1/0
	 ; SP -= 4	 ;  22	[c=4 l=2]  *addhi3_sp
	rcall .
	rcall .
	in r28,__SP_L__	 ;  23	[c=4 l=2]  *movhi/7
	in r29,__SP_H__
/* prologue: function */
/* frame size = 4 */
/* stack size = 6 */
.L__stack_usage = 6
	std Y+1,r22	 ;  14	[c=4 l=4]  *movsf/3
	std Y+2,r23
	std Y+3,r24
	std Y+4,r25
/* epilogue start */
	 ; SP += 4	 ;  34	[c=4 l=4]  *addhi3_sp
	pop __tmp_reg__
	pop __tmp_reg__
	pop __tmp_reg__
	pop __tmp_reg__
	pop r29		 ;  35	[c=4 l=1]  popqi
	pop r28		 ;  36	[c=4 l=1]  popqi
	rjmp func	 ;  7	[c=24 l=1]  call_value_insn/3

In the IRA dump, REGs are still consistently more expensive than MEM:

Pass 0 for finding pseudo/allocno costs

     a1 (r44,l0) best NO_REGS, allocno NO_REGS
     a0 (r43,l0) best NO_REGS, allocno NO_REGS

   a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
NO_LD_REGS:32000 NO_POINTER_REGS:32000 MEM:9000
   a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
NO_LD_REGS:32000 NO_POINTER_REGS:32000 MEM:9000

Pass 1 for finding pseudo/allocno costs

     r44: preferred NO_REGS, alternative NO_REGS, allocno NO_REGS
     r43: preferred NO_REGS, alternative NO_REGS, allocno NO_REGS

   a0(r43,l0) costs: ADDW_REGS:48000 SIMPLE_LD_REGS:48000 LD_REGS:48000
NO_LD_REGS:48000 NO_POINTER_REGS:48000 MEM:17000
   a1(r44,l0) costs: ADDW_REGS:40008 SIMPLE_LD_REGS:40008 LD_REGS:40008
NO_LD_REGS:40008 NO_POINTER_REGS:40008 MEM:17000

Johann

P.S.: I also added a register class for the intersection of r0..r25 and
r24..r31.  I don't know if that's a requirement for the register class
layout, though.

>> What's also strange is that the register allocator would not need to
>> allocate a register at all:  The incoming parameter comes in SI:22 and
>> is just passed through to the callee, which also receives the value
>> in SI:22.  Why would one move that value to memory?  Even if memory was
>> cheaper, moving the value to mem just to load it again to the same
>> register is not very sensible...  because in almost any case, /no/
>> instruction is cheaper than /some/ instructions?
>
> Earlier passes could perhaps propagate the pseudo registers away in very
> simple cases like this.  It would be a very special-case optimisation
> though.  If there was anything other than "move register X to register X"
> between the calls, getting rid of the pseudo registers before RA could
> introduce spill failures.
>
> combine's to blame for the fact that we have two pseudo registers rather
> than one.  See the comments about the avr-elf results in:
>
>    https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02150.html
>
> for more details.
>
>>> Removing the code above fixes it.  If you don't want to do that, an
>>> alternative might be to add a class for r0-r25 (but I've not tested that).
>>
>> Is there a way that it would use a similar path like SImode?
>
> AFAICT the SI and SF costs are the same.  The difference is coming
> from -fsplit-wide-types rather than RA.
>
> Thanks,
> Richard
>

[-- Attachment #2: x.diff --]
[-- Type: text/x-patch, Size: 4255 bytes --]

Index: config/avr/avr.c
===================================================================
--- config/avr/avr.c	(revision 279994)
+++ config/avr/avr.c	(working copy)
@@ -850,7 +850,7 @@ avr_regno_reg_class (int r)
       SIMPLE_LD_REGS, SIMPLE_LD_REGS, SIMPLE_LD_REGS, SIMPLE_LD_REGS,
       SIMPLE_LD_REGS, SIMPLE_LD_REGS, SIMPLE_LD_REGS, SIMPLE_LD_REGS,
       /* r24, r25 */
-      ADDW_REGS, ADDW_REGS,
+      R24_R25_REGS, R24_R25_REGS,
       /* X: r26, 27 */
       POINTER_X_REGS, POINTER_X_REGS,
       /* Y: r28, r29 */
@@ -12704,6 +12704,7 @@ avr_conditional_register_usage (void)
 
       CLEAR_HARD_REG_SET (reg_class_contents[(int) ADDW_REGS]);
       CLEAR_HARD_REG_SET (reg_class_contents[(int) NO_LD_REGS]);
+      reg_class_contents[NO_POINTER_REGS] &= reg_class_contents[LD_REGS];
     }
 }
 
Index: config/avr/avr.h
===================================================================
--- config/avr/avr.h	(revision 279994)
+++ config/avr/avr.h	(working copy)
@@ -219,6 +219,7 @@ These two properties are reflected by bu
 enum reg_class {
   NO_REGS,
   R0_REG,			/* r0 */
+  R24_R25_REGS,			/* r24 - r25 */
   POINTER_X_REGS,		/* r26 - r27 */
   POINTER_Y_REGS,		/* r28 - r29 */
   POINTER_Z_REGS,		/* r30 - r31 */
@@ -229,6 +230,7 @@ enum reg_class {
   SIMPLE_LD_REGS,		/* r16 - r23 */
   LD_REGS,			/* r16 - r31 */
   NO_LD_REGS,			/* r0 - r15 */
+  NO_POINTER_REGS,		/* r0 - r25 */
   GENERAL_REGS,			/* r0 - r31 */
   ALL_REGS, LIM_REG_CLASSES
 };
@@ -236,25 +238,28 @@ enum reg_class {
 
 #define N_REG_CLASSES (int)LIM_REG_CLASSES
 
-#define REG_CLASS_NAMES {					\
-		 "NO_REGS",					\
-		   "R0_REG",	/* r0 */			\
-		   "POINTER_X_REGS", /* r26 - r27 */		\
-		   "POINTER_Y_REGS", /* r28 - r29 */		\
-		   "POINTER_Z_REGS", /* r30 - r31 */		\
-		   "STACK_REG",	/* STACK */			\
-		   "BASE_POINTER_REGS",	/* r28 - r31 */		\
-		   "POINTER_REGS", /* r26 - r31 */		\
-		   "ADDW_REGS",	/* r24 - r31 */			\
-		   "SIMPLE_LD_REGS", /* r16 - r23 */		\
-		   "LD_REGS",	/* r16 - r31 */			\
-		   "NO_LD_REGS", /* r0 - r15 */			\
-		   "GENERAL_REGS", /* r0 - r31 */		\
-		   "ALL_REGS" }
+#define REG_CLASS_NAMES {					\
+    "NO_REGS",							\
+    "R0_REG",	/* r0 */					\
+    "R24_R25_REGS", /* r24 - r25 */				\
+    "POINTER_X_REGS", /* r26 - r27 */				\
+    "POINTER_Y_REGS", /* r28 - r29 */				\
+    "POINTER_Z_REGS", /* r30 - r31 */				\
+    "STACK_REG",	/* STACK */				\
+    "BASE_POINTER_REGS",	/* r28 - r31 */			\
+    "POINTER_REGS", /* r26 - r31 */				\
+    "ADDW_REGS",	/* r24 - r31 */				\
+    "SIMPLE_LD_REGS", /* r16 - r23 */				\
+    "LD_REGS",	/* r16 - r31 */					\
+    "NO_LD_REGS", /* r0 - r15 */				\
+    "NO_POINTER_REGS", /* r0 - r25 */				\
+    "GENERAL_REGS", /* r0 - r31 */				\
+    "ALL_REGS" }
 
 #define REG_CLASS_CONTENTS {						\
   {0x00000000,0x00000000},	/* NO_REGS */				\
   {0x00000001,0x00000000},	/* R0_REG */                            \
+  {3u << 24,0x00000000},	/* R24_R25_REGS, r24 - r25 */		\
   {3u << REG_X,0x00000000},     /* POINTER_X_REGS, r26 - r27 */		\
   {3u << REG_Y,0x00000000},     /* POINTER_Y_REGS, r28 - r29 */		\
   {3u << REG_Z,0x00000000},     /* POINTER_Z_REGS, r30 - r31 */		\
@@ -269,6 +274,7 @@ enum reg_class {
   {(3u << REG_X)|(3u << REG_Y)|(3u << REG_Z)|(3u << REG_W)|(0xffu << 16),\
       0x00000000},		/* LD_REGS, r16 - r31 */		\
   {0x0000ffff,0x00000000},	/* NO_LD_REGS  r0 - r15 */              \
+  {0x03ffffff,0x00000000},	/* NO_POINTER_REGS  r0 - r25 */		\
   {0xffffffff,0x00000000},	/* GENERAL_REGS, r0 - r31 */		\
   {0xffffffff,0x00000003}	/* ALL_REGS */				\
 }
Index: config/avr/constraints.md
===================================================================
--- config/avr/constraints.md	(revision 279992)
+++ config/avr/constraints.md	(working copy)
@@ -53,6 +53,12 @@ (define_register_constraint "z" "POINTER
 (define_register_constraint "q" "STACK_REG"
   "Stack pointer register (SPH:SPL).")
 
+(define_register_constraint "R24R25" "R24_R25_REGS"
+  "Register pair r25:r24.")
+
+(define_register_constraint "R00R25" "NO_POINTER_REGS"
+  "Non-pointer registers r0 to r25.")
+
 (define_constraint "I"
   "Integer constant in the range 0 @dots{} 63."
   (and (match_code "const_int")
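The first word of each REG_CLASS_CONTENTS entry in the patch above is a bit mask with one bit per register r0..r31. As a quick way to check such masks by hand, here is a small helper; `reg_range_mask` is a hypothetical name for this sketch and is not part of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Bit mask with bits lo..hi (inclusive) set, for a 32-register file.
   Bit n stands for register rn. */
static uint32_t reg_range_mask(unsigned lo, unsigned hi)
{
    assert(lo <= hi && hi < 32);
    /* Shifting a 32-bit value by 32 is undefined, so treat hi == 31 apart. */
    uint32_t upto_hi = hi == 31 ? UINT32_MAX : (UINT32_C(1) << (hi + 1)) - 1;
    uint32_t below_lo = (UINT32_C(1) << lo) - 1;
    return upto_hi & ~below_lo;
}
```

For the two classes added by the patch, `reg_range_mask(0, 25)` gives 0x03ffffff (NO_POINTER_REGS) and `reg_range_mask(24, 25)` gives `3u << 24` (R24_R25_REGS).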
* Re: Code bloat due to silly IRA cost model?
  2019-12-11 17:55 ` Richard Sandiford
  2019-12-13 11:58   ` Georg-Johann Lay
@ 2019-12-16 13:52   ` Georg-Johann Lay
  2019-12-16 14:12     ` Richard Sandiford
From: Georg-Johann Lay @ 2019-12-16 13:52 UTC
To: richard.sandiford; +Cc: gcc

On 11.12.19 at 18:55, Richard Sandiford wrote:
> Georg-Johann Lay <gjl@gcc.gnu.org> writes:
>> Hi, doesn't anybody actually know how to make memory more expensive
>> than registers when it comes to allocating registers?
>>
>> Whatever I am trying for TARGET_MEMORY_MOVE_COST and
>> TARGET_REGISTER_MOVE_COST, ira-costs.c always makes registers more
>> expensive than mem and therefore allocates values to stack slots instead
>> of keeping them in registers.
>>
>> Test case (for avr) is as simple as it gets:
>>
>> float func (float);
>>
>> float call (float f)
>> {
>>      return func (f);
>> }
>>
>> What am I missing?
>>
>> Johann
>
> I think this is coming from:
>
>    /* FIXME: Ideally, the following test is not needed.
>          However, it turned out that it can reduce the number
>          of spill fails.  AVR and it's poor endowment with
>          address registers is extreme stress test for reload.  */
>
>    if (GET_MODE_SIZE (mode) >= 4
>        && regno >= REG_X)
>      return false;
>
> in avr_hard_regno_mode_ok.  This forbids SFmode in r26+ and means that
> moves between pointer registers and general registers have the highest
> possible cost (65535) to prevent them from being used for SFmode.  So:
>
>     ira_register_move_cost[SFmode][POINTER_REGS][GENERAL_REGS] = 65535;
>
> The costs for union classes are the maximum (worst-case) cost of
> each subclass, so this means that:
>
>     ira_register_move_cost[SFmode][GENERAL_REGS][GENERAL_REGS] = 65535;
>
> as well.
>
> Removing the code above fixes it.  If you don't want to do that, an
> alternative might be to add a class for r0-r25 (but I've not tested that).

I am still having trouble understanding that...

For example, currently R26 is forbidden for SFmode, but the same applies
to R25 or any odd register (modes >= 2 regs have to start in even
registers).

Then this would imply that, even after the condition regno >= 26 was
removed, the costs would still be astronomically high because HI:21 is
refused and SI:23 is refused etc., and due to that the cost of that class
will be 0x10000 for modes >= 2 regs?

How can the register allocator tell apart whether a register is rejected
due to its mode or due to the register number?  AFAIK there is no other
way than rejecting odd registers in that hook, because register classes
must not have holes.  Or did that change in the meantime?

Johann

> Thanks,
> Richard
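To make the per-register test in the question above concrete, here is a toy stand-in for a hard_regno_mode_ok-style hook. It is a simplified model, not the AVR backend's code: the names `toy_regno_mode_ok`, `toy_range_has_valid_reg` and `TOY_REG_X` are invented for the sketch, and only the two rules mentioned in this thread (even start registers for multi-register values, and the FIXME test for sizes >= 4 in r26+) plus a bounds check are modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_REG_X 26  /* first pointer register in this model */

/* Can a value of `size` bytes start in general register `regno` (0..31)?
   `forbid_wide_in_pointer_regs` switches the quoted FIXME test on or off. */
static bool toy_regno_mode_ok(int regno, int size,
                              bool forbid_wide_in_pointer_regs)
{
    assert(regno >= 0 && regno < 32 && size >= 1);
    if (regno + size > 32)
        return false;   /* value would run past r31 */
    if (size >= 2 && (regno & 1))
        return false;   /* multi-register values start in even registers */
    if (forbid_wide_in_pointer_regs && size >= 4 && regno >= TOY_REG_X)
        return false;   /* the FIXME test quoted in the message above */
    return true;
}

/* Does the register range first..last contain at least one register in
   which a value of `size` bytes may start? */
static bool toy_range_has_valid_reg(int first, int last, int size,
                                    bool forbid_wide_in_pointer_regs)
{
    for (int r = first; r <= last; r++)
        if (toy_regno_mode_ok(r, size, forbid_wide_in_pointer_regs))
            return true;
    return false;
}
```

In this model r21 is rejected for a 2-byte value and r23 for a 4-byte value, yet the range r16..r23 still contains valid start registers for both sizes. The range r26..r31, by contrast, contains no valid start register for a 4-byte value while the FIXME test is enabled.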
* Re: Code bloat due to silly IRA cost model?
  2019-12-16 13:52 ` Georg-Johann Lay
@ 2019-12-16 14:12   ` Richard Sandiford
From: Richard Sandiford @ 2019-12-16 14:12 UTC
To: Georg-Johann Lay; +Cc: gcc

Georg-Johann Lay <gjl@gcc.gnu.org> writes:
> On 11.12.19 at 18:55, Richard Sandiford wrote:
>> I think this is coming from:
>>
>>    /* FIXME: Ideally, the following test is not needed.
>>          However, it turned out that it can reduce the number
>>          of spill fails.  AVR and it's poor endowment with
>>          address registers is extreme stress test for reload.  */
>>
>>    if (GET_MODE_SIZE (mode) >= 4
>>        && regno >= REG_X)
>>      return false;
>>
>> in avr_hard_regno_mode_ok.  This forbids SFmode in r26+ and means that
>> moves between pointer registers and general registers have the highest
>> possible cost (65535) to prevent them from being used for SFmode.  So:
>>
>>     ira_register_move_cost[SFmode][POINTER_REGS][GENERAL_REGS] = 65535;
>>
>> The costs for union classes are the maximum (worst-case) cost of
>> each subclass, so this means that:
>>
>>     ira_register_move_cost[SFmode][GENERAL_REGS][GENERAL_REGS] = 65535;
>>
>> as well.
>>
>> Removing the code above fixes it.  If you don't want to do that, an
>> alternative might be to add a class for r0-r25 (but I've not tested that).
>
> I am still having some headache understanding that...
>
> For example, currently R26 is forbidden for SFmode, but the same applies
> to R25 or any odd registers (modes >= 2 regs have to start in even
> registers).
>
> Then this would imply, even after the condition regno >= 26 was removed,
> the costs would still be astronomically high because HI:21 is refused
> and SI:23 is refused etc, and due to that the cost of that class will be
> 0x10000 for modes >= 2 regs?

No, that will be OK.  The above happens at the class level, not at the
level of individual registers.  All classes that contain HI:21 (21-22)
or SI:23 (23-26) also contain valid HI and SI registers, so the costs
will be based on the valid registers.

The problem in the case above is that POINTER_REGS is big enough
to hold SFmode but isn't allowed to do so.

> How can the register allocator tell apart whether a register is rejected
> due to its mode or due to the register number?  AFAIK there is no other
> way than rejecting odd registers in that hook, because register classes
> must not have holes.  Or did that change meanwhile?

This hook is still the right way of rejecting odd registers.  The RA
doesn't really need to know whether something was rejected because
of its register number, mode, or both.

Thanks,
Richard
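The class-level rule described in the reply above (a class inherits the worst-case cost of its subclasses) can be sketched with a toy table. This is an assumption-laden simplification rather than IRA's code: the real `ira_register_move_cost` is indexed by mode and by a pair of classes, whereas this sketch keeps a single cost per class for one mode; the names `toy_class`, `toy_worst_case_cost` and `toy_sf_classes` are invented, and the cost 2 for usable classes is an arbitrary placeholder. It does not attempt to reproduce the figures in the IRA dumps.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_COST 65535   /* stands for "class cannot be used for this mode" */

struct toy_class {
    const char *name;
    uint32_t regs;      /* bit n set: class contains rn */
    int own_cost;       /* cost of the class considered alone, for one mode */
};

/* Worst-case cost of classes[k]: the maximum own_cost over every listed
   class whose register set is a subset of classes[k], itself included. */
static int toy_worst_case_cost(const struct toy_class *classes, int n, int k)
{
    assert(k >= 0 && k < n);
    int worst = classes[k].own_cost;
    for (int i = 0; i < n; i++) {
        int is_subset = (classes[i].regs & ~classes[k].regs) == 0;
        if (is_subset && classes[i].own_cost > worst)
            worst = classes[i].own_cost;
    }
    return worst;
}

/* SFmode view in this model: POINTER_REGS is not allowed to hold the mode,
   every other class contains at least one usable register. */
static const struct toy_class toy_sf_classes[] = {
    { "POINTER_REGS",    0xfc000000u, TOY_MAX_COST },  /* r26 - r31 */
    { "SIMPLE_LD_REGS",  0x00ff0000u, 2 },             /* r16 - r23 */
    { "NO_LD_REGS",      0x0000ffffu, 2 },             /* r0 - r15  */
    { "NO_POINTER_REGS", 0x03ffffffu, 2 },             /* r0 - r25  */
    { "GENERAL_REGS",    0xffffffffu, 2 },             /* r0 - r31  */
};
```

In this table GENERAL_REGS ends up at 65535 because POINTER_REGS is one of its subclasses, while a class such as r0-r25 that excludes the pointer registers keeps its own cost. A class that merely contains some rejected registers (r21 for HImode, r23 for SImode) is unaffected as long as it also contains usable ones.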
Thread overview: 14+ messages
2019-10-25 11:07 Code bloat due to silly IRA cost model? Georg-Johann Lay
2019-12-10 20:16 ` Georg-Johann Lay
2019-12-11 17:55   ` Richard Sandiford
2019-12-13 11:58     ` Georg-Johann Lay
2019-12-13 12:46       ` Richard Sandiford
2019-12-13 16:04         ` Segher Boessenkool
2019-12-13 16:22           ` Richard Sandiford
2019-12-13 18:59             ` Segher Boessenkool
2019-12-13 22:31               ` Richard Sandiford
2019-12-18 15:29                 ` Segher Boessenkool
2019-12-18 15:43                   ` Richard Sandiford
2020-01-09  9:52         ` Georg-Johann Lay
2019-12-16 13:52     ` Georg-Johann Lay
2019-12-16 14:12       ` Richard Sandiford