public inbox for gcc-bugs@sourceware.org
* [Bug target/114729] New: RISC-V SPEC2017 507.cactu excessive spills with -fschedule-insns
@ 2024-04-15 22:42 vineetg at gcc dot gnu.org
2024-04-15 22:43 ` [Bug rtl-optimization/114729] " pinskia at gcc dot gnu.org
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: vineetg at gcc dot gnu.org @ 2024-04-15 22:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114729
Bug ID: 114729
Summary: RISC-V SPEC2017 507.cactu excessive spills with
-fschedule-insns
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vineetg at gcc dot gnu.org
CC: jeffreyalaw at gmail dot com, kito.cheng at gmail dot com,
rdapp at gcc dot gnu.org
Target Milestone: ---
Created attachment 57953
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57953&action=edit
spec cactu reduced
In RISC-V SPEC runs, Cactu's dynamic icount is the worst of all (compared to
aarch64 with similar build toggles: -Ofast).
As of Upstream commit 3fed1609f610 of 2024-01-31:
aarch64: 1,363,212,534,747 vs.
risc-v : 2,852,277,890,338
There's an existing issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106265
which captures ongoing work to improve the stack/array accesses. However, that
is more of damage control. The root cause happens to be excessive stack spills
on RISC-V. Robin noticed these were somehow triggered by the first scheduling
pass.
Disabling sched1 with -fno-schedule-insns roughly halves the total icount, to
1,295,520,619,523, which is even slightly better than aarch64, all things
considered.
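For scale, the ratios implied by the numbers above are easy to check with plain arithmetic (icounts copied verbatim from this report):

```python
# Dynamic icounts quoted above (upstream commit 3fed1609f610, -Ofast).
aarch64        = 1_363_212_534_747
riscv          = 2_852_277_890_338  # default, with sched1
riscv_nosched1 = 1_295_520_619_523  # -fno-schedule-insns

print(f"risc-v vs aarch64:     {riscv / aarch64:.2f}x")          # ~2.09x
print(f"sched1 vs no-sched1:   {riscv / riscv_nosched1:.2f}x")   # ~2.20x
```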
I ran a reducer (tracking the token sfp in -fverbose-asm output) and was able
to get a testcase which shows a single stack spill (store+load) with
default/-fschedule-insns and none with -fno-schedule-insns.
It seems sched1 is moving insns around, but the actual spills are generated by
IRA. So this is an interplay of sched1 and IRA.
```
ira
New iteration of spill/restore move
Changing RTL for loop 2 (header bb6)
Changing RTL for loop 1 (header bb4)
26 vs parent 26:Creating newreg=246 from oldreg=137
25 vs parent 25:Creating newreg=247 from oldreg=143
11 vs parent 11:Creating newreg=248 from oldreg=223
16 vs parent 16:Creating newreg=249 from oldreg=237
Changing RTL for loop 3 (header bb3)
26 vs parent 26:Creating newreg=250 from oldreg=246
25 vs parent 25:Creating newreg=251 from oldreg=247
-1 vs parent 11:Creating newreg=253 from oldreg=248
16 vs parent 16:Creating newreg=254 from oldreg=249
...
scanning new insn with uid = 181.
scanning new insn with uid = 182.
scanning new insn with uid = 183.
scanning new insn with uid = 184.
changing bb of uid 194
unscanned insn
scanning new insn with uid = 185.
scanning new insn with uid = 186.
scanning new insn with uid = 187.
scanning new insn with uid = 188.
changing bb of uid 195
unscanned insn
...
+++Costs: overall 11650, reg 10680, mem 970, ld 485, st 485, move 1366
+++ move loops 0, new jumps 2
...
(insn 9 104 11 2 (set (reg/f:DI 137 [ r.4_4 ])
(mem/f/c:DI (lo_sum:DI (reg/f:DI 155)
(symbol_ref:DI ("r") [flags 0x86]
<var_decl 0x7a69fcdb1630 r>)) [4 r+0 S8 A64]))
{*movdi_64bit}
(expr_list:REG_DEAD (reg/f:DI 155)
(expr_list:REG_EQUAL (mem/f/c:DI
(symbol_ref:DI ("r") [flags 0x86]
<var_decl 0x7a69fcdb1630 r>) [4 r+0 S8 A64])
(insn 115 165 181 2 (set (reg:DI 245)
(const_int 1 [0x1])) {*movdi_64bit}
(expr_list:REG_EQUIV (const_int 1 [0x1])
---- spill code start -----
(insn 181 115 182 2 (set (reg/f:DI 246
[orig:137 r.4_4 ] [137])
(reg/f:DI 137 [ r.4_4 ])) {*movdi_64bit}
(expr_list:REG_DEAD (reg/f:DI 137 [ r.4_4 ])
(insn 182 181 183 2 (set (reg/f:DI 247
[orig:143 w.9_10 ] [143])
(reg/f:DI 143 [ w.9_10 ])) {*movdi_64bit}
(expr_list:REG_DEAD (reg/f:DI 143 [ w.9_10 ])
(insn 183 182 184 2 (set (reg:DI 248
[orig:223 MEM[(int *)j.15_19 + 4B] ] [223])
(reg:DI 223 [ MEM[(int *)j.15_19 + 4B] ]))
{*movdi_64bit}
(expr_list:REG_DEAD (reg:DI 223
[ MEM[(int *)j.15_19 + 4B] ])
(insn 184 183 174 2 (set (reg:DI 249
[orig:237 _38 ] [237])
(reg:DI 237 [ _38 ])) {*movdi_64bit}
(expr_list:REG_DEAD (reg:DI 237 [ _38 ])
---- spill code end -----
(jump_insn 174 184 175 2 (set (pc)
(label_ref 100)) 350 {jump}
(nil)
-> 100)
(barrier 175 174 196)
---- spill code start -----
(code_label 196 175 195 10 10 (nil) [1 uses])
(note 195 196 189 10 [bb 10] NOTE_INSN_BASIC_BLOCK)
(insn 189 195 190 10 (set (reg/f:DI 250
[orig:137 r.4_4 ] [137])
(reg/f:DI 246 [orig:137 r.4_4 ] [137]))
{*movdi_64bit}
(expr_list:REG_DEAD (reg/f:DI 246
[orig:137 r.4_4 ] [137])
(insn 190 189 191 10 (set (reg/f:DI 251
[orig:143 w.9_10 ] [143])
(reg/f:DI 247 [orig:143 w.9_10 ] [143]))
{*movdi_64bit}
(expr_list:REG_DEAD (reg/f:DI 247
[orig:143 w.9_10 ] [143])
(insn 191 190 192 10 (set (reg/v:DI 252
[orig:152 i ] [152])
(reg/v:DI 152 [ i ])) 208 {*movdi_64bit}
(expr_list:REG_DEAD (reg/v:DI 152 [ i ])
(insn 192 191 193 10 (set (reg:DI 253
[orig:223 MEM[(int *)j.15_19 + 4B] ] [223])
(reg:DI 248 [orig:223
MEM[(int *)j.15_19 + 4B] ] [223])) {*movdi_64bit}
(expr_list:REG_DEAD (reg:DI 248
[orig:223 MEM[(int *)j.15_19 + 4B] ] [223])
(insn 193 192 97 10 (set (reg:DI 254
[orig:237 _38 ] [237])
(reg:DI 249 [orig:237 _38 ] [237]))
{*movdi_64bit}
(expr_list:REG_DEAD (reg:DI 249
[orig:237 _38 ] [237])
---- spill code end -----
(code_label 97 193 14 3 3 (nil) [1 uses])
(note 14 97 33 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 17 56 21 3 (set (reg:DF 159 [ l ])
(mem/c:DF (lo_sum:DI (reg/f:DI 234)
```
* [Bug rtl-optimization/114729] RISC-V SPEC2017 507.cactu excessive spills with -fschedule-insns
From: pinskia at gcc dot gnu.org @ 2024-04-15 22:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114729
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization, ra
Component|target |rtl-optimization
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I suspect there are some other duplicates of this issue (including ones that
affect aarch64).
* [Bug rtl-optimization/114729] RISC-V SPEC2017 507.cactu excessive spills with -fschedule-insns
From: vineetg at gcc dot gnu.org @ 2024-04-15 22:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114729
--- Comment #2 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
FWIW -fsched-pressure is already default enabled for RISC-V.
* [Bug rtl-optimization/114729] RISC-V SPEC2017 507.cactu excessive spills with -fschedule-insns
From: law at gcc dot gnu.org @ 2024-04-15 23:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114729
Jeffrey A. Law <law at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2024-04-15
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
CC| |law at gcc dot gnu.org
--- Comment #3 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Right. So what I'm most interested in are the scheduler decisions, as most
likely IRA/LRA are simply stuck dealing with a pathological conflict graph
given the number of registers available. i.e., sched1 is the root cause.
Given the size of the problem and the fact that we have register-pressure-sensitive
scheduling enabled, I don't really think it's related to the issues Andrew
linked or the others I've looked at in the past. But we're going to have to
dive into those sched1 dumps to know for sure.
Vineet, do we have this isolated enough that we know what function is most
affected and presumably the most impacted blocks? If so we can probably start
to debug scheduler dumps.
There's a flag -fsched-verbose=N that gives a lot more low level information
about the scheduler's decisions. I usually use N=99. It makes for a huge
dump, but gives extremely detailed information about the scheduler's view of
the world. It's going to be big enough that bugzilla might balk at attaching
the file, even after compression.
I'm going to go ahead and confirm given Robin's seen the same behavior on this
benchmark.
* [Bug rtl-optimization/114729] RISC-V SPEC2017 507.cactu excessive spills with -fschedule-insns
From: vineetg at gcc dot gnu.org @ 2024-04-15 23:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114729
--- Comment #4 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #3)
> Vineet, do we have this isolated enough that we know what function is most
> affected and presumably the most impacted blocks? If so we can probably
> start to debug scheduler dumps.
I think so :-) But this is all anecdotal.
The attached test was reduced from the original/full ML_BSSN_RHS.ii (which,
granted, has the 2nd-most spills; the worst is ML_BSSN_Advect.ii, which I have
also reduced now). Anyhow, pretty much all of the file is one function, and my
reduction criterion was to see 1 spill with sched1 enabled and none otherwise.
I hope that is representative of the pathology seen in the original/full
ML_BSSN_RHS.ii.
> There's a flag -fsched-verbose=N that gives a lot more low level information
> about the scheduler's decisions. I usually use N=99. It makes for a huge
> dump, but gives extremely detailed information about the scheduler's view of
> the world.
I'll start diving into sched1 dumps as you suggest.
* [Bug rtl-optimization/114729] RISC-V SPEC2017 507.cactu excessive spills with -fschedule-insns
From: juzhe.zhong at rivai dot ai @ 2024-04-16 2:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114729
JuzheZhong <juzhe.zhong at rivai dot ai> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |juzhe.zhong at rivai dot ai
--- Comment #5 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Did you try another scheduler, e.g. -fselective-scheduling, to see whether the
spill issues still exist?
* [Bug rtl-optimization/114729] RISC-V SPEC2017 507.cactu excessive spills with -fschedule-insns
From: rguenth at gcc dot gnu.org @ 2024-04-16 7:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114729
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target| |riscv
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
There are also different sched-pressure algorithms, via --param
sched-pressure-algorithm, and -flive-range-shrinkage.
* [Bug rtl-optimization/114729] RISC-V SPEC2017 507.cactu excessive spills with -fschedule-insns
From: law at gcc dot gnu.org @ 2024-04-16 13:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114729
--- Comment #7 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Yes, there are different algorithms. I looked at them a while back when we
first noticed the spilling problems with x264. There was very little
difference for specint when we varied the algorithms. IIRC I didn't look at
specfp at the time.
* [Bug rtl-optimization/114729] RISC-V SPEC2017 507.cactu excessive spills with -fschedule-insns
From: law at gcc dot gnu.org @ 2024-04-16 13:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114729
--- Comment #8 from Jeffrey A. Law <law at gcc dot gnu.org> ---
I didn't even notice you had that testcase attached!
I haven't done a deep dive, but the first thing that jumps out is the number of
instructions in the ready queue, most likely because of the addressing of
objects in static storage. The highparts alone are going to take ~18 GPRs for
the loop.
* [Bug rtl-optimization/114729] RISC-V SPEC2017 507.cactu excessive spills with -fschedule-insns
From: vineetg at gcc dot gnu.org @ 2024-04-16 23:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114729
Vineet Gupta <vineetg at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed|2024-04-15 00:00:00 |2024-4-16
--- Comment #9 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
So I started with the reg being spilled (a1):
.L2:
beq a1,zero,.L5 # if j[1] == 0
li a2,1
ble a6,s11,.L2 # if j[0] < 1
sd a1,8(sp) # spill (save)
.L3: # inner loop start
...
blt a2,a6,.L3 # inner loop end
ld a1,8(sp) # spill (restore)
j .L2
Next was zooming into the inner loop, where a1 is used/clobbered by sched1 but
not without sched1, using my rudimentary define/use/dead annotation.
------------------------------------------------------------------------------
-fschedule-insns (NOK) | -fno-schedule-insns (OK)
------------------------------------------------------------------------------
1-def ld a5,%lo(u)(s0) #u, u | 1-def ld a5,%lo(u)(t6) # u, u
2-def srliw a0,a5,16 | 2-def srliw s10,a5,16
3-def srli a1,a5,32 | 1-use sh a5,%lo(_Z1sv)(a4)
1-use sh a5,%lo(_Z1sv)(a3) | 2-dead sh s10,%lo(_Z1sv+2)(a4)
---insn1--- | 3-def srli s10,a5,32
1-use srli a5,a5,48 | 1-use srli a5,a5,48
---insn2--- | 1-dead sh a5,%lo(_Z1sv+6)(a4)
2-dead sh a0,%lo(_Z1sv+2)(a3) | ---insn1---
3-dead sh a1,%lo(_Z1sv+4)(a3) | ---insn2---
1-dead sh a5,%lo(_Z1sv+6)(a3) | 3-dead sh s10,%lo(_Z1sv+4)(a4)
The problem seems to be the longer live range of 2-def (on the left side). If
it were used/dead right after, 3-def wouldn't need a new register.
With that insight, I can now start looking into the sched1 dumps of the
corresponding BB.
;; 10--> b 0: i 35 r170#0=[r242+low(`u')]
:alu:@GR_REGS+1(1)@FP_REGS+0(0)
;; 11--> b 0: i 79 r209=[r229+low(`f')]
:alu:GR_REGS+0(0)FP_REGS+1(1)
;; 12--> b 0: i 76 r141=fix(r206)
:alu:@GR_REGS+1(1)@FP_REGS+0(-1)
;; 13--> b 0: i 46 r180=zxt(r170,0x10,0x10)
:alu:@GR_REGS+1(1)@FP_REGS+0(0)
;; 14--> b 0: i 55 r188=r170 0>>0x20
:alu:GR_REGS+1(1)FP_REGS+0(0)
;; 15--> b 0: i 81 r210=r141<<0x3
:alu:GR_REGS+1(0)FP_REGS+0(0)
;; 16--> b 0: i 82 r211=r143+r210
:alu:GR_REGS+1(0)FP_REGS+0(0)
;; 17--> b 0: i 44 [r230+low(`_Z1sv')]=r170#0
:alu:@GR_REGS+0(0)@FP_REGS+0(0)
;; 18--> b 0: i 65 r197=r170 0>>0x30
:alu:GR_REGS+1(0)FP_REGS+0(0)
;; 19--> b 0: i 54 [r230+low(const(`_Z1sv'+0x2))]=r180#0
:alu:@GR_REGS+0(-1)@FP_REGS+0(0)
;; 20--> b 0: i 64 [r230+low(const(`_Z1sv'+0x4))]=r188#0
:alu:GR_REGS+0(-1)FP_REGS+0(0)
;; 21--> b 0: i 73 [r230+low(const(`_Z1sv'+0x6))]=r197#0
:alu:GR_REGS+0(-1)FP_REGS+0(0)
* [Bug rtl-optimization/114729] RISC-V SPEC2017 507.cactu excessive spills with -fschedule-insns
From: vineetg at gcc dot gnu.org @ 2024-04-18 0:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114729
--- Comment #10 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
Debug update from the -fsched-verbose=99 dumps (they are reaaaaalllly verbose).
For the insns/regs under consideration, the canonical pre-scheduled sequence
with ideal live range (but non-ideal load-to-use delay) is the following:
;; ======================================================
;; -- basic block 3 from 17 to 98 -- before reload
;; ======================================================
;; | 35 | 10 | r170#0=[r242+low(`u')] alu
;; | 44 | 6 | [r230+low(`_Z1sv')]=r170#0 alu
;; | 46 | 7 | r180=zxt(r170,0x10,0x10) alu
;; | 54 | 6 | [r230+low(const(`_Z1sv'+0x2))]=r180#0 alu
;; | 55 | 7 | r188=r170 0>>0x20 alu
;; | 64 | 6 | [r230+low(const(`_Z1sv'+0x4))]=r188#0 alu
;; | 65 | 7 | r197=r170 0>>0x30 alu
;; | 73 | 6 | [r230+low(const(`_Z1sv'+0x6))]=r197#0 alu
r170 (insn 35) is the central character whose live range has to be longest
because of dependencies.
- {46, 55, 65} USE r170, and sources which create new pseudos
- {54, 64, 73} are where these new pseudos sink.
How these 2 sets are interleaved defines the register pressure:
- If interleaved as src1:sink1:src2:sink2:src3:sink3, 1 reg suffices.
- If grouped as src1:src2:src3, 3 regs are needed.
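The pressure effect of the two interleavings can be made concrete with a toy liveness walk (an illustration only, not GCC code; the pseudo names p1..p3 are hypothetical stand-ins for the regs created by insns 46/55/65):

```python
def peak_pressure(schedule):
    """Walk a schedule of (defs, last_uses) steps; return the peak live count."""
    live, peak = set(), 0
    for defs, last_uses in schedule:
        live |= set(defs)          # a def makes a pseudo live
        peak = max(peak, len(live))
        live -= set(last_uses)     # a last use ends its live range
    return peak

# src_i reads r170 and defines pseudo p_i; sink_i is p_i's last use.
interleaved = [({"p1"}, set()), (set(), {"p1"}),
               ({"p2"}, set()), (set(), {"p2"}),
               ({"p3"}, set()), (set(), {"p3"})]
grouped     = [({"p1"}, set()), ({"p2"}, set()), ({"p3"}, set()),
               (set(), {"p1"}), (set(), {"p2"}), (set(), {"p3"})]

print(peak_pressure(interleaved))  # 1: src1:sink1:src2:sink2:src3:sink3
print(peak_pressure(grouped))      # 3: all three pseudos live at once
```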
Per sched1 dumps, the "source" set gets inducted into the ready queue together:
;; dependencies resolved: insn 65
;; tick updated: insn 65 into ready
;; dependencies resolved: insn 55
;; tick updated: insn 55 into ready
;; dependencies resolved: insn 46
;; tick updated: insn 46 into ready
;; dependencies resolved: insn 44
;; tick updated: insn 44 into ready
;; +------------------------------------------------------
;; | Pressure costs for ready queue
;; | pressure points GR_REGS:[26->28 at 17:54] FP_REGS:[1->1 at 0:94]
;; +------------------------------------------------------
;; | 15 44 | 6 +3 | GR_REGS:[0 base cost 0] FP_REGS:[0 base cost 0]
;; | 16 46 | 7 +3 | GR_REGS:[1 base cost 0] FP_REGS:[0 base cost 0]
^^^^
;; | 18 55 | 7 +3 | GR_REGS:[1 base cost 1] FP_REGS:[0 base cost 0]
^^^^
;; | 20 65 | 7 +3 | GR_REGS:[1 base cost 1] FP_REGS:[0 base cost 0]
^^^^
;; | 11 76 | 10 +2 | GR_REGS:[1 base cost 0] FP_REGS:[-1 base cost
0]
;; | 0 94 | 2 +1 | GR_REGS:[0 base cost 0] FP_REGS:[0 base cost 0]
;; | 28 92 | 5 +1 | GR_REGS:[0 base cost 0] FP_REGS:[1 base cost 0]
;; | 26 88 | 5 +1 | GR_REGS:[0 base cost 0] FP_REGS:[1 base cost 0]
;; | 22 79 | 9 +1 | GR_REGS:[0 base cost 0] FP_REGS:[1 base cost 0]
;; +------------------------------------------------------
;; RFS_PRESSURE_DELAY: 7: 44 46 76 94
;; RFS_PRIORITY: 6: 92 88 79
;; RFS_PRESSURE_INDEX: 2: 55
;; Ready list (t = 10): 65:44(cost=1:prio=7:delay=3:idx=20)
55:42(cost=1:prio=7:delay=3:idx=18) 44:39(cost=0:prio=6:delay=3:idx=15)
46:40(cost=0:prio=7:delay=3:idx=16) 76:47(cost=0:prio=10:delay=2:idx=11)
94:58(cost=0:prio=2:delay=1:idx=0) 92:56(cost=0:prio=5:delay=1:idx=28)
88:54(cost=0:prio=5:delay=1:idx=26) 79:48(cost=0:prio=9:delay=1:idx=22)
As the algorithm converges, they move around a bit, but the src/sink pairs are
rarely considered in the same iteration, and if at all, only one is:
;; +------------------------------------------------------
;; | Pressure costs for ready queue
;; | pressure points GR_REGS:[29->29 at 0:94] FP_REGS:[1->1 at 0:94]
;; +------------------------------------------------------
...
;; | 19 64 | 6 +0 | GR_REGS:[-1 base cost -1] FP_REGS:[0 base cost
0]
;; | 17 54 | 6 +0 | GR_REGS:[-1 base cost -1] FP_REGS:[0 base cost
0]
;; | 20 65 | 7 +0 | GR_REGS:[0 base cost 0] FP_REGS:[0 base cos
All of this leads to the pessimistic schedule emitted in the end.
I'm still trying to wrap my head around the humongous dump info.
* [Bug rtl-optimization/114729] RISC-V SPEC2017 507.cactu excessive spills with -fschedule-insns
From: law at gcc dot gnu.org @ 2024-04-18 14:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114729
--- Comment #11 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Yup. -fsched-verbose=99 is *very* verbose. But that's the point, to see all
the gory details. It can be dialed down, but I've never done so myself.
What stands out to me is this:
;; | Pressure costs for ready queue
;; | pressure points GR_REGS:[29->29 at 0:94] FP_REGS:[1->1 at 0:94]
I haven't had to debug pressure stuff, so I'm not as familiar with its dump
format. But I'd hazard a guess that "29->29" means the insn is neutral WRT
register pressure, with the estimate being that we need 29 GPRs before/after
this insn.
If we think about our GPR file, at 29 we're likely already spilling: 32 - (sp,
fp, ra, x0, gp, perhaps tp as well). So maybe that points at the first two
things to verify.
1. What does the "29" actually mean? I'm guessing it means the number of GPRs
estimated live at this point. But we should make sure.
2. How does the heuristic determine when to start applying pressure
sensitivity? Presumably it's based on the number of registers in a particular
class. But given we can't allocate sp, ra, x0, fp, gp, are we properly
accounting for those limitations?
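As a back-of-the-envelope check on point 2, the GPR accounting above works out as follows; whether tp is truly unavailable is exactly the open question, so treating all six registers as reserved is an assumption for illustration:

```python
# RISC-V has 32 integer registers. The reserved set below follows the
# list in the comment above (sp, fp, ra, x0, gp, perhaps tp); taking all
# six as unallocatable is an assumption, not confirmed GCC behavior.
ALL_GPRS = 32
reserved = {
    "x0",  # hardwired zero
    "sp",  # stack pointer
    "fp",  # frame pointer (when a frame is needed)
    "ra",  # return address
    "gp",  # global pointer
    "tp",  # thread pointer (the "perhaps" case)
}
allocatable = ALL_GPRS - len(reserved)
print(allocatable)  # 26: a pressure estimate of 29 already implies spilling
```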