* [Bug target/105733] riscv: Poor codegen for large stack frames
2022-05-26 0:49 [Bug target/105733] New: riscv: Poor codegen for large stack frames jrtc27 at jrtc27 dot com
@ 2022-05-26 6:12 ` kito at gcc dot gnu.org
2022-06-07 2:41 ` wilson at gcc dot gnu.org
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: kito at gcc dot gnu.org @ 2022-05-26 6:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733
Kito Cheng <kito at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |kito at gcc dot gnu.org
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
Last reconfirmed| |2022-05-26
--- Comment #1 from Kito Cheng <kito at gcc dot gnu.org> ---
Confirmed, Thanks for reporting this!
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug target/105733] riscv: Poor codegen for large stack frames
2022-05-26 0:49 [Bug target/105733] New: riscv: Poor codegen for large stack frames jrtc27 at jrtc27 dot com
2022-05-26 6:12 ` [Bug target/105733] " kito at gcc dot gnu.org
@ 2022-06-07 2:41 ` wilson at gcc dot gnu.org
2022-07-25 20:39 ` pinskia at gcc dot gnu.org
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: wilson at gcc dot gnu.org @ 2022-06-07 2:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733
Jim Wilson <wilson at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |wilson at gcc dot gnu.org
--- Comment #2 from Jim Wilson <wilson at gcc dot gnu.org> ---
This is a known problem that I have commented on before. For instance here
https://github.com/riscv-collab/riscv-gcc/issues/193
Copying my reply to here since this is a better place for the bug report:
This is a known problem. GCC first emits code using a frame pointer, and then
tries to eliminate the frame pointer during register allocation. When the frame
size is larger than the immediate field of an add, we need temporaries to
compute frame offsets. Compiler optimizations like common subexpression
elimination (cse) try to optimize the calculation of the temporaries. Then when
we try to eliminate the frame pointer, we run into trouble because the cse opt
changes interfere with the frame pointer elimination, and we end up with an
ugly inefficient mess.
This problem isn't RISC-V specific, but since RISC-V has 12-bit immediates and
most everyone else has 16-bit immediates, we hit the problem sooner, and thus
makes it much more visible for RISC-V. The example above is putting 12KB on the
stack, which is larger than 12-bits of range (4KB), but well within 16-bits of
range (64KB).
I have a prototype of a patch to fix it by allowing >12-bit offsets for frame
pointer references, and then fixing this when eliminating the frame pointer,
but this can cause other problems, and needs more work to be usable. I have no
idea when someone might try to finish the patch.
End of inclusion from github.
To improve my earlier answer, we have signed 12-bit immediates, so the trouble
actually starts at 2048 bytes, and Jessica's example is larger than that.
Reduce BUFSIZ to 2044 and you get a reasonable result.
foo:
li a5,4096
addi a5,a5,-2048
addi sp,sp,-2048
add a5,a5,a0
add a0,a5,sp
li t0,4096
sb zero,-2048(a0)
addi t0,t0,-2048
add sp,sp,t0
jr ra
There is still a bit of oddness here. We are adding 2048 to a0 and then using
an address -2048(a0). I think that is more cse trouble. 2048 requires two
instructions to load into a register which is likely confusing something
somewhere.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug target/105733] riscv: Poor codegen for large stack frames
2022-05-26 0:49 [Bug target/105733] New: riscv: Poor codegen for large stack frames jrtc27 at jrtc27 dot com
2022-05-26 6:12 ` [Bug target/105733] " kito at gcc dot gnu.org
2022-06-07 2:41 ` wilson at gcc dot gnu.org
@ 2022-07-25 20:39 ` pinskia at gcc dot gnu.org
2022-07-25 20:40 ` pinskia at gcc dot gnu.org
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-07-25 20:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |vineet.gupta at linux dot dev
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 106439 has been marked as a duplicate of this bug. ***
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug target/105733] riscv: Poor codegen for large stack frames
2022-05-26 0:49 [Bug target/105733] New: riscv: Poor codegen for large stack frames jrtc27 at jrtc27 dot com
` (2 preceding siblings ...)
2022-07-25 20:39 ` pinskia at gcc dot gnu.org
@ 2022-07-25 20:40 ` pinskia at gcc dot gnu.org
2023-12-15 21:47 ` vineetg at gcc dot gnu.org
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-07-25 20:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug target/105733] riscv: Poor codegen for large stack frames
2022-05-26 0:49 [Bug target/105733] New: riscv: Poor codegen for large stack frames jrtc27 at jrtc27 dot com
` (3 preceding siblings ...)
2022-07-25 20:40 ` pinskia at gcc dot gnu.org
@ 2023-12-15 21:47 ` vineetg at gcc dot gnu.org
2024-05-21 17:35 ` cvs-commit at gcc dot gnu.org
2024-05-28 17:15 ` vineetg at gcc dot gnu.org
6 siblings, 0 replies; 8+ messages in thread
From: vineetg at gcc dot gnu.org @ 2023-12-15 21:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733
Vineet Gupta <vineetg at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |vineetg at gcc dot gnu.org
--- Comment #4 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
There has been good improvements in gcc codegen specially with commit below.
commit 6619b3d4c15cd754798b1048c67f3806bbcc2e6d
Author: Jivan Hakobyan <jivanhakobyan9@gmail.com>
Date: Wed Aug 23 14:10:30 2023 -0600
Improve quality of code from LRA register elimination
This is primarily Jivan's work, I'm mostly responsible for the write-up and
coordinating with Vlad on a few questions.
On targets with limitations on immediates usable in arithmetic
instructions,
LRA's register elimination phase can construct fairly poor code.
Tip W/o commit 6619b3d4c | With 6619b3d4c
|
foo: | foo:
li t0,-4096 | li t0,-4096
addi t0,t0,2032 | addi t0,t0,2032
li a5,0 |
li a4,0 |
add sp,sp,t0 | add sp,sp,t0
add a4,a4,a5 |
add a5,a4,sp | add a5,a5,a0
add a5,a5,a0 |
li t0,4096 | li t0,4096
sb zero,0(a5) | sb zero,0(a5)
addi t0,t0,-2032 | addi t0,t0,-2032
add sp,sp,t0 | add sp,sp,t0
jr ra | jr ra
We still have the weird LUI 4096 based constant construction. I have a patch to
avoid 4096 for certain ranges [-4096,-2049] or [2048,4094] (cribbed from
llvm).
e.g. 2064 = 2047 + 17 and we could potentially "spread" the 2 parts over 2 adds
to SP, avoiding the LUI. However if a const costs more than 1 insn, gcc wants
to force it in a register rather than split the add operation into 2 adds with
the split constants.
expand_binop
expand_binop_directly
avoid_expensive_constant
/* X is to be used in mode MODE as operand OPN to BINOPTAB. If we're
optimizing, and if the operand is a constant that costs more than
1 instruction, force the constant into a register and return that
register. Return X otherwise. UNSIGNEDP says whether X is unsigned. */
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug target/105733] riscv: Poor codegen for large stack frames
2022-05-26 0:49 [Bug target/105733] New: riscv: Poor codegen for large stack frames jrtc27 at jrtc27 dot com
` (4 preceding siblings ...)
2023-12-15 21:47 ` vineetg at gcc dot gnu.org
@ 2024-05-21 17:35 ` cvs-commit at gcc dot gnu.org
2024-05-28 17:15 ` vineetg at gcc dot gnu.org
6 siblings, 0 replies; 8+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-21 17:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733
--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Vineet Gupta <vineetg@gcc.gnu.org>:
https://gcc.gnu.org/g:f9cfc192ed0127edb7e79818917dd2859fce4d44
commit r15-757-gf9cfc192ed0127edb7e79818917dd2859fce4d44
Author: Vineet Gupta <vineetg@rivosinc.com>
Date: Mon May 13 11:46:03 2024 -0700
RISC-V: avoid LUI based const mat in prologue/epilogue expansion
[PR/105733]
If the constant used for stack offset can be expressed as sum of two S12
values, the constant need not be materialized (in a reg) and instead the
two S12 bits can be added to instructions involved with frame pointer.
This avoids burning a register and more importantly can often get down
to be 2 insn vs. 3.
The prev patches to generally avoid LUI based const materialization didn't
fix this PR and need this directed fix in funcion prologue/epilogue
expansion.
This fix doesn't move the neddle for SPEC, at all, but it is still a
win considering gcc generates one insn fewer than llvm for the test ;-)
gcc-13.1 release | gcc 230823 | |
| g6619b3d4c15c | This patch |
clang/llvm
---------------------------------------------------------------------------------
li t0,-4096 | li t0,-4096 | addi sp,sp,-2048 | addi
sp,sp,-2048
addi t0,t0,2016 | addi t0,t0,2032 | add sp,sp,-16 | addi
sp,sp,-32
li a4,4096 | add sp,sp,t0 | add a5,sp,a0 | add
a1,sp,16
add sp,sp,t0 | addi a5,sp,-2032 | sb zero,0(a5) | add
a0,a0,a1
li a5,-4096 | add a0,a5,a0 | addi sp,sp,2032 | sb
zero,0(a0)
addi a4,a4,-2032 | li t0, 4096 | addi sp,sp,32 | addi
sp,sp,2032
add a4,a4,a5 | sb zero,2032(a0) | ret | addi
sp,sp,48
addi a5,sp,16 | addi t0,t0,-2032 | | ret
add a5,a4,a5 | add sp,sp,t0 |
add a0,a5,a0 | ret |
li t0,4096 |
sd a5,8(sp) |
sb zero,2032(a0)|
addi t0,t0,-2016 |
add sp,sp,t0 |
ret |
gcc/ChangeLog:
PR target/105733
* config/riscv/riscv.h: New macros for with aligned offsets.
* config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
function to split a sum of two s12 values into constituents.
(riscv_expand_prologue): Handle offset being sum of two S12.
(riscv_expand_epilogue): Ditto.
* config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr105733.c: New Test.
* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
expect LUI 4096.
* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.
Tested-by: Edwin Lu <ewlu@rivosinc.com> # pre-commit-CI #1568
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug target/105733] riscv: Poor codegen for large stack frames
2022-05-26 0:49 [Bug target/105733] New: riscv: Poor codegen for large stack frames jrtc27 at jrtc27 dot com
` (5 preceding siblings ...)
2024-05-21 17:35 ` cvs-commit at gcc dot gnu.org
@ 2024-05-28 17:15 ` vineetg at gcc dot gnu.org
6 siblings, 0 replies; 8+ messages in thread
From: vineetg at gcc dot gnu.org @ 2024-05-28 17:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733
Vineet Gupta <vineetg at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |FIXED
--- Comment #6 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
Fixed with aforementioned commit for gcc-15.
^ permalink raw reply [flat|nested] 8+ messages in thread