public inbox for gcc-bugs@sourceware.org
* [Bug c/113162] New: RISC-V: Unexpected register spillings in vectorized codes and intrinsic codes that have subregs.
@ 2023-12-28 9:29 juzhe.zhong at rivai dot ai
2023-12-28 9:32 ` [Bug c/113162] " juzhe.zhong at rivai dot ai
0 siblings, 1 reply; 2+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-12-28 9:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113162
Bug ID: 113162
Summary: RISC-V: Unexpected register spillings in vectorized
codes and intrinsic codes that have subregs.
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: juzhe.zhong at rivai dot ai
Target Milestone: ---
The following test case:

int f[12][100];

void foo (int v)
{
  for (int r = 0; r < 100; r += 4)
    {
      int i = r + 1;
      f[0][r] = f[1][r] * (f[2][r] + v) - f[1][i] * (f[2][i]);
      f[0][i] = f[1][r] * (f[2][i]) + f[1][i] * (f[2][r] + v);
      f[0][r+2] = f[1][r+2] * (f[2][r+2] + v) - f[1][i+2] * (f[2][i+2]);
      f[0][i+2] = f[1][r+2] * (f[2][i+2]) + f[1][i+2] * (f[2][r+2] + v);
    }
}
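For reference, here is an illustrative scalar sketch (in Python, not part of the test case) of what the loop body computes. Treating (f[1][k], f[1][k+1]) and (f[2][k] + v, f[2][k+1]) as real/imaginary pairs, each pair of stores into f[0] is a complex multiplication; the body is that pattern unrolled by two.

```python
# Illustrative scalar model of the C loop above.  The two store pairs per
# iteration are a complex multiply, unrolled by a factor of two.
def foo(f, v):
    for r in range(0, 100, 4):
        for off in (0, 2):              # unrolled offsets 0 and 2
            a, b = r + off, r + off + 1
            # real part: re1*(re2+v) - im1*im2
            f[0][a] = f[1][a] * (f[2][a] + v) - f[1][b] * f[2][b]
            # imaginary part: re1*im2 + im1*(re2+v)
            f[0][b] = f[1][a] * f[2][b] + f[1][b] * (f[2][a] + v)
```

This interleaved real/imaginary access pattern is why the vectorizer uses segment loads/stores (vlseg4e32.v/vsseg4e32.v) below.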
With dynamic LMUL, GCC chooses LMUL = 2 and generates the following vectorized code:
lui a5,%hi(f)
addi a5,a5,%lo(f)
addi a3,a5,800
addi a4,a5,400
vsetivli zero,8,e32,m2,ta,ma
addi sp,sp,-32
vlseg4e32.v v8,(a4)
vlseg4e32.v v16,(a3)
vmv.v.x v2,a0
vadd.vv v6,v2,v16
vmul.vv v24,v6,v10
vmul.vv v6,v6,v8
vs2r.v v24,0(sp)
addi a3,a5,928
vmv.v.v v24,v18
vnmsub.vv v24,v10,v6
addi a4,a5,528
vl2re32.v v6,0(sp)
vmacc.vv v6,v18,v8
vadd.vv v4,v2,v20
vmv2r.v v26,v6
vmul.vv v0,v4,v12
vmul.vv v4,v4,v14
vmv.v.v v28,v22
vnmsub.vv v28,v14,v0
vmv.v.v v30,v4
vmacc.vv v30,v22,v12
vsseg4e32.v v24,(a5)
vlseg4e32.v v8,(a4)
vlseg4e32.v v16,(a3)
vadd.vv v6,v2,v16
vmul.vv v24,v6,v10
vmul.vv v6,v6,v8
vs2r.v v24,0(sp)
addi a6,a5,128
vmv.v.v v24,v18
vnmsub.vv v24,v10,v6
addi a0,a5,1056
vl2re32.v v6,0(sp)
addi a1,a5,656
vmacc.vv v6,v18,v8
vadd.vv v4,v2,v20
vmv2r.v v26,v6
vmul.vv v0,v4,v12
vmul.vv v4,v4,v14
vmv.v.v v28,v22
vnmsub.vv v28,v14,v0
vmv.v.v v30,v4
vmacc.vv v30,v22,v12
vsseg4e32.v v24,(a6)
vlseg4e32.v v8,(a1)
vlseg4e32.v v16,(a0)
vadd.vv v6,v2,v16
vmul.vv v24,v6,v10
vmul.vv v6,v6,v8
vs2r.v v24,0(sp)
vadd.vv v4,v2,v20
vmv.v.v v24,v18
vnmsub.vv v24,v10,v6
vmul.vv v0,v4,v12
vl2re32.v v6,0(sp)
vmv.v.v v28,v22
vnmsub.vv v28,v14,v0
vmacc.vv v6,v18,v8
vmul.vv v4,v4,v14
vmv2r.v v26,v6
vmv.v.v v30,v4
vmacc.vv v30,v22,v12
addi a2,a5,256
addi a3,a5,1184
addi a4,a5,784
addi a5,a5,384
vsseg4e32.v v24,(a2)
vsetivli zero,1,e32,m2,ta,ma
vlseg4e32.v v8,(a4)
vlseg4e32.v v16,(a3)
vadd.vv v4,v2,v16
vadd.vv v2,v2,v20
vmul.vv v0,v4,v10
vmul.vv v6,v2,v12
vmul.vv v4,v4,v8
vmul.vv v2,v2,v14
vmv.v.v v24,v10
vnmsub.vv v24,v18,v4
vmv.v.v v26,v0
vmacc.vv v26,v8,v18
vmv.v.v v28,v14
vnmsub.vv v28,v22,v6
vmv.v.v v30,v2
vmacc.vv v30,v12,v22
vsseg4e32.v v24,(a5)
addi sp,sp,32
jr ra
There are redundant spills (vs2r.v and vl2re32.v), which cause worse
performance on real hardware compared with the default LMUL (LMUL = 1).
After investigation, I find this is not a dynamic LMUL cost model issue.
The dynamic LMUL cost model works well and chooses the ideal LMUL = 2
for this case.
The spills are redundant because we lack subreg liveness tracking in
IRA/LRA, so the register allocator considers this situation to have many
allocno conflicts.
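To illustrate the effect (a hand-rolled sketch, not GCC's actual IRA/LRA code, with made-up live ranges): if liveness is tracked only per whole register, a tuple register whose low half dies early is still considered live in full, so it conflicts with every later value and raises register pressure.

```python
# Sketch: conflict counting with whole-register vs. subreg liveness.
# Live ranges are half-open [start, end) program points; all names and
# ranges here are invented for illustration.
from itertools import combinations

def conflicts(live_ranges):
    """Count pairs of values whose live ranges overlap."""
    count = 0
    for (_, (s1, e1)), (_, (s2, e2)) in combinations(live_ranges.items(), 2):
        if s1 < e2 and s2 < e1:         # intervals overlap
            count += 1
    return count

# A tuple value T occupies two halves.  Suppose T.lo is last used at
# point 3 while T.hi stays live until point 9.
subreg_view = {"T.lo": (0, 3), "T.hi": (0, 9), "A": (4, 7), "B": (5, 8)}
# Whole-register tracking must keep both halves live until point 9.
whole_view  = {"T.lo": (0, 9), "T.hi": (0, 9), "A": (4, 7), "B": (5, 8)}

print(conflicts(subreg_view))   # 4: T.lo is dead before A and B start
print(conflicts(whole_view))    # 6: two extra conflicts -> more spills
```

With subreg tracking, T.lo's conflicts with A and B disappear, which is exactly the kind of pressure reduction that removes the vs2r.v/vl2re32.v pair above.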
I confirmed that the following series (Lehua's subreg liveness patches)
fixes this issue perfectly:
https://patchwork.ozlabs.org/project/gcc/list/?series=381823
The resulting code:
vsetivli zero,8,e32,m2,ta,ma
vmv.v.x v2,a0
lui a5,%hi(f)
addi a5,a5,%lo(f)
addi a4,a5,400
vlseg4e32.v v8,(a4)
addi a4,a5,800
vlseg4e32.v v16,(a4)
vadd.vv v4,v2,v16
vmul.vv v6,v4,v8
vmul.vv v16,v4,v10
vadd.vv v4,v2,v20
vmul.vv v20,v4,v12
vmul.vv v4,v4,v14
vmv.v.v v24,v18
vnmsub.vv v24,v10,v6
vmv.v.v v26,v16
vmacc.vv v26,v18,v8
vmv.v.v v28,v22
vnmsub.vv v28,v14,v20
vmv.v.v v30,v4
vmacc.vv v30,v22,v12
vsseg4e32.v v24,(a5)
addi a4,a5,528
vlseg4e32.v v8,(a4)
addi a4,a5,928
vlseg4e32.v v16,(a4)
vadd.vv v4,v2,v16
vmul.vv v6,v4,v8
vmul.vv v16,v4,v10
vadd.vv v4,v2,v20
vmul.vv v20,v4,v12
vmul.vv v4,v4,v14
vmv.v.v v24,v18
vnmsub.vv v24,v10,v6
vmv.v.v v26,v16
vmacc.vv v26,v18,v8
vmv.v.v v28,v22
vnmsub.vv v28,v14,v20
vmv.v.v v30,v4
vmacc.vv v30,v22,v12
addi a4,a5,128
vsseg4e32.v v24,(a4)
addi a4,a5,656
vlseg4e32.v v8,(a4)
addi a4,a5,1056
vlseg4e32.v v16,(a4)
vadd.vv v4,v2,v16
vmul.vv v6,v4,v8
vmul.vv v16,v4,v10
vadd.vv v4,v2,v20
vmul.vv v20,v4,v12
vmul.vv v4,v4,v14
vmv.v.v v24,v18
vnmsub.vv v24,v10,v6
vmv.v.v v26,v16
vmacc.vv v26,v18,v8
vmv.v.v v28,v22
vnmsub.vv v28,v14,v20
vmv.v.v v30,v4
vmacc.vv v30,v22,v12
addi a4,a5,256
vsseg4e32.v v24,(a4)
addi a4,a5,784
vsetivli zero,1,e32,m2,ta,ma
vlseg4e32.v v8,(a4)
addi a4,a5,1184
vlseg4e32.v v16,(a4)
vadd.vv v4,v2,v16
vmul.vv v6,v4,v8
vmul.vv v4,v4,v10
vadd.vv v2,v2,v20
vmul.vv v16,v2,v12
vmul.vv v2,v2,v14
vmv.v.v v24,v10
vnmsub.vv v24,v18,v6
vmv.v.v v26,v4
vmacc.vv v26,v8,v18
vmv.v.v v28,v14
vnmsub.vv v28,v22,v16
vmv.v.v v30,v2
vmacc.vv v30,v12,v22
addi a5,a5,384
vsseg4e32.v v24,(a5)
ret
* [Bug c/113162] RISC-V: Unexpected register spillings in vectorized codes and intrinsic codes that have subregs.
2023-12-28 9:29 [Bug c/113162] New: RISC-V: Unexpected register spillings in vectorized codes and intrinsic codes that have subregs juzhe.zhong at rivai dot ai
@ 2023-12-28 9:32 ` juzhe.zhong at rivai dot ai
0 siblings, 0 replies; 2+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-12-28 9:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113162
--- Comment #1 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
The following reduced intrinsic code reproduces the subreg issue:
https://godbolt.org/z/KfvT7hjnz
#include "riscv_vector.h"

void foo (void *in, void *out, int x)
{
  vint32m2_t dup = __riscv_vmv_v_x_i32m2 (x, 8);
  vint32m2x4_t tuple1 = __riscv_vlseg4e32_v_i32m2x4 (in, 8);
  vint32m2x4_t tuple2 = __riscv_vlseg4e32_v_i32m2x4 (in + 16, 8);
  vint32m2_t tmp1
    = __riscv_vadd_vv_i32m2 (dup, __riscv_vget_v_i32m2x4_i32m2 (tuple2, 0), 8);
  vint32m2_t tmp2
    = __riscv_vmul_vv_i32m2 (tmp1, __riscv_vget_v_i32m2x4_i32m2 (tuple1, 0), 8);
  tmp1 = __riscv_vmul_vv_i32m2 (tmp1, __riscv_vget_v_i32m2x4_i32m2 (tuple1, 1), 8);
  vint32m2_t tmp3
    = __riscv_vadd_vv_i32m2 (dup, __riscv_vget_v_i32m2x4_i32m2 (tuple2, 2), 8);
  vint32m2_t tmp4
    = __riscv_vmul_vv_i32m2 (tmp3, __riscv_vget_v_i32m2x4_i32m2 (tuple1, 2), 8);
  vint32m2_t tmp9
    = __riscv_vmul_vv_i32m2 (tmp3, __riscv_vget_v_i32m2x4_i32m2 (tuple1, 3), 8);
  vint32m2_t tmp5
    = __riscv_vnmsub_vv_i32m2 (tmp9, __riscv_vget_v_i32m2x4_i32m2 (tuple2, 1),
                               tmp2, 8);
  vint32m2_t tmp6
    = __riscv_vmacc_vv_i32m2 (tmp1, __riscv_vget_v_i32m2x4_i32m2 (tuple1, 0),
                              tmp3, 8);
  vint32m2_t tmp7
    = __riscv_vnmsac_vv_i32m2 (tmp4, __riscv_vget_v_i32m2x4_i32m2 (tuple2, 3),
                               tmp4, 8);
  vint32m2_t tmp8
    = __riscv_vmacc_vv_i32m2 (tmp3, __riscv_vget_v_i32m2x4_i32m2 (tuple2, 3),
                              __riscv_vget_v_i32m2x4_i32m2 (tuple1, 2), 8);
  vint32m2x4_t create = __riscv_vcreate_v_i32m2x4 (tmp5, tmp6, tmp7, tmp8);
  __riscv_vsseg4e32_v_i32m2x4 (out, create, 8);
}
GCC:
foo:
csrr t0,vlenb
slli t1,t0,1
addi a5,a0,16
vsetivli zero,8,e32,m2,ta,ma
sub sp,sp,t1
vlseg4e32.v v8,(a0)
vlseg4e32.v v24,(a5)
vmv.v.x v2,a2
csrr t0,vlenb
vadd.vv v4,v2,v24
vadd.vv v2,v2,v28
vmul.vv v0,v4,v10
vmul.vv v16,v2,v12
vmul.vv v4,v4,v8
vs2r.v v16,0(sp)
slli t1,t0,1
vmul.vv v6,v2,v14
vmv.v.v v18,v0
vmacc.vv v18,v2,v8
vnmsub.vv v6,v26,v4
vmv.v.v v22,v2
vmacc.vv v22,v30,v12
vl2re32.v v4,0(sp)
vmv2r.v v16,v6
vnmsub.vv v4,v30,v4
vmv2r.v v20,v4
vsseg4e32.v v16,(a1)
add sp,sp,t1
jr ra
Clang:
foo: # @foo
vsetivli zero, 8, e32, m2, ta, ma
addi a3, a0, 16
vlseg4e32.v v8, (a3)
vlseg4e32.v v16, (a0)
vmv.v.x v24, a2
vadd.vv v8, v24, v8
vmul.vv v26, v8, v16
vmul.vv v4, v8, v18
vadd.vv v8, v24, v12
vmul.vv v6, v8, v20
vmul.vv v2, v8, v22
vnmsub.vv v2, v10, v26
vmacc.vv v4, v8, v16
vnmsub.vv v6, v14, v6
vmacc.vv v8, v20, v14
vsseg4e32.v v2, (a1)
ret
With lehua's patch:
foo:
addi a5,a0,16
vsetivli zero,8,e32,m2,ta,ma
vlseg4e32.v v8,(a0)
vlseg4e32.v v24,(a5)
vmv.v.x v2,a2
vadd.vv v4,v2,v24
vadd.vv v2,v2,v28
vmul.vv v10,v4,v10
vmul.vv v14,v2,v14
vmul.vv v4,v4,v8
vmul.vv v6,v2,v12
vmv.v.v v16,v14
vnmsub.vv v16,v26,v4
vmv.v.v v18,v10
vmacc.vv v18,v2,v8
vmv.v.v v20,v6
vnmsub.vv v20,v30,v6
vmv.v.v v22,v2
vmacc.vv v22,v30,v12
vsseg4e32.v v16,(a1)
ret
No spills, but there are some redundant vmv.v.v instructions, which should be a separate issue.