public inbox for gcc-bugs@sourceware.org
* [Bug c/113112] New: RISC-V: Dynamic LMUL feature stabilization for GCC-14 release
From: juzhe.zhong at rivai dot ai @ 2023-12-22 9:04 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113112
Bug ID: 113112
Summary: RISC-V: Dynamic LMUL feature stabilization for GCC-14 release
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: juzhe.zhong at rivai dot ai
Target Milestone: ---
Created attachment 56922
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56922&action=edit
dynamic LMUL fail case
Hi, as we know, we support the dynamic LMUL feature, but it is not yet stable.
As far as I know, full-coverage testing shows only these 2 execution FAILs:
FAIL: gcc.dg/pr30957-1.c execution test
FAIL: gcc.dg/signbit-5.c execution test
Neither is a real failure; the tests need to be adjusted.
I have tested on the K230 and other hardware: when we select a reasonably large
LMUL (one that causes no additional register spilling), we get over 30%
performance improvement on various benchmarks compared with the default
LMUL = M1.
However, I also found that some benchmarks show a significant performance
drop (compared with the default LMUL = M1) when using dynamic LMUL.
I am fairly sure this is because we pick a wrong, too-large LMUL (LMUL > 1),
which causes additional register spilling and hence bad performance in such
situations.
For example:
#include <stdint-gcc.h>
#define N 40
int a[N];
__attribute__ ((noinline)) int
foo (int n){
  int i,j;
  int sum,x;
  for (i = 0; i < n; i++) {
    sum = 0;
    for (j = 0; j < n; j++) {
      sum += (i + j);
    }
    a[i] = sum;
  }
  return 0;
}
-march=rv64gcv -mabi=lp64d -O3 --param riscv-autovec-lmul=dynamic --param riscv-autovec-preference=fixed-vlmax
ASM:
foo:
ble a0,zero,.L11
lui a2,%hi(.LANCHOR0)
addi sp,sp,-128
addi a2,a2,%lo(.LANCHOR0)
mv a1,a0
vsetvli a6,zero,e32,m8,ta,ma
vid.v v8
vs8r.v v8,0(sp)
.L3:
vl8re32.v v16,0(sp)
vsetvli a4,a1,e8,m2,ta,ma
li a3,0
vsetvli a5,zero,e32,m8,ta,ma
vmv8r.v v0,v16
vmv.v.x v8,a4
vmv.v.i v24,0
vadd.vv v8,v16,v8
vmv8r.v v16,v24
vs8r.v v8,0(sp)
.L4:
addiw a3,a3,1
vadd.vv v8,v0,v16
vadd.vi v16,v16,1
vadd.vv v24,v24,v8
bne a0,a3,.L4
vsetvli zero,a4,e32,m8,ta,ma
sub a1,a1,a4
vse32.v v24,0(a2)
slli a4,a4,2
add a2,a2,a4
bne a1,zero,.L3
li a0,0
addi sp,sp,128
jr ra
.L11:
li a0,0
ret
As we can see, we pick LMUL = 8, which then spills.
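A rough way to see why LMUL = 8 spills here: by our reading of the listing, five vector values are live around the inner loop (the current i vector in v0, the next-iteration i vector that gets spilled to the stack, the j induction vector, the i + j temporary and the running sum). The C sketch below uses a deliberately simplified model that is an assumption, not GCC's cost model: 32 vector registers, each value at LMUL = L occupying L of them, with mask registers and group alignment ignored. RVV_NUM_VREGS, vregs_needed and pick_lmul are hypothetical names.

```c
#include <assert.h>

/* Simplified model, assumed for illustration (not GCC's cost model):
   RVV has 32 architectural vector registers (v0-v31) and a value
   computed at LMUL = L occupies a group of L registers.  Mask registers
   and register-group alignment are ignored.  */
#define RVV_NUM_VREGS 32

/* Registers needed when `live_values` vector values are live at the
   same point, each in its own LMUL-sized group.  */
int
vregs_needed (int live_values, int lmul)
{
  assert (lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
  return live_values * lmul;
}

/* Hypothetical selection rule: the largest LMUL in {8, 4, 2} whose
   register demand fits in the register file, falling back to 1.  */
int
pick_lmul (int live_values)
{
  for (int lmul = 8; lmul > 1; lmul /= 2)
    if (vregs_needed (live_values, lmul) <= RVV_NUM_VREGS)
      return lmul;
  return 1;
}
```

Under this model, pick_lmul (5) returns 4: five values need 5 * 8 = 40 registers at LMUL = 8 but only 20 at LMUL = 4, which is consistent with the M4 code that the first fix in this thread produces.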
I found this case with the following check, which I added to the mov pattern:
if (known_gt (GET_MODE_SIZE (mode), BYTES_PER_RISCV_VECTOR)
    && riscv_autovec_lmul == RVV_DYNAMIC && lra_in_progress)
  gcc_unreachable ();
The attachment lists the cases where we pick an incorrect, too-large LMUL that
causes additional spilling.
I will work on this issue in the coming days.
* [Bug target/113112] RISC-V: Dynamic LMUL feature stabilization for GCC-14 release
From: cvs-commit at gcc dot gnu.org @ 2023-12-23 0:59 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113112
--- Comment #1 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:
https://gcc.gnu.org/g:290230034092898981488d0716ddae43bd36c09f
commit r14-6810-g290230034092898981488d0716ddae43bd36c09f
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date: Sat Dec 23 07:07:42 2023 +0800
RISC-V: Make PHI initial value occupy live V_REG in dynamic LMUL cost model analysis
Consider the following generated code:
foo:
ble a0,zero,.L11
lui a2,%hi(.LANCHOR0)
addi sp,sp,-128
addi a2,a2,%lo(.LANCHOR0)
mv a1,a0
vsetvli a6,zero,e32,m8,ta,ma
vid.v v8
vs8r.v v8,0(sp) ---> spill
.L3:
vl8re32.v v16,0(sp) ---> reload
vsetvli a4,a1,e8,m2,ta,ma
li a3,0
vsetvli a5,zero,e32,m8,ta,ma
vmv8r.v v0,v16
vmv.v.x v8,a4
vmv.v.i v24,0
vadd.vv v8,v16,v8
vmv8r.v v16,v24
vs8r.v v8,0(sp) ---> spill
.L4:
addiw a3,a3,1
vadd.vv v8,v0,v16
vadd.vi v16,v16,1
vadd.vv v24,v24,v8
bne a0,a3,.L4
vsetvli zero,a4,e32,m8,ta,ma
sub a1,a1,a4
vse32.v v24,0(a2)
slli a4,a4,2
add a2,a2,a4
bne a1,zero,.L3
li a0,0
addi sp,sp,128
jr ra
.L11:
li a0,0
ret
We pick an unexpected LMUL = 8.
The root cause is that we didn't include the PHI initial value in the dynamic
LMUL calculation:
# j_17 = PHI <j_11(9), 0(5)>  --->
# vect_vec_iv_.8_24 = PHI <_25(9), { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }(5)>
We didn't count the all-zero initial vector as consuming vector registers, but
a vector register group is allocated for it.
This patch adds the missing count. After this patch, we pick the perfect
LMUL (LMUL = M4):
foo:
ble a0,zero,.L9
lui a4,%hi(.LANCHOR0)
addi a4,a4,%lo(.LANCHOR0)
mv a2,a0
vsetivli zero,16,e32,m4,ta,ma
vid.v v20
.L3:
vsetvli a3,a2,e8,m1,ta,ma
li a5,0
vsetivli zero,16,e32,m4,ta,ma
vmv4r.v v16,v20
vmv.v.i v12,0
vmv.v.x v4,a3
vmv4r.v v8,v12
vadd.vv v20,v20,v4
.L4:
addiw a5,a5,1
vmv4r.v v4,v8
vadd.vi v8,v8,1
vadd.vv v4,v16,v4
vadd.vv v12,v12,v4
bne a0,a5,.L4
slli a5,a3,2
vsetvli zero,a3,e32,m4,ta,ma
sub a2,a2,a3
vse32.v v12,0(a4)
add a4,a4,a5
bne a2,zero,.L3
.L9:
li a0,0
ret
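The undercount described above can be illustrated with a toy interval model. This is a sketch under stated assumptions, not the code in riscv-vector-costs.cc: each vector value is treated as live on a half-open range of program points, and struct live_range and max_live_values are hypothetical. Skipping a constant PHI argument (such as the all-zero initial vector) lowers the estimated peak by one group; if, say, four other values are live, the estimate drops from 5 * 8 = 40 registers at LMUL = 8, which cannot fit in 32, to 4 * 8 = 32, which looks feasible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: a vector value is live on the half-open
   range [def, last_use) of program points.  */
struct live_range
{
  int def;
  int last_use;
  bool is_phi_initial_constant; /* e.g. an all-zero PHI argument */
};

/* Peak number of simultaneously live vector values.  When
   count_phi_initial_values is false, constant PHI initial values are
   skipped, mimicking the undercount described in the commit.  */
int
max_live_values (const struct live_range *ranges, int n, int num_points,
                 bool count_phi_initial_values)
{
  assert (n >= 0 && num_points >= 0);
  int peak = 0;
  for (int p = 0; p < num_points; p++)
    {
      int live = 0;
      for (int i = 0; i < n; i++)
        {
          if (ranges[i].is_phi_initial_constant
              && !count_phi_initial_values)
            continue;
          if (ranges[i].def <= p && p < ranges[i].last_use)
            live++;
        }
      if (live > peak)
        peak = live;
    }
  return peak;
}
```

For example, with four ordinary values overlapping plus one constant PHI initial value, max_live_values returns 4 when constants are skipped and 5 when they are counted.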
Tested with --with-arch=gcv; no regressions.
PR target/113112
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (max_number_of_live_regs):
Refine dump information.
(preferred_new_lmul_p): Include PHI initial values in the live regs
calculation.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: New test.
* [Bug target/113112] RISC-V: Dynamic LMUL feature stabilization for GCC-14 release
From: cvs-commit at gcc dot gnu.org @ 2023-12-26 9:29 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113112
--- Comment #2 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:
https://gcc.gnu.org/g:f83cfb8148bcf0876df76761a9a4545bc939667d
commit r14-6836-gf83cfb8148bcf0876df76761a9a4545bc939667d
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date: Tue Dec 26 16:42:27 2023 +0800
RISC-V: Some minor tweaks on dynamic LMUL cost model
Tweak some of the dynamic LMUL cost model code to make its computation more
predictable and accurate.
Tested on both RV32 and RV64 with no regressions.
Committed.
PR target/113112
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (compute_estimated_lmul):
Tweak LMUL estimation.
(has_unexpected_spills_p): Ditto.
(costs::record_potential_unexpected_spills): Ditto.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c: Add more
checks.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-11.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-12.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/pr113112-2.c: New test.
* [Bug target/113112] RISC-V: Dynamic LMUL feature stabilization for GCC-14 release
From: cvs-commit at gcc dot gnu.org @ 2023-12-27 9:19 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113112
--- Comment #3 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:
https://gcc.gnu.org/g:c4ac073d4fc7474e29d085bbd10971138ee7478e
commit r14-6850-gc4ac073d4fc7474e29d085bbd10971138ee7478e
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date: Wed Dec 27 16:16:41 2023 +0800
RISC-V: Make known NITERS loop be aware of dynamic lmul cost model liveness information
Consider the following case:
int f[12][100];
void bad1(int v1, int v2)
{
  for (int r = 0; r < 100; r += 4)
    {
      int i = r + 1;
      f[0][r] = f[1][r] * (f[2][r]) - f[1][i] * (f[2][i]);
      f[0][i] = f[1][r] * (f[2][i]) + f[1][i] * (f[2][r]);
      f[0][r+2] = f[1][r+2] * (f[2][r+2]) - f[1][i+2] * (f[2][i+2]);
      f[0][i+2] = f[1][r+2] * (f[2][i+2]) + f[1][i+2] * (f[2][r+2]);
    }
}
We blindly pick LMUL = 8 (VLS):
lui a4,%hi(f)
addi a4,a4,%lo(f)
addi sp,sp,-592
addi a3,a4,800
lui a5,%hi(.LANCHOR0)
vl8re32.v v24,0(a3)
addi a5,a5,%lo(.LANCHOR0)
addi a1,a4,400
addi a3,sp,140
vl8re32.v v16,0(a1)
vl4re16.v v4,0(a5)
addi a7,a5,192
vs4r.v v4,0(a3)
addi t0,a5,64
addi a3,sp,336
li t2,32
addi a2,a5,128
vsetvli a5,zero,e32,m8,ta,ma
vrgatherei16.vv v8,v16,v4
vmul.vv v8,v8,v24
vl8re32.v v0,0(a7)
vs8r.v v8,0(a3)
vmsltu.vx v8,v0,t2
addi a3,sp,12
addi t2,sp,204
vsm.v v8,0(t2)
vl4re16.v v4,0(t0)
vl4re16.v v0,0(a2)
vs4r.v v4,0(a3)
addi t0,sp,336
vrgatherei16.vv v8,v24,v4
addi a3,sp,208
vrgatherei16.vv v24,v16,v0
vs4r.v v0,0(a3)
vmul.vv v8,v8,v24
vlm.v v0,0(t2)
vl8re32.v v24,0(t0)
addi a3,sp,208
vsub.vv v16,v24,v8
addi t6,a4,528
vadd.vv v8,v24,v8
addi t5,a4,928
vmerge.vvm v8,v8,v16,v0
addi t3,a4,128
vs8r.v v8,0(a4)
addi t4,a4,1056
addi t1,a4,656
addi a0,a4,256
addi a6,a4,1184
addi a1,a4,784
addi a7,a4,384
addi a4,sp,140
vl4re16.v v0,0(a3)
vl8re32.v v24,0(t6)
vl4re16.v v4,0(a4)
vrgatherei16.vv v16,v24,v0
addi a3,sp,12
vs8r.v v16,0(t0)
vl8re32.v v8,0(t5)
vrgatherei16.vv v16,v24,v4
vl4re16.v v4,0(a3)
vrgatherei16.vv v24,v8,v4
vmul.vv v16,v16,v8
vl8re32.v v8,0(t0)
vmul.vv v8,v8,v24
vsub.vv v24,v16,v8
vlm.v v0,0(t2)
addi a3,sp,208
vadd.vv v8,v8,v16
vl8re32.v v16,0(t4)
vmerge.vvm v8,v8,v24,v0
vrgatherei16.vv v24,v16,v4
vs8r.v v24,0(t0)
vl4re16.v v28,0(a3)
addi a3,sp,464
vs8r.v v8,0(t3)
vl8re32.v v8,0(t1)
vrgatherei16.vv v0,v8,v28
vs8r.v v0,0(a3)
addi a3,sp,140
vl4re16.v v24,0(a3)
addi a3,sp,464
vrgatherei16.vv v0,v8,v24
vl8re32.v v24,0(t0)
vmv8r.v v8,v0
vl8re32.v v0,0(a3)
vmul.vv v8,v8,v16
vmul.vv v24,v24,v0
vsub.vv v16,v8,v24
vadd.vv v8,v8,v24
vsetivli zero,4,e32,m8,ta,ma
vle32.v v24,0(a6)
vsetvli a4,zero,e32,m8,ta,ma
addi a4,sp,12
vlm.v v0,0(t2)
vmerge.vvm v8,v8,v16,v0
vl4re16.v v16,0(a4)
vrgatherei16.vv v0,v24,v16
vsetivli zero,4,e32,m8,ta,ma
vs8r.v v0,0(a4)
addi a4,sp,208
vl4re16.v v0,0(a4)
vs8r.v v8,0(a0)
vle32.v v16,0(a1)
vsetvli a5,zero,e32,m8,ta,ma
vrgatherei16.vv v8,v16,v0
vs8r.v v8,0(a4)
addi a4,sp,140
vl4re16.v v4,0(a4)
addi a5,sp,12
vrgatherei16.vv v8,v16,v4
vl8re32.v v0,0(a5)
vsetivli zero,4,e32,m8,ta,ma
addi a5,sp,208
vmv8r.v v16,v8
vl8re32.v v8,0(a5)
vmul.vv v24,v24,v16
vmul.vv v8,v0,v8
vsub.vv v16,v24,v8
vadd.vv v8,v8,v24
vsetvli a5,zero,e8,m2,ta,ma
vlm.v v0,0(t2)
vsetivli zero,4,e32,m8,ta,ma
vmerge.vvm v8,v8,v16,v0
vse32.v v8,0(a7)
addi sp,sp,592
jr ra
This patch makes loops with known NITERS aware of the liveness estimate.
After this patch, we choose LMUL = 4:
lui a5,%hi(f)
addi a5,a5,%lo(f)
addi a3,a5,400
addi a4,a5,800
vsetivli zero,8,e32,m2,ta,ma
vlseg4e32.v v16,(a3)
vlseg4e32.v v8,(a4)
vmul.vv v2,v8,v16
addi a3,a5,528
vmv.v.v v24,v10
vnmsub.vv v24,v18,v2
addi a4,a5,928
vmul.vv v2,v12,v22
vmul.vv v6,v8,v18
vmv.v.v v30,v2
vmacc.vv v30,v14,v20
vmv.v.v v26,v6
vmacc.vv v26,v10,v16
vmul.vv v4,v12,v20
vmv.v.v v28,v14
vnmsub.vv v28,v22,v4
vsseg4e32.v v24,(a5)
vlseg4e32.v v16,(a3)
vlseg4e32.v v8,(a4)
vmul.vv v2,v8,v16
addi a6,a5,128
vmv.v.v v24,v10
vnmsub.vv v24,v18,v2
addi a0,a5,656
vmul.vv v2,v12,v22
addi a1,a5,1056
vmv.v.v v30,v2
vmacc.vv v30,v14,v20
vmul.vv v6,v8,v18
vmul.vv v4,v12,v20
vmv.v.v v26,v6
vmacc.vv v26,v10,v16
vmv.v.v v28,v14
vnmsub.vv v28,v22,v4
vsseg4e32.v v24,(a6)
vlseg4e32.v v16,(a0)
vlseg4e32.v v8,(a1)
vmul.vv v2,v8,v16
addi a2,a5,256
vmv.v.v v24,v10
vnmsub.vv v24,v18,v2
addi a3,a5,784
vmul.vv v2,v12,v22
addi a4,a5,1184
vmv.v.v v30,v2
vmacc.vv v30,v14,v20
vmul.vv v6,v8,v18
vmul.vv v4,v12,v20
vmv.v.v v26,v6
vmacc.vv v26,v10,v16
vmv.v.v v28,v14
vnmsub.vv v28,v22,v4
addi a5,a5,384
vsseg4e32.v v24,(a2)
vsetivli zero,1,e32,m2,ta,ma
vlseg4e32.v v16,(a3)
vlseg4e32.v v8,(a4)
vmul.vv v2,v16,v8
vmul.vv v6,v18,v8
vmv.v.v v24,v18
vnmsub.vv v24,v10,v2
vmul.vv v4,v20,v12
vmul.vv v2,v22,v12
vmv.v.v v26,v6
vmacc.vv v26,v16,v10
vmv.v.v v28,v22
vnmsub.vv v28,v14,v4
vmv.v.v v30,v2
vmacc.vv v30,v20,v14
vsseg4e32.v v24,(a5)
ret
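Assuming VLEN = 128 (the minimum the V extension guarantees), the shape of the LMUL = 8 listing can be checked by hand: an e32, m8 register group holds 128 / 32 * 8 = 32 elements, so a 100-element row of f splits into three full groups plus a 4-element tail, which lines up with the vsetivli zero,4,e32,m8 instructions handling the last elements. The cap_lmul helper is only a hypothetical illustration of the patch's idea (a known trip count should not, on its own, justify an LMUL that the liveness estimate rules out); none of these functions are GCC's algorithm.

```c
#include <assert.h>

/* Assumptions for illustration: VLEN and SEW are in bits, and a register
   group at LMUL = L holds (VLEN / SEW) * L elements.  */
int
elements_per_group (int vlen_bits, int sew_bits, int lmul)
{
  assert (sew_bits > 0 && vlen_bits % sew_bits == 0);
  return vlen_bits / sew_bits * lmul;
}

/* Split a known trip count into full register groups plus a tail.  */
void
split_known_niters (int niters, int per_group, int *full, int *tail)
{
  assert (per_group > 0);
  *full = niters / per_group;
  *tail = niters % per_group;
}

/* Hypothetical liveness-aware cap: never exceed the largest LMUL that
   the estimated number of live vector values allows.  */
int
cap_lmul (int candidate_lmul, int max_lmul_by_liveness)
{
  return candidate_lmul < max_lmul_by_liveness
         ? candidate_lmul : max_lmul_by_liveness;
}
```

With these assumptions, split_known_niters (100, 32, &full, &tail) yields full = 3 and tail = 4, and a liveness limit of 4 turns a candidate LMUL of 8 into 4.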
Tested on both RV32 and RV64 with no regressions.
PR target/113112
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (is_gimple_assign_or_call):
New function.
(get_first_lane_point): Ditto.
(get_last_lane_point): Ditto.
(max_number_of_live_regs): Refine live point dump.
(compute_estimated_lmul): Make unknown NITERS loop be aware of
liveness.
(costs::better_main_loop_than_p): Ditto.
* config/riscv/riscv-vector-costs.h (struct stmt_point): Add new
member.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c:
* gcc.dg/vect/costmodel/riscv/rvv/pr113112-3.c: New test.
* [Bug target/113112] RISC-V: Dynamic LMUL feature stabilization for GCC-14 release
From: cvs-commit at gcc dot gnu.org @ 2024-01-02 0:23 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113112
--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:
https://gcc.gnu.org/g:9a29b00365a07745c4ba2ed2af374e7c732aaeb3
commit r14-6877-g9a29b00365a07745c4ba2ed2af374e7c732aaeb3
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date: Fri Dec 29 09:21:02 2023 +0800
RISC-V: Count pointer type SSA into RVV regs liveness for dynamic LMUL cost model
This patch fixes the following case, where we choose an unexpectedly large
LMUL that causes register spilling.
Before this patch, we choose LMUL = 4:
addi sp,sp,-160
addiw t1,a2,-1
li a5,7
bleu t1,a5,.L16
vsetivli zero,8,e64,m4,ta,ma
vmv.v.x v4,a0
vs4r.v v4,0(sp) ---> spill to the stack.
vmv.v.x v4,a1
addi a5,sp,64
vs4r.v v4,0(a5) ---> spill to the stack.
The root cause is the following code:
if (poly_int_tree_p (var)
    || (is_gimple_val (var)
        && !POINTER_TYPE_P (TREE_TYPE (var))))
We count the variable as consuming an RVV register group when it is not of
POINTER_TYPE.
This is correct for load/store statements, for example:
_1 = (MEM)*addr --> addr won't be allocated an RVV vector group.
However, it is not correct for non-load/store statements:
_3 = _1 == x_8(D);
_1 is of pointer type too, but we do allocate an RVV register group for it.
So after this patch, we choose the perfect LMUL for the testcase in this
patch:
ble a2,zero,.L17
addiw a7,a2,-1
li a5,3
bleu a7,a5,.L15
srliw a5,a7,2
slli a6,a5,1
add a6,a6,a5
lui a5,%hi(replacements)
addi t1,a5,%lo(replacements)
slli a6,a6,5
lui t4,%hi(.LANCHOR0)
lui t3,%hi(.LANCHOR0+8)
lui a3,%hi(.LANCHOR0+16)
lui a4,%hi(.LC1)
vsetivli zero,4,e16,mf2,ta,ma
addi t4,t4,%lo(.LANCHOR0)
addi t3,t3,%lo(.LANCHOR0+8)
addi a3,a3,%lo(.LANCHOR0+16)
addi a4,a4,%lo(.LC1)
add a6,t1,a6
addi a5,a5,%lo(replacements)
vle16.v v18,0(t4)
vle16.v v17,0(t3)
vle16.v v16,0(a3)
vmsgeu.vi v25,v18,4
vadd.vi v24,v18,-4
vmsgeu.vi v23,v17,4
vadd.vi v22,v17,-4
vlm.v v21,0(a4)
vmsgeu.vi v20,v16,4
vadd.vi v19,v16,-4
vsetvli zero,zero,e64,m2,ta,mu
vmv.v.x v12,a0
vmv.v.x v14,a1
.L4:
vlseg3e64.v v6,(a5)
vmseq.vv v2,v6,v12
vmseq.vv v0,v8,v12
vmsne.vv v1,v8,v12
vmand.mm v1,v1,v2
vmerge.vvm v2,v8,v14,v0
vmv1r.v v0,v1
addi a4,a5,24
vmerge.vvm v6,v6,v14,v0
vmerge.vim v2,v2,0,v0
vrgatherei16.vv v4,v6,v18
vmv1r.v v0,v25
vrgatherei16.vv v4,v2,v24,v0.t
vs1r.v v4,0(a5)
addi a3,a5,48
vmv1r.v v0,v21
vmv2r.v v4,v2
vcompress.vm v4,v6,v0
vs1r.v v4,0(a4)
vmv1r.v v0,v23
addi a4,a5,72
vrgatherei16.vv v4,v6,v17
vrgatherei16.vv v4,v2,v22,v0.t
vs1r.v v4,0(a3)
vmv1r.v v0,v20
vrgatherei16.vv v4,v6,v16
addi a5,a5,96
vrgatherei16.vv v4,v2,v19,v0.t
vs1r.v v4,0(a4)
bne a6,a5,.L4
No spills, and the "sp" register is not used.
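The distinction the commit draws can be written as a small predicate. The sketch below is a hypothetical model, not GCC's compute_nregs_for_mode: struct operand_info and its two flags are invented for illustration. It contrasts the old rule (pointer-typed values never consume an RVV register group) with the behaviour the commit describes (a pointer used only as a load/store address needs no vector register, but a pointer operand of another statement, such as the comparison _3 = _1 == x_8(D), does).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a scalar SSA operand of a vectorized
   statement; the fields are invented for illustration.  */
struct operand_info
{
  bool is_pointer;       /* like POINTER_TYPE_P (TREE_TYPE (var)) */
  bool only_mem_address; /* used only as a load/store address */
};

/* Old rule, as the commit describes it: pointer-typed values never
   consume an RVV register group.  */
bool
needs_vreg_group_old (struct operand_info op)
{
  return !op.is_pointer;
}

/* Rule after the fix, as described: a pointer used only as a load/store
   address still needs no vector register, but a pointer operand of any
   other statement (e.g. a vector comparison) does.  */
bool
needs_vreg_group_new (struct operand_info op)
{
  /* In this model only pointers can serve as memory addresses.  */
  assert (op.is_pointer || !op.only_mem_address);
  return !op.is_pointer || !op.only_mem_address;
}
```

The only case where the two rules disagree is a pointer that is not merely an address, which is exactly the operand whose broadcast (vmv.v.x) was spilled in the "before" listing.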
Tested on both RV32 and RV64 with no regressions.
OK for trunk?
PR target/113112
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (compute_nregs_for_mode): Fix
pointer type liveness count.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr113112-4.c: New test.
* [Bug target/113112] RISC-V: Dynamic LMUL feature stabilization for GCC-14 release
From: cvs-commit at gcc dot gnu.org @ 2024-01-03 9:21 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113112
--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:
https://gcc.gnu.org/g:a43bd8255451227fc1cd3601b1f0265b21fafada
commit r14-6889-ga43bd8255451227fc1cd3601b1f0265b21fafada
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date: Tue Jan 2 11:37:43 2024 +0800
RISC-V: Make liveness be aware of rgroup number of LENS[dynamic LMUL]
This patch fixes the following situation:
vl4re16.v v12,0(a5)
...
vl4re16.v v16,0(a3)
vs4r.v v12,0(a5)
...
vl4re16.v v4,0(a0)
vs4r.v v16,0(a3)
...
vsetvli a3,zero,e16,m4,ta,ma
...
vmv.v.x v8,t6
vmsgeu.vv v2,v16,v8
vsub.vv v16,v16,v8
vs4r.v v16,0(a5)
...
vs4r.v v4,0(a0)
vmsgeu.vv v1,v4,v8
...
vsub.vv v4,v4,v8
slli a6,a4,2
vs4r.v v4,0(a5)
...
vsub.vv v4,v12,v8
vmsgeu.vv v3,v12,v8
vs4r.v v4,0(a5)
...
There are many spills (the 'vs4r.v' instructions). The root cause is that we
don't account for the rgroup controls when counting vector register liveness.
The statement _29 = _25->iatom[0]; is transformed into the following vector
statements with 4 different loop_lens (loop_len_74, loop_len_75, loop_len_76,
loop_len_77):
vect__29.11_78 = .MASK_LEN_LOAD (vectp_sb.9_72, 32B, { -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, loop_len_74, 0);
vect__29.12_80 = .MASK_LEN_LOAD (vectp_sb.9_79, 32B, { -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, loop_len_75, 0);
vect__29.13_82 = .MASK_LEN_LOAD (vectp_sb.9_81, 32B, { -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, loop_len_76, 0);
vect__29.14_84 = .MASK_LEN_LOAD (vectp_sb.9_83, 32B, { -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, loop_len_77, 0);
The number of such lengths is the LENS number
(LOOP_VINFO_LENS (loop_vinfo).length ()).
This patch counts liveness according to LOOP_VINFO_LENS (loop_vinfo).length ()
to compute it more accurately:
vsetivli zero,8,e16,m1,ta,ma
vmsgeu.vi v19,v14,8
vadd.vi v18,v14,-8
vmsgeu.vi v17,v1,8
vadd.vi v16,v1,-8
vlm.v v15,0(a5)
...
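The missing factor is easy to quantify under a simplified model with 32 vector registers, which is an assumption for illustration: if one scalar value is split across the four lengths loop_len_74 to loop_len_77, counting it once at, for example, LMUL = 4 estimates 4 registers where 4 * 4 = 16 are occupied. The helpers below are hypothetical and only show multiplying by the number of lengths (LOOP_VINFO_LENS (loop_vinfo).length () in the commit); they are not the actual code in riscv-vector-costs.cc.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed register file size for this illustration: v0-v31.  */
#define RVV_NUM_VREGS 32

/* Registers occupied by one scalar value that is vectorized into
   `num_lens` vector statements (one per loop_len), each producing a
   register group of `lmul` registers.  */
int
nregs_for_value (int lmul, int num_lens)
{
  assert (lmul >= 1 && num_lens >= 1);
  return lmul * num_lens;
}

/* True if `live_values` such values fit in the register file at once.  */
bool
fits_in_vregs (int live_values, int lmul, int num_lens)
{
  return live_values * nregs_for_value (lmul, num_lens) <= RVV_NUM_VREGS;
}
```

With num_lens = 1 (ignoring the rgroup, in this model) three such values appear to fit in 12 registers; with four lengths counted they need 48, more than the 32 available, which is consistent with the spills shown above.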
Tested with no regressions. OK for trunk?
PR target/113112
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (compute_nregs_for_mode): Add
rgroup info.
(max_number_of_live_regs): Ditto.
(has_unexpected_spills_p): Ditto.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr113112-5.c: New test.