public inbox for gcc-bugs@sourceware.org
* [Bug c/108270] New: un-optimal vsetvl for multi-loop if avl is 0 ~ 31 immediate
@ 2023-01-03 1:51 juzhe.zhong at rivai dot ai
2023-01-03 1:52 ` [Bug c/108270] " juzhe.zhong at rivai dot ai
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-01-03 1:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270
Bug ID: 108270
Summary: un-optimal vsetvl for multi-loop if avl is 0 ~ 31
immediate
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: juzhe.zhong at rivai dot ai
Target Milestone: ---
Consider the following case:
#include "riscv_vector.h"
void f1 (void * restrict in, void * restrict out, int l, int n, int m)
{
  for (int i = 0; i < l; i++){
    for (int j = 0; j < m; j++){
      for (int k = 0; k < n; k++)
        {
          vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17);
          __riscv_vse8_v_i8mf8 (out + i + j, v, 17);
        }
    }
  }
}
GCC ASM:
f1:
0: mv a7,a2
1: mv a6,a0
2: mv t1,a1
3: mv a2,a3
4: vsetivli zero,17,e8,mf8,ta,ma
5: ble a7,zero,.L1
6: ble a4,zero,.L1
7: ble a3,zero,.L1
8: add a1,a0,a4
9: li a0,0
10:.L4:
11: add a3,a6,a0
12: add a4,t1,a0
13:.L7:
14: li a5,0
15:.L5:
16: vle8.v v24,0(a3)
17: addiw a5,a5,1
18: vse8.v v24,0(a4)
19: bne a2,a5,.L5
20: addi a3,a3,1
21: addi a4,a4,1
22: bne a3,a1,.L7
23: addi a0,a0,1
24: addi a1,a1,1
25: bne a0,a7,.L4
26:.L1:
27: ret
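The vsetivli instruction is hoisted too early: at line 4 it sits ahead of the
three ble guards (lines 5 to 7), so it executes even when a loop bound is
non-positive and no vector instruction runs. The best location for vsetivli is
any point from line 8 to line 9, after the guards and before the outermost loop.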
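As a rough illustration of the cost difference, the following C model counts how often vsetivli would execute for each placement. It is only a sketch: the function names are hypothetical and it models nothing but the three ble guards from the listing above.

```c
#include <stdbool.h>

/* Models the three "ble ...,zero,.L1" guards (lines 5 to 7):
   the loop nest runs only if every bound is positive.  */
bool loops_execute_p (int l, int n, int m)
{
  return l > 0 && m > 0 && n > 0;
}

/* vsetivli placed before the guards (line 4): it always executes once,
   whatever the loop bounds are.  */
int vsetivli_count_hoisted (int l, int n, int m)
{
  (void) l; (void) n; (void) m;
  return 1;
}

/* vsetivli placed after the guards (lines 8 to 9): it executes once,
   and only when the loop nest is actually entered.  */
int vsetivli_count_deferred (int l, int n, int m)
{
  return loops_execute_p (l, n, m) ? 1 : 0;
}
```

Both placements execute the instruction at most once per call, so the deferred placement never costs more and saves the instruction whenever a guard branches to .L1.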
* [Bug c/108270] un-optimal vsetvl for multi-loop if avl is 0 ~ 31 immediate
From: juzhe.zhong at rivai dot ai @ 2023-01-03 1:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270
JuzheZhong <juzhe.zhong at rivai dot ai> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |juzhe.zhong at rivai dot ai
--- Comment #1 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
This is a trivial issue. I will fix it later.
* [Bug target/108270] un-optimal vsetvl for multi-loop if avl is 0 ~ 31 immediate
From: cvs-commit at gcc dot gnu.org @ 2023-04-21 9:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270
--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Kito Cheng <kito@gcc.gnu.org>:
https://gcc.gnu.org/g:d06e9264b0192c2c77e07d7fb0fe090efcb510c0
commit r14-135-gd06e9264b0192c2c77e07d7fb0fe090efcb510c0
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date: Fri Apr 21 17:19:12 2023 +0800
RISC-V: Defer vsetvli insertion to later if possible [PR108270]
Fix issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270.
Consider the following testcase:
#include "riscv_vector.h"
void f (void * restrict in, void * restrict out, int l, int n, int m)
{
  for (int i = 0; i < l; i++){
    for (int j = 0; j < m; j++){
      for (int k = 0; k < n; k++)
        {
          vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17);
          __riscv_vse8_v_i8mf8 (out + i + j, v, 17);
        }
    }
  }
}
Compile option: -O3
Before this patch:
mv a7,a2
mv a6,a0
mv t1,a1
mv a2,a3
vsetivli zero,17,e8,mf8,ta,ma
ble a7,zero,.L1
ble a4,zero,.L1
ble a3,zero,.L1
...
After this patch:
mv a7,a2
mv a6,a0
mv t1,a1
mv a2,a3
ble a7,zero,.L1
ble a4,zero,.L1
ble a3,zero,.L1
add a1,a0,a4
li a0,0
vsetivli zero,17,e8,mf8,ta,ma
...
This issue is a missed optimization produced by Phase 3 (global backward
demand fusion), not by LCM.
This patch fixes the poor placement of the vsetvl.
The insertion point is selected not by LCM but by Phase 3 (backward fusion and
propagation of VL/VTYPE demand info), which I introduced into the VSETVL pass
to enhance LCM and improve vsetvl instruction performance.
This patch suppresses Phase 3's overly aggressive backward fusion and
propagation to the top of the function when the AVL has no defining
instruction (the AVL is an immediate in the range 0 ~ 31, since the vsetivli
instruction accepts an immediate value instead of a register).
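The rule described above can be sketched on a toy control-flow graph. This is not GCC's implementation: `toy_block`, `toy_all_preds_empty_p` and `toy_propagate_backward_p` are hypothetical names, only direct predecessors are inspected, and the exact condition used in riscv-vsetvl.cc may differ. The assumption is that a demand with an immediate AVL is not pushed further back when no predecessor carries any VL/VTYPE demand to fuse with.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy CFG node; not a GCC data structure.  */
struct toy_block {
  bool has_vector_demand;          /* block needs some VL/VTYPE setting */
  const struct toy_block **preds;  /* direct predecessors */
  size_t n_preds;
};

/* True if no direct predecessor carries a vector demand.  */
bool toy_all_preds_empty_p (const struct toy_block *bb)
{
  for (size_t i = 0; i < bb->n_preds; i++)
    if (bb->preds[i]->has_vector_demand)
      return false;
  return true;
}

/* Assumed rule: with an immediate AVL there is no defining instruction to
   stop the hoisting, so do not propagate into predecessors that have
   nothing to fuse with; leave the vsetvl where it is.  */
bool toy_propagate_backward_p (const struct toy_block *bb,
                               bool avl_is_immediate)
{
  if (avl_is_immediate && toy_all_preds_empty_p (bb))
    return false;
  return true;
}
```

In this model a register AVL is still propagated as before; only the immediate-AVL case with demand-free predecessors is held back.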
You may ask why we need Phase 3 to do the job.
There are many situations that pure LCM fails to optimize. Here is a simple
case to demonstrate it:
#include <stddef.h>
#include "riscv_vector.h"
void f (void * restrict in, void * restrict out, int n, int m, int cond)
{
  size_t vl = 101;
  for (size_t j = 0; j < m; j++){
    if (cond) {
      for (size_t i = 0; i < n; i++)
        {
          vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, vl);
          __riscv_vse8_v_i8mf8 (out + i, v, vl);
        }
    } else {
      for (size_t i = 0; i < n; i++)
        {
          vint32mf2_t v = __riscv_vle32_v_i32mf2 (in + i + j, vl);
          v = __riscv_vadd_vv_i32mf2 (v,v,vl);
          __riscv_vse32_v_i32mf2 (out + i, v, vl);
        }
    }
  }
}
You can see:
The first inner loop needs vsetvli e8 mf8 for vle+vse.
The second inner loop needs vsetvli e32 mf2 for vle+vadd+vse.
Without Phase 3 (that is, handled only by LCM in Phase 4), we end up with:
outerloop:
...
vsetvli e8mf8
inner loop 1:
....
vsetvli e32mf2
inner loop 2:
....
However, with Phase 3, the vsetvli e32 mf2 of inner loop 2 is fused into the
vsetvli e8 mf8 of inner loop 1, and we end up with this result after Phase 3:
outerloop:
...
inner loop 1:
vsetvli e32mf2
....
inner loop 2:
vsetvli e32mf2
....
The demand information left by Phase 3 is then well optimized by Phase 4
(LCM). The result after Phase 4 is:
vsetvli e32mf2
outerloop:
...
inner loop 1:
....
inner loop 2:
....
You can see this is the optimal codegen after the current VSETVL pass (Phase 3:
demand backward fusion and propagation + Phase 4: LCM). This has been a known
issue since I started implementing the VSETVL pass.
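One plausible reason the e8 mf8 demand can be fused into e32 mf2 in the example above is that both settings have the same SEW/LMUL ratio and therefore the same VLMAX (VLMAX = VLEN / (SEW/LMUL)), while vle8/vse8 carry their element width in the instruction itself. The sketch below only checks that arithmetic; `struct vtype` and the helper names are hypothetical and are not taken from riscv-vsetvl.cc.

```c
#include <stdbool.h>

/* LMUL stored as a fraction: mf8 is 1/8, mf2 is 1/2, m2 is 2/1.  */
struct vtype {
  unsigned sew;       /* selected element width in bits */
  unsigned lmul_num;  /* LMUL numerator */
  unsigned lmul_den;  /* LMUL denominator */
};

/* SEW/LMUL ratio; two settings with equal ratios share the same VLMAX.  */
unsigned sew_lmul_ratio (struct vtype t)
{
  return t.sew * t.lmul_den / t.lmul_num;
}

/* True if A and B give the same VLMAX for any VLEN.  */
bool same_vlmax_p (struct vtype a, struct vtype b)
{
  return sew_lmul_ratio (a) == sew_lmul_ratio (b);
}
```

For instance, e8 mf8 gives 8 * 8 = 64 and e32 mf2 gives 32 * 2 = 64, so the two loops agree on VLMAX, whereas e32 m1 (ratio 32) would not be compatible with e8 mf8.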
gcc/ChangeLog:
PR target/108270
* config/riscv/riscv-vsetvl.cc
(vector_infos_manager::all_empty_predecessor_p): New function.
(pass_vsetvl::backward_demand_fusion): Ditto.
* config/riscv/riscv-vsetvl.h: Ditto.
gcc/testsuite/ChangeLog:
PR target/108270
* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c: Adapt testcase.
* gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/pr108270.c: New test.
* [Bug target/108270] un-optimal vsetvl for multi-loop if avl is 0 ~ 31 immediate
From: juzhe.zhong at rivai dot ai @ 2023-05-03 15:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270
JuzheZhong <juzhe.zhong at rivai dot ai> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|UNCONFIRMED |RESOLVED
--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Fixed