Ok. I have added the comments as you suggested.

juzhe.zhong@rivai.ai

From: Jeff Law
Date: 2023-06-13 07:21
To: juzhe.zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc; pan2.li
Subject: Re: [PATCH V2] RISC-V: Support RVV VLA SLP auto-vectorization

On 6/6/23 21:19, juzhe.zhong@rivai.ai wrote:
> From: Juzhe-Zhong
>
> This patch enables basic VLA SLP auto-vectorization.
> Consider the following case:
> void
> f (uint8_t *restrict a, uint8_t *restrict b)
> {
>   for (int i = 0; i < 100; ++i)
>     {
>       a[i * 8 + 0] = b[i * 8 + 7] + 1;
>       a[i * 8 + 1] = b[i * 8 + 7] + 2;
>       a[i * 8 + 2] = b[i * 8 + 7] + 8;
>       a[i * 8 + 3] = b[i * 8 + 7] + 4;
>       a[i * 8 + 4] = b[i * 8 + 7] + 5;
>       a[i * 8 + 5] = b[i * 8 + 7] + 6;
>       a[i * 8 + 6] = b[i * 8 + 7] + 7;
>       a[i * 8 + 7] = b[i * 8 + 7] + 3;
>     }
> }
>
> To enable VLA SLP auto-vectorization, we need to be able to handle the following const vectors:
>
> 1. NPATTERNS = 8, NELTS_PER_PATTERN = 3.
> { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
>
> 2. NPATTERNS = 8, NELTS_PER_PATTERN = 1.
> { 1, 2, 8, 4, 5, 6, 7, 3, ... }
>
> These vectors can be generated in the prologue.
>
> After this patch, we end up with the following codegen:
>
> Prologue:
>   ...
>   vsetvli a7,zero,e16,m2,ta,ma
>   vid.v v4
>   vsrl.vi v4,v4,3
>   li a3,8
>   vmul.vx v4,v4,a3   ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
>   ...
>   li t1,67633152
>   addi t1,t1,513
>   li a3,50790400
>   addi a3,a3,1541
>   slli a3,a3,32
>   add a3,a3,t1
>   vsetvli t1,zero,e64,m1,ta,ma
>   vmv.v.x v3,a3   ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... }
>   ...
> LoopBody:
>   ...
>   min a3,...
>   vsetvli zero,a3,e8,m1,ta,ma
>   vle8.v v2,0(a6)
>   vsetvli a7,zero,e8,m1,ta,ma
>   vrgatherei16.vv v1,v2,v4
>   vadd.vv v1,v1,v3
>   vsetvli zero,a3,e8,m1,ta,ma
>   vse8.v v1,0(a2)
>   add a6,a6,a4
>   add a2,a2,a4
>   mv a3,a5
>   add a5,a5,t1
>   bgtu a3,a4,.L3
>   ...
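To make the two constant encodings and the prologue arithmetic above concrete, here is a minimal host-side C sketch that emulates them lane by lane. It is not GCC code: `encoded_elt`, `prologue_index`, `prologue_scalar`, `v3_lane`, `loop_step`, `f_reference` and `f_emulated` are hypothetical helpers. Two things are assumptions, since the listing elides them: the load pointer is taken to start at `b + 7`, so that gather index 8 * g selects b[8 * g + 7], and the strip-mining in `f_emulated` is a simplified stand-in for the real loop control. The li/addi/slli/add sequence packs the eight addends into the 64-bit scalar 0x0307060504080201; splatting it at SEW = 64 and reading the register at SEW = 8 yields the repeating { 1, 2, 8, 4, 5, 6, 7, 3 } pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Lane I of a constant encoded as NPATTERNS interleaved patterns with
   NELTS_PER_PATTERN (1, 2 or 3) encoded elements each; encoded element J
   of pattern P is stored at ENC[J * NPATTERNS + P].  */
static int64_t
encoded_elt (const int64_t *enc, size_t npatterns, size_t nelts_per_pattern,
             size_t i)
{
  size_t p = i % npatterns;   /* pattern that lane I belongs to */
  size_t j = i / npatterns;   /* position of lane I within that pattern */
  if (j < nelts_per_pattern)
    return enc[j * npatterns + p];
  if (nelts_per_pattern < 3)  /* 1: duplicate; 2: last element repeats */
    return enc[(nelts_per_pattern - 1) * npatterns + p];
  /* 3: linear series with step = third element - second element.  */
  int64_t step = enc[2 * npatterns + p] - enc[npatterns + p];
  return enc[2 * npatterns + p] + (int64_t) (j - 2) * step;
}

/* v4 lane I: vid.v, then vsrl.vi by 3, then vmul.vx by 8 (SEW = 16).  */
static uint16_t
prologue_index (uint16_t i)
{
  return (uint16_t) ((i >> 3) * 8);
}

/* The scalar built by li/addi/slli/add before vmv.v.x at SEW = 64.  */
static uint64_t
prologue_scalar (void)
{
  uint64_t t1 = 67633152 + 513;   /* li t1; addi t1,t1,513 */
  uint64_t a3 = 50790400 + 1541;  /* li a3; addi a3,a3,1541 */
  return (a3 << 32) + t1;         /* slli a3,a3,32; add a3,a3,t1 */
}

/* v3 lane K viewed at SEW = 8: byte K % 8 of the splatted 64-bit value
   (RVV places element 0 in the least-significant bits).  */
static uint8_t
v3_lane (size_t k)
{
  return (uint8_t) (prologue_scalar () >> (8 * (k % 8)));
}

/* One loop-body step over VL lanes: vrgatherei16.vv, then vadd.vv.
   ASSUMPTION: SRC points 7 bytes past the first group's start, so gather
   index 8 * g selects b[8 * g + 7].  Indices are 16-bit as with
   vrgatherei16; an 8-bit index type could not address lanes above 255.  */
static void
loop_step (uint8_t *dst, const uint8_t *src, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    dst[i] = (uint8_t) (src[prologue_index ((uint16_t) i)] + v3_lane (i));
}

/* Scalar reference, equivalent to f () in the patch.  */
static void
f_reference (uint8_t *a, const uint8_t *b)
{
  static const uint8_t addend[8] = { 1, 2, 8, 4, 5, 6, 7, 3 };
  for (int i = 0; i < 100; ++i)
    for (int k = 0; k < 8; ++k)
      a[i * 8 + k] = (uint8_t) (b[i * 8 + 7] + addend[k]);
}

/* Hypothetical driver: vector loop over the first 99 groups in chunks of
   at most VLMAX lanes (VLMAX must be a multiple of 8), then a scalar
   epilogue for the last group, like the epilogue in the patch.  */
static void
f_emulated (uint8_t *a, const uint8_t *b, size_t vlmax)
{
  size_t n = 99 * 8;
  for (size_t done = 0; done < n;)
    {
      size_t vl = n - done < vlmax ? n - done : vlmax;  /* min a3,... */
      loop_step (a + done, b + done + 7, vl);
      done += vl;
    }
  for (size_t k = 0; k < 8; k++)
    a[792 + k] = (uint8_t) (b[799] + v3_lane (k));
}
```

For example, with VLEN = 128 (VLMAX = 16 for e8,m1), `f_emulated (a, b, 16)` should leave `a` identical to what `f_reference (a, b)` produces, and `prologue_scalar ()` evaluates to 0x0307060504080201, whose bytes from least significant upward are 1, 2, 8, 4, 5, 6, 7, 3.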
>
> Note: we need to use "vrgatherei16.vv" instead of "vrgather.vv" for SEW = 8, since "vrgatherei16.vv" can cover a larger
> range than "vrgather.vv" (whose maximum element index is 255).
> Epilogue:
>   lbu a5,799(a1)
>   addiw a4,a5,1
>   sb a4,792(a0)
>   addiw a4,a5,2
>   sb a4,793(a0)
>   addiw a4,a5,8
>   sb a4,794(a0)
>   addiw a4,a5,4
>   sb a4,795(a0)
>   addiw a4,a5,5
>   sb a4,796(a0)
>   addiw a4,a5,6
>   sb a4,797(a0)
>   addiw a4,a5,7
>   sb a4,798(a0)
>   addiw a5,a5,3
>   sb a5,799(a0)
>   ret
>
> One last thing remains to be done: "Epilogue auto-vectorization", which needs VLS mode support.
> I will add VLS mode support for "Epilogue auto-vectorization" in the future.
>
> gcc/ChangeLog:
>
>	* config/riscv/riscv-protos.h (expand_vec_perm_const): New function.
>	* config/riscv/riscv-v.cc (rvv_builder::can_duplicate_repeating_sequence_p): Support POLY handling.
>	(rvv_builder::single_step_npatterns_p): New function.
>	(rvv_builder::npatterns_all_equal_p): Ditto.
>	(const_vec_all_in_range_p): Support POLY handling.
>	(gen_const_vector_dup): Ditto.
>	(emit_vlmax_gather_insn): Add vrgatherei16.
>	(emit_vlmax_masked_gather_mu_insn): Ditto.
>	(expand_const_vector): Add VLA SLP const vector support.
>	(expand_vec_perm): Support POLY.
>	(struct expand_vec_perm_d): New struct.
>	(shuffle_generic_patterns): New function.
>	(expand_vec_perm_const_1): Ditto.
>	(expand_vec_perm_const): Ditto.
>	* config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto.
>	(TARGET_VECTORIZE_VEC_PERM_CONST): New targethook.
>
> gcc/testsuite/ChangeLog:
>
>	* gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA vectorizer.
>	* gcc.target/riscv/rvv/autovec/v-1.c: Ditto.
>	* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
>	* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
>	* gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
>	* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
>	* gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
>	* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
>	* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
>	* gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test.
>	* gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test.
>	* gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test.
>	* gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test.
>	* gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test.
>	* gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test.
>	* gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test.
>	* gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test.
>	* gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test.
>	* gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test.
>	* gcc.target/riscv/rvv/autovec/partial/slp_run-4.c: New test.
>	* gcc.target/riscv/rvv/autovec/partial/slp_run-5.c: New test.
>	* gcc.target/riscv/rvv/autovec/partial/slp_run-6.c: New test.
>	* gcc.target/riscv/rvv/autovec/partial/slp_run-7.c: New test.
>
> +}
> +
> +/* Return true if all elements of NPATTERNS are equal.
> +
> +   E.g. NPATTERNS = 4:
> +     { 2, 2, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8, 16, 16, 16, 16, ... }
> +   E.g. NPATTERNS = 8:
> +     { 2, 2, 2, 2, 2, 2, 2, 2, 8, 8, 8, 8, 8, 8, 8, 8, ... }
> +*/
> +bool
> +rvv_builder::npatterns_all_equal_p () const
> +{
> +  poly_int64 ele0 = rtx_to_poly_int64 (elt (0));
> +  for (unsigned int i = 1; i < npatterns (); i++)
> +    {
> +      poly_int64 ele = rtx_to_poly_int64 (elt (i));
> +      if (!known_eq (ele, ele0))
> +        return false;
> +    }
> +  return true;
> +}
There seems to be a disconnect here. You only seem to check the first
NPATTERNS elements. Don't you need to check the rest? Or am I just
getting confused by the function comment?

> +
> +static bool
> +expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
Needs a function comment.

>
> +
> +bool
> +expand_vec_perm_const (machine_mode vmode, machine_mode op_mode, rtx target,
> +                       rtx op0, rtx op1, const vec_perm_indices &sel)
Similarly.
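The question about rvv_builder::npatterns_all_equal_p can be made concrete with a small host-side C model. The helpers below are hypothetical and do not use the rvv_builder API; they assume a plain array of encoded elements in which the first NPATTERNS entries are the leading element of each pattern, which is how elt (i) is indexed in the posted loop. `first_elts_all_equal` mirrors what the loop checks, while `each_group_all_equal` is one possible stricter reading of the function comment. Which of the two the caller actually needs is for the patch author to confirm; the sketch only shows where they differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ENC holds NPATTERNS * NELTS_PER_PATTERN encoded elements; encoded
   element J of pattern P is at ENC[J * NPATTERNS + P], so elements
   0 .. NPATTERNS-1 are the first element of every pattern.  */

/* Mirrors the posted loop: only elements 0 .. NPATTERNS-1 are compared.  */
static bool
first_elts_all_equal (const int64_t *enc, size_t npatterns)
{
  for (size_t i = 1; i < npatterns; i++)
    if (enc[i] != enc[0])
      return false;
  return true;
}

/* Stricter reading: each group of NPATTERNS encoded elements holds a
   single repeated value, as in { 2, 2, 2, 2, 4, 4, 4, 4, 6, 6, 6, 6 }.  */
static bool
each_group_all_equal (const int64_t *enc, size_t npatterns,
                      size_t nelts_per_pattern)
{
  for (size_t j = 0; j < nelts_per_pattern; j++)
    for (size_t p = 1; p < npatterns; p++)
      if (enc[j * npatterns + p] != enc[j * npatterns])
        return false;
  return true;
}
```

With NPATTERNS = 4 and NELTS_PER_PATTERN = 3, the encoding { 2, 2, 2, 2, 4, 5, 6, 7, 6, 8, 10, 12 } passes the first check but fails the second, which is exactly the gap between the loop and its comment.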
Overall it looks really good. There are just a couple of comments to fix, and we need to sort out whether or not I'm misinterpreting rvv_builder::npatterns_all_equal_p.

Jeff