Hi, all. After several investigations, here are my experiments:

void single_rgroup (int32_t *__restrict a, int32_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = b[i] + a[i];
}

void mutiple_rgroup (float *__restrict f, double *__restrict d, int n)
{
  for (int i = 0; i < n; ++i)
    {
      f[i * 2 + 0] = 1;
      f[i * 2 + 1] = 2;
      d[i] = 3;
    }
}

single_rgroup:
        ble     a2,zero,.L5
        li      a4,4
.L3:
        minu    a5,a2,a4
        vsetvli zero,a5,e32,m1,ta,ma
        vle32.v v1,0(a0)
        vle32.v v2,0(a1)
        vsetivli zero,4,e32,m1,ta,ma
        mv      a3,a2          ---------> 1 more "mv" instruction
        vadd.vv v1,v1,v2
        vsetvli zero,a5,e32,m1,ta,ma
        vse32.v v1,0(a0)
        addi    a1,a1,16
        addi    a0,a0,16
        addi    a2,a2,-4
        bgtu    a3,a4,.L3
.L5:
        ret
        .size   single_rgroup, .-single_rgroup
        .align  1
        .globl  foo5
        .type   foo5, @function
mutiple_rgroup:
        ble     a2,zero,.L11
        lui     a5,%hi(.LANCHOR0)
        addi    a5,a5,%lo(.LANCHOR0)
        vl1re32.v v2,0(a5)
        lui     a5,%hi(.LANCHOR0+16)
        addi    a5,a5,%lo(.LANCHOR0+16)
        slli    a2,a2,1
        li      a3,8
        li      a7,4
        vl1re64.v v1,0(a5)
.L9:
        minu    a5,a2,a3
        minu    a4,a5,a7
        sub     a5,a5,a4
        addi    a6,a0,16
        vsetvli zero,a4,e32,m1,ta,ma
        vse32.v v2,0(a0)
        srli    a4,a4,1
        vsetvli zero,a5,e32,m1,ta,ma
        vse32.v v2,0(a6)
        srli    a5,a5,1
        vsetvli zero,a4,e64,m1,ta,ma
        addi    a6,a1,16
        vse64.v v1,0(a1)
        mv      a4,a2          ---------> 1 more "mv" instruction
        vsetvli zero,a5,e64,m1,ta,ma
        vse64.v v1,0(a6)
        addi    a0,a0,32
        addi    a1,a1,32
        addi    a2,a2,-8
        bgtu    a4,a3,.L9
.L11:
        ret

These are the examples; I have tried a sufficient number of cases. This is the worst case for RVV after this patch: whether single-rgroup or multiple-rgroup, we end up with one more "mv" instruction inside the loop. In some other cases I tried there are no extra instructions at all (it seems IVOPTS has optimized them away in those cases).

From my side (RVV), I think one more "mv" instruction is not a big deal if this patch (applying a VF step and checking the exit condition with remain > VF) can help IBM. For single-rgroup, this "mv" instruction will go away when we use SELECT_VL. For multiple-rgroup, the "mv" instruction remains, but as I said, it is not a big deal.
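To make the loop-control change concrete, here is a minimal C model of the two exit schemes. It is a sketch, not GCC code: the names loop_stats, decrement_by_len, decrement_by_vf, multi_rgroup and verify_models are hypothetical, the vector body is replaced by counting how many elements each iteration covers, and VF is passed as a parameter (4 in the single_rgroup listing, from `li a4,4`). decrement_by_len follows the pre-patch flow shown in the "Before this patch" listing quoted below (subtract the actual length, exit when the remainder reaches zero). decrement_by_vf follows the patched flow (subtract VF, keep looping only if the pre-decrement remainder was > VF); its `old = remain` copy corresponds to the extra "mv". multi_rgroup models the mutiple_rgroup listing, where one IV counting floats drives both rgroups.

```c
#include <assert.h>

/* Hypothetical model of the loop control only: the vector work is replaced
   by counting how many elements each iteration covers.  */
typedef struct
{
  long iterations; /* times the loop body runs */
  long processed;  /* sum of the per-iteration lengths */
} loop_stats;

static long
min_l (long a, long b)
{
  return a < b ? a : b;
}

/* Pre-patch flow: subtract the actual length, exit when nothing remains.  */
static loop_stats
decrement_by_len (long n, long vf)
{
  loop_stats s = { 0, 0 };
  if (n <= 0)                        /* ble a2,zero,.L5 */
    return s;
  long remain = n;
  do
    {
      long len = min_l (remain, vf); /* minu a5,a2,a4 */
      s.iterations++;
      s.processed += len;
      remain -= len;                 /* sub a2,a2,a5 */
    }
  while (remain != 0);               /* bne a2,zero,.L3 */
  return s;
}

/* Patched flow: subtract VF every iteration and test the value from before
   the decrement, so that value must be copied first (the extra "mv").  */
static loop_stats
decrement_by_vf (long n, long vf)
{
  loop_stats s = { 0, 0 };
  if (n <= 0)                        /* ble a2,zero,.L5 */
    return s;
  long remain = n, old;
  do
    {
      long len = min_l (remain, vf); /* minu a5,a2,a4 */
      s.iterations++;
      s.processed += len;
      old = remain;                  /* mv a3,a2 */
      remain -= vf;                  /* addi a2,a2,-4 */
    }
  while (old > vf);                  /* bgtu a3,a4,.L3 */
  return s;
}

/* mutiple_rgroup: one IV counts floats (2 * n); each iteration covers up to
   8 floats as two vectors of 4, and the double stores use those lengths
   halved.  */
static loop_stats
multi_rgroup (long n, long *doubles_out)
{
  const long step = 8, per_vec = 4;  /* li a3,8 ; li a7,4 */
  loop_stats s = { 0, 0 };
  long doubles = 0;
  if (n > 0)                         /* ble a2,zero,.L11 */
    {
      long remain = 2 * n, old;      /* slli a2,a2,1 */
      do
        {
          long len = min_l (remain, step);  /* minu a5,a2,a3 */
          long len0 = min_l (len, per_vec); /* minu a4,a5,a7 */
          long len1 = len - len0;           /* sub a5,a5,a4 */
          s.iterations++;
          s.processed += len0 + len1;       /* two vse32.v stores */
          doubles += len0 / 2 + len1 / 2;   /* srli, then two vse64.v */
          old = remain;                     /* mv a4,a2 */
          remain -= step;                   /* addi a2,a2,-8 */
        }
      while (old > step);                   /* bgtu a4,a3,.L9 */
    }
  *doubles_out = doubles;
  return s;
}

/* Both single-rgroup schemes must agree on the trip count and cover exactly
   n elements; the multi-rgroup model must cover 2n floats and n doubles.  */
static void
verify_models (long max_n, long vf)
{
  for (long n = 0; n <= max_n; n++)
    {
      loop_stats a = decrement_by_len (n, vf);
      loop_stats b = decrement_by_vf (n, vf);
      assert (a.iterations == b.iterations);
      assert (a.processed == n && b.processed == n);
      long doubles;
      loop_stats m = multi_rgroup (n, &doubles);
      assert (m.processed == 2 * n && doubles == n);
    }
}
```

For example, with n = 10 and VF = 4 both models run 3 iterations covering 4 + 4 + 2 elements; with n = 8 both run 2 iterations, because the patched exit test sees 4 > 4 as false on the second pass. Under these assumptions the two schemes have the same trip count; the patched one only needs the pre-decrement value kept alive for the exit test, which is the register copy seen in both listings.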
If this patch's approach is approved, I will rebase and send the SELECT_VL patch again based on this patch.

Looking forward to your suggestions.

Thanks.
juzhe.zhong@rivai.ai

From: Richard Biener
Date: 2023-05-30 20:33
To: juzhe.zhong
CC: Richard Sandiford; gcc-patches; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV

On Tue, 30 May 2023, juzhe.zhong wrote:

> This patch will generate as many "mov" instructions inside the loop as
> there are rgroups. This is unacceptable. For example, if the number of
> rgroups is 3, there will be 3 more instructions in the loop. If this patch
> is necessary, I think I should find a way to fix it.

That's odd, you only need to adjust the IV which is used in the exit test,
not all the others.

> ---- Replied Message ----
> From: Richard Sandiford
> Date: 05/30/2023 19:41
> To: juzhe.zhong@rivai.ai
> Cc: gcc-patches, rguenther, linkw
> Subject: Re: [PATCH] VECT: Change flow of decrement IV
>
> "juzhe.zhong@rivai.ai" writes:
> > Before this patch:
> > foo:
> >         ble     a2,zero,.L5
> >         csrr    a3,vlenb
> >         srli    a4,a3,2
> > .L3:
> >         minu    a5,a2,a4
> >         vsetvli zero,a5,e32,m1,ta,ma
> >         vle32.v v2,0(a1)
> >         vle32.v v1,0(a0)
> >         vsetvli t1,zero,e32,m1,ta,ma
> >         vadd.vv v1,v1,v2
> >         vsetvli zero,a5,e32,m1,ta,ma
> >         vse32.v v1,0(a0)
> >         add     a1,a1,a3
> >         add     a0,a0,a3
> >         sub     a2,a2,a5
> >         bne     a2,zero,.L3
> > .L5:
> >         ret
> >
> > After this patch:
> >
> > foo:
> >         ble     a2,zero,.L5
> >         csrr    a3,vlenb
> >         srli    a4,a3,2
> >         neg     a7,a4          -->>> additional instruction
> > .L3:
> >         minu    a5,a2,a4
> >         vsetvli zero,a5,e32,m1,ta,ma
> >         vle32.v v2,0(a1)
> >         vle32.v v1,0(a0)
> >         vsetvli t1,zero,e32,m1,ta,ma
> >         mv      a6,a2          -->>> additional instruction
> >         vadd.vv v1,v1,v2
> >         vsetvli zero,a5,e32,m1,ta,ma
> >         vse32.v v1,0(a0)
> >         add     a1,a1,a3
> >         add     a0,a0,a3
> >         add     a2,a2,a7
> >         bgtu    a6,a4,.L3
> > .L5:
> >         ret
> >
> > There is 1 more instruction in the preheader and 1 more instruction in
> > the loop. But I think it's OK for RVV since we will definitely be using
> > SELECT_VL, so this issue will go away.
>
> But what about cases where you won't be using SELECT_VL, such as SLP?
>
> Richard
>
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)