More information on Power's testcase:

Before this patch:

test_npeel_int16_t:
        lui     a4,%hi(.LANCHOR0+130)
        lui     a3,%hi(.LANCHOR1)
        addi    a3,a3,%lo(.LANCHOR1)
        addi    a4,a4,%lo(.LANCHOR0+130)
        li      a5,58
        li      a2,16
        vsetivli        zero,16,e16,m1,ta,ma
        vl1re16.v       v3,0(a3)
        vid.v   v1
.L5:
        minu    a3,a5,a2
        vsetvli zero,a3,e16,m1,ta,ma
        sub     a5,a5,a3
        vse16.v v1,0(a4)
        vsetivli        zero,16,e16,m1,ta,ma
        addi    a4,a4,32
        vadd.vv v1,v1,v3
        bne     a5,zero,.L5
        ret

After this patch:

test_npeel_int16_t:
        lui     a5,%hi(.LANCHOR0)
        addi    a5,a5,%lo(.LANCHOR0)
        li      a1,16
        vsetivli        zero,16,e16,m1,ta,ma
        addi    a2,a5,130
        vid.v   v1
        addi    a3,a5,162
        vadd.vx v4,v1,a1
        addi    a4,a5,194
        li      a1,32
        vadd.vx v3,v1,a1
        vse16.v v1,0(a2)
        vse16.v v4,0(a3)
        vse16.v v3,0(a4)
        addi    a5,a5,226
        li      a1,48
        vadd.vx v2,v1,a1
        vsetivli        zero,10,e16,m1,ta,ma
        vse16.v v2,0(a5)
        ret

Clearly, Power's testcase previously could not be unrolled on the RVV side, but after this patch it can be.

juzhe.zhong@rivai.ai

From: Richard Biener
Date: 2023-05-30 20:33
To: juzhe.zhong
CC: Richard Sandiford; gcc-patches; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV

On Tue, 30 May 2023, juzhe.zhong wrote:

> This patch will generate the number of rgroup "mov" instructions inside the
> loop. This is unacceptable. For example, if number of rgroups=3, will be 3 more
> instruction in loop. If this patch is necessary, I think I should find a way
> to fix it.

That's odd, you only need to adjust the IV which is used in the exit test,
not all the others.

> ---- Replied Message ----
> From: Richard Sandiford
> Date: 05/30/2023 19:41
> To: juzhe.zhong@rivai.ai
> Cc: gcc-patches, rguenther, linkw
> Subject: Re: [PATCH] VECT: Change flow of decrement IV
>
> "juzhe.zhong@rivai.ai" writes:
> > Before this patch:
> > foo:
> >         ble     a2,zero,.L5
> >         csrr    a3,vlenb
> >         srli    a4,a3,2
> > .L3:
> >         minu    a5,a2,a4
> >         vsetvli zero,a5,e32,m1,ta,ma
> >         vle32.v v2,0(a1)
> >         vle32.v v1,0(a0)
> >         vsetvli t1,zero,e32,m1,ta,ma
> >         vadd.vv v1,v1,v2
> >         vsetvli zero,a5,e32,m1,ta,ma
> >         vse32.v v1,0(a0)
> >         add     a1,a1,a3
> >         add     a0,a0,a3
> >         sub     a2,a2,a5
> >         bne     a2,zero,.L3
> > .L5:
> >         ret
> >
> > After this patch:
> >
> > foo:
> >         ble     a2,zero,.L5
> >         csrr    a3,vlenb
> >         srli    a4,a3,2
> >         neg     a7,a4           -->>>additional instruction
> > .L3:
> >         minu    a5,a2,a4
> >         vsetvli zero,a5,e32,m1,ta,ma
> >         vle32.v v2,0(a1)
> >         vle32.v v1,0(a0)
> >         vsetvli t1,zero,e32,m1,ta,ma
> >         mv      a6,a2           -->>>additional instruction
> >         vadd.vv v1,v1,v2
> >         vsetvli zero,a5,e32,m1,ta,ma
> >         vse32.v v1,0(a0)
> >         add     a1,a1,a3
> >         add     a0,a0,a3
> >         add     a2,a2,a7
> >         bgtu    a6,a4,.L3
> > .L5:
> >         ret
> >
> > There is 1 more instruction in preheader and 1 more instruction in loop.
> > But I think it's OK for RVV since we will definitely be using SELECT_VL so
> > this issue will gone.
>
> But what about cases where you won't be using SELECT_VL, such as SLP?
>
> Richard
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
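
For reference, the two loop-control schemes in the quoted foo example can be
modelled in plain C. This is only a sketch under assumptions, not the
vectorizer's code: the function names iters_decrement_by_len and
iters_decrement_by_vf are hypothetical, n stands for the element count held
in a2, and vf for the per-iteration element count held in a4.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the "before this patch" loop control: the
   remaining count is decremented by the length actually processed,
   and the loop exits when the count reaches zero.  */
size_t
iters_decrement_by_len (size_t n, size_t vf)
{
  assert (vf > 0);
  if (n == 0)                                   /* ble a2,zero,.L5 */
    return 0;

  size_t iters = 0, done = 0, remain = n;
  do
    {
      size_t len = remain < vf ? remain : vf;   /* minu a5,a2,a4 */
      done += len;                              /* body handles len elements */
      iters++;
      remain -= len;                            /* sub a2,a2,a5 */
    }
  while (remain != 0);                          /* bne a2,zero,.L3 */

  assert (done == n);
  return iters;
}

/* Hypothetical model of the "after this patch" loop control: the
   count is decremented by the constant vf, and the exit test compares
   the value from before the decrement against vf.  */
size_t
iters_decrement_by_vf (size_t n, size_t vf)
{
  assert (vf > 0);
  if (n == 0)                                   /* ble a2,zero,.L5 */
    return 0;

  size_t iters = 0, done = 0, remain = n, old;
  do
    {
      size_t len = remain < vf ? remain : vf;   /* minu a5,a2,a4 */
      old = remain;                             /* mv a6,a2 */
      done += len;                              /* body handles len elements */
      iters++;
      remain -= vf;                             /* add a2,a2,a7 (a7 = -a4);
                                                   may wrap on the last
                                                   iteration only */
    }
  while (old > vf);                             /* bgtu a6,a4,.L3 */

  assert (done == n);
  return iters;
}
```

With n = 58 and vf = 16, the numbers that appear in test_npeel_int16_t, both
functions report 4 iterations (16 + 16 + 16 + 10 elements). The second form
needs the extra copy of the old count because the decremented value can wrap
on the final iteration, which matches the additional mv in the quoted
assembly.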