More information of power's testcase:

Before this patch:
test_npeel_int16_t:
lui a4,%hi(.LANCHOR0+130)
lui a3,%hi(.LANCHOR1)
addi a3,a3,%lo(.LANCHOR1)
addi a4,a4,%lo(.LANCHOR0+130)
li a5,58
li a2,16
vsetivli zero,16,e16,m1,ta,ma
vl1re16.v v3,0(a3)
vid.v v1
.L5:
minu a3,a5,a2
vsetvli zero,a3,e16,m1,ta,ma
sub a5,a5,a3
vse16.v v1,0(a4)
vsetivli zero,16,e16,m1,ta,ma
addi a4,a4,32
vadd.vv v1,v1,v3
bne a5,zero,.L5
ret

After this patch:
test_npeel_int16_t:
lui a5,%hi(.LANCHOR0)
addi a5,a5,%lo(.LANCHOR0)
li a1,16
vsetivli zero,16,e16,m1,ta,ma
addi a2,a5,130
vid.v v1
addi a3,a5,162
vadd.vx v4,v1,a1
addi a4,a5,194
li a1,32
vadd.vx v3,v1,a1
vse16.v v1,0(a2)
vse16.v v4,0(a3)
vse16.v v3,0(a4)
addi a5,a5,226
li a1,48
vadd.vx v2,v1,a1
vsetivli zero,10,e16,m1,ta,ma
vse16.v v2,0(a5)
ret

It's obvious, previously, power's testcase in RVV side can not unroll, but after this patch, in RVV side, it can unroll now.


juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 20:33
To: juzhe.zhong
CC: Richard Sandiford; gcc-patches; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV
On Tue, 30 May 2023, juzhe.zhong wrote:
 
> This patch will generate the number of rgroup ?mov? instructions inside the
> loop. This is unacceptable. For example?if number of rgroups=3? will be 3 more
> instruction in loop. If this patch is necessary? I think I should find a way
> to fix it.
 
That's odd, you only need to adjust the IV which is used in the exit test,
not all the others.
 
> ---- Replied Message ----
> From
> Richard Sandiford<richard.sandiford@arm.com>
> Date
> 05/30/2023 19:41
> To
> juzhe.zhong@rivai.ai<juzhe.zhong@rivai.ai>
> Cc
> gcc-patches<gcc-patches@gcc.gnu.org>,
> rguenther<rguenther@suse.de>,
> linkw<linkw@linux.ibm.com>
> Subject
> Re: [PATCH] VECT: Change flow of decrement IV
> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
> > Before this patch:
> > foo:
> > ble a2,zero,.L5
> > csrr a3,vlenb
> > srli a4,a3,2
> > .L3:
> > minu a5,a2,a4
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a0)
> > vsetvli t1,zero,e32,m1,ta,ma
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a0)
> > add a1,a1,a3
> > add a0,a0,a3
> >       sub   a2,a2,a5
> > bne a2,zero,.L3
> > .L5:
> > ret
> >
> > After this patch:
> >
> > foo:
> > ble a2,zero,.L5
> > csrr a3,vlenb
> > srli a4,a3,2
> > neg a7,a4   -->>>additional instruction
> > .L3:
> > minu a5,a2,a4
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a0)
> > vsetvli t1,zero,e32,m1,ta,ma
> > mv a6,a2  -->>>additional instruction
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a0)
> > add a1,a1,a3
> > add a0,a0,a3
> > add a2,a2,a7
> > bgtu a6,a4,.L3
> > .L5:
> > ret
> >
> > There is 1 more instruction in preheader and 1 more instruction in loop.
> > But I think it's OK for RVV since we will definitely be using SELECT_VL so
> this issue will gone.
> 
> But what about cases where you won't be using SELECT_VL, such as SLP?
> 
> Richard
> 
> 
 
-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)