Also, I have already decided to remove the WHILE_LEN pattern since it seems to be unnecessary.
As Richard said, it's just simple arithmetic and it's not worthwhile to add a pattern for that.

So, I plan to replace WHILE_LEN with MIN_EXPR and do everything RVV-specific in the RISC-V port.
I think that is more reasonable for IBM's use and for other targets in the future.

So, this patch will need to be changed to "introduce a new flow for vectorization loop control", which is a new loop control flow
that saturating-subtracts n down to zero, and add a target hook for it so that we can switch to this flow?
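
To make the proposed flow concrete, here is a rough scalar model in C. It is only a sketch, not GCC code: VF and vec_add_decrement_iv are made-up names for illustration, and the inner loop stands in for one len_load / add / len_store step. The control IV counts the remaining elements down to zero, and the per-iteration length is a plain MIN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vectorization factor: int32 elements per vector.  */
#define VF 16

/* Scalar model of a loop controlled by a decrementing IV.
   Returns the number of "vector" iterations executed.  */
static size_t
vec_add_decrement_iv (int32_t *dst, const int32_t *a, const int32_t *b,
                      size_t n)
{
  size_t remain = n;            /* Control IV, counts down to zero.  */
  size_t iterations = 0;

  while (remain > 0)
    {
      /* len = MIN (remain, VF).  */
      size_t len = remain < VF ? remain : VF;
      assert (len > 0 && len <= VF);

      /* Stand-in for len_load, vector add, len_store.  */
      for (size_t i = 0; i < len; i++)
        dst[i] = a[i] + b[i];

      /* Advance the address IVs by the elements actually processed.  */
      dst += len;
      a += len;
      b += len;

      /* Since len <= remain, this never wraps below zero.  */
      remain -= len;
      iterations++;
    }
  return iterations;
}
```

With n = 40 and VF = 16 the lengths are 16, 16 and 8, so the loop runs three times and exits exactly when the IV reaches zero.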

Is that more reasonable?
Thanks.


juzhe.zhong@rivai.ai
 
From: Kewen.Lin
Date: 2023-04-14 10:54
To: 钟居哲
CC: gcc-patches; Jeff Law; rdapp; richard.sandiford; rguenther
Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization
Hi Juzhe,
 
on 2023/4/13 21:44, 钟居哲 wrote:
> Thanks Kewen.
> 
> The current flow in this patch is, as you said:
> ....
> len = WHILE_LEN (n,vf);
> ...
> v = len_load (addr,len);
> ..
> addr = addr + vf (in byte align);
> ....
> 
> This patch just keeps incrementing the address by the vectorization factor (adjusted to bytes).
> For example, if your vector length is 512 bits, then this patch just updates the address as
> addr = addr + 64;
> 
> However, after reading the RVV ISA more closely today, I think it would be more appropriate
> to update the address as: addr = addr + (len * 4), if len is the number of INT32 elements.
> Here len is the result computed by WHILE_LEN.
 
I just read your detailed explanation of the usage of the vsetvli insn (really appreciate that).
It looks like this WHILE_LEN wants some more semantics than MIN, so I assume you still want
to introduce this WHILE_LEN.
 
> 
> I assume that for the IBM target, it's better to just update the address IV by directly adding the whole register size in bytes.
> I think the second way (addr = addr + (len * 4)) is too RVV-specific and won't be suitable for IBM. Is that right?
 
Yes, we just want to add the whole vector register length in bytes.
 
> If that is true, I will keep this patch's flow (I won't change it to addr = addr + (len * 4)) and see what else I need to do for IBM.
> I would rather do that in the RISC-V backend port.
 
IMHO, you don't need to push this down to the RV backend. Just query the ports that have len_{load,store}
support, via a target hook or a special operand in the while_len optab (see internal_len_load_store_bias),
and generate different code accordingly.  IIUC, for WHILE_LEN, you want it to have
the same semantics as what vsetvli performs, but for IBM ports it would be just like MIN_EXPR, so maybe we
can also generate MIN or WHILE_LEN based on this kind of target information.
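
As a rough illustration of why the two semantics lead to different address updates, here is a small C model. It is a sketch under assumptions, not GCC code: the enum and all function names are hypothetical stand-ins for "this kind of target information". For the vsetvli-like case it assumes the RVV rule that, when VLMAX < AVL < 2 * VLMAX, vl may be any value from ceil (AVL / 2) up to VLMAX, and it models an even split, which is only one of the results the spec permits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a target query; not a real GCC hook.  */
enum len_semantics { LEN_MIN, LEN_VSETVLI_LIKE };

/* MIN_EXPR style: always take as many elements as fit.  */
static size_t
select_len_min (size_t remain, size_t vf)
{
  return remain < vf ? remain : vf;
}

/* One result permitted by vsetvli-like semantics: when
   vf < remain < 2 * vf, split the tail evenly (assumption).  */
static size_t
select_len_vsetvli_like (size_t remain, size_t vf)
{
  if (remain <= vf)
    return remain;
  if (remain < 2 * vf)
    return (remain + 1) / 2;    /* ceil (remain / 2) */
  return vf;
}

static size_t
select_len (enum len_semantics sem, size_t remain, size_t vf)
{
  return sem == LEN_MIN ? select_len_min (remain, vf)
                        : select_len_vsetvli_like (remain, vf);
}

/* Return 1 if stepping the address by a whole vector (vf elements)
   each iteration would still cover every element exactly once.  */
static int
full_vector_step_is_safe (enum len_semantics sem, size_t n, size_t vf)
{
  size_t remain = n;
  while (remain > 0)
    {
      size_t len = select_len (sem, remain, vf);
      assert (len > 0 && len <= vf);
      remain -= len;
      /* A partial vector is only harmless on the last iteration.  */
      if (len < vf && remain > 0)
        return 0;
    }
  return 1;
}
```

With vf = 16 and n = 40, the MIN flavour yields lengths 16, 16, 8, so adding the whole vector size to the address is fine. The modelled vsetvli-like flavour yields 16, 12, 12, so the address has to advance by len * element size instead.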
 
If the above assumption holds, I wonder if you also want WHILE_LEN to have the implicit effect
of updating the vector length register?  If yes, the code with multiple rgroups looks unexpected:
 
+ _76 = .WHILE_LEN (ivtmp_74, vf * nitems_per_ctrl);
+ _79 = .WHILE_LEN (ivtmp_77, vf * nitems_per_ctrl);
 
as the latter one seems to override the former.  Besides, even if the given operands are known constants,
it can't be directly folded into a constant and propagated further.  From this perspective, Richi's
suggestion on "tieing the scalar result with the uses" looks better IMHO.
 
> 
>>> I tried
>>>to compile the above source files on Power, the former can adopt doloop
>>>optimization but the latter fails to. 
> You mean GCC cannot do hardware loop optimization when the IV loop control is variable?
 
No, in both cases the IV is variable.  The dump at loop2_doloop for the proposed sequence says
"Doloop: Possible infinite iteration case.", which seems to show that for the proposed sequence the compiler
isn't able to figure out that the loop is finite.  It may be missing the range information on n, or it may not be
able to analyze how the invariant is involved, but I didn't look into it; these are all my guesses.
 
BR,
Kewen