No, we can only pass "available" to LCM. Passing "compatible" to LCM can not work for us. LCM can only help for eliminate vsetvls can not help for fuse vsetvls. For example: bb 0: vsetvl e8,mf8 vadd (Demand SEW = 8, LMUL = MF8) bb 1: vsetvl e32 mf2 vle32 (Demand RATIO (SEW/LMUL) = 64, so available [SEW, LMUL] = [e8,mf8 or e16,mf4 or e32,mf2 or e64,m1]) I use LCM to handle the case above, I tell LCM that the vsetvl of vadd is "available" for the following "vle" instruction. Then LCM will let us to remove "vsetvl e32mf2" This is what I said "available" case that I use LCM to handle. However, LCM can not handle "compatible" case. Here is the example: Loop: { bb 0: vsetvl e8,mf8,TA vadd (Demand SEW = 8, LMUL = MF8, can either TU or TA) bb 1: vsetvl e32 mf2,TU vle32 (Demand RATIO (SEW/LMUL) = 64, so available [SEW, LMUL] = [e8,mf8 or e16,mf4 or e32,mf2 or e64,m1], and demand TU) } It's obvious that neither "vsetvl e8,mf8,TA" nor "vsetvl e32 mf2,TU" are available for both instructions "vadd" and "vle". That's why we need Phase 3 in VSETVL PASS. I do the demand fusion generate a new vsetvl instructions "vsetvl e8,mf8,TU" which is available for both RVV instructions "vadd" and "vle", and update the first vsetvl "vsetvl e8,mf8,TA" to "vsetvl e8,mf8,TU" Then, I tell LCM "vsetvl e8,mf8,TU" is available for both "vadd" and "vle32", so LCM will hoist "vsetvl e8,mf8,TU" outside the LOOP and remove all vsetvls inside the loop. juzhe.zhong@rivai.ai From: Richard Biener Date: 2023-04-11 16:55 To: juzhe.zhong CC: Jeff Law; gcc-patches; kito.cheng; palmer Subject: Re: Re: [PATCH] RISC-V: Fix PR108279 On Wed, Apr 5, 2023 at 3:53 PM wrote: > > >> So fusion in this context is really about identifying cases where two > >> configuration settings are equivalent and you "fuse" them together. > >> Presumably this is only going to be possible when the vector insns are > >> just doing data movement rather than actual computations? > > >> If my understanding is correct, I can kind of see why you're doing > >> fusion during phase 3. My sense is there's a better way, but I'm having > >> a bit of trouble working out the details of what that should be to > >> myself. In any event, revamping parts of the vsetvl insertion code > >> isn't the kind of thing we should be doing now. > > The vsetvl demand fusion happens is not necessary "equivalent", instead, we > call it we will do demand fusion when they are "compatible". > And the fusion can happen between any vector insns including data movement > and actual computations. > > What is "compatible" ?? This definition is according to RVV ISA. > For example , For a vadd.vv need a vsetvl zero, 4, e32,m1,ta,ma. > and a vle.v need a vsetvl zero,4,e8,mf4,ta,ma. > > According to RVV ISA: > vadd.vv demand SEW = 32, LMUL = M1, AVL = 4 > vle.v demand RATIO = SEW/LMUL = 32, AVL = 4. > So after demand fusion, the demand becomes SEW = 32, LMUL = M1, AVL = 4. > Such vsetvl instruction is configured as this demand fusion, we call it "compatible" > since we can find a common vsetvl VL/VTYPE status for both vadd.vv and vle.v > > However, what case is not "incompatible", same example, if the vadd.vv demand SEW = 32. LMUL = MF2, > the vadd.vv is incompatible with vle.v. since we can't find a common VL/VTYPE vsetvl instruction available > for both of them. > > We have local demand fusion which is Phase 1. Local demand fusion is doing the fusion within a block > And also we have global demand fusion which is Phase 3. Global demand fusion is doing across blocks. > > After Phase 1, each block has a single demand fusion. Phase 3 is doing global demand fusion trying to > find the common VL/VTYPE status available for a bunch of blocks, and fuse them into a single vsetvl. > So that we eliminate redundant vsetvli. > > Here is a example: > > bb 0: (vle.v demand RATIO = 32) > / \ > bb 1 bb 2 > / \ / \ > bb 3 bb 4 .... bb 5 > vadd vmul vdiv > (demand (demand (demand > sew = 8, sew = 8, sew = 8, > lmul = mf4) lmul = mf4, lmul = mf4, > tail policy = tu) mask policy = mu) > > So in this case, we should do the global demand fusion for bb 0, bb3, bb 4, bb5. > since they are compatible according to RVV ISA. > The final demand info of vsetvl should be vsetvl e8,mf4,tu,mu and put it > in the bb0. Then we can avoid vsetvl in bb 3, 4, 5. Just to throw in a comment here - I think you should present LCM with something it can identify as the same for compatible vsetvl and then it should just work? OTOH if "compatible" is not transitive that's not possible (but then I can't quickly make up an example where it wouldn't be). > >> We have more fusion rules according to RVV ISA. Phase 3 (Global demand fusion) is > >> really important. > > >> That would seem to indicate the function is poorly named. Unless you're > >> using "empty" here to mean the state is valid or dirty. Either way it > >> seems like the function name ought to be improved. > > >> The comments talk about bb1 being inside a loop. Nowhere do you check > >> that as far as I can tell. > > >> When trying to understand what the patch is going I ran across this comment: > > >> /* The local_dem vector insn_info of the block. */ > >> vector_insn_info local_dem; > > > >> That comment really doesn't improve anything. "local_dem" is clearly > >> short-hand for something (local demand?), whatever it is, make it > >> clearer in the comment. > > Sorry for bad comments in the codes. Currently, I am working on the first patch > of auto-vectorization. After I sent the first patch of auto-vectorization for you to > review. I would like to re-check all the comments and code style of VSETVL PASS. > And refine them. > > > > > juzhe.zhong@rivai.ai > > From: Jeff Law > Date: 2023-04-05 21:05 > To: juzhe.zhong; gcc-patches > CC: kito.cheng; palmer > Subject: Re: [PATCH] RISC-V: Fix PR108279 > > > On 4/2/23 16:40, juzhe.zhong@rivai.ai wrote: > > This point is seletected not because LCM but by Phase 3 (VL/VTYPE demand > > info backward fusion and propogation) which > > is I introduced into VSETVL PASS to enhance LCM && improve vsetvl > > instruction performance. > So fusion in this context is really about identifying cases where two > configuration settings are equivalent and you "fuse" them together. > Presumably this is only going to be possible when the vector insns are > just doing data movement rather than actual computations? > > If my understanding is correct, I can kind of see why you're doing > fusion during phase 3. My sense is there's a better way, but I'm having > a bit of trouble working out the details of what that should be to > myself. In any event, revamping parts of the vsetvl insertion code > isn't the kind of thing we should be doing now. > > > WRT the actual patch. Please put a function comment on the > all_empty_predecessor_p method. Something like this perhaps? > > /* Return TRUE if all the predecessors of CFG_BB have vsetvl > state that is valid or dirty, FALSE otherwise. */ > > > That would seem to indicate the function is poorly named. Unless you're > using "empty" here to mean the state is valid or dirty. Either way it > seems like the function name ought to be improved. > > The comments talk about bb1 being inside a loop. Nowhere do you check > that as far as I can tell. > > When trying to understand what the patch is going I ran across this comment: > > /* The local_dem vector insn_info of the block. */ > vector_insn_info local_dem; > > > That comment really doesn't improve anything. "local_dem" is clearly > short-hand for something (local demand?), whatever it is, make it > clearer in the comment. > > Jeff >