>> I didn't get why we need vlmax vsetvl here?
>> We use code_for_pred_mov if subpart_mode is a fractional LMUL mode
>> and will use the whole register load/store if not fractional LMUL.
>> So we don't need an explicit vsetvl for the above 2 cases, in my understanding?
>> I know I must be missing something, do you mind giving me a few more explanations?

The bug happens here: https://godbolt.org/z/TbqKv3cWr

Wrong code:

        vlseg8e16.v     v1,(a5)
        vsetvli         a7,zero,e16,mf2,ta,ma
        vse16.v         v1,0(a7)

The "a7" register is the issue. Before split2, our tuple move pattern is:

(define_insn_and_split "*mov<VT:mode>_<P:mode>"
  [(set (match_operand:VT 0 "reg_or_mem_operand" "=vr,vr, m")
        (match_operand:VT 1 "reg_or_mem_operand" " vr, m,vr"))
   (clobber (match_scratch:P 2 "=X,&r,&r"))
   (clobber (match_scratch:P 3 "=X,&r,&r"))
   (clobber (match_scratch:P 4 "=X,&r,&r"))]

We clobber scalar registers since we may need to emit "vsetvli VL,zero",
but we don't emit an explicit pattern that sets the clobbered registers.
So we end up with the following RTL:

(insn 133 42 153 2 (set (reg:DI 6 t1 [247])
        (reg:DI 17 a7 [251])) "/app/example.c":8:14 206 {*movdi_64bit}
     (nil))
(insn 153 133 154 2 (set (reg:DI 16 a6 [231])
        (reg:DI 6 t1 [247])) "/app/example.c":8:14 206 {*movdi_64bit}
     (nil))
(insn 154 153 155 2 (set (mem:RVVMF2HI (reg:DI 16 a6 [231]) [0 S8 A16])
        (if_then_else:RVVMF2HI (unspec:RVVMF32BI [
                    (const_vector:RVVMF32BI [
                            (const_int 1 [0x1]) repeated x4
                        ])
                    (reg:DI 17 a7 [232])
                    (const_int 2 [0x2]) repeated x2
                    (const_int 1 [0x1])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (reg:RVVMF2HI 97 v1 [orig:162 vect_array.21 ] [162])
            (unspec:RVVMF2HI [
                    (reg:SI 0 zero)
                ] UNSPEC_VUNDEF))) "/app/example.c":8:14 1711 {*pred_movrvvmf2hi}
     (nil))

You can see the memory address is "a6", not "a7". However, we have the
following insns before it:

(insn 133 42 153 2 (set (reg:DI 6 t1 [247])
        (reg:DI 17 a7 [251])) "/app/example.c":8:14 206 {*movdi_64bit}
     (nil))
(insn 153 133 154 2 (set (reg:DI 16 a6 [231])
        (reg:DI 6 t1 [247])) "/app/example.c":8:14 206 {*movdi_64bit}
     (nil))

A later pass considers that "a6" can be replaced by "a7", so the memory
address is changed into "a7", which is wrong. So we should emit a vsetvl
to let GCC know that the "a7" used as the AVL is a different value. Then
the bug is fixed.

But you remind me of one thing: for whole register modes we don't need
this. So the code should be adjusted to:

  if (fractional_p)
    emit_vlmax_vsetvl (subpart_mode, ops[4]);
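To make the intent concrete, here is a minimal sketch of how that guard
could sit in expand_tuple_move. Only the guarded emit_vlmax_vsetvl call
is the actual proposed change; computing fractional_p with known_lt
below is my assumption for illustration, and the surrounding code of
the function is elided:

  /* Sketch, inside expand_tuple_move (rtx *ops) in riscv-v.cc.
     ops[4] is the clobbered scalar scratch register that will hold
     the AVL.  ASSUMPTION: fractional_p is derived here via known_lt;
     the real code may compute this property differently.  */
  bool fractional_p
    = known_lt (GET_MODE_SIZE (subpart_mode), BYTES_PER_RISCV_VECTOR);

  /* Only fractional LMUL subparts go through code_for_pred_mov and
     therefore read the AVL register; whole register moves use the
     whole register load/store and need no vsetvl at all.  */
  if (fractional_p)
    /* Emits "vsetvli ops[4],zero,...", making the definition of the
       AVL register explicit in the RTL, so a later copy-propagation
       pass can no longer assume "a7" still holds the value that was
       copied into "a6".  */
    emit_vlmax_vsetvl (subpart_mode, ops[4]);

With the definition of "a7" visible, substituting "a7" for "a6" in the
memory address becomes invalid, and the store keeps its own address
register.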
juzhe.zhong@rivai.ai

From: Kito Cheng
Date: 2023-11-17 16:49
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Fix bug of tuple move splitter[PR112561]

> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 6a2009ffb05..08bbb657a06 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -374,10 +374,24 @@ void
>  emit_vlmax_insn_lra (unsigned icode, unsigned insn_flags, rtx *ops, rtx vl)
>  {
>    gcc_assert (!can_create_pseudo_p ());
> +  machine_mode mode = GET_MODE (ops[0]);
>
> -  insn_expander<RVV_INSN_OPERANDS_MAX> e (insn_flags, true);
> -  e.set_vl (vl);
> -  e.emit_insn ((enum insn_code) icode, ops);
> +  if (imm_avl_p (mode))
> +    {
> +      /* Even though VL is a real hardreg already allocated since
> +         it is post-RA now, we still gain benefits that we emit
> +         vsetivli zero, imm instead of vsetvli VL, zero which is
> +         we can be more flexible in post-RA instruction scheduling.  */
> +      insn_expander<RVV_INSN_OPERANDS_MAX> e (insn_flags, false);
> +      e.set_vl (gen_int_mode (GET_MODE_NUNITS (mode), Pmode));
> +      e.emit_insn ((enum insn_code) icode, ops);
> +    }
> +  else
> +    {
> +      insn_expander<RVV_INSN_OPERANDS_MAX> e (insn_flags, true);
> +      e.set_vl (vl);
> +      e.emit_insn ((enum insn_code) icode, ops);
> +    }

It's a separate optimization which should not be included within this patch.

>  }
>
>  /* Emit an RVV insn with a predefined vector length.  Contrary to
> @@ -2148,6 +2162,7 @@ expand_tuple_move (rtx *ops)
>        offset = ops[2];
>      }
>
> +  emit_vlmax_vsetvl (subpart_mode, ops[4]);

I didn't get why we need vlmax vsetvl here?
We use code_for_pred_mov if subpart_mode is a fractional LMUL mode
and will use the whole register load/store if not fractional LMUL.
So we don't need an explicit vsetvl for the above 2 cases, in my understanding?
I know I must be missing something, do you mind giving me a few more explanations?

>    if (MEM_P (ops[1]))
>      {
>        /* Load operations.  */