>> ISTM that if you want to run before sched2, then >> you'd need to introduce dependencies between the vsetvl instrutions and >> the vector instructions that utilize those settings? Yes, I want to run before sched2 so that we could have the chance to do the instruction scheduling before sched2. I already introduce dependencies in vector instructions so that it won't produce any issues. >> Formatting note. For a multi-line conditional, go ahead and use an open >> paren and the usual indention style. >> return (INSN_CODE (rinsn) == CODE_FOR_vsetvldi >> || INSN_CODE (rinsn) == CODE_FOR_vsetvlsi); >> There's other examples in the new file. >> s/shoule/should/ >> s/propagete/propagate/ >> s/optimzal/optimal/ >> s/PASSes/passes/ >> s/intrinsiscs/intrinsics/ >> s/instrinsics/intrinsics/ >> s/acrocss/across/ Address commnents. >> It'd probably be better to move this into rtl.cc with a prototype in >> rtl.h rather than have duplicate definitions in gcse.c and the RISC-V >> backend. I'm not even entirely sure why we really need it here. Maybe we do that when GCC14 is open? >> These need function comments. What isn't clear to me is why we don't >> just call validate_change? Is it just so we get the dump info? Yes, since it's called more than once and I want to dump details in dump file. Such dump infos are important for debugging. juzhe.zhong@rivai.ai From: Jeff Law Date: 2022-12-19 23:44 To: juzhe.zhong; gcc-patches CC: kito.cheng; palmer Subject: Re: [PATCH] RISC-V: Support VSETVL PASS for RVV support I believe Kito already approved. There's nothing here that is critical, just minor cleanups and I'm fine with them being cleaned up as a follow-up patch given Kito has already approved this patch. On 12/14/22 00:13, juzhe.zhong@rivai.ai wrote: > From: Ju-Zhe Zhong > > This patch is to support VSETVL PASS for RVV support. > 1.The optimization and performance is guaranteed LCM (Lazy code motion). > 2.Base on RTL_SSA framework to gain better optimization chances. > 3.Also we do VL/VTYPE, demand information backward propagation across > blocks by RTL_SSA reverse order in CFG. > 4.It has been well and fully tested by about 200+ testcases for VLMAX > AVL situation (Only for VLMAX since we don't have an intrinsics to > test non-VLMAX). > 5.Will support AVL model in the next patch > > gcc/ChangeLog: > > * config.gcc: Add riscv-vsetvl.o. > * config/riscv/riscv-passes.def (INSERT_PASS_BEFORE): Add VSETVL PASS location. > * config/riscv/riscv-protos.h (make_pass_vsetvl): New function. > (enum avl_type): New enum. > (get_ta): New function. > (get_ma): Ditto. > (get_avl_type): Ditto. > (calculate_ratio): Ditto. > (enum tail_policy): New enum. > (enum mask_policy): Ditto. > * config/riscv/riscv-v.cc (calculate_ratio): New function. > (emit_pred_op): change the VLMAX mov codgen. > (get_ta): New function. > (get_ma): Ditto. > (enum tail_policy): Change enum. > (get_prefer_tail_policy): New function. > (enum mask_policy): Change enum. > (get_prefer_mask_policy): New function. > * config/riscv/t-riscv: Add riscv-vsetvl.o > * config/riscv/vector.md (): Adjust attribute and pattern for VSETVL PASS. > (@vlmax_avl): Ditto. > (@vsetvl_no_side_effects): Delete. > (vsetvl_vtype_change_only): New MD pattern. > (@vsetvl_discard_result): Ditto. > * config/riscv/riscv-vsetvl.cc: New file. > * config/riscv/riscv-vsetvl.h: New file. So a high level note. Once you've inserted your vsetvl instrutions, you can't have further code motion, correct? So doesn't this potentially have a poor interaction with something like speculative code motion as performed by sched? ISTM that if you want to run before sched2, then you'd need to introduce dependencies between the vsetvl instrutions and the vector instructions that utilize those settings? I can envision wanting to schedule the vsetvl instructions so that they bubble up slightly from their insertion points to avoid stalls or allow the vector units to start executing earlier. Is that what's driving the the current pass placement? If not would it make more sense to use the late prologue/epilogue hooks that Richard Sandiford posted recently (I'm not sure they're committed yet). > + > +static bool > +loop_basic_block_p (const basic_block cfg_bb) > +{ > + return JUMP_P (BB_END (cfg_bb)) && any_condjump_p (BB_END (cfg_bb)); > +} The name seems poor here -- AFAICT this has nothing to do with loops. It's just a test that the end of a block is a conditional jump. I'm pretty sure we could extract BB_END (cfg_bb) and use an existing routine instead of writing our own. I'd suggest peeking at jump.cc to see if there's something already suitable. + > +/* Return true if it is vsetvldi or vsetvlsi. */ > +static bool > +vsetvl_insn_p (rtx_insn *rinsn) > +{ > + return INSN_CODE (rinsn) == CODE_FOR_vsetvldi > + || INSN_CODE (rinsn) == CODE_FOR_vsetvlsi; Formatting note. For a multi-line conditional, go ahead and use an open paren and the usual indention style. return (INSN_CODE (rinsn) == CODE_FOR_vsetvldi || INSN_CODE (rinsn) == CODE_FOR_vsetvlsi); There's other examples in the new file. > + > +/* An "anticipatable occurrence" is one that is the first occurrence in the > + basic block, the operands are not modified in the basic block prior > + to the occurrence and the output is not used between the start of > + the block and the occurrence. */ > +static bool > +anticipatable_occurrence_p (const insn_info *insn, const vector_insn_info dem) > +{ > + /* The only possible operand we care of VSETVL is AVL. */ > + if (dem.has_avl_reg ()) > + { > + /* The operands shoule not be modified in the basic block prior s/shoule/should/ > + > +/* Return true if the branch probability is dominate. */ > +static bool > +dominate_probability_p (edge e) The function comment needs some work. "is dominate" doesn't really mean anything without more context. It looks like you're really testing if the edge probability is greater than 50%. > + > + /* There is a obvious case that is not worthwhile and meaningless > + to propagate the demand information: > + local_dem > + __________ > + ____|____ | > + | | | > + |________| | > + |_________| > + reaching_out > + Header is incompatible with reaching_out and the block is loop itself, > + we don't backward propagete the local_dem since we can't avoid emit > + vsetvl for the local_dem. */ s/propagete/propagate/ > +static bool > +can_backward_propagate_p (const function_info *ssa, const basic_block cfg_bb, > + const vector_insn_info prop) > +{ > + insn_info *insn = prop.get_insn (); > + > + /* TODO: We don't backward propagate the explict VSETVL here > + since we will change vsetvl and vsetvlmax intrinsiscs into > + no side effects which can be optimized into optimzal location > + by GCC internal PASSes. We only need to support these backward > + propagation if vsetvl instrinsics have side effects. */ s/optimzal/optimal/ s/PASSes/passes/ s/intrinsiscs/intrinsics/ s/instrinsics/intrinsics/ > + > + /* If the definition is in the current block, we can't propagate it > + acrocss blocks. */ s/acrocss/across/ > + > +static void > +eliminate_insn (rtx_insn *rinsn) > +{ > + if (dump_file) > + { > + fprintf (dump_file, "\nEliminate insn %d:\n", INSN_UID (rinsn)); > + print_rtl_single (dump_file, rinsn); > + } > + if (in_sequence_p ()) > + remove_insn (rinsn); > + else > + delete_insn (rinsn); > +} Hmm, presumably you can be in a sequence because you're generating code and you might want to remove an insn in the generated sequence? > + > +/* ??? If there was a jump optimization pass after gcse and before loop, > + then we would not need to do this here, because jump would add the > + necessary REG_LABEL_OPERAND and REG_LABEL_TARGET notes. */ > + > +static void > +add_label_notes (rtx x, rtx_insn *insn) It'd probably be better to move this into rtl.cc with a prototype in rtl.h rather than have duplicate definitions in gcse.c and the RISC-V backend. I'm not even entirely sure why we really need it here. > + > + This is used by both the PRE and code hoisting. */ > + > +static void > +insert_insn_end_basic_block (rtx_insn *rinsn, basic_block cfg_bb) Similarly. Though rather than rtl.cc maybe we should just export it out of gcse or wherever it's currently defined. > > + > +static void > +change_insn (rtx_insn *rinsn, rtx new_pat) These need function comments. What isn't clear to me is why we don't just call validate_change? Is it just so we get the dump info? Jeff