I have reorder the functions so that we won't mess up deleted functions and new functions. V2 patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628237.html >> Why need this exception? Because we have this piece code here for fusion in "EMPTY" block: new_info = expr.merge (expr, GLOBAL_MERGE, eg->src->index); The expr may not have a reall avl source which is considered as incompatible. However, in this case, we should skip the compatible check, just use merge to compute demand info. >>Make sure I understand this correctly: it's worth if thoe edges has >>different probability? >>If all probability is same, then it's not worth? The probability is supposed to help for picking the optimal VSETVL info for incompatible demand infos. Consider this following case: void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond, size_t cond2) { for (size_t i = 0; i < n; i++) { if (i== cond) { vint8mf8_t v = *(vint8mf8_t*)(in + i + 100); *(vint8mf8_t*)(out + i + 100) = v; } else { vbool1_t v = *(vbool1_t*)(in + i + 400); *(vbool1_t*)(out + i + 400) = v; } } } Both VSETVLs are incompatible since one want e8mf8, the other wants e8m8. For if (i == cond) is very low probability (It could only be accessed 0 times or once) We want to hoist the e8m8 to get optimal codegen like this: f: beq a2,zero,.L10 addi a0,a0,1600 addi a1,a1,1600 li a5,0 vsetvli a4,zero,e8,m8,ta,ma .L5: beq a3,a5,.L12 vlm.v v1,0(a0) vsm.v v1,0(a1) .L4: addi a5,a5,1 addi a0,a0,4 addi a1,a1,4 bne a2,a5,.L5 .L10: ret .L12: vsetvli a7,zero,e8,mf8,ta,ma addi a6,a1,-1200 addi t1,a0,-1200 vle8.v v1,0(t1) vse8.v v1,0(a6) vsetvli a4,zero,e8,m8,ta,ma j .L4 Wheras the other case is like this: void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond, size_t cond2) { for (size_t i = 0; i < n; i++) { if (i > cond) { vint8mf8_t v = *(vint8mf8_t*)(in + i + 100); *(vint8mf8_t*)(out + i + 100) = v; } else { vbool1_t v = *(vbool1_t*)(in + i + 400); *(vbool1_t*)(out + i + 400) = v; } } } Both condition probabilities are equal, so we don't want to take any of them as higher priority, so the codegen should be: f: beq a2,zero,.L10 addi a0,a0,1600 addi a1,a1,1600 li a5,0 j .L5 .L12: vsetvli a7,zero,e8,mf8,ta,ma addi a5,a5,1 vle8.v v1,0(a6) vse8.v v1,0(a4) addi a0,a0,4 addi a1,a1,4 beq a2,a5,.L10 .L5: addi a4,a1,-1200 addi a6,a0,-1200 bltu a3,a5,.L12 vsetvli t1,zero,e8,m8,ta,ma addi a5,a5,1 vlm.v v1,0(a0) vsm.v v1,0(a1) addi a0,a0,4 addi a1,a1,4 bne a2,a5,.L5 .L10: ret juzhe.zhong@rivai.ai From: Kito Cheng Date: 2023-08-22 23:35 To: Kito Cheng CC: Robin Dapp; Juzhe-Zhong; GCC Patches; Jeff Law Subject: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS It's really great improvement, it's drop some state like HARD_EMPTY and DIRTY_WITH_KILLED_AVL which make this algorithm more easy to understand! also this also fundamentally improved the phase 3, although one concern is the time complexity might be come more higher order, (and it's already high enough in fact.) but mostly those vectorized code are only appeard within the inner most loop, so that is acceptable in generally So I will try my best to review this closely to make it more close to the perfect :) I saw you has update serveral testcase, why update instead of add new testcase?? could you say more about why some testcase added __riscv_vadd_vv_i8mf8 or add some more dependency of vl variable? > @@ -1423,8 +1409,13 @@ static bool > ge_sew_ratio_unavailable_p (const vector_insn_info &info1, > const vector_insn_info &info2) > { > - if (!info2.demand_p (DEMAND_LMUL) && info2.demand_p (DEMAND_GE_SEW)) > - return info1.get_sew () < info2.get_sew (); > + if (!info2.demand_p (DEMAND_LMUL)) > + { > + if (info2.demand_p (DEMAND_GE_SEW)) > + return info1.get_sew () < info2.get_sew (); > + else if (!info2.demand_p (DEMAND_SEW)) > + return false; > + } This seems relax the compatiblitly check to allow optimize more case, if so this should be a sperated patch. > return true; > } > @@ -1815,7 +1737,7 @@ vector_insn_info::parse_insn (rtx_insn *rinsn) > return; > if (optimize == 0 && !has_vtype_op (rinsn)) > return; > - if (optimize > 0 && !vsetvl_insn_p (rinsn)) > + if (optimize > 0 && vsetvl_discard_result_insn_p (rinsn)) I didn't get this change, could you explan few more about that? it was early exit for non vsetvl insn, but now it allowed that now? > return; > m_state = VALID; > extract_insn_cached (rinsn); > @@ -2206,9 +2128,9 @@ vector_insn_info::fuse_mask_policy (const vector_insn_info &info1, > > vector_insn_info > vector_insn_info::merge (const vector_insn_info &merge_info, > - enum merge_type type) const > + enum merge_type type, unsigned bb_index) const > { > - if (!vsetvl_insn_p (get_insn ()->rtl ())) > + if (!vsetvl_insn_p (get_insn ()->rtl ()) && *this != merge_info) Why need this exception? > gcc_assert (this->compatible_p (merge_info) > && "Can't merge incompatible demanded infos"); > @@ -2403,18 +2348,22 @@ vector_infos_manager::get_all_available_exprs ( > } > > bool > -vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) const > +vector_infos_manager::earliest_fusion_worthwhile_p ( > + const basic_block cfg_bb) const > { > - hash_set pred_cfg_bbs = get_all_predecessors (cfg_bb); > - for (const basic_block pred_cfg_bb : pred_cfg_bbs) > + edge e; > + edge_iterator ei; > + profile_probability prob = profile_probability::uninitialized (); > + FOR_EACH_EDGE (e, ei, cfg_bb->succs) > { > - const auto &pred_block_info = vector_block_infos[pred_cfg_bb->index]; > - if (!pred_block_info.local_dem.valid_or_dirty_p () > - && !pred_block_info.reaching_out.valid_or_dirty_p ()) > + if (prob == profile_probability::uninitialized ()) > + prob = vector_block_infos[e->dest->index].probability; > + else if (prob == vector_block_infos[e->dest->index].probability) > continue; > - return false; > + else > + return true; Make sure I understand this correctly: it's worth if thoe edges has different probability? > } > - return true; > + return false; If all probability is same, then it's not worth? Plz add few comments no matter my understand is right or not :) > } > > bool > @@ -2428,12 +2377,12 @@ vector_infos_manager::all_same_ratio_p (sbitmap bitdata) const > sbitmap_iterator sbi; > > EXECUTE_IF_SET_IN_BITMAP (bitdata, 0, bb_index, sbi) > - { > - if (ratio == -1) > - ratio = vector_exprs[bb_index]->get_ratio (); > - else if (vector_exprs[bb_index]->get_ratio () != ratio) > - return false; > - } > + { > + if (ratio == -1) > + ratio = vector_exprs[bb_index]->get_ratio (); > + else if (vector_exprs[bb_index]->get_ratio () != ratio) > + return false; > + } > return true; > } Split this into a NFC patch, you can commit that without asking review. > @@ -907,8 +893,8 @@ change_insn (function_info *ssa, insn_change change, insn_info *insn, > ] UNSPEC_VPREDICATE) > (plus:RVVM4DI (reg/v:RVVM4DI 104 v8 [orig:137 op1 ] [137]) > (sign_extend:RVVM4DI (vec_duplicate:RVVM4SI (reg:SI 15 a5 > - [140])))) (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF))) "rvv.c":8:12 > - 2784 {pred_single_widen_addsvnx8di_scalar} (expr_list:REG_EQUIV > + [140])))) (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF))) > + "rvv.c":8:12 2784 {pred_single_widen_addsvnx8di_scalar} (expr_list:REG_EQUIV > (mem/c:RVVM4DI (reg:DI 10 a0 [142]) [1 +0 S[64, 64] A128]) > (expr_list:REG_EQUAL (if_then_else:RVVM4DI (unspec:RVVMF8BI [ > (const_vector:RVVMF8BI repeat [ Split this into a NFC patch, you can commit that without asking review. > @@ -2777,6 +2770,17 @@ pass_vsetvl::update_vector_info (const insn_info *i, > m_vector_manager->vector_insn_infos[i->uid ()] = new_info; > } > > +void > +pass_vsetvl::update_block_info (int index, profile_probability prob, > + vector_insn_info new_info) const vector_insn_info &new_info > +{ > + m_vector_manager->vector_block_infos[index].probability = prob; > + if (m_vector_manager->vector_block_infos[index].local_dem > + == m_vector_manager->vector_block_infos[index].reaching_out) > + m_vector_manager->vector_block_infos[index].local_dem = new_info; > + m_vector_manager->vector_block_infos[index].reaching_out = new_info; > +} > + { auto &block_info = m_vector_manager->vector_block_infos[index]; block_info.probability = prob; if (block_info.local_dem == block_info.reaching_out) block_info.local_dem = new_info; block_info.reaching_out = new_info; } > /* Simple m_vsetvl_insert vsetvl for optimize == 0. */ > void > pass_vsetvl::simple_vsetvl (void) const > + for (insn_info *i = earliest_pred->end_insn ()->prev_nondebug_insn (); > + real_insn_and_same_bb_p (i, earliest_pred) > + && after_or_same_p (i, last_insn); > + i = i->prev_nondebug_insn ()) > { > + if (!vl && find_access (i->defs (), REGNO (avl))) > + return false; > + if (vl && find_access (i->defs (), REGNO (vl))) > + return false; > + if (vl && find_access (i->uses (), REGNO (vl))) > + return false; should we check `i->is_call () || i->is_asm ()`? > @@ -3892,7 +3408,7 @@ pass_vsetvl::refine_vsetvls (void) const > basic_block cfg_bb; > FOR_EACH_BB_FN (cfg_bb, cfun) > { > - auto info = get_block_info(cfg_bb).local_dem; > + auto info = get_block_info (cfg_bb).local_dem; > insn_info *insn = info.get_insn (); > if (!info.valid_p ()) > continue; Split this into a NFC patch, you can commit that without asking review. > @@ -3938,8 +3454,7 @@ pass_vsetvl::cleanup_vsetvls () > basic_block cfg_bb; > FOR_EACH_BB_FN (cfg_bb, cfun) > { > - auto &info > - = get_block_info(cfg_bb).reaching_out; > + auto &info = get_block_info (cfg_bb).reaching_out; > gcc_assert (m_vector_manager->expr_set_num ( > m_vector_manager->vector_del[cfg_bb->index]) > <= 1); Split this into a NFC patch, you can commit that without asking review. > @@ -3951,9 +3466,7 @@ pass_vsetvl::cleanup_vsetvls () > info.set_unknown (); > else > { > - const auto dem > - = get_block_info(cfg_bb) > - .local_dem; > + const auto dem = get_block_info (cfg_bb).local_dem; > gcc_assert (dem == *m_vector_manager->vector_exprs[i]); > insn_info *insn = dem.get_insn (); > gcc_assert (insn && insn->rtl ()); Split this into a NFC patch, you can commit that without asking review. > @@ -4020,33 +3543,10 @@ pass_vsetvl::commit_vsetvls (void) > for (const bb_info *bb : crtl->ssa->bbs ()) > { > basic_block cfg_bb = bb->cfg_bb (); > - const auto reaching_out > - = get_block_info(cfg_bb).reaching_out; > + const auto reaching_out = get_block_info (cfg_bb).reaching_out; Split this into a NFC patch, you can commit that without asking review. > @@ -4263,7 +3783,8 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const bb_info *bb) const > > /* Local AVL compatibility checking is simpler than global, we only > need to check the REGNO is same. */ > - if (prev_dem.valid_or_dirty_p () && prev_dem.skip_avl_compatible_p (curr_dem) > + if (prev_dem.valid_or_dirty_p () > + && prev_dem.skip_avl_compatible_p (curr_dem) > && local_avl_compatible_p (prev_avl, curr_avl)) > { > /* curr_dem and prev_dem is compatible! */ Split this into a NFC patch, you can commit that without asking review. >@@ -4655,8 +4240,7 @@ pass_vsetvl::compute_probabilities (void) > for (const bb_info *bb : crtl->ssa->bbs ()) > { > basic_block cfg_bb = bb->cfg_bb (); >- auto &curr_prob >- = get_block_info(cfg_bb).probability; >+ auto &curr_prob = get_block_info (cfg_bb).probability; > > /* GCC assume entry block (bb 0) are always so > executed so set its probability as "always". */ Split this into a NFC patch, you can commit that without asking review. > @@ -4669,8 +4253,7 @@ pass_vsetvl::compute_probabilities (void) > gcc_assert (curr_prob.initialized_p ()); > FOR_EACH_EDGE (e, ei, cfg_bb->succs) > { > - auto &new_prob > - = get_block_info(e->dest).probability; > + auto &new_prob = get_block_info (e->dest).probability; > if (!new_prob.initialized_p ()) > new_prob = curr_prob * e->probability; > else if (new_prob == profile_probability::always ()) Split this into a NFC patch, you can commit that without asking review. > @@ -4298,7 +3819,8 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const bb_info *bb) const > none exists or if a user RVV instruction is enountered > prior to any vsetvl. */ > static rtx_insn * > -get_first_vsetvl_before_rvv_insns (basic_block cfg_bb) > +get_first_vsetvl_before_rvv_insns (basic_block cfg_bb, > + enum vsetvl_type insn_type) > { add gcc_assert (insn_type == VSETVL_DISCARD_RESULT || insn_type == VSETVL_VTYPE_CHANGE_ONLY). > rtx_insn *rinsn; > FOR_BB_INSNS (cfg_bb, rinsn) > @@ -4310,7 +3832,11 @@ get_first_vsetvl_before_rvv_insns (basic_block cfg_bb) > if (has_vtype_op (rinsn) || vsetvl_insn_p (rinsn)) > return nullptr; > > - if (vsetvl_discard_result_insn_p (rinsn)) > + if (insn_type == VSETVL_DISCARD_RESULT > + && vsetvl_discard_result_insn_p (rinsn)) > + return rinsn; > + if (insn_type == VSETVL_VTYPE_CHANGE_ONLY > + && vsetvl_vtype_change_only_p (rinsn)) > return rinsn; > } > return nullptr; > diff --git a/gcc/config/riscv/riscv-vsetvl.def b/gcc/config/riscv/riscv-vsetvl.def > index 7a73149f1da..7289c01efcf 100644 > --- a/gcc/config/riscv/riscv-vsetvl.def > +++ b/gcc/config/riscv/riscv-vsetvl.def > @@ -319,7 +319,7 @@ DEF_SEW_LMUL_FUSE_RULE (/*SEW*/ DEMAND_TRUE, /*LMUL*/ DEMAND_FALSE, > /*RATIO*/ DEMAND_TRUE, /*GE_SEW*/ DEMAND_FALSE, > /*NEW_DEMAND_SEW*/ true, > /*NEW_DEMAND_LMUL*/ false, > - /*NEW_DEMAND_RATIO*/ false, > + /*NEW_DEMAND_RATIO*/ true, This seems relax the compatiblitly check to allow optimize more case, if so this should be a sperated patch. > /*NEW_DEMAND_GE_SEW*/ true, first_sew, > vlmul_for_first_sew_second_ratio, second_ratio) > DEF_SEW_LMUL_FUSE_RULE (/*SEW*/ DEMAND_TRUE, /*LMUL*/ DEMAND_FALSE, > @@ -386,7 +337,8 @@ public: > bool compatible_avl_p (const avl_info &) const; > bool compatible_vtype_p (const vl_vtype_info &) const; > bool compatible_p (const vl_vtype_info &) const; > - vector_insn_info merge (const vector_insn_info &, enum merge_type) const; > + vector_insn_info merge (const vector_insn_info &, enum merge_type, > + unsigned = 0) const; it seems weired to set bb_index as 0 by default? > > rtl_ssa::insn_info *get_insn () const { return m_insn; } > const bool *get_demands (void) const { return m_demands; } > diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv > index 1252d6f851a..f3ce66ccdd4 100644 > --- a/gcc/config/riscv/t-riscv > +++ b/gcc/config/riscv/t-riscv > @@ -62,7 +62,8 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \ > $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \ > $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-config.h \ > insn-attr.h insn-opinit.h tm-constrs.h cfgrtl.h cfganal.h lcm.h \ > - predict.h profile-count.h $(srcdir)/config/riscv/riscv-vsetvl.h > + predict.h profile-count.h $(srcdir)/config/riscv/riscv-vsetvl.h \ > + $(srcdir)/config/riscv/riscv-vsetvl.def This should be a seperate fix and backport to GCC 13 as well, pre-approve for both master and GCC-13 branch for this fix. > $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \ > $(srcdir)/config/riscv/riscv-vsetvl.cc