This patch removed 2nd time initialization of RTL_SSA which is the approach we both hate. juzhe.zhong@rivai.ai From: Kito Cheng Date: 2023-06-09 18:45 To: juzhe.zhong CC: gcc-patches; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; pan2.li Subject: Re: [PATCH V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS Thankful you send this before weekend, I could run the fuzzy testing during this weekend :P On Fri, Jun 9, 2023 at 6:41 PM wrote: > > From: Juzhe-Zhong > > This patch is to rework Phase 5 && Phase 6 of VSETVL PASS since Phase 5 && Phase 6 > are quite messy and cause some bugs discovered by my downstream auto-vectorization > test-generator. > > Before this patch. > > Phase 5 is cleanup_insns is the function remove AVL operand dependency from each RVV instruction. > E.g. vadd.vv (use a5), after Phase 5, ====> vadd.vv (use const_int 0). Since "a5" is used in "vsetvl" instructions and > after the correct "vsetvl" instructions are inserted, each RVV instruction doesn't need AVL operand "a5" anymore. Then, > we remove this operand dependency helps for the following scheduling PASS. > > Phase 6 is propagate_avl do the following 2 things: > 1. Local && Global user vsetvl instructions optimization. > E.g. > vsetvli a2, a2, e8, mf8 ======> Change it into vsetvli a2, a2, e32, mf2 > vsetvli zero,a2, e32, mf2 ======> eliminate > 2. Optimize user vsetvl from "vsetvl a2,a2" into "vsetvl zero,a2" if "a2" is not used by any instructions. > Since from Phase 1 ~ Phase 4 which inserts "vsetvli" instructions base on LCM which change the CFG, I re-new a new > RTL_SSA framework (which is more expensive than just using DF) for Phase 6 and optmize user vsetvli base on the new RTL_SSA. > > There are 2 issues in Phase 5 && Phase 6: > 1. local_eliminate_vsetvl_insn was introduced by @kito which can do better local user vsetvl optimizations better than > Phase 6 do, such approach doesn't need to re-new the RTL_SSA framework. So the local user vsetvli instructions optimizaiton > in Phase 6 is redundant and should be removed. > 2. A bug discovered by my downstream auto-vectorization test-generator (I can't put the test in this patch since we are missing autovec > patterns for it so we can't use the upstream GCC directly reproduce such issue but I will remember put it back after I support the > necessary autovec patterns). Such bug is causing by using RTL_SSA re-new framework. The issue description is this: > > Before Phase 6: > ... > insn1: vsetlvi a3, 17 <========== generated by SELECT_VL auto-vec pattern. > slli a4,a3,3 > ... > insn2: vsetvli zero, a3, ... > load (use const_int 0, before Phase 5, it's using a3, but the use of "a3" is removed in Phase 5) > ... > > In Phase 6, we iterate to insn2, then get the def of "a3" which is the insn1. > insn2 is the vsetvli instruction inserted in Phase 4 which is not included in the RLT_SSA framework > even though we renew it (I didn't take a look at it and I don't think we need to now). > Base on this situation, the def_info of insn2 has the information "set->single_nondebug_insn_use ()" > which return true. Obviously, this information is not correct, since insn1 has aleast 2 uses: > 1). slli a4,a3,3 2).insn2: vsetvli zero, a3, ... Then, the test generated by my downstream test-generator > execution test failed. > > Conclusion of RTL_SSA framework: > Before this patch, we initialize RTL_SSA 2 times. One is at the beginning of the VSETVL PASS which is absolutely correct, the other > is re-new after Phase 4 (LCM) has incorrect information that causes bugs. > > Besides, we don't like to initialize RTL_SSA second time it seems to be a waste since we just need to do a little optimization. > > Base on all circumstances I described above, I rework and reorganize Phase 5 && Phase 6 as follows: > 1. Phase 5 is called ssa_post_optimization which is doing the optimization base on the RTL_SSA information (The RTL_SSA is initialized > at the beginning of the VSETVL PASS, no need to re-new it again). This phase includes 3 optimizaitons: > 1). local_eliminate_vsetvl_insn we already have (no change). > 2). global_eliminate_vsetvl_insn ---> new optimizaiton splitted from orignal Phase 6 but with more powerful and reliable implementation. > E.g. > void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) { > size_t avl; > if (m > 100) > avl = __riscv_vsetvl_e16mf4(vl << 4); > else > avl = __riscv_vsetvl_e32mf2(vl >> 8); > for (size_t i = 0; i < m; i++) { > vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, avl); > v0 = __riscv_vadd_vv_i8mf8 (v0, v0, avl); > __riscv_vse8_v_i8mf8(out + i, v0, avl); > } > } > > This example failed to global user vsetvl optimize before this patch: > f: > li a5,100 > bleu a3,a5,.L2 > slli a2,a2,4 > vsetvli a4,a2,e16,mf4,ta,mu > .L3: > li a5,0 > vsetvli zero,a4,e8,mf8,ta,ma > .L5: > add a6,a0,a5 > add a2,a1,a5 > vle8.v v1,0(a6) > addi a5,a5,1 > vadd.vv v1,v1,v1 > vse8.v v1,0(a2) > bgtu a3,a5,.L5 > .L10: > ret > .L2: > beq a3,zero,.L10 > srli a2,a2,8 > vsetvli a4,a2,e32,mf2,ta,mu > j .L3 > With this patch: > f: > li a5,100 > bleu a3,a5,.L2 > slli a2,a2,4 > vsetvli zero,a2,e8,mf8,ta,ma > .L3: > li a5,0 > .L5: > add a6,a0,a5 > add a2,a1,a5 > vle8.v v1,0(a6) > addi a5,a5,1 > vadd.vv v1,v1,v1 > vse8.v v1,0(a2) > bgtu a3,a5,.L5 > .L10: > ret > .L2: > beq a3,zero,.L10 > srli a2,a2,8 > vsetvli zero,a2,e8,mf8,ta,ma > j .L3 > > 3). Remove AVL operand dependency of each RVV instructions. > > 2. Phase 6 is called df_post_optimization: Optimize "vsetvl a3,a2...." into Optimize "vsetvl zero,a2...." base on > dataflow analysis of new CFG (new CFG is created by LCM). The reason we need to do use new CFG and after Phase 5: > ... > vsetvl a3, a2... > vadd.vv (use a3) > If we don't have Phase 5 which removes the "a3" use in vadd.vv, we will fail to optimize vsetvl a3,a2 into vsetvl zero,a2. > > This patch passed all tests in rvv.exp with ONLY peformance && codegen improved (no performance decline and no bugs including my > downstream tests). > > gcc/ChangeLog: > > * config/riscv/riscv-vsetvl.cc (available_occurrence_p): Ehance user vsetvl optimization. > (vector_insn_info::parse_insn): Add rtx_insn parse. > (pass_vsetvl::local_eliminate_vsetvl_insn): Ehance user vsetvl optimization. > (get_first_vsetvl): New function. > (pass_vsetvl::global_eliminate_vsetvl_insn): Ditto. > (pass_vsetvl::cleanup_insns): Remove it. > (pass_vsetvl::ssa_post_optimization): New function. > (has_no_uses): Ditto. > (pass_vsetvl::propagate_avl): Remove it. > (pass_vsetvl::df_post_optimization): New function. > (pass_vsetvl::lazy_vsetvl): Rework Phase 5 && Phase 6. > * config/riscv/riscv-vsetvl.h: Adapt declaration. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/vsetvl/vsetvl-16.c: Adapt test. > * gcc.target/riscv/rvv/vsetvl/vsetvl-2.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvl-3.c: Ditto. > * gcc.target/riscv/rvv/vsetvl/vsetvl-21.c: New test. > * gcc.target/riscv/rvv/vsetvl/vsetvl-22.c: New test. > * gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: New test. > > --- > gcc/config/riscv/riscv-vsetvl.cc | 400 +++++++++++------- > gcc/config/riscv/riscv-vsetvl.h | 34 +- > .../gcc.target/riscv/rvv/vsetvl/vsetvl-16.c | 2 +- > .../gcc.target/riscv/rvv/vsetvl/vsetvl-2.c | 2 +- > .../gcc.target/riscv/rvv/vsetvl/vsetvl-21.c | 21 + > .../gcc.target/riscv/rvv/vsetvl/vsetvl-22.c | 21 + > .../gcc.target/riscv/rvv/vsetvl/vsetvl-23.c | 37 ++ > .../gcc.target/riscv/rvv/vsetvl/vsetvl-3.c | 2 +- > 8 files changed, 366 insertions(+), 153 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-21.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-22.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c > > diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc > index fe55f4ccd30..924a94adf9c 100644 > --- a/gcc/config/riscv/riscv-vsetvl.cc > +++ b/gcc/config/riscv/riscv-vsetvl.cc > @@ -395,10 +395,15 @@ available_occurrence_p (const bb_info *bb, const vector_insn_info dem) > if (!vlmax_avl_p (dem.get_avl ())) > { > rtx dest = NULL_RTX; > + insn_info *i = insn; > if (vsetvl_insn_p (insn->rtl ())) > - dest = get_vl (insn->rtl ()); > - for (const insn_info *i = insn; real_insn_and_same_bb_p (i, bb); > - i = i->next_nondebug_insn ()) > + { > + dest = get_vl (insn->rtl ()); > + /* For user vsetvl a2, a2 instruction, we consider it as > + available even though it modifies "a2". */ > + i = i->next_nondebug_insn (); > + } > + for (; real_insn_and_same_bb_p (i, bb); i = i->next_nondebug_insn ()) > { > if (read_vl_insn_p (i->rtl ())) > continue; > @@ -1893,11 +1898,13 @@ vector_insn_info::parse_insn (rtx_insn *rinsn) > *this = vector_insn_info (); > if (!NONDEBUG_INSN_P (rinsn)) > return; > - if (!has_vtype_op (rinsn)) > + if (optimize == 0 && !has_vtype_op (rinsn)) > + return; > + if (optimize > 0 && !vsetvl_insn_p (rinsn)) > return; > m_state = VALID; > extract_insn_cached (rinsn); > - const rtx avl = recog_data.operand[get_attr_vl_op_idx (rinsn)]; > + rtx avl = ::get_avl (rinsn); > m_avl = avl_info (avl, nullptr); > m_sew = ::get_sew (rinsn); > m_vlmul = ::get_vlmul (rinsn); > @@ -2730,10 +2737,11 @@ private: > /* Phase 5. */ > rtx_insn *get_vsetvl_at_end (const bb_info *, vector_insn_info *) const; > void local_eliminate_vsetvl_insn (const bb_info *) const; > - void cleanup_insns (void) const; > + bool global_eliminate_vsetvl_insn (const bb_info *) const; > + void ssa_post_optimization (void) const; > > /* Phase 6. */ > - void propagate_avl (void) const; > + void df_post_optimization (void) const; > > void init (void); > void done (void); > @@ -4246,7 +4254,7 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const bb_info *bb) const > > /* Local AVL compatibility checking is simpler than global, we only > need to check the REGNO is same. */ > - if (prev_dem.valid_p () && prev_dem.skip_avl_compatible_p (curr_dem) > + if (prev_dem.valid_or_dirty_p () && prev_dem.skip_avl_compatible_p (curr_dem) > && local_avl_compatible_p (prev_avl, curr_avl)) > { > /* curr_dem and prev_dem is compatible! */ > @@ -4277,27 +4285,187 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const bb_info *bb) const > } > } > > -/* Before VSETVL PASS, RVV instructions pattern is depending on AVL operand > - implicitly. Since we will emit VSETVL instruction and make RVV instructions > - depending on VL/VTYPE global status registers, we remove the such AVL operand > - in the RVV instructions pattern here in order to remove AVL dependencies when > - AVL operand is a register operand. > - > - Before the VSETVL PASS: > - li a5,32 > - ... > - vadd.vv (..., a5) > - After the VSETVL PASS: > - li a5,32 > - vsetvli zero, a5, ... > - ... > - vadd.vv (..., const_int 0). */ > +/* Get the first vsetvl instructions of the block. */ > +static rtx_insn * > +get_first_vsetvl (basic_block cfg_bb) > +{ > + rtx_insn *rinsn; > + FOR_BB_INSNS (cfg_bb, rinsn) > + { > + if (!NONDEBUG_INSN_P (rinsn)) > + continue; > + /* If we don't find any inserted vsetvli before user RVV instructions, > + we don't need to optimize the vsetvls in this block. */ > + if (has_vtype_op (rinsn) || vsetvl_insn_p (rinsn)) > + return nullptr; > + > + if (vsetvl_discard_result_insn_p (rinsn)) > + return rinsn; > + } > + return nullptr; > +} > + > +/* Global user vsetvl optimizaiton: > + > + Case 1: > + bb 1: > + vsetvl a5,a4,e8,mf8 > + ... > + bb 2: > + ... > + vsetvl zero,a5,e8,mf8 --> Eliminate directly. > + > + Case 2: > + bb 1: > + vsetvl a5,a4,e8,mf8 --> vsetvl a5,a4,e32,mf2 > + ... > + bb 2: > + ... > + vsetvl zero,a5,e32,mf2 --> Eliminate directly. > + > + Case 3: > + bb 1: > + vsetvl a5,a4,e8,mf8 --> vsetvl a5,a4,e32,mf2 > + ... > + bb 2: > + ... > + vsetvl a5,a4,e8,mf8 --> vsetvl a5,a4,e32,mf2 > + goto bb 3 > + bb 3: > + ... > + vsetvl zero,a5,e32,mf2 --> Eliminate directly. > +*/ > +bool > +pass_vsetvl::global_eliminate_vsetvl_insn (const bb_info *bb) const > +{ > + rtx_insn *vsetvl_rinsn; > + vector_insn_info dem = vector_insn_info (); > + const auto &block_info = get_block_info (bb); > + basic_block cfg_bb = bb->cfg_bb (); > + > + if (block_info.local_dem.valid_or_dirty_p ()) > + { > + /* Optimize the local vsetvl. */ > + dem = block_info.local_dem; > + vsetvl_rinsn = get_first_vsetvl (cfg_bb); > + } > + if (!vsetvl_rinsn) > + /* Optimize the global vsetvl inserted by LCM. */ > + vsetvl_rinsn = get_vsetvl_at_end (bb, &dem); > + > + /* No need to optimize if block doesn't have vsetvl instructions. */ > + if (!dem.valid_or_dirty_p () || !vsetvl_rinsn || !dem.get_avl_source () > + || !dem.has_avl_reg ()) > + return false; > + > + /* If all preds has VL/VTYPE status setted by user vsetvls, and these > + user vsetvls are all skip_avl_compatible_p with the vsetvl in this > + block, we can eliminate this vsetvl instruction. */ > + sbitmap avin = m_vector_manager->vector_avin[cfg_bb->index]; > + > + unsigned int bb_index; > + sbitmap_iterator sbi; > + rtx avl = get_avl (dem.get_insn ()->rtl ()); > + hash_set sets > + = get_all_sets (dem.get_avl_source (), true, false, false); > + /* Condition 1: All VL/VTYPE available in are all compatible. */ > + EXECUTE_IF_SET_IN_BITMAP (avin, 0, bb_index, sbi) > + { > + const auto &expr = m_vector_manager->vector_exprs[bb_index]; > + const auto &insn = expr->get_insn (); > + def_info *def = find_access (insn->defs (), REGNO (avl)); > + set_info *set = safe_dyn_cast (def); > + if (!vsetvl_insn_p (insn->rtl ()) || insn->bb () == bb > + || !sets.contains (set)) > + return false; > + } > + > + /* Condition 2: Check it has preds. */ > + if (EDGE_COUNT (cfg_bb->preds) == 0) > + return false; > + > + /* Condition 3: We don't do the global optimization for the block > + has a pred is entry block or exit block. */ > + /* Condition 4: All preds have available VL/VTYPE out. */ > + edge e; > + edge_iterator ei; > + FOR_EACH_EDGE (e, ei, cfg_bb->preds) > + { > + sbitmap avout = m_vector_manager->vector_avout[e->src->index]; > + if (e->src == ENTRY_BLOCK_PTR_FOR_FN (cfun) > + || e->src == EXIT_BLOCK_PTR_FOR_FN (cfun) || bitmap_empty_p (avout)) > + return false; > + > + EXECUTE_IF_SET_IN_BITMAP (avout, 0, bb_index, sbi) > + { > + const auto &expr = m_vector_manager->vector_exprs[bb_index]; > + const auto &insn = expr->get_insn (); > + def_info *def = find_access (insn->defs (), REGNO (avl)); > + set_info *set = safe_dyn_cast (def); > + if (!vsetvl_insn_p (insn->rtl ()) || insn->bb () == bb > + || !sets.contains (set) || !expr->skip_avl_compatible_p (dem)) > + return false; > + } > + } > + > + /* Step1: Reshape the VL/VTYPE status to make sure everything compatible. */ > + hash_set pred_cfg_bbs = get_all_predecessors (cfg_bb); > + FOR_EACH_EDGE (e, ei, cfg_bb->preds) > + { > + sbitmap avout = m_vector_manager->vector_avout[e->src->index]; > + EXECUTE_IF_SET_IN_BITMAP (avout, 0, bb_index, sbi) > + { > + vector_insn_info prev_dem = *m_vector_manager->vector_exprs[bb_index]; > + vector_insn_info curr_dem = dem; > + insn_info *insn = prev_dem.get_insn (); > + if (!pred_cfg_bbs.contains (insn->bb ()->cfg_bb ())) > + continue; > + /* Update avl info since we need to make sure they are fully > + compatible before merge. */ > + curr_dem.set_avl_info (prev_dem.get_avl_info ()); > + /* Merge both and update into curr_vsetvl. */ > + prev_dem = curr_dem.merge (prev_dem, LOCAL_MERGE); > + change_vsetvl_insn (insn, prev_dem); > + } > + } > + > + /* Step2: eliminate the vsetvl instruction. */ > + eliminate_insn (vsetvl_rinsn); > + return true; > +} > + > +/* This function does the following post optimization base on RTL_SSA: > + > + 1. Local user vsetvl optimizations. > + 2. Global user vsetvl optimizations. > + 3. AVL dependencies removal: > + Before VSETVL PASS, RVV instructions pattern is depending on AVL operand > + implicitly. Since we will emit VSETVL instruction and make RVV > + instructions depending on VL/VTYPE global status registers, we remove the > + such AVL operand in the RVV instructions pattern here in order to remove > + AVL dependencies when AVL operand is a register operand. > + > + Before the VSETVL PASS: > + li a5,32 > + ... > + vadd.vv (..., a5) > + After the VSETVL PASS: > + li a5,32 > + vsetvli zero, a5, ... > + ... > + vadd.vv (..., const_int 0). */ > void > -pass_vsetvl::cleanup_insns (void) const > +pass_vsetvl::ssa_post_optimization (void) const > { > for (const bb_info *bb : crtl->ssa->bbs ()) > { > local_eliminate_vsetvl_insn (bb); > + bool changed_p = true; > + while (changed_p) > + { > + changed_p = false; > + changed_p |= global_eliminate_vsetvl_insn (bb); > + } > for (insn_info *insn : bb->real_nondebug_insns ()) > { > rtx_insn *rinsn = insn->rtl (); > @@ -4342,135 +4510,81 @@ pass_vsetvl::cleanup_insns (void) const > } > } > > +/* Return true if the SET result is not used by any instructions. */ > +static bool > +has_no_uses (basic_block cfg_bb, rtx_insn *rinsn, int regno) > +{ > + /* Handle the following case that can not be detected in RTL_SSA. */ > + /* E.g. > + li a5, 100 > + vsetvli a6, a5... > + ... > + vadd (use a6) > + > + The use of "a6" is removed from "vadd" but the information is > + not updated in RTL_SSA framework. We don't want to re-new > + a new RTL_SSA which is expensive, instead, we use data-flow > + analysis to check whether "a6" has no uses. */ > + if (bitmap_bit_p (df_get_live_out (cfg_bb), regno)) > + return false; > + > + rtx_insn *iter; > + for (iter = NEXT_INSN (rinsn); iter && iter != NEXT_INSN (BB_END (cfg_bb)); > + iter = NEXT_INSN (iter)) > + if (df_find_use (iter, regno_reg_rtx[regno])) > + return false; > + > + return true; > +} > + > +/* This function does the following post optimization base on dataflow > + analysis: > + > + 1. Change vsetvl rd, rs1 --> vsevl zero, rs1, if rd is not used by any > + nondebug instructions. Even though this PASS runs after RA and it doesn't > + help for reduce register pressure, it can help instructions scheduling since > + we remove the dependencies. > + > + 2. Remove redundant user vsetvls base on outcome of Phase 4 (LCM) && Phase 5 > + (AVL dependencies removal). */ > void > -pass_vsetvl::propagate_avl (void) const > -{ > - /* Rebuild the RTL_SSA according to the new CFG generated by LCM. */ > - /* Finalization of RTL_SSA. */ > - free_dominance_info (CDI_DOMINATORS); > - if (crtl->ssa->perform_pending_updates ()) > - cleanup_cfg (0); > - delete crtl->ssa; > - crtl->ssa = nullptr; > - /* Initialization of RTL_SSA. */ > - calculate_dominance_info (CDI_DOMINATORS); > +pass_vsetvl::df_post_optimization (void) const > +{ > df_analyze (); > - crtl->ssa = new function_info (cfun); > - > hash_set to_delete; > - for (const bb_info *bb : crtl->ssa->bbs ()) > + basic_block cfg_bb; > + rtx_insn *rinsn; > + FOR_ALL_BB_FN (cfg_bb, cfun) > { > - for (insn_info *insn : bb->real_nondebug_insns ()) > + FOR_BB_INSNS (cfg_bb, rinsn) > { > - if (vsetvl_discard_result_insn_p (insn->rtl ())) > + if (NONDEBUG_INSN_P (rinsn) && vsetvl_insn_p (rinsn)) > { > - rtx avl = get_avl (insn->rtl ()); > - if (!REG_P (avl)) > - continue; > - > - set_info *set = find_access (insn->uses (), REGNO (avl))->def (); > - insn_info *def_insn = extract_single_source (set); > - if (!def_insn) > - continue; > - > - /* Handle this case: > - vsetvli a6,zero,e32,m1,ta,mu > - li a5,4096 > - add a7,a0,a5 > - addi a7,a7,-96 > - vsetvli t1,zero,e8,mf8,ta,ma > - vle8.v v24,0(a7) > - add a5,a3,a5 > - addi a5,a5,-96 > - vse8.v v24,0(a5) > - vsetvli zero,a6,e32,m1,tu,ma > - */ > - if (vsetvl_insn_p (def_insn->rtl ())) > - { > - vl_vtype_info def_info = get_vl_vtype_info (def_insn); > - vl_vtype_info info = get_vl_vtype_info (insn); > - rtx avl = get_avl (def_insn->rtl ()); > - rtx vl = get_vl (def_insn->rtl ()); > - if (def_info.get_ratio () == info.get_ratio ()) > - { > - if (vlmax_avl_p (def_info.get_avl ())) > - { > - info.set_avl_info ( > - avl_info (def_info.get_avl (), nullptr)); > - rtx new_pat > - = gen_vsetvl_pat (VSETVL_NORMAL, info, vl); > - validate_change (insn->rtl (), > - &PATTERN (insn->rtl ()), new_pat, > - false); > - continue; > - } > - if (def_info.has_avl_imm () || rtx_equal_p (avl, vl)) > - { > - info.set_avl_info (avl_info (avl, nullptr)); > - emit_vsetvl_insn (VSETVL_DISCARD_RESULT, EMIT_AFTER, > - info, NULL_RTX, insn->rtl ()); > - if (set->single_nondebug_insn_use ()) > - { > - to_delete.add (insn->rtl ()); > - to_delete.add (def_insn->rtl ()); > - } > - continue; > - } > - } > - } > - } > - > - /* Change vsetvl rd, rs1 --> vsevl zero, rs1, > - if rd is not used by any nondebug instructions. > - Even though this PASS runs after RA and it doesn't help for > - reduce register pressure, it can help instructions scheduling > - since we remove the dependencies. */ > - if (vsetvl_insn_p (insn->rtl ())) > - { > - rtx vl = get_vl (insn->rtl ()); > - rtx avl = get_avl (insn->rtl ()); > - def_info *def = find_access (insn->defs (), REGNO (vl)); > - set_info *set = safe_dyn_cast (def); > + rtx vl = get_vl (rinsn); > vector_insn_info info; > - info.parse_insn (insn); > - gcc_assert (set); > - if (m_vector_manager->to_delete_vsetvls.contains (insn->rtl ())) > - { > - m_vector_manager->to_delete_vsetvls.remove (insn->rtl ()); > - if (m_vector_manager->to_refine_vsetvls.contains ( > - insn->rtl ())) > - m_vector_manager->to_refine_vsetvls.remove (insn->rtl ()); > - if (!set->has_nondebug_insn_uses ()) > - { > - to_delete.add (insn->rtl ()); > - continue; > - } > - } > - if (m_vector_manager->to_refine_vsetvls.contains (insn->rtl ())) > + info.parse_insn (rinsn); > + bool to_delete_p = m_vector_manager->to_delete_p (rinsn); > + bool to_refine_p = m_vector_manager->to_refine_p (rinsn); > + if (has_no_uses (cfg_bb, rinsn, REGNO (vl))) > { > - m_vector_manager->to_refine_vsetvls.remove (insn->rtl ()); > - if (!set->has_nondebug_insn_uses ()) > + if (to_delete_p) > + to_delete.add (rinsn); > + else if (to_refine_p) > { > rtx new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, > info, NULL_RTX); > - change_insn (insn->rtl (), new_pat); > - continue; > + validate_change (rinsn, &PATTERN (rinsn), new_pat, false); > + } > + else if (!vlmax_avl_p (info.get_avl ())) > + { > + rtx new_pat = gen_vsetvl_pat (VSETVL_DISCARD_RESULT, info, > + NULL_RTX); > + validate_change (rinsn, &PATTERN (rinsn), new_pat, false); > } > - } > - if (vlmax_avl_p (avl)) > - continue; > - rtx new_pat > - = gen_vsetvl_pat (VSETVL_DISCARD_RESULT, info, NULL_RTX); > - if (!set->has_nondebug_insn_uses ()) > - { > - validate_change (insn->rtl (), &PATTERN (insn->rtl ()), > - new_pat, false); > - continue; > } > } > } > } > - > for (rtx_insn *rinsn : to_delete) > eliminate_insn (rinsn); > } > @@ -4593,16 +4707,16 @@ pass_vsetvl::lazy_vsetvl (void) > fprintf (dump_file, "\nPhase 4: PRE vsetvl by Lazy code motion (LCM)\n"); > pre_vsetvl (); > > - /* Phase 5 - Cleanup AVL && VL operand of RVV instruction. */ > + /* Phase 5 - Post optimization base on RTL_SSA. */ > if (dump_file) > - fprintf (dump_file, "\nPhase 5: Cleanup AVL and VL operands\n"); > - cleanup_insns (); > + fprintf (dump_file, "\nPhase 5: Post optimization base on RTL_SSA\n"); > + ssa_post_optimization (); > > - /* Phase 6 - Rebuild RTL_SSA to propagate AVL between vsetvls. */ > + /* Phase 6 - Post optimization base on data-flow analysis. */ > if (dump_file) > fprintf (dump_file, > - "\nPhase 6: Rebuild RTL_SSA to propagate AVL between vsetvls\n"); > - propagate_avl (); > + "\nPhase 6: Post optimization base on data-flow analysis\n"); > + df_post_optimization (); > } > > /* Main entry point for this pass. */ > diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h > index d7a6c14e931..4257451bb74 100644 > --- a/gcc/config/riscv/riscv-vsetvl.h > +++ b/gcc/config/riscv/riscv-vsetvl.h > @@ -290,13 +290,6 @@ private: > definition of AVL. */ > rtl_ssa::insn_info *m_insn; > > - /* Parse the instruction to get VL/VTYPE information and demanding > - * information. */ > - /* This is only called by simple_vsetvl subroutine when optimize == 0. > - Since RTL_SSA can not be enabled when optimize == 0, we don't initialize > - the m_insn. */ > - void parse_insn (rtx_insn *); > - > friend class vector_infos_manager; > > public: > @@ -305,6 +298,12 @@ public: > m_insn (nullptr) > {} > > + /* Parse the instruction to get VL/VTYPE information and demanding > + * information. */ > + /* This is only called by simple_vsetvl subroutine when optimize == 0. > + Since RTL_SSA can not be enabled when optimize == 0, we don't initialize > + the m_insn. */ > + void parse_insn (rtx_insn *); > /* This is only called by lazy_vsetvl subroutine when optimize > 0. > We use RTL_SSA framework to initialize the insn_info. */ > void parse_insn (rtl_ssa::insn_info *); > @@ -454,6 +453,27 @@ public: > bool all_empty_predecessor_p (const basic_block) const; > bool all_avail_in_compatible_p (const basic_block) const; > > + bool to_delete_p (rtx_insn *rinsn) > + { > + if (to_delete_vsetvls.contains (rinsn)) > + { > + to_delete_vsetvls.remove (rinsn); > + if (to_refine_vsetvls.contains (rinsn)) > + to_refine_vsetvls.remove (rinsn); > + return true; > + } > + return false; > + } > + bool to_refine_p (rtx_insn *rinsn) > + { > + if (to_refine_vsetvls.contains (rinsn)) > + { > + to_refine_vsetvls.remove (rinsn); > + return true; > + } > + return false; > + } > + > void release (void); > void create_bitmap_vectors (void); > void free_bitmap_vectors (void); > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-16.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-16.c > index e0c6588b1db..29e05c4982b 100644 > --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-16.c > +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-16.c > @@ -16,5 +16,5 @@ void f(int8_t *base, int8_t *out, size_t vl, size_t m) { > } > } > > -/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ > +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ > /* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*10} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-2.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-2.c > index 0c5da5e640c..ff0171b3ff6 100644 > --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-2.c > +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-2.c > @@ -17,4 +17,4 @@ void f(int8_t *base, int8_t *out, size_t vl, size_t m) { > } > > /* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*10} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */ > -/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ > +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-21.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-21.c > new file mode 100644 > index 00000000000..551920c6a72 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-21.c > @@ -0,0 +1,21 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-schedule-insns -fno-schedule-insns2" } */ > + > +#include "riscv_vector.h" > + > +void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) { > + size_t avl; > + if (m > 100) > + avl = __riscv_vsetvl_e16mf4(vl << 4); > + else{ > + if (k) > + avl = __riscv_vsetvl_e8mf8(vl); > + } > + for (size_t i = 0; i < m; i++) { > + vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, avl); > + __riscv_vse8_v_i8mf8(out + i, v0, avl); > + } > +} > + > +/* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*4} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */ > +/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-22.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-22.c > new file mode 100644 > index 00000000000..103f4238c76 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-22.c > @@ -0,0 +1,21 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-schedule-insns -fno-schedule-insns2" } */ > + > +#include "riscv_vector.h" > + > +void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) { > + size_t avl; > + if (m > 100) > + avl = __riscv_vsetvl_e16mf4(vl << 4); > + else > + avl = __riscv_vsetvl_e32mf2(vl >> 8); > + for (size_t i = 0; i < m; i++) { > + vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, avl); > + v0 = __riscv_vadd_vv_i8mf8 (v0, v0, avl); > + __riscv_vse8_v_i8mf8(out + i, v0, avl); > + } > +} > + > +/* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*4} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */ > +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c > new file mode 100644 > index 00000000000..66c90ac10e7 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c > @@ -0,0 +1,37 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-schedule-insns -fno-schedule-insns2" } */ > + > +#include "riscv_vector.h" > + > +void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) { > + size_t avl; > + switch (m) > + { > + case 50: > + avl = __riscv_vsetvl_e16mf4(vl << 4); > + break; > + case 1: > + avl = __riscv_vsetvl_e32mf2(k); > + break; > + case 2: > + avl = __riscv_vsetvl_e64m1(vl); > + break; > + case 3: > + avl = __riscv_vsetvl_e32mf2(k >> 8); > + break; > + default: > + avl = __riscv_vsetvl_e32mf2(k + vl); > + break; > + } > + for (size_t i = 0; i < m; i++) { > + vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, avl); > + v0 = __riscv_vadd_vv_i8mf8 (v0, v0, avl); > + v0 = __riscv_vadd_vv_i8mf8_tu (v0, v0, v0, avl); > + __riscv_vse8_v_i8mf8(out + i, v0, avl); > + } > +} > + > +/* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*4} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */ > +/* { dg-final { scan-assembler-times {srli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*8} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */ > +/* { dg-final { scan-assembler-times {vsetvli} 5 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ > +/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*m[au]} 5 { target { no-opts "-O0" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-3.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-3.c > index f995e04aacc..13d09fc3fd1 100644 > --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-3.c > +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-3.c > @@ -18,4 +18,4 @@ void f(int8_t *base, int8_t *out, size_t vl, size_t m) { > } > > /* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*10} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */ > -/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ > +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ > -- > 2.36.1 >