From: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
To: Kito.cheng <kito.cheng@sifive.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>,
kito.cheng <kito.cheng@gmail.com>, palmer <palmer@dabbelt.com>,
palmer <palmer@rivosinc.com>,
jeffreyalaw <jeffreyalaw@gmail.com>,
"Robin Dapp" <rdapp.gcc@gmail.com>, pan2.li <pan2.li@intel.com>
Subject: Re: Re: [PATCH V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS
Date: Fri, 9 Jun 2023 18:49:26 +0800 [thread overview]
Message-ID: <D1554890BD89B05A+20230609184925693903157@rivai.ai> (raw)
In-Reply-To: <CALLt3Tihy5kQCxK1yduTSYibZ7hSs3aQjuNFoWaNLy-muYtbbw@mail.gmail.com>
This patch removes the second RTL_SSA initialization, which is the approach we both hate.
juzhe.zhong@rivai.ai
From: Kito Cheng
Date: 2023-06-09 18:45
To: juzhe.zhong
CC: gcc-patches; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; pan2.li
Subject: Re: [PATCH V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS
Thanks for sending this before the weekend; I can run the fuzz testing
over the weekend :P
On Fri, Jun 9, 2023 at 6:41 PM <juzhe.zhong@rivai.ai> wrote:
>
> From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
>
> This patch reworks Phase 5 && Phase 6 of the VSETVL PASS, since Phase 5 && Phase 6
> are quite messy and caused some bugs discovered by my downstream auto-vectorization
> test generator.
>
> Before this patch:
>
> Phase 5 (cleanup_insns) is the function that removes the AVL operand dependency from each RVV instruction.
> E.g. vadd.vv (use a5) becomes, after Phase 5, vadd.vv (use const_int 0). Since "a5" is used in the "vsetvl" instructions, and
> after the correct "vsetvl" instructions are inserted each RVV instruction no longer needs the AVL operand "a5",
> removing this operand dependency helps the following scheduling PASS.
>
> Phase 6 (propagate_avl) does the following 2 things:
> 1. Local && Global user vsetvl instructions optimization.
> E.g.
> vsetvli a2, a2, e8, mf8 ======> Change it into vsetvli a2, a2, e32, mf2
> vsetvli zero,a2, e32, mf2 ======> eliminate
> 2. Optimize user vsetvl from "vsetvl a2,a2" into "vsetvl zero,a2" if "a2" is not used by any instruction.
> Since Phase 1 ~ Phase 4 insert "vsetvli" instructions based on LCM, which changes the CFG, I re-initialized a new
> RTL_SSA framework (which is more expensive than just using DF) for Phase 6 and optimized user vsetvlis based on the new RTL_SSA.
>
> There are 2 issues in Phase 5 && Phase 6:
> 1. local_eliminate_vsetvl_insn was introduced by @kito; it can do local user vsetvl optimizations better than
> Phase 6 does, and such an approach doesn't need to re-initialize the RTL_SSA framework. So the local user vsetvli optimization
> in Phase 6 is redundant and should be removed.
> 2. A bug discovered by my downstream auto-vectorization test generator (I can't put the test in this patch since we are missing the
> autovec patterns for it, so the issue can't be reproduced directly with upstream GCC, but I will remember to put it back after I support the
> necessary autovec patterns). This bug is caused by using the re-initialized RTL_SSA framework. The issue description is this:
>
> Before Phase 6:
> ...
> insn1: vsetvli a3, 17 <========== generated by SELECT_VL auto-vec pattern.
> slli a4,a3,3
> ...
> insn2: vsetvli zero, a3, ...
> load (use const_int 0; before Phase 5 it used a3, but that use was removed in Phase 5)
> ...
>
> In Phase 6, we iterate to insn2, then get the def of "a3", which is insn1.
> insn2 is the vsetvli instruction inserted in Phase 4, which is not included in the RTL_SSA framework
> even though we renew it (I didn't take a look at why, and I don't think we need to now).
> In this situation, the def_info of "a3" (defined by insn1) reports "set->single_nondebug_insn_use ()"
> as true. Obviously, this information is not correct, since insn1 has at least 2 uses:
> 1). slli a4,a3,3 2). insn2: vsetvli zero, a3, ... Then, the execution test generated by my downstream
> test-generator failed.
>
> Conclusion about the RTL_SSA framework:
> Before this patch, we initialize RTL_SSA 2 times. One is at the beginning of the VSETVL PASS, which is absolutely correct; the other
> is the re-initialization after Phase 4 (LCM), which has incorrect information that causes bugs.
>
> Besides, we don't like initializing RTL_SSA a second time; it seems to be a waste since we just need to do a little optimization.
>
> Based on all the circumstances I described above, I rework and reorganize Phase 5 && Phase 6 as follows:
> 1. Phase 5 is called ssa_post_optimization, which does the optimization based on the RTL_SSA information (the RTL_SSA is initialized
> at the beginning of the VSETVL PASS; no need to re-initialize it). This phase includes 3 optimizations:
> 1). local_eliminate_vsetvl_insn we already have (no change).
> 2). global_eliminate_vsetvl_insn ---> new optimization split from the original Phase 6, but with a more powerful and reliable implementation.
> E.g.
> void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) {
> size_t avl;
> if (m > 100)
> avl = __riscv_vsetvl_e16mf4(vl << 4);
> else
> avl = __riscv_vsetvl_e32mf2(vl >> 8);
> for (size_t i = 0; i < m; i++) {
> vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, avl);
> v0 = __riscv_vadd_vv_i8mf8 (v0, v0, avl);
> __riscv_vse8_v_i8mf8(out + i, v0, avl);
> }
> }
>
> Before this patch, this example failed the global user vsetvl optimization:
> f:
> li a5,100
> bleu a3,a5,.L2
> slli a2,a2,4
> vsetvli a4,a2,e16,mf4,ta,mu
> .L3:
> li a5,0
> vsetvli zero,a4,e8,mf8,ta,ma
> .L5:
> add a6,a0,a5
> add a2,a1,a5
> vle8.v v1,0(a6)
> addi a5,a5,1
> vadd.vv v1,v1,v1
> vse8.v v1,0(a2)
> bgtu a3,a5,.L5
> .L10:
> ret
> .L2:
> beq a3,zero,.L10
> srli a2,a2,8
> vsetvli a4,a2,e32,mf2,ta,mu
> j .L3
> With this patch:
> f:
> li a5,100
> bleu a3,a5,.L2
> slli a2,a2,4
> vsetvli zero,a2,e8,mf8,ta,ma
> .L3:
> li a5,0
> .L5:
> add a6,a0,a5
> add a2,a1,a5
> vle8.v v1,0(a6)
> addi a5,a5,1
> vadd.vv v1,v1,v1
> vse8.v v1,0(a2)
> bgtu a3,a5,.L5
> .L10:
> ret
> .L2:
> beq a3,zero,.L10
> srli a2,a2,8
> vsetvli zero,a2,e8,mf8,ta,ma
> j .L3
>
> 3). Remove the AVL operand dependency of each RVV instruction.
>
> 2. Phase 6 is called df_post_optimization: optimize "vsetvl a3,a2,..." into "vsetvl zero,a2,..." based on
> dataflow analysis of the new CFG (the new CFG is created by LCM). The reason we need to use the new CFG and run after Phase 5:
> ...
> vsetvl a3, a2...
> vadd.vv (use a3)
> If we didn't have Phase 5, which removes the "a3" use in vadd.vv, we would fail to optimize vsetvl a3,a2 into vsetvl zero,a2.
>
> This patch passed all tests in rvv.exp with ONLY performance && codegen improved (no performance decline and no bugs, including in my
> downstream tests).
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc (available_occurrence_p): Enhance user vsetvl optimization.
> (vector_insn_info::parse_insn): Add rtx_insn parse.
> (pass_vsetvl::local_eliminate_vsetvl_insn): Enhance user vsetvl optimization.
> (get_first_vsetvl): New function.
> (pass_vsetvl::global_eliminate_vsetvl_insn): Ditto.
> (pass_vsetvl::cleanup_insns): Remove it.
> (pass_vsetvl::ssa_post_optimization): New function.
> (has_no_uses): Ditto.
> (pass_vsetvl::propagate_avl): Remove it.
> (pass_vsetvl::df_post_optimization): New function.
> (pass_vsetvl::lazy_vsetvl): Rework Phase 5 && Phase 6.
> * config/riscv/riscv-vsetvl.h: Adapt declaration.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/vsetvl/vsetvl-16.c: Adapt test.
> * gcc.target/riscv/rvv/vsetvl/vsetvl-2.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvl-3.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvl-21.c: New test.
> * gcc.target/riscv/rvv/vsetvl/vsetvl-22.c: New test.
> * gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: New test.
>
> ---
> gcc/config/riscv/riscv-vsetvl.cc | 400 +++++++++++-------
> gcc/config/riscv/riscv-vsetvl.h | 34 +-
> .../gcc.target/riscv/rvv/vsetvl/vsetvl-16.c | 2 +-
> .../gcc.target/riscv/rvv/vsetvl/vsetvl-2.c | 2 +-
> .../gcc.target/riscv/rvv/vsetvl/vsetvl-21.c | 21 +
> .../gcc.target/riscv/rvv/vsetvl/vsetvl-22.c | 21 +
> .../gcc.target/riscv/rvv/vsetvl/vsetvl-23.c | 37 ++
> .../gcc.target/riscv/rvv/vsetvl/vsetvl-3.c | 2 +-
> 8 files changed, 366 insertions(+), 153 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-21.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-22.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
> index fe55f4ccd30..924a94adf9c 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -395,10 +395,15 @@ available_occurrence_p (const bb_info *bb, const vector_insn_info dem)
> if (!vlmax_avl_p (dem.get_avl ()))
> {
> rtx dest = NULL_RTX;
> + insn_info *i = insn;
> if (vsetvl_insn_p (insn->rtl ()))
> - dest = get_vl (insn->rtl ());
> - for (const insn_info *i = insn; real_insn_and_same_bb_p (i, bb);
> - i = i->next_nondebug_insn ())
> + {
> + dest = get_vl (insn->rtl ());
> + /* For user vsetvl a2, a2 instruction, we consider it as
> + available even though it modifies "a2". */
> + i = i->next_nondebug_insn ();
> + }
> + for (; real_insn_and_same_bb_p (i, bb); i = i->next_nondebug_insn ())
> {
> if (read_vl_insn_p (i->rtl ()))
> continue;
> @@ -1893,11 +1898,13 @@ vector_insn_info::parse_insn (rtx_insn *rinsn)
> *this = vector_insn_info ();
> if (!NONDEBUG_INSN_P (rinsn))
> return;
> - if (!has_vtype_op (rinsn))
> + if (optimize == 0 && !has_vtype_op (rinsn))
> + return;
> + if (optimize > 0 && !vsetvl_insn_p (rinsn))
> return;
> m_state = VALID;
> extract_insn_cached (rinsn);
> - const rtx avl = recog_data.operand[get_attr_vl_op_idx (rinsn)];
> + rtx avl = ::get_avl (rinsn);
> m_avl = avl_info (avl, nullptr);
> m_sew = ::get_sew (rinsn);
> m_vlmul = ::get_vlmul (rinsn);
> @@ -2730,10 +2737,11 @@ private:
> /* Phase 5. */
> rtx_insn *get_vsetvl_at_end (const bb_info *, vector_insn_info *) const;
> void local_eliminate_vsetvl_insn (const bb_info *) const;
> - void cleanup_insns (void) const;
> + bool global_eliminate_vsetvl_insn (const bb_info *) const;
> + void ssa_post_optimization (void) const;
>
> /* Phase 6. */
> - void propagate_avl (void) const;
> + void df_post_optimization (void) const;
>
> void init (void);
> void done (void);
> @@ -4246,7 +4254,7 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const bb_info *bb) const
>
> /* Local AVL compatibility checking is simpler than global, we only
> need to check the REGNO is same. */
> - if (prev_dem.valid_p () && prev_dem.skip_avl_compatible_p (curr_dem)
> + if (prev_dem.valid_or_dirty_p () && prev_dem.skip_avl_compatible_p (curr_dem)
> && local_avl_compatible_p (prev_avl, curr_avl))
> {
> /* curr_dem and prev_dem is compatible! */
> @@ -4277,27 +4285,187 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const bb_info *bb) const
> }
> }
>
> -/* Before VSETVL PASS, RVV instructions pattern is depending on AVL operand
> - implicitly. Since we will emit VSETVL instruction and make RVV instructions
> - depending on VL/VTYPE global status registers, we remove the such AVL operand
> - in the RVV instructions pattern here in order to remove AVL dependencies when
> - AVL operand is a register operand.
> -
> - Before the VSETVL PASS:
> - li a5,32
> - ...
> - vadd.vv (..., a5)
> - After the VSETVL PASS:
> - li a5,32
> - vsetvli zero, a5, ...
> - ...
> - vadd.vv (..., const_int 0). */
> +/* Get the first vsetvl instruction of the block.  */
> +static rtx_insn *
> +get_first_vsetvl (basic_block cfg_bb)
> +{
> + rtx_insn *rinsn;
> + FOR_BB_INSNS (cfg_bb, rinsn)
> + {
> + if (!NONDEBUG_INSN_P (rinsn))
> + continue;
> + /* If we don't find any inserted vsetvli before user RVV instructions,
> + we don't need to optimize the vsetvls in this block. */
> + if (has_vtype_op (rinsn) || vsetvl_insn_p (rinsn))
> + return nullptr;
> +
> + if (vsetvl_discard_result_insn_p (rinsn))
> + return rinsn;
> + }
> + return nullptr;
> +}
> +
> +/* Global user vsetvl optimization:
> +
> + Case 1:
> + bb 1:
> + vsetvl a5,a4,e8,mf8
> + ...
> + bb 2:
> + ...
> + vsetvl zero,a5,e8,mf8 --> Eliminate directly.
> +
> + Case 2:
> + bb 1:
> + vsetvl a5,a4,e8,mf8 --> vsetvl a5,a4,e32,mf2
> + ...
> + bb 2:
> + ...
> + vsetvl zero,a5,e32,mf2 --> Eliminate directly.
> +
> + Case 3:
> + bb 1:
> + vsetvl a5,a4,e8,mf8 --> vsetvl a5,a4,e32,mf2
> + ...
> + bb 2:
> + ...
> + vsetvl a5,a4,e8,mf8 --> vsetvl a5,a4,e32,mf2
> + goto bb 3
> + bb 3:
> + ...
> + vsetvl zero,a5,e32,mf2 --> Eliminate directly.
> +*/
> +bool
> +pass_vsetvl::global_eliminate_vsetvl_insn (const bb_info *bb) const
> +{
> + rtx_insn *vsetvl_rinsn;
> + vector_insn_info dem = vector_insn_info ();
> + const auto &block_info = get_block_info (bb);
> + basic_block cfg_bb = bb->cfg_bb ();
> +
> + if (block_info.local_dem.valid_or_dirty_p ())
> + {
> + /* Optimize the local vsetvl. */
> + dem = block_info.local_dem;
> + vsetvl_rinsn = get_first_vsetvl (cfg_bb);
> + }
> + if (!vsetvl_rinsn)
> + /* Optimize the global vsetvl inserted by LCM. */
> + vsetvl_rinsn = get_vsetvl_at_end (bb, &dem);
> +
> + /* No need to optimize if block doesn't have vsetvl instructions. */
> + if (!dem.valid_or_dirty_p () || !vsetvl_rinsn || !dem.get_avl_source ()
> + || !dem.has_avl_reg ())
> + return false;
> +
> + /* If all preds have VL/VTYPE status set by user vsetvls, and these
> + user vsetvls are all skip_avl_compatible_p with the vsetvl in this
> + block, we can eliminate this vsetvl instruction. */
> + sbitmap avin = m_vector_manager->vector_avin[cfg_bb->index];
> +
> + unsigned int bb_index;
> + sbitmap_iterator sbi;
> + rtx avl = get_avl (dem.get_insn ()->rtl ());
> + hash_set<set_info *> sets
> + = get_all_sets (dem.get_avl_source (), true, false, false);
> + /* Condition 1: All available-in VL/VTYPE are compatible.  */
> + EXECUTE_IF_SET_IN_BITMAP (avin, 0, bb_index, sbi)
> + {
> + const auto &expr = m_vector_manager->vector_exprs[bb_index];
> + const auto &insn = expr->get_insn ();
> + def_info *def = find_access (insn->defs (), REGNO (avl));
> + set_info *set = safe_dyn_cast<set_info *> (def);
> + if (!vsetvl_insn_p (insn->rtl ()) || insn->bb () == bb
> + || !sets.contains (set))
> + return false;
> + }
> +
> + /* Condition 2: Check it has preds. */
> + if (EDGE_COUNT (cfg_bb->preds) == 0)
> + return false;
> +
> + /* Condition 3: We don't do the global optimization for a block
> + that has a pred which is the entry block or the exit block.  */
> + /* Condition 4: All preds have available VL/VTYPE out. */
> + edge e;
> + edge_iterator ei;
> + FOR_EACH_EDGE (e, ei, cfg_bb->preds)
> + {
> + sbitmap avout = m_vector_manager->vector_avout[e->src->index];
> + if (e->src == ENTRY_BLOCK_PTR_FOR_FN (cfun)
> + || e->src == EXIT_BLOCK_PTR_FOR_FN (cfun) || bitmap_empty_p (avout))
> + return false;
> +
> + EXECUTE_IF_SET_IN_BITMAP (avout, 0, bb_index, sbi)
> + {
> + const auto &expr = m_vector_manager->vector_exprs[bb_index];
> + const auto &insn = expr->get_insn ();
> + def_info *def = find_access (insn->defs (), REGNO (avl));
> + set_info *set = safe_dyn_cast<set_info *> (def);
> + if (!vsetvl_insn_p (insn->rtl ()) || insn->bb () == bb
> + || !sets.contains (set) || !expr->skip_avl_compatible_p (dem))
> + return false;
> + }
> + }
> +
> + /* Step1: Reshape the VL/VTYPE status to make sure everything compatible. */
> + hash_set<basic_block> pred_cfg_bbs = get_all_predecessors (cfg_bb);
> + FOR_EACH_EDGE (e, ei, cfg_bb->preds)
> + {
> + sbitmap avout = m_vector_manager->vector_avout[e->src->index];
> + EXECUTE_IF_SET_IN_BITMAP (avout, 0, bb_index, sbi)
> + {
> + vector_insn_info prev_dem = *m_vector_manager->vector_exprs[bb_index];
> + vector_insn_info curr_dem = dem;
> + insn_info *insn = prev_dem.get_insn ();
> + if (!pred_cfg_bbs.contains (insn->bb ()->cfg_bb ()))
> + continue;
> + /* Update avl info since we need to make sure they are fully
> + compatible before merge. */
> + curr_dem.set_avl_info (prev_dem.get_avl_info ());
> + /* Merge both and update into curr_vsetvl. */
> + prev_dem = curr_dem.merge (prev_dem, LOCAL_MERGE);
> + change_vsetvl_insn (insn, prev_dem);
> + }
> + }
> +
> + /* Step2: eliminate the vsetvl instruction. */
> + eliminate_insn (vsetvl_rinsn);
> + return true;
> +}
> +
> +/* This function does the following post optimizations based on RTL_SSA:
> +
> + 1. Local user vsetvl optimizations.
> + 2. Global user vsetvl optimizations.
> + 3. AVL dependencies removal:
> + Before VSETVL PASS, RVV instructions pattern is depending on AVL operand
> + implicitly. Since we will emit VSETVL instruction and make RVV
> + instructions depending on VL/VTYPE global status registers, we remove the
> + such AVL operand in the RVV instructions pattern here in order to remove
> + AVL dependencies when AVL operand is a register operand.
> +
> + Before the VSETVL PASS:
> + li a5,32
> + ...
> + vadd.vv (..., a5)
> + After the VSETVL PASS:
> + li a5,32
> + vsetvli zero, a5, ...
> + ...
> + vadd.vv (..., const_int 0). */
> void
> -pass_vsetvl::cleanup_insns (void) const
> +pass_vsetvl::ssa_post_optimization (void) const
> {
> for (const bb_info *bb : crtl->ssa->bbs ())
> {
> local_eliminate_vsetvl_insn (bb);
> + bool changed_p = true;
> + while (changed_p)
> + {
> + changed_p = false;
> + changed_p |= global_eliminate_vsetvl_insn (bb);
> + }
> for (insn_info *insn : bb->real_nondebug_insns ())
> {
> rtx_insn *rinsn = insn->rtl ();
> @@ -4342,135 +4510,81 @@ pass_vsetvl::cleanup_insns (void) const
> }
> }
>
> +/* Return true if the SET result is not used by any instructions. */
> +static bool
> +has_no_uses (basic_block cfg_bb, rtx_insn *rinsn, int regno)
> +{
> + /* Handle the following case that can not be detected in RTL_SSA. */
> + /* E.g.
> + li a5, 100
> + vsetvli a6, a5...
> + ...
> + vadd (use a6)
> +
> + The use of "a6" is removed from "vadd" but the information is
> + not updated in RTL_SSA framework. We don't want to re-new
> + a new RTL_SSA which is expensive, instead, we use data-flow
> + analysis to check whether "a6" has no uses. */
> + if (bitmap_bit_p (df_get_live_out (cfg_bb), regno))
> + return false;
> +
> + rtx_insn *iter;
> + for (iter = NEXT_INSN (rinsn); iter && iter != NEXT_INSN (BB_END (cfg_bb));
> + iter = NEXT_INSN (iter))
> + if (df_find_use (iter, regno_reg_rtx[regno]))
> + return false;
> +
> + return true;
> +}
> +
> +/* This function does the following post optimizations based on dataflow
> + analysis:
> +
> + 1. Change vsetvl rd, rs1 --> vsetvl zero, rs1, if rd is not used by any
> + nondebug instructions.  Even though this PASS runs after RA and doesn't
> + help reduce register pressure, it can help instruction scheduling since
> + we remove the dependencies.
> +
> + 2. Remove redundant user vsetvls based on the outcome of Phase 4 (LCM) && Phase 5
> + (AVL dependencies removal). */
> void
> -pass_vsetvl::propagate_avl (void) const
> -{
> - /* Rebuild the RTL_SSA according to the new CFG generated by LCM. */
> - /* Finalization of RTL_SSA. */
> - free_dominance_info (CDI_DOMINATORS);
> - if (crtl->ssa->perform_pending_updates ())
> - cleanup_cfg (0);
> - delete crtl->ssa;
> - crtl->ssa = nullptr;
> - /* Initialization of RTL_SSA. */
> - calculate_dominance_info (CDI_DOMINATORS);
> +pass_vsetvl::df_post_optimization (void) const
> +{
> df_analyze ();
> - crtl->ssa = new function_info (cfun);
> -
> hash_set<rtx_insn *> to_delete;
> - for (const bb_info *bb : crtl->ssa->bbs ())
> + basic_block cfg_bb;
> + rtx_insn *rinsn;
> + FOR_ALL_BB_FN (cfg_bb, cfun)
> {
> - for (insn_info *insn : bb->real_nondebug_insns ())
> + FOR_BB_INSNS (cfg_bb, rinsn)
> {
> - if (vsetvl_discard_result_insn_p (insn->rtl ()))
> + if (NONDEBUG_INSN_P (rinsn) && vsetvl_insn_p (rinsn))
> {
> - rtx avl = get_avl (insn->rtl ());
> - if (!REG_P (avl))
> - continue;
> -
> - set_info *set = find_access (insn->uses (), REGNO (avl))->def ();
> - insn_info *def_insn = extract_single_source (set);
> - if (!def_insn)
> - continue;
> -
> - /* Handle this case:
> - vsetvli a6,zero,e32,m1,ta,mu
> - li a5,4096
> - add a7,a0,a5
> - addi a7,a7,-96
> - vsetvli t1,zero,e8,mf8,ta,ma
> - vle8.v v24,0(a7)
> - add a5,a3,a5
> - addi a5,a5,-96
> - vse8.v v24,0(a5)
> - vsetvli zero,a6,e32,m1,tu,ma
> - */
> - if (vsetvl_insn_p (def_insn->rtl ()))
> - {
> - vl_vtype_info def_info = get_vl_vtype_info (def_insn);
> - vl_vtype_info info = get_vl_vtype_info (insn);
> - rtx avl = get_avl (def_insn->rtl ());
> - rtx vl = get_vl (def_insn->rtl ());
> - if (def_info.get_ratio () == info.get_ratio ())
> - {
> - if (vlmax_avl_p (def_info.get_avl ()))
> - {
> - info.set_avl_info (
> - avl_info (def_info.get_avl (), nullptr));
> - rtx new_pat
> - = gen_vsetvl_pat (VSETVL_NORMAL, info, vl);
> - validate_change (insn->rtl (),
> - &PATTERN (insn->rtl ()), new_pat,
> - false);
> - continue;
> - }
> - if (def_info.has_avl_imm () || rtx_equal_p (avl, vl))
> - {
> - info.set_avl_info (avl_info (avl, nullptr));
> - emit_vsetvl_insn (VSETVL_DISCARD_RESULT, EMIT_AFTER,
> - info, NULL_RTX, insn->rtl ());
> - if (set->single_nondebug_insn_use ())
> - {
> - to_delete.add (insn->rtl ());
> - to_delete.add (def_insn->rtl ());
> - }
> - continue;
> - }
> - }
> - }
> - }
> -
> - /* Change vsetvl rd, rs1 --> vsevl zero, rs1,
> - if rd is not used by any nondebug instructions.
> - Even though this PASS runs after RA and it doesn't help for
> - reduce register pressure, it can help instructions scheduling
> - since we remove the dependencies. */
> - if (vsetvl_insn_p (insn->rtl ()))
> - {
> - rtx vl = get_vl (insn->rtl ());
> - rtx avl = get_avl (insn->rtl ());
> - def_info *def = find_access (insn->defs (), REGNO (vl));
> - set_info *set = safe_dyn_cast<set_info *> (def);
> + rtx vl = get_vl (rinsn);
> vector_insn_info info;
> - info.parse_insn (insn);
> - gcc_assert (set);
> - if (m_vector_manager->to_delete_vsetvls.contains (insn->rtl ()))
> - {
> - m_vector_manager->to_delete_vsetvls.remove (insn->rtl ());
> - if (m_vector_manager->to_refine_vsetvls.contains (
> - insn->rtl ()))
> - m_vector_manager->to_refine_vsetvls.remove (insn->rtl ());
> - if (!set->has_nondebug_insn_uses ())
> - {
> - to_delete.add (insn->rtl ());
> - continue;
> - }
> - }
> - if (m_vector_manager->to_refine_vsetvls.contains (insn->rtl ()))
> + info.parse_insn (rinsn);
> + bool to_delete_p = m_vector_manager->to_delete_p (rinsn);
> + bool to_refine_p = m_vector_manager->to_refine_p (rinsn);
> + if (has_no_uses (cfg_bb, rinsn, REGNO (vl)))
> {
> - m_vector_manager->to_refine_vsetvls.remove (insn->rtl ());
> - if (!set->has_nondebug_insn_uses ())
> + if (to_delete_p)
> + to_delete.add (rinsn);
> + else if (to_refine_p)
> {
> rtx new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY,
> info, NULL_RTX);
> - change_insn (insn->rtl (), new_pat);
> - continue;
> + validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
> + }
> + else if (!vlmax_avl_p (info.get_avl ()))
> + {
> + rtx new_pat = gen_vsetvl_pat (VSETVL_DISCARD_RESULT, info,
> + NULL_RTX);
> + validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
> }
> - }
> - if (vlmax_avl_p (avl))
> - continue;
> - rtx new_pat
> - = gen_vsetvl_pat (VSETVL_DISCARD_RESULT, info, NULL_RTX);
> - if (!set->has_nondebug_insn_uses ())
> - {
> - validate_change (insn->rtl (), &PATTERN (insn->rtl ()),
> - new_pat, false);
> - continue;
> }
> }
> }
> }
> -
> for (rtx_insn *rinsn : to_delete)
> eliminate_insn (rinsn);
> }
> @@ -4593,16 +4707,16 @@ pass_vsetvl::lazy_vsetvl (void)
> fprintf (dump_file, "\nPhase 4: PRE vsetvl by Lazy code motion (LCM)\n");
> pre_vsetvl ();
>
> - /* Phase 5 - Cleanup AVL && VL operand of RVV instruction. */
> + /* Phase 5 - Post optimization based on RTL_SSA.  */
> if (dump_file)
> - fprintf (dump_file, "\nPhase 5: Cleanup AVL and VL operands\n");
> - cleanup_insns ();
> + fprintf (dump_file, "\nPhase 5: Post optimization based on RTL_SSA\n");
> + ssa_post_optimization ();
>
> - /* Phase 6 - Rebuild RTL_SSA to propagate AVL between vsetvls. */
> + /* Phase 6 - Post optimization based on data-flow analysis.  */
> if (dump_file)
> fprintf (dump_file,
> - "\nPhase 6: Rebuild RTL_SSA to propagate AVL between vsetvls\n");
> - propagate_avl ();
> + "\nPhase 6: Post optimization based on data-flow analysis\n");
> + df_post_optimization ();
> }
>
> /* Main entry point for this pass. */
> diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
> index d7a6c14e931..4257451bb74 100644
> --- a/gcc/config/riscv/riscv-vsetvl.h
> +++ b/gcc/config/riscv/riscv-vsetvl.h
> @@ -290,13 +290,6 @@ private:
> definition of AVL. */
> rtl_ssa::insn_info *m_insn;
>
> - /* Parse the instruction to get VL/VTYPE information and demanding
> - * information. */
> - /* This is only called by simple_vsetvl subroutine when optimize == 0.
> - Since RTL_SSA can not be enabled when optimize == 0, we don't initialize
> - the m_insn. */
> - void parse_insn (rtx_insn *);
> -
> friend class vector_infos_manager;
>
> public:
> @@ -305,6 +298,12 @@ public:
> m_insn (nullptr)
> {}
>
> + /* Parse the instruction to get VL/VTYPE information and demanding
> + * information. */
> + /* This is only called by simple_vsetvl subroutine when optimize == 0.
> + Since RTL_SSA can not be enabled when optimize == 0, we don't initialize
> + the m_insn. */
> + void parse_insn (rtx_insn *);
> /* This is only called by lazy_vsetvl subroutine when optimize > 0.
> We use RTL_SSA framework to initialize the insn_info. */
> void parse_insn (rtl_ssa::insn_info *);
> @@ -454,6 +453,27 @@ public:
> bool all_empty_predecessor_p (const basic_block) const;
> bool all_avail_in_compatible_p (const basic_block) const;
>
> + bool to_delete_p (rtx_insn *rinsn)
> + {
> + if (to_delete_vsetvls.contains (rinsn))
> + {
> + to_delete_vsetvls.remove (rinsn);
> + if (to_refine_vsetvls.contains (rinsn))
> + to_refine_vsetvls.remove (rinsn);
> + return true;
> + }
> + return false;
> + }
> + bool to_refine_p (rtx_insn *rinsn)
> + {
> + if (to_refine_vsetvls.contains (rinsn))
> + {
> + to_refine_vsetvls.remove (rinsn);
> + return true;
> + }
> + return false;
> + }
> +
> void release (void);
> void create_bitmap_vectors (void);
> void free_bitmap_vectors (void);
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-16.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-16.c
> index e0c6588b1db..29e05c4982b 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-16.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-16.c
> @@ -16,5 +16,5 @@ void f(int8_t *base, int8_t *out, size_t vl, size_t m) {
> }
> }
>
> -/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
> +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
> /* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*10} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-2.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-2.c
> index 0c5da5e640c..ff0171b3ff6 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-2.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-2.c
> @@ -17,4 +17,4 @@ void f(int8_t *base, int8_t *out, size_t vl, size_t m) {
> }
>
> /* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*10} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
> -/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
> +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-21.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-21.c
> new file mode 100644
> index 00000000000..551920c6a72
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-21.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-schedule-insns -fno-schedule-insns2" } */
> +
> +#include "riscv_vector.h"
> +
> +void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) {
> + size_t avl;
> + if (m > 100)
> + avl = __riscv_vsetvl_e16mf4(vl << 4);
> + else{
> + if (k)
> + avl = __riscv_vsetvl_e8mf8(vl);
> + }
> + for (size_t i = 0; i < m; i++) {
> + vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, avl);
> + __riscv_vse8_v_i8mf8(out + i, v0, avl);
> + }
> +}
> +
> +/* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*4} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
> +/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-22.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-22.c
> new file mode 100644
> index 00000000000..103f4238c76
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-22.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-schedule-insns -fno-schedule-insns2" } */
> +
> +#include "riscv_vector.h"
> +
> +void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) {
> + size_t avl;
> + if (m > 100)
> + avl = __riscv_vsetvl_e16mf4(vl << 4);
> + else
> + avl = __riscv_vsetvl_e32mf2(vl >> 8);
> + for (size_t i = 0; i < m; i++) {
> + vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, avl);
> + v0 = __riscv_vadd_vv_i8mf8 (v0, v0, avl);
> + __riscv_vse8_v_i8mf8(out + i, v0, avl);
> + }
> +}
> +
> +/* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*4} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
> +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c
> new file mode 100644
> index 00000000000..66c90ac10e7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-schedule-insns -fno-schedule-insns2" } */
> +
> +#include "riscv_vector.h"
> +
> +void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) {
> + size_t avl;
> + switch (m)
> + {
> + case 50:
> + avl = __riscv_vsetvl_e16mf4(vl << 4);
> + break;
> + case 1:
> + avl = __riscv_vsetvl_e32mf2(k);
> + break;
> + case 2:
> + avl = __riscv_vsetvl_e64m1(vl);
> + break;
> + case 3:
> + avl = __riscv_vsetvl_e32mf2(k >> 8);
> + break;
> + default:
> + avl = __riscv_vsetvl_e32mf2(k + vl);
> + break;
> + }
> + for (size_t i = 0; i < m; i++) {
> + vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, avl);
> + v0 = __riscv_vadd_vv_i8mf8 (v0, v0, avl);
> + v0 = __riscv_vadd_vv_i8mf8_tu (v0, v0, v0, avl);
> + __riscv_vse8_v_i8mf8(out + i, v0, avl);
> + }
> +}
> +
> +/* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*4} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
> +/* { dg-final { scan-assembler-times {srli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*8} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
> +/* { dg-final { scan-assembler-times {vsetvli} 5 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*m[au]} 5 { target { no-opts "-O0" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-3.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-3.c
> index f995e04aacc..13d09fc3fd1 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-3.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-3.c
> @@ -18,4 +18,4 @@ void f(int8_t *base, int8_t *out, size_t vl, size_t m) {
> }
>
> /* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*10} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
> -/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
> +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
> --
> 2.36.1
>
Thread overview: 11+ messages in thread
2023-06-09 10:41 juzhe.zhong
2023-06-09 10:45 ` Kito Cheng
2023-06-09 10:49 ` juzhe.zhong [this message]
2023-06-09 14:33 ` Jeff Law
2023-06-09 14:46 ` 钟居哲
2023-06-09 14:58 ` 钟居哲
2023-06-09 15:09 ` Jeff Law
2023-06-09 22:52 ` 钟居哲
2023-06-12 19:02 ` Richard Sandiford
2023-06-16 10:55 ` Andreas Schwab
2023-06-16 11:39 ` Li, Pan2