From: Kito Cheng
Date: Fri, 9 Jun 2023 18:45:55 +0800
Subject: Re: [PATCH V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS
To: juzhe.zhong@rivai.ai
Cc: gcc-patches@gcc.gnu.org, kito.cheng@gmail.com, palmer@dabbelt.com, palmer@rivosinc.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, pan2.li@intel.com
References: <20230609104105.9100-1-juzhe.zhong@rivai.ai>
In-Reply-To: <20230609104105.9100-1-juzhe.zhong@rivai.ai>

Thanks for sending this before the weekend; I can run the fuzzy testing over the weekend :P

On Fri, Jun 9, 2023 at 6:41 PM wrote:
>
> From: Juzhe-Zhong
>
> This patch reworks Phase 5 && Phase 6 of the VSETVL PASS, since Phase 5 && Phase 6
> are quite messy and cause some bugs discovered by my downstream auto-vectorization
> test-generator.
>
> Before this patch:
>
> Phase 5 is cleanup_insns, the function that removes the AVL operand dependency from each RVV instruction.
> E.g. vadd.vv (use a5), after Phase 5, ====> vadd.vv (use const_int 0).
> Since "a5" is used in the "vsetvl" instructions, once the correct "vsetvl" instructions
> are inserted, each RVV instruction doesn't need the AVL operand "a5" anymore. Removing
> this operand dependency then helps the following scheduling PASS.
>
> Phase 6 is propagate_avl, which does the following 2 things:
> 1. Local && Global user vsetvl instructions optimization.
>    E.g.
>       vsetvli a2, a2, e8, mf8   ======> Change it into vsetvli a2, a2, e32, mf2
>       vsetvli zero,a2, e32, mf2 ======> eliminate
> 2. Optimize user vsetvl from "vsetvl a2,a2" into "vsetvl zero,a2" if "a2" is not used by any instructions.
> Since Phase 1 ~ Phase 4 insert "vsetvli" instructions based on LCM, which changes the CFG, I rebuild a new
> RTL_SSA framework (which is more expensive than just using DF) for Phase 6 and optimize user vsetvli based on the new RTL_SSA.
>
> There are 2 issues in Phase 5 && Phase 6:
> 1. local_eliminate_vsetvl_insn was introduced by @kito and does local user vsetvl optimizations better than
>    Phase 6 does; this approach doesn't need to rebuild the RTL_SSA framework. So the local user vsetvli instruction optimization
>    in Phase 6 is redundant and should be removed.
> 2. A bug discovered by my downstream auto-vectorization test-generator (I can't put the test in this patch since we are missing the autovec
>    patterns for it, so we can't use upstream GCC to directly reproduce the issue, but I will remember to put it back after I support the
>    necessary autovec patterns). The bug is caused by using the rebuilt RTL_SSA framework. The issue is as follows:
>
> Before Phase 6:
>    ...
>    insn1: vsetvli a3, 17 <========== generated by SELECT_VL auto-vec pattern.
>    slli a4,a3,3
>    ...
>    insn2: vsetvli zero, a3, ...
>    load (use const_int 0, before Phase 5, it's using a3, but the use of "a3" is removed in Phase 5)
>    ...
>
> In Phase 6, we iterate to insn2, then get the def of "a3", which is insn1.
> insn2 is the vsetvli instruction inserted in Phase 4, which is not included in the RTL_SSA framework
> even though we rebuild it (I didn't take a look at it and I don't think we need to now).
> In this situation, the def_info of insn2 has the information "set->single_nondebug_insn_use ()",
> which returns true. Obviously, this information is not correct, since insn1 has at least 2 uses:
> 1). slli a4,a3,3 2). insn2: vsetvli zero, a3, ... As a result, the execution test generated by my
> downstream test-generator failed.
>
> Conclusion of RTL_SSA framework:
> Before this patch, we initialize RTL_SSA 2 times. One is at the beginning of the VSETVL PASS, which is absolutely correct; the other,
> rebuilt after Phase 4 (LCM), has incorrect information that causes bugs.
>
> Besides, we don't want to initialize RTL_SSA a second time; it seems to be a waste since we just need to do a little optimization.
>
> Based on all the circumstances I described above, I rework and reorganize Phase 5 && Phase 6 as follows:
> 1. Phase 5 is called ssa_post_optimization, which does the optimization based on the RTL_SSA information (the RTL_SSA is initialized
>    at the beginning of the VSETVL PASS, so there is no need to rebuild it). This phase includes 3 optimizations:
>    1). local_eliminate_vsetvl_insn, which we already have (no change).
>    2). global_eliminate_vsetvl_insn ---> new optimization split from the original Phase 6 but with a more powerful and reliable implementation.
>        E.g.
>        void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) {
>          size_t avl;
>          if (m > 100)
>            avl = __riscv_vsetvl_e16mf4(vl << 4);
>          else
>            avl = __riscv_vsetvl_e32mf2(vl >> 8);
>          for (size_t i = 0; i < m; i++) {
>            vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, avl);
>            v0 = __riscv_vadd_vv_i8mf8 (v0, v0, avl);
>            __riscv_vse8_v_i8mf8(out + i, v0, avl);
>          }
>        }
>
>        Global user vsetvl optimization failed on this example before this patch:
>        f:
>                li      a5,100
>                bleu    a3,a5,.L2
>                slli    a2,a2,4
>                vsetvli a4,a2,e16,mf4,ta,mu
>        .L3:
>                li      a5,0
>                vsetvli zero,a4,e8,mf8,ta,ma
>        .L5:
>                add     a6,a0,a5
>                add     a2,a1,a5
>                vle8.v  v1,0(a6)
>                addi    a5,a5,1
>                vadd.vv v1,v1,v1
>                vse8.v  v1,0(a2)
>                bgtu    a3,a5,.L5
>        .L10:
>                ret
>        .L2:
>                beq     a3,zero,.L10
>                srli    a2,a2,8
>                vsetvli a4,a2,e32,mf2,ta,mu
>                j       .L3
>        With this patch:
>        f:
>                li      a5,100
>                bleu    a3,a5,.L2
>                slli    a2,a2,4
>                vsetvli zero,a2,e8,mf8,ta,ma
>        .L3:
>                li      a5,0
>        .L5:
>                add     a6,a0,a5
>                add     a2,a1,a5
>                vle8.v  v1,0(a6)
>                addi    a5,a5,1
>                vadd.vv v1,v1,v1
>                vse8.v  v1,0(a2)
>                bgtu    a3,a5,.L5
>        .L10:
>                ret
>        .L2:
>                beq     a3,zero,.L10
>                srli    a2,a2,8
>                vsetvli zero,a2,e8,mf8,ta,ma
>                j       .L3
>
>    3). Remove the AVL operand dependency of each RVV instruction.
>
> 2. Phase 6 is called df_post_optimization: Optimize "vsetvl a3,a2...." into "vsetvl zero,a2...." based on
>    dataflow analysis of the new CFG (the new CFG is created by LCM). The reason we need to use the new CFG, and to run after Phase 5:
>    ...
>    vsetvl a3, a2...
>    vadd.vv (use a3)
>    If we don't have Phase 5, which removes the "a3" use in vadd.vv, we will fail to optimize vsetvl a3,a2 into vsetvl zero,a2.
>
> This patch passed all tests in rvv.exp with ONLY performance && codegen improved (no performance decline and no bugs, including my
> downstream tests).
>
> gcc/ChangeLog:
>
>         * config/riscv/riscv-vsetvl.cc (available_occurrence_p): Enhance user vsetvl optimization.
>         (vector_insn_info::parse_insn): Add rtx_insn parse.
>         (pass_vsetvl::local_eliminate_vsetvl_insn): Enhance user vsetvl optimization.
>         (get_first_vsetvl): New function.
>         (pass_vsetvl::global_eliminate_vsetvl_insn): Ditto.
>         (pass_vsetvl::cleanup_insns): Remove it.
>         (pass_vsetvl::ssa_post_optimization): New function.
>         (has_no_uses): Ditto.
>         (pass_vsetvl::propagate_avl): Remove it.
>         (pass_vsetvl::df_post_optimization): New function.
>         (pass_vsetvl::lazy_vsetvl): Rework Phase 5 && Phase 6.
>         * config/riscv/riscv-vsetvl.h: Adapt declaration.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/vsetvl/vsetvl-16.c: Adapt test.
>         * gcc.target/riscv/rvv/vsetvl/vsetvl-2.c: Ditto.
>         * gcc.target/riscv/rvv/vsetvl/vsetvl-3.c: Ditto.
>         * gcc.target/riscv/rvv/vsetvl/vsetvl-21.c: New test.
>         * gcc.target/riscv/rvv/vsetvl/vsetvl-22.c: New test.
>         * gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: New test.
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc              | 400 +++++++++++-------
>  gcc/config/riscv/riscv-vsetvl.h               |  34 +-
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-16.c   |   2 +-
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-2.c    |   2 +-
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-21.c   |  21 +
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-22.c   |  21 +
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-23.c   |  37 ++
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-3.c    |   2 +-
>  8 files changed, 366 insertions(+), 153 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-21.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-22.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vs= etvl.cc > index fe55f4ccd30..924a94adf9c 100644 > --- a/gcc/config/riscv/riscv-vsetvl.cc > +++ b/gcc/config/riscv/riscv-vsetvl.cc > @@ -395,10 +395,15 @@ available_occurrence_p (const bb_info *bb, const ve= ctor_insn_info dem) > if (!vlmax_avl_p (dem.get_avl ())) > { > rtx dest =3D NULL_RTX; > + insn_info *i =3D insn; > if (vsetvl_insn_p (insn->rtl ())) > - 
dest =3D get_vl (insn->rtl ()); > - for (const insn_info *i =3D insn; real_insn_and_same_bb_p (i, b= b); > - i =3D i->next_nondebug_insn ()) > + { > + dest =3D get_vl (insn->rtl ()); > + /* For user vsetvl a2, a2 instruction, we consider it as > + available even though it modifies "a2". */ > + i =3D i->next_nondebug_insn (); > + } > + for (; real_insn_and_same_bb_p (i, bb); i =3D i->next_nondebug_= insn ()) > { > if (read_vl_insn_p (i->rtl ())) > continue; > @@ -1893,11 +1898,13 @@ vector_insn_info::parse_insn (rtx_insn *rinsn) > *this =3D vector_insn_info (); > if (!NONDEBUG_INSN_P (rinsn)) > return; > - if (!has_vtype_op (rinsn)) > + if (optimize =3D=3D 0 && !has_vtype_op (rinsn)) > + return; > + if (optimize > 0 && !vsetvl_insn_p (rinsn)) > return; > m_state =3D VALID; > extract_insn_cached (rinsn); > - const rtx avl =3D recog_data.operand[get_attr_vl_op_idx (rinsn)]; > + rtx avl =3D ::get_avl (rinsn); > m_avl =3D avl_info (avl, nullptr); > m_sew =3D ::get_sew (rinsn); > m_vlmul =3D ::get_vlmul (rinsn); > @@ -2730,10 +2737,11 @@ private: > /* Phase 5. */ > rtx_insn *get_vsetvl_at_end (const bb_info *, vector_insn_info *) cons= t; > void local_eliminate_vsetvl_insn (const bb_info *) const; > - void cleanup_insns (void) const; > + bool global_eliminate_vsetvl_insn (const bb_info *) const; > + void ssa_post_optimization (void) const; > > /* Phase 6. */ > - void propagate_avl (void) const; > + void df_post_optimization (void) const; > > void init (void); > void done (void); > @@ -4246,7 +4254,7 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const bb_= info *bb) const > > /* Local AVL compatibility checking is simpler than global, we onl= y > need to check the REGNO is same. */ > - if (prev_dem.valid_p () && prev_dem.skip_avl_compatible_p (curr_de= m) > + if (prev_dem.valid_or_dirty_p () && prev_dem.skip_avl_compatible_p= (curr_dem) > && local_avl_compatible_p (prev_avl, curr_avl)) > { > /* curr_dem and prev_dem is compatible! 
*/ > @@ -4277,27 +4285,187 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const = bb_info *bb) const > } > } > > -/* Before VSETVL PASS, RVV instructions pattern is depending on AVL oper= and > - implicitly. Since we will emit VSETVL instruction and make RVV instru= ctions > - depending on VL/VTYPE global status registers, we remove the such AVL= operand > - in the RVV instructions pattern here in order to remove AVL dependenc= ies when > - AVL operand is a register operand. > - > - Before the VSETVL PASS: > - li a5,32 > - ... > - vadd.vv (..., a5) > - After the VSETVL PASS: > - li a5,32 > - vsetvli zero, a5, ... > - ... > - vadd.vv (..., const_int 0). */ > +/* Get the first vsetvl instructions of the block. */ > +static rtx_insn * > +get_first_vsetvl (basic_block cfg_bb) > +{ > + rtx_insn *rinsn; > + FOR_BB_INSNS (cfg_bb, rinsn) > + { > + if (!NONDEBUG_INSN_P (rinsn)) > + continue; > + /* If we don't find any inserted vsetvli before user RVV instructi= ons, > + we don't need to optimize the vsetvls in this block. */ > + if (has_vtype_op (rinsn) || vsetvl_insn_p (rinsn)) > + return nullptr; > + > + if (vsetvl_discard_result_insn_p (rinsn)) > + return rinsn; > + } > + return nullptr; > +} > + > +/* Global user vsetvl optimizaiton: > + > + Case 1: > + bb 1: > + vsetvl a5,a4,e8,mf8 > + ... > + bb 2: > + ... > + vsetvl zero,a5,e8,mf8 --> Eliminate directly. > + > + Case 2: > + bb 1: > + vsetvl a5,a4,e8,mf8 --> vsetvl a5,a4,e32,mf2 > + ... > + bb 2: > + ... > + vsetvl zero,a5,e32,mf2 --> Eliminate directly. > + > + Case 3: > + bb 1: > + vsetvl a5,a4,e8,mf8 --> vsetvl a5,a4,e32,mf2 > + ... > + bb 2: > + ... > + vsetvl a5,a4,e8,mf8 --> vsetvl a5,a4,e32,mf2 > + goto bb 3 > + bb 3: > + ... > + vsetvl zero,a5,e32,mf2 --> Eliminate directly. 
> +*/ > +bool > +pass_vsetvl::global_eliminate_vsetvl_insn (const bb_info *bb) const > +{ > + rtx_insn *vsetvl_rinsn; > + vector_insn_info dem =3D vector_insn_info (); > + const auto &block_info =3D get_block_info (bb); > + basic_block cfg_bb =3D bb->cfg_bb (); > + > + if (block_info.local_dem.valid_or_dirty_p ()) > + { > + /* Optimize the local vsetvl. */ > + dem =3D block_info.local_dem; > + vsetvl_rinsn =3D get_first_vsetvl (cfg_bb); > + } > + if (!vsetvl_rinsn) > + /* Optimize the global vsetvl inserted by LCM. */ > + vsetvl_rinsn =3D get_vsetvl_at_end (bb, &dem); > + > + /* No need to optimize if block doesn't have vsetvl instructions. */ > + if (!dem.valid_or_dirty_p () || !vsetvl_rinsn || !dem.get_avl_source (= ) > + || !dem.has_avl_reg ()) > + return false; > + > + /* If all preds has VL/VTYPE status setted by user vsetvls, and these > + user vsetvls are all skip_avl_compatible_p with the vsetvl in this > + block, we can eliminate this vsetvl instruction. */ > + sbitmap avin =3D m_vector_manager->vector_avin[cfg_bb->index]; > + > + unsigned int bb_index; > + sbitmap_iterator sbi; > + rtx avl =3D get_avl (dem.get_insn ()->rtl ()); > + hash_set sets > + =3D get_all_sets (dem.get_avl_source (), true, false, false); > + /* Condition 1: All VL/VTYPE available in are all compatible. */ > + EXECUTE_IF_SET_IN_BITMAP (avin, 0, bb_index, sbi) > + { > + const auto &expr =3D m_vector_manager->vector_exprs[bb_index]; > + const auto &insn =3D expr->get_insn (); > + def_info *def =3D find_access (insn->defs (), REGNO (avl)); > + set_info *set =3D safe_dyn_cast (def); > + if (!vsetvl_insn_p (insn->rtl ()) || insn->bb () =3D=3D bb > + || !sets.contains (set)) > + return false; > + } > + > + /* Condition 2: Check it has preds. */ > + if (EDGE_COUNT (cfg_bb->preds) =3D=3D 0) > + return false; > + > + /* Condition 3: We don't do the global optimization for the block > + has a pred is entry block or exit block. */ > + /* Condition 4: All preds have available VL/VTYPE out. 
*/ > + edge e; > + edge_iterator ei; > + FOR_EACH_EDGE (e, ei, cfg_bb->preds) > + { > + sbitmap avout =3D m_vector_manager->vector_avout[e->src->index]; > + if (e->src =3D=3D ENTRY_BLOCK_PTR_FOR_FN (cfun) > + || e->src =3D=3D EXIT_BLOCK_PTR_FOR_FN (cfun) || bitmap_empty_p= (avout)) > + return false; > + > + EXECUTE_IF_SET_IN_BITMAP (avout, 0, bb_index, sbi) > + { > + const auto &expr =3D m_vector_manager->vector_exprs[bb_index]; > + const auto &insn =3D expr->get_insn (); > + def_info *def =3D find_access (insn->defs (), REGNO (avl)); > + set_info *set =3D safe_dyn_cast (def); > + if (!vsetvl_insn_p (insn->rtl ()) || insn->bb () =3D=3D bb > + || !sets.contains (set) || !expr->skip_avl_compatible_p (de= m)) > + return false; > + } > + } > + > + /* Step1: Reshape the VL/VTYPE status to make sure everything compatib= le. */ > + hash_set pred_cfg_bbs =3D get_all_predecessors (cfg_bb); > + FOR_EACH_EDGE (e, ei, cfg_bb->preds) > + { > + sbitmap avout =3D m_vector_manager->vector_avout[e->src->index]; > + EXECUTE_IF_SET_IN_BITMAP (avout, 0, bb_index, sbi) > + { > + vector_insn_info prev_dem =3D *m_vector_manager->vector_exprs[b= b_index]; > + vector_insn_info curr_dem =3D dem; > + insn_info *insn =3D prev_dem.get_insn (); > + if (!pred_cfg_bbs.contains (insn->bb ()->cfg_bb ())) > + continue; > + /* Update avl info since we need to make sure they are fully > + compatible before merge. */ > + curr_dem.set_avl_info (prev_dem.get_avl_info ()); > + /* Merge both and update into curr_vsetvl. */ > + prev_dem =3D curr_dem.merge (prev_dem, LOCAL_MERGE); > + change_vsetvl_insn (insn, prev_dem); > + } > + } > + > + /* Step2: eliminate the vsetvl instruction. */ > + eliminate_insn (vsetvl_rinsn); > + return true; > +} > + > +/* This function does the following post optimization base on RTL_SSA: > + > + 1. Local user vsetvl optimizations. > + 2. Global user vsetvl optimizations. > + 3. 
AVL dependencies removal: > + Before VSETVL PASS, RVV instructions pattern is depending on AVL o= perand > + implicitly. Since we will emit VSETVL instruction and make RVV > + instructions depending on VL/VTYPE global status registers, we rem= ove the > + such AVL operand in the RVV instructions pattern here in order to = remove > + AVL dependencies when AVL operand is a register operand. > + > + Before the VSETVL PASS: > + li a5,32 > + ... > + vadd.vv (..., a5) > + After the VSETVL PASS: > + li a5,32 > + vsetvli zero, a5, ... > + ... > + vadd.vv (..., const_int 0). */ > void > -pass_vsetvl::cleanup_insns (void) const > +pass_vsetvl::ssa_post_optimization (void) const > { > for (const bb_info *bb : crtl->ssa->bbs ()) > { > local_eliminate_vsetvl_insn (bb); > + bool changed_p =3D true; > + while (changed_p) > + { > + changed_p =3D false; > + changed_p |=3D global_eliminate_vsetvl_insn (bb); > + } > for (insn_info *insn : bb->real_nondebug_insns ()) > { > rtx_insn *rinsn =3D insn->rtl (); > @@ -4342,135 +4510,81 @@ pass_vsetvl::cleanup_insns (void) const > } > } > > +/* Return true if the SET result is not used by any instructions. */ > +static bool > +has_no_uses (basic_block cfg_bb, rtx_insn *rinsn, int regno) > +{ > + /* Handle the following case that can not be detected in RTL_SSA. */ > + /* E.g. > + li a5, 100 > + vsetvli a6, a5... > + ... > + vadd (use a6) > + > + The use of "a6" is removed from "vadd" but the information is > + not updated in RTL_SSA framework. We don't want to re-new > + a new RTL_SSA which is expensive, instead, we use data-flow > + analysis to check whether "a6" has no uses. 
*/ > + if (bitmap_bit_p (df_get_live_out (cfg_bb), regno)) > + return false; > + > + rtx_insn *iter; > + for (iter =3D NEXT_INSN (rinsn); iter && iter !=3D NEXT_INSN (BB_END (= cfg_bb)); > + iter =3D NEXT_INSN (iter)) > + if (df_find_use (iter, regno_reg_rtx[regno])) > + return false; > + > + return true; > +} > + > +/* This function does the following post optimization base on dataflow > + analysis: > + > + 1. Change vsetvl rd, rs1 --> vsevl zero, rs1, if rd is not used by an= y > + nondebug instructions. Even though this PASS runs after RA and it doe= sn't > + help for reduce register pressure, it can help instructions schedulin= g since > + we remove the dependencies. > + > + 2. Remove redundant user vsetvls base on outcome of Phase 4 (LCM) && = Phase 5 > + (AVL dependencies removal). */ > void > -pass_vsetvl::propagate_avl (void) const > -{ > - /* Rebuild the RTL_SSA according to the new CFG generated by LCM. */ > - /* Finalization of RTL_SSA. */ > - free_dominance_info (CDI_DOMINATORS); > - if (crtl->ssa->perform_pending_updates ()) > - cleanup_cfg (0); > - delete crtl->ssa; > - crtl->ssa =3D nullptr; > - /* Initialization of RTL_SSA. 
*/ > - calculate_dominance_info (CDI_DOMINATORS); > +pass_vsetvl::df_post_optimization (void) const > +{ > df_analyze (); > - crtl->ssa =3D new function_info (cfun); > - > hash_set to_delete; > - for (const bb_info *bb : crtl->ssa->bbs ()) > + basic_block cfg_bb; > + rtx_insn *rinsn; > + FOR_ALL_BB_FN (cfg_bb, cfun) > { > - for (insn_info *insn : bb->real_nondebug_insns ()) > + FOR_BB_INSNS (cfg_bb, rinsn) > { > - if (vsetvl_discard_result_insn_p (insn->rtl ())) > + if (NONDEBUG_INSN_P (rinsn) && vsetvl_insn_p (rinsn)) > { > - rtx avl =3D get_avl (insn->rtl ()); > - if (!REG_P (avl)) > - continue; > - > - set_info *set =3D find_access (insn->uses (), REGNO (avl))-= >def (); > - insn_info *def_insn =3D extract_single_source (set); > - if (!def_insn) > - continue; > - > - /* Handle this case: > - vsetvli a6,zero,e32,m1,ta,mu > - li a5,4096 > - add a7,a0,a5 > - addi a7,a7,-96 > - vsetvli t1,zero,e8,mf8,ta,ma > - vle8.v v24,0(a7) > - add a5,a3,a5 > - addi a5,a5,-96 > - vse8.v v24,0(a5) > - vsetvli zero,a6,e32,m1,tu,ma > - */ > - if (vsetvl_insn_p (def_insn->rtl ())) > - { > - vl_vtype_info def_info =3D get_vl_vtype_info (def_insn)= ; > - vl_vtype_info info =3D get_vl_vtype_info (insn); > - rtx avl =3D get_avl (def_insn->rtl ()); > - rtx vl =3D get_vl (def_insn->rtl ()); > - if (def_info.get_ratio () =3D=3D info.get_ratio ()) > - { > - if (vlmax_avl_p (def_info.get_avl ())) > - { > - info.set_avl_info ( > - avl_info (def_info.get_avl (), nullptr)); > - rtx new_pat > - =3D gen_vsetvl_pat (VSETVL_NORMAL, info, vl); > - validate_change (insn->rtl (), > - &PATTERN (insn->rtl ()), new_p= at, > - false); > - continue; > - } > - if (def_info.has_avl_imm () || rtx_equal_p (avl, vl= )) > - { > - info.set_avl_info (avl_info (avl, nullptr)); > - emit_vsetvl_insn (VSETVL_DISCARD_RESULT, EMIT_A= FTER, > - info, NULL_RTX, insn->rtl ())= ; > - if (set->single_nondebug_insn_use ()) > - { > - to_delete.add (insn->rtl ()); > - to_delete.add (def_insn->rtl ()); > - } > - continue; > - } > 
- } > - } > - } > - > - /* Change vsetvl rd, rs1 --> vsevl zero, rs1, > - if rd is not used by any nondebug instructions. > - Even though this PASS runs after RA and it doesn't help for > - reduce register pressure, it can help instructions schedulin= g > - since we remove the dependencies. */ > - if (vsetvl_insn_p (insn->rtl ())) > - { > - rtx vl =3D get_vl (insn->rtl ()); > - rtx avl =3D get_avl (insn->rtl ()); > - def_info *def =3D find_access (insn->defs (), REGNO (vl)); > - set_info *set =3D safe_dyn_cast (def); > + rtx vl =3D get_vl (rinsn); > vector_insn_info info; > - info.parse_insn (insn); > - gcc_assert (set); > - if (m_vector_manager->to_delete_vsetvls.contains (insn->rtl= ())) > - { > - m_vector_manager->to_delete_vsetvls.remove (insn->rtl (= )); > - if (m_vector_manager->to_refine_vsetvls.contains ( > - insn->rtl ())) > - m_vector_manager->to_refine_vsetvls.remove (insn->rtl= ()); > - if (!set->has_nondebug_insn_uses ()) > - { > - to_delete.add (insn->rtl ()); > - continue; > - } > - } > - if (m_vector_manager->to_refine_vsetvls.contains (insn->rtl= ())) > + info.parse_insn (rinsn); > + bool to_delete_p =3D m_vector_manager->to_delete_p (rinsn); > + bool to_refine_p =3D m_vector_manager->to_refine_p (rinsn); > + if (has_no_uses (cfg_bb, rinsn, REGNO (vl))) > { > - m_vector_manager->to_refine_vsetvls.remove (insn->rtl (= )); > - if (!set->has_nondebug_insn_uses ()) > + if (to_delete_p) > + to_delete.add (rinsn); > + else if (to_refine_p) > { > rtx new_pat =3D gen_vsetvl_pat (VSETVL_VTYPE_CHANGE= _ONLY, > info, NULL_RTX); > - change_insn (insn->rtl (), new_pat); > - continue; > + validate_change (rinsn, &PATTERN (rinsn), new_pat, = false); > + } > + else if (!vlmax_avl_p (info.get_avl ())) > + { > + rtx new_pat =3D gen_vsetvl_pat (VSETVL_DISCARD_RESU= LT, info, > + NULL_RTX); > + validate_change (rinsn, &PATTERN (rinsn), new_pat, = false); > } > - } > - if (vlmax_avl_p (avl)) > - continue; > - rtx new_pat > - =3D gen_vsetvl_pat (VSETVL_DISCARD_RESULT, 
info, NULL_RTX= ); > - if (!set->has_nondebug_insn_uses ()) > - { > - validate_change (insn->rtl (), &PATTERN (insn->rtl ()), > - new_pat, false); > - continue; > } > } > } > } > - > for (rtx_insn *rinsn : to_delete) > eliminate_insn (rinsn); > } > @@ -4593,16 +4707,16 @@ pass_vsetvl::lazy_vsetvl (void) > fprintf (dump_file, "\nPhase 4: PRE vsetvl by Lazy code motion (LCM)= \n"); > pre_vsetvl (); > > - /* Phase 5 - Cleanup AVL && VL operand of RVV instruction. */ > + /* Phase 5 - Post optimization base on RTL_SSA. */ > if (dump_file) > - fprintf (dump_file, "\nPhase 5: Cleanup AVL and VL operands\n"); > - cleanup_insns (); > + fprintf (dump_file, "\nPhase 5: Post optimization base on RTL_SSA\n"= ); > + ssa_post_optimization (); > > - /* Phase 6 - Rebuild RTL_SSA to propagate AVL between vsetvls. */ > + /* Phase 6 - Post optimization base on data-flow analysis. */ > if (dump_file) > fprintf (dump_file, > - "\nPhase 6: Rebuild RTL_SSA to propagate AVL between vsetvls= \n"); > - propagate_avl (); > + "\nPhase 6: Post optimization base on data-flow analysis\n")= ; > + df_post_optimization (); > } > > /* Main entry point for this pass. */ > diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vse= tvl.h > index d7a6c14e931..4257451bb74 100644 > --- a/gcc/config/riscv/riscv-vsetvl.h > +++ b/gcc/config/riscv/riscv-vsetvl.h > @@ -290,13 +290,6 @@ private: > definition of AVL. */ > rtl_ssa::insn_info *m_insn; > > - /* Parse the instruction to get VL/VTYPE information and demanding > - * information. */ > - /* This is only called by simple_vsetvl subroutine when optimize =3D= =3D 0. > - Since RTL_SSA can not be enabled when optimize =3D=3D 0, we don't i= nitialize > - the m_insn. */ > - void parse_insn (rtx_insn *); > - > friend class vector_infos_manager; > > public: > @@ -305,6 +298,12 @@ public: > m_insn (nullptr) > {} > > + /* Parse the instruction to get VL/VTYPE information and demanding > + * information. 
*/ > + /* This is only called by simple_vsetvl subroutine when optimize =3D= =3D 0. > + Since RTL_SSA can not be enabled when optimize =3D=3D 0, we don't i= nitialize > + the m_insn. */ > + void parse_insn (rtx_insn *); > /* This is only called by lazy_vsetvl subroutine when optimize > 0. > We use RTL_SSA framework to initialize the insn_info. */ > void parse_insn (rtl_ssa::insn_info *); > @@ -454,6 +453,27 @@ public: > bool all_empty_predecessor_p (const basic_block) const; > bool all_avail_in_compatible_p (const basic_block) const; > > + bool to_delete_p (rtx_insn *rinsn) > + { > + if (to_delete_vsetvls.contains (rinsn)) > + { > + to_delete_vsetvls.remove (rinsn); > + if (to_refine_vsetvls.contains (rinsn)) > + to_refine_vsetvls.remove (rinsn); > + return true; > + } > + return false; > + } > + bool to_refine_p (rtx_insn *rinsn) > + { > + if (to_refine_vsetvls.contains (rinsn)) > + { > + to_refine_vsetvls.remove (rinsn); > + return true; > + } > + return false; > + } > + > void release (void); > void create_bitmap_vectors (void); > void free_bitmap_vectors (void); > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-16.c b/gcc/= testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-16.c > index e0c6588b1db..29e05c4982b 100644 > --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-16.c > +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-16.c > @@ -16,5 +16,5 @@ void f(int8_t *base, int8_t *out, size_t vl, size_t m) = { > } > } > > -/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0= " no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } }= */ > +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0= " no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } }= */ > /* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*1= 0} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } }= */ > diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-2.c b/gcc/t= estsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-2.c > index 0c5da5e640c..ff0171b3ff6 100644 > --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-2.c > +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-2.c > @@ -17,4 +17,4 @@ void f(int8_t *base, int8_t *out, size_t vl, size_t m) = { > } > > /* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*1= 0} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } }= */ > -/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0= " no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } }= */ > +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0= " no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } }= */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-21.c b/gcc/= testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-21.c > new file mode 100644 > index 00000000000..551920c6a72 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-21.c > @@ -0,0 +1,21 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=3Drv32gcv -mabi=3Dilp32 -fno-schedule-insns -fno= -schedule-insns2" } */ > + > +#include "riscv_vector.h" > + > +void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) { > + size_t avl; > + if (m > 100) > + avl =3D __riscv_vsetvl_e16mf4(vl << 4); > + else{ > + if (k) > + avl =3D __riscv_vsetvl_e8mf8(vl); > + } > + for (size_t i =3D 0; i < m; i++) { > + vint8mf8_t v0 =3D __riscv_vle8_v_i8mf8(base + i, avl); > + __riscv_vse8_v_i8mf8(out + i, v0, avl); > + } > +} > + > +/* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*4= } 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } = */ > +/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0= " no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } }= */ > diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-22.c b/gcc/= testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-22.c > new file mode 100644 > index 00000000000..103f4238c76 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-22.c > @@ -0,0 +1,21 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=3Drv32gcv -mabi=3Dilp32 -fno-schedule-insns -fno= -schedule-insns2" } */ > + > +#include "riscv_vector.h" > + > +void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) { > + size_t avl; > + if (m > 100) > + avl =3D __riscv_vsetvl_e16mf4(vl << 4); > + else > + avl =3D __riscv_vsetvl_e32mf2(vl >> 8); > + for (size_t i =3D 0; i < m; i++) { > + vint8mf8_t v0 =3D __riscv_vle8_v_i8mf8(base + i, avl); > + v0 =3D __riscv_vadd_vv_i8mf8 (v0, v0, avl); > + __riscv_vse8_v_i8mf8(out + i, v0, avl); > + } > +} > + > +/* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*4= } 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } = */ > +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0= " no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } }= */ > +/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,= \s*mf8,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-g" no-opts "= -funroll-loops" } } } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c b/gcc/= testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c > new file mode 100644 > index 00000000000..66c90ac10e7 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c > @@ -0,0 +1,37 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=3Drv32gcv -mabi=3Dilp32 -fno-schedule-insns -fno= -schedule-insns2" } */ > + > +#include "riscv_vector.h" > + > +void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) { > + size_t avl; > + switch (m) > + { > + case 50: > + avl =3D __riscv_vsetvl_e16mf4(vl << 4); > + break; > + case 1: > + avl =3D __riscv_vsetvl_e32mf2(k); 
> + break; > + case 2: > + avl =3D __riscv_vsetvl_e64m1(vl); > + break; > + case 3: > + avl =3D __riscv_vsetvl_e32mf2(k >> 8); > + break; > + default: > + avl =3D __riscv_vsetvl_e32mf2(k + vl); > + break; > + } > + for (size_t i =3D 0; i < m; i++) { > + vint8mf8_t v0 =3D __riscv_vle8_v_i8mf8(base + i, avl); > + v0 =3D __riscv_vadd_vv_i8mf8 (v0, v0, avl); > + v0 =3D __riscv_vadd_vv_i8mf8_tu (v0, v0, v0, avl); > + __riscv_vse8_v_i8mf8(out + i, v0, avl); > + } > +} > + > +/* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*4= } 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } = */ > +/* { dg-final { scan-assembler-times {srli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*8= } 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } = */ > +/* { dg-final { scan-assembler-times {vsetvli} 5 { target { no-opts "-O0= " no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } }= */ > +/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,= \s*mf8,\s*tu,\s*m[au]} 5 { target { no-opts "-O0" no-opts "-Os" no-opts "-g= " no-opts "-funroll-loops" } } } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-3.c b/gcc/t= estsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-3.c > index f995e04aacc..13d09fc3fd1 100644 > --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-3.c > +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-3.c > @@ -18,4 +18,4 @@ void f(int8_t *base, int8_t *out, size_t vl, size_t m) = { > } > > /* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*1= 0} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } }= */ > -/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0= " no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } }= */ > +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0= " no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } }= */ > -- > 2.36.1 >
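
For reference, the Phase 6 check quoted above (has_no_uses) can be modelled outside GCC roughly as follows. This is only a minimal sketch under stated assumptions, not the patch's code: Insn, Block and has_no_uses_model are hypothetical stand-ins for rtx_insn, basic_block and the df_get_live_out / df_find_use queries. It shows the same two-step idea: the register must not be live out of the block, and no later instruction in the block may read it.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for an RTL instruction: only the registers it reads.
struct Insn {
  std::set<int> uses;
};

// Hypothetical stand-in for a basic block plus its DF live-out register set.
struct Block {
  std::vector<Insn> insns;
  std::set<int> live_out;
};

// Model of the check: the register REGNO written by the instruction at
// DEF_IDX is considered unused if it is not live out of the block and no
// later instruction in the same block reads it.
bool has_no_uses_model(const Block &bb, std::size_t def_idx, int regno) {
  assert(def_idx < bb.insns.size());

  // Step 1: a register that is live out may be read by a successor block.
  if (bb.live_out.count(regno))
    return false;

  // Step 2: scan the instructions after the definition for a read of REGNO.
  for (std::size_t i = def_idx + 1; i < bb.insns.size(); ++i)
    if (bb.insns[i].uses.count(regno))
      return false;

  return true;
}
```

In this model, a block holding "vsetvl a3,a2" followed by a vadd.vv whose "a3" use was already dropped by Phase 5 reports no uses for a3 (with 13 standing in for a3's register number), which is the condition under which the quoted Phase 6 code rewrites the instruction to "vsetvl zero,a2".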