From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=xXM4=B5=sifive.com=kito.cheng@sourceware.org>
Received: from mail-oa1-x34.google.com (mail-oa1-x34.google.com [IPv6:2001:4860:4864:20::34])
	by sourceware.org (Postfix) with ESMTPS id 4CBC43858D35
	for <gcc-patches@gcc.gnu.org>; Fri,  9 Jun 2023 10:46:08 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4CBC43858D35
Authentication-Results: sourceware.org; dmarc=pass (p=reject dis=none) header.from=sifive.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=sifive.com
Received: by mail-oa1-x34.google.com with SMTP id 586e51a60fabf-19f6f8c840bso571479fac.3
        for <gcc-patches@gcc.gnu.org>; Fri, 09 Jun 2023 03:46:08 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=sifive.com; s=google; t=1686307567; x=1688899567;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:from:to:cc:subject:date
         :message-id:reply-to;
        bh=unBQvgzeOUyFP2owOxzZzHicRAUDhIc+FKz1JgGbnDg=;
        b=N/9XqN9xXHIZ0GvHGCLmwf/E5LYxlWRh7s53dTpM+7ZVjjkjxxq1T+E9vz296SWpGe
         mO0q6YdQ5Vnfu2g5y2RTMtxkF6kdGCTWt/foP1p5qaf+in1UMddM37ynUXMFnaKs9Luy
         kY+PhiM+4zGN0/y0P21iXi4h35Yrol9wJYpLbLj+q8rTpxRrfxR4n/F4c6CjSJDaaZN4
         mia1ZF6xnr5XOpBFH3R3esHUAuIvdhHyL/8CW6JFZ8I/thtloKL3zZzFJMthfiQv59pr
         McxHJUoKv8vzzsxBhnF+4Cu0eDKBdQCq8xrge3dke7UHoNtsaRldSl8nxUxlgqtgYj3E
         xRJA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20221208; t=1686307567; x=1688899567;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=unBQvgzeOUyFP2owOxzZzHicRAUDhIc+FKz1JgGbnDg=;
        b=I14MMLnskDldfW1zzogne+gQQo+pz79l/Qi/SuMQLZl5XaDRRqf7g6+ljHJgXMeQTq
         968cpdp2aqgPvF40pZGRc6JRqdumVyTvfWK5uId+eBX+qB6NpBc3BJi5cBlgb7PPIn1M
         mXQB+nPiFETBHu/mt+TjrUqn97A+sPK+2QUPDtHeFXg1MBrHOcBcEvTqq4QP4hQqOopE
         BWOm7K1JrnRIqo1SM0sjpmymlUNNxwIJrXIL6euY/KGLbtILjEiP1XDrcDBNu/hN386c
         qrP+wDYbXUDecZN7TRi1aKGue6DGVLlV32ImusrpR7lhfbfuOtbTyK810VyUoyJx04w2
         wAHg==
X-Gm-Message-State: AC+VfDwj77PiFVAdW77RMHFe+8yD3FynnqkwgG8p9hwJSOhAMhWuWszx
	0NQaxVkGnOkYQrH+PS/LqD4kTWSoXI0YwQdmpWjclA==
X-Google-Smtp-Source: ACHHUZ5Wm65SZa6ekv1vPXEYtROgubsoqfQvx95urGRyq4bVqHyadwEyRHLIFGoLc4f1g67Yz4hhbI/pGRYcsCuMMf4=
X-Received: by 2002:a05:6808:d3:b0:39c:4629:9253 with SMTP id
 t19-20020a05680800d300b0039c46299253mr1309965oic.42.1686307567176; Fri, 09
 Jun 2023 03:46:07 -0700 (PDT)
MIME-Version: 1.0
References: <20230609104105.9100-1-juzhe.zhong@rivai.ai>
In-Reply-To: <20230609104105.9100-1-juzhe.zhong@rivai.ai>
From: Kito Cheng <kito.cheng@sifive.com>
Date: Fri, 9 Jun 2023 18:45:55 +0800
Message-ID: <CALLt3Tihy5kQCxK1yduTSYibZ7hSs3aQjuNFoWaNLy-muYtbbw@mail.gmail.com>
Subject: Re: [PATCH V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS
To: juzhe.zhong@rivai.ai
Cc: gcc-patches@gcc.gnu.org, kito.cheng@gmail.com, palmer@dabbelt.com, 
	palmer@rivosinc.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, 
	pan2.li@intel.com
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-8.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,LIKELY_SPAM_BODY,RCVD_IN_DNSWL_NONE,SCC_10_SHORT_WORD_LINES,SCC_5_SHORT_WORD_LINES,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Thankful you send this before weekend, I could run the fuzzy testing
during this weekend :P

On Fri, Jun 9, 2023 at 6:41=E2=80=AFPM <juzhe.zhong@rivai.ai> wrote:
>
> From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
>
> This patch is to rework Phase 5 && Phase 6 of VSETVL PASS since Phase 5 &=
& Phase 6
> are quite messy and cause some bugs discovered by my downstream auto-vect=
orization
> test-generator.
>
> Before this patch.
>
> Phase 5 is cleanup_insns is the function remove AVL operand dependency fr=
om each RVV instruction.
> E.g. vadd.vv (use a5), after Phase 5, =3D=3D=3D=3D> vadd.vv (use const_in=
t 0). Since "a5" is used in "vsetvl" instructions and
> after the correct "vsetvl" instructions are inserted, each RVV instructio=
n doesn't need AVL operand "a5" anymore. Then,
> we remove this operand dependency helps for the following scheduling PASS=
.
>
> Phase 6 is propagate_avl do the following 2 things:
> 1. Local && Global user vsetvl instructions optimization.
>    E.g.
>       vsetvli a2, a2, e8, mf8   =3D=3D=3D=3D=3D=3D> Change it into vsetvl=
i a2, a2, e32, mf2
>       vsetvli zero,a2, e32, mf2  =3D=3D=3D=3D=3D=3D> eliminate
> 2. Optimize user vsetvl from "vsetvl a2,a2" into "vsetvl zero,a2" if "a2"=
 is not used by any instructions.
> Since from Phase 1 ~ Phase 4 which inserts "vsetvli" instructions base on=
 LCM which change the CFG, I re-new a new
> RTL_SSA framework (which is more expensive than just using DF) for Phase =
6 and optmize user vsetvli base on the new RTL_SSA.
>
> There are 2 issues in Phase 5 && Phase 6:
> 1. local_eliminate_vsetvl_insn was introduced by @kito which can do bette=
r local user vsetvl optimizations better than
>    Phase 6 do, such approach doesn't need to re-new the RTL_SSA framework=
. So the local user vsetvli instructions optimizaiton
>    in Phase 6 is redundant and should be removed.
> 2. A bug discovered by my downstream auto-vectorization test-generator (I=
 can't put the test in this patch since we are missing autovec
>    patterns for it so we can't use the upstream GCC directly reproduce su=
ch issue but I will remember put it back after I support the
>    necessary autovec patterns). Such bug is causing by using RTL_SSA re-n=
ew framework. The issue description is this:
>
> Before Phase 6:
>    ...
>    insn1: vsetlvi a3, 17 <=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D generated by SEL=
ECT_VL auto-vec pattern.
>    slli a4,a3,3
>    ...
>    insn2: vsetvli zero, a3, ...
>    load (use const_int 0, before Phase 5, it's using a3, but the use of "=
a3" is removed in Phase 5)
>    ...
>
> In Phase 6, we iterate to insn2, then get the def of "a3" which is the in=
sn1.
> insn2 is the vsetvli instruction inserted in Phase 4 which is not include=
d in the RLT_SSA framework
> even though we renew it (I didn't take a look at it and I don't think we =
need to now).
> Base on this situation, the def_info of insn2 has the information "set->s=
ingle_nondebug_insn_use ()"
> which return true. Obviously, this information is not correct, since insn=
1 has aleast 2 uses:
> 1). slli a4,a3,3 2).insn2: vsetvli zero, a3, ... Then, the test generated=
 by my downstream test-generator
> execution test failed.
>
> Conclusion of RTL_SSA framework:
> Before this patch, we initialize RTL_SSA 2 times. One is at the beginning=
 of the VSETVL PASS which is absolutely correct, the other
> is re-new after Phase 4 (LCM) has incorrect information that causes bugs.
>
> Besides, we don't like to initialize RTL_SSA second time it seems to be a=
 waste since we just need to do a little optimization.
>
> Base on all circumstances I described above, I rework and reorganize Phas=
e 5 && Phase 6 as follows:
> 1. Phase 5 is called ssa_post_optimization which is doing the optimizatio=
n base on the RTL_SSA information (The RTL_SSA is initialized
>    at the beginning of the VSETVL PASS, no need to re-new it again). This=
 phase includes 3 optimizaitons:
>    1). local_eliminate_vsetvl_insn we already have (no change).
>    2). global_eliminate_vsetvl_insn ---> new optimizaiton splitted from o=
rignal Phase 6 but with more powerful and reliable implementation.
>       E.g.
>       void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) {
>         size_t avl;
>         if (m > 100)
>           avl =3D __riscv_vsetvl_e16mf4(vl << 4);
>         else
>           avl =3D __riscv_vsetvl_e32mf2(vl >> 8);
>         for (size_t i =3D 0; i < m; i++) {
>           vint8mf8_t v0 =3D __riscv_vle8_v_i8mf8(base + i, avl);
>           v0 =3D __riscv_vadd_vv_i8mf8 (v0, v0, avl);
>           __riscv_vse8_v_i8mf8(out + i, v0, avl);
>         }
>       }
>
>       This example failed to global user vsetvl optimize before this patc=
h:
>       f:
>               li      a5,100
>               bleu    a3,a5,.L2
>               slli    a2,a2,4
>               vsetvli a4,a2,e16,mf4,ta,mu
>       .L3:
>               li      a5,0
>               vsetvli zero,a4,e8,mf8,ta,ma
>       .L5:
>               add     a6,a0,a5
>               add     a2,a1,a5
>               vle8.v  v1,0(a6)
>               addi    a5,a5,1
>               vadd.vv v1,v1,v1
>               vse8.v  v1,0(a2)
>               bgtu    a3,a5,.L5
>       .L10:
>               ret
>       .L2:
>               beq     a3,zero,.L10
>               srli    a2,a2,8
>               vsetvli a4,a2,e32,mf2,ta,mu
>               j       .L3
>       With this patch:
>       f:
>               li      a5,100
>               bleu    a3,a5,.L2
>               slli    a2,a2,4
>               vsetvli zero,a2,e8,mf8,ta,ma
>       .L3:
>               li      a5,0
>       .L5:
>               add     a6,a0,a5
>               add     a2,a1,a5
>               vle8.v  v1,0(a6)
>               addi    a5,a5,1
>               vadd.vv v1,v1,v1
>               vse8.v  v1,0(a2)
>               bgtu    a3,a5,.L5
>       .L10:
>               ret
>       .L2:
>               beq     a3,zero,.L10
>               srli    a2,a2,8
>               vsetvli zero,a2,e8,mf8,ta,ma
>               j       .L3
>
>    3). Remove AVL operand dependency of each RVV instructions.
>
> 2. Phase 6 is called df_post_optimization: Optimize "vsetvl a3,a2...." in=
to Optimize "vsetvl zero,a2...." base on
>    dataflow analysis of new CFG (new CFG is created by LCM). The reason w=
e need to do use new CFG and after Phase 5:
>    ...
>    vsetvl a3, a2...
>    vadd.vv (use a3)
>    If we don't have Phase 5 which removes the "a3" use in vadd.vv, we wil=
l fail to optimize vsetvl a3,a2 into vsetvl zero,a2.
>
>    This patch passed all tests in rvv.exp with ONLY peformance && codegen=
 improved (no performance decline and no bugs including my
>    downstream tests).
>
> gcc/ChangeLog:
>
>         * config/riscv/riscv-vsetvl.cc (available_occurrence_p): Ehance u=
ser vsetvl optimization.
>         (vector_insn_info::parse_insn): Add rtx_insn parse.
>         (pass_vsetvl::local_eliminate_vsetvl_insn): Ehance user vsetvl op=
timization.
>         (get_first_vsetvl): New function.
>         (pass_vsetvl::global_eliminate_vsetvl_insn): Ditto.
>         (pass_vsetvl::cleanup_insns): Remove it.
>         (pass_vsetvl::ssa_post_optimization): New function.
>         (has_no_uses): Ditto.
>         (pass_vsetvl::propagate_avl): Remove it.
>         (pass_vsetvl::df_post_optimization): New function.
>         (pass_vsetvl::lazy_vsetvl): Rework Phase 5 && Phase 6.
>         * config/riscv/riscv-vsetvl.h: Adapt declaration.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/vsetvl/vsetvl-16.c: Adapt test.
>         * gcc.target/riscv/rvv/vsetvl/vsetvl-2.c: Ditto.
>         * gcc.target/riscv/rvv/vsetvl/vsetvl-3.c: Ditto.
>         * gcc.target/riscv/rvv/vsetvl/vsetvl-21.c: New test.
>         * gcc.target/riscv/rvv/vsetvl/vsetvl-22.c: New test.
>         * gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: New test.
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc              | 400 +++++++++++-------
>  gcc/config/riscv/riscv-vsetvl.h               |  34 +-
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-16.c   |   2 +-
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-2.c    |   2 +-
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-21.c   |  21 +
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-22.c   |  21 +
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-23.c   |  37 ++
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-3.c    |   2 +-
>  8 files changed, 366 insertions(+), 153 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-21.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-22.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vs=
etvl.cc
> index fe55f4ccd30..924a94adf9c 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -395,10 +395,15 @@ available_occurrence_p (const bb_info *bb, const ve=
ctor_insn_info dem)
>        if (!vlmax_avl_p (dem.get_avl ()))
>         {
>           rtx dest =3D NULL_RTX;
> +         insn_info *i =3D insn;
>           if (vsetvl_insn_p (insn->rtl ()))
> -           dest =3D get_vl (insn->rtl ());
> -         for (const insn_info *i =3D insn; real_insn_and_same_bb_p (i, b=
b);
> -              i =3D i->next_nondebug_insn ())
> +           {
> +             dest =3D get_vl (insn->rtl ());
> +             /* For user vsetvl a2, a2 instruction, we consider it as
> +                available even though it modifies "a2".  */
> +             i =3D i->next_nondebug_insn ();
> +           }
> +         for (; real_insn_and_same_bb_p (i, bb); i =3D i->next_nondebug_=
insn ())
>             {
>               if (read_vl_insn_p (i->rtl ()))
>                 continue;
> @@ -1893,11 +1898,13 @@ vector_insn_info::parse_insn (rtx_insn *rinsn)
>    *this =3D vector_insn_info ();
>    if (!NONDEBUG_INSN_P (rinsn))
>      return;
> -  if (!has_vtype_op (rinsn))
> +  if (optimize =3D=3D 0 && !has_vtype_op (rinsn))
> +    return;
> +  if (optimize > 0 && !vsetvl_insn_p (rinsn))
>      return;
>    m_state =3D VALID;
>    extract_insn_cached (rinsn);
> -  const rtx avl =3D recog_data.operand[get_attr_vl_op_idx (rinsn)];
> +  rtx avl =3D ::get_avl (rinsn);
>    m_avl =3D avl_info (avl, nullptr);
>    m_sew =3D ::get_sew (rinsn);
>    m_vlmul =3D ::get_vlmul (rinsn);
> @@ -2730,10 +2737,11 @@ private:
>    /* Phase 5.  */
>    rtx_insn *get_vsetvl_at_end (const bb_info *, vector_insn_info *) cons=
t;
>    void local_eliminate_vsetvl_insn (const bb_info *) const;
> -  void cleanup_insns (void) const;
> +  bool global_eliminate_vsetvl_insn (const bb_info *) const;
> +  void ssa_post_optimization (void) const;
>
>    /* Phase 6.  */
> -  void propagate_avl (void) const;
> +  void df_post_optimization (void) const;
>
>    void init (void);
>    void done (void);
> @@ -4246,7 +4254,7 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const bb_=
info *bb) const
>
>        /* Local AVL compatibility checking is simpler than global, we onl=
y
>          need to check the REGNO is same.  */
> -      if (prev_dem.valid_p () && prev_dem.skip_avl_compatible_p (curr_de=
m)
> +      if (prev_dem.valid_or_dirty_p () && prev_dem.skip_avl_compatible_p=
 (curr_dem)
>           && local_avl_compatible_p (prev_avl, curr_avl))
>         {
>           /* curr_dem and prev_dem is compatible!  */
> @@ -4277,27 +4285,187 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const =
bb_info *bb) const
>      }
>  }
>
> -/* Before VSETVL PASS, RVV instructions pattern is depending on AVL oper=
and
> -   implicitly. Since we will emit VSETVL instruction and make RVV instru=
ctions
> -   depending on VL/VTYPE global status registers, we remove the such AVL=
 operand
> -   in the RVV instructions pattern here in order to remove AVL dependenc=
ies when
> -   AVL operand is a register operand.
> -
> -   Before the VSETVL PASS:
> -     li a5,32
> -     ...
> -     vadd.vv (..., a5)
> -   After the VSETVL PASS:
> -     li a5,32
> -     vsetvli zero, a5, ...
> -     ...
> -     vadd.vv (..., const_int 0).  */
> +/* Get the first vsetvl instructions of the block.  */
> +static rtx_insn *
> +get_first_vsetvl (basic_block cfg_bb)
> +{
> +  rtx_insn *rinsn;
> +  FOR_BB_INSNS (cfg_bb, rinsn)
> +    {
> +      if (!NONDEBUG_INSN_P (rinsn))
> +       continue;
> +      /* If we don't find any inserted vsetvli before user RVV instructi=
ons,
> +        we don't need to optimize the vsetvls in this block.  */
> +      if (has_vtype_op (rinsn) || vsetvl_insn_p (rinsn))
> +       return nullptr;
> +
> +      if (vsetvl_discard_result_insn_p (rinsn))
> +       return rinsn;
> +    }
> +  return nullptr;
> +}
> +
> +/* Global user vsetvl optimizaiton:
> +
> +     Case 1:
> +     bb 1:
> +       vsetvl a5,a4,e8,mf8
> +       ...
> +     bb 2:
> +       ...
> +       vsetvl zero,a5,e8,mf8 --> Eliminate directly.
> +
> +     Case 2:
> +      bb 1:
> +       vsetvl a5,a4,e8,mf8    --> vsetvl a5,a4,e32,mf2
> +       ...
> +      bb 2:
> +       ...
> +       vsetvl zero,a5,e32,mf2 --> Eliminate directly.
> +
> +     Case 3:
> +      bb 1:
> +       vsetvl a5,a4,e8,mf8    --> vsetvl a5,a4,e32,mf2
> +       ...
> +      bb 2:
> +       ...
> +       vsetvl a5,a4,e8,mf8    --> vsetvl a5,a4,e32,mf2
> +       goto bb 3
> +      bb 3:
> +       ...
> +       vsetvl zero,a5,e32,mf2 --> Eliminate directly.
> +*/
> +bool
> +pass_vsetvl::global_eliminate_vsetvl_insn (const bb_info *bb) const
> +{
> +  rtx_insn *vsetvl_rinsn;
> +  vector_insn_info dem =3D vector_insn_info ();
> +  const auto &block_info =3D get_block_info (bb);
> +  basic_block cfg_bb =3D bb->cfg_bb ();
> +
> +  if (block_info.local_dem.valid_or_dirty_p ())
> +    {
> +      /* Optimize the local vsetvl.  */
> +      dem =3D block_info.local_dem;
> +      vsetvl_rinsn =3D get_first_vsetvl (cfg_bb);
> +    }
> +  if (!vsetvl_rinsn)
> +    /* Optimize the global vsetvl inserted by LCM.  */
> +    vsetvl_rinsn =3D get_vsetvl_at_end (bb, &dem);
> +
> +  /* No need to optimize if block doesn't have vsetvl instructions.  */
> +  if (!dem.valid_or_dirty_p () || !vsetvl_rinsn || !dem.get_avl_source (=
)
> +      || !dem.has_avl_reg ())
> +    return false;
> +
> +  /* If all preds has VL/VTYPE status setted by user vsetvls, and these
> +     user vsetvls are all skip_avl_compatible_p with the vsetvl in this
> +     block, we can eliminate this vsetvl instruction.  */
> +  sbitmap avin =3D m_vector_manager->vector_avin[cfg_bb->index];
> +
> +  unsigned int bb_index;
> +  sbitmap_iterator sbi;
> +  rtx avl =3D get_avl (dem.get_insn ()->rtl ());
> +  hash_set<set_info *> sets
> +    =3D get_all_sets (dem.get_avl_source (), true, false, false);
> +  /* Condition 1: All VL/VTYPE available in are all compatible.  */
> +  EXECUTE_IF_SET_IN_BITMAP (avin, 0, bb_index, sbi)
> +    {
> +      const auto &expr =3D m_vector_manager->vector_exprs[bb_index];
> +      const auto &insn =3D expr->get_insn ();
> +      def_info *def =3D find_access (insn->defs (), REGNO (avl));
> +      set_info *set =3D safe_dyn_cast<set_info *> (def);
> +      if (!vsetvl_insn_p (insn->rtl ()) || insn->bb () =3D=3D bb
> +         || !sets.contains (set))
> +       return false;
> +    }
> +
> +  /* Condition 2: Check it has preds.  */
> +  if (EDGE_COUNT (cfg_bb->preds) =3D=3D 0)
> +    return false;
> +
> +  /* Condition 3: We don't do the global optimization for the block
> +     has a pred is entry block or exit block.  */
> +  /* Condition 4: All preds have available VL/VTYPE out.  */
> +  edge e;
> +  edge_iterator ei;
> +  FOR_EACH_EDGE (e, ei, cfg_bb->preds)
> +    {
> +      sbitmap avout =3D m_vector_manager->vector_avout[e->src->index];
> +      if (e->src =3D=3D ENTRY_BLOCK_PTR_FOR_FN (cfun)
> +         || e->src =3D=3D EXIT_BLOCK_PTR_FOR_FN (cfun) || bitmap_empty_p=
 (avout))
> +       return false;
> +
> +      EXECUTE_IF_SET_IN_BITMAP (avout, 0, bb_index, sbi)
> +       {
> +         const auto &expr =3D m_vector_manager->vector_exprs[bb_index];
> +         const auto &insn =3D expr->get_insn ();
> +         def_info *def =3D find_access (insn->defs (), REGNO (avl));
> +         set_info *set =3D safe_dyn_cast<set_info *> (def);
> +         if (!vsetvl_insn_p (insn->rtl ()) || insn->bb () =3D=3D bb
> +             || !sets.contains (set) || !expr->skip_avl_compatible_p (de=
m))
> +           return false;
> +       }
> +    }
> +
> +  /* Step1: Reshape the VL/VTYPE status to make sure everything compatib=
le.  */
> +  hash_set<basic_block> pred_cfg_bbs =3D get_all_predecessors (cfg_bb);
> +  FOR_EACH_EDGE (e, ei, cfg_bb->preds)
> +    {
> +      sbitmap avout =3D m_vector_manager->vector_avout[e->src->index];
> +      EXECUTE_IF_SET_IN_BITMAP (avout, 0, bb_index, sbi)
> +       {
> +         vector_insn_info prev_dem =3D *m_vector_manager->vector_exprs[b=
b_index];
> +         vector_insn_info curr_dem =3D dem;
> +         insn_info *insn =3D prev_dem.get_insn ();
> +         if (!pred_cfg_bbs.contains (insn->bb ()->cfg_bb ()))
> +           continue;
> +         /* Update avl info since we need to make sure they are fully
> +            compatible before merge.  */
> +         curr_dem.set_avl_info (prev_dem.get_avl_info ());
> +         /* Merge both and update into curr_vsetvl.  */
> +         prev_dem =3D curr_dem.merge (prev_dem, LOCAL_MERGE);
> +         change_vsetvl_insn (insn, prev_dem);
> +       }
> +    }
> +
> +  /* Step2: eliminate the vsetvl instruction.  */
> +  eliminate_insn (vsetvl_rinsn);
> +  return true;
> +}
> +
> +/* This function does the following post optimization base on RTL_SSA:
> +
> +   1. Local user vsetvl optimizations.
> +   2. Global user vsetvl optimizations.
> +   3. AVL dependencies removal:
> +      Before VSETVL PASS, RVV instructions pattern is depending on AVL o=
perand
> +      implicitly. Since we will emit VSETVL instruction and make RVV
> +      instructions depending on VL/VTYPE global status registers, we rem=
ove the
> +      such AVL operand in the RVV instructions pattern here in order to =
remove
> +      AVL dependencies when AVL operand is a register operand.
> +
> +      Before the VSETVL PASS:
> +       li a5,32
> +       ...
> +       vadd.vv (..., a5)
> +      After the VSETVL PASS:
> +       li a5,32
> +       vsetvli zero, a5, ...
> +       ...
> +       vadd.vv (..., const_int 0).  */
>  void
> -pass_vsetvl::cleanup_insns (void) const
> +pass_vsetvl::ssa_post_optimization (void) const
>  {
>    for (const bb_info *bb : crtl->ssa->bbs ())
>      {
>        local_eliminate_vsetvl_insn (bb);
> +      bool changed_p =3D true;
> +      while (changed_p)
> +       {
> +         changed_p =3D false;
> +         changed_p |=3D global_eliminate_vsetvl_insn (bb);
> +       }
>        for (insn_info *insn : bb->real_nondebug_insns ())
>         {
>           rtx_insn *rinsn =3D insn->rtl ();
> @@ -4342,135 +4510,81 @@ pass_vsetvl::cleanup_insns (void) const
>      }
>  }
>
> +/* Return true if the SET result is not used by any instructions.  */
> +static bool
> +has_no_uses (basic_block cfg_bb, rtx_insn *rinsn, int regno)
> +{
> +  /* Handle the following case that can not be detected in RTL_SSA.  */
> +  /* E.g.
> +         li a5, 100
> +         vsetvli a6, a5...
> +         ...
> +         vadd (use a6)
> +
> +       The use of "a6" is removed from "vadd" but the information is
> +       not updated in RTL_SSA framework. We don't want to re-new
> +       a new RTL_SSA which is expensive, instead, we use data-flow
> +       analysis to check whether "a6" has no uses.  */
> +  if (bitmap_bit_p (df_get_live_out (cfg_bb), regno))
> +    return false;
> +
> +  rtx_insn *iter;
> +  for (iter =3D NEXT_INSN (rinsn); iter && iter !=3D NEXT_INSN (BB_END (=
cfg_bb));
> +       iter =3D NEXT_INSN (iter))
> +    if (df_find_use (iter, regno_reg_rtx[regno]))
> +      return false;
> +
> +  return true;
> +}
> +
> +/* This function does the following post optimization base on dataflow
> +   analysis:
> +
> +   1. Change vsetvl rd, rs1 --> vsevl zero, rs1, if rd is not used by an=
y
> +   nondebug instructions. Even though this PASS runs after RA and it doe=
sn't
> +   help for reduce register pressure, it can help instructions schedulin=
g since
> +   we remove the dependencies.
> +
> +   2. Remove redundant user vsetvls base on outcome of Phase 4 (LCM) && =
Phase 5
> +   (AVL dependencies removal).  */
>  void
> -pass_vsetvl::propagate_avl (void) const
> -{
> -  /* Rebuild the RTL_SSA according to the new CFG generated by LCM.  */
> -  /* Finalization of RTL_SSA.  */
> -  free_dominance_info (CDI_DOMINATORS);
> -  if (crtl->ssa->perform_pending_updates ())
> -    cleanup_cfg (0);
> -  delete crtl->ssa;
> -  crtl->ssa =3D nullptr;
> -  /* Initialization of RTL_SSA.  */
> -  calculate_dominance_info (CDI_DOMINATORS);
> +pass_vsetvl::df_post_optimization (void) const
> +{
>    df_analyze ();
> -  crtl->ssa =3D new function_info (cfun);
> -
>    hash_set<rtx_insn *> to_delete;
> -  for (const bb_info *bb : crtl->ssa->bbs ())
> +  basic_block cfg_bb;
> +  rtx_insn *rinsn;
> +  FOR_ALL_BB_FN (cfg_bb, cfun)
>      {
> -      for (insn_info *insn : bb->real_nondebug_insns ())
> +      FOR_BB_INSNS (cfg_bb, rinsn)
>         {
> -         if (vsetvl_discard_result_insn_p (insn->rtl ()))
> +         if (NONDEBUG_INSN_P (rinsn) && vsetvl_insn_p (rinsn))
>             {
> -             rtx avl =3D get_avl (insn->rtl ());
> -             if (!REG_P (avl))
> -               continue;
> -
> -             set_info *set =3D find_access (insn->uses (), REGNO (avl))-=
>def ();
> -             insn_info *def_insn =3D extract_single_source (set);
> -             if (!def_insn)
> -               continue;
> -
> -             /* Handle this case:
> -                vsetvli        a6,zero,e32,m1,ta,mu
> -                li     a5,4096
> -                add    a7,a0,a5
> -                addi   a7,a7,-96
> -                vsetvli        t1,zero,e8,mf8,ta,ma
> -                vle8.v v24,0(a7)
> -                add    a5,a3,a5
> -                addi   a5,a5,-96
> -                vse8.v v24,0(a5)
> -                vsetvli        zero,a6,e32,m1,tu,ma
> -             */
> -             if (vsetvl_insn_p (def_insn->rtl ()))
> -               {
> -                 vl_vtype_info def_info =3D get_vl_vtype_info (def_insn)=
;
> -                 vl_vtype_info info =3D get_vl_vtype_info (insn);
> -                 rtx avl =3D get_avl (def_insn->rtl ());
> -                 rtx vl =3D get_vl (def_insn->rtl ());
> -                 if (def_info.get_ratio () =3D=3D info.get_ratio ())
> -                   {
> -                     if (vlmax_avl_p (def_info.get_avl ()))
> -                       {
> -                         info.set_avl_info (
> -                           avl_info (def_info.get_avl (), nullptr));
> -                         rtx new_pat
> -                           =3D gen_vsetvl_pat (VSETVL_NORMAL, info, vl);
> -                         validate_change (insn->rtl (),
> -                                          &PATTERN (insn->rtl ()), new_p=
at,
> -                                          false);
> -                         continue;
> -                       }
> -                     if (def_info.has_avl_imm () || rtx_equal_p (avl, vl=
))
> -                       {
> -                         info.set_avl_info (avl_info (avl, nullptr));
> -                         emit_vsetvl_insn (VSETVL_DISCARD_RESULT, EMIT_A=
FTER,
> -                                           info, NULL_RTX, insn->rtl ())=
;
> -                         if (set->single_nondebug_insn_use ())
> -                           {
> -                             to_delete.add (insn->rtl ());
> -                             to_delete.add (def_insn->rtl ());
> -                           }
> -                         continue;
> -                       }
> -                   }
> -               }
> -           }
> -
> -         /* Change vsetvl rd, rs1 --> vsevl zero, rs1,
> -            if rd is not used by any nondebug instructions.
> -            Even though this PASS runs after RA and it doesn't help for
> -            reduce register pressure, it can help instructions schedulin=
g
> -            since we remove the dependencies.  */
> -         if (vsetvl_insn_p (insn->rtl ()))
> -           {
> -             rtx vl =3D get_vl (insn->rtl ());
> -             rtx avl =3D get_avl (insn->rtl ());
> -             def_info *def =3D find_access (insn->defs (), REGNO (vl));
> -             set_info *set =3D safe_dyn_cast<set_info *> (def);
> +             rtx vl =3D get_vl (rinsn);
>               vector_insn_info info;
> -             info.parse_insn (insn);
> -             gcc_assert (set);
> -             if (m_vector_manager->to_delete_vsetvls.contains (insn->rtl=
 ()))
> -               {
> -                 m_vector_manager->to_delete_vsetvls.remove (insn->rtl (=
));
> -                 if (m_vector_manager->to_refine_vsetvls.contains (
> -                       insn->rtl ()))
> -                   m_vector_manager->to_refine_vsetvls.remove (insn->rtl=
 ());
> -                 if (!set->has_nondebug_insn_uses ())
> -                   {
> -                     to_delete.add (insn->rtl ());
> -                     continue;
> -                   }
> -               }
> -             if (m_vector_manager->to_refine_vsetvls.contains (insn->rtl=
 ()))
> +             info.parse_insn (rinsn);
> +             bool to_delete_p =3D m_vector_manager->to_delete_p (rinsn);
> +             bool to_refine_p =3D m_vector_manager->to_refine_p (rinsn);
> +             if (has_no_uses (cfg_bb, rinsn, REGNO (vl)))
>                 {
> -                 m_vector_manager->to_refine_vsetvls.remove (insn->rtl (=
));
> -                 if (!set->has_nondebug_insn_uses ())
> +                 if (to_delete_p)
> +                   to_delete.add (rinsn);
> +                 else if (to_refine_p)
>                     {
>                       rtx new_pat =3D gen_vsetvl_pat (VSETVL_VTYPE_CHANGE=
_ONLY,
>                                                     info, NULL_RTX);
> -                     change_insn (insn->rtl (), new_pat);
> -                     continue;
> +                     validate_change (rinsn, &PATTERN (rinsn), new_pat, =
false);
> +                   }
> +                 else if (!vlmax_avl_p (info.get_avl ()))
> +                   {
> +                     rtx new_pat =3D gen_vsetvl_pat (VSETVL_DISCARD_RESU=
LT, info,
> +                                                   NULL_RTX);
> +                     validate_change (rinsn, &PATTERN (rinsn), new_pat, =
false);
>                     }
> -               }
> -             if (vlmax_avl_p (avl))
> -               continue;
> -             rtx new_pat
> -               =3D gen_vsetvl_pat (VSETVL_DISCARD_RESULT, info, NULL_RTX=
);
> -             if (!set->has_nondebug_insn_uses ())
> -               {
> -                 validate_change (insn->rtl (), &PATTERN (insn->rtl ()),
> -                                  new_pat, false);
> -                 continue;
>                 }
>             }
>         }
>      }
> -
>    for (rtx_insn *rinsn : to_delete)
>      eliminate_insn (rinsn);
>  }
> @@ -4593,16 +4707,16 @@ pass_vsetvl::lazy_vsetvl (void)
>      fprintf (dump_file, "\nPhase 4: PRE vsetvl by Lazy code motion (LCM)=
\n");
>    pre_vsetvl ();
>
> -  /* Phase 5 - Cleanup AVL && VL operand of RVV instruction.  */
> +  /* Phase 5 - Post optimization base on RTL_SSA.  */
>    if (dump_file)
> -    fprintf (dump_file, "\nPhase 5: Cleanup AVL and VL operands\n");
> -  cleanup_insns ();
> +    fprintf (dump_file, "\nPhase 5: Post optimization base on RTL_SSA\n"=
);
> +  ssa_post_optimization ();
>
> -  /* Phase 6 - Rebuild RTL_SSA to propagate AVL between vsetvls.  */
> +  /* Phase 6 - Post optimization base on data-flow analysis.  */
>    if (dump_file)
>      fprintf (dump_file,
> -            "\nPhase 6: Rebuild RTL_SSA to propagate AVL between vsetvls=
\n");
> -  propagate_avl ();
> +            "\nPhase 6: Post optimization base on data-flow analysis\n")=
;
> +  df_post_optimization ();
>  }
>
>  /* Main entry point for this pass.  */
> diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vse=
tvl.h
> index d7a6c14e931..4257451bb74 100644
> --- a/gcc/config/riscv/riscv-vsetvl.h
> +++ b/gcc/config/riscv/riscv-vsetvl.h
> @@ -290,13 +290,6 @@ private:
>       definition of AVL.  */
>    rtl_ssa::insn_info *m_insn;
>
> -  /* Parse the instruction to get VL/VTYPE information and demanding
> -   * information.  */
> -  /* This is only called by simple_vsetvl subroutine when optimize =3D=
=3D 0.
> -     Since RTL_SSA can not be enabled when optimize =3D=3D 0, we don't i=
nitialize
> -     the m_insn.  */
> -  void parse_insn (rtx_insn *);
> -
>    friend class vector_infos_manager;
>
>  public:
> @@ -305,6 +298,12 @@ public:
>        m_insn (nullptr)
>    {}
>
> +  /* Parse the instruction to get VL/VTYPE information and demanding
> +   * information.  */
> +  /* This is only called by simple_vsetvl subroutine when optimize =3D=
=3D 0.
> +     Since RTL_SSA can not be enabled when optimize =3D=3D 0, we don't i=
nitialize
> +     the m_insn.  */
> +  void parse_insn (rtx_insn *);
>    /* This is only called by lazy_vsetvl subroutine when optimize > 0.
>       We use RTL_SSA framework to initialize the insn_info.  */
>    void parse_insn (rtl_ssa::insn_info *);
> @@ -454,6 +453,27 @@ public:
>    bool all_empty_predecessor_p (const basic_block) const;
>    bool all_avail_in_compatible_p (const basic_block) const;
>
> +  bool to_delete_p (rtx_insn *rinsn)
> +  {
> +    if (to_delete_vsetvls.contains (rinsn))
> +      {
> +       to_delete_vsetvls.remove (rinsn);
> +       if (to_refine_vsetvls.contains (rinsn))
> +         to_refine_vsetvls.remove (rinsn);
> +       return true;
> +      }
> +    return false;
> +  }
> +  bool to_refine_p (rtx_insn *rinsn)
> +  {
> +    if (to_refine_vsetvls.contains (rinsn))
> +      {
> +       to_refine_vsetvls.remove (rinsn);
> +       return true;
> +      }
> +    return false;
> +  }
> +
>    void release (void);
>    void create_bitmap_vectors (void);
>    void free_bitmap_vectors (void);
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-16.c b/gcc/=
testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-16.c
> index e0c6588b1db..29e05c4982b 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-16.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-16.c
> @@ -16,5 +16,5 @@ void f(int8_t *base, int8_t *out, size_t vl, size_t m) =
{
>    }
>  }
>
> -/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0=
" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } }=
 */
> +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0=
" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } }=
 */
>  /* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*1=
0} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } }=
 */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-2.c b/gcc/t=
estsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-2.c
> index 0c5da5e640c..ff0171b3ff6 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-2.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-2.c
> @@ -17,4 +17,4 @@ void f(int8_t *base, int8_t *out, size_t vl, size_t m) =
{
>  }
>
>  /* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*1=
0} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } }=
 */
> -/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0=
" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } }=
 */
> +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0=
" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } }=
 */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-21.c b/gcc/=
testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-21.c
> new file mode 100644
> index 00000000000..551920c6a72
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-21.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=3Drv32gcv -mabi=3Dilp32 -fno-schedule-insns -fno=
-schedule-insns2" } */
> +
> +#include "riscv_vector.h"
> +
> +void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) {
> +  size_t avl;
> +  if (m > 100)
> +    avl =3D __riscv_vsetvl_e16mf4(vl << 4);
> +  else{
> +    if (k)
> +      avl =3D __riscv_vsetvl_e8mf8(vl);
> +  }
> +  for (size_t i =3D 0; i < m; i++) {
> +    vint8mf8_t v0 =3D __riscv_vle8_v_i8mf8(base + i, avl);
> +    __riscv_vse8_v_i8mf8(out + i, v0, avl);
> +  }
> +}
> +
> +/* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*4=
} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } =
*/
> +/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0=
" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } }=
 */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-22.c b/gcc/=
testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-22.c
> new file mode 100644
> index 00000000000..103f4238c76
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-22.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=3Drv32gcv -mabi=3Dilp32 -fno-schedule-insns -fno=
-schedule-insns2" } */
> +
> +#include "riscv_vector.h"
> +
> +void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) {
> +  size_t avl;
> +  if (m > 100)
> +    avl =3D __riscv_vsetvl_e16mf4(vl << 4);
> +  else
> +    avl =3D __riscv_vsetvl_e32mf2(vl >> 8);
> +  for (size_t i =3D 0; i < m; i++) {
> +    vint8mf8_t v0 =3D __riscv_vle8_v_i8mf8(base + i, avl);
> +    v0 =3D __riscv_vadd_vv_i8mf8 (v0, v0, avl);
> +    __riscv_vse8_v_i8mf8(out + i, v0, avl);
> +  }
> +}
> +
> +/* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*4=
} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } =
*/
> +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0=
" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } }=
 */
> +/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,=
\s*mf8,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-g" no-opts "=
-funroll-loops" } } } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c b/gcc/=
testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c
> new file mode 100644
> index 00000000000..66c90ac10e7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=3Drv32gcv -mabi=3Dilp32 -fno-schedule-insns -fno=
-schedule-insns2" } */
> +
> +#include "riscv_vector.h"
> +
> +void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) {
> +  size_t avl;
> +  switch (m)
> +  {
> +  case 50:
> +    avl =3D __riscv_vsetvl_e16mf4(vl << 4);
> +    break;
> +  case 1:
> +    avl =3D __riscv_vsetvl_e32mf2(k);
> +    break;
> +  case 2:
> +    avl =3D __riscv_vsetvl_e64m1(vl);
> +    break;
> +  case 3:
> +    avl =3D __riscv_vsetvl_e32mf2(k >> 8);
> +    break;
> +  default:
> +    avl =3D __riscv_vsetvl_e32mf2(k + vl);
> +    break;
> +  }
> +  for (size_t i =3D 0; i < m; i++) {
> +    vint8mf8_t v0 =3D __riscv_vle8_v_i8mf8(base + i, avl);
> +    v0 =3D __riscv_vadd_vv_i8mf8 (v0, v0, avl);
> +    v0 =3D __riscv_vadd_vv_i8mf8_tu (v0, v0, v0, avl);
> +    __riscv_vse8_v_i8mf8(out + i, v0, avl);
> +  }
> +}
> +
> +/* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*4=
} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } =
*/
> +/* { dg-final { scan-assembler-times {srli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*8=
} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } =
*/
> +/* { dg-final { scan-assembler-times {vsetvli} 5 { target { no-opts "-O0=
" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } }=
 */
> +/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,=
\s*mf8,\s*tu,\s*m[au]} 5 { target { no-opts "-O0" no-opts "-Os" no-opts "-g=
" no-opts "-funroll-loops" } } } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-3.c b/gcc/t=
estsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-3.c
> index f995e04aacc..13d09fc3fd1 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-3.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-3.c
> @@ -18,4 +18,4 @@ void f(int8_t *base, int8_t *out, size_t vl, size_t m) =
{
>  }
>
>  /* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*1=
0} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } }=
 */
> -/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0=
" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } }=
 */
> +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0=
" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } }=
 */
> --
> 2.36.1
>