[PATCH V3] RISC-V: Support Dynamic LMUL Cost model

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH V3] RISC-V: Support Dynamic LMUL Cost model
@ 2023-09-11  9:41 Juzhe-Zhong
  2023-09-11 20:31 ` Robin Dapp
  0 siblings, 1 reply; 4+ messages in thread
From: Juzhe-Zhong @ 2023-09-11  9:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: kito.cheng, kito.cheng, jeffreyalaw, rdapp.gcc, Juzhe-Zhong

This patch support dynamic LMUL cost modeling with --param=riscv-autovec-lmul=dynamic.

Consider this following case:
void
foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
      int32_t *__restrict d,
      int32_t *__restrict d2,
      int32_t *__restrict d3,
      int32_t *__restrict d4,
      int32_t *__restrict d5,
      int n)
{
  for (int i = 0; i < n; i++)
    {
      a[i] = b[i] + c[i];
      b5[i] = b[i] + c[i];
      a2[i] = b2[i] + c2[i];
      a3[i] = b3[i] + c3[i];
      a4[i] = b4[i] + c4[i];
      a5[i] = a[i] + a4[i];
      d2[i] = a2[i] + c2[i];
      d3[i] = a3[i] + c3[i];
      d4[i] = a4[i] + c4[i];
      d5[i] = a[i] + a4[i];
      a[i] = a5[i] + b5[i] + a[i];

      c2[i] = a[i] + c[i];
      c3[i] = b5[i] * a5[i];
      c4[i] = a2[i] * a3[i];
      c5[i] = b5[i] * a2[i];
      c[i] = a[i] + c3[i];
      c2[i] = a[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i]
      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
      * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
    }
}

Demo: https://godbolt.org/z/x1acoMxGT

You can see it will produce register spilling if you specify LMUL >= 4

Now, with --param=riscv-autovec-lmul=dynamic.

GCC is able to pick LMUL = 2 to optimized this case.

This feature is supported by linear scan based local live ranges analysis and
compute maximum live V_REGS in specific program point of the function to determine the VF/LMUL.

Note that this patch can well handle both SLP and non-SLP loop.

Currenty approach didn't consider the later instruction scheduler which may improve the register pressure.
In this case, we are conservatively applying smaller VF/LMUL. (Not sure whether we should support live range shrink for such corner case since we don't known whether it can improve performance a lot.)

gcc/ChangeLog:

	* config/riscv/riscv-vector-costs.cc (get_last_live_range): New function.
	(compute_nregs_for_mode): Ditto.
	(live_range_conflict_p): Ditto.
	(max_number_of_live_regs): Ditto.
	(compute_lmul): Ditto.
	(costs::prefer_new_lmul_p): Ditto.
	(costs::better_main_loop_than_p): Ditto.
	* config/riscv/riscv-vector-costs.h (struct stmt_point): New struct.
	(struct var_live_range): Ditto.
	(struct autovec_info): Ditto.
	* config/riscv/t-riscv: Update makefile for COST model.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp: New test.

---
 gcc/config/riscv/riscv-vector-costs.cc        | 514 ++++++++++++++++++
 gcc/config/riscv/riscv-vector-costs.h         |  25 +
 gcc/config/riscv/t-riscv                      |   3 +-
 .../riscv/rvv/dynamic-lmul-mixed-1.c          |  50 ++
 .../costmodel/riscv/rvv/dynamic-lmul1-1.c     |  91 ++++
 .../costmodel/riscv/rvv/dynamic-lmul1-2.c     |  63 +++
 .../costmodel/riscv/rvv/dynamic-lmul1-3.c     |  91 ++++
 .../costmodel/riscv/rvv/dynamic-lmul1-4.c     | 121 +++++
 .../costmodel/riscv/rvv/dynamic-lmul1-5.c     | 149 +++++
 .../costmodel/riscv/rvv/dynamic-lmul1-6.c     | 150 +++++
 .../costmodel/riscv/rvv/dynamic-lmul1-7.c     |  48 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-1.c     |  51 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-2.c     |  51 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-3.c     |  51 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-4.c     |  49 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-5.c     |  52 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-6.c     |  54 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-1.c     |  35 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-2.c     |  35 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-3.c     |  47 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-4.c     |  47 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-5.c     |  47 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-6.c     |  27 +
 .../costmodel/riscv/rvv/dynamic-lmul4-7.c     |  47 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-8.c     |  36 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-1.c     |  18 +
 .../costmodel/riscv/rvv/dynamic-lmul8-10.c    |  22 +
 .../costmodel/riscv/rvv/dynamic-lmul8-2.c     |  18 +
 .../costmodel/riscv/rvv/dynamic-lmul8-3.c     |  18 +
 .../costmodel/riscv/rvv/dynamic-lmul8-4.c     |  19 +
 .../costmodel/riscv/rvv/dynamic-lmul8-5.c     |  25 +
 .../costmodel/riscv/rvv/dynamic-lmul8-6.c     |  23 +
 .../costmodel/riscv/rvv/dynamic-lmul8-7.c     |  23 +
 .../costmodel/riscv/rvv/dynamic-lmul8-8.c     |  19 +
 .../costmodel/riscv/rvv/dynamic-lmul8-9.c     |  19 +
 .../riscv/rvv/rvv-costmodel-vect.exp          |  52 ++
 36 files changed, 2189 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp

diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index 1a5e13d5eb3..fd235e6e26f 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -36,16 +36,530 @@ along with GCC; see the file COPYING3.  If not see
 #include "fold-const.h"
 #include "tm_p.h"
 #include "tree-vectorizer.h"
+#include "gimple-iterator.h"
+#include "bitmap.h"
+#include "ssa.h"
+#include "backend.h"
 
 /* This file should be included last.  */
 #include "riscv-vector-costs.h"
 
 namespace riscv_vector {
 
+/* Dynamic LMUL philosophy - Local linear-scan SSA live range based analysis
+   determine LMUL
+
+     - Collect all vectorize STMTs locally for each loop block.
+     - Build program point based graph, ignore non-vectorize STMTs:
+
+	   vectorize STMT 0 - point 0
+	   scalar STMT 0 - ignore.
+	   vectorize STMT 1 - point 1
+	   ...
+     - Compute the number of live V_REGs live at each program point
+     - Determine LMUL in VECTOR COST model according to the program point
+       which has maximum live V_REGs.
+
+     Note:
+
+     - BIGGEST_MODE is the biggest LMUL auto-vectorization element mode.
+       It's important for mixed size auto-vectorization (Conversions, ... etc).
+       E.g. For a loop that is vectorizing conversion of INT32 -> INT64.
+       The biggest mode is DImode and LMUL = 8, LMUL = 4 for SImode.
+       We compute the number live V_REGs at each program point according to
+       this information.
+     - We only compute program points and live ranges locally (within a block)
+       since we just need to compute the number of live V_REGs at each program
+       point and we are not really allocating the registers for each SSA.
+       We can make the variable has another local live range in another block
+       if it live out/live in to another block.  Such approach doesn't affect
+       out accurate live range analysis.
+     - Current analysis didn't consider any instruction scheduling which
+       may improve the register pressure.  So we are conservatively doing the
+       analysis which may end up with smaller LMUL.
+       TODO: Maybe we could support a reasonable live range shrink algorithm
+       which take advantage of instruction scheduling.
+     - We may have these following possible autovec modes analysis:
+
+	 1. M8 -> M4 -> M2 -> M1 (stop analysis here) -> MF2 -> MF4 -> MF8
+	 2. M8 -> M1(M4) -> MF2(M2) -> MF4(M1) (stop analysis here) -> MF8(MF2)
+	 3. M1(M8) -> MF2(M4) -> MF4(M2) -> MF8(M1)
+*/
+static hash_map<class loop *, autovec_info> loop_autovec_infos;
+
+/* Return the last VAR live range index if we already have collected.
+   Otherwise return -1.  */
+static int
+get_last_live_range (const vec<var_live_range> &live_ranges, tree var)
+{
+  unsigned int ix;
+  var_live_range *live_range;
+  FOR_EACH_VEC_ELT_REVERSE (live_ranges, ix, live_range)
+    if (live_range->var == var)
+      return ix;
+  return -1;
+}
+
+/* Collect all STMTs that are vectorized and compute their program points.
+   Note that we don't care about the STMTs that are not vectorized and
+   we only build the local graph (within a block) of program points.
+
+   Loop:
+     bb 2:
+       STMT 1 (be vectorized)      -- point 0
+       STMT 2 (not be vectorized)  -- ignored
+       STMT 3 (be vectorized)      -- point 1
+       STMT 4 (be vectorized)      -- point 2
+       STMT 5 (be vectorized)      -- point 3
+       ...
+     bb 3:
+       STMT 1 (be vectorized)      -- point 0
+       STMT 2 (be vectorized)      -- point 1
+       STMT 3 (not be vectorized)  -- ignored
+       STMT 4 (not be vectorized)  -- ignored
+       STMT 5 (be vectorized)      -- point 2
+       ...
+*/
+static void
+compute_local_program_points (
+  vec_info *vinfo,
+  hash_map<basic_block, vec<stmt_point>> &program_points_per_bb)
+{
+  if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
+    {
+      class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+      basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+      unsigned int nbbs = loop->num_nodes;
+      gimple_stmt_iterator si;
+      unsigned int i;
+      /* Collect the stmts that is vectorized and mark their program point.  */
+      for (i = 0; i < nbbs; i++)
+	{
+	  int point = 0;
+	  basic_block bb = bbs[i];
+	  vec<stmt_point> program_points = vNULL;
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "Compute local program points for bb %d:\n",
+			     bb->index);
+	  for (si = gsi_start_bb (bbs[i]); !gsi_end_p (si); gsi_next (&si))
+	    {
+	      if (!(is_gimple_assign (gsi_stmt (si))
+		    || is_gimple_call (gsi_stmt (si))))
+		continue;
+	      stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi_stmt (si));
+	      if (STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info))
+		  != undef_vec_info_type)
+		{
+		  stmt_point info = {point, gsi_stmt (si)};
+		  program_points.safe_push (info);
+		  point++;
+		  if (dump_enabled_p ())
+		    dump_printf_loc (MSG_NOTE, vect_location,
+				     "program point %d: %G", info.point,
+				     gsi_stmt (si));
+		}
+	    }
+	  program_points_per_bb.put (bb, program_points);
+	}
+    }
+}
+
+/* Compute local live ranges of each vectorized variable.
+   Note that we only compute local live ranges (within a block) since
+   local live ranges information is accurate enough for us to determine
+   the LMUL/vectorization factor of the loop.
+
+   Loop:
+     bb 2:
+       STMT 1               -- point 0
+       STMT 2 (def SSA 1)   -- point 1
+       STMT 3 (use SSA 1)   -- point 2
+       STMT 4               -- point 3
+     bb 3:
+       STMT 1               -- point 0
+       STMT 2               -- point 1
+       STMT 3               -- point 2
+       STMT 4 (use SSA 2)   -- point 3
+
+   The live range of SSA 1 is [1, 3] in bb 2.
+   The live range of SSA 2 is [0, 4] in bb 3.  */
+static machine_mode
+compute_local_live_ranges (
+  const hash_map<basic_block, vec<stmt_point>> &program_points_per_bb,
+  hash_map<basic_block, vec<var_live_range>> &live_ranges_map)
+{
+  machine_mode biggest_mode = QImode;
+  if (!program_points_per_bb.is_empty ())
+    {
+      auto_vec<tree> visited_vars;
+      unsigned int i;
+      for (hash_map<basic_block, vec<stmt_point>>::iterator iter
+	   = program_points_per_bb.begin ();
+	   iter != program_points_per_bb.end (); ++iter)
+	{
+	  basic_block bb = (*iter).first;
+	  vec<stmt_point> program_points = (*iter).second;
+	  vec<var_live_range> live_ranges = vNULL;
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "Compute local live ranges for bb %d:\n",
+			     bb->index);
+	  for (const auto program_point : program_points)
+	    {
+	      int point = program_point.point;
+	      gimple *stmt = program_point.stmt;
+	      if (!gimple_store_p (stmt))
+		{
+		  tree lhs = gimple_get_lhs (stmt);
+		  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+		  if (GET_MODE_SIZE (mode).to_constant ()
+		      > GET_MODE_SIZE (biggest_mode).to_constant ())
+		    biggest_mode = mode;
+		  var_live_range range = {lhs, point, point};
+		  live_ranges.safe_push (range);
+		}
+	      for (i = 0; i < gimple_num_args (stmt); i++)
+		{
+		  tree var = gimple_arg (stmt, i);
+		  if (is_gimple_reg (var) && !POINTER_TYPE_P (TREE_TYPE (var)))
+		    {
+		      machine_mode mode = TYPE_MODE (TREE_TYPE (var));
+		      if (GET_MODE_SIZE (mode).to_constant ()
+			  > GET_MODE_SIZE (biggest_mode).to_constant ())
+			biggest_mode = mode;
+		      int index = get_last_live_range (live_ranges, var);
+		      if (index == -1)
+			{
+			  /* We assume the variable is live from the start of
+			     this block.  */
+			  var_live_range range = {var, 0, point};
+			  live_ranges.safe_push (range);
+			}
+		      else
+			/* We will grow the live range for each use.  */
+			live_ranges[index].end = point;
+		    }
+		}
+	    }
+	  live_ranges_map.put (bb, live_ranges);
+	  if (dump_enabled_p ())
+	    for (i = 0; i < live_ranges.length (); i++)
+	      dump_printf_loc (MSG_NOTE, vect_location,
+			       "%T: type = %T, start = %d, end = %d\n",
+			       live_ranges[i].var,
+			       TREE_TYPE (live_ranges[i].var),
+			       live_ranges[i].start, live_ranges[i].end);
+	}
+    }
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "Biggest mode = %s\n",
+		     GET_MODE_NAME (biggest_mode));
+  return biggest_mode;
+}
+
+/* Compute the mode for MODE, BIGGEST_MODE and LMUL.
+
+   E.g. If mode = SImode, biggest_mode = DImode, LMUL = M4.
+	Then return RVVM4SImode (LMUL = 4, element mode = SImode).  */
+static unsigned int
+compute_nregs_for_mode (machine_mode mode, machine_mode biggest_mode, int lmul)
+{
+  unsigned int mode_size = GET_MODE_SIZE (mode).to_constant ();
+  unsigned int biggest_size = GET_MODE_SIZE (biggest_mode).to_constant ();
+  gcc_assert (biggest_size >= mode_size);
+  unsigned int ratio = biggest_size / mode_size;
+  return lmul / ratio;
+}
+
+/* This function helps to determine whether current LMUL will cause
+   potential vector register (V_REG) spillings according to live range
+   information.
+
+     - First, compute how many variable are alive of each program point
+       in each bb of the loop.
+     - Second, compute how many V_REGs are alive of each program point
+       in each bb of the loop according the BIGGEST_MODE and the variable
+       mode.
+     - Third, Return the maximum V_REGs are alive of the loop.  */
+static unsigned int
+max_number_of_live_regs (const basic_block bb,
+			 const vec<var_live_range> &live_ranges,
+			 unsigned int max_point, machine_mode biggest_mode,
+			 int lmul)
+{
+  unsigned int max_nregs = 0;
+  unsigned int i, j;
+  unsigned int live_point = 0;
+  auto_vec<unsigned int> live_vars_vec;
+  live_vars_vec.safe_grow (max_point + 1, true);
+  for (i = 0; i < live_vars_vec.length (); ++i)
+    live_vars_vec[i] = 0;
+  for (i = 0; i < live_ranges.length (); i++)
+    {
+      var_live_range live_range = live_ranges[i];
+      for (j = live_range.start; j <= live_range.end; j++)
+	{
+	  machine_mode mode = TYPE_MODE (TREE_TYPE (live_range.var));
+	  unsigned int nregs
+	    = compute_nregs_for_mode (mode, biggest_mode, lmul);
+	  live_vars_vec[j] += nregs;
+	  if (live_vars_vec[j] > max_nregs)
+	    max_nregs = live_vars_vec[j];
+	}
+    }
+
+  /* Collect user explicit RVV type.  */
+  bool post_dom_available_p = dom_info_available_p (CDI_POST_DOMINATORS);
+  if (!post_dom_available_p)
+    calculate_dominance_info (CDI_POST_DOMINATORS);
+  auto_vec<basic_block> all_preds
+    = get_all_dominated_blocks (CDI_POST_DOMINATORS, bb);
+  for (i = 0; i < cfun->gimple_df->ssa_names->length (); i++)
+    {
+      tree t = ssa_name (i);
+      if (!t)
+	continue;
+      machine_mode mode = TYPE_MODE (TREE_TYPE (t));
+      if (!lookup_vector_type_attribute (TREE_TYPE (t))
+	  && !riscv_v_ext_vls_mode_p (mode))
+	continue;
+
+      gimple *def = SSA_NAME_DEF_STMT (t);
+      if (gimple_bb (def) && !all_preds.contains (gimple_bb (def)))
+	continue;
+      const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (t));
+      const ssa_use_operand_t *ptr;
+
+      for (ptr = head->next; ptr != head; ptr = ptr->next)
+	{
+	  if (!USE_STMT (ptr) || is_gimple_debug (USE_STMT (ptr))
+	      || !dominated_by_p (CDI_POST_DOMINATORS, bb,
+				  gimple_bb (USE_STMT (ptr))))
+	    continue;
+
+	  int regno_alignment = riscv_get_v_regno_alignment (mode);
+	  max_nregs += regno_alignment;
+	  if (dump_enabled_p ())
+	    dump_printf_loc (
+	      MSG_NOTE, vect_location,
+	      "Explicit used SSA %T, vectype = %T, mode = %s, cause %d "
+	      "V_REG live in bb %d at program point %d\n",
+	      t, TREE_TYPE (t), GET_MODE_NAME (mode), regno_alignment,
+	      bb->index, live_point);
+	  break;
+	}
+    }
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "Maximum lmul = %d, %d number of live V_REG at program "
+		     "point %d for bb %d\n",
+		     lmul, max_nregs, live_point, bb->index);
+  if (!post_dom_available_p)
+    free_dominance_info (CDI_POST_DOMINATORS);
+  return max_nregs;
+}
+
+/* Return the LMUL of the current analysis.  */
+static int
+get_current_lmul (class loop *loop)
+{
+  return loop_autovec_infos.get (loop)->current_lmul;
+}
+
+/* Update the live ranges according PHI.
+
+   Loop:
+     bb 2:
+       STMT 1               -- point 0
+       STMT 2 (def SSA 1)   -- point 1
+       STMT 3 (use SSA 1)   -- point 2
+       STMT 4               -- point 3
+     bb 3:
+       SSA 2 = PHI<SSA 1>
+       STMT 1               -- point 0
+       STMT 2               -- point 1
+       STMT 3 (use SSA 2)   -- point 2
+       STMT 4               -- point 3
+
+   Before this function, the SSA 1 live range is [2, 3] in bb 2
+   and SSA 2 is [0, 3] in bb 3.
+
+   Then, after this function, we update SSA 1 live range in bb 2
+   into [2, 4] since SSA 1 is live out into bb 3.  */
+static void
+update_local_live_ranges (
+  vec_info *vinfo,
+  hash_map<basic_block, vec<stmt_point>> &program_points_per_bb,
+  hash_map<basic_block, vec<var_live_range>> &live_ranges_map)
+{
+  if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
+    {
+      class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+      basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+      unsigned int nbbs = loop->num_nodes;
+      unsigned int i, j, k;
+      gphi_iterator psi;
+      for (i = 0; i < nbbs; i++)
+	{
+	  basic_block bb = bbs[i];
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "Update local program points for bb %d:\n",
+			     bb->index);
+	  for (psi = gsi_start_phis (bbs[i]); !gsi_end_p (psi); gsi_next (&psi))
+	    {
+	      gphi *phi = psi.phi ();
+	      stmt_vec_info stmt_info = vinfo->lookup_stmt (phi);
+	      if (STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info))
+		  != undef_vec_info_type)
+		{
+		  for (j = 0; j < gimple_phi_num_args (phi); j++)
+		    {
+		      edge e = gimple_phi_arg_edge (phi, j);
+		      tree def = gimple_phi_arg_def (phi, j);
+		      auto *live_ranges = live_ranges_map.get (e->src);
+		      if (!program_points_per_bb.get (e->src))
+			continue;
+		      unsigned int max_point
+			= (*program_points_per_bb.get (e->src)).length () - 1;
+		      for (k = 0; k < (*live_ranges).length (); k++)
+			{
+			  if ((*live_ranges)[i].var == def)
+			    {
+			      int end = (*live_ranges)[i].end;
+			      (*live_ranges)[i].end = max_point;
+			      if (dump_enabled_p ())
+				dump_printf_loc (
+				  MSG_NOTE, vect_location,
+				  "Update %T end point from %d to %d:\n",
+				  (*live_ranges)[i].var, end,
+				  (*live_ranges)[i].end);
+			    }
+			}
+		    }
+		}
+	    }
+	}
+    }
+}
+
 costs::costs (vec_info *vinfo, bool costing_for_scalar)
   : vector_costs (vinfo, costing_for_scalar)
 {}
 
+/* Return true that the LMUL of new COST model is preferred.  */
+bool
+costs::prefer_new_lmul_p (const vector_costs *uncast_other) const
+{
+  auto other = static_cast<const costs *> (uncast_other);
+  auto this_loop_vinfo = as_a<loop_vec_info> (this->m_vinfo);
+  auto other_loop_vinfo = as_a<loop_vec_info> (other->m_vinfo);
+  class loop *loop = LOOP_VINFO_LOOP (this_loop_vinfo);
+
+  if (!LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (this_loop_vinfo)
+      && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (other_loop_vinfo))
+    return false;
+
+  if (loop_autovec_infos.get (loop) && loop_autovec_infos.get (loop)->end_p)
+    return false;
+  else if (loop_autovec_infos.get (loop))
+    loop_autovec_infos.get (loop)->current_lmul
+      = loop_autovec_infos.get (loop)->current_lmul / 2;
+  else
+    {
+      int regno_alignment
+	= riscv_get_v_regno_alignment (other_loop_vinfo->vector_mode);
+      if (known_eq (LOOP_VINFO_SLP_UNROLLING_FACTOR (other_loop_vinfo), 1U))
+	regno_alignment = RVV_M8;
+      loop_autovec_infos.put (loop, {regno_alignment, regno_alignment, false});
+    }
+
+  int lmul = get_current_lmul (loop);
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "Comparing two main loops (%s at VF %d vs %s at VF %d)\n",
+		     GET_MODE_NAME (this_loop_vinfo->vector_mode),
+		     vect_vf_for_cost (this_loop_vinfo),
+		     GET_MODE_NAME (other_loop_vinfo->vector_mode),
+		     vect_vf_for_cost (other_loop_vinfo));
+
+  /* Compute local program points.
+     It's a fast and effective computation.  */
+  hash_map<basic_block, vec<stmt_point>> program_points_per_bb;
+  compute_local_program_points (other->m_vinfo, program_points_per_bb);
+
+  /* Compute local live ranges.  */
+  hash_map<basic_block, vec<var_live_range>> live_ranges_map;
+  machine_mode biggest_mode
+    = compute_local_live_ranges (program_points_per_bb, live_ranges_map);
+
+  /* Update live ranges according to PHI.  */
+  update_local_live_ranges (other->m_vinfo, program_points_per_bb,
+			    live_ranges_map);
+
+  /* TODO: We calculate the maximum live vars base on current STMTS
+     sequence.  We can support live range shrink if it can give us
+     big improvement in the future.  */
+  if (!live_ranges_map.is_empty ())
+    {
+      unsigned int max_nregs = 0;
+      for (hash_map<basic_block, vec<var_live_range>>::iterator iter
+	   = live_ranges_map.begin ();
+	   iter != live_ranges_map.end (); ++iter)
+	{
+	  basic_block bb = (*iter).first;
+	  vec<var_live_range> live_ranges = (*iter).second;
+	  unsigned int max_point
+	    = (*program_points_per_bb.get (bb)).length () - 1;
+	  if (live_ranges.is_empty ())
+	    continue;
+	  /* We prefer larger LMUL unless it causes register spillings.  */
+	  unsigned int nregs
+	    = max_number_of_live_regs (bb, live_ranges, max_point, biggest_mode,
+				       lmul);
+	  if (nregs > max_nregs)
+	    max_nregs = nregs;
+	  live_ranges.release ();
+	}
+      live_ranges_map.empty ();
+      if (loop_autovec_infos.get (loop)->current_lmul == RVV_M1
+	  || max_nregs <= V_REG_NUM)
+	loop_autovec_infos.get (loop)->end_p = true;
+      if (loop_autovec_infos.get (loop)->current_lmul > RVV_M1)
+	return max_nregs > V_REG_NUM;
+      return false;
+    }
+  if (!program_points_per_bb.is_empty ())
+    {
+      for (hash_map<basic_block, vec<stmt_point>>::iterator iter
+	   = program_points_per_bb.begin ();
+	   iter != program_points_per_bb.end (); ++iter)
+	{
+	  vec<stmt_point> program_points = (*iter).second;
+	  if (!program_points.is_empty ())
+	    program_points.release ();
+	}
+      program_points_per_bb.empty ();
+    }
+  return lmul > RVV_M1;
+}
+
+bool
+costs::better_main_loop_than_p (const vector_costs *uncast_other) const
+{
+  auto other = static_cast<const costs *> (uncast_other);
+
+  if (!flag_vect_cost_model)
+    return vector_costs::better_main_loop_than_p (other);
+
+  if (riscv_autovec_lmul == RVV_DYNAMIC)
+    return prefer_new_lmul_p (uncast_other);
+
+  return vector_costs::better_main_loop_than_p (other);
+}
+
 unsigned
 costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
 		      stmt_vec_info stmt_info, slp_tree, tree vectype,
diff --git a/gcc/config/riscv/riscv-vector-costs.h b/gcc/config/riscv/riscv-vector-costs.h
index 57b1be01048..000ffd32df4 100644
--- a/gcc/config/riscv/riscv-vector-costs.h
+++ b/gcc/config/riscv/riscv-vector-costs.h
@@ -23,6 +23,27 @@
 
 namespace riscv_vector {
 
+struct stmt_point
+{
+  /* Program point.  */
+  unsigned int point;
+  gimple *stmt;
+};
+
+struct var_live_range
+{
+  tree var;
+  unsigned int start;
+  unsigned int end;
+};
+
+struct autovec_info
+{
+  unsigned int initial_lmul;
+  unsigned int current_lmul;
+  bool end_p;
+};
+
 /* rvv-specific vector costs.  */
 class costs : public vector_costs
 {
@@ -31,12 +52,16 @@ class costs : public vector_costs
 public:
   costs (vec_info *, bool);
 
+  bool better_main_loop_than_p (const vector_costs *other) const override;
+
 private:
   unsigned int add_stmt_cost (int count, vect_cost_for_stmt kind,
 			      stmt_vec_info stmt_info, slp_tree node,
 			      tree vectype, int misalign,
 			      vect_cost_model_location where) override;
   void finish_cost (const vector_costs *) override;
+
+  bool prefer_new_lmul_p (const vector_costs *) const;
 };
 
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index b1f80d1d87c..ec5d563859e 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -70,7 +70,8 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
 riscv-vector-costs.o: $(srcdir)/config/riscv/riscv-vector-costs.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TARGET_H) $(FUNCTION_H) \
   $(TREE_H) basic-block.h $(RTL_H) gimple.h targhooks.h cfgloop.h \
-  fold-const.h $(TM_P_H) tree-vectorizer.h \
+  fold-const.h $(TM_P_H) tree-vectorizer.h gimple-iterator.h bitmap.h \
+  ssa.h backend.h \
   $(srcdir)/config/riscv/riscv-vector-costs.h
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/riscv/riscv-vector-costs.cc
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c
new file mode 100644
index 00000000000..fd9f38bc766
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+     int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+     int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+     int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+     int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+     int32_t *__restrict d4, int32_t *__restrict d5, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = d5[i] + b[i];
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]
+	     + a[i] * a2[i] * a3[i] * a4[i] * a5[i] * c[i] * c2[i] * c3[i]
+		 * c4[i] * c5[i] * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c
new file mode 100644
index 00000000000..6c414bcd115
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c
@@ -0,0 +1,91 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int32_t *__restrict e,
+      int32_t *__restrict e2,
+      int32_t *__restrict e3,
+      int32_t *__restrict e4,
+      int32_t *__restrict e5,
+      int32_t *__restrict f,
+      int32_t *__restrict f2,
+      int32_t *__restrict f3,
+      int32_t *__restrict f4,
+      int32_t *__restrict f5,
+      int32_t *__restrict g,
+      int32_t *__restrict g2,
+      int32_t *__restrict g3,
+      int32_t *__restrict g4,
+      int32_t *__restrict g5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c
new file mode 100644
index 00000000000..b77f3ff58ed
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int32_t *__restrict e,
+      int32_t *__restrict e2,
+      int32_t *__restrict e3,
+      int32_t *__restrict e4,
+      int32_t *__restrict e5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      e[i] = a2[i] + c2[i];
+      e2[i] = d2[i] + a2[i];
+      e3[i] = d3[i] + a3[i];
+      e4[i] = d4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i];
+    }
+}
+
+/* FIXME: Choosing LMUL = 1 is not the optimal since it can be LMUL = 2 if we apply instruction scheduler.  */
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c
new file mode 100644
index 00000000000..164930c9bba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c
@@ -0,0 +1,91 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b,    int8_t *__restrict c,
+      int8_t *__restrict a2, int8_t *__restrict b2, int8_t *__restrict c2,
+      int8_t *__restrict a3, int8_t *__restrict b3, int8_t *__restrict c3,
+      int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict c4,
+      int8_t *__restrict a5, int8_t *__restrict b5, int8_t *__restrict c5,
+      int8_t *__restrict d,
+      int8_t *__restrict d2,
+      int8_t *__restrict d3,
+      int8_t *__restrict d4,
+      int8_t *__restrict d5,
+      int8_t *__restrict e,
+      int8_t *__restrict e2,
+      int8_t *__restrict e3,
+      int8_t *__restrict e4,
+      int8_t *__restrict e5,
+      int8_t *__restrict f,
+      int8_t *__restrict f2,
+      int8_t *__restrict f3,
+      int8_t *__restrict f4,
+      int8_t *__restrict f5,
+      int8_t *__restrict g,
+      int8_t *__restrict g2,
+      int8_t *__restrict g3,
+      int8_t *__restrict g4,
+      int8_t *__restrict g5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c
new file mode 100644
index 00000000000..8d80fbfe390
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c
@@ -0,0 +1,121 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int32_t *__restrict e,
+      int32_t *__restrict e2,
+      int32_t *__restrict e3,
+      int32_t *__restrict e4,
+      int32_t *__restrict e5,
+      int32_t *__restrict f,
+      int32_t *__restrict f2,
+      int32_t *__restrict f3,
+      int32_t *__restrict f4,
+      int32_t *__restrict f5,
+      int32_t *__restrict g,
+      int32_t *__restrict g2,
+      int32_t *__restrict g3,
+      int32_t *__restrict g4,
+      int32_t *__restrict g5,
+  
+      int32_t *__restrict gg,
+      int32_t *__restrict gg2,
+      int32_t *__restrict gg3,
+      int32_t *__restrict gg4,
+      int32_t *__restrict gg5,
+  
+      int32_t *__restrict ggg,
+      int32_t *__restrict ggg2,
+      int32_t *__restrict ggg3,
+      int32_t *__restrict ggg4,
+      int32_t *__restrict ggg5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+
+      gg[i] = f2[i] + c2[i];
+      gg2[i] = f2[i] + d2[i];
+      gg3[i] = f3[i] + d3[i];
+      gg4[i] = f4[i] + a4[i];
+      gg5[i] = f[i] + a4[i];
+      gg5[i] = f5[i] + a4[i];
+
+
+      ggg[i] = f2[i] + c2[i];
+      ggg2[i] = f2[i] + d2[i];
+      ggg3[i] = f3[i] + d3[i];
+      ggg4[i] = f4[i] + a4[i];
+      ggg5[i] = f[i] + a4[i];
+      ggg5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i]
+      * gg[i] * gg2[i] * gg3[i] * gg4[i] * gg5[i]
+      * ggg[i] * ggg2[i] * ggg3[i] * ggg4[i] * ggg5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c
new file mode 100644
index 00000000000..7b4014ddaf6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c
@@ -0,0 +1,149 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int32_t *__restrict e,
+      int32_t *__restrict e2,
+      int32_t *__restrict e3,
+      int32_t *__restrict e4,
+      int32_t *__restrict e5,
+      int32_t *__restrict f,
+      int32_t *__restrict f2,
+      int32_t *__restrict f3,
+      int32_t *__restrict f4,
+      int32_t *__restrict f5,
+      int32_t *__restrict g,
+      int32_t *__restrict g2,
+      int32_t *__restrict g3,
+      int32_t *__restrict g4,
+      int32_t *__restrict g5,
+  
+      int32_t *__restrict gg,
+      int32_t *__restrict gg2,
+      int32_t *__restrict gg3,
+      int32_t *__restrict gg4,
+      int32_t *__restrict gg5,
+  
+      int32_t *__restrict ggg,
+      int32_t *__restrict ggg2,
+      int32_t *__restrict ggg3,
+      int32_t *__restrict ggg4,
+      int32_t *__restrict ggg5,
+
+      int32_t *__restrict gggg,
+      int32_t *__restrict gggg2,
+      int32_t *__restrict gggg3,
+      int32_t *__restrict gggg4,
+      int32_t *__restrict gggg5,
+
+      int32_t *__restrict ggggg,
+      int32_t *__restrict ggggg2,
+      int32_t *__restrict ggggg3,
+      int32_t *__restrict ggggg4,
+      int32_t *__restrict ggggg5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+
+      gg[i] = f2[i] + c2[i];
+      gg2[i] = f2[i] + d2[i];
+      gg3[i] = f3[i] + d3[i];
+      gg4[i] = f4[i] + a4[i];
+      gg5[i] = f[i] + a4[i];
+      gg5[i] = f5[i] + a4[i];
+
+
+      ggg[i] = f2[i] + c2[i];
+      ggg2[i] = f2[i] + d2[i];
+      ggg3[i] = f3[i] + d3[i];
+      ggg4[i] = f4[i] + a4[i];
+      ggg5[i] = f[i] + a4[i];
+      ggg5[i] = f5[i] + a4[i];
+
+      gggg[i] = f2[i] + c2[i];
+      gggg2[i] = f2[i] + d2[i];
+      gggg3[i] = f3[i] + d3[i];
+      gggg4[i] = f4[i] + a4[i];
+      gggg5[i] = f[i] + a4[i];
+      gggg5[i] = f5[i] + a4[i];
+
+      ggggg[i] = f2[i] + c2[i];
+      ggggg2[i] = f2[i] + d2[i];
+      ggggg3[i] = f3[i] + d3[i];
+      ggggg4[i] = f4[i] + a4[i];
+      ggggg5[i] = f[i] + a4[i];
+      ggggg5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i]
+      * gg[i] * gg2[i] * gg3[i] * gg4[i] * gg5[i]
+      * ggg[i] * ggg2[i] * ggg3[i] * ggg4[i] * ggg5[i]
+      * gggg[i] * gggg2[i] * gggg3[i] * gggg4[i] * gggg5[i]
+      * ggggg[i] * ggggg2[i] * ggggg3[i] * ggggg4[i] * ggggg5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c
new file mode 100644
index 00000000000..51d05f2bec9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c
@@ -0,0 +1,150 @@
+
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b,    int8_t *__restrict c,
+      int8_t *__restrict a2, int8_t *__restrict b2, int8_t *__restrict c2,
+      int8_t *__restrict a3, int8_t *__restrict b3, int8_t *__restrict c3,
+      int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict c4,
+      int8_t *__restrict a5, int8_t *__restrict b5, int8_t *__restrict c5,
+      int8_t *__restrict d,
+      int8_t *__restrict d2,
+      int8_t *__restrict d3,
+      int8_t *__restrict d4,
+      int8_t *__restrict d5,
+      int8_t *__restrict e,
+      int8_t *__restrict e2,
+      int8_t *__restrict e3,
+      int8_t *__restrict e4,
+      int8_t *__restrict e5,
+      int8_t *__restrict f,
+      int8_t *__restrict f2,
+      int8_t *__restrict f3,
+      int8_t *__restrict f4,
+      int8_t *__restrict f5,
+      int8_t *__restrict g,
+      int8_t *__restrict g2,
+      int8_t *__restrict g3,
+      int8_t *__restrict g4,
+      int8_t *__restrict g5,
+  
+      int8_t *__restrict gg,
+      int8_t *__restrict gg2,
+      int8_t *__restrict gg3,
+      int8_t *__restrict gg4,
+      int8_t *__restrict gg5,
+  
+      int8_t *__restrict ggg,
+      int8_t *__restrict ggg2,
+      int8_t *__restrict ggg3,
+      int8_t *__restrict ggg4,
+      int8_t *__restrict ggg5,
+
+      int8_t *__restrict gggg,
+      int8_t *__restrict gggg2,
+      int8_t *__restrict gggg3,
+      int8_t *__restrict gggg4,
+      int8_t *__restrict gggg5,
+
+      int8_t *__restrict ggggg,
+      int8_t *__restrict ggggg2,
+      int8_t *__restrict ggggg3,
+      int8_t *__restrict ggggg4,
+      int8_t *__restrict ggggg5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+
+      gg[i] = f2[i] + c2[i];
+      gg2[i] = f2[i] + d2[i];
+      gg3[i] = f3[i] + d3[i];
+      gg4[i] = f4[i] + a4[i];
+      gg5[i] = f[i] + a4[i];
+      gg5[i] = f5[i] + a4[i];
+
+
+      ggg[i] = f2[i] + c2[i];
+      ggg2[i] = f2[i] + d2[i];
+      ggg3[i] = f3[i] + d3[i];
+      ggg4[i] = f4[i] + a4[i];
+      ggg5[i] = f[i] + a4[i];
+      ggg5[i] = f5[i] + a4[i];
+
+      gggg[i] = f2[i] + c2[i];
+      gggg2[i] = f2[i] + d2[i];
+      gggg3[i] = f3[i] + d3[i];
+      gggg4[i] = f4[i] + a4[i];
+      gggg5[i] = f[i] + a4[i];
+      gggg5[i] = f5[i] + a4[i];
+
+      ggggg[i] = f2[i] + c2[i];
+      ggggg2[i] = f2[i] + d2[i];
+      ggggg3[i] = f3[i] + d3[i];
+      ggggg4[i] = f4[i] + a4[i];
+      ggggg5[i] = f[i] + a4[i];
+      ggggg5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i]
+      * gg[i] * gg2[i] * gg3[i] * gg4[i] * gg5[i]
+      * ggg[i] * ggg2[i] * ggg3[i] * ggg4[i] * ggg5[i]
+      * gggg[i] * gggg2[i] * gggg3[i] * gggg4[i] * gggg5[i]
+      * ggggg[i] * ggggg2[i] * ggggg3[i] * ggggg4[i] * ggggg5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c
new file mode 100644
index 00000000000..dfd71414b62
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -Wno-psabi -fdump-tree-vect-details" } */
+
+#include "riscv_vector.h"
+
+vint32m8_t
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+     int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+     int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+     int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+     int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+     int32_t *__restrict d4, int32_t *__restrict d5, int n, vint32m8_t vector)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]
+	     + a[i] * a2[i] * a3[i] * a4[i] * a5[i] * c[i] * c2[i] * c3[i]
+		 * c4[i] * c5[i] * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+    return vector;
+}
+
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c
new file mode 100644
index 00000000000..ce83bb22324
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c
new file mode 100644
index 00000000000..a80b1b1556a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b,    int8_t *__restrict c,
+      int8_t *__restrict a2, int8_t *__restrict b2, int8_t *__restrict c2,
+      int8_t *__restrict a3, int8_t *__restrict b3, int8_t *__restrict c3,
+      int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict c4,
+      int8_t *__restrict a5, int8_t *__restrict b5, int8_t *__restrict c5,
+      int8_t *__restrict d,
+      int8_t *__restrict d2,
+      int8_t *__restrict d3,
+      int8_t *__restrict d4,
+      int8_t *__restrict d5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m2} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c
new file mode 100644
index 00000000000..ce83bb22324
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c
new file mode 100644
index 00000000000..9964f3fe8ba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include "riscv_vector.h"
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+     int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+     int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+     int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+     int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+     int32_t *__restrict d4, int32_t *__restrict d5, int n)
+{
+  vint32m1_t v = __riscv_vle32_v_i32m1 (a, 32);
+  __riscv_vse32_v_i32m1 (c, v, 32);
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]
+	     + a[i] * a2[i] * a3[i] * a4[i] * a5[i] * c[i] * c2[i] * c3[i]
+		 * c4[i] * c5[i] * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c
new file mode 100644
index 00000000000..ab670bf0c7d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+typedef int8_t v128qi __attribute__ ((vector_size (128)));
+
+v128qi global_v;
+
+v128qi
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+     int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+     int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+     int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+     int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+     int32_t *__restrict d4, int32_t *__restrict d5, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]
+	     + a[i] * a2[i] * a3[i] * a4[i] * a5[i] * c[i] * c2[i] * c3[i]
+		 * c4[i] * c5[i] * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+    return global_v + 3;
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c
new file mode 100644
index 00000000000..a01e32b9f8d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+typedef int8_t v128qi __attribute__ ((vector_size (128)));
+
+v128qi global_v;
+
+v128qi
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+     int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+     int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+     int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+     int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+     int32_t *__restrict d4, int32_t *__restrict d5, int n)
+{
+  for (int i = 0; i < 128; i++)
+    b[i] = global_v[i] + 8;
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]
+	     + a[i] * a2[i] * a3[i] * a4[i] * a5[i] * c[i] * c2[i] * c3[i]
+		 * c4[i] * c5[i] * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+    return global_v + 3;
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c
new file mode 100644
index 00000000000..156ccc7f98e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+      int32_t *__restrict d4, int32_t *__restrict d5, int n, int m)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d[i] = a[i] - a2[i];
+      d2[i] = a2[i] * a[i];
+      d3[i] = a3[i] * a2[i];
+      d4[i] = a2[i] * d2[i];
+      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c
new file mode 100644
index 00000000000..4cacc039dcb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b, int8_t *__restrict c,
+      int8_t *__restrict a2, int8_t *__restrict b2, int8_t *__restrict c2,
+      int8_t *__restrict a3, int8_t *__restrict b3, int8_t *__restrict c3,
+      int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict c4,
+      int8_t *__restrict a5, int8_t *__restrict b5, int8_t *__restrict c5,
+      int8_t *__restrict d, int8_t *__restrict d2, int8_t *__restrict d3,
+      int8_t *__restrict d4, int8_t *__restrict d5, int n, int m)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d[i] = a[i] - a2[i];
+      d2[i] = a2[i] * a[i];
+      d3[i] = a3[i] * a2[i];
+      d4[i] = a2[i] * d2[i];
+      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c
new file mode 100644
index 00000000000..2308109f00c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void foo2 (int64_t *__restrict a,
+          int8_t *__restrict b,
+          int8_t *__restrict c,
+          int8_t *__restrict a2,
+          int8_t *__restrict b2,
+          int8_t *__restrict c2,
+          int8_t *__restrict a3,
+          int8_t *__restrict b3,
+          int8_t *__restrict c3,
+          int8_t *__restrict a4,
+          int8_t *__restrict b4,
+          int8_t *__restrict c4,
+          int64_t *__restrict a5,
+          int8_t *__restrict b5,
+          int8_t *__restrict c5,
+          int n)
+{
+    for (int i = 0; i < n; i++){
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i]+ a[i];
+
+      a[i] = a[i] + c[i];
+      b5[i] = a[i] + c[i];
+      a2[i] = a[i] + c2[i];
+      a3[i] = a[i] + c3[i];
+      a4[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]+ a[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e64,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c
new file mode 100644
index 00000000000..2a1521bffb9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void foo2 (int64_t *__restrict a,
+          int32_t *__restrict b,
+          int32_t *__restrict c,
+          int32_t *__restrict a2,
+          int32_t *__restrict b2,
+          int32_t *__restrict c2,
+          int32_t *__restrict a3,
+          int32_t *__restrict b3,
+          int32_t *__restrict c3,
+          int32_t *__restrict a4,
+          int32_t *__restrict b4,
+          int32_t *__restrict c4,
+          int64_t *__restrict a5,
+          int32_t *__restrict b5,
+          int32_t *__restrict c5,
+          int n)
+{
+    for (int i = 0; i < n; i++){
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i]+ a[i];
+
+      a[i] = a[i] + c[i];
+      b5[i] = a[i] + c[i];
+      a2[i] = a[i] + c2[i];
+      a3[i] = a[i] + c3[i];
+      a4[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]+ a[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e64,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
new file mode 100644
index 00000000000..928a507a363
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void foo2 (int16_t *__restrict a,
+          int32_t *__restrict b,
+          int32_t *__restrict c,
+          int32_t *__restrict a2,
+          int32_t *__restrict b2,
+          int32_t *__restrict c2,
+          int32_t *__restrict a3,
+          int32_t *__restrict b3,
+          int32_t *__restrict c3,
+          int32_t *__restrict a4,
+          int32_t *__restrict b4,
+          int32_t *__restrict c4,
+          int16_t *__restrict a5,
+          int32_t *__restrict b5,
+          int32_t *__restrict c5,
+          int n)
+{
+    for (int i = 0; i < n; i++){
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i]+ a[i];
+
+      a[i] = a[i] + c[i];
+      b5[i] = a[i] + c[i];
+      a2[i] = a[i] + c2[i];
+      a3[i] = a[i] + c3[i];
+      a4[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]+ a[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c
new file mode 100644
index 00000000000..f16cfb9fd08
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fselective-scheduling -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (uint8_t *restrict a, uint8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8] = b[i * 8 + 7] + 1;
+      a[i * 8 + 1] = b[i * 8 + 6] + 2;
+      a[i * 8 + 2] = b[i * 8 + 5] + 3;
+      a[i * 8 + 3] = b[i * 8 + 4] + 4;
+      a[i * 8 + 4] = b[i * 8 + 3] + 5;
+      a[i * 8 + 5] = b[i * 8 + 2] + 6;
+      a[i * 8 + 6] = b[i * 8 + 1] + 7;
+      a[i * 8 + 7] = b[i * 8 + 0] + 8;
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m4} } } */
+/* { dg-final { scan-assembler-times {csrr} 1 } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 8" "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c
new file mode 100644
index 00000000000..e324380e27b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void foo2 (int8_t *__restrict a,
+          int64_t *__restrict b,
+          int64_t *__restrict c,
+          int64_t *__restrict a2,
+          int64_t *__restrict b2,
+          int64_t *__restrict c2,
+          int64_t *__restrict a3,
+          int64_t *__restrict b3,
+          int64_t *__restrict c3,
+          int64_t *__restrict a4,
+          int64_t *__restrict b4,
+          int64_t *__restrict c4,
+          int8_t *__restrict a5,
+          int64_t *__restrict b5,
+          int64_t *__restrict c5,
+          int n)
+{
+    for (int i = 0; i < n; i++){
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i]+ a[i];
+
+      a[i] = a[i] + c[i];
+      b5[i] = a[i] + c[i];
+      a2[i] = a[i] + c2[i];
+      a3[i] = a[i] + c3[i];
+      a4[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]+ a[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e64,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c
new file mode 100644
index 00000000000..553f2aac0d6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fselective-scheduling -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (uint8_t *restrict a, uint8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 16] = b[i * 16 + 15] + 1;
+      a[i * 16 + 1] = b[i * 16 + 14] + 2;
+      a[i * 16 + 2] = b[i * 16 + 13] + 3;
+      a[i * 16 + 3] = b[i * 16 + 12] + 4;
+      a[i * 16 + 4] = b[i * 16 + 11] + 5;
+      a[i * 16 + 5] = b[i * 16 + 10] + 6;
+      a[i * 16 + 6] = b[i * 16 + 9] + 7;
+      a[i * 16 + 7] = b[i * 16 + 8] + 8;
+      
+      a[i * 16 + 8] = b[i * 16 + 7] + 1;
+      a[i * 16 + 9] = b[i * 16 + 6] + 2;
+      a[i * 16 + 10] = b[i * 16 + 5] + 3;
+      a[i * 16 + 11] = b[i * 16 + 4] + 4;
+      a[i * 16 + 12] = b[i * 16 + 3] + 5;
+      a[i * 16 + 13] = b[i * 16 + 2] + 6;
+      a[i * 16 + 14] = b[i * 16 + 1] + 7;
+      a[i * 16 + 15] = b[i * 16 + 0] + 8;
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m4} } } */
+/* { dg-final { scan-assembler-times {csrr} 1 } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 8" "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c
new file mode 100644
index 00000000000..e2483004698
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c
new file mode 100644
index 00000000000..e65abb299a5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+int
+foo (int *x, int n, int res)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      res += x[i * 2];
+      res += x[i * 2 + 1];
+    }
+  return res;
+}
+
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-times {csrr} 1 } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
new file mode 100644
index 00000000000..a50265fc1ec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int16_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c
new file mode 100644
index 00000000000..3e9751a22ed
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c
new file mode 100644
index 00000000000..3b2527aad5d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+#include <stddef.h>
+
+void
+foo (size_t *__restrict a, size_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler {e64,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c
new file mode 100644
index 00000000000..d63926fa56a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++){
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+  }
+}
+
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c
new file mode 100644
index 00000000000..5c816140950
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b, int8_t *__restrict a2,
+     int8_t *__restrict b2, int8_t *__restrict a3, int8_t *__restrict b3,
+     int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict a5,
+     int8_t *__restrict b5, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] * a2[i] * b2[i] * a3[i] * b3[i] * a4[i] * b4[i] * a5[i] * b5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c
new file mode 100644
index 00000000000..596608bb8d3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict a2,
+     int32_t *__restrict b2, int32_t *__restrict a3, int32_t *__restrict b3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict a5,
+     int32_t *__restrict b5, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] * a2[i] * b2[i] * a3[i] * b3[i] * a4[i] * b4[i] * a5[i] * b5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c
new file mode 100644
index 00000000000..a859d976555
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+int8_t
+foo (int8_t *__restrict a, int8_t init, int n)
+{
+  for (int i = 0; i < n; i++)
+    init += a[i];
+  return init;
+}
+
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c
new file mode 100644
index 00000000000..b965fd0373a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+int64_t
+foo (int64_t *__restrict a, int64_t init, int n)
+{
+  for (int i = 0; i < n; i++)
+    init += a[i];
+  return init;
+}
+
+/* { dg-final { scan-assembler {e64,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp
new file mode 100644
index 00000000000..a3e8f50b73f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp
@@ -0,0 +1,52 @@
+# Copyright (C) 2023-2023 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# Exit immediately if this isn't a riscv target.
+if { ![istarget riscv*-*-*] } then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+    set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+set gcc_march "rv64gcv_zvfh"
+set gcc_mabi  "lp64d"
+if [istarget riscv32-*-*] then {
+  set gcc_march "rv32gcv_zvfh"
+  set gcc_mabi  "ilp32d"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+set CFLAGS "$DEFAULT_CFLAGS -march=$gcc_march -mabi=$gcc_mabi -O3"
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/dynamic-lmul*.\[cS\]]] \
+	"-O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic" $CFLAGS
+
+# All done.
+dg-finish
-- 
2.36.3


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH V3] RISC-V: Support Dynamic LMUL Cost model
  2023-09-11  9:41 [PATCH V3] RISC-V: Support Dynamic LMUL Cost model Juzhe-Zhong
@ 2023-09-11 20:31 ` Robin Dapp
  2023-09-12  0:50   ` juzhe.zhong
  2023-09-12  6:52   ` juzhe.zhong
  0 siblings, 2 replies; 4+ messages in thread
From: Robin Dapp @ 2023-09-11 20:31 UTC (permalink / raw)
  To: Juzhe-Zhong, gcc-patches; +Cc: rdapp.gcc, kito.cheng, kito.cheng, jeffreyalaw

Hi Juzhe,

glad that we can use the dominator info directly.  Could we move the
calculation of the info to the beginning (if it's not available)?  That
makes it clearer that it's a prerequisite.  Function comments look
good now.

Some general remarks kind of similar to v1:

 - I would prefer a hash_map or similar to hold the end point for a range
   instead of looking through potentially all ranges in contrived cases.

 - As long as we're just looking for the maximum number of live registers,
   we can use a sliding-window approach:  create a structure with all
   start and end points, sort it, and increase the current pressure
   if we start a new range or decrease.  That's O(n log n).

> +      const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (t));
> +      const ssa_use_operand_t *ptr;
> +
> +      for (ptr = head->next; ptr != head; ptr = ptr->next)
> +	{

Why does FOR_EACH_IMM_USE not work here?

> +		      unsigned int max_point
> +			= (*program_points_per_bb.get (e->src)).length () - 1;
> +		      for (k = 0; k < (*live_ranges).length (); k++)
> +			{
> +			  if ((*live_ranges)[i].var == def)

Would also be nice not having to search through all ranges but just index/hash
it via var (or similar).

What about one test with global live ranges?  Not a necessity IMHO we can still
add it later.

Regards
 Robin


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Re: [PATCH V3] RISC-V: Support Dynamic LMUL Cost model
  2023-09-11 20:31 ` Robin Dapp
@ 2023-09-12  0:50   ` juzhe.zhong
  2023-09-12  6:52   ` juzhe.zhong
  1 sibling, 0 replies; 4+ messages in thread
From: juzhe.zhong @ 2023-09-12  0:50 UTC (permalink / raw)
  To: Robin Dapp, gcc-patches; +Cc: Robin Dapp, kito.cheng, Kito.cheng, jeffreyalaw

[-- Attachment #1: Type: text/plain, Size: 1729 bytes --]

>> What about one test with global live ranges?  Not a necessity IMHO we can still
>> add it later.
We already have.


juzhe.zhong@rivai.ai
 
From: Robin Dapp
Date: 2023-09-12 04:31
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V3] RISC-V: Support Dynamic LMUL Cost model
Hi Juzhe,
 
glad that we can use the dominator info directly.  Could we move the
calculation of the info to the beginning (if it's not available)?  That
makes it clearer that it's a prerequisite.  Function comments look
good now.
 
Some general remarks kind of similar to v1:
 
- I would prefer a hash_map or similar to hold the end point for a range
   instead of looking through potentially all ranges in contrived cases.
 
- As long as we're just looking for the maximum number of live registers,
   we can use a sliding-window approach:  create a structure with all
   start and end points, sort it, and increase the current pressure
   if we start a new range or decrease.  That's O(n log n).
 
> +      const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (t));
> +      const ssa_use_operand_t *ptr;
> +
> +      for (ptr = head->next; ptr != head; ptr = ptr->next)
> + {
 
Why does FOR_EACH_IMM_USE not work here?
 
> +       unsigned int max_point
> + = (*program_points_per_bb.get (e->src)).length () - 1;
> +       for (k = 0; k < (*live_ranges).length (); k++)
> + {
> +   if ((*live_ranges)[i].var == def)
 
Would also be nice not having to search through all ranges but just index/hash
it via var (or similar).
 
What about one test with global live ranges?  Not a necessity IMHO we can still
add it later.
 
Regards
Robin
 
 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Re: [PATCH V3] RISC-V: Support Dynamic LMUL Cost model
  2023-09-11 20:31 ` Robin Dapp
  2023-09-12  0:50   ` juzhe.zhong
@ 2023-09-12  6:52   ` juzhe.zhong
  1 sibling, 0 replies; 4+ messages in thread
From: juzhe.zhong @ 2023-09-12  6:52 UTC (permalink / raw)
  To: Robin Dapp, gcc-patches; +Cc: Robin Dapp, kito.cheng, Kito.cheng, jeffreyalaw

[-- Attachment #1: Type: text/plain, Size: 2683 bytes --]

>> As long as we're just looking for the maximum number of live registers,
>> we can use a sliding-window approach:  create a structure with all
>> start and end points, sort it, and increase the current pressure
>> if we start a new range or decrease.  That's O(n log n).

I failed to see it can help. Current approach is straightforward.
  for (hash_map<tree, pair>::iterator iter = live_ranges.begin ();
       iter != live_ranges.end (); ++iter)
    {
      tree var = (*iter).first;
      pair live_range = (*iter).second;
      for (i = live_range.first; i <= live_range.second; i++)
        {
          machine_mode mode = TYPE_MODE (TREE_TYPE (var));
          unsigned int nregs
            = compute_nregs_for_mode (mode, biggest_mode, lmul);
          live_vars_vec[i] += nregs;
          if (live_vars_vec[i] > max_nregs)
            max_nregs = live_vars_vec[i];
        }
    }

Could you revise this piece of codes ?

Other comments has been addressed in V4:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629959.html 




juzhe.zhong@rivai.ai
 
From: Robin Dapp
Date: 2023-09-12 04:31
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V3] RISC-V: Support Dynamic LMUL Cost model
Hi Juzhe,
 
glad that we can use the dominator info directly.  Could we move the
calculation of the info to the beginning (if it's not available)?  That
makes it clearer that it's a prerequisite.  Function comments look
good now.
 
Some general remarks kind of similar to v1:
 
- I would prefer a hash_map or similar to hold the end point for a range
   instead of looking through potentially all ranges in contrived cases.
 
- As long as we're just looking for the maximum number of live registers,
   we can use a sliding-window approach:  create a structure with all
   start and end points, sort it, and increase the current pressure
   if we start a new range or decrease.  That's O(n log n).
 
> +      const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (t));
> +      const ssa_use_operand_t *ptr;
> +
> +      for (ptr = head->next; ptr != head; ptr = ptr->next)
> + {
 
Why does FOR_EACH_IMM_USE not work here?
 
> +       unsigned int max_point
> + = (*program_points_per_bb.get (e->src)).length () - 1;
> +       for (k = 0; k < (*live_ranges).length (); k++)
> + {
> +   if ((*live_ranges)[i].var == def)
 
Would also be nice not having to search through all ranges but just index/hash
it via var (or similar).
 
What about one test with global live ranges?  Not a necessity IMHO we can still
add it later.
 
Regards
Robin
 
 

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-09-12  6:52 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-11  9:41 [PATCH V3] RISC-V: Support Dynamic LMUL Cost model Juzhe-Zhong
2023-09-11 20:31 ` Robin Dapp
2023-09-12  0:50   ` juzhe.zhong
2023-09-12  6:52   ` juzhe.zhong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).