public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH V2] RISC-V: Support Dynamic LMUL Cost model
@ 2023-09-11  9:38 Juzhe-Zhong
  0 siblings, 0 replies; 5+ messages in thread
From: Juzhe-Zhong @ 2023-09-11  9:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: kito.cheng, kito.cheng, jeffreyalaw, rdapp.gcc, Juzhe-Zhong

This patch support dynamic LMUL cost modeling with --param=riscv-autovec-lmul=dynamic.

Consider this following case:
void
foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
      int32_t *__restrict d,
      int32_t *__restrict d2,
      int32_t *__restrict d3,
      int32_t *__restrict d4,
      int32_t *__restrict d5,
      int n)
{
  for (int i = 0; i < n; i++)
    {
      a[i] = b[i] + c[i];
      b5[i] = b[i] + c[i];
      a2[i] = b2[i] + c2[i];
      a3[i] = b3[i] + c3[i];
      a4[i] = b4[i] + c4[i];
      a5[i] = a[i] + a4[i];
      d2[i] = a2[i] + c2[i];
      d3[i] = a3[i] + c3[i];
      d4[i] = a4[i] + c4[i];
      d5[i] = a[i] + a4[i];
      a[i] = a5[i] + b5[i] + a[i];

      c2[i] = a[i] + c[i];
      c3[i] = b5[i] * a5[i];
      c4[i] = a2[i] * a3[i];
      c5[i] = b5[i] * a2[i];
      c[i] = a[i] + c3[i];
      c2[i] = a[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i]
      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
      * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
    }
}

Demo: https://godbolt.org/z/x1acoMxGT

You can see it will produce register spilling if you specify LMUL >= 4

Now, with --param=riscv-autovec-lmul=dynamic.

GCC is able to pick LMUL = 2 to optimized this case.

This feature is supported by linear scan based local live ranges analysis and
compute maximum live V_REGS in specific program point of the function to determine the VF/LMUL.

Note that this patch can well handle both SLP and non-SLP loop.

Currenty approach didn't consider the later instruction scheduler which may improve the register pressure.
In this case, we are conservatively applying smaller VF/LMUL. (Not sure whether we should support live range shrink for such corner case since we don't known whether it can improve performance a lot.)

gcc/ChangeLog:

	* config/riscv/riscv-vector-costs.cc (get_last_live_range): New function.
	(compute_nregs_for_mode): Ditto.
	(live_range_conflict_p): Ditto.
	(max_number_of_live_regs): Ditto.
	(compute_lmul): Ditto.
	(costs::prefer_new_lmul_p): Ditto.
	(costs::better_main_loop_than_p): Ditto.
	* config/riscv/riscv-vector-costs.h (struct stmt_point): New struct.
	(struct var_live_range): Ditto.
	(struct autovec_info): Ditto.
	* config/riscv/t-riscv: Update makefile for COST model.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp: New test.

---
 gcc/config/riscv/riscv-vector-costs.cc        | 514 ++++++++++++++++++
 gcc/config/riscv/riscv-vector-costs.h         |  25 +
 gcc/config/riscv/t-riscv                      |   3 +-
 .../riscv/rvv/dynamic-lmul-mixed-1.c          |  50 ++
 .../costmodel/riscv/rvv/dynamic-lmul1-1.c     |  91 ++++
 .../costmodel/riscv/rvv/dynamic-lmul1-2.c     |  63 +++
 .../costmodel/riscv/rvv/dynamic-lmul1-3.c     |  91 ++++
 .../costmodel/riscv/rvv/dynamic-lmul1-4.c     | 121 +++++
 .../costmodel/riscv/rvv/dynamic-lmul1-5.c     | 149 +++++
 .../costmodel/riscv/rvv/dynamic-lmul1-6.c     | 150 +++++
 .../costmodel/riscv/rvv/dynamic-lmul1-7.c     |  48 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-1.c     |  51 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-2.c     |  51 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-3.c     |  51 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-4.c     |  49 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-5.c     |  52 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-6.c     |  54 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-1.c     |  35 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-2.c     |  35 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-3.c     |  47 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-4.c     |  47 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-5.c     |  47 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-6.c     |  27 +
 .../costmodel/riscv/rvv/dynamic-lmul4-7.c     |  47 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-8.c     |  36 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-1.c     |  18 +
 .../costmodel/riscv/rvv/dynamic-lmul8-10.c    |  22 +
 .../costmodel/riscv/rvv/dynamic-lmul8-2.c     |  18 +
 .../costmodel/riscv/rvv/dynamic-lmul8-3.c     |  18 +
 .../costmodel/riscv/rvv/dynamic-lmul8-4.c     |  19 +
 .../costmodel/riscv/rvv/dynamic-lmul8-5.c     |  25 +
 .../costmodel/riscv/rvv/dynamic-lmul8-6.c     |  23 +
 .../costmodel/riscv/rvv/dynamic-lmul8-7.c     |  23 +
 .../costmodel/riscv/rvv/dynamic-lmul8-8.c     |  19 +
 .../costmodel/riscv/rvv/dynamic-lmul8-9.c     |  19 +
 .../riscv/rvv/rvv-costmodel-vect.exp          |  52 ++
 36 files changed, 2189 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp

diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index 1a5e13d5eb3..fd235e6e26f 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -36,16 +36,530 @@ along with GCC; see the file COPYING3.  If not see
 #include "fold-const.h"
 #include "tm_p.h"
 #include "tree-vectorizer.h"
+#include "gimple-iterator.h"
+#include "bitmap.h"
+#include "ssa.h"
+#include "backend.h"
 
 /* This file should be included last.  */
 #include "riscv-vector-costs.h"
 
 namespace riscv_vector {
 
+/* Dynamic LMUL philosophy - Local linear-scan SSA live range based analysis
+   determine LMUL
+
+     - Collect all vectorize STMTs locally for each loop block.
+     - Build program point based graph, ignore non-vectorize STMTs:
+
+	   vectorize STMT 0 - point 0
+	   scalar STMT 0 - ignore.
+	   vectorize STMT 1 - point 1
+	   ...
+     - Compute the number of live V_REGs live at each program point
+     - Determine LMUL in VECTOR COST model according to the program point
+       which has maximum live V_REGs.
+
+     Note:
+
+     - BIGGEST_MODE is the biggest LMUL auto-vectorization element mode.
+       It's important for mixed size auto-vectorization (Conversions, ... etc).
+       E.g. For a loop that is vectorizing conversion of INT32 -> INT64.
+       The biggest mode is DImode and LMUL = 8, LMUL = 4 for SImode.
+       We compute the number live V_REGs at each program point according to
+       this information.
+     - We only compute program points and live ranges locally (within a block)
+       since we just need to compute the number of live V_REGs at each program
+       point and we are not really allocating the registers for each SSA.
+       We can make the variable has another local live range in another block
+       if it live out/live in to another block.  Such approach doesn't affect
+       out accurate live range analysis.
+     - Current analysis didn't consider any instruction scheduling which
+       may improve the register pressure.  So we are conservatively doing the
+       analysis which may end up with smaller LMUL.
+       TODO: Maybe we could support a reasonable live range shrink algorithm
+       which take advantage of instruction scheduling.
+     - We may have these following possible autovec modes analysis:
+
+	 1. M8 -> M4 -> M2 -> M1 (stop analysis here) -> MF2 -> MF4 -> MF8
+	 2. M8 -> M1(M4) -> MF2(M2) -> MF4(M1) (stop analysis here) -> MF8(MF2)
+	 3. M1(M8) -> MF2(M4) -> MF4(M2) -> MF8(M1)
+*/
+static hash_map<class loop *, autovec_info> loop_autovec_infos;
+
+/* Return the last VAR live range index if we already have collected.
+   Otherwise return -1.  */
+static int
+get_last_live_range (const vec<var_live_range> &live_ranges, tree var)
+{
+  unsigned int ix;
+  var_live_range *live_range;
+  FOR_EACH_VEC_ELT_REVERSE (live_ranges, ix, live_range)
+    if (live_range->var == var)
+      return ix;
+  return -1;
+}
+
+/* Collect all STMTs that are vectorized and compute their program points.
+   Note that we don't care about the STMTs that are not vectorized and
+   we only build the local graph (within a block) of program points.
+
+   Loop:
+     bb 2:
+       STMT 1 (be vectorized)      -- point 0
+       STMT 2 (not be vectorized)  -- ignored
+       STMT 3 (be vectorized)      -- point 1
+       STMT 4 (be vectorized)      -- point 2
+       STMT 5 (be vectorized)      -- point 3
+       ...
+     bb 3:
+       STMT 1 (be vectorized)      -- point 0
+       STMT 2 (be vectorized)      -- point 1
+       STMT 3 (not be vectorized)  -- ignored
+       STMT 4 (not be vectorized)  -- ignored
+       STMT 5 (be vectorized)      -- point 2
+       ...
+*/
+static void
+compute_local_program_points (
+  vec_info *vinfo,
+  hash_map<basic_block, vec<stmt_point>> &program_points_per_bb)
+{
+  if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
+    {
+      class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+      basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+      unsigned int nbbs = loop->num_nodes;
+      gimple_stmt_iterator si;
+      unsigned int i;
+      /* Collect the stmts that is vectorized and mark their program point.  */
+      for (i = 0; i < nbbs; i++)
+	{
+	  int point = 0;
+	  basic_block bb = bbs[i];
+	  vec<stmt_point> program_points = vNULL;
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "Compute local program points for bb %d:\n",
+			     bb->index);
+	  for (si = gsi_start_bb (bbs[i]); !gsi_end_p (si); gsi_next (&si))
+	    {
+	      if (!(is_gimple_assign (gsi_stmt (si))
+		    || is_gimple_call (gsi_stmt (si))))
+		continue;
+	      stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi_stmt (si));
+	      if (STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info))
+		  != undef_vec_info_type)
+		{
+		  stmt_point info = {point, gsi_stmt (si)};
+		  program_points.safe_push (info);
+		  point++;
+		  if (dump_enabled_p ())
+		    dump_printf_loc (MSG_NOTE, vect_location,
+				     "program point %d: %G", info.point,
+				     gsi_stmt (si));
+		}
+	    }
+	  program_points_per_bb.put (bb, program_points);
+	}
+    }
+}
+
+/* Compute local live ranges of each vectorized variable.
+   Note that we only compute local live ranges (within a block) since
+   local live ranges information is accurate enough for us to determine
+   the LMUL/vectorization factor of the loop.
+
+   Loop:
+     bb 2:
+       STMT 1               -- point 0
+       STMT 2 (def SSA 1)   -- point 1
+       STMT 3 (use SSA 1)   -- point 2
+       STMT 4               -- point 3
+     bb 3:
+       STMT 1               -- point 0
+       STMT 2               -- point 1
+       STMT 3               -- point 2
+       STMT 4 (use SSA 2)   -- point 3
+
+   The live range of SSA 1 is [1, 3] in bb 2.
+   The live range of SSA 2 is [0, 4] in bb 3.  */
+static machine_mode
+compute_local_live_ranges (
+  const hash_map<basic_block, vec<stmt_point>> &program_points_per_bb,
+  hash_map<basic_block, vec<var_live_range>> &live_ranges_map)
+{
+  machine_mode biggest_mode = QImode;
+  if (!program_points_per_bb.is_empty ())
+    {
+      auto_vec<tree> visited_vars;
+      unsigned int i;
+      for (hash_map<basic_block, vec<stmt_point>>::iterator iter
+	   = program_points_per_bb.begin ();
+	   iter != program_points_per_bb.end (); ++iter)
+	{
+	  basic_block bb = (*iter).first;
+	  vec<stmt_point> program_points = (*iter).second;
+	  vec<var_live_range> live_ranges = vNULL;
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "Compute local live ranges for bb %d:\n",
+			     bb->index);
+	  for (const auto program_point : program_points)
+	    {
+	      int point = program_point.point;
+	      gimple *stmt = program_point.stmt;
+	      if (!gimple_store_p (stmt))
+		{
+		  tree lhs = gimple_get_lhs (stmt);
+		  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+		  if (GET_MODE_SIZE (mode).to_constant ()
+		      > GET_MODE_SIZE (biggest_mode).to_constant ())
+		    biggest_mode = mode;
+		  var_live_range range = {lhs, point, point};
+		  live_ranges.safe_push (range);
+		}
+	      for (i = 0; i < gimple_num_args (stmt); i++)
+		{
+		  tree var = gimple_arg (stmt, i);
+		  if (is_gimple_reg (var) && !POINTER_TYPE_P (TREE_TYPE (var)))
+		    {
+		      machine_mode mode = TYPE_MODE (TREE_TYPE (var));
+		      if (GET_MODE_SIZE (mode).to_constant ()
+			  > GET_MODE_SIZE (biggest_mode).to_constant ())
+			biggest_mode = mode;
+		      int index = get_last_live_range (live_ranges, var);
+		      if (index == -1)
+			{
+			  /* We assume the variable is live from the start of
+			     this block.  */
+			  var_live_range range = {var, 0, point};
+			  live_ranges.safe_push (range);
+			}
+		      else
+			/* We will grow the live range for each use.  */
+			live_ranges[index].end = point;
+		    }
+		}
+	    }
+	  live_ranges_map.put (bb, live_ranges);
+	  if (dump_enabled_p ())
+	    for (i = 0; i < live_ranges.length (); i++)
+	      dump_printf_loc (MSG_NOTE, vect_location,
+			       "%T: type = %T, start = %d, end = %d\n",
+			       live_ranges[i].var,
+			       TREE_TYPE (live_ranges[i].var),
+			       live_ranges[i].start, live_ranges[i].end);
+	}
+    }
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "Biggest mode = %s\n",
+		     GET_MODE_NAME (biggest_mode));
+  return biggest_mode;
+}
+
+/* Compute the mode for MODE, BIGGEST_MODE and LMUL.
+
+   E.g. If mode = SImode, biggest_mode = DImode, LMUL = M4.
+	Then return RVVM4SImode (LMUL = 4, element mode = SImode).  */
+static unsigned int
+compute_nregs_for_mode (machine_mode mode, machine_mode biggest_mode, int lmul)
+{
+  unsigned int mode_size = GET_MODE_SIZE (mode).to_constant ();
+  unsigned int biggest_size = GET_MODE_SIZE (biggest_mode).to_constant ();
+  gcc_assert (biggest_size >= mode_size);
+  unsigned int ratio = biggest_size / mode_size;
+  return lmul / ratio;
+}
+
+/* This function helps to determine whether current LMUL will cause
+   potential vector register (V_REG) spillings according to live range
+   information.
+
+     - First, compute how many variable are alive of each program point
+       in each bb of the loop.
+     - Second, compute how many V_REGs are alive of each program point
+       in each bb of the loop according the BIGGEST_MODE and the variable
+       mode.
+     - Third, Return the maximum V_REGs are alive of the loop.  */
+static unsigned int
+max_number_of_live_regs (const basic_block bb,
+			 const vec<var_live_range> &live_ranges,
+			 unsigned int max_point, machine_mode biggest_mode,
+			 int lmul)
+{
+  unsigned int max_nregs = 0;
+  unsigned int i, j;
+  unsigned int live_point = 0;
+  auto_vec<unsigned int> live_vars_vec;
+  live_vars_vec.safe_grow (max_point + 1, true);
+  for (i = 0; i < live_vars_vec.length (); ++i)
+    live_vars_vec[i] = 0;
+  for (i = 0; i < live_ranges.length (); i++)
+    {
+      var_live_range live_range = live_ranges[i];
+      for (j = live_range.start; j <= live_range.end; j++)
+	{
+	  machine_mode mode = TYPE_MODE (TREE_TYPE (live_range.var));
+	  unsigned int nregs
+	    = compute_nregs_for_mode (mode, biggest_mode, lmul);
+	  live_vars_vec[j] += nregs;
+	  if (live_vars_vec[j] > max_nregs)
+	    max_nregs = live_vars_vec[j];
+	}
+    }
+
+  /* Collect user explicit RVV type.  */
+  bool post_dom_available_p = dom_info_available_p (CDI_POST_DOMINATORS);
+  if (!post_dom_available_p)
+    calculate_dominance_info (CDI_POST_DOMINATORS);
+  auto_vec<basic_block> all_preds
+    = get_all_dominated_blocks (CDI_POST_DOMINATORS, bb);
+  for (i = 0; i < cfun->gimple_df->ssa_names->length (); i++)
+    {
+      tree t = ssa_name (i);
+      if (!t)
+	continue;
+      machine_mode mode = TYPE_MODE (TREE_TYPE (t));
+      if (!lookup_vector_type_attribute (TREE_TYPE (t))
+	  && !riscv_v_ext_vls_mode_p (mode))
+	continue;
+
+      gimple *def = SSA_NAME_DEF_STMT (t);
+      if (gimple_bb (def) && !all_preds.contains (gimple_bb (def)))
+	continue;
+      const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (t));
+      const ssa_use_operand_t *ptr;
+
+      for (ptr = head->next; ptr != head; ptr = ptr->next)
+	{
+	  if (!USE_STMT (ptr) || is_gimple_debug (USE_STMT (ptr))
+	      || !dominated_by_p (CDI_POST_DOMINATORS, bb,
+				  gimple_bb (USE_STMT (ptr))))
+	    continue;
+
+	  int regno_alignment = riscv_get_v_regno_alignment (mode);
+	  max_nregs += regno_alignment;
+	  if (dump_enabled_p ())
+	    dump_printf_loc (
+	      MSG_NOTE, vect_location,
+	      "Explicit used SSA %T, vectype = %T, mode = %s, cause %d "
+	      "V_REG live in bb %d at program point %d\n",
+	      t, TREE_TYPE (t), GET_MODE_NAME (mode), regno_alignment,
+	      bb->index, live_point);
+	  break;
+	}
+    }
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "Maximum lmul = %d, %d number of live V_REG at program "
+		     "point %d for bb %d\n",
+		     lmul, max_nregs, live_point, bb->index);
+  if (!post_dom_available_p)
+    free_dominance_info (CDI_POST_DOMINATORS);
+  return max_nregs;
+}
+
+/* Return the LMUL of the current analysis.  */
+static int
+get_current_lmul (class loop *loop)
+{
+  return loop_autovec_infos.get (loop)->current_lmul;
+}
+
+/* Update the live ranges according PHI.
+
+   Loop:
+     bb 2:
+       STMT 1               -- point 0
+       STMT 2 (def SSA 1)   -- point 1
+       STMT 3 (use SSA 1)   -- point 2
+       STMT 4               -- point 3
+     bb 3:
+       SSA 2 = PHI<SSA 1>
+       STMT 1               -- point 0
+       STMT 2               -- point 1
+       STMT 3 (use SSA 2)   -- point 2
+       STMT 4               -- point 3
+
+   Before this function, the SSA 1 live range is [2, 3] in bb 2
+   and SSA 2 is [0, 3] in bb 3.
+
+   Then, after this function, we update SSA 1 live range in bb 2
+   into [2, 4] since SSA 1 is live out into bb 3.  */
+static void
+update_local_live_ranges (
+  vec_info *vinfo,
+  hash_map<basic_block, vec<stmt_point>> &program_points_per_bb,
+  hash_map<basic_block, vec<var_live_range>> &live_ranges_map)
+{
+  if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
+    {
+      class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+      basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+      unsigned int nbbs = loop->num_nodes;
+      unsigned int i, j, k;
+      gphi_iterator psi;
+      for (i = 0; i < nbbs; i++)
+	{
+	  basic_block bb = bbs[i];
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "Update local program points for bb %d:\n",
+			     bb->index);
+	  for (psi = gsi_start_phis (bbs[i]); !gsi_end_p (psi); gsi_next (&psi))
+	    {
+	      gphi *phi = psi.phi ();
+	      stmt_vec_info stmt_info = vinfo->lookup_stmt (phi);
+	      if (STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info))
+		  != undef_vec_info_type)
+		{
+		  for (j = 0; j < gimple_phi_num_args (phi); j++)
+		    {
+		      edge e = gimple_phi_arg_edge (phi, j);
+		      tree def = gimple_phi_arg_def (phi, j);
+		      auto *live_ranges = live_ranges_map.get (e->src);
+		      if (!program_points_per_bb.get (e->src))
+			continue;
+		      unsigned int max_point
+			= (*program_points_per_bb.get (e->src)).length () - 1;
+		      for (k = 0; k < (*live_ranges).length (); k++)
+			{
+			  if ((*live_ranges)[i].var == def)
+			    {
+			      int end = (*live_ranges)[i].end;
+			      (*live_ranges)[i].end = max_point;
+			      if (dump_enabled_p ())
+				dump_printf_loc (
+				  MSG_NOTE, vect_location,
+				  "Update %T end point from %d to %d:\n",
+				  (*live_ranges)[i].var, end,
+				  (*live_ranges)[i].end);
+			    }
+			}
+		    }
+		}
+	    }
+	}
+    }
+}
+
 costs::costs (vec_info *vinfo, bool costing_for_scalar)
   : vector_costs (vinfo, costing_for_scalar)
 {}
 
+/* Return true that the LMUL of new COST model is preferred.  */
+bool
+costs::prefer_new_lmul_p (const vector_costs *uncast_other) const
+{
+  auto other = static_cast<const costs *> (uncast_other);
+  auto this_loop_vinfo = as_a<loop_vec_info> (this->m_vinfo);
+  auto other_loop_vinfo = as_a<loop_vec_info> (other->m_vinfo);
+  class loop *loop = LOOP_VINFO_LOOP (this_loop_vinfo);
+
+  if (!LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (this_loop_vinfo)
+      && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (other_loop_vinfo))
+    return false;
+
+  if (loop_autovec_infos.get (loop) && loop_autovec_infos.get (loop)->end_p)
+    return false;
+  else if (loop_autovec_infos.get (loop))
+    loop_autovec_infos.get (loop)->current_lmul
+      = loop_autovec_infos.get (loop)->current_lmul / 2;
+  else
+    {
+      int regno_alignment
+	= riscv_get_v_regno_alignment (other_loop_vinfo->vector_mode);
+      if (known_eq (LOOP_VINFO_SLP_UNROLLING_FACTOR (other_loop_vinfo), 1U))
+	regno_alignment = RVV_M8;
+      loop_autovec_infos.put (loop, {regno_alignment, regno_alignment, false});
+    }
+
+  int lmul = get_current_lmul (loop);
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "Comparing two main loops (%s at VF %d vs %s at VF %d)\n",
+		     GET_MODE_NAME (this_loop_vinfo->vector_mode),
+		     vect_vf_for_cost (this_loop_vinfo),
+		     GET_MODE_NAME (other_loop_vinfo->vector_mode),
+		     vect_vf_for_cost (other_loop_vinfo));
+
+  /* Compute local program points.
+     It's a fast and effective computation.  */
+  hash_map<basic_block, vec<stmt_point>> program_points_per_bb;
+  compute_local_program_points (other->m_vinfo, program_points_per_bb);
+
+  /* Compute local live ranges.  */
+  hash_map<basic_block, vec<var_live_range>> live_ranges_map;
+  machine_mode biggest_mode
+    = compute_local_live_ranges (program_points_per_bb, live_ranges_map);
+
+  /* Update live ranges according to PHI.  */
+  update_local_live_ranges (other->m_vinfo, program_points_per_bb,
+			    live_ranges_map);
+
+  /* TODO: We calculate the maximum live vars base on current STMTS
+     sequence.  We can support live range shrink if it can give us
+     big improvement in the future.  */
+  if (!live_ranges_map.is_empty ())
+    {
+      unsigned int max_nregs = 0;
+      for (hash_map<basic_block, vec<var_live_range>>::iterator iter
+	   = live_ranges_map.begin ();
+	   iter != live_ranges_map.end (); ++iter)
+	{
+	  basic_block bb = (*iter).first;
+	  vec<var_live_range> live_ranges = (*iter).second;
+	  unsigned int max_point
+	    = (*program_points_per_bb.get (bb)).length () - 1;
+	  if (live_ranges.is_empty ())
+	    continue;
+	  /* We prefer larger LMUL unless it causes register spillings.  */
+	  unsigned int nregs
+	    = max_number_of_live_regs (bb, live_ranges, max_point, biggest_mode,
+				       lmul);
+	  if (nregs > max_nregs)
+	    max_nregs = nregs;
+	  live_ranges.release ();
+	}
+      live_ranges_map.empty ();
+      if (loop_autovec_infos.get (loop)->current_lmul == RVV_M1
+	  || max_nregs <= V_REG_NUM)
+	loop_autovec_infos.get (loop)->end_p = true;
+      if (loop_autovec_infos.get (loop)->current_lmul > RVV_M1)
+	return max_nregs > V_REG_NUM;
+      return false;
+    }
+  if (!program_points_per_bb.is_empty ())
+    {
+      for (hash_map<basic_block, vec<stmt_point>>::iterator iter
+	   = program_points_per_bb.begin ();
+	   iter != program_points_per_bb.end (); ++iter)
+	{
+	  vec<stmt_point> program_points = (*iter).second;
+	  if (!program_points.is_empty ())
+	    program_points.release ();
+	}
+      program_points_per_bb.empty ();
+    }
+  return lmul > RVV_M1;
+}
+
+bool
+costs::better_main_loop_than_p (const vector_costs *uncast_other) const
+{
+  auto other = static_cast<const costs *> (uncast_other);
+
+  if (!flag_vect_cost_model)
+    return vector_costs::better_main_loop_than_p (other);
+
+  if (riscv_autovec_lmul == RVV_DYNAMIC)
+    return prefer_new_lmul_p (uncast_other);
+
+  return vector_costs::better_main_loop_than_p (other);
+}
+
 unsigned
 costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
 		      stmt_vec_info stmt_info, slp_tree, tree vectype,
diff --git a/gcc/config/riscv/riscv-vector-costs.h b/gcc/config/riscv/riscv-vector-costs.h
index 57b1be01048..000ffd32df4 100644
--- a/gcc/config/riscv/riscv-vector-costs.h
+++ b/gcc/config/riscv/riscv-vector-costs.h
@@ -23,6 +23,27 @@
 
 namespace riscv_vector {
 
+struct stmt_point
+{
+  /* Program point.  */
+  unsigned int point;
+  gimple *stmt;
+};
+
+struct var_live_range
+{
+  tree var;
+  unsigned int start;
+  unsigned int end;
+};
+
+struct autovec_info
+{
+  unsigned int initial_lmul;
+  unsigned int current_lmul;
+  bool end_p;
+};
+
 /* rvv-specific vector costs.  */
 class costs : public vector_costs
 {
@@ -31,12 +52,16 @@ class costs : public vector_costs
 public:
   costs (vec_info *, bool);
 
+  bool better_main_loop_than_p (const vector_costs *other) const override;
+
 private:
   unsigned int add_stmt_cost (int count, vect_cost_for_stmt kind,
 			      stmt_vec_info stmt_info, slp_tree node,
 			      tree vectype, int misalign,
 			      vect_cost_model_location where) override;
   void finish_cost (const vector_costs *) override;
+
+  bool prefer_new_lmul_p (const vector_costs *) const;
 };
 
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index b1f80d1d87c..ec5d563859e 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -70,7 +70,8 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
 riscv-vector-costs.o: $(srcdir)/config/riscv/riscv-vector-costs.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TARGET_H) $(FUNCTION_H) \
   $(TREE_H) basic-block.h $(RTL_H) gimple.h targhooks.h cfgloop.h \
-  fold-const.h $(TM_P_H) tree-vectorizer.h \
+  fold-const.h $(TM_P_H) tree-vectorizer.h gimple-iterator.h bitmap.h \
+  ssa.h backend.h \
   $(srcdir)/config/riscv/riscv-vector-costs.h
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/riscv/riscv-vector-costs.cc
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c
new file mode 100644
index 00000000000..fd9f38bc766
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+     int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+     int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+     int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+     int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+     int32_t *__restrict d4, int32_t *__restrict d5, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = d5[i] + b[i];
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]
+	     + a[i] * a2[i] * a3[i] * a4[i] * a5[i] * c[i] * c2[i] * c3[i]
+		 * c4[i] * c5[i] * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c
new file mode 100644
index 00000000000..6c414bcd115
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c
@@ -0,0 +1,91 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int32_t *__restrict e,
+      int32_t *__restrict e2,
+      int32_t *__restrict e3,
+      int32_t *__restrict e4,
+      int32_t *__restrict e5,
+      int32_t *__restrict f,
+      int32_t *__restrict f2,
+      int32_t *__restrict f3,
+      int32_t *__restrict f4,
+      int32_t *__restrict f5,
+      int32_t *__restrict g,
+      int32_t *__restrict g2,
+      int32_t *__restrict g3,
+      int32_t *__restrict g4,
+      int32_t *__restrict g5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c
new file mode 100644
index 00000000000..b77f3ff58ed
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int32_t *__restrict e,
+      int32_t *__restrict e2,
+      int32_t *__restrict e3,
+      int32_t *__restrict e4,
+      int32_t *__restrict e5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      e[i] = a2[i] + c2[i];
+      e2[i] = d2[i] + a2[i];
+      e3[i] = d3[i] + a3[i];
+      e4[i] = d4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i];
+    }
+}
+
+/* FIXME: Choosing LMUL = 1 is not the optimal since it can be LMUL = 2 if we apply instruction scheduler.  */
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c
new file mode 100644
index 00000000000..164930c9bba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c
@@ -0,0 +1,91 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b,    int8_t *__restrict c,
+      int8_t *__restrict a2, int8_t *__restrict b2, int8_t *__restrict c2,
+      int8_t *__restrict a3, int8_t *__restrict b3, int8_t *__restrict c3,
+      int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict c4,
+      int8_t *__restrict a5, int8_t *__restrict b5, int8_t *__restrict c5,
+      int8_t *__restrict d,
+      int8_t *__restrict d2,
+      int8_t *__restrict d3,
+      int8_t *__restrict d4,
+      int8_t *__restrict d5,
+      int8_t *__restrict e,
+      int8_t *__restrict e2,
+      int8_t *__restrict e3,
+      int8_t *__restrict e4,
+      int8_t *__restrict e5,
+      int8_t *__restrict f,
+      int8_t *__restrict f2,
+      int8_t *__restrict f3,
+      int8_t *__restrict f4,
+      int8_t *__restrict f5,
+      int8_t *__restrict g,
+      int8_t *__restrict g2,
+      int8_t *__restrict g3,
+      int8_t *__restrict g4,
+      int8_t *__restrict g5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c
new file mode 100644
index 00000000000..8d80fbfe390
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c
@@ -0,0 +1,121 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int32_t *__restrict e,
+      int32_t *__restrict e2,
+      int32_t *__restrict e3,
+      int32_t *__restrict e4,
+      int32_t *__restrict e5,
+      int32_t *__restrict f,
+      int32_t *__restrict f2,
+      int32_t *__restrict f3,
+      int32_t *__restrict f4,
+      int32_t *__restrict f5,
+      int32_t *__restrict g,
+      int32_t *__restrict g2,
+      int32_t *__restrict g3,
+      int32_t *__restrict g4,
+      int32_t *__restrict g5,
+  
+      int32_t *__restrict gg,
+      int32_t *__restrict gg2,
+      int32_t *__restrict gg3,
+      int32_t *__restrict gg4,
+      int32_t *__restrict gg5,
+  
+      int32_t *__restrict ggg,
+      int32_t *__restrict ggg2,
+      int32_t *__restrict ggg3,
+      int32_t *__restrict ggg4,
+      int32_t *__restrict ggg5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+
+      gg[i] = f2[i] + c2[i];
+      gg2[i] = f2[i] + d2[i];
+      gg3[i] = f3[i] + d3[i];
+      gg4[i] = f4[i] + a4[i];
+      gg5[i] = f[i] + a4[i];
+      gg5[i] = f5[i] + a4[i];
+
+
+      ggg[i] = f2[i] + c2[i];
+      ggg2[i] = f2[i] + d2[i];
+      ggg3[i] = f3[i] + d3[i];
+      ggg4[i] = f4[i] + a4[i];
+      ggg5[i] = f[i] + a4[i];
+      ggg5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i]
+      * gg[i] * gg2[i] * gg3[i] * gg4[i] * gg5[i]
+      * ggg[i] * ggg2[i] * ggg3[i] * ggg4[i] * ggg5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c
new file mode 100644
index 00000000000..7b4014ddaf6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c
@@ -0,0 +1,149 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int32_t *__restrict e,
+      int32_t *__restrict e2,
+      int32_t *__restrict e3,
+      int32_t *__restrict e4,
+      int32_t *__restrict e5,
+      int32_t *__restrict f,
+      int32_t *__restrict f2,
+      int32_t *__restrict f3,
+      int32_t *__restrict f4,
+      int32_t *__restrict f5,
+      int32_t *__restrict g,
+      int32_t *__restrict g2,
+      int32_t *__restrict g3,
+      int32_t *__restrict g4,
+      int32_t *__restrict g5,
+  
+      int32_t *__restrict gg,
+      int32_t *__restrict gg2,
+      int32_t *__restrict gg3,
+      int32_t *__restrict gg4,
+      int32_t *__restrict gg5,
+  
+      int32_t *__restrict ggg,
+      int32_t *__restrict ggg2,
+      int32_t *__restrict ggg3,
+      int32_t *__restrict ggg4,
+      int32_t *__restrict ggg5,
+
+      int32_t *__restrict gggg,
+      int32_t *__restrict gggg2,
+      int32_t *__restrict gggg3,
+      int32_t *__restrict gggg4,
+      int32_t *__restrict gggg5,
+
+      int32_t *__restrict ggggg,
+      int32_t *__restrict ggggg2,
+      int32_t *__restrict ggggg3,
+      int32_t *__restrict ggggg4,
+      int32_t *__restrict ggggg5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+
+      gg[i] = f2[i] + c2[i];
+      gg2[i] = f2[i] + d2[i];
+      gg3[i] = f3[i] + d3[i];
+      gg4[i] = f4[i] + a4[i];
+      gg5[i] = f[i] + a4[i];
+      gg5[i] = f5[i] + a4[i];
+
+
+      ggg[i] = f2[i] + c2[i];
+      ggg2[i] = f2[i] + d2[i];
+      ggg3[i] = f3[i] + d3[i];
+      ggg4[i] = f4[i] + a4[i];
+      ggg5[i] = f[i] + a4[i];
+      ggg5[i] = f5[i] + a4[i];
+
+      gggg[i] = f2[i] + c2[i];
+      gggg2[i] = f2[i] + d2[i];
+      gggg3[i] = f3[i] + d3[i];
+      gggg4[i] = f4[i] + a4[i];
+      gggg5[i] = f[i] + a4[i];
+      gggg5[i] = f5[i] + a4[i];
+
+      ggggg[i] = f2[i] + c2[i];
+      ggggg2[i] = f2[i] + d2[i];
+      ggggg3[i] = f3[i] + d3[i];
+      ggggg4[i] = f4[i] + a4[i];
+      ggggg5[i] = f[i] + a4[i];
+      ggggg5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i]
+      * gg[i] * gg2[i] * gg3[i] * gg4[i] * gg5[i]
+      * ggg[i] * ggg2[i] * ggg3[i] * ggg4[i] * ggg5[i]
+      * gggg[i] * gggg2[i] * gggg3[i] * gggg4[i] * gggg5[i]
+      * ggggg[i] * ggggg2[i] * ggggg3[i] * ggggg4[i] * ggggg5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c
new file mode 100644
index 00000000000..51d05f2bec9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c
@@ -0,0 +1,150 @@
+
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b,    int8_t *__restrict c,
+      int8_t *__restrict a2, int8_t *__restrict b2, int8_t *__restrict c2,
+      int8_t *__restrict a3, int8_t *__restrict b3, int8_t *__restrict c3,
+      int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict c4,
+      int8_t *__restrict a5, int8_t *__restrict b5, int8_t *__restrict c5,
+      int8_t *__restrict d,
+      int8_t *__restrict d2,
+      int8_t *__restrict d3,
+      int8_t *__restrict d4,
+      int8_t *__restrict d5,
+      int8_t *__restrict e,
+      int8_t *__restrict e2,
+      int8_t *__restrict e3,
+      int8_t *__restrict e4,
+      int8_t *__restrict e5,
+      int8_t *__restrict f,
+      int8_t *__restrict f2,
+      int8_t *__restrict f3,
+      int8_t *__restrict f4,
+      int8_t *__restrict f5,
+      int8_t *__restrict g,
+      int8_t *__restrict g2,
+      int8_t *__restrict g3,
+      int8_t *__restrict g4,
+      int8_t *__restrict g5,
+  
+      int8_t *__restrict gg,
+      int8_t *__restrict gg2,
+      int8_t *__restrict gg3,
+      int8_t *__restrict gg4,
+      int8_t *__restrict gg5,
+  
+      int8_t *__restrict ggg,
+      int8_t *__restrict ggg2,
+      int8_t *__restrict ggg3,
+      int8_t *__restrict ggg4,
+      int8_t *__restrict ggg5,
+
+      int8_t *__restrict gggg,
+      int8_t *__restrict gggg2,
+      int8_t *__restrict gggg3,
+      int8_t *__restrict gggg4,
+      int8_t *__restrict gggg5,
+
+      int8_t *__restrict ggggg,
+      int8_t *__restrict ggggg2,
+      int8_t *__restrict ggggg3,
+      int8_t *__restrict ggggg4,
+      int8_t *__restrict ggggg5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+
+      gg[i] = f2[i] + c2[i];
+      gg2[i] = f2[i] + d2[i];
+      gg3[i] = f3[i] + d3[i];
+      gg4[i] = f4[i] + a4[i];
+      gg5[i] = f[i] + a4[i];
+      gg5[i] = f5[i] + a4[i];
+
+
+      ggg[i] = f2[i] + c2[i];
+      ggg2[i] = f2[i] + d2[i];
+      ggg3[i] = f3[i] + d3[i];
+      ggg4[i] = f4[i] + a4[i];
+      ggg5[i] = f[i] + a4[i];
+      ggg5[i] = f5[i] + a4[i];
+
+      gggg[i] = f2[i] + c2[i];
+      gggg2[i] = f2[i] + d2[i];
+      gggg3[i] = f3[i] + d3[i];
+      gggg4[i] = f4[i] + a4[i];
+      gggg5[i] = f[i] + a4[i];
+      gggg5[i] = f5[i] + a4[i];
+
+      ggggg[i] = f2[i] + c2[i];
+      ggggg2[i] = f2[i] + d2[i];
+      ggggg3[i] = f3[i] + d3[i];
+      ggggg4[i] = f4[i] + a4[i];
+      ggggg5[i] = f[i] + a4[i];
+      ggggg5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i]
+      * gg[i] * gg2[i] * gg3[i] * gg4[i] * gg5[i]
+      * ggg[i] * ggg2[i] * ggg3[i] * ggg4[i] * ggg5[i]
+      * gggg[i] * gggg2[i] * gggg3[i] * gggg4[i] * gggg5[i]
+      * ggggg[i] * ggggg2[i] * ggggg3[i] * ggggg4[i] * ggggg5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c
new file mode 100644
index 00000000000..dfd71414b62
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -Wno-psabi -fdump-tree-vect-details" } */
+
+#include "riscv_vector.h"
+
+vint32m8_t
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+     int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+     int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+     int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+     int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+     int32_t *__restrict d4, int32_t *__restrict d5, int n, vint32m8_t vector)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]
+	     + a[i] * a2[i] * a3[i] * a4[i] * a5[i] * c[i] * c2[i] * c3[i]
+		 * c4[i] * c5[i] * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+    return vector;
+}
+
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c
new file mode 100644
index 00000000000..ce83bb22324
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c
new file mode 100644
index 00000000000..a80b1b1556a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b,    int8_t *__restrict c,
+      int8_t *__restrict a2, int8_t *__restrict b2, int8_t *__restrict c2,
+      int8_t *__restrict a3, int8_t *__restrict b3, int8_t *__restrict c3,
+      int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict c4,
+      int8_t *__restrict a5, int8_t *__restrict b5, int8_t *__restrict c5,
+      int8_t *__restrict d,
+      int8_t *__restrict d2,
+      int8_t *__restrict d3,
+      int8_t *__restrict d4,
+      int8_t *__restrict d5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m2} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c
new file mode 100644
index 00000000000..ce83bb22324
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c
new file mode 100644
index 00000000000..9964f3fe8ba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include "riscv_vector.h"
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+     int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+     int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+     int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+     int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+     int32_t *__restrict d4, int32_t *__restrict d5, int n)
+{
+  vint32m1_t v = __riscv_vle32_v_i32m1 (a, 32);
+  __riscv_vse32_v_i32m1 (c, v, 32);
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]
+	     + a[i] * a2[i] * a3[i] * a4[i] * a5[i] * c[i] * c2[i] * c3[i]
+		 * c4[i] * c5[i] * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c
new file mode 100644
index 00000000000..ab670bf0c7d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+typedef int8_t v128qi __attribute__ ((vector_size (128)));
+
+v128qi global_v;
+
+v128qi
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+     int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+     int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+     int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+     int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+     int32_t *__restrict d4, int32_t *__restrict d5, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]
+	     + a[i] * a2[i] * a3[i] * a4[i] * a5[i] * c[i] * c2[i] * c3[i]
+		 * c4[i] * c5[i] * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+    return global_v + 3;
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c
new file mode 100644
index 00000000000..a01e32b9f8d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+typedef int8_t v128qi __attribute__ ((vector_size (128)));
+
+v128qi global_v;
+
+v128qi
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+     int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+     int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+     int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+     int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+     int32_t *__restrict d4, int32_t *__restrict d5, int n)
+{
+  for (int i = 0; i < 128; i++)
+    b[i] = global_v[i] + 8;
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]
+	     + a[i] * a2[i] * a3[i] * a4[i] * a5[i] * c[i] * c2[i] * c3[i]
+		 * c4[i] * c5[i] * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+    return global_v + 3;
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c
new file mode 100644
index 00000000000..156ccc7f98e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+      int32_t *__restrict d4, int32_t *__restrict d5, int n, int m)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d[i] = a[i] - a2[i];
+      d2[i] = a2[i] * a[i];
+      d3[i] = a3[i] * a2[i];
+      d4[i] = a2[i] * d2[i];
+      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c
new file mode 100644
index 00000000000..4cacc039dcb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b, int8_t *__restrict c,
+      int8_t *__restrict a2, int8_t *__restrict b2, int8_t *__restrict c2,
+      int8_t *__restrict a3, int8_t *__restrict b3, int8_t *__restrict c3,
+      int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict c4,
+      int8_t *__restrict a5, int8_t *__restrict b5, int8_t *__restrict c5,
+      int8_t *__restrict d, int8_t *__restrict d2, int8_t *__restrict d3,
+      int8_t *__restrict d4, int8_t *__restrict d5, int n, int m)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d[i] = a[i] - a2[i];
+      d2[i] = a2[i] * a[i];
+      d3[i] = a3[i] * a2[i];
+      d4[i] = a2[i] * d2[i];
+      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c
new file mode 100644
index 00000000000..2308109f00c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void foo2 (int64_t *__restrict a,
+          int8_t *__restrict b,
+          int8_t *__restrict c,
+          int8_t *__restrict a2,
+          int8_t *__restrict b2,
+          int8_t *__restrict c2,
+          int8_t *__restrict a3,
+          int8_t *__restrict b3,
+          int8_t *__restrict c3,
+          int8_t *__restrict a4,
+          int8_t *__restrict b4,
+          int8_t *__restrict c4,
+          int64_t *__restrict a5,
+          int8_t *__restrict b5,
+          int8_t *__restrict c5,
+          int n)
+{
+    for (int i = 0; i < n; i++){
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i]+ a[i];
+
+      a[i] = a[i] + c[i];
+      b5[i] = a[i] + c[i];
+      a2[i] = a[i] + c2[i];
+      a3[i] = a[i] + c3[i];
+      a4[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]+ a[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e64,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c
new file mode 100644
index 00000000000..2a1521bffb9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void foo2 (int64_t *__restrict a,
+          int32_t *__restrict b,
+          int32_t *__restrict c,
+          int32_t *__restrict a2,
+          int32_t *__restrict b2,
+          int32_t *__restrict c2,
+          int32_t *__restrict a3,
+          int32_t *__restrict b3,
+          int32_t *__restrict c3,
+          int32_t *__restrict a4,
+          int32_t *__restrict b4,
+          int32_t *__restrict c4,
+          int64_t *__restrict a5,
+          int32_t *__restrict b5,
+          int32_t *__restrict c5,
+          int n)
+{
+    for (int i = 0; i < n; i++){
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i]+ a[i];
+
+      a[i] = a[i] + c[i];
+      b5[i] = a[i] + c[i];
+      a2[i] = a[i] + c2[i];
+      a3[i] = a[i] + c3[i];
+      a4[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]+ a[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e64,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
new file mode 100644
index 00000000000..928a507a363
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void foo2 (int16_t *__restrict a,
+          int32_t *__restrict b,
+          int32_t *__restrict c,
+          int32_t *__restrict a2,
+          int32_t *__restrict b2,
+          int32_t *__restrict c2,
+          int32_t *__restrict a3,
+          int32_t *__restrict b3,
+          int32_t *__restrict c3,
+          int32_t *__restrict a4,
+          int32_t *__restrict b4,
+          int32_t *__restrict c4,
+          int16_t *__restrict a5,
+          int32_t *__restrict b5,
+          int32_t *__restrict c5,
+          int n)
+{
+    for (int i = 0; i < n; i++){
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i]+ a[i];
+
+      a[i] = a[i] + c[i];
+      b5[i] = a[i] + c[i];
+      a2[i] = a[i] + c2[i];
+      a3[i] = a[i] + c3[i];
+      a4[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]+ a[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c
new file mode 100644
index 00000000000..f16cfb9fd08
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fselective-scheduling -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (uint8_t *restrict a, uint8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8] = b[i * 8 + 7] + 1;
+      a[i * 8 + 1] = b[i * 8 + 6] + 2;
+      a[i * 8 + 2] = b[i * 8 + 5] + 3;
+      a[i * 8 + 3] = b[i * 8 + 4] + 4;
+      a[i * 8 + 4] = b[i * 8 + 3] + 5;
+      a[i * 8 + 5] = b[i * 8 + 2] + 6;
+      a[i * 8 + 6] = b[i * 8 + 1] + 7;
+      a[i * 8 + 7] = b[i * 8 + 0] + 8;
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m4} } } */
+/* { dg-final { scan-assembler-times {csrr} 1 } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 8" "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c
new file mode 100644
index 00000000000..e324380e27b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void foo2 (int8_t *__restrict a,
+          int64_t *__restrict b,
+          int64_t *__restrict c,
+          int64_t *__restrict a2,
+          int64_t *__restrict b2,
+          int64_t *__restrict c2,
+          int64_t *__restrict a3,
+          int64_t *__restrict b3,
+          int64_t *__restrict c3,
+          int64_t *__restrict a4,
+          int64_t *__restrict b4,
+          int64_t *__restrict c4,
+          int8_t *__restrict a5,
+          int64_t *__restrict b5,
+          int64_t *__restrict c5,
+          int n)
+{
+    for (int i = 0; i < n; i++){
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i]+ a[i];
+
+      a[i] = a[i] + c[i];
+      b5[i] = a[i] + c[i];
+      a2[i] = a[i] + c2[i];
+      a3[i] = a[i] + c3[i];
+      a4[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]+ a[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e64,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c
new file mode 100644
index 00000000000..553f2aac0d6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fselective-scheduling -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (uint8_t *restrict a, uint8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 16] = b[i * 16 + 15] + 1;
+      a[i * 16 + 1] = b[i * 16 + 14] + 2;
+      a[i * 16 + 2] = b[i * 16 + 13] + 3;
+      a[i * 16 + 3] = b[i * 16 + 12] + 4;
+      a[i * 16 + 4] = b[i * 16 + 11] + 5;
+      a[i * 16 + 5] = b[i * 16 + 10] + 6;
+      a[i * 16 + 6] = b[i * 16 + 9] + 7;
+      a[i * 16 + 7] = b[i * 16 + 8] + 8;
+      
+      a[i * 16 + 8] = b[i * 16 + 7] + 1;
+      a[i * 16 + 9] = b[i * 16 + 6] + 2;
+      a[i * 16 + 10] = b[i * 16 + 5] + 3;
+      a[i * 16 + 11] = b[i * 16 + 4] + 4;
+      a[i * 16 + 12] = b[i * 16 + 3] + 5;
+      a[i * 16 + 13] = b[i * 16 + 2] + 6;
+      a[i * 16 + 14] = b[i * 16 + 1] + 7;
+      a[i * 16 + 15] = b[i * 16 + 0] + 8;
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m4} } } */
+/* { dg-final { scan-assembler-times {csrr} 1 } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 8" "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c
new file mode 100644
index 00000000000..e2483004698
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c
new file mode 100644
index 00000000000..e65abb299a5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+int
+foo (int *x, int n, int res)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      res += x[i * 2];
+      res += x[i * 2 + 1];
+    }
+  return res;
+}
+
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-times {csrr} 1 } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
new file mode 100644
index 00000000000..a50265fc1ec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int16_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c
new file mode 100644
index 00000000000..3e9751a22ed
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c
new file mode 100644
index 00000000000..3b2527aad5d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+#include <stddef.h>
+
+void
+foo (size_t *__restrict a, size_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler {e64,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c
new file mode 100644
index 00000000000..d63926fa56a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++){
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+  }
+}
+
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c
new file mode 100644
index 00000000000..5c816140950
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b, int8_t *__restrict a2,
+     int8_t *__restrict b2, int8_t *__restrict a3, int8_t *__restrict b3,
+     int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict a5,
+     int8_t *__restrict b5, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] * a2[i] * b2[i] * a3[i] * b3[i] * a4[i] * b4[i] * a5[i] * b5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c
new file mode 100644
index 00000000000..596608bb8d3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict a2,
+     int32_t *__restrict b2, int32_t *__restrict a3, int32_t *__restrict b3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict a5,
+     int32_t *__restrict b5, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] * a2[i] * b2[i] * a3[i] * b3[i] * a4[i] * b4[i] * a5[i] * b5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c
new file mode 100644
index 00000000000..a859d976555
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+int8_t
+foo (int8_t *__restrict a, int8_t init, int n)
+{
+  for (int i = 0; i < n; i++)
+    init += a[i];
+  return init;
+}
+
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c
new file mode 100644
index 00000000000..b965fd0373a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+int64_t
+foo (int64_t *__restrict a, int64_t init, int n)
+{
+  for (int i = 0; i < n; i++)
+    init += a[i];
+  return init;
+}
+
+/* { dg-final { scan-assembler {e64,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp
new file mode 100644
index 00000000000..a3e8f50b73f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp
@@ -0,0 +1,52 @@
+# Copyright (C) 2023-2023 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# Exit immediately if this isn't a riscv target.
+if { ![istarget riscv*-*-*] } then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+    set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+set gcc_march "rv64gcv_zvfh"
+set gcc_mabi  "lp64d"
+if [istarget riscv32-*-*] then {
+  set gcc_march "rv32gcv_zvfh"
+  set gcc_mabi  "ilp32d"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+set CFLAGS "$DEFAULT_CFLAGS -march=$gcc_march -mabi=$gcc_mabi -O3"
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/dynamic-lmul*.\[cS\]]] \
+	"-O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic" $CFLAGS
+
+# All done.
+dg-finish
-- 
2.36.3


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH V2] RISC-V: Support Dynamic LMUL Cost model
  2023-09-05  8:56 Juzhe-Zhong
  2023-09-05 21:02 ` Robin Dapp
@ 2023-09-06 13:22 ` Robin Dapp
  1 sibling, 0 replies; 5+ messages in thread
From: Robin Dapp @ 2023-09-06 13:22 UTC (permalink / raw)
  To: Juzhe-Zhong, gcc-patches; +Cc: rdapp.gcc, kito.cheng, kito.cheng, jeffreyalaw

Hi Juzhe,

general remark upfront:  Please add function-level comments for all
functions.  This makes reading and reviewing much easier.  I had to sweep
back and forth quite a bit.

> +
> +static int
> +get_last_live_range (const vec<var_live_range> &live_ranges, tree var)
> +{
> +  unsigned int ix;
> +  var_live_range *live_range;
> +  FOR_EACH_VEC_ELT_REVERSE (live_ranges, ix, live_range)
> +    if (live_range->var == var)
> +      return ix;
> +  return -1;
> +}

From reading the usage site of this function it looks like we could benefit
from having the live ranges be a hash_map as well?  That way we wouldn't
need to scan through the list every time.  Something like
hash_map<tree, pair<int, int>>.  It looks like we only consider the range
end anyway.

> +		      int index = get_last_live_range (live_ranges, var);

That way we could avoid some worst-case behavior here for pathological
inputs.

> +		      if (index == -1)
> +			{
> +			  var_live_range range = {var, 0, point};
> +			  live_ranges.safe_push (range);
> +			}

Please add a comment that we assume the variable is live from the start
of this block. 

> +		      else
> +			live_ranges[index].end = point;

And here a comment that we will grow the live range for each use.

> +static bool
> +live_range_conflict_p (const var_live_range &live_range1,
> +		       const var_live_range &live_range2)
> +{
> +  if (live_range1.start >= live_range2.end)
> +    return false;
> +  if (live_range1.end <= live_range2.start)
> +    return false;
> +  if (live_range2.start >= live_range1.end)
> +    return false;
> +  if (live_range2.end <= live_range1.start)
> +    return false;
> +  return true;
> +}

Rename to live_range_overlap_p and simplify to
 return a.end >= b.start || b.end >= a.start;

> +
> +static unsigned int
> +max_number_of_live_regs (const basic_block bb,
> +			 const vec<var_live_range> &live_ranges,
> +			 machine_mode biggest_mode, int lmul)
> +{
> +  unsigned int max_nregs = 0;
> +  unsigned int i, j, k;
> +  unsigned int live_point = 0;
> +  for (i = 0; i < live_ranges.length (); i++)
> +    {
> +      auto_vec<var_live_range> conflict_live_ranges;
> +      var_live_range live_range = live_ranges[i];
> +      conflict_live_ranges.safe_push (live_range);
> +      unsigned int min_point = live_range.start;
> +      unsigned int max_point = live_range.end;
> +      for (j = 0; j < live_ranges.length (); j++)
> +	{
> +	  if (j == i)
> +	    continue;
> +	  if (live_range_conflict_p (live_range, live_ranges[j]))
> +	    {
> +	      conflict_live_ranges.safe_push (live_ranges[j]);
> +	      min_point
> +		= std::min (min_point, (unsigned int) live_ranges[j].start);
> +	      max_point
> +		= std::max (max_point, (unsigned int) live_ranges[j].end);
> +	    }
> +	}
> +      for (j = min_point; j <= max_point; j++)
> +	{
> +	  unsigned int nregs = 0;
> +	  for (k = 0; k < conflict_live_ranges.length (); k++)
> +	    {
> +	      if (j >= (unsigned int) conflict_live_ranges[k].start
> +		  && j <= (unsigned int) conflict_live_ranges[k].end)
> +		{
> +		  machine_mode mode
> +		    = TYPE_MODE (TREE_TYPE (conflict_live_ranges[k].var));
> +		  nregs += compute_nregs_for_mode (mode, biggest_mode, lmul);
> +		}
> +	    }
> +	  if (nregs > max_nregs)
> +	    {
> +	      max_nregs = nregs;
> +	      live_point = j;
> +	    }
> +	}
> +    }

This looks pretty quadratic in the number of live ranges (or even cubic?).
Can't it be done more efficiently using a sliding-window approach by sorting
the live ranges according to their start point before?
Also std::min/max -> MIN/MAX.

> +
> +  /* Collect user explicit RVV type.  */
> +  hash_set<basic_block> all_preds = get_all_predecessors (bb);
> +  hash_set<basic_block> all_succs = get_all_successors (bb);

As mentioned before, maybe dominator info could help here?

> +  for (i = 0; i < cfun->gimple_df->ssa_names->length (); i++)
> +    {
> +      tree t = ssa_name (i);
> +      if (!t)
> +	continue;
> +      machine_mode mode = TYPE_MODE (TREE_TYPE (t));
> +      if (!lookup_vector_type_attribute (TREE_TYPE (t))
> +	  && !riscv_v_ext_vls_mode_p (mode))
> +	continue;
> +
> +      gimple *def = SSA_NAME_DEF_STMT (t);
> +      if (gimple_bb (def) && !all_preds.contains (gimple_bb (def)))
> +	continue;
> +      const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (t));
> +      const ssa_use_operand_t *ptr;
> +
> +      for (ptr = head->next; ptr != head; ptr = ptr->next)
> +	{
> +	  if (USE_STMT (ptr) && !is_gimple_debug (USE_STMT (ptr)))
> +	    {
> +	      if (all_succs.contains (gimple_bb (USE_STMT (ptr))))
> +		{

Reverse the conditions and continue, i.e. if (!USE_STMT || is_gimple_debug
 || !all_succs.contains).

> +
> +static int
> +compute_lmul (class loop *loop)
> +{
> +  unsigned int current_lmul = loop_autovec_infos.get (loop)->current_lmul;
> +  return current_lmul;
> +}

return loop_autovec_infos.get (loop)->current_lmul;
Besides, the function does not compute so maybe change its name?
And assert that we have loop_autovec_infos for that loop.
Or maybe inline it altogether.

> +bool
> +costs::prefer_new_lmul_p (const vector_costs *uncast_other) const
> +{

I don't remember how often the cost hooks are actually called so maybe
this is unnecessary: Does it make sense to cache the computation results
here in case we are called more than once for the same loop? 

Regarding tests:  I would like to have at least one test with global
variables that overlap local ones.

Regards
 Robin


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH V2] RISC-V: Support Dynamic LMUL Cost model
  2023-09-05 21:39   ` 钟居哲
@ 2023-09-05 21:53     ` Jeff Law
  0 siblings, 0 replies; 5+ messages in thread
From: Jeff Law @ 2023-09-05 21:53 UTC (permalink / raw)
  To: 钟居哲, rdapp.gcc, gcc-patches; +Cc: kito.cheng, kito.cheng



On 9/5/23 15:39, 钟居哲 wrote:
> - Why don't we use the normal reverse postorder (or postorder) approach of
>     computing live ranges?  Is that because we don't really need full global
>     live ranges?
> 
> Yes. We don't need global live ranges.
> 
> - Why can't we use existing code i.e. tree-ssa-live?  I suspect I already
>     know the answer but an explanation (in a comment) would still be useful.
> 
> The existing code can't help I have tried many times.
I would expect it to be fairly hard to use for this purpose.  I've tried 
to use it in other contexts as well without success.

Jeff

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH V2] RISC-V: Support Dynamic LMUL Cost model
  2023-09-05  8:56 Juzhe-Zhong
@ 2023-09-05 21:02 ` Robin Dapp
  2023-09-05 21:39   ` 钟居哲
  2023-09-06 13:22 ` Robin Dapp
  1 sibling, 1 reply; 5+ messages in thread
From: Robin Dapp @ 2023-09-05 21:02 UTC (permalink / raw)
  To: Juzhe-Zhong, gcc-patches; +Cc: rdapp.gcc, kito.cheng, kito.cheng, jeffreyalaw

Hi Juzhe,

I think the general approach makes sense and it doesn't need to be perfect
from the beginning as we can always iterate on it.  Before continuing with a
more detailed review (hopefully tomorrow) some high-level questions upfront.
It would help to document some of these choices so it's easier to understand
the rationale.

 - Why don't we use the normal reverse postorder (or postorder) approach of
   computing live ranges?  Is that because we don't really need full global
   live ranges?

 - Why can't we use existing code i.e. tree-ssa-live?  I suspect I already
   know the answer but an explanation (in a comment) would still be useful.

 - Do we really need get_all_predecessors/get_all_successors?  As they're
   only used for "defined before" and "used after", at first glance it
   looks like some kind of dominance info could help there but I didn't
   really check in detail.

 - Why don't we use bitmaps/sbitmaps like in vsetvl.cc and other related
   passes?  I don't mind maps but just wonder if it's on purpose, for
   convenience or something else. 

Besides, it might help to rename program_points_map (into program_points_per_bb
or so).  At first it looked quadratic to me but we're just iterating over
the program points of a BB.

Regards
 Robin


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH V2] RISC-V: Support Dynamic LMUL Cost model
@ 2023-09-05  8:56 Juzhe-Zhong
  2023-09-05 21:02 ` Robin Dapp
  2023-09-06 13:22 ` Robin Dapp
  0 siblings, 2 replies; 5+ messages in thread
From: Juzhe-Zhong @ 2023-09-05  8:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: kito.cheng, kito.cheng, jeffreyalaw, rdapp.gcc, Juzhe-Zhong

This patch support dynamic LMUL cost modeling with --param=riscv-autovec-lmul=dynamic.

Consider this following case:
void
foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
      int32_t *__restrict d,
      int32_t *__restrict d2,
      int32_t *__restrict d3,
      int32_t *__restrict d4,
      int32_t *__restrict d5,
      int n)
{
  for (int i = 0; i < n; i++)
    {
      a[i] = b[i] + c[i];
      b5[i] = b[i] + c[i];
      a2[i] = b2[i] + c2[i];
      a3[i] = b3[i] + c3[i];
      a4[i] = b4[i] + c4[i];
      a5[i] = a[i] + a4[i];
      d2[i] = a2[i] + c2[i];
      d3[i] = a3[i] + c3[i];
      d4[i] = a4[i] + c4[i];
      d5[i] = a[i] + a4[i];
      a[i] = a5[i] + b5[i] + a[i];

      c2[i] = a[i] + c[i];
      c3[i] = b5[i] * a5[i];
      c4[i] = a2[i] * a3[i];
      c5[i] = b5[i] * a2[i];
      c[i] = a[i] + c3[i];
      c2[i] = a[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i]
      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
      * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
    }
}

Demo: https://godbolt.org/z/x1acoMxGT

You can see it will produce register spilling if you specify LMUL >= 4

Now, with --param=riscv-autovec-lmul=dynamic.

GCC is able to pick LMUL = 2 to optimized this case.

This feature is supported by linear scan based local live ranges analysis and
compute maximum live V_REGS in specific program point of the function to determine the VF/LMUL.

Note that this patch can well handle both SLP and non-SLP loop.

Currenty approach didn't consider the later instruction scheduler which may improve the register pressure.
In this case, we are conservatively applying smaller VF/LMUL. (Not sure whether we should support live range shrink for such corner case since we don't known whether it can improve performance a lot.)

gcc/ChangeLog:

	* config/riscv/riscv-vector-costs.cc (get_last_live_range): New function.
	(compute_nregs_for_mode): Ditto.
	(live_range_conflict_p): Ditto.
	(max_number_of_live_regs): Ditto.
	(compute_lmul): Ditto.
	(costs::prefer_new_lmul_p): Ditto.
	(costs::better_main_loop_than_p): Ditto.
	* config/riscv/riscv-vector-costs.h (struct stmt_point): New struct.
	(struct var_live_range): Ditto.
	(struct autovec_info): Ditto.
	* config/riscv/t-riscv: Update makefile for COST model.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp: New test.

---
 gcc/config/riscv/riscv-vector-costs.cc        | 463 ++++++++++++++++++
 gcc/config/riscv/riscv-vector-costs.h         |  25 +
 gcc/config/riscv/t-riscv                      |   3 +-
 .../riscv/rvv/dynamic-lmul-mixed-1.c          |  50 ++
 .../costmodel/riscv/rvv/dynamic-lmul1-1.c     |  91 ++++
 .../costmodel/riscv/rvv/dynamic-lmul1-2.c     |  63 +++
 .../costmodel/riscv/rvv/dynamic-lmul1-3.c     |  91 ++++
 .../costmodel/riscv/rvv/dynamic-lmul1-4.c     | 121 +++++
 .../costmodel/riscv/rvv/dynamic-lmul1-5.c     | 149 ++++++
 .../costmodel/riscv/rvv/dynamic-lmul1-6.c     | 150 ++++++
 .../costmodel/riscv/rvv/dynamic-lmul1-7.c     |  48 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-1.c     |  51 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-2.c     |  51 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-3.c     |  51 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-4.c     |  49 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-1.c     |  35 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-2.c     |  35 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-3.c     |  47 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-4.c     |  47 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-5.c     |  47 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-6.c     |  27 +
 .../costmodel/riscv/rvv/dynamic-lmul4-7.c     |  47 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-8.c     |  36 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-1.c     |  18 +
 .../costmodel/riscv/rvv/dynamic-lmul8-10.c    |  22 +
 .../costmodel/riscv/rvv/dynamic-lmul8-2.c     |  18 +
 .../costmodel/riscv/rvv/dynamic-lmul8-3.c     |  18 +
 .../costmodel/riscv/rvv/dynamic-lmul8-4.c     |  19 +
 .../costmodel/riscv/rvv/dynamic-lmul8-5.c     |  25 +
 .../costmodel/riscv/rvv/dynamic-lmul8-6.c     |  23 +
 .../costmodel/riscv/rvv/dynamic-lmul8-7.c     |  23 +
 .../costmodel/riscv/rvv/dynamic-lmul8-8.c     |  19 +
 .../costmodel/riscv/rvv/dynamic-lmul8-9.c     |  19 +
 .../riscv/rvv/rvv-costmodel-vect.exp          |  52 ++
 34 files changed, 2032 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp

diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index 1a5e13d5eb3..3158bd634a6 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -36,16 +36,479 @@ along with GCC; see the file COPYING3.  If not see
 #include "fold-const.h"
 #include "tm_p.h"
 #include "tree-vectorizer.h"
+#include "gimple-iterator.h"
+#include "bitmap.h"
+#include "ssa.h"
+#include "backend.h"
 
 /* This file should be included last.  */
 #include "riscv-vector-costs.h"
 
 namespace riscv_vector {
 
+/* Dynamic LMUL philosophy - Local linear-scan SSA live range based analysis
+   determine LMUL
+
+     - Collect all vectorize STMTs locally for each loop block.
+     - Build program point based graph, ignore non-vectorize STMTs:
+
+	   vectorize STMT 0 - point 0
+	   scalar STMT 0 - ignore.
+	   vectorize STMT 1 - point 1
+	   ...
+     - Compute the number of live V_REGs live at each program point
+     - Determine LMUL in VECTOR COST model according to the program point
+       which has maximum live V_REGs.
+
+     Note:
+
+     - BIGGEST_MODE is the biggest LMUL auto-vectorization element mode.
+       It's important for mixed size auto-vectorization (Conversions, ... etc).
+       E.g. For a loop that is vectorizing conversion of INT32 -> INT64.
+       The biggest mode is DImode and LMUL = 8, LMUL = 4 for SImode.
+       We compute the number live V_REGs at each program point according to
+       this information.
+     - We only compute program points and live ranges locally (within a block)
+       since we just need to compute the number of live V_REGs at each program
+       point and we are not really allocating the registers for each SSA.
+       We can make the variable has another local live range in another block
+       if it live out/live in to another block.  Such approach doesn't affect
+       out accurate live range analysis.
+     - Current analysis didn't consider any instruction scheduling which
+       may improve the register pressure.  So we are conservatively doing the
+       analysis which may end up with smaller LMUL.
+       TODO: Maybe we could support a reasonable live range shrink algorithm
+       which take advantage of instruction scheduling.
+     - We may have these following possible autovec modes analysis:
+
+	 1. M8 -> M4 -> M2 -> M1 (stop analysis here) -> MF2 -> MF4 -> MF8
+	 2. M8 -> M1(M4) -> MF2(M2) -> MF4(M1) (stop analysis here) -> MF8(MF2)
+	 3. M1(M8) -> MF2(M4) -> MF4(M2) -> MF8(M1)
+*/
+static hash_map<class loop *, autovec_info> loop_autovec_infos;
+
+static int
+get_last_live_range (const vec<var_live_range> &live_ranges, tree var)
+{
+  unsigned int ix;
+  var_live_range *live_range;
+  FOR_EACH_VEC_ELT_REVERSE (live_ranges, ix, live_range)
+    if (live_range->var == var)
+      return ix;
+  return -1;
+}
+
+static void
+compute_local_program_points (
+  vec_info *vinfo, hash_map<basic_block, vec<stmt_point>> &program_points_map)
+{
+  if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
+    {
+      class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+      basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+      unsigned int nbbs = loop->num_nodes;
+      gimple_stmt_iterator si;
+      unsigned int i;
+      /* Collect the stmts that is vectorized and mark their program point.  */
+      for (i = 0; i < nbbs; i++)
+	{
+	  int point = 0;
+	  basic_block bb = bbs[i];
+	  vec<stmt_point> program_points = vNULL;
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "Compute local program points for bb %d:\n",
+			     bb->index);
+	  for (si = gsi_start_bb (bbs[i]); !gsi_end_p (si); gsi_next (&si))
+	    {
+	      if (!(is_gimple_assign (gsi_stmt (si))
+		    || is_gimple_call (gsi_stmt (si))))
+		continue;
+	      stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi_stmt (si));
+	      if (STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info))
+		  != undef_vec_info_type)
+		{
+		  stmt_point info = {point, gsi_stmt (si)};
+		  program_points.safe_push (info);
+		  point++;
+		  if (dump_enabled_p ())
+		    dump_printf_loc (MSG_NOTE, vect_location,
+				     "program point %d: %G", info.point,
+				     gsi_stmt (si));
+		}
+	    }
+	  program_points_map.put (bb, program_points);
+	}
+    }
+}
+
+static machine_mode
+compute_local_live_ranges (
+  const hash_map<basic_block, vec<stmt_point>> &program_points_map,
+  hash_map<basic_block, vec<var_live_range>> &live_ranges_map)
+{
+  machine_mode biggest_mode = QImode;
+  if (!program_points_map.is_empty ())
+    {
+      auto_vec<tree> visited_vars;
+      unsigned int i;
+      for (hash_map<basic_block, vec<stmt_point>>::iterator iter
+	   = program_points_map.begin ();
+	   iter != program_points_map.end (); ++iter)
+	{
+	  basic_block bb = (*iter).first;
+	  vec<stmt_point> program_points = (*iter).second;
+	  vec<var_live_range> live_ranges = vNULL;
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "Compute local live ranges for bb %d:\n",
+			     bb->index);
+	  for (const auto program_point : program_points)
+	    {
+	      int point = program_point.point;
+	      gimple *stmt = program_point.stmt;
+	      if (!gimple_store_p (stmt))
+		{
+		  tree lhs = gimple_get_lhs (stmt);
+		  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+		  if (GET_MODE_SIZE (mode).to_constant ()
+		      > GET_MODE_SIZE (biggest_mode).to_constant ())
+		    biggest_mode = mode;
+		  var_live_range range = {lhs, point, point};
+		  live_ranges.safe_push (range);
+		}
+	      for (i = 0; i < gimple_num_args (stmt); i++)
+		{
+		  tree var = gimple_arg (stmt, i);
+		  if (is_gimple_reg (var) && !POINTER_TYPE_P (TREE_TYPE (var)))
+		    {
+		      machine_mode mode = TYPE_MODE (TREE_TYPE (var));
+		      if (GET_MODE_SIZE (mode).to_constant ()
+			  > GET_MODE_SIZE (biggest_mode).to_constant ())
+			biggest_mode = mode;
+		      int index = get_last_live_range (live_ranges, var);
+		      if (index == -1)
+			{
+			  var_live_range range = {var, 0, point};
+			  live_ranges.safe_push (range);
+			}
+		      else
+			live_ranges[index].end = point;
+		    }
+		}
+	    }
+	  live_ranges_map.put (bb, live_ranges);
+	  if (dump_enabled_p ())
+	    for (i = 0; i < live_ranges.length (); i++)
+	      dump_printf_loc (MSG_NOTE, vect_location,
+			       "%T: type = %T, start = %d, end = %d\n",
+			       live_ranges[i].var,
+			       TREE_TYPE (live_ranges[i].var),
+			       live_ranges[i].start, live_ranges[i].end);
+	}
+    }
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "Biggest mode = %s\n",
+		     GET_MODE_NAME (biggest_mode));
+  return biggest_mode;
+}
+
+static unsigned int
+compute_nregs_for_mode (machine_mode mode, machine_mode biggest_mode, int lmul)
+{
+  unsigned int mode_size = GET_MODE_SIZE (mode).to_constant ();
+  unsigned int biggest_size = GET_MODE_SIZE (biggest_mode).to_constant ();
+  gcc_assert (biggest_size >= mode_size);
+  unsigned int ratio = biggest_size / mode_size;
+  return lmul / ratio;
+}
+
+static bool
+live_range_conflict_p (const var_live_range &live_range1,
+		       const var_live_range &live_range2)
+{
+  if (live_range1.start >= live_range2.end)
+    return false;
+  if (live_range1.end <= live_range2.start)
+    return false;
+  if (live_range2.start >= live_range1.end)
+    return false;
+  if (live_range2.end <= live_range1.start)
+    return false;
+  return true;
+}
+
+static unsigned int
+max_number_of_live_regs (const basic_block bb,
+			 const vec<var_live_range> &live_ranges,
+			 machine_mode biggest_mode, int lmul)
+{
+  unsigned int max_nregs = 0;
+  unsigned int i, j, k;
+  unsigned int live_point = 0;
+  for (i = 0; i < live_ranges.length (); i++)
+    {
+      auto_vec<var_live_range> conflict_live_ranges;
+      var_live_range live_range = live_ranges[i];
+      conflict_live_ranges.safe_push (live_range);
+      unsigned int min_point = live_range.start;
+      unsigned int max_point = live_range.end;
+      for (j = 0; j < live_ranges.length (); j++)
+	{
+	  if (j == i)
+	    continue;
+	  if (live_range_conflict_p (live_range, live_ranges[j]))
+	    {
+	      conflict_live_ranges.safe_push (live_ranges[j]);
+	      min_point
+		= std::min (min_point, (unsigned int) live_ranges[j].start);
+	      max_point
+		= std::max (max_point, (unsigned int) live_ranges[j].end);
+	    }
+	}
+      for (j = min_point; j <= max_point; j++)
+	{
+	  unsigned int nregs = 0;
+	  for (k = 0; k < conflict_live_ranges.length (); k++)
+	    {
+	      if (j >= (unsigned int) conflict_live_ranges[k].start
+		  && j <= (unsigned int) conflict_live_ranges[k].end)
+		{
+		  machine_mode mode
+		    = TYPE_MODE (TREE_TYPE (conflict_live_ranges[k].var));
+		  nregs += compute_nregs_for_mode (mode, biggest_mode, lmul);
+		}
+	    }
+	  if (nregs > max_nregs)
+	    {
+	      max_nregs = nregs;
+	      live_point = j;
+	    }
+	}
+    }
+
+  /* Collect user explicit RVV type.  */
+  hash_set<basic_block> all_preds = get_all_predecessors (bb);
+  hash_set<basic_block> all_succs = get_all_successors (bb);
+  for (i = 0; i < cfun->gimple_df->ssa_names->length (); i++)
+    {
+      tree t = ssa_name (i);
+      if (!t)
+	continue;
+      machine_mode mode = TYPE_MODE (TREE_TYPE (t));
+      if (!lookup_vector_type_attribute (TREE_TYPE (t))
+	  && !riscv_v_ext_vls_mode_p (mode))
+	continue;
+
+      gimple *def = SSA_NAME_DEF_STMT (t);
+      if (gimple_bb (def) && !all_preds.contains (gimple_bb (def)))
+	continue;
+      const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (t));
+      const ssa_use_operand_t *ptr;
+
+      for (ptr = head->next; ptr != head; ptr = ptr->next)
+	{
+	  if (USE_STMT (ptr) && !is_gimple_debug (USE_STMT (ptr)))
+	    {
+	      if (all_succs.contains (gimple_bb (USE_STMT (ptr))))
+		{
+		  int regno_alignment = riscv_get_v_regno_alignment (mode);
+		  max_nregs += regno_alignment;
+		  if (dump_enabled_p ())
+		    dump_printf_loc (
+		      MSG_NOTE, vect_location,
+		      "Explicit used SSA %T, vectype = %T, mode = %s, cause %d "
+		      "V_REG live in bb %d at program point %d\n",
+		      t, TREE_TYPE (t), GET_MODE_NAME (mode), regno_alignment,
+		      bb->index, live_point);
+		  break;
+		}
+	    }
+	}
+    }
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "Maximum lmul = %d, %d number of live V_REG at program "
+		     "point %d for bb %d\n",
+		     lmul, max_nregs, live_point, bb->index);
+  return max_nregs;
+}
+
+static int
+compute_lmul (class loop *loop)
+{
+  unsigned int current_lmul = loop_autovec_infos.get (loop)->current_lmul;
+  return current_lmul;
+}
+
+static void
+update_local_live_ranges (
+  vec_info *vinfo, hash_map<basic_block, vec<stmt_point>> &program_points_map,
+  hash_map<basic_block, vec<var_live_range>> &live_ranges_map)
+{
+  if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
+    {
+      class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+      basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+      unsigned int nbbs = loop->num_nodes;
+      unsigned int i, j, k;
+      gphi_iterator psi;
+      for (i = 0; i < nbbs; i++)
+	{
+	  basic_block bb = bbs[i];
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "Update local program points for bb %d:\n",
+			     bb->index);
+	  for (psi = gsi_start_phis (bbs[i]); !gsi_end_p (psi); gsi_next (&psi))
+	    {
+	      gphi *phi = psi.phi ();
+	      stmt_vec_info stmt_info = vinfo->lookup_stmt (phi);
+	      if (STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info))
+		  != undef_vec_info_type)
+		{
+		  for (j = 0; j < gimple_phi_num_args (phi); j++)
+		    {
+		      edge e = gimple_phi_arg_edge (phi, j);
+		      tree def = gimple_phi_arg_def (phi, j);
+		      auto *live_ranges = live_ranges_map.get (e->src);
+		      if (!program_points_map.get (e->src))
+			continue;
+		      int max_point
+			= (*program_points_map.get (e->src)).length () - 1;
+		      for (k = 0; k < (*live_ranges).length (); k++)
+			{
+			  if ((*live_ranges)[i].var == def)
+			    {
+			      int end = (*live_ranges)[i].end;
+			      (*live_ranges)[i].end = max_point;
+			      if (dump_enabled_p ())
+				dump_printf_loc (
+				  MSG_NOTE, vect_location,
+				  "Update %T end point from %d to %d:\n",
+				  (*live_ranges)[i].var, end,
+				  (*live_ranges)[i].end);
+			    }
+			}
+		    }
+		}
+	    }
+	}
+    }
+}
+
 costs::costs (vec_info *vinfo, bool costing_for_scalar)
   : vector_costs (vinfo, costing_for_scalar)
 {}
 
+bool
+costs::prefer_new_lmul_p (const vector_costs *uncast_other) const
+{
+  auto other = static_cast<const costs *> (uncast_other);
+  auto this_loop_vinfo = as_a<loop_vec_info> (this->m_vinfo);
+  auto other_loop_vinfo = as_a<loop_vec_info> (other->m_vinfo);
+  class loop *loop = LOOP_VINFO_LOOP (this_loop_vinfo);
+
+  if (!LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (this_loop_vinfo)
+      && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (other_loop_vinfo))
+    return false;
+
+  if (loop_autovec_infos.get (loop) && loop_autovec_infos.get (loop)->end_p)
+    return false;
+  else if (loop_autovec_infos.get (loop))
+    loop_autovec_infos.get (loop)->current_lmul
+      = loop_autovec_infos.get (loop)->current_lmul / 2;
+  else
+    {
+      int regno_alignment
+	= riscv_get_v_regno_alignment (other_loop_vinfo->vector_mode);
+      if (known_eq (LOOP_VINFO_SLP_UNROLLING_FACTOR (other_loop_vinfo), 1U))
+	regno_alignment = RVV_M8;
+      loop_autovec_infos.put (loop, {regno_alignment, regno_alignment, false});
+    }
+
+  int lmul = compute_lmul (loop);
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "Comparing two main loops (%s at VF %d vs %s at VF %d)\n",
+		     GET_MODE_NAME (this_loop_vinfo->vector_mode),
+		     vect_vf_for_cost (this_loop_vinfo),
+		     GET_MODE_NAME (other_loop_vinfo->vector_mode),
+		     vect_vf_for_cost (other_loop_vinfo));
+
+  /* Compute local program points.
+     It's a fast and effective computation.  */
+  hash_map<basic_block, vec<stmt_point>> program_points_map;
+  compute_local_program_points (other->m_vinfo, program_points_map);
+
+  /* Compute local live ranges.  */
+  hash_map<basic_block, vec<var_live_range>> live_ranges_map;
+  machine_mode biggest_mode
+    = compute_local_live_ranges (program_points_map, live_ranges_map);
+
+  /* Update live ranges according to PHI.  */
+  update_local_live_ranges (other->m_vinfo, program_points_map,
+			    live_ranges_map);
+
+  /* TODO: We calculate the maximum live vars base on current STMTS
+     sequence.  We can support live range shrink if it can give us
+     big improvement in the future.  */
+  if (!program_points_map.is_empty ())
+    {
+      for (hash_map<basic_block, vec<stmt_point>>::iterator iter
+	   = program_points_map.begin ();
+	   iter != program_points_map.end (); ++iter)
+	{
+	  vec<stmt_point> program_points = (*iter).second;
+	  if (!program_points.is_empty ())
+	    program_points.release ();
+	}
+      program_points_map.empty ();
+    }
+  if (!live_ranges_map.is_empty ())
+    {
+      unsigned int max_nregs = 0;
+      for (hash_map<basic_block, vec<var_live_range>>::iterator iter
+	   = live_ranges_map.begin ();
+	   iter != live_ranges_map.end (); ++iter)
+	{
+	  basic_block bb = (*iter).first;
+	  vec<var_live_range> live_ranges = (*iter).second;
+	  if (live_ranges.is_empty ())
+	    continue;
+	  /* We prefer larger LMUL unless it causes register spillings.  */
+	  unsigned int nregs
+	    = max_number_of_live_regs (bb, live_ranges, biggest_mode, lmul);
+	  if (nregs > max_nregs)
+	    max_nregs = nregs;
+	  live_ranges.release ();
+	}
+      live_ranges_map.empty ();
+      if (loop_autovec_infos.get (loop)->current_lmul == RVV_M1
+	  || max_nregs <= V_REG_NUM)
+	loop_autovec_infos.get (loop)->end_p = true;
+      if (loop_autovec_infos.get (loop)->current_lmul > RVV_M1)
+	return max_nregs > V_REG_NUM;
+      return false;
+    }
+  return lmul > RVV_M1;
+}
+
+bool
+costs::better_main_loop_than_p (const vector_costs *uncast_other) const
+{
+  auto other = static_cast<const costs *> (uncast_other);
+
+  if (!flag_vect_cost_model)
+    return vector_costs::better_main_loop_than_p (other);
+
+  if (riscv_autovec_lmul == RVV_DYNAMIC)
+    return prefer_new_lmul_p (uncast_other);
+
+  return vector_costs::better_main_loop_than_p (other);
+}
+
 unsigned
 costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
 		      stmt_vec_info stmt_info, slp_tree, tree vectype,
diff --git a/gcc/config/riscv/riscv-vector-costs.h b/gcc/config/riscv/riscv-vector-costs.h
index 57b1be01048..c0abb8e6455 100644
--- a/gcc/config/riscv/riscv-vector-costs.h
+++ b/gcc/config/riscv/riscv-vector-costs.h
@@ -23,6 +23,27 @@
 
 namespace riscv_vector {
 
+struct stmt_point
+{
+  /* Program point.  */
+  int point;
+  gimple *stmt;
+};
+
+struct var_live_range
+{
+  tree var;
+  int start;
+  int end;
+};
+
+struct autovec_info
+{
+  unsigned int initial_lmul;
+  unsigned int current_lmul;
+  bool end_p;
+};
+
 /* rvv-specific vector costs.  */
 class costs : public vector_costs
 {
@@ -31,12 +52,16 @@ class costs : public vector_costs
 public:
   costs (vec_info *, bool);
 
+  bool better_main_loop_than_p (const vector_costs *other) const override;
+
 private:
   unsigned int add_stmt_cost (int count, vect_cost_for_stmt kind,
 			      stmt_vec_info stmt_info, slp_tree node,
 			      tree vectype, int misalign,
 			      vect_cost_model_location where) override;
   void finish_cost (const vector_costs *) override;
+
+  bool prefer_new_lmul_p (const vector_costs *) const;
 };
 
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index b1f80d1d87c..ec5d563859e 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -70,7 +70,8 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
 riscv-vector-costs.o: $(srcdir)/config/riscv/riscv-vector-costs.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TARGET_H) $(FUNCTION_H) \
   $(TREE_H) basic-block.h $(RTL_H) gimple.h targhooks.h cfgloop.h \
-  fold-const.h $(TM_P_H) tree-vectorizer.h \
+  fold-const.h $(TM_P_H) tree-vectorizer.h gimple-iterator.h bitmap.h \
+  ssa.h backend.h \
   $(srcdir)/config/riscv/riscv-vector-costs.h
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/riscv/riscv-vector-costs.cc
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c
new file mode 100644
index 00000000000..fd9f38bc766
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+     int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+     int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+     int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+     int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+     int32_t *__restrict d4, int32_t *__restrict d5, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = d5[i] + b[i];
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]
+	     + a[i] * a2[i] * a3[i] * a4[i] * a5[i] * c[i] * c2[i] * c3[i]
+		 * c4[i] * c5[i] * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c
new file mode 100644
index 00000000000..6c414bcd115
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c
@@ -0,0 +1,91 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int32_t *__restrict e,
+      int32_t *__restrict e2,
+      int32_t *__restrict e3,
+      int32_t *__restrict e4,
+      int32_t *__restrict e5,
+      int32_t *__restrict f,
+      int32_t *__restrict f2,
+      int32_t *__restrict f3,
+      int32_t *__restrict f4,
+      int32_t *__restrict f5,
+      int32_t *__restrict g,
+      int32_t *__restrict g2,
+      int32_t *__restrict g3,
+      int32_t *__restrict g4,
+      int32_t *__restrict g5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c
new file mode 100644
index 00000000000..b77f3ff58ed
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int32_t *__restrict e,
+      int32_t *__restrict e2,
+      int32_t *__restrict e3,
+      int32_t *__restrict e4,
+      int32_t *__restrict e5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      e[i] = a2[i] + c2[i];
+      e2[i] = d2[i] + a2[i];
+      e3[i] = d3[i] + a3[i];
+      e4[i] = d4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i];
+    }
+}
+
+/* FIXME: Choosing LMUL = 1 is not the optimal since it can be LMUL = 2 if we apply instruction scheduler.  */
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c
new file mode 100644
index 00000000000..164930c9bba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c
@@ -0,0 +1,91 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b,    int8_t *__restrict c,
+      int8_t *__restrict a2, int8_t *__restrict b2, int8_t *__restrict c2,
+      int8_t *__restrict a3, int8_t *__restrict b3, int8_t *__restrict c3,
+      int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict c4,
+      int8_t *__restrict a5, int8_t *__restrict b5, int8_t *__restrict c5,
+      int8_t *__restrict d,
+      int8_t *__restrict d2,
+      int8_t *__restrict d3,
+      int8_t *__restrict d4,
+      int8_t *__restrict d5,
+      int8_t *__restrict e,
+      int8_t *__restrict e2,
+      int8_t *__restrict e3,
+      int8_t *__restrict e4,
+      int8_t *__restrict e5,
+      int8_t *__restrict f,
+      int8_t *__restrict f2,
+      int8_t *__restrict f3,
+      int8_t *__restrict f4,
+      int8_t *__restrict f5,
+      int8_t *__restrict g,
+      int8_t *__restrict g2,
+      int8_t *__restrict g3,
+      int8_t *__restrict g4,
+      int8_t *__restrict g5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c
new file mode 100644
index 00000000000..8d80fbfe390
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c
@@ -0,0 +1,121 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int32_t *__restrict e,
+      int32_t *__restrict e2,
+      int32_t *__restrict e3,
+      int32_t *__restrict e4,
+      int32_t *__restrict e5,
+      int32_t *__restrict f,
+      int32_t *__restrict f2,
+      int32_t *__restrict f3,
+      int32_t *__restrict f4,
+      int32_t *__restrict f5,
+      int32_t *__restrict g,
+      int32_t *__restrict g2,
+      int32_t *__restrict g3,
+      int32_t *__restrict g4,
+      int32_t *__restrict g5,
+  
+      int32_t *__restrict gg,
+      int32_t *__restrict gg2,
+      int32_t *__restrict gg3,
+      int32_t *__restrict gg4,
+      int32_t *__restrict gg5,
+  
+      int32_t *__restrict ggg,
+      int32_t *__restrict ggg2,
+      int32_t *__restrict ggg3,
+      int32_t *__restrict ggg4,
+      int32_t *__restrict ggg5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+
+      gg[i] = f2[i] + c2[i];
+      gg2[i] = f2[i] + d2[i];
+      gg3[i] = f3[i] + d3[i];
+      gg4[i] = f4[i] + a4[i];
+      gg5[i] = f[i] + a4[i];
+      gg5[i] = f5[i] + a4[i];
+
+
+      ggg[i] = f2[i] + c2[i];
+      ggg2[i] = f2[i] + d2[i];
+      ggg3[i] = f3[i] + d3[i];
+      ggg4[i] = f4[i] + a4[i];
+      ggg5[i] = f[i] + a4[i];
+      ggg5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i]
+      * gg[i] * gg2[i] * gg3[i] * gg4[i] * gg5[i]
+      * ggg[i] * ggg2[i] * ggg3[i] * ggg4[i] * ggg5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c
new file mode 100644
index 00000000000..7b4014ddaf6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c
@@ -0,0 +1,149 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int32_t *__restrict e,
+      int32_t *__restrict e2,
+      int32_t *__restrict e3,
+      int32_t *__restrict e4,
+      int32_t *__restrict e5,
+      int32_t *__restrict f,
+      int32_t *__restrict f2,
+      int32_t *__restrict f3,
+      int32_t *__restrict f4,
+      int32_t *__restrict f5,
+      int32_t *__restrict g,
+      int32_t *__restrict g2,
+      int32_t *__restrict g3,
+      int32_t *__restrict g4,
+      int32_t *__restrict g5,
+  
+      int32_t *__restrict gg,
+      int32_t *__restrict gg2,
+      int32_t *__restrict gg3,
+      int32_t *__restrict gg4,
+      int32_t *__restrict gg5,
+  
+      int32_t *__restrict ggg,
+      int32_t *__restrict ggg2,
+      int32_t *__restrict ggg3,
+      int32_t *__restrict ggg4,
+      int32_t *__restrict ggg5,
+
+      int32_t *__restrict gggg,
+      int32_t *__restrict gggg2,
+      int32_t *__restrict gggg3,
+      int32_t *__restrict gggg4,
+      int32_t *__restrict gggg5,
+
+      int32_t *__restrict ggggg,
+      int32_t *__restrict ggggg2,
+      int32_t *__restrict ggggg3,
+      int32_t *__restrict ggggg4,
+      int32_t *__restrict ggggg5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+
+      gg[i] = f2[i] + c2[i];
+      gg2[i] = f2[i] + d2[i];
+      gg3[i] = f3[i] + d3[i];
+      gg4[i] = f4[i] + a4[i];
+      gg5[i] = f[i] + a4[i];
+      gg5[i] = f5[i] + a4[i];
+
+
+      ggg[i] = f2[i] + c2[i];
+      ggg2[i] = f2[i] + d2[i];
+      ggg3[i] = f3[i] + d3[i];
+      ggg4[i] = f4[i] + a4[i];
+      ggg5[i] = f[i] + a4[i];
+      ggg5[i] = f5[i] + a4[i];
+
+      gggg[i] = f2[i] + c2[i];
+      gggg2[i] = f2[i] + d2[i];
+      gggg3[i] = f3[i] + d3[i];
+      gggg4[i] = f4[i] + a4[i];
+      gggg5[i] = f[i] + a4[i];
+      gggg5[i] = f5[i] + a4[i];
+
+      ggggg[i] = f2[i] + c2[i];
+      ggggg2[i] = f2[i] + d2[i];
+      ggggg3[i] = f3[i] + d3[i];
+      ggggg4[i] = f4[i] + a4[i];
+      ggggg5[i] = f[i] + a4[i];
+      ggggg5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i]
+      * gg[i] * gg2[i] * gg3[i] * gg4[i] * gg5[i]
+      * ggg[i] * ggg2[i] * ggg3[i] * ggg4[i] * ggg5[i]
+      * gggg[i] * gggg2[i] * gggg3[i] * gggg4[i] * gggg5[i]
+      * ggggg[i] * ggggg2[i] * ggggg3[i] * ggggg4[i] * ggggg5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c
new file mode 100644
index 00000000000..51d05f2bec9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c
@@ -0,0 +1,150 @@
+
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b,    int8_t *__restrict c,
+      int8_t *__restrict a2, int8_t *__restrict b2, int8_t *__restrict c2,
+      int8_t *__restrict a3, int8_t *__restrict b3, int8_t *__restrict c3,
+      int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict c4,
+      int8_t *__restrict a5, int8_t *__restrict b5, int8_t *__restrict c5,
+      int8_t *__restrict d,
+      int8_t *__restrict d2,
+      int8_t *__restrict d3,
+      int8_t *__restrict d4,
+      int8_t *__restrict d5,
+      int8_t *__restrict e,
+      int8_t *__restrict e2,
+      int8_t *__restrict e3,
+      int8_t *__restrict e4,
+      int8_t *__restrict e5,
+      int8_t *__restrict f,
+      int8_t *__restrict f2,
+      int8_t *__restrict f3,
+      int8_t *__restrict f4,
+      int8_t *__restrict f5,
+      int8_t *__restrict g,
+      int8_t *__restrict g2,
+      int8_t *__restrict g3,
+      int8_t *__restrict g4,
+      int8_t *__restrict g5,
+  
+      int8_t *__restrict gg,
+      int8_t *__restrict gg2,
+      int8_t *__restrict gg3,
+      int8_t *__restrict gg4,
+      int8_t *__restrict gg5,
+  
+      int8_t *__restrict ggg,
+      int8_t *__restrict ggg2,
+      int8_t *__restrict ggg3,
+      int8_t *__restrict ggg4,
+      int8_t *__restrict ggg5,
+
+      int8_t *__restrict gggg,
+      int8_t *__restrict gggg2,
+      int8_t *__restrict gggg3,
+      int8_t *__restrict gggg4,
+      int8_t *__restrict gggg5,
+
+      int8_t *__restrict ggggg,
+      int8_t *__restrict ggggg2,
+      int8_t *__restrict ggggg3,
+      int8_t *__restrict ggggg4,
+      int8_t *__restrict ggggg5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+
+      gg[i] = f2[i] + c2[i];
+      gg2[i] = f2[i] + d2[i];
+      gg3[i] = f3[i] + d3[i];
+      gg4[i] = f4[i] + a4[i];
+      gg5[i] = f[i] + a4[i];
+      gg5[i] = f5[i] + a4[i];
+
+
+      ggg[i] = f2[i] + c2[i];
+      ggg2[i] = f2[i] + d2[i];
+      ggg3[i] = f3[i] + d3[i];
+      ggg4[i] = f4[i] + a4[i];
+      ggg5[i] = f[i] + a4[i];
+      ggg5[i] = f5[i] + a4[i];
+
+      gggg[i] = f2[i] + c2[i];
+      gggg2[i] = f2[i] + d2[i];
+      gggg3[i] = f3[i] + d3[i];
+      gggg4[i] = f4[i] + a4[i];
+      gggg5[i] = f[i] + a4[i];
+      gggg5[i] = f5[i] + a4[i];
+
+      ggggg[i] = f2[i] + c2[i];
+      ggggg2[i] = f2[i] + d2[i];
+      ggggg3[i] = f3[i] + d3[i];
+      ggggg4[i] = f4[i] + a4[i];
+      ggggg5[i] = f[i] + a4[i];
+      ggggg5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i]
+      * gg[i] * gg2[i] * gg3[i] * gg4[i] * gg5[i]
+      * ggg[i] * ggg2[i] * ggg3[i] * ggg4[i] * ggg5[i]
+      * gggg[i] * gggg2[i] * gggg3[i] * gggg4[i] * gggg5[i]
+      * ggggg[i] * ggggg2[i] * ggggg3[i] * ggggg4[i] * ggggg5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c
new file mode 100644
index 00000000000..dfd71414b62
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -Wno-psabi -fdump-tree-vect-details" } */
+
+#include "riscv_vector.h"
+
+vint32m8_t
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+     int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+     int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+     int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+     int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+     int32_t *__restrict d4, int32_t *__restrict d5, int n, vint32m8_t vector)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]
+	     + a[i] * a2[i] * a3[i] * a4[i] * a5[i] * c[i] * c2[i] * c3[i]
+		 * c4[i] * c5[i] * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+    return vector;
+}
+
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c
new file mode 100644
index 00000000000..ce83bb22324
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c
new file mode 100644
index 00000000000..a80b1b1556a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b,    int8_t *__restrict c,
+      int8_t *__restrict a2, int8_t *__restrict b2, int8_t *__restrict c2,
+      int8_t *__restrict a3, int8_t *__restrict b3, int8_t *__restrict c3,
+      int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict c4,
+      int8_t *__restrict a5, int8_t *__restrict b5, int8_t *__restrict c5,
+      int8_t *__restrict d,
+      int8_t *__restrict d2,
+      int8_t *__restrict d3,
+      int8_t *__restrict d4,
+      int8_t *__restrict d5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m2} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c
new file mode 100644
index 00000000000..ce83bb22324
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c
new file mode 100644
index 00000000000..9964f3fe8ba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include "riscv_vector.h"
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+     int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+     int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+     int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+     int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+     int32_t *__restrict d4, int32_t *__restrict d5, int n)
+{
+  vint32m1_t v = __riscv_vle32_v_i32m1 (a, 32);
+  __riscv_vse32_v_i32m1 (c, v, 32);
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]
+	     + a[i] * a2[i] * a3[i] * a4[i] * a5[i] * c[i] * c2[i] * c3[i]
+		 * c4[i] * c5[i] * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c
new file mode 100644
index 00000000000..156ccc7f98e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+      int32_t *__restrict d4, int32_t *__restrict d5, int n, int m)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d[i] = a[i] - a2[i];
+      d2[i] = a2[i] * a[i];
+      d3[i] = a3[i] * a2[i];
+      d4[i] = a2[i] * d2[i];
+      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c
new file mode 100644
index 00000000000..4cacc039dcb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b, int8_t *__restrict c,
+      int8_t *__restrict a2, int8_t *__restrict b2, int8_t *__restrict c2,
+      int8_t *__restrict a3, int8_t *__restrict b3, int8_t *__restrict c3,
+      int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict c4,
+      int8_t *__restrict a5, int8_t *__restrict b5, int8_t *__restrict c5,
+      int8_t *__restrict d, int8_t *__restrict d2, int8_t *__restrict d3,
+      int8_t *__restrict d4, int8_t *__restrict d5, int n, int m)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d[i] = a[i] - a2[i];
+      d2[i] = a2[i] * a[i];
+      d3[i] = a3[i] * a2[i];
+      d4[i] = a2[i] * d2[i];
+      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c
new file mode 100644
index 00000000000..2308109f00c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void foo2 (int64_t *__restrict a,
+          int8_t *__restrict b,
+          int8_t *__restrict c,
+          int8_t *__restrict a2,
+          int8_t *__restrict b2,
+          int8_t *__restrict c2,
+          int8_t *__restrict a3,
+          int8_t *__restrict b3,
+          int8_t *__restrict c3,
+          int8_t *__restrict a4,
+          int8_t *__restrict b4,
+          int8_t *__restrict c4,
+          int64_t *__restrict a5,
+          int8_t *__restrict b5,
+          int8_t *__restrict c5,
+          int n)
+{
+    for (int i = 0; i < n; i++){
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i]+ a[i];
+
+      a[i] = a[i] + c[i];
+      b5[i] = a[i] + c[i];
+      a2[i] = a[i] + c2[i];
+      a3[i] = a[i] + c3[i];
+      a4[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]+ a[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e64,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c
new file mode 100644
index 00000000000..2a1521bffb9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void foo2 (int64_t *__restrict a,
+          int32_t *__restrict b,
+          int32_t *__restrict c,
+          int32_t *__restrict a2,
+          int32_t *__restrict b2,
+          int32_t *__restrict c2,
+          int32_t *__restrict a3,
+          int32_t *__restrict b3,
+          int32_t *__restrict c3,
+          int32_t *__restrict a4,
+          int32_t *__restrict b4,
+          int32_t *__restrict c4,
+          int64_t *__restrict a5,
+          int32_t *__restrict b5,
+          int32_t *__restrict c5,
+          int n)
+{
+    for (int i = 0; i < n; i++){
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i]+ a[i];
+
+      a[i] = a[i] + c[i];
+      b5[i] = a[i] + c[i];
+      a2[i] = a[i] + c2[i];
+      a3[i] = a[i] + c3[i];
+      a4[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]+ a[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e64,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
new file mode 100644
index 00000000000..928a507a363
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void foo2 (int16_t *__restrict a,
+          int32_t *__restrict b,
+          int32_t *__restrict c,
+          int32_t *__restrict a2,
+          int32_t *__restrict b2,
+          int32_t *__restrict c2,
+          int32_t *__restrict a3,
+          int32_t *__restrict b3,
+          int32_t *__restrict c3,
+          int32_t *__restrict a4,
+          int32_t *__restrict b4,
+          int32_t *__restrict c4,
+          int16_t *__restrict a5,
+          int32_t *__restrict b5,
+          int32_t *__restrict c5,
+          int n)
+{
+    for (int i = 0; i < n; i++){
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i]+ a[i];
+
+      a[i] = a[i] + c[i];
+      b5[i] = a[i] + c[i];
+      a2[i] = a[i] + c2[i];
+      a3[i] = a[i] + c3[i];
+      a4[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]+ a[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c
new file mode 100644
index 00000000000..f16cfb9fd08
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fselective-scheduling -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (uint8_t *restrict a, uint8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8] = b[i * 8 + 7] + 1;
+      a[i * 8 + 1] = b[i * 8 + 6] + 2;
+      a[i * 8 + 2] = b[i * 8 + 5] + 3;
+      a[i * 8 + 3] = b[i * 8 + 4] + 4;
+      a[i * 8 + 4] = b[i * 8 + 3] + 5;
+      a[i * 8 + 5] = b[i * 8 + 2] + 6;
+      a[i * 8 + 6] = b[i * 8 + 1] + 7;
+      a[i * 8 + 7] = b[i * 8 + 0] + 8;
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m4} } } */
+/* { dg-final { scan-assembler-times {csrr} 1 } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 8" "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c
new file mode 100644
index 00000000000..e324380e27b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void foo2 (int8_t *__restrict a,
+          int64_t *__restrict b,
+          int64_t *__restrict c,
+          int64_t *__restrict a2,
+          int64_t *__restrict b2,
+          int64_t *__restrict c2,
+          int64_t *__restrict a3,
+          int64_t *__restrict b3,
+          int64_t *__restrict c3,
+          int64_t *__restrict a4,
+          int64_t *__restrict b4,
+          int64_t *__restrict c4,
+          int8_t *__restrict a5,
+          int64_t *__restrict b5,
+          int64_t *__restrict c5,
+          int n)
+{
+    for (int i = 0; i < n; i++){
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i]+ a[i];
+
+      a[i] = a[i] + c[i];
+      b5[i] = a[i] + c[i];
+      a2[i] = a[i] + c2[i];
+      a3[i] = a[i] + c3[i];
+      a4[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]+ a[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e64,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c
new file mode 100644
index 00000000000..553f2aac0d6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fselective-scheduling -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (uint8_t *restrict a, uint8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 16] = b[i * 16 + 15] + 1;
+      a[i * 16 + 1] = b[i * 16 + 14] + 2;
+      a[i * 16 + 2] = b[i * 16 + 13] + 3;
+      a[i * 16 + 3] = b[i * 16 + 12] + 4;
+      a[i * 16 + 4] = b[i * 16 + 11] + 5;
+      a[i * 16 + 5] = b[i * 16 + 10] + 6;
+      a[i * 16 + 6] = b[i * 16 + 9] + 7;
+      a[i * 16 + 7] = b[i * 16 + 8] + 8;
+      
+      a[i * 16 + 8] = b[i * 16 + 7] + 1;
+      a[i * 16 + 9] = b[i * 16 + 6] + 2;
+      a[i * 16 + 10] = b[i * 16 + 5] + 3;
+      a[i * 16 + 11] = b[i * 16 + 4] + 4;
+      a[i * 16 + 12] = b[i * 16 + 3] + 5;
+      a[i * 16 + 13] = b[i * 16 + 2] + 6;
+      a[i * 16 + 14] = b[i * 16 + 1] + 7;
+      a[i * 16 + 15] = b[i * 16 + 0] + 8;
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m4} } } */
+/* { dg-final { scan-assembler-times {csrr} 1 } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 8" "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c
new file mode 100644
index 00000000000..e2483004698
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c
new file mode 100644
index 00000000000..e65abb299a5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+int
+foo (int *x, int n, int res)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      res += x[i * 2];
+      res += x[i * 2 + 1];
+    }
+  return res;
+}
+
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-times {csrr} 1 } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
new file mode 100644
index 00000000000..a50265fc1ec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int16_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c
new file mode 100644
index 00000000000..3e9751a22ed
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c
new file mode 100644
index 00000000000..3b2527aad5d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+#include <stddef.h>
+
+void
+foo (size_t *__restrict a, size_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler {e64,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c
new file mode 100644
index 00000000000..d63926fa56a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++){
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+  }
+}
+
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c
new file mode 100644
index 00000000000..5c816140950
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b, int8_t *__restrict a2,
+     int8_t *__restrict b2, int8_t *__restrict a3, int8_t *__restrict b3,
+     int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict a5,
+     int8_t *__restrict b5, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] * a2[i] * b2[i] * a3[i] * b3[i] * a4[i] * b4[i] * a5[i] * b5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c
new file mode 100644
index 00000000000..596608bb8d3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict a2,
+     int32_t *__restrict b2, int32_t *__restrict a3, int32_t *__restrict b3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict a5,
+     int32_t *__restrict b5, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] * a2[i] * b2[i] * a3[i] * b3[i] * a4[i] * b4[i] * a5[i] * b5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c
new file mode 100644
index 00000000000..a859d976555
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+int8_t
+foo (int8_t *__restrict a, int8_t init, int n)
+{
+  for (int i = 0; i < n; i++)
+    init += a[i];
+  return init;
+}
+
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c
new file mode 100644
index 00000000000..b965fd0373a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+int64_t
+foo (int64_t *__restrict a, int64_t init, int n)
+{
+  for (int i = 0; i < n; i++)
+    init += a[i];
+  return init;
+}
+
+/* { dg-final { scan-assembler {e64,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp
new file mode 100644
index 00000000000..a3e8f50b73f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp
@@ -0,0 +1,52 @@
+# Copyright (C) 2023-2023 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# Exit immediately if this isn't a riscv target.
+if { ![istarget riscv*-*-*] } then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+    set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+set gcc_march "rv64gcv_zvfh"
+set gcc_mabi  "lp64d"
+if [istarget riscv32-*-*] then {
+  set gcc_march "rv32gcv_zvfh"
+  set gcc_mabi  "ilp32d"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+set CFLAGS "$DEFAULT_CFLAGS -march=$gcc_march -mabi=$gcc_mabi -O3"
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/dynamic-lmul*.\[cS\]]] \
+	"-O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic" $CFLAGS
+
+# All done.
+dg-finish
-- 
2.36.3


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-09-11  9:38 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-11  9:38 [PATCH V2] RISC-V: Support Dynamic LMUL Cost model Juzhe-Zhong
  -- strict thread matches above, loose matches on Subject: below --
2023-09-05  8:56 Juzhe-Zhong
2023-09-05 21:02 ` Robin Dapp
2023-09-05 21:39   ` 钟居哲
2023-09-05 21:53     ` Jeff Law
2023-09-06 13:22 ` Robin Dapp

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).