[PATCH V4] RISC-V: Support Dynamic LMUL Cost model

public inbox for gcc-patches@gcc.gnu.org

* [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
@ 2023-09-12  6:49 Juzhe-Zhong
  2023-09-12  8:19 ` Robin Dapp
  2023-09-12  9:17 ` Robin Dapp
  0 siblings, 2 replies; 9+ messages in thread
From: Juzhe-Zhong @ 2023-09-12  6:49 UTC
  To: gcc-patches; +Cc: kito.cheng, kito.cheng, jeffreyalaw, rdapp.gcc, Juzhe-Zhong

This patch supports dynamic LMUL cost modeling, enabled with --param=riscv-autovec-lmul=dynamic.

Consider the following case:
void
foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
      int32_t *__restrict d,
      int32_t *__restrict d2,
      int32_t *__restrict d3,
      int32_t *__restrict d4,
      int32_t *__restrict d5,
      int n)
{
  for (int i = 0; i < n; i++)
    {
      a[i] = b[i] + c[i];
      b5[i] = b[i] + c[i];
      a2[i] = b2[i] + c2[i];
      a3[i] = b3[i] + c3[i];
      a4[i] = b4[i] + c4[i];
      a5[i] = a[i] + a4[i];
      d2[i] = a2[i] + c2[i];
      d3[i] = a3[i] + c3[i];
      d4[i] = a4[i] + c4[i];
      d5[i] = a[i] + a4[i];
      a[i] = a5[i] + b5[i] + a[i];

      c2[i] = a[i] + c[i];
      c3[i] = b5[i] * a5[i];
      c4[i] = a2[i] * a3[i];
      c5[i] = b5[i] * a2[i];
      c[i] = a[i] + c3[i];
      c2[i] = a[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i]
      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
      * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
    }
}

Demo: https://godbolt.org/z/x1acoMxGT

You can see that it produces register spills if you specify LMUL >= 4.
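
The arithmetic behind this can be sketched as follows. This is only an illustration, not code from the patch: the helper names register_groups and spills_p are hypothetical, and the count of ten live values in the usage note is an assumed number, not one measured from the loop above. RVV has 32 vector registers, and at LMUL = N each vector value occupies a group of N of them.

```cpp
#include <cassert>

// RVV provides 32 architectural vector registers (v0..v31).
const unsigned kNumVRegs = 32;

// Number of register groups available at an integer LMUL.
// Hypothetical helper for illustration; not part of the patch.
unsigned register_groups (unsigned lmul)
{
  assert (lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
  return kNumVRegs / lmul;
}

// A loop has to spill once it needs more simultaneously live vector values
// than there are register groups.  Mask register (v0) constraints and other
// allocation details are ignored in this sketch.
bool spills_p (unsigned live_values, unsigned lmul)
{
  return live_values > register_groups (lmul);
}
```

Under these assumptions, a loop with ten simultaneously live vector values would spill at LMUL = 4 (only 8 groups) but fit at LMUL = 2 (16 groups).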

Now, with --param=riscv-autovec-lmul=dynamic, GCC is able to pick LMUL = 2 to optimize this case.

This feature is implemented with a linear-scan-based local live range analysis that
computes the maximum number of live V_REGs over the program points to determine the VF/LMUL.
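
A minimal standalone sketch of that idea is shown below. It is a simplification and not the code in this patch: it assumes every variable has the same element size (so each one occupies exactly LMUL registers), it works on a single block, and the names live_range, max_live_vregs and pick_lmul are made up for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a live range: <start, end> program points.
using live_range = std::pair<unsigned, unsigned>;

// RVV has 32 vector registers.
const unsigned kVRegNum = 32;

// Peak number of vector registers live at any program point, assuming each
// variable occupies `lmul` registers (single element size, no mixed modes).
unsigned max_live_vregs (const std::vector<live_range> &ranges,
                         unsigned max_point, unsigned lmul)
{
  std::vector<unsigned> live (max_point + 1, 0);
  unsigned peak = 0;
  for (const live_range &r : ranges)
    {
      assert (r.first <= r.second);
      // Every point covered by the range holds one more register group.
      for (unsigned p = r.first; p <= r.second && p <= max_point; ++p)
        {
          live[p] += lmul;
          peak = std::max (peak, live[p]);
        }
    }
  return peak;
}

// Try LMUL = 8, 4, 2 in turn and keep the largest one whose peak register
// pressure fits in the register file; fall back to LMUL = 1 otherwise.
unsigned pick_lmul (const std::vector<live_range> &ranges, unsigned max_point)
{
  for (unsigned lmul = 8; lmul > 1; lmul /= 2)
    if (max_live_vregs (ranges, max_point, lmul) <= kVRegNum)
      return lmul;
  return 1;
}
```

For example, with ten variables that are all live across the same points, the sketch rejects LMUL = 8 (80 registers) and LMUL = 4 (40 registers) and settles on LMUL = 2 (20 registers).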

Note that this patch handles both SLP and non-SLP loops.

The current approach doesn't consider the later instruction scheduler, which may reduce the register pressure.
In that case, we conservatively apply a smaller VF/LMUL. (I'm not sure whether we should support live range shrinking for such a corner case, since we don't know whether it would improve performance much.)

gcc/ChangeLog:

	* config/riscv/riscv-vector-costs.cc (compute_local_program_points): New function.
	(compute_local_live_ranges): Ditto.
	(compute_nregs_for_mode): Ditto.
	(max_number_of_live_regs): Ditto.
	(get_current_lmul): Ditto.
	(update_local_live_ranges): Ditto.
	(costs::preferred_new_lmul_p): Ditto.
	(costs::better_main_loop_than_p): Ditto.
	* config/riscv/riscv-vector-costs.h (struct stmt_point): New struct.
	(pair): New typedef.
	(struct autovec_info): New struct.
	* config/riscv/t-riscv: Update makefile for COST model.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c: New test.
	* gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp: New test.
---
 gcc/config/riscv/riscv-vector-costs.cc        | 504 ++++++++++++++++++
 gcc/config/riscv/riscv-vector-costs.h         |  21 +
 gcc/config/riscv/t-riscv                      |   3 +-
 .../riscv/rvv/dynamic-lmul-mixed-1.c          |  50 ++
 .../costmodel/riscv/rvv/dynamic-lmul1-1.c     |  91 ++++
 .../costmodel/riscv/rvv/dynamic-lmul1-2.c     |  63 +++
 .../costmodel/riscv/rvv/dynamic-lmul1-3.c     |  91 ++++
 .../costmodel/riscv/rvv/dynamic-lmul1-4.c     | 121 +++++
 .../costmodel/riscv/rvv/dynamic-lmul1-5.c     | 149 ++++++
 .../costmodel/riscv/rvv/dynamic-lmul1-6.c     | 150 ++++++
 .../costmodel/riscv/rvv/dynamic-lmul1-7.c     |  48 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-1.c     |  51 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-2.c     |  51 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-3.c     |  51 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-4.c     |  49 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-5.c     |  52 ++
 .../costmodel/riscv/rvv/dynamic-lmul2-6.c     |  54 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-1.c     |  35 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-2.c     |  35 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-3.c     |  47 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-4.c     |  47 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-5.c     |  47 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-6.c     |  27 +
 .../costmodel/riscv/rvv/dynamic-lmul4-7.c     |  47 ++
 .../costmodel/riscv/rvv/dynamic-lmul4-8.c     |  36 ++
 .../costmodel/riscv/rvv/dynamic-lmul8-1.c     |  18 +
 .../costmodel/riscv/rvv/dynamic-lmul8-10.c    |  22 +
 .../costmodel/riscv/rvv/dynamic-lmul8-2.c     |  18 +
 .../costmodel/riscv/rvv/dynamic-lmul8-3.c     |  18 +
 .../costmodel/riscv/rvv/dynamic-lmul8-4.c     |  19 +
 .../costmodel/riscv/rvv/dynamic-lmul8-5.c     |  25 +
 .../costmodel/riscv/rvv/dynamic-lmul8-6.c     |  23 +
 .../costmodel/riscv/rvv/dynamic-lmul8-7.c     |  23 +
 .../costmodel/riscv/rvv/dynamic-lmul8-8.c     |  19 +
 .../costmodel/riscv/rvv/dynamic-lmul8-9.c     |  19 +
 .../riscv/rvv/rvv-costmodel-vect.exp          |  52 ++
 36 files changed, 2175 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp

diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index 1a5e13d5eb3..1e82dab1bc1 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -36,16 +36,520 @@ along with GCC; see the file COPYING3.  If not see
 #include "fold-const.h"
 #include "tm_p.h"
 #include "tree-vectorizer.h"
+#include "gimple-iterator.h"
+#include "bitmap.h"
+#include "ssa.h"
+#include "backend.h"
 
 /* This file should be included last.  */
 #include "riscv-vector-costs.h"
 
 namespace riscv_vector {
 
+/* Dynamic LMUL philosophy - Local linear-scan SSA live range based analysis
+   determines LMUL.
+
+     - Collect all vectorized STMTs locally for each loop block.
+     - Build a program-point-based graph, ignoring non-vectorized STMTs:
+
+	   vectorize STMT 0 - point 0
+	   scalar STMT 0 - ignore.
+	   vectorize STMT 1 - point 1
+	   ...
+     - Compute the number of V_REGs live at each program point.
+     - Determine LMUL in VECTOR COST model according to the program point
+       which has maximum live V_REGs.
+
+     Note:
+
+     - BIGGEST_MODE is the auto-vectorization element mode that gets the
+       biggest LMUL.  It's important for mixed-size auto-vectorization
+       (conversions, etc.).  E.g. for a loop that vectorizes a conversion of
+       INT32 -> INT64, the biggest mode is DImode with LMUL = 8, and SImode
+       uses LMUL = 4.  We compute the number of live V_REGs at each program
+       point according to this information.
+     - We only compute program points and live ranges locally (within a block)
+       since we just need to compute the number of live V_REGs at each program
+       point and we are not really allocating the registers for each SSA.
+       A variable that is live out of or live into another block simply gets
+       another local live range in that block.  Such an approach doesn't
+       affect the accuracy of our live range analysis.
+     - The current analysis doesn't consider instruction scheduling, which
+       may improve the register pressure.  So the analysis is conservative
+       and may end up with a smaller LMUL.
+       TODO: Maybe we could support a reasonable live range shrink algorithm
+       which takes advantage of instruction scheduling.
+     - The analysis may try autovec modes in one of the following orders:
+
+	 1. M8 -> M4 -> M2 -> M1 (stop analysis here) -> MF2 -> MF4 -> MF8
+	 2. M8 -> M1(M4) -> MF2(M2) -> MF4(M1) (stop analysis here) -> MF8(MF2)
+	 3. M1(M8) -> MF2(M4) -> MF4(M2) -> MF8(M1)
+*/
+static hash_map<class loop *, autovec_info> loop_autovec_infos;
+
+/* Collect all STMTs that are vectorized and compute their program points.
+   Note that we don't care about the STMTs that are not vectorized and
+   we only build the local graph (within a block) of program points.
+
+   Loop:
+     bb 2:
+       STMT 1 (be vectorized)      -- point 0
+       STMT 2 (not be vectorized)  -- ignored
+       STMT 3 (be vectorized)      -- point 1
+       STMT 4 (be vectorized)      -- point 2
+       STMT 5 (be vectorized)      -- point 3
+       ...
+     bb 3:
+       STMT 1 (be vectorized)      -- point 0
+       STMT 2 (be vectorized)      -- point 1
+       STMT 3 (not be vectorized)  -- ignored
+       STMT 4 (not be vectorized)  -- ignored
+       STMT 5 (be vectorized)      -- point 2
+       ...
+*/
+static void
+compute_local_program_points (
+  vec_info *vinfo,
+  hash_map<basic_block, vec<stmt_point>> &program_points_per_bb)
+{
+  if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
+    {
+      class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+      basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+      unsigned int nbbs = loop->num_nodes;
+      gimple_stmt_iterator si;
+      unsigned int i;
+      /* Collect the vectorized stmts and mark their program points.  */
+      for (i = 0; i < nbbs; i++)
+	{
+	  int point = 0;
+	  basic_block bb = bbs[i];
+	  vec<stmt_point> program_points = vNULL;
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "Compute local program points for bb %d:\n",
+			     bb->index);
+	  for (si = gsi_start_bb (bbs[i]); !gsi_end_p (si); gsi_next (&si))
+	    {
+	      if (!(is_gimple_assign (gsi_stmt (si))
+		    || is_gimple_call (gsi_stmt (si))))
+		continue;
+	      stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi_stmt (si));
+	      if (STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info))
+		  != undef_vec_info_type)
+		{
+		  stmt_point info = {point, gsi_stmt (si)};
+		  program_points.safe_push (info);
+		  point++;
+		  if (dump_enabled_p ())
+		    dump_printf_loc (MSG_NOTE, vect_location,
+				     "program point %d: %G", info.point,
+				     gsi_stmt (si));
+		}
+	    }
+	  program_points_per_bb.put (bb, program_points);
+	}
+    }
+}
+
+/* Compute local live ranges of each vectorized variable.
+   Note that we only compute local live ranges (within a block) since
+   local live ranges information is accurate enough for us to determine
+   the LMUL/vectorization factor of the loop.
+
+   Loop:
+     bb 2:
+       STMT 1               -- point 0
+       STMT 2 (def SSA 1)   -- point 1
+       STMT 3 (use SSA 1)   -- point 2
+       STMT 4               -- point 3
+     bb 3:
+       STMT 1               -- point 0
+       STMT 2               -- point 1
+       STMT 3               -- point 2
+       STMT 4 (use SSA 2)   -- point 3
+
+   The live range of SSA 1 is [1, 2] in bb 2.
+   The live range of SSA 2 is [0, 3] in bb 3.  */
+static machine_mode
+compute_local_live_ranges (
+  const hash_map<basic_block, vec<stmt_point>> &program_points_per_bb,
+  hash_map<basic_block, hash_map<tree, pair>> &live_ranges_per_bb)
+{
+  machine_mode biggest_mode = QImode;
+  if (!program_points_per_bb.is_empty ())
+    {
+      auto_vec<tree> visited_vars;
+      unsigned int i;
+      for (hash_map<basic_block, vec<stmt_point>>::iterator iter
+	   = program_points_per_bb.begin ();
+	   iter != program_points_per_bb.end (); ++iter)
+	{
+	  basic_block bb = (*iter).first;
+	  vec<stmt_point> program_points = (*iter).second;
+	  bool existed_p = false;
+	  hash_map<tree, pair> *live_ranges
+	    = &live_ranges_per_bb.get_or_insert (bb, &existed_p);
+	  gcc_assert (!existed_p);
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "Compute local live ranges for bb %d:\n",
+			     bb->index);
+	  for (const auto program_point : program_points)
+	    {
+	      unsigned int point = program_point.point;
+	      gimple *stmt = program_point.stmt;
+	      machine_mode mode = biggest_mode;
+	      if (!gimple_store_p (stmt))
+		{
+		  tree lhs = gimple_get_lhs (stmt);
+		  mode = TYPE_MODE (TREE_TYPE (lhs));
+		  bool existed_p = false;
+		  pair &live_range
+		    = live_ranges->get_or_insert (lhs, &existed_p);
+		  gcc_assert (!existed_p);
+		  live_range = pair (point, point);
+		}
+	      for (i = 0; i < gimple_num_args (stmt); i++)
+		{
+		  tree var = gimple_arg (stmt, i);
+		  if (is_gimple_reg (var) && !POINTER_TYPE_P (TREE_TYPE (var)))
+		    {
+		      mode = TYPE_MODE (TREE_TYPE (var));
+		      bool existed_p = false;
+		      pair &live_range
+			= live_ranges->get_or_insert (var, &existed_p);
+		      if (existed_p)
+			/* We will grow the live range for each use.  */
+			live_range = pair (live_range.first, point);
+		      else
+			/* We assume the variable is live from the start of
+			   this block.  */
+			live_range = pair (0, point);
+		    }
+		}
+	      if (GET_MODE_SIZE (mode).to_constant ()
+		  > GET_MODE_SIZE (biggest_mode).to_constant ())
+		biggest_mode = mode;
+	    }
+	  if (dump_enabled_p ())
+	    for (hash_map<tree, pair>::iterator iter = live_ranges->begin ();
+		 iter != live_ranges->end (); ++iter)
+	      dump_printf_loc (MSG_NOTE, vect_location,
+			       "%T: type = %T, start = %d, end = %d\n",
+			       (*iter).first, TREE_TYPE ((*iter).first),
+			       (*iter).second.first, (*iter).second.second);
+	}
+    }
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "Biggest mode = %s\n",
+		     GET_MODE_NAME (biggest_mode));
+  return biggest_mode;
+}
+
+/* Compute the number of V_REGs occupied by MODE, given BIGGEST_MODE and LMUL.
+
+   E.g. If mode = SImode, biggest_mode = DImode, LMUL = M4.
+	Then return 2 (RVVM2SImode: LMUL = 2, element mode = SImode).  */
+static unsigned int
+compute_nregs_for_mode (machine_mode mode, machine_mode biggest_mode, int lmul)
+{
+  unsigned int mode_size = GET_MODE_SIZE (mode).to_constant ();
+  unsigned int biggest_size = GET_MODE_SIZE (biggest_mode).to_constant ();
+  gcc_assert (biggest_size >= mode_size);
+  unsigned int ratio = biggest_size / mode_size;
+  return lmul / ratio;
+}
+
+/* This function helps to determine whether the current LMUL will cause
+   potential vector register (V_REG) spills according to live range
+   information.
+
+     - First, compute how many variables are live at each program point
+       in each bb of the loop.
+     - Second, compute how many V_REGs are live at each program point
+       in each bb of the loop according to the BIGGEST_MODE and the
+       variable mode.
+     - Third, return the maximum number of live V_REGs in the loop.  */
+static unsigned int
+max_number_of_live_regs (const basic_block bb,
+			 const hash_map<tree, pair> &live_ranges,
+			 unsigned int max_point, machine_mode biggest_mode,
+			 int lmul)
+{
+  unsigned int max_nregs = 0;
+  unsigned int i;
+  unsigned int live_point = 0;
+  auto_vec<unsigned int> live_vars_vec;
+  live_vars_vec.safe_grow (max_point + 1, true);
+  for (i = 0; i < live_vars_vec.length (); ++i)
+    live_vars_vec[i] = 0;
+  for (hash_map<tree, pair>::iterator iter = live_ranges.begin ();
+       iter != live_ranges.end (); ++iter)
+    {
+      tree var = (*iter).first;
+      pair live_range = (*iter).second;
+      for (i = live_range.first; i <= live_range.second; i++)
+	{
+	  machine_mode mode = TYPE_MODE (TREE_TYPE (var));
+	  unsigned int nregs
+	    = compute_nregs_for_mode (mode, biggest_mode, lmul);
+	  live_vars_vec[i] += nregs;
+	  if (live_vars_vec[i] > max_nregs)
+	    max_nregs = live_vars_vec[i];
+	}
+    }
+
+  /* Collect user explicit RVV type.  */
+  auto_vec<basic_block> all_preds
+    = get_all_dominated_blocks (CDI_POST_DOMINATORS, bb);
+  for (i = 0; i < cfun->gimple_df->ssa_names->length (); i++)
+    {
+      tree t = ssa_name (i);
+      if (!t)
+	continue;
+      machine_mode mode = TYPE_MODE (TREE_TYPE (t));
+      if (!lookup_vector_type_attribute (TREE_TYPE (t))
+	  && !riscv_v_ext_vls_mode_p (mode))
+	continue;
+
+      gimple *def = SSA_NAME_DEF_STMT (t);
+      if (gimple_bb (def) && !all_preds.contains (gimple_bb (def)))
+	continue;
+      use_operand_p use_p;
+      imm_use_iterator iterator;
+
+      FOR_EACH_IMM_USE_FAST (use_p, iterator, t)
+	{
+	  if (!USE_STMT (use_p) || is_gimple_debug (USE_STMT (use_p))
+	      || !dominated_by_p (CDI_POST_DOMINATORS, bb,
+				  gimple_bb (USE_STMT (use_p))))
+	    continue;
+
+	  int regno_alignment = riscv_get_v_regno_alignment (mode);
+	  max_nregs += regno_alignment;
+	  if (dump_enabled_p ())
+	    dump_printf_loc (
+	      MSG_NOTE, vect_location,
+	      "Explicit used SSA %T, vectype = %T, mode = %s, cause %d "
+	      "V_REG live in bb %d at program point %d\n",
+	      t, TREE_TYPE (t), GET_MODE_NAME (mode), regno_alignment,
+	      bb->index, live_point);
+	  break;
+	}
+    }
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "Maximum lmul = %d, %d number of live V_REG at program "
+		     "point %d for bb %d\n",
+		     lmul, max_nregs, live_point, bb->index);
+  return max_nregs;
+}
+
+/* Return the LMUL of the current analysis.  */
+static int
+get_current_lmul (class loop *loop)
+{
+  return loop_autovec_infos.get (loop)->current_lmul;
+}
+
+/* Update the live ranges according to PHI.
+
+   Loop:
+     bb 2:
+       STMT 1               -- point 0
+       STMT 2 (def SSA 1)   -- point 1
+       STMT 3 (use SSA 1)   -- point 2
+       STMT 4               -- point 3
+     bb 3:
+       SSA 2 = PHI<SSA 1>
+       STMT 1               -- point 0
+       STMT 2               -- point 1
+       STMT 3 (use SSA 2)   -- point 2
+       STMT 4               -- point 3
+
+   Before this function, the SSA 1 live range is [1, 2] in bb 2
+   and SSA 2 is [0, 2] in bb 3.
+
+   Then, after this function, we update the SSA 1 live range in bb 2
+   to [1, 3] since SSA 1 is live out into bb 3.  */
+static void
+update_local_live_ranges (
+  vec_info *vinfo,
+  hash_map<basic_block, vec<stmt_point>> &program_points_per_bb,
+  hash_map<basic_block, hash_map<tree, pair>> &live_ranges_per_bb)
+{
+  if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
+    {
+      class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+      basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+      unsigned int nbbs = loop->num_nodes;
+      unsigned int i, j;
+      gphi_iterator psi;
+      for (i = 0; i < nbbs; i++)
+	{
+	  basic_block bb = bbs[i];
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "Update local program points for bb %d:\n",
+			     bb->index);
+	  for (psi = gsi_start_phis (bbs[i]); !gsi_end_p (psi); gsi_next (&psi))
+	    {
+	      gphi *phi = psi.phi ();
+	      stmt_vec_info stmt_info = vinfo->lookup_stmt (phi);
+	      if (STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info))
+		  != undef_vec_info_type)
+		{
+		  for (j = 0; j < gimple_phi_num_args (phi); j++)
+		    {
+		      edge e = gimple_phi_arg_edge (phi, j);
+		      tree def = gimple_phi_arg_def (phi, j);
+		      auto *live_ranges = live_ranges_per_bb.get (e->src);
+		      if (!program_points_per_bb.get (e->src))
+			continue;
+		      unsigned int max_point
+			= (*program_points_per_bb.get (e->src)).length () - 1;
+		      auto *live_range = live_ranges->get (def);
+		      if (live_range)
+			{
+			  unsigned int end = (*live_range).second;
+			  (*live_range).second = max_point;
+			  if (dump_enabled_p ())
+			    dump_printf_loc (
+			      MSG_NOTE, vect_location,
+			      "Update %T end point from %d to %d:\n", def, end,
+			      (*live_range).second);
+			}
+		    }
+		}
+	    }
+	}
+    }
+}
+
 costs::costs (vec_info *vinfo, bool costing_for_scalar)
   : vector_costs (vinfo, costing_for_scalar)
 {}
 
+/* Return true if the LMUL of the new cost model is preferred.  */
+bool
+costs::preferred_new_lmul_p (const vector_costs *uncast_other) const
+{
+  auto other = static_cast<const costs *> (uncast_other);
+  auto this_loop_vinfo = as_a<loop_vec_info> (this->m_vinfo);
+  auto other_loop_vinfo = as_a<loop_vec_info> (other->m_vinfo);
+  class loop *loop = LOOP_VINFO_LOOP (this_loop_vinfo);
+
+  if (!LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (this_loop_vinfo)
+      && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (other_loop_vinfo))
+    return false;
+
+  if (loop_autovec_infos.get (loop) && loop_autovec_infos.get (loop)->end_p)
+    return false;
+  else if (loop_autovec_infos.get (loop))
+    loop_autovec_infos.get (loop)->current_lmul
+      = loop_autovec_infos.get (loop)->current_lmul / 2;
+  else
+    {
+      int regno_alignment
+	= riscv_get_v_regno_alignment (other_loop_vinfo->vector_mode);
+      if (known_eq (LOOP_VINFO_SLP_UNROLLING_FACTOR (other_loop_vinfo), 1U))
+	regno_alignment = RVV_M8;
+      loop_autovec_infos.put (loop, {regno_alignment, regno_alignment, false});
+    }
+
+  int lmul = get_current_lmul (loop);
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "Comparing two main loops (%s at VF %d vs %s at VF %d)\n",
+		     GET_MODE_NAME (this_loop_vinfo->vector_mode),
+		     vect_vf_for_cost (this_loop_vinfo),
+		     GET_MODE_NAME (other_loop_vinfo->vector_mode),
+		     vect_vf_for_cost (other_loop_vinfo));
+
+  /* Compute local program points.
+     It's a fast and effective computation.  */
+  hash_map<basic_block, vec<stmt_point>> program_points_per_bb;
+  compute_local_program_points (other->m_vinfo, program_points_per_bb);
+
+  /* Compute local live ranges.  */
+  hash_map<basic_block, hash_map<tree, pair>> live_ranges_per_bb;
+  machine_mode biggest_mode
+    = compute_local_live_ranges (program_points_per_bb, live_ranges_per_bb);
+
+  /* Update live ranges according to PHI.  */
+  update_local_live_ranges (other->m_vinfo, program_points_per_bb,
+			    live_ranges_per_bb);
+
+  /* TODO: We calculate the maximum live vars based on the current STMT
+     sequence.  We can support live range shrinking in the future if it
+     gives us a big improvement.  */
+  if (!live_ranges_per_bb.is_empty ())
+    {
+      unsigned int max_nregs = 0;
+      for (hash_map<basic_block, hash_map<tree, pair>>::iterator iter
+	   = live_ranges_per_bb.begin ();
+	   iter != live_ranges_per_bb.end (); ++iter)
+	{
+	  basic_block bb = (*iter).first;
+	  unsigned int max_point
+	    = (*program_points_per_bb.get (bb)).length () - 1;
+	  if ((*iter).second.is_empty ())
+	    continue;
+	  /* We prefer larger LMUL unless it causes register spillings.  */
+	  unsigned int nregs
+	    = max_number_of_live_regs (bb, (*iter).second, max_point,
+				       biggest_mode, lmul);
+	  if (nregs > max_nregs)
+	    max_nregs = nregs;
+	  (*iter).second.empty ();
+	}
+      live_ranges_per_bb.empty ();
+      if (loop_autovec_infos.get (loop)->current_lmul == RVV_M1
+	  || max_nregs <= V_REG_NUM)
+	loop_autovec_infos.get (loop)->end_p = true;
+      if (loop_autovec_infos.get (loop)->current_lmul > RVV_M1)
+	return max_nregs > V_REG_NUM;
+      return false;
+    }
+  if (!program_points_per_bb.is_empty ())
+    {
+      for (hash_map<basic_block, vec<stmt_point>>::iterator iter
+	   = program_points_per_bb.begin ();
+	   iter != program_points_per_bb.end (); ++iter)
+	{
+	  vec<stmt_point> program_points = (*iter).second;
+	  if (!program_points.is_empty ())
+	    program_points.release ();
+	}
+      program_points_per_bb.empty ();
+    }
+  return lmul > RVV_M1;
+}
+
+bool
+costs::better_main_loop_than_p (const vector_costs *uncast_other) const
+{
+  auto other = static_cast<const costs *> (uncast_other);
+
+  if (!flag_vect_cost_model)
+    return vector_costs::better_main_loop_than_p (other);
+
+  if (riscv_autovec_lmul == RVV_DYNAMIC)
+    {
+      bool post_dom_available_p = dom_info_available_p (CDI_POST_DOMINATORS);
+      if (!post_dom_available_p)
+	calculate_dominance_info (CDI_POST_DOMINATORS);
+      bool preferred_p = preferred_new_lmul_p (uncast_other);
+      if (!post_dom_available_p)
+	free_dominance_info (CDI_POST_DOMINATORS);
+      return preferred_p;
+    }
+
+  return vector_costs::better_main_loop_than_p (other);
+}
+
 unsigned
 costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
 		      stmt_vec_info stmt_info, slp_tree, tree vectype,
diff --git a/gcc/config/riscv/riscv-vector-costs.h b/gcc/config/riscv/riscv-vector-costs.h
index 57b1be01048..7b5814a4cff 100644
--- a/gcc/config/riscv/riscv-vector-costs.h
+++ b/gcc/config/riscv/riscv-vector-costs.h
@@ -23,6 +23,23 @@
 
 namespace riscv_vector {
 
+struct stmt_point
+{
+  /* Program point.  */
+  unsigned int point;
+  gimple *stmt;
+};
+
+/* Pair typedef used by live range: <start, end>.  */
+typedef std::pair<unsigned int, unsigned int> pair;
+
+struct autovec_info
+{
+  unsigned int initial_lmul;
+  unsigned int current_lmul;
+  bool end_p;
+};
+
 /* rvv-specific vector costs.  */
 class costs : public vector_costs
 {
@@ -31,12 +48,16 @@ class costs : public vector_costs
 public:
   costs (vec_info *, bool);
 
+  bool better_main_loop_than_p (const vector_costs *other) const override;
+
 private:
   unsigned int add_stmt_cost (int count, vect_cost_for_stmt kind,
 			      stmt_vec_info stmt_info, slp_tree node,
 			      tree vectype, int misalign,
 			      vect_cost_model_location where) override;
   void finish_cost (const vector_costs *) override;
+
+  bool preferred_new_lmul_p (const vector_costs *) const;
 };
 
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index b1f80d1d87c..ec5d563859e 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -70,7 +70,8 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
 riscv-vector-costs.o: $(srcdir)/config/riscv/riscv-vector-costs.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TARGET_H) $(FUNCTION_H) \
   $(TREE_H) basic-block.h $(RTL_H) gimple.h targhooks.h cfgloop.h \
-  fold-const.h $(TM_P_H) tree-vectorizer.h \
+  fold-const.h $(TM_P_H) tree-vectorizer.h gimple-iterator.h bitmap.h \
+  ssa.h backend.h \
   $(srcdir)/config/riscv/riscv-vector-costs.h
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/riscv/riscv-vector-costs.cc
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c
new file mode 100644
index 00000000000..fd9f38bc766
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+     int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+     int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+     int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+     int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+     int32_t *__restrict d4, int32_t *__restrict d5, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = d5[i] + b[i];
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]
+	     + a[i] * a2[i] * a3[i] * a4[i] * a5[i] * c[i] * c2[i] * c3[i]
+		 * c4[i] * c5[i] * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c
new file mode 100644
index 00000000000..6c414bcd115
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-1.c
@@ -0,0 +1,91 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int32_t *__restrict e,
+      int32_t *__restrict e2,
+      int32_t *__restrict e3,
+      int32_t *__restrict e4,
+      int32_t *__restrict e5,
+      int32_t *__restrict f,
+      int32_t *__restrict f2,
+      int32_t *__restrict f3,
+      int32_t *__restrict f4,
+      int32_t *__restrict f5,
+      int32_t *__restrict g,
+      int32_t *__restrict g2,
+      int32_t *__restrict g3,
+      int32_t *__restrict g4,
+      int32_t *__restrict g5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c
new file mode 100644
index 00000000000..b77f3ff58ed
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-2.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int32_t *__restrict e,
+      int32_t *__restrict e2,
+      int32_t *__restrict e3,
+      int32_t *__restrict e4,
+      int32_t *__restrict e5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      e[i] = a2[i] + c2[i];
+      e2[i] = d2[i] + a2[i];
+      e3[i] = d3[i] + a3[i];
+      e4[i] = d4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i];
+    }
+}
+
+/* FIXME: Choosing LMUL = 1 is not optimal here, since LMUL = 2 would be possible if the instruction scheduler were applied.  */
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c
new file mode 100644
index 00000000000..164930c9bba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c
@@ -0,0 +1,91 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b,    int8_t *__restrict c,
+      int8_t *__restrict a2, int8_t *__restrict b2, int8_t *__restrict c2,
+      int8_t *__restrict a3, int8_t *__restrict b3, int8_t *__restrict c3,
+      int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict c4,
+      int8_t *__restrict a5, int8_t *__restrict b5, int8_t *__restrict c5,
+      int8_t *__restrict d,
+      int8_t *__restrict d2,
+      int8_t *__restrict d3,
+      int8_t *__restrict d4,
+      int8_t *__restrict d5,
+      int8_t *__restrict e,
+      int8_t *__restrict e2,
+      int8_t *__restrict e3,
+      int8_t *__restrict e4,
+      int8_t *__restrict e5,
+      int8_t *__restrict f,
+      int8_t *__restrict f2,
+      int8_t *__restrict f3,
+      int8_t *__restrict f4,
+      int8_t *__restrict f5,
+      int8_t *__restrict g,
+      int8_t *__restrict g2,
+      int8_t *__restrict g3,
+      int8_t *__restrict g4,
+      int8_t *__restrict g5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c
new file mode 100644
index 00000000000..8d80fbfe390
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-4.c
@@ -0,0 +1,121 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int32_t *__restrict e,
+      int32_t *__restrict e2,
+      int32_t *__restrict e3,
+      int32_t *__restrict e4,
+      int32_t *__restrict e5,
+      int32_t *__restrict f,
+      int32_t *__restrict f2,
+      int32_t *__restrict f3,
+      int32_t *__restrict f4,
+      int32_t *__restrict f5,
+      int32_t *__restrict g,
+      int32_t *__restrict g2,
+      int32_t *__restrict g3,
+      int32_t *__restrict g4,
+      int32_t *__restrict g5,
+  
+      int32_t *__restrict gg,
+      int32_t *__restrict gg2,
+      int32_t *__restrict gg3,
+      int32_t *__restrict gg4,
+      int32_t *__restrict gg5,
+  
+      int32_t *__restrict ggg,
+      int32_t *__restrict ggg2,
+      int32_t *__restrict ggg3,
+      int32_t *__restrict ggg4,
+      int32_t *__restrict ggg5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+
+      gg[i] = f2[i] + c2[i];
+      gg2[i] = f2[i] + d2[i];
+      gg3[i] = f3[i] + d3[i];
+      gg4[i] = f4[i] + a4[i];
+      gg5[i] = f[i] + a4[i];
+      gg5[i] = f5[i] + a4[i];
+
+
+      ggg[i] = f2[i] + c2[i];
+      ggg2[i] = f2[i] + d2[i];
+      ggg3[i] = f3[i] + d3[i];
+      ggg4[i] = f4[i] + a4[i];
+      ggg5[i] = f[i] + a4[i];
+      ggg5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i]
+      * gg[i] * gg2[i] * gg3[i] * gg4[i] * gg5[i]
+      * ggg[i] * ggg2[i] * ggg3[i] * ggg4[i] * ggg5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c
new file mode 100644
index 00000000000..7b4014ddaf6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-5.c
@@ -0,0 +1,149 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int32_t *__restrict e,
+      int32_t *__restrict e2,
+      int32_t *__restrict e3,
+      int32_t *__restrict e4,
+      int32_t *__restrict e5,
+      int32_t *__restrict f,
+      int32_t *__restrict f2,
+      int32_t *__restrict f3,
+      int32_t *__restrict f4,
+      int32_t *__restrict f5,
+      int32_t *__restrict g,
+      int32_t *__restrict g2,
+      int32_t *__restrict g3,
+      int32_t *__restrict g4,
+      int32_t *__restrict g5,
+  
+      int32_t *__restrict gg,
+      int32_t *__restrict gg2,
+      int32_t *__restrict gg3,
+      int32_t *__restrict gg4,
+      int32_t *__restrict gg5,
+  
+      int32_t *__restrict ggg,
+      int32_t *__restrict ggg2,
+      int32_t *__restrict ggg3,
+      int32_t *__restrict ggg4,
+      int32_t *__restrict ggg5,
+
+      int32_t *__restrict gggg,
+      int32_t *__restrict gggg2,
+      int32_t *__restrict gggg3,
+      int32_t *__restrict gggg4,
+      int32_t *__restrict gggg5,
+
+      int32_t *__restrict ggggg,
+      int32_t *__restrict ggggg2,
+      int32_t *__restrict ggggg3,
+      int32_t *__restrict ggggg4,
+      int32_t *__restrict ggggg5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+
+      gg[i] = f2[i] + c2[i];
+      gg2[i] = f2[i] + d2[i];
+      gg3[i] = f3[i] + d3[i];
+      gg4[i] = f4[i] + a4[i];
+      gg5[i] = f[i] + a4[i];
+      gg5[i] = f5[i] + a4[i];
+
+
+      ggg[i] = f2[i] + c2[i];
+      ggg2[i] = f2[i] + d2[i];
+      ggg3[i] = f3[i] + d3[i];
+      ggg4[i] = f4[i] + a4[i];
+      ggg5[i] = f[i] + a4[i];
+      ggg5[i] = f5[i] + a4[i];
+
+      gggg[i] = f2[i] + c2[i];
+      gggg2[i] = f2[i] + d2[i];
+      gggg3[i] = f3[i] + d3[i];
+      gggg4[i] = f4[i] + a4[i];
+      gggg5[i] = f[i] + a4[i];
+      gggg5[i] = f5[i] + a4[i];
+
+      ggggg[i] = f2[i] + c2[i];
+      ggggg2[i] = f2[i] + d2[i];
+      ggggg3[i] = f3[i] + d3[i];
+      ggggg4[i] = f4[i] + a4[i];
+      ggggg5[i] = f[i] + a4[i];
+      ggggg5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i]
+      * gg[i] * gg2[i] * gg3[i] * gg4[i] * gg5[i]
+      * ggg[i] * ggg2[i] * ggg3[i] * ggg4[i] * ggg5[i]
+      * gggg[i] * gggg2[i] * gggg3[i] * gggg4[i] * gggg5[i]
+      * ggggg[i] * ggggg2[i] * ggggg3[i] * ggggg4[i] * ggggg5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c
new file mode 100644
index 00000000000..51d05f2bec9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c
@@ -0,0 +1,150 @@
+
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b,    int8_t *__restrict c,
+      int8_t *__restrict a2, int8_t *__restrict b2, int8_t *__restrict c2,
+      int8_t *__restrict a3, int8_t *__restrict b3, int8_t *__restrict c3,
+      int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict c4,
+      int8_t *__restrict a5, int8_t *__restrict b5, int8_t *__restrict c5,
+      int8_t *__restrict d,
+      int8_t *__restrict d2,
+      int8_t *__restrict d3,
+      int8_t *__restrict d4,
+      int8_t *__restrict d5,
+      int8_t *__restrict e,
+      int8_t *__restrict e2,
+      int8_t *__restrict e3,
+      int8_t *__restrict e4,
+      int8_t *__restrict e5,
+      int8_t *__restrict f,
+      int8_t *__restrict f2,
+      int8_t *__restrict f3,
+      int8_t *__restrict f4,
+      int8_t *__restrict f5,
+      int8_t *__restrict g,
+      int8_t *__restrict g2,
+      int8_t *__restrict g3,
+      int8_t *__restrict g4,
+      int8_t *__restrict g5,
+  
+      int8_t *__restrict gg,
+      int8_t *__restrict gg2,
+      int8_t *__restrict gg3,
+      int8_t *__restrict gg4,
+      int8_t *__restrict gg5,
+  
+      int8_t *__restrict ggg,
+      int8_t *__restrict ggg2,
+      int8_t *__restrict ggg3,
+      int8_t *__restrict ggg4,
+      int8_t *__restrict ggg5,
+
+      int8_t *__restrict gggg,
+      int8_t *__restrict gggg2,
+      int8_t *__restrict gggg3,
+      int8_t *__restrict gggg4,
+      int8_t *__restrict gggg5,
+
+      int8_t *__restrict ggggg,
+      int8_t *__restrict ggggg2,
+      int8_t *__restrict ggggg3,
+      int8_t *__restrict ggggg4,
+      int8_t *__restrict ggggg5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+
+      e[i] = c2[i] + c2[i];
+      e2[i] = c2[i] + d2[i];
+      e3[i] = d3[i] + d3[i];
+      e4[i] = c4[i] + a4[i];
+      e5[i] = a[i] + a4[i];
+      a5[i] = a[i] + a4[i];
+
+      f[i] = e2[i] + c2[i];
+      f2[i] = e2[i] + d2[i];
+      f3[i] = e3[i] + d3[i];
+      f4[i] = e4[i] + a4[i];
+      f5[i] = e[i] + a4[i];
+      f5[i] = e5[i] + a4[i];
+
+      g[i] = f2[i] + c2[i];
+      g2[i] = f2[i] + d2[i];
+      g3[i] = f3[i] + d3[i];
+      g4[i] = f4[i] + a4[i];
+      g5[i] = f[i] + a4[i];
+      g5[i] = f5[i] + a4[i];
+
+
+      gg[i] = f2[i] + c2[i];
+      gg2[i] = f2[i] + d2[i];
+      gg3[i] = f3[i] + d3[i];
+      gg4[i] = f4[i] + a4[i];
+      gg5[i] = f[i] + a4[i];
+      gg5[i] = f5[i] + a4[i];
+
+
+      ggg[i] = f2[i] + c2[i];
+      ggg2[i] = f2[i] + d2[i];
+      ggg3[i] = f3[i] + d3[i];
+      ggg4[i] = f4[i] + a4[i];
+      ggg5[i] = f[i] + a4[i];
+      ggg5[i] = f5[i] + a4[i];
+
+      gggg[i] = f2[i] + c2[i];
+      gggg2[i] = f2[i] + d2[i];
+      gggg3[i] = f3[i] + d3[i];
+      gggg4[i] = f4[i] + a4[i];
+      gggg5[i] = f[i] + a4[i];
+      gggg5[i] = f5[i] + a4[i];
+
+      ggggg[i] = f2[i] + c2[i];
+      ggggg2[i] = f2[i] + d2[i];
+      ggggg3[i] = f3[i] + d3[i];
+      ggggg4[i] = f4[i] + a4[i];
+      ggggg5[i] = f[i] + a4[i];
+      ggggg5[i] = f5[i] + a4[i];
+
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i]
+      * e[i] * e2[i] * e3[i] * e4[i] * e5[i]
+      * f[i] * f2[i] * f3[i] * f4[i] * f5[i]
+      * g[i] * g2[i] * g3[i] * g4[i] * g5[i]
+      * gg[i] * gg2[i] * gg3[i] * gg4[i] * gg5[i]
+      * ggg[i] * ggg2[i] * ggg3[i] * ggg4[i] * ggg5[i]
+      * gggg[i] * gggg2[i] * gggg3[i] * gggg4[i] * gggg5[i]
+      * ggggg[i] * ggggg2[i] * ggggg3[i] * ggggg4[i] * ggggg5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c
new file mode 100644
index 00000000000..dfd71414b62
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -Wno-psabi -fdump-tree-vect-details" } */
+
+#include "riscv_vector.h"
+
+vint32m8_t
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+     int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+     int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+     int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+     int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+     int32_t *__restrict d4, int32_t *__restrict d5, int n, vint32m8_t vector)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]
+	     + a[i] * a2[i] * a3[i] * a4[i] * a5[i] * c[i] * c2[i] * c3[i]
+		 * c4[i] * c5[i] * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+    return vector;
+}
+
+/* { dg-final { scan-assembler {e32,m1} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 1" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c
new file mode 100644
index 00000000000..ce83bb22324
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c
new file mode 100644
index 00000000000..a80b1b1556a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b,    int8_t *__restrict c,
+      int8_t *__restrict a2, int8_t *__restrict b2, int8_t *__restrict c2,
+      int8_t *__restrict a3, int8_t *__restrict b3, int8_t *__restrict c3,
+      int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict c4,
+      int8_t *__restrict a5, int8_t *__restrict b5, int8_t *__restrict c5,
+      int8_t *__restrict d,
+      int8_t *__restrict d2,
+      int8_t *__restrict d3,
+      int8_t *__restrict d4,
+      int8_t *__restrict d5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m2} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c
new file mode 100644
index 00000000000..ce83bb22324
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b,    int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d,
+      int32_t *__restrict d2,
+      int32_t *__restrict d3,
+      int32_t *__restrict d4,
+      int32_t *__restrict d5,
+      int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i]
+      * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
+      * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c
new file mode 100644
index 00000000000..9964f3fe8ba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include "riscv_vector.h"
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+     int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+     int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+     int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+     int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+     int32_t *__restrict d4, int32_t *__restrict d5, int n)
+{
+  vint32m1_t v = __riscv_vle32_v_i32m1 (a, 32);
+  __riscv_vse32_v_i32m1 (c, v, 32);
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]
+	     + a[i] * a2[i] * a3[i] * a4[i] * a5[i] * c[i] * c2[i] * c3[i]
+		 * c4[i] * c5[i] * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c
new file mode 100644
index 00000000000..ab670bf0c7d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+typedef int8_t v128qi __attribute__ ((vector_size (128)));
+
+v128qi global_v;
+
+v128qi
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+     int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+     int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+     int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+     int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+     int32_t *__restrict d4, int32_t *__restrict d5, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]
+	     + a[i] * a2[i] * a3[i] * a4[i] * a5[i] * c[i] * c2[i] * c3[i]
+		 * c4[i] * c5[i] * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+    return global_v + 3;
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c
new file mode 100644
index 00000000000..a01e32b9f8d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+typedef int8_t v128qi __attribute__ ((vector_size (128)));
+
+v128qi global_v;
+
+v128qi
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+     int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+     int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+     int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+     int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+     int32_t *__restrict d4, int32_t *__restrict d5, int n)
+{
+  for (int i = 0; i < 128; i++)
+    b[i] = global_v[i] + 8;
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d2[i] = a2[i] + c2[i];
+      d3[i] = a3[i] + c3[i];
+      d4[i] = a4[i] + c4[i];
+      d5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i] + a[i];
+
+      c2[i] = a[i] + c[i];
+      c3[i] = b5[i] * a5[i];
+      c4[i] = a2[i] * a3[i];
+      c5[i] = b5[i] * a2[i];
+      c[i] = a[i] + c3[i];
+      c2[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]
+	     + a[i] * a2[i] * a3[i] * a4[i] * a5[i] * c[i] * c2[i] * c3[i]
+		 * c4[i] * c5[i] * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
+    }
+    return global_v + 3;
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c
new file mode 100644
index 00000000000..156ccc7f98e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c,
+      int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
+      int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
+      int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
+      int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
+      int32_t *__restrict d, int32_t *__restrict d2, int32_t *__restrict d3,
+      int32_t *__restrict d4, int32_t *__restrict d5, int n, int m)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d[i] = a[i] - a2[i];
+      d2[i] = a2[i] * a[i];
+      d3[i] = a3[i] * a2[i];
+      d4[i] = a2[i] * d2[i];
+      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c
new file mode 100644
index 00000000000..4cacc039dcb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b, int8_t *__restrict c,
+      int8_t *__restrict a2, int8_t *__restrict b2, int8_t *__restrict c2,
+      int8_t *__restrict a3, int8_t *__restrict b3, int8_t *__restrict c3,
+      int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict c4,
+      int8_t *__restrict a5, int8_t *__restrict b5, int8_t *__restrict c5,
+      int8_t *__restrict d, int8_t *__restrict d2, int8_t *__restrict d3,
+      int8_t *__restrict d4, int8_t *__restrict d5, int n, int m)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      d[i] = a[i] - a2[i];
+      d2[i] = a2[i] * a[i];
+      d3[i] = a3[i] * a2[i];
+      d4[i] = a2[i] * d2[i];
+      d5[i] = a[i] * a2[i] * a3[i] * a4[i] * d[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c
new file mode 100644
index 00000000000..2308109f00c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void foo2 (int64_t *__restrict a,
+          int8_t *__restrict b,
+          int8_t *__restrict c,
+          int8_t *__restrict a2,
+          int8_t *__restrict b2,
+          int8_t *__restrict c2,
+          int8_t *__restrict a3,
+          int8_t *__restrict b3,
+          int8_t *__restrict c3,
+          int8_t *__restrict a4,
+          int8_t *__restrict b4,
+          int8_t *__restrict c4,
+          int64_t *__restrict a5,
+          int8_t *__restrict b5,
+          int8_t *__restrict c5,
+          int n)
+{
+    for (int i = 0; i < n; i++){
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i]+ a[i];
+
+      a[i] = a[i] + c[i];
+      b5[i] = a[i] + c[i];
+      a2[i] = a[i] + c2[i];
+      a3[i] = a[i] + c3[i];
+      a4[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]+ a[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e64,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c
new file mode 100644
index 00000000000..2a1521bffb9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void foo2 (int64_t *__restrict a,
+          int32_t *__restrict b,
+          int32_t *__restrict c,
+          int32_t *__restrict a2,
+          int32_t *__restrict b2,
+          int32_t *__restrict c2,
+          int32_t *__restrict a3,
+          int32_t *__restrict b3,
+          int32_t *__restrict c3,
+          int32_t *__restrict a4,
+          int32_t *__restrict b4,
+          int32_t *__restrict c4,
+          int64_t *__restrict a5,
+          int32_t *__restrict b5,
+          int32_t *__restrict c5,
+          int n)
+{
+    for (int i = 0; i < n; i++){
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i]+ a[i];
+
+      a[i] = a[i] + c[i];
+      b5[i] = a[i] + c[i];
+      a2[i] = a[i] + c2[i];
+      a3[i] = a[i] + c3[i];
+      a4[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]+ a[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e64,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
new file mode 100644
index 00000000000..928a507a363
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void foo2 (int16_t *__restrict a,
+          int32_t *__restrict b,
+          int32_t *__restrict c,
+          int32_t *__restrict a2,
+          int32_t *__restrict b2,
+          int32_t *__restrict c2,
+          int32_t *__restrict a3,
+          int32_t *__restrict b3,
+          int32_t *__restrict c3,
+          int32_t *__restrict a4,
+          int32_t *__restrict b4,
+          int32_t *__restrict c4,
+          int16_t *__restrict a5,
+          int32_t *__restrict b5,
+          int32_t *__restrict c5,
+          int n)
+{
+    for (int i = 0; i < n; i++){
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i]+ a[i];
+
+      a[i] = a[i] + c[i];
+      b5[i] = a[i] + c[i];
+      a2[i] = a[i] + c2[i];
+      a3[i] = a[i] + c3[i];
+      a4[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]+ a[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c
new file mode 100644
index 00000000000..f16cfb9fd08
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fselective-scheduling -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (uint8_t *restrict a, uint8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8] = b[i * 8 + 7] + 1;
+      a[i * 8 + 1] = b[i * 8 + 6] + 2;
+      a[i * 8 + 2] = b[i * 8 + 5] + 3;
+      a[i * 8 + 3] = b[i * 8 + 4] + 4;
+      a[i * 8 + 4] = b[i * 8 + 3] + 5;
+      a[i * 8 + 5] = b[i * 8 + 2] + 6;
+      a[i * 8 + 6] = b[i * 8 + 1] + 7;
+      a[i * 8 + 7] = b[i * 8 + 0] + 8;
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m4} } } */
+/* { dg-final { scan-assembler-times {csrr} 1 } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 8" "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c
new file mode 100644
index 00000000000..e324380e27b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void foo2 (int8_t *__restrict a,
+          int64_t *__restrict b,
+          int64_t *__restrict c,
+          int64_t *__restrict a2,
+          int64_t *__restrict b2,
+          int64_t *__restrict c2,
+          int64_t *__restrict a3,
+          int64_t *__restrict b3,
+          int64_t *__restrict c3,
+          int64_t *__restrict a4,
+          int64_t *__restrict b4,
+          int64_t *__restrict c4,
+          int8_t *__restrict a5,
+          int64_t *__restrict b5,
+          int64_t *__restrict c5,
+          int n)
+{
+    for (int i = 0; i < n; i++){
+      a[i] = b[i] + c[i];
+      b5[i] = b[i] + c[i];
+      a2[i] = b2[i] + c2[i];
+      a3[i] = b3[i] + c3[i];
+      a4[i] = b4[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a5[i] + b5[i]+ a[i];
+
+      a[i] = a[i] + c[i];
+      b5[i] = a[i] + c[i];
+      a2[i] = a[i] + c2[i];
+      a3[i] = a[i] + c3[i];
+      a4[i] = a[i] + c4[i];
+      a5[i] = a[i] + a4[i];
+      a[i] = a[i] + b5[i]+ a[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e64,m4} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c
new file mode 100644
index 00000000000..553f2aac0d6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fselective-scheduling -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (uint8_t *restrict a, uint8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 16] = b[i * 16 + 15] + 1;
+      a[i * 16 + 1] = b[i * 16 + 14] + 2;
+      a[i * 16 + 2] = b[i * 16 + 13] + 3;
+      a[i * 16 + 3] = b[i * 16 + 12] + 4;
+      a[i * 16 + 4] = b[i * 16 + 11] + 5;
+      a[i * 16 + 5] = b[i * 16 + 10] + 6;
+      a[i * 16 + 6] = b[i * 16 + 9] + 7;
+      a[i * 16 + 7] = b[i * 16 + 8] + 8;
+
+      a[i * 16 + 8] = b[i * 16 + 7] + 1;
+      a[i * 16 + 9] = b[i * 16 + 6] + 2;
+      a[i * 16 + 10] = b[i * 16 + 5] + 3;
+      a[i * 16 + 11] = b[i * 16 + 4] + 4;
+      a[i * 16 + 12] = b[i * 16 + 3] + 5;
+      a[i * 16 + 13] = b[i * 16 + 2] + 6;
+      a[i * 16 + 14] = b[i * 16 + 1] + 7;
+      a[i * 16 + 15] = b[i * 16 + 0] + 8;
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m4} } } */
+/* { dg-final { scan-assembler-times {csrr} 1 } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 8" "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c
new file mode 100644
index 00000000000..e2483004698
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c
new file mode 100644
index 00000000000..e65abb299a5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+int
+foo (int *x, int n, int res)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      res += x[i * 2];
+      res += x[i * 2 + 1];
+    }
+  return res;
+}
+
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-times {csrr} 1 } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
new file mode 100644
index 00000000000..a50265fc1ec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int16_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c
new file mode 100644
index 00000000000..3e9751a22ed
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c
new file mode 100644
index 00000000000..3b2527aad5d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+#include <stddef.h>
+
+void
+foo (size_t *__restrict a, size_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+    a[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler {e64,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c
new file mode 100644
index 00000000000..d63926fa56a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++){
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+    a[i] = a[i] + b[i];
+  }
+}
+
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c
new file mode 100644
index 00000000000..5c816140950
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int8_t *__restrict a, int8_t *__restrict b, int8_t *__restrict a2,
+     int8_t *__restrict b2, int8_t *__restrict a3, int8_t *__restrict b3,
+     int8_t *__restrict a4, int8_t *__restrict b4, int8_t *__restrict a5,
+     int8_t *__restrict b5, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] * a2[i] * b2[i] * a3[i] * b3[i] * a4[i] * b4[i] * a5[i] * b5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c
new file mode 100644
index 00000000000..596608bb8d3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void
+foo (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict a2,
+     int32_t *__restrict b2, int32_t *__restrict a3, int32_t *__restrict b3,
+     int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict a5,
+     int32_t *__restrict b5, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = b[i] * a2[i] * b2[i] * a3[i] * b3[i] * a4[i] * b4[i] * a5[i] * b5[i];
+    }
+}
+
+/* { dg-final { scan-assembler {e32,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c
new file mode 100644
index 00000000000..a859d976555
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+int8_t
+foo (int8_t *__restrict a, int8_t init, int n)
+{
+  for (int i = 0; i < n; i++)
+    init += a[i];
+  return init;
+}
+
+/* { dg-final { scan-assembler {e8,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c
new file mode 100644
index 00000000000..b965fd0373a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+int64_t
+foo (int64_t *__restrict a, int64_t init, int n)
+{
+  for (int i = 0; i < n; i++)
+    init += a[i];
+  return init;
+}
+
+/* { dg-final { scan-assembler {e64,m8} } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp
new file mode 100644
index 00000000000..a3e8f50b73f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/rvv-costmodel-vect.exp
@@ -0,0 +1,52 @@
+# Copyright (C) 2023-2023 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# Exit immediately if this isn't a riscv target.
+if { ![istarget riscv*-*-*] } then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+    set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+set gcc_march "rv64gcv_zvfh"
+set gcc_mabi  "lp64d"
+if [istarget riscv32-*-*] then {
+  set gcc_march "rv32gcv_zvfh"
+  set gcc_mabi  "ilp32d"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+set CFLAGS "$DEFAULT_CFLAGS -march=$gcc_march -mabi=$gcc_mabi -O3"
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/dynamic-lmul*.\[cS\]]] \
+	"-O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic" $CFLAGS
+
+# All done.
+dg-finish
-- 
2.36.3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
  2023-09-12  6:49 [PATCH V4] RISC-V: Support Dynamic LMUL Cost model Juzhe-Zhong
@ 2023-09-12  8:19 ` Robin Dapp
  2023-09-12  9:14   ` juzhe.zhong
  2023-09-12  9:17 ` Robin Dapp
  1 sibling, 1 reply; 9+ messages in thread
From: Robin Dapp @ 2023-09-12  8:19 UTC (permalink / raw)
  To: Juzhe-Zhong, gcc-patches; +Cc: rdapp.gcc, kito.cheng, kito.cheng, jeffreyalaw

Hi Juzhe,

> +max_number_of_live_regs (const basic_block bb,
> +			 const hash_map<tree, pair> &live_ranges,
> +			 unsigned int max_point, machine_mode biggest_mode,
> +			 int lmul)
> +{
> +  unsigned int max_nregs = 0;
> +  unsigned int i;
> +  unsigned int live_point = 0;
> +  auto_vec<unsigned int> live_vars_vec;
> +  live_vars_vec.safe_grow (max_point + 1, true);
> +  for (i = 0; i < live_vars_vec.length (); ++i)
> +    live_vars_vec[i] = 0;
> +  for (hash_map<tree, pair>::iterator iter = live_ranges.begin ();
> +       iter != live_ranges.end (); ++iter)
> +    {
> +      tree var = (*iter).first;
> +      pair live_range = (*iter).second;
> +      for (i = live_range.first; i <= live_range.second; i++)
> +	{
> +	  machine_mode mode = TYPE_MODE (TREE_TYPE (var));
> +	  unsigned int nregs
> +	    = compute_nregs_for_mode (mode, biggest_mode, lmul);
> +	  live_vars_vec[i] += nregs;
> +	  if (live_vars_vec[i] > max_nregs)
> +	    max_nregs = live_vars_vec[i];
> +	}
> +    }

My concern is that we have O(nm) here, where n = number of live_ranges
and m = size of live range.  In large basic blocks (think calculix of
SPECfp 2006 which can reach up to 2000 instructions IIRC) this might
become prohibitive.

I'm going to do a quick benchmark with calculix and report back.  If
there is no noticeable difference we can ditch my idea.

For short live ranges (like < 10) the O(nm) could be better.  As of now,
we still calculate the nregs n*m times, though.  I have something like
the following in mind (it is definitely not shorter, though):

  struct range {
      unsigned int pt;
      bool start;
      unsigned int nregs;
  };

  auto_vec<range> ranges (2 * live_ranges.elements ());
  for (hash_map<tree, pair>::iterator iter = live_ranges.begin ();
       iter != live_ranges.end (); ++iter)
    {
      tree var = (*iter).first;
      machine_mode mode = TYPE_MODE (TREE_TYPE (var));
      unsigned int nregs
	  = compute_nregs_for_mode (mode, biggest_mode, lmul);
      ranges.quick_push ({(*iter).second.first, true, nregs});
      ranges.quick_push ({(*iter).second.second, false, nregs});
    }

  ranges.qsort ([] (const void *a, const void *b) -> int {
		unsigned int aa = ((const range *)a)->pt;
		unsigned int bb = ((const range *)b)->pt;
		if (aa < bb)
		  return -1;
		if (aa == bb)
		  return 0;
		return 1;
		});

  unsigned int cur = 0;
  max_nregs = ranges[0].nregs;

  for (auto r : ranges)
    {
      if (r.start)
	cur += r.nregs;
      else
	cur -= r.nregs;
      max_nregs = MAX (max_nregs, cur);
    }
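
For comparison, here is a self-contained sketch of both approaches outside of GCC.  It is not the patch's code: `live_range_t`, `event_t` and both function names are made up for illustration, and `std::vector`/`std::sort` stand in for `auto_vec`/`qsort`.  It assumes live ranges are inclusive at both ends (as the `i <= live_range.second` loop above suggests), so at equal program points start events are ordered before end events; otherwise a range ending where another begins would be undercounted.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for one live range: inclusive [first, last]
// program points plus the number of registers the variable occupies.
struct live_range_t
{
  unsigned first;
  unsigned last;
  unsigned nregs;
};

// Per-point counting as in the patch: O(n*m) for n ranges of length m.
unsigned
max_live_regs_per_point (const std::vector<live_range_t> &ranges,
                         unsigned max_point)
{
  std::vector<unsigned> live (max_point + 1, 0);
  unsigned max_nregs = 0;
  for (const live_range_t &r : ranges)
    for (unsigned i = r.first; i <= r.last; ++i)
      {
        live[i] += r.nregs;
        max_nregs = std::max (max_nregs, live[i]);
      }
  return max_nregs;
}

// One event per range boundary.
struct event_t
{
  unsigned pt;
  bool start;
  unsigned nregs;
};

// Sweep over sorted boundaries: O(n log n), independent of range length.
unsigned
max_live_regs_sweep (const std::vector<live_range_t> &ranges)
{
  std::vector<event_t> events;
  events.reserve (2 * ranges.size ());
  for (const live_range_t &r : ranges)
    {
      events.push_back ({r.first, true, r.nregs});
      events.push_back ({r.last, false, r.nregs});
    }

  // Sort by program point; at the same point, starts come before ends
  // because both ranges are live at an inclusive boundary.
  std::sort (events.begin (), events.end (),
             [] (const event_t &a, const event_t &b) {
               if (a.pt != b.pt)
                 return a.pt < b.pt;
               return a.start && !b.start;
             });

  unsigned cur = 0;
  unsigned max_nregs = 0;
  for (const event_t &e : events)
    {
      if (e.start)
        {
          cur += e.nregs;
          max_nregs = std::max (max_nregs, cur);
        }
      else
        cur -= e.nregs;
    }
  return max_nregs;
}
```

Both functions should agree on any input where `first <= last` for every range; the sweep only pays off once ranges are long relative to their number.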

> +  for (i = 0; i < cfun->gimple_df->ssa_names->length (); i++)
> +    {
> +      tree t = ssa_name (i);
> +      if (!t)
> +       continue;

Could likely be replaced by

  tree t;
  FOR_EACH_SSA_NAME (i, t, cfun)

> +static void
> +update_local_live_ranges (
> +  vec_info *vinfo,
> +  hash_map<basic_block, vec<stmt_point>> &program_points_per_bb,
> +  hash_map<basic_block, hash_map<tree, pair>> &live_ranges_per_bb)
> +{

I just realized (sorry) that this is "nested" a bit far.  Can we still
have e.g. 

> +  if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo))
> +    {

this,

> +	      if (STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info))
> +		  != undef_vec_info_type)

this,

> +		      if (live_range)
> +			{

and this just "continue"?
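
To illustrate the restructuring outside of GCC, here is a minimal sketch.  `stmt_t` and both functions are hypothetical and only mimic the shape of the nesting; the outermost check turns into an early return, the inner ones into `continue`.

```cpp
#include <vector>

// Hypothetical stand-in for a statement; not a GCC type.
struct stmt_t
{
  bool vectorized;            // plays the role of the STMT_VINFO_TYPE check
  const unsigned *range_end;  // null when no live range is recorded
};

// Nested form: every condition adds one indentation level.
unsigned
count_nested (const std::vector<stmt_t> *stmts)
{
  unsigned n = 0;
  if (stmts)
    {
      for (const stmt_t &s : *stmts)
        {
          if (s.vectorized)
            {
              if (s.range_end)
                n += *s.range_end;
            }
        }
    }
  return n;
}

// Flattened form: bail out early, keep the real work at one level.
unsigned
count_flat (const std::vector<stmt_t> *stmts)
{
  if (!stmts)
    return 0;

  unsigned n = 0;
  for (const stmt_t &s : *stmts)
    {
      if (!s.vectorized)
        continue;
      if (!s.range_end)
        continue;
      n += *s.range_end;
    }
  return n;
}
```

The two functions compute the same result; only the control-flow shape differs.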

Apart from that, LGTM.

Regards
 Robin


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
  2023-09-12  8:19 ` Robin Dapp
@ 2023-09-12  9:14   ` juzhe.zhong
  0 siblings, 0 replies; 9+ messages in thread
From: juzhe.zhong @ 2023-09-12  9:14 UTC (permalink / raw)
  To: Robin Dapp, gcc-patches; +Cc: Robin Dapp, kito.cheng, Kito.cheng, jeffreyalaw

Thanks Robin.

I have tried your code. It works fine and the tests pass.
Does your code have O(n log n) complexity?




juzhe.zhong@rivai.ai
 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
  2023-09-12  6:49 [PATCH V4] RISC-V: Support Dynamic LMUL Cost model Juzhe-Zhong
  2023-09-12  8:19 ` Robin Dapp
@ 2023-09-12  9:17 ` Robin Dapp
  2023-09-12  9:25   ` juzhe.zhong
  1 sibling, 1 reply; 9+ messages in thread
From: Robin Dapp @ 2023-09-12  9:17 UTC (permalink / raw)
  To: Juzhe-Zhong, gcc-patches; +Cc: rdapp.gcc, kito.cheng, kito.cheng, jeffreyalaw

I did some benchmarks and, at least for calculix, the differences are
minuscule.  I'd say we can stick with the current approach and improve
as needed.

However, I noticed ICEs here:

+  gcc_assert (biggest_size >= mode_size);

and here:

+  mode = TYPE_MODE (TREE_TYPE (lhs));

when compiling calculix.

Regards
 Robin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
  2023-09-12  9:17 ` Robin Dapp
@ 2023-09-12  9:25   ` juzhe.zhong
  2023-09-12  9:31     ` Robin Dapp
  0 siblings, 1 reply; 9+ messages in thread
From: juzhe.zhong @ 2023-09-12  9:25 UTC (permalink / raw)
  To: Robin Dapp, gcc-patches; +Cc: Robin Dapp, kito.cheng, Kito.cheng, jeffreyalaw

Is calculix big ?

Could you give me the testcase to reproduce it?

For +  gcc_assert (biggest_size >= mode_size);
I currently have no idea how to fix it.

But for +  mode = TYPE_MODE (TREE_TYPE (lhs));
I think I can fix it.

if (!gimple_store_p (stmt))
                {
                  tree lhs = gimple_get_lhs (stmt);
                  mode = TYPE_MODE (TREE_TYPE (lhs));

I assumed that a statement which is not a STORE always has an LHS. It turns out that this assumption was incorrect.
I think I know the fix.

juzhe.zhong@rivai.ai


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
  2023-09-12  9:25   ` juzhe.zhong
@ 2023-09-12  9:31     ` Robin Dapp
  2023-09-12  9:36       ` juzhe.zhong
       [not found]       ` <2023091217364091739043@rivai.ai>
  0 siblings, 2 replies; 9+ messages in thread
From: Robin Dapp @ 2023-09-12  9:31 UTC (permalink / raw)
  To: juzhe.zhong, gcc-patches; +Cc: rdapp.gcc, kito.cheng, Kito.cheng, jeffreyalaw

> Is calculix big ?

It's 7 nested for loops IIRC and, when unrolling, can get pretty nasty.
I tested with -Ofast -funroll-loops.  I think wrf is even larger, maybe I
can run a full comparison test tonight to have good coverage.

> Could you give me the testcase to reproduce it?

OK, I will try to reduce it, will be Fortran, though.

Regards
 Robin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
  2023-09-12  9:31     ` Robin Dapp
@ 2023-09-12  9:36       ` juzhe.zhong
  2023-09-12 10:57         ` Robin Dapp
       [not found]       ` <2023091217364091739043@rivai.ai>
  1 sibling, 1 reply; 9+ messages in thread
From: juzhe.zhong @ 2023-09-12  9:36 UTC (permalink / raw)
  To: Robin Dapp, gcc-patches; +Cc: Robin Dapp, kito.cheng, Kito.cheng, jeffreyalaw

This is the first version of dynamic LMUL.
I didn't test it with the full GCC testsuite.

My plan is to first pass the whole GCC testsuite (including vect.exp) with the default LMUL = M1.
Then enable dynamic LMUL and test it.

Maybe we could tolerate this ICE issue for now. Then we can test it with the full GCC testsuite (I believe we can reproduce it with some case in the GCC testsuite in the future).

Is that reasonable? If yes, I will address all your comments and send V5.

juzhe.zhong@rivai.ai


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
       [not found]       ` <2023091217364091739043@rivai.ai>
@ 2023-09-12  9:38         ` juzhe.zhong
  0 siblings, 0 replies; 9+ messages in thread
From: juzhe.zhong @ 2023-09-12  9:38 UTC (permalink / raw)
  To: Robin Dapp, gcc-patches; +Cc: Robin Dapp, kito.cheng, Kito.cheng, jeffreyalaw

Then you don't need to spend time reducing the case from SPEC.

juzhe.zhong@rivai.ai


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH V4] RISC-V: Support Dynamic LMUL Cost model
  2023-09-12  9:36       ` juzhe.zhong
@ 2023-09-12 10:57         ` Robin Dapp
  0 siblings, 0 replies; 9+ messages in thread
From: Robin Dapp @ 2023-09-12 10:57 UTC (permalink / raw)
  To: juzhe.zhong, gcc-patches; +Cc: rdapp.gcc, kito.cheng, Kito.cheng, jeffreyalaw


> This is first version of dynamic LMUL.
> I didn't test it with full GCC testsuite.
> 
> My plan is to first pass all GCC testsuite (including vect.exp) with default LMUL = M1.
> Then enable dynamic LMUL to test it.
> 
> Maybe we could tolerate this ICE issue for now. Then we can test it
> with full GCC testsuite (I believe we can reproduce with some case in
> GCC testsuite in the future).
> 
> Is that reasonable ? If yes, I will fix all your comments and send V5.

Yes, works for me.

Regards
 Robin


^ permalink raw reply	[flat|nested] 9+ messages in thread

