[PATCH V2] RISC-V: Add RVV vwmacc/vwmaccu/vwmaccsu combine lowering optimization

public inbox for gcc-patches@gcc.gnu.org

* [PATCH V2] RISC-V: Add RVV vwmacc/vwmaccu/vwmaccsu combine lowering optimization
@ 2023-06-06 11:46 juzhe.zhong
  2023-06-06 11:46 ` [PATCH] RISC-V: Enable SELECT_VL for RVV juzhe.zhong
  2023-06-06 11:46 ` [PATCH] RISC-V: Support RVV VLA SLP auto-vectorization juzhe.zhong
  0 siblings, 2 replies; 8+ messages in thread
From: juzhe.zhong @ 2023-06-06 11:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: kito.cheng, palmer, rdapp.gcc, jeffreyalaw, Juzhe-Zhong

From: Juzhe-Zhong <juzhe.zhong@rivai.ai>

Fixed according to Robin's comments on the V1 patch.

This patch adds a combine optimization for the following case:
__attribute__ ((noipa)) void
vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a, uint8_t *__restrict b,
	  int n)
{
  for (int i = 0; i < n; i++)
    dst[i] += (int16_t) a[i] * (int16_t) b[i];
}

Before this patch:
...
vsext.vf2
vzext.vf2
vmadd.vv
...

After this patch:
...
vwmaccsu.vv
...
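
For reference, the arithmetic of the loop above can be modelled lane by lane in plain C. This is only a sketch of the semantics (sign-extend one input, zero-extend the other, multiply, accumulate into the wider element), not of the RTL patterns in this patch; the helper names vwmaccsu_lane and vwmaccsu_ref are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one widening multiply-accumulate lane with a signed and
   an unsigned 8-bit input (the vwmaccsu case).  Hypothetical helper.  */
int16_t
vwmaccsu_lane (int16_t acc, int8_t a, uint8_t b)
{
  /* a is sign-extended and b is zero-extended by the casts.  */
  int32_t product = (int32_t) a * (int32_t) b;
  /* Accumulate modulo 2^16, as a 16-bit destination element would.
     The final conversion back to int16_t relies on GCC's modulo
     behaviour for out-of-range values.  */
  return (int16_t) (uint16_t) ((uint16_t) acc + (uint16_t) product);
}

/* Reference loop with the same shape as the example above.  */
void
vwmaccsu_ref (int16_t *dst, const int8_t *a, const uint8_t *b, size_t n)
{
  assert (dst && a && b);
  for (size_t i = 0; i < n; i++)
    dst[i] = vwmaccsu_lane (dst[i], a[i], b[i]);
}
```

A model like this can serve as the expected value when checking the vectorized loop on small inputs.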

gcc/ChangeLog:

        * config/riscv/autovec-opt.md (*<optab>_fma<mode>): New pattern.
        (*single_<optab>mult_plus<mode>): Ditto.
        (*double_<optab>mult_plus<mode>): Ditto.
        (*sign_zero_extend_fma): Ditto.
        (*zero_sign_extend_fma): Ditto.
        * config/riscv/riscv-protos.h (enum insn_type): Add RVV_WIDEN_TERNOP.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/widen/widen-8.c: New test.
        * gcc.target/riscv/rvv/autovec/widen/widen-9.c: New test.
        * gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c: New test.
        * gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c: New test.
        * gcc.target/riscv/rvv/autovec/widen/widen_run-8.c: New test.
        * gcc.target/riscv/rvv/autovec/widen/widen_run-9.c: New test.

---
 gcc/config/riscv/autovec-opt.md               | 162 ++++++++++++++++++
 gcc/config/riscv/riscv-protos.h               |   1 +
 .../riscv/rvv/autovec/widen/widen-8.c         |  27 +++
 .../riscv/rvv/autovec/widen/widen-9.c         |  23 +++
 .../rvv/autovec/widen/widen-complicate-5.c    |  32 ++++
 .../rvv/autovec/widen/widen-complicate-6.c    |  30 ++++
 .../riscv/rvv/autovec/widen/widen_run-8.c     |  38 ++++
 .../riscv/rvv/autovec/widen/widen_run-9.c     |  35 ++++
 8 files changed, 348 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-9.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index f6052b50572..1c36b5f56be 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -170,3 +170,165 @@
   }
   [(set_attr "type" "vmalu")
    (set_attr "mode" "<MODE>")])
+
+;; =========================================================================
+;; == Widening Ternary arithmetic
+;; =========================================================================
+
+;; -------------------------------------------------------------------------
+;; ---- [INT] VWMACC
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vwmacc.vv
+;; - vwmaccu.vv
+;; -------------------------------------------------------------------------
+
+;; Combine ext + ext + fma ===> widen fma.
+;; In most circumstances, the loop vectorizer generates the following IR:
+;;   vect__8.64_40 = (vector([4,4]) int) vect__7.63_41;
+;;   vect__11.68_35 = (vector([4,4]) int) vect__10.67_36;
+;;   vect__13.70_33 = .FMA (vect__11.68_35, vect__8.64_40, vect__4.60_45);
+(define_insn_and_split "*<optab>_fma<mode>"
+  [(set (match_operand:VWEXTI 0 "register_operand")
+	(plus:VWEXTI
+	  (mult:VWEXTI
+	    (any_extend:VWEXTI
+	      (match_operand:<V_DOUBLE_TRUNC> 2 "register_operand"))
+	    (any_extend:VWEXTI
+	      (match_operand:<V_DOUBLE_TRUNC> 3 "register_operand")))
+	  (match_operand:VWEXTI 1 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    rtx ops[] = {operands[0], operands[1], operands[2], operands[3]};
+    riscv_vector::emit_vlmax_ternary_insn (code_for_pred_widen_mul_plus (<CODE>, <MODE>mode),
+					   riscv_vector::RVV_WIDEN_TERNOP, ops);
+    DONE;
+  }
+  [(set_attr "type" "viwmuladd")
+   (set_attr "mode" "<V_DOUBLE_TRUNC>")])
+
+;; This helps to match ext + fma to enhance the combine optimizations.
+(define_insn_and_split "*single_<optab>mult_plus<mode>"
+  [(set (match_operand:VWEXTI 0 "register_operand")
+	(plus:VWEXTI
+	  (mult:VWEXTI
+	    (any_extend:VWEXTI
+	      (match_operand:<V_DOUBLE_TRUNC> 2 "register_operand"))
+	    (match_operand:VWEXTI 3 "register_operand"))
+	  (match_operand:VWEXTI 1 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    insn_code icode = code_for_pred_vf2 (<CODE>, <MODE>mode);
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    rtx ext_ops[] = {tmp, operands[2]};
+    riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ext_ops);
+
+    rtx dst = expand_ternary_op (<MODE>mode, fma_optab, tmp, operands[3],
+				 operands[1], operands[0], 0);
+    emit_move_insn (operands[0], dst);
+    DONE;
+  }
+  [(set_attr "type" "viwmuladd")
+   (set_attr "mode" "<V_DOUBLE_TRUNC>")])
+
+;; Combine ext + ext + mult + plus ===> widen fma.
+;; We have some special cases generated by LoopVectorizer:
+;;   vect__8.18_46 = (vector([8,8]) signed short) vect__7.17_47;
+;;   vect__11.22_41 = (vector([8,8]) signed short) vect__10.21_42;
+;;   vect__12.23_40 = vect__11.22_41 * vect__8.18_46;
+;;   vect__14.25_38 = vect__13.24_39 + vect__5.14_51;
+;; This situation doesn't generate FMA IR.
+(define_insn_and_split "*double_<optab>mult_plus<mode>"
+  [(set (match_operand:VWEXTI 0 "register_operand")
+	(if_then_else:VWEXTI
+	  (unspec:<VM>
+	    [(match_operand:<VM> 1 "vector_mask_operand")
+	     (match_operand 6 "vector_length_operand")
+	     (match_operand 7 "const_int_operand")
+	     (match_operand 8 "const_int_operand")
+	     (match_operand 9 "const_int_operand")
+	     (reg:SI VL_REGNUM)
+	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+          (plus:VWEXTI
+	    (if_then_else:VWEXTI
+	      (unspec:<VM>
+	        [(match_dup 1)
+	         (match_dup 6)
+	         (match_dup 7)
+	         (match_dup 8)
+	         (match_dup 9)
+	         (reg:SI VL_REGNUM)
+	         (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+	      (mult:VWEXTI
+	        (any_extend:VWEXTI
+	          (match_operand:<V_DOUBLE_TRUNC> 4 "register_operand"))
+	        (any_extend:VWEXTI
+	          (match_operand:<V_DOUBLE_TRUNC> 5 "register_operand")))
+              (match_operand:VWEXTI 2 "vector_undef_operand"))
+	    (match_operand:VWEXTI 3 "register_operand"))
+          (match_dup 2)))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    emit_insn (gen_pred_widen_mul_plus (<CODE>, <MODE>mode, operands[0],
+					operands[1], operands[3], operands[4],
+					operands[5], operands[6], operands[7],
+					operands[8], operands[9]));
+    DONE;
+  }
+  [(set_attr "type" "viwmuladd")
+   (set_attr "mode" "<V_DOUBLE_TRUNC>")])
+
+;; Combine sign_extend + zero_extend + fma ===> widen fma (su).
+(define_insn_and_split "*sign_zero_extend_fma"
+  [(set (match_operand:VWEXTI 0 "register_operand")
+	(plus:VWEXTI
+	  (mult:VWEXTI
+	    (sign_extend:VWEXTI
+	      (match_operand:<V_DOUBLE_TRUNC> 2 "register_operand"))
+	    (zero_extend:VWEXTI
+	      (match_operand:<V_DOUBLE_TRUNC> 3 "register_operand")))
+	  (match_operand:VWEXTI 1 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    riscv_vector::emit_vlmax_ternary_insn (code_for_pred_widen_mul_plussu (<MODE>mode),
+					   riscv_vector::RVV_WIDEN_TERNOP, operands);
+    DONE;
+  }
+  [(set_attr "type" "viwmuladd")
+   (set_attr "mode" "<V_DOUBLE_TRUNC>")])
+
+;; This helps to match zero_extend + sign_extend + fma
+;; to enhance the combine optimizations.
+(define_insn_and_split "*zero_sign_extend_fma"
+  [(set (match_operand:VWEXTI 0 "register_operand")
+	(plus:VWEXTI
+	  (mult:VWEXTI
+	    (zero_extend:VWEXTI
+	      (match_operand:<V_DOUBLE_TRUNC> 2 "register_operand"))
+	    (sign_extend:VWEXTI
+	      (match_operand:<V_DOUBLE_TRUNC> 3 "register_operand")))
+	  (match_operand:VWEXTI 1 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    rtx ops[] = {operands[0], operands[1], operands[3], operands[2]};
+    riscv_vector::emit_vlmax_ternary_insn (code_for_pred_widen_mul_plussu (<MODE>mode),
+					   riscv_vector::RVV_WIDEN_TERNOP, ops);
+    DONE;
+  }
+  [(set_attr "type" "viwmuladd")
+   (set_attr "mode" "<V_DOUBLE_TRUNC>")])
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 27ecd16e496..b311b937f17 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -143,6 +143,7 @@ enum insn_type
   RVV_CMP_MU_OP = RVV_CMP_OP + 2, /* +2 means mask and maskoff operand.  */
   RVV_UNOP_MU = RVV_UNOP + 2,	  /* Likewise.  */
   RVV_TERNOP = 5,
+  RVV_WIDEN_TERNOP = 4,
   RVV_SCALAR_MOV_OP = 4, /* +1 for VUNDEF according to vector.md.  */
 };
 enum vlmul_type
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-8.c
new file mode 100644
index 00000000000..f3ca07c02e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-8.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2)                                                \
+  __attribute__ ((noipa)) void vwmacc_##TYPE1_##TYPE2 (TYPE1 *__restrict dst,  \
+						       TYPE2 *__restrict a,    \
+						       TYPE2 *__restrict b,    \
+						       int n)                  \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      dst[i] += (TYPE1) a[i] * (TYPE1) b[i];                                   \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int16_t, int8_t)                                                  \
+  TEST_TYPE (uint16_t, uint8_t)                                                \
+  TEST_TYPE (int32_t, int16_t)                                                 \
+  TEST_TYPE (uint32_t, uint16_t)                                               \
+  TEST_TYPE (int64_t, int32_t)                                                 \
+  TEST_TYPE (uint64_t, uint32_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvwmacc\.vv} 3 } } */
+/* { dg-final { scan-assembler-times {\tvwmaccu\.vv} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-9.c
new file mode 100644
index 00000000000..969a1e8f80c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-9.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2, TYPE3)                                         \
+  __attribute__ ((noipa)) void vwmacc_##TYPE1_##TYPE2 (TYPE1 *__restrict dst,  \
+						       TYPE2 *__restrict a,    \
+						       TYPE3 *__restrict b,    \
+						       int n)                  \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      dst[i] += (TYPE1) a[i] * (TYPE1) b[i];                                   \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int16_t, int8_t, uint8_t)                                         \
+  TEST_TYPE (int32_t, int16_t, uint16_t)                                       \
+  TEST_TYPE (int64_t, int32_t, uint32_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvwmaccsu\.vv} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c
new file mode 100644
index 00000000000..187b6db21fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2)                                                \
+  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                         \
+    TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
+    TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,          \
+    TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      {                                                                        \
+	dst[i] += (TYPE1) a[i] * (TYPE1) b[i];                                 \
+	dst2[i] += (TYPE1) a2[i] * (TYPE1) b[i];                               \
+	dst3[i] += (TYPE1) a2[i] * (TYPE1) a[i];                               \
+	dst4[i] += (TYPE1) a[i] * (TYPE1) b2[i];                               \
+      }                                                                        \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int16_t, int8_t)                                                  \
+  TEST_TYPE (uint16_t, uint8_t)                                                \
+  TEST_TYPE (int32_t, int16_t)                                                 \
+  TEST_TYPE (uint32_t, uint16_t)                                               \
+  TEST_TYPE (int64_t, int32_t)                                                 \
+  TEST_TYPE (uint64_t, uint32_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvwmacc\.vv} 12 } } */
+/* { dg-final { scan-assembler-times {\tvwmaccu\.vv} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c
new file mode 100644
index 00000000000..fa56f21aa81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2, TYPE3)                                         \
+  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                         \
+    TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
+    TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE3 *__restrict b,          \
+    TYPE3 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      {                                                                        \
+	dst[i] += (TYPE1) a[i] * (TYPE1) b[i];                                 \
+	dst2[i] += (TYPE1) a2[i] * (TYPE1) b[i];                               \
+	dst3[i] += (TYPE1) a2[i] * (TYPE1) a[i];                               \
+	dst4[i] += (TYPE1) a[i] * (TYPE1) b2[i];                               \
+      }                                                                        \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int16_t, int8_t, uint8_t)                                         \
+  TEST_TYPE (int32_t, int16_t, uint16_t)                                       \
+  TEST_TYPE (int64_t, int32_t, uint32_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvwmaccsu\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvwmacc\.vv} 3 } } */
+/* { dg-final { scan-assembler-times {\tvwmaccu\.vv} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-8.c
new file mode 100644
index 00000000000..f4840d30dc2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-8.c
@@ -0,0 +1,38 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+
+#include <assert.h>
+#include "widen-8.c"
+
+#define SZ 512
+
+#define RUN(TYPE1, TYPE2, LIMIT)                                               \
+  TYPE2 a##TYPE2[SZ];                                                          \
+  TYPE2 b##TYPE2[SZ];                                                          \
+  TYPE1 dst##TYPE1[SZ];                                                        \
+  TYPE1 dst2##TYPE1[SZ];                                                       \
+  for (int i = 0; i < SZ; i++)                                                 \
+    {                                                                          \
+      a##TYPE2[i] = LIMIT + i % 8723;                                          \
+      b##TYPE2[i] = LIMIT + i & 1964;                                          \
+      dst##TYPE1[i] = LIMIT + i & 628;                                         \
+      dst2##TYPE1[i] = LIMIT + i & 628;                                        \
+    }                                                                          \
+  vwmacc_##TYPE1_##TYPE2 (dst##TYPE1, a##TYPE2, b##TYPE2, SZ);                 \
+  for (int i = 0; i < SZ; i++)                                                 \
+    assert (dst##TYPE1[i]                                                      \
+	    == ((TYPE1) a##TYPE2[i] * (TYPE1) b##TYPE2[i]) + dst2##TYPE1[i]);
+
+#define RUN_ALL()                                                              \
+  RUN (int16_t, int8_t, -128)                                                  \
+  RUN (uint16_t, uint8_t, 255)                                                 \
+  RUN (int32_t, int16_t, -32768)                                               \
+  RUN (uint32_t, uint16_t, 65535)                                              \
+  RUN (int64_t, int32_t, -2147483648)                                          \
+  RUN (uint64_t, uint32_t, 4294967295)
+
+int
+main ()
+{
+  RUN_ALL ()
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-9.c
new file mode 100644
index 00000000000..2caa09a2c5a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-9.c
@@ -0,0 +1,35 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+
+#include <assert.h>
+#include "widen-9.c"
+
+#define SZ 512
+
+#define RUN(TYPE1, TYPE2, TYPE3, LIMIT)                                        \
+  TYPE2 a##TYPE2[SZ];                                                          \
+  TYPE3 b##TYPE3[SZ];                                                          \
+  TYPE1 dst##TYPE1[SZ];                                                        \
+  TYPE1 dst2##TYPE1[SZ];                                                       \
+  for (int i = 0; i < SZ; i++)                                                 \
+    {                                                                          \
+      a##TYPE2[i] = LIMIT + i % 8723;                                          \
+      b##TYPE3[i] = LIMIT + i & 1964;                                          \
+      dst##TYPE1[i] = LIMIT + i & 728;                                         \
+      dst2##TYPE1[i] = LIMIT + i & 728;                                        \
+    }                                                                          \
+  vwmacc_##TYPE1_##TYPE2 (dst##TYPE1, a##TYPE2, b##TYPE3, SZ);                 \
+  for (int i = 0; i < SZ; i++)                                                 \
+    assert (dst##TYPE1[i]                                                      \
+	    == ((TYPE1) a##TYPE2[i] * (TYPE1) b##TYPE3[i]) + dst2##TYPE1[i]);
+
+#define RUN_ALL()                                                              \
+  RUN (int16_t, int8_t, uint8_t, -128)                                         \
+  RUN (int32_t, int16_t, uint16_t, -32768)                                     \
+  RUN (int64_t, int32_t, uint32_t, -2147483648)
+
+int
+main ()
+{
+  RUN_ALL ()
+}
-- 
2.36.3



* [PATCH] RISC-V: Enable SELECT_VL for RVV
  2023-06-06 11:46 [PATCH V2] RISC-V: Add RVV vwmacc/vwmaccu/vwmaccsu combine lowering optimization juzhe.zhong
@ 2023-06-06 11:46 ` juzhe.zhong
  2023-06-06 11:49   ` 钟居哲
  2023-06-06 11:46 ` [PATCH] RISC-V: Support RVV VLA SLP auto-vectorization juzhe.zhong
  1 sibling, 1 reply; 8+ messages in thread
From: juzhe.zhong @ 2023-06-06 11:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: kito.cheng, palmer, rdapp.gcc, jeffreyalaw, Juzhe-Zhong

From: Juzhe-Zhong <juzhe.zhong@rivai.ai>

gcc/ChangeLog:

        * config/riscv/autovec.md (select_vl<mode>): New pattern.
        * config/riscv/riscv-protos.h (gen_no_side_effects_vsetvl_rtx): Export as global.
        * config/riscv/riscv-v.cc (gen_no_side_effects_vsetvl_rtx): Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: Adapt test.
        * gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: Ditto.
        * gcc.target/riscv/rvv/autovec/partial/select_vl-1.c: New test.

---
 gcc/config/riscv/autovec.md                   | 19 +++++++++++++
 gcc/config/riscv/riscv-protos.h               |  1 +
 gcc/config/riscv/riscv-v.cc                   |  2 +-
 .../riscv/rvv/autovec/partial/select_vl-1.c   | 28 +++++++++++++++++++
 .../riscv/rvv/autovec/ternop/ternop-2.c       |  2 +-
 .../riscv/rvv/autovec/ternop/ternop-5.c       |  2 +-
 6 files changed, 51 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 9f4492db23c..c298f069714 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -18,6 +18,25 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
+;; =========================================================================
+;; == SELECT_VL
+;; =========================================================================
+
+(define_expand "select_vl<mode>"
+  [(match_operand:P 0 "register_operand")
+   (match_operand:P 1 "vector_length_operand")
+   (match_operand:P 2 "")]
+  "TARGET_VECTOR"
+{
+  poly_int64 nunits = rtx_to_poly_int64 (operands[2]);
+  /* We arbitrarily pick QImode as the inner scalar mode to get the vector
+     mode, since vsetvl only demands the ratio.  The VSETVL pass optimizes it.  */
+  scalar_int_mode mode = QImode;
+  machine_mode rvv_mode = riscv_vector::get_vector_mode (mode, nunits).require ();
+  emit_insn (riscv_vector::gen_no_side_effects_vsetvl_rtx (rvv_mode, operands[0], operands[1]));
+  DONE;
+})
+
 ;; =========================================================================
 ;; == Loads/Stores
 ;; =========================================================================
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 00e1b20c6c6..d770e5e826e 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -232,6 +232,7 @@ enum vlen_enum
   RVV_64 = 64,
   RVV_65536 = 65536
 };
+rtx gen_no_side_effects_vsetvl_rtx (machine_mode, rtx, rtx);
 bool slide1_sew64_helper (int, machine_mode, machine_mode,
 			  machine_mode, rtx *);
 rtx gen_avl_for_scalar_move (rtx);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 49752cd8899..83277fc2c05 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1280,7 +1280,7 @@ force_vector_length_operand (rtx vl)
   return vl;
 }
 
-static rtx
+rtx
 gen_no_side_effects_vsetvl_rtx (machine_mode vmode, rtx vl, rtx avl)
 {
   unsigned int sew = get_sew (vmode);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-1.c
new file mode 100644
index 00000000000..b8e0ca0f1f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fno-vect-cost-model -fno-tree-loop-distribute-patterns -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE)                                                        \
+  __attribute__ ((noipa)) void select_vl_##TYPE (TYPE *__restrict dst,         \
+						 TYPE *__restrict a, int n)    \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      dst[i] = a[i];                                                           \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int8_t)                                                           \
+  TEST_TYPE (uint8_t)                                                          \
+  TEST_TYPE (int16_t)                                                          \
+  TEST_TYPE (uint16_t)                                                         \
+  TEST_TYPE (int32_t)                                                          \
+  TEST_TYPE (uint32_t)                                                         \
+  TEST_TYPE (int64_t)                                                          \
+  TEST_TYPE (uint64_t)                                                         \
+  TEST_TYPE (float)                                                            \
+  TEST_TYPE (double)
+
+TEST_ALL ()
+
+/* { dg-final { scan-tree-dump-times "\.SELECT_VL" 10 "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-2.c
index 89eeaf6315f..e52e07ddd09 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-schedule-insns" } */
 
 #include <stdint-gcc.h>
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-5.c
index a9a7198feb4..49c85efbf3a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-5.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-schedule-insns" } */
 
 #include <stdint-gcc.h>
 
-- 
2.36.1



* [PATCH] RISC-V: Support RVV VLA SLP auto-vectorization
  2023-06-06 11:46 [PATCH V2] RISC-V: Add RVV vwmacc/vwmaccu/vwmaccsu combine lowering optimization juzhe.zhong
  2023-06-06 11:46 ` [PATCH] RISC-V: Enable SELECT_VL for RVV juzhe.zhong
@ 2023-06-06 11:46 ` juzhe.zhong
  1 sibling, 0 replies; 8+ messages in thread
From: juzhe.zhong @ 2023-06-06 11:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: kito.cheng, palmer, rdapp.gcc, jeffreyalaw, Juzhe-Zhong

From: Juzhe-Zhong <juzhe.zhong@rivai.ai>

This patch enables basic VLA SLP auto-vectorization.
Consider the following case:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
    {
      a[i * 8 + 0] = b[i * 8 + 7] + 1;
      a[i * 8 + 1] = b[i * 8 + 7] + 2;
      a[i * 8 + 2] = b[i * 8 + 7] + 8;
      a[i * 8 + 3] = b[i * 8 + 7] + 4;
      a[i * 8 + 4] = b[i * 8 + 7] + 5;
      a[i * 8 + 5] = b[i * 8 + 7] + 6;
      a[i * 8 + 6] = b[i * 8 + 7] + 7;
      a[i * 8 + 7] = b[i * 8 + 7] + 3;
    }
}

To enable VLA SLP auto-vectorization, we must be able to handle the following constant vectors:

1. NPATTERNS = 8, NELTS_PER_PATTERN = 3.
{ 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }

2. NPATTERNS = 8, NELTS_PER_PATTERN = 1. 
{ 1, 2, 8, 4, 5, 6, 7, 3, ... }

These vectors can be generated in the prologue.
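
As a rough illustration of how NPATTERNS and NELTS_PER_PATTERN describe the two vectors above, the sketch below expands such an encoding to an arbitrary element. It assumes the usual interleaved-pattern reading: element i belongs to pattern i % NPATTERNS, one or two encoded elements per pattern mean the last one repeats, and three mean the pattern continues as a linear series. The names encoded_elt, slp_index_encoding and slp_addend_encoding are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Encoded form of { 0 x8, 8 x8, 16 x8, ... }: 8 patterns, 3 elements each.  */
const int64_t slp_index_encoding[24] = {
  0, 0, 0, 0, 0, 0, 0, 0,
  8, 8, 8, 8, 8, 8, 8, 8,
  16, 16, 16, 16, 16, 16, 16, 16
};

/* Encoded form of { 1, 2, 8, 4, 5, 6, 7, 3, ... }: 8 patterns, 1 element each.  */
const int64_t slp_addend_encoding[8] = { 1, 2, 8, 4, 5, 6, 7, 3 };

/* Hypothetical helper: return element I of the vector described by ENCODED,
   which holds NPATTERNS * NELTS_PER_PATTERN interleaved values.  */
int64_t
encoded_elt (const int64_t *encoded, unsigned npatterns,
	     unsigned nelts_per_pattern, unsigned i)
{
  assert (npatterns > 0 && nelts_per_pattern >= 1 && nelts_per_pattern <= 3);
  unsigned pattern = i % npatterns; /* Which pattern the element belongs to.  */
  unsigned pos = i / npatterns;	    /* Position inside that pattern.  */

  if (pos < nelts_per_pattern)
    return encoded[pos * npatterns + pattern];

  /* Past the encoded part: with 1 or 2 elements the last one repeats,
     with 3 the pattern continues as a linear series.  */
  int64_t last = encoded[(nelts_per_pattern - 1) * npatterns + pattern];
  if (nelts_per_pattern < 3)
    return last;
  int64_t prev = encoded[(nelts_per_pattern - 2) * npatterns + pattern];
  return last + (int64_t) (pos - (nelts_per_pattern - 1)) * (last - prev);
}
```

With this reading, the first vector is a stepped index (each group of 8 lanes advances by 8) and the second is a plain repetition of an 8-element sequence.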

After this patch, we end up with the following codegen:

Prologue:
...
        vsetvli a7,zero,e16,m2,ta,ma
        vid.v   v4
        vsrl.vi v4,v4,3
        li      a3,8
        vmul.vx v4,v4,a3  ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
...
        li      t1,67633152
        addi    t1,t1,513
        li      a3,50790400
        addi    a3,a3,1541
        slli    a3,a3,32
        add     a3,a3,t1
        vsetvli t1,zero,e64,m1,ta,ma
        vmv.v.x v3,a3   ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... }
...
LoopBody:
...
        min     a3,...
        vsetvli zero,a3,e8,m1,ta,ma
        vle8.v  v2,0(a6)
        vsetvli a7,zero,e8,m1,ta,ma
        vrgatherei16.vv v1,v2,v4
        vadd.vv v1,v1,v3
        vsetvli zero,a3,e8,m1,ta,ma
        vse8.v  v1,0(a2)
        add     a6,a6,a4
        add     a2,a2,a4
        mv      a3,a5
        add     a5,a5,t1
        bgtu    a3,a4,.L3
...
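
The two prologue sequences above can be checked with a small scalar model. The sketch assumes that vid.v yields the element index, so that vsrl.vi by 3 followed by vmul.vx by 8 gives (i >> 3) * 8, and that byte i of the 64-bit value broadcast by vmv.v.x is seen as 8-bit element i when the register is later read with e8. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* vid.v; vsrl.vi v4,v4,3; vmul.vx v4,v4,8: element I of the index vector.  */
uint16_t
index_elt (unsigned i)
{
  assert (i <= UINT16_MAX);
  return (uint16_t) ((i >> 3) * 8);
}

/* The li/addi/slli/add sequence that builds the value passed to vmv.v.x.  */
uint64_t
packed_addends (void)
{
  uint64_t lo = 67633152u + 513u;  /* li t1,67633152; addi t1,t1,513  */
  uint64_t hi = 50790400u + 1541u; /* li a3,50790400; addi a3,a3,1541  */
  return (hi << 32) + lo;	   /* slli a3,a3,32; add a3,a3,t1  */
}

/* Byte I (little-endian, repeating every 8 bytes) of the packed value.  */
uint8_t
addend_elt (unsigned i)
{
  return (uint8_t) (packed_addends () >> (8 * (i % 8)));
}
```

The packed value works out to 0x0307060504080201, whose bytes from least to most significant are 1, 2, 8, 4, 5, 6, 7, 3.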

Note: for SEW = 8 we need "vrgatherei16.vv" instead of "vrgather.vv", because "vrgatherei16.vv" uses 16-bit indices and
      so covers a larger range than "vrgather.vv" (whose 8-bit indices can reach at most element index 255).
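
To make the index-range argument concrete, the sketch below computes VLMAX = LMUL * VLEN / SEW for an integral LMUL and checks whether every element index fits in an index element of a given width. The VLEN values used with it are illustrative assumptions, and vlmax and indices_fit are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>

/* VLMAX = LMUL * VLEN / SEW, for an integral LMUL.  */
unsigned
vlmax (unsigned vlen_bits, unsigned lmul, unsigned sew_bits)
{
  assert (sew_bits > 0 && lmul > 0);
  return lmul * vlen_bits / sew_bits;
}

/* True if every element index 0 .. VLMAX_ELTS - 1 fits in an unsigned
   index element that is INDEX_BITS wide.  */
bool
indices_fit (unsigned vlmax_elts, unsigned index_bits)
{
  assert (index_bits > 0 && index_bits < 32);
  return vlmax_elts == 0 || vlmax_elts - 1 <= (1u << index_bits) - 1;
}
```

For example, with e8 and LMUL = 1 a VLEN of 128 gives 16 elements, which 8-bit indices can address, while a VLEN of 4096 gives 512 elements, which need the 16-bit indices of "vrgatherei16.vv".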
Epilogue:
        lbu     a5,799(a1)
        addiw   a4,a5,1
        sb      a4,792(a0)
        addiw   a4,a5,2
        sb      a4,793(a0)
        addiw   a4,a5,8
        sb      a4,794(a0)
        addiw   a4,a5,4
        sb      a4,795(a0)
        addiw   a4,a5,5
        sb      a4,796(a0)
        addiw   a4,a5,6
        sb      a4,797(a0)
        addiw   a4,a5,7
        sb      a4,798(a0)
        addiw   a5,a5,3
        sb      a5,799(a0)
        ret

One thing remains to be done: "Epilogue auto-vectorization", which needs VLS modes support.
I will add VLS modes support for "Epilogue auto-vectorization" in the future.

gcc/ChangeLog:

        * config/riscv/riscv-protos.h (expand_vec_perm_const): New function.
        * config/riscv/riscv-v.cc (rvv_builder::can_duplicate_repeating_sequence_p): Support POLY handling.
        (rvv_builder::single_step_npatterns_p): New function.
        (rvv_builder::npatterns_all_equal_p): Ditto.
        (const_vec_all_in_range_p): Support POLY handling.
        (gen_const_vector_dup): Ditto.
        (emit_vlmax_gather_insn): Add vrgatherei16.
        (emit_vlmax_masked_gather_mu_insn): Ditto.
        (expand_const_vector): Add VLA SLP const vector support.
        (expand_vec_perm): Support POLY.
        (struct expand_vec_perm_d): New struct.
        (shuffle_generic_patterns): New function.
        (expand_vec_perm_const_1): Ditto.
        (expand_vec_perm_const): Ditto.
        * config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto.
        (TARGET_VECTORIZE_VEC_PERM_CONST): New target hook.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA vectorizer.
        * gcc.target/riscv/rvv/autovec/v-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-4.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-5.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-6.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-7.c: New test.

---
 gcc/config/riscv/riscv-protos.h               |   2 +
 gcc/config/riscv/riscv-v.cc                   | 352 ++++++++++++++++--
 gcc/config/riscv/riscv.cc                     |  16 +
 .../riscv/rvv/autovec/partial/slp-1.c         |  22 ++
 .../riscv/rvv/autovec/partial/slp-2.c         |  22 ++
 .../riscv/rvv/autovec/partial/slp-3.c         |  22 ++
 .../riscv/rvv/autovec/partial/slp-4.c         |  22 ++
 .../riscv/rvv/autovec/partial/slp-5.c         |  22 ++
 .../riscv/rvv/autovec/partial/slp-6.c         |  23 ++
 .../riscv/rvv/autovec/partial/slp-7.c         |  15 +
 .../riscv/rvv/autovec/partial/slp_run-1.c     |  66 ++++
 .../riscv/rvv/autovec/partial/slp_run-2.c     |  67 ++++
 .../riscv/rvv/autovec/partial/slp_run-3.c     |  67 ++++
 .../riscv/rvv/autovec/partial/slp_run-4.c     |  67 ++++
 .../riscv/rvv/autovec/partial/slp_run-5.c     |  67 ++++
 .../riscv/rvv/autovec/partial/slp_run-6.c     |  67 ++++
 .../riscv/rvv/autovec/partial/slp_run-7.c     |  58 +++
 .../gcc.target/riscv/rvv/autovec/scalable-1.c |   2 +-
 .../gcc.target/riscv/rvv/autovec/v-1.c        |   7 +-
 .../riscv/rvv/autovec/zve32f_zvl128b-1.c      |   2 +-
 .../riscv/rvv/autovec/zve32x_zvl128b-1.c      |   2 +-
 .../gcc.target/riscv/rvv/autovec/zve64d-1.c   |   2 +-
 .../riscv/rvv/autovec/zve64d_zvl128b-1.c      |   2 +-
 .../gcc.target/riscv/rvv/autovec/zve64f-1.c   |   2 +-
 .../riscv/rvv/autovec/zve64f_zvl128b-1.c      |   2 +-
 .../riscv/rvv/autovec/zve64x_zvl128b-1.c      |   2 +-
 26 files changed, 963 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index d770e5e826e..27ecd16e496 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -168,6 +168,8 @@ void init_builtins (void);
 const char *mangle_builtin_type (const_tree);
 #ifdef GCC_TARGET_H
 bool verify_type_context (location_t, type_context_kind, const_tree, bool);
+bool expand_vec_perm_const (machine_mode, machine_mode, rtx, rtx, rtx,
+			    const vec_perm_indices &);
 #endif
 void handle_pragma_vector (void);
 tree builtin_decl (unsigned, bool);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 83277fc2c05..4864429ed06 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -251,9 +251,12 @@ public:
     m_inner_mode = GET_MODE_INNER (mode);
     m_inner_bits_size = GET_MODE_BITSIZE (m_inner_mode);
     m_inner_bytes_size = GET_MODE_SIZE (m_inner_mode);
+    m_mask_mode = get_mask_mode (mode).require ();
 
     gcc_assert (
       int_mode_for_size (inner_bits_size (), 0).exists (&m_inner_int_mode));
+    m_int_mode
+      = get_vector_mode (m_inner_int_mode, GET_MODE_NUNITS (mode)).require ();
   }
 
   bool can_duplicate_repeating_sequence_p ();
@@ -262,9 +265,14 @@ public:
   bool repeating_sequence_use_merge_profitable_p ();
   rtx get_merge_scalar_mask (unsigned int) const;
 
+  bool single_step_npatterns_p () const;
+  bool npatterns_all_equal_p () const;
+
   machine_mode new_mode () const { return m_new_mode; }
   scalar_mode inner_mode () const { return m_inner_mode; }
   scalar_int_mode inner_int_mode () const { return m_inner_int_mode; }
+  machine_mode mask_mode () const { return m_mask_mode; }
+  machine_mode int_mode () const { return m_int_mode; }
   unsigned int inner_bits_size () const { return m_inner_bits_size; }
   unsigned int inner_bytes_size () const { return m_inner_bytes_size; }
 
@@ -273,6 +281,8 @@ private:
   scalar_int_mode m_inner_int_mode;
   machine_mode m_new_mode;
   scalar_int_mode m_new_inner_mode;
+  machine_mode m_mask_mode;
+  machine_mode m_int_mode;
   unsigned int m_inner_bits_size;
   unsigned int m_inner_bytes_size;
 };
@@ -290,7 +300,9 @@ rvv_builder::can_duplicate_repeating_sequence_p ()
       || GET_MODE_SIZE (m_new_inner_mode) > UNITS_PER_WORD
       || !get_vector_mode (m_new_inner_mode, new_size).exists (&m_new_mode))
     return false;
-  return repeating_sequence_p (0, full_nelts ().to_constant (), npatterns ());
+  if (full_nelts ().is_constant ())
+    return repeating_sequence_p (0, full_nelts ().to_constant (), npatterns ());
+  return nelts_per_pattern () == 1;
 }
 
 /* Return true if it is a repeating sequence that using
@@ -398,6 +410,48 @@ rvv_builder::get_merge_scalar_mask (unsigned int index_in_pattern) const
   return gen_int_mode (mask, inner_int_mode ());
 }
 
+/* Return true if the variable-length vector is single step.  */
+bool
+rvv_builder::single_step_npatterns_p () const
+{
+  if (nelts_per_pattern () != 3)
+    return false;
+
+  poly_int64 step
+    = rtx_to_poly_int64 (elt (npatterns ())) - rtx_to_poly_int64 (elt (0));
+  for (unsigned int i = 0; i < npatterns (); i++)
+    {
+      poly_int64 ele0 = rtx_to_poly_int64 (elt (i));
+      poly_int64 ele1 = rtx_to_poly_int64 (elt (npatterns () + i));
+      poly_int64 ele2 = rtx_to_poly_int64 (elt (npatterns () * 2 + i));
+      poly_int64 diff1 = ele1 - ele0;
+      poly_int64 diff2 = ele2 - ele1;
+      if (maybe_ne (step, diff1) || maybe_ne (step, diff2))
+	return false;
+    }
+  return true;
+}
+
+/* Return true if all elements of NPATTERNS are equal.
+
+   E.g. NPATTERNS = 4:
+     { 2, 2, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8, 16, 16, 16, 16, ... }
+   E.g. NPATTERNS = 8:
+     { 2, 2, 2, 2, 2, 2, 2, 2, 8, 8, 8, 8, 8, 8, 8, 8, ... }
+*/
+bool
+rvv_builder::npatterns_all_equal_p () const
+{
+  poly_int64 ele0 = rtx_to_poly_int64 (elt (0));
+  for (unsigned int i = 1; i < npatterns (); i++)
+    {
+      poly_int64 ele = rtx_to_poly_int64 (elt (i));
+      if (!known_eq (ele, ele0))
+	return false;
+    }
+  return true;
+}
+
 static unsigned
 get_sew (machine_mode mode)
 {
@@ -425,7 +479,7 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT minval,
    future.  */
 
 static bool
-const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT maxval)
+const_vec_all_in_range_p (rtx vec, poly_int64 minval, poly_int64 maxval)
 {
   if (!CONST_VECTOR_P (vec)
       || GET_MODE_CLASS (GET_MODE (vec)) != MODE_VECTOR_INT)
@@ -440,8 +494,10 @@ const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT maxval)
   for (int i = 0; i < nunits; i++)
     {
       rtx vec_elem = CONST_VECTOR_ELT (vec, i);
-      if (!CONST_INT_P (vec_elem)
-	  || !IN_RANGE (INTVAL (vec_elem), minval, maxval))
+      poly_int64 value;
+      if (!poly_int_rtx_p (vec_elem, &value)
+	  || maybe_lt (value, minval)
+	  || maybe_gt (value, maxval))
 	return false;
     }
   return true;
@@ -453,7 +509,7 @@ const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT maxval)
    future.  */
 
 static rtx
-gen_const_vector_dup (machine_mode mode, HOST_WIDE_INT val)
+gen_const_vector_dup (machine_mode mode, poly_int64 val)
 {
   rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
   return gen_const_vec_duplicate (mode, c);
@@ -727,7 +783,10 @@ emit_vlmax_gather_insn (rtx target, rtx op, rtx sel)
   rtx elt;
   insn_code icode;
   machine_mode data_mode = GET_MODE (target);
-  if (const_vec_duplicate_p (sel, &elt))
+  machine_mode sel_mode = GET_MODE (sel);
+  if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode)))
+    icode = code_for_pred_gatherei16 (data_mode);
+  else if (const_vec_duplicate_p (sel, &elt))
     {
       icode = code_for_pred_gather_scalar (data_mode);
       sel = elt;
@@ -744,7 +803,10 @@ emit_vlmax_masked_gather_mu_insn (rtx target, rtx op, rtx sel, rtx mask)
   rtx elt;
   insn_code icode;
   machine_mode data_mode = GET_MODE (target);
-  if (const_vec_duplicate_p (sel, &elt))
+  machine_mode sel_mode = GET_MODE (sel);
+  if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode)))
+    icode = code_for_pred_gatherei16 (data_mode);
+  else if (const_vec_duplicate_p (sel, &elt))
     {
       icode = code_for_pred_gather_scalar (data_mode);
       sel = elt;
@@ -895,11 +957,130 @@ expand_const_vector (rtx target, rtx src)
       return;
     }
 
-  /* TODO: We only support const duplicate vector for now. More cases
-     will be supported when we support auto-vectorization:
+  /* Handle variable-length vector.  */
+  unsigned int nelts_per_pattern = CONST_VECTOR_NELTS_PER_PATTERN (src);
+  unsigned int npatterns = CONST_VECTOR_NPATTERNS (src);
+  rvv_builder builder (mode, npatterns, nelts_per_pattern);
+  for (unsigned int i = 0; i < nelts_per_pattern; i++)
+    {
+      for (unsigned int j = 0; j < npatterns; j++)
+	builder.quick_push (CONST_VECTOR_ELT (src, i * npatterns + j));
+    }
+  builder.finalize ();
 
-       1. multiple elts duplicate vector.
-       2. multiple patterns with multiple elts.  */
+  if (CONST_VECTOR_DUPLICATE_P (src))
+    {
+      if (builder.can_duplicate_repeating_sequence_p ())
+	{
+	  rtx ele = builder.get_merged_repeating_sequence ();
+	  rtx dup = expand_vector_broadcast (builder.new_mode (), ele);
+	  emit_move_insn (target, gen_lowpart (mode, dup));
+	}
+      else
+	{
+	  unsigned int nbits = npatterns - 1;
+
+	  /* Generate vid = { 0, 1, 2, 3, 4, 5, 6, 7, ... }.  */
+	  rtx vid = gen_reg_rtx (builder.int_mode ());
+	  rtx op[] = {vid};
+	  emit_vlmax_insn (code_for_pred_series (builder.int_mode ()),
+			   RVV_MISC_OP, op);
+
+	  /* Generate vid_repeat = { 0, 1, ... nbits, ... }  */
+	  rtx vid_repeat = gen_reg_rtx (builder.int_mode ());
+	  rtx and_ops[] = {vid_repeat, vid,
+			   gen_int_mode (nbits, builder.inner_int_mode ())};
+	  emit_vlmax_insn (code_for_pred_scalar (AND, builder.int_mode ()),
+			   RVV_BINOP, and_ops);
+
+	  rtx tmp = gen_reg_rtx (builder.mode ());
+	  rtx dup_ops[] = {tmp, builder.elt (0)};
+	  emit_vlmax_insn (code_for_pred_broadcast (builder.mode ()), RVV_UNOP,
+			   dup_ops);
+	  for (unsigned int i = 1; i < builder.npatterns (); i++)
+	    {
+	      /* Generate mask according to i.  */
+	      rtx mask = gen_reg_rtx (builder.mask_mode ());
+	      rtx const_vec = gen_const_vector_dup (builder.int_mode (), i);
+	      expand_vec_cmp (mask, EQ, vid_repeat, const_vec);
+
+	      /* Merge scalar to each i.  */
+	      rtx tmp2 = gen_reg_rtx (builder.mode ());
+	      rtx merge_ops[] = {tmp2, tmp, builder.elt (i), mask};
+	      insn_code icode = code_for_pred_merge_scalar (builder.mode ());
+	      emit_vlmax_merge_insn (icode, RVV_MERGE_OP, merge_ops);
+	      tmp = tmp2;
+	    }
+	  emit_move_insn (target, tmp);
+	}
+      return;
+    }
+  else if (CONST_VECTOR_STEPPED_P (src))
+    {
+      gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
+      if (builder.single_step_npatterns_p ())
+	{
+	  /* Describe the case by choosing NPATTERNS = 4 as an example.  */
+	  rtx base, step;
+	  if (builder.npatterns_all_equal_p ())
+	    {
+	      /* Generate the variable-length vector as below:
+		 E.g. { 0, 0, 0, 0, 8, 8, 8, 8, 16, 16, 16, 16, ... } */
+	      /* Step 1: Generate base = { 0, 0, 0, 0, 0, 0, 0, ... }.  */
+	      base = expand_vector_broadcast (builder.mode (), builder.elt (0));
+	    }
+	  else
+	    {
+	      /* Generate the variable-length vector as below:
+		 E.g. { 0, 6, 0, 6, 8, 14, 8, 14, 16, 22, 16, 22, ... } */
+	      /* Step 1: Generate base = { 0, 6, 0, 6, ... }.  */
+	      rvv_builder new_builder (builder.mode (), builder.npatterns (),
+				       1);
+	      for (unsigned int i = 0; i < builder.npatterns (); ++i)
+		new_builder.quick_push (builder.elt (i));
+	      rtx new_vec = new_builder.build ();
+	      base = gen_reg_rtx (builder.mode ());
+	      emit_move_insn (base, new_vec);
+	    }
+
+	  /* Step 2: Generate step = gen_int_mode (diff, mode).  */
+	  poly_int64 value1 = rtx_to_poly_int64 (builder.elt (0));
+	  poly_int64 value2
+	    = rtx_to_poly_int64 (builder.elt (builder.npatterns ()));
+	  poly_int64 diff = value2 - value1;
+	  step = gen_int_mode (diff, builder.inner_mode ());
+
+	  /* Step 3: Generate vid = { 0, 1, 2, 3, 4, 5, 6, 7, ... }.  */
+	  rtx vid = gen_reg_rtx (builder.mode ());
+	  rtx op[] = {vid};
+	  emit_vlmax_insn (code_for_pred_series (builder.mode ()), RVV_MISC_OP,
+			   op);
+
+	  /* Step 4: Generate factor = { 0, 0, 0, 0, 1, 1, 1, 1, ... }.  */
+	  rtx factor = gen_reg_rtx (builder.mode ());
+	  rtx shift_ops[]
+	    = {factor, vid,
+	       gen_int_mode (exact_log2 (builder.npatterns ()), Pmode)};
+	  emit_vlmax_insn (code_for_pred_scalar (LSHIFTRT, builder.mode ()),
+			   RVV_BINOP, shift_ops);
+
+	  /* Step 5: Generate adjusted step = { 0, 0, 0, 0, diff, diff, ... } */
+	  rtx adjusted_step = gen_reg_rtx (builder.mode ());
+	  rtx mul_ops[] = {adjusted_step, factor, step};
+	  emit_vlmax_insn (code_for_pred_scalar (MULT, builder.mode ()),
+			   RVV_BINOP, mul_ops);
+
+	  /* Step 6: Generate the final result.  */
+	  rtx add_ops[] = {target, base, adjusted_step};
+	  emit_vlmax_insn (code_for_pred (PLUS, builder.mode ()), RVV_BINOP,
+			   add_ops);
+	}
+      else
+	/* TODO: We will enable more variable-length vectors in the future.  */
+	gcc_unreachable ();
+    }
+  else
+    gcc_unreachable ();
 }
 
 /* Expand a pre-RA RVV data move from SRC to DEST.
@@ -2029,14 +2210,13 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
 {
   machine_mode data_mode = GET_MODE (target);
   machine_mode sel_mode = GET_MODE (sel);
-
-  /* Enforced by the pattern condition.  */
-  int nunits = GET_MODE_NUNITS (sel_mode).to_constant ();
+  poly_uint64 nunits = GET_MODE_NUNITS (sel_mode);
 
   /* Check if the sel only references the first values vector. If each select
      index is in range of [0, nunits - 1]. A single vrgather instructions is
-     enough.  */
-  if (const_vec_all_in_range_p (sel, 0, nunits - 1))
+     enough. Since we will use vrgatherei16.vv for variable-length vector,
+     it is never out of range and we don't need to modulo the index.  */
+  if (!nunits.is_constant () || const_vec_all_in_range_p (sel, 0, nunits - 1))
     {
       emit_vlmax_gather_insn (target, op0, sel);
       return;
@@ -2057,14 +2237,20 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
       return;
     }
 
-  /* Note: vec_perm indices are supposed to wrap when they go beyond the
-     size of the two value vectors, i.e. the upper bits of the indices
-     are effectively ignored.  RVV vrgather instead produces 0 for any
-     out-of-range indices, so we need to modulo all the vec_perm indices
-     to ensure they are all in range of [0, 2 * nunits - 1].  */
+  rtx sel_mod = sel;
   rtx max_sel = gen_const_vector_dup (sel_mode, 2 * nunits - 1);
-  rtx sel_mod
-    = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0, OPTAB_DIRECT);
+  /* We don't need to modulo indices for VLA vectors, since we guarantee
+     beforehand that they aren't out of range.  */
+  if (nunits.is_constant ())
+    {
+      /* Note: vec_perm indices are supposed to wrap when they go beyond the
+	 size of the two value vectors, i.e. the upper bits of the indices
+	 are effectively ignored.  RVV vrgather instead produces 0 for any
+	 out-of-range indices, so we need to modulo all the vec_perm indices
+	 to ensure they are all in range of [0, 2 * nunits - 1].  */
+      sel_mod = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0,
+				     OPTAB_DIRECT);
+    }
 
   /* This following sequence is handling the case that:
      __builtin_shufflevector (vec1, vec2, index...), the index can be any
@@ -2094,4 +2280,124 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
   emit_vlmax_masked_gather_mu_insn (target, op1, tmp, mask);
 }
 
+/* Implement TARGET_VECTORIZE_VEC_PERM_CONST for RVV.  */
+
+/* vec_perm support.  */
+
+struct expand_vec_perm_d
+{
+  rtx target, op0, op1;
+  vec_perm_indices perm;
+  machine_mode vmode;
+  machine_mode op_mode;
+  bool one_vector_p;
+  bool testing_p;
+};
+
+/* Recognize the pattern that can be shuffled by generic approach.  */
+
+static bool
+shuffle_generic_patterns (struct expand_vec_perm_d *d)
+{
+  machine_mode sel_mode = related_int_vector_mode (d->vmode).require ();
+  poly_uint64 nunits = GET_MODE_NUNITS (d->vmode);
+
+  /* For constant size indices, we don't need to handle them here.
+     Just leave them to vec_perm<mode>.  */
+  if (d->perm.length ().is_constant ())
+    return false;
+
+  /* Permuting two SEW8 variable-length vectors needs vrgatherei16.vv.
+     Otherwise, it could overflow the index range.  */
+  if (GET_MODE_INNER (d->vmode) == QImode
+      && !get_vector_mode (HImode, nunits).exists (&sel_mode))
+    return false;
+
+  /* Success! */
+  if (d->testing_p)
+    return true;
+
+  rtx sel = vec_perm_indices_to_rtx (sel_mode, d->perm);
+  expand_vec_perm (d->target, d->op0, d->op1, force_reg (sel_mode, sel));
+  return true;
+}
+
+static bool
+expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
+{
+  gcc_assert (d->op_mode != E_VOIDmode);
+
+  /* The pattern matching functions above are written to look for a small
+     number to begin the sequence (0, 1, N/2).  If we begin with an index
+     from the second operand, we can swap the operands.  */
+  poly_int64 nelt = d->perm.length ();
+  if (known_ge (d->perm[0], nelt))
+    {
+      d->perm.rotate_inputs (1);
+      std::swap (d->op0, d->op1);
+    }
+
+  if (known_gt (nelt, 1))
+    {
+      if (d->vmode == d->op_mode)
+	{
+	  if (shuffle_generic_patterns (d))
+	    return true;
+	  return false;
+	}
+      else
+	return false;
+    }
+  return false;
+}
+
+bool
+expand_vec_perm_const (machine_mode vmode, machine_mode op_mode, rtx target,
+		       rtx op0, rtx op1, const vec_perm_indices &sel)
+{
+  /* RVV doesn't have Mask type pack/unpack instructions and we don't use
+     mask to do the iteration loop control. Just disable it directly.  */
+  if (GET_MODE_CLASS (vmode) == MODE_VECTOR_BOOL)
+    return false;
+
+  struct expand_vec_perm_d d;
+
+  /* Check whether the mask can be applied to a single vector.  */
+  if (sel.ninputs () == 1 || (op0 && rtx_equal_p (op0, op1)))
+    d.one_vector_p = true;
+  else if (sel.all_from_input_p (0))
+    {
+      d.one_vector_p = true;
+      op1 = op0;
+    }
+  else if (sel.all_from_input_p (1))
+    {
+      d.one_vector_p = true;
+      op0 = op1;
+    }
+  else
+    d.one_vector_p = false;
+
+  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2,
+		     sel.nelts_per_input ());
+  d.vmode = vmode;
+  d.op_mode = op_mode;
+  d.target = target;
+  d.op0 = op0;
+  if (op0 == op1)
+    d.op1 = d.op0;
+  else
+    d.op1 = op1;
+  d.testing_p = !target;
+
+  if (!d.testing_p)
+    return expand_vec_perm_const_1 (&d);
+
+  rtx_insn *last = get_last_insn ();
+  bool ret = expand_vec_perm_const_1 (&d);
+  gcc_assert (last == get_last_insn ());
+
+  return ret;
+}
+
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index caa7858b864..5d22012b591 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7631,6 +7631,19 @@ riscv_vectorize_related_mode (machine_mode vector_mode, scalar_mode element_mode
   return default_vectorize_related_mode (vector_mode, element_mode, nunits);
 }
 
+/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
+
+static bool
+riscv_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
+				rtx target, rtx op0, rtx op1,
+				const vec_perm_indices &sel)
+{
+  if (TARGET_VECTOR && riscv_v_ext_vector_mode_p (vmode))
+    return riscv_vector::expand_vec_perm_const (vmode, op_mode, target, op0,
+						op1, sel);
+
+  return false;
+}
 
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
@@ -7930,6 +7943,9 @@ riscv_vectorize_related_mode (machine_mode vector_mode, scalar_mode element_mode
 #undef TARGET_VECTORIZE_RELATED_MODE
 #define TARGET_VECTORIZE_RELATED_MODE riscv_vectorize_related_mode
 
+#undef TARGET_VECTORIZE_VEC_PERM_CONST
+#define TARGET_VECTORIZE_VEC_PERM_CONST riscv_vectorize_vec_perm_const
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-riscv.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c
new file mode 100644
index 00000000000..befb518e2dd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 37] + 1;
+      a[i * 8 + 1] = b[i * 8 + 37] + 2;
+      a[i * 8 + 2] = b[i * 8 + 37] + 8;
+      a[i * 8 + 3] = b[i * 8 + 37] + 4;
+      a[i * 8 + 4] = b[i * 8 + 37] + 5;
+      a[i * 8 + 5] = b[i * 8 + 37] + 6;
+      a[i * 8 + 6] = b[i * 8 + 37] + 7;
+      a[i * 8 + 7] = b[i * 8 + 37] + 3;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c
new file mode 100644
index 00000000000..ac817451295
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (int16_t *restrict a, int16_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 37] + 1;
+      a[i * 8 + 1] = b[i * 8 + 37] + 2;
+      a[i * 8 + 2] = b[i * 8 + 37] + 8;
+      a[i * 8 + 3] = b[i * 8 + 37] + 4;
+      a[i * 8 + 4] = b[i * 8 + 37] + 5;
+      a[i * 8 + 5] = b[i * 8 + 37] + 6;
+      a[i * 8 + 6] = b[i * 8 + 37] + 7;
+      a[i * 8 + 7] = b[i * 8 + 37] + 3;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c
new file mode 100644
index 00000000000..73962055b03
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 7] + 2;
+      a[i * 8 + 2] = b[i * 8 + 1] + 3;
+      a[i * 8 + 3] = b[i * 8 + 7] + 4;
+      a[i * 8 + 4] = b[i * 8 + 1] + 5;
+      a[i * 8 + 5] = b[i * 8 + 7] + 6;
+      a[i * 8 + 6] = b[i * 8 + 1] + 7;
+      a[i * 8 + 7] = b[i * 8 + 7] + 8;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c
new file mode 100644
index 00000000000..fa216fc8c40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (int16_t *restrict a, int16_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 7] + 2;
+      a[i * 8 + 2] = b[i * 8 + 1] + 3;
+      a[i * 8 + 3] = b[i * 8 + 7] + 4;
+      a[i * 8 + 4] = b[i * 8 + 1] + 5;
+      a[i * 8 + 5] = b[i * 8 + 7] + 6;
+      a[i * 8 + 6] = b[i * 8 + 1] + 7;
+      a[i * 8 + 7] = b[i * 8 + 7] + 8;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c
new file mode 100644
index 00000000000..899ed9e310b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 7] + 2;
+      a[i * 8 + 2] = b[i * 8 + 4] + 3;
+      a[i * 8 + 3] = b[i * 8 + 8] + 4;
+      a[i * 8 + 4] = b[i * 8 + 1] + 5;
+      a[i * 8 + 5] = b[i * 8 + 7] + 6;
+      a[i * 8 + 6] = b[i * 8 + 4] + 7;
+      a[i * 8 + 7] = b[i * 8 + 8] + 8;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c
new file mode 100644
index 00000000000..fb87cc00cea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (uint8_t *restrict a, uint8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 2] + 2;
+      a[i * 8 + 2] = b[i * 8 + 6] + 8;
+      a[i * 8 + 3] = b[i * 8 + 7] + 4;
+      a[i * 8 + 4] = b[i * 8 + 3] + 5;
+      a[i * 8 + 5] = b[i * 8 + 4] + 6;
+      a[i * 8 + 6] = b[i * 8 + 5] + 7;
+      a[i * 8 + 7] = b[i * 8 + 0] + 3;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c
new file mode 100644
index 00000000000..3dd744b586e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (float *__restrict f, double *__restrict d, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      f[i * 2 + 0] = 1;
+      f[i * 2 + 1] = 2;
+      d[i] = 3;
+    }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c
new file mode 100644
index 00000000000..16f078a0433
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c
@@ -0,0 +1,66 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-1.c"
+
+#define LIMIT 128
+void __attribute__ ((optimize (0)))
+f_golden (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 37] + 1;
+      a[i * 8 + 1] = b[i * 8 + 37] + 2;
+      a[i * 8 + 2] = b[i * 8 + 37] + 8;
+      a[i * 8 + 3] = b[i * 8 + 37] + 4;
+      a[i * 8 + 4] = b[i * 8 + 37] + 5;
+      a[i * 8 + 5] = b[i * 8 + 37] + 6;
+      a[i * 8 + 6] = b[i * 8 + 37] + 7;
+      a[i * 8 + 7] = b[i * 8 + 37] + 3;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
+  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
+  int8_t b_##NUM[NUM * 8 + 37] = {0};                                          \
+  for (int i = 0; i < NUM * 8 + 37; i++)                                       \
+    {                                                                          \
+      if (i % NUM == 0)                                                        \
+	b_##NUM[i] = (i + NUM) % LIMIT;                                        \
+      else                                                                     \
+	b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
+    }                                                                          \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
+	__builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c
new file mode 100644
index 00000000000..41f688f628c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c
@@ -0,0 +1,67 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-2.c"
+
+#define LIMIT 32767
+
+void __attribute__ ((optimize (0)))
+f_golden (int16_t *restrict a, int16_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 37] + 1;
+      a[i * 8 + 1] = b[i * 8 + 37] + 2;
+      a[i * 8 + 2] = b[i * 8 + 37] + 8;
+      a[i * 8 + 3] = b[i * 8 + 37] + 4;
+      a[i * 8 + 4] = b[i * 8 + 37] + 5;
+      a[i * 8 + 5] = b[i * 8 + 37] + 6;
+      a[i * 8 + 6] = b[i * 8 + 37] + 7;
+      a[i * 8 + 7] = b[i * 8 + 37] + 3;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  int16_t a_##NUM[NUM * 8 + 8] = {0};                                          \
+  int16_t a_golden_##NUM[NUM * 8 + 8] = {0};                                   \
+  int16_t b_##NUM[NUM * 8 + 37] = {0};                                         \
+  for (int i = 0; i < NUM * 8 + 37; i++)                                       \
+    {                                                                          \
+      if (i % NUM == 0)                                                        \
+	b_##NUM[i] = (i + NUM) % LIMIT;                                        \
+      else                                                                     \
+	b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
+    }                                                                          \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
+	__builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c
new file mode 100644
index 00000000000..30996cb2c6e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c
@@ -0,0 +1,67 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-3.c"
+
+#define LIMIT 128
+
+void __attribute__ ((optimize (0)))
+f_golden (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 7] + 2;
+      a[i * 8 + 2] = b[i * 8 + 1] + 3;
+      a[i * 8 + 3] = b[i * 8 + 7] + 4;
+      a[i * 8 + 4] = b[i * 8 + 1] + 5;
+      a[i * 8 + 5] = b[i * 8 + 7] + 6;
+      a[i * 8 + 6] = b[i * 8 + 1] + 7;
+      a[i * 8 + 7] = b[i * 8 + 7] + 8;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
+  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
+  int8_t b_##NUM[NUM * 8 + 8] = {0};                                           \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (i % NUM == 0)                                                        \
+	b_##NUM[i] = (i + NUM) % LIMIT;                                        \
+      else                                                                     \
+	b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
+    }                                                                          \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
+	__builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c
new file mode 100644
index 00000000000..3d43ef0890c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c
@@ -0,0 +1,67 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-4.c"
+
+#define LIMIT 32767
+
+void __attribute__ ((optimize (0)))
+f_golden (int16_t *restrict a, int16_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 7] + 2;
+      a[i * 8 + 2] = b[i * 8 + 1] + 3;
+      a[i * 8 + 3] = b[i * 8 + 7] + 4;
+      a[i * 8 + 4] = b[i * 8 + 1] + 5;
+      a[i * 8 + 5] = b[i * 8 + 7] + 6;
+      a[i * 8 + 6] = b[i * 8 + 1] + 7;
+      a[i * 8 + 7] = b[i * 8 + 7] + 8;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  int16_t a_##NUM[NUM * 8 + 8] = {0};                                          \
+  int16_t a_golden_##NUM[NUM * 8 + 8] = {0};                                   \
+  int16_t b_##NUM[NUM * 8 + 8] = {0};                                          \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (i % NUM == 0)                                                        \
+	b_##NUM[i] = (i + NUM) % LIMIT;                                        \
+      else                                                                     \
+	b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
+    }                                                                          \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
+	__builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c
new file mode 100644
index 00000000000..814308bd7af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c
@@ -0,0 +1,67 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-5.c"
+
+#define LIMIT 128
+
+void __attribute__ ((optimize (0)))
+f_golden (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 7] + 2;
+      a[i * 8 + 2] = b[i * 8 + 4] + 3;
+      a[i * 8 + 3] = b[i * 8 + 8] + 4;
+      a[i * 8 + 4] = b[i * 8 + 1] + 5;
+      a[i * 8 + 5] = b[i * 8 + 7] + 6;
+      a[i * 8 + 6] = b[i * 8 + 4] + 7;
+      a[i * 8 + 7] = b[i * 8 + 8] + 8;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
+  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
+  int8_t b_##NUM[NUM * 8 + 9] = {0};                                           \
+  for (int i = 0; i < NUM * 8 + 9; i++)                                        \
+    {                                                                          \
+      if (i % NUM == 0)                                                        \
+	b_##NUM[i] = (i + NUM) % LIMIT;                                        \
+      else                                                                     \
+	b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
+    }                                                                          \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
+	__builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c
new file mode 100644
index 00000000000..e317eeac2f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c
@@ -0,0 +1,67 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-6.c"
+
+#define LIMIT 128
+
+void __attribute__ ((optimize (0)))
+f_golden (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 2] + 2;
+      a[i * 8 + 2] = b[i * 8 + 6] + 8;
+      a[i * 8 + 3] = b[i * 8 + 7] + 4;
+      a[i * 8 + 4] = b[i * 8 + 3] + 5;
+      a[i * 8 + 5] = b[i * 8 + 4] + 6;
+      a[i * 8 + 6] = b[i * 8 + 5] + 7;
+      a[i * 8 + 7] = b[i * 8 + 0] + 3;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
+  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
+  int8_t b_##NUM[NUM * 8 + 9] = {0};                                           \
+  for (int i = 0; i < NUM * 8 + 9; i++)                                        \
+    {                                                                          \
+      if (i % NUM == 0)                                                        \
+	b_##NUM[i] = (i + NUM) % LIMIT;                                        \
+      else                                                                     \
+	b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
+    }                                                                          \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
+	__builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c
new file mode 100644
index 00000000000..a8e4781988e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c
@@ -0,0 +1,58 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-7.c"
+
+void
+f_golden (float *__restrict f, double *__restrict d, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      f[i * 2 + 0] = 1;
+      f[i * 2 + 1] = 2;
+      d[i] = 3;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  float a_##NUM[NUM * 2 + 2] = {0};                                            \
+  float a_golden_##NUM[NUM * 2 + 2] = {0};                                     \
+  double b_##NUM[NUM] = {0};                                                   \
+  double b_golden_##NUM[NUM] = {0};                                            \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_golden_##NUM, NUM);                              \
+  for (int i = 0; i < NUM; i++)                                                \
+    {                                                                          \
+      if (a_##NUM[i * 2 + 0] != a_golden_##NUM[i * 2 + 0])                     \
+	__builtin_abort ();                                                    \
+      if (a_##NUM[i * 2 + 1] != a_golden_##NUM[i * 2 + 1])                     \
+	__builtin_abort ();                                                    \
+      if (b_##NUM[i] != b_golden_##NUM[i])                                     \
+	__builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c
index 500b0adce66..3c03a87377d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c
@@ -14,4 +14,4 @@ f (int32_t *__restrict f, int32_t *__restrict d, int n)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
index 383c82a3b7c..e68d05f5f48 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
@@ -3,9 +3,4 @@
 
 #include "template-1.h"
 
-/* Currently, we don't support SLP auto-vectorization for VLA. But it's
-   necessary that we add this testcase here to make sure such unsupported SLP
-   auto-vectorization will not cause an ICE. We will enable "vect" checking when
-   we support SLP auto-vectorization for VLA in the future.  */
-
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 5 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c
index 23cc1c8651f..ecfda79e19a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c
@@ -3,4 +3,4 @@
 
 #include "template-1.h"
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c
index 4f130f02f67..1394f08f2b9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c
@@ -3,4 +3,4 @@
 
 #include "template-1.h"
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c
index 823d51a03cb..c5e89996fa4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c
@@ -3,4 +3,4 @@
 
 #include "template-1.h"
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c
index 5ead22746d3..6b320ca6f38 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c
@@ -3,4 +3,4 @@
 
 #include "template-1.h"
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 5 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c
index e03d1b44ca6..6c2a002de9c 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c
@@ -3,4 +3,4 @@
 
 #include "template-1.h"
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c
index 5bb2d9d96fa..ae3f066477c 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c
@@ -3,4 +3,4 @@
 
 #include "template-1.h"
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 4 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c
index 71820ece4b2..fc676a3865e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c
@@ -3,4 +3,4 @@
 
 #include "template-1.h"
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
-- 
2.36.1



* Re: [PATCH] RISC-V: Enable SELECT_VL for RVV
  2023-06-06 11:46 ` [PATCH] RISC-V: Enable SELECT_VL for RVV juzhe.zhong
@ 2023-06-06 11:49   ` 钟居哲
  0 siblings, 0 replies; 8+ messages in thread
From: 钟居哲 @ 2023-06-06 11:49 UTC (permalink / raw)
  To: 钟居哲, gcc-patches
  Cc: kito.cheng, palmer, rdapp.gcc, Jeff Law


Oh, sorry, my mistake. Please disregard this patch, since SELECT_VL has not been merged into the middle-end yet.



juzhe.zhong@rivai.ai
 
From: juzhe.zhong
Date: 2023-06-06 19:46
To: gcc-patches
CC: kito.cheng; palmer; rdapp.gcc; jeffreyalaw; Juzhe-Zhong
Subject: [PATCH] RISC-V: Enable SELECT_VL for RVV
From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
 
gcc/ChangeLog:
 
        * config/riscv/autovec.md (select_vl<mode>): New pattern.
        * config/riscv/riscv-protos.h (gen_no_side_effects_vsetvl_rtx): export global.
        * config/riscv/riscv-v.cc (force_vector_length_operand): Ditto.
 
gcc/testsuite/ChangeLog:
 
        * gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: Adapt test.
        * gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: Ditto.
        * gcc.target/riscv/rvv/autovec/partial/select_vl-1.c: New test.
 
---
gcc/config/riscv/autovec.md                   | 19 +++++++++++++
gcc/config/riscv/riscv-protos.h               |  1 +
gcc/config/riscv/riscv-v.cc                   |  2 +-
.../riscv/rvv/autovec/partial/select_vl-1.c   | 28 +++++++++++++++++++
.../riscv/rvv/autovec/ternop/ternop-2.c       |  2 +-
.../riscv/rvv/autovec/ternop/ternop-5.c       |  2 +-
6 files changed, 51 insertions(+), 3 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 9f4492db23c..c298f069714 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -18,6 +18,25 @@
;; along with GCC; see the file COPYING3.  If not see
;; <http://www.gnu.org/licenses/>.
+;; =========================================================================
+;; == SELECT_VL
+;; =========================================================================
+
+(define_expand "select_vl<mode>"
+  [(match_operand:P 0 "register_operand")
+   (match_operand:P 1 "vector_length_operand")
+   (match_operand:P 2 "")]
+  "TARGET_VECTOR"
+{
+  poly_int64 nunits = rtx_to_poly_int64 (operands[2]);
+  /* We arbitrary picked QImode as inner scalar mode to get vector mode.
+     since vsetvl only demand ratio. We let VSETVL PASS to optimize it.  */
+  scalar_int_mode mode = QImode;
+  machine_mode rvv_mode = riscv_vector::get_vector_mode (mode, nunits).require ();
+  emit_insn (riscv_vector::gen_no_side_effects_vsetvl_rtx (rvv_mode, operands[0], operands[1]));
+  DONE;
+})
+
;; =========================================================================
;; == Loads/Stores
;; =========================================================================
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 00e1b20c6c6..d770e5e826e 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -232,6 +232,7 @@ enum vlen_enum
   RVV_64 = 64,
   RVV_65536 = 65536
};
+rtx gen_no_side_effects_vsetvl_rtx (machine_mode, rtx, rtx);
bool slide1_sew64_helper (int, machine_mode, machine_mode,
  machine_mode, rtx *);
rtx gen_avl_for_scalar_move (rtx);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 49752cd8899..83277fc2c05 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1280,7 +1280,7 @@ force_vector_length_operand (rtx vl)
   return vl;
}
-static rtx
+rtx
gen_no_side_effects_vsetvl_rtx (machine_mode vmode, rtx vl, rtx avl)
{
   unsigned int sew = get_sew (vmode);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-1.c
new file mode 100644
index 00000000000..b8e0ca0f1f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/select_vl-1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fno-vect-cost-model -fno-tree-loop-distribute-patterns -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE)                                                        \
+  __attribute__ ((noipa)) void select_vl_##TYPE (TYPE *__restrict dst,         \
+ TYPE *__restrict a, int n)    \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      dst[i] = a[i];                                                           \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int8_t)                                                           \
+  TEST_TYPE (uint8_t)                                                          \
+  TEST_TYPE (int16_t)                                                          \
+  TEST_TYPE (uint16_t)                                                         \
+  TEST_TYPE (int32_t)                                                          \
+  TEST_TYPE (uint32_t)                                                         \
+  TEST_TYPE (int64_t)                                                          \
+  TEST_TYPE (uint64_t)                                                         \
+  TEST_TYPE (float)                                                            \
+  TEST_TYPE (double)
+
+TEST_ALL ()
+
+/* { dg-final { scan-tree-dump-times "\.SELECT_VL" 10 "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-2.c
index 89eeaf6315f..e52e07ddd09 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-2.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-schedule-insns" } */
#include <stdint-gcc.h>
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-5.c
index a9a7198feb4..49c85efbf3a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-5.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-5.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-schedule-insns" } */
#include <stdint-gcc.h>
-- 
2.36.1
 


* Re: [PATCH] RISC-V: Support RVV VLA SLP auto-vectorization
  2023-06-07  0:38 ` juzhe.zhong
@ 2023-06-07  2:38   ` Kito Cheng
  0 siblings, 0 replies; 8+ messages in thread
From: Kito Cheng @ 2023-06-07  2:38 UTC (permalink / raw)
  To: juzhe.zhong
  Cc: gcc-patches, Kito.cheng, palmer, palmer, jeffreyalaw, Robin Dapp,
	pan2.li

A few comments, but all of them are asking for more comments in the code :P

> @@ -398,6 +410,48 @@ rvv_builder::get_merge_scalar_mask (unsigned int index_in_pattern) const
>    return gen_int_mode (mask, inner_int_mode ());
>  }
>
> +/* Return true if the variable-length vector is single step.  */
> +bool
> +rvv_builder::single_step_npatterns_p () const

What is single_step_npatterns? Could you add more comments explaining it?
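As a rough illustration of what the quoted check appears to test (a sketch only, not the patch's code): the version below assumes a hypothetical flat array of encoded elements in place of rvv_builder, and plain int64_t in place of poly_int64, so maybe_ne becomes an ordinary !=. Under those assumptions, a vector counts as "single step" when it has three encoded elements per pattern and every pattern advances by the same step as pattern 0, e.g. NPATTERNS = 2 with { 0, 1, 4, 5, 8, 9, ... }.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for rvv_builder::single_step_npatterns_p.
// ELTS holds the encoded elements pattern-interleaved: element k of
// pattern i is stored at index k * npatterns + i.
bool
single_step_npatterns_p (const std::vector<int64_t> &elts, unsigned npatterns,
			 unsigned nelts_per_pattern)
{
  assert (npatterns > 0);

  // Only a stepped encoding (three elements per pattern) can qualify.
  if (nelts_per_pattern != 3 || elts.size () < (std::size_t) 3 * npatterns)
    return false;

  // The reference step comes from the first two elements of pattern 0.
  int64_t step = elts[npatterns] - elts[0];
  for (unsigned i = 0; i < npatterns; i++)
    {
      int64_t diff1 = elts[npatterns + i] - elts[i];
      int64_t diff2 = elts[2 * npatterns + i] - elts[npatterns + i];
      // Every pattern must advance by the same step, twice in a row.
      if (diff1 != step || diff2 != step)
	return false;
    }
  return true;
}
```

With this reading, { 0, 1, 4, 5, 8, 9 } with NPATTERNS = 2 is accepted (both patterns step by 4), while { 0, 1, 4, 6, 8, 9 } is rejected because pattern 1 steps by 5 and then by 3.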

> +{
> +  if (nelts_per_pattern () != 3)
> +    return false;
> +
> +  poly_int64 step
> +    = rtx_to_poly_int64 (elt (npatterns ())) - rtx_to_poly_int64 (elt (0));
> +  for (unsigned int i = 0; i < npatterns (); i++)
> +    {
> +      poly_int64 ele0 = rtx_to_poly_int64 (elt (i));
> +      poly_int64 ele1 = rtx_to_poly_int64 (elt (npatterns () + i));
> +      poly_int64 ele2 = rtx_to_poly_int64 (elt (npatterns () * 2 + i));
> +      poly_int64 diff1 = ele1 - ele0;
> +      poly_int64 diff2 = ele2 - ele1;
> +      if (maybe_ne (step, diff1) || maybe_ne (step, diff2))
> +       return false;
> +    }
> +  return true;
> +}
> +
> +/* Return true if all elements of NPATTERNS are equal.
> +
> +   E.g. NPATTERNS = 4:
> +     { 2, 2, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8, 16, 16, 16, 16, ... }
> +   E.g. NPATTERNS = 8:
> +     { 2, 2, 2, 2, 2, 2, 2, 2, 8, 8, 8, 8, 8, 8, 8, 8, ... }
> +*/
> +bool
> +rvv_builder::npatterns_all_equal_p () const
> +{
> +  poly_int64 ele0 = rtx_to_poly_int64 (elt (0));
> +  for (unsigned int i = 1; i < npatterns (); i++)
> +    {
> +      poly_int64 ele = rtx_to_poly_int64 (elt (i));
> +      if (!known_eq (ele, ele0))
> +       return false;
> +    }
> +  return true;
> +}
> +
>  static unsigned
>  get_sew (machine_mode mode)
>  {
> @@ -425,7 +479,7 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT minval,
>     future.  */
>
>  static bool
> -const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT maxval)
> +const_vec_all_in_range_p (rtx vec, poly_int64 minval, poly_int64 maxval)
>  {
>    if (!CONST_VECTOR_P (vec)
>        || GET_MODE_CLASS (GET_MODE (vec)) != MODE_VECTOR_INT)
> @@ -440,8 +494,10 @@ const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT maxval)
>    for (int i = 0; i < nunits; i++)
>      {
>        rtx vec_elem = CONST_VECTOR_ELT (vec, i);
> -      if (!CONST_INT_P (vec_elem)
> -         || !IN_RANGE (INTVAL (vec_elem), minval, maxval))
> +      poly_int64 value;
> +      if (!poly_int_rtx_p (vec_elem, &value)
> +         || maybe_lt (value, minval)
> +         || maybe_gt (value, maxval))
>         return false;
>      }
>    return true;
> @@ -453,7 +509,7 @@ const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT maxval)
>     future.  */
>
>  static rtx
> -gen_const_vector_dup (machine_mode mode, HOST_WIDE_INT val)
> +gen_const_vector_dup (machine_mode mode, poly_int64 val)
>  {
>    rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
>    return gen_const_vec_duplicate (mode, c);
> @@ -727,7 +783,10 @@ emit_vlmax_gather_insn (rtx target, rtx op, rtx sel)
>    rtx elt;
>    insn_code icode;
>    machine_mode data_mode = GET_MODE (target);
> -  if (const_vec_duplicate_p (sel, &elt))
> +  machine_mode sel_mode = GET_MODE (sel);
> +  if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode)))
> +    icode = code_for_pred_gatherei16 (data_mode);
> +  else if (const_vec_duplicate_p (sel, &elt))
>      {
>        icode = code_for_pred_gather_scalar (data_mode);
>        sel = elt;
> @@ -744,7 +803,10 @@ emit_vlmax_masked_gather_mu_insn (rtx target, rtx op, rtx sel, rtx mask)
>    rtx elt;
>    insn_code icode;
>    machine_mode data_mode = GET_MODE (target);
> -  if (const_vec_duplicate_p (sel, &elt))
> +  machine_mode sel_mode = GET_MODE (sel);
> +  if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode)))
> +    icode = code_for_pred_gatherei16 (data_mode);
> +  else if (const_vec_duplicate_p (sel, &elt))
>      {
>        icode = code_for_pred_gather_scalar (data_mode);
>        sel = elt;
> @@ -895,11 +957,130 @@ expand_const_vector (rtx target, rtx src)
>        return;
>      }
>
> -  /* TODO: We only support const duplicate vector for now. More cases
> -     will be supported when we support auto-vectorization:
> +  /* Handle variable-length vector.  */
> +  unsigned int nelts_per_pattern = CONST_VECTOR_NELTS_PER_PATTERN (src);
> +  unsigned int npatterns = CONST_VECTOR_NPATTERNS (src);
> +  rvv_builder builder (mode, npatterns, nelts_per_pattern);
> +  for (unsigned int i = 0; i < nelts_per_pattern; i++)
> +    {
> +      for (unsigned int j = 0; j < npatterns; j++)
> +       builder.quick_push (CONST_VECTOR_ELT (src, i * npatterns + j));
> +    }
> +  builder.finalize ();
>
> -       1. multiple elts duplicate vector.
> -       2. multiple patterns with multiple elts.  */
> +  if (CONST_VECTOR_DUPLICATE_P (src))


When I read this check, I thought it was a predicate for a vector whose
elements all have the same value, like [a, a, a, a, ...], but it seems
that is not the case. Could you add more comments explaining that?
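
For reference, a minimal standalone sketch of how the (NPATTERNS,
NELTS_PER_PATTERN) encoding expands, assuming CONST_VECTOR_DUPLICATE_P only
tests NELTS_PER_PATTERN == 1. It uses plain arrays rather than RTL, and
expand_encoding is a hypothetical helper, not a GCC function:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Standalone model (not GCC code) of the compressed constant-vector
// encoding: the vector consists of NPATTERNS interleaved patterns, each
// described by its first NELTS_PER_PATTERN elements.
std::vector<int64_t>
expand_encoding (unsigned npatterns, unsigned nelts_per_pattern,
                 const std::vector<int64_t> &encoded, unsigned count)
{
  assert (nelts_per_pattern >= 1 && nelts_per_pattern <= 3);
  assert (encoded.size () == npatterns * nelts_per_pattern);

  std::vector<int64_t> out;
  for (unsigned i = 0; i < count; i++)
    {
      unsigned pattern = i % npatterns; // which interleaved pattern
      unsigned index = i / npatterns;   // position inside that pattern
      if (index < nelts_per_pattern)
        out.push_back (encoded[index * npatterns + pattern]);
      else
        {
          // Past the encoded elements: repeat the last one, or keep
          // stepping when three elements per pattern were given.
          unsigned last_idx = nelts_per_pattern - 1;
          int64_t last = encoded[last_idx * npatterns + pattern];
          int64_t step = 0;
          if (nelts_per_pattern == 3)
            step = last - encoded[npatterns + pattern];
          out.push_back (last + (int64_t) (index - last_idx) * step);
        }
    }
  return out;
}
```

Under that assumption, NPATTERNS = 2 with NELTS_PER_PATTERN = 1 describes
{ a, b, a, b, ... }, which is a "duplicate" in the encoding sense even though
the elements differ; only NPATTERNS = 1 gives [a, a, a, a, ...].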

> +    {
> +      if (builder.can_duplicate_repeating_sequence_p ())

Please also add more comments about this.

> +       {
> +         rtx ele = builder.get_merged_repeating_sequence ();
> +         rtx dup = expand_vector_broadcast (builder.new_mode (), ele);
> +         emit_move_insn (target, gen_lowpart (mode, dup));
> +       }
> +      else

and this.
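
To illustrate what such a comment might describe, here is a rough standalone
model of the vid / and / compare / merge sequence in this branch. It works on
plain arrays, not RTL; build_repeating and vlmax are hypothetical names used
only for this sketch:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Standalone model (not GCC code) of the else-branch: ELTS holds the
// NPATTERNS repeating values, VLMAX stands in for the runtime vector length.
std::vector<int64_t>
build_repeating (const std::vector<int64_t> &elts, unsigned vlmax)
{
  unsigned npatterns = elts.size ();
  // The AND with npatterns - 1 only acts as a modulo for powers of two.
  assert (npatterns != 0 && (npatterns & (npatterns - 1)) == 0);

  // vid_repeat = vid & (npatterns - 1) = { 0, 1, ..., npatterns - 1, 0, ... }
  std::vector<uint64_t> vid_repeat (vlmax);
  for (unsigned i = 0; i < vlmax; i++)
    vid_repeat[i] = i & (npatterns - 1);

  // Start from a broadcast of elt (0), then merge elt (p) into every lane
  // whose vid_repeat value equals p.
  std::vector<int64_t> tmp (vlmax, elts[0]);
  for (unsigned p = 1; p < npatterns; p++)
    for (unsigned i = 0; i < vlmax; i++)
      if (vid_repeat[i] == p) // mask = (vid_repeat == p)
        tmp[i] = elts[p];     // merge scalar under mask
  return tmp;
}
```

With elts = { 1, 2, 8, 4, 5, 6, 7, 3 } this model reproduces the repeating
vector from the cover letter for any vlmax, at the cost of one compare and
one merge per pattern after the first.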

> +       {
> +         unsigned int nbits = npatterns - 1;
> +
> +         /* Generate vid = { 0, 1, 2, 3, 4, 5, 6, 7, ... }.  */
> +         rtx vid = gen_reg_rtx (builder.int_mode ());
> +         rtx op[] = {vid};
> +         emit_vlmax_insn (code_for_pred_series (builder.int_mode ()),
> +                          RVV_MISC_OP, op);
> +
> +         /* Generate vid_repeat = { 0, 1, ... nbits, ... }  */
> +         rtx vid_repeat = gen_reg_rtx (builder.int_mode ());
> +         rtx and_ops[] = {vid_repeat, vid,
> +                          gen_int_mode (nbits, builder.inner_int_mode ())};
> +         emit_vlmax_insn (code_for_pred_scalar (AND, builder.int_mode ()),
> +                          RVV_BINOP, and_ops);
> +
> +         rtx tmp = gen_reg_rtx (builder.mode ());
> +         rtx dup_ops[] = {tmp, builder.elt (0)};
> +         emit_vlmax_insn (code_for_pred_broadcast (builder.mode ()), RVV_UNOP,
> +                          dup_ops);
> +         for (unsigned int i = 1; i < builder.npatterns (); i++)
> +           {
> +             /* Generate mask according to i.  */
> +             rtx mask = gen_reg_rtx (builder.mask_mode ());
> +             rtx const_vec = gen_const_vector_dup (builder.int_mode (), i);
> +             expand_vec_cmp (mask, EQ, vid_repeat, const_vec);
> +
> +             /* Merge scalar to each i.  */
> +             rtx tmp2 = gen_reg_rtx (builder.mode ());
> +             rtx merge_ops[] = {tmp2, tmp, builder.elt (i), mask};
> +             insn_code icode = code_for_pred_merge_scalar (builder.mode ());
> +             emit_vlmax_merge_insn (icode, RVV_MERGE_OP, merge_ops);
> +             tmp = tmp2;
> +           }
> +         emit_move_insn (target, tmp);
> +       }
> +      return;
> +    }
> +  else if (CONST_VECTOR_STEPPED_P (src))
> +    {
> +      gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
> +      if (builder.single_step_npatterns_p ())
> +       {
> +         /* Describe the case by choosing NPATTERNS = 4 as an example.  */
> +         rtx base, step;
> +         if (builder.npatterns_all_equal_p ())
> +           {
> +             /* Generate the variable-length vector as below:
> +                E.g. { 0, 0, 0, 0, 8, 8, 8, 8, 16, 16, 16, 16, ... } */

Please add a more general comment, like:
{ a, a, a, a, a + step, a + step, a + step, a + step, a + step * 2,
a + step * 2, a + step * 2, a + step * 2, ... }

> +             /* Step 1: Generate base = { 0, 0, 0, 0, 0, 0, 0, ... }.  */
> +             base = expand_vector_broadcast (builder.mode (), builder.elt (0));
> +           }
> +         else
> +           {
> +             /* Generate the variable-length vector as below:
> +                E.g. { 0, 6, 0, 6, 8, 14, 8, 14, 16, 22, 16, 22, ... } */

Please add a more general comment, like:
{ a, b, a, b, a + step, b + step, a + step, b + step, a + step * 2,
b + step * 2, ... }

> +             /* Step 1: Generate base = { 0, 6, 0, 6, ... }.  */
> +             rvv_builder new_builder (builder.mode (), builder.npatterns (),
> +                                      1);
> +             for (unsigned int i = 0; i < builder.npatterns (); ++i)
> +               new_builder.quick_push (builder.elt (i));
> +             rtx new_vec = new_builder.build ();
> +             base = gen_reg_rtx (builder.mode ());
> +             emit_move_insn (base, new_vec);
> +           }
> +
> +         /* Step 2: Generate step = gen_int_mode (diff, mode).  */
> +         poly_int64 value1 = rtx_to_poly_int64 (builder.elt (0));
> +         poly_int64 value2
> +           = rtx_to_poly_int64 (builder.elt (builder.npatterns ()));
> +         poly_int64 diff = value2 - value1;
> +         step = gen_int_mode (diff, builder.inner_mode ());
> +
> +         /* Step 3: Generate vid = { 0, 1, 2, 3, 4, 5, 6, 7, ... }.  */
> +         rtx vid = gen_reg_rtx (builder.mode ());
> +         rtx op[] = {vid};
> +         emit_vlmax_insn (code_for_pred_series (builder.mode ()), RVV_MISC_OP,
> +                          op);
> +
> +         /* Step 4: Generate factor = { 0, 0, 0, 0, 1, 1, 1, 1, ... }.  */
> +         rtx factor = gen_reg_rtx (builder.mode ());
> +         rtx shift_ops[]
> +           = {factor, vid,
> +              gen_int_mode (exact_log2 (builder.npatterns ()), Pmode)};

Do we check somewhere that builder.npatterns () must be a power of 2?
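
To make the concern concrete, a small standalone sketch, assuming exact_log2
returns -1 for a value that is not an exact power of two. exact_log2_model
and build_stepped are hypothetical stand-ins written for this example; they
model steps 1 to 6 on plain arrays and are not the GCC functions:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Model of exact_log2: log2 (x) if x is a power of two, otherwise -1.
int
exact_log2_model (uint64_t x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while ((x >>= 1) != 0)
    n++;
  return n;
}

// Model of the stepped expansion: result[i] = base[i] + (i >> shift) * diff.
// BASE_ELTS holds the first NPATTERNS elements, DIFF is the common step and
// VLMAX stands in for the runtime vector length.
std::vector<int64_t>
build_stepped (const std::vector<int64_t> &base_elts, int64_t diff,
               unsigned vlmax)
{
  unsigned npatterns = base_elts.size ();
  int shift = exact_log2_model (npatterns);
  // Without this check a non-power-of-two NPATTERNS would yield shift == -1.
  assert (shift >= 0);

  std::vector<int64_t> result (vlmax);
  for (unsigned i = 0; i < vlmax; i++)
    {
      int64_t base = base_elts[i % npatterns]; // step 1: repeating base
      int64_t factor = i >> shift;             // steps 3-4: vid >> log2
      result[i] = base + factor * diff;        // steps 5-6: multiply, add
    }
  return result;
}
```

For base_elts = { 0, 6, 0, 6 } and diff = 8 the model gives
{ 0, 6, 0, 6, 8, 14, 8, 14, 16, 22, 16, 22, ... }, matching the comment in
the patch; the right shift is only a valid replacement for the division when
NPATTERNS is a power of 2.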

> +         emit_vlmax_insn (code_for_pred_scalar (LSHIFTRT, builder.mode ()),
> +                          RVV_BINOP, shift_ops);
> +
> +         /* Step 5: Generate adjusted step = { 0, 0, 0, 0, diff, diff, ... } */
> +         rtx adjusted_step = gen_reg_rtx (builder.mode ());
> +         rtx mul_ops[] = {adjusted_step, factor, step};
> +         emit_vlmax_insn (code_for_pred_scalar (MULT, builder.mode ()),
> +                          RVV_BINOP, mul_ops);
> +
> +         /* Step 6: Generate the final result.  */
> +         rtx add_ops[] = {target, base, adjusted_step};
> +         emit_vlmax_insn (code_for_pred (PLUS, builder.mode ()), RVV_BINOP,
> +                          add_ops);
> +       }
> +      else
> +       /* TODO: We will enable more variable-length vector in the future.  */
> +       gcc_unreachable ();
> +    }
> +  else
> +    gcc_unreachable ();
>  }
>
>  /* Expand a pre-RA RVV data move from SRC to DEST.

On Wed, Jun 7, 2023 at 8:39 AM juzhe.zhong@rivai.ai
<juzhe.zhong@rivai.ai> wrote:
>
> Ping for this patch. OK for trunk?
> The following patches are blocked by it.
>
>
>
> juzhe.zhong@rivai.ai
>
> From: juzhe.zhong
> Date: 2023-06-06 12:16
> To: gcc-patches
> CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; pan2.li; Juzhe-Zhong
> Subject: [PATCH] RISC-V: Support RVV VLA SLP auto-vectorization
> From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
>
> This patch enables basic VLA SLP auto-vectorization.
> Consider the following case:
> void
> f (uint8_t *restrict a, uint8_t *restrict b)
> {
>   for (int i = 0; i < 100; ++i)
>     {
>       a[i * 8 + 0] = b[i * 8 + 7] + 1;
>       a[i * 8 + 1] = b[i * 8 + 7] + 2;
>       a[i * 8 + 2] = b[i * 8 + 7] + 8;
>       a[i * 8 + 3] = b[i * 8 + 7] + 4;
>       a[i * 8 + 4] = b[i * 8 + 7] + 5;
>       a[i * 8 + 5] = b[i * 8 + 7] + 6;
>       a[i * 8 + 6] = b[i * 8 + 7] + 7;
>       a[i * 8 + 7] = b[i * 8 + 7] + 3;
>     }
> }
>
> To enable VLA SLP auto-vectorization, we need to be able to handle the following const vectors:
>
> 1. NPATTERNS = 8, NELTS_PER_PATTERN = 3.
> { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
>
> 2. NPATTERNS = 8, NELTS_PER_PATTERN = 1.
> { 1, 2, 8, 4, 5, 6, 7, 3, ... }
>
> These vectors can be generated in the prologue.
>
> After this patch, we end up with the following codegen:
>
> Prologue:
> ...
>         vsetvli a7,zero,e16,m2,ta,ma
>         vid.v   v4
>         vsrl.vi v4,v4,3
>         li      a3,8
>         vmul.vx v4,v4,a3  ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
> ...
>         li      t1,67633152
>         addi    t1,t1,513
>         li      a3,50790400
>         addi    a3,a3,1541
>         slli    a3,a3,32
>         add     a3,a3,t1
>         vsetvli t1,zero,e64,m1,ta,ma
>         vmv.v.x v3,a3   ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... }
> ...
> LoopBody:
> ...
>         min     a3,...
>         vsetvli zero,a3,e8,m1,ta,ma
>         vle8.v  v2,0(a6)
>         vsetvli a7,zero,e8,m1,ta,ma
>         vrgatherei16.vv v1,v2,v4
>         vadd.vv v1,v1,v3
>         vsetvli zero,a3,e8,m1,ta,ma
>         vse8.v  v1,0(a2)
>         add     a6,a6,a4
>         add     a2,a2,a4
>         mv      a3,a5
>         add     a5,a5,t1
>         bgtu    a3,a4,.L3
> ...
>
> Note: we need to use "vrgatherei16.vv" instead of "vrgather.vv" for SEW = 8, since "vrgatherei16.vv" can cover a larger
>       index range than "vrgather.vv" (whose maximum element index is only 255).
> Epilogue:
>         lbu     a5,799(a1)
>         addiw   a4,a5,1
>         sb      a4,792(a0)
>         addiw   a4,a5,2
>         sb      a4,793(a0)
>         addiw   a4,a5,8
>         sb      a4,794(a0)
>         addiw   a4,a5,4
>         sb      a4,795(a0)
>         addiw   a4,a5,5
>         sb      a4,796(a0)
>         addiw   a4,a5,6
>         sb      a4,797(a0)
>         addiw   a4,a5,7
>         sb      a4,798(a0)
>         addiw   a5,a5,3
>         sb      a5,799(a0)
>         ret
>
> One last thing remains to be done: the "Epilogue auto-vectorization", which needs VLS modes support.
> I will support VLS modes for "Epilogue auto-vectorization" in the future.
>
> gcc/ChangeLog:
>
>         * config/riscv/riscv-protos.h (expand_vec_perm_const): New function.
>         * config/riscv/riscv-v.cc (rvv_builder::can_duplicate_repeating_sequence_p): Support POLY handling.
>         (rvv_builder::single_step_npatterns_p): New function.
>         (rvv_builder::npatterns_all_equal_p): Ditto.
>         (const_vec_all_in_range_p): Support POLY handling.
>         (gen_const_vector_dup): Ditto.
>         (emit_vlmax_gather_insn): Add vrgatherei16.
>         (emit_vlmax_masked_gather_mu_insn): Ditto.
>         (expand_const_vector): Add VLA SLP const vector support.
>         (expand_vec_perm): Support POLY.
>         (struct expand_vec_perm_d): New struct.
>         (shuffle_generic_patterns): New function.
>         (expand_vec_perm_const_1): Ditto.
>         (expand_vec_perm_const): Ditto.
>         * config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto.
>         (TARGET_VECTORIZE_VEC_PERM_CONST): New targethook.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA vectorizer.
>         * gcc.target/riscv/rvv/autovec/v-1.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp_run-4.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp_run-5.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp_run-6.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp_run-7.c: New test.
>
> ---
> gcc/config/riscv/riscv-protos.h               |   2 +
> gcc/config/riscv/riscv-v.cc                   | 352 ++++++++++++++++--
> gcc/config/riscv/riscv.cc                     |  16 +
> .../riscv/rvv/autovec/partial/slp-1.c         |  22 ++
> .../riscv/rvv/autovec/partial/slp-2.c         |  22 ++
> .../riscv/rvv/autovec/partial/slp-3.c         |  22 ++
> .../riscv/rvv/autovec/partial/slp-4.c         |  22 ++
> .../riscv/rvv/autovec/partial/slp-5.c         |  22 ++
> .../riscv/rvv/autovec/partial/slp-6.c         |  23 ++
> .../riscv/rvv/autovec/partial/slp-7.c         |  15 +
> .../riscv/rvv/autovec/partial/slp_run-1.c     |  66 ++++
> .../riscv/rvv/autovec/partial/slp_run-2.c     |  67 ++++
> .../riscv/rvv/autovec/partial/slp_run-3.c     |  67 ++++
> .../riscv/rvv/autovec/partial/slp_run-4.c     |  67 ++++
> .../riscv/rvv/autovec/partial/slp_run-5.c     |  67 ++++
> .../riscv/rvv/autovec/partial/slp_run-6.c     |  67 ++++
> .../riscv/rvv/autovec/partial/slp_run-7.c     |  58 +++
> .../gcc.target/riscv/rvv/autovec/scalable-1.c |   2 +-
> .../gcc.target/riscv/rvv/autovec/v-1.c        |   7 +-
> .../riscv/rvv/autovec/zve32f_zvl128b-1.c      |   2 +-
> .../riscv/rvv/autovec/zve32x_zvl128b-1.c      |   2 +-
> .../gcc.target/riscv/rvv/autovec/zve64d-1.c   |   2 +-
> .../riscv/rvv/autovec/zve64d_zvl128b-1.c      |   2 +-
> .../gcc.target/riscv/rvv/autovec/zve64f-1.c   |   2 +-
> .../riscv/rvv/autovec/zve64f_zvl128b-1.c      |   2 +-
> .../riscv/rvv/autovec/zve64x_zvl128b-1.c      |   2 +-
> 26 files changed, 963 insertions(+), 37 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index d770e5e826e..27ecd16e496 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -168,6 +168,8 @@ void init_builtins (void);
> const char *mangle_builtin_type (const_tree);
> #ifdef GCC_TARGET_H
> bool verify_type_context (location_t, type_context_kind, const_tree, bool);
> +bool expand_vec_perm_const (machine_mode, machine_mode, rtx, rtx, rtx,
> +     const vec_perm_indices &);
> #endif
> void handle_pragma_vector (void);
> tree builtin_decl (unsigned, bool);
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 83277fc2c05..4864429ed06 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -251,9 +251,12 @@ public:
>      m_inner_mode = GET_MODE_INNER (mode);
>      m_inner_bits_size = GET_MODE_BITSIZE (m_inner_mode);
>      m_inner_bytes_size = GET_MODE_SIZE (m_inner_mode);
> +    m_mask_mode = get_mask_mode (mode).require ();
>      gcc_assert (
>        int_mode_for_size (inner_bits_size (), 0).exists (&m_inner_int_mode));
> +    m_int_mode
> +      = get_vector_mode (m_inner_int_mode, GET_MODE_NUNITS (mode)).require ();
>    }
>    bool can_duplicate_repeating_sequence_p ();
> @@ -262,9 +265,14 @@ public:
>    bool repeating_sequence_use_merge_profitable_p ();
>    rtx get_merge_scalar_mask (unsigned int) const;
> +  bool single_step_npatterns_p () const;
> +  bool npatterns_all_equal_p () const;
> +
>    machine_mode new_mode () const { return m_new_mode; }
>    scalar_mode inner_mode () const { return m_inner_mode; }
>    scalar_int_mode inner_int_mode () const { return m_inner_int_mode; }
> +  machine_mode mask_mode () const { return m_mask_mode; }
> +  machine_mode int_mode () const { return m_int_mode; }
>    unsigned int inner_bits_size () const { return m_inner_bits_size; }
>    unsigned int inner_bytes_size () const { return m_inner_bytes_size; }
> @@ -273,6 +281,8 @@ private:
>    scalar_int_mode m_inner_int_mode;
>    machine_mode m_new_mode;
>    scalar_int_mode m_new_inner_mode;
> +  machine_mode m_mask_mode;
> +  machine_mode m_int_mode;
>    unsigned int m_inner_bits_size;
>    unsigned int m_inner_bytes_size;
> };
> @@ -290,7 +300,9 @@ rvv_builder::can_duplicate_repeating_sequence_p ()
>        || GET_MODE_SIZE (m_new_inner_mode) > UNITS_PER_WORD
>        || !get_vector_mode (m_new_inner_mode, new_size).exists (&m_new_mode))
>      return false;
> -  return repeating_sequence_p (0, full_nelts ().to_constant (), npatterns ());
> +  if (full_nelts ().is_constant ())
> +    return repeating_sequence_p (0, full_nelts ().to_constant (), npatterns ());
> +  return nelts_per_pattern () == 1;
> }
> /* Return true if it is a repeating sequence that using
> @@ -398,6 +410,48 @@ rvv_builder::get_merge_scalar_mask (unsigned int index_in_pattern) const
>    return gen_int_mode (mask, inner_int_mode ());
> }
> +/* Return true if the variable-length vector is single step.  */
> +bool
> +rvv_builder::single_step_npatterns_p () const
> +{
> +  if (nelts_per_pattern () != 3)
> +    return false;
> +
> +  poly_int64 step
> +    = rtx_to_poly_int64 (elt (npatterns ())) - rtx_to_poly_int64 (elt (0));
> +  for (unsigned int i = 0; i < npatterns (); i++)
> +    {
> +      poly_int64 ele0 = rtx_to_poly_int64 (elt (i));
> +      poly_int64 ele1 = rtx_to_poly_int64 (elt (npatterns () + i));
> +      poly_int64 ele2 = rtx_to_poly_int64 (elt (npatterns () * 2 + i));
> +      poly_int64 diff1 = ele1 - ele0;
> +      poly_int64 diff2 = ele2 - ele1;
> +      if (maybe_ne (step, diff1) || maybe_ne (step, diff2))
> + return false;
> +    }
> +  return true;
> +}
> +
> +/* Return true if all elements of NPATTERNS are equal.
> +
> +   E.g. NPATTERNS = 4:
> +     { 2, 2, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8, 16, 16, 16, 16, ... }
> +   E.g. NPATTERNS = 8:
> +     { 2, 2, 2, 2, 2, 2, 2, 2, 8, 8, 8, 8, 8, 8, 8, 8, ... }
> +*/
> +bool
> +rvv_builder::npatterns_all_equal_p () const
> +{
> +  poly_int64 ele0 = rtx_to_poly_int64 (elt (0));
> +  for (unsigned int i = 1; i < npatterns (); i++)
> +    {
> +      poly_int64 ele = rtx_to_poly_int64 (elt (i));
> +      if (!known_eq (ele, ele0))
> + return false;
> +    }
> +  return true;
> +}
> +
> static unsigned
> get_sew (machine_mode mode)
> {
> @@ -425,7 +479,7 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT minval,
>     future.  */
> static bool
> -const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT maxval)
> +const_vec_all_in_range_p (rtx vec, poly_int64 minval, poly_int64 maxval)
> {
>    if (!CONST_VECTOR_P (vec)
>        || GET_MODE_CLASS (GET_MODE (vec)) != MODE_VECTOR_INT)
> @@ -440,8 +494,10 @@ const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT maxval)
>    for (int i = 0; i < nunits; i++)
>      {
>        rtx vec_elem = CONST_VECTOR_ELT (vec, i);
> -      if (!CONST_INT_P (vec_elem)
> -   || !IN_RANGE (INTVAL (vec_elem), minval, maxval))
> +      poly_int64 value;
> +      if (!poly_int_rtx_p (vec_elem, &value)
> +   || maybe_lt (value, minval)
> +   || maybe_gt (value, maxval))
> return false;
>      }
>    return true;
> @@ -453,7 +509,7 @@ const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT maxval)
>     future.  */
> static rtx
> -gen_const_vector_dup (machine_mode mode, HOST_WIDE_INT val)
> +gen_const_vector_dup (machine_mode mode, poly_int64 val)
> {
>    rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
>    return gen_const_vec_duplicate (mode, c);
> @@ -727,7 +783,10 @@ emit_vlmax_gather_insn (rtx target, rtx op, rtx sel)
>    rtx elt;
>    insn_code icode;
>    machine_mode data_mode = GET_MODE (target);
> -  if (const_vec_duplicate_p (sel, &elt))
> +  machine_mode sel_mode = GET_MODE (sel);
> +  if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode)))
> +    icode = code_for_pred_gatherei16 (data_mode);
> +  else if (const_vec_duplicate_p (sel, &elt))
>      {
>        icode = code_for_pred_gather_scalar (data_mode);
>        sel = elt;
> @@ -744,7 +803,10 @@ emit_vlmax_masked_gather_mu_insn (rtx target, rtx op, rtx sel, rtx mask)
>    rtx elt;
>    insn_code icode;
>    machine_mode data_mode = GET_MODE (target);
> -  if (const_vec_duplicate_p (sel, &elt))
> +  machine_mode sel_mode = GET_MODE (sel);
> +  if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode)))
> +    icode = code_for_pred_gatherei16 (data_mode);
> +  else if (const_vec_duplicate_p (sel, &elt))
>      {
>        icode = code_for_pred_gather_scalar (data_mode);
>        sel = elt;
> @@ -895,11 +957,130 @@ expand_const_vector (rtx target, rtx src)
>        return;
>      }
> -  /* TODO: We only support const duplicate vector for now. More cases
> -     will be supported when we support auto-vectorization:
> +  /* Handle variable-length vector.  */
> +  unsigned int nelts_per_pattern = CONST_VECTOR_NELTS_PER_PATTERN (src);
> +  unsigned int npatterns = CONST_VECTOR_NPATTERNS (src);
> +  rvv_builder builder (mode, npatterns, nelts_per_pattern);
> +  for (unsigned int i = 0; i < nelts_per_pattern; i++)
> +    {
> +      for (unsigned int j = 0; j < npatterns; j++)
> + builder.quick_push (CONST_VECTOR_ELT (src, i * npatterns + j));
> +    }
> +  builder.finalize ();
> -       1. multiple elts duplicate vector.
> -       2. multiple patterns with multiple elts.  */
> +  if (CONST_VECTOR_DUPLICATE_P (src))
> +    {
> +      if (builder.can_duplicate_repeating_sequence_p ())
> + {
> +   rtx ele = builder.get_merged_repeating_sequence ();
> +   rtx dup = expand_vector_broadcast (builder.new_mode (), ele);
> +   emit_move_insn (target, gen_lowpart (mode, dup));
> + }
> +      else
> + {
> +   unsigned int nbits = npatterns - 1;
> +
> +   /* Generate vid = { 0, 1, 2, 3, 4, 5, 6, 7, ... }.  */
> +   rtx vid = gen_reg_rtx (builder.int_mode ());
> +   rtx op[] = {vid};
> +   emit_vlmax_insn (code_for_pred_series (builder.int_mode ()),
> +    RVV_MISC_OP, op);
> +
> +   /* Generate vid_repeat = { 0, 1, ... nbits, ... }  */
> +   rtx vid_repeat = gen_reg_rtx (builder.int_mode ());
> +   rtx and_ops[] = {vid_repeat, vid,
> +    gen_int_mode (nbits, builder.inner_int_mode ())};
> +   emit_vlmax_insn (code_for_pred_scalar (AND, builder.int_mode ()),
> +    RVV_BINOP, and_ops);
> +
> +   rtx tmp = gen_reg_rtx (builder.mode ());
> +   rtx dup_ops[] = {tmp, builder.elt (0)};
> +   emit_vlmax_insn (code_for_pred_broadcast (builder.mode ()), RVV_UNOP,
> +    dup_ops);
> +   for (unsigned int i = 1; i < builder.npatterns (); i++)
> +     {
> +       /* Generate mask according to i.  */
> +       rtx mask = gen_reg_rtx (builder.mask_mode ());
> +       rtx const_vec = gen_const_vector_dup (builder.int_mode (), i);
> +       expand_vec_cmp (mask, EQ, vid_repeat, const_vec);
> +
> +       /* Merge scalar to each i.  */
> +       rtx tmp2 = gen_reg_rtx (builder.mode ());
> +       rtx merge_ops[] = {tmp2, tmp, builder.elt (i), mask};
> +       insn_code icode = code_for_pred_merge_scalar (builder.mode ());
> +       emit_vlmax_merge_insn (icode, RVV_MERGE_OP, merge_ops);
> +       tmp = tmp2;
> +     }
> +   emit_move_insn (target, tmp);
> + }
> +      return;
> +    }
> +  else if (CONST_VECTOR_STEPPED_P (src))
> +    {
> +      gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
> +      if (builder.single_step_npatterns_p ())
> + {
> +   /* Describe the case by choosing NPATTERNS = 4 as an example.  */
> +   rtx base, step;
> +   if (builder.npatterns_all_equal_p ())
> +     {
> +       /* Generate the variable-length vector as below:
> + E.g. { 0, 0, 0, 0, 8, 8, 8, 8, 16, 16, 16, 16, ... } */
> +       /* Step 1: Generate base = { 0, 0, 0, 0, 0, 0, 0, ... }.  */
> +       base = expand_vector_broadcast (builder.mode (), builder.elt (0));
> +     }
> +   else
> +     {
> +       /* Generate the variable-length vector as below:
> + E.g. { 0, 6, 0, 6, 8, 14, 8, 14, 16, 22, 16, 22, ... } */
> +       /* Step 1: Generate base = { 0, 6, 0, 6, ... }.  */
> +       rvv_builder new_builder (builder.mode (), builder.npatterns (),
> +        1);
> +       for (unsigned int i = 0; i < builder.npatterns (); ++i)
> + new_builder.quick_push (builder.elt (i));
> +       rtx new_vec = new_builder.build ();
> +       base = gen_reg_rtx (builder.mode ());
> +       emit_move_insn (base, new_vec);
> +     }
> +
> +   /* Step 2: Generate step = gen_int_mode (diff, mode).  */
> +   poly_int64 value1 = rtx_to_poly_int64 (builder.elt (0));
> +   poly_int64 value2
> +     = rtx_to_poly_int64 (builder.elt (builder.npatterns ()));
> +   poly_int64 diff = value2 - value1;
> +   step = gen_int_mode (diff, builder.inner_mode ());
> +
> +   /* Step 3: Generate vid = { 0, 1, 2, 3, 4, 5, 6, 7, ... }.  */
> +   rtx vid = gen_reg_rtx (builder.mode ());
> +   rtx op[] = {vid};
> +   emit_vlmax_insn (code_for_pred_series (builder.mode ()), RVV_MISC_OP,
> +    op);
> +
> +   /* Step 4: Generate factor = { 0, 0, 0, 0, 1, 1, 1, 1, ... }.  */
> +   rtx factor = gen_reg_rtx (builder.mode ());
> +   rtx shift_ops[]
> +     = {factor, vid,
> +        gen_int_mode (exact_log2 (builder.npatterns ()), Pmode)};
> +   emit_vlmax_insn (code_for_pred_scalar (LSHIFTRT, builder.mode ()),
> +    RVV_BINOP, shift_ops);
> +
> +   /* Step 5: Generate adjusted step = { 0, 0, 0, 0, diff, diff, ... } */
> +   rtx adjusted_step = gen_reg_rtx (builder.mode ());
> +   rtx mul_ops[] = {adjusted_step, factor, step};
> +   emit_vlmax_insn (code_for_pred_scalar (MULT, builder.mode ()),
> +    RVV_BINOP, mul_ops);
> +
> +   /* Step 6: Generate the final result.  */
> +   rtx add_ops[] = {target, base, adjusted_step};
> +   emit_vlmax_insn (code_for_pred (PLUS, builder.mode ()), RVV_BINOP,
> +    add_ops);
> + }
> +      else
> + /* TODO: We will enable more variable-length vector in the future.  */
> + gcc_unreachable ();
> +    }
> +  else
> +    gcc_unreachable ();
> }
> /* Expand a pre-RA RVV data move from SRC to DEST.
> @@ -2029,14 +2210,13 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
> {
>    machine_mode data_mode = GET_MODE (target);
>    machine_mode sel_mode = GET_MODE (sel);
> -
> -  /* Enforced by the pattern condition.  */
> -  int nunits = GET_MODE_NUNITS (sel_mode).to_constant ();
> +  poly_uint64 nunits = GET_MODE_NUNITS (sel_mode);
>    /* Check if the sel only references the first values vector. If each select
>       index is in range of [0, nunits - 1]. A single vrgather instructions is
> -     enough.  */
> -  if (const_vec_all_in_range_p (sel, 0, nunits - 1))
> +     enough. Since we will use vrgatherei16.vv for variable-length vector,
> +     it is never out of range and we don't need to modulo the index.  */
> +  if (!nunits.is_constant () || const_vec_all_in_range_p (sel, 0, nunits - 1))
>      {
>        emit_vlmax_gather_insn (target, op0, sel);
>        return;
> @@ -2057,14 +2237,20 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
>        return;
>      }
> -  /* Note: vec_perm indices are supposed to wrap when they go beyond the
> -     size of the two value vectors, i.e. the upper bits of the indices
> -     are effectively ignored.  RVV vrgather instead produces 0 for any
> -     out-of-range indices, so we need to modulo all the vec_perm indices
> -     to ensure they are all in range of [0, 2 * nunits - 1].  */
> +  rtx sel_mod = sel;
>    rtx max_sel = gen_const_vector_dup (sel_mode, 2 * nunits - 1);
> -  rtx sel_mod
> -    = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0, OPTAB_DIRECT);
> +  /* We don't need to modulo indices for VLA vectors,
> +     since we should guarantee beforehand that they aren't out of range.  */
> +  if (nunits.is_constant ())
> +    {
> +      /* Note: vec_perm indices are supposed to wrap when they go beyond the
> + size of the two value vectors, i.e. the upper bits of the indices
> + are effectively ignored.  RVV vrgather instead produces 0 for any
> + out-of-range indices, so we need to modulo all the vec_perm indices
> + to ensure they are all in range of [0, 2 * nunits - 1].  */
> +      sel_mod = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0,
> +      OPTAB_DIRECT);
> +    }
>    /* This following sequence is handling the case that:
>       __builtin_shufflevector (vec1, vec2, index...), the index can be any
> @@ -2094,4 +2280,124 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
>    emit_vlmax_masked_gather_mu_insn (target, op1, tmp, mask);
> }
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST for RVV.  */
> +
> +/* vec_perm support.  */
> +
> +struct expand_vec_perm_d
> +{
> +  rtx target, op0, op1;
> +  vec_perm_indices perm;
> +  machine_mode vmode;
> +  machine_mode op_mode;
> +  bool one_vector_p;
> +  bool testing_p;
> +};
> +
> +/* Recognize the pattern that can be shuffled by generic approach.  */
> +
> +static bool
> +shuffle_generic_patterns (struct expand_vec_perm_d *d)
> +{
> +  machine_mode sel_mode = related_int_vector_mode (d->vmode).require ();
> +  poly_uint64 nunits = GET_MODE_NUNITS (d->vmode);
> +
> +  /* For constant size indices, we don't need to handle it here.
> +     Just leave it to vec_perm<mode>.  */
> +  if (d->perm.length ().is_constant ())
> +    return false;
> +
> +  /* Permuting two SEW8 variable-length vectors need vrgatherei16.vv.
> +     Otherwise, it could overflow the index range.  */
> +  if (GET_MODE_INNER (d->vmode) == QImode
> +      && !get_vector_mode (HImode, nunits).exists (&sel_mode))
> +    return false;
> +
> +  /* Success! */
> +  if (d->testing_p)
> +    return true;
> +
> +  rtx sel = vec_perm_indices_to_rtx (sel_mode, d->perm);
> +  expand_vec_perm (d->target, d->op0, d->op1, force_reg (sel_mode, sel));
> +  return true;
> +}
> +
> +static bool
> +expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
> +{
> +  gcc_assert (d->op_mode != E_VOIDmode);
> +
> +  /* The pattern matching functions above are written to look for a small
> +     number to begin the sequence (0, 1, N/2).  If we begin with an index
> +     from the second operand, we can swap the operands.  */
> +  poly_int64 nelt = d->perm.length ();
> +  if (known_ge (d->perm[0], nelt))
> +    {
> +      d->perm.rotate_inputs (1);
> +      std::swap (d->op0, d->op1);
> +    }
> +
> +  if (known_gt (nelt, 1))
> +    {
> +      if (d->vmode == d->op_mode)
> + {
> +   if (shuffle_generic_patterns (d))
> +     return true;
> +   return false;
> + }
> +      else
> + return false;
> +    }
> +  return false;
> +}
> +
> +bool
> +expand_vec_perm_const (machine_mode vmode, machine_mode op_mode, rtx target,
> +        rtx op0, rtx op1, const vec_perm_indices &sel)
> +{
> +  /* RVV doesn't have Mask type pack/unpack instructions and we don't use
> +     mask to do the iteration loop control. Just disable it directly.  */
> +  if (GET_MODE_CLASS (vmode) == MODE_VECTOR_BOOL)
> +    return false;
> +
> +  struct expand_vec_perm_d d;
> +
> +  /* Check whether the mask can be applied to a single vector.  */
> +  if (sel.ninputs () == 1 || (op0 && rtx_equal_p (op0, op1)))
> +    d.one_vector_p = true;
> +  else if (sel.all_from_input_p (0))
> +    {
> +      d.one_vector_p = true;
> +      op1 = op0;
> +    }
> +  else if (sel.all_from_input_p (1))
> +    {
> +      d.one_vector_p = true;
> +      op0 = op1;
> +    }
> +  else
> +    d.one_vector_p = false;
> +
> +  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2,
> +                     sel.nelts_per_input ());
> +  d.vmode = vmode;
> +  d.op_mode = op_mode;
> +  d.target = target;
> +  d.op0 = op0;
> +  if (op0 == op1)
> +    d.op1 = d.op0;
> +  else
> +    d.op1 = op1;
> +  d.testing_p = !target;
> +
> +  if (!d.testing_p)
> +    return expand_vec_perm_const_1 (&d);
> +
> +  rtx_insn *last = get_last_insn ();
> +  bool ret = expand_vec_perm_const_1 (&d);
> +  gcc_assert (last == get_last_insn ());
> +
> +  return ret;
> +}
> +
> } // namespace riscv_vector
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index caa7858b864..5d22012b591 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -7631,6 +7631,19 @@ riscv_vectorize_related_mode (machine_mode vector_mode, scalar_mode element_mode
>    return default_vectorize_related_mode (vector_mode, element_mode, nunits);
> }
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
> +
> +static bool
> +riscv_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
> +                                rtx target, rtx op0, rtx op1,
> +                                const vec_perm_indices &sel)
> +{
> +  if (TARGET_VECTOR && riscv_v_ext_vector_mode_p (vmode))
> +    return riscv_vector::expand_vec_perm_const (vmode, op_mode, target, op0,
> +                                                op1, sel);
> +
> +  return false;
> +}
> /* Initialize the GCC target structure.  */
> #undef TARGET_ASM_ALIGNED_HI_OP
> @@ -7930,6 +7943,9 @@ riscv_vectorize_related_mode (machine_mode vector_mode, scalar_mode element_mode
> #undef TARGET_VECTORIZE_RELATED_MODE
> #define TARGET_VECTORIZE_RELATED_MODE riscv_vectorize_related_mode
> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
> +#define TARGET_VECTORIZE_VEC_PERM_CONST riscv_vectorize_vec_perm_const
> +
> struct gcc_target targetm = TARGET_INITIALIZER;
> #include "gt-riscv.h"
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c
> new file mode 100644
> index 00000000000..befb518e2dd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
> +
> +#include <stdint-gcc.h>
> +
> +void __attribute__ ((noipa))
> +f (int8_t *restrict a, int8_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8 + 0] = b[i * 8 + 37] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 37] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 37] + 8;
> +      a[i * 8 + 3] = b[i * 8 + 37] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 37] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 37] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 37] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 37] + 3;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c
> new file mode 100644
> index 00000000000..ac817451295
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
> +
> +#include <stdint-gcc.h>
> +
> +void __attribute__ ((noipa))
> +f (int16_t *restrict a, int16_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8 + 0] = b[i * 8 + 37] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 37] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 37] + 8;
> +      a[i * 8 + 3] = b[i * 8 + 37] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 37] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 37] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 37] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 37] + 3;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c
> new file mode 100644
> index 00000000000..73962055b03
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
> +
> +#include <stdint-gcc.h>
> +
> +void __attribute__ ((noipa))
> +f (int8_t *restrict a, int8_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8] = b[i * 8 + 1] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 7] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 1] + 3;
> +      a[i * 8 + 3] = b[i * 8 + 7] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 1] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 7] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 1] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 7] + 8;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c
> new file mode 100644
> index 00000000000..fa216fc8c40
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
> +
> +#include <stdint-gcc.h>
> +
> +void __attribute__ ((noipa))
> +f (int16_t *restrict a, int16_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8 + 0] = b[i * 8 + 1] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 7] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 1] + 3;
> +      a[i * 8 + 3] = b[i * 8 + 7] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 1] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 7] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 1] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 7] + 8;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c
> new file mode 100644
> index 00000000000..899ed9e310b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
> +
> +#include <stdint-gcc.h>
> +
> +void __attribute__ ((noipa))
> +f (int8_t *restrict a, int8_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8 + 0] = b[i * 8 + 1] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 7] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 4] + 3;
> +      a[i * 8 + 3] = b[i * 8 + 8] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 1] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 7] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 4] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 8] + 8;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c
> new file mode 100644
> index 00000000000..fb87cc00cea
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
> +
> +#include <stdint-gcc.h>
> +
> +void __attribute__ ((noipa))
> +f (uint8_t *restrict a, uint8_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8 + 0] = b[i * 8 + 1] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 2] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 6] + 8;
> +      a[i * 8 + 3] = b[i * 8 + 7] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 3] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 4] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 5] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 0] + 3;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
> +
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c
> new file mode 100644
> index 00000000000..3dd744b586e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
> +
> +#include <stdint-gcc.h>
> +
> +void __attribute__ ((noipa))
> +f (float *__restrict f, double *__restrict d, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      f[i * 2 + 0] = 1;
> +      f[i * 2 + 1] = 2;
> +      d[i] = 3;
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c
> new file mode 100644
> index 00000000000..16f078a0433
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c
> @@ -0,0 +1,66 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
> +
> +#include "slp-1.c"
> +
> +#define LIMIT 128
> +void __attribute__ ((optimize (0)))
> +f_golden (int8_t *restrict a, int8_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8 + 0] = b[i * 8 + 37] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 37] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 37] + 8;
> +      a[i * 8 + 3] = b[i * 8 + 37] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 37] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 37] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 37] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 37] + 3;
> +    }
> +}
> +
> +int
> +main (void)
> +{
> +#define RUN(NUM)                                                               \
> +  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
> +  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
> +  int8_t b_##NUM[NUM * 8 + 37] = {0};                                          \
> +  for (int i = 0; i < NUM * 8 + 37; i++)                                       \
> +    {                                                                          \
> +      if (i % NUM == 0)                                                        \
> +        b_##NUM[i] = (i + NUM) % LIMIT;                                        \
> +      else                                                                     \
> +        b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
> +    }                                                                          \
> +  f (a_##NUM, b_##NUM, NUM);                                                   \
> +  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
> +  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
> +    {                                                                          \
> +      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
> +        __builtin_abort ();                                                    \
> +    }
> +
> +  RUN (3);
> +  RUN (5);
> +  RUN (15);
> +  RUN (16);
> +  RUN (17);
> +  RUN (31);
> +  RUN (32);
> +  RUN (33);
> +  RUN (63);
> +  RUN (64);
> +  RUN (65);
> +  RUN (127);
> +  RUN (128);
> +  RUN (129);
> +  RUN (239);
> +  RUN (359);
> +  RUN (498);
> +  RUN (799);
> +  RUN (977);
> +  RUN (5789);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c
> new file mode 100644
> index 00000000000..41f688f628c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c
> @@ -0,0 +1,67 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
> +
> +#include "slp-2.c"
> +
> +#define LIMIT 32767
> +
> +void __attribute__ ((optimize (0)))
> +f_golden (int16_t *restrict a, int16_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8 + 0] = b[i * 8 + 37] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 37] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 37] + 8;
> +      a[i * 8 + 3] = b[i * 8 + 37] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 37] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 37] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 37] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 37] + 3;
> +    }
> +}
> +
> +int
> +main (void)
> +{
> +#define RUN(NUM)                                                               \
> +  int16_t a_##NUM[NUM * 8 + 8] = {0};                                          \
> +  int16_t a_golden_##NUM[NUM * 8 + 8] = {0};                                   \
> +  int16_t b_##NUM[NUM * 8 + 37] = {0};                                         \
> +  for (int i = 0; i < NUM * 8 + 37; i++)                                       \
> +    {                                                                          \
> +      if (i % NUM == 0)                                                        \
> +        b_##NUM[i] = (i + NUM) % LIMIT;                                        \
> +      else                                                                     \
> +        b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
> +    }                                                                          \
> +  f (a_##NUM, b_##NUM, NUM);                                                   \
> +  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
> +  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
> +    {                                                                          \
> +      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
> +        __builtin_abort ();                                                    \
> +    }
> +
> +  RUN (3);
> +  RUN (5);
> +  RUN (15);
> +  RUN (16);
> +  RUN (17);
> +  RUN (31);
> +  RUN (32);
> +  RUN (33);
> +  RUN (63);
> +  RUN (64);
> +  RUN (65);
> +  RUN (127);
> +  RUN (128);
> +  RUN (129);
> +  RUN (239);
> +  RUN (359);
> +  RUN (498);
> +  RUN (799);
> +  RUN (977);
> +  RUN (5789);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c
> new file mode 100644
> index 00000000000..30996cb2c6e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c
> @@ -0,0 +1,67 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
> +
> +#include "slp-3.c"
> +
> +#define LIMIT 128
> +
> +void __attribute__ ((optimize (0)))
> +f_golden (int8_t *restrict a, int8_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8] = b[i * 8 + 1] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 7] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 1] + 3;
> +      a[i * 8 + 3] = b[i * 8 + 7] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 1] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 7] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 1] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 7] + 8;
> +    }
> +}
> +
> +int
> +main (void)
> +{
> +#define RUN(NUM)                                                               \
> +  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
> +  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
> +  int8_t b_##NUM[NUM * 8 + 8] = {0};                                           \
> +  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
> +    {                                                                          \
> +      if (i % NUM == 0)                                                        \
> +        b_##NUM[i] = (i + NUM) % LIMIT;                                        \
> +      else                                                                     \
> +        b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
> +    }                                                                          \
> +  f (a_##NUM, b_##NUM, NUM);                                                   \
> +  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
> +  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
> +    {                                                                          \
> +      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
> +        __builtin_abort ();                                                    \
> +    }
> +
> +  RUN (3);
> +  RUN (5);
> +  RUN (15);
> +  RUN (16);
> +  RUN (17);
> +  RUN (31);
> +  RUN (32);
> +  RUN (33);
> +  RUN (63);
> +  RUN (64);
> +  RUN (65);
> +  RUN (127);
> +  RUN (128);
> +  RUN (129);
> +  RUN (239);
> +  RUN (359);
> +  RUN (498);
> +  RUN (799);
> +  RUN (977);
> +  RUN (5789);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c
> new file mode 100644
> index 00000000000..3d43ef0890c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c
> @@ -0,0 +1,67 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
> +
> +#include "slp-4.c"
> +
> +#define LIMIT 32767
> +
> +void __attribute__ ((optimize (0)))
> +f_golden (int16_t *restrict a, int16_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8] = b[i * 8 + 1] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 7] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 1] + 3;
> +      a[i * 8 + 3] = b[i * 8 + 7] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 1] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 7] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 1] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 7] + 8;
> +    }
> +}
> +
> +int
> +main (void)
> +{
> +#define RUN(NUM)                                                               \
> +  int16_t a_##NUM[NUM * 8 + 8] = {0};                                          \
> +  int16_t a_golden_##NUM[NUM * 8 + 8] = {0};                                   \
> +  int16_t b_##NUM[NUM * 8 + 8] = {0};                                          \
> +  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
> +    {                                                                          \
> +      if (i % NUM == 0)                                                        \
> +        b_##NUM[i] = (i + NUM) % LIMIT;                                        \
> +      else                                                                     \
> +        b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
> +    }                                                                          \
> +  f (a_##NUM, b_##NUM, NUM);                                                   \
> +  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
> +  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
> +    {                                                                          \
> +      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
> +        __builtin_abort ();                                                    \
> +    }
> +
> +  RUN (3);
> +  RUN (5);
> +  RUN (15);
> +  RUN (16);
> +  RUN (17);
> +  RUN (31);
> +  RUN (32);
> +  RUN (33);
> +  RUN (63);
> +  RUN (64);
> +  RUN (65);
> +  RUN (127);
> +  RUN (128);
> +  RUN (129);
> +  RUN (239);
> +  RUN (359);
> +  RUN (498);
> +  RUN (799);
> +  RUN (977);
> +  RUN (5789);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c
> new file mode 100644
> index 00000000000..814308bd7af
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c
> @@ -0,0 +1,67 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
> +
> +#include "slp-5.c"
> +
> +#define LIMIT 128
> +
> +void __attribute__ ((optimize (0)))
> +f_golden (int8_t *restrict a, int8_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8 + 0] = b[i * 8 + 1] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 7] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 4] + 3;
> +      a[i * 8 + 3] = b[i * 8 + 8] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 1] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 7] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 4] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 8] + 8;
> +    }
> +}
> +
> +int
> +main (void)
> +{
> +#define RUN(NUM)                                                               \
> +  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
> +  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
> +  int8_t b_##NUM[NUM * 8 + 9] = {0};                                           \
> +  for (int i = 0; i < NUM * 8 + 9; i++)                                        \
> +    {                                                                          \
> +      if (i % NUM == 0)                                                        \
> +        b_##NUM[i] = (i + NUM) % LIMIT;                                        \
> +      else                                                                     \
> +        b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
> +    }                                                                          \
> +  f (a_##NUM, b_##NUM, NUM);                                                   \
> +  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
> +  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
> +    {                                                                          \
> +      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
> +        __builtin_abort ();                                                    \
> +    }
> +
> +  RUN (3);
> +  RUN (5);
> +  RUN (15);
> +  RUN (16);
> +  RUN (17);
> +  RUN (31);
> +  RUN (32);
> +  RUN (33);
> +  RUN (63);
> +  RUN (64);
> +  RUN (65);
> +  RUN (127);
> +  RUN (128);
> +  RUN (129);
> +  RUN (239);
> +  RUN (359);
> +  RUN (498);
> +  RUN (799);
> +  RUN (977);
> +  RUN (5789);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c
> new file mode 100644
> index 00000000000..e317eeac2f2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c
> @@ -0,0 +1,67 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
> +
> +#include "slp-6.c"
> +
> +#define LIMIT 128
> +
> +void __attribute__ ((optimize (0)))
> +f_golden (int8_t *restrict a, int8_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8 + 0] = b[i * 8 + 1] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 2] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 6] + 8;
> +      a[i * 8 + 3] = b[i * 8 + 7] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 3] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 4] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 5] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 0] + 3;
> +    }
> +}
> +
> +int
> +main (void)
> +{
> +#define RUN(NUM)                                                               \
> +  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
> +  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
> +  int8_t b_##NUM[NUM * 8 + 9] = {0};                                           \
> +  for (int i = 0; i < NUM * 8 + 9; i++)                                        \
> +    {                                                                          \
> +      if (i % NUM == 0)                                                        \
> +        b_##NUM[i] = (i + NUM) % LIMIT;                                        \
> +      else                                                                     \
> +        b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
> +    }                                                                          \
> +  f (a_##NUM, b_##NUM, NUM);                                                   \
> +  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
> +  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
> +    {                                                                          \
> +      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
> +        __builtin_abort ();                                                    \
> +    }
> +
> +  RUN (3);
> +  RUN (5);
> +  RUN (15);
> +  RUN (16);
> +  RUN (17);
> +  RUN (31);
> +  RUN (32);
> +  RUN (33);
> +  RUN (63);
> +  RUN (64);
> +  RUN (65);
> +  RUN (127);
> +  RUN (128);
> +  RUN (129);
> +  RUN (239);
> +  RUN (359);
> +  RUN (498);
> +  RUN (799);
> +  RUN (977);
> +  RUN (5789);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c
> new file mode 100644
> index 00000000000..a8e4781988e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c
> @@ -0,0 +1,58 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
> +
> +#include "slp-7.c"
> +
> +void
> +f_golden (float *__restrict f, double *__restrict d, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      f[i * 2 + 0] = 1;
> +      f[i * 2 + 1] = 2;
> +      d[i] = 3;
> +    }
> +}
> +
> +int
> +main (void)
> +{
> +#define RUN(NUM)                                                               \
> +  float a_##NUM[NUM * 2 + 2] = {0};                                            \
> +  float a_golden_##NUM[NUM * 2 + 2] = {0};                                     \
> +  double b_##NUM[NUM] = {0};                                                   \
> +  double b_golden_##NUM[NUM] = {0};                                            \
> +  f (a_##NUM, b_##NUM, NUM);                                                   \
> +  f_golden (a_golden_##NUM, b_golden_##NUM, NUM);                              \
> +  for (int i = 0; i < NUM; i++)                                                \
> +    {                                                                          \
> +      if (a_##NUM[i * 2 + 0] != a_golden_##NUM[i * 2 + 0])                     \
> +        __builtin_abort ();                                                    \
> +      if (a_##NUM[i * 2 + 1] != a_golden_##NUM[i * 2 + 1])                     \
> +        __builtin_abort ();                                                    \
> +      if (b_##NUM[i] != b_golden_##NUM[i])                                     \
> +        __builtin_abort ();                                                    \
> +    }
> +
> +  RUN (3);
> +  RUN (5);
> +  RUN (15);
> +  RUN (16);
> +  RUN (17);
> +  RUN (31);
> +  RUN (32);
> +  RUN (33);
> +  RUN (63);
> +  RUN (64);
> +  RUN (65);
> +  RUN (127);
> +  RUN (128);
> +  RUN (129);
> +  RUN (239);
> +  RUN (359);
> +  RUN (498);
> +  RUN (799);
> +  RUN (977);
> +  RUN (5789);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c
> index 500b0adce66..3c03a87377d 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c
> @@ -14,4 +14,4 @@ f (int32_t *__restrict f, int32_t *__restrict d, int n)
>      }
> }
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
> index 383c82a3b7c..e68d05f5f48 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
> @@ -3,9 +3,4 @@
> #include "template-1.h"
> -/* Currently, we don't support SLP auto-vectorization for VLA. But it's
> -   necessary that we add this testcase here to make sure such unsupported SLP
> -   auto-vectorization will not cause an ICE. We will enable "vect" checking when
> -   we support SLP auto-vectorization for VLA in the future.  */
> -
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 5 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c
> index 23cc1c8651f..ecfda79e19a 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c
> @@ -3,4 +3,4 @@
> #include "template-1.h"
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c
> index 4f130f02f67..1394f08f2b9 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c
> @@ -3,4 +3,4 @@
> #include "template-1.h"
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c
> index 823d51a03cb..c5e89996fa4 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c
> @@ -3,4 +3,4 @@
> #include "template-1.h"
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c
> index 5ead22746d3..6b320ca6f38 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c
> @@ -3,4 +3,4 @@
> #include "template-1.h"
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 5 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c
> index e03d1b44ca6..6c2a002de9c 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c
> @@ -3,4 +3,4 @@
> #include "template-1.h"
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c
> index 5bb2d9d96fa..ae3f066477c 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c
> @@ -3,4 +3,4 @@
> #include "template-1.h"
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 4 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c
> index 71820ece4b2..fc676a3865e 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c
> @@ -3,4 +3,4 @@
> #include "template-1.h"
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
> --
> 2.36.1
>


* Re: [PATCH] RISC-V: Support RVV VLA SLP auto-vectorization
  2023-06-06  4:16 juzhe.zhong
  2023-06-06  6:55 ` Richard Biener
@ 2023-06-07  0:38 ` juzhe.zhong
  2023-06-07  2:38   ` Kito Cheng
  1 sibling, 1 reply; 8+ messages in thread
From: juzhe.zhong @ 2023-06-07  0:38 UTC (permalink / raw)
  To: 钟居哲, gcc-patches
  Cc: kito.cheng, Kito.cheng, palmer, palmer, jeffreyalaw, Robin Dapp, pan2.li


Ping on this patch. OK for trunk?
The following patches are blocked by it.



juzhe.zhong@rivai.ai
 
From: juzhe.zhong
Date: 2023-06-06 12:16
To: gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; pan2.li; Juzhe-Zhong
Subject: [PATCH] RISC-V: Support RVV VLA SLP auto-vectorization
From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
 
This patch enables basic VLA SLP auto-vectorization.
Consider the following case:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
    {
      a[i * 8 + 0] = b[i * 8 + 7] + 1;
      a[i * 8 + 1] = b[i * 8 + 7] + 2;
      a[i * 8 + 2] = b[i * 8 + 7] + 8;
      a[i * 8 + 3] = b[i * 8 + 7] + 4;
      a[i * 8 + 4] = b[i * 8 + 7] + 5;
      a[i * 8 + 5] = b[i * 8 + 7] + 6;
      a[i * 8 + 6] = b[i * 8 + 7] + 7;
      a[i * 8 + 7] = b[i * 8 + 7] + 3;
    }
}
 
To enable VLA SLP auto-vectorization, we need to be able to handle the following const vectors:
 
1. NPATTERNS = 8, NELTS_PER_PATTERN = 3.
{ 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
 
2. NPATTERNS = 8, NELTS_PER_PATTERN = 1. 
{ 1, 2, 8, 4, 5, 6, 7, 3, ... }
 
Both vectors can be generated in the prologue.
 
After this patch, we end up with the following codegen:
 
Prologue:
...
        vsetvli a7,zero,e16,m2,ta,ma
        vid.v   v4
        vsrl.vi v4,v4,3
        li      a3,8
        vmul.vx v4,v4,a3  ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
...
        li      t1,67633152
        addi    t1,t1,513
        li      a3,50790400
        addi    a3,a3,1541
        slli    a3,a3,32
        add     a3,a3,t1
        vsetvli t1,zero,e64,m1,ta,ma
        vmv.v.x v3,a3   ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... }
...
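
The two prologue constants can be checked with a small scalar model. This is only a sketch with hypothetical function names: gather_indices mirrors the vid.v / vsrl.vi / vmul.vx sequence, and listing_constant replays the li / addi / slli / add sequence to show that it yields the bytes 1, 2, 8, 4, 5, 6, 7, 3 in little-endian order, which is what the e64 broadcast then repeats across the vector.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Model of "vid.v; vsrl.vi v4,v4,3; vmul.vx v4,v4,8":
// element k becomes (k >> 3) * 8, i.e. { 0 x 8, 8 x 8, 16 x 8, ... }.
std::vector<uint16_t> gather_indices(unsigned vl) {
  assert(vl > 0);
  std::vector<uint16_t> idx(vl);
  for (unsigned k = 0; k < vl; ++k)
    idx[k] = static_cast<uint16_t>((k >> 3) * 8);
  return idx;
}

// Pack eight bytes into one 64-bit value, with bytes[0] in the least
// significant position (RISC-V is little-endian).
uint64_t pack_bytes_le(const uint8_t (&bytes)[8]) {
  uint64_t v = 0;
  for (int i = 7; i >= 0; --i)
    v = (v << 8) | bytes[i];
  return v;
}

// Replay of the scalar instruction sequence from the listing.
uint64_t listing_constant() {
  uint64_t lo = 67633152u + 513u;   // li t1,67633152 ; addi t1,t1,513
  uint64_t hi = 50790400u + 1541u;  // li a3,50790400 ; addi a3,a3,1541
  return (hi << 32) + lo;           // slli a3,a3,32  ; add a3,a3,t1
}
```

Packing the eight addends into a single scalar lets one vmv.v.x at e64 materialize the whole repeating sequence instead of building it element by element.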
LoopBody:
...
        min     a3,...
        vsetvli zero,a3,e8,m1,ta,ma
        vle8.v  v2,0(a6)
        vsetvli a7,zero,e8,m1,ta,ma
        vrgatherei16.vv v1,v2,v4
        vadd.vv v1,v1,v3
        vsetvli zero,a3,e8,m1,ta,ma
        vse8.v  v1,0(a2)
        add     a6,a6,a4
        add     a2,a2,a4
        mv      a3,a5
        add     a5,a5,t1
        bgtu    a3,a4,.L3
...
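
As a sanity check on the loop body, the following sketch models one possible reading of it in scalar C++ and compares it with the original loop. It assumes that the vector load starts at b + 7 (the listing does not show how a6 is set up, so this is an inference) and that VLMAX is a multiple of 8; vector_model and scalar_ref are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

static const uint8_t kAddends[8] = {1, 2, 8, 4, 5, 6, 7, 3};

// Scalar reference: the loop from the commit message, with a runtime trip count.
void scalar_ref(std::vector<uint8_t> &a, const std::vector<uint8_t> &b, int n) {
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < 8; ++j)
      a[i * 8 + j] = static_cast<uint8_t>(b[i * 8 + 7] + kAddends[j]);
}

// Element-wise model of the vector loop.  Assumption: the load base is b + 7.
void vector_model(std::vector<uint8_t> &a, const std::vector<uint8_t> &b,
                  int n, int vlmax) {
  assert(vlmax > 0 && vlmax % 8 == 0);
  const int total = n * 8;
  for (int base = 0; base < total; base += vlmax) {
    int vl = std::min(vlmax, total - base);  // "min a3,..." + vsetvli
    for (int k = 0; k < vl; ++k) {
      int idx = (k >> 3) * 8;                // index vector v4
      uint8_t gathered = b[base + 7 + idx];  // vle8.v + vrgatherei16.vv
      // vadd.vv with the repeating addends in v3, then vse8.v.
      a[base + k] = static_cast<uint8_t>(gathered + kAddends[k % 8]);
    }
  }
}
```

The model indexes b directly rather than copying a full vector first, so it never reads past the last element that the scalar loop would touch.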
 
Note: we need to use "vrgatherei16.vv" instead of "vrgather.vv" for SEW = 8, since "vrgatherei16.vv" covers a larger
      index range than "vrgather.vv" (whose maximum element index is 255 when SEW = 8).
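
The index-range argument can be illustrated with a little arithmetic. This sketch uses the VLMAX = VLEN * LMUL / SEW relation from the RVV specification, restricted to integer LMUL for simplicity; the helper names are hypothetical and nothing here is taken from the patch itself.

```cpp
#include <cassert>
#include <cstdint>

// VLMAX = VLEN * LMUL / SEW (integer LMUL only in this sketch).
constexpr uint64_t vlmax(uint64_t vlen_bits, uint64_t lmul, uint64_t sew_bits) {
  return vlen_bits * lmul / sew_bits;
}

// True if every index in [0, nelts - 1] fits in an unsigned field of
// index_bits bits.
constexpr bool indices_fit(uint64_t nelts, unsigned index_bits) {
  return nelts <= (uint64_t{1} << index_bits);
}

// Can an e8 gather with 8-bit indices address every element of its source
// at the given VLEN and LMUL?
bool e8_indices_suffice(uint64_t vlen_bits, uint64_t lmul) {
  assert(vlen_bits >= 8 && lmul >= 1);
  return indices_fit(vlmax(vlen_bits, lmul, 8), 8);
}
```

For a VLA mode the compiler does not know VLEN at compile time, so 8-bit indices cannot be assumed sufficient, whereas 16-bit indices leave much more headroom.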
Epilogue:
        lbu     a5,799(a1)
        addiw   a4,a5,1
        sb      a4,792(a0)
        addiw   a4,a5,2
        sb      a4,793(a0)
        addiw   a4,a5,8
        sb      a4,794(a0)
        addiw   a4,a5,4
        sb      a4,795(a0)
        addiw   a4,a5,5
        sb      a4,796(a0)
        addiw   a4,a5,6
        sb      a4,797(a0)
        addiw   a4,a5,7
        sb      a4,798(a0)
        addiw   a5,a5,3
        sb      a5,799(a0)
        ret
 
One remaining task is "Epilogue auto-vectorization", which needs VLS modes support.
I will add VLS modes support for "Epilogue auto-vectorization" in the future.
 
gcc/ChangeLog:
 
        * config/riscv/riscv-protos.h (expand_vec_perm_const): New function.
        * config/riscv/riscv-v.cc (rvv_builder::can_duplicate_repeating_sequence_p): Support POLY handling.
        (rvv_builder::single_step_npatterns_p): New function.
        (rvv_builder::npatterns_all_equal_p): Ditto.
        (const_vec_all_in_range_p): Support POLY handling.
        (gen_const_vector_dup): Ditto.
        (emit_vlmax_gather_insn): Add vrgatherei16.
        (emit_vlmax_masked_gather_mu_insn): Ditto.
        (expand_const_vector): Add VLA SLP const vector support.
        (expand_vec_perm): Support POLY.
        (struct expand_vec_perm_d): New struct.
        (shuffle_generic_patterns): New function.
        (expand_vec_perm_const_1): Ditto.
        (expand_vec_perm_const): Ditto.
        * config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto.
        (TARGET_VECTORIZE_VEC_PERM_CONST): New targethook.
 
gcc/testsuite/ChangeLog:
 
        * gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA vectorizer.
        * gcc.target/riscv/rvv/autovec/v-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-4.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-5.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-6.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-7.c: New test.
 
---
gcc/config/riscv/riscv-protos.h               |   2 +
gcc/config/riscv/riscv-v.cc                   | 352 ++++++++++++++++--
gcc/config/riscv/riscv.cc                     |  16 +
.../riscv/rvv/autovec/partial/slp-1.c         |  22 ++
.../riscv/rvv/autovec/partial/slp-2.c         |  22 ++
.../riscv/rvv/autovec/partial/slp-3.c         |  22 ++
.../riscv/rvv/autovec/partial/slp-4.c         |  22 ++
.../riscv/rvv/autovec/partial/slp-5.c         |  22 ++
.../riscv/rvv/autovec/partial/slp-6.c         |  23 ++
.../riscv/rvv/autovec/partial/slp-7.c         |  15 +
.../riscv/rvv/autovec/partial/slp_run-1.c     |  66 ++++
.../riscv/rvv/autovec/partial/slp_run-2.c     |  67 ++++
.../riscv/rvv/autovec/partial/slp_run-3.c     |  67 ++++
.../riscv/rvv/autovec/partial/slp_run-4.c     |  67 ++++
.../riscv/rvv/autovec/partial/slp_run-5.c     |  67 ++++
.../riscv/rvv/autovec/partial/slp_run-6.c     |  67 ++++
.../riscv/rvv/autovec/partial/slp_run-7.c     |  58 +++
.../gcc.target/riscv/rvv/autovec/scalable-1.c |   2 +-
.../gcc.target/riscv/rvv/autovec/v-1.c        |   7 +-
.../riscv/rvv/autovec/zve32f_zvl128b-1.c      |   2 +-
.../riscv/rvv/autovec/zve32x_zvl128b-1.c      |   2 +-
.../gcc.target/riscv/rvv/autovec/zve64d-1.c   |   2 +-
.../riscv/rvv/autovec/zve64d_zvl128b-1.c      |   2 +-
.../gcc.target/riscv/rvv/autovec/zve64f-1.c   |   2 +-
.../riscv/rvv/autovec/zve64f_zvl128b-1.c      |   2 +-
.../riscv/rvv/autovec/zve64x_zvl128b-1.c      |   2 +-
26 files changed, 963 insertions(+), 37 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index d770e5e826e..27ecd16e496 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -168,6 +168,8 @@ void init_builtins (void);
const char *mangle_builtin_type (const_tree);
#ifdef GCC_TARGET_H
bool verify_type_context (location_t, type_context_kind, const_tree, bool);
+bool expand_vec_perm_const (machine_mode, machine_mode, rtx, rtx, rtx,
+     const vec_perm_indices &);
#endif
void handle_pragma_vector (void);
tree builtin_decl (unsigned, bool);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 83277fc2c05..4864429ed06 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -251,9 +251,12 @@ public:
     m_inner_mode = GET_MODE_INNER (mode);
     m_inner_bits_size = GET_MODE_BITSIZE (m_inner_mode);
     m_inner_bytes_size = GET_MODE_SIZE (m_inner_mode);
+    m_mask_mode = get_mask_mode (mode).require ();
     gcc_assert (
       int_mode_for_size (inner_bits_size (), 0).exists (&m_inner_int_mode));
+    m_int_mode
+      = get_vector_mode (m_inner_int_mode, GET_MODE_NUNITS (mode)).require ();
   }
   bool can_duplicate_repeating_sequence_p ();
@@ -262,9 +265,14 @@ public:
   bool repeating_sequence_use_merge_profitable_p ();
   rtx get_merge_scalar_mask (unsigned int) const;
+  bool single_step_npatterns_p () const;
+  bool npatterns_all_equal_p () const;
+
   machine_mode new_mode () const { return m_new_mode; }
   scalar_mode inner_mode () const { return m_inner_mode; }
   scalar_int_mode inner_int_mode () const { return m_inner_int_mode; }
+  machine_mode mask_mode () const { return m_mask_mode; }
+  machine_mode int_mode () const { return m_int_mode; }
   unsigned int inner_bits_size () const { return m_inner_bits_size; }
   unsigned int inner_bytes_size () const { return m_inner_bytes_size; }
@@ -273,6 +281,8 @@ private:
   scalar_int_mode m_inner_int_mode;
   machine_mode m_new_mode;
   scalar_int_mode m_new_inner_mode;
+  machine_mode m_mask_mode;
+  machine_mode m_int_mode;
   unsigned int m_inner_bits_size;
   unsigned int m_inner_bytes_size;
};
@@ -290,7 +300,9 @@ rvv_builder::can_duplicate_repeating_sequence_p ()
       || GET_MODE_SIZE (m_new_inner_mode) > UNITS_PER_WORD
       || !get_vector_mode (m_new_inner_mode, new_size).exists (&m_new_mode))
     return false;
-  return repeating_sequence_p (0, full_nelts ().to_constant (), npatterns ());
+  if (full_nelts ().is_constant ())
+    return repeating_sequence_p (0, full_nelts ().to_constant (), npatterns ());
+  return nelts_per_pattern () == 1;
}
/* Return true if it is a repeating sequence that using
@@ -398,6 +410,48 @@ rvv_builder::get_merge_scalar_mask (unsigned int index_in_pattern) const
   return gen_int_mode (mask, inner_int_mode ());
}
+/* Return true if the variable-length vector is single step.  */
+bool
+rvv_builder::single_step_npatterns_p () const
+{
+  if (nelts_per_pattern () != 3)
+    return false;
+
+  poly_int64 step
+    = rtx_to_poly_int64 (elt (npatterns ())) - rtx_to_poly_int64 (elt (0));
+  for (unsigned int i = 0; i < npatterns (); i++)
+    {
+      poly_int64 ele0 = rtx_to_poly_int64 (elt (i));
+      poly_int64 ele1 = rtx_to_poly_int64 (elt (npatterns () + i));
+      poly_int64 ele2 = rtx_to_poly_int64 (elt (npatterns () * 2 + i));
+      poly_int64 diff1 = ele1 - ele0;
+      poly_int64 diff2 = ele2 - ele1;
+      if (maybe_ne (step, diff1) || maybe_ne (step, diff2))
+ return false;
+    }
+  return true;
+}
+
+/* Return true if all elements of NPATTERNS are equal.
+
+   E.g. NPATTERNS = 4:
+     { 2, 2, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8, 16, 16, 16, 16, ... }
+   E.g. NPATTERNS = 8:
+     { 2, 2, 2, 2, 2, 2, 2, 2, 8, 8, 8, 8, 8, 8, 8, 8, ... }
+*/
+bool
+rvv_builder::npatterns_all_equal_p () const
+{
+  poly_int64 ele0 = rtx_to_poly_int64 (elt (0));
+  for (unsigned int i = 1; i < npatterns (); i++)
+    {
+      poly_int64 ele = rtx_to_poly_int64 (elt (i));
+      if (!known_eq (ele, ele0))
+ return false;
+    }
+  return true;
+}
+
static unsigned
get_sew (machine_mode mode)
{
@@ -425,7 +479,7 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT minval,
    future.  */
static bool
-const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT maxval)
+const_vec_all_in_range_p (rtx vec, poly_int64 minval, poly_int64 maxval)
{
   if (!CONST_VECTOR_P (vec)
       || GET_MODE_CLASS (GET_MODE (vec)) != MODE_VECTOR_INT)
@@ -440,8 +494,10 @@ const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT maxval)
   for (int i = 0; i < nunits; i++)
     {
       rtx vec_elem = CONST_VECTOR_ELT (vec, i);
-      if (!CONST_INT_P (vec_elem)
-   || !IN_RANGE (INTVAL (vec_elem), minval, maxval))
+      poly_int64 value;
+      if (!poly_int_rtx_p (vec_elem, &value)
+   || maybe_lt (value, minval)
+   || maybe_gt (value, maxval))
return false;
     }
   return true;
@@ -453,7 +509,7 @@ const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT maxval)
    future.  */
static rtx
-gen_const_vector_dup (machine_mode mode, HOST_WIDE_INT val)
+gen_const_vector_dup (machine_mode mode, poly_int64 val)
{
   rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
   return gen_const_vec_duplicate (mode, c);
@@ -727,7 +783,10 @@ emit_vlmax_gather_insn (rtx target, rtx op, rtx sel)
   rtx elt;
   insn_code icode;
   machine_mode data_mode = GET_MODE (target);
-  if (const_vec_duplicate_p (sel, &elt))
+  machine_mode sel_mode = GET_MODE (sel);
+  if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode)))
+    icode = code_for_pred_gatherei16 (data_mode);
+  else if (const_vec_duplicate_p (sel, &elt))
     {
       icode = code_for_pred_gather_scalar (data_mode);
       sel = elt;
@@ -744,7 +803,10 @@ emit_vlmax_masked_gather_mu_insn (rtx target, rtx op, rtx sel, rtx mask)
   rtx elt;
   insn_code icode;
   machine_mode data_mode = GET_MODE (target);
-  if (const_vec_duplicate_p (sel, &elt))
+  machine_mode sel_mode = GET_MODE (sel);
+  if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode)))
+    icode = code_for_pred_gatherei16 (data_mode);
+  else if (const_vec_duplicate_p (sel, &elt))
     {
       icode = code_for_pred_gather_scalar (data_mode);
       sel = elt;
@@ -895,11 +957,130 @@ expand_const_vector (rtx target, rtx src)
       return;
     }
-  /* TODO: We only support const duplicate vector for now. More cases
-     will be supported when we support auto-vectorization:
+  /* Handle variable-length vector.  */
+  unsigned int nelts_per_pattern = CONST_VECTOR_NELTS_PER_PATTERN (src);
+  unsigned int npatterns = CONST_VECTOR_NPATTERNS (src);
+  rvv_builder builder (mode, npatterns, nelts_per_pattern);
+  for (unsigned int i = 0; i < nelts_per_pattern; i++)
+    {
+      for (unsigned int j = 0; j < npatterns; j++)
+ builder.quick_push (CONST_VECTOR_ELT (src, i * npatterns + j));
+    }
+  builder.finalize ();
-       1. multiple elts duplicate vector.
-       2. multiple patterns with multiple elts.  */
+  if (CONST_VECTOR_DUPLICATE_P (src))
+    {
+      if (builder.can_duplicate_repeating_sequence_p ())
+ {
+   rtx ele = builder.get_merged_repeating_sequence ();
+   rtx dup = expand_vector_broadcast (builder.new_mode (), ele);
+   emit_move_insn (target, gen_lowpart (mode, dup));
+ }
+      else
+ {
+   unsigned int nbits = npatterns - 1;
+
+   /* Generate vid = { 0, 1, 2, 3, 4, 5, 6, 7, ... }.  */
+   rtx vid = gen_reg_rtx (builder.int_mode ());
+   rtx op[] = {vid};
+   emit_vlmax_insn (code_for_pred_series (builder.int_mode ()),
+    RVV_MISC_OP, op);
+
+   /* Generate vid_repeat = { 0, 1, ... nbits, ... }  */
+   rtx vid_repeat = gen_reg_rtx (builder.int_mode ());
+   rtx and_ops[] = {vid_repeat, vid,
+    gen_int_mode (nbits, builder.inner_int_mode ())};
+   emit_vlmax_insn (code_for_pred_scalar (AND, builder.int_mode ()),
+    RVV_BINOP, and_ops);
+
+   rtx tmp = gen_reg_rtx (builder.mode ());
+   rtx dup_ops[] = {tmp, builder.elt (0)};
+   emit_vlmax_insn (code_for_pred_broadcast (builder.mode ()), RVV_UNOP,
+    dup_ops);
+   for (unsigned int i = 1; i < builder.npatterns (); i++)
+     {
+       /* Generate mask according to i.  */
+       rtx mask = gen_reg_rtx (builder.mask_mode ());
+       rtx const_vec = gen_const_vector_dup (builder.int_mode (), i);
+       expand_vec_cmp (mask, EQ, vid_repeat, const_vec);
+
+       /* Merge scalar to each i.  */
+       rtx tmp2 = gen_reg_rtx (builder.mode ());
+       rtx merge_ops[] = {tmp2, tmp, builder.elt (i), mask};
+       insn_code icode = code_for_pred_merge_scalar (builder.mode ());
+       emit_vlmax_merge_insn (icode, RVV_MERGE_OP, merge_ops);
+       tmp = tmp2;
+     }
+   emit_move_insn (target, tmp);
+ }
+      return;
+    }
+  else if (CONST_VECTOR_STEPPED_P (src))
+    {
+      gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
+      if (builder.single_step_npatterns_p ())
+ {
+   /* Describe the case by choosing NPATTERNS = 4 as an example.  */
+   rtx base, step;
+   if (builder.npatterns_all_equal_p ())
+     {
+       /* Generate the variable-length vector as below:
+ E.g. { 0, 0, 0, 0, 8, 8, 8, 8, 16, 16, 16, 16, ... } */
+       /* Step 1: Generate base = { 0, 0, 0, 0, 0, 0, 0, ... }.  */
+       base = expand_vector_broadcast (builder.mode (), builder.elt (0));
+     }
+   else
+     {
+       /* Generate the variable-length vector as below:
+ E.g. { 0, 6, 0, 6, 8, 14, 8, 14, 16, 22, 16, 22, ... } */
+       /* Step 1: Generate base = { 0, 6, 0, 6, ... }.  */
+       rvv_builder new_builder (builder.mode (), builder.npatterns (),
+        1);
+       for (unsigned int i = 0; i < builder.npatterns (); ++i)
+ new_builder.quick_push (builder.elt (i));
+       rtx new_vec = new_builder.build ();
+       base = gen_reg_rtx (builder.mode ());
+       emit_move_insn (base, new_vec);
+     }
+
+   /* Step 2: Generate step = gen_int_mode (diff, mode).  */
+   poly_int64 value1 = rtx_to_poly_int64 (builder.elt (0));
+   poly_int64 value2
+     = rtx_to_poly_int64 (builder.elt (builder.npatterns ()));
+   poly_int64 diff = value2 - value1;
+   step = gen_int_mode (diff, builder.inner_mode ());
+
+   /* Step 3: Generate vid = { 0, 1, 2, 3, 4, 5, 6, 7, ... }.  */
+   rtx vid = gen_reg_rtx (builder.mode ());
+   rtx op[] = {vid};
+   emit_vlmax_insn (code_for_pred_series (builder.mode ()), RVV_MISC_OP,
+    op);
+
+   /* Step 4: Generate factor = { 0, 0, 0, 0, 1, 1, 1, 1, ... }.  */
+   rtx factor = gen_reg_rtx (builder.mode ());
+   rtx shift_ops[]
+     = {factor, vid,
+        gen_int_mode (exact_log2 (builder.npatterns ()), Pmode)};
+   emit_vlmax_insn (code_for_pred_scalar (LSHIFTRT, builder.mode ()),
+    RVV_BINOP, shift_ops);
+
+   /* Step 5: Generate adjusted step = { 0, 0, 0, 0, diff, diff, ... } */
+   rtx adjusted_step = gen_reg_rtx (builder.mode ());
+   rtx mul_ops[] = {adjusted_step, factor, step};
+   emit_vlmax_insn (code_for_pred_scalar (MULT, builder.mode ()),
+    RVV_BINOP, mul_ops);
+
+   /* Step 6: Generate the final result.  */
+   rtx add_ops[] = {target, base, adjusted_step};
+   emit_vlmax_insn (code_for_pred (PLUS, builder.mode ()), RVV_BINOP,
+    add_ops);
+ }
+      else
+ /* TODO: We will enable more variable-length vector in the future.  */
+ gcc_unreachable ();
+    }
+  else
+    gcc_unreachable ();
}
/* Expand a pre-RA RVV data move from SRC to DEST.
@@ -2029,14 +2210,13 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
{
   machine_mode data_mode = GET_MODE (target);
   machine_mode sel_mode = GET_MODE (sel);
-
-  /* Enforced by the pattern condition.  */
-  int nunits = GET_MODE_NUNITS (sel_mode).to_constant ();
+  poly_uint64 nunits = GET_MODE_NUNITS (sel_mode);
   /* Check if the sel only references the first values vector. If each select
      index is in range of [0, nunits - 1]. A single vrgather instructions is
-     enough.  */
-  if (const_vec_all_in_range_p (sel, 0, nunits - 1))
+     enough. Since we will use vrgatherei16.vv for variable-length vector,
+     it is never out of range and we don't need to modulo the index.  */
+  if (!nunits.is_constant () || const_vec_all_in_range_p (sel, 0, nunits - 1))
     {
       emit_vlmax_gather_insn (target, op0, sel);
       return;
@@ -2057,14 +2237,20 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
       return;
     }
-  /* Note: vec_perm indices are supposed to wrap when they go beyond the
-     size of the two value vectors, i.e. the upper bits of the indices
-     are effectively ignored.  RVV vrgather instead produces 0 for any
-     out-of-range indices, so we need to modulo all the vec_perm indices
-     to ensure they are all in range of [0, 2 * nunits - 1].  */
+  rtx sel_mod = sel;
   rtx max_sel = gen_const_vector_dup (sel_mode, 2 * nunits - 1);
-  rtx sel_mod
-    = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0, OPTAB_DIRECT);
+  /* We don't need to modulo indices for VLA vectors,
+     since they should already be guaranteed to be in range.  */
+  if (nunits.is_constant ())
+    {
+      /* Note: vec_perm indices are supposed to wrap when they go beyond the
+ size of the two value vectors, i.e. the upper bits of the indices
+ are effectively ignored.  RVV vrgather instead produces 0 for any
+ out-of-range indices, so we need to modulo all the vec_perm indices
+ to ensure they are all in range of [0, 2 * nunits - 1].  */
+      sel_mod = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0,
+      OPTAB_DIRECT);
+    }
   /* This following sequence is handling the case that:
      __builtin_shufflevector (vec1, vec2, index...), the index can be any
@@ -2094,4 +2280,124 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
   emit_vlmax_masked_gather_mu_insn (target, op1, tmp, mask);
}
+/* Implement TARGET_VECTORIZE_VEC_PERM_CONST for RVV.  */
+
+/* vec_perm support.  */
+
+struct expand_vec_perm_d
+{
+  rtx target, op0, op1;
+  vec_perm_indices perm;
+  machine_mode vmode;
+  machine_mode op_mode;
+  bool one_vector_p;
+  bool testing_p;
+};
+
+/* Recognize the pattern that can be shuffled by generic approach.  */
+
+static bool
+shuffle_generic_patterns (struct expand_vec_perm_d *d)
+{
+  machine_mode sel_mode = related_int_vector_mode (d->vmode).require ();
+  poly_uint64 nunits = GET_MODE_NUNITS (d->vmode);
+
+  /* For constant size indices, we don't need to handle it here.
+     Just leave it to vec_perm<mode>.  */
+  if (d->perm.length ().is_constant ())
+    return false;
+
+  /* Permuting two SEW8 variable-length vectors needs vrgatherei16.vv.
+     Otherwise, it could overflow the index range.  */
+  if (GET_MODE_INNER (d->vmode) == QImode
+      && !get_vector_mode (HImode, nunits).exists (&sel_mode))
+    return false;
+
+  /* Success! */
+  if (d->testing_p)
+    return true;
+
+  rtx sel = vec_perm_indices_to_rtx (sel_mode, d->perm);
+  expand_vec_perm (d->target, d->op0, d->op1, force_reg (sel_mode, sel));
+  return true;
+}
+
+static bool
+expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
+{
+  gcc_assert (d->op_mode != E_VOIDmode);
+
+  /* The pattern matching functions above are written to look for a small
+     number to begin the sequence (0, 1, N/2).  If we begin with an index
+     from the second operand, we can swap the operands.  */
+  poly_int64 nelt = d->perm.length ();
+  if (known_ge (d->perm[0], nelt))
+    {
+      d->perm.rotate_inputs (1);
+      std::swap (d->op0, d->op1);
+    }
+
+  if (known_gt (nelt, 1))
+    {
+      if (d->vmode == d->op_mode)
+ {
+   if (shuffle_generic_patterns (d))
+     return true;
+   return false;
+ }
+      else
+ return false;
+    }
+  return false;
+}
+
+bool
+expand_vec_perm_const (machine_mode vmode, machine_mode op_mode, rtx target,
+        rtx op0, rtx op1, const vec_perm_indices &sel)
+{
+  /* RVV doesn't have Mask type pack/unpack instructions and we don't use
+     mask to do the iteration loop control. Just disable it directly.  */
+  if (GET_MODE_CLASS (vmode) == MODE_VECTOR_BOOL)
+    return false;
+
+  struct expand_vec_perm_d d;
+
+  /* Check whether the mask can be applied to a single vector.  */
+  if (sel.ninputs () == 1 || (op0 && rtx_equal_p (op0, op1)))
+    d.one_vector_p = true;
+  else if (sel.all_from_input_p (0))
+    {
+      d.one_vector_p = true;
+      op1 = op0;
+    }
+  else if (sel.all_from_input_p (1))
+    {
+      d.one_vector_p = true;
+      op0 = op1;
+    }
+  else
+    d.one_vector_p = false;
+
+  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2,
+      sel.nelts_per_input ());
+  d.vmode = vmode;
+  d.op_mode = op_mode;
+  d.target = target;
+  d.op0 = op0;
+  if (op0 == op1)
+    d.op1 = d.op0;
+  else
+    d.op1 = op1;
+  d.testing_p = !target;
+
+  if (!d.testing_p)
+    return expand_vec_perm_const_1 (&d);
+
+  rtx_insn *last = get_last_insn ();
+  bool ret = expand_vec_perm_const_1 (&d);
+  gcc_assert (last == get_last_insn ());
+
+  return ret;
+}
+
} // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index caa7858b864..5d22012b591 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7631,6 +7631,19 @@ riscv_vectorize_related_mode (machine_mode vector_mode, scalar_mode element_mode
   return default_vectorize_related_mode (vector_mode, element_mode, nunits);
}
+/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
+
+static bool
+riscv_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
+ rtx target, rtx op0, rtx op1,
+ const vec_perm_indices &sel)
+{
+  if (TARGET_VECTOR && riscv_v_ext_vector_mode_p (vmode))
+    return riscv_vector::expand_vec_perm_const (vmode, op_mode, target, op0,
+ op1, sel);
+
+  return false;
+}
/* Initialize the GCC target structure.  */
#undef TARGET_ASM_ALIGNED_HI_OP
@@ -7930,6 +7943,9 @@ riscv_vectorize_related_mode (machine_mode vector_mode, scalar_mode element_mode
#undef TARGET_VECTORIZE_RELATED_MODE
#define TARGET_VECTORIZE_RELATED_MODE riscv_vectorize_related_mode
+#undef TARGET_VECTORIZE_VEC_PERM_CONST
+#define TARGET_VECTORIZE_VEC_PERM_CONST riscv_vectorize_vec_perm_const
+
struct gcc_target targetm = TARGET_INITIALIZER;
#include "gt-riscv.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c
new file mode 100644
index 00000000000..befb518e2dd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 37] + 1;
+      a[i * 8 + 1] = b[i * 8 + 37] + 2;
+      a[i * 8 + 2] = b[i * 8 + 37] + 8;
+      a[i * 8 + 3] = b[i * 8 + 37] + 4;
+      a[i * 8 + 4] = b[i * 8 + 37] + 5;
+      a[i * 8 + 5] = b[i * 8 + 37] + 6;
+      a[i * 8 + 6] = b[i * 8 + 37] + 7;
+      a[i * 8 + 7] = b[i * 8 + 37] + 3;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c
new file mode 100644
index 00000000000..ac817451295
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (int16_t *restrict a, int16_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 37] + 1;
+      a[i * 8 + 1] = b[i * 8 + 37] + 2;
+      a[i * 8 + 2] = b[i * 8 + 37] + 8;
+      a[i * 8 + 3] = b[i * 8 + 37] + 4;
+      a[i * 8 + 4] = b[i * 8 + 37] + 5;
+      a[i * 8 + 5] = b[i * 8 + 37] + 6;
+      a[i * 8 + 6] = b[i * 8 + 37] + 7;
+      a[i * 8 + 7] = b[i * 8 + 37] + 3;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c
new file mode 100644
index 00000000000..73962055b03
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 7] + 2;
+      a[i * 8 + 2] = b[i * 8 + 1] + 3;
+      a[i * 8 + 3] = b[i * 8 + 7] + 4;
+      a[i * 8 + 4] = b[i * 8 + 1] + 5;
+      a[i * 8 + 5] = b[i * 8 + 7] + 6;
+      a[i * 8 + 6] = b[i * 8 + 1] + 7;
+      a[i * 8 + 7] = b[i * 8 + 7] + 8;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c
new file mode 100644
index 00000000000..fa216fc8c40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (int16_t *restrict a, int16_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 7] + 2;
+      a[i * 8 + 2] = b[i * 8 + 1] + 3;
+      a[i * 8 + 3] = b[i * 8 + 7] + 4;
+      a[i * 8 + 4] = b[i * 8 + 1] + 5;
+      a[i * 8 + 5] = b[i * 8 + 7] + 6;
+      a[i * 8 + 6] = b[i * 8 + 1] + 7;
+      a[i * 8 + 7] = b[i * 8 + 7] + 8;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c
new file mode 100644
index 00000000000..899ed9e310b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 7] + 2;
+      a[i * 8 + 2] = b[i * 8 + 4] + 3;
+      a[i * 8 + 3] = b[i * 8 + 8] + 4;
+      a[i * 8 + 4] = b[i * 8 + 1] + 5;
+      a[i * 8 + 5] = b[i * 8 + 7] + 6;
+      a[i * 8 + 6] = b[i * 8 + 4] + 7;
+      a[i * 8 + 7] = b[i * 8 + 8] + 8;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c
new file mode 100644
index 00000000000..fb87cc00cea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (uint8_t *restrict a, uint8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 2] + 2;
+      a[i * 8 + 2] = b[i * 8 + 6] + 8;
+      a[i * 8 + 3] = b[i * 8 + 7] + 4;
+      a[i * 8 + 4] = b[i * 8 + 3] + 5;
+      a[i * 8 + 5] = b[i * 8 + 4] + 6;
+      a[i * 8 + 6] = b[i * 8 + 5] + 7;
+      a[i * 8 + 7] = b[i * 8 + 0] + 3;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c
new file mode 100644
index 00000000000..3dd744b586e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (float *__restrict f, double *__restrict d, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      f[i * 2 + 0] = 1;
+      f[i * 2 + 1] = 2;
+      d[i] = 3;
+    }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c
new file mode 100644
index 00000000000..16f078a0433
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c
@@ -0,0 +1,66 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-1.c"
+
+#define LIMIT 128
+void __attribute__ ((optimize (0)))
+f_golden (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 37] + 1;
+      a[i * 8 + 1] = b[i * 8 + 37] + 2;
+      a[i * 8 + 2] = b[i * 8 + 37] + 8;
+      a[i * 8 + 3] = b[i * 8 + 37] + 4;
+      a[i * 8 + 4] = b[i * 8 + 37] + 5;
+      a[i * 8 + 5] = b[i * 8 + 37] + 6;
+      a[i * 8 + 6] = b[i * 8 + 37] + 7;
+      a[i * 8 + 7] = b[i * 8 + 37] + 3;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
+  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
+  int8_t b_##NUM[NUM * 8 + 37] = {0};                                          \
+  for (int i = 0; i < NUM * 8 + 37; i++)                                       \
+    {                                                                          \
+      if (i % NUM == 0)                                                        \
+ b_##NUM[i] = (i + NUM) % LIMIT;                                        \
+      else                                                                     \
+ b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
+    }                                                                          \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
+ __builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c
new file mode 100644
index 00000000000..41f688f628c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c
@@ -0,0 +1,67 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-2.c"
+
+#define LIMIT 32767
+
+void __attribute__ ((optimize (0)))
+f_golden (int16_t *restrict a, int16_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 37] + 1;
+      a[i * 8 + 1] = b[i * 8 + 37] + 2;
+      a[i * 8 + 2] = b[i * 8 + 37] + 8;
+      a[i * 8 + 3] = b[i * 8 + 37] + 4;
+      a[i * 8 + 4] = b[i * 8 + 37] + 5;
+      a[i * 8 + 5] = b[i * 8 + 37] + 6;
+      a[i * 8 + 6] = b[i * 8 + 37] + 7;
+      a[i * 8 + 7] = b[i * 8 + 37] + 3;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  int16_t a_##NUM[NUM * 8 + 8] = {0};                                          \
+  int16_t a_golden_##NUM[NUM * 8 + 8] = {0};                                   \
+  int16_t b_##NUM[NUM * 8 + 37] = {0};                                         \
+  for (int i = 0; i < NUM * 8 + 37; i++)                                       \
+    {                                                                          \
+      if (i % NUM == 0)                                                        \
+ b_##NUM[i] = (i + NUM) % LIMIT;                                        \
+      else                                                                     \
+ b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
+    }                                                                          \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
+ __builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c
new file mode 100644
index 00000000000..30996cb2c6e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c
@@ -0,0 +1,67 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-3.c"
+
+#define LIMIT 128
+
+void __attribute__ ((optimize (0)))
+f_golden (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 7] + 2;
+      a[i * 8 + 2] = b[i * 8 + 1] + 3;
+      a[i * 8 + 3] = b[i * 8 + 7] + 4;
+      a[i * 8 + 4] = b[i * 8 + 1] + 5;
+      a[i * 8 + 5] = b[i * 8 + 7] + 6;
+      a[i * 8 + 6] = b[i * 8 + 1] + 7;
+      a[i * 8 + 7] = b[i * 8 + 7] + 8;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
+  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
+  int8_t b_##NUM[NUM * 8 + 8] = {0};                                           \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (i % NUM == 0)                                                        \
+ b_##NUM[i] = (i + NUM) % LIMIT;                                        \
+      else                                                                     \
+ b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
+    }                                                                          \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
+ __builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c
new file mode 100644
index 00000000000..3d43ef0890c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c
@@ -0,0 +1,67 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-4.c"
+
+#define LIMIT 32767
+
+void __attribute__ ((optimize (0)))
+f_golden (int16_t *restrict a, int16_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 7] + 2;
+      a[i * 8 + 2] = b[i * 8 + 1] + 3;
+      a[i * 8 + 3] = b[i * 8 + 7] + 4;
+      a[i * 8 + 4] = b[i * 8 + 1] + 5;
+      a[i * 8 + 5] = b[i * 8 + 7] + 6;
+      a[i * 8 + 6] = b[i * 8 + 1] + 7;
+      a[i * 8 + 7] = b[i * 8 + 7] + 8;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  int16_t a_##NUM[NUM * 8 + 8] = {0};                                          \
+  int16_t a_golden_##NUM[NUM * 8 + 8] = {0};                                   \
+  int16_t b_##NUM[NUM * 8 + 8] = {0};                                          \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (i % NUM == 0)                                                        \
+ b_##NUM[i] = (i + NUM) % LIMIT;                                        \
+      else                                                                     \
+ b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
+    }                                                                          \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
+ __builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c
new file mode 100644
index 00000000000..814308bd7af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c
@@ -0,0 +1,67 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-5.c"
+
+#define LIMIT 128
+
+void __attribute__ ((optimize (0)))
+f_golden (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 7] + 2;
+      a[i * 8 + 2] = b[i * 8 + 4] + 3;
+      a[i * 8 + 3] = b[i * 8 + 8] + 4;
+      a[i * 8 + 4] = b[i * 8 + 1] + 5;
+      a[i * 8 + 5] = b[i * 8 + 7] + 6;
+      a[i * 8 + 6] = b[i * 8 + 4] + 7;
+      a[i * 8 + 7] = b[i * 8 + 8] + 8;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
+  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
+  int8_t b_##NUM[NUM * 8 + 9] = {0};                                           \
+  for (int i = 0; i < NUM * 8 + 9; i++)                                        \
+    {                                                                          \
+      if (i % NUM == 0)                                                        \
+ b_##NUM[i] = (i + NUM) % LIMIT;                                        \
+      else                                                                     \
+ b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
+    }                                                                          \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
+ __builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c
new file mode 100644
index 00000000000..e317eeac2f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c
@@ -0,0 +1,67 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-6.c"
+
+#define LIMIT 128
+
+void __attribute__ ((optimize (0)))
+f_golden (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 2] + 2;
+      a[i * 8 + 2] = b[i * 8 + 6] + 8;
+      a[i * 8 + 3] = b[i * 8 + 7] + 4;
+      a[i * 8 + 4] = b[i * 8 + 3] + 5;
+      a[i * 8 + 5] = b[i * 8 + 4] + 6;
+      a[i * 8 + 6] = b[i * 8 + 5] + 7;
+      a[i * 8 + 7] = b[i * 8 + 0] + 3;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
+  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
+  int8_t b_##NUM[NUM * 8 + 9] = {0};                                           \
+  for (int i = 0; i < NUM * 8 + 9; i++)                                        \
+    {                                                                          \
+      if (i % NUM == 0)                                                        \
+ b_##NUM[i] = (i + NUM) % LIMIT;                                        \
+      else                                                                     \
+ b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
+    }                                                                          \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
+ __builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c
new file mode 100644
index 00000000000..a8e4781988e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c
@@ -0,0 +1,58 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-7.c"
+
+void
+f_golden (float *__restrict f, double *__restrict d, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      f[i * 2 + 0] = 1;
+      f[i * 2 + 1] = 2;
+      d[i] = 3;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  float a_##NUM[NUM * 2 + 2] = {0};                                            \
+  float a_golden_##NUM[NUM * 2 + 2] = {0};                                     \
+  double b_##NUM[NUM] = {0};                                                   \
+  double b_golden_##NUM[NUM] = {0};                                            \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_golden_##NUM, NUM);                              \
+  for (int i = 0; i < NUM; i++)                                                \
+    {                                                                          \
+      if (a_##NUM[i * 2 + 0] != a_golden_##NUM[i * 2 + 0])                     \
+ __builtin_abort ();                                                    \
+      if (a_##NUM[i * 2 + 1] != a_golden_##NUM[i * 2 + 1])                     \
+ __builtin_abort ();                                                    \
+      if (b_##NUM[i] != b_golden_##NUM[i])                                     \
+ __builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c
index 500b0adce66..3c03a87377d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c
@@ -14,4 +14,4 @@ f (int32_t *__restrict f, int32_t *__restrict d, int n)
     }
}
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
index 383c82a3b7c..e68d05f5f48 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
@@ -3,9 +3,4 @@
#include "template-1.h"
-/* Currently, we don't support SLP auto-vectorization for VLA. But it's
-   necessary that we add this testcase here to make sure such unsupported SLP
-   auto-vectorization will not cause an ICE. We will enable "vect" checking when
-   we support SLP auto-vectorization for VLA in the future.  */
-
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 5 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c
index 23cc1c8651f..ecfda79e19a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c
@@ -3,4 +3,4 @@
#include "template-1.h"
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c
index 4f130f02f67..1394f08f2b9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c
@@ -3,4 +3,4 @@
#include "template-1.h"
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c
index 823d51a03cb..c5e89996fa4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c
@@ -3,4 +3,4 @@
#include "template-1.h"
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c
index 5ead22746d3..6b320ca6f38 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c
@@ -3,4 +3,4 @@
#include "template-1.h"
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 5 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c
index e03d1b44ca6..6c2a002de9c 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c
@@ -3,4 +3,4 @@
#include "template-1.h"
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c
index 5bb2d9d96fa..ae3f066477c 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c
@@ -3,4 +3,4 @@
#include "template-1.h"
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 4 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c
index 71820ece4b2..fc676a3865e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c
@@ -3,4 +3,4 @@
#include "template-1.h"
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
-- 
2.36.1
 


* Re: [PATCH] RISC-V: Support RVV VLA SLP auto-vectorization
  2023-06-06  4:16 juzhe.zhong
@ 2023-06-06  6:55 ` Richard Biener
  2023-06-07  0:38 ` juzhe.zhong
  1 sibling, 0 replies; 8+ messages in thread
From: Richard Biener @ 2023-06-06  6:55 UTC (permalink / raw)
  To: juzhe.zhong
  Cc: gcc-patches, kito.cheng, kito.cheng, palmer, palmer, jeffreyalaw,
	rdapp.gcc, pan2.li

On Tue, Jun 6, 2023 at 6:17 AM <juzhe.zhong@rivai.ai> wrote:
>
> From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
>
> This patch enables basic VLA SLP auto-vectorization.
> Consider the following case:
> void
> f (uint8_t *restrict a, uint8_t *restrict b)
> {
>   for (int i = 0; i < 100; ++i)
>     {
>       a[i * 8 + 0] = b[i * 8 + 7] + 1;
>       a[i * 8 + 1] = b[i * 8 + 7] + 2;
>       a[i * 8 + 2] = b[i * 8 + 7] + 8;
>       a[i * 8 + 3] = b[i * 8 + 7] + 4;
>       a[i * 8 + 4] = b[i * 8 + 7] + 5;
>       a[i * 8 + 5] = b[i * 8 + 7] + 6;
>       a[i * 8 + 6] = b[i * 8 + 7] + 7;
>       a[i * 8 + 7] = b[i * 8 + 7] + 3;
>     }
> }
>
> To enable VLA SLP auto-vectorization, we should be able to handle the following constant vectors:
>
> 1. NPATTERNS = 8, NELTS_PER_PATTERN = 3.
> { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
>
> 2. NPATTERNS = 8, NELTS_PER_PATTERN = 1.
> { 1, 2, 8, 4, 5, 6, 7, 3, ... }
>
> These vectors can be generated in the prologue.
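
As a rough illustration of the two constant vectors listed above, the sketch below computes element i of each one in scalar C. The helper names stepped_index and repeating_addend are hypothetical and are not part of the patch. The first helper mirrors the vid.v / vsrl.vi 3 / vmul.vx 8 sequence quoted in the prologue; the second indexes the 8-element repeating pattern.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): element i of the stepped
   vector { 0 x8, 8 x8, 16 x8, ... }, i.e. NPATTERNS = 8 and
   NELTS_PER_PATTERN = 3.  Same arithmetic as vid.v; vsrl.vi 3;
   vmul.vx 8.  */
uint16_t
stepped_index (size_t i)
{
  return (uint16_t) ((i >> 3) * 8);
}

/* Hypothetical helper (not from the patch): element i of the repeating
   vector { 1, 2, 8, 4, 5, 6, 7, 3, ... }, i.e. NPATTERNS = 8 and
   NELTS_PER_PATTERN = 1.  */
uint8_t
repeating_addend (size_t i)
{
  static const uint8_t pattern[8] = { 1, 2, 8, 4, 5, 6, 7, 3 };
  return pattern[i % 8];
}
```

Both functions depend only on the element index, which is why the vectors can be built once before the loop regardless of the runtime vector length.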
>
> After this patch, we end up with the following codegen:
>
> Prologue:
> ...
>         vsetvli a7,zero,e16,m2,ta,ma
>         vid.v   v4
>         vsrl.vi v4,v4,3
>         li      a3,8
>         vmul.vx v4,v4,a3  ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
> ...
>         li      t1,67633152
>         addi    t1,t1,513
>         li      a3,50790400
>         addi    a3,a3,1541
>         slli    a3,a3,32
>         add     a3,a3,t1
>         vsetvli t1,zero,e64,m1,ta,ma
>         vmv.v.x v3,a3   ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... }
> ...
> LoopBody:
> ...
>         min     a3,...
>         vsetvli zero,a3,e8,m1,ta,ma
>         vle8.v  v2,0(a6)
>         vsetvli a7,zero,e8,m1,ta,ma
>         vrgatherei16.vv v1,v2,v4
>         vadd.vv v1,v1,v3
>         vsetvli zero,a3,e8,m1,ta,ma
>         vse8.v  v1,0(a2)
>         add     a6,a6,a4
>         add     a2,a2,a4
>         mv      a3,a5
>         add     a5,a5,t1
>         bgtu    a3,a4,.L3
> ...
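
The scalar constant built in the quoted prologue can be checked by hand. The sketch below redoes the li/addi/slli/add arithmetic in C and extracts the bytes that vmv.v.x at e64 would broadcast. The function names are hypothetical; the immediates are copied from the listing above. It relies on the lowest-numbered e8 element occupying the least significant byte of each 64-bit element.

```c
#include <stdint.h>

/* Rebuild the 64-bit scalar that the quoted prologue materializes
   before broadcasting it with vmv.v.x.  */
uint64_t
packed_addends (void)
{
  uint64_t lo = 67633152u + 513u;  /* li t1,67633152; addi t1,t1,513   */
  uint64_t hi = 50790400u + 1541u; /* li a3,50790400; addi a3,a3,1541  */
  return (hi << 32) + lo;          /* slli a3,a3,32;  add a3,a3,t1     */
}

/* Byte k of the scalar, counted from the least significant byte.  */
uint8_t
addend_byte (uint64_t word, unsigned k)
{
  return (uint8_t) (word >> (8 * k));
}
```

Reading bytes 0 through 7 of the result gives 1, 2, 8, 4, 5, 6, 7, 3, which matches the v3 annotation in the listing.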
>
> Note: we need to use "vrgatherei16.vv" instead of "vrgather.vv" for SEW = 8, since "vrgatherei16.vv" covers a larger
>       index range than "vrgather.vv" (whose 8-bit indices allow a maximum element index of 255).
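
To make the index-range point concrete, here is a small sketch, assuming the RVV definition VLMAX = VLEN * LMUL / SEW and integer LMUL only. The function names are hypothetical and nothing here is taken from the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* VLMAX = VLEN * LMUL / SEW (integer LMUL only in this sketch).  */
uint32_t
vlmax (uint32_t vlen_bits, uint32_t lmul, uint32_t sew_bits)
{
  return vlen_bits * lmul / sew_bits;
}

/* True when every element index 0 .. vlmax_elems - 1 fits in an index
   element that is index_bits wide.  */
bool
indices_fit (uint32_t vlmax_elems, uint32_t index_bits)
{
  return (uint64_t) vlmax_elems - 1 < ((uint64_t) 1 << index_bits);
}
```

With e8 and m1 as in the quoted loop body, 8-bit indices suffice up to VLEN = 2048 (256 elements). At VLEN = 4096 there are 512 elements, so the indices no longer fit in 8 bits but do fit in 16, and a VLA build cannot assume the smaller VLEN.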
> Epilogue:
>         lbu     a5,799(a1)
>         addiw   a4,a5,1
>         sb      a4,792(a0)
>         addiw   a4,a5,2
>         sb      a4,793(a0)
>         addiw   a4,a5,8
>         sb      a4,794(a0)
>         addiw   a4,a5,4
>         sb      a4,795(a0)
>         addiw   a4,a5,5
>         sb      a4,796(a0)
>         addiw   a4,a5,6
>         sb      a4,797(a0)
>         addiw   a4,a5,7
>         sb      a4,798(a0)
>         addiw   a5,a5,3
>         sb      a5,799(a0)
>         ret
>
> One last thing we still need is "Epilogue auto-vectorization", which requires VLS mode support.
> I will add VLS mode support for "Epilogue auto-vectorization" in the future.

What's the epilogue generated for?  With a VLA main loop body you
shouldn't have one apart from
when that body isn't entered because of cost or alias reasons?

>
> gcc/ChangeLog:
>
>         * config/riscv/riscv-protos.h (expand_vec_perm_const): New function.
>         * config/riscv/riscv-v.cc (rvv_builder::can_duplicate_repeating_sequence_p): Support POLY handling.
>         (rvv_builder::single_step_npatterns_p): New function.
>         (rvv_builder::npatterns_all_equal_p): Ditto.
>         (const_vec_all_in_range_p): Support POLY handling.
>         (gen_const_vector_dup): Ditto.
>         (emit_vlmax_gather_insn): Add vrgatherei16.
>         (emit_vlmax_masked_gather_mu_insn): Ditto.
>         (expand_const_vector): Add VLA SLP const vector support.
>         (expand_vec_perm): Support POLY.
>         (struct expand_vec_perm_d): New struct.
>         (shuffle_generic_patterns): New function.
>         (expand_vec_perm_const_1): Ditto.
>         (expand_vec_perm_const): Ditto.
>         * config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto.
>         (TARGET_VECTORIZE_VEC_PERM_CONST): New targethook.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA vectorizer.
>         * gcc.target/riscv/rvv/autovec/v-1.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp_run-4.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp_run-5.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp_run-6.c: New test.
>         * gcc.target/riscv/rvv/autovec/partial/slp_run-7.c: New test.
>
> ---
>  gcc/config/riscv/riscv-protos.h               |   2 +
>  gcc/config/riscv/riscv-v.cc                   | 352 ++++++++++++++++--
>  gcc/config/riscv/riscv.cc                     |  16 +
>  .../riscv/rvv/autovec/partial/slp-1.c         |  22 ++
>  .../riscv/rvv/autovec/partial/slp-2.c         |  22 ++
>  .../riscv/rvv/autovec/partial/slp-3.c         |  22 ++
>  .../riscv/rvv/autovec/partial/slp-4.c         |  22 ++
>  .../riscv/rvv/autovec/partial/slp-5.c         |  22 ++
>  .../riscv/rvv/autovec/partial/slp-6.c         |  23 ++
>  .../riscv/rvv/autovec/partial/slp-7.c         |  15 +
>  .../riscv/rvv/autovec/partial/slp_run-1.c     |  66 ++++
>  .../riscv/rvv/autovec/partial/slp_run-2.c     |  67 ++++
>  .../riscv/rvv/autovec/partial/slp_run-3.c     |  67 ++++
>  .../riscv/rvv/autovec/partial/slp_run-4.c     |  67 ++++
>  .../riscv/rvv/autovec/partial/slp_run-5.c     |  67 ++++
>  .../riscv/rvv/autovec/partial/slp_run-6.c     |  67 ++++
>  .../riscv/rvv/autovec/partial/slp_run-7.c     |  58 +++
>  .../gcc.target/riscv/rvv/autovec/scalable-1.c |   2 +-
>  .../gcc.target/riscv/rvv/autovec/v-1.c        |   7 +-
>  .../riscv/rvv/autovec/zve32f_zvl128b-1.c      |   2 +-
>  .../riscv/rvv/autovec/zve32x_zvl128b-1.c      |   2 +-
>  .../gcc.target/riscv/rvv/autovec/zve64d-1.c   |   2 +-
>  .../riscv/rvv/autovec/zve64d_zvl128b-1.c      |   2 +-
>  .../gcc.target/riscv/rvv/autovec/zve64f-1.c   |   2 +-
>  .../riscv/rvv/autovec/zve64f_zvl128b-1.c      |   2 +-
>  .../riscv/rvv/autovec/zve64x_zvl128b-1.c      |   2 +-
>  26 files changed, 963 insertions(+), 37 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index d770e5e826e..27ecd16e496 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -168,6 +168,8 @@ void init_builtins (void);
>  const char *mangle_builtin_type (const_tree);
>  #ifdef GCC_TARGET_H
>  bool verify_type_context (location_t, type_context_kind, const_tree, bool);
> +bool expand_vec_perm_const (machine_mode, machine_mode, rtx, rtx, rtx,
> +                           const vec_perm_indices &);
>  #endif
>  void handle_pragma_vector (void);
>  tree builtin_decl (unsigned, bool);
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 83277fc2c05..4864429ed06 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -251,9 +251,12 @@ public:
>      m_inner_mode = GET_MODE_INNER (mode);
>      m_inner_bits_size = GET_MODE_BITSIZE (m_inner_mode);
>      m_inner_bytes_size = GET_MODE_SIZE (m_inner_mode);
> +    m_mask_mode = get_mask_mode (mode).require ();
>
>      gcc_assert (
>        int_mode_for_size (inner_bits_size (), 0).exists (&m_inner_int_mode));
> +    m_int_mode
> +      = get_vector_mode (m_inner_int_mode, GET_MODE_NUNITS (mode)).require ();
>    }
>
>    bool can_duplicate_repeating_sequence_p ();
> @@ -262,9 +265,14 @@ public:
>    bool repeating_sequence_use_merge_profitable_p ();
>    rtx get_merge_scalar_mask (unsigned int) const;
>
> +  bool single_step_npatterns_p () const;
> +  bool npatterns_all_equal_p () const;
> +
>    machine_mode new_mode () const { return m_new_mode; }
>    scalar_mode inner_mode () const { return m_inner_mode; }
>    scalar_int_mode inner_int_mode () const { return m_inner_int_mode; }
> +  machine_mode mask_mode () const { return m_mask_mode; }
> +  machine_mode int_mode () const { return m_int_mode; }
>    unsigned int inner_bits_size () const { return m_inner_bits_size; }
>    unsigned int inner_bytes_size () const { return m_inner_bytes_size; }
>
> @@ -273,6 +281,8 @@ private:
>    scalar_int_mode m_inner_int_mode;
>    machine_mode m_new_mode;
>    scalar_int_mode m_new_inner_mode;
> +  machine_mode m_mask_mode;
> +  machine_mode m_int_mode;
>    unsigned int m_inner_bits_size;
>    unsigned int m_inner_bytes_size;
>  };
> @@ -290,7 +300,9 @@ rvv_builder::can_duplicate_repeating_sequence_p ()
>        || GET_MODE_SIZE (m_new_inner_mode) > UNITS_PER_WORD
>        || !get_vector_mode (m_new_inner_mode, new_size).exists (&m_new_mode))
>      return false;
> -  return repeating_sequence_p (0, full_nelts ().to_constant (), npatterns ());
> +  if (full_nelts ().is_constant ())
> +    return repeating_sequence_p (0, full_nelts ().to_constant (), npatterns ());
> +  return nelts_per_pattern () == 1;
>  }
>
>  /* Return true if it is a repeating sequence that using
> @@ -398,6 +410,48 @@ rvv_builder::get_merge_scalar_mask (unsigned int index_in_pattern) const
>    return gen_int_mode (mask, inner_int_mode ());
>  }
>
> +/* Return true if the variable-length vector is single step.  */
> +bool
> +rvv_builder::single_step_npatterns_p () const
> +{
> +  if (nelts_per_pattern () != 3)
> +    return false;
> +
> +  poly_int64 step
> +    = rtx_to_poly_int64 (elt (npatterns ())) - rtx_to_poly_int64 (elt (0));
> +  for (unsigned int i = 0; i < npatterns (); i++)
> +    {
> +      poly_int64 ele0 = rtx_to_poly_int64 (elt (i));
> +      poly_int64 ele1 = rtx_to_poly_int64 (elt (npatterns () + i));
> +      poly_int64 ele2 = rtx_to_poly_int64 (elt (npatterns () * 2 + i));
> +      poly_int64 diff1 = ele1 - ele0;
> +      poly_int64 diff2 = ele2 - ele1;
> +      if (maybe_ne (step, diff1) || maybe_ne (step, diff2))
> +       return false;
> +    }
> +  return true;
> +}
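
For readers following the check above, here is a minimal scalar sketch of what single_step_npatterns_p tests. It assumes the encoded elements are plain integers rather than poly_int64 values, and the name single_step_model is hypothetical (it is not part of the patch). The input holds the 3 * npatterns encoded elements in the same order the builder stores them.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model: ELTS holds 3 * NPATTERNS encoded elements.
// Returns true when every pattern advances by the same step, where the
// step is taken from pattern 0 (elts[npatterns] - elts[0]).
bool
single_step_model (const std::vector<int64_t> &elts, std::size_t npatterns)
{
  // Mirrors the nelts_per_pattern () != 3 early exit.
  if (npatterns == 0 || elts.size () != 3 * npatterns)
    return false;

  int64_t step = elts[npatterns] - elts[0];
  for (std::size_t i = 0; i < npatterns; ++i)
    {
      int64_t diff1 = elts[npatterns + i] - elts[i];
      int64_t diff2 = elts[2 * npatterns + i] - elts[npatterns + i];
      if (diff1 != step || diff2 != step)
        return false;
    }
  return true;
}
```

With npatterns = 4, the sequence { 0, 6, 0, 6, 8, 14, 8, 14, 16, 22, 16, 22 } is accepted, since every pattern moves by 8 between consecutive groups.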
> +
> +/* Return true if all elements of NPATTERNS are equal.
> +
> +   E.g. NPATTERNS = 4:
> +     { 2, 2, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8, 16, 16, 16, 16, ... }
> +   E.g. NPATTERNS = 8:
> +     { 2, 2, 2, 2, 2, 2, 2, 2, 8, 8, 8, 8, 8, 8, 8, 8, ... }
> +*/
> +bool
> +rvv_builder::npatterns_all_equal_p () const
> +{
> +  poly_int64 ele0 = rtx_to_poly_int64 (elt (0));
> +  for (unsigned int i = 1; i < npatterns (); i++)
> +    {
> +      poly_int64 ele = rtx_to_poly_int64 (elt (i));
> +      if (!known_eq (ele, ele0))
> +       return false;
> +    }
> +  return true;
> +}
> +
>  static unsigned
>  get_sew (machine_mode mode)
>  {
> @@ -425,7 +479,7 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT minval,
>     future.  */
>
>  static bool
> -const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT maxval)
> +const_vec_all_in_range_p (rtx vec, poly_int64 minval, poly_int64 maxval)
>  {
>    if (!CONST_VECTOR_P (vec)
>        || GET_MODE_CLASS (GET_MODE (vec)) != MODE_VECTOR_INT)
> @@ -440,8 +494,10 @@ const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT maxval)
>    for (int i = 0; i < nunits; i++)
>      {
>        rtx vec_elem = CONST_VECTOR_ELT (vec, i);
> -      if (!CONST_INT_P (vec_elem)
> -         || !IN_RANGE (INTVAL (vec_elem), minval, maxval))
> +      poly_int64 value;
> +      if (!poly_int_rtx_p (vec_elem, &value)
> +         || maybe_lt (value, minval)
> +         || maybe_gt (value, maxval))
>         return false;
>      }
>    return true;
> @@ -453,7 +509,7 @@ const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT maxval)
>     future.  */
>
>  static rtx
> -gen_const_vector_dup (machine_mode mode, HOST_WIDE_INT val)
> +gen_const_vector_dup (machine_mode mode, poly_int64 val)
>  {
>    rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
>    return gen_const_vec_duplicate (mode, c);
> @@ -727,7 +783,10 @@ emit_vlmax_gather_insn (rtx target, rtx op, rtx sel)
>    rtx elt;
>    insn_code icode;
>    machine_mode data_mode = GET_MODE (target);
> -  if (const_vec_duplicate_p (sel, &elt))
> +  machine_mode sel_mode = GET_MODE (sel);
> +  if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode)))
> +    icode = code_for_pred_gatherei16 (data_mode);
> +  else if (const_vec_duplicate_p (sel, &elt))
>      {
>        icode = code_for_pred_gather_scalar (data_mode);
>        sel = elt;
> @@ -744,7 +803,10 @@ emit_vlmax_masked_gather_mu_insn (rtx target, rtx op, rtx sel, rtx mask)
>    rtx elt;
>    insn_code icode;
>    machine_mode data_mode = GET_MODE (target);
> -  if (const_vec_duplicate_p (sel, &elt))
> +  machine_mode sel_mode = GET_MODE (sel);
> +  if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode)))
> +    icode = code_for_pred_gatherei16 (data_mode);
> +  else if (const_vec_duplicate_p (sel, &elt))
>      {
>        icode = code_for_pred_gather_scalar (data_mode);
>        sel = elt;
> @@ -895,11 +957,130 @@ expand_const_vector (rtx target, rtx src)
>        return;
>      }
>
> -  /* TODO: We only support const duplicate vector for now. More cases
> -     will be supported when we support auto-vectorization:
> +  /* Handle variable-length vector.  */
> +  unsigned int nelts_per_pattern = CONST_VECTOR_NELTS_PER_PATTERN (src);
> +  unsigned int npatterns = CONST_VECTOR_NPATTERNS (src);
> +  rvv_builder builder (mode, npatterns, nelts_per_pattern);
> +  for (unsigned int i = 0; i < nelts_per_pattern; i++)
> +    {
> +      for (unsigned int j = 0; j < npatterns; j++)
> +       builder.quick_push (CONST_VECTOR_ELT (src, i * npatterns + j));
> +    }
> +  builder.finalize ();
>
> -       1. multiple elts duplicate vector.
> -       2. multiple patterns with multiple elts.  */
> +  if (CONST_VECTOR_DUPLICATE_P (src))
> +    {
> +      if (builder.can_duplicate_repeating_sequence_p ())
> +       {
> +         rtx ele = builder.get_merged_repeating_sequence ();
> +         rtx dup = expand_vector_broadcast (builder.new_mode (), ele);
> +         emit_move_insn (target, gen_lowpart (mode, dup));
> +       }
> +      else
> +       {
> +         unsigned int nbits = npatterns - 1;
> +
> +         /* Generate vid = { 0, 1, 2, 3, 4, 5, 6, 7, ... }.  */
> +         rtx vid = gen_reg_rtx (builder.int_mode ());
> +         rtx op[] = {vid};
> +         emit_vlmax_insn (code_for_pred_series (builder.int_mode ()),
> +                          RVV_MISC_OP, op);
> +
> +         /* Generate vid_repeat = { 0, 1, ... nbits, ... }  */
> +         rtx vid_repeat = gen_reg_rtx (builder.int_mode ());
> +         rtx and_ops[] = {vid_repeat, vid,
> +                          gen_int_mode (nbits, builder.inner_int_mode ())};
> +         emit_vlmax_insn (code_for_pred_scalar (AND, builder.int_mode ()),
> +                          RVV_BINOP, and_ops);
> +
> +         rtx tmp = gen_reg_rtx (builder.mode ());
> +         rtx dup_ops[] = {tmp, builder.elt (0)};
> +         emit_vlmax_insn (code_for_pred_broadcast (builder.mode ()), RVV_UNOP,
> +                          dup_ops);
> +         for (unsigned int i = 1; i < builder.npatterns (); i++)
> +           {
> +             /* Generate mask according to i.  */
> +             rtx mask = gen_reg_rtx (builder.mask_mode ());
> +             rtx const_vec = gen_const_vector_dup (builder.int_mode (), i);
> +             expand_vec_cmp (mask, EQ, vid_repeat, const_vec);
> +
> +             /* Merge scalar to each i.  */
> +             rtx tmp2 = gen_reg_rtx (builder.mode ());
> +             rtx merge_ops[] = {tmp2, tmp, builder.elt (i), mask};
> +             insn_code icode = code_for_pred_merge_scalar (builder.mode ());
> +             emit_vlmax_merge_insn (icode, RVV_MERGE_OP, merge_ops);
> +             tmp = tmp2;
> +           }
> +         emit_move_insn (target, tmp);
> +       }
> +      return;
> +    }
> +  else if (CONST_VECTOR_STEPPED_P (src))
> +    {
> +      gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
> +      if (builder.single_step_npatterns_p ())
> +       {
> +         /* Describe the case by choosing NPATTERNS = 4 as an example.  */
> +         rtx base, step;
> +         if (builder.npatterns_all_equal_p ())
> +           {
> +             /* Generate the variable-length vector as below:
> +                E.g. { 0, 0, 0, 0, 8, 8, 8, 8, 16, 16, 16, 16, ... } */
> +             /* Step 1: Generate base = { 0, 0, 0, 0, 0, 0, 0, ... }.  */
> +             base = expand_vector_broadcast (builder.mode (), builder.elt (0));
> +           }
> +         else
> +           {
> +             /* Generate the variable-length vector as below:
> +                E.g. { 0, 6, 0, 6, 8, 14, 8, 14, 16, 22, 16, 22, ... } */
> +             /* Step 1: Generate base = { 0, 6, 0, 6, ... }.  */
> +             rvv_builder new_builder (builder.mode (), builder.npatterns (),
> +                                      1);
> +             for (unsigned int i = 0; i < builder.npatterns (); ++i)
> +               new_builder.quick_push (builder.elt (i));
> +             rtx new_vec = new_builder.build ();
> +             base = gen_reg_rtx (builder.mode ());
> +             emit_move_insn (base, new_vec);
> +           }
> +
> +         /* Step 2: Generate step = gen_int_mode (diff, mode).  */
> +         poly_int64 value1 = rtx_to_poly_int64 (builder.elt (0));
> +         poly_int64 value2
> +           = rtx_to_poly_int64 (builder.elt (builder.npatterns ()));
> +         poly_int64 diff = value2 - value1;
> +         step = gen_int_mode (diff, builder.inner_mode ());
> +
> +         /* Step 3: Generate vid = { 0, 1, 2, 3, 4, 5, 6, 7, ... }.  */
> +         rtx vid = gen_reg_rtx (builder.mode ());
> +         rtx op[] = {vid};
> +         emit_vlmax_insn (code_for_pred_series (builder.mode ()), RVV_MISC_OP,
> +                          op);
> +
> +         /* Step 4: Generate factor = { 0, 0, 0, 0, 1, 1, 1, 1, ... }.  */
> +         rtx factor = gen_reg_rtx (builder.mode ());
> +         rtx shift_ops[]
> +           = {factor, vid,
> +              gen_int_mode (exact_log2 (builder.npatterns ()), Pmode)};
> +         emit_vlmax_insn (code_for_pred_scalar (LSHIFTRT, builder.mode ()),
> +                          RVV_BINOP, shift_ops);
> +
> +         /* Step 5: Generate adjusted step = { 0, 0, 0, 0, diff, diff, ... } */
> +         rtx adjusted_step = gen_reg_rtx (builder.mode ());
> +         rtx mul_ops[] = {adjusted_step, factor, step};
> +         emit_vlmax_insn (code_for_pred_scalar (MULT, builder.mode ()),
> +                          RVV_BINOP, mul_ops);
> +
> +         /* Step 6: Generate the final result.  */
> +         rtx add_ops[] = {target, base, adjusted_step};
> +         emit_vlmax_insn (code_for_pred (PLUS, builder.mode ()), RVV_BINOP,
> +                          add_ops);
> +       }
> +      else
> +       /* TODO: We will enable more variable-length vectors in the future.  */
> +       gcc_unreachable ();
> +    }
> +  else
> +    gcc_unreachable ();
>  }
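
The two variable-length paths in expand_const_vector can be modelled element by element. The sketch below is only an illustration under stated assumptions: it uses a fixed element count in place of the runtime vector length, plain integers in place of rtx constants, and assumes NPATTERNS is a power of two (which the AND mask and exact_log2 in the patch rely on). The names repeating_const_model and stepped_const_model are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Duplicate case: element i takes the value of pattern (i & (npatterns - 1)),
// which is what the broadcast plus per-pattern merge sequence produces.
std::vector<int64_t>
repeating_const_model (const std::vector<int64_t> &elts, unsigned nelts)
{
  unsigned npatterns = static_cast<unsigned> (elts.size ());
  std::vector<int64_t> result (nelts);
  for (unsigned i = 0; i < nelts; ++i)
    result[i] = elts[i & (npatterns - 1)];
  return result;
}

// Stepped case: FIRST_ELTS are elt (0) .. elt (npatterns - 1) and STEP is
// elt (npatterns) - elt (0).
std::vector<int64_t>
stepped_const_model (const std::vector<int64_t> &first_elts, int64_t step,
                     unsigned nelts)
{
  unsigned npatterns = static_cast<unsigned> (first_elts.size ());
  unsigned shift = 0;
  while ((1u << shift) < npatterns)
    ++shift; // Stands in for exact_log2 (npatterns).

  std::vector<int64_t> result (nelts);
  for (unsigned i = 0; i < nelts; ++i)
    {
      int64_t base = first_elts[i & (npatterns - 1)]; // Step 1.
      int64_t factor = i >> shift;                    // Steps 3 and 4.
      result[i] = base + factor * step;               // Steps 5 and 6.
    }
  return result;
}
```

For example, stepped_const_model ({0, 6, 0, 6}, 8, 12) yields the sequence used in the comment above, { 0, 6, 0, 6, 8, 14, 8, 14, 16, 22, 16, 22 }.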
>
>  /* Expand a pre-RA RVV data move from SRC to DEST.
> @@ -2029,14 +2210,13 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
>  {
>    machine_mode data_mode = GET_MODE (target);
>    machine_mode sel_mode = GET_MODE (sel);
> -
> -  /* Enforced by the pattern condition.  */
> -  int nunits = GET_MODE_NUNITS (sel_mode).to_constant ();
> +  poly_uint64 nunits = GET_MODE_NUNITS (sel_mode);
>
>    /* Check if the sel only references the first values vector. If each select
>       index is in range of [0, nunits - 1]. A single vrgather instructions is
> -     enough.  */
> -  if (const_vec_all_in_range_p (sel, 0, nunits - 1))
> +     enough. Since we will use vrgatherei16.vv for variable-length vectors,
> +     the index is never out of range and we don't need to modulo it.  */
> +  if (!nunits.is_constant () || const_vec_all_in_range_p (sel, 0, nunits - 1))
>      {
>        emit_vlmax_gather_insn (target, op0, sel);
>        return;
> @@ -2057,14 +2237,20 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
>        return;
>      }
>
> -  /* Note: vec_perm indices are supposed to wrap when they go beyond the
> -     size of the two value vectors, i.e. the upper bits of the indices
> -     are effectively ignored.  RVV vrgather instead produces 0 for any
> -     out-of-range indices, so we need to modulo all the vec_perm indices
> -     to ensure they are all in range of [0, 2 * nunits - 1].  */
> +  rtx sel_mod = sel;
>    rtx max_sel = gen_const_vector_dup (sel_mode, 2 * nunits - 1);
> -  rtx sel_mod
> -    = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0, OPTAB_DIRECT);
> +  /* We don't need to modulo the indices for VLA vectors, since they
> +     should already be guaranteed to be in range.  */
> +  if (nunits.is_constant ())
> +    {
> +      /* Note: vec_perm indices are supposed to wrap when they go beyond the
> +        size of the two value vectors, i.e. the upper bits of the indices
> +        are effectively ignored.  RVV vrgather instead produces 0 for any
> +        out-of-range indices, so we need to modulo all the vec_perm indices
> +        to ensure they are all in range of [0, 2 * nunits - 1].  */
> +      sel_mod = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0,
> +                                    OPTAB_DIRECT);
> +    }
>
>    /* This following sequence is handling the case that:
>       __builtin_shufflevector (vec1, vec2, index...), the index can be any
> @@ -2094,4 +2280,124 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
>    emit_vlmax_masked_gather_mu_insn (target, op1, tmp, mask);
>  }
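
As a reference for the fixed-length path, the semantics that the AND with 2 * nunits - 1 preserves can be written as a scalar model. This is a sketch of the vec_perm behaviour described in the comment (indices wrap), not of the emitted instruction sequence. It assumes both operands have the same power-of-two length, and vec_perm_model is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical scalar model of a two-input vec_perm: indices wrap modulo
// 2 * nunits, then select from OP0 (low half) or OP1 (high half).
std::vector<int>
vec_perm_model (const std::vector<int> &op0, const std::vector<int> &op1,
                const std::vector<unsigned> &sel)
{
  std::size_t nunits = op0.size (); // Assumed power of two, same as op1.
  std::vector<int> result (sel.size ());
  for (std::size_t i = 0; i < sel.size (); ++i)
    {
      // AND with 2 * nunits - 1 is the modulo the comment refers to.
      std::size_t idx = sel[i] & (2 * nunits - 1);
      result[i] = idx < nunits ? op0[idx] : op1[idx - nunits];
    }
  return result;
}
```

Without the masking step, an index such as 10 with nunits = 4 would be out of range for vrgather, which writes 0 instead of wrapping to element 2 of the first operand.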
>
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST for RVV.  */
> +
> +/* vec_perm support.  */
> +
> +struct expand_vec_perm_d
> +{
> +  rtx target, op0, op1;
> +  vec_perm_indices perm;
> +  machine_mode vmode;
> +  machine_mode op_mode;
> +  bool one_vector_p;
> +  bool testing_p;
> +};
> +
> +/* Recognize the patterns that can be shuffled by the generic approach.  */
> +
> +static bool
> +shuffle_generic_patterns (struct expand_vec_perm_d *d)
> +{
> +  machine_mode sel_mode = related_int_vector_mode (d->vmode).require ();
> +  poly_uint64 nunits = GET_MODE_NUNITS (d->vmode);
> +
> +  /* For constant-size indices, we don't need to handle them here.
> +     Just leave them to vec_perm<mode>.  */
> +  if (d->perm.length ().is_constant ())
> +    return false;
> +
> +  /* Permuting two SEW8 variable-length vectors needs vrgatherei16.vv.
> +     Otherwise, it could overflow the index range.  */
> +  if (GET_MODE_INNER (d->vmode) == QImode
> +      && !get_vector_mode (HImode, nunits).exists (&sel_mode))
> +    return false;
> +
> +  /* Success! */
> +  if (d->testing_p)
> +    return true;
> +
> +  rtx sel = vec_perm_indices_to_rtx (sel_mode, d->perm);
> +  expand_vec_perm (d->target, d->op0, d->op1, force_reg (sel_mode, sel));
> +  return true;
> +}
> +
> +static bool
> +expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
> +{
> +  gcc_assert (d->op_mode != E_VOIDmode);
> +
> +  /* The pattern matching functions above are written to look for a small
> +     number to begin the sequence (0, 1, N/2).  If we begin with an index
> +     from the second operand, we can swap the operands.  */
> +  poly_int64 nelt = d->perm.length ();
> +  if (known_ge (d->perm[0], nelt))
> +    {
> +      d->perm.rotate_inputs (1);
> +      std::swap (d->op0, d->op1);
> +    }
> +
> +  if (known_gt (nelt, 1))
> +    {
> +      if (d->vmode == d->op_mode)
> +       {
> +         if (shuffle_generic_patterns (d))
> +           return true;
> +         return false;
> +       }
> +      else
> +       return false;
> +    }
> +  return false;
> +}
> +
> +bool
> +expand_vec_perm_const (machine_mode vmode, machine_mode op_mode, rtx target,
> +                      rtx op0, rtx op1, const vec_perm_indices &sel)
> +{
> +  /* RVV doesn't have Mask type pack/unpack instructions and we don't use
> +     mask to do the iteration loop control. Just disable it directly.  */
> +  if (GET_MODE_CLASS (vmode) == MODE_VECTOR_BOOL)
> +    return false;
> +
> +  struct expand_vec_perm_d d;
> +
> +  /* Check whether the mask can be applied to a single vector.  */
> +  if (sel.ninputs () == 1 || (op0 && rtx_equal_p (op0, op1)))
> +    d.one_vector_p = true;
> +  else if (sel.all_from_input_p (0))
> +    {
> +      d.one_vector_p = true;
> +      op1 = op0;
> +    }
> +  else if (sel.all_from_input_p (1))
> +    {
> +      d.one_vector_p = true;
> +      op0 = op1;
> +    }
> +  else
> +    d.one_vector_p = false;
> +
> +  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2,
> +                    sel.nelts_per_input ());
> +  d.vmode = vmode;
> +  d.op_mode = op_mode;
> +  d.target = target;
> +  d.op0 = op0;
> +  if (op0 == op1)
> +    d.op1 = d.op0;
> +  else
> +    d.op1 = op1;
> +  d.testing_p = !target;
> +
> +  if (!d.testing_p)
> +    return expand_vec_perm_const_1 (&d);
> +
> +  rtx_insn *last = get_last_insn ();
> +  bool ret = expand_vec_perm_const_1 (&d);
> +  gcc_assert (last == get_last_insn ());
> +
> +  return ret;
> +}
> +
>  } // namespace riscv_vector
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index caa7858b864..5d22012b591 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -7631,6 +7631,19 @@ riscv_vectorize_related_mode (machine_mode vector_mode, scalar_mode element_mode
>    return default_vectorize_related_mode (vector_mode, element_mode, nunits);
>  }
>
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
> +
> +static bool
> +riscv_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
> +                               rtx target, rtx op0, rtx op1,
> +                               const vec_perm_indices &sel)
> +{
> +  if (TARGET_VECTOR && riscv_v_ext_vector_mode_p (vmode))
> +    return riscv_vector::expand_vec_perm_const (vmode, op_mode, target, op0,
> +                                               op1, sel);
> +
> +  return false;
> +}
>
>  /* Initialize the GCC target structure.  */
>  #undef TARGET_ASM_ALIGNED_HI_OP
> @@ -7930,6 +7943,9 @@ riscv_vectorize_related_mode (machine_mode vector_mode, scalar_mode element_mode
>  #undef TARGET_VECTORIZE_RELATED_MODE
>  #define TARGET_VECTORIZE_RELATED_MODE riscv_vectorize_related_mode
>
> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
> +#define TARGET_VECTORIZE_VEC_PERM_CONST riscv_vectorize_vec_perm_const
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>
>  #include "gt-riscv.h"
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c
> new file mode 100644
> index 00000000000..befb518e2dd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
> +
> +#include <stdint-gcc.h>
> +
> +void __attribute__ ((noipa))
> +f (int8_t *restrict a, int8_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8 + 0] = b[i * 8 + 37] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 37] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 37] + 8;
> +      a[i * 8 + 3] = b[i * 8 + 37] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 37] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 37] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 37] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 37] + 3;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c
> new file mode 100644
> index 00000000000..ac817451295
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
> +
> +#include <stdint-gcc.h>
> +
> +void __attribute__ ((noipa))
> +f (int16_t *restrict a, int16_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8 + 0] = b[i * 8 + 37] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 37] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 37] + 8;
> +      a[i * 8 + 3] = b[i * 8 + 37] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 37] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 37] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 37] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 37] + 3;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c
> new file mode 100644
> index 00000000000..73962055b03
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
> +
> +#include <stdint-gcc.h>
> +
> +void __attribute__ ((noipa))
> +f (int8_t *restrict a, int8_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8] = b[i * 8 + 1] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 7] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 1] + 3;
> +      a[i * 8 + 3] = b[i * 8 + 7] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 1] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 7] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 1] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 7] + 8;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c
> new file mode 100644
> index 00000000000..fa216fc8c40
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
> +
> +#include <stdint-gcc.h>
> +
> +void __attribute__ ((noipa))
> +f (int16_t *restrict a, int16_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8 + 0] = b[i * 8 + 1] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 7] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 1] + 3;
> +      a[i * 8 + 3] = b[i * 8 + 7] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 1] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 7] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 1] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 7] + 8;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c
> new file mode 100644
> index 00000000000..899ed9e310b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
> +
> +#include <stdint-gcc.h>
> +
> +void __attribute__ ((noipa))
> +f (int8_t *restrict a, int8_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8 + 0] = b[i * 8 + 1] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 7] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 4] + 3;
> +      a[i * 8 + 3] = b[i * 8 + 8] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 1] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 7] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 4] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 8] + 8;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c
> new file mode 100644
> index 00000000000..fb87cc00cea
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
> +
> +#include <stdint-gcc.h>
> +
> +void __attribute__ ((noipa))
> +f (uint8_t *restrict a, uint8_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8 + 0] = b[i * 8 + 1] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 2] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 6] + 8;
> +      a[i * 8 + 3] = b[i * 8 + 7] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 3] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 4] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 5] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 0] + 3;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
> +
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c
> new file mode 100644
> index 00000000000..3dd744b586e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
> +
> +#include <stdint-gcc.h>
> +
> +void __attribute__ ((noipa))
> +f (float *__restrict f, double *__restrict d, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      f[i * 2 + 0] = 1;
> +      f[i * 2 + 1] = 2;
> +      d[i] = 3;
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c
> new file mode 100644
> index 00000000000..16f078a0433
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c
> @@ -0,0 +1,66 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
> +
> +#include "slp-1.c"
> +
> +#define LIMIT 128
> +void __attribute__ ((optimize (0)))
> +f_golden (int8_t *restrict a, int8_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8 + 0] = b[i * 8 + 37] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 37] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 37] + 8;
> +      a[i * 8 + 3] = b[i * 8 + 37] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 37] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 37] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 37] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 37] + 3;
> +    }
> +}
> +
> +int
> +main (void)
> +{
> +#define RUN(NUM)                                                               \
> +  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
> +  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
> +  int8_t b_##NUM[NUM * 8 + 37] = {0};                                          \
> +  for (int i = 0; i < NUM * 8 + 37; i++)                                       \
> +    {                                                                          \
> +      if (i % NUM == 0)                                                        \
> +       b_##NUM[i] = (i + NUM) % LIMIT;                                        \
> +      else                                                                     \
> +       b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
> +    }                                                                          \
> +  f (a_##NUM, b_##NUM, NUM);                                                   \
> +  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
> +  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
> +    {                                                                          \
> +      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
> +       __builtin_abort ();                                                    \
> +    }
> +
> +  RUN (3);
> +  RUN (5);
> +  RUN (15);
> +  RUN (16);
> +  RUN (17);
> +  RUN (31);
> +  RUN (32);
> +  RUN (33);
> +  RUN (63);
> +  RUN (64);
> +  RUN (65);
> +  RUN (127);
> +  RUN (128);
> +  RUN (129);
> +  RUN (239);
> +  RUN (359);
> +  RUN (498);
> +  RUN (799);
> +  RUN (977);
> +  RUN (5789);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c
> new file mode 100644
> index 00000000000..41f688f628c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c
> @@ -0,0 +1,67 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
> +
> +#include "slp-2.c"
> +
> +#define LIMIT 32767
> +
> +void __attribute__ ((optimize (0)))
> +f_golden (int16_t *restrict a, int16_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8 + 0] = b[i * 8 + 37] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 37] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 37] + 8;
> +      a[i * 8 + 3] = b[i * 8 + 37] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 37] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 37] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 37] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 37] + 3;
> +    }
> +}
> +
> +int
> +main (void)
> +{
> +#define RUN(NUM)                                                               \
> +  int16_t a_##NUM[NUM * 8 + 8] = {0};                                          \
> +  int16_t a_golden_##NUM[NUM * 8 + 8] = {0};                                   \
> +  int16_t b_##NUM[NUM * 8 + 37] = {0};                                         \
> +  for (int i = 0; i < NUM * 8 + 37; i++)                                       \
> +    {                                                                          \
> +      if (i % NUM == 0)                                                        \
> +       b_##NUM[i] = (i + NUM) % LIMIT;                                        \
> +      else                                                                     \
> +       b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
> +    }                                                                          \
> +  f (a_##NUM, b_##NUM, NUM);                                                   \
> +  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
> +  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
> +    {                                                                          \
> +      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
> +       __builtin_abort ();                                                    \
> +    }
> +
> +  RUN (3);
> +  RUN (5);
> +  RUN (15);
> +  RUN (16);
> +  RUN (17);
> +  RUN (31);
> +  RUN (32);
> +  RUN (33);
> +  RUN (63);
> +  RUN (64);
> +  RUN (65);
> +  RUN (127);
> +  RUN (128);
> +  RUN (129);
> +  RUN (239);
> +  RUN (359);
> +  RUN (498);
> +  RUN (799);
> +  RUN (977);
> +  RUN (5789);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c
> new file mode 100644
> index 00000000000..30996cb2c6e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c
> @@ -0,0 +1,67 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
> +
> +#include "slp-3.c"
> +
> +#define LIMIT 128
> +
> +void __attribute__ ((optimize (0)))
> +f_golden (int8_t *restrict a, int8_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8] = b[i * 8 + 1] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 7] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 1] + 3;
> +      a[i * 8 + 3] = b[i * 8 + 7] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 1] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 7] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 1] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 7] + 8;
> +    }
> +}
> +
> +int
> +main (void)
> +{
> +#define RUN(NUM)                                                               \
> +  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
> +  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
> +  int8_t b_##NUM[NUM * 8 + 8] = {0};                                           \
> +  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
> +    {                                                                          \
> +      if (i % NUM == 0)                                                        \
> +       b_##NUM[i] = (i + NUM) % LIMIT;                                        \
> +      else                                                                     \
> +       b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
> +    }                                                                          \
> +  f (a_##NUM, b_##NUM, NUM);                                                   \
> +  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
> +  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
> +    {                                                                          \
> +      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
> +       __builtin_abort ();                                                    \
> +    }
> +
> +  RUN (3);
> +  RUN (5);
> +  RUN (15);
> +  RUN (16);
> +  RUN (17);
> +  RUN (31);
> +  RUN (32);
> +  RUN (33);
> +  RUN (63);
> +  RUN (64);
> +  RUN (65);
> +  RUN (127);
> +  RUN (128);
> +  RUN (129);
> +  RUN (239);
> +  RUN (359);
> +  RUN (498);
> +  RUN (799);
> +  RUN (977);
> +  RUN (5789);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c
> new file mode 100644
> index 00000000000..3d43ef0890c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c
> @@ -0,0 +1,67 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
> +
> +#include "slp-4.c"
> +
> +#define LIMIT 32767
> +
> +void __attribute__ ((optimize (0)))
> +f_golden (int16_t *restrict a, int16_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8] = b[i * 8 + 1] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 7] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 1] + 3;
> +      a[i * 8 + 3] = b[i * 8 + 7] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 1] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 7] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 1] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 7] + 8;
> +    }
> +}
> +
> +int
> +main (void)
> +{
> +#define RUN(NUM)                                                               \
> +  int16_t a_##NUM[NUM * 8 + 8] = {0};                                          \
> +  int16_t a_golden_##NUM[NUM * 8 + 8] = {0};                                   \
> +  int16_t b_##NUM[NUM * 8 + 8] = {0};                                          \
> +  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
> +    {                                                                          \
> +      if (i % NUM == 0)                                                        \
> +       b_##NUM[i] = (i + NUM) % LIMIT;                                        \
> +      else                                                                     \
> +       b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
> +    }                                                                          \
> +  f (a_##NUM, b_##NUM, NUM);                                                   \
> +  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
> +  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
> +    {                                                                          \
> +      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
> +       __builtin_abort ();                                                    \
> +    }
> +
> +  RUN (3);
> +  RUN (5);
> +  RUN (15);
> +  RUN (16);
> +  RUN (17);
> +  RUN (31);
> +  RUN (32);
> +  RUN (33);
> +  RUN (63);
> +  RUN (64);
> +  RUN (65);
> +  RUN (127);
> +  RUN (128);
> +  RUN (129);
> +  RUN (239);
> +  RUN (359);
> +  RUN (498);
> +  RUN (799);
> +  RUN (977);
> +  RUN (5789);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c
> new file mode 100644
> index 00000000000..814308bd7af
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c
> @@ -0,0 +1,67 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
> +
> +#include "slp-5.c"
> +
> +#define LIMIT 128
> +
> +void __attribute__ ((optimize (0)))
> +f_golden (int8_t *restrict a, int8_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8 + 0] = b[i * 8 + 1] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 7] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 4] + 3;
> +      a[i * 8 + 3] = b[i * 8 + 8] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 1] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 7] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 4] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 8] + 8;
> +    }
> +}
> +
> +int
> +main (void)
> +{
> +#define RUN(NUM)                                                               \
> +  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
> +  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
> +  int8_t b_##NUM[NUM * 8 + 9] = {0};                                           \
> +  for (int i = 0; i < NUM * 8 + 9; i++)                                        \
> +    {                                                                          \
> +      if (i % NUM == 0)                                                        \
> +       b_##NUM[i] = (i + NUM) % LIMIT;                                        \
> +      else                                                                     \
> +       b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
> +    }                                                                          \
> +  f (a_##NUM, b_##NUM, NUM);                                                   \
> +  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
> +  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
> +    {                                                                          \
> +      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
> +       __builtin_abort ();                                                    \
> +    }
> +
> +  RUN (3);
> +  RUN (5);
> +  RUN (15);
> +  RUN (16);
> +  RUN (17);
> +  RUN (31);
> +  RUN (32);
> +  RUN (33);
> +  RUN (63);
> +  RUN (64);
> +  RUN (65);
> +  RUN (127);
> +  RUN (128);
> +  RUN (129);
> +  RUN (239);
> +  RUN (359);
> +  RUN (498);
> +  RUN (799);
> +  RUN (977);
> +  RUN (5789);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c
> new file mode 100644
> index 00000000000..e317eeac2f2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c
> @@ -0,0 +1,67 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
> +
> +#include "slp-6.c"
> +
> +#define LIMIT 128
> +
> +void __attribute__ ((optimize (0)))
> +f_golden (int8_t *restrict a, int8_t *restrict b, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      a[i * 8 + 0] = b[i * 8 + 1] + 1;
> +      a[i * 8 + 1] = b[i * 8 + 2] + 2;
> +      a[i * 8 + 2] = b[i * 8 + 6] + 8;
> +      a[i * 8 + 3] = b[i * 8 + 7] + 4;
> +      a[i * 8 + 4] = b[i * 8 + 3] + 5;
> +      a[i * 8 + 5] = b[i * 8 + 4] + 6;
> +      a[i * 8 + 6] = b[i * 8 + 5] + 7;
> +      a[i * 8 + 7] = b[i * 8 + 0] + 3;
> +    }
> +}
> +
> +int
> +main (void)
> +{
> +#define RUN(NUM)                                                               \
> +  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
> +  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
> +  int8_t b_##NUM[NUM * 8 + 9] = {0};                                           \
> +  for (int i = 0; i < NUM * 8 + 9; i++)                                        \
> +    {                                                                          \
> +      if (i % NUM == 0)                                                        \
> +       b_##NUM[i] = (i + NUM) % LIMIT;                                        \
> +      else                                                                     \
> +       b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
> +    }                                                                          \
> +  f (a_##NUM, b_##NUM, NUM);                                                   \
> +  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
> +  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
> +    {                                                                          \
> +      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
> +       __builtin_abort ();                                                    \
> +    }
> +
> +  RUN (3);
> +  RUN (5);
> +  RUN (15);
> +  RUN (16);
> +  RUN (17);
> +  RUN (31);
> +  RUN (32);
> +  RUN (33);
> +  RUN (63);
> +  RUN (64);
> +  RUN (65);
> +  RUN (127);
> +  RUN (128);
> +  RUN (129);
> +  RUN (239);
> +  RUN (359);
> +  RUN (498);
> +  RUN (799);
> +  RUN (977);
> +  RUN (5789);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c
> new file mode 100644
> index 00000000000..a8e4781988e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c
> @@ -0,0 +1,58 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
> +
> +#include "slp-7.c"
> +
> +void
> +f_golden (float *__restrict f, double *__restrict d, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      f[i * 2 + 0] = 1;
> +      f[i * 2 + 1] = 2;
> +      d[i] = 3;
> +    }
> +}
> +
> +int
> +main (void)
> +{
> +#define RUN(NUM)                                                               \
> +  float a_##NUM[NUM * 2 + 2] = {0};                                            \
> +  float a_golden_##NUM[NUM * 2 + 2] = {0};                                     \
> +  double b_##NUM[NUM] = {0};                                                   \
> +  double b_golden_##NUM[NUM] = {0};                                            \
> +  f (a_##NUM, b_##NUM, NUM);                                                   \
> +  f_golden (a_golden_##NUM, b_golden_##NUM, NUM);                              \
> +  for (int i = 0; i < NUM; i++)                                                \
> +    {                                                                          \
> +      if (a_##NUM[i * 2 + 0] != a_golden_##NUM[i * 2 + 0])                     \
> +       __builtin_abort ();                                                    \
> +      if (a_##NUM[i * 2 + 1] != a_golden_##NUM[i * 2 + 1])                     \
> +       __builtin_abort ();                                                    \
> +      if (b_##NUM[i] != b_golden_##NUM[i])                                     \
> +       __builtin_abort ();                                                    \
> +    }
> +
> +  RUN (3);
> +  RUN (5);
> +  RUN (15);
> +  RUN (16);
> +  RUN (17);
> +  RUN (31);
> +  RUN (32);
> +  RUN (33);
> +  RUN (63);
> +  RUN (64);
> +  RUN (65);
> +  RUN (127);
> +  RUN (128);
> +  RUN (129);
> +  RUN (239);
> +  RUN (359);
> +  RUN (498);
> +  RUN (799);
> +  RUN (977);
> +  RUN (5789);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c
> index 500b0adce66..3c03a87377d 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c
> @@ -14,4 +14,4 @@ f (int32_t *__restrict f, int32_t *__restrict d, int n)
>      }
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
> index 383c82a3b7c..e68d05f5f48 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
> @@ -3,9 +3,4 @@
>
>  #include "template-1.h"
>
> -/* Currently, we don't support SLP auto-vectorization for VLA. But it's
> -   necessary that we add this testcase here to make sure such unsupported SLP
> -   auto-vectorization will not cause an ICE. We will enable "vect" checking when
> -   we support SLP auto-vectorization for VLA in the future.  */
> -
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 5 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c
> index 23cc1c8651f..ecfda79e19a 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c
> @@ -3,4 +3,4 @@
>
>  #include "template-1.h"
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c
> index 4f130f02f67..1394f08f2b9 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c
> @@ -3,4 +3,4 @@
>
>  #include "template-1.h"
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c
> index 823d51a03cb..c5e89996fa4 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c
> @@ -3,4 +3,4 @@
>
>  #include "template-1.h"
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c
> index 5ead22746d3..6b320ca6f38 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c
> @@ -3,4 +3,4 @@
>
>  #include "template-1.h"
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 5 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c
> index e03d1b44ca6..6c2a002de9c 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c
> @@ -3,4 +3,4 @@
>
>  #include "template-1.h"
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c
> index 5bb2d9d96fa..ae3f066477c 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c
> @@ -3,4 +3,4 @@
>
>  #include "template-1.h"
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 4 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c
> index 71820ece4b2..fc676a3865e 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c
> @@ -3,4 +3,4 @@
>
>  #include "template-1.h"
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
> --
> 2.36.1
>

* [PATCH] RISC-V: Support RVV VLA SLP auto-vectorization
@ 2023-06-06  4:16 juzhe.zhong
  2023-06-06  6:55 ` Richard Biener
  2023-06-07  0:38 ` juzhe.zhong
  0 siblings, 2 replies; 8+ messages in thread
From: juzhe.zhong @ 2023-06-06  4:16 UTC (permalink / raw)
  To: gcc-patches
  Cc: kito.cheng, kito.cheng, palmer, palmer, jeffreyalaw, rdapp.gcc,
	pan2.li, Juzhe-Zhong

From: Juzhe-Zhong <juzhe.zhong@rivai.ai>

This patch enables basic VLA SLP auto-vectorization.
Consider the following case:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
    {
      a[i * 8 + 0] = b[i * 8 + 7] + 1;
      a[i * 8 + 1] = b[i * 8 + 7] + 2;
      a[i * 8 + 2] = b[i * 8 + 7] + 8;
      a[i * 8 + 3] = b[i * 8 + 7] + 4;
      a[i * 8 + 4] = b[i * 8 + 7] + 5;
      a[i * 8 + 5] = b[i * 8 + 7] + 6;
      a[i * 8 + 6] = b[i * 8 + 7] + 7;
      a[i * 8 + 7] = b[i * 8 + 7] + 3;
    }
}

To enable VLA SLP auto-vectorization, we need to be able to handle the following constant vectors:

1. NPATTERNS = 8, NELTS_PER_PATTERN = 3.
{ 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }

2. NPATTERNS = 8, NELTS_PER_PATTERN = 1. 
{ 1, 2, 8, 4, 5, 6, 7, 3, ... }

Both of these vectors can be generated in the prologue.
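
As an illustration of the two encodings above, here is a minimal sketch (not code
from this patch) of how element I of such a constant vector can be recovered from
its NPATTERNS / NELTS_PER_PATTERN description. The helper name decode_elt is
hypothetical; the stepping rule for NELTS_PER_PATTERN = 3 follows my understanding
of GCC's interleaved-pattern vector encoding and should be read as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: return element I of a
   constant vector described by NPATTERNS interleaved patterns, each with
   NELTS_PER_PATTERN explicitly encoded elements.  ENCODED holds
   NPATTERNS * NELTS_PER_PATTERN values.  */
static int64_t
decode_elt (const int64_t *encoded, unsigned npatterns,
	    unsigned nelts_per_pattern, unsigned i)
{
  assert (npatterns >= 1);
  assert (nelts_per_pattern >= 1 && nelts_per_pattern <= 3);

  unsigned pattern = i % npatterns; /* Which interleaved pattern.  */
  unsigned k = i / npatterns;	    /* Position within that pattern.  */

  /* Explicitly encoded elements are returned as-is.  */
  if (k < nelts_per_pattern)
    return encoded[k * npatterns + pattern];

  int64_t last = encoded[(nelts_per_pattern - 1) * npatterns + pattern];

  /* With 1 or 2 encoded elements the last one simply repeats.  */
  if (nelts_per_pattern < 3)
    return last;

  /* With 3 encoded elements the pattern continues as a linear series
     whose step is the difference of the last two encoded elements.  */
  int64_t prev = encoded[(nelts_per_pattern - 2) * npatterns + pattern];
  return last + (int64_t) (k - 2) * (last - prev);
}
```

With the first vector encoded as { 0 x 8, 8 x 8, 16 x 8 }, this yields 24 for
element 24; with the second vector encoded as { 1, 2, 8, 4, 5, 6, 7, 3 }, the
eight values simply repeat.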

After this patch, we end up with the following codegen:

Prologue:
...
        vsetvli a7,zero,e16,m2,ta,ma
        vid.v   v4
        vsrl.vi v4,v4,3
        li      a3,8
        vmul.vx v4,v4,a3  ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
...
        li      t1,67633152
        addi    t1,t1,513
        li      a3,50790400
        addi    a3,a3,1541
        slli    a3,a3,32
        add     a3,a3,t1
        vsetvli t1,zero,e64,m1,ta,ma
        vmv.v.x v3,a3   ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... }
...
LoopBody:
...
        min     a3,...
        vsetvli zero,a3,e8,m1,ta,ma
        vle8.v  v2,0(a6)
        vsetvli a7,zero,e8,m1,ta,ma
        vrgatherei16.vv v1,v2,v4
        vadd.vv v1,v1,v3
        vsetvli zero,a3,e8,m1,ta,ma
        vse8.v  v1,0(a2)
        add     a6,a6,a4
        add     a2,a2,a4
        mv      a3,a5
        add     a5,a5,t1
        bgtu    a3,a4,.L3
...
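
For readers who want to check the constants in the prologue, the following is a
rough scalar model of the sequence above, not code from this patch. The names
pack_bytes and slp_body_model are hypothetical, and the model assumes the load
base already points at b + 7 (the listing elides how a6 is set up).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack up to eight byte-sized elements into one 64-bit scalar, lowest
   element in the lowest byte.  Broadcasting this scalar at e64 gives the
   repeating byte pattern when the register is viewed at e8.  */
static uint64_t
pack_bytes (const uint8_t *elts, size_t n)
{
  assert (n <= 8);
  uint64_t v = 0;
  for (size_t i = 0; i < n; i++)
    v |= (uint64_t) elts[i] << (8 * i);
  return v;
}

/* Scalar model of one loop iteration over VL elements.  SRC is assumed to
   be b + 7 (an assumption, see above); ADDEND is the 8-element pattern.  */
static void
slp_body_model (uint8_t *dst, const uint8_t *src, const uint8_t *addend,
		size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    {
      /* vid.v; vsrl.vi 3; vmul.vx 8  ==>  { 0 x 8, 8 x 8, 16 x 8, ... }  */
      size_t idx = (i >> 3) * 8;
      /* vrgatherei16.vv followed by vadd.vv.  */
      dst[i] = (uint8_t) (src[idx] + addend[i % 8]);
    }
}
```

Packing { 1, 2, 8, 4, 5, 6, 7, 3 } this way gives 0x0307060504080201, which
matches the value built by the li/addi/slli/add sequence:
((50790400 + 1541) << 32) + (67633152 + 513).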

Note: we need to use "vrgatherei16.vv" instead of "vrgather.vv" for SEW = 8, since
      "vrgatherei16.vv" uses 16-bit indices and can therefore cover a larger index range
      than "vrgather.vv" (whose 8-bit indices can only reach a maximum element index of 255).
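
A small sketch of the arithmetic behind this note, assuming the usual RVV
relation VLMAX = VLEN * LMUL / SEW for integer LMUL. The helper names are
hypothetical and the VLEN values are only examples.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of elements in a vector register group (integer LMUL only).  */
static unsigned
vlmax_elts (unsigned vlen_bits, unsigned lmul, unsigned sew_bits)
{
  assert (sew_bits != 0);
  return vlen_bits * lmul / sew_bits;
}

/* Return true if indices of INDEX_BITS bits can address every one of
   NELTS elements, i.e. the largest index NELTS - 1 is representable.  */
static bool
index_covers_p (unsigned index_bits, unsigned nelts)
{
  assert (index_bits >= 1 && index_bits < 32);
  return nelts <= (1u << index_bits);
}
```

For example, with SEW = 8 and LMUL = 1, 8-bit indices are enough at VLEN = 128
(16 elements) but not at VLEN = 4096 (512 elements), whereas 16-bit indices
still suffice at VLEN = 65536 with LMUL = 8 (65536 elements). Since VLA code
does not know VLEN at compile time, the 16-bit index form is the safe choice.
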
Epilogue:
        lbu     a5,799(a1)
        addiw   a4,a5,1
        sb      a4,792(a0)
        addiw   a4,a5,2
        sb      a4,793(a0)
        addiw   a4,a5,8
        sb      a4,794(a0)
        addiw   a4,a5,4
        sb      a4,795(a0)
        addiw   a4,a5,5
        sb      a4,796(a0)
        addiw   a4,a5,6
        sb      a4,797(a0)
        addiw   a4,a5,7
        sb      a4,798(a0)
        addiw   a5,a5,3
        sb      a5,799(a0)
        ret

The one remaining thing to do is "Epilogue auto-vectorization", which needs VLS modes support.
I will add VLS modes support for "Epilogue auto-vectorization" in the future.

gcc/ChangeLog:

        * config/riscv/riscv-protos.h (expand_vec_perm_const): New function.
        * config/riscv/riscv-v.cc (rvv_builder::can_duplicate_repeating_sequence_p): Support POLY handling.
        (rvv_builder::single_step_npatterns_p): New function.
        (rvv_builder::npatterns_all_equal_p): Ditto.
        (const_vec_all_in_range_p): Support POLY handling.
        (gen_const_vector_dup): Ditto.
        (emit_vlmax_gather_insn): Add vrgatherei16.
        (emit_vlmax_masked_gather_mu_insn): Ditto.
        (expand_const_vector): Add VLA SLP const vector support.
        (expand_vec_perm): Support POLY.
        (struct expand_vec_perm_d): New struct.
        (shuffle_generic_patterns): New function.
        (expand_vec_perm_const_1): Ditto.
        (expand_vec_perm_const): Ditto.
        * config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto.
        (TARGET_VECTORIZE_VEC_PERM_CONST): New target hook.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA vectorizer.
        * gcc.target/riscv/rvv/autovec/v-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-4.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-5.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-6.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-7.c: New test.

---
 gcc/config/riscv/riscv-protos.h               |   2 +
 gcc/config/riscv/riscv-v.cc                   | 352 ++++++++++++++++--
 gcc/config/riscv/riscv.cc                     |  16 +
 .../riscv/rvv/autovec/partial/slp-1.c         |  22 ++
 .../riscv/rvv/autovec/partial/slp-2.c         |  22 ++
 .../riscv/rvv/autovec/partial/slp-3.c         |  22 ++
 .../riscv/rvv/autovec/partial/slp-4.c         |  22 ++
 .../riscv/rvv/autovec/partial/slp-5.c         |  22 ++
 .../riscv/rvv/autovec/partial/slp-6.c         |  23 ++
 .../riscv/rvv/autovec/partial/slp-7.c         |  15 +
 .../riscv/rvv/autovec/partial/slp_run-1.c     |  66 ++++
 .../riscv/rvv/autovec/partial/slp_run-2.c     |  67 ++++
 .../riscv/rvv/autovec/partial/slp_run-3.c     |  67 ++++
 .../riscv/rvv/autovec/partial/slp_run-4.c     |  67 ++++
 .../riscv/rvv/autovec/partial/slp_run-5.c     |  67 ++++
 .../riscv/rvv/autovec/partial/slp_run-6.c     |  67 ++++
 .../riscv/rvv/autovec/partial/slp_run-7.c     |  58 +++
 .../gcc.target/riscv/rvv/autovec/scalable-1.c |   2 +-
 .../gcc.target/riscv/rvv/autovec/v-1.c        |   7 +-
 .../riscv/rvv/autovec/zve32f_zvl128b-1.c      |   2 +-
 .../riscv/rvv/autovec/zve32x_zvl128b-1.c      |   2 +-
 .../gcc.target/riscv/rvv/autovec/zve64d-1.c   |   2 +-
 .../riscv/rvv/autovec/zve64d_zvl128b-1.c      |   2 +-
 .../gcc.target/riscv/rvv/autovec/zve64f-1.c   |   2 +-
 .../riscv/rvv/autovec/zve64f_zvl128b-1.c      |   2 +-
 .../riscv/rvv/autovec/zve64x_zvl128b-1.c      |   2 +-
 26 files changed, 963 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index d770e5e826e..27ecd16e496 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -168,6 +168,8 @@ void init_builtins (void);
 const char *mangle_builtin_type (const_tree);
 #ifdef GCC_TARGET_H
 bool verify_type_context (location_t, type_context_kind, const_tree, bool);
+bool expand_vec_perm_const (machine_mode, machine_mode, rtx, rtx, rtx,
+			    const vec_perm_indices &);
 #endif
 void handle_pragma_vector (void);
 tree builtin_decl (unsigned, bool);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 83277fc2c05..4864429ed06 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -251,9 +251,12 @@ public:
     m_inner_mode = GET_MODE_INNER (mode);
     m_inner_bits_size = GET_MODE_BITSIZE (m_inner_mode);
     m_inner_bytes_size = GET_MODE_SIZE (m_inner_mode);
+    m_mask_mode = get_mask_mode (mode).require ();
 
     gcc_assert (
       int_mode_for_size (inner_bits_size (), 0).exists (&m_inner_int_mode));
+    m_int_mode
+      = get_vector_mode (m_inner_int_mode, GET_MODE_NUNITS (mode)).require ();
   }
 
   bool can_duplicate_repeating_sequence_p ();
@@ -262,9 +265,14 @@ public:
   bool repeating_sequence_use_merge_profitable_p ();
   rtx get_merge_scalar_mask (unsigned int) const;
 
+  bool single_step_npatterns_p () const;
+  bool npatterns_all_equal_p () const;
+
   machine_mode new_mode () const { return m_new_mode; }
   scalar_mode inner_mode () const { return m_inner_mode; }
   scalar_int_mode inner_int_mode () const { return m_inner_int_mode; }
+  machine_mode mask_mode () const { return m_mask_mode; }
+  machine_mode int_mode () const { return m_int_mode; }
   unsigned int inner_bits_size () const { return m_inner_bits_size; }
   unsigned int inner_bytes_size () const { return m_inner_bytes_size; }
 
@@ -273,6 +281,8 @@ private:
   scalar_int_mode m_inner_int_mode;
   machine_mode m_new_mode;
   scalar_int_mode m_new_inner_mode;
+  machine_mode m_mask_mode;
+  machine_mode m_int_mode;
   unsigned int m_inner_bits_size;
   unsigned int m_inner_bytes_size;
 };
@@ -290,7 +300,9 @@ rvv_builder::can_duplicate_repeating_sequence_p ()
       || GET_MODE_SIZE (m_new_inner_mode) > UNITS_PER_WORD
       || !get_vector_mode (m_new_inner_mode, new_size).exists (&m_new_mode))
     return false;
-  return repeating_sequence_p (0, full_nelts ().to_constant (), npatterns ());
+  if (full_nelts ().is_constant ())
+    return repeating_sequence_p (0, full_nelts ().to_constant (), npatterns ());
+  return nelts_per_pattern () == 1;
 }
 
 /* Return true if it is a repeating sequence that using
@@ -398,6 +410,48 @@ rvv_builder::get_merge_scalar_mask (unsigned int index_in_pattern) const
   return gen_int_mode (mask, inner_int_mode ());
 }
 
+/* Return true if the variable-length vector is single step.  */
+bool
+rvv_builder::single_step_npatterns_p () const
+{
+  if (nelts_per_pattern () != 3)
+    return false;
+
+  poly_int64 step
+    = rtx_to_poly_int64 (elt (npatterns ())) - rtx_to_poly_int64 (elt (0));
+  for (unsigned int i = 0; i < npatterns (); i++)
+    {
+      poly_int64 ele0 = rtx_to_poly_int64 (elt (i));
+      poly_int64 ele1 = rtx_to_poly_int64 (elt (npatterns () + i));
+      poly_int64 ele2 = rtx_to_poly_int64 (elt (npatterns () * 2 + i));
+      poly_int64 diff1 = ele1 - ele0;
+      poly_int64 diff2 = ele2 - ele1;
+      if (maybe_ne (step, diff1) || maybe_ne (step, diff2))
+	return false;
+    }
+  return true;
+}
+
+/* Return true if all elements of NPATTERNS are equal.
+
+   E.g. NPATTERNS = 4:
+     { 2, 2, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8, 16, 16, 16, 16, ... }
+   E.g. NPATTERNS = 8:
+     { 2, 2, 2, 2, 2, 2, 2, 2, 8, 8, 8, 8, 8, 8, 8, 8, ... }
+*/
+bool
+rvv_builder::npatterns_all_equal_p () const
+{
+  poly_int64 ele0 = rtx_to_poly_int64 (elt (0));
+  for (unsigned int i = 1; i < npatterns (); i++)
+    {
+      poly_int64 ele = rtx_to_poly_int64 (elt (i));
+      if (!known_eq (ele, ele0))
+	return false;
+    }
+  return true;
+}
+
 static unsigned
 get_sew (machine_mode mode)
 {
@@ -425,7 +479,7 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT minval,
    future.  */
 
 static bool
-const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT maxval)
+const_vec_all_in_range_p (rtx vec, poly_int64 minval, poly_int64 maxval)
 {
   if (!CONST_VECTOR_P (vec)
       || GET_MODE_CLASS (GET_MODE (vec)) != MODE_VECTOR_INT)
@@ -440,8 +494,10 @@ const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT maxval)
   for (int i = 0; i < nunits; i++)
     {
       rtx vec_elem = CONST_VECTOR_ELT (vec, i);
-      if (!CONST_INT_P (vec_elem)
-	  || !IN_RANGE (INTVAL (vec_elem), minval, maxval))
+      poly_int64 value;
+      if (!poly_int_rtx_p (vec_elem, &value)
+	  || maybe_lt (value, minval)
+	  || maybe_gt (value, maxval))
 	return false;
     }
   return true;
@@ -453,7 +509,7 @@ const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT maxval)
    future.  */
 
 static rtx
-gen_const_vector_dup (machine_mode mode, HOST_WIDE_INT val)
+gen_const_vector_dup (machine_mode mode, poly_int64 val)
 {
   rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
   return gen_const_vec_duplicate (mode, c);
@@ -727,7 +783,10 @@ emit_vlmax_gather_insn (rtx target, rtx op, rtx sel)
   rtx elt;
   insn_code icode;
   machine_mode data_mode = GET_MODE (target);
-  if (const_vec_duplicate_p (sel, &elt))
+  machine_mode sel_mode = GET_MODE (sel);
+  if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode)))
+    icode = code_for_pred_gatherei16 (data_mode);
+  else if (const_vec_duplicate_p (sel, &elt))
     {
       icode = code_for_pred_gather_scalar (data_mode);
       sel = elt;
@@ -744,7 +803,10 @@ emit_vlmax_masked_gather_mu_insn (rtx target, rtx op, rtx sel, rtx mask)
   rtx elt;
   insn_code icode;
   machine_mode data_mode = GET_MODE (target);
-  if (const_vec_duplicate_p (sel, &elt))
+  machine_mode sel_mode = GET_MODE (sel);
+  if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode)))
+    icode = code_for_pred_gatherei16 (data_mode);
+  else if (const_vec_duplicate_p (sel, &elt))
     {
       icode = code_for_pred_gather_scalar (data_mode);
       sel = elt;
@@ -895,11 +957,130 @@ expand_const_vector (rtx target, rtx src)
       return;
     }
 
-  /* TODO: We only support const duplicate vector for now. More cases
-     will be supported when we support auto-vectorization:
+  /* Handle variable-length vector.  */
+  unsigned int nelts_per_pattern = CONST_VECTOR_NELTS_PER_PATTERN (src);
+  unsigned int npatterns = CONST_VECTOR_NPATTERNS (src);
+  rvv_builder builder (mode, npatterns, nelts_per_pattern);
+  for (unsigned int i = 0; i < nelts_per_pattern; i++)
+    {
+      for (unsigned int j = 0; j < npatterns; j++)
+	builder.quick_push (CONST_VECTOR_ELT (src, i * npatterns + j));
+    }
+  builder.finalize ();
 
-       1. multiple elts duplicate vector.
-       2. multiple patterns with multiple elts.  */
+  if (CONST_VECTOR_DUPLICATE_P (src))
+    {
+      if (builder.can_duplicate_repeating_sequence_p ())
+	{
+	  rtx ele = builder.get_merged_repeating_sequence ();
+	  rtx dup = expand_vector_broadcast (builder.new_mode (), ele);
+	  emit_move_insn (target, gen_lowpart (mode, dup));
+	}
+      else
+	{
+	  unsigned int nbits = npatterns - 1;
+
+	  /* Generate vid = { 0, 1, 2, 3, 4, 5, 6, 7, ... }.  */
+	  rtx vid = gen_reg_rtx (builder.int_mode ());
+	  rtx op[] = {vid};
+	  emit_vlmax_insn (code_for_pred_series (builder.int_mode ()),
+			   RVV_MISC_OP, op);
+
+	  /* Generate vid_repeat = { 0, 1, ... nbits, ... }  */
+	  rtx vid_repeat = gen_reg_rtx (builder.int_mode ());
+	  rtx and_ops[] = {vid_repeat, vid,
+			   gen_int_mode (nbits, builder.inner_int_mode ())};
+	  emit_vlmax_insn (code_for_pred_scalar (AND, builder.int_mode ()),
+			   RVV_BINOP, and_ops);
+
+	  rtx tmp = gen_reg_rtx (builder.mode ());
+	  rtx dup_ops[] = {tmp, builder.elt (0)};
+	  emit_vlmax_insn (code_for_pred_broadcast (builder.mode ()), RVV_UNOP,
+			   dup_ops);
+	  for (unsigned int i = 1; i < builder.npatterns (); i++)
+	    {
+	      /* Generate mask according to i.  */
+	      rtx mask = gen_reg_rtx (builder.mask_mode ());
+	      rtx const_vec = gen_const_vector_dup (builder.int_mode (), i);
+	      expand_vec_cmp (mask, EQ, vid_repeat, const_vec);
+
+	      /* Merge scalar to each i.  */
+	      rtx tmp2 = gen_reg_rtx (builder.mode ());
+	      rtx merge_ops[] = {tmp2, tmp, builder.elt (i), mask};
+	      insn_code icode = code_for_pred_merge_scalar (builder.mode ());
+	      emit_vlmax_merge_insn (icode, RVV_MERGE_OP, merge_ops);
+	      tmp = tmp2;
+	    }
+	  emit_move_insn (target, tmp);
+	}
+      return;
+    }
+  else if (CONST_VECTOR_STEPPED_P (src))
+    {
+      gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
+      if (builder.single_step_npatterns_p ())
+	{
+	  /* Describe the case by choosing NPATTERNS = 4 as an example.  */
+	  rtx base, step;
+	  if (builder.npatterns_all_equal_p ())
+	    {
+	      /* Generate the variable-length vector as below:
+		 E.g. { 0, 0, 0, 0, 8, 8, 8, 8, 16, 16, 16, 16, ... } */
+	      /* Step 1: Generate base = { 0, 0, 0, 0, 0, 0, 0, ... }.  */
+	      base = expand_vector_broadcast (builder.mode (), builder.elt (0));
+	    }
+	  else
+	    {
+	      /* Generate the variable-length vector as below:
+		 E.g. { 0, 6, 0, 6, 8, 14, 8, 14, 16, 22, 16, 22, ... } */
+	      /* Step 1: Generate base = { 0, 6, 0, 6, ... }.  */
+	      rvv_builder new_builder (builder.mode (), builder.npatterns (),
+				       1);
+	      for (unsigned int i = 0; i < builder.npatterns (); ++i)
+		new_builder.quick_push (builder.elt (i));
+	      rtx new_vec = new_builder.build ();
+	      base = gen_reg_rtx (builder.mode ());
+	      emit_move_insn (base, new_vec);
+	    }
+
+	  /* Step 2: Generate step = gen_int_mode (diff, mode).  */
+	  poly_int64 value1 = rtx_to_poly_int64 (builder.elt (0));
+	  poly_int64 value2
+	    = rtx_to_poly_int64 (builder.elt (builder.npatterns ()));
+	  poly_int64 diff = value2 - value1;
+	  step = gen_int_mode (diff, builder.inner_mode ());
+
+	  /* Step 3: Generate vid = { 0, 1, 2, 3, 4, 5, 6, 7, ... }.  */
+	  rtx vid = gen_reg_rtx (builder.mode ());
+	  rtx op[] = {vid};
+	  emit_vlmax_insn (code_for_pred_series (builder.mode ()), RVV_MISC_OP,
+			   op);
+
+	  /* Step 4: Generate factor = { 0, 0, 0, 0, 1, 1, 1, 1, ... }.  */
+	  rtx factor = gen_reg_rtx (builder.mode ());
+	  rtx shift_ops[]
+	    = {factor, vid,
+	       gen_int_mode (exact_log2 (builder.npatterns ()), Pmode)};
+	  emit_vlmax_insn (code_for_pred_scalar (LSHIFTRT, builder.mode ()),
+			   RVV_BINOP, shift_ops);
+
+	  /* Step 5: Generate adjusted step = { 0, 0, 0, 0, diff, diff, ... } */
+	  rtx adjusted_step = gen_reg_rtx (builder.mode ());
+	  rtx mul_ops[] = {adjusted_step, factor, step};
+	  emit_vlmax_insn (code_for_pred_scalar (MULT, builder.mode ()),
+			   RVV_BINOP, mul_ops);
+
+	  /* Step 6: Generate the final result.  */
+	  rtx add_ops[] = {target, base, adjusted_step};
+	  emit_vlmax_insn (code_for_pred (PLUS, builder.mode ()), RVV_BINOP,
+			   add_ops);
+	}
+      else
+	/* TODO: We will enable more variable-length vectors in the future.  */
+	gcc_unreachable ();
+    }
+  else
+    gcc_unreachable ();
 }
 
 /* Expand a pre-RA RVV data move from SRC to DEST.
@@ -2029,14 +2210,13 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
 {
   machine_mode data_mode = GET_MODE (target);
   machine_mode sel_mode = GET_MODE (sel);
-
-  /* Enforced by the pattern condition.  */
-  int nunits = GET_MODE_NUNITS (sel_mode).to_constant ();
+  poly_uint64 nunits = GET_MODE_NUNITS (sel_mode);
 
   /* Check if the sel only references the first values vector. If each select
      index is in range of [0, nunits - 1]. A single vrgather instructions is
-     enough.  */
-  if (const_vec_all_in_range_p (sel, 0, nunits - 1))
+     enough.  Since we use vrgatherei16.vv for variable-length vectors, the
+     indices are never out of range and we don't need to modulo them.  */
+  if (!nunits.is_constant () || const_vec_all_in_range_p (sel, 0, nunits - 1))
     {
       emit_vlmax_gather_insn (target, op0, sel);
       return;
@@ -2057,14 +2237,20 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
       return;
     }
 
-  /* Note: vec_perm indices are supposed to wrap when they go beyond the
-     size of the two value vectors, i.e. the upper bits of the indices
-     are effectively ignored.  RVV vrgather instead produces 0 for any
-     out-of-range indices, so we need to modulo all the vec_perm indices
-     to ensure they are all in range of [0, 2 * nunits - 1].  */
+  rtx sel_mod = sel;
   rtx max_sel = gen_const_vector_dup (sel_mode, 2 * nunits - 1);
-  rtx sel_mod
-    = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0, OPTAB_DIRECT);
+  /* We don't need to modulo the indices for VLA vectors, since they are
+     guaranteed to be in range beforehand.  */
+  if (nunits.is_constant ())
+    {
+      /* Note: vec_perm indices are supposed to wrap when they go beyond the
+	 size of the two value vectors, i.e. the upper bits of the indices
+	 are effectively ignored.  RVV vrgather instead produces 0 for any
+	 out-of-range indices, so we need to modulo all the vec_perm indices
+	 to ensure they are all in range of [0, 2 * nunits - 1].  */
+      sel_mod = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0,
+				     OPTAB_DIRECT);
+    }
 
   /* This following sequence is handling the case that:
      __builtin_shufflevector (vec1, vec2, index...), the index can be any
@@ -2094,4 +2280,124 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
   emit_vlmax_masked_gather_mu_insn (target, op1, tmp, mask);
 }
 
+/* Implement TARGET_VECTORIZE_VEC_PERM_CONST for RVV.  */
+
+/* vec_perm support.  */
+
+struct expand_vec_perm_d
+{
+  rtx target, op0, op1;
+  vec_perm_indices perm;
+  machine_mode vmode;
+  machine_mode op_mode;
+  bool one_vector_p;
+  bool testing_p;
+};
+
+/* Recognize the pattern that can be shuffled by generic approach.  */
+
+static bool
+shuffle_generic_patterns (struct expand_vec_perm_d *d)
+{
+  machine_mode sel_mode = related_int_vector_mode (d->vmode).require ();
+  poly_uint64 nunits = GET_MODE_NUNITS (d->vmode);
+
+  /* For constant size indices, we don't need to handle it here.
+     Just leave it to vec_perm<mode>.  */
+  if (d->perm.length ().is_constant ())
+    return false;
+
+  /* Permuting two SEW8 variable-length vectors needs vrgatherei16.vv.
+     Otherwise, it could overflow the index range.  */
+  if (GET_MODE_INNER (d->vmode) == QImode
+      && !get_vector_mode (HImode, nunits).exists (&sel_mode))
+    return false;
+
+  /* Success! */
+  if (d->testing_p)
+    return true;
+
+  rtx sel = vec_perm_indices_to_rtx (sel_mode, d->perm);
+  expand_vec_perm (d->target, d->op0, d->op1, force_reg (sel_mode, sel));
+  return true;
+}
+
+static bool
+expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
+{
+  gcc_assert (d->op_mode != E_VOIDmode);
+
+  /* The pattern matching functions above are written to look for a small
+     number to begin the sequence (0, 1, N/2).  If we begin with an index
+     from the second operand, we can swap the operands.  */
+  poly_int64 nelt = d->perm.length ();
+  if (known_ge (d->perm[0], nelt))
+    {
+      d->perm.rotate_inputs (1);
+      std::swap (d->op0, d->op1);
+    }
+
+  if (known_gt (nelt, 1))
+    {
+      if (d->vmode == d->op_mode)
+	{
+	  if (shuffle_generic_patterns (d))
+	    return true;
+	  return false;
+	}
+      else
+	return false;
+    }
+  return false;
+}
+
+bool
+expand_vec_perm_const (machine_mode vmode, machine_mode op_mode, rtx target,
+		       rtx op0, rtx op1, const vec_perm_indices &sel)
+{
+  /* RVV doesn't have Mask type pack/unpack instructions and we don't use
+     mask to do the iteration loop control. Just disable it directly.  */
+  if (GET_MODE_CLASS (vmode) == MODE_VECTOR_BOOL)
+    return false;
+
+  struct expand_vec_perm_d d;
+
+  /* Check whether the mask can be applied to a single vector.  */
+  if (sel.ninputs () == 1 || (op0 && rtx_equal_p (op0, op1)))
+    d.one_vector_p = true;
+  else if (sel.all_from_input_p (0))
+    {
+      d.one_vector_p = true;
+      op1 = op0;
+    }
+  else if (sel.all_from_input_p (1))
+    {
+      d.one_vector_p = true;
+      op0 = op1;
+    }
+  else
+    d.one_vector_p = false;
+
+  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2,
+		     sel.nelts_per_input ());
+  d.vmode = vmode;
+  d.op_mode = op_mode;
+  d.target = target;
+  d.op0 = op0;
+  if (op0 == op1)
+    d.op1 = d.op0;
+  else
+    d.op1 = op1;
+  d.testing_p = !target;
+
+  if (!d.testing_p)
+    return expand_vec_perm_const_1 (&d);
+
+  rtx_insn *last = get_last_insn ();
+  bool ret = expand_vec_perm_const_1 (&d);
+  gcc_assert (last == get_last_insn ());
+
+  return ret;
+}
+
 } // namespace riscv_vector
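
As a reading aid for the CONST_VECTOR_STEPPED_P branch of expand_const_vector in the riscv-v.cc changes, here is a minimal scalar sketch of Steps 1 to 6 in plain C++. It is not GCC code: build_stepped, its parameters and the fixed runtime length `len` are hypothetical names chosen for illustration, and it assumes NPATTERNS is a power of two, as the exact_log2 shift in the patch implies.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scalar model (not part of GCC) of the single-step
// NPATTERNS expansion.  BASE_PATTERN holds the first NPATTERNS
// elements, STEP is elt (NPATTERNS) - elt (0), and LEN stands in for
// the runtime vector length.
std::vector<int64_t>
build_stepped (const std::vector<int64_t> &base_pattern, int64_t step,
               unsigned len)
{
  unsigned npatterns = static_cast<unsigned> (base_pattern.size ());
  // Assumption: NPATTERNS is a power of two, so a right shift by
  // log2 (NPATTERNS) is the same as dividing the lane index by it.
  assert (npatterns != 0 && (npatterns & (npatterns - 1)) == 0);
  unsigned shift = 0;
  while ((1u << shift) < npatterns)
    ++shift;

  std::vector<int64_t> result (len);
  for (unsigned vid = 0; vid < len; ++vid)	// Step 3: vid = 0, 1, 2, ...
    {
      // Step 1: base repeats the first NPATTERNS elements.
      int64_t base = base_pattern[vid & (npatterns - 1)];
      // Step 4: factor = vid >> log2 (NPATTERNS).
      int64_t factor = vid >> shift;
      // Steps 5 and 6: result = base + factor * step.
      result[vid] = base + factor * step;
    }
  return result;
}
```

With the example from the patch comment, build_stepped ({0, 6, 0, 6}, 8, 12) yields { 0, 6, 0, 6, 8, 14, 8, 14, 16, 22, 16, 22 }. When all NPATTERNS leading elements are equal, the base degenerates to a plain broadcast, which corresponds to the npatterns_all_equal_p path.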
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index caa7858b864..5d22012b591 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7631,6 +7631,19 @@ riscv_vectorize_related_mode (machine_mode vector_mode, scalar_mode element_mode
   return default_vectorize_related_mode (vector_mode, element_mode, nunits);
 }
 
+/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
+
+static bool
+riscv_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
+				rtx target, rtx op0, rtx op1,
+				const vec_perm_indices &sel)
+{
+  if (TARGET_VECTOR && riscv_v_ext_vector_mode_p (vmode))
+    return riscv_vector::expand_vec_perm_const (vmode, op_mode, target, op0,
+						op1, sel);
+
+  return false;
+}
 
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
@@ -7930,6 +7943,9 @@ riscv_vectorize_related_mode (machine_mode vector_mode, scalar_mode element_mode
 #undef TARGET_VECTORIZE_RELATED_MODE
 #define TARGET_VECTORIZE_RELATED_MODE riscv_vectorize_related_mode
 
+#undef TARGET_VECTORIZE_VEC_PERM_CONST
+#define TARGET_VECTORIZE_VEC_PERM_CONST riscv_vectorize_vec_perm_const
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-riscv.h"
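
The hook above ends up in expand_vec_perm, whose comment in riscv-v.cc explains that vec_perm indices wrap while vrgather produces 0 for out-of-range indices. The following scalar sketch in plain C++ illustrates that difference for the fixed-length, two-operand case. It is not GCC code: gather_model and vec_perm_model are hypothetical names, and the "select op1 where the index is >= nunits" step is an assumption inferred from the masked gather call visible in the patch, since the mask computation itself is outside the quoted hunks. It also assumes nunits is a power of two, so that AND with 2 * nunits - 1 is a true modulo.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using vec = std::vector<uint32_t>;

// Hypothetical model of a vrgather-style gather: an out-of-range
// index yields 0 instead of wrapping.
vec
gather_model (const vec &src, const vec &idx)
{
  vec out (idx.size ());
  for (std::size_t i = 0; i < idx.size (); ++i)
    out[i] = idx[i] < src.size () ? src[idx[i]] : 0;
  return out;
}

// Hypothetical model of the fixed-length two-operand permute.
vec
vec_perm_model (const vec &op0, const vec &op1, const vec &sel)
{
  uint32_t nunits = static_cast<uint32_t> (op0.size ());
  assert (op1.size () == nunits && sel.size () == nunits);
  // Assumption: power-of-two nunits, so the AND below is a modulo.
  assert (nunits != 0 && (nunits & (nunits - 1)) == 0);

  // Wrap all indices into [0, 2 * nunits - 1].
  vec sel_mod (sel.size ());
  for (std::size_t i = 0; i < sel.size (); ++i)
    sel_mod[i] = sel[i] & (2 * nunits - 1);

  // Gather from op0; lanes whose index is >= nunits get 0 for now.
  vec target = gather_model (op0, sel_mod);

  // Overwrite only those lanes with op1[index - nunits], leaving the
  // other lanes undisturbed.
  for (std::size_t i = 0; i < sel.size (); ++i)
    if (sel_mod[i] >= nunits)
      target[i] = op1[sel_mod[i] - nunits];
  return target;
}
```

For op0 = { 10, 11, 12, 13 }, op1 = { 20, 21, 22, 23 } and sel = { 0, 5, 9, 7 }, the index 9 wraps to 1 and the model returns { 10, 21, 11, 23 }. The patch skips the wrap for variable-length vectors on the stated grounds that the indices are already in range.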
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c
new file mode 100644
index 00000000000..befb518e2dd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 37] + 1;
+      a[i * 8 + 1] = b[i * 8 + 37] + 2;
+      a[i * 8 + 2] = b[i * 8 + 37] + 8;
+      a[i * 8 + 3] = b[i * 8 + 37] + 4;
+      a[i * 8 + 4] = b[i * 8 + 37] + 5;
+      a[i * 8 + 5] = b[i * 8 + 37] + 6;
+      a[i * 8 + 6] = b[i * 8 + 37] + 7;
+      a[i * 8 + 7] = b[i * 8 + 37] + 3;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
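
The addends in slp-1.c form a repeating sequence { 1, 2, 8, 4, 5, 6, 7, 3, 1, 2, ... }, which is presumably the kind of constant that reaches the CONST_VECTOR_DUPLICATE_P branch of expand_const_vector. Whether this exact test takes the merge fallback rather than the broadcast path is an assumption, not something the patch states. The sketch below models that fallback in plain C++: mask the lane index with NPATTERNS - 1, broadcast element 0, then merge element i into every lane whose masked index equals i. build_repeating and `len` are hypothetical names, and NPATTERNS is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scalar model (not part of GCC) of the compare-and-merge
// fallback for a repeating constant vector.  PATTERN holds the
// NPATTERNS repeating elements and LEN stands in for the runtime
// vector length.
std::vector<int64_t>
build_repeating (const std::vector<int64_t> &pattern, unsigned len)
{
  unsigned npatterns = static_cast<unsigned> (pattern.size ());
  // Assumption: NPATTERNS is a power of two, so AND with
  // NPATTERNS - 1 gives the position inside the repeating group.
  assert (npatterns != 0 && (npatterns & (npatterns - 1)) == 0);
  unsigned nbits = npatterns - 1;

  // vid_repeat = { 0, 1, ..., nbits, 0, 1, ... }.
  std::vector<unsigned> vid_repeat (len);
  for (unsigned v = 0; v < len; ++v)
    vid_repeat[v] = v & nbits;

  // Start from a broadcast of element 0.
  std::vector<int64_t> tmp (len, pattern[0]);

  // For each remaining element, merge it into the matching lanes.
  for (unsigned i = 1; i < npatterns; ++i)
    for (unsigned v = 0; v < len; ++v)
      if (vid_repeat[v] == i)
	tmp[v] = pattern[i];
  return tmp;
}
```

Calling build_repeating ({1, 2, 8, 4, 5, 6, 7, 3}, 16) returns the eight-element pattern twice. The cost of this sequence grows with NPATTERNS (one compare and one merge per element), which may be why the patch tries the single-broadcast path first.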
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c
new file mode 100644
index 00000000000..ac817451295
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-2.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (int16_t *restrict a, int16_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 37] + 1;
+      a[i * 8 + 1] = b[i * 8 + 37] + 2;
+      a[i * 8 + 2] = b[i * 8 + 37] + 8;
+      a[i * 8 + 3] = b[i * 8 + 37] + 4;
+      a[i * 8 + 4] = b[i * 8 + 37] + 5;
+      a[i * 8 + 5] = b[i * 8 + 37] + 6;
+      a[i * 8 + 6] = b[i * 8 + 37] + 7;
+      a[i * 8 + 7] = b[i * 8 + 37] + 3;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c
new file mode 100644
index 00000000000..73962055b03
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 7] + 2;
+      a[i * 8 + 2] = b[i * 8 + 1] + 3;
+      a[i * 8 + 3] = b[i * 8 + 7] + 4;
+      a[i * 8 + 4] = b[i * 8 + 1] + 5;
+      a[i * 8 + 5] = b[i * 8 + 7] + 6;
+      a[i * 8 + 6] = b[i * 8 + 1] + 7;
+      a[i * 8 + 7] = b[i * 8 + 7] + 8;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c
new file mode 100644
index 00000000000..fa216fc8c40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-4.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (int16_t *restrict a, int16_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 7] + 2;
+      a[i * 8 + 2] = b[i * 8 + 1] + 3;
+      a[i * 8 + 3] = b[i * 8 + 7] + 4;
+      a[i * 8 + 4] = b[i * 8 + 1] + 5;
+      a[i * 8 + 5] = b[i * 8 + 7] + 6;
+      a[i * 8 + 6] = b[i * 8 + 1] + 7;
+      a[i * 8 + 7] = b[i * 8 + 7] + 8;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c
new file mode 100644
index 00000000000..899ed9e310b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-5.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 7] + 2;
+      a[i * 8 + 2] = b[i * 8 + 4] + 3;
+      a[i * 8 + 3] = b[i * 8 + 8] + 4;
+      a[i * 8 + 4] = b[i * 8 + 1] + 5;
+      a[i * 8 + 5] = b[i * 8 + 7] + 6;
+      a[i * 8 + 6] = b[i * 8 + 4] + 7;
+      a[i * 8 + 7] = b[i * 8 + 8] + 8;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c
new file mode 100644
index 00000000000..fb87cc00cea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-6.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (uint8_t *restrict a, uint8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 2] + 2;
+      a[i * 8 + 2] = b[i * 8 + 6] + 8;
+      a[i * 8 + 3] = b[i * 8 + 7] + 4;
+      a[i * 8 + 4] = b[i * 8 + 3] + 5;
+      a[i * 8 + 5] = b[i * 8 + 4] + 6;
+      a[i * 8 + 6] = b[i * 8 + 5] + 7;
+      a[i * 8 + 7] = b[i * 8 + 0] + 3;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c
new file mode 100644
index 00000000000..3dd744b586e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-7.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noipa))
+f (float *__restrict f, double *__restrict d, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      f[i * 2 + 0] = 1;
+      f[i * 2 + 1] = 2;
+      d[i] = 3;
+    }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c
new file mode 100644
index 00000000000..16f078a0433
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-1.c
@@ -0,0 +1,66 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-1.c"
+
+#define LIMIT 128
+void __attribute__ ((optimize (0)))
+f_golden (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 37] + 1;
+      a[i * 8 + 1] = b[i * 8 + 37] + 2;
+      a[i * 8 + 2] = b[i * 8 + 37] + 8;
+      a[i * 8 + 3] = b[i * 8 + 37] + 4;
+      a[i * 8 + 4] = b[i * 8 + 37] + 5;
+      a[i * 8 + 5] = b[i * 8 + 37] + 6;
+      a[i * 8 + 6] = b[i * 8 + 37] + 7;
+      a[i * 8 + 7] = b[i * 8 + 37] + 3;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
+  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
+  int8_t b_##NUM[NUM * 8 + 37] = {0};                                          \
+  for (int i = 0; i < NUM * 8 + 37; i++)                                       \
+    {                                                                          \
+      if (i % NUM == 0)                                                        \
+	b_##NUM[i] = (i + NUM) % LIMIT;                                        \
+      else                                                                     \
+	b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
+    }                                                                          \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
+	__builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c
new file mode 100644
index 00000000000..41f688f628c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-2.c
@@ -0,0 +1,67 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-2.c"
+
+#define LIMIT 32767
+
+void __attribute__ ((optimize (0)))
+f_golden (int16_t *restrict a, int16_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 37] + 1;
+      a[i * 8 + 1] = b[i * 8 + 37] + 2;
+      a[i * 8 + 2] = b[i * 8 + 37] + 8;
+      a[i * 8 + 3] = b[i * 8 + 37] + 4;
+      a[i * 8 + 4] = b[i * 8 + 37] + 5;
+      a[i * 8 + 5] = b[i * 8 + 37] + 6;
+      a[i * 8 + 6] = b[i * 8 + 37] + 7;
+      a[i * 8 + 7] = b[i * 8 + 37] + 3;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  int16_t a_##NUM[NUM * 8 + 8] = {0};                                          \
+  int16_t a_golden_##NUM[NUM * 8 + 8] = {0};                                   \
+  int16_t b_##NUM[NUM * 8 + 37] = {0};                                         \
+  for (int i = 0; i < NUM * 8 + 37; i++)                                       \
+    {                                                                          \
+      if (i % NUM == 0)                                                        \
+	b_##NUM[i] = (i + NUM) % LIMIT;                                        \
+      else                                                                     \
+	b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
+    }                                                                          \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
+	__builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c
new file mode 100644
index 00000000000..30996cb2c6e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-3.c
@@ -0,0 +1,67 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-3.c"
+
+#define LIMIT 128
+
+void __attribute__ ((optimize (0)))
+f_golden (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 7] + 2;
+      a[i * 8 + 2] = b[i * 8 + 1] + 3;
+      a[i * 8 + 3] = b[i * 8 + 7] + 4;
+      a[i * 8 + 4] = b[i * 8 + 1] + 5;
+      a[i * 8 + 5] = b[i * 8 + 7] + 6;
+      a[i * 8 + 6] = b[i * 8 + 1] + 7;
+      a[i * 8 + 7] = b[i * 8 + 7] + 8;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
+  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
+  int8_t b_##NUM[NUM * 8 + 8] = {0};                                           \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (i % NUM == 0)                                                        \
+	b_##NUM[i] = (i + NUM) % LIMIT;                                        \
+      else                                                                     \
+	b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
+    }                                                                          \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
+	__builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c
new file mode 100644
index 00000000000..3d43ef0890c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-4.c
@@ -0,0 +1,67 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-4.c"
+
+#define LIMIT 32767
+
+void __attribute__ ((optimize (0)))
+f_golden (int16_t *restrict a, int16_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 7] + 2;
+      a[i * 8 + 2] = b[i * 8 + 1] + 3;
+      a[i * 8 + 3] = b[i * 8 + 7] + 4;
+      a[i * 8 + 4] = b[i * 8 + 1] + 5;
+      a[i * 8 + 5] = b[i * 8 + 7] + 6;
+      a[i * 8 + 6] = b[i * 8 + 1] + 7;
+      a[i * 8 + 7] = b[i * 8 + 7] + 8;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  int16_t a_##NUM[NUM * 8 + 8] = {0};                                          \
+  int16_t a_golden_##NUM[NUM * 8 + 8] = {0};                                   \
+  int16_t b_##NUM[NUM * 8 + 8] = {0};                                          \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (i % NUM == 0)                                                        \
+	b_##NUM[i] = (i + NUM) % LIMIT;                                        \
+      else                                                                     \
+	b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
+    }                                                                          \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
+	__builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c
new file mode 100644
index 00000000000..814308bd7af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-5.c
@@ -0,0 +1,67 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-5.c"
+
+#define LIMIT 128
+
+void __attribute__ ((optimize (0)))
+f_golden (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 7] + 2;
+      a[i * 8 + 2] = b[i * 8 + 4] + 3;
+      a[i * 8 + 3] = b[i * 8 + 8] + 4;
+      a[i * 8 + 4] = b[i * 8 + 1] + 5;
+      a[i * 8 + 5] = b[i * 8 + 7] + 6;
+      a[i * 8 + 6] = b[i * 8 + 4] + 7;
+      a[i * 8 + 7] = b[i * 8 + 8] + 8;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
+  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
+  int8_t b_##NUM[NUM * 8 + 9] = {0};                                           \
+  for (int i = 0; i < NUM * 8 + 9; i++)                                        \
+    {                                                                          \
+      if (i % NUM == 0)                                                        \
+	b_##NUM[i] = (i + NUM) % LIMIT;                                        \
+      else                                                                     \
+	b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
+    }                                                                          \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
+	__builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c
new file mode 100644
index 00000000000..e317eeac2f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-6.c
@@ -0,0 +1,67 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-6.c"
+
+#define LIMIT 128
+
+void __attribute__ ((optimize (0)))
+f_golden (int8_t *restrict a, int8_t *restrict b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8 + 0] = b[i * 8 + 1] + 1;
+      a[i * 8 + 1] = b[i * 8 + 2] + 2;
+      a[i * 8 + 2] = b[i * 8 + 6] + 8;
+      a[i * 8 + 3] = b[i * 8 + 7] + 4;
+      a[i * 8 + 4] = b[i * 8 + 3] + 5;
+      a[i * 8 + 5] = b[i * 8 + 4] + 6;
+      a[i * 8 + 6] = b[i * 8 + 5] + 7;
+      a[i * 8 + 7] = b[i * 8 + 0] + 3;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  int8_t a_##NUM[NUM * 8 + 8] = {0};                                           \
+  int8_t a_golden_##NUM[NUM * 8 + 8] = {0};                                    \
+  int8_t b_##NUM[NUM * 8 + 9] = {0};                                           \
+  for (int i = 0; i < NUM * 8 + 9; i++)                                        \
+    {                                                                          \
+      if (i % NUM == 0)                                                        \
+	b_##NUM[i] = (i + NUM) % LIMIT;                                        \
+      else                                                                     \
+	b_##NUM[i] = (i - NUM) % (-LIMIT);                                     \
+    }                                                                          \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_##NUM, NUM);                                     \
+  for (int i = 0; i < NUM * 8 + 8; i++)                                        \
+    {                                                                          \
+      if (a_##NUM[i] != a_golden_##NUM[i])                                     \
+	__builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c
new file mode 100644
index 00000000000..a8e4781988e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-7.c
@@ -0,0 +1,58 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "slp-7.c"
+
+void __attribute__ ((optimize (0)))
+f_golden (float *__restrict f, double *__restrict d, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      f[i * 2 + 0] = 1;
+      f[i * 2 + 1] = 2;
+      d[i] = 3;
+    }
+}
+
+int
+main (void)
+{
+#define RUN(NUM)                                                               \
+  float a_##NUM[NUM * 2 + 2] = {0};                                            \
+  float a_golden_##NUM[NUM * 2 + 2] = {0};                                     \
+  double b_##NUM[NUM] = {0};                                                   \
+  double b_golden_##NUM[NUM] = {0};                                            \
+  f (a_##NUM, b_##NUM, NUM);                                                   \
+  f_golden (a_golden_##NUM, b_golden_##NUM, NUM);                              \
+  for (int i = 0; i < NUM; i++)                                                \
+    {                                                                          \
+      if (a_##NUM[i * 2 + 0] != a_golden_##NUM[i * 2 + 0])                     \
+	__builtin_abort ();                                                    \
+      if (a_##NUM[i * 2 + 1] != a_golden_##NUM[i * 2 + 1])                     \
+	__builtin_abort ();                                                    \
+      if (b_##NUM[i] != b_golden_##NUM[i])                                     \
+	__builtin_abort ();                                                    \
+    }
+
+  RUN (3);
+  RUN (5);
+  RUN (15);
+  RUN (16);
+  RUN (17);
+  RUN (31);
+  RUN (32);
+  RUN (33);
+  RUN (63);
+  RUN (64);
+  RUN (65);
+  RUN (127);
+  RUN (128);
+  RUN (129);
+  RUN (239);
+  RUN (359);
+  RUN (498);
+  RUN (799);
+  RUN (977);
+  RUN (5789);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c
index 500b0adce66..3c03a87377d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/scalable-1.c
@@ -14,4 +14,4 @@ f (int32_t *__restrict f, int32_t *__restrict d, int n)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
index 383c82a3b7c..e68d05f5f48 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
@@ -3,9 +3,4 @@
 
 #include "template-1.h"
 
-/* Currently, we don't support SLP auto-vectorization for VLA. But it's
-   necessary that we add this testcase here to make sure such unsupported SLP
-   auto-vectorization will not cause an ICE. We will enable "vect" checking when
-   we support SLP auto-vectorization for VLA in the future.  */
-
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 5 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c
index 23cc1c8651f..ecfda79e19a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c
@@ -3,4 +3,4 @@
 
 #include "template-1.h"
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c
index 4f130f02f67..1394f08f2b9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c
@@ -3,4 +3,4 @@
 
 #include "template-1.h"
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c
index 823d51a03cb..c5e89996fa4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c
@@ -3,4 +3,4 @@
 
 #include "template-1.h"
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c
index 5ead22746d3..6b320ca6f38 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c
@@ -3,4 +3,4 @@
 
 #include "template-1.h"
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 5 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c
index e03d1b44ca6..6c2a002de9c 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c
@@ -3,4 +3,4 @@
 
 #include "template-1.h"
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c
index 5bb2d9d96fa..ae3f066477c 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c
@@ -3,4 +3,4 @@
 
 #include "template-1.h"
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 4 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c
index 71820ece4b2..fc676a3865e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c
@@ -3,4 +3,4 @@
 
 #include "template-1.h"
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 3 "vect" } } */
-- 
2.36.1


end of thread, other threads:[~2023-06-07  2:38 UTC | newest]

Thread overview: 8+ messages
2023-06-06 11:46 [PATCH V2] RISC-V: Add RVV vwmacc/vwmaccu/vwmaccsu combine lowering optmization juzhe.zhong
2023-06-06 11:46 ` [PATCH] RISC-V: Enable SELECT_VL for RVV juzhe.zhong
2023-06-06 11:49   ` 钟居哲
2023-06-06 11:46 ` [PATCH] RISC-V: Support RVV VLA SLP auto-vectorization juzhe.zhong
  -- strict thread matches above, loose matches on Subject: below --
2023-06-06  4:16 juzhe.zhong
2023-06-06  6:55 ` Richard Biener
2023-06-07  0:38 ` juzhe.zhong
2023-06-07  2:38   ` Kito Cheng
