* [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization
@ 2023-07-07 14:32 Juzhe-Zhong
From: Juzhe-Zhong @ 2023-07-07 14:32 UTC
  To: gcc-patches; +Cc: kito.cheng, rdapp.gcc, jeffreyalaw, Juzhe-Zhong

This patch fully supports gather_load/scatter_store:
1. Support single-rgroup on both RV32 and RV64.
2. Support index element widths that are the same as or smaller than Pmode.
3. Support VLA SLP with gather/scatter.
4. Fully tested all gather/scatter cases with LMUL = M1/M2/M4/M8, for both VLA and VLS.
5. Fix a bug in handling (subreg:SI (const_poly_int:DI)).
6. Fix a bug in vec_perm, which is used by gather/scatter SLP.

All kinds of GATHER/SCATTER are normalized into LEN_MASK_*.
We fully support these 4 kinds of gather/scatter:
1. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and dummy mask (full vector).
2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and real mask.
3. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and dummy mask.
4. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and real mask.
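
The four combinations above can be pictured with a scalar reference model. This is only a sketch for illustration and is not code from the patch: the function name, the vlmax parameter, and the choice to leave inactive lanes untouched are assumptions. The bias operand is left out because the patterns in this patch require it to be zero (const_0_operand).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a length- and mask-controlled gather load
   of 32-bit elements.  Lane i is loaded only when i < len and mask[i] is
   set.  Inactive lanes of dest are left untouched, which is a
   simplification made by this sketch.  */
void
model_len_mask_gather_load (int32_t *dest, const unsigned char *base,
			    const uint32_t *offsets, size_t scale,
			    size_t len, const bool *mask, size_t vlmax)
{
  assert (len <= vlmax);
  for (size_t i = 0; i < vlmax; ++i)
    if (i < len && mask[i])
      /* Address of lane i is base + zero-extended offset * scale.  */
      memcpy (&dest[i], base + (size_t) offsets[i] * scale, sizeof dest[i]);
}
```

In this model a "dummy length" is len == vlmax and a "dummy mask" is an all-true mask, so case 1 loads every lane, and case 4 loads only lanes that pass both checks.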

We use vluxei/vsuxei (the unordered indexed loads/stores of RVV) to generate code for gather/scatter.
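
For orientation, loops of the following shape are the kind of indexed access that gather/scatter vectorization targets. This is a hand-written sketch, not a copy of the new tests: the function names and the uint8_t index type (an index narrower than Pmode, as in item 2 above) are assumptions, and whether a given loop is actually vectorized depends on the compiler options and cost model.

```c
#include <assert.h>
#include <stdint.h>

/* Gather shape: each iteration reads src at a data-dependent index.  */
void
gather_add (int32_t *restrict dest, const int32_t *restrict src,
	    const uint8_t *restrict index, int n)
{
  assert (dest && src && index);
  for (int i = 0; i < n; ++i)
    dest[i] += src[index[i]];
}

/* Scatter shape: each iteration writes dest at a data-dependent index.  */
void
scatter_store (int32_t *restrict dest, const int32_t *restrict src,
	       const uint8_t *restrict index, int n)
{
  assert (dest && src && index);
  for (int i = 0; i < n; ++i)
    dest[index[i]] = src[i] + 1;
}
```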

We also support strided loads/stores with vlse.v/vsse.v. Consider the following case:
#define TEST_LOOP(DATA_TYPE, BITS)                                             \
  void __attribute__ ((noinline, noclone))                                     \
  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,   \
			  INDEX##BITS stride, INDEX##BITS n)                   \
  {                                                                            \
    for (INDEX##BITS i = 0; i < n; ++i)                                        \
      dest[i] += src[i * stride];                                              \
  }

Codegen:
f_int8_t_8:
	ble	a3,zero,.L10
	li	a5,1
	mv	a4,a0
	bne	a2,a5,.L4
	li	a2,1
.L6:
	vsetvli	a5,a3,e8,m2,ta,ma
	vle8.v	v2,0(a0)
	vlse8.v	v4,0(a1),a2
	vsetvli	a6,zero,e8,m2,ta,ma
	sub	a3,a3,a5
	vadd.vv	v2,v2,v4
	vsetvli	zero,a5,e8,m2,ta,ma
	vse8.v	v2,0(a4)
	add	a0,a0,a5
	add	a1,a1,a5
	add	a4,a4,a5
	bne	a3,zero,.L6
.L10:
	ret

Here the strided load is emitted as vlse8.v rather than as an indexed vluxei load.
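
The reason a strided access can stand in for a gather is that it is a gather whose index vector is {0, stride, 2 * stride, ...}. The scalar sketch below illustrates that equivalence only. All names are hypothetical and it does not reflect how strided_load_store_p in the patch detects the pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Strided form: a single scalar stride describes every address.  */
void
strided_add_i8 (int8_t *restrict dest, const int8_t *restrict src,
		ptrdiff_t stride, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dest[i] += src[(ptrdiff_t) i * stride];
}

/* Gather form: the same addresses, read from an explicit index vector.  */
void
gather_add_i8 (int8_t *restrict dest, const int8_t *restrict src,
	       const ptrdiff_t *restrict index, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dest[i] += src[index[i]];
}

/* Build the index vector {0, stride, 2 * stride, ...}.  */
void
make_strided_index (ptrdiff_t *index, ptrdiff_t stride, size_t n)
{
  assert (index != NULL || n == 0);
  for (size_t i = 0; i < n; ++i)
    index[i] = (ptrdiff_t) i * stride;
}
```

With the same inputs both forms produce the same dest, but the strided form needs no index vector in memory or in a vector register.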

This patch has been tested on both RV32 and RV64.

gcc/ChangeLog:

        * config/riscv/autovec.md (len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>): New pattern.
        (len_mask_gather_load<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto.
        (len_mask_gather_load<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto.
        (len_mask_gather_load<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto.
        (len_mask_gather_load<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
        (len_mask_gather_load<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
        (len_mask_gather_load<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
        (len_mask_gather_load<mode><mode>): Ditto.
        (len_mask_scatter_store<VNX1_QHSD:mode><VNX1_QHSDI:mode>): Ditto.
        (len_mask_scatter_store<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto.
        (len_mask_scatter_store<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto.
        (len_mask_scatter_store<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto.
        (len_mask_scatter_store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
        (len_mask_scatter_store<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
        (len_mask_scatter_store<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
        (len_mask_scatter_store<mode><mode>): Ditto.
        * config/riscv/predicates.md (const_1_operand): New predicate.
        (vector_gs_offset_operand): Ditto.
        (vector_gs_scale_operand_16): Ditto.
        (vector_gs_scale_operand_32): Ditto.
        (vector_gs_scale_operand_64): Ditto.
        (vector_gs_extension_operand): Ditto.
        (vector_gs_scale_operand_16_rv32): Ditto.
        (vector_gs_scale_operand_32_rv32): Ditto.
        * config/riscv/riscv-protos.h (enum insn_type): Add gather/scatter.
        (expand_gather_scatter): New function.
        * config/riscv/riscv-v.cc (gen_const_vector_dup): Add gather/scatter.
        (emit_vlmax_masked_store_insn): New function.
        (emit_nonvlmax_masked_store_insn): Ditto.
        (modulo_sel_indices): Ditto.
        (expand_vec_perm): Fix SLP for gather/scatter.
        (prepare_gather_scatter): New function.
        (strided_load_store_p): Ditto.
        (expand_gather_scatter): Ditto.
        * config/riscv/riscv.cc (riscv_legitimize_move): Fix bug of (subreg:SI (DI CONST_POLY_INT)).
        * config/riscv/vector-iterators.md: Add gather/scatter.
        * config/riscv/vector.md (vec_duplicate<mode>): Use "@" instead.
        (@vec_duplicate<mode>): Ditto.
        (@pred_indexed_<order>store<VNX16_QHS:mode><VNX16_QHSDI:mode>): Fix name.
        (@pred_indexed_<order>store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/rvv.exp: Add gather/scatter tests.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c: New test.

---
 gcc/config/riscv/autovec.md                   | 256 ++++++++++++
 gcc/config/riscv/predicates.md                |  39 +-
 gcc/config/riscv/riscv-protos.h               |   3 +
 gcc/config/riscv/riscv-v.cc                   | 372 ++++++++++++++++--
 gcc/config/riscv/riscv.cc                     |  11 +-
 gcc/config/riscv/vector-iterators.md          | 118 +++++-
 gcc/config/riscv/vector.md                    |  30 +-
 .../autovec/gather-scatter/gather_load-1.c    |  38 ++
 .../autovec/gather-scatter/gather_load-10.c   |  35 ++
 .../autovec/gather-scatter/gather_load-11.c   |  32 ++
 .../autovec/gather-scatter/gather_load-12.c   | 112 ++++++
 .../autovec/gather-scatter/gather_load-2.c    |  38 ++
 .../autovec/gather-scatter/gather_load-3.c    |  35 ++
 .../autovec/gather-scatter/gather_load-4.c    |  35 ++
 .../autovec/gather-scatter/gather_load-5.c    |  35 ++
 .../autovec/gather-scatter/gather_load-6.c    |  35 ++
 .../autovec/gather-scatter/gather_load-7.c    |  35 ++
 .../autovec/gather-scatter/gather_load-8.c    |  35 ++
 .../autovec/gather-scatter/gather_load-9.c    |  35 ++
 .../gather-scatter/gather_load_run-1.c        |  41 ++
 .../gather-scatter/gather_load_run-10.c       |  41 ++
 .../gather-scatter/gather_load_run-11.c       |  39 ++
 .../gather-scatter/gather_load_run-12.c       | 124 ++++++
 .../gather-scatter/gather_load_run-2.c        |  41 ++
 .../gather-scatter/gather_load_run-3.c        |  41 ++
 .../gather-scatter/gather_load_run-4.c        |  41 ++
 .../gather-scatter/gather_load_run-5.c        |  41 ++
 .../gather-scatter/gather_load_run-6.c        |  41 ++
 .../gather-scatter/gather_load_run-7.c        |  41 ++
 .../gather-scatter/gather_load_run-8.c        |  41 ++
 .../gather-scatter/gather_load_run-9.c        |  41 ++
 .../gather-scatter/mask_gather_load-1.c       |  39 ++
 .../gather-scatter/mask_gather_load-10.c      |  36 ++
 .../gather-scatter/mask_gather_load-11.c      | 116 ++++++
 .../gather-scatter/mask_gather_load-2.c       |  39 ++
 .../gather-scatter/mask_gather_load-3.c       |  36 ++
 .../gather-scatter/mask_gather_load-4.c       |  36 ++
 .../gather-scatter/mask_gather_load-5.c       |  36 ++
 .../gather-scatter/mask_gather_load-6.c       |  36 ++
 .../gather-scatter/mask_gather_load-7.c       |  36 ++
 .../gather-scatter/mask_gather_load-8.c       |  36 ++
 .../gather-scatter/mask_gather_load-9.c       |  36 ++
 .../gather-scatter/mask_gather_load_run-1.c   |  48 +++
 .../gather-scatter/mask_gather_load_run-10.c  |  48 +++
 .../gather-scatter/mask_gather_load_run-11.c  | 140 +++++++
 .../gather-scatter/mask_gather_load_run-2.c   |  48 +++
 .../gather-scatter/mask_gather_load_run-3.c   |  48 +++
 .../gather-scatter/mask_gather_load_run-4.c   |  48 +++
 .../gather-scatter/mask_gather_load_run-5.c   |  48 +++
 .../gather-scatter/mask_gather_load_run-6.c   |  48 +++
 .../gather-scatter/mask_gather_load_run-7.c   |  48 +++
 .../gather-scatter/mask_gather_load_run-8.c   |  48 +++
 .../gather-scatter/mask_gather_load_run-9.c   |  48 +++
 .../gather-scatter/mask_scatter_store-1.c     |  39 ++
 .../gather-scatter/mask_scatter_store-10.c    |  36 ++
 .../gather-scatter/mask_scatter_store-2.c     |  39 ++
 .../gather-scatter/mask_scatter_store-3.c     |  36 ++
 .../gather-scatter/mask_scatter_store-4.c     |  36 ++
 .../gather-scatter/mask_scatter_store-5.c     |  36 ++
 .../gather-scatter/mask_scatter_store-6.c     |  36 ++
 .../gather-scatter/mask_scatter_store-7.c     |  36 ++
 .../gather-scatter/mask_scatter_store-8.c     |  36 ++
 .../gather-scatter/mask_scatter_store-9.c     |  36 ++
 .../gather-scatter/mask_scatter_store_run-1.c |  48 +++
 .../mask_scatter_store_run-10.c               |  48 +++
 .../gather-scatter/mask_scatter_store_run-2.c |  48 +++
 .../gather-scatter/mask_scatter_store_run-3.c |  48 +++
 .../gather-scatter/mask_scatter_store_run-4.c |  48 +++
 .../gather-scatter/mask_scatter_store_run-5.c |  48 +++
 .../gather-scatter/mask_scatter_store_run-6.c |  48 +++
 .../gather-scatter/mask_scatter_store_run-7.c |  48 +++
 .../gather-scatter/mask_scatter_store_run-8.c |  48 +++
 .../gather-scatter/mask_scatter_store_run-9.c |  48 +++
 .../autovec/gather-scatter/scatter_store-1.c  |  38 ++
 .../autovec/gather-scatter/scatter_store-10.c |  35 ++
 .../autovec/gather-scatter/scatter_store-2.c  |  38 ++
 .../autovec/gather-scatter/scatter_store-3.c  |  35 ++
 .../autovec/gather-scatter/scatter_store-4.c  |  35 ++
 .../autovec/gather-scatter/scatter_store-5.c  |  35 ++
 .../autovec/gather-scatter/scatter_store-6.c  |  35 ++
 .../autovec/gather-scatter/scatter_store-7.c  |  35 ++
 .../autovec/gather-scatter/scatter_store-8.c  |  35 ++
 .../autovec/gather-scatter/scatter_store-9.c  |  35 ++
 .../gather-scatter/scatter_store_run-1.c      |  40 ++
 .../gather-scatter/scatter_store_run-10.c     |  40 ++
 .../gather-scatter/scatter_store_run-2.c      |  40 ++
 .../gather-scatter/scatter_store_run-3.c      |  40 ++
 .../gather-scatter/scatter_store_run-4.c      |  40 ++
 .../gather-scatter/scatter_store_run-5.c      |  40 ++
 .../gather-scatter/scatter_store_run-6.c      |  40 ++
 .../gather-scatter/scatter_store_run-7.c      |  40 ++
 .../gather-scatter/scatter_store_run-8.c      |  40 ++
 .../gather-scatter/scatter_store_run-9.c      |  40 ++
 .../autovec/gather-scatter/strided_load-1.c   |  46 +++
 .../autovec/gather-scatter/strided_load-2.c   |  46 +++
 .../gather-scatter/strided_load_run-1.c       |  84 ++++
 .../gather-scatter/strided_load_run-2.c       |  84 ++++
 .../autovec/gather-scatter/strided_store-1.c  |  46 +++
 .../autovec/gather-scatter/strided_store-2.c  |  46 +++
 .../gather-scatter/strided_store_run-1.c      |  82 ++++
 .../gather-scatter/strided_store_run-2.c      |  82 ++++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |  23 ++
 102 files changed, 5084 insertions(+), 61 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 9e61b2e41d8..78b9b5a2edb 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -57,6 +57,262 @@
   }
 )
 
+;; =========================================================================
+;; == Gather Load
+;; =========================================================================
+
+(define_expand "len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
+  [(match_operand:VNX1_QHSD 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX1_QHSDI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX1_QHSD:gs_extension>")
+   (match_operand 4 "<VNX1_QHSD:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX1_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+(define_expand "len_mask_gather_load<VNX2_QHSD:mode><VNX2_QHSDI:mode>"
+  [(match_operand:VNX2_QHSD 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX2_QHSDI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX2_QHSD:gs_extension>")
+   (match_operand 4 "<VNX2_QHSD:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX2_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+(define_expand "len_mask_gather_load<VNX4_QHSD:mode><VNX4_QHSDI:mode>"
+  [(match_operand:VNX4_QHSD 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX4_QHSDI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX4_QHSD:gs_extension>")
+   (match_operand 4 "<VNX4_QHSD:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX4_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+(define_expand "len_mask_gather_load<VNX8_QHSD:mode><VNX8_QHSDI:mode>"
+  [(match_operand:VNX8_QHSD 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX8_QHSDI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX8_QHSD:gs_extension>")
+   (match_operand 4 "<VNX8_QHSD:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX8_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+(define_expand "len_mask_gather_load<VNX16_QHSD:mode><VNX16_QHSDI:mode>"
+  [(match_operand:VNX16_QHSD 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX16_QHSDI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX16_QHSD:gs_extension>")
+   (match_operand 4 "<VNX16_QHSD:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX16_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+(define_expand "len_mask_gather_load<VNX32_QHS:mode><VNX32_QHSI:mode>"
+  [(match_operand:VNX32_QHS 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX32_QHSI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX32_QHS:gs_extension>")
+   (match_operand 4 "<VNX32_QHS:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX32_QHS:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+(define_expand "len_mask_gather_load<VNX64_QH:mode><VNX64_QHI:mode>"
+  [(match_operand:VNX64_QH 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX64_QHI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX64_QH:gs_extension>")
+   (match_operand 4 "<VNX64_QH:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX64_QH:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+;; When SEW = 8 and LMUL = 8, we can't find any index mode with
+;; larger SEW. Since RVV indexed load/store supports zero extension
+;; implicitly and does not support scaling, we should only allow
+;; operands[3] and operands[4] to be const_1_operand.
+(define_expand "len_mask_gather_load<mode><mode>"
+  [(match_operand:VNX128_Q 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX128_Q 2 "vector_gs_offset_operand")
+   (match_operand 3 "const_1_operand")
+   (match_operand 4 "const_1_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+;; =========================================================================
+;; == Scatter Store
+;; =========================================================================
+
+(define_expand "len_mask_scatter_store<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX1_QHSDI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX1_QHSD:gs_extension>")
+   (match_operand 3 "<VNX1_QHSD:gs_scale>")
+   (match_operand:VNX1_QHSD 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX1_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+(define_expand "len_mask_scatter_store<VNX2_QHSD:mode><VNX2_QHSDI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX2_QHSDI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX2_QHSD:gs_extension>")
+   (match_operand 3 "<VNX2_QHSD:gs_scale>")
+   (match_operand:VNX2_QHSD 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX2_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+(define_expand "len_mask_scatter_store<VNX4_QHSD:mode><VNX4_QHSDI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX4_QHSDI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX4_QHSD:gs_extension>")
+   (match_operand 3 "<VNX4_QHSD:gs_scale>")
+   (match_operand:VNX4_QHSD 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX4_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+(define_expand "len_mask_scatter_store<VNX8_QHSD:mode><VNX8_QHSDI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX8_QHSDI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX8_QHSD:gs_extension>")
+   (match_operand 3 "<VNX8_QHSD:gs_scale>")
+   (match_operand:VNX8_QHSD 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX8_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+(define_expand "len_mask_scatter_store<VNX16_QHSD:mode><VNX16_QHSDI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX16_QHSDI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX16_QHSD:gs_extension>")
+   (match_operand 3 "<VNX16_QHSD:gs_scale>")
+   (match_operand:VNX16_QHSD 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX16_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+(define_expand "len_mask_scatter_store<VNX32_QHS:mode><VNX32_QHSI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX32_QHSI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX32_QHS:gs_extension>")
+   (match_operand 3 "<VNX32_QHS:gs_scale>")
+   (match_operand:VNX32_QHS 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX32_QHS:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+(define_expand "len_mask_scatter_store<VNX64_QH:mode><VNX64_QHI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX64_QHI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX64_QH:gs_extension>")
+   (match_operand 3 "<VNX64_QH:gs_scale>")
+   (match_operand:VNX64_QH 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX64_QH:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+;; When SEW = 8 and LMUL = 8, we can't find any index mode with
+;; larger SEW. Since RVV indexed load/store supports zero extension
+;; implicitly and does not support scaling, we should only allow
+;; operands[2] and operands[3] to be const_1_operand.
+(define_expand "len_mask_scatter_store<mode><mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX128_Q 1 "vector_gs_offset_operand")
+   (match_operand 2 "const_1_operand")
+   (match_operand 3 "const_1_operand")
+   (match_operand:VNX128_Q 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
 ;; =========================================================================
 ;; == Vector creation
 ;; =========================================================================
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index eb975eaf994..5a65334e943 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -61,6 +61,10 @@
   (and (match_code "const_int,const_wide_int,const_vector")
        (match_test "op == CONST0_RTX (GET_MODE (op))")))
 
+(define_predicate "const_1_operand"
+  (and (match_code "const_int,const_wide_int,const_vector")
+       (match_test "op == CONST1_RTX (GET_MODE (op))")))
+
 (define_predicate "reg_or_0_operand"
   (ior (match_operand 0 "const_0_operand")
        (match_operand 0 "register_operand")))
@@ -341,6 +345,39 @@
   (ior (match_operand 0 "register_operand")
        (match_code "const_vector")))
 
+(define_predicate "vector_gs_offset_operand"
+  (ior (match_operand 0 "register_operand")
+       (and (match_code "const_vector")
+            (match_test "CONST_VECTOR_NPATTERNS (op) == 1
+	                 && !CONST_VECTOR_DUPLICATE_P (op)"))))
+
+(define_predicate "vector_gs_scale_operand_16"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) == 1 || INTVAL (op) == 2")))
+
+(define_predicate "vector_gs_scale_operand_32"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) == 1 || INTVAL (op) == 4")))
+
+(define_predicate "vector_gs_scale_operand_64"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) == 1 || (INTVAL (op) == 8 && Pmode == DImode)")))
+
+(define_predicate "vector_gs_extension_operand"
+  (ior (match_operand 0 "const_1_operand")
+       (and (match_operand 0 "const_0_operand")
+            (match_test "Pmode == SImode"))))
+
+(define_predicate "vector_gs_scale_operand_16_rv32"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) == 1
+		    || (INTVAL (op) == 2 && Pmode == SImode)")))
+
+(define_predicate "vector_gs_scale_operand_32_rv32"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) == 1
+		    || (INTVAL (op) == 4 && Pmode == SImode)")))
+
 (define_predicate "ltge_operator"
   (match_code "lt,ltu,ge,geu"))
 
@@ -376,7 +413,7 @@
 		|| rtx_equal_p (op, CONST0_RTX (GET_MODE (op))))
 		&& maybe_gt (GET_MODE_BITSIZE (GET_MODE (op)), GET_MODE_BITSIZE (Pmode)))")
     (ior (match_test "rtx_equal_p (op, CONST0_RTX (GET_MODE (op)))")
-         (ior (match_operand 0 "const_int_operand")
+         (ior (match_code "const_int,const_poly_int")
               (ior (match_operand 0 "register_operand")
                    (match_test "satisfies_constraint_Wdm (op)"))))))
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5766e3597e8..fd6caccc183 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -148,6 +148,8 @@ enum insn_type
   RVV_WIDEN_TERNOP = 4,
   RVV_SCALAR_MOV_OP = 4, /* +1 for VUNDEF according to vector.md.  */
   RVV_SLIDE_OP = 4,      /* Dest, VUNDEF, source and offset.  */
+  RVV_GATHER_M_OP = 5,
+  RVV_SCATTER_M_OP = 4,
 };
 enum vlmul_type
 {
@@ -255,6 +257,7 @@ void expand_vec_init (rtx, rtx);
 void expand_vec_perm (rtx, rtx, rtx, rtx);
 void expand_select_vl (rtx *);
 void expand_load_store (rtx *, bool);
+void expand_gather_scatter (rtx *, bool);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 8d5bed7ebe4..dd36b3b71c7 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -49,6 +49,7 @@
 #include "tm-constrs.h"
 #include "rtx-vector-builder.h"
 #include "targhooks.h"
+#include "gimple.h"
 
 using namespace riscv_vector;
 
@@ -556,15 +557,22 @@ const_vec_all_in_range_p (rtx vec, poly_int64 minval, poly_int64 maxval)
   return true;
 }
 
-/* Return a const_int vector of VAL.
-
-   This function also exists in aarch64, we may unify it in middle-end in the
-   future.  */
+/* Return a const vector of VAL. The VAL can be either const_int or
+   const_poly_int.  */
 
 static rtx
 gen_const_vector_dup (machine_mode mode, poly_int64 val)
 {
-  rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
+  scalar_mode smode = GET_MODE_INNER (mode);
+  rtx c = gen_int_mode (val, smode);
+  if (!val.is_constant () && GET_MODE_SIZE (smode) > GET_MODE_SIZE (Pmode))
+    {
+      /* When VAL is a const_poly_int value, we need to explicitly broadcast
+	 it into a vector using an RVV broadcast instruction.  */
+      rtx dup = gen_reg_rtx (mode);
+      emit_insn (gen_vec_duplicate (mode, dup, c));
+      return dup;
+    }
   return gen_const_vec_duplicate (mode, c);
 }
 
@@ -901,6 +909,39 @@ emit_nonvlmax_masked_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
   e.emit_insn ((enum insn_code) icode, ops);
 }
 
+/* This function emits a VLMAX masked store instruction.  */
+static void
+emit_vlmax_masked_store_insn (unsigned icode, int op_num, rtx *ops)
+{
+  machine_mode dest_mode = GET_MODE (ops[0]);
+  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
+  insn_expander<RVV_INSN_OPERANDS_MAX> e (/*OP_NUM*/ op_num,
+					  /*HAS_DEST_P*/ false,
+					  /*FULLY_UNMASKED_P*/ false,
+					  /*USE_REAL_MERGE_P*/ true,
+					  /*HAS_AVL_P*/ true,
+					  /*VLMAX_P*/ true, dest_mode,
+					  mask_mode);
+  e.emit_insn ((enum insn_code) icode, ops);
+}
+
+/* This function emits a non-VLMAX masked store instruction.  */
+static void
+emit_nonvlmax_masked_store_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
+{
+  machine_mode dest_mode = GET_MODE (ops[0]);
+  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
+  insn_expander<RVV_INSN_OPERANDS_MAX> e (/*OP_NUM*/ op_num,
+					  /*HAS_DEST_P*/ false,
+					  /*FULLY_UNMASKED_P*/ false,
+					  /*USE_REAL_MERGE_P*/ true,
+					  /*HAS_AVL_P*/ true,
+					  /*VLMAX_P*/ false, dest_mode,
+					  mask_mode);
+  e.set_vl (avl);
+  e.emit_insn ((enum insn_code) icode, ops);
+}
+
 /* This function emits a masked instruction.  */
 void
 emit_vlmax_masked_mu_insn (unsigned icode, int op_num, rtx *ops)
@@ -1137,7 +1178,6 @@ static void
 expand_const_vector (rtx target, rtx src)
 {
   machine_mode mode = GET_MODE (target);
-  scalar_mode elt_mode = GET_MODE_INNER (mode);
   if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
     {
       rtx elt;
@@ -1162,7 +1202,6 @@ expand_const_vector (rtx target, rtx src)
 	}
       else
 	{
-	  elt = force_reg (elt_mode, elt);
 	  rtx ops[] = {tmp, elt};
 	  emit_vlmax_insn (code_for_pred_broadcast (mode), RVV_UNOP, ops);
 	}
@@ -2431,6 +2470,25 @@ expand_vec_cmp_float (rtx target, rtx_code code, rtx op0, rtx op1,
   return false;
 }
 
+/* Modulo all SEL indices to ensure they are all in range of [0, MAX_SEL].  */
+static rtx
+modulo_sel_indices (rtx sel, poly_uint64 max_sel)
+{
+  rtx sel_mod;
+  machine_mode sel_mode = GET_MODE (sel);
+  poly_uint64 nunits = GET_MODE_NUNITS (sel_mode);
+  /* If SEL is a variable-length CONST_VECTOR, we don't need to modulo it.  */
+  if (!nunits.is_constant () && CONST_VECTOR_P (sel))
+    sel_mod = sel;
+  else
+    {
+      rtx mod = gen_const_vector_dup (sel_mode, max_sel);
+      sel_mod
+	= expand_simple_binop (sel_mode, AND, sel, mod, NULL, 0, OPTAB_DIRECT);
+    }
+  return sel_mod;
+}
+
 /* Implement vec_perm<mode>.  */
 
 void
@@ -2444,41 +2502,43 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
      index is in range of [0, nunits - 1]. A single vrgather instructions is
      enough. Since we will use vrgatherei16.vv for variable-length vector,
      it is never out of range and we don't need to modulo the index.  */
-  if (!nunits.is_constant () || const_vec_all_in_range_p (sel, 0, nunits - 1))
+  if (nunits.is_constant () && const_vec_all_in_range_p (sel, 0, nunits - 1))
     {
       emit_vlmax_gather_insn (target, op0, sel);
       return;
     }
 
+  /* Check if all the indices are the same.  */
+  rtx elt;
+  if (const_vec_duplicate_p (sel, &elt))
+    {
+      poly_uint64 value = rtx_to_poly_int64 (elt);
+      rtx op = op0;
+      if (maybe_gt (value, nunits - 1))
+	{
+	  sel = gen_const_vector_dup (sel_mode, value - nunits);
+	  op = op1;
+	}
+      emit_vlmax_gather_insn (target, op, sel);
+    }
+
+  /* Note: vec_perm indices are supposed to wrap when they go beyond the
+     size of the two value vectors, i.e. the upper bits of the indices
+     are effectively ignored.  RVV vrgather instead produces 0 for any
+     out-of-range indices, so we need to modulo all the vec_perm indices
+     to ensure they are all in range of [0, nunits - 1] when op0 == op1
+     or all in range of [0, 2 * nunits - 1] when op0 != op1.  */
+  rtx sel_mod
+    = modulo_sel_indices (sel,
+			  rtx_equal_p (op0, op1) ? nunits - 1 : 2 * nunits - 1);
   /* Check if the two values vectors are the same.  */
-  if (rtx_equal_p (op0, op1) || const_vec_duplicate_p (sel))
-    {
-      /* Note: vec_perm indices are supposed to wrap when they go beyond the
-	 size of the two value vectors, i.e. the upper bits of the indices
-	 are effectively ignored.  RVV vrgather instead produces 0 for any
-	 out-of-range indices, so we need to modulo all the vec_perm indices
-	 to ensure they are all in range of [0, nunits - 1].  */
-      rtx max_sel = gen_const_vector_dup (sel_mode, nunits - 1);
-      rtx sel_mod = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0,
-					 OPTAB_DIRECT);
-      emit_vlmax_gather_insn (target, op1, sel_mod);
+  if (rtx_equal_p (op0, op1))
+    {
+      emit_vlmax_gather_insn (target, op0, sel_mod);
       return;
     }
 
-  rtx sel_mod = sel;
   rtx max_sel = gen_const_vector_dup (sel_mode, 2 * nunits - 1);
-  /* We don't need to modulo indices for VLA vector.
-     Since we should gurantee they aren't out of range before.  */
-  if (nunits.is_constant ())
-    {
-      /* Note: vec_perm indices are supposed to wrap when they go beyond the
-	 size of the two value vectors, i.e. the upper bits of the indices
-	 are effectively ignored.  RVV vrgather instead produces 0 for any
-	 out-of-range indices, so we need to modulo all the vec_perm indices
-	 to ensure they are all in range of [0, 2 * nunits - 1].  */
-      sel_mod = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0,
-				     OPTAB_DIRECT);
-    }
 
   /* This following sequence is handling the case that:
      __builtin_shufflevector (vec1, vec2, index...), the index can be any
@@ -2812,4 +2872,252 @@ expand_load_store (rtx *ops, bool is_load)
     }
 }
 
+/* Prepare insn_code for gather_load/scatter_store according to
+   the vector mode and index mode.  */
+static insn_code
+prepare_gather_scatter (machine_mode vec_mode, machine_mode idx_mode,
+			bool is_load)
+{
+  if (!is_load)
+    return code_for_pred_indexed_store (UNSPEC_UNORDERED, vec_mode, idx_mode);
+  else
+    {
+      unsigned src_eew_bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (idx_mode));
+      unsigned dst_eew_bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (vec_mode));
+      if (dst_eew_bitsize == src_eew_bitsize)
+	return code_for_pred_indexed_load_same_eew (UNSPEC_UNORDERED, vec_mode);
+      else if (dst_eew_bitsize > src_eew_bitsize)
+	{
+	  unsigned factor = dst_eew_bitsize / src_eew_bitsize;
+	  switch (factor)
+	    {
+	    case 2:
+	      return code_for_pred_indexed_load_x2_greater_eew (
+		UNSPEC_UNORDERED, vec_mode);
+	    case 4:
+	      return code_for_pred_indexed_load_x4_greater_eew (
+		UNSPEC_UNORDERED, vec_mode);
+	    case 8:
+	      return code_for_pred_indexed_load_x8_greater_eew (
+		UNSPEC_UNORDERED, vec_mode);
+	    default:
+	      gcc_unreachable ();
+	    }
+	}
+      else
+	{
+	  unsigned factor = src_eew_bitsize / dst_eew_bitsize;
+	  switch (factor)
+	    {
+	    case 2:
+	      return code_for_pred_indexed_load_x2_smaller_eew (
+		UNSPEC_UNORDERED, vec_mode);
+	    case 4:
+	      return code_for_pred_indexed_load_x4_smaller_eew (
+		UNSPEC_UNORDERED, vec_mode);
+	    case 8:
+	      return code_for_pred_indexed_load_x8_smaller_eew (
+		UNSPEC_UNORDERED, vec_mode);
+	    default:
+	      gcc_unreachable ();
+	    }
+	}
+    }
+}
+
+/* Return true if it is the strided load/store.  */
+static bool
+strided_load_store_p (rtx vec_offset, rtx *base, rtx *step)
+{
+  if (const_vec_series_p (vec_offset, base, step))
+    return true;
+
+  /* For strided load/store, vectorizer always generates
+     VEC_SERIES_EXPR for vec_offset.  */
+  tree expr = REG_EXPR (vec_offset);
+  if (!expr || TREE_CODE (expr) != SSA_NAME)
+    return false;
+
+  /* Check if it is GIMPLE like: _88 = VEC_SERIES_EXPR <0, _87>;  */
+  gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
+  if (!def_stmt || !is_gimple_assign (def_stmt)
+      || gimple_assign_rhs_code (def_stmt) != VEC_SERIES_EXPR)
+    return false;
+
+  tree baset = gimple_assign_rhs1 (def_stmt);
+  tree stept = gimple_assign_rhs2 (def_stmt);
+  *base = expand_normal (baset);
+  *step = expand_normal (stept);
+
+  if (!rtx_equal_p (*base, const0_rtx))
+    return false;
+  return true;
+}
+
+/* Expand LEN_MASK_{GATHER_LOAD,SCATTER_STORE}.  */
+void
+expand_gather_scatter (rtx *ops, bool is_load)
+{
+  rtx ptr, vec_offset, vec_reg, len, mask;
+  bool zero_extend_p;
+  int scale_log2;
+  if (is_load)
+    {
+      vec_reg = ops[0];
+      ptr = ops[1];
+      vec_offset = ops[2];
+      zero_extend_p = INTVAL (ops[3]);
+      scale_log2 = exact_log2 (INTVAL (ops[4]));
+      len = ops[5];
+      mask = ops[7];
+    }
+  else
+    {
+      vec_reg = ops[4];
+      ptr = ops[0];
+      vec_offset = ops[1];
+      zero_extend_p = INTVAL (ops[2]);
+      scale_log2 = exact_log2 (INTVAL (ops[3]));
+      len = ops[5];
+      mask = ops[7];
+    }
+
+  machine_mode vec_mode = GET_MODE (vec_reg);
+  machine_mode idx_mode = GET_MODE (vec_offset);
+  scalar_mode inner_vec_mode = GET_MODE_INNER (vec_mode);
+  scalar_mode inner_idx_mode = GET_MODE_INNER (idx_mode);
+  unsigned inner_vsize = GET_MODE_BITSIZE (inner_vec_mode);
+  unsigned inner_offsize = GET_MODE_BITSIZE (inner_idx_mode);
+  poly_int64 nunits = GET_MODE_NUNITS (vec_mode);
+  poly_int64 value;
+  bool is_vlmax = poly_int_rtx_p (len, &value) && known_eq (value, nunits);
+
+  /* We use vlse.v/vsse.v instead of indexed load/store by default
+     if it is strided load/store.
+
+     FIXME: vlse.v/vsse.v may not always be better than vluxei.v/vsuxei.v.
+     We may need a cost model to adjust it.  */
+  rtx base, step;
+  if (strided_load_store_p (vec_offset, &base, &step))
+    {
+      if (GET_MODE (step) != Pmode)
+	{
+	  if (CONSTANT_P (step))
+	    step = force_reg (Pmode, step);
+	  else
+	    {
+	      rtx extend_step = gen_reg_rtx (Pmode);
+	      emit_insn (gen_extend_insn (extend_step, step, Pmode,
+					  GET_MODE (step),
+					  zero_extend_p ? true : false));
+	      step = extend_step;
+	    }
+	}
+      if (scale_log2 != 0)
+	{
+	  rtx scale_step = gen_reg_rtx (Pmode);
+	  rtx tmp = expand_simple_binop (Pmode, ASHIFT, step,
+					 gen_int_mode (scale_log2, Pmode),
+					 NULL_RTX, false, OPTAB_DIRECT);
+	  emit_move_insn (scale_step, tmp);
+	  step = scale_step;
+	}
+
+      rtx mem = validize_mem (gen_rtx_MEM (vec_mode, ptr));
+      /* Emit vlse.v if it's a load. Otherwise, emit vsse.v.  */
+      if (is_load)
+	{
+	  insn_code icode = code_for_pred_strided_load (vec_mode);
+	  rtx load_ops[] = {vec_reg, mask, RVV_VUNDEF (vec_mode), mem, step};
+	  if (is_vlmax)
+	    emit_vlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops);
+	  else
+	    emit_nonvlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops, len);
+	}
+      else
+	{
+	  if (is_vlmax)
+	    {
+	      rtx vlmax_len = gen_reg_rtx (Pmode);
+	      emit_vlmax_vsetvl (vec_mode, vlmax_len);
+	      emit_insn (gen_pred_strided_store (vec_mode, mem, mask, step,
+						 vec_reg, vlmax_len,
+						 get_avl_type_rtx (VLMAX)));
+	    }
+	  else
+	    emit_insn (gen_pred_strided_store (vec_mode, mem, mask, step,
+					       vec_reg, len,
+					       get_avl_type_rtx (NONVLMAX)));
+	}
+      return;
+    }
+
+  if (inner_offsize < inner_vsize)
+    {
+      /* 7.2. Vector Load/Store Addressing Modes.
+	 If the vector offset elements are narrower than XLEN, they are
+	 zero-extended to XLEN before adding to the ptr effective address. If
+	 the vector offset elements are wider than XLEN, the least-significant
+	 XLEN bits are used in the address calculation. An implementation must
+	 raise an illegal instruction exception if the EEW is not supported for
+	 offset elements.
+
+	 RVV spec only refers to the scale_log2 == 0 case.  */
+      if (!zero_extend_p || (zero_extend_p && scale_log2 != 0))
+	{
+	  if (zero_extend_p)
+	    inner_idx_mode
+	      = int_mode_for_size (inner_offsize * 2, 0).require ();
+	  else
+	    inner_idx_mode = int_mode_for_size (BITS_PER_WORD, 0).require ();
+	  machine_mode new_idx_mode
+	    = get_vector_mode (inner_idx_mode, nunits).require ();
+	  rtx tmp = gen_reg_rtx (new_idx_mode);
+	  emit_insn (gen_extend_insn (tmp, vec_offset, new_idx_mode, idx_mode,
+				      zero_extend_p ? true : false));
+	  vec_offset = tmp;
+	  idx_mode = new_idx_mode;
+	}
+    }
+
+  if (scale_log2 != 0)
+    {
+      rtx tmp = expand_binop (idx_mode, ashl_optab, vec_offset,
+			      gen_int_mode (scale_log2, Pmode), NULL_RTX, 0,
+			      OPTAB_DIRECT);
+      vec_offset = tmp;
+    }
+
+  insn_code icode = prepare_gather_scatter (vec_mode, idx_mode, is_load);
+  if (is_vlmax)
+    {
+      if (is_load)
+	{
+	  rtx load_ops[]
+	    = {vec_reg, mask, RVV_VUNDEF (vec_mode), ptr, vec_offset};
+	  emit_vlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops);
+	}
+      else
+	{
+	  rtx store_ops[] = {mask, ptr, vec_offset, vec_reg};
+	  emit_vlmax_masked_store_insn (icode, RVV_SCATTER_M_OP, store_ops);
+	}
+    }
+  else
+    {
+      if (is_load)
+	{
+	  rtx load_ops[]
+	    = {vec_reg, mask, RVV_VUNDEF (vec_mode), ptr, vec_offset};
+	  emit_nonvlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops, len);
+	}
+      else
+	{
+	  rtx store_ops[] = {mask, ptr, vec_offset, vec_reg};
+	  emit_nonvlmax_masked_store_insn (icode, RVV_SCATTER_M_OP, store_ops,
+					   len);
+	}
+    }
+}
+
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 38d8eb2fcf5..8970f6da6ad 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2060,7 +2060,14 @@ riscv_legitimize_poly_move (machine_mode mode, rtx dest, rtx tmp, rtx src)
      (m, n) = base * magn + constant.
      This calculation doesn't need div operation.  */
 
-  emit_move_insn (tmp, gen_int_mode (BYTES_PER_RISCV_VECTOR, mode));
+  if (mode <= Pmode)
+    emit_move_insn (tmp, gen_int_mode (BYTES_PER_RISCV_VECTOR, mode));
+  else
+    {
+      emit_move_insn (gen_highpart (Pmode, tmp), CONST0_RTX (Pmode));
+      emit_move_insn (gen_lowpart (Pmode, tmp),
+		      gen_int_mode (BYTES_PER_RISCV_VECTOR, Pmode));
+    }
 
   if (BYTES_PER_RISCV_VECTOR.is_constant ())
     {
@@ -2167,7 +2174,7 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx src)
 	  return false;
 	}
 
-      if (satisfies_constraint_vp (src))
+      if (satisfies_constraint_vp (src) && GET_MODE (src) == Pmode)
 	return false;
 
       if (GET_MODE_SIZE (mode).to_constant () < GET_MODE_SIZE (Pmode))
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 8afd3dcaddd..ec49544bcf6 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -115,6 +115,9 @@
 
 (define_mode_iterator VEEWEXT2 [
   (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI "TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_VECTOR_ELEN_FP_16") (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16") (VNx16HF "TARGET_VECTOR_ELEN_FP_16") (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
   (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI "TARGET_VECTOR_ELEN_64")
   (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
@@ -161,6 +164,8 @@
 (define_mode_iterator VEEWTRUNC2 [
   (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI (VNx64QI "TARGET_MIN_VLEN >= 128")
   (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI "TARGET_MIN_VLEN >= 128")
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_VECTOR_ELEN_FP_16") (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16") (VNx16HF "TARGET_VECTOR_ELEN_FP_16") (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
   (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI "TARGET_MIN_VLEN >= 128")
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
@@ -172,6 +177,8 @@
 (define_mode_iterator VEEWTRUNC4 [
   (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI (VNx32QI "TARGET_MIN_VLEN >= 128")
   (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI (VNx16HI "TARGET_MIN_VLEN >= 128")
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_VECTOR_ELEN_FP_16") (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16") (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
 ])
 
 (define_mode_iterator VEEWTRUNC8 [
@@ -362,46 +369,67 @@
 ])
 
 (define_mode_iterator VNX1_QHSD [
-  (VNx1QI "TARGET_MIN_VLEN < 128") (VNx1HI "TARGET_MIN_VLEN < 128") (VNx1SI "TARGET_MIN_VLEN < 128")
+  (VNx1QI "TARGET_MIN_VLEN < 128")
+  (VNx1HI "TARGET_MIN_VLEN < 128")
+  (VNx1SI "TARGET_MIN_VLEN < 128")
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128")
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
 ])
 
 (define_mode_iterator VNX2_QHSD [
-  VNx2QI VNx2HI VNx2SI
+  VNx2QI
+  VNx2HI
+  VNx2SI
   (VNx2DI "TARGET_VECTOR_ELEN_64")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx2DF "TARGET_VECTOR_ELEN_FP_64")
 ])
 
 (define_mode_iterator VNX4_QHSD [
-  VNx4QI VNx4HI VNx4SI
+  VNx4QI
+  VNx4HI
+  VNx4SI
   (VNx4DI "TARGET_VECTOR_ELEN_64")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx4DF "TARGET_VECTOR_ELEN_FP_64")
 ])
 
 (define_mode_iterator VNX8_QHSD [
-  VNx8QI VNx8HI VNx8SI
+  VNx8QI
+  VNx8HI
+  VNx8SI
   (VNx8DI "TARGET_VECTOR_ELEN_64")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx8SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx8DF "TARGET_VECTOR_ELEN_FP_64")
 ])
 
-(define_mode_iterator VNX16_QHS [
-  VNx16QI VNx16HI (VNx16SI "TARGET_MIN_VLEN > 32")
+(define_mode_iterator VNX16_QHSD [
+  VNx16QI
+  VNx16HI
+  (VNx16SI "TARGET_MIN_VLEN > 32")
+  (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
+  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx16SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
-  (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128") (VNx16DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN >= 128")
+  (VNx16DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN >= 128")
 ])
 
 (define_mode_iterator VNX32_QHS [
-  VNx32QI (VNx32HI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128") (VNx32SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
+  VNx32QI
+  (VNx32HI "TARGET_MIN_VLEN > 32")
+  (VNx32SI "TARGET_MIN_VLEN >= 128")
+  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx32SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
 ])
 
 (define_mode_iterator VNX64_QH [
   (VNx64QI "TARGET_MIN_VLEN > 32")
   (VNx64HI "TARGET_MIN_VLEN >= 128")
+  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
 ])
 
 (define_mode_iterator VNX128_Q [
@@ -409,35 +437,49 @@
 ])
 
 (define_mode_iterator VNX1_QHSDI [
-  (VNx1QI "TARGET_MIN_VLEN < 128") (VNx1HI "TARGET_MIN_VLEN < 128") (VNx1SI "TARGET_MIN_VLEN < 128")
-  (VNx1DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128")
+  (VNx1QI "TARGET_MIN_VLEN < 128")
+  (VNx1HI "TARGET_MIN_VLEN < 128")
+  (VNx1SI "TARGET_MIN_VLEN < 128")
+  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128 && TARGET_64BIT")
 ])
 
 (define_mode_iterator VNX2_QHSDI [
-  VNx2QI VNx2HI VNx2SI
-  (VNx2DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
+  VNx2QI
+  VNx2HI
+  VNx2SI
+  (VNx2DI "TARGET_VECTOR_ELEN_64 && TARGET_64BIT")
 ])
 
 (define_mode_iterator VNX4_QHSDI [
-  VNx4QI VNx4HI VNx4SI
-  (VNx4DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
+  VNx4QI
+  VNx4HI
+  VNx4SI
+  (VNx4DI "TARGET_VECTOR_ELEN_64 && TARGET_64BIT")
 ])
 
 (define_mode_iterator VNX8_QHSDI [
-  VNx8QI VNx8HI VNx8SI
-  (VNx8DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
+  VNx8QI
+  VNx8HI
+  VNx8SI
+  (VNx8DI "TARGET_VECTOR_ELEN_64 && TARGET_64BIT")
 ])
 
 (define_mode_iterator VNX16_QHSDI [
-  VNx16QI VNx16HI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx16DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
+  VNx16QI
+  VNx16HI
+  (VNx16SI "TARGET_MIN_VLEN > 32")
+  (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128 && TARGET_64BIT")
 ])
 
 (define_mode_iterator VNX32_QHSI [
-  VNx32QI (VNx32HI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
+  VNx32QI
+  (VNx32HI "TARGET_MIN_VLEN > 32")
+  (VNx32SI "TARGET_MIN_VLEN >= 128")
 ])
 
 (define_mode_iterator VNX64_QHI [
-  VNx64QI (VNx64HI "TARGET_MIN_VLEN >= 128")
+  (VNx64QI "TARGET_MIN_VLEN > 32")
+  (VNx64HI "TARGET_MIN_VLEN >= 128")
 ])
 
 (define_mode_iterator V_WHOLE [
@@ -1393,6 +1435,8 @@
 (define_mode_attr VINDEX_DOUBLE_TRUNC [
   (VNx1HI "VNx1QI") (VNx2HI "VNx2QI")  (VNx4HI "VNx4QI")  (VNx8HI "VNx8QI")
   (VNx16HI "VNx16QI") (VNx32HI "VNx32QI") (VNx64HI "VNx64QI")
+  (VNx1HF "VNx1QI") (VNx2HF "VNx2QI")  (VNx4HF "VNx4QI")  (VNx8HF "VNx8QI")
+  (VNx16HF "VNx16QI") (VNx32HF "VNx32QI") (VNx64HF "VNx64QI")
   (VNx1SI "VNx1HI") (VNx2SI "VNx2HI") (VNx4SI "VNx4HI") (VNx8SI "VNx8HI")
   (VNx16SI "VNx16HI") (VNx32SI "VNx32HI")
   (VNx1SF "VNx1HI") (VNx2SF "VNx2HI") (VNx4SF "VNx4HI") (VNx8SF "VNx8HI")
@@ -1420,6 +1464,7 @@
 (define_mode_attr VINDEX_DOUBLE_EXT [
   (VNx1QI "VNx1HI") (VNx2QI "VNx2HI") (VNx4QI "VNx4HI") (VNx8QI "VNx8HI") (VNx16QI "VNx16HI") (VNx32QI "VNx32HI") (VNx64QI "VNx64HI")
   (VNx1HI "VNx1SI") (VNx2HI "VNx2SI") (VNx4HI "VNx4SI") (VNx8HI "VNx8SI") (VNx16HI "VNx16SI") (VNx32HI "VNx32SI")
+  (VNx1HF "VNx1SI") (VNx2HF "VNx2SI") (VNx4HF "VNx4SI") (VNx8HF "VNx8SI") (VNx16HF "VNx16SI") (VNx32HF "VNx32SI")
   (VNx1SI "VNx1DI") (VNx2SI "VNx2DI") (VNx4SI "VNx4DI") (VNx8SI "VNx8DI") (VNx16SI "VNx16DI")
   (VNx1SF "VNx1DI") (VNx2SF "VNx2DI") (VNx4SF "VNx4DI") (VNx8SF "VNx8DI") (VNx16SF "VNx16DI")
 ])
@@ -1427,6 +1472,7 @@
 (define_mode_attr VINDEX_QUAD_EXT [
   (VNx1QI "VNx1SI") (VNx2QI "VNx2SI") (VNx4QI "VNx4SI") (VNx8QI "VNx8SI") (VNx16QI "VNx16SI") (VNx32QI "VNx32SI")
   (VNx1HI "VNx1DI") (VNx2HI "VNx2DI") (VNx4HI "VNx4DI") (VNx8HI "VNx8DI") (VNx16HI "VNx16DI")
+  (VNx1HF "VNx1DI") (VNx2HF "VNx2DI") (VNx4HF "VNx4DI") (VNx8HF "VNx8DI") (VNx16HF "VNx16DI")
 ])
 
 (define_mode_attr VINDEX_OCT_EXT [
@@ -1471,6 +1517,40 @@
   (VNx4DI "VNx8BI") (VNx8DI "VNx16BI") (VNx16DI "VNx32BI")
 ])
 
+(define_mode_attr gs_extension [
+  (VNx1QI "immediate_operand") (VNx2QI "immediate_operand") (VNx4QI "immediate_operand") (VNx8QI "immediate_operand") (VNx16QI "immediate_operand")
+  (VNx32QI "vector_gs_extension_operand") (VNx64QI "const_1_operand")
+  (VNx1HI "immediate_operand") (VNx2HI "immediate_operand") (VNx4HI "immediate_operand") (VNx8HI "immediate_operand") (VNx16HI "immediate_operand")
+  (VNx32HI "vector_gs_extension_operand") (VNx64HI "const_1_operand")
+  (VNx1SI "immediate_operand") (VNx2SI "immediate_operand") (VNx4SI "immediate_operand") (VNx8SI "immediate_operand") (VNx16SI "immediate_operand")
+  (VNx32SI "vector_gs_extension_operand")
+  (VNx1DI "immediate_operand") (VNx2DI "immediate_operand") (VNx4DI "immediate_operand") (VNx8DI "immediate_operand") (VNx16DI "immediate_operand")
+
+  (VNx1HF "immediate_operand") (VNx2HF "immediate_operand") (VNx4HF "immediate_operand") (VNx8HF "immediate_operand") (VNx16HF "immediate_operand")
+  (VNx32HF "vector_gs_extension_operand") (VNx64HF "const_1_operand")
+  (VNx1SF "immediate_operand") (VNx2SF "immediate_operand") (VNx4SF "immediate_operand") (VNx8SF "immediate_operand") (VNx16SF "immediate_operand")
+  (VNx32SF "vector_gs_extension_operand")
+  (VNx1DF "immediate_operand") (VNx2DF "immediate_operand") (VNx4DF "immediate_operand") (VNx8DF "immediate_operand") (VNx16DF "immediate_operand")
+])
+
+(define_mode_attr gs_scale [
+  (VNx1QI "const_1_operand") (VNx2QI "const_1_operand") (VNx4QI "const_1_operand") (VNx8QI "const_1_operand")
+  (VNx16QI "const_1_operand") (VNx32QI "const_1_operand") (VNx64QI "const_1_operand")
+  (VNx1HI "vector_gs_scale_operand_16") (VNx2HI "vector_gs_scale_operand_16") (VNx4HI "vector_gs_scale_operand_16") (VNx8HI "vector_gs_scale_operand_16")
+  (VNx16HI "vector_gs_scale_operand_16") (VNx32HI "vector_gs_scale_operand_16_rv32") (VNx64HI "const_1_operand")
+  (VNx1SI "vector_gs_scale_operand_32") (VNx2SI "vector_gs_scale_operand_32") (VNx4SI "vector_gs_scale_operand_32") (VNx8SI "vector_gs_scale_operand_32")
+  (VNx16SI "vector_gs_scale_operand_32") (VNx32SI "vector_gs_scale_operand_32_rv32")
+  (VNx1DI "vector_gs_scale_operand_64") (VNx2DI "vector_gs_scale_operand_64") (VNx4DI "vector_gs_scale_operand_64") (VNx8DI "vector_gs_scale_operand_64")
+  (VNx16DI "vector_gs_scale_operand_64")
+
+  (VNx1HF "vector_gs_scale_operand_16") (VNx2HF "vector_gs_scale_operand_16") (VNx4HF "vector_gs_scale_operand_16") (VNx8HF "vector_gs_scale_operand_16")
+  (VNx16HF "vector_gs_scale_operand_16") (VNx32HF "vector_gs_scale_operand_16_rv32") (VNx64HF "const_1_operand")
+  (VNx1SF "vector_gs_scale_operand_32") (VNx2SF "vector_gs_scale_operand_32") (VNx4SF "vector_gs_scale_operand_32") (VNx8SF "vector_gs_scale_operand_32")
+  (VNx16SF "vector_gs_scale_operand_32") (VNx32SF "vector_gs_scale_operand_32_rv32")
+  (VNx1DF "vector_gs_scale_operand_64") (VNx2DF "vector_gs_scale_operand_64") (VNx4DF "vector_gs_scale_operand_64") (VNx8DF "vector_gs_scale_operand_64")
+  (VNx16DF "vector_gs_scale_operand_64")
+])
+
 (define_int_iterator WREDUC [UNSPEC_WREDUC_SUM UNSPEC_WREDUC_USUM])
 
 (define_int_iterator ORDER [UNSPEC_ORDERED UNSPEC_UNORDERED])
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 5b7a17b9d34..19740c89132 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -818,7 +818,7 @@
 ;; This pattern only handles duplicates of non-constant inputs.
 ;; Constant vectors go through the movm pattern instead.
 ;; So "direct_broadcast_operand" can only be mem or reg, no CONSTANT.
-(define_expand "vec_duplicate<mode>"
+(define_expand "@vec_duplicate<mode>"
   [(set (match_operand:V 0 "register_operand")
 	(vec_duplicate:V
 	  (match_operand:<VEL> 1 "direct_broadcast_operand")))]
@@ -1357,8 +1357,16 @@
 	}
     }
   else if (GET_MODE_BITSIZE (<VEL>mode) > GET_MODE_BITSIZE (Pmode)
-           && immediate_operand (operands[3], Pmode))
-    operands[3] = gen_rtx_SIGN_EXTEND (<VEL>mode, force_reg (Pmode, operands[3]));
+           && (immediate_operand (operands[3], Pmode)
+	       || (CONST_POLY_INT_P (operands[3])
+	           && known_ge (rtx_to_poly_int64 (operands[3]), 0U)
+		   && known_le (rtx_to_poly_int64 (operands[3]), GET_MODE_SIZE (<MODE>mode)))))
+    {
+      rtx tmp = gen_reg_rtx (Pmode);
+      poly_int64 value = rtx_to_poly_int64 (operands[3]);
+      emit_move_insn (tmp, gen_int_mode (value, Pmode));
+      operands[3] = gen_rtx_SIGN_EXTEND (<VEL>mode, tmp);
+    }
   else
     operands[3] = force_reg (<VEL>mode, operands[3]);
 })
@@ -1387,7 +1395,8 @@
    vlse<sew>.v\t%0,%3,zero
    vmv.s.x\t%0,%3
    vmv.s.x\t%0,%3"
-  "register_operand (operands[3], <VEL>mode)
+  "(register_operand (operands[3], <VEL>mode)
+  || CONST_POLY_INT_P (operands[3]))
   && GET_MODE_BITSIZE (<VEL>mode) > GET_MODE_BITSIZE (Pmode)"
   [(set (match_dup 0)
 	(if_then_else:VI (unspec:<VM> [(match_dup 1) (match_dup 4)
@@ -1397,6 +1406,12 @@
 	  (match_dup 2)))]
   {
     gcc_assert (can_create_pseudo_p ());
+    if (CONST_POLY_INT_P (operands[3]))
+      {
+	rtx tmp = gen_reg_rtx (<VEL>mode);
+	emit_move_insn (tmp, operands[3]);
+	operands[3] = tmp;
+      }
     rtx m = assign_stack_local (<VEL>mode, GET_MODE_SIZE (<VEL>mode),
 				GET_MODE_ALIGNMENT (<VEL>mode));
     m = validize_mem (m);
@@ -1483,6 +1498,7 @@
 	     (match_operand 5 "vector_length_operand"    "   rK,    rK,    rK")
 	     (match_operand 6 "const_int_operand"        "    i,     i,     i")
 	     (match_operand 7 "const_int_operand"        "    i,     i,     i")
+	     (match_operand 8 "const_int_operand"        "    i,     i,     i")
 	     (reg:SI VL_REGNUM)
 	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
 	  (unspec:V
@@ -1738,7 +1754,7 @@
   [(set_attr "type" "vst<order>x")
    (set_attr "mode" "<VNX8_QHSD:MODE>")])
 
-(define_insn "@pred_indexed_<order>store<VNX16_QHS:mode><VNX16_QHSDI:mode>"
+(define_insn "@pred_indexed_<order>store<VNX16_QHSD:mode><VNX16_QHSDI:mode>"
   [(set (mem:BLK (scratch))
 	(unspec:BLK
 	  [(unspec:<VM>
@@ -1749,11 +1765,11 @@
 	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
 	   (match_operand 1 "pmode_reg_or_0_operand"      "  rJ")
 	   (match_operand:VNX16_QHSDI 2 "register_operand" "  vr")
-	   (match_operand:VNX16_QHS 3 "register_operand"  "  vr")] ORDER))]
+	   (match_operand:VNX16_QHSD 3 "register_operand"  "  vr")] ORDER))]
   "TARGET_VECTOR"
   "vs<order>xei<VNX16_QHSDI:sew>.v\t%3,(%z1),%2%p0"
   [(set_attr "type" "vst<order>x")
-   (set_attr "mode" "<VNX16_QHS:MODE>")])
+   (set_attr "mode" "<VNX16_QHSD:MODE>")])
 
 (define_insn "@pred_indexed_<order>store<VNX32_QHS:mode><VNX32_QHSI:mode>"
   [(set (mem:BLK (scratch))
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c
new file mode 100644
index 00000000000..dffe13f6a8a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+#define INDEX16 uint16_t
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c
new file mode 100644
index 00000000000..a622e516f06
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                               \
+  T (uint8_t, 64)                                                              \
+  T (int16_t, 64)                                                              \
+  T (uint16_t, 64)                                                             \
+  T (_Float16, 64)                                                             \
+  T (int32_t, 64)                                                              \
+  T (uint32_t, 64)                                                             \
+  T (float, 64)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c
new file mode 100644
index 00000000000..4692380233d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_LOOP(DATA_TYPE)                                                   \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict *src)           \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += *src[i];                                                      \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t)                                                                   \
+  T (uint8_t)                                                                  \
+  T (int16_t)                                                                  \
+  T (uint16_t)                                                                 \
+  T (_Float16)                                                                 \
+  T (int32_t)                                                                  \
+  T (uint32_t)                                                                 \
+  T (float)                                                                    \
+  T (int64_t)                                                                  \
+  T (uint64_t)                                                                 \
+  T (double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c
new file mode 100644
index 00000000000..71a3dd466fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c
@@ -0,0 +1,112 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-vect-cost-model -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_LOOP(DATA_TYPE, INDEX_TYPE)                                       \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE##_##INDEX_TYPE (DATA_TYPE *restrict y, DATA_TYPE *restrict x,  \
+				INDEX_TYPE *restrict index)                    \
+  {                                                                            \
+    for (int i = 0; i < 100; ++i)                                              \
+      {                                                                        \
+	y[i * 2] = x[index[i * 2]] + 1;                                        \
+	y[i * 2 + 1] = x[index[i * 2 + 1]] + 2;                                \
+      }                                                                        \
+  }
+
+TEST_LOOP (int8_t, int8_t)
+TEST_LOOP (uint8_t, int8_t)
+TEST_LOOP (int16_t, int8_t)
+TEST_LOOP (uint16_t, int8_t)
+TEST_LOOP (int32_t, int8_t)
+TEST_LOOP (uint32_t, int8_t)
+TEST_LOOP (int64_t, int8_t)
+TEST_LOOP (uint64_t, int8_t)
+TEST_LOOP (_Float16, int8_t)
+TEST_LOOP (float, int8_t)
+TEST_LOOP (double, int8_t)
+TEST_LOOP (int8_t, int16_t)
+TEST_LOOP (uint8_t, int16_t)
+TEST_LOOP (int16_t, int16_t)
+TEST_LOOP (uint16_t, int16_t)
+TEST_LOOP (int32_t, int16_t)
+TEST_LOOP (uint32_t, int16_t)
+TEST_LOOP (int64_t, int16_t)
+TEST_LOOP (uint64_t, int16_t)
+TEST_LOOP (_Float16, int16_t)
+TEST_LOOP (float, int16_t)
+TEST_LOOP (double, int16_t)
+TEST_LOOP (int8_t, int32_t)
+TEST_LOOP (uint8_t, int32_t)
+TEST_LOOP (int16_t, int32_t)
+TEST_LOOP (uint16_t, int32_t)
+TEST_LOOP (int32_t, int32_t)
+TEST_LOOP (uint32_t, int32_t)
+TEST_LOOP (int64_t, int32_t)
+TEST_LOOP (uint64_t, int32_t)
+TEST_LOOP (_Float16, int32_t)
+TEST_LOOP (float, int32_t)
+TEST_LOOP (double, int32_t)
+TEST_LOOP (int8_t, int64_t)
+TEST_LOOP (uint8_t, int64_t)
+TEST_LOOP (int16_t, int64_t)
+TEST_LOOP (uint16_t, int64_t)
+TEST_LOOP (int32_t, int64_t)
+TEST_LOOP (uint32_t, int64_t)
+TEST_LOOP (int64_t, int64_t)
+TEST_LOOP (uint64_t, int64_t)
+TEST_LOOP (_Float16, int64_t)
+TEST_LOOP (float, int64_t)
+TEST_LOOP (double, int64_t)
+TEST_LOOP (int8_t, uint8_t)
+TEST_LOOP (uint8_t, uint8_t)
+TEST_LOOP (int16_t, uint8_t)
+TEST_LOOP (uint16_t, uint8_t)
+TEST_LOOP (int32_t, uint8_t)
+TEST_LOOP (uint32_t, uint8_t)
+TEST_LOOP (int64_t, uint8_t)
+TEST_LOOP (uint64_t, uint8_t)
+TEST_LOOP (_Float16, uint8_t)
+TEST_LOOP (float, uint8_t)
+TEST_LOOP (double, uint8_t)
+TEST_LOOP (int8_t, uint16_t)
+TEST_LOOP (uint8_t, uint16_t)
+TEST_LOOP (int16_t, uint16_t)
+TEST_LOOP (uint16_t, uint16_t)
+TEST_LOOP (int32_t, uint16_t)
+TEST_LOOP (uint32_t, uint16_t)
+TEST_LOOP (int64_t, uint16_t)
+TEST_LOOP (uint64_t, uint16_t)
+TEST_LOOP (_Float16, uint16_t)
+TEST_LOOP (float, uint16_t)
+TEST_LOOP (double, uint16_t)
+TEST_LOOP (int8_t, uint32_t)
+TEST_LOOP (uint8_t, uint32_t)
+TEST_LOOP (int16_t, uint32_t)
+TEST_LOOP (uint16_t, uint32_t)
+TEST_LOOP (int32_t, uint32_t)
+TEST_LOOP (uint32_t, uint32_t)
+TEST_LOOP (int64_t, uint32_t)
+TEST_LOOP (uint64_t, uint32_t)
+TEST_LOOP (_Float16, uint32_t)
+TEST_LOOP (float, uint32_t)
+TEST_LOOP (double, uint32_t)
+TEST_LOOP (int8_t, uint64_t)
+TEST_LOOP (uint8_t, uint64_t)
+TEST_LOOP (int16_t, uint64_t)
+TEST_LOOP (uint16_t, uint64_t)
+TEST_LOOP (int32_t, uint64_t)
+TEST_LOOP (uint32_t, uint64_t)
+TEST_LOOP (int64_t, uint64_t)
+TEST_LOOP (uint64_t, uint64_t)
+TEST_LOOP (_Float16, uint64_t)
+TEST_LOOP (float, uint64_t)
+TEST_LOOP (double, uint64_t)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 88 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-assembler-not "vluxei64\.v" } } */
+/* { dg-final { scan-assembler-not "vsuxei64\.v" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c
new file mode 100644
index 00000000000..785550c4b2d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c
new file mode 100644
index 00000000000..22aeb889221
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c
new file mode 100644
index 00000000000..d74a83415d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c
new file mode 100644
index 00000000000..2b6c0a87c18
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 uint16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                               \
+  T (uint8_t, 16)                                                              \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 16)                                                              \
+  T (uint32_t, 16)                                                             \
+  T (float, 16)                                                                \
+  T (int64_t, 16)                                                              \
+  T (uint64_t, 16)                                                             \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c
new file mode 100644
index 00000000000..407cc8a5a73
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 int16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                               \
+  T (uint8_t, 16)                                                              \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 16)                                                              \
+  T (uint32_t, 16)                                                             \
+  T (float, 16)                                                                \
+  T (int64_t, 16)                                                              \
+  T (uint64_t, 16)                                                             \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c
new file mode 100644
index 00000000000..81b31ef26aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 uint32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                                \
+  T (uint8_t, 32)                                                               \
+  T (int16_t, 32)                                                               \
+  T (uint16_t, 32)                                                              \
+  T (_Float16, 32)                                                              \
+  T (int32_t, 32)                                                               \
+  T (uint32_t, 32)                                                              \
+  T (float, 32)                                                                 \
+  T (int64_t, 32)                                                               \
+  T (uint64_t, 32)                                                              \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c
new file mode 100644
index 00000000000..0bfdb8f0acf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 int32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                                \
+  T (uint8_t, 32)                                                               \
+  T (int16_t, 32)                                                               \
+  T (uint16_t, 32)                                                              \
+  T (_Float16, 32)                                                              \
+  T (int32_t, 32)                                                               \
+  T (uint32_t, 32)                                                              \
+  T (float, 32)                                                                 \
+  T (int64_t, 32)                                                               \
+  T (uint64_t, 32)                                                              \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c
new file mode 100644
index 00000000000..46f791105ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                                \
+  T (uint8_t, 64)                                                               \
+  T (int16_t, 64)                                                               \
+  T (uint16_t, 64)                                                              \
+  T (_Float16, 64)                                                              \
+  T (int32_t, 64)                                                               \
+  T (uint32_t, 64)                                                              \
+  T (float, 64)                                                                 \
+  T (int64_t, 64)                                                               \
+  T (uint64_t, 64)                                                              \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c
new file mode 100644
index 00000000000..0d3c5b71e5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-1.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c
new file mode 100644
index 00000000000..145df1e7797
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-10.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c
new file mode 100644
index 00000000000..d36b6f025f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-11.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE *src_##DATA_TYPE[128];                                             \
+  DATA_TYPE src2_##DATA_TYPE[128];                                             \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = src2_##DATA_TYPE + i;                               \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE);                           \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i] + src_##DATA_TYPE[i][0]));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c
new file mode 100644
index 00000000000..b4e2ead8ca9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c
@@ -0,0 +1,124 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-12.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, INDEX_TYPE)                                        \
+  DATA_TYPE dest_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                        \
+  DATA_TYPE src_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                         \
+  INDEX_TYPE index_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                      \
+  for (int i = 0; i < 202; i++)                                                \
+    {                                                                          \
+      src_##DATA_TYPE##_##INDEX_TYPE[i]                                        \
+	= (DATA_TYPE) ((i * 19 + 735) & (sizeof (DATA_TYPE) * 7 - 1));         \
+      index_##DATA_TYPE##_##INDEX_TYPE[i] = (i * 7) % (55);                    \
+    }                                                                          \
+  f_##DATA_TYPE##_##INDEX_TYPE (dest_##DATA_TYPE##_##INDEX_TYPE,               \
+				src_##DATA_TYPE##_##INDEX_TYPE,                \
+				index_##DATA_TYPE##_##INDEX_TYPE);             \
+  for (int i = 0; i < 100; i++)                                                \
+    {                                                                          \
+      assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2]                           \
+	      == (src_##DATA_TYPE##_##INDEX_TYPE                               \
+		    [index_##DATA_TYPE##_##INDEX_TYPE[i * 2]]                  \
+		  + 1));                                                       \
+      assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]                       \
+	      == (src_##DATA_TYPE##_##INDEX_TYPE                               \
+		    [index_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]]              \
+		  + 2));                                                       \
+    }
+
+  RUN_LOOP (int8_t, int8_t)
+  RUN_LOOP (uint8_t, int8_t)
+  RUN_LOOP (int16_t, int8_t)
+  RUN_LOOP (uint16_t, int8_t)
+  RUN_LOOP (int32_t, int8_t)
+  RUN_LOOP (uint32_t, int8_t)
+  RUN_LOOP (int64_t, int8_t)
+  RUN_LOOP (uint64_t, int8_t)
+  RUN_LOOP (_Float16, int8_t)
+  RUN_LOOP (float, int8_t)
+  RUN_LOOP (double, int8_t)
+  RUN_LOOP (int8_t, int16_t)
+  RUN_LOOP (uint8_t, int16_t)
+  RUN_LOOP (int16_t, int16_t)
+  RUN_LOOP (uint16_t, int16_t)
+  RUN_LOOP (int32_t, int16_t)
+  RUN_LOOP (uint32_t, int16_t)
+  RUN_LOOP (int64_t, int16_t)
+  RUN_LOOP (uint64_t, int16_t)
+  RUN_LOOP (_Float16, int16_t)
+  RUN_LOOP (float, int16_t)
+  RUN_LOOP (double, int16_t)
+  RUN_LOOP (int8_t, int32_t)
+  RUN_LOOP (uint8_t, int32_t)
+  RUN_LOOP (int16_t, int32_t)
+  RUN_LOOP (uint16_t, int32_t)
+  RUN_LOOP (int32_t, int32_t)
+  RUN_LOOP (uint32_t, int32_t)
+  RUN_LOOP (int64_t, int32_t)
+  RUN_LOOP (uint64_t, int32_t)
+  RUN_LOOP (_Float16, int32_t)
+  RUN_LOOP (float, int32_t)
+  RUN_LOOP (double, int32_t)
+  RUN_LOOP (int8_t, int64_t)
+  RUN_LOOP (uint8_t, int64_t)
+  RUN_LOOP (int16_t, int64_t)
+  RUN_LOOP (uint16_t, int64_t)
+  RUN_LOOP (int32_t, int64_t)
+  RUN_LOOP (uint32_t, int64_t)
+  RUN_LOOP (int64_t, int64_t)
+  RUN_LOOP (uint64_t, int64_t)
+  RUN_LOOP (_Float16, int64_t)
+  RUN_LOOP (float, int64_t)
+  RUN_LOOP (double, int64_t)
+  RUN_LOOP (int8_t, uint8_t)
+  RUN_LOOP (uint8_t, uint8_t)
+  RUN_LOOP (int16_t, uint8_t)
+  RUN_LOOP (uint16_t, uint8_t)
+  RUN_LOOP (int32_t, uint8_t)
+  RUN_LOOP (uint32_t, uint8_t)
+  RUN_LOOP (int64_t, uint8_t)
+  RUN_LOOP (uint64_t, uint8_t)
+  RUN_LOOP (_Float16, uint8_t)
+  RUN_LOOP (float, uint8_t)
+  RUN_LOOP (double, uint8_t)
+  RUN_LOOP (int8_t, uint16_t)
+  RUN_LOOP (uint8_t, uint16_t)
+  RUN_LOOP (int16_t, uint16_t)
+  RUN_LOOP (uint16_t, uint16_t)
+  RUN_LOOP (int32_t, uint16_t)
+  RUN_LOOP (uint32_t, uint16_t)
+  RUN_LOOP (int64_t, uint16_t)
+  RUN_LOOP (uint64_t, uint16_t)
+  RUN_LOOP (_Float16, uint16_t)
+  RUN_LOOP (float, uint16_t)
+  RUN_LOOP (double, uint16_t)
+  RUN_LOOP (int8_t, uint32_t)
+  RUN_LOOP (uint8_t, uint32_t)
+  RUN_LOOP (int16_t, uint32_t)
+  RUN_LOOP (uint16_t, uint32_t)
+  RUN_LOOP (int32_t, uint32_t)
+  RUN_LOOP (uint32_t, uint32_t)
+  RUN_LOOP (int64_t, uint32_t)
+  RUN_LOOP (uint64_t, uint32_t)
+  RUN_LOOP (_Float16, uint32_t)
+  RUN_LOOP (float, uint32_t)
+  RUN_LOOP (double, uint32_t)
+  RUN_LOOP (int8_t, uint64_t)
+  RUN_LOOP (uint8_t, uint64_t)
+  RUN_LOOP (int16_t, uint64_t)
+  RUN_LOOP (uint16_t, uint64_t)
+  RUN_LOOP (int32_t, uint64_t)
+  RUN_LOOP (uint32_t, uint64_t)
+  RUN_LOOP (int64_t, uint64_t)
+  RUN_LOOP (uint64_t, uint64_t)
+  RUN_LOOP (_Float16, uint64_t)
+  RUN_LOOP (float, uint64_t)
+  RUN_LOOP (double, uint64_t)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c
new file mode 100644
index 00000000000..76c6df32e6c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-2.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c
new file mode 100644
index 00000000000..0fd64260082
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-3.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c
new file mode 100644
index 00000000000..069d232b912
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-4.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c
new file mode 100644
index 00000000000..499e555c1d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-5.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c
new file mode 100644
index 00000000000..ec6587aa4e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-6.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c
new file mode 100644
index 00000000000..c16287955a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-7.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c
new file mode 100644
index 00000000000..e1744f60dbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-8.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i * 3 + 67) % 128);    \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c
new file mode 100644
index 00000000000..3ad6d33087f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-9.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i * 3 + 67) % 128);    \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+	    == (dest2_##DATA_TYPE[i]                                           \
+		+ src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c
new file mode 100644
index 00000000000..a5de0deccbe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+#define INDEX16 uint16_t
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump { \.LEN_MASK_GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.MASK_GATHER_LOAD} "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c
new file mode 100644
index 00000000000..74a0d05b37d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                               \
+  T (uint8_t, 64)                                                              \
+  T (int16_t, 64)                                                              \
+  T (uint16_t, 64)                                                             \
+  T (_Float16, 64)                                                             \
+  T (int32_t, 64)                                                              \
+  T (uint32_t, 64)                                                             \
+  T (float, 64)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump { \.LEN_MASK_GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.MASK_GATHER_LOAD} "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c
new file mode 100644
index 00000000000..98c5b4678b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c
@@ -0,0 +1,116 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-vect-cost-model -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_LOOP(DATA_TYPE, INDEX_TYPE)                                       \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE##_##INDEX_TYPE (DATA_TYPE *restrict y, DATA_TYPE *restrict x,  \
+				INDEX_TYPE *restrict index,                    \
+				INDEX_TYPE *restrict cond)                     \
+  {                                                                            \
+    for (int i = 0; i < 100; ++i)                                              \
+      {                                                                        \
+	if (cond[i * 2])                                                       \
+	  y[i * 2] = x[index[i * 2]] + 1;                                      \
+	if (cond[i * 2 + 1])                                                   \
+	  y[i * 2 + 1] = x[index[i * 2 + 1]] + 2;                              \
+      }                                                                        \
+  }
+
+TEST_LOOP (int8_t, int8_t)
+TEST_LOOP (uint8_t, int8_t)
+TEST_LOOP (int16_t, int8_t)
+TEST_LOOP (uint16_t, int8_t)
+TEST_LOOP (int32_t, int8_t)
+TEST_LOOP (uint32_t, int8_t)
+TEST_LOOP (int64_t, int8_t)
+TEST_LOOP (uint64_t, int8_t)
+TEST_LOOP (_Float16, int8_t)
+TEST_LOOP (float, int8_t)
+TEST_LOOP (double, int8_t)
+TEST_LOOP (int8_t, int16_t)
+TEST_LOOP (uint8_t, int16_t)
+TEST_LOOP (int16_t, int16_t)
+TEST_LOOP (uint16_t, int16_t)
+TEST_LOOP (int32_t, int16_t)
+TEST_LOOP (uint32_t, int16_t)
+TEST_LOOP (int64_t, int16_t)
+TEST_LOOP (uint64_t, int16_t)
+TEST_LOOP (_Float16, int16_t)
+TEST_LOOP (float, int16_t)
+TEST_LOOP (double, int16_t)
+TEST_LOOP (int8_t, int32_t)
+TEST_LOOP (uint8_t, int32_t)
+TEST_LOOP (int16_t, int32_t)
+TEST_LOOP (uint16_t, int32_t)
+TEST_LOOP (int32_t, int32_t)
+TEST_LOOP (uint32_t, int32_t)
+TEST_LOOP (int64_t, int32_t)
+TEST_LOOP (uint64_t, int32_t)
+TEST_LOOP (_Float16, int32_t)
+TEST_LOOP (float, int32_t)
+TEST_LOOP (double, int32_t)
+TEST_LOOP (int8_t, int64_t)
+TEST_LOOP (uint8_t, int64_t)
+TEST_LOOP (int16_t, int64_t)
+TEST_LOOP (uint16_t, int64_t)
+TEST_LOOP (int32_t, int64_t)
+TEST_LOOP (uint32_t, int64_t)
+TEST_LOOP (int64_t, int64_t)
+TEST_LOOP (uint64_t, int64_t)
+TEST_LOOP (_Float16, int64_t)
+TEST_LOOP (float, int64_t)
+TEST_LOOP (double, int64_t)
+TEST_LOOP (int8_t, uint8_t)
+TEST_LOOP (uint8_t, uint8_t)
+TEST_LOOP (int16_t, uint8_t)
+TEST_LOOP (uint16_t, uint8_t)
+TEST_LOOP (int32_t, uint8_t)
+TEST_LOOP (uint32_t, uint8_t)
+TEST_LOOP (int64_t, uint8_t)
+TEST_LOOP (uint64_t, uint8_t)
+TEST_LOOP (_Float16, uint8_t)
+TEST_LOOP (float, uint8_t)
+TEST_LOOP (double, uint8_t)
+TEST_LOOP (int8_t, uint16_t)
+TEST_LOOP (uint8_t, uint16_t)
+TEST_LOOP (int16_t, uint16_t)
+TEST_LOOP (uint16_t, uint16_t)
+TEST_LOOP (int32_t, uint16_t)
+TEST_LOOP (uint32_t, uint16_t)
+TEST_LOOP (int64_t, uint16_t)
+TEST_LOOP (uint64_t, uint16_t)
+TEST_LOOP (_Float16, uint16_t)
+TEST_LOOP (float, uint16_t)
+TEST_LOOP (double, uint16_t)
+TEST_LOOP (int8_t, uint32_t)
+TEST_LOOP (uint8_t, uint32_t)
+TEST_LOOP (int16_t, uint32_t)
+TEST_LOOP (uint16_t, uint32_t)
+TEST_LOOP (int32_t, uint32_t)
+TEST_LOOP (uint32_t, uint32_t)
+TEST_LOOP (int64_t, uint32_t)
+TEST_LOOP (uint64_t, uint32_t)
+TEST_LOOP (_Float16, uint32_t)
+TEST_LOOP (float, uint32_t)
+TEST_LOOP (double, uint32_t)
+TEST_LOOP (int8_t, uint64_t)
+TEST_LOOP (uint8_t, uint64_t)
+TEST_LOOP (int16_t, uint64_t)
+TEST_LOOP (uint16_t, uint64_t)
+TEST_LOOP (int32_t, uint64_t)
+TEST_LOOP (uint32_t, uint64_t)
+TEST_LOOP (int64_t, uint64_t)
+TEST_LOOP (uint64_t, uint64_t)
+TEST_LOOP (_Float16, uint64_t)
+TEST_LOOP (float, uint64_t)
+TEST_LOOP (double, uint64_t)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 88 "vect" } } */
+/* { dg-final { scan-tree-dump { \.LEN_MASK_GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.MASK_GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-assembler-not {vluxei64\.v} } } */
+/* { dg-final { scan-assembler-not {vsuxei64\.v} } } */
+/* { dg-final { scan-assembler-not {vlse64\.v\s+v[0-9]+,\s*0\([a-x0-9]+\),\s*zero} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c
new file mode 100644
index 00000000000..03f84ce962c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump { \.LEN_MASK_GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.MASK_GATHER_LOAD} "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c
new file mode 100644
index 00000000000..8578001ef41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump { \.LEN_MASK_GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.MASK_GATHER_LOAD} "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c
new file mode 100644
index 00000000000..b273caa0bfe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump { \.LEN_MASK_GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.MASK_GATHER_LOAD} "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c
new file mode 100644
index 00000000000..5055d886d62
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 uint16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                               \
+  T (uint8_t, 16)                                                              \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 16)                                                              \
+  T (uint32_t, 16)                                                             \
+  T (float, 16)                                                                \
+  T (int64_t, 16)                                                              \
+  T (uint64_t, 16)                                                             \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump { \.LEN_MASK_GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.MASK_GATHER_LOAD} "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c
new file mode 100644
index 00000000000..2a4ae58588f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 int16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                               \
+  T (uint8_t, 16)                                                              \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 16)                                                              \
+  T (uint32_t, 16)                                                             \
+  T (float, 16)                                                                \
+  T (int64_t, 16)                                                              \
+  T (uint64_t, 16)                                                             \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump { \.LEN_MASK_GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.MASK_GATHER_LOAD} "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c
new file mode 100644
index 00000000000..31d9414c549
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 uint32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                               \
+  T (uint8_t, 32)                                                              \
+  T (int16_t, 32)                                                              \
+  T (uint16_t, 32)                                                             \
+  T (_Float16, 32)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 32)                                                              \
+  T (uint64_t, 32)                                                             \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump { \.LEN_MASK_GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.MASK_GATHER_LOAD} "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c
new file mode 100644
index 00000000000..73ed23042fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 int32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                               \
+  T (uint8_t, 32)                                                              \
+  T (int16_t, 32)                                                              \
+  T (uint16_t, 32)                                                             \
+  T (_Float16, 32)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 32)                                                              \
+  T (uint64_t, 32)                                                             \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump { \.LEN_MASK_GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.GATHER_LOAD} "vect" } } */
+/* { dg-final { scan-tree-dump-not { \.MASK_GATHER_LOAD} "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c
new file mode 100644
index 00000000000..2f64e805759
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                               \
+  T (uint8_t, 64)                                                              \
+  T (int16_t, 64)                                                              \
+  T (uint16_t, 64)                                                             \
+  T (_Float16, 64)                                                             \
+  T (int32_t, 64)                                                              \
+  T (uint32_t, 64)                                                             \
+  T (float, 64)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c
new file mode 100644
index 00000000000..41f60bd88b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-1.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c
new file mode 100644
index 00000000000..9840434fa41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-10.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c
new file mode 100644
index 00000000000..105c706dbf9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c
@@ -0,0 +1,140 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-mcmodel=medany" } */
+
+#include "mask_gather_load-11.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, INDEX_TYPE)                                        \
+  DATA_TYPE dest_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                        \
+  DATA_TYPE dest2_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                       \
+  DATA_TYPE src_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                         \
+  INDEX_TYPE index_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                      \
+  INDEX_TYPE cond_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                       \
+  for (int i = 0; i < 202; i++)                                                \
+    {                                                                          \
+      src_##DATA_TYPE##_##INDEX_TYPE[i]                                        \
+	= (DATA_TYPE) ((i * 19 + 735) & (sizeof (DATA_TYPE) * 7 - 1));         \
+      dest_##DATA_TYPE##_##INDEX_TYPE[i]                                       \
+	= (DATA_TYPE) ((i * 7 + 666) & (sizeof (DATA_TYPE) * 5 - 1));          \
+      dest2_##DATA_TYPE##_##INDEX_TYPE[i]                                      \
+	= (DATA_TYPE) ((i * 7 + 666) & (sizeof (DATA_TYPE) * 5 - 1));          \
+      index_##DATA_TYPE##_##INDEX_TYPE[i] = (i * 7) % (55);                    \
+      cond_##DATA_TYPE##_##INDEX_TYPE[i] = (INDEX_TYPE) ((i & 0x3) == 3);      \
+    }                                                                          \
+  f_##DATA_TYPE##_##INDEX_TYPE (dest_##DATA_TYPE##_##INDEX_TYPE,               \
+				src_##DATA_TYPE##_##INDEX_TYPE,                \
+				index_##DATA_TYPE##_##INDEX_TYPE,              \
+				cond_##DATA_TYPE##_##INDEX_TYPE);              \
+  for (int i = 0; i < 100; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##INDEX_TYPE[i * 2])                              \
+	assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2]                         \
+		== (src_##DATA_TYPE##_##INDEX_TYPE                             \
+		      [index_##DATA_TYPE##_##INDEX_TYPE[i * 2]]                \
+		    + 1));                                                     \
+      else                                                                     \
+	assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2]                         \
+		== dest2_##DATA_TYPE##_##INDEX_TYPE[i * 2]);                   \
+      if (cond_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1])                          \
+	assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]                     \
+		== (src_##DATA_TYPE##_##INDEX_TYPE                             \
+		      [index_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]]            \
+		    + 2));                                                     \
+      else                                                                     \
+	assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]                     \
+		== dest2_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]);               \
+    }
+
+  RUN_LOOP (int8_t, int8_t)
+  RUN_LOOP (uint8_t, int8_t)
+  RUN_LOOP (int16_t, int8_t)
+  RUN_LOOP (uint16_t, int8_t)
+  RUN_LOOP (int32_t, int8_t)
+  RUN_LOOP (uint32_t, int8_t)
+  RUN_LOOP (int64_t, int8_t)
+  RUN_LOOP (uint64_t, int8_t)
+  RUN_LOOP (_Float16, int8_t)
+  RUN_LOOP (float, int8_t)
+  RUN_LOOP (double, int8_t)
+  RUN_LOOP (int8_t, int16_t)
+  RUN_LOOP (uint8_t, int16_t)
+  RUN_LOOP (int16_t, int16_t)
+  RUN_LOOP (uint16_t, int16_t)
+  RUN_LOOP (int32_t, int16_t)
+  RUN_LOOP (uint32_t, int16_t)
+  RUN_LOOP (int64_t, int16_t)
+  RUN_LOOP (uint64_t, int16_t)
+  RUN_LOOP (_Float16, int16_t)
+  RUN_LOOP (float, int16_t)
+  RUN_LOOP (double, int16_t)
+  RUN_LOOP (int8_t, int32_t)
+  RUN_LOOP (uint8_t, int32_t)
+  RUN_LOOP (int16_t, int32_t)
+  RUN_LOOP (uint16_t, int32_t)
+  RUN_LOOP (int32_t, int32_t)
+  RUN_LOOP (uint32_t, int32_t)
+  RUN_LOOP (int64_t, int32_t)
+  RUN_LOOP (uint64_t, int32_t)
+  RUN_LOOP (_Float16, int32_t)
+  RUN_LOOP (float, int32_t)
+  RUN_LOOP (double, int32_t)
+  RUN_LOOP (int8_t, int64_t)
+  RUN_LOOP (uint8_t, int64_t)
+  RUN_LOOP (int16_t, int64_t)
+  RUN_LOOP (uint16_t, int64_t)
+  RUN_LOOP (int32_t, int64_t)
+  RUN_LOOP (uint32_t, int64_t)
+  RUN_LOOP (int64_t, int64_t)
+  RUN_LOOP (uint64_t, int64_t)
+  RUN_LOOP (_Float16, int64_t)
+  RUN_LOOP (float, int64_t)
+  RUN_LOOP (double, int64_t)
+  RUN_LOOP (int8_t, uint8_t)
+  RUN_LOOP (uint8_t, uint8_t)
+  RUN_LOOP (int16_t, uint8_t)
+  RUN_LOOP (uint16_t, uint8_t)
+  RUN_LOOP (int32_t, uint8_t)
+  RUN_LOOP (uint32_t, uint8_t)
+  RUN_LOOP (int64_t, uint8_t)
+  RUN_LOOP (uint64_t, uint8_t)
+  RUN_LOOP (_Float16, uint8_t)
+  RUN_LOOP (float, uint8_t)
+  RUN_LOOP (double, uint8_t)
+  RUN_LOOP (int8_t, uint16_t)
+  RUN_LOOP (uint8_t, uint16_t)
+  RUN_LOOP (int16_t, uint16_t)
+  RUN_LOOP (uint16_t, uint16_t)
+  RUN_LOOP (int32_t, uint16_t)
+  RUN_LOOP (uint32_t, uint16_t)
+  RUN_LOOP (int64_t, uint16_t)
+  RUN_LOOP (uint64_t, uint16_t)
+  RUN_LOOP (_Float16, uint16_t)
+  RUN_LOOP (float, uint16_t)
+  RUN_LOOP (double, uint16_t)
+  RUN_LOOP (int8_t, uint32_t)
+  RUN_LOOP (uint8_t, uint32_t)
+  RUN_LOOP (int16_t, uint32_t)
+  RUN_LOOP (uint16_t, uint32_t)
+  RUN_LOOP (int32_t, uint32_t)
+  RUN_LOOP (uint32_t, uint32_t)
+  RUN_LOOP (int64_t, uint32_t)
+  RUN_LOOP (uint64_t, uint32_t)
+  RUN_LOOP (_Float16, uint32_t)
+  RUN_LOOP (float, uint32_t)
+  RUN_LOOP (double, uint32_t)
+  RUN_LOOP (int8_t, uint64_t)
+  RUN_LOOP (uint8_t, uint64_t)
+  RUN_LOOP (int16_t, uint64_t)
+  RUN_LOOP (uint16_t, uint64_t)
+  RUN_LOOP (int32_t, uint64_t)
+  RUN_LOOP (uint32_t, uint64_t)
+  RUN_LOOP (int64_t, uint64_t)
+  RUN_LOOP (uint64_t, uint64_t)
+  RUN_LOOP (_Float16, uint64_t)
+  RUN_LOOP (float, uint64_t)
+  RUN_LOOP (double, uint64_t)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c
new file mode 100644
index 00000000000..33ddb5d9909
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-2.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c
new file mode 100644
index 00000000000..9f06fbe4ecf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-3.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c
new file mode 100644
index 00000000000..ae578f0c7b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-4.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c
new file mode 100644
index 00000000000..741abd166e1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-5.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c
new file mode 100644
index 00000000000..a14a5c4ced1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-6.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c
new file mode 100644
index 00000000000..0ccc7dce166
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-7.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c
new file mode 100644
index 00000000000..a34688ff339
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-mcmodel=medany" } */
+
+#include "mask_gather_load-8.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c
new file mode 100644
index 00000000000..1cfdede2060
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-9.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[i]                                            \
+		== (dest2_##DATA_TYPE[i]                                       \
+		    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c
new file mode 100644
index 00000000000..623de41267b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+#define INDEX16 uint16_t
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c
new file mode 100644
index 00000000000..55112b067fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                               \
+  T (uint8_t, 64)                                                              \
+  T (int16_t, 64)                                                              \
+  T (uint16_t, 64)                                                             \
+  T (_Float16, 64)                                                             \
+  T (int32_t, 64)                                                              \
+  T (uint32_t, 64)                                                             \
+  T (float, 64)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c
new file mode 100644
index 00000000000..32a572d0064
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c
new file mode 100644
index 00000000000..fbaaa9d8a8e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c
new file mode 100644
index 00000000000..9b08661f8e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c
new file mode 100644
index 00000000000..dd26635f2cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 uint16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                               \
+  T (uint8_t, 16)                                                              \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 16)                                                              \
+  T (uint32_t, 16)                                                             \
+  T (float, 16)                                                                \
+  T (int64_t, 16)                                                              \
+  T (uint64_t, 16)                                                             \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c
new file mode 100644
index 00000000000..fa0206a0ec2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 int16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                               \
+  T (uint8_t, 16)                                                              \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 16)                                                              \
+  T (uint32_t, 16)                                                             \
+  T (float, 16)                                                                \
+  T (int64_t, 16)                                                              \
+  T (uint64_t, 16)                                                             \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c
new file mode 100644
index 00000000000..325e86c26a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 uint32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                               \
+  T (uint8_t, 32)                                                              \
+  T (int16_t, 32)                                                              \
+  T (uint16_t, 32)                                                             \
+  T (_Float16, 32)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 32)                                                              \
+  T (uint64_t, 32)                                                             \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c
new file mode 100644
index 00000000000..b4b84e9cdda
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 int32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                               \
+  T (uint8_t, 32)                                                              \
+  T (int16_t, 32)                                                              \
+  T (uint16_t, 32)                                                             \
+  T (_Float16, 32)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 32)                                                              \
+  T (uint64_t, 32)                                                             \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c
new file mode 100644
index 00000000000..77a9af953e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+	dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                               \
+  T (uint8_t, 64)                                                              \
+  T (int16_t, 64)                                                              \
+  T (uint16_t, 64)                                                             \
+  T (_Float16, 64)                                                             \
+  T (int32_t, 64)                                                              \
+  T (uint32_t, 64)                                                             \
+  T (float, 64)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c
new file mode 100644
index 00000000000..e0d52bf6291
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-1.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c
new file mode 100644
index 00000000000..c1af0d30e62
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-10.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c
new file mode 100644
index 00000000000..6b1b02eae35
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-2.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i * 3 + 67) % 128);    \
+      cond_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i & 0x3) == 3);           \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c
new file mode 100644
index 00000000000..cef0bdec1d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-3.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i * 3 + 67) % 128);    \
+      cond_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i & 0x3) == 3);           \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c
new file mode 100644
index 00000000000..88a74d5a632
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-4.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i * 3 + 67) % 128);    \
+      cond_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i & 0x3) == 3);           \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c
new file mode 100644
index 00000000000..06804ab7111
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-5.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i * 3 + 67) % 128);    \
+      cond_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i & 0x3) == 3);           \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c
new file mode 100644
index 00000000000..c6c9a676ed6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-6.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i * 3 + 67) % 128);    \
+      cond_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i & 0x3) == 3);           \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c
new file mode 100644
index 00000000000..8246e964aad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-7.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i * 3 + 67) % 128);    \
+      cond_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i & 0x3) == 3);           \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c
new file mode 100644
index 00000000000..8ee35d2e505
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-8.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i * 3 + 67) % 128);    \
+      cond_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i & 0x3) == 3);           \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c
new file mode 100644
index 00000000000..c27a673e2b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-9.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i * 3 + 67) % 128);    \
+      cond_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i & 0x3) == 3);           \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c
new file mode 100644
index 00000000000..6a390261cfb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+#define INDEX16 uint16_t
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c
new file mode 100644
index 00000000000..feb58d7d458
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                                \
+  T (uint8_t, 64)                                                               \
+  T (int16_t, 64)                                                               \
+  T (uint16_t, 64)                                                              \
+  T (_Float16, 64)                                                              \
+  T (int32_t, 64)                                                               \
+  T (uint32_t, 64)                                                              \
+  T (float, 64)                                                                 \
+  T (int64_t, 64)                                                               \
+  T (uint64_t, 64)                                                              \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c
new file mode 100644
index 00000000000..e4c587fd7bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c
new file mode 100644
index 00000000000..33ad256d3db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c
new file mode 100644
index 00000000000..48d305623e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c
new file mode 100644
index 00000000000..83ddc44bf9c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 uint16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                               \
+  T (uint8_t, 16)                                                              \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 16)                                                              \
+  T (uint32_t, 16)                                                             \
+  T (float, 16)                                                                \
+  T (int64_t, 16)                                                              \
+  T (uint64_t, 16)                                                             \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c
new file mode 100644
index 00000000000..11eb68bdb13
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 int16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                               \
+  T (uint8_t, 16)                                                              \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 16)                                                              \
+  T (uint32_t, 16)                                                             \
+  T (float, 16)                                                                \
+  T (int64_t, 16)                                                              \
+  T (uint64_t, 16)                                                             \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c
new file mode 100644
index 00000000000..2e323477258
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 uint32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                               \
+  T (uint8_t, 32)                                                              \
+  T (int16_t, 32)                                                              \
+  T (uint16_t, 32)                                                             \
+  T (_Float16, 32)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 32)                                                              \
+  T (uint64_t, 32)                                                             \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c
new file mode 100644
index 00000000000..e6732fe3790
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 int32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                               \
+  T (uint8_t, 32)                                                              \
+  T (int16_t, 32)                                                              \
+  T (uint16_t, 32)                                                             \
+  T (_Float16, 32)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 32)                                                              \
+  T (uint64_t, 32)                                                             \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c
new file mode 100644
index 00000000000..766a52b4622
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                               \
+  T (uint8_t, 64)                                                              \
+  T (int16_t, 64)                                                              \
+  T (uint16_t, 64)                                                             \
+  T (_Float16, 64)                                                             \
+  T (int32_t, 64)                                                              \
+  T (uint32_t, 64)                                                             \
+  T (float, 64)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c
new file mode 100644
index 00000000000..cafa64f3527
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-1.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i * 3 + 67) % 128);    \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c
new file mode 100644
index 00000000000..79f6885831f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-10.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i * 3 + 67) % 128);    \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c
new file mode 100644
index 00000000000..376db088153
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-mcmodel=medany" } */
+
+#include "scatter_store-2.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i * 3 + 67) % 128);    \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c
new file mode 100644
index 00000000000..103b8649d38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-3.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i * 3 + 67) % 128);    \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c
new file mode 100644
index 00000000000..f5f89c0fb4f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-4.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i * 3 + 67) % 128);    \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c
new file mode 100644
index 00000000000..049251ec888
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-5.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i * 3 + 67) % 128);    \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c
new file mode 100644
index 00000000000..59c8e701dbe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-6.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i * 3 + 67) % 128);    \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c
new file mode 100644
index 00000000000..a24401181e3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-7.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i * 3 + 67) % 128);    \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c
new file mode 100644
index 00000000000..080c9b83363
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-8.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (INDEX##BITS) ((i * 3 + 67) % 128);    \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c
new file mode 100644
index 00000000000..cc9f20f0daa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-mcmodel=medany" } */
+
+#include "scatter_store-9.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+	    == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c
new file mode 100644
index 00000000000..c7b990668c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+#ifndef INDEX8
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,   \
+			  INDEX##BITS stride, INDEX##BITS n)                   \
+  {                                                                            \
+    for (INDEX##BITS i = 0; i < n; ++i)                                        \
+      dest[i] += src[i * stride];                                              \
+  }
+
+#define TEST_TYPE(T, DATA_TYPE)                                                \
+  T (DATA_TYPE, 8)                                                             \
+  T (DATA_TYPE, 16)                                                            \
+  T (DATA_TYPE, 32)                                                            \
+  T (DATA_TYPE, 64)
+
+#define TEST_ALL(T)                                                            \
+  TEST_TYPE (T, int8_t)                                                        \
+  TEST_TYPE (T, uint8_t)                                                       \
+  TEST_TYPE (T, int16_t)                                                       \
+  TEST_TYPE (T, uint16_t)                                                      \
+  TEST_TYPE (T, _Float16)                                                      \
+  TEST_TYPE (T, int32_t)                                                       \
+  TEST_TYPE (T, uint32_t)                                                      \
+  TEST_TYPE (T, float)                                                         \
+  TEST_TYPE (T, int64_t)                                                       \
+  TEST_TYPE (T, uint64_t)                                                      \
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times " \.LEN_MASK_GATHER_LOAD" 66 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "optimized" } } */
+/* { dg-final { scan-assembler-not "vluxei" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c
new file mode 100644
index 00000000000..37dd7291f9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+#ifndef INDEX8
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,   \
+			  INDEX##BITS stride, INDEX##BITS n)                   \
+  {                                                                            \
+    for (INDEX##BITS i = 0; i < (BITS + 13); ++i)                              \
+      dest[i] += src[i * (BITS - 3)];                                          \
+  }
+
+#define TEST_TYPE(T, DATA_TYPE)                                                \
+  T (DATA_TYPE, 8)                                                             \
+  T (DATA_TYPE, 16)                                                            \
+  T (DATA_TYPE, 32)                                                            \
+  T (DATA_TYPE, 64)
+
+#define TEST_ALL(T)                                                            \
+  TEST_TYPE (T, int8_t)                                                        \
+  TEST_TYPE (T, uint8_t)                                                       \
+  TEST_TYPE (T, int16_t)                                                       \
+  TEST_TYPE (T, uint16_t)                                                      \
+  TEST_TYPE (T, _Float16)                                                      \
+  TEST_TYPE (T, int32_t)                                                       \
+  TEST_TYPE (T, uint32_t)                                                      \
+  TEST_TYPE (T, float)                                                         \
+  TEST_TYPE (T, int64_t)                                                       \
+  TEST_TYPE (T, uint64_t)                                                      \
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times " \.LEN_MASK_GATHER_LOAD" 46 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "optimized" } } */
+/* { dg-final { scan-assembler-not "vluxei" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c
new file mode 100644
index 00000000000..4b03c25a907
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c
@@ -0,0 +1,84 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+#include "strided_load-1.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];               \
+  DATA_TYPE dest2_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];              \
+  DATA_TYPE src_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];                \
+  INDEX##BITS stride_##DATA_TYPE##_##BITS = (BITS - 3);                        \
+  INDEX##BITS n_##DATA_TYPE##_##BITS = (BITS + 13);                            \
+  for (INDEX##BITS i = 0;                                                      \
+       i < stride_##DATA_TYPE##_##BITS * n_##DATA_TYPE##_##BITS; i++)          \
+    {                                                                          \
+      dest_##DATA_TYPE##_##BITS[i]                                             \
+	= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      dest2_##DATA_TYPE##_##BITS[i]                                            \
+	= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      src_##DATA_TYPE##_##BITS[i]                                              \
+	= (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));                          \
+    }                                                                          \
+  f_##DATA_TYPE##_##BITS (dest_##DATA_TYPE##_##BITS, src_##DATA_TYPE##_##BITS, \
+			  stride_##DATA_TYPE##_##BITS,                         \
+			  n_##DATA_TYPE##_##BITS);                             \
+  for (int i = 0; i < n_##DATA_TYPE##_##BITS; i++)                             \
+    {                                                                          \
+      assert (                                                                 \
+	dest_##DATA_TYPE##_##BITS[i]                                           \
+	== (dest2_##DATA_TYPE##_##BITS[i]                                      \
+	    + src_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS]));     \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c
new file mode 100644
index 00000000000..8499e4cef24
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c
@@ -0,0 +1,84 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+#include "strided_load-2.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];               \
+  DATA_TYPE dest2_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];              \
+  DATA_TYPE src_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];                \
+  INDEX##BITS stride_##DATA_TYPE##_##BITS = (BITS - 3);                        \
+  INDEX##BITS n_##DATA_TYPE##_##BITS = (BITS + 13);                            \
+  for (INDEX##BITS i = 0;                                                      \
+       i < stride_##DATA_TYPE##_##BITS * n_##DATA_TYPE##_##BITS; i++)          \
+    {                                                                          \
+      dest_##DATA_TYPE##_##BITS[i]                                             \
+	= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      dest2_##DATA_TYPE##_##BITS[i]                                            \
+	= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      src_##DATA_TYPE##_##BITS[i]                                              \
+	= (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));                          \
+    }                                                                          \
+  f_##DATA_TYPE##_##BITS (dest_##DATA_TYPE##_##BITS, src_##DATA_TYPE##_##BITS, \
+			  stride_##DATA_TYPE##_##BITS,                         \
+			  n_##DATA_TYPE##_##BITS);                             \
+  for (int i = 0; i < n_##DATA_TYPE##_##BITS; i++)                             \
+    {                                                                          \
+      assert (                                                                 \
+	dest_##DATA_TYPE##_##BITS[i]                                           \
+	== (dest2_##DATA_TYPE##_##BITS[i]                                      \
+	    + src_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS]));     \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c
new file mode 100644
index 00000000000..df0560c5a31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+#ifndef INDEX8
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,   \
+			  INDEX##BITS stride, INDEX##BITS n)                   \
+  {                                                                            \
+    for (INDEX##BITS i = 0; i < n; ++i)                                        \
+      dest[i * stride] = src[i] + BITS;                                        \
+  }
+
+#define TEST_TYPE(T, DATA_TYPE)                                                \
+  T (DATA_TYPE, 8)                                                             \
+  T (DATA_TYPE, 16)                                                            \
+  T (DATA_TYPE, 32)                                                            \
+  T (DATA_TYPE, 64)
+
+#define TEST_ALL(T)                                                            \
+  TEST_TYPE (T, int8_t)                                                        \
+  TEST_TYPE (T, uint8_t)                                                       \
+  TEST_TYPE (T, int16_t)                                                       \
+  TEST_TYPE (T, uint16_t)                                                      \
+  TEST_TYPE (T, _Float16)                                                      \
+  TEST_TYPE (T, int32_t)                                                       \
+  TEST_TYPE (T, uint32_t)                                                      \
+  TEST_TYPE (T, float)                                                         \
+  TEST_TYPE (T, int64_t)                                                       \
+  TEST_TYPE (T, uint64_t)                                                      \
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times " \.LEN_MASK_SCATTER_STORE" 66 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "optimized" } } */
+/* { dg-final { scan-assembler-not "vsuxei" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c
new file mode 100644
index 00000000000..1419cbc91b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+#ifndef INDEX8
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,   \
+			  INDEX##BITS stride, INDEX##BITS n)                   \
+  {                                                                            \
+    for (INDEX##BITS i = 0; i < n; ++i)                                        \
+      dest[i * (BITS - 3)] = src[i] + BITS;                                    \
+  }
+
+#define TEST_TYPE(T, DATA_TYPE)                                                \
+  T (DATA_TYPE, 8)                                                             \
+  T (DATA_TYPE, 16)                                                            \
+  T (DATA_TYPE, 32)                                                            \
+  T (DATA_TYPE, 64)
+
+#define TEST_ALL(T)                                                            \
+  TEST_TYPE (T, int8_t)                                                        \
+  TEST_TYPE (T, uint8_t)                                                       \
+  TEST_TYPE (T, int16_t)                                                       \
+  TEST_TYPE (T, uint16_t)                                                      \
+  TEST_TYPE (T, _Float16)                                                      \
+  TEST_TYPE (T, int32_t)                                                       \
+  TEST_TYPE (T, uint32_t)                                                      \
+  TEST_TYPE (T, float)                                                         \
+  TEST_TYPE (T, int64_t)                                                       \
+  TEST_TYPE (T, uint64_t)                                                      \
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times " \.LEN_MASK_SCATTER_STORE" 44 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "optimized" } } */
+/* { dg-final { scan-assembler-not "vsuxei" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c
new file mode 100644
index 00000000000..e9dca4672c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c
@@ -0,0 +1,82 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+#include "strided_store-1.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];               \
+  DATA_TYPE dest2_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];              \
+  DATA_TYPE src_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];                \
+  INDEX##BITS stride_##DATA_TYPE##_##BITS = (BITS - 3);                        \
+  INDEX##BITS n_##DATA_TYPE##_##BITS = (BITS + 13);                            \
+  for (INDEX##BITS i = 0;                                                      \
+       i < stride_##DATA_TYPE##_##BITS * n_##DATA_TYPE##_##BITS; i++)          \
+    {                                                                          \
+      dest_##DATA_TYPE##_##BITS[i]                                             \
+	= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      dest2_##DATA_TYPE##_##BITS[i]                                            \
+	= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      src_##DATA_TYPE##_##BITS[i]                                              \
+	= (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));                          \
+    }                                                                          \
+  f_##DATA_TYPE##_##BITS (dest_##DATA_TYPE##_##BITS, src_##DATA_TYPE##_##BITS, \
+			  stride_##DATA_TYPE##_##BITS,                         \
+			  n_##DATA_TYPE##_##BITS);                             \
+  for (int i = 0; i < n_##DATA_TYPE##_##BITS; i++)                             \
+    {                                                                          \
+      assert (dest_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS]       \
+	      == (src_##DATA_TYPE##_##BITS[i] + BITS));                        \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c
new file mode 100644
index 00000000000..509def789e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c
@@ -0,0 +1,82 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+#include "strided_store-2.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];               \
+  DATA_TYPE dest2_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];              \
+  DATA_TYPE src_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];                \
+  INDEX##BITS stride_##DATA_TYPE##_##BITS = (BITS - 3);                        \
+  INDEX##BITS n_##DATA_TYPE##_##BITS = (BITS + 13);                            \
+  for (INDEX##BITS i = 0;                                                      \
+       i < stride_##DATA_TYPE##_##BITS * n_##DATA_TYPE##_##BITS; i++)          \
+    {                                                                          \
+      dest_##DATA_TYPE##_##BITS[i]                                             \
+	= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      dest2_##DATA_TYPE##_##BITS[i]                                            \
+	= (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      src_##DATA_TYPE##_##BITS[i]                                              \
+	= (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));                          \
+    }                                                                          \
+  f_##DATA_TYPE##_##BITS (dest_##DATA_TYPE##_##BITS, src_##DATA_TYPE##_##BITS, \
+			  stride_##DATA_TYPE##_##BITS,                         \
+			  n_##DATA_TYPE##_##BITS);                             \
+  for (int i = 0; i < n_##DATA_TYPE##_##BITS; i++)                             \
+    {                                                                          \
+      assert (dest_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS]       \
+	      == (src_##DATA_TYPE##_##BITS[i] + BITS));                        \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 5e69235a268..19589fa9638 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -90,5 +90,28 @@ foreach op $AUTOVEC_TEST_OPTS {
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/vls-vlmax/*.\[cS\]]] \
 	"-std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax" $CFLAGS
 
+# gather-scatter tests
+set AUTOVEC_TEST_OPTS [list \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m1 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m4 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m1 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m4 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m1 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m4 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m1 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m4 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math} ]
+foreach op $AUTOVEC_TEST_OPTS {
+  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/gather-scatter/*.\[cS\]]] \
+    "" "$op"
+}
+
 # All done.
 dg-finish
-- 
2.36.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization
  2023-07-07 14:32 [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization Juzhe-Zhong
@ 2023-07-10 21:51 ` 钟居哲
  2023-07-12  2:01 ` Jeff Law
  1 sibling, 0 replies; 14+ messages in thread
From: 钟居哲 @ 2023-07-10 21:51 UTC (permalink / raw)
  To: 钟居哲, gcc-patches; +Cc: kito.cheng, rdapp.gcc, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 323418 bytes --]

Hi,
Is this OK for trunk?



juzhe.zhong@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-07-07 22:32
To: gcc-patches
CC: kito.cheng; rdapp.gcc; jeffreyalaw; Juzhe-Zhong
Subject: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization
This patch fully supports gather_load/scatter_store:
1. Support single-rgroup on both RV32/RV64.
2. Support index element widths that are the same as or smaller than Pmode.
3. Support VLA SLP with gather/scatter.
4. Fully tested all gather/scatter variants with LMUL = M1/M2/M4/M8, for both VLA and VLS.
5. Fix a bug in handling (subreg:SI (const_poly_int:DI)).
6. Fix a bug in vec_perm, which is used by gather/scatter SLP.
 
All kinds of GATHER/SCATTER are normalized into LEN_MASK_*.
We fully support these 4 kinds of gather/scatter:
1. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and dummy mask (full vector).
2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and real mask.
3. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and dummy mask.
4. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and real mask.
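 
To make the four combinations above concrete, here is a minimal scalar sketch of what a length- and mask-controlled gather load computes. It is only an illustration: the function name, the parameter list and the choice to leave inactive lanes untouched are assumptions for this example, not the signature or exact semantics of the GCC internal function. A "dummy" length corresponds to len == vl, and a "dummy" mask to an all-ones mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a len+mask gather load (names are
   illustrative, not GCC's internal-function signature).
   Lane i is loaded only if i < len and mask[i] is nonzero.
   Assumption: inactive lanes of dest are simply left untouched.  */
static void
model_len_mask_gather_load (int32_t *dest, const int32_t *base,
                            const int32_t *index, size_t vl,
                            size_t len, const uint8_t *mask)
{
  assert (len <= vl); /* The active length never exceeds the vector length.  */
  for (size_t i = 0; i < len; i++)
    if (mask[i])
      dest[i] = base[index[i]];
}
```

With vl = 4, calling the model with len = 4 and an all-ones mask gives the full-vector case, while a shorter len or a sparse mask gives the other three cases.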
 
We use vluxei/vsuxei (the unordered indexed loads/stores of RVV) to generate code for gather/scatter.
 
Also, we support strided loads/stores with vlse.v/vsse.v. Consider the following case:
#define TEST_LOOP(DATA_TYPE, BITS)                                             \
  void __attribute__ ((noinline, noclone))                                     \
  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,   \
			  INDEX##BITS stride, INDEX##BITS n)                   \
  {                                                                            \
    for (INDEX##BITS i = 0; i < n; ++i)                                        \
      dest[i] += src[i * stride];                                              \
  }
 
Codegen:
f_int8_t_8:
	ble	a3,zero,.L10
	li	a5,1
	mv	a4,a0
	bne	a2,a5,.L4
	li	a2,1
.L6:
	vsetvli	a5,a3,e8,m2,ta,ma
	vle8.v	v2,0(a0)
	vlse8.v	v4,0(a1),a2
	vsetvli	a6,zero,e8,m2,ta,ma
	sub	a3,a3,a5
	vadd.vv	v2,v2,v4
	vsetvli	zero,a5,e8,m2,ta,ma
	vse8.v	v2,0(a4)
	add	a0,a0,a5
	add	a1,a1,a5
	add	a4,a4,a5
	bne	a3,zero,.L6
.L10:
	ret
 
We use vlse.v instead of vluxei.
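
To make the relationship between the two forms concrete, the sketch below
writes the same loop twice in scalar C: once with an explicit per-element
index array (the shape an indexed load such as vluxei consumes) and once with a
single stride (the shape vlse.v consumes). The function names and the use of
size_t indices are illustrative assumptions, not part of the patch. When
index[i] == i * stride, both loops touch the same elements, which is why the
strided form can stand in for the general gather.

```c
#include <stddef.h>
#include <stdint.h>

/* General gather: one explicit index per element.  */
void
gather_add (int8_t *dest, const int8_t *src, const size_t *index, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dest[i] += src[index[i]];
}

/* Strided access: every index is derived from a single stride value,
   so no index vector has to be materialized.  */
void
strided_add (int8_t *dest, const int8_t *src, size_t stride, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dest[i] += src[i * stride];
}
```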
 
This patch has been tested on both RV32 and RV64.
 
gcc/ChangeLog:
 
        * config/riscv/autovec.md (len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>): New pattern.
        (len_mask_gather_load<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto.
        (len_mask_gather_load<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto.
        (len_mask_gather_load<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto.
        (len_mask_gather_load<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
        (len_mask_gather_load<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
        (len_mask_gather_load<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
        (len_mask_gather_load<mode><mode>): Ditto.
        (len_mask_scatter_store<VNX1_QHSD:mode><VNX1_QHSDI:mode>): Ditto.
        (len_mask_scatter_store<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto.
        (len_mask_scatter_store<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto.
        (len_mask_scatter_store<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto.
        (len_mask_scatter_store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
        (len_mask_scatter_store<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
        (len_mask_scatter_store<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
        (len_mask_scatter_store<mode><mode>): Ditto.
        * config/riscv/predicates.md (const_1_operand): New predicate.
        (vector_gs_offset_operand): Ditto.
        (vector_gs_scale_operand_16): Ditto.
        (vector_gs_scale_operand_32): Ditto.
        (vector_gs_scale_operand_64): Ditto.
        (vector_gs_extension_operand): Ditto.
        (vector_gs_scale_operand_16_rv32): Ditto.
        (vector_gs_scale_operand_32_rv32): Ditto.
        * config/riscv/riscv-protos.h (enum insn_type): Add gather/scatter.
        (expand_gather_scatter): New function.
        * config/riscv/riscv-v.cc (gen_const_vector_dup): Add gather/scatter.
        (emit_vlmax_masked_store_insn): New function.
        (emit_nonvlmax_masked_store_insn): Ditto.
        (modulo_sel_indices): Ditto.
        (expand_vec_perm): Fix SLP for gather/scatter.
        (prepare_gather_scatter): New function.
        (strided_load_store_p): Ditto.
        (expand_gather_scatter): Ditto.
        * config/riscv/riscv.cc (riscv_legitimize_move): Fix bug of (subreg:SI (DI CONST_POLY_INT)).
        * config/riscv/vector-iterators.md: Add gather/scatter.
        * config/riscv/vector.md (vec_duplicate<mode>): Use "@" instead.
        (@vec_duplicate<mode>): Ditto.
        (@pred_indexed_<order>store<VNX16_QHS:mode><VNX16_QHSDI:mode>): Fix name.
        (@pred_indexed_<order>store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
 
gcc/testsuite/ChangeLog:
 
        * gcc.target/riscv/rvv/rvv.exp: Add gather/scatter tests.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c: New test.
 
---
gcc/config/riscv/autovec.md                   | 256 ++++++++++++
gcc/config/riscv/predicates.md                |  39 +-
gcc/config/riscv/riscv-protos.h               |   3 +
gcc/config/riscv/riscv-v.cc                   | 372 ++++++++++++++++--
gcc/config/riscv/riscv.cc                     |  11 +-
gcc/config/riscv/vector-iterators.md          | 118 +++++-
gcc/config/riscv/vector.md                    |  30 +-
.../autovec/gather-scatter/gather_load-1.c    |  38 ++
.../autovec/gather-scatter/gather_load-10.c   |  35 ++
.../autovec/gather-scatter/gather_load-11.c   |  32 ++
.../autovec/gather-scatter/gather_load-12.c   | 112 ++++++
.../autovec/gather-scatter/gather_load-2.c    |  38 ++
.../autovec/gather-scatter/gather_load-3.c    |  35 ++
.../autovec/gather-scatter/gather_load-4.c    |  35 ++
.../autovec/gather-scatter/gather_load-5.c    |  35 ++
.../autovec/gather-scatter/gather_load-6.c    |  35 ++
.../autovec/gather-scatter/gather_load-7.c    |  35 ++
.../autovec/gather-scatter/gather_load-8.c    |  35 ++
.../autovec/gather-scatter/gather_load-9.c    |  35 ++
.../gather-scatter/gather_load_run-1.c        |  41 ++
.../gather-scatter/gather_load_run-10.c       |  41 ++
.../gather-scatter/gather_load_run-11.c       |  39 ++
.../gather-scatter/gather_load_run-12.c       | 124 ++++++
.../gather-scatter/gather_load_run-2.c        |  41 ++
.../gather-scatter/gather_load_run-3.c        |  41 ++
.../gather-scatter/gather_load_run-4.c        |  41 ++
.../gather-scatter/gather_load_run-5.c        |  41 ++
.../gather-scatter/gather_load_run-6.c        |  41 ++
.../gather-scatter/gather_load_run-7.c        |  41 ++
.../gather-scatter/gather_load_run-8.c        |  41 ++
.../gather-scatter/gather_load_run-9.c        |  41 ++
.../gather-scatter/mask_gather_load-1.c       |  39 ++
.../gather-scatter/mask_gather_load-10.c      |  36 ++
.../gather-scatter/mask_gather_load-11.c      | 116 ++++++
.../gather-scatter/mask_gather_load-2.c       |  39 ++
.../gather-scatter/mask_gather_load-3.c       |  36 ++
.../gather-scatter/mask_gather_load-4.c       |  36 ++
.../gather-scatter/mask_gather_load-5.c       |  36 ++
.../gather-scatter/mask_gather_load-6.c       |  36 ++
.../gather-scatter/mask_gather_load-7.c       |  36 ++
.../gather-scatter/mask_gather_load-8.c       |  36 ++
.../gather-scatter/mask_gather_load-9.c       |  36 ++
.../gather-scatter/mask_gather_load_run-1.c   |  48 +++
.../gather-scatter/mask_gather_load_run-10.c  |  48 +++
.../gather-scatter/mask_gather_load_run-11.c  | 140 +++++++
.../gather-scatter/mask_gather_load_run-2.c   |  48 +++
.../gather-scatter/mask_gather_load_run-3.c   |  48 +++
.../gather-scatter/mask_gather_load_run-4.c   |  48 +++
.../gather-scatter/mask_gather_load_run-5.c   |  48 +++
.../gather-scatter/mask_gather_load_run-6.c   |  48 +++
.../gather-scatter/mask_gather_load_run-7.c   |  48 +++
.../gather-scatter/mask_gather_load_run-8.c   |  48 +++
.../gather-scatter/mask_gather_load_run-9.c   |  48 +++
.../gather-scatter/mask_scatter_store-1.c     |  39 ++
.../gather-scatter/mask_scatter_store-10.c    |  36 ++
.../gather-scatter/mask_scatter_store-2.c     |  39 ++
.../gather-scatter/mask_scatter_store-3.c     |  36 ++
.../gather-scatter/mask_scatter_store-4.c     |  36 ++
.../gather-scatter/mask_scatter_store-5.c     |  36 ++
.../gather-scatter/mask_scatter_store-6.c     |  36 ++
.../gather-scatter/mask_scatter_store-7.c     |  36 ++
.../gather-scatter/mask_scatter_store-8.c     |  36 ++
.../gather-scatter/mask_scatter_store-9.c     |  36 ++
.../gather-scatter/mask_scatter_store_run-1.c |  48 +++
.../mask_scatter_store_run-10.c               |  48 +++
.../gather-scatter/mask_scatter_store_run-2.c |  48 +++
.../gather-scatter/mask_scatter_store_run-3.c |  48 +++
.../gather-scatter/mask_scatter_store_run-4.c |  48 +++
.../gather-scatter/mask_scatter_store_run-5.c |  48 +++
.../gather-scatter/mask_scatter_store_run-6.c |  48 +++
.../gather-scatter/mask_scatter_store_run-7.c |  48 +++
.../gather-scatter/mask_scatter_store_run-8.c |  48 +++
.../gather-scatter/mask_scatter_store_run-9.c |  48 +++
.../autovec/gather-scatter/scatter_store-1.c  |  38 ++
.../autovec/gather-scatter/scatter_store-10.c |  35 ++
.../autovec/gather-scatter/scatter_store-2.c  |  38 ++
.../autovec/gather-scatter/scatter_store-3.c  |  35 ++
.../autovec/gather-scatter/scatter_store-4.c  |  35 ++
.../autovec/gather-scatter/scatter_store-5.c  |  35 ++
.../autovec/gather-scatter/scatter_store-6.c  |  35 ++
.../autovec/gather-scatter/scatter_store-7.c  |  35 ++
.../autovec/gather-scatter/scatter_store-8.c  |  35 ++
.../autovec/gather-scatter/scatter_store-9.c  |  35 ++
.../gather-scatter/scatter_store_run-1.c      |  40 ++
.../gather-scatter/scatter_store_run-10.c     |  40 ++
.../gather-scatter/scatter_store_run-2.c      |  40 ++
.../gather-scatter/scatter_store_run-3.c      |  40 ++
.../gather-scatter/scatter_store_run-4.c      |  40 ++
.../gather-scatter/scatter_store_run-5.c      |  40 ++
.../gather-scatter/scatter_store_run-6.c      |  40 ++
.../gather-scatter/scatter_store_run-7.c      |  40 ++
.../gather-scatter/scatter_store_run-8.c      |  40 ++
.../gather-scatter/scatter_store_run-9.c      |  40 ++
.../autovec/gather-scatter/strided_load-1.c   |  46 +++
.../autovec/gather-scatter/strided_load-2.c   |  46 +++
.../gather-scatter/strided_load_run-1.c       |  84 ++++
.../gather-scatter/strided_load_run-2.c       |  84 ++++
.../autovec/gather-scatter/strided_store-1.c  |  46 +++
.../autovec/gather-scatter/strided_store-2.c  |  46 +++
.../gather-scatter/strided_store_run-1.c      |  82 ++++
.../gather-scatter/strided_store_run-2.c      |  82 ++++
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |  23 ++
102 files changed, 5084 insertions(+), 61 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 9e61b2e41d8..78b9b5a2edb 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -57,6 +57,262 @@
   }
)
+;; =========================================================================
+;; == Gather Load
+;; =========================================================================
+
+(define_expand "len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
+  [(match_operand:VNX1_QHSD 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX1_QHSDI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX1_QHSD:gs_extension>")
+   (match_operand 4 "<VNX1_QHSD:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX1_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+(define_expand "len_mask_gather_load<VNX2_QHSD:mode><VNX2_QHSDI:mode>"
+  [(match_operand:VNX2_QHSD 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX2_QHSDI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX2_QHSD:gs_extension>")
+   (match_operand 4 "<VNX2_QHSD:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX2_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+(define_expand "len_mask_gather_load<VNX4_QHSD:mode><VNX4_QHSDI:mode>"
+  [(match_operand:VNX4_QHSD 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX4_QHSDI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX4_QHSD:gs_extension>")
+   (match_operand 4 "<VNX4_QHSD:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX4_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+(define_expand "len_mask_gather_load<VNX8_QHSD:mode><VNX8_QHSDI:mode>"
+  [(match_operand:VNX8_QHSD 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX8_QHSDI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX8_QHSD:gs_extension>")
+   (match_operand 4 "<VNX8_QHSD:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX8_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+(define_expand "len_mask_gather_load<VNX16_QHSD:mode><VNX16_QHSDI:mode>"
+  [(match_operand:VNX16_QHSD 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX16_QHSDI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX16_QHSD:gs_extension>")
+   (match_operand 4 "<VNX16_QHSD:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX16_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+(define_expand "len_mask_gather_load<VNX32_QHS:mode><VNX32_QHSI:mode>"
+  [(match_operand:VNX32_QHS 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX32_QHSI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX32_QHS:gs_extension>")
+   (match_operand 4 "<VNX32_QHS:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX32_QHS:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+(define_expand "len_mask_gather_load<VNX64_QH:mode><VNX64_QHI:mode>"
+  [(match_operand:VNX64_QH 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX64_QHI 2 "vector_gs_offset_operand")
+   (match_operand 3 "<VNX64_QH:gs_extension>")
+   (match_operand 4 "<VNX64_QH:gs_scale>")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX64_QH:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+;; When SEW = 8 and LMUL = 8, we can't find any index mode with
+;; larger SEW. Since RVV indexed load/store support zero extend
+;; implicitly and not support scaling, we should only allow
+;; operands[3] and operands[4] to be const_1_operand.
+(define_expand "len_mask_gather_load<mode><mode>"
+  [(match_operand:VNX128_Q 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:VNX128_Q 2 "vector_gs_offset_operand")
+   (match_operand 3 "const_1_operand")
+   (match_operand 4 "const_1_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, true);
+  DONE;
+})
+
+;; =========================================================================
+;; == Scatter Store
+;; =========================================================================
+
+(define_expand "len_mask_scatter_store<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX1_QHSDI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX1_QHSD:gs_extension>")
+   (match_operand 3 "<VNX1_QHSD:gs_scale>")
+   (match_operand:VNX1_QHSD 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX1_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+(define_expand "len_mask_scatter_store<VNX2_QHSD:mode><VNX2_QHSDI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX2_QHSDI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX2_QHSD:gs_extension>")
+   (match_operand 3 "<VNX2_QHSD:gs_scale>")
+   (match_operand:VNX2_QHSD 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX2_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+(define_expand "len_mask_scatter_store<VNX4_QHSD:mode><VNX4_QHSDI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX4_QHSDI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX4_QHSD:gs_extension>")
+   (match_operand 3 "<VNX4_QHSD:gs_scale>")
+   (match_operand:VNX4_QHSD 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX4_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+(define_expand "len_mask_scatter_store<VNX8_QHSD:mode><VNX8_QHSDI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX8_QHSDI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX8_QHSD:gs_extension>")
+   (match_operand 3 "<VNX8_QHSD:gs_scale>")
+   (match_operand:VNX8_QHSD 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX8_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+(define_expand "len_mask_scatter_store<VNX16_QHSD:mode><VNX16_QHSDI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX16_QHSDI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX16_QHSD:gs_extension>")
+   (match_operand 3 "<VNX16_QHSD:gs_scale>")
+   (match_operand:VNX16_QHSD 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX16_QHSD:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+(define_expand "len_mask_scatter_store<VNX32_QHS:mode><VNX32_QHSI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX32_QHSI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX32_QHS:gs_extension>")
+   (match_operand 3 "<VNX32_QHS:gs_scale>")
+   (match_operand:VNX32_QHS 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX32_QHS:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+(define_expand "len_mask_scatter_store<VNX64_QH:mode><VNX64_QHI:mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX64_QHI 1 "vector_gs_offset_operand")
+   (match_operand 2 "<VNX64_QH:gs_extension>")
+   (match_operand 3 "<VNX64_QH:gs_scale>")
+   (match_operand:VNX64_QH 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VNX64_QH:VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
+;; When SEW = 8 and LMUL = 8, we can't find any index mode with
+;; larger SEW. Since RVV indexed loads/stores zero-extend the index
+;; implicitly and do not support scaling, we should only allow
+;; operands[2] and operands[3] to be const_1_operand.
+(define_expand "len_mask_scatter_store<mode><mode>"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand:VNX128_Q 1 "vector_gs_offset_operand")
+   (match_operand 2 "const_1_operand")
+   (match_operand 3 "const_1_operand")
+   (match_operand:VNX128_Q 4 "register_operand")
+   (match_operand 5 "autovec_length_operand")
+   (match_operand 6 "const_0_operand")
+   (match_operand:<VM> 7 "vector_mask_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_gather_scatter (operands, false);
+  DONE;
+})
+
;; =========================================================================
;; == Vector creation
;; =========================================================================
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index eb975eaf994..5a65334e943 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -61,6 +61,10 @@
   (and (match_code "const_int,const_wide_int,const_vector")
        (match_test "op == CONST0_RTX (GET_MODE (op))")))
+(define_predicate "const_1_operand"
+  (and (match_code "const_int,const_wide_int,const_vector")
+       (match_test "op == CONST1_RTX (GET_MODE (op))")))
+
(define_predicate "reg_or_0_operand"
   (ior (match_operand 0 "const_0_operand")
        (match_operand 0 "register_operand")))
@@ -341,6 +345,39 @@
   (ior (match_operand 0 "register_operand")
        (match_code "const_vector")))
+(define_predicate "vector_gs_offset_operand"
+  (ior (match_operand 0 "register_operand")
+       (and (match_code "const_vector")
+            (match_test "CONST_VECTOR_NPATTERNS (op) == 1
+                  && !CONST_VECTOR_DUPLICATE_P (op)"))))
+
+(define_predicate "vector_gs_scale_operand_16"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) == 1 || INTVAL (op) == 2")))
+
+(define_predicate "vector_gs_scale_operand_32"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) == 1 || INTVAL (op) == 4")))
+
+(define_predicate "vector_gs_scale_operand_64"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) == 1 || (INTVAL (op) == 8 && Pmode == DImode)")))
+
+(define_predicate "vector_gs_extension_operand"
+  (ior (match_operand 0 "const_1_operand")
+       (and (match_operand 0 "const_0_operand")
+            (match_test "Pmode == SImode"))))
+
+(define_predicate "vector_gs_scale_operand_16_rv32"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) == 1
+     || (INTVAL (op) == 2 && Pmode == SImode)")))
+
+(define_predicate "vector_gs_scale_operand_32_rv32"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) == 1
+     || (INTVAL (op) == 4 && Pmode == SImode)")))
+
(define_predicate "ltge_operator"
   (match_code "lt,ltu,ge,geu"))
@@ -376,7 +413,7 @@
|| rtx_equal_p (op, CONST0_RTX (GET_MODE (op))))
&& maybe_gt (GET_MODE_BITSIZE (GET_MODE (op)), GET_MODE_BITSIZE (Pmode)))")
     (ior (match_test "rtx_equal_p (op, CONST0_RTX (GET_MODE (op)))")
-         (ior (match_operand 0 "const_int_operand")
+         (ior (match_code "const_int,const_poly_int")
               (ior (match_operand 0 "register_operand")
                    (match_test "satisfies_constraint_Wdm (op)"))))))
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5766e3597e8..fd6caccc183 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -148,6 +148,8 @@ enum insn_type
   RVV_WIDEN_TERNOP = 4,
   RVV_SCALAR_MOV_OP = 4, /* +1 for VUNDEF according to vector.md.  */
   RVV_SLIDE_OP = 4,      /* Dest, VUNDEF, source and offset.  */
+  RVV_GATHER_M_OP = 5,
+  RVV_SCATTER_M_OP = 4,
};
enum vlmul_type
{
@@ -255,6 +257,7 @@ void expand_vec_init (rtx, rtx);
void expand_vec_perm (rtx, rtx, rtx, rtx);
void expand_select_vl (rtx *);
void expand_load_store (rtx *, bool);
+void expand_gather_scatter (rtx *, bool);
/* Rounding mode bitfield for fixed point VXRM.  */
enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 8d5bed7ebe4..dd36b3b71c7 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -49,6 +49,7 @@
#include "tm-constrs.h"
#include "rtx-vector-builder.h"
#include "targhooks.h"
+#include "gimple.h"
using namespace riscv_vector;
@@ -556,15 +557,22 @@ const_vec_all_in_range_p (rtx vec, poly_int64 minval, poly_int64 maxval)
   return true;
}
-/* Return a const_int vector of VAL.
-
-   This function also exists in aarch64, we may unify it in middle-end in the
-   future.  */
+/* Return a const vector of VAL. The VAL can be either const_int or
+   const_poly_int.  */
static rtx
gen_const_vector_dup (machine_mode mode, poly_int64 val)
{
-  rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
+  scalar_mode smode = GET_MODE_INNER (mode);
+  rtx c = gen_int_mode (val, smode);
+  if (!val.is_constant () && GET_MODE_SIZE (smode) > GET_MODE_SIZE (Pmode))
+    {
+      /* When VAL is a const_poly_int value, we need to explicitly broadcast
+ it into a vector using the RVV broadcast instruction.  */
+      rtx dup = gen_reg_rtx (mode);
+      emit_insn (gen_vec_duplicate (mode, dup, c));
+      return dup;
+    }
   return gen_const_vec_duplicate (mode, c);
}
@@ -901,6 +909,39 @@ emit_nonvlmax_masked_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
   e.emit_insn ((enum insn_code) icode, ops);
}
+/* This function emits a VLMAX masked store instruction.  */
+static void
+emit_vlmax_masked_store_insn (unsigned icode, int op_num, rtx *ops)
+{
+  machine_mode dest_mode = GET_MODE (ops[0]);
+  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
+  insn_expander<RVV_INSN_OPERANDS_MAX> e (/*OP_NUM*/ op_num,
+   /*HAS_DEST_P*/ false,
+   /*FULLY_UNMASKED_P*/ false,
+   /*USE_REAL_MERGE_P*/ true,
+   /*HAS_AVL_P*/ true,
+   /*VLMAX_P*/ true, dest_mode,
+   mask_mode);
+  e.emit_insn ((enum insn_code) icode, ops);
+}
+
+/* This function emits a non-VLMAX masked store instruction.  */
+static void
+emit_nonvlmax_masked_store_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
+{
+  machine_mode dest_mode = GET_MODE (ops[0]);
+  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
+  insn_expander<RVV_INSN_OPERANDS_MAX> e (/*OP_NUM*/ op_num,
+   /*HAS_DEST_P*/ false,
+   /*FULLY_UNMASKED_P*/ false,
+   /*USE_REAL_MERGE_P*/ true,
+   /*HAS_AVL_P*/ true,
+   /*VLMAX_P*/ false, dest_mode,
+   mask_mode);
+  e.set_vl (avl);
+  e.emit_insn ((enum insn_code) icode, ops);
+}
+
/* This function emits a masked instruction.  */
void
emit_vlmax_masked_mu_insn (unsigned icode, int op_num, rtx *ops)
@@ -1137,7 +1178,6 @@ static void
expand_const_vector (rtx target, rtx src)
{
   machine_mode mode = GET_MODE (target);
-  scalar_mode elt_mode = GET_MODE_INNER (mode);
   if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
     {
       rtx elt;
@@ -1162,7 +1202,6 @@ expand_const_vector (rtx target, rtx src)
}
       else
{
-   elt = force_reg (elt_mode, elt);
  rtx ops[] = {tmp, elt};
  emit_vlmax_insn (code_for_pred_broadcast (mode), RVV_UNOP, ops);
}
@@ -2431,6 +2470,25 @@ expand_vec_cmp_float (rtx target, rtx_code code, rtx op0, rtx op1,
   return false;
}
+/* Modulo all SEL indices to ensure they are all in range of [0, MAX_SEL].  */
+static rtx
+modulo_sel_indices (rtx sel, poly_uint64 max_sel)
+{
+  rtx sel_mod;
+  machine_mode sel_mode = GET_MODE (sel);
+  poly_uint64 nunits = GET_MODE_NUNITS (sel_mode);
+  /* If SEL is a variable-length CONST_VECTOR, we don't need to modulo it.  */
+  if (!nunits.is_constant () && CONST_VECTOR_P (sel))
+    sel_mod = sel;
+  else
+    {
+      rtx mod = gen_const_vector_dup (sel_mode, max_sel);
+      sel_mod
+ = expand_simple_binop (sel_mode, AND, sel, mod, NULL, 0, OPTAB_DIRECT);
+    }
+  return sel_mod;
+}
+
/* Implement vec_perm<mode>.  */
void
@@ -2444,41 +2502,43 @@ expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
      index is in range of [0, nunits - 1]. A single vrgather instructions is
      enough. Since we will use vrgatherei16.vv for variable-length vector,
      it is never out of range and we don't need to modulo the index.  */
-  if (!nunits.is_constant () || const_vec_all_in_range_p (sel, 0, nunits - 1))
+  if (nunits.is_constant () && const_vec_all_in_range_p (sel, 0, nunits - 1))
     {
       emit_vlmax_gather_insn (target, op0, sel);
       return;
     }
+  /* Check if all the indices are same.  */
+  rtx elt;
+  if (const_vec_duplicate_p (sel, &elt))
+    {
+      poly_uint64 value = rtx_to_poly_int64 (elt);
+      rtx op = op0;
+      if (maybe_gt (value, nunits - 1))
+ {
+   sel = gen_const_vector_dup (sel_mode, value - nunits);
+   op = op1;
+ }
+      emit_vlmax_gather_insn (target, op, sel);
+    }
+
+  /* Note: vec_perm indices are supposed to wrap when they go beyond the
+     size of the two value vectors, i.e. the upper bits of the indices
+     are effectively ignored.  RVV vrgather instead produces 0 for any
+     out-of-range indices, so we need to modulo all the vec_perm indices
+     to ensure they are all in range of [0, nunits - 1] when op0 == op1
+     or all in range of [0, 2 * nunits - 1] when op0 != op1.  */
+  rtx sel_mod
+    = modulo_sel_indices (sel,
+   rtx_equal_p (op0, op1) ? nunits - 1 : 2 * nunits - 1);
   /* Check if the two values vectors are the same.  */
-  if (rtx_equal_p (op0, op1) || const_vec_duplicate_p (sel))
-    {
-      /* Note: vec_perm indices are supposed to wrap when they go beyond the
- size of the two value vectors, i.e. the upper bits of the indices
- are effectively ignored.  RVV vrgather instead produces 0 for any
- out-of-range indices, so we need to modulo all the vec_perm indices
- to ensure they are all in range of [0, nunits - 1].  */
-      rtx max_sel = gen_const_vector_dup (sel_mode, nunits - 1);
-      rtx sel_mod = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0,
- OPTAB_DIRECT);
-      emit_vlmax_gather_insn (target, op1, sel_mod);
+  if (rtx_equal_p (op0, op1))
+    {
+      emit_vlmax_gather_insn (target, op0, sel_mod);
       return;
     }
-  rtx sel_mod = sel;
   rtx max_sel = gen_const_vector_dup (sel_mode, 2 * nunits - 1);
-  /* We don't need to modulo indices for VLA vector.
-     Since we should gurantee they aren't out of range before.  */
-  if (nunits.is_constant ())
-    {
-      /* Note: vec_perm indices are supposed to wrap when they go beyond the
- size of the two value vectors, i.e. the upper bits of the indices
- are effectively ignored.  RVV vrgather instead produces 0 for any
- out-of-range indices, so we need to modulo all the vec_perm indices
- to ensure they are all in range of [0, 2 * nunits - 1].  */
-      sel_mod = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0,
-      OPTAB_DIRECT);
-    }
   /* This following sequence is handling the case that:
      __builtin_shufflevector (vec1, vec2, index...), the index can be any
@@ -2812,4 +2872,252 @@ expand_load_store (rtx *ops, bool is_load)
     }
}
+/* Prepare insn_code for gather_load/scatter_store according to
+   the vector mode and index mode.  */
+static insn_code
+prepare_gather_scatter (machine_mode vec_mode, machine_mode idx_mode,
+ bool is_load)
+{
+  if (!is_load)
+    return code_for_pred_indexed_store (UNSPEC_UNORDERED, vec_mode, idx_mode);
+  else
+    {
+      unsigned src_eew_bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (idx_mode));
+      unsigned dst_eew_bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (vec_mode));
+      if (dst_eew_bitsize == src_eew_bitsize)
+ return code_for_pred_indexed_load_same_eew (UNSPEC_UNORDERED, vec_mode);
+      else if (dst_eew_bitsize > src_eew_bitsize)
+ {
+   unsigned factor = dst_eew_bitsize / src_eew_bitsize;
+   switch (factor)
+     {
+     case 2:
+       return code_for_pred_indexed_load_x2_greater_eew (
+ UNSPEC_UNORDERED, vec_mode);
+     case 4:
+       return code_for_pred_indexed_load_x4_greater_eew (
+ UNSPEC_UNORDERED, vec_mode);
+     case 8:
+       return code_for_pred_indexed_load_x8_greater_eew (
+ UNSPEC_UNORDERED, vec_mode);
+     default:
+       gcc_unreachable ();
+     }
+ }
+      else
+ {
+   unsigned factor = src_eew_bitsize / dst_eew_bitsize;
+   switch (factor)
+     {
+     case 2:
+       return code_for_pred_indexed_load_x2_smaller_eew (
+ UNSPEC_UNORDERED, vec_mode);
+     case 4:
+       return code_for_pred_indexed_load_x4_smaller_eew (
+ UNSPEC_UNORDERED, vec_mode);
+     case 8:
+       return code_for_pred_indexed_load_x8_smaller_eew (
+ UNSPEC_UNORDERED, vec_mode);
+     default:
+       gcc_unreachable ();
+     }
+ }
+    }
+}
+
+/* Return true if it is the strided load/store.  */
+static bool
+strided_load_store_p (rtx vec_offset, rtx *base, rtx *step)
+{
+  if (const_vec_series_p (vec_offset, base, step))
+    return true;
+
+  /* For strided load/store, vectorizer always generates
+     VEC_SERIES_EXPR for vec_offset.  */
+  tree expr = REG_EXPR (vec_offset);
+  if (!expr || TREE_CODE (expr) != SSA_NAME)
+    return false;
+
+  /* Check if it is GIMPLE like: _88 = VEC_SERIES_EXPR <0, _87>;  */
+  gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
+  if (!def_stmt || !is_gimple_assign (def_stmt)
+      || gimple_assign_rhs_code (def_stmt) != VEC_SERIES_EXPR)
+    return false;
+
+  tree baset = gimple_assign_rhs1 (def_stmt);
+  tree stept = gimple_assign_rhs2 (def_stmt);
+  *base = expand_normal (baset);
+  *step = expand_normal (stept);
+
+  if (!rtx_equal_p (*base, const0_rtx))
+    return false;
+  return true;
+}
+
+/* Expand LEN_MASK_{GATHER_LOAD,SCATTER_STORE}.  */
+void
+expand_gather_scatter (rtx *ops, bool is_load)
+{
+  rtx ptr, vec_offset, vec_reg, len, mask;
+  bool zero_extend_p;
+  int scale_log2;
+  if (is_load)
+    {
+      vec_reg = ops[0];
+      ptr = ops[1];
+      vec_offset = ops[2];
+      zero_extend_p = INTVAL (ops[3]);
+      scale_log2 = exact_log2 (INTVAL (ops[4]));
+      len = ops[5];
+      mask = ops[7];
+    }
+  else
+    {
+      vec_reg = ops[4];
+      ptr = ops[0];
+      vec_offset = ops[1];
+      zero_extend_p = INTVAL (ops[2]);
+      scale_log2 = exact_log2 (INTVAL (ops[3]));
+      len = ops[5];
+      mask = ops[7];
+    }
+
+  machine_mode vec_mode = GET_MODE (vec_reg);
+  machine_mode idx_mode = GET_MODE (vec_offset);
+  scalar_mode inner_vec_mode = GET_MODE_INNER (vec_mode);
+  scalar_mode inner_idx_mode = GET_MODE_INNER (idx_mode);
+  unsigned inner_vsize = GET_MODE_BITSIZE (inner_vec_mode);
+  unsigned inner_offsize = GET_MODE_BITSIZE (inner_idx_mode);
+  poly_int64 nunits = GET_MODE_NUNITS (vec_mode);
+  poly_int64 value;
+  bool is_vlmax = poly_int_rtx_p (len, &value) && known_eq (value, nunits);
+
+  /* We use vlse.v/vsse.v instead of indexed load/store by default
+     if it is strided load/store.
+
+     FIXME: vlse.v/vsse.v may not always be better than vluxei.v/vsuxei.v.
+     We may need a cost model to adjust it.  */
+  rtx base, step;
+  if (strided_load_store_p (vec_offset, &base, &step))
+    {
+      if (GET_MODE (step) != Pmode)
+ {
+   if (CONSTANT_P (step))
+     step = force_reg (Pmode, step);
+   else
+     {
+       rtx extend_step = gen_reg_rtx (Pmode);
+       emit_insn (gen_extend_insn (extend_step, step, Pmode,
+   GET_MODE (step),
+   zero_extend_p ? true : false));
+       step = extend_step;
+     }
+ }
+      if (scale_log2 != 0)
+ {
+   rtx scale_step = gen_reg_rtx (Pmode);
+   rtx tmp = expand_simple_binop (Pmode, ASHIFT, step,
+ gen_int_mode (scale_log2, Pmode),
+ NULL_RTX, false, OPTAB_DIRECT);
+   emit_move_insn (scale_step, tmp);
+   step = scale_step;
+ }
+
+      rtx mem = validize_mem (gen_rtx_MEM (vec_mode, ptr));
+      /* Emit vlse.v if it's load. Otherwise, emit vsse.v.  */
+      if (is_load)
+ {
+   insn_code icode = code_for_pred_strided_load (vec_mode);
+   rtx load_ops[] = {vec_reg, mask, RVV_VUNDEF (vec_mode), mem, step};
+   if (is_vlmax)
+     emit_vlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops);
+   else
+     emit_nonvlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops, len);
+ }
+      else
+ {
+   if (is_vlmax)
+     {
+       rtx vlmax_len = gen_reg_rtx (Pmode);
+       emit_vlmax_vsetvl (vec_mode, vlmax_len);
+       emit_insn (gen_pred_strided_store (vec_mode, mem, mask, step,
+ vec_reg, vlmax_len,
+ get_avl_type_rtx (VLMAX)));
+     }
+   else
+     emit_insn (gen_pred_strided_store (vec_mode, mem, mask, step,
+        vec_reg, len,
+        get_avl_type_rtx (NONVLMAX)));
+ }
+      return;
+    }
+
+  if (inner_offsize < inner_vsize)
+    {
+      /* 7.2. Vector Load/Store Addressing Modes.
+ If the vector offset elements are narrower than XLEN, they are
+ zero-extended to XLEN before adding to the ptr effective address. If
+ the vector offset elements are wider than XLEN, the least-significant
+ XLEN bits are used in the address calculation. An implementation must
+ raise an illegal instruction exception if the EEW is not supported for
+ offset elements.
+
+ The RVV spec only covers the scale_log2 == 0 case.  */
+      if (!zero_extend_p || (zero_extend_p && scale_log2 != 0))
+ {
+   if (zero_extend_p)
+     inner_idx_mode
+       = int_mode_for_size (inner_offsize * 2, 0).require ();
+   else
+     inner_idx_mode = int_mode_for_size (BITS_PER_WORD, 0).require ();
+   machine_mode new_idx_mode
+     = get_vector_mode (inner_idx_mode, nunits).require ();
+   rtx tmp = gen_reg_rtx (new_idx_mode);
+   emit_insn (gen_extend_insn (tmp, vec_offset, new_idx_mode, idx_mode,
+       zero_extend_p ? true : false));
+   vec_offset = tmp;
+   idx_mode = new_idx_mode;
+ }
+    }
+
+  if (scale_log2 != 0)
+    {
+      rtx tmp = expand_binop (idx_mode, ashl_optab, vec_offset,
+       gen_int_mode (scale_log2, Pmode), NULL_RTX, 0,
+       OPTAB_DIRECT);
+      vec_offset = tmp;
+    }
+
+  insn_code icode = prepare_gather_scatter (vec_mode, idx_mode, is_load);
+  if (is_vlmax)
+    {
+      if (is_load)
+ {
+   rtx load_ops[]
+     = {vec_reg, mask, RVV_VUNDEF (vec_mode), ptr, vec_offset};
+   emit_vlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops);
+ }
+      else
+ {
+   rtx store_ops[] = {mask, ptr, vec_offset, vec_reg};
+   emit_vlmax_masked_store_insn (icode, RVV_SCATTER_M_OP, store_ops);
+ }
+    }
+  else
+    {
+      if (is_load)
+ {
+   rtx load_ops[]
+     = {vec_reg, mask, RVV_VUNDEF (vec_mode), ptr, vec_offset};
+   emit_nonvlmax_masked_insn (icode, RVV_GATHER_M_OP, load_ops, len);
+ }
+      else
+ {
+   rtx store_ops[] = {mask, ptr, vec_offset, vec_reg};
+   emit_nonvlmax_masked_store_insn (icode, RVV_SCATTER_M_OP, store_ops,
+    len);
+ }
+    }
+}
+
} // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 38d8eb2fcf5..8970f6da6ad 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2060,7 +2060,14 @@ riscv_legitimize_poly_move (machine_mode mode, rtx dest, rtx tmp, rtx src)
      (m, n) = base * magn + constant.
      This calculation doesn't need div operation.  */
-  emit_move_insn (tmp, gen_int_mode (BYTES_PER_RISCV_VECTOR, mode));
+  if (mode <= Pmode)
+    emit_move_insn (tmp, gen_int_mode (BYTES_PER_RISCV_VECTOR, mode));
+  else
+    {
+      emit_move_insn (gen_highpart (Pmode, tmp), CONST0_RTX (Pmode));
+      emit_move_insn (gen_lowpart (Pmode, tmp),
+       gen_int_mode (BYTES_PER_RISCV_VECTOR, Pmode));
+    }
   if (BYTES_PER_RISCV_VECTOR.is_constant ())
     {
@@ -2167,7 +2174,7 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx src)
  return false;
}
-      if (satisfies_constraint_vp (src))
+      if (satisfies_constraint_vp (src) && GET_MODE (src) == Pmode)
return false;
       if (GET_MODE_SIZE (mode).to_constant () < GET_MODE_SIZE (Pmode))
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 8afd3dcaddd..ec49544bcf6 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -115,6 +115,9 @@
(define_mode_iterator VEEWEXT2 [
   (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI "TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_VECTOR_ELEN_FP_16") (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16") (VNx16HF "TARGET_VECTOR_ELEN_FP_16") (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
   (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI "TARGET_VECTOR_ELEN_64")
   (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
@@ -161,6 +164,8 @@
(define_mode_iterator VEEWTRUNC2 [
   (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI (VNx64QI "TARGET_MIN_VLEN >= 128")
   (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI "TARGET_MIN_VLEN >= 128")
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_VECTOR_ELEN_FP_16") (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16") (VNx16HF "TARGET_VECTOR_ELEN_FP_16") (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
   (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI "TARGET_MIN_VLEN >= 128")
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
@@ -172,6 +177,8 @@
(define_mode_iterator VEEWTRUNC4 [
   (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI (VNx32QI "TARGET_MIN_VLEN >= 128")
   (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI (VNx16HI "TARGET_MIN_VLEN >= 128")
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_VECTOR_ELEN_FP_16") (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16") (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
])
(define_mode_iterator VEEWTRUNC8 [
@@ -362,46 +369,67 @@
])
(define_mode_iterator VNX1_QHSD [
-  (VNx1QI "TARGET_MIN_VLEN < 128") (VNx1HI "TARGET_MIN_VLEN < 128") (VNx1SI "TARGET_MIN_VLEN < 128")
+  (VNx1QI "TARGET_MIN_VLEN < 128")
+  (VNx1HI "TARGET_MIN_VLEN < 128")
+  (VNx1SI "TARGET_MIN_VLEN < 128")
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128")
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
])
(define_mode_iterator VNX2_QHSD [
-  VNx2QI VNx2HI VNx2SI
+  VNx2QI
+  VNx2HI
+  VNx2SI
   (VNx2DI "TARGET_VECTOR_ELEN_64")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx2DF "TARGET_VECTOR_ELEN_FP_64")
])
(define_mode_iterator VNX4_QHSD [
-  VNx4QI VNx4HI VNx4SI
+  VNx4QI
+  VNx4HI
+  VNx4SI
   (VNx4DI "TARGET_VECTOR_ELEN_64")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx4DF "TARGET_VECTOR_ELEN_FP_64")
])
(define_mode_iterator VNX8_QHSD [
-  VNx8QI VNx8HI VNx8SI
+  VNx8QI
+  VNx8HI
+  VNx8SI
   (VNx8DI "TARGET_VECTOR_ELEN_64")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx8SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx8DF "TARGET_VECTOR_ELEN_FP_64")
])
-(define_mode_iterator VNX16_QHS [
-  VNx16QI VNx16HI (VNx16SI "TARGET_MIN_VLEN > 32")
+(define_mode_iterator VNX16_QHSD [
+  VNx16QI
+  VNx16HI
+  (VNx16SI "TARGET_MIN_VLEN > 32")
+  (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
+  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx16SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
-  (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128") (VNx16DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN >= 128")
+  (VNx16DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN >= 128")
])
(define_mode_iterator VNX32_QHS [
-  VNx32QI (VNx32HI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128") (VNx32SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
+  VNx32QI
+  (VNx32HI "TARGET_MIN_VLEN > 32")
+  (VNx32SI "TARGET_MIN_VLEN >= 128")
+  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx32SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
])
(define_mode_iterator VNX64_QH [
   (VNx64QI "TARGET_MIN_VLEN > 32")
   (VNx64HI "TARGET_MIN_VLEN >= 128")
+  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
])
(define_mode_iterator VNX128_Q [
@@ -409,35 +437,49 @@
])
(define_mode_iterator VNX1_QHSDI [
-  (VNx1QI "TARGET_MIN_VLEN < 128") (VNx1HI "TARGET_MIN_VLEN < 128") (VNx1SI "TARGET_MIN_VLEN < 128")
-  (VNx1DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128")
+  (VNx1QI "TARGET_MIN_VLEN < 128")
+  (VNx1HI "TARGET_MIN_VLEN < 128")
+  (VNx1SI "TARGET_MIN_VLEN < 128")
+  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128 && TARGET_64BIT")
])
(define_mode_iterator VNX2_QHSDI [
-  VNx2QI VNx2HI VNx2SI
-  (VNx2DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
+  VNx2QI
+  VNx2HI
+  VNx2SI
+  (VNx2DI "TARGET_VECTOR_ELEN_64 && TARGET_64BIT")
])
(define_mode_iterator VNX4_QHSDI [
-  VNx4QI VNx4HI VNx4SI
-  (VNx4DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
+  VNx4QI
+  VNx4HI
+  VNx4SI
+  (VNx4DI "TARGET_VECTOR_ELEN_64 && TARGET_64BIT")
])
(define_mode_iterator VNX8_QHSDI [
-  VNx8QI VNx8HI VNx8SI
-  (VNx8DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
+  VNx8QI
+  VNx8HI
+  VNx8SI
+  (VNx8DI "TARGET_VECTOR_ELEN_64 && TARGET_64BIT")
])
(define_mode_iterator VNX16_QHSDI [
-  VNx16QI VNx16HI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx16DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
+  VNx16QI
+  VNx16HI
+  (VNx16SI "TARGET_MIN_VLEN > 32")
+  (VNx16DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128 && TARGET_64BIT")
])
(define_mode_iterator VNX32_QHSI [
-  VNx32QI (VNx32HI "TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
+  VNx32QI
+  (VNx32HI "TARGET_MIN_VLEN > 32")
+  (VNx32SI "TARGET_MIN_VLEN >= 128")
])
(define_mode_iterator VNX64_QHI [
-  VNx64QI (VNx64HI "TARGET_MIN_VLEN >= 128")
+  (VNx64QI "TARGET_MIN_VLEN > 32")
+  (VNx64HI "TARGET_MIN_VLEN >= 128")
])
(define_mode_iterator V_WHOLE [
@@ -1393,6 +1435,8 @@
(define_mode_attr VINDEX_DOUBLE_TRUNC [
   (VNx1HI "VNx1QI") (VNx2HI "VNx2QI")  (VNx4HI "VNx4QI")  (VNx8HI "VNx8QI")
   (VNx16HI "VNx16QI") (VNx32HI "VNx32QI") (VNx64HI "VNx64QI")
+  (VNx1HF "VNx1QI") (VNx2HF "VNx2QI")  (VNx4HF "VNx4QI")  (VNx8HF "VNx8QI")
+  (VNx16HF "VNx16QI") (VNx32HF "VNx32QI") (VNx64HF "VNx64QI")
   (VNx1SI "VNx1HI") (VNx2SI "VNx2HI") (VNx4SI "VNx4HI") (VNx8SI "VNx8HI")
   (VNx16SI "VNx16HI") (VNx32SI "VNx32HI")
   (VNx1SF "VNx1HI") (VNx2SF "VNx2HI") (VNx4SF "VNx4HI") (VNx8SF "VNx8HI")
@@ -1420,6 +1464,7 @@
(define_mode_attr VINDEX_DOUBLE_EXT [
   (VNx1QI "VNx1HI") (VNx2QI "VNx2HI") (VNx4QI "VNx4HI") (VNx8QI "VNx8HI") (VNx16QI "VNx16HI") (VNx32QI "VNx32HI") (VNx64QI "VNx64HI")
   (VNx1HI "VNx1SI") (VNx2HI "VNx2SI") (VNx4HI "VNx4SI") (VNx8HI "VNx8SI") (VNx16HI "VNx16SI") (VNx32HI "VNx32SI")
+  (VNx1HF "VNx1SI") (VNx2HF "VNx2SI") (VNx4HF "VNx4SI") (VNx8HF "VNx8SI") (VNx16HF "VNx16SI") (VNx32HF "VNx32SI")
   (VNx1SI "VNx1DI") (VNx2SI "VNx2DI") (VNx4SI "VNx4DI") (VNx8SI "VNx8DI") (VNx16SI "VNx16DI")
   (VNx1SF "VNx1DI") (VNx2SF "VNx2DI") (VNx4SF "VNx4DI") (VNx8SF "VNx8DI") (VNx16SF "VNx16DI")
])
@@ -1427,6 +1472,7 @@
(define_mode_attr VINDEX_QUAD_EXT [
   (VNx1QI "VNx1SI") (VNx2QI "VNx2SI") (VNx4QI "VNx4SI") (VNx8QI "VNx8SI") (VNx16QI "VNx16SI") (VNx32QI "VNx32SI")
   (VNx1HI "VNx1DI") (VNx2HI "VNx2DI") (VNx4HI "VNx4DI") (VNx8HI "VNx8DI") (VNx16HI "VNx16DI")
+  (VNx1HF "VNx1DI") (VNx2HF "VNx2DI") (VNx4HF "VNx4DI") (VNx8HF "VNx8DI") (VNx16HF "VNx16DI")
])
(define_mode_attr VINDEX_OCT_EXT [
@@ -1471,6 +1517,40 @@
   (VNx4DI "VNx8BI") (VNx8DI "VNx16BI") (VNx16DI "VNx32BI")
])
+(define_mode_attr gs_extension [
+  (VNx1QI "immediate_operand") (VNx2QI "immediate_operand") (VNx4QI "immediate_operand") (VNx8QI "immediate_operand") (VNx16QI "immediate_operand")
+  (VNx32QI "vector_gs_extension_operand") (VNx64QI "const_1_operand")
+  (VNx1HI "immediate_operand") (VNx2HI "immediate_operand") (VNx4HI "immediate_operand") (VNx8HI "immediate_operand") (VNx16HI "immediate_operand")
+  (VNx32HI "vector_gs_extension_operand") (VNx64HI "const_1_operand")
+  (VNx1SI "immediate_operand") (VNx2SI "immediate_operand") (VNx4SI "immediate_operand") (VNx8SI "immediate_operand") (VNx16SI "immediate_operand")
+  (VNx32SI "vector_gs_extension_operand")
+  (VNx1DI "immediate_operand") (VNx2DI "immediate_operand") (VNx4DI "immediate_operand") (VNx8DI "immediate_operand") (VNx16DI "immediate_operand")
+
+  (VNx1HF "immediate_operand") (VNx2HF "immediate_operand") (VNx4HF "immediate_operand") (VNx8HF "immediate_operand") (VNx16HF "immediate_operand")
+  (VNx32HF "vector_gs_extension_operand") (VNx64HF "const_1_operand")
+  (VNx1SF "immediate_operand") (VNx2SF "immediate_operand") (VNx4SF "immediate_operand") (VNx8SF "immediate_operand") (VNx16SF "immediate_operand")
+  (VNx32SF "vector_gs_extension_operand")
+  (VNx1DF "immediate_operand") (VNx2DF "immediate_operand") (VNx4DF "immediate_operand") (VNx8DF "immediate_operand") (VNx16DF "immediate_operand")
+])
+
+(define_mode_attr gs_scale [
+  (VNx1QI "const_1_operand") (VNx2QI "const_1_operand") (VNx4QI "const_1_operand") (VNx8QI "const_1_operand")
+  (VNx16QI "const_1_operand") (VNx32QI "const_1_operand") (VNx64QI "const_1_operand")
+  (VNx1HI "vector_gs_scale_operand_16") (VNx2HI "vector_gs_scale_operand_16") (VNx4HI "vector_gs_scale_operand_16") (VNx8HI "vector_gs_scale_operand_16")
+  (VNx16HI "vector_gs_scale_operand_16") (VNx32HI "vector_gs_scale_operand_16_rv32") (VNx64HI "const_1_operand")
+  (VNx1SI "vector_gs_scale_operand_32") (VNx2SI "vector_gs_scale_operand_32") (VNx4SI "vector_gs_scale_operand_32") (VNx8SI "vector_gs_scale_operand_32")
+  (VNx16SI "vector_gs_scale_operand_32") (VNx32SI "vector_gs_scale_operand_32_rv32")
+  (VNx1DI "vector_gs_scale_operand_64") (VNx2DI "vector_gs_scale_operand_64") (VNx4DI "vector_gs_scale_operand_64") (VNx8DI "vector_gs_scale_operand_64")
+  (VNx16DI "vector_gs_scale_operand_64")
+
+  (VNx1HF "vector_gs_scale_operand_16") (VNx2HF "vector_gs_scale_operand_16") (VNx4HF "vector_gs_scale_operand_16") (VNx8HF "vector_gs_scale_operand_16")
+  (VNx16HF "vector_gs_scale_operand_16") (VNx32HF "vector_gs_scale_operand_16_rv32") (VNx64HF "const_1_operand")
+  (VNx1SF "vector_gs_scale_operand_32") (VNx2SF "vector_gs_scale_operand_32") (VNx4SF "vector_gs_scale_operand_32") (VNx8SF "vector_gs_scale_operand_32")
+  (VNx16SF "vector_gs_scale_operand_32") (VNx32SF "vector_gs_scale_operand_32_rv32")
+  (VNx1DF "vector_gs_scale_operand_64") (VNx2DF "vector_gs_scale_operand_64") (VNx4DF "vector_gs_scale_operand_64") (VNx8DF "vector_gs_scale_operand_64")
+  (VNx16DF "vector_gs_scale_operand_64")
+])
+
(define_int_iterator WREDUC [UNSPEC_WREDUC_SUM UNSPEC_WREDUC_USUM])
(define_int_iterator ORDER [UNSPEC_ORDERED UNSPEC_UNORDERED])
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 5b7a17b9d34..19740c89132 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -818,7 +818,7 @@
;; This pattern only handles duplicates of non-constant inputs.
;; Constant vectors go through the movm pattern instead.
;; So "direct_broadcast_operand" can only be mem or reg, no CONSTANT.
-(define_expand "vec_duplicate<mode>"
+(define_expand "@vec_duplicate<mode>"
   [(set (match_operand:V 0 "register_operand")
         (vec_duplicate:V
           (match_operand:<VEL> 1 "direct_broadcast_operand")))]
@@ -1357,8 +1357,16 @@
}
     }
   else if (GET_MODE_BITSIZE (<VEL>mode) > GET_MODE_BITSIZE (Pmode)
-           && immediate_operand (operands[3], Pmode))
-    operands[3] = gen_rtx_SIGN_EXTEND (<VEL>mode, force_reg (Pmode, operands[3]));
+           && (immediate_operand (operands[3], Pmode)
+               || (CONST_POLY_INT_P (operands[3])
+                   && known_ge (rtx_to_poly_int64 (operands[3]), 0U)
+                   && known_le (rtx_to_poly_int64 (operands[3]), GET_MODE_SIZE (<MODE>mode)))))
+    {
+      rtx tmp = gen_reg_rtx (Pmode);
+      poly_int64 value = rtx_to_poly_int64 (operands[3]);
+      emit_move_insn (tmp, gen_int_mode (value, Pmode));
+      operands[3] = gen_rtx_SIGN_EXTEND (<VEL>mode, tmp);
+    }
   else
     operands[3] = force_reg (<VEL>mode, operands[3]);
})
@@ -1387,7 +1395,8 @@
    vlse<sew>.v\t%0,%3,zero
    vmv.s.x\t%0,%3
    vmv.s.x\t%0,%3"
-  "register_operand (operands[3], <VEL>mode)
+  "(register_operand (operands[3], <VEL>mode)
+  || CONST_POLY_INT_P (operands[3]))
   && GET_MODE_BITSIZE (<VEL>mode) > GET_MODE_BITSIZE (Pmode)"
   [(set (match_dup 0)
(if_then_else:VI (unspec:<VM> [(match_dup 1) (match_dup 4)
@@ -1397,6 +1406,12 @@
  (match_dup 2)))]
   {
     gcc_assert (can_create_pseudo_p ());
+    if (CONST_POLY_INT_P (operands[3]))
+      {
+        rtx tmp = gen_reg_rtx (<VEL>mode);
+        emit_move_insn (tmp, operands[3]);
+        operands[3] = tmp;
+      }
     rtx m = assign_stack_local (<VEL>mode, GET_MODE_SIZE (<VEL>mode),
                                 GET_MODE_ALIGNMENT (<VEL>mode));
     m = validize_mem (m);
@@ -1483,6 +1498,7 @@
     (match_operand 5 "vector_length_operand"    "   rK,    rK,    rK")
     (match_operand 6 "const_int_operand"        "    i,     i,     i")
     (match_operand 7 "const_int_operand"        "    i,     i,     i")
+      (match_operand 8 "const_int_operand"        "    i,     i,     i")
     (reg:SI VL_REGNUM)
     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (unspec:V
@@ -1738,7 +1754,7 @@
   [(set_attr "type" "vst<order>x")
    (set_attr "mode" "<VNX8_QHSD:MODE>")])
-(define_insn "@pred_indexed_<order>store<VNX16_QHS:mode><VNX16_QHSDI:mode>"
+(define_insn "@pred_indexed_<order>store<VNX16_QHSD:mode><VNX16_QHSDI:mode>"
   [(set (mem:BLK (scratch))
(unspec:BLK
  [(unspec:<VM>
@@ -1749,11 +1765,11 @@
     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
   (match_operand 1 "pmode_reg_or_0_operand"      "  rJ")
   (match_operand:VNX16_QHSDI 2 "register_operand" "  vr")
-    (match_operand:VNX16_QHS 3 "register_operand"  "  vr")] ORDER))]
+    (match_operand:VNX16_QHSD 3 "register_operand"  "  vr")] ORDER))]
   "TARGET_VECTOR"
   "vs<order>xei<VNX16_QHSDI:sew>.v\t%3,(%z1),%2%p0"
   [(set_attr "type" "vst<order>x")
-   (set_attr "mode" "<VNX16_QHS:MODE>")])
+   (set_attr "mode" "<VNX16_QHSD:MODE>")])
(define_insn "@pred_indexed_<order>store<VNX32_QHS:mode><VNX32_QHSI:mode>"
   [(set (mem:BLK (scratch))
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c
new file mode 100644
index 00000000000..dffe13f6a8a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+#define INDEX16 uint16_t
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c
new file mode 100644
index 00000000000..a622e516f06
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                                \
+  T (uint8_t, 64)                                                               \
+  T (int16_t, 64)                                                               \
+  T (uint16_t, 64)                                                              \
+  T (_Float16, 64)                                                              \
+  T (int32_t, 64)                                                               \
+  T (uint32_t, 64)                                                              \
+  T (float, 64)                                                                 \
+  T (int64_t, 64)                                                               \
+  T (uint64_t, 64)                                                              \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c
new file mode 100644
index 00000000000..4692380233d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_LOOP(DATA_TYPE)                                                   \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict *src)           \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += *src[i];                                                      \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t)                                                                   \
+  T (uint8_t)                                                                  \
+  T (int16_t)                                                                  \
+  T (uint16_t)                                                                 \
+  T (_Float16)                                                                 \
+  T (int32_t)                                                                  \
+  T (uint32_t)                                                                 \
+  T (float)                                                                    \
+  T (int64_t)                                                                  \
+  T (uint64_t)                                                                 \
+  T (double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c
new file mode 100644
index 00000000000..71a3dd466fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c
@@ -0,0 +1,112 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-vect-cost-model -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_LOOP(DATA_TYPE, INDEX_TYPE)                                       \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE##_##INDEX_TYPE (DATA_TYPE *restrict y, DATA_TYPE *restrict x,  \
+                                INDEX_TYPE *restrict index)                    \
+  {                                                                            \
+    for (int i = 0; i < 100; ++i)                                              \
+      {                                                                        \
+        y[i * 2] = x[index[i * 2]] + 1;                                        \
+        y[i * 2 + 1] = x[index[i * 2 + 1]] + 2;                                \
+      }                                                                        \
+  }
+
+TEST_LOOP (int8_t, int8_t)
+TEST_LOOP (uint8_t, int8_t)
+TEST_LOOP (int16_t, int8_t)
+TEST_LOOP (uint16_t, int8_t)
+TEST_LOOP (int32_t, int8_t)
+TEST_LOOP (uint32_t, int8_t)
+TEST_LOOP (int64_t, int8_t)
+TEST_LOOP (uint64_t, int8_t)
+TEST_LOOP (_Float16, int8_t)
+TEST_LOOP (float, int8_t)
+TEST_LOOP (double, int8_t)
+TEST_LOOP (int8_t, int16_t)
+TEST_LOOP (uint8_t, int16_t)
+TEST_LOOP (int16_t, int16_t)
+TEST_LOOP (uint16_t, int16_t)
+TEST_LOOP (int32_t, int16_t)
+TEST_LOOP (uint32_t, int16_t)
+TEST_LOOP (int64_t, int16_t)
+TEST_LOOP (uint64_t, int16_t)
+TEST_LOOP (_Float16, int16_t)
+TEST_LOOP (float, int16_t)
+TEST_LOOP (double, int16_t)
+TEST_LOOP (int8_t, int32_t)
+TEST_LOOP (uint8_t, int32_t)
+TEST_LOOP (int16_t, int32_t)
+TEST_LOOP (uint16_t, int32_t)
+TEST_LOOP (int32_t, int32_t)
+TEST_LOOP (uint32_t, int32_t)
+TEST_LOOP (int64_t, int32_t)
+TEST_LOOP (uint64_t, int32_t)
+TEST_LOOP (_Float16, int32_t)
+TEST_LOOP (float, int32_t)
+TEST_LOOP (double, int32_t)
+TEST_LOOP (int8_t, int64_t)
+TEST_LOOP (uint8_t, int64_t)
+TEST_LOOP (int16_t, int64_t)
+TEST_LOOP (uint16_t, int64_t)
+TEST_LOOP (int32_t, int64_t)
+TEST_LOOP (uint32_t, int64_t)
+TEST_LOOP (int64_t, int64_t)
+TEST_LOOP (uint64_t, int64_t)
+TEST_LOOP (_Float16, int64_t)
+TEST_LOOP (float, int64_t)
+TEST_LOOP (double, int64_t)
+TEST_LOOP (int8_t, uint8_t)
+TEST_LOOP (uint8_t, uint8_t)
+TEST_LOOP (int16_t, uint8_t)
+TEST_LOOP (uint16_t, uint8_t)
+TEST_LOOP (int32_t, uint8_t)
+TEST_LOOP (uint32_t, uint8_t)
+TEST_LOOP (int64_t, uint8_t)
+TEST_LOOP (uint64_t, uint8_t)
+TEST_LOOP (_Float16, uint8_t)
+TEST_LOOP (float, uint8_t)
+TEST_LOOP (double, uint8_t)
+TEST_LOOP (int8_t, uint16_t)
+TEST_LOOP (uint8_t, uint16_t)
+TEST_LOOP (int16_t, uint16_t)
+TEST_LOOP (uint16_t, uint16_t)
+TEST_LOOP (int32_t, uint16_t)
+TEST_LOOP (uint32_t, uint16_t)
+TEST_LOOP (int64_t, uint16_t)
+TEST_LOOP (uint64_t, uint16_t)
+TEST_LOOP (_Float16, uint16_t)
+TEST_LOOP (float, uint16_t)
+TEST_LOOP (double, uint16_t)
+TEST_LOOP (int8_t, uint32_t)
+TEST_LOOP (uint8_t, uint32_t)
+TEST_LOOP (int16_t, uint32_t)
+TEST_LOOP (uint16_t, uint32_t)
+TEST_LOOP (int32_t, uint32_t)
+TEST_LOOP (uint32_t, uint32_t)
+TEST_LOOP (int64_t, uint32_t)
+TEST_LOOP (uint64_t, uint32_t)
+TEST_LOOP (_Float16, uint32_t)
+TEST_LOOP (float, uint32_t)
+TEST_LOOP (double, uint32_t)
+TEST_LOOP (int8_t, uint64_t)
+TEST_LOOP (uint8_t, uint64_t)
+TEST_LOOP (int16_t, uint64_t)
+TEST_LOOP (uint16_t, uint64_t)
+TEST_LOOP (int32_t, uint64_t)
+TEST_LOOP (uint32_t, uint64_t)
+TEST_LOOP (int64_t, uint64_t)
+TEST_LOOP (uint64_t, uint64_t)
+TEST_LOOP (_Float16, uint64_t)
+TEST_LOOP (float, uint64_t)
+TEST_LOOP (double, uint64_t)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 88 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-assembler-not "vluxei64\.v" } } */
+/* { dg-final { scan-assembler-not "vsuxei64\.v" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c
new file mode 100644
index 00000000000..785550c4b2d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c
new file mode 100644
index 00000000000..22aeb889221
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c
new file mode 100644
index 00000000000..d74a83415d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c
new file mode 100644
index 00000000000..2b6c0a87c18
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 uint16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                                \
+  T (uint8_t, 16)                                                               \
+  T (int16_t, 16)                                                               \
+  T (uint16_t, 16)                                                              \
+  T (_Float16, 16)                                                              \
+  T (int32_t, 16)                                                               \
+  T (uint32_t, 16)                                                              \
+  T (float, 16)                                                                 \
+  T (int64_t, 16)                                                               \
+  T (uint64_t, 16)                                                              \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c
new file mode 100644
index 00000000000..407cc8a5a73
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 int16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                                \
+  T (uint8_t, 16)                                                               \
+  T (int16_t, 16)                                                               \
+  T (uint16_t, 16)                                                              \
+  T (_Float16, 16)                                                              \
+  T (int32_t, 16)                                                               \
+  T (uint32_t, 16)                                                              \
+  T (float, 16)                                                                 \
+  T (int64_t, 16)                                                               \
+  T (uint64_t, 16)                                                              \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c
new file mode 100644
index 00000000000..81b31ef26aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 uint32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                                \
+  T (uint8_t, 32)                                                               \
+  T (int16_t, 32)                                                               \
+  T (uint16_t, 32)                                                              \
+  T (_Float16, 32)                                                              \
+  T (int32_t, 32)                                                               \
+  T (uint32_t, 32)                                                              \
+  T (float, 32)                                                                 \
+  T (int64_t, 32)                                                               \
+  T (uint64_t, 32)                                                              \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c
new file mode 100644
index 00000000000..0bfdb8f0acf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 int32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                                \
+  T (uint8_t, 32)                                                               \
+  T (int16_t, 32)                                                               \
+  T (uint16_t, 32)                                                              \
+  T (_Float16, 32)                                                              \
+  T (int32_t, 32)                                                               \
+  T (uint32_t, 32)                                                              \
+  T (float, 32)                                                                 \
+  T (int64_t, 32)                                                               \
+  T (uint64_t, 32)                                                              \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c
new file mode 100644
index 00000000000..46f791105ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[i] += src[indices[i]];                                              \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                                \
+  T (uint8_t, 64)                                                               \
+  T (int16_t, 64)                                                               \
+  T (uint16_t, 64)                                                              \
+  T (_Float16, 64)                                                              \
+  T (int32_t, 64)                                                               \
+  T (uint32_t, 64)                                                              \
+  T (float, 64)                                                                 \
+  T (int64_t, 64)                                                               \
+  T (uint64_t, 64)                                                              \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c
new file mode 100644
index 00000000000..0d3c5b71e5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-1.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+            == (dest2_##DATA_TYPE[i]                                           \
+                + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c
new file mode 100644
index 00000000000..145df1e7797
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-10.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+            == (dest2_##DATA_TYPE[i]                                           \
+                + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c
new file mode 100644
index 00000000000..d36b6f025f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-11.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE *src_##DATA_TYPE[128];                                             \
+  DATA_TYPE src2_##DATA_TYPE[128];                                             \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = src2_##DATA_TYPE + i;                               \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE);                           \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+            == (dest2_##DATA_TYPE[i] + src_##DATA_TYPE[i][0]));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c
new file mode 100644
index 00000000000..b4e2ead8ca9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c
@@ -0,0 +1,124 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-12.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, INDEX_TYPE)                                        \
+  DATA_TYPE dest_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                        \
+  DATA_TYPE src_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                         \
+  INDEX_TYPE index_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                      \
+  for (int i = 0; i < 202; i++)                                                \
+    {                                                                          \
+      src_##DATA_TYPE##_##INDEX_TYPE[i]                                        \
+        = (DATA_TYPE) ((i * 19 + 735) & (sizeof (DATA_TYPE) * 7 - 1));         \
+      index_##DATA_TYPE##_##INDEX_TYPE[i] = (i * 7) % (55);                    \
+    }                                                                          \
+  f_##DATA_TYPE##_##INDEX_TYPE (dest_##DATA_TYPE##_##INDEX_TYPE,               \
+                                src_##DATA_TYPE##_##INDEX_TYPE,                \
+                                index_##DATA_TYPE##_##INDEX_TYPE);             \
+  for (int i = 0; i < 100; i++)                                                \
+    {                                                                          \
+      assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2]                           \
+              == (src_##DATA_TYPE##_##INDEX_TYPE                               \
+                    [index_##DATA_TYPE##_##INDEX_TYPE[i * 2]]                  \
+                  + 1));                                                       \
+      assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]                       \
+              == (src_##DATA_TYPE##_##INDEX_TYPE                               \
+                    [index_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]]              \
+                  + 2));                                                       \
+    }
+
+  RUN_LOOP (int8_t, int8_t)
+  RUN_LOOP (uint8_t, int8_t)
+  RUN_LOOP (int16_t, int8_t)
+  RUN_LOOP (uint16_t, int8_t)
+  RUN_LOOP (int32_t, int8_t)
+  RUN_LOOP (uint32_t, int8_t)
+  RUN_LOOP (int64_t, int8_t)
+  RUN_LOOP (uint64_t, int8_t)
+  RUN_LOOP (_Float16, int8_t)
+  RUN_LOOP (float, int8_t)
+  RUN_LOOP (double, int8_t)
+  RUN_LOOP (int8_t, int16_t)
+  RUN_LOOP (uint8_t, int16_t)
+  RUN_LOOP (int16_t, int16_t)
+  RUN_LOOP (uint16_t, int16_t)
+  RUN_LOOP (int32_t, int16_t)
+  RUN_LOOP (uint32_t, int16_t)
+  RUN_LOOP (int64_t, int16_t)
+  RUN_LOOP (uint64_t, int16_t)
+  RUN_LOOP (_Float16, int16_t)
+  RUN_LOOP (float, int16_t)
+  RUN_LOOP (double, int16_t)
+  RUN_LOOP (int8_t, int32_t)
+  RUN_LOOP (uint8_t, int32_t)
+  RUN_LOOP (int16_t, int32_t)
+  RUN_LOOP (uint16_t, int32_t)
+  RUN_LOOP (int32_t, int32_t)
+  RUN_LOOP (uint32_t, int32_t)
+  RUN_LOOP (int64_t, int32_t)
+  RUN_LOOP (uint64_t, int32_t)
+  RUN_LOOP (_Float16, int32_t)
+  RUN_LOOP (float, int32_t)
+  RUN_LOOP (double, int32_t)
+  RUN_LOOP (int8_t, int64_t)
+  RUN_LOOP (uint8_t, int64_t)
+  RUN_LOOP (int16_t, int64_t)
+  RUN_LOOP (uint16_t, int64_t)
+  RUN_LOOP (int32_t, int64_t)
+  RUN_LOOP (uint32_t, int64_t)
+  RUN_LOOP (int64_t, int64_t)
+  RUN_LOOP (uint64_t, int64_t)
+  RUN_LOOP (_Float16, int64_t)
+  RUN_LOOP (float, int64_t)
+  RUN_LOOP (double, int64_t)
+  RUN_LOOP (int8_t, uint8_t)
+  RUN_LOOP (uint8_t, uint8_t)
+  RUN_LOOP (int16_t, uint8_t)
+  RUN_LOOP (uint16_t, uint8_t)
+  RUN_LOOP (int32_t, uint8_t)
+  RUN_LOOP (uint32_t, uint8_t)
+  RUN_LOOP (int64_t, uint8_t)
+  RUN_LOOP (uint64_t, uint8_t)
+  RUN_LOOP (_Float16, uint8_t)
+  RUN_LOOP (float, uint8_t)
+  RUN_LOOP (double, uint8_t)
+  RUN_LOOP (int8_t, uint16_t)
+  RUN_LOOP (uint8_t, uint16_t)
+  RUN_LOOP (int16_t, uint16_t)
+  RUN_LOOP (uint16_t, uint16_t)
+  RUN_LOOP (int32_t, uint16_t)
+  RUN_LOOP (uint32_t, uint16_t)
+  RUN_LOOP (int64_t, uint16_t)
+  RUN_LOOP (uint64_t, uint16_t)
+  RUN_LOOP (_Float16, uint16_t)
+  RUN_LOOP (float, uint16_t)
+  RUN_LOOP (double, uint16_t)
+  RUN_LOOP (int8_t, uint32_t)
+  RUN_LOOP (uint8_t, uint32_t)
+  RUN_LOOP (int16_t, uint32_t)
+  RUN_LOOP (uint16_t, uint32_t)
+  RUN_LOOP (int32_t, uint32_t)
+  RUN_LOOP (uint32_t, uint32_t)
+  RUN_LOOP (int64_t, uint32_t)
+  RUN_LOOP (uint64_t, uint32_t)
+  RUN_LOOP (_Float16, uint32_t)
+  RUN_LOOP (float, uint32_t)
+  RUN_LOOP (double, uint32_t)
+  RUN_LOOP (int8_t, uint64_t)
+  RUN_LOOP (uint8_t, uint64_t)
+  RUN_LOOP (int16_t, uint64_t)
+  RUN_LOOP (uint16_t, uint64_t)
+  RUN_LOOP (int32_t, uint64_t)
+  RUN_LOOP (uint32_t, uint64_t)
+  RUN_LOOP (int64_t, uint64_t)
+  RUN_LOOP (uint64_t, uint64_t)
+  RUN_LOOP (_Float16, uint64_t)
+  RUN_LOOP (float, uint64_t)
+  RUN_LOOP (double, uint64_t)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c
new file mode 100644
index 00000000000..76c6df32e6c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-2.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+            == (dest2_##DATA_TYPE[i]                                           \
+                + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c
new file mode 100644
index 00000000000..0fd64260082
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-3.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+            == (dest2_##DATA_TYPE[i]                                           \
+                + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c
new file mode 100644
index 00000000000..069d232b912
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-4.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+            == (dest2_##DATA_TYPE[i]                                           \
+                + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c
new file mode 100644
index 00000000000..499e555c1d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-5.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+            == (dest2_##DATA_TYPE[i]                                           \
+                + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c
new file mode 100644
index 00000000000..ec6587aa4e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-6.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+            == (dest2_##DATA_TYPE[i]                                           \
+                + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c
new file mode 100644
index 00000000000..c16287955a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-7.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+            == (dest2_##DATA_TYPE[i]                                           \
+                + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c
new file mode 100644
index 00000000000..e1744f60dbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-8.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+            == (dest2_##DATA_TYPE[i]                                           \
+                + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c
new file mode 100644
index 00000000000..3ad6d33087f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "gather_load-9.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[i]                                                \
+            == (dest2_##DATA_TYPE[i]                                           \
+                + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c
new file mode 100644
index 00000000000..a5de0deccbe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+#define INDEX16 uint16_t
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c
new file mode 100644
index 00000000000..74a0d05b37d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                               \
+  T (uint8_t, 64)                                                              \
+  T (int16_t, 64)                                                              \
+  T (uint16_t, 64)                                                             \
+  T (_Float16, 64)                                                             \
+  T (int32_t, 64)                                                              \
+  T (uint32_t, 64)                                                             \
+  T (float, 64)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c
new file mode 100644
index 00000000000..98c5b4678b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c
@@ -0,0 +1,116 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-vect-cost-model -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_LOOP(DATA_TYPE, INDEX_TYPE)                                       \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE##_##INDEX_TYPE (DATA_TYPE *restrict y, DATA_TYPE *restrict x,  \
+                                INDEX_TYPE *restrict index,                    \
+                                INDEX_TYPE *restrict cond)                     \
+  {                                                                            \
+    for (int i = 0; i < 100; ++i)                                              \
+      {                                                                        \
+        if (cond[i * 2])                                                       \
+          y[i * 2] = x[index[i * 2]] + 1;                                      \
+        if (cond[i * 2 + 1])                                                   \
+          y[i * 2 + 1] = x[index[i * 2 + 1]] + 2;                              \
+      }                                                                        \
+  }
+
+TEST_LOOP (int8_t, int8_t)
+TEST_LOOP (uint8_t, int8_t)
+TEST_LOOP (int16_t, int8_t)
+TEST_LOOP (uint16_t, int8_t)
+TEST_LOOP (int32_t, int8_t)
+TEST_LOOP (uint32_t, int8_t)
+TEST_LOOP (int64_t, int8_t)
+TEST_LOOP (uint64_t, int8_t)
+TEST_LOOP (_Float16, int8_t)
+TEST_LOOP (float, int8_t)
+TEST_LOOP (double, int8_t)
+TEST_LOOP (int8_t, int16_t)
+TEST_LOOP (uint8_t, int16_t)
+TEST_LOOP (int16_t, int16_t)
+TEST_LOOP (uint16_t, int16_t)
+TEST_LOOP (int32_t, int16_t)
+TEST_LOOP (uint32_t, int16_t)
+TEST_LOOP (int64_t, int16_t)
+TEST_LOOP (uint64_t, int16_t)
+TEST_LOOP (_Float16, int16_t)
+TEST_LOOP (float, int16_t)
+TEST_LOOP (double, int16_t)
+TEST_LOOP (int8_t, int32_t)
+TEST_LOOP (uint8_t, int32_t)
+TEST_LOOP (int16_t, int32_t)
+TEST_LOOP (uint16_t, int32_t)
+TEST_LOOP (int32_t, int32_t)
+TEST_LOOP (uint32_t, int32_t)
+TEST_LOOP (int64_t, int32_t)
+TEST_LOOP (uint64_t, int32_t)
+TEST_LOOP (_Float16, int32_t)
+TEST_LOOP (float, int32_t)
+TEST_LOOP (double, int32_t)
+TEST_LOOP (int8_t, int64_t)
+TEST_LOOP (uint8_t, int64_t)
+TEST_LOOP (int16_t, int64_t)
+TEST_LOOP (uint16_t, int64_t)
+TEST_LOOP (int32_t, int64_t)
+TEST_LOOP (uint32_t, int64_t)
+TEST_LOOP (int64_t, int64_t)
+TEST_LOOP (uint64_t, int64_t)
+TEST_LOOP (_Float16, int64_t)
+TEST_LOOP (float, int64_t)
+TEST_LOOP (double, int64_t)
+TEST_LOOP (int8_t, uint8_t)
+TEST_LOOP (uint8_t, uint8_t)
+TEST_LOOP (int16_t, uint8_t)
+TEST_LOOP (uint16_t, uint8_t)
+TEST_LOOP (int32_t, uint8_t)
+TEST_LOOP (uint32_t, uint8_t)
+TEST_LOOP (int64_t, uint8_t)
+TEST_LOOP (uint64_t, uint8_t)
+TEST_LOOP (_Float16, uint8_t)
+TEST_LOOP (float, uint8_t)
+TEST_LOOP (double, uint8_t)
+TEST_LOOP (int8_t, uint16_t)
+TEST_LOOP (uint8_t, uint16_t)
+TEST_LOOP (int16_t, uint16_t)
+TEST_LOOP (uint16_t, uint16_t)
+TEST_LOOP (int32_t, uint16_t)
+TEST_LOOP (uint32_t, uint16_t)
+TEST_LOOP (int64_t, uint16_t)
+TEST_LOOP (uint64_t, uint16_t)
+TEST_LOOP (_Float16, uint16_t)
+TEST_LOOP (float, uint16_t)
+TEST_LOOP (double, uint16_t)
+TEST_LOOP (int8_t, uint32_t)
+TEST_LOOP (uint8_t, uint32_t)
+TEST_LOOP (int16_t, uint32_t)
+TEST_LOOP (uint16_t, uint32_t)
+TEST_LOOP (int32_t, uint32_t)
+TEST_LOOP (uint32_t, uint32_t)
+TEST_LOOP (int64_t, uint32_t)
+TEST_LOOP (uint64_t, uint32_t)
+TEST_LOOP (_Float16, uint32_t)
+TEST_LOOP (float, uint32_t)
+TEST_LOOP (double, uint32_t)
+TEST_LOOP (int8_t, uint64_t)
+TEST_LOOP (uint8_t, uint64_t)
+TEST_LOOP (int16_t, uint64_t)
+TEST_LOOP (uint16_t, uint64_t)
+TEST_LOOP (int32_t, uint64_t)
+TEST_LOOP (uint32_t, uint64_t)
+TEST_LOOP (int64_t, uint64_t)
+TEST_LOOP (uint64_t, uint64_t)
+TEST_LOOP (_Float16, uint64_t)
+TEST_LOOP (float, uint64_t)
+TEST_LOOP (double, uint64_t)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 88 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-assembler-not "vluxei64\.v" } } */
+/* { dg-final { scan-assembler-not "vsuxei64\.v" } } */
+/* { dg-final { scan-assembler-not {vlse64\.v\s+v[0-9]+,\s*0\([a-x0-9]+\),\s*zero} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c
new file mode 100644
index 00000000000..03f84ce962c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c
new file mode 100644
index 00000000000..8578001ef41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c
new file mode 100644
index 00000000000..b273caa0bfe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c
new file mode 100644
index 00000000000..5055d886d62
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 uint16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                               \
+  T (uint8_t, 16)                                                              \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 16)                                                              \
+  T (uint32_t, 16)                                                             \
+  T (float, 16)                                                                \
+  T (int64_t, 16)                                                              \
+  T (uint64_t, 16)                                                             \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c
new file mode 100644
index 00000000000..2a4ae58588f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 int16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                               \
+  T (uint8_t, 16)                                                              \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 16)                                                              \
+  T (uint32_t, 16)                                                             \
+  T (float, 16)                                                                \
+  T (int64_t, 16)                                                              \
+  T (uint64_t, 16)                                                             \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c
new file mode 100644
index 00000000000..31d9414c549
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 uint32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                               \
+  T (uint8_t, 32)                                                              \
+  T (int16_t, 32)                                                              \
+  T (uint16_t, 32)                                                             \
+  T (_Float16, 32)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 32)                                                              \
+  T (uint64_t, 32)                                                             \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c
new file mode 100644
index 00000000000..73ed23042fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 int32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                               \
+  T (uint8_t, 32)                                                              \
+  T (int16_t, 32)                                                              \
+  T (uint16_t, 32)                                                             \
+  T (_Float16, 32)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 32)                                                              \
+  T (uint64_t, 32)                                                             \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c
new file mode 100644
index 00000000000..2f64e805759
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fno-schedule-insns -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[i] += src[indices[i]];                                            \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                               \
+  T (uint8_t, 64)                                                              \
+  T (int16_t, 64)                                                              \
+  T (uint16_t, 64)                                                             \
+  T (_Float16, 64)                                                             \
+  T (int32_t, 64)                                                              \
+  T (uint32_t, 64)                                                             \
+  T (float, 64)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c
new file mode 100644
index 00000000000..41f60bd88b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-1.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+        assert (dest_##DATA_TYPE[i]                                            \
+                == (dest2_##DATA_TYPE[i]                                       \
+                    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+        assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c
new file mode 100644
index 00000000000..9840434fa41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-10.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+        assert (dest_##DATA_TYPE[i]                                            \
+                == (dest2_##DATA_TYPE[i]                                       \
+                    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+        assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c
new file mode 100644
index 00000000000..105c706dbf9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c
@@ -0,0 +1,140 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-mcmodel=medany" } */
+
+#include "mask_gather_load-11.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, INDEX_TYPE)                                        \
+  DATA_TYPE dest_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                        \
+  DATA_TYPE dest2_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                       \
+  DATA_TYPE src_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                         \
+  INDEX_TYPE index_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                      \
+  INDEX_TYPE cond_##DATA_TYPE##_##INDEX_TYPE[202] = {0};                       \
+  for (int i = 0; i < 202; i++)                                                \
+    {                                                                          \
+      src_##DATA_TYPE##_##INDEX_TYPE[i]                                        \
+        = (DATA_TYPE) ((i * 19 + 735) & (sizeof (DATA_TYPE) * 7 - 1));         \
+      dest_##DATA_TYPE##_##INDEX_TYPE[i]                                       \
+        = (DATA_TYPE) ((i * 7 + 666) & (sizeof (DATA_TYPE) * 5 - 1));          \
+      dest2_##DATA_TYPE##_##INDEX_TYPE[i]                                      \
+        = (DATA_TYPE) ((i * 7 + 666) & (sizeof (DATA_TYPE) * 5 - 1));          \
+      index_##DATA_TYPE##_##INDEX_TYPE[i] = (i * 7) % (55);                    \
+      cond_##DATA_TYPE##_##INDEX_TYPE[i] = (INDEX_TYPE) ((i & 0x3) == 3);      \
+    }                                                                          \
+  f_##DATA_TYPE##_##INDEX_TYPE (dest_##DATA_TYPE##_##INDEX_TYPE,               \
+                                src_##DATA_TYPE##_##INDEX_TYPE,                \
+                                index_##DATA_TYPE##_##INDEX_TYPE,              \
+                                cond_##DATA_TYPE##_##INDEX_TYPE);              \
+  for (int i = 0; i < 100; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##INDEX_TYPE[i * 2])                              \
+        assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2]                         \
+                == (src_##DATA_TYPE##_##INDEX_TYPE                             \
+                      [index_##DATA_TYPE##_##INDEX_TYPE[i * 2]]                \
+                    + 1));                                                     \
+      else                                                                     \
+        assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2]                         \
+                == dest2_##DATA_TYPE##_##INDEX_TYPE[i * 2]);                   \
+      if (cond_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1])                          \
+        assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]                     \
+                == (src_##DATA_TYPE##_##INDEX_TYPE                             \
+                      [index_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]]            \
+                    + 2));                                                     \
+      else                                                                     \
+        assert (dest_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]                     \
+                == dest2_##DATA_TYPE##_##INDEX_TYPE[i * 2 + 1]);               \
+    }
+
+  RUN_LOOP (int8_t, int8_t)
+  RUN_LOOP (uint8_t, int8_t)
+  RUN_LOOP (int16_t, int8_t)
+  RUN_LOOP (uint16_t, int8_t)
+  RUN_LOOP (int32_t, int8_t)
+  RUN_LOOP (uint32_t, int8_t)
+  RUN_LOOP (int64_t, int8_t)
+  RUN_LOOP (uint64_t, int8_t)
+  RUN_LOOP (_Float16, int8_t)
+  RUN_LOOP (float, int8_t)
+  RUN_LOOP (double, int8_t)
+  RUN_LOOP (int8_t, int16_t)
+  RUN_LOOP (uint8_t, int16_t)
+  RUN_LOOP (int16_t, int16_t)
+  RUN_LOOP (uint16_t, int16_t)
+  RUN_LOOP (int32_t, int16_t)
+  RUN_LOOP (uint32_t, int16_t)
+  RUN_LOOP (int64_t, int16_t)
+  RUN_LOOP (uint64_t, int16_t)
+  RUN_LOOP (_Float16, int16_t)
+  RUN_LOOP (float, int16_t)
+  RUN_LOOP (double, int16_t)
+  RUN_LOOP (int8_t, int32_t)
+  RUN_LOOP (uint8_t, int32_t)
+  RUN_LOOP (int16_t, int32_t)
+  RUN_LOOP (uint16_t, int32_t)
+  RUN_LOOP (int32_t, int32_t)
+  RUN_LOOP (uint32_t, int32_t)
+  RUN_LOOP (int64_t, int32_t)
+  RUN_LOOP (uint64_t, int32_t)
+  RUN_LOOP (_Float16, int32_t)
+  RUN_LOOP (float, int32_t)
+  RUN_LOOP (double, int32_t)
+  RUN_LOOP (int8_t, int64_t)
+  RUN_LOOP (uint8_t, int64_t)
+  RUN_LOOP (int16_t, int64_t)
+  RUN_LOOP (uint16_t, int64_t)
+  RUN_LOOP (int32_t, int64_t)
+  RUN_LOOP (uint32_t, int64_t)
+  RUN_LOOP (int64_t, int64_t)
+  RUN_LOOP (uint64_t, int64_t)
+  RUN_LOOP (_Float16, int64_t)
+  RUN_LOOP (float, int64_t)
+  RUN_LOOP (double, int64_t)
+  RUN_LOOP (int8_t, uint8_t)
+  RUN_LOOP (uint8_t, uint8_t)
+  RUN_LOOP (int16_t, uint8_t)
+  RUN_LOOP (uint16_t, uint8_t)
+  RUN_LOOP (int32_t, uint8_t)
+  RUN_LOOP (uint32_t, uint8_t)
+  RUN_LOOP (int64_t, uint8_t)
+  RUN_LOOP (uint64_t, uint8_t)
+  RUN_LOOP (_Float16, uint8_t)
+  RUN_LOOP (float, uint8_t)
+  RUN_LOOP (double, uint8_t)
+  RUN_LOOP (int8_t, uint16_t)
+  RUN_LOOP (uint8_t, uint16_t)
+  RUN_LOOP (int16_t, uint16_t)
+  RUN_LOOP (uint16_t, uint16_t)
+  RUN_LOOP (int32_t, uint16_t)
+  RUN_LOOP (uint32_t, uint16_t)
+  RUN_LOOP (int64_t, uint16_t)
+  RUN_LOOP (uint64_t, uint16_t)
+  RUN_LOOP (_Float16, uint16_t)
+  RUN_LOOP (float, uint16_t)
+  RUN_LOOP (double, uint16_t)
+  RUN_LOOP (int8_t, uint32_t)
+  RUN_LOOP (uint8_t, uint32_t)
+  RUN_LOOP (int16_t, uint32_t)
+  RUN_LOOP (uint16_t, uint32_t)
+  RUN_LOOP (int32_t, uint32_t)
+  RUN_LOOP (uint32_t, uint32_t)
+  RUN_LOOP (int64_t, uint32_t)
+  RUN_LOOP (uint64_t, uint32_t)
+  RUN_LOOP (_Float16, uint32_t)
+  RUN_LOOP (float, uint32_t)
+  RUN_LOOP (double, uint32_t)
+  RUN_LOOP (int8_t, uint64_t)
+  RUN_LOOP (uint8_t, uint64_t)
+  RUN_LOOP (int16_t, uint64_t)
+  RUN_LOOP (uint16_t, uint64_t)
+  RUN_LOOP (int32_t, uint64_t)
+  RUN_LOOP (uint32_t, uint64_t)
+  RUN_LOOP (int64_t, uint64_t)
+  RUN_LOOP (uint64_t, uint64_t)
+  RUN_LOOP (_Float16, uint64_t)
+  RUN_LOOP (float, uint64_t)
+  RUN_LOOP (double, uint64_t)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c
new file mode 100644
index 00000000000..33ddb5d9909
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-2.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+        assert (dest_##DATA_TYPE[i]                                            \
+                == (dest2_##DATA_TYPE[i]                                       \
+                    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+        assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c
new file mode 100644
index 00000000000..9f06fbe4ecf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-3.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+        assert (dest_##DATA_TYPE[i]                                            \
+                == (dest2_##DATA_TYPE[i]                                       \
+                    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+        assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c
new file mode 100644
index 00000000000..ae578f0c7b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-4.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+        assert (dest_##DATA_TYPE[i]                                            \
+                == (dest2_##DATA_TYPE[i]                                       \
+                    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+        assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c
new file mode 100644
index 00000000000..741abd166e1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-5.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+        assert (dest_##DATA_TYPE[i]                                            \
+                == (dest2_##DATA_TYPE[i]                                       \
+                    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+        assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c
new file mode 100644
index 00000000000..a14a5c4ced1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-6.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+        assert (dest_##DATA_TYPE[i]                                            \
+                == (dest2_##DATA_TYPE[i]                                       \
+                    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+        assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c
new file mode 100644
index 00000000000..0ccc7dce166
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-7.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+        assert (dest_##DATA_TYPE[i]                                            \
+                == (dest2_##DATA_TYPE[i]                                       \
+                    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+        assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c
new file mode 100644
index 00000000000..a34688ff339
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-mcmodel=medany" } */
+
+#include "mask_gather_load-8.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+        assert (dest_##DATA_TYPE[i]                                            \
+                == (dest2_##DATA_TYPE[i]                                       \
+                    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+        assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c
new file mode 100644
index 00000000000..1cfdede2060
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_gather_load-9.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128] = {0};                                       \
+  DATA_TYPE dest2_##DATA_TYPE[128] = {0};                                      \
+  DATA_TYPE src_##DATA_TYPE[128] = {0};                                        \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128] = {0};                         \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+        assert (dest_##DATA_TYPE[i]                                            \
+                == (dest2_##DATA_TYPE[i]                                       \
+                    + src_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]));      \
+      else                                                                     \
+        assert (dest_##DATA_TYPE[i] == dest2_##DATA_TYPE[i]);                  \
+    }
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c
new file mode 100644
index 00000000000..623de41267b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+#define INDEX16 uint16_t
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+ INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c
new file mode 100644
index 00000000000..55112b067fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                                \
+  T (uint8_t, 64)                                                               \
+  T (int16_t, 64)                                                              \
+  T (uint16_t, 64)                                                             \
+  T (_Float16, 64)                                                             \
+  T (int32_t, 64)                                                              \
+  T (uint32_t, 64)                                                             \
+  T (float, 64)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c
new file mode 100644
index 00000000000..32a572d0064
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c
new file mode 100644
index 00000000000..fbaaa9d8a8e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                              \
+  T (uint16_t, 8)                                                             \
+  T (_Float16, 8)                                                             \
+  T (int32_t, 8)                                                              \
+  T (uint32_t, 8)                                                             \
+  T (float, 8)                                                                \
+  T (int64_t, 8)                                                              \
+  T (uint64_t, 8)                                                             \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c
new file mode 100644
index 00000000000..9b08661f8e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                              \
+  T (uint16_t, 8)                                                             \
+  T (_Float16, 8)                                                             \
+  T (int32_t, 8)                                                              \
+  T (uint32_t, 8)                                                             \
+  T (float, 8)                                                                \
+  T (int64_t, 8)                                                              \
+  T (uint64_t, 8)                                                             \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c
new file mode 100644
index 00000000000..dd26635f2cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 uint16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                                \
+  T (uint8_t, 16)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 16)                                                              \
+  T (uint32_t, 16)                                                             \
+  T (float, 16)                                                                \
+  T (int64_t, 16)                                                              \
+  T (uint64_t, 16)                                                             \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c
new file mode 100644
index 00000000000..fa0206a0ec2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 int16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                                \
+  T (uint8_t, 16)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 16)                                                              \
+  T (uint32_t, 16)                                                             \
+  T (float, 16)                                                                \
+  T (int64_t, 16)                                                              \
+  T (uint64_t, 16)                                                             \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c
new file mode 100644
index 00000000000..325e86c26a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 uint32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                                \
+  T (uint8_t, 32)                                                               \
+  T (int16_t, 32)                                                              \
+  T (uint16_t, 32)                                                             \
+  T (_Float16, 32)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 32)                                                              \
+  T (uint64_t, 32)                                                             \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c
new file mode 100644
index 00000000000..b4b84e9cdda
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 int32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                                \
+  T (uint8_t, 32)                                                               \
+  T (int16_t, 32)                                                              \
+  T (uint16_t, 32)                                                             \
+  T (_Float16, 32)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 32)                                                              \
+  T (uint64_t, 32)                                                             \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c
new file mode 100644
index 00000000000..77a9af953e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices, INDEX##BITS *restrict cond)    \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      if (cond[i])                                                             \
+        dest[indices[i]] = src[i] + 1;                                         \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                                \
+  T (uint8_t, 64)                                                               \
+  T (int16_t, 64)                                                              \
+  T (uint16_t, 64)                                                             \
+  T (_Float16, 64)                                                             \
+  T (int32_t, 64)                                                              \
+  T (uint32_t, 64)                                                             \
+  T (float, 64)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c
new file mode 100644
index 00000000000..e0d52bf6291
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-1.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+        assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+                == (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+        assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+                == dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c
new file mode 100644
index 00000000000..c1af0d30e62
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-10.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+        assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+                == (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+        assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+                == dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c
new file mode 100644
index 00000000000..6b1b02eae35
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-2.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+        assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+                == (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+        assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+                == dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c
new file mode 100644
index 00000000000..cef0bdec1d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-3.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c
new file mode 100644
index 00000000000..88a74d5a632
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-4.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c
new file mode 100644
index 00000000000..06804ab7111
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-5.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c
new file mode 100644
index 00000000000..c6c9a676ed6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-6.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c
new file mode 100644
index 00000000000..8246e964aad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-7.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c
new file mode 100644
index 00000000000..8ee35d2e505
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-8.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c
new file mode 100644
index 00000000000..c27a673e2b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "mask_scatter_store-9.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  INDEX##BITS cond_##DATA_TYPE##_##BITS[128] = {0};                            \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i & 0x3) == 3);             \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+		 indices_##DATA_TYPE##_##BITS, cond_##DATA_TYPE##_##BITS);     \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      if (cond_##DATA_TYPE##_##BITS[i])                                        \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== (src_##DATA_TYPE[i] + 1));                                  \
+      else                                                                     \
+	assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]              \
+		== dest2_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]);        \
+    }
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c
new file mode 100644
index 00000000000..6a390261cfb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+#define INDEX16 uint16_t
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c
new file mode 100644
index 00000000000..feb58d7d458
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                                \
+  T (uint8_t, 64)                                                               \
+  T (int16_t, 64)                                                               \
+  T (uint16_t, 64)                                                              \
+  T (_Float16, 64)                                                              \
+  T (int32_t, 64)                                                               \
+  T (uint32_t, 64)                                                              \
+  T (float, 64)                                                                 \
+  T (int64_t, 64)                                                               \
+  T (uint64_t, 64)                                                              \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c
new file mode 100644
index 00000000000..e4c587fd7bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 16)                                                              \
+  T (uint16_t, 16)                                                             \
+  T (_Float16, 16)                                                             \
+  T (int32_t, 32)                                                              \
+  T (uint32_t, 32)                                                             \
+  T (float, 32)                                                                \
+  T (int64_t, 64)                                                              \
+  T (uint64_t, 64)                                                             \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c
new file mode 100644
index 00000000000..33ad256d3db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 uint8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c
new file mode 100644
index 00000000000..48d305623e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX8 int8_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 8)                                                                \
+  T (uint8_t, 8)                                                               \
+  T (int16_t, 8)                                                               \
+  T (uint16_t, 8)                                                              \
+  T (_Float16, 8)                                                              \
+  T (int32_t, 8)                                                               \
+  T (uint32_t, 8)                                                              \
+  T (float, 8)                                                                 \
+  T (int64_t, 8)                                                               \
+  T (uint64_t, 8)                                                              \
+  T (double, 8)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c
new file mode 100644
index 00000000000..83ddc44bf9c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 uint16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+		 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                                \
+  T (uint8_t, 16)                                                               \
+  T (int16_t, 16)                                                               \
+  T (uint16_t, 16)                                                              \
+  T (_Float16, 16)                                                              \
+  T (int32_t, 16)                                                               \
+  T (uint32_t, 16)                                                              \
+  T (float, 16)                                                                 \
+  T (int64_t, 16)                                                               \
+  T (uint64_t, 16)                                                              \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c
new file mode 100644
index 00000000000..11eb68bdb13
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX16 int16_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 16)                                                                \
+  T (uint8_t, 16)                                                               \
+  T (int16_t, 16)                                                               \
+  T (uint16_t, 16)                                                              \
+  T (_Float16, 16)                                                              \
+  T (int32_t, 16)                                                               \
+  T (uint32_t, 16)                                                              \
+  T (float, 16)                                                                 \
+  T (int64_t, 16)                                                               \
+  T (uint64_t, 16)                                                              \
+  T (double, 16)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c
new file mode 100644
index 00000000000..2e323477258
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 uint32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                                \
+  T (uint8_t, 32)                                                               \
+  T (int16_t, 32)                                                               \
+  T (uint16_t, 32)                                                              \
+  T (_Float16, 32)                                                              \
+  T (int32_t, 32)                                                               \
+  T (uint32_t, 32)                                                              \
+  T (float, 32)                                                                 \
+  T (int64_t, 32)                                                               \
+  T (uint64_t, 32)                                                              \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c
new file mode 100644
index 00000000000..e6732fe3790
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX32 int32_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 32)                                                                \
+  T (uint8_t, 32)                                                               \
+  T (int16_t, 32)                                                               \
+  T (uint16_t, 32)                                                              \
+  T (_Float16, 32)                                                              \
+  T (int32_t, 32)                                                               \
+  T (uint32_t, 32)                                                              \
+  T (float, 32)                                                                 \
+  T (int64_t, 32)                                                               \
+  T (uint64_t, 32)                                                              \
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c
new file mode 100644
index 00000000000..766a52b4622
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d  -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+#define INDEX64 uint64_t
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,            \
+                 INDEX##BITS *restrict indices)                                \
+  {                                                                            \
+    for (int i = 0; i < 128; ++i)                                              \
+      dest[indices[i]] = src[i] + 1;                                           \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t, 64)                                                                \
+  T (uint8_t, 64)                                                               \
+  T (int16_t, 64)                                                               \
+  T (uint16_t, 64)                                                              \
+  T (_Float16, 64)                                                              \
+  T (int32_t, 64)                                                               \
+  T (uint32_t, 64)                                                              \
+  T (float, 64)                                                                 \
+  T (int64_t, 64)                                                               \
+  T (uint64_t, 64)                                                              \
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 11 "vect" } } */
+/* { dg-final { scan-tree-dump " \.LEN_MASK_SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "vect" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c
new file mode 100644
index 00000000000..cafa64f3527
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-1.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+            == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c
new file mode 100644
index 00000000000..79f6885831f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-10.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+            == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c
new file mode 100644
index 00000000000..376db088153
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-mcmodel=medany" } */
+
+#include "scatter_store-2.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+            == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c
new file mode 100644
index 00000000000..103b8649d38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-3.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+            == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c
new file mode 100644
index 00000000000..f5f89c0fb4f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-4.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+            == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c
new file mode 100644
index 00000000000..049251ec888
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-5.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+            == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c
new file mode 100644
index 00000000000..59c8e701dbe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-6.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+            == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c
new file mode 100644
index 00000000000..a24401181e3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-7.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+            == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c
new file mode 100644
index 00000000000..080c9b83363
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+
+#include "scatter_store-8.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+            == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c
new file mode 100644
index 00000000000..cc9f20f0daa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c
@@ -0,0 +1,40 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-mcmodel=medany" } */
+
+#include "scatter_store-9.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE[128];                                             \
+  DATA_TYPE dest2_##DATA_TYPE[128];                                            \
+  DATA_TYPE src_##DATA_TYPE[128];                                              \
+  INDEX##BITS indices_##DATA_TYPE##_##BITS[128];                               \
+  for (int i = 0; i < 128; i++)                                                \
+    {                                                                          \
+      dest_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));         \
+      dest2_##DATA_TYPE[i] = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));        \
+      src_##DATA_TYPE[i] = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));         \
+      indices_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) ((i * 3 + 67) % 128);      \
+    }                                                                          \
+  f_##DATA_TYPE (dest_##DATA_TYPE, src_##DATA_TYPE,                            \
+                 indices_##DATA_TYPE##_##BITS);                                \
+  for (int i = 0; i < 128; i++)                                                \
+    assert (dest_##DATA_TYPE[indices_##DATA_TYPE##_##BITS[i]]                  \
+            == (src_##DATA_TYPE[i] + 1));
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c
new file mode 100644
index 00000000000..c7b990668c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+#ifndef INDEX8
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,   \
+                          INDEX##BITS stride, INDEX##BITS n)                   \
+  {                                                                            \
+    for (INDEX##BITS i = 0; i < n; ++i)                                        \
+      dest[i] += src[i * stride];                                              \
+  }
+
+#define TEST_TYPE(T, DATA_TYPE)                                                \
+  T (DATA_TYPE, 8)                                                             \
+  T (DATA_TYPE, 16)                                                            \
+  T (DATA_TYPE, 32)                                                            \
+  T (DATA_TYPE, 64)
+
+#define TEST_ALL(T)                                                            \
+  TEST_TYPE (T, int8_t)                                                        \
+  TEST_TYPE (T, uint8_t)                                                       \
+  TEST_TYPE (T, int16_t)                                                       \
+  TEST_TYPE (T, uint16_t)                                                      \
+  TEST_TYPE (T, _Float16)                                                      \
+  TEST_TYPE (T, int32_t)                                                       \
+  TEST_TYPE (T, uint32_t)                                                      \
+  TEST_TYPE (T, float)                                                         \
+  TEST_TYPE (T, int64_t)                                                       \
+  TEST_TYPE (T, uint64_t)                                                      \
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times " \.LEN_MASK_GATHER_LOAD" 66 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "optimized" } } */
+/* { dg-final { scan-assembler-not "vluxei" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c
new file mode 100644
index 00000000000..37dd7291f9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+#ifndef INDEX8
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,   \
+   INDEX##BITS stride, INDEX##BITS n)                   \
+  {                                                                            \
+    for (INDEX##BITS i = 0; i < (BITS + 13); ++i)                              \
+      dest[i] += src[i * (BITS - 3)];                                          \
+  }
+
+#define TEST_TYPE(T, DATA_TYPE)                                                \
+  T (DATA_TYPE, 8)                                                             \
+  T (DATA_TYPE, 16)                                                            \
+  T (DATA_TYPE, 32)                                                            \
+  T (DATA_TYPE, 64)
+
+#define TEST_ALL(T)                                                            \
+  TEST_TYPE (T, int8_t)                                                        \
+  TEST_TYPE (T, uint8_t)                                                       \
+  TEST_TYPE (T, int16_t)                                                       \
+  TEST_TYPE (T, uint16_t)                                                      \
+  TEST_TYPE (T, _Float16)                                                      \
+  TEST_TYPE (T, int32_t)                                                       \
+  TEST_TYPE (T, uint32_t)                                                      \
+  TEST_TYPE (T, float)                                                         \
+  TEST_TYPE (T, int64_t)                                                       \
+  TEST_TYPE (T, uint64_t)                                                      \
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times " \.LEN_MASK_GATHER_LOAD" 46 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "optimized" } } */
+/* { dg-final { scan-assembler-not "vluxei" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c
new file mode 100644
index 00000000000..4b03c25a907
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c
@@ -0,0 +1,84 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+#include "strided_load-1.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];               \
+  DATA_TYPE dest2_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];              \
+  DATA_TYPE src_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];                \
+  INDEX##BITS stride_##DATA_TYPE##_##BITS = (BITS - 3);                        \
+  INDEX##BITS n_##DATA_TYPE##_##BITS = (BITS + 13);                            \
+  for (INDEX##BITS i = 0;                                                      \
+       i < stride_##DATA_TYPE##_##BITS * n_##DATA_TYPE##_##BITS; i++)          \
+    {                                                                          \
+      dest_##DATA_TYPE##_##BITS[i]                                             \
+ = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      dest2_##DATA_TYPE##_##BITS[i]                                            \
+ = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      src_##DATA_TYPE##_##BITS[i]                                              \
+ = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));                          \
+    }                                                                          \
+  f_##DATA_TYPE##_##BITS (dest_##DATA_TYPE##_##BITS, src_##DATA_TYPE##_##BITS, \
+   stride_##DATA_TYPE##_##BITS,                         \
+   n_##DATA_TYPE##_##BITS);                             \
+  for (int i = 0; i < n_##DATA_TYPE##_##BITS; i++)                             \
+    {                                                                          \
+      assert (                                                                 \
+ dest_##DATA_TYPE##_##BITS[i]                                           \
+ == (dest2_##DATA_TYPE##_##BITS[i]                                      \
+     + src_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS]));     \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c
new file mode 100644
index 00000000000..8499e4cef24
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c
@@ -0,0 +1,84 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+#include "strided_load-2.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];               \
+  DATA_TYPE dest2_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];              \
+  DATA_TYPE src_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];                \
+  INDEX##BITS stride_##DATA_TYPE##_##BITS = (BITS - 3);                        \
+  INDEX##BITS n_##DATA_TYPE##_##BITS = (BITS + 13);                            \
+  for (INDEX##BITS i = 0;                                                      \
+       i < stride_##DATA_TYPE##_##BITS * n_##DATA_TYPE##_##BITS; i++)          \
+    {                                                                          \
+      dest_##DATA_TYPE##_##BITS[i]                                             \
+ = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      dest2_##DATA_TYPE##_##BITS[i]                                            \
+ = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      src_##DATA_TYPE##_##BITS[i]                                              \
+ = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));                          \
+    }                                                                          \
+  f_##DATA_TYPE##_##BITS (dest_##DATA_TYPE##_##BITS, src_##DATA_TYPE##_##BITS, \
+   stride_##DATA_TYPE##_##BITS,                         \
+   n_##DATA_TYPE##_##BITS);                             \
+  for (int i = 0; i < n_##DATA_TYPE##_##BITS; i++)                             \
+    {                                                                          \
+      assert (                                                                 \
+ dest_##DATA_TYPE##_##BITS[i]                                           \
+ == (dest2_##DATA_TYPE##_##BITS[i]                                      \
+     + src_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS]));     \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c
new file mode 100644
index 00000000000..df0560c5a31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+#ifndef INDEX8
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,   \
+   INDEX##BITS stride, INDEX##BITS n)                   \
+  {                                                                            \
+    for (INDEX##BITS i = 0; i < n; ++i)                                        \
+      dest[i * stride] = src[i] + BITS;                                        \
+  }
+
+#define TEST_TYPE(T, DATA_TYPE)                                                \
+  T (DATA_TYPE, 8)                                                             \
+  T (DATA_TYPE, 16)                                                            \
+  T (DATA_TYPE, 32)                                                            \
+  T (DATA_TYPE, 64)
+
+#define TEST_ALL(T)                                                            \
+  TEST_TYPE (T, int8_t)                                                        \
+  TEST_TYPE (T, uint8_t)                                                       \
+  TEST_TYPE (T, int16_t)                                                       \
+  TEST_TYPE (T, uint16_t)                                                      \
+  TEST_TYPE (T, _Float16)                                                      \
+  TEST_TYPE (T, int32_t)                                                       \
+  TEST_TYPE (T, uint32_t)                                                      \
+  TEST_TYPE (T, float)                                                         \
+  TEST_TYPE (T, int64_t)                                                       \
+  TEST_TYPE (T, uint64_t)                                                      \
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times " \.LEN_MASK_SCATTER_STORE" 66 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "optimized" } } */
+/* { dg-final { scan-assembler-not "vsuxei" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c
new file mode 100644
index 00000000000..1419cbc91b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
+
+#include <stdint-gcc.h>
+
+#ifndef INDEX8
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, BITS)                                             \
+  void __attribute__ ((noinline, noclone))                                     \
+  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,   \
+   INDEX##BITS stride, INDEX##BITS n)                   \
+  {                                                                            \
+    for (INDEX##BITS i = 0; i < n; ++i)                                        \
+      dest[i * (BITS - 3)] = src[i] + BITS;                                    \
+  }
+
+#define TEST_TYPE(T, DATA_TYPE)                                                \
+  T (DATA_TYPE, 8)                                                             \
+  T (DATA_TYPE, 16)                                                            \
+  T (DATA_TYPE, 32)                                                            \
+  T (DATA_TYPE, 64)
+
+#define TEST_ALL(T)                                                            \
+  TEST_TYPE (T, int8_t)                                                        \
+  TEST_TYPE (T, uint8_t)                                                       \
+  TEST_TYPE (T, int16_t)                                                       \
+  TEST_TYPE (T, uint16_t)                                                      \
+  TEST_TYPE (T, _Float16)                                                      \
+  TEST_TYPE (T, int32_t)                                                       \
+  TEST_TYPE (T, uint32_t)                                                      \
+  TEST_TYPE (T, float)                                                         \
+  TEST_TYPE (T, int64_t)                                                       \
+  TEST_TYPE (T, uint64_t)                                                      \
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times " \.LEN_MASK_SCATTER_STORE" 44 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "optimized" } } */
+/* { dg-final { scan-assembler-not "vsuxei" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c
new file mode 100644
index 00000000000..e9dca4672c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c
@@ -0,0 +1,82 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+#include "strided_store-1.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];               \
+  DATA_TYPE dest2_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];              \
+  DATA_TYPE src_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];                \
+  INDEX##BITS stride_##DATA_TYPE##_##BITS = (BITS - 3);                        \
+  INDEX##BITS n_##DATA_TYPE##_##BITS = (BITS + 13);                            \
+  for (INDEX##BITS i = 0;                                                      \
+       i < stride_##DATA_TYPE##_##BITS * n_##DATA_TYPE##_##BITS; i++)          \
+    {                                                                          \
+      dest_##DATA_TYPE##_##BITS[i]                                             \
+ = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      dest2_##DATA_TYPE##_##BITS[i]                                            \
+ = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      src_##DATA_TYPE##_##BITS[i]                                              \
+ = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));                          \
+    }                                                                          \
+  f_##DATA_TYPE##_##BITS (dest_##DATA_TYPE##_##BITS, src_##DATA_TYPE##_##BITS, \
+   stride_##DATA_TYPE##_##BITS,                         \
+   n_##DATA_TYPE##_##BITS);                             \
+  for (int i = 0; i < n_##DATA_TYPE##_##BITS; i++)                             \
+    {                                                                          \
+      assert (dest_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS]       \
+       == (src_##DATA_TYPE##_##BITS[i] + BITS));                        \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c
new file mode 100644
index 00000000000..509def789e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c
@@ -0,0 +1,82 @@
+/* { dg-do run { target { riscv_vector } } } */
+
+#include "strided_store-2.c"
+#include <assert.h>
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS)                                              \
+  DATA_TYPE dest_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];               \
+  DATA_TYPE dest2_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];              \
+  DATA_TYPE src_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)];                \
+  INDEX##BITS stride_##DATA_TYPE##_##BITS = (BITS - 3);                        \
+  INDEX##BITS n_##DATA_TYPE##_##BITS = (BITS + 13);                            \
+  for (INDEX##BITS i = 0;                                                      \
+       i < stride_##DATA_TYPE##_##BITS * n_##DATA_TYPE##_##BITS; i++)          \
+    {                                                                          \
+      dest_##DATA_TYPE##_##BITS[i]                                             \
+ = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      dest2_##DATA_TYPE##_##BITS[i]                                            \
+ = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1));                           \
+      src_##DATA_TYPE##_##BITS[i]                                              \
+ = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1));                          \
+    }                                                                          \
+  f_##DATA_TYPE##_##BITS (dest_##DATA_TYPE##_##BITS, src_##DATA_TYPE##_##BITS, \
+   stride_##DATA_TYPE##_##BITS,                         \
+   n_##DATA_TYPE##_##BITS);                             \
+  for (int i = 0; i < n_##DATA_TYPE##_##BITS; i++)                             \
+    {                                                                          \
+      assert (dest_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS]       \
+       == (src_##DATA_TYPE##_##BITS[i] + BITS));                        \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 5e69235a268..19589fa9638 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -90,5 +90,28 @@ foreach op $AUTOVEC_TEST_OPTS {
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/vls-vlmax/*.\[cS\]]] \
"-std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax" $CFLAGS
+# gather-scatter tests
+set AUTOVEC_TEST_OPTS [list \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m1 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m4 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m1 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m4 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m1 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m4 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m1 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m4 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O2 --param riscv-autovec-preference=scalable --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math} ]
+foreach op $AUTOVEC_TEST_OPTS {
+  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/gather-scatter/*.\[cS\]]] \
+    "" "$op"
+}
+
# All done.
dg-finish
-- 
2.36.1
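
For readers following the strided_load tests above, here is a minimal scalar sketch (not part of the patch) of why a strided access is just a gather whose index vector is i * stride. This is consistent with the tests scanning for .LEN_MASK_GATHER_LOAD in the tree dump while checking that no vluxei appears in the assembly. The function names strided_load_ref, gather_load_ref and check_strided_equals_gather are hypothetical and exist only for this illustration; the fill constants are borrowed from the run tests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for the strided kernel used by the tests:
   dest[i] += src[i * stride].  */
static void
strided_load_ref (int32_t *restrict dest, const int32_t *restrict src,
		  size_t stride, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dest[i] += src[i * stride];
}

/* The same kernel written as an explicit gather through an index array.  */
static void
gather_load_ref (int32_t *restrict dest, const int32_t *restrict src,
		 const size_t *restrict index, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dest[i] += src[index[i]];
}

/* Returns 1 if both forms produce identical results, 0 otherwise.  */
static int
check_strided_equals_gather (void)
{
  enum { STRIDE = 5, N = 21 };
  int32_t src[STRIDE * N], a[N], b[N];
  size_t index[N];

  for (size_t i = 0; i < (size_t) (STRIDE * N); ++i)
    src[i] = (int32_t) (i * 13 + 9107);
  for (size_t i = 0; i < (size_t) N; ++i)
    {
      a[i] = b[i] = (int32_t) (i * 81 + 735);
      index[i] = i * STRIDE;	/* A strided access is index[i] = i * stride.  */
    }

  strided_load_ref (a, src, STRIDE, N);
  gather_load_ref (b, src, index, N);

  for (size_t i = 0; i < (size_t) N; ++i)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

Calling check_strided_equals_gather () from a test driver should return 1. The sketch only models the scalar semantics; it says nothing about which instruction the vectorizer ends up selecting.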
 


* Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization
  2023-07-07 14:32 [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization Juzhe-Zhong
  2023-07-10 21:51 ` 钟居哲
@ 2023-07-12  2:01 ` Jeff Law
  2023-07-12  2:34   ` juzhe.zhong
  1 sibling, 1 reply; 14+ messages in thread
From: Jeff Law @ 2023-07-12  2:01 UTC (permalink / raw)
  To: Juzhe-Zhong, gcc-patches; +Cc: kito.cheng, rdapp.gcc



On 7/7/23 08:32, Juzhe-Zhong wrote:
> This patch fully supports gather_load/scatter_store:
> 1. Support single-rgroup on both RV32/RV64.
> 2. Support index element widths that are the same as or smaller than Pmode.
> 3. Support VLA SLP with gather/scatter.
> 4. Fully tested all gather/scatter with LMUL = M1/M2/M4/M8 for both VLA and VLS.
> 5. Fix a bug in handling (subreg:SI (const_poly_int:DI)).
> 6. Fix a bug in vec_perm, which is used by gather/scatter SLP.
> 
> All kinds of GATHER/SCATTER are normalized into LEN_MASK_*.
> We fully support these 4 kinds of gather/scatter:
> 1. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and dummy mask (Full vector).
> 2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and real mask.
> 3. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and dummy mask.
> 4. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and real mask.
> 
> We use vluxei/vsuxei (the unordered indexed loads/stores of RVV) to generate code for gather/scatter.
> 
> Also, we support strided loads/stores with vlse.v/vsse.v. Consider the following case:
> #define TEST_LOOP(DATA_TYPE, BITS)                                             \
>    void __attribute__ ((noinline, noclone))                                     \
>    f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,   \
> 			  INDEX##BITS stride, INDEX##BITS n)                   \
>    {                                                                            \
>      for (INDEX##BITS i = 0; i < n; ++i)                                        \
>        dest[i] += src[i * stride];                                              \
>    }
> 
> Codegen:
> f_int8_t_8:
> 	ble	a3,zero,.L10
> 	li	a5,1
> 	mv	a4,a0
> 	bne	a2,a5,.L4
> 	li	a2,1
> .L6:
> 	vsetvli	a5,a3,e8,m2,ta,ma
> 	vle8.v	v2,0(a0)
> 	vlse8.v	v4,0(a1),a2
> 	vsetvli	a6,zero,e8,m2,ta,ma
> 	sub	a3,a3,a5
> 	vadd.vv	v2,v2,v4
> 	vsetvli	zero,a5,e8,m2,ta,ma
> 	vse8.v	v2,0(a4)
> 	add	a0,a0,a5
> 	add	a1,a1,a5
> 	add	a4,a4,a5
> 	bne	a3,zero,.L6
> .L10:
> 	ret
> 
> We use vlse.v instead of vluxei.
> 
> This patch has been tested on both RV32 and RV64.
> 
> gcc/ChangeLog:
> 
>          * config/riscv/autovec.md (len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>): New pattern.
>          (len_mask_gather_load<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto.
>          (len_mask_gather_load<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto.
>          (len_mask_gather_load<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto.
>          (len_mask_gather_load<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
>          (len_mask_gather_load<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
>          (len_mask_gather_load<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
>          (len_mask_gather_load<mode><mode>): Ditto.
>          (len_mask_scatter_store<VNX1_QHSD:mode><VNX1_QHSDI:mode>): Ditto.
>          (len_mask_scatter_store<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto.
>          (len_mask_scatter_store<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto.
>          (len_mask_scatter_store<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto.
>          (len_mask_scatter_store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
>          (len_mask_scatter_store<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
>          (len_mask_scatter_store<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
>          (len_mask_scatter_store<mode><mode>): Ditto.
>          * config/riscv/predicates.md (const_1_operand): New predicate.
>          (vector_gs_offset_operand): Ditto.
>          (vector_gs_scale_operand_16): Ditto.
>          (vector_gs_scale_operand_32): Ditto.
>          (vector_gs_scale_operand_64): Ditto.
>          (vector_gs_extension_operand): Ditto.
>          (vector_gs_scale_operand_16_rv32): Ditto.
>          (vector_gs_scale_operand_32_rv32): Ditto.
>          * config/riscv/riscv-protos.h (enum insn_type): Add gather/scatter.
>          (expand_gather_scatter): New function.
>          * config/riscv/riscv-v.cc (gen_const_vector_dup): Add gather/scatter.
>          (emit_vlmax_masked_store_insn): New function.
>          (emit_nonvlmax_masked_store_insn): Ditto.
>          (modulo_sel_indices): Ditto.
>          (expand_vec_perm): Fix SLP for gather/scatter.
>          (prepare_gather_scatter): New function.
>          (strided_load_store_p): Ditto.
>          (expand_gather_scatter): Ditto.
>          * config/riscv/riscv.cc (riscv_legitimize_move): Fix bug of (subreg:SI (DI CONST_POLY_INT)).
>          * config/riscv/vector-iterators.md: Add gather/scatter.
>          * config/riscv/vector.md (vec_duplicate<mode>): Use "@" instead.
>          (@vec_duplicate<mode>): Ditto.
>          (@pred_indexed_<order>store<VNX16_QHS:mode><VNX16_QHSDI:mode>): Fix name.
>          (@pred_indexed_<order>store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
>          * gcc.target/riscv/rvv/rvv.exp: Add gather/scatter tests.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c: New test.
>          * gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c: New test.
> 
> ---






> +
> +/* Return true if it is the strided load/store.  */
> +static bool
> +strided_load_store_p (rtx vec_offset, rtx *base, rtx *step)
> +{
> +  if (const_vec_series_p (vec_offset, base, step))
> +    return true;
> +
> +  /* For strided load/store, vectorizer always generates
> +     VEC_SERIES_EXPR for vec_offset.  */
> +  tree expr = REG_EXPR (vec_offset);
> +  if (!expr || TREE_CODE (expr) != SSA_NAME)
> +    return false;
> +
> +  /* Check if it is GIMPLE like: _88 = VEC_SERIES_EXPR <0, _87>;  */
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
> +  if (!def_stmt || !is_gimple_assign (def_stmt)
> +      || gimple_assign_rhs_code (def_stmt) != VEC_SERIES_EXPR)
> +    return false;
Hmm, I'm not sure this is safe, especially if gimple->rtl expansion is 
complete.  While you might be able to get REG_EXPR, I would not really 
expect SSA_NAME_DEF_STMT to be correct.  At the least it'll need some 
way to make sure it's not called at an inappropriate time.


> +
> +/* Expand LEN_MASK_{GATHER_LOAD,SCATTER_STORE}.  */
> +void
> +expand_gather_scatter (rtx *ops, bool is_load)
> +{
> +
> +  /* We use vlse.v/vsse.v instead of indexed load/store by default
> +     if it is strided load/store.
> +
> +     FIXME: vlse.v/vsse.v may not always be better than vluxei.v/vsuxei.v.
> +     We may need COST MODE to adjust it.  */
I'd be surprised if we encounter a case where vector strided will be 
worse than the equivalent vector indexed.  In the unlikely event that 
happens, we'll have to implement a suitable cost model and splat the 
stride into a vector index register.    But I wouldn't worry too much 
about it at this stage.
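
To make the strided/indexed relationship concrete, here is a minimal scalar sketch. It is not RVV code and not part of the patch; the helper names (strided_load, indexed_load, stride_to_offsets, strided_matches_indexed) are hypothetical. It only models the per-element addressing: a strided access with stride s reads the same elements as an indexed access whose offset vector is {0, s, 2*s, ...}, which is what materializing the stride in a vector index register would amount to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of a strided load (the per-element effect of vlse.v):
// element i is read from base + i * stride, with stride in bytes.
std::vector<int8_t> strided_load (const int8_t *base, std::ptrdiff_t stride,
                                  std::size_t vl)
{
  std::vector<int8_t> result (vl);
  for (std::size_t i = 0; i < vl; ++i)
    result[i] = base[static_cast<std::ptrdiff_t> (i) * stride];
  return result;
}

// Scalar model of an indexed load (the per-element effect of vluxei):
// element i is read from base + offsets[i], with offsets in bytes.
std::vector<int8_t> indexed_load (const int8_t *base,
                                  const std::vector<std::ptrdiff_t> &offsets)
{
  std::vector<int8_t> result (offsets.size ());
  for (std::size_t i = 0; i < offsets.size (); ++i)
    result[i] = base[offsets[i]];
  return result;
}

// Build the offset vector {0, stride, 2 * stride, ...}.
std::vector<std::ptrdiff_t> stride_to_offsets (std::ptrdiff_t stride,
                                               std::size_t vl)
{
  std::vector<std::ptrdiff_t> offsets (vl);
  for (std::size_t i = 0; i < vl; ++i)
    offsets[i] = static_cast<std::ptrdiff_t> (i) * stride;
  return offsets;
}

// Compare both forms on a 32-byte buffer where src[k] == k.
bool strided_matches_indexed (std::ptrdiff_t stride, std::size_t vl)
{
  std::vector<int8_t> src (32);
  for (std::size_t k = 0; k < src.size (); ++k)
    src[k] = static_cast<int8_t> (k);

  // Precondition: every access stays inside the buffer.
  assert (stride >= 0
          && (vl == 0
              || static_cast<std::size_t> (stride) * (vl - 1) < src.size ()));

  return strided_load (src.data (), stride, vl)
         == indexed_load (src.data (), stride_to_offsets (stride, vl));
}
```

The two forms agree on which elements are read; any difference would be in cost, which this model says nothing about.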


> +  rtx base, step;
> +  if (strided_load_store_p (vec_offset, &base, &step))
> +    {
> +      if (GET_MODE (step) != Pmode)
> +	{
> +	  if (CONSTANT_P (step))
> +	    step = force_reg (Pmode, step);
> +	  else
> +	    {
> +	      rtx extend_step = gen_reg_rtx (Pmode);
> +	      emit_insn (gen_extend_insn (extend_step, step, Pmode,
> +					  GET_MODE (step),
> +					  zero_extend_p ? true : false));
> +	      step = extend_step;
> +	    }
What happens for a non-constant step in a mode the same size as Pmode, 
particularly in a non-optimizing compilation?  Wouldn't that abort with 
an unrecognized extension insn?

I'd have similar concerns with the code that handles the case 
inner_offsize < inner_vsize.




> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index 5b7a17b9d34..19740c89132 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -1357,8 +1357,16 @@
>   	}
>       }
>     else if (GET_MODE_BITSIZE (<VEL>mode) > GET_MODE_BITSIZE (Pmode)
> -           && immediate_operand (operands[3], Pmode))
> -    operands[3] = gen_rtx_SIGN_EXTEND (<VEL>mode, force_reg (Pmode, operands[3]));
> +           && (immediate_operand (operands[3], Pmode)
> +	       || (CONST_POLY_INT_P (operands[3])
> +	           && known_ge (rtx_to_poly_int64 (operands[3]), 0U)
> +		   && known_le (rtx_to_poly_int64 (operands[3]), GET_MODE_SIZE (<MODE>mode)))))
Should this have been known_lt rather than known_le?


> @@ -1397,6 +1406,12 @@
>   	  (match_dup 2)))]
>     {
>       gcc_assert (can_create_pseudo_p ());
> +    if (CONST_POLY_INT_P (operands[3]))
> +      {
> +        rtx tmp = gen_reg_rtx (<VEL>mode);
> +	emit_move_insn (tmp, operands[3]);
> +	operands[3] = tmp;
> +      }
Something's off in your formatting here.  I'd guess spaces vs tabs


In a few places you're using expand_binop.  Those interfaces are really 
more for gimple->RTL.  But code like expand_gather_scatter is really 
RTL, not gimple/tree.   Is there a reason why you're not using pure RTL 
interfaces?

Anyway this is mostly good, but I do think there are a few outstanding 
questions/concerns to work through.

Jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization
  2023-07-12  2:01 ` Jeff Law
@ 2023-07-12  2:34   ` juzhe.zhong
  2023-07-12  5:32     ` Jeff Law
  0 siblings, 1 reply; 14+ messages in thread
From: juzhe.zhong @ 2023-07-12  2:34 UTC (permalink / raw)
  To: jeffreyalaw, gcc-patches; +Cc: Kito.cheng, Robin Dapp

[-- Attachment #1: Type: text/plain, Size: 18597 bytes --]

Hi, Jeff.

>> Hmm, I'm not sure this is safe, especially if gimple->rtl expansion is
>> complete.  While you might be able to get REG_EXPR, I would not really
>> expect SSA_NAME_DEF_STMT to be correct.  At the least it'll need some
>> way to make sure it's not called at an inappropriate time.
I think it's safe: if SSA_NAME_DEF_STMT is NULL, the function just returns false.

>> Should this have been known_lt rather than known_le?
It should be LE, since I will pass through GET_MODE_NUNITS/GET_MODE_SIZE for SLP.
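
The difference between the two predicates at the boundary can be sketched with a toy model. This is an illustration only, assuming a two-coefficient polynomial value c0 + c1 * x with x >= 0 unknown at compile time; toy_poly, toy_known_le and toy_known_lt are hypothetical names, not the poly_int API. The point is that an operand exactly equal to the mode size satisfies "known <=" but not "known <".

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a two-coefficient polynomial integer: the value is
// c0 + c1 * x, where x >= 0 is only known at run time.
struct toy_poly
{
  int64_t c0;
  int64_t c1;
};

// True only if a <= b holds for every x >= 0.
bool toy_known_le (toy_poly a, toy_poly b)
{
  return a.c0 <= b.c0 && a.c1 <= b.c1;
}

// True only if a < b holds for every x >= 0.
bool toy_known_lt (toy_poly a, toy_poly b)
{
  return a.c0 < b.c0 && a.c1 <= b.c1;
}

// Hypothetical check mirroring the shape of the condition in the patch:
// accept an operand when 0 <= operand <= size for every x.
bool operand_accepted (toy_poly operand, toy_poly size)
{
  toy_poly zero = {0, 0};
  return toy_known_le (zero, operand) && toy_known_le (operand, size);
}
```

With size = {16, 16}, the operand {16, 16} is accepted by the "<=" form, while a "<" form would reject it.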

>> Something's off in your formatting here.  I'd guess spaces vs tabs
Ok.

>>In a few places you're using expand_binop.  Those interfaces are really
>>more for gimple->RTL.  But code like expand_gather_scatter is really
>>RTL, not gimple/tree.   Is there a reason why you're not using pure RTL
>>interfaces?
I saw ARM SVE is using them in many places for expanding patterns.
And I think it's convenient so that's why I use them.

Thanks.


juzhe.zhong@rivai.ai
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization
  2023-07-12  2:34   ` juzhe.zhong
@ 2023-07-12  5:32     ` Jeff Law
  2023-07-12  6:03       ` juzhe.zhong
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Law @ 2023-07-12  5:32 UTC (permalink / raw)
  To: juzhe.zhong, gcc-patches; +Cc: Kito.cheng, Robin Dapp



On 7/11/23 20:34, juzhe.zhong@rivai.ai wrote:
> Hi, Jeff.
> 
> >> Hmm, I'm not sure this is safe, especially if gimple->rtl expansion is
> >> complete.  While you might be able to get REG_EXPR, I would not really
> >> expect SSA_NAME_DEF_STMT to be correct.  At the least it'll need some
> >> way to make sure it's not called at an inappropriate time.
> I think it's safe, if SSA_NAME_DEF_STMT is NULL, then just return it.
> 
>>>Should this have been known_lt rather than known_le?
> It should be LE, since I will pass through GET_MODE_NUNITS/GET_MODE_SIZE 
> for SLP.
Thanks for double checking.  It looked slightly odd checking ge or le.
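
For anyone following along, here is a minimal sketch of why the bound itself has to be accepted. It assumes a simplified two-coefficient model of poly_int64; struct poly2 and the poly2_* helpers are hypothetical names for illustration, not the poly-int.h API. A value equal to the bound (the GET_MODE_SIZE case mentioned above) passes a known_le-style test but would fail a known_lt-style one:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a two-coefficient poly_int64: the value is
   c0 + c1 * x for some runtime x >= 0.  Hypothetical type.  */
struct poly2
{
  int64_t c0;
  int64_t c1;
};

/* True if a <= b holds for every x >= 0.  */
static bool
poly2_known_le (struct poly2 a, struct poly2 b)
{
  return a.c0 <= b.c0 && a.c1 <= b.c1;
}

/* True if a < b holds for every x >= 0.  */
static bool
poly2_known_lt (struct poly2 a, struct poly2 b)
{
  return a.c0 < b.c0 && a.c1 <= b.c1;
}

/* Mirror of the shape of the condition in the pattern:
   0 <= value <= bound, with the bound itself allowed.  */
static bool
poly2_in_range (struct poly2 value, struct poly2 bound)
{
  const struct poly2 zero = { 0, 0 };
  assert (bound.c0 >= 0 && bound.c1 >= 0);
  return poly2_known_le (zero, value) && poly2_known_le (value, bound);
}
```

With a bound of 16 + 16x, passing the bound itself is in range, while a strict comparison against the same bound would reject it.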


> 
>>>Something's off in your formatting here.  I'd guess spaces vs tabs
> Ok.
> 
>>>In a few places you're using expand_binop.  Those interfaces are really
>>>more for gimple->RTL.  BUt code like expand_gather_scatter is really
>>>RTL, not gimple/tree.   Is there a reason why you're not using pure RTL
>>>interfaces?
> I saw ARM SVE is using them in many places for expanding patterns.
> And I think it's convenient so that's why I use them.
OK.

I still think we need a resolution on strided_load_store_p.  As I 
mentioned in my original email, I'm not sure you can depend on getting 
to the SSA_NAME_DEF_STMT at this point -- in particular if it's a 
dangling pointer, then bad things are going to happen.  So let's chase 
that down.  Presumably this is called during gimple->rtl expansion, 
right?  Is it ever called later?

I think my concerns about expand_gather_scatter are a non-issue after 
looking at it again -- I missed the GET_MODE (step) != Pmode conditional 
when I first looked at that code.


jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization
  2023-07-12  5:32     ` Jeff Law
@ 2023-07-12  6:03       ` juzhe.zhong
  2023-07-12  7:27         ` Richard Biener
  0 siblings, 1 reply; 14+ messages in thread
From: juzhe.zhong @ 2023-07-12  6:03 UTC (permalink / raw)
  To: jeffreyalaw, gcc-patches
  Cc: Kito.cheng, Robin Dapp, richard.sandiford, rguenther

[-- Attachment #1: Type: text/plain, Size: 3671 bytes --]

I understand your concern. I have CC'd the Richards to check whether this piece of code is unsafe.

Hi, Richard and Richi:

Jeff is worried about this code in "expand_gather_scatter", which supports len_mask_gather_load/len_mask_scatter_store in the RISC-V port.

The code is as follows:

 +/* Return true if it is the strided load/store. */
+static bool
+strided_load_store_p (rtx vec_offset, rtx *base, rtx *step)
+{
+  if (const_vec_series_p (vec_offset, base, step))
+    return true;
+
+  /* For strided load/store, vectorizer always generates
+     VEC_SERIES_EXPR for vec_offset.  */
+  tree expr = REG_P (vec_offset) ? REG_EXPR (vec_offset) : NULL_TREE;
+  if (!expr || TREE_CODE (expr) != SSA_NAME)
+    return false;
+
+  /* Check if it is GIMPLE like: _88 = VEC_SERIES_EXPR <0, _87>;  */
+  gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
+  if (!def_stmt || !is_gimple_assign (def_stmt)
+      || gimple_assign_rhs_code (def_stmt) != VEC_SERIES_EXPR)
+    return false;
+
+  tree baset = gimple_assign_rhs1 (def_stmt);
+  tree stept = gimple_assign_rhs2 (def_stmt);
+  *base = expand_normal (baset);
+  *step = expand_normal (stept);
+
+  if (!rtx_equal_p (*base, const0_rtx))
+    return false;
+  return true;
+}
In this code, I query SSA_NAME_DEF_STMT to see whether the vector offset of the gather/scatter is a VEC_SERIES_EXPR.
If it is, I lower it into RVV strided loads/stores (vlse.v/vsse.v), which use a scalar stride;
if it is not, I use the ordinary RVV indexed loads/stores with a vector offset (vluxei/vsuxei).
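
The equivalence this relies on can be sketched in plain C, outside of GCC. This is only an illustration under stated assumptions: zero_based_series_p, gather_i32 and strided_load_i32 are hypothetical helpers, offsets and strides are in bytes, and nothing here is the RTL-level code from the patch. When the offset vector is a series with base 0, the indexed form and the strided form read the same elements:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return true if offsets[i] == i * step for all i, i.e. the offsets form
   a series with base 0, and store the step in *step_out.  */
static bool
zero_based_series_p (const int64_t *offsets, size_t n, int64_t *step_out)
{
  assert (step_out != NULL);
  if (n < 2 || offsets[0] != 0)
    return false;
  int64_t step = offsets[1];
  for (size_t i = 2; i < n; i++)
    if (offsets[i] != (int64_t) i * step)
      return false;
  *step_out = step;
  return true;
}

/* Indexed form: one byte offset per element (the vluxei-like case).  */
static void
gather_i32 (int32_t *dest, const char *base, const int64_t *offsets, size_t n)
{
  for (size_t i = 0; i < n; i++)
    memcpy (&dest[i], base + offsets[i], sizeof (int32_t));
}

/* Strided form: a single scalar byte stride (the vlse-like case).  */
static void
strided_load_i32 (int32_t *dest, const char *base, int64_t stride, size_t n)
{
  for (size_t i = 0; i < n; i++)
    memcpy (&dest[i], base + (int64_t) i * stride, sizeof (int32_t));
}
```

The open question in this thread is not this equivalence but how the series is recognized at expand time.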

Jeff is worried about whether it is safe to use SSA_NAME_DEF_STMT at this point (during "expand", when gimple is expanded to RTL).

I am also wondering whether I am doing something wrong here.
Thanks.


juzhe.zhong@rivai.ai
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization
  2023-07-12  6:03       ` juzhe.zhong
@ 2023-07-12  7:27         ` Richard Biener
  2023-07-12  7:35           ` juzhe.zhong
  2023-07-12 14:28           ` Jeff Law
  0 siblings, 2 replies; 14+ messages in thread
From: Richard Biener @ 2023-07-12  7:27 UTC (permalink / raw)
  To: juzhe.zhong
  Cc: jeffreyalaw, gcc-patches, Kito.cheng, Robin Dapp, richard.sandiford

On Wed, 12 Jul 2023, juzhe.zhong@rivai.ai wrote:

> I understand your concern. I CC Richards to see whether this piece of codes is  unsafe.
> 
> Hi, Richard and Richi:
> 
> Jeff is worrying about this codes in "expand_gather_scatter" of supporting len_mask_gather_load/len_mask_scatter_store in RISC-V port.
> 
> The codes are as follows:
> 
>  +/* Return true if it is the strided load/store. */
> +static bool
> +strided_load_store_p (rtx vec_offset, rtx *base, rtx *step)
> +{
> +  if (const_vec_series_p (vec_offset, base, step))
> +    return true;
> +
> +  /* For strided load/store, vectorizer always generates
> +     VEC_SERIES_EXPR for vec_offset.  */
> +  tree expr = REG_P (vec_offset) ? REG_EXPR (vec_offset) : NULL_TREE;
> +  if (!expr || TREE_CODE (expr) != SSA_NAME)
> +    return false;
> +
> +  /* Check if it is GIMPLE like: _88 = VEC_SERIES_EXPR <0, _87>;  */
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
> +  if (!def_stmt || !is_gimple_assign (def_stmt)
> +      || gimple_assign_rhs_code (def_stmt) != VEC_SERIES_EXPR)
> +    return false;
> +
> +  tree baset = gimple_assign_rhs1 (def_stmt);
> +  tree stept = gimple_assign_rhs2 (def_stmt);
> +  *base = expand_normal (baset);
> +  *step = expand_normal (stept);
> +
> +  if (!rtx_equal_p (*base, const0_rtx))
> +    return false;
> +  return true;
> +}
> In this codes, I tried to query the SSA_NAME_DEF_STMT to see whether the vector offset of gather/scatter is VEC_SERISE
> If it is VEC_SERISE, I will lower them into RVV strided load/stores (vlse.v/vsse.v) which is using scalar stride, 
> if it is not, then use common RVV indexed load/store with vector offset (vluxei/vsuxei).
> 
> Jeff is worrying about whether we are using SSA_NAME_DEF_STMT at this point  (during the stage "expand" expanding gimple ->rtl).

Using SSA_NAME_DEF_STMT during expansion is OK, but I don't think you
can rely on REG_EXPR here since you don't know whether any coalescing
happened.  That said, maybe the implementation currently guarantees
you'll only see a REG_EXPR SSA name if there's a single definition
of that register, but at least I'm not aware of that and this is also
not documented.
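
A toy model of the hazard, for illustration only. It assumes that coalescing can put several SSA names into one pseudo while REG_EXPR names just one of them; struct toy_reg and both predicates are hypothetical and do not correspond to any GCC data structure:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum def_kind { DEF_VEC_SERIES, DEF_OTHER };

/* Toy pseudo register: the kinds of the defining statements of all SSA
   names coalesced into it, plus the index of the one name that is
   recorded as its "REG_EXPR".  */
struct toy_reg
{
  const enum def_kind *defs;
  size_t ndefs;
  size_t reg_expr;
};

/* Looks only at the recorded name, as strided_load_store_p does.  */
static bool
naive_series_p (const struct toy_reg *r)
{
  assert (r->reg_expr < r->ndefs);
  return r->defs[r->reg_expr] == DEF_VEC_SERIES;
}

/* Only trusts the recorded name if it is the sole definition.  */
static bool
conservative_series_p (const struct toy_reg *r)
{
  return r->ndefs == 1 && naive_series_p (r);
}
```

In this model a register with two coalesced definitions, only one of which is a VEC_SERIES_EXPR, still satisfies the naive check, which is the situation that would need to be ruled out.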

I wonder if you can recover vlse.v at combine time though?

That said, if the ISA supports gather/scatter with an affine offset
the more appropriate way would be to add additional named expanders
for this and deal with the above in the middle-end during RTL
expansion instead.

Richard.

> I am also wondering whether I am doing wrong here.
> Thanks.
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Jeff Law
> Date: 2023-07-12 13:32
> To: juzhe.zhong@rivai.ai; gcc-patches
> CC: Kito.cheng; Robin Dapp
> Subject: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization
>  
>  
> On 7/11/23 20:34, juzhe.zhong@rivai.ai wrote:
> > Hi, Jeff.
> > 
> >  >> Hmm, I'm not sure this is safe, especially if gimple->rtl expansion is
> >>>complete.  While you might be able to get REG_EXPR, I would not really
> >>>expect SSA_NAME_DEF_STMT to be correct.  At the least it'll need some
> >>>way to make sure it's not called at an inappropriate time.
> > I think it's safe, if SSA_NAME_DEF_STMT is NULL, then just return it.
> > 
> >>>Should this have been known_lt rather than known_le?
> > It should be LE, since I will pass through GET_MODE_NUNITS/GET_MODE_SIZE 
> > for SLP.
> THanks for double checking.  It looked slightly odd checking ge or le.
>  
>  
> > 
> >>>Something's off in your formatting here.  I'd guess spaces vs tabs
> > Ok.
> > 
> >>>In a few places you're using expand_binop.  Those interfaces are really
> >>>more for gimple->RTL.  BUt code like expand_gather_scatter is really
> >>>RTL, not gimple/tree.   Is there a reason why you're not using pure RTL
> >>>interfaces?
> > I saw ARM SVE is using them in many places for expanding patterns.
> > And I think it's convenient so that's why I use them.
> OK.
>  
> I still think we need a resolution on strided_load_store_p.  As I 
> mentioned in my original email, I'm not sure you can depend on getting 
> to the SSA_NAME_DEF_STMT at this point -- in particular if it's a 
> dangling pointer, then bad things are going to happen.  So let's chase 
> that down.  Presumably this is called during gimple->rtl expansion, 
> right?  Is it ever called later?
>  
> I think my concerns about expand_gather_scatter are a non-issue after 
> looking at it again -- I missed the GET_MODE (step) != Pmode conditional 
> when I first looked at that code.
>  
>  
> jeff
>  
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization
  2023-07-12  7:27         ` Richard Biener
@ 2023-07-12  7:35           ` juzhe.zhong
  2023-07-12  7:56             ` Richard Biener
  2023-07-12 14:28           ` Jeff Law
  1 sibling, 1 reply; 14+ messages in thread
From: juzhe.zhong @ 2023-07-12  7:35 UTC (permalink / raw)
  To: rguenther
  Cc: jeffreyalaw, gcc-patches, Kito.cheng, Robin Dapp, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 5447 bytes --]

Thanks Richard.

Is it correct that the better way is to add optabs (len_strided_load/len_strided_store),
and then expand LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE to the len_strided_load/len_strided_store optabs (when the access is a strided load/store) in
expand_gather_load_optab_fn
expand_scatter_store_optab_fn

of internal-fn.cc?

Am I right? Thanks.


juzhe.zhong@rivai.ai
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization
  2023-07-12  7:35           ` juzhe.zhong
@ 2023-07-12  7:56             ` Richard Biener
  2023-07-12  8:05               ` juzhe.zhong
  2023-07-12  9:33               ` Richard Sandiford
  0 siblings, 2 replies; 14+ messages in thread
From: Richard Biener @ 2023-07-12  7:56 UTC (permalink / raw)
  To: juzhe.zhong
  Cc: jeffreyalaw, gcc-patches, Kito.cheng, Robin Dapp, richard.sandiford

On Wed, 12 Jul 2023, juzhe.zhong@rivai.ai wrote:

> Thanks Richard.
> 
> Is it correct that the better way is to add optabs (len_strided_load/len_strided_store),
> then expand LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE to len_strided_load/len_strided_store optab (if it is strided load/store) in
> expand_gather_load_optab_fn 
> expand_scatter_store_optab_fn
> 
> of internal-fn.cc
> 
> Am I right? Thanks.

Yes.

In principle the vectorizer can also directly take advantage of this
and code generate an internal .LEN_STRIDED_LOAD ifn.
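
As a rough sketch of what such an operation might compute, written as scalar C. This is an assumption-laden illustration: the function name, the byte-stride convention, and leaving inactive elements untouched are all hypothetical choices, not the definition of any existing internal function or optab:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a length- and mask-controlled strided
   load: element i is loaded from base + i * stride (in bytes) when
   i < len and mask[i] is set; other elements of dest are left as-is.  */
static void
len_mask_strided_load_i32 (int32_t *dest, const char *base, int64_t stride,
                           const bool *mask, size_t len, size_t nunits)
{
  assert (len <= nunits);
  for (size_t i = 0; i < len; i++)
    if (mask[i])
      memcpy (&dest[i], base + (int64_t) i * stride, sizeof (int32_t));
}
```

Note that the stride is a single scalar operand, so no offset vector has to be built at all.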

Richard.

> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2023-07-12 15:27
> To: juzhe.zhong@rivai.ai
> CC: jeffreyalaw; gcc-patches; Kito.cheng; Robin Dapp; richard.sandiford
> Subject: Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization
> On Wed, 12 Jul 2023, juzhe.zhong@rivai.ai wrote:
>  
> > I understand your concern. I CC Richards to see whether this piece of codes is  unsafe.
> > 
> > Hi, Richard and Richi:
> > 
> > Jeff is worrying about this codes in "expand_gather_scatter" of supporting len_mask_gather_load/len_mask_scatter_store in RISC-V port.
> > 
> > The codes are as follows:
> > 
> >  +/* Return true if it is the strided load/store. */
> > +static bool
> > +strided_load_store_p (rtx vec_offset, rtx *base, rtx *step)
> > +{
> > +  if (const_vec_series_p (vec_offset, base, step))
> > +    return true;
> > +
> > +  /* For strided load/store, vectorizer always generates
> > +     VEC_SERIES_EXPR for vec_offset.  */
> > +  tree expr = REG_P (vec_offset) ? REG_EXPR (vec_offset) : NULL_TREE;
> > +  if (!expr || TREE_CODE (expr) != SSA_NAME)
> > +    return false;
> > +
> > +  /* Check if it is GIMPLE like: _88 = VEC_SERIES_EXPR <0, _87>;  */
> > +  gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
> > +  if (!def_stmt || !is_gimple_assign (def_stmt)
> > +      || gimple_assign_rhs_code (def_stmt) != VEC_SERIES_EXPR)
> > +    return false;
> > +
> > +  tree baset = gimple_assign_rhs1 (def_stmt);
> > +  tree stept = gimple_assign_rhs2 (def_stmt);
> > +  *base = expand_normal (baset);
> > +  *step = expand_normal (stept);
> > +
> > +  if (!rtx_equal_p (*base, const0_rtx))
> > +    return false;
> > +  return true;
> > +}
> > In this codes, I tried to query the SSA_NAME_DEF_STMT to see whether the vector offset of gather/scatter is VEC_SERISE
> > If it is VEC_SERISE, I will lower them into RVV strided load/stores (vlse.v/vsse.v) which is using scalar stride, 
> > if it is not, then use common RVV indexed load/store with vector offset (vluxei/vsuxei).
> > 
> > Jeff is worrying about whether we are using SSA_NAME_DEF_STMT at this point  (during the stage "expand" expanding gimple ->rtl).
>  
> Using SSA_NAME_DEF_STMT during expansion is OK, but I don't think you
> can rely on REG_EXPR here since you don't know whether any coalescing
> happened.  That said, maybe the implementation currently guarantees
> you'll only see a REG_EXPR SSA name if there's a single definition
> of that register, but at least I'm not aware of that and this is also
> not documented.
>  
> I wonder if you can recover vlse.v at combine time though?
>  
> That said, if the ISA supports gather/scatter with an affine offset
> the more appropriate way would be to add additional named expanders
> for this and deal with the above in the middle-end during RTL
> expansion instead.
>  
> Richard.
>  
> > I am also wondering whether I am doing wrong here.
> > Thanks.
> > 
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Jeff Law
> > Date: 2023-07-12 13:32
> > To: juzhe.zhong@rivai.ai; gcc-patches
> > CC: Kito.cheng; Robin Dapp
> > Subject: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization
> >  
> >  
> > On 7/11/23 20:34, juzhe.zhong@rivai.ai wrote:
> > > Hi, Jeff.
> > > 
> > >  >> Hmm, I'm not sure this is safe, especially if gimple->rtl expansion is
> > >>>complete.  While you might be able to get REG_EXPR, I would not really
> > >>>expect SSA_NAME_DEF_STMT to be correct.  At the least it'll need some
> > >>>way to make sure it's not called at an inappropriate time.
> > > I think it's safe, if SSA_NAME_DEF_STMT is NULL, then just return it.
> > > 
> > >>>Should this have been known_lt rather than known_le?
> > > It should be LE, since I will pass through GET_MODE_NUNITS/GET_MODE_SIZE 
> > > for SLP.
> > THanks for double checking.  It looked slightly odd checking ge or le.
> >  
> >  
> > > 
> > >>>Something's off in your formatting here.  I'd guess spaces vs tabs
> > > Ok.
> > > 
> > >>>In a few places you're using expand_binop.  Those interfaces are really
> > >>>more for gimple->RTL.  BUt code like expand_gather_scatter is really
> > >>>RTL, not gimple/tree.   Is there a reason why you're not using pure RTL
> > >>>interfaces?
> > > I saw ARM SVE is using them in many places for expanding patterns.
> > > And I think it's convenient so that's why I use them.
> > OK.
> >  
> > I still think we need a resolution on strided_load_store_p.  As I 
> > mentioned in my original email, I'm not sure you can depend on getting 
> > to the SSA_NAME_DEF_STMT at this point -- in particular if it's a 
> > dangling pointer, then bad things are going to happen.  So let's chase 
> > that down.  Presumably this is called during gimple->rtl expansion, 
> > right?  Is it ever called later?
> >  
> > I think my concerns about expand_gather_scatter are a non-issue after 
> > looking at it again -- I missed the GET_MODE (step) != Pmode conditional 
> > when I first looked at that code.
> >  
> >  
> > jeff
> >  
> > 
>  
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization
  2023-07-12  7:56             ` Richard Biener
@ 2023-07-12  8:05               ` juzhe.zhong
  2023-07-12  9:33               ` Richard Sandiford
  1 sibling, 0 replies; 14+ messages in thread
From: juzhe.zhong @ 2023-07-12  8:05 UTC (permalink / raw)
  To: rguenther
  Cc: jeffreyalaw, gcc-patches, Kito.cheng, Robin Dapp, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 6312 bytes --]

Thanks Richard so much!

I am going to prepare V7 of this patch, dropping the strided load/store support in the RISC-V backend.



juzhe.zhong@rivai.ai
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization
  2023-07-12  7:56             ` Richard Biener
  2023-07-12  8:05               ` juzhe.zhong
@ 2023-07-12  9:33               ` Richard Sandiford
  2023-07-12  9:40                 ` juzhe.zhong
  2023-07-12 10:45                 ` Richard Biener
  1 sibling, 2 replies; 14+ messages in thread
From: Richard Sandiford @ 2023-07-12  9:33 UTC (permalink / raw)
  To: Richard Biener
  Cc: juzhe.zhong, jeffreyalaw, gcc-patches, Kito.cheng, Robin Dapp

Richard Biener <rguenther@suse.de> writes:
> On Wed, 12 Jul 2023, juzhe.zhong@rivai.ai wrote:
>
>> Thanks Richard.
>> 
>> Is it correct that the better way is to add optabs (len_strided_load/len_strided_store),
>> then expand LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE to len_strided_load/len_strided_store optab (if it is strided load/store) in
>> expand_gather_load_optab_fn 
>> expand_scatter_store_optab_fn
>> 
>> of internal-fn.cc
>> 
>> Am I right? Thanks.
>
> Yes.
>
> In priciple the vectorizer can also directly take advantage of this
> and code generate an internal .LEN_STRIDED_LOAD ifn.

Yeah, in particular, having a strided load should relax some
of the restrictions around the relationship of the vector offset
type to the loaded/stored data.  E.g. a "gather" of N bytes with a
64-bit stride would in principle be possible without needing an
Nx64-bit vector offset type.
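
To make the size argument concrete, here is a small scalar sketch. The helper names are hypothetical and the operand sizes are simply counted in bytes; it is not a model of any particular vector ISA:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Gather of n bytes: needs one 64-bit offset per loaded byte.  */
static void
gather_u8 (uint8_t *dest, const uint8_t *base, const uint64_t *offsets,
           size_t n)
{
  assert (n == 0 || offsets != NULL);
  for (size_t i = 0; i < n; i++)
    dest[i] = base[offsets[i]];
}

/* Strided load of n bytes: a single 64-bit scalar stride.  */
static void
strided_load_u8 (uint8_t *dest, const uint8_t *base, uint64_t stride,
                 size_t n)
{
  for (size_t i = 0; i < n; i++)
    dest[i] = base[i * stride];
}

/* Bytes taken by the offset operand in each form.  */
static size_t
gather_offset_bytes (size_t n)
{
  return n * sizeof (uint64_t);
}

static size_t
strided_offset_bytes (void)
{
  return sizeof (uint64_t);
}
```

For n byte elements the gather form carries 8 * n bytes of offsets, whereas the strided form carries 8 bytes regardless of n.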

Richard

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization
  2023-07-12  9:33               ` Richard Sandiford
@ 2023-07-12  9:40                 ` juzhe.zhong
  2023-07-12 10:45                 ` Richard Biener
  1 sibling, 0 replies; 14+ messages in thread
From: juzhe.zhong @ 2023-07-12  9:40 UTC (permalink / raw)
  To: richard.sandiford, rguenther
  Cc: jeffreyalaw, gcc-patches, Kito.cheng, Robin Dapp

[-- Attachment #1: Type: text/plain, Size: 1468 bytes --]

Thanks Richard.

I have addressed all comments in the V7 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624220.html 

I dropped the vlse/vsse codegen optimization from the RISC-V backend; instead I will support LEN_MASK_STRIDED_LOAD/LEN_MASK_STRIDED_STORE
in the future.

Thanks. 


juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2023-07-12 17:33
To: Richard Biener
CC: juzhe.zhong\@rivai.ai; jeffreyalaw; gcc-patches; Kito.cheng; Robin Dapp
Subject: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization
Richard Biener <rguenther@suse.de> writes:
> On Wed, 12 Jul 2023, juzhe.zhong@rivai.ai wrote:
>
>> Thanks Richard.
>> 
>> Is it correct that the better way is to add optabs (len_strided_load/len_strided_store),
>> then expand LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE to len_strided_load/len_strided_store optab (if it is strided load/store) in
>> expand_gather_load_optab_fn 
>> expand_scatter_store_optab_fn
>> 
>> of internal-fn.cc
>> 
>> Am I right? Thanks.
>
> Yes.
>
> In principle the vectorizer can also directly take advantage of this
> and code generate an internal .LEN_STRIDED_LOAD ifn.
 
Yeah, in particular, having a strided load should relax some
of the restrictions around the relationship of the vector offset
type to the loaded/stored data.  E.g. a "gather" of N bytes with a
64-bit stride would in principle be possible without needing an
Nx64-bit vector offset type.
 
Richard
 


* Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization
  2023-07-12  9:33               ` Richard Sandiford
  2023-07-12  9:40                 ` juzhe.zhong
@ 2023-07-12 10:45                 ` Richard Biener
  1 sibling, 0 replies; 14+ messages in thread
From: Richard Biener @ 2023-07-12 10:45 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: juzhe.zhong, jeffreyalaw, gcc-patches, Kito.cheng, Robin Dapp

On Wed, 12 Jul 2023, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Wed, 12 Jul 2023, juzhe.zhong@rivai.ai wrote:
> >
> >> Thanks Richard.
> >> 
> >> Is it correct that the better way is to add optabs (len_strided_load/len_strided_store),
> >> then expand LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE to len_strided_load/len_strided_store optab (if it is strided load/store) in
> >> expand_gather_load_optab_fn 
> >> expand_scatter_store_optab_fn
> >> 
> >> of internal-fn.cc
> >> 
> >> Am I right? Thanks.
> >
> > Yes.
> >
> > In principle the vectorizer can also directly take advantage of this
> > and code generate an internal .LEN_STRIDED_LOAD ifn.
> 
> Yeah, in particular, having a strided load should relax some
> of the restrictions around the relationship of the vector offset
> type to the loaded/stored data.  E.g. a "gather" of N bytes with a
> 64-bit stride would in principle be possible without needing an
> Nx64-bit vector offset type.

And it can be used to do the VMAT_ELEMENTWISE/VMAT_STRIDED_SLP in
a more efficient way as well.  We never got around to using gather/scatter
for these (because in practice those tend to be slower than what we
do now there).

Richard.


* Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization
  2023-07-12  7:27         ` Richard Biener
  2023-07-12  7:35           ` juzhe.zhong
@ 2023-07-12 14:28           ` Jeff Law
  1 sibling, 0 replies; 14+ messages in thread
From: Jeff Law @ 2023-07-12 14:28 UTC (permalink / raw)
  To: Richard Biener, juzhe.zhong
  Cc: gcc-patches, Kito.cheng, Robin Dapp, richard.sandiford



On 7/12/23 01:27, Richard Biener wrote:

> 
> Using SSA_NAME_DEF_STMT during expansion is OK, but I don't think you
> can rely on REG_EXPR here since you don't know whether any coalescing
> happened.  That said, maybe the implementation currently guarantees
> you'll only see a REG_EXPR SSA name if there's a single definition
> of that register, but at least I'm not aware of that and this is also
> not documented.
If anyone knows if the implementation guarantees that, it'd probably be 
Michael, since he did the revamping of the expansion code years ago.


> I wonder if you can recover vlse.v at combine time though?
It may be hard to recover at combine time -- our vector insns aren't in 
forms that are easily digested by combine.  In this specific case we 
have hope though.  Essentially combine would need to recognize the 
offsets vector as a simple stride and adjust appropriately.

> 
> That said, if the ISA supports gather/scatter with an affine offset
> the more appropriate way would be to add additional named expanders
> for this and deal with the above in the middle-end during RTL
> expansion instead.
It's worth a try.  I didn't have much luck with this at Tachyum, but I 
always expected it was a misunderstanding of some parts of the 
vectorizer on my part.  I was deep inside this class of problems when I 
had to push it on the stack to develop a golang port :(

We were basically going down the path of treating everything as a 
scatter-gather, but trying to recognize strides in the offsets vector as 
a degenerate case.

jeff
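
The "recognize the offsets vector as a simple stride" step can be sketched as a scalar check. This is a minimal illustration, assuming the offsets are available as constants at the point of the check; offsets_are_simple_stride is a hypothetical helper, not a function in GCC, and it assumes the differences between neighbouring offsets do not overflow int64_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return true if offsets[i] == offsets[0] + i * step
   for every i, i.e. the offsets vector is a degenerate (strided) case of
   a general gather/scatter.  On success the step is stored in *stride_out.
   Assumes neighbouring differences do not overflow int64_t.  */
static bool
offsets_are_simple_stride (const int64_t *offsets, size_t n,
			   int64_t *stride_out)
{
  assert (offsets != NULL && stride_out != NULL);

  /* With fewer than two elements there is no stride to recover.  */
  if (n < 2)
    return false;

  int64_t step = offsets[1] - offsets[0];
  for (size_t i = 2; i < n; ++i)
    if (offsets[i] - offsets[i - 1] != step)
      return false;

  *stride_out = step;
  return true;
}
```

For example, the offsets {4, 12, 20, 28} would be accepted with a stride of 8 and a base offset of 4, while {0, 8, 17, 24} would be rejected and left as a general gather.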


end of thread, other threads:[~2023-07-12 14:28 UTC | newest]

Thread overview: 14+ messages
2023-07-07 14:32 [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization Juzhe-Zhong
2023-07-10 21:51 ` 钟居哲
2023-07-12  2:01 ` Jeff Law
2023-07-12  2:34   ` juzhe.zhong
2023-07-12  5:32     ` Jeff Law
2023-07-12  6:03       ` juzhe.zhong
2023-07-12  7:27         ` Richard Biener
2023-07-12  7:35           ` juzhe.zhong
2023-07-12  7:56             ` Richard Biener
2023-07-12  8:05               ` juzhe.zhong
2023-07-12  9:33               ` Richard Sandiford
2023-07-12  9:40                 ` juzhe.zhong
2023-07-12 10:45                 ` Richard Biener
2023-07-12 14:28           ` Jeff Law
