On Mon, 13 Feb 2023, Pan Li wrote:

> Thanks all for your help and comments.
> 
> Let me share more information about this patch, especially the tree-ssa-sccvn.cc part.
> 
> Assume we have the test code below for this issue.
> 
> void
> test_1(int8_t * restrict in, int8_t * restrict out) {
>     vbool8_t v2 = *(vbool8_t*)in;
>     vbool16_t v5 = *(vbool16_t*)in;
> 
>     *(vbool8_t*)(out + 100) = v2;
>     *(vbool16_t*)(out + 200) = v5;
> }
> 
> Without the tree-ssa-sccvn.cc code change:
> ------------------------------------------------------------------------------------
> void test_1 (int8_t * restrict in, int8_t * restrict out)
> {
>   vbool8_t v2;
>   __rvv_bool16_t _1;
> 
>   <bb 2> [local count: 1073741824]:
>   v2_4 = MEM[(vbool8_t *)in_3(D)];
>   _1 = VIEW_CONVERT_EXPR<__rvv_bool16_t>(v2_4);  // inserted during 039.fre1
>   MEM[(vbool8_t *)out_5(D) + 100B] = v2_4;
>   MEM[(vbool16_t *)out_5(D) + 200B] = _1;
>   return;
> }
> 
> With the tree-ssa-sccvn.cc code change:
> ------------------------------------------------------------------------------------
> void test_1 (int8_t * restrict in, int8_t * restrict out)
> {
>   vbool16_t v5;
>   vbool8_t v2;
> 
>   <bb 2> [local count: 1073741824]:
>   v2_3 = MEM[(vbool8_t *)in_2(D)];
>   v5_4 = MEM[(vbool16_t *)in_2(D)];
>   MEM[(vbool8_t *)out_5(D) + 100B] = v2_3;
>   MEM[(vbool16_t *)out_5(D) + 200B] = v5_4;
>   return;
> }
> 
> From this, I found that the fre1 pass (dump a-main.c.039t.fre1) is what inserts this
> VIEW_CONVERT_EXPR. With some debugging, I traced the difference to
> expressions_equal_p. If GET_MODE_SIZE (mode) is the same for VNx8BImode
> and VNx4BImode, expressions_equal_p ends up comparing the same tree address, namely
> POLY_INT_CST [8, 8].
> 
> visit_reference_op_load
>     |- vn_reference_lookup
>         |- vn_reference_lookup_2
>              |- find_slot_with_hash
>                  |- vn_reference_hasher::equal
>                      |- expressions_equal_p
> 
> Meanwhile, we also confirmed that giving VNx8BImode and VNx4BImode different
> MODE_SIZE values (for example, [8, 1] and [4, 1], for testing only) resolves
> this issue. But they should be [1, 1] according to the ISA semantics.
> 
> We then tried setting other MODE_XXX values, but that did not seem to have any effect. For example:
> 
> VNx4BImode NUNITS [0x4, 0x4]
> VNx8BImode NUNITS [0x8, 0x8]
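The size collision described above can be pictured with a small stand-alone model. This is only a sketch: `struct poly2` is a hypothetical stand-in for a two-coefficient poly value (not GCC's poly_int), and it assumes the byte size is the bit count rounded up to whole bytes per coefficient.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-coefficient value c0 + c1 * x; not GCC's poly_int.  */
struct poly2 { unsigned c0, c1; };

/* Assumption: the byte size is the bit count rounded up to whole
   bytes, applied to each coefficient separately.  */
static struct poly2 bytes_from_bits (struct poly2 bits)
{
  struct poly2 bytes = { (bits.c0 + 7) / 8, (bits.c1 + 7) / 8 };
  return bytes;
}

static bool poly2_equal (struct poly2 a, struct poly2 b)
{
  return a.c0 == b.c0 && a.c1 == b.c1;
}

/* True when two mask modes with these bit counts round to the same
   byte size, which is the situation where a size-only comparison
   cannot tell them apart.  */
static bool same_byte_size (struct poly2 bits_a, struct poly2 bits_b)
{
  return poly2_equal (bytes_from_bits (bits_a), bytes_from_bits (bits_b));
}

/* Small self-check using the NUNITS values quoted in this thread.  */
static void demo_sizes (void)
{
  struct poly2 vnx8bi = { 8, 8 };
  struct poly2 vnx4bi = { 4, 4 };
  assert (!poly2_equal (vnx8bi, vnx4bi));   /* bit counts differ */
  assert (same_byte_size (vnx8bi, vnx4bi)); /* byte sizes collide */
}
```

In this model, 8 bits and 4 bits both round to 1 byte, so the two modes differ in NUNITS but not in byte size.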
> 
> Finally, I found TARGET_MODES_TIEABLE_P and added a check of it to
> visit_reference_op_load to resolve this issue.
> 
> I will keep trying approaches outside tree-ssa-sccvn.cc if this is not
> the right place to fix this issue.

There are other places, like alias analysis, which will not be happy
if the mode size/precision does not match reality.  So no, I don't think
modes_tieable is the correct thing to check here.  Instead, the existing
check seems to be to the point, but the modes are not set up correctly
to carry the info of one having padding at the end and the other not.
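
As a rough picture of the point about padding, here is a stand-alone sketch. The `struct load_key` type and both comparison functions are hypothetical illustrations, not GCC's value-numbering structures, and the size and precision values are illustrative minimums. It shows that a key recording only the storage size treats two mask loads from the same address as the same value, while a key that also records the number of meaningful bits keeps them apart.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lookup key for a load; not GCC's vn_reference_s.  */
struct load_key {
  int base_id;          /* which pointer the load goes through */
  long offset;          /* byte offset from that pointer */
  unsigned size_bytes;  /* storage size, padding included */
  unsigned precision;   /* meaningful bits, padding excluded */
};

/* Equality that looks only at where the load reads and how many
   bytes of storage it covers.  */
static bool equal_by_size (const struct load_key *a, const struct load_key *b)
{
  return a->base_id == b->base_id
         && a->offset == b->offset
         && a->size_bytes == b->size_bytes;
}

/* Equality that additionally requires the same number of meaningful
   bits, so a padded and an unpadded value do not match.  */
static bool equal_by_precision (const struct load_key *a,
                                const struct load_key *b)
{
  return equal_by_size (a, b) && a->precision == b->precision;
}

/* Self-check: two mask loads from the same address, one byte each,
   with 8 and 4 meaningful bits respectively.  */
static void demo_keys (void)
{
  struct load_key vbool8_load = { 0, 0, 1, 8 };
  struct load_key vbool16_load = { 0, 0, 1, 4 };
  assert (equal_by_size (&vbool8_load, &vbool16_load));
  assert (!equal_by_precision (&vbool8_load, &vbool16_load));
}
```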

Richard.

> Thanks again; I will keep you posted.
> 
> Pan
> 
> 
> 
> ________________________________
> From: Andrew Stubbs <ams@codesourcery.com>
> Sent: Monday, February 13, 2023 19:00
> To: Richard Biener <rguenther@suse.de>; juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> Cc: Pan Li <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; kito.cheng <kito.cheng@sifive.com>; richard.sandiford@arm.com <richard.sandiford@arm.com>
> Subject: Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
> 
> I presume I've been CC'd on this conversation because weird vector
> architecture problems have happened to me before. :)
> 
> However, I'm not sure I can help much because AMD GCN does not use
> BImode vectors at all. This is partly because loading boolean values
> into a GCN vector would have 31 padding bits for each lane, but mostly
> because the result of comparison instructions is written to a DImode
> scalar register, not into a vector.
> 
> I did experiment, long ago, with having a V64BImode that could be stored
> in scalar registers (tieable with DImode), but there wasn't any great
> advantage and it broke VECTOR_MODE_P in most other contexts.
> 
> It's possible to store truth values in vectors as integers, and there
> are some cases where we do so (SIMD clone mask arguments, for example),
> but that's mostly to smooth things over in the middle-end.
> 
> The problem with padding bits is something I do see: V64QImode has 24
> padding bits for each lane, in register. While there are instructions
> that will load and store QImode vectors correctly, without the padding,
> the backend still has to handle all the sign-extends, zero-extends, and
> truncates explicitly, because the middle-end and expand pass give no
> assistance with that for vectors (unlike scalars).
> 
> Andrew
> On 13/02/2023 08:07, Richard Biener via Gcc-patches wrote:
> > On Sat, 11 Feb 2023, juzhe.zhong@rivai.ai wrote:
> >
> >> Thanks for contributing this.
> >> Hi, Richard. Can you help us with this issue?
> >> In RVV, we have vbool8_t (VNx8BImode), vbool16_t (VNx4BImode), vbool32_t (VNx2BImode), vbool64_t (VNx1BImode)
> >> We use 1-bit masks, so each bool occupies 1 bit.
> >> According to RVV ISA, we adjust these modes as follows:
> >>
> >> VNx8BImode poly (8,8) NUNITS (each unit is a 1-bit mask)
> >> VNx4BImode poly (4,4) NUNITS (each unit is a 1-bit mask)
> >> VNx2BImode poly (2,2) NUNITS (each unit is a 1-bit mask)
> >> VNx1BImode poly (1,1) NUNITS (each unit is a 1-bit mask)
> >
> > So how's VNx1BImode laid out for N == 2?  Is that still a single
> > byte and two consecutive bits?  I suppose so.
> >
> > But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
> >
> > I'm not sure what GET_MODE_PRECISION of the vector mode itself
> > should be here, but then I wonder ...
> >
> >> If we query GET_MODE_BITSIZE or GET_MODE_NUNITS, the values differ between these modes.
> >> However, GET_MODE_SIZE is the same for all of them (poly (1,1)).
> >> This makes the modes tieable with each other and produces wrong code, since their bit sizes differ.
> >> Consider this case:
> >> #include "riscv_vector.h"
> >> void foo5_3 (int32_t * restrict in, int32_t * restrict out, size_t n, int cond)
> >> {
> >>    vint8m1_t v = *(vint8m1_t*)in;
> >>    *(vint8m1_t*)out = v;
> >>    vbool16_t v4 = *(vbool16_t *)in;
> >>    *(vbool16_t *)(out + 300) = v4;
> >>    vbool8_t v3 = *(vbool8_t*)in;
> >>    *(vbool8_t*)(out + 200) = v3;
> >> }
> >> The second vbool8_t load (vlm.v) is missing, since GCC emits "v3 = VIEW_CONVERT (vbool8_t) v4" in GIMPLE.
> >> We failed to fix it in the RISC-V backend. Can you help us with this? Thanks.
> >
> > ... why for the loads the "padding" is not loaded?  The above testcase
> > is probably more complicated than necessary as well?
> >
> > Thanks,
> > Richard.
> >
> >>
> >> juzhe.zhong@rivai.ai
> >>
> >> From: incarnation.p.lee
> >> Date: 2023-02-11 16:46
> >> To: gcc-patches
> >> CC: juzhe.zhong; kito.cheng; rguenther; Pan Li
> >> Subject: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
> >> From: Pan Li <incarnation.p.lee@outlook.com>
> >>
> >> Fix the bug for mode tieable of the rvv bool types. The vbool*_t
> >> modes cannot be tied, as the actual load/store size is determined by
> >> the vl. The mode size of the rvv bool types is also adjusted for the
> >> underlying optimization pass. The rvv bool type is vbool*_t, aka
> >> vbool1_t, vbool2_t, vbool4_t, vbool8_t, vbool16_t, vbool32_t, and
> >> vbool64_t.
> >>
> >> PR 108185
> >> PR 108654
> >>
> >> gcc/ChangeLog:
> >>
> >> * config/riscv/riscv-modes.def (ADJUST_BYTESIZE):
> >> * config/riscv/riscv.cc (riscv_v_adjust_bytesize):
> >> (riscv_modes_tieable_p):
> >> * config/riscv/riscv.h (riscv_v_adjust_bytesize):
> >> * machmode.h (VECTOR_BOOL_MODE_P):
> >> * tree-ssa-sccvn.cc (visit_reference_op_load):
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.target/riscv/pr108185-1.c: New test.
> >> * gcc.target/riscv/pr108185-2.c: New test.
> >> * gcc.target/riscv/pr108185-3.c: New test.
> >> * gcc.target/riscv/pr108185-4.c: New test.
> >> * gcc.target/riscv/pr108185-5.c: New test.
> >> * gcc.target/riscv/pr108185-6.c: New test.
> >> * gcc.target/riscv/pr108185-7.c: New test.
> >> * gcc.target/riscv/pr108185-8.c: New test.
> >>
> >> Signed-off-by: Pan Li <incarnation.p.lee@outlook.com>
> >> ---
> >> gcc/config/riscv/riscv-modes.def            | 14 ++--
> >> gcc/config/riscv/riscv.cc                   | 34 ++++++++-
> >> gcc/config/riscv/riscv.h                    |  2 +
> >> gcc/machmode.h                              |  3 +
> >> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> >> gcc/tree-ssa-sccvn.cc                       | 13 +++-
> >> 13 files changed, 608 insertions(+), 11 deletions(-)
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >>
> >> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> >> index d5305efa8a6..cc21d3c83a2 100644
> >> --- a/gcc/config/riscv/riscv-modes.def
> >> +++ b/gcc/config/riscv/riscv-modes.def
> >> @@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1);
> >> ADJUST_ALIGNMENT (VNx32BI, 1);
> >> ADJUST_ALIGNMENT (VNx64BI, 1);
> >> -ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
> >> +ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1));
> >> +ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1));
> >> +ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_bytesize (VNx4BImode, 1));
> >> +ADJUST_BYTESIZE (VNx8BI, riscv_v_adjust_bytesize (VNx8BImode, 1));
> >> +ADJUST_BYTESIZE (VNx16BI, riscv_v_adjust_bytesize (VNx16BImode, 2));
> >> +ADJUST_BYTESIZE (VNx32BI, riscv_v_adjust_bytesize (VNx32BImode, 4));
> >> +ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_bytesize (VNx64BImode, 8));
> >> /*
> >>      | Mode        | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> >> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> >> index 3b7804b7501..138c052e13c 100644
> >> --- a/gcc/config/riscv/riscv.cc
> >> +++ b/gcc/config/riscv/riscv.cc
> >> @@ -1003,6 +1003,27 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> >>     return scale;
> >> }
> >> +/* Called from ADJUST_BYTESIZE in riscv-modes.def.  Return the correct
> >> +   size in bytes for the corresponding machine_mode.  */
> >> +
> >> +poly_int64
> >> +riscv_v_adjust_bytesize (machine_mode mode, int scale)
> >> +{
> >> +  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
> >> +
> >> +  if (riscv_v_ext_vector_mode_p (mode))
> >> +    {
> >> +      poly_uint16 mode_size = GET_MODE_SIZE (mode);
> >> +
> >> +      if (known_lt (mode_size, BYTES_PER_RISCV_VECTOR))
> >> + return mode_size;
> >> +      else
> >> + return BYTES_PER_RISCV_VECTOR;
> >> +    }
> >> +
> >> +  return scale;
> >> +}
> >> +
> >> /* Return true if X is a valid address for machine mode MODE.  If it is,
> >>      fill in INFO appropriately.  STRICT_P is true if REG_OK_STRICT is in
> >>      effect.  */
> >> @@ -5807,11 +5828,22 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> >> /* Implement TARGET_MODES_TIEABLE_P.
> >>      Don't allow floating-point modes to be tied, since type punning of
> >> -   single-precision and double-precision is implementation defined.  */
> >> +   single-precision and double-precision is implementation defined.
> >> +
> >> +   Don't allow different vbool*_t modes to be tied, since the type
> >> +   size is determined by vl.  */
> >> static bool
> >> riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
> >> {
> >> +  if (riscv_v_ext_vector_mode_p (mode1) && riscv_v_ext_vector_mode_p (mode2))
> >> +    {
> >> +      if (VECTOR_BOOL_MODE_P (mode1) || VECTOR_BOOL_MODE_P (mode2))
> >> + return false;
> >> +
> >> + return known_eq (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2));
> >> +    }
> >> +
> >>     return (mode1 == mode2
> >>    || !(GET_MODE_CLASS (mode1) == MODE_FLOAT
> >>         && GET_MODE_CLASS (mode2) == MODE_FLOAT));
> >> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> >> index faffd5a77fe..f857223338c 100644
> >> --- a/gcc/config/riscv/riscv.h
> >> +++ b/gcc/config/riscv/riscv.h
> >> @@ -1028,6 +1028,8 @@ extern unsigned riscv_stack_boundary;
> >> extern unsigned riscv_bytes_per_vector_chunk;
> >> extern poly_uint16 riscv_vector_chunks;
> >> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> >> +extern poly_int64 riscv_v_adjust_bytesize (machine_mode mode, int scale);
> >> +
> >> /* The number of bits and bytes in a RVV vector.  */
> >> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> >> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> >> diff --git a/gcc/machmode.h b/gcc/machmode.h
> >> index f1865c1ef42..6720472f2c9 100644
> >> --- a/gcc/machmode.h
> >> +++ b/gcc/machmode.h
> >> @@ -242,6 +242,9 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
> >>      || CLASS == MODE_ACCUM                      \
> >>      || CLASS == MODE_UACCUM)
> >> +/* Nonzero if MODE is a vector bool mode.  */
> >> +#define VECTOR_BOOL_MODE_P(MODE) (GET_MODE_CLASS(MODE) == MODE_VECTOR_BOOL)
> >> +
> >> /* An optional T (i.e. a T or nothing), where T is some form of mode class.  */
> >> template<typename T>
> >> class opt_mode
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >> new file mode 100644
> >> index 00000000000..c3d0b10271a
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool1_t v1 = *(vbool1_t*)in;
> >> +    vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> +    *(vbool1_t*)(out + 100) = v1;
> >> +    *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool1_t v1 = *(vbool1_t*)in;
> >> +    vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> +    *(vbool1_t*)(out + 100) = v1;
> >> +    *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool1_t v1 = *(vbool1_t*)in;
> >> +    vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> +    *(vbool1_t*)(out + 100) = v1;
> >> +    *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool1_t v1 = *(vbool1_t*)in;
> >> +    vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> +    *(vbool1_t*)(out + 100) = v1;
> >> +    *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool1_t v1 = *(vbool1_t*)in;
> >> +    vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> +    *(vbool1_t*)(out + 100) = v1;
> >> +    *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool1_t v1 = *(vbool1_t*)in;
> >> +    vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> +    *(vbool1_t*)(out + 100) = v1;
> >> +    *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >> new file mode 100644
> >> index 00000000000..bd13ba916da
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool2_t v1 = *(vbool2_t*)in;
> >> +    vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> +    *(vbool2_t*)(out + 100) = v1;
> >> +    *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool2_t v1 = *(vbool2_t*)in;
> >> +    vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> +    *(vbool2_t*)(out + 100) = v1;
> >> +    *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool2_t v1 = *(vbool2_t*)in;
> >> +    vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> +    *(vbool2_t*)(out + 100) = v1;
> >> +    *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool2_t v1 = *(vbool2_t*)in;
> >> +    vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> +    *(vbool2_t*)(out + 100) = v1;
> >> +    *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool2_t v1 = *(vbool2_t*)in;
> >> +    vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> +    *(vbool2_t*)(out + 100) = v1;
> >> +    *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool2_t v1 = *(vbool2_t*)in;
> >> +    vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> +    *(vbool2_t*)(out + 100) = v1;
> >> +    *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >> new file mode 100644
> >> index 00000000000..99928f7b1cc
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool4_t v1 = *(vbool4_t*)in;
> >> +    vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> +    *(vbool4_t*)(out + 100) = v1;
> >> +    *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool4_t v1 = *(vbool4_t*)in;
> >> +    vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> +    *(vbool4_t*)(out + 100) = v1;
> >> +    *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool4_t v1 = *(vbool4_t*)in;
> >> +    vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> +    *(vbool4_t*)(out + 100) = v1;
> >> +    *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool4_t v1 = *(vbool4_t*)in;
> >> +    vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> +    *(vbool4_t*)(out + 100) = v1;
> >> +    *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool4_t v1 = *(vbool4_t*)in;
> >> +    vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> +    *(vbool4_t*)(out + 100) = v1;
> >> +    *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool4_t v1 = *(vbool4_t*)in;
> >> +    vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> +    *(vbool4_t*)(out + 100) = v1;
> >> +    *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >> new file mode 100644
> >> index 00000000000..e70284fada8
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool8_t v1 = *(vbool8_t*)in;
> >> +    vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> +    *(vbool8_t*)(out + 100) = v1;
> >> +    *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool8_t v1 = *(vbool8_t*)in;
> >> +    vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> +    *(vbool8_t*)(out + 100) = v1;
> >> +    *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool8_t v1 = *(vbool8_t*)in;
> >> +    vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> +    *(vbool8_t*)(out + 100) = v1;
> >> +    *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool8_t v1 = *(vbool8_t*)in;
> >> +    vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> +    *(vbool8_t*)(out + 100) = v1;
> >> +    *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool8_t v1 = *(vbool8_t*)in;
> >> +    vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> +    *(vbool8_t*)(out + 100) = v1;
> >> +    *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool8_t v1 = *(vbool8_t*)in;
> >> +    vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> +    *(vbool8_t*)(out + 100) = v1;
> >> +    *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >> new file mode 100644
> >> index 00000000000..575a7842cdf
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool16_t v1 = *(vbool16_t*)in;
> >> +    vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> +    *(vbool16_t*)(out + 100) = v1;
> >> +    *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool16_t v1 = *(vbool16_t*)in;
> >> +    vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> +    *(vbool16_t*)(out + 100) = v1;
> >> +    *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool16_t v1 = *(vbool16_t*)in;
> >> +    vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> +    *(vbool16_t*)(out + 100) = v1;
> >> +    *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool16_t v1 = *(vbool16_t*)in;
> >> +    vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> +    *(vbool16_t*)(out + 100) = v1;
> >> +    *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool16_t v1 = *(vbool16_t*)in;
> >> +    vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> +    *(vbool16_t*)(out + 100) = v1;
> >> +    *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool16_t v1 = *(vbool16_t*)in;
> >> +    vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> +    *(vbool16_t*)(out + 100) = v1;
> >> +    *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >> new file mode 100644
> >> index 00000000000..95a11d37016
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool32_t v1 = *(vbool32_t*)in;
> >> +    vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> +    *(vbool32_t*)(out + 100) = v1;
> >> +    *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool32_t v1 = *(vbool32_t*)in;
> >> +    vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> +    *(vbool32_t*)(out + 100) = v1;
> >> +    *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool32_t v1 = *(vbool32_t*)in;
> >> +    vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> +    *(vbool32_t*)(out + 100) = v1;
> >> +    *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool32_t v1 = *(vbool32_t*)in;
> >> +    vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> +    *(vbool32_t*)(out + 100) = v1;
> >> +    *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool32_t v1 = *(vbool32_t*)in;
> >> +    vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> +    *(vbool32_t*)(out + 100) = v1;
> >> +    *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool32_t v1 = *(vbool32_t*)in;
> >> +    vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> +    *(vbool32_t*)(out + 100) = v1;
> >> +    *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >> new file mode 100644
> >> index 00000000000..8f6f0b11f09
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool64_t v1 = *(vbool64_t*)in;
> >> +    vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> +    *(vbool64_t*)(out + 100) = v1;
> >> +    *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool64_t v1 = *(vbool64_t*)in;
> >> +    vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> +    *(vbool64_t*)(out + 100) = v1;
> >> +    *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool64_t v1 = *(vbool64_t*)in;
> >> +    vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> +    *(vbool64_t*)(out + 100) = v1;
> >> +    *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool64_t v1 = *(vbool64_t*)in;
> >> +    vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> +    *(vbool64_t*)(out + 100) = v1;
> >> +    *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool64_t v1 = *(vbool64_t*)in;
> >> +    vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> +    *(vbool64_t*)(out + 100) = v1;
> >> +    *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool64_t v1 = *(vbool64_t*)in;
> >> +    vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> +    *(vbool64_t*)(out + 100) = v1;
> >> +    *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >> new file mode 100644
> >> index 00000000000..d96959dd064
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >> @@ -0,0 +1,77 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool1_t v1 = *(vbool1_t*)in;
> >> +    vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> +    *(vbool1_t*)(out + 100) = v1;
> >> +    *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool2_t v1 = *(vbool2_t*)in;
> >> +    vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> +    *(vbool2_t*)(out + 100) = v1;
> >> +    *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool4_t v1 = *(vbool4_t*)in;
> >> +    vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> +    *(vbool4_t*)(out + 100) = v1;
> >> +    *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool8_t v1 = *(vbool8_t*)in;
> >> +    vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> +    *(vbool8_t*)(out + 100) = v1;
> >> +    *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool16_t v1 = *(vbool16_t*)in;
> >> +    vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> +    *(vbool16_t*)(out + 100) = v1;
> >> +    *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool32_t v1 = *(vbool32_t*)in;
> >> +    vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> +    *(vbool32_t*)(out + 100) = v1;
> >> +    *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> +    vbool64_t v1 = *(vbool64_t*)in;
> >> +    vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> +    *(vbool64_t*)(out + 100) = v1;
> >> +    *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> >> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> >> index 028bedbc9a0..19fdba8cfa2 100644
> >> --- a/gcc/tree-ssa-sccvn.cc
> >> +++ b/gcc/tree-ssa-sccvn.cc
> >> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
> >> #include "gimple-fold.h"
> >> #include "tree-eh.h"
> >> #include "gimplify.h"
> >> +#include "target.h"
> >> #include "flags.h"
> >> #include "dojump.h"
> >> #include "explow.h"
> >> @@ -5657,10 +5658,16 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
> >>     if (result
> >>         && !useless_type_conversion_p (TREE_TYPE (result), TREE_TYPE (op)))
> >>       {
> >> +      machine_mode result_mode = TYPE_MODE (TREE_TYPE (result));
> >> +      machine_mode op_mode = TYPE_MODE (TREE_TYPE (op));
> >> +      poly_uint16 result_mode_precision = GET_MODE_PRECISION (result_mode);
> >> +      poly_uint16 op_mode_precision = GET_MODE_PRECISION (op_mode);
> >> +
> >>         /* Avoid the type punning in case the result mode has padding where
> >> -         the op we lookup has not.  */
> >> -      if (maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
> >> -                    GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)))))
> >> +         the op we lookup has not.
> >> +         Avoid the type punning in case the target mode cannot be tied.  */
> >> +      if (maybe_lt (result_mode_precision, op_mode_precision)
> >> +          || !targetm.modes_tieable_p (result_mode, op_mode))
> >>           result = NULL_TREE;
> >>         else
> >>           {
> >>
> >
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)