I am not sure that changing the precision of the inner mode (BImode) is correct for RVV,
since by definition the 1-bit masks in the RVV mask layout are consecutive.
Maybe we can wait for Kito to answer this question?



juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2023-02-13 16:46
To: juzhe.zhong@rivai.ai
CC: incarnation.p.lee; gcc-patches; Kito.cheng; richard.sandiford; ams
Subject: Re: Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
On Mon, 13 Feb 2023, juzhe.zhong@rivai.ai wrote:
 
> >> But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
> Yes, I think so.
> 
> Let me explain RVV more clearly.
> Suppose we have a vector length of 64 bits on an RVV CPU.
> VNx1BI is exactly 1 bit.
> VNx2BI is exactly 2 consecutive bits.
> VNx4BI is exactly 4 consecutive bits.
> VNx8BI is exactly 8 consecutive bits.
> 
> For VNx1BI (vbool64_t), we load it with this asm:
> vsetvl e8mf8
> vlm.v
> 
> For VNx2BI (vbool32_t), we load it with this asm:
> vsetvl e8mf4
> vlm.v
> 
> For VNx4BI (vbool16_t), we load it with this asm:
> vsetvl e8mf2
> vlm.v
> 
> For VNx8BI (vbool8_t), we load it with this asm:
> vsetvl e8m1
> vlm.v
> 
> In case of this code sequence:
> vbool16_t v4 = *(vbool16_t *)in;
> vbool8_t v3 = *(vbool8_t*)in;
> 
> Since VNx4BI (vbool16_t) is smaller than VNx8BI (vbool8_t),
> we can't just use the data loaded as VNx4BI (vbool16_t) for VNx8BI (vbool8_t).
> But we can use the data loaded as VNx8BI (vbool8_t) for VNx4BI (vbool16_t).
>
> In this example, GCC thinks the load for vbool8_t v3 can be replaced by vbool16_t v4, which is already loaded.
> That is incorrect for RVV.
 
OK, so the 'vlm.v' instruction will zero the padding bits (according to
vsetvl), but I doubt the memory subsystem will load less than a whole byte.
 
Then GET_MODE_PRECISION of VNx4BI has to be smaller than 
GET_MODE_PRECISION of VNx8BI, even if their size is the same.
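
To make the size/precision distinction concrete, here is a minimal sketch in C,
assuming the 64-bit vector length from the example above and one consecutive bit
per mask element.  The names mask_precision_bits, mask_size_bytes and
check_mask_modes are made up for illustration and are not GCC functions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model, not GCC code: one mask bit per element, packed
   consecutively, on a machine whose vector length is 64 bits.  */
struct mask_mode
{
  const char *name;
  unsigned nunits; /* number of 1-bit elements */
};

/* Precision in bits: every element contributes exactly one bit.  */
unsigned
mask_precision_bits (unsigned nunits)
{
  return nunits;
}

/* Size in bytes: the precision rounded up to a whole byte.  */
unsigned
mask_size_bytes (unsigned nunits)
{
  return (mask_precision_bits (nunits) + 7) / 8;
}

/* VNx4BI and VNx8BI have the same size but different precision.  */
void
check_mask_modes (void)
{
  assert (mask_size_bytes (4) == mask_size_bytes (8));
  assert (mask_precision_bits (4) < mask_precision_bits (8));
}

/* Print precision and size for the four modes discussed above.  */
void
print_mask_modes (void)
{
  static const struct mask_mode modes[] = {
    { "VNx1BI", 1 }, { "VNx2BI", 2 }, { "VNx4BI", 4 }, { "VNx8BI", 8 },
  };
  for (unsigned i = 0; i < sizeof modes / sizeof modes[0]; i++)
    printf ("%s: precision %u bits, size %u byte(s)\n", modes[i].name,
            mask_precision_bits (modes[i].nunits),
            mask_size_bytes (modes[i].nunits));
}
```

Under these assumptions all four modes round up to one byte, while their
precisions are 1, 2, 4 and 8 bits, so comparing sizes alone cannot tell them
apart.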
 
I suppose that ADJUST_NUNITS should be able to do this, but then we
have in aarch64-modes.def
 
VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
 
ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
ADJUST_NUNITS (VNx4BI, aarch64_sve_vg * 2);
ADJUST_NUNITS (VNx2BI, aarch64_sve_vg);
 
so all VNxMBI modes are 2 bytes in size and their component is always
BImode, but IIRC the elements of VNx2BImode occupy 4 bits each?
 
For riscv we have
 
VECTOR_BOOL_MODE (VNx1BI, 1, BI, 1);
ADJUST_NUNITS (VNx1BI, riscv_v_adjust_nunits (VNx1BImode, 1));
 
so here it would be natural to set the mode precision to
a poly-int computed by the component precision times nunits?  OTOH
we have to look at the component precision vs. size as well and
 
/* Single bit mode used for booleans.  */ 
BOOL_MODE (BI, 1, 1); 
 
BOOL_MODE is not documented, but its arguments are precision and size, so
BImode has a size of 1.  That makes VECTOR_BOOL_MODE very special since
the layout isn't derived from the component mode.  Deriving the
layout from the precision would make aarch64 incorrect and
would need BI2 and BI4 modes at least.
 
Adding a parameter to ADJUST_NUNITS might be the way to go instead,
specifying the number of bits in a component?
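
As a rough illustration of what a per-component bit count would express, the
following C sketch derives it from the numbers quoted above.  The helper names
are hypothetical and nothing here reflects how genmodes is implemented; the
aarch64 figures are simply the 2-byte size divided by the unit count, and the
riscv figures assume a 64-bit vector length with one bit per component.

```c
#include <assert.h>

/* Hypothetical helper: bits available to each component when SIZE_BYTES
   bytes hold NUNITS components.  */
unsigned
component_bits_from_layout (unsigned size_bytes, unsigned nunits)
{
  return size_bytes * 8 / nunits;
}

/* Hypothetical helper: bits covered by a mode whose NUNITS components
   occupy COMPONENT_BITS bits each.  */
unsigned
layout_bits (unsigned nunits, unsigned component_bits)
{
  return nunits * component_bits;
}

void
check_component_bits (void)
{
  /* aarch64 definitions quoted above: every mode is 2 bytes, so the
     per-component share grows as the unit count shrinks.  */
  assert (component_bits_from_layout (2, 16) == 1);
  assert (component_bits_from_layout (2, 8) == 2);
  assert (component_bits_from_layout (2, 4) == 4);
  assert (component_bits_from_layout (2, 2) == 8);

  /* Packed riscv layout: one bit per component, so the covered bits
     equal the unit count and differ between VNx4BI and VNx8BI.  */
  assert (layout_bits (4, 1) == 4);
  assert (layout_bits (8, 1) == 8);
}
```

With an explicit component width the aarch64 modes all cover 16 bits, while the
packed riscv modes cover 1, 2, 4 and 8 bits, without needing separate BI2 or BI4
component modes.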
 
Richard.
 
 
> Maybe @kito can give us more information about the RVV ISA if I haven't explained it clearly.
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2023-02-13 16:07
> To: juzhe.zhong
> CC: Pan Li; gcc-patches; kito.cheng; richard.sandiford; ams
> Subject: Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
> On Sat, 11 Feb 2023, juzhe.zhong@rivai.ai wrote:
>  
> > Thanks for contributing this.
> > Hi, Richard. Can you help us with this issue?
> > In RVV, we have vbool8_t (VNx8BImode), vbool16_t (VNx4BImode), vbool32_t (VNx2BImode), vbool64_t (VNx1BImode)
> > We use a 1-bit mask, i.e. each bool occupies 1 bit.
> > According to the RVV ISA, we adjust these modes as follows:
> > 
> > VNx8BImode poly (8,8) NUNITS (each unit is a 1-bit mask)
> > VNx4BImode poly (4,4) NUNITS (each unit is a 1-bit mask)
> > VNx2BImode poly (2,2) NUNITS (each unit is a 1-bit mask)
> > VNx1BImode poly (1,1) NUNITS (each unit is a 1-bit mask)
>  
> So how's VNx1BImode laid out for N == 2?  Is that still a single
> byte and two consecutive bits?  I suppose so.
>  
> But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
>  
> I'm not sure what GET_MODE_PRECISION of the vector mode itself
> should be here, but then I wonder ...
>  
> > If we query GET_MODE_BITSIZE or GET_MODE_NUNITS, the values differ between these modes.
> > However, GET_MODE_SIZE of these modes is the same (poly (1,1)).
> > This ties the modes together and produces wrong code, since their bit sizes differ.
> > Consider this case:
> > #include "riscv_vector.h"
> > void foo5_3 (int32_t * restrict in, int32_t * restrict out, size_t n, int cond)
> > {
> >   vint8m1_t v = *(vint8m1_t*)in;
> >   *(vint8m1_t*)out = v;
> >   vbool16_t v4 = *(vbool16_t *)in;
> >   *(vbool16_t *)(out + 300) = v4;
> >   vbool8_t v3 = *(vbool8_t*)in;
> >   *(vbool8_t*)(out + 200) = v3;
> > }
> > The second vbool8_t load (vlm.v) is missing, since GCC gives "v3 = VIEW_CONVERT (vbool8_t) v4" in gimple.
> > We failed to fix it in the RISC-V backend. Can you help us with this? Thanks.
>  
> ... why for the loads the "padding" is not loaded?  The above testcase
> is probably more complicated than necessary as well?
>  
> Thanks,
> Richard.
> >
> > juzhe.zhong@rivai.ai
> >  
> > From: incarnation.p.lee
> > Date: 2023-02-11 16:46
> > To: gcc-patches
> > CC: juzhe.zhong; kito.cheng; rguenther; Pan Li
> > Subject: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
> > From: Pan Li <incarnation.p.lee@outlook.com>
> >  
> > Fix the mode-tieable bug for the rvv bool types. The vbool*_t modes
> > cannot be tied because the actual load/store size is determined by
> > the vl. The mode sizes of the rvv bool types are also adjusted for the
> > underlying optimization pass. The rvv bool types are vbool*_t, i.e.
> > vbool1_t, vbool2_t, vbool4_t, vbool8_t, vbool16_t, vbool32_t, and
> > vbool64_t.
> >  
> > PR 108185
> > PR 108654
> >  
> > gcc/ChangeLog:
> >  
> > * config/riscv/riscv-modes.def (ADJUST_BYTESIZE):
> > * config/riscv/riscv.cc (riscv_v_adjust_bytesize):
> > (riscv_modes_tieable_p):
> > * config/riscv/riscv.h (riscv_v_adjust_bytesize):
> > * machmode.h (VECTOR_BOOL_MODE_P):
> > * tree-ssa-sccvn.cc (visit_reference_op_load):
> >  
> > gcc/testsuite/ChangeLog:
> >  
> > * gcc.target/riscv/pr108185-1.c: New test.
> > * gcc.target/riscv/pr108185-2.c: New test.
> > * gcc.target/riscv/pr108185-3.c: New test.
> > * gcc.target/riscv/pr108185-4.c: New test.
> > * gcc.target/riscv/pr108185-5.c: New test.
> > * gcc.target/riscv/pr108185-6.c: New test.
> > * gcc.target/riscv/pr108185-7.c: New test.
> > * gcc.target/riscv/pr108185-8.c: New test.
> >  
> > Signed-off-by: Pan Li <incarnation.p.lee@outlook.com>
> > ---
> > gcc/config/riscv/riscv-modes.def            | 14 ++--
> > gcc/config/riscv/riscv.cc                   | 34 ++++++++-
> > gcc/config/riscv/riscv.h                    |  2 +
> > gcc/machmode.h                              |  3 +
> > gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> > gcc/tree-ssa-sccvn.cc                       | 13 +++-
> > 13 files changed, 608 insertions(+), 11 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >  
> > diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> > index d5305efa8a6..cc21d3c83a2 100644
> > --- a/gcc/config/riscv/riscv-modes.def
> > +++ b/gcc/config/riscv/riscv-modes.def
> > @@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1);
> > ADJUST_ALIGNMENT (VNx32BI, 1);
> > ADJUST_ALIGNMENT (VNx64BI, 1);
> > -ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > -ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > -ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > -ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > -ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > -ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > -ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
> > +ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1));
> > +ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1));
> > +ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_bytesize (VNx4BImode, 1));
> > +ADJUST_BYTESIZE (VNx8BI, riscv_v_adjust_bytesize (VNx8BImode, 1));
> > +ADJUST_BYTESIZE (VNx16BI, riscv_v_adjust_bytesize (VNx16BImode, 2));
> > +ADJUST_BYTESIZE (VNx32BI, riscv_v_adjust_bytesize (VNx32BImode, 4));
> > +ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_bytesize (VNx64BImode, 8));
> > /*
> >     | Mode        | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 3b7804b7501..138c052e13c 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -1003,6 +1003,27 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> >    return scale;
> > }
> > +/* Call from ADJUST_BYTESIZE in riscv-modes.def.  Return the correct
> > +   BYTES size for corresponding machine_mode.  */
> > +
> > +poly_int64
> > +riscv_v_adjust_bytesize (machine_mode mode, int scale)
> > +{
> > +  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
> > +
> > +  if (riscv_v_ext_vector_mode_p (mode))
> > +    {
> > +      poly_uint16 mode_size = GET_MODE_SIZE (mode);
> > +
> > +      if (known_lt (mode_size, BYTES_PER_RISCV_VECTOR))
> > +        return mode_size;
> > +      else
> > +        return BYTES_PER_RISCV_VECTOR;
> > +    }
> > +
> > +  return scale;
> > +}
> > +
> > /* Return true if X is a valid address for machine mode MODE.  If it is,
> >     fill in INFO appropriately.  STRICT_P is true if REG_OK_STRICT is in
> >     effect.  */
> > @@ -5807,11 +5828,22 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> > /* Implement TARGET_MODES_TIEABLE_P.
> >     Don't allow floating-point modes to be tied, since type punning of
> > -   single-precision and double-precision is implementation defined.  */
> > +   single-precision and double-precision is implementation defined.
> > +
> > +   Don't allow different vbool*_t modes to be tied, since the type
> > +   size is determinated by vl.  */
> > static bool
> > riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
> > {
> > +  if (riscv_v_ext_vector_mode_p (mode1) && riscv_v_ext_vector_mode_p (mode2))
> > +    {
> > +      if (VECTOR_BOOL_MODE_P (mode1) || VECTOR_BOOL_MODE_P (mode2))
> > + return false;
> > +
> > + return known_eq (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2));
> > +    }
> > +
> >    return (mode1 == mode2
> >   || !(GET_MODE_CLASS (mode1) == MODE_FLOAT
> >        && GET_MODE_CLASS (mode2) == MODE_FLOAT));
> > diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> > index faffd5a77fe..f857223338c 100644
> > --- a/gcc/config/riscv/riscv.h
> > +++ b/gcc/config/riscv/riscv.h
> > @@ -1028,6 +1028,8 @@ extern unsigned riscv_stack_boundary;
> > extern unsigned riscv_bytes_per_vector_chunk;
> > extern poly_uint16 riscv_vector_chunks;
> > extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> > +extern poly_int64 riscv_v_adjust_bytesize (machine_mode mode, int scale);
> > +
> > /* The number of bits and bytes in a RVV vector.  */
> > #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> > #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> > diff --git a/gcc/machmode.h b/gcc/machmode.h
> > index f1865c1ef42..6720472f2c9 100644
> > --- a/gcc/machmode.h
> > +++ b/gcc/machmode.h
> > @@ -242,6 +242,9 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
> >     || CLASS == MODE_ACCUM                      \
> >     || CLASS == MODE_UACCUM)
> > +/* Nonzero if MODE is an vector bool mode.  */
> > +#define VECTOR_BOOL_MODE_P(MODE) (GET_MODE_CLASS(MODE) == MODE_VECTOR_BOOL)
> > +
> > /* An optional T (i.e. a T or nothing), where T is some form of mode class.  */
> > template<typename T>
> > class opt_mode
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > new file mode 100644
> > index 00000000000..c3d0b10271a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool1_t v1 = *(vbool1_t*)in;
> > +    vbool2_t v2 = *(vbool2_t*)in;
> > +
> > +    *(vbool1_t*)(out + 100) = v1;
> > +    *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool1_t v1 = *(vbool1_t*)in;
> > +    vbool4_t v2 = *(vbool4_t*)in;
> > +
> > +    *(vbool1_t*)(out + 100) = v1;
> > +    *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool1_t v1 = *(vbool1_t*)in;
> > +    vbool8_t v2 = *(vbool8_t*)in;
> > +
> > +    *(vbool1_t*)(out + 100) = v1;
> > +    *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool1_t v1 = *(vbool1_t*)in;
> > +    vbool16_t v2 = *(vbool16_t*)in;
> > +
> > +    *(vbool1_t*)(out + 100) = v1;
> > +    *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool1_t v1 = *(vbool1_t*)in;
> > +    vbool32_t v2 = *(vbool32_t*)in;
> > +
> > +    *(vbool1_t*)(out + 100) = v1;
> > +    *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool1_t v1 = *(vbool1_t*)in;
> > +    vbool64_t v2 = *(vbool64_t*)in;
> > +
> > +    *(vbool1_t*)(out + 100) = v1;
> > +    *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > new file mode 100644
> > index 00000000000..bd13ba916da
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool2_t v1 = *(vbool2_t*)in;
> > +    vbool1_t v2 = *(vbool1_t*)in;
> > +
> > +    *(vbool2_t*)(out + 100) = v1;
> > +    *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool2_t v1 = *(vbool2_t*)in;
> > +    vbool4_t v2 = *(vbool4_t*)in;
> > +
> > +    *(vbool2_t*)(out + 100) = v1;
> > +    *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool2_t v1 = *(vbool2_t*)in;
> > +    vbool8_t v2 = *(vbool8_t*)in;
> > +
> > +    *(vbool2_t*)(out + 100) = v1;
> > +    *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool2_t v1 = *(vbool2_t*)in;
> > +    vbool16_t v2 = *(vbool16_t*)in;
> > +
> > +    *(vbool2_t*)(out + 100) = v1;
> > +    *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool2_t v1 = *(vbool2_t*)in;
> > +    vbool32_t v2 = *(vbool32_t*)in;
> > +
> > +    *(vbool2_t*)(out + 100) = v1;
> > +    *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool2_t v1 = *(vbool2_t*)in;
> > +    vbool64_t v2 = *(vbool64_t*)in;
> > +
> > +    *(vbool2_t*)(out + 100) = v1;
> > +    *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > new file mode 100644
> > index 00000000000..99928f7b1cc
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool4_t v1 = *(vbool4_t*)in;
> > +    vbool1_t v2 = *(vbool1_t*)in;
> > +
> > +    *(vbool4_t*)(out + 100) = v1;
> > +    *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool4_t v1 = *(vbool4_t*)in;
> > +    vbool2_t v2 = *(vbool2_t*)in;
> > +
> > +    *(vbool4_t*)(out + 100) = v1;
> > +    *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool4_t v1 = *(vbool4_t*)in;
> > +    vbool8_t v2 = *(vbool8_t*)in;
> > +
> > +    *(vbool4_t*)(out + 100) = v1;
> > +    *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool4_t v1 = *(vbool4_t*)in;
> > +    vbool16_t v2 = *(vbool16_t*)in;
> > +
> > +    *(vbool4_t*)(out + 100) = v1;
> > +    *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool4_t v1 = *(vbool4_t*)in;
> > +    vbool32_t v2 = *(vbool32_t*)in;
> > +
> > +    *(vbool4_t*)(out + 100) = v1;
> > +    *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool4_t v1 = *(vbool4_t*)in;
> > +    vbool64_t v2 = *(vbool64_t*)in;
> > +
> > +    *(vbool4_t*)(out + 100) = v1;
> > +    *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > new file mode 100644
> > index 00000000000..e70284fada8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool8_t v1 = *(vbool8_t*)in;
> > +    vbool1_t v2 = *(vbool1_t*)in;
> > +
> > +    *(vbool8_t*)(out + 100) = v1;
> > +    *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool8_t v1 = *(vbool8_t*)in;
> > +    vbool2_t v2 = *(vbool2_t*)in;
> > +
> > +    *(vbool8_t*)(out + 100) = v1;
> > +    *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool8_t v1 = *(vbool8_t*)in;
> > +    vbool4_t v2 = *(vbool4_t*)in;
> > +
> > +    *(vbool8_t*)(out + 100) = v1;
> > +    *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool8_t v1 = *(vbool8_t*)in;
> > +    vbool16_t v2 = *(vbool16_t*)in;
> > +
> > +    *(vbool8_t*)(out + 100) = v1;
> > +    *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool8_t v1 = *(vbool8_t*)in;
> > +    vbool32_t v2 = *(vbool32_t*)in;
> > +
> > +    *(vbool8_t*)(out + 100) = v1;
> > +    *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool8_t v1 = *(vbool8_t*)in;
> > +    vbool64_t v2 = *(vbool64_t*)in;
> > +
> > +    *(vbool8_t*)(out + 100) = v1;
> > +    *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > new file mode 100644
> > index 00000000000..575a7842cdf
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool16_t v1 = *(vbool16_t*)in;
> > +    vbool1_t v2 = *(vbool1_t*)in;
> > +
> > +    *(vbool16_t*)(out + 100) = v1;
> > +    *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool16_t v1 = *(vbool16_t*)in;
> > +    vbool2_t v2 = *(vbool2_t*)in;
> > +
> > +    *(vbool16_t*)(out + 100) = v1;
> > +    *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool16_t v1 = *(vbool16_t*)in;
> > +    vbool4_t v2 = *(vbool4_t*)in;
> > +
> > +    *(vbool16_t*)(out + 100) = v1;
> > +    *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool16_t v1 = *(vbool16_t*)in;
> > +    vbool8_t v2 = *(vbool8_t*)in;
> > +
> > +    *(vbool16_t*)(out + 100) = v1;
> > +    *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool16_t v1 = *(vbool16_t*)in;
> > +    vbool32_t v2 = *(vbool32_t*)in;
> > +
> > +    *(vbool16_t*)(out + 100) = v1;
> > +    *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool16_t v1 = *(vbool16_t*)in;
> > +    vbool64_t v2 = *(vbool64_t*)in;
> > +
> > +    *(vbool16_t*)(out + 100) = v1;
> > +    *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > new file mode 100644
> > index 00000000000..95a11d37016
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool32_t v1 = *(vbool32_t*)in;
> > +    vbool1_t v2 = *(vbool1_t*)in;
> > +
> > +    *(vbool32_t*)(out + 100) = v1;
> > +    *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool32_t v1 = *(vbool32_t*)in;
> > +    vbool2_t v2 = *(vbool2_t*)in;
> > +
> > +    *(vbool32_t*)(out + 100) = v1;
> > +    *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool32_t v1 = *(vbool32_t*)in;
> > +    vbool4_t v2 = *(vbool4_t*)in;
> > +
> > +    *(vbool32_t*)(out + 100) = v1;
> > +    *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool32_t v1 = *(vbool32_t*)in;
> > +    vbool8_t v2 = *(vbool8_t*)in;
> > +
> > +    *(vbool32_t*)(out + 100) = v1;
> > +    *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool32_t v1 = *(vbool32_t*)in;
> > +    vbool16_t v2 = *(vbool16_t*)in;
> > +
> > +    *(vbool32_t*)(out + 100) = v1;
> > +    *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool32_t v1 = *(vbool32_t*)in;
> > +    vbool64_t v2 = *(vbool64_t*)in;
> > +
> > +    *(vbool32_t*)(out + 100) = v1;
> > +    *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > new file mode 100644
> > index 00000000000..8f6f0b11f09
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool64_t v1 = *(vbool64_t*)in;
> > +    vbool1_t v2 = *(vbool1_t*)in;
> > +
> > +    *(vbool64_t*)(out + 100) = v1;
> > +    *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool64_t v1 = *(vbool64_t*)in;
> > +    vbool2_t v2 = *(vbool2_t*)in;
> > +
> > +    *(vbool64_t*)(out + 100) = v1;
> > +    *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool64_t v1 = *(vbool64_t*)in;
> > +    vbool4_t v2 = *(vbool4_t*)in;
> > +
> > +    *(vbool64_t*)(out + 100) = v1;
> > +    *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool64_t v1 = *(vbool64_t*)in;
> > +    vbool8_t v2 = *(vbool8_t*)in;
> > +
> > +    *(vbool64_t*)(out + 100) = v1;
> > +    *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool64_t v1 = *(vbool64_t*)in;
> > +    vbool16_t v2 = *(vbool16_t*)in;
> > +
> > +    *(vbool64_t*)(out + 100) = v1;
> > +    *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool64_t v1 = *(vbool64_t*)in;
> > +    vbool32_t v2 = *(vbool32_t*)in;
> > +
> > +    *(vbool64_t*)(out + 100) = v1;
> > +    *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > new file mode 100644
> > index 00000000000..d96959dd064
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > @@ -0,0 +1,77 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool1_t v1 = *(vbool1_t*)in;
> > +    vbool1_t v2 = *(vbool1_t*)in;
> > +
> > +    *(vbool1_t*)(out + 100) = v1;
> > +    *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool2_t v1 = *(vbool2_t*)in;
> > +    vbool2_t v2 = *(vbool2_t*)in;
> > +
> > +    *(vbool2_t*)(out + 100) = v1;
> > +    *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool4_t v1 = *(vbool4_t*)in;
> > +    vbool4_t v2 = *(vbool4_t*)in;
> > +
> > +    *(vbool4_t*)(out + 100) = v1;
> > +    *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool8_t v1 = *(vbool8_t*)in;
> > +    vbool8_t v2 = *(vbool8_t*)in;
> > +
> > +    *(vbool8_t*)(out + 100) = v1;
> > +    *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool16_t v1 = *(vbool16_t*)in;
> > +    vbool16_t v2 = *(vbool16_t*)in;
> > +
> > +    *(vbool16_t*)(out + 100) = v1;
> > +    *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool32_t v1 = *(vbool32_t*)in;
> > +    vbool32_t v2 = *(vbool32_t*)in;
> > +
> > +    *(vbool32_t*)(out + 100) = v1;
> > +    *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > +    vbool64_t v1 = *(vbool64_t*)in;
> > +    vbool64_t v2 = *(vbool64_t*)in;
> > +
> > +    *(vbool64_t*)(out + 100) = v1;
> > +    *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> > index 028bedbc9a0..19fdba8cfa2 100644
> > --- a/gcc/tree-ssa-sccvn.cc
> > +++ b/gcc/tree-ssa-sccvn.cc
> > @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
> > #include "gimple-fold.h"
> > #include "tree-eh.h"
> > #include "gimplify.h"
> > +#include "target.h"
> > #include "flags.h"
> > #include "dojump.h"
> > #include "explow.h"
> > @@ -5657,10 +5658,16 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
> >    if (result
> >        && !useless_type_conversion_p (TREE_TYPE (result), TREE_TYPE (op)))
> >      {
> > +      machine_mode result_mode = TYPE_MODE (TREE_TYPE (result));
> > +      machine_mode op_mode = TYPE_MODE (TREE_TYPE (op));
> > +      poly_uint16 result_mode_precision = GET_MODE_PRECISION (result_mode);
> > +      poly_uint16 op_mode_precision = GET_MODE_PRECISION (op_mode);
> > +
> >        /* Avoid the type punning in case the result mode has padding where
> > -         the op we lookup has not.  */
> > -      if (maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
> > -                    GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)))))
> > +         the op we lookup has not.
> > +         Avoid the type punning in case the target mode cannot be tied.  */
> > +      if (maybe_lt (result_mode_precision, op_mode_precision)
> > +          || !targetm.modes_tieable_p (result_mode, op_mode))
> >         result = NULL_TREE;
> >        else
> >         {
> > 
>  
> 
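For readers following the tree-ssa-sccvn.cc hunk above, the decision it
makes can be sketched outside of GCC roughly as below.  This is only a toy
model, not GCC code: struct toy_mode, toy_modes_tieable_p and
toy_can_reuse_load are hypothetical names, plain unsigned comparison stands
in for maybe_lt on poly_uint16, and the "tie class" field is an assumption
standing in for whatever targetm.modes_tieable_p answers on the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a machine mode.  It only carries the two
   properties the check looks at; it is not GCC's machine_mode.  */
struct toy_mode
{
  unsigned precision_bits;  /* Number of meaningful bits in the mode.  */
  unsigned tie_class;       /* Assumption: modes tie only within a class.  */
};

/* Toy replacement for targetm.modes_tieable_p.  */
static bool
toy_modes_tieable_p (const struct toy_mode *a, const struct toy_mode *b)
{
  return a->tie_class == b->tie_class;
}

/* Return true if a value already loaded in RESULT's mode may be reused
   for a load in OP's mode.  Mirrors the two conditions in the hunk:
   reject when RESULT is narrower than OP, or when the modes cannot be
   tied.  */
static bool
toy_can_reuse_load (const struct toy_mode *result, const struct toy_mode *op)
{
  assert (result != NULL && op != NULL);

  if (result->precision_bits < op->precision_bits)
    return false;
  if (!toy_modes_tieable_p (result, op))
    return false;
  return true;
}
```

If every mask mode is given its own tie class, a load in one mask mode is
never reused for a different one, in either direction, while two loads in
the same mode can still share one value.  That would match the vlm.v counts
the new tests scan for (12 where the two types differ, 7 where they are the
same).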
 
-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)