From mboxrd@z Thu Jan  1 00:00:00 1970
From: Richard Sandiford <richard.sandiford@arm.com>
To: "Li\, Pan2" <pan2.li@intel.com>
Mail-Followup-To: "Li\, Pan2" <pan2.li@intel.com>, =?utf-8?B?55u8IOadjg==?=
 <incarnation.p.lee@outlook.com>,  incarnation.p.lee--- via Gcc-patches
 <gcc-patches@gcc.gnu.org>,  "juzhe.zhong\@rivai.ai"
 <juzhe.zhong@rivai.ai>,  "kito.cheng\@sifive.com" <kito.cheng@sifive.com>,
  "rguenther\@suse.de" <rguenther@suse.de>, richard.sandiford@arm.com
Cc: =?utf-8?B?55u8IOadjg==?= <incarnation.p.lee@outlook.com>,
  incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>,
  "juzhe.zhong\@rivai.ai" <juzhe.zhong@rivai.ai>,  "kito.cheng\@sifive.com"
 <kito.cheng@sifive.com>,  "rguenther\@suse.de" <rguenther@suse.de>
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
References: <BYAPR04MB4824A720063FEE6C4F10776DA4A09@BYAPR04MB4824.namprd04.prod.outlook.com>
	<mptlekjtcco.fsf@arm.com>
	<BYAPR04MB48244794D1BF33A7F44DF8ADB7AF9@BYAPR04MB4824.namprd04.prod.outlook.com>
	<MW5PR11MB59083B1EAA01F0654526E760A9AC9@MW5PR11MB5908.namprd11.prod.outlook.com>
Date: Tue, 28 Feb 2023 09:50:12 +0000
In-Reply-To: <MW5PR11MB59083B1EAA01F0654526E760A9AC9@MW5PR11MB5908.namprd11.prod.outlook.com>
	(Pan2 Li's message of "Tue, 28 Feb 2023 02:27:07 +0000")
Message-ID: <mpt7cw2p19n.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
List-Id: <gcc-patches.gcc.gnu.org>

"Li, Pan2" <pan2.li@intel.com> writes:
> Hi Richard Sandiford,
>
> After some investigation, I am not sure whether it is possible to make this general without any changes to exact_div. We can add a method like the one below to get the unit poly_int for any N.
>
> template<unsigned int N, typename Ca>
> inline POLY_CONST_RESULT (N, Ca, Ca)
> normalize_to_unit (const poly_int_pod<N, Ca> &a)
> {
>   typedef POLY_CONST_COEFF (Ca, Ca) C;
>
>   poly_int<N, C> normalized = a;
>
>   if (normalized.is_constant())
>     normalized.coeffs[0] = 1;
>   else
>     for (unsigned int i = 0; i < N; i++)
>       POLY_SET_COEFF (C, normalized, i, 1);
>
>   return normalized;
> }
>
> And then adjust the genmodes like below to consume the unit poly.
>
>       printf ("    poly_uint16 unit_poly = "
>              "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
>       printf ("    if (known_lt (mode_precision[E_%smode], "
>              "unit_poly * BITS_PER_UNIT))\n", m->name);
>       printf ("      mode_size[E_%smode] = unit_poly;\n", m->name);
>
> I am not sure it is a good idea to introduce the above normalization code into exact_div, given that the comment on exact_div says “/* Return A / B, given that A is known to be a multiple of B. */”.

My point was that we have multiple ways of dividing poly_ints:

- exact_div, for when the caller knows that the result is always exact
- can_div_trunc_p, for truncating division (round towards 0)
- can_div_away_from_zero_p, for rounding away from 0
- ...

This is like how we have multiple division *_EXPRs on trees.
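
As a rough illustration of how these rounding behaviours differ, here is a sketch on plain signed integers rather than poly_ints. The helper names (toy_exact_div, toy_div_trunc, toy_div_away_from_zero) are hypothetical and are not GCC's poly-int.h API. They only mirror, for the constant case, the semantics listed above:

```cpp
#include <cassert>

// Hypothetical helpers (not GCC's poly-int.h API) that mirror, for plain
// signed integers, the three division flavours listed above.

// Exact division: the caller guarantees that B divides A.
long long toy_exact_div (long long a, long long b)
{
  assert (a % b == 0 && "A must be a multiple of B");
  return a / b;
}

// Truncating division: C++ integer division already rounds towards zero.
long long toy_div_trunc (long long a, long long b)
{
  return a / b;
}

// Division rounding away from zero: if there is a remainder, move the
// truncated quotient one step further from zero, in the direction given
// by the signs of A and B (not of the truncated quotient, which may be 0).
long long toy_div_away_from_zero (long long a, long long b)
{
  long long q = a / b;
  if (a % b != 0)
    q += ((a < 0) != (b < 0)) ? -1 : 1;
  return q;
}
```

For example, toy_div_trunc (-3, 8) is 0 while toy_div_away_from_zero (-3, 8) is -1. toy_exact_div (9, 8) fails its assertion because 9 is not a multiple of 8.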

Until now, exact_div was the correct choice for modes because vector
modes didn't have padding.  We're now changing that, so my suggestion
in the review was to change the division operation that we use.
Rather than use exact_div, we should now use can_div_away_from_zero_p,
which would have the effect of rounding the quotient up.
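
The following fixed-size sketch makes the padding point concrete. It treats each vbool*_t precision as a plain bit count, using the scale factors 1, 2, 4, ..., 64 from the patch description, and ignores the runtime-variable part of the vector length (a simplifying assumption). bytes_for_precision and kBitsPerUnit are hypothetical names, not GCC code:

```cpp
// Local stand-in for GCC's BITS_PER_UNIT (8 bits per byte on RISC-V).
constexpr unsigned kBitsPerUnit = 8;

// Byte size of a mode with the given precision, rounding the quotient up
// so that any padding bits still fit.  For a non-negative precision,
// rounding up is the same as rounding away from zero.
constexpr unsigned bytes_for_precision (unsigned precision_bits)
{
  return (precision_bits + kBitsPerUnit - 1) / kBitsPerUnit;
}

// The sub-byte precisions (1, 2 and 4 bits) are not multiples of
// kBitsPerUnit, so an exact division would be invalid for them; rounding
// up gives them one byte each.
static_assert (bytes_for_precision (1) == 1, "1 bit occupies one byte");
static_assert (bytes_for_precision (4) == 1, "4 bits occupy one byte");
static_assert (bytes_for_precision (8) == 1, "8 bits fill one byte");
static_assert (bytes_for_precision (16) == 2, "16 bits fill two bytes");
static_assert (bytes_for_precision (64) == 8, "64 bits fill eight bytes");
```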

Something like:

      if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
				     &mode_size[E_%smode]))
        gcc_unreachable ();

But this will require a new overload of can_div_away_from_zero_p, since
the existing one is for constant quotients rather than constant divisors.
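
Here is a minimal sketch of what such an overload would have to compute. It assumes a toy two-coefficient value c0 + c1 * x with unsigned coefficients (like the poly_uint16 values in the genmodes code) and a constant divisor. The struct toy_poly2 and the function toy_can_div_away_from_zero_p are hypothetical stand-ins, not the real poly_int_pod type or any actual GCC signature:

```cpp
// Toy stand-in for a two-coefficient poly_int: the value is c0 + c1 * x,
// where x >= 0 is only known at run time.  This is NOT GCC's poly_int
// class; it just models the arithmetic the new overload needs.
struct toy_poly2
{
  unsigned long long c0;  // constant coefficient
  unsigned long long c1;  // coefficient of the runtime indeterminate x
};

// Compute A / B rounded up (away from zero, since the coefficients are
// unsigned), for a compile-time constant divisor B.  Return false if the
// result cannot be expressed as c0 + c1 * x, i.e. if B is zero or the
// x-dependent part of A is not itself a multiple of B.
bool toy_can_div_away_from_zero_p (const toy_poly2 &a, unsigned long long b,
                                   toy_poly2 *quotient)
{
  if (b == 0 || a.c1 % b != 0)
    return false;
  // The x-dependent part divides exactly, so any remainder comes from the
  // constant coefficient alone; rounding that coefficient up rounds the
  // whole quotient up for every x.
  quotient->c0 = a.c0 / b + (a.c0 % b != 0 ? 1 : 0);
  quotient->c1 = a.c1 / b;
  return true;
}
```

Under this model, a precision of 1 + 1 * x bits is rejected rather than rounded, because its runtime-variable part is not a whole number of bytes.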

Thanks,
Richard

>
> Could you please share your opinion on this from an expert’s perspective? Thank you!
>
> Pan
>
> From: 盼 李 <incarnation.p.lee@outlook.com>
> Sent: Monday, February 27, 2023 11:13 PM
> To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> Never mind; I hope you had a good holiday.
>
> Thanks for pointing this out: the if part cannot handle poly_int with N > 2. As I understand it, we need to make it general for all N of poly_int.
>
> So I would like to double-check with you how to make it general. I suppose there will be a new function can_div_away_from_zero_p to replace the if (known_lt (,)) part in genmodes.cc, leaving exact_div unchanged (given the word "exact", I suppose we should not touch it), right? Then we still need a poly_int whose N coefficients are all 1 to return when can_div_away_from_zero_p is true.
>
> Thanks again for your professional suggestion. Have a nice day 😉!
>
> Pan
> ________________________________
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Monday, February 27, 2023 22:24
> To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> Sorry for the slow reply, been away for a couple of weeks.
>
> "incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>> From: Pan Li <pan2.li@intel.com>
>>
>>        Fix the rvv bool mode precision bug by adjusting the precision.
>>        The bit size of vbool*_t will be adjusted to
>>        [1, 2, 4, 8, 16, 32, 64] according to the RVV 1.0 ISA spec. The
>>        adjusted mode precision of vbool*_t will help the underlying passes
>>        make the right decisions for both correctness and optimization.
>>
>>        Given below sample code:
>>        void test_1(int8_t * restrict in, int8_t * restrict out)
>>        {
>>          vbool8_t v2 = *(vbool8_t*)in;
>>          vbool16_t v5 = *(vbool16_t*)in;
>>          *(vbool16_t*)(out + 200) = v5;
>>          *(vbool8_t*)(out + 100) = v2;
>>        }
>>
>>        Before the precision adjustment:
>>        addi    a4,a1,100
>>        vsetvli a5,zero,e8,m1,ta,ma
>>        addi    a1,a1,200
>>        vlm.v   v24,0(a0)
>>        vsm.v   v24,0(a4)
>>        // Need one vsetvli and vlm.v for correctness here.
>>        vsm.v   v24,0(a1)
>>
>>        After the precision adjustment:
>>        csrr    t0,vlenb
>>        slli    t1,t0,1
>>        csrr    a3,vlenb
>>        sub     sp,sp,t1
>>        slli    a4,a3,1
>>        add     a4,a4,sp
>>        sub     a3,a4,a3
>>        vsetvli a5,zero,e8,m1,ta,ma
>>        addi    a2,a1,200
>>        vlm.v   v24,0(a0)
>>        vsm.v   v24,0(a3)
>>        addi    a1,a1,100
>>        vsetvli a4,zero,e8,mf2,ta,ma
>>        csrr    t0,vlenb
>>        vlm.v   v25,0(a3)
>>        vsm.v   v25,0(a2)
>>        slli    t1,t0,1
>>        vsetvli a5,zero,e8,m1,ta,ma
>>        vsm.v   v24,0(a1)
>>        add     sp,sp,t1
>>        jr      ra
>>
>>        However, there may be some optimization opportunities after
>>        the mode precision adjustment. They can be taken care of in
>>        the RISC-V backend in separate PR(s).
>>
>>        PR 108185
>>        PR 108654
>>
>> gcc/ChangeLog:
>>
>>        * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>>        * config/riscv/riscv.cc (riscv_v_adjust_precision):
>>        * config/riscv/riscv.h (riscv_v_adjust_precision):
>>        * genmodes.cc (ADJUST_PRECISION):
>>        (emit_mode_adjustments):
>>
>> gcc/testsuite/ChangeLog:
>>
>>        * gcc.target/riscv/pr108185-1.c: New test.
>>        * gcc.target/riscv/pr108185-2.c: New test.
>>        * gcc.target/riscv/pr108185-3.c: New test.
>>        * gcc.target/riscv/pr108185-4.c: New test.
>>        * gcc.target/riscv/pr108185-5.c: New test.
>>        * gcc.target/riscv/pr108185-6.c: New test.
>>        * gcc.target/riscv/pr108185-7.c: New test.
>>        * gcc.target/riscv/pr108185-8.c: New test.
>>
>> Signed-off-by: Pan Li <pan2.li@intel.com>
>> ---
>>  gcc/config/riscv/riscv-modes.def            |  8 +++
>>  gcc/config/riscv/riscv.cc                   | 12 ++++
>>  gcc/config/riscv/riscv.h                    |  1 +
>>  gcc/genmodes.cc                             | 25 ++++++-
>>  gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
>>  gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
>>  gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
>>  gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
>>  gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
>>  gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
>>  gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
>>  gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
>>  12 files changed, 598 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>
>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
>> index d5305efa8a6..110bddce851 100644
>> --- a/gcc/config/riscv/riscv-modes.def
>> +++ b/gcc/config/riscv/riscv-modes.def
>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>  ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>  ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>>
>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
>> +
>>  /*
>>     | Mode        | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
>>     |             | LMUL        | SEW/LMUL    | LMUL        | SEW/LMUL    |
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index de3e1f903c7..cbe66c0e35b 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
>>    return scale;
>>  }
>>
>> +/* Called from ADJUST_PRECISION in riscv-modes.def.  Return the correct
>> +   precision for the corresponding machine_mode.  */
>> +
>> +poly_int64
>> +riscv_v_adjust_precision (machine_mode mode, int scale)
>> +{
>> +  if (riscv_v_ext_vector_mode_p (mode))
>> +    return riscv_vector_chunks * scale;
>> +
>> +  return scale;
>> +}
>> +
>>  /* Return true if X is a valid address for machine mode MODE.  If it is,
>>     fill in INFO appropriately.  STRICT_P is true if REG_OK_STRICT is in
>>     effect.  */
>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>> index 5bc7f2f467d..15b9317a8ce 100644
>> --- a/gcc/config/riscv/riscv.h
>> +++ b/gcc/config/riscv/riscv.h
>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
>>  extern unsigned riscv_bytes_per_vector_chunk;
>>  extern poly_uint16 riscv_vector_chunks;
>>  extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
>>  /* The number of bits and bytes in a RVV vector.  */
>>  #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
>>  #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
>> index 2d418f09aab..12f4e6335e6 100644
>> --- a/gcc/genmodes.cc
>> +++ b/gcc/genmodes.cc
>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
>>  static struct mode_adjust *adj_format;
>>  static struct mode_adjust *adj_ibit;
>>  static struct mode_adjust *adj_fbit;
>> +static struct mode_adjust *adj_precision;
>>
>>  /* Mode class operations.  */
>>  static enum mode_class
>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
>>  #define ADJUST_NUNITS(M, X)    _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
>>  #define ADJUST_BYTESIZE(M, X)  _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
>>  #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
>>  #define ADJUST_FLOAT_FORMAT(M, X)    _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
>>  #define ADJUST_IBIT(M, X)  _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
>>  #define ADJUST_FBIT(M, X)  _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
>>              " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
>>              m->name, m->name);
>>        printf ("    mode_precision[E_%smode] = ps * old_factor;\n", m->name);
>> -      printf ("    mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>> +      /* Normalize the size to 1 if precision is less than BITS_PER_UNIT.  */
>> +      printf ("    poly_uint16 size_one = "
>> +           "mode_precision[E_%smode].is_constant ()\n", m->name);
>> +      printf ("      ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
>
> Have you tried this on an x86_64 system?  I wouldn't expect it to work
> because of the:
>
>   STATIC_ASSERT (N >= 2);
>
> in the poly_uint16 constructor.
>
>> +      printf ("    if (known_lt (mode_precision[E_%smode], "
>> +           "size_one * BITS_PER_UNIT))\n", m->name);
>> +      printf ("      mode_size[E_%smode] =3D size_one;\n", m->name);
>> +      printf ("    else\n");
>> +      printf ("      mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>
> Now that the assert implicit in the original exact_div no longer holds,
> I think we should instead generalise it to can_div_away_from_zero_p
> (which will involve defining a new overload of can_div_away_from_zero_p).
> I think that will give the same result as the code above for the cases
> that the code above handles.  But it should be more general too.
>
> TBH, I'm still sceptical that this is all that is needed.  It seems
> unlikely that we've been so good at writing vector support code that
> we've made it work for precision < bitsize, despite that being an
> unsupported combination until now.  But I guess we can fix problems
> on a case-by-case basis.
>
> Thanks,
> Richard
>
>>              " BITS_PER_UNIT);\n", m->name, m->name);
>>        printf ("    mode_nunits[E_%smode] =3D ps;\n", m->name);
>>        printf ("    adjust_mode_mask (E_%smode);\n", m->name);
>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
>>      printf ("\n  /* %s:%d */\n  REAL_MODE_FORMAT (E_%smode) = %s;\n",
>>            a->file, a->line, a->mode->name, a->adjustment);
>>
>> +  /* Adjust precision to the actual bits size.  */
>> +  for (a = adj_precision; a; a = a->next)
>> +    switch (a->mode->cl)
>> +      {
>> +     case MODE_VECTOR_BOOL:
>> +       printf ("\n  /* %s:%d.  */\n  ps = %s;\n", a->file, a->line,
>> +               a->adjustment);
>> +       printf ("  mode_precision[E_%smode] = ps;\n", a->mode->name);
>> +       break;
>> +     default:
>> +       break;
>> +      }
>> +
>>    puts ("}");
>>  }
>>
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>> new file mode 100644
>> index 00000000000..e70960c5b6d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool1_t v1 = *(vbool1_t*)in;
>> +    vbool2_t v2 = *(vbool2_t*)in;
>> +
>> +    *(vbool1_t*)(out + 100) = v1;
>> +    *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool1_t v1 = *(vbool1_t*)in;
>> +    vbool4_t v2 = *(vbool4_t*)in;
>> +
>> +    *(vbool1_t*)(out + 100) = v1;
>> +    *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool1_t v1 = *(vbool1_t*)in;
>> +    vbool8_t v2 = *(vbool8_t*)in;
>> +
>> +    *(vbool1_t*)(out + 100) = v1;
>> +    *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool1_t v1 = *(vbool1_t*)in;
>> +    vbool16_t v2 = *(vbool16_t*)in;
>> +
>> +    *(vbool1_t*)(out + 100) = v1;
>> +    *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool1_t v1 = *(vbool1_t*)in;
>> +    vbool32_t v2 = *(vbool32_t*)in;
>> +
>> +    *(vbool1_t*)(out + 100) = v1;
>> +    *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool1_t v1 = *(vbool1_t*)in;
>> +    vbool64_t v2 = *(vbool64_t*)in;
>> +
>> +    *(vbool1_t*)(out + 100) = v1;
>> +    *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>> new file mode 100644
>> index 00000000000..dcc7a644a88
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool2_t v1 = *(vbool2_t*)in;
>> +    vbool1_t v2 = *(vbool1_t*)in;
>> +
>> +    *(vbool2_t*)(out + 100) = v1;
>> +    *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool2_t v1 = *(vbool2_t*)in;
>> +    vbool4_t v2 = *(vbool4_t*)in;
>> +
>> +    *(vbool2_t*)(out + 100) = v1;
>> +    *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool2_t v1 = *(vbool2_t*)in;
>> +    vbool8_t v2 = *(vbool8_t*)in;
>> +
>> +    *(vbool2_t*)(out + 100) = v1;
>> +    *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool2_t v1 = *(vbool2_t*)in;
>> +    vbool16_t v2 = *(vbool16_t*)in;
>> +
>> +    *(vbool2_t*)(out + 100) = v1;
>> +    *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool2_t v1 = *(vbool2_t*)in;
>> +    vbool32_t v2 = *(vbool32_t*)in;
>> +
>> +    *(vbool2_t*)(out + 100) = v1;
>> +    *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool2_t v1 = *(vbool2_t*)in;
>> +    vbool64_t v2 = *(vbool64_t*)in;
>> +
>> +    *(vbool2_t*)(out + 100) = v1;
>> +    *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>> new file mode 100644
>> index 00000000000..3af0513e006
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool4_t v1 = *(vbool4_t*)in;
>> +    vbool1_t v2 = *(vbool1_t*)in;
>> +
>> +    *(vbool4_t*)(out + 100) = v1;
>> +    *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool4_t v1 = *(vbool4_t*)in;
>> +    vbool2_t v2 = *(vbool2_t*)in;
>> +
>> +    *(vbool4_t*)(out + 100) = v1;
>> +    *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool4_t v1 = *(vbool4_t*)in;
>> +    vbool8_t v2 = *(vbool8_t*)in;
>> +
>> +    *(vbool4_t*)(out + 100) = v1;
>> +    *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool4_t v1 = *(vbool4_t*)in;
>> +    vbool16_t v2 = *(vbool16_t*)in;
>> +
>> +    *(vbool4_t*)(out + 100) = v1;
>> +    *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool4_t v1 = *(vbool4_t*)in;
>> +    vbool32_t v2 = *(vbool32_t*)in;
>> +
>> +    *(vbool4_t*)(out + 100) = v1;
>> +    *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool4_t v1 = *(vbool4_t*)in;
>> +    vbool64_t v2 = *(vbool64_t*)in;
>> +
>> +    *(vbool4_t*)(out + 100) = v1;
>> +    *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>> new file mode 100644
>> index 00000000000..ea3c360d756
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool8_t v1 = *(vbool8_t*)in;
>> +    vbool1_t v2 = *(vbool1_t*)in;
>> +
>> +    *(vbool8_t*)(out + 100) = v1;
>> +    *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool8_t v1 = *(vbool8_t*)in;
>> +    vbool2_t v2 = *(vbool2_t*)in;
>> +
>> +    *(vbool8_t*)(out + 100) = v1;
>> +    *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool8_t v1 = *(vbool8_t*)in;
>> +    vbool4_t v2 = *(vbool4_t*)in;
>> +
>> +    *(vbool8_t*)(out + 100) = v1;
>> +    *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool8_t v1 = *(vbool8_t*)in;
>> +    vbool16_t v2 = *(vbool16_t*)in;
>> +
>> +    *(vbool8_t*)(out + 100) = v1;
>> +    *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool8_t v1 = *(vbool8_t*)in;
>> +    vbool32_t v2 = *(vbool32_t*)in;
>> +
>> +    *(vbool8_t*)(out + 100) = v1;
>> +    *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool8_t v1 = *(vbool8_t*)in;
>> +    vbool64_t v2 = *(vbool64_t*)in;
>> +
>> +    *(vbool8_t*)(out + 100) = v1;
>> +    *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>> new file mode 100644
>> index 00000000000..9fc659d2402
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool16_t v1 = *(vbool16_t*)in;
>> +    vbool1_t v2 = *(vbool1_t*)in;
>> +
>> +    *(vbool16_t*)(out + 100) = v1;
>> +    *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool16_t v1 = *(vbool16_t*)in;
>> +    vbool2_t v2 = *(vbool2_t*)in;
>> +
>> +    *(vbool16_t*)(out + 100) = v1;
>> +    *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool16_t v1 = *(vbool16_t*)in;
>> +    vbool4_t v2 = *(vbool4_t*)in;
>> +
>> +    *(vbool16_t*)(out + 100) = v1;
>> +    *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool16_t v1 = *(vbool16_t*)in;
>> +    vbool8_t v2 = *(vbool8_t*)in;
>> +
>> +    *(vbool16_t*)(out + 100) = v1;
>> +    *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool16_t v1 = *(vbool16_t*)in;
>> +    vbool32_t v2 = *(vbool32_t*)in;
>> +
>> +    *(vbool16_t*)(out + 100) = v1;
>> +    *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool16_t v1 = *(vbool16_t*)in;
>> +    vbool64_t v2 = *(vbool64_t*)in;
>> +
>> +    *(vbool16_t*)(out + 100) = v1;
>> +    *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>> new file mode 100644
>> index 00000000000..98275e5267d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool32_t v1 = *(vbool32_t*)in;
>> +    vbool1_t v2 = *(vbool1_t*)in;
>> +
>> +    *(vbool32_t*)(out + 100) = v1;
>> +    *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool32_t v1 = *(vbool32_t*)in;
>> +    vbool2_t v2 = *(vbool2_t*)in;
>> +
>> +    *(vbool32_t*)(out + 100) = v1;
>> +    *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool32_t v1 = *(vbool32_t*)in;
>> +    vbool4_t v2 = *(vbool4_t*)in;
>> +
>> +    *(vbool32_t*)(out + 100) = v1;
>> +    *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool32_t v1 = *(vbool32_t*)in;
>> +    vbool8_t v2 = *(vbool8_t*)in;
>> +
>> +    *(vbool32_t*)(out + 100) = v1;
>> +    *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool32_t v1 = *(vbool32_t*)in;
>> +    vbool16_t v2 = *(vbool16_t*)in;
>> +
>> +    *(vbool32_t*)(out + 100) = v1;
>> +    *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool32_t v1 = *(vbool32_t*)in;
>> +    vbool64_t v2 = *(vbool64_t*)in;
>> +
>> +    *(vbool32_t*)(out + 100) = v1;
>> +    *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>> new file mode 100644
>> index 00000000000..8f6f0b11f09
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool64_t v1 = *(vbool64_t*)in;
>> +    vbool1_t v2 = *(vbool1_t*)in;
>> +
>> +    *(vbool64_t*)(out + 100) = v1;
>> +    *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool64_t v1 = *(vbool64_t*)in;
>> +    vbool2_t v2 = *(vbool2_t*)in;
>> +
>> +    *(vbool64_t*)(out + 100) = v1;
>> +    *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool64_t v1 = *(vbool64_t*)in;
>> +    vbool4_t v2 = *(vbool4_t*)in;
>> +
>> +    *(vbool64_t*)(out + 100) = v1;
>> +    *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool64_t v1 = *(vbool64_t*)in;
>> +    vbool8_t v2 = *(vbool8_t*)in;
>> +
>> +    *(vbool64_t*)(out + 100) = v1;
>> +    *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool64_t v1 = *(vbool64_t*)in;
>> +    vbool16_t v2 = *(vbool16_t*)in;
>> +
>> +    *(vbool64_t*)(out + 100) = v1;
>> +    *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool64_t v1 = *(vbool64_t*)in;
>> +    vbool32_t v2 = *(vbool32_t*)in;
>> +
>> +    *(vbool64_t*)(out + 100) = v1;
>> +    *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>> new file mode 100644
>> index 00000000000..d96959dd064
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>> @@ -0,0 +1,77 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool1_t v1 = *(vbool1_t*)in;
>> +    vbool1_t v2 = *(vbool1_t*)in;
>> +
>> +    *(vbool1_t*)(out + 100) = v1;
>> +    *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool2_t v1 = *(vbool2_t*)in;
>> +    vbool2_t v2 = *(vbool2_t*)in;
>> +
>> +    *(vbool2_t*)(out + 100) = v1;
>> +    *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool4_t v1 = *(vbool4_t*)in;
>> +    vbool4_t v2 = *(vbool4_t*)in;
>> +
>> +    *(vbool4_t*)(out + 100) = v1;
>> +    *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool8_t v1 = *(vbool8_t*)in;
>> +    vbool8_t v2 = *(vbool8_t*)in;
>> +
>> +    *(vbool8_t*)(out + 100) = v1;
>> +    *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool16_t v1 = *(vbool16_t*)in;
>> +    vbool16_t v2 = *(vbool16_t*)in;
>> +
>> +    *(vbool16_t*)(out + 100) = v1;
>> +    *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool32_t v1 = *(vbool32_t*)in;
>> +    vbool32_t v2 = *(vbool32_t*)in;
>> +
>> +    *(vbool32_t*)(out + 100) = v1;
>> +    *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> +    vbool64_t v1 = *(vbool64_t*)in;
>> +    vbool64_t v2 = *(vbool64_t*)in;
>> +
>> +    *(vbool64_t*)(out + 100) = v1;
>> +    *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */