From: Richard Sandiford <richard.sandiford@arm.com>
To: 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: 盼 李 <incarnation.p.lee@outlook.com>,
 juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>, pan2.li <pan2.li@intel.com>,
 Kito.cheng <kito.cheng@sifive.com>, rguenther <rguenther@suse.de>
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
References: <BYAPR04MB4824A720063FEE6C4F10776DA4A09@BYAPR04MB4824.namprd04.prod.outlook.com>
	<mptlekjtcco.fsf@arm.com>
	<BYAPR04MB48244794D1BF33A7F44DF8ADB7AF9@BYAPR04MB4824.namprd04.prod.outlook.com>
	<MW5PR11MB59083B1EAA01F0654526E760A9AC9@MW5PR11MB5908.namprd11.prod.outlook.com>
	<mpt7cw2p19n.fsf@arm.com>
	<BYAPR04MB4824B9342456A1F67D34F1EDB7AC9@BYAPR04MB4824.namprd04.prod.outlook.com>
	<MW5PR11MB5908CAEDD81742D2CDCB6CC2A9AC9@MW5PR11MB5908.namprd11.prod.outlook.com>
	<mptedq8ok67.fsf@arm.com>
	<6F429B8CF9C3B3EF+2023030118462950633032@rivai.ai>
	<BYAPR04MB48243B29E559720BE1582219B7AD9@BYAPR04MB4824.namprd04.prod.outlook.com>
	<mpt8rggohe9.fsf@arm.com>
	<BYAPR04MB482460A21FA9890421AEB2FEB7AD9@BYAPR04MB4824.namprd04.prod.outlook.com>
Date: Wed, 01 Mar 2023 12:03:02 +0000
In-Reply-To: <BYAPR04MB482460A21FA9890421AEB2FEB7AD9@BYAPR04MB4824.namprd04.prod.outlook.com>
	("盼 李 via Gcc-patches"'s message of "Wed, 1 Mar 2023 11:53:20 +0000")
Message-ID: <mpt1qm8of0p.fsf@arm.com>

盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> I just ran a test with the code below; the [0x4, 0x4] case comes from VNx4BI. Note that the mode size is unchanged.
>
> printf ("    can_div_away_from_zero_p (mode_precision[E_%smode], "
>   "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>
> VNx4BI Before precision [0x4, 0x4], size [0x4, 0]
> VNx4BI After precision [0x4, 0x4], size [0x4, 0]

Yeah, the result is expected to be unchanged if the division fails.
That's a deliberate part of the interface.  The can_* functions
should never be used without testing the boolean return value.
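A minimal scalar sketch of the contract described above, assuming nothing beyond standard C++ (the helper name `can_div_exact_p` is hypothetical, not a GCC function): on failure the output parameter keeps whatever value it had, so the caller must branch on the boolean result rather than read the output unconditionally.

```cpp
#include <cassert>

// Hypothetical helper following the can_* convention: return true and
// store A / B in *QUOTIENT only when B divides A exactly.  On failure,
// return false and leave *QUOTIENT untouched.
bool can_div_exact_p (long a, long b, long *quotient)
{
  assert (b != 0);
  if (a % b != 0)
    return false;
  *quotient = a / b;
  return true;
}
```

A caller would write `if (!can_div_exact_p (precision, 8, &size)) { /* handle failure */ }`, in the same way that genmodes would need to test the result of can_div_away_from_zero_p.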

But this precision of [4,4] for VNx4BI is different from what you
listed below.  Like I say, if the precision really is [4,4], and if
the size really is ceil([4,4]/8), then I don't think we can represent
that with current infrastructure.
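The non-representability can be checked numerically. The sketch below uses hypothetical helper names, with `x` standing for the runtime indeterminate X. It computes ceil((c0 + c1*X) / 8) over a range of X and tests whether a single pair (q0, q1) reproduces it as q0 + q1*X, which is what a two-coefficient poly_int would need. X = 0 and X = 1 fix the only candidate pair, so any later mismatch rules out every pair.

```cpp
#include <cassert>
#include <cstdint>

// ceil (n / d) for n >= 0 and d > 0.
int64_t ceil_div (int64_t n, int64_t d)
{
  assert (n >= 0 && d > 0);
  return (n + d - 1) / d;
}

// Bytes needed to hold a precision of C0 + C1 * X bits.
int64_t bytes_for (int64_t c0, int64_t c1, int64_t x)
{
  return ceil_div (c0 + c1 * x, 8);
}

// Return true if bytes_for (C0, C1, X) equals Q0 + Q1 * X for every X in
// [0, MAX_X] and some integers Q0, Q1.  The values at X = 0 and X = 1
// determine the only possible pair, so only that pair is checked.
bool ceil_is_affine (int64_t c0, int64_t c1, int64_t max_x)
{
  int64_t q0 = bytes_for (c0, c1, 0);
  int64_t q1 = bytes_for (c0, c1, 1) - q0;
  for (int64_t x = 0; x <= max_x; ++x)
    if (bytes_for (c0, c1, x) != q0 + q1 * x)
      return false;
  return true;
}
```

For [4,4] the values for X = 0, 1, 2, 3 are 1, 1, 2, 2, which no single q0 + q1*X can produce, whereas [8,8] and [4,8] both round to [1,1].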

Thanks,
Richard

>
> Pan
> ________________________________
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Wednesday, March 1, 2023 19:11
> To: 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; pan2.li <pan2.li@intel.com>; 盼 李 <incarnation.p.lee@outlook.com>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> Thank you all for your quick response.
>>
>> As juzhe mentioned, RISC-V memory accesses are always aligned to a byte boundary in the compact mode, i.e. ceil(vl / 8) bytes for vbool*.
>
> OK, thanks to both of you.  This is what I'd have expected.
>
> In that case, I think both the can_div_away_from_zero_p and the
> original patch (using size_one) will give the wrong results.
> There isn't a way of representing ceil([4,4]/8) as a poly_int.
> The result is (4+4X)/8 when X is odd and (8+4X)/8 when X is even.
>
>> Actually, the [4,4] data comes from the self-test; the RISC-V mode precisions are as below.
>>
>> VNx64BI precision [0x40, 0x40].
>> VNx32BI precision [0x20, 0x20].
>> VNx16BI precision [0x10, 0x10].
>> VNx8BI precision [0x8, 0x8].
>> VNx4BI precision [0x8, 0x8].
>> VNx2BI precision [0x8, 0x8].
>> VNx1BI precision [0x8, 0x8].
>
> Ah, OK.  Which self-test causes this?
>
> Richard
>
>> The [4, 4] data affects the genmodes part: we cannot write the code as below, because the gcc_unreachable would be hit.
>>
>> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT, &mode_size[E_%smode]))
>>   gcc_unreachable (); // Hit on [4, 4] of the self-test.
>>
>> Pan
>> ________________________________
>> From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
>> Sent: Wednesday, March 1, 2023 18:46
>> To: richard.sandiford <richard.sandiford@arm.com>; pan2.li <pan2.li@intel.com>
>> Cc: incarnation.p.lee <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
>> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>>>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
>>>> bits, even when that means accessing fully-unused bytes?  E.g. 4+4X
>>>> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
>>>> 8+8X would be 32 bits/4 bytes.  So a store of [8,8] for a precision
>>>> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>>
>> Hi, Richard. Thank you for helping us.
>> My understanding of the RVV ISA:
>>
>> In RVV, we have mask modes (VNx1BI, VNx2BI, ... etc.) and data modes (VNx1QI, VNx4QI, VNx2DI, ... etc.).
>> For data modes, we access the data fully and there are no unused bytes, so we don't need to adjust the precision.
>> However, for mask modes we access the mask bits in a compact layout (since the mask bits for consecutive elements are themselves consecutive).
>> For example, in the current configuration VNx1BI, VNx2BI, VNx4BI and VNx8BI all have the same bytesize (1,1) but different bitsizes.
>>
>> VNx8BI is accessed fully, but VNx4BI is accessed only 1/2, VNx2BI 1/4 and VNx1BI 1/8, with byte alignment (I am not sure whether RVV supports bit alignment; I guess it cannot).
>>
>> If VNx8BI only occupies 1 byte (depending on the machine vector length), then VNx2BI, VNx4BI and VNx1BI occupy 2/8, 4/8 and 1/8 of a byte. I don't think we can access them with bit alignment, so all of them access the same size.
>> However, if VNx8BI occupies 8 bytes, then VNx2BI, VNx4BI and VNx1BI occupy 2, 4 and 1 bytes, so they access different sizes.
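The byte counts in the two scenarios above follow from rounding the mask bits up to whole bytes. A small sketch, assuming (as described above) that VNx<k>BI uses k/8 of the bits of VNx8BI and that accesses are byte-granular; `mask_access_bytes` is a hypothetical helper, not a GCC or RVV API:

```cpp
#include <cassert>

// Bytes touched by a VNx<K>BI access when VNx8BI occupies VNX8_BYTES bytes.
// VNx<K>BI has K/8 as many mask bits as VNx8BI, i.e. VNX8_BYTES * K bits,
// and the access is rounded up to a whole number of bytes.
int mask_access_bytes (int k, int vnx8_bytes)
{
  assert (k > 0 && vnx8_bytes > 0);
  int bits = vnx8_bytes * k;
  return (bits + 7) / 8;
}
```

With `vnx8_bytes == 1` all four modes touch 1 byte; with `vnx8_bytes == 8`, VNx1BI, VNx2BI and VNx4BI touch 1, 2 and 4 bytes respectively.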
>>
>> This is my understanding of the RVV ISA; feel free to correct me.
>> Thanks.
>>
>> ________________________________
>> juzhe.zhong@rivai.ai
>>
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Date: 2023-03-01 18:11
>> To: Li, Pan2 <pan2.li@intel.com>
>> CC: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>> "Li, Pan2" <pan2.li@intel.com> writes:
>>> Hi Richard Sandiford,
>>>
>>> I just tried the overload with a constant divisor, using the printf below, and it works as you mentioned!
>>>
>>> printf ("    can_div_away_from_zero_p (mode_precision[E_%smode], "
>>>      "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>>>
>>> template<unsigned int N, typename Ca, typename Cb, typename Cq>
>>> inline typename if_nonpoly<Cb, bool>::type
>>> can_div_away_from_zero_p (const poly_int_pod<N, Ca> &a,
>>>                          Cb b,
>>>                          poly_int_pod<N, Cq> *quotient)
>>> {
>>>   if (!can_div_trunc_p (a, b, quotient))
>>>     return false;
>>>   if (maybe_ne (*quotient * b, a))
>>>     /* The remainder is constant, so only the constant coefficient
>>>        needs to be rounded away from zero.  */
>>>     quotient->coeffs[0] += (quotient->coeffs[0] < 0 ? -1 : 1);
>>>   return true;
>>> }
>>>
>>> But I have a question about one case, shown below.
>>>
>>> Assume:
>>> a = [4, 4], b = 8.
>>>
>>> can_div_trunc_p first checks whether the remainder is constant, i.e. whether a.coeffs[i] % 8 == 0 for i >= 1. If the remainder is not constant, can_div_trunc_p leaves the quotient untouched and returns false.
>>>
>>> Thus, when a = [4, 4], can_div_away_from_zero_p leaves *quotient unchanged, i.e. mode_size[E_%smode] stays unchanged for this case. However, mode_size is supposed to be adjusted to the real byte size, and I am not sure whether this is by design or requires additional handling.
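The behaviour described here can be reproduced with a simplified two-coefficient model of a value c0 + c1*X. This is a sketch, not GCC's poly-int.h: the struct `poly2` and both functions are hypothetical stand-ins for the constant-divisor semantics discussed in this thread. Truncating division succeeds only when the X coefficient is a multiple of the divisor (so the remainder is constant), and rounding away from zero then only has to adjust the constant coefficient.

```cpp
#include <cassert>

// Hypothetical two-coefficient polynomial: value = c[0] + c[1] * X.
struct poly2 { long c[2]; };

// Truncating division by a constant B.  Fails, leaving *Q untouched,
// unless B divides the X coefficient, i.e. unless the remainder is constant.
bool div_trunc_p (const poly2 &a, long b, poly2 *q)
{
  assert (b != 0);
  if (a.c[1] % b != 0)
    return false;
  q->c[0] = a.c[0] / b;
  q->c[1] = a.c[1] / b;
  return true;
}

// Division by B rounding away from zero.  The remainder lives entirely in
// the constant coefficient, so only c[0] is adjusted, in the direction of
// the sign of the fractional part REM / B.
bool div_away_from_zero_p (const poly2 &a, long b, poly2 *q)
{
  if (!div_trunc_p (a, b, q))
    return false;
  long rem = a.c[0] - q->c[0] * b;
  if (rem != 0)
    q->c[0] += ((rem < 0) != (b < 0)) ? -1 : 1;
  return true;
}
```

For a = [4, 4] the call fails and the output keeps its previous value; for a = [4, 8] it gives [1, 1], i.e. 1 + X bytes, which matches ceil((4 + 8X) / 8). Adjusting the X coefficient as well would instead give 1 + 2X.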
>>
>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
>> bits, even when that means accessing fully-unused bytes?  E.g. 4+4X
>> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
>> 8+8X would be 32 bits/4 bytes.  So a store of [8,8] for a precision
>> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
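The arithmetic in this question can be spelled out with a short sketch (hypothetical helper names; `x` is the runtime indeterminate X): the useful data occupies ceil((4 + 4X) / 8) bytes, while an access with a bitsize of 8 + 8X bits touches (8 + 8X) / 8 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Bytes of useful data for a precision of 4 + 4X bits, rounded up.
int64_t useful_bytes (int64_t x)
{
  assert (x >= 0);
  return (4 + 4 * x + 7) / 8;
}

// Bytes touched by an access whose bitsize is 8 + 8X bits.
int64_t accessed_bytes (int64_t x)
{
  assert (x >= 0);
  return (8 + 8 * x) / 8;
}

// Bytes accessed beyond the end of the useful data.
int64_t overrun_bytes (int64_t x)
{
  return accessed_bytes (x) - useful_bytes (x);
}
```

At X = 3 this gives 2 useful bytes against 4 accessed bytes, i.e. the 2-byte overrun described above; at X = 0 there is no overrun.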
>>
>> Richard
>>
>>> Pan
>>>
>>> From: 盼 李 <incarnation.p.lee@outlook.com>
>>> Sent: Tuesday, February 28, 2023 5:59 PM
>>> To: Richard Sandiford <richard.sandiford@arm.com>; Li, Pan2 <pan2.li@intel.com>
>>> Cc: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> Understood, thanks for the explanations and suggestions. Let me give it a try and keep you posted.
>>>
>>> Pan
>>> ________________________________
>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>> Sent: Tuesday, February 28, 2023 17:50
>>> To: Li, Pan2 <pan2.li@intel.com>
>>> Cc: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> "Li, Pan2" <pan2.li@intel.com> writes:
>>>> Hi Richard Sandiford,
>>>>
>>>> After some investigation, I am not sure whether it is possible to make this general without any changes to exact_div. We could add a method like the one below to get the unit poly for all possible N.
>>>>
>>>> template<unsigned int N, typename Ca>
>>>> inline POLY_CONST_RESULT (N, Ca, Ca)
>>>> normalize_to_unit (const poly_int_pod<N, Ca> &a)
>>>> {
>>>>   typedef POLY_CONST_COEFF (Ca, Ca) C;
>>>>
>>>>   poly_int<N, C> normalized = a;
>>>>
>>>>   if (normalized.is_constant())
>>>>     normalized.coeffs[0] = 1;
>>>>   else
>>>>     for (unsigned int i = 0; i < N; i++)
>>>>       POLY_SET_COEFF (C, normalized, i, 1);
>>>>
>>>>   return normalized;
>>>> }
>>>>
>>>> Then genmodes could be adjusted as below to consume the unit poly.
>>>>
>>>>       printf ("    poly_uint16 unit_poly = "
>>>>              "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
>>>>       printf ("    if (known_lt (mode_precision[E_%smode], "
>>>>              "unit_poly * BITS_PER_UNIT))\n", m->name);
>>>>       printf ("      mode_size[E_%smode] = unit_poly;\n", m->name);
>>>>
>>>> I am not sure it is a good idea to introduce the above normalization code into exact_div, given that the comment on exact_div says “/* Return A / B, given that A is known to be a multiple of B. */”.
>>>
>>> My point was that we have multiple ways of dividing poly_ints:
>>>
>>> - exact_div, for when the caller knows that the result is always exact
>>> - can_div_trunc_p, for truncating division (round towards 0)
>>> - can_div_away_from_zero_p, for rounding away from 0
>>> - ...
>>>
>>> This is like how we have multiple division *_EXPRs on trees.
>>>
>>> Until now, exact_div was the correct choice for modes because vector
>>> modes didn't have padding.  We're now changing that, so my suggestion
>>> in the review was to change the division operation that we use.
>>> Rather than use exact_div, we should now use can_div_away_from_zero_p,
>>> which would have the effect of rounding the quotient up.
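The difference between these division flavours can be seen on plain integers. The functions below are hypothetical scalar analogues, not the poly_int routines themselves. They show why rounding away from zero gives a nonzero byte size for a sub-byte precision, where truncating division would give 0 and exact_div's precondition would not hold.

```cpp
#include <cassert>

// exact_div analogue: the caller guarantees that B divides A.
long exact_div_scalar (long a, long b)
{
  assert (b != 0 && a % b == 0);
  return a / b;
}

// Truncating division (round towards zero); C++ integer '/' truncates.
long div_trunc (long a, long b)
{
  assert (b != 0);
  return a / b;
}

// Division rounding away from zero: bump the truncated quotient by one
// in the direction of the true quotient's sign whenever there is a remainder.
long div_away_from_zero (long a, long b)
{
  long q = div_trunc (a, b);
  if (q * b != a)
    q += ((a < 0) != (b < 0)) ? -1 : 1;
  return q;
}
```

For a precision of 4 bits and BITS_PER_UNIT of 8, `div_trunc (4, 8)` is 0 while `div_away_from_zero (4, 8)` is 1, i.e. the quotient is rounded up to one byte.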
>>>
>>> Something like:
>>>
>>>       if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
>>>                                      &mode_size[E_%smode]))
>>>         gcc_unreachable ();
>>>
>>> But this will require a new overload of can_div_away_from_zero_p, since
>>> the existing one is for constant quotients rather than constant divisors.
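In genmodes.cc such checks are generated as C source text with printf. A sketch of producing the suggested snippet for a single mode, writing into a buffer with snprintf so that the output can be inspected; `emit_mode_size_check` is a hypothetical helper, and genmodes itself would print the text to stdout instead.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: build the C source that the suggested genmodes
// change would emit for mode NAME (e.g. "VNx4BI").
std::string emit_mode_size_check (const char *name)
{
  char buf[512];
  int n = std::snprintf (buf, sizeof buf,
                         "    if (!can_div_away_from_zero_p (mode_precision[E_%smode],"
                         " BITS_PER_UNIT,\n"
                         "                                   &mode_size[E_%smode]))\n"
                         "      gcc_unreachable ();\n",
                         name, name);
  // Make sure the whole snippet fitted into the buffer.
  assert (n > 0 && n < (int) sizeof buf);
  return std::string (buf, n);
}
```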
>>>
>>> Thanks,
>>> Richard
>>>
>>>>
>>>> Could you please share your opinion on this from an expert's perspective? Thank you!
>>>>
>>>> Pan
>>>>
>>>> From: 盼 李 <incarnation.p.lee@outlook.com>
>>>> Sent: Monday, February 27, 2023 11:13 PM
>>>> To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>>>> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
>>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>>
>>>> Never mind; I hope you had a good holiday.
>>>>
>>>> Thanks for pointing this out: the if part cannot take care of poly_int with N > 2. As I understand it, we need to make it general for all N of poly_int.
>>>>
>>>> So I would like to double-check with you how to make it general. I suppose there will be a new function can_div_away_from_zero_p to replace the if (known_lt (,)) part in genmodes.cc, and exact_div will be left unchanged (given the word "exact", I suppose we should not touch it), right? Then we still need a poly_int with all coefficients equal to 1, for any N, as the result when can_div_away_from_zero_p is true.
>>>>
>>>> Thanks again for your professional suggestion, have a nice day!
>>>>
>>>> Pan
>>>> ________________________________
>>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>>> Sent: Monday, February 27, 2023 22:24
>>>> To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>>>> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
>>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>>
>>>> Sorry for the slow reply, been away for a couple of weeks.
>>>>
>>>> "incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>>>>> From: Pan Li <pan2.li@intel.com>
>>>>>
>>>>>        Fix the rvv bool mode precision bug by adjusting the precision.
>>>>>        The bit sizes of vbool*_t will be adjusted to
>>>>>        [1, 2, 4, 8, 16, 32, 64] according to the RVV 1.0 ISA spec. The
>>>>>        adjusted mode precision of vbool*_t will help the underlying passes
>>>>>        make the right decisions for both correctness and optimization.
>>>>>
>>>>>        Given below sample code:
>>>>>        void test_1(int8_t * restrict in, int8_t * restrict out)
>>>>>        {
>>>>>          vbool8_t v2 = *(vbool8_t*)in;
>>>>>          vbool16_t v5 = *(vbool16_t*)in;
>>>>>          *(vbool16_t*)(out + 200) = v5;
>>>>>          *(vbool8_t*)(out + 100) = v2;
>>>>>        }
>>>>>
>>>>>        Before the precision adjustment:
>>>>>        addi    a4,a1,100
>>>>>        vsetvli a5,zero,e8,m1,ta,ma
>>>>>        addi    a1,a1,200
>>>>>        vlm.v   v24,0(a0)
>>>>>        vsm.v   v24,0(a4)
>>>>>        // Need one vsetvli and vlm.v for correctness here.
>>>>>        vsm.v   v24,0(a1)
>>>>>
>>>>>        After the precision adjustment:
>>>>>        csrr    t0,vlenb
>>>>>        slli    t1,t0,1
>>>>>        csrr    a3,vlenb
>>>>>        sub     sp,sp,t1
>>>>>        slli    a4,a3,1
>>>>>        add     a4,a4,sp
>>>>>        sub     a3,a4,a3
>>>>>        vsetvli a5,zero,e8,m1,ta,ma
>>>>>        addi    a2,a1,200
>>>>>        vlm.v   v24,0(a0)
>>>>>        vsm.v   v24,0(a3)
>>>>>        addi    a1,a1,100
>>>>>        vsetvli a4,zero,e8,mf2,ta,ma
>>>>>        csrr    t0,vlenb
>>>>>        vlm.v   v25,0(a3)
>>>>>        vsm.v   v25,0(a2)
>>>>>        slli    t1,t0,1
>>>>>        vsetvli a5,zero,e8,m1,ta,ma
>>>>>        vsm.v   v24,0(a1)
>>>>>        add     sp,sp,t1
>>>>>        jr      ra
>>>>>
>>>>>        However, there may be some optimization opportunities after
>>>>>        the mode precision adjustment. They can be taken care of in
>>>>>        the RISC-V backend in separate follow-up PR(s).
>>>>>
>>>>>        PR 108185
>>>>>        PR 108654
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>        * config/riscv/riscv-modes.def (ADJUST_PRECISION): Adjust the
>>>>>        precision of the VNx*BI modes.
>>>>>        * config/riscv/riscv.cc (riscv_v_adjust_precision): New function.
>>>>>        * config/riscv/riscv.h (riscv_v_adjust_precision): Declare.
>>>>>        * genmodes.cc (ADJUST_PRECISION): New macro.
>>>>>        (emit_mode_adjustments): Emit precision adjustments.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>        * gcc.target/riscv/pr108185-1.c: New test.
>>>>>        * gcc.target/riscv/pr108185-2.c: New test.
>>>>>        * gcc.target/riscv/pr108185-3.c: New test.
>>>>>        * gcc.target/riscv/pr108185-4.c: New test.
>>>>>        * gcc.target/riscv/pr108185-5.c: New test.
>>>>>        * gcc.target/riscv/pr108185-6.c: New test.
>>>>>        * gcc.target/riscv/pr108185-7.c: New test.
>>>>>        * gcc.target/riscv/pr108185-8.c: New test.
>>>>>
>>>>> Signed-off-by: Pan Li <pan2.li@intel.com>
>>>>> ---
>>>>>  gcc/config/riscv/riscv-modes.def            |  8 +++
>>>>>  gcc/config/riscv/riscv.cc                   | 12 ++++
>>>>>  gcc/config/riscv/riscv.h                    |  1 +
>>>>>  gcc/genmodes.cc                             | 25 ++++++-
>>>>>  gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
>>>>>  gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
>>>>>  gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
>>>>>  gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
>>>>>  gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
>>>>>  gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
>>>>>  gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
>>>>>  gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
>>>>>  12 files changed, 598 insertions(+), 1 deletion(-)
>>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>>
>>>>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
>>>>> index d5305efa8a6..110bddce851 100644
>>>>> --- a/gcc/config/riscv/riscv-modes.def
>>>>> +++ b/gcc/config/riscv/riscv-modes.def
>>>>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>>>  ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>>>  ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>>>>>
>>>>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
>>>>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
>>>>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
>>>>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
>>>>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
>>>>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
>>>>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
>>>>> +
>>>>>  /*
>>>>>     | Mode        | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
>>>>>     |             | LMUL        | SEW/LMUL    | LMUL        | SEW/LMUL    |
>>>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>>>> index de3e1f903c7..cbe66c0e35b 100644
>>>>> --- a/gcc/config/riscv/riscv.cc
>>>>> +++ b/gcc/config/riscv/riscv.cc
>>>>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
>>>>>    return scale;
>>>>>  }
>>>>>
>>>>> +/* Call from ADJUST_PRECISION in riscv-modes.def.  Return the correct
>>>>> +   PRECISION size for corresponding machine_mode.  */
>>>>> +
>>>>> +poly_int64
>>>>> +riscv_v_adjust_precision (machine_mode mode, int scale)
>>>>> +{
>>>>> +  if (riscv_v_ext_vector_mode_p (mode))
>>>>> +    return riscv_vector_chunks * scale;
>>>>> +
>>>>> +  return scale;
>>>>> +}
>>>>> +
>>>>>  /* Return true if X is a valid address for machine mode MODE.  If it is,
>>>>>     fill in INFO appropriately.  STRICT_P is true if REG_OK_STRICT is in
>>>>>     effect.  */
>>>>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>>>>> index 5bc7f2f467d..15b9317a8ce 100644
>>>>> --- a/gcc/config/riscv/riscv.h
>>>>> +++ b/gcc/config/riscv/riscv.h
>>>>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
>>>>>  extern unsigned riscv_bytes_per_vector_chunk;
>>>>>  extern poly_uint16 riscv_vector_chunks;
>>>>>  extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
>>>>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
>>>>>  /* The number of bits and bytes in a RVV vector.  */
>>>>>  #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
>>>>>  #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
>>>>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
>>>>> index 2d418f09aab..12f4e6335e6 100644
>>>>> --- a/gcc/genmodes.cc
>>>>> +++ b/gcc/genmodes.cc
>>>>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
>>>>>  static struct mode_adjust *adj_format;
>>>>>  static struct mode_adjust *adj_ibit;
>>>>>  static struct mode_adjust *adj_fbit;
>>>>> +static struct mode_adjust *adj_precision;
>>>>>
>>>>>  /* Mode class operations.  */
>>>>>  static enum mode_class
>>>>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
>>>>>  #define ADJUST_NUNITS(M, X)    _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
>>>>>  #define ADJUST_BYTESIZE(M, X)  _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
>>>>>  #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
>>>>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
>>>>>  #define ADJUST_FLOAT_FORMAT(M, X)    _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
>>>>>  #define ADJUST_IBIT(M, X)  _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
>>>>>  #define ADJUST_FBIT(M, X)  _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
>>>>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
>>>>>              " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
>>>>>              m->name, m->name);
>>>>>        printf ("    mode_precision[E_%smode] = ps * old_factor;\n", m->name);
>>>>> -      printf ("    mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>>> +      /* Normalize the size to 1 if precision is less than BITS_PER_UNIT.  */
>>>>> +      printf ("    poly_uint16 size_one = "
>>>>> +           "mode_precision[E_%smode].is_constant ()\n", m->name);
>>>>> +      printf ("      ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
>>>>
>>>> Have you tried this on an x86_64 system?  I wouldn't expect it to work
>>>> because of the:
>>>>
>>>>   STATIC_ASSERT (N >= 2);
>>>>
>>>> in the poly_uint16 constructor.
>>>>
>>>>> +      printf ("    if (known_lt (mode_precision[E_%smode], "
>>>>> +           "size_one * BITS_PER_UNIT))\n", m->name);
>>>>> +      printf ("      mode_size[E_%smode] = size_one;\n", m->name);
>>>>> +      printf ("    else\n");
>>>>> +      printf ("      mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>>
>>>> Now that the assert implicit in the original exact_div no longer holds,
>>>> I think we should instead generalise it to can_div_away_from_zero_p
>>>> (which will involve defining a new overload of can_div_away_from_zero_p).
>>>> I think that will give the same result as the code above for the cases
>>>> that the code above handles.  But it should be more general too.
>>>>
>>>> TBH, I'm still sceptical that this is all that is needed.  It seems
>>>> unlikely that we've been so good at writing vector support code that
>>>> we've made it work for precision < bitsize, despite that being an
>>>> unsupported combination until now.  But I guess we can fix problems
>>>> on a case-by-case basis.
>>>>
>>>> Thanks,
>>>> Richard
>>>>
>>>>>              " BITS_PER_UNIT);\n", m->name, m->name);
>>>>>        printf ("    mode_nunits[E_%smode] =3D ps;\n", m->name);
>>>>>        printf ("    adjust_mode_mask (E_%smode);\n", m->name);
>>>>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
>>>>>      printf ("\n  /* %s:%d */\n  REAL_MODE_FORMAT (E_%smode) = %s;\n",
>>>>>            a->file, a->line, a->mode->name, a->adjustment);
>>>>>
>>>>> +  /* Adjust precision to the actual bits size.  */
>>>>> +  for (a = adj_precision; a; a = a->next)
>>>>> +    switch (a->mode->cl)
>>>>> +      {
>>>>> +     case MODE_VECTOR_BOOL:
>>>>> +       printf ("\n  /* %s:%d.  */\n  ps =3D %s;\n", a->file, a->line,
>>>>> +               a->adjustment);
>>>>> +       printf ("  mode_precision[E_%smode] = ps;\n", a->mode->name);
>>>>> +       break;
>>>>> +     default:
>>>>> +       break;
>>>>> +      }
>>>>> +
>>>>>    puts ("}");
>>>>>  }
>>>>>
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>>> new file mode 100644
>>>>> index 00000000000..e70960c5b6d
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool1_t v1 = *(vbool1_t*)in;
>>>>> +    vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> +    *(vbool1_t*)(out + 100) = v1;
>>>>> +    *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool1_t v1 = *(vbool1_t*)in;
>>>>> +    vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> +    *(vbool1_t*)(out + 100) = v1;
>>>>> +    *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool1_t v1 = *(vbool1_t*)in;
>>>>> +    vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> +    *(vbool1_t*)(out + 100) = v1;
>>>>> +    *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool1_t v1 = *(vbool1_t*)in;
>>>>> +    vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> +    *(vbool1_t*)(out + 100) = v1;
>>>>> +    *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool1_t v1 = *(vbool1_t*)in;
>>>>> +    vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> +    *(vbool1_t*)(out + 100) = v1;
>>>>> +    *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool1_t v1 = *(vbool1_t*)in;
>>>>> +    vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> +    *(vbool1_t*)(out + 100) = v1;
>>>>> +    *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>>> new file mode 100644
>>>>> index 00000000000..dcc7a644a88
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool2_t v1 = *(vbool2_t*)in;
>>>>> +    vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> +    *(vbool2_t*)(out + 100) = v1;
>>>>> +    *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool2_t v1 = *(vbool2_t*)in;
>>>>> +    vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> +    *(vbool2_t*)(out + 100) = v1;
>>>>> +    *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool2_t v1 = *(vbool2_t*)in;
>>>>> +    vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> +    *(vbool2_t*)(out + 100) = v1;
>>>>> +    *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool2_t v1 = *(vbool2_t*)in;
>>>>> +    vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> +    *(vbool2_t*)(out + 100) = v1;
>>>>> +    *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool2_t v1 = *(vbool2_t*)in;
>>>>> +    vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> +    *(vbool2_t*)(out + 100) = v1;
>>>>> +    *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool2_t v1 = *(vbool2_t*)in;
>>>>> +    vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> +    *(vbool2_t*)(out + 100) = v1;
>>>>> +    *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>>> new file mode 100644
>>>>> index 00000000000..3af0513e006
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool4_t v1 = *(vbool4_t*)in;
>>>>> +    vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> +    *(vbool4_t*)(out + 100) = v1;
>>>>> +    *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool4_t v1 = *(vbool4_t*)in;
>>>>> +    vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> +    *(vbool4_t*)(out + 100) = v1;
>>>>> +    *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool4_t v1 = *(vbool4_t*)in;
>>>>> +    vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> +    *(vbool4_t*)(out + 100) = v1;
>>>>> +    *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool4_t v1 = *(vbool4_t*)in;
>>>>> +    vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> +    *(vbool4_t*)(out + 100) = v1;
>>>>> +    *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool4_t v1 = *(vbool4_t*)in;
>>>>> +    vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> +    *(vbool4_t*)(out + 100) = v1;
>>>>> +    *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool4_t v1 = *(vbool4_t*)in;
>>>>> +    vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> +    *(vbool4_t*)(out + 100) = v1;
>>>>> +    *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>>> new file mode 100644
>>>>> index 00000000000..ea3c360d756
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool8_t v1 = *(vbool8_t*)in;
>>>>> +    vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> +    *(vbool8_t*)(out + 100) = v1;
>>>>> +    *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool8_t v1 = *(vbool8_t*)in;
>>>>> +    vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> +    *(vbool8_t*)(out + 100) = v1;
>>>>> +    *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool8_t v1 = *(vbool8_t*)in;
>>>>> +    vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> +    *(vbool8_t*)(out + 100) = v1;
>>>>> +    *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool8_t v1 = *(vbool8_t*)in;
>>>>> +    vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> +    *(vbool8_t*)(out + 100) = v1;
>>>>> +    *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool8_t v1 = *(vbool8_t*)in;
>>>>> +    vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> +    *(vbool8_t*)(out + 100) = v1;
>>>>> +    *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool8_t v1 = *(vbool8_t*)in;
>>>>> +    vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> +    *(vbool8_t*)(out + 100) = v1;
>>>>> +    *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>>> new file mode 100644
>>>>> index 00000000000..9fc659d2402
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool16_t v1 = *(vbool16_t*)in;
>>>>> +    vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> +    *(vbool16_t*)(out + 100) = v1;
>>>>> +    *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool16_t v1 = *(vbool16_t*)in;
>>>>> +    vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> +    *(vbool16_t*)(out + 100) = v1;
>>>>> +    *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool16_t v1 = *(vbool16_t*)in;
>>>>> +    vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> +    *(vbool16_t*)(out + 100) = v1;
>>>>> +    *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool16_t v1 = *(vbool16_t*)in;
>>>>> +    vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> +    *(vbool16_t*)(out + 100) = v1;
>>>>> +    *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool16_t v1 = *(vbool16_t*)in;
>>>>> +    vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> +    *(vbool16_t*)(out + 100) = v1;
>>>>> +    *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool16_t v1 = *(vbool16_t*)in;
>>>>> +    vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> +    *(vbool16_t*)(out + 100) = v1;
>>>>> +    *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>>> new file mode 100644
>>>>> index 00000000000..98275e5267d
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool32_t v1 = *(vbool32_t*)in;
>>>>> +    vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> +    *(vbool32_t*)(out + 100) = v1;
>>>>> +    *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool32_t v1 = *(vbool32_t*)in;
>>>>> +    vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> +    *(vbool32_t*)(out + 100) = v1;
>>>>> +    *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool32_t v1 = *(vbool32_t*)in;
>>>>> +    vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> +    *(vbool32_t*)(out + 100) = v1;
>>>>> +    *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool32_t v1 = *(vbool32_t*)in;
>>>>> +    vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> +    *(vbool32_t*)(out + 100) = v1;
>>>>> +    *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool32_t v1 = *(vbool32_t*)in;
>>>>> +    vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> +    *(vbool32_t*)(out + 100) = v1;
>>>>> +    *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool32_t v1 = *(vbool32_t*)in;
>>>>> +    vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> +    *(vbool32_t*)(out + 100) = v1;
>>>>> +    *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>>> new file mode 100644
>>>>> index 00000000000..8f6f0b11f09
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool64_t v1 = *(vbool64_t*)in;
>>>>> +    vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> +    *(vbool64_t*)(out + 100) = v1;
>>>>> +    *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool64_t v1 = *(vbool64_t*)in;
>>>>> +    vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> +    *(vbool64_t*)(out + 100) = v1;
>>>>> +    *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool64_t v1 = *(vbool64_t*)in;
>>>>> +    vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> +    *(vbool64_t*)(out + 100) = v1;
>>>>> +    *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool64_t v1 = *(vbool64_t*)in;
>>>>> +    vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> +    *(vbool64_t*)(out + 100) = v1;
>>>>> +    *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool64_t v1 = *(vbool64_t*)in;
>>>>> +    vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> +    *(vbool64_t*)(out + 100) = v1;
>>>>> +    *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool64_t v1 = *(vbool64_t*)in;
>>>>> +    vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> +    *(vbool64_t*)(out + 100) = v1;
>>>>> +    *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>> new file mode 100644
>>>>> index 00000000000..d96959dd064
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>> @@ -0,0 +1,77 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool1_t v1 = *(vbool1_t*)in;
>>>>> +    vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> +    *(vbool1_t*)(out + 100) = v1;
>>>>> +    *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool2_t v1 = *(vbool2_t*)in;
>>>>> +    vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> +    *(vbool2_t*)(out + 100) = v1;
>>>>> +    *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool4_t v1 = *(vbool4_t*)in;
>>>>> +    vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> +    *(vbool4_t*)(out + 100) = v1;
>>>>> +    *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool8_t v1 = *(vbool8_t*)in;
>>>>> +    vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> +    *(vbool8_t*)(out + 100) = v1;
>>>>> +    *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool16_t v1 = *(vbool16_t*)in;
>>>>> +    vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> +    *(vbool16_t*)(out + 100) = v1;
>>>>> +    *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool32_t v1 = *(vbool32_t*)in;
>>>>> +    vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> +    *(vbool32_t*)(out + 100) = v1;
>>>>> +    *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> +    vbool64_t v1 = *(vbool64_t*)in;
>>>>> +    vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> +    *(vbool64_t*)(out + 100) = v1;
>>>>> +    *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */