From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=nWye=6Z=suse.de=rguenther@sourceware.org>
Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2001:67c:2178:6::1d])
	by sourceware.org (Postfix) with ESMTPS id B120C385840C
	for <gcc-patches@gcc.gnu.org>; Wed,  1 Mar 2023 12:33:10 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B120C385840C
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
	by smtp-out2.suse.de (Postfix) with ESMTP id 18F501FE12;
	Wed,  1 Mar 2023 12:33:09 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1677673989; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=THB/utftc8MKv1jGnGLyCVorsJbi0w5dBftSeG8jhsk=;
	b=PgK7VC8ue+BKmwxtCHY8otulqfHHwa0t0/weFmok3KARTt6ssr/MQr51GzHUW3WbpDjTPt
	0s14Afni0LQSzhnOZm9vDgY5y/FK9NIvYprVnfdisb0ehUS5Y/6YvqLVg0e5/87lVHJyz7
	4L/7phxejmX0UPWWgn3aOTKKoJQYLGQ=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1677673989;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=THB/utftc8MKv1jGnGLyCVorsJbi0w5dBftSeG8jhsk=;
	b=E5ESFOAAiLwOSgiMYQ8I6P4xLGxKk2Dxe2BEJRb0AQCK2jJYC2MAYHEx+X1xlMh0DuOrPq
	FHwRBlXzr3NPENDg==
Received: from wotan.suse.de (wotan.suse.de [10.160.0.1])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by relay2.suse.de (Postfix) with ESMTPS id 808222C141;
	Wed,  1 Mar 2023 12:33:08 +0000 (UTC)
Date: Wed, 1 Mar 2023 12:33:08 +0000 (UTC)
From: Richard Biener <rguenther@suse.de>
To: Richard Sandiford <richard.sandiford@arm.com>
cc: =?GB2312?Q?=C5=CE_=C0=EE_via_Gcc-patches?= <gcc-patches@gcc.gnu.org>, 
    =?GB2312?B?xc4gwO4=?= <incarnation.p.lee@outlook.com>, 
    "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>, 
    "pan2.li" <pan2.li@intel.com>, "Kito.cheng" <kito.cheng@sifive.com>
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
In-Reply-To: <mpt1qm8of0p.fsf@arm.com>
Message-ID: <nycvar.YFH.7.77.849.2303011222040.27913@jbgna.fhfr.qr>
References: <BYAPR04MB4824A720063FEE6C4F10776DA4A09@BYAPR04MB4824.namprd04.prod.outlook.com> <mptlekjtcco.fsf@arm.com> <BYAPR04MB48244794D1BF33A7F44DF8ADB7AF9@BYAPR04MB4824.namprd04.prod.outlook.com> <MW5PR11MB59083B1EAA01F0654526E760A9AC9@MW5PR11MB5908.namprd11.prod.outlook.com>
 <mpt7cw2p19n.fsf@arm.com> <BYAPR04MB4824B9342456A1F67D34F1EDB7AC9@BYAPR04MB4824.namprd04.prod.outlook.com> <MW5PR11MB5908CAEDD81742D2CDCB6CC2A9AC9@MW5PR11MB5908.namprd11.prod.outlook.com> <mptedq8ok67.fsf@arm.com> <6F429B8CF9C3B3EF+2023030118462950633032@rivai.ai>
 <BYAPR04MB48243B29E559720BE1582219B7AD9@BYAPR04MB4824.namprd04.prod.outlook.com> <mpt8rggohe9.fsf@arm.com> <BYAPR04MB482460A21FA9890421AEB2FEB7AD9@BYAPR04MB4824.namprd04.prod.outlook.com> <mpt1qm8of0p.fsf@arm.com>
User-Agent: Alpine 2.22 (LSU 394 2020-01-19)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="-1609957120-618482312-1677673989=:27913"
X-Spam-Status: No, score=-8.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_ASCII_DIVIDERS,KAM_SHORT,SCC_10_SHORT_WORD_LINES,SCC_20_SHORT_WORD_LINES,SCC_35_SHORT_WORD_LINES,SCC_5_SHORT_WORD_LINES,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

---1609957120-618482312-1677673989=:27913
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8BIT

On Wed, 1 Mar 2023, Richard Sandiford wrote:

> 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > I just ran a test with the code below; the [0x4, 0x4] case comes from VNx4BI. Note that the mode size is unchanged.
> >
> > printf ("    can_div_away_from_zero_p (mode_precision[E_%smode], "
> >   "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
> >
> > VNx4BI Before precision [0x4, 0x4], size [0x4, 0]
> > VNx4BI After precision [0x4, 0x4], size [0x4, 0]
> 
> Yeah, the result is expected to be unchanged if the division fails.
> That's a deliberate part of the interface.  The can_* functions
> should never be used without testing the boolean return value.
> 
> But this precision of [4,4] for VNx4BI is different from what you
> listed below.  Like I say, if the precision really is [4,4], and if
> the size really is ceil([4,4]/8), then I don't think we can represent
> that with current infrastructure.
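
To make the can_* contract quoted above concrete, here is a minimal
standalone sketch.  ToyPoly and the toy_* functions are hypothetical
stand-ins written for this illustration, not the code in GCC's poly-int.h.
The sketch assumes non-negative coefficients, and rounding only the
constant coefficient is a simplification of this toy.

```cpp
#include <array>
#include <cassert>

// Toy stand-in for a two-coefficient poly_int: value = c[0] + c[1] * X,
// with non-negative coefficients.  Hypothetical type, not GCC's poly_int.
struct ToyPoly {
  std::array<unsigned long, 2> c;
};

// Truncating division by a constant B.  Succeeds only when the X
// coefficient divides exactly, so that the remainder is constant.
// On failure *q is deliberately left untouched.
bool toy_can_div_trunc_p(const ToyPoly &a, unsigned long b, ToyPoly *q) {
  assert(b != 0);
  if (a.c[1] % b != 0)
    return false;
  q->c[0] = a.c[0] / b;
  q->c[1] = a.c[1] / b;
  return true;
}

// Division rounding away from zero.  Only the constant coefficient can
// be inexact once the truncating division has succeeded.
bool toy_can_div_away_from_zero_p(const ToyPoly &a, unsigned long b,
                                  ToyPoly *q) {
  if (!toy_can_div_trunc_p(a, b, q))
    return false;
  if (a.c[0] % b != 0)
    q->c[0] += 1;
  return true;
}
```

With these definitions, dividing [4,4] by 8 returns false and leaves a
preset size such as [4,0] untouched, while [8,8] divided by 8 gives [1,1].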

The size of VNx4BI is (4*N + 7) / 8 bytes.  I suppose we could simply
not store the size in bytes but only the size in bits then?

I see the problem, but I also don't see a good solution since
for VNx4BI with N == 3 we have one and a half bytes of storage.
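
To spell out the arithmetic, here is a small standalone sketch.  The names
mask_bytes and has_constant_step are hypothetical helpers for this
illustration, not GCC functions.  It evaluates (4*N + 7) / 8 and checks
whether the result grows by a constant step, which a size of the form
a + b*N would require.

```cpp
#include <cassert>

// Bytes needed for a VNx4BI-style mask of 4*N bits, rounded up to whole
// bytes.  N is the runtime chunk count.
unsigned mask_bytes(unsigned n) { return (4 * n + 7) / 8; }

// True if mask_bytes(n) changes by the same amount between every pair of
// consecutive n in [1, limit].  A size of the form a + b*N always would.
bool has_constant_step(unsigned limit) {
  assert(limit >= 2);
  unsigned step = mask_bytes(2) - mask_bytes(1);
  for (unsigned n = 2; n < limit; ++n)
    if (mask_bytes(n + 1) - mask_bytes(n) != step)
      return false;
  return true;
}
```

For N = 1..5 this gives 1, 1, 2, 2, 3 bytes, so the step alternates
between 0 and 1 and has_constant_step (8) returns false.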

How do memory access patterns work with poly-int sizes?

> 
> Thanks,
> Richard
> 
> >
> > Pan
> > ________________________________
> > From: Richard Sandiford <richard.sandiford@arm.com>
> > Sent: Wednesday, March 1, 2023 19:11
> > To: 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org>
> > Cc: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; pan2.li <pan2.li@intel.com>; 盼 李 <incarnation.p.lee@outlook.com>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
> > Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >
> > 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> Thank you all for your quick response.
> >>
> >> As juzhe mentioned, RISC-V memory accesses are always aligned to a byte boundary in the compact mode, i.e. ceil(vl / 8) bytes for vbool*.
> >
> > OK, thanks to both of you.  This is what I'd have expected.
> >
> > In that case, I think both the can_div_away_from_zero_p and the
> > original patch (using size_one) will give the wrong results.
> > There isn't a way of representing ceil([4,4]/8) as a poly_int.
> > The result is (4+4X)/8 when X is odd and (8+4X)/8 when X is even.
> >
> >> Actually, the [4,4] data comes from the self-test; the RISC-V mode precisions are as below.
> >>
> >> VNx64BI precision [0x40, 0x40].
> >> VNx32BI precision [0x20, 0x20].
> >> VNx16BI precision [0x10, 0x10].
> >> VNx8BI precision [0x8, 0x8].
> >> VNx4BI precision [0x8, 0x8].
> >> VNx2BI precision [0x8, 0x8].
> >> VNx1BI precision [0x8, 0x8].
> >
> > Ah, OK.  Which self-test causes this?
> >
> > Richard
> >
> >> The [4, 4] data affects the genmodes part: we cannot write the code below, because gcc_unreachable would be hit.
> >>
> >> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,  &mode_size[E_%smode]))
> >>   gcc_unreachable (); // Hit on [4, 4] of the self-test.
> >>
> >> Pan
> >> ________________________________
> >> From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> >> Sent: Wednesday, March 1, 2023 18:46
> >> To: richard.sandiford <richard.sandiford@arm.com>; pan2.li <pan2.li@intel.com>
> >> Cc: incarnation.p.lee <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
> >> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >>
> >>>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
> >>>>bits, even when that means accessing fully-unused bytes?  E.g. 4+4X
> >>>>when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
> >>>>8+8X would be 32 bits/4 bytes.  So a store of [8,8] for a precision
> >>>>of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
> >>
> >> Hi, Richard. Thank you for helping us.
> >> My understanding of RVV ISA:
> >>
> >> In RVV, we have mask modes (VNx1BI, VNx2BI, etc.) and data modes (VNx1QI, VNx4QI, VNx2DI, etc.).
> >> For data modes, we access the data fully and there are no unused bytes, so we don't need to adjust the precision.
> >> However, for mask modes we access the mask bits in a compact layout (the mask bits for consecutive elements are themselves consecutive).
> >> For example, in the current configuration, VNx1BI, VNx2BI, VNx4BI and VNx8BI all have the same bytesize (1,1) but different bitsizes.
> >>
> >> VNx8BI is accessed fully, but only 1/2 of VNx4BI, 1/4 of VNx2BI and 1/8 of VNx1BI is accessed, with byte alignment (I am not sure whether RVV supports bit alignment; I guess it does not).
> >>
> >> If VNx8BI occupies only 1 byte (depending on the machine vector length), then VNx4BI, VNx2BI and VNx1BI are 4/8, 2/8 and 1/8 of a byte. I think we can't access memory at bit alignment, so they will all be the same in the access.
> >> However, if VNx8BI occupies 8 bytes, then VNx4BI, VNx2BI and VNx1BI are 4 bytes, 2 bytes and 1 byte. They access different sizes.
> >>
> >> This is my understanding of the RVV ISA, feel free to correct me.
> >> Thanks.
> >>
> >> ________________________________
> >> juzhe.zhong@rivai.ai
> >>
> >> From: Richard Sandiford <richard.sandiford@arm.com>
> >> Date: 2023-03-01 18:11
> >> To: Li, Pan2 <pan2.li@intel.com>
> >> CC: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> >> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >> "Li, Pan2" <pan2.li@intel.com> writes:
> >>> Hi Richard Sandiford,
> >>>
> >>> I just tried the constant-divisor overload with the printf below, and it works as you mentioned!
> >>>
> >>> printf ("    can_div_away_from_zero_p (mode_precision[E_%smode], "
> >>>      "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
> >>>
> >>> template<unsigned int N, typename Ca, typename Cb, typename Cq>
> >>> inline typename if_nonpoly<Cb, bool>::type
> >>> can_div_away_from_zero_p (const poly_int_pod<N, Ca> &a,
> >>>                          Cb b,
> >>>                          poly_int_pod<N, Cq> *quotient)
> >>> {
> >>>   if (!can_div_trunc_p (a, b, quotient))
> >>>     return false;
> >>>   if (maybe_ne (*quotient * b, a))
> >>>     for (unsigned int i = 0; i < N; ++i)
> >>>       quotient->coeffs[i] += (quotient->coeffs[i] < 0 ? -1 : 1);
> >>>   return true;
> >>> }
> >>>
> >>> But I have a question about one case, shown below.
> >>>
> >>> Assume:
> >>> a = [4, 4], b = 8.
> >>>
> >>> can_div_trunc_p checks whether the remainder is constant, i.e. whether a.coeffs[i] % 8 == 0 for i >= 1. If the remainder is not constant, can_div_trunc_p leaves the quotient untouched and returns false.
> >>>
> >>> Thus, when a = [4, 4], can_div_away_from_zero_p leaves *quotient unchanged, i.e. mode_size[E_%smode] is unchanged in this case. However, the underlying mode_size is supposed to be adjusted to the real byte size, and I am not sure whether this is by design or requires additional handling.
> >>
> >> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
> >> bits, even when that means accessing fully-unused bytes?  E.g. 4+4X
> >> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
> >> 8+8X would be 32 bits/4 bytes.  So a store of [8,8] for a precision
> >> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
> >>
> >> Richard
> >>
> >>> Pan
> >>>
> >>> From: 盼 李 <incarnation.p.lee@outlook.com>
> >>> Sent: Tuesday, February 28, 2023 5:59 PM
> >>> To: Richard Sandiford <richard.sandiford@arm.com>; Li, Pan2 <pan2.li@intel.com>
> >>> Cc: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> >>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >>>
> >>> Understood, thanks for the explanations and suggestions. Let me have a try and keep you posted.
> >>>
> >>> Pan
> >>> ________________________________
> >>> From: Richard Sandiford <richard.sandiford@arm.com>
> >>> Sent: Tuesday, February 28, 2023 17:50
> >>> To: Li, Pan2 <pan2.li@intel.com>
> >>> Cc: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> >>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >>>
> >>> "Li, Pan2" <pan2.li@intel.com> writes:
> >>>> Hi Richard Sandiford,
> >>>>
> >>>> After some investigation, I am not sure if it is possible to make it general without any changes to exact_div. We can add one method like below to get the unit poly for all possible N.
> >>>>
> >>>> template<unsigned int N, typename Ca>
> >>>> inline POLY_CONST_RESULT (N, Ca, Ca)
> >>>> normalize_to_unit (const poly_int_pod<N, Ca> &a)
> >>>> {
> >>>>   typedef POLY_CONST_COEFF (Ca, Ca) C;
> >>>>
> >>>>   poly_int<N, C> normalized = a;
> >>>>
> >>>>   if (normalized.is_constant())
> >>>>     normalized.coeffs[0] = 1;
> >>>>   else
> >>>>     for (unsigned int i = 0; i < N; i++)
> >>>>       POLY_SET_COEFF (C, normalized, i, 1);
> >>>>
> >>>>   return normalized;
> >>>> }
> >>>>
> >>>> And then adjust the genmodes like below to consume the unit poly.
> >>>>
> >>>>       printf ("    poly_uint16 unit_poly = "
> >>>>              "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
> >>>>       printf ("    if (known_lt (mode_precision[E_%smode], "
> >>>>              "unit_poly * BITS_PER_UNIT))\n", m->name);
> >>>>       printf ("      mode_size[E_%smode] = unit_poly;\n", m->name);
> >>>>
> >>>> I am not sure if it is a good idea to introduce the above normalize code into exact_div, given that the comment on exact_div says “/* Return A / B, given that A is known to be a multiple of B. */”.
> >>>
> >>> My point was that we have multiple ways of dividing poly_ints:
> >>>
> >>> - exact_div, for when the caller knows that the result is always exact
> >>> - can_div_trunc_p, for truncating division (round towards 0)
> >>> - can_div_away_from_zero_p, for rounding away from 0
> >>> - ...
> >>>
> >>> This is like how we have multiple division *_EXPRs on trees.
> >>>
> >>> Until now, exact_div was the correct choice for modes because vector
> >>> modes didn't have padding.  We're now changing that, so my suggestion
> >>> in the review was to change the division operation that we use.
> >>> Rather than use exact_div, we should now use can_div_away_from_zero_p,
> >>> which would have the effect of rounding the quotient up.
> >>>
> >>> Something like:
> >>>
> >>>       if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
> >>>                                      &mode_size[E_%smode]))
> >>>         gcc_unreachable ();
> >>>
> >>> But this will require a new overload of can_div_away_from_zero_p, since
> >>> the existing one is for constant quotients rather than constant divisors.
> >>>
> >>> Thanks,
> >>> Richard
> >>>
> >>>>
> >>>> Could you please help to share your opinion about this from the expert’s perspective ? Thank you!
> >>>>
> >>>> Pan
> >>>>
> >>>> From: 盼 李 <incarnation.p.lee@outlook.com>
> >>>> Sent: Monday, February 27, 2023 11:13 PM
> >>>> To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
> >>>> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
> >>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >>>>
> >>>> No worries, I hope you had a good holiday.
> >>>>
> >>>> Thanks for pointing this out; the if part cannot handle poly_int with N > 2. As I understand it, we need to make it general for every N of poly_int.
> >>>>
> >>>> So I would like to confirm with you how to make it general. I suppose there will be a new function can_div_away_from_zero_p to replace the if (known_lt (,)) part in genmodes.cc, leaving exact_div unchanged (given the word "exact", I suppose we should not touch it), right? Then we still need a poly_int with all coefficients set to 1 as the result if can_div_away_from_zero_p returns true.
> >>>>
> >>>> Thanks again for your expert suggestions, and have a nice day!
> >>>>
> >>>> Pan
> >>>> ________________________________
> >>>> From: Richard Sandiford <richard.sandiford@arm.com>
> >>>> Sent: Monday, February 27, 2023 22:24
> >>>> To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
> >>>> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
> >>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >>>>
> >>>> Sorry for the slow reply, been away for a couple of weeks.
> >>>>
> >>>> "incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> >>>>> From: Pan Li <pan2.li@intel.com>
> >>>>>
> >>>>>        Fix the rvv bool mode precision bug by adjusting the precision.
> >>>>>        The bit size of vbool*_t will be adjusted to
> >>>>>        [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
> >>>>>        adjusted mode precision of vbool*_t will help the underlying passes
> >>>>>        make the right decisions for both correctness and optimization.
> >>>>>
> >>>>>        Given below sample code:
> >>>>>        void test_1(int8_t * restrict in, int8_t * restrict out)
> >>>>>        {
> >>>>>          vbool8_t v2 = *(vbool8_t*)in;
> >>>>>          vbool16_t v5 = *(vbool16_t*)in;
> >>>>>          *(vbool16_t*)(out + 200) = v5;
> >>>>>          *(vbool8_t*)(out + 100) = v2;
> >>>>>        }
> >>>>>
> >>>>>        Before the precision adjustment:
> >>>>>        addi    a4,a1,100
> >>>>>        vsetvli a5,zero,e8,m1,ta,ma
> >>>>>        addi    a1,a1,200
> >>>>>        vlm.v   v24,0(a0)
> >>>>>        vsm.v   v24,0(a4)
> >>>>>        // Need one vsetvli and vlm.v for correctness here.
> >>>>>        vsm.v   v24,0(a1)
> >>>>>
> >>>>>        After the precision adjustment:
> >>>>>        csrr    t0,vlenb
> >>>>>        slli    t1,t0,1
> >>>>>        csrr    a3,vlenb
> >>>>>        sub     sp,sp,t1
> >>>>>        slli    a4,a3,1
> >>>>>        add     a4,a4,sp
> >>>>>        sub     a3,a4,a3
> >>>>>        vsetvli a5,zero,e8,m1,ta,ma
> >>>>>        addi    a2,a1,200
> >>>>>        vlm.v   v24,0(a0)
> >>>>>        vsm.v   v24,0(a3)
> >>>>>        addi    a1,a1,100
> >>>>>        vsetvli a4,zero,e8,mf2,ta,ma
> >>>>>        csrr    t0,vlenb
> >>>>>        vlm.v   v25,0(a3)
> >>>>>        vsm.v   v25,0(a2)
> >>>>>        slli    t1,t0,1
> >>>>>        vsetvli a5,zero,e8,m1,ta,ma
> >>>>>        vsm.v   v24,0(a1)
> >>>>>        add     sp,sp,t1
> >>>>>        jr      ra
> >>>>>
> >>>>>        However, there may be some optimization opportunities after
> >>>>>        the mode precision adjustment. They can be taken care of in
> >>>>>        the RISC-V backend in separate PR(s).
> >>>>>
> >>>>>        PR 108185
> >>>>>        PR 108654
> >>>>>
> >>>>> gcc/ChangeLog:
> >>>>>
> >>>>>        * config/riscv/riscv-modes.def (ADJUST_PRECISION):
> >>>>>        * config/riscv/riscv.cc (riscv_v_adjust_precision):
> >>>>>        * config/riscv/riscv.h (riscv_v_adjust_precision):
> >>>>>        * genmodes.cc (ADJUST_PRECISION):
> >>>>>        (emit_mode_adjustments):
> >>>>>
> >>>>> gcc/testsuite/ChangeLog:
> >>>>>
> >>>>>        * gcc.target/riscv/pr108185-1.c: New test.
> >>>>>        * gcc.target/riscv/pr108185-2.c: New test.
> >>>>>        * gcc.target/riscv/pr108185-3.c: New test.
> >>>>>        * gcc.target/riscv/pr108185-4.c: New test.
> >>>>>        * gcc.target/riscv/pr108185-5.c: New test.
> >>>>>        * gcc.target/riscv/pr108185-6.c: New test.
> >>>>>        * gcc.target/riscv/pr108185-7.c: New test.
> >>>>>        * gcc.target/riscv/pr108185-8.c: New test.
> >>>>>
> >>>>> Signed-off-by: Pan Li <pan2.li@intel.com>
> >>>>> ---
> >>>>>  gcc/config/riscv/riscv-modes.def            |  8 +++
> >>>>>  gcc/config/riscv/riscv.cc                   | 12 ++++
> >>>>>  gcc/config/riscv/riscv.h                    |  1 +
> >>>>>  gcc/genmodes.cc                             | 25 ++++++-
> >>>>>  gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> >>>>>  gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> >>>>>  gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> >>>>>  gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> >>>>>  gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> >>>>>  gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> >>>>>  gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> >>>>>  gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> >>>>>  12 files changed, 598 insertions(+), 1 deletion(-)
> >>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >>>>>
> >>>>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> >>>>> index d5305efa8a6..110bddce851 100644
> >>>>> --- a/gcc/config/riscv/riscv-modes.def
> >>>>> +++ b/gcc/config/riscv/riscv-modes.def
> >>>>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >>>>>  ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >>>>>  ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
> >>>>>
> >>>>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
> >>>>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
> >>>>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
> >>>>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
> >>>>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
> >>>>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
> >>>>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
> >>>>> +
> >>>>>  /*
> >>>>>     | Mode        | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> >>>>>     |             | LMUL        | SEW/LMUL    | LMUL        | SEW/LMUL    |
> >>>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> >>>>> index de3e1f903c7..cbe66c0e35b 100644
> >>>>> --- a/gcc/config/riscv/riscv.cc
> >>>>> +++ b/gcc/config/riscv/riscv.cc
> >>>>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> >>>>>    return scale;
> >>>>>  }
> >>>>>
> >>>>> +/* Call from ADJUST_PRECISION in riscv-modes.def.  Return the correct
> >>>>> +   PRECISION size for corresponding machine_mode.  */
> >>>>> +
> >>>>> +poly_int64
> >>>>> +riscv_v_adjust_precision (machine_mode mode, int scale)
> >>>>> +{
> >>>>> +  if (riscv_v_ext_vector_mode_p (mode))
> >>>>> +    return riscv_vector_chunks * scale;
> >>>>> +
> >>>>> +  return scale;
> >>>>> +}
> >>>>> +
> >>>>>  /* Return true if X is a valid address for machine mode MODE.  If it is,
> >>>>>     fill in INFO appropriately.  STRICT_P is true if REG_OK_STRICT is in
> >>>>>     effect.  */
> >>>>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> >>>>> index 5bc7f2f467d..15b9317a8ce 100644
> >>>>> --- a/gcc/config/riscv/riscv.h
> >>>>> +++ b/gcc/config/riscv/riscv.h
> >>>>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
> >>>>>  extern unsigned riscv_bytes_per_vector_chunk;
> >>>>>  extern poly_uint16 riscv_vector_chunks;
> >>>>>  extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> >>>>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
> >>>>>  /* The number of bits and bytes in a RVV vector.  */
> >>>>>  #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> >>>>>  #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> >>>>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
> >>>>> index 2d418f09aab..12f4e6335e6 100644
> >>>>> --- a/gcc/genmodes.cc
> >>>>> +++ b/gcc/genmodes.cc
> >>>>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
> >>>>>  static struct mode_adjust *adj_format;
> >>>>>  static struct mode_adjust *adj_ibit;
> >>>>>  static struct mode_adjust *adj_fbit;
> >>>>> +static struct mode_adjust *adj_precision;
> >>>>>
> >>>>>  /* Mode class operations.  */
> >>>>>  static enum mode_class
> >>>>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
> >>>>>  #define ADJUST_NUNITS(M, X)    _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
> >>>>>  #define ADJUST_BYTESIZE(M, X)  _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
> >>>>>  #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
> >>>>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
> >>>>>  #define ADJUST_FLOAT_FORMAT(M, X)    _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
> >>>>>  #define ADJUST_IBIT(M, X)  _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
> >>>>>  #define ADJUST_FBIT(M, X)  _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
> >>>>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
> >>>>>              " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
> >>>>>              m->name, m->name);
> >>>>>        printf ("    mode_precision[E_%smode] = ps * old_factor;\n", m->name);
> >>>>> -      printf ("    mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> >>>>> +      /* Normalize the size to 1 if precision is less than BITS_PER_UNIT.  */
> >>>>> +      printf ("    poly_uint16 size_one = "
> >>>>> +           "mode_precision[E_%smode].is_constant ()\n", m->name);
> >>>>> +      printf ("      ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
> >>>>
> >>>> Have you tried this on an x86_64 system?  I wouldn't expect it to work
> >>>> because of the:
> >>>>
> >>>>   STATIC_ASSERT (N >= 2);
> >>>>
> >>>> in the poly_uint16 constructor.
> >>>>
> >>>>> +      printf ("    if (known_lt (mode_precision[E_%smode], "
> >>>>> +           "size_one * BITS_PER_UNIT))\n", m->name);
> >>>>> +      printf ("      mode_size[E_%smode] = size_one;\n", m->name);
> >>>>> +      printf ("    else\n");
> >>>>> +      printf ("      mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> >>>>
> >>>> Now that the assert implicit in the original exact_div no longer holds,
> >>>> I think we should instead generalise it to can_div_away_from_zero_p
> >>>> (which will involve defining a new overload of can_div_away_from_zero_p).
> >>>> I think that will give the same result as the code above for the cases
> >>>> that the code above handles.  But it should be more general too.
> >>>>
> >>>> TBH, I'm still sceptical that this is all that is needed.  It seems
> >>>> unlikely that we've been so good at writing vector support code that
> >>>> we've made it work for precision < bitsize, despite that being an
> >>>> unsupported combination until now.  But I guess we can fix problems
> >>>> on a case-by-case basis.
> >>>>
> >>>> Thanks,
> >>>> Richard
> >>>>
> >>>>>              " BITS_PER_UNIT);\n", m->name, m->name);
> >>>>>        printf ("    mode_nunits[E_%smode] = ps;\n", m->name);
> >>>>>        printf ("    adjust_mode_mask (E_%smode);\n", m->name);
> >>>>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
> >>>>>      printf ("\n  /* %s:%d */\n  REAL_MODE_FORMAT (E_%smode) = %s;\n",
> >>>>>            a->file, a->line, a->mode->name, a->adjustment);
> >>>>>
> >>>>> +  /* Adjust precision to the actual bits size.  */
> >>>>> +  for (a = adj_precision; a; a = a->next)
> >>>>> +    switch (a->mode->cl)
> >>>>> +      {
> >>>>> +     case MODE_VECTOR_BOOL:
> >>>>> +       printf ("\n  /* %s:%d.  */\n  ps = %s;\n", a->file, a->line,
> >>>>> +               a->adjustment);
> >>>>> +       printf ("  mode_precision[E_%smode] = ps;\n", a->mode->name);
> >>>>> +       break;
> >>>>> +     default:
> >>>>> +       break;
> >>>>> +      }
> >>>>> +
> >>>>>    puts ("}");
> >>>>>  }
> >>>>>
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..e70960c5b6d
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool1_t v1 = *(vbool1_t*)in;
> >>>>> +    vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> +    *(vbool1_t*)(out + 100) = v1;
> >>>>> +    *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool1_t v1 = *(vbool1_t*)in;
> >>>>> +    vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> +    *(vbool1_t*)(out + 100) = v1;
> >>>>> +    *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool1_t v1 = *(vbool1_t*)in;
> >>>>> +    vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> +    *(vbool1_t*)(out + 100) = v1;
> >>>>> +    *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool1_t v1 = *(vbool1_t*)in;
> >>>>> +    vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> +    *(vbool1_t*)(out + 100) = v1;
> >>>>> +    *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool1_t v1 = *(vbool1_t*)in;
> >>>>> +    vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> +    *(vbool1_t*)(out + 100) = v1;
> >>>>> +    *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool1_t v1 = *(vbool1_t*)in;
> >>>>> +    vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> +    *(vbool1_t*)(out + 100) = v1;
> >>>>> +    *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..dcc7a644a88
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool2_t v1 = *(vbool2_t*)in;
> >>>>> +    vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> +    *(vbool2_t*)(out + 100) = v1;
> >>>>> +    *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool2_t v1 = *(vbool2_t*)in;
> >>>>> +    vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> +    *(vbool2_t*)(out + 100) = v1;
> >>>>> +    *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool2_t v1 = *(vbool2_t*)in;
> >>>>> +    vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> +    *(vbool2_t*)(out + 100) = v1;
> >>>>> +    *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool2_t v1 = *(vbool2_t*)in;
> >>>>> +    vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> +    *(vbool2_t*)(out + 100) = v1;
> >>>>> +    *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool2_t v1 = *(vbool2_t*)in;
> >>>>> +    vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> +    *(vbool2_t*)(out + 100) = v1;
> >>>>> +    *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool2_t v1 = *(vbool2_t*)in;
> >>>>> +    vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> +    *(vbool2_t*)(out + 100) = v1;
> >>>>> +    *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..3af0513e006
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool4_t v1 = *(vbool4_t*)in;
> >>>>> +    vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> +    *(vbool4_t*)(out + 100) = v1;
> >>>>> +    *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool4_t v1 = *(vbool4_t*)in;
> >>>>> +    vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> +    *(vbool4_t*)(out + 100) = v1;
> >>>>> +    *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool4_t v1 = *(vbool4_t*)in;
> >>>>> +    vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> +    *(vbool4_t*)(out + 100) = v1;
> >>>>> +    *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool4_t v1 = *(vbool4_t*)in;
> >>>>> +    vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> +    *(vbool4_t*)(out + 100) = v1;
> >>>>> +    *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool4_t v1 = *(vbool4_t*)in;
> >>>>> +    vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> +    *(vbool4_t*)(out + 100) = v1;
> >>>>> +    *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool4_t v1 = *(vbool4_t*)in;
> >>>>> +    vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> +    *(vbool4_t*)(out + 100) = v1;
> >>>>> +    *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..ea3c360d756
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool8_t v1 = *(vbool8_t*)in;
> >>>>> +    vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> +    *(vbool8_t*)(out + 100) = v1;
> >>>>> +    *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool8_t v1 = *(vbool8_t*)in;
> >>>>> +    vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> +    *(vbool8_t*)(out + 100) = v1;
> >>>>> +    *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool8_t v1 = *(vbool8_t*)in;
> >>>>> +    vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> +    *(vbool8_t*)(out + 100) = v1;
> >>>>> +    *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool8_t v1 = *(vbool8_t*)in;
> >>>>> +    vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> +    *(vbool8_t*)(out + 100) = v1;
> >>>>> +    *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool8_t v1 = *(vbool8_t*)in;
> >>>>> +    vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> +    *(vbool8_t*)(out + 100) = v1;
> >>>>> +    *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool8_t v1 = *(vbool8_t*)in;
> >>>>> +    vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> +    *(vbool8_t*)(out + 100) = v1;
> >>>>> +    *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..9fc659d2402
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool16_t v1 = *(vbool16_t*)in;
> >>>>> +    vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> +    *(vbool16_t*)(out + 100) = v1;
> >>>>> +    *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool16_t v1 = *(vbool16_t*)in;
> >>>>> +    vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> +    *(vbool16_t*)(out + 100) = v1;
> >>>>> +    *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool16_t v1 = *(vbool16_t*)in;
> >>>>> +    vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> +    *(vbool16_t*)(out + 100) = v1;
> >>>>> +    *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool16_t v1 = *(vbool16_t*)in;
> >>>>> +    vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> +    *(vbool16_t*)(out + 100) = v1;
> >>>>> +    *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool16_t v1 = *(vbool16_t*)in;
> >>>>> +    vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> +    *(vbool16_t*)(out + 100) = v1;
> >>>>> +    *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool16_t v1 = *(vbool16_t*)in;
> >>>>> +    vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> +    *(vbool16_t*)(out + 100) = v1;
> >>>>> +    *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..98275e5267d
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool32_t v1 = *(vbool32_t*)in;
> >>>>> +    vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> +    *(vbool32_t*)(out + 100) = v1;
> >>>>> +    *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool32_t v1 = *(vbool32_t*)in;
> >>>>> +    vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> +    *(vbool32_t*)(out + 100) = v1;
> >>>>> +    *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool32_t v1 = *(vbool32_t*)in;
> >>>>> +    vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> +    *(vbool32_t*)(out + 100) = v1;
> >>>>> +    *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool32_t v1 = *(vbool32_t*)in;
> >>>>> +    vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> +    *(vbool32_t*)(out + 100) = v1;
> >>>>> +    *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool32_t v1 = *(vbool32_t*)in;
> >>>>> +    vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> +    *(vbool32_t*)(out + 100) = v1;
> >>>>> +    *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool32_t v1 = *(vbool32_t*)in;
> >>>>> +    vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> +    *(vbool32_t*)(out + 100) = v1;
> >>>>> +    *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..8f6f0b11f09
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool64_t v1 = *(vbool64_t*)in;
> >>>>> +    vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> +    *(vbool64_t*)(out + 100) = v1;
> >>>>> +    *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool64_t v1 = *(vbool64_t*)in;
> >>>>> +    vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> +    *(vbool64_t*)(out + 100) = v1;
> >>>>> +    *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool64_t v1 = *(vbool64_t*)in;
> >>>>> +    vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> +    *(vbool64_t*)(out + 100) = v1;
> >>>>> +    *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool64_t v1 = *(vbool64_t*)in;
> >>>>> +    vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> +    *(vbool64_t*)(out + 100) = v1;
> >>>>> +    *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool64_t v1 = *(vbool64_t*)in;
> >>>>> +    vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> +    *(vbool64_t*)(out + 100) = v1;
> >>>>> +    *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool64_t v1 = *(vbool64_t*)in;
> >>>>> +    vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> +    *(vbool64_t*)(out + 100) = v1;
> >>>>> +    *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..d96959dd064
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >>>>> @@ -0,0 +1,77 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool1_t v1 = *(vbool1_t*)in;
> >>>>> +    vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> +    *(vbool1_t*)(out + 100) = v1;
> >>>>> +    *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool2_t v1 = *(vbool2_t*)in;
> >>>>> +    vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> +    *(vbool2_t*)(out + 100) = v1;
> >>>>> +    *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool4_t v1 = *(vbool4_t*)in;
> >>>>> +    vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> +    *(vbool4_t*)(out + 100) = v1;
> >>>>> +    *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool8_t v1 = *(vbool8_t*)in;
> >>>>> +    vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> +    *(vbool8_t*)(out + 100) = v1;
> >>>>> +    *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool16_t v1 = *(vbool16_t*)in;
> >>>>> +    vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> +    *(vbool16_t*)(out + 100) = v1;
> >>>>> +    *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool32_t v1 = *(vbool32_t*)in;
> >>>>> +    vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> +    *(vbool32_t*)(out + 100) = v1;
> >>>>> +    *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> +    vbool64_t v1 = *(vbool64_t*)in;
> >>>>> +    vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> +    *(vbool64_t*)(out + 100) = v1;
> >>>>> +    *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
---1609957120-618482312-1677673989=:27913--