From: Richard Sandiford
To: 盼 李 via Gcc-patches
Cc: juzhe.zhong@rivai.ai, pan2.li, 盼 李, Kito.cheng, rguenther
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
References: <6F429B8CF9C3B3EF+2023030118462950633032@rivai.ai>
Date: Wed, 01 Mar 2023 11:11:42 +0000

盼 李 via Gcc-patches writes:
> Thank you all for your quick response.
>
> As juzhe mentioned, the memory access of RISC-V will be always aligned to the bytes boundary with the compact mode, aka ceil(vl / 8) bytes for vbool*.

OK, thanks to both of you.  This is what I'd have expected.

In that case, I think both the can_div_away_from_zero_p and the
original patch (using size_one) will give the wrong results.
There isn't a way of representing ceil([4,4]/8) as a poly_int.
The result is (4+4X)/8 when X is odd and (8+4X)/8 when X is even.

> Actually, the data [4,4] comes from the self-test, the RISC-V precision mode as below.
>
> VNx64BI precision [0x40, 0x40].
> VNx32BI precision [0x20, 0x20].
> VNx16BI precision [0x10, 0x10].
> VNx8BI precision [0x8, 0x8].
> VNx4BI precision [0x8, 0x8].
> VNx2BI precision [0x8, 0x8].
> VNx1BI precision [0x8, 0x8].

Ah, OK.  Which self-test causes this?

Richard

> The data [4, 4] will impact the genmodes part; we cannot write it like below, as the gcc_unreachable will be hit.
>
> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT, &mode_size[E_%smode]))
>   gcc_unreachable (); // Hit on [4, 4] of the self-test.
>
> Pan
> ________________________________
> From: juzhe.zhong@rivai.ai
> Sent: Wednesday, March 1, 2023 18:46
> To: richard.sandiford ; pan2.li
> Cc: incarnation.p.lee ; gcc-patches ; Kito.cheng ; rguenther
> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
>>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
>>>bits, even when that means accessing fully-unused bytes?  E.g. 4+4X
>>>when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
>>>8+8X would be 32 bits/4 bytes.  So a store of [8,8] for a precision
>>>of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>
> Hi, Richard. Thank you for helping us.
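
The rounding argument in Richard's reply above (that ceil([4,4]/8) has no poly_int representation) can be checked with a small standalone sketch. It does not use GCC's poly_int classes: `eval_poly`, `bytes_for_bits` and `ceil_div8_is_affine` are hypothetical helpers written only for this illustration, and X is assumed to be a non-negative runtime integer, following the [c0, c1] = c0 + c1*X notation used in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Value of the two-coefficient polynomial [c0, c1] = c0 + c1*X at a
// concrete runtime X.  Hypothetical helper, not GCC's poly_int.
constexpr int64_t eval_poly(int64_t c0, int64_t c1, int64_t x) {
  return c0 + c1 * x;
}

// Number of whole bytes needed to hold `bits` bits: ceil(bits / 8).
constexpr int64_t bytes_for_bits(int64_t bits) {
  return (bits + 7) / 8;
}

// Return true if ceil((c0 + c1*X) / 8) equals a fixed a + b*X for every
// X in [0, max_x].  a and b are forced by the values at X = 0 and X = 1,
// so any mismatch at a later X means no such pair exists.
constexpr bool ceil_div8_is_affine(int64_t c0, int64_t c1, int64_t max_x) {
  const int64_t a = bytes_for_bits(eval_poly(c0, c1, 0));
  const int64_t b = bytes_for_bits(eval_poly(c0, c1, 1)) - a;
  for (int64_t x = 2; x <= max_x; ++x)
    if (bytes_for_bits(eval_poly(c0, c1, x)) != a + b * x)
      return false;
  return true;
}

// Precision [4,4]: byte sizes are 1, 1, 2, 2, 3 for X = 0..4, which no
// single a + b*X can produce.
static_assert(!ceil_div8_is_affine(4, 4, 16), "[4,4] bits: size is not a poly");
// Precision [8,8]: byte sizes are 1, 2, 3, ... = 1 + 1*X.
static_assert(ceil_div8_is_affine(8, 8, 16), "[8,8] bits: size is [1,1]");
```

The same helpers reproduce the numbers in the quoted question: at X=3 a precision of [4,4] is 16 bits (2 bytes), while a bitsize of [8,8] is 32 bits (4 bytes).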
> My understanding of the RVV ISA:
>
> In RVV, we have mask modes (VNx1BI, VNx2BI, etc.) and data modes (VNx1QI, VNx4QI, VNx2DI, etc.).
> For data modes, we access the data fully and there are no unused bytes, so we don't need to adjust the precision.
> However, for mask modes we access the mask bits in a compact layout (the mask bits for the corresponding elements are consecutive).
> For example, in the current configuration, VNx1BI, VNx2BI, VNx4BI and VNx8BI have the same bytesize (1,1) but different bitsizes.
>
> VNx8BI is accessed fully, but only 1/2 of VNx4BI is accessed, 1/4 of VNx2BI and 1/8 of VNx1BI, with byte alignment (I am not sure whether RVV supports bit alignment; I guess it cannot).
>
> If VNx8BI occupies only 1 byte (depending on the machine vector length), then VNx2BI, VNx4BI and VNx1BI are 2/8 byte, 4/8 byte and 1/8 byte. I think we can't access memory with bit alignment, so they will be the same in the access.
> However, if VNx8BI occupies 8 bytes, then VNx1BI, VNx2BI and VNx4BI are 1 byte, 2 bytes and 4 bytes. They access different sizes.
>
> This is my understanding of the RVV ISA; feel free to correct me.
> Thanks.
>
> ________________________________
> juzhe.zhong@rivai.ai
>
> From: Richard Sandiford
> Date: 2023-03-01 18:11
> To: Li, Pan2
> CC: 盼 李; incarnation.p.lee--- via Gcc-patches; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> "Li, Pan2" writes:
>> Hi Richard Sandiford,
>>
>> Just tried the overloaded constant divisors with the print below, and it works as you mentioned!
>>
>> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
>>         "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>>
>> template
>> inline typename if_nonpoly::type
>> can_div_away_from_zero_p (const poly_int_pod &a,
>>                           Cb b,
>>                           poly_int_pod *quotient)
>> {
>>   if (!can_div_trunc_p (a, b, quotient))
>>     return false;
>>   if (maybe_ne (*quotient * b, a))
>>     for (unsigned int i = 0; i < N; ++i)
>>       quotient->coeffs[i] += (quotient->coeffs[i] < 0 ? -1 : 1);
>>   return true;
>> }
>>
>> But I have a question about the one case below.
>>
>> Assume:
>> a = [4, 4], b = 8.
>>
>> can_div_trunc_p checks whether the remainder is constant, aka a.coeffs[i] % 8 == 0 (i >= 1). If the remainder is not constant, can_div_trunc_p will do nothing to the quotient and return false.
>>
>> Thus, when a = [4, 4] for can_div_away_from_zero_p, the output *quotient will be unchanged, aka mode_size[E_%smode] will be unchanged for this case. However, the underlying mode_size will adjust it to the real byte size, and I am not sure if it is by design or requires additional handling.
>
> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
> bits, even when that means accessing fully-unused bytes?  E.g. 4+4X
> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
> 8+8X would be 32 bits/4 bytes.  So a store of [8,8] for a precision
> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>
> Richard
>
>> Pan
>>
>> From: 盼 李
>> Sent: Tuesday, February 28, 2023 5:59 PM
>> To: Richard Sandiford ; Li, Pan2
>> Cc: incarnation.p.lee--- via Gcc-patches ; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>> Understood, thanks for the explanations and suggestions. Let me have a try and keep you posted.
>>
>> Pan
>> ________________________________
>> From: Richard Sandiford
>> Sent: Tuesday, February 28, 2023 17:50
>> To: Li, Pan2
>> Cc: 盼 李; incarnation.p.lee--- via Gcc-patches; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>> "Li, Pan2" writes:
>>> Hi Richard Sandiford,
>>>
>>> After some investigation, I am not sure if it is possible to make it general without any changes to exact_div. We can add one method like below to get the unit poly for all possible N.
>>>
>>> template
>>> inline POLY_CONST_RESULT (N, Ca, Ca)
>>> normalize_to_unit (const poly_int_pod &a)
>>> {
>>>   typedef POLY_CONST_COEFF (Ca, Ca) C;
>>>
>>>   poly_int normalized = a;
>>>
>>>   if (normalized.is_constant())
>>>     normalized.coeffs[0] = 1;
>>>   else
>>>     for (unsigned int i = 0; i < N; i++)
>>>       POLY_SET_COEFF (C, normalized, i, 1);
>>>
>>>   return normalized;
>>> }
>>>
>>> And then adjust the genmodes like below to consume the unit poly.
>>>
>>> printf (" poly_uint16 unit_poly = "
>>>         "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
>>> printf (" if (known_lt (mode_precision[E_%smode], "
>>>         "unit_poly * BITS_PER_UNIT))\n", m->name);
>>> printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
>>>
>>> I am not sure if it is a good idea to introduce above normalize code into exact_div. Given the comment of the exact_div indicates that “/* Return A / B, given that A is known to be a multiple of B. */”.
>>
>> My point was that we have multiple ways of dividing poly_ints:
>>
>> - exact_div, for when the caller knows that the result is always exact
>> - can_div_trunc_p, for truncating division (round towards 0)
>> - can_div_away_from_zero_p, for rounding away from 0
>> - ...
>>
>> This is like how we have multiple division *_EXPRs on trees.
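
The three rounding behaviours listed above can be illustrated on plain integers. This is only a sketch under that simplification: `div_exact`, `div_trunc` and `div_away_from_zero` are hypothetical names for scalar analogues, not the poly_int routines themselves, which additionally have to report whether a polynomial quotient exists at all.

```cpp
#include <cassert>
#include <cstdint>

// Scalar analogue of exact division: the caller promises that b divides a.
inline int64_t div_exact(int64_t a, int64_t b) {
  assert(b != 0 && a % b == 0);
  return a / b;
}

// Scalar analogue of truncating division: C++ integer division already
// rounds the quotient towards zero.
constexpr int64_t div_trunc(int64_t a, int64_t b) {
  return a / b;
}

// Scalar analogue of division rounding away from zero: start from the
// truncated quotient and, if it was inexact, step one further from zero
// in the direction given by the signs of the operands.
constexpr int64_t div_away_from_zero(int64_t a, int64_t b) {
  int64_t q = a / b;
  if (q * b != a)
    q += ((a < 0) != (b < 0)) ? -1 : 1;
  return q;
}
```

For a mode with a constant precision of 4 bits and BITS_PER_UNIT equal to 8, `div_trunc (4, 8)` gives 0 bytes, `div_away_from_zero (4, 8)` gives 1 byte, and `div_exact (4, 8)` would trip its assertion because 8 does not divide 4.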
>>
>> Until now, exact_div was the correct choice for modes because vector
>> modes didn't have padding.  We're now changing that, so my suggestion
>> in the review was to change the division operation that we use.
>> Rather than use exact_div, we should now use can_div_away_from_zero_p,
>> which would have the effect of rounding the quotient up.
>>
>> Something like:
>>
>>   if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
>>                                  &mode_size[E_%smode]))
>>     gcc_unreachable ();
>>
>> But this will require a new overload of can_div_away_from_zero_p, since
>> the existing one is for constant quotients rather than constant divisors.
>>
>> Thanks,
>> Richard
>>
>>>
>>> Could you please help to share your opinion about this from the expert’s perspective? Thank you!
>>>
>>> Pan
>>>
>>> From: 盼 李
>>> Sent: Monday, February 27, 2023 11:13 PM
>>> To: Richard Sandiford; incarnation.p.lee--- via Gcc-patches
>>> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> Never mind, wish you have a good holiday.
>>>
>>> Thanks for pointing this out, the if part cannot take care of poly_int with N > 2. As I understand, we need to make it general for all the N of poly_int.
>>>
>>> Thus I would like to double confirm with you about how to make it general. I suppose there will be a new function can_div_away_from_zero_p to replace the if (known_lt(,)) part in genmodes.cc, and leave exact_div unchanged (consider the word exact, I suppose we should not touch here), right? Then we still need one poly_int with all 1 for N as the return if can_div_away_from_zero_p is true.
>>>
>>> Thanks again for your professional suggestion, have a nice day!
>>>
>>> Pan
>>> ________________________________
>>> From: Richard Sandiford
>>> Sent: Monday, February 27, 2023 22:24
>>> To: incarnation.p.lee--- via Gcc-patches
>>> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> Sorry for the slow reply, been away for a couple of weeks.
>>>
>>> "incarnation.p.lee--- via Gcc-patches" writes:
>>>> From: Pan Li
>>>>
>>>> Fix the bug of the rvv bool mode precision with the adjustment.
>>>> The bits size of vbool*_t will be adjusted to
>>>> [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
>>>> adjusted mode precision of vbool*_t will help underlying pass to
>>>> make the right decision for both the correctness and optimization.
>>>>
>>>> Given below sample code:
>>>> void test_1(int8_t * restrict in, int8_t * restrict out)
>>>> {
>>>>   vbool8_t v2 = *(vbool8_t*)in;
>>>>   vbool16_t v5 = *(vbool16_t*)in;
>>>>   *(vbool16_t*)(out + 200) = v5;
>>>>   *(vbool8_t*)(out + 100) = v2;
>>>> }
>>>>
>>>> Before the precision adjustment:
>>>> addi a4,a1,100
>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>> addi a1,a1,200
>>>> vlm.v v24,0(a0)
>>>> vsm.v v24,0(a4)
>>>> // Need one vsetvli and vlm.v for correctness here.
>>>> vsm.v v24,0(a1)
>>>>
>>>> After the precision adjustment:
>>>> csrr t0,vlenb
>>>> slli t1,t0,1
>>>> csrr a3,vlenb
>>>> sub sp,sp,t1
>>>> slli a4,a3,1
>>>> add a4,a4,sp
>>>> sub a3,a4,a3
>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>> addi a2,a1,200
>>>> vlm.v v24,0(a0)
>>>> vsm.v v24,0(a3)
>>>> addi a1,a1,100
>>>> vsetvli a4,zero,e8,mf2,ta,ma
>>>> csrr t0,vlenb
>>>> vlm.v v25,0(a3)
>>>> vsm.v v25,0(a2)
>>>> slli t1,t0,1
>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>> vsm.v v24,0(a1)
>>>> add sp,sp,t1
>>>> jr ra
>>>>
>>>> However, there may be some optimization opportunities after
>>>> the mode precision adjustment. They can be taken care of in
>>>> the RISC-V backend in separate follow-up PR(s).
>>>>
>>>> PR 108185
>>>> PR 108654
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>>>> * config/riscv/riscv.cc (riscv_v_adjust_precision):
>>>> * config/riscv/riscv.h (riscv_v_adjust_precision):
>>>> * genmodes.cc (ADJUST_PRECISION):
>>>> (emit_mode_adjustments):
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> * gcc.target/riscv/pr108185-1.c: New test.
>>>> * gcc.target/riscv/pr108185-2.c: New test.
>>>> * gcc.target/riscv/pr108185-3.c: New test.
>>>> * gcc.target/riscv/pr108185-4.c: New test.
>>>> * gcc.target/riscv/pr108185-5.c: New test.
>>>> * gcc.target/riscv/pr108185-6.c: New test.
>>>> * gcc.target/riscv/pr108185-7.c: New test.
>>>> * gcc.target/riscv/pr108185-8.c: New test.
>>>>
>>>> Signed-off-by: Pan Li
>>>> ---
>>>>  gcc/config/riscv/riscv-modes.def            |  8 +++
>>>>  gcc/config/riscv/riscv.cc                   | 12 ++++
>>>>  gcc/config/riscv/riscv.h                    |  1 +
>>>>  gcc/genmodes.cc                             | 25 ++++++-
>>>>  gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
>>>>  gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
>>>>  gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
>>>>  gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
>>>>  gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
>>>>  gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
>>>>  gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
>>>>  gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
>>>>  12 files changed, 598 insertions(+), 1 deletion(-)
>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>
>>>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
>>>> index d5305efa8a6..110bddce851 100644
>>>> --- a/gcc/config/riscv/riscv-modes.def
>>>> +++ b/gcc/config/riscv/riscv-modes.def
>>>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>>  ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>>  ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>>>>
>>>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
>>>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
>>>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
>>>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
>>>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
>>>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
>>>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
>>>> +
>>>>  /*
>>>>     | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
>>>>     |      | LMUL        | SEW/LMUL    | LMUL        | SEW/LMUL    |
>>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>>> index de3e1f903c7..cbe66c0e35b 100644
>>>> --- a/gcc/config/riscv/riscv.cc
>>>> +++ b/gcc/config/riscv/riscv.cc
>>>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
>>>>    return scale;
>>>>  }
>>>>
>>>> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
>>>> +   PRECISION size for corresponding machine_mode.  */
>>>> +
>>>> +poly_int64
>>>> +riscv_v_adjust_precision (machine_mode mode, int scale)
>>>> +{
>>>> +  if (riscv_v_ext_vector_mode_p (mode))
>>>> +    return riscv_vector_chunks * scale;
>>>> +
>>>> +  return scale;
>>>> +}
>>>> +
>>>>  /* Return true if X is a valid address for machine mode MODE.  If it is,
>>>>     fill in INFO appropriately.  STRICT_P is true if REG_OK_STRICT is in
>>>>     effect.  */
>>>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>>>> index 5bc7f2f467d..15b9317a8ce 100644
>>>> --- a/gcc/config/riscv/riscv.h
>>>> +++ b/gcc/config/riscv/riscv.h
>>>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
>>>>  extern unsigned riscv_bytes_per_vector_chunk;
>>>>  extern poly_uint16 riscv_vector_chunks;
>>>>  extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
>>>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
>>>>  /* The number of bits and bytes in a RVV vector.  */
>>>>  #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
>>>>  #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
>>>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
>>>> index 2d418f09aab..12f4e6335e6 100644
>>>> --- a/gcc/genmodes.cc
>>>> +++ b/gcc/genmodes.cc
>>>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
>>>>  static struct mode_adjust *adj_format;
>>>>  static struct mode_adjust *adj_ibit;
>>>>  static struct mode_adjust *adj_fbit;
>>>> +static struct mode_adjust *adj_precision;
>>>>
>>>>  /* Mode class operations.  */
>>>>  static enum mode_class
>>>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
>>>>  #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
>>>>  #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
>>>>  #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
>>>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
>>>>  #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
>>>>  #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
>>>>  #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
>>>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
>>>>            " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
>>>>            m->name, m->name);
>>>>    printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
>>>> -  printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>> +  /* Normalize the size to 1 if precison is less than BITS_PER_UNIT. */
>>>> +  printf (" poly_uint16 size_one = "
>>>> +          "mode_precision[E_%smode].is_constant ()\n", m->name);
>>>> +  printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
>>>
>>> Have you tried this on an x86_64 system?  I wouldn't expect it to work
>>> because of the:
>>>
>>>   STATIC_ASSERT (N >= 2);
>>>
>>> in the poly_uint16 constructor.
>>>
>>>> +  printf (" if (known_lt (mode_precision[E_%smode], "
>>>> +          "size_one * BITS_PER_UNIT))\n", m->name);
>>>> +  printf (" mode_size[E_%smode] = size_one;\n", m->name);
>>>> +  printf (" else\n");
>>>> +  printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>
>>> Now that the assert implicit in the original exact_div no longer holds,
>>> I think we should instead generalise it to can_div_away_from_zero_p
>>> (which will involve defining a new overload of can_div_away_from_zero_p).
>>> I think that will give the same result as the code above for the cases
>>> that the code above handles.  But it should be more general too.
>>>
>>> TBH, I'm still sceptical that this is all that is needed.  It seems
>>> unlikely that we've been so good at writing vector support code that
>>> we've made it work for precision < bitsize, despite that being an
>>> unsupported combination until now.  But I guess we can fix problems
>>> on a case-by-case basis.
>>>
>>> Thanks,
>>> Richard
>>>
>>>>            " BITS_PER_UNIT);\n", m->name, m->name);
>>>>    printf (" mode_nunits[E_%smode] = ps;\n", m->name);
>>>>    printf (" adjust_mode_mask (E_%smode);\n", m->name);
>>>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
>>>>    printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
>>>>            a->file, a->line, a->mode->name, a->adjustment);
>>>>
>>>> +  /* Adjust precision to the actual bits size. */
>>>> +  for (a = adj_precision; a; a = a->next)
>>>> +    switch (a->mode->cl)
>>>> +      {
>>>> +      case MODE_VECTOR_BOOL:
>>>> +        printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
>>>> +                a->adjustment);
>>>> +        printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
>>>> +        break;
>>>> +      default:
>>>> +        break;
>>>> +      }
>>>> +
>>>>    puts ("}");
>>>>  }
>>>>
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>> new file mode 100644
>>>> index 00000000000..e70960c5b6d
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> +  vbool1_t v1 = *(vbool1_t*)in;
>>>> +  vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> +  *(vbool1_t*)(out + 100) = v1;
>>>> +  *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> +  vbool1_t v1 = *(vbool1_t*)in;
>>>> +  vbool4_t v2 = *(vbool4_t*)in;
>>>> + >>>> + *(vbool1_t*)(out + 100) =3D v1; >>>> + *(vbool4_t*)(out + 200) =3D v2; >>>> +} >>>> + >>>> +void >>>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) { >>>> + vbool1_t v1 =3D *(vbool1_t*)in; >>>> + vbool8_t v2 =3D *(vbool8_t*)in; >>>> + >>>> + *(vbool1_t*)(out + 100) =3D v1; >>>> + *(vbool8_t*)(out + 200) =3D v2; >>>> +} >>>> + >>>> +void >>>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out)= { >>>> + vbool1_t v1 =3D *(vbool1_t*)in; >>>> + vbool16_t v2 =3D *(vbool16_t*)in; >>>> + >>>> + *(vbool1_t*)(out + 100) =3D v1; >>>> + *(vbool16_t*)(out + 200) =3D v2; >>>> +} >>>> + >>>> +void >>>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out)= { >>>> + vbool1_t v1 =3D *(vbool1_t*)in; >>>> + vbool32_t v2 =3D *(vbool32_t*)in; >>>> + >>>> + *(vbool1_t*)(out + 100) =3D v1; >>>> + *(vbool32_t*)(out + 200) =3D v2; >>>> +} >>>> + >>>> +void >>>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out)= { >>>> + vbool1_t v1 =3D *(vbool1_t*)in; >>>> + vbool64_t v2 =3D *(vbool64_t*)in; >>>> + >>>> + *(vbool1_t*)(out + 100) =3D v1; >>>> + *(vbool64_t*)(out + 200) =3D v2; >>>> +} >>>> + >>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\= s*e8,\s*m8,\s*ta,\s*ma} 6 } } */ >>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\= s*e8,\s*m4,\s*ta,\s*ma} 1 } } */ >>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\= s*e8,\s*m2,\s*ta,\s*ma} 1 } } */ >>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\= s*e8,\s*m1,\s*ta,\s*ma} 1 } } */ >>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\= s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */ >>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\= s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */ >>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\= s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */ >>>> +/* { dg-final { scan-assembler-times 
{vlm\.v\s+v[0-9]+,\s*0\([a-x][0-= 9]+\)} 12 } } */ >>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-= 9]+\)} 18 } } */ >>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsui= te/gcc.target/riscv/pr108185-2.c >>>> new file mode 100644 >>>> index 00000000000..dcc7a644a88 >>>> --- /dev/null >>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c >>>> @@ -0,0 +1,68 @@ >>>> +/* { dg-do compile } */ >>>> +/* { dg-options "-march=3Drv64gcv -mabi=3Dlp64 -O3" } */ >>>> + >>>> +#include "riscv_vector.h" >>>> + >>>> +void >>>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) { >>>> + vbool2_t v1 =3D *(vbool2_t*)in; >>>> + vbool1_t v2 =3D *(vbool1_t*)in; >>>> + >>>> + *(vbool2_t*)(out + 100) =3D v1; >>>> + *(vbool1_t*)(out + 200) =3D v2; >>>> +} >>>> + >>>> +void >>>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) { >>>> + vbool2_t v1 =3D *(vbool2_t*)in; >>>> + vbool4_t v2 =3D *(vbool4_t*)in; >>>> + >>>> + *(vbool2_t*)(out + 100) =3D v1; >>>> + *(vbool4_t*)(out + 200) =3D v2; >>>> +} >>>> + >>>> +void >>>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) { >>>> + vbool2_t v1 =3D *(vbool2_t*)in; >>>> + vbool8_t v2 =3D *(vbool8_t*)in; >>>> + >>>> + *(vbool2_t*)(out + 100) =3D v1; >>>> + *(vbool8_t*)(out + 200) =3D v2; >>>> +} >>>> + >>>> +void >>>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out)= { >>>> + vbool2_t v1 =3D *(vbool2_t*)in; >>>> + vbool16_t v2 =3D *(vbool16_t*)in; >>>> + >>>> + *(vbool2_t*)(out + 100) =3D v1; >>>> + *(vbool16_t*)(out + 200) =3D v2; >>>> +} >>>> + >>>> +void >>>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out)= { >>>> + vbool2_t v1 =3D *(vbool2_t*)in; >>>> + vbool32_t v2 =3D *(vbool32_t*)in; >>>> + >>>> + *(vbool2_t*)(out + 100) =3D v1; >>>> + *(vbool32_t*)(out + 200) =3D v2; >>>> +} >>>> + >>>> +void >>>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out)= { >>>> + vbool2_t 
v1 =3D *(vbool2_t*)in; >>>> + vbool64_t v2 =3D *(vbool64_t*)in; >>>> + >>>> + *(vbool2_t*)(out + 100) =3D v1; >>>> + *(vbool64_t*)(out + 200) =3D v2; >>>> +} >>>> + >>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\= s*e8,\s*m4,\s*ta,\s*ma} 6 } } */ >>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\= s*e8,\s*m2,\s*ta,\s*ma} 1 } } */ >>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\= s*e8,\s*m1,\s*ta,\s*ma} 1 } } */ >>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\= s*e8,\s*m8,\s*ta,\s*ma} 1 } } */ >>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\= s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */ >>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\= s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */ >>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\= s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */ >>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-= 9]+\)} 12 } } */ >>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-= 9]+\)} 17 } } */ >>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsui= te/gcc.target/riscv/pr108185-3.c >>>> new file mode 100644 >>>> index 00000000000..3af0513e006 >>>> --- /dev/null >>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c >>>> @@ -0,0 +1,68 @@ >>>> +/* { dg-do compile } */ >>>> +/* { dg-options "-march=3Drv64gcv -mabi=3Dlp64 -O3" } */ >>>> + >>>> +#include "riscv_vector.h" >>>> + >>>> +void >>>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) { >>>> + vbool4_t v1 =3D *(vbool4_t*)in; >>>> + vbool1_t v2 =3D *(vbool1_t*)in; >>>> + >>>> + *(vbool4_t*)(out + 100) =3D v1; >>>> + *(vbool1_t*)(out + 200) =3D v2; >>>> +} >>>> + >>>> +void >>>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) { >>>> + vbool4_t v1 =3D *(vbool4_t*)in; >>>> + vbool2_t v2 =3D *(vbool2_t*)in; >>>> + >>>> + *(vbool4_t*)(out + 
100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>> new file mode 100644
>>>> index 00000000000..ea3c360d756
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>> new file mode 100644
>>>> index 00000000000..9fc659d2402
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>> new file mode 100644
>>>> index 00000000000..98275e5267d
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>> new file mode 100644
>>>> index 00000000000..8f6f0b11f09
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>> new file mode 100644
>>>> index 00000000000..d96959dd064
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>> @@ -0,0 +1,77 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */