From mboxrd@z Thu Jan  1 00:00:00 1970
From: Richard Sandiford
To: 盼 李 via Gcc-patches
Cc: 盼 李, "juzhe.zhong@rivai.ai", pan2.li, Kito.cheng, rguenther
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
References: <6F429B8CF9C3B3EF+2023030118462950633032@rivai.ai>
Date: Wed, 01 Mar 2023 12:03:02 +0000
In-Reply-To: ("盼 李 via Gcc-patches"'s message of "Wed, 1 Mar 2023 11:53:20 +0000")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)

盼 李 via Gcc-patches writes:
> Just have a test with the below code, the [0x4, 0x4] test comes from VNx4BI. You can notice that the mode size is unchanged.
>
> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
>         "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>
> VNx4BI Before precision [0x4, 0x4], size [0x4, 0]
> VNx4BI After precision [0x4, 0x4], size [0x4, 0]

Yeah, the result is expected to be unchanged if the division fails.
That's a deliberate part of the interface. The can_* functions
should never be used without testing the boolean return value.

But this precision of [4,4] for VNx4BI is different from what you
listed below. Like I say, if the precision really is [4,4], and if
the size really is ceil([4,4]/8), then I don't think we can represent
that with current infrastructure.

Thanks,
Richard

>
> Pan
> ________________________________
> From: Richard Sandiford
> Sent: Wednesday, March 1, 2023 19:11
> To: 盼 李 via Gcc-patches
> Cc: juzhe.zhong@rivai.ai; pan2.li; 盼 李; Kito.cheng; rguenther
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> 盼 李 via Gcc-patches writes:
>> Thank you all for your quick response.
>>
>> As juzhe mentioned, the memory access of RISC-V will be always aligned to the bytes boundary with the compact mode, aka ceil(vl / 8) bytes for vbool*.
>
> OK, thanks to both of you. This is what I'd have expected.
>
> In that case, I think both the can_div_away_from_zero_p and the
> original patch (using size_one) will give the wrong results.
> There isn't a way of representing ceil([4,4]/8) as a poly_int.
> The result is (4+4X)/8 when X is odd and (8+4X)/8 when X is even.
>
>> Actually, the data [4,4] comes from the self-test, the RISC-V precision mode as below.
>>
>> VNx64BI precision [0x40, 0x40].
>> VNx32BI precision [0x20, 0x20].
>> VNx16BI precision [0x10, 0x10].
>> VNx8BI precision [0x8, 0x8].
>> VNx4BI precision [0x8, 0x8].
>> VNx2BI precision [0x8, 0x8].
>> VNx1BI precision [0x8, 0x8].
>
> Ah, OK. Which self-test causes this?
>
> Richard
>
>> The impact of data [4, 4] will impact the genmode part, we cannot write like below as the gcc_unreachable will be hit.
>>
>> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT, &mode_size[E_%smode]))
>>   gcc_unreachable (); // Hit on [4, 4] of the self-test.
>>
>> Pan
>> ________________________________
>> From: juzhe.zhong@rivai.ai
>> Sent: Wednesday, March 1, 2023 18:46
>> To: richard.sandiford; pan2.li
>> Cc: incarnation.p.lee; gcc-patches; Kito.cheng; rguenther
>> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>>>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
>>>> bits, even when that means accessing fully-unused bytes? E.g. 4+4X
>>>> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
>>>> 8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
>>>> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>>
>> Hi, Richard. Thank you for helping us.
>> My understanding of RVV ISA:
>>
>> In RVV, we have mask mode (VNx1BI, VNx2BI, ... etc), data mode (VNx1QI, VNx4QI, VNx2DI, ... etc)
>> For data mode, we fully access the data and we don't have unused bytes, so we don't need to adjust precision.
>> However, for mask mode we access mask bit in compact model (since each mask bit for corresponding element are consecutive).
>> for example, current configuration: VNx1BI, VNx2BI, VNx4BI, VNx8BI, these 4 modes have same bytesize (1,1) but different bitsize.
>>
>> VNx8BI is accessed fully, but VNx4BI is only accessed 1/2, VNx2BI 1/4, VNx1BI 1/8 but byte alignment (I am not sure whether RVV support bit alignment, I guess it can not).
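
As an aside to the exchange above (this sketch is not from any of the quoted messages): the arithmetic can be checked with a few lines of standalone C++. It assumes a hypothetical two-coefficient type, Poly2, standing in for a poly_int [c0, c1] whose runtime value is c0 + c1*X; it is not GCC's poly_int and uses none of its API. The helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a two-coefficient poly_int [c0, c1]:
// the runtime value is c0 + c1 * X.  This is NOT GCC's poly_int.
struct Poly2 {
  int64_t c0;
  int64_t c1;
};

// Value of the polynomial for a given runtime X (X >= 0).
inline int64_t eval(Poly2 p, int64_t x) { return p.c0 + p.c1 * x; }

// Ceiling division for non-negative a and positive b.
inline int64_t ceil_div(int64_t a, int64_t b) { return (a + b - 1) / b; }

// Bytes of useful data for a mask with the given precision in bits,
// following the "ceil(vl / 8) bytes" rule described in the thread.
inline int64_t useful_bytes(Poly2 precision_bits, int64_t x) {
  return ceil_div(eval(precision_bits, x), 8);
}

// Returns true when the byte counts for X = 0, 1, 2 do not lie on a
// straight line.  That is enough to prove that no [c0, c1] pair can
// represent the byte count for every X.
inline bool not_affine_in_x(Poly2 precision_bits) {
  const int64_t y0 = useful_bytes(precision_bits, 0);
  const int64_t y1 = useful_bytes(precision_bits, 1);
  const int64_t y2 = useful_bytes(precision_bits, 2);
  return (y1 - y0) != (y2 - y1);
}
```

With precision [4,4] and X = 3, useful_bytes gives 2, while a byte size of [1,1] evaluates to 4, which is the 2-byte overshoot asked about above. not_affine_in_x is true for [4,4] (the byte counts for X = 0, 1, 2 are 1, 1, 2) and false for [8,8] (1, 2, 3), matching the remark that ceil([4,4]/8) has no poly_int representation.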
>>
>> If VNx8BI only occupy 1 byte (Depend on machine vector-length), so VNx2BI, VNx4BI, VNx1BI are 2/8 byte, 4/8 byte, 1/8 byte. I think we can't access in bit alignment. so they will the same in the access.
>> However, if VNx8BI occupy 8 byte, Well, VNx1BI, VNx2BI, VNx4BI are 1 byte, 2 bytes, 4 bytes. They are accessing different size.
>>
>> This is my comprehension of RVV ISA, feel free to correct me.
>> Thanks.
>>
>> ________________________________
>> juzhe.zhong@rivai.ai
>>
>> From: Richard Sandiford
>> Date: 2023-03-01 18:11
>> To: Li, Pan2
>> CC: 盼 李; incarnation.p.lee--- via Gcc-patches; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>> "Li, Pan2" writes:
>>> Hi Richard Sandiford,
>>>
>>> Just tried the overloaded constant divisors with below print div, it works as you mentioned :)
>>>
>>> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
>>>         "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>>>
>>> template
>>> inline typename if_nonpoly::type
>>> can_div_away_from_zero_p (const poly_int_pod &a,
>>>                           Cb b,
>>>                           poly_int_pod *quotient)
>>> {
>>>   if (!can_div_trunc_p (a, b, quotient))
>>>     return false;
>>>   if (maybe_ne (*quotient * b, a))
>>>     for (unsigned int i = 0; i < N; ++i)
>>>       quotient->coeffs[i] += (quotient->coeffs[i] < 0 ? -1 : 1);
>>>   return true;
>>> }
>>>
>>> But I may have a question about the one case as below.
>>>
>>> Assume:
>>> a = [4, 4], b = 8.
>>>
>>> When meet can_div_trunc_p, it will check if the remainder is constant or not, aka a.coeffs[i] % 8 == 0 (i >= 1). If not constant remainder, the can_div_trunc_p will do nothing about quotient and return false.
>>>
>>> Thus, when a = [4, 4] for can_div_away_from_zero_p, the output *quotient will be unchanged, aka the mode_size[E_%smode] will be unchanged for this case. However, the underlying mode_size will adjust it to the real byte size, and I am not sure if it is by design or requires additional handling.
>>
>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
>> bits, even when that means accessing fully-unused bytes? E.g. 4+4X
>> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
>> 8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
>> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>>
>> Richard
>>
>>> Pan
>>>
>>> From: 盼 李
>>> Sent: Tuesday, February 28, 2023 5:59 PM
>>> To: Richard Sandiford; Li, Pan2
>>> Cc: incarnation.p.lee--- via Gcc-patches; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> Understood, thanks for the explanations and suggestions. Let me have a try and keep you posted.
>>>
>>> Pan
>>> ________________________________
>>> From: Richard Sandiford
>>> Sent: Tuesday, February 28, 2023 17:50
>>> To: Li, Pan2
>>> Cc: 盼 李; incarnation.p.lee--- via Gcc-patches; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> "Li, Pan2" writes:
>>>> Hi Richard Sandiford,
>>>>
>>>> After some investigation, I am not sure if it is possible to make it general without any changes to exact_div. We can add one method like below to get the unit poly for all possible N.
>>>>
>>>> template
>>>> inline POLY_CONST_RESULT (N, Ca, Ca)
>>>> normalize_to_unit (const poly_int_pod &a)
>>>> {
>>>>   typedef POLY_CONST_COEFF (Ca, Ca) C;
>>>>
>>>>   poly_int normalized = a;
>>>>
>>>>   if (normalized.is_constant())
>>>>     normalized.coeffs[0] = 1;
>>>>   else
>>>>     for (unsigned int i = 0; i < N; i++)
>>>>       POLY_SET_COEFF (C, normalized, i, 1);
>>>>
>>>>   return normalized;
>>>> }
>>>>
>>>> And then adjust the genmodes like below to consume the unit poly.
>>>>
>>>> printf (" poly_uint16 unit_poly = "
>>>>         "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
>>>> printf (" if (known_lt (mode_precision[E_%smode], "
>>>>         "unit_poly * BITS_PER_UNIT))\n", m->name);
>>>> printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
>>>>
>>>> I am not sure if it is a good idea to introduce above normalize code into exact_div. Given the comment of the exact_div indicates that “/* Return A / B, given that A is known to be a multiple of B. */”.
>>>
>>> My point was that we have multiple ways of dividing poly_ints:
>>>
>>> - exact_div, for when the caller knows that the result is always exact
>>> - can_div_trunc_p, for truncating division (round towards 0)
>>> - can_div_away_from_zero_p, for rounding away from 0
>>> - ...
>>>
>>> This is like how we have multiple division *_EXPRs on trees.
>>>
>>> Until now, exact_div was the correct choice for modes because vector
>>> modes didn't have padding. We're now changing that, so my suggestion
>>> in the review was to change the division operation that we use.
>>> Rather than use exact_div, we should now use can_div_away_from_zero_p,
>>> which would have the effect of rounding the quotient up.
>>>
>>> Something like:
>>>
>>>     if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
>>>                                    &mode_size[E_%smode]))
>>>       gcc_unreachable ();
>>>
>>> But this will require a new overload of can_div_away_from_zero_p, since
>>> the existing one is for constant quotients rather than constant divisors.
>>>
>>> Thanks,
>>> Richard
>>>
>>>>
>>>> Could you please help to share your opinion about this from the expert’s perspective? Thank you!
>>>>
>>>> Pan
>>>>
>>>> From: 盼 李
>>>> Sent: Monday, February 27, 2023 11:13 PM
>>>> To: Richard Sandiford; incarnation.p.lee--- via Gcc-patches
>>>> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2
>>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>>
>>>> Never mind, wish you have a good holiday.
>>>>
>>>> Thanks for pointing this out, the if part cannot take care of poly_int with N > 2. As I understand, we need to make it general for all the N of poly_int.
>>>>
>>>> Thus I would like to double confirm with you about how to make it general. I suppose there will be a new function can_div_away_from_zero_p to replace the if (known_lt(,)) part in genmodes.cc, and leave exact_div unchanged (consider the word exact, I suppose we should not touch here), right? Then we still need one poly_int with all 1 for N as the return if can_div_away_from_zero_p is true.
>>>>
>>>> Thanks again for your professional suggestion, have a nice day :)
>>>>
>>>> Pan
>>>> ________________________________
>>>> From: Richard Sandiford
>>>> Sent: Monday, February 27, 2023 22:24
>>>> To: incarnation.p.lee--- via Gcc-patches
>>>> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
>>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>>
>>>> Sorry for the slow reply, been away for a couple of weeks.
>>>>
>>>> "incarnation.p.lee--- via Gcc-patches" writes:
>>>>> From: Pan Li
>>>>>
>>>>> Fix the bug of the rvv bool mode precision with the adjustment.
>>>>> The bits size of vbool*_t will be adjusted to
>>>>> [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
>>>>> adjusted mode precision of vbool*_t will help underlying pass to
>>>>> make the right decision for both the correctness and optimization.
>>>>>
>>>>> Given below sample code:
>>>>> void test_1(int8_t * restrict in, int8_t * restrict out)
>>>>> {
>>>>>   vbool8_t v2 = *(vbool8_t*)in;
>>>>>   vbool16_t v5 = *(vbool16_t*)in;
>>>>>   *(vbool16_t*)(out + 200) = v5;
>>>>>   *(vbool8_t*)(out + 100) = v2;
>>>>> }
>>>>>
>>>>> Before the precision adjustment:
>>>>>   addi a4,a1,100
>>>>>   vsetvli a5,zero,e8,m1,ta,ma
>>>>>   addi a1,a1,200
>>>>>   vlm.v v24,0(a0)
>>>>>   vsm.v v24,0(a4)
>>>>>   // Need one vsetvli and vlm.v for correctness here.
>>>>>   vsm.v v24,0(a1)
>>>>>
>>>>> After the precision adjustment:
>>>>>   csrr t0,vlenb
>>>>>   slli t1,t0,1
>>>>>   csrr a3,vlenb
>>>>>   sub sp,sp,t1
>>>>>   slli a4,a3,1
>>>>>   add a4,a4,sp
>>>>>   sub a3,a4,a3
>>>>>   vsetvli a5,zero,e8,m1,ta,ma
>>>>>   addi a2,a1,200
>>>>>   vlm.v v24,0(a0)
>>>>>   vsm.v v24,0(a3)
>>>>>   addi a1,a1,100
>>>>>   vsetvli a4,zero,e8,mf2,ta,ma
>>>>>   csrr t0,vlenb
>>>>>   vlm.v v25,0(a3)
>>>>>   vsm.v v25,0(a2)
>>>>>   slli t1,t0,1
>>>>>   vsetvli a5,zero,e8,m1,ta,ma
>>>>>   vsm.v v24,0(a1)
>>>>>   add sp,sp,t1
>>>>>   jr ra
>>>>>
>>>>> However, there may be some optimization opportunities after
>>>>> the mode precision adjustment. It can be taken care of in
>>>>> the RISC-V backend in the underlying separate PR(s).
>>>>>
>>>>>         PR 108185
>>>>>         PR 108654
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>         * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>>>>>         * config/riscv/riscv.cc (riscv_v_adjust_precision):
>>>>>         * config/riscv/riscv.h (riscv_v_adjust_precision):
>>>>>         * genmodes.cc (ADJUST_PRECISION):
>>>>>         (emit_mode_adjustments):
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>         * gcc.target/riscv/pr108185-1.c: New test.
>>>>>         * gcc.target/riscv/pr108185-2.c: New test.
>>>>>         * gcc.target/riscv/pr108185-3.c: New test.
>>>>>         * gcc.target/riscv/pr108185-4.c: New test.
>>>>>         * gcc.target/riscv/pr108185-5.c: New test.
>>>>>         * gcc.target/riscv/pr108185-6.c: New test.
>>>>>         * gcc.target/riscv/pr108185-7.c: New test.
>>>>>         * gcc.target/riscv/pr108185-8.c: New test.
>>>>>
>>>>> Signed-off-by: Pan Li
>>>>> ---
>>>>>  gcc/config/riscv/riscv-modes.def            |  8 +++
>>>>>  gcc/config/riscv/riscv.cc                   | 12 ++++
>>>>>  gcc/config/riscv/riscv.h                    |  1 +
>>>>>  gcc/genmodes.cc                             | 25 ++++++-
>>>>>  gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
>>>>>  gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
>>>>>  gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
>>>>>  gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
>>>>>  gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
>>>>>  gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
>>>>>  gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
>>>>>  gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
>>>>>  12 files changed, 598 insertions(+), 1 deletion(-)
>>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>>
>>>>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
>>>>> index d5305efa8a6..110bddce851 100644
>>>>> --- a/gcc/config/riscv/riscv-modes.def
>>>>> +++ b/gcc/config/riscv/riscv-modes.def
>>>>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>>>  ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>>>  ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>>>>>
>>>>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
>>>>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
>>>>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
>>>>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
>>>>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
>>>>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
>>>>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
>>>>> +
>>>>>  /*
>>>>>     | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
>>>>>     |      | LMUL        | SEW/LMUL    | LMUL        | SEW/LMUL    |
>>>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>>>> index de3e1f903c7..cbe66c0e35b 100644
>>>>> --- a/gcc/config/riscv/riscv.cc
>>>>> +++ b/gcc/config/riscv/riscv.cc
>>>>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
>>>>>    return scale;
>>>>>  }
>>>>>
>>>>> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
>>>>> +   PRECISION size for corresponding machine_mode. */
>>>>> +
>>>>> +poly_int64
>>>>> +riscv_v_adjust_precision (machine_mode mode, int scale)
>>>>> +{
>>>>> +  if (riscv_v_ext_vector_mode_p (mode))
>>>>> +    return riscv_vector_chunks * scale;
>>>>> +
>>>>> +  return scale;
>>>>> +}
>>>>> +
>>>>>  /* Return true if X is a valid address for machine mode MODE. If it is,
>>>>>     fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
>>>>>     effect. */
>>>>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>>>>> index 5bc7f2f467d..15b9317a8ce 100644
>>>>> --- a/gcc/config/riscv/riscv.h
>>>>> +++ b/gcc/config/riscv/riscv.h
>>>>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
>>>>>  extern unsigned riscv_bytes_per_vector_chunk;
>>>>>  extern poly_uint16 riscv_vector_chunks;
>>>>>  extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
>>>>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
>>>>>  /* The number of bits and bytes in a RVV vector. */
>>>>>  #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
>>>>>  #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
>>>>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
>>>>> index 2d418f09aab..12f4e6335e6 100644
>>>>> --- a/gcc/genmodes.cc
>>>>> +++ b/gcc/genmodes.cc
>>>>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
>>>>>  static struct mode_adjust *adj_format;
>>>>>  static struct mode_adjust *adj_ibit;
>>>>>  static struct mode_adjust *adj_fbit;
>>>>> +static struct mode_adjust *adj_precision;
>>>>>
>>>>>  /* Mode class operations. */
>>>>>  static enum mode_class
>>>>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
>>>>>  #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
>>>>>  #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
>>>>>  #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
>>>>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
>>>>>  #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
>>>>>  #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
>>>>>  #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
>>>>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
>>>>>                " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
>>>>>                m->name, m->name);
>>>>>        printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
>>>>> -      printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>>> +      /* Normalize the size to 1 if precison is less than BITS_PER_UNIT. */
>>>>> +      printf (" poly_uint16 size_one = "
>>>>> +              "mode_precision[E_%smode].is_constant ()\n", m->name);
>>>>> +      printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
>>>>
>>>> Have you tried this on an x86_64 system? I wouldn't expect it to work
>>>> because of the:
>>>>
>>>>   STATIC_ASSERT (N >= 2);
>>>>
>>>> in the poly_uint16 constructor.
>>>>
>>>>> +      printf (" if (known_lt (mode_precision[E_%smode], "
>>>>> +              "size_one * BITS_PER_UNIT))\n", m->name);
>>>>> +      printf (" mode_size[E_%smode] = size_one;\n", m->name);
>>>>> +      printf (" else\n");
>>>>> +      printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>>
>>>> Now that the assert implicit in the original exact_div no longer holds,
>>>> I think we should instead generalise it to can_div_away_from_zero_p
>>>> (which will involve defining a new overload of can_div_away_from_zero_p).
>>>> I think that will give the same result as the code above for the cases
>>>> that the code above handles. But it should be more general too.
>>>>
>>>> TBH, I'm still sceptical that this is all that is needed. It seems
>>>> unlikely that we've been so good at writing vector support code that
>>>> we've made it work for precision < bitsize, despite that being an
>>>> unsupported combination until now. But I guess we can fix problems
>>>> on a case-by-case basis.
>>>>
>>>> Thanks,
>>>> Richard
>>>>
>>>>>                " BITS_PER_UNIT);\n", m->name, m->name);
>>>>>        printf (" mode_nunits[E_%smode] = ps;\n", m->name);
>>>>>        printf (" adjust_mode_mask (E_%smode);\n", m->name);
>>>>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
>>>>>      printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
>>>>>              a->file, a->line, a->mode->name, a->adjustment);
>>>>>
>>>>> +  /* Adjust precision to the actual bits size. */
>>>>> +  for (a = adj_precision; a; a = a->next)
>>>>> +    switch (a->mode->cl)
>>>>> +      {
>>>>> +        case MODE_VECTOR_BOOL:
>>>>> +          printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
>>>>> +                  a->adjustment);
>>>>> +          printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
>>>>> +          break;
>>>>> +        default:
>>>>> +          break;
>>>>> +      }
>>>>> +
>>>>>    puts ("}");
>>>>>  }
>>>>>
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>>> new file mode 100644
>>>>> index 00000000000..e70960c5b6d
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> +  vbool1_t v1 = *(vbool1_t*)in;
>>>>> +  vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> +  *(vbool1_t*)(out + 100) = v1;
>>>>> +  *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> +  vbool1_t v1 = *(vbool1_t*)in;
>>>>> +  vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> +  *(vbool1_t*)(out + 100) = v1;
>>>>> +  *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> +  vbool1_t v1 = *(vbool1_t*)in;
>>>>> +  vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> +  *(vbool1_t*)(out + 100) = v1;
>>>>> +  *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> +  vbool1_t v1 = *(vbool1_t*)in;
>>>>> +  vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> +  *(vbool1_t*)(out + 100) = v1;
>>>>> +  *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> +  vbool1_t v1 = *(vbool1_t*)in;
>>>>> +  vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> +  *(vbool1_t*)(out + 100) = v1;
>>>>> +  *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> +  vbool1_t v1 = *(vbool1_t*)in;
>>>>> +  vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> +  *(vbool1_t*)(out + 100) = v1;
>>>>> +  *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>>> new file mode 100644
>>>>> index 00000000000..dcc7a644a88
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> +  vbool2_t v1 = *(vbool2_t*)in;
>>>>> +  vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> +  *(vbool2_t*)(out + 100) = v1;
>>>>> +  *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void >>>>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out)= { >>>>> + vbool2_t v1 =3D *(vbool2_t*)in; >>>>> + vbool4_t v2 =3D *(vbool4_t*)in; >>>>> + >>>>> + *(vbool2_t*)(out + 100) =3D v1; >>>>> + *(vbool4_t*)(out + 200) =3D v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out)= { >>>>> + vbool2_t v1 =3D *(vbool2_t*)in; >>>>> + vbool8_t v2 =3D *(vbool8_t*)in; >>>>> + >>>>> + *(vbool2_t*)(out + 100) =3D v1; >>>>> + *(vbool8_t*)(out + 200) =3D v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out= ) { >>>>> + vbool2_t v1 =3D *(vbool2_t*)in; >>>>> + vbool16_t v2 =3D *(vbool16_t*)in; >>>>> + >>>>> + *(vbool2_t*)(out + 100) =3D v1; >>>>> + *(vbool16_t*)(out + 200) =3D v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out= ) { >>>>> + vbool2_t v1 =3D *(vbool2_t*)in; >>>>> + vbool32_t v2 =3D *(vbool32_t*)in; >>>>> + >>>>> + *(vbool2_t*)(out + 100) =3D v1; >>>>> + *(vbool32_t*)(out + 200) =3D v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out= ) { >>>>> + vbool2_t v1 =3D *(vbool2_t*)in; >>>>> + vbool64_t v2 =3D *(vbool64_t*)in; >>>>> + >>>>> + *(vbool2_t*)(out + 100) =3D v1; >>>>> + *(vbool64_t*)(out + 200) =3D v2; >>>>> +} >>>>> + >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,= \s*e8,\s*m4,\s*ta,\s*ma} 6 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,= \s*e8,\s*m2,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,= \s*e8,\s*m1,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,= \s*e8,\s*m8,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,= \s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x][0-9]+,\s*zero,= \s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,= \s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0= -9]+\)} 12 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0= -9]+\)} 17 } } */ >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsu= ite/gcc.target/riscv/pr108185-3.c >>>>> new file mode 100644 >>>>> index 00000000000..3af0513e006 >>>>> --- /dev/null >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c >>>>> @@ -0,0 +1,68 @@ >>>>> +/* { dg-do compile } */ >>>>> +/* { dg-options "-march=3Drv64gcv -mabi=3Dlp64 -O3" } */ >>>>> + >>>>> +#include "riscv_vector.h" >>>>> + >>>>> +void >>>>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out)= { >>>>> + vbool4_t v1 =3D *(vbool4_t*)in; >>>>> + vbool1_t v2 =3D *(vbool1_t*)in; >>>>> + >>>>> + *(vbool4_t*)(out + 100) =3D v1; >>>>> + *(vbool1_t*)(out + 200) =3D v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out)= { >>>>> + vbool4_t v1 =3D *(vbool4_t*)in; >>>>> + vbool2_t v2 =3D *(vbool2_t*)in; >>>>> + >>>>> + *(vbool4_t*)(out + 100) =3D v1; >>>>> + *(vbool2_t*)(out + 200) =3D v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out)= { >>>>> + vbool4_t v1 =3D *(vbool4_t*)in; >>>>> + vbool8_t v2 =3D *(vbool8_t*)in; >>>>> + >>>>> + *(vbool4_t*)(out + 100) =3D v1; >>>>> + *(vbool8_t*)(out + 200) =3D v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out= ) { >>>>> + vbool4_t v1 =3D *(vbool4_t*)in; >>>>> + vbool16_t v2 =3D *(vbool16_t*)in; >>>>> + >>>>> + *(vbool4_t*)(out + 100) =3D v1; >>>>> + *(vbool16_t*)(out + 200) =3D v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out= ) { >>>>> + 
vbool4_t v1 =3D *(vbool4_t*)in; >>>>> + vbool32_t v2 =3D *(vbool32_t*)in; >>>>> + >>>>> + *(vbool4_t*)(out + 100) =3D v1; >>>>> + *(vbool32_t*)(out + 200) =3D v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out= ) { >>>>> + vbool4_t v1 =3D *(vbool4_t*)in; >>>>> + vbool64_t v2 =3D *(vbool64_t*)in; >>>>> + >>>>> + *(vbool4_t*)(out + 100) =3D v1; >>>>> + *(vbool64_t*)(out + 200) =3D v2; >>>>> +} >>>>> + >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,= \s*e8,\s*m2,\s*ta,\s*ma} 6 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,= \s*e8,\s*m8,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,= \s*e8,\s*m4,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,= \s*e8,\s*m1,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,= \s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,= \s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,= \s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0= -9]+\)} 12 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0= -9]+\)} 16 } } */ >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsu= ite/gcc.target/riscv/pr108185-4.c >>>>> new file mode 100644 >>>>> index 00000000000..ea3c360d756 >>>>> --- /dev/null >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c >>>>> @@ -0,0 +1,68 @@ >>>>> +/* { dg-do compile } */ >>>>> +/* { dg-options "-march=3Drv64gcv -mabi=3Dlp64 -O3" } */ >>>>> + >>>>> +#include "riscv_vector.h" >>>>> + >>>>> +void >>>>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out)= { >>>>> + vbool8_t v1 =3D *(vbool8_t*)in; >>>>> + vbool1_t v2 
= *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final {
scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>>> new file mode 100644
>>>>> index 00000000000..9fc659d2402
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> +
*(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>>> new file mode 100644
>>>>> index 00000000000..98275e5267d
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times
{vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>>> new file mode 100644
>>>>> index 00000000000..8f6f0b11f09
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>>
+test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>> new file mode 100644
>>>>> index 00000000000..d96959dd064
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>> @@ -0,0 +1,77 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> +
vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */