On Wed, 1 Mar 2023, Pan Li wrote:

> I am not very familiar with the memory pattern; maybe juzhe can provide more information or correct me if anything is misleading.
>
> The different precision tries to resolve the bug below: the second
> vlm (which loads a different number of bytes than the first one) is
> eliminated because vbool8 and vbool16 have the same precision size, aka
> [8, 8].

That's because the corresponding data vectors to vbool8 and vbool16 have the same number of lanes, right? (another RVV peculiarity)

> vbool8_t v2 = *(vbool8_t*)in;
> vbool16_t v5 = *(vbool16_t*)in;
> *(vbool16_t*)(out + 200) = v5;
> *(vbool8_t*)(out + 100) = v2;
>
> addi a4,a1,100
> vsetvli a5,zero,e8,m1,ta,ma
> addi a1,a1,200
> vlm.v v24,0(a0)
> vsm.v v24,0(a4)
> // Need one vsetvli and vlm.v for correctness here.
> vsm.v v24,0(a1)
>
> Pan
> ________________________________
> From: Richard Biener
> Sent: Wednesday, March 1, 2023 20:33
> To: Richard Sandiford
> Cc: 盼 李 via Gcc-patches; 盼 李; juzhe.zhong@rivai.ai; pan2.li; Kito.cheng
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> On Wed, 1 Mar 2023, Richard Sandiford wrote:
>
> > 盼 李 via Gcc-patches writes:
> > > Just had a test with the code below; the [0x4, 0x4] test comes from VNx4BI. You can see that the mode size is unchanged.
> > >
> > > printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
> > > "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
> > >
> > > VNx4BI Before precision [0x4, 0x4], size [0x4, 0]
> > > VNx4BI After precision [0x4, 0x4], size [0x4, 0]
> >
> > Yeah, the result is expected to be unchanged if the division fails.
> > That's a deliberate part of the interface.  The can_* functions
> > should never be used without testing the boolean return value.
> >
> > But this precision of [4,4] for VNx4BI is different from what you
> > listed below.
> > Like I say, if the precision really is [4,4], and if
> > the size really is ceil([4,4]/8), then I don't think we can represent
> > that with current infrastructure.
>
> The size of VNx4BI is (4*N + 7) / 8 bytes.  I suppose we could simply
> not store the size in bytes but only the size in bits then?
>
> I see the problem, but I also don't see a good solution since
> for VNx4BI with N == 3 we have one and a half bytes of storage.
>
> How do memory access patterns work with poly-int sizes?
>
> >
> > Thanks,
> > Richard
> >
> > >
> > > Pan
> > > ________________________________
> > > From: Richard Sandiford
> > > Sent: Wednesday, March 1, 2023 19:11
> > > To: 盼 李 via Gcc-patches
> > > Cc: juzhe.zhong@rivai.ai; pan2.li; 盼 李; Kito.cheng; rguenther
> > > Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > >
> > > 盼 李 via Gcc-patches writes:
> > >> Thank you all for your quick response.
> > >>
> > >> As juzhe mentioned, RISC-V memory accesses are always aligned to a byte boundary in the compact mode, aka ceil(vl / 8) bytes for vbool*.
> > >
> > > OK, thanks to both of you.  This is what I'd have expected.
> > >
> > > In that case, I think both the can_div_away_from_zero_p and the
> > > original patch (using size_one) will give the wrong results.
> > > There isn't a way of representing ceil([4,4]/8) as a poly_int.
> > > The result is (4+4X)/8 when X is odd and (8+4X)/8 when X is even.
> > >
> > >> Actually, the data [4,4] comes from the self-test; the RISC-V mode precisions are as below.
> > >>
> > >> VNx64BI precision [0x40, 0x40].
> > >> VNx32BI precision [0x20, 0x20].
> > >> VNx16BI precision [0x10, 0x10].
> > >> VNx8BI precision [0x8, 0x8].
> > >> VNx4BI precision [0x8, 0x8].
> > >> VNx2BI precision [0x8, 0x8].
> > >> VNx1BI precision [0x8, 0x8].
> > >
> > > Ah, OK.  Which self-test causes this?
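A small standalone C++ sketch can make the ceil([4,4]/8) point concrete. It does not use GCC's poly_int: Poly2, bytes_for and ceil_div8_is_linear are hypothetical names for a toy value c0 + c1*X with X >= 0. The sketch computes the exact byte count ceil((c0 + c1*X)/8) and checks whether a single linear form a0 + a1*X matches it for every X in a range. For [8,8] it finds [1,1]; for [4,4] it fails, in line with the odd/even split described above.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a two-coefficient poly_int: value = c0 + c1 * X, X >= 0.
// Hypothetical type for illustration only; not GCC's poly_int.
struct Poly2 {
  int64_t c0, c1;
  int64_t eval(int64_t x) const { return c0 + c1 * x; }
};

// Exact number of bytes needed to hold p.eval(x) bits: ceil(bits / 8).
int64_t bytes_for(const Poly2 &p, int64_t x) {
  return (p.eval(x) + 7) / 8;
}

// Return true if one linear form a0 + a1 * X equals ceil(p / 8) for every
// X in [0, max_x].  X = 0 and X = 1 fix a0 and a1; all other X must agree.
bool ceil_div8_is_linear(const Poly2 &p, int64_t max_x, Poly2 *out) {
  int64_t a0 = bytes_for(p, 0);
  int64_t a1 = bytes_for(p, 1) - a0;
  for (int64_t x = 0; x <= max_x; ++x)
    if (a0 + a1 * x != bytes_for(p, x))
      return false;
  if (out)
    *out = Poly2{a0, a1};
  return true;
}
```

For example, [4,4] needs 1 byte at X=0 and X=1 but 2 bytes at X=2 and X=3, so no fixed pair of byte coefficients can describe it.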
> > >
> > > Richard
> > >
> > >> The [4, 4] data affects the genmodes part: we cannot write it as below, because the gcc_unreachable will be hit.
> > >>
> > >> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT, &mode_size[E_%smode]))
> > >>   gcc_unreachable (); // Hit on [4, 4] of the self-test.
> > >>
> > >> Pan
> > >> ________________________________
> > >> From: juzhe.zhong@rivai.ai
> > >> Sent: Wednesday, March 1, 2023 18:46
> > >> To: richard.sandiford; pan2.li
> > >> Cc: incarnation.p.lee; gcc-patches; Kito.cheng; rguenther
> > >> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > >>
> > >>>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
> > >>>> bits, even when that means accessing fully-unused bytes?  E.g. 4+4X
> > >>>> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
> > >>>> 8+8X would be 32 bits/4 bytes.  So a store of [8,8] for a precision
> > >>>> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
> > >>
> > >> Hi, Richard. Thank you for helping us.
> > >> My understanding of the RVV ISA:
> > >>
> > >> In RVV, we have mask modes (VNx1BI, VNx2BI, ... etc.) and data modes (VNx1QI, VNx4QI, VNx2DI, ... etc.).
> > >> For data modes, we access the data fully and have no unused bytes, so we don't need to adjust the precision.
> > >> However, for mask modes we access the mask bits in compact mode (since the mask bits for consecutive elements are consecutive).
> > >> For example, in the current configuration VNx1BI, VNx2BI, VNx4BI and VNx8BI have the same bytesize (1,1) but different bitsizes.
> > >>
> > >> VNx8BI is accessed fully, but VNx4BI is only accessed 1/2, VNx2BI 1/4 and VNx1BI 1/8, with byte alignment (I am not sure whether RVV supports bit alignment; I guess it cannot).
> > >>
> > >> If VNx8BI only occupies 1 byte (depending on the machine vector length), then VNx2BI, VNx4BI and VNx1BI occupy 2/8, 4/8 and 1/8 of a byte. I think we can't access with bit alignment.
> > >> So they will be the same in the access.
> > >> However, if VNx8BI occupies 8 bytes, then VNx1BI, VNx2BI and VNx4BI occupy 1 byte, 2 bytes and 4 bytes. They access different sizes.
> > >>
> > >> This is my understanding of the RVV ISA; feel free to correct me.
> > >> Thanks.
> > >>
> > >> ________________________________
> > >> juzhe.zhong@rivai.ai
> > >>
> > >> From: Richard Sandiford
> > >> Date: 2023-03-01 18:11
> > >> To: Li, Pan2
> > >> CC: 盼 李; incarnation.p.lee--- via Gcc-patches; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> > >> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > >> "Li, Pan2" writes:
> > >>> Hi Richard Sandiford,
> > >>>
> > >>> Just tried the overloaded constant divisors with the printf below; it works as you mentioned!
> > >>>
> > >>> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
> > >>> "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
> > >>>
> > >>> template<unsigned int N, typename Ca, typename Cb, typename Cq>
> > >>> inline typename if_nonpoly<Cb, bool>::type
> > >>> can_div_away_from_zero_p (const poly_int_pod<N, Ca> &a,
> > >>>                           Cb b,
> > >>>                           poly_int_pod<N, Cq> *quotient)
> > >>> {
> > >>>   if (!can_div_trunc_p (a, b, quotient))
> > >>>     return false;
> > >>>   if (maybe_ne (*quotient * b, a))
> > >>>     for (unsigned int i = 0; i < N; ++i)
> > >>>       quotient->coeffs[i] += (quotient->coeffs[i] < 0 ? -1 : 1);
> > >>>   return true;
> > >>> }
> > >>>
> > >>> But I have a question about one case, as below.
> > >>>
> > >>> Assume:
> > >>> a = [4, 4], b = 8.
> > >>>
> > >>> When it reaches can_div_trunc_p, it checks whether the remainder is constant, aka a.coeffs[i] % 8 == 0 (i >= 1). If the remainder is not constant, can_div_trunc_p does nothing to the quotient and returns false.
> > >>>
> > >>> Thus, when a = [4, 4] for can_div_away_from_zero_p, the output *quotient will be unchanged, aka the mode_size[E_%smode] will be unchanged for this case.
> > >>> However, the underlying mode_size will adjust it to the real byte size, and I am not sure whether this is by design or requires additional handling.
> > >>
> > >> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
> > >> bits, even when that means accessing fully-unused bytes?  E.g. 4+4X
> > >> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
> > >> 8+8X would be 32 bits/4 bytes.  So a store of [8,8] for a precision
> > >> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
> > >>
> > >> Richard
> > >>
> > >>> Pan
> > >>>
> > >>> From: 盼 李
> > >>> Sent: Tuesday, February 28, 2023 5:59 PM
> > >>> To: Richard Sandiford; Li, Pan2
> > >>> Cc: incarnation.p.lee--- via Gcc-patches; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> > >>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > >>>
> > >>> Understood, thanks for the explanations and suggestions. Let me give it a try and keep you posted.
> > >>>
> > >>> Pan
> > >>> ________________________________
> > >>> From: Richard Sandiford
> > >>> Sent: Tuesday, February 28, 2023 17:50
> > >>> To: Li, Pan2
> > >>> Cc: 盼 李; incarnation.p.lee--- via Gcc-patches; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> > >>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > >>>
> > >>> "Li, Pan2" writes:
> > >>>> Hi Richard Sandiford,
> > >>>>
> > >>>> After some investigation, I am not sure whether it is possible to make it general without any changes to exact_div. We can add one method like below to get the unit poly for all possible N.
> > >>>>
> > >>>> template<unsigned int N, typename Ca>
> > >>>> inline POLY_CONST_RESULT (N, Ca, Ca)
> > >>>> normalize_to_unit (const poly_int_pod<N, Ca> &a)
> > >>>> {
> > >>>>   typedef POLY_CONST_COEFF (Ca, Ca) C;
> > >>>>
> > >>>>   poly_int<N, C> normalized = a;
> > >>>>
> > >>>>   if (normalized.is_constant ())
> > >>>>     normalized.coeffs[0] = 1;
> > >>>>   else
> > >>>>     for (unsigned int i = 0; i < N; i++)
> > >>>>       POLY_SET_COEFF (C, normalized, i, 1);
> > >>>>
> > >>>>   return normalized;
> > >>>> }
> > >>>>
> > >>>> And then adjust genmodes like below to consume the unit poly.
> > >>>>
> > >>>> printf (" poly_uint16 unit_poly = "
> > >>>> "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
> > >>>> printf (" if (known_lt (mode_precision[E_%smode], "
> > >>>> "unit_poly * BITS_PER_UNIT))\n", m->name);
> > >>>> printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
> > >>>>
> > >>>> I am not sure whether it is a good idea to introduce the above normalization code into exact_div, given that the comment on exact_div says "/* Return A / B, given that A is known to be a multiple of B. */".
> > >>>
> > >>> My point was that we have multiple ways of dividing poly_ints:
> > >>>
> > >>> - exact_div, for when the caller knows that the result is always exact
> > >>> - can_div_trunc_p, for truncating division (round towards 0)
> > >>> - can_div_away_from_zero_p, for rounding away from 0
> > >>> - ...
> > >>>
> > >>> This is like how we have multiple division *_EXPRs on trees.
> > >>>
> > >>> Until now, exact_div was the correct choice for modes because vector
> > >>> modes didn't have padding.  We're now changing that, so my suggestion
> > >>> in the review was to change the division operation that we use.
> > >>> Rather than use exact_div, we should now use can_div_away_from_zero_p,
> > >>> which would have the effect of rounding the quotient up.
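The rounding-up division described above can be sketched on a toy two-coefficient value. This is a hedged model of the semantics only, assuming non-negative coefficients and a positive constant divisor; Poly2, div_trunc and div_round_up are hypothetical names, and this is not GCC's poly-int.h code. As in the can_div_trunc_p behaviour described earlier in the thread, the division is refused unless the X coefficient divides exactly, and the quotient is left untouched on failure. Once that holds, any fractional part comes from the constant coefficient alone, so rounding up only needs to bump coefficient 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy value c[0] + c[1] * X with non-negative coefficients.
struct Poly2 {
  uint64_t c[2];
};

// Truncating division by a positive constant B.  Fails (leaving *q
// unchanged) unless the X coefficient is an exact multiple of B.
bool div_trunc(const Poly2 &a, uint64_t b, Poly2 *q) {
  if (a.c[1] % b != 0)
    return false;
  q->c[0] = a.c[0] / b;
  q->c[1] = a.c[1] / b;
  return true;
}

// Rounding-up division: truncate, then add one to the constant
// coefficient if the truncated quotient does not multiply back to A.
bool div_round_up(const Poly2 &a, uint64_t b, Poly2 *q) {
  Poly2 t;
  if (!div_trunc(a, b, &t))
    return false;
  if (t.c[0] * b != a.c[0])  // only c[0] can carry a remainder here
    t.c[0] += 1;
  *q = t;
  return true;
}
```

With this shape, [12,8]/8 rounds up to [2,1] (ceil(1.5 + X) = 2 + X), a constant such as [4,0] stays constant as [1,0], and [4,4] is rejected, which is the case that reaches gcc_unreachable in the self-test.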
> > >>>
> > >>> Something like:
> > >>>
> > >>>   if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
> > >>>                                  &mode_size[E_%smode]))
> > >>>     gcc_unreachable ();
> > >>>
> > >>> But this will require a new overload of can_div_away_from_zero_p, since
> > >>> the existing one is for constant quotients rather than constant divisors.
> > >>>
> > >>> Thanks,
> > >>> Richard
> > >>>
> > >>>>
> > >>>> Could you please share your opinion about this from an expert's perspective? Thank you!
> > >>>>
> > >>>> Pan
> > >>>>
> > >>>> From: 盼 李
> > >>>> Sent: Monday, February 27, 2023 11:13 PM
> > >>>> To: Richard Sandiford; incarnation.p.lee--- via Gcc-patches
> > >>>> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2
> > >>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > >>>>
> > >>>> Never mind, I hope you had a good holiday.
> > >>>>
> > >>>> Thanks for pointing this out; the if part cannot take care of poly_int with N > 2. As I understand it, we need to make it general for all N of poly_int.
> > >>>>
> > >>>> Thus I would like to confirm with you how to make it general. I suppose there will be a new function can_div_away_from_zero_p to replace the if (known_lt (,)) part in genmodes.cc, and exact_div will be left unchanged (given the word "exact", I suppose we should not touch it), right? Then we still need one poly_int with all coefficients set to 1 for N as the return if can_div_away_from_zero_p is true.
> > >>>>
> > >>>> Thanks again for your professional suggestion, have a nice day!
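genmodes does not compute mode sizes itself; it prints C code (the printf calls quoted in this thread) that applies the mode adjustments when it runs. A hedged sketch of emitting the checked division suggested above for one mode: emit_size_adjustment is a hypothetical helper that formats into a std::string via snprintf so the output can be inspected, whereas genmodes.cc prints directly with printf.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of genmodes.cc): format the checked size
// computation for one mode, mirroring the snippet suggested above.
std::string emit_size_adjustment(const char *mode_name) {
  char buf[512];
  std::snprintf(buf, sizeof buf,
                "  if (!can_div_away_from_zero_p (mode_precision[E_%smode],"
                " BITS_PER_UNIT,\n"
                "                                 &mode_size[E_%smode]))\n"
                "    gcc_unreachable ();\n",
                mode_name, mode_name);
  return std::string(buf);
}
```

Called with "VNx4BI", it yields the same if/gcc_unreachable pair that trips on the [4,4] self-test value, which is why the failure path of the division matters here.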
> > >>>>
> > >>>> Pan
> > >>>> ________________________________
> > >>>> From: Richard Sandiford
> > >>>> Sent: Monday, February 27, 2023 22:24
> > >>>> To: incarnation.p.lee--- via Gcc-patches
> > >>>> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
> > >>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > >>>>
> > >>>> Sorry for the slow reply, been away for a couple of weeks.
> > >>>>
> > >>>> "incarnation.p.lee--- via Gcc-patches" writes:
> > >>>>> From: Pan Li
> > >>>>>
> > >>>>> Fix the bug of the rvv bool mode precision with the adjustment.
> > >>>>> The bit size of vbool*_t will be adjusted to
> > >>>>> [1, 2, 4, 8, 16, 32, 64] according to the RVV 1.0 ISA spec. The
> > >>>>> adjusted mode precision of vbool*_t will help the underlying passes
> > >>>>> make the right decision for both correctness and optimization.
> > >>>>>
> > >>>>> Given below sample code:
> > >>>>> void test_1(int8_t * restrict in, int8_t * restrict out)
> > >>>>> {
> > >>>>>   vbool8_t v2 = *(vbool8_t*)in;
> > >>>>>   vbool16_t v5 = *(vbool16_t*)in;
> > >>>>>   *(vbool16_t*)(out + 200) = v5;
> > >>>>>   *(vbool8_t*)(out + 100) = v2;
> > >>>>> }
> > >>>>>
> > >>>>> Before the precision adjustment:
> > >>>>> addi a4,a1,100
> > >>>>> vsetvli a5,zero,e8,m1,ta,ma
> > >>>>> addi a1,a1,200
> > >>>>> vlm.v v24,0(a0)
> > >>>>> vsm.v v24,0(a4)
> > >>>>> // Need one vsetvli and vlm.v for correctness here.
> > >>>>> vsm.v v24,0(a1)
> > >>>>>
> > >>>>> After the precision adjustment:
> > >>>>> csrr t0,vlenb
> > >>>>> slli t1,t0,1
> > >>>>> csrr a3,vlenb
> > >>>>> sub sp,sp,t1
> > >>>>> slli a4,a3,1
> > >>>>> add a4,a4,sp
> > >>>>> sub a3,a4,a3
> > >>>>> vsetvli a5,zero,e8,m1,ta,ma
> > >>>>> addi a2,a1,200
> > >>>>> vlm.v v24,0(a0)
> > >>>>> vsm.v v24,0(a3)
> > >>>>> addi a1,a1,100
> > >>>>> vsetvli a4,zero,e8,mf2,ta,ma
> > >>>>> csrr t0,vlenb
> > >>>>> vlm.v v25,0(a3)
> > >>>>> vsm.v v25,0(a2)
> > >>>>> slli t1,t0,1
> > >>>>> vsetvli a5,zero,e8,m1,ta,ma
> > >>>>> vsm.v v24,0(a1)
> > >>>>> add sp,sp,t1
> > >>>>> jr ra
> > >>>>>
> > >>>>> However, there may be some optimization opportunities after
> > >>>>> the mode precision adjustment. They can be taken care of in
> > >>>>> the RISC-V backend in separate PR(s).
> > >>>>>
> > >>>>> PR 108185
> > >>>>> PR 108654
> > >>>>>
> > >>>>> gcc/ChangeLog:
> > >>>>>
> > >>>>> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
> > >>>>> * config/riscv/riscv.cc (riscv_v_adjust_precision):
> > >>>>> * config/riscv/riscv.h (riscv_v_adjust_precision):
> > >>>>> * genmodes.cc (ADJUST_PRECISION):
> > >>>>> (emit_mode_adjustments):
> > >>>>>
> > >>>>> gcc/testsuite/ChangeLog:
> > >>>>>
> > >>>>> * gcc.target/riscv/pr108185-1.c: New test.
> > >>>>> * gcc.target/riscv/pr108185-2.c: New test.
> > >>>>> * gcc.target/riscv/pr108185-3.c: New test.
> > >>>>> * gcc.target/riscv/pr108185-4.c: New test.
> > >>>>> * gcc.target/riscv/pr108185-5.c: New test.
> > >>>>> * gcc.target/riscv/pr108185-6.c: New test.
> > >>>>> * gcc.target/riscv/pr108185-7.c: New test.
> > >>>>> * gcc.target/riscv/pr108185-8.c: New test.
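The before/after listings hinge on how many bytes each vboolN_t occupies and which vsetvli configuration its vlm.v/vsm.v needs. A hedged sketch of that arithmetic, assuming a vboolN_t holds VLEN/N mask bits, that vlm.v/vsm.v move ceil(vl/8) bytes, and that the compiler selects e8 with LMUL = 8/N so that VLMAX equals VLEN/N (consistent with m1 for vbool8 and mf2 for vbool16 in the listing, and with the m8 to mf8 patterns in the new tests). The function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Number of mask bits in a vboolN_t for a given VLEN (in bits).
unsigned mask_bits(unsigned vlen, unsigned n) { return vlen / n; }

// Bytes moved by vlm.v / vsm.v for that many mask bits: ceil(bits / 8).
unsigned mask_bytes(unsigned vlen, unsigned n) {
  return (mask_bits(vlen, n) + 7) / 8;
}

// LMUL for "vsetvli ..., e8, <lmul>" such that VLMAX = VLEN / N,
// i.e. LMUL = 8 / N (m8 for vbool1 ... mf8 for vbool64).
std::string e8_lmul_for_vbool(unsigned n) {
  switch (n) {
    case 1:  return "m8";
    case 2:  return "m4";
    case 4:  return "m2";
    case 8:  return "m1";
    case 16: return "mf2";
    case 32: return "mf4";
    case 64: return "mf8";
    default: return "";  // not a valid vboolN_t
  }
}
```

With VLEN = 128, vbool8_t covers 2 bytes but vbool16_t only 1, so reusing the e8,m1 vlm.v result for the vbool16_t store (the "Before" listing) would store 2 bytes where a vbool16_t occupies only 1; the "After" listing reloads under e8,mf2.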
> > >>>>>
> > >>>>> Signed-off-by: Pan Li
> > >>>>> ---
> > >>>>> gcc/config/riscv/riscv-modes.def | 8 +++
> > >>>>> gcc/config/riscv/riscv.cc | 12 ++++
> > >>>>> gcc/config/riscv/riscv.h | 1 +
> > >>>>> gcc/genmodes.cc | 25 ++++++-
> > >>>>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> > >>>>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> > >>>>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> > >>>>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> > >>>>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> > >>>>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> > >>>>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> > >>>>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> > >>>>> 12 files changed, 598 insertions(+), 1 deletion(-)
> > >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > >>>>>
> > >>>>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> > >>>>> index d5305efa8a6..110bddce851 100644
> > >>>>> --- a/gcc/config/riscv/riscv-modes.def
> > >>>>> +++ b/gcc/config/riscv/riscv-modes.def
> > >>>>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > >>>>> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > >>>>> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
> > >>>>>
> > >>>>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
> > >>>>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
> > >>>>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
> > >>>>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
> > >>>>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
> > >>>>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
> > >>>>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
> > >>>>> +
> > >>>>> /*
> > >>>>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> > >>>>> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
> > >>>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > >>>>> index de3e1f903c7..cbe66c0e35b 100644
> > >>>>> --- a/gcc/config/riscv/riscv.cc
> > >>>>> +++ b/gcc/config/riscv/riscv.cc
> > >>>>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> > >>>>> return scale;
> > >>>>> }
> > >>>>>
> > >>>>> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
> > >>>>> + PRECISION size for corresponding machine_mode. */
> > >>>>> +
> > >>>>> +poly_int64
> > >>>>> +riscv_v_adjust_precision (machine_mode mode, int scale)
> > >>>>> +{
> > >>>>> + if (riscv_v_ext_vector_mode_p (mode))
> > >>>>> + return riscv_vector_chunks * scale;
> > >>>>> +
> > >>>>> + return scale;
> > >>>>> +}
> > >>>>> +
> > >>>>> /* Return true if X is a valid address for machine mode MODE. If it is,
> > >>>>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> > >>>>> effect. */
> > >>>>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> > >>>>> index 5bc7f2f467d..15b9317a8ce 100644
> > >>>>> --- a/gcc/config/riscv/riscv.h
> > >>>>> +++ b/gcc/config/riscv/riscv.h
> > >>>>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
> > >>>>> extern unsigned riscv_bytes_per_vector_chunk;
> > >>>>> extern poly_uint16 riscv_vector_chunks;
> > >>>>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> > >>>>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
> > >>>>> /* The number of bits and bytes in a RVV vector. */
> > >>>>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> > >>>>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> > >>>>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
> > >>>>> index 2d418f09aab..12f4e6335e6 100644
> > >>>>> --- a/gcc/genmodes.cc
> > >>>>> +++ b/gcc/genmodes.cc
> > >>>>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
> > >>>>> static struct mode_adjust *adj_format;
> > >>>>> static struct mode_adjust *adj_ibit;
> > >>>>> static struct mode_adjust *adj_fbit;
> > >>>>> +static struct mode_adjust *adj_precision;
> > >>>>>
> > >>>>> /* Mode class operations. */
> > >>>>> static enum mode_class
> > >>>>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
> > >>>>> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
> > >>>>> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
> > >>>>> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
> > >>>>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
> > >>>>> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
> > >>>>> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
> > >>>>> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
> > >>>>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
> > >>>>> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
> > >>>>> m->name, m->name);
> > >>>>> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
> > >>>>> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> > >>>>> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
> > >>>>> + printf (" poly_uint16 size_one = "
> > >>>>> + "mode_precision[E_%smode].is_constant ()\n", m->name);
> > >>>>> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
> > >>>>
> > >>>> Have you tried this on an x86_64 system?  I wouldn't expect it to work
> > >>>> because of the:
> > >>>>
> > >>>>   STATIC_ASSERT (N >= 2);
> > >>>>
> > >>>> in the poly_uint16 constructor.
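The x86_64 remark is about the number of poly_int coefficients: on a target with a single coefficient, poly_uint16 has N == 1, so an expression such as poly_uint16 (1, 1) is rejected at compile time even on the branch that would never run. A minimal toy reproduction of that constraint, with hypothetical names and standard static_assert standing in for GCC's STATIC_ASSERT: the two-argument constructor only compiles when N >= 2, while a helper that fills coefficients in a loop (in the spirit of normalize_to_unit) works for every N.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a poly_int with N coefficients (hypothetical).
template <unsigned N>
struct ToyPoly {
  uint16_t coeffs[N];

  ToyPoly() : coeffs{} {}

  // Two-coefficient constructor: only valid when there is an X term.
  // The assertion fires only if this constructor is actually used.
  ToyPoly(uint16_t c0, uint16_t c1) : coeffs{} {
    static_assert(N >= 2, "two-coefficient constructor needs N >= 2");
    coeffs[0] = c0;
    coeffs[1] = c1;
  }
};

// Works for any N: 1 when VARIABLE is false, otherwise 1 + X (+ ...).
template <unsigned N>
ToyPoly<N> make_unit(bool variable) {
  ToyPoly<N> p;
  p.coeffs[0] = 1;
  for (unsigned i = 1; i < N; ++i)
    p.coeffs[i] = variable ? 1 : 0;
  return p;
}
```

With N == 1 the loop body never runs, so make_unit<1> yields a plain constant 1, which is the only meaningful unit when there is no X term.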
> > >>>>
> > >>>>> + printf (" if (known_lt (mode_precision[E_%smode], "
> > >>>>> + "size_one * BITS_PER_UNIT))\n", m->name);
> > >>>>> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
> > >>>>> + printf (" else\n");
> > >>>>> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> > >>>>
> > >>>> Now that the assert implicit in the original exact_div no longer holds,
> > >>>> I think we should instead generalise it to can_div_away_from_zero_p
> > >>>> (which will involve defining a new overload of can_div_away_from_zero_p).
> > >>>> I think that will give the same result as the code above for the cases
> > >>>> that the code above handles.  But it should be more general too.
> > >>>>
> > >>>> TBH, I'm still sceptical that this is all that is needed.  It seems
> > >>>> unlikely that we've been so good at writing vector support code that
> > >>>> we've made it work for precision < bitsize, despite that being an
> > >>>> unsupported combination until now.  But I guess we can fix problems
> > >>>> on a case-by-case basis.
> > >>>>
> > >>>> Thanks,
> > >>>> Richard
> > >>>>
> > >>>>> " BITS_PER_UNIT);\n", m->name, m->name);
> > >>>>> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
> > >>>>> printf (" adjust_mode_mask (E_%smode);\n", m->name);
> > >>>>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
> > >>>>> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
> > >>>>> a->file, a->line, a->mode->name, a->adjustment);
> > >>>>>
> > >>>>> + /* Adjust precision to the actual bits size. */
> > >>>>> + for (a = adj_precision; a; a = a->next)
> > >>>>> + switch (a->mode->cl)
> > >>>>> + {
> > >>>>> + case MODE_VECTOR_BOOL:
> > >>>>> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
> > >>>>> + a->adjustment);
> > >>>>> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
> > >>>>> + break;
> > >>>>> + default:
> > >>>>> + break;
> > >>>>> + }
> > >>>>> +
> > >>>>> puts ("}");
> > >>>>> }
> > >>>>>
> > >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > >>>>> new file mode 100644
> > >>>>> index 00000000000..e70960c5b6d
> > >>>>> --- /dev/null
> > >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > >>>>> @@ -0,0 +1,68 @@
> > >>>>> +/* { dg-do compile } */
> > >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > >>>>> +
> > >>>>> +#include "riscv_vector.h"
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> > >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> > >>>>> +
> > >>>>> + *(vbool1_t*)(out + 100) = v1;
> > >>>>> + *(vbool2_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> > >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> > >>>>> +
> > >>>>> + *(vbool1_t*)(out + 100) = v1;
> > >>>>> + *(vbool4_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> > >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> > >>>>> +
> > >>>>> + *(vbool1_t*)(out + 100) = v1;
> > >>>>> + *(vbool8_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> > >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> > >>>>> +
> > >>>>> + *(vbool1_t*)(out + 100) = v1;
> > >>>>> + *(vbool16_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> > >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> > >>>>> +
> > >>>>> + *(vbool1_t*)(out + 100) = v1;
> > >>>>> + *(vbool32_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> > >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> > >>>>> +
> > >>>>> + *(vbool1_t*)(out + 100) = v1;
> > >>>>> + *(vbool64_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
> > >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > >>>>> new file mode 100644
> > >>>>> index 00000000000..dcc7a644a88
> > >>>>> --- /dev/null
> > >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > >>>>> @@ -0,0 +1,68 @@
> > >>>>> +/* { dg-do compile } */
> > >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > >>>>> +
> > >>>>> +#include "riscv_vector.h"
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> > >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> > >>>>> +
> > >>>>> + *(vbool2_t*)(out + 100) = v1;
> > >>>>> + *(vbool1_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> > >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> > >>>>> +
> > >>>>> + *(vbool2_t*)(out + 100) = v1;
> > >>>>> + *(vbool4_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> > >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> > >>>>> +
> > >>>>> + *(vbool2_t*)(out + 100) = v1;
> > >>>>> + *(vbool8_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> > >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> > >>>>> +
> > >>>>> + *(vbool2_t*)(out + 100) = v1;
> > >>>>> + *(vbool16_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> > >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> > >>>>> +
> > >>>>> + *(vbool2_t*)(out + 100) = v1;
> > >>>>> + *(vbool32_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> > >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> > >>>>> +
> > >>>>> + *(vbool2_t*)(out + 100) = v1;
> > >>>>> + *(vbool64_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
> > >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > >>>>> new file mode 100644
> > >>>>> index 00000000000..3af0513e006
> > >>>>> --- /dev/null
> > >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > >>>>> @@ -0,0 +1,68 @@
> > >>>>> +/* { dg-do compile } */
> > >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > >>>>> +
> > >>>>> +#include "riscv_vector.h"
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> > >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> > >>>>> +
> > >>>>> + *(vbool4_t*)(out + 100) = v1;
> > >>>>> + *(vbool1_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> > >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> > >>>>> +
> > >>>>> + *(vbool4_t*)(out + 100) = v1;
> > >>>>> + *(vbool2_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> > >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> > >>>>> +
> > >>>>> + *(vbool4_t*)(out + 100) = v1;
> > >>>>> + *(vbool8_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> > >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> > >>>>> +
> > >>>>> + *(vbool4_t*)(out + 100) = v1;
> > >>>>> + *(vbool16_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> > >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> > >>>>> +
> > >>>>> + *(vbool4_t*)(out + 100) = v1;
> > >>>>> + *(vbool32_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> > >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> > >>>>> +
> > >>>>> + *(vbool4_t*)(out + 100) = v1;
> > >>>>> + *(vbool64_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 }
} */ > > >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */ > > >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c > > >>>>> new file mode 100644 > > >>>>> index 00000000000..ea3c360d756 > > >>>>> --- /dev/null > > >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c > > >>>>> @@ -0,0 +1,68 @@ > > >>>>> +/* { dg-do compile } */ > > >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */ > > >>>>> + > > >>>>> +#include "riscv_vector.h" > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool8_t v1 = *(vbool8_t*)in; > > >>>>> + vbool1_t v2 = *(vbool1_t*)in; > > >>>>> + > > >>>>> + *(vbool8_t*)(out + 100) = v1; > > >>>>> + *(vbool1_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool8_t v1 = *(vbool8_t*)in; > > >>>>> + vbool2_t v2 = *(vbool2_t*)in; > > >>>>> + > > >>>>> + *(vbool8_t*)(out + 100) = v1; > > >>>>> + *(vbool2_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool8_t v1 = *(vbool8_t*)in; > > >>>>> + vbool4_t v2 = *(vbool4_t*)in; > > >>>>> + > > >>>>> + *(vbool8_t*)(out + 100) = v1; > > >>>>> + *(vbool4_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool8_t v1 = *(vbool8_t*)in; > > >>>>> + vbool16_t v2 = *(vbool16_t*)in; > > >>>>> + > > >>>>> + *(vbool8_t*)(out + 100) = v1; > > >>>>> + *(vbool16_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool8_t v1 = *(vbool8_t*)in; > > >>>>> + vbool32_t v2 = *(vbool32_t*)in; > > >>>>> + > > >>>>> + *(vbool8_t*)(out 
+ 100) = v1; > > >>>>> + *(vbool32_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool8_t v1 = *(vbool8_t*)in; > > >>>>> + vbool64_t v2 = *(vbool64_t*)in; > > >>>>> + > > >>>>> + *(vbool8_t*)(out + 100) = v1; > > >>>>> + *(vbool64_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */ > > >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c > > >>>>> new file mode 100644 > > >>>>> index 00000000000..9fc659d2402 > > >>>>> --- /dev/null > > >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c > > >>>>> @@ -0,0 +1,68 @@ > > >>>>> +/* { dg-do compile } */ > > >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */ > > >>>>> + > > >>>>> +#include "riscv_vector.h" > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool16_t v1 = *(vbool16_t*)in; > > >>>>> + vbool1_t v2 
= *(vbool1_t*)in; > > >>>>> + > > >>>>> + *(vbool16_t*)(out + 100) = v1; > > >>>>> + *(vbool1_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool16_t v1 = *(vbool16_t*)in; > > >>>>> + vbool2_t v2 = *(vbool2_t*)in; > > >>>>> + > > >>>>> + *(vbool16_t*)(out + 100) = v1; > > >>>>> + *(vbool2_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool16_t v1 = *(vbool16_t*)in; > > >>>>> + vbool4_t v2 = *(vbool4_t*)in; > > >>>>> + > > >>>>> + *(vbool16_t*)(out + 100) = v1; > > >>>>> + *(vbool4_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool16_t v1 = *(vbool16_t*)in; > > >>>>> + vbool8_t v2 = *(vbool8_t*)in; > > >>>>> + > > >>>>> + *(vbool16_t*)(out + 100) = v1; > > >>>>> + *(vbool8_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool16_t v1 = *(vbool16_t*)in; > > >>>>> + vbool32_t v2 = *(vbool32_t*)in; > > >>>>> + > > >>>>> + *(vbool16_t*)(out + 100) = v1; > > >>>>> + *(vbool32_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool16_t v1 = *(vbool16_t*)in; > > >>>>> + vbool64_t v2 = *(vbool64_t*)in; > > >>>>> + > > >>>>> + *(vbool16_t*)(out + 100) = v1; > > >>>>> + *(vbool64_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */ > > >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c > > >>>>> new file mode 100644 > > >>>>> index 00000000000..98275e5267d > > >>>>> --- /dev/null > > >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c > > >>>>> @@ -0,0 +1,68 @@ > > >>>>> +/* { dg-do compile } */ > > >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */ > > >>>>> + > > >>>>> +#include "riscv_vector.h" > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool32_t v1 = *(vbool32_t*)in; > > >>>>> + vbool1_t v2 = *(vbool1_t*)in; > > >>>>> + > > >>>>> + *(vbool32_t*)(out + 100) = v1; > > >>>>> + *(vbool1_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool32_t v1 = *(vbool32_t*)in; > > >>>>> + vbool2_t v2 = *(vbool2_t*)in; > > >>>>> + > > >>>>> + *(vbool32_t*)(out + 100) = v1; > > >>>>> + *(vbool2_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool32_t v1 = *(vbool32_t*)in; > > >>>>> + vbool4_t v2 = *(vbool4_t*)in; > > >>>>> + > > >>>>> + 
*(vbool32_t*)(out + 100) = v1; > > >>>>> + *(vbool4_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool32_t v1 = *(vbool32_t*)in; > > >>>>> + vbool8_t v2 = *(vbool8_t*)in; > > >>>>> + > > >>>>> + *(vbool32_t*)(out + 100) = v1; > > >>>>> + *(vbool8_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool32_t v1 = *(vbool32_t*)in; > > >>>>> + vbool16_t v2 = *(vbool16_t*)in; > > >>>>> + > > >>>>> + *(vbool32_t*)(out + 100) = v1; > > >>>>> + *(vbool16_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool32_t v1 = *(vbool32_t*)in; > > >>>>> + vbool64_t v2 = *(vbool64_t*)in; > > >>>>> + > > >>>>> + *(vbool32_t*)(out + 100) = v1; > > >>>>> + *(vbool64_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > > >>>>> +/* { dg-final { scan-assembler-times 
{vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */ > > >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c > > >>>>> new file mode 100644 > > >>>>> index 00000000000..8f6f0b11f09 > > >>>>> --- /dev/null > > >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c > > >>>>> @@ -0,0 +1,68 @@ > > >>>>> +/* { dg-do compile } */ > > >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */ > > >>>>> + > > >>>>> +#include "riscv_vector.h" > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool64_t v1 = *(vbool64_t*)in; > > >>>>> + vbool1_t v2 = *(vbool1_t*)in; > > >>>>> + > > >>>>> + *(vbool64_t*)(out + 100) = v1; > > >>>>> + *(vbool1_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool64_t v1 = *(vbool64_t*)in; > > >>>>> + vbool2_t v2 = *(vbool2_t*)in; > > >>>>> + > > >>>>> + *(vbool64_t*)(out + 100) = v1; > > >>>>> + *(vbool2_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool64_t v1 = *(vbool64_t*)in; > > >>>>> + vbool4_t v2 = *(vbool4_t*)in; > > >>>>> + > > >>>>> + *(vbool64_t*)(out + 100) = v1; > > >>>>> + *(vbool4_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool64_t v1 = *(vbool64_t*)in; > > >>>>> + vbool8_t v2 = *(vbool8_t*)in; > > >>>>> + > > >>>>> + *(vbool64_t*)(out + 100) = v1; > > >>>>> + *(vbool8_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool64_t v1 = *(vbool64_t*)in; > > >>>>> + vbool16_t v2 = *(vbool16_t*)in; > > >>>>> + > > >>>>> + *(vbool64_t*)(out + 100) = v1; > > >>>>> + 
*(vbool16_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool64_t v1 = *(vbool64_t*)in; > > >>>>> + vbool32_t v2 = *(vbool32_t*)in; > > >>>>> + > > >>>>> + *(vbool64_t*)(out + 100) = v1; > > >>>>> + *(vbool32_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > > >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c > > >>>>> new file mode 100644 > > >>>>> index 00000000000..d96959dd064 > > >>>>> --- /dev/null > > >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c > > >>>>> @@ -0,0 +1,77 @@ > > >>>>> +/* { dg-do compile } */ > > >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */ > > >>>>> + > > >>>>> +#include "riscv_vector.h" > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool1_t v1 = *(vbool1_t*)in; > > >>>>> + vbool1_t v2 = *(vbool1_t*)in; > > 
>>>>> + > > >>>>> + *(vbool1_t*)(out + 100) = v1; > > >>>>> + *(vbool1_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool2_t v1 = *(vbool2_t*)in; > > >>>>> + vbool2_t v2 = *(vbool2_t*)in; > > >>>>> + > > >>>>> + *(vbool2_t*)(out + 100) = v1; > > >>>>> + *(vbool2_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool4_t v1 = *(vbool4_t*)in; > > >>>>> + vbool4_t v2 = *(vbool4_t*)in; > > >>>>> + > > >>>>> + *(vbool4_t*)(out + 100) = v1; > > >>>>> + *(vbool4_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool8_t v1 = *(vbool8_t*)in; > > >>>>> + vbool8_t v2 = *(vbool8_t*)in; > > >>>>> + > > >>>>> + *(vbool8_t*)(out + 100) = v1; > > >>>>> + *(vbool8_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool16_t v1 = *(vbool16_t*)in; > > >>>>> + vbool16_t v2 = *(vbool16_t*)in; > > >>>>> + > > >>>>> + *(vbool16_t*)(out + 100) = v1; > > >>>>> + *(vbool16_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool32_t v1 = *(vbool32_t*)in; > > >>>>> + vbool32_t v2 = *(vbool32_t*)in; > > >>>>> + > > >>>>> + *(vbool32_t*)(out + 100) = v1; > > >>>>> + *(vbool32_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +void > > >>>>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) { > > >>>>> + vbool64_t v1 = *(vbool64_t*)in; > > >>>>> + vbool64_t v2 = *(vbool64_t*)in; > > >>>>> + > > >>>>> + *(vbool64_t*)(out + 100) = v1; > > >>>>> + *(vbool64_t*)(out + 200) = v2; > > >>>>> +} > > >>>>> + > > >>>>> +/* { dg-final { 
scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */ > > >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */ > > > > -- > Richard Biener > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, > Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; > HRB 36809 (AG Nuernberg) > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)