Let me explain more about the [4,4] test. As I understand it, the self-test runs more than once for VNx4BI. The first time, precision[VNx4BI] = [8, 8], and the precision part of this patch then adjusts it to [4, 4]. On every later run, precision[VNx4BI] = [4, 4], so can_* returns false and we hit the gcc_unreachable(). I agree that the current infrastructure cannot represent this case. As juzhe mentioned, we just want VNx[1-4]BI to differ from each other. So we tried adjusting the precision and ran into the self-test failure; that is the whole story behind the changes to the genmodes printf(" xxx") parts. It would be perfect if you have a more elegant way to handle this, covering both the self-test and the precision part.

Thank you so much,
Pan
________________________________
From: juzhe.zhong@rivai.ai
Sent: Wednesday, March 1, 2023 20:13
To: richard.sandiford; gcc-patches
Cc: incarnation.p.lee; pan2.li; Kito.cheng; rguenther
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment

Actually, we just want to differentiate VNx1BI, VNx2BI, VNx4BI and VNx8BI: GCC currently considers them the same, which produces bugs in RVV. This patch just adjusts the precision to differentiate them, although (as you say) they may not be handled accurately according to their precision. However, it at least lets us differentiate these 4 mask modes and avoid hitting the bugs. This is the current solution we have for fixing the RVV bug without affecting other targets. Do you have other ideas for fixing this issue? Or is a patch like this, which adds adjust_precision support, OK for GCC?
Thanks.
________________________________
juzhe.zhong@rivai.ai

From: Richard Sandiford
Date: 2023-03-01 20:03
To: 盼 李 via Gcc-patches
CC: 盼 李; juzhe.zhong@rivai.ai; pan2.li; Kito.cheng; rguenther
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment

盼 李 via Gcc-patches writes:
> I just ran a test with the code below; the [0x4, 0x4] case comes from VNx4BI.
> You can see that the mode size is unchanged.
>
>   printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
>           "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>
> VNx4BI Before precision [0x4, 0x4], size [0x4, 0]
> VNx4BI After precision [0x4, 0x4], size [0x4, 0]

Yeah, the result is expected to be unchanged if the division fails.
That's a deliberate part of the interface.  The can_* functions should
never be used without testing the boolean return value.

But this precision of [4,4] for VNx4BI is different from what you
listed below.  Like I say, if the precision really is [4,4], and if
the size really is ceil([4,4]/8), then I don't think we can represent
that with current infrastructure.

Thanks,
Richard

>
> Pan
> ________________________________
> From: Richard Sandiford
> Sent: Wednesday, March 1, 2023 19:11
> To: 盼 李 via Gcc-patches
> Cc: juzhe.zhong@rivai.ai; pan2.li; 盼 李; Kito.cheng; rguenther
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> 盼 李 via Gcc-patches writes:
>> Thank you all for your quick responses.
>>
>> As juzhe mentioned, RISC-V memory accesses in the compact mode are always rounded to a byte boundary, i.e. ceil(vl / 8) bytes for vbool*.
>
> OK, thanks to both of you.  This is what I'd have expected.
>
> In that case, I think both the can_div_away_from_zero_p and the
> original patch (using size_one) will give the wrong results.
> There isn't a way of representing ceil([4,4]/8) as a poly_int.
> The result is (4+4X)/8 when X is odd and (8+4X)/8 when X is even.
>
>> Actually, the [4,4] data comes from the self-test; the RISC-V mode precisions are as below.
>>
>> VNx64BI precision [0x40, 0x40].
>> VNx32BI precision [0x20, 0x20].
>> VNx16BI precision [0x10, 0x10].
>> VNx8BI precision [0x8, 0x8].
>> VNx4BI precision [0x8, 0x8].
>> VNx2BI precision [0x8, 0x8].
>> VNx1BI precision [0x8, 0x8].
>
> Ah, OK.  Which self-test causes this?
>
> Richard
>
>> The [4, 4] data affects the genmodes part: we cannot write the code as below, because the gcc_unreachable would be hit.
>>
>> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT, &mode_size[E_%smode]))
>>   gcc_unreachable (); // Hit on [4, 4] of the self-test.
>>
>> Pan
>> ________________________________
>> From: juzhe.zhong@rivai.ai
>> Sent: Wednesday, March 1, 2023 18:46
>> To: richard.sandiford; pan2.li
>> Cc: incarnation.p.lee; gcc-patches; Kito.cheng; rguenther
>> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>>>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
>>>> bits, even when that means accessing fully-unused bytes?  E.g. 4+4X
>>>> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
>>>> 8+8X would be 32 bits/4 bytes.  So a store of [8,8] for a precision
>>>> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>>
>> Hi, Richard. Thank you for helping us.
>> My understanding of the RVV ISA:
>>
>> In RVV, we have mask modes (VNx1BI, VNx2BI, ... etc.) and data modes (VNx1QI, VNx4QI, VNx2DI, ... etc.).
>> For data modes, we access the data fully and there are no unused bytes, so we don't need to adjust the precision.
>> However, for mask modes we access the mask bits in a compact layout (since the mask bits of consecutive elements are themselves consecutive).
>> For example, in the current configuration VNx1BI, VNx2BI, VNx4BI and VNx8BI all have the same bytesize (1,1) but different bitsizes.
>>
>> VNx8BI is accessed fully, but only 1/2 of VNx4BI is accessed, 1/4 of VNx2BI and 1/8 of VNx1BI, at byte granularity (I am not sure whether RVV supports bit alignment; I guess it cannot).
>>
>> If VNx8BI only occupies 1 byte (depending on the machine vector length), then VNx2BI, VNx4BI and VNx1BI occupy 2/8, 4/8 and 1/8 of a byte. I think we can't access at bit granularity, so they will all be the same in the access.
>> However, if VNx8BI occupies 8 bytes, then VNx2BI, VNx4BI and VNx1BI occupy 2 bytes, 4 bytes and 1 byte respectively.
>> They access different sizes.
>>
>> This is my understanding of the RVV ISA; feel free to correct me.
>> Thanks.
>>
>> ________________________________
>> juzhe.zhong@rivai.ai
>>
>> From: Richard Sandiford
>> Date: 2023-03-01 18:11
>> To: Li, Pan2
>> CC: 盼 李; incarnation.p.lee--- via Gcc-patches; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>> "Li, Pan2" writes:
>>> Hi Richard Sandiford,
>>>
>>> I just tried the overload for constant divisors with the print below, and it works as you mentioned!
>>>
>>> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
>>>         "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>>>
>>> template<unsigned int N, typename Ca, typename Cb, typename Cq>
>>> inline typename if_nonpoly<Cb, bool>::type
>>> can_div_away_from_zero_p (const poly_int_pod<N, Ca> &a,
>>>                           Cb b,
>>>                           poly_int_pod<N, Cq> *quotient)
>>> {
>>>   if (!can_div_trunc_p (a, b, quotient))
>>>     return false;
>>>   if (maybe_ne (*quotient * b, a))
>>>     for (unsigned int i = 0; i < N; ++i)
>>>       quotient->coeffs[i] += (quotient->coeffs[i] < 0 ? -1 : 1);
>>>   return true;
>>> }
>>>
>>> But I have a question about one case, shown below.
>>>
>>> Assume:
>>> a = [4, 4], b = 8.
>>>
>>> can_div_trunc_p checks whether the remainder is constant, i.e. whether a.coeffs[i] % 8 == 0 for i >= 1. If the remainder is not constant, can_div_trunc_p leaves the quotient untouched and returns false.
>>>
>>> Thus, when a = [4, 4], the output *quotient of can_div_away_from_zero_p is unchanged, i.e. mode_size[E_%smode] stays unchanged in this case. However, the underlying mode_size will adjust it to the real byte size, and I am not sure whether that is by design or requires additional handling.
>>
>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
>> bits, even when that means accessing fully-unused bytes?  E.g. 4+4X
>> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
>> 8+8X would be 32 bits/4 bytes.
>> So a store of [8,8] for a precision
>> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>>
>> Richard
>>
>>> Pan
>>>
>>> From: 盼 李
>>> Sent: Tuesday, February 28, 2023 5:59 PM
>>> To: Richard Sandiford; Li, Pan2
>>> Cc: incarnation.p.lee--- via Gcc-patches; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> Understood, thanks for the explanations and suggestions. Let me give it a try and keep you posted.
>>>
>>> Pan
>>> ________________________________
>>> From: Richard Sandiford
>>> Sent: Tuesday, February 28, 2023 17:50
>>> To: Li, Pan2
>>> Cc: 盼 李; incarnation.p.lee--- via Gcc-patches; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> "Li, Pan2" writes:
>>>> Hi Richard Sandiford,
>>>>
>>>> After some investigation, I am not sure whether it is possible to make this general without any changes to exact_div. We could add a function like the one below to get the unit poly for any N.
>>>>
>>>> template<unsigned int N, typename Ca>
>>>> inline POLY_CONST_RESULT (N, Ca, Ca)
>>>> normalize_to_unit (const poly_int_pod<N, Ca> &a)
>>>> {
>>>>   typedef POLY_CONST_COEFF (Ca, Ca) C;
>>>>
>>>>   poly_int<N, C> normalized = a;
>>>>
>>>>   if (normalized.is_constant())
>>>>     normalized.coeffs[0] = 1;
>>>>   else
>>>>     for (unsigned int i = 0; i < N; i++)
>>>>       POLY_SET_COEFF (C, normalized, i, 1);
>>>>
>>>>   return normalized;
>>>> }
>>>>
>>>> And then adjust genmodes as below to consume the unit poly.
>>>>
>>>> printf (" poly_uint16 unit_poly = "
>>>>         "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
>>>> printf (" if (known_lt (mode_precision[E_%smode], "
>>>>         "unit_poly * BITS_PER_UNIT))\n", m->name);
>>>> printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
>>>>
>>>> I am not sure whether it is a good idea to introduce the above normalization code into exact_div.
>>>> The comment on exact_div says "/* Return A / B, given that A is known to be a multiple of B. */".
>>>
>>> My point was that we have multiple ways of dividing poly_ints:
>>>
>>> - exact_div, for when the caller knows that the result is always exact
>>> - can_div_trunc_p, for truncating division (round towards 0)
>>> - can_div_away_from_zero_p, for rounding away from 0
>>> - ...
>>>
>>> This is like how we have multiple division *_EXPRs on trees.
>>>
>>> Until now, exact_div was the correct choice for modes because vector
>>> modes didn't have padding.  We're now changing that, so my suggestion
>>> in the review was to change the division operation that we use.
>>> Rather than use exact_div, we should now use can_div_away_from_zero_p,
>>> which would have the effect of rounding the quotient up.
>>>
>>> Something like:
>>>
>>>       if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
>>>                                      &mode_size[E_%smode]))
>>>         gcc_unreachable ();
>>>
>>> But this will require a new overload of can_div_away_from_zero_p, since
>>> the existing one is for constant quotients rather than constant divisors.
>>>
>>> Thanks,
>>> Richard
>>>
>>>>
>>>> Could you please share your opinion on this from an expert's perspective? Thank you!
>>>>
>>>> Pan
>>>>
>>>> From: 盼 李
>>>> Sent: Monday, February 27, 2023 11:13 PM
>>>> To: Richard Sandiford; incarnation.p.lee--- via Gcc-patches
>>>> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2
>>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>>
>>>> Never mind, I hope you had a good holiday.
>>>>
>>>> Thanks for pointing this out; the if part cannot take care of poly_int with N > 2. As I understand it, we need to make it general for every N of poly_int.
>>>>
>>>> So I would like to double-check with you how to make it general.
>>>> I suppose there will be a new can_div_away_from_zero_p overload to replace the if (known_lt (,)) part in genmodes.cc, leaving exact_div unchanged (given the word "exact", I suppose we should not touch it), right? Then we would still need a poly_int with all N coefficients set to 1 as the result when can_div_away_from_zero_p returns true.
>>>>
>>>> Thanks again for your professional suggestions, and have a nice day!
>>>>
>>>> Pan
>>>> ________________________________
>>>> From: Richard Sandiford
>>>> Sent: Monday, February 27, 2023 22:24
>>>> To: incarnation.p.lee--- via Gcc-patches
>>>> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
>>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>>
>>>> Sorry for the slow reply, been away for a couple of weeks.
>>>>
>>>> "incarnation.p.lee--- via Gcc-patches" writes:
>>>>> From: Pan Li
>>>>>
>>>>> Fix the bug in the RVV bool mode precision by adjusting it.
>>>>> The bit size of vbool*_t will be adjusted to
>>>>> [1, 2, 4, 8, 16, 32, 64] according to the RVV 1.0 ISA spec.  The
>>>>> adjusted mode precision of vbool*_t will help the underlying passes
>>>>> make the right decisions for both correctness and optimization.
>>>>>
>>>>> Given the sample code below:
>>>>> void test_1(int8_t * restrict in, int8_t * restrict out)
>>>>> {
>>>>>   vbool8_t v2 = *(vbool8_t*)in;
>>>>>   vbool16_t v5 = *(vbool16_t*)in;
>>>>>   *(vbool16_t*)(out + 200) = v5;
>>>>>   *(vbool8_t*)(out + 100) = v2;
>>>>> }
>>>>>
>>>>> Before the precision adjustment:
>>>>>   addi    a4,a1,100
>>>>>   vsetvli a5,zero,e8,m1,ta,ma
>>>>>   addi    a1,a1,200
>>>>>   vlm.v   v24,0(a0)
>>>>>   vsm.v   v24,0(a4)
>>>>>   // Need one vsetvli and vlm.v for correctness here.
>>>>>   vsm.v   v24,0(a1)
>>>>>
>>>>> After the precision adjustment:
>>>>>   csrr    t0,vlenb
>>>>>   slli    t1,t0,1
>>>>>   csrr    a3,vlenb
>>>>>   sub     sp,sp,t1
>>>>>   slli    a4,a3,1
>>>>>   add     a4,a4,sp
>>>>>   sub     a3,a4,a3
>>>>>   vsetvli a5,zero,e8,m1,ta,ma
>>>>>   addi    a2,a1,200
>>>>>   vlm.v   v24,0(a0)
>>>>>   vsm.v   v24,0(a3)
>>>>>   addi    a1,a1,100
>>>>>   vsetvli a4,zero,e8,mf2,ta,ma
>>>>>   csrr    t0,vlenb
>>>>>   vlm.v   v25,0(a3)
>>>>>   vsm.v   v25,0(a2)
>>>>>   slli    t1,t0,1
>>>>>   vsetvli a5,zero,e8,m1,ta,ma
>>>>>   vsm.v   v24,0(a1)
>>>>>   add     sp,sp,t1
>>>>>   jr      ra
>>>>>
>>>>> However, there may be some optimization opportunities after
>>>>> the mode precision adjustment.  They can be taken care of in
>>>>> the RISC-V backend in separate follow-up PR(s).
>>>>>
>>>>> PR 108185
>>>>> PR 108654
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>         * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>>>>>         * config/riscv/riscv.cc (riscv_v_adjust_precision):
>>>>>         * config/riscv/riscv.h (riscv_v_adjust_precision):
>>>>>         * genmodes.cc (ADJUST_PRECISION):
>>>>>         (emit_mode_adjustments):
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>         * gcc.target/riscv/pr108185-1.c: New test.
>>>>>         * gcc.target/riscv/pr108185-2.c: New test.
>>>>>         * gcc.target/riscv/pr108185-3.c: New test.
>>>>>         * gcc.target/riscv/pr108185-4.c: New test.
>>>>>         * gcc.target/riscv/pr108185-5.c: New test.
>>>>>         * gcc.target/riscv/pr108185-6.c: New test.
>>>>>         * gcc.target/riscv/pr108185-7.c: New test.
>>>>>         * gcc.target/riscv/pr108185-8.c: New test.
>>>>> >>>>> Signed-off-by: Pan Li >> >>>>> --- >>>>> gcc/config/riscv/riscv-modes.def | 8 +++ >>>>> gcc/config/riscv/riscv.cc | 12 ++++ >>>>> gcc/config/riscv/riscv.h | 1 + >>>>> gcc/genmodes.cc | 25 ++++++- >>>>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++ >>>>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++ >>>>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++ >>>>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++ >>>>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++ >>>>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++ >>>>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++ >>>>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++ >>>>> 12 files changed, 598 insertions(+), 1 deletion(-) >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c >>>>> >>>>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def >>>>> index d5305efa8a6..110bddce851 100644 >>>>> --- a/gcc/config/riscv/riscv-modes.def >>>>> +++ b/gcc/config/riscv/riscv-modes.def >>>>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk); >>>>> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk); >>>>> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8)); >>>>> >>>>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1)); >>>>> 
+ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2)); >>>>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4)); >>>>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8)); >>>>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16)); >>>>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32)); >>>>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64)); >>>>> + >>>>> /* >>>>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 | >>>>> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL | >>>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc >>>>> index de3e1f903c7..cbe66c0e35b 100644 >>>>> --- a/gcc/config/riscv/riscv.cc >>>>> +++ b/gcc/config/riscv/riscv.cc >>>>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale) >>>>> return scale; >>>>> } >>>>> >>>>> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct >>>>> + PRECISION size for corresponding machine_mode. */ >>>>> + >>>>> +poly_int64 >>>>> +riscv_v_adjust_precision (machine_mode mode, int scale) >>>>> +{ >>>>> + if (riscv_v_ext_vector_mode_p (mode)) >>>>> + return riscv_vector_chunks * scale; >>>>> + >>>>> + return scale; >>>>> +} >>>>> + >>>>> /* Return true if X is a valid address for machine mode MODE. If it is, >>>>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in >>>>> effect. */ >>>>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h >>>>> index 5bc7f2f467d..15b9317a8ce 100644 >>>>> --- a/gcc/config/riscv/riscv.h >>>>> +++ b/gcc/config/riscv/riscv.h >>>>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary; >>>>> extern unsigned riscv_bytes_per_vector_chunk; >>>>> extern poly_uint16 riscv_vector_chunks; >>>>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int); >>>>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int); >>>>> /* The number of bits and bytes in a RVV vector. 
*/ >>>>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8)) >>>>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk)) >>>>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc >>>>> index 2d418f09aab..12f4e6335e6 100644 >>>>> --- a/gcc/genmodes.cc >>>>> +++ b/gcc/genmodes.cc >>>>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment; >>>>> static struct mode_adjust *adj_format; >>>>> static struct mode_adjust *adj_ibit; >>>>> static struct mode_adjust *adj_fbit; >>>>> +static struct mode_adjust *adj_precision; >>>>> >>>>> /* Mode class operations. */ >>>>> static enum mode_class >>>>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass, >>>>> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM) >>>>> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM) >>>>> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM) >>>>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM) >>>>> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT) >>>>> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM) >>>>> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM) >>>>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void) >>>>> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n", >>>>> m->name, m->name); >>>>> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name); >>>>> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode]," >>>>> + /* Normalize the size to 1 if precison is less than BITS_PER_UNIT. */ >>>>> + printf (" poly_uint16 size_one = " >>>>> + "mode_precision[E_%smode].is_constant ()\n", m->name); >>>>> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n"); >>>> >>>> Have you tried this on an x86_64 system? 
I wouldn't expect it to work >>>> because of the: >>>> >>>> STATIC_ASSERT (N >= 2); >>>> >>>> in the poly_uint16 constructor. >>>> >>>>> + printf (" if (known_lt (mode_precision[E_%smode], " >>>>> + "size_one * BITS_PER_UNIT))\n", m->name); >>>>> + printf (" mode_size[E_%smode] = size_one;\n", m->name); >>>>> + printf (" else\n"); >>>>> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode]," >>>> >>>> Now that the assert implicit in the original exact_div no longer holds, >>>> I think we should instead generalise it to can_div_away_from_zero_p >>>> (which will involve defining a new overload of can_div_away_from_zero_p). >>>> I think that will give the same result as the code above for the cases >>>> that the code above handles. But it should be more general too. >>>> >>>> TBH, I'm still sceptical that this is all that is needed. It seems >>>> unlikely that we've been so good at writing vector support code that >>>> we've made it work for precision < bitsize, despite that being an >>>> unsupported combination until now. But I guess we can fix problems >>>> on a case-by-case basis. >>>> >>>> Thanks, >>>> Richard >>>> >>>>> " BITS_PER_UNIT);\n", m->name, m->name); >>>>> printf (" mode_nunits[E_%smode] = ps;\n", m->name); >>>>> printf (" adjust_mode_mask (E_%smode);\n", m->name); >>>>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void) >>>>> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n", >>>>> a->file, a->line, a->mode->name, a->adjustment); >>>>> >>>>> + /* Adjust precision to the actual bits size. */ >>>>> + for (a = adj_precision; a; a = a->next) >>>>> + switch (a->mode->cl) >>>>> + { >>>>> + case MODE_VECTOR_BOOL: >>>>> + printf ("\n /* %s:%d. 
*/\n ps = %s;\n", a->file, a->line, >>>>> + a->adjustment); >>>>> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name); >>>>> + break; >>>>> + default: >>>>> + break; >>>>> + } >>>>> + >>>>> puts ("}"); >>>>> } >>>>> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c >>>>> new file mode 100644 >>>>> index 00000000000..e70960c5b6d >>>>> --- /dev/null >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c >>>>> @@ -0,0 +1,68 @@ >>>>> +/* { dg-do compile } */ >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */ >>>>> + >>>>> +#include "riscv_vector.h" >>>>> + >>>>> +void >>>>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) { >>>>> + vbool1_t v1 = *(vbool1_t*)in; >>>>> + vbool2_t v2 = *(vbool2_t*)in; >>>>> + >>>>> + *(vbool1_t*)(out + 100) = v1; >>>>> + *(vbool2_t*)(out + 200) = v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) { >>>>> + vbool1_t v1 = *(vbool1_t*)in; >>>>> + vbool4_t v2 = *(vbool4_t*)in; >>>>> + >>>>> + *(vbool1_t*)(out + 100) = v1; >>>>> + *(vbool4_t*)(out + 200) = v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) { >>>>> + vbool1_t v1 = *(vbool1_t*)in; >>>>> + vbool8_t v2 = *(vbool8_t*)in; >>>>> + >>>>> + *(vbool1_t*)(out + 100) = v1; >>>>> + *(vbool8_t*)(out + 200) = v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) { >>>>> + vbool1_t v1 = *(vbool1_t*)in; >>>>> + vbool16_t v2 = *(vbool16_t*)in; >>>>> + >>>>> + *(vbool1_t*)(out + 100) = v1; >>>>> + *(vbool16_t*)(out + 200) = v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) { >>>>> + vbool1_t v1 = *(vbool1_t*)in; >>>>> + vbool32_t v2 = *(vbool32_t*)in; >>>>> + >>>>> + *(vbool1_t*)(out + 100) = v1; >>>>> + *(vbool32_t*)(out + 200) = v2; >>>>> +} >>>>> + >>>>> +void >>>>> 
+test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) { >>>>> + vbool1_t v1 = *(vbool1_t*)in; >>>>> + vbool64_t v2 = *(vbool64_t*)in; >>>>> + >>>>> + *(vbool1_t*)(out + 100) = v1; >>>>> + *(vbool64_t*)(out + 200) = v2; >>>>> +} >>>>> + >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */ >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c >>>>> new file mode 100644 >>>>> index 00000000000..dcc7a644a88 >>>>> --- /dev/null >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c >>>>> @@ -0,0 +1,68 @@ >>>>> +/* { dg-do compile } */ >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */ >>>>> + >>>>> +#include "riscv_vector.h" >>>>> + >>>>> +void >>>>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) { >>>>> + vbool2_t v1 = *(vbool2_t*)in; >>>>> + vbool1_t v2 = *(vbool1_t*)in; >>>>> + >>>>> + *(vbool2_t*)(out + 100) = v1; >>>>> + *(vbool1_t*)(out + 200) = v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) { >>>>> + vbool2_t v1 = 
*(vbool2_t*)in; >>>>> + vbool4_t v2 = *(vbool4_t*)in; >>>>> + >>>>> + *(vbool2_t*)(out + 100) = v1; >>>>> + *(vbool4_t*)(out + 200) = v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) { >>>>> + vbool2_t v1 = *(vbool2_t*)in; >>>>> + vbool8_t v2 = *(vbool8_t*)in; >>>>> + >>>>> + *(vbool2_t*)(out + 100) = v1; >>>>> + *(vbool8_t*)(out + 200) = v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) { >>>>> + vbool2_t v1 = *(vbool2_t*)in; >>>>> + vbool16_t v2 = *(vbool16_t*)in; >>>>> + >>>>> + *(vbool2_t*)(out + 100) = v1; >>>>> + *(vbool16_t*)(out + 200) = v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) { >>>>> + vbool2_t v1 = *(vbool2_t*)in; >>>>> + vbool32_t v2 = *(vbool32_t*)in; >>>>> + >>>>> + *(vbool2_t*)(out + 100) = v1; >>>>> + *(vbool32_t*)(out + 200) = v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) { >>>>> + vbool2_t v1 = *(vbool2_t*)in; >>>>> + vbool64_t v2 = *(vbool64_t*)in; >>>>> + >>>>> + *(vbool2_t*)(out + 100) = v1; >>>>> + *(vbool64_t*)(out + 200) = v2; >>>>> +} >>>>> + >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } 
*/ >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */ >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c >>>>> new file mode 100644 >>>>> index 00000000000..3af0513e006 >>>>> --- /dev/null >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c >>>>> @@ -0,0 +1,68 @@ >>>>> +/* { dg-do compile } */ >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */ >>>>> + >>>>> +#include "riscv_vector.h" >>>>> + >>>>> +void >>>>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) { >>>>> + vbool4_t v1 = *(vbool4_t*)in; >>>>> + vbool1_t v2 = *(vbool1_t*)in; >>>>> + >>>>> + *(vbool4_t*)(out + 100) = v1; >>>>> + *(vbool1_t*)(out + 200) = v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) { >>>>> + vbool4_t v1 = *(vbool4_t*)in; >>>>> + vbool2_t v2 = *(vbool2_t*)in; >>>>> + >>>>> + *(vbool4_t*)(out + 100) = v1; >>>>> + *(vbool2_t*)(out + 200) = v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) { >>>>> + vbool4_t v1 = *(vbool4_t*)in; >>>>> + vbool8_t v2 = *(vbool8_t*)in; >>>>> + >>>>> + *(vbool4_t*)(out + 100) = v1; >>>>> + *(vbool8_t*)(out + 200) = v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) { >>>>> + vbool4_t v1 = *(vbool4_t*)in; >>>>> + vbool16_t v2 = *(vbool16_t*)in; >>>>> + >>>>> + *(vbool4_t*)(out + 100) = v1; >>>>> + *(vbool16_t*)(out + 200) = v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) { >>>>> + vbool4_t v1 = *(vbool4_t*)in; >>>>> + vbool32_t v2 = *(vbool32_t*)in; >>>>> + >>>>> + *(vbool4_t*)(out + 100) = v1; >>>>> + *(vbool32_t*)(out + 200) = v2; >>>>> +} >>>>> + >>>>> +void >>>>> +test_vbool4_then_vbool64(int8_t * 
restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>>> new file mode 100644
>>>>> index 00000000000..ea3c360d756
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>>> new file mode 100644
>>>>> index 00000000000..9fc659d2402
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>>> new file mode 100644
>>>>> index 00000000000..98275e5267d
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>>> new file mode 100644
>>>>> index 00000000000..8f6f0b11f09
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>> new file mode 100644
>>>>> index 00000000000..d96959dd064
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>> @@ -0,0 +1,77 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */