From: "盼 李" <incarnation.p.lee@outlook.com>
To: "Richard Sandiford" <richard.sandiford@arm.com>,
"盼 李 via Gcc-patches" <gcc-patches@gcc.gnu.org>
Cc: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>,
pan2.li <pan2.li@intel.com>, Kito.cheng <kito.cheng@sifive.com>,
rguenther <rguenther@suse.de>
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Date: Wed, 1 Mar 2023 11:26:40 +0000
Message-ID: <BYAPR04MB4824B3FEC38AACE8844978FBB7AD9@BYAPR04MB4824.namprd04.prod.outlook.com>
In-Reply-To: <mpt8rggohe9.fsf@arm.com>
Thank you for the explanation.
For [4,4], I need more time to work out which self-test triggers it, but I have confirmed from the log that it occurs.
The precision adjustment part of this PR tries to align with the ISA, as juzhe mentioned, by adjusting the precisions as follows.
VNx64BI precision [0x40, 0x40] // unchanged
VNx32BI precision [0x20, 0x20] // unchanged
VNx16BI precision [0x10, 0x10] // unchanged
VNx8BI precision [0x8, 0x8] // unchanged
VNx4BI precision [0x8, 0x8] => [4, 4]
VNx2BI precision [0x8, 0x8] => [2, 2]
VNx1BI precision [0x8, 0x8] => [1, 1]
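The table implies that the exact_div (precision, BITS_PER_UNIT) emitted by genmodes stays valid for VNx8BI and wider, but not for the three narrowest modes. The following is a minimal sketch that uses a hypothetical two-coefficient Poly2 (value coeffs[0] + coeffs[1] * X) in place of GCC's poly_int:

```cpp
#include <array>
#include <cassert>
#include <cstdio>

// Hypothetical two-coefficient stand-in for a poly_int:
// value = coeffs[0] + coeffs[1] * X, where X is the runtime VLEN factor.
struct Poly2 {
  long coeffs[2];
};

// Per the table above, the adjusted precision of VNx<S>BI is [S, S].
Poly2 mask_precision(long scale) { return Poly2{{scale, scale}}; }

// exact_div (P, D) is only well defined when every coefficient of P
// is a multiple of D.
bool exact_div_ok(const Poly2 &p, long divisor) {
  for (long c : p.coeffs)
    if (c % divisor != 0)
      return false;
  return true;
}

// Print the byte size that exact_div (precision, 8) would give per mode.
void print_mode_sizes() {
  const std::array<long, 7> scales = {64, 32, 16, 8, 4, 2, 1};
  for (long s : scales) {
    Poly2 p = mask_precision(s);
    if (exact_div_ok(p, 8))
      std::printf("VNx%ldBI [%ld, %ld] -> size [%ld, %ld]\n", s,
                  p.coeffs[0], p.coeffs[1], p.coeffs[0] / 8, p.coeffs[1] / 8);
    else
      std::printf("VNx%ldBI [%ld, %ld] -> not a multiple of 8\n", s,
                  p.coeffs[0], p.coeffs[1]);
  }
}
```

print_mode_sizes () reports sizes [8, 8], [4, 4], [2, 2] and [1, 1] for VNx64BI down to VNx8BI. It flags VNx4BI, VNx2BI and VNx1BI as not being multiples of 8.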
Pan
________________________________
From: Richard Sandiford <richard.sandiford@arm.com>
Sent: Wednesday, March 1, 2023 19:11
To: 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; pan2.li <pan2.li@intel.com>; 盼 李 <incarnation.p.lee@outlook.com>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Thank you all for your quick responses.
>
> As juzhe mentioned, RISC-V memory accesses in the compact mode are always byte-aligned, i.e. a vbool* access touches ceil(vl / 8) bytes.
OK, thanks to both of you. This is what I'd have expected.
In that case, I think both the can_div_away_from_zero_p and the
original patch (using size_one) will give the wrong results.
There isn't a way of representing ceil([4,4]/8) as a poly_int.
The result is (4+4X)/8 when X is odd and (8+4X)/8 when X is even.
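Richard's claim can be checked numerically. The following is a minimal sketch that assumes a two-coefficient poly_int [c0, c1] evaluating to c0 + c1 * X. It checks his piecewise formula against ceil((4 + 4X) / 8) and shows that no coefficient pair fits once X reaches 2. The search bounds are arbitrary, but the comment gives the general argument. All names are hypothetical:

```cpp
#include <cassert>

// Significant bits of VNx4BI for runtime factor X, given precision [4, 4].
long vnx4bi_bits(long x) { return 4 + 4 * x; }

// Whole bytes needed to hold BITS bits: ceil(bits / 8).
long ceil_bytes(long bits) { return (bits + 7) / 8; }

// Richard's piecewise form: (4+4X)/8 when X is odd, (8+4X)/8 when X is even.
long piecewise_bytes(long x) {
  return (x % 2 != 0 ? 4 + 4 * x : 8 + 4 * x) / 8;
}

// True if the poly_int [c0, c1] (value c0 + c1 * X) equals
// ceil((4 + 4X) / 8) for every X in [0, max_x].
bool poly_matches(long c0, long c1, long max_x) {
  for (long x = 0; x <= max_x; ++x)
    if (c0 + c1 * x != ceil_bytes(vnx4bi_bits(x)))
      return false;
  return true;
}

// Search small coefficients.  X = 0 and X = 1 both need 1 byte, forcing
// c0 = 1 and c1 = 0, but X = 2 needs 2 bytes, so nothing fits once
// max_x >= 2.
bool any_poly_matches(long max_x) {
  for (long c0 = -16; c0 <= 16; ++c0)
    for (long c1 = -16; c1 <= 16; ++c1)
      if (poly_matches(c0, c1, max_x))
        return true;
  return false;
}
```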
> Actually, the value [4,4] comes from the self-test, with the RISC-V mode precisions shown below.
>
> VNx64BI precision [0x40, 0x40].
> VNx32BI precision [0x20, 0x20].
> VNx16BI precision [0x10, 0x10].
> VNx8BI precision [0x8, 0x8].
> VNx4BI precision [0x8, 0x8].
> VNx2BI precision [0x8, 0x8].
> VNx1BI precision [0x8, 0x8].
Ah, OK. Which self-test causes this?
Richard
> The [4, 4] value affects the genmodes part: we cannot write the code below, because the gcc_unreachable would be hit.
>
>   if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
>                                  &mode_size[E_%smode]))
>     gcc_unreachable (); // Hit on [4, 4] in the self-test.
>
> Pan
> ________________________________
> From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> Sent: Wednesday, March 1, 2023 18:46
> To: richard.sandiford <richard.sandiford@arm.com>; pan2.li <pan2.li@intel.com>
> Cc: incarnation.p.lee <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
>>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
>>>bits, even when that means accessing fully-unused bytes? E.g. 4+4X
>>>when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
>>>8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
>>>of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>
> Hi, Richard. Thank you for helping us.
> My understanding of RVV ISA:
>
> In RVV, we have mask modes (VNx1BI, VNx2BI, ... etc.) and data modes (VNx1QI, VNx4QI, VNx2DI, ... etc.).
> For data modes, we access the data fully and there are no unused bytes, so we don't need to adjust the precision.
> However, for mask modes we access the mask bits in a compact layout (the mask bits of consecutive elements are stored next to each other).
> For example, in the current configuration VNx1BI, VNx2BI, VNx4BI and VNx8BI all have the same byte size [1, 1] but different bit sizes.
>
> VNx8BI is accessed fully, but only 1/2 of VNx4BI is accessed, 1/4 of VNx2BI and 1/8 of VNx1BI, with byte alignment (I am not sure whether RVV supports bit alignment; I guess it cannot).
>
> If VNx8BI occupies only 1 byte (this depends on the machine vector length), then VNx1BI, VNx2BI and VNx4BI need 1/8, 2/8 and 4/8 of a byte. Since we cannot access memory with bit alignment, all of them access the same single byte.
> However, if VNx8BI occupies 8 bytes, then VNx1BI, VNx2BI and VNx4BI occupy 1, 2 and 4 bytes respectively, so they access different sizes.
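juzhe's two scenarios come down to rounding the significant mask bits up to whole bytes. This is the same ceil(vl / 8) rule as in the compact mode. The following is a minimal sketch. It assumes that VNx<S>BI carries S mask bits per unit of the runtime factor, so that the byte count of VNx8BI fixes that factor. The helper names are hypothetical:

```cpp
#include <cassert>

// Mask bits of VNx<S>BI, assuming each unit of the runtime factor gives
// S mask bits, so that VNx8BI spans exactly vnx8bi_bytes bytes.
long mask_bits(long scale, long vnx8bi_bytes) { return scale * vnx8bi_bytes; }

// Mask loads and stores are byte-granular, so round up: ceil(bits / 8).
long mask_bytes(long bits) { return (bits + 7) / 8; }
```

With vnx8bi_bytes = 1, every narrower mode still touches one byte. With vnx8bi_bytes = 8, VNx1BI, VNx2BI and VNx4BI touch 1, 2 and 4 bytes.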
>
> This is my comprehension of RVV ISA, feel free to correct me.
> Thanks.
>
> ________________________________
> juzhe.zhong@rivai.ai
>
> From: Richard Sandiford <richard.sandiford@arm.com>
> Date: 2023-03-01 18:11
> To: Li, Pan2 <pan2.li@intel.com>
> CC: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> "Li, Pan2" <pan2.li@intel.com> writes:
>> Hi Richard Sandiford,
>>
>> I just tried the new overload for constant divisors with the printf below, and it works as you mentioned!
>>
>>   printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
>>           "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>>
>> template<unsigned int N, typename Ca, typename Cb, typename Cq>
>> inline typename if_nonpoly<Cb, bool>::type
>> can_div_away_from_zero_p (const poly_int_pod<N, Ca> &a,
>>                           Cb b,
>>                           poly_int_pod<N, Cq> *quotient)
>> {
>>   if (!can_div_trunc_p (a, b, quotient))
>>     return false;
>>   /* can_div_trunc_p only succeeds when coefficients 1..N-1 are exact
>>      multiples of B, so any remainder is in coefficient 0.  Adjusting
>>      the other coefficients would add multiples of the runtime factor.  */
>>   if (maybe_ne (*quotient * b, a))
>>     quotient->coeffs[0] += (quotient->coeffs[0] < 0 ? -1 : 1);
>>   return true;
>> }
>>
>> But I have a question about one case, shown below.
>>
>> Assume:
>> a = [4, 4], b = 8.
>>
>> can_div_trunc_p checks whether the remainder is constant, i.e. whether a.coeffs[i] % 8 == 0 for every i >= 1. If the remainder is not constant, can_div_trunc_p leaves the quotient untouched and returns false.
>>
>> Thus, when a = [4, 4], can_div_away_from_zero_p leaves *quotient unchanged, i.e. mode_size[E_%smode] is not updated in this case. However, the mode_size adjustment that follows sets it to the real byte size, and I am not sure whether this is by design or requires additional handling.
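Pan's observation can be reproduced with a simplified model. The sketch below uses a hypothetical two-coefficient stand-in (Poly2), not GCC's poly-int.h. It follows the constant-divisor can_div_trunc_p behaviour described above. It rounds only coefficient 0 away from zero, because that is the only place a remainder can appear:

```cpp
#include <cassert>

// Hypothetical two-coefficient stand-in for poly_int_pod<2, long>:
// value = coeffs[0] + coeffs[1] * X.  Illustration only.
constexpr int kCoeffs = 2;
struct Poly2 {
  long coeffs[kCoeffs];
};

// Truncating division by a constant, as described above: it only
// succeeds when the remainder is constant, i.e. when coeffs[1..] are
// multiples of B.  On failure the quotient is left untouched.
bool can_div_trunc_p(const Poly2 &a, long b, Poly2 *quotient) {
  for (int i = 1; i < kCoeffs; ++i)
    if (a.coeffs[i] % b != 0)
      return false;
  for (int i = 0; i < kCoeffs; ++i)
    quotient->coeffs[i] = a.coeffs[i] / b;
  return true;
}

// Rounding away from zero.  Any remainder can only be in coeffs[0], so
// only that coefficient is adjusted, in the direction of A[0] / B.
bool can_div_away_from_zero_p(const Poly2 &a, long b, Poly2 *quotient) {
  if (!can_div_trunc_p(a, b, quotient))
    return false;
  if (quotient->coeffs[0] * b != a.coeffs[0])
    quotient->coeffs[0] += ((a.coeffs[0] < 0) != (b < 0)) ? -1 : 1;
  return true;
}
```

With this model, [4, 4] / 8 fails and leaves the quotient untouched. [8, 8] / 8 gives [1, 1], and [12, 8] / 8 gives [2, 1].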
>
> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
> bits, even when that means accessing fully-unused bytes? E.g. 4+4X
> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
> 8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
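The following is a minimal sketch of the arithmetic behind Richard's X == 3 example. The helper names are hypothetical. Useful data is rounded up to whole bytes, which is an assumption that only matters for even X, where 4 + 4X is not a multiple of 8:

```cpp
#include <cassert>

// Bytes of useful data for precision [4, 4] at runtime factor X,
// rounded up to whole bytes: ceil((4 + 4X) / 8).
long useful_bytes(long x) { return (4 + 4 * x + 7) / 8; }

// Bytes written if the access used a bitsize of [8, 8]: (8 + 8X) / 8.
long stored_bytes(long x) { return (8 + 8 * x) / 8; }

// Bytes written beyond the end of the useful data.
long overrun_bytes(long x) { return stored_bytes(x) - useful_bytes(x); }
```

At X == 3 this gives 2 useful bytes, 4 stored bytes and a 2-byte overrun. At X == 0 there is no overrun.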
>
> Richard
>
>> Pan
>>
>> From: 盼 李 <incarnation.p.lee@outlook.com>
>> Sent: Tuesday, February 28, 2023 5:59 PM
>> To: Richard Sandiford <richard.sandiford@arm.com>; Li, Pan2 <pan2.li@intel.com>
>> Cc: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>> Understood, thanks for the explanations and suggestions. Let me have a try and keep you posted.
>>
>> Pan
>> ________________________________
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Tuesday, February 28, 2023 17:50
>> To: Li, Pan2 <pan2.li@intel.com>
>> Cc: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>> "Li, Pan2" <pan2.li@intel.com> writes:
>>> Hi Richard Sandiford,
>>>
>>> After some investigation, I am not sure whether it is possible to make this general without changing exact_div. We could add a method like the one below to get a unit poly_int for any N.
>>>
>>> template<unsigned int N, typename Ca>
>>> inline POLY_CONST_RESULT (N, Ca, Ca)
>>> normalize_to_unit (const poly_int_pod<N, Ca> &a)
>>> {
>>>   typedef POLY_CONST_COEFF (Ca, Ca) C;
>>>
>>>   poly_int<N, C> normalized = a;
>>>
>>>   if (normalized.is_constant ())
>>>     normalized.coeffs[0] = 1;
>>>   else
>>>     for (unsigned int i = 0; i < N; i++)
>>>       POLY_SET_COEFF (C, normalized, i, 1);
>>>
>>>   return normalized;
>>> }
>>>
>>> Then the genmodes code could be adjusted as below to use the unit poly_int.
>>>
>>>   printf ("  poly_uint16 unit_poly = "
>>>           "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
>>>   printf ("  if (known_lt (mode_precision[E_%smode], "
>>>           "unit_poly * BITS_PER_UNIT))\n", m->name);
>>>   printf ("    mode_size[E_%smode] = unit_poly;\n", m->name);
>>>
>>> I am not sure whether it is a good idea to introduce the normalization code above into exact_div, given that the comment on exact_div says “/* Return A / B, given that A is known to be a multiple of B. */”.
>>
>> My point was that we have multiple ways of dividing poly_ints:
>>
>> - exact_div, for when the caller knows that the result is always exact
>> - can_div_trunc_p, for truncating division (round towards 0)
>> - can_div_away_from_zero_p, for rounding away from 0
>> - ...
>>
>> This is like how we have multiple division *_EXPRs on trees.
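For readers less familiar with poly_int, the three flavours listed above behave like the scalar functions below. This is a minimal sketch on plain long values. The *_scalar names are hypothetical. GCC's real functions operate on poly_int, and the can_div_* versions report success and return the quotient through a pointer:

```cpp
#include <cassert>

// Like exact_div: the caller guarantees that A is a multiple of B.
long exact_div_scalar(long a, long b) {
  assert(a % b == 0 && "exact_div requires A to be a multiple of B");
  return a / b;
}

// Like can_div_trunc_p: round towards zero (C++ '/' already truncates).
long div_trunc_scalar(long a, long b) { return a / b; }

// Like can_div_away_from_zero_p: when there is a remainder, move the
// truncated quotient one step further from zero, following the sign of
// the true quotient.
long div_away_from_zero_scalar(long a, long b) {
  long q = a / b;
  if (q * b != a)
    q += ((a < 0) != (b < 0)) ? -1 : 1;
  return q;
}
```

For mode sizes, where everything is non-negative, rounding away from zero is simply rounding up. For example, 4 bits needs 1 byte and 12 bits needs 2.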
>>
>> Until now, exact_div was the correct choice for modes because vector
>> modes didn't have padding. We're now changing that, so my suggestion
>> in the review was to change the division operation that we use.
>> Rather than use exact_div, we should now use can_div_away_from_zero_p,
>> which would have the effect of rounding the quotient up.
>>
>> Something like:
>>
>>   if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
>>                                  &mode_size[E_%smode]))
>>     gcc_unreachable ();
>>
>> But this will require a new overload of can_div_away_from_zero_p, since
>> the existing one is for constant quotients rather than constant divisors.
>>
>> Thanks,
>> Richard
>>
>>>
>>> Could you please share your opinion on this from an expert's perspective? Thank you!
>>>
>>> Pan
>>>
>>> From: 盼 李 <incarnation.p.lee@outlook.com>
>>> Sent: Monday, February 27, 2023 11:13 PM
>>> To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>>> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> No problem, I hope you had a good holiday.
>>>
>>> Thanks for pointing this out; the if part cannot handle poly_int with N other than 2. As I understand it, we need to make it general for every N of poly_int.
>>>
>>> So I would like to double-check with you how to make it general. I suppose there will be a new can_div_away_from_zero_p function to replace the if (known_lt (,)) part in genmodes.cc, and exact_div will stay unchanged (given the word "exact", I suppose we should not touch it), right? Then we still need a poly_int with all coefficients set to 1 for N as the result when can_div_away_from_zero_p returns true.
>>>
>>> Thanks again for your expert suggestion, and have a nice day!
>>>
>>> Pan
>>> ________________________________
>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>> Sent: Monday, February 27, 2023 22:24
>>> To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>>> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> Sorry for the slow reply, been away for a couple of weeks.
>>>
>>> "incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>>>> From: Pan Li <pan2.li@intel.com>
>>>>
>>>> Fix the rvv bool mode precision bug by adjusting the precision.
>>>> The bit size of vbool*_t will be adjusted to
>>>> [1, 2, 4, 8, 16, 32, 64] according to the RVV 1.0 ISA spec. The
>>>> adjusted mode precision of vbool*_t will help the underlying passes
>>>> make the right decisions for both correctness and optimization.
>>>>
>>>> Given the sample code below:
>>>> void test_1(int8_t * restrict in, int8_t * restrict out)
>>>> {
>>>> vbool8_t v2 = *(vbool8_t*)in;
>>>> vbool16_t v5 = *(vbool16_t*)in;
>>>> *(vbool16_t*)(out + 200) = v5;
>>>> *(vbool8_t*)(out + 100) = v2;
>>>> }
>>>>
>>>> Before the precision adjustment:
>>>> addi a4,a1,100
>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>> addi a1,a1,200
>>>> vlm.v v24,0(a0)
>>>> vsm.v v24,0(a4)
>>>> // Need one vsetvli and vlm.v for correctness here.
>>>> vsm.v v24,0(a1)
>>>>
>>>> After the precision adjustment:
>>>> csrr t0,vlenb
>>>> slli t1,t0,1
>>>> csrr a3,vlenb
>>>> sub sp,sp,t1
>>>> slli a4,a3,1
>>>> add a4,a4,sp
>>>> sub a3,a4,a3
>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>> addi a2,a1,200
>>>> vlm.v v24,0(a0)
>>>> vsm.v v24,0(a3)
>>>> addi a1,a1,100
>>>> vsetvli a4,zero,e8,mf2,ta,ma
>>>> csrr t0,vlenb
>>>> vlm.v v25,0(a3)
>>>> vsm.v v25,0(a2)
>>>> slli t1,t0,1
>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>> vsm.v v24,0(a1)
>>>> add sp,sp,t1
>>>> jr ra
>>>>
>>>> However, there may be some optimization opportunities after
>>>> the mode precision adjustment. They can be taken care of in
>>>> the RISC-V backend in separate follow-up PR(s).
>>>>
>>>> PR 108185
>>>> PR 108654
>>>>
>>>> gcc/ChangeLog:
>>>>
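>>>> * config/riscv/riscv-modes.def (ADJUST_PRECISION): Adjust the precision of the VNx*BI modes.
>>>> * config/riscv/riscv.cc (riscv_v_adjust_precision): New function.
>>>> * config/riscv/riscv.h (riscv_v_adjust_precision): Declare.
>>>> * genmodes.cc (ADJUST_PRECISION): New macro.
>>>> (emit_mode_adjustments): Emit precision adjustments for MODE_VECTOR_BOOL modes and set the size to one unit when the precision is below BITS_PER_UNIT.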
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> * gcc.target/riscv/pr108185-1.c: New test.
>>>> * gcc.target/riscv/pr108185-2.c: New test.
>>>> * gcc.target/riscv/pr108185-3.c: New test.
>>>> * gcc.target/riscv/pr108185-4.c: New test.
>>>> * gcc.target/riscv/pr108185-5.c: New test.
>>>> * gcc.target/riscv/pr108185-6.c: New test.
>>>> * gcc.target/riscv/pr108185-7.c: New test.
>>>> * gcc.target/riscv/pr108185-8.c: New test.
>>>>
>>>> Signed-off-by: Pan Li <pan2.li@intel.com>
>>>> ---
>>>> gcc/config/riscv/riscv-modes.def | 8 +++
>>>> gcc/config/riscv/riscv.cc | 12 ++++
>>>> gcc/config/riscv/riscv.h | 1 +
>>>> gcc/genmodes.cc | 25 ++++++-
>>>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
>>>> 12 files changed, 598 insertions(+), 1 deletion(-)
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>
>>>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
>>>> index d5305efa8a6..110bddce851 100644
>>>> --- a/gcc/config/riscv/riscv-modes.def
>>>> +++ b/gcc/config/riscv/riscv-modes.def
>>>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>>>>
>>>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
>>>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
>>>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
>>>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
>>>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
>>>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
>>>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
>>>> +
>>>> /*
>>>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
>>>> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
>>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>>> index de3e1f903c7..cbe66c0e35b 100644
>>>> --- a/gcc/config/riscv/riscv.cc
>>>> +++ b/gcc/config/riscv/riscv.cc
>>>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
>>>> return scale;
>>>> }
>>>>
>>>> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
>>>> + PRECISION size for corresponding machine_mode. */
>>>> +
>>>> +poly_int64
>>>> +riscv_v_adjust_precision (machine_mode mode, int scale)
>>>> +{
>>>> + if (riscv_v_ext_vector_mode_p (mode))
>>>> + return riscv_vector_chunks * scale;
>>>> +
>>>> + return scale;
>>>> +}
>>>> +
>>>> /* Return true if X is a valid address for machine mode MODE. If it is,
>>>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
>>>> effect. */
>>>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>>>> index 5bc7f2f467d..15b9317a8ce 100644
>>>> --- a/gcc/config/riscv/riscv.h
>>>> +++ b/gcc/config/riscv/riscv.h
>>>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
>>>> extern unsigned riscv_bytes_per_vector_chunk;
>>>> extern poly_uint16 riscv_vector_chunks;
>>>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
>>>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
>>>> /* The number of bits and bytes in a RVV vector. */
>>>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
>>>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
>>>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
>>>> index 2d418f09aab..12f4e6335e6 100644
>>>> --- a/gcc/genmodes.cc
>>>> +++ b/gcc/genmodes.cc
>>>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
>>>> static struct mode_adjust *adj_format;
>>>> static struct mode_adjust *adj_ibit;
>>>> static struct mode_adjust *adj_fbit;
>>>> +static struct mode_adjust *adj_precision;
>>>>
>>>> /* Mode class operations. */
>>>> static enum mode_class
>>>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
>>>> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
>>>> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
>>>> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
>>>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
>>>> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
>>>> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
>>>> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
>>>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
>>>> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
>>>> m->name, m->name);
>>>> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
>>>> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
>>>> + printf (" poly_uint16 size_one = "
>>>> + "mode_precision[E_%smode].is_constant ()\n", m->name);
>>>> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
>>>
>>> Have you tried this on an x86_64 system? I wouldn't expect it to work
>>> because of the:
>>>
>>> STATIC_ASSERT (N >= 2);
>>>
>>> in the poly_uint16 constructor.
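Richard's STATIC_ASSERT point can be illustrated with a stripped-down model. PolyModel below is hypothetical and only mimics the constraint he quotes. GCC's real poly_int is more involved. On a target such as x86_64, NUM_POLY_INT_COEFFS is 1, so only the one-coefficient form is usable there:

```cpp
#include <cassert>

// Simplified stand-in for poly_int<N, C>; illustration only.
template <unsigned int N, typename C>
struct PolyModel {
  C coeffs[N];

  // One-argument form: a compile-time constant, valid for every N.
  explicit PolyModel(C c0) {
    coeffs[0] = c0;
    for (unsigned int i = 1; i < N; ++i)
      coeffs[i] = 0;
  }

  // Two-argument form: its body is only instantiated when it is used,
  // and it then fails to compile unless there is a second coefficient.
  PolyModel(C c0, C c1) {
    static_assert(N >= 2, "two coefficients need N >= 2");
    coeffs[0] = c0;
    coeffs[1] = c1;
    for (unsigned int i = 2; i < N; ++i)
      coeffs[i] = 0;
  }
};
```

Writing PolyModel<1, unsigned short> (1, 1) would trip the static_assert. This mirrors why the generated poly_uint16 (1, 1) cannot build for such targets.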
>>>
>>>> + printf (" if (known_lt (mode_precision[E_%smode], "
>>>> + "size_one * BITS_PER_UNIT))\n", m->name);
>>>> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
>>>> + printf (" else\n");
>>>> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>
>>> Now that the assert implicit in the original exact_div no longer holds,
>>> I think we should instead generalise it to can_div_away_from_zero_p
>>> (which will involve defining a new overload of can_div_away_from_zero_p).
>>> I think that will give the same result as the code above for the cases
>>> that the code above handles. But it should be more general too.
>>>
>>> TBH, I'm still sceptical that this is all that is needed. It seems
>>> unlikely that we've been so good at writing vector support code that
>>> we've made it work for precision < bitsize, despite that being an
>>> unsupported combination until now. But I guess we can fix problems
>>> on a case-by-case basis.
>>>
>>> Thanks,
>>> Richard
>>>
>>>> " BITS_PER_UNIT);\n", m->name, m->name);
>>>> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
>>>> printf (" adjust_mode_mask (E_%smode);\n", m->name);
>>>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
>>>> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
>>>> a->file, a->line, a->mode->name, a->adjustment);
>>>>
>>>> + /* Adjust precision to the actual bit size. */
>>>> + for (a = adj_precision; a; a = a->next)
>>>> + switch (a->mode->cl)
>>>> + {
>>>> + case MODE_VECTOR_BOOL:
>>>> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
>>>> + a->adjustment);
>>>> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
>>>> + break;
>>>> + default:
>>>> + break;
>>>> + }
>>>> +
>>>> puts ("}");
>>>> }
>>>>
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>> new file mode 100644
>>>> index 00000000000..e70960c5b6d
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>> new file mode 100644
>>>> index 00000000000..dcc7a644a88
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>> new file mode 100644
>>>> index 00000000000..3af0513e006
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>> new file mode 100644
>>>> index 00000000000..ea3c360d756
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>> new file mode 100644
>>>> index 00000000000..9fc659d2402
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>> new file mode 100644
>>>> index 00000000000..98275e5267d
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>> new file mode 100644
>>>> index 00000000000..8f6f0b11f09
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>> new file mode 100644
>>>> index 00000000000..d96959dd064
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>> @@ -0,0 +1,77 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
Thread overview: 48+ messages
2023-02-16 15:11 incarnation.p.lee
[not found] ` <9800822AA73B1E3D+5F679DFB-633A-446F-BB7F-59ADEEE67E50@rivai.ai>
2023-02-17 7:18 ` Li, Pan2
2023-02-17 7:36 ` Richard Biener
2023-02-17 8:39 ` Li, Pan2
2023-02-21 6:36 ` Li, Pan2
2023-02-21 8:28 ` Kito Cheng
2023-02-24 5:08 ` juzhe.zhong
2023-02-24 7:21 ` Li, Pan2
2023-02-27 3:43 ` Li, Pan2
2023-02-27 14:24 ` Richard Sandiford
2023-02-27 15:13 ` 盼 李
2023-02-28 2:27 ` Li, Pan2
2023-02-28 9:50 ` Richard Sandiford
2023-02-28 9:59 ` 盼 李
2023-02-28 14:07 ` Li, Pan2
2023-03-01 10:11 ` Richard Sandiford
2023-03-01 10:46 ` juzhe.zhong
2023-03-01 10:55 ` 盼 李
2023-03-01 11:11 ` Richard Sandiford
2023-03-01 11:26 ` 盼 李 [this message]
2023-03-01 11:53 ` 盼 李
2023-03-01 12:03 ` Richard Sandiford
2023-03-01 12:13 ` juzhe.zhong
2023-03-01 12:27 ` 盼 李
2023-03-01 12:33 ` Richard Biener
2023-03-01 12:56 ` Pan Li
2023-03-01 13:11 ` Richard Biener
2023-03-01 13:19 ` Richard Sandiford
2023-03-01 13:26 ` Richard Biener
2023-03-01 13:50 ` juzhe.zhong
2023-03-01 13:59 ` Richard Biener
2023-03-01 14:03 ` Richard Biener
2023-03-01 14:19 ` juzhe.zhong
2023-03-01 15:42 ` Li, Pan2
2023-03-01 15:46 ` Pan Li
2023-03-01 16:14 ` Richard Sandiford
2023-03-01 22:53 ` juzhe.zhong
2023-03-02 6:07 ` Li, Pan2
2023-03-02 8:25 ` Richard Biener
2023-03-02 8:37 ` juzhe.zhong
2023-03-02 9:39 ` Richard Sandiford
2023-03-02 10:19 ` juzhe.zhong
[not found] ` <2023030121501634323743@rivai.ai>
2023-03-01 13:52 ` juzhe.zhong
2023-03-02 5:55 ` [PATCH v2] " pan2.li
2023-03-02 9:43 ` Richard Sandiford
2023-03-02 14:46 ` Li, Pan2
2023-03-02 17:54 ` Richard Sandiford
2023-03-03 2:34 ` Li, Pan2