>> What's the byte size of VNx1BI, expressed as a function of N?
>> If it's CEIL (N, 8) then we don't have a way of representing that yet.
N is a poly value: RVV, like SVE, supports scalable vectors.
Here N is poly (1,1).

VNx1BI mode nunits = poly (1,1) units.
VNx1BI mode bitsize = poly (1,1) bits.
VNx1BI mode bytesize = poly (1,1) bytes (currently). Ideally, and more accurately, it would be poly (1/8,1/8) bytes.
However, a poly value can't represent that, so GCC considers its bytesize to be poly (1,1) bytes.


VNx2BI mode nunits = poly (2,2) units.
VNx2BI mode bitsize = poly (2,2) bits.
VNx2BI mode bytesize = poly (1,1) bytes (currently). Ideally, and more accurately, it would be poly (2/8,2/8) bytes.
However, a poly value can't represent that, so GCC considers its bytesize to be poly (1,1) bytes as well.

VNx4BI and VNx8BI likewise.

So their bitsizes are different, but their bytesizes are all the same.
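
To make the arithmetic concrete, here is a minimal sketch in C. The struct poly2 and the helpers poly_eval, ceil_bytes and print_mask_sizes are hypothetical names for illustration only; they are not GCC's poly_int API. The sketch treats poly (a,b) as a + b*x for a runtime x and compares the exact byte size, CEIL (bits, 8), with the poly (1,1) byte size described above.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for a two-coefficient poly value a + b*x, where x is only
   known at run time.  This is NOT GCC's poly_int, just an illustration.  */
struct poly2 { unsigned a, b; };

static unsigned
poly_eval (struct poly2 p, unsigned x)
{
  return p.a + p.b * x;
}

/* Bytes needed to hold BITS bits, i.e. CEIL (BITS, 8).  */
static unsigned
ceil_bytes (unsigned bits)
{
  return (bits + 7) / 8;
}

/* Print, for each mask mode and a few values of x, the bit size, the
   exact byte size CEIL (bits, 8), and the poly (1,1) byte size.  */
void
print_mask_sizes (void)
{
  static const char *const names[]
    = { "VNx1BI", "VNx2BI", "VNx4BI", "VNx8BI" };
  static const struct poly2 bits[] = { {1, 1}, {2, 2}, {4, 4}, {8, 8} };
  const struct poly2 bytes = { 1, 1 };

  for (unsigned x = 0; x < 3; x++)
    for (unsigned i = 0; i < 4; i++)
      {
        unsigned nbits = poly_eval (bits[i], x);
        unsigned exact = ceil_bytes (nbits);
        unsigned approx = poly_eval (bytes, x);
        /* poly (1,1) bytes never under-estimates the exact size.  */
        assert (approx >= exact);
        printf ("x=%u %s: %u bits, exact %u bytes, poly (1,1) %u bytes\n",
                x, names[i], nbits, exact, approx);
      }
}
```

Calling print_mask_sizes () from main shows that the exact size is not linear in x for the smaller modes (for example VNx1BI needs 1 byte at both x=0 and x=1), which is why a poly value cannot express it, while poly (1,1) bytes is exact only for VNx8BI.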


juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2023-02-13 17:41
To: Richard Biener
CC: juzhe.zhong@rivai.ai; incarnation.p.lee; gcc-patches; Kito.cheng; ams
Subject: Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
Richard Biener <rguenther@suse.de> writes:
> On Mon, 13 Feb 2023, juzhe.zhong@rivai.ai wrote:
>
>> >> But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
>> Yes, I think so.
>> 
>> Let me explain RVV more clearly.
>> Suppose the vector length is 64 bits on an RVV CPU.
>> VNx1BI is exactly 1 consecutive bit.
>> VNx2BI is exactly 2 consecutive bits.
>> VNx4BI is exactly 4 consecutive bits.
>> VNx8BI is exactly 8 consecutive bits.
>> 
>> For VNx1BI (vbool64_t), we load it with this asm:
>> vsetvl e8mf8
>> vlm.v
>> 
>> For VNx2BI (vbool32_t), we load it with this asm:
>> vsetvl e8mf4
>> vlm.v
>> 
>> For VNx4BI (vbool16_t), we load it with this asm:
>> vsetvl e8mf2
>> vlm.v
>> 
>> For VNx8BI (vbool8_t), we load it with this asm:
>> vsetvl e8m1
>> vlm.v
>> 
>> In case of this code sequence:
>> vbool16_t v4 = *(vbool16_t *)in;
>> vbool8_t v3 = *(vbool8_t*)in;
>> 
>> Since VNx4BI (vbool16_t) is smaller than VNx8BI (vbool8_t),
>> we can't just use the data loaded as VNx4BI (vbool16_t) for VNx8BI (vbool8_t).
>> But we can use the data loaded as VNx8BI (vbool8_t) for VNx4BI (vbool16_t).
>>
>> In this example, GCC thinks the load for vbool8_t v3 can be replaced by vbool16_t v4, which is already loaded.
>> That is incorrect for RVV.
>
> OK, so the 'vlm.v' instruction will zero the padding bits (according to
> vsetvl), but I doubt the memory subsystem will not load a whole byte.
>
> Then GET_MODE_PRECISION of VNx4BI has to be smaller than 
> GET_MODE_PRECISION of VNx8BI, even if their size is the same.
>
> I suppose that ADJUST_NUNITS should be able to do this, but then we
> have in aarch64-modes.def
>
> VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
> VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
> VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
> VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
>
> ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
> ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
> ADJUST_NUNITS (VNx4BI, aarch64_sve_vg * 2);
> ADJUST_NUNITS (VNx2BI, aarch64_sve_vg);
>
> so all VNxMBI modes are 2 bytes in size but their component is always
> BImode but IIRC the elements of VNx2BImode occupy 4 bits each?
 
Yeah.  Only the low bit is significant, so it's still a 1-bit element.
But the padding is distributed evenly across the elements rather than
being grouped at one end of the predicate.
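
As a rough illustration of the two padding schemes, here is a minimal C sketch. The function significant_bits and its stride parameter are hypothetical, and the 4-bit stride used in the note below is simply the figure from the question above; nothing here is meant as an exact description of either architecture's predicate layout.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask with the significant bit of each of N elements set.
   Each element owns STRIDE bits and only the lowest one is significant.
   STRIDE > 1 models padding distributed evenly across the elements;
   STRIDE == 1 models packed elements, where any padding would sit in
   the unused bits at the high end.  */
static uint32_t
significant_bits (unsigned n, unsigned stride)
{
  uint32_t mask = 0;
  for (unsigned i = 0; i < n; i++)
    {
      unsigned pos = i * stride;
      assert (pos < 32);  /* Keep the toy model within 32 bits.  */
      mask |= UINT32_C (1) << pos;
    }
  return mask;
}
```

With four elements, significant_bits (4, 4) gives 0x1111 (padding between elements), while significant_bits (4, 1) gives 0xF (elements packed together, padding grouped at one end).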
 
> For riscv we have
>
> VECTOR_BOOL_MODE (VNx1BI, 1, BI, 1);
> ADJUST_NUNITS (VNx1BI, riscv_v_adjust_nunits (VNx1BImode, 1));
>
> so here it would be natural to set the mode precision to
> a poly-int computed by the component precision times nunits?  OTOH
> we have to look at the component precision vs. size as well and
>
> /* Single bit mode used for booleans.  */ 
> BOOL_MODE (BI, 1, 1); 
>
> BOOL_MODE is not documented, but its arguments are precision and size, so BImode
> has a size of 1.  That makes VECTOR_BOOL_MODE very special since
> the layout isn't derived from the component mode.  Deriving the
> layout from the precision would make aarch64 incorrect and
> would need BI2 and BI4 modes at least.
 
I think the elements have to stay BI for AArch64.  Using BI2 (with a
precision of 2) would make both bits significant.
 
I'm not sure the RVV case fits into the existing mode layout scheme.
AFAIK we don't currently support vector modes with padding at one end.
If that's right, the fix is likely to involve more than just tweaking
the mode parameters.
 
What's the byte size of VNx1BI, expressed as a function of N?
If it's CEIL (N, 8) then we don't have a way of representing that yet.
 
Thanks,
Richard