From: Richard Biener <richard.guenther@gmail.com>
Date: Monday, June 26, 2023 at 2:23 PM
To: Tejas Belagod <Tejas.Belagod@arm.com>
Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> Packed Boolean Vectors
> ----------------------
>
> I'd like to propose a feature addition to GNU Vector extensions to add packed
> boolean vectors (PBV).  This has been discussed in the past here[1] and a variant has
> been implemented in Clang recently[2].
>
> With predication features being added to vector architectures (SVE, MVE, AVX),
> it is a useful feature to have to model predication on targets.  This could
> find its use in intrinsics or just used as is as a GNU vector extension being
> mapped to underlying target features.  For example, the packed boolean vector
> could directly map to a predicate register on SVE.
>
> Also, this new packed boolean type GNU extension can be used with SVE ACLE
> intrinsics to replace a fixed-length svbool_t.
>
> Here are a few options to represent the packed boolean vector type.

The GIMPLE frontend uses a new 'vector_mask' attribute:

typedef int v8si __attribute__((vector_size(8*sizeof(int))));
typedef v8si v8sib __attribute__((vector_mask));

it get's you a vector type that's the appropriate (dependent on the
target) vector
mask type for the vector data type (v8si in this case).


Thanks Richard.

Having had a quick look at the implementation, it does seem to tick the boxes.

I must admit I haven't dug deep, but if the target hook allows the mask to be

defined in way that is target-friendly (and I don't know how much effort it will

be to migrate the attribute to more front-ends), it should do the job nicely.

Let me go back and dig a bit deeper and get back with questions if any.

Thanks,
Tejas.




> 1. __attribute__((vector_size (n))) where n represents bytes
>
>   typedef bool vbool __attribute__ ((vector_size (1)));
>
> In this approach, the shape of the boolean vector is unclear. IoW, it is not
> clear if each bit in 'n' controls a byte or an element. On targets
> like SVE, it would be natural to have each bit control a byte of the target
> vector (therefore resulting in an 'unpacked' layout of the PBV) and on AVX, each
> bit would control one element/lane on the target vector(therefore resulting in a
> 'packed' layout with all significant bits at the LSB).
>
> 2. __attribute__((vector_size (n))) where n represents num of lanes
>
>   typedef int v4si __attribute__ ((vector_size (4 * sizeof (int)));
>   typedef bool v4bi __attribute__ ((vector_size (sizeof v4si / sizeof (v4si){0}[0])));
>
> Here the 'n' in the vector_size attribute represents the number of bits that
> is needed to represent a vector quantity.  In this case, this packed boolean
> vector can represent upto 'n' vector lanes. The size of the type is
> rounded up the nearest byte.  For example, the sizeof v4bi in the above
> example is 1.
>
> In this approach, because of the nature of the representation, the n bits required
> to represent the n lanes of the vector are packed at the LSB. This does not naturally
> align with the SVE approach of each bit representing a byte of the target vector
> and PBV therefore having an 'unpacked' layout.
>
> More importantly, another drawback here is that the change in units for vector_size
> might be confusing to programmers.  The units will have to be interpreted based on the
> base type of the typedef. It does not offer any flexibility in terms of the layout of
> the bool vector - it is fixed.
>
> 3. Combination of 1 and 2.
>
> Combining the best of 1 and 2, we can introduce extra parameters to vector_size that will
> unambiguously represent the layout of the PBV. Consider
>
>   typedef bool vbool __attribute__((vector_size (s, n[, w])));
>
> where 's' is size in bytes, 'n' is the number of lanes and an optional 3rd parameter 'w'
> is the number of bits of the PBV that represents a lane of the target vector. 'w' would
> allow a target to force a certain layout of the PBV.
>
> The 2-parameter form of vector_size allows the target to have an
> implementation-defined layout of the PBV. The target is free to choose the 'w'
> if it is not specified to mirror the target layout of predicate registers. For
> eg. AVX would choose 'w' as 1 and SVE would choose s*8/n.
>
> As an example, to represent the result of a comparison on 2 int16x8_t, we'd need
> 8 lanes of boolean which could be represented by
>
>   typedef bool v8b __attribute__ ((vector_size (2, 8)));
>
> SVE would implement v8b layout to make every 2nd bit significant i.e. w == 2
>
> and AVX would choose a layout where all 8 consecutive bits packed at LSB would
> be significant i.e. w == 1.
>
> This scheme would accomodate more than 1 target to effectively represent vector
> bools that mirror the target properties.
>
> 4. A new attribite
>
> This is based on a suggestion from Richard S in [3]. The idea is to introduce a new
> attribute to define the PBV and make it general enough to
>
> * represent all targets flexibly (SVE, AVX etc)
> * represent sub-byte length predicates
> * have no change in units of vector_size/no new vector_size signature
> * not have the number of bytes constrain representation
>
> If we call the new attribute 'bool_vec' (for lack of a better name), consider
>
>   typedef bool vbool __attribute__((bool_vec (n[, w])))
>
> where 'n' represents number of lanes/elements and the optional 'w' is bits-per-lane.
>
> If 'w' is not specified, it and bytes-per-predicate are implementation-defined based on target.
> If 'w' is specified,  sizeof (vbool) will be ceil (n*w/8).
>
> 5. Behaviour of the packed vector boolean type.
>
> Taking the example of one of the options above, following is an illustration of it's behavior
>
> * ABI
>
>   New ABI rules will need to be defined for this type - eg alignment, PCS,
>   mangling etc
>
> * Initialization:
>
>   Packed Boolean Vectors(PBV) can be initialized like so:
>
>     typedef bool v4bi __attribute__ ((vector_size (2, 4, 4)));
>     v4bi p = {false, true, false, false};
>
>   Each value in the initizlizer constant is of type bool. The lowest numbered
>   element in the const array corresponds to the LSbit of p, element 1 is
>   assigned to bit 4 etc.
>
>   p is effectively a 2-byte bitmask with value 0x0010
>
>   With a different layout
>
>     typedef bool v4bi __attribute__ ((vector_size (2, 4, 1)));
>     v4bi p = {false, true, false, false};
>
>   p is effectively a 2-byte bitmask with value 0x0002
>
> * Operations:
>
>   Packed Boolean Vectors support the following operations:
>   . unary ~
>   . unary !
>   . binary&,|andˆ
>   . assignments &=, |= and ˆ=
>   . comparisons <, <=, ==, !=, >= and >
>   . Ternary operator ?:
>
>   Operations are defined as applied to the individual elements i.e the bits
>   that are significant in the PBV. Whether the PBVs are treated as bitmasks
>   or otherwise is implementation-defined.
>
>   Insignificant bits could affect results of comparisons or ternary operators.
>   In such cases, it is implementation defined how the unused bits are treated.
>
>   . Subscript operator []
>
>   For the subscript operator, the packed boolean vector acts like a array of
>   elements - the first or the 0th indexed element being the LSbit of the PBV.
>   Subscript operator yields a scalar boolean value.
>   For example:
>
>     typedef bool v8b __attribute__ ((vector_size (2, 8, 2)));
>
>     // Subscript operator result yields a boolean value.
>     // x[3] is the 7th LSbit and x[1] is the 3rd LSbit of x.
>     bool foo (v8b p, int n) { p[3] = true; return p[1]; }
>
>   Out of bounds access: OOB access can be determined at compile time given the
>   strong typing of the PBVs.
>
>   PBV does not support address of operator(&) for elements of PBVs.
>
>   . Implicit conversion from integer vectors to PBVs
>
>   We would like to support the output of comparison operations to be PBVs. This
>   requires us to define the implicit conversion from an integer vector to PBV
>   as the result of vector comparisons are integer vectors.
>
>   To define this operation:
>
>     bool_vector = vector <cmpop> vector
>
>   There is no change in how vector <cmpop> vector behavior i.e. this comparison
>   would still produce an int_vector type as it does now.
>
>     temp_int_vec = vector <cmpop> vector
>     bool_vec = temp_int_vec // Implicit conversion from int_vec to bool_vec
>
>   The implicit conversion from int_vec to bool I'd define simply to be:
>
>     bool_vec[n] = (_Bool) int_vec[n]
>
>   where the C11 standard rules apply
>   6.3.1.2 Boolean type  When any scalar value is converted to _Bool, the result
>   is 0 if the value compares equal to 0; otherwise, the result is 1.
>
>
> [1] https://lists.llvm.org/pipermail/cfe-dev/2020-May/065434.html
> [2] https://reviews.llvm.org/D88905
> [3] https://reviews.llvm.org/D81083
>
> Thoughts?
>
> Thanks,
> Tejas.