From: Richard Biener Date: Monday, June 26, 2023 at 2:23 PM To: Tejas Belagod Cc: gcc-patches@gcc.gnu.org Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches wrote: > > Hi, > > Packed Boolean Vectors > ---------------------- > > I'd like to propose a feature addition to GNU Vector extensions to add packed > boolean vectors (PBV). This has been discussed in the past here[1] and a variant has > been implemented in Clang recently[2]. > > With predication features being added to vector architectures (SVE, MVE, AVX), > it is a useful feature to have to model predication on targets. This could > find its use in intrinsics or just used as is as a GNU vector extension being > mapped to underlying target features. For example, the packed boolean vector > could directly map to a predicate register on SVE. > > Also, this new packed boolean type GNU extension can be used with SVE ACLE > intrinsics to replace a fixed-length svbool_t. > > Here are a few options to represent the packed boolean vector type. The GIMPLE frontend uses a new 'vector_mask' attribute: typedef int v8si __attribute__((vector_size(8*sizeof(int)))); typedef v8si v8sib __attribute__((vector_mask)); it get's you a vector type that's the appropriate (dependent on the target) vector mask type for the vector data type (v8si in this case). Thanks Richard. Having had a quick look at the implementation, it does seem to tick the boxes. I must admit I haven't dug deep, but if the target hook allows the mask to be defined in way that is target-friendly (and I don't know how much effort it will be to migrate the attribute to more front-ends), it should do the job nicely. Let me go back and dig a bit deeper and get back with questions if any. Thanks, Tejas. > 1. __attribute__((vector_size (n))) where n represents bytes > > typedef bool vbool __attribute__ ((vector_size (1))); > > In this approach, the shape of the boolean vector is unclear. IoW, it is not > clear if each bit in 'n' controls a byte or an element. On targets > like SVE, it would be natural to have each bit control a byte of the target > vector (therefore resulting in an 'unpacked' layout of the PBV) and on AVX, each > bit would control one element/lane on the target vector(therefore resulting in a > 'packed' layout with all significant bits at the LSB). > > 2. __attribute__((vector_size (n))) where n represents num of lanes > > typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))); > typedef bool v4bi __attribute__ ((vector_size (sizeof v4si / sizeof (v4si){0}[0]))); > > Here the 'n' in the vector_size attribute represents the number of bits that > is needed to represent a vector quantity. In this case, this packed boolean > vector can represent upto 'n' vector lanes. The size of the type is > rounded up the nearest byte. For example, the sizeof v4bi in the above > example is 1. > > In this approach, because of the nature of the representation, the n bits required > to represent the n lanes of the vector are packed at the LSB. This does not naturally > align with the SVE approach of each bit representing a byte of the target vector > and PBV therefore having an 'unpacked' layout. > > More importantly, another drawback here is that the change in units for vector_size > might be confusing to programmers. The units will have to be interpreted based on the > base type of the typedef. It does not offer any flexibility in terms of the layout of > the bool vector - it is fixed. > > 3. Combination of 1 and 2. > > Combining the best of 1 and 2, we can introduce extra parameters to vector_size that will > unambiguously represent the layout of the PBV. Consider > > typedef bool vbool __attribute__((vector_size (s, n[, w]))); > > where 's' is size in bytes, 'n' is the number of lanes and an optional 3rd parameter 'w' > is the number of bits of the PBV that represents a lane of the target vector. 'w' would > allow a target to force a certain layout of the PBV. > > The 2-parameter form of vector_size allows the target to have an > implementation-defined layout of the PBV. The target is free to choose the 'w' > if it is not specified to mirror the target layout of predicate registers. For > eg. AVX would choose 'w' as 1 and SVE would choose s*8/n. > > As an example, to represent the result of a comparison on 2 int16x8_t, we'd need > 8 lanes of boolean which could be represented by > > typedef bool v8b __attribute__ ((vector_size (2, 8))); > > SVE would implement v8b layout to make every 2nd bit significant i.e. w == 2 > > and AVX would choose a layout where all 8 consecutive bits packed at LSB would > be significant i.e. w == 1. > > This scheme would accomodate more than 1 target to effectively represent vector > bools that mirror the target properties. > > 4. A new attribite > > This is based on a suggestion from Richard S in [3]. The idea is to introduce a new > attribute to define the PBV and make it general enough to > > * represent all targets flexibly (SVE, AVX etc) > * represent sub-byte length predicates > * have no change in units of vector_size/no new vector_size signature > * not have the number of bytes constrain representation > > If we call the new attribute 'bool_vec' (for lack of a better name), consider > > typedef bool vbool __attribute__((bool_vec (n[, w]))) > > where 'n' represents number of lanes/elements and the optional 'w' is bits-per-lane. > > If 'w' is not specified, it and bytes-per-predicate are implementation-defined based on target. > If 'w' is specified, sizeof (vbool) will be ceil (n*w/8). > > 5. Behaviour of the packed vector boolean type. > > Taking the example of one of the options above, following is an illustration of it's behavior > > * ABI > > New ABI rules will need to be defined for this type - eg alignment, PCS, > mangling etc > > * Initialization: > > Packed Boolean Vectors(PBV) can be initialized like so: > > typedef bool v4bi __attribute__ ((vector_size (2, 4, 4))); > v4bi p = {false, true, false, false}; > > Each value in the initizlizer constant is of type bool. The lowest numbered > element in the const array corresponds to the LSbit of p, element 1 is > assigned to bit 4 etc. > > p is effectively a 2-byte bitmask with value 0x0010 > > With a different layout > > typedef bool v4bi __attribute__ ((vector_size (2, 4, 1))); > v4bi p = {false, true, false, false}; > > p is effectively a 2-byte bitmask with value 0x0002 > > * Operations: > > Packed Boolean Vectors support the following operations: > . unary ~ > . unary ! > . binary&,|andˆ > . assignments &=, |= and ˆ= > . comparisons <, <=, ==, !=, >= and > > . Ternary operator ?: > > Operations are defined as applied to the individual elements i.e the bits > that are significant in the PBV. Whether the PBVs are treated as bitmasks > or otherwise is implementation-defined. > > Insignificant bits could affect results of comparisons or ternary operators. > In such cases, it is implementation defined how the unused bits are treated. > > . Subscript operator [] > > For the subscript operator, the packed boolean vector acts like a array of > elements - the first or the 0th indexed element being the LSbit of the PBV. > Subscript operator yields a scalar boolean value. > For example: > > typedef bool v8b __attribute__ ((vector_size (2, 8, 2))); > > // Subscript operator result yields a boolean value. > // x[3] is the 7th LSbit and x[1] is the 3rd LSbit of x. > bool foo (v8b p, int n) { p[3] = true; return p[1]; } > > Out of bounds access: OOB access can be determined at compile time given the > strong typing of the PBVs. > > PBV does not support address of operator(&) for elements of PBVs. > > . Implicit conversion from integer vectors to PBVs > > We would like to support the output of comparison operations to be PBVs. This > requires us to define the implicit conversion from an integer vector to PBV > as the result of vector comparisons are integer vectors. > > To define this operation: > > bool_vector = vector vector > > There is no change in how vector vector behavior i.e. this comparison > would still produce an int_vector type as it does now. > > temp_int_vec = vector vector > bool_vec = temp_int_vec // Implicit conversion from int_vec to bool_vec > > The implicit conversion from int_vec to bool I'd define simply to be: > > bool_vec[n] = (_Bool) int_vec[n] > > where the C11 standard rules apply > 6.3.1.2 Boolean type When any scalar value is converted to _Bool, the result > is 0 if the value compares equal to 0; otherwise, the result is 1. > > > [1] https://lists.llvm.org/pipermail/cfe-dev/2020-May/065434.html > [2] https://reviews.llvm.org/D88905 > [3] https://reviews.llvm.org/D81083 > > Thoughts? > > Thanks, > Tejas.