public inbox for gcc-patches@gcc.gnu.org
From: Tejas Belagod <tejas.belagod@arm.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
Date: Wed, 26 Jul 2023 12:51:17 +0530	[thread overview]
Message-ID: <bce0b7e8-2380-159d-92f3-78070287ac47@arm.com> (raw)
In-Reply-To: <CAFiYyc11DiJNL-gJvCNa9gb4OAUJay11wDzzxgAFHPqvUj+qwA@mail.gmail.com>

On 7/17/23 5:46 PM, Richard Biener wrote:
> On Fri, Jul 14, 2023 at 12:18 PM Tejas Belagod <tejas.belagod@arm.com> wrote:
>>
>> On 7/13/23 4:05 PM, Richard Biener wrote:
>>> On Thu, Jul 13, 2023 at 12:15 PM Tejas Belagod <tejas.belagod@arm.com> wrote:
>>>>
>>>> On 7/3/23 1:31 PM, Richard Biener wrote:
>>>>> On Mon, Jul 3, 2023 at 8:50 AM Tejas Belagod <tejas.belagod@arm.com> wrote:
>>>>>>
>>>>>> On 6/29/23 6:55 PM, Richard Biener wrote:
>>>>>>> On Wed, Jun 28, 2023 at 1:26 PM Tejas Belagod <Tejas.Belagod@arm.com> wrote:
>>>>>>>>
>>>>>>>> From: Richard Biener <richard.guenther@gmail.com>
>>>>>>>> Date: Tuesday, June 27, 2023 at 12:58 PM
>>>>>>>> To: Tejas Belagod <Tejas.Belagod@arm.com>
>>>>>>>> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
>>>>>>>> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
>>>>>>>>
>>>>>>>> On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod <Tejas.Belagod@arm.com> wrote:
>>>>>>>>>
>>>>>>>>> From: Richard Biener <richard.guenther@gmail.com>
>>>>>>>>> Date: Monday, June 26, 2023 at 2:23 PM
>>>>>>>>> To: Tejas Belagod <Tejas.Belagod@arm.com>
>>>>>>>>> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
>>>>>>>>> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
>>>>>>>>>
>>>>>>>>> On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
>>>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Packed Boolean Vectors
>>>>>>>>>> ----------------------
>>>>>>>>>>
>>>>>>>>>> I'd like to propose a feature addition to GNU Vector extensions to add packed
>>>>>>>>>> boolean vectors (PBV).  This has been discussed in the past here[1] and a variant has
>>>>>>>>>> been implemented in Clang recently[2].
>>>>>>>>>>
>>>>>>>>>> With predication features being added to vector architectures (SVE, MVE, AVX),
>>>>>>>>>> it is a useful feature for modelling predication on targets.  This could
>>>>>>>>>> find use in intrinsics, or just be used as-is as a GNU vector extension
>>>>>>>>>> mapped to underlying target features.  For example, the packed boolean vector
>>>>>>>>>> could directly map to a predicate register on SVE.
>>>>>>>>>>
>>>>>>>>>> Also, this new packed boolean type GNU extension can be used with SVE ACLE
>>>>>>>>>> intrinsics to replace a fixed-length svbool_t.
>>>>>>>>>>
>>>>>>>>>> Here are a few options to represent the packed boolean vector type.
>>>>>>>>>
>>>>>>>>> The GIMPLE frontend uses a new 'vector_mask' attribute:
>>>>>>>>>
>>>>>>>>> typedef int v8si __attribute__((vector_size(8*sizeof(int))));
>>>>>>>>> typedef v8si v8sib __attribute__((vector_mask));
>>>>>>>>>
>>>>>>>>> it gets you a vector type that's the appropriate (dependent on the
>>>>>>>>> target) vector
>>>>>>>>> mask type for the vector data type (v8si in this case).
>>>>>>>>>
>>>>>>>>> Thanks Richard.
>>>>>>>>>
>>>>>>>>> Having had a quick look at the implementation, it does seem to tick the boxes.
>>>>>>>>>
>>>>>>>>> I must admit I haven't dug deep, but if the target hook allows the mask to be
>>>>>>>>> defined in a way that is target-friendly (and I don't know how much effort it will
>>>>>>>>> be to migrate the attribute to more front-ends), it should do the job nicely.
>>>>>>>>>
>>>>>>>>> Let me go back and dig a bit deeper and get back with questions if any.
>>>>>>>>
>>>>>>>>
>>>>>>>> Let me add that the advantage of this is the compiler doesn't need
>>>>>>>> to support weird explicitly laid out packed boolean vectors that do
>>>>>>>> not match what the target supports and the user doesn't need to know
>>>>>>>> what the target supports (and thus have an #ifdef maze around explicitly
>>>>>>>> specified layouts).
>>>>>>>>
>>>>>>>> Sorry for the delayed response – I spent a day experimenting with vector_mask.
>>>>>>>>
>>>>>>>> Yeah, this is what option 4 in the RFC is trying to achieve – be portable enough
>>>>>>>> to avoid having to sprinkle the code with ifdefs.
>>>>>>>>
>>>>>>>>
>>>>>>>> It does remove some flexibility though, for example with -mavx512f -mavx512vl
>>>>>>>> you'll get AVX512 style masks for V4SImode data vectors but of course the
>>>>>>>> target still supports SSE2/AVX2 style masks as well, but those would not be
>>>>>>>> available as "packed boolean vectors", though they are of course in fact
>>>>>>>> equal to V4SImode data vectors with -1 or 0 values, so in this particular
>>>>>>>> case it might not matter.
>>>>>>>>
>>>>>>>> That said, the vector_mask attribute will get you V4SImode vectors with
>>>>>>>> signed boolean elements of 32 bits for V4SImode data vectors with
>>>>>>>> SSE2/AVX2.
>>>>>>>>
>>>>>>>> This sounds very much like what the scenario would be with NEON vs SVE. Coming to think
>>>>>>>> of it, vector_mask resembles option 4 in the proposal with ‘n’ implied by the ‘base’ vector type
>>>>>>>> and a ‘w’ specified for the type.
>>>>>>>>
>>>>>>>> Given its current implementation, if vector_mask is exposed to the CFE, would there be any
>>>>>>>> major challenges wrt implementation or defining behaviour semantics? I played around with a
>>>>>>>> few examples from the testsuite and wrote some new ones. I mostly tried operations that
>>>>>>>> the new type would have to support (unary, binary bitwise, initializations etc) – with a couple of exceptions
>>>>>>>> most of the ops seem to be supported. I also triggered a couple of ICEs in some tests involving
>>>>>>>> implicit conversions to wider/narrower vector_mask types (will raise reports for these). Correct me
>>>>>>>> if I’m wrong here, but we’d probably have to support a couple of new ops if vector_mask is exposed
>>>>>>>> to the CFE – initialization and subscript operations?
>>>>>>>
>>>>>>> Yes, either that or restrict how the mask vectors can be used, thus
>>>>>>> properly diagnose improper
>>>>>>> uses.
>>>>>>
>>>>>> Indeed.
>>>>>>
>>>>>>      A question would be for example how to write common mask test
>>>>>>> operations like
>>>>>>> if (any (mask)) or if (all (mask)).
>>>>>>
>>>>>> I see 2 options here. New builtins could support new types - they'd
>>>>>> provide a target independent way to test any and all conditions. Another
>>>>>> would be to let the target use its intrinsics to do them in the most
>>>>>> efficient way possible (which the builtins would get lowered down to
>>>>>> anyway).
>>>>>>
>>>>>>
>>>>>>      Likewise writing merge operations
>>>>>>> - do those as
>>>>>>>
>>>>>>>      a = a | (mask ? b : 0);
>>>>>>>
>>>>>>> thus use ternary ?: for this?
>>>>>>
>>>>>> Yes, like now, the ternary could just translate to
>>>>>>
>>>>>>       {mask[0] ? b[0] : 0, mask[1] ? b[1] : 0, ... }
>>>>>>
>>>>>> One thing to flesh out is the semantics. Should we allow this operation
>>>>>> as long as the number of elements is the same even if the mask type is
>>>>>> different, i.e.
>>>>>>
>>>>>>       v4hib ? v4si : v4si;
>>>>>>
>>>>>> I don't see why this can't be allowed as now we let
>>>>>>
>>>>>>       v4si ? v4sf : v4sf;
>>>>>>
>>>>>>
>>>>>> For initialization regular vector
>>>>>>> syntax should work:
>>>>>>>
>>>>>>> mtype mask = (mtype){ -1, -1, 0, 0, ... };
>>>>>>>
>>>>>>> there's the question of the signedness of the mask elements.  GCC
>>>>>>> internally uses signed
>>>>>>> bools with values -1 for true and 0 for false.
>>>>>>
>>>>>> One of the things is the value that represents true. This is largely
>>>>>> target-dependent when it comes to the vector_mask type. When vector_mask
>>>>>> types are created from GCC's internal representation of bool vectors
>>>>>> (signed ints) the point about implicit/explicit conversions from signed
>>>>>> int vect to mask types in the proposal covers this. So mask in
>>>>>>
>>>>>>       v4sib mask = (v4sib){-1, -1, 0, 0, ... }
>>>>>>
>>>>>> will probably end up being represented as 0x3xxxx on AVX512 and 0x11xxx
>>>>>> on SVE. On AVX2/SSE they'd still be represented as vector of signed ints
>>>>>> {-1, -1, 0, 0, ... }. I'm not entirely confident what ramifications this
>>>>>> new mask type representations will have in the mid-end while being
>>>>>> converted back and forth to and from GCC's internal representation, but
>>>>>> I'm guessing this is already being handled at some level by the
>>>>>> vector_mask type's current support?
>>>>>
>>>>> Yes, I would guess so.  Of course what the middle-end is currently exposed
>>>>> to is simply what the vectorizer generates - once fuzzers discover this feature
>>>>> we'll see "interesting" uses that might run into missed or wrong handling of
>>>>> them.
>>>>>
>>>>> So whatever we do on the side of exposing this to users a good portion
>>>>> of testsuite coverage for the allowed use cases is important.
>>>>>
>>>>> Richard.
>>>>>
>>>>
>>>> Apologies for the long-ish reply, but here's a TLDR and gory details follow.
>>>>
>>>> TLDR:
>>>> GIMPLE's vector_mask type semantics seem to be target-dependent, so
>>>> elevating vector_mask to the CFE with the same semantics is undesirable. OTOH,
>>>> changing vector_mask to have target-independent CFE semantics will cause
>>>> a dichotomy between its CFE and GFE behaviours. But the vector_mask approach
>>>> scales well for sizeless types. Is something like vector_mask, with
>>>> defined target-independent type semantics but a different name to
>>>> prevent conflation with GIMPLE, a viable option?
>>>>
>>>> Details:
>>>> After some more analysis of the proposed options, here are some
>>>> interesting findings:
>>>>
>>>> vector_mask looked like a very interesting option until I ran into some
>>>> semantic uncertainty. This code:
>>>>
>>>> typedef int v8si __attribute__((vector_size(8*sizeof(int))));
>>>> typedef v8si v8sib __attribute__((vector_mask));
>>>>
>>>> typedef short v8hi __attribute__((vector_size(8*sizeof(short))));
>>>> typedef v8hi v8hib __attribute__((vector_mask));
>>>>
>>>> v8si res;
>>>> v8hi resh;
>>>>
>>>> v8hib __GIMPLE () foo (v8hib x, v8sib y)
>>>> {
>>>>      v8hib res;
>>>>
>>>>      res = x & y;
>>>>      return res;
>>>> }
>>>>
>>>> When compiled on AArch64, this produces a type-mismatch error for the binary
>>>> expression involving '&' because the 'derived' types 'v8hib' and 'v8sib'
>>>>     have a different target-layout. If the layout of these two 'derived'
>>>> types match, then the above code has no issue. Which is the case on
>>>> amdgcn-amdhsa target where it compiles without any error (amdgcn uses a
>>>> scalar DImode mask mode). IoW such code seems to be allowed on some
>>>> targets and not on others.
>>>>
>>>> With the same code, I tried putting casts and it worked fine on AArch64
>>>> and amdgcn. This target-specific behaviour of vector_mask derived types
>>>> will be difficult to specify once we move it to the CFE - in fact we
>>>> probably don't want target-specific behaviour once it moves to the CFE.
>>>>
>>>> If we expose vector_mask to CFE, we'd have to specify consistent
>>>> semantics for vector_mask types. We'd have to resolve ambiguities like
>>>> 'v4hib & v4sib' clearly to be able to specify the semantics of the type
>>>> system involving vector_mask. If we do this, don't we run the risk of a
>>>> dichotomy between the CFE and GFE semantics of vector_mask? I'm assuming
>>>> we'd want to retain vector_mask semantics as they are in GIMPLE.
>>>>
>>>> If we want to enforce constant semantics for vector_mask in the CFE, one
>>>> way is to treat vector_mask types as distinct if they're 'attached' to
>>>> distinct data vector types. In such a scenario, vector_mask types
>>>> attached to two data vector types with the same lane-width and number of
>>>> lanes would be classified as distinct. For eg:
>>>>
>>>> typedef int v8si __attribute__((vector_size(8*sizeof(int))));
>>>> typedef v8si v8sib __attribute__((vector_mask));
>>>>
>>>> typedef float v8sf __attribute__((vector_size(8*sizeof(float))));
>>>> typedef v8sf v8sfb __attribute__((vector_mask));
>>>>
>>>> v8si  foo (v8sf x, v8sf y, v8si i, v8si j)
>>>> {
>>>>      return (i == j) & (v8sib)(x == y) ? i : (v8si){0};
>>>> }
>>>>
>>>> This could be the case for unsigned vs signed int vectors too for eg -
>>>> seems a bit unnecessary tbh.
>>>>
>>>> Though vector_mask's being 'attached' to a type has its drawbacks, it
>>>> does seem to have an advantage when sizeless types are considered. If we
>>>> have to define a sizeless vector boolean type that is implied by the
>>>> lane size, we could do something like
>>>>
>>>> typedef svint32_t svbool32_t __attribute__((vector_mask));
>>>>
>>>> int32_t foo (svint32_t a, svint32_t b)
>>>> {
>>>>      svbool32_t pred = a > b;
>>>>
>>>>      return pred[2] ? a[2] : b[2];
>>>> }
>>>>
>>>> This is harder to do in the other schemes proposed so far as they're
>>>> size-based.
>>>>
>>>> To be able to free the boolean from the base type (not size) and retain
>>>> vector_mask's flexibility to declare sizeless types, we could have an
>>>> attribute that is more flexibly-typed and only 'derives' the lane-size
>>>> and number of lanes from its 'base' type without actually inheriting the
>>>> actual base type(char, short, int etc) or its signedness. This creates a
>>>> purer and stand-alone boolean type without the associated semantics'
>>>> complexity of having to cast between two same-size types with the same
>>>> number of lanes. Eg.
>>>>
>>>> typedef int v8si __attribute__((vector_size(8*sizeof(int))));
>>>> typedef v8si v8b __attribute__((vector_bool));
>>>>
>>>> However, with differing lane-sizes, there will have to be a cast as the
>>>> 'derived' element size is different which could impact the layout of the
>>>> vector mask. Eg.
>>>>
>>>> v8si  foo (v8hi x, v8hi y, v8si i, v8si j)
>>>> {
>>>>      return (v8sib)(x == y) & (i == j) ? i : (v8si){0};
>>>> }
>>>>
>>>> Such conversions on targets like AVX512/AMDGCN will be a NOP, but
>>>> non-trivial on SVE (depending on the implemented layout of the bool vector).
>>>>
>>>> vector_bool decouples us from having to retain the behaviour of
>>>> vector_mask and provides the flexibility of not having to cast across
>>>> same-element-size vector types. Wrt to sizeless types, it could scale well.
>>>>
>>>> typedef svint32_t svbool32_t __attribute__((vector_bool));
>>>> typedef svint16_t svbool16_t __attribute__((vector_bool));
>>>>
>>>> int32_t foo (svint32_t a, svint32_t b)
>>>> {
>>>>      svbool32_t pred = a > b;
>>>>
>>>>      return pred[2] ? a[2] : b[2];
>>>> }
>>>>
>>>> int16_t bar (svint16_t a, svint16_t b)
>>>> {
>>>>      svbool16_t pred = a > b;
>>>>
>>>>      return pred[2] ? a[2] : b[2];
>>>> }
>>>>
>>>> On SVE, pred[2] refers to bit 4 for svint16_t and bit 8 for svint32_t on
>>>> the target predicate.
>>>>
>>>> Thoughts?
>>>
>>> The GIMPLE frontend accepts just what is valid on the target here.  Any
>>> "plumbing" such as implicit conversions (if we do not want to require
>>> explicit ones even when NOP) need to be done/enforced by the C frontend.
>>>
>>
>> Sorry, I'm not sure I follow - correct me if I'm wrong here.
>>
>> If we desire to define/allow operations like implicit/explicit
>> conversion on vector_mask types in CFE, don't we have to start from a
>> position of defining what makes vector_mask types distinct and therefore
>> require implicit/explicit conversions?
> 
> We need to look at which operations we want to produce vector masks and
> which operations consume them and what operations operate on them.
> 
> In GIMPLE comparisons produce them, conditionals consume them and
> we allow bitwise ops to operate on them directly (GIMPLE doesn't have
> logical && it just has bitwise &).
> 

Thanks for your thoughts - after I spent more cycles researching and 
experimenting, I think I understand the driving factors here. Comparison 
producers generate signed integer vectors of the same lane-width as the 
comparison operands. This means mixed type vectors can't be applied to 
conditional consumers or bitwise operators eg:

   v8hi foo (v8si a, v8si b, v8hi c, v8hi d)
   {
     return a > b || c > d; // error!
     return a > b || __builtin_convertvector (c > d, v8si); // OK.
     return a | b && c | d; // error!
     return a | b && __builtin_convertvector (c | d, v8si); // OK.
   }
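
To make the producer rule above concrete, here is a small compilable sketch
using today's GNU vectors (the typedef and function names are mine, purely
illustrative):

```c
typedef int v4si __attribute__((vector_size(4 * sizeof(int))));
typedef short v4hi __attribute__((vector_size(4 * sizeof(short))));

/* Comparisons produce signed integer vectors of the same lane width
   as the operands: -1 for true, 0 for false.  */
static v4si cmp_wide(v4si a, v4si b) { return a > b; }
static v4hi cmp_narrow(v4hi c, v4hi d) { return c > d; }

/* Masks of mixed lane widths must be converted explicitly before they
   can be combined with bitwise operators.  */
static v4si combine(v4si a, v4si b, v4hi c, v4hi d)
{
    v4si wide = cmp_wide(a, b);
    v4si narrow_as_wide = __builtin_convertvector(cmp_narrow(c, d), v4si);
    return wide | narrow_as_wide;
}
```

Dropping the __builtin_convertvector in combine() is rejected today, which is
the same restriction the stricter vector_mask rules below would codify.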

Similarly, if we extend these 'stricter-typing' rules to vector_mask, it 
could look like:

typedef v8si v8sib __attribute__((vector_mask));
typedef v8hi v8hib __attribute__((vector_mask));

   v8sib foo (v8si a, v8si b, v8hi c, v8hi d)
   {
     v8sib psi = a > b;
     v8hib phi = c > d;

     return psi || phi; // error!
     return psi || __builtin_convertvector (phi, v8sib); // OK.
     return psi | phi; // error!
     return psi | __builtin_convertvector (phi, v8sib); // OK.
   }

At GIMPLE stage, on targets where the layout allows it (eg AMDGCN), 
expressions like
   psi | __builtin_convertvector (phi, v8sib)
can be optimized to
   psi | phi
because __builtin_convertvector (phi, v8sib) is a NOP.
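
Whether the conversion is a NOP is a code-generation detail; at the value
level __builtin_convertvector already preserves the -1/0 truth values of
today's signed-integer masks across lane widths. A sketch (names are mine):

```c
typedef int v4si __attribute__((vector_size(4 * sizeof(int))));
typedef short v4hi __attribute__((vector_size(4 * sizeof(short))));

/* Widen a narrow mask: -1 sign-extends to -1 and 0 stays 0, so truth
   values survive the lane-width change.  */
static v4si widen_mask(v4hi m) { return __builtin_convertvector(m, v4si); }

/* Narrowing likewise preserves the -1/0 lanes.  */
static v4hi narrow_mask(v4si m) { return __builtin_convertvector(m, v4hi); }
```

So for vector_mask the optimization above only has to prove the target
layouts match; the value semantics of the conversion are already fixed.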

I think this could make vector_mask more portable across targets. If one 
wants to take CFE vector_mask code and run it on the GFE, it should 
work, while the reverse won't, as CFE vector_mask rules are more restrictive.

Does this look like a sensible approach for progress?

>> IIUC, GFE's distinctness of vector_mask types depends on how the mask
>> mode is implemented on the target. If implemented in CFE, vector_mask
>> types' distinctness probably shouldn't be based on target layout and
>> could be based on the type they're 'attached' to.
> 
> But since we eventually run on the target the layout should ideally
> match that of the target.  Now, the question is whether that's ever
> OK behavior - it effectively makes the mask somewhat opaque and
> only "observable" by probing it in defined manners.
> 
>> Wouldn't that diverge from target-specific GFE behaviour - or are you
>> suggesting its OK for vector_mask type semantics to be different in CFE
>> and GFE?
> 
> It's definitely undesirable but as said I'm not sure it has to differ
> [the layout].
> 

I agree it is best to have a consistent layout of vector_mask across CFE 
and GFE and also implement it to match the target layout for optimal 
code quality.

For observability, I think it makes sense to allow operations that are 
relevant and have a consistent meaning irrespective of the target. Eg. 
'vector_mask & 2' might not mean the same thing on all targets, but 
vector_mask[2] does. Therefore, I think the opaqueness is useful and 
necessary to some extent.
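
With today's signed-integer masks the contrast looks like this (a sketch;
the helper name is mine):

```c
typedef int v4si __attribute__((vector_size(4 * sizeof(int))));

/* Portable: observe one lane of a mask through the subscript operator.
   This has the same meaning on every target.  */
static _Bool mask_lane(v4si m, int i) { return m[i] != 0; }

/* By contrast, probing a mask as an integer bitmask ('mask & 2') only
   makes sense on targets whose mask layout is a packed integer
   (e.g. AVX-512), so it should stay behind the opaque interface.  */
```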

Thanks,
Tejas.

>>> There's one issue I can see that wasn't mentioned yet - GCC currently
>>> accepts
>>>
>>> typedef long gv1024di __attribute__((vector_size(1024*8)));
>>>
>>> even if there's no underlying support on the target which either has support
>>> only for smaller vectors or no vectors at all.  Currently vector_mask will
>>> simply fail to produce sth desirable here.  What's your idea of making
>>> that not target dependent?  GCC will later lower operations with such
>>> vectors, possibly splitting them up into sizes supported by the hardware
>>> natively, possibly performing elementwise operations.  For the former
>>> one would need to guess the "decomposition type" and based on that
>>> select the mask type [layout]?
>>>
>>> One idea would be to specify the mask layout follows the largest vector
>>> kind supported by the target and if there is none follow the layout
>>> of (signed?) _Bool [n]?  When there's no target support for vectors
>>> GCC will generally use elementwise operations apart from some
>>> special-cases.
>>>
>>
>> That is a very good point - thanks for raising it. For when GCC chooses
>> to lower to a vector type supported by the target, my initial thought
>> would be to, as you say, choose a mask that has enough bits to represent
>> the largest vector size with the smallest lane-width. The actual layout
>> of the mask will depend on how the target implements its mask mode.
>> Decomposition of vector_mask ought to follow the decomposition of the
>> GNU vectors type and each decomposed vector_mask type ought to have
>> enough bits to represent the decomposed GNU vector shape. It sounds nice
>> on paper, but I haven't really worked through a design for this. Do you
>> see any gotchas here?
> 
> Not really.  In the end it comes down to what the C writer is allowed to
> do with a vector mask.  I would for example expect that I could do
> 
>   auto m = v1 < v2;
>   _mm512_mask_sub_epi32 (a, m, b, c);
> 
> so generic masks should inter-operate with intrinsics (when the appropriate
> ISA is enabled).  That works for the data vectors themselves for example
> (quite some intrinsics are implemented with GCCs generic vector code).
> 
> I for example can't do
> 
>    _Bool lane2 = m[2];
> 
> to inspect lane two of a mask with AVX512.  I can do m & 2 but I wouldn't expect
> that to work (should I?) with a vector_mask mask (it's at least not
> valid directly
> in GIMPLE).  There's _mm512_int2mask and _mm512_mask2int which transfer
> between mask and int (but the mask types are really just typedef'd to
> integer types).
> 
>>> While using a different name than vector_mask is certainly possible
>>> it wouldn't be me to decide that, but I'm also not yet convinced it's
>>> really necessary.  As said, what the GIMPLE frontend accepts
>>> or not shouldn't limit us here - just the actual chosen layout of the
>>> boolean vectors.
>>>
>>
>> I'm just concerned about creating an alternate vector_mask functionality
>> in the CFE and risk not being consistent with GFE.
> 
> I think it's more important to double-check usability from the user's side.
> If the implementation necessarily diverges from GIMPLE then we can
> choose a different attribute name but then it will also inevitably have
> code-generation (quality) issues as GIMPLE matches what the hardware
> can do.
> 
> Richard.
> 
>> Thanks,
>> Tejas.
>>
>>> Richard.
>>>
>>>> Thanks,
>>>> Tejas.
>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Tejas.
>>>>>>
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Tejas.
>>>>>>>>
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Tejas.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> 1. __attribute__((vector_size (n))) where n represents bytes
>>>>>>>>>>
>>>>>>>>>>       typedef bool vbool __attribute__ ((vector_size (1)));
>>>>>>>>>>
>>>>>>>>>> In this approach, the shape of the boolean vector is unclear. IoW, it is not
>>>>>>>>>> clear if each bit in 'n' controls a byte or an element. On targets
>>>>>>>>>> like SVE, it would be natural to have each bit control a byte of the target
>>>>>>>>>> vector (therefore resulting in an 'unpacked' layout of the PBV) and on AVX, each
>>>>>>>>>> bit would control one element/lane on the target vector (therefore resulting in a
>>>>>>>>>> 'packed' layout with all significant bits at the LSB).
>>>>>>>>>>
>>>>>>>>>> 2. __attribute__((vector_size (n))) where n represents num of lanes
>>>>>>>>>>
>>>>>>>>>>       typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))));
>>>>>>>>>>       typedef bool v4bi __attribute__ ((vector_size (sizeof (v4si) / sizeof ((v4si){0}[0]))));
>>>>>>>>>>
>>>>>>>>>> Here the 'n' in the vector_size attribute represents the number of lanes, and
>>>>>>>>>> hence the number of bits needed for the mask.  In this case, this packed boolean
>>>>>>>>>> vector can represent up to 'n' vector lanes. The size of the type is
>>>>>>>>>> rounded up to the nearest byte.  For example, the sizeof v4bi in the above
>>>>>>>>>> example is 1.
>>>>>>>>>>
>>>>>>>>>> In this approach, because of the nature of the representation, the n bits required
>>>>>>>>>> to represent the n lanes of the vector are packed at the LSB. This does not naturally
>>>>>>>>>> align with the SVE approach of each bit representing a byte of the target vector
>>>>>>>>>> and PBV therefore having an 'unpacked' layout.
>>>>>>>>>>
>>>>>>>>>> More importantly, another drawback here is that the change in units for vector_size
>>>>>>>>>> might be confusing to programmers.  The units will have to be interpreted based on the
>>>>>>>>>> base type of the typedef. It does not offer any flexibility in terms of the layout of
>>>>>>>>>> the bool vector - it is fixed.
>>>>>>>>>>
>>>>>>>>>> 3. Combination of 1 and 2.
>>>>>>>>>>
>>>>>>>>>> Combining the best of 1 and 2, we can introduce extra parameters to vector_size that will
>>>>>>>>>> unambiguously represent the layout of the PBV. Consider
>>>>>>>>>>
>>>>>>>>>>       typedef bool vbool __attribute__((vector_size (s, n[, w])));
>>>>>>>>>>
>>>>>>>>>> where 's' is size in bytes, 'n' is the number of lanes and an optional 3rd parameter 'w'
>>>>>>>>>> is the number of bits of the PBV that represents a lane of the target vector. 'w' would
>>>>>>>>>> allow a target to force a certain layout of the PBV.
>>>>>>>>>>
>>>>>>>>>> The 2-parameter form of vector_size allows the target to have an
>>>>>>>>>> implementation-defined layout of the PBV. The target is free to choose the 'w'
>>>>>>>>>> if it is not specified to mirror the target layout of predicate registers. For
>>>>>>>>>> eg. AVX would choose 'w' as 1 and SVE would choose s*8/n.
>>>>>>>>>>
>>>>>>>>>> As an example, to represent the result of a comparison on 2 int16x8_t, we'd need
>>>>>>>>>> 8 lanes of boolean which could be represented by
>>>>>>>>>>
>>>>>>>>>>       typedef bool v8b __attribute__ ((vector_size (2, 8)));
>>>>>>>>>>
>>>>>>>>>> SVE would implement v8b layout to make every 2nd bit significant i.e. w == 2
>>>>>>>>>>
>>>>>>>>>> and AVX would choose a layout where all 8 consecutive bits packed at LSB would
>>>>>>>>>> be significant i.e. w == 1.
>>>>>>>>>>
>>>>>>>>>> This scheme would accommodate more than 1 target to effectively represent vector
>>>>>>>>>> bools that mirror the target properties.
>>>>>>>>>>
>>>>>>>>>> 4. A new attribute
>>>>>>>>>>
>>>>>>>>>> This is based on a suggestion from Richard S in [3]. The idea is to introduce a new
>>>>>>>>>> attribute to define the PBV and make it general enough to
>>>>>>>>>>
>>>>>>>>>> * represent all targets flexibly (SVE, AVX etc)
>>>>>>>>>> * represent sub-byte length predicates
>>>>>>>>>> * have no change in units of vector_size/no new vector_size signature
>>>>>>>>>> * not have the number of bytes constrain representation
>>>>>>>>>>
>>>>>>>>>> If we call the new attribute 'bool_vec' (for lack of a better name), consider
>>>>>>>>>>
>>>>>>>>>>       typedef bool vbool __attribute__((bool_vec (n[, w])))
>>>>>>>>>>
>>>>>>>>>> where 'n' represents number of lanes/elements and the optional 'w' is bits-per-lane.
>>>>>>>>>>
>>>>>>>>>> If 'w' is not specified, it and bytes-per-predicate are implementation-defined based on the target.
>>>>>>>>>> If 'w' is specified,  sizeof (vbool) will be ceil (n*w/8).
>>>>>>>>>>
>>>>>>>>>> 5. Behaviour of the packed vector boolean type.
>>>>>>>>>>
>>>>>>>>>> Taking the example of one of the options above, the following is an illustration of its behavior
>>>>>>>>>>
>>>>>>>>>> * ABI
>>>>>>>>>>
>>>>>>>>>>       New ABI rules will need to be defined for this type - eg alignment, PCS,
>>>>>>>>>>       mangling etc
>>>>>>>>>>
>>>>>>>>>> * Initialization:
>>>>>>>>>>
>>>>>>>>>>       Packed boolean vectors (PBVs) can be initialized like so:
>>>>>>>>>>
>>>>>>>>>>         typedef bool v4bi __attribute__ ((vector_size (2, 4, 4)));
>>>>>>>>>>         v4bi p = {false, true, false, false};
>>>>>>>>>>
>>>>>>>>>>       Each value in the initializer constant is of type bool. The lowest numbered
>>>>>>>>>>       element in the const array corresponds to the LSbit of p, element 1 is
>>>>>>>>>>       assigned to bit 4 etc.
>>>>>>>>>>
>>>>>>>>>>       p is effectively a 2-byte bitmask with value 0x0010
>>>>>>>>>>
>>>>>>>>>>       With a different layout
>>>>>>>>>>
>>>>>>>>>>         typedef bool v4bi __attribute__ ((vector_size (2, 4, 1)));
>>>>>>>>>>         v4bi p = {false, true, false, false};
>>>>>>>>>>
>>>>>>>>>>       p is effectively a 2-byte bitmask with value 0x0002
>>>>>>>>>>
>>>>>>>>>> * Operations:
>>>>>>>>>>
>>>>>>>>>>       Packed Boolean Vectors support the following operations:
>>>>>>>>>>       . unary ~
>>>>>>>>>>       . unary !
>>>>>>>>>>       . binary &, | and ^
>>>>>>>>>>       . assignments &=, |= and ^=
>>>>>>>>>>       . comparisons <, <=, ==, !=, >= and >
>>>>>>>>>>       . Ternary operator ?:
>>>>>>>>>>
>>>>>>>>>>       Operations are defined as applied to the individual elements i.e. the bits
>>>>>>>>>>       that are significant in the PBV. Whether the PBVs are treated as bitmasks
>>>>>>>>>>       or otherwise is implementation-defined.
>>>>>>>>>>
>>>>>>>>>>       Insignificant bits could affect results of comparisons or ternary operators.
>>>>>>>>>>       In such cases, it is implementation defined how the unused bits are treated.
>>>>>>>>>>
>>>>>>>>>>       . Subscript operator []
>>>>>>>>>>
>>>>>>>>>>       For the subscript operator, the packed boolean vector acts like an array of
>>>>>>>>>>       elements - the first or the 0th indexed element being the LSbit of the PBV.
>>>>>>>>>>       Subscript operator yields a scalar boolean value.
>>>>>>>>>>       For example:
>>>>>>>>>>
>>>>>>>>>>         typedef bool v8b __attribute__ ((vector_size (2, 8, 2)));
>>>>>>>>>>
>>>>>>>>>>         // Subscript operator result yields a boolean value.
>>>>>>>>>>         // p[3] is the 7th LSbit and p[1] is the 3rd LSbit of p.
>>>>>>>>>>         bool foo (v8b p, int n) { p[3] = true; return p[1]; }
>>>>>>>>>>
>>>>>>>>>>       Out of bounds access: OOB access can be determined at compile time given the
>>>>>>>>>>       strong typing of the PBVs.
>>>>>>>>>>
>>>>>>>>>>       PBVs do not support the address-of operator (&) for their elements.
>>>>>>>>>>
>>>>>>>>>>       . Implicit conversion from integer vectors to PBVs
>>>>>>>>>>
>>>>>>>>>>       We would like to support the output of comparison operations to be PBVs. This
>>>>>>>>>>       requires us to define the implicit conversion from an integer vector to a PBV,
>>>>>>>>>>       as the result of a vector comparison is an integer vector.
>>>>>>>>>>
>>>>>>>>>>       To define this operation:
>>>>>>>>>>
>>>>>>>>>>         bool_vector = vector <cmpop> vector
>>>>>>>>>>
>>>>>>>>>>       There is no change in how vector <cmpop> vector behaves, i.e. this comparison
>>>>>>>>>>       would still produce an int_vector type as it does now.
>>>>>>>>>>
>>>>>>>>>>         temp_int_vec = vector <cmpop> vector
>>>>>>>>>>         bool_vec = temp_int_vec // Implicit conversion from int_vec to bool_vec
>>>>>>>>>>
>>>>>>>>>>       The implicit conversion from int_vec to bool_vec I'd define simply to be:
>>>>>>>>>>
>>>>>>>>>>         bool_vec[n] = (_Bool) int_vec[n]
>>>>>>>>>>
>>>>>>>>>>       where the C11 standard rules apply (6.3.1.2 Boolean type): "When any scalar
>>>>>>>>>>       value is converted to _Bool, the result is 0 if the value compares equal
>>>>>>>>>>       to 0; otherwise, the result is 1."
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1] https://lists.llvm.org/pipermail/cfe-dev/2020-May/065434.html
>>>>>>>>>> [2] https://reviews.llvm.org/D88905
>>>>>>>>>> [3] https://reviews.llvm.org/D81083
>>>>>>>>>>
>>>>>>>>>> Thoughts?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Tejas.
>>>>>>
>>>>
>>


Thread overview: 16+ messages
2023-06-26  6:23 Tejas Belagod
2023-06-26  8:50 ` Richard Biener
2023-06-27  6:30   ` Tejas Belagod
2023-06-27  7:28     ` Richard Biener
2023-06-28 11:26       ` Tejas Belagod
2023-06-29 13:25         ` Richard Biener
2023-07-03  6:50           ` Tejas Belagod
2023-07-03  8:01             ` Richard Biener
2023-07-13 10:14               ` Tejas Belagod
2023-07-13 10:35                 ` Richard Biener
2023-07-14 10:18                   ` Tejas Belagod
2023-07-17 12:16                     ` Richard Biener
2023-07-26  7:21                       ` Tejas Belagod [this message]
2023-07-26 12:26                         ` Richard Biener
2023-07-26 12:33                           ` Richard Biener
2023-10-05 20:48                             ` Matthias Kretz
