From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lj1-x235.google.com (mail-lj1-x235.google.com [IPv6:2a00:1450:4864:20::235]) by sourceware.org (Postfix) with ESMTPS id 753453858C52 for ; Mon, 3 Jul 2023 08:04:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 753453858C52 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-lj1-x235.google.com with SMTP id 38308e7fff4ca-2b5e7dba43cso64777401fa.1 for ; Mon, 03 Jul 2023 01:04:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1688371471; x=1690963471; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=8bnZ4T0ROBJ+14Zbawl5AXx4iVJFvij3Yq8bBjHXK8o=; b=TXNbIG1keE11+QnpD6wuNb2isKfjsJpQuYbmGegv979NAm32XrnrNUDamB+lRX0Mdc f2nTuCiOWlZuVeztb2BhyEJFOcSLKVLDRbd+U7pUq69GJf7Zahv4X1wpyXA++nNlirlj RCf3Hx2bykGEnN68f5puJ4adTGzZrmIeWw68WNb8f6a66iaGloSYwTQqz3p1PMZRagVM gw/3yEwT7px/7mT3LDObblctwdTCxpfiDEEM62DPoVt0j4vLGBH4ER1MHVjQDaxOUzqJ Css82PiRTAaRiG7ZHlYBHl8qVL4ATwY7ZGT7mXZVLc4jiKnxSKNZQ4xC1zfrFxS4Nk18 ASgA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1688371471; x=1690963471; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=8bnZ4T0ROBJ+14Zbawl5AXx4iVJFvij3Yq8bBjHXK8o=; b=bWCFQ3NUkPf0Daa9jrDxr4FFV0IZyM5ykJ/6qO2t2aO8rzRNC7Rsjs2frBgFvF9dtz wr3Qbk4c7uzoFPEMvtR7NuXeEUVo0pJzQYIRuZxvVXeR3pTeYXiAMi2yJS6WgDdmnUPp LIr32ogQu5Gx46283jhGuY0xDeUyeiOPPhiDc3byilMxMRvsCpPOjYBErXwdFmRZAdrn h65T8Pm2QyuF6C6pYlU8OtfhJ1HXRKYwZYTzjuDUWSuS9+O+rGicvbzDesi2pQhYUdiT OGTPahjxxDEkDw1po4gtqwP4r9bhbQsyLqK7upDMYf76q1taLcxsfC0qylmoKJXAn4Rm zTYQ== X-Gm-Message-State: ABy/qLae9pwytLFo4aNAjP+WQ8vpEQ9WhyrYxe2qzrBWI/llkAefOzSl b0Ghi59o3xOPQoHbn5Ji8g9Q2evAo/rL6sL+sPiB3R2T X-Google-Smtp-Source: APBJJlF0k+lbB/0m6tIleY6lHtNvJqF16bKwHLTcMoIfEoEajSdWiUwLS4sq4e4Llc9NXQcqjW/g9TxHPq3TGBVvwbw= X-Received: by 2002:a2e:9105:0:b0:2b6:cb55:72cb with SMTP id m5-20020a2e9105000000b002b6cb5572cbmr6053008ljg.39.1688371470634; Mon, 03 Jul 2023 01:04:30 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Richard Biener Date: Mon, 3 Jul 2023 10:01:16 +0200 Message-ID: Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors To: Tejas Belagod Cc: "gcc-patches@gcc.gnu.org" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-1.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, Jul 3, 2023 at 8:50=E2=80=AFAM Tejas Belagod wrote: > > On 6/29/23 6:55 PM, Richard Biener wrote: > > On Wed, Jun 28, 2023 at 1:26=E2=80=AFPM Tejas Belagod wrote: > >> > >> > >> > >> > >> > >> From: Richard Biener > >> Date: Tuesday, June 27, 2023 at 12:58 PM > >> To: Tejas Belagod > >> Cc: gcc-patches@gcc.gnu.org > >> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors > >> > >> On Tue, Jun 27, 2023 at 8:30=E2=80=AFAM Tejas Belagod wrote: > >>> > >>> > >>> > >>> > >>> > >>> From: Richard Biener > >>> Date: Monday, June 26, 2023 at 2:23 PM > >>> To: Tejas Belagod > >>> Cc: gcc-patches@gcc.gnu.org > >>> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors > >>> > >>> On Mon, Jun 26, 2023 at 8:24=E2=80=AFAM Tejas Belagod via Gcc-patches > >>> wrote: > >>>> > >>>> Hi, > >>>> > >>>> Packed Boolean Vectors > >>>> ---------------------- > >>>> > >>>> I'd like to propose a feature addition to GNU Vector extensions to a= dd packed > >>>> boolean vectors (PBV). This has been discussed in the past here[1] = and a variant has > >>>> been implemented in Clang recently[2]. > >>>> > >>>> With predication features being added to vector architectures (SVE, = MVE, AVX), > >>>> it is a useful feature to have to model predication on targets. Thi= s could > >>>> find its use in intrinsics or just used as is as a GNU vector extens= ion being > >>>> mapped to underlying target features. For example, the packed boole= an vector > >>>> could directly map to a predicate register on SVE. > >>>> > >>>> Also, this new packed boolean type GNU extension can be used with SV= E ACLE > >>>> intrinsics to replace a fixed-length svbool_t. > >>>> > >>>> Here are a few options to represent the packed boolean vector type. > >>> > >>> The GIMPLE frontend uses a new 'vector_mask' attribute: > >>> > >>> typedef int v8si __attribute__((vector_size(8*sizeof(int)))); > >>> typedef v8si v8sib __attribute__((vector_mask)); > >>> > >>> it get's you a vector type that's the appropriate (dependent on the > >>> target) vector > >>> mask type for the vector data type (v8si in this case). > >>> > >>> > >>> > >>> Thanks Richard. > >>> > >>> Having had a quick look at the implementation, it does seem to tick t= he boxes. > >>> > >>> I must admit I haven't dug deep, but if the target hook allows the ma= sk to be > >>> > >>> defined in way that is target-friendly (and I don't know how much eff= ort it will > >>> > >>> be to migrate the attribute to more front-ends), it should do the job= nicely. > >>> > >>> Let me go back and dig a bit deeper and get back with questions if an= y. > >> > >> > >> Let me add that the advantage of this is the compiler doesn't need > >> to support weird explicitely laid out packed boolean vectors that do > >> not match what the target supports and the user doesn't need to know > >> what the target supports (and thus have an #ifdef maze around explicit= ely > >> specified layouts). > >> > >> Sorry for the delayed response =E2=80=93 I spent a day experimenting w= ith vector_mask. > >> > >> > >> > >> Yeah, this is what option 4 in the RFC is trying to achieve =E2=80=93 = be portable enough > >> > >> to avoid having to sprinkle the code with ifdefs. > >> > >> > >> It does remove some flexibility though, for example with -mavx512f -ma= vx512vl > >> you'll get AVX512 style masks for V4SImode data vectors but of course = the > >> target sill supports SSE2/AVX2 style masks as well, but those would no= t be > >> available as "packed boolean vectors", though they are of course in fa= ct > >> equal to V4SImode data vectors with -1 or 0 values, so in this particu= lar > >> case it might not matter. > >> > >> That said, the vector_mask attribute will get you V4SImode vectors wit= h > >> signed boolean elements of 32 bits for V4SImode data vectors with > >> SSE2/AVX2. > >> > >> > >> > >> This sounds very much like what the scenario would be with NEON vs SVE= . Coming to think > >> > >> of it, vector_mask resembles option 4 in the proposal with =E2=80=98n= =E2=80=99 implied by the =E2=80=98base=E2=80=99 vector type > >> > >> and a =E2=80=98w=E2=80=99 specified for the type. > >> > >> > >> > >> Given its current implementation, if vector_mask is exposed to the CFE= , would there be any > >> > >> major challenges wrt implementation or defining behaviour semantics? I= played around with a > >> > >> few examples from the testsuite and wrote some new ones. I mostly trie= d operations that > >> > >> the new type would have to support (unary, binary bitwise, initializat= ions etc) =E2=80=93 with a couple of exceptions > >> > >> most of the ops seem to be supported. I also triggered a couple of ICE= s in some tests involving > >> > >> implicit conversions to wider/narrower vector_mask types (will raise r= eports for these). Correct me > >> > >> if I=E2=80=99m wrong here, but we=E2=80=99d probably have to support a= couple of new ops if vector_mask is exposed > >> > >> to the CFE =E2=80=93 initialization and subscript operations? > > > > Yes, either that or restrict how the mask vectors can be used, thus > > properly diagnose improper > > uses. > > Indeed. > > A question would be for example how to write common mask test > > operations like > > if (any (mask)) or if (all (mask)). > > I see 2 options here. New builtins could support new types - they'd > provide a target independent way to test any and all conditions. Another > would be to let the target use its intrinsics to do them in the most > efficient way possible (which the builtins would get lowered down to > anyway). > > > Likewise writing merge operations > > - do those as > > > > a =3D a | (mask ? b : 0); > > > > thus use ternary ?: for this? > > Yes, like now, the ternary could just translate to > > {mask[0] ? b[0] : 0, mask[1] ? b[1] : 0, ... } > > One thing to flesh out is the semantics. Should we allow this operation > as long as the number of elements are the same even if the mask type if > different i.e. > > v4hib ? v4si : v4si; > > I don't see why this can't be allowed as now we let > > v4si ? v4sf : v4sf; > > > For initialization regular vector > > syntax should work: > > > > mtype mask =3D (mtype){ -1, -1, 0, 0, ... }; > > > > there's the question of the signedness of the mask elements. GCC > > internally uses signed > > bools with values -1 for true and 0 for false. > > One of the things is the value that represents true. This is largely > target-dependent when it comes to the vector_mask type. When vector_mask > types are created from GCC's internal representation of bool vectors > (signed ints) the point about implicit/explicit conversions from signed > int vect to mask types in the proposal covers this. So mask in > > v4sib mask =3D (v4sib){-1, -1, 0, 0, ... } > > will probably end up being represented as 0x3xxxx on AVX512 and 0x11xxx > on SVE. On AVX2/SSE they'd still be represented as vector of signed ints > {-1, -1, 0, 0, ... }. I'm not entirely confident what ramifications this > new mask type representations will have in the mid-end while being > converted back and forth to and from GCC's internal representation, but > I'm guessing this is already being handled at some level by the > vector_mask type's current support? Yes, I would guess so. Of course what the middle-end is currently exposed to is simply what the vectorizer generates - once fuzzers discover this fea= ture we'll see "interesting" uses that might run into missed or wrong handling o= f them. So whatever we do on the side of exposing this to users a good portion of testsuite coverage for the allowed use cases is important. Richard. > > Thanks, > Tejas. > > > > > Richard. > > > >> > >> > >> > >> > >> > >> Thanks, > >> > >> Tejas. > >> > >> > >> > >> > >> > >> Richard. > >> > >>> > >>> > >>> Thanks, > >>> > >>> Tejas. > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>>> 1. __attribute__((vector_size (n))) where n represents bytes > >>>> > >>>> typedef bool vbool __attribute__ ((vector_size (1))); > >>>> > >>>> In this approach, the shape of the boolean vector is unclear. IoW, i= t is not > >>>> clear if each bit in 'n' controls a byte or an element. On targets > >>>> like SVE, it would be natural to have each bit control a byte of the= target > >>>> vector (therefore resulting in an 'unpacked' layout of the PBV) and = on AVX, each > >>>> bit would control one element/lane on the target vector(therefore re= sulting in a > >>>> 'packed' layout with all significant bits at the LSB). > >>>> > >>>> 2. __attribute__((vector_size (n))) where n represents num of lanes > >>>> > >>>> typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))); > >>>> typedef bool v4bi __attribute__ ((vector_size (sizeof v4si / size= of (v4si){0}[0]))); > >>>> > >>>> Here the 'n' in the vector_size attribute represents the number of b= its that > >>>> is needed to represent a vector quantity. In this case, this packed= boolean > >>>> vector can represent upto 'n' vector lanes. The size of the type is > >>>> rounded up the nearest byte. For example, the sizeof v4bi in the ab= ove > >>>> example is 1. > >>>> > >>>> In this approach, because of the nature of the representation, the n= bits required > >>>> to represent the n lanes of the vector are packed at the LSB. This d= oes not naturally > >>>> align with the SVE approach of each bit representing a byte of the t= arget vector > >>>> and PBV therefore having an 'unpacked' layout. > >>>> > >>>> More importantly, another drawback here is that the change in units = for vector_size > >>>> might be confusing to programmers. The units will have to be interp= reted based on the > >>>> base type of the typedef. It does not offer any flexibility in terms= of the layout of > >>>> the bool vector - it is fixed. > >>>> > >>>> 3. Combination of 1 and 2. > >>>> > >>>> Combining the best of 1 and 2, we can introduce extra parameters to = vector_size that will > >>>> unambiguously represent the layout of the PBV. Consider > >>>> > >>>> typedef bool vbool __attribute__((vector_size (s, n[, w]))); > >>>> > >>>> where 's' is size in bytes, 'n' is the number of lanes and an option= al 3rd parameter 'w' > >>>> is the number of bits of the PBV that represents a lane of the targe= t vector. 'w' would > >>>> allow a target to force a certain layout of the PBV. > >>>> > >>>> The 2-parameter form of vector_size allows the target to have an > >>>> implementation-defined layout of the PBV. The target is free to choo= se the 'w' > >>>> if it is not specified to mirror the target layout of predicate regi= sters. For > >>>> eg. AVX would choose 'w' as 1 and SVE would choose s*8/n. > >>>> > >>>> As an example, to represent the result of a comparison on 2 int16x8_= t, we'd need > >>>> 8 lanes of boolean which could be represented by > >>>> > >>>> typedef bool v8b __attribute__ ((vector_size (2, 8))); > >>>> > >>>> SVE would implement v8b layout to make every 2nd bit significant i.e= . w =3D=3D 2 > >>>> > >>>> and AVX would choose a layout where all 8 consecutive bits packed at= LSB would > >>>> be significant i.e. w =3D=3D 1. > >>>> > >>>> This scheme would accomodate more than 1 target to effectively repre= sent vector > >>>> bools that mirror the target properties. > >>>> > >>>> 4. A new attribite > >>>> > >>>> This is based on a suggestion from Richard S in [3]. The idea is to = introduce a new > >>>> attribute to define the PBV and make it general enough to > >>>> > >>>> * represent all targets flexibly (SVE, AVX etc) > >>>> * represent sub-byte length predicates > >>>> * have no change in units of vector_size/no new vector_size signatur= e > >>>> * not have the number of bytes constrain representation > >>>> > >>>> If we call the new attribute 'bool_vec' (for lack of a better name),= consider > >>>> > >>>> typedef bool vbool __attribute__((bool_vec (n[, w]))) > >>>> > >>>> where 'n' represents number of lanes/elements and the optional 'w' i= s bits-per-lane. > >>>> > >>>> If 'w' is not specified, it and bytes-per-predicate are implementati= on-defined based on target. > >>>> If 'w' is specified, sizeof (vbool) will be ceil (n*w/8). > >>>> > >>>> 5. Behaviour of the packed vector boolean type. > >>>> > >>>> Taking the example of one of the options above, following is an illu= stration of it's behavior > >>>> > >>>> * ABI > >>>> > >>>> New ABI rules will need to be defined for this type - eg alignmen= t, PCS, > >>>> mangling etc > >>>> > >>>> * Initialization: > >>>> > >>>> Packed Boolean Vectors(PBV) can be initialized like so: > >>>> > >>>> typedef bool v4bi __attribute__ ((vector_size (2, 4, 4))); > >>>> v4bi p =3D {false, true, false, false}; > >>>> > >>>> Each value in the initizlizer constant is of type bool. The lowes= t numbered > >>>> element in the const array corresponds to the LSbit of p, element= 1 is > >>>> assigned to bit 4 etc. > >>>> > >>>> p is effectively a 2-byte bitmask with value 0x0010 > >>>> > >>>> With a different layout > >>>> > >>>> typedef bool v4bi __attribute__ ((vector_size (2, 4, 1))); > >>>> v4bi p =3D {false, true, false, false}; > >>>> > >>>> p is effectively a 2-byte bitmask with value 0x0002 > >>>> > >>>> * Operations: > >>>> > >>>> Packed Boolean Vectors support the following operations: > >>>> . unary ~ > >>>> . unary ! > >>>> . binary&,|and=CB=86 > >>>> . assignments &=3D, |=3D and =CB=86=3D > >>>> . comparisons <, <=3D, =3D=3D, !=3D, >=3D and > > >>>> . Ternary operator ?: > >>>> > >>>> Operations are defined as applied to the individual elements i.e = the bits > >>>> that are significant in the PBV. Whether the PBVs are treated as = bitmasks > >>>> or otherwise is implementation-defined. > >>>> > >>>> Insignificant bits could affect results of comparisons or ternary= operators. > >>>> In such cases, it is implementation defined how the unused bits a= re treated. > >>>> > >>>> . Subscript operator [] > >>>> > >>>> For the subscript operator, the packed boolean vector acts like a= array of > >>>> elements - the first or the 0th indexed element being the LSbit o= f the PBV. > >>>> Subscript operator yields a scalar boolean value. > >>>> For example: > >>>> > >>>> typedef bool v8b __attribute__ ((vector_size (2, 8, 2))); > >>>> > >>>> // Subscript operator result yields a boolean value. > >>>> // x[3] is the 7th LSbit and x[1] is the 3rd LSbit of x. > >>>> bool foo (v8b p, int n) { p[3] =3D true; return p[1]; } > >>>> > >>>> Out of bounds access: OOB access can be determined at compile tim= e given the > >>>> strong typing of the PBVs. > >>>> > >>>> PBV does not support address of operator(&) for elements of PBVs. > >>>> > >>>> . Implicit conversion from integer vectors to PBVs > >>>> > >>>> We would like to support the output of comparison operations to b= e PBVs. This > >>>> requires us to define the implicit conversion from an integer vec= tor to PBV > >>>> as the result of vector comparisons are integer vectors. > >>>> > >>>> To define this operation: > >>>> > >>>> bool_vector =3D vector vector > >>>> > >>>> There is no change in how vector vector behavior i.e. thi= s comparison > >>>> would still produce an int_vector type as it does now. > >>>> > >>>> temp_int_vec =3D vector vector > >>>> bool_vec =3D temp_int_vec // Implicit conversion from int_vec t= o bool_vec > >>>> > >>>> The implicit conversion from int_vec to bool I'd define simply to= be: > >>>> > >>>> bool_vec[n] =3D (_Bool) int_vec[n] > >>>> > >>>> where the C11 standard rules apply > >>>> 6.3.1.2 Boolean type When any scalar value is converted to _Bool= , the result > >>>> is 0 if the value compares equal to 0; otherwise, the result is 1= . > >>>> > >>>> > >>>> [1] https://lists.llvm.org/pipermail/cfe-dev/2020-May/065434.html > >>>> [2] https://reviews.llvm.org/D88905 > >>>> [3] https://reviews.llvm.org/D81083 > >>>> > >>>> Thoughts? > >>>> > >>>> Thanks, > >>>> Tejas. >