From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=2pBN=CV=gmail.com=richard.guenther@sourceware.org>
Received: from mail-lj1-x235.google.com (mail-lj1-x235.google.com [IPv6:2a00:1450:4864:20::235])
	by sourceware.org (Postfix) with ESMTPS id 753453858C52
	for <gcc-patches@gcc.gnu.org>; Mon,  3 Jul 2023 08:04:32 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 753453858C52
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
Received: by mail-lj1-x235.google.com with SMTP id 38308e7fff4ca-2b5e7dba43cso64777401fa.1
        for <gcc-patches@gcc.gnu.org>; Mon, 03 Jul 2023 01:04:32 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20221208; t=1688371471; x=1690963471;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:from:to:cc:subject:date
         :message-id:reply-to;
        bh=8bnZ4T0ROBJ+14Zbawl5AXx4iVJFvij3Yq8bBjHXK8o=;
        b=TXNbIG1keE11+QnpD6wuNb2isKfjsJpQuYbmGegv979NAm32XrnrNUDamB+lRX0Mdc
         f2nTuCiOWlZuVeztb2BhyEJFOcSLKVLDRbd+U7pUq69GJf7Zahv4X1wpyXA++nNlirlj
         RCf3Hx2bykGEnN68f5puJ4adTGzZrmIeWw68WNb8f6a66iaGloSYwTQqz3p1PMZRagVM
         gw/3yEwT7px/7mT3LDObblctwdTCxpfiDEEM62DPoVt0j4vLGBH4ER1MHVjQDaxOUzqJ
         Css82PiRTAaRiG7ZHlYBHl8qVL4ATwY7ZGT7mXZVLc4jiKnxSKNZQ4xC1zfrFxS4Nk18
         ASgA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20221208; t=1688371471; x=1690963471;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=8bnZ4T0ROBJ+14Zbawl5AXx4iVJFvij3Yq8bBjHXK8o=;
        b=bWCFQ3NUkPf0Daa9jrDxr4FFV0IZyM5ykJ/6qO2t2aO8rzRNC7Rsjs2frBgFvF9dtz
         wr3Qbk4c7uzoFPEMvtR7NuXeEUVo0pJzQYIRuZxvVXeR3pTeYXiAMi2yJS6WgDdmnUPp
         LIr32ogQu5Gx46283jhGuY0xDeUyeiOPPhiDc3byilMxMRvsCpPOjYBErXwdFmRZAdrn
         h65T8Pm2QyuF6C6pYlU8OtfhJ1HXRKYwZYTzjuDUWSuS9+O+rGicvbzDesi2pQhYUdiT
         OGTPahjxxDEkDw1po4gtqwP4r9bhbQsyLqK7upDMYf76q1taLcxsfC0qylmoKJXAn4Rm
         zTYQ==
X-Gm-Message-State: ABy/qLae9pwytLFo4aNAjP+WQ8vpEQ9WhyrYxe2qzrBWI/llkAefOzSl
	b0Ghi59o3xOPQoHbn5Ji8g9Q2evAo/rL6sL+sPiB3R2T
X-Google-Smtp-Source: APBJJlF0k+lbB/0m6tIleY6lHtNvJqF16bKwHLTcMoIfEoEajSdWiUwLS4sq4e4Llc9NXQcqjW/g9TxHPq3TGBVvwbw=
X-Received: by 2002:a2e:9105:0:b0:2b6:cb55:72cb with SMTP id
 m5-20020a2e9105000000b002b6cb5572cbmr6053008ljg.39.1688371470634; Mon, 03 Jul
 2023 01:04:30 -0700 (PDT)
MIME-Version: 1.0
References: <AS8PR08MB7079EB8008E1C57F58587E88EA26A@AS8PR08MB7079.eurprd08.prod.outlook.com>
 <CAFiYyc3KFDjXa3XTqDJhuqWFwE+cAqvWGO4HZ9q6fC0nFdNdSw@mail.gmail.com>
 <AS8PR08MB70795366A2BFE13A0EEF6094EA27A@AS8PR08MB7079.eurprd08.prod.outlook.com>
 <CAFiYyc0BCz4ddEhrbcJC=xvMMSCVfJMC+i8x8uPseqTe3rd7pw@mail.gmail.com>
 <AS8PR08MB7079D26858AC63F4645EDBEEEA27A@AS8PR08MB7079.eurprd08.prod.outlook.com>
 <CAFiYyc1ffi8Ahid7EZLiPAuC6BVOE3XaS-Pn8M3SDKHZ51rBsA@mail.gmail.com> <ff50b9cc-fb56-a7d9-304b-e20d00d1aee6@arm.com>
In-Reply-To: <ff50b9cc-fb56-a7d9-304b-e20d00d1aee6@arm.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Mon, 3 Jul 2023 10:01:16 +0200
Message-ID: <CAFiYyc2dN8O8yPfWjL3-1gEdxKt2WQQsskgPR6rvtTDQHL6HsQ@mail.gmail.com>
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
To: Tejas Belagod <tejas.belagod@arm.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-1.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Mon, Jul 3, 2023 at 8:50=E2=80=AFAM Tejas Belagod <tejas.belagod@arm.com=
> wrote:
>
> On 6/29/23 6:55 PM, Richard Biener wrote:
> > On Wed, Jun 28, 2023 at 1:26=E2=80=AFPM Tejas Belagod <Tejas.Belagod@ar=
m.com> wrote:
> >>
> >>
> >>
> >>
> >>
> >> From: Richard Biener <richard.guenther@gmail.com>
> >> Date: Tuesday, June 27, 2023 at 12:58 PM
> >> To: Tejas Belagod <Tejas.Belagod@arm.com>
> >> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> >> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
> >>
> >> On Tue, Jun 27, 2023 at 8:30=E2=80=AFAM Tejas Belagod <Tejas.Belagod@a=
rm.com> wrote:
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> From: Richard Biener <richard.guenther@gmail.com>
> >>> Date: Monday, June 26, 2023 at 2:23 PM
> >>> To: Tejas Belagod <Tejas.Belagod@arm.com>
> >>> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> >>> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
> >>>
> >>> On Mon, Jun 26, 2023 at 8:24=E2=80=AFAM Tejas Belagod via Gcc-patches
> >>> <gcc-patches@gcc.gnu.org> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> Packed Boolean Vectors
> >>>> ----------------------
> >>>>
> >>>> I'd like to propose a feature addition to GNU Vector extensions to a=
dd packed
> >>>> boolean vectors (PBV).  This has been discussed in the past here[1] =
and a variant has
> >>>> been implemented in Clang recently[2].
> >>>>
> >>>> With predication features being added to vector architectures (SVE, =
MVE, AVX),
> >>>> it is a useful feature to have to model predication on targets.  Thi=
s could
> >>>> find its use in intrinsics or just used as is as a GNU vector extens=
ion being
> >>>> mapped to underlying target features.  For example, the packed boole=
an vector
> >>>> could directly map to a predicate register on SVE.
> >>>>
> >>>> Also, this new packed boolean type GNU extension can be used with SV=
E ACLE
> >>>> intrinsics to replace a fixed-length svbool_t.
> >>>>
> >>>> Here are a few options to represent the packed boolean vector type.
> >>>
> >>> The GIMPLE frontend uses a new 'vector_mask' attribute:
> >>>
> >>> typedef int v8si __attribute__((vector_size(8*sizeof(int))));
> >>> typedef v8si v8sib __attribute__((vector_mask));
> >>>
> >>> it get's you a vector type that's the appropriate (dependent on the
> >>> target) vector
> >>> mask type for the vector data type (v8si in this case).
> >>>
> >>>
> >>>
> >>> Thanks Richard.
> >>>
> >>> Having had a quick look at the implementation, it does seem to tick t=
he boxes.
> >>>
> >>> I must admit I haven't dug deep, but if the target hook allows the ma=
sk to be
> >>>
> >>> defined in way that is target-friendly (and I don't know how much eff=
ort it will
> >>>
> >>> be to migrate the attribute to more front-ends), it should do the job=
 nicely.
> >>>
> >>> Let me go back and dig a bit deeper and get back with questions if an=
y.
> >>
> >>
> >> Let me add that the advantage of this is the compiler doesn't need
> >> to support weird explicitely laid out packed boolean vectors that do
> >> not match what the target supports and the user doesn't need to know
> >> what the target supports (and thus have an #ifdef maze around explicit=
ely
> >> specified layouts).
> >>
> >> Sorry for the delayed response =E2=80=93 I spent a day experimenting w=
ith vector_mask.
> >>
> >>
> >>
> >> Yeah, this is what option 4 in the RFC is trying to achieve =E2=80=93 =
be portable enough
> >>
> >> to avoid having to sprinkle the code with ifdefs.
> >>
> >>
> >> It does remove some flexibility though, for example with -mavx512f -ma=
vx512vl
> >> you'll get AVX512 style masks for V4SImode data vectors but of course =
the
> >> target sill supports SSE2/AVX2 style masks as well, but those would no=
t be
> >> available as "packed boolean vectors", though they are of course in fa=
ct
> >> equal to V4SImode data vectors with -1 or 0 values, so in this particu=
lar
> >> case it might not matter.
> >>
> >> That said, the vector_mask attribute will get you V4SImode vectors wit=
h
> >> signed boolean elements of 32 bits for V4SImode data vectors with
> >> SSE2/AVX2.
> >>
> >>
> >>
> >> This sounds very much like what the scenario would be with NEON vs SVE=
. Coming to think
> >>
> >> of it, vector_mask resembles option 4 in the proposal with =E2=80=98n=
=E2=80=99 implied by the =E2=80=98base=E2=80=99 vector type
> >>
> >> and a =E2=80=98w=E2=80=99 specified for the type.
> >>
> >>
> >>
> >> Given its current implementation, if vector_mask is exposed to the CFE=
, would there be any
> >>
> >> major challenges wrt implementation or defining behaviour semantics? I=
 played around with a
> >>
> >> few examples from the testsuite and wrote some new ones. I mostly trie=
d operations that
> >>
> >> the new type would have to support (unary, binary bitwise, initializat=
ions etc) =E2=80=93 with a couple of exceptions
> >>
> >> most of the ops seem to be supported. I also triggered a couple of ICE=
s in some tests involving
> >>
> >> implicit conversions to wider/narrower vector_mask types (will raise r=
eports for these). Correct me
> >>
> >> if I=E2=80=99m wrong here, but we=E2=80=99d probably have to support a=
 couple of new ops if vector_mask is exposed
> >>
> >> to the CFE =E2=80=93 initialization and subscript operations?
> >
> > Yes, either that or restrict how the mask vectors can be used, thus
> > properly diagnose improper
> > uses.
>
> Indeed.
>
>   A question would be for example how to write common mask test
> > operations like
> > if (any (mask)) or if (all (mask)).
>
> I see 2 options here. New builtins could support new types - they'd
> provide a target independent way to test any and all conditions. Another
> would be to let the target use its intrinsics to do them in the most
> efficient way possible (which the builtins would get lowered down to
> anyway).
>
>
>   Likewise writing merge operations
> > - do those as
> >
> >   a =3D a | (mask ? b : 0);
> >
> > thus use ternary ?: for this?
>
> Yes, like now, the ternary could just translate to
>
>    {mask[0] ? b[0] : 0, mask[1] ? b[1] : 0, ... }
>
> One thing to flesh out is the semantics. Should we allow this operation
> as long as the number of elements are the same even if the mask type if
> different i.e.
>
>    v4hib ? v4si : v4si;
>
> I don't see why this can't be allowed as now we let
>
>    v4si ? v4sf : v4sf;
>
>
> For initialization regular vector
> > syntax should work:
> >
> > mtype mask =3D (mtype){ -1, -1, 0, 0, ... };
> >
> > there's the question of the signedness of the mask elements.  GCC
> > internally uses signed
> > bools with values -1 for true and 0 for false.
>
> One of the things is the value that represents true. This is largely
> target-dependent when it comes to the vector_mask type. When vector_mask
> types are created from GCC's internal representation of bool vectors
> (signed ints) the point about implicit/explicit conversions from signed
> int vect to mask types in the proposal covers this. So mask in
>
>    v4sib mask =3D (v4sib){-1, -1, 0, 0, ... }
>
> will probably end up being represented as 0x3xxxx on AVX512 and 0x11xxx
> on SVE. On AVX2/SSE they'd still be represented as vector of signed ints
> {-1, -1, 0, 0, ... }. I'm not entirely confident what ramifications this
> new mask type representations will have in the mid-end while being
> converted back and forth to and from GCC's internal representation, but
> I'm guessing this is already being handled at some level by the
> vector_mask type's current support?

Yes, I would guess so.  Of course what the middle-end is currently exposed
to is simply what the vectorizer generates - once fuzzers discover this fea=
ture
we'll see "interesting" uses that might run into missed or wrong handling o=
f
them.

So whatever we do on the side of exposing this to users a good portion
of testsuite coverage for the allowed use cases is important.

Richard.

>
> Thanks,
> Tejas.
>
> >
> > Richard.
> >
> >>
> >>
> >>
> >>
> >>
> >> Thanks,
> >>
> >> Tejas.
> >>
> >>
> >>
> >>
> >>
> >> Richard.
> >>
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> Tejas.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>> 1. __attribute__((vector_size (n))) where n represents bytes
> >>>>
> >>>>    typedef bool vbool __attribute__ ((vector_size (1)));
> >>>>
> >>>> In this approach, the shape of the boolean vector is unclear. IoW, i=
t is not
> >>>> clear if each bit in 'n' controls a byte or an element. On targets
> >>>> like SVE, it would be natural to have each bit control a byte of the=
 target
> >>>> vector (therefore resulting in an 'unpacked' layout of the PBV) and =
on AVX, each
> >>>> bit would control one element/lane on the target vector(therefore re=
sulting in a
> >>>> 'packed' layout with all significant bits at the LSB).
> >>>>
> >>>> 2. __attribute__((vector_size (n))) where n represents num of lanes
> >>>>
> >>>>    typedef int v4si __attribute__ ((vector_size (4 * sizeof (int)));
> >>>>    typedef bool v4bi __attribute__ ((vector_size (sizeof v4si / size=
of (v4si){0}[0])));
> >>>>
> >>>> Here the 'n' in the vector_size attribute represents the number of b=
its that
> >>>> is needed to represent a vector quantity.  In this case, this packed=
 boolean
> >>>> vector can represent upto 'n' vector lanes. The size of the type is
> >>>> rounded up the nearest byte.  For example, the sizeof v4bi in the ab=
ove
> >>>> example is 1.
> >>>>
> >>>> In this approach, because of the nature of the representation, the n=
 bits required
> >>>> to represent the n lanes of the vector are packed at the LSB. This d=
oes not naturally
> >>>> align with the SVE approach of each bit representing a byte of the t=
arget vector
> >>>> and PBV therefore having an 'unpacked' layout.
> >>>>
> >>>> More importantly, another drawback here is that the change in units =
for vector_size
> >>>> might be confusing to programmers.  The units will have to be interp=
reted based on the
> >>>> base type of the typedef. It does not offer any flexibility in terms=
 of the layout of
> >>>> the bool vector - it is fixed.
> >>>>
> >>>> 3. Combination of 1 and 2.
> >>>>
> >>>> Combining the best of 1 and 2, we can introduce extra parameters to =
vector_size that will
> >>>> unambiguously represent the layout of the PBV. Consider
> >>>>
> >>>>    typedef bool vbool __attribute__((vector_size (s, n[, w])));
> >>>>
> >>>> where 's' is size in bytes, 'n' is the number of lanes and an option=
al 3rd parameter 'w'
> >>>> is the number of bits of the PBV that represents a lane of the targe=
t vector. 'w' would
> >>>> allow a target to force a certain layout of the PBV.
> >>>>
> >>>> The 2-parameter form of vector_size allows the target to have an
> >>>> implementation-defined layout of the PBV. The target is free to choo=
se the 'w'
> >>>> if it is not specified to mirror the target layout of predicate regi=
sters. For
> >>>> eg. AVX would choose 'w' as 1 and SVE would choose s*8/n.
> >>>>
> >>>> As an example, to represent the result of a comparison on 2 int16x8_=
t, we'd need
> >>>> 8 lanes of boolean which could be represented by
> >>>>
> >>>>    typedef bool v8b __attribute__ ((vector_size (2, 8)));
> >>>>
> >>>> SVE would implement v8b layout to make every 2nd bit significant i.e=
. w =3D=3D 2
> >>>>
> >>>> and AVX would choose a layout where all 8 consecutive bits packed at=
 LSB would
> >>>> be significant i.e. w =3D=3D 1.
> >>>>
> >>>> This scheme would accomodate more than 1 target to effectively repre=
sent vector
> >>>> bools that mirror the target properties.
> >>>>
> >>>> 4. A new attribite
> >>>>
> >>>> This is based on a suggestion from Richard S in [3]. The idea is to =
introduce a new
> >>>> attribute to define the PBV and make it general enough to
> >>>>
> >>>> * represent all targets flexibly (SVE, AVX etc)
> >>>> * represent sub-byte length predicates
> >>>> * have no change in units of vector_size/no new vector_size signatur=
e
> >>>> * not have the number of bytes constrain representation
> >>>>
> >>>> If we call the new attribute 'bool_vec' (for lack of a better name),=
 consider
> >>>>
> >>>>    typedef bool vbool __attribute__((bool_vec (n[, w])))
> >>>>
> >>>> where 'n' represents number of lanes/elements and the optional 'w' i=
s bits-per-lane.
> >>>>
> >>>> If 'w' is not specified, it and bytes-per-predicate are implementati=
on-defined based on target.
> >>>> If 'w' is specified,  sizeof (vbool) will be ceil (n*w/8).
> >>>>
> >>>> 5. Behaviour of the packed vector boolean type.
> >>>>
> >>>> Taking the example of one of the options above, following is an illu=
stration of it's behavior
> >>>>
> >>>> * ABI
> >>>>
> >>>>    New ABI rules will need to be defined for this type - eg alignmen=
t, PCS,
> >>>>    mangling etc
> >>>>
> >>>> * Initialization:
> >>>>
> >>>>    Packed Boolean Vectors(PBV) can be initialized like so:
> >>>>
> >>>>      typedef bool v4bi __attribute__ ((vector_size (2, 4, 4)));
> >>>>      v4bi p =3D {false, true, false, false};
> >>>>
> >>>>    Each value in the initizlizer constant is of type bool. The lowes=
t numbered
> >>>>    element in the const array corresponds to the LSbit of p, element=
 1 is
> >>>>    assigned to bit 4 etc.
> >>>>
> >>>>    p is effectively a 2-byte bitmask with value 0x0010
> >>>>
> >>>>    With a different layout
> >>>>
> >>>>      typedef bool v4bi __attribute__ ((vector_size (2, 4, 1)));
> >>>>      v4bi p =3D {false, true, false, false};
> >>>>
> >>>>    p is effectively a 2-byte bitmask with value 0x0002
> >>>>
> >>>> * Operations:
> >>>>
> >>>>    Packed Boolean Vectors support the following operations:
> >>>>    . unary ~
> >>>>    . unary !
> >>>>    . binary&,|and=CB=86
> >>>>    . assignments &=3D, |=3D and =CB=86=3D
> >>>>    . comparisons <, <=3D, =3D=3D, !=3D, >=3D and >
> >>>>    . Ternary operator ?:
> >>>>
> >>>>    Operations are defined as applied to the individual elements i.e =
the bits
> >>>>    that are significant in the PBV. Whether the PBVs are treated as =
bitmasks
> >>>>    or otherwise is implementation-defined.
> >>>>
> >>>>    Insignificant bits could affect results of comparisons or ternary=
 operators.
> >>>>    In such cases, it is implementation defined how the unused bits a=
re treated.
> >>>>
> >>>>    . Subscript operator []
> >>>>
> >>>>    For the subscript operator, the packed boolean vector acts like a=
 array of
> >>>>    elements - the first or the 0th indexed element being the LSbit o=
f the PBV.
> >>>>    Subscript operator yields a scalar boolean value.
> >>>>    For example:
> >>>>
> >>>>      typedef bool v8b __attribute__ ((vector_size (2, 8, 2)));
> >>>>
> >>>>      // Subscript operator result yields a boolean value.
> >>>>      // x[3] is the 7th LSbit and x[1] is the 3rd LSbit of x.
> >>>>      bool foo (v8b p, int n) { p[3] =3D true; return p[1]; }
> >>>>
> >>>>    Out of bounds access: OOB access can be determined at compile tim=
e given the
> >>>>    strong typing of the PBVs.
> >>>>
> >>>>    PBV does not support address of operator(&) for elements of PBVs.
> >>>>
> >>>>    . Implicit conversion from integer vectors to PBVs
> >>>>
> >>>>    We would like to support the output of comparison operations to b=
e PBVs. This
> >>>>    requires us to define the implicit conversion from an integer vec=
tor to PBV
> >>>>    as the result of vector comparisons are integer vectors.
> >>>>
> >>>>    To define this operation:
> >>>>
> >>>>      bool_vector =3D vector <cmpop> vector
> >>>>
> >>>>    There is no change in how vector <cmpop> vector behavior i.e. thi=
s comparison
> >>>>    would still produce an int_vector type as it does now.
> >>>>
> >>>>      temp_int_vec =3D vector <cmpop> vector
> >>>>      bool_vec =3D temp_int_vec // Implicit conversion from int_vec t=
o bool_vec
> >>>>
> >>>>    The implicit conversion from int_vec to bool I'd define simply to=
 be:
> >>>>
> >>>>      bool_vec[n] =3D (_Bool) int_vec[n]
> >>>>
> >>>>    where the C11 standard rules apply
> >>>>    6.3.1.2 Boolean type  When any scalar value is converted to _Bool=
, the result
> >>>>    is 0 if the value compares equal to 0; otherwise, the result is 1=
.
> >>>>
> >>>>
> >>>> [1] https://lists.llvm.org/pipermail/cfe-dev/2020-May/065434.html
> >>>> [2] https://reviews.llvm.org/D88905
> >>>> [3] https://reviews.llvm.org/D81083
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> Thanks,
> >>>> Tejas.
>