From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=I8LG=DM=gmail.com=richard.guenther@sourceware.org>
Received: from mail-lj1-x22a.google.com (mail-lj1-x22a.google.com [IPv6:2a00:1450:4864:20::22a])
	by sourceware.org (Postfix) with ESMTPS id F3BFD3858417
	for <gcc-patches@gcc.gnu.org>; Wed, 26 Jul 2023 12:34:29 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org F3BFD3858417
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
Received: by mail-lj1-x22a.google.com with SMTP id 38308e7fff4ca-2b70404a5a0so99504711fa.2
        for <gcc-patches@gcc.gnu.org>; Wed, 26 Jul 2023 05:34:29 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20221208; t=1690374868; x=1690979668;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:from:to:cc:subject:date
         :message-id:reply-to;
        bh=b12Uyrxp4nY3Nl2cqFvoiV2Q0wOfnPAQa7werL3RXN4=;
        b=U4FbpA5QvfH4X2x0fR3YYuhhzcIN6w6Hul3sKffMs1a/N5FpaXfZj8NAkJm60cM1bx
         ujmgsFCD1riOBjy5TdWmQ7u8iniVNl3j5u0im/moaOFS4njIWanAQXmLTSuPmwLRaE7F
         xJEuc0UEEwaIGX5i07N5XOQnGUWgwayU6VINpZ2ee9zE57+/4JJ4FtLAU16vuKK+h9fx
         l9BlY2aD7dUl+ZVIui1vCOPFrwhVznOKKL32hQHm8qVQWfV7Rfi9ZC+TE5le+/wfrK5R
         29x7IvqhL3wIGX734NbgqL1W0rDSGimsTm6ktogmqZ6VcxgFgTr7sEVje4WHyEpE2fzU
         M+3w==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20221208; t=1690374868; x=1690979668;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=b12Uyrxp4nY3Nl2cqFvoiV2Q0wOfnPAQa7werL3RXN4=;
        b=YgMfaOnH9hjaKnvv1n5pT1f0mGpz28lHGUyVsFGozRFd5+66f2hcSXlMq9a+hPqLG/
         WnjBWbY9Umd/jpK4FvnYrbQAqJK/FP4XqVLr7dJckJMXMoMXn7NVMPhP673IJNuWMBJL
         nbvBHC6TxzILP3ONmIvhZHvEzlDIoQXBzJMsvBnqpOQAZs78Ou1zXHm9fz6ff568Z8zO
         6zLvxivwts9lsmi2UiG7T1CMWV2b4RyDlRQVeob2C9Dx14a5PSJSWLHpshx4P9rzeZCz
         IFK6HOgNVoW4ygOxIlk5IJOAtP+AkwQKxkGqvnUCrZZK+LtuKVGAh4p8MbBs+hplc7QK
         /J4w==
X-Gm-Message-State: ABy/qLa6uuckfPGHaCn0ArwwpGjjwGz78xWMi1UZ5dXkvfnvsFhuxr5n
	BBaTs2VDkj7vPVAFLcsxUdI/HrKtb1xovJ+fzMs=
X-Google-Smtp-Source: APBJJlE9Gews1It6i3MlryMyWD/2qG3lW2QLnDQ0zHQDkxbCAcnJSPGtJMPI8q/5xlRDviIPVcsX8I2lgQk/pc4rCDI=
X-Received: by 2002:a05:651c:214:b0:2b6:9bd3:840e with SMTP id
 y20-20020a05651c021400b002b69bd3840emr1412150ljn.21.1690374867601; Wed, 26
 Jul 2023 05:34:27 -0700 (PDT)
MIME-Version: 1.0
References: <AS8PR08MB7079EB8008E1C57F58587E88EA26A@AS8PR08MB7079.eurprd08.prod.outlook.com>
 <CAFiYyc3KFDjXa3XTqDJhuqWFwE+cAqvWGO4HZ9q6fC0nFdNdSw@mail.gmail.com>
 <AS8PR08MB70795366A2BFE13A0EEF6094EA27A@AS8PR08MB7079.eurprd08.prod.outlook.com>
 <CAFiYyc0BCz4ddEhrbcJC=xvMMSCVfJMC+i8x8uPseqTe3rd7pw@mail.gmail.com>
 <AS8PR08MB7079D26858AC63F4645EDBEEEA27A@AS8PR08MB7079.eurprd08.prod.outlook.com>
 <CAFiYyc1ffi8Ahid7EZLiPAuC6BVOE3XaS-Pn8M3SDKHZ51rBsA@mail.gmail.com>
 <ff50b9cc-fb56-a7d9-304b-e20d00d1aee6@arm.com> <CAFiYyc2dN8O8yPfWjL3-1gEdxKt2WQQsskgPR6rvtTDQHL6HsQ@mail.gmail.com>
 <87a51e61-271a-44d7-ed94-de45d32b2e18@arm.com> <CAFiYyc0JR=qkk++UnifD_+vbpd7hCad-GkxxvaeqDENEAsCymQ@mail.gmail.com>
 <f9b7b5c1-c796-76da-7bed-a9a4508fb4b8@arm.com> <CAFiYyc11DiJNL-gJvCNa9gb4OAUJay11wDzzxgAFHPqvUj+qwA@mail.gmail.com>
 <bce0b7e8-2380-159d-92f3-78070287ac47@arm.com> <CAFiYyc09Kwr1n+kH+WcnqeDJn_c59aX=dQ6+D0_c0RZH81SN7w@mail.gmail.com>
In-Reply-To: <CAFiYyc09Kwr1n+kH+WcnqeDJn_c59aX=dQ6+D0_c0RZH81SN7w@mail.gmail.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Wed, 26 Jul 2023 14:33:41 +0200
Message-ID: <CAFiYyc15ZGB+bRJs9DTmXT6jKWPoH117SFgR4RSMj0oW1qAHDw@mail.gmail.com>
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
To: Tejas Belagod <tejas.belagod@arm.com>, Matthias Kretz <m.kretz@gsi.de>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-1.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Wed, Jul 26, 2023 at 2:26=E2=80=AFPM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Wed, Jul 26, 2023 at 9:21=E2=80=AFAM Tejas Belagod <tejas.belagod@arm.=
com> wrote:
> >
> > On 7/17/23 5:46 PM, Richard Biener wrote:
> > > On Fri, Jul 14, 2023 at 12:18=E2=80=AFPM Tejas Belagod <tejas.belagod=
@arm.com> wrote:
> > >>
> > >> On 7/13/23 4:05 PM, Richard Biener wrote:
> > >>> On Thu, Jul 13, 2023 at 12:15=E2=80=AFPM Tejas Belagod <tejas.belag=
od@arm.com> wrote:
> > >>>>
> > >>>> On 7/3/23 1:31 PM, Richard Biener wrote:
> > >>>>> On Mon, Jul 3, 2023 at 8:50=E2=80=AFAM Tejas Belagod <tejas.belag=
od@arm.com> wrote:
> > >>>>>>
> > >>>>>> On 6/29/23 6:55 PM, Richard Biener wrote:
> > >>>>>>> On Wed, Jun 28, 2023 at 1:26=E2=80=AFPM Tejas Belagod <Tejas.Be=
lagod@arm.com> wrote:
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> From: Richard Biener <richard.guenther@gmail.com>
> > >>>>>>>> Date: Tuesday, June 27, 2023 at 12:58 PM
> > >>>>>>>> To: Tejas Belagod <Tejas.Belagod@arm.com>
> > >>>>>>>> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> > >>>>>>>> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vect=
ors
> > >>>>>>>>
> > >>>>>>>> On Tue, Jun 27, 2023 at 8:30=E2=80=AFAM Tejas Belagod <Tejas.B=
elagod@arm.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> From: Richard Biener <richard.guenther@gmail.com>
> > >>>>>>>>> Date: Monday, June 26, 2023 at 2:23 PM
> > >>>>>>>>> To: Tejas Belagod <Tejas.Belagod@arm.com>
> > >>>>>>>>> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> > >>>>>>>>> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vec=
tors
> > >>>>>>>>>
> > >>>>>>>>> On Mon, Jun 26, 2023 at 8:24=E2=80=AFAM Tejas Belagod via Gcc=
-patches
> > >>>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Hi,
> > >>>>>>>>>>
> > >>>>>>>>>> Packed Boolean Vectors
> > >>>>>>>>>> ----------------------
> > >>>>>>>>>>
> > >>>>>>>>>> I'd like to propose a feature addition to GNU Vector extensi=
ons to add packed
> > >>>>>>>>>> boolean vectors (PBV).  This has been discussed in the past =
here[1] and a variant has
> > >>>>>>>>>> been implemented in Clang recently[2].
> > >>>>>>>>>>
> > >>>>>>>>>> With predication features being added to vector architecture=
s (SVE, MVE, AVX),
> > >>>>>>>>>> it is a useful feature to have to model predication on targe=
ts.  This could
> > >>>>>>>>>> find its use in intrinsics or just used as is as a GNU vecto=
r extension being
> > >>>>>>>>>> mapped to underlying target features.  For example, the pack=
ed boolean vector
> > >>>>>>>>>> could directly map to a predicate register on SVE.
> > >>>>>>>>>>
> > >>>>>>>>>> Also, this new packed boolean type GNU extension can be used=
 with SVE ACLE
> > >>>>>>>>>> intrinsics to replace a fixed-length svbool_t.
> > >>>>>>>>>>
> > >>>>>>>>>> Here are a few options to represent the packed boolean vecto=
r type.
> > >>>>>>>>>
> > >>>>>>>>> The GIMPLE frontend uses a new 'vector_mask' attribute:
> > >>>>>>>>>
> > >>>>>>>>> typedef int v8si __attribute__((vector_size(8*sizeof(int))));
> > >>>>>>>>> typedef v8si v8sib __attribute__((vector_mask));
> > >>>>>>>>>
> > >>>>>>>>> it get's you a vector type that's the appropriate (dependent =
on the
> > >>>>>>>>> target) vector
> > >>>>>>>>> mask type for the vector data type (v8si in this case).
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Thanks Richard.
> > >>>>>>>>>
> > >>>>>>>>> Having had a quick look at the implementation, it does seem t=
o tick the boxes.
> > >>>>>>>>>
> > >>>>>>>>> I must admit I haven't dug deep, but if the target hook allow=
s the mask to be
> > >>>>>>>>>
> > >>>>>>>>> defined in way that is target-friendly (and I don't know how =
much effort it will
> > >>>>>>>>>
> > >>>>>>>>> be to migrate the attribute to more front-ends), it should do=
 the job nicely.
> > >>>>>>>>>
> > >>>>>>>>> Let me go back and dig a bit deeper and get back with questio=
ns if any.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Let me add that the advantage of this is the compiler doesn't =
need
> > >>>>>>>> to support weird explicitely laid out packed boolean vectors t=
hat do
> > >>>>>>>> not match what the target supports and the user doesn't need t=
o know
> > >>>>>>>> what the target supports (and thus have an #ifdef maze around =
explicitely
> > >>>>>>>> specified layouts).
> > >>>>>>>>
> > >>>>>>>> Sorry for the delayed response =E2=80=93 I spent a day experim=
enting with vector_mask.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Yeah, this is what option 4 in the RFC is trying to achieve =
=E2=80=93 be portable enough
> > >>>>>>>>
> > >>>>>>>> to avoid having to sprinkle the code with ifdefs.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> It does remove some flexibility though, for example with -mavx=
512f -mavx512vl
> > >>>>>>>> you'll get AVX512 style masks for V4SImode data vectors but of=
 course the
> > >>>>>>>> target sill supports SSE2/AVX2 style masks as well, but those =
would not be
> > >>>>>>>> available as "packed boolean vectors", though they are of cour=
se in fact
> > >>>>>>>> equal to V4SImode data vectors with -1 or 0 values, so in this=
 particular
> > >>>>>>>> case it might not matter.
> > >>>>>>>>
> > >>>>>>>> That said, the vector_mask attribute will get you V4SImode vec=
tors with
> > >>>>>>>> signed boolean elements of 32 bits for V4SImode data vectors w=
ith
> > >>>>>>>> SSE2/AVX2.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> This sounds very much like what the scenario would be with NEO=
N vs SVE. Coming to think
> > >>>>>>>>
> > >>>>>>>> of it, vector_mask resembles option 4 in the proposal with =E2=
=80=98n=E2=80=99 implied by the =E2=80=98base=E2=80=99 vector type
> > >>>>>>>>
> > >>>>>>>> and a =E2=80=98w=E2=80=99 specified for the type.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Given its current implementation, if vector_mask is exposed to=
 the CFE, would there be any
> > >>>>>>>>
> > >>>>>>>> major challenges wrt implementation or defining behaviour sema=
ntics? I played around with a
> > >>>>>>>>
> > >>>>>>>> few examples from the testsuite and wrote some new ones. I mos=
tly tried operations that
> > >>>>>>>>
> > >>>>>>>> the new type would have to support (unary, binary bitwise, ini=
tializations etc) =E2=80=93 with a couple of exceptions
> > >>>>>>>>
> > >>>>>>>> most of the ops seem to be supported. I also triggered a coupl=
e of ICEs in some tests involving
> > >>>>>>>>
> > >>>>>>>> implicit conversions to wider/narrower vector_mask types (will=
 raise reports for these). Correct me
> > >>>>>>>>
> > >>>>>>>> if I=E2=80=99m wrong here, but we=E2=80=99d probably have to s=
upport a couple of new ops if vector_mask is exposed
> > >>>>>>>>
> > >>>>>>>> to the CFE =E2=80=93 initialization and subscript operations?
> > >>>>>>>
> > >>>>>>> Yes, either that or restrict how the mask vectors can be used, =
thus
> > >>>>>>> properly diagnose improper
> > >>>>>>> uses.
> > >>>>>>
> > >>>>>> Indeed.
> > >>>>>>
> > >>>>>>      A question would be for example how to write common mask te=
st
> > >>>>>>> operations like
> > >>>>>>> if (any (mask)) or if (all (mask)).
> > >>>>>>
> > >>>>>> I see 2 options here. New builtins could support new types - the=
y'd
> > >>>>>> provide a target independent way to test any and all conditions.=
 Another
> > >>>>>> would be to let the target use its intrinsics to do them in the =
most
> > >>>>>> efficient way possible (which the builtins would get lowered dow=
n to
> > >>>>>> anyway).
> > >>>>>>
> > >>>>>>
> > >>>>>>      Likewise writing merge operations
> > >>>>>>> - do those as
> > >>>>>>>
> > >>>>>>>      a =3D a | (mask ? b : 0);
> > >>>>>>>
> > >>>>>>> thus use ternary ?: for this?
> > >>>>>>
> > >>>>>> Yes, like now, the ternary could just translate to
> > >>>>>>
> > >>>>>>       {mask[0] ? b[0] : 0, mask[1] ? b[1] : 0, ... }
> > >>>>>>
> > >>>>>> One thing to flesh out is the semantics. Should we allow this op=
eration
> > >>>>>> as long as the number of elements are the same even if the mask =
type if
> > >>>>>> different i.e.
> > >>>>>>
> > >>>>>>       v4hib ? v4si : v4si;
> > >>>>>>
> > >>>>>> I don't see why this can't be allowed as now we let
> > >>>>>>
> > >>>>>>       v4si ? v4sf : v4sf;
> > >>>>>>
> > >>>>>>
> > >>>>>> For initialization regular vector
> > >>>>>>> syntax should work:
> > >>>>>>>
> > >>>>>>> mtype mask =3D (mtype){ -1, -1, 0, 0, ... };
> > >>>>>>>
> > >>>>>>> there's the question of the signedness of the mask elements.  G=
CC
> > >>>>>>> internally uses signed
> > >>>>>>> bools with values -1 for true and 0 for false.
> > >>>>>>
> > >>>>>> One of the things is the value that represents true. This is lar=
gely
> > >>>>>> target-dependent when it comes to the vector_mask type. When vec=
tor_mask
> > >>>>>> types are created from GCC's internal representation of bool vec=
tors
> > >>>>>> (signed ints) the point about implicit/explicit conversions from=
 signed
> > >>>>>> int vect to mask types in the proposal covers this. So mask in
> > >>>>>>
> > >>>>>>       v4sib mask =3D (v4sib){-1, -1, 0, 0, ... }
> > >>>>>>
> > >>>>>> will probably end up being represented as 0x3xxxx on AVX512 and =
0x11xxx
> > >>>>>> on SVE. On AVX2/SSE they'd still be represented as vector of sig=
ned ints
> > >>>>>> {-1, -1, 0, 0, ... }. I'm not entirely confident what ramificati=
ons this
> > >>>>>> new mask type representations will have in the mid-end while bei=
ng
> > >>>>>> converted back and forth to and from GCC's internal representati=
on, but
> > >>>>>> I'm guessing this is already being handled at some level by the
> > >>>>>> vector_mask type's current support?
> > >>>>>
> > >>>>> Yes, I would guess so.  Of course what the middle-end is currentl=
y exposed
> > >>>>> to is simply what the vectorizer generates - once fuzzers discove=
r this feature
> > >>>>> we'll see "interesting" uses that might run into missed or wrong =
handling of
> > >>>>> them.
> > >>>>>
> > >>>>> So whatever we do on the side of exposing this to users a good po=
rtion
> > >>>>> of testsuite coverage for the allowed use cases is important.
> > >>>>>
> > >>>>> Richard.
> > >>>>>
> > >>>>
> > >>>> Apologies for the long-ish reply, but here's a TLDR and gory detai=
ls follow.
> > >>>>
> > >>>> TLDR:
> > >>>> GIMPLE's vector_mask type semantics seems to be target-dependent, =
so
> > >>>> elevating vector_mask to CFE with same semantics is undesirable. O=
TOH,
> > >>>> changing vector_mask to have target-independent CFE semantics will=
 cause
> > >>>> dichotomy between its CFE and GFE behaviours. But vector_mask appr=
oach
> > >>>> scales well for sizeless types. Is the solution to have something =
like
> > >>>> vector_mask with defined target-independent type semantics, but ca=
ll it
> > >>>> something else to prevent conflation with GIMPLE, a viable option?
> > >>>>
> > >>>> Details:
> > >>>> After some more analysis of the proposed options, here are some
> > >>>> interesting findings:
> > >>>>
> > >>>> vector_mask looked like a very interesting option until I ran into=
 some
> > >>>> semantic uncertainly. This code:
> > >>>>
> > >>>> typedef int v8si __attribute__((vector_size(8*sizeof(int))));
> > >>>> typedef v8si v8sib __attribute__((vector_mask));
> > >>>>
> > >>>> typedef short v8hi __attribute__((vector_size(8*sizeof(short))));
> > >>>> typedef v8hi v8hib __attribute__((vector_mask));
> > >>>>
> > >>>> v8si res;
> > >>>> v8hi resh;
> > >>>>
> > >>>> v8hib __GIMPLE () foo (v8hib x, v8sib y)
> > >>>> {
> > >>>>      v8hib res;
> > >>>>
> > >>>>      res =3D x & y;
> > >>>>      return res;
> > >>>> }
> > >>>>
> > >>>> When compiled on AArch64, produces a type-mismatch error for binar=
y
> > >>>> expression involving '&' because the 'derived' types 'v8hib' and '=
v8sib'
> > >>>>     have a different target-layout. If the layout of these two 'de=
rived'
> > >>>> types match, then the above code has no issue. Which is the case o=
n
> > >>>> amdgcn-amdhsa target where it compiles without any error(amdgcn us=
es a
> > >>>> scalar DImode mask mode). IoW such code seems to be allowed on som=
e
> > >>>> targets and not on others.
> > >>>>
> > >>>> With the same code, I tried putting casts and it worked fine on AA=
rch64
> > >>>> and amdgcn. This target-specific behaviour of vector_mask derived =
types
> > >>>> will be difficult to specify once we move it to the CFE - in fact =
we
> > >>>> probably don't want target-specific behaviour once it moves to the=
 CFE.
> > >>>>
> > >>>> If we expose vector_mask to CFE, we'd have to specify consistent
> > >>>> semantics for vector_mask types. We'd have to resolve ambiguities =
like
> > >>>> 'v4hib & v4sib' clearly to be able to specify the semantics of the=
 type
> > >>>> system involving vector_mask. If we do this, don't we run the risk=
 of a
> > >>>> dichotomy between the CFE and GFE semantics of vector_mask? I'm as=
suming
> > >>>> we'd want to retain vector_mask semantics as they are in GIMPLE.
> > >>>>
> > >>>> If we want to enforce constant semantics for vector_mask in the CF=
E, one
> > >>>> way is to treat vector_mask types as distinct if they're 'attached=
' to
> > >>>> distinct data vector types. In such a scenario, vector_mask types
> > >>>> attached to two data vector types with the same lane-width and num=
ber of
> > >>>> lanes would be classified as distinct. For eg:
> > >>>>
> > >>>> typedef int v8si __attribute__((vector_size(8*sizeof(int))));
> > >>>> typedef v8si v8sib __attribute__((vector_mask));
> > >>>>
> > >>>> typedef float v8sf __attribute__((vector_size(8*sizeof(float))));
> > >>>> typedef v8sf v8sfb __attribute__((vector_mask));
> > >>>>
> > >>>> v8si  foo (v8sf x, v8sf y, v8si i, v8si j)
> > >>>> {
> > >>>>      (a =3D=3D b) & (v8sfb)(x =3D=3D y) ? x : (v8si){0};
> > >>>> }
> > >>>>
> > >>>> This could be the case for unsigned vs signed int vectors too for =
eg -
> > >>>> seems a bit unnecessary tbh.
> > >>>>
> > >>>> Though vector_mask's being 'attached' to a type has its drawbacks,=
 it
> > >>>> does seem to have an advantage when sizeless types are considered.=
 If we
> > >>>> have to define a sizeless vector boolean type that is implied by t=
he
> > >>>> lane size, we could do something like
> > >>>>
> > >>>> typedef svint32_t svbool32_t __attribute__((vector_mask));
> > >>>>
> > >>>> int32_t foo (svint32_t a, svint32_t b)
> > >>>> {
> > >>>>      svbool32_t pred =3D a > b;
> > >>>>
> > >>>>      return pred[2] ? a[2] : b[2];
> > >>>> }
> > >>>>
> > >>>> This is harder to do in the other schemes proposed so far as they'=
re
> > >>>> size-based.
> > >>>>
> > >>>> To be able to free the boolean from the base type (not size) and r=
etain
> > >>>> vector_mask's flexibility to declare sizeless types, we could have=
 an
> > >>>> attribute that is more flexibly-typed and only 'derives' the lane-=
size
> > >>>> and number of lanes from its 'base' type without actually inheriti=
ng the
> > >>>> actual base type(char, short, int etc) or its signedness. This cre=
ates a
> > >>>> purer and stand-alone boolean type without the associated semantic=
s'
> > >>>> complexity of having to cast between two same-size types with the =
same
> > >>>> number of lanes. Eg.
> > >>>>
> > >>>> typedef int v8si __attribute__((vector_size(8*sizeof(int))));
> > >>>> typedef v8si v8b __attribute__((vector_bool));
> > >>>>
> > >>>> However, with differing lane-sizes, there will have to be a cast a=
s the
> > >>>> 'derived' element size is different which could impact the layout =
of the
> > >>>> vector mask. Eg.
> > >>>>
> > >>>> v8si  foo (v8hi x, v8hi y, v8si i, v8si j)
> > >>>> {
> > >>>>      (v8sib)(x =3D=3D y) & (i =3D=3D j) ? i : (v8si){0};
> > >>>> }
> > >>>>
> > >>>> Such conversions on targets like AVX512/AMDGCN will be a NOP, but
> > >>>> non-trivial on SVE (depending on the implemented layout of the boo=
l vector).
> > >>>>
> > >>>> vector_bool decouples us from having to retain the behaviour of
> > >>>> vector_mask and provides the flexibility of not having to cast acr=
oss
> > >>>> same-element-size vector types. Wrt to sizeless types, it could sc=
ale well.
> > >>>>
> > >>>> typedef svint32_t svbool32_t __attribute__((vector_bool));
> > >>>> typedef svint16_t svbool16_t __attribute__((vector_bool));
> > >>>>
> > >>>> int32_t foo (svint32_t a, svint32_t b)
> > >>>> {
> > >>>>      svbool32_t pred =3D a > b;
> > >>>>
> > >>>>      return pred[2] ? a[2] : b[2];
> > >>>> }
> > >>>>
> > >>>> int16_t bar (svint16_t a, svint16_t b)
> > >>>> {
> > >>>>      svbool16_t pred =3D a > b;
> > >>>>
> > >>>>      return pred[2] ? a[2] : b[2];
> > >>>> }
> > >>>>
> > >>>> On SVE, pred[2] refers to bit 4 for svint16_t and bit 8 for svint3=
2_t on
> > >>>> the target predicate.
> > >>>>
> > >>>> Thoughts?
> > >>>
> > >>> The GIMPLE frontend accepts just what is valid on the target here. =
 Any
> > >>> "plumbing" such as implicit conversions (if we do not want to requi=
re
> > >>> explicit ones even when NOP) need to be done/enforced by the C fron=
tend.
> > >>>
> > >>
> > >> Sorry, I'm not sure I follow - correct me if I'm wrong here.
> > >>
> > >> If we desire to define/allow operations like implicit/explicit
> > >> conversion on vector_mask types in CFE, don't we have to start from =
a
> > >> position of defining what makes vector_mask types distinct and there=
fore
> > >> require implicit/explicit conversions?
> > >
> > > We need to look at which operations we want to produce vector masks a=
nd
> > > which operations consume them and what operations operate on them.
> > >
> > > In GIMPLE comparisons produce them, conditionals consume them and
> > > we allow bitwise ops to operate on them directly (GIMPLE doesn't have
> > > logical && it just has bitwise &).
> > >
> >
> > Thanks for your thoughts - after I spent more cycles researching and
> > experimenting, I think I understand the driving factors here. Compariso=
n
> > producers generate signed integer vectors of the same lane-width as the
> > comparison operands. This means mixed type vectors can't be applied to
> > conditional consumers or bitwise operators eg:
> >
> >    v8hi foo (v8si a, v8si b, v8hi c, v8hi d)
> >    {
> >      return a > b || c > d; // error!
> >      return a > b || __builtin_convertvector (c > d, v8si); // OK.
> >      return a | b && c | d; // error!
> >      return a | b && __builtin_convertvector (c | d, v8si); // OK.
> >    }
> >
> > Similarly, if we extend these 'stricter-typing' rules to vector_mask, i=
t
> > could look like:
> >
> > typedef v4sib v4si __attribute__((vector_mask));
> > typedef v4hib v4hi __attribute__((vector_mask));
> >
> >    v8sib foo (v8si a, v8si b, v8hi c, v8hi d)
> >    {
> >      v8sib psi =3D a > b;
> >      v8hib phi =3D c > d;
> >
> >      return psi || phi; // error!
> >      return psi || __builtin_convertvector (phi, v8sib); // OK.
> >      return psi | phi; // error!
> >      return psi | __builtin_convertvector (phi, v8sib); // OK.
> >    }
> >
> > At GIMPLE stage, on targets where the layout allows it (eg AMDGCN),
> > expressions like
> >    psi | __builtin_convertvector (phi, v8sib)
> > can be optimized to
> >    psi | phi
> > because __builtin_convertvector (phi, v8sib) is a NOP.
> >
> > I think this could make vector_mask more portable across targets. If on=
e
> > wants to take CFE vector_mask code and run it on the GFE, it should
> > work; while the reverse won't as CFE vector_mask rules are more restric=
tive.
> >
> > Does this look like a sensible approach for progress?
>
> Yes, that looks good.
>
> > >> IIUC, GFE's distinctness of vector_mask types depends on how the mas=
k
> > >> mode is implemented on the target. If implemented in CFE, vector_mas=
k
> > >> types' distinctness probably shouldn't be based on target layout and
> > >> could be based on the type they're 'attached' to.
> > >
> > > But since we eventually run on the target the layout should ideally
> > > match that of the target.  Now, the question is whether that's ever
> > > OK behavior - it effectively makes the mask somewhat opaque and
> > > only "observable" by probing it in defined manners.
> > >
> > >> Wouldn't that diverge from target-specific GFE behaviour - or are yo=
u
> > >> suggesting its OK for vector_mask type semantics to be different in =
CFE
> > >> and GFE?
> > >
> > > It's definitely undesirable but as said I'm not sure it has to differ
> > > [the layout].
> > >
> >
> > I agree it is best to have a consistent layout of vector_mask across CF=
E
> > and GFE and also implement it to match the target layout for optimal
> > code quality.
> >
> > For observability, I think it makes sense to allow operations that are
> > relevant and have a consistent meaning irrespective of that target. Eg.
> > 'vector_mask & 2' might not mean the same thing on all targets, but
> > vector_mask[2] does. Therefore, I think the opaqueness is useful and
> > necessary to some extent.
>
> Yes.  The main question regarding to observability will be
> things like sizeof or alignof and putting masks into addressable storage.
>
> I think IBM folks have introduced some "opaque" types for their
> matrix-multiplication accelerator where intrinsics need something to
> work with but the observability of many aspect is restricted.  In
> the middle-end we have OPAQUE_TYPE and MODE_OPAQUE
> (but IIRC there can only be a single kind of that at the moment).
> Interestingly an OPAQUE_TYPE does have a size.
>
> Note one way out would be to make vector_mask types "decay"
> to a value vector type.  Thus any time you try to observe it
> you get a vector bool (a vector of actual 8 bit bool data elements)
> and when you use a vector data type in mask context you
> get a "mask conversion" aka vector bool !=3D 0.  It would then be
> up to the compiler to elide round-trips between mask and data.
>
> That would mean sizeof (vector_mask) would be sizeof (vector bool)
> even when for example the hardware would produce V4SImode
> mask from a V4SFmode compare or when it would produce a
> QImode 4-bit integer from the same?

Btw, how the experimental SIMD C++ standard library handles
these issue might be also interesting to research (author CCed)

Richard.

> Richard.
>
> > Thanks,
> > Tejas.
> >
> > >>> There's one issue I can see that wasn't mentioned yet - GCC current=
ly
> > >>> accepts
> > >>>
> > >>> typedef long gv1024di __attribute__((vector_size(1024*8)));
> > >>>
> > >>> even if there's no underlying support on the target which either ha=
s support
> > >>> only for smaller vectors or no vectors at all.  Currently vector_ma=
sk will
> > >>> simply fail to produce sth desirable here.  What's your idea of mak=
ing
> > >>> that not target dependent?  GCC will later lower operations with su=
ch
> > >>> vectors, possibly splitting them up into sizes supported by the har=
dware
> > >>> natively, possibly performing elementwise operations.  For the form=
er
> > >>> one would need to guess the "decomposition type" and based on that
> > >>> select the mask type [layout]?
> > >>>
> > >>> One idea would be to specify the mask layout follows the largest ve=
ctor
> > >>> kind supported by the target and if there is none follow the layout
> > >>> of (signed?) _Bool [n]?  When there's no target support for vectors
> > >>> GCC will generally use elementwise operations apart from some
> > >>> special-cases.
> > >>>
> > >>
> > >> That is a very good point - thanks for raising it. For when GCC choo=
ses
> > >> to lower to a vector type supported by the target, my initial though=
t
> > >> would be to, as you say, choose a mask that has enough bits to repre=
sent
> > >> the largest vector size with the smallest lane-width. The actual lay=
out
> > >> of the mask will depend on how the target implements its mask mode.
> > >> Decomposition of vector_mask ought to follow the decomposition of th=
e
> > >> GNU vectors type and each decomposed vector_mask type ought to have
> > >> enough bits to represent the decomposed GNU vector shape. It sounds =
nice
> > >> on paper, but I haven't really worked through a design for this. Do =
you
> > >> see any gotchas here?
> > >
> > > Not really.  In the end it comes down to what the C writer is allowed=
 to
> > > do with a vector mask.  I would for example expect that I could do
> > >
> > >   auto m =3D v1 < v2;
> > >   _mm512_mask_sub_epi32 (a, m, b, c);
> > >
> > > so generic masks should inter-operate with intrinsics (when the appro=
priate
> > > ISA is enabled).  That works for the data vectors themselves for exam=
ple
> > > (quite some intrinsics are implemented with GCCs generic vector code)=
.
> > >
> > > I for example can't do
> > >
> > >    _Bool lane2 =3D m[2];
> > >
> > > to inspect lane two of a maks with AVX512.  I can do m & 2 but I woul=
dn't expect
> > > that to work (should I?) with a vector_mask mask (it's at least not
> > > valid directly
> > > in GIMPLE).  There's _mm512_int2mask and _mm512_mask2int which transf=
er
> > > between mask and int (but the mask types are really just typedefd to
> > > integer typeS).
> > >
> > >>> While using a different name than vector_mask is certainly possible
> > >>> it wouldn't me to decide that, but I'm also not yet convinced it's
> > >>> really necessary.  As said, what the GIMPLE frontend accepts
> > >>> or not shouldn't limit us here - just the actual chosen layout of t=
he
> > >>> boolean vectors.
> > >>>
> > >>
> > >> I'm just concerned about creating an alternate vector_mask functiona=
lity
> > >> in the CFE and risk not being consistent with GFE.
> > >
> > > I think it's more important to double-check usablilty from the users =
side.
> > > If the implementation necessarily diverges from GIMPLE then we can
> > > choose a different attribute name but then it will also inevitably ha=
ve
> > > code-generation (quality) issues as GIMPLE matches what the hardware
> > > can do.
> > >
> > > Richard.
> > >
> > >> Thanks,
> > >> Tejas.
> > >>
> > >>> Richard.
> > >>>
> > >>>> Thanks,
> > >>>> Tejas.
> > >>>>
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Tejas.
> > >>>>>>
> > >>>>>>>
> > >>>>>>> Richard.
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>>
> > >>>>>>>> Tejas.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Richard.
> > >>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>>
> > >>>>>>>>> Tejas.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>> 1. __attribute__((vector_size (n))) where n represents bytes
> > >>>>>>>>>>
> > >>>>>>>>>>       typedef bool vbool __attribute__ ((vector_size (1)));
> > >>>>>>>>>>
> > >>>>>>>>>> In this approach, the shape of the boolean vector is unclear=
. IoW, it is not
> > >>>>>>>>>> clear if each bit in 'n' controls a byte or an element. On t=
argets
> > >>>>>>>>>> like SVE, it would be natural to have each bit control a byt=
e of the target
> > >>>>>>>>>> vector (therefore resulting in an 'unpacked' layout of the P=
BV) and on AVX, each
> > >>>>>>>>>> bit would control one element/lane on the target vector(ther=
efore resulting in a
> > >>>>>>>>>> 'packed' layout with all significant bits at the LSB).
> > >>>>>>>>>>
> > >>>>>>>>>> 2. __attribute__((vector_size (n))) where n represents num o=
f lanes
> > >>>>>>>>>>
> > >>>>>>>>>>       typedef int v4si __attribute__ ((vector_size (4 * size=
of (int)));
> > >>>>>>>>>>       typedef bool v4bi __attribute__ ((vector_size (sizeof =
v4si / sizeof (v4si){0}[0])));
> > >>>>>>>>>>
> > >>>>>>>>>> Here the 'n' in the vector_size attribute represents the num=
ber of bits that
> > >>>>>>>>>> is needed to represent a vector quantity.  In this case, thi=
s packed boolean
> > >>>>>>>>>> vector can represent upto 'n' vector lanes. The size of the =
type is
> > >>>>>>>>>> rounded up the nearest byte.  For example, the sizeof v4bi i=
n the above
> > >>>>>>>>>> example is 1.
> > >>>>>>>>>>
> > >>>>>>>>>> In this approach, because of the nature of the representatio=
n, the n bits required
> > >>>>>>>>>> to represent the n lanes of the vector are packed at the LSB=
. This does not naturally
> > >>>>>>>>>> align with the SVE approach of each bit representing a byte =
of the target vector
> > >>>>>>>>>> and PBV therefore having an 'unpacked' layout.
> > >>>>>>>>>>
> > >>>>>>>>>> More importantly, another drawback here is that the change i=
n units for vector_size
> > >>>>>>>>>> might be confusing to programmers.  The units will have to b=
e interpreted based on the
> > >>>>>>>>>> base type of the typedef. It does not offer any flexibility =
in terms of the layout of
> > >>>>>>>>>> the bool vector - it is fixed.
> > >>>>>>>>>>
> > >>>>>>>>>> 3. Combination of 1 and 2.
> > >>>>>>>>>>
> > >>>>>>>>>> Combining the best of 1 and 2, we can introduce extra parame=
ters to vector_size that will
> > >>>>>>>>>> unambiguously represent the layout of the PBV. Consider
> > >>>>>>>>>>
> > >>>>>>>>>>       typedef bool vbool __attribute__((vector_size (s, n[, =
w])));
> > >>>>>>>>>>
> > >>>>>>>>>> where 's' is size in bytes, 'n' is the number of lanes and a=
n optional 3rd parameter 'w'
> > >>>>>>>>>> is the number of bits of the PBV that represents a lane of t=
he target vector. 'w' would
> > >>>>>>>>>> allow a target to force a certain layout of the PBV.
> > >>>>>>>>>>
> > >>>>>>>>>> The 2-parameter form of vector_size allows the target to hav=
e an
> > >>>>>>>>>> implementation-defined layout of the PBV. The target is free=
 to choose the 'w'
> > >>>>>>>>>> if it is not specified to mirror the target layout of predic=
ate registers. For
> > >>>>>>>>>> eg. AVX would choose 'w' as 1 and SVE would choose s*8/n.
> > >>>>>>>>>>
> > >>>>>>>>>> As an example, to represent the result of a comparison on 2 =
int16x8_t, we'd need
> > >>>>>>>>>> 8 lanes of boolean which could be represented by
> > >>>>>>>>>>
> > >>>>>>>>>>       typedef bool v8b __attribute__ ((vector_size (2, 8)));
> > >>>>>>>>>>
> > >>>>>>>>>> SVE would implement v8b layout to make every 2nd bit signifi=
cant i.e. w =3D=3D 2
> > >>>>>>>>>>
> > >>>>>>>>>> and AVX would choose a layout where all 8 consecutive bits p=
acked at LSB would
> > >>>>>>>>>> be significant i.e. w =3D=3D 1.
> > >>>>>>>>>>
> > >>>>>>>>>> This scheme would accomodate more than 1 target to effective=
ly represent vector
> > >>>>>>>>>> bools that mirror the target properties.
> > >>>>>>>>>>
> > >>>>>>>>>> 4. A new attribite
> > >>>>>>>>>>
> > >>>>>>>>>> This is based on a suggestion from Richard S in [3]. The ide=
a is to introduce a new
> > >>>>>>>>>> attribute to define the PBV and make it general enough to
> > >>>>>>>>>>
> > >>>>>>>>>> * represent all targets flexibly (SVE, AVX etc)
> > >>>>>>>>>> * represent sub-byte length predicates
> > >>>>>>>>>> * have no change in units of vector_size/no new vector_size =
signature
> > >>>>>>>>>> * not have the number of bytes constrain representation
> > >>>>>>>>>>
> > >>>>>>>>>> If we call the new attribute 'bool_vec' (for lack of a bette=
r name), consider
> > >>>>>>>>>>
> > >>>>>>>>>>       typedef bool vbool __attribute__((bool_vec (n[, w])))
> > >>>>>>>>>>
> > >>>>>>>>>> where 'n' represents number of lanes/elements and the option=
al 'w' is bits-per-lane.
> > >>>>>>>>>>
> > >>>>>>>>>> If 'w' is not specified, it and bytes-per-predicate are impl=
ementation-defined based on target.
> > >>>>>>>>>> If 'w' is specified,  sizeof (vbool) will be ceil (n*w/8).
> > >>>>>>>>>>
> > >>>>>>>>>> 5. Behaviour of the packed vector boolean type.
> > >>>>>>>>>>
> > >>>>>>>>>> Taking the example of one of the options above, following is=
 an illustration of it's behavior
> > >>>>>>>>>>
> > >>>>>>>>>> * ABI
> > >>>>>>>>>>
> > >>>>>>>>>>       New ABI rules will need to be defined for this type - =
eg alignment, PCS,
> > >>>>>>>>>>       mangling etc
> > >>>>>>>>>>
> > >>>>>>>>>> * Initialization:
> > >>>>>>>>>>
> > >>>>>>>>>>       Packed Boolean Vectors(PBV) can be initialized like so=
:
> > >>>>>>>>>>
> > >>>>>>>>>>         typedef bool v4bi __attribute__ ((vector_size (2, 4,=
 4)));
> > >>>>>>>>>>         v4bi p =3D {false, true, false, false};
> > >>>>>>>>>>
> > >>>>>>>>>>       Each value in the initizlizer constant is of type bool=
. The lowest numbered
> > >>>>>>>>>>       element in the const array corresponds to the LSbit of=
 p, element 1 is
> > >>>>>>>>>>       assigned to bit 4 etc.
> > >>>>>>>>>>
> > >>>>>>>>>>       p is effectively a 2-byte bitmask with value 0x0010
> > >>>>>>>>>>
> > >>>>>>>>>>       With a different layout
> > >>>>>>>>>>
> > >>>>>>>>>>         typedef bool v4bi __attribute__ ((vector_size (2, 4,=
 1)));
> > >>>>>>>>>>         v4bi p =3D {false, true, false, false};
> > >>>>>>>>>>
> > >>>>>>>>>>       p is effectively a 2-byte bitmask with value 0x0002
> > >>>>>>>>>>
> > >>>>>>>>>> * Operations:
> > >>>>>>>>>>
> > >>>>>>>>>>       Packed Boolean Vectors support the following operation=
s:
> > >>>>>>>>>>       . unary ~
> > >>>>>>>>>>       . unary !
> > >>>>>>>>>>       . binary&,|and=CB=86
> > >>>>>>>>>>       . assignments &=3D, |=3D and =CB=86=3D
> > >>>>>>>>>>       . comparisons <, <=3D, =3D=3D, !=3D, >=3D and >
> > >>>>>>>>>>       . Ternary operator ?:
> > >>>>>>>>>>
> > >>>>>>>>>>       Operations are defined as applied to the individual el=
ements i.e the bits
> > >>>>>>>>>>       that are significant in the PBV. Whether the PBVs are =
treated as bitmasks
> > >>>>>>>>>>       or otherwise is implementation-defined.
> > >>>>>>>>>>
> > >>>>>>>>>>       Insignificant bits could affect results of comparisons=
 or ternary operators.
> > >>>>>>>>>>       In such cases, it is implementation defined how the un=
used bits are treated.
> > >>>>>>>>>>
> > >>>>>>>>>>       . Subscript operator []
> > >>>>>>>>>>
> > >>>>>>>>>>       For the subscript operator, the packed boolean vector =
acts like a array of
> > >>>>>>>>>>       elements - the first or the 0th indexed element being =
the LSbit of the PBV.
> > >>>>>>>>>>       Subscript operator yields a scalar boolean value.
> > >>>>>>>>>>       For example:
> > >>>>>>>>>>
> > >>>>>>>>>>         typedef bool v8b __attribute__ ((vector_size (2, 8, =
2)));
> > >>>>>>>>>>
> > >>>>>>>>>>         // Subscript operator result yields a boolean value.
> > >>>>>>>>>>         // x[3] is the 7th LSbit and x[1] is the 3rd LSbit o=
f x.
> > >>>>>>>>>>         bool foo (v8b p, int n) { p[3] =3D true; return p[1]=
; }
> > >>>>>>>>>>
> > >>>>>>>>>>       Out of bounds access: OOB access can be determined at =
compile time given the
> > >>>>>>>>>>       strong typing of the PBVs.
> > >>>>>>>>>>
> > >>>>>>>>>>       PBV does not support address of operator(&) for elemen=
ts of PBVs.
> > >>>>>>>>>>
> > >>>>>>>>>>       . Implicit conversion from integer vectors to PBVs
> > >>>>>>>>>>
> > >>>>>>>>>>       We would like to support the output of comparison oper=
ations to be PBVs. This
> > >>>>>>>>>>       requires us to define the implicit conversion from an =
integer vector to PBV
> > >>>>>>>>>>       as the result of vector comparisons are integer vector=
s.
> > >>>>>>>>>>
> > >>>>>>>>>>       To define this operation:
> > >>>>>>>>>>
> > >>>>>>>>>>         bool_vector =3D vector <cmpop> vector
> > >>>>>>>>>>
> > >>>>>>>>>>       There is no change in how vector <cmpop> vector behavi=
or i.e. this comparison
> > >>>>>>>>>>       would still produce an int_vector type as it does now.
> > >>>>>>>>>>
> > >>>>>>>>>>         temp_int_vec =3D vector <cmpop> vector
> > >>>>>>>>>>         bool_vec =3D temp_int_vec // Implicit conversion fro=
m int_vec to bool_vec
> > >>>>>>>>>>
> > >>>>>>>>>>       The implicit conversion from int_vec to bool I'd defin=
e simply to be:
> > >>>>>>>>>>
> > >>>>>>>>>>         bool_vec[n] =3D (_Bool) int_vec[n]
> > >>>>>>>>>>
> > >>>>>>>>>>       where the C11 standard rules apply
> > >>>>>>>>>>       6.3.1.2 Boolean type  When any scalar value is convert=
ed to _Bool, the result
> > >>>>>>>>>>       is 0 if the value compares equal to 0; otherwise, the =
result is 1.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> [1] https://lists.llvm.org/pipermail/cfe-dev/2020-May/065434=
.html
> > >>>>>>>>>> [2] https://reviews.llvm.org/D88905
> > >>>>>>>>>> [3] https://reviews.llvm.org/D81083
> > >>>>>>>>>>
> > >>>>>>>>>> Thoughts?
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks,
> > >>>>>>>>>> Tejas.
> > >>>>>>
> > >>>>
> > >>
> >