From: Tejas Belagod
To: Richard Biener
Cc: "gcc-patches@gcc.gnu.org"
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
Date: Thu, 13 Jul 2023 15:44:43 +0530
Message-ID: <87a51e61-271a-44d7-ed94-de45d32b2e18@arm.com>

On 7/3/23 1:31 PM, Richard Biener wrote:
> On Mon, Jul 3, 2023 at 8:50 AM Tejas Belagod wrote:
>>
>> On 6/29/23 6:55 PM, Richard Biener wrote:
>>> On Wed, Jun 28, 2023 at 1:26 PM Tejas Belagod wrote:
>>>>
>>>> From: Richard Biener
>>>> Date: Tuesday, June 27, 2023 at 12:58 PM
>>>> To: Tejas Belagod
>>>> Cc: gcc-patches@gcc.gnu.org
>>>> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
>>>>
>>>> On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod wrote:
>>>>>
>>>>> From: Richard Biener
>>>>> Date: Monday, June 26, 2023 at 2:23 PM
>>>>> To: Tejas Belagod
>>>>> Cc: gcc-patches@gcc.gnu.org
>>>>> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
>>>>>
>>>>> On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Packed Boolean Vectors
>>>>>> ----------------------
>>>>>>
>>>>>> I'd like to propose a feature addition to GNU Vector extensions to add packed
>>>>>> boolean vectors (PBV). This has been discussed in the past here[1] and a variant has
>>>>>> been implemented in Clang recently[2].
>>>>>>
>>>>>> With predication features being added to vector architectures (SVE, MVE, AVX),
>>>>>> it is a useful feature to have to model predication on targets.
>>>>>> This could find its use in intrinsics or just used as is as a GNU vector extension being
>>>>>> mapped to underlying target features. For example, the packed boolean vector
>>>>>> could directly map to a predicate register on SVE.
>>>>>>
>>>>>> Also, this new packed boolean type GNU extension can be used with SVE ACLE
>>>>>> intrinsics to replace a fixed-length svbool_t.
>>>>>>
>>>>>> Here are a few options to represent the packed boolean vector type.
>>>>>
>>>>> The GIMPLE frontend uses a new 'vector_mask' attribute:
>>>>>
>>>>> typedef int v8si __attribute__((vector_size(8*sizeof(int))));
>>>>> typedef v8si v8sib __attribute__((vector_mask));
>>>>>
>>>>> it gets you a vector type that's the appropriate (dependent on the
>>>>> target) vector mask type for the vector data type (v8si in this case).
>>>>>
>>>>> Thanks Richard.
>>>>>
>>>>> Having had a quick look at the implementation, it does seem to tick the boxes.
>>>>> I must admit I haven't dug deep, but if the target hook allows the mask to be
>>>>> defined in a way that is target-friendly (and I don't know how much effort it will
>>>>> be to migrate the attribute to more front-ends), it should do the job nicely.
>>>>> Let me go back and dig a bit deeper and get back with questions if any.
>>>>
>>>> Let me add that the advantage of this is the compiler doesn't need
>>>> to support weird explicitly laid out packed boolean vectors that do
>>>> not match what the target supports and the user doesn't need to know
>>>> what the target supports (and thus have an #ifdef maze around explicitly
>>>> specified layouts).
>>>>
>>>> Sorry for the delayed response – I spent a day experimenting with vector_mask.
>>>>
>>>> Yeah, this is what option 4 in the RFC is trying to achieve – be portable enough
>>>> to avoid having to sprinkle the code with ifdefs.
>>>>
>>>> It does remove some flexibility though, for example with -mavx512f -mavx512vl
>>>> you'll get AVX512 style masks for V4SImode data vectors but of course the
>>>> target still supports SSE2/AVX2 style masks as well, but those would not be
>>>> available as "packed boolean vectors", though they are of course in fact
>>>> equal to V4SImode data vectors with -1 or 0 values, so in this particular
>>>> case it might not matter.
>>>>
>>>> That said, the vector_mask attribute will get you V4SImode vectors with
>>>> signed boolean elements of 32 bits for V4SImode data vectors with
>>>> SSE2/AVX2.
>>>>
>>>> This sounds very much like what the scenario would be with NEON vs SVE. Coming to think
>>>> of it, vector_mask resembles option 4 in the proposal with ‘n’ implied by the ‘base’ vector type
>>>> and a ‘w’ specified for the type.
>>>>
>>>> Given its current implementation, if vector_mask is exposed to the CFE, would there be any
>>>> major challenges wrt implementation or defining behaviour semantics? I played around with a
>>>> few examples from the testsuite and wrote some new ones. I mostly tried operations that
>>>> the new type would have to support (unary, binary bitwise, initializations etc) – with a couple of exceptions
>>>> most of the ops seem to be supported. I also triggered a couple of ICEs in some tests involving
>>>> implicit conversions to wider/narrower vector_mask types (will raise reports for these). Correct me
>>>> if I’m wrong here, but we’d probably have to support a couple of new ops if vector_mask is exposed
>>>> to the CFE – initialization and subscript operations?
>>>
>>> Yes, either that or restrict how the mask vectors can be used, thus
>>> properly diagnose improper uses.
>>
>> Indeed.
>>
>>> A question would be for example how to write common mask test
>>> operations like if (any (mask)) or if (all (mask)).
>>
>> I see 2 options here.
>> New builtins could support new types - they'd
>> provide a target independent way to test any and all conditions. Another
>> would be to let the target use its intrinsics to do them in the most
>> efficient way possible (which the builtins would get lowered down to
>> anyway).
>>
>>> Likewise writing merge operations - do those as
>>>
>>> a = a | (mask ? b : 0);
>>>
>>> thus use ternary ?: for this?
>>
>> Yes, like now, the ternary could just translate to
>>
>> {mask[0] ? b[0] : 0, mask[1] ? b[1] : 0, ... }
>>
>> One thing to flesh out is the semantics. Should we allow this operation
>> as long as the number of elements is the same even if the mask type is
>> different i.e.
>>
>> v4hib ? v4si : v4si;
>>
>> I don't see why this can't be allowed as now we let
>>
>> v4si ? v4sf : v4sf;
>>
>>> For initialization regular vector syntax should work:
>>>
>>> mtype mask = (mtype){ -1, -1, 0, 0, ... };
>>>
>>> there's the question of the signedness of the mask elements. GCC
>>> internally uses signed bools with values -1 for true and 0 for false.
>>
>> One of the things is the value that represents true. This is largely
>> target-dependent when it comes to the vector_mask type. When vector_mask
>> types are created from GCC's internal representation of bool vectors
>> (signed ints) the point about implicit/explicit conversions from signed
>> int vect to mask types in the proposal covers this. So mask in
>>
>> v4sib mask = (v4sib){-1, -1, 0, 0, ... }
>>
>> will probably end up being represented as 0x3xxxx on AVX512 and 0x11xxx
>> on SVE. On AVX2/SSE they'd still be represented as vector of signed ints
>> {-1, -1, 0, 0, ... }. I'm not entirely confident what ramifications this
>> new mask type representations will have in the mid-end while being
>> converted back and forth to and from GCC's internal representation, but
>> I'm guessing this is already being handled at some level by the
>> vector_mask type's current support?
>
> Yes, I would guess so.
> Of course what the middle-end is currently exposed
> to is simply what the vectorizer generates - once fuzzers discover this feature
> we'll see "interesting" uses that might run into missed or wrong handling of
> them.
>
> So whatever we do on the side of exposing this to users a good portion
> of testsuite coverage for the allowed use cases is important.
>
> Richard.
>

Apologies for the long-ish reply, but here's a TLDR and gory details follow.

TLDR:
GIMPLE's vector_mask type semantics seem to be target-dependent, so
elevating vector_mask to the CFE with the same semantics is undesirable.
OTOH, changing vector_mask to have target-independent CFE semantics will
cause a dichotomy between its CFE and GFE behaviours. But the vector_mask
approach scales well for sizeless types. Would something like vector_mask,
but with defined target-independent type semantics and a different name to
prevent conflation with GIMPLE, be a viable option?

Details:
After some more analysis of the proposed options, here are some
interesting findings:

vector_mask looked like a very interesting option until I ran into some
semantic uncertainty. This code:

typedef int v8si __attribute__((vector_size(8*sizeof(int))));
typedef v8si v8sib __attribute__((vector_mask));

typedef short v8hi __attribute__((vector_size(8*sizeof(short))));
typedef v8hi v8hib __attribute__((vector_mask));

v8si res;
v8hi resh;

v8hib __GIMPLE () foo (v8hib x, v8sib y)
{
  v8hib res;

  res = x & y;
  return res;
}

when compiled on AArch64, produces a type-mismatch error for the binary
expression involving '&' because the 'derived' types 'v8hib' and 'v8sib'
have different target layouts. If the layouts of these two 'derived' types
match, the above code has no issue. That is the case on the amdgcn-amdhsa
target, where it compiles without any error (amdgcn uses a scalar DImode
mask mode). IoW such code seems to be allowed on some targets and not on
others.
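
As a rough analogy only (this uses today's integer-vector masks in the C
front end, not vector_mask itself), the same lane-width question already
shows up with plain GNU vectors: a comparison on 16-bit lanes and one on
32-bit lanes yield different vector types, and combining them needs an
explicit conversion. The sketch below assumes GCC 9 or later (or Clang)
for __builtin_convertvector; the function name both_equal is made up for
illustration.

```c
typedef short v4hi __attribute__((vector_size(4 * sizeof(short))));
typedef int   v4si __attribute__((vector_size(4 * sizeof(int))));

/* Hypothetical helper: lanes where x == y and i == j both hold are -1,
   all other lanes are 0. */
v4si both_equal(v4hi x, v4hi y, v4si i, v4si j)
{
    v4hi mh = (x == y);   /* 16-bit lanes, each -1 or 0 */
    v4si mw = (i == j);   /* 32-bit lanes, each -1 or 0 */

    /* 'mh & mw' is rejected because the vector types differ, so widen
       explicitly; converting short -1 to int keeps the value -1. */
    v4si mh_wide = __builtin_convertvector(mh, v4si);

    return mh_wide & mw;
}
```

Here the conversion is always required and always means the same thing,
whatever the target, which is the kind of predictability the mask types
discussed above do not currently have.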

With the same GIMPLE code, I tried putting casts and it worked fine on
AArch64 and amdgcn. This target-specific behaviour of vector_mask derived
types will be difficult to specify once we move it to the CFE - in fact we
probably don't want target-specific behaviour once it moves to the CFE.

If we expose vector_mask to the CFE, we'd have to specify consistent
semantics for vector_mask types. We'd have to resolve ambiguities like
'v4hib & v4sib' clearly to be able to specify the semantics of the type
system involving vector_mask. If we do this, don't we run the risk of a
dichotomy between the CFE and GFE semantics of vector_mask? I'm assuming
we'd want to retain vector_mask semantics as they are in GIMPLE.

If we want to enforce consistent semantics for vector_mask in the CFE, one
way is to treat vector_mask types as distinct if they're 'attached' to
distinct data vector types. In such a scenario, even vector_mask types
attached to two data vector types with the same lane-width and number of
lanes would be classified as distinct. For eg:

typedef int v8si __attribute__((vector_size(8*sizeof(int))));
typedef v8si v8sib __attribute__((vector_mask));

typedef float v8sf __attribute__((vector_size(8*sizeof(float))));
typedef v8sf v8sfb __attribute__((vector_mask));

v8si foo (v8sf x, v8sf y, v8si i, v8si j)
{
  return (i == j) & (v8sib)(x == y) ? i : (v8si){0};
}

This could be the case for unsigned vs signed int vectors too for eg -
seems a bit unnecessary tbh.

Though vector_mask's being 'attached' to a type has its drawbacks, it
does seem to have an advantage when sizeless types are considered. If we
have to define a sizeless vector boolean type that is implied by the
lane size, we could do something like

typedef svint32_t svbool32_t __attribute__((vector_mask));

int32_t foo (svint32_t a, svint32_t b)
{
  svbool32_t pred = a > b;

  return pred[2] ? a[2] : b[2];
}

This is harder to do in the other schemes proposed so far as they're
size-based.
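
On the earlier point about masks attached to int and float vectors of the
same lane width: with today's GNU vector comparisons (again an analogy, not
vector_mask), a float comparison already yields a signed integer vector of
the same width and lane count, so it combines with an int comparison
without any cast. A minimal sketch, assuming int and float are both 32 bits
wide; the function name both_equal_sf is made up for illustration.

```c
typedef int   v4si __attribute__((vector_size(4 * sizeof(int))));
typedef float v4sf __attribute__((vector_size(4 * sizeof(float))));

/* Hypothetical helper: lanes where x == y and i == j both hold are -1,
   all other lanes are 0. */
v4si both_equal_sf(v4sf x, v4sf y, v4si i, v4si j)
{
    /* x == y on float lanes yields a vector of four signed 32-bit
       integers, the same type as i == j, so '&' needs no cast. */
    return (x == y) & (i == j);
}
```

Treating same-lane-width masks as distinct types would make this kind of
expression need a cast that it does not need today.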

To be able to free the boolean from the base type (not size) and retain
vector_mask's flexibility to declare sizeless types, we could have an
attribute that is more flexibly-typed and only 'derives' the lane-size
and number of lanes from its 'base' type without actually inheriting the
actual base type (char, short, int etc) or its signedness. This creates a
purer and stand-alone boolean type without the associated semantics'
complexity of having to cast between two same-size types with the same
number of lanes. Eg.

typedef int v8si __attribute__((vector_size(8*sizeof(int))));
typedef v8si v8b __attribute__((vector_bool));

However, with differing lane-sizes, there will have to be a cast as the
'derived' element size is different which could impact the layout of the
vector mask. Eg.

v8si foo (v8hi x, v8hi y, v8si i, v8si j)
{
  return (v8sib)(x == y) & (i == j) ? i : (v8si){0};
}

Such conversions on targets like AVX512/AMDGCN will be a NOP, but
non-trivial on SVE (depending on the implemented layout of the bool vector).

vector_bool decouples us from having to retain the behaviour of
vector_mask and provides the flexibility of not having to cast across
same-element-size vector types. Wrt sizeless types, it could scale well.

typedef svint32_t svbool32_t __attribute__((vector_bool));
typedef svint16_t svbool16_t __attribute__((vector_bool));

int32_t foo (svint32_t a, svint32_t b)
{
  svbool32_t pred = a > b;

  return pred[2] ? a[2] : b[2];
}

int16_t bar (svint16_t a, svint16_t b)
{
  svbool16_t pred = a > b;

  return pred[2] ? a[2] : b[2];
}

On SVE, pred[2] refers to bit 4 for svint16_t and bit 8 for svint32_t on
the target predicate.

Thoughts?

Thanks,
Tejas.

>>
>> Thanks,
>> Tejas.
>>
>>>
>>> Richard.
>>>
>>>>
>>>> Thanks,
>>>> Tejas.
>>>>
>>>> Richard.
>>>>
>>>>>
>>>>> Thanks,
>>>>> Tejas.
>>>>>
>>>>>> 1.
>>>>>>    __attribute__((vector_size (n))) where n represents bytes
>>>>>>
>>>>>> typedef bool vbool __attribute__ ((vector_size (1)));
>>>>>>
>>>>>> In this approach, the shape of the boolean vector is unclear. IoW, it is not
>>>>>> clear if each bit in 'n' controls a byte or an element. On targets
>>>>>> like SVE, it would be natural to have each bit control a byte of the target
>>>>>> vector (therefore resulting in an 'unpacked' layout of the PBV) and on AVX, each
>>>>>> bit would control one element/lane on the target vector (therefore resulting in a
>>>>>> 'packed' layout with all significant bits at the LSB).
>>>>>>
>>>>>> 2. __attribute__((vector_size (n))) where n represents num of lanes
>>>>>>
>>>>>> typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))));
>>>>>> typedef bool v4bi __attribute__ ((vector_size (sizeof (v4si) / sizeof (v4si){0}[0])));
>>>>>>
>>>>>> Here the 'n' in the vector_size attribute represents the number of bits that
>>>>>> is needed to represent a vector quantity. In this case, this packed boolean
>>>>>> vector can represent up to 'n' vector lanes. The size of the type is
>>>>>> rounded up to the nearest byte. For example, the sizeof v4bi in the above
>>>>>> example is 1.
>>>>>>
>>>>>> In this approach, because of the nature of the representation, the n bits required
>>>>>> to represent the n lanes of the vector are packed at the LSB. This does not naturally
>>>>>> align with the SVE approach of each bit representing a byte of the target vector
>>>>>> and PBV therefore having an 'unpacked' layout.
>>>>>>
>>>>>> More importantly, another drawback here is that the change in units for vector_size
>>>>>> might be confusing to programmers. The units will have to be interpreted based on the
>>>>>> base type of the typedef. It does not offer any flexibility in terms of the layout of
>>>>>> the bool vector - it is fixed.
>>>>>>
>>>>>> 3. Combination of 1 and 2.
>>>>>>
>>>>>> Combining the best of 1 and 2, we can introduce extra parameters to vector_size that will
>>>>>> unambiguously represent the layout of the PBV. Consider
>>>>>>
>>>>>> typedef bool vbool __attribute__((vector_size (s, n[, w])));
>>>>>>
>>>>>> where 's' is size in bytes, 'n' is the number of lanes and an optional 3rd parameter 'w'
>>>>>> is the number of bits of the PBV that represents a lane of the target vector. 'w' would
>>>>>> allow a target to force a certain layout of the PBV.
>>>>>>
>>>>>> The 2-parameter form of vector_size allows the target to have an
>>>>>> implementation-defined layout of the PBV. The target is free to choose the 'w'
>>>>>> if it is not specified to mirror the target layout of predicate registers. For
>>>>>> eg. AVX would choose 'w' as 1 and SVE would choose s*8/n.
>>>>>>
>>>>>> As an example, to represent the result of a comparison on 2 int16x8_t, we'd need
>>>>>> 8 lanes of boolean which could be represented by
>>>>>>
>>>>>> typedef bool v8b __attribute__ ((vector_size (2, 8)));
>>>>>>
>>>>>> SVE would implement v8b layout to make every 2nd bit significant i.e. w == 2
>>>>>>
>>>>>> and AVX would choose a layout where all 8 consecutive bits packed at LSB would
>>>>>> be significant i.e. w == 1.
>>>>>>
>>>>>> This scheme would accommodate more than 1 target to effectively represent vector
>>>>>> bools that mirror the target properties.
>>>>>>
>>>>>> 4. A new attribute
>>>>>>
>>>>>> This is based on a suggestion from Richard S in [3].
>>>>>> The idea is to introduce a new
>>>>>> attribute to define the PBV and make it general enough to
>>>>>>
>>>>>> * represent all targets flexibly (SVE, AVX etc)
>>>>>> * represent sub-byte length predicates
>>>>>> * have no change in units of vector_size/no new vector_size signature
>>>>>> * not have the number of bytes constrain representation
>>>>>>
>>>>>> If we call the new attribute 'bool_vec' (for lack of a better name), consider
>>>>>>
>>>>>> typedef bool vbool __attribute__((bool_vec (n[, w])));
>>>>>>
>>>>>> where 'n' represents number of lanes/elements and the optional 'w' is bits-per-lane.
>>>>>>
>>>>>> If 'w' is not specified, it and bytes-per-predicate are implementation-defined based on target.
>>>>>> If 'w' is specified, sizeof (vbool) will be ceil (n*w/8).
>>>>>>
>>>>>> 5. Behaviour of the packed vector boolean type.
>>>>>>
>>>>>> Taking the example of one of the options above, following is an illustration of its behavior
>>>>>>
>>>>>> * ABI
>>>>>>
>>>>>> New ABI rules will need to be defined for this type - eg alignment, PCS,
>>>>>> mangling etc
>>>>>>
>>>>>> * Initialization:
>>>>>>
>>>>>> Packed Boolean Vectors (PBV) can be initialized like so:
>>>>>>
>>>>>> typedef bool v4bi __attribute__ ((vector_size (2, 4, 4)));
>>>>>> v4bi p = {false, true, false, false};
>>>>>>
>>>>>> Each value in the initializer constant is of type bool. The lowest numbered
>>>>>> element in the const array corresponds to the LSbit of p, element 1 is
>>>>>> assigned to bit 4 etc.
>>>>>>
>>>>>> p is effectively a 2-byte bitmask with value 0x0010
>>>>>>
>>>>>> With a different layout
>>>>>>
>>>>>> typedef bool v4bi __attribute__ ((vector_size (2, 4, 1)));
>>>>>> v4bi p = {false, true, false, false};
>>>>>>
>>>>>> p is effectively a 2-byte bitmask with value 0x0002
>>>>>>
>>>>>> * Operations:
>>>>>>
>>>>>> Packed Boolean Vectors support the following operations:
>>>>>> . unary ~
>>>>>> . unary !
>>>>>> . binary &, | and ^
>>>>>> . assignments &=, |= and ^=
>>>>>> .
>>>>>>   comparisons <, <=, ==, !=, >= and >
>>>>>> . Ternary operator ?:
>>>>>>
>>>>>> Operations are defined as applied to the individual elements i.e the bits
>>>>>> that are significant in the PBV. Whether the PBVs are treated as bitmasks
>>>>>> or otherwise is implementation-defined.
>>>>>>
>>>>>> Insignificant bits could affect results of comparisons or ternary operators.
>>>>>> In such cases, it is implementation defined how the unused bits are treated.
>>>>>>
>>>>>> . Subscript operator []
>>>>>>
>>>>>> For the subscript operator, the packed boolean vector acts like an array of
>>>>>> elements - the first or the 0th indexed element being the LSbit of the PBV.
>>>>>> Subscript operator yields a scalar boolean value.
>>>>>> For example:
>>>>>>
>>>>>> typedef bool v8b __attribute__ ((vector_size (2, 8, 2)));
>>>>>>
>>>>>> // Subscript operator result yields a boolean value.
>>>>>> // p[3] is the 7th LSbit and p[1] is the 3rd LSbit of p.
>>>>>> bool foo (v8b p, int n) { p[3] = true; return p[1]; }
>>>>>>
>>>>>> Out of bounds access: OOB access can be determined at compile time given the
>>>>>> strong typing of the PBVs.
>>>>>>
>>>>>> PBV does not support address of operator(&) for elements of PBVs.
>>>>>>
>>>>>> . Implicit conversion from integer vectors to PBVs
>>>>>>
>>>>>> We would like to support the output of comparison operations to be PBVs. This
>>>>>> requires us to define the implicit conversion from an integer vector to PBV
>>>>>> as the results of vector comparisons are integer vectors.
>>>>>>
>>>>>> To define this operation:
>>>>>>
>>>>>> bool_vector = vector vector
>>>>>>
>>>>>> There is no change in how vector vector behaves i.e. this comparison
>>>>>> would still produce an int_vector type as it does now.
>>>>>>
>>>>>> temp_int_vec = vector vector
>>>>>> bool_vec = temp_int_vec // Implicit conversion from int_vec to bool_vec
>>>>>>
>>>>>> The implicit conversion from int_vec to bool I'd define simply to be:
>>>>>>
>>>>>> bool_vec[n] = (_Bool) int_vec[n]
>>>>>>
>>>>>> where the C11 standard rules apply
>>>>>> 6.3.1.2 Boolean type When any scalar value is converted to _Bool, the result
>>>>>> is 0 if the value compares equal to 0; otherwise, the result is 1.
>>>>>>
>>>>>> [1] https://lists.llvm.org/pipermail/cfe-dev/2020-May/065434.html
>>>>>> [2] https://reviews.llvm.org/D88905
>>>>>> [3] https://reviews.llvm.org/D81083
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Thanks,
>>>>>> Tejas.
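
The bit positions quoted in this thread (the 0x0010 and 0x0002 initializer
values, the 7th and 3rd LSbit for subscripts, and bits 4 and 8 for pred[2]
on SVE) all follow from one rule: with 'w' bits per lane, lane k's
significant bit is bit k*w. The scalar sketch below only emulates that
arithmetic on a plain integer. The names pbv_pack, pbv_get and
pbv_bit_index are made up for illustration and are not part of any
proposed interface, and the real layout would be defined by the target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: index of the significant bit for lane k when each lane
   occupies w bits of the mask. */
unsigned pbv_bit_index(unsigned k, unsigned w)
{
    return k * w;
}

/* Hypothetical: pack n booleans into a bitmask, w bits per lane, with
   lane 0 at the LSbit. Assumes n * w <= 64. */
uint64_t pbv_pack(const bool *lanes, unsigned n, unsigned w)
{
    uint64_t mask = 0;

    for (unsigned k = 0; k < n; k++)
        if (lanes[k])
            mask |= (uint64_t)1 << pbv_bit_index(k, w);

    return mask;
}

/* Hypothetical: emulate the subscript operator, reading lane k. */
bool pbv_get(uint64_t mask, unsigned k, unsigned w)
{
    return (mask >> pbv_bit_index(k, w)) & 1u;
}
```

For instance, packing {false, true, false, false} with w == 4 sets only
bit 4, and with w == 1 only bit 1, matching the two initializer examples
in the quoted RFC.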