From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR05-VI1-obe.outbound.protection.outlook.com (mail-vi1eur05on2060.outbound.protection.outlook.com [40.107.21.60]) by sourceware.org (Postfix) with ESMTPS id 898153858CD1 for ; Fri, 14 Jul 2023 10:18:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 898153858CD1 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=nvfi9oYuwdSzc6DAlld9oCPN1LsI9mWA/aY1EPG0h2Q=; b=QzpoVD9RHzZdf9Ivo77rbGGiug+0KEeZNW112m1A0/zTbM2Kz++QHfRh7uj7obwjVPsLxIBeE2he4TdtF2x/aZpL2VIAtwtlWX2uOSePBaBu7q7R8HKeiYejxmCs33Sd0MSXAzJZ+iFJ281glnJB820d1IMvw8PsKcsYTR76ZpM= Received: from DUZPR01CA0240.eurprd01.prod.exchangelabs.com (2603:10a6:10:4b5::8) by PAVPR08MB9091.eurprd08.prod.outlook.com (2603:10a6:102:32d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6588.27; Fri, 14 Jul 2023 10:18:26 +0000 Received: from DBAEUR03FT030.eop-EUR03.prod.protection.outlook.com (2603:10a6:10:4b5:cafe::c7) by DUZPR01CA0240.outlook.office365.com (2603:10a6:10:4b5::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6588.27 via Frontend Transport; Fri, 14 Jul 2023 10:18:26 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by DBAEUR03FT030.mail.protection.outlook.com (100.127.142.197) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6588.26 via Frontend Transport; Fri, 14 Jul 2023 10:18:26 +0000 Received: ("Tessian outbound f5de790fcf89:v145"); Fri, 14 Jul 2023 10:18:26 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 84c103f7cac501f3 X-CR-MTA-TID: 64aa7808 Received: from 227a645b178f.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id BB00EC37-9FDB-4947-99EE-26012F65A137.1; Fri, 14 Jul 2023 10:18:19 +0000 Received: from EUR05-AM6-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 227a645b178f.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Fri, 14 Jul 2023 10:18:19 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=SBdR6wMTvDq2+SzcdGdXsUXHaXSMzJTrwrQ7MsnfYKKneJxfSWosBugBGpliL/aU4MthttsH6wrtEX5hlF5DDcaskbsvBVLcoKAc+Ij22uwZhSWx//2cvUWdHAek1Ypdl/6xCSOq3ejOspg+a8I9DacfGzhMvM4Nu+KwWTvXVhzp56nnBwZ6x52DmMkVlGJxpjWj7H+lhlQmbT0bYYOccj7oKfgbIRsmNjYky2Ie7u6XCM5K3sC0rCy09sgd4ft19S+ullmRIFxK5px3AebNc2Yd7DXZhvTF1JSRL7SVr+62lLs9zF/uS9j2ESYXu+XZTwoapTnDwm8EmIrREwTbyw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=nvfi9oYuwdSzc6DAlld9oCPN1LsI9mWA/aY1EPG0h2Q=; b=QfSV4Ge6KOEyiczQpaiAdP5x5JeKaSwUZcH4ZxgjhiqDWk+OBo9ejjg1E/sepMMNjR0oUQ4oGEFF8M8ehugab0hNbEgfJGtu3STkXKXyd9ECca7WIiIi3XeN1fdFCnPpEvfJOqmkEZAZ3xS+ZHcoU5ZxIEzMB+SK8q6nPg8U3KNtiwmz5OT7UmZS9nmVK+NQHwWq+4gDf2NQkIHm22yLbxvc8pIzu3gLwaMRJyUT7A9tIO/eo2XKo3w0s2JMBj97ufv54yOmCUeg0i6rHc4rRIIJgibt1EcJOBeilP4ZMeOiAInhYq/XLn6WwRKnXpUmgt6oP43HotssKgvUjhac7g== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=nvfi9oYuwdSzc6DAlld9oCPN1LsI9mWA/aY1EPG0h2Q=; b=QzpoVD9RHzZdf9Ivo77rbGGiug+0KEeZNW112m1A0/zTbM2Kz++QHfRh7uj7obwjVPsLxIBeE2he4TdtF2x/aZpL2VIAtwtlWX2uOSePBaBu7q7R8HKeiYejxmCs33Sd0MSXAzJZ+iFJ281glnJB820d1IMvw8PsKcsYTR76ZpM= Authentication-Results-Original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; Received: from AS8PR08MB7079.eurprd08.prod.outlook.com (2603:10a6:20b:400::12) by AS1PR08MB7450.eurprd08.prod.outlook.com (2603:10a6:20b:4de::20) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6588.27; Fri, 14 Jul 2023 10:18:17 +0000 Received: from AS8PR08MB7079.eurprd08.prod.outlook.com ([fe80::135e:c80a:2277:6d98]) by AS8PR08MB7079.eurprd08.prod.outlook.com ([fe80::135e:c80a:2277:6d98%3]) with mapi id 15.20.6588.027; Fri, 14 Jul 2023 10:18:17 +0000 Message-ID: Date: Fri, 14 Jul 2023 15:48:10 +0530 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.12.0 Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors Content-Language: en-US To: Richard Biener Cc: "gcc-patches@gcc.gnu.org" References: <87a51e61-271a-44d7-ed94-de45d32b2e18@arm.com> From: Tejas Belagod In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-ClientProxiedBy: PN2PR01CA0128.INDPRD01.PROD.OUTLOOK.COM (2603:1096:c01:6::13) To AS8PR08MB7079.eurprd08.prod.outlook.com (2603:10a6:20b:400::12) MIME-Version: 1.0 X-MS-TrafficTypeDiagnostic: AS8PR08MB7079:EE_|AS1PR08MB7450:EE_|DBAEUR03FT030:EE_|PAVPR08MB9091:EE_ X-MS-Office365-Filtering-Correlation-Id: f2ce7fe5-a246-4fcc-3672-08db8453a9a9 x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: Y8AIZYft3Nlx7ro9Ak9PWNsn9AdxkvSMrAH5KYxqF0IuwNJG0+O4JJu33J4LF4Vf/fdFY4tx9Zi3Pm1VDkbGPAcAsSHENSOpp4cL4/MYOGL5b0QNyYXkMvIWzhJe9Sk9wAV5wWKB0VD7LxFxbZQlRfvOMJ/a29Qpwjs9nsAzY+b/g1rnzvlcNOFnV4z46mhjNvlRYu4LcGqzQmYbSt7b+lqlTJYtfn/l2FPB5UdE4XX9tiT6LWnnBAQvWmWjs/1dePUJF/ZfxOhx+CqBpMA+LLhxcvGX6UO7bkvpNuswRPt4DQSAv1CAF2bXMSPMW/xmICg392rL13CfUZHnOC7Y1fRwUZ0psjF1dGg0JCPVuscjXX6K4Np5QbMNdLEFt2O5RzJE9C6tQSCtPJZg5eqcQUbvu2tA5PKlp+P5izJHoknvd26f7+WGiOP6F+l0RCUNSk6rXFabfxe1ftFez0gXW4qQTPl3V8+GorsNO3Exq9HLKkwwue1n6MD1njhQ0nQxf/xkXc6yynbf8BbrERtJ2Y6PNB/JhteFa4PYriIIhUXgzRgPWz57Ol34ejqOCUqLx5s7dFt5kYY5s0cNExJH2pc0qEEtMCyr+vYIe7Qj6EBZ4A1bm147msZ6OajzpkuGqI6Ymph9ABXVSdG1f9YaTSdC0ME5mLj7QeisC41BSGExvJp0FZpvFk5QZYMvC6Oz X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:AS8PR08MB7079.eurprd08.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230028)(4636009)(376002)(396003)(366004)(346002)(39860400002)(136003)(451199021)(83380400001)(6916009)(66556008)(66946007)(478600001)(66476007)(44832011)(30864003)(2616005)(38100700002)(2906002)(31686004)(5660300002)(4326008)(8676002)(8936002)(316002)(6506007)(6666004)(66899021)(31696002)(186003)(86362001)(6486002)(41300700001)(53546011)(26005)(36756003)(966005)(6512007)(43740500002)(45980500001);DIR:OUT;SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS1PR08MB7450 Original-Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: DBAEUR03FT030.eop-EUR03.prod.protection.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: 2123bfb1-0c7a-4e37-ff55-08db8453a3ef X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: kGGbblOvsaQSTtFI2omfcVfLtBmIOMMvtRYV4/z0gpGbkym0TfUd6Qp6yBfbIvy9owMnpQ6DC7jxWelrkg0JxB+jvYRAHRwubwgXbu4aymypyjv1FjURxF6mRmUOYw1shU/J/6Mmqi7pNY/TSwRiVbesKZsGUcVPPBeWSI+KrS73NOeS493vysB9I1XLQFakls8jAYjtVy+oBglKpkTg3/UZEeZ60d+bYrKqa9iaFs6sYF6AAauBi+ifBhhAeqnqWVieSLMpsWHTnjHwsmLnff/DBLvYkjArtbITxHWuuLDV47oqLbcvDPl2vbzYdcM1Cp+99XZbIyH9v6RDN1xnaRvzUYRumENHY1nKsT84sBMHFFi4XHaSE+hdmjKUh7qqz2UOugb9X5/3YBLQzUjk/xJdr9SFUmUYUXQMHp8Ol+VV2HlMNPleEzk8SSkt9ummwWf+6E+rlDaU8Pl6wnP/4QSf3/0usn5qUdopXtnzSCpQKdjf1FTt+fc9zIVTmy+DWKn3036rPtHymakYnFkY4QJnsMmMn9kXfE5puL+/nWQVt7UCtZAuZhLhjsBEzPXVfvLifuj9jQjZnfRCWLmsi7RnYlkZqs9WxNWY4zR0d4Ge9RW5sTz3N/VxVe9DvqI+v73+pEqpePiiFRYwYYgIFgjFYQ5uBnDOA980Y7aQjr+aL1MGXBpISnNcHK7Ae6RiRUDex8BSZyUif8akpOuiGlADnCEO9eexJusdgMAcDS5w7SY7jhDVdfOR090CDWuOmm7F2/OFN+31P3iTPVyj+pCJwPMry6thw5vhkYFdeR9B/bPE6zEeOlmNpTDSCsA6 X-Forefront-Antispam-Report: CIP:63.35.35.123;CTRY:IE;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:64aa7808-outbound-1.mta.getcheckrecipient.com;PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com;CAT:NONE;SFS:(13230028)(4636009)(39860400002)(136003)(376002)(396003)(346002)(451199021)(36840700001)(46966006)(40470700004)(6666004)(6486002)(478600001)(186003)(53546011)(6506007)(26005)(6512007)(966005)(70586007)(2906002)(30864003)(82310400005)(41300700001)(316002)(4326008)(70206006)(44832011)(5660300002)(6862004)(8676002)(8936002)(81166007)(356005)(82740400003)(40460700003)(86362001)(31696002)(36756003)(36860700001)(336012)(2616005)(83380400001)(47076005)(40480700001)(31686004)(66899021)(43740500002);DIR:OUT;SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 14 Jul 2023 10:18:26.6491 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: f2ce7fe5-a246-4fcc-3672-08db8453a9a9 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d;Ip=[63.35.35.123];Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: DBAEUR03FT030.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: PAVPR08MB9091 X-Spam-Status: No, score=-5.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,FORGED_SPF_HELO,KAM_DMARC_NONE,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE,UNPARSEABLE_RELAY autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 7/13/23 4:05 PM, Richard Biener wrote: > On Thu, Jul 13, 2023 at 12:15 PM Tejas Belagod wrote: >> >> On 7/3/23 1:31 PM, Richard Biener wrote: >>> On Mon, Jul 3, 2023 at 8:50 AM Tejas Belagod wrote: >>>> >>>> On 6/29/23 6:55 PM, Richard Biener wrote: >>>>> On Wed, Jun 28, 2023 at 1:26 PM Tejas Belagod wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> From: Richard Biener >>>>>> Date: Tuesday, June 27, 2023 at 12:58 PM >>>>>> To: Tejas Belagod >>>>>> Cc: gcc-patches@gcc.gnu.org >>>>>> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors >>>>>> >>>>>> On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> From: Richard Biener >>>>>>> Date: Monday, June 26, 2023 at 2:23 PM >>>>>>> To: Tejas Belagod >>>>>>> Cc: gcc-patches@gcc.gnu.org >>>>>>> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors >>>>>>> >>>>>>> On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches >>>>>>> wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> Packed Boolean Vectors >>>>>>>> ---------------------- >>>>>>>> >>>>>>>> I'd like to propose a feature addition to GNU Vector extensions to add packed >>>>>>>> boolean vectors (PBV). This has been discussed in the past here[1] and a variant has >>>>>>>> been implemented in Clang recently[2]. >>>>>>>> >>>>>>>> With predication features being added to vector architectures (SVE, MVE, AVX), >>>>>>>> it is a useful feature to have to model predication on targets. This could >>>>>>>> find its use in intrinsics or just used as is as a GNU vector extension being >>>>>>>> mapped to underlying target features. For example, the packed boolean vector >>>>>>>> could directly map to a predicate register on SVE. >>>>>>>> >>>>>>>> Also, this new packed boolean type GNU extension can be used with SVE ACLE >>>>>>>> intrinsics to replace a fixed-length svbool_t. >>>>>>>> >>>>>>>> Here are a few options to represent the packed boolean vector type. >>>>>>> >>>>>>> The GIMPLE frontend uses a new 'vector_mask' attribute: >>>>>>> >>>>>>> typedef int v8si __attribute__((vector_size(8*sizeof(int)))); >>>>>>> typedef v8si v8sib __attribute__((vector_mask)); >>>>>>> >>>>>>> it get's you a vector type that's the appropriate (dependent on the >>>>>>> target) vector >>>>>>> mask type for the vector data type (v8si in this case). >>>>>>> >>>>>>> >>>>>>> >>>>>>> Thanks Richard. >>>>>>> >>>>>>> Having had a quick look at the implementation, it does seem to tick the boxes. >>>>>>> >>>>>>> I must admit I haven't dug deep, but if the target hook allows the mask to be >>>>>>> >>>>>>> defined in way that is target-friendly (and I don't know how much effort it will >>>>>>> >>>>>>> be to migrate the attribute to more front-ends), it should do the job nicely. >>>>>>> >>>>>>> Let me go back and dig a bit deeper and get back with questions if any. >>>>>> >>>>>> >>>>>> Let me add that the advantage of this is the compiler doesn't need >>>>>> to support weird explicitely laid out packed boolean vectors that do >>>>>> not match what the target supports and the user doesn't need to know >>>>>> what the target supports (and thus have an #ifdef maze around explicitely >>>>>> specified layouts). >>>>>> >>>>>> Sorry for the delayed response – I spent a day experimenting with vector_mask. >>>>>> >>>>>> >>>>>> >>>>>> Yeah, this is what option 4 in the RFC is trying to achieve – be portable enough >>>>>> >>>>>> to avoid having to sprinkle the code with ifdefs. >>>>>> >>>>>> >>>>>> It does remove some flexibility though, for example with -mavx512f -mavx512vl >>>>>> you'll get AVX512 style masks for V4SImode data vectors but of course the >>>>>> target sill supports SSE2/AVX2 style masks as well, but those would not be >>>>>> available as "packed boolean vectors", though they are of course in fact >>>>>> equal to V4SImode data vectors with -1 or 0 values, so in this particular >>>>>> case it might not matter. >>>>>> >>>>>> That said, the vector_mask attribute will get you V4SImode vectors with >>>>>> signed boolean elements of 32 bits for V4SImode data vectors with >>>>>> SSE2/AVX2. >>>>>> >>>>>> >>>>>> >>>>>> This sounds very much like what the scenario would be with NEON vs SVE. Coming to think >>>>>> >>>>>> of it, vector_mask resembles option 4 in the proposal with ‘n’ implied by the ‘base’ vector type >>>>>> >>>>>> and a ‘w’ specified for the type. >>>>>> >>>>>> >>>>>> >>>>>> Given its current implementation, if vector_mask is exposed to the CFE, would there be any >>>>>> >>>>>> major challenges wrt implementation or defining behaviour semantics? I played around with a >>>>>> >>>>>> few examples from the testsuite and wrote some new ones. I mostly tried operations that >>>>>> >>>>>> the new type would have to support (unary, binary bitwise, initializations etc) – with a couple of exceptions >>>>>> >>>>>> most of the ops seem to be supported. I also triggered a couple of ICEs in some tests involving >>>>>> >>>>>> implicit conversions to wider/narrower vector_mask types (will raise reports for these). Correct me >>>>>> >>>>>> if I’m wrong here, but we’d probably have to support a couple of new ops if vector_mask is exposed >>>>>> >>>>>> to the CFE – initialization and subscript operations? >>>>> >>>>> Yes, either that or restrict how the mask vectors can be used, thus >>>>> properly diagnose improper >>>>> uses. >>>> >>>> Indeed. >>>> >>>> A question would be for example how to write common mask test >>>>> operations like >>>>> if (any (mask)) or if (all (mask)). >>>> >>>> I see 2 options here. New builtins could support new types - they'd >>>> provide a target independent way to test any and all conditions. Another >>>> would be to let the target use its intrinsics to do them in the most >>>> efficient way possible (which the builtins would get lowered down to >>>> anyway). >>>> >>>> >>>> Likewise writing merge operations >>>>> - do those as >>>>> >>>>> a = a | (mask ? b : 0); >>>>> >>>>> thus use ternary ?: for this? >>>> >>>> Yes, like now, the ternary could just translate to >>>> >>>> {mask[0] ? b[0] : 0, mask[1] ? b[1] : 0, ... } >>>> >>>> One thing to flesh out is the semantics. Should we allow this operation >>>> as long as the number of elements are the same even if the mask type if >>>> different i.e. >>>> >>>> v4hib ? v4si : v4si; >>>> >>>> I don't see why this can't be allowed as now we let >>>> >>>> v4si ? v4sf : v4sf; >>>> >>>> >>>> For initialization regular vector >>>>> syntax should work: >>>>> >>>>> mtype mask = (mtype){ -1, -1, 0, 0, ... }; >>>>> >>>>> there's the question of the signedness of the mask elements. GCC >>>>> internally uses signed >>>>> bools with values -1 for true and 0 for false. >>>> >>>> One of the things is the value that represents true. This is largely >>>> target-dependent when it comes to the vector_mask type. When vector_mask >>>> types are created from GCC's internal representation of bool vectors >>>> (signed ints) the point about implicit/explicit conversions from signed >>>> int vect to mask types in the proposal covers this. So mask in >>>> >>>> v4sib mask = (v4sib){-1, -1, 0, 0, ... } >>>> >>>> will probably end up being represented as 0x3xxxx on AVX512 and 0x11xxx >>>> on SVE. On AVX2/SSE they'd still be represented as vector of signed ints >>>> {-1, -1, 0, 0, ... }. I'm not entirely confident what ramifications this >>>> new mask type representations will have in the mid-end while being >>>> converted back and forth to and from GCC's internal representation, but >>>> I'm guessing this is already being handled at some level by the >>>> vector_mask type's current support? >>> >>> Yes, I would guess so. Of course what the middle-end is currently exposed >>> to is simply what the vectorizer generates - once fuzzers discover this feature >>> we'll see "interesting" uses that might run into missed or wrong handling of >>> them. >>> >>> So whatever we do on the side of exposing this to users a good portion >>> of testsuite coverage for the allowed use cases is important. >>> >>> Richard. >>> >> >> Apologies for the long-ish reply, but here's a TLDR and gory details follow. >> >> TLDR: >> GIMPLE's vector_mask type semantics seems to be target-dependent, so >> elevating vector_mask to CFE with same semantics is undesirable. OTOH, >> changing vector_mask to have target-independent CFE semantics will cause >> dichotomy between its CFE and GFE behaviours. But vector_mask approach >> scales well for sizeless types. Is the solution to have something like >> vector_mask with defined target-independent type semantics, but call it >> something else to prevent conflation with GIMPLE, a viable option? >> >> Details: >> After some more analysis of the proposed options, here are some >> interesting findings: >> >> vector_mask looked like a very interesting option until I ran into some >> semantic uncertainly. This code: >> >> typedef int v8si __attribute__((vector_size(8*sizeof(int)))); >> typedef v8si v8sib __attribute__((vector_mask)); >> >> typedef short v8hi __attribute__((vector_size(8*sizeof(short)))); >> typedef v8hi v8hib __attribute__((vector_mask)); >> >> v8si res; >> v8hi resh; >> >> v8hib __GIMPLE () foo (v8hib x, v8sib y) >> { >> v8hib res; >> >> res = x & y; >> return res; >> } >> >> When compiled on AArch64, produces a type-mismatch error for binary >> expression involving '&' because the 'derived' types 'v8hib' and 'v8sib' >> have a different target-layout. If the layout of these two 'derived' >> types match, then the above code has no issue. Which is the case on >> amdgcn-amdhsa target where it compiles without any error(amdgcn uses a >> scalar DImode mask mode). IoW such code seems to be allowed on some >> targets and not on others. >> >> With the same code, I tried putting casts and it worked fine on AArch64 >> and amdgcn. This target-specific behaviour of vector_mask derived types >> will be difficult to specify once we move it to the CFE - in fact we >> probably don't want target-specific behaviour once it moves to the CFE. >> >> If we expose vector_mask to CFE, we'd have to specify consistent >> semantics for vector_mask types. We'd have to resolve ambiguities like >> 'v4hib & v4sib' clearly to be able to specify the semantics of the type >> system involving vector_mask. If we do this, don't we run the risk of a >> dichotomy between the CFE and GFE semantics of vector_mask? I'm assuming >> we'd want to retain vector_mask semantics as they are in GIMPLE. >> >> If we want to enforce constant semantics for vector_mask in the CFE, one >> way is to treat vector_mask types as distinct if they're 'attached' to >> distinct data vector types. In such a scenario, vector_mask types >> attached to two data vector types with the same lane-width and number of >> lanes would be classified as distinct. For eg: >> >> typedef int v8si __attribute__((vector_size(8*sizeof(int)))); >> typedef v8si v8sib __attribute__((vector_mask)); >> >> typedef float v8sf __attribute__((vector_size(8*sizeof(float)))); >> typedef v8sf v8sfb __attribute__((vector_mask)); >> >> v8si foo (v8sf x, v8sf y, v8si i, v8si j) >> { >> (a == b) & (v8sfb)(x == y) ? x : (v8si){0}; >> } >> >> This could be the case for unsigned vs signed int vectors too for eg - >> seems a bit unnecessary tbh. >> >> Though vector_mask's being 'attached' to a type has its drawbacks, it >> does seem to have an advantage when sizeless types are considered. If we >> have to define a sizeless vector boolean type that is implied by the >> lane size, we could do something like >> >> typedef svint32_t svbool32_t __attribute__((vector_mask)); >> >> int32_t foo (svint32_t a, svint32_t b) >> { >> svbool32_t pred = a > b; >> >> return pred[2] ? a[2] : b[2]; >> } >> >> This is harder to do in the other schemes proposed so far as they're >> size-based. >> >> To be able to free the boolean from the base type (not size) and retain >> vector_mask's flexibility to declare sizeless types, we could have an >> attribute that is more flexibly-typed and only 'derives' the lane-size >> and number of lanes from its 'base' type without actually inheriting the >> actual base type(char, short, int etc) or its signedness. This creates a >> purer and stand-alone boolean type without the associated semantics' >> complexity of having to cast between two same-size types with the same >> number of lanes. Eg. >> >> typedef int v8si __attribute__((vector_size(8*sizeof(int)))); >> typedef v8si v8b __attribute__((vector_bool)); >> >> However, with differing lane-sizes, there will have to be a cast as the >> 'derived' element size is different which could impact the layout of the >> vector mask. Eg. >> >> v8si foo (v8hi x, v8hi y, v8si i, v8si j) >> { >> (v8sib)(x == y) & (i == j) ? i : (v8si){0}; >> } >> >> Such conversions on targets like AVX512/AMDGCN will be a NOP, but >> non-trivial on SVE (depending on the implemented layout of the bool vector). >> >> vector_bool decouples us from having to retain the behaviour of >> vector_mask and provides the flexibility of not having to cast across >> same-element-size vector types. Wrt to sizeless types, it could scale well. >> >> typedef svint32_t svbool32_t __attribute__((vector_bool)); >> typedef svint16_t svbool16_t __attribute__((vector_bool)); >> >> int32_t foo (svint32_t a, svint32_t b) >> { >> svbool32_t pred = a > b; >> >> return pred[2] ? a[2] : b[2]; >> } >> >> int16_t bar (svint16_t a, svint16_t b) >> { >> svbool16_t pred = a > b; >> >> return pred[2] ? a[2] : b[2]; >> } >> >> On SVE, pred[2] refers to bit 4 for svint16_t and bit 8 for svint32_t on >> the target predicate. >> >> Thoughts? > > The GIMPLE frontend accepts just what is valid on the target here. Any > "plumbing" such as implicit conversions (if we do not want to require > explicit ones even when NOP) need to be done/enforced by the C frontend. > Sorry, I'm not sure I follow - correct me if I'm wrong here. If we desire to define/allow operations like implicit/explicit conversion on vector_mask types in CFE, don't we have to start from a position of defining what makes vector_mask types distinct and therefore require implicit/explicit conversions? IIUC, GFE's distinctness of vector_mask types depends on how the mask mode is implemented on the target. If implemented in CFE, vector_mask types' distinctness probably shouldn't be based on target layout and could be based on the type they're 'attached' to. Wouldn't that diverge from target-specific GFE behaviour - or are you suggesting its OK for vector_mask type semantics to be different in CFE and GFE? > There's one issue I can see that wasn't mentioned yet - GCC currently > accepts > > typedef long gv1024di __attribute__((vector_size(1024*8))); > > even if there's no underlying support on the target which either has support > only for smaller vectors or no vectors at all. Currently vector_mask will > simply fail to produce sth desirable here. What's your idea of making > that not target dependent? GCC will later lower operations with such > vectors, possibly splitting them up into sizes supported by the hardware > natively, possibly performing elementwise operations. For the former > one would need to guess the "decomposition type" and based on that > select the mask type [layout]? > > One idea would be to specify the mask layout follows the largest vector > kind supported by the target and if there is none follow the layout > of (signed?) _Bool [n]? When there's no target support for vectors > GCC will generally use elementwise operations apart from some > special-cases. > That is a very good point - thanks for raising it. For when GCC chooses to lower to a vector type supported by the target, my initial thought would be to, as you say, choose a mask that has enough bits to represent the largest vector size with the smallest lane-width. The actual layout of the mask will depend on how the target implements its mask mode. Decomposition of vector_mask ought to follow the decomposition of the GNU vectors type and each decomposed vector_mask type ought to have enough bits to represent the decomposed GNU vector shape. It sounds nice on paper, but I haven't really worked through a design for this. Do you see any gotchas here? > While using a different name than vector_mask is certainly possible > it wouldn't me to decide that, but I'm also not yet convinced it's > really necessary. As said, what the GIMPLE frontend accepts > or not shouldn't limit us here - just the actual chosen layout of the > boolean vectors. > I'm just concerned about creating an alternate vector_mask functionality in the CFE and risk not being consistent with GFE. Thanks, Tejas. > Richard. > >> Thanks, >> Tejas. >> >>>> >>>> Thanks, >>>> Tejas. >>>> >>>>> >>>>> Richard. >>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Tejas. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Richard. >>>>>> >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Tejas. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> 1. __attribute__((vector_size (n))) where n represents bytes >>>>>>>> >>>>>>>> typedef bool vbool __attribute__ ((vector_size (1))); >>>>>>>> >>>>>>>> In this approach, the shape of the boolean vector is unclear. IoW, it is not >>>>>>>> clear if each bit in 'n' controls a byte or an element. On targets >>>>>>>> like SVE, it would be natural to have each bit control a byte of the target >>>>>>>> vector (therefore resulting in an 'unpacked' layout of the PBV) and on AVX, each >>>>>>>> bit would control one element/lane on the target vector(therefore resulting in a >>>>>>>> 'packed' layout with all significant bits at the LSB). >>>>>>>> >>>>>>>> 2. __attribute__((vector_size (n))) where n represents num of lanes >>>>>>>> >>>>>>>> typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))); >>>>>>>> typedef bool v4bi __attribute__ ((vector_size (sizeof v4si / sizeof (v4si){0}[0]))); >>>>>>>> >>>>>>>> Here the 'n' in the vector_size attribute represents the number of bits that >>>>>>>> is needed to represent a vector quantity. In this case, this packed boolean >>>>>>>> vector can represent upto 'n' vector lanes. The size of the type is >>>>>>>> rounded up the nearest byte. For example, the sizeof v4bi in the above >>>>>>>> example is 1. >>>>>>>> >>>>>>>> In this approach, because of the nature of the representation, the n bits required >>>>>>>> to represent the n lanes of the vector are packed at the LSB. This does not naturally >>>>>>>> align with the SVE approach of each bit representing a byte of the target vector >>>>>>>> and PBV therefore having an 'unpacked' layout. >>>>>>>> >>>>>>>> More importantly, another drawback here is that the change in units for vector_size >>>>>>>> might be confusing to programmers. The units will have to be interpreted based on the >>>>>>>> base type of the typedef. It does not offer any flexibility in terms of the layout of >>>>>>>> the bool vector - it is fixed. >>>>>>>> >>>>>>>> 3. Combination of 1 and 2. >>>>>>>> >>>>>>>> Combining the best of 1 and 2, we can introduce extra parameters to vector_size that will >>>>>>>> unambiguously represent the layout of the PBV. Consider >>>>>>>> >>>>>>>> typedef bool vbool __attribute__((vector_size (s, n[, w]))); >>>>>>>> >>>>>>>> where 's' is size in bytes, 'n' is the number of lanes and an optional 3rd parameter 'w' >>>>>>>> is the number of bits of the PBV that represents a lane of the target vector. 'w' would >>>>>>>> allow a target to force a certain layout of the PBV. >>>>>>>> >>>>>>>> The 2-parameter form of vector_size allows the target to have an >>>>>>>> implementation-defined layout of the PBV. The target is free to choose the 'w' >>>>>>>> if it is not specified to mirror the target layout of predicate registers. For >>>>>>>> eg. AVX would choose 'w' as 1 and SVE would choose s*8/n. >>>>>>>> >>>>>>>> As an example, to represent the result of a comparison on 2 int16x8_t, we'd need >>>>>>>> 8 lanes of boolean which could be represented by >>>>>>>> >>>>>>>> typedef bool v8b __attribute__ ((vector_size (2, 8))); >>>>>>>> >>>>>>>> SVE would implement v8b layout to make every 2nd bit significant i.e. w == 2 >>>>>>>> >>>>>>>> and AVX would choose a layout where all 8 consecutive bits packed at LSB would >>>>>>>> be significant i.e. w == 1. >>>>>>>> >>>>>>>> This scheme would accomodate more than 1 target to effectively represent vector >>>>>>>> bools that mirror the target properties. >>>>>>>> >>>>>>>> 4. A new attribite >>>>>>>> >>>>>>>> This is based on a suggestion from Richard S in [3]. The idea is to introduce a new >>>>>>>> attribute to define the PBV and make it general enough to >>>>>>>> >>>>>>>> * represent all targets flexibly (SVE, AVX etc) >>>>>>>> * represent sub-byte length predicates >>>>>>>> * have no change in units of vector_size/no new vector_size signature >>>>>>>> * not have the number of bytes constrain representation >>>>>>>> >>>>>>>> If we call the new attribute 'bool_vec' (for lack of a better name), consider >>>>>>>> >>>>>>>> typedef bool vbool __attribute__((bool_vec (n[, w]))) >>>>>>>> >>>>>>>> where 'n' represents number of lanes/elements and the optional 'w' is bits-per-lane. >>>>>>>> >>>>>>>> If 'w' is not specified, it and bytes-per-predicate are implementation-defined based on target. >>>>>>>> If 'w' is specified, sizeof (vbool) will be ceil (n*w/8). >>>>>>>> >>>>>>>> 5. Behaviour of the packed vector boolean type. >>>>>>>> >>>>>>>> Taking the example of one of the options above, following is an illustration of it's behavior >>>>>>>> >>>>>>>> * ABI >>>>>>>> >>>>>>>> New ABI rules will need to be defined for this type - eg alignment, PCS, >>>>>>>> mangling etc >>>>>>>> >>>>>>>> * Initialization: >>>>>>>> >>>>>>>> Packed Boolean Vectors(PBV) can be initialized like so: >>>>>>>> >>>>>>>> typedef bool v4bi __attribute__ ((vector_size (2, 4, 4))); >>>>>>>> v4bi p = {false, true, false, false}; >>>>>>>> >>>>>>>> Each value in the initizlizer constant is of type bool. The lowest numbered >>>>>>>> element in the const array corresponds to the LSbit of p, element 1 is >>>>>>>> assigned to bit 4 etc. >>>>>>>> >>>>>>>> p is effectively a 2-byte bitmask with value 0x0010 >>>>>>>> >>>>>>>> With a different layout >>>>>>>> >>>>>>>> typedef bool v4bi __attribute__ ((vector_size (2, 4, 1))); >>>>>>>> v4bi p = {false, true, false, false}; >>>>>>>> >>>>>>>> p is effectively a 2-byte bitmask with value 0x0002 >>>>>>>> >>>>>>>> * Operations: >>>>>>>> >>>>>>>> Packed Boolean Vectors support the following operations: >>>>>>>> . unary ~ >>>>>>>> . unary ! >>>>>>>> . binary&,|andˆ >>>>>>>> . assignments &=, |= and ˆ= >>>>>>>> . comparisons <, <=, ==, !=, >= and > >>>>>>>> . Ternary operator ?: >>>>>>>> >>>>>>>> Operations are defined as applied to the individual elements i.e the bits >>>>>>>> that are significant in the PBV. Whether the PBVs are treated as bitmasks >>>>>>>> or otherwise is implementation-defined. >>>>>>>> >>>>>>>> Insignificant bits could affect results of comparisons or ternary operators. >>>>>>>> In such cases, it is implementation defined how the unused bits are treated. >>>>>>>> >>>>>>>> . Subscript operator [] >>>>>>>> >>>>>>>> For the subscript operator, the packed boolean vector acts like a array of >>>>>>>> elements - the first or the 0th indexed element being the LSbit of the PBV. >>>>>>>> Subscript operator yields a scalar boolean value. >>>>>>>> For example: >>>>>>>> >>>>>>>> typedef bool v8b __attribute__ ((vector_size (2, 8, 2))); >>>>>>>> >>>>>>>> // Subscript operator result yields a boolean value. >>>>>>>> // x[3] is the 7th LSbit and x[1] is the 3rd LSbit of x. >>>>>>>> bool foo (v8b p, int n) { p[3] = true; return p[1]; } >>>>>>>> >>>>>>>> Out of bounds access: OOB access can be determined at compile time given the >>>>>>>> strong typing of the PBVs. >>>>>>>> >>>>>>>> PBV does not support address of operator(&) for elements of PBVs. >>>>>>>> >>>>>>>> . Implicit conversion from integer vectors to PBVs >>>>>>>> >>>>>>>> We would like to support the output of comparison operations to be PBVs. This >>>>>>>> requires us to define the implicit conversion from an integer vector to PBV >>>>>>>> as the result of vector comparisons are integer vectors. >>>>>>>> >>>>>>>> To define this operation: >>>>>>>> >>>>>>>> bool_vector = vector vector >>>>>>>> >>>>>>>> There is no change in how vector vector behavior i.e. this comparison >>>>>>>> would still produce an int_vector type as it does now. >>>>>>>> >>>>>>>> temp_int_vec = vector vector >>>>>>>> bool_vec = temp_int_vec // Implicit conversion from int_vec to bool_vec >>>>>>>> >>>>>>>> The implicit conversion from int_vec to bool I'd define simply to be: >>>>>>>> >>>>>>>> bool_vec[n] = (_Bool) int_vec[n] >>>>>>>> >>>>>>>> where the C11 standard rules apply >>>>>>>> 6.3.1.2 Boolean type When any scalar value is converted to _Bool, the result >>>>>>>> is 0 if the value compares equal to 0; otherwise, the result is 1. >>>>>>>> >>>>>>>> >>>>>>>> [1] https://lists.llvm.org/pipermail/cfe-dev/2020-May/065434.html >>>>>>>> [2] https://reviews.llvm.org/D88905 >>>>>>>> [3] https://reviews.llvm.org/D81083 >>>>>>>> >>>>>>>> Thoughts? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Tejas. >>>> >>