From: Tamar Christina
To: Richard Sandiford, Richard Biener
CC: gcc-patches@gcc.gnu.org, nd
Subject: RE: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask
Date: Tue, 14 Jun 2022 15:57:51 +0000

> -----Original Message-----
> From: Richard Sandiford
> Sent: Tuesday, June 14, 2022 2:43 PM
> To: Richard Biener
> Cc: Tamar Christina; gcc-patches@gcc.gnu.org;
> nd
> Subject: Re: [PATCH 1/2]middle-end Support optimized division by pow2
> bitmask
>
> Richard Biener writes:
> > On Mon, 13 Jun 2022, Tamar Christina wrote:
> >
> >> > -----Original Message-----
> >> > From: Richard Biener
> >> > Sent: Monday, June 13, 2022 12:48 PM
> >> > To: Tamar Christina
> >> > Cc: gcc-patches@gcc.gnu.org; nd; Richard Sandiford
> >> > Subject: RE: [PATCH 1/2]middle-end Support optimized division by
> >> > pow2 bitmask
> >> >
> >> > On Mon, 13 Jun 2022, Tamar Christina wrote:
> >> >
> >> > > > -----Original Message-----
> >> > > > From: Richard Biener
> >> > > > Sent: Monday, June 13, 2022 10:39 AM
> >> > > > To: Tamar Christina
> >> > > > Cc: gcc-patches@gcc.gnu.org; nd; Richard Sandiford
> >> > > > Subject: Re: [PATCH 1/2]middle-end Support optimized division
> >> > > > by pow2 bitmask
> >> > > >
> >> > > > On Mon, 13 Jun 2022, Richard Biener wrote:
> >> > > >
> >> > > > > On Thu, 9 Jun 2022, Tamar Christina wrote:
> >> > > > >
> >> > > > > > Hi All,
> >> > > > > >
> >> > > > > > In plenty of image and video processing code it's common to
> >> > > > > > modify pixel values by a widening operation and then scale
> >> > > > > > them back into range by dividing by 255.
> >> > > > > >
> >> > > > > > This patch adds an optab to allow us to emit an optimized
> >> > > > > > sequence when doing an unsigned division that is equivalent to:
> >> > > > > >
> >> > > > > >   x = y / (2 ^ (bitsize (y) / 2) - 1)
> >> > > > > >
> >> > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> >> > > > > > x86_64-pc-linux-gnu and no issues.
> >> > > > > >
> >> > > > > > Ok for master?
> >> > > > >
> >> > > > > Looking at 2/2 it seems that this is the wrong way to attack
> >> > > > > the problem.  The ISA doesn't have such an instruction so adding
> >> > > > > an optab looks premature.  I suppose that there's no unsigned
> >> > > > > vector integer division and thus we open-code that in a different
> >> > > > > way?  Isn't the correct thing then to fix up that open-coding if it
> >> > > > > is more efficient?
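To make the arithmetic concrete, here is a scalar Python sketch. It assumes a 16-bit unsigned y (so bitsize (y) / 2 == 8 and the divisor is 255) and uses the identity y / 255 == (y + ((y + 257) >> 8)) >> 8 evaluated at double precision; that is one standard way to write such a sequence, not necessarily the exact one in patch 2/2. The lane models mirror the NEON ADDHN + UADDW pair and the SVE2 back-to-back ADDHNB pair described elsewhere in this thread. All helper names are hypothetical, the final shift by 8 in the NEON model is an assumption, and because the lane models wrap at 16 bits they are only exact while y + 257 fits in a lane (y <= 0xFEFE, which still covers any product of two bytes, at most 255 * 255 = 65025).

```python
MASK16 = 0xFFFF  # mask for one 16-bit vector lane


def div255_u16(y: int) -> int:
    """y // 255 for a 16-bit unsigned y, using only adds and shifts.

    The sums are evaluated at double the input precision (Python ints
    never wrap), which keeps the result exact for every 16-bit y.
    """
    assert 0 <= y <= MASK16, "y is assumed to be a 16-bit unsigned value"
    t = (y + 257) >> 8  # 257 == 0x0101 adds 1 to each byte; keep the high byte
    return (y + t) >> 8


def addhn_lane(a: int, b: int) -> int:
    """Model of one ADDHN lane: 16-bit add (wraps), keep the high byte."""
    return ((a + b) & MASK16) >> 8


def uaddw_lane(a: int, b8: int) -> int:
    """Model of one UADDW lane: zero-extend the byte, 16-bit add (wraps)."""
    return (a + b8) & MASK16


def div255_neon_model(y: int) -> int:
    """ADDHN, then UADDW, then an (assumed) logical shift right by 8."""
    return uaddw_lane(y, addhn_lane(y, 257)) >> 8


def addhnb_model(a: list[int], b: list[int]) -> list[int]:
    """Model of SVE2 ADDHNB on 16-bit lanes: byte 2*i gets the high byte of
    a[i] + b[i] (16-bit add, wraps) and byte 2*i + 1 is set to zero."""
    out = []
    for ai, bi in zip(a, b):
        out += [((ai + bi) & MASK16) >> 8, 0]
    return out


def as_halfwords(bs: list[int]) -> list[int]:
    """View byte lanes as 16-bit lanes (byte 2*i is the low byte of lane i)."""
    return [bs[2 * i] | (bs[2 * i + 1] << 8) for i in range(len(bs) // 2)]


def div255_sve2_model(ys: list[int]) -> list[int]:
    """Two back-to-back ADDHNBs; the zeroed odd bytes act as zero-extension."""
    t = as_halfwords(addhnb_model(ys, [257] * len(ys)))
    return as_halfwords(addhnb_model(ys, t))


# Self-checks against true division.  The lane models wrap, so they are only
# exact while y + 257 fits in 16 bits (y <= 0xFEFE).
assert all(div255_u16(y) == y // 255 for y in range(MASK16 + 1))
assert all(div255_neon_model(y) == y // 255 for y in range(0xFEFF))
assert div255_sve2_model([0, 255, 255 * 255, 0xFEFE]) == [0, 1, 255, 255]
```

The SVE2 model shows why the bottom layout matters: because ADDHNB zeroes the odd bytes, reading its result back as 16-bit lanes already gives a zero-extended value, so the second ADDHNB can consume it directly.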
> >> > > >
> >> > >
> >> > > The problem is that even if you fix up the open-coding it would
> >> > > need to be something target specific?  The sequence of
> >> > > instructions we generate doesn't have a GIMPLE representation.  So
> >> > > whatever is generated I'd have to fix up in RTL then.
> >> >
> >> > What's the operation that doesn't have a GIMPLE representation?
> >>
> >> For NEON we use two operations:
> >> 1. Add high narrowing lowpart, essentially doing (a +w b) >>.n bitsize(a)/2,
> >>    where the + widens and the >> narrows.  So you give it two
> >>    shorts and get a byte.
> >> 2. Add widening add of lowpart, so basically
> >>    lowpart (a +w b)
> >>
> >> For SVE2 we use a different sequence: two back-to-back sequences of:
> >> 1. Add narrow high part (bottom).  In SVE the Top and Bottom instructions
> >>    select even and odd elements of the vector rather than "top half" and
> >>    "bottom half".
> >>
> >>    So this instruction does: add each vector element of the first source
> >>    vector to the corresponding vector element of the second source vector,
> >>    and place the most significant half of the result in the even-numbered
> >>    half-width destination elements, while setting the odd-numbered
> >>    elements to zero.
> >>
> >> So there's an explicit permute in there.  The instructions are
> >> sufficiently different that there wouldn't be a single GIMPLE
> >> representation.
> >
> > I see.  Are these also useful to express scalar integer division?
> >
> > I'll defer to others to ack the special udiv_pow2_bitmask optab or
> > suggest some piecemeal things other targets might be able to do as
> > well.  It does look very special.  I'd also bikeshed it to
> > udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming I
> > interpreted 'bitmask' correctly ;)).  It seems to be even less general
> > since it is a unary op and the actual divisor is constrained by the
> > mode itself?
>
> Yeah, those were my concerns as well.
> For n-bit numbers, the same kind of arithmetic transformation can be used
> for any 2^m-1 for m in [n/2, n), so from a target-independent point of
> view, m==n/2 isn't particularly special.  Hard-coding one value of m would
> make sense if there was an underlying instruction that did exactly this,
> but like you say, there isn't.
>
> Would a compromise be to define an optab for ADDHN and then add a vector
> pattern for this division that (at least initially) prefers ADDHN over the
> current approach whenever ADDHN is available?  We could then adapt the
> conditions on the pattern if other targets also provide ADDHN but don't want
> this transform.  (I think the other instructions in the pattern already have
> optabs.)
>
> That still leaves open the question about what to do about SVE2, but the
> underlying problem there is that the vectoriser doesn't know about the B/T
> layout.

Wouldn't it be better to just generalize the optab and to pass on the mask?
I'd prefer doing that to teaching the vectorizer about ADDHN (which can't
easily be done now), let alone teaching it about B/T.  It also seems somewhat
unnecessary to diverge the implementation here in the mid-end.  After all,
you can generate better SSE code here as well, so focusing on generating
ISA-specific code from here for each ISA seems like the wrong approach to me.

Thanks,
Tamar

>
> Thanks,
> Richard
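Richard's point that m == n/2 is not special, and the idea of passing the mask explicitly, can be checked with a small Python model. This is a sketch under assumptions: the add/shift sequence is taken to be the natural generalization of the 255 case (add 2^m + 1, shift by m, add back, shift by m), and `pow2m1_quotient` / `udiv_pow2m1` are hypothetical helpers named after the bikeshedded optab, not an existing GCC interface. Writing x = (2^m - 1)q + r with 0 <= r < 2^m - 1, the sequence returns q whenever q <= 2^m + 1, and every n-bit x satisfies that once m >= n/2, because (2^m - 1)(2^m + 1) = 2^(2m) - 1 >= 2^n - 1.

```python
def pow2m1_quotient(x: int, m: int) -> int:
    """The add/shift sequence for divisor 2**m - 1, with no range check."""
    t = (x + (1 << m) + 1) >> m  # for m == 8 this is (x + 257) >> 8
    return (x + t) >> m


def udiv_pow2m1(x: int, mask: int, n: int) -> int:
    """x // mask for an n-bit unsigned x, where mask == 2**m - 1.

    Hypothetical helper: the divisor is passed explicitly rather than being
    implied by the mode, and only n/2 <= m < n is accepted.
    """
    m = mask.bit_length()
    if mask == 0 or mask & (mask + 1) or not n <= 2 * m < 2 * n:
        raise ValueError("mask must be 2**m - 1 with n/2 <= m < n")
    if not 0 <= x < (1 << n):
        raise ValueError("x must be an n-bit unsigned value")
    return pow2m1_quotient(x, m)


# Exhaustive check for 16-bit x and every m in [8, 16).
for m in range(8, 16):
    d = (1 << m) - 1
    assert all(udiv_pow2m1(x, d, 16) == x // d for x in range(1 << 16))

# With m < n/2 the same sequence is no longer exact: 270 // 15 == 18.
assert pow2m1_quotient(270, 4) == 17
```

The range check in `udiv_pow2m1` is roughly what a generalized optab taking the mask as an operand would presumably need to enforce, since below m == n/2 the sequence silently produces wrong quotients.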