From: Kyrylo Tkachov
To: "gcc-patches@gcc.gnu.org"
Subject: RE: [PATCH] aarch64: Implement vector FP absolute compare intrinsics with builtins
Date: Thu, 25 May 2023 08:49:56 +0000

> -----Original Message-----
> From: Kyrylo Tkachov
> Sent: Thursday, May 18, 2023 12:14 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford
> Subject: [PATCH] aarch64: Implement vector FP absolute compare intrinsics
> with builtins
>
> Hi all,
>
> While optimising some vector math library code with intrinsics we stumbled
> upon the issue in the testcase.
> The compiler should be generating a FACGT instruction but instead we
> generate:
> foo(__Float32x4_t, __Float32x4_t, __Float32x4_t):
>         fabs    v0.4s, v0.4s
>         adrp    x0, .LC0
>         ldr     q31, [x0, #:lo12:.LC0]
>         fcmgt   v0.4s, v0.4s, v31.4s
>         ret
>
> This is because the vcagtq_f32 intrinsic is open-coded in arm_neon.h as
> return vabsq_f32 (__a) > vabsq_f32 (__b)
> thus relying on the optimisers to merge it back together. But since one of
> the arms of the comparison is a vector constant the combine pass optimises
> the abs into it and tries matching:
> (set (reg:V4SI 101)
>     (neg:V4SI (gt:V4SI (reg:V4SF 100)
>             (const_vector:V4SF [
>                     (const_double:SF 1.0e+2 [0x0.c8p+7]) repeated x4
>                 ]))))
> and
> (set (reg:V4SI 101)
>     (neg:V4SI (gt:V4SI (abs:V4SF (reg:V4SF 104))
>             (reg:V4SF 103))))
>
> instead of what we want:
> (insn 13 9 14 2 (set (reg/i:V4SI 32 v0)
>         (neg:V4SI (gt:V4SI (abs:V4SF (reg:V4SF 98))
>                 (abs:V4SF (reg:V4SF 96)))))
>
> I don't really see a good way around that with our current implementation
> of these intrinsics.
> Therefore this patch reimplements these intrinsics with aarch64 builtins
> that generate the RTL for these instructions directly. Apparently we
> already had them defined in aarch64-simd-builtins.def and have been
> using them for the fp16 case already.
> I realise that this approach is against the general principle of expressing
> intrinsics in the higher-level constructs,
> so I'm willing to listen to counter-arguments.
> That said, the FACGT/FACGE instructions are as fast as the non-ABS
> comparison instructions on all microarchitectures that I know of
> so it should always be a win to have them in the merged form rather than
> split the fabs step separately or try to hoist it.
> And the testcase does come from real library code that we're trying to
> optimise.
> With this patch for the testcase we generate:
> foo:
>         adrp    x0, .LC0
>         ldr     q31, [x0, #:lo12:.LC0]
>         facgt   v0.4s, v0.4s, v31.4s
>         ret
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> I'll hold off on committing this to give folks a few days to comment, but
> will push by the end of next week if there are no objections.

Pushed to trunk.
Thanks,
Kyrill

>
> Thanks,
> Kyrill
>
> gcc/ChangeLog:
>
> 	* config/aarch64/arm_neon.h (vcage_f64): Reimplement with
> 	builtins.
> 	(vcage_f32): Likewise.
> 	(vcages_f32): Likewise.
> 	(vcageq_f32): Likewise.
> 	(vcaged_f64): Likewise.
> 	(vcageq_f64): Likewise.
> 	(vcagts_f32): Likewise.
> 	(vcagt_f32): Likewise.
> 	(vcagt_f64): Likewise.
> 	(vcagtq_f32): Likewise.
> 	(vcagtd_f64): Likewise.
> 	(vcagtq_f64): Likewise.
> 	(vcale_f32): Likewise.
> 	(vcale_f64): Likewise.
> 	(vcaled_f64): Likewise.
> 	(vcales_f32): Likewise.
> 	(vcaleq_f32): Likewise.
> 	(vcaleq_f64): Likewise.
> 	(vcalt_f32): Likewise.
> 	(vcalt_f64): Likewise.
> 	(vcaltd_f64): Likewise.
> 	(vcaltq_f32): Likewise.
> 	(vcaltq_f64): Likewise.
> 	(vcalts_f32): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/simd/facgt_constpool_1.c: New test.
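
For readers without an AArch64 toolchain to hand, the folding described in the quoted message can be modelled in portable scalar C. This is only a rough sketch, not GCC code and not the contents of the new test: the function names facgt_lane, abs_then_gt_lane and check_models_agree are hypothetical, and each function models a single 32-bit lane rather than a full vector. It shows why the two forms compute the same mask when one operand is the non-negative constant 100.0 (the 1.0e+2 in the RTL above): the abs of that constant is the constant itself, so only one abs is left for combine to see.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical scalar model of one FACGT lane:
   all-ones when |a| > |b|, zero otherwise. */
static inline uint32_t facgt_lane(float a, float b)
{
    return fabsf(a) > fabsf(b) ? UINT32_MAX : 0u;
}

/* Hypothetical model of the split form: a separate fabs step followed by
   a plain greater-than compare against an operand used as-is. */
static inline uint32_t abs_then_gt_lane(float a, float c)
{
    float abs_a = fabsf(a);              /* models the separate FABS */
    return abs_a > c ? UINT32_MAX : 0u;  /* models the plain FCMGT */
}

/* For a non-negative constant c, fabsf(c) == c, so both models agree. */
static inline void check_models_agree(void)
{
    const float xs[4] = { -250.0f, 99.5f, 100.0f, 1.0e3f };
    const float c = 100.0f;  /* same value as the constant in the quoted RTL */

    for (int i = 0; i < 4; i++)
        assert(facgt_lane(xs[i], c) == abs_then_gt_lane(xs[i], c));
}
```

The results are identical in this model; the difference the message is concerned with is purely in instruction count, since the merged form needs no separate fabs.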