From: Wilco Dijkstra
To: GCC Patches
CC: Richard Sandiford, Kyrylo Tkachov
Subject: Re: [PATCH] libatomic: Enable lock-free 128-bit atomics on AArch64 [PR110061]
Date: Wed, 5 Jul 2023 17:19:16 +0000

ping

From: Wilco Dijkstra
Sent: 02 June 2023 18:28
To: GCC Patches
Cc: Richard Sandiford; Kyrylo Tkachov
Subject: [PATCH] libatomic: Enable lock-free 128-bit atomics on AArch64 [PR110061]


Enable lock-free 128-bit atomics on AArch64.  This is backwards compatible with
existing binaries, gives better performance than locking atomics and is what
most users expect.

Note 128-bit atomic loads use a load/store exclusive loop if LSE2 is not supported.
This results in an implicit store which is invisible to software as long as the given
address is writeable (which will be true when using atomics in actual code).

A simple test on an old Cortex-A72 showed 2.7x speedup of 128-bit atomics.

Passes regress, OK for commit?

libatomic/
        PR target/110061
        config/linux/aarch64/atomic_16.S: Implement lock-free ARMv8.0 atomics.
        config/linux/aarch64/host-config.h: Use atomic_16.S for baseline v8.0.
        State we have lock-free atomics.

---

diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index 05439ce394b9653c9bcb582761ff7aaa7c8f9643..0485c284117edf54f41959d2fab9341a9567b1cf 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -22,6 +22,21 @@
    .  */
 
 
+/* AArch64 128-bit lock-free atomic implementation.
+
+   128-bit atomics are now lock-free for all AArch64 architecture versions.
+   This is backwards compatible with existing binaries and gives better
+   performance than locking atomics.
+
+   128-bit atomic loads use a exclusive loop if LSE2 is not supported.
+   This results in an implicit store which is invisible to software as long
+   as the given address is writeable.  Since all other atomics have explicit
+   writes, this will be true when using atomics in actual code.
+
+   The libat__16 entry points are ARMv8.0.
+   The libat__16_i1 entry points are used when LSE2 is available.  */
+
+
         .arch   armv8-a+lse
 
 #define ENTRY(name)             \
@@ -37,6 +52,10 @@ name:                                \
         .cfi_endproc;           \
         .size name, .-name;
 
+#define ALIAS(alias,name)      \
+       .global alias;          \
+       .set alias, name;
+
 #define res0 x0
 #define res1 x1
 #define in0  x2
@@ -70,6 +89,24 @@ name:                                \
 #define SEQ_CST 5
 
 
+ENTRY (libat_load_16)
+       mov     x5, x0
+       cbnz    w1, 2f
+
+       /* RELAXED.  */
+1:     ldxp    res0, res1, [x5]
+       stxp    w4, res0, res1, [x5]
+       cbnz    w4, 1b
+       ret
+
+       /* ACQUIRE/CONSUME/SEQ_CST.  */
+2:     ldaxp   res0, res1, [x5]
+       stxp    w4, res0, res1, [x5]
+       cbnz    w4, 2b
+       ret
+END (libat_load_16)
+
+
 ENTRY (libat_load_16_i1)
         cbnz    w1, 1f
 
@@ -93,6 +130,23 @@ ENTRY (libat_load_16_i1)
 END (libat_load_16_i1)
 
 
+ENTRY (libat_store_16)
+       cbnz    w4, 2f
+
+       /* RELAXED.  */
+1:     ldxp    xzr, tmp0, [x0]
+       stxp    w4, in0, in1, [x0]
+       cbnz    w4, 1b
+       ret
+
+       /* RELEASE/SEQ_CST.  */
+2:     ldxp    xzr, tmp0, [x0]
+       stlxp   w4, in0, in1, [x0]
+       cbnz    w4, 2b
+       ret
+END (libat_store_16)
+
+
 ENTRY (libat_store_16_i1)
         cbnz    w4, 1f
 
@@ -101,14 +155,14 @@ ENTRY (libat_store_16_i1)
         ret
 
         /* RELEASE/SEQ_CST.  */
-1:     ldaxp   xzr, tmp0, [x0]
+1:     ldxp    xzr, tmp0, [x0]
         stlxp   w4, in0, in1, [x0]
         cbnz    w4, 1b
         ret
 END (libat_store_16_i1)
 
 
-ENTRY (libat_exchange_16_i1)
+ENTRY (libat_exchange_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -126,22 +180,55 @@ ENTRY (libat_exchange_16_i1)
         stxp    w4, in0, in1, [x5]
         cbnz    w4, 3b
         ret
-4:
-       cmp     w4, RELEASE
-       b.ne    6f
 
-       /* RELEASE.  */
-5:     ldxp    res0, res1, [x5]
+       /* RELEASE/ACQ_REL/SEQ_CST.  */
+4:     ldaxp   res0, res1, [x5]
         stlxp   w4, in0, in1, [x5]
-       cbnz    w4, 5b
+       cbnz    w4, 4b
         ret
+END (libat_exchange_16)
 
-       /* ACQ_REL/SEQ_CST.  */
-6:     ldaxp   res0, res1, [x5]
-       stlxp   w4, in0, in1, [x5]
-       cbnz    w4, 6b
+
+ENTRY (libat_compare_exchange_16)
+       ldp     exp0, exp1, [x1]
+       cbz     w4, 3f
+       cmp     w4, RELEASE
+       b.hs    4f
+
+       /* ACQUIRE/CONSUME.  */
+1:     ldaxp   tmp0, tmp1, [x0]
+       cmp     tmp0, exp0
+       ccmp    tmp1, exp1, 0, eq
+       bne     2f
+       stxp    w4, in0, in1, [x0]
+       cbnz    w4, 1b
+       mov     x0, 1
         ret
-END (libat_exchange_16_i1)
+
+2:     stp     tmp0, tmp1, [x1]
+       mov     x0, 0
+       ret
+
+       /* RELAXED.  */
+3:     ldxp    tmp0, tmp1, [x0]
+       cmp     tmp0, exp0
+       ccmp    tmp1, exp1, 0, eq
+       bne     2b
+       stxp    w4, in0, in1, [x0]
+       cbnz    w4, 3b
+       mov     x0, 1
+       ret
+
+       /* RELEASE/ACQ_REL/SEQ_CST.  */
+4:     ldaxp   tmp0, tmp1, [x0]
+       cmp     tmp0, exp0
+       ccmp    tmp1, exp1, 0, eq
+       bne     2b
+       stlxp   w4, in0, in1, [x0]
+       cbnz    w4, 4b
+       mov     x0, 1
+       ret
+END (libat_compare_exchange_16)
 
 
 ENTRY (libat_compare_exchange_16_i1)
@@ -180,7 +267,7 @@ ENTRY (libat_compare_exchange_16_i1)
 END (libat_compare_exchange_16_i1)
 
 
-ENTRY (libat_fetch_add_16_i1)
+ENTRY (libat_fetch_add_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -199,10 +286,10 @@ ENTRY (libat_fetch_add_16_i1)
         stlxp   w4, tmp0, tmp1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_fetch_add_16_i1)
+END (libat_fetch_add_16)
 
 
-ENTRY (libat_add_fetch_16_i1)
+ENTRY (libat_add_fetch_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -221,10 +308,10 @@ ENTRY (libat_add_fetch_16_i1)
         stlxp   w4, res0, res1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_add_fetch_16_i1)
+END (libat_add_fetch_16)
 
 
-ENTRY (libat_fetch_sub_16_i1)
+ENTRY (libat_fetch_sub_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -243,10 +330,10 @@ ENTRY (libat_fetch_sub_16_i1)
         stlxp   w4, tmp0, tmp1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_fetch_sub_16_i1)
+END (libat_fetch_sub_16)
 
 
-ENTRY (libat_sub_fetch_16_i1)
+ENTRY (libat_sub_fetch_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -265,10 +352,10 @@ ENTRY (libat_sub_fetch_16_i1)
         stlxp   w4, res0, res1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_sub_fetch_16_i1)
+END (libat_sub_fetch_16)
 
 
-ENTRY (libat_fetch_or_16_i1)
+ENTRY (libat_fetch_or_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -287,10 +374,10 @@ ENTRY (libat_fetch_or_16_i1)
         stlxp   w4, tmp0, tmp1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_fetch_or_16_i1)
+END (libat_fetch_or_16)
 
 
-ENTRY (libat_or_fetch_16_i1)
+ENTRY (libat_or_fetch_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -309,10 +396,10 @@ ENTRY (libat_or_fetch_16_i1)
         stlxp   w4, res0, res1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_or_fetch_16_i1)
+END (libat_or_fetch_16)
 
 
-ENTRY (libat_fetch_and_16_i1)
+ENTRY (libat_fetch_and_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -331,10 +418,10 @@ ENTRY (libat_fetch_and_16_i1)
         stlxp   w4, tmp0, tmp1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_fetch_and_16_i1)
+END (libat_fetch_and_16)
 
 
-ENTRY (libat_and_fetch_16_i1)
+ENTRY (libat_and_fetch_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -353,10 +440,10 @@ ENTRY (libat_and_fetch_16_i1)
         stlxp   w4, res0, res1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_and_fetch_16_i1)
+END (libat_and_fetch_16)
 
 
-ENTRY (libat_fetch_xor_16_i1)
+ENTRY (libat_fetch_xor_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -375,10 +462,10 @@ ENTRY (libat_fetch_xor_16_i1)
         stlxp   w4, tmp0, tmp1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_fetch_xor_16_i1)
+END (libat_fetch_xor_16)
 
 
-ENTRY (libat_xor_fetch_16_i1)
+ENTRY (libat_xor_fetch_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -397,10 +484,10 @@ ENTRY (libat_xor_fetch_16_i1)
         stlxp   w4, res0, res1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_xor_fetch_16_i1)
+END (libat_xor_fetch_16)
 
 
-ENTRY (libat_fetch_nand_16_i1)
+ENTRY (libat_fetch_nand_16)
         mov     x5, x0
         mvn     in0, in0
         mvn     in1, in1
@@ -421,10 +508,10 @@ ENTRY (libat_fetch_nand_16_i1)
         stlxp   w4, tmp0, tmp1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_fetch_nand_16_i1)
+END (libat_fetch_nand_16)
 
 
-ENTRY (libat_nand_fetch_16_i1)
+ENTRY (libat_nand_fetch_16)
         mov     x5, x0
         mvn     in0, in0
         mvn     in1, in1
@@ -445,21 +532,38 @@ ENTRY (libat_nand_fetch_16_i1)
         stlxp   w4, res0, res1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_nand_fetch_16_i1)
+END (libat_nand_fetch_16)
 
 
-ENTRY (libat_test_and_set_16_i1)
-       mov     w2, 1
-       cbnz    w1, 2f
-
-       /* RELAXED.  */
-       swpb    w0, w2, [x0]
-       ret
+/* __atomic_test_and_set is always inlined, so this entry is unused and
+   only required for completeness.  */
+ENTRY (libat_test_and_set_16)
 
-       /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
-2:     swpalb  w0, w2, [x0]
+       /* RELAXED/ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
+       mov     x5, x0
+1:     ldaxrb  w0, [x5]
+       stlxrb  w4, w2, [x5]
+       cbnz    w4, 1b
         ret
-END (libat_test_and_set_16_i1)
+END (libat_test_and_set_16)
+
+
+/* Alias entry points which are the same in baseline and LSE2.  */
+
+ALIAS (libat_exchange_16_i1, libat_exchange_16)
+ALIAS (libat_fetch_add_16_i1, libat_fetch_add_16)
+ALIAS (libat_add_fetch_16_i1, libat_add_fetch_16)
+ALIAS (libat_fetch_sub_16_i1, libat_fetch_sub_16)
+ALIAS (libat_sub_fetch_16_i1, libat_sub_fetch_16)
+ALIAS (libat_fetch_or_16_i1, libat_fetch_or_16)
+ALIAS (libat_or_fetch_16_i1, libat_or_fetch_16)
+ALIAS (libat_fetch_and_16_i1, libat_fetch_and_16)
+ALIAS (libat_and_fetch_16_i1, libat_and_fetch_16)
+ALIAS (libat_fetch_xor_16_i1, libat_fetch_xor_16)
+ALIAS (libat_xor_fetch_16_i1, libat_xor_fetch_16)
+ALIAS (libat_fetch_nand_16_i1, libat_fetch_nand_16)
+ALIAS (libat_nand_fetch_16_i1, libat_nand_fetch_16)
+ALIAS (libat_test_and_set_16_i1, libat_test_and_set_16)
 
 
 /* GNU_PROPERTY_AARCH64_* macros from elf.h for use in asm code.  */
diff --git a/libatomic/config/linux/aarch64/host-config.h b/libatomic/config/linux/aarch64/host-config.h
index bea26825b4f75bb8ff348ab4b5fc45f4a5bd561e..851c78c01cd643318aaa52929ce4550266238b79 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -35,10 +35,19 @@
 #endif
 #define IFUNC_NCOND(N)  (1)
 
-#if N == 16 && IFUNC_ALT != 0
+#endif /* HAVE_IFUNC */
+
+/* All 128-bit atomic functions are defined in aarch64/atomic_16.S.  */
+#if N == 16
 # define DONE 1
 #endif
 
-#endif /* HAVE_IFUNC */
+/* State we have lock-free 128-bit atomics.  */
+#undef FAST_ATOMIC_LDST_16
+#define FAST_ATOMIC_LDST_16            1
+#undef MAYBE_HAVE_ATOMIC_CAS_16
+#define MAYBE_HAVE_ATOMIC_CAS_16       1
+#undef MAYBE_HAVE_ATOMIC_EXCHANGE_16
+#define MAYBE_HAVE_ATOMIC_EXCHANGE_16  1
 
 #include_next
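
Illustration only, not part of the patch: the baseline libat_load_16 in the diff reads the value with a load-exclusive, writes the same value back with a store-exclusive, and retries if the store fails. A rough C11 analogy is sketched below. It assumes a 64-bit object and a compare-exchange loop in place of LDXP/STXP so that it builds without libatomic, and `load_via_writeback` is a hypothetical name, not a libatomic entry point.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper (not in libatomic): an atomic load implemented as a
   read followed by a write-back of the same value.  This mirrors the shape
   of the ldaxp/stxp retry loop, using a 64-bit CAS as a stand-in.  */
uint64_t
load_via_writeback (_Atomic uint64_t *p)
{
  uint64_t seen = 0;   /* Initial guess; corrected by a failed CAS.  */

  for (;;)
    {
      uint64_t desired = seen;

      /* On failure the CAS copies the current value of *p into 'seen' and
         we retry.  On success it has stored 'desired', which equals the
         value just observed, so memory is unchanged.  */
      if (atomic_compare_exchange_weak_explicit (p, &seen, desired,
                                                 memory_order_acquire,
                                                 memory_order_relaxed))
        return seen;
    }
}
```

Calling `load_via_writeback (&v)` on an `_Atomic uint64_t v` returns its current value and leaves it unchanged. Since the loop always ends with a store, it would fault on read-only memory, which is the same caveat the patch description gives for 128-bit loads without LSE2.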