From: Wilco Dijkstra
To: GCC Patches
CC: Richard Sandiford, Kyrylo Tkachov
Subject: Re: [PATCH] libatomic: Enable lock-free 128-bit atomics on AArch64 [PR110061]
Date: Fri, 4 Aug 2023 15:07:59 +0000

ping

From: Wilco Dijkstra
Sent: 02 June 2023 18:28
To: GCC Patches
Cc: Richard Sandiford; Kyrylo Tkachov
Subject: [PATCH] libatomic: Enable lock-free 128-bit atomics on AArch64 [PR110061]

Enable lock-free 128-bit atomics on AArch64.  This is backwards compatible with
existing binaries, gives better performance than locking atomics and is what
most users expect.

Note that 128-bit atomic loads use a load/store exclusive loop if LSE2 is not supported.
This results in an implicit store which is invisible to software as long as the given
address is writeable (which will be true when using atomics in actual code).

A simple test on an old Cortex-A72 showed 2.7x speedup of 128-bit atomics.

Passes regress, OK for commit?

libatomic/
        PR target/110061
        config/linux/aarch64/atomic_16.S: Implement lock-free ARMv8.0 atomics.
        config/linux/aarch64/host-config.h: Use atomic_16.S for baseline v8.0.
        State we have lock-free atomics.

---

diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index 05439ce394b9653c9bcb582761ff7aaa7c8f9643..0485c284117edf54f41959d2fab9341a9567b1cf 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -22,6 +22,21 @@
    .  */
 
 
+/* AArch64 128-bit lock-free atomic implementation.
+
+   128-bit atomics are now lock-free for all AArch64 architecture versions.
+   This is backwards compatible with existing binaries and gives better
+   performance than locking atomics.
+
+   128-bit atomic loads use an exclusive loop if LSE2 is not supported.
+   This results in an implicit store which is invisible to software as long
+   as the given address is writeable.  Since all other atomics have explicit
+   writes, this will be true when using atomics in actual code.
+
+   The libat__16 entry points are ARMv8.0.
+   The libat__16_i1 entry points are used when LSE2 is available.  */
+
+
         .arch   armv8-a+lse
 
 #define ENTRY(name)             \
@@ -37,6 +52,10 @@ name:                                \
         .cfi_endproc;           \
         .size name, .-name;
 
+#define ALIAS(alias,name)       \
+       .global alias;          \
+       .set alias, name;
+
 #define res0 x0
 #define res1 x1
 #define in0  x2
@@ -70,6 +89,24 @@ name:                                \
 #define SEQ_CST 5
 
 
+ENTRY (libat_load_16)
+       mov     x5, x0
+       cbnz    w1, 2f
+
+       /* RELAXED.  */
+1:     ldxp    res0, res1, [x5]
+       stxp    w4, res0, res1, [x5]
+       cbnz    w4, 1b
+       ret
+
+       /* ACQUIRE/CONSUME/SEQ_CST.  */
+2:     ldaxp   res0, res1, [x5]
+       stxp    w4, res0, res1, [x5]
+       cbnz    w4, 2b
+       ret
+END (libat_load_16)
+
+
 ENTRY (libat_load_16_i1)
         cbnz    w1, 1f
 
@@ -93,6 +130,23 @@ ENTRY (libat_load_16_i1)
 END (libat_load_16_i1)
 
 
+ENTRY (libat_store_16)
+       cbnz    w4, 2f
+
+       /* RELAXED.  */
+1:     ldxp    xzr, tmp0, [x0]
+       stxp    w4, in0, in1, [x0]
+       cbnz    w4, 1b
+       ret
+
+       /* RELEASE/SEQ_CST.  */
+2:     ldxp    xzr, tmp0, [x0]
+       stlxp   w4, in0, in1, [x0]
+       cbnz    w4, 2b
+       ret
+END (libat_store_16)
+
+
 ENTRY (libat_store_16_i1)
         cbnz    w4, 1f
 
@@ -101,14 +155,14 @@ ENTRY (libat_store_16_i1)
         ret
 
         /* RELEASE/SEQ_CST.  */
-1:     ldaxp   xzr, tmp0, [x0]
+1:     ldxp    xzr, tmp0, [x0]
         stlxp   w4, in0, in1, [x0]
         cbnz    w4, 1b
         ret
 END (libat_store_16_i1)
 
 
-ENTRY (libat_exchange_16_i1)
+ENTRY (libat_exchange_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -126,22 +180,55 @@ ENTRY (libat_exchange_16_i1)
         stxp    w4, in0, in1, [x5]
         cbnz    w4, 3b
         ret
-4:
-       cmp     w4, RELEASE
-       b.ne    6f
 
-       /* RELEASE.  */
-5:     ldxp    res0, res1, [x5]
+       /* RELEASE/ACQ_REL/SEQ_CST.  */
+4:     ldaxp   res0, res1, [x5]
         stlxp   w4, in0, in1, [x5]
-       cbnz    w4, 5b
+       cbnz    w4, 4b
         ret
+END (libat_exchange_16)
 
-       /* ACQ_REL/SEQ_CST.  */
-6:     ldaxp   res0, res1, [x5]
-       stlxp   w4, in0, in1, [x5]
-       cbnz    w4, 6b
+
+ENTRY (libat_compare_exchange_16)
+       ldp     exp0, exp1, [x1]
+       cbz     w4, 3f
+       cmp     w4, RELEASE
+       b.hs    4f
+
+       /* ACQUIRE/CONSUME.  */
+1:     ldaxp   tmp0, tmp1, [x0]
+       cmp     tmp0, exp0
+       ccmp    tmp1, exp1, 0, eq
+       bne     2f
+       stxp    w4, in0, in1, [x0]
+       cbnz    w4, 1b
+       mov     x0, 1
         ret
-END (libat_exchange_16_i1)
+
+2:     stp     tmp0, tmp1, [x1]
+       mov     x0, 0
+       ret
+
+       /* RELAXED.  */
+3:     ldxp    tmp0, tmp1, [x0]
+       cmp     tmp0, exp0
+       ccmp    tmp1, exp1, 0, eq
+       bne     2b
+       stxp    w4, in0, in1, [x0]
+       cbnz    w4, 3b
+       mov     x0, 1
+       ret
+
+       /* RELEASE/ACQ_REL/SEQ_CST.  */
+4:     ldaxp   tmp0, tmp1, [x0]
+       cmp     tmp0, exp0
+       ccmp    tmp1, exp1, 0, eq
+       bne     2b
+       stlxp   w4, in0, in1, [x0]
+       cbnz    w4, 4b
+       mov     x0, 1
+       ret
+END (libat_compare_exchange_16)
 
 
 ENTRY (libat_compare_exchange_16_i1)
@@ -180,7 +267,7 @@ ENTRY (libat_compare_exchange_16_i1)
 END (libat_compare_exchange_16_i1)
 
 
-ENTRY (libat_fetch_add_16_i1)
+ENTRY (libat_fetch_add_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -199,10 +286,10 @@ ENTRY (libat_fetch_add_16_i1)
         stlxp   w4, tmp0, tmp1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_fetch_add_16_i1)
+END (libat_fetch_add_16)
 
 
-ENTRY (libat_add_fetch_16_i1)
+ENTRY (libat_add_fetch_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -221,10 +308,10 @@ ENTRY (libat_add_fetch_16_i1)
         stlxp   w4, res0, res1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_add_fetch_16_i1)
+END (libat_add_fetch_16)
 
 
-ENTRY (libat_fetch_sub_16_i1)
+ENTRY (libat_fetch_sub_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -243,10 +330,10 @@ ENTRY (libat_fetch_sub_16_i1)
         stlxp   w4, tmp0, tmp1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_fetch_sub_16_i1)
+END (libat_fetch_sub_16)
 
 
-ENTRY (libat_sub_fetch_16_i1)
+ENTRY (libat_sub_fetch_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -265,10 +352,10 @@ ENTRY (libat_sub_fetch_16_i1)
         stlxp   w4, res0, res1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_sub_fetch_16_i1)
+END (libat_sub_fetch_16)
 
 
-ENTRY (libat_fetch_or_16_i1)
+ENTRY (libat_fetch_or_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -287,10 +374,10 @@ ENTRY (libat_fetch_or_16_i1)
         stlxp   w4, tmp0, tmp1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_fetch_or_16_i1)
+END (libat_fetch_or_16)
 
 
-ENTRY (libat_or_fetch_16_i1)
+ENTRY (libat_or_fetch_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -309,10 +396,10 @@ ENTRY (libat_or_fetch_16_i1)
         stlxp   w4, res0, res1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_or_fetch_16_i1)
+END (libat_or_fetch_16)
 
 
-ENTRY (libat_fetch_and_16_i1)
+ENTRY (libat_fetch_and_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -331,10 +418,10 @@ ENTRY (libat_fetch_and_16_i1)
         stlxp   w4, tmp0, tmp1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_fetch_and_16_i1)
+END (libat_fetch_and_16)
 
 
-ENTRY (libat_and_fetch_16_i1)
+ENTRY (libat_and_fetch_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -353,10 +440,10 @@ ENTRY (libat_and_fetch_16_i1)
         stlxp   w4, res0, res1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_and_fetch_16_i1)
+END (libat_and_fetch_16)
 
 
-ENTRY (libat_fetch_xor_16_i1)
+ENTRY (libat_fetch_xor_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -375,10 +462,10 @@ ENTRY (libat_fetch_xor_16_i1)
         stlxp   w4, tmp0, tmp1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_fetch_xor_16_i1)
+END (libat_fetch_xor_16)
 
 
-ENTRY (libat_xor_fetch_16_i1)
+ENTRY (libat_xor_fetch_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -397,10 +484,10 @@ ENTRY (libat_xor_fetch_16_i1)
         stlxp   w4, res0, res1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_xor_fetch_16_i1)
+END (libat_xor_fetch_16)
 
 
-ENTRY (libat_fetch_nand_16_i1)
+ENTRY (libat_fetch_nand_16)
         mov     x5, x0
         mvn     in0, in0
         mvn     in1, in1
@@ -421,10 +508,10 @@ ENTRY (libat_fetch_nand_16_i1)
         stlxp   w4, tmp0, tmp1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_fetch_nand_16_i1)
+END (libat_fetch_nand_16)
 
 
-ENTRY (libat_nand_fetch_16_i1)
+ENTRY (libat_nand_fetch_16)
         mov     x5, x0
         mvn     in0, in0
         mvn     in1, in1
@@ -445,21 +532,38 @@ ENTRY (libat_nand_fetch_16_i1)
         stlxp   w4, res0, res1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_nand_fetch_16_i1)
+END (libat_nand_fetch_16)
 
 
-ENTRY (libat_test_and_set_16_i1)
-       mov     w2, 1
-       cbnz    w1, 2f
-
-       /* RELAXED.  */
-       swpb    w0, w2, [x0]
-       ret
+/* __atomic_test_and_set is always inlined, so this entry is unused and
+   only required for completeness.  */
+ENTRY (libat_test_and_set_16)
 
-       /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
-2:     swpalb  w0, w2, [x0]
+       /* RELAXED/ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
+       mov     x5, x0
+1:     ldaxrb  w0, [x5]
+       stlxrb  w4, w2, [x5]
+       cbnz    w4, 1b
         ret
-END (libat_test_and_set_16_i1)
+END (libat_test_and_set_16)
+
+
+/* Alias entry points which are the same in baseline and LSE2.  */
+
+ALIAS (libat_exchange_16_i1, libat_exchange_16)
+ALIAS (libat_fetch_add_16_i1, libat_fetch_add_16)
+ALIAS (libat_add_fetch_16_i1, libat_add_fetch_16)
+ALIAS (libat_fetch_sub_16_i1, libat_fetch_sub_16)
+ALIAS (libat_sub_fetch_16_i1, libat_sub_fetch_16)
+ALIAS (libat_fetch_or_16_i1, libat_fetch_or_16)
+ALIAS (libat_or_fetch_16_i1, libat_or_fetch_16)
+ALIAS (libat_fetch_and_16_i1, libat_fetch_and_16)
+ALIAS (libat_and_fetch_16_i1, libat_and_fetch_16)
+ALIAS (libat_fetch_xor_16_i1, libat_fetch_xor_16)
+ALIAS (libat_xor_fetch_16_i1, libat_xor_fetch_16)
+ALIAS (libat_fetch_nand_16_i1, libat_fetch_nand_16)
+ALIAS (libat_nand_fetch_16_i1, libat_nand_fetch_16)
+ALIAS (libat_test_and_set_16_i1, libat_test_and_set_16)
 
 
 /* GNU_PROPERTY_AARCH64_* macros from elf.h for use in asm code.  */
diff --git a/libatomic/config/linux/aarch64/host-config.h b/libatomic/config/linux/aarch64/host-config.h
index bea26825b4f75bb8ff348ab4b5fc45f4a5bd561e..851c78c01cd643318aaa52929ce4550266238b79 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -35,10 +35,19 @@
 #endif
 #define IFUNC_NCOND(N)  (1)
 
-#if N == 16 && IFUNC_ALT != 0
+#endif /* HAVE_IFUNC */
+
+/* All 128-bit atomic functions are defined in aarch64/atomic_16.S.  */
+#if N == 16
 # define DONE 1
 #endif
 
-#endif /* HAVE_IFUNC */
+/* State we have lock-free 128-bit atomics.  */
+#undef FAST_ATOMIC_LDST_16
+#define FAST_ATOMIC_LDST_16            1
+#undef MAYBE_HAVE_ATOMIC_CAS_16
+#define MAYBE_HAVE_ATOMIC_CAS_16       1
+#undef MAYBE_HAVE_ATOMIC_EXCHANGE_16
+#define MAYBE_HAVE_ATOMIC_EXCHANGE_16  1
 
 #include_next
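
For readers less familiar with the trick used by libat_load_16, here is a rough C sketch of the same idea at 64-bit width: an atomic load expressed as a read-modify-write that writes back the value it has just read. This is an analogy only and not code from the patch. The names load_via_rmw and load_via_rmw_demo are made up for illustration, and the 64-bit width is an assumption chosen so the sketch builds without libatomic. As with the ldxp/stxp loop, the operation contains a store, so the object has to live in writeable memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper, not part of libatomic: an atomic load written as a
   read-modify-write.  OR-ing with 0 stores the old value back unchanged,
   which mirrors the implicit store in the ldxp/stxp loop of the patch.  */
uint64_t load_via_rmw(_Atomic uint64_t *p, memory_order mo)
{
    return atomic_fetch_or_explicit(p, 0, mo);
}

/* Hypothetical self-check: the value is returned and left unmodified.  */
void load_via_rmw_demo(void)
{
    _Atomic uint64_t v = 42;

    assert(load_via_rmw(&v, memory_order_acquire) == 42);
    assert(atomic_load(&v) == 42);
}
```

The value seen by other threads never changes, which is why the extra store is invisible to software. Pointing such a load at read-only memory would be expected to fault, which is the caveat the commit message describes.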