From: Wilco Dijkstra
To: GCC Patches
CC: Richard Sandiford, Kyrylo Tkachov
Subject: Re: [PATCH] libatomic: Enable lock-free 128-bit atomics on AArch64 [PR110061]
Date: Fri, 16 Jun 2023 12:17:19 +0000
Content-Type: text/plain; charset="iso-8859-1"
MIME-Version: 1.0

ping

From: Wilco Dijkstra
Sent: 02 June 2023 18:28
To: GCC Patches
Cc: Richard Sandiford; Kyrylo Tkachov
Subject: [PATCH] libatomic: Enable lock-free 128-bit atomics on AArch64 [PR110061]


Enable lock-free 128-bit atomics on AArch64.  This is backwards compatible with
existing binaries, gives better performance than locking atomics and is what
most users expect.

Note 128-bit atomic loads use a load/store exclusive loop if LSE2 is not supported.
This results in an implicit store which is invisible to software as long as the given
address is writeable (which will be true when using atomics in actual code).

A simple test on an old Cortex-A72 showed 2.7x speedup of 128-bit atomics.

Passes regress, OK for commit?

libatomic/
        PR target/110061
        config/linux/aarch64/atomic_16.S: Implement lock-free ARMv8.0 atomics.
        config/linux/aarch64/host-config.h: Use atomic_16.S for baseline v8.0.
        State we have lock-free atomics.

---

diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index 05439ce394b9653c9bcb582761ff7aaa7c8f9643..0485c284117edf54f41959d2fab9341a9567b1cf 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -22,6 +22,21 @@
    .  */
 
 
+/* AArch64 128-bit lock-free atomic implementation.
+
+   128-bit atomics are now lock-free for all AArch64 architecture versions.
+   This is backwards compatible with existing binaries and gives better
+   performance than locking atomics.
+
+   128-bit atomic loads use a exclusive loop if LSE2 is not supported.
+   This results in an implicit store which is invisible to software as long
+   as the given address is writeable.  Since all other atomics have explicit
+   writes, this will be true when using atomics in actual code.
+
+   The libat__16 entry points are ARMv8.0.
+   The libat__16_i1 entry points are used when LSE2 is available.  */
+
+
         .arch   armv8-a+lse
 
 #define ENTRY(name)             \
@@ -37,6 +52,10 @@ name:                                \
         .cfi_endproc;           \
         .size name, .-name;
 
+#define ALIAS(alias,name)      \
+       .global alias;          \
+       .set alias, name;
+
 #define res0 x0
 #define res1 x1
 #define in0  x2
@@ -70,6 +89,24 @@ name:                                \
 #define SEQ_CST 5
 
 
+ENTRY (libat_load_16)
+       mov     x5, x0
+       cbnz    w1, 2f
+
+       /* RELAXED.  */
+1:     ldxp    res0, res1, [x5]
+       stxp    w4, res0, res1, [x5]
+       cbnz    w4, 1b
+       ret
+
+       /* ACQUIRE/CONSUME/SEQ_CST.  */
+2:     ldaxp   res0, res1, [x5]
+       stxp    w4, res0, res1, [x5]
+       cbnz    w4, 2b
+       ret
+END (libat_load_16)
+
+
 ENTRY (libat_load_16_i1)
         cbnz    w1, 1f
 
@@ -93,6 +130,23 @@ ENTRY (libat_load_16_i1)
 END (libat_load_16_i1)
 
 
+ENTRY (libat_store_16)
+       cbnz    w4, 2f
+
+       /* RELAXED.  */
+1:     ldxp    xzr, tmp0, [x0]
+       stxp    w4, in0, in1, [x0]
+       cbnz    w4, 1b
+       ret
+
+       /* RELEASE/SEQ_CST.  */
+2:     ldxp    xzr, tmp0, [x0]
+       stlxp   w4, in0, in1, [x0]
+       cbnz    w4, 2b
+       ret
+END (libat_store_16)
+
+
 ENTRY (libat_store_16_i1)
         cbnz    w4, 1f
 
@@ -101,14 +155,14 @@ ENTRY (libat_store_16_i1)
         ret
 
         /* RELEASE/SEQ_CST.  */
-1:     ldaxp   xzr, tmp0, [x0]
+1:     ldxp    xzr, tmp0, [x0]
         stlxp   w4, in0, in1, [x0]
         cbnz    w4, 1b
         ret
 END (libat_store_16_i1)
 
 
-ENTRY (libat_exchange_16_i1)
+ENTRY (libat_exchange_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -126,22 +180,55 @@ ENTRY (libat_exchange_16_i1)
         stxp    w4, in0, in1, [x5]
         cbnz    w4, 3b
         ret
-4:
-       cmp     w4, RELEASE
-       b.ne    6f
 
-       /* RELEASE.  */
-5:     ldxp    res0, res1, [x5]
+       /* RELEASE/ACQ_REL/SEQ_CST.  */
+4:     ldaxp   res0, res1, [x5]
         stlxp   w4, in0, in1, [x5]
-       cbnz    w4, 5b
+       cbnz    w4, 4b
         ret
+END (libat_exchange_16)
 
-       /* ACQ_REL/SEQ_CST.  */
-6:     ldaxp   res0, res1, [x5]
-       stlxp   w4, in0, in1, [x5]
-       cbnz    w4, 6b
+
+ENTRY (libat_compare_exchange_16)
+       ldp     exp0, exp1, [x1]
+       cbz     w4, 3f
+       cmp     w4, RELEASE
+       b.hs    4f
+
+       /* ACQUIRE/CONSUME.  */
+1:     ldaxp   tmp0, tmp1, [x0]
+       cmp     tmp0, exp0
+       ccmp    tmp1, exp1, 0, eq
+       bne     2f
+       stxp    w4, in0, in1, [x0]
+       cbnz    w4, 1b
+       mov     x0, 1
         ret
-END (libat_exchange_16_i1)
+
+2:     stp     tmp0, tmp1, [x1]
+       mov     x0, 0
+       ret
+
+       /* RELAXED.  */
+3:     ldxp    tmp0, tmp1, [x0]
+       cmp     tmp0, exp0
+       ccmp    tmp1, exp1, 0, eq
+       bne     2b
+       stxp    w4, in0, in1, [x0]
+       cbnz    w4, 3b
+       mov     x0, 1
+       ret
+
+       /* RELEASE/ACQ_REL/SEQ_CST.  */
+4:     ldaxp   tmp0, tmp1, [x0]
+       cmp     tmp0, exp0
+       ccmp    tmp1, exp1, 0, eq
+       bne     2b
+       stlxp   w4, in0, in1, [x0]
+       cbnz    w4, 4b
+       mov     x0, 1
+       ret
+END (libat_compare_exchange_16)
 
 
 ENTRY (libat_compare_exchange_16_i1)
@@ -180,7 +267,7 @@ ENTRY (libat_compare_exchange_16_i1)
 END (libat_compare_exchange_16_i1)
 
 
-ENTRY (libat_fetch_add_16_i1)
+ENTRY (libat_fetch_add_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -199,10 +286,10 @@ ENTRY (libat_fetch_add_16_i1)
         stlxp   w4, tmp0, tmp1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_fetch_add_16_i1)
+END (libat_fetch_add_16)
 
 
-ENTRY (libat_add_fetch_16_i1)
+ENTRY (libat_add_fetch_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -221,10 +308,10 @@ ENTRY (libat_add_fetch_16_i1)
         stlxp   w4, res0, res1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_add_fetch_16_i1)
+END (libat_add_fetch_16)
 
 
-ENTRY (libat_fetch_sub_16_i1)
+ENTRY (libat_fetch_sub_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -243,10 +330,10 @@ ENTRY (libat_fetch_sub_16_i1)
         stlxp   w4, tmp0, tmp1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_fetch_sub_16_i1)
+END (libat_fetch_sub_16)
 
 
-ENTRY (libat_sub_fetch_16_i1)
+ENTRY (libat_sub_fetch_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -265,10 +352,10 @@ ENTRY (libat_sub_fetch_16_i1)
         stlxp   w4, res0, res1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_sub_fetch_16_i1)
+END (libat_sub_fetch_16)
 
 
-ENTRY (libat_fetch_or_16_i1)
+ENTRY (libat_fetch_or_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -287,10 +374,10 @@ ENTRY (libat_fetch_or_16_i1)
         stlxp   w4, tmp0, tmp1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_fetch_or_16_i1)
+END (libat_fetch_or_16)
 
 
-ENTRY (libat_or_fetch_16_i1)
+ENTRY (libat_or_fetch_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -309,10 +396,10 @@ ENTRY (libat_or_fetch_16_i1)
         stlxp   w4, res0, res1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_or_fetch_16_i1)
+END (libat_or_fetch_16)
 
 
-ENTRY (libat_fetch_and_16_i1)
+ENTRY (libat_fetch_and_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -331,10 +418,10 @@ ENTRY (libat_fetch_and_16_i1)
         stlxp   w4, tmp0, tmp1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_fetch_and_16_i1)
+END (libat_fetch_and_16)
 
 
-ENTRY (libat_and_fetch_16_i1)
+ENTRY (libat_and_fetch_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -353,10 +440,10 @@ ENTRY (libat_and_fetch_16_i1)
         stlxp   w4, res0, res1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_and_fetch_16_i1)
+END (libat_and_fetch_16)
 
 
-ENTRY (libat_fetch_xor_16_i1)
+ENTRY (libat_fetch_xor_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -375,10 +462,10 @@ ENTRY (libat_fetch_xor_16_i1)
         stlxp   w4, tmp0, tmp1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_fetch_xor_16_i1)
+END (libat_fetch_xor_16)
 
 
-ENTRY (libat_xor_fetch_16_i1)
+ENTRY (libat_xor_fetch_16)
         mov     x5, x0
         cbnz    w4, 2f
 
@@ -397,10 +484,10 @@ ENTRY (libat_xor_fetch_16_i1)
         stlxp   w4, res0, res1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_xor_fetch_16_i1)
+END (libat_xor_fetch_16)
 
 
-ENTRY (libat_fetch_nand_16_i1)
+ENTRY (libat_fetch_nand_16)
         mov     x5, x0
         mvn     in0, in0
         mvn     in1, in1
@@ -421,10 +508,10 @@ ENTRY (libat_fetch_nand_16_i1)
         stlxp   w4, tmp0, tmp1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_fetch_nand_16_i1)
+END (libat_fetch_nand_16)
 
 
-ENTRY (libat_nand_fetch_16_i1)
+ENTRY (libat_nand_fetch_16)
         mov     x5, x0
         mvn     in0, in0
         mvn     in1, in1
@@ -445,21 +532,38 @@ ENTRY (libat_nand_fetch_16_i1)
         stlxp   w4, res0, res1, [x5]
         cbnz    w4, 2b
         ret
-END (libat_nand_fetch_16_i1)
+END (libat_nand_fetch_16)
 
 
-ENTRY (libat_test_and_set_16_i1)
-       mov     w2, 1
-       cbnz    w1, 2f
-
-       /* RELAXED.  */
-       swpb    w0, w2, [x0]
-       ret
+/* __atomic_test_and_set is always inlined, so this entry is unused and
+   only required for completeness.  */
+ENTRY (libat_test_and_set_16)
 
-       /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
-2:     swpalb  w0, w2, [x0]
+       /* RELAXED/ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
+       mov     x5, x0
+1:     ldaxrb  w0, [x5]
+       stlxrb  w4, w2, [x5]
+       cbnz    w4, 1b
         ret
-END (libat_test_and_set_16_i1)
+END (libat_test_and_set_16)
+
+
+/* Alias entry points which are the same in baseline and LSE2.  */
+
+ALIAS (libat_exchange_16_i1, libat_exchange_16)
+ALIAS (libat_fetch_add_16_i1, libat_fetch_add_16)
+ALIAS (libat_add_fetch_16_i1, libat_add_fetch_16)
+ALIAS (libat_fetch_sub_16_i1, libat_fetch_sub_16)
+ALIAS (libat_sub_fetch_16_i1, libat_sub_fetch_16)
+ALIAS (libat_fetch_or_16_i1, libat_fetch_or_16)
+ALIAS (libat_or_fetch_16_i1, libat_or_fetch_16)
+ALIAS (libat_fetch_and_16_i1, libat_fetch_and_16)
+ALIAS (libat_and_fetch_16_i1, libat_and_fetch_16)
+ALIAS (libat_fetch_xor_16_i1, libat_fetch_xor_16)
+ALIAS (libat_xor_fetch_16_i1, libat_xor_fetch_16)
+ALIAS (libat_fetch_nand_16_i1, libat_fetch_nand_16)
+ALIAS (libat_nand_fetch_16_i1, libat_nand_fetch_16)
+ALIAS (libat_test_and_set_16_i1, libat_test_and_set_16)
 
 
 /* GNU_PROPERTY_AARCH64_* macros from elf.h for use in asm code.  */
diff --git a/libatomic/config/linux/aarch64/host-config.h b/libatomic/config/linux/aarch64/host-config.h
index bea26825b4f75bb8ff348ab4b5fc45f4a5bd561e..851c78c01cd643318aaa52929ce4550266238b79 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -35,10 +35,19 @@
 #endif
 #define IFUNC_NCOND(N)  (1)
 
-#if N == 16 && IFUNC_ALT != 0
+#endif /* HAVE_IFUNC */
+
+/* All 128-bit atomic functions are defined in aarch64/atomic_16.S.  */
+#if N == 16
 # define DONE 1
 #endif
 
-#endif /* HAVE_IFUNC */
+/* State we have lock-free 128-bit atomics.  */
+#undef FAST_ATOMIC_LDST_16
+#define FAST_ATOMIC_LDST_16            1
+#undef MAYBE_HAVE_ATOMIC_CAS_16
+#define MAYBE_HAVE_ATOMIC_CAS_16       1
+#undef MAYBE_HAVE_ATOMIC_EXCHANGE_16
+#define MAYBE_HAVE_ATOMIC_EXCHANGE_16  1
 
 #include_next
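
As a reading aid for the memory-order dispatch at the top of libat_compare_exchange_16 in the patch (cbz w4, 3f; cmp w4, RELEASE; b.hs 4f), here is a minimal C sketch of which exclusive load/store pair each order ends up using. It is not part of the patch. The enum names and the function cas16_path_for are hypothetical and exist only for illustration; the numeric order values are assumed to match GCC's __ATOMIC_* constants, which is consistent with the "#define SEQ_CST 5" visible in the context lines.

```c
#include <assert.h>

/* Memory-order values, assumed to match GCC's __ATOMIC_* constants.  */
enum order { RELAXED = 0, CONSUME = 1, ACQUIRE = 2,
             RELEASE = 3, ACQ_REL = 4, SEQ_CST = 5 };

/* Hypothetical names for the three retry loops in
   libat_compare_exchange_16.  */
enum cas16_path {
  PATH_LDXP_STXP,   /* label 3: RELAXED.  */
  PATH_LDAXP_STXP,  /* label 1: ACQUIRE/CONSUME.  */
  PATH_LDAXP_STLXP  /* label 4: RELEASE/ACQ_REL/SEQ_CST.  */
};

/* Mirrors the branch sequence: order 0 goes to label 3, any order
   at or above RELEASE goes to label 4, everything else falls
   through to label 1.  */
static enum cas16_path
cas16_path_for (int order)
{
  assert (order >= RELAXED && order <= SEQ_CST);
  if (order == RELAXED)
    return PATH_LDXP_STXP;
  if (order >= RELEASE)
    return PATH_LDAXP_STLXP;
  return PATH_LDAXP_STXP;
}
```

For example, cas16_path_for (ACQ_REL) selects the ldaxp/stlxp loop, the same one used for SEQ_CST, while cas16_path_for (CONSUME) selects ldaxp/stxp.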