From: Wilco Dijkstra
To: Richard Sandiford
CC: GCC Patches, Kyrylo Tkachov
Subject: Re: [PATCH v2] libatomic: Enable lock-free 128-bit atomics on AArch64 [PR110061]
Date: Mon, 4 Dec 2023 18:22:37 +0000

Hi Richard,

>> Enable lock-free 128-bit atomics on AArch64.  This is backwards compatible with
>> existing binaries, gives better performance than locking atomics and is what
>> most users expect.
>
> Please add a justification for why it's backwards compatible, rather
> than just stating that it's so.

This isn't any different from the LSE2 support, which also switches some CPUs to
lock-free implementations. This patch basically switches the rest. It follows
trivially from the fact that GCC always calls into libatomic for 128-bit atomics,
so all of them in a process are switched at once. I'll add that to the description.

Note the compatibility story is even better than this.
We are also compatible with LLVM and future GCC versions, which may inline these sequences.

> Thanks for adding this.  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95722
> suggests that it's still an open question whether this is a correct thing
> to do, but it sounds from Joseph's comment that he isn't sure whether
> atomic loads from read-only data are valid.

Yes, it's not useful to do an atomic read of a read-only value. It should
be feasible to mark atomic types as mutable to force them into .data (see e.g.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109553).

> Linus's comment in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70490
> suggests that a reasonable compromise might be to use a storing
> implementation but not advertise that it is lock-free.  Also,
> the comment above libat_is_lock_free says:
>
> /* Note that this can return that a size/alignment is not lock-free even if
>    all the operations that we use to implement the respective accesses provide
>    lock-free forward progress as specified in C++14:  Users likely expect
>    "lock-free" to also mean "fast", which is why we do not return true if, for
>    example, we implement loads with this size/alignment using a CAS.  */

I don't believe lying about being lock-free like that is a good idea.
When you use a faster lock-free implementation, you want to tell users about it
(so they aren't forced to use nasty inline assembler hacks, for example).

> We don't use a CAS for the fallbacks, but like you say, we do use a
> load/store exclusive loop.  So did you consider not doing this:

> +/* State we have lock-free 128-bit atomics.  */
> +#undef FAST_ATOMIC_LDST_16
> +#define FAST_ATOMIC_LDST_16            1

That would result in __atomic_is_lock_free incorrectly returning false.
Note that __atomic_always_lock_free remains false for 128-bit since there
is no inlining in the compiler, but __atomic_is_lock_free should be true.

> -       /* RELEASE.  */
> -5:     ldxp    res0, res1, [x5]
> +       /* RELEASE/ACQ_REL/SEQ_CST.  */
> +4:     ldaxp   res0, res1, [x5]
>         stlxp   w4, in0, in1, [x5]
> -       cbnz    w4, 5b
> +       cbnz    w4, 4b
>         ret
> +END (libat_exchange_16)

> Please explain (here and in the commit message) why you're adding
> acquire semantics to the RELEASE case.

That merges the RELEASE case with the ACQ_REL/SEQ_CST cases to keep the code
short and simple, as much of the existing code already does. I've added a note
in the commit message.

Cheers,
Wilco

Here is v2 - this also incorporates the PR111404 fix to compare-exchange:

Enable lock-free 128-bit atomics on AArch64.
This is backwards compatible with existing binaries (GCC always calls into
libatomic for these, so all 128-bit atomic uses in a process are switched
together), gives better performance than locking atomics and is what most
users expect.

Note that 128-bit atomic loads use a load/store exclusive loop if LSE2 is not
supported.  This results in an implicit store which is invisible to software
as long as the given address is writeable (which will be true when using
atomics in actual code).

Passes regress, OK for commit?

libatomic/
	* config/linux/aarch64/atomic_16.S: Implement lock-free ARMv8.0 atomics.
	(libat_exchange_16): Merge RELEASE and ACQ_REL/SEQ_CST cases.
	* config/linux/aarch64/host-config.h: Use atomic_16.S for baseline v8.0.
	State we have lock-free atomics.

---

diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index 05439ce394b9653c9bcb582761ff7aaa7c8f9643..a099037179b3f1210145baea02a9d43418629813 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -22,6 +22,22 @@
    . */


+/* AArch64 128-bit lock-free atomic implementation.
+
+   128-bit atomics are now lock-free for all AArch64 architecture versions.
+   This is backwards compatible with existing binaries (as we swap all uses
+   of 128-bit atomics via an ifunc) and gives better performance than locking
+   atomics.
+
+   128-bit atomic loads use an exclusive loop if LSE2 is not supported.
+   This results in an implicit store which is invisible to software as long
+   as the given address is writeable.  Since all other atomics have explicit
+   writes, this will be true when using atomics in actual code.
+
+   The libat__16 entry points are ARMv8.0.
+   The libat__16_i1 entry points are used when LSE2 is available.  */
+
+
        .arch   armv8-a+lse

 #define ENTRY(name) \
@@ -37,6 +53,10 @@ name: \
        .cfi_endproc; \
        .size   name, .-name;

+#define ALIAS(alias,name) \
+       .global alias; \
+       .set    alias, name;
+
 #define res0 x0
 #define res1 x1
 #define in0 x2
@@ -70,6 +90,24 @@ name: \
 #define SEQ_CST 5


+ENTRY (libat_load_16)
+       mov     x5, x0
+       cbnz    w1, 2f
+
+       /* RELAXED.  */
+1:     ldxp    res0, res1, [x5]
+       stxp    w4, res0, res1, [x5]
+       cbnz    w4, 1b
+       ret
+
+       /* ACQUIRE/CONSUME/SEQ_CST.  */
+2:     ldaxp   res0, res1, [x5]
+       stxp    w4, res0, res1, [x5]
+       cbnz    w4, 2b
+       ret
+END (libat_load_16)
+
+
 ENTRY (libat_load_16_i1)
        cbnz    w1, 1f

@@ -93,6 +131,23 @@ ENTRY (libat_load_16_i1)
 END (libat_load_16_i1)


+ENTRY (libat_store_16)
+       cbnz    w4, 2f
+
+       /* RELAXED.  */
+1:     ldxp    xzr, tmp0, [x0]
+       stxp    w4, in0, in1, [x0]
+       cbnz    w4, 1b
+       ret
+
+       /* RELEASE/SEQ_CST.  */
+2:     ldxp    xzr, tmp0, [x0]
+       stlxp   w4, in0, in1, [x0]
+       cbnz    w4, 2b
+       ret
+END (libat_store_16)
+
+
 ENTRY (libat_store_16_i1)
        cbnz    w4, 1f

@@ -101,14 +156,14 @@ ENTRY (libat_store_16_i1)
        ret

        /* RELEASE/SEQ_CST.  */
-1:     ldaxp   xzr, tmp0, [x0]
+1:     ldxp    xzr, tmp0, [x0]
        stlxp   w4, in0, in1, [x0]
        cbnz    w4, 1b
        ret
 END (libat_store_16_i1)


-ENTRY (libat_exchange_16_i1)
+ENTRY (libat_exchange_16)
        mov     x5, x0
        cbnz    w4, 2f

@@ -126,22 +181,60 @@ ENTRY (libat_exchange_16_i1)
        stxp    w4, in0, in1, [x5]
        cbnz    w4, 3b
        ret
-4:
-       cmp     w4, RELEASE
-       b.ne    6f

-       /* RELEASE.  */
-5:     ldxp    res0, res1, [x5]
+       /* RELEASE/ACQ_REL/SEQ_CST.  */
+4:     ldaxp   res0, res1, [x5]
        stlxp   w4, in0, in1, [x5]
-       cbnz    w4, 5b
+       cbnz    w4, 4b
        ret
+END (libat_exchange_16)

-       /* ACQ_REL/SEQ_CST.  */
-6:     ldaxp   res0, res1, [x5]
-       stlxp   w4, in0, in1, [x5]
-       cbnz    w4, 6b
+
+ENTRY (libat_compare_exchange_16)
+       ldp     exp0, exp1, [x1]
+       cbz     w4, 3f
+       cmp     w4, RELEASE
+       b.hs    5f
+
+       /* ACQUIRE/CONSUME.  */
+1:     ldaxp   tmp0, tmp1, [x0]
+       cmp     tmp0, exp0
+       ccmp    tmp1, exp1, 0, eq
+       csel    tmp0, in0, tmp0, eq
+       csel    tmp1, in1, tmp1, eq
+       stxp    w4, tmp0, tmp1, [x0]
+       cbnz    w4, 1b
+       beq     2f
+       stp     tmp0, tmp1, [x1]
+2:     cset    x0, eq
+       ret
+
+       /* RELAXED.  */
+3:     ldxp    tmp0, tmp1, [x0]
+       cmp     tmp0, exp0
+       ccmp    tmp1, exp1, 0, eq
+       csel    tmp0, in0, tmp0, eq
+       csel    tmp1, in1, tmp1, eq
+       stxp    w4, tmp0, tmp1, [x0]
+       cbnz    w4, 3b
+       beq     4f
+       stp     tmp0, tmp1, [x1]
+4:     cset    x0, eq
+       ret
+
+       /* RELEASE/ACQ_REL/SEQ_CST.  */
+5:     ldaxp   tmp0, tmp1, [x0]
+       cmp     tmp0, exp0
+       ccmp    tmp1, exp1, 0, eq
+       csel    tmp0, in0, tmp0, eq
+       csel    tmp1, in1, tmp1, eq
+       stlxp   w4, tmp0, tmp1, [x0]
+       cbnz    w4, 5b
+       beq     6f
+       stp     tmp0, tmp1, [x1]
+6:     cset    x0, eq
        ret
-END (libat_exchange_16_i1)
+END (libat_compare_exchange_16)


 ENTRY (libat_compare_exchange_16_i1)
@@ -180,7 +273,7 @@ ENTRY (libat_compare_exchange_16_i1)
 END (libat_compare_exchange_16_i1)


-ENTRY (libat_fetch_add_16_i1)
+ENTRY (libat_fetch_add_16)
        mov     x5, x0
        cbnz    w4, 2f

@@ -199,10 +292,10 @@ ENTRY (libat_fetch_add_16_i1)
        stlxp   w4, tmp0, tmp1, [x5]
        cbnz    w4, 2b
        ret
-END (libat_fetch_add_16_i1)
+END (libat_fetch_add_16)


-ENTRY (libat_add_fetch_16_i1)
+ENTRY (libat_add_fetch_16)
        mov     x5, x0
        cbnz    w4, 2f

@@ -221,10 +314,10 @@ ENTRY (libat_add_fetch_16_i1)
        stlxp   w4, res0, res1, [x5]
        cbnz    w4, 2b
        ret
-END (libat_add_fetch_16_i1)
+END (libat_add_fetch_16)


-ENTRY (libat_fetch_sub_16_i1)
+ENTRY (libat_fetch_sub_16)
        mov     x5, x0
        cbnz    w4, 2f

@@ -243,10 +336,10 @@ ENTRY (libat_fetch_sub_16_i1)
        stlxp   w4, tmp0, tmp1, [x5]
        cbnz    w4, 2b
        ret
-END (libat_fetch_sub_16_i1)
+END (libat_fetch_sub_16)


-ENTRY (libat_sub_fetch_16_i1)
+ENTRY (libat_sub_fetch_16)
        mov     x5, x0
        cbnz    w4, 2f

@@ -265,10 +358,10 @@ ENTRY (libat_sub_fetch_16_i1)
        stlxp   w4, res0, res1, [x5]
        cbnz    w4, 2b
        ret
-END (libat_sub_fetch_16_i1)
+END (libat_sub_fetch_16)


-ENTRY (libat_fetch_or_16_i1)
+ENTRY (libat_fetch_or_16)
        mov     x5, x0
        cbnz    w4, 2f

@@ -287,10 +380,10 @@ ENTRY (libat_fetch_or_16_i1)
        stlxp   w4, tmp0, tmp1, [x5]
        cbnz    w4, 2b
        ret
-END (libat_fetch_or_16_i1)
+END (libat_fetch_or_16)


-ENTRY (libat_or_fetch_16_i1)
+ENTRY (libat_or_fetch_16)
        mov     x5, x0
        cbnz    w4, 2f

@@ -309,10 +402,10 @@ ENTRY (libat_or_fetch_16_i1)
        stlxp   w4, res0, res1, [x5]
        cbnz    w4, 2b
        ret
-END (libat_or_fetch_16_i1)
+END (libat_or_fetch_16)


-ENTRY (libat_fetch_and_16_i1)
+ENTRY (libat_fetch_and_16)
        mov     x5, x0
        cbnz    w4, 2f

@@ -331,10 +424,10 @@ ENTRY (libat_fetch_and_16_i1)
        stlxp   w4, tmp0, tmp1, [x5]
        cbnz    w4, 2b
        ret
-END (libat_fetch_and_16_i1)
+END (libat_fetch_and_16)


-ENTRY (libat_and_fetch_16_i1)
+ENTRY (libat_and_fetch_16)
        mov     x5, x0
        cbnz    w4, 2f

@@ -353,10 +446,10 @@ ENTRY (libat_and_fetch_16_i1)
        stlxp   w4, res0, res1, [x5]
        cbnz    w4, 2b
        ret
-END (libat_and_fetch_16_i1)
+END (libat_and_fetch_16)


-ENTRY (libat_fetch_xor_16_i1)
+ENTRY (libat_fetch_xor_16)
        mov     x5, x0
        cbnz    w4, 2f

@@ -375,10 +468,10 @@ ENTRY (libat_fetch_xor_16_i1)
        stlxp   w4, tmp0, tmp1, [x5]
        cbnz    w4, 2b
        ret
-END (libat_fetch_xor_16_i1)
+END (libat_fetch_xor_16)


-ENTRY (libat_xor_fetch_16_i1)
+ENTRY (libat_xor_fetch_16)
        mov     x5, x0
        cbnz    w4, 2f

@@ -397,10 +490,10 @@ ENTRY (libat_xor_fetch_16_i1)
        stlxp   w4, res0, res1, [x5]
        cbnz    w4, 2b
        ret
-END (libat_xor_fetch_16_i1)
+END (libat_xor_fetch_16)


-ENTRY (libat_fetch_nand_16_i1)
+ENTRY (libat_fetch_nand_16)
        mov     x5, x0
        mvn     in0, in0
        mvn     in1, in1
@@ -421,10 +514,10 @@ ENTRY (libat_fetch_nand_16_i1)
        stlxp   w4, tmp0, tmp1, [x5]
        cbnz    w4, 2b
        ret
-END (libat_fetch_nand_16_i1)
+END (libat_fetch_nand_16)


-ENTRY (libat_nand_fetch_16_i1)
+ENTRY (libat_nand_fetch_16)
        mov     x5, x0
        mvn     in0, in0
        mvn     in1, in1
@@ -445,21 +538,38 @@ ENTRY (libat_nand_fetch_16_i1)
        stlxp   w4, res0, res1, [x5]
        cbnz    w4, 2b
        ret
-END (libat_nand_fetch_16_i1)
+END (libat_nand_fetch_16)


-ENTRY (libat_test_and_set_16_i1)
-       mov     w2, 1
-       cbnz    w1, 2f
-
-       /* RELAXED.  */
-       swpb    w0, w2, [x0]
-       ret
+/* __atomic_test_and_set is always inlined, so this entry is unused and
+   only required for completeness.  */
+ENTRY (libat_test_and_set_16)

-       /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
-2:     swpalb  w0, w2, [x0]
+       /* RELAXED/ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
+       mov     x5, x0
+1:     ldaxrb  w0, [x5]
+       stlxrb  w4, w2, [x5]
+       cbnz    w4, 1b
        ret
-END (libat_test_and_set_16_i1)
+END (libat_test_and_set_16)
+
+
+/* Alias entry points which are the same in baseline and LSE2.  */
+
+ALIAS (libat_exchange_16_i1, libat_exchange_16)
+ALIAS (libat_fetch_add_16_i1, libat_fetch_add_16)
+ALIAS (libat_add_fetch_16_i1, libat_add_fetch_16)
+ALIAS (libat_fetch_sub_16_i1, libat_fetch_sub_16)
+ALIAS (libat_sub_fetch_16_i1, libat_sub_fetch_16)
+ALIAS (libat_fetch_or_16_i1, libat_fetch_or_16)
+ALIAS (libat_or_fetch_16_i1, libat_or_fetch_16)
+ALIAS (libat_fetch_and_16_i1, libat_fetch_and_16)
+ALIAS (libat_and_fetch_16_i1, libat_and_fetch_16)
+ALIAS (libat_fetch_xor_16_i1, libat_fetch_xor_16)
+ALIAS (libat_xor_fetch_16_i1, libat_xor_fetch_16)
+ALIAS (libat_fetch_nand_16_i1, libat_fetch_nand_16)
+ALIAS (libat_nand_fetch_16_i1, libat_nand_fetch_16)
+ALIAS (libat_test_and_set_16_i1, libat_test_and_set_16)


 /* GNU_PROPERTY_AARCH64_* macros from elf.h for use in asm code.  */
diff --git a/libatomic/config/linux/aarch64/host-config.h b/libatomic/config/linux/aarch64/host-config.h
index 9747accd88f5881d0496f9f8104149e74bbbe268..265b6ebd82fd7cb7fd36554d4da82ef54b3b99b5 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -35,11 +35,20 @@
 #endif
 #define IFUNC_NCOND(N) (1)

-#if N == 16 && IFUNC_ALT != 0
+#endif /* HAVE_IFUNC */
+
+/* All 128-bit atomic functions are defined in aarch64/atomic_16.S.  */
+#if N == 16
 # define DONE 1
 #endif

-#endif /* HAVE_IFUNC */
+/* State we have lock-free 128-bit atomics.  */
+#undef FAST_ATOMIC_LDST_16
+#define FAST_ATOMIC_LDST_16 1
+#undef MAYBE_HAVE_ATOMIC_CAS_16
+#define MAYBE_HAVE_ATOMIC_CAS_16 1
+#undef MAYBE_HAVE_ATOMIC_EXCHANGE_16
+#define MAYBE_HAVE_ATOMIC_EXCHANGE_16 1

 #ifdef HWCAP_USCAT

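For reviewers less used to exclusive-pair sequences, the following is a minimal sequential C model of the control flow of the new ARMv8.0 libat_compare_exchange_16. It is a sketch under stated assumptions, not libatomic code: pair128, excl_variant, cas16_variant and cas16_model are hypothetical names, the model is not atomic, and it assumes the store-exclusive succeeds on the first attempt (so the cbnz retry is left out). It illustrates two properties of the assembly: the success memory order selects the exclusive variants (RELAXED uses ldxp/stxp, ACQUIRE/CONSUME uses ldaxp/stxp, RELEASE and stronger use ldaxp/stlxp), and the store-exclusive runs even when the comparison fails, writing back the value just loaded. That unconditional store is the same reason the ARMv8.0 libat_load_16 needs a writeable address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit value held in a register pair
   (exp0/exp1, in0/in1 and tmp0/tmp1 in atomic_16.S).  */
typedef struct
{
  uint64_t lo, hi;
} pair128;

/* Which exclusive-pair variants a path uses: ldaxp instead of ldxp
   (acquire) and stlxp instead of stxp (release).  */
typedef struct
{
  bool acquire_load;
  bool release_store;
} excl_variant;

/* Mirrors the dispatch at the start of libat_compare_exchange_16:
   "cbz w4, 3f" picks RELAXED, "cmp w4, RELEASE; b.hs 5f" picks
   RELEASE/ACQ_REL/SEQ_CST, and everything else is ACQUIRE/CONSUME.
   Uses the GCC/Clang predefined __ATOMIC_* memory-order macros.  */
static excl_variant
cas16_variant (int success_model)
{
  assert (success_model >= __ATOMIC_RELAXED
          && success_model <= __ATOMIC_SEQ_CST);
  if (success_model == __ATOMIC_RELAXED)
    return (excl_variant) { false, false };  /* ldxp  / stxp  */
  if (success_model >= __ATOMIC_RELEASE)
    return (excl_variant) { true, true };    /* ldaxp / stlxp */
  return (excl_variant) { true, false };     /* ldaxp / stxp  */
}

/* Sequential model (NOT atomic) of one pass through the exclusive loop,
   assuming the store-exclusive succeeds so the cbnz retry is not taken.
   *stores counts how many times *mem is written.  */
static bool
cas16_model (pair128 *mem, pair128 *expected, pair128 desired,
             unsigned *stores)
{
  pair128 old = *mem;                     /* ldxp/ldaxp tmp0, tmp1, [x0] */
  bool eq = old.lo == expected->lo        /* cmp  tmp0, exp0 */
            && old.hi == expected->hi;    /* ccmp tmp1, exp1, 0, eq */
  *mem = eq ? desired : old;              /* two csels, then stxp/stlxp */
  ++*stores;                              /* the store happens either way */
  if (!eq)
    *expected = old;                      /* stp tmp0, tmp1, [x1] */
  return eq;                              /* cset x0, eq */
}
```

Calling cas16_model twice with the same expected value shows the difference: the first call succeeds and installs desired, while the second fails, copies the current value back into *expected, and still counts a store to *mem.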