From: Wilco Dijkstra
To: 'GNU C Library'
CC: Szabolcs Nagy
Subject: [PATCH] AArch64: Improve strnlen performance
Date: Wed, 23 Jun 2021 15:22:57 +0000
List-Id: Libc-alpha mailing list

Optimize strnlen by avoiding UMINV, which is slow on most cores. On Neoverse N1,
large strings are 1.8x faster than the current version, and bench-strnlen is 50%
faster overall.
This version is MTE compatible.

Passes GLIBC regress, OK for commit?

---

diff --git a/sysdeps/aarch64/strnlen.S b/sysdeps/aarch64/strnlen.S
index 2b57575c55cc41a5c6aa813af216c6e34f6cb7b0..37e9eed4120750f4e03d563938438b8c5384f75d 100644
--- a/sysdeps/aarch64/strnlen.S
+++ b/sysdeps/aarch64/strnlen.S
@@ -22,197 +22,105 @@
 
 /* Assumptions:
  *
- * ARMv8-a, AArch64
+ * ARMv8-a, AArch64, Advanced SIMD.
+ * MTE compatible.
  */
 
-/* Arguments and results. */
 #define srcin		x0
-#define len		x0
-#define limit		x1
+#define cntin		x1
+#define result		x0
 
-/* Locals and temporaries. */
 #define src		x2
-#define data1		x3
-#define data2		x4
-#define data2a		x5
-#define has_nul1	x6
-#define has_nul2	x7
-#define tmp1		x8
-#define tmp2		x9
-#define tmp3		x10
-#define tmp4		x11
-#define zeroones	x12
-#define pos		x13
-#define limit_wd	x14
-
-#define dataq		q2
-#define datav		v2
-#define datab2		b3
-#define dataq2		q3
-#define datav2		v3
-#define REP8_01 0x0101010101010101
-#define REP8_7f 0x7f7f7f7f7f7f7f7f
-#define REP8_80 0x8080808080808080
-
-ENTRY_ALIGN_AND_PAD (__strnlen, 6, 9)
+#define synd		x3
+#define shift		x4
+#define wtmp		w4
+#define tmp		x4
+#define cntrem		x5
+
+#define qdata		q0
+#define vdata		v0
+#define vhas_chr	v1
+#define vrepmask	v2
+#define vend		v3
+#define dend		d3
+
+/*
+   Core algorithm:
+
+   For each 16-byte chunk we calculate a 64-bit syndrome value with four bits
+   per byte. For even bytes, bits 0-3 are set if the relevant byte matched the
+   requested character or the byte is NUL. Bits 4-7 must be zero. Bits 4-7 are
+   set likewise for odd bytes so that adjacent bytes can be merged. Since the
+   bits in the syndrome reflect the order in which things occur in the original
+   string, counting trailing zeros identifies exactly which byte matched. */
+
+ENTRY (__strnlen)
 	PTR_ARG (0)
 	SIZE_ARG (1)
-	cbz	limit, L(hit_limit)
-	mov	zeroones, #REP8_01
-	bic	src, srcin, #15
-	ands	tmp1, srcin, #15
-	b.ne	L(misaligned)
-	/* Calculate the number of full and partial words -1. */
-	sub	limit_wd, limit, #1	/* Limit != 0, so no underflow. */
-	lsr	limit_wd, limit_wd, #4	/* Convert to Qwords. */
-
-	/* NUL detection works on the principle that (X - 1) & (~X) & 0x80
-	   (=> (X - 1) & ~(X | 0x7f)) is non-zero iff a byte is zero, and
-	   can be done in parallel across the entire word. */
-	/* The inner loop deals with two Dwords at a time. This has a
-	   slightly higher start-up cost, but we should win quite quickly,
-	   especially on cores with a high number of issue slots per
-	   cycle, as we get much better parallelism out of the operations. */
-
-	/* Start of critial section -- keep to one 64Byte cache line. */
-
-	ldp	data1, data2, [src], #16
-L(realigned):
-	sub	tmp1, data1, zeroones
-	orr	tmp2, data1, #REP8_7f
-	sub	tmp3, data2, zeroones
-	orr	tmp4, data2, #REP8_7f
-	bic	has_nul1, tmp1, tmp2
-	bic	has_nul2, tmp3, tmp4
-	subs	limit_wd, limit_wd, #1
-	orr	tmp1, has_nul1, has_nul2
-	ccmp	tmp1, #0, #0, pl	/* NZCV = 0000 */
-	b.eq	L(loop)
-	/* End of critical section -- keep to one 64Byte cache line. */
-
-	orr	tmp1, has_nul1, has_nul2
-	cbz	tmp1, L(hit_limit)	/* No null in final Qword. */
-
-	/* We know there's a null in the final Qword. The easiest thing
-	   to do now is work out the length of the string and return
-	   MIN (len, limit). */
-
-	sub	len, src, srcin
-	cbz	has_nul1, L(nul_in_data2)
-#ifdef __AARCH64EB__
-	mov	data2, data1
+	bic	src, srcin, 15
+	mov	wtmp, 0xf00f
+	cbz	cntin, L(nomatch)
+	ld1	{vdata.16b}, [src], 16
+	dup	vrepmask.8h, wtmp
+	cmeq	vhas_chr.16b, vdata.16b, 0
+	lsl	shift, srcin, 2
+	and	vhas_chr.16b, vhas_chr.16b, vrepmask.16b
+	addp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
+	fmov	synd, dend
+	lsr	synd, synd, shift
+	cbz	synd, L(start_loop)
+L(finish):
+	rbit	synd, synd
+	clz	synd, synd
+	lsr	result, synd, 2
+	cmp	cntin, result
+	csel	result, cntin, result, ls
+	ret
+
+L(start_loop):
+	sub	tmp, src, srcin
+	subs	cntrem, cntin, tmp
+	b.ls	L(nomatch)
+
+	/* Make sure that it won't overread by a 16-byte chunk */
+	add	tmp, cntrem, 15
+	tbnz	tmp, 4, L(loop32_2)
+
+	.p2align 5
+L(loop32):
+	ldr	qdata, [src], 16
+	cmeq	vhas_chr.16b, vdata.16b, 0
+	umaxp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
+	fmov	synd, dend
+	cbnz	synd, L(end)
+L(loop32_2):
+	ldr	qdata, [src], 16
+	subs	cntrem, cntrem, 32
+	cmeq	vhas_chr.16b, vdata.16b, 0
+	b.ls	L(end)
+	umaxp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
+	fmov	synd, dend
+	cbz	synd, L(loop32)
+
+L(end):
+	and	vhas_chr.16b, vhas_chr.16b, vrepmask.16b
+	addp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
+	sub	src, src, 16
+	mov	synd, vend.d[0]
+	sub	result, src, srcin
+#ifndef __AARCH64EB__
+	rbit	synd, synd
 #endif
-	sub	len, len, #8
-	mov	has_nul2, has_nul1
-L(nul_in_data2):
-#ifdef __AARCH64EB__
-	/* For big-endian, carry propagation (if the final byte in the
-	   string is 0x01) means we cannot use has_nul directly. The
-	   easiest way to get the correct byte is to byte-swap the data
-	   and calculate the syndrome a second time. */
-	rev	data2, data2
-	sub	tmp1, data2, zeroones
-	orr	tmp2, data2, #REP8_7f
-	bic	has_nul2, tmp1, tmp2
-#endif
-	sub	len, len, #8
-	rev	has_nul2, has_nul2
-	clz	pos, has_nul2
-	add	len, len, pos, lsr #3		/* Bits to bytes. */
-	cmp	len, limit
-	csel	len, len, limit, ls		/* Return the lower value. */
-	RET
-
-L(loop):
-	ldr	dataq, [src], #16
-	uminv	datab2, datav.16b
-	mov	tmp1, datav2.d[0]
-	subs	limit_wd, limit_wd, #1
-	ccmp	tmp1, #0, #4, pl	/* NZCV = 0000 */
-	b.eq	L(loop_end)
-	ldr	dataq, [src], #16
-	uminv	datab2, datav.16b
-	mov	tmp1, datav2.d[0]
-	subs	limit_wd, limit_wd, #1
-	ccmp	tmp1, #0, #4, pl	/* NZCV = 0000 */
-	b.ne	L(loop)
-L(loop_end):
-	/* End of critical section -- keep to one 64Byte cache line. */
-
-	cbnz	tmp1, L(hit_limit)	/* No null in final Qword. */
-
-	/* We know there's a null in the final Qword. The easiest thing
-	   to do now is work out the length of the string and return
-	   MIN (len, limit). */
-
-#ifdef __AARCH64EB__
-	rev64	datav.16b, datav.16b
-#endif
-	/* Set te NULL byte as 0xff and the rest as 0x00, move the data into a
-	   pair of scalars and then compute the length from the earliest NULL
-	   byte. */
-
-	cmeq	datav.16b, datav.16b, #0
-#ifdef __AARCH64EB__
-	mov	data1, datav.d[1]
-	mov	data2, datav.d[0]
-#else
-	mov	data1, datav.d[0]
-	mov	data2, datav.d[1]
-#endif
-	cmp	data1, 0
-	csel	data1, data1, data2, ne
-	sub	len, src, srcin
-	sub	len, len, #16
-	rev	data1, data1
-	add	tmp2, len, 8
-	clz	tmp1, data1
-	csel	len, len, tmp2, ne
-	add	len, len, tmp1, lsr 3
-	cmp	len, limit
-	csel	len, len, limit, ls	/* Return the lower value. */
-	RET
-
-L(misaligned):
-	/* Deal with a partial first word.
-	   We're doing two things in parallel here;
-	   1) Calculate the number of words (but avoiding overflow if
-	      limit is near ULONG_MAX) - to do this we need to work out
-	      limit + tmp1 - 1 as a 65-bit value before shifting it;
-	   2) Load and mask the initial data words - we force the bytes
-	      before the ones we are interested in to 0xff - this ensures
-	      early bytes will not hit any zero detection. */
-	sub	limit_wd, limit, #1
-	neg	tmp4, tmp1
-	cmp	tmp1, #8
-
-	and	tmp3, limit_wd, #15
-	lsr	limit_wd, limit_wd, #4
-	mov	tmp2, #~0
-
-	ldp	data1, data2, [src], #16
-	lsl	tmp4, tmp4, #3		/* Bytes beyond alignment -> bits. */
-	add	tmp3, tmp3, tmp1
-
-#ifdef __AARCH64EB__
-	/* Big-endian. Early bytes are at MSB. */
-	lsl	tmp2, tmp2, tmp4	/* Shift (tmp1 & 63). */
-#else
-	/* Little-endian. Early bytes are at LSB. */
-	lsr	tmp2, tmp2, tmp4	/* Shift (tmp1 & 63). */
-#endif
-	add	limit_wd, limit_wd, tmp3, lsr #4
-
-	orr	data1, data1, tmp2
-	orr	data2a, data2, tmp2
+	clz	synd, synd
+	add	result, result, synd, lsr 2
+	cmp	cntin, result
+	csel	result, cntin, result, ls
+	ret
 
-	csinv	data1, data1, xzr, le
-	csel	data2, data2, data2a, le
-	b	L(realigned)
+L(nomatch):
+	mov	result, cntin
+	ret
 
-L(hit_limit):
-	mov	len, limit
-	RET
 END (__strnlen)
 libc_hidden_def (__strnlen)
 weak_alias (__strnlen, strnlen)
 
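
For anyone who wants to follow the syndrome trick without reading the assembly, below is a rough scalar C model of the per-chunk steps described in the "Core algorithm" comment. It is only a sketch and not part of the patch. The helper names (nul_syndrome, skip_leading, first_nul, clamp_to_limit) are made up for illustration, the model assumes little-endian byte order, and the real code builds the syndrome with cmeq, the 0xf00f mask and addp rather than a loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the 64-bit syndrome: nibble i is 0xf iff chunk[i] is NUL.
   (Hypothetical helper; the patch gets the same value from cmeq + and + addp.) */
static uint64_t nul_syndrome(const unsigned char chunk[16])
{
    uint64_t synd = 0;
    for (int i = 0; i < 16; i++)
        if (chunk[i] == 0)
            synd |= (uint64_t)0xf << (4 * i);
    return synd;
}

/* Drop the nibbles of bytes that sit before the string start inside the
   aligned first chunk (models "lsl shift, srcin, 2" + "lsr synd, synd, shift"). */
static uint64_t skip_leading(uint64_t synd, unsigned offset)
{
    assert(offset < 16);
    return synd >> (4 * offset);
}

/* Count trailing zero bits and divide by 4 (models rbit + clz + lsr by 2).
   Returns 16 when the chunk holds no NUL. */
static size_t first_nul(uint64_t synd)
{
    size_t bits = 0;
    if (synd == 0)
        return 16;
    while ((synd & 1) == 0) {
        synd >>= 1;
        bits++;
    }
    return bits / 4;
}

/* Return the smaller of the computed length and the caller's limit
   (models "cmp cntin, result" + "csel result, cntin, result, ls"). */
static size_t clamp_to_limit(size_t result, size_t cntin)
{
    return cntin <= result ? cntin : result;
}
```

With four bits per byte, a single trailing-zero count on one 64-bit register locates the first NUL in a 16-byte chunk, which is what lets the new code drop the UMINV reduction from the loop.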