From: Wilco Dijkstra
To: 'GNU C Library'
CC: Szabolcs Nagy
Date: Thu, 12 Jan 2023 15:53:09 +0000
Subject: [PATCH] AArch64: Optimize strlen

Optimize strlen by unrolling the main loop so that each iteration checks two
16-byte chunks. Large strings are 64% faster on modern CPUs. Passes regression
tests.

---

diff --git a/sysdeps/aarch64/strlen.S b/sysdeps/aarch64/strlen.S
index b3c92d9dc9b3c52e29e05ebbb89b929f177dc2cf..133ef933425fa260e61642a7840d73391168507d 100644
--- a/sysdeps/aarch64/strlen.S
+++ b/sysdeps/aarch64/strlen.S
@@ -43,12 +43,9 @@
 #define dend		d2
 
 /* Core algorithm:
-
-   For each 16-byte chunk we calculate a 64-bit nibble mask value with four bits
-   per byte. We take 4 bits of every comparison byte with shift right and narrow
-   by 4 instruction. Since the bits in the nibble mask reflect the order in
-   which things occur in the original string, counting trailing zeros identifies
-   exactly which byte matched.  */
+   Process the string in 16-byte aligned chunks. Compute a 64-bit mask with
+   four bits per byte using the shrn instruction. A count trailing zeros then
+   identifies the first zero byte.  */
 
 ENTRY (STRLEN)
 	PTR_ARG (0)
@@ -68,18 +65,25 @@ ENTRY (STRLEN)
 
 	.p2align 5
 L(loop):
-	ldr	data, [src, 16]!
+	ldr	data, [src, 16]
+	cmeq	vhas_nul.16b, vdata.16b, 0
+	umaxp	vend.16b, vhas_nul.16b, vhas_nul.16b
+	fmov	synd, dend
+	cbnz	synd, L(loop_end)
+	ldr	data, [src, 32]!
 	cmeq	vhas_nul.16b, vdata.16b, 0
 	umaxp	vend.16b, vhas_nul.16b, vhas_nul.16b
 	fmov	synd, dend
 	cbz	synd, L(loop)
-
+	sub	src, src, 16
+L(loop_end):
 	shrn	vend.8b, vhas_nul.8h, 4		/* 128->64 */
 	sub	result, src, srcin
 	fmov	synd, dend
 #ifndef __AARCH64EB__
 	rbit	synd, synd
 #endif
+	add	result, result, 16
 	clz	tmp, synd
 	add	result, result, tmp, lsr 2
 	ret
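The core-algorithm comment describes building a 64-bit mask with four bits per byte via shrn and then counting trailing zeros to find the first NUL. As a rough, portable illustration only (not glibc code), the C sketch below emulates that on one 16-byte chunk: cmeq is modelled as 0xFF/0x00 per byte, "shrn #4" as a shift-right-and-narrow of each 16-bit lane, and the little-endian rbit + clz pair as __builtin_ctzll, a GCC/Clang builtin. The names nibble_mask and first_zero_index are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: emulate cmeq (0xFF for each zero byte) followed by
   shrn #4 on eight 16-bit lanes, giving 4 mask bits per input byte. */
uint64_t nibble_mask(const unsigned char chunk[16])
{
    uint64_t mask = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint16_t lo = chunk[2 * lane] == 0 ? 0xFF : 0x00;      /* cmeq, even byte */
        uint16_t hi = chunk[2 * lane + 1] == 0 ? 0xFF : 0x00;  /* cmeq, odd byte */
        uint16_t v = (uint16_t)(lo | (hi << 8));   /* little-endian 16-bit lane */
        uint8_t narrowed = (uint8_t)(v >> 4);      /* shift right by 4, keep low 8 bits */
        mask |= (uint64_t)narrowed << (8 * lane);  /* byte i -> mask nibble i */
    }
    return mask;
}

/* Hypothetical helper: index of the first zero byte in the chunk, or 16 if
   there is none. __builtin_ctzll stands in for rbit + clz (little-endian). */
int first_zero_index(const unsigned char chunk[16])
{
    uint64_t mask = nibble_mask(chunk);
    if (mask == 0)
        return 16;                       /* __builtin_ctzll(0) is undefined */
    return __builtin_ctzll(mask) >> 2;   /* 4 mask bits per byte */
}
```

For a chunk holding "hello" followed by zero bytes, nibble_mask returns 0xFFFFFFFFFFF00000 (nibbles 5 to 15 set), so the trailing-zero count is 20 and first_zero_index returns 20 / 4 = 5. Because each byte owns a whole nibble, dividing the bit count by 4 (the "lsr 2" in the patch) converts it directly into a byte index.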
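The unrolled loop changes the pointer bookkeeping: the first load uses [src, 16] without writeback, the second uses [src, 32]! with writeback, and the fall-through path subtracts 16 so that L(loop_end) can always compute result = src - srcin + 16 + index. The scalar C model below traces that arithmetic. It is a sketch under stated assumptions, not the glibc implementation: the code before the loop (not shown in the diff) is assumed to leave src at the 16-byte aligned address of an already-checked first chunk and is replaced here by a plain byte loop; the per-chunk SIMD test is replaced by a simple scan; and the caller must guarantee that memory up to the end of the 16-byte block containing the terminator is readable (the assembly can rely on the fact that an aligned 16-byte load never crosses a page boundary). The names strlen_unrolled_model and chunk_first_zero are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: index of the first zero byte in a 16-byte chunk,
   or 16 if there is none (stands in for the cmeq/umaxp/shrn/rbit/clz work). */
int chunk_first_zero(const unsigned char *chunk)
{
    for (int i = 0; i < 16; i++)
        if (chunk[i] == 0)
            return i;
    return 16;
}

/* Hypothetical scalar model of the 2x-unrolled loop in the patch. */
size_t strlen_unrolled_model(const char *srcin)
{
    const unsigned char *s = (const unsigned char *)srcin;
    /* Assumed preamble behaviour: round down to the 16-byte aligned chunk. */
    const unsigned char *src =
        (const unsigned char *)((uintptr_t)s & ~(uintptr_t)15);

    /* Simplified preamble: scan the rest of the first aligned chunk. */
    for (const unsigned char *p = s; p < src + 16; p++)
        if (*p == 0)
            return (size_t)(p - s);

    int idx;
    for (;;) {
        /* ldr data, [src, 16]: check the chunk at src + 16, src unchanged. */
        idx = chunk_first_zero(src + 16);
        if (idx < 16)
            break;                  /* cbnz synd, L(loop_end) */
        /* ldr data, [src, 32]!: advance src by 32, then check it. */
        src += 32;
        idx = chunk_first_zero(src);
        if (idx < 16) {
            src -= 16;              /* sub src, src, 16 */
            break;
        }
    }
    /* L(loop_end): result = (src - srcin) + 16 + byte index, where the
       byte index corresponds to tmp >> 2 in the assembly. */
    return (size_t)(src - s) + 16 + (size_t)idx;
}
```

For example, with a 16-byte aligned buffer holding 40 'a' characters followed by a NUL, the first load of the loop (offset 16) finds nothing, the second load (offset 32) finds the terminator at index 8, src is rewound to offset 16, and the result is 16 + 16 + 8 = 40. Rewinding src on the fall-through path lets both exits share a single L(loop_end) sequence with one constant "add result, result, 16".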