From: Szabolcs Nagy
To: Wilco Dijkstra
Cc: 'GNU C Library'
Subject: Re: [PATCH] AArch64: Optimize strlen
Date: Fri, 13 Jan 2023 12:26:08 +0000

The 01/12/2023 15:53, Wilco Dijkstra wrote:
> Optimize strlen by unrolling the main loop.  Large strings are 64% faster on
> modern CPUs.  Passes regress.

please commit it, thanks.

Reviewed-by: Szabolcs Nagy

>
> ---
>
> diff --git a/sysdeps/aarch64/strlen.S b/sysdeps/aarch64/strlen.S
> index b3c92d9dc9b3c52e29e05ebbb89b929f177dc2cf..133ef933425fa260e61642a7840d73391168507d 100644
> --- a/sysdeps/aarch64/strlen.S
> +++ b/sysdeps/aarch64/strlen.S
> @@ -43,12 +43,9 @@
>  #define dend		d2
>
>  /* Core algorithm:
> -
> -   For each 16-byte chunk we calculate a 64-bit nibble mask value with four bits
> -   per byte. We take 4 bits of every comparison byte with shift right and narrow
> -   by 4 instruction. Since the bits in the nibble mask reflect the order in
> -   which things occur in the original string, counting trailing zeros identifies
> -   exactly which byte matched.  */
> +   Process the string in 16-byte aligned chunks. Compute a 64-bit mask with
> +   four bits per byte using the shrn instruction. A count trailing zeros then
> +   identifies the first zero byte.  */
>
>  ENTRY (STRLEN)
>  	PTR_ARG (0)
> @@ -68,18 +65,25 @@ ENTRY (STRLEN)
>
>  	.p2align 5
>  L(loop):
> -	ldr	data, [src, 16]!
> +	ldr	data, [src, 16]
> +	cmeq	vhas_nul.16b, vdata.16b, 0
> +	umaxp	vend.16b, vhas_nul.16b, vhas_nul.16b
> +	fmov	synd, dend
> +	cbnz	synd, L(loop_end)
> +	ldr	data, [src, 32]!
>  	cmeq	vhas_nul.16b, vdata.16b, 0
>  	umaxp	vend.16b, vhas_nul.16b, vhas_nul.16b
>  	fmov	synd, dend
>  	cbz	synd, L(loop)
> -
> +	sub	src, src, 16
> +L(loop_end):
>  	shrn	vend.8b, vhas_nul.8h, 4		/* 128->64 */
>  	sub	result, src, srcin
>  	fmov	synd, dend
>  #ifndef __AARCH64EB__
>  	rbit	synd, synd
>  #endif
> +	add	result, result, 16
>  	clz	tmp, synd
>  	add	result, result, tmp, lsr 2
>  	ret
>
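
For readers following the thread, the control flow of the unrolled loop can
be modelled in portable C. This is only a rough sketch, not the glibc code:
nibble_mask, ctz64 and strlen_model are hypothetical names, the scalar loop
in nibble_mask stands in for cmeq + shrn on little-endian, and the model
skips the alignment handling of the first chunk, so it assumes the buffer is
padded to a multiple of 16 bytes past the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for cmeq + shrn #4 (little-endian): build a 64-bit
   mask with four bits per byte, all set where the byte is zero.  */
static uint64_t nibble_mask(const unsigned char *chunk)
{
    uint64_t mask = 0;
    for (int i = 0; i < 16; i++)
        if (chunk[i] == 0)
            mask |= (uint64_t)0xf << (4 * i);
    return mask;
}

/* Count trailing zeros; models rbit + clz.  The mask must be nonzero.  */
static unsigned ctz64(uint64_t x)
{
    unsigned n = 0;
    assert(x != 0);
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Model of the unrolled loop.  Assumption: buf is readable in whole 16-byte
   chunks up to and including the chunk that holds the terminator.  */
static size_t strlen_model(const unsigned char *buf)
{
    const unsigned char *src = buf;
    uint64_t synd = nibble_mask(src);

    /* First chunk: the real code does an aligned load and shifts the mask;
       the model simply starts at buf.  */
    if (synd != 0)
        return ctz64(synd) >> 2;

    for (;;) {
        synd = nibble_mask(src + 16);   /* ldr data, [src, 16] */
        if (synd != 0)
            break;                      /* cbnz synd, L(loop_end) */
        src += 32;                      /* ldr data, [src, 32]! */
        synd = nibble_mask(src);
        if (synd != 0) {
            src -= 16;                  /* sub src, src, 16 */
            break;
        }
    }

    /* On both exits the matching chunk starts at src + 16, hence the
       added "add result, result, 16" in the patch.  */
    return (size_t)(src - buf) + 16 + (ctz64(synd) >> 2);
}
```

Dividing the trailing-zero count by four (the "lsr 2" in the assembly) turns
a bit position in the mask back into a byte index within the chunk.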