From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wilco Dijkstra
To: 'GNU C Library'
CC: Szabolcs Nagy
Subject: [PATCH] AArch64: Optimize strchr
Date: Thu, 12 Jan 2023 15:58:55 +0000

Simplify calculation of the mask using shrn. Unroll the main loop.
Small strings are 20% faster on modern CPUs. Passes regress.

---

diff --git a/sysdeps/aarch64/strchr.S b/sysdeps/aarch64/strchr.S
index 900ef15944c2b8a82943cc0fbdaf0b40907c40e1..14ae1513a7330a62cf5985d06e1fb6a8bab78d63 100644
--- a/sysdeps/aarch64/strchr.S
+++ b/sysdeps/aarch64/strchr.S
@@ -32,8 +32,7 @@
 
 #define src		x2
 #define tmp1		x1
-#define wtmp2		w3
-#define tmp3		x3
+#define tmp2		x3
 
 #define vrepchr		v0
 #define vdata		v1
@@ -41,39 +40,30 @@
 #define vhas_nul	v2
 #define vhas_chr	v3
 #define vrepmask	v4
-#define vrepmask2	v5
-#define vend		v6
-#define dend		d6
+#define vend		v5
+#define dend		d5
 
 /* Core algorithm.
 
    For each 16-byte chunk we calculate a 64-bit syndrome value with four bits
-   per byte. For even bytes, bits 0-1 are set if the relevant byte matched the
-   requested character, bits 2-3 are set if the byte is NUL (or matched), and
-   bits 4-7 are not used and must be zero if none of bits 0-3 are set). Odd
-   bytes set bits 4-7 so that adjacent bytes can be merged. Since the bits
-   in the syndrome reflect the order in which things occur in the original
-   string, counting trailing zeros identifies exactly which byte matched. */
+   per byte. Bits 0-1 are set if the relevant byte matched the requested
+   character, bits 2-3 are set if the byte is NUL or matched. Count trailing
+   zeroes gives the position of the matching byte if it is a multiple of 4.
+   If it is not a multiple of 4, there was no match. */
 
 ENTRY (strchr)
 	PTR_ARG (0)
 	bic	src, srcin, 15
 	dup	vrepchr.16b, chrin
 	ld1	{vdata.16b}, [src]
-	mov	wtmp2, 0x3003
-	dup	vrepmask.8h, wtmp2
+	movi	vrepmask.16b, 0x33
 	cmeq	vhas_nul.16b, vdata.16b, 0
 	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
-	mov	wtmp2, 0xf00f
-	dup	vrepmask2.8h, wtmp2
-
 	bit	vhas_nul.16b, vhas_chr.16b, vrepmask.16b
-	and	vhas_nul.16b, vhas_nul.16b, vrepmask2.16b
-	lsl	tmp3, srcin, 2
-	addp	vend.16b, vhas_nul.16b, vhas_nul.16b		/* 128->64 */
-
+	lsl	tmp2, srcin, 2
+	shrn	vend.8b, vhas_nul.8h, 4		/* 128->64 */
 	fmov	tmp1, dend
-	lsr	tmp1, tmp1, tmp3
+	lsr	tmp1, tmp1, tmp2
 	cbz	tmp1, L(loop)
 
 	rbit	tmp1, tmp1
@@ -87,28 +77,34 @@ ENTRY (strchr)
 
 	.p2align 4
 L(loop):
-	ldr	qdata, [src, 16]!
+	ldr	qdata, [src, 16]
+	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
+	cmhs	vhas_nul.16b, vhas_chr.16b, vdata.16b
+	umaxp	vend.16b, vhas_nul.16b, vhas_nul.16b
+	fmov	tmp1, dend
+	cbnz	tmp1, L(end)
+	ldr	qdata, [src, 32]!
 	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
 	cmhs	vhas_nul.16b, vhas_chr.16b, vdata.16b
 	umaxp	vend.16b, vhas_nul.16b, vhas_nul.16b
 	fmov	tmp1, dend
 	cbz	tmp1, L(loop)
+	sub	src, src, 16
+L(end):
 
 #ifdef __AARCH64EB__
 	bif	vhas_nul.16b, vhas_chr.16b, vrepmask.16b
-	and	vhas_nul.16b, vhas_nul.16b, vrepmask2.16b
-	addp	vend.16b, vhas_nul.16b, vhas_nul.16b		/* 128->64 */
+	shrn	vend.8b, vhas_nul.8h, 4		/* 128->64 */
 	fmov	tmp1, dend
 #else
 	bit	vhas_nul.16b, vhas_chr.16b, vrepmask.16b
-	and	vhas_nul.16b, vhas_nul.16b, vrepmask2.16b
-	addp	vend.16b, vhas_nul.16b, vhas_nul.16b		/* 128->64 */
+	shrn	vend.8b, vhas_nul.8h, 4		/* 128->64 */
 	fmov	tmp1, dend
 	rbit	tmp1, tmp1
 #endif
+	add	src, src, 16
 	clz	tmp1, tmp1
-	/* Tmp1 is an even multiple of 2 if the target character was
-	   found first. Otherwise we've found the end of string. */
+	/* Tmp1 is a multiple of 4 if the target character was found. */
 	tst	tmp1, 2
 	add	result, src, tmp1, lsr 2
 	csel	result, result, xzr, eq
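
Note on the syndrome encoding from the "Core algorithm" comment: the scalar C
model below is only an illustrative sketch of the little-endian path (cmeq,
bit with the 0x33 mask, shrn by 4, then count trailing zeros). It is not part
of the patch. The names chunk_syndrome, ctz64 and model_strchr are
hypothetical. The model copies bytes instead of over-reading an aligned
16-byte block, and it does not model the initial alignment shift or the
unrolled loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the syndrome for one 16-byte chunk (little-endian).
   Byte i contributes nibble i: 0x3 if it equals c, 0xc if it is NUL,
   0xf if both (c == 0), 0x0 otherwise.  */
uint64_t
chunk_syndrome (const unsigned char chunk[16], unsigned char c)
{
  unsigned char merged[16];
  uint64_t syndrome = 0;

  for (int i = 0; i < 16; i++)
    {
      unsigned char has_nul = chunk[i] == 0 ? 0xff : 0x00; /* cmeq ..., 0 */
      unsigned char has_chr = chunk[i] == c ? 0xff : 0x00; /* cmeq ..., vrepchr */
      /* bit: bits selected by the 0x33 mask are taken from has_chr.  */
      merged[i] = (unsigned char) ((has_nul & 0xcc) | (has_chr & 0x33));
    }

  /* shrn by 4: each 16-bit lane (two bytes) narrows to one byte, keeping
     the high nibble of the even byte and the low nibble of the odd byte.  */
  for (int k = 0; k < 8; k++)
    {
      unsigned lane = merged[2 * k] | ((unsigned) merged[2 * k + 1] << 8);
      syndrome |= (uint64_t) ((lane >> 4) & 0xff) << (8 * k);
    }
  return syndrome;
}

/* Portable count of trailing zero bits; models rbit followed by clz.  */
unsigned
ctz64 (uint64_t x)
{
  unsigned n = 0;

  assert (x != 0);
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Model of strchr built on the syndrome.  Chunks start at s rather than
   at a 16-byte aligned address, so no initial shift is needed here.  */
const char *
model_strchr (const char *s, int c_in)
{
  unsigned char c = (unsigned char) c_in;

  for (;;)
    {
      unsigned char chunk[16] = { 0 };

      /* Copy up to and including the terminator; never read past it.  */
      for (size_t n = 0; n < 16; n++)
        {
          chunk[n] = (unsigned char) s[n];
          if (chunk[n] == 0)
            break;
        }

      uint64_t syndrome = chunk_syndrome (chunk, c);
      if (syndrome != 0)
        {
          unsigned pos = ctz64 (syndrome);
          /* Bit 1 clear: multiple of 4, so the character matched first.
             Bit 1 set: a NUL came first, so there is no match.  */
          return (pos & 2) == 0 ? s + (pos >> 2) : NULL;
        }
      s += 16;
    }
}
```

The test on bit 1 of the trailing-zero count corresponds to "tst tmp1, 2" in
the assembly: a match nibble (0x3 or 0xf) starts at bit 4*i, while a NUL-only
nibble (0xc) starts at bit 4*i + 2.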