Date: Fri, 13 Jan 2023 12:28:44 +0000
From: Szabolcs Nagy
To: Wilco Dijkstra
Cc: 'GNU C Library'
Subject: Re: [PATCH] AArch64: Optimize strchr

The 01/12/2023 15:58, Wilco Dijkstra wrote:
> Simplify calculation of the mask using shrn. Unroll the main loop.
> Small strings are 20% faster on modern CPUs. Passes regress.

please commit it, thanks.

Reviewed-by: Szabolcs Nagy

>
> ---
>
> diff --git a/sysdeps/aarch64/strchr.S b/sysdeps/aarch64/strchr.S
> index 900ef15944c2b8a82943cc0fbdaf0b40907c40e1..14ae1513a7330a62cf5985d06e1fb6a8bab78d63 100644
> --- a/sysdeps/aarch64/strchr.S
> +++ b/sysdeps/aarch64/strchr.S
> @@ -32,8 +32,7 @@
>
>  #define src		x2
>  #define tmp1		x1
> -#define wtmp2		w3
> -#define tmp3		x3
> +#define tmp2		x3
>
>  #define vrepchr		v0
>  #define vdata		v1
> @@ -41,39 +40,30 @@
>  #define vhas_nul	v2
>  #define vhas_chr	v3
>  #define vrepmask	v4
> -#define vrepmask2	v5
> -#define vend		v6
> -#define dend		d6
> +#define vend		v5
> +#define dend		d5
>
>  /* Core algorithm.
>
>     For each 16-byte chunk we calculate a 64-bit syndrome value with four bits
> -   per byte. For even bytes, bits 0-1 are set if the relevant byte matched the
> -   requested character, bits 2-3 are set if the byte is NUL (or matched), and
> -   bits 4-7 are not used and must be zero if none of bits 0-3 are set). Odd
> -   bytes set bits 4-7 so that adjacent bytes can be merged. Since the bits
> -   in the syndrome reflect the order in which things occur in the original
> -   string, counting trailing zeros identifies exactly which byte matched.  */
> +   per byte. Bits 0-1 are set if the relevant byte matched the requested
> +   character, bits 2-3 are set if the byte is NUL or matched. Count trailing
> +   zeroes gives the position of the matching byte if it is a multiple of 4.
> +   If it is not a multiple of 4, there was no match.  */
>
>  ENTRY (strchr)
>  	PTR_ARG (0)
>  	bic	src, srcin, 15
>  	dup	vrepchr.16b, chrin
>  	ld1	{vdata.16b}, [src]
> -	mov	wtmp2, 0x3003
> -	dup	vrepmask.8h, wtmp2
> +	movi	vrepmask.16b, 0x33
>  	cmeq	vhas_nul.16b, vdata.16b, 0
>  	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
> -	mov	wtmp2, 0xf00f
> -	dup	vrepmask2.8h, wtmp2
> -
>  	bit	vhas_nul.16b, vhas_chr.16b, vrepmask.16b
> -	and	vhas_nul.16b, vhas_nul.16b, vrepmask2.16b
> -	lsl	tmp3, srcin, 2
> -	addp	vend.16b, vhas_nul.16b, vhas_nul.16b		/* 128->64 */
> -
> +	lsl	tmp2, srcin, 2
> +	shrn	vend.8b, vhas_nul.8h, 4		/* 128->64 */
>  	fmov	tmp1, dend
> -	lsr	tmp1, tmp1, tmp3
> +	lsr	tmp1, tmp1, tmp2
>  	cbz	tmp1, L(loop)
>
>  	rbit	tmp1, tmp1
> @@ -87,28 +77,34 @@ ENTRY (strchr)
>
>  	.p2align 4
>  L(loop):
> -	ldr	qdata, [src, 16]!
> +	ldr	qdata, [src, 16]
> +	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
> +	cmhs	vhas_nul.16b, vhas_chr.16b, vdata.16b
> +	umaxp	vend.16b, vhas_nul.16b, vhas_nul.16b
> +	fmov	tmp1, dend
> +	cbnz	tmp1, L(end)
> +	ldr	qdata, [src, 32]!
>  	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
>  	cmhs	vhas_nul.16b, vhas_chr.16b, vdata.16b
>  	umaxp	vend.16b, vhas_nul.16b, vhas_nul.16b
>  	fmov	tmp1, dend
>  	cbz	tmp1, L(loop)
> +	sub	src, src, 16
> +L(end):
>
>  #ifdef __AARCH64EB__
>  	bif	vhas_nul.16b, vhas_chr.16b, vrepmask.16b
> -	and	vhas_nul.16b, vhas_nul.16b, vrepmask2.16b
> -	addp	vend.16b, vhas_nul.16b, vhas_nul.16b		/* 128->64 */
> +	shrn	vend.8b, vhas_nul.8h, 4		/* 128->64 */
>  	fmov	tmp1, dend
>  #else
>  	bit	vhas_nul.16b, vhas_chr.16b, vrepmask.16b
> -	and	vhas_nul.16b, vhas_nul.16b, vrepmask2.16b
> -	addp	vend.16b, vhas_nul.16b, vhas_nul.16b		/* 128->64 */
> +	shrn	vend.8b, vhas_nul.8h, 4		/* 128->64 */
>  	fmov	tmp1, dend
>  	rbit	tmp1, tmp1
>  #endif
> +	add	src, src, 16
>  	clz	tmp1, tmp1
> -	/* Tmp1 is an even multiple of 2 if the target character was
> -	   found first.  Otherwise we've found the end of string.  */
> +	/* Tmp1 is a multiple of 4 if the target character was found.  */
>  	tst	tmp1, 2
>  	add	result, src, tmp1, lsr 2
>  	csel	result, result, xzr, eq
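The C model below may make the syndrome arithmetic in the patch easier to check. It is a rough scalar sketch under stated assumptions, not glibc code. The names chunk_syndrome, chunk_hit, ctz64 and sketch_strchr are hypothetical. Each 16-byte vector operation is modelled with a plain loop, the main loop is not unrolled, and only the little-endian (rbit + clz) path is covered.

The model illustrates three things:
- The cmhs trick: has_chr >= byte holds exactly when the byte matched or is NUL.
- The 0x33 mask plus shrn #4 leaves one nibble per byte.
- In the final decode, a trailing-zero count that is a multiple of 4 means the character was found.

For simplicity the first chunk is also modelled with the cmhs form. The patch uses a plain cmeq against zero there, which gives the same decode.

Like the assembly, the model reads whole aligned 16-byte chunks, including bytes before s and after the terminator. The assembly is safe because an aligned 16-byte load cannot cross a page boundary, but portable C has no such guarantee. This model therefore assumes the string lives in a 16-byte aligned buffer padded to a multiple of 16 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the per-chunk syndrome: 4 bits per byte, nibble i holds
   byte i, as in the little-endian syndrome that rbit + clz scans from the
   low end. */
static uint64_t chunk_syndrome(const unsigned char *chunk, unsigned char c)
{
    uint64_t syn = 0;
    for (int i = 0; i < 16; i++) {
        /* cmeq: 0xff where the byte equals c. */
        uint8_t has_chr = (chunk[i] == c) ? 0xff : 0x00;
        /* cmhs has_chr >= byte: 0xff >= anything, and 0 >= byte only when
           the byte is NUL, so this means "matched or NUL". */
        uint8_t has_nul = (has_chr >= chunk[i]) ? 0xff : 0x00;
        /* bit with the 0x33 mask, then shrn #4: bits 0-1 of each nibble
           come from has_chr, bits 2-3 from has_nul. */
        uint64_t nib = (uint64_t)((has_chr & 0x3) | (has_nul & 0xc));
        syn |= nib << (4 * i);
    }
    return syn;
}

/* Loop test, standing in for cmeq + cmhs + umaxp + fmov + cbz/cbnz:
   nonzero if any byte of the chunk is c or NUL. */
static int chunk_hit(const unsigned char *chunk, unsigned char c)
{
    for (int i = 0; i < 16; i++) {
        uint8_t has_chr = (chunk[i] == c) ? 0xff : 0x00;
        if (has_chr >= chunk[i])
            return 1;
    }
    return 0;
}

/* Count trailing zeros; x must be nonzero (rbit + clz in the assembly). */
static int ctz64(uint64_t x)
{
    int n = 0;
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical scalar model of the patched strchr. s must lie in a 16-byte
   aligned object that is padded past the terminator to a multiple of 16. */
static const char *sketch_strchr(const char *s, int ch)
{
    unsigned char c = (unsigned char)ch;
    size_t skip = (uintptr_t)s & 15;                  /* bic src, srcin, 15 */
    const unsigned char *src = (const unsigned char *)s - skip;

    /* First chunk: lsr by 4 * skip drops the nibbles of bytes before s. */
    uint64_t syn = chunk_syndrome(src, c) >> (4 * skip);
    if (syn == 0) {
        do {                                          /* L(loop), not unrolled */
            src += 16;
        } while (!chunk_hit(src, c));
        syn = chunk_syndrome(src, c);                 /* L(end): full syndrome */
        skip = 0;
    }
    int tz = ctz64(syn);
    if (tz & 2)                 /* tst tmp1, 2: NUL came before any match */
        return NULL;
    return (const char *)src + skip + tz / 4;         /* add result, ..., lsr 2 */
}
```

Here is an example using a 16-byte aligned, zero-padded 48-byte buffer holding "abcdefghijklmnopABCDEFGHIJKLMNOP":
- sketch_strchr(buf + 3, 'G') returns buf + 22.
- sketch_strchr(buf + 3, 'b') returns NULL. The only 'b' sits at offset 1, inside the first chunk but before s, so its nibble is shifted out. The search then stops on the NUL at offset 32.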