Date: Fri, 13 Jan 2023 12:27:28 +0000
From: Szabolcs Nagy
To: Wilco Dijkstra
Cc: 'GNU C Library'
Subject: Re: [PATCH] AArch64: Optimize memchr
The 01/12/2023 15:56, Wilco Dijkstra wrote:
> Optimize the main loop - large strings are 40% faster on modern CPUs.
> Passes regress.

please commit it, thanks.

Reviewed-by: Szabolcs Nagy

>
> ---
>
> diff --git a/sysdeps/aarch64/memchr.S b/sysdeps/aarch64/memchr.S
> index 1cd32bf8cffa82d665304d54d2a4d4f75d4ff541..1c99d45fbb506c86a1db5b4d45de49b33d8635c9 100644
> --- a/sysdeps/aarch64/memchr.S
> +++ b/sysdeps/aarch64/memchr.S
> @@ -30,7 +30,6 @@
>  # define MEMCHR __memchr
>  #endif
>
> -/* Arguments and results.  */
>  #define srcin		x0
>  #define chrin		w1
>  #define cntin		x2
> @@ -73,42 +72,44 @@ ENTRY (MEMCHR)
>
>  	rbit	synd, synd
>  	clz	synd, synd
> -	add	result, srcin, synd, lsr 2
>  	cmp	cntin, synd, lsr 2
> +	add	result, srcin, synd, lsr 2
>  	csel	result, result, xzr, hi
>  	ret
>
> +	.p2align 3
>  L(start_loop):
>  	sub	tmp, src, srcin
> -	add	tmp, tmp, 16
> +	add	tmp, tmp, 17
>  	subs	cntrem, cntin, tmp
> -	b.ls	L(nomatch)
> +	b.lo	L(nomatch)
>
>  	/* Make sure that it won't overread by a 16-byte chunk */
> -	add	tmp, cntrem, 15
> -	tbnz	tmp, 4, L(loop32_2)
> -
> +	tbz	cntrem, 4, L(loop32_2)
> +	sub	src, src, 16
>  	.p2align 4
>  L(loop32):
> -	ldr	qdata, [src, 16]!
> +	ldr	qdata, [src, 32]!
>  	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
>  	umaxp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
>  	fmov	synd, dend
>  	cbnz	synd, L(end)
>
>  L(loop32_2):
> -	ldr	qdata, [src, 16]!
> -	subs	cntrem, cntrem, 32
> +	ldr	qdata, [src, 16]
>  	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
> -	b.ls	L(end)
> +	subs	cntrem, cntrem, 32
> +	b.lo	L(end_2)
>  	umaxp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
>  	fmov	synd, dend
>  	cbz	synd, L(loop32)
> +L(end_2):
> +	add	src, src, 16
>  L(end):
>  	shrn	vend.8b, vhas_chr.8h, 4		/* 128->64 */
> +	sub	cntrem, src, srcin
>  	fmov	synd, dend
> -	add	tmp, srcin, cntin
> -	sub	cntrem, tmp, src
> +	sub	cntrem, cntin, cntrem
>  #ifndef __AARCH64EB__
>  	rbit	synd, synd
>  #endif
>
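The loop changes rest on two small equivalences.

First, the count now starts one lower (`add tmp, tmp, 17`). That lets `b.ls` become `b.lo` at both comparisons. It also lets a single `tbz cntrem, 4` replace `add tmp, cntrem, 15` plus `tbnz tmp, 4`, because adding 16 always flips bit 4.

Second, src is pre-biased by -16. Each 32-byte iteration can then use one writeback load (`[src, 32]!`) plus one plain load (`[src, 16]`), with `L(end_2)` adding the 16 back when the loop exits after the second load.

The C program below is a hypothetical model, not glibc code. It replays only the branch and pointer bookkeeping of both versions, using offsets relative to srcin, and brute-forces three checks:

- both versions exit at the same point with the same remaining count;
- both load the same chunks;
- no load starts at or past the end of the buffer.

It makes two assumptions:

- From code outside these hunks, src is srcin rounded down to 16 bytes on entry to `L(start_loop)`.
- The SIMD compare is abstracted into `match`, the index of the first loaded chunk that contains the character.

The names `old_loop`, `new_loop`, `load` and `count_failures` are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_LOADS 64

/* What one run of the modelled loop does, as offsets from srcin. */
struct loop_exit {
    bool nomatch;             /* branched to L(nomatch) */
    uint64_t src_off;         /* src - srcin on reaching L(end) */
    uint64_t cntrem;          /* bytes left from src, as computed at L(end) */
    int loads;                /* number of 16-byte loads issued by the loop */
    uint64_t addr[MAX_LOADS]; /* offset of each loaded chunk from srcin */
};

/* Record a 16-byte load at offset 'off'; 'match' is the (hypothetical)
   index of the first loop load whose syndrome is nonzero, -1 for none. */
static bool load(struct loop_exit *st, uint64_t off, int match)
{
    assert(st->loads < MAX_LOADS);
    st->addr[st->loads] = off;
    return st->loads++ == match;
}

/* L(start_loop)..L(end) before the patch; 'off' is src - srcin. */
static struct loop_exit old_loop(uint64_t cntin, uint64_t off, int match)
{
    struct loop_exit st = {0};
    uint64_t src = off, tmp = src + 16, cntrem;   /* add tmp, tmp, 16 */
    bool hit, last;

    if (cntin <= tmp) { st.nomatch = true; return st; }  /* subs; b.ls */
    cntrem = cntin - tmp;
    if ((cntrem + 15) & 16)                 /* add tmp, cntrem, 15; tbnz tmp, 4 */
        goto loop32_2;
loop32:
    src += 16;                              /* ldr qdata, [src, 16]! */
    if (load(&st, src, match))              /* cbnz synd, L(end) */
        goto end;
loop32_2:
    src += 16;                              /* ldr qdata, [src, 16]! */
    hit = load(&st, src, match);
    last = cntrem <= 32;                    /* subs cntrem, cntrem, 32; b.ls L(end) */
    cntrem -= 32;
    if (!last && !hit)                      /* cbz synd, L(loop32) */
        goto loop32;
end:
    st.src_off = src;
    st.cntrem = cntin - src;                /* add tmp, srcin, cntin; sub cntrem, tmp, src */
    return st;
}

/* The same region after the patch. */
static struct loop_exit new_loop(uint64_t cntin, uint64_t off, int match)
{
    struct loop_exit st = {0};
    uint64_t src = off, tmp = src + 17, cntrem;   /* add tmp, tmp, 17 */
    bool hit, last;

    if (cntin < tmp) { st.nomatch = true; return st; }   /* subs; b.lo */
    cntrem = cntin - tmp;                   /* one less than the old cntrem */
    if (!(cntrem & 16))                     /* tbz cntrem, 4, L(loop32_2) */
        goto loop32_2;
    src -= 16;                              /* sub src, src, 16 */
loop32:
    src += 32;                              /* ldr qdata, [src, 32]! */
    if (load(&st, src, match))              /* cbnz synd, L(end) */
        goto end;
loop32_2:
    hit = load(&st, src + 16, match);       /* ldr qdata, [src, 16]: no writeback */
    last = cntrem < 32;                     /* subs cntrem, cntrem, 32; b.lo L(end_2) */
    cntrem -= 32;
    if (!last && !hit)                      /* cbz synd, L(loop32) */
        goto loop32;
    src += 16;                              /* L(end_2): add src, src, 16 */
end:
    st.src_off = src;
    st.cntrem = cntin - src;                /* sub cntrem, src, srcin; sub cntrem, cntin, cntrem */
    return st;
}

/* Count cases where the two versions disagree, or where a load starts at
   or beyond srcin + cntin.  Assumes src = srcin & ~15 at L(start_loop),
   so src - srcin == -misalignment (mod 2^64). */
static int count_failures(uint64_t max_cnt)
{
    int bad = 0;
    for (uint64_t cntin = 0; cntin <= max_cnt; cntin++)
        for (uint64_t mis = 0; mis < 16; mis++)
            for (int match = -1; match < 24; match++) {
                struct loop_exit a = old_loop(cntin, 0 - mis, match);
                struct loop_exit b = new_loop(cntin, 0 - mis, match);
                bool ok = a.nomatch == b.nomatch && a.loads == b.loads
                          && a.src_off == b.src_off && a.cntrem == b.cntrem;
                for (int i = 0; ok && i < a.loads; i++)
                    ok = a.addr[i] == b.addr[i] && a.addr[i] < cntin;
                bad += !ok;
            }
    return bad;
}
```

For example, `new_loop(49, 0, -1)` models an aligned buffer of 49 bytes with no match. It issues three loads at offsets 16, 32 and 48, then reaches `L(end)` with src 48 bytes past srcin and one byte left, exactly as `old_loop` does.

The model says nothing about the speedup quoted in the patch. It only checks that the restructuring preserves behaviour under the stated assumptions.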