From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 13 Jan 2023 12:28:11 +0000
From: Szabolcs Nagy
To: Wilco Dijkstra
Cc: 'GNU C Library'
Subject: Re: [PATCH] AArch64: Optimize memrchr

The 01/12/2023 15:57, Wilco Dijkstra wrote:
> Optimize the main loop - large strings are 43% faster on modern CPUs.
> Passes regress.

please commit it, thanks.

Reviewed-by: Szabolcs Nagy

>
> ---
>
> diff --git a/sysdeps/aarch64/memrchr.S b/sysdeps/aarch64/memrchr.S
> index 9d2d29a396d46d6c2e74e3ca637091e2f3d68d5e..621fc65109736646b74900db8d15c6f8a7c68895 100644
> --- a/sysdeps/aarch64/memrchr.S
> +++ b/sysdeps/aarch64/memrchr.S
> @@ -26,7 +26,6 @@
>   * MTE compatible.
>   */
>
> -/* Arguments and results.  */
>  #define srcin		x0
>  #define chrin		w1
>  #define cntin		x2
> @@ -77,31 +76,34 @@ ENTRY (__memrchr)
>  	csel	result, result, xzr, hi
>  	ret
>
> +	nop
>  L(start_loop):
> -	sub	tmp, end, src
> -	subs	cntrem, cntin, tmp
> +	subs	cntrem, src, srcin
>  	b.ls	L(nomatch)
>
>  	/* Make sure that it won't overread by a 16-byte chunk */
> -	add	tmp, cntrem, 15
> -	tbnz	tmp, 4, L(loop32_2)
> +	sub	cntrem, cntrem, 1
> +	tbz	cntrem, 4, L(loop32_2)
> +	add	src, src, 16
>
> -	.p2align 4
> +	.p2align 5
>  L(loop32):
> -	ldr	qdata, [src, -16]!
> +	ldr	qdata, [src, -32]!
>  	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
>  	umaxp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
>  	fmov	synd, dend
>  	cbnz	synd, L(end)
>
>  L(loop32_2):
> -	ldr	qdata, [src, -16]!
> +	ldr	qdata, [src, -16]
>  	subs	cntrem, cntrem, 32
>  	cmeq	vhas_chr.16b, vdata.16b, vrepchr.16b
> -	b.ls	L(end)
> +	b.lo	L(end_2)
>  	umaxp	vend.16b, vhas_chr.16b, vhas_chr.16b		/* 128->64 */
>  	fmov	synd, dend
>  	cbz	synd, L(loop32)
> +L(end_2):
> +	sub	src, src, 16
>  L(end):
>  	shrn	vend.8b, vhas_chr.8h, 4		/* 128->64 */
>  	fmov	synd, dend