From: Wilco Dijkstra
To: "tirtajames45@gmail.com"
CC: 'GNU C Library'
Subject: Re: [PATCH] optimize memmem for short needles
Date: Wed, 6 Mar 2024 19:10:36 +0000

Hi James,

> The current implementation does not check for an early match with memchr
> before initializing the shift table because large shifts may be faster
> than memchr. However, for short shifts, memchr may be faster.
> find_edge_in_needle (taken from strstr_avx512) is used to find a rarer
> character for memchr to find.

It looks like it is faster on this particular benchmark - however I'm not convinced
doing this is faster in general. I tried the change on the SMART benchmark [1][2],
and there it is generally slower. A few cases show quite large differences:

rand2, len 2: 58% slower (len 3-16 also slower but by less)
rand4, len 2: 43% slower (len 3-16 also slower but by less)
rand8, len 2: 16% slower (len 3-16 slower but close)
chineseTexts, len 2: 8% faster
genome, len 2: 44% slower (len 3 slower but close)

Overall there is only one clear gain on Chinese but there are large slowdowns
with short needles in texts with a small number of symbols.

Cheers,
Wilco

[1] https://www.dmi.unict.it/faro/smart/howto.php
[2] https://github.com/smart-tool/smart.git
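
For readers following the thread, here is a minimal sketch of the general
"memchr first, then verify" idea under discussion. It is not the patch itself
and not glibc's memmem: the function name memmem_memchr_sketch is hypothetical,
and it simply scans for the needle's first byte rather than picking a rarer
byte as find_edge_in_needle is described as doing. It may help explain the
small-alphabet results: when the scanned byte occurs every few positions,
memchr probably returns almost immediately each time, so per-call overhead
would dominate.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration only: locate a candidate position with memchr,
   then confirm it with memcmp.  Uses the needle's first byte as the byte
   to scan for; the patch under discussion selects a rarer byte instead.  */
static void *
memmem_memchr_sketch (const void *hs, size_t hs_len,
                      const void *ne, size_t ne_len)
{
  const unsigned char *h = hs;
  const unsigned char *n = ne;

  /* An empty needle matches at the start of the haystack.  */
  if (ne_len == 0)
    return (void *) h;
  if (hs_len < ne_len)
    return NULL;

  /* Last position at which a full match can still start.  */
  const unsigned char *last = h + (hs_len - ne_len);

  while (h <= last)
    {
      /* Skip ahead to the next occurrence of the first needle byte.  */
      h = memchr (h, n[0], (size_t) (last - h) + 1);
      if (h == NULL)
        return NULL;
      /* Candidate found: verify the whole needle.  */
      if (memcmp (h, n, ne_len) == 0)
        return (void *) h;
      h++;
    }
  return NULL;
}
```

With a two-symbol text such as rand2, roughly every other position is a
candidate, so a loop of this shape makes one memchr call per couple of bytes.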