From: Wilco Dijkstra
To: 'GNU C Library'
CC: Szabolcs Nagy
Subject: [PATCH] AArch64: Improve strlen_asimd
Date: Thu, 12 Jan 2023 15:51:32 +0000

Use shrn for the mask, merge tst+bne into cbnz, and tweak code alignment.
Performance improves slightly as a result. Passes regress.

---

diff --git a/sysdeps/aarch64/multiarch/strlen_asimd.S b/sysdeps/aarch64/multiarch/strlen_asimd.S
index ca6ab96ecf2de45def79539facd8e0b86f4edc95..490439491d19c3f14b0228f42248bc8aa6e9e8bd 100644
--- a/sysdeps/aarch64/multiarch/strlen_asimd.S
+++ b/sysdeps/aarch64/multiarch/strlen_asimd.S
@@ -48,6 +48,7 @@
 #define tmp		x2
 #define tmpw		w2
 #define synd		x3
+#define syndw		w3
 #define shift		x4
 
 /* For the first 32 bytes, NUL detection works on the principle that
@@ -87,7 +88,6 @@
 
 ENTRY (__strlen_asimd)
 	PTR_ARG (0)
-
 	and	tmp1, srcin, MIN_PAGE_SIZE - 1
 	cmp	tmp1, MIN_PAGE_SIZE - 32
 	b.hi	L(page_cross)
@@ -123,7 +123,6 @@ ENTRY (__strlen_asimd)
 	add	len, len, tmp1, lsr 3
 	ret
 
-	.p2align 3
 	/* Look for a NUL byte at offset 16..31 in the string. */
 L(bytes16_31):
 	ldp	data1, data2, [srcin, 16]
@@ -151,6 +150,7 @@ L(bytes16_31):
 	add	len, len, tmp1, lsr 3
 	ret
 
+	nop
 L(loop_entry):
 	bic	src, srcin, 31
 
@@ -166,18 +166,12 @@ L(loop):
 	/* Low 32 bits of synd are non-zero if a NUL was found in datav1. */
 	cmeq	maskv.16b, datav1.16b, 0
 	sub	len, src, srcin
-	tst	synd, 0xffffffff
-	b.ne	1f
+	cbnz	syndw, 1f
 	cmeq	maskv.16b, datav2.16b, 0
 	add	len, len, 16
 1:
 	/* Generate a bitmask and compute correct byte offset. */
-#ifdef __AARCH64EB__
-	bic	maskv.8h, 0xf0
-#else
-	bic	maskv.8h, 0x0f, lsl 8
-#endif
-	umaxp	maskv.16b, maskv.16b, maskv.16b
+	shrn	maskv.8b, maskv.8h, 4
 	fmov	synd, maskd
 #ifndef __AARCH64EB__
 	rbit	synd, synd
@@ -186,8 +180,6 @@ L(loop):
 	add	len, len, tmp, lsr 2
 	ret
 
-	.p2align 4
-
 L(page_cross):
 	bic	src, srcin, 31
 	mov	tmpw, 0x0c03
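
For readers who want to see why a single shrn can replace the bic/umaxp pair, here is a rough scalar model in C of the new mask computation for one 16-byte vector. It is only a sketch under assumptions and is not part of the patch: it models the little-endian case only, the function names (cmeq_zero, shrn_syndrome, first_nul_offset, nul_offset16) are made up for illustration, and the final bit scan stands in for the rbit plus leading-zero count that the surrounding code presumably performs before the "lsr 2".

```c
#include <assert.h>
#include <stdint.h>

/* Model of "cmeq maskv.16b, datav.16b, 0": 0xff where the byte is NUL. */
static void cmeq_zero(const uint8_t data[16], uint8_t mask[16])
{
    for (int i = 0; i < 16; i++)
        mask[i] = (data[i] == 0) ? 0xff : 0x00;
}

/* Model of "shrn maskv.8b, maskv.8h, 4" then "fmov synd, maskd"
   (little-endian assumed): each 16-bit lane is shifted right by 4 and
   narrowed to 8 bits, so every input byte contributes one nibble to
   the 64-bit syndrome. */
static uint64_t shrn_syndrome(const uint8_t mask[16])
{
    uint64_t synd = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint16_t h = (uint16_t)(mask[2 * lane] | (mask[2 * lane + 1] << 8));
        uint8_t narrowed = (uint8_t)(h >> 4);
        synd |= (uint64_t)narrowed << (8 * lane);
    }
    return synd;
}

/* Index of the lowest set bit, divided by 4 (the "lsr 2"): since each
   byte owns 4 syndrome bits, this is the offset of the first NUL. */
static int first_nul_offset(uint64_t synd)
{
    assert(synd != 0);  /* caller must know a NUL is present */
    int bit = 0;
    while (((synd >> bit) & 1) == 0)
        bit++;
    return bit >> 2;
}

/* Convenience wrapper: offset of the first NUL in a 16-byte chunk. */
static int nul_offset16(const uint8_t data[16])
{
    uint8_t mask[16];
    cmeq_zero(data, mask);
    return first_nul_offset(shrn_syndrome(mask));
}
```

Calling nul_offset16 on a chunk holding "hello" followed by NUL bytes returns 5. The same nibble-per-byte layout is what makes the cbnz change work: the low 32 bits of the syndrome cover the first vector, so testing syndw alone is equivalent to the old tst with 0xffffffff.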