From: Cupertino Miranda
To: Szabolcs Nagy
Cc: Wilco Dijkstra, libc-alpha@sourceware.org
Subject: Re: [PATCH] AArch64: Optimize strlen
Date: Thu, 02 Mar 2023 11:49:48 +0000
Message-ID: <87lekf9xur.fsf@oracle.com>

Hi Szabolcs,

I am attempting to reproduce the reported performance improvements on an
Ampere Altra processor. Could you please describe your setup in more detail
and explain how you measured the results?

BTW, I am not looking to discredit the work, but rather to be able to
replicate the results on our end to evaluate backporting your patches.

Best regards,
Cupertino

Szabolcs Nagy via Libc-alpha writes:

> The 01/12/2023 15:53, Wilco Dijkstra wrote:
>> Optimize strlen by unrolling the main loop. Large strings are 64% faster on
>> modern CPUs. Passes regress.
>
> please commit it, thanks.
>
> Reviewed-by: Szabolcs Nagy
>
>
>>
>> ---
>>
>> diff --git a/sysdeps/aarch64/strlen.S b/sysdeps/aarch64/strlen.S
>> index b3c92d9dc9b3c52e29e05ebbb89b929f177dc2cf..133ef933425fa260e61642a7840d73391168507d 100644
>> --- a/sysdeps/aarch64/strlen.S
>> +++ b/sysdeps/aarch64/strlen.S
>> @@ -43,12 +43,9 @@
>>  #define dend		d2
>>
>>  /* Core algorithm:
>> -
>> -   For each 16-byte chunk we calculate a 64-bit nibble mask value with four bits
>> -   per byte. We take 4 bits of every comparison byte with shift right and narrow
>> -   by 4 instruction. Since the bits in the nibble mask reflect the order in
>> -   which things occur in the original string, counting trailing zeros identifies
>> -   exactly which byte matched.  */
>> +   Process the string in 16-byte aligned chunks. Compute a 64-bit mask with
>> +   four bits per byte using the shrn instruction. A count trailing zeros then
>> +   identifies the first zero byte.  */
>>
>>  ENTRY (STRLEN)
>>  	PTR_ARG (0)
>> @@ -68,18 +65,25 @@ ENTRY (STRLEN)
>>
>>  	.p2align 5
>>  L(loop):
>> -	ldr	data, [src, 16]!
>> +	ldr	data, [src, 16]
>> +	cmeq	vhas_nul.16b, vdata.16b, 0
>> +	umaxp	vend.16b, vhas_nul.16b, vhas_nul.16b
>> +	fmov	synd, dend
>> +	cbnz	synd, L(loop_end)
>> +	ldr	data, [src, 32]!
>>  	cmeq	vhas_nul.16b, vdata.16b, 0
>>  	umaxp	vend.16b, vhas_nul.16b, vhas_nul.16b
>>  	fmov	synd, dend
>>  	cbz	synd, L(loop)
>> -
>> +	sub	src, src, 16
>> +L(loop_end):
>>  	shrn	vend.8b, vhas_nul.8h, 4		/* 128->64 */
>>  	sub	result, src, srcin
>>  	fmov	synd, dend
>>  #ifndef __AARCH64EB__
>>  	rbit	synd, synd
>>  #endif
>> +	add	result, result, 16
>>  	clz	tmp, synd
>>  	add	result, result, tmp, lsr 2
>>  	ret
>>
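
For readers following the "Core algorithm" comment in the quoted patch, the
nibble-mask idea can be modelled in portable C. This is only an illustrative
sketch and not the glibc implementation: the names nibble_mask, first_zero
and model_strlen are hypothetical, a scalar loop stands in for the cmeq and
shrn instructions, and the buffer is assumed to be padded to a multiple of
16 bytes so that whole chunks can be read safely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the 64-bit mask with four bits per byte of a 16-byte chunk.
   Nibble i is 0xf when byte i is zero.  The assembly derives this with
   cmeq + shrn; this scalar loop is a stand-in for illustration only.  */
static uint64_t nibble_mask(const unsigned char chunk[16])
{
    uint64_t mask = 0;
    for (int i = 0; i < 16; i++)
        if (chunk[i] == 0)
            mask |= (uint64_t)0xf << (4 * i);
    return mask;
}

/* Index of the first zero byte in the chunk, or 16 if there is none.
   Counting trailing zero bits and dividing by 4 mirrors the
   "clz after rbit" followed by "lsr 2" in the quoted code.  */
static unsigned first_zero(const unsigned char chunk[16])
{
    uint64_t mask = nibble_mask(chunk);
    if (mask == 0)
        return 16;
    unsigned tz = 0;
    while ((mask & 1) == 0) {
        mask >>= 1;
        tz++;
    }
    return tz >> 2;
}

/* Hypothetical chunked strlen over a buffer of padded_len bytes.
   Assumption: padded_len is a multiple of 16, so every chunk read stays
   inside the buffer.  Returns padded_len if no zero byte is found.  */
static size_t model_strlen(const unsigned char *buf, size_t padded_len)
{
    assert(padded_len % 16 == 0);
    for (size_t off = 0; off < padded_len; off += 16) {
        unsigned idx = first_zero(buf + off);
        if (idx < 16)
            return off + idx;
    }
    return padded_len;
}
```

The sketch processes one chunk per iteration; the patch differs in that it
checks two chunks per loop iteration, which is where the unrolling comes in.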