Subject: Re: [PATCH v2] powerpc: Add optimized strlen for POWER10
To: Matheus Castanho ,
  libc-alpha@sourceware.org
References: <20210422122911.27758-1-msc@linux.ibm.com>
From: Raphael M Zinsly
Message-ID: <3a3e48a5-1083-dcf3-bcc2-32350fa4d20a@linux.ibm.com>
Date: Thu, 22 Apr 2021 12:22:47 -0300
In-Reply-To: <20210422122911.27758-1-msc@linux.ibm.com>

Hi Matheus,

The patch LGTM with some trivial changes.
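
The VEXTRACTBM and LXVP macros in the quoted patch hand-assemble POWER10 instructions with shift-and-OR arithmetic. As a hedged sketch, the same arithmetic can be reproduced in C to sanity-check the emitted words on any host. The helper names vextractbm_word and lxvp_word are hypothetical, and the range checks are assumptions added here, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not part of the patch): they repeat the
   shift-and-OR arithmetic of the VEXTRACTBM and LXVP macros so the
   resulting 32-bit words can be checked on any host.  A shift of
   (32 - k) places a field so that it ends at ISA bit k - 1, using the
   ISA's big-endian bit numbering (bit 0 is the most significant).  */

/* Mirrors VEXTRACTBM(rt,vrb): primary opcode 4, RT, the constant 8,
   VRB, and the extended opcode 1602 in the low bits.  */
static uint32_t vextractbm_word(uint32_t rt, uint32_t vrb)
{
  assert(rt < 32 && vrb < 32);   /* both are 5-bit register fields */
  return (UINT32_C(4) << (32 - 6))
         | (rt << (32 - 11))
         | (UINT32_C(8) << (32 - 16))
         | (vrb << (32 - 21))
         | UINT32_C(1602);
}

/* Mirrors LXVP(xtp,dq,ra): xtp is a VSR number in 32..63, so the TX bit
   is set and the 4-bit Tp field holds (xtp - 32) >> 1.  The range checks
   are assumptions of this sketch: an even VSR pair, and a non-negative
   offset below 0x8000 that is a multiple of 16 (the low 4 bits of a
   DQ-form displacement are implied zeros).  */
static uint32_t lxvp_word(uint32_t xtp, uint32_t dq, uint32_t ra)
{
  assert(xtp >= 32 && xtp < 64 && xtp % 2 == 0);
  assert(dq % 16 == 0 && dq < 0x8000);
  assert(ra < 32);
  return (UINT32_C(6) << (32 - 6))
         | (((xtp - 32) >> 1) << (32 - 10))
         | (UINT32_C(1) << (32 - 11))
         | (ra << (32 - 16))
         | dq;
}
```

For example, vextractbm_word(3, 2) evaluates to 0x10681642, and lxvp_word(36, 32, 3), which corresponds to LXVP(v4+32,32,r3) loading the pair v4:v5 assuming v4 expands to 4, evaluates to 0x18A30020. Only non-negative offsets are modelled; comparing such values with a disassembly from a binutils that knows the POWER10 mnemonics would be one way to cross-check the macros.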

On 22/04/2021 09:29, Matheus Castanho via Libc-alpha wrote:
> diff --git a/sysdeps/powerpc/powerpc64/le/power10/strlen.S b/sysdeps/powerpc/powerpc64/le/power10/strlen.S
> new file mode 100644
> index 0000000000..7eb37a8f54
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/le/power10/strlen.S
> @@ -0,0 +1,221 @@
> +/* Optimized strlen implementation for POWER10 LE.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   . */
> +
> +#include
> +
> +#ifndef STRLEN
> +# define STRLEN __strlen
> +# define DEFINE_STRLEN_HIDDEN_DEF 1
> +#endif
> +
> +/* TODO: Replace macros by the actual instructions when minimum binutils becomes
> +   >= 2.35. This is used to keep compatibility with older versions. */
> +#define VEXTRACTBM(rt,vrb) \
> +	.long(((4)<<(32-6)) \
> +	      | ((rt)<<(32-11)) \
> +	      | ((8)<<(32-16)) \
> +	      | ((vrb)<<(32-21)) \
> +	      | 1602)
> +
> +#define LXVP(xtp,dq,ra) \
> +	.long(((6)<<(32-6)) \
> +	      | ((((xtp)-32)>>1)<<(32-10)) \
> +	      | ((1)<<(32-11)) \
> +	      | ((ra)<<(32-16)) \
> +	      | dq)
> +
> +#define CHECK16(vreg,offset,addr,label) \
> +	lxv vreg+32,offset(addr); \
> +	vcmpequb. vreg,vreg,v18; \
> +	bne cr6,L(label);
> +
> +/* Load 4 quadwords, merge into one VR for speed and check for NULLs. r6 has #
> +   of bytes already checked. */
> +#define CHECK64(offset,addr,label) \
> +	li r6,offset; \
> +	LXVP(v4+32,offset,addr); \
> +	LXVP(v6+32,offset+32,addr); \
> +	vminub v14,v4,v5; \
> +	vminub v15,v6,v7; \
> +	vminub v16,v14,v15; \
> +	vcmpequb. v0,v16,v18; \
> +	bne cr6,L(label)
> +
> +# define TAIL(vreg,increment) \

nit: the space before define is not needed.

> +	vctzlsbb r4,vreg; \
> +	subf r3,r3,r5; \
> +	addi r4,r4,increment; \
> +	add r3,r3,r4; \
> +	blr
> +
> +/* Implements the function
> +
> +   int [r3] strlen (const void *s [r3])
> +
> +   The implementation can load bytes past a matching byte, but only
> +   up to the next 64B boundary, so it never crosses a page. */
> +
> +.machine power9
> +
> +ENTRY_TOCLESS (STRLEN, 4)
> +	CALL_MCOUNT 1
> +
> +	vspltisb v18,0
> +	vspltisb v19,-1
> +
> +	/* Next 16B-aligned address. Prepare address for L(aligned). */
> +	addi r5,r3,16
> +	clrrdi r5,r5,4
> +
> +	/* Align data and fill bytes not loaded with non matching char. */
> +	lvx v0,0,r3
> +	lvsr v1,0,r3
> +	vperm v0,v19,v0,v1
> +
> +	vcmpequb. v6,v0,v18
> +	beq cr6,L(aligned)
> +
> +	vctzlsbb r3,v6
> +	blr
> +
> +	/* Test more 112B, 16B at a time. The main loop is optimized for longer

s/112B/176B/

> +	   strings, so checking the first bytes in 16B chunks benefits a lot
> +	   small strings. */
> +	.p2align 5
> +L(aligned):

-- 
Raphael Moreira Zinsly