Subject: Re: [PATCH v2 2/2] powerpc: Add optimized stpncpy for POWER9
To: Raphael Moreira Zinsly, libc-alpha@sourceware.org
References: <20200904165653.16202-1-rzinsly@linux.ibm.com> <20200904165653.16202-2-rzinsly@linux.ibm.com>
From: Matheus Castanho
Message-ID: <2f66a54f-c290-495b-67c8-2038ee73409c@linux.ibm.com>
Date: Wed, 16 Sep 2020 09:35:33 -0300
In-Reply-To: <20200904165653.16202-2-rzinsly@linux.ibm.com>

On 9/4/20 1:56 PM, Raphael Moreira Zinsly via Libc-alpha wrote:
> Add stpncpy support into the POWER9 strncpy.
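
For reference while reading the hunks below, here is a plain C sketch of the
stpncpy contract that the USE_AS_STPNCPY paths have to honour: same copy and
zero padding as strncpy, but the return value points at the first null written
into dest, or at dest + n when the source is truncated. The name ref_stpncpy is
hypothetical and chosen only for illustration; this is not the glibc
implementation and says nothing about how the assembly is organized.

```c
#include <stddef.h>

/* Hypothetical reference helper, not part of the patch or of glibc.  */
char *
ref_stpncpy (char *dest, const char *src, size_t n)
{
  size_t i = 0;

  /* Copy bytes until n is exhausted or the source terminator is reached.  */
  while (i < n && src[i] != '\0')
    {
      dest[i] = src[i];
      i++;
    }

  /* The return value: first null written, or dest + n if none fits.  */
  char *end = dest + i;

  /* Zero-pad the remainder of the n bytes, as strncpy does.  */
  for (; i < n; i++)
    dest[i] = '\0';

  return end;
}
```

With an 8-byte buffer buf, ref_stpncpy (buf, "ab", 5) returns buf + 2 and
ref_stpncpy (buf, "abcdef", 3) returns buf + 3, which matches the two families
of "add r3,r11,..." additions in the patch: the paths that find a null and the
paths that run out of n.
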
> ---
>  sysdeps/powerpc/powerpc64/le/power9/stpncpy.S | 24 +++++++
>  sysdeps/powerpc/powerpc64/le/power9/strncpy.S | 65 +++++++++++++++++++
>  sysdeps/powerpc/powerpc64/multiarch/Makefile  |  2 +-
>  .../powerpc64/multiarch/ifunc-impl-list.c     |  5 ++
>  .../powerpc64/multiarch/stpncpy-power9.S      | 24 +++++++
>  sysdeps/powerpc/powerpc64/multiarch/stpncpy.c |  7 ++
>  6 files changed, 126 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/powerpc/powerpc64/le/power9/stpncpy.S
>  create mode 100644 sysdeps/powerpc/powerpc64/multiarch/stpncpy-power9.S
>
> diff --git a/sysdeps/powerpc/powerpc64/le/power9/stpncpy.S b/sysdeps/powerpc/powerpc64/le/power9/stpncpy.S
> new file mode 100644
> index 0000000000..81d9673d8b
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/le/power9/stpncpy.S
> @@ -0,0 +1,24 @@
> +/* Optimized stpncpy implementation for POWER9 LE.
> +   Copyright (C) 2020 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   .  */
> +
> +#define USE_AS_STPNCPY
> +#include
> +
> +weak_alias (__stpncpy, stpncpy)
> +libc_hidden_def (__stpncpy)
> +libc_hidden_builtin_def (stpncpy)
> diff --git a/sysdeps/powerpc/powerpc64/le/power9/strncpy.S b/sysdeps/powerpc/powerpc64/le/power9/strncpy.S
> index 34fcdee913..f7265b11ec 100644
> --- a/sysdeps/powerpc/powerpc64/le/power9/strncpy.S
> +++ b/sysdeps/powerpc/powerpc64/le/power9/strncpy.S
> @@ -18,16 +18,30 @@
>
>  #include
>
> +#ifdef USE_AS_STPNCPY
> +# ifndef STPNCPY
> +#  define FUNC_NAME __stpncpy
> +# else
> +#  define FUNC_NAME STPNCPY
> +# endif
> +#else
>  # ifndef STRNCPY
>  #  define FUNC_NAME strncpy
>  # else
>  #  define FUNC_NAME STRNCPY
>  # endif
> +#endif  /* !USE_AS_STPNCPY  */
>
>  /* Implements the function
>
>     char * [r3] strncpy (char *dest [r3], const char *src [r4], size_t n [r5])
>
> +   or
> +
> +   char * [r3] stpncpy (char *dest [r3], const char *src [r4], size_t n [r5])
> +
> +   if USE_AS_STPNCPY is defined.
> +
>     The implementation can load bytes past a null terminator, but only
>     up to the next 16-byte aligned address, so it never crosses a page.  */
>
> @@ -49,7 +63,15 @@ ENTRY_TOCLESS (FUNC_NAME, 4)
>
>  	/* Empty/1-byte string optimization  */
>  	cmpdi	r5,0
> +#ifdef USE_AS_STPNCPY
> +	bgt	L(cont)
> +	/* Compute pointer to last byte copied into dest.  */
> +	addi	r3,r3,1
> +	blr
> +L(cont):
> +#else
>  	beqlr
> +#endif
>
>  	addi	r4,r4,1
>  	neg	r7,r4
> @@ -79,12 +101,20 @@ ENTRY_TOCLESS (FUNC_NAME, 4)
>  	sldi	r10,r5,56	/* stxvl wants size in top 8 bits  */
>  	stxvl	32+v0,r11,r10	/* Partial store  */
>
> +#ifdef USE_AS_STPNCPY
> +	/* Compute pointer to last byte copied into dest.  */
> +	add	r3,r11,r5
> +#endif
>  	blr
>
>  L(null):
>  	sldi	r10,r8,56	/* stxvl wants size in top 8 bits  */
>  	stxvl	32+v0,r11,r10	/* Partial store  */
>
> +#ifdef USE_AS_STPNCPY
> +	/* Compute pointer to last byte copied into dest.  */
> +	add	r3,r11,r7
> +#endif
>  	add	r11,r11,r8
>  	sub	r5,r5,r8
>  	b	L(zero_padding_loop)
> @@ -168,6 +198,10 @@ L(n_tail4):
>  	sldi	r10,r5,56	/* stxvl wants size in top 8 bits  */
>  	addi	r11,r11,48	/* Offset */
>  	stxvl	32+v3,r11,r10	/* Partial store  */
> +#ifdef USE_AS_STPNCPY
> +	/* Compute pointer to last byte copied into dest.  */
> +	add	r3,r11,r5
> +#endif
>  	blr
>
>  L(prep_n_tail1):
> @@ -179,6 +213,10 @@ L(prep_n_tail1):
>  L(n_tail1):
>  	sldi	r10,r5,56	/* stxvl wants size in top 8 bits  */
>  	stxvl	32+v0,r11,r10	/* Partial store  */
> +#ifdef USE_AS_STPNCPY
> +	/* Compute pointer to last byte copied into dest.  */
> +	add	r3,r11,r5
> +#endif
>  	blr
>
>  L(prep_n_tail2):
> @@ -192,6 +230,10 @@ L(n_tail2):
>  	sldi	r10,r5,56	/* stxvl wants size in top 8 bits  */
>  	addi	r11,r11,16	/* offset */
>  	stxvl	32+v1,r11,r10	/* Partial store  */
> +#ifdef USE_AS_STPNCPY
> +	/* Compute pointer to last byte copied into dest.  */
> +	add	r3,r11,r5
> +#endif
>  	blr
>
>  L(prep_n_tail3):
> @@ -206,6 +248,10 @@ L(n_tail3):
>  	sldi	r10,r5,56	/* stxvl wants size in top 8 bits  */
>  	addi	r11,r11,32	/* Offset */
>  	stxvl	32+v2,r11,r10	/* Partial store  */
> +#ifdef USE_AS_STPNCPY
> +	/* Compute pointer to last byte copied into dest.  */
> +	add	r3,r11,r5
> +#endif
>  	blr
>
>  L(prep_tail1):
> @@ -215,6 +261,10 @@ L(tail1):
>  	addi	r9,r8,1		/* Add null terminator  */
>  	sldi	r10,r9,56	/* stxvl wants size in top 8 bits  */
>  	stxvl	32+v0,r11,r10	/* Partial store  */
> +#ifdef USE_AS_STPNCPY
> +	/* Compute pointer to last byte copied into dest.  */
> +	add	r3,r11,r8
> +#endif
>  	add	r11,r11,r9
>  	sub	r5,r5,r9
>  	b	L(zero_padding_loop)
> @@ -229,6 +279,10 @@ L(tail2):
>  	sldi	r10,r9,56	/* stxvl wants size in top 8 bits  */
>  	addi	r11,r11,16	/* offset */
>  	stxvl	32+v1,r11,r10	/* Partial store  */
> +#ifdef USE_AS_STPNCPY
> +	/* Compute pointer to last byte copied into dest.  */
> +	add	r3,r11,r8
> +#endif
>  	add	r11,r11,r9
>  	sub	r5,r5,r9
>  	b	L(zero_padding_loop)
> @@ -244,6 +298,10 @@ L(tail3):
>  	sldi	r10,r9,56	/* stxvl wants size in top 8 bits  */
>  	addi	r11,r11,32	/* offset */
>  	stxvl	32+v2,r11,r10	/* Partial store  */
> +#ifdef USE_AS_STPNCPY
> +	/* Compute pointer to last byte copied into dest.  */
> +	add	r3,r11,r8
> +#endif
>  	add	r11,r11,r9
>  	sub	r5,r5,r9
>  	b	L(zero_padding_loop)
> @@ -259,6 +317,10 @@ L(tail4):
>  	sldi	r10,r9,56	/* stxvl wants size in top 8 bits  */
>  	addi	r11,r11,48	/* offset */
>  	stxvl	32+v3,r11,r10	/* Partial store  */
> +#ifdef USE_AS_STPNCPY
> +	/* Compute pointer to last byte copied into dest.  */
> +	add	r3,r11,r8
> +#endif
>  	add	r11,r11,r9
>  	sub	r5,r5,r9
>
> @@ -279,3 +341,6 @@ L(zero_padding_end):
>  	blr
>
>  END (FUNC_NAME)
> +#ifndef USE_AS_STPNCPY
> +libc_hidden_builtin_def (strncpy)
> +#endif
> diff --git a/sysdeps/powerpc/powerpc64/multiarch/Makefile b/sysdeps/powerpc/powerpc64/multiarch/Makefile
> index cd2b47b403..f46bf50732 100644
> --- a/sysdeps/powerpc/powerpc64/multiarch/Makefile
> +++ b/sysdeps/powerpc/powerpc64/multiarch/Makefile
> @@ -33,7 +33,7 @@ sysdep_routines += memcpy-power8-cached memcpy-power7 memcpy-a2 memcpy-power6 \
>
>  ifneq (,$(filter %le,$(config-machine)))
>  sysdep_routines += strcmp-power9 strncmp-power9 strcpy-power9 stpcpy-power9 \
> -		   rawmemchr-power9 strlen-power9 strncpy-power9
> +		   rawmemchr-power9 strlen-power9 strncpy-power9 stpncpy-power9
>  endif
>  CFLAGS-strncase-power7.c += -mcpu=power7 -funroll-loops
>  CFLAGS-strncase_l-power7.c += -mcpu=power7 -funroll-loops
> diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
> index aa63e1c23f..56790bcfe3 100644
> --- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
> @@ -317,6 +317,11 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>
>    /* Support sysdeps/powerpc/powerpc64/multiarch/stpncpy.c.  */
>    IFUNC_IMPL (i, name, stpncpy,
> +#ifdef __LITTLE_ENDIAN__
> +	      IFUNC_IMPL_ADD (array, i, stpncpy,
> +			      hwcap2 & PPC_FEATURE2_ARCH_3_00,
> +			      __stpncpy_power9)
> +#endif
>  	      IFUNC_IMPL_ADD (array, i, stpncpy,
>  			      hwcap2 & PPC_FEATURE2_ARCH_2_07,
>  			      __stpncpy_power8)
> diff --git a/sysdeps/powerpc/powerpc64/multiarch/stpncpy-power9.S b/sysdeps/powerpc/powerpc64/multiarch/stpncpy-power9.S
> new file mode 100644
> index 0000000000..ccbab55c31
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/multiarch/stpncpy-power9.S
> @@ -0,0 +1,24 @@
> +/* Optimized stpncpy implementation for POWER9 LE.
> +   Copyright (C) 2020 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   .  */
> +
> +#define STPNCPY __stpncpy_power9
> +
> +#undef libc_hidden_builtin_def
> +#define libc_hidden_builtin_def(name)
> +
> +#include
> diff --git a/sysdeps/powerpc/powerpc64/multiarch/stpncpy.c b/sysdeps/powerpc/powerpc64/multiarch/stpncpy.c
> index 17df886431..ac17b26650 100644
> --- a/sysdeps/powerpc/powerpc64/multiarch/stpncpy.c
> +++ b/sysdeps/powerpc/powerpc64/multiarch/stpncpy.c
> @@ -26,10 +26,17 @@
>  extern __typeof (__stpncpy) __stpncpy_ppc attribute_hidden;
>  extern __typeof (__stpncpy) __stpncpy_power7 attribute_hidden;
>  extern __typeof (__stpncpy) __stpncpy_power8 attribute_hidden;
> +# ifdef __LITTLE_ENDIAN__
> +extern __typeof (__stpncpy) __stpncpy_power9 attribute_hidden;
> +# endif
>  # undef stpncpy
>  # undef __stpncpy
>
>  libc_ifunc_redirected (__redirect___stpncpy, __stpncpy,
> +# ifdef __LITTLE_ENDIAN__
> +		     (hwcap2 & PPC_FEATURE2_ARCH_3_00)
> +		     ? __stpncpy_power9 :
> +# endif
>  		       (hwcap2 & PPC_FEATURE2_ARCH_2_07)
>  		       ? __stpncpy_power8
>  		       : (hwcap & PPC_FEATURE_HAS_VSX)
>

LGTM.

Reviewed-by: Matheus Castanho

--
Matheus Castanho