Subject: Re: [PATCH v2] powerpc: Optimized memmove for POWER10
To: "Lucas A. M. Magalhaes", libc-alpha@sourceware.org
Magalhaes" , libc-alpha@sourceware.org References: <20210429204231.1486973-1-lamm@linux.ibm.com> From: Raphael M Zinsly Message-ID: <0968f636-0ced-72c3-1c2e-fa0e8339281d@linux.ibm.com> Date: Thu, 29 Apr 2021 18:24:11 -0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.5.0 In-Reply-To: <20210429204231.1486973-1-lamm@linux.ibm.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US X-TM-AS-GCONF: 00 X-Proofpoint-GUID: 8SpxmDLFgITBivoJxz5e62KRvwekT2Yv X-Proofpoint-ORIG-GUID: 8SpxmDLFgITBivoJxz5e62KRvwekT2Yv Content-Transfer-Encoding: 7bit X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.761 definitions=2021-04-29_11:2021-04-28, 2021-04-29 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 spamscore=0 adultscore=0 mlxlogscore=999 malwarescore=0 mlxscore=0 phishscore=0 impostorscore=0 bulkscore=0 clxscore=1015 suspectscore=0 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2104060000 definitions=main-2104290135 X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_NUMSUBJECT, KAM_SHORT, NICE_REPLY_A, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Apr 2021 21:24:20 -0000 The patch LGTM now, thanks! On 29/04/2021 17:42, Lucas A. M. Magalhaes via Libc-alpha wrote: > Thanks for you for the reviews of the first version > > Changes from v1: > - Fix comments > - Add bcopy code > - Fix Makefile entry > - Fix END macro > > -- >8 -- > > This patch was initially based on the __memmove_power7 with some ideas > from strncpy implementation for Power 9. > > Improvements from __memmove_power7: > > 1. Use lxvl/stxvl for alignment code. > > The code for Power 7 uses branches when the input is not naturally > aligned to the width of a vector. The new implementation uses > lxvl/stxvl instead which reduces pressure on GPRs. It also allows > the removal of branch instructions, implicitly removing branch stalls > and mispredictions. > > 2. Use of lxv/stxv and lxvl/stxvl pair is safe to use on Cache Inhibited > memory. > > On Power 10 vector load and stores are safe to use on CI memory for > addresses unaligned to 16B. This code takes advantage of this to > do unaligned loads. > > The unaligned loads don't have a significant performance impact by > themselves. However doing so decreases register pressure on GPRs > and interdependence stalls on load/store pairs. This also improved > readability as there are now less code paths for different alignments. > Finally this reduces the overall code size. > > 3. Improved performance. > > This version runs on average about 30% better than memmove_power7 > for lengths larger than 8KB. For input lengths shorter than 8KB > the improvement is smaller, it has on average about 17% better > performance. > > This version has a degradation of about 50% for input lengths > in the 0 to 31 bytes range when dest is unaligned. 
> ---
>  .../powerpc/powerpc64/le/power10/memmove.S    | 320 ++++++++++++++++++
>  sysdeps/powerpc/powerpc64/multiarch/Makefile  |   5 +-
>  sysdeps/powerpc/powerpc64/multiarch/bcopy.c   |   9 +
>  .../powerpc64/multiarch/ifunc-impl-list.c     |  14 +
>  .../powerpc64/multiarch/memmove-power10.S     |  27 ++
>  .../powerpc64/multiarch/memmove-power7.S      |   4 +-
>  sysdeps/powerpc/powerpc64/multiarch/memmove.c |  16 +-
>  sysdeps/powerpc/powerpc64/power7/memmove.S    |   2 +
>  8 files changed, 389 insertions(+), 8 deletions(-)
>  create mode 100644 sysdeps/powerpc/powerpc64/le/power10/memmove.S
>  create mode 100644 sysdeps/powerpc/powerpc64/multiarch/memmove-power10.S
>
> diff --git a/sysdeps/powerpc/powerpc64/le/power10/memmove.S b/sysdeps/powerpc/powerpc64/le/power10/memmove.S
> new file mode 100644
> index 0000000000..7dfd57edeb
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/le/power10/memmove.S
> @@ -0,0 +1,320 @@
> +/* Optimized memmove implementation for POWER10.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +
> +
> +/* void* [r3] memmove (void *dest [r3], const void *src [r4], size_t len [r5])
> +
> +   This optimization checks if 'src' and 'dest' overlap.  If they do not
> +   or 'src' is ahead of 'dest' then it copies forward.
> +   Otherwise, an optimized backward copy is used.  */
> +
> +#ifndef MEMMOVE
> +# define MEMMOVE memmove
> +#endif
> +	.machine power9
> +ENTRY_TOCLESS (MEMMOVE, 5)
> +	CALL_MCOUNT 3
> +
> +L(_memmove):
> +	.p2align 5
> +	/* Check if there is overlap; if so, branch to the backward copy.  */
> +	subf	r9,r4,r3
> +	cmpld	cr7,r9,r5
> +	blt	cr7,L(memmove_bwd)
> +
> +	/* Fast path for lengths shorter than 16 bytes.  */
> +	sldi	r7,r5,56
> +	lxvl	32+v2,r4,r7
> +	stxvl	32+v2,r3,r7
> +	subic.	r8,r5,16
> +	blelr
> +
> +	/* For shorter lengths aligning the dest address to 16 bytes either
> +	   decreases performance or is irrelevant.  This comparison is also
> +	   used to skip the alignment code in that case.  */
> +	cmpldi	cr6,r5,256
> +	bge	cr6,L(ge_256)
> +	/* Account for the first 16-byte copy.  */
> +	addi	r4,r4,16
> +	addi	r11,r3,16	/* use r11 to keep dest address on r3.  */
> +	subi	r5,r5,16
> +	b	L(loop_head)
> +
> +	.p2align 5
> +L(ge_256):
> +	/* Account for the first copy <= 16 bytes.  This is necessary for
> +	   memmove because at this point the src address can be in front of
> +	   the dest address.  */
> +	clrldi	r9,r5,56
> +	li	r8,16
> +	cmpldi	r9,16
> +	iselgt	r9,r8,r9
> +	add	r4,r4,r9
> +	add	r11,r3,r9	/* use r11 to keep dest address on r3.  */
> +	sub	r5,r5,r9
> +
> +	/* Align dest to 16 bytes.  */
> +	neg	r7,r3
> +	clrldi.	r9,r7,60
> +	beq	L(loop_head)
> +
> +	.p2align 5
> +	sldi	r6,r9,56
> +	lxvl	32+v0,r4,r6
> +	stxvl	32+v0,r11,r6
> +	sub	r5,r5,r9
> +	add	r4,r4,r9
> +	add	r11,r11,r9
> +
> +L(loop_head):
> +	cmpldi	r5,63
> +	ble	L(final_64)
> +
> +	srdi.	r7,r5,7
> +	beq	L(loop_tail)
> +
> +	mtctr	r7
> +
> +/* Main loop that copies 128 bytes each iteration.  */
> +	.p2align 5
> +L(loop):
> +	addi	r9,r4,64
> +	addi	r10,r11,64
> +
> +	lxv	32+v0,0(r4)
> +	lxv	32+v1,16(r4)
> +	lxv	32+v2,32(r4)
> +	lxv	32+v3,48(r4)
> +
> +	stxv	32+v0,0(r11)
> +	stxv	32+v1,16(r11)
> +	stxv	32+v2,32(r11)
> +	stxv	32+v3,48(r11)
> +
> +	addi	r4,r4,128
> +	addi	r11,r11,128
> +
> +	lxv	32+v4,0(r9)
> +	lxv	32+v5,16(r9)
> +	lxv	32+v6,32(r9)
> +	lxv	32+v7,48(r9)
> +
> +	stxv	32+v4,0(r10)
> +	stxv	32+v5,16(r10)
> +	stxv	32+v6,32(r10)
> +	stxv	32+v7,48(r10)
> +
> +	bdnz	L(loop)
> +	clrldi.	r5,r5,57
> +	beqlr
> +
> +/* Copy 64 bytes.  */
> +	.p2align 5
> +L(loop_tail):
> +	cmpldi	cr5,r5,63
> +	ble	cr5,L(final_64)
> +
> +	lxv	32+v0,0(r4)
> +	lxv	32+v1,16(r4)
> +	lxv	32+v2,32(r4)
> +	lxv	32+v3,48(r4)
> +
> +	stxv	32+v0,0(r11)
> +	stxv	32+v1,16(r11)
> +	stxv	32+v2,32(r11)
> +	stxv	32+v3,48(r11)
> +
> +	addi	r4,r4,64
> +	addi	r11,r11,64
> +	subi	r5,r5,64
> +
> +/* Copies the last 1-63 bytes.  */
> +	.p2align 5
> +L(final_64):
> +	/* r8 holds the number of bytes that will be copied with lxv/stxv.  */
> +	clrrdi.	r8,r5,4
> +	beq	L(tail1)
> +
> +	cmpldi	cr5,r5,32
> +	lxv	32+v0,0(r4)
> +	blt	cr5,L(tail2)
> +
> +	cmpldi	cr6,r5,48
> +	lxv	32+v1,16(r4)
> +	blt	cr6,L(tail3)
> +
> +	.p2align 5
> +	lxv	32+v2,32(r4)
> +	stxv	32+v2,32(r11)
> +L(tail3):
> +	stxv	32+v1,16(r11)
> +L(tail2):
> +	stxv	32+v0,0(r11)
> +	sub	r5,r5,r8
> +	add	r4,r4,r8
> +	add	r11,r11,r8
> +	.p2align 5
> +L(tail1):
> +	sldi	r6,r5,56
> +	lxvl	v4,r4,r6
> +	stxvl	v4,r11,r6
> +	blr
> +
> +/* If dest and src overlap, we should copy backwards.  */
> +L(memmove_bwd):
> +	add	r11,r3,r5
> +	add	r4,r4,r5
> +
> +	/* Optimization for lengths smaller than 16 bytes.  */
> +	cmpldi	cr5,r5,15
> +	ble	cr5,L(tail1_bwd)
> +
> +	/* For shorter lengths the alignment either slows down or is
> +	   irrelevant.  The forward copy already needs a comparison against
> +	   256 for that.  Here 128 is used instead, as it reduces code and
> +	   improves readability.  */
> +	cmpldi	cr7,r5,128
> +	blt	cr7,L(bwd_loop_tail)
> +
> +	/* Align dest address to 16 bytes.  */
> +	.p2align 5
> +	clrldi.	r9,r11,60
> +	beq	L(bwd_loop_head)
> +	sub	r4,r4,r9
> +	sub	r11,r11,r9
> +	lxv	32+v0,0(r4)
> +	sldi	r6,r9,56
> +	stxvl	32+v0,r11,r6
> +	sub	r5,r5,r9
> +
> +L(bwd_loop_head):
> +	srdi.	r7,r5,7
> +	beq	L(bwd_loop_tail)
> +
> +	mtctr	r7
> +
> +/* Main loop that copies 128 bytes every iteration.  */
> +	.p2align 5
> +L(bwd_loop):
> +	addi	r9,r4,-64
> +	addi	r10,r11,-64
> +
> +	lxv	32+v0,-16(r4)
> +	lxv	32+v1,-32(r4)
> +	lxv	32+v2,-48(r4)
> +	lxv	32+v3,-64(r4)
> +
> +	stxv	32+v0,-16(r11)
> +	stxv	32+v1,-32(r11)
> +	stxv	32+v2,-48(r11)
> +	stxv	32+v3,-64(r11)
> +
> +	addi	r4,r4,-128
> +	addi	r11,r11,-128
> +
> +	lxv	32+v0,-16(r9)
> +	lxv	32+v1,-32(r9)
> +	lxv	32+v2,-48(r9)
> +	lxv	32+v3,-64(r9)
> +
> +	stxv	32+v0,-16(r10)
> +	stxv	32+v1,-32(r10)
> +	stxv	32+v2,-48(r10)
> +	stxv	32+v3,-64(r10)
> +
> +	bdnz	L(bwd_loop)
> +	clrldi.	r5,r5,57
> +	beqlr
> +
> +/* Copy 64 bytes.  */
> +	.p2align 5
> +L(bwd_loop_tail):
> +	cmpldi	cr5,r5,63
> +	ble	cr5,L(bwd_final_64)
> +
> +	addi	r4,r4,-64
> +	addi	r11,r11,-64
> +
> +	lxv	32+v0,0(r4)
> +	lxv	32+v1,16(r4)
> +	lxv	32+v2,32(r4)
> +	lxv	32+v3,48(r4)
> +
> +	stxv	32+v0,0(r11)
> +	stxv	32+v1,16(r11)
> +	stxv	32+v2,32(r11)
> +	stxv	32+v3,48(r11)
> +
> +	subi	r5,r5,64
> +
> +/* Copies the last 1-63 bytes.  */
> +	.p2align 5
> +L(bwd_final_64):
> +	/* r8 holds the number of bytes that will be copied with lxv/stxv.  */
> +	clrrdi.	r8,r5,4
> +	beq	L(tail1_bwd)
> +
> +	cmpldi	cr5,r5,32
> +	lxv	32+v2,-16(r4)
> +	blt	cr5,L(tail2_bwd)
> +
> +	cmpldi	cr6,r5,48
> +	lxv	32+v1,-32(r4)
> +	blt	cr6,L(tail3_bwd)
> +
> +	.p2align 5
> +	lxv	32+v0,-48(r4)
> +	stxv	32+v0,-48(r11)
> +L(tail3_bwd):
> +	stxv	32+v1,-32(r11)
> +L(tail2_bwd):
> +	stxv	32+v2,-16(r11)
> +	sub	r4,r4,r5
> +	sub	r11,r11,r5
> +	sub	r5,r5,r8
> +	sldi	r6,r5,56
> +	lxvl	v4,r4,r6
> +	stxvl	v4,r11,r6
> +	blr
> +
> +/* Copy last 16 bytes.  */
> +	.p2align 5
> +L(tail1_bwd):
> +	sub	r4,r4,r5
> +	sub	r11,r11,r5
> +	sldi	r6,r5,56
> +	lxvl	v4,r4,r6
> +	stxvl	v4,r11,r6
> +	blr
> +
> +END_GEN_TB (MEMMOVE,TB_TOCLESS)
> +libc_hidden_builtin_def (memmove)
> +
> +/* void bcopy(const void *src [r3], void *dest [r4], size_t n [r5])
> +   Implemented in this file to avoid the linker creating a stub function
> +   call in the branch to '_memmove'.  */
> +ENTRY_TOCLESS (__bcopy)
> +	mr	r6,r3
> +	mr	r3,r4
> +	mr	r4,r6
> +	b	L(_memmove)
> +END (__bcopy)
> +#ifndef __bcopy
> +weak_alias (__bcopy, bcopy)
> +#endif
> diff --git a/sysdeps/powerpc/powerpc64/multiarch/Makefile b/sysdeps/powerpc/powerpc64/multiarch/Makefile
> index 8aa46a3702..cd5f46576b 100644
> --- a/sysdeps/powerpc/powerpc64/multiarch/Makefile
> +++ b/sysdeps/powerpc/powerpc64/multiarch/Makefile
> @@ -24,7 +24,8 @@ sysdep_routines += memcpy-power8-cached memcpy-power7 memcpy-a2 memcpy-power6 \
>  		   stpncpy-power8 stpncpy-power7 stpncpy-ppc64 \
>  		   strcmp-power8 strcmp-power7 strcmp-ppc64 \
>  		   strcat-power8 strcat-power7 strcat-ppc64 \
> -		   memmove-power7 memmove-ppc64 wordcopy-ppc64 bcopy-ppc64 \
> +		   memmove-power7 memmove-ppc64 \
> +		   wordcopy-ppc64 bcopy-ppc64 \
>  		   strncpy-power8 strstr-power7 strstr-ppc64 \
>  		   strspn-power8 strspn-ppc64 strcspn-power8 strcspn-ppc64 \
>  		   strlen-power8 strcasestr-power8 strcasestr-ppc64 \
> @@ -34,7 +35,7 @@ sysdep_routines += memcpy-power8-cached memcpy-power7 memcpy-a2 memcpy-power6 \
>  ifneq (,$(filter %le,$(config-machine)))
>  sysdep_routines += strcmp-power9 strncmp-power9 strcpy-power9 stpcpy-power9 \
>  		   rawmemchr-power9 strlen-power9 strncpy-power9 stpncpy-power9 \
> -		   strlen-power10
> +		   memmove-power10 strlen-power10
>  endif
>  CFLAGS-strncase-power7.c += -mcpu=power7 -funroll-loops
>  CFLAGS-strncase_l-power7.c += -mcpu=power7 -funroll-loops
> diff --git a/sysdeps/powerpc/powerpc64/multiarch/bcopy.c b/sysdeps/powerpc/powerpc64/multiarch/bcopy.c
> index 04f3432f2b..2840b17fdf 100644
> --- a/sysdeps/powerpc/powerpc64/multiarch/bcopy.c
> +++ b/sysdeps/powerpc/powerpc64/multiarch/bcopy.c
> @@ -22,8 +22,17 @@
>  extern __typeof (bcopy) __bcopy_ppc attribute_hidden;
>  /* __bcopy_power7 symbol is implemented at memmove-power7.S */
>  extern __typeof (bcopy) __bcopy_power7 attribute_hidden;
> +#ifdef __LITTLE_ENDIAN__
> +extern __typeof (bcopy) __bcopy_power10 attribute_hidden;
> +#endif
>
>  libc_ifunc (bcopy,
> +#ifdef __LITTLE_ENDIAN__
> +	    hwcap2 & (PPC_FEATURE2_ARCH_3_1 |
> +		      PPC_FEATURE2_HAS_ISEL)
> +	    && (hwcap & PPC_FEATURE_HAS_VSX)
> +	    ? __bcopy_power10 :
> +#endif
>  	    (hwcap & PPC_FEATURE_HAS_VSX)
>  	    ? __bcopy_power7
>  	    : __bcopy_ppc);
> diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
> index 1a6993616f..d00bcc8178 100644
> --- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
> @@ -67,6 +67,13 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>
>    /* Support sysdeps/powerpc/powerpc64/multiarch/memmove.c.  */
>    IFUNC_IMPL (i, name, memmove,
> +#ifdef __LITTLE_ENDIAN__
> +	      IFUNC_IMPL_ADD (array, i, memmove,
> +			      hwcap2 & (PPC_FEATURE2_ARCH_3_1 |
> +					PPC_FEATURE2_HAS_ISEL)
> +			      && (hwcap & PPC_FEATURE_HAS_VSX),
> +			      __memmove_power10)
> +#endif
>  	      IFUNC_IMPL_ADD (array, i, memmove, hwcap & PPC_FEATURE_HAS_VSX,
>  			      __memmove_power7)
>  	      IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_ppc))
> @@ -186,6 +193,13 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>
>    /* Support sysdeps/powerpc/powerpc64/multiarch/bcopy.c.  */
>    IFUNC_IMPL (i, name, bcopy,
> +#ifdef __LITTLE_ENDIAN__
> +	      IFUNC_IMPL_ADD (array, i, bcopy,
> +			      hwcap2 & (PPC_FEATURE2_ARCH_3_1 |
> +					PPC_FEATURE2_HAS_ISEL)
> +			      && (hwcap & PPC_FEATURE_HAS_VSX),
> +			      __bcopy_power10)
> +#endif
>  	      IFUNC_IMPL_ADD (array, i, bcopy, hwcap & PPC_FEATURE_HAS_VSX,
>  			      __bcopy_power7)
>  	      IFUNC_IMPL_ADD (array, i, bcopy, 1, __bcopy_ppc))
> diff --git a/sysdeps/powerpc/powerpc64/multiarch/memmove-power10.S b/sysdeps/powerpc/powerpc64/multiarch/memmove-power10.S
> new file mode 100644
> index 0000000000..171b32921a
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/multiarch/memmove-power10.S
> @@ -0,0 +1,27 @@
> +/* Optimized memmove implementation for POWER10.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define MEMMOVE __memmove_power10
> +
> +#undef libc_hidden_builtin_def
> +#define libc_hidden_builtin_def(name)
> +
> +#undef __bcopy
> +#define __bcopy __bcopy_power10
> +
> +#include <sysdeps/powerpc/powerpc64/le/power10/memmove.S>
> diff --git a/sysdeps/powerpc/powerpc64/multiarch/memmove-power7.S b/sysdeps/powerpc/powerpc64/multiarch/memmove-power7.S
> index d66da5826f..27b196d06c 100644
> --- a/sysdeps/powerpc/powerpc64/multiarch/memmove-power7.S
> +++ b/sysdeps/powerpc/powerpc64/multiarch/memmove-power7.S
> @@ -21,7 +21,7 @@
>  #undef libc_hidden_builtin_def
>  #define libc_hidden_builtin_def(name)
>
> -#undef bcopy
> -#define bcopy __bcopy_power7
> +#undef __bcopy
> +#define __bcopy __bcopy_power7
>
>  #include <sysdeps/powerpc/powerpc64/power7/memmove.S>
> diff --git a/sysdeps/powerpc/powerpc64/multiarch/memmove.c b/sysdeps/powerpc/powerpc64/multiarch/memmove.c
> index 9bec61a321..420c2f279a 100644
> --- a/sysdeps/powerpc/powerpc64/multiarch/memmove.c
> +++ b/sysdeps/powerpc/powerpc64/multiarch/memmove.c
> @@ -28,14 +28,22 @@
>  # include "init-arch.h"
>
>  extern __typeof (__redirect_memmove) __libc_memmove;
> -
>  extern __typeof (__redirect_memmove) __memmove_ppc attribute_hidden;
>  extern __typeof (__redirect_memmove) __memmove_power7 attribute_hidden;
> +#ifdef __LITTLE_ENDIAN__
> +extern __typeof (__redirect_memmove) __memmove_power10 attribute_hidden;
> +#endif
>
>  libc_ifunc (__libc_memmove,
> -	    (hwcap & PPC_FEATURE_HAS_VSX)
> -	    ? __memmove_power7
> -	    : __memmove_ppc);
> +#ifdef __LITTLE_ENDIAN__
> +	    hwcap2 & (PPC_FEATURE2_ARCH_3_1 |
> +		      PPC_FEATURE2_HAS_ISEL)
> +	    && (hwcap & PPC_FEATURE_HAS_VSX)
> +	    ? __memmove_power10 :
> +#endif
> +	    (hwcap & PPC_FEATURE_HAS_VSX)
> +	    ? __memmove_power7
> +	    : __memmove_ppc);
>
>  #undef memmove
>  strong_alias (__libc_memmove, memmove);
> diff --git a/sysdeps/powerpc/powerpc64/power7/memmove.S b/sysdeps/powerpc/powerpc64/power7/memmove.S
> index 8366145457..f61949d30f 100644
> --- a/sysdeps/powerpc/powerpc64/power7/memmove.S
> +++ b/sysdeps/powerpc/powerpc64/power7/memmove.S
> @@ -832,4 +832,6 @@ ENTRY_TOCLESS (__bcopy)
>  	mr	r4,r6
>  	b	L(_memmove)
>  END (__bcopy)
> +#ifndef __bcopy
>  weak_alias (__bcopy, bcopy)
> +#endif
>

-- 
Raphael Moreira Zinsly
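
A short illustrative note on the ifunc selection above: on little-endian
builds the resolvers pick the POWER10 variant from the AT_HWCAP/AT_HWCAP2
bits.  The check below mirrors the conditions used in the patch; it is a
stand-alone sketch for powerpc targets (the PPC_FEATURE* macros are assumed
to be visible via <sys/auxv.h> on powerpc glibc), not the resolver itself:

  #include <stdio.h>
  #include <sys/auxv.h>  /* getauxval, AT_HWCAP, AT_HWCAP2, PPC_FEATURE*  */

  int
  main (void)
  {
    unsigned long hwcap  = getauxval (AT_HWCAP);
    unsigned long hwcap2 = getauxval (AT_HWCAP2);

    /* Same conditions as the libc_ifunc calls in the patch.  */
    if ((hwcap2 & (PPC_FEATURE2_ARCH_3_1 | PPC_FEATURE2_HAS_ISEL))
        && (hwcap & PPC_FEATURE_HAS_VSX))
      puts ("would select __memmove_power10 / __bcopy_power10");
    else if (hwcap & PPC_FEATURE_HAS_VSX)
      puts ("would select __memmove_power7 / __bcopy_power7");
    else
      puts ("would select __memmove_ppc / __bcopy_ppc");
    return 0;
  }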