From mboxrd@z Thu Jan 1 00:00:00 1970
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Subject: Re: [PATCH] powerpc: Optimized strlen for POWER9
Date: Tue, 5 May 2020 18:10:35 -0300
References: <20200501133213.68bfbec7@kryten.localdomain>
In-Reply-To: <20200501133213.68bfbec7@kryten.localdomain>
List-Id: Libc-alpha mailing list

On 01/05/2020 00:32, Anton Blanchard via Libc-alpha wrote:
> This version performs much better than the POWER8 version for medium
> length strings (50-100 bytes) as well as smaller gains on small
> and long unaligned strings.

As with strcmp, it seems that this uses the ISA 3.0 partial stores to
optimize vector instruction usage. Could you add that to the commit
message?

For such optimizations we usually collect baseline benchmark results
with the glibc benchtests. Could you post the results from before and
after the patch?

LGTM, with some clarifications requested below.

Reviewed-by: Adhemerval Zanella

> ---
>  sysdeps/powerpc/powerpc64/le/power9/strlen.S  | 102 ++++++++++++++++++
>  sysdeps/powerpc/powerpc64/multiarch/Makefile  |   2 +-
>  .../powerpc64/multiarch/ifunc-impl-list.c     |   4 +
>  .../powerpc64/multiarch/strlen-power9.S       |  24 +++++
>  sysdeps/powerpc/powerpc64/multiarch/strlen.c  |  17 ++-
>  5 files changed, 143 insertions(+), 6 deletions(-)
>  create mode 100644 sysdeps/powerpc/powerpc64/le/power9/strlen.S
>  create mode 100644 sysdeps/powerpc/powerpc64/multiarch/strlen-power9.S
>
> diff --git a/sysdeps/powerpc/powerpc64/le/power9/strlen.S b/sysdeps/powerpc/powerpc64/le/power9/strlen.S
> new file mode 100644
> index 0000000000..afaa76907f
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/le/power9/strlen.S
> @@ -0,0 +1,102 @@
> +/* Optimized strlen implementation for PowerPC64/POWER9.
> +   Copyright (C) 2020 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   . */
> +
> +#include
> +
> +#ifndef STRLEN
> +# define STRLEN strlen
> +#endif
> +
> +/* Implements the function
> +
> +   int [r3] strlen (char *s [r3])
> +
> +   The implementation can load bytes past a null terminator, but only
> +   up to the next 16B boundary, so it never crosses a page. */
> +
> +.machine power9

I assume the minimum supported binutils won't barf on this, correct?

> +ENTRY_TOCLESS (STRLEN, 4)
> +        CALL_MCOUNT 1
> +
> +        vspltisb v19,-1         /* Ones in v19 */
> +        vspltisb v18,0          /* Zeroes in v18 */
> +
> +        neg     r5,r3
> +        rldicl  r9,r5,0,60      /* How many bytes to get source 16B aligned? */
> +
> +        /* Align data and fill bytes not loaded with ones */
> +        lvx     v0,0,r3
> +        lvsr    v1,0,r3
> +        vperm   v0,v19,v0,v1
> +
> +        vcmpequb. v6,v0,v18     /* 0xff if byte is NULL, 0x00 otherwise */
> +        beq     cr6,L(aligned)
> +
> +        vctzlsbb r3,v6
> +        blr
> +
> +L(aligned):
> +        add     r4,r3,r9
> +        mr      r3,r9
> +
> +L(loop):

Should we enforce alignment here?

> +        lxv     v0+32,0(r4)
> +        vcmpequb. v6,v0,v18     /* 0xff if byte is NULL, 0x00 otherwise */
> +        bne     cr6,L(tail1)
> +
> +        lxv     v0+32,16(r4)
> +        vcmpequb. v6,v0,v18     /* 0xff if byte is NULL, 0x00 otherwise */
> +        bne     cr6,L(tail2)
> +
> +        lxv     v0+32,32(r4)
> +        vcmpequb. v6,v0,v18     /* 0xff if byte is NULL, 0x00 otherwise */
> +        bne     cr6,L(tail3)
> +
> +        lxv     v0+32,48(r4)
> +        vcmpequb. v6,v0,v18     /* 0xff if byte is NULL, 0x00 otherwise */
> +        bne     cr6,L(tail4)

As with strcmp, why unroll 4x here?

> +
> +        addi    r3,r3,64
> +        addi    r4,r4,64
> +        b       L(loop)
> +
> +L(tail1):
> +        vctzlsbb r0,v6
> +        add     r3,r3,r0
> +        blr
> +
> +L(tail2):
> +        vctzlsbb r0,v6
> +        add     r3,r3,r0
> +        addi    r3,r3,16
> +        blr
> +
> +L(tail3):
> +        vctzlsbb r0,v6
> +        add     r3,r3,r0
> +        addi    r3,r3,32
> +        blr
> +
> +L(tail4):
> +        vctzlsbb r0,v6
> +        add     r3,r3,r0
> +        addi    r3,r3,48
> +        blr
> +
> +END (STRLEN)
> +libc_hidden_builtin_def (strlen)

Ok.

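For readers following the control flow of the assembly above, here is a rough C sketch of the same structure: a head that runs up to the next 16-byte boundary, then an aligned loop that examines one 16-byte block at a time. It is only an illustration under stated assumptions, not the patch's implementation. The names `sketch_strlen` and `first_zero` are hypothetical, the head is walked bytewise instead of using the masked `lvx`/`vperm` load, the loop is not unrolled, and the caller must guarantee that the buffer extends to the next 16-byte boundary after the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BLOCK = 16 };

/* Stand-in for the vcmpequb./vctzlsbb pair: index of the first zero
   byte in a 16-byte block, or BLOCK when the block has none.  */
static size_t
first_zero (const unsigned char *block)
{
  for (size_t i = 0; i < BLOCK; i++)
    if (block[i] == 0)
      return i;
  return BLOCK;
}

/* Hypothetical helper, not part of the patch.  Precondition (an
   assumption of this sketch): the object holding S extends to the next
   16-byte boundary after the terminator, because whole blocks are read.  */
static size_t
sketch_strlen (const char *s)
{
  /* Bytes until S is 16-byte aligned (the neg/rldicl pair).  */
  size_t head = (size_t) (-(uintptr_t) s & (BLOCK - 1));

  /* The assembly checks the head with one masked vector load; the
     sketch walks it bytewise to stay inside the object.  */
  for (size_t i = 0; i < head; i++)
    if (s[i] == '\0')
      return i;

  assert ((((uintptr_t) s + head) & (BLOCK - 1)) == 0);

  /* Aligned loop: one block per iteration (the patch unrolls this 4x).  */
  size_t len = head;
  for (;;)
    {
      unsigned char block[BLOCK];
      memcpy (block, s + len, BLOCK);
      size_t pos = first_zero (block);
      if (pos < BLOCK)
        return len + pos;
      len += BLOCK;
    }
}
```

The length returned in each tail of the assembly corresponds to `len + pos` here: the bytes already skipped plus the position of the terminator inside the block that contained it.
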
> diff --git a/sysdeps/powerpc/powerpc64/multiarch/Makefile b/sysdeps/powerpc/powerpc64/multiarch/Makefile
> index ea936bf9ed..1f9318bfbb 100644
> --- a/sysdeps/powerpc/powerpc64/multiarch/Makefile
> +++ b/sysdeps/powerpc/powerpc64/multiarch/Makefile
> @@ -32,7 +32,7 @@ sysdep_routines += memcpy-power8-cached memcpy-power7 memcpy-a2 memcpy-power6 \
>                     strncase-power8
>
>  ifneq (,$(filter %le,$(config-machine)))
> -sysdep_routines += strcmp-power9 strncmp-power9
> +sysdep_routines += strcmp-power9 strncmp-power9 strlen-power9
>  endif
>  CFLAGS-strncase-power7.c += -mcpu=power7 -funroll-loops
>  CFLAGS-strncase_l-power7.c += -mcpu=power7 -funroll-loops

Ok.

> diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
> index b9fef3f43c..1fe7fb7812 100644
> --- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
> @@ -103,6 +103,10 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>
>    /* Support sysdeps/powerpc/powerpc64/multiarch/strlen.c. */
>    IFUNC_IMPL (i, name, strlen,
> +#ifdef __LITTLE_ENDIAN__
> +              IFUNC_IMPL_ADD (array, i, strlen, hwcap2 & PPC_FEATURE2_ARCH_3_00,
> +                              __strlen_power9)
> +#endif
>                IFUNC_IMPL_ADD (array, i, strlen, hwcap2 & PPC_FEATURE2_ARCH_2_07,
>                                __strlen_power8)
>                IFUNC_IMPL_ADD (array, i, strlen, hwcap & PPC_FEATURE_HAS_VSX,

Ok.

> diff --git a/sysdeps/powerpc/powerpc64/multiarch/strlen-power9.S b/sysdeps/powerpc/powerpc64/multiarch/strlen-power9.S
> new file mode 100644
> index 0000000000..223ff54c39
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/multiarch/strlen-power9.S
> @@ -0,0 +1,24 @@
> +/* Optimized strlen implementation for POWER8.
> +   Copyright (C) 2020 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   . */
> +
> +#define STRLEN __strlen_power9
> +
> +#undef libc_hidden_builtin_def
> +#define libc_hidden_builtin_def(name)
> +
> +#include

Ok.

> diff --git a/sysdeps/powerpc/powerpc64/multiarch/strlen.c b/sysdeps/powerpc/powerpc64/multiarch/strlen.c
> index e587554221..c418c2bde4 100644
> --- a/sysdeps/powerpc/powerpc64/multiarch/strlen.c
> +++ b/sysdeps/powerpc/powerpc64/multiarch/strlen.c
> @@ -30,13 +30,20 @@ extern __typeof (__redirect_strlen) __libc_strlen;
>  extern __typeof (__redirect_strlen) __strlen_ppc attribute_hidden;
>  extern __typeof (__redirect_strlen) __strlen_power7 attribute_hidden;
>  extern __typeof (__redirect_strlen) __strlen_power8 attribute_hidden;
> +# ifdef __LITTLE_ENDIAN__
> +extern __typeof (__redirect_strlen) __strlen_power9 attribute_hidden;
> +# endif
>
>  libc_ifunc (__libc_strlen,
> -            (hwcap2 & PPC_FEATURE2_ARCH_2_07)
> -            ? __strlen_power8 :
> -              (hwcap & PPC_FEATURE_HAS_VSX)
> -              ? __strlen_power7
> -              : __strlen_ppc);
> +# ifdef __LITTLE_ENDIAN__
> +            (hwcap2 & PPC_FEATURE2_ARCH_3_00)
> +            ? __strlen_power9 :
> +# endif
> +              (hwcap2 & PPC_FEATURE2_ARCH_2_07)
> +              ? __strlen_power8 :
> +                (hwcap & PPC_FEATURE_HAS_VSX)
> +                ? __strlen_power7
> +                : __strlen_ppc);
>
>  #undef strlen
>  strong_alias (__libc_strlen, strlen)
>

Ok.
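
The selection order in the `libc_ifunc` hunk above can be read as a priority chain, with the newest matching ISA level winning. The sketch below restates that chain as plain C so the ordering is easy to check. It is an illustration only: `sketch_select_strlen` and the `SKETCH_*` bit values are hypothetical placeholders (they are not the real `PPC_FEATURE*` constants), and the `little_endian` parameter stands in for the `__LITTLE_ENDIAN__` preprocessor guard.

```c
#include <assert.h>
#include <string.h>

/* Placeholder bit values for this sketch only; the real constants come
   from the system headers and are not reproduced here.  */
enum
{
  SKETCH_HAS_VSX = 1 << 0,    /* tested against hwcap */
  SKETCH_ARCH_2_07 = 1 << 0,  /* tested against hwcap2 */
  SKETCH_ARCH_3_00 = 1 << 1   /* tested against hwcap2 */
};

/* Hypothetical helper: returns the name of the variant the chain in the
   patch would pick for the given capability words.  */
static const char *
sketch_select_strlen (unsigned long hwcap, unsigned long hwcap2,
                      int little_endian)
{
  if (little_endian && (hwcap2 & SKETCH_ARCH_3_00))
    return "__strlen_power9";
  if (hwcap2 & SKETCH_ARCH_2_07)
    return "__strlen_power8";
  if (hwcap & SKETCH_HAS_VSX)
    return "__strlen_power7";
  return "__strlen_ppc";
}
```

Because the POWER9 test comes first, a little-endian machine that reports both the 3.00 and 2.07 bits gets the new variant, while a big-endian build never considers it and falls through to the POWER8 check.
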