References: <20211027024323.1199441-1-goldstein.w.n@gmail.com>
 <20211027024323.1199441-3-goldstein.w.n@gmail.com>
In-Reply-To: <20211027024323.1199441-3-goldstein.w.n@gmail.com>
From: "H.J. Lu"
Date: Wed, 27 Oct 2021 05:47:29 -0700
Subject: Re: [PATCH v1 3/6] x86_64: Add support for __memcmpeq using sse2, avx2, and evex
To: Noah Goldstein
Cc: GNU C Library, "Carlos O'Donell"
List-Id: Libc-alpha mailing list

On Tue, Oct 26, 2021 at 7:43 PM Noah Goldstein wrote:
>
> No bug. This commit adds support for __memcmpeq to be implemented
> separately from memcmp. Support is added for versions optimized with
> sse2, avx2, and evex.
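For context, a minimal sketch of the contract __memcmpeq is generally
understood to satisfy: zero when the two buffers are byte-wise equal, some
nonzero value otherwise, with no ordering implied by the sign. The name
memcmpeq_ref is hypothetical and is not a glibc symbol; this is a plain C
reference loop, not any of the optimized versions the patch wires up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference version of the equality-only contract.
   Not the glibc implementation.  */
int
memcmpeq_ref (const void *s1, const void *s2, size_t n)
{
  /* Null pointers are only tolerated for an empty comparison.  */
  assert (n == 0 || (s1 != NULL && s2 != NULL));

  const unsigned char *a = s1;
  const unsigned char *b = s2;

  for (size_t i = 0; i < n; i++)
    if (a[i] != b[i])
      return 1;	/* Any nonzero value would do; the sign means nothing.  */

  return 0;
}
```

Because only equality is reported, any correct memcmp also satisfies this
contract, which is why the new files in this patch can simply include the
existing memcmp sources under a different symbol name.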
> ---
>  sysdeps/generic/ifunc-init.h                 |  5 +-
>  sysdeps/x86_64/memcmp.S                      |  9 ++--
>  sysdeps/x86_64/multiarch/Makefile            |  4 ++
>  sysdeps/x86_64/multiarch/ifunc-impl-list.c   | 21 +++++++++
>  sysdeps/x86_64/multiarch/ifunc-memcmpeq.h    | 49 ++++++++++++++++++++
>  sysdeps/x86_64/multiarch/memcmp-sse2.S       |  4 +-
>  sysdeps/x86_64/multiarch/memcmp.c            |  3 --
>  sysdeps/x86_64/multiarch/memcmpeq-avx2-rtm.S | 12 +++++
>  sysdeps/x86_64/multiarch/memcmpeq-avx2.S     | 23 +++++++++
>  sysdeps/x86_64/multiarch/memcmpeq-evex.S     | 23 +++++++++
>  sysdeps/x86_64/multiarch/memcmpeq-sse2.S     | 23 +++++++++
>  sysdeps/x86_64/multiarch/memcmpeq.c          | 35 ++++++++++++++
>  12 files changed, 202 insertions(+), 9 deletions(-)
>  create mode 100644 sysdeps/x86_64/multiarch/ifunc-memcmpeq.h
>  create mode 100644 sysdeps/x86_64/multiarch/memcmpeq-avx2-rtm.S
>  create mode 100644 sysdeps/x86_64/multiarch/memcmpeq-avx2.S
>  create mode 100644 sysdeps/x86_64/multiarch/memcmpeq-evex.S
>  create mode 100644 sysdeps/x86_64/multiarch/memcmpeq-sse2.S
>  create mode 100644 sysdeps/x86_64/multiarch/memcmpeq.c
>
> diff --git a/sysdeps/generic/ifunc-init.h b/sysdeps/generic/ifunc-init.h
> index 7f69485de8..ee8a8289c8 100644
> --- a/sysdeps/generic/ifunc-init.h
> +++ b/sysdeps/generic/ifunc-init.h
> @@ -50,5 +50,8 @@
>     '___' as the optimized implementation and
>     '_ifunc_selector' as the IFUNC selector.  */
>  #define REDIRECT_NAME EVALUATOR1 (__redirect, SYMBOL_NAME)
> -#define OPTIMIZE(name) EVALUATOR2 (SYMBOL_NAME, name)
>  #define IFUNC_SELECTOR EVALUATOR1 (SYMBOL_NAME, ifunc_selector)
> +#define OPTIMIZE1(name) EVALUATOR1 (SYMBOL_NAME, name)
> +#define OPTIMIZE2(name) EVALUATOR2 (SYMBOL_NAME, name)
> +/* Default is to use OPTIMIZE2.  */
> +#define OPTIMIZE(name) OPTIMIZE2(name)
> diff --git a/sysdeps/x86_64/memcmp.S b/sysdeps/x86_64/memcmp.S
> index 8a03e572e8..b53f2c0866 100644
> --- a/sysdeps/x86_64/memcmp.S
> +++ b/sysdeps/x86_64/memcmp.S
> @@ -356,9 +356,10 @@ L(ATR32res):
>  	.p2align 4,, 4
>  END(memcmp)
>
> -#undef bcmp
> +#ifdef USE_AS_MEMCMPEQ
> +libc_hidden_def (memcmp)
> +#else
> +# undef bcmp
>  weak_alias (memcmp, bcmp)
> -#undef __memcmpeq
> -strong_alias (memcmp, __memcmpeq)
>  libc_hidden_builtin_def (memcmp)
> -libc_hidden_def (__memcmpeq)
> +#endif
> diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
> index 26be40959c..044778585b 100644
> --- a/sysdeps/x86_64/multiarch/Makefile
> +++ b/sysdeps/x86_64/multiarch/Makefile
> @@ -7,7 +7,9 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c \
>  		   memchr-sse2 rawmemchr-sse2 memchr-avx2 rawmemchr-avx2 \
>  		   memrchr-sse2 memrchr-avx2 \
>  		   memcmp-sse2 \
> +		   memcmpeq-sse2 \
>  		   memcmp-avx2-movbe \
> +		   memcmpeq-avx2 \
>  		   memcmp-sse4 memcpy-ssse3 \
>  		   memmove-ssse3 \
>  		   memcpy-ssse3-back \
> @@ -42,6 +44,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c \
>  		   memset-avx512-unaligned-erms \
>  		   memchr-avx2-rtm \
>  		   memcmp-avx2-movbe-rtm \
> +		   memcmpeq-avx2-rtm \
>  		   memmove-avx-unaligned-erms-rtm \
>  		   memrchr-avx2-rtm \
>  		   memset-avx2-unaligned-erms-rtm \
> @@ -61,6 +64,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c \
>  		   strrchr-avx2-rtm \
>  		   memchr-evex \
>  		   memcmp-evex-movbe \
> +		   memcmpeq-evex \
>  		   memmove-evex-unaligned-erms \
>  		   memrchr-evex \
>  		   memset-evex-unaligned-erms \
> diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> index 39ab10613b..f7f3806d1d 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> @@ -38,6 +38,27 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>
>    size_t i = 0;
>
> +  /* Support sysdeps/x86_64/multiarch/memcmpeq.c.  */
> +  IFUNC_IMPL (i, name, __memcmpeq,
> +	      IFUNC_IMPL_ADD (array, i, __memcmpeq,
> +			      (CPU_FEATURE_USABLE (AVX2)
> +			       && CPU_FEATURE_USABLE (MOVBE)
> +			       && CPU_FEATURE_USABLE (BMI2)),
> +			      __memcmpeq_avx2)
> +	      IFUNC_IMPL_ADD (array, i, __memcmpeq,
> +			      (CPU_FEATURE_USABLE (AVX2)
> +			       && CPU_FEATURE_USABLE (BMI2)
> +			       && CPU_FEATURE_USABLE (MOVBE)
> +			       && CPU_FEATURE_USABLE (RTM)),
> +			      __memcmpeq_avx2_rtm)
> +	      IFUNC_IMPL_ADD (array, i, __memcmpeq,
> +			      (CPU_FEATURE_USABLE (AVX512VL)
> +			       && CPU_FEATURE_USABLE (AVX512BW)
> +			       && CPU_FEATURE_USABLE (MOVBE)
> +			       && CPU_FEATURE_USABLE (BMI2)),
> +			      __memcmpeq_evex)
> +	      IFUNC_IMPL_ADD (array, i, __memcmpeq, 1, __memcmpeq_sse2))
> +
>    /* Support sysdeps/x86_64/multiarch/memchr.c.  */
>    IFUNC_IMPL (i, name, memchr,
>  	      IFUNC_IMPL_ADD (array, i, memchr,
> diff --git a/sysdeps/x86_64/multiarch/ifunc-memcmpeq.h b/sysdeps/x86_64/multiarch/ifunc-memcmpeq.h
> new file mode 100644
> index 0000000000..3319a9568a
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/ifunc-memcmpeq.h
> @@ -0,0 +1,49 @@
> +/* Common definition for __memcmpeq ifunc selections.
> +   All versions must be listed in ifunc-impl-list.c.
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   .  */
> +
> +# include
> +
> +extern __typeof (REDIRECT_NAME) OPTIMIZE1 (sse2) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE1 (avx2) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE1 (avx2_rtm) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE1 (evex) attribute_hidden;
> +
> +static inline void *
> +IFUNC_SELECTOR (void)
> +{
> +  const struct cpu_features* cpu_features = __get_cpu_features ();
> +
> +  if (CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> +      && CPU_FEATURE_USABLE_P (cpu_features, BMI2)
> +      && CPU_FEATURE_USABLE_P (cpu_features, MOVBE)
> +      && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
> +    {
> +      if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
> +	  && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
> +	return OPTIMIZE1 (evex);
> +
> +      if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
> +	return OPTIMIZE1 (avx2_rtm);
> +
> +      if (!CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER))
> +	return OPTIMIZE1 (avx2);
> +    }
> +
> +  return OPTIMIZE1 (sse2);
> +}
> diff --git a/sysdeps/x86_64/multiarch/memcmp-sse2.S b/sysdeps/x86_64/multiarch/memcmp-sse2.S
> index 7b30b7ca2e..132d6fb339 100644
> --- a/sysdeps/x86_64/multiarch/memcmp-sse2.S
> +++ b/sysdeps/x86_64/multiarch/memcmp-sse2.S
> @@ -17,7 +17,9 @@
>     .  */
>
>  #if IS_IN (libc)
> -# define memcmp __memcmp_sse2
> +# ifndef memcmp
> +#  define memcmp __memcmp_sse2
> +# endif
>
>  # ifdef SHARED
>  #  undef libc_hidden_builtin_def
> diff --git a/sysdeps/x86_64/multiarch/memcmp.c b/sysdeps/x86_64/multiarch/memcmp.c
> index 7b3409b1dd..fe725f3563 100644
> --- a/sysdeps/x86_64/multiarch/memcmp.c
> +++ b/sysdeps/x86_64/multiarch/memcmp.c
> @@ -29,9 +29,6 @@
>  libc_ifunc_redirected (__redirect_memcmp, memcmp, IFUNC_SELECTOR ());
>  # undef bcmp
>  weak_alias (memcmp, bcmp)
> -# undef __memcmpeq
> -strong_alias (memcmp, __memcmpeq)
> -libc_hidden_def (__memcmpeq)
>
>  # ifdef SHARED
>  __hidden_ver1 (memcmp, __GI_memcmp, __redirect_memcmp)
> diff --git a/sysdeps/x86_64/multiarch/memcmpeq-avx2-rtm.S b/sysdeps/x86_64/multiarch/memcmpeq-avx2-rtm.S
> new file mode 100644
> index 0000000000..24b6a0c9ff
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/memcmpeq-avx2-rtm.S
> @@ -0,0 +1,12 @@
> +#ifndef MEMCMP
> +# define MEMCMP __memcmpeq_avx2_rtm
> +#endif
> +
> +#define ZERO_UPPER_VEC_REGISTERS_RETURN \
> +  ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST
> +
> +#define VZEROUPPER_RETURN jmp L(return_vzeroupper)
> +
> +#define SECTION(p) p##.avx.rtm
> +
> +#include "memcmpeq-avx2.S"
> diff --git a/sysdeps/x86_64/multiarch/memcmpeq-avx2.S b/sysdeps/x86_64/multiarch/memcmpeq-avx2.S
> new file mode 100644
> index 0000000000..0181ea0d8d
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/memcmpeq-avx2.S
> @@ -0,0 +1,23 @@
> +/* __memcmpeq optimized with AVX2.
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   .  */
> +
> +#ifndef MEMCMP
> +# define MEMCMP __memcmpeq_avx2
> +#endif
> +
> +#include "memcmp-avx2-movbe.S"
> diff --git a/sysdeps/x86_64/multiarch/memcmpeq-evex.S b/sysdeps/x86_64/multiarch/memcmpeq-evex.S
> new file mode 100644
> index 0000000000..951e1e9560
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/memcmpeq-evex.S
> @@ -0,0 +1,23 @@
> +/* __memcmpeq optimized with EVEX.
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   .  */
> +
> +#ifndef MEMCMP
> +# define MEMCMP __memcmpeq_evex
> +#endif
> +
> +#include "memcmp-evex-movbe.S"
> diff --git a/sysdeps/x86_64/multiarch/memcmpeq-sse2.S b/sysdeps/x86_64/multiarch/memcmpeq-sse2.S
> new file mode 100644
> index 0000000000..c488cbbcd9
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/memcmpeq-sse2.S
> @@ -0,0 +1,23 @@
> +/* __memcmpeq optimized with SSE2.
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   .  */
> +
> +#ifndef memcmp
> +# define memcmp __memcmpeq_sse2
> +#endif
> +#define USE_AS_MEMCMPEQ 1
> +#include "memcmp-sse2.S"
> diff --git a/sysdeps/x86_64/multiarch/memcmpeq.c b/sysdeps/x86_64/multiarch/memcmpeq.c
> new file mode 100644
> index 0000000000..163e56047e
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/memcmpeq.c
> @@ -0,0 +1,35 @@
> +/* Multiple versions of __memcmpeq.
> +   All versions must be listed in ifunc-impl-list.c.
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   .  */
> +
> +/* Define multiple versions only for the definition in libc.  */
> +#if IS_IN (libc)
> +# define __memcmpeq __redirect___memcmpeq
> +# include
> +# undef __memcmpeq
> +
> +# define SYMBOL_NAME __memcmpeq
> +# include "ifunc-memcmpeq.h"
> +
> +libc_ifunc_redirected (__redirect___memcmpeq, __memcmpeq, IFUNC_SELECTOR ());
> +
> +# ifdef SHARED
> +__hidden_ver1 (__memcmpeq, __GI___memcmpeq, __redirect___memcmpeq)
> +  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (__memcmpeq);
> +# endif
> +#endif
> --
> 2.25.1
>

LGTM.

Reviewed-by: H.J. Lu

Thanks.

--
H.J.
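
For readers following the selector in ifunc-memcmpeq.h, the decision order
can be sketched in plain C as below. This is only an illustration under
stated assumptions: struct cpu_caps, enum memcmpeq_impl and
select_memcmpeq_impl are hypothetical names standing in for glibc's
cpu_features, the OPTIMIZE1 symbols and IFUNC_SELECTOR, and the booleans
stand in for the CPU_FEATURE_USABLE_P / CPU_FEATURES_ARCH_P checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for glibc's struct cpu_features; the field
   names are illustrative only.  */
struct cpu_caps
{
  bool avx2, bmi2, movbe, avx_fast_unaligned_load;
  bool avx512vl, avx512bw, rtm, prefer_no_vzeroupper;
};

/* Hypothetical tags for the four implementations named in the patch.  */
enum memcmpeq_impl { IMPL_SSE2, IMPL_AVX2, IMPL_AVX2_RTM, IMPL_EVEX };

/* Mirrors the order of checks in the quoted IFUNC_SELECTOR.  */
enum memcmpeq_impl
select_memcmpeq_impl (const struct cpu_caps *c)
{
  assert (c != NULL);

  if (c->avx2 && c->bmi2 && c->movbe && c->avx_fast_unaligned_load)
    {
      /* EVEX is preferred whenever AVX512VL and AVX512BW are usable.  */
      if (c->avx512vl && c->avx512bw)
	return IMPL_EVEX;

      /* With RTM, use the variant that avoids vzeroupper in a transaction.  */
      if (c->rtm)
	return IMPL_AVX2_RTM;

      if (!c->prefer_no_vzeroupper)
	return IMPL_AVX2;
    }

  /* Baseline fallback, always available on x86_64.  */
  return IMPL_SSE2;
}
```

One consequence of this ordering is that a CPU with AVX2 but with
Prefer_No_VZEROUPPER set, and with neither RTM nor the AVX512 features,
falls through to the SSE2 version.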