From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 7852)
	id A1607385801D; Tue, 19 Jul 2022 05:54:16 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A1607385801D
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Sunil Pandey
To: glibc-cvs@sourceware.org
Subject: [glibc/release/2.33/master] x86: Add sse42 implementation to strcmp's ifunc
X-Act-Checkin: glibc
X-Git-Author: Noah Goldstein
X-Git-Refname: refs/heads/release/2.33/master
X-Git-Oldrev: 20bfbb3a5789551a09b1ec4db97dcaed6f9180f0
X-Git-Newrev: cfa13a8205092cd4216e65eddc19742be379aaea
Message-Id: <20220719055416.A1607385801D@sourceware.org>
Date: Tue, 19 Jul 2022 05:54:16 +0000 (GMT)
X-BeenThere: glibc-cvs@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Glibc-cvs mailing list
List-Unsubscribe: ,
List-Archive:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 19 Jul 2022 05:54:16 -0000

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=cfa13a8205092cd4216e65eddc19742be379aaea

commit cfa13a8205092cd4216e65eddc19742be379aaea
Author: Noah Goldstein
Date:   Tue Jun 14 15:37:28 2022 -0700

    x86: Add sse42 implementation to strcmp's ifunc

    This has been missing since the ifuncs were added.

    The performance of SSE4.2 is preferable to SSE2.

    Measured on Tigerlake with N = 20 runs.
    Geometric Mean of all benchmarks SSE4.2 / SSE2: 0.906

    (cherry picked from commit ff439c47173565fbff4f0f78d07b0f14e4a7db05)

Diff:
---
 sysdeps/x86_64/multiarch/strcmp.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/sysdeps/x86_64/multiarch/strcmp.c b/sysdeps/x86_64/multiarch/strcmp.c
index 7c2901bf44..b457fb4c15 100644
--- a/sysdeps/x86_64/multiarch/strcmp.c
+++ b/sysdeps/x86_64/multiarch/strcmp.c
@@ -29,6 +29,7 @@
 extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
 extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2_unaligned) attribute_hidden;
 extern __typeof (REDIRECT_NAME) OPTIMIZE (ssse3) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (sse42) attribute_hidden;
 extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
 extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden;
 extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden;
@@ -53,6 +54,10 @@ IFUNC_SELECTOR (void)
 	return OPTIMIZE (avx2);
     }
 
+  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2)
+      && !CPU_FEATURES_ARCH_P (cpu_features, Slow_SSE4_2))
+    return OPTIMIZE (sse42);
+
   if (CPU_FEATURES_ARCH_P (cpu_features, Fast_Unaligned_Load))
     return OPTIMIZE (sse2_unaligned);
 
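
The selection order that the second hunk introduces can be sketched in
isolation. What follows is a minimal standalone illustration, not glibc
code: struct cpu_flags, its fields, select_strcmp_variant and
check_selection_order are hypothetical stand-ins for cpu_features and the
IFUNC_SELECTOR machinery. The AVX2/EVEX checks that come earlier in the
real selector are left out, and everything after the sse2_unaligned branch
is collapsed into a single placeholder because it lies outside the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for glibc's cpu_features; the field names are
   illustrative and do not exist in glibc.  */
struct cpu_flags
{
  bool sse4_2_usable;       /* models CPU_FEATURE_USABLE_P (..., SSE4_2) */
  bool slow_sse4_2;         /* models CPU_FEATURES_ARCH_P (..., Slow_SSE4_2) */
  bool fast_unaligned_load; /* models CPU_FEATURES_ARCH_P (..., Fast_Unaligned_Load) */
};

/* Return the name of the variant this simplified model would pick.
   Only the two branches visible in the hunk are modelled.  */
const char *
select_strcmp_variant (const struct cpu_flags *f)
{
  /* New in this commit: prefer sse42 when it is usable and not flagged
     as slow on this CPU.  */
  if (f->sse4_2_usable && !f->slow_sse4_2)
    return "sse42";

  /* Existing branch, now reached only when sse42 was rejected.  */
  if (f->fast_unaligned_load)
    return "sse2_unaligned";

  /* Assumption: the remaining fallbacks are not shown in the hunk, so
     they are represented by one placeholder label.  */
  return "later fallback";
}

/* Self-check of the ordering; aborts through assert on a mismatch.  */
void
check_selection_order (void)
{
  struct cpu_flags fast_sse42 = { true, false, true };
  struct cpu_flags slow_sse42 = { true, true, true };
  struct cpu_flags neither = { false, false, false };

  assert (strcmp (select_strcmp_variant (&fast_sse42), "sse42") == 0);
  assert (strcmp (select_strcmp_variant (&slow_sse42), "sse2_unaligned") == 0);
  assert (strcmp (select_strcmp_variant (&neither), "later fallback") == 0);
}
```

In this model a CPU with both usable SSE4.2 and fast unaligned loads picks
sse42, since the new check sits ahead of the Fast_Unaligned_Load check; a
CPU flagged Slow_SSE4_2 falls through to sse2_unaligned as before.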