From: "H.J. Lu"
To: libc-alpha@sourceware.org
Subject: [PATCH 2/2] x86-64: Remove Prefer_AVX2_STRCMP
Date: Mon, 1 Nov 2021 05:54:12 -0700
Message-Id: <20211101125412.611713-3-hjl.tools@gmail.com>
In-Reply-To: <20211101125412.611713-1-hjl.tools@gmail.com>
References: <20211101125412.611713-1-hjl.tools@gmail.com>

Remove Prefer_AVX2_STRCMP to enable EVEX strcmp.  When comparing 2 32-byte
strings, EVEX strcmp has been improved to require 1 load, 1 VPTESTM, 1 VPCMP,
1 KMOVD and 1 INCL instead of 2 loads, 3 VPCMPs, 2 KORDs, 1 KMOVD and 1 TESTL,
while AVX2 strcmp requires 1 load, 2 VPCMPEQs, 1 VPMINU, 1 VPMOVMSKB and 1
TESTL.  EVEX strcmp is now faster than AVX2 strcmp by up to 40% on Tiger Lake
and Ice Lake.
---
 sysdeps/x86/cpu-features.c                                | 8 --------
 sysdeps/x86/cpu-tunables.c                                | 2 --
 .../include/cpu-features-preferred_feature_index_1.def    | 1 -
 sysdeps/x86_64/multiarch/strcmp.c                         | 3 +--
 sysdeps/x86_64/multiarch/strncmp.c                        | 3 +--
 5 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 645bba6314..be2498b2e7 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -546,14 +546,6 @@ init_cpu_features (struct cpu_features *cpu_features)
           if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
             cpu_features->preferred[index_arch_Prefer_No_VZEROUPPER]
               |= bit_arch_Prefer_No_VZEROUPPER;
-
-          /* Since to compare 2 32-byte strings, 256-bit EVEX strcmp
-             requires 2 loads, 3 VPCMPs and 2 KORDs while AVX2 strcmp
-             requires 1 load, 2 VPCMPEQs, 1 VPMINU and 1 VPMOVMSKB,
-             AVX2 strcmp is faster than EVEX strcmp.  */
-          if (CPU_FEATURE_USABLE_P (cpu_features, AVX2))
-            cpu_features->preferred[index_arch_Prefer_AVX2_STRCMP]
-              |= bit_arch_Prefer_AVX2_STRCMP;
         }
 
       /* Avoid avoid short distance REP MOVSB on processor with FSRM.  */
diff --git a/sysdeps/x86/cpu-tunables.c b/sysdeps/x86/cpu-tunables.c
index 00fe5045eb..61b05e5b1d 100644
--- a/sysdeps/x86/cpu-tunables.c
+++ b/sysdeps/x86/cpu-tunables.c
@@ -239,8 +239,6 @@ TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *valp)
               CHECK_GLIBC_IFUNC_PREFERRED_BOTH
                 (n, cpu_features, Fast_Copy_Backward, disable,
                  18);
-              CHECK_GLIBC_IFUNC_PREFERRED_NEED_BOTH
-                (n, cpu_features, Prefer_AVX2_STRCMP, AVX2, disable, 18);
             }
           break;
         case 19:
diff --git a/sysdeps/x86/include/cpu-features-preferred_feature_index_1.def b/sysdeps/x86/include/cpu-features-preferred_feature_index_1.def
index d7c93f00c5..1530d594b3 100644
--- a/sysdeps/x86/include/cpu-features-preferred_feature_index_1.def
+++ b/sysdeps/x86/include/cpu-features-preferred_feature_index_1.def
@@ -32,5 +32,4 @@ BIT (Prefer_ERMS)
 BIT (Prefer_No_AVX512)
 BIT (MathVec_Prefer_No_AVX512)
 BIT (Prefer_FSRM)
-BIT (Prefer_AVX2_STRCMP)
 BIT (Avoid_Short_Distance_REP_MOVSB)
diff --git a/sysdeps/x86_64/multiarch/strcmp.c b/sysdeps/x86_64/multiarch/strcmp.c
index 62b7abeeee..7c2901bf44 100644
--- a/sysdeps/x86_64/multiarch/strcmp.c
+++ b/sysdeps/x86_64/multiarch/strcmp.c
@@ -43,8 +43,7 @@ IFUNC_SELECTOR (void)
     {
       if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
           && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
-          && CPU_FEATURE_USABLE_P (cpu_features, BMI2)
-          && !CPU_FEATURES_ARCH_P (cpu_features, Prefer_AVX2_STRCMP))
+          && CPU_FEATURE_USABLE_P (cpu_features, BMI2))
         return OPTIMIZE (evex);
 
       if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
diff --git a/sysdeps/x86_64/multiarch/strncmp.c b/sysdeps/x86_64/multiarch/strncmp.c
index 60ba0fe356..f94a421784 100644
--- a/sysdeps/x86_64/multiarch/strncmp.c
+++ b/sysdeps/x86_64/multiarch/strncmp.c
@@ -43,8 +43,7 @@ IFUNC_SELECTOR (void)
     {
       if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
           && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
-          && CPU_FEATURE_USABLE_P (cpu_features, BMI2)
-          && !CPU_FEATURES_ARCH_P (cpu_features, Prefer_AVX2_STRCMP))
+          && CPU_FEATURE_USABLE_P (cpu_features, BMI2))
         return OPTIMIZE (evex);
 
       if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
-- 
2.33.1
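
For anyone reading the selector change without the tree at hand, the effect of the strcmp.c and strncmp.c hunks can be sketched roughly as below. This is only a toy model, not glibc code: struct toy_cpu and the toy_* functions are hypothetical names invented for illustration, the tunable override is not modeled, and any conditions enclosing the hunks (which the diff context does not show) are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few feature bits the strcmp/strncmp
   selectors test in the hunks above.  NOT glibc's struct cpu_features.  */
struct toy_cpu
{
  bool avx2;
  bool avx512vl;
  bool avx512bw;
  bool bmi2;
};

/* Before the patch, the removed code in init_cpu_features set
   Prefer_AVX2_STRCMP when AVX2 was usable.  Conditions outside the
   hunk and the tunable override are assumptions left out here.  */
bool
toy_prefer_avx2_strcmp (const struct toy_cpu *cpu)
{
  return cpu->avx2;
}

/* EVEX test in IFUNC_SELECTOR as it read before the patch.  */
bool
toy_evex_before (const struct toy_cpu *cpu)
{
  return (cpu->avx512vl && cpu->avx512bw && cpu->bmi2
          && !toy_prefer_avx2_strcmp (cpu));
}

/* EVEX test in IFUNC_SELECTOR as it reads after the patch.  */
bool
toy_evex_after (const struct toy_cpu *cpu)
{
  return cpu->avx512vl && cpu->avx512bw && cpu->bmi2;
}

/* Self-check for a CPU on which all four features are usable.  */
void
toy_self_check (void)
{
  struct toy_cpu cpu = { true, true, true, true };
  assert (!toy_evex_before (&cpu));	/* Preferred bit blocked EVEX.  */
  assert (toy_evex_after (&cpu));	/* EVEX is now selected.  */
}
```

In this model a CPU with AVX2, AVX512VL, AVX512BW and BMI2 all usable never reached the EVEX variant before the patch, because the preferred bit was set alongside AVX2; after the patch the same CPU selects it.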