From: "H.J. Lu"
To: libc-alpha@sourceware.org
Subject: [PATCH v2 01/10] x86: Set Prefer_No_VZEROUPPER and add Prefer_AVX2_STRCMP
Date: Mon, 15 Mar 2021 07:25:11 -0700
Message-Id: <20210315142520.1661407-2-hjl.tools@gmail.com>
In-Reply-To: <20210315142520.1661407-1-hjl.tools@gmail.com>
References: <20210315142520.1661407-1-hjl.tools@gmail.com>

1. Set Prefer_No_VZEROUPPER if RTM is usable, to avoid the RTM abort
triggered by VZEROUPPER inside a transactionally executing RTM region.
2. To compare two 32-byte strings, 256-bit EVEX strcmp requires 2 loads,
3 VPCMPs and 2 KORDs, while AVX2 strcmp requires 1 load, 2 VPCMPEQs,
1 VPMINU and 1 VPMOVMSKB, so AVX2 strcmp is faster than EVEX strcmp.
Add Prefer_AVX2_STRCMP to prefer the AVX2 strcmp family of functions.
---
 sysdeps/x86/cpu-features.c                    | 20 +++++++++++++++++--
 sysdeps/x86/cpu-tunables.c                    |  2 ++
 ...cpu-features-preferred_feature_index_1.def |  1 +
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index d7248cbb45..d7808acb33 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -531,8 +531,24 @@ init_cpu_features (struct cpu_features *cpu_features)
             cpu_features->preferred[index_arch_Prefer_No_VZEROUPPER]
               |= bit_arch_Prefer_No_VZEROUPPER;
           else
-            cpu_features->preferred[index_arch_Prefer_No_AVX512]
-              |= bit_arch_Prefer_No_AVX512;
+            {
+              cpu_features->preferred[index_arch_Prefer_No_AVX512]
+                |= bit_arch_Prefer_No_AVX512;
+
+              /* Avoid RTM abort triggered by VZEROUPPER inside a
+                 transactionally executing RTM region.  */
+              if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
+                cpu_features->preferred[index_arch_Prefer_No_VZEROUPPER]
+                  |= bit_arch_Prefer_No_VZEROUPPER;
+
+              /* Since to compare 2 32-byte strings, 256-bit EVEX strcmp
+                 requires 2 loads, 3 VPCMPs and 2 KORDs while AVX2 strcmp
+                 requires 1 load, 2 VPCMPEQs, 1 VPMINU and 1 VPMOVMSKB,
+                 AVX2 strcmp is faster than EVEX strcmp.  */
+              if (CPU_FEATURE_USABLE_P (cpu_features, AVX2))
+                cpu_features->preferred[index_arch_Prefer_AVX2_STRCMP]
+                  |= bit_arch_Prefer_AVX2_STRCMP;
+            }
         }
   /* This spells out "AuthenticAMD" or "HygonGenuine".  */
   else if ((ebx == 0x68747541 && ecx == 0x444d4163 && edx == 0x69746e65)
diff --git a/sysdeps/x86/cpu-tunables.c b/sysdeps/x86/cpu-tunables.c
index 126896f41b..a90df39b78 100644
--- a/sysdeps/x86/cpu-tunables.c
+++ b/sysdeps/x86/cpu-tunables.c
@@ -238,6 +238,8 @@ TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *valp)
               CHECK_GLIBC_IFUNC_PREFERRED_BOTH (n, cpu_features,
                                                 Fast_Copy_Backward,
                                                 disable, 18);
+              CHECK_GLIBC_IFUNC_PREFERRED_NEED_BOTH
+                (n, cpu_features, Prefer_AVX2_STRCMP, AVX2, disable, 18);
             }
           break;
         case 19:
diff --git a/sysdeps/x86/include/cpu-features-preferred_feature_index_1.def b/sysdeps/x86/include/cpu-features-preferred_feature_index_1.def
index 06af1a8dd5..133aab19f1 100644
--- a/sysdeps/x86/include/cpu-features-preferred_feature_index_1.def
+++ b/sysdeps/x86/include/cpu-features-preferred_feature_index_1.def
@@ -32,3 +32,4 @@ BIT (Prefer_ERMS)
 BIT (Prefer_No_AVX512)
 BIT (MathVec_Prefer_No_AVX512)
 BIT (Prefer_FSRM)
+BIT (Prefer_AVX2_STRCMP)
-- 
2.30.2
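
For readers who want to trace the new else-branch outside of glibc, here is a minimal standalone sketch of the same bit-setting logic. Everything in it is a hypothetical stand-in: the `toy_` names, the single `preferred` word, and the bit positions are illustration only and are not glibc's actual `struct cpu_features` layout or macros. It models only the branch body added by this patch, not the condition that selects that branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the preferred-feature bits.  In glibc the
   real bit positions come from the BIT () list in the .def file.  */
enum
{
  TOY_PREFER_NO_AVX512     = 1u << 0,
  TOY_PREFER_NO_VZEROUPPER = 1u << 1,
  TOY_PREFER_AVX2_STRCMP   = 1u << 2
};

/* Hypothetical, much-reduced model of the CPU feature state.  */
struct toy_cpu_features
{
  bool rtm_usable;      /* Models CPU_FEATURE_USABLE_P (..., RTM).  */
  bool avx2_usable;     /* Models CPU_FEATURE_USABLE_P (..., AVX2).  */
  uint32_t preferred;   /* Models one word of the preferred[] array.  */
};

/* Mirrors only the body of the else-branch added by the patch.  */
static void
toy_set_preferences (struct toy_cpu_features *f)
{
  assert (f != NULL);

  f->preferred |= TOY_PREFER_NO_AVX512;

  /* VZEROUPPER inside an RTM region aborts the transaction, so
     prefer code paths without it whenever RTM is usable.  */
  if (f->rtm_usable)
    f->preferred |= TOY_PREFER_NO_VZEROUPPER;

  /* Prefer the AVX2 strcmp family only if AVX2 is usable.  */
  if (f->avx2_usable)
    f->preferred |= TOY_PREFER_AVX2_STRCMP;
}

/* Returns true if the given preference bit is set.  */
static bool
toy_prefers (const struct toy_cpu_features *f, uint32_t bit)
{
  return (f->preferred & bit) != 0;
}
```

Calling `toy_set_preferences` on a zero-initialized `preferred` word with `rtm_usable` set and `avx2_usable` clear leaves the no-AVX512 and no-VZEROUPPER bits set and the AVX2-strcmp bit clear, which is the combination the two new `if` statements are meant to produce.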