From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf1-x432.google.com (mail-pf1-x432.google.com [IPv6:2607:f8b0:4864:20::432]) by sourceware.org (Postfix) with ESMTPS id 9E81A386C591 for ; Mon, 27 Jun 2022 19:45:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9E81A386C591 Received: by mail-pf1-x432.google.com with SMTP id a15so9907659pfv.13 for ; Mon, 27 Jun 2022 12:45:34 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=qMS0CkRX/+VNuCqipB56yBshCC5tVVrwSrbLjt1xVq8=; b=aCrCeuBu56n3D5uQ+cDBQ8ONfOrXkFCR1K46v2vrGrGRdajNv2AGFH1Bjo7B34BrrJ xf49qzCI/QQOUvT5KIwZe7UrPkCgcoFBoQ5MnujwEGiUp9Tj44GtIaxeibBX3/R+3ogE ii8CfDAi2j4lGK9G9rEGFFTN3usaBDJWEWKeBV9oMYzzxTLC2oa5MTaFFhuelmMTNT0Q DxNexdlBaYy6VcYHWBuoRAQ6ywHumRlFPo1R9nYFfnhXcoNEQoRlSwTZe38lMmxpcGtC ntTm1V0wezzq+7cPMjffcKaOJ6uB9i3pRarSydvbBuZmr9bqaP75pEhwJxAeiSWMXY9Q Iu1g== X-Gm-Message-State: AJIora86yN+M9rKTGhkbzaml/imJKTbeqMazQ8u8mEcyarNR9upCnkVK 2F0kUGBScUGp2vRLQkegOCilfwxstvU/FAICSIMH40hO X-Google-Smtp-Source: AGRyM1tVSJfWGOZrjZltEwvFTM0zL9Kz9cEuT1wdw1qgkciTblm4qELQ6mQA0QCjEvORTj41D0EgdDAUIkYfSD3VjvA= X-Received: by 2002:a05:6a00:c:b0:525:55cb:83cd with SMTP id h12-20020a056a00000c00b0052555cb83cdmr727753pfk.20.1656359133435; Mon, 27 Jun 2022 12:45:33 -0700 (PDT) MIME-Version: 1.0 References: <20220627190659.831144-1-hjl.tools@gmail.com> In-Reply-To: From: "H.J. Lu" Date: Mon, 27 Jun 2022 12:44:57 -0700 Message-ID: Subject: Re: [PATCH] x86-64: Only define used SSE/AVX/AVX512 run-time resolvers To: Noah Goldstein , GNU C Library Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-3025.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Jun 2022 19:45:36 -0000 On Mon, Jun 27, 2022 at 12:27 PM Noah Goldstein wrote: > > On Mon, Jun 27, 2022 at 12:23 PM Noah Goldstein wrote: > > > > On Mon, Jun 27, 2022 at 12:07 PM H.J. Lu wrote: > > > > > > When glibc is built with x86-64 ISA level v3, SSE run-time resolvers > > > aren't used. For x86-64 ISA level v4 build, both SSE and AVX resolvers > > > are unused. > > > > > > 1. Move X86_ISA_CPU_FEATURE_USABLE_P and X86_ISA_CPU_FEATURES_ARCH_P to > > > isa-level.h. > > > 2. Check the minimum x86-64 ISA level to exclude the unused run-time > > > resolvers. > > > --- > > > sysdeps/x86/isa-ifunc-macros.h | 27 ---------------- > > > sysdeps/x86/isa-level.h | 26 +++++++++++++++ > > > sysdeps/x86_64/dl-machine.h | 12 ++++--- > > > sysdeps/x86_64/dl-trampoline.S | 59 ++++++++++++++++++---------------- > > > 4 files changed, 66 insertions(+), 58 deletions(-) > > > > > > diff --git a/sysdeps/x86/isa-ifunc-macros.h b/sysdeps/x86/isa-ifunc-macros.h > > > index d69905689b..f967a1bec6 100644 > > > --- a/sysdeps/x86/isa-ifunc-macros.h > > > +++ b/sysdeps/x86/isa-ifunc-macros.h > > > @@ -56,31 +56,4 @@ > > > # define X86_IFUNC_IMPL_ADD_V1(...) > > > #endif > > > > > > -/* Both X86_ISA_CPU_FEATURE_USABLE_P and X86_ISA_CPU_FEATURES_ARCH_P > > Should this be a separate commit? I can do it. > > > - macros are wrappers for the the respective > > > - CPU_FEATURE{S}_{USABLE|ARCH}_P runtime checks. They differ in two > > > - ways. > > > - > > > - 1. The USABLE_P version is evaluated to true when the feature > > > - is enabled. > > > - > > > - 2. The ARCH_P version has a third argument `not`. The `not` > > > - argument can either be '!' or empty. If the feature is > > > - enabled above an ISA level, the third argument should be empty > > > - and the expression is evaluated to true when the feature is > > > - enabled. If the feature is disabled above an ISA level, the > > > - third argument should be `!` and the expression is evaluated > > > - to true when the feature is disabled. > > > - */ > > > - > > > -#define X86_ISA_CPU_FEATURE_USABLE_P(ptr, name) \ > > > - (((name##_X86_ISA_LEVEL) <= MINIMUM_X86_ISA_LEVEL) \ > > > - || CPU_FEATURE_USABLE_P (ptr, name)) > > > - > > > - > > > -#define X86_ISA_CPU_FEATURES_ARCH_P(ptr, name, not) \ > > > - (((name##_X86_ISA_LEVEL) <= MINIMUM_X86_ISA_LEVEL) \ > > > - || not CPU_FEATURES_ARCH_P (ptr, name)) > > > - > > > - > > > #endif > > > diff --git a/sysdeps/x86/isa-level.h b/sysdeps/x86/isa-level.h > > > index 075e7c6ee1..f293aea906 100644 > > > --- a/sysdeps/x86/isa-level.h > > > +++ b/sysdeps/x86/isa-level.h > > > @@ -68,10 +68,12 @@ > > > compile-time constant.. */ > > > > > > /* ISA level >= 4 guaranteed includes. */ > > > +#define AVX512F_X86_ISA_LEVEL 4 > > > #define AVX512VL_X86_ISA_LEVEL 4 > > > #define AVX512BW_X86_ISA_LEVEL 4 > > > > > > /* ISA level >= 3 guaranteed includes. */ > > > +#define AVX_X86_ISA_LEVEL 3 > > > #define AVX2_X86_ISA_LEVEL 3 > > > #define BMI2_X86_ISA_LEVEL 3 > > > > > > @@ -87,6 +89,30 @@ > > > when ISA level < 3. */ > > > #define Prefer_No_VZEROUPPER_X86_ISA_LEVEL 3 > > > > > > +/* Both X86_ISA_CPU_FEATURE_USABLE_P and X86_ISA_CPU_FEATURES_ARCH_P > > > + macros are wrappers for the respective CPU_FEATURE{S}_{USABLE|ARCH}_P > > > + runtime checks. They differ in two ways. > > > + > > > + 1. The USABLE_P version is evaluated to true when the feature > > > + is enabled. > > > + > > > + 2. The ARCH_P version has a third argument `not`. The `not` > > > + argument can either be `!` or empty. If the feature is > > > + enabled above an ISA level, the third argument should be empty > > > + and the expression is evaluated to true when the feature is > > > + enabled. If the feature is disabled above an ISA level, the > > > + third argument should be `!` and the expression is evaluated > > > + to true when the feature is disabled. > > > + */ > > > + > > > +#define X86_ISA_CPU_FEATURE_USABLE_P(ptr, name) \ > > > + (((name##_X86_ISA_LEVEL) <= MINIMUM_X86_ISA_LEVEL) \ > > > + || CPU_FEATURE_USABLE_P (ptr, name)) > > > + > > > +#define X86_ISA_CPU_FEATURES_ARCH_P(ptr, name, not) \ > > > + (((name##_X86_ISA_LEVEL) <= MINIMUM_X86_ISA_LEVEL) \ > > > + || not CPU_FEATURES_ARCH_P (ptr, name)) > > > + > > > #define ISA_SHOULD_BUILD(isa_build_level) \ > > > (MINIMUM_X86_ISA_LEVEL <= (isa_build_level) && IS_IN (libc)) \ > > > || defined ISA_DEFAULT_IMPL > > > diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h > > > index 34766325ae..005d089501 100644 > > > --- a/sysdeps/x86_64/dl-machine.h > > > +++ b/sysdeps/x86_64/dl-machine.h > > > @@ -28,6 +28,7 @@ > > > #include > > > #include > > > #include > > > +#include > > > > > > /* Return nonzero iff ELF header is compatible with the running host. */ > > > static inline int __attribute__ ((unused)) > > > @@ -86,6 +87,8 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[], > > > /* Identify this shared object. */ > > > *(ElfW(Addr) *) (got + 1) = (ElfW(Addr)) l; > > > > > > + const struct cpu_features* cpu_features = __get_cpu_features (); > > > + > > > /* The got[2] entry contains the address of a function which gets > > > called to get the address of a so far unresolved function and > > > jump to it. The profiling extension of the dynamic linker allows > > > @@ -94,9 +97,9 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[], > > > end in this function. */ > > > if (__glibc_unlikely (profile)) > > > { > > > - if (CPU_FEATURE_USABLE (AVX512F)) > > > + if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512F)) > > > *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_avx512; > > > - else if (CPU_FEATURE_USABLE (AVX)) > > > + else if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX)) > > > *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_avx; > > > else > > > *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_sse; > > > @@ -112,9 +115,10 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[], > > > /* This function will get called to fix up the GOT entry > > > indicated by the offset on the stack, and then jump to > > > the resolved address. */ > > > - if (GLRO(dl_x86_cpu_features).xsave_state_size != 0) > > > + if (MINIMUM_X86_ISA_LEVEL >= AVX_X86_ISA_LEVEL > > > + || GLRO(dl_x86_cpu_features).xsave_state_size != 0) > > > *(ElfW(Addr) *) (got + 2) > > > - = (CPU_FEATURE_USABLE (XSAVEC) > > > + = (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC) > > > ? (ElfW(Addr)) &_dl_runtime_resolve_xsavec > > > : (ElfW(Addr)) &_dl_runtime_resolve_xsave); > > > else > > > diff --git a/sysdeps/x86_64/dl-trampoline.S b/sysdeps/x86_64/dl-trampoline.S > > > index 831a654713..f669805ac5 100644 > > > --- a/sysdeps/x86_64/dl-trampoline.S > > > +++ b/sysdeps/x86_64/dl-trampoline.S > > > @@ -20,6 +20,7 @@ > > > #include > > > #include > > > #include > > > +#include > > > > > > #ifndef DL_STACK_ALIGNMENT > > > /* Due to GCC bug: > > > @@ -62,35 +63,39 @@ > > > > Can you also add a guard for isa level v4 on the avx512 version? > > > #undef VMOVA > > > #undef VEC_SIZE > > > > > > -#define VEC_SIZE 32 > > > -#define VMOVA vmovdqa > > > -#define VEC(i) ymm##i > > > -#define _dl_runtime_profile _dl_runtime_profile_avx > > > -#include "dl-trampoline.h" > > > -#undef _dl_runtime_profile > > > -#undef VEC > > > -#undef VMOVA > > > -#undef VEC_SIZE > > > +#if MINIMUM_X86_ISA_LEVEL <= AVX_X86_ISA_LEVEL > > Can this be <= __X86_ISA_LEVEL_V3 > > > > > +# define VEC_SIZE 32 > > > +# define VMOVA vmovdqa > > > +# define VEC(i) ymm##i > > > +# define _dl_runtime_profile _dl_runtime_profile_avx > > > +# include "dl-trampoline.h" > > > +# undef _dl_runtime_profile > > > +# undef VEC > > > +# undef VMOVA > > > +# undef VEC_SIZE > > > +#endif > > > > > > +#if MINIMUM_X86_ISA_LEVEL < AVX_X86_ISA_LEVEL > > > > Can this be <= __X86_ISA_LEVEL_V2? > > > /* movaps/movups is 1-byte shorter. */ > > > -#define VEC_SIZE 16 > > > -#define VMOVA movaps > > > -#define VEC(i) xmm##i > > > -#define _dl_runtime_profile _dl_runtime_profile_sse > > > -#undef RESTORE_AVX > > > -#include "dl-trampoline.h" > > > -#undef _dl_runtime_profile > > > -#undef VEC > > > -#undef VMOVA > > > -#undef VEC_SIZE > > > - > > > -#define USE_FXSAVE > > > -#define STATE_SAVE_ALIGNMENT 16 > > > -#define _dl_runtime_resolve _dl_runtime_resolve_fxsave > > > -#include "dl-trampoline.h" > > > -#undef _dl_runtime_resolve > > > -#undef USE_FXSAVE > > > -#undef STATE_SAVE_ALIGNMENT > > > +# define VEC_SIZE 16 > > > +# define VMOVA movaps > > > +# define VEC(i) xmm##i > > > +# define _dl_runtime_profile _dl_runtime_profile_sse > > > +# undef RESTORE_AVX > > > +# include "dl-trampoline.h" > > > +# undef _dl_runtime_profile > > > +# undef VEC > > > +# undef VMOVA > > > +# undef VEC_SIZE > > > + > > > +# define USE_FXSAVE > > > +# define STATE_SAVE_ALIGNMENT 16 > > > +# define _dl_runtime_resolve _dl_runtime_resolve_fxsave > > > +# include "dl-trampoline.h" > > > +# undef _dl_runtime_resolve > > > +# undef USE_FXSAVE > > > +# undef STATE_SAVE_ALIGNMENT > > > +#endif > > > > > > #define USE_XSAVE > > > #define STATE_SAVE_ALIGNMENT 64 > > > -- > > > 2.36.1 > > > > > > > There are cases of unconditional `movaps` usage in dl-trampoline.h (L222) > > > > This appears to be on ISA level v3 / v4 path. > > > > For ISA level >= 3 builds can those become `vmovdqa`? -- H.J.