From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ot1-f45.google.com (mail-ot1-f45.google.com [209.85.210.45]) by sourceware.org (Postfix) with ESMTPS id 9AB7F3858D33 for ; Thu, 6 Jul 2023 13:17:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9AB7F3858D33 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-ot1-f45.google.com with SMTP id 46e09a7af769-6b8decf09e1so628404a34.0 for ; Thu, 06 Jul 2023 06:17:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1688649381; x=1691241381; h=content-transfer-encoding:in-reply-to:organization:from:references :cc:to:content-language:subject:user-agent:mime-version:date :message-id:from:to:cc:subject:date:message-id:reply-to; bh=i+qPMhhR5xjPLXwjvLfbtH/dUHWcTr7tBqXSVf6dTPU=; b=UpvEGcsrkQeifDdR9xVclh0X1yvJpfdgus1CIbfQXxoxyJtoO0wJ7ayiBEc3Y6YIcj WqVTmrsJFyOeSFhFHvrUa5pp7lXUFdAOsKuLHBRBalpuBJ2mp7TYp4kRxYwXXtm+vU3f K3zR0drBhsFGgsLPzTbf4VAsVhHpR8LQjg/Gcplem412Wv3ACOR8OIlQIZYy3GduXr9A LUSHCuWkEeowUk9bj4KiW+TqwkTy3gzAqaB0lNH4gvhCnJE9gp5LTTXByy7bOO3ADa9E qLFQCTz8kOYAcZPqCZBz+rd4tIkHkKF7EoFCBXW1PiPW+raXoFHsXYu25jNY7SJlet/T g5Nw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1688649381; x=1691241381; h=content-transfer-encoding:in-reply-to:organization:from:references :cc:to:content-language:subject:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=i+qPMhhR5xjPLXwjvLfbtH/dUHWcTr7tBqXSVf6dTPU=; b=Mwg6oVR+BMARVs11UqE1LdxY3lMVH/5wURD61vzIG1Kp1tTT/nnKyCogZSahTVB2+Y p38Vq18xeTfuKYyZUMFcyPvplsFH61QKc4L/GOBehdZuREsFGx67X1Nyz1Ng1lpsU4QB POgfGmCkYcpkJoLBhWOw4uLKR54JyXnnSMYtnM4L3EDxFrakoHnp6RwIIdVO5FNx0pxt 45TTbIewnE0MvJUPgAlR3dcpI8hWH8qlyEoOSWTWijBa2RK3AKLfDglokRHD7v7uPpHN FHO7API+x7bsGoSZ0i+7uG2TbvmLJ6d+Y0HNARhyiJ+qnSZTKdby6bn4wl89slUv3QTM jtVg== X-Gm-Message-State: ABy/qLZoBSw8Ad+whDoP28h0pQLg+Kd84EsG7lPJvOQJL0abaffz0y9f nm0Y/21lVoySQadih0zGB1FOdQ== X-Google-Smtp-Source: APBJJlEF9FKd9KneL693hbvgxjYXq5YjVTkh7rHEUw7mYpQnxysfy6LzZ/5beWuxUKlSS/nhR51HkA== X-Received: by 2002:a9d:7487:0:b0:6b6:382a:a5f6 with SMTP id t7-20020a9d7487000000b006b6382aa5f6mr2042621otk.21.1688649380469; Thu, 06 Jul 2023 06:16:20 -0700 (PDT) Received: from ?IPV6:2804:1b3:a7c3:e0c8:cd42:a845:c93a:9515? ([2804:1b3:a7c3:e0c8:cd42:a845:c93a:9515]) by smtp.gmail.com with ESMTPSA id s16-20020a0568301e1000b006b8bf76174fsm672761otr.21.2023.07.06.06.16.18 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 06 Jul 2023 06:16:19 -0700 (PDT) Message-ID: Date: Thu, 6 Jul 2023 10:16:16 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.12.0 Subject: Re: [PATCH v3] PowerPC: Influence cpu/arch hwcap features via GLIBC_TUNABLES. Content-Language: en-US To: bmahi496@linux.ibm.com, libc-alpha@sourceware.org Cc: rajis@linux.ibm.com, bergner@linux.ibm.com, Mahesh Bodapati References: <20230706122544.175643-1-bmahi496@linux.ibm.com> From: Adhemerval Zanella Netto Organization: Linaro In-Reply-To: <20230706122544.175643-1-bmahi496@linux.ibm.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-11.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 06/07/23 09:25, bmahi496--- via Libc-alpha wrote: > From: Mahesh Bodapati > > This patch enables the option to influence hwcaps used by PowerPC. > The environment variable, GLIBC_TUNABLES=glibc.cpu.hwcaps=-xxx,yyy,-zzz...., > can be used to enable CPU/ARCH feature yyy, disable CPU/ARCH feature xxx > and zzz, where the feature name is case-sensitive and has to match the ones > mentioned in the file{sysdeps/powerpc/dl-procinfo.c}. > > Note that the tunable only handles the features which are really used > in the IFUNC selection. All others are ignored as the values are only > used inside glibc. > --- > manual/tunables.texi | 5 +- > sysdeps/powerpc/cpu-features.c | 91 ++++++++++++++++++- > sysdeps/powerpc/cpu-features.h | 57 ++++++++++++ > sysdeps/powerpc/dl-tunables.list | 3 + > sysdeps/powerpc/hwcapinfo.c | 4 + > .../power4/multiarch/ifunc-impl-list.c | 4 +- > .../powerpc32/power4/multiarch/init-arch.h | 10 +- > sysdeps/powerpc/powerpc64/dl-machine.h | 2 - > .../powerpc64/multiarch/ifunc-impl-list.c | 7 +- > 9 files changed, 171 insertions(+), 12 deletions(-) > > diff --git a/manual/tunables.texi b/manual/tunables.texi > index 4ca0e42a11..776fd93fd9 100644 > --- a/manual/tunables.texi > +++ b/manual/tunables.texi > @@ -513,7 +513,10 @@ On s390x, the supported HWCAP and STFLE features can be found in > @code{sysdeps/s390/cpu-features.c}. In addition the user can also set > a CPU arch-level like @code{z13} instead of single HWCAP and STFLE features. > > -This tunable is specific to i386, x86-64 and s390x. > +On powerpc, the supported HWCAP and HWCAP2 features can be found in > +@code{sysdeps/powerpc/dl-procinfo.c}. > + > +This tunable is specific to i386, x86-64, s390x and powerpc. > @end deftp > > @deftp Tunable glibc.cpu.cached_memopt > diff --git a/sysdeps/powerpc/cpu-features.c b/sysdeps/powerpc/cpu-features.c > index 0ef3cf89d2..40d945ba03 100644 > --- a/sysdeps/powerpc/cpu-features.c > +++ b/sysdeps/powerpc/cpu-features.c > @@ -19,14 +19,103 @@ > #include > #include > #include > +#include > +#include > +#define MEMCMP_DEFAULT memcmp > +#define STRLEN_DEFAULT strlen > + > +static void > +TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *valp) > +{ > + /* The current IFUNC selection is always using the most recent > + features which are available via AT_HWCAP or AT_HWCAP2. But in > + some scenarios it is useful to adjust this selection. > + > + The environment variable: > + > + GLIBC_TUNABLES=glibc.cpu.hwcaps=-xxx,yyy,.... > + > + Can be used to enable HWCAP/HWCAP2 feature yyy, disable HWCAP/HWCAP2 > + feature xxx, where the feature name is case-sensitive and has to match > + the ones mentioned in the file{sysdeps/powerpc/dl-procinfo.c}. */ > + > + /* Copy the features from dl_powerpc_cpu_features, which contains the > + features provided by AT_HWCAP and AT_HWCAP2. */ > + struct cpu_features *cpu_features = &GLRO(dl_powerpc_cpu_features); > + unsigned long int tcbv_hwcap = cpu_features->hwcap; > + unsigned long int tcbv_hwcap2 = cpu_features->hwcap2; > + const char *token = valp->strval; > + do > + { > + const char *token_end, *feature; > + bool disable; > + size_t token_len, i, feature_len; > + /* Find token separator or end of string. */ > + for (token_end = token; *token_end != ','; token_end++) > + if (*token_end == '\0') > + break; > + > + /* Determine feature. */ > + token_len = token_end - token; > + if (*token == '-') > + { > + disable = true; > + feature = token + 1; > + feature_len = token_len - 1; > + } > + else > + { > + disable = false; > + feature = token; > + feature_len = token_len; > + } > + for (i = 0; hwcap_tunables[i].name != NULL; ++i) It is not clear to me how this is true since there is no NULL sentinel on hwcap_tunables. > + { > + /* Check the tunable name on the supported list. */ > + if (STRLEN_DEFAULT (hwcap_tunables[i].name) == feature_len > + && MEMCMP_DEFAULT (feature, hwcap_tunables[i].name, feature_len) > + == 0) > + { > + /* Update the hwcap and hwcap2 bits. */ > + if (disable) > + { > + /* Id is 1 for hwcap2 tunable. */ > + if (hwcap_tunables[i].id) > + cpu_features->hwcap2 &= ~(hwcap_tunables[i].mask); > + else > + cpu_features->hwcap &= ~(hwcap_tunables[i].mask); > + } > + else > + { > + /* Enable the features and also checking that no unsupported > + features were enabled by user. */ > + if (hwcap_tunables[i].id) > + cpu_features->hwcap2 |= (tcbv_hwcap2 & hwcap_tunables[i].mask); > + else > + cpu_features->hwcap |= (tcbv_hwcap & hwcap_tunables[i].mask); > + } > + } > + } > + token += token_len; > + /* ... and skip token separator for next round. */ > + if (*token == ',') token++; > + } > + while (*token != '\0'); > +} > > static inline void > -init_cpu_features (struct cpu_features *cpu_features) > +init_cpu_features (struct cpu_features *cpu_features, uint64_t hwcaps[]) > { > + /* Fill the cpu_features with the supported hwcaps > + which are set by __tcb_parse_hwcap_and_convert_at_platform. */ > + cpu_features->hwcap = hwcaps[0]; > + cpu_features->hwcap2 = hwcaps[1]; > /* Default is to use aligned memory access on optimized function unless > tunables is enable, since for this case user can explicit disable > unaligned optimizations. */ > int32_t cached_memfunc = TUNABLE_GET (glibc, cpu, cached_memopt, int32_t, > NULL); > cpu_features->use_cached_memopt = (cached_memfunc > 0); > + TUNABLE_GET (glibc, cpu, hwcaps, tunable_val_t *, > + TUNABLE_CALLBACK (set_hwcaps)); > } > diff --git a/sysdeps/powerpc/cpu-features.h b/sysdeps/powerpc/cpu-features.h > index d316dc3d64..6839c3085b 100644 > --- a/sysdeps/powerpc/cpu-features.h > +++ b/sysdeps/powerpc/cpu-features.h > @@ -19,10 +19,67 @@ > # define __CPU_FEATURES_POWERPC_H > > #include > +#include > > struct cpu_features > { > bool use_cached_memopt; > + unsigned long int hwcap; > + unsigned long int hwcap2; > +}; > + > +static const struct > +{ > + const char *name; > + int mask; > + bool id; > +} hwcap_tunables[] = { > + /* AT_HWCAP tunable masks. */ > + { "4xxmac", PPC_FEATURE_HAS_4xxMAC, 0 }, This creates one extra dynamic relocation per entry: powerpc64le-linux-gnu-base$ powerpc64le-linux-gnu-readelf -a elf/ld.so [...] Relocation section '.relr.dyn' at offset 0xc68 contains 3 entries: 10 offsets [...] powerpc64le-linux-gnu-patch$ powerpc64le-linux-gnu-readelf -a elf/ld.so [...] Relocation section '.relr.dyn' at offset 0xc68 contains 5 entries: 53 offsets [...] Which I think we should avoid since is a small slow down on every program invocation. You can either define the name with a predefine size that fits for every name (say 32), but this will waste a some of space. Or you can specify the hwcap_tunables struct as pointing to an offset: static const char hwcap_names[] = "4xxmac\0" "altivec\0" [...] static const struct { unsigned short off; int mask; bool id; } hwcap_tunables[] = { { 0, PPC_FEATURE_HAS_4xxMAC, 0 }, { 7, PPC_FEATURE_HAS_ALTIVEC, 0 }, [...] } And then you check the name as: for (i = 0; array_length (hwcap_tunables); ++i) { const char *hwcap_name = hwcap_names + hwcap_tunables[i].off; [...] } The drowback is to get the offsets right it would require some preprocessor phase (something like we do for the signal and errno list). In any case I think it should add some testing. > + { "altivec", PPC_FEATURE_HAS_ALTIVEC, 0 }, > + { "arch_2_05", PPC_FEATURE_ARCH_2_05, 0 }, > + { "arch_2_06", PPC_FEATURE_ARCH_2_06, 0 }, > + { "archpmu", PPC_FEATURE_PSERIES_PERFMON_COMPAT, 0 }, > + { "booke", PPC_FEATURE_BOOKE, 0 }, > + { "cellbe", PPC_FEATURE_CELL_BE, 0 }, > + { "dfp", PPC_FEATURE_HAS_DFP, 0 }, > + { "efpdouble", PPC_FEATURE_HAS_EFP_DOUBLE, 0 }, > + { "efpsingle", PPC_FEATURE_HAS_EFP_SINGLE, 0 }, > + { "fpu", PPC_FEATURE_HAS_FPU, 0 }, > + { "ic_snoop", PPC_FEATURE_ICACHE_SNOOP, 0 }, > + { "mmu", PPC_FEATURE_HAS_MMU, 0 }, > + { "notb", PPC_FEATURE_NO_TB, 0 }, > + { "pa6t", PPC_FEATURE_PA6T, 0 }, > + { "power4", PPC_FEATURE_POWER4, 0 }, > + { "power5", PPC_FEATURE_POWER5, 0 }, > + { "power5+", PPC_FEATURE_POWER5_PLUS, 0 }, > + { "power6x", PPC_FEATURE_POWER6_EXT, 0 }, > + { "ppc32", PPC_FEATURE_32, 0 }, > + { "ppc601", PPC_FEATURE_601_INSTR, 0 }, > + { "ppc64", PPC_FEATURE_64, 0 }, > + { "ppcle", PPC_FEATURE_PPC_LE, 0 }, > + { "smt", PPC_FEATURE_SMT, 0 }, > + { "spe", PPC_FEATURE_HAS_SPE, 0 }, > + { "true_le", PPC_FEATURE_TRUE_LE, 0 }, > + { "ucache", PPC_FEATURE_UNIFIED_CACHE, 0 }, > + { "vsx", PPC_FEATURE_HAS_VSX, 0 }, > + > + /* AT_HWCAP2 tunable masks. */ > + { "arch_2_07", PPC_FEATURE2_ARCH_2_07, 1 }, > + { "dscr", PPC_FEATURE2_HAS_DSCR, 1 }, > + { "ebb", PPC_FEATURE2_HAS_EBB, 1 }, > + { "htm", PPC_FEATURE2_HAS_HTM, 1 }, > + { "htm-nosc", PPC_FEATURE2_HTM_NOSC, 1 }, > + { "htm-no-suspend", PPC_FEATURE2_HTM_NO_SUSPEND, 1 }, > + { "isel", PPC_FEATURE2_HAS_ISEL, 1 }, > + { "tar", PPC_FEATURE2_HAS_TAR, 1 }, > + { "vcrypto", PPC_FEATURE2_HAS_VEC_CRYPTO, 1 }, > + { "arch_3_00", PPC_FEATURE2_ARCH_3_00, 1 }, > + { "ieee128", PPC_FEATURE2_HAS_IEEE128, 1 }, > + { "darn", PPC_FEATURE2_DARN, 1 }, > + { "scv", PPC_FEATURE2_SCV, 1 }, > + { "arch_3_1", PPC_FEATURE2_ARCH_3_1, 1 }, > + { "mma", PPC_FEATURE2_MMA, 1 }, > }; > > #endif /* __CPU_FEATURES_H */ > diff --git a/sysdeps/powerpc/dl-tunables.list b/sysdeps/powerpc/dl-tunables.list > index 87d6235c75..807b7f8013 100644 > --- a/sysdeps/powerpc/dl-tunables.list > +++ b/sysdeps/powerpc/dl-tunables.list > @@ -24,5 +24,8 @@ glibc { > maxval: 1 > default: 0 > } > + hwcaps { > + type: STRING > + } > } > } > diff --git a/sysdeps/powerpc/hwcapinfo.c b/sysdeps/powerpc/hwcapinfo.c > index e26e64d99e..f2c473c556 100644 > --- a/sysdeps/powerpc/hwcapinfo.c > +++ b/sysdeps/powerpc/hwcapinfo.c > @@ -19,6 +19,7 @@ > #include > #include > #include > +#include > > tcbhead_t __tcb __attribute__ ((visibility ("hidden"))); > > @@ -63,6 +64,9 @@ __tcb_parse_hwcap_and_convert_at_platform (void) > else if (h1 & PPC_FEATURE_POWER5) > h1 |= PPC_FEATURE_POWER4; > > + uint64_t array_hwcaps[] = { h1, h2 }; > + init_cpu_features (&GLRO(dl_powerpc_cpu_features), array_hwcaps); > + > /* Consolidate both HWCAP and HWCAP2 into a single doubleword so that > we can read both in a single load later. */ > __tcb.hwcap = (h1 << 32) | (h2 & 0xffffffff); > diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c > index b4f80539e7..986c37d71e 100644 > --- a/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c > +++ b/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c > @@ -21,6 +21,7 @@ > #include > #include > #include > +#include > > size_t > __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, > @@ -28,7 +29,8 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, > { > size_t i = max; > > - unsigned long int hwcap = GLRO(dl_hwcap); > + const struct cpu_features *features = &GLRO(dl_powerpc_cpu_features); > + unsigned long int hwcap = features->hwcap; > /* hwcap contains only the latest supported ISA, the code checks which is > and fills the previous supported ones. */ > if (hwcap & PPC_FEATURE_ARCH_2_06) > diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h b/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h > index 3dd00e02ee..a0bbd12012 100644 > --- a/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h > +++ b/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h > @@ -16,6 +16,7 @@ > . */ > > #include > +#include > > /* The code checks if _rtld_global_ro was realocated before trying to access > the dl_hwcap field. The assembly is to make the compiler not optimize the > @@ -32,11 +33,12 @@ > # define __GLRO(value) GLRO(value) > #endif > > -/* dl_hwcap contains only the latest supported ISA, the macro checks which is > - and fills the previous ones. */ > +/* Get the hardware information post the tunables set , the macro checks > + it and fills the previous ones. */ > #define INIT_ARCH() \ > - unsigned long int hwcap = __GLRO(dl_hwcap); \ > - unsigned long int __attribute__((unused)) hwcap2 = __GLRO(dl_hwcap2); \ > + const struct cpu_features *features = &GLRO(dl_powerpc_cpu_features); \ > + unsigned long int hwcap = features->hwcap; \ > + unsigned long int __attribute__((unused)) hwcap2 = features->hwcap2; \ > bool __attribute__((unused)) use_cached_memopt = \ > __GLRO(dl_powerpc_cpu_features.use_cached_memopt); \ > if (hwcap & PPC_FEATURE_ARCH_2_06) \ > diff --git a/sysdeps/powerpc/powerpc64/dl-machine.h b/sysdeps/powerpc/powerpc64/dl-machine.h > index 9b8943bc91..449208e86f 100644 > --- a/sysdeps/powerpc/powerpc64/dl-machine.h > +++ b/sysdeps/powerpc/powerpc64/dl-machine.h > @@ -27,7 +27,6 @@ > #include > #include > #include > -#include > #include > #include > #include > @@ -297,7 +296,6 @@ static inline void __attribute__ ((unused)) > dl_platform_init (void) > { > __tcb_parse_hwcap_and_convert_at_platform (); > - init_cpu_features (&GLRO(dl_powerpc_cpu_features)); > } > #endif > > diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c > index ebe9434052..fc26dd0e17 100644 > --- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c > +++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c > @@ -17,6 +17,7 @@ > . */ > > #include > +#include > #include > #include > #include > @@ -27,9 +28,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, > size_t max) > { > size_t i = max; > - > - unsigned long int hwcap = GLRO(dl_hwcap); > - unsigned long int hwcap2 = GLRO(dl_hwcap2); > + const struct cpu_features *features = &GLRO(dl_powerpc_cpu_features); > + unsigned long int hwcap = features->hwcap; > + unsigned long int hwcap2 = features->hwcap2; > #ifdef SHARED > int cacheline_size = GLRO(dl_cache_line_size); > #endif