From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi1-x229.google.com (mail-oi1-x229.google.com [IPv6:2607:f8b0:4864:20::229]) by sourceware.org (Postfix) with ESMTPS id C7F0F3858D20 for ; Wed, 12 Jul 2023 14:26:45 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C7F0F3858D20 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-oi1-x229.google.com with SMTP id 5614622812f47-3a3ad1f39ebso511952b6e.1 for ; Wed, 12 Jul 2023 07:26:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1689172005; x=1691764005; h=content-transfer-encoding:in-reply-to:organization:from:references :cc:to:content-language:subject:user-agent:mime-version:date :message-id:from:to:cc:subject:date:message-id:reply-to; bh=7tpWywou/FaHZ18QZUPPiokEt3mAWzJNYIrUvI9n3dU=; b=nSSWrqHLo+2Q5fVWUbSjrigdCENFns3tLaYKj3GuFGYcIulSTZEoDOo73LNgaXeKBO YNn22ssqQVTcySjaF+IhHNqsSyFXXbgI1dAk59uKrNUTlpyAkmjFnlTQKE1QWZ1i44RY Zr7yIMsgJBK7dUvqTH+i3CIktlnlXZ+gAi1FDXXE7RIe8T/ZrsFRya2L2vu7g80U0df6 DeePudokV2+jMQflXndm1UlCg1Wq4InqtJ99QEsDMPu3XI9ngaDaJ22/5FjtKBlpN9UN kth0CgtFB+AQhKihSaNtSdvpB3c8V9+2XlqtkJXSmjdHsS6DRasSU9j6nC6CutZ92dKn WGvA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689172005; x=1691764005; h=content-transfer-encoding:in-reply-to:organization:from:references :cc:to:content-language:subject:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=7tpWywou/FaHZ18QZUPPiokEt3mAWzJNYIrUvI9n3dU=; b=AoXz3S0Kq78PEOt2JiZ7BhSL9yHM5mVz4oIsNOLv8kigws71JIWyZBaz0HxhOsIUdf 4VeYj00wQEop6zdXmCzlISuWNVc6Q84ySSRFptDJf6ojEHHFl2QOzrgGmUvHQsJBR8l6 pggQNq/gn3xogxXMYQU619JEoc2+qovQs0uf7n3RTcXeQ1jyOKObkgnerLIPZKbRHeIJ cmzkekdQTp48HqCzhwto3tEYKom2weBgqBngZnIf/3fH2otE9llKzTKl91Euz+MviXvS Rf+uDCfyK0Nqo8uYx9orKRRrtmshFbvom3xIRUR3lqdXT+F2GdpgvsBqw5vwO3T1tx/X ZQ8w== X-Gm-Message-State: ABy/qLZkWiJOri4EmweOkEeqm8PVbToDScQCziTTZLDy+8IILTelTIt/ Scx461xKxjGumEXYwRmU1LEQ5A== X-Google-Smtp-Source: APBJJlHGM2sGscTo4W2Iw0qVLAe0cjFGCnLoMklSsxlYmsU1/BiPfAX8uVvsaHoISSGyejnrs9JgxQ== X-Received: by 2002:a05:6808:1a90:b0:3a0:5420:700b with SMTP id bm16-20020a0568081a9000b003a05420700bmr1322987oib.28.1689172003432; Wed, 12 Jul 2023 07:26:43 -0700 (PDT) Received: from ?IPV6:2804:1b3:a7c3:e0c8:142d:91c0:e0dd:736d? ([2804:1b3:a7c3:e0c8:142d:91c0:e0dd:736d]) by smtp.gmail.com with ESMTPSA id q64-20020a4a4b43000000b00565ebacf9cfsm1857075ooa.33.2023.07.12.07.26.40 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 12 Jul 2023 07:26:42 -0700 (PDT) Message-ID: Date: Wed, 12 Jul 2023 11:26:39 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Subject: Re: [PATCH v6] PowerPC: Influence cpu/arch hwcap features via GLIBC_TUNABLES. Content-Language: en-US To: bmahi496@linux.ibm.com, libc-alpha@sourceware.org Cc: rajis@linux.ibm.com, bergner@linux.ibm.com, Mahesh Bodapati References: <20230711121406.962921-1-bmahi496@linux.ibm.com> From: Adhemerval Zanella Netto Organization: Linaro In-Reply-To: <20230711121406.962921-1-bmahi496@linux.ibm.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 11/07/23 09:14, bmahi496@linux.ibm.com wrote: > From: Mahesh Bodapati > > This patch enables the option to influence hwcaps used by PowerPC. > The environment variable, GLIBC_TUNABLES=glibc.cpu.hwcaps=-xxx,yyy,-zzz...., > can be used to enable CPU/ARCH feature yyy, disable CPU/ARCH feature xxx > and zzz, where the feature name is case-sensitive and has to match the ones > mentioned in the file{sysdeps/powerpc/dl-procinfo.c}. > > Note that the hwcap tunables only used in the IFUNC selection. > --- > manual/tunables.texi | 5 +- > sysdeps/powerpc/cpu-features.c | 96 ++++++++++++++++- > sysdeps/powerpc/cpu-features.h | 102 ++++++++++++++++++ > sysdeps/powerpc/dl-tunables.list | 3 + > sysdeps/powerpc/hwcapinfo.c | 4 + > .../power4/multiarch/ifunc-impl-list.c | 4 +- > .../powerpc32/power4/multiarch/init-arch.h | 10 +- > sysdeps/powerpc/powerpc64/dl-machine.h | 2 - > .../powerpc64/multiarch/ifunc-impl-list.c | 7 +- > 9 files changed, 221 insertions(+), 12 deletions(-) > > diff --git a/manual/tunables.texi b/manual/tunables.texi > index 4ca0e42a11..776fd93fd9 100644 > --- a/manual/tunables.texi > +++ b/manual/tunables.texi > @@ -513,7 +513,10 @@ On s390x, the supported HWCAP and STFLE features can be found in > @code{sysdeps/s390/cpu-features.c}. In addition the user can also set > a CPU arch-level like @code{z13} instead of single HWCAP and STFLE features. > > -This tunable is specific to i386, x86-64 and s390x. > +On powerpc, the supported HWCAP and HWCAP2 features can be found in > +@code{sysdeps/powerpc/dl-procinfo.c}. > + > +This tunable is specific to i386, x86-64, s390x and powerpc. > @end deftp > > @deftp Tunable glibc.cpu.cached_memopt > diff --git a/sysdeps/powerpc/cpu-features.c b/sysdeps/powerpc/cpu-features.c > index 0ef3cf89d2..2b727b5917 100644 > --- a/sysdeps/powerpc/cpu-features.c > +++ b/sysdeps/powerpc/cpu-features.c This file should be within sysdep/unix/sysv/linux/powerpc since hwcap is a Linux feature (similar to what aarch64 does). > @@ -19,14 +19,108 @@ > #include > #include > #include > +#include > +#include > +#define MEMCMP_DEFAULT memcmp > +#define STRLEN_DEFAULT strlen There is no need to redefine these macros for powerpc, s390x does it because it redefines the default string routines depending of the minimum ISA used to build (sysdeps/s390/ifunc-memcmp.h). > + > +static void > +TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *valp) > +{ > + /* The current IFUNC selection is always using the most recent > + features which are available via AT_HWCAP or AT_HWCAP2. But in > + some scenarios it is useful to adjust this selection. > + > + The environment variable: > + > + GLIBC_TUNABLES=glibc.cpu.hwcaps=-xxx,yyy,.... > + > + Can be used to enable HWCAP/HWCAP2 feature yyy, disable HWCAP/HWCAP2 > + feature xxx, where the feature name is case-sensitive and has to match > + the ones mentioned in the file{sysdeps/powerpc/dl-procinfo.c}. */ > + > + /* Copy the features from dl_powerpc_cpu_features, which contains the > + features provided by AT_HWCAP and AT_HWCAP2. */ > + struct cpu_features *cpu_features = &GLRO (dl_powerpc_cpu_features); > + unsigned long int tcbv_hwcap = cpu_features->hwcap; > + unsigned long int tcbv_hwcap2 = cpu_features->hwcap2; > + unsigned int tun_count; > + const char *token = valp->strval; > + tun_count = sizeof (hwcap_tunables) / sizeof (hwcap_tunables[0]); Use array_length (hwcap_tunables) here. > + do > + { > + const char *token_end, *feature; > + bool disable; > + size_t token_len, i, feature_len, offset = 0; > + /* Find token separator or end of string. */ > + for (token_end = token; *token_end != ','; token_end++) > + if (*token_end == '\0') > + break; > + > + /* Determine feature. */ > + token_len = token_end - token; > + if (*token == '-') > + { > + disable = true; > + feature = token + 1; > + feature_len = token_len - 1; > + } > + else > + { > + disable = false; > + feature = token; > + feature_len = token_len; > + } > + for (i = 0; i < tun_count; ++i) > + { > + const char *hwcap_name = hwcap_names + offset; > + /* Check the tunable name on the supported list. */ > + if (STRLEN_DEFAULT (hwcap_name) == feature_len You can strlen twice here, although compiler most likely will optimize it maybe it should be better to just: const char *hwcap_name = hwcap_names + offset; size_t hwcap_name_len = strlen (hwcap_name); if (hwcap_name_len == feature_len && memcmp (feature, hwcap_name, feature_len) == 0) { [...] } offset += hwcap_name_len + 1; > + && MEMCMP_DEFAULT (feature, hwcap_name, feature_len) > + == 0) > + { > + /* Update the hwcap and hwcap2 bits. */ > + if (disable) > + { > + /* Id is 1 for hwcap2 tunable. */ > + if (hwcap_tunables[i].id) > + cpu_features->hwcap2 &= ~(hwcap_tunables[i].mask); > + else > + cpu_features->hwcap &= ~(hwcap_tunables[i].mask); > + } > + else > + { > + /* Enable the features and also check that no unsupported > + features were enabled by user. */ > + if (hwcap_tunables[i].id) > + cpu_features->hwcap2 |= (tcbv_hwcap2 & hwcap_tunables[i].mask); > + else > + cpu_features->hwcap |= (tcbv_hwcap & hwcap_tunables[i].mask); > + } > + break; > + } > + offset += STRLEN_DEFAULT (hwcap_name) + 1; > + } > + token += token_len; > + /* ... and skip token separator for next round. */ > + if (*token == ',') token++; Put 'token++' in a new line. > + } > + while (*token != '\0'); > +} > > static inline void > -init_cpu_features (struct cpu_features *cpu_features) > +init_cpu_features (struct cpu_features *cpu_features, uint64_t hwcaps[]) > { > + /* Fill the cpu_features with the supported hwcaps > + which are set by __tcb_parse_hwcap_and_convert_at_platform. */ > + cpu_features->hwcap = hwcaps[0]; > + cpu_features->hwcap2 = hwcaps[1]; > /* Default is to use aligned memory access on optimized function unless > tunables is enable, since for this case user can explicit disable > unaligned optimizations. */ > int32_t cached_memfunc = TUNABLE_GET (glibc, cpu, cached_memopt, int32_t, > NULL); > cpu_features->use_cached_memopt = (cached_memfunc > 0); > + TUNABLE_GET (glibc, cpu, hwcaps, tunable_val_t *, > + TUNABLE_CALLBACK (set_hwcaps)); > } > diff --git a/sysdeps/powerpc/cpu-features.h b/sysdeps/powerpc/cpu-features.h > index d316dc3d64..e5fce88e5e 100644 > --- a/sysdeps/powerpc/cpu-features.h > +++ b/sysdeps/powerpc/cpu-features.h > @@ -19,10 +19,112 @@ > # define __CPU_FEATURES_POWERPC_H > > #include > +#include > > struct cpu_features > { > bool use_cached_memopt; > + unsigned long int hwcap; > + unsigned long int hwcap2; > +}; > + > +static const char hwcap_names[] = { > + "4xxmac\0" > + "altivec\0" > + "arch_2_05\0" > + "arch_2_06\0" > + "archpmu\0" > + "booke\0" > + "cellbe\0" > + "dfp\0" > + "efpdouble\0" > + "efpsingle\0" > + "fpu\0" > + "ic_snoop\0" > + "mmu\0" > + "notb\0" > + "pa6t\0" > + "power4\0" > + "power5\0" > + "power5+\0" > + "power6x\0" > + "ppc32\0" > + "ppc601\0" > + "ppc64\0" > + "ppcle\0" > + "smt\0" > + "spe\0" > + "true_le\0" > + "ucache\0" > + "vsx\0" > + "arch_2_07\0" > + "dscr\0" > + "ebb\0" > + "htm\0" > + "htm-nosc\0" > + "htm-no-suspend\0" > + "isel\0" > + "tar\0" > + "vcrypto\0" > + "arch_3_00\0" > + "ieee128\0" > + "darn\0" > + "scv\0" > + "arch_3_1\0" > + "mma\0" > +}; > + > +static const struct > +{ > + unsigned int mask; > + bool id; > +} hwcap_tunables[] = { > + /* AT_HWCAP tunable masks. */ > + { PPC_FEATURE_HAS_4xxMAC, 0 }, > + { PPC_FEATURE_HAS_ALTIVEC, 0 }, > + { PPC_FEATURE_ARCH_2_05, 0 }, > + { PPC_FEATURE_ARCH_2_06, 0 }, > + { PPC_FEATURE_PSERIES_PERFMON_COMPAT, 0 }, > + { PPC_FEATURE_BOOKE, 0 }, > + { PPC_FEATURE_CELL_BE, 0 }, > + { PPC_FEATURE_HAS_DFP, 0 }, > + { PPC_FEATURE_HAS_EFP_DOUBLE, 0 }, > + { PPC_FEATURE_HAS_EFP_SINGLE, 0 }, > + { PPC_FEATURE_HAS_FPU, 0 }, > + { PPC_FEATURE_ICACHE_SNOOP, 0 }, > + { PPC_FEATURE_HAS_MMU, 0 }, > + { PPC_FEATURE_NO_TB, 0 }, > + { PPC_FEATURE_PA6T, 0 }, > + { PPC_FEATURE_POWER4, 0 }, > + { PPC_FEATURE_POWER5, 0 }, > + { PPC_FEATURE_POWER5_PLUS, 0 }, > + { PPC_FEATURE_POWER6_EXT, 0 }, > + { PPC_FEATURE_32, 0 }, > + { PPC_FEATURE_601_INSTR, 0 }, > + { PPC_FEATURE_64, 0 }, > + { PPC_FEATURE_PPC_LE, 0 }, > + { PPC_FEATURE_SMT, 0 }, > + { PPC_FEATURE_HAS_SPE, 0 }, > + { PPC_FEATURE_TRUE_LE, 0 }, > + { PPC_FEATURE_UNIFIED_CACHE, 0 }, > + { PPC_FEATURE_HAS_VSX, 0 }, > + > + /* AT_HWCAP2 tunable masks. */ > + { PPC_FEATURE2_ARCH_2_07, 1 }, > + { PPC_FEATURE2_HAS_DSCR, 1 }, > + { PPC_FEATURE2_HAS_EBB, 1 }, > + { PPC_FEATURE2_HAS_HTM, 1 }, > + { PPC_FEATURE2_HTM_NOSC, 1 }, > + { PPC_FEATURE2_HTM_NO_SUSPEND, 1 }, > + { PPC_FEATURE2_HAS_ISEL, 1 }, > + { PPC_FEATURE2_HAS_TAR, 1 }, > + { PPC_FEATURE2_HAS_VEC_CRYPTO, 1 }, > + { PPC_FEATURE2_ARCH_3_00, 1 }, > + { PPC_FEATURE2_HAS_IEEE128, 1 }, > + { PPC_FEATURE2_DARN, 1 }, > + { PPC_FEATURE2_SCV, 1 }, > + { PPC_FEATURE2_ARCH_3_1, 1 }, > + { PPC_FEATURE2_MMA, 1 }, > }; > > #endif /* __CPU_FEATURES_H */ > diff --git a/sysdeps/powerpc/dl-tunables.list b/sysdeps/powerpc/dl-tunables.list > index 87d6235c75..807b7f8013 100644 > --- a/sysdeps/powerpc/dl-tunables.list > +++ b/sysdeps/powerpc/dl-tunables.list > @@ -24,5 +24,8 @@ glibc { > maxval: 1 > default: 0 > } > + hwcaps { > + type: STRING > + } > } > } > diff --git a/sysdeps/powerpc/hwcapinfo.c b/sysdeps/powerpc/hwcapinfo.c > index e26e64d99e..0652df353a 100644 > --- a/sysdeps/powerpc/hwcapinfo.c > +++ b/sysdeps/powerpc/hwcapinfo.c > @@ -19,6 +19,7 @@ > #include > #include > #include > +#include > > tcbhead_t __tcb __attribute__ ((visibility ("hidden"))); > > @@ -63,6 +64,9 @@ __tcb_parse_hwcap_and_convert_at_platform (void) > else if (h1 & PPC_FEATURE_POWER5) > h1 |= PPC_FEATURE_POWER4; > > + uint64_t array_hwcaps[] = { h1, h2 }; > + init_cpu_features (&GLRO (dl_powerpc_cpu_features), array_hwcaps); > + > /* Consolidate both HWCAP and HWCAP2 into a single doubleword so that > we can read both in a single load later. */ > __tcb.hwcap = (h1 << 32) | (h2 & 0xffffffff); > diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c > index b4f80539e7..3b9278f5e5 100644 > --- a/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c > +++ b/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c > @@ -21,6 +21,7 @@ > #include > #include > #include > +#include > > size_t > __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, > @@ -28,7 +29,8 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, > { > size_t i = max; > > - unsigned long int hwcap = GLRO(dl_hwcap); > + const struct cpu_features *features = &GLRO (dl_powerpc_cpu_features); For GLRO macros that acts a variable access it does need the extra space. > + unsigned long int hwcap = features->hwcap; > /* hwcap contains only the latest supported ISA, the code checks which is > and fills the previous supported ones. */ > if (hwcap & PPC_FEATURE_ARCH_2_06) > diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h b/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h > index 3dd00e02ee..a0bbd12012 100644 > --- a/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h > +++ b/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h > @@ -16,6 +16,7 @@ > . */ > > #include > +#include > > /* The code checks if _rtld_global_ro was realocated before trying to access > the dl_hwcap field. The assembly is to make the compiler not optimize the > @@ -32,11 +33,12 @@ > # define __GLRO(value) GLRO(value) > #endif > > -/* dl_hwcap contains only the latest supported ISA, the macro checks which is > - and fills the previous ones. */ > +/* Get the hardware information post the tunables set , the macro checks Extra space before ','. > + it and fills the previous ones. */ > #define INIT_ARCH() \ > - unsigned long int hwcap = __GLRO(dl_hwcap); \ > - unsigned long int __attribute__((unused)) hwcap2 = __GLRO(dl_hwcap2); \ > + const struct cpu_features *features = &GLRO(dl_powerpc_cpu_features); \ > + unsigned long int hwcap = features->hwcap; \ > + unsigned long int __attribute__((unused)) hwcap2 = features->hwcap2; \ > bool __attribute__((unused)) use_cached_memopt = \ > __GLRO(dl_powerpc_cpu_features.use_cached_memopt); \ > if (hwcap & PPC_FEATURE_ARCH_2_06) \ > diff --git a/sysdeps/powerpc/powerpc64/dl-machine.h b/sysdeps/powerpc/powerpc64/dl-machine.h > index 9b8943bc91..449208e86f 100644 > --- a/sysdeps/powerpc/powerpc64/dl-machine.h > +++ b/sysdeps/powerpc/powerpc64/dl-machine.h > @@ -27,7 +27,6 @@ > #include > #include > #include > -#include > #include > #include > #include > @@ -297,7 +296,6 @@ static inline void __attribute__ ((unused)) > dl_platform_init (void) > { > __tcb_parse_hwcap_and_convert_at_platform (); > - init_cpu_features (&GLRO(dl_powerpc_cpu_features)); > } > #endif > > diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c > index ebe9434052..139b846cef 100644 > --- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c > +++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c > @@ -17,6 +17,7 @@ > . */ > > #include > +#include > #include > #include > #include > @@ -27,9 +28,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, > size_t max) > { > size_t i = max; > - > - unsigned long int hwcap = GLRO(dl_hwcap); > - unsigned long int hwcap2 = GLRO(dl_hwcap2); > + const struct cpu_features *features = &GLRO (dl_powerpc_cpu_features); Ditto. > + unsigned long int hwcap = features->hwcap; > + unsigned long int hwcap2 = features->hwcap2; > #ifdef SHARED > int cacheline_size = GLRO(dl_cache_line_size); > #endif