From: bmahi496@linux.ibm.com
To: libc-alpha@sourceware.org
Cc: rajis@linux.ibm.com, Mahesh Bodapati
Subject: [PATCH] PowerPC: Influence hwcaps via cpu arch-level GLIBC_TUNABLES.
Date: Mon, 19 Jun 2023 03:09:56 -0500
Message-Id: <20230619080956.3187040-1-bmahi496@linux.ibm.com>

From: Mahesh Bodapati

This patch makes it possible to influence the hwcaps used by PowerPC.
Instead of single HWCAP features, the user can set a CPU arch-level
tunable such as power10 (for example GLIBC_TUNABLES=glibc.cpu.hwcaps=power9);
a leading '-' disables the given level.  The resulting hwcaps are stored
in the PowerPC-specific cpu_features struct.

The supported CPU arch-level tunables are:
- power10: power10 feature set
- power9: power9 feature set
- power8: power8 feature set
- power7: power7 feature set
- power6: power6 feature set
- power5: power5 feature set
- power4: power4 feature set
---
 manual/tunables.texi                          |   5 +-
 sysdeps/powerpc/cpu-features.c                | 164 ++++++++++++++++++
 sysdeps/powerpc/cpu-features.h                |   2 +
 sysdeps/powerpc/dl-tunables.list              |   3 +
 sysdeps/powerpc/powerpc32/dl-machine.h        |   2 +
 .../powerpc32/power4/multiarch/init-arch.h    |  10 +-
 .../powerpc64/multiarch/ifunc-impl-list.c     |   7 +-
 7 files changed, 185 insertions(+), 8 deletions(-)

diff --git a/manual/tunables.texi b/manual/tunables.texi
index 4ca0e42a11..c3a657d5d2 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -513,7 +513,10 @@ On s390x, the supported HWCAP and STFLE features can be found in
 @code{sysdeps/s390/cpu-features.c}.  In addition the user can also set
 a CPU arch-level like @code{z13} instead of single HWCAP and STFLE features.
 
-This tunable is specific to i386, x86-64 and s390x.
+On powerpc, the user can also set a CPU arch-level such as @code{power10} or
+@code{power9} instead of single HWCAP features.
+
+This tunable is specific to i386, x86-64, s390x and powerpc.
 @end deftp
 
 @deftp Tunable glibc.cpu.cached_memopt
diff --git a/sysdeps/powerpc/cpu-features.c b/sysdeps/powerpc/cpu-features.c
index 0ef3cf89d2..35c88e8ebf 100644
--- a/sysdeps/powerpc/cpu-features.c
+++ b/sysdeps/powerpc/cpu-features.c
@@ -19,14 +19,178 @@
 #include
 #include
 #include
+#include
+#include
+#define MEMCMP_DEFAULT memcmp
+
+#define POWERPC_COPY_CPU_FEATURES(SRC_PTR, DEST_PTR) \
+  (DEST_PTR)->hwcap = (SRC_PTR)->hwcap; \
+  (DEST_PTR)->hwcap2 = (SRC_PTR)->hwcap2;
+
+static void
+TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *valp)
+{
+  /* The current IFUNC selection always uses the most recent
+     features which are available via AT_HWCAP or AT_HWCAP2.  But in
+     some scenarios it is useful to adjust this selection.
+
+     The environment variable:
+
+     GLIBC_TUNABLES=glibc.cpu.hwcaps=-xxx,yyy,....
+
+     can be used to enable the HWCAP/HWCAP2 feature set yyy and disable the
+     feature set xxx, where the name is case-sensitive and has to match one
+     of those listed below.
+     power10: enable/disable power10 feature set
+     power9: enable/disable power9 feature set
+     power8: enable/disable power8 feature set
+     power7: enable/disable power7 feature set
+     power6: enable/disable power6 feature set
+     power5: enable/disable power5 feature set
+     power4: enable/disable power4 feature set.  */
+
+  /* Copy the features from dl_powerpc_cpu_features, which contains the
+     features provided by AT_HWCAP and AT_HWCAP2.  */
+  struct cpu_features *cpu_features = &GLRO(dl_powerpc_cpu_features);
+  struct cpu_features cpu_features_curr;
+  bool disable_vsx = false;
+  POWERPC_COPY_CPU_FEATURES (cpu_features, &cpu_features_curr);
+  const char *token = valp->strval;
+  do
+    {
+      const char *token_end, *feature;
+      bool disable;
+      size_t token_len;
+      size_t feature_len;
+      /* Find token separator or end of string.  */
+      for (token_end = token; *token_end != ','; token_end++)
+        if (*token_end == '\0')
+          break;
+      /* Determine feature.  */
+      token_len = token_end - token;
+      if (*token == '-')
+        {
+          disable = true;
+          feature = token + 1;
+          feature_len = token_len - 1;
+        }
+      else
+        {
+          disable = false;
+          feature = token;
+          feature_len = token_len;
+        }
+      unsigned long int hwcap_mask = 0UL, hwcap2_mask = 0UL;
+      unsigned long int hwcap_disable = 0UL, hwcap2_disable = 0UL;
+      if (*feature == 'p')
+        {
+          if (feature_len == 7 && MEMCMP_DEFAULT (feature, "power10", 7) == 0)
+            {
+              hwcap2_mask = hwcap2_mask | PPC_FEATURE2_ARCH_3_1;
+              disable_vsx = false;
+            }
+          else if (feature_len == 6
+                   && MEMCMP_DEFAULT (feature, "power9", 6) == 0)
+            {
+              hwcap2_mask = hwcap2_mask | PPC_FEATURE2_ARCH_3_00;
+              hwcap2_disable = PPC_FEATURE2_ARCH_3_1;
+              disable_vsx = false;
+            }
+          else if (feature_len == 6
+                   && MEMCMP_DEFAULT (feature, "power8", 6) == 0)
+            {
+              hwcap2_mask = hwcap2_mask | PPC_FEATURE2_ARCH_2_07;
+              hwcap2_disable = PPC_FEATURE2_ARCH_3_1 | PPC_FEATURE2_ARCH_3_00;
+              disable_vsx = false;
+            }
+          else if (feature_len == 6
+                   && MEMCMP_DEFAULT (feature, "power7", 6) == 0)
+            {
+              hwcap_mask = hwcap_mask | PPC_FEATURE_ARCH_2_06;
+              hwcap2_disable = PPC_FEATURE2_ARCH_3_1 | PPC_FEATURE2_ARCH_3_00
+                               | PPC_FEATURE2_ARCH_2_07;
+              disable_vsx = false;
+            }
+          else if (feature_len == 6
+                   && MEMCMP_DEFAULT (feature, "power6", 6) == 0)
+            {
+              hwcap_mask = hwcap_mask | PPC_FEATURE_ARCH_2_05;
+              hwcap_disable = PPC_FEATURE_ARCH_2_06;
+              hwcap2_disable = PPC_FEATURE2_ARCH_3_1 | PPC_FEATURE2_ARCH_3_00
+                               | PPC_FEATURE2_ARCH_2_07;
+              if (!disable)
+                disable_vsx = true;
+            }
+          else if (feature_len == 6
+                   && MEMCMP_DEFAULT (feature, "power5", 6) == 0)
+            {
+              hwcap_mask = hwcap_mask | PPC_FEATURE_POWER5;
+              hwcap_disable = PPC_FEATURE_ARCH_2_06 | PPC_FEATURE_ARCH_2_05;
+              hwcap2_disable = PPC_FEATURE2_ARCH_3_1 | PPC_FEATURE2_ARCH_3_00
+                               | PPC_FEATURE2_ARCH_2_07;
+              if (!disable)
+                disable_vsx = true;
+            }
+          else if (feature_len == 6
+                   && MEMCMP_DEFAULT (feature, "power4", 6) == 0)
+            {
+              hwcap_mask = hwcap_mask | PPC_FEATURE_POWER4;
+              hwcap_disable = PPC_FEATURE_ARCH_2_06 | PPC_FEATURE_ARCH_2_05
+                              | PPC_FEATURE_POWER5;
+              hwcap2_disable = PPC_FEATURE2_ARCH_3_1 | PPC_FEATURE2_ARCH_3_00
+                               | PPC_FEATURE2_ARCH_2_07;
+              if (!disable)
+                disable_vsx = true;
+            }
+        }
+      /* Perform the actions determined above.  */
+      if (hwcap_mask != 0UL)
+        {
+          /* AltiVec and VSX bits are not touched here.  */
+          if (disable)
+            {
+              cpu_features_curr.hwcap &= ~hwcap_mask;
+            }
+          else
+            cpu_features_curr.hwcap |= hwcap_mask;
+        }
+      if (!disable && hwcap_disable != 0UL)
+        cpu_features_curr.hwcap &= ~hwcap_disable;
+      if (hwcap2_mask != 0UL)
+        {
+          if (disable)
+            cpu_features_curr.hwcap2 &= ~hwcap2_mask;
+          else
+            cpu_features_curr.hwcap2 |= hwcap2_mask;
+        }
+      if (!disable && hwcap2_disable != 0UL)
+        cpu_features_curr.hwcap2 &= ~hwcap2_disable;
+
+      token += token_len;
+      /* ... and skip token separator for next round.  */
+      if (*token == ',') token++;
+    }
+  while (*token != '\0');
+  if (disable_vsx)
+    cpu_features_curr.hwcap &= ~PPC_FEATURE_HAS_VSX;
+
+  /* Copy back the supported tunable features.  */
+  cpu_features->hwcap = cpu_features_curr.hwcap;
+  cpu_features->hwcap2 = cpu_features_curr.hwcap2;
+}
 
 static inline void
 init_cpu_features (struct cpu_features *cpu_features)
 {
+  /* Fill cpu_features as passed by kernel and machine.  */
+  cpu_features->hwcap = GLRO(dl_hwcap);
+  cpu_features->hwcap2 = GLRO(dl_hwcap2);
   /* Default is to use aligned memory access on optimized function unless
      tunables is enable, since for this case user can explicit disable
      unaligned optimizations.  */
   int32_t cached_memfunc = TUNABLE_GET (glibc, cpu, cached_memopt, int32_t,
                                         NULL);
   cpu_features->use_cached_memopt = (cached_memfunc > 0);
+  TUNABLE_GET (glibc, cpu, hwcaps, tunable_val_t *,
+               TUNABLE_CALLBACK (set_hwcaps));
 }
diff --git a/sysdeps/powerpc/cpu-features.h b/sysdeps/powerpc/cpu-features.h
index d316dc3d64..928d2e7f74 100644
--- a/sysdeps/powerpc/cpu-features.h
+++ b/sysdeps/powerpc/cpu-features.h
@@ -23,6 +23,8 @@
 struct cpu_features
 {
   bool use_cached_memopt;
+  unsigned long int hwcap;
+  unsigned long int hwcap2;
 };
 
 #endif /* __CPU_FEATURES_H  */
diff --git a/sysdeps/powerpc/dl-tunables.list b/sysdeps/powerpc/dl-tunables.list
index 87d6235c75..807b7f8013 100644
--- a/sysdeps/powerpc/dl-tunables.list
+++ b/sysdeps/powerpc/dl-tunables.list
@@ -24,5 +24,8 @@ glibc {
       maxval: 1
       default: 0
     }
+    hwcaps {
+      type: STRING
+    }
   }
 }
diff --git a/sysdeps/powerpc/powerpc32/dl-machine.h b/sysdeps/powerpc/powerpc32/dl-machine.h
index a4cad7583c..43c07205e0 100644
--- a/sysdeps/powerpc/powerpc32/dl-machine.h
+++ b/sysdeps/powerpc/powerpc32/dl-machine.h
@@ -24,6 +24,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -157,6 +158,7 @@ static inline void __attribute__ ((unused))
 dl_platform_init (void)
 {
   __tcb_parse_hwcap_and_convert_at_platform ();
+  init_cpu_features (&GLRO(dl_powerpc_cpu_features));
 }
 #endif
 
diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h b/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h
index 3dd00e02ee..a0bbd12012 100644
--- a/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h
+++ b/sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h
@@ -16,6 +16,7 @@
    .  */
 
 #include
+#include
 
 /* The code checks if _rtld_global_ro was realocated before trying to access
    the dl_hwcap field.  The assembly is to make the compiler not optimize the
@@ -32,11 +33,12 @@
 # define __GLRO(value)  GLRO(value)
 #endif
 
-/* dl_hwcap contains only the latest supported ISA, the macro checks which is
-   and fills the previous ones.  */
+/* Get the hardware information after the tunables have been applied; the
+   macro checks the latest supported ISA and fills in the previous ones.  */
 #define INIT_ARCH() \
-  unsigned long int hwcap = __GLRO(dl_hwcap); \
-  unsigned long int __attribute__((unused)) hwcap2 = __GLRO(dl_hwcap2); \
+  const struct cpu_features *features = &GLRO(dl_powerpc_cpu_features); \
+  unsigned long int hwcap = features->hwcap; \
+  unsigned long int __attribute__((unused)) hwcap2 = features->hwcap2; \
   bool __attribute__((unused)) use_cached_memopt = \
     __GLRO(dl_powerpc_cpu_features.use_cached_memopt); \
   if (hwcap & PPC_FEATURE_ARCH_2_06) \
diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
index ebe9434052..fc26dd0e17 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
@@ -17,6 +17,7 @@
    .  */
 
 #include
+#include
 #include
 #include
 #include
@@ -27,9 +28,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
                         size_t max)
 {
   size_t i = max;
-
-  unsigned long int hwcap = GLRO(dl_hwcap);
-  unsigned long int hwcap2 = GLRO(dl_hwcap2);
+  const struct cpu_features *features = &GLRO(dl_powerpc_cpu_features);
+  unsigned long int hwcap = features->hwcap;
+  unsigned long int hwcap2 = features->hwcap2;
 #ifdef SHARED
   int cacheline_size = GLRO(dl_cache_line_size);
 #endif
-- 
2.39.3
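The token loop in the set_hwcaps callback splits the glibc.cpu.hwcaps value on ',' and treats a leading '-' as "disable". Below is a minimal standalone sketch of that loop that can be compiled and tested outside ld.so. It is not glibc code: `struct hwcaps_token`, `next_hwcaps_token` and `token_is` are hypothetical names, and the sketch assumes the same syntax as the patch (comma-separated, case-sensitive names, optional '-' prefix).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for one entry of glibc.cpu.hwcaps.  */
struct hwcaps_token
{
  const char *name;   /* Points into the tunable string; not NUL-terminated.  */
  size_t name_len;    /* Length of NAME without the '-' prefix.  */
  bool disable;       /* True if the entry was written as "-name".  */
};

/* Parse the token starting at *CURSOR into *TOK and advance *CURSOR past
   the following ',' (if any).  Returns false once the string is exhausted.
   This mirrors the do/while loop in set_hwcaps.  */
static bool
next_hwcaps_token (const char **cursor, struct hwcaps_token *tok)
{
  const char *token = *cursor;
  if (*token == '\0')
    return false;

  /* Find the token separator or the end of the string.  */
  const char *token_end = token;
  while (*token_end != ',' && *token_end != '\0')
    token_end++;
  size_t token_len = token_end - token;

  /* A leading '-' means "disable"; an empty token starts with ',' instead.  */
  tok->disable = (*token == '-');
  tok->name = tok->disable ? token + 1 : token;
  tok->name_len = tok->disable ? token_len - 1 : token_len;

  /* Skip the separator for the next round.  */
  *cursor = (*token_end == ',') ? token_end + 1 : token_end;
  return true;
}

/* Exact, case-sensitive comparison against an arch-level name, done the
   same way as the patch: a length check followed by memcmp.  */
static bool
token_is (const struct hwcaps_token *tok, const char *level)
{
  size_t len = strlen (level);
  return tok->name_len == len && memcmp (tok->name, level, len) == 0;
}
```

An empty entry such as the middle one in "power9,,power8" yields a zero-length token that matches no level, which is also how the patch's loop ends up skipping it.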
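The if/else chain in set_hwcaps can also be read as a table of per-level masks: enabling a level sets its own bit and clears every newer level, disabling it ("-name") clears only its own bit, and enabling POWER6 or older drops VSX unless a later POWER7-or-newer entry resets that decision. The sketch below restates those rules in table form purely to illustrate the semantics. `struct arch_level`, `struct hwcaps`, `find_level` and `apply_levels` are hypothetical names; it takes already-split entries such as "-power10" rather than the comma-separated string, and the PPC_FEATURE* bit values are assumed to match the Linux asm/cputable.h header (the logic only relies on them being distinct).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed bit values (Linux asm/cputable.h); only distinctness matters.  */
#define PPC_FEATURE_HAS_VSX     0x00000080UL
#define PPC_FEATURE_ARCH_2_06   0x00000100UL
#define PPC_FEATURE_ARCH_2_05   0x00001000UL
#define PPC_FEATURE_POWER5      0x00040000UL
#define PPC_FEATURE_POWER4      0x00080000UL
#define PPC_FEATURE2_ARCH_3_1   0x00040000UL
#define PPC_FEATURE2_ARCH_3_00  0x00800000UL
#define PPC_FEATURE2_ARCH_2_07  0x80000000UL

/* All AT_HWCAP2 ISA levels; cleared when POWER7 or older is enabled.  */
#define HWCAP2_LEVELS \
  (PPC_FEATURE2_ARCH_3_1 | PPC_FEATURE2_ARCH_3_00 | PPC_FEATURE2_ARCH_2_07)

/* Hypothetical table form of the if/else chain in set_hwcaps.  */
struct arch_level
{
  const char *name;
  unsigned long int hwcap_mask, hwcap2_mask;       /* Set on enable, cleared on '-'.  */
  unsigned long int hwcap_disable, hwcap2_disable; /* Newer levels, cleared on enable.  */
  bool no_vsx;                                     /* Enabling this level drops VSX.  */
};

static const struct arch_level arch_levels[] =
{
  { "power10", 0, PPC_FEATURE2_ARCH_3_1, 0, 0, false },
  { "power9", 0, PPC_FEATURE2_ARCH_3_00, 0, PPC_FEATURE2_ARCH_3_1, false },
  { "power8", 0, PPC_FEATURE2_ARCH_2_07, 0,
    PPC_FEATURE2_ARCH_3_1 | PPC_FEATURE2_ARCH_3_00, false },
  { "power7", PPC_FEATURE_ARCH_2_06, 0, 0, HWCAP2_LEVELS, false },
  { "power6", PPC_FEATURE_ARCH_2_05, 0, PPC_FEATURE_ARCH_2_06,
    HWCAP2_LEVELS, true },
  { "power5", PPC_FEATURE_POWER5, 0,
    PPC_FEATURE_ARCH_2_06 | PPC_FEATURE_ARCH_2_05, HWCAP2_LEVELS, true },
  { "power4", PPC_FEATURE_POWER4, 0,
    PPC_FEATURE_ARCH_2_06 | PPC_FEATURE_ARCH_2_05 | PPC_FEATURE_POWER5,
    HWCAP2_LEVELS, true },
};

struct hwcaps
{
  unsigned long int hwcap;
  unsigned long int hwcap2;
};

static const struct arch_level *
find_level (const char *name)
{
  for (size_t i = 0; i < sizeof arch_levels / sizeof arch_levels[0]; i++)
    if (strcmp (arch_levels[i].name, name) == 0)
      return &arch_levels[i];
  return NULL;  /* Unknown names are ignored, as in the patch.  */
}

/* Apply entries such as "power9" or "-power10" in order, then clear VSX if
   a POWER6-or-older level was enabled after the last POWER7+ entry.  */
static void
apply_levels (struct hwcaps *caps, const char *const *entries, size_t n)
{
  bool disable_vsx = false;
  for (size_t i = 0; i < n; i++)
    {
      const char *name = entries[i];
      bool disable = (name[0] == '-');
      const struct arch_level *lvl = find_level (disable ? name + 1 : name);
      if (lvl == NULL)
        continue;
      if (disable)
        {
          caps->hwcap &= ~lvl->hwcap_mask;
          caps->hwcap2 &= ~lvl->hwcap2_mask;
        }
      else
        {
          caps->hwcap = (caps->hwcap | lvl->hwcap_mask) & ~lvl->hwcap_disable;
          caps->hwcap2 = (caps->hwcap2 | lvl->hwcap2_mask) & ~lvl->hwcap2_disable;
        }
      if (!lvl->no_vsx)
        disable_vsx = false;   /* POWER7+ (enabled or disabled) resets it.  */
      else if (!disable)
        disable_vsx = true;    /* Enabling POWER6 or older requests no VSX.  */
    }
  if (disable_vsx)
    caps->hwcap &= ~PPC_FEATURE_HAS_VSX;
}
```

With the levels in a table, auditing which bits each level touches is a matter of reading one row, while the enable/disable behaviour stays the same as the chain in the patch.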
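INIT_ARCH and __libc_ifunc_impl_list now read features->hwcap and features->hwcap2 instead of dl_hwcap and dl_hwcap2, so the tunable-adjusted bits are what IFUNC selection sees. The sketch below models that consumer side under stated assumptions: `fill_previous_levels` is modelled on the "fills the previous ones" cascade described in the INIT_ARCH comment (the full macro body is not shown in this patch), `effective_level` is a hypothetical helper rather than a glibc function, and the bit values are assumed from the Linux asm/cputable.h header.

```c
#include <string.h>

/* Assumed bit values (Linux asm/cputable.h); only distinctness matters.  */
#define PPC_FEATURE_ARCH_2_06   0x00000100UL
#define PPC_FEATURE_ARCH_2_05   0x00001000UL
#define PPC_FEATURE_POWER5_PLUS 0x00020000UL
#define PPC_FEATURE_POWER5      0x00040000UL
#define PPC_FEATURE_POWER4      0x00080000UL
#define PPC_FEATURE2_ARCH_3_1   0x00040000UL
#define PPC_FEATURE2_ARCH_3_00  0x00800000UL
#define PPC_FEATURE2_ARCH_2_07  0x80000000UL

/* The kernel may report only the newest ISA bit; imply the older ones so
   that a resolver testing e.g. PPC_FEATURE_POWER4 still matches.  */
static unsigned long int
fill_previous_levels (unsigned long int hwcap)
{
  if (hwcap & PPC_FEATURE_ARCH_2_06)
    hwcap |= PPC_FEATURE_ARCH_2_05 | PPC_FEATURE_POWER5_PLUS
             | PPC_FEATURE_POWER5 | PPC_FEATURE_POWER4;
  else if (hwcap & PPC_FEATURE_ARCH_2_05)
    hwcap |= PPC_FEATURE_POWER5_PLUS | PPC_FEATURE_POWER5 | PPC_FEATURE_POWER4;
  else if (hwcap & PPC_FEATURE_POWER5_PLUS)
    hwcap |= PPC_FEATURE_POWER5 | PPC_FEATURE_POWER4;
  else if (hwcap & PPC_FEATURE_POWER5)
    hwcap |= PPC_FEATURE_POWER4;
  return hwcap;
}

/* Hypothetical: name the newest level an IFUNC resolver would pick from
   the (possibly tunable-adjusted) hwcap/hwcap2 pair.  */
static const char *
effective_level (unsigned long int hwcap, unsigned long int hwcap2)
{
  hwcap = fill_previous_levels (hwcap);
  if (hwcap2 & PPC_FEATURE2_ARCH_3_1)
    return "power10";
  if (hwcap2 & PPC_FEATURE2_ARCH_3_00)
    return "power9";
  if (hwcap2 & PPC_FEATURE2_ARCH_2_07)
    return "power8";
  if (hwcap & PPC_FEATURE_ARCH_2_06)
    return "power7";
  if (hwcap & PPC_FEATURE_ARCH_2_05)
    return "power6";
  if (hwcap & PPC_FEATURE_POWER5)
    return "power5";
  if (hwcap & PPC_FEATURE_POWER4)
    return "power4";
  return "generic";
}
```

For instance, after glibc.cpu.hwcaps=power9 on a POWER10 machine, ARCH_3_1 is cleared while ARCH_3_00 stays set, so a selector of this shape would settle on the power9 variants.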