public inbox for libc-alpha@sourceware.org
* [PATCH 0/3] ld.so: Add --list-tunables to print tunable values
@ 2020-09-12 13:44 H.J. Lu
  2020-09-12 13:44 ` [PATCH 1/3] x86: Initialize CPU info via IFUNC relocation [BZ 26203] H.J. Lu
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: H.J. Lu @ 2020-09-12 13:44 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer, Carlos O'Donell

Tunable values and their minimum/maximum values are invisible to users.
This patch set adds --list-tunables to ld.so to print tunable values
with their minimum and maximum values.  For those tunables whose values
and minimum/maximum values are determined at run-time, TUNABLE_SET_ALL
and TUNABLE_SET_ALL_FULL are added to update a tunable value together
with its minimum and maximum values.
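
For example (adapted from the README.tunables update in patch 2, with
TUNABLE_NAMESPACE defined in the calling module; 'val', 'min' and 'max'
stand for values computed at run-time):

  /* Set the tunable value together with its run-time minimum and
     maximum in a single call.  */
  TUNABLE_SET_ALL (check, int32_t, val, min, max);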

--list-tunables works on i686 and x86-64.  Please test --list-tunables
on your native processors.  The users/hjl/tunable/master branch at:

https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/tunable/master

contains the same set of patches.

On x86, to make cache info accessible to --list-tunables, it is moved
to cpu_features in ld.so and initialized via dummy function pointers
bound by IFUNC relocation.  CPU features are initialized by
DL_PLATFORM_INIT in dynamic executables and by ARCH_INIT_CPU_FEATURES
in static executables.  To initialize CPU features when ld.so is loaded
inside a static executable, where DL_PLATFORM_INIT isn't called, CPU
features are likewise initialized via dummy function pointers bound by
IFUNC relocation.
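
The idiom, quoted from the cacheinfo.c hunk in patch 1, is:

  /* NB: Call init_cacheinfo by initializing a dummy function pointer
     via IFUNC relocation; the resolver runs during ld.so's own
     relocation, before any user of the cache info.  */
  extern void __x86_cacheinfo (void) attribute_hidden;
  const void (*__x86_cacheinfo_p) (void) attribute_hidden
    = __x86_cacheinfo;

  __ifunc (__x86_cacheinfo, __x86_cacheinfo, NULL, void, init_cacheinfo);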

$ ./elf/ld.so --list-tunables
glibc.rtld.nns: 0x4 (min: 0x1, max: 0x10)
glibc.elision.skip_lock_after_retries: 3 (min: -2147483648, max: 2147483647)
glibc.malloc.trim_threshold: 0x0 (min: 0x0, max: 0xffffffff)
glibc.malloc.perturb: 0 (min: 0, max: 255)
glibc.cpu.x86_shared_cache_size: 0x0 (min: 0x0, max: 0xffffffff)
glibc.elision.tries: 3 (min: -2147483648, max: 2147483647)
glibc.elision.enable: 0 (min: 0, max: 1)
glibc.cpu.x86_rep_movsb_threshold: 0x800 (min: 0x100, max: 0xffffffff)
glibc.malloc.mxfast: 0x0 (min: 0x0, max: 0xffffffff)
glibc.elision.skip_lock_busy: 3 (min: -2147483648, max: 2147483647)
glibc.malloc.top_pad: 0x0 (min: 0x0, max: 0xffffffff)
glibc.cpu.x86_rep_stosb_threshold: 0x800 (min: 0x1, max: 0xffffffff)
glibc.cpu.x86_non_temporal_threshold: 0x0 (min: 0x0, max: 0xffffffff)
glibc.cpu.x86_shstk: 
glibc.cpu.hwcap_mask: 0x1 (min: 0x0, max: 0xffffffff)
glibc.malloc.mmap_max: 0 (min: -2147483648, max: 2147483647)
glibc.elision.skip_trylock_internal_abort: 3 (min: -2147483648, max: 2147483647)
glibc.malloc.tcache_unsorted_limit: 0x0 (min: 0x0, max: 0xffffffff)
glibc.cpu.x86_ibt: 
glibc.cpu.hwcaps: 
glibc.elision.skip_lock_internal_abort: 3 (min: -2147483648, max: 2147483647)
glibc.malloc.arena_max: 0x0 (min: 0x1, max: 0xffffffff)
glibc.malloc.mmap_threshold: 0x0 (min: 0x0, max: 0xffffffff)
glibc.cpu.x86_data_cache_size: 0x0 (min: 0x0, max: 0xffffffff)
glibc.malloc.tcache_count: 0x0 (min: 0x0, max: 0xffffffff)
glibc.malloc.arena_test: 0x0 (min: 0x1, max: 0xffffffff)
glibc.pthread.mutex_spin_count: 100 (min: 0, max: 32767)
glibc.rtld.optional_static_tls: 0x200 (min: 0x0, max: 0xffffffff)
glibc.malloc.tcache_max: 0x0 (min: 0x0, max: 0xffffffff)
glibc.malloc.check: 0 (min: 0, max: 3)

H.J. Lu (3):
  x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  Set tunable value as well as min/max values
  ld.so: Add --list-tunables to print tunable values

 NEWS                               |   2 +
 elf/Makefile                       |   6 +-
 elf/dl-tunables.c                  |  53 +-
 elf/dl-tunables.h                  |  20 +-
 elf/rtld.c                         |  37 +-
 manual/README.tunables             |  24 +-
 manual/tunables.texi               |  37 ++
 sysdeps/i386/dl-machine.h          |   3 +-
 sysdeps/x86/cacheinfo.c            | 873 ++-------------------------
 sysdeps/x86/cpu-cacheinfo.c        | 922 +++++++++++++++++++++++++++++
 sysdeps/x86/cpu-features.c         |  25 +-
 sysdeps/x86/dl-get-cpu-features.c  |  25 +-
 sysdeps/x86/include/cpu-features.h |  23 +
 sysdeps/x86_64/dl-machine.h        |   3 +-
 14 files changed, 1201 insertions(+), 852 deletions(-)
 create mode 100644 sysdeps/x86/cpu-cacheinfo.c

-- 
2.26.2



* [PATCH 1/3] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-12 13:44 [PATCH 0/3] ld.so: Add --list-tunables to print tunable values H.J. Lu
@ 2020-09-12 13:44 ` H.J. Lu
  2020-09-18  8:06   ` Florian Weimer
  2020-09-12 13:44 ` [PATCH 2/3] Set tunable value as well as min/max values H.J. Lu
  2020-09-12 13:44 ` [PATCH 3/3] ld.so: Add --list-tunables to print tunable values H.J. Lu
  2 siblings, 1 reply; 18+ messages in thread
From: H.J. Lu @ 2020-09-12 13:44 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer, Carlos O'Donell

X86 CPU features in ld.so are initialized by init_cpu_features, which is
invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
loaded by a static executable, DL_PLATFORM_INIT is never called.  Also,
x86 cache info in libc.o and libc.a is initialized by a constructor,
which may run too late.  Instead, initialize x86 CPU features and cache
info by binding dummy function pointers via IFUNC relocation.

Note: _dl_x86_init_cpu_features can be called more than once from
DL_PLATFORM_INIT and during relocation in ld.so.
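
The guard that makes repeated calls harmless, from the
dl-get-cpu-features.c hunk below:

  void
  _dl_x86_init_cpu_features (void)
  {
    struct cpu_features *cpu_features = __get_cpu_features ();
    /* Only the first call initializes; later calls see a known kind
       and return.  */
    if (cpu_features->basic.kind == arch_kind_unknown)
      init_cpu_features (cpu_features);
  }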
---
 sysdeps/i386/dl-machine.h          |  3 +--
 sysdeps/x86/cacheinfo.c            | 10 +++++++++-
 sysdeps/x86/dl-get-cpu-features.c  | 25 ++++++++++++++++++++++++-
 sysdeps/x86/include/cpu-features.h |  1 +
 sysdeps/x86_64/dl-machine.h        |  3 +--
 5 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/sysdeps/i386/dl-machine.h b/sysdeps/i386/dl-machine.h
index 0f08079e48..5e22e795cc 100644
--- a/sysdeps/i386/dl-machine.h
+++ b/sysdeps/i386/dl-machine.h
@@ -25,7 +25,6 @@
 #include <sysdep.h>
 #include <tls.h>
 #include <dl-tlsdesc.h>
-#include <cpu-features.c>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -250,7 +249,7 @@ dl_platform_init (void)
 #if IS_IN (rtld)
   /* init_cpu_features has been called early from __libc_start_main in
      static executable.  */
-  init_cpu_features (&GLRO(dl_x86_cpu_features));
+  _dl_x86_init_cpu_features ();
 #else
   if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
     /* Avoid an empty string which would disturb us.  */
diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index 217c21c34f..7a325ab70e 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -756,7 +756,6 @@ intel_bug_no_cache_info:
 
 
 static void
-__attribute__((constructor))
 init_cacheinfo (void)
 {
   /* Find out what brand of processor.  */
@@ -770,6 +769,8 @@ init_cacheinfo (void)
   unsigned int threads = 0;
   const struct cpu_features *cpu_features = __get_cpu_features ();
 
+  assert (cpu_features->basic.kind != arch_kind_unknown);
+
   if (cpu_features->basic.kind == arch_kind_intel)
     {
       data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
@@ -894,4 +895,11 @@ init_cacheinfo (void)
 # endif
 }
 
+/* NB: Call init_cacheinfo by initializing a dummy function pointer via
+   IFUNC relocation.  */
+extern void __x86_cacheinfo (void) attribute_hidden;
+const void (*__x86_cacheinfo_p) (void) attribute_hidden
+  = __x86_cacheinfo;
+
+__ifunc (__x86_cacheinfo, __x86_cacheinfo, NULL, void, init_cacheinfo);
 #endif
diff --git a/sysdeps/x86/dl-get-cpu-features.c b/sysdeps/x86/dl-get-cpu-features.c
index 5f9e46b0c6..da4697b895 100644
--- a/sysdeps/x86/dl-get-cpu-features.c
+++ b/sysdeps/x86/dl-get-cpu-features.c
@@ -1,4 +1,4 @@
-/* This file is part of the GNU C Library.
+/* Initialize CPU feature data via IFUNC relocation.
    Copyright (C) 2015-2020 Free Software Foundation, Inc.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -18,6 +18,29 @@
 
 #include <ldsodefs.h>
 
+#ifdef SHARED
+# include <cpu-features.c>
+
+/* NB: Normally, DL_PLATFORM_INIT calls init_cpu_features to initialize
+   CPU features.  But when loading ld.so inside of static executable,
+   DL_PLATFORM_INIT isn't called.  Call init_cpu_features by initializing
+   a dummy function pointer via IFUNC relocation for ld.so.  */
+extern void __x86_cpu_features (void) attribute_hidden;
+const void (*__x86_cpu_features_p) (void) attribute_hidden
+  = __x86_cpu_features;
+
+void
+_dl_x86_init_cpu_features (void)
+{
+  struct cpu_features *cpu_features = __get_cpu_features ();
+  if (cpu_features->basic.kind == arch_kind_unknown)
+    init_cpu_features (cpu_features);
+}
+
+__ifunc (__x86_cpu_features, __x86_cpu_features, NULL, void,
+	 _dl_x86_init_cpu_features);
+#endif
+
 #undef __x86_get_cpu_features
 
 const struct cpu_features *
diff --git a/sysdeps/x86/include/cpu-features.h b/sysdeps/x86/include/cpu-features.h
index dcf29b6fe8..f62be0b9b3 100644
--- a/sysdeps/x86/include/cpu-features.h
+++ b/sysdeps/x86/include/cpu-features.h
@@ -159,6 +159,7 @@ struct cpu_features
 /* Unused for x86.  */
 #  define INIT_ARCH()
 #  define __x86_get_cpu_features(max) (&GLRO(dl_x86_cpu_features))
+extern void _dl_x86_init_cpu_features (void) attribute_hidden;
 # endif
 
 # ifdef __x86_64__
diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
index ca73d8fef9..773e94c8bb 100644
--- a/sysdeps/x86_64/dl-machine.h
+++ b/sysdeps/x86_64/dl-machine.h
@@ -26,7 +26,6 @@
 #include <sysdep.h>
 #include <tls.h>
 #include <dl-tlsdesc.h>
-#include <cpu-features.c>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -225,7 +224,7 @@ dl_platform_init (void)
 #if IS_IN (rtld)
   /* init_cpu_features has been called early from __libc_start_main in
      static executable.  */
-  init_cpu_features (&GLRO(dl_x86_cpu_features));
+  _dl_x86_init_cpu_features ();
 #else
   if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
     /* Avoid an empty string which would disturb us.  */
-- 
2.26.2



* [PATCH 2/3] Set tunable value as well as min/max values
  2020-09-12 13:44 [PATCH 0/3] ld.so: Add --list-tunables to print tunable values H.J. Lu
  2020-09-12 13:44 ` [PATCH 1/3] x86: Initialize CPU info via IFUNC relocation [BZ 26203] H.J. Lu
@ 2020-09-12 13:44 ` H.J. Lu
  2020-09-18  8:29   ` Florian Weimer
  2020-09-12 13:44 ` [PATCH 3/3] ld.so: Add --list-tunables to print tunable values H.J. Lu
  2 siblings, 1 reply; 18+ messages in thread
From: H.J. Lu @ 2020-09-12 13:44 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer, Carlos O'Donell

Some tunable values and their minimum/maximum values must be determined
at run-time.  Add TUNABLE_SET_ALL and TUNABLE_SET_ALL_FULL to update a
tunable value together with its minimum and maximum values.
__tunable_set_val is updated to set the tunable value as well as the
min/max values.  Move x86 processor cache info to cpu_features so that
TUNABLE_SET_ALL can be used and CPUID outputs for the same inputs are
cached.
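
For callers outside the tunable's namespace, the full forms take the
namespace explicitly (adapted from the README.tunables update below;
'val', 'min' and 'max' are illustrative run-time values):

  val = TUNABLE_GET_FULL (glibc, cpu, hwcap_mask, uint64_t, NULL);
  TUNABLE_SET_ALL_FULL (glibc, cpu, hwcap_mask, uint64_t, val, min, max);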
---
 elf/dl-tunables.c                  |  17 +-
 elf/dl-tunables.h                  |  18 +-
 manual/README.tunables             |  24 +-
 sysdeps/x86/cacheinfo.c            | 867 ++-------------------------
 sysdeps/x86/cpu-cacheinfo.c        | 922 +++++++++++++++++++++++++++++
 sysdeps/x86/cpu-features.c         |  25 +-
 sysdeps/x86/include/cpu-features.h |  22 +
 7 files changed, 1050 insertions(+), 845 deletions(-)
 create mode 100644 sysdeps/x86/cpu-cacheinfo.c

diff --git a/elf/dl-tunables.c b/elf/dl-tunables.c
index 26e6e26612..b44174fe71 100644
--- a/elf/dl-tunables.c
+++ b/elf/dl-tunables.c
@@ -101,12 +101,19 @@ get_next_env (char **envp, char **name, size_t *namelen, char **val,
 })
 
 static void
-do_tunable_update_val (tunable_t *cur, const void *valp)
+do_tunable_update_val (tunable_t *cur, const void *valp,
+		       const void *minp, const void *maxp)
 {
   uint64_t val;
 
   if (cur->type.type_code != TUNABLE_TYPE_STRING)
-    val = *((int64_t *) valp);
+    {
+      val = *((int64_t *) valp);
+      if (minp)
+	cur->type.min = *((int64_t *) minp);
+      if (maxp)
+	cur->type.max = *((int64_t *) maxp);
+    }
 
   switch (cur->type.type_code)
     {
@@ -153,15 +160,15 @@ tunable_initialize (tunable_t *cur, const char *strval)
       cur->initialized = true;
       valp = strval;
     }
-  do_tunable_update_val (cur, valp);
+  do_tunable_update_val (cur, valp, NULL, NULL);
 }
 
 void
-__tunable_set_val (tunable_id_t id, void *valp)
+__tunable_set_val (tunable_id_t id, void *valp, void *minp, void *maxp)
 {
   tunable_t *cur = &tunable_list[id];
 
-  do_tunable_update_val (cur, valp);
+  do_tunable_update_val (cur, valp, minp, maxp);
 }
 
 #if TUNABLES_FRONTEND == TUNABLES_FRONTEND_valstring
diff --git a/elf/dl-tunables.h b/elf/dl-tunables.h
index f05eb50c2f..b1add3184f 100644
--- a/elf/dl-tunables.h
+++ b/elf/dl-tunables.h
@@ -70,9 +70,10 @@ typedef struct _tunable tunable_t;
 
 extern void __tunables_init (char **);
 extern void __tunable_get_val (tunable_id_t, void *, tunable_callback_t);
-extern void __tunable_set_val (tunable_id_t, void *);
+extern void __tunable_set_val (tunable_id_t, void *, void *, void *);
 rtld_hidden_proto (__tunables_init)
 rtld_hidden_proto (__tunable_get_val)
+rtld_hidden_proto (__tunable_set_val)
 
 /* Define TUNABLE_GET and TUNABLE_SET in short form if TOP_NAMESPACE and
    TUNABLE_NAMESPACE are defined.  This is useful shorthand to get and set
@@ -82,11 +83,16 @@ rtld_hidden_proto (__tunable_get_val)
   TUNABLE_GET_FULL (TOP_NAMESPACE, TUNABLE_NAMESPACE, __id, __type, __cb)
 # define TUNABLE_SET(__id, __type, __val) \
   TUNABLE_SET_FULL (TOP_NAMESPACE, TUNABLE_NAMESPACE, __id, __type, __val)
+# define TUNABLE_SET_ALL(__id, __type, __val, __min, __max) \
+  TUNABLE_SET_ALL_FULL (TOP_NAMESPACE, TUNABLE_NAMESPACE, __id, __type, \
+			__val, __min, __max)
 #else
 # define TUNABLE_GET(__top, __ns, __id, __type, __cb) \
   TUNABLE_GET_FULL (__top, __ns, __id, __type, __cb)
 # define TUNABLE_SET(__top, __ns, __id, __type, __val) \
   TUNABLE_SET_FULL (__top, __ns, __id, __type, __val)
+# define TUNABLE_SET_ALL(__top, __ns, __id, __type, __val, __min, __max) \
+  TUNABLE_SET_ALL_FULL (__top, __ns, __id, __type, __val, __min, __max)
 #endif
 
 /* Get and return a tunable value.  If the tunable was set externally and __CB
@@ -103,7 +109,15 @@ rtld_hidden_proto (__tunable_get_val)
 # define TUNABLE_SET_FULL(__top, __ns, __id, __type, __val) \
 ({									      \
   __tunable_set_val (TUNABLE_ENUM_NAME (__top, __ns, __id),		      \
-			& (__type) {__val});				      \
+		     & (__type) {__val}, NULL, NULL);			      \
+})
+
+/* Set a tunable value together with min/max values.  */
+# define TUNABLE_SET_ALL_FULL(__top, __ns, __id, __type, __val, __min, __max) \
+({									      \
+  __tunable_set_val (TUNABLE_ENUM_NAME (__top, __ns, __id),		      \
+		     & (__type) {__val},  & (__type) {__min},		      \
+		     & (__type) {__max});				      \
 })
 
 /* Namespace sanity for callback functions.  Use this macro to keep the
diff --git a/manual/README.tunables b/manual/README.tunables
index f87a31a65e..db6f6bae8d 100644
--- a/manual/README.tunables
+++ b/manual/README.tunables
@@ -67,7 +67,7 @@ The list of allowed attributes are:
 				     non-AT_SECURE subprocesses.
 			NONE: Read all the time.
 
-2. Use TUNABLE_GET/TUNABLE_SET to get and set tunables.
+2. Use TUNABLE_GET/TUNABLE_SET/TUNABLE_SET_ALL to get and set tunables.
 
 3. OPTIONAL: If tunables in a namespace are being used multiple times within a
    specific module, set the TUNABLE_NAMESPACE macro to reduce the amount of
@@ -112,9 +112,29 @@ form of the macros as follows:
 where 'glibc' is the top namespace, 'cpu' is the tunable namespace and the
 remaining arguments are the same as the short form macros.
 
+The minimum and maximum values can be updated together with the tunable value
+using:
+
+  TUNABLE_SET_ALL (check, int32_t, val, min, max)
+
+where 'check' is the tunable name, 'int32_t' is the C type of the tunable,
+'val' is a value of the same type, and 'min' and 'max' are the minimum
+and maximum values of the tunable.
+
+To set the minimum and maximum values of tunables in a different namespace
+from that module, use the full form of the macros as follows:
+
+  val = TUNABLE_GET_FULL (glibc, cpu, hwcap_mask, uint64_t, NULL)
+
+  TUNABLE_SET_ALL_FULL (glibc, cpu, hwcap_mask, uint64_t, val, min, max)
+
+where 'glibc' is the top namespace, 'cpu' is the tunable namespace and the
+remaining arguments are the same as the short form macros.
+
 When TUNABLE_NAMESPACE is not defined in a module, TUNABLE_GET is equivalent to
 TUNABLE_GET_FULL, so you will need to provide full namespace information for
-both macros.  Likewise for TUNABLE_SET and TUNABLE_SET_FULL.
+both macros.  Likewise for TUNABLE_SET, TUNABLE_SET_FULL, TUNABLE_SET_ALL
+and TUNABLE_SET_ALL_FULL.
 
 ** IMPORTANT NOTE **
 
diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index 7a325ab70e..da17ff76f4 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -18,498 +18,9 @@
 
 #if IS_IN (libc)
 
-#include <assert.h>
-#include <stdbool.h>
-#include <stdlib.h>
 #include <unistd.h>
-#include <cpuid.h>
 #include <init-arch.h>
 
-static const struct intel_02_cache_info
-{
-  unsigned char idx;
-  unsigned char assoc;
-  unsigned char linesize;
-  unsigned char rel_name;
-  unsigned int size;
-} intel_02_known [] =
-  {
-#define M(sc) ((sc) - _SC_LEVEL1_ICACHE_SIZE)
-    { 0x06,  4, 32, M(_SC_LEVEL1_ICACHE_SIZE),    8192 },
-    { 0x08,  4, 32, M(_SC_LEVEL1_ICACHE_SIZE),   16384 },
-    { 0x09,  4, 32, M(_SC_LEVEL1_ICACHE_SIZE),   32768 },
-    { 0x0a,  2, 32, M(_SC_LEVEL1_DCACHE_SIZE),    8192 },
-    { 0x0c,  4, 32, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
-    { 0x0d,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
-    { 0x0e,  6, 64, M(_SC_LEVEL1_DCACHE_SIZE),   24576 },
-    { 0x21,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
-    { 0x22,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),   524288 },
-    { 0x23,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  1048576 },
-    { 0x25,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
-    { 0x29,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
-    { 0x2c,  8, 64, M(_SC_LEVEL1_DCACHE_SIZE),   32768 },
-    { 0x30,  8, 64, M(_SC_LEVEL1_ICACHE_SIZE),   32768 },
-    { 0x39,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
-    { 0x3a,  6, 64, M(_SC_LEVEL2_CACHE_SIZE),   196608 },
-    { 0x3b,  2, 64, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
-    { 0x3c,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
-    { 0x3d,  6, 64, M(_SC_LEVEL2_CACHE_SIZE),   393216 },
-    { 0x3e,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x3f,  2, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
-    { 0x41,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
-    { 0x42,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
-    { 0x43,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x44,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
-    { 0x45,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),  2097152 },
-    { 0x46,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
-    { 0x47,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
-    { 0x48, 12, 64, M(_SC_LEVEL2_CACHE_SIZE),  3145728 },
-    { 0x49, 16, 64, M(_SC_LEVEL2_CACHE_SIZE),  4194304 },
-    { 0x4a, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  6291456 },
-    { 0x4b, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
-    { 0x4c, 12, 64, M(_SC_LEVEL3_CACHE_SIZE), 12582912 },
-    { 0x4d, 16, 64, M(_SC_LEVEL3_CACHE_SIZE), 16777216 },
-    { 0x4e, 24, 64, M(_SC_LEVEL2_CACHE_SIZE),  6291456 },
-    { 0x60,  8, 64, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
-    { 0x66,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),    8192 },
-    { 0x67,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
-    { 0x68,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),   32768 },
-    { 0x78,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
-    { 0x79,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
-    { 0x7a,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
-    { 0x7b,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x7c,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
-    { 0x7d,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  2097152 },
-    { 0x7f,  2, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x80,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x82,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
-    { 0x83,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x84,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
-    { 0x85,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),  2097152 },
-    { 0x86,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x87,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
-    { 0xd0,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),   524288 },
-    { 0xd1,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),  1048576 },
-    { 0xd2,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
-    { 0xd6,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  1048576 },
-    { 0xd7,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
-    { 0xd8,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
-    { 0xdc, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
-    { 0xdd, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
-    { 0xde, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
-    { 0xe2, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
-    { 0xe3, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
-    { 0xe4, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
-    { 0xea, 24, 64, M(_SC_LEVEL3_CACHE_SIZE), 12582912 },
-    { 0xeb, 24, 64, M(_SC_LEVEL3_CACHE_SIZE), 18874368 },
-    { 0xec, 24, 64, M(_SC_LEVEL3_CACHE_SIZE), 25165824 },
-  };
-
-#define nintel_02_known (sizeof (intel_02_known) / sizeof (intel_02_known [0]))
-
-static int
-intel_02_known_compare (const void *p1, const void *p2)
-{
-  const struct intel_02_cache_info *i1;
-  const struct intel_02_cache_info *i2;
-
-  i1 = (const struct intel_02_cache_info *) p1;
-  i2 = (const struct intel_02_cache_info *) p2;
-
-  if (i1->idx == i2->idx)
-    return 0;
-
-  return i1->idx < i2->idx ? -1 : 1;
-}
-
-
-static long int
-__attribute__ ((noinline))
-intel_check_word (int name, unsigned int value, bool *has_level_2,
-		  bool *no_level_2_or_3,
-		  const struct cpu_features *cpu_features)
-{
-  if ((value & 0x80000000) != 0)
-    /* The register value is reserved.  */
-    return 0;
-
-  /* Fold the name.  The _SC_ constants are always in the order SIZE,
-     ASSOC, LINESIZE.  */
-  int folded_rel_name = (M(name) / 3) * 3;
-
-  while (value != 0)
-    {
-      unsigned int byte = value & 0xff;
-
-      if (byte == 0x40)
-	{
-	  *no_level_2_or_3 = true;
-
-	  if (folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE))
-	    /* No need to look further.  */
-	    break;
-	}
-      else if (byte == 0xff)
-	{
-	  /* CPUID leaf 0x4 contains all the information.  We need to
-	     iterate over it.  */
-	  unsigned int eax;
-	  unsigned int ebx;
-	  unsigned int ecx;
-	  unsigned int edx;
-
-	  unsigned int round = 0;
-	  while (1)
-	    {
-	      __cpuid_count (4, round, eax, ebx, ecx, edx);
-
-	      enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
-	      if (type == null)
-		/* That was the end.  */
-		break;
-
-	      unsigned int level = (eax >> 5) & 0x7;
-
-	      if ((level == 1 && type == data
-		   && folded_rel_name == M(_SC_LEVEL1_DCACHE_SIZE))
-		  || (level == 1 && type == inst
-		      && folded_rel_name == M(_SC_LEVEL1_ICACHE_SIZE))
-		  || (level == 2 && folded_rel_name == M(_SC_LEVEL2_CACHE_SIZE))
-		  || (level == 3 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE))
-		  || (level == 4 && folded_rel_name == M(_SC_LEVEL4_CACHE_SIZE)))
-		{
-		  unsigned int offset = M(name) - folded_rel_name;
-
-		  if (offset == 0)
-		    /* Cache size.  */
-		    return (((ebx >> 22) + 1)
-			    * (((ebx >> 12) & 0x3ff) + 1)
-			    * ((ebx & 0xfff) + 1)
-			    * (ecx + 1));
-		  if (offset == 1)
-		    return (ebx >> 22) + 1;
-
-		  assert (offset == 2);
-		  return (ebx & 0xfff) + 1;
-		}
-
-	      ++round;
-	    }
-	  /* There is no other cache information anywhere else.  */
-	  break;
-	}
-      else
-	{
-	  if (byte == 0x49 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE))
-	    {
-	      /* Intel reused this value.  For family 15, model 6 it
-		 specifies the 3rd level cache.  Otherwise the 2nd
-		 level cache.  */
-	      unsigned int family = cpu_features->basic.family;
-	      unsigned int model = cpu_features->basic.model;
-
-	      if (family == 15 && model == 6)
-		{
-		  /* The level 3 cache is encoded for this model like
-		     the level 2 cache is for other models.  Pretend
-		     the caller asked for the level 2 cache.  */
-		  name = (_SC_LEVEL2_CACHE_SIZE
-			  + (name - _SC_LEVEL3_CACHE_SIZE));
-		  folded_rel_name = M(_SC_LEVEL2_CACHE_SIZE);
-		}
-	    }
-
-	  struct intel_02_cache_info *found;
-	  struct intel_02_cache_info search;
-
-	  search.idx = byte;
-	  found = bsearch (&search, intel_02_known, nintel_02_known,
-			   sizeof (intel_02_known[0]), intel_02_known_compare);
-	  if (found != NULL)
-	    {
-	      if (found->rel_name == folded_rel_name)
-		{
-		  unsigned int offset = M(name) - folded_rel_name;
-
-		  if (offset == 0)
-		    /* Cache size.  */
-		    return found->size;
-		  if (offset == 1)
-		    return found->assoc;
-
-		  assert (offset == 2);
-		  return found->linesize;
-		}
-
-	      if (found->rel_name == M(_SC_LEVEL2_CACHE_SIZE))
-		*has_level_2 = true;
-	    }
-	}
-
-      /* Next byte for the next round.  */
-      value >>= 8;
-    }
-
-  /* Nothing found.  */
-  return 0;
-}
-
-
-static long int __attribute__ ((noinline))
-handle_intel (int name, const struct cpu_features *cpu_features)
-{
-  unsigned int maxidx = cpu_features->basic.max_cpuid;
-
-  /* Return -1 for older CPUs.  */
-  if (maxidx < 2)
-    return -1;
-
-  /* OK, we can use the CPUID instruction to get all info about the
-     caches.  */
-  unsigned int cnt = 0;
-  unsigned int max = 1;
-  long int result = 0;
-  bool no_level_2_or_3 = false;
-  bool has_level_2 = false;
-
-  while (cnt++ < max)
-    {
-      unsigned int eax;
-      unsigned int ebx;
-      unsigned int ecx;
-      unsigned int edx;
-      __cpuid (2, eax, ebx, ecx, edx);
-
-      /* The low byte of EAX in the first round contain the number of
-	 rounds we have to make.  At least one, the one we are already
-	 doing.  */
-      if (cnt == 1)
-	{
-	  max = eax & 0xff;
-	  eax &= 0xffffff00;
-	}
-
-      /* Process the individual registers' value.  */
-      result = intel_check_word (name, eax, &has_level_2,
-				 &no_level_2_or_3, cpu_features);
-      if (result != 0)
-	return result;
-
-      result = intel_check_word (name, ebx, &has_level_2,
-				 &no_level_2_or_3, cpu_features);
-      if (result != 0)
-	return result;
-
-      result = intel_check_word (name, ecx, &has_level_2,
-				 &no_level_2_or_3, cpu_features);
-      if (result != 0)
-	return result;
-
-      result = intel_check_word (name, edx, &has_level_2,
-				 &no_level_2_or_3, cpu_features);
-      if (result != 0)
-	return result;
-    }
-
-  if (name >= _SC_LEVEL2_CACHE_SIZE && name <= _SC_LEVEL3_CACHE_LINESIZE
-      && no_level_2_or_3)
-    return -1;
-
-  return 0;
-}
-
-
-static long int __attribute__ ((noinline))
-handle_amd (int name)
-{
-  unsigned int eax;
-  unsigned int ebx;
-  unsigned int ecx;
-  unsigned int edx;
-  __cpuid (0x80000000, eax, ebx, ecx, edx);
-
-  /* No level 4 cache (yet).  */
-  if (name > _SC_LEVEL3_CACHE_LINESIZE)
-    return 0;
-
-  unsigned int fn = 0x80000005 + (name >= _SC_LEVEL2_CACHE_SIZE);
-  if (eax < fn)
-    return 0;
-
-  __cpuid (fn, eax, ebx, ecx, edx);
-
-  if (name < _SC_LEVEL1_DCACHE_SIZE)
-    {
-      name += _SC_LEVEL1_DCACHE_SIZE - _SC_LEVEL1_ICACHE_SIZE;
-      ecx = edx;
-    }
-
-  switch (name)
-    {
-    case _SC_LEVEL1_DCACHE_SIZE:
-      return (ecx >> 14) & 0x3fc00;
-
-    case _SC_LEVEL1_DCACHE_ASSOC:
-      ecx >>= 16;
-      if ((ecx & 0xff) == 0xff)
-	/* Fully associative.  */
-	return (ecx << 2) & 0x3fc00;
-      return ecx & 0xff;
-
-    case _SC_LEVEL1_DCACHE_LINESIZE:
-      return ecx & 0xff;
-
-    case _SC_LEVEL2_CACHE_SIZE:
-      return (ecx & 0xf000) == 0 ? 0 : (ecx >> 6) & 0x3fffc00;
-
-    case _SC_LEVEL2_CACHE_ASSOC:
-      switch ((ecx >> 12) & 0xf)
-	{
-	case 0:
-	case 1:
-	case 2:
-	case 4:
-	  return (ecx >> 12) & 0xf;
-	case 6:
-	  return 8;
-	case 8:
-	  return 16;
-	case 10:
-	  return 32;
-	case 11:
-	  return 48;
-	case 12:
-	  return 64;
-	case 13:
-	  return 96;
-	case 14:
-	  return 128;
-	case 15:
-	  return ((ecx >> 6) & 0x3fffc00) / (ecx & 0xff);
-	default:
-	  return 0;
-	}
-      /* NOTREACHED */
-
-    case _SC_LEVEL2_CACHE_LINESIZE:
-      return (ecx & 0xf000) == 0 ? 0 : ecx & 0xff;
-
-    case _SC_LEVEL3_CACHE_SIZE:
-      return (edx & 0xf000) == 0 ? 0 : (edx & 0x3ffc0000) << 1;
-
-    case _SC_LEVEL3_CACHE_ASSOC:
-      switch ((edx >> 12) & 0xf)
-	{
-	case 0:
-	case 1:
-	case 2:
-	case 4:
-	  return (edx >> 12) & 0xf;
-	case 6:
-	  return 8;
-	case 8:
-	  return 16;
-	case 10:
-	  return 32;
-	case 11:
-	  return 48;
-	case 12:
-	  return 64;
-	case 13:
-	  return 96;
-	case 14:
-	  return 128;
-	case 15:
-	  return ((edx & 0x3ffc0000) << 1) / (edx & 0xff);
-	default:
-	  return 0;
-	}
-      /* NOTREACHED */
-
-    case _SC_LEVEL3_CACHE_LINESIZE:
-      return (edx & 0xf000) == 0 ? 0 : edx & 0xff;
-
-    default:
-      assert (! "cannot happen");
-    }
-  return -1;
-}
-
-
-static long int __attribute__ ((noinline))
-handle_zhaoxin (int name)
-{
-  unsigned int eax;
-  unsigned int ebx;
-  unsigned int ecx;
-  unsigned int edx;
-
-  int folded_rel_name = (M(name) / 3) * 3;
-
-  unsigned int round = 0;
-  while (1)
-    {
-      __cpuid_count (4, round, eax, ebx, ecx, edx);
-
-      enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
-      if (type == null)
-        break;
-
-      unsigned int level = (eax >> 5) & 0x7;
-
-      if ((level == 1 && type == data
-        && folded_rel_name == M(_SC_LEVEL1_DCACHE_SIZE))
-        || (level == 1 && type == inst
-            && folded_rel_name == M(_SC_LEVEL1_ICACHE_SIZE))
-        || (level == 2 && folded_rel_name == M(_SC_LEVEL2_CACHE_SIZE))
-        || (level == 3 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE)))
-        {
-          unsigned int offset = M(name) - folded_rel_name;
-
-          if (offset == 0)
-            /* Cache size.  */
-            return (((ebx >> 22) + 1)
-                * (((ebx >> 12) & 0x3ff) + 1)
-                * ((ebx & 0xfff) + 1)
-                * (ecx + 1));
-          if (offset == 1)
-            return (ebx >> 22) + 1;
-
-          assert (offset == 2);
-          return (ebx & 0xfff) + 1;
-        }
-
-      ++round;
-    }
-
-  /* Nothing found.  */
-  return 0;
-}
-
-
-/* Get the value of the system variable NAME.  */
-long int
-attribute_hidden
-__cache_sysconf (int name)
-{
-  const struct cpu_features *cpu_features = __get_cpu_features ();
-
-  if (cpu_features->basic.kind == arch_kind_intel)
-    return handle_intel (name, cpu_features);
-
-  if (cpu_features->basic.kind == arch_kind_amd)
-    return handle_amd (name);
-
-  if (cpu_features->basic.kind == arch_kind_zhaoxin)
-    return handle_zhaoxin (name);
-
-  // XXX Fill in more vendors.
-
-  /* CPU not known, we have no information.  */
-  return 0;
-}
-
-
 /* Data cache size for use in memory and string routines, typically
    L1 size, rounded to multiple of 256 bytes.  */
 long int __x86_data_cache_size_half attribute_hidden = 32 * 1024 / 2;
@@ -537,362 +48,78 @@ long int __x86_rep_movsb_threshold attribute_hidden = 2048;
 long int __x86_rep_stosb_threshold attribute_hidden = 2048;
 
 
-static void
-get_common_cache_info (long int *shared_ptr, unsigned int *threads_ptr,
-                long int core)
+/* Get the value of the system variable NAME.  */
+long int
+attribute_hidden
+__cache_sysconf (int name)
 {
-  unsigned int eax;
-  unsigned int ebx;
-  unsigned int ecx;
-  unsigned int edx;
-
-  /* Number of logical processors sharing L2 cache.  */
-  int threads_l2;
-
-  /* Number of logical processors sharing L3 cache.  */
-  int threads_l3;
-
   const struct cpu_features *cpu_features = __get_cpu_features ();
-  int max_cpuid = cpu_features->basic.max_cpuid;
-  unsigned int family = cpu_features->basic.family;
-  unsigned int model = cpu_features->basic.model;
-  long int shared = *shared_ptr;
-  unsigned int threads = *threads_ptr;
-  bool inclusive_cache = true;
-  bool support_count_mask = true;
-
-  /* Try L3 first.  */
-  unsigned int level = 3;
-
-  if (cpu_features->basic.kind == arch_kind_zhaoxin && family == 6)
-    support_count_mask = false;
-
-  if (shared <= 0)
-    {
-      /* Try L2 otherwise.  */
-      level  = 2;
-      shared = core;
-      threads_l2 = 0;
-      threads_l3 = -1;
-    }
-  else
-    {
-      threads_l2 = 0;
-      threads_l3 = 0;
-    }
-
-  /* A value of 0 for the HTT bit indicates there is only a single
-     logical processor.  */
-  if (CPU_FEATURE_USABLE (HTT))
+  switch (name)
     {
-      /* Figure out the number of logical threads that share the
-         highest cache level.  */
-      if (max_cpuid >= 4)
-        {
-          int i = 0;
-
-          /* Query until cache level 2 and 3 are enumerated.  */
-          int check = 0x1 | (threads_l3 == 0) << 1;
-          do
-            {
-              __cpuid_count (4, i++, eax, ebx, ecx, edx);
+    case _SC_LEVEL1_ICACHE_SIZE:
+      return cpu_features->level1_icache_size;
 
-              /* There seems to be a bug in at least some Pentium Ds
-                 which sometimes fail to iterate all cache parameters.
-                 Do not loop indefinitely here, stop in this case and
-                 assume there is no such information.  */
-              if (cpu_features->basic.kind == arch_kind_intel
-                  && (eax & 0x1f) == 0 )
-                goto intel_bug_no_cache_info;
+    case _SC_LEVEL1_DCACHE_SIZE:
+      return cpu_features->level1_dcache_size;
 
-              switch ((eax >> 5) & 0x7)
-                {
-                  default:
-                    break;
-                  case 2:
-                    if ((check & 0x1))
-                      {
-                        /* Get maximum number of logical processors
-                           sharing L2 cache.  */
-                        threads_l2 = (eax >> 14) & 0x3ff;
-                        check &= ~0x1;
-                      }
-                    break;
-                  case 3:
-                    if ((check & (0x1 << 1)))
-                      {
-                        /* Get maximum number of logical processors
-                           sharing L3 cache.  */
-                        threads_l3 = (eax >> 14) & 0x3ff;
+    case _SC_LEVEL1_DCACHE_ASSOC:
+      return cpu_features->level1_dcache_assoc;
 
-                        /* Check if L2 and L3 caches are inclusive.  */
-                        inclusive_cache = (edx & 0x2) != 0;
-                        check &= ~(0x1 << 1);
-                      }
-                    break;
-                }
-            }
-          while (check);
+    case _SC_LEVEL1_DCACHE_LINESIZE:
+      return cpu_features->level1_dcache_linesize;
 
-          /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum
-             numbers of addressable IDs for logical processors sharing
-             the cache, instead of the maximum number of threads
-             sharing the cache.  */
-          if (max_cpuid >= 11 && support_count_mask)
-            {
-              /* Find the number of logical processors shipped in
-                 one core and apply count mask.  */
-              i = 0;
+    case _SC_LEVEL2_CACHE_SIZE:
+      return cpu_features->level2_cache_size;
 
-              /* Count SMT only if there is L3 cache.  Always count
-                 core if there is no L3 cache.  */
-              int count = ((threads_l2 > 0 && level == 3)
-                           | ((threads_l3 > 0
-                               || (threads_l2 > 0 && level == 2)) << 1));
+    case _SC_LEVEL2_CACHE_ASSOC:
+      return cpu_features->level2_cache_assoc;
 
-              while (count)
-                {
-                  __cpuid_count (11, i++, eax, ebx, ecx, edx);
+    case _SC_LEVEL2_CACHE_LINESIZE:
+      return cpu_features->level2_cache_linesize;
 
-                  int shipped = ebx & 0xff;
-                  int type = ecx & 0xff00;
-                  if (shipped == 0 || type == 0)
-                    break;
-                  else if (type == 0x100)
-                    {
-                      /* Count SMT.  */
-                      if ((count & 0x1))
-                        {
-                          int count_mask;
+    case _SC_LEVEL3_CACHE_SIZE:
+      return cpu_features->level3_cache_size;
 
-                          /* Compute count mask.  */
-                          asm ("bsr %1, %0"
-                               : "=r" (count_mask) : "g" (threads_l2));
-                          count_mask = ~(-1 << (count_mask + 1));
-                          threads_l2 = (shipped - 1) & count_mask;
-                          count &= ~0x1;
-                        }
-                    }
-                  else if (type == 0x200)
-                    {
-                      /* Count core.  */
-                      if ((count & (0x1 << 1)))
-                        {
-                          int count_mask;
-                          int threads_core
-                            = (level == 2 ? threads_l2 : threads_l3);
+    case _SC_LEVEL3_CACHE_ASSOC:
+      return cpu_features->level3_cache_assoc;
 
-                          /* Compute count mask.  */
-                          asm ("bsr %1, %0"
-                               : "=r" (count_mask) : "g" (threads_core));
-                          count_mask = ~(-1 << (count_mask + 1));
-                          threads_core = (shipped - 1) & count_mask;
-                          if (level == 2)
-                            threads_l2 = threads_core;
-                          else
-                            threads_l3 = threads_core;
-                          count &= ~(0x1 << 1);
-                        }
-                    }
-                }
-            }
-          if (threads_l2 > 0)
-            threads_l2 += 1;
-          if (threads_l3 > 0)
-            threads_l3 += 1;
-          if (level == 2)
-            {
-              if (threads_l2)
-                {
-                  threads = threads_l2;
-                  if (cpu_features->basic.kind == arch_kind_intel
-                      && threads > 2
-                      && family == 6)
-                    switch (model)
-                      {
-                        case 0x37:
-                        case 0x4a:
-                        case 0x4d:
-                        case 0x5a:
-                        case 0x5d:
-                          /* Silvermont has L2 cache shared by 2 cores.  */
-                          threads = 2;
-                          break;
-                        default:
-                          break;
-                      }
-                }
-            }
-          else if (threads_l3)
-            threads = threads_l3;
-        }
-      else
-        {
-intel_bug_no_cache_info:
-          /* Assume that all logical threads share the highest cache
-             level.  */
-          threads
-            = ((cpu_features->features[COMMON_CPUID_INDEX_1].cpuid.ebx
-                >> 16) & 0xff);
-        }
+    case _SC_LEVEL3_CACHE_LINESIZE:
+      return cpu_features->level3_cache_linesize;
 
-        /* Cap usage of highest cache level to the number of supported
-           threads.  */
-        if (shared > 0 && threads > 0)
-          shared /= threads;
-    }
+    case _SC_LEVEL4_CACHE_SIZE:
+      return cpu_features->level4_cache_size;
 
-  /* Account for non-inclusive L2 and L3 caches.  */
-  if (!inclusive_cache)
-    {
-      if (threads_l2 > 0)
-        core /= threads_l2;
-      shared += core;
+    default:
+      break;
     }
-
-  *shared_ptr = shared;
-  *threads_ptr = threads;
+  return -1;
 }
 
-
 static void
 init_cacheinfo (void)
 {
-  /* Find out what brand of processor.  */
-  unsigned int ebx;
-  unsigned int ecx;
-  unsigned int edx;
-  int max_cpuid_ex;
-  long int data = -1;
-  long int shared = -1;
-  long int core;
-  unsigned int threads = 0;
   const struct cpu_features *cpu_features = __get_cpu_features ();
+  long int data = cpu_features->data_cache_size;
+  __x86_raw_data_cache_size_half = data / 2;
+  __x86_raw_data_cache_size = data;
+  /* Round data cache size to multiple of 256 bytes.  */
+  data = data & ~255L;
+  __x86_data_cache_size_half = data / 2;
+  __x86_data_cache_size = data;
+
+  long int shared = cpu_features->shared_cache_size;
+  __x86_raw_shared_cache_size_half = shared / 2;
+  __x86_raw_shared_cache_size = shared;
+  /* Round shared cache size to multiple of 256 bytes.  */
+  shared = shared & ~255L;
+  __x86_shared_cache_size_half = shared / 2;
+  __x86_shared_cache_size = shared;
 
-  assert (cpu_features->basic.kind != arch_kind_unknown);
-
-  if (cpu_features->basic.kind == arch_kind_intel)
-    {
-      data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
-      core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
-      shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features);
-
-      get_common_cache_info (&shared, &threads, core);
-    }
-  else if (cpu_features->basic.kind == arch_kind_zhaoxin)
-    {
-      data = handle_zhaoxin (_SC_LEVEL1_DCACHE_SIZE);
-      core = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
-      shared = handle_zhaoxin (_SC_LEVEL3_CACHE_SIZE);
-
-      get_common_cache_info (&shared, &threads, core);
-    }
-  else if (cpu_features->basic.kind == arch_kind_amd)
-    {
-      data   = handle_amd (_SC_LEVEL1_DCACHE_SIZE);
-      long int core = handle_amd (_SC_LEVEL2_CACHE_SIZE);
-      shared = handle_amd (_SC_LEVEL3_CACHE_SIZE);
-
-      /* Get maximum extended function. */
-      __cpuid (0x80000000, max_cpuid_ex, ebx, ecx, edx);
-
-      if (shared <= 0)
-	/* No shared L3 cache.  All we have is the L2 cache.  */
-	shared = core;
-      else
-	{
-	  /* Figure out the number of logical threads that share L3.  */
-	  if (max_cpuid_ex >= 0x80000008)
-	    {
-	      /* Get width of APIC ID.  */
-	      __cpuid (0x80000008, max_cpuid_ex, ebx, ecx, edx);
-	      threads = 1 << ((ecx >> 12) & 0x0f);
-	    }
-
-	  if (threads == 0)
-	    {
-	      /* If APIC ID width is not available, use logical
-		 processor count.  */
-	      __cpuid (0x00000001, max_cpuid_ex, ebx, ecx, edx);
-
-	      if ((edx & (1 << 28)) != 0)
-		threads = (ebx >> 16) & 0xff;
-	    }
-
-	  /* Cap usage of highest cache level to the number of
-	     supported threads.  */
-	  if (threads > 0)
-	    shared /= threads;
-
-	  /* Account for exclusive L2 and L3 caches.  */
-	  shared += core;
-	}
-    }
-
-  if (cpu_features->data_cache_size != 0)
-    data = cpu_features->data_cache_size;
-
-  if (data > 0)
-    {
-      __x86_raw_data_cache_size_half = data / 2;
-      __x86_raw_data_cache_size = data;
-      /* Round data cache size to multiple of 256 bytes.  */
-      data = data & ~255L;
-      __x86_data_cache_size_half = data / 2;
-      __x86_data_cache_size = data;
-    }
-
-  if (cpu_features->shared_cache_size != 0)
-    shared = cpu_features->shared_cache_size;
-
-  if (shared > 0)
-    {
-      __x86_raw_shared_cache_size_half = shared / 2;
-      __x86_raw_shared_cache_size = shared;
-      /* Round shared cache size to multiple of 256 bytes.  */
-      shared = shared & ~255L;
-      __x86_shared_cache_size_half = shared / 2;
-      __x86_shared_cache_size = shared;
-    }
-
-  /* The large memcpy micro benchmark in glibc shows that 6 times of
-     shared cache size is the approximate value above which non-temporal
-     store becomes faster on a 8-core processor.  This is the 3/4 of the
-     total shared cache size.  */
   __x86_shared_non_temporal_threshold
-    = (cpu_features->non_temporal_threshold != 0
-       ? cpu_features->non_temporal_threshold
-       : __x86_shared_cache_size * threads * 3 / 4);
-
-  /* NB: The REP MOVSB threshold must be greater than VEC_SIZE * 8.  */
-  unsigned int minimum_rep_movsb_threshold;
-  /* NB: The default REP MOVSB threshold is 2048 * (VEC_SIZE / 16).  */
-  unsigned int rep_movsb_threshold;
-  if (CPU_FEATURE_USABLE_P (cpu_features, AVX512F)
-      && !CPU_FEATURE_PREFERRED_P (cpu_features, Prefer_No_AVX512))
-    {
-      rep_movsb_threshold = 2048 * (64 / 16);
-      minimum_rep_movsb_threshold = 64 * 8;
-    }
-  else if (CPU_FEATURE_PREFERRED_P (cpu_features,
-				    AVX_Fast_Unaligned_Load))
-    {
-      rep_movsb_threshold = 2048 * (32 / 16);
-      minimum_rep_movsb_threshold = 32 * 8;
-    }
-  else
-    {
-      rep_movsb_threshold = 2048 * (16 / 16);
-      minimum_rep_movsb_threshold = 16 * 8;
-    }
-  if (cpu_features->rep_movsb_threshold > minimum_rep_movsb_threshold)
-    __x86_rep_movsb_threshold = cpu_features->rep_movsb_threshold;
-  else
-    __x86_rep_movsb_threshold = rep_movsb_threshold;
+    = cpu_features->non_temporal_threshold;
 
-# if HAVE_TUNABLES
+  __x86_rep_movsb_threshold = cpu_features->rep_movsb_threshold;
   __x86_rep_stosb_threshold = cpu_features->rep_stosb_threshold;
-# endif
 }
 
 /* NB: Call init_cacheinfo by initializing a dummy function pointer via
diff --git a/sysdeps/x86/cpu-cacheinfo.c b/sysdeps/x86/cpu-cacheinfo.c
new file mode 100644
index 0000000000..16b81af333
--- /dev/null
+++ b/sysdeps/x86/cpu-cacheinfo.c
@@ -0,0 +1,922 @@
+/* x86 cache info.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <assert.h>
+#include <unistd.h>
+
+static const struct intel_02_cache_info
+{
+  unsigned char idx;
+  unsigned char assoc;
+  unsigned char linesize;
+  unsigned char rel_name;
+  unsigned int size;
+} intel_02_known [] =
+  {
+#define M(sc) ((sc) - _SC_LEVEL1_ICACHE_SIZE)
+    { 0x06,  4, 32, M(_SC_LEVEL1_ICACHE_SIZE),    8192 },
+    { 0x08,  4, 32, M(_SC_LEVEL1_ICACHE_SIZE),   16384 },
+    { 0x09,  4, 32, M(_SC_LEVEL1_ICACHE_SIZE),   32768 },
+    { 0x0a,  2, 32, M(_SC_LEVEL1_DCACHE_SIZE),    8192 },
+    { 0x0c,  4, 32, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
+    { 0x0d,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
+    { 0x0e,  6, 64, M(_SC_LEVEL1_DCACHE_SIZE),   24576 },
+    { 0x21,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
+    { 0x22,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),   524288 },
+    { 0x23,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  1048576 },
+    { 0x25,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
+    { 0x29,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
+    { 0x2c,  8, 64, M(_SC_LEVEL1_DCACHE_SIZE),   32768 },
+    { 0x30,  8, 64, M(_SC_LEVEL1_ICACHE_SIZE),   32768 },
+    { 0x39,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
+    { 0x3a,  6, 64, M(_SC_LEVEL2_CACHE_SIZE),   196608 },
+    { 0x3b,  2, 64, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
+    { 0x3c,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
+    { 0x3d,  6, 64, M(_SC_LEVEL2_CACHE_SIZE),   393216 },
+    { 0x3e,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x3f,  2, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
+    { 0x41,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
+    { 0x42,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
+    { 0x43,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x44,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
+    { 0x45,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),  2097152 },
+    { 0x46,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
+    { 0x47,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
+    { 0x48, 12, 64, M(_SC_LEVEL2_CACHE_SIZE),  3145728 },
+    { 0x49, 16, 64, M(_SC_LEVEL2_CACHE_SIZE),  4194304 },
+    { 0x4a, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  6291456 },
+    { 0x4b, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
+    { 0x4c, 12, 64, M(_SC_LEVEL3_CACHE_SIZE), 12582912 },
+    { 0x4d, 16, 64, M(_SC_LEVEL3_CACHE_SIZE), 16777216 },
+    { 0x4e, 24, 64, M(_SC_LEVEL2_CACHE_SIZE),  6291456 },
+    { 0x60,  8, 64, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
+    { 0x66,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),    8192 },
+    { 0x67,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
+    { 0x68,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),   32768 },
+    { 0x78,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
+    { 0x79,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
+    { 0x7a,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
+    { 0x7b,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x7c,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
+    { 0x7d,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  2097152 },
+    { 0x7f,  2, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x80,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x82,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
+    { 0x83,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x84,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
+    { 0x85,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),  2097152 },
+    { 0x86,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x87,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
+    { 0xd0,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),   524288 },
+    { 0xd1,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),  1048576 },
+    { 0xd2,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
+    { 0xd6,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  1048576 },
+    { 0xd7,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
+    { 0xd8,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
+    { 0xdc, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
+    { 0xdd, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
+    { 0xde, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
+    { 0xe2, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
+    { 0xe3, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
+    { 0xe4, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
+    { 0xea, 24, 64, M(_SC_LEVEL3_CACHE_SIZE), 12582912 },
+    { 0xeb, 24, 64, M(_SC_LEVEL3_CACHE_SIZE), 18874368 },
+    { 0xec, 24, 64, M(_SC_LEVEL3_CACHE_SIZE), 25165824 },
+  };
+
+#define nintel_02_known (sizeof (intel_02_known) / sizeof (intel_02_known [0]))
+
+static int
+intel_02_known_compare (const void *p1, const void *p2)
+{
+  const struct intel_02_cache_info *i1;
+  const struct intel_02_cache_info *i2;
+
+  i1 = (const struct intel_02_cache_info *) p1;
+  i2 = (const struct intel_02_cache_info *) p2;
+
+  if (i1->idx == i2->idx)
+    return 0;
+
+  return i1->idx < i2->idx ? -1 : 1;
+}
+
+
+static long int
+__attribute__ ((noinline))
+intel_check_word (int name, unsigned int value, bool *has_level_2,
+		  bool *no_level_2_or_3,
+		  const struct cpu_features *cpu_features)
+{
+  if ((value & 0x80000000) != 0)
+    /* The register value is reserved.  */
+    return 0;
+
+  /* Fold the name.  The _SC_ constants are always in the order SIZE,
+     ASSOC, LINESIZE.  */
+  int folded_rel_name = (M(name) / 3) * 3;
+
+  while (value != 0)
+    {
+      unsigned int byte = value & 0xff;
+
+      if (byte == 0x40)
+	{
+	  *no_level_2_or_3 = true;
+
+	  if (folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE))
+	    /* No need to look further.  */
+	    break;
+	}
+      else if (byte == 0xff)
+	{
+	  /* CPUID leaf 0x4 contains all the information.  We need to
+	     iterate over it.  */
+	  unsigned int eax;
+	  unsigned int ebx;
+	  unsigned int ecx;
+	  unsigned int edx;
+
+	  unsigned int round = 0;
+	  while (1)
+	    {
+	      __cpuid_count (4, round, eax, ebx, ecx, edx);
+
+	      enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
+	      if (type == null)
+		/* That was the end.  */
+		break;
+
+	      unsigned int level = (eax >> 5) & 0x7;
+
+	      if ((level == 1 && type == data
+		   && folded_rel_name == M(_SC_LEVEL1_DCACHE_SIZE))
+		  || (level == 1 && type == inst
+		      && folded_rel_name == M(_SC_LEVEL1_ICACHE_SIZE))
+		  || (level == 2 && folded_rel_name == M(_SC_LEVEL2_CACHE_SIZE))
+		  || (level == 3 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE))
+		  || (level == 4 && folded_rel_name == M(_SC_LEVEL4_CACHE_SIZE)))
+		{
+		  unsigned int offset = M(name) - folded_rel_name;
+
+		  if (offset == 0)
+		    /* Cache size.  */
+		    return (((ebx >> 22) + 1)
+			    * (((ebx >> 12) & 0x3ff) + 1)
+			    * ((ebx & 0xfff) + 1)
+			    * (ecx + 1));
+		  if (offset == 1)
+		    return (ebx >> 22) + 1;
+
+		  assert (offset == 2);
+		  return (ebx & 0xfff) + 1;
+		}
+
+	      ++round;
+	    }
+	  /* There is no other cache information anywhere else.  */
+	  break;
+	}
+      else
+	{
+	  if (byte == 0x49 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE))
+	    {
+	      /* Intel reused this value.  For family 15, model 6 it
+		 specifies the 3rd level cache.  Otherwise the 2nd
+		 level cache.  */
+	      unsigned int family = cpu_features->basic.family;
+	      unsigned int model = cpu_features->basic.model;
+
+	      if (family == 15 && model == 6)
+		{
+		  /* The level 3 cache is encoded for this model like
+		     the level 2 cache is for other models.  Pretend
+		     the caller asked for the level 2 cache.  */
+		  name = (_SC_LEVEL2_CACHE_SIZE
+			  + (name - _SC_LEVEL3_CACHE_SIZE));
+		  folded_rel_name = M(_SC_LEVEL2_CACHE_SIZE);
+		}
+	    }
+
+	  struct intel_02_cache_info *found;
+	  struct intel_02_cache_info search;
+
+	  search.idx = byte;
+	  found = bsearch (&search, intel_02_known, nintel_02_known,
+			   sizeof (intel_02_known[0]), intel_02_known_compare);
+	  if (found != NULL)
+	    {
+	      if (found->rel_name == folded_rel_name)
+		{
+		  unsigned int offset = M(name) - folded_rel_name;
+
+		  if (offset == 0)
+		    /* Cache size.  */
+		    return found->size;
+		  if (offset == 1)
+		    return found->assoc;
+
+		  assert (offset == 2);
+		  return found->linesize;
+		}
+
+	      if (found->rel_name == M(_SC_LEVEL2_CACHE_SIZE))
+		*has_level_2 = true;
+	    }
+	}
+
+      /* Next byte for the next round.  */
+      value >>= 8;
+    }
+
+  /* Nothing found.  */
+  return 0;
+}
+
+
+static long int __attribute__ ((noinline))
+handle_intel (int name, const struct cpu_features *cpu_features)
+{
+  unsigned int maxidx = cpu_features->basic.max_cpuid;
+
+  /* Return -1 for older CPUs.  */
+  if (maxidx < 2)
+    return -1;
+
+  /* OK, we can use the CPUID instruction to get all info about the
+     caches.  */
+  unsigned int cnt = 0;
+  unsigned int max = 1;
+  long int result = 0;
+  bool no_level_2_or_3 = false;
+  bool has_level_2 = false;
+
+  while (cnt++ < max)
+    {
+      unsigned int eax;
+      unsigned int ebx;
+      unsigned int ecx;
+      unsigned int edx;
+      __cpuid (2, eax, ebx, ecx, edx);
+
+      /* The low byte of EAX in the first round contain the number of
+	 rounds we have to make.  At least one, the one we are already
+	 doing.  */
+      if (cnt == 1)
+	{
+	  max = eax & 0xff;
+	  eax &= 0xffffff00;
+	}
+
+      /* Process the individual registers' value.  */
+      result = intel_check_word (name, eax, &has_level_2,
+				 &no_level_2_or_3, cpu_features);
+      if (result != 0)
+	return result;
+
+      result = intel_check_word (name, ebx, &has_level_2,
+				 &no_level_2_or_3, cpu_features);
+      if (result != 0)
+	return result;
+
+      result = intel_check_word (name, ecx, &has_level_2,
+				 &no_level_2_or_3, cpu_features);
+      if (result != 0)
+	return result;
+
+      result = intel_check_word (name, edx, &has_level_2,
+				 &no_level_2_or_3, cpu_features);
+      if (result != 0)
+	return result;
+    }
+
+  if (name >= _SC_LEVEL2_CACHE_SIZE && name <= _SC_LEVEL3_CACHE_LINESIZE
+      && no_level_2_or_3)
+    return -1;
+
+  return 0;
+}
+
+
+static long int __attribute__ ((noinline))
+handle_amd (int name)
+{
+  unsigned int eax;
+  unsigned int ebx;
+  unsigned int ecx;
+  unsigned int edx;
+  __cpuid (0x80000000, eax, ebx, ecx, edx);
+
+  /* No level 4 cache (yet).  */
+  if (name > _SC_LEVEL3_CACHE_LINESIZE)
+    return 0;
+
+  unsigned int fn = 0x80000005 + (name >= _SC_LEVEL2_CACHE_SIZE);
+  if (eax < fn)
+    return 0;
+
+  __cpuid (fn, eax, ebx, ecx, edx);
+
+  if (name < _SC_LEVEL1_DCACHE_SIZE)
+    {
+      name += _SC_LEVEL1_DCACHE_SIZE - _SC_LEVEL1_ICACHE_SIZE;
+      ecx = edx;
+    }
+
+  switch (name)
+    {
+    case _SC_LEVEL1_DCACHE_SIZE:
+      return (ecx >> 14) & 0x3fc00;
+
+    case _SC_LEVEL1_DCACHE_ASSOC:
+      ecx >>= 16;
+      if ((ecx & 0xff) == 0xff)
+	/* Fully associative.  */
+	return (ecx << 2) & 0x3fc00;
+      return ecx & 0xff;
+
+    case _SC_LEVEL1_DCACHE_LINESIZE:
+      return ecx & 0xff;
+
+    case _SC_LEVEL2_CACHE_SIZE:
+      return (ecx & 0xf000) == 0 ? 0 : (ecx >> 6) & 0x3fffc00;
+
+    case _SC_LEVEL2_CACHE_ASSOC:
+      switch ((ecx >> 12) & 0xf)
+	{
+	case 0:
+	case 1:
+	case 2:
+	case 4:
+	  return (ecx >> 12) & 0xf;
+	case 6:
+	  return 8;
+	case 8:
+	  return 16;
+	case 10:
+	  return 32;
+	case 11:
+	  return 48;
+	case 12:
+	  return 64;
+	case 13:
+	  return 96;
+	case 14:
+	  return 128;
+	case 15:
+	  return ((ecx >> 6) & 0x3fffc00) / (ecx & 0xff);
+	default:
+	  return 0;
+	}
+      /* NOTREACHED */
+
+    case _SC_LEVEL2_CACHE_LINESIZE:
+      return (ecx & 0xf000) == 0 ? 0 : ecx & 0xff;
+
+    case _SC_LEVEL3_CACHE_SIZE:
+      return (edx & 0xf000) == 0 ? 0 : (edx & 0x3ffc0000) << 1;
+
+    case _SC_LEVEL3_CACHE_ASSOC:
+      switch ((edx >> 12) & 0xf)
+	{
+	case 0:
+	case 1:
+	case 2:
+	case 4:
+	  return (edx >> 12) & 0xf;
+	case 6:
+	  return 8;
+	case 8:
+	  return 16;
+	case 10:
+	  return 32;
+	case 11:
+	  return 48;
+	case 12:
+	  return 64;
+	case 13:
+	  return 96;
+	case 14:
+	  return 128;
+	case 15:
+	  return ((edx & 0x3ffc0000) << 1) / (edx & 0xff);
+	default:
+	  return 0;
+	}
+      /* NOTREACHED */
+
+    case _SC_LEVEL3_CACHE_LINESIZE:
+      return (edx & 0xf000) == 0 ? 0 : edx & 0xff;
+
+    default:
+      assert (! "cannot happen");
+    }
+  return -1;
+}
+
+
+static long int __attribute__ ((noinline))
+handle_zhaoxin (int name)
+{
+  unsigned int eax;
+  unsigned int ebx;
+  unsigned int ecx;
+  unsigned int edx;
+
+  int folded_rel_name = (M(name) / 3) * 3;
+
+  unsigned int round = 0;
+  while (1)
+    {
+      __cpuid_count (4, round, eax, ebx, ecx, edx);
+
+      enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
+      if (type == null)
+        break;
+
+      unsigned int level = (eax >> 5) & 0x7;
+
+      if ((level == 1 && type == data
+        && folded_rel_name == M(_SC_LEVEL1_DCACHE_SIZE))
+        || (level == 1 && type == inst
+            && folded_rel_name == M(_SC_LEVEL1_ICACHE_SIZE))
+        || (level == 2 && folded_rel_name == M(_SC_LEVEL2_CACHE_SIZE))
+        || (level == 3 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE)))
+        {
+          unsigned int offset = M(name) - folded_rel_name;
+
+          if (offset == 0)
+            /* Cache size.  */
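+            /* I.e., (ways + 1) * (partitions + 1) * (line size + 1)
+               * (sets + 1), from the leaf 4 cache descriptors in
+               EBX and ECX.  */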
+            return (((ebx >> 22) + 1)
+                * (((ebx >> 12) & 0x3ff) + 1)
+                * ((ebx & 0xfff) + 1)
+                * (ecx + 1));
+          if (offset == 1)
+            return (ebx >> 22) + 1;
+
+          assert (offset == 2);
+          return (ebx & 0xfff) + 1;
+        }
+
+      ++round;
+    }
+
+  /* Nothing found.  */
+  return 0;
+}
+
+
+static void
+get_common_cache_info (long int *shared_ptr, unsigned int *threads_ptr,
+                long int core)
+{
+  unsigned int eax;
+  unsigned int ebx;
+  unsigned int ecx;
+  unsigned int edx;
+
+  /* Number of logical processors sharing L2 cache.  */
+  int threads_l2;
+
+  /* Number of logical processors sharing L3 cache.  */
+  int threads_l3;
+
+  const struct cpu_features *cpu_features = __get_cpu_features ();
+  int max_cpuid = cpu_features->basic.max_cpuid;
+  unsigned int family = cpu_features->basic.family;
+  unsigned int model = cpu_features->basic.model;
+  long int shared = *shared_ptr;
+  unsigned int threads = *threads_ptr;
+  bool inclusive_cache = true;
+  bool support_count_mask = true;
+
+  /* Try L3 first.  */
+  unsigned int level = 3;
+
+  if (cpu_features->basic.kind == arch_kind_zhaoxin && family == 6)
+    support_count_mask = false;
+
+  if (shared <= 0)
+    {
+      /* Try L2 otherwise.  */
+      level  = 2;
+      shared = core;
+      threads_l2 = 0;
+      threads_l3 = -1;
+    }
+  else
+    {
+      threads_l2 = 0;
+      threads_l3 = 0;
+    }
+
+  /* A value of 0 for the HTT bit indicates there is only a single
+     logical processor.  */
+  if (HAS_CPU_FEATURE (HTT))
+    {
+      /* Figure out the number of logical threads that share the
+         highest cache level.  */
+      if (max_cpuid >= 4)
+        {
+          int i = 0;
+
+          /* Query until cache level 2 and 3 are enumerated.  */
+          int check = 0x1 | (threads_l3 == 0) << 1;
+          do
+            {
+              __cpuid_count (4, i++, eax, ebx, ecx, edx);
+
+              /* There seems to be a bug in at least some Pentium Ds
+                 which sometimes fail to iterate all cache parameters.
+                 Do not loop indefinitely here, stop in this case and
+                 assume there is no such information.  */
+              if (cpu_features->basic.kind == arch_kind_intel
+                  && (eax & 0x1f) == 0)
+                goto intel_bug_no_cache_info;
+
+              switch ((eax >> 5) & 0x7)
+                {
+                  default:
+                    break;
+                  case 2:
+                    if ((check & 0x1))
+                      {
+                        /* Get maximum number of logical processors
+                           sharing L2 cache.  */
+                        threads_l2 = (eax >> 14) & 0x3ff;
+                        check &= ~0x1;
+                      }
+                    break;
+                  case 3:
+                    if ((check & (0x1 << 1)))
+                      {
+                        /* Get maximum number of logical processors
+                           sharing L3 cache.  */
+                        threads_l3 = (eax >> 14) & 0x3ff;
+
+                        /* Check if L2 and L3 caches are inclusive.  */
+                        inclusive_cache = (edx & 0x2) != 0;
+                        check &= ~(0x1 << 1);
+                      }
+                    break;
+                }
+            }
+          while (check);
+
+          /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum
+             numbers of addressable IDs for logical processors sharing
+             the cache, instead of the maximum number of threads
+             sharing the cache.  */
+          if (max_cpuid >= 11 && support_count_mask)
+            {
+              /* Find the number of logical processors shipped in
+                 one core and apply count mask.  */
+              i = 0;
+
+              /* Count SMT only if there is L3 cache.  Always count
+                 core if there is no L3 cache.  */
+              int count = ((threads_l2 > 0 && level == 3)
+                           | ((threads_l3 > 0
+                               || (threads_l2 > 0 && level == 2)) << 1));
+
+              while (count)
+                {
+                  __cpuid_count (11, i++, eax, ebx, ecx, edx);
+
+                  int shipped = ebx & 0xff;
+                  int type = ecx & 0xff00;
+                  if (shipped == 0 || type == 0)
+                    break;
+                  else if (type == 0x100)
+                    {
+                      /* Count SMT.  */
+                      if ((count & 0x1))
+                        {
+                          int count_mask;
+
+                          /* Compute count mask.  */
+                          asm ("bsr %1, %0"
+                               : "=r" (count_mask) : "g" (threads_l2));
+                          count_mask = ~(-1 << (count_mask + 1));
+                          threads_l2 = (shipped - 1) & count_mask;
+                          count &= ~0x1;
+                        }
+                    }
+                  else if (type == 0x200)
+                    {
+                      /* Count core.  */
+                      if ((count & (0x1 << 1)))
+                        {
+                          int count_mask;
+                          int threads_core
+                            = (level == 2 ? threads_l2 : threads_l3);
+
+                          /* Compute count mask.  */
+                          asm ("bsr %1, %0"
+                               : "=r" (count_mask) : "g" (threads_core));
+                          count_mask = ~(-1 << (count_mask + 1));
+                          threads_core = (shipped - 1) & count_mask;
+                          if (level == 2)
+                            threads_l2 = threads_core;
+                          else
+                            threads_l3 = threads_core;
+                          count &= ~(0x1 << 1);
+                        }
+                    }
+                }
+            }
+          if (threads_l2 > 0)
+            threads_l2 += 1;
+          if (threads_l3 > 0)
+            threads_l3 += 1;
+          if (level == 2)
+            {
+              if (threads_l2)
+                {
+                  threads = threads_l2;
+                  if (cpu_features->basic.kind == arch_kind_intel
+                      && threads > 2
+                      && family == 6)
+                    switch (model)
+                      {
+                        case 0x37:
+                        case 0x4a:
+                        case 0x4d:
+                        case 0x5a:
+                        case 0x5d:
+                          /* Silvermont has L2 cache shared by 2 cores.  */
+                          threads = 2;
+                          break;
+                        default:
+                          break;
+                      }
+                }
+            }
+          else if (threads_l3)
+            threads = threads_l3;
+        }
+      else
+        {
+intel_bug_no_cache_info:
+          /* Assume that all logical threads share the highest cache
+             level.  */
+          threads
+            = ((cpu_features->features[COMMON_CPUID_INDEX_1].cpuid.ebx
+                >> 16) & 0xff);
+        }
+
+        /* Cap usage of highest cache level to the number of supported
+           threads.  */
+        if (shared > 0 && threads > 0)
+          shared /= threads;
+    }
+
+  /* Account for non-inclusive L2 and L3 caches.  */
+  if (!inclusive_cache)
+    {
+      if (threads_l2 > 0)
+        core /= threads_l2;
+      shared += core;
+    }
+
+  *shared_ptr = shared;
+  *threads_ptr = threads;
+}
+
+static void
+dl_init_cacheinfo (struct cpu_features *cpu_features)
+{
+  /* Find out what brand of processor.  */
+  unsigned int ebx;
+  unsigned int ecx;
+  unsigned int edx;
+  int max_cpuid_ex;
+  long int data = -1;
+  long int shared = -1;
+  long int core;
+  unsigned int threads = 0;
+  unsigned long int level1_icache_size = -1;
+  unsigned long int level1_dcache_size = -1;
+  unsigned long int level1_dcache_assoc = -1;
+  unsigned long int level1_dcache_linesize = -1;
+  unsigned long int level2_cache_size = -1;
+  unsigned long int level2_cache_assoc = -1;
+  unsigned long int level2_cache_linesize = -1;
+  unsigned long int level3_cache_size = -1;
+  unsigned long int level3_cache_assoc = -1;
+  unsigned long int level3_cache_linesize = -1;
+  unsigned long int level4_cache_size = -1;
+
+  if (cpu_features->basic.kind == arch_kind_intel)
+    {
+      data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
+      core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
+      shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features);
+
+      level1_icache_size
+	= handle_intel (_SC_LEVEL1_ICACHE_SIZE, cpu_features);
+      level1_dcache_size = data;
+      level1_dcache_assoc
+	= handle_intel (_SC_LEVEL1_DCACHE_ASSOC, cpu_features);
+      level1_dcache_linesize
+	= handle_intel (_SC_LEVEL1_DCACHE_LINESIZE, cpu_features);
+      level2_cache_size = core;
+      level2_cache_assoc
+	= handle_intel (_SC_LEVEL2_CACHE_ASSOC, cpu_features);
+      level2_cache_linesize
+	= handle_intel (_SC_LEVEL2_CACHE_LINESIZE, cpu_features);
+      level3_cache_size = shared;
+      level3_cache_assoc
+	= handle_intel (_SC_LEVEL3_CACHE_ASSOC, cpu_features);
+      level3_cache_linesize
+	= handle_intel (_SC_LEVEL3_CACHE_LINESIZE, cpu_features);
+      level4_cache_size
+	= handle_intel (_SC_LEVEL4_CACHE_SIZE, cpu_features);
+
+      get_common_cache_info (&shared, &threads, core);
+    }
+  else if (cpu_features->basic.kind == arch_kind_zhaoxin)
+    {
+      data = handle_zhaoxin (_SC_LEVEL1_DCACHE_SIZE);
+      core = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
+      shared = handle_zhaoxin (_SC_LEVEL3_CACHE_SIZE);
+
+      level1_icache_size = handle_zhaoxin (_SC_LEVEL1_ICACHE_SIZE);
+      level1_dcache_size = data;
+      level1_dcache_assoc = handle_zhaoxin (_SC_LEVEL1_DCACHE_ASSOC);
+      level1_dcache_linesize = handle_zhaoxin (_SC_LEVEL1_DCACHE_LINESIZE);
+      level2_cache_size = core;
+      level2_cache_assoc = handle_zhaoxin (_SC_LEVEL2_CACHE_ASSOC);
+      level2_cache_linesize = handle_zhaoxin (_SC_LEVEL2_CACHE_LINESIZE);
+      level3_cache_size = shared;
+      level3_cache_assoc = handle_zhaoxin (_SC_LEVEL3_CACHE_ASSOC);
+      level3_cache_linesize = handle_zhaoxin (_SC_LEVEL3_CACHE_LINESIZE);
+
+      get_common_cache_info (&shared, &threads, core);
+    }
+  else if (cpu_features->basic.kind == arch_kind_amd)
+    {
+      data  = handle_amd (_SC_LEVEL1_DCACHE_SIZE);
+      core = handle_amd (_SC_LEVEL2_CACHE_SIZE);
+      shared = handle_amd (_SC_LEVEL3_CACHE_SIZE);
+
+      level1_icache_size = handle_amd (_SC_LEVEL1_ICACHE_SIZE);
+      level1_dcache_size = data;
+      level1_dcache_assoc = handle_amd (_SC_LEVEL1_DCACHE_ASSOC);
+      level1_dcache_linesize = handle_amd (_SC_LEVEL1_DCACHE_LINESIZE);
+      level2_cache_size = core;
+      level2_cache_assoc = handle_amd (_SC_LEVEL2_CACHE_ASSOC);
+      level2_cache_linesize = handle_amd (_SC_LEVEL2_CACHE_LINESIZE);
+      level3_cache_size = shared;
+      level3_cache_assoc = handle_amd (_SC_LEVEL3_CACHE_ASSOC);
+      level3_cache_linesize = handle_amd (_SC_LEVEL3_CACHE_LINESIZE);
+
+      /* Get maximum extended function. */
+      __cpuid (0x80000000, max_cpuid_ex, ebx, ecx, edx);
+
+      if (shared <= 0)
+	/* No shared L3 cache.  All we have is the L2 cache.  */
+	shared = core;
+      else
+	{
+	  /* Figure out the number of logical threads that share L3.  */
+	  if (max_cpuid_ex >= 0x80000008)
+	    {
+	      /* Get width of APIC ID.  */
+	      __cpuid (0x80000008, max_cpuid_ex, ebx, ecx, edx);
+	      threads = 1 << ((ecx >> 12) & 0x0f);
+	    }
+
+	  if (threads == 0)
+	    {
+	      /* If APIC ID width is not available, use logical
+		 processor count.  */
+	      __cpuid (0x00000001, max_cpuid_ex, ebx, ecx, edx);
+
+	      if ((edx & (1 << 28)) != 0)
+		threads = (ebx >> 16) & 0xff;
+	    }
+
+	  /* Cap usage of highest cache level to the number of
+	     supported threads.  */
+	  if (threads > 0)
+	    shared /= threads;
+
+	  /* Account for exclusive L2 and L3 caches.  */
+	  shared += core;
+	}
+    }
+
+  cpu_features->level1_icache_size = level1_icache_size;
+  cpu_features->level1_dcache_size = level1_dcache_size;
+  cpu_features->level1_dcache_assoc = level1_dcache_assoc;
+  cpu_features->level1_dcache_linesize = level1_dcache_linesize;
+  cpu_features->level2_cache_size = level2_cache_size;
+  cpu_features->level2_cache_assoc = level2_cache_assoc;
+  cpu_features->level2_cache_linesize = level2_cache_linesize;
+  cpu_features->level3_cache_size = level3_cache_size;
+  cpu_features->level3_cache_assoc = level3_cache_assoc;
+  cpu_features->level3_cache_linesize = level3_cache_linesize;
+  cpu_features->level4_cache_size = level4_cache_size;
+
+  /* The large memcpy micro benchmark in glibc shows that 6 times the
+     shared cache size is the approximate value above which non-temporal
+     stores become faster on an 8-core processor.  This is 3/4 of the
+     total shared cache size.  */
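+  /* For example, with an 8 MiB cache shared by 8 threads, SHARED is
+     1 MiB at this point (it was divided by the thread count above),
+     so the threshold is 1 MiB * 8 * 3 / 4 = 6 MiB, i.e. 3/4 of the
+     full 8 MiB.  */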
+  unsigned long int non_temporal_threshold = (shared * threads * 3 / 4);
+
+#if HAVE_TUNABLES
+  /* NB: The REP MOVSB threshold must be greater than VEC_SIZE * 8.  */
+  unsigned int minimum_rep_movsb_threshold;
+#endif
+  /* NB: The default REP MOVSB threshold is 2048 * (VEC_SIZE / 16).  */
+  unsigned int rep_movsb_threshold;
+  if (CPU_FEATURE_USABLE_P (cpu_features, AVX512F)
+      && !CPU_FEATURE_PREFERRED_P (cpu_features, Prefer_No_AVX512))
+    {
+      rep_movsb_threshold = 2048 * (64 / 16);
+#if HAVE_TUNABLES
+      minimum_rep_movsb_threshold = 64 * 8;
+#endif
+    }
+  else if (CPU_FEATURE_PREFERRED_P (cpu_features,
+				    AVX_Fast_Unaligned_Load))
+    {
+      rep_movsb_threshold = 2048 * (32 / 16);
+#if HAVE_TUNABLES
+      minimum_rep_movsb_threshold = 32 * 8;
+#endif
+    }
+  else
+    {
+      rep_movsb_threshold = 2048 * (16 / 16);
+#if HAVE_TUNABLES
+      minimum_rep_movsb_threshold = 16 * 8;
+#endif
+    }
+
+  /* The default threshold to use Enhanced REP STOSB.  */
+  unsigned long int rep_stosb_threshold = 2048;
+
+#if HAVE_TUNABLES
+  long int tunable_size;
+
+  tunable_size = TUNABLE_GET (x86_data_cache_size, long int, NULL);
+  /* NB: Ignore the default value 0.  */
+  if (tunable_size)
+    data = tunable_size;
+
+  tunable_size = TUNABLE_GET (x86_shared_cache_size, long int, NULL);
+  /* NB: Ignore the default value 0.  */
+  if (tunable_size)
+    shared = tunable_size;
+
+  tunable_size = TUNABLE_GET (x86_non_temporal_threshold, long int, NULL);
+  /* NB: Ignore the default value 0.  */
+  if (tunable_size)
+    non_temporal_threshold = tunable_size;
+
+  tunable_size = TUNABLE_GET (x86_rep_movsb_threshold, long int, NULL);
+  if (tunable_size > minimum_rep_movsb_threshold)
+    rep_movsb_threshold = tunable_size;
+
+  /* NB: The default value of the x86_rep_stosb_threshold tunable is the
+     same as the default value of __x86_rep_stosb_threshold and the
+     minimum value is fixed.  */
+  rep_stosb_threshold = TUNABLE_GET (x86_rep_stosb_threshold,
+				     long int, NULL);
+
+  TUNABLE_SET_ALL (x86_data_cache_size, long int, data,
+		   0, (long int) -1);
+  TUNABLE_SET_ALL (x86_shared_cache_size, long int, shared,
+		   0, (long int) -1);
+  TUNABLE_SET_ALL (x86_non_temporal_threshold, long int,
+		   non_temporal_threshold, 0, (long int) -1);
+  TUNABLE_SET_ALL (x86_rep_movsb_threshold, long int,
+		   rep_movsb_threshold, minimum_rep_movsb_threshold,
+		   (long int) -1);
+  TUNABLE_SET_ALL (x86_rep_stosb_threshold, long int,
+		   rep_stosb_threshold, 1, (long int) -1);
+#endif
+
+  cpu_features->data_cache_size = data;
+  cpu_features->shared_cache_size = shared;
+  cpu_features->non_temporal_threshold = non_temporal_threshold;
+  cpu_features->rep_movsb_threshold = rep_movsb_threshold;
+  cpu_features->rep_stosb_threshold = rep_stosb_threshold;
+}
diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index b0ded20486..d77695623f 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -19,6 +19,7 @@
 #include <cpuid.h>
 #include <cpu-features.h>
 #include <dl-hwcap.h>
+#include <init-arch.h>
 #include <libc-pointer-arith.h>
 
 #if HAVE_TUNABLES
@@ -37,6 +38,8 @@ extern void TUNABLE_CALLBACK (set_x86_shstk) (tunable_val_t *)
 # endif
 #endif
 
+#include <cpu-cacheinfo.c>
+
 #if CET_ENABLED
 # include <dl-cet.h>
 #endif
@@ -616,24 +619,14 @@ no_cpuid:
   cpu_features->basic.model = model;
   cpu_features->basic.stepping = stepping;
 
+  dl_init_cacheinfo (cpu_features);
+
 #if HAVE_TUNABLES
   TUNABLE_GET (hwcaps, tunable_val_t *, TUNABLE_CALLBACK (set_hwcaps));
-  cpu_features->non_temporal_threshold
-    = TUNABLE_GET (x86_non_temporal_threshold, long int, NULL);
-  cpu_features->rep_movsb_threshold
-    = TUNABLE_GET (x86_rep_movsb_threshold, long int, NULL);
-  cpu_features->rep_stosb_threshold
-    = TUNABLE_GET (x86_rep_stosb_threshold, long int, NULL);
-  cpu_features->data_cache_size
-    = TUNABLE_GET (x86_data_cache_size, long int, NULL);
-  cpu_features->shared_cache_size
-    = TUNABLE_GET (x86_shared_cache_size, long int, NULL);
-#endif
-
-  /* Reuse dl_platform, dl_hwcap and dl_hwcap_mask for x86.  */
-#if !HAVE_TUNABLES && defined SHARED
-  /* The glibc.cpu.hwcap_mask tunable is initialized already, so no need to do
-     this.  */
+#elif defined SHARED
+  /* Reuse dl_platform, dl_hwcap and dl_hwcap_mask for x86.  The
+     glibc.cpu.hwcap_mask tunable is initialized already, so no
+     need to do this.  */
   GLRO(dl_hwcap_mask) = HWCAP_IMPORTANT;
 #endif
 
diff --git a/sysdeps/x86/include/cpu-features.h b/sysdeps/x86/include/cpu-features.h
index f62be0b9b3..3f3bd93320 100644
--- a/sysdeps/x86/include/cpu-features.h
+++ b/sysdeps/x86/include/cpu-features.h
@@ -153,6 +153,28 @@ struct cpu_features
   unsigned long int rep_movsb_threshold;
   /* Threshold to use "rep stosb".  */
   unsigned long int rep_stosb_threshold;
+  /* _SC_LEVEL1_ICACHE_SIZE.  */
+  unsigned long int level1_icache_size;
+  /* _SC_LEVEL1_DCACHE_SIZE.  */
+  unsigned long int level1_dcache_size;
+  /* _SC_LEVEL1_DCACHE_ASSOC.  */
+  unsigned long int level1_dcache_assoc;
+  /* _SC_LEVEL1_DCACHE_LINESIZE.  */
+  unsigned long int level1_dcache_linesize;
+  /* _SC_LEVEL2_CACHE_SIZE.  */
+  unsigned long int level2_cache_size;
+  /* _SC_LEVEL2_CACHE_ASSOC.  */
+  unsigned long int level2_cache_assoc;
+  /* _SC_LEVEL2_CACHE_LINESIZE.  */
+  unsigned long int level2_cache_linesize;
+  /* _SC_LEVEL3_CACHE_SIZE.  */
+  unsigned long int level3_cache_size;
+  /* _SC_LEVEL3_CACHE_ASSOC.  */
+  unsigned long int level3_cache_assoc;
+  /* _SC_LEVEL3_CACHE_LINESIZE.  */
+  unsigned long int level3_cache_linesize;
+  /* _SC_LEVEL4_CACHE_SIZE.  */
+  unsigned long int level4_cache_size;
 };
 
 # if defined (_LIBC) && !IS_IN (nonlib)
-- 
2.26.2



* [PATCH 3/3] ld.so: Add --list-tunables to print tunable values
  2020-09-12 13:44 [PATCH 0/3] ld.so: Add --list-tunables to print tunable values H.J. Lu
  2020-09-12 13:44 ` [PATCH 1/3] x86: Initialize CPU info via IFUNC relocation [BZ 26203] H.J. Lu
  2020-09-12 13:44 ` [PATCH 2/3] Set tunable value as well as min/max values H.J. Lu
@ 2020-09-12 13:44 ` H.J. Lu
  2020-09-18  8:13   ` Florian Weimer
  2 siblings, 1 reply; 18+ messages in thread
From: H.J. Lu @ 2020-09-12 13:44 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer, Carlos O'Donell

Pass --list-tunables to ld.so to print tunables with min and max values.
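
For example (output abbreviated; exact values vary by machine):

  $ ./elf/ld.so --list-tunables
  glibc.rtld.nns: 0x4 (min: 0x1, max: 0x10)
  glibc.malloc.perturb: 0 (min: 0, max: 255)
  glibc.malloc.check: 0 (min: 0, max: 3)
  [...]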
---
 NEWS                 |  2 ++
 elf/Makefile         |  6 +++++-
 elf/dl-tunables.c    | 36 ++++++++++++++++++++++++++++++++++++
 elf/dl-tunables.h    |  2 ++
 elf/rtld.c           | 37 +++++++++++++++++++++++++++++++++++--
 manual/tunables.texi | 37 +++++++++++++++++++++++++++++++++++++
 6 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/NEWS b/NEWS
index 878221639b..3ad798c4e3 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,8 @@ Version 2.33
 
 Major new features:
 
+* Pass --list-tunables to ld.so to print tunable values.
+
 * Add <sys/platform/x86.h> to provide query macros for x86 CPU features.
 
 * Support for the RISC-V ISA running on Linux has been expanded to run on
diff --git a/elf/Makefile b/elf/Makefile
index 0b78721848..7943a684ab 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -414,7 +414,7 @@ endif
 ifeq (yes,$(build-shared))
 ifeq ($(run-built-tests),yes)
 tests-special += $(objpfx)tst-pathopt.out $(objpfx)tst-rtld-load-self.out \
-		 $(objpfx)tst-rtld-preload.out
+		 $(objpfx)tst-rtld-preload.out $(objpfx)list-tunables.out
 endif
 tests-special += $(objpfx)check-textrel.out $(objpfx)check-execstack.out \
 		 $(objpfx)check-wx-segment.out \
@@ -962,6 +962,10 @@ $(objpfx)tst-rtld-preload.out: tst-rtld-preload.sh $(objpfx)ld.so \
 		    '$(rpath-link)' '$(tst-rtld-preload-OBJS)' > $@; \
 	$(evaluate-test)
 
+$(objpfx)list-tunables.out: $(objpfx)ld.so
+	$(objpfx)ld.so --list-tunables > $@; \
+	$(evaluate-test)
+
 $(objpfx)initfirst: $(libdl)
 $(objpfx)initfirst.out: $(objpfx)firstobj.so
 
diff --git a/elf/dl-tunables.c b/elf/dl-tunables.c
index b44174fe71..53226ef258 100644
--- a/elf/dl-tunables.c
+++ b/elf/dl-tunables.c
@@ -368,6 +368,42 @@ __tunables_init (char **envp)
     }
 }
 
+void
+__tunables_print (void)
+{
+  for (int i = 0; i < sizeof (tunable_list) / sizeof (tunable_t); i++)
+    {
+      tunable_t *cur = &tunable_list[i];
+      _dl_printf ("%s: ", cur->name);
+      switch (cur->type.type_code)
+	{
+	case TUNABLE_TYPE_INT_32:
+	  _dl_printf ("%d (min: %d, max: %d)\n",
+		      (int) cur->val.numval,
+		      (int) cur->type.min,
+		      (int) cur->type.max);
+	  break;
+	case TUNABLE_TYPE_UINT_64:
+	  _dl_printf ("0x%lx (min: 0x%lx, max: 0x%lx)\n",
+		      (long int) cur->val.numval,
+		      (long int) cur->type.min,
+		      (long int) cur->type.max);
+	  break;
+	case TUNABLE_TYPE_SIZE_T:
+	  _dl_printf ("0x%Zx (min: 0x%Zx, max: 0x%Zx)\n",
+		      (size_t) cur->val.numval,
+		      (size_t) cur->type.min,
+		      (size_t) cur->type.max);
+	  break;
+	case TUNABLE_TYPE_STRING:
+	  _dl_printf ("%s\n", cur->val.strval ? cur->val.strval : "");
+	  break;
+	default:
+	  __builtin_unreachable ();
+	}
+    }
+}
+
 /* Set the tunable value.  This is called by the module that the tunable exists
    in. */
 void
diff --git a/elf/dl-tunables.h b/elf/dl-tunables.h
index b1add3184f..a2c728f0dd 100644
--- a/elf/dl-tunables.h
+++ b/elf/dl-tunables.h
@@ -69,9 +69,11 @@ typedef struct _tunable tunable_t;
 # include "dl-tunable-list.h"
 
 extern void __tunables_init (char **);
+extern void __tunables_print (void);
 extern void __tunable_get_val (tunable_id_t, void *, tunable_callback_t);
 extern void __tunable_set_val (tunable_id_t, void *, void *, void *);
 rtld_hidden_proto (__tunables_init)
+rtld_hidden_proto (__tunables_print)
 rtld_hidden_proto (__tunable_get_val)
 rtld_hidden_proto (__tunable_set_val)
 
diff --git a/elf/rtld.c b/elf/rtld.c
index 5b882163fa..705bca504f 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -48,6 +48,10 @@
 #include <array_length.h>
 #include <libc-early-init.h>
 
+#if HAVE_TUNABLES
+# include <dl-tunables.h>
+#endif
+
 #include <assert.h>
 
 /* Only enables rtld profiling for architectures which provides non generic
@@ -168,7 +172,7 @@ static void audit_list_add_dynamic_tag (struct audit_list *,
 static const char *audit_list_next (struct audit_list *);
 
 /* This is a list of all the modes the dynamic loader can be in.  */
-enum mode { normal, list, verify, trace };
+enum mode { normal, list, verify, trace, list_tunables };
 
 /* Process all environments variables the dynamic linker must recognize.
    Since all of them start with `LD_' we are a bit smarter while finding
@@ -1263,13 +1267,37 @@ dl_main (const ElfW(Phdr) *phdr,
 	    _dl_argc -= 2;
 	    _dl_argv += 2;
 	  }
+#if HAVE_TUNABLES
+	else if (! strcmp (_dl_argv[1], "--list-tunables"))
+	  {
+	    mode = list_tunables;
+
+	    ++_dl_skip_args;
+	    --_dl_argc;
+	    ++_dl_argv;
+	  }
+#endif
 	else
 	  break;
 
+#if HAVE_TUNABLES
+      if (__builtin_expect (mode, normal) == list_tunables)
+	{
+	  __tunables_print ();
+	  _exit (0);
+	}
+#endif
+
       /* If we have no further argument the program was called incorrectly.
 	 Grant the user some education.  */
       if (_dl_argc < 2)
-	_dl_fatal_printf ("\
+	{
+#if HAVE_TUNABLES
+	  _dl_printf
+#else
+	  _dl_fatal_printf
+#endif
+	    ("\
 Usage: ld.so [OPTION]... EXECUTABLE-FILE [ARGS-FOR-PROGRAM...]\n\
 You have invoked `ld.so', the helper program for shared library executables.\n\
 This program usually lives in the file `/lib/ld.so', and special directives\n\
@@ -1293,6 +1321,11 @@ of this helper program; chances are you did not intend to run this program.\n\
 			in LIST\n\
   --audit LIST          use objects named in LIST as auditors\n\
   --preload LIST        preload objects named in LIST\n");
+#if HAVE_TUNABLES
+	  _dl_fatal_printf ("\
+  --list-tunables       list all tunables with minimum and maximum values\n");
+#endif
+	}
 
       ++_dl_skip_args;
       --_dl_argc;
diff --git a/manual/tunables.texi b/manual/tunables.texi
index 23ef0d40e7..38c8578229 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -28,6 +28,43 @@ Finally, the set of tunables available may vary between distributions as
 the tunables feature allows distributions to add their own tunables under
 their own namespace.
 
+Pass @option{--list-tunables} to the dynamic loader to print all
+tunables with their minimum and maximum values:
+
+@example
+$ /lib64/ld-linux-x86-64.so.2 --list-tunables
+glibc.rtld.nns: 0x4 (min: 0x1, max: 0x10)
+glibc.elision.skip_lock_after_retries: 3 (min: -2147483648, max: 2147483647)
+glibc.malloc.trim_threshold: 0x0 (min: 0x0, max: 0xffffffffffffffff)
+glibc.malloc.perturb: 0 (min: 0, max: 255)
+glibc.cpu.x86_shared_cache_size: 0x100000 (min: 0x0, max: 0xffffffffffffffff)
+glibc.elision.tries: 3 (min: -2147483648, max: 2147483647)
+glibc.elision.enable: 0 (min: 0, max: 1)
+glibc.cpu.x86_rep_movsb_threshold: 0x800 (min: 0x100, max: 0xffffffffffffffff)
+glibc.malloc.mxfast: 0x0 (min: 0x0, max: 0xffffffffffffffff)
+glibc.elision.skip_lock_busy: 3 (min: -2147483648, max: 2147483647)
+glibc.malloc.top_pad: 0x0 (min: 0x0, max: 0xffffffffffffffff)
+glibc.cpu.x86_rep_stosb_threshold: 0x800 (min: 0x1, max: 0xffffffffffffffff)
+glibc.cpu.x86_non_temporal_threshold: 0x600000 (min: 0x0, max: 0xffffffffffffffff)
+glibc.cpu.x86_shstk: 
+glibc.cpu.hwcap_mask: 0x6 (min: 0x0, max: 0xffffffffffffffff)
+glibc.malloc.mmap_max: 0 (min: -2147483648, max: 2147483647)
+glibc.elision.skip_trylock_internal_abort: 3 (min: -2147483648, max: 2147483647)
+glibc.malloc.tcache_unsorted_limit: 0x0 (min: 0x0, max: 0xffffffffffffffff)
+glibc.cpu.x86_ibt: 
+glibc.cpu.hwcaps: 
+glibc.elision.skip_lock_internal_abort: 3 (min: -2147483648, max: 2147483647)
+glibc.malloc.arena_max: 0x0 (min: 0x1, max: 0xffffffffffffffff)
+glibc.malloc.mmap_threshold: 0x0 (min: 0x0, max: 0xffffffffffffffff)
+glibc.cpu.x86_data_cache_size: 0x8000 (min: 0x0, max: 0xffffffffffffffff)
+glibc.malloc.tcache_count: 0x0 (min: 0x0, max: 0xffffffffffffffff)
+glibc.malloc.arena_test: 0x0 (min: 0x1, max: 0xffffffffffffffff)
+glibc.pthread.mutex_spin_count: 100 (min: 0, max: 32767)
+glibc.rtld.optional_static_tls: 0x200 (min: 0x0, max: 0xffffffffffffffff)
+glibc.malloc.tcache_max: 0x0 (min: 0x0, max: 0xffffffffffffffff)
+glibc.malloc.check: 0 (min: 0, max: 3)
+@end example
+
 @menu
 * Tunable names::  The structure of a tunable name
 * Memory Allocation Tunables::  Tunables in the memory allocation subsystem
-- 
2.26.2



* Re: [PATCH 1/3] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-12 13:44 ` [PATCH 1/3] x86: Initialize CPU info via IFUNC relocation [BZ 26203] H.J. Lu
@ 2020-09-18  8:06   ` Florian Weimer
  2020-09-18 13:55     ` H.J. Lu
  0 siblings, 1 reply; 18+ messages in thread
From: Florian Weimer @ 2020-09-18  8:06 UTC (permalink / raw)
  To: H.J. Lu via Libc-alpha

* H. J. Lu via Libc-alpha:

> X86 CPU features in ld.so are initialized by init_cpu_features, which is
> invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
> loaded by static executable, DL_PLATFORM_INIT is never called.

I don't think that's accurate.  It's called from elf/dl-support.c in the
static case.  But I agree that it's too late in this case.

> Also x86 cache info in libc.o and libc.a is initialized by a
> constructor which may be called too late.  Instead, we should
> initialize x86 CPU features and cache info by initializing dummy
> function pointers via IFUNC relocation.

Why would ld put this special IRELATIVE relocation first?  I'm sorry,
but this looks very brittle.

I think there's an existing mechanism for this corner case,
ARCH_SETUP_IREL.  Could you use that?

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



* Re: [PATCH 3/3] ld.so: Add --list-tunables to print tunable values
  2020-09-12 13:44 ` [PATCH 3/3] ld.so: Add --list-tunables to print tunable values H.J. Lu
@ 2020-09-18  8:13   ` Florian Weimer
  2020-09-18 15:24     ` H.J. Lu
  0 siblings, 1 reply; 18+ messages in thread
From: Florian Weimer @ 2020-09-18  8:13 UTC (permalink / raw)
  To: H.J. Lu via Libc-alpha

* H. J. Lu via Libc-alpha:

> diff --git a/elf/Makefile b/elf/Makefile
> index 0b78721848..7943a684ab 100644
> --- a/elf/Makefile
> +++ b/elf/Makefile
> @@ -414,7 +414,7 @@ endif
>  ifeq (yes,$(build-shared))
>  ifeq ($(run-built-tests),yes)
>  tests-special += $(objpfx)tst-pathopt.out $(objpfx)tst-rtld-load-self.out \
> -		 $(objpfx)tst-rtld-preload.out
> +		 $(objpfx)tst-rtld-preload.out $(objpfx)list-tunables.out
>  endif

I think the new test needs to be conditional on $(have-tunables) as
well.

>        /* If we have no further argument the program was called incorrectly.
>  	 Grant the user some education.  */
>        if (_dl_argc < 2)
> -	_dl_fatal_printf ("\
> +	{
> +#if HAVE_TUNABLES
> +	  _dl_printf
> +#else
> +	  _dl_fatal_printf
> +#endif
> +	    ("\
>  Usage: ld.so [OPTION]... EXECUTABLE-FILE [ARGS-FOR-PROGRAM...]\n\
>  You have invoked `ld.so', the helper program for shared library executables.\n\
>  This program usually lives in the file `/lib/ld.so', and special directives\n\
> @@ -1293,6 +1321,11 @@ of this helper program; chances are you did not intend to run this program.\n\
>  			in LIST\n\
>    --audit LIST          use objects named in LIST as auditors\n\
>    --preload LIST        preload objects named in LIST\n");
> +#if HAVE_TUNABLES
> +	  _dl_fatal_printf ("\
> +  --list-tunables       list all tunables with minimum and maximum values\n");
> +#endif
> +	}

I think you should use preprocessor string splicing, so something like this:

  --preload LIST        preload objects named in LIST\n"
#if HAVE_TUNABLES
"\
  --list-tunables       list all tunables with minimum and maximum values\n"
#endif
  );

It should minimize the required changes.

Rest looks okay to me.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



* Re: [PATCH 2/3] Set tunable value as well as min/max values
  2020-09-12 13:44 ` [PATCH 2/3] Set tunable value as well as min/max values H.J. Lu
@ 2020-09-18  8:29   ` Florian Weimer
  2020-09-18 15:11     ` H.J. Lu
  0 siblings, 1 reply; 18+ messages in thread
From: Florian Weimer @ 2020-09-18  8:29 UTC (permalink / raw)
  To: H.J. Lu via Libc-alpha

* H. J. Lu via Libc-alpha:

> Some tunable values and their minimum/maximum values must be determined
> at run-time.  Add TUNABLE_SET_ALL and TUNABLE_SET_ALL_FULL to update
> tunable value together with minimum and maximum values.  __tunable_set_val
> is updated to set tunable value as well as min/max values.  Move x86
> processor cache info to cpu_features to use TUNABLE_SET_ALL and cache
> CPUID outputs with the same inputs.

H.J., would it be too much work to separate the generic and x86 bits in
this patch?

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



* Re: [PATCH 1/3] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-18  8:06   ` Florian Weimer
@ 2020-09-18 13:55     ` H.J. Lu
  2020-09-18 13:59       ` Florian Weimer
  0 siblings, 1 reply; 18+ messages in thread
From: H.J. Lu @ 2020-09-18 13:55 UTC (permalink / raw)
  To: Florian Weimer; +Cc: H.J. Lu via Libc-alpha

On Fri, Sep 18, 2020 at 1:06 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > X86 CPU features in ld.so are initialized by init_cpu_features, which is
> > invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
> > loaded by static executable, DL_PLATFORM_INIT is never called.
>
> I don't think that's accurate.  It's called from elf/dl-support.c in the
> static case.  But I agree that it's too late in this case.

I was referring to the ld.so case.  elf/dl-support.c isn't used for ld.so.

> > Also x86 cache info in libc.o and libc.a is initialized by a
> > constructor which may be called too late.  Instead, we should
> > initialize x86 CPU features and cache info by initializing dummy
> > function pointers via IFUNC relocation.
>
> Why would ld put this special IRELATIVE relocation first?  I'm sorry,
> but this looks very brittle.

This is for ld.so.  ld.so doesn't assume that IFUNC is usable before everything
is set up properly.  This IRELATIVE relocation is no different from other
dynamic relocations, IRELATIVE or not.  The key here is that we want to call
init_cacheinfo as early as possible.  Calling from _dl_relocate_object is as
early as we can get.
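
In short, the patch turns the initialization into the IFUNC resolver of
a hidden dummy function pointer, so it runs as a side effect of
processing that relocation (condensed from patch 1/3):

  extern void __x86_cpu_features (void) attribute_hidden;
  const void (*__x86_cpu_features_p) (void) attribute_hidden
    = __x86_cpu_features;

  void
  _dl_x86_init_cpu_features (void)
  {
    struct cpu_features *cpu_features = __get_cpu_features ();
    if (cpu_features->basic.kind == arch_kind_unknown)
      init_cpu_features (cpu_features);
  }

  __ifunc (__x86_cpu_features, __x86_cpu_features, NULL, void,
           _dl_x86_init_cpu_features);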

> I think there's an existing mechanism for this corner case,
> ARCH_SETUP_IREL.  Could you use that?
>

ARCH_SETUP_IREL is for static executables, not ld.so.

-- 
H.J.


* Re: [PATCH 1/3] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-18 13:55     ` H.J. Lu
@ 2020-09-18 13:59       ` Florian Weimer
  2020-09-18 14:18         ` H.J. Lu
  0 siblings, 1 reply; 18+ messages in thread
From: Florian Weimer @ 2020-09-18 13:59 UTC (permalink / raw)
  To: H.J. Lu; +Cc: H.J. Lu via Libc-alpha

* H. J. Lu:

> On Fri, Sep 18, 2020 at 1:06 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu via Libc-alpha:
>>
>> > X86 CPU features in ld.so are initialized by init_cpu_features, which is
>> > invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
>> > loaded by static executable, DL_PLATFORM_INIT is never called.
>>
>> I don't think that's accurate.  It's called from elf/dl-support.c in the
>> static case.  But I agree that it's too late in this case.
>
> I was referring to the ld.so case.  elf/dl-support.c isn't used for ld.so.

I missed that.

You cannot use ld.so in a static executable because it is
uninitialized/dormant.

If you need functionality after static dlopen, it has to reside in
libc.so.6, not the dynamic loader.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



* Re: [PATCH 1/3] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-18 13:59       ` Florian Weimer
@ 2020-09-18 14:18         ` H.J. Lu
  2020-09-18 14:22           ` Florian Weimer
  0 siblings, 1 reply; 18+ messages in thread
From: H.J. Lu @ 2020-09-18 14:18 UTC (permalink / raw)
  To: Florian Weimer; +Cc: H.J. Lu via Libc-alpha

On Fri, Sep 18, 2020 at 6:59 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > On Fri, Sep 18, 2020 at 1:06 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * H. J. Lu via Libc-alpha:
> >>
> >> > X86 CPU features in ld.so are initialized by init_cpu_features, which is
> >> > invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
> >> > loaded by static executable, DL_PLATFORM_INIT is never called.
> >>
> >> I don't think that's accurate.  It's called from elf/dl-support.c in the
> >> static case.  But I agree that it's too late in this case.
> >
> > I was referring to the ld.so case.  elf/dl-support.c isn't used for ld.so.
>
> I missed that.
>
> You cannot use ld.so in a static executable because it is
> uninitialized/dormant.
>
> If you need functionality after static dlopen, it has to reside in
> libc.so.6, not the dynamic loader.

Here is the problem.  Static dlopen uses libc.so.6, which brings in ld.so
with _rtld_global and _rtld_global_ro via DT_NEEDED.  My patch is
trying to deal with that by initializing CPU info in _rtld_global_ro in this case.

-- 
H.J.


* Re: [PATCH 1/3] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-18 14:18         ` H.J. Lu
@ 2020-09-18 14:22           ` Florian Weimer
  2020-09-18 14:30             ` H.J. Lu
  0 siblings, 1 reply; 18+ messages in thread
From: Florian Weimer @ 2020-09-18 14:22 UTC (permalink / raw)
  To: H.J. Lu; +Cc: H.J. Lu via Libc-alpha

* H. J. Lu:

> On Fri, Sep 18, 2020 at 6:59 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu:
>>
>> > On Fri, Sep 18, 2020 at 1:06 AM Florian Weimer <fweimer@redhat.com> wrote:
>> >>
>> >> * H. J. Lu via Libc-alpha:
>> >>
>> >> > X86 CPU features in ld.so are initialized by init_cpu_features, which is
>> >> > invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
>> >> > loaded by static executable, DL_PLATFORM_INIT is never called.
>> >>
>> >> I don't think that's accurate.  It's called from elf/dl-support.c in the
>> >> static case.  But I agree that it's too late in this case.
>> >
>> > I was referring to the ld.so case.  elf/dl-support.c isn't used for ld.so.
>>
>> I missed that.
>>
>> You cannot use ld.so in a static executable because it is
>> uninitialized/dormant.
>>
>> If you need functionality after static dlopen, it has to reside in
>> libc.so.6, not the dynamic loader.
>
> Here is the problem.  static dlopen uses libc.so.6 which brings in ld.so
> with _rtld_global and _rtld_global_ro via DT_NEEDED.  My patch is
> trying to deal with it to initialize CPU info in _rtld_global_ro in this case.

Putting this data into _rtld_global_ro is not correct if it is required
after static dlopen.  The current static dlopen approach simply does not
support that.  You can only assume that libc.so has been initialized
after static dlopen.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



* Re: [PATCH 1/3] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-18 14:22           ` Florian Weimer
@ 2020-09-18 14:30             ` H.J. Lu
  2020-09-21  8:21               ` Florian Weimer
  0 siblings, 1 reply; 18+ messages in thread
From: H.J. Lu @ 2020-09-18 14:30 UTC (permalink / raw)
  To: Florian Weimer; +Cc: H.J. Lu via Libc-alpha

On Fri, Sep 18, 2020 at 7:23 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > On Fri, Sep 18, 2020 at 6:59 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * H. J. Lu:
> >>
> >> > On Fri, Sep 18, 2020 at 1:06 AM Florian Weimer <fweimer@redhat.com> wrote:
> >> >>
> >> >> * H. J. Lu via Libc-alpha:
> >> >>
> >> >> > X86 CPU features in ld.so are initialized by init_cpu_features, which is
> >> >> > invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
> >> >> > loaded by static executable, DL_PLATFORM_INIT is never called.
> >> >>
> >> >> I don't think that's accurate.  It's called from elf/dl-support.c in the
> >> >> static case.  But I agree that it's too late in this case.
> >> >
> >> > I was referring to the ld.so case.  elf/dl-support.c isn't used for ld.so.
> >>
> >> I missed that.
> >>
> >> You cannot use ld.so in a static executable because it is
> >> uninitialized/dormant.
> >>
> >> If you need functionality after static dlopen, it has to reside in
> >> libc.so.6, not the dynamic loader.
> >
> > Here is the problem.  static dlopen uses libc.so.6 which brings in ld.so
> > with _rtld_global and _rtld_global_ro via DT_NEEDED.  My patch is
> > trying to deal with it to initialize CPU info in _rtld_global_ro in this case.
>
> Putting this data into _rtld_global_ro is not correct if it is required
> after static dlopen.  The current static dlopen approach simply does not
> support that.  You can only assume that libc.so has been initialized
> after static dlopen.
>

Unless it is removed, _rtld_global_ro is the right place.  _rtld_global_ro
is initialized by dynamic relocation.  My patch does it.

-- 
H.J.


* Re: [PATCH 2/3] Set tunable value as well as min/max values
  2020-09-18  8:29   ` Florian Weimer
@ 2020-09-18 15:11     ` H.J. Lu
  0 siblings, 0 replies; 18+ messages in thread
From: H.J. Lu @ 2020-09-18 15:11 UTC (permalink / raw)
  To: Florian Weimer; +Cc: H.J. Lu via Libc-alpha

On Fri, Sep 18, 2020 at 1:29 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > Some tunable values and their minimum/maximum values must be determined
> > at run-time.  Add TUNABLE_SET_ALL and TUNABLE_SET_ALL_FULL to update
> > tunable value together with minimum and maximum values.  __tunable_set_val
> > is updated to set tunable value as well as min/max values.  Move x86
> > processor cache info to cpu_features to use TUNABLE_SET_ALL and cache
> > CPUID outputs with the same inputs.
>
> H.J., would it be too much work to separate the generic and x86 bits in
> this patch?

Will do.

-- 
H.J.


* Re: [PATCH 3/3] ld.so: Add --list-tunables to print tunable values
  2020-09-18  8:13   ` Florian Weimer
@ 2020-09-18 15:24     ` H.J. Lu
  0 siblings, 0 replies; 18+ messages in thread
From: H.J. Lu @ 2020-09-18 15:24 UTC (permalink / raw)
  To: Florian Weimer; +Cc: H.J. Lu via Libc-alpha

On Fri, Sep 18, 2020 at 1:13 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > diff --git a/elf/Makefile b/elf/Makefile
> > index 0b78721848..7943a684ab 100644
> > --- a/elf/Makefile
> > +++ b/elf/Makefile
> > @@ -414,7 +414,7 @@ endif
> >  ifeq (yes,$(build-shared))
> >  ifeq ($(run-built-tests),yes)
> >  tests-special += $(objpfx)tst-pathopt.out $(objpfx)tst-rtld-load-self.out \
> > -              $(objpfx)tst-rtld-preload.out
> > +              $(objpfx)tst-rtld-preload.out $(objpfx)list-tunables.out
> >  endif
>
> I think the new test needs to be conditional on $(have-tunables) as
> well.

Will do.

> >        /* If we have no further argument the program was called incorrectly.
> >        Grant the user some education.  */
> >        if (_dl_argc < 2)
> > -     _dl_fatal_printf ("\
> > +     {
> > +#if HAVE_TUNABLES
> > +       _dl_printf
> > +#else
> > +       _dl_fatal_printf
> > +#endif
> > +         ("\
> >  Usage: ld.so [OPTION]... EXECUTABLE-FILE [ARGS-FOR-PROGRAM...]\n\
> >  You have invoked `ld.so', the helper program for shared library executables.\n\
> >  This program usually lives in the file `/lib/ld.so', and special directives\n\
> > @@ -1293,6 +1321,11 @@ of this helper program; chances are you did not intend to run this program.\n\
> >                       in LIST\n\
> >    --audit LIST          use objects named in LIST as auditors\n\
> >    --preload LIST        preload objects named in LIST\n");
> > +#if HAVE_TUNABLES
> > +       _dl_fatal_printf ("\
> > +  --list-tunables       list all tunables with minimum and maximum values\n");
> > +#endif
> > +     }
>
> I think you should preprocessor string splicing, so something like this:
>
>   --preload LIST        preload objects named in LIST\n"
> #if HAVE_TUNABLES
> "\
>   --list-tunables       list all tunables with minimum and maximum values\n"
> #endif
>   );
>
> It should minimize the required changes.

Will do.

> Rest looks okay to me.
>

I will submit the updated patch set.

Thanks.

-- 
H.J.


* Re: [PATCH 1/3] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-18 14:30             ` H.J. Lu
@ 2020-09-21  8:21               ` Florian Weimer
  2020-09-21 11:31                 ` H.J. Lu
  0 siblings, 1 reply; 18+ messages in thread
From: Florian Weimer @ 2020-09-21  8:21 UTC (permalink / raw)
  To: H.J. Lu; +Cc: H.J. Lu via Libc-alpha

* H. J. Lu:

>> Putting this data into _rtld_global_ro is not correct if it is required
>> after static dlopen.  The current static dlopen approach simply does not
>> support that.  You can only assume that libc.so has been initialized
>> after static dlopen.

> Unless it is removed,  _rtld_global_ro is the right place.  _rtld_global_ro
> is initialized by dynamic relocation.  My patch does it.

It initializes only part of it.  I find this rather confusing.

Why is this data in ld.so in the first place?

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



* Re: [PATCH 1/3] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-21  8:21               ` Florian Weimer
@ 2020-09-21 11:31                 ` H.J. Lu
  2020-09-25 12:55                   ` Florian Weimer
  0 siblings, 1 reply; 18+ messages in thread
From: H.J. Lu @ 2020-09-21 11:31 UTC (permalink / raw)
  To: Florian Weimer; +Cc: H.J. Lu via Libc-alpha

On Mon, Sep 21, 2020 at 1:21 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> >> Putting this data into _rtld_global_ro is not correct if it is required
> >> after static dlopen.  The current static dlopen approach simply does not
> >> support that.  You can only assume that libc.so has been initialized
> >> after static dlopen.
>
> > Unless it is removed,  _rtld_global_ro is the right place.  _rtld_global_ro
> > is initialized by dynamic relocation.  My patch does it.
>
> It initializes only part of it.  I find this rather confusing.
>
> Why is this data in ld.so in the first place?

I put it in ld.so with

commit e2e4f56056adddc3c1efe676b40a4b4f2453103b
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Aug 13 03:37:47 2015 -0700

    Add _dl_x86_cpu_features to rtld_global

    This patch adds _dl_x86_cpu_features to rtld_global in x86 ld.so
    and initializes it early before __libc_start_main is called so that
    cpu_features is always available when it is used and we can avoid
    calling __init_cpu_features in IFUNC selectors.

which made many things possible, including CET support and

commit bea3f92405f705684275bffee954cafe84ffb09d
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Oct 22 08:24:00 2017 -0700

    x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]

    In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
    mask and bound registers.  It simplifies _dl_runtime_resolve and supports
    different calling conventions.  ld.so code size is reduced by more than
    1 KB.  However, using fxsave/xsave/xsavec takes a little bit more cycles
    than saving and restoring vector and bound registers individually.

-- 
H.J.


* Re: [PATCH 1/3] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-21 11:31                 ` H.J. Lu
@ 2020-09-25 12:55                   ` Florian Weimer
  0 siblings, 0 replies; 18+ messages in thread
From: Florian Weimer @ 2020-09-25 12:55 UTC (permalink / raw)
  To: H.J. Lu; +Cc: H.J. Lu via Libc-alpha

* H. J. Lu:

> On Mon, Sep 21, 2020 at 1:21 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu:
>>
>> >> Putting this data into _rtld_global_ro is not correct if it is required
>> >> after static dlopen.  The current static dlopen approach simply does not
>> >> support that.  You can only assume that libc.so has been initialized
>> >> after static dlopen.
>>
>> > Unless it is removed,  _rtld_global_ro is the right place.  _rtld_global_ro
>> > is initialized by dynamic relocation.  My patch does it.
>>
>> It initializes only part of it.  I find this rather confusing.
>>
>> Why is this data in ld.so in the first place?
>
> I put it in ld.so with
>
> commit e2e4f56056adddc3c1efe676b40a4b4f2453103b
> Author: H.J. Lu <hjl.tools@gmail.com>
> Date:   Thu Aug 13 03:37:47 2015 -0700
>
>     Add _dl_x86_cpu_features to rtld_global
>
>     This patch adds _dl_x86_cpu_features to rtld_global in x86 ld.so
>     and initializes it early before __libc_start_main is called so that
>     cpu_features is always available when it is used and we can avoid
>     calling __init_cpu_features in IFUNC selectors.
>
> which made many things possible, including CET support and
>
> commit bea3f92405f705684275bffee954cafe84ffb09d
> Author: H.J. Lu <hjl.tools@gmail.com>
> Date:   Sun Oct 22 08:24:00 2017 -0700
>
>     x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
>
>     In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
>     mask and bound registers.  It simplifies _dl_runtime_resolve and supports
>     different calling conventions.  ld.so code size is reduced by more than
>     1 KB.  However, use fxsave/xsave/xsavec takes a little bit more cycles
>     than saving and restoring vector and bound registers individually.

Hmm.  I think the current setup is not really ideal, but I guess it's
what we've got now.  I have not reviewed your entire patch series, but I
won't object to this aspect.  (I hope to review it soon, but if anyone
else wants to step in, that's okay, too.)

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



* [PATCH 1/3] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-07-12 14:24 [PATCH 0/3] " H.J. Lu
@ 2020-07-12 14:24 ` H.J. Lu
  0 siblings, 0 replies; 18+ messages in thread
From: H.J. Lu @ 2020-07-12 14:24 UTC (permalink / raw)
  To: libc-alpha

X86 CPU features in ld.so are initialized by init_cpu_features, which is
invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
loaded by a static executable, DL_PLATFORM_INIT is never called.  Also
x86 cache info in libc.o and libc.a is initialized by a constructor
which may be called too late.  Instead, we should initialize x86 CPU
features and cache info by initializing dummy function pointers via
IFUNC relocation.

Note: _dl_x86_init_cpu_features can be called more than once from
DL_PLATFORM_INIT and during relocation.
---
 sysdeps/i386/dl-machine.h          |  3 +--
 sysdeps/x86/cacheinfo.c            | 10 +++++++++-
 sysdeps/x86/dl-get-cpu-features.c  | 23 ++++++++++++++++++++++-
 sysdeps/x86/include/cpu-features.h |  1 +
 sysdeps/x86_64/dl-machine.h        |  3 +--
 5 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/sysdeps/i386/dl-machine.h b/sysdeps/i386/dl-machine.h
index 0f08079e48..5e22e795cc 100644
--- a/sysdeps/i386/dl-machine.h
+++ b/sysdeps/i386/dl-machine.h
@@ -25,7 +25,6 @@
 #include <sysdep.h>
 #include <tls.h>
 #include <dl-tlsdesc.h>
-#include <cpu-features.c>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -250,7 +249,7 @@ dl_platform_init (void)
 #if IS_IN (rtld)
   /* init_cpu_features has been called early from __libc_start_main in
      static executable.  */
-  init_cpu_features (&GLRO(dl_x86_cpu_features));
+  _dl_x86_init_cpu_features ();
 #else
   if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
     /* Avoid an empty string which would disturb us.  */
diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index 217c21c34f..7a325ab70e 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -756,7 +756,6 @@ intel_bug_no_cache_info:
 
 
 static void
-__attribute__((constructor))
 init_cacheinfo (void)
 {
   /* Find out what brand of processor.  */
@@ -770,6 +769,8 @@ init_cacheinfo (void)
   unsigned int threads = 0;
   const struct cpu_features *cpu_features = __get_cpu_features ();
 
+  assert (cpu_features->basic.kind != arch_kind_unknown);
+
   if (cpu_features->basic.kind == arch_kind_intel)
     {
       data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
@@ -894,4 +895,11 @@ init_cacheinfo (void)
 # endif
 }
 
+/* NB: Call init_cacheinfo by initializing a dummy function pointer via
+   IFUNC relocation.  */
+extern void __x86_cacheinfo (void) attribute_hidden;
+const void (*__x86_cacheinfo_p) (void) attribute_hidden
+  = __x86_cacheinfo;
+
+__ifunc (__x86_cacheinfo, __x86_cacheinfo, NULL, void, init_cacheinfo);
 #endif
diff --git a/sysdeps/x86/dl-get-cpu-features.c b/sysdeps/x86/dl-get-cpu-features.c
index 5f9e46b0c6..592a6d5f6b 100644
--- a/sysdeps/x86/dl-get-cpu-features.c
+++ b/sysdeps/x86/dl-get-cpu-features.c
@@ -1,4 +1,4 @@
-/* This file is part of the GNU C Library.
+/* Initialize CPU feature data via IFUNC relocation.
    Copyright (C) 2015-2020 Free Software Foundation, Inc.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -18,6 +18,27 @@
 
 #include <ldsodefs.h>
 
+#ifdef SHARED
+# include <cpu-features.c>
+
+/* NB: Call init_cpu_features by initializing a dummy function pointer
+   via IFUNC relocation.  */
+extern void __x86_cpu_features (void) attribute_hidden;
+const void (*__x86_cpu_features_p) (void) attribute_hidden
+  = __x86_cpu_features;
+
+void
+_dl_x86_init_cpu_features (void)
+{
+  struct cpu_features *cpu_features = __get_cpu_features ();
+  if (cpu_features->basic.kind == arch_kind_unknown)
+    init_cpu_features (cpu_features);
+}
+
+__ifunc (__x86_cpu_features, __x86_cpu_features, NULL, void,
+	 _dl_x86_init_cpu_features);
+#endif
+
 #undef __x86_get_cpu_features
 
 const struct cpu_features *
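
The arch_kind_unknown check above is what makes the initialization
idempotent, as the commit message's note requires: the first caller,
whether the IFUNC resolver or DL_PLATFORM_INIT, does the work, and the
second sees a non-unknown kind and returns.  In sketch form (a
hypothetical guard, assuming zero-initialized state and single-threaded
early startup, so no locking is needed):

static void
init_once (struct cpu_features *cf)
{
  if (cf->basic.kind != arch_kind_unknown)
    return;  /* Already initialized by the other call site.  */
  /* ... run CPUID-based detection and set cf->basic.kind ... */
}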
diff --git a/sysdeps/x86/include/cpu-features.h b/sysdeps/x86/include/cpu-features.h
index dcf29b6fe8..f62be0b9b3 100644
--- a/sysdeps/x86/include/cpu-features.h
+++ b/sysdeps/x86/include/cpu-features.h
@@ -159,6 +159,7 @@ struct cpu_features
 /* Unused for x86.  */
 #  define INIT_ARCH()
 #  define __x86_get_cpu_features(max) (&GLRO(dl_x86_cpu_features))
+extern void _dl_x86_init_cpu_features (void) attribute_hidden;
 # endif
 
 # ifdef __x86_64__
diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
index ca73d8fef9..773e94c8bb 100644
--- a/sysdeps/x86_64/dl-machine.h
+++ b/sysdeps/x86_64/dl-machine.h
@@ -26,7 +26,6 @@
 #include <sysdep.h>
 #include <tls.h>
 #include <dl-tlsdesc.h>
-#include <cpu-features.c>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -225,7 +224,7 @@ dl_platform_init (void)
 #if IS_IN (rtld)
   /* init_cpu_features has been called early from __libc_start_main in
      static executable.  */
-  init_cpu_features (&GLRO(dl_x86_cpu_features));
+  _dl_x86_init_cpu_features ();
 #else
   if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
     /* Avoid an empty string which would disturb us.  */
-- 
2.26.2



end of thread, other threads:[~2020-09-25 12:55 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-12 13:44 [PATCH 0/3] ld.so: Add --list-tunables to print tunable values H.J. Lu
2020-09-12 13:44 ` [PATCH 1/3] x86: Initialize CPU info via IFUNC relocation [BZ 26203] H.J. Lu
2020-09-18  8:06   ` Florian Weimer
2020-09-18 13:55     ` H.J. Lu
2020-09-18 13:59       ` Florian Weimer
2020-09-18 14:18         ` H.J. Lu
2020-09-18 14:22           ` Florian Weimer
2020-09-18 14:30             ` H.J. Lu
2020-09-21  8:21               ` Florian Weimer
2020-09-21 11:31                 ` H.J. Lu
2020-09-25 12:55                   ` Florian Weimer
2020-09-12 13:44 ` [PATCH 2/3] Set tunable value as well as min/max values H.J. Lu
2020-09-18  8:29   ` Florian Weimer
2020-09-18 15:11     ` H.J. Lu
2020-09-12 13:44 ` [PATCH 3/3] ld.so: Add --list-tunables to print tunable values H.J. Lu
2020-09-18  8:13   ` Florian Weimer
2020-09-18 15:24     ` H.J. Lu
  -- strict thread matches above, loose matches on Subject: below --
2020-07-12 14:24 [PATCH 0/3] " H.J. Lu
2020-07-12 14:24 ` [PATCH 1/3] x86: Initialize CPU info via IFUNC relocation [BZ 26203] H.J. Lu
