public inbox for libc-alpha@sourceware.org
* [PATCH v3 0/3] x86: Add support for Zhaoxin processors
@ 2020-04-24 12:29 mayshao-oc
  2020-04-24 12:29 ` [PATCH v3 1/3] x86: Add CPU Vendor ID detection " mayshao-oc
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: mayshao-oc @ 2020-04-24 12:29 UTC (permalink / raw)
  To: libc-alpha
  Cc: hjl.tools, carlos, fw, QiyuanWang, HerryYang, RickyLi, mayshao-oc

This patch series fixes the Shanghai Zhaoxin processor CPU Vendor ID
detection problem in the glibc sysdeps code.  Current glibc doesn't
recognize the Zhaoxin CPU Vendor IDs ("CentaurHauls" and "Shanghai"),
so it sets kind to arch_kind_other.  This leads to incorrect results
from __cache_sysconf(), incorrect values for variables such as
__x86_shared_cache_size, and failure of the test case
tst-get-cpu-features.
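
The user-visible surface of the bug can be sketched with a small
program: cache geometry reaches applications through sysconf(), which
on x86 is backed by __cache_sysconf().  The _SC_LEVEL* names are glibc
extensions, and print_cache_sizes() below is a hypothetical helper,
not part of the series:

```c
#include <stdio.h>
#include <unistd.h>

/* Print the cache sizes that __cache_sysconf() backs on x86.  Before
   this series, an unrecognized vendor made these report wrong values
   on Zhaoxin CPUs.  */
static void
print_cache_sizes (void)
{
  printf ("L1d: %ld\n", sysconf (_SC_LEVEL1_DCACHE_SIZE));
  printf ("L2:  %ld\n", sysconf (_SC_LEVEL2_CACHE_SIZE));
  printf ("L3:  %ld\n", sysconf (_SC_LEVEL3_CACHE_SIZE));
}
```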

v3:
 - Code formatting fixups.
 - Add a new function get_common_info() that factors out the code in
   init_cacheinfo() which computes the values of shared and threads.

v2:
https://sourceware.org/pipermail/libc-alpha/2020-March/112286.html
 - Remove the bit_arch_Prefer_MAP_32BIT_EXEC flag on the Zhaoxin processor
   with family==0x6.

v1:
https://sourceware.org/pipermail/libc-alpha/2019-December/109170.html

This series was checked on x86_64-linux-gnu.

mayshao (3):
  x86: Add CPU Vendor ID detection support for Zhaoxin processors
  x86: Add cache information support for Zhaoxin processors
  x86: Add the test case of __get_cpu_features support for Zhaoxin
    processors

 sysdeps/x86/cacheinfo.c            | 477 ++++++++++++++++++++++---------------
 sysdeps/x86/cpu-features.c         |  54 +++++
 sysdeps/x86/cpu-features.h         |   1 +
 sysdeps/x86/tst-get-cpu-features.c |   2 +
 4 files changed, 338 insertions(+), 196 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3 1/3] x86: Add CPU Vendor ID detection support for Zhaoxin processors
  2020-04-24 12:29 [PATCH v3 0/3] x86: Add support for Zhaoxin processors mayshao-oc
@ 2020-04-24 12:29 ` mayshao-oc
  2020-04-24 12:53   ` H.J. Lu
  2020-04-24 12:29 ` [PATCH v3 2/3] x86: Add cache information " mayshao-oc
  2020-04-24 12:29 ` [PATCH v3 3/3] x86: Add the test case of __get_cpu_features " mayshao-oc
  2 siblings, 1 reply; 20+ messages in thread
From: mayshao-oc @ 2020-04-24 12:29 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, carlos, fw, QiyuanWang, HerryYang, RickyLi, mayshao

From: mayshao <mayshao-oc@zhaoxin.com>

To recognize the Zhaoxin CPU Vendor IDs, add a new architecture type,
arch_kind_zhaoxin, for Zhaoxin vendor detection.
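
For reference, the detection compares the CPUID leaf-0 register triple
against the ASCII bytes of the vendor string.  A small sketch
(decode_vendor() is a hypothetical helper; little-endian x86 byte
order is assumed) shows how the magic constants in the patch map back
to the strings:

```c
#include <string.h>

/* Reassemble the CPUID leaf-0 vendor string from the register triple.
   The 12 ASCII bytes live in EBX, EDX, ECX, in that order.  */
static void
decode_vendor (unsigned int ebx, unsigned int edx, unsigned int ecx,
               char vendor[13])
{
  memcpy (vendor, &ebx, 4);
  memcpy (vendor + 4, &edx, 4);
  memcpy (vendor + 8, &ecx, 4);
  vendor[12] = '\0';
}
```

Note that the "Shanghai" vendor string is padded with spaces to fill
the 12 bytes, so the constants decode to "  Shanghai  ".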
---
 sysdeps/x86/cpu-features.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++
 sysdeps/x86/cpu-features.h |  1 +
 2 files changed, 55 insertions(+)

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 81a170a..bfb415f 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -466,6 +466,60 @@ init_cpu_features (struct cpu_features *cpu_features)
 	  }
 	}
     }
+  /* This spells out "CentaurHauls" or "  Shanghai  ".  */
+  else if ((ebx == 0x746e6543 && ecx == 0x736c7561 && edx == 0x48727561)
+	   || (ebx == 0x68532020 && ecx == 0x20206961 && edx == 0x68676e61))
+    {
+      unsigned int extended_model, stepping;
+
+      kind = arch_kind_zhaoxin;
+
+      get_common_indices (cpu_features, &family, &model, &extended_model,
+			  &stepping);
+
+      get_extended_indices (cpu_features);
+
+      model += extended_model;
+      if (family == 0x6)
+        {
+          if (model == 0xf || model == 0x19)
+            {
+              cpu_features->feature[index_arch_AVX_Usable]
+                &= (~bit_arch_AVX_Usable
+                & ~bit_arch_AVX2_Usable);
+
+              cpu_features->feature[index_arch_Slow_SSE4_2]
+                |= (bit_arch_Slow_SSE4_2);
+
+              cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load]
+                &= ~bit_arch_AVX_Fast_Unaligned_Load;
+            }
+        }
+      else if (family == 0x7)
+        {
+          if (model == 0x1b)
+            {
+              cpu_features->feature[index_arch_AVX_Usable]
+                &= (~bit_arch_AVX_Usable
+                & ~bit_arch_AVX2_Usable);
+
+              cpu_features->feature[index_arch_Slow_SSE4_2]
+                |= bit_arch_Slow_SSE4_2;
+
+              cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load]
+                &= ~bit_arch_AVX_Fast_Unaligned_Load;
+            }
+          else if (model == 0x3b)
+            {
+              cpu_features->feature[index_arch_AVX_Usable]
+                &= (~bit_arch_AVX_Usable
+                & ~bit_arch_AVX2_Usable);
+
+              cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load]
+                &= ~bit_arch_AVX_Fast_Unaligned_Load;
+            }
+        }
+    }
   else
     {
       kind = arch_kind_other;
diff --git a/sysdeps/x86/cpu-features.h b/sysdeps/x86/cpu-features.h
index aea83e6..f05d5ce 100644
--- a/sysdeps/x86/cpu-features.h
+++ b/sysdeps/x86/cpu-features.h
@@ -53,6 +53,7 @@ enum cpu_features_kind
   arch_kind_unknown = 0,
   arch_kind_intel,
   arch_kind_amd,
+  arch_kind_zhaoxin,
   arch_kind_other
 };
 
-- 
2.7.4



* [PATCH v3 2/3] x86: Add cache information support for Zhaoxin processors
  2020-04-24 12:29 [PATCH v3 0/3] x86: Add support for Zhaoxin processors mayshao-oc
  2020-04-24 12:29 ` [PATCH v3 1/3] x86: Add CPU Vendor ID detection " mayshao-oc
@ 2020-04-24 12:29 ` mayshao-oc
  2020-04-24 12:53   ` H.J. Lu
  2020-04-24 12:29 ` [PATCH v3 3/3] x86: Add the test case of __get_cpu_features " mayshao-oc
  2 siblings, 1 reply; 20+ messages in thread
From: mayshao-oc @ 2020-04-24 12:29 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, carlos, fw, QiyuanWang, HerryYang, RickyLi, mayshao

From: mayshao <mayshao-oc@zhaoxin.com>

To obtain Zhaoxin CPU cache information, add a new function
handle_zhaoxin().

Add a new function get_common_info() that factors out the code
in init_cacheinfo() which computes the values of the variables
shared and threads.

Add a Zhaoxin branch in init_cacheinfo() to initialize variables
such as __x86_shared_cache_size.
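
The size computation that handle_zhaoxin() applies to CPUID leaf 4
output can be sketched standalone (cache_size_from_leaf4() is a
hypothetical helper; each CPUID field is encoded minus one, so each
is incremented before multiplying):

```c
/* Compute a cache size from CPUID leaf 4 register values, using the
   same field layout as handle_zhaoxin(): ways, partitions, and line
   size come from EBX, the set count from ECX, all stored minus one.  */
static long int
cache_size_from_leaf4 (unsigned int ebx, unsigned int ecx)
{
  unsigned int ways = (ebx >> 22) + 1;                  /* EBX 31:22 */
  unsigned int partitions = ((ebx >> 12) & 0x3ff) + 1;  /* EBX 21:12 */
  unsigned int line_size = (ebx & 0xfff) + 1;           /* EBX 11:0  */
  unsigned int sets = ecx + 1;
  return (long int) ways * partitions * line_size * sets;
}
```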
---
 sysdeps/x86/cacheinfo.c | 477 ++++++++++++++++++++++++++++--------------------
 1 file changed, 281 insertions(+), 196 deletions(-)

diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index e3e8ef2..14c6094 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -436,6 +436,57 @@ handle_amd (int name)
 }
 
 
+static long int __attribute__ ((noinline))
+handle_zhaoxin (int name)
+{
+  unsigned int eax;
+  unsigned int ebx;
+  unsigned int ecx;
+  unsigned int edx;
+
+  int folded_rel_name = (M(name) / 3) * 3;
+
+  unsigned int round = 0;
+  while (1)
+    {
+      __cpuid_count (4, round, eax, ebx, ecx, edx);
+
+      enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
+      if (type == null)
+        break;
+
+      unsigned int level = (eax >> 5) & 0x7;
+
+      if ((level == 1 && type == data
+        && folded_rel_name == M(_SC_LEVEL1_DCACHE_SIZE))
+        || (level == 1 && type == inst
+            && folded_rel_name == M(_SC_LEVEL1_ICACHE_SIZE))
+        || (level == 2 && folded_rel_name == M(_SC_LEVEL2_CACHE_SIZE))
+        || (level == 3 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE)))
+        {
+          unsigned int offset = M(name) - folded_rel_name;
+
+          if (offset == 0)
+            /* Cache size.  */
+            return (((ebx >> 22) + 1)
+                * (((ebx >> 12) & 0x3ff) + 1)
+                * ((ebx & 0xfff) + 1)
+                * (ecx + 1));
+          if (offset == 1)
+            return (ebx >> 22) + 1;
+
+          assert (offset == 2);
+          return (ebx & 0xfff) + 1;
+        }
+
+      ++round;
+    }
+
+  /* Nothing found.  */
+  return 0;
+}
+
+
 /* Get the value of the system variable NAME.  */
 long int
 attribute_hidden
@@ -449,6 +500,9 @@ __cache_sysconf (int name)
   if (cpu_features->basic.kind == arch_kind_amd)
     return handle_amd (name);
 
+  if (cpu_features->basic.kind == arch_kind_zhaoxin)
+    return handle_zhaoxin (name);
+
   // XXX Fill in more vendors.
 
   /* CPU not known, we have no information.  */
@@ -483,6 +537,223 @@ int __x86_prefetchw attribute_hidden;
 
 
 static void
+get_common_info (long int *shared_ptr, unsigned int *threads_ptr,
+                long int core)
+{
+  unsigned int eax;
+  unsigned int ebx;
+  unsigned int ecx;
+  unsigned int edx;
+
+  /* Number of logical processors sharing L2 cache.  */
+  int threads_l2;
+
+  /* Number of logical processors sharing L3 cache.  */
+  int threads_l3;
+
+  const struct cpu_features *cpu_features = __get_cpu_features ();
+  int max_cpuid = cpu_features->basic.max_cpuid;
+  unsigned int family = cpu_features->basic.family;
+  unsigned int model = cpu_features->basic.model;
+  long int shared = *shared_ptr;
+  unsigned int threads = *threads_ptr;
+  bool inclusive_cache = true;
+  bool ignore_leaf_b = false; 
+
+  /* Try L3 first.  */
+  unsigned int level = 3;
+
+  if (cpu_features->basic.kind == arch_kind_zhaoxin && family == 6)
+    ignore_leaf_b = true;
+  
+  if (shared <= 0)
+    {
+      /* Try L2 otherwise.  */
+      level  = 2;
+      shared = core;
+      threads_l2 = 0;
+      threads_l3 = -1;
+    }
+  else
+    {
+      threads_l2 = 0;
+      threads_l3 = 0;
+    }
+
+  /* A value of 0 for the HTT bit indicates there is only a single
+     logical processor.  */
+  if (HAS_CPU_FEATURE (HTT))
+    {
+      /* Figure out the number of logical threads that share the
+         highest cache level.  */
+      if (max_cpuid >= 4)
+        {
+          int i = 0;
+
+          /* Query until cache level 2 and 3 are enumerated.  */
+          int check = 0x1 | (threads_l3 == 0) << 1;
+          do
+            {
+              __cpuid_count (4, i++, eax, ebx, ecx, edx);
+
+              /* There seems to be a bug in at least some Pentium Ds
+                 which sometimes fail to iterate all cache parameters.
+                 Do not loop indefinitely here, stop in this case and
+                 assume there is no such information.  */
+              if ((eax & 0x1f) == 0 
+                   && cpu_features->basic.kind == arch_kind_intel)
+                goto intel_bug_no_cache_info;
+
+              switch ((eax >> 5) & 0x7)
+                {
+                  default:
+                    break;
+                  case 2:
+                    if ((check & 0x1))
+                      {
+                        /* Get maximum number of logical processors
+                           sharing L2 cache.  */
+                        threads_l2 = (eax >> 14) & 0x3ff;
+                        check &= ~0x1;
+                      }
+                    break;
+                  case 3:
+                    if ((check & (0x1 << 1)))
+                      {
+                        /* Get maximum number of logical processors
+                           sharing L3 cache.  */
+                        threads_l3 = (eax >> 14) & 0x3ff;
+
+                        /* Check if L2 and L3 caches are inclusive.  */
+                        inclusive_cache = (edx & 0x2) != 0;
+                        check &= ~(0x1 << 1);
+                      }
+                    break;
+                }
+            }
+          while (check);
+
+          /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum
+             numbers of addressable IDs for logical processors sharing
+             the cache, instead of the maximum number of threads
+             sharing the cache.  */
+          if ((max_cpuid >= 11) && (!ignore_leaf_b))
+            {
+              /* Find the number of logical processors shipped in
+                 one core and apply count mask.  */
+              i = 0;
+
+              /* Count SMT only if there is L3 cache.  Always count
+                 core if there is no L3 cache.  */
+              int count = ((threads_l2 > 0 && level == 3)
+                           | ((threads_l3 > 0
+                               || (threads_l2 > 0 && level == 2)) << 1));
+
+              while (count)
+                {
+                  __cpuid_count (11, i++, eax, ebx, ecx, edx);
+
+                  int shipped = ebx & 0xff;
+                  int type = ecx & 0xff00;
+                  if (shipped == 0 || type == 0)
+                    break;
+                  else if (type == 0x100)
+                    {
+                      /* Count SMT.  */
+                      if ((count & 0x1))
+                        {
+                          int count_mask;
+
+                          /* Compute count mask.  */
+                          asm ("bsr %1, %0"
+                               : "=r" (count_mask) : "g" (threads_l2));
+                          count_mask = ~(-1 << (count_mask + 1));
+                          threads_l2 = (shipped - 1) & count_mask;
+                          count &= ~0x1;
+                        }
+                    }
+                  else if (type == 0x200)
+                    {
+                      /* Count core.  */
+                      if ((count & (0x1 << 1)))
+                        {
+                          int count_mask;
+                          int threads_core
+                            = (level == 2 ? threads_l2 : threads_l3);
+
+                          /* Compute count mask.  */
+                          asm ("bsr %1, %0"
+                               : "=r" (count_mask) : "g" (threads_core));
+                          count_mask = ~(-1 << (count_mask + 1));
+                          threads_core = (shipped - 1) & count_mask;
+                          if (level == 2)
+                            threads_l2 = threads_core;
+                          else
+                            threads_l3 = threads_core;
+                          count &= ~(0x1 << 1);
+                        }
+                    }
+                }
+            }
+          if (threads_l2 > 0)
+            threads_l2 += 1;
+          if (threads_l3 > 0)
+            threads_l3 += 1;
+          if (level == 2)
+            {
+              if (threads_l2)
+                {
+                  threads = threads_l2;
+                  if (threads > 2 && family == 6
+                     && cpu_features->basic.kind == arch_kind_intel)
+                    switch (model)
+                      {
+                        case 0x37:
+                        case 0x4a:
+                        case 0x4d:
+                        case 0x5a:
+                        case 0x5d:
+                          /* Silvermont has L2 cache shared by 2 cores.  */
+                          threads = 2;
+                          break;
+                        default:
+                          break;
+                      }
+                }
+            }
+          else if (threads_l3)
+            threads = threads_l3;
+        }
+      else
+        {
+intel_bug_no_cache_info:
+          /* Assume that all logical threads share the highest cache
+             level.  */
+          threads
+            = ((cpu_features->cpuid[COMMON_CPUID_INDEX_1].ebx
+                >> 16) & 0xff);
+        }
+
+        /* Cap usage of highest cache level to the number of supported
+           threads.  */
+        if (shared > 0 && threads > 0)
+          shared /= threads;
+    }
+
+  /* Account for non-inclusive L2 and L3 caches.  */
+  if (!inclusive_cache)
+    {
+      if (threads_l2 > 0)
+        core /= threads_l2;
+      shared += core;
+    }
+
+  *shared_ptr = shared;
+  *threads_ptr = threads;
+}
+
+
+static void
 __attribute__((constructor))
 init_cacheinfo (void)
 {
@@ -494,211 +765,25 @@ init_cacheinfo (void)
   int max_cpuid_ex;
   long int data = -1;
   long int shared = -1;
-  unsigned int level;
+  long int core;
   unsigned int threads = 0;
   const struct cpu_features *cpu_features = __get_cpu_features ();
-  int max_cpuid = cpu_features->basic.max_cpuid;
 
   if (cpu_features->basic.kind == arch_kind_intel)
     {
       data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
-
-      long int core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
-      bool inclusive_cache = true;
-
-      /* Try L3 first.  */
-      level  = 3;
+      core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
       shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features);
 
-      /* Number of logical processors sharing L2 cache.  */
-      int threads_l2;
-
-      /* Number of logical processors sharing L3 cache.  */
-      int threads_l3;
-
-      if (shared <= 0)
-	{
-	  /* Try L2 otherwise.  */
-	  level  = 2;
-	  shared = core;
-	  threads_l2 = 0;
-	  threads_l3 = -1;
-	}
-      else
-	{
-	  threads_l2 = 0;
-	  threads_l3 = 0;
-	}
-
-      /* A value of 0 for the HTT bit indicates there is only a single
-	 logical processor.  */
-      if (HAS_CPU_FEATURE (HTT))
-	{
-	  /* Figure out the number of logical threads that share the
-	     highest cache level.  */
-	  if (max_cpuid >= 4)
-	    {
-	      unsigned int family = cpu_features->basic.family;
-	      unsigned int model = cpu_features->basic.model;
-
-	      int i = 0;
-
-	      /* Query until cache level 2 and 3 are enumerated.  */
-	      int check = 0x1 | (threads_l3 == 0) << 1;
-	      do
-		{
-		  __cpuid_count (4, i++, eax, ebx, ecx, edx);
-
-		  /* There seems to be a bug in at least some Pentium Ds
-		     which sometimes fail to iterate all cache parameters.
-		     Do not loop indefinitely here, stop in this case and
-		     assume there is no such information.  */
-		  if ((eax & 0x1f) == 0)
-		    goto intel_bug_no_cache_info;
-
-		  switch ((eax >> 5) & 0x7)
-		    {
-		    default:
-		      break;
-		    case 2:
-		      if ((check & 0x1))
-			{
-			  /* Get maximum number of logical processors
-			     sharing L2 cache.  */
-			  threads_l2 = (eax >> 14) & 0x3ff;
-			  check &= ~0x1;
-			}
-		      break;
-		    case 3:
-		      if ((check & (0x1 << 1)))
-			{
-			  /* Get maximum number of logical processors
-			     sharing L3 cache.  */
-			  threads_l3 = (eax >> 14) & 0x3ff;
-
-			  /* Check if L2 and L3 caches are inclusive.  */
-			  inclusive_cache = (edx & 0x2) != 0;
-			  check &= ~(0x1 << 1);
-			}
-		      break;
-		    }
-		}
-	      while (check);
-
-	      /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum
-		 numbers of addressable IDs for logical processors sharing
-		 the cache, instead of the maximum number of threads
-		 sharing the cache.  */
-	      if (max_cpuid >= 11)
-		{
-		  /* Find the number of logical processors shipped in
-		     one core and apply count mask.  */
-		  i = 0;
-
-		  /* Count SMT only if there is L3 cache.  Always count
-		     core if there is no L3 cache.  */
-		  int count = ((threads_l2 > 0 && level == 3)
-			       | ((threads_l3 > 0
-				   || (threads_l2 > 0 && level == 2)) << 1));
-
-		  while (count)
-		    {
-		      __cpuid_count (11, i++, eax, ebx, ecx, edx);
-
-		      int shipped = ebx & 0xff;
-		      int type = ecx & 0xff00;
-		      if (shipped == 0 || type == 0)
-			break;
-		      else if (type == 0x100)
-			{
-			  /* Count SMT.  */
-			  if ((count & 0x1))
-			    {
-			      int count_mask;
-
-			      /* Compute count mask.  */
-			      asm ("bsr %1, %0"
-				   : "=r" (count_mask) : "g" (threads_l2));
-			      count_mask = ~(-1 << (count_mask + 1));
-			      threads_l2 = (shipped - 1) & count_mask;
-			      count &= ~0x1;
-			    }
-			}
-		      else if (type == 0x200)
-			{
-			  /* Count core.  */
-			  if ((count & (0x1 << 1)))
-			    {
-			      int count_mask;
-			      int threads_core
-				= (level == 2 ? threads_l2 : threads_l3);
-
-			      /* Compute count mask.  */
-			      asm ("bsr %1, %0"
-				   : "=r" (count_mask) : "g" (threads_core));
-			      count_mask = ~(-1 << (count_mask + 1));
-			      threads_core = (shipped - 1) & count_mask;
-			      if (level == 2)
-				threads_l2 = threads_core;
-			      else
-				threads_l3 = threads_core;
-			      count &= ~(0x1 << 1);
-			    }
-			}
-		    }
-		}
-	      if (threads_l2 > 0)
-		threads_l2 += 1;
-	      if (threads_l3 > 0)
-		threads_l3 += 1;
-	      if (level == 2)
-		{
-		  if (threads_l2)
-		    {
-		      threads = threads_l2;
-		      if (threads > 2 && family == 6)
-			switch (model)
-			  {
-			  case 0x37:
-			  case 0x4a:
-			  case 0x4d:
-			  case 0x5a:
-			  case 0x5d:
-			    /* Silvermont has L2 cache shared by 2 cores.  */
-			    threads = 2;
-			    break;
-			  default:
-			    break;
-			  }
-		    }
-		}
-	      else if (threads_l3)
-		threads = threads_l3;
-	    }
-	  else
-	    {
-intel_bug_no_cache_info:
-	      /* Assume that all logical threads share the highest cache
-		 level.  */
-
-	      threads
-		= ((cpu_features->cpuid[COMMON_CPUID_INDEX_1].ebx
-		    >> 16) & 0xff);
-	    }
-
-	  /* Cap usage of highest cache level to the number of supported
-	     threads.  */
-	  if (shared > 0 && threads > 0)
-	    shared /= threads;
-	}
+      get_common_info (&shared, &threads, core);
+    }
+  else if (cpu_features->basic.kind == arch_kind_zhaoxin)
+    {
+      data = handle_zhaoxin (_SC_LEVEL1_DCACHE_SIZE);
+      core = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
+      shared = handle_zhaoxin (_SC_LEVEL3_CACHE_SIZE);
 
-      /* Account for non-inclusive L2 and L3 caches.  */
-      if (!inclusive_cache)
-	{
-	  if (threads_l2 > 0)
-	    core /= threads_l2;
-	  shared += core;
-	}
+      get_common_info (&shared, &threads, core);
     }
   else if (cpu_features->basic.kind == arch_kind_amd)
     {
-- 
2.7.4



* [PATCH v3 3/3] x86: Add the test case of __get_cpu_features support for Zhaoxin processors
  2020-04-24 12:29 [PATCH v3 0/3] x86: Add support for Zhaoxin processors mayshao-oc
  2020-04-24 12:29 ` [PATCH v3 1/3] x86: Add CPU Vendor ID detection " mayshao-oc
  2020-04-24 12:29 ` [PATCH v3 2/3] x86: Add cache information " mayshao-oc
@ 2020-04-24 12:29 ` mayshao-oc
  2020-04-24 12:54   ` H.J. Lu
  2 siblings, 1 reply; 20+ messages in thread
From: mayshao-oc @ 2020-04-24 12:29 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, carlos, fw, QiyuanWang, HerryYang, RickyLi, mayshao

From: mayshao <mayshao-oc@zhaoxin.com>

In the test case for the __get_cpu_features interface, add an entry
to cpu_kinds and a switch case for Zhaoxin support.
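
Since cpu_kinds is indexed directly by cpu_features->basic.kind, the
new entry must occupy the same slot as arch_kind_zhaoxin.  A minimal
sketch of that invariant, with the names mirrored from the patch:

```c
#include <string.h>

/* The string table and the enum must stay in sync: cpu_kinds is
   indexed by the enum value, so "ZHAOXIN" sits at the position of
   arch_kind_zhaoxin.  */
enum cpu_features_kind
{
  arch_kind_unknown = 0,
  arch_kind_intel,
  arch_kind_amd,
  arch_kind_zhaoxin,
  arch_kind_other
};

static const char *const cpu_kinds[] =
{
  "Unknown",
  "Intel",
  "AMD",
  "ZHAOXIN",
  "Other",
};
```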
---
 sysdeps/x86/tst-get-cpu-features.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sysdeps/x86/tst-get-cpu-features.c b/sysdeps/x86/tst-get-cpu-features.c
index 0f55987..0dcb906 100644
--- a/sysdeps/x86/tst-get-cpu-features.c
+++ b/sysdeps/x86/tst-get-cpu-features.c
@@ -38,6 +38,7 @@ static const char * const cpu_kinds[] =
   "Unknown",
   "Intel",
   "AMD",
+  "ZHAOXIN",
   "Other",
 };
 
@@ -50,6 +51,7 @@ do_test (void)
     {
     case arch_kind_intel:
     case arch_kind_amd:
+    case arch_kind_zhaoxin:
     case arch_kind_other:
       printf ("Vendor: %s\n", cpu_kinds[cpu_features->basic.kind]);
       printf ("Family: 0x%x\n", cpu_features->basic.family);
-- 
2.7.4



* Re: [PATCH v3 2/3] x86: Add cache information support for Zhaoxin processors
  2020-04-24 12:29 ` [PATCH v3 2/3] x86: Add cache information " mayshao-oc
@ 2020-04-24 12:53   ` H.J. Lu
  2020-04-26  5:54     ` Mayshao-oc
  0 siblings, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2020-04-24 12:53 UTC (permalink / raw)
  To: mayshao-oc
  Cc: GNU C Library, Carlos O'Donell, Florian Weimer,
	Qiyuan Wang(BJ-RD), Herry Yang(BJ-RD), Ricky Li(BJ-RD)

On Fri, Apr 24, 2020 at 5:29 AM mayshao-oc <mayshao-oc@zhaoxin.com> wrote:
>
> From: mayshao <mayshao-oc@zhaoxin.com>
>
> To obtain Zhaoxin CPU cache information, add a new function
> handle_zhaoxin().
>
> Add a new function get_common_info() that factors out the code
> in init_cacheinfo() which computes the values of the variables
> shared and threads.
>
> Add a Zhaoxin branch in init_cacheinfo() to initialize variables
> such as __x86_shared_cache_size.
> ---
>  sysdeps/x86/cacheinfo.c | 477 ++++++++++++++++++++++++++++--------------------
>  1 file changed, 281 insertions(+), 196 deletions(-)
>
> diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
> index e3e8ef2..14c6094 100644
> --- a/sysdeps/x86/cacheinfo.c
> +++ b/sysdeps/x86/cacheinfo.c
> @@ -436,6 +436,57 @@ handle_amd (int name)
>  }
>
>
> +static long int __attribute__ ((noinline))
> +handle_zhaoxin (int name)
> +{
> +  unsigned int eax;
> +  unsigned int ebx;
> +  unsigned int ecx;
> +  unsigned int edx;
> +
> +  int folded_rel_name = (M(name) / 3) * 3;
> +
> +  unsigned int round = 0;
> +  while (1)
> +    {
> +      __cpuid_count (4, round, eax, ebx, ecx, edx);
> +
> +      enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
> +      if (type == null)
> +        break;
> +
> +      unsigned int level = (eax >> 5) & 0x7;
> +
> +      if ((level == 1 && type == data
> +        && folded_rel_name == M(_SC_LEVEL1_DCACHE_SIZE))
> +        || (level == 1 && type == inst
> +            && folded_rel_name == M(_SC_LEVEL1_ICACHE_SIZE))
> +        || (level == 2 && folded_rel_name == M(_SC_LEVEL2_CACHE_SIZE))
> +        || (level == 3 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE)))
> +        {
> +          unsigned int offset = M(name) - folded_rel_name;
> +
> +          if (offset == 0)
> +            /* Cache size.  */
> +            return (((ebx >> 22) + 1)
> +                * (((ebx >> 12) & 0x3ff) + 1)
> +                * ((ebx & 0xfff) + 1)
> +                * (ecx + 1));
> +          if (offset == 1)
> +            return (ebx >> 22) + 1;
> +
> +          assert (offset == 2);
> +          return (ebx & 0xfff) + 1;
> +        }
> +
> +      ++round;
> +    }
> +
> +  /* Nothing found.  */
> +  return 0;
> +}
> +
> +
>  /* Get the value of the system variable NAME.  */
>  long int
>  attribute_hidden
> @@ -449,6 +500,9 @@ __cache_sysconf (int name)
>    if (cpu_features->basic.kind == arch_kind_amd)
>      return handle_amd (name);
>
> +  if (cpu_features->basic.kind == arch_kind_zhaoxin)
> +    return handle_zhaoxin (name);
> +
>    // XXX Fill in more vendors.
>
>    /* CPU not known, we have no information.  */
> @@ -483,6 +537,223 @@ int __x86_prefetchw attribute_hidden;
>
>
>  static void
> +get_common_info (long int *shared_ptr, unsigned int *threads_ptr,
> +                long int core)

get_common_cache_info

> +{
> +  unsigned int eax;
> +  unsigned int ebx;
> +  unsigned int ecx;
> +  unsigned int edx;
> +
> +  /* Number of logical processors sharing L2 cache.  */
> +  int threads_l2;
> +
> +  /* Number of logical processors sharing L3 cache.  */
> +  int threads_l3;
> +
> +  const struct cpu_features *cpu_features = __get_cpu_features ();
> +  int max_cpuid = cpu_features->basic.max_cpuid;
> +  unsigned int family = cpu_features->basic.family;
> +  unsigned int model = cpu_features->basic.model;
> +  long int shared = *shared_ptr;
> +  unsigned int threads = *threads_ptr;
> +  bool inclusive_cache = true;
> +  bool ignore_leaf_b = false;

Change to support_count_mask.

> +
> +  /* Try L3 first.  */
> +  unsigned int level = 3;
> +
> +  if (cpu_features->basic.kind == arch_kind_zhaoxin && family == 6)
> +    ignore_leaf_b = true;
> +
> +  if (shared <= 0)
> +    {
> +      /* Try L2 otherwise.  */
> +      level  = 2;
> +      shared = core;
> +      threads_l2 = 0;
> +      threads_l3 = -1;
> +    }
> +  else
> +    {
> +      threads_l2 = 0;
> +      threads_l3 = 0;
> +    }
> +
> +  /* A value of 0 for the HTT bit indicates there is only a single
> +     logical processor.  */
> +  if (HAS_CPU_FEATURE (HTT))
> +    {
> +      /* Figure out the number of logical threads that share the
> +         highest cache level.  */
> +      if (max_cpuid >= 4)
> +        {
> +          int i = 0;
> +
> +          /* Query until cache level 2 and 3 are enumerated.  */
> +          int check = 0x1 | (threads_l3 == 0) << 1;
> +          do
> +            {
> +              __cpuid_count (4, i++, eax, ebx, ecx, edx);
> +
> +              /* There seems to be a bug in at least some Pentium Ds
> +                 which sometimes fail to iterate all cache parameters.
> +                 Do not loop indefinitely here, stop in this case and
> +                 assume there is no such information.  */
> +              if ((eax & 0x1f) == 0
> +                   && cpu_features->basic.kind == arch_kind_intel)

Check arch_kind_intel first.

> +                goto intel_bug_no_cache_info;
> +
> +              switch ((eax >> 5) & 0x7)
> +                {
> +                  default:
> +                    break;
> +                  case 2:
> +                    if ((check & 0x1))
> +                      {
> +                        /* Get maximum number of logical processors
> +                           sharing L2 cache.  */
> +                        threads_l2 = (eax >> 14) & 0x3ff;
> +                        check &= ~0x1;
> +                      }
> +                    break;
> +                  case 3:
> +                    if ((check & (0x1 << 1)))
> +                      {
> +                        /* Get maximum number of logical processors
> +                           sharing L3 cache.  */
> +                        threads_l3 = (eax >> 14) & 0x3ff;
> +
> +                        /* Check if L2 and L3 caches are inclusive.  */
> +                        inclusive_cache = (edx & 0x2) != 0;
> +                        check &= ~(0x1 << 1);
> +                      }
> +                    break;
> +                }
> +            }
> +          while (check);
> +
> +          /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum
> +             numbers of addressable IDs for logical processors sharing
> +             the cache, instead of the maximum number of threads
> +             sharing the cache.  */
> +          if ((max_cpuid >= 11) && (!ignore_leaf_b))

Drop unnecessary ().

> +            {
> +              /* Find the number of logical processors shipped in
> +                 one core and apply count mask.  */
> +              i = 0;
> +
> +              /* Count SMT only if there is L3 cache.  Always count
> +                 core if there is no L3 cache.  */
> +              int count = ((threads_l2 > 0 && level == 3)
> +                           | ((threads_l3 > 0
> +                               || (threads_l2 > 0 && level == 2)) << 1));
> +
> +              while (count)
> +                {
> +                  __cpuid_count (11, i++, eax, ebx, ecx, edx);
> +
> +                  int shipped = ebx & 0xff;
> +                  int type = ecx & 0xff00;
> +                  if (shipped == 0 || type == 0)
> +                    break;
> +                  else if (type == 0x100)
> +                    {
> +                      /* Count SMT.  */
> +                      if ((count & 0x1))
> +                        {
> +                          int count_mask;
> +
> +                          /* Compute count mask.  */
> +                          asm ("bsr %1, %0"
> +                               : "=r" (count_mask) : "g" (threads_l2));
> +                          count_mask = ~(-1 << (count_mask + 1));
> +                          threads_l2 = (shipped - 1) & count_mask;
> +                          count &= ~0x1;
> +                        }
> +                    }
> +                  else if (type == 0x200)
> +                    {
> +                      /* Count core.  */
> +                      if ((count & (0x1 << 1)))
> +                        {
> +                          int count_mask;
> +                          int threads_core
> +                            = (level == 2 ? threads_l2 : threads_l3);
> +
> +                          /* Compute count mask.  */
> +                          asm ("bsr %1, %0"
> +                               : "=r" (count_mask) : "g" (threads_core));
> +                          count_mask = ~(-1 << (count_mask + 1));
> +                          threads_core = (shipped - 1) & count_mask;
> +                          if (level == 2)
> +                            threads_l2 = threads_core;
> +                          else
> +                            threads_l3 = threads_core;
> +                          count &= ~(0x1 << 1);
> +                        }
> +                    }
> +                }
> +            }
> +          if (threads_l2 > 0)
> +            threads_l2 += 1;
> +          if (threads_l3 > 0)
> +            threads_l3 += 1;
> +          if (level == 2)
> +            {
> +              if (threads_l2)
> +                {
> +                  threads = threads_l2;
> +                  if (threads > 2 && family == 6
> +                     && cpu_features->basic.kind == arch_kind_intel)

Check arch_kind_intel first.  Put each condition on a separate line.

> +                    switch (model)
> +                      {
> +                        case 0x37:
> +                        case 0x4a:
> +                        case 0x4d:
> +                        case 0x5a:
> +                        case 0x5d:
> +                          /* Silvermont has L2 cache shared by 2 cores.  */
> +                          threads = 2;
> +                          break;
> +                        default:
> +                          break;
> +                      }
> +                }
> +            }
> +          else if (threads_l3)
> +            threads = threads_l3;
> +        }
> +      else
> +        {
> +intel_bug_no_cache_info:
> +          /* Assume that all logical threads share the highest cache
> +             level.  */
> +          threads
> +            = ((cpu_features->cpuid[COMMON_CPUID_INDEX_1].ebx
> +                >> 16) & 0xff);
> +        }
> +
> +        /* Cap usage of highest cache level to the number of supported
> +           threads.  */
> +        if (shared > 0 && threads > 0)
> +          shared /= threads;
> +    }
> +
> +  /* Account for non-inclusive L2 and L3 caches.  */
> +  if (!inclusive_cache)
> +    {
> +      if (threads_l2 > 0)
> +        core /= threads_l2;
> +      shared += core;
> +    }
> +
> +  *shared_ptr = shared;
> +  *threads_ptr = threads;
> +}
> +
> +
> +static void
>  __attribute__((constructor))
>  init_cacheinfo (void)
>  {
> @@ -494,211 +765,25 @@ init_cacheinfo (void)
>    int max_cpuid_ex;
>    long int data = -1;
>    long int shared = -1;
> -  unsigned int level;
> +  long int core;
>    unsigned int threads = 0;
>    const struct cpu_features *cpu_features = __get_cpu_features ();
> -  int max_cpuid = cpu_features->basic.max_cpuid;
>
>    if (cpu_features->basic.kind == arch_kind_intel)
>      {
>        data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
> -
> -      long int core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
> -      bool inclusive_cache = true;
> -
> -      /* Try L3 first.  */
> -      level  = 3;
> +      core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
>        shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features);
>
> -      /* Number of logical processors sharing L2 cache.  */
> -      int threads_l2;
> -
> -      /* Number of logical processors sharing L3 cache.  */
> -      int threads_l3;
> -
> -      if (shared <= 0)
> -       {
> -         /* Try L2 otherwise.  */
> -         level  = 2;
> -         shared = core;
> -         threads_l2 = 0;
> -         threads_l3 = -1;
> -       }
> -      else
> -       {
> -         threads_l2 = 0;
> -         threads_l3 = 0;
> -       }
> -
> -      /* A value of 0 for the HTT bit indicates there is only a single
> -        logical processor.  */
> -      if (HAS_CPU_FEATURE (HTT))
> -       {
> -         /* Figure out the number of logical threads that share the
> -            highest cache level.  */
> -         if (max_cpuid >= 4)
> -           {
> -             unsigned int family = cpu_features->basic.family;
> -             unsigned int model = cpu_features->basic.model;
> -
> -             int i = 0;
> -
> -             /* Query until cache level 2 and 3 are enumerated.  */
> -             int check = 0x1 | (threads_l3 == 0) << 1;
> -             do
> -               {
> -                 __cpuid_count (4, i++, eax, ebx, ecx, edx);
> -
> -                 /* There seems to be a bug in at least some Pentium Ds
> -                    which sometimes fail to iterate all cache parameters.
> -                    Do not loop indefinitely here, stop in this case and
> -                    assume there is no such information.  */
> -                 if ((eax & 0x1f) == 0)
> -                   goto intel_bug_no_cache_info;
> -
> -                 switch ((eax >> 5) & 0x7)
> -                   {
> -                   default:
> -                     break;
> -                   case 2:
> -                     if ((check & 0x1))
> -                       {
> -                         /* Get maximum number of logical processors
> -                            sharing L2 cache.  */
> -                         threads_l2 = (eax >> 14) & 0x3ff;
> -                         check &= ~0x1;
> -                       }
> -                     break;
> -                   case 3:
> -                     if ((check & (0x1 << 1)))
> -                       {
> -                         /* Get maximum number of logical processors
> -                            sharing L3 cache.  */
> -                         threads_l3 = (eax >> 14) & 0x3ff;
> -
> -                         /* Check if L2 and L3 caches are inclusive.  */
> -                         inclusive_cache = (edx & 0x2) != 0;
> -                         check &= ~(0x1 << 1);
> -                       }
> -                     break;
> -                   }
> -               }
> -             while (check);
> -
> -             /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum
> -                numbers of addressable IDs for logical processors sharing
> -                the cache, instead of the maximum number of threads
> -                sharing the cache.  */
> -             if (max_cpuid >= 11)
> -               {
> -                 /* Find the number of logical processors shipped in
> -                    one core and apply count mask.  */
> -                 i = 0;
> -
> -                 /* Count SMT only if there is L3 cache.  Always count
> -                    core if there is no L3 cache.  */
> -                 int count = ((threads_l2 > 0 && level == 3)
> -                              | ((threads_l3 > 0
> -                                  || (threads_l2 > 0 && level == 2)) << 1));
> -
> -                 while (count)
> -                   {
> -                     __cpuid_count (11, i++, eax, ebx, ecx, edx);
> -
> -                     int shipped = ebx & 0xff;
> -                     int type = ecx & 0xff00;
> -                     if (shipped == 0 || type == 0)
> -                       break;
> -                     else if (type == 0x100)
> -                       {
> -                         /* Count SMT.  */
> -                         if ((count & 0x1))
> -                           {
> -                             int count_mask;
> -
> -                             /* Compute count mask.  */
> -                             asm ("bsr %1, %0"
> -                                  : "=r" (count_mask) : "g" (threads_l2));
> -                             count_mask = ~(-1 << (count_mask + 1));
> -                             threads_l2 = (shipped - 1) & count_mask;
> -                             count &= ~0x1;
> -                           }
> -                       }
> -                     else if (type == 0x200)
> -                       {
> -                         /* Count core.  */
> -                         if ((count & (0x1 << 1)))
> -                           {
> -                             int count_mask;
> -                             int threads_core
> -                               = (level == 2 ? threads_l2 : threads_l3);
> -
> -                             /* Compute count mask.  */
> -                             asm ("bsr %1, %0"
> -                                  : "=r" (count_mask) : "g" (threads_core));
> -                             count_mask = ~(-1 << (count_mask + 1));
> -                             threads_core = (shipped - 1) & count_mask;
> -                             if (level == 2)
> -                               threads_l2 = threads_core;
> -                             else
> -                               threads_l3 = threads_core;
> -                             count &= ~(0x1 << 1);
> -                           }
> -                       }
> -                   }
> -               }
> -             if (threads_l2 > 0)
> -               threads_l2 += 1;
> -             if (threads_l3 > 0)
> -               threads_l3 += 1;
> -             if (level == 2)
> -               {
> -                 if (threads_l2)
> -                   {
> -                     threads = threads_l2;
> -                     if (threads > 2 && family == 6)
> -                       switch (model)
> -                         {
> -                         case 0x37:
> -                         case 0x4a:
> -                         case 0x4d:
> -                         case 0x5a:
> -                         case 0x5d:
> -                           /* Silvermont has L2 cache shared by 2 cores.  */
> -                           threads = 2;
> -                           break;
> -                         default:
> -                           break;
> -                         }
> -                   }
> -               }
> -             else if (threads_l3)
> -               threads = threads_l3;
> -           }
> -         else
> -           {
> -intel_bug_no_cache_info:
> -             /* Assume that all logical threads share the highest cache
> -                level.  */
> -
> -             threads
> -               = ((cpu_features->cpuid[COMMON_CPUID_INDEX_1].ebx
> -                   >> 16) & 0xff);
> -           }
> -
> -         /* Cap usage of highest cache level to the number of supported
> -            threads.  */
> -         if (shared > 0 && threads > 0)
> -           shared /= threads;
> -       }
> +      get_common_info (&shared, &threads, core);
> +    }
> +  else if (cpu_features->basic.kind == arch_kind_zhaoxin)
> +    {
> +      data = handle_zhaoxin (_SC_LEVEL1_DCACHE_SIZE);
> +      core = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
> +      shared = handle_zhaoxin (_SC_LEVEL3_CACHE_SIZE);
>
> -      /* Account for non-inclusive L2 and L3 caches.  */
> -      if (!inclusive_cache)
> -       {
> -         if (threads_l2 > 0)
> -           core /= threads_l2;
> -         shared += core;
> -       }
> +      get_common_info (&shared, &threads, core);
>      }
>    else if (cpu_features->basic.kind == arch_kind_amd)
>      {
> --
> 2.7.4
>


-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/3] x86: Add CPU Vendor ID detection support for Zhaoxin processors
  2020-04-24 12:29 ` [PATCH v3 1/3] x86: Add CPU Vendor ID detection " mayshao-oc
@ 2020-04-24 12:53   ` H.J. Lu
  0 siblings, 0 replies; 20+ messages in thread
From: H.J. Lu @ 2020-04-24 12:53 UTC (permalink / raw)
  To: mayshao-oc
  Cc: GNU C Library, Carlos O'Donell, Florian Weimer,
	Qiyuan Wang(BJ-RD), Herry Yang(BJ-RD), Ricky Li(BJ-RD)

On Fri, Apr 24, 2020 at 5:29 AM mayshao-oc <mayshao-oc@zhaoxin.com> wrote:
>
> From: mayshao <mayshao-oc@zhaoxin.com>
>
> To recognize Zhaoxin CPU Vendor ID, add a new architecture type
> arch_kind_zhaoxin for Vendor Zhaoxin detection.
> ---
>  sysdeps/x86/cpu-features.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++
>  sysdeps/x86/cpu-features.h |  1 +
>  2 files changed, 55 insertions(+)
>
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index 81a170a..bfb415f 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -466,6 +466,60 @@ init_cpu_features (struct cpu_features *cpu_features)
>           }
>         }
>      }
> +  /* This spells out "CentaurHauls" or " Shanghai ".  */
> +  else if ((ebx == 0x746e6543 && ecx == 0x736c7561 && edx == 0x48727561)
> +          || (ebx == 0x68532020 && ecx == 0x20206961 && edx == 0x68676e61))
> +    {
> +      unsigned int extended_model, stepping;
> +
> +      kind = arch_kind_zhaoxin;
> +
> +      get_common_indices (cpu_features, &family, &model, &extended_model,
> +                         &stepping);
> +
> +      get_extended_indices (cpu_features);
> +
> +      model += extended_model;
> +      if (family == 0x6)
> +        {
> +          if (model == 0xf || model == 0x19)
> +            {
> +              cpu_features->feature[index_arch_AVX_Usable]
> +                &= (~bit_arch_AVX_Usable
> +                & ~bit_arch_AVX2_Usable);
> +
> +              cpu_features->feature[index_arch_Slow_SSE4_2]
> +                |= (bit_arch_Slow_SSE4_2);
> +
> +              cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load]
> +                &= ~bit_arch_AVX_Fast_Unaligned_Load;
> +            }
> +        }
> +      else if (family == 0x7)
> +        {
> +          if (model == 0x1b)
> +            {
> +              cpu_features->feature[index_arch_AVX_Usable]
> +                &= (~bit_arch_AVX_Usable
> +                & ~bit_arch_AVX2_Usable);
> +
> +              cpu_features->feature[index_arch_Slow_SSE4_2]
> +                |= bit_arch_Slow_SSE4_2;
> +
> +              cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load]
> +                &= ~bit_arch_AVX_Fast_Unaligned_Load;
> +           }
> +         else if (model == 0x3b)
> +           {
> +             cpu_features->feature[index_arch_AVX_Usable]
> +               &= (~bit_arch_AVX_Usable
> +               & ~bit_arch_AVX2_Usable);
> +
> +               cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load]
> +               &= ~bit_arch_AVX_Fast_Unaligned_Load;
> +           }
> +       }
> +    }
>    else
>      {
>        kind = arch_kind_other;
> diff --git a/sysdeps/x86/cpu-features.h b/sysdeps/x86/cpu-features.h
> index aea83e6..f05d5ce 100644
> --- a/sysdeps/x86/cpu-features.h
> +++ b/sysdeps/x86/cpu-features.h
> @@ -53,6 +53,7 @@ enum cpu_features_kind
>    arch_kind_unknown = 0,
>    arch_kind_intel,
>    arch_kind_amd,
> +  arch_kind_zhaoxin,
>    arch_kind_other
>  };
>
> --
> 2.7.4
>

LGTM.

-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 3/3] x86: Add the test case of __get_cpu_features support for Zhaoxin processors
  2020-04-24 12:29 ` [PATCH v3 3/3] x86: Add the test case of __get_cpu_features " mayshao-oc
@ 2020-04-24 12:54   ` H.J. Lu
  0 siblings, 0 replies; 20+ messages in thread
From: H.J. Lu @ 2020-04-24 12:54 UTC (permalink / raw)
  To: mayshao-oc
  Cc: GNU C Library, Carlos O'Donell, Florian Weimer,
	Qiyuan Wang(BJ-RD), Herry Yang(BJ-RD), Ricky Li(BJ-RD)

On Fri, Apr 24, 2020 at 5:29 AM mayshao-oc <mayshao-oc@zhaoxin.com> wrote:
>
> From: mayshao <mayshao-oc@zhaoxin.com>
>
> For the test case of the __get_cpu_features interface, add an item in
> cpu_kinds and a switch case for Zhaoxin support.
> ---
>  sysdeps/x86/tst-get-cpu-features.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/sysdeps/x86/tst-get-cpu-features.c b/sysdeps/x86/tst-get-cpu-features.c
> index 0f55987..0dcb906 100644
> --- a/sysdeps/x86/tst-get-cpu-features.c
> +++ b/sysdeps/x86/tst-get-cpu-features.c
> @@ -38,6 +38,7 @@ static const char * const cpu_kinds[] =
>    "Unknown",
>    "Intel",
>    "AMD",
> +  "ZHAOXIN",
>    "Other",
>  };
>
> @@ -50,6 +51,7 @@ do_test (void)
>      {
>      case arch_kind_intel:
>      case arch_kind_amd:
> +    case arch_kind_zhaoxin:
>      case arch_kind_other:
>        printf ("Vendor: %s\n", cpu_kinds[cpu_features->basic.kind]);
>        printf ("Family: 0x%x\n", cpu_features->basic.family);
> --
> 2.7.4
>

LGTM.

-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 2/3] x86: Add cache information support for Zhaoxin processors
  2020-04-24 12:53   ` H.J. Lu
@ 2020-04-26  5:54     ` Mayshao-oc
  2020-04-26 12:07       ` H.J. Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Mayshao-oc @ 2020-04-26  5:54 UTC (permalink / raw)
  To: H.J. Lu
  Cc: GNU C Library, Carlos O'Donell, Florian Weimer,
	Qiyuan Wang(BJ-RD), Herry Yang(BJ-RD), Ricky Li(BJ-RD)


On Fri, Apr 24, 2020 at 8:53 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> 
> On Fri, Apr 24, 2020 at 5:29 AM mayshao-oc <mayshao-oc@zhaoxin.com>
> wrote:
> >
> > From: mayshao <mayshao-oc@zhaoxin.com>
> >
> > To obtain Zhaoxin CPU cache information, add a new function
> > handle_zhaoxin().
> >
> > Add a new function get_common_info() that factors out the code in
> > init_cacheinfo() used to compute the values of the shared and
> > threads variables.
> >
> > Add Zhaoxin branch in init_cacheinfo() for initializing variables,
> > such as __x86_shared_cache_size.
> > ---
> >  sysdeps/x86/cacheinfo.c | 477 ++++++++++++++++++++++++++++--------------------
> >  1 file changed, 281 insertions(+), 196 deletions(-)
> >
> > diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
> > index e3e8ef2..14c6094 100644
> > --- a/sysdeps/x86/cacheinfo.c
> > +++ b/sysdeps/x86/cacheinfo.c
> > @@ -436,6 +436,57 @@ handle_amd (int name)
> >  }
> >
> >
> > +static long int __attribute__ ((noinline))
> > +handle_zhaoxin (int name)
> > +{
> > +  unsigned int eax;
> > +  unsigned int ebx;
> > +  unsigned int ecx;
> > +  unsigned int edx;
> > +
> > +  int folded_rel_name = (M(name) / 3) * 3;
> > +
> > +  unsigned int round = 0;
> > +  while (1)
> > +    {
> > +      __cpuid_count (4, round, eax, ebx, ecx, edx);
> > +
> > +      enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
> > +      if (type == null)
> > +        break;
> > +
> > +      unsigned int level = (eax >> 5) & 0x7;
> > +
> > +      if ((level == 1 && type == data
> > +        && folded_rel_name == M(_SC_LEVEL1_DCACHE_SIZE))
> > +        || (level == 1 && type == inst
> > +            && folded_rel_name == M(_SC_LEVEL1_ICACHE_SIZE))
> > +        || (level == 2 && folded_rel_name == M(_SC_LEVEL2_CACHE_SIZE))
> > +        || (level == 3 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE)))
> > +        {
> > +          unsigned int offset = M(name) - folded_rel_name;
> > +
> > +          if (offset == 0)
> > +            /* Cache size.  */
> > +            return (((ebx >> 22) + 1)
> > +                * (((ebx >> 12) & 0x3ff) + 1)
> > +                * ((ebx & 0xfff) + 1)
> > +                * (ecx + 1));
> > +          if (offset == 1)
> > +            return (ebx >> 22) + 1;
> > +
> > +          assert (offset == 2);
> > +          return (ebx & 0xfff) + 1;
> > +        }
> > +
> > +      ++round;
> > +    }
> > +
> > +  /* Nothing found.  */
> > +  return 0;
> > +}
> > +
> > +
> >  /* Get the value of the system variable NAME.  */
> >  long int
> >  attribute_hidden
> > @@ -449,6 +500,9 @@ __cache_sysconf (int name)
> >    if (cpu_features->basic.kind == arch_kind_amd)
> >      return handle_amd (name);
> >
> > +  if (cpu_features->basic.kind == arch_kind_zhaoxin)
> > +    return handle_zhaoxin (name);
> > +
> >    // XXX Fill in more vendors.
> >
> >    /* CPU not known, we have no information.  */
> > @@ -483,6 +537,223 @@ int __x86_prefetchw attribute_hidden;
> >
> >
> >  static void
> > +get_common_info (long int *shared_ptr, unsigned int *threads_ptr,
> > +                long int core)
> 
> get_common_cache_info

Fixed.

> > +{
> > +  unsigned int eax;
> > +  unsigned int ebx;
> > +  unsigned int ecx;
> > +  unsigned int edx;
> > +
> > +  /* Number of logical processors sharing L2 cache.  */
> > +  int threads_l2;
> > +
> > +  /* Number of logical processors sharing L3 cache.  */
> > +  int threads_l3;
> > +
> > +  const struct cpu_features *cpu_features = __get_cpu_features ();
> > +  int max_cpuid = cpu_features->basic.max_cpuid;
> > +  unsigned int family = cpu_features->basic.family;
> > +  unsigned int model = cpu_features->basic.model;
> > +  long int shared = *shared_ptr;
> > +  unsigned int threads = *threads_ptr;
> > +  bool inclusive_cache = true;
> > +  bool ignore_leaf_b = false;
> 
> Change to support_count_mask.

Fixed.

> > +
> > +  /* Try L3 first.  */
> > +  unsigned int level = 3;
> > +
> > +  if (cpu_features->basic.kind == arch_kind_zhaoxin && family == 6)
> > +    ignore_leaf_b = true;
> > +
> > +  if (shared <= 0)
> > +    {
> > +      /* Try L2 otherwise.  */
> > +      level  = 2;
> > +      shared = core;
> > +      threads_l2 = 0;
> > +      threads_l3 = -1;
> > +    }
> > +  else
> > +    {
> > +      threads_l2 = 0;
> > +      threads_l3 = 0;
> > +    }
> > +
> > +  /* A value of 0 for the HTT bit indicates there is only a single
> > +     logical processor.  */
> > +  if (HAS_CPU_FEATURE (HTT))
> > +    {
> > +      /* Figure out the number of logical threads that share the
> > +         highest cache level.  */
> > +      if (max_cpuid >= 4)
> > +        {
> > +          int i = 0;
> > +
> > +          /* Query until cache level 2 and 3 are enumerated.  */
> > +          int check = 0x1 | (threads_l3 == 0) << 1;
> > +          do
> > +            {
> > +              __cpuid_count (4, i++, eax, ebx, ecx, edx);
> > +
> > +              /* There seems to be a bug in at least some Pentium Ds
> > +                 which sometimes fail to iterate all cache parameters.
> > +                 Do not loop indefinitely here, stop in this case and
> > +                 assume there is no such information.  */
> > +              if ((eax & 0x1f) == 0
> > +                   && cpu_features->basic.kind == arch_kind_intel)
> 
> Check arch_kind_intel first.

Fixed. 

> > +                goto intel_bug_no_cache_info;
> > +
> > +              switch ((eax >> 5) & 0x7)
> > +                {
> > +                  default:
> > +                    break;
> > +                  case 2:
> > +                    if ((check & 0x1))
> > +                      {
> > +                        /* Get maximum number of logical processors
> > +                           sharing L2 cache.  */
> > +                        threads_l2 = (eax >> 14) & 0x3ff;
> > +                        check &= ~0x1;
> > +                      }
> > +                    break;
> > +                  case 3:
> > +                    if ((check & (0x1 << 1)))
> > +                      {
> > +                        /* Get maximum number of logical processors
> > +                           sharing L3 cache.  */
> > +                        threads_l3 = (eax >> 14) & 0x3ff;
> > +
> > +                        /* Check if L2 and L3 caches are inclusive.  */
> > +                        inclusive_cache = (edx & 0x2) != 0;
> > +                        check &= ~(0x1 << 1);
> > +                      }
> > +                    break;
> > +                }
> > +            }
> > +          while (check);
> > +
> > +          /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum
> > +             numbers of addressable IDs for logical processors sharing
> > +             the cache, instead of the maximum number of threads
> > +             sharing the cache.  */
> > +          if ((max_cpuid >= 11) && (!ignore_leaf_b))
> 
> Drop unnecessary ().

Fixed.

> > +            {
> > +              /* Find the number of logical processors shipped in
> > +                 one core and apply count mask.  */
> > +              i = 0;
> > +
> > +              /* Count SMT only if there is L3 cache.  Always count
> > +                 core if there is no L3 cache.  */
> > +              int count = ((threads_l2 > 0 && level == 3)
> > +                           | ((threads_l3 > 0
> > +                               || (threads_l2 > 0 && level == 2)) << 1));
> > +
> > +              while (count)
> > +                {
> > +                  __cpuid_count (11, i++, eax, ebx, ecx, edx);
> > +
> > +                  int shipped = ebx & 0xff;
> > +                  int type = ecx & 0xff00;
> > +                  if (shipped == 0 || type == 0)
> > +                    break;
> > +                  else if (type == 0x100)
> > +                    {
> > +                      /* Count SMT.  */
> > +                      if ((count & 0x1))
> > +                        {
> > +                          int count_mask;
> > +
> > +                          /* Compute count mask.  */
> > +                          asm ("bsr %1, %0"
> > +                               : "=r" (count_mask) : "g" (threads_l2));
> > +                          count_mask = ~(-1 << (count_mask + 1));
> > +                          threads_l2 = (shipped - 1) & count_mask;
> > +                          count &= ~0x1;
> > +                        }
> > +                    }
> > +                  else if (type == 0x200)
> > +                    {
> > +                      /* Count core.  */
> > +                      if ((count & (0x1 << 1)))
> > +                        {
> > +                          int count_mask;
> > +                          int threads_core
> > +                            = (level == 2 ? threads_l2 : threads_l3);
> > +
> > +                          /* Compute count mask.  */
> > +                          asm ("bsr %1, %0"
> > +                               : "=r" (count_mask) : "g" (threads_core));
> > +                          count_mask = ~(-1 << (count_mask + 1));
> > +                          threads_core = (shipped - 1) & count_mask;
> > +                          if (level == 2)
> > +                            threads_l2 = threads_core;
> > +                          else
> > +                            threads_l3 = threads_core;
> > +                          count &= ~(0x1 << 1);
> > +                        }
> > +                    }
> > +                }
> > +            }
> > +          if (threads_l2 > 0)
> > +            threads_l2 += 1;
> > +          if (threads_l3 > 0)
> > +            threads_l3 += 1;
> > +          if (level == 2)
> > +            {
> > +              if (threads_l2)
> > +                {
> > +                  threads = threads_l2;
> > +                  if (threads > 2 && family == 6
> > +                     && cpu_features->basic.kind == arch_kind_intel)
> 
> Check arch_kind_intel first.  Put each condition on a separate line.

Fixed.

> > +                    switch (model)
> > +                      {
> > +                        case 0x37:
> > +                        case 0x4a:
> > +                        case 0x4d:
> > +                        case 0x5a:
> > +                        case 0x5d:
> > +                          /* Silvermont has L2 cache shared by 2 cores.  */
> > +                          threads = 2;
> > +                          break;
> > +                        default:
> > +                          break;
> > +                      }
> > +                }
> > +            }
> > +          else if (threads_l3)
> > +            threads = threads_l3;
> > +        }
> > +      else
> > +        {
> > +intel_bug_no_cache_info:
> > +          /* Assume that all logical threads share the highest cache
> > +             level.  */
> > +          threads
> > +            = ((cpu_features->cpuid[COMMON_CPUID_INDEX_1].ebx
> > +                >> 16) & 0xff);
> > +        }
> > +
> > +        /* Cap usage of highest cache level to the number of supported
> > +           threads.  */
> > +        if (shared > 0 && threads > 0)
> > +          shared /= threads;
> > +    }
> > +
> > +  /* Account for non-inclusive L2 and L3 caches.  */
> > +  if (!inclusive_cache)
> > +    {
> > +      if (threads_l2 > 0)
> > +        core /= threads_l2;
> > +      shared += core;
> > +    }
> > +
> > +  *shared_ptr = shared;
> > +  *threads_ptr = threads;
> > +}
> > +
> > +
> > +static void
> >  __attribute__((constructor))
> >  init_cacheinfo (void)
> >  {
> > @@ -494,211 +765,25 @@ init_cacheinfo (void)
> >    int max_cpuid_ex;
> >    long int data = -1;
> >    long int shared = -1;
> > -  unsigned int level;
> > +  long int core;
> >    unsigned int threads = 0;
> >    const struct cpu_features *cpu_features = __get_cpu_features ();
> > -  int max_cpuid = cpu_features->basic.max_cpuid;
> >
> >    if (cpu_features->basic.kind == arch_kind_intel)
> >      {
> >        data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
> > -
> > -      long int core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
> > -      bool inclusive_cache = true;
> > -
> > -      /* Try L3 first.  */
> > -      level  = 3;
> > +      core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
> >        shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features);
> >
> > -      /* Number of logical processors sharing L2 cache.  */
> > -      int threads_l2;
> > -
> > -      /* Number of logical processors sharing L3 cache.  */
> > -      int threads_l3;
> > -
> > -      if (shared <= 0)
> > -       {
> > -         /* Try L2 otherwise.  */
> > -         level  = 2;
> > -         shared = core;
> > -         threads_l2 = 0;
> > -         threads_l3 = -1;
> > -       }
> > -      else
> > -       {
> > -         threads_l2 = 0;
> > -         threads_l3 = 0;
> > -       }
> > -
> > -      /* A value of 0 for the HTT bit indicates there is only a single
> > -        logical processor.  */
> > -      if (HAS_CPU_FEATURE (HTT))
> > -       {
> > -         /* Figure out the number of logical threads that share the
> > -            highest cache level.  */
> > -         if (max_cpuid >= 4)
> > -           {
> > -             unsigned int family = cpu_features->basic.family;
> > -             unsigned int model = cpu_features->basic.model;
> > -
> > -             int i = 0;
> > -
> > -             /* Query until cache level 2 and 3 are enumerated.  */
> > -             int check = 0x1 | (threads_l3 == 0) << 1;
> > -             do
> > -               {
> > -                 __cpuid_count (4, i++, eax, ebx, ecx, edx);
> > -
> > -                 /* There seems to be a bug in at least some Pentium Ds
> > -                    which sometimes fail to iterate all cache parameters.
> > -                    Do not loop indefinitely here, stop in this case and
> > -                    assume there is no such information.  */
> > -                 if ((eax & 0x1f) == 0)
> > -                   goto intel_bug_no_cache_info;
> > -
> > -                 switch ((eax >> 5) & 0x7)
> > -                   {
> > -                   default:
> > -                     break;
> > -                   case 2:
> > -                     if ((check & 0x1))
> > -                       {
> > -                         /* Get maximum number of logical processors
> > -                            sharing L2 cache.  */
> > -                         threads_l2 = (eax >> 14) & 0x3ff;
> > -                         check &= ~0x1;
> > -                       }
> > -                     break;
> > -                   case 3:
> > -                     if ((check & (0x1 << 1)))
> > -                       {
> > -                         /* Get maximum number of logical processors
> > -                            sharing L3 cache.  */
> > -                         threads_l3 = (eax >> 14) & 0x3ff;
> > -
> > -                         /* Check if L2 and L3 caches are inclusive.  */
> > -                         inclusive_cache = (edx & 0x2) != 0;
> > -                         check &= ~(0x1 << 1);
> > -                       }
> > -                     break;
> > -                   }
> > -               }
> > -             while (check);
> > -
> > -             /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum
> > -                numbers of addressable IDs for logical processors sharing
> > -                the cache, instead of the maximum number of threads
> > -                sharing the cache.  */
> > -             if (max_cpuid >= 11)
> > -               {
> > -                 /* Find the number of logical processors shipped in
> > -                    one core and apply count mask.  */
> > -                 i = 0;
> > -
> > -                 /* Count SMT only if there is L3 cache.  Always count
> > -                    core if there is no L3 cache.  */
> > -                 int count = ((threads_l2 > 0 && level == 3)
> > -                              | ((threads_l3 > 0
> > -                                  || (threads_l2 > 0 && level == 2))
> << 1));
> > -
> > -                 while (count)
> > -                   {
> > -                     __cpuid_count (11, i++, eax, ebx, ecx, edx);
> > -
> > -                     int shipped = ebx & 0xff;
> > -                     int type = ecx & 0xff00;
> > -                     if (shipped == 0 || type == 0)
> > -                       break;
> > -                     else if (type == 0x100)
> > -                       {
> > -                         /* Count SMT.  */
> > -                         if ((count & 0x1))
> > -                           {
> > -                             int count_mask;
> > -
> > -                             /* Compute count mask.  */
> > -                             asm ("bsr %1, %0"
> > -                                  : "=r" (count_mask) : "g"
> (threads_l2));
> > -                             count_mask = ~(-1 << (count_mask + 1));
> > -                             threads_l2 = (shipped - 1) & count_mask;
> > -                             count &= ~0x1;
> > -                           }
> > -                       }
> > -                     else if (type == 0x200)
> > -                       {
> > -                         /* Count core.  */
> > -                         if ((count & (0x1 << 1)))
> > -                           {
> > -                             int count_mask;
> > -                             int threads_core
> > -                               = (level == 2 ? threads_l2 :
> threads_l3);
> > -
> > -                             /* Compute count mask.  */
> > -                             asm ("bsr %1, %0"
> > -                                  : "=r" (count_mask) : "g"
> (threads_core));
> > -                             count_mask = ~(-1 << (count_mask + 1));
> > -                             threads_core = (shipped - 1) &
> count_mask;
> > -                             if (level == 2)
> > -                               threads_l2 = threads_core;
> > -                             else
> > -                               threads_l3 = threads_core;
> > -                             count &= ~(0x1 << 1);
> > -                           }
> > -                       }
> > -                   }
> > -               }
> > -             if (threads_l2 > 0)
> > -               threads_l2 += 1;
> > -             if (threads_l3 > 0)
> > -               threads_l3 += 1;
> > -             if (level == 2)
> > -               {
> > -                 if (threads_l2)
> > -                   {
> > -                     threads = threads_l2;
> > -                     if (threads > 2 && family == 6)
> > -                       switch (model)
> > -                         {
> > -                         case 0x37:
> > -                         case 0x4a:
> > -                         case 0x4d:
> > -                         case 0x5a:
> > -                         case 0x5d:
> > -                           /* Silvermont has L2 cache shared by 2
> cores.  */
> > -                           threads = 2;
> > -                           break;
> > -                         default:
> > -                           break;
> > -                         }
> > -                   }
> > -               }
> > -             else if (threads_l3)
> > -               threads = threads_l3;
> > -           }
> > -         else
> > -           {
> > -intel_bug_no_cache_info:
> > -             /* Assume that all logical threads share the highest cache
> > -                level.  */
> > -
> > -             threads
> > -               = ((cpu_features->cpuid[COMMON_CPUID_INDEX_1].ebx
> > -                   >> 16) & 0xff);
> > -           }
> > -
> > -         /* Cap usage of highest cache level to the number of supported
> > -            threads.  */
> > -         if (shared > 0 && threads > 0)
> > -           shared /= threads;
> > -       }
> > +      get_common_info (&shared, &threads, core);
> > +    }
> > +  else if (cpu_features->basic.kind == arch_kind_zhaoxin)
> > +    {
> > +      data = handle_zhaoxin (_SC_LEVEL1_DCACHE_SIZE);
> > +      core = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
> > +      shared = handle_zhaoxin (_SC_LEVEL3_CACHE_SIZE);
> >
> > -      /* Account for non-inclusive L2 and L3 caches.  */
> > -      if (!inclusive_cache)
> > -       {
> > -         if (threads_l2 > 0)
> > -           core /= threads_l2;
> > -         shared += core;
> > -       }
> > +      get_common_info (&shared, &threads, core);
> >      }
> >    else if (cpu_features->basic.kind == arch_kind_amd)
> >      {
> > --
> > 2.7.4
> >

My mail client may mangle the format of the patch, so I have
attached the updated patch to this email; please check it.

Thank you so much.


Best Regards,
May Shao


[-- Attachment #2: 0002-x86-Add-cache-information-support-for-Zhaoxin-proces.patch --]
[-- Type: application/octet-stream, Size: 16075 bytes --]

From 2f09779cfaa9d14e14ef5f957702a9cb89339abe Mon Sep 17 00:00:00 2001
From: mayshao-oc <mayshao-oc@zhaoxin.com>
Date: Sun, 26 Apr 2020 13:48:27 +0800
Subject: [PATCH v3 2/3] x86: Add cache information support for Zhaoxin processors

To obtain Zhaoxin CPU cache information, add a new function
handle_zhaoxin().

Add a new function get_common_cache_info(), extracted from
init_cacheinfo(), to compute the values of the variables shared and
threads.

Add Zhaoxin branch in init_cacheinfo() for initializing variables,
such as __x86_shared_cache_size.
---
 sysdeps/x86/cacheinfo.c | 478 ++++++++++++++++++++++++++++--------------------
 1 file changed, 282 insertions(+), 196 deletions(-)

diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index e3e8ef2..436c05d 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -436,6 +436,57 @@ handle_amd (int name)
 }
 
 
+static long int __attribute__ ((noinline))
+handle_zhaoxin (int name)
+{
+  unsigned int eax;
+  unsigned int ebx;
+  unsigned int ecx;
+  unsigned int edx;
+
+  int folded_rel_name = (M(name) / 3) * 3;
+
+  unsigned int round = 0;
+  while (1)
+    {
+      __cpuid_count (4, round, eax, ebx, ecx, edx);
+
+      enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
+      if (type == null)
+        break;
+
+      unsigned int level = (eax >> 5) & 0x7;
+
+      if ((level == 1 && type == data
+        && folded_rel_name == M(_SC_LEVEL1_DCACHE_SIZE))
+        || (level == 1 && type == inst
+            && folded_rel_name == M(_SC_LEVEL1_ICACHE_SIZE))
+        || (level == 2 && folded_rel_name == M(_SC_LEVEL2_CACHE_SIZE))
+        || (level == 3 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE)))
+        {
+          unsigned int offset = M(name) - folded_rel_name;
+
+          if (offset == 0)
+            /* Cache size.  */
+            return (((ebx >> 22) + 1)
+                * (((ebx >> 12) & 0x3ff) + 1)
+                * ((ebx & 0xfff) + 1)
+                * (ecx + 1));
+          if (offset == 1)
+            return (ebx >> 22) + 1;
+
+          assert (offset == 2);
+          return (ebx & 0xfff) + 1;
+        }
+
+      ++round;
+    }
+
+  /* Nothing found.  */
+  return 0;
+}
+
+
 /* Get the value of the system variable NAME.  */
 long int
 attribute_hidden
@@ -449,6 +500,9 @@ __cache_sysconf (int name)
   if (cpu_features->basic.kind == arch_kind_amd)
     return handle_amd (name);
 
+  if (cpu_features->basic.kind == arch_kind_zhaoxin)
+    return handle_zhaoxin (name);
+
   // XXX Fill in more vendors.
 
   /* CPU not known, we have no information.  */
@@ -483,6 +537,224 @@ int __x86_prefetchw attribute_hidden;
 
 
 static void
+get_common_cache_info (long int *shared_ptr, unsigned int *threads_ptr,
+                long int core)
+{
+  unsigned int eax;
+  unsigned int ebx;
+  unsigned int ecx;
+  unsigned int edx;
+
+  /* Number of logical processors sharing L2 cache.  */
+  int threads_l2;
+
+  /* Number of logical processors sharing L3 cache.  */
+  int threads_l3;
+
+  const struct cpu_features *cpu_features = __get_cpu_features ();
+  int max_cpuid = cpu_features->basic.max_cpuid;
+  unsigned int family = cpu_features->basic.family;
+  unsigned int model = cpu_features->basic.model;
+  long int shared = *shared_ptr;
+  unsigned int threads = *threads_ptr;
+  bool inclusive_cache = true;
+  bool support_count_mask = true;
+
+  /* Try L3 first.  */
+  unsigned int level = 3;
+
+  if (cpu_features->basic.kind == arch_kind_zhaoxin && family == 6)
+    support_count_mask = false;
+
+  if (shared <= 0)
+    {
+      /* Try L2 otherwise.  */
+      level  = 2;
+      shared = core;
+      threads_l2 = 0;
+      threads_l3 = -1;
+    }
+  else
+    {
+      threads_l2 = 0;
+      threads_l3 = 0;
+    }
+
+  /* A value of 0 for the HTT bit indicates there is only a single
+     logical processor.  */
+  if (HAS_CPU_FEATURE (HTT))
+    {
+      /* Figure out the number of logical threads that share the
+         highest cache level.  */
+      if (max_cpuid >= 4)
+        {
+          int i = 0;
+
+          /* Query until cache level 2 and 3 are enumerated.  */
+          int check = 0x1 | (threads_l3 == 0) << 1;
+          do
+            {
+              __cpuid_count (4, i++, eax, ebx, ecx, edx);
+
+              /* There seems to be a bug in at least some Pentium Ds
+                 which sometimes fail to iterate all cache parameters.
+                 Do not loop indefinitely here, stop in this case and
+                 assume there is no such information.  */
+              if (cpu_features->basic.kind == arch_kind_intel
+                  && (eax & 0x1f) == 0)
+                goto intel_bug_no_cache_info;
+
+              switch ((eax >> 5) & 0x7)
+                {
+                  default:
+                    break;
+                  case 2:
+                    if ((check & 0x1))
+                      {
+                        /* Get maximum number of logical processors
+                           sharing L2 cache.  */
+                        threads_l2 = (eax >> 14) & 0x3ff;
+                        check &= ~0x1;
+                      }
+                    break;
+                  case 3:
+                    if ((check & (0x1 << 1)))
+                      {
+                        /* Get maximum number of logical processors
+                           sharing L3 cache.  */
+                        threads_l3 = (eax >> 14) & 0x3ff;
+
+                        /* Check if L2 and L3 caches are inclusive.  */
+                        inclusive_cache = (edx & 0x2) != 0;
+                        check &= ~(0x1 << 1);
+                      }
+                    break;
+                }
+            }
+          while (check);
+
+          /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum
+             numbers of addressable IDs for logical processors sharing
+             the cache, instead of the maximum number of threads
+             sharing the cache.  */
+          if (max_cpuid >= 11 && support_count_mask)
+            {
+              /* Find the number of logical processors shipped in
+                 one core and apply count mask.  */
+              i = 0;
+
+              /* Count SMT only if there is L3 cache.  Always count
+                 core if there is no L3 cache.  */
+              int count = ((threads_l2 > 0 && level == 3)
+                           | ((threads_l3 > 0
+                               || (threads_l2 > 0 && level == 2)) << 1));
+
+              while (count)
+                {
+                  __cpuid_count (11, i++, eax, ebx, ecx, edx);
+
+                  int shipped = ebx & 0xff;
+                  int type = ecx & 0xff00;
+                  if (shipped == 0 || type == 0)
+                    break;
+                  else if (type == 0x100)
+                    {
+                      /* Count SMT.  */
+                      if ((count & 0x1))
+                        {
+                          int count_mask;
+
+                          /* Compute count mask.  */
+                          asm ("bsr %1, %0"
+                               : "=r" (count_mask) : "g" (threads_l2));
+                          count_mask = ~(-1 << (count_mask + 1));
+                          threads_l2 = (shipped - 1) & count_mask;
+                          count &= ~0x1;
+                        }
+                    }
+                  else if (type == 0x200)
+                    {
+                      /* Count core.  */
+                      if ((count & (0x1 << 1)))
+                        {
+                          int count_mask;
+                          int threads_core
+                            = (level == 2 ? threads_l2 : threads_l3);
+
+                          /* Compute count mask.  */
+                          asm ("bsr %1, %0"
+                               : "=r" (count_mask) : "g" (threads_core));
+                          count_mask = ~(-1 << (count_mask + 1));
+                          threads_core = (shipped - 1) & count_mask;
+                          if (level == 2)
+                            threads_l2 = threads_core;
+                          else
+                            threads_l3 = threads_core;
+                          count &= ~(0x1 << 1);
+                        }
+                    }
+                }
+            }
+          if (threads_l2 > 0)
+            threads_l2 += 1;
+          if (threads_l3 > 0)
+            threads_l3 += 1;
+          if (level == 2)
+            {
+              if (threads_l2)
+                {
+                  threads = threads_l2;
+                  if (cpu_features->basic.kind == arch_kind_intel
+                      && threads > 2
+                      && family == 6)
+                    switch (model)
+                      {
+                        case 0x37:
+                        case 0x4a:
+                        case 0x4d:
+                        case 0x5a:
+                        case 0x5d:
+                          /* Silvermont has L2 cache shared by 2 cores.  */
+                          threads = 2;
+                          break;
+                        default:
+                          break;
+                      }
+                }
+            }
+          else if (threads_l3)
+            threads = threads_l3;
+        }
+      else
+        {
+intel_bug_no_cache_info:
+          /* Assume that all logical threads share the highest cache
+             level.  */
+          threads
+            = ((cpu_features->cpuid[COMMON_CPUID_INDEX_1].ebx
+                >> 16) & 0xff);
+        }
+
+      /* Cap usage of highest cache level to the number of supported
+         threads.  */
+      if (shared > 0 && threads > 0)
+        shared /= threads;
+    }
+
+  /* Account for non-inclusive L2 and L3 caches.  */
+  if (!inclusive_cache)
+    {
+      if (threads_l2 > 0)
+        core /= threads_l2;
+      shared += core;
+    }
+
+  *shared_ptr = shared;
+  *threads_ptr = threads;
+}
+
+
+static void
 __attribute__((constructor))
 init_cacheinfo (void)
 {
@@ -494,211 +766,25 @@ init_cacheinfo (void)
   int max_cpuid_ex;
   long int data = -1;
   long int shared = -1;
-  unsigned int level;
+  long int core;
   unsigned int threads = 0;
   const struct cpu_features *cpu_features = __get_cpu_features ();
-  int max_cpuid = cpu_features->basic.max_cpuid;
 
   if (cpu_features->basic.kind == arch_kind_intel)
     {
       data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
-
-      long int core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
-      bool inclusive_cache = true;
-
-      /* Try L3 first.  */
-      level  = 3;
+      core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
       shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features);
 
-      /* Number of logical processors sharing L2 cache.  */
-      int threads_l2;
-
-      /* Number of logical processors sharing L3 cache.  */
-      int threads_l3;
-
-      if (shared <= 0)
-	{
-	  /* Try L2 otherwise.  */
-	  level  = 2;
-	  shared = core;
-	  threads_l2 = 0;
-	  threads_l3 = -1;
-	}
-      else
-	{
-	  threads_l2 = 0;
-	  threads_l3 = 0;
-	}
-
-      /* A value of 0 for the HTT bit indicates there is only a single
-	 logical processor.  */
-      if (HAS_CPU_FEATURE (HTT))
-	{
-	  /* Figure out the number of logical threads that share the
-	     highest cache level.  */
-	  if (max_cpuid >= 4)
-	    {
-	      unsigned int family = cpu_features->basic.family;
-	      unsigned int model = cpu_features->basic.model;
-
-	      int i = 0;
-
-	      /* Query until cache level 2 and 3 are enumerated.  */
-	      int check = 0x1 | (threads_l3 == 0) << 1;
-	      do
-		{
-		  __cpuid_count (4, i++, eax, ebx, ecx, edx);
-
-		  /* There seems to be a bug in at least some Pentium Ds
-		     which sometimes fail to iterate all cache parameters.
-		     Do not loop indefinitely here, stop in this case and
-		     assume there is no such information.  */
-		  if ((eax & 0x1f) == 0)
-		    goto intel_bug_no_cache_info;
-
-		  switch ((eax >> 5) & 0x7)
-		    {
-		    default:
-		      break;
-		    case 2:
-		      if ((check & 0x1))
-			{
-			  /* Get maximum number of logical processors
-			     sharing L2 cache.  */
-			  threads_l2 = (eax >> 14) & 0x3ff;
-			  check &= ~0x1;
-			}
-		      break;
-		    case 3:
-		      if ((check & (0x1 << 1)))
-			{
-			  /* Get maximum number of logical processors
-			     sharing L3 cache.  */
-			  threads_l3 = (eax >> 14) & 0x3ff;
-
-			  /* Check if L2 and L3 caches are inclusive.  */
-			  inclusive_cache = (edx & 0x2) != 0;
-			  check &= ~(0x1 << 1);
-			}
-		      break;
-		    }
-		}
-	      while (check);
-
-	      /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum
-		 numbers of addressable IDs for logical processors sharing
-		 the cache, instead of the maximum number of threads
-		 sharing the cache.  */
-	      if (max_cpuid >= 11)
-		{
-		  /* Find the number of logical processors shipped in
-		     one core and apply count mask.  */
-		  i = 0;
-
-		  /* Count SMT only if there is L3 cache.  Always count
-		     core if there is no L3 cache.  */
-		  int count = ((threads_l2 > 0 && level == 3)
-			       | ((threads_l3 > 0
-				   || (threads_l2 > 0 && level == 2)) << 1));
-
-		  while (count)
-		    {
-		      __cpuid_count (11, i++, eax, ebx, ecx, edx);
-
-		      int shipped = ebx & 0xff;
-		      int type = ecx & 0xff00;
-		      if (shipped == 0 || type == 0)
-			break;
-		      else if (type == 0x100)
-			{
-			  /* Count SMT.  */
-			  if ((count & 0x1))
-			    {
-			      int count_mask;
-
-			      /* Compute count mask.  */
-			      asm ("bsr %1, %0"
-				   : "=r" (count_mask) : "g" (threads_l2));
-			      count_mask = ~(-1 << (count_mask + 1));
-			      threads_l2 = (shipped - 1) & count_mask;
-			      count &= ~0x1;
-			    }
-			}
-		      else if (type == 0x200)
-			{
-			  /* Count core.  */
-			  if ((count & (0x1 << 1)))
-			    {
-			      int count_mask;
-			      int threads_core
-				= (level == 2 ? threads_l2 : threads_l3);
-
-			      /* Compute count mask.  */
-			      asm ("bsr %1, %0"
-				   : "=r" (count_mask) : "g" (threads_core));
-			      count_mask = ~(-1 << (count_mask + 1));
-			      threads_core = (shipped - 1) & count_mask;
-			      if (level == 2)
-				threads_l2 = threads_core;
-			      else
-				threads_l3 = threads_core;
-			      count &= ~(0x1 << 1);
-			    }
-			}
-		    }
-		}
-	      if (threads_l2 > 0)
-		threads_l2 += 1;
-	      if (threads_l3 > 0)
-		threads_l3 += 1;
-	      if (level == 2)
-		{
-		  if (threads_l2)
-		    {
-		      threads = threads_l2;
-		      if (threads > 2 && family == 6)
-			switch (model)
-			  {
-			  case 0x37:
-			  case 0x4a:
-			  case 0x4d:
-			  case 0x5a:
-			  case 0x5d:
-			    /* Silvermont has L2 cache shared by 2 cores.  */
-			    threads = 2;
-			    break;
-			  default:
-			    break;
-			  }
-		    }
-		}
-	      else if (threads_l3)
-		threads = threads_l3;
-	    }
-	  else
-	    {
-intel_bug_no_cache_info:
-	      /* Assume that all logical threads share the highest cache
-		 level.  */
-
-	      threads
-		= ((cpu_features->cpuid[COMMON_CPUID_INDEX_1].ebx
-		    >> 16) & 0xff);
-	    }
-
-	  /* Cap usage of highest cache level to the number of supported
-	     threads.  */
-	  if (shared > 0 && threads > 0)
-	    shared /= threads;
-	}
+      get_common_cache_info (&shared, &threads, core);
+    }
+  else if (cpu_features->basic.kind == arch_kind_zhaoxin)
+    {
+      data = handle_zhaoxin (_SC_LEVEL1_DCACHE_SIZE);
+      core = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
+      shared = handle_zhaoxin (_SC_LEVEL3_CACHE_SIZE);
 
-      /* Account for non-inclusive L2 and L3 caches.  */
-      if (!inclusive_cache)
-	{
-	  if (threads_l2 > 0)
-	    core /= threads_l2;
-	  shared += core;
-	}
+      get_common_cache_info (&shared, &threads, core);
     }
   else if (cpu_features->basic.kind == arch_kind_amd)
     {
-- 
2.7.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/3] x86: Add cache information support for Zhaoxin processors
  2020-04-26  5:54     ` Mayshao-oc
@ 2020-04-26 12:07       ` H.J. Lu
  2020-04-30  5:09         ` Mayshao-oc
  0 siblings, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2020-04-26 12:07 UTC (permalink / raw)
  To: Mayshao-oc
  Cc: GNU C Library, Carlos O'Donell, Florian Weimer,
	Qiyuan Wang(BJ-RD), Herry Yang(BJ-RD), Ricky Li(BJ-RD)

On Sat, Apr 25, 2020 at 10:55 PM Mayshao-oc <Mayshao-oc@zhaoxin.com> wrote:
>
>
> On Fri, Apr 24, 2020 at 8:53 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Fri, Apr 24, 2020 at 5:29 AM mayshao-oc <mayshao-oc@zhaoxin.com>
> > wrote:
> > >
> > > From: mayshao <mayshao-oc@zhaoxin.com>
> > >
> > > To obtain Zhaoxin CPU cache information, add a new function
> > > handle_zhaoxin().
> > >
> > > Add a new function get_common_info() that extracts the code
> > > in init_cacheinfo() to get the value of the variable shared,
> > > threads.
> > >
> > > Add Zhaoxin branch in init_cacheinfo() for initializing variables,
> > > such as __x86_shared_cache_size.
> > > ---
> > >  sysdeps/x86/cacheinfo.c | 477
> > ++++++++++++++++++++++++++++--------------------
> > >  1 file changed, 281 insertions(+), 196 deletions(-)
> > >
> > > diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
> > > index e3e8ef2..14c6094 100644
> > > --- a/sysdeps/x86/cacheinfo.c
> > > +++ b/sysdeps/x86/cacheinfo.c
> > > @@ -436,6 +436,57 @@ handle_amd (int name)
> > >  }
> > >
> > >
> > > +static long int __attribute__ ((noinline))
> > > +handle_zhaoxin (int name)
> > > +{
> > > +  unsigned int eax;
> > > +  unsigned int ebx;
> > > +  unsigned int ecx;
> > > +  unsigned int edx;
> > > +
> > > +  int folded_rel_name = (M(name) / 3) * 3;
> > > +
> > > +  unsigned int round = 0;
> > > +  while (1)
> > > +    {
> > > +      __cpuid_count (4, round, eax, ebx, ecx, edx);
> > > +
> > > +      enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
> > > +      if (type == null)
> > > +        break;
> > > +
> > > +      unsigned int level = (eax >> 5) & 0x7;
> > > +
> > > +      if ((level == 1 && type == data
> > > +        && folded_rel_name == M(_SC_LEVEL1_DCACHE_SIZE))
> > > +        || (level == 1 && type == inst
> > > +            && folded_rel_name == M(_SC_LEVEL1_ICACHE_SIZE))
> > > +        || (level == 2 && folded_rel_name ==
> > M(_SC_LEVEL2_CACHE_SIZE))
> > > +        || (level == 3 && folded_rel_name ==
> > M(_SC_LEVEL3_CACHE_SIZE)))
> > > +        {
> > > +          unsigned int offset = M(name) - folded_rel_name;
> > > +
> > > +          if (offset == 0)
> > > +            /* Cache size.  */
> > > +            return (((ebx >> 22) + 1)
> > > +                * (((ebx >> 12) & 0x3ff) + 1)
> > > +                * ((ebx & 0xfff) + 1)
> > > +                * (ecx + 1));
> > > +          if (offset == 1)
> > > +            return (ebx >> 22) + 1;
> > > +
> > > +          assert (offset == 2);
> > > +          return (ebx & 0xfff) + 1;
> > > +        }
> > > +
> > > +      ++round;
> > > +    }
> > > +
> > > +  /* Nothing found.  */
> > > +  return 0;
> > > +}
> > > +
> > > +
> > >  /* Get the value of the system variable NAME.  */
> > >  long int
> > >  attribute_hidden
> > > @@ -449,6 +500,9 @@ __cache_sysconf (int name)
> > >    if (cpu_features->basic.kind == arch_kind_amd)
> > >      return handle_amd (name);
> > >
> > > +  if (cpu_features->basic.kind == arch_kind_zhaoxin)
> > > +    return handle_zhaoxin (name);
> > > +
> > >    // XXX Fill in more vendors.
> > >
> > >    /* CPU not known, we have no information.  */
> > > @@ -483,6 +537,223 @@ int __x86_prefetchw attribute_hidden;
> > >
> > >
> > >  static void
> > > +get_common_info (long int *shared_ptr, unsigned int *threads_ptr,
> > > +                long int core)
> >
> > get_common_cache_info
>
> Fixed.
>
> > > +{
> > > +  unsigned int eax;
> > > +  unsigned int ebx;
> > > +  unsigned int ecx;
> > > +  unsigned int edx;
> > > +
> > > +  /* Number of logical processors sharing L2 cache.  */
> > > +  int threads_l2;
> > > +
> > > +  /* Number of logical processors sharing L3 cache.  */
> > > +  int threads_l3;
> > > +
> > > +  const struct cpu_features *cpu_features = __get_cpu_features ();
> > > +  int max_cpuid = cpu_features->basic.max_cpuid;
> > > +  unsigned int family = cpu_features->basic.family;
> > > +  unsigned int model = cpu_features->basic.model;
> > > +  long int shared = *shared_ptr;
> > > +  unsigned int threads = *threads_ptr;
> > > +  bool inclusive_cache = true;
> > > +  bool ignore_leaf_b = false;
> >
> > Change to support_count_mask.
>
> Fixed.
>
> > > +
> > > +  /* Try L3 first.  */
> > > +  unsigned int level = 3;
> > > +
> > > +  if (cpu_features->basic.kind == arch_kind_zhaoxin && family == 6)
> > > +    ignore_leaf_b = true;
> > > +
> > > +  if (shared <= 0)
> > > +    {
> > > +      /* Try L2 otherwise.  */
> > > +      level  = 2;
> > > +      shared = core;
> > > +      threads_l2 = 0;
> > > +      threads_l3 = -1;
> > > +    }
> > > +  else
> > > +    {
> > > +      threads_l2 = 0;
> > > +      threads_l3 = 0;
> > > +    }
> > > +
> > > +  /* A value of 0 for the HTT bit indicates there is only a single
> > > +     logical processor.  */
> > > +  if (HAS_CPU_FEATURE (HTT))
> > > +    {
> > > +      /* Figure out the number of logical threads that share the
> > > +         highest cache level.  */
> > > +      if (max_cpuid >= 4)
> > > +        {
> > > +          int i = 0;
> > > +
> > > +          /* Query until cache level 2 and 3 are enumerated.  */
> > > +          int check = 0x1 | (threads_l3 == 0) << 1;
> > > +          do
> > > +            {
> > > +              __cpuid_count (4, i++, eax, ebx, ecx, edx);
> > > +
> > > +              /* There seems to be a bug in at least some Pentium Ds
> > > +                 which sometimes fail to iterate all cache parameters.
> > > +                 Do not loop indefinitely here, stop in this case and
> > > +                 assume there is no such information.  */
> > > +              if ((eax & 0x1f) == 0
> > > +                   && cpu_features->basic.kind == arch_kind_intel)
> >
> > Check arch_kind_intel first.
>
> Fixed.
>
> > > +                goto intel_bug_no_cache_info;
> > > +
> > > +              switch ((eax >> 5) & 0x7)
> > > +                {
> > > +                  default:
> > > +                    break;
> > > +                  case 2:
> > > +                    if ((check & 0x1))
> > > +                      {
> > > +                        /* Get maximum number of logical processors
> > > +                           sharing L2 cache.  */
> > > +                        threads_l2 = (eax >> 14) & 0x3ff;
> > > +                        check &= ~0x1;
> > > +                      }
> > > +                    break;
> > > +                  case 3:
> > > +                    if ((check & (0x1 << 1)))
> > > +                      {
> > > +                        /* Get maximum number of logical processors
> > > +                           sharing L3 cache.  */
> > > +                        threads_l3 = (eax >> 14) & 0x3ff;
> > > +
> > > +                        /* Check if L2 and L3 caches are inclusive.  */
> > > +                        inclusive_cache = (edx & 0x2) != 0;
> > > +                        check &= ~(0x1 << 1);
> > > +                      }
> > > +                    break;
> > > +                }
> > > +            }
> > > +          while (check);
> > > +
> > > +          /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the
> > maximum
> > > +             numbers of addressable IDs for logical processors sharing
> > > +             the cache, instead of the maximum number of threads
> > > +             sharing the cache.  */
> > > +          if ((max_cpuid >= 11) && (!ignore_leaf_b))
> >
> > Drop unnecessary ().
>
> Fixed.
>
> > > +            {
> > > +              /* Find the number of logical processors shipped in
> > > +                 one core and apply count mask.  */
> > > +              i = 0;
> > > +
> > > +              /* Count SMT only if there is L3 cache.  Always count
> > > +                 core if there is no L3 cache.  */
> > > +              int count = ((threads_l2 > 0 && level == 3)
> > > +                           | ((threads_l3 > 0
> > > +                               || (threads_l2 > 0 && level == 2)) <<
> > 1));
> > > +
> > > +              while (count)
> > > +                {
> > > +                  __cpuid_count (11, i++, eax, ebx, ecx, edx);
> > > +
> > > +                  int shipped = ebx & 0xff;
> > > +                  int type = ecx & 0xff00;
> > > +                  if (shipped == 0 || type == 0)
> > > +                    break;
> > > +                  else if (type == 0x100)
> > > +                    {
> > > +                      /* Count SMT.  */
> > > +                      if ((count & 0x1))
> > > +                        {
> > > +                          int count_mask;
> > > +
> > > +                          /* Compute count mask.  */
> > > +                          asm ("bsr %1, %0"
> > > +                               : "=r" (count_mask) : "g"
> > (threads_l2));
> > > +                          count_mask = ~(-1 << (count_mask + 1));
> > > +                          threads_l2 = (shipped - 1) & count_mask;
> > > +                          count &= ~0x1;
> > > +                        }
> > > +                    }
> > > +                  else if (type == 0x200)
> > > +                    {
> > > +                      /* Count core.  */
> > > +                      if ((count & (0x1 << 1)))
> > > +                        {
> > > +                          int count_mask;
> > > +                          int threads_core
> > > +                            = (level == 2 ? threads_l2 : threads_l3);
> > > +
> > > +                          /* Compute count mask.  */
> > > +                          asm ("bsr %1, %0"
> > > +                               : "=r" (count_mask) : "g"
> > (threads_core));
> > > +                          count_mask = ~(-1 << (count_mask + 1));
> > > +                          threads_core = (shipped - 1) & count_mask;
> > > +                          if (level == 2)
> > > +                            threads_l2 = threads_core;
> > > +                          else
> > > +                            threads_l3 = threads_core;
> > > +                          count &= ~(0x1 << 1);
> > > +                        }
> > > +                    }
> > > +                }
> > > +            }
> > > +          if (threads_l2 > 0)
> > > +            threads_l2 += 1;
> > > +          if (threads_l3 > 0)
> > > +            threads_l3 += 1;
> > > +          if (level == 2)
> > > +            {
> > > +              if (threads_l2)
> > > +                {
> > > +                  threads = threads_l2;
> > > +                  if (threads > 2 && family == 6
> > > +                     && cpu_features->basic.kind == arch_kind_intel)
> >
> > Check arch_kind_intel first.  Put each condition on a separate line.
>
> Fixed.
>
> > > +                    switch (model)
> > > +                      {
> > > +                        case 0x37:
> > > +                        case 0x4a:
> > > +                        case 0x4d:
> > > +                        case 0x5a:
> > > +                        case 0x5d:
> > > +                          /* Silvermont has L2 cache shared by 2 cores.  */
> > > +                          threads = 2;
> > > +                          break;
> > > +                        default:
> > > +                          break;
> > > +                      }
> > > +                }
> > > +            }
> > > +          else if (threads_l3)
> > > +            threads = threads_l3;
> > > +        }
> > > +      else
> > > +        {
> > > +intel_bug_no_cache_info:
> > > +          /* Assume that all logical threads share the highest cache
> > > +             level.  */
> > > +          threads
> > > +            = ((cpu_features->cpuid[COMMON_CPUID_INDEX_1].ebx
> > > +                >> 16) & 0xff);
> > > +        }
> > > +
> > > +        /* Cap usage of highest cache level to the number of supported
> > > +           threads.  */
> > > +        if (shared > 0 && threads > 0)
> > > +          shared /= threads;
> > > +    }
> > > +
> > > +  /* Account for non-inclusive L2 and L3 caches.  */
> > > +  if (!inclusive_cache)
> > > +    {
> > > +      if (threads_l2 > 0)
> > > +        core /= threads_l2;
> > > +      shared += core;
> > > +    }
> > > +
> > > +  *shared_ptr = shared;
> > > +  *threads_ptr = threads;
> > > +}
> > > +
> > > +
> > > +static void
> > >  __attribute__((constructor))
> > >  init_cacheinfo (void)
> > >  {
> > > @@ -494,211 +765,25 @@ init_cacheinfo (void)
> > >    int max_cpuid_ex;
> > >    long int data = -1;
> > >    long int shared = -1;
> > > -  unsigned int level;
> > > +  long int core;
> > >    unsigned int threads = 0;
> > >    const struct cpu_features *cpu_features = __get_cpu_features ();
> > > -  int max_cpuid = cpu_features->basic.max_cpuid;
> > >
> > >    if (cpu_features->basic.kind == arch_kind_intel)
> > >      {
> > >        data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
> > > -
> > > -      long int core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
> > > -      bool inclusive_cache = true;
> > > -
> > > -      /* Try L3 first.  */
> > > -      level  = 3;
> > > +      core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
> > >        shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features);
> > >
> > > -      /* Number of logical processors sharing L2 cache.  */
> > > -      int threads_l2;
> > > -
> > > -      /* Number of logical processors sharing L3 cache.  */
> > > -      int threads_l3;
> > > -
> > > -      if (shared <= 0)
> > > -       {
> > > -         /* Try L2 otherwise.  */
> > > -         level  = 2;
> > > -         shared = core;
> > > -         threads_l2 = 0;
> > > -         threads_l3 = -1;
> > > -       }
> > > -      else
> > > -       {
> > > -         threads_l2 = 0;
> > > -         threads_l3 = 0;
> > > -       }
> > > -
> > > -      /* A value of 0 for the HTT bit indicates there is only a single
> > > -        logical processor.  */
> > > -      if (HAS_CPU_FEATURE (HTT))
> > > -       {
> > > -         /* Figure out the number of logical threads that share the
> > > -            highest cache level.  */
> > > -         if (max_cpuid >= 4)
> > > -           {
> > > -             unsigned int family = cpu_features->basic.family;
> > > -             unsigned int model = cpu_features->basic.model;
> > > -
> > > -             int i = 0;
> > > -
> > > -             /* Query until cache level 2 and 3 are enumerated.  */
> > > -             int check = 0x1 | (threads_l3 == 0) << 1;
> > > -             do
> > > -               {
> > > -                 __cpuid_count (4, i++, eax, ebx, ecx, edx);
> > > -
> > > -                 /* There seems to be a bug in at least some Pentium Ds
> > > -                    which sometimes fail to iterate all cache parameters.
> > > -                    Do not loop indefinitely here, stop in this case and
> > > -                    assume there is no such information.  */
> > > -                 if ((eax & 0x1f) == 0)
> > > -                   goto intel_bug_no_cache_info;
> > > -
> > > -                 switch ((eax >> 5) & 0x7)
> > > -                   {
> > > -                   default:
> > > -                     break;
> > > -                   case 2:
> > > -                     if ((check & 0x1))
> > > -                       {
> > > -                         /* Get maximum number of logical processors
> > > -                            sharing L2 cache.  */
> > > -                         threads_l2 = (eax >> 14) & 0x3ff;
> > > -                         check &= ~0x1;
> > > -                       }
> > > -                     break;
> > > -                   case 3:
> > > -                     if ((check & (0x1 << 1)))
> > > -                       {
> > > -                         /* Get maximum number of logical processors
> > > -                            sharing L3 cache.  */
> > > -                         threads_l3 = (eax >> 14) & 0x3ff;
> > > -
> > > -                         /* Check if L2 and L3 caches are inclusive.  */
> > > -                         inclusive_cache = (edx & 0x2) != 0;
> > > -                         check &= ~(0x1 << 1);
> > > -                       }
> > > -                     break;
> > > -                   }
> > > -               }
> > > -             while (check);
> > > -
> > > -             /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum
> > > -                numbers of addressable IDs for logical processors sharing
> > > -                the cache, instead of the maximum number of threads
> > > -                sharing the cache.  */
> > > -             if (max_cpuid >= 11)
> > > -               {
> > > -                 /* Find the number of logical processors shipped in
> > > -                    one core and apply count mask.  */
> > > -                 i = 0;
> > > -
> > > -                 /* Count SMT only if there is L3 cache.  Always count
> > > -                    core if there is no L3 cache.  */
> > > -                 int count = ((threads_l2 > 0 && level == 3)
> > > -                              | ((threads_l3 > 0
> > > -                                  || (threads_l2 > 0 && level == 2)) << 1));
> > > -
> > > -                 while (count)
> > > -                   {
> > > -                     __cpuid_count (11, i++, eax, ebx, ecx, edx);
> > > -
> > > -                     int shipped = ebx & 0xff;
> > > -                     int type = ecx & 0xff00;
> > > -                     if (shipped == 0 || type == 0)
> > > -                       break;
> > > -                     else if (type == 0x100)
> > > -                       {
> > > -                         /* Count SMT.  */
> > > -                         if ((count & 0x1))
> > > -                           {
> > > -                             int count_mask;
> > > -
> > > -                             /* Compute count mask.  */
> > > -                             asm ("bsr %1, %0"
> > > -                                  : "=r" (count_mask) : "g" (threads_l2));
> > > -                             count_mask = ~(-1 << (count_mask + 1));
> > > -                             threads_l2 = (shipped - 1) & count_mask;
> > > -                             count &= ~0x1;
> > > -                           }
> > > -                       }
> > > -                     else if (type == 0x200)
> > > -                       {
> > > -                         /* Count core.  */
> > > -                         if ((count & (0x1 << 1)))
> > > -                           {
> > > -                             int count_mask;
> > > -                             int threads_core
> > > -                               = (level == 2 ? threads_l2 : threads_l3);
> > > -
> > > -                             /* Compute count mask.  */
> > > -                             asm ("bsr %1, %0"
> > > -                                  : "=r" (count_mask) : "g" (threads_core));
> > > -                             count_mask = ~(-1 << (count_mask + 1));
> > > -                             threads_core = (shipped - 1) & count_mask;
> > > -                             if (level == 2)
> > > -                               threads_l2 = threads_core;
> > > -                             else
> > > -                               threads_l3 = threads_core;
> > > -                             count &= ~(0x1 << 1);
> > > -                           }
> > > -                       }
> > > -                   }
> > > -               }
> > > -             if (threads_l2 > 0)
> > > -               threads_l2 += 1;
> > > -             if (threads_l3 > 0)
> > > -               threads_l3 += 1;
> > > -             if (level == 2)
> > > -               {
> > > -                 if (threads_l2)
> > > -                   {
> > > -                     threads = threads_l2;
> > > -                     if (threads > 2 && family == 6)
> > > -                       switch (model)
> > > -                         {
> > > -                         case 0x37:
> > > -                         case 0x4a:
> > > -                         case 0x4d:
> > > -                         case 0x5a:
> > > -                         case 0x5d:
> > > -                           /* Silvermont has L2 cache shared by 2 cores.  */
> > > -                           threads = 2;
> > > -                           break;
> > > -                         default:
> > > -                           break;
> > > -                         }
> > > -                   }
> > > -               }
> > > -             else if (threads_l3)
> > > -               threads = threads_l3;
> > > -           }
> > > -         else
> > > -           {
> > > -intel_bug_no_cache_info:
> > > -             /* Assume that all logical threads share the highest cache
> > > -                level.  */
> > > -
> > > -             threads
> > > -               = ((cpu_features->cpuid[COMMON_CPUID_INDEX_1].ebx
> > > -                   >> 16) & 0xff);
> > > -           }
> > > -
> > > -         /* Cap usage of highest cache level to the number of supported
> > > -            threads.  */
> > > -         if (shared > 0 && threads > 0)
> > > -           shared /= threads;
> > > -       }
> > > +      get_common_info (&shared, &threads, core);
> > > +    }
> > > +  else if (cpu_features->basic.kind == arch_kind_zhaoxin)
> > > +    {
> > > +      data = handle_zhaoxin (_SC_LEVEL1_DCACHE_SIZE);
> > > +      core = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
> > > +      shared = handle_zhaoxin (_SC_LEVEL3_CACHE_SIZE);
> > >
> > > -      /* Account for non-inclusive L2 and L3 caches.  */
> > > -      if (!inclusive_cache)
> > > -       {
> > > -         if (threads_l2 > 0)
> > > -           core /= threads_l2;
> > > -         shared += core;
> > > -       }
> > > +      get_common_info (&shared, &threads, core);
> > >      }
> > >    else if (cpu_features->basic.kind == arch_kind_amd)
> > >      {
> > > --
> > > 2.7.4
> > >
>
> My mailbox may change the format of the patch, so I have
> attached the updated patch to this email; please check.
>

LGTM.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 2/3] x86: Add cache information support for Zhaoxin processors
  2020-04-26 12:07       ` H.J. Lu
@ 2020-04-30  5:09         ` Mayshao-oc
  2020-04-30  5:15           ` H.J. Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Mayshao-oc @ 2020-04-30  5:09 UTC (permalink / raw)
  To: H.J. Lu, Carlos O'Donell, Florian Weimer
  Cc: GNU C Library, Qiyuan Wang(BJ-RD), Herry Yang(BJ-RD),
	Cooper Yan(BJ-RD), Ricky Li(BJ-RD)

Hi,

This is my first patch.  I'm not sure what I need to do next about this
patch set, and I was wondering whether it is OK for master.

Thank you all for your patience and kind help.


Best Regards,
May Shao


> -----Original Message-----
> From: H.J. Lu <hjl.tools@gmail.com>
> Sent: Sunday, April 26, 2020 8:08 PM
> To: Mayshao-oc <Mayshao-oc@zhaoxin.com>
> Cc: GNU C Library <libc-alpha@sourceware.org>; Carlos O'Donell
> <carlos@redhat.com>; Florian Weimer <fw@deneb.enyo.de>; Qiyuan
> Wang(BJ-RD) <QiyuanWang@zhaoxin.com>; Herry Yang(BJ-RD)
> <HerryYang@zhaoxin.com>; Ricky Li(BJ-RD) <RickyLi@zhaoxin.com>
> Subject: Re: [PATCH v3 2/3] x86: Add cache information support for Zhaoxin
> processors
> 
> On Sat, Apr 25, 2020 at 10:55 PM Mayshao-oc <Mayshao-oc@zhaoxin.com>
> wrote:
> >
> >
> > On Fri, Apr 24, 2020 at 8:53 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Fri, Apr 24, 2020 at 5:29 AM mayshao-oc <mayshao-oc@zhaoxin.com>
> > > wrote:
> > > >
> > > > From: mayshao <mayshao-oc@zhaoxin.com>
> > > >
> > > > To obtain Zhaoxin CPU cache information, add a new function
> > > > handle_zhaoxin().
> > > >
> > > > Add a new function get_common_info() that extracts the code
> > > > in init_cacheinfo() to get the value of the variable shared,
> > > > threads.
> > > >
> > > > Add Zhaoxin branch in init_cacheinfo() for initializing variables,
> > > > such as __x86_shared_cache_size.
> > > > ---
> > > >  sysdeps/x86/cacheinfo.c | 477 ++++++++++++++++++++++++++++--------------------
> > > >  1 file changed, 281 insertions(+), 196 deletions(-)
> > > >
> > > > diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
> > > > index e3e8ef2..14c6094 100644
> > > > --- a/sysdeps/x86/cacheinfo.c
> > > > +++ b/sysdeps/x86/cacheinfo.c
> > > > @@ -436,6 +436,57 @@ handle_amd (int name)
> > > >  }
> > > >
> > > >
> > > > +static long int __attribute__ ((noinline))
> > > > +handle_zhaoxin (int name)
> > > > +{
> > > > +  unsigned int eax;
> > > > +  unsigned int ebx;
> > > > +  unsigned int ecx;
> > > > +  unsigned int edx;
> > > > +
> > > > +  int folded_rel_name = (M(name) / 3) * 3;
> > > > +
> > > > +  unsigned int round = 0;
> > > > +  while (1)
> > > > +    {
> > > > +      __cpuid_count (4, round, eax, ebx, ecx, edx);
> > > > +
> > > > +      enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
> > > > +      if (type == null)
> > > > +        break;
> > > > +
> > > > +      unsigned int level = (eax >> 5) & 0x7;
> > > > +
> > > > +      if ((level == 1 && type == data
> > > > +        && folded_rel_name == M(_SC_LEVEL1_DCACHE_SIZE))
> > > > +        || (level == 1 && type == inst
> > > > +            && folded_rel_name == M(_SC_LEVEL1_ICACHE_SIZE))
> > > > +        || (level == 2 && folded_rel_name == M(_SC_LEVEL2_CACHE_SIZE))
> > > > +        || (level == 3 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE)))
> > > > +        {
> > > > +          unsigned int offset = M(name) - folded_rel_name;
> > > > +
> > > > +          if (offset == 0)
> > > > +            /* Cache size.  */
> > > > +            return (((ebx >> 22) + 1)
> > > > +                * (((ebx >> 12) & 0x3ff) + 1)
> > > > +                * ((ebx & 0xfff) + 1)
> > > > +                * (ecx + 1));
> > > > +          if (offset == 1)
> > > > +            return (ebx >> 22) + 1;
> > > > +
> > > > +          assert (offset == 2);
> > > > +          return (ebx & 0xfff) + 1;
> > > > +        }
> > > > +
> > > > +      ++round;
> > > > +    }
> > > > +
> > > > +  /* Nothing found.  */
> > > > +  return 0;
> > > > +}
> > > > +
> > > > +
> > > >  /* Get the value of the system variable NAME.  */
> > > >  long int
> > > >  attribute_hidden
> > > > @@ -449,6 +500,9 @@ __cache_sysconf (int name)
> > > >    if (cpu_features->basic.kind == arch_kind_amd)
> > > >      return handle_amd (name);
> > > >
> > > > +  if (cpu_features->basic.kind == arch_kind_zhaoxin)
> > > > +    return handle_zhaoxin (name);
> > > > +
> > > >    // XXX Fill in more vendors.
> > > >
> > > >    /* CPU not known, we have no information.  */
> > > > @@ -483,6 +537,223 @@ int __x86_prefetchw attribute_hidden;
> > > >
> > > >
> > > >  static void
> > > > +get_common_info (long int *shared_ptr, unsigned int *threads_ptr,
> > > > +                long int core)
> > >
> > > get_common_cache_info
> >
> > Fixed.
> >
> > > > +{
> > > > +  unsigned int eax;
> > > > +  unsigned int ebx;
> > > > +  unsigned int ecx;
> > > > +  unsigned int edx;
> > > > +
> > > > +  /* Number of logical processors sharing L2 cache.  */
> > > > +  int threads_l2;
> > > > +
> > > > +  /* Number of logical processors sharing L3 cache.  */
> > > > +  int threads_l3;
> > > > +
> > > > +  const struct cpu_features *cpu_features = __get_cpu_features ();
> > > > +  int max_cpuid = cpu_features->basic.max_cpuid;
> > > > +  unsigned int family = cpu_features->basic.family;
> > > > +  unsigned int model = cpu_features->basic.model;
> > > > +  long int shared = *shared_ptr;
> > > > +  unsigned int threads = *threads_ptr;
> > > > +  bool inclusive_cache = true;
> > > > +  bool ignore_leaf_b = false;
> > >
> > > Change to support_count_mask.
> >
> > Fixed.
> >
> > > > +
> > > > +  /* Try L3 first.  */
> > > > +  unsigned int level = 3;
> > > > +
> > > > +  if (cpu_features->basic.kind == arch_kind_zhaoxin && family == 6)
> > > > +    ignore_leaf_b = true;
> > > > +
> > > > +  if (shared <= 0)
> > > > +    {
> > > > +      /* Try L2 otherwise.  */
> > > > +      level  = 2;
> > > > +      shared = core;
> > > > +      threads_l2 = 0;
> > > > +      threads_l3 = -1;
> > > > +    }
> > > > +  else
> > > > +    {
> > > > +      threads_l2 = 0;
> > > > +      threads_l3 = 0;
> > > > +    }
> > > > +
> > > > +  /* A value of 0 for the HTT bit indicates there is only a single
> > > > +     logical processor.  */
> > > > +  if (HAS_CPU_FEATURE (HTT))
> > > > +    {
> > > > +      /* Figure out the number of logical threads that share the
> > > > +         highest cache level.  */
> > > > +      if (max_cpuid >= 4)
> > > > +        {
> > > > +          int i = 0;
> > > > +
> > > > +          /* Query until cache level 2 and 3 are enumerated.  */
> > > > +          int check = 0x1 | (threads_l3 == 0) << 1;
> > > > +          do
> > > > +            {
> > > > +              __cpuid_count (4, i++, eax, ebx, ecx, edx);
> > > > +
> > > > +              /* There seems to be a bug in at least some Pentium Ds
> > > > +                 which sometimes fail to iterate all cache parameters.
> > > > +                 Do not loop indefinitely here, stop in this case and
> > > > +                 assume there is no such information.  */
> > > > +              if ((eax & 0x1f) == 0
> > > > +                   && cpu_features->basic.kind == arch_kind_intel)
> > >
> > > Check arch_kind_intel first.
> >
> > Fixed.
> >
> > > > +                goto intel_bug_no_cache_info;
> > > > +
> > > > +              switch ((eax >> 5) & 0x7)
> > > > +                {
> > > > +                  default:
> > > > +                    break;
> > > > +                  case 2:
> > > > +                    if ((check & 0x1))
> > > > +                      {
> > > > +                        /* Get maximum number of logical processors
> > > > +                           sharing L2 cache.  */
> > > > +                        threads_l2 = (eax >> 14) & 0x3ff;
> > > > +                        check &= ~0x1;
> > > > +                      }
> > > > +                    break;
> > > > +                  case 3:
> > > > +                    if ((check & (0x1 << 1)))
> > > > +                      {
> > > > +                        /* Get maximum number of logical processors
> > > > +                           sharing L3 cache.  */
> > > > +                        threads_l3 = (eax >> 14) & 0x3ff;
> > > > +
> > > > +                        /* Check if L2 and L3 caches are inclusive.  */
> > > > +                        inclusive_cache = (edx & 0x2) != 0;
> > > > +                        check &= ~(0x1 << 1);
> > > > +                      }
> > > > +                    break;
> > > > +                }
> > > > +            }
> > > > +          while (check);
> > > > +
> > > > +          /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum
> > > > +             numbers of addressable IDs for logical processors sharing
> > > > +             the cache, instead of the maximum number of threads
> > > > +             sharing the cache.  */
> > > > +          if ((max_cpuid >= 11) && (!ignore_leaf_b))
> > >
> > > Drop unnecessary ().
> >
> > Fixed.
> >
> > > > +            {
> > > > +              /* Find the number of logical processors shipped in
> > > > +                 one core and apply count mask.  */
> > > > +              i = 0;
> > > > +
> > > > +              /* Count SMT only if there is L3 cache.  Always count
> > > > +                 core if there is no L3 cache.  */
> > > > +              int count = ((threads_l2 > 0 && level == 3)
> > > > +                           | ((threads_l3 > 0
> > > > +                               || (threads_l2 > 0 && level == 2)) << 1));
> > > > +
> > > > +              while (count)
> > > > +                {
> > > > +                  __cpuid_count (11, i++, eax, ebx, ecx, edx);
> > > > +
> > > > +                  int shipped = ebx & 0xff;
> > > > +                  int type = ecx & 0xff00;
> > > > +                  if (shipped == 0 || type == 0)
> > > > +                    break;
> > > > +                  else if (type == 0x100)
> > > > +                    {
> > > > +                      /* Count SMT.  */
> > > > +                      if ((count & 0x1))
> > > > +                        {
> > > > +                          int count_mask;
> > > > +
> > > > +                          /* Compute count mask.  */
> > > > +                          asm ("bsr %1, %0"
> > > > +                               : "=r" (count_mask) : "g" (threads_l2));
> > > > +                          count_mask = ~(-1 << (count_mask + 1));
> > > > +                          threads_l2 = (shipped - 1) & count_mask;
> > > > +                          count &= ~0x1;
> > > > +                        }
> > > > +                    }
> > > > +                  else if (type == 0x200)
> > > > +                    {
> > > > +                      /* Count core.  */
> > > > +                      if ((count & (0x1 << 1)))
> > > > +                        {
> > > > +                          int count_mask;
> > > > +                          int threads_core
> > > > +                            = (level == 2 ? threads_l2 : threads_l3);
> > > > +
> > > > +                          /* Compute count mask.  */
> > > > +                          asm ("bsr %1, %0"
> > > > +                               : "=r" (count_mask) : "g" (threads_core));
> > > > +                          count_mask = ~(-1 << (count_mask + 1));
> > > > +                          threads_core = (shipped - 1) & count_mask;
> > > > +                          if (level == 2)
> > > > +                            threads_l2 = threads_core;
> > > > +                          else
> > > > +                            threads_l3 = threads_core;
> > > > +                          count &= ~(0x1 << 1);
> > > > +                        }
> > > > +                    }
> > > > +                }
> > > > +            }
> > > > +          if (threads_l2 > 0)
> > > > +            threads_l2 += 1;
> > > > +          if (threads_l3 > 0)
> > > > +            threads_l3 += 1;
> > > > +          if (level == 2)
> > > > +            {
> > > > +              if (threads_l2)
> > > > +                {
> > > > +                  threads = threads_l2;
> > > > +                  if (threads > 2 && family == 6
> > > > +                     && cpu_features->basic.kind == arch_kind_intel)
> > >
> > > Check arch_kind_intel first.  Put each condition on a separate line.
> >
> > Fixed.
> >
> > > > +                    switch (model)
> > > > +                      {
> > > > +                        case 0x37:
> > > > +                        case 0x4a:
> > > > +                        case 0x4d:
> > > > +                        case 0x5a:
> > > > +                        case 0x5d:
> > > > +                          /* Silvermont has L2 cache shared by 2 cores.  */
> > > > +                          threads = 2;
> > > > +                          break;
> > > > +                        default:
> > > > +                          break;
> > > > +                      }
> > > > +                }
> > > > +            }
> > > > +          else if (threads_l3)
> > > > +            threads = threads_l3;
> > > > +        }
> > > > +      else
> > > > +        {
> > > > +intel_bug_no_cache_info:
> > > > +          /* Assume that all logical threads share the highest cache
> > > > +             level.  */
> > > > +          threads
> > > > +            = ((cpu_features->cpuid[COMMON_CPUID_INDEX_1].ebx
> > > > +                >> 16) & 0xff);
> > > > +        }
> > > > +
> > > > +        /* Cap usage of highest cache level to the number of supported
> > > > +           threads.  */
> > > > +        if (shared > 0 && threads > 0)
> > > > +          shared /= threads;
> > > > +    }
> > > > +
> > > > +  /* Account for non-inclusive L2 and L3 caches.  */
> > > > +  if (!inclusive_cache)
> > > > +    {
> > > > +      if (threads_l2 > 0)
> > > > +        core /= threads_l2;
> > > > +      shared += core;
> > > > +    }
> > > > +
> > > > +  *shared_ptr = shared;
> > > > +  *threads_ptr = threads;
> > > > +}
> > > > +
> > > > +
> > > > +static void
> > > >  __attribute__((constructor))
> > > >  init_cacheinfo (void)
> > > >  {
> > > > @@ -494,211 +765,25 @@ init_cacheinfo (void)
> > > >    int max_cpuid_ex;
> > > >    long int data = -1;
> > > >    long int shared = -1;
> > > > -  unsigned int level;
> > > > +  long int core;
> > > >    unsigned int threads = 0;
> > > >    const struct cpu_features *cpu_features = __get_cpu_features ();
> > > > -  int max_cpuid = cpu_features->basic.max_cpuid;
> > > >
> > > >    if (cpu_features->basic.kind == arch_kind_intel)
> > > >      {
> > > >        data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
> > > > -
> > > > -      long int core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
> > > > -      bool inclusive_cache = true;
> > > > -
> > > > -      /* Try L3 first.  */
> > > > -      level  = 3;
> > > > +      core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
> > > >        shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features);
> > > >
> > > > -      /* Number of logical processors sharing L2 cache.  */
> > > > -      int threads_l2;
> > > > -
> > > > -      /* Number of logical processors sharing L3 cache.  */
> > > > -      int threads_l3;
> > > > -
> > > > -      if (shared <= 0)
> > > > -       {
> > > > -         /* Try L2 otherwise.  */
> > > > -         level  = 2;
> > > > -         shared = core;
> > > > -         threads_l2 = 0;
> > > > -         threads_l3 = -1;
> > > > -       }
> > > > -      else
> > > > -       {
> > > > -         threads_l2 = 0;
> > > > -         threads_l3 = 0;
> > > > -       }
> > > > -
> > > > -      /* A value of 0 for the HTT bit indicates there is only a single
> > > > -        logical processor.  */
> > > > -      if (HAS_CPU_FEATURE (HTT))
> > > > -       {
> > > > -         /* Figure out the number of logical threads that share the
> > > > -            highest cache level.  */
> > > > -         if (max_cpuid >= 4)
> > > > -           {
> > > > -             unsigned int family = cpu_features->basic.family;
> > > > -             unsigned int model = cpu_features->basic.model;
> > > > -
> > > > -             int i = 0;
> > > > -
> > > > -             /* Query until cache level 2 and 3 are enumerated.  */
> > > > -             int check = 0x1 | (threads_l3 == 0) << 1;
> > > > -             do
> > > > -               {
> > > > -                 __cpuid_count (4, i++, eax, ebx, ecx, edx);
> > > > -
> > > > -                 /* There seems to be a bug in at least some Pentium Ds
> > > > -                    which sometimes fail to iterate all cache parameters.
> > > > -                    Do not loop indefinitely here, stop in this case and
> > > > -                    assume there is no such information.  */
> > > > -                 if ((eax & 0x1f) == 0)
> > > > -                   goto intel_bug_no_cache_info;
> > > > -
> > > > -                 switch ((eax >> 5) & 0x7)
> > > > -                   {
> > > > -                   default:
> > > > -                     break;
> > > > -                   case 2:
> > > > -                     if ((check & 0x1))
> > > > -                       {
> > > > -                         /* Get maximum number of logical processors
> > > > -                            sharing L2 cache.  */
> > > > -                         threads_l2 = (eax >> 14) & 0x3ff;
> > > > -                         check &= ~0x1;
> > > > -                       }
> > > > -                     break;
> > > > -                   case 3:
> > > > -                     if ((check & (0x1 << 1)))
> > > > -                       {
> > > > -                         /* Get maximum number of logical processors
> > > > -                            sharing L3 cache.  */
> > > > -                         threads_l3 = (eax >> 14) & 0x3ff;
> > > > -
> > > > -                         /* Check if L2 and L3 caches are inclusive.  */
> > > > -                         inclusive_cache = (edx & 0x2) != 0;
> > > > -                         check &= ~(0x1 << 1);
> > > > -                       }
> > > > -                     break;
> > > > -                   }
> > > > -               }
> > > > -             while (check);
> > > > -
> > > > -             /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum
> > > > -                numbers of addressable IDs for logical processors sharing
> > > > -                the cache, instead of the maximum number of threads
> > > > -                sharing the cache.  */
> > > > -             if (max_cpuid >= 11)
> > > > -               {
> > > > -                 /* Find the number of logical processors shipped in
> > > > -                    one core and apply count mask.  */
> > > > -                 i = 0;
> > > > -
> > > > -                 /* Count SMT only if there is L3 cache.  Always count
> > > > -                    core if there is no L3 cache.  */
> > > > -                 int count = ((threads_l2 > 0 && level == 3)
> > > > -                              | ((threads_l3 > 0
> > > > -                                  || (threads_l2 > 0 && level == 2)) << 1));
> > > > -
> > > > -                 while (count)
> > > > -                   {
> > > > -                     __cpuid_count (11, i++, eax, ebx, ecx, edx);
> > > > -
> > > > -                     int shipped = ebx & 0xff;
> > > > -                     int type = ecx & 0xff00;
> > > > -                     if (shipped == 0 || type == 0)
> > > > -                       break;
> > > > -                     else if (type == 0x100)
> > > > -                       {
> > > > -                         /* Count SMT.  */
> > > > -                         if ((count & 0x1))
> > > > -                           {
> > > > -                             int count_mask;
> > > > -
> > > > -                             /* Compute count mask.  */
> > > > -                             asm ("bsr %1, %0"
> > > > -                                  : "=r" (count_mask) : "g"
> > > (threads_l2));
> > > > -                             count_mask = ~(-1 << (count_mask +
> 1));
> > > > -                             threads_l2 = (shipped - 1) &
> count_mask;
> > > > -                             count &= ~0x1;
> > > > -                           }
> > > > -                       }
> > > > -                     else if (type == 0x200)
> > > > -                       {
> > > > -                         /* Count core.  */
> > > > -                         if ((count & (0x1 << 1)))
> > > > -                           {
> > > > -                             int count_mask;
> > > > -                             int threads_core
> > > > -                               = (level == 2 ? threads_l2 :
> > > threads_l3);
> > > > -
> > > > -                             /* Compute count mask.  */
> > > > -                             asm ("bsr %1, %0"
> > > > -                                  : "=r" (count_mask) : "g"
> > > (threads_core));
> > > > -                             count_mask = ~(-1 << (count_mask +
> 1));
> > > > -                             threads_core = (shipped - 1) &
> > > count_mask;
> > > > -                             if (level == 2)
> > > > -                               threads_l2 = threads_core;
> > > > -                             else
> > > > -                               threads_l3 = threads_core;
> > > > -                             count &= ~(0x1 << 1);
> > > > -                           }
> > > > -                       }
> > > > -                   }
> > > > -               }
> > > > -             if (threads_l2 > 0)
> > > > -               threads_l2 += 1;
> > > > -             if (threads_l3 > 0)
> > > > -               threads_l3 += 1;
> > > > -             if (level == 2)
> > > > -               {
> > > > -                 if (threads_l2)
> > > > -                   {
> > > > -                     threads = threads_l2;
> > > > -                     if (threads > 2 && family == 6)
> > > > -                       switch (model)
> > > > -                         {
> > > > -                         case 0x37:
> > > > -                         case 0x4a:
> > > > -                         case 0x4d:
> > > > -                         case 0x5a:
> > > > -                         case 0x5d:
> > > > -                           /* Silvermont has L2 cache shared by 2
> > > cores.  */
> > > > -                           threads = 2;
> > > > -                           break;
> > > > -                         default:
> > > > -                           break;
> > > > -                         }
> > > > -                   }
> > > > -               }
> > > > -             else if (threads_l3)
> > > > -               threads = threads_l3;
> > > > -           }
> > > > -         else
> > > > -           {
> > > > -intel_bug_no_cache_info:
> > > > -             /* Assume that all logical threads share the highest
> cache
> > > > -                level.  */
> > > > -
> > > > -             threads
> > > > -               =
> ((cpu_features->cpuid[COMMON_CPUID_INDEX_1].ebx
> > > > -                   >> 16) & 0xff);
> > > > -           }
> > > > -
> > > > -         /* Cap usage of highest cache level to the number of
> supported
> > > > -            threads.  */
> > > > -         if (shared > 0 && threads > 0)
> > > > -           shared /= threads;
> > > > -       }
> > > > +      get_common_info (&shared, &threads, core);
> > > > +    }
> > > > +  else if (cpu_features->basic.kind == arch_kind_zhaoxin)
> > > > +    {
> > > > +      data = handle_zhaoxin (_SC_LEVEL1_DCACHE_SIZE);
> > > > +      core = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
> > > > +      shared = handle_zhaoxin (_SC_LEVEL3_CACHE_SIZE);
> > > >
> > > > -      /* Account for non-inclusive L2 and L3 caches.  */
> > > > -      if (!inclusive_cache)
> > > > -       {
> > > > -         if (threads_l2 > 0)
> > > > -           core /= threads_l2;
> > > > -         shared += core;
> > > > -       }
> > > > +      get_common_info (&shared, &threads, core);
> > > >      }
> > > >    else if (cpu_features->basic.kind == arch_kind_amd)
> > > >      {
> > > > --
> > > > 2.7.4
> > > >
> >
> > My mailbox may change the format of the patch, so I have
> > attached the updated patch to this email; please check.
> >
> 
> LGTM.
> 
> Thanks.
> 
> --
> H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/3] x86: Add cache information support for Zhaoxin processors
  2020-04-30  5:09         ` Mayshao-oc
@ 2020-04-30  5:15           ` H.J. Lu
  2020-04-30  6:04             ` Mayshao-oc
  0 siblings, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2020-04-30  5:15 UTC (permalink / raw)
  To: Mayshao-oc
  Cc: Carlos O'Donell, Florian Weimer, GNU C Library,
	Qiyuan Wang(BJ-RD), Herry Yang(BJ-RD), Cooper Yan(BJ-RD),
	Ricky Li(BJ-RD)

On Wed, Apr 29, 2020 at 10:10 PM Mayshao-oc <Mayshao-oc@zhaoxin.com> wrote:
>
> Hi
>
> This is my first patch.  I’m not sure what I need to do next about this patch set.
> And I was wondering if this patch set is OK for master.
>
> Thank you all for your patience and kind help.
>

It is OK.  Please check it in.

-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 2/3] x86: Add cache information support for Zhaoxin processors
  2020-04-30  5:15           ` H.J. Lu
@ 2020-04-30  6:04             ` Mayshao-oc
  2020-04-30 12:52               ` H.J. Lu
  2020-04-30 19:10               ` Joseph Myers
  0 siblings, 2 replies; 20+ messages in thread
From: Mayshao-oc @ 2020-04-30  6:04 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Carlos O'Donell, Florian Weimer, GNU C Library,
	Qiyuan Wang(BJ-RD), Herry Yang(BJ-RD), Cooper Yan(BJ-RD),
	Ricky Li(BJ-RD)


On Thu, Apr 30, 2020 at 1:16 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> 
> On Wed, Apr 29, 2020 at 10:10 PM Mayshao-oc <Mayshao-oc@zhaoxin.com>
> wrote:
> >
> > Hi
> >
> > This is my first patch.  I’m not sure what I need to do next about this patch set.
> > And I was wondering if this patch set is OK for master.
> >
> > Thank you all for your patience and kind help.
> >
> 
> It is OK.  Please check it in.
> 
Do you mean that I should push the patch set to master myself?
If so, I don't have permission to push.  Would you like to push it to
master for me at your leisure?

If I have misunderstood something, please feel free to point it out.

Thank you very much.

Best Regards,
May Shao

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/3] x86: Add cache information support for Zhaoxin processors
  2020-04-30  6:04             ` Mayshao-oc
@ 2020-04-30 12:52               ` H.J. Lu
  2020-04-30 13:22                 ` Mayshao-oc
  2020-04-30 19:10               ` Joseph Myers
  1 sibling, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2020-04-30 12:52 UTC (permalink / raw)
  To: Mayshao-oc
  Cc: Carlos O'Donell, Florian Weimer, GNU C Library,
	Qiyuan Wang(BJ-RD), Herry Yang(BJ-RD), Cooper Yan(BJ-RD),
	Ricky Li(BJ-RD)

On Wed, Apr 29, 2020 at 11:04 PM Mayshao-oc <Mayshao-oc@zhaoxin.com> wrote:
>
>
> On Thu, Apr 30, 2020 at 1:16 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Wed, Apr 29, 2020 at 10:10 PM Mayshao-oc <Mayshao-oc@zhaoxin.com>
> > wrote:
> > >
> > > Hi
> > >
> > > This is my first patch.  I’m not sure what I need to do next about this patch set.
> > > And I was wondering if this patch set is OK for master.
> > >
> > > Thank you all for your patience and kind help.
> > >
> >
> > It is OK.  Please check it in.
> >
> Do you mean that I should push the patch set to master myself?
> If so, I don't have permission to push.  Would you like to push it to
> master for me at your leisure?
>
> If I have misunderstood something, please feel free to point it out.
>
> Thank you very much.

Please send me those patches from "git format-patch" as attachments, and I
will apply them for you.
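
For reference, the requested workflow looks roughly like this (the
repository, file, and commit below are invented for illustration):

```shell
# Create a throwaway repository with one commit, then export that
# commit as a mailable patch file with "git format-patch".
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
printf 'int x;\n' > file.c
git add file.c
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "x86: demo commit"
# For a real three-patch series, "git format-patch -3 HEAD" would
# produce 0001-*.patch, 0002-*.patch and 0003-*.patch instead.
git format-patch -1 HEAD
```

Each generated file carries the commit message as the mail subject and
the diff as the body, so it can be attached or piped to git send-email.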

-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 2/3] x86: Add cache information support for Zhaoxin processors
  2020-04-30 12:52               ` H.J. Lu
@ 2020-04-30 13:22                 ` Mayshao-oc
  2020-04-30 13:55                   ` H.J. Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Mayshao-oc @ 2020-04-30 13:22 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Carlos O'Donell, Florian Weimer, GNU C Library,
	Qiyuan Wang(BJ-RD), Herry Yang(BJ-RD), Cooper Yan(BJ-RD),
	Ricky Li(BJ-RD)

[-- Attachment #1: Type: text/plain, Size: 1194 bytes --]


On Thu, Apr 30, 2020 at 8:52 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> 
> On Wed, Apr 29, 2020 at 11:04 PM Mayshao-oc <Mayshao-oc@zhaoxin.com>
> wrote:
> >
> >
> > > On Thu, Apr 30, 2020 at 1:16 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > > On Wed, Apr 29, 2020 at 10:10 PM Mayshao-oc <Mayshao-oc@zhaoxin.com>
> > > > wrote:
> > > >
> > > > Hi
> > > >
> > > > This is my first patch.  I’m not sure what I need to do next about this patch set.
> > > > And I was wondering if this patch set is OK for master.
> > > >
> > > > Thank you all for your patience and kind help.
> > > >
> > >
> > > It is OK.  Please check it in.
> > >
> > Do you mean that I should push the patch set to master myself?
> > If so, I don't have permission to push.  Would you like to push it to
> > master for me at your leisure?
> >
> > If I have misunderstood something, please feel free to point it out.
> >
> > Thank you very much.
> 
> Please send me those patches from "git format-patch" as attachments, and I
> will apply them for you.
> 
I have attached the patch to this email, please check.

Thank you so much.

Best Regards,
May Shao


[-- Attachment #2: 0001-x86-Add-CPU-Vendor-ID-detection-support-for-Zhaoxin-.patch --]
[-- Type: application/octet-stream, Size: 2940 bytes --]

From b45d5780ac9023d7a210a8e86ca3bf41a2e92d50 Mon Sep 17 00:00:00 2001
From: mayshao <mayshao-oc@zhaoxin.com>
Date: Fri, 24 Apr 2020 12:55:38 +0800
Subject: [PATCH v3 1/3] x86: Add CPU Vendor ID detection support for Zhaoxin
 processors

To recognize Zhaoxin CPU Vendor ID, add a new architecture type
arch_kind_zhaoxin for Vendor Zhaoxin detection.
---
 sysdeps/x86/cpu-features.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++
 sysdeps/x86/cpu-features.h |  1 +
 2 files changed, 55 insertions(+)

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 81a170a..bfb415f 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -466,6 +466,60 @@ init_cpu_features (struct cpu_features *cpu_features)
 	  }
 	}
     }
+  /* This spells out "CentaurHauls" or "  Shanghai  ".  */
+  else if ((ebx == 0x746e6543 && ecx == 0x736c7561 && edx == 0x48727561)
+	   || (ebx == 0x68532020 && ecx == 0x20206961 && edx == 0x68676e61))
+    {
+      unsigned int extended_model, stepping;
+
+      kind = arch_kind_zhaoxin;
+
+      get_common_indices (cpu_features, &family, &model, &extended_model,
+			  &stepping);
+
+      get_extended_indices (cpu_features);
+
+      model += extended_model;
+      if (family == 0x6)
+        {
+          if (model == 0xf || model == 0x19)
+            {
+              cpu_features->feature[index_arch_AVX_Usable]
+                &= (~bit_arch_AVX_Usable
+                & ~bit_arch_AVX2_Usable);
+
+              cpu_features->feature[index_arch_Slow_SSE4_2]
+                |= (bit_arch_Slow_SSE4_2);
+
+              cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load]
+                &= ~bit_arch_AVX_Fast_Unaligned_Load;
+            }
+        }
+      else if (family == 0x7)
+        {
+          if (model == 0x1b)
+            {
+              cpu_features->feature[index_arch_AVX_Usable]
+                &= (~bit_arch_AVX_Usable
+                & ~bit_arch_AVX2_Usable);
+
+              cpu_features->feature[index_arch_Slow_SSE4_2]
+                |= bit_arch_Slow_SSE4_2;
+
+              cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load]
+                &= ~bit_arch_AVX_Fast_Unaligned_Load;
+           }
+         else if (model == 0x3b)
+           {
+             cpu_features->feature[index_arch_AVX_Usable]
+               &= (~bit_arch_AVX_Usable
+               & ~bit_arch_AVX2_Usable);
+
+             cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load]
+               &= ~bit_arch_AVX_Fast_Unaligned_Load;
+           }
+       }
+    }
   else
     {
       kind = arch_kind_other;
diff --git a/sysdeps/x86/cpu-features.h b/sysdeps/x86/cpu-features.h
index aea83e6..f05d5ce 100644
--- a/sysdeps/x86/cpu-features.h
+++ b/sysdeps/x86/cpu-features.h
@@ -53,6 +53,7 @@ enum cpu_features_kind
   arch_kind_unknown = 0,
   arch_kind_intel,
   arch_kind_amd,
+  arch_kind_zhaoxin,
   arch_kind_other
 };
 
-- 
2.7.4
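
As a cross-check of the magic numbers in the patch above: CPUID leaf 0
returns the 12-byte vendor string packed little-endian into EBX, EDX,
and ECX, so the constants can be reproduced from the strings themselves
(the helper below is ours, not part of the patch):

```python
import struct

def vendor_words(vendor):
    """Pack a 12-byte CPUID vendor string into the dwords returned by
    CPUID leaf 0.  The string is stored little-endian in EBX, then
    EDX, then ECX; the result is ordered (ebx, ecx, edx) to match the
    comparisons in init_cpu_features."""
    ebx, edx, ecx = struct.unpack("<III", vendor.encode("ascii"))
    return ebx, ecx, edx

# "CentaurHauls" and the padded "  Shanghai  " string (two spaces on
# each side) yield exactly the constants tested in the patch.
print([hex(w) for w in vendor_words("CentaurHauls")])
print([hex(w) for w in vendor_words("  Shanghai  ")])
```

The same decoding applied to "GenuineIntel" reproduces the constants in
the existing Intel branch, which is a handy sanity check.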


[-- Attachment #3: 0002-x86-Add-cache-information-support-for-Zhaoxin-proces.patch --]
[-- Type: application/octet-stream, Size: 16075 bytes --]

From 2f09779cfaa9d14e14ef5f957702a9cb89339abe Mon Sep 17 00:00:00 2001
From: mayshao-oc <mayshao-oc@zhaoxin.com>
Date: Sun, 26 Apr 2020 13:48:27 +0800
Subject: [PATCH v3 2/3] x86: Add cache information support for Zhaoxin processors

To obtain Zhaoxin CPU cache information, add a new function
handle_zhaoxin().

Add a new function get_common_cache_info() that extracts the code
in init_cacheinfo() to get the values of the variables shared and
threads.

Add Zhaoxin branch in init_cacheinfo() for initializing variables,
such as __x86_shared_cache_size.
---
 sysdeps/x86/cacheinfo.c | 478 ++++++++++++++++++++++++++++--------------------
 1 file changed, 282 insertions(+), 196 deletions(-)

diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index e3e8ef2..436c05d 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -436,6 +436,57 @@ handle_amd (int name)
 }
 
 
+static long int __attribute__ ((noinline))
+handle_zhaoxin (int name)
+{
+  unsigned int eax;
+  unsigned int ebx;
+  unsigned int ecx;
+  unsigned int edx;
+
+  int folded_rel_name = (M(name) / 3) * 3;
+
+  unsigned int round = 0;
+  while (1)
+    {
+      __cpuid_count (4, round, eax, ebx, ecx, edx);
+
+      enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
+      if (type == null)
+        break;
+
+      unsigned int level = (eax >> 5) & 0x7;
+
+      if ((level == 1 && type == data
+        && folded_rel_name == M(_SC_LEVEL1_DCACHE_SIZE))
+        || (level == 1 && type == inst
+            && folded_rel_name == M(_SC_LEVEL1_ICACHE_SIZE))
+        || (level == 2 && folded_rel_name == M(_SC_LEVEL2_CACHE_SIZE))
+        || (level == 3 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE)))
+        {
+          unsigned int offset = M(name) - folded_rel_name;
+
+          if (offset == 0)
+            /* Cache size.  */
+            return (((ebx >> 22) + 1)
+                * (((ebx >> 12) & 0x3ff) + 1)
+                * ((ebx & 0xfff) + 1)
+                * (ecx + 1));
+          if (offset == 1)
+            return (ebx >> 22) + 1;
+
+          assert (offset == 2);
+          return (ebx & 0xfff) + 1;
+        }
+
+      ++round;
+    }
+
+  /* Nothing found.  */	
+  return 0;
+}
+
+
 /* Get the value of the system variable NAME.  */
 long int
 attribute_hidden
@@ -449,6 +500,9 @@ __cache_sysconf (int name)
   if (cpu_features->basic.kind == arch_kind_amd)
     return handle_amd (name);
 
+  if (cpu_features->basic.kind == arch_kind_zhaoxin)
+    return handle_zhaoxin (name);
+
   // XXX Fill in more vendors.
 
   /* CPU not known, we have no information.  */
@@ -483,6 +537,224 @@ int __x86_prefetchw attribute_hidden;
 
 
 static void
+get_common_cache_info (long int *shared_ptr, unsigned int *threads_ptr,
+                long int core)
+{
+  unsigned int eax;
+  unsigned int ebx;
+  unsigned int ecx;
+  unsigned int edx;
+
+  /* Number of logical processors sharing L2 cache.  */
+  int threads_l2;
+
+  /* Number of logical processors sharing L3 cache.  */
+  int threads_l3;
+
+  const struct cpu_features *cpu_features = __get_cpu_features ();
+  int max_cpuid = cpu_features->basic.max_cpuid;
+  unsigned int family = cpu_features->basic.family;
+  unsigned int model = cpu_features->basic.model;
+  long int shared = *shared_ptr;
+  unsigned int threads = *threads_ptr;
+  bool inclusive_cache = true;
+  bool support_count_mask = true; 
+
+  /* Try L3 first.  */
+  unsigned int level = 3;
+
+  if (cpu_features->basic.kind == arch_kind_zhaoxin && family == 6)
+    support_count_mask = false;
+  
+  if (shared <= 0)
+    {
+      /* Try L2 otherwise.  */
+      level  = 2;
+      shared = core;
+      threads_l2 = 0;
+      threads_l3 = -1;
+    }
+  else
+    {
+      threads_l2 = 0;
+      threads_l3 = 0;
+    }
+
+  /* A value of 0 for the HTT bit indicates there is only a single
+     logical processor.  */
+  if (HAS_CPU_FEATURE (HTT))
+    {
+      /* Figure out the number of logical threads that share the
+         highest cache level.  */
+      if (max_cpuid >= 4)
+        {
+          int i = 0;
+
+          /* Query until cache level 2 and 3 are enumerated.  */
+          int check = 0x1 | (threads_l3 == 0) << 1;
+          do
+            {
+              __cpuid_count (4, i++, eax, ebx, ecx, edx);
+
+              /* There seems to be a bug in at least some Pentium Ds
+                 which sometimes fail to iterate all cache parameters.
+                 Do not loop indefinitely here, stop in this case and
+                 assume there is no such information.  */
+              if (cpu_features->basic.kind == arch_kind_intel
+                  && (eax & 0x1f) == 0 )
+                goto intel_bug_no_cache_info;
+
+              switch ((eax >> 5) & 0x7)
+                {
+                  default:
+                    break;
+                  case 2:
+                    if ((check & 0x1))
+                      {
+                        /* Get maximum number of logical processors
+                           sharing L2 cache.  */
+                        threads_l2 = (eax >> 14) & 0x3ff;
+                        check &= ~0x1;
+                      }
+                    break;
+                  case 3:
+                    if ((check & (0x1 << 1)))
+                      {
+                        /* Get maximum number of logical processors
+                           sharing L3 cache.  */
+                        threads_l3 = (eax >> 14) & 0x3ff;
+
+                        /* Check if L2 and L3 caches are inclusive.  */
+                        inclusive_cache = (edx & 0x2) != 0;
+                        check &= ~(0x1 << 1);
+                      }
+                    break;
+                }
+            }
+          while (check);
+
+          /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum
+             numbers of addressable IDs for logical processors sharing
+             the cache, instead of the maximum number of threads
+             sharing the cache.  */
+          if (max_cpuid >= 11 && support_count_mask)
+            {
+              /* Find the number of logical processors shipped in
+                 one core and apply count mask.  */
+              i = 0;
+
+              /* Count SMT only if there is L3 cache.  Always count
+                 core if there is no L3 cache.  */
+              int count = ((threads_l2 > 0 && level == 3)
+                           | ((threads_l3 > 0
+                               || (threads_l2 > 0 && level == 2)) << 1));
+
+              while (count)
+                {
+                  __cpuid_count (11, i++, eax, ebx, ecx, edx);
+
+                  int shipped = ebx & 0xff;
+                  int type = ecx & 0xff00;
+                  if (shipped == 0 || type == 0)
+                    break;
+                  else if (type == 0x100)
+                    {
+                      /* Count SMT.  */
+                      if ((count & 0x1))
+                        {
+                          int count_mask;
+
+                          /* Compute count mask.  */
+                          asm ("bsr %1, %0"
+                               : "=r" (count_mask) : "g" (threads_l2));
+                          count_mask = ~(-1 << (count_mask + 1));
+                          threads_l2 = (shipped - 1) & count_mask;
+                          count &= ~0x1;
+                        }
+                    }
+                  else if (type == 0x200)
+                    {
+                      /* Count core.  */
+                      if ((count & (0x1 << 1)))
+                        {
+                          int count_mask;
+                          int threads_core
+                            = (level == 2 ? threads_l2 : threads_l3);
+
+                          /* Compute count mask.  */
+                          asm ("bsr %1, %0"
+                               : "=r" (count_mask) : "g" (threads_core));
+                          count_mask = ~(-1 << (count_mask + 1));
+                          threads_core = (shipped - 1) & count_mask;
+                          if (level == 2)
+                            threads_l2 = threads_core;
+                          else
+                            threads_l3 = threads_core;
+                          count &= ~(0x1 << 1);
+                        }
+                    }
+                }
+            }
+          if (threads_l2 > 0)
+            threads_l2 += 1;
+          if (threads_l3 > 0)
+            threads_l3 += 1;
+          if (level == 2)
+            {
+              if (threads_l2)
+                {
+                  threads = threads_l2;
+                  if (cpu_features->basic.kind == arch_kind_intel
+                      && threads > 2 
+                      && family == 6)
+                    switch (model)
+                      {
+                        case 0x37:
+                        case 0x4a:
+                        case 0x4d:
+                        case 0x5a:
+                        case 0x5d:
+                          /* Silvermont has L2 cache shared by 2 cores.  */
+                          threads = 2;
+                          break;
+                        default:
+                          break;
+                      }
+                }
+            }
+          else if (threads_l3)
+            threads = threads_l3;
+        }
+      else
+        {
+intel_bug_no_cache_info:
+          /* Assume that all logical threads share the highest cache
+             level.  */
+          threads
+            = ((cpu_features->cpuid[COMMON_CPUID_INDEX_1].ebx
+                >> 16) & 0xff);
+        }
+
+        /* Cap usage of highest cache level to the number of supported
+           threads.  */
+        if (shared > 0 && threads > 0)
+          shared /= threads;
+    }
+
+  /* Account for non-inclusive L2 and L3 caches.  */
+  if (!inclusive_cache)
+    {
+      if (threads_l2 > 0)
+        core /= threads_l2;
+      shared += core;
+    }
+
+  *shared_ptr = shared;
+  *threads_ptr = threads;
+}
+
+
+static void
 __attribute__((constructor))
 init_cacheinfo (void)
 {
@@ -494,211 +766,25 @@ init_cacheinfo (void)
   int max_cpuid_ex;
   long int data = -1;
   long int shared = -1;
-  unsigned int level;
+  long int core;
   unsigned int threads = 0;
   const struct cpu_features *cpu_features = __get_cpu_features ();
-  int max_cpuid = cpu_features->basic.max_cpuid;
 
   if (cpu_features->basic.kind == arch_kind_intel)
     {
       data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
-
-      long int core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
-      bool inclusive_cache = true;
-
-      /* Try L3 first.  */
-      level  = 3;
+      core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
       shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features);
 
-      /* Number of logical processors sharing L2 cache.  */
-      int threads_l2;
-
-      /* Number of logical processors sharing L3 cache.  */
-      int threads_l3;
-
-      if (shared <= 0)
-	{
-	  /* Try L2 otherwise.  */
-	  level  = 2;
-	  shared = core;
-	  threads_l2 = 0;
-	  threads_l3 = -1;
-	}
-      else
-	{
-	  threads_l2 = 0;
-	  threads_l3 = 0;
-	}
-
-      /* A value of 0 for the HTT bit indicates there is only a single
-	 logical processor.  */
-      if (HAS_CPU_FEATURE (HTT))
-	{
-	  /* Figure out the number of logical threads that share the
-	     highest cache level.  */
-	  if (max_cpuid >= 4)
-	    {
-	      unsigned int family = cpu_features->basic.family;
-	      unsigned int model = cpu_features->basic.model;
-
-	      int i = 0;
-
-	      /* Query until cache level 2 and 3 are enumerated.  */
-	      int check = 0x1 | (threads_l3 == 0) << 1;
-	      do
-		{
-		  __cpuid_count (4, i++, eax, ebx, ecx, edx);
-
-		  /* There seems to be a bug in at least some Pentium Ds
-		     which sometimes fail to iterate all cache parameters.
-		     Do not loop indefinitely here, stop in this case and
-		     assume there is no such information.  */
-		  if ((eax & 0x1f) == 0)
-		    goto intel_bug_no_cache_info;
-
-		  switch ((eax >> 5) & 0x7)
-		    {
-		    default:
-		      break;
-		    case 2:
-		      if ((check & 0x1))
-			{
-			  /* Get maximum number of logical processors
-			     sharing L2 cache.  */
-			  threads_l2 = (eax >> 14) & 0x3ff;
-			  check &= ~0x1;
-			}
-		      break;
-		    case 3:
-		      if ((check & (0x1 << 1)))
-			{
-			  /* Get maximum number of logical processors
-			     sharing L3 cache.  */
-			  threads_l3 = (eax >> 14) & 0x3ff;
-
-			  /* Check if L2 and L3 caches are inclusive.  */
-			  inclusive_cache = (edx & 0x2) != 0;
-			  check &= ~(0x1 << 1);
-			}
-		      break;
-		    }
-		}
-	      while (check);
-
-	      /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum
-		 numbers of addressable IDs for logical processors sharing
-		 the cache, instead of the maximum number of threads
-		 sharing the cache.  */
-	      if (max_cpuid >= 11)
-		{
-		  /* Find the number of logical processors shipped in
-		     one core and apply count mask.  */
-		  i = 0;
-
-		  /* Count SMT only if there is L3 cache.  Always count
-		     core if there is no L3 cache.  */
-		  int count = ((threads_l2 > 0 && level == 3)
-			       | ((threads_l3 > 0
-				   || (threads_l2 > 0 && level == 2)) << 1));
-
-		  while (count)
-		    {
-		      __cpuid_count (11, i++, eax, ebx, ecx, edx);
-
-		      int shipped = ebx & 0xff;
-		      int type = ecx & 0xff00;
-		      if (shipped == 0 || type == 0)
-			break;
-		      else if (type == 0x100)
-			{
-			  /* Count SMT.  */
-			  if ((count & 0x1))
-			    {
-			      int count_mask;
-
-			      /* Compute count mask.  */
-			      asm ("bsr %1, %0"
-				   : "=r" (count_mask) : "g" (threads_l2));
-			      count_mask = ~(-1 << (count_mask + 1));
-			      threads_l2 = (shipped - 1) & count_mask;
-			      count &= ~0x1;
-			    }
-			}
-		      else if (type == 0x200)
-			{
-			  /* Count core.  */
-			  if ((count & (0x1 << 1)))
-			    {
-			      int count_mask;
-			      int threads_core
-				= (level == 2 ? threads_l2 : threads_l3);
-
-			      /* Compute count mask.  */
-			      asm ("bsr %1, %0"
-				   : "=r" (count_mask) : "g" (threads_core));
-			      count_mask = ~(-1 << (count_mask + 1));
-			      threads_core = (shipped - 1) & count_mask;
-			      if (level == 2)
-				threads_l2 = threads_core;
-			      else
-				threads_l3 = threads_core;
-			      count &= ~(0x1 << 1);
-			    }
-			}
-		    }
-		}
-	      if (threads_l2 > 0)
-		threads_l2 += 1;
-	      if (threads_l3 > 0)
-		threads_l3 += 1;
-	      if (level == 2)
-		{
-		  if (threads_l2)
-		    {
-		      threads = threads_l2;
-		      if (threads > 2 && family == 6)
-			switch (model)
-			  {
-			  case 0x37:
-			  case 0x4a:
-			  case 0x4d:
-			  case 0x5a:
-			  case 0x5d:
-			    /* Silvermont has L2 cache shared by 2 cores.  */
-			    threads = 2;
-			    break;
-			  default:
-			    break;
-			  }
-		    }
-		}
-	      else if (threads_l3)
-		threads = threads_l3;
-	    }
-	  else
-	    {
-intel_bug_no_cache_info:
-	      /* Assume that all logical threads share the highest cache
-		 level.  */
-
-	      threads
-		= ((cpu_features->cpuid[COMMON_CPUID_INDEX_1].ebx
-		    >> 16) & 0xff);
-	    }
-
-	  /* Cap usage of highest cache level to the number of supported
-	     threads.  */
-	  if (shared > 0 && threads > 0)
-	    shared /= threads;
-	}
+      get_common_cache_info (&shared, &threads, core);
+    }
+  else if (cpu_features->basic.kind == arch_kind_zhaoxin)
+    {
+      data = handle_zhaoxin (_SC_LEVEL1_DCACHE_SIZE);
+      core = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
+      shared = handle_zhaoxin (_SC_LEVEL3_CACHE_SIZE);
 
-      /* Account for non-inclusive L2 and L3 caches.  */
-      if (!inclusive_cache)
-	{
-	  if (threads_l2 > 0)
-	    core /= threads_l2;
-	  shared += core;
-	}
+      get_common_cache_info (&shared, &threads, core);
     }
   else if (cpu_features->basic.kind == arch_kind_amd)
     {
-- 
2.7.4


[-- Attachment #4: 0003-x86-Add-the-test-case-of-__get_cpu_features-support-.patch --]
[-- Type: application/octet-stream, Size: 1066 bytes --]

From de68a1831a434c69c9614e144ed60dd0677dfd1a Mon Sep 17 00:00:00 2001
From: mayshao-oc <mayshao-oc@zhaoxin.com>
Date: Sun, 26 Apr 2020 13:49:44 +0800
Subject: [PATCH v3 3/3] x86: Add the test case of __get_cpu_features support for
 Zhaoxin processors

For the test case of the __get_cpu_features interface, add an item in
cpu_kinds and a switch case for Zhaoxin support.
---
 sysdeps/x86/tst-get-cpu-features.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sysdeps/x86/tst-get-cpu-features.c b/sysdeps/x86/tst-get-cpu-features.c
index 0f55987..0dcb906 100644
--- a/sysdeps/x86/tst-get-cpu-features.c
+++ b/sysdeps/x86/tst-get-cpu-features.c
@@ -38,6 +38,7 @@ static const char * const cpu_kinds[] =
   "Unknown",
   "Intel",
   "AMD",
+  "ZHAOXIN",
   "Other",
 };
 
@@ -50,6 +51,7 @@ do_test (void)
     {
     case arch_kind_intel:
     case arch_kind_amd:
+    case arch_kind_zhaoxin:
     case arch_kind_other:
       printf ("Vendor: %s\n", cpu_kinds[cpu_features->basic.kind]);
       printf ("Family: 0x%x\n", cpu_features->basic.family);
-- 
2.7.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/3] x86: Add cache information support for Zhaoxin processors
  2020-04-30 13:22                 ` Mayshao-oc
@ 2020-04-30 13:55                   ` H.J. Lu
  2020-04-30 14:39                     ` Mayshao-oc
  0 siblings, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2020-04-30 13:55 UTC (permalink / raw)
  To: Mayshao-oc
  Cc: Carlos O'Donell, Florian Weimer, GNU C Library,
	Qiyuan Wang(BJ-RD), Herry Yang(BJ-RD), Cooper Yan(BJ-RD),
	Ricky Li(BJ-RD)

On Thu, Apr 30, 2020 at 6:23 AM Mayshao-oc <Mayshao-oc@zhaoxin.com> wrote:
>
>
> On Thu, Apr 30, 2020 at 8:52 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Wed, Apr 29, 2020 at 11:04 PM Mayshao-oc <Mayshao-oc@zhaoxin.com>
> > wrote:
> > >
> > >
> > > > On Thu, Apr 30, 2020 at 1:16 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >
> > > > > On Wed, Apr 29, 2020 at 10:10 PM Mayshao-oc <Mayshao-oc@zhaoxin.com>
> > > > > wrote:
> > > > >
> > > > > Hi
> > > > >
> > > > > > This is my first patch.  I’m not sure what I need to do next about this patch set.
> > > > > > And I was wondering if this patch set is OK for master.
> > > > >
> > > > > Thank you all for your patience and kind help.
> > > > >
> > > >
> > > > It is OK.  Please check it in.
> > > >
> > > Do you mean that I should push the patch set to master myself?
> > > If so, I don't have permission to push.  Would you like to push it to
> > > master for me at your leisure?
> > >
> > > If I have misunderstood something, please feel free to point it out.
> > >
> > > Thank you very much.
> >
> > Please send me those patches from "git format-patch" as attachments, and I
> > will apply them for you.
> >
> I have attached the patches to this email; please check.
>
> Thank you so much.
>

I checked in your patches after fixing up the white space issues in
the second patch:

Applying: x86: Add cache information support for Zhaoxin processors
.git/rebase-apply/patch:59: trailing whitespace.
  /* Nothing found.  */
.git/rebase-apply/patch:102: trailing whitespace.
  bool support_count_mask = true;
.git/rebase-apply/patch:109: trailing whitespace.

.git/rebase-apply/patch:249: trailing whitespace.
                      && threads > 2
warning: 4 lines add whitespace errors.

-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 2/3] x86: Add cache information support for Zhaoxin processors
  2020-04-30 13:55                   ` H.J. Lu
@ 2020-04-30 14:39                     ` Mayshao-oc
  0 siblings, 0 replies; 20+ messages in thread
From: Mayshao-oc @ 2020-04-30 14:39 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Carlos O'Donell, Florian Weimer, GNU C Library,
	Qiyuan Wang(BJ-RD), Herry Yang(BJ-RD), Cooper Yan(BJ-RD),
	Ricky Li(BJ-RD)


On Thu, Apr 30, 2020 at 9:55 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> 
> On Thu, Apr 30, 2020 at 6:23 AM Mayshao-oc <Mayshao-oc@zhaoxin.com>
> wrote:
> >
> >
> > On Thu, Apr 30, 2020 at 8:52 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Wed, Apr 29, 2020 at 11:04 PM Mayshao-oc <Mayshao-oc@zhaoxin.com>
> > > wrote:
> > > >
> > > >
> > > > On Thu, Apr 30, 2020 at 1:16 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > > >
> > > > > On Wed, Apr 29, 2020 at 10:10 PM Mayshao-oc <Mayshao-oc@zhaoxin.com>
> > > > > wrote:
> > > > > >
> > > > > > Hi
> > > > > >
> > > > > > This is my first patch.  I’m not sure what I need to do next about
> > > > > > this patch set.
> > > > > > And I was wondering if this patch set is OK for master.
> > > > > >
> > > > > > Thank you all for your patience and kind help.
> > > > > >
> > > > >
> > > > > It is OK.  Please check it in.
> > > > >
> > > > Do you mean that I should push the patch set to master myself?
> > > > If so, I don't have permission to push.  Would you like to push to
> > > > master for me at your leisure?
> > > >
> > > > If I understand something wrong, please feel free to point it out.
> > > >
> > > > Thank you very much.
> > >
> > > Please send me those patches from "git format-patch" as attachments; I
> > > will apply them for you.
> > >
> > I have attached the patches to this email; please check.
> >
> > Thank you so much.
> >
> 
> I checked in your patches after fixing up the white space issues in
> the second patch:
> 
> Applying: x86: Add cache information support for Zhaoxin processors
> .git/rebase-apply/patch:59: trailing whitespace.
>   /* Nothing found.  */
> .git/rebase-apply/patch:102: trailing whitespace.
>   bool support_count_mask = true;
> .git/rebase-apply/patch:109: trailing whitespace.
> 
> .git/rebase-apply/patch:249: trailing whitespace.
>                       && threads > 2
> warning: 4 lines add whitespace errors.
> 

I didn’t notice this before; it was my fault.  Thank you for fixing it up.


Best Regards,
May Shao

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3 2/3] x86: Add cache information support for Zhaoxin processors
  2020-04-30  6:04             ` Mayshao-oc
  2020-04-30 12:52               ` H.J. Lu
@ 2020-04-30 19:10               ` Joseph Myers
  2020-04-30 19:16                 ` Florian Weimer
  1 sibling, 1 reply; 20+ messages in thread
From: Joseph Myers @ 2020-04-30 19:10 UTC (permalink / raw)
  To: Mayshao-oc
  Cc: H.J. Lu, Qiyuan Wang(BJ-RD), Cooper Yan(BJ-RD), Herry Yang(BJ-RD),
	Florian Weimer, GNU C Library, Ricky Li(BJ-RD)

This has broken the build for 32-bit x86.

In file included from ../sysdeps/i386/cacheinfo.c:3:
../sysdeps/x86/cacheinfo.c: In function 'init_cacheinfo':
../sysdeps/x86/cacheinfo.c:762:16: error: unused variable 'eax' [-Werror=unused-variable]
  762 |   unsigned int eax;
      |                ^~~
cc1: all warnings being treated as errors

The variable eax in that function is only used inside #ifndef 
DISABLE_PREFETCHW, and 32-bit defines DISABLE_PREFETCHW in 
sysdeps/i386/cacheinfo.c.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/3] x86: Add cache information support for Zhaoxin processors
  2020-04-30 19:10               ` Joseph Myers
@ 2020-04-30 19:16                 ` Florian Weimer
  2020-04-30 19:21                   ` H.J. Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Weimer @ 2020-04-30 19:16 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Mayshao-oc, H.J. Lu, Qiyuan Wang(BJ-RD), Cooper Yan(BJ-RD),
	Herry Yang(BJ-RD), GNU C Library, Ricky Li(BJ-RD)

* Joseph Myers:

> This has broken the build for 32-bit x86.
>
> In file included from ../sysdeps/i386/cacheinfo.c:3:
> ../sysdeps/x86/cacheinfo.c: In function 'init_cacheinfo':
> ../sysdeps/x86/cacheinfo.c:762:16: error: unused variable 'eax' [-Werror=unused-variable]
>   762 |   unsigned int eax;
>       |                ^~~
> cc1: all warnings being treated as errors
>
> The variable eax in that function is only used inside #ifndef 
> DISABLE_PREFETCHW, and 32-bit defines DISABLE_PREFETCHW in 
> sysdeps/i386/cacheinfo.c.

This seems to fix it.  Okay?

8<------------------------------------------------------------------8<
Subject: i386: Remove unused variable in sysdeps/x86/cacheinfo.c

Commit a98dc92dd1e278df4c501deb07985018bc2b06de ("x86: Add cache
information support for Zhaoxin processors") introduced an unused
variable warning in the default i686-linux-gnu build:

In file included from ../sysdeps/i386/cacheinfo.c:3:
../sysdeps/x86/cacheinfo.c: In function 'init_cacheinfo':
../sysdeps/x86/cacheinfo.c:762:16: error: unused variable 'eax' [-Werror=unused-variable]
  762 |   unsigned int eax;
      |                ^~~

-----
 sysdeps/x86/cacheinfo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index 17d7e41aed..311502dee3 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -759,7 +759,6 @@ __attribute__((constructor))
 init_cacheinfo (void)
 {
   /* Find out what brand of processor.  */
-  unsigned int eax;
   unsigned int ebx;
   unsigned int ecx;
   unsigned int edx;
@@ -830,6 +829,7 @@ init_cacheinfo (void)
 #ifndef DISABLE_PREFETCHW
       if (max_cpuid_ex >= 0x80000001)
 	{
+	  unsigned int eax;
 	  __cpuid (0x80000001, eax, ebx, ecx, edx);
 	  /*  PREFETCHW     || 3DNow!  */
 	  if ((ecx & 0x100) || (edx & 0x80000000))

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/3] x86: Add cache information support for Zhaoxin processors
  2020-04-30 19:16                 ` Florian Weimer
@ 2020-04-30 19:21                   ` H.J. Lu
  2020-04-30 20:04                     ` Florian Weimer
  0 siblings, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2020-04-30 19:21 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Joseph Myers, Mayshao-oc, Qiyuan Wang(BJ-RD), Cooper Yan(BJ-RD),
	Herry Yang(BJ-RD), GNU C Library, Ricky Li(BJ-RD)

On Thu, Apr 30, 2020 at 12:16 PM Florian Weimer <fw@deneb.enyo.de> wrote:
>
> * Joseph Myers:
>
> > This has broken the build for 32-bit x86.
> >
> > In file included from ../sysdeps/i386/cacheinfo.c:3:
> > ../sysdeps/x86/cacheinfo.c: In function 'init_cacheinfo':
> > ../sysdeps/x86/cacheinfo.c:762:16: error: unused variable 'eax' [-Werror=unused-variable]
> >   762 |   unsigned int eax;
> >       |                ^~~
> > cc1: all warnings being treated as errors
> >
> > The variable eax in that function is only used inside #ifndef
> > DISABLE_PREFETCHW, and 32-bit defines DISABLE_PREFETCHW in
> > sysdeps/i386/cacheinfo.c.

Why didn't I see the problem with GCC 10?

> This seems to fix it.  Okay?
>
> 8<------------------------------------------------------------------8<
> Subject: i386: Remove unused variable in sysdeps/x86/cacheinfo.c
>
> Commit a98dc92dd1e278df4c501deb07985018bc2b06de ("x86: Add cache
> information support for Zhaoxin processors") introduced an unused
> variable warning in the default i686-linux-gnu build:
>
> In file included from ../sysdeps/i386/cacheinfo.c:3:
> ../sysdeps/x86/cacheinfo.c: In function 'init_cacheinfo':
> ../sysdeps/x86/cacheinfo.c:762:16: error: unused variable 'eax' [-Werror=unused-variable]
>   762 |   unsigned int eax;
>       |                ^~~
>
> -----
>  sysdeps/x86/cacheinfo.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
> index 17d7e41aed..311502dee3 100644
> --- a/sysdeps/x86/cacheinfo.c
> +++ b/sysdeps/x86/cacheinfo.c
> @@ -759,7 +759,6 @@ __attribute__((constructor))
>  init_cacheinfo (void)
>  {
>    /* Find out what brand of processor.  */
> -  unsigned int eax;
>    unsigned int ebx;
>    unsigned int ecx;
>    unsigned int edx;
> @@ -830,6 +829,7 @@ init_cacheinfo (void)
>  #ifndef DISABLE_PREFETCHW
>        if (max_cpuid_ex >= 0x80000001)
>         {
> +         unsigned int eax;
>           __cpuid (0x80000001, eax, ebx, ecx, edx);
>           /*  PREFETCHW     || 3DNow!  */
>           if ((ecx & 0x100) || (edx & 0x80000000))

OK if it fixes the build problem.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 2/3] x86: Add cache information support for Zhaoxin processors
  2020-04-30 19:21                   ` H.J. Lu
@ 2020-04-30 20:04                     ` Florian Weimer
  0 siblings, 0 replies; 20+ messages in thread
From: Florian Weimer @ 2020-04-30 20:04 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Joseph Myers, Mayshao-oc, Qiyuan Wang(BJ-RD), Cooper Yan(BJ-RD),
	Herry Yang(BJ-RD), GNU C Library, Ricky Li(BJ-RD)

* H. J. Lu:

> On Thu, Apr 30, 2020 at 12:16 PM Florian Weimer <fw@deneb.enyo.de> wrote:
>>
>> * Joseph Myers:
>>
>> > This has broken the build for 32-bit x86.
>> >
>> > In file included from ../sysdeps/i386/cacheinfo.c:3:
>> > ../sysdeps/x86/cacheinfo.c: In function 'init_cacheinfo':
>> > ../sysdeps/x86/cacheinfo.c:762:16: error: unused variable 'eax' [-Werror=unused-variable]
>> >   762 |   unsigned int eax;
>> >       |                ^~~
>> > cc1: all warnings being treated as errors
>> >
>> > The variable eax in that function is only used inside #ifndef
>> > DISABLE_PREFETCHW, and 32-bit defines DISABLE_PREFETCHW in
>> > sysdeps/i386/cacheinfo.c.
>
> Why didn't I see the problem with GCC 10?

I don't know.  Sounds like a GCC 10 bug. 8-/

> OK if it fixes the build problem.

Thanks, pushed after testing on i686-linux-gnu and x86_64-linux-gnu.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-04-30 20:04 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-24 12:29 [PATCH v3 0/3] x86: Add support for Zhaoxin processors mayshao-oc
2020-04-24 12:29 ` [PATCH v3 1/3] x86: Add CPU Vendor ID detection " mayshao-oc
2020-04-24 12:53   ` H.J. Lu
2020-04-24 12:29 ` [PATCH v3 2/3] x86: Add cache information " mayshao-oc
2020-04-24 12:53   ` H.J. Lu
2020-04-26  5:54     ` Mayshao-oc
2020-04-26 12:07       ` H.J. Lu
2020-04-30  5:09         ` Mayshao-oc
2020-04-30  5:15           ` H.J. Lu
2020-04-30  6:04             ` Mayshao-oc
2020-04-30 12:52               ` H.J. Lu
2020-04-30 13:22                 ` Mayshao-oc
2020-04-30 13:55                   ` H.J. Lu
2020-04-30 14:39                     ` Mayshao-oc
2020-04-30 19:10               ` Joseph Myers
2020-04-30 19:16                 ` Florian Weimer
2020-04-30 19:21                   ` H.J. Lu
2020-04-30 20:04                     ` Florian Weimer
2020-04-24 12:29 ` [PATCH v3 3/3] x86: Add the test case of __get_cpu_features " mayshao-oc
2020-04-24 12:54   ` H.J. Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).