[PATCH 0/1] Optimizing memcpy for AMD Zen architecture.

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

* [PATCH 0/1] Optimizing memcpy for AMD Zen architecture.
@ 2020-10-22  4:50 sajan.karumanchi
  2020-10-22  4:50 ` [PATCH 1/1] x86: " sajan.karumanchi
  0 siblings, 1 reply; 9+ messages in thread
From: sajan.karumanchi @ 2020-10-22  4:50 UTC (permalink / raw)
  To: libc-alpha, carlos, fweimer; +Cc: Sajan Karumanchi, premachandra.mallappa

From: Sajan Karumanchi <sajan.karumanchi@amd.com>

Modifying the shareable cache '__x86_shared_cache_size', which is a
factor in computing the non-temporal threshold parameter
'__x86_shared_non_temporal_threshold' to optimize memcpy for AMD Zen
architecture.
In the existing implementation, the shareable cache is computed as 'L3
per thread, L2 per core'.
Recomputing this shareable cache as 'L3 per CCX'(Core-Complex) has
brought performance gains of ~44% for memory sizes greater than 16MB.

The patch I posted earlier: 'Tuning NT Threshold parameter for AMD
machines'
https://sourceware.org/pipermail/libc-alpha/2020-August/117080.html
and the recent patch committed by Patrick McGehearty: 'Reversing
calculation of __x86_shared_non_temporal_threshold', both have
regression problems on AMD Zen machines for memory ranges of 1MB to 8MB
as per the large bench variant results.
This patch addresses the regression problem on AMD Zen machines.
The below link will show the performance results chart comparison of
'Master' branch and 'AMD' patch against the 2.32 stable release.
https://i.imgur.com/0ZJAwes.png
Summary: On master branch we see a regression for memoery sizes below
8MB with performance drop of upto 99%, whereas AMD patch has performance
gains for 16MB and above with no regressions.

Note: The benchmarking is done by isolating all the cpu cores in a CCX,
configuring them to fixed frequency mode and routing the IRQs to other
cpu cores.
Then the large bench tests were run by pinning to one of the isolated
cores for 1000 iterations and the performance computation is done by
taking average of these iterations.

Sajan Karumanchi (1):
  x86: Optimizing memcpy for AMD Zen architecture.

 sysdeps/x86/cacheinfo.h | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/1] x86: Optimizing memcpy for AMD Zen architecture.
  2020-10-22  4:50 [PATCH 0/1] Optimizing memcpy for AMD Zen architecture sajan.karumanchi
@ 2020-10-22  4:50 ` sajan.karumanchi
  2020-10-27 10:38   ` Florian Weimer
  2020-10-28 14:00   ` [PATCH 1/1] " H.J. Lu
  0 siblings, 2 replies; 9+ messages in thread
From: sajan.karumanchi @ 2020-10-22  4:50 UTC (permalink / raw)
  To: libc-alpha, carlos, fweimer; +Cc: Sajan Karumanchi, premachandra.mallappa

From: Sajan Karumanchi <sajan.karumanchi@amd.com>

Modifying the shareable cache '__x86_shared_cache_size', which is a
factor in computing the non-temporal threshold parameter
'__x86_shared_non_temporal_threshold' to optimize memcpy for AMD Zen
architectures.
In the existing implementation, the shareable cache is computed as 'L3
per thread, L2 per core'. Recomputing this shareable cache as 'L3 per
CCX(Core-Complex)' has brought in performance gains.
As per the large bench variant results, this patch also addresses the
regression problem on AMD Zen architectures.

Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
Signed-off-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
Signed-off-by: Sajan Karumanchi <sajan.karumanchi@amd.com>
---
 sysdeps/x86/cacheinfo.h | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/sysdeps/x86/cacheinfo.h b/sysdeps/x86/cacheinfo.h
index 7f342fdc23..d6d6877702 100644
--- a/sysdeps/x86/cacheinfo.h
+++ b/sysdeps/x86/cacheinfo.h
@@ -303,6 +303,8 @@ init_cacheinfo (void)
       data   = handle_amd (_SC_LEVEL1_DCACHE_SIZE);
       long int core = handle_amd (_SC_LEVEL2_CACHE_SIZE);
       shared = handle_amd (_SC_LEVEL3_CACHE_SIZE);
+      unsigned int eax;
+      unsigned int threads_per_ccx = 0;
 
       /* Get maximum extended function. */
       __cpuid (0x80000000, max_cpuid_ex, ebx, ecx, edx);
@@ -320,7 +322,7 @@ init_cacheinfo (void)
 	      threads = 1 << ((ecx >> 12) & 0x0f);
 	    }
 
-	  if (threads == 0)
+	  if (threads == 0 || cpu_features->basic.family >= 0x17)
 	    {
 	      /* If APIC ID width is not available, use logical
 		 processor count.  */
@@ -335,13 +337,27 @@ init_cacheinfo (void)
 	  if (threads > 0)
 	    shared /= threads;
 
-	  /* Account for exclusive L2 and L3 caches.  */
-	  shared += core;
-	}
+	  /* Get shared cache per ccx for Zen architectures */
+	  if (cpu_features->basic.family >= 0x17)
+	    {
+	      /* Get number of threads share the L3 cache in CCX */
+	      __cpuid_count(0x8000001D, 0x3, eax, ebx, ecx, edx);
+	      threads_per_ccx = ((eax >> 14) & 0xfff) + 1;
+	      shared = shared * threads_per_ccx;
+	    }
+	  else
+	    {
+	      /* Account for exclusive L2 and L3 caches.  */
+	      shared += core;
+            }
+      }
     }
 
   if (cpu_features->data_cache_size != 0)
-    data = cpu_features->data_cache_size;
+    {
+      if (data == 0 || cpu_features->basic.kind != arch_kind_amd)
+        data = cpu_features->data_cache_size;
+    }
 
   if (data > 0)
     {
@@ -354,7 +370,10 @@ init_cacheinfo (void)
     }
 
   if (cpu_features->shared_cache_size != 0)
-    shared = cpu_features->shared_cache_size;
+    {
+      if (shared == 0 || cpu_features->basic.kind != arch_kind_amd)
+        shared = cpu_features->shared_cache_size;
+    }
 
   if (shared > 0)
     {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] x86: Optimizing memcpy for AMD Zen architecture.
  2020-10-22  4:50 ` [PATCH 1/1] x86: " sajan.karumanchi
@ 2020-10-27 10:38   ` Florian Weimer
  2020-10-28  7:35     ` [PATCH] " sajan.karumanchi
  2020-10-28 14:00   ` [PATCH 1/1] " H.J. Lu
  1 sibling, 1 reply; 9+ messages in thread
From: Florian Weimer @ 2020-10-27 10:38 UTC (permalink / raw)
  To: sajan.karumanchi; +Cc: libc-alpha, carlos, premachandra.mallappa

* sajan karumanchi:

> diff --git a/sysdeps/x86/cacheinfo.h b/sysdeps/x86/cacheinfo.h
> index 7f342fdc23..d6d6877702 100644
> --- a/sysdeps/x86/cacheinfo.h
> +++ b/sysdeps/x86/cacheinfo.h
> @@ -303,6 +303,8 @@ init_cacheinfo (void)
>        data   = handle_amd (_SC_LEVEL1_DCACHE_SIZE);
>        long int core = handle_amd (_SC_LEVEL2_CACHE_SIZE);
>        shared = handle_amd (_SC_LEVEL3_CACHE_SIZE);
> +      unsigned int eax;
> +      unsigned int threads_per_ccx = 0;
>  
>        /* Get maximum extended function. */
>        __cpuid (0x80000000, max_cpuid_ex, ebx, ecx, edx);
> @@ -320,7 +322,7 @@ init_cacheinfo (void)
>  	      threads = 1 << ((ecx >> 12) & 0x0f);
>  	    }
>  
> -	  if (threads == 0)
> +	  if (threads == 0 || cpu_features->basic.family >= 0x17)
>  	    {
>  	      /* If APIC ID width is not available, use logical
>  		 processor count.  */
> @@ -335,13 +337,27 @@ init_cacheinfo (void)
>  	  if (threads > 0)
>  	    shared /= threads;
>  
> -	  /* Account for exclusive L2 and L3 caches.  */
> -	  shared += core;
> -	}
> +	  /* Get shared cache per ccx for Zen architectures */
> +	  if (cpu_features->basic.family >= 0x17)
> +	    {
> +	      /* Get number of threads share the L3 cache in CCX */
> +	      __cpuid_count(0x8000001D, 0x3, eax, ebx, ecx, edx);
> +	      threads_per_ccx = ((eax >> 14) & 0xfff) + 1;
> +	      shared = shared * threads_per_ccx;
> +	    }
> +	  else
> +	    {
> +	      /* Account for exclusive L2 and L3 caches.  */
> +	      shared += core;
> +            }
> +      }
>      }

Although not visible in the patch, these changes a properly guarded by
an arch_kind_amd check, as expected.

Beyond that, I can't comment on the substance of the patch, but I'd like
to request the follow style changes:

* Move the definitions of eax and threads_per_ccx closer
  to their usage site.  Initialize threads_per_ccx directly with its
  final variable.  (The separate variable is nice for documentation
  purposes.)
* Add a space after __cpuid_count (to follow GNU style).
* Add ".  " (period and two spaces) at the end of all new comments.
* Remove Signed-off-by.  glibc does not use DCO
  <https://developercertificate.org/>.  I assume this patch is covered
  by AMD's copyright assignment instead.

I can make these changes for you and push this, or you can post a new
patch.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] x86: Optimizing memcpy for AMD Zen architecture.
  2020-10-27 10:38   ` Florian Weimer
@ 2020-10-28  7:35     ` sajan.karumanchi
  2020-10-28  9:22       ` Florian Weimer
  0 siblings, 1 reply; 9+ messages in thread
From: sajan.karumanchi @ 2020-10-28  7:35 UTC (permalink / raw)
  To: libc-alpha, carlos, fweimer; +Cc: premachandra.mallappa, Sajan Karumanchi

From: Sajan Karumanchi <sajan.karumanchi@amd.com>

Modifying the shareable cache '__x86_shared_cache_size', which is a
factor in computing the non-temporal threshold parameter
'__x86_shared_non_temporal_threshold' to optimize memcpy for AMD Zen
architectures.
In the existing implementation, the shareable cache is computed as 'L3
per thread, L2 per core'. Recomputing this shareable cache as 'L3 per
CCX(Core-Complex)' has brought in performance gains.
As per the large bench variant results, this patch also addresses the
regression problem on AMD Zen architectures.

Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
---
 sysdeps/x86/cacheinfo.h | 32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/sysdeps/x86/cacheinfo.h b/sysdeps/x86/cacheinfo.h
index 7f342fdc23..1296c93b2b 100644
--- a/sysdeps/x86/cacheinfo.h
+++ b/sysdeps/x86/cacheinfo.h
@@ -320,7 +320,7 @@ init_cacheinfo (void)
 	      threads = 1 << ((ecx >> 12) & 0x0f);
 	    }
 
-	  if (threads == 0)
+	  if (threads == 0 || cpu_features->basic.family >= 0x17)
 	    {
 	      /* If APIC ID width is not available, use logical
 		 processor count.  */
@@ -335,13 +335,30 @@ init_cacheinfo (void)
 	  if (threads > 0)
 	    shared /= threads;
 
-	  /* Account for exclusive L2 and L3 caches.  */
-	  shared += core;
-	}
+	  /* Get shared cache per ccx for Zen architectures.  */
+	  if (cpu_features->basic.family >= 0x17)
+	    {
+	      unsigned int eax;
+
+	      /* Get number of threads share the L3 cache in CCX.  */
+	      __cpuid_count (0x8000001D, 0x3, eax, ebx, ecx, edx);
+
+	      unsigned int threads_per_ccx = ((eax >> 14) & 0xfff) + 1;
+	      shared *= threads_per_ccx;
+	    }
+	  else
+	    {
+	      /* Account for exclusive L2 and L3 caches.  */
+	      shared += core;
+            }
+      }
     }
 
   if (cpu_features->data_cache_size != 0)
-    data = cpu_features->data_cache_size;
+    {
+      if (data == 0 || cpu_features->basic.kind != arch_kind_amd)
+        data = cpu_features->data_cache_size;
+    }
 
   if (data > 0)
     {
@@ -354,7 +371,10 @@ init_cacheinfo (void)
     }
 
   if (cpu_features->shared_cache_size != 0)
-    shared = cpu_features->shared_cache_size;
+    {
+      if (shared == 0 || cpu_features->basic.kind != arch_kind_amd)
+        shared = cpu_features->shared_cache_size;
+    }
 
   if (shared > 0)
     {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: Optimizing memcpy for AMD Zen architecture.
  2020-10-28  7:35     ` [PATCH] " sajan.karumanchi
@ 2020-10-28  9:22       ` Florian Weimer
  0 siblings, 0 replies; 9+ messages in thread
From: Florian Weimer @ 2020-10-28  9:22 UTC (permalink / raw)
  To: sajan.karumanchi; +Cc: libc-alpha, carlos, premachandra.mallappa

* sajan karumanchi:

> From: Sajan Karumanchi <sajan.karumanchi@amd.com>
>
> Modifying the shareable cache '__x86_shared_cache_size', which is a
> factor in computing the non-temporal threshold parameter
> '__x86_shared_non_temporal_threshold' to optimize memcpy for AMD Zen
> architectures.
> In the existing implementation, the shareable cache is computed as 'L3
> per thread, L2 per core'. Recomputing this shareable cache as 'L3 per
> CCX(Core-Complex)' has brought in performance gains.
> As per the large bench variant results, this patch also addresses the
> regression problem on AMD Zen architectures.
>
> Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>

Thanks, I've pushed this version for you.

Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] x86: Optimizing memcpy for AMD Zen architecture.
  2020-10-22  4:50 ` [PATCH 1/1] x86: " sajan.karumanchi
  2020-10-27 10:38   ` Florian Weimer
@ 2020-10-28 14:00   ` H.J. Lu
  2020-10-28 14:29     ` Florian Weimer
  1 sibling, 1 reply; 9+ messages in thread
From: H.J. Lu @ 2020-10-28 14:00 UTC (permalink / raw)
  To: Sajan Karumanchi
  Cc: GNU C Library, Carlos O'Donell, Florian Weimer, Mallappa,
	Premachandra

On Wed, Oct 21, 2020 at 9:51 PM <sajan.karumanchi@amd.com> wrote:
>
> From: Sajan Karumanchi <sajan.karumanchi@amd.com>
>
> Modifying the shareable cache '__x86_shared_cache_size', which is a
> factor in computing the non-temporal threshold parameter
> '__x86_shared_non_temporal_threshold' to optimize memcpy for AMD Zen
> architectures.
> In the existing implementation, the shareable cache is computed as 'L3
> per thread, L2 per core'. Recomputing this shareable cache as 'L3 per
> CCX(Core-Complex)' has brought in performance gains.
> As per the large bench variant results, this patch also addresses the
> regression problem on AMD Zen architectures.
>
> Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
> Signed-off-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
> Signed-off-by: Sajan Karumanchi <sajan.karumanchi@amd.com>
> ---
>  sysdeps/x86/cacheinfo.h | 31 +++++++++++++++++++++++++------
>  1 file changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/sysdeps/x86/cacheinfo.h b/sysdeps/x86/cacheinfo.h
> index 7f342fdc23..d6d6877702 100644
> --- a/sysdeps/x86/cacheinfo.h
> +++ b/sysdeps/x86/cacheinfo.h
> @@ -303,6 +303,8 @@ init_cacheinfo (void)
>        data   = handle_amd (_SC_LEVEL1_DCACHE_SIZE);
>        long int core = handle_amd (_SC_LEVEL2_CACHE_SIZE);
>        shared = handle_amd (_SC_LEVEL3_CACHE_SIZE);
> +      unsigned int eax;
> +      unsigned int threads_per_ccx = 0;
>
>        /* Get maximum extended function. */
>        __cpuid (0x80000000, max_cpuid_ex, ebx, ecx, edx);
> @@ -320,7 +322,7 @@ init_cacheinfo (void)
>               threads = 1 << ((ecx >> 12) & 0x0f);
>             }
>
> -         if (threads == 0)
> +         if (threads == 0 || cpu_features->basic.family >= 0x17)
>             {
>               /* If APIC ID width is not available, use logical
>                  processor count.  */
> @@ -335,13 +337,27 @@ init_cacheinfo (void)
>           if (threads > 0)
>             shared /= threads;
>
> -         /* Account for exclusive L2 and L3 caches.  */
> -         shared += core;
> -       }
> +         /* Get shared cache per ccx for Zen architectures */
> +         if (cpu_features->basic.family >= 0x17)
> +           {
> +             /* Get number of threads share the L3 cache in CCX */
> +             __cpuid_count(0x8000001D, 0x3, eax, ebx, ecx, edx);
> +             threads_per_ccx = ((eax >> 14) & 0xfff) + 1;
> +             shared = shared * threads_per_ccx;
> +           }
> +         else
> +           {
> +             /* Account for exclusive L2 and L3 caches.  */
> +             shared += core;
> +            }
> +      }
>      }
>
>    if (cpu_features->data_cache_size != 0)
> -    data = cpu_features->data_cache_size;
> +    {
> +      if (data == 0 || cpu_features->basic.kind != arch_kind_amd)
> +        data = cpu_features->data_cache_size;
> +    }

This looks wrong.  cpu_features->data_cache_size is set by
GLIBC tunables:

-- Tunable: glibc.cpu.x86_data_cache_size
     The ‘glibc.cpu.x86_data_cache_size’ tunable allows the user to set
     data cache size in bytes for use in memory and string routines.

     This tunable is specific to i386 and x86-64.

Why is it ignored?

>    if (data > 0)
>      {
> @@ -354,7 +370,10 @@ init_cacheinfo (void)
>      }
>
>    if (cpu_features->shared_cache_size != 0)
> -    shared = cpu_features->shared_cache_size;
> +    {
> +      if (shared == 0 || cpu_features->basic.kind != arch_kind_amd)
> +        shared = cpu_features->shared_cache_size;
> +    }

This looks wrong.  cpu_features->shared_cache_size is set by
GLIBC tunables:

-- Tunable: glibc.cpu.x86_shared_cache_size
     The ‘glibc.cpu.x86_shared_cache_size’ tunable allows the user to
     set shared cache size in bytes for use in memory and string
     routines.

Why is it ignored?

>    if (shared > 0)
>      {
> --
> 2.17.1
>

I think they should be reverted.

-- 
H.J.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] x86: Optimizing memcpy for AMD Zen architecture.
  2020-10-28 14:00   ` [PATCH 1/1] " H.J. Lu
@ 2020-10-28 14:29     ` Florian Weimer
  2020-10-28 14:44       ` H.J. Lu
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Weimer @ 2020-10-28 14:29 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Sajan Karumanchi, GNU C Library, Carlos O'Donell, Mallappa,
	Premachandra

* H. J. Lu:

>>    if (cpu_features->data_cache_size != 0)
>> -    data = cpu_features->data_cache_size;
>> +    {
>> +      if (data == 0 || cpu_features->basic.kind != arch_kind_amd)
>> +        data = cpu_features->data_cache_size;
>> +    }
>
> This looks wrong.  cpu_features->data_cache_size is set by
> GLIBC tunables:
>
> -- Tunable: glibc.cpu.x86_data_cache_size
>      The ‘glibc.cpu.x86_data_cache_size’ tunable allows the user to set
>      data cache size in bytes for use in memory and string routines.
>
>      This tunable is specific to i386 and x86-64.
>
> Why is it ignored?

So we should revert those two hunks only?

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] x86: Optimizing memcpy for AMD Zen architecture.
  2020-10-28 14:29     ` Florian Weimer
@ 2020-10-28 14:44       ` H.J. Lu
  2020-10-30  6:35         ` Karumanchi, Sajan
  0 siblings, 1 reply; 9+ messages in thread
From: H.J. Lu @ 2020-10-28 14:44 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Sajan Karumanchi, GNU C Library, Carlos O'Donell, Mallappa,
	Premachandra

On Wed, Oct 28, 2020 at 7:29 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> >>    if (cpu_features->data_cache_size != 0)
> >> -    data = cpu_features->data_cache_size;
> >> +    {
> >> +      if (data == 0 || cpu_features->basic.kind != arch_kind_amd)
> >> +        data = cpu_features->data_cache_size;
> >> +    }
> >
> > This looks wrong.  cpu_features->data_cache_size is set by
> > GLIBC tunables:
> >
> > -- Tunable: glibc.cpu.x86_data_cache_size
> >      The ‘glibc.cpu.x86_data_cache_size’ tunable allows the user to set
> >      data cache size in bytes for use in memory and string routines.
> >
> >      This tunable is specific to i386 and x86-64.
> >
> > Why is it ignored?
>
> So we should revert those two hunks only?
>

Yes.

-- 
H.J.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 1/1] x86: Optimizing memcpy for AMD Zen architecture.
  2020-10-28 14:44       ` H.J. Lu
@ 2020-10-30  6:35         ` Karumanchi, Sajan
  0 siblings, 0 replies; 9+ messages in thread
From: Karumanchi, Sajan @ 2020-10-30  6:35 UTC (permalink / raw)
  To: H.J. Lu, Florian Weimer
  Cc: GNU C Library, Carlos O'Donell, Mallappa, Premachandra

[AMD Public Use]

Hi H. J .Lu,

Thanks for pointing it out and sorry for my ignorance of the Glibc tunables.  I overlooked this tunables part, as it was once reviewed by you for another patch which did not make through.
   Patch:  https://sourceware.org/pipermail/libc-alpha/2020-August/117081.html
   Review: https://sourceware.org/pipermail/libc-alpha/2020-September/117385.html



Thanks & Regards,
Sajan K.

-----Original Message-----
From: H.J. Lu <hjl.tools@gmail.com> 
Sent: Wednesday, October 28, 2020 8:14 PM
To: Florian Weimer <fweimer@redhat.com>
Cc: Karumanchi, Sajan <Sajan.Karumanchi@amd.com>; GNU C Library <libc-alpha@sourceware.org>; Carlos O'Donell <carlos@redhat.com>; Mallappa, Premachandra <Premachandra.Mallappa@amd.com>
Subject: Re: [PATCH 1/1] x86: Optimizing memcpy for AMD Zen architecture.

[CAUTION: External Email]

On Wed, Oct 28, 2020 at 7:29 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> >>    if (cpu_features->data_cache_size != 0)
> >> -    data = cpu_features->data_cache_size;
> >> +    {
> >> +      if (data == 0 || cpu_features->basic.kind != arch_kind_amd)
> >> +        data = cpu_features->data_cache_size;
> >> +    }
> >
> > This looks wrong.  cpu_features->data_cache_size is set by GLIBC 
> > tunables:
> >
> > -- Tunable: glibc.cpu.x86_data_cache_size
> >      The ‘glibc.cpu.x86_data_cache_size’ tunable allows the user to set
> >      data cache size in bytes for use in memory and string routines.
> >
> >      This tunable is specific to i386 and x86-64.
> >
> > Why is it ignored?
>
> So we should revert those two hunks only?
>

Yes.

--
H.J.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-10-30  6:35 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-22  4:50 [PATCH 0/1] Optimizing memcpy for AMD Zen architecture sajan.karumanchi
2020-10-22  4:50 ` [PATCH 1/1] x86: " sajan.karumanchi
2020-10-27 10:38   ` Florian Weimer
2020-10-28  7:35     ` [PATCH] " sajan.karumanchi
2020-10-28  9:22       ` Florian Weimer
2020-10-28 14:00   ` [PATCH 1/1] " H.J. Lu
2020-10-28 14:29     ` Florian Weimer
2020-10-28 14:44       ` H.J. Lu
2020-10-30  6:35         ` Karumanchi, Sajan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).