public inbox for libc-alpha@sourceware.org
* [PATCH v2 2.32] x86: Optimizing memcpy for AMD Zen architecture.
From: Florian Weimer @ 2020-10-30  8:45 UTC
  To: Sajan Karumanchi; +Cc: libc-alpha, Premachandra Mallappa, H.J. Lu

From: Sajan Karumanchi <sajan.karumanchi@amd.com>

Modify the shareable cache size '__x86_shared_cache_size', which is a
factor in computing the non-temporal threshold parameter
'__x86_shared_non_temporal_threshold', to optimize memcpy for AMD Zen
architectures.
The existing implementation computes the shareable cache as 'L3 per
thread, L2 per core'. Recomputing it as 'L3 per CCX (Core Complex)'
brings performance gains.
According to the large bench variant results, this patch also fixes the
regression on AMD Zen architectures.

Backport of commit 59803e81f96b479c17f583b31eac44b57591a1bf upstream,
with the fix from cb3a749a22a55645dc6a52659eea765300623f98 ("x86:
Restore processing of cache size tunables in init_cacheinfo") applied.

Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
Co-Authored-by: Florian Weimer <fweimer@redhat.com>

---
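For readers following the arithmetic, here is a minimal sketch of the
computation described above, pulled out of init_cacheinfo into two pure
functions. The helper names (threads_sharing_cache, shared_cache_size)
and their parameters are hypothetical and are not part of glibc; 'l3'
and 'l2_core' stand for the L3 and per-core cache sizes, and 'threads'
for whatever logical-thread count the surrounding code has derived. The
only hardware detail relied on is that EAX bits 25:14 of CPUID leaf
0x8000001D hold the number of threads sharing the cache, minus one.

```c
/* Hypothetical helper, not a glibc function: decode the number of
   logical threads sharing a cache from EAX of CPUID leaf 0x8000001D.
   Bits 25:14 hold that count minus one.  */
unsigned int
threads_sharing_cache (unsigned int eax)
{
  return ((eax >> 14) & 0xfff) + 1;
}

/* Hypothetical helper mirroring the patched arithmetic.  L3 is the L3
   size in bytes, L2_CORE the per-core cache size in bytes, THREADS the
   logical-thread count, FAMILY the CPU family, and EAX the raw EAX
   value of CPUID leaf 0x8000001D, subleaf 3.  */
unsigned long int
shared_cache_size (unsigned long int l3, unsigned long int l2_core,
		   unsigned int threads, unsigned int family,
		   unsigned int eax)
{
  unsigned long int shared = l3;

  /* Start from the L3 share of a single thread.  */
  if (threads > 0)
    shared /= threads;

  if (family >= 0x17)
    /* Zen: scale the per-thread share up to one CCX.  */
    shared *= threads_sharing_cache (eax);
  else
    /* Older families: account for exclusive L2 and L3 caches.  */
    shared += l2_core;

  return shared;
}
```

For example, with a 16 MiB L3, 16 logical threads and a sharing field
of 7 (eight threads per CCX), the sketch yields 16 MiB / 16 * 8 = 8 MiB
on family 0x17.
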
 sysdeps/x86/cacheinfo.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index dadec5d58f..3fb4a028d8 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -808,7 +808,7 @@ init_cacheinfo (void)
 	      threads = 1 << ((ecx >> 12) & 0x0f);
 	    }
 
-	  if (threads == 0)
+	  if (threads == 0 || cpu_features->basic.family >= 0x17)
 	    {
 	      /* If APIC ID width is not available, use logical
 		 processor count.  */
@@ -823,8 +823,22 @@ init_cacheinfo (void)
 	  if (threads > 0)
 	    shared /= threads;
 
-	  /* Account for exclusive L2 and L3 caches.  */
-	  shared += core;
+	  /* Get shared cache per ccx for Zen architectures.  */
+	  if (cpu_features->basic.family >= 0x17)
+	    {
+	      unsigned int eax;
+
+	      /* Get the number of threads sharing the L3 cache in the CCX.  */
+	      __cpuid_count (0x8000001D, 0x3, eax, ebx, ecx, edx);
+
+	      unsigned int threads_per_ccx = ((eax >> 14) & 0xfff) + 1;
+	      shared *= threads_per_ccx;
+	    }
+	  else
+	    {
+	      /* Account for exclusive L2 and L3 caches.  */
+	      shared += core;
+	    }
 	}
     }
 

-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill

* Re: [PATCH v2 2.32] x86: Optimizing memcpy for AMD Zen architecture.
From: H.J. Lu @ 2020-10-30 11:28 UTC
  To: Florian Weimer; +Cc: Sajan Karumanchi, GNU C Library, Premachandra Mallappa

On Fri, Oct 30, 2020 at 1:45 AM Florian Weimer <fweimer@redhat.com> wrote:

LGTM.

Thanks.

-- 
H.J.
