From mboxrd@z Thu Jan 1 00:00:00 1970
Received: by sourceware.org (Postfix, from userid 2178)
	id 4DBB6385781D; Fri, 30 Oct 2020 11:59:55 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 4DBB6385781D
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Florian Weimer
To: glibc-cvs@sourceware.org
Subject: [glibc/release/2.32/master] x86: Optimizing memcpy for AMD Zen
 architecture.
X-Act-Checkin: glibc
X-Git-Author: Sajan Karumanchi
X-Git-Refname: refs/heads/release/2.32/master
X-Git-Oldrev: e61a8fd8fadbf1a8cef997a0f921575cb2905ea2
X-Git-Newrev: 8813b2682e4094e43b0cf1634e99619f1b8b2c62
Message-Id: <20201030115955.4DBB6385781D@sourceware.org>
Date: Fri, 30 Oct 2020 11:59:55 +0000 (GMT)
X-BeenThere: glibc-cvs@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Glibc-cvs mailing list
X-List-Received-Date: Fri, 30 Oct 2020 11:59:55 -0000

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=8813b2682e4094e43b0cf1634e99619f1b8b2c62

commit 8813b2682e4094e43b0cf1634e99619f1b8b2c62
Author: Sajan Karumanchi
Date:   Wed Oct 28 13:05:33 2020 +0530

    x86: Optimizing memcpy for AMD Zen architecture.

    Modifying the shareable cache '__x86_shared_cache_size', which is a
    factor in computing the non-temporal threshold parameter
    '__x86_shared_non_temporal_threshold' to optimize memcpy for AMD Zen
    architectures.
    In the existing implementation, the shareable cache is computed as 'L3
    per thread, L2 per core'. Recomputing this shareable cache as 'L3 per
    CCX(Core-Complex)' has brought in performance gains.
    As per the large bench variant results, this patch also addresses the
    regression problem on AMD Zen architectures.

    Backport of commit 59803e81f96b479c17f583b31eac44b57591a1bf upstream,
    with the fix from cb3a749a22a55645dc6a52659eea765300623f98 ("x86:
    Restore processing of cache size tunables in init_cacheinfo") applied.

    Reviewed-by: Premachandra Mallappa
    Co-Authored-by: Florian Weimer

Diff:
---
 sysdeps/x86/cacheinfo.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index dadec5d58f..3fb4a028d8 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -808,7 +808,7 @@ init_cacheinfo (void)
               threads = 1 << ((ecx >> 12) & 0x0f);
             }
 
-          if (threads == 0)
+          if (threads == 0 || cpu_features->basic.family >= 0x17)
             {
               /* If APIC ID width is not available, use logical
                  processor count.  */
@@ -823,8 +823,22 @@ init_cacheinfo (void)
           if (threads > 0)
             shared /= threads;
 
-          /* Account for exclusive L2 and L3 caches.  */
-          shared += core;
+          /* Get shared cache per ccx for Zen architectures.  */
+          if (cpu_features->basic.family >= 0x17)
+            {
+              unsigned int eax;
+
+              /* Get number of threads share the L3 cache in CCX.  */
+              __cpuid_count (0x8000001D, 0x3, eax, ebx, ecx, edx);
+
+              unsigned int threads_per_ccx = ((eax >> 14) & 0xfff) + 1;
+              shared *= threads_per_ccx;
+            }
+          else
+            {
+              /* Account for exclusive L2 and L3 caches.  */
+              shared += core;
+            }
         }
     }
 
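
For readers who want to see the arithmetic, the following is a minimal sketch of the two ways the shared cache size is derived in the diff above. It is not the glibc code: the helper names (threads_sharing_cache, shared_old, shared_zen) are hypothetical, and the EAX value is passed in as a parameter rather than read with __cpuid_count so the sketch stays portable. It assumes, as the patch does, that bits 25:14 of EAX from CPUID leaf 0x8000001D (subleaf 3) hold the number of logical processors sharing the L3 cache, minus one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a glibc function: decode the number of
   logical processors sharing a cache from EAX of CPUID leaf
   0x8000001D.  Bits 25:14 hold the count minus one, which is the
   same decoding the patch applies.  */
static unsigned int
threads_sharing_cache (uint32_t eax)
{
  return ((eax >> 14) & 0xfff) + 1;
}

/* Previous heuristic: L3 per thread, plus L2 per core.  */
static long int
shared_old (long int l3, long int l2, unsigned int threads)
{
  assert (l3 >= 0 && l2 >= 0);
  long int shared = l3;
  if (threads > 0)
    shared /= threads;
  return shared + l2;
}

/* Heuristic from the patch for family >= 0x17: L3 per thread,
   scaled by the number of threads that share the L3 in one CCX.  */
static long int
shared_zen (long int l3, unsigned int threads, uint32_t eax)
{
  assert (l3 >= 0);
  long int shared = l3;
  if (threads > 0)
    shared /= threads;
  return shared * threads_sharing_cache (eax);
}
```

With purely illustrative inputs (32 MiB of L3, 512 KiB of L2, 16 logical processors, and an EAX value of 0x1C000 encoding 8 threads per CCX), shared_old yields 2.5 MiB and shared_zen yields 16 MiB. These numbers are made up for the example and do not describe any particular processor.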