* [PATCH] Add cache info to _dl_x86_cpu_features
@ 2016-05-10 23:49 H.J. Lu
From: H.J. Lu @ 2016-05-10 23:49 UTC (permalink / raw)
To: GNU C Library
[-- Attachment #1: Type: text/plain, Size: 1272 bytes --]
On Sun, May 8, 2016 at 8:57 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> Support setting processor-specific __x86_shared_non_temporal_threshold
> value in init_cpu_features.
>
> Tested on x86 and x86-64. Any comments or feedback?
>
> H.J.
> ---
> * sysdeps/x86/cacheinfo.c (__x86_shared_non_temporal_threshold):
> Initialize only if it is zero.
> ---
> sysdeps/x86/cacheinfo.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
> index 143b333..788f42d 100644
> --- a/sysdeps/x86/cacheinfo.c
> +++ b/sysdeps/x86/cacheinfo.c
> @@ -669,5 +669,6 @@ init_cacheinfo (void)
> /* The large memcpy micro benchmark in glibc shows that 6 times of
> shared cache size is the approximate value above which non-temporal
> store becomes faster. */
> - __x86_shared_non_temporal_threshold = __x86_shared_cache_size * 6;
> + if (__x86_shared_non_temporal_threshold == 0)
> + __x86_shared_non_temporal_threshold = __x86_shared_cache_size * 6;
> }
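The patch quoted above doesn't work, since init_cpu_features may only update
the private copy of __x86_shared_non_temporal_threshold in ld.so, leaving the
copy in libc unchanged. Instead, this patch adds cache info to
_dl_x86_cpu_features so that init_cpu_features can set the cache sizes and
init_cacheinfo can pick them up from there.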
Any comments or feedback?
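
For illustration only, here is a minimal standalone sketch of the selection
rule the attached patch applies in init_cacheinfo: a nonzero field in
struct cache_info takes precedence, and zero means "fall back to the value
derived from CPUID". The struct layout matches the patch. The helper names
pick_cache_value and non_temporal_threshold are hypothetical and are not
part of glibc, and the sketch ignores the rounding that init_cacheinfo
applies to the cache sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Same fields as the struct cache_info added to cpu-features.h.  */
struct cache_info
{
  long int raw_data_size;
  long int raw_shared_size;
  long int shared_non_temporal_threshold;
};

/* Hypothetical helper: a nonzero OVERRIDE wins, otherwise keep the
   value DETECTED from CPUID.  */
static long int
pick_cache_value (long int override, long int detected)
{
  return override != 0 ? override : detected;
}

/* Hypothetical helper: use the processor-specific threshold if one was
   set, otherwise 6 times the shared cache size, as in the patch.  */
static long int
non_temporal_threshold (const struct cache_info *cache,
                        long int shared_cache_size)
{
  assert (cache != NULL);
  return pick_cache_value (cache->shared_non_temporal_threshold,
                           shared_cache_size * 6);
}
```

With all fields left at zero, the CPUID-derived values are used unchanged,
so processors that set nothing in init_cpu_features behave as before.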
--
H.J.
[-- Attachment #2: 0001-X86-Add-cache-info-to-_dl_x86_cpu_features.patch --]
[-- Type: text/x-patch, Size: 2971 bytes --]
From fa82f9be580d9a4be4c3a56cde2aa2b1aadf98fc Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Tue, 10 May 2016 05:42:49 -0700
Subject: [PATCH] X86: Add cache info to _dl_x86_cpu_features
This patch adds cache info to _dl_x86_cpu_features to allow a processor
to override cache info derived from CPUID.
Tested on x86 and x86-64.
* sysdeps/x86/cacheinfo.c: Skip if not in libc.
(init_cacheinfo): Use raw_data_size, raw_shared_size and
shared_non_temporal_threshold from _dl_x86_cpu_features if
not zero.
* sysdeps/x86/cpu-features.h (cache_info): New.
(cpu_features): Add cache.
---
sysdeps/x86/cacheinfo.c | 17 ++++++++++++++++-
sysdeps/x86/cpu-features.h | 13 +++++++++++++
2 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index 143b333..d6d0083 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -16,6 +16,8 @@
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
+#if IS_IN (libc)
+
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
@@ -646,6 +648,11 @@ init_cacheinfo (void)
#endif
}
+ const struct cache_info *cache = &GLRO(dl_x86_cpu_features).cache;
+
+ if (cache->raw_data_size != 0)
+ data = cache->raw_data_size;
+
if (data > 0)
{
__x86_raw_data_cache_size_half = data / 2;
@@ -656,6 +663,9 @@ init_cacheinfo (void)
__x86_data_cache_size = data;
}
+ if (cache->raw_shared_size != 0)
+ shared = cache->raw_shared_size;
+
if (shared > 0)
{
__x86_raw_shared_cache_size_half = shared / 2;
@@ -669,5 +679,10 @@ init_cacheinfo (void)
/* The large memcpy micro benchmark in glibc shows that 6 times of
shared cache size is the approximate value above which non-temporal
store becomes faster. */
- __x86_shared_non_temporal_threshold = __x86_shared_cache_size * 6;
+ __x86_shared_non_temporal_threshold
+ = (cache->shared_non_temporal_threshold != 0
+ ? cache->shared_non_temporal_threshold
+ : __x86_shared_cache_size * 6);
}
+
+#endif
diff --git a/sysdeps/x86/cpu-features.h b/sysdeps/x86/cpu-features.h
index 9529d61..335d96a 100644
--- a/sysdeps/x86/cpu-features.h
+++ b/sysdeps/x86/cpu-features.h
@@ -164,6 +164,18 @@
#else /* __ASSEMBLER__ */
+struct cache_info
+{
+ /* Data cache size for use in memory and string routines, typically
+ L1 size. */
+ long int raw_data_size;
+ /* Shared cache size for use in memory and string routines, typically
+ L2 or L3 size. */
+ long int raw_shared_size;
+ /* Threshold to use non temporal store. */
+ long int shared_non_temporal_threshold;
+};
+
enum
{
COMMON_CPUID_INDEX_1 = 0,
@@ -193,6 +205,7 @@ struct cpu_features
unsigned int family;
unsigned int model;
unsigned int feature[FEATURE_INDEX_MAX];
+ struct cache_info cache;
};
/* Used from outside of glibc to get access to the CPU features
--
2.5.5