public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] x86: Set rep_movsb_threshold to 2112 on processors with FSRM
@ 2021-04-30 18:24 H.J. Lu
  2021-04-30 22:59 ` Noah Goldstein
  0 siblings, 1 reply; 8+ messages in thread
From: H.J. Lu @ 2021-04-30 18:24 UTC (permalink / raw)
  To: libc-alpha

The glibc memcpy benchmark on Intel Core i7-1065G7 (Ice Lake) showed
that REP MOVSB became faster after 2112 bytes:

                                    Vector + REP MOVSB  REP MOVSB
length=2112, align1=0, align2=0:        24.20             24.40
length=2112, align1=1, align2=0:        26.07             23.13
length=2112, align1=0, align2=1:        27.18             28.13
length=2112, align1=1, align2=1:        26.23             25.16
length=2176, align1=0, align2=0:        23.18             22.52
length=2176, align1=2, align2=0:        25.45             22.52
length=2176, align1=0, align2=2:        27.14             27.82
length=2176, align1=2, align2=2:        22.73             25.56
length=2240, align1=0, align2=0:        24.62             24.25
length=2240, align1=3, align2=0:        29.77             27.15
length=2240, align1=0, align2=3:        35.55             29.93
length=2240, align1=3, align2=3:        34.49             25.15
length=2304, align1=0, align2=0:        34.75             26.64
length=2304, align1=4, align2=0:        32.09             22.63
length=2304, align1=0, align2=4:        28.43             31.24

Use REP MOVSB for data size > 2112 bytes in memcpy on processors with
fast short REP MOVSB (FSRM).

	* sysdeps/x86/dl-cacheinfo.h (dl_init_cacheinfo): Set
	rep_movsb_threshold to 2112 on processors with fast short REP
	MOVSB (FSRM).
---
 sysdeps/x86/dl-cacheinfo.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index d9944250fc..3f04fb5019 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -871,7 +871,10 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   if (CPU_FEATURE_USABLE_P (cpu_features, AVX512F)
       && !CPU_FEATURE_PREFERRED_P (cpu_features, Prefer_No_AVX512))
     {
-      rep_movsb_threshold = 2048 * (64 / 16);
+      if (CPU_FEATURE_USABLE_P (cpu_features, FSRM))
+	rep_movsb_threshold = 2112;
+      else
+	rep_movsb_threshold = 2048 * (64 / 16);
 #if HAVE_TUNABLES
       minimum_rep_movsb_threshold = 64 * 8;
 #endif
@@ -879,7 +882,10 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   else if (CPU_FEATURE_PREFERRED_P (cpu_features,
 				    AVX_Fast_Unaligned_Load))
     {
-      rep_movsb_threshold = 2048 * (32 / 16);
+      if (CPU_FEATURE_USABLE_P (cpu_features, FSRM))
+	rep_movsb_threshold = 2112;
+      else
+	rep_movsb_threshold = 2048 * (32 / 16);
 #if HAVE_TUNABLES
       minimum_rep_movsb_threshold = 32 * 8;
 #endif
-- 
2.31.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-04-27 23:57 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-30 18:24 [PATCH] x86: Set rep_movsb_threshold to 2112 on processors with FSRM H.J. Lu
2021-04-30 22:59 ` Noah Goldstein
2021-04-30 23:50   ` H.J. Lu
2021-05-01  0:17     ` Noah Goldstein
2021-05-01  1:26       ` H.J. Lu
2021-05-01 14:08         ` [PATCH v2] " H.J. Lu
2021-05-01 18:45           ` Noah Goldstein
2022-04-27 23:56             ` Sunil Pandey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).