public inbox for libc-alpha@sourceware.org
* [PATCH 0/1] x86: Tuning NT Threshold parameter for AMD machines
@ 2020-08-19 10:45 Sajan Karumanchi
  2020-08-19 10:45 ` [PATCH 1/1] " Sajan Karumanchi
  0 siblings, 1 reply; 5+ messages in thread
From: Sajan Karumanchi @ 2020-08-19 10:45 UTC
  To: libc-alpha, carlos; +Cc: Sajan Karumanchi, premachandra.mallappa

Tuning the NT threshold parameter '__x86_shared_non_temporal_threshold' to 2/3
of the shared cache size on AMD Zen[1|2] machines brings performance gains
for memcpy/memmove, as per the Large and Walk bench variant results.
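
As a rough illustration of the tuning described above, the sketch below
computes a threshold equal to 2/3 of a shared cache size and checks whether a
copy of a given size would reach it. The helper names
`nt_threshold_two_thirds` and `uses_non_temporal`, the `>=` comparison, and
the 16 MiB cache size in the usage note are illustrative assumptions, not code
taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not from the patch): floor (2/3 * size).
   Dividing before multiplying keeps the result from overflowing.  */
static size_t
nt_threshold_two_thirds (size_t shared_cache_bytes)
{
  return shared_cache_bytes / 3 * 2 + shared_cache_bytes % 3 * 2 / 3;
}

/* Hypothetical helper: assume a copy switches to non-temporal stores
   once its size reaches the threshold.  */
static bool
uses_non_temporal (size_t copy_bytes, size_t nt_threshold)
{
  return copy_bytes >= nt_threshold;
}
```

Assuming a 16 MiB shared cache, `nt_threshold_two_thirds (16777216)` gives
11184810 bytes, so under this sketch an 8388615-byte copy stays on the regular
path while a 16777223-byte copy uses non-temporal stores.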

As there are run-to-run variations in the bench results, I took the average of
100 runs for both vanilla and patched glibc.
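
The averaging step can be sketched as below. This is a minimal example, not
the script used for the measurements: the function names are hypothetical, and
the definition of the percentage gain (positive when the patched mean is
lower) is an assumption, since the formula behind the percentages is not
stated here.

```c
#include <stddef.h>

/* Arithmetic mean of n timing samples; returns 0.0 when n is 0.  */
static double
mean_timing (const double *samples, size_t n)
{
  if (n == 0)
    return 0.0;

  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += samples[i];
  return sum / (double) n;
}

/* Assumed definition of a percentage gain: positive when the patched
   mean is lower (faster) than the vanilla mean.  */
static double
percent_gain (double vanilla_mean, double patched_mean)
{
  return (vanilla_mean - patched_mean) / vanilla_mean * 100.0;
}
```

Averaging each configuration first and comparing the two means afterwards
keeps a single noisy run from dominating the reported percentage.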

AMD Zen[1|2] architectures do not have the ERMS CPU feature.
So, on the Zen architecture, memcpy takes the 'memcpy_avx_unaligned' entry point.
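
The dependence on ERMS can be illustrated with a simplified sketch. ERMS is
reported in CPUID leaf 7, subleaf 0, EBX bit 9; the helpers below take that
EBX value as an argument rather than executing CPUID. The names `has_erms` and
`pick_memcpy_variant` are hypothetical, and the real glibc selector checks
more conditions than this one bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* ERMS is reported in CPUID leaf 7, subleaf 0, EBX bit 9.  */
#define CPUID_7_EBX_ERMS (UINT32_C (1) << 9)

/* Hypothetical helper: test the ERMS bit in a CPUID.7.0:EBX value.  */
static bool
has_erms (uint32_t cpuid_7_ebx)
{
  return (cpuid_7_ebx & CPUID_7_EBX_ERMS) != 0;
}

/* Simplified choice between the two entry points discussed here,
   assuming AVX with fast unaligned loads is already available.  */
static const char *
pick_memcpy_variant (uint32_t cpuid_7_ebx)
{
  return has_erms (cpuid_7_ebx)
	 ? "memcpy_avx_unaligned_erms" : "memcpy_avx_unaligned";
}
```

With the ERMS bit clear, as described above for Zen, this sketch returns
"memcpy_avx_unaligned".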

Below is a comparison of the Large bench test results for the entry points
avx_unaligned and avx_unaligned_erms.
-------------------------------------------------------------------------
size     load_align store_align avx_unaligned(%) avx_unaligned_erms(%)
-------------------------------------------------------------------------
1048583         0       0       1.89                    68.28
1048591         0       3       1.19                    94.56
1048607         3       0       -0.25                   68.25
1048639         3       5       -90.7                   89.69
2097159         0       0       -75.11                  43.18
2097167         0       3       -74.08                  90.16
2097183         3       0       -78.12                  43.81
2097215         3       5       -73.75                  90.58
4194311         0       0       -88.5                   39.26
4194319         0       3       -72.13                  90.21
4194335         3       0       -78.31                  43.97
4194367         3       5       -72                     90.64
8388615         0       0       -12.22                  43.24
8388623         0       3       -15.76                  90.3
8388639         3       0       -22.31                  39.92
8388671         3       5       -15.34                  90.74
16777223        0       0       49.8                    46.89
16777231        0       3       52.5                    90.14
16777247        3       0       51.82                   46.68
16777279        3       5       52.35                   90.55
33554439        0       0       41.76                   52.72
33554447        0       3       44.17                   88.29
33554463        3       0       43.74                   53.62
33554495        3       5       44.09                   88.78
-------------------------------------------------------------------------

Below is a comparison of the Walk bench test results for the entry points
avx_unaligned and avx_unaligned_erms.
---------------------------------------------------
size            avx_unaligned(%) avx_unaligned_erms(%)
---------------------------------------------------
1048576                 -0.2            15.03
1048577                 0.92            15.52
2097152                 40.52           50.92
2097153                 40.76           50.84
4194304                 40.6            51.22
4194305                 40.57           51.25
8388608                 40.61           51.23
8388609                 40.82           51.32
16777216                40.56           51.11
16777217                40.35           51.29
33554432                40.15           37.41
33554433                20.75           41.22
---------------------------------------------------
Question:
Why do we see discrepancies in the Large bench results, even though the code
path taken for NT stores in memcpy is the same for both entry points,
"memcpy_avx_unaligned" and "memcpy_avx_unaligned_erms"?


Sajan Karumanchi (1):
  x86: Tuning NT Threshold parameter for AMD machines.

 sysdeps/x86/cacheinfo.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

-- 
2.17.1




Thread overview: 5+ messages
2020-08-19 10:45 [PATCH 0/1] x86: Tuning NT Threshold parameter for AMD machines Sajan Karumanchi
2020-08-19 10:45 ` [PATCH 1/1] " Sajan Karumanchi
2020-09-01 19:23   ` H.J. Lu
2020-09-08 11:36     ` Sajan Karumanchi
2020-12-07 14:23       ` H.J. Lu
