On Fri, May 22, 2020 at 9:10 PM liqingqing wrote:
>
> Commit 830566307f038387ca0af3fd327706a8d1a2f595 optimizes the implementation of memset
> and sets the default value of the macro REP_STOSB_THRESHOLD to 2KB. When the input length is less than 2KB, the data flow is unchanged; when the input length is larger than 2KB,
> memset uses STOSB instead of MOVQ.
>
> But when I tested this API on an x86_64 platform,
> I found that this default value is not appropriate for some input lengths. Here are the environment and the results:
>
> test suite: libMicro-0.4.0
> ./memset -E -C 200 -L -S -W -N "memset_4k" -s 4k -I 250
> ./memset -E -C 200 -L -S -W -N "memset_4k_uc" -s 4k -u -I 400
> ./memset -E -C 200 -L -S -W -N "memset_1m" -s 1m -I 200000
> ./memset -E -C 200 -L -S -W -N "memset_10m" -s 10m -I 2000000
>
> hardware platform:
> Intel(R) Xeon(R) Gold 6266C CPU @ 3.00GHz
> L1d cache: 32KB
> L1i cache: 32KB
> L2 cache: 1MB
> L3 cache: 60MB
>
> The result is that when the input length is between the processor's L1 data cache size and its L2 cache size, REP_STOSB_THRESHOLD=2KB reduces performance.
>
>                before this commit    after this commit
>                cycle                 cycle
> memset_4k      249                   96
> memset_10k     657                   185
> memset_36k     2773                  3767
> memset_100k    7594                  10002
> memset_500k    37678                 52149
> memset_1m      86780                 108044
> memset_10m     1307238               1148994
>
>                before this commit       after this commit
>                MLC cache miss(10sec)    MLC cache miss(10sec)
> memset_4k      1,09,33,823              1,01,79,270
> memset_10k     1,23,78,958              1,05,41,087
> memset_36k     3,61,64,244              4,07,22,429
> memset_100k    8,25,33,052              9,31,81,253
> memset_500k    37,32,55,449             43,56,70,395
> memset_1m      75,16,28,239             88,29,90,237
> memset_10m     9,36,61,67,397           8,96,69,49,522
>
>
> REP_STOSB_THRESHOLD can be modified at build time with -DREP_STOSB_THRESHOLD=xxx,
> but I think the default value may not be the best one, because most processors' L2 cache is larger than 2KB, so I am submitting the patch below:
>
>
>
> From 44314a556239a7524b5a6451025737c1bdbb1cd0 Mon Sep 17 00:00:00 2001
> From: liqingqing
> Date: Thu, 21 May 2020 11:23:06 +0800
> Subject: [PATCH] update REP_STOSB_THRESHOLD's default value from 2k to 1M
> macro REP_STOSB_THRESHOLD's value will reduce memset performance when input length is between processor's L1 data cache and L2 cache.
> so update the default value to eliminate the decrement.
>

There is no single threshold value which is good for all workloads.
I don't think we should change REP_STOSB_THRESHOLD to 1MB. On the
other hand, the fixed threshold isn't flexible. Please try this patch
to see if you can set the threshold for your specific workload.

--
H.J.
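
For readers following the thread, the size-based dispatch under discussion can be sketched in C roughly as follows. This is only an illustration and not glibc's code: glibc's memset is hand-written assembly, and the names sketch_memset, rep_stosb_fill, takes_stosb_path and SKETCH_REP_STOSB_THRESHOLD are hypothetical. The sketch assumes GCC or Clang inline assembly on x86_64 and falls back to the library memset elsewhere. The byte loop is a stand-in for the vector-store path. The threshold is passed as a parameter purely for illustration; this does not describe the patch referred to above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical default, mirroring the 2KB value discussed in the thread. */
#define SKETCH_REP_STOSB_THRESHOLD 2048

/* Fill n bytes at dst with the low byte of c using "rep stosb".
   Assumption: GCC/Clang inline asm on x86_64; other targets use memset. */
static void rep_stosb_fill(void *dst, int c, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* RDI = destination, RCX = byte count, AL = fill value. */
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(n)
                     : "a"(c)
                     : "memory");
#else
    memset(dst, c, n);
#endif
}

/* Returns 1 if a fill of n bytes would take the rep stosb path,
   i.e. if n is strictly above the threshold. */
static int takes_stosb_path(size_t n, size_t threshold)
{
    return n > threshold;
}

/* Sketch of a memset that switches strategy at a caller-supplied threshold. */
static void *sketch_memset(void *dst, int c, size_t n, size_t threshold)
{
    if (takes_stosb_path(n, threshold)) {
        rep_stosb_fill(dst, c, n);
    } else {
        /* Stand-in for the vector-store path used for smaller sizes. */
        unsigned char *p = dst;
        for (size_t i = 0; i < n; i++)
            p[i] = (unsigned char)c;
    }
    return dst;
}
```

With a threshold of 2048, a 36k fill such as the memset_36k case above would take the rep stosb branch of this sketch, while the same call with a threshold of 1M would stay on the other branch. That is the trade-off the two cycle columns are measuring.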