On Mon, May 22, 2017 at 12:17 PM, H.J. Lu wrote:
> On Thu, May 18, 2017 at 1:59 PM, Erich Elsen wrote:
>> Hi H.J.,
>>
>> I was on vacation, sorry for the slow reply.  The updated benchmark
>> still shows the same behavior, thanks.
>>
>> I'll try my hand at creating a patch that makes the variable
>> __x86_shared_non_temporal_threshold a tunable.  It will be necessary
>> to do internal experiments anyway.
>>
>
> __x86_shared_non_temporal_threshold was set to 6 times the per-core
> shared cache size, based on the large memcpy micro benchmark in glibc
> on an 8-core processor.  For a processor with more than 8 cores, the
> threshold is too low.  Set __x86_shared_non_temporal_threshold to
> 3/4 of the total shared cache size so that it is unchanged on 8-core
> processors.  On processors with fewer than 8 cores, the threshold is
> lower.
>
> Any comments?
>

Here is a patch to add support for
"glibc.x86_cache.non_temporal_threshold=number" to GLIBC_TUNABLES.

-- 
H.J.
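
The threshold arithmetic described above can be checked with a small
sketch.  This is not the glibc code: old_threshold, new_threshold and
check_thresholds are hypothetical helpers, and the 16 MiB shared cache
size is an assumed example value.  It only shows that the two rules
agree at 8 cores and diverge elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Old rule as described in the message: 6 times the per-core share of
   the shared cache.  Hypothetical helper, not the glibc implementation.  */
static size_t
old_threshold (size_t total_shared, unsigned int cores)
{
  return total_shared / cores * 6;
}

/* Proposed rule: 3/4 of the total shared cache size.  */
static size_t
new_threshold (size_t total_shared)
{
  return total_shared / 4 * 3;
}

/* Compare both rules for an assumed 16 MiB shared cache.  */
static void
check_thresholds (void)
{
  const size_t total = (size_t) 16 << 20;

  /* 8 cores: 6 * (total / 8) == 3/4 * total, so nothing changes.  */
  assert (old_threshold (total, 8) == new_threshold (total));
  /* 16 cores: the old rule gave a lower threshold than the new one.  */
  assert (old_threshold (total, 16) < new_threshold (total));
  /* 4 cores: the new rule gives a lower threshold than the old one.  */
  assert (old_threshold (total, 4) > new_threshold (total));
}
```

Assuming the patch is applied and the tunable keeps the name given in
the message, an override for a hypothetical program ./app might look
like:
GLIBC_TUNABLES=glibc.x86_cache.non_temporal_threshold=12582912 ./app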