Subject: Re: [PATCH] NUMA spinlock [BZ #23962]
To: Ma Ling , libc-alpha@sourceware.org
Cc: hongjiu.lu@intel.com, "ling.ma" , Wei Xiao
References: <20181226025019.38752-1-ling.ma@MacBook-Pro-8.local>
In-Reply-To: <20181226025019.38752-1-ling.ma@MacBook-Pro-8.local>
From: kemi
Date: Tue, 15 Jan 2019 02:56:00 -0000

On 2018/12/26 10:50 AM, Ma Ling wrote:
> From: "ling.ma"
>
> On multi-socket systems, memory is shared across the entire system.
> Data access to the local socket is much faster than the remote socket
> and data access to the local core is faster than sibling cores on the
> same socket. For serialized workloads with conventional spinlock,
> when there is high spinlock contention between threads, lock ping-pong
> among sockets becomes the bottleneck and threads spend majority of
> their time in spinlock overhead.
>
> On multi-socket systems, the keys to our NUMA spinlock performance
> are to minimize cross-socket traffic as well as localize the serialized
> workload to one core for execution. The basic principles of NUMA
> spinlock are mainly consisted of following approaches, which reduce
> data movement and accelerate critical section, eventually give us
> significant performance improvement.
>
> 1. MCS spinlock
> MCS spinlock help us to reduce the useless lock movement in the
> spinning state. This paper provides a good description for this
> kind of lock:

That's not accurate. Both the generic spinlock and the x86 version
already use the test-and-test_and_set technique to avoid useless lock
movement in the spinning state. See
glibc/nptl/pthread_spin_lock.c
glibc/sysdeps/x86_64/nptl/pthread_spin_lock.S

What the MCS spinlock really helps with is speeding up lock release and
lock acquisition by eliminating most of the cache line bouncing among
waiters (a rough sketch contrasting the two is appended at the end of
this mail).

> NUMA spinlock can greatly speed up critical section on multi-socket
> systems. It should improve spinlock performance on all multi-socket
> systems.
>

It is beyond question that the NUMA spinlock helps a lot under heavy
lock contention. But we should also present data for the uncontended
and lightly contended cases. The extra code complexity is expected to
degrade lock performance a bit when contention is light, and I would
like to see the numbers for that.

Also, lock starvation would be possible if the running core is always
busy under heavy lock contention. More explanation of that case is
expected.
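
To make the distinction above concrete, here is a minimal,
self-contained sketch using C11 atomics. It is only an illustration of
the two techniques, not the actual glibc code or the code in this
patch; the ttas_*/mcs_* names and the qnode layout are mine. The TTAS
lock spins on a plain load so waiters stay quiet while the lock is
held, but every release still drags the lock's cache line through all
waiters; the MCS lock hands the lock to exactly one successor, which
spins on its own node.

/* Illustrative sketch only -- not the glibc implementation.  */

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Test-and-test_and_set (TTAS) spinlock, roughly the idea already
   used by pthread_spin_lock.  */

typedef atomic_int ttas_lock_t;

void
ttas_lock (ttas_lock_t *lock)
{
  while (atomic_exchange_explicit (lock, 1, memory_order_acquire))
    /* Spin on a read-only load ("test") until the lock looks free,
       then retry the atomic exchange ("test_and_set").  */
    while (atomic_load_explicit (lock, memory_order_relaxed))
      ;
}

void
ttas_unlock (ttas_lock_t *lock)
{
  atomic_store_explicit (lock, 0, memory_order_release);
}

/* MCS queue lock: one qnode per waiting thread, each waiter spins
   only on its own node.  */

struct mcs_node
{
  _Atomic (struct mcs_node *) next;
  atomic_bool locked;
};

typedef _Atomic (struct mcs_node *) mcs_lock_t;  /* Tail of the queue.  */

void
mcs_lock (mcs_lock_t *lock, struct mcs_node *self)
{
  atomic_store_explicit (&self->next, NULL, memory_order_relaxed);
  atomic_store_explicit (&self->locked, true, memory_order_relaxed);

  struct mcs_node *prev
    = atomic_exchange_explicit (lock, self, memory_order_acq_rel);
  if (prev == NULL)
    return;  /* Lock was free; we own it.  */

  /* Enqueue behind PREV and spin only on our own node.  */
  atomic_store_explicit (&prev->next, self, memory_order_release);
  while (atomic_load_explicit (&self->locked, memory_order_acquire))
    ;
}

void
mcs_unlock (mcs_lock_t *lock, struct mcs_node *self)
{
  struct mcs_node *next
    = atomic_load_explicit (&self->next, memory_order_acquire);
  if (next == NULL)
    {
      /* No visible successor: try to swing the tail back to NULL.  */
      struct mcs_node *expected = self;
      if (atomic_compare_exchange_strong_explicit
          (lock, &expected, NULL,
           memory_order_acq_rel, memory_order_acquire))
        return;
      /* A successor is enqueueing; wait until it links itself in.  */
      while ((next = atomic_load_explicit (&self->next,
                                           memory_order_acquire)) == NULL)
        ;
    }
  /* Hand the lock directly to the successor: only that waiter's
     cache line is touched on release.  */
  atomic_store_explicit (&next->locked, false, memory_order_release);
}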