From: kemi
To: Torvald Riegel, Rich Felker, "H.J. Lu"
Cc: Ma Ling, GNU C Library, "Lu, Hongjiu", "ling.ma", Wei Xiao
Subject: Re: [PATCH] NUMA spinlock [BZ #23962]
Date: Tue, 15 Jan 2019 02:33:00 -0000
Message-ID: <0b4620c1-a9c5-061e-9636-65d80655a6fd@intel.com>
In-Reply-To: <5c2bf8859a412759aba26a21b317ea98f6ff8eaf.camel@redhat.com>

>> "Scalable spinlock" is something of an oxymoron.
>
> No, that's not true at all.  Most high-performance shared-memory
> synchronization constructs (on typical HW we have today) will do some kind
> of spinning (and back-off), and there's nothing wrong about it.  This can
> scale very well.
>
>> Spinlocks are for
>> situations where contention is extremely rare,
>
> No, the question is rather whether the program needs blocking through the
> OS (for performance, or for semantics such as PI) or not.  Energy may be
> another factor.
> For example, glibc's current mutexes don't scale well on
> short critical because there's not enough spinning being done.
>

Yes. That is why we needed the pthread.mutex.spin_count tunable
interface earlier. But that is not enough. Once the tunable is no longer
the bottleneck, the simple busy-waiting algorithm of the current adaptive
mutex becomes the major factor that degrades mutex performance.

That is why I proposed an MCS-based spin-waiting algorithm for the
adaptive mutex:
https://sourceware.org/ml/libc-alpha/2019-01/msg00279.html

Also, when the workload has a very small critical section, this new mutex
type, exposed as the GNU extension PTHREAD_MUTEX_QUEUESPINNER_NP, acts
like an MCS spinlock and performs much better than the original spinlock.

So some day, if the adaptive mutex is tuned well enough, it should act
like an MCS spinlock (or NUMA spinlock) when the workload has a small
critical section, and perform like a normal mutex when the critical
section is too long to spin-wait on.
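
For readers who have not seen MCS-style queue spinning, here is a minimal
sketch of the idea in C11. It is not the code from the patch linked above
and not glibc's implementation; the names mcs_node, mcs_lock, mcs_acquire,
mcs_release and mcs_demo are hypothetical and exist only for this
illustration. The point it shows is that each waiter spins on a flag in
its own queue node rather than on the shared lock word, which is what
avoids the cache-line ping-pong of simple busy-waiting.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout: an illustrative MCS queue lock. */
typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;  /* successor in the wait queue */
    atomic_bool locked;               /* true while this waiter must spin */
} mcs_node;

typedef struct {
    _Atomic(mcs_node *) tail;         /* NULL when the lock is free */
} mcs_lock;

static void mcs_acquire(mcs_lock *lock, mcs_node *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);

    /* Append our node to the queue; the old tail is our predecessor. */
    mcs_node *prev = atomic_exchange(&lock->tail, me);
    if (prev == NULL)
        return;                       /* queue was empty: lock acquired */

    atomic_store(&prev->next, me);    /* link in behind the predecessor */

    /* Spin on our own node only. The yield is an assumption made for this
       demo so it behaves when threads outnumber cores; a tuned lock would
       use a pause instruction and back-off instead. */
    while (atomic_load(&me->locked))
        sched_yield();
}

static void mcs_release(mcs_lock *lock, mcs_node *me)
{
    mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        /* No visible successor: try to swing the tail back to NULL. */
        mcs_node *expected = me;
        if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
            return;
        /* A successor is mid-enqueue; wait until it links itself. */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();
    }
    atomic_store(&succ->locked, false);  /* hand the lock to the successor */
}

typedef struct {
    mcs_lock *lock;
    long *counter;
    int iters;
} worker_arg;

static void *worker(void *p)
{
    worker_arg *a = p;
    mcs_node node;                    /* per-thread queue node on the stack */
    for (int i = 0; i < a->iters; i++) {
        mcs_acquire(a->lock, &node);
        ++*a->counter;                /* very small critical section */
        mcs_release(a->lock, &node);
    }
    return NULL;
}

/* Runs nthreads workers and returns the final counter, or -1 on error. */
long mcs_demo(int nthreads, int iters)
{
    enum { MAX_THREADS = 64 };
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    mcs_lock lock;
    atomic_init(&lock.tail, NULL);
    long counter = 0;
    worker_arg arg = { &lock, &counter, iters };
    pthread_t tids[MAX_THREADS];

    int started = 0;
    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, worker, &arg) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return started == nthreads ? counter : -1;
}
```

Calling mcs_demo(4, 2000) from a small main() and checking that the result
equals 4 * 2000 is a quick sanity test of mutual exclusion. The sketch uses
the default sequentially consistent atomics for simplicity; it says nothing
about when to stop spinning and block in the kernel, which is the part an
adaptive mutex still has to decide.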