From: Maxim Kuvyrkov
To: Chris Metcalf
CC: "Joseph S. Myers", GLIBC Devel, Tom de Vries
Subject: Re: [PATCH] Optimize libc_lock_lock for MIPS XLP.
Date: Fri, 15 Jun 2012 01:21:00 -0000
Message-ID: <40CBC472-71CC-4FF3-A452-073B76701215@codesourcery.com>
In-Reply-To: <4FD9DB74.8080905@tilera.com>
References: <4FD9DB74.8080905@tilera.com>

On 15/06/2012, at 12:39 AM, Chris Metcalf wrote:

> On 6/14/2012 1:03 AM, Maxim Kuvyrkov wrote:
>> These two patches (libc part and ports part) optimize the
>> libc_lock_lock() macro that GLIBC uses for locking internally to take
>> advantage of the fetch_and_add instruction that is available as an
>> extension on certain processors, e.g., the MIPS-architecture XLP.
>>
>> The libc_lock_lock macros implement a boolean lock: 0 corresponds to
>> the unlocked state and non-zero corresponds to the locked state.
>
> Just to be clear, if you put this comment somewhere when you commit, you
> should say locks are tristate, where 0 is unlocked, 1 is locked and
> uncontended, and >1 is locked and contended.

Right, it's all coming back now.  I will update the comments to mention
this.  [This optimization was written around 6 months ago, and not by me.
This and the points below are worth elaborating on, thanks for bringing
them up.]

I've CC'ed Tom de Vries, who is the original author of the patch.  Tom,
please let us know if I'm misrepresenting the optimization or the
rationale for its correctness.

>
>> It is, therefore, possible to use fetch_and_add semantics to acquire
>> the lock in libc_lock_lock.  For XLP this translates to a single LDADD
>> instruction.  This optimization benefits architectures that can
>> perform fetch_and_add faster than compare_and_exchange; such a
>> situation is indicated by defining the new macro "lll_add_lock".
>>
>> The unlocking counterpart doesn't require any change, as it already
>> uses a plain atomic_exchange operation, which, incidentally, is also
>> supported on XLP as a single instruction.
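
For reference, the acquire path described above boils down to roughly the
following (an illustrative sketch, not the actual patch; the name
lll_add_lock_sketch is made up here, while atomic_exchange_and_add is
glibc's fetch_and_add primitive, which returns the value the lock word
held before the addition):

/* Sketch of an lll_add_lock-style acquire.  On XLP the
   atomic_exchange_and_add below compiles to a single LDADD.  */
static inline void
lll_add_lock_sketch (int *futex, int private)
{
  /* Atomically bump the lock word; the old value tells us whether the
     lock was free (0) or already held (> 0).  */
  if (atomic_exchange_and_add (futex, 1) != 0)
    /* Contended: fall back to the generic wait loop, which resets the
       lock word to 2 while waiting.  */
    __lll_lock_wait (futex, private);
}

The uncontended path is thus the single fetch_and_add; only when the old
value is non-zero does a thread enter the generic futex wait loop.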
> This seems like it would work well for a single thread acquiring the
> lock, but I have some questions about it in the presence of multiple
> threads trying to acquire the lock.
>
> First, the generic __lll_lock_wait() code assumes the contended value
> is exactly "2".

Um, not exactly.  __lll_lock_wait() *sets* the contended lock to a value
of "2", but it works with values >2 as well.

void
__lll_lock_wait (int *futex, int private)
{
  /* If the lock is already marked contended, wait for a wakeup.  */
  if (*futex == 2)
    lll_futex_wait (futex, 2, private);

  /* Try to take the lock, marking it contended (2) in the process;
     keep futex-waiting until the exchange observes an unlocked (0)
     value.  */
  while (atomic_exchange_acq (futex, 2) != 0)
    lll_futex_wait (futex, 2, private);
}

> So if two or more threads both try and fail to acquire the lock, the
> value will be >2.  This will cause the waiters to busywait, spinning
> on atomic exchange instructions, rather than calling into
> futex_wait().

As I read it, in the case of a contended lock __lll_lock_wait() will
reset the value of the lock to "2" before calling lll_futex_wait().  I
agree that there is a timing window in which the other threads will see
a value of the lock greater than "2", but the value will not get as high
as hundreds or billions, as it will be constantly reset to "2" by the
atomic_exchange in __lll_lock_wait().

I do not see how threads would get into a busywait state, though.  Would
you please elaborate on that?

> I think it might be possible to change the generic code to support the
> more general ">1" semantics of contended locks, but it might be a bit
> less efficient, so you might end up wanting to provide overrides for
> these functions on MIPS.  Even on MIPS it might result in a certain
> amount of spinning since you'd have to hit the race window correctly
> to feed the right value of the lock to futex_wait.
>
> Second, if a lock is held long enough for 4 billion threads to try to
> acquire it and fail, you will end up with an unlocked lock.  :-)  I'm
> not sure how likely this seems, but it is a potential issue.  You
> might consider, for example, doing a cmpxchg on the contended-lock
> path to try to reset the lock value back to 2 again; if it fails, it's
> not a big deal, since statistically I would expect the occasional
> thread to succeed, which is all you need.

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics
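
For concreteness, the occasional reset Chris suggests for the contended
path might look roughly like this (a sketch only, with a made-up name;
atomic_compare_and_exchange_bool_acq is glibc's cmpxchg primitive, which
returns zero when the exchange succeeds):

/* Sketch: opportunistically clamp an inflated lock value back to 2 so
   that repeated fetch_and_add increments cannot wrap the lock word
   around to 0.  */
static inline void
lll_clamp_contended_sketch (int *futex)
{
  int val = *futex;
  if (val > 2)
    /* If another thread changed *futex meanwhile, the cmpxchg fails;
       that is harmless -- statistically some thread will succeed.  */
    atomic_compare_and_exchange_bool_acq (futex, 2, val);
}

Losing the race here is fine: it is enough that the occasional thread
wins and pulls the value back down before it can overflow.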