From: Maxim Kuvyrkov
To: Chris Metcalf
CC: "Joseph S. Myers", GLIBC Devel, Tom de Vries
Subject: Re: [PATCH] Optimize libc_lock_lock for MIPS XLP.
Date: Fri, 15 Jun 2012 01:21:00 -0000
Message-ID: <40CBC472-71CC-4FF3-A452-073B76701215@codesourcery.com>
In-Reply-To: <4FD9DB74.8080905@tilera.com>
References: <4FD9DB74.8080905@tilera.com>

On 15/06/2012, at 12:39 AM, Chris Metcalf wrote:

> On 6/14/2012 1:03 AM, Maxim Kuvyrkov wrote:
>> These two patches (libc part and ports part) optimize the
>> libc_lock_lock() macro that GLIBC uses for locking internally to take
>> advantage of the fetch_and_add instruction that is available as an
>> extension on certain processors, e.g., the MIPS-architecture XLP.
>>
>> The libc_lock_lock macros implement a boolean lock: 0 corresponds to
>> the unlocked state and non-zero corresponds to the locked state.
>
> Just to be clear, if you put this comment somewhere when you commit, you
> should say locks are tristate, where 0 is unlocked, 1 is locked and
> uncontended, and >1 is locked and contended.

Right, it's all coming back now.  I will update the comments to mention
this.  [This optimization was written around 6 months ago, and not by me.
This and the points below are worth elaborating on, thanks for bringing
them up.]

I've CC'ed Tom de Vries, who is the original author of the patch.  Tom,
please let us know if I'm misrepresenting the optimization or the
rationale for its correctness.

>
>> It is, therefore, possible to use fetch_and_add semantics to acquire
>> the lock in libc_lock_lock.  For XLP this translates to a single LDADD
>> instruction.  This optimization benefits architectures that can
>> perform fetch_and_add faster than compare_and_exchange; such a
>> situation is indicated by defining the new macro "lll_add_lock".
>>
>> The unlocking counterpart doesn't require any change, as it already
>> uses a plain atomic_exchange operation, which, incidentally, is also
>> supported on XLP as a single instruction.
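
For reference, the acquire path described above boils down to roughly the
following (an illustrative sketch, not the actual patch; the name
lll_add_lock_sketch is made up here, while atomic_exchange_and_add is
glibc's fetch_and_add primitive, which returns the value the lock word
held before the addition):

/* Sketch of an lll_add_lock-style acquire.  On XLP the
   atomic_exchange_and_add below compiles to a single LDADD.  */
static inline void
lll_add_lock_sketch (int *futex, int private)
{
  /* Atomically bump the lock word; the old value tells us whether the
     lock was free (0) or already held (> 0).  */
  if (atomic_exchange_and_add (futex, 1) != 0)
    /* Contended: fall back to the generic wait loop, which resets the
       lock word to 2 while waiting.  */
    __lll_lock_wait (futex, private);
}

The uncontended path is thus the single fetch_and_add; only when the old
value is non-zero does a thread enter the generic futex wait loop.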
> This seems like it would work well for a single thread acquiring the
> lock, but I have some questions about it in the presence of multiple
> threads trying to acquire the lock.
>
> First, the generic __lll_lock_wait() code assumes the contended value
> is exactly "2".

Um, not exactly.  __lll_lock_wait() *sets* the contended lock to a value
of "2", but it works with values >2 as well.

void
__lll_lock_wait (int *futex, int private)
{
  /* If the lock is already marked contended, wait for a wakeup.  */
  if (*futex == 2)
    lll_futex_wait (futex, 2, private);

  /* Try to take the lock, marking it contended (2) in the process;
     keep futex-waiting until the exchange observes an unlocked (0)
     value.  */
  while (atomic_exchange_acq (futex, 2) != 0)
    lll_futex_wait (futex, 2, private);
}

> So if two or more threads both try and fail to acquire the lock, the
> value will be >2.  This will cause the waiters to busywait, spinning
> on atomic exchange instructions, rather than calling into
> futex_wait().

As I read it, in the case of a contended lock __lll_lock_wait() will
reset the value of the lock to "2" before calling lll_futex_wait().  I
agree that there is a timing window in which the other threads will see
a value of the lock greater than "2", but the value will not get as high
as hundreds or billions, as it will be constantly reset to "2" by the
atomic_exchange in __lll_lock_wait().

I do not see how threads would get into a busywait state, though.  Would
you please elaborate on that?

> I think it might be possible to change the generic code to support the
> more general ">1" semantics of contended locks, but it might be a bit
> less efficient, so you might end up wanting to provide overrides for
> these functions on MIPS.  Even on MIPS it might result in a certain
> amount of spinning since you'd have to hit the race window correctly
> to feed the right value of the lock to futex_wait.
>
> Second, if a lock is held long enough for 4 billion threads to try to
> acquire it and fail, you will end up with an unlocked lock.  :-)  I'm
> not sure how likely this seems, but it is a potential issue.  You
> might consider, for example, doing a cmpxchg on the contended-lock
> path to try to reset the lock value back to 2 again; if it fails, it's
> not a big deal, since statistically I would expect the occasional
> thread to succeed, which is all you need.

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics
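
For concreteness, the occasional reset Chris suggests for the contended
path might look roughly like this (a sketch only, with a made-up name;
atomic_compare_and_exchange_bool_acq is glibc's cmpxchg primitive, which
returns zero when the exchange succeeds):

/* Sketch: opportunistically clamp an inflated lock value back to 2 so
   that repeated fetch_and_add increments cannot wrap the lock word
   around to 0.  */
static inline void
lll_clamp_contended_sketch (int *futex)
{
  int val = *futex;
  if (val > 2)
    /* If another thread changed *futex meanwhile, the cmpxchg fails;
       that is harmless -- statistically some thread will succeed.  */
    atomic_compare_and_exchange_bool_acq (futex, 2, val);
}

Losing the race here is fine: it is enough that the occasional thread
wins and pulls the value back down before it can overflow.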