From: Maxim Kuvyrkov
To: Tom de Vries, Chris Metcalf
CC: "Joseph S. Myers", GLIBC Devel, Tom de Vries
Subject: Re: [PATCH] Optimize libc_lock_lock for MIPS XLP.
Date: Tue, 14 Aug 2012 04:00:00 -0000
Message-ID: <7CC74175-BA9B-4461-8918-9D99DABEC484@codesourcery.com>
In-Reply-To: <4FF73F75.6060303@mentor.com>
Mailing-List: contact libc-ports-help@sourceware.org; run by ezmlm

On 7/07/2012, at 7:41 AM, Tom de Vries wrote:

> On 28/06/12 19:30, Chris Metcalf wrote:
>>
>> It looks OK to me.  I would want someone else to sign off on it before
>> applying to 2.17.
>>
>
> Chris,
>
> I cannot sign off on this, but I reviewed the current patch as well and it looks
> ok to me too.
>
> Thanks,
> - Tom

Attached is an updated version of the patch.  Given reviews from Chris and Tom I intend to commit this patch in a couple of days if no-one objects.

The differences in this version are

1. the use of the now-available atomic_exchange_and_add_acq macro (previously only atomic_exchange_and_add existed),

2. __libc_lock_lock is now defined for all MIPS processors, not just XLP, since there is no downside to using atomic_exchange_and_add_acq versus atomic_compare_and_exchange_acq,

3. as Tom correctly spotted, in __libc_lock_trylock we only need to perform the exchange for values >= 2.  For 0 and 1 everything works out by itself.

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics

Optimize __libc_lock_lock and __libc_lock_trylock for MIPS.

	* nptl/sysdeps/pthread/bits/libc-lockP.h (__libc_lock_lock)
	(__libc_lock_trylock): Allow pre-existing definitions.

	ports/
	* sysdeps/unix/sysv/linux/mips/nptl/lowlevellock.h (__libc_lock_lock)
	(__libc_lock_trylock): Define versions optimized for MIPS.
---
 nptl/sysdeps/pthread/bits/libc-lockP.h             |   10 ++++-
 .../unix/sysv/linux/mips/nptl/lowlevellock.h       |   39 +++++++++++++++++++-
 2 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/nptl/sysdeps/pthread/bits/libc-lockP.h b/nptl/sysdeps/pthread/bits/libc-lockP.h
index 0ebac91..7adaeb4 100644
--- a/nptl/sysdeps/pthread/bits/libc-lockP.h
+++ b/nptl/sysdeps/pthread/bits/libc-lockP.h
@@ -176,9 +176,12 @@ typedef pthread_key_t __libc_key_t;
 
 /* Lock the named lock variable.  */
 #if !defined NOT_IN_libc || defined IS_IN_libpthread
-# define __libc_lock_lock(NAME) \
+# ifndef __libc_lock_lock
+#  define __libc_lock_lock(NAME) \
   ({ lll_lock (NAME, LLL_PRIVATE); 0; })
+# endif
 #else
+# undef __libc_lock_lock
 # define __libc_lock_lock(NAME) \
   __libc_maybe_call (__pthread_mutex_lock, (&(NAME)), 0)
 #endif
@@ -189,9 +192,12 @@ typedef pthread_key_t __libc_key_t;
 
 /* Try to lock the named lock variable.  */
 #if !defined NOT_IN_libc || defined IS_IN_libpthread
-# define __libc_lock_trylock(NAME) \
+# ifndef __libc_lock_trylock
+#  define __libc_lock_trylock(NAME) \
   lll_trylock (NAME)
+# endif
 #else
+# undef __libc_lock_trylock
 # define __libc_lock_trylock(NAME) \
   __libc_maybe_call (__pthread_mutex_trylock, (&(NAME)), 0)
 #endif
diff --git a/ports/sysdeps/unix/sysv/linux/mips/nptl/lowlevellock.h b/ports/sysdeps/unix/sysv/linux/mips/nptl/lowlevellock.h
index 88b601e..2584e7d 100644
--- a/ports/sysdeps/unix/sysv/linux/mips/nptl/lowlevellock.h
+++ b/ports/sysdeps/unix/sysv/linux/mips/nptl/lowlevellock.h
@@ -1,5 +1,4 @@
-/* Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008,
-   2009 Free Software Foundation, Inc.
+/* Copyright (C) 2003-2012 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -291,4 +290,40 @@ extern int __lll_timedwait_tid (int *, const struct timespec *)
     __res; \
   })
 
+/* Implement __libc_lock_lock using exchange_and_add, which expands into
+   a single instruction on XLP processors.  We enable this for all MIPS
+   processors as atomic_exchange_and_add_acq and
+   atomic_compare_and_exchange_acq take the same time to execute.
+   This is a simplified expansion of ({ lll_lock (NAME, LLL_PRIVATE); 0; }).
+
+   Note: __lll_lock_wait_private() resets lock value to '2', which prevents
+   unbounded increase of the lock value and [with billions of threads]
+   overflow.  */
+#define __libc_lock_lock(NAME) \
+  ({ \
+    int *__futex = &(NAME); \
+    if (__builtin_expect (atomic_exchange_and_add_acq (__futex, 1), 0)) \
+      __lll_lock_wait_private (__futex); \
+    0; \
+  })
+
+#ifdef _MIPS_ARCH_XLP
+/* The generic version using a single atomic_compare_and_exchange_acq takes
+   less time for non-XLP processors, so we use below for XLP only.  */
+# define __libc_lock_trylock(NAME) \
+  ({ \
+    int *__futex = &(NAME); \
+    int __result = atomic_exchange_and_add_acq (__futex, 1); \
+    /* If __result == 0, we succeeded in acquiring the lock. \
+       If __result == 1, we switched the lock to 'contended' state, which \
+       will cause a [possibly unnecessary] call to lll_futex_wait.  This is \
+       unlikely, so we accept the possible inefficiency. \
+       If __result >= 2, we need to set the lock to 'contended' state to avoid \
+       unbounded increase from subsequent trylocks.  */ \
+    if (__result >= 2) \
+      __result = (atomic_exchange_acq (__futex, 2) != 0); \
+    __result; \
+  })
+#endif
+
 #endif /* lowlevellock.h */
-- 
1.7.4.1
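
For anyone who wants to check the three trylock cases (0, 1, and >= 2) outside of glibc, here is a minimal single-threaded sketch.  It is an illustration under stated assumptions, not the glibc code: C11 <stdatomic.h> calls stand in for atomic_exchange_and_add_acq and atomic_exchange_acq, the names sketch_lock_fast_path, sketch_trylock and sketch_demo are hypothetical, and the slow path (__lll_lock_wait_private) is not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed model of the lock word: 0 = free, 1 = held,
   >= 2 = held with possible waiters ("contended").  */

/* Fast-path acquire attempt, modelled on the patch's __libc_lock_lock.
   Returns nonzero when the caller would have to enter the slow path
   (__lll_lock_wait_private in glibc, which is not modelled here).  */
int
sketch_lock_fast_path (atomic_int *futex)
{
  return atomic_fetch_add_explicit (futex, 1, memory_order_acquire) != 0;
}

/* Trylock modelled on the XLP-only __libc_lock_trylock.
   Returns 0 on success and nonzero on failure.  */
int
sketch_trylock (atomic_int *futex)
{
  int result = atomic_fetch_add_explicit (futex, 1, memory_order_acquire);
  if (result >= 2)
    /* Clamp the word back to 2 so repeated trylocks cannot grow it.  */
    result = (atomic_exchange_explicit (futex, 2, memory_order_acquire) != 0);
  return result;
}

/* Single-threaded walk through the three cases; returns the final value
   of the lock word.  */
int
sketch_demo (void)
{
  atomic_int futex = 0;
  atomic_int other = 0;

  assert (sketch_trylock (&futex) == 0);  /* 0 -> 1: lock acquired.  */
  assert (sketch_trylock (&futex) != 0);  /* 1 -> 2: fails, marks contended.  */
  assert (sketch_trylock (&futex) != 0);  /* 2 -> 3, then clamped to 2.  */
  assert (sketch_trylock (&futex) != 0);  /* Still clamped to 2.  */

  assert (sketch_lock_fast_path (&other) == 0);  /* Free lock: fast path.  */
  assert (sketch_lock_fast_path (&other) != 0);  /* Held lock: slow path.  */

  return atomic_load (&futex);
}
```

Calling sketch_demo () exercises each branch once.  The point of the clamp is that a failed trylock on a contended lock leaves the word at 2 rather than letting it grow with every attempt.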