Subject: Re: [PATCH v3 32/37] nptl: pthread_mutex_lock, pthread_mutex_unlock single-threaded optimization
To: libc-alpha@sourceware.org, Florian Weimer
From: Adhemerval Zanella
Date: Fri, 26 Mar 2021 15:00:53 -0300
In-Reply-To: <782ad0d9a371fa66bd54df07413f3d15fba0cf5a.1615914632.git.fweimer@redhat.com>
References: <782ad0d9a371fa66bd54df07413f3d15fba0cf5a.1615914632.git.fweimer@redhat.com>

On 16/03/2021 14:31, Florian Weimer via Libc-alpha wrote:
> This optimization is similar in spirit to the SINGLE_THREAD_P check
> in the malloc implementation.  Doing this in generic code allows us
> to prioritize those cases which are likely to occur in single-threaded
> programs (normal and recursive mutexes).

At first I thought about suggesting to move it into the generic
lowlevellock.h, but it is used in only a few specific places, and once
libpthread is moved into libc we can use libc_lock_lock (which uses
pthread_mutex) instead.

The only issue with moving this optimization to pthread_mutex_lock
(instead of a custom lock such as lll_lock) is that it uses slightly
more memory due to the pthread_mutex_t object, but I think the sparse
usage within glibc should be fine.
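For context, the case this fast path targets is something like the
following single-threaded loop (my sketch, not part of the patch): a
program that never calls pthread_create but still pays for an atomic
operation on every lock/unlock of a private mutex.  With the patch,
SINGLE_THREAD_P holds throughout, so both calls reduce to plain stores.

#include <pthread.h>
#include <stdio.h>

/* A private (non-process-shared) mutex: eligible for the
   single-threaded fast path.  */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

int
main (void)
{
  /* No pthread_create anywhere, so the process stays single-threaded
     and every iteration can take the non-atomic path.  */
  for (int i = 0; i < 10000000; i++)
    {
      pthread_mutex_lock (&lock);
      counter++;
      pthread_mutex_unlock (&lock);
    }
  printf ("%ld\n", counter);
  return 0;
}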
It would be good if we disentangled and consolidated lowlevellock.h and
libc-lock.h so that we have a single header and a single set of
definitions.

LGTM, thanks.

Reviewed-by: Adhemerval Zanella

> ---
>  nptl/pthread_mutex_cond_lock.c |  1 +
>  nptl/pthread_mutex_lock.c      | 25 ++++++++++++++++++++++---
>  nptl/pthread_mutex_unlock.c    | 17 ++++++++++++++++-
>  3 files changed, 39 insertions(+), 4 deletions(-)
> 
> diff --git a/nptl/pthread_mutex_cond_lock.c b/nptl/pthread_mutex_cond_lock.c
> index 2f0771302f..3386bd689b 100644
> --- a/nptl/pthread_mutex_cond_lock.c
> +++ b/nptl/pthread_mutex_cond_lock.c
> @@ -2,6 +2,7 @@
>  
>  #define LLL_MUTEX_LOCK(mutex) \
>    lll_cond_lock ((mutex)->__data.__lock, PTHREAD_MUTEX_PSHARED (mutex))
> +#define LLL_MUTEX_LOCK_OPTIMIZED(mutex) LLL_MUTEX_LOCK (mutex)
>  
>  /* Not actually elided so far.  Needed?  */
>  #define LLL_MUTEX_LOCK_ELISION(mutex) \
> diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
> index f0de7b7fd6..8649a92ffb 100644
> --- a/nptl/pthread_mutex_lock.c
> +++ b/nptl/pthread_mutex_lock.c
> @@ -30,8 +30,27 @@
>  /* Some of the following definitions differ when pthread_mutex_cond_lock.c
>     includes this file.  */
>  #ifndef LLL_MUTEX_LOCK
> -# define LLL_MUTEX_LOCK(mutex) \
> +/* lll_lock with single-thread optimization.  */
> +static inline void
> +lll_mutex_lock_optimized (pthread_mutex_t *mutex)
> +{
> +  /* The single-threaded optimization is only valid for private
> +     mutexes.  For process-shared mutexes, the mutex could be in a
> +     shared mapping, so synchronization with another process is needed
> +     even without any threads.  If the lock is already marked as
> +     acquired, POSIX requires that pthread_mutex_lock deadlocks for
> +     normal mutexes, so skip the optimization in that case as
> +     well.  */
> +  int private = PTHREAD_MUTEX_PSHARED (mutex);
> +  if (private == LLL_PRIVATE && SINGLE_THREAD_P && mutex->__data.__lock == 0)
> +    mutex->__data.__lock = 1;
> +  else
> +    lll_lock (mutex->__data.__lock, private);
> +}
> +
> +# define LLL_MUTEX_LOCK(mutex) \
>    lll_lock ((mutex)->__data.__lock, PTHREAD_MUTEX_PSHARED (mutex))

Ok.

> +# define LLL_MUTEX_LOCK_OPTIMIZED(mutex) lll_mutex_lock_optimized (mutex)
>  # define LLL_MUTEX_TRYLOCK(mutex) \
>    lll_trylock ((mutex)->__data.__lock)
>  # define LLL_ROBUST_MUTEX_LOCK_MODIFIER 0
> @@ -64,7 +83,7 @@ __pthread_mutex_lock (pthread_mutex_t *mutex)
>        FORCE_ELISION (mutex, goto elision);
>      simple:
>        /* Normal mutex.  */
> -      LLL_MUTEX_LOCK (mutex);
> +      LLL_MUTEX_LOCK_OPTIMIZED (mutex);
>        assert (mutex->__data.__owner == 0);
>      }
>  #if ENABLE_ELISION_SUPPORT

Ok.

> @@ -99,7 +118,7 @@ __pthread_mutex_lock (pthread_mutex_t *mutex)
>  	}
>  
>        /* We have to get the mutex.  */
> -      LLL_MUTEX_LOCK (mutex);
> +      LLL_MUTEX_LOCK_OPTIMIZED (mutex);
>  
>        assert (mutex->__data.__owner == 0);
>        mutex->__data.__count = 1;

Ok.

> diff --git a/nptl/pthread_mutex_unlock.c b/nptl/pthread_mutex_unlock.c
> index 3b5ccdacf9..655093ee9a 100644
> --- a/nptl/pthread_mutex_unlock.c
> +++ b/nptl/pthread_mutex_unlock.c
> @@ -28,6 +28,21 @@ static int
>  __pthread_mutex_unlock_full (pthread_mutex_t *mutex, int decr)
>       __attribute_noinline__;
>  
> +/* lll_lock with single-thread optimization.  */
> +static inline void
> +lll_mutex_unlock_optimized (pthread_mutex_t *mutex)
> +{
> +  /* The single-threaded optimization is only valid for private
> +     mutexes.  For process-shared mutexes, the mutex could be in a
> +     shared mapping, so synchronization with another process is needed
> +     even without any threads.  */
> +  int private = PTHREAD_MUTEX_PSHARED (mutex);
> +  if (private == LLL_PRIVATE && SINGLE_THREAD_P)
> +    mutex->__data.__lock = 0;
> +  else
> +    lll_unlock (mutex->__data.__lock, private);
> +}
> +
>  int
>  attribute_hidden
>  __pthread_mutex_unlock_usercnt (pthread_mutex_t *mutex, int decr)
> @@ -51,7 +66,7 @@ __pthread_mutex_unlock_usercnt (pthread_mutex_t *mutex, int decr)
>  	--mutex->__data.__nusers;
>  
>        /* Unlock.  */
> -      lll_unlock (mutex->__data.__lock, PTHREAD_MUTEX_PSHARED (mutex));
> +      lll_mutex_unlock_optimized (mutex);
>  
>        LIBC_PROBE (mutex_release, 1, mutex);
>  

Ok.
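To make the __data.__lock == 0 check in the lock fast path concrete:
for a normal mutex, a second pthread_mutex_lock from the owning thread
must block forever, so the fast path cannot simply store 1 again and
instead falls through to lll_lock, which deadlocks as POSIX requires.
The self-relock situation itself can be observed without hanging by
using an error-checking mutex (my example, not part of the patch; note
that ERRORCHECK mutexes take a different code path and are not covered
by this optimization):

#include <pthread.h>
#include <errno.h>
#include <stdio.h>

int
main (void)
{
  pthread_mutexattr_t attr;
  pthread_mutex_t m;

  pthread_mutexattr_init (&attr);
  /* An error-checking mutex reports the self-relock that a normal
     mutex would turn into a mandated deadlock.  */
  pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ERRORCHECK);
  pthread_mutex_init (&m, &attr);

  pthread_mutex_lock (&m);
  /* Second lock from the same thread.  */
  int err = pthread_mutex_lock (&m);
  printf ("relock: %s\n", err == EDEADLK ? "EDEADLK" : "unexpected");

  pthread_mutex_unlock (&m);
  pthread_mutex_destroy (&m);
  pthread_mutexattr_destroy (&attr);
  return 0;
}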