Subject: Re: [PATCH v3 32/37] nptl: pthread_mutex_lock, pthread_mutex_unlock single-threaded optimization
To: libc-alpha@sourceware.org, Florian Weimer
From: Adhemerval Zanella
Date: Fri, 26 Mar 2021 15:00:53 -0300
In-Reply-To: <782ad0d9a371fa66bd54df07413f3d15fba0cf5a.1615914632.git.fweimer@redhat.com>
References: <782ad0d9a371fa66bd54df07413f3d15fba0cf5a.1615914632.git.fweimer@redhat.com>

On 16/03/2021 14:31, Florian Weimer via Libc-alpha wrote:
> This optimization is similar in spirit to the SINGLE_THREAD_P check
> in the malloc implementation.  Doing this in generic code allows us
> to prioritize those cases which are likely to occur in single-threaded
> programs (normal and recursive mutexes).

At first I thought about suggesting to move it into the generic
lowlevellock.h, but it is used in only a few specific places, and once
libpthread is moved into libc we can use libc_lock_lock (which uses
pthread_mutex) instead.

The only issue with moving this optimization to pthread_mutex_lock
(instead of a custom lock such as lll_lock) is that it uses slightly
more memory due to the pthread_mutex_t object, but I think the sparse
usage within glibc should be fine.
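For context, the case this fast path targets is something like the
following single-threaded loop (my sketch, not part of the patch): a
program that never calls pthread_create but still pays for an atomic
operation on every lock/unlock of a private mutex.  With the patch,
SINGLE_THREAD_P holds throughout, so both calls reduce to plain stores.

#include <pthread.h>
#include <stdio.h>

/* A private (non-process-shared) mutex: eligible for the
   single-threaded fast path.  */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

int
main (void)
{
  /* No pthread_create anywhere, so the process stays single-threaded
     and every iteration can take the non-atomic path.  */
  for (int i = 0; i < 10000000; i++)
    {
      pthread_mutex_lock (&lock);
      counter++;
      pthread_mutex_unlock (&lock);
    }
  printf ("%ld\n", counter);
  return 0;
}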
It would be good if we disentangled and consolidated lowlevellock.h and
libc-lock.h so that we have a single header and a single set of
definitions.

LGTM, thanks.

Reviewed-by: Adhemerval Zanella

> ---
>  nptl/pthread_mutex_cond_lock.c |  1 +
>  nptl/pthread_mutex_lock.c      | 25 ++++++++++++++++++++++---
>  nptl/pthread_mutex_unlock.c    | 17 ++++++++++++++++-
>  3 files changed, 39 insertions(+), 4 deletions(-)
> 
> diff --git a/nptl/pthread_mutex_cond_lock.c b/nptl/pthread_mutex_cond_lock.c
> index 2f0771302f..3386bd689b 100644
> --- a/nptl/pthread_mutex_cond_lock.c
> +++ b/nptl/pthread_mutex_cond_lock.c
> @@ -2,6 +2,7 @@
>  
>  #define LLL_MUTEX_LOCK(mutex) \
>    lll_cond_lock ((mutex)->__data.__lock, PTHREAD_MUTEX_PSHARED (mutex))
> +#define LLL_MUTEX_LOCK_OPTIMIZED(mutex) LLL_MUTEX_LOCK (mutex)
>  
>  /* Not actually elided so far.  Needed?  */
>  #define LLL_MUTEX_LOCK_ELISION(mutex) \
> diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
> index f0de7b7fd6..8649a92ffb 100644
> --- a/nptl/pthread_mutex_lock.c
> +++ b/nptl/pthread_mutex_lock.c
> @@ -30,8 +30,27 @@
>  /* Some of the following definitions differ when pthread_mutex_cond_lock.c
>     includes this file.  */
>  #ifndef LLL_MUTEX_LOCK
> -# define LLL_MUTEX_LOCK(mutex) \
> +/* lll_lock with single-thread optimization.  */
> +static inline void
> +lll_mutex_lock_optimized (pthread_mutex_t *mutex)
> +{
> +  /* The single-threaded optimization is only valid for private
> +     mutexes.  For process-shared mutexes, the mutex could be in a
> +     shared mapping, so synchronization with another process is needed
> +     even without any threads.  If the lock is already marked as
> +     acquired, POSIX requires that pthread_mutex_lock deadlocks for
> +     normal mutexes, so skip the optimization in that case as
> +     well.  */
> +  int private = PTHREAD_MUTEX_PSHARED (mutex);
> +  if (private == LLL_PRIVATE && SINGLE_THREAD_P && mutex->__data.__lock == 0)
> +    mutex->__data.__lock = 1;
> +  else
> +    lll_lock (mutex->__data.__lock, private);
> +}
> +
> +# define LLL_MUTEX_LOCK(mutex) \
>    lll_lock ((mutex)->__data.__lock, PTHREAD_MUTEX_PSHARED (mutex))

Ok.

> +# define LLL_MUTEX_LOCK_OPTIMIZED(mutex) lll_mutex_lock_optimized (mutex)
>  # define LLL_MUTEX_TRYLOCK(mutex) \
>    lll_trylock ((mutex)->__data.__lock)
>  # define LLL_ROBUST_MUTEX_LOCK_MODIFIER 0
> @@ -64,7 +83,7 @@ __pthread_mutex_lock (pthread_mutex_t *mutex)
>        FORCE_ELISION (mutex, goto elision);
>      simple:
>        /* Normal mutex.  */
> -      LLL_MUTEX_LOCK (mutex);
> +      LLL_MUTEX_LOCK_OPTIMIZED (mutex);
>        assert (mutex->__data.__owner == 0);
>      }
>  #if ENABLE_ELISION_SUPPORT

Ok.

> @@ -99,7 +118,7 @@ __pthread_mutex_lock (pthread_mutex_t *mutex)
>  	}
>  
>        /* We have to get the mutex.  */
> -      LLL_MUTEX_LOCK (mutex);
> +      LLL_MUTEX_LOCK_OPTIMIZED (mutex);
>  
>        assert (mutex->__data.__owner == 0);
>        mutex->__data.__count = 1;

Ok.

> diff --git a/nptl/pthread_mutex_unlock.c b/nptl/pthread_mutex_unlock.c
> index 3b5ccdacf9..655093ee9a 100644
> --- a/nptl/pthread_mutex_unlock.c
> +++ b/nptl/pthread_mutex_unlock.c
> @@ -28,6 +28,21 @@ static int
>  __pthread_mutex_unlock_full (pthread_mutex_t *mutex, int decr)
>       __attribute_noinline__;
>  
> +/* lll_lock with single-thread optimization.  */
> +static inline void
> +lll_mutex_unlock_optimized (pthread_mutex_t *mutex)
> +{
> +  /* The single-threaded optimization is only valid for private
> +     mutexes.  For process-shared mutexes, the mutex could be in a
> +     shared mapping, so synchronization with another process is needed
> +     even without any threads.  */
> +  int private = PTHREAD_MUTEX_PSHARED (mutex);
> +  if (private == LLL_PRIVATE && SINGLE_THREAD_P)
> +    mutex->__data.__lock = 0;
> +  else
> +    lll_unlock (mutex->__data.__lock, private);
> +}
> +
>  int
>  attribute_hidden
>  __pthread_mutex_unlock_usercnt (pthread_mutex_t *mutex, int decr)
> @@ -51,7 +66,7 @@ __pthread_mutex_unlock_usercnt (pthread_mutex_t *mutex, int decr)
>  	--mutex->__data.__nusers;
>  
>        /* Unlock.  */
> -      lll_unlock (mutex->__data.__lock, PTHREAD_MUTEX_PSHARED (mutex));
> +      lll_mutex_unlock_optimized (mutex);
>  
>        LIBC_PROBE (mutex_release, 1, mutex);
>  

Ok.
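To make the __data.__lock == 0 check in the lock fast path concrete:
for a normal mutex, a second pthread_mutex_lock from the owning thread
must block forever, so the fast path cannot simply store 1 again and
instead falls through to lll_lock, which deadlocks as POSIX requires.
The self-relock situation itself can be observed without hanging by
using an error-checking mutex (my example, not part of the patch; note
that ERRORCHECK mutexes take a different code path and are not covered
by this optimization):

#include <pthread.h>
#include <errno.h>
#include <stdio.h>

int
main (void)
{
  pthread_mutexattr_t attr;
  pthread_mutex_t m;

  pthread_mutexattr_init (&attr);
  /* An error-checking mutex reports the self-relock that a normal
     mutex would turn into a mandated deadlock.  */
  pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ERRORCHECK);
  pthread_mutex_init (&m, &attr);

  pthread_mutex_lock (&m);
  /* Second lock from the same thread.  */
  int err = pthread_mutex_lock (&m);
  printf ("relock: %s\n", err == EDEADLK ? "EDEADLK" : "unexpected");

  pthread_mutex_unlock (&m);
  pthread_mutex_destroy (&m);
  pthread_mutexattr_destroy (&attr);
  return 0;
}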