Subject: Re: [PATCH v2] nptl: Add backoff mechanism to spinlock loop
From: Adhemerval Zanella
To: Wangyang Guo, libc-alpha@sourceware.org
Date: Wed, 30 Mar 2022 08:53:43 -0300
Message-ID: <97b1105f-42e8-a347-f82e-c81e548f0c2f@linaro.org>
In-Reply-To: <20220328084705.468207-1-wangyang.guo@intel.com>

On 28/03/2022 05:47, Wangyang Guo via Libc-alpha wrote:
> When multiple threads are waiting for a lock at the same time, once the
> lock owner releases it, all waiters see the lock as available and try
> to lock it at once, which may cause an expensive CAS storm.
>
> Binary exponential backoff with random jitter is introduced.  As
> try-lock attempts increase, it is more likely that a larger number of
> threads are competing for the adaptive mutex lock, so the wait time is
> increased exponentially.  A random jitter is also added to keep
> try-lock attempts from different threads from synchronizing.
>
> v2: Remove the read-check before try-lock for performance.
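Restating the idea outside the mutex code: spin for an exponentially
growing count plus a per-thread jitter term, and give up to the kernel
once a cap is reached.  A minimal standalone sketch (backoff_spin,
MAX_SPIN, and cpu_relax are made-up illustration names, not glibc
interfaces; it assumes the caller has already failed one try-lock):

#include <stdatomic.h>

#define MAX_SPIN 1024

static inline void
cpu_relax (void)
{
  /* Stand-in for glibc's atomic_spin_nop (), e.g. the x86 "pause"
     hint.  */
  __asm__ __volatile__ ("" ::: "memory");
}

/* Spin with binary exponential backoff plus jitter.  Returns 1 if the
   lock was acquired by spinning, 0 if the caller should block in the
   kernel instead.  */
static int
backoff_spin (atomic_int *lock, unsigned int jitter)
{
  int cnt = 0, exp_backoff = 1;

  do
    {
      /* Spin exp_backoff iterations plus a jitter in
	 [0, exp_backoff-1]; the mask works because exp_backoff is
	 always a power of two.  */
      int spin_count = exp_backoff + (jitter & (exp_backoff - 1));
      cnt += spin_count;
      if (cnt >= MAX_SPIN)
	return 0;
      while (spin_count-- > 0)
	cpu_relax ();
      exp_backoff <<= 1;
    }
  while (atomic_exchange_explicit (lock, 1, memory_order_acquire) != 0);
  return 1;
}

With a per-thread jitter the waiters' retry times drift apart, which is
what breaks up the CAS storm described above.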
> Signed-off-by: Wangyang Guo
> ---
>  nptl/pthread_mutex_lock.c | 25 ++++++++++++++++---------
>  1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
> index d2e652d151..7e75ec1cba 100644
> --- a/nptl/pthread_mutex_lock.c
> +++ b/nptl/pthread_mutex_lock.c
> @@ -26,6 +26,7 @@
>  #include <...>
>  #include <...>
>  #include <...>
> +#include <random-bits.h>
>
>  /* Some of the following definitions differ when pthread_mutex_cond_lock.c
>     includes this file.  */
> @@ -64,11 +65,6 @@ lll_mutex_lock_optimized (pthread_mutex_t *mutex)
>  # define PTHREAD_MUTEX_VERSIONS 1
>  #endif
>
> -#ifndef LLL_MUTEX_READ_LOCK
> -# define LLL_MUTEX_READ_LOCK(mutex) \
> -  atomic_load_relaxed (&(mutex)->__data.__lock)
> -#endif
> -
>  static int __pthread_mutex_lock_full (pthread_mutex_t *mutex)
>       __attribute_noinline__;
>
> @@ -138,17 +134,28 @@ PTHREAD_MUTEX_LOCK (pthread_mutex_t *mutex)
>            int cnt = 0;
>            int max_cnt = MIN (max_adaptive_count (),
>                               mutex->__data.__spins * 2 + 10);
> +          int spin_count, exp_backoff = 1;
> +          unsigned int jitter = random_bits ();

This will issue a syscall on architectures that do not provide
clock_gettime through the vDSO, which is a performance regression.  You
will need to make the jitter setup arch-specific, with the generic
interface applying no random jitter (a rough sketch of what I mean is
at the end of this message).

>            do
>              {
> -              if (cnt++ >= max_cnt)
> +              /* In each loop, the spin count is the exponential
> +                 backoff plus a random jitter in the range
> +                 [0, exp_backoff-1].  */
> +              spin_count = exp_backoff + (jitter & (exp_backoff - 1));
> +              cnt += spin_count;
> +              if (cnt >= max_cnt)
>                  {
> +                  /* If cnt exceeds the max spin count, just go to the
> +                     wait queue.  */
>                    LLL_MUTEX_LOCK (mutex);
>                    break;
>                  }
> -              atomic_spin_nop ();
> +              do
> +                atomic_spin_nop ();
> +              while (--spin_count > 0);
> +              /* Binary exponential backoff, prepare for next loop.  */
> +              exp_backoff <<= 1;
>              }
> -          while (LLL_MUTEX_READ_LOCK (mutex) != 0
> -                 || LLL_MUTEX_TRYLOCK (mutex) != 0);
> +          while (LLL_MUTEX_TRYLOCK (mutex) != 0);
>
>            mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8;
>          }
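To make the arch-specific suggestion concrete, a rough sketch of the
split I have in mind; the header placement and get_jitter () are
hypothetical names for illustration, not an existing glibc interface:

/* Hypothetical generic version (e.g. a sysdeps/generic header): no
   jitter, so no clock_gettime syscall on architectures without a vDSO
   fast path.  */
static inline unsigned int
get_jitter (void)
{
  return 0;
}

/* Hypothetical arch override (e.g. for x86_64), where random_bits ()
   is cheap enough to use for de-synchronizing the spinners.  */
#include <random-bits.h>

static inline unsigned int
get_jitter (void)
{
  return random_bits ();
}

The spin loop would then call get_jitter () instead of calling
random_bits () directly.  With jitter == 0 the mask term vanishes and
the loop degenerates to plain binary exponential backoff, which is the
desired generic behavior.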