From: Noah Goldstein
Date: Wed, 17 Nov 2021 18:03:35 -0600
Subject: Re: [PATCH v6 1/4] Add LLL_MUTEX_READ_LOCK [BZ #28537]
To: "H.J. Lu"
Cc: GNU C Library, Florian Weimer, Oleh Derevenko, Arjan van de Ven,
 Andreas Schwab, "Paul A . Clarke"
References: <20211111162428.2286605-1-hjl.tools@gmail.com>
 <20211111162428.2286605-2-hjl.tools@gmail.com>
List-Id: Libc-alpha mailing list

On Wed, Nov 17, 2021 at 5:55 PM H.J. Lu wrote:
>
> On Tue, Nov 16, 2021 at 6:24 PM Noah Goldstein wrote:
> >
> > On Thu, Nov 11, 2021 at 10:24 AM H.J. Lu wrote:
> > >
> > > CAS instruction is expensive. From the x86 CPU's point of view, getting
> > > a cache line for writing is more expensive than reading. See Appendix
> > > A.2 Spinlock in:
> > >
> > > https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf
> > >
> > > The full compare and swap will grab the cache line exclusive and cause
> > > excessive cache line bouncing.
> > >
> > > Add LLL_MUTEX_READ_LOCK to do an atomic load and skip CAS in spinlock
> > > loop if compare may fail to reduce cache line bouncing on contended locks.
> > > ---
> > >  nptl/pthread_mutex_lock.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > >
> > > diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
> > > index 2bd41767e0..72058c719c 100644
> > > --- a/nptl/pthread_mutex_lock.c
> > > +++ b/nptl/pthread_mutex_lock.c
> > > @@ -64,6 +64,11 @@ lll_mutex_lock_optimized (pthread_mutex_t *mutex)
> > >  # define PTHREAD_MUTEX_VERSIONS 1
> > >  #endif
> > >
> > > +#ifndef LLL_MUTEX_READ_LOCK
> > > +# define LLL_MUTEX_READ_LOCK(mutex) \
> > > +  atomic_load_relaxed (&(mutex)->__data.__lock)
> > > +#endif
> > > +
> > >  static int __pthread_mutex_lock_full (pthread_mutex_t *mutex)
> > >       __attribute_noinline__;
> > >
> > > @@ -141,6 +146,8 @@ PTHREAD_MUTEX_LOCK (pthread_mutex_t *mutex)
> > >                   break;
> > >                 }
> > >               atomic_spin_nop ();
> > > +             if (LLL_MUTEX_READ_LOCK (mutex) != 0)
> > > +               continue;
> >
> > Now that the lock spins on a simple read should `max_cnt` be adjusted?
>
> Adding LLL_MUTEX_READ_LOCK just avoids the more expensive
> LLL_MUTEX_TRYLOCK. It doesn't change the flow.

Yes, but the loop will be able to run `max_cnt` iterations much faster
now that each spin is a plain load instead of a CAS. Just wondering
whether the value needs to be re-tuned. Not that it necessarily needs
to be.

>
> > https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/pthread_mutex_lock.c;h=762059b230ba97140d6ca16c7273b489592dd3bc;hb=d672a98a1af106bd68deb15576710cd61363f7a6#l143
> > >             }
> > >           while (LLL_MUTEX_TRYLOCK (mutex) != 0);
> > >
> > > --
> > > 2.33.1
> > >
> >
>
> --
> H.J.
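
To make the `max_cnt` question concrete, here is a minimal standalone
sketch of a read-before-CAS spin loop using C11 atomics. It is not the
glibc code: `demo_mutex`, `spin_stats`, `demo_trylock`, `demo_unlock`
and `demo_spin_lock` are hypothetical names invented for illustration,
the spin pause (`atomic_spin_nop`) is omitted, and where glibc would
fall back to a blocking lock the sketch simply gives up. It uses an
explicit loop rather than `continue` inside `do ... while`, because in
C that `continue` still evaluates the controlling expression.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mutex lock word: 0 = free, 1 = held.  */
typedef struct {
    atomic_int lock;
} demo_mutex;

/* Counters so the cost of a spin budget can be inspected.  */
typedef struct {
    int cas_attempts;   /* compare-and-swap attempts (need the line exclusive) */
    int read_spins;     /* relaxed loads (the line can stay shared) */
} spin_stats;

/* Rough analogue of a trylock: one CAS from 0 to 1.  */
static bool demo_trylock(demo_mutex *m, spin_stats *st)
{
    int expected = 0;
    st->cas_attempts++;
    return atomic_compare_exchange_strong_explicit(
        &m->lock, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

static void demo_unlock(demo_mutex *m)
{
    assert(atomic_load_explicit(&m->lock, memory_order_relaxed) == 1);
    atomic_store_explicit(&m->lock, 0, memory_order_release);
}

/* Spin for at most max_cnt iterations.  Returns true if the lock was
   taken, false if the budget ran out (glibc would block instead).  */
static bool demo_spin_lock(demo_mutex *m, int max_cnt, spin_stats *st)
{
    int cnt = 0;

    if (demo_trylock(m, st))
        return true;

    for (;;) {
        if (cnt++ >= max_cnt)
            return false;
        st->read_spins++;
        /* Read first: while the lock looks held, skip the CAS entirely.  */
        if (atomic_load_explicit(&m->lock, memory_order_relaxed) != 0)
            continue;
        if (demo_trylock(m, st))
            return true;
    }
}
```

With the lock held by someone else, a budget of `max_cnt = 10` in this
sketch is spent on ten relaxed loads and a single CAS, instead of one
CAS per iteration. Each tick of `cnt` is therefore cheaper, so the same
`max_cnt` covers less wall-clock time; whether that matters in practice
is something only measurement can answer.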