From: "H.J. Lu"
Date: Wed, 17 Nov 2021 16:31:13 -0800
Subject: Re: [PATCH v6 1/4] Add LLL_MUTEX_READ_LOCK [BZ #28537]
To: Noah Goldstein
Cc: GNU C Library, Florian Weimer, Oleh Derevenko, Arjan van de Ven,
 Andreas Schwab, "Paul A . Clarke"
References: <20211111162428.2286605-1-hjl.tools@gmail.com>
 <20211111162428.2286605-2-hjl.tools@gmail.com>

On Wed, Nov 17, 2021 at 4:03 PM Noah Goldstein wrote:
>
> On Wed, Nov 17, 2021 at 5:55 PM H.J. Lu wrote:
> >
> > On Tue, Nov 16, 2021 at 6:24 PM Noah Goldstein wrote:
> > >
> > > On Thu, Nov 11, 2021 at 10:24 AM H.J. Lu wrote:
> > > >
> > > > The CAS instruction is expensive.  From the x86 CPU's point of view,
> > > > getting a cache line for writing is more expensive than reading.  See
> > > > Appendix A.2 Spinlock in:
> > > >
> > > > https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf
> > > >
> > > > The full compare and swap will grab the cache line exclusive and cause
> > > > excessive cache line bouncing.
> > > >
> > > > Add LLL_MUTEX_READ_LOCK to do an atomic load and skip the CAS in the
> > > > spinlock loop if the compare may fail, to reduce cache line bouncing
> > > > on contended locks.
> > > > ---
> > > >  nptl/pthread_mutex_lock.c | 7 +++++++
> > > >  1 file changed, 7 insertions(+)
> > > >
> > > > diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
> > > > index 2bd41767e0..72058c719c 100644
> > > > --- a/nptl/pthread_mutex_lock.c
> > > > +++ b/nptl/pthread_mutex_lock.c
> > > > @@ -64,6 +64,11 @@ lll_mutex_lock_optimized (pthread_mutex_t *mutex)
> > > >  # define PTHREAD_MUTEX_VERSIONS 1
> > > >  #endif
> > > >
> > > > +#ifndef LLL_MUTEX_READ_LOCK
> > > > +# define LLL_MUTEX_READ_LOCK(mutex) \
> > > > +  atomic_load_relaxed (&(mutex)->__data.__lock)
> > > > +#endif
> > > > +
> > > >  static int __pthread_mutex_lock_full (pthread_mutex_t *mutex)
> > > >      __attribute_noinline__;
> > > >
> > > > @@ -141,6 +146,8 @@ PTHREAD_MUTEX_LOCK (pthread_mutex_t *mutex)
> > > >                   break;
> > > >                 }
> > > >               atomic_spin_nop ();
> > > > +             if (LLL_MUTEX_READ_LOCK (mutex) != 0)
> > > > +               continue;
> > >
> > > Now that the lock spins on a simple read, should `max_cnt` be adjusted?
> >
> > Adding LLL_MUTEX_READ_LOCK just avoids the more expensive
> > LLL_MUTEX_TRYLOCK.  It doesn't change the flow.
>
> Yes, but the loop will be able to run `max_cnt` iterations much faster now.
> Just wondering if the value needs to be re-tuned.  Not that it necessarily
> needs to be.  Maybe if we can find some data to show for it.
>
> > > https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/pthread_mutex_lock.c;h=762059b230ba97140d6ca16c7273b489592dd3bc;hb=d672a98a1af106bd68deb15576710cd61363f7a6#l143
> > > >             }
> > > >           while (LLL_MUTEX_TRYLOCK (mutex) != 0);
> > > >
> > > > --
> > > > 2.33.1
> >
> > --
> > H.J.

--
H.J.
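For readers following the thread, below is a minimal, self-contained sketch of the
read-before-CAS ("test and test-and-set") spin pattern the patch aims for.  It is
not glibc code: the lock word, the try_lock/spin_lock/spin_unlock helpers, the
MAX_SPIN_COUNT bound, and the sched_yield fallback are illustrative assumptions;
glibc's adaptive mutex computes its own max_cnt and blocks on a futex when it
gives up spinning.

/* Minimal standalone sketch (not glibc code) of the read-before-CAS,
   "test and test-and-set" spin pattern discussed above.  The lock word,
   helper names, MAX_SPIN_COUNT bound, and sched_yield fallback are
   illustrative assumptions only.  */
#include <sched.h>        /* sched_yield: stand-in for a real futex wait.  */
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_SPIN_COUNT 100   /* Hypothetical bound; glibc derives max_cnt adaptively.  */

static atomic_int lock_word;  /* 0 = unlocked, 1 = locked.  */

/* One CAS attempt.  Even a failing CAS pulls the cache line in exclusive
   state, which is the cost the read-only spin below tries to avoid.  */
static bool
try_lock (atomic_int *lock)
{
  int expected = 0;
  return atomic_compare_exchange_weak_explicit (lock, &expected, 1,
                                                memory_order_acquire,
                                                memory_order_relaxed);
}

static void
spin_lock (atomic_int *lock)
{
  int cnt = 0;
  for (;;)
    {
      /* Spin on a relaxed load first: a plain read keeps the cache line
         shared among the waiters; only issue the CAS once the lock looks
         free.  */
      if (atomic_load_explicit (lock, memory_order_relaxed) == 0
          && try_lock (lock))
        return;

      if (++cnt >= MAX_SPIN_COUNT)
        {
          /* Stop burning CPU; a real implementation would block on a
             futex here instead of yielding and retrying.  */
          sched_yield ();
          cnt = 0;
        }
    }
}

static void
spin_unlock (atomic_int *lock)
{
  atomic_store_explicit (lock, 0, memory_order_release);
}

/* Usage: spin_lock (&lock_word); ... critical section ...
   spin_unlock (&lock_word);  */

The point of the relaxed load is that waiters keep the lock's cache line in
shared state while the lock is held, and only the CAS (which needs the line
exclusive) is attempted once the load observes the lock as free.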