From: Noah Goldstein
Date: Mon, 8 Nov 2021 09:32:16 -0600
Subject: Re: [PATCH v3] x86: Optimize atomic_compare_and_exchange_[val|bool]_acq [BZ #28537]
To: "H.J. Lu"
Cc: GNU C Library, Florian Weimer, Hongyu Wang, Andreas Schwab, liuhongt, Arjan van de Ven
In-Reply-To: <20211104161443.734681-1-hjl.tools@gmail.com>

On Thu, Nov 4, 2021 at 11:15 AM H.J. Lu via Libc-alpha wrote:
>
> From the CPU's point of view, acquiring a cache line for writing is more
> expensive than reading it.  See Appendix A.2 Spinlock in:
>
> https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf
>
> A full compare-and-swap grabs the cache line in exclusive state and causes
> excessive cache line bouncing.  Instead, load the current memory value
> through a volatile pointer first (this load is atomic and will not be
> optimized away by the compiler), check it, and return immediately if the
> write would fail, to reduce cache line bouncing on contended locks.
>
> This fixes BZ #28537.
>
> A GCC bug has been opened:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103065
>
> A fixed compiler should define __HAVE_SYNC_COMPARE_AND_SWAP_LOAD_CHECK to
> indicate that the compiler will generate the check together with the
> volatile load.  glibc can then check __HAVE_SYNC_COMPARE_AND_SWAP_LOAD_CHECK
> to avoid the extra volatile load.
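In plain C, the load-check idea described above boils down to roughly the
following (a minimal illustrative sketch with made-up names, not the actual
glibc macro):

    /* Read through a volatile pointer first (a plain shared read, no bus
       lock), and only issue the locked cmpxchg, which needs the cache line
       in exclusive state, when the compare can still succeed.  */
    static inline int
    cas_val_acq_with_load_check (int *mem, int oldval, int newval)
    {
      int cur = *(volatile int *) mem;
      if (cur != oldval)
        return cur;    /* The CAS would fail anyway; skip the write.  */
      return __sync_val_compare_and_swap (mem, oldval, newval);
    }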
> ---
>  sysdeps/x86/atomic-machine.h | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/x86/atomic-machine.h b/sysdeps/x86/atomic-machine.h
> index 2692d94a92..597dc1cf92 100644
> --- a/sysdeps/x86/atomic-machine.h
> +++ b/sysdeps/x86/atomic-machine.h
> @@ -73,9 +73,20 @@ typedef uintmax_t uatomic_max_t;
>  #define ATOMIC_EXCHANGE_USES_CAS 0
>
>  #define atomic_compare_and_exchange_val_acq(mem, newval, oldval) \
> -  __sync_val_compare_and_swap (mem, oldval, newval)
> +  ({ volatile __typeof (*(mem)) *memp = (mem);                     \
> +     __typeof (*(mem)) oldmem = *memp, ret;                        \
> +     ret = (oldmem == (oldval)                                     \
> +            ? __sync_val_compare_and_swap (mem, oldval, newval)    \
> +            : oldmem);                                             \
> +     ret; })
>  #define atomic_compare_and_exchange_bool_acq(mem, newval, oldval) \
> -  (! __sync_bool_compare_and_swap (mem, oldval, newval))
> +  ({ volatile __typeof (*(mem)) *memp = (mem);                     \
> +     __typeof (*(mem)) oldmem = *memp;                             \
> +     int ret;                                                      \
> +     ret = (oldmem == (oldval)                                     \
> +            ? !__sync_bool_compare_and_swap (mem, oldval, newval)  \
> +            : 1);                                                  \
> +     ret; })
>
>
>  #define __arch_c_compare_and_exchange_val_8_acq(mem, newval, oldval) \
> --
> 2.33.1
>

Worth noting that on x86 any of the __atomic_fetch_* builtins aside from
add/sub are implemented with a CAS loop that may benefit from this:

https://godbolt.org/z/z87v9Kbcz
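For reference, such a CAS loop expressed with the __atomic builtins looks
roughly like this (illustrative names and memory orders; see the godbolt
link above for the actual codegen):

    /* Approximately how a fetch-or is done when no single instruction can
       return the old value: load, then retry a compare-exchange until the
       stored value has not changed underneath us.  */
    static inline int
    fetch_or_via_cas (int *mem, int mask)
    {
      int old = __atomic_load_n (mem, __ATOMIC_RELAXED);
      while (!__atomic_compare_exchange_n (mem, &old, old | mask,
                                           0 /* strong */,
                                           __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
        /* On failure, OLD is refreshed with the current value; retry.  */;
      return old;
    }

Each attempt ends in a lock cmpxchg that takes the cache line exclusive, so
the same cost argument applies under contention.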