From: "H.J. Lu"
Date: Mon, 8 Nov 2021 07:45:00 -0800
Subject: Re: [PATCH v3] x86: Optimize atomic_compare_and_exchange_[val|bool]_acq [BZ #28537]
To: Noah Goldstein
Cc: GNU C Library, Florian Weimer, Hongyu Wang, Andreas Schwab, liuhongt, Arjan van de Ven

On Mon, Nov 8, 2021 at 7:32 AM Noah Goldstein wrote:
>
> On Thu, Nov 4, 2021 at 11:15 AM H.J. Lu via Libc-alpha
> wrote:
> >
> > From the CPU's point of view, getting a cache line for writing is more
> > expensive than reading it.  See Appendix A.2 Spinlock in:
> >
> > https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf
> >
> > A full compare and swap grabs the cache line in exclusive state and
> > causes excessive cache line bouncing.  Load the current memory value
> > through a volatile pointer first; this load is atomic and will not be
> > optimized out by the compiler.  Check it and return immediately if the
> > CAS would fail, to reduce cache line bouncing on contended locks.
> >
> > This fixes BZ #28537.
> >
> > A GCC bug has been opened:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103065
> >
> > A fixed compiler should define __HAVE_SYNC_COMPARE_AND_SWAP_LOAD_CHECK
> > to indicate that it generates the check with the volatile load.  glibc
> > can then test __HAVE_SYNC_COMPARE_AND_SWAP_LOAD_CHECK to avoid the
> > extra volatile load.
> > ---
> >  sysdeps/x86/atomic-machine.h | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/sysdeps/x86/atomic-machine.h b/sysdeps/x86/atomic-machine.h
> > index 2692d94a92..597dc1cf92 100644
> > --- a/sysdeps/x86/atomic-machine.h
> > +++ b/sysdeps/x86/atomic-machine.h
> > @@ -73,9 +73,20 @@ typedef uintmax_t uatomic_max_t;
> >  #define ATOMIC_EXCHANGE_USES_CAS 0
> >
> >  #define atomic_compare_and_exchange_val_acq(mem, newval, oldval) \
> > -  __sync_val_compare_and_swap (mem, oldval, newval)
> > +  ({ volatile __typeof (*(mem)) *memp = (mem);                    \
> > +     __typeof (*(mem)) oldmem = *memp, ret;                       \
> > +     ret = (oldmem == (oldval)                                    \
> > +            ? __sync_val_compare_and_swap (mem, oldval, newval)   \
> > +            : oldmem);                                            \
> > +     ret; })
> >  #define atomic_compare_and_exchange_bool_acq(mem, newval, oldval) \
> > -  (! __sync_bool_compare_and_swap (mem, oldval, newval))
> > +  ({ volatile __typeof (*(mem)) *memp = (mem);                    \
> > +     __typeof (*(mem)) oldmem = *memp;                            \
> > +     int ret;                                                     \
> > +     ret = (oldmem == (oldval)                                    \
> > +            ? !__sync_bool_compare_and_swap (mem, oldval, newval) \
> > +            : 1);                                                 \
> > +     ret; })
> >
> >
> >  #define __arch_c_compare_and_exchange_val_8_acq(mem, newval, oldval) \
> > --
> > 2.33.1
> >
>
> Worth noting on x86 any of the __atomic_fetch_* builtins aside from add/sub
> are implemented with a CAS loop that may benefit from this:
> https://godbolt.org/z/z87v9Kbcz

This is:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

-- 
H.J.
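
A minimal, self-contained sketch of the idea the commit message describes:
read the lock word with a plain load while it is busy, and only issue the
CAS (which needs the cache line in exclusive state) once it looks free.
This is not the patch itself; it uses C11 atomics instead of glibc's
internal atomic macros, and the names ttas_lock/ttas_unlock are made up for
illustration.

#include <stdatomic.h>

/* Acquire LOCK: spin with plain loads while it is held, then CAS.  */
static void
ttas_lock (atomic_int *lock)
{
  for (;;)
    {
      /* Plain load: the cache line can stay in shared state while we
         wait, so waiters do not bounce it between cores.  */
      while (atomic_load_explicit (lock, memory_order_relaxed) != 0)
        ;  /* Spin; a pause hint could go here.  */

      /* Only now attempt the CAS, which requests the line exclusively.  */
      int expected = 0;
      if (atomic_compare_exchange_weak_explicit (lock, &expected, 1,
                                                 memory_order_acquire,
                                                 memory_order_relaxed))
        return;
    }
}

/* Release LOCK.  */
static void
ttas_unlock (atomic_int *lock)
{
  atomic_store_explicit (lock, 0, memory_order_release);
}

The atomic_compare_and_exchange_[val|bool]_acq macros in the patch apply the
same check-before-CAS shortcut at the macro level, so contended-lock code
built on them picks up the benefit without being rewritten.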