From: "H.J. Lu"
Date: Mon, 8 Nov 2021 07:45:00 -0800
Subject: Re: [PATCH v3] x86: Optimize atomic_compare_and_exchange_[val|bool]_acq [BZ #28537]
To: Noah Goldstein
Cc: GNU C Library, Florian Weimer, Hongyu Wang, Andreas Schwab, liuhongt, Arjan van de Ven

On Mon, Nov 8, 2021 at 7:32 AM Noah Goldstein wrote:
>
> On Thu, Nov 4, 2021 at 11:15 AM H.J. Lu via Libc-alpha
> wrote:
> >
> > From the CPU's point of view, getting a cache line for writing is more
> > expensive than reading it.  See Appendix A.2 Spinlock in:
> >
> > https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf
> >
> > A full compare and swap grabs the cache line in exclusive state and
> > causes excessive cache line bouncing.  Load the current memory value
> > through a volatile pointer first; this load is atomic and will not be
> > optimized out by the compiler.  Check it and return immediately if the
> > CAS would fail, to reduce cache line bouncing on contended locks.
> >
> > This fixes BZ #28537.
> >
> > A GCC bug has been opened:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103065
> >
> > A fixed compiler should define __HAVE_SYNC_COMPARE_AND_SWAP_LOAD_CHECK
> > to indicate that it generates the check with the volatile load.  glibc
> > can then test __HAVE_SYNC_COMPARE_AND_SWAP_LOAD_CHECK to avoid the
> > extra volatile load.
> > ---
> >  sysdeps/x86/atomic-machine.h | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/sysdeps/x86/atomic-machine.h b/sysdeps/x86/atomic-machine.h
> > index 2692d94a92..597dc1cf92 100644
> > --- a/sysdeps/x86/atomic-machine.h
> > +++ b/sysdeps/x86/atomic-machine.h
> > @@ -73,9 +73,20 @@ typedef uintmax_t uatomic_max_t;
> >  #define ATOMIC_EXCHANGE_USES_CAS 0
> >
> >  #define atomic_compare_and_exchange_val_acq(mem, newval, oldval) \
> > -  __sync_val_compare_and_swap (mem, oldval, newval)
> > +  ({ volatile __typeof (*(mem)) *memp = (mem);                    \
> > +     __typeof (*(mem)) oldmem = *memp, ret;                       \
> > +     ret = (oldmem == (oldval)                                    \
> > +            ? __sync_val_compare_and_swap (mem, oldval, newval)   \
> > +            : oldmem);                                            \
> > +     ret; })
> >  #define atomic_compare_and_exchange_bool_acq(mem, newval, oldval) \
> > -  (! __sync_bool_compare_and_swap (mem, oldval, newval))
> > +  ({ volatile __typeof (*(mem)) *memp = (mem);                    \
> > +     __typeof (*(mem)) oldmem = *memp;                            \
> > +     int ret;                                                     \
> > +     ret = (oldmem == (oldval)                                    \
> > +            ? !__sync_bool_compare_and_swap (mem, oldval, newval) \
> > +            : 1);                                                 \
> > +     ret; })
> >
> >
> >  #define __arch_c_compare_and_exchange_val_8_acq(mem, newval, oldval) \
> > --
> > 2.33.1
> >
>
> Worth noting on x86 any of the __atomic_fetch_* builtins aside from add/sub
> are implemented with a CAS loop that may benefit from this:
> https://godbolt.org/z/z87v9Kbcz

This is:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

-- 
H.J.
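
A minimal, self-contained sketch of the idea the commit message describes:
read the lock word with a plain load while it is busy, and only issue the
CAS (which needs the cache line in exclusive state) once it looks free.
This is not the patch itself; it uses C11 atomics instead of glibc's
internal atomic macros, and the names ttas_lock/ttas_unlock are made up for
illustration.

#include <stdatomic.h>

/* Acquire LOCK: spin with plain loads while it is held, then CAS.  */
static void
ttas_lock (atomic_int *lock)
{
  for (;;)
    {
      /* Plain load: the cache line can stay in shared state while we
         wait, so waiters do not bounce it between cores.  */
      while (atomic_load_explicit (lock, memory_order_relaxed) != 0)
        ;  /* Spin; a pause hint could go here.  */

      /* Only now attempt the CAS, which requests the line exclusively.  */
      int expected = 0;
      if (atomic_compare_exchange_weak_explicit (lock, &expected, 1,
                                                 memory_order_acquire,
                                                 memory_order_relaxed))
        return;
    }
}

/* Release LOCK.  */
static void
ttas_unlock (atomic_int *lock)
{
  atomic_store_explicit (lock, 0, memory_order_release);
}

The atomic_compare_and_exchange_[val|bool]_acq macros in the patch apply the
same check-before-CAS shortcut at the macro level, so contended-lock code
built on them picks up the benefit without being rewritten.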