From: Oleh Derevenko
Date: Wed, 3 Nov 2021 19:26:52 +0200
Subject: Re: [PATCH] x86: Optimize atomic_compare_and_exchange_[val|bool]_acq [BZ #28537]
To: Arjan van de Ven
Cc: "H.J. Lu", libc-alpha@sourceware.org

Arjan,

> What the patch does is check non-atomically first whether the actual
> atomic operation has a chance of working. If it has a chance, the
> actual normal atomic operation is done as before. But if the
> non-atomic read already tells you the cmpxchg has no chance to
> succeed, it errors out early.

The idea of atomic functions is that they are intended to work
correctly with any type of memory. In your case, the speculative read
on cached device memory may be satisfied from the cache alone and
never fetch the update from the device, making the reading thread
"see" the change later than it otherwise could. If you want a
RAM-specific version of compare-and-exchange, give it a distinct,
specific name.
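For reference, the pattern under discussion is roughly the following.
This is a minimal sketch only, not the actual glibc change; the
function name is made up, and a relaxed atomic load stands in for the
"plain read" (on x86 it compiles to the same ordinary MOV, and it
keeps the example well-defined C):

#include <stdatomic.h>

/* Fail fast if the cmpxchg cannot possibly succeed, so contended
   callers do not take the cacheline exclusive just to fail.  */
static inline int
cas_acq_with_early_check (atomic_int *mem, int expected, int desired)
{
  /* Read-only check: if the value already differs from `expected`,
     the locked cmpxchg below is guaranteed to fail.  */
  if (atomic_load_explicit (mem, memory_order_relaxed) != expected)
    return 0;

  /* The value still matches: perform the normal atomic operation,
     exactly as before the optimization.  */
  return atomic_compare_exchange_strong_explicit
    (mem, &expected, desired,
     memory_order_acquire, memory_order_acquire);
}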
On Wed, Nov 3, 2021 at 7:00 PM Arjan van de Ven wrote:
>
> On 11/3/2021 8:50 AM, Oleh Derevenko wrote:
> > Hi, H.J. Lu
> >
> > You may not perform plain reads on values you want to be atomic.
> > This results in undefined behavior.
>
> So the way the patch works is that it does not DEPEND on that read
> being atomic. What the patch does is check non-atomically first
> whether the actual atomic operation has a chance of working. If it
> has a chance, the actual normal atomic operation is done as before.
> But if the non-atomic read already tells you the cmpxchg has no
> chance to succeed, it errors out early.
>
> The big gain is in the contended lock case (note the _acq suffix!).
> Say there are four threads spinning on a lock. Before this patch,
> those four CPU cores would take turns bouncing the cacheline around
> very aggressively, which degrades the whole system and, worse, also
> makes the core that will eventually unlock the lock wait for the
> cacheline.
>
> With the patch, "it is locked already" is noticed before the
> cacheline is taken exclusive, so all four spinning cores hold the
> same cacheline in a shared state -- no pingponging. The core that is
> going to unlock the lock can then acquire the cacheline exclusively
> without having to fight those four cores for it.
>
> > For example, the compiler IS NOT obliged to perform the read with a
> > single CPU instruction. Of course it will not, but it is allowed to
> > read the value in two halves and compare them separately. Or it may
> > reuse a cached value from previous evaluations.
> >
> > And that is only the compiler-level issue. Similar issues arise at
> > the CPU level with memory coherency, caching and instruction
> > reordering.
>
> The CPU in this case won't; the x86 memory model does not allow that
> (and this is in the x86 implementation code).
>
> > Or, if the value crossed a cache line boundary, the plain read
> > might return a half-updated value, with the part from one cache
> > line being new and the other part being old.
>
> (I can't say in polite company what cmpxchg across cache lines does.)

--
Oleh Derevenko
-- Skype with underscore
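To make the cacheline argument in the quoted thread concrete: the
classic "test-and-test-and-set" spinlock uses the same idea. The
sketch below is illustrative only (it is not the glibc code, and the
names are invented). While the lock is held, every waiter spins in the
read-only inner loop, so the line stays shared in all of their caches;
the exclusive acquisition is attempted only once the lock is observed
free.

#include <stdatomic.h>

typedef struct { atomic_int locked; } spinlock_t;

static void
spin_lock (spinlock_t *l)
{
  for (;;)
    {
      /* "Test": read-only spin; the cacheline stays shared, so the
         waiters do not steal it from the eventual unlocker.  */
      while (atomic_load_explicit (&l->locked, memory_order_relaxed) != 0)
        /* spin */;

      /* "Test-and-set": only now take the line exclusive and try to
         grab the lock for real.  */
      if (atomic_exchange_explicit (&l->locked, 1,
                                    memory_order_acquire) == 0)
        return;
    }
}

static void
spin_unlock (spinlock_t *l)
{
  atomic_store_explicit (&l->locked, 0, memory_order_release);
}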