From mboxrd@z Thu Jan 1 00:00:00 1970
From: Oleh Derevenko
Date: Wed, 3 Nov 2021 19:55:35 +0200
Subject: Re: [PATCH] x86: Optimize atomic_compare_and_exchange_[val|bool]_acq [BZ #28537]
To: Arjan van de Ven
Cc: "H.J. Lu", libc-alpha@sourceware.org
References: <20211103150415.1211388-1-hjl.tools@gmail.com> <5b16e035-d523-21c9-226c-3c8b8d9aa759@linux.intel.com>
In-Reply-To: <5b16e035-d523-21c9-226c-3c8b8d9aa759@linux.intel.com>
List-Id: Libc-alpha mailing list

Arjan,

> eh I am not sure I understand what you say since cmpxchg uses the exact same
> cache protocol/etc to do its read...

Well, if you are sure... I did not have that information.

One last question, if you permit. What, in your opinion, are the reasons
they did it this way in the hardware implementation? This thing, I mean:

> The full compare and swap will grab the cache line exclusive and cause
> excessive cache line bouncing.

Why was this optimization not implemented in the CPU in the first place?

Well, I guess I can explain it myself. Because in the success cases these
extra checks would add to execution time. And cmpxchg can be used for many
purposes, not only for polling from four threads in parallel.

On Wed, Nov 3, 2021 at 7:30 PM Arjan van de Ven wrote:
>
> On 11/3/2021 10:26 AM, Oleh Derevenko wrote:
> > Arjan
> >
> >> What the patch does is check non-atomic first if the actual atomic
> >> operation has a chance of working. if it has a chance, the actual
> >> normal atomic operation is done as before. But if non-atomic read
> >> already tells you the cmpxchg has no chance to succeed, it errors
> >> out early.
> >
> > The idea of atomic function is that they are intended to work fairly
> > with any type of memory. In your case, the speculative reads for a
> > cached device memory may result in cache access only and will prevent
> > fetching memory updates from the device, thus making the reading
> > thread "see" the change later than it could.
> >
>
> eh I am not sure I understand what you say since cmpxchg uses the exact same
> cache protocol/etc to do its read... it won't go to device memory either if
> the cache line is anywhere in the cache hierarchy (including core-to-core
> transfers in case another core has it in their caches)
>
>
> (and cmpxchg on MMIO space has very very interesting and unexpected
> behavior. If folks remember the "linux torches your e1000 eeprom" bug
> from some years ago, it came from that)

-- 
Oleh Derevenko
-- Skype with underscore